The Oxford Handbook of Sound and Imagination
ISBN 9780190460167, 9780190460242, 9780190460198, 9780190460273, 0190460164

Table of contents :
Cover
The Oxford Handbook of SOUND AND IMAGINATION, VOLUME 2
Copyright
Contents
Acknowledgments
Contributors
The Companion Website
Introduction: Volume 2
The Chapters
Musical Performance
Systems and Technologies
Psychology
Aesthetics
Posthumanism
Reference
Part I: MUSICAL PERFORMANCE
Chapter 1: Improvisation: An Ideal Display of Embodied Imagination
Introduction
The Shared Double Histories of Imagination and Improvisation
Tensions between the Sacred and the Divine
Kant’s Reconciliation of Imaginations
Shifting Views on Imagination and Improvisation
Imagination in Embodied Cognitive Science
Imagination Eliminates the Need for a Transcendental Subjectivity
Friston’s Free Energy Principle
Language and Imagination
Conclusion
References
Chapter 2: Anticipated Sonic Actions and Sounds in Performance
Introduction
Approaches to Sounds in Traditional Western Performance
Perception and Imagination of Timbral Qualities
Sounds in Imagined Performances and Mental Practice
Anticipatory Imagery of Sonic Actions
Sonic Actions in Controller-Driven Performances of Digital Music
Imagined Bodily Causes of Sounds
Jackets and Gloves: Bespoke Mappings between Gestures and Sound
Conclusions
References
Chapter 3: Motor Imagery in Perception and Performance of Sound and Music
Introduction
Sound, Music, and the Body
Mimetic Motor Imagery and the Sense of Agency
Motor Imagery’s Role in Perceiving Music through Instrumental Actions
Ecological Embedding and Affordances
Gestures, Metaphors, and Models in Technological Instruments
Instruments, Objects, and the Others
Notes
References
Chapter 4: Music and Emergence
Introduction
Listening
The Unconscious
Groove
At Last!
Notes
References
Chapter 5: Affordances in Real, Virtual, and Imaginary Musical Performance
Introduction
The Hand as a Perceptual System, Musical Instruments as Tools
Virtual Instruments: Analog Hardware Reimagined
The Worshipful Fraternity of Air Guitarists
Further Discussion, and Some Remarks on Representations
Acknowledgments
Notes
References
Part II: SYSTEMS AND TECHNOLOGIES
Chapter 6: Systemic Abstractions: The Imaginary Regime
Introduction
Interval
Diatonic Scaling
Some Notational Implications
Interface
Interaction
Final Remarks
Notes
References
Chapter 7: From Rays to Ra: Music, Physics, and the Mind
Introduction
Replication, Repetition, Homeostasis
Self-Replication
Invariance
Emergent Structure
Swarm Behavior
Homeostatic Frames of Reference
Periodicity
Entrainment and Social Bonding
Music and Homeostasis
Sub- and Supraconscious Levels
Music as Expanding Homeostatic Frames of Reference
Emotions and Embodied Meaning
Sun Ra
Musical Universals versus Cultural Difference
Resonance and Neural Synchronization
Conclusions
Notes
References
Chapter 8: Music Analysis and Data Compression
Introduction
Encodings, Decoders, and Two-Part Codes
Music Analysis and Data Compression
Music-Theoretical Concepts That Promote Compact Encodings of Musical Objects
Kolmogorov Complexity
Music Analysis and Perceptual Coding
A Sketch of a Compression-Based Model of Musical Learning
Using the Model to Explain Individual Differences
COSIATEC: Music Analysis by Point-Set Compression
Evaluating Music Analysis Algorithms
Applying a Compression-Driven Approach to the Analysis of Musical Audio
Summary
Acknowledgments
Notes
References
Chapter 9: Bioacoustics: Imaging and Imagining the Animal World
Introduction
Sounding Animals
Recoding the Recording—Catching Nighthawks
The Transacoustic Community
References
Chapter 10: Musical Notation as the Externalization of Imagined, Complex Sound
Introduction
A Symphonic Concert
A yoik—Complexity on a Different Scale
A Responsorial Chant in a Mass
Externalization and Complexity
Information and Complexity in Biology and Culture
The Externalization of Pitch and Intervals
Pythagoras, Sounds, and Mathematics
The Octave Revolution
Externalization and the Emergence of Complexity
Selves, Individuals, and Individualities
Summing Up
Acknowledgments
Notes
References
Chapter 11: “. . . they call us by our name . . .”: Technology, Memory, and Metempsychosis
Introduction
Memory and Imagination
Memory and Representation
Memory beyond Representation
Metempsychosis
Ghosts in the Machine? Uncanny Connections between Telegraphy and Phonography
Proust: Telephone and Camera
Recordings
Notes
References
Chapter 12: Musical Shape Cognition
Introduction
Notions of Shape
Motor Cognition
Musical Timescales
Sound Features
Motion Features
Multimodal Sound-Motion Shapes
Musical Instants
Shape Cognition in Musical Imagery
Prospects and Challenges
References
Chapter 13: Playing the Inner Ear: Performing the Imagination
Introduction
From Information to Imagination
Imagination and Imagery
Form in Space and Time
Reverse Engineering: From Reaction to Generation
First Steps Toward Reconstruction
Rendering Memory
Rendering Imagination
Memory and Imagination
Embodied Response
Time Scales of the Embodied
Synesthesia and the Visual Imagination
Notation, Visualization, and Evocation
Reverse Engineering: “Air Imagining”
Playing and Performing the Imagination
Space and Imagination
Sound Sculpting, Sound Dancing
An Imaginative Plug-In—Imaginary Sound Transformation
Where Do We Begin? Seeds and Provocations
Nature, Playing, and Performing
Conclusion—and a Footnote on Ethics and the Transparency of the “Fourth Wall”
Notes
References
Part III: PSYCHOLOGY
Chapter 14: Music in Detention and Interrogation: The Musical Ecology of Fear
Introduction
Music in Detention/Interrogation
The History and Broader Context of Music in Detention and Interrogation
Music and Behavior Control
Music in Military Life
Psychological Warfare
Sound Weapons
Music in Interrogation and Detention: Recent US Practices
Music, Information, and Interpretation: Fear and Imagination
Sound as Information about Objects and Events
Music and Sound as Information for Interpretation and Imagination
Imagination and Fear of Music
Conclusion: Music, Torture, and the Necessity of Experience
References
Chapter 15: Augmented Unreality: Synesthetic Artworks and Audiovisual Hallucinations
Introduction
Altered States of Consciousness
Visual Hallucinations
Auditory Hallucinations
Synesthesia
Toward Representation
Audiovisual Representations of Hallucinations
Diegetic Representations of Hallucinations
Synesthetic Artworks
A Conceptual Model for Audiovisual Representations of ASCs
Input
Mode of Representation
Arena Space
In Practice
Augmented Unreality
Concluding Remarks
Notes
References
Chapter 16: Consumer Sound
Introduction
Sensory Analysis of Complex Audio Stimuli
Sensory Analysis of Interaction between Sound Zones: Development of a Perceptual Model for Prediction of “Distraction”
Stages One and Two: Free Elicitation and Team Discussions
Stage Three: Attribute Reduction
Stage Four: Attribute Ratings
Attribute Modeling
Discussion
The Next Step
Notes
References
Chapter 17: Creating a Brand Image through Music: Understanding the Psychological Mechanisms behind Audio Branding
Introduction
From Classical Conditioning to Music-Brand Fit
Meaning Structure in Identity-Based Brand Management
The Role of Music in Creating Brand Images
Brand Salience
Emotional Brand Meaning
From Expression to Recognition and Feeling of Emotion in Music
Music-Emotion Induction Mechanisms Applied to Audio Branding
Cognitive appraisal
Memory-Based Mechanisms
Low-Level Stimulus Responses
Responses to Expressive Schemata and Personas
Cognitive Brand Meaning
An Integrated Brand-Music Communication Model
References
Chapter 18: Sound and Emotion
Introduction
The Auditory System
Sound Perception and Localization
Attention and Higher Order Influences
Emotional Responses to Auditory Stimuli
Mental Representations Induced by Sound
Learned Emotional Meaning of Sound
Vocal Affect
Music
Musical Emotions
Psychological Mechanisms of Emotion Induction by Music
Neural Correlates of Musical Emotions
Emotional Influences on Sound Perception and Auditory Attention
Concluding Remarks
References
Chapter 19: Voluntary Auditory Imagery and Music Pedagogy
Introduction
Psychology Research on Auditory Imagery
Individual Differences in Auditory Imagery Abilities
Music Pedagogy and Voluntary Auditory Imagery
Rehearsal Strategies for Instrumental Performance
Conclusions
Acknowledgments
Notes
References
Chapter 20: A Different Way of Imagining Sound: Probing the Inner Auditory Worlds of Some Children on the Autism Spectrum
Introduction
An Ecological Model of Auditory Perception
Absolute Pitch
The Impact of Remembering and Imagining Musical Sounds in Absolute Terms
A Session with Romy
Freddie—the Silent Musician
Conclusion
Notes
References
Chapter 21: Multimodal Imagery in the Receptive Music Therapy Model Guided Imagery and Music (GIM)
Introduction
Guided Imagery and Music—Music Listening as Psychotherapy
Multimodal Imagery in GIM—Examples and a Neuroaffective Perspective
Theories of Consciousness, Music, Imagery, Emotion, and Health—as Related to GIM
Discussion
Conclusion
References
Chapter 22: Empirical Musical Imagery beyond the “Mind’s Ear”
Introduction
Embodied Cognition and Mental Imagery
Offline Cognition
An Embodied Review of Empirical Studies of Musical Imagery
Imagery in Performance
Imagery in Composing and Listening
Voluntary Musical Imagery
Involuntary Musical Imagery
Tests of Musical Imagery’s Embodiment
Directions for Future Research
Concluding Remarks
Notes
References
Part IV: AESTHETICS
Chapter 23: Imaginative Listening to Music
Introduction
The Opposition of Hearing and Listening
Three Species of Imagination
Props, Triggers, and Absolute Music
Expressiveness
Metaphor, Musical Space, and Movement
Experiential Illusion
Less Obvious Candidates
Conclusion
Notes
References
Chapter 24: A Hopeful Tone: A Waltonian Reconstruction of Bloch’s Musical Aesthetics
Introduction
Waltonian Fictionality
Walton on Music
The Historical and Class Character of Walton’s Theory
Bloch’s Musical Aesthetics
Utopia
Waltonizing Bloch
Conclusion
Notes
References
Chapter 25: Sound as Environmental Presence: Toward an Aesthetics of Sonic Atmospheres
Introduction
Environmentality
Environmental Imagination
Basic Sonic Environmentalities: Atmosphere, Ambience, and Ecology
Atmosphere as Environmental Presence
Sonic Atmospheres
David Lynch: Eraserhead (1977)
Janet Cardiff and George Bures Miller: Forest (for a Thousand Years) (2012)
Conclusion
References
Chapter 26: The Aesthetics of Improvisation
Introduction
A Philosophical Humanist Approach to Music Aesthetics
High Art and Vernacular Art
Perfectionist and Imperfectionist Aesthetics
The Concept of Improvisation and “Improvised Feel”
Spontaneity and the Aesthetics of Perfection
Free Improvisers, Interpreters, and “Improvisation as a Compositional Method”
Jazz as Classical Music
Defining Popular and Classical Music
The Critique of “Jazz as Classical Music”
Art and Entertainment: Jazz as an Art Music of Improvisation
Acknowledgments
Notes
References
Part V: POSTHUMANISM
Chapter 27: Sonic Materialism: Hearing the Arche-Sonic
Introduction
The Ancestrality of a Sonic World
Arche-Sonic Vibrations
Porous Bodies
What Is the Vibrational Facticity of Impossible Bodies and Things?
Political Textures
Conclusion
Notes
References
Chapter 28: Imagining the Seamless Cyborg: Computer System Sounds as Embodying Technologies
Introduction
From Circuit Sonification to Audio Branding
System Sounds and Affect
Everyday Cyborgs
The Sonic Smoothing of the Prosthesis
Sound Glitches as Intervention
Notes
References
Chapter 29: Glitched and Warped: Transformations of Rhythm in the Age of the Digital Audio Workstation
Introduction
The Prehistory: “Organic” and “Machinic” Rhythms in the Popular Music Mainstream
Microrhythmic Manifestations of the Digital Audio Workstation: Two Trends
An Extension of the Human?
Imagining the “Humachine” through Sound
Notes
References
Chapter 30: On the Other Side of Time: Afrofuturism and the Sounds of the Future
Introduction
Space Is the Place
Afrofuturism and After
Blackness and Technology
The Music of the Future
Sonic Fiction
Transmolecularization—Beyond Sun Ra
Notes
References
Chapter 31: Posthumanist Voices in Literature and Opera
Introduction
Autopoiesis and the Autoaffective Voice
Videocentrism and Expressive Voices
The Phantom of the Operatic Voice
Living in a Material World: Luba Luft’s Pamina
Conclusion
Notes
References
Further Reading
Index



The Oxford Handbook of
SOUND AND IMAGINATION, VOLUME 2

Edited by
MARK GRIMSHAW-AAGAARD, MADS WALTHER-HANSEN, and MARTIN KNAKKERGAARD

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and certain other countries.

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America.

© Oxford University Press 2019

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above. You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data
Names: Grimshaw-Aagaard, Mark. | Walther-Hansen, Mads. | Knakkergaard, Martin.
Title: The Oxford handbook of sound and imagination / edited by Mark Grimshaw-Aagaard, Mads Walther-Hansen, and Martin Knakkergaard.
Description: New York, NY : Oxford University Press, [2019] | Series: Oxford handbooks | Includes bibliographical references and index.
Identifiers: LCCN 2018049753 | ISBN 9780190460167 (v. 1: cloth : alk. paper) | ISBN 9780190460242 (v. 2: cloth : alk. paper) | ISBN 9780190460198 (v. 1: oxford handbooks online) | ISBN 9780190460273 (v. 2: oxford handbooks online)
Subjects: LCSH: Music—Psychological aspects. | Sound—Psychological aspects. | Imagination.
Classification: LCC ML3830 .O92 2019 | DDC 781.1/1—dc23
LC record available at https://lccn.loc.gov/2018049753

1 3 5 7 9 8 6 4 2
Printed by Sheridan Books, Inc., United States of America

Contents

Acknowledgments  ix
Contributors  xi
The Companion Website  xiii

Introduction: Volume 2  1
Mark Grimshaw-Aagaard, Mads Walther-Hansen, and Martin Knakkergaard

PART I  MUSICAL PERFORMANCE

1. Improvisation: An Ideal Display of Embodied Imagination  15
Justin Christensen

2. Anticipated Sonic Actions and Sounds in Performance  37
Clemens Wöllner

3. Motor Imagery in Perception and Performance of Sound and Music  59
Jan Schacher

4. Music and Emergence  77
John M. Carvalho

5. Affordances in Real, Virtual, and Imaginary Musical Performance  97
Marc Duby

PART II  SYSTEMS AND TECHNOLOGIES

6. Systemic Abstractions: The Imaginary Regime  117
Martin Knakkergaard

7. From Rays to Ra: Music, Physics, and the Mind  133
Janna K. Saslaw and James P. Walsh

8. Music Analysis and Data Compression  153
David Meredith

9. Bioacoustics: Imaging and Imagining the Animal World  179
Mickey Vallee

10. Musical Notation as the Externalization of Imagined, Complex Sound  191
Henrik Sinding-Larsen

11. “. . . they call us by our name . . .”: Technology, Memory, and Metempsychosis  219
Bennett Hogg

12. Musical Shape Cognition  237
Rolf Inge Godøy

13. Playing the Inner Ear: Performing the Imagination  259
Simon Emmerson

PART III  PSYCHOLOGY

14. Music in Detention and Interrogation: The Musical Ecology of Fear  281
W. Luke Windsor

15. Augmented Unreality: Synesthetic Artworks and Audiovisual Hallucinations  301
Jonathan Weinel

16. Consumer Sound  321
Søren Bech and Jon Francombe

17. Creating a Brand Image through Music: Understanding the Psychological Mechanisms behind Audio Branding  349
Hauke Egermann

18. Sound and Emotion  369
Erkin Asutay and Daniel Västfjäll

19. Voluntary Auditory Imagery and Music Pedagogy  391
Andrea R. Halpern and Katie Overy

20. A Different Way of Imagining Sound: Probing the Inner Auditory Worlds of Some Children on the Autism Spectrum  409
Adam Ockelford

21. Multimodal Imagery in the Receptive Music Therapy Model Guided Imagery and Music (GIM)  427
Lars Ole Bonde

22. Empirical Musical Imagery beyond the “Mind’s Ear”  445
Freya Bailes

PART IV  AESTHETICS

23. Imaginative Listening to Music  467
Theodore Gracyk

24. A Hopeful Tone: A Waltonian Reconstruction of Bloch’s Musical Aesthetics  489
Bryan J. Parkhurst

25. Sound as Environmental Presence: Toward an Aesthetics of Sonic Atmospheres  517
Ulrik Schmidt

26. The Aesthetics of Improvisation  535
Andy Hamilton

PART V  POSTHUMANISM

27. Sonic Materialism: Hearing the Arche-Sonic  559
Salomé Voegelin

28. Imagining the Seamless Cyborg: Computer System Sounds as Embodying Technologies  579
Daniël Ploeger

29. Glitched and Warped: Transformations of Rhythm in the Age of the Digital Audio Workstation  595
Anne Danielsen

30. On the Other Side of Time: Afrofuturism and the Sounds of the Future  611
Erik Steinskog

31. Posthumanist Voices in Literature and Opera  629
Jason R. D’Aoust

Index  653

Acknowledgments

This handbook has been a four-year labor of, if not unconditional love, then surely a love tempered by blood, toil, tears, and sweat. Seeing the task through from proposal to completion, of compiling, editing, and publishing a two-volume work of over 650,000 words, requires dedication, attention to detail, and, at times, sheer bloody-mindedness. Here, thanks are due to the many people without whom the book you now hold in your hand would not have seen the light of day. Our first thanks go to our commissioning editor Norm Hirschy and music editor Lauralee Yeary, both of Oxford University Press, who not only had the vision to see beyond the shortcomings of our initial proposal but also firmly and patiently guided us through the many twists and turns of putting together the final manuscript. Additionally, we are grateful to their many nameless colleagues at the press who have tirelessly labored over copyediting, proofing, design, indexing, and a host of other unknown tasks that take place behind the scenes. Thanks are also due to a number of anonymous reviewers, from proposal through to draft manuscript, who were overwhelmingly supportive of what they read while also presenting us with many suggestions for expansion and improvement. Alistair Payne, Professor of Fine Art Practice and Head of the School of Fine Art at the Glasgow School of Art, has our gratitude for allowing us to use his magnificent diptych The Fall as the cover art. Finally, although it is our names on the front of the handbook—and thus our responsibility for any errors that remain—none of what you are reading would have been possible without the contributions of our authors who have neither ceased in their enthusiasm for the project nor flagged in the face of countless e-mails from us. Our heartfelt thanks go to them; we hope you enjoy their efforts.

Contributors

Erkin Asutay, Postdoctoral Researcher, Department of Behavioral Sciences and Learning, Linköping University
Freya Bailes, Academic Fellow, University of Leeds
Søren Bech, Director Research, Bang & Olufsen a/s and Professor, Aalborg University
Lars Ole Bonde, Professor Emeritus in Music Therapy, Aalborg University; Professor Emeritus in Music and Health, Center for Research in Music and Health, The Norwegian Academy of Music
John M. Carvalho, Professor of Philosophy, Villanova University
Justin Christensen, Postdoctoral Researcher, University of Saskatchewan
Anne Danielsen, Professor, RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo
Jason R. D’Aoust, Visiting Assistant Professor of Comparative Literature and Musical Studies, Oberlin College and Conservatory
Marc Duby, Research Professor in Musicology, University of South Africa
Hauke Egermann, Assistant Professor, York Music Psychology Group, University of York
Simon Emmerson, Professor of Music, Technology and Innovation, Leicester Media School, De Montfort University
Jon Francombe, Senior Research and Development Engineer, BBC Research and Development
Rolf Inge Godøy, Professor, Department of Musicology and the RITMO Centre for Interdisciplinary Studies in Rhythm, Time and Motion, University of Oslo
Theodore Gracyk, Professor of Philosophy, Minnesota State University Moorhead
Mark Grimshaw-Aagaard, The Obel Professor of Music, Aalborg University
Andrea R. Halpern, Professor of Psychology, Bucknell University
Andy Hamilton, Professor of Philosophy, Durham University
Bennett Hogg, Senior Lecturer, International Centre for Music Studies, Newcastle University
Martin Knakkergaard, Senior Lecturer, Aalborg University
David Meredith, Associate Professor, Aalborg University
Adam Ockelford, Professor of Music and Director of the Applied Music Research Centre, University of Roehampton
Katie Overy, Senior Lecturer in Music, University of Edinburgh
Bryan J. Parkhurst, Assistant Professor of Music Theory and Philosophy, Oberlin College and Conservatory
Daniël Ploeger, Research Fellow, Royal Central School of Speech and Drama, University of London
Janna K. Saslaw, Professor of Music Theory, Loyola University New Orleans
Jan Schacher, Research Associate, Institute for Computer Music and Sound Technology, Zurich University of the Arts
Ulrik Schmidt, Associate Professor, Roskilde University
Henrik Sinding-Larsen, Researcher, Department of Social Anthropology, University of Oslo
Erik Steinskog, Associate Professor, University of Copenhagen
Mickey Vallee, Tier II Canada Research Chair in Community, Identity and Digital Media, Athabasca University
Salomé Voegelin, Reader in Sound Arts, London College of Communication, University of the Arts London
Daniel Västfjäll, Professor, Department of Behavioral Sciences and Learning, Linköping University
James P. Walsh, Adjunct Instructor of Music, Loyola University New Orleans
Mads Walther-Hansen, Associate Professor, Aalborg University
Jonathan Weinel, Visiting Research Fellow, Aalborg University
W. Luke Windsor, Professor of Music Psychology, School of Music, University of Leeds
Clemens Wöllner, Professor of Systematic Musicology, Universität Hamburg

The Companion Website

www.oup.com/us/ohsi2

Oxford has created a website of images to accompany The Oxford Handbook of Sound and Imagination, Volume 2. Readers are encouraged to consult this resource while reading the volume, as many images on the website are in color.

Introduction: Volume 2

Mark Grimshaw-Aagaard, Mads Walther-Hansen, and Martin Knakkergaard

. . . Fümms bö wä tää zää Uu, pögiff, kwiiee. Dedesnn nn rrrrr, Ii Ee, mpiff tillff toooo, tillll, Jüü-Kaa? Rinnzekete bee bee nnz krr müüüü, ziiuu ennze ziiuu rinnzkrrmüüüü, Rakete bee bee. Rrummpff tillff toooo? Ziiuu ennze ziiuu nnzkrrmüüüü, ziiuu ennze ziiuu rinnzkrrmüüüü, Rakete bee bee. Rakete bee zee . . . —Kurt Schwitters, URSONATE (extract)

A working assumption might be that imagination has its genesis in past experience, whether that genesis is social, cultural, or individual, and this influences the interpretation of context and directs the thinking and ideas that arise from it. This is the theme that fundamentally constitutes the substance of this handbook: the role and effect of imagination in the development and use of sonic processes and artifacts. Whether the act of imagination is a previously unheard sound in a science fiction movie or a new compositional style, such a process always derives from, and may be discussed and made sense of in relation to, something pre-existing; the mundane recordings of wildlife that form the basis for the alien’s screech, for example, or a distinctive difference from other compositional styles. Yet, one should not make the mistake of assuming sonic imagination is purely to do with the creation of new artifacts; one can rehearse mentally a piece of music or recall and imagine a previously heard sound for the silent action seen on screen. Equally, imaginative sound processes and artifacts themselves provoke other instances and forms of imagination often far removed from the field of sound. It is this broad reach that the handbook endeavors to cover.

A quick perusal of books on imagination will demonstrate that, if it is not viewed as abstract or creative thought, imagination is typically discussed in terms of image, as is clear from the root of the word itself. Equally, previous works on sonic imagination are predominantly on the subject of musical imagination, but they disguise the topic of imagination under themes of musical compositional creativity or performance techniques such as improvisation or deal solely with auditory imagery in the domain of neurosciences and psychoacoustics.

When initially proposed to Oxford University Press in early 2015, the handbook then envisioned consisted of forty-nine chapters and, while we included chapters covering the “traditional” areas of sonic imagination just noted, we also deliberately included chapters that dealt with other aspects of musical and auditory imagery and, bringing in other viewpoints from study areas that, prima facie, have little to do with sound, chapters on speech and on what might be called pure sound (that is, sound that, in English, is classed neither as music nor as speech). The handbook has since grown to seventy chapters, and the “sound” in the handbook’s title covers the broad domains of pure sound, music, and speech from numerous perspectives both conflicting and complementary.

This is neither a humanities handbook nor a natural sciences handbook. As we make clear in what follows, we eschew such proscriptive labels. The handbook is determinedly multidisciplinary and so includes contributions from scholars and practitioners from numerous disciplines and fields including musicology, acoustics and psychoacoustics, sound studies, film studies, soundscape practice, literature, computer sciences, psychology, computer games, acoustic ecology, cognition and neuroimaging, and the list goes on. Thinking about and working with sound and imagination belongs to no one area.

The Chapters

The handbook comprises seventy chapters (excluding this Introduction) shared across ten parts and two volumes that broadly arc from philosophical concerns to more practical matters before returning to philosophical issues again. However, the reader should not expect a particular part to be purely philosophical and untainted by practice or for those parts ostensibly dealing with practice to be unsullied by philosophy. As a multidisciplinary handbook, we have endeavored to maintain that ethos across all parts, meaning that the reader, moving sequentially through the book, will, for instance, find a chapter on the relationship of imagination to presence in the context of multimodal surfaces juxtaposed to one dealing with the science of auditory imagery or a chapter on synesthetic art and hallucination abutting another detailing the process of controlling or even excluding the listener’s imagination from auditory imagery. This is quite deliberate and is a demonstration that particular topics within the broad theme of sound and imagination are as common to a variety of disciplines as those disciplines’ writing styles are diverse. Yet there is a more devious method at work here: in a world where universities, politicians, and research funding bodies all implicitly or explicitly work toward the prioritization of certain forms and areas of research, we would rather present a handbook structure that ignores the barriers that arise in response to such short-term, limited, and, yes, unimaginative thinking in order to show that the conditions for new thoughts and ideas and for the synthesis of new knowledge are best nurtured and sustained in the absence of academic siloes. So, our advice to the reader of this handbook is to indeed read sequentially, and, in this, we trust that inspiration will be found.

Volume 2 of the handbook comprises five parts: “Musical Performance,” “Systems and Technologies,” “Psychology,” “Aesthetics,” and “Posthumanism.” The first part takes us into the sphere of musical performance and imagination. The chapters here cover musicking and meaning-making in improvisation, the construction and use of sonic imagery in performance, the role of motor imagery when playing a musical instrument, the emergence of a hidden music facilitated through embodiment and environmental affordance, and the connection between gesture and sound, particularly the imagined sound of the air guitar.

Part 2 of the volume has as its framework sound and imagination in the context of systems and technologies. These are systems and technologies that underpin not only the production of sound and music but also the analysis and description of sound and music, stressing the role of imagination in what is consequently conceived of or interpreted. There are chapters on the profound influence of Ancient Greek imagination on Western tuning systems, the centrality of repetition not only to the emergence of life but also to the experience of music, the compression of musical information as a means to analytical musical knowledge, how the technology and interpretation of bioacoustics imposes a potentially incorrect imagining of the existences of other animal species and our relationships to them, musical notation as an externalization of music, the reliance of sound recording on the imagination, shape cognition and the experience of music, and, finally, a speculative essay on a tool to extract sound imagery for musical purposes.

Part 3 concentrates on the psychology of sound and imagination. The first chapter covers psychological warfare and interrogation practices under the influence of music, tying this to the ethics of marketing, while the next chapter takes as its topic audiovisual media, such as VJ events and gaming, to show how sounds can be used to evoke hallucinatory experiences. The next four chapters deal with the areas of the control of auditory imagination in the context of listening tests for the design of consumer audio products, the use of music in the creation of brand identity, the role of emotions in sound perception, and the part that musical imagery plays in music education and performance rehearsal. The final three chapters in this part have topics ranging from the relevance of autism research to the development of musical ability, the role of musical imagery in music therapy, and musical imagery in an embodied framework.

The penultimate part of the volume comprises four essays dealing with aesthetics and sonic imagination. Topics range over hearing-in—a form of imaginative engagement with music—the viewpoint that music can be seen as a utopian allegory, the affective aesthetic potential of sonic environments, and the aesthetics of imperfection in the context of musical improvisation.
Part 5 positions the theme of sound and imagination within posthumanism. The part begins with a chapter on sonic materialism that brings sound into the posthumanist debate on New Materialism. This is followed by an essay on computer operating system sounds as cybernetic prostheses. The third chapter has as its topic rhythmic feel in popular music and the uses of music technology to extend the human’s natural repertoire of gestures. Afrofuturism is the topic of the next chapter, in which the style’s music is presented as a form of sonic time travel, while the final chapter comprises a posthumanist appraisal of vocality in the context of opera.

Musical Performance

Justin Christensen deals with the bond between improvisation and imagination in artistic experience. Starting with a reassessment in continental philosophy both of how imagination is conceived and can be demonstrated, Christensen observes that the connection between improvisation and imagination has little value in classic aesthetic theories. He then goes on to argue for the value of improvisation as a reflection of perception–action coupling that is central to newer theories that favor embodied approaches to music cognition. In the light of such theories, where perception, action, and imagination are seen as interdependent properties, Christensen proposes a greater recognition of the processes of musicking—including improvisation—to better understand meaning-making and the role of imagination in musical experience.

Clemens Wöllner’s chapter deals with sonic actions in music performance. He argues that musicians construct sonic images in the act of playing that allow them to anticipate sonic actions and to perform without auditory feedback (for instance, when sound is switched off during a performance). The construction of sonic images is discussed in the context of performances on both traditional and controller-driven instruments, and Wöllner shows how a performer’s anticipated sonic actions differ according to the type of instrument. In relation to this, the level of detail of the imagined sound qualities involved in auditory imagery is explored, and Wöllner considers the mappings between gesture and sound that are required for audiences to imagine the sound emerging from the performer’s actions.

In his chapter, Jan Schacher asks the question of what it is to imagine and initiate an action on a musical instrument. For Schacher, the body is the central element of listening and sound perception; thus, the body, in an embodied and enactive sense, becomes the focus for his explication of musicking on both conventional instruments and digital instruments where, in the latter case, bodily schemata are replaced by metaphors and instrumental representations. This last provides a significant topic of enquiry in the chapter and the theme is explored from a number of angles, chief among which is a focus (through the lenses of motor imagery and imagination in music) on relations between inner and outer aspects of our ways and means of listening to and performing music and sound. Ultimately, Schacher identifies a tension underlying digital musical performance brought about by the fracturing of the action–sound bond that is the basis not only for our sound perception of the natural world but also for the world of culturally ingrained musical performance.

John Carvalho’s chapter is about the music that emerges in a skilled engagement with an environment of sound. This music emerges in a piece of music when the embodied skills of a composer, performer, or listener enact affordances that turn up in the environment defined by that piece of music. For Carvalho, the imagination animates these skills and directs their embodied testing of the environment for affordances to be enacted in making music emerge in it. To support this argument, Carvalho turns to an ecology of mind and a taxonomy of listening that account for how composers, performers, and listeners enact the music in a piece of music.

Marc Duby bases his exploration of sound and imagination on James J. Gibson’s affordance concept. Using this concept, Duby studies how musicians benefit from real and imagined actions in their interaction with real (e.g., pianos), virtual (e.g., MIDI controllers), and air instruments (e.g., air guitars [nonexistent instruments]). In each case, Duby explores the connection between gesture and sound and how the instruments afford creativity. This leads to discussions of the range of imaginary possibilities that the instruments afford musicians in the act of performing, composing, and listening, and of how the special case of the air guitar challenges existing theories of embodied cognition.

Systems and Technologies

The second chapter in the handbook by Martin Knakkergaard concentrates on the development of a system designed to constrain the pitch interval in Western music. Western tuning systems, according to Knakkergaard, developed out of the Ancient Greeks’ mystical fascination with the number 4. Thus, the Ancient Greeks’ beliefs and imaginations are very much alive today in a musical system that both regulates and guides compositional imagination, and this remains the case even with the apparent freedom offered music by digitalization.

Janna Saslaw and James Walsh speculate that repetition—as a principle central to the emergence of life—could also be seen as central to the experience of music. Their argument involves discussions about a number of key components involved in the continuous process of developing the human species, such as self-replication, invariance, emergent structure, swarm behavior, homeostatic frames of reference, periodicity, resonance, and entrainment. In relation to this, the music of Sun Ra is used as an exemplification of the evolutionary benefit of music in creating more efficient homeostasis.

David Meredith’s chapter focuses on how the musical knowledge that underpins both music perception and musical imagery can be acquired by compressing musical information. The chapter is based on the notion that the goal of music analysis is to find the best possible explanations for the structures of musical objects, such as individual works or movements as well as extracts from works or collections of works. In opposition to what Meredith considers to be the subjectively grounded evaluation of different analyses of the same music, carried out by music analysts who adopt a humanistic approach, his chapter discusses what he sees as objectively evaluable analysis tasks aimed at finding ways of understanding musical objects that allow us to most effectively carry out the musical tasks.

In a chapter that explores the use within bioacoustics of technology and interpretation of its data in order to assess human acoustic impact on nonhuman species, Mickey Vallee introduces the term “transacoustic community” to illustrate the nefarious and transgressive means those data are put to. Vallee makes the charge that the bioacoustics community hears without listening, having a different imagination of sound to other sound-based researchers. This imagination springs not only from the specific aims of the bioacoustics community but also from the audio technology used that ultimately relies on visualization for its data access; thus, the requirement of a mastery of visual interpretation, rather than a refined aurality, affects our understanding of the relationship between humans and other species.

Henrik Sinding-Larsen presents an analysis of how new tools for the visual description of sound revolutionized the way music was conceived, performed, and disseminated. The Ancient Greeks had previously described pitches and intervals in mathematically precise ways. However, their complex system had few consequences until it was combined with the practical minds of Roman Catholic choirmasters around 1000 ce. Now, melodies became depicted as note-heads on lines with precise pitch meanings and with note names based on octaves. This graphical and conceptual externalization of patterns in sound paved the way for a polyphonic complexity unimaginable in a purely oral/aural tradition. However, this higher complexity also entailed strictly standardized/homogenized scales and less room for improvisation in much of notation-based music. Through the concept of externalization, lessons from the history of musical notation are generalized to other tools of description, and Sinding-Larsen ends with a reflection on what future practices might become imaginable and unimaginable as a result of computer programming.

Bennett Hogg queries the relations that sound recording has commonly been thought to have to memory, in particular mechanistic approaches to both memory and recording that see them as processes that fix things through time. Making sense of memories as they are “laid down,” and as they are “recalled,” involves imagining novel connections between memorized materials and networks of sensory, social, and cultural experience. Imagination, through time, subtly reworks memories, modulating their affect, re-evaluating the significance of particular memories, mythologizing them, even. To understand listening to recordings according to a rather reductive model of memory risks misrepresenting the richness of the cognitive ecosystem in which listening occurs. In looking for a new metaphor to inhabit this ecosystem of memory, imagination, and persistence through time, Hogg proposes metempsychosis, the transmigration of souls, as a more suggestive model.

Rolf Inge Godøy’s chapter is focused on how notions of shape, understood as geometric figures and images stemming from body-motion, metaphors, graphic representation, and so on, can be associated with the production and perception of music. Central to the chapter is the understanding that shape cognition is not only deeply rooted in the human experience of music and in musical imagery as such, but also has the potential to enhance our understanding of music as a phenomenon in general. The chapter also discusses how musical shape cognition, given that it is becoming increasingly feasible with new technology, can contribute to various domains of music-related research and, furthermore, can be highly valuable to practical applications in musical and multimedia artistic creation.

The conceiving of an evocative synthesis engine from our imagining of sound is the substance of Simon Emmerson’s chapter. In it, Emmerson surveys recent neurological experiments in the synthesis of speech and music and focuses his attention on how our imagining of sound might be synthesized at some future date. The purpose of this speculative chapter is not to map out the design and interface of such a system but rather to conceive of what the act of imagining sound is and how the tool to extract such sound imagery might be used both for musical purposes and to externalize these formerly private sounds.

Psychology

Luke Windsor focuses on enforced listening to music in detention camps and explores the use of music in detention and interrogation while pointing to the creation of ambiguity and uncertainty as a central effect. Windsor engages with several cases of psychological warfare during previous wars and references interrogation practices described by the CIA. The exploration of these cases, where music is used as a sound weapon, leads to a broader discussion of the application of music to influence behavior and the ethics of music application in, for instance, marketing.

The representation of hallucinations within audiovisual media forms the subject of Jonathan Weinel’s chapter. Weinel builds his discussion around the concept of augmented unreality, and he provides examples from films, VJ performances, digital games, and other audiovisual media to show how sounds are used to form hallucinations. Ultimately, Weinel points to a set of structural norms that defines psychedelic hallucinations and the hypothesis that, with the improvement of digital technologies, the boundaries between external reality and synthetic unreality may gradually dissolve.

Søren Bech and Jon Francombe’s chapter provides an illustration of how sensory analysis is undertaken in the audio industry. It demonstrates how the industry attempts to quantify the listener’s imagination—which is taken to include a range of modifiers of the listener’s auditory experience including mood, expectation, and previous experience—in order to ensure that the end result, the listener’s auditory experience or impression after the audio transmission chain, matches the intended experience as closely as possible. The example provided to illustrate this is of sensory analysis used for qualitative and quantitative evaluation of the listening experience in a personal sound zone. A perceptual model was developed to reliably predict the listener’s sense of distraction (due to interfering audio) from the experience of listening to audio intended for a particular zone.

Hauke Egermann explores the influence of music on how consumers imagine characteristics of a brand. The chapter deals with several psychological mechanisms to outline the associative and emotional potential of music and illustrates how music aids in establishing brand recognition and recall in consumers. Egermann elaborates on how music can create brand attention and positive-affective responses in consumers and can affect the cognitive meaning of a brand image. In summation, he argues for a brand-music communication model that describes three different functions of music in the creation of brand identity—brand salience, cognitive meaning, and emotional meaning.

The focus of Erkin Asutay and Daniel Västfjäll’s chapter is the relationship between sound and emotion. Evidence from behavioral and neuroimaging studies is presented that documents how sound can evoke emotions and how emotional processes affect sound perception. Asutay and Västfjäll view the auditory system as an adaptive network that governs how auditory stimuli influence emotional reactions and how the affective significance of sound influences auditory attention. This leads to the conclusion that affective experience is integral to auditory perception.

Andrea Halpern and Katie Overy argue that auditory imagery can be used actively as a tool in various education and rehearsal sessions. Building on Nelly Ben-Or’s techniques of mental representation for the concert pianist and the pedagogical approaches of Zoltán Kodály and Edward Gordon, Halpern and Overy suggest that conscious and deliberate use of auditory imagery should be exploited more in music education and could be used with profound benefits for musicians as a rehearsal strategy. This leads to a call for further empirical investigations of how voluntary auditory imagery might be best used as a training method for both professional musicians and in classroom settings.

Adam Ockelford draws attention to that section of the population that does not normally engage in everyday listening; those for whom the acoustic properties of sound are prioritized over the function of sound. In particular, Ockelford points to the listening of autistic children for whom the perceptual qualities of sound exert an especial fascination at the expense of the meaning that everyday listening would normally give to sound. Through research that supports his contention that the development of musical abilities in children precedes that of language skills, Ockelford makes the claim that the aural imagination of those on the autistic spectrum is one that processes all sound, even speech, for its musical structural properties and thus it is music that is the autistic person’s gateway to communication and empathy.

Lars Ole Bonde considers musical imagery in the context of music therapy sessions from the tradition of the Bonny Method of guided imagery and music, which provides well-documented examples of such imagery. While Bonde mainly focuses on listening in clinical settings, he argues that image listening should be seen as a health resource in everyday listening settings. Taking in perspectives from neuroaffective theory, Bonde analyzes clinical material and evidence from the analysis of EEG data, and he shows how music therapy theory—as a specific tradition within musicology—can contribute to research on music listening through a greater understanding of the multimodal imagery of such listening.

Musical imagery as a multimodal experience is also the topic of Freya Bailes’s chapter, where embodied cognition is used as a framework to argue this. Existing empirical studies of musical imagery are reviewed, and Bailes points to future directions for the study of musical imagery as an embodied phenomenon. Arguing that musical imagery can never be fully disembodied, Bailes moves beyond the idea of auditory imagery as merely a simulation of auditory experience by “the mind’s ear.” Instead she outlines how the imagining of sounds and music is always connected to sensory-motor processing.

Aesthetics

Theodore Gracyk takes issue with the claim that imaginative engagement is a prerequisite for the appreciation of music; that the experience of expressiveness in music derives from an imaginative enrichment that allows music to be heard as a sequence of motion and gestures in sound or that the expressive interpretation of music is guided by imaginative description. While not completely rejecting an imaginative response to music, Gracyk instead opts for a form of imaginative engagement with music described as hearing-in. While not all music demands such engagement, hearing-in is not a trigger for imaginative imagery but rather a musical prop that invites the listener to attend to music’s animation in, for example, the form of musical causality and anticipation.

Bryan Parkhurst uses contemporary analytic “normativist” aesthetics as a lens through which to view Leftist/Marxian “normative” aesthetics of music appreciation. In order to do this, Parkhurst situates the key theses of Ernst Bloch’s theory of utopian musical listening within the framework of Kendall Walton’s theories of musical fictionality and emotionality. The aim of this task is to make Bloch’s fundamental position perspicuous enough that it can be assessed and evaluated. Parkhurst concludes that Bloch’s contention that music should be viewed as a utopian allegory, and that the distinguished office of (Western classical) music is to contribute to the political project of the imagining of a better world (a “regnum humanum”), faces difficult objections.

An exploration of the affective dimension of our sonic environment forms the topic of Ulrik Schmidt’s chapter. Schmidt poses the question, What does it mean to be affected by the sonic environment as environment? The answer to this question involves a conceptual distinction between atmosphere, ambience, and ecology. Schmidt argues that affect and imagination are key components in the environmental production of presence, and he provides examples of the aesthetic potentials of environments and explores how an environment can “perform” in different ways to affect us as environment.

In his chapter on musical improvisation, Andy Hamilton deals with the cultural aspects and historical practices of improvisation. The chapter sets out to explore the artistic status of improvised music and this involves a discussion of the connection between imagination and art, and the contrast between composition and improvisation. These discussions provide a theoretical framework to outline and defend an aesthetics of imperfection as a contrast to an aesthetics of perfection. Finally, the artistic value of jazz as an improvised art form is discussed, and Hamilton ponders whether jazz should be described as classical or art music.


Posthumanism

Salomé Voegelin’s chapter contributes to current ideas on materiality, reality, objectivity, and subjectivity as they are articulated in recent texts on New Materialism. Her chapter makes use of the writing of Quentin Meillassoux and his posthumanist theorizing, and it aims to contribute to the discussion through a focus on sound, as sound is seen to support the reimagination of material relations and processes. In order to qualify and substantiate her notion of sonic materialism, Voegelin includes narrow listenings to three sound art works, focusing “on the inexhaustible nature of sound that exists permanently in an expanded and formless now that I inhabit in a present that continues before and after me.”

Daniël Ploeger investigates the designed sounds of operating systems (particularly those of Apple and Microsoft computers and devices) from a cultural critical perspective, arguing that such sounds are cybernetic prostheses enhancing our capabilities. In a chapter that takes in initial conceptions of the cyborg, which are overcast by the cyborg’s roots in the military-industrial complex, to the subversion and use of operating system sounds for creative purposes, Ploeger discusses the use and subsequent development of such sounds—from early mainframe computers’ inherent noises to the designed sounds of today’s computing devices—and shows how they underpin the imagining of computers as extensions of the human body. Ultimately, for Ploeger, the recent design of operating system sounds serves to propagate pre-existing ideological concepts of the cyborg as evinced by our now technologically prosthetisized bodies.

Anne Danielsen’s chapter focuses mainly on the particular rhythmic feels that have characterized many popular music styles since the 1980s and how these are produced through the manipulation of sound samples and the timing of rhythm tracks. Initially, Danielsen evaluates the formation of these rhythmic feels from two perspectives, an internal and an external, but then goes on to discuss how they constitute a challenge to previous popular music forms while, at the same time, offering new opportunities for human imagination and musical creativity. The chapter uncovers transformations across several styles and discusses whether the technology at hand can be seen as an extension of the human, creating musics and causing gestural movements that go beyond humankind’s “natural” repertoire.

A musical imagining of the future and an exposition of a challenge to the normative historical discourse are the subjects of Erik Steinskog’s chapter on Afrofuturism. These topics are dealt with through a discussion of “blackness” and the theoretical discourse that addresses the musical style and polemical and political stance of afrofuturist musicians such as Sun Ra and others following in his path. Steinskog suggests that afrofuturist music is a form of sonic time travel that intertwines the modalities of time represented by notions of past, present, and future, his argument being that reimaginations, reinterpretations, and revisions of a normative past are represented in the technology and music of the black future.

From a background that critically investigates conceptualizations and understandings of the relations and dialectics between the inner and the outer voice and the discursive implications of the posthumanist appraisal of vocality, Jason D’Aoust examines the “operatic voice,” or rather the vocality of the opera—especially as it is practiced and understood at the present time. From a philosophically informed perspective, the chapter carries out studies and analyses of artistic works from different genres, comprising opera, literature, and film, understanding opera as a cultural work of exclusion and inclusion and showing how its practitioners are now deconstructing the canon of opera.

Reference

Schwitters, K. 1932. URSONATE. http://www.costis.org/x/schwitters/ursonate.htm. Accessed June 15, 2018.

Part I
MUSICAL PERFORMANCE

Chapter 1

Improvisation: An Ideal Display of Embodied Imagination
Justin Christensen

Introduction

Imagination has often been considered to play a major role in perception, in the production and appreciation of aesthetic objects, in simulation, and in fanciful creative thought. Phenomenologists such as Husserl have claimed that imagination should not be thought of in terms of images or description, but rather as a means of structuring consciousness and giving meaning to phenomena. For Merleau-Ponty, imagination brings about a perception that “arouses the expectation of more than it contains, and this elementary perception is therefore already charged with a meaning” (Merleau-Ponty 2002, 4, italics in original). More recently, Varela and colleagues have stated, “cognition is not the representation of a pregiven world by a pregiven mind but is rather the enactment of a world and a mind on the basis of a history of the variety of actions that a being in the world performs” (Varela et al. 1992, 9). Simply said, cognition is not independent of the world, but instead functions as a means to guide action and perception. For phenomenologists and for many empiricists, imagination acts to bind cognition together with action and perception, and can also be seen to govern our perceptions, shaping and filtering them into meaningful experiences. For instance, Merleau-Ponty has supported linking together imagination, action, and perception, stating,

our waking relations with objects and others especially have an oneiric character as a matter of principle: others are present to us in the way that dreams are, the way myths are, and this is enough to question the cleavage between the real and the imaginary. (1970, 48)

This viewpoint is also shared by a number of embodied cognitive theorists, such as Hawkins and Blakeslee: As strange as it sounds, when your own behavior is involved, your predictions not only precede sensation, they determine sensation. Thinking of going to the next pattern in a sequence causes a cascading prediction of what you should experience next. As the cascading prediction unfolds, it generates the motor commands necessary to fulfill the prediction. Thinking, predicting, and doing are all part of the same unfolding of sequences moving down the cortical hierarchy. (2004, 158)

This direct connection of imagination to action suggests that embodiment plays an important role in how we experience our predictive simulations, our fanciful creative thoughts, and our perceptions. Supporting this, Vittorio Gallese considers simulation to be embodied, “because it uses a preexisting body-model in the brain and therefore involves a nonpropositional form of self-representation that also allows [one] to experience what others are experiencing” (Gallese, quoted in Metzinger 2009, 177). If, as Hawkins and Blakeslee argue, imagination determines sensation and shares the same neural activations as sensations and actions, then the distance between sensation, action, and simulation in a dynamic framework becomes quite negligible (Decety and Grèzes 2006). Furthermore, neural imaging studies have suggested that when individuals perceive the actions and the emotions produced by others, they use the same neural mechanisms as when they produce the actions and the emotions themselves [even though] there is no complete overlap between self- and others representations. This would lead to confusion and chaotic social interaction. (12)

Beyond this support for a common-coding of perception and action, there is evidence of perception–action coupling in infant development (Johnson and Johnson 2000), in childhood development (Getchell 2007), and in sports activities (Ranganathan and Carlton 2007). Attempting to move beyond common-coding, Maes and colleagues support a more radical embodied approach to explain action-perception coupling in music listening and performance. They argue, “sensory-motor association learning can be considered a central mechanism underlying the development of internal models” (Maes et al. 2014, 2). Similarly, they point out, “the ability to predict the auditory consequences of one’s actions, which is one of the core mechanisms of action-based effects on perception, depends on previous acquired sensory-motor associations” (2). Alongside this, they “define the concepts of temporal contiguity and probabilistic contingency as two [of the] main principles underlying associative learning processes” (2). Furthermore, they consider that “musical instrument playing [is] a special but highly illustrative case of sensory-motor association learning” (2). Subsequently, Maes and colleagues follow a dynamic systems approach to examining embodied music cognition, in order to incorporate social interaction, introspection, and expressivity alongside sensory-motor coupling in their meta-analytic study.

improvisation: embodied imagination   17 Similar to these views, I will argue that this ability to imagine and simulate the actions and emotions of others plays a major role in our reception of both written music and improvisational practice (see Wöllner, this volume, chapter  2, on the anticipation of sound in performance). I fully acknowledge that this is only part of the picture, as ­language, with its role of describing and categorizing, also plays a large role in filtering and shaping our imagination. To attempt a more complete picture, I propose that imagination is made up of a dynamic collaboration between nonpropositional (embodied) and propositional (language) forms of knowledge that construct our aesthetic experiences. Bowman has remarked, “When we hear a musical performance, we do not just ‘think,’ nor do we just ‘hear’: we participate with our whole bodies; we construct and enact it” (Bowman 2004, 47; also seen in Borgo 2005, 44). This combined approach of  the nonpropositional and propositional also fits well with both Heidegger’s and Gadamer’s theories on aesthetics. For them, art has a great impact on our experiences in the world, through presenting us with mutually dependent disclosures and “hiddennesses.” These disclosures not only disclose themselves, but they also reveal the presence of the hidden, with these revelations drawing us further into the aesthetic experience. While some “hiddennesses” are aspects of experience that we lack the ability to conceptualize through language, others are aspects of experience that just have not as of yet reached the center of attention to be conceptualized. This phenomenological perspective proposes that the separation of the nonpropositional from the propositional forms of thought is achieved through having prereflective and reflective forms of consciousness, validating that imagination should be viewed from multiple levels. As a result, I will argue that a dynamic and multileveled perspective of imagination is necessary for exploring our musical experiences. Furthermore, I propose that only focusing on the reflective, verbally reportable aspect of consciousness impoverishes our understanding of artistic emotions. Many such as Daniel Dennett (1991) consider disclosure (the ability to give verbal description) necessary before an experience can be considered a valid conscious experience. This ignoring of the embodied aspects of the musical experience has also permeated rationalist and positivist views on aesthetics. DeNora has stated, listening is too often de-historicised in a way that imposes the model of the (historically specific) silent and respectful listener as a given. Within this assumption, the body of the listener is excised. And yet, such listening involves a high degree of bodily discipline.  (2003, 84)

Borgo (2005) has pointed out the inherent tension between an embodied take on music reception and a more traditional aesthetic view, which purports that one should disinterestedly examine an art object as something that is autonomous and fully separate from oneself. Rationalist and positivist aesthetic viewpoints have also had difficulty in making aesthetic judgments on musical experience, as the experience is ephemeral and thus the only art object that remains to be judged is the score. This largely results from the fact that representation has been considered vital for something to be recognized as

a fine art. For instance, owing to music’s ephemerality and lack of disclosed representation, Kant felt compelled to regard music generally as agreeable sensation rather than a fine art (with texted vocal music as an exception to this norm). Supporting this, Kant has stated that music, as he hears it, provides “nothing but sensation without concepts, so that unlike poetry it leaves us with nothing to meditate about” (Kant 2007, §328). This difficulty is exacerbated for improvisation, which lacks even a score to judge as an art object. I propose that we need to get rid of the notion of reified art objects, and that we instead need to re-examine imagination and improvisation, both through our past interpretations of their usefulness and through the context of more current embodied cognitive approaches, in order to give imagination and improvisation the important recognition they deserve as part of artistic experience.

The Shared Double Histories of Imagination and Improvisation

A historical study of imagination often begins by examining Plato’s suspicion of imagination seen in the Analogy of the Divided Line in the Republic (Plato 1992, 509c), where imagination is not seen as an effective means of gaining knowledge, but is instead the lowest form of cognition, as it only grasps at objects through shadows and reflections. For Plato, the obtaining of intellect and knowledge is achieved by ignoring one’s unstable imagination, by transcending one’s unreliable senses, and by aiming toward gaining apodictic knowledge about the immutable and universal essences that are hidden behind the objects in the world. Imagination for Plato was an imitation of the sensual, which he considered to be an imitation of the ideal, and therefore imagination for him became an imitation of an imitation, a copy of a copy. As a result, it is no surprise that Plato thought, “an imitator has no worthwhile knowledge of the things he imitates, that imitation is a kind of game and not something to be taken seriously, and that all the tragic poets . . . are as imitative as they could possibly be” (Plato 1997, 1206, 602b). However, even though Plato criticized imagination and the creative arts, he was awed by the possibility that artists had a direct connection to a higher power or God. For a poet is an airy thing . . . he is not able to make poetry until he becomes inspired and goes out of his mind and his intellect is no longer in him. As long as a human being has his intellect in his possession he will always lack the power to make poetry or sing prophecy. (Plato 1997, 942, 534b–c)

For me, this simultaneous disparagement and awe of artists is an unresolved dichotomy in Plato’s writing. On the one hand, imitation and creative thought are reprehensibly far from the transcendental “forms,” while, on the other hand, artistic inspiration in its excess can form a direct link to the transcendental. Even with this dichotomy, Plato’s view of creative imagination has had an enduring influence on the arts, such as when

Shelley, in A Defence of Poetry, discussed the artist resonating to their internal and external influences like an Aeolian lyre, drawing on divine effluence (Shelley [1840] 2010). Similarly, Samuel Taylor Coleridge gave a Platonic description of imagination when he stated that imagination is “a repetition in the finite mind of the eternal act of creation in the infinite I am” (Coleridge 1984, 304). Johnson has nicely summed up Plato’s view on creative imagination in the Republic by stating, “But imagination of this sort is not a rational faculty; rather, it is the result of a kind of demonic possession in which the poet loses rational control” (2013, 143, italics in original). Thus, the Platonic creative imagination in its reaching toward the divine necessarily walks a fine line between genius and insanity. Moreover, if the Platonic creative imagination lacks this transcendental genius, then it fails society. The other side of imagination, which has been considered as a mediator between the senses and thought, has often been seen to begin with Aristotle, for whom imagination is different from either perceiving or discursive thinking, though it is not found without sensation, or judgment without it. That this activity is not the same kind of thinking as judgment is obvious. For imagination lies within our power whenever we wish. (Aristotle 2004, bk. 3, chap. 3, 427b)

Hume’s view on imagination is a variation on this, admitting no categorical difference between perceiving and imagining. [T]hose perceptions, which enter with most force and violence, we may name impressions . . . By ideas I mean the faint images of these in thinking and reasoning . . . The first circumstance, that strikes my eye, is the great resemblance betwixt our impressions and ideas in every other particular, except their degree of force and vivacity. ([1739] 1888, 1)

Hobbes was more extreme in his viewpoint, stating, “there is no conception in a man’s mind, which hath not at first, totally, or by parts, been begotten upon the organs of sense. The rest are derived from that original” ([1651] 1996, 9). For Hobbes, we are unable to  imagine anything that is completely free from the inputs of our sense apparatus. Accordingly, Aristotelian imagination has a strong connection to an empirical philosophical viewpoint and is shared by empiricists such as Hobbes, Berkeley, Locke, and Hume as well as others. The Platonic and Aristotelian viewpoints of imagination have an inherent tension between them, congruent with the Cartesian mind–body problem. Since Descartes (1985) presumed that the mind and the soul are more or less the same thing, the Platonic view of imagination conforms well to a Cartesian substance of the mind in that it can be seen to draw inspiration from the transcendental. Similarly, the Aristotelian view of imagination conforms to a Cartesian substance of the body immanent in the world, as this viewpoint explores the connections between sense experience and thought. Related to this, Mary Warnock (1976) has asked how it is that imagination can both facilitate everyday perception and be a source of novelty. For me, the only way to resolve this

dual nature of imagination is to deny the unfounded divisions that have been made between the mind and the body, and between the divinely inspired and the routinely experienced.

Tensions between the Sacred and the Divine

Throughout the history of Christian church music, there have existed competing concerns regarding the amount of polyphony and melismatic ornamentation (singing multiple notes per syllable of text) allowed during the mass setting. On the one hand, improvisatory practice has meant innovation. One example of this need for innovation in musical practice is that early Christian church music began as a monophonic setting. One problem with monophony is that if one wanted to have boys and men sing together, there were either very few available pitches that could be sung by all of the voices or one had to expand on the idea of monophony. These contrasting physical limitations could very possibly have encouraged the simple innovation of parallel motion homophony by allowing the boys to sing an octave above the men. Improvisation and innovation thus have relied on coevolving with the constraints and affordances that a situation provides. “[F]ar from being the antithesis of creativity, constraints on thinking are what make it possible . . . Constraints map out a territory of structural possibilities which can then be explored, and perhaps transformed to give another one” (Boden 2004, 95). Within these constraints, affordances are “action possibilities” present in an environment that are dependent on the actant’s ability to make use of them. A good example of a developing dynamic interaction with the affordances of an environment can be seen in how infants learn to move. Thelen and Smith (1996) proposed a dynamic systems approach for the development of movement patterns in infants, after they found that each of their infant subjects faced unique challenges in response to their individual body dimensions, energy levels, and changing contexts. In overcoming these challenges, these infants used unique strategies, seen as emergent phenomena arising from decentralized and local interactions, which Thelen argued would be very difficult to defend as being innate (1995). Following the ideas of Thelen and Smith, I would argue that the body (not only as a form of constraint) takes part in dynamic interactions with the environment, spontaneously guiding the individual to find movements, perceptions, and thoughts that come more naturally for them when attempting to achieve meaningful experiences and goals. This type of improvising as a practical engagement with one’s environment could be seen to have connections to an Aristotelian view on imagination, where one synthesizes a meaning from perceptions, goals, and knowledge. On the other hand, improvisation can also be seen as having some characteristics that might correspond to a Platonic version of imagination. Church officials saw that music had power to inspire divine thoughts. Reflecting this, Giannozzo Manetti stated, all the places of the Temple resounded with the sounds of harmonious symphonies as well as the concords of diverse instruments, so that it seemed not without reason that the angels and the sounds and singing of divine paradise had been sent from

heaven to us on earth to insinuate in our ears a certain incredible divine sweetness; wherefore at that moment I was so possessed by ecstasy that I seemed to enjoy the life of the Blessed here on earth. (Manetti, quoted in Dufay 1966, xxvii)

Manetti’s statement suggests that an individual might reach an altered state of consciousness by becoming overwhelmed by the music as part of a mystical religious experience. Rouget (1985) has collected extensive ethnographic data on religious experiences and music, and has presented the idea that ergotropic trance in many cases may arise from sensory overstimulation and noise as part of a mystical experience. I would argue that the improvised complex polyphony and melismatic singing in the mass could contribute to this ecstatic feeling brought on by overstimulating the listeners. Furthering this view is the evidence that melismatic ornamentation and polyphony were very often among the musical elements that were specified when church officials repeatedly attempted to restore sacredness to the mass setting (Fellerer and Hadas 1953). “Improvisation seems to have been a central aspect of musical practice in the first centuries of the Christian church” (Sancho-Velazquez 2001, 9). Later, in the ninth century, a story spread that all of the music that Gregory I wrote down for the liturgy of the service had been whispered to him by God. Sancho-Velazquez (2001) suggests the spreading of this story was a political move, shifting the divine creative inspiration from the process of improvisation to the process of composition so that the church had a reason to unify the liturgical service across the empire. Still, Gregory’s scores were generally only regarded as outlines over which the singers should follow their inspiration and the goals of the liturgy of that week (Ferand 1961). Evidence from twelfth-century writings from a monastery in Spain reveals that singers prolonged notes of the written chant to allow an upper voice to sing melismatically over the top: “Clearly it was a style that could have originated, and probably did, in improvisation” (Grout and Palisca 1988, 103). Also, a fifteenth-century treatise by Tinctoris supports the notion of improvised polyphony when he divided polyphony into being either mente (improvised) or scripto (written out). Furthermore, in attempting to recover the solemnity of the service, the Synod of Schwerin requested in 1492 that melismas and polyphony be reduced to increase the intelligibility of the text, especially during the psalmody (Fellerer and Hadas 1953, 578). Similarly, in 1503 the Council of Basel requested that the Credo not be “mutilated” (Fellerer and Hadas 1953). As a culmination of the tensions between divine inspiration and sacredness, the Council of Trent proposed Canon 8 during the Counter-Reformation, which stated: the sacred mysteries should be celebrated with utmost reverence, with both deepest feeling toward God alone, and with external worship that is truly suitable and becoming, so that others may be filled with devotion and called to religion . . . But the entire manner of singing in musical modes should be calculated, not to afford vain delight to the ear, but so that the words may be comprehensible to all. (Canon 8, quoted and translated in Monson 2002, 9)

These requests for suitable solemnity, reverence, and comprehensibility of text seem to suggest that sacredness be given a priority over divine inspiration. I find this tension

22   justin christensen between sacredness and divine inspiration to have similar aspects to the tensions that we have earlier seen involved with the Platonic and Aristotelian imaginations. Improvisation as divine inspiration tied to altered states of consciousness and mystical religious experiences maps well onto Plato’s need for divine inspiration in creative thought for it to be worthwhile. Similarly, improvisation as an innovative practice with constraints and affordances maps well onto the Aristotelian imagination that links together cognition with perception and action. Furthering the links between divinely inspired improvisation and Platonic imagination, there have also been accusations of demon possession related to ecstatic experiences in the church (Edwards 1742), as people feared that they did not know where these divine inspirations came from, whether from God or from demons (Edwards 1746). Outside of the church, there have also been descriptions of ecstatic responses to improvised performances of secular music. In the sixteenth century, Jacques Descartes de Ventemille described an improvised (free fantasia) performance of Francesco da Milano, stating, he continued with such ravishing skill that little by little, making the strings ­languish under his fingers in his sublime way, he transported all those who were listening into so pleasurable a melancholy that . . . they remained deprived of all senses save that of hearing. [He left] as much astonishment in each of us as if we had been ­elevated by an ecstatic transport of some divine frenzy. (Descartes de Ventemille, quoted in Weiss and Taruskin 2007, 134)

While improvisation’s role was not only to transport people into ecstatic frenzy, I would argue that it, showing a similar dual nature to imagination, would have been considered to act in both the transcendental and material realms. As a result, I will argue in similar fashion against any needless divisions between improvisation as a powerful and sublime experience and improvisation as innovatory practice. I see these two sides of improvisation as necessary to one another.

Kant’s Reconciliation of Imaginations

Kant attempted to reconcile the tension between rationalism and empiricism through his Copernican shift. One secondary effect of this attempt at reconciliation was that Kant gave the mind an active role in making meaning from sensual experiences by way of the imagination. This is a big step forward from Locke’s passive perception, where perceptions and ideas are only passively received by the mind (Locke 1803, II:xii.1, xxii.2, and xxxiii.3 show examples of this way of thinking). Unfortunately, with Kant’s advancement to active perception, he risked everyone having a completely individual and solipsistic synthesis of sensory information, where no one would be able to relate to anyone else in how they made meaning out of their perceptions. This was a danger of

improvisation: embodied imagination   23 which he was aware, and so, as a result, Kant introduced the concept of synthetic a priori knowledge (a  concept is described by Kant as “something that is universal and that serves as a rule” [Kant 1998, A106]). Kant’s appeal for a synthetic a priori is an appeal for universally generalizable experience. Basically, if we exist as minds (in a Cartesian sense) the only possibility for us to share experience with one another and to communicate with one another is either to have an apparatus for connecting to the world that is universally similar or to passively receive the information from the world. If we have a mind–body separation and we want to escape the need for only passively accepting our perceptions of the world, then we would need a chip in our brains that could bridge the gap, translating experience from the body to the mind. Kant thus considered subjective experience not to be fully subjective, and gave it the name transcendental subjectivity, as it gave individuals some access to an apodictic experience. Through this transcendental subjectivity, there are some universal ways to experience reality, which builds a universal (although invisible) foundation on which we can intelligibly communicate to one another and experience things similarly. I find that an oversimplified but useful comparison to these hidden (transcendental subjectivity) universals is with the hidden rules of universal generative grammar that Chomsky has proposed are hardwired into the brain to allow us to learn languages quickly and efficiently. Nevertheless, I would argue that neither of these viewpoints has panned out. Chomskyan generative grammar has rules that are very commonly used around the world but has failed to find universality (Everett 2005), while Kant’s s­ ynthetic a priori has failed to adequately bridge the gap between the substances of the mind and body. Instead, as Johnson points out, the rigid separation of understanding from sensation and imagination relegates the latter to second-class status as falling outside the realm of knowledge. As a result, judgments of taste can never, for Kant, be determinative or constitutive of experience. Neither can they be “cognitive,” for he regards the cognitive as the conceptual, and there is no concept or rule guiding reflective judgment.  (2013, 167)

In a way, Kant may have succeeded in bringing together the different types of imagination. However, in doing so, he relegated the Platonic imagination to the same place on the divided line as Plato did, to the very lowest rank. Furthermore, to accomplish this he had to give great power to transcendental subjectivity, thus partially muting the active participatory role of the mind, and obligating our experiences and understandings to be normalized through their universal underpinnings (Steeves 2004).

Shifting Views on Imagination and Improvisation

Improvisation reached its height in classical music, becoming ubiquitous during the baroque period, with performers very often filling out their lines by following the

harmonic motion indicated by a written-out figured bass. C. P. E. Bach described improvisation in his time by stating: “Variation when passages are repeated is indispensable today. The public demands that practically every idea be constantly altered” (Nettl et al., n.d.). There were also many opportunities for keyboard players to freely improvise. Sometimes they had strict forms such as fugues and other contrapuntal structures that they could improvise within, and, at other times, they could freely improvise, not even necessarily following a predetermined musical form, but developing their ideas as their imagination led them (e.g., fantasias or ricercare) (Bailey 1993). However, once the figured bass had been removed from the music and musical forms began to change during the Classical period, the divide between improvisation and composition became much clearer. One of the reasons for this removal of the figured bass could be that composers were attempting to wrest creative power away from performers. Verdi epitomized this attitude when he stated: No: I want only one single creation, and I shall be quite satisfied if [the singers] perform simply and exactly what [the composer] has written. The trouble is that they do not confine themselves to what he has written. I deny that either singers or conductors can “create” or work creatively. (Verdi, quoted in Sancho-Velazquez 2001, 5)

In the spirit of the Enlightenment near the beginning of the eighteenth century, Fénelon’s The Adventures of Telemachus presented imagination as something childish that could be exploited when teaching children, but that ought to be eradicated by the time the students reached adulthood. Fénelon followed the views of Plato, and thus imagination was yet again considered inferior to reason (Lyons 2005). Romanticism is a period that we, in retrospect, have decided celebrated creative imagination. However, I would argue that it instead continues to celebrate divinely inspired imagination and attempts to guard us against more fanciful imagination, while also continuing to separate the divinely inspired from the routinely experienced. Rousseau, an important figure for the growth of Romanticism, also wanted to suppress imagination in favor of a Lockean passively receptive “sensibility.” Rousseau’s book Emile; or, On Education depicts a teacher attempting to protect Emile from his imagination, killing his imagination through habit. Through this book, Rousseau argued that people should strongly avoid creatively modifying or distorting the information of the senses (1979). This viewpoint matches well with Kant’s aesthetics on reception. In his Critique of Judgement, Kant stated, “judgements of taste” must contain four characteristic features: (1) disinterestedness, where pleasure is derived from judging something as beautiful, and not the inverse of judging something as beautiful as a result of finding it pleasurable; (2) universality of this judgment; (3) necessity of this judgment, where the beauty is intrinsic to the object itself; and (4) “purposiveness without purpose” of the object ([1790] 2007).

Although Rousseau disapproved of imagination during the reception of an aesthetic object, and Kant argued that individuals needed to be guarded against the fanciful aspects of imagination (Kant 2007, §420), they were both greatly in favor of the imagination of the naturally gifted genius during the creative process (following a view similar to Plato’s view on creative imagination). I think the premise that creative imagination was an important method for deciding whether a musician was naturally gifted or not had a major impact on allowing improvisation to continue into the second half of the nineteenth century against the growing tide of positivism. This viewpoint of improvisation, tied to a naturally gifted creativity, was enhanced by the tradition of eminent composers calling attention to younger composers based on their improvisational abilities. Clementi recognized Mozart for his improvising skills, and Mozart in turn recognized Beethoven. As a result, it is not surprising that Liszt contacted Beethoven in an attempt to gain recognition for his own improvising skills (Sancho-Velazquez 2001). One of the problems for improvisation during the nineteenth century was that, through its ephemerality, it could not be considered an autonomous art object. As a result, improvisation through the nineteenth century became a vehicle that pointed to the talent rather than the creativity of the performer, and thus often gave rise to virtuosity and showmanship in improvisation. Another problem for improvisation during this period was that music theorists wanted to find objective universals in music that would match the truths that they were searching for in the sciences. Christensen has stated that the nineteenth-century music theorist Fétis saw the task of his self-proclaimed science of the “philosophie de la musique” to be to show how tonality was the dialectical synthesis of theory and history; music history was the actualization of tonality, while music theory could be seen as its “objectification.” (1996, 56)

More recently, Kivy repeatedly has stated in one way or another through his book on musical genius, “A musical genius is one who produces supremely valuable musical works” (2001, 178). Thus, genius is not in the process of musicking (focusing on music as an act rather than music as an object), but rather in the production of objectifiable ­aesthetic art objects. Similarly, through the twentieth century, improvisation has been discredited by major composers such as Boulez and Berio, and by influential music theorists such as Adorno (Peters 2009). When musicology has become a search for a canon of masterworks, then improvisation has had a difficult time staying relevant in this changing landscape. Concurrently in the early twentieth century, due to the influence of behaviorism, imagination was relegated to “the outer darkness of intellectual irrelevance” (Morley 2005, 117). While improvisation slowly departed from classical music, it quickly spread into other musical styles. Jazz improvisation, which highlights spontaneous and unplanned performance practice, gained great popularity in the early twentieth century, especially in America, well in advance of any psychological or philosophical theories that highly value the role of spontaneous imagination in cognition. However, improvisation in jazz

was not without difficulty in its early days. Gushee quotes Colles in the 1927 Groves Dictionary of Music and Musicians stating that improvisation “is therefore the primitive act of music-making, existing from the moment that the untutored individual obeys the impulse to relieve his feelings by bursting into song” (Colles 1927). Alongside this, Gushee quotes a questioner in Jacobs Orchestra Monthly from 1912 asking about faking (improvisation) even though it is “not playing correctly,” and later mentions that improvisation, which is now considered “as an act of creative imagination, in the past it was sometimes considered anything but, something that inadequately trained musicians did from rote or force of habit or necessity” (1927, 265–266). As jazz improvisation grew beyond these early stages, the act of recording may have helped it to flourish, as recording made it possible for an improvisation to be fixed into an autonomous aesthetic art object (Solis 2009). Subsequently, this ability to reify improvisation into fixed art objects has been a point of contention for improvisers. Some, like Derek Bailey (1993), are greatly against this reification, while others, such as Gabriel Solis (2009), see positive value in it.

Imagination in Embodied Cognitive Science

Earlier, I presented the tradeoffs that Kant felt were necessary (overvaluing normalized experiences while devaluing imagination) to allow individuals to have a participatory role in perceiving the world around them. I have also appealed several times throughout this chapter for an embodied perspective as a better means of participating in the world around us.

Imagination Eliminates the Need for a Transcendental Subjectivity

In this section, I will present some empirical theories that provide support for an embodied perspective when examining aesthetic experience and experience in general. In opposition to Kant’s transcendental subjectivity, I would like to argue that, through the brain’s capabilities for predictive simulation alongside our embodied situatedness in a shared environment, we do not need to have a transcendental subjectivity in order to share experiences. Simulation, a form of imagination, can either overtly or covertly emulate the behaviors of the self or of others. It “can be conceived as a conscious reactivation of previously executed actions stored in memory [where one] may replay her own past experience in order to extract from it pleasurable, motivational or strictly informational properties” (Decety and Grèzes 2006, 5). With this, considerable evidence has been presented that shows strong “similarities in the neural circuits activated

during the generation, imagination, as well as observation of one’s own and other’s behavior” (4). Egermann and colleagues have stated that recent theories of auditory statistical learning, also supported by evidence reported by Huron (2006) and Pearce and Wiggins (2006), propose that melodic expectations do not rely on underlying patterns of universal bottom-up principles but have merely been formed through exposure to syntactic relationships within musical structures of a given culture (Abdallah and Plumbley, 2009; Pearce and Wiggins, 2006). Furthermore, computational simulations of this learning process have yielded robust predictions of perceptual expectations, outperforming other rule-based models like Narmour’s. (Egermann et al. 2013, 2)
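The kind of exposure-based learning described in this passage can be made concrete with a small illustration. The sketch below is a deliberately simplified example rather than the model used by Egermann and colleagues or by Pearce and Wiggins: a first-order transition model “exposed” to a handful of melodies assigns low surprisal to continuations it has encountered before and high surprisal to unfamiliar ones, with no bottom-up rules built in. The toy corpus, the MIDI pitch encoding, and the smoothing constant are assumptions made purely for the example.

```python
# A minimal sketch of exposure-based melodic expectation (illustrative only):
# a first-order transition model learns conditional note probabilities from a
# toy corpus and reports the surprisal of each continuation in a new melody.
import math
from collections import defaultdict

class ExposureModel:
    def __init__(self, smoothing=1.0):
        self.counts = defaultdict(lambda: defaultdict(float))
        self.smoothing = smoothing
        self.alphabet = set()

    def expose(self, melody):
        """Update transition counts from one melody (a list of MIDI pitches)."""
        self.alphabet.update(melody)
        for prev, nxt in zip(melody, melody[1:]):
            self.counts[prev][nxt] += 1.0

    def probability(self, prev, nxt):
        """Smoothed conditional probability P(next | previous)."""
        total = sum(self.counts[prev].values()) + self.smoothing * len(self.alphabet)
        return (self.counts[prev][nxt] + self.smoothing) / total

    def surprisal(self, melody):
        """Information content, in bits, of each note given the preceding note."""
        return [-math.log2(self.probability(p, n)) for p, n in zip(melody, melody[1:])]

# Toy usage: after exposure to stepwise melodies, a wide leap is more "surprising".
model = ExposureModel()
for tune in [[60, 62, 64, 65, 67], [67, 65, 64, 62, 60], [60, 62, 64, 62, 60]]:
    model.expose(tune)
print(model.surprisal([60, 62, 64, 72]))  # the final leap yields the highest surprisal
```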

The research of Egermann and colleagues highlights the importance of including ­subjective cultural criteria in musical analyses, with the significant improvement it contributes to predicting listener expectation. Their work strongly contradicts the claim structuralist musicologists have asserted in the past, which is that the inclusion of subjective elements would compromise the possibility of making a rigorous analysis. This use of statistical learning has now also branched out into learning jazz improvisation principles in the same manner through exposure to syntactic relationships (Thom 2000). I propose that agnosticism toward others (a weakened version of solipsism), which has been influential to the mind–body separation, is a confabulation of the experience of the self. Damasio has stated, “consciousness, as currently designed, constrains the world of imagination to be first and foremost about the individual, about an individual organism, about the self in the broad sense of the term” (Damasio 2010, 300). We experience that our imagined simulations of our self are completely unique from the experiences of others. For this reason, thinkers such as Nagel have erroneously argued that there are fundamentally different mental processes used for imagining the actions of oneself than the ones that are used for imagining the actions of another (Nagel 1974). In an extreme exposition of agnosticism of others, Derrida asserts that no animal or human individual inhabit[s] the same world as another, however close and similar these living individuals may be . . . between my world and any other world, there is first the time and space of an infinite difference, an interruption that is incommensurable with all attempts to make a passage, a bridge, and isthmus, all attempts at communication, translation, trope, and transfer that the desire for a world or the want of a world, the being wanting a world will try to pose, impose, propose, stabilize. There is no world there are only islands.  (Derrida 2011, 31)

This agnostic position toward the other is more emphatically expressed by Derrida here than he may actually believe, but theories following a Cartesian mind–body separation have a very difficult time defending against this, which has led to their need to search for essentialisms and universally innate common abilities (such as Kant’s transcendental subjectivity) that allow for communication between these infinitely separated islands. Furthermore, I argue, following an embodied viewpoint, that since we have a responsibility

28   justin christensen to act in a goal-directed manner within the time constraints of real life, we actively ­participate in the perception and meaning making of our environment rather than ­passively apprehend an accurate reality. Since our predictions are situated in similar embodiments and environments to one another, we have access to similar shared experiences. Jean-Luc Nancy in Being Singular Plural has given an adept perspective on how embodiment can defend us from solipsism: That which exists, whatever this might be, coexists because it exists. The co-implication of existing [l’exister] is the sharing of the world. A world is not something external to existence; it is not an extrinsic addition to other existences; the world is the coexistence that puts these existences together. . . . Kant established that there exists something, exactly because I can think of a possible existence: but the possible comes second in relation to the real, because there already exists something real. (Nancy 2000, 29)

Nancy here proposes that thinking of reality includes an immediate coexistence. Thus, it is impossible to approach things-in-themselves as they always already exist in a reality that is plural. In my opinion, the troubles that we have had in joining the different types of imagination into a single framework stem mainly from one problem, the completely unnecessary split between mind and body.

Friston’s Free Energy Principle

Recently, Karl Friston (2012) has come up with a theory that strongly supports the idea that cognition, perception, and action are not easily divisible but instead work together to play a role in the dynamic interactions between body, mind, and environment. This work also supports the notion that cognition does not just appear in response to triggers from the environment. Instead, cognition is continually operating to prepare for the “remembered present” (Edelman 2001) through simulation. This cognitive simulation constructs a population of predictions, comparing our present experience to our past experiences, and thus actively constructing meaningful experiences by filtering stimuli through memories of past experiences. The free energy principle also gives empirical support to notions of social simulation that allow us the ability to imagine the actions and emotions of others, giving us nonpropositional means for the reception of performed music. The essence of the free energy principle is “that all agents or biological systems (like us) must minimize free energy” (Friston 2012, 263). Furthermore, the driving question behind this research is How, in a changing and unpredictable world, do biological agents resist a natural tendency to disorder and thermodynamic equilibrium? All the physics that we know, such as the fluctuation theorem . . . suggests that random fluctuations in our

environment will ultimately change our physical states to the point we cease to exist. And yet, biological systems seem to violate these laws [and] they occupy a small number of states with a high probability and avoid a large number of other states. In short, they appear to resist thermodynamic imperatives. (263, 266)

Similar principles have been discussed by Saslaw and Walsh in their chapter (this volume, chapter 7), and more can be read on this topic there. A major part of Friston’s proposed answer to this question is the minimization of surprise. He suggests that we do this in two ways: through altering our perceptions and through altering our actions. First of all, we should, through learning, evolution, and neurodevelopment, maximize the chances that our model works by becoming a model of our environmental niche. Then, we can change our level of surprise by changing (optimizing) either our predictions (perceptions), our expectations (our model), or our actions (Friston 2012). “This perspective suggests that we should selectively sample data (or place ourselves in relation to the world) so that we experience what we expect to experience. In other words, we will act upon the world to ensure that our predictions come true” (272). It is this selective sampling of data that supports imagination’s major influence on our agential interaction with the world. This is supported by Calvo-Merino and colleagues’ work on dance (2005), where experts had stronger neural activations in the premotor cortex than novices. Furthermore, in this study they also saw that experts had stronger activations in response to dance styles more similar to their own. Thus, our ability to act can also influence how we choose to sample our perceptual data, and this in turn directly influences our behaviors. Related to this, Borgo quotes Evan Parker when he states, In the end the saxophone has been for me a rather specialised bio-feedback instrument for studying and expanding my control over my hearing and the motor mechanics of parts of my skeleto-muscular system and their improved functioning has given me more to think about. Sometimes the body leads the imagination, sometimes the imagination leads the body. (Parker 1992; Borgo 2005, 58)

Friston suggests something very similar, but rather as a tri-part structure, where sometimes perception leads, sometimes imagination leads, and sometimes the body leads. In this regard, I find improvisation to be an ideal representation of Friston’s free energy principle.
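The logic of these two routes can be shown with a deliberately reduced toy calculation. The following sketch is an illustration of the general idea, not Friston’s formal free-energy mathematics: a single scalar prediction and a single scalar input are repeatedly updated, the prediction by a perception-like revision toward the input and the input by an action-like adjustment toward the prediction, and the surprise (here simply the squared prediction error) falls through either route. The starting values and update rates are assumptions made only for the example.

```python
# A toy reduction of the two routes to minimizing surprise (illustrative only):
# perception revises the prediction toward the input, while action nudges the
# input toward the prediction; both lower the prediction error.

def surprise(prediction, observation):
    """Squared prediction error stands in for surprise in this toy model."""
    return (prediction - observation) ** 2

def step(prediction, world, perceive_rate=0.4, act_rate=0.2):
    """One joint update of the internal model (perception) and the world (action)."""
    error = world - prediction
    prediction += perceive_rate * error   # optimize the prediction (perception)
    world -= act_rate * error             # optimize what is sampled (action)
    return prediction, world

prediction, world = 0.0, 4.0              # arbitrary starting point
for i in range(6):
    print(f"step {i}: surprise = {surprise(prediction, world):.3f}")
    prediction, world = step(prediction, world)
```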

Language and Imagination

Regarding the reception of music, I would assert that an individual’s use of vocabulary and concepts as well as their previous experience can have a very strong impact on their conscious experience. This premise has been explored in much greater detail in Mads Walther-Hansen’s chapter (volume 1, chapter 23) and more can be read there. In support of the claims that link language and experience, language has been found to be crucial for developing concepts and categories, as it facilitates a development of new

30   justin christensen categorizations (Lupyan et al.  2007), and when dementia patients lose the concept knowledge to describe their emotions they can lose the capacity to perceive them (Lindquist et al. 2014). Further supporting this, Richard Hilbert has found that chronic pain that does not fit within the standard descriptions of pain causes further suffering and social isolation in patients, as they have no language with which to describe their pain (Hilbert 1984). On top of this, meta-analysis results from Nilsson and de López are consistent with the theory that children with language impairment have a “substantially lower ToM [theory of mind] performance compared to age-matched typically developing children” (2016, 143). Furthermore, there have been significant links made between language processing and spatial representation (Richardson et al. 2003), language pro­ cessing and the perception of moving objects (Meteyard et al. 2007), and language processing and color perception (Thierry et al. 2009). Evidence is considerable and growing that language is not an innocent tool, but rather that it has a large impact on conscious experience. As a result, it is easy to see the power that language might have in the reflective awareness and reflective imagination involved in improvisation. The importance of reflective consciousness becomes more readily apparent once one realizes all of the necessary work that is put in to prepare for “spontaneous” improvisatory performances. Musicians spend years learning theory, scales, and the techniques necessary to pull this off. As Berliner has stated, there is “a lifetime of preparation and knowledge behind every idea that an improviser performs” (2009, 17). Similarly, musicians converse with one another between and during performances, reflecting and narrativizing on their musical experiences. Furthermore, as can be seen in Schmicking’s chapter (volume 1, chapter 4), musicians intermittently imagine and reflect on how they can guide the music to where they want it to go. Even with these strong arguments for reflective thinking in improvisation, I would argue that some thinkers such as Dennett (1991) take it too far when they consider that the ability to provide a verbal report is necessary for an experience to be considered a valid conscious experience. As artistic experiences both elicit and elude conceptualization according to Heidegger, I propose that only focusing on the reflective, verbally reportable aspect of consciousness impoverishes our understanding of artistic emotions (Harries 2011). Following this, Thompson and colleagues state, “phenomenologists emphasize that most of experience is lived through unreflectively and inattentively, with only a small portion being thematically or attentively given” (Thompson et al. 2005, 59). This is well supported by the work of Al Bregman, who has spent much of his career studying auditory scene analysis. Bregman has researched how sound perceptually either groups together or splits apart into auditory streams, and has found that the most transformative effect on how the auditory stream is processed is whether it is foregrounded or backgrounded in the listener’s mind (Bregman 1990). With that said, Bregman’s research also supports the notion that both the foreground and background musical elements are consciously experienced by a listener. 
Flow (Csikszentmihalyi 1990), frequently considered to be an optimal state both for performing and listening to music, also has support for it being a strong mix of reflective and prereflective states of

improvisation: embodied imagination   31 consciousness. Individuals in flow “are capable of strategically allocating attention, and hence alternating between reflective and pre-reflective modes of awareness, in order to meet the requirements of dynamically unfolding and contextually contingent performance environments” (Toner and Moran 2015, 769). Since improvisation is an emergent, dynamic, and social practice, it is by its very nature ephemeral and elusive. It still allows for dialoguing with and dialoguing about, but it also heavily relies on more embodied empathic responses from the listeners who not only think about the music but also ­participate in it.

Conclusion

Neither improvisation nor imagination fares well within either the classical Greek or Enlightenment theories of knowledge and understanding. These theories have instead privileged objective, disinterested forms of approaching reified aesthetic objects, which neither imagination nor improvisation has to offer. In this chapter, I have argued that this failure of imagination and improvisation to find value within aesthetic theories that value the ontology of fixed art objects says less about the value of either imagination or improvisation than it does about the value lacking in these aesthetic theories. Instead, I suggest that we need to focus more on newer aesthetic theories that value the ontology of the process of musicking. In relation to empirical research, improvisation and imagination have made a resurgence into popular acceptance concurrent with the rise of embodied theories that make direct links to perception and action. Researchers like Decety, Grèzes, and Friston have shown the extraordinary influence that imagination has over perception and action, and the close links that perception, action, and imagination have to one another. Furthermore, improvisation, with its reliance on a strong integration of perception, action, and imagination, can be seen as a strong reflection of this tight interdependence. As a result, if we really want to understand our embodied, holistically integrated, and time-constrained musical experiences, I feel that it is imperative that we include the investigation of the improvisatory and participatory aspects of meaning making that occur as part of the process of musicking.

References

Abdallah, S., and M. Plumbley. 2009. Information Dynamics: Patterns of Expectation and Surprise in the Perception of Music. Connection Science 21 (2–3): 89–117. Aristotle. 2004. De Anima. Translated by Hugh Lawson-Tancred. Reissue edition. London: Penguin. Bailey, D. 1993. Improvisation: Its Nature and Practice in Music. New York: Da Capo. Berliner, P. F. 2009. Thinking in Jazz: The Infinite Art of Improvisation. Chicago and London: University of Chicago Press.


Chapter 2

Anticipated Sonic Actions and Sounds in Performance

Clemens Wöllner

Introduction

Imagine the following situation: a guitarist performs in a large concert venue together with a singer, a bass guitarist, and a percussionist. At one point, the stage monitor speakers cease functioning, and the guitarist does not hear the sound of her own instrument or the sounds produced by the others. Only some noise from the audience reaches her ears, along with some delayed feedback from the loudspeakers directed away from the stage. Puzzled at first, the guitarist quickly glances at her fellow musicians and then continues playing. They have performed the piece a dozen times and she is an experienced guitarist, so she is able to anticipate the sound outcomes of her own performance actions and to synchronize with the imagined sounds of the others, all the while ignoring the delayed acoustic feedback from the audience loudspeakers. Musicians such as the one described in this situation are typically well aware of the effects of their actions, so they may depend less on actual auditory feedback. Long training enables them to associate fine-tuned motor behavior with sounding action outcomes, and their actions are tightly coupled with sounds in their imagination (Keller 2012). Even when hearing sounds, skilled musicians may experience a motor resonance of the corresponding actions (Bangert et al. 2006). The sound waves are transformed into sonic actions in their auditory and motor imagery, enabling them to perform even when auditory feedback is disrupted and to anticipate their own and others' sounds. This chapter elaborates on anticipated sounds in performance and focuses on sonic actions in bodily representations of sounds (see also Godøy, this volume, chapter 12, on musical shape cognition). Research into traditional instrumental performances is discussed, and features of electroacoustically generated sounds in gestural performances

are described for controller-driven music. In all these examples, sound characteristics, according to more restricted definitions, refer to the timbre of instrumental or vocal sounds in a performance. Certain spectral components, as specified later, characterize and distinguish sounds from each other. In a wider sense, the sounds of a music performance include further components such as timing and dynamics that make a performance unique and distinguish it from others. These sound qualities are often related to timbre, such that higher intensity in many acoustical instruments alters their timbre as well, or the chosen tempo affects playing techniques, articulation, and sound quality. Distinctions between the two concepts of sound may thus be primarily of theoretical interest. Yet, for electronic music, an unlimited combination of sound features is possible that does not necessarily involve the aforementioned interdependencies between timing, intensity, and timbre. The chapter focuses on both perspectives of musical sounds in actual and imagined performances and explores ways in which anticipated sonic actions might differ between performances of physical instruments and those of gestural, controller-based music. The main argument is that musicians construct images of their sonic actions that permit them to perform independently of what they actually hear, while at the same time a feedback mechanism is needed for controlling the actual sounds that the audience should perceive.

Approaches to Sounds in Traditional Western Performance

In what ways do musicians establish images of their sonic actions? Findings of research on perception and auditory imagery, mental performances and anticipation of sound events are discussed in the following section. The research presented here employed physical analog instruments that allow musicians to construct multimodal images of their actions without sound modifications or distortions that are possible with electronic instruments.

Perception and Imagination of Timbral Qualities

Among the various parameters that characterize music performances, timbre, or "sound color" (Slawson 1985; Krumhansl 1989; McAdams, Winsberg, Donnadieu, de Soete, and Krimphoff 1995), is one of the key features. Listeners remark on the unique timbre of the voices of great singers and describe it as a personal expression of musical ideas. Musical instruments by famous instrument makers are well known for their characteristic sound qualities, which are further shaped by experienced performers. Even for performances on the piano, which offer a more limited variability in sound parameters (compared to, say, string instruments), mainly based on keystroke velocity

and pedaling, listeners ascribe individual sound qualities to pianists that are reflected in a number of performance parameters such as touch or articulation (Bernays and Traube 2014). On the other hand, these qualities are not fixed in Western musical notation, nor can they be captured in a direct, one-dimensional way. Empirical approaches to music performance seem to have focused primarily on timing, pitch, dynamics, and articulation. Though these dimensions may characterize key elements of music performances, other features shape musical experiences to a significant extent. For electroacoustic music, Stockhausen (1963) defines five musical dimensions: pitch including harmony and melody, duration including meter and rhythm, timbre ("Tonfarbe," literally "tone color"), dynamics, and spatial aspects. According to Stockhausen, each dimension should be equally important for composers, performers, and listeners, and he employed this stance for his own compositions. While the first three of Stockhausen's parameters have long been described and indicated in the scores in relatively precise terms (typically, however, not taking into account the fine nuances in performers' microtiming or dynamics; see also Danielsen, this volume, chapter 29, on microtiming), defining the tone color seems more challenging. Musicians, theorists, and empirical researchers often employ verbal descriptions such as full, bright, diffuse, or tense in their descriptive approaches to timbre, and these should relate to acoustic features of the sounds. Measurements of the acoustic features, then, focus on spectral components, including formant areas, spectral centroid, spectral flux, or the intensity of selected partials. Reuter and Siddiq (2017) present an overview of various attempts to classify instrumental timbres by assessing their closeness or distance to each other in so-called timbre spaces. Grey (1975) was the first to construct such a timbre space by having listeners rate the perceptual quality of synthesized sounds. The dissimilarity of the perceived timbres was transferred into a three-dimensional space using the statistical method of multidimensional scaling. Briefly, the further away the timbres are in the space, the more they differ from each other in perceptual judgments. Several other researchers constructed timbre spaces in the following years. In their meta-analysis, Reuter and Siddiq found that the dimensional solutions for describing instrumental sounds vary widely across different studies of timbre spaces. They tested the synthesizer sounds used in these studies and compared them with the timbres of actual instrumental sound samples from the Vienna Symphonic Library. Listeners subsequently rated the dissimilarity of the timbres in pairwise comparison judgments. As a result, there were large differences between the timbral qualities of the same instruments across the stimulus sets used in the respective studies, indicating limits in the comparability and generalizability of synthesized sounds. In contrast, ratings within the set of the actual instrumental samples were more widespread (see the boxes labeled with "v" as the first letter in Figure 2.1). Consequently, Reuter and Siddiq suggest that a wider range of sound samples should be investigated and that perceptual judgments should be more firmly related to acoustical properties.
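To make the timbre-space construction concrete, a minimal sketch in Python (assuming the scikit-learn library is available) of the procedure described above follows: averaged pairwise dissimilarity ratings are collected in a symmetric matrix and projected into a low-dimensional space with multidimensional scaling. The instrument labels and rating values are invented for illustration and are not data from Grey (1975), Krumhansl (1989), or Reuter and Siddiq (2017).

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical averaged dissimilarity ratings (0 = identical, 1 = maximally
# different) for four instrument sounds; the values are invented.
instruments = ["trumpet", "trombone", "clarinet", "cello"]
dissimilarity = np.array([
    [0.0, 0.3, 0.7, 0.8],
    [0.3, 0.0, 0.6, 0.7],
    [0.7, 0.6, 0.0, 0.5],
    [0.8, 0.7, 0.5, 0.0],
])

# Project the ratings into a three-dimensional "timbre space"; sounds that
# listeners judged as dissimilar end up far apart.
mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarity)

for name, (x, y, z) in zip(instruments, coords):
    print(f"{name:>8}: {x:+.2f} {y:+.2f} {z:+.2f}")
```

The number of retained dimensions and the labeling of the axes are analytic choices, which is one reason the solutions reported in different studies diverge.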
If sound qualities, according to Reuter and Siddiq, are rather elusive and lead to great variation in perceptual judgments and descriptions, in what ways are they relevant for musical performers and composers? Even more, some Western music, especially of the

Figure 2.1  Meta timbre space including four dimensions (fourth dimension: shades on the gray scale). (1) Sounds used in the study by Grey (1975): gB (bassoon), gC2 (bass clarinet), gC1 (E clarinet), gE (English horn), gF (French horn), gS1-3 (cello with different playing techniques), gT (trombone), gP (trumpet). (2) Sounds used in Krumhansl (1989) and McAdams et al. (1995): kB (bassoon), kC (clarinet), kE (English horn), kF (French horn), kS (strings), kT (trombone), kP (trumpet). (3) VSL sounds (Vienna Symphonic Library): vB (bassoon), vC (clarinet), vE (English horn), vF (French horn), vS (strings), vT (trombone), vP (trumpet) (Reuter and Siddiq 2017, 160).

baroque period with its widespread stress on structural elements rather than performative affordances of the instruments, sounds adequate even if it is performed on a variety of different instruments. Arrangements for new instruments are popular and common, suggesting that the essence of a piece of music may remain intact and recognizable even with completely different timbres. On the other hand, musical arrangements also point to the fascination that the timbres of various instruments have for the audience. Notions that timbral qualities are hard to capture in language do not imply that they should be less relevant. In addition, it is generally accepted, and somewhat expected from musical performers, that timbre differs from one performer to the other, giving rise to characteristic sound qualities of soloists and whole orchestras. For recorded music, the mixing

anticipated sonic actions and sounds in performance   41 engineer’s and sound producer’s personal ideals of timbre further influence the sound of a recording. It can thus be assumed that performers, producers, and listeners alike have distinct images of sound qualities that shape their expectations and, in the case of musicians, their actions. A basis for this claim certainly lies in the fact that people are indeed able to distinguish between timbres, and that they do imagine them vividly. Evidence for the vivid imagery of timbre stems from empirical research that addresses differences and similarities between imagined and perceived instrumental timbres. Halpern et al. (2004) asked ten participants with moderate musical experience to judge the similarity of instrumental timbres in pairwise comparisons. The sounds were taken from the McGill University Master Samples Library and included realistic sounds of musical instruments. Participants were either acoustically presented with the samples or had to imagine the sounds, while their brain activity was assessed with fMRI. The pairwise comparisons, indicating the similarity or distance of the timbres, were analyzed with multidimensional scaling, resulting in two dimensions that the authors defined as “brilliance” and “nasality.” As an example, the oboe timbre was placed highly in both dimensions, whereas the clarinet was low in both dimensions. Interestingly, there were roughly similar dimensional scaling solutions for the perceived and imagined timbres. In addition, the similarity ratings for the timbre pairs correlated highly between the imagined and perceived conditions, suggesting some overlap in the cognitive processes of perception and imagery. This conclusion is further supported by neuroimaging results showing that areas in the right auditory cortex were activated both in timbre perception and in timbre imagination. In addition, there was some activation in the supplementary motor area during timbre imagery, but not in perception. Musicians may have accessed some unspecific motor component during imagery, or they subvocalized the pitches of the instruments during imagery. Further research, conversely, questions whether individuals can indeed imagine musical timbre vividly, and suggests that it is rather their representations of timing or pitch that are more detailed and accurate (Bailes 2007). Familiarity with a piece of music clearly aids in imagining timbral qualities. Despite the different systems in which timbral qualities are classified, they are undoubtedly among the key characteristics of individual music performances and, accordingly, one of the first features listeners may perceive when appraising musical interpretations. Whereas the timing and dynamics unfold over time, the sound colors of a performance are present immediately. Some evidence for the significance of the sound quality for listeners’ judgments stems from research on “thin slices” of music (Gjerdingen and Perrott 2008; Krumhansl 2010). Even when presented with very short musical excerpts of 300 and 400 ms, listeners were able to provide judgments on genre or emotional content. For well-known pieces, they could also name the performers or release decade. Since information about other musical elements is reduced, it can be assumed that timbral qualities have a paramount role for these quick evaluations, especially for genre judgments. 
The same evaluation strategies are of course at work for longer durations, in combination with other elements such as timing, dynamics, or pitch.


Sounds in Imagined Performances and Mental Practice Being able to imagine sound is essential for improvisers and composers, who need to construct an image of the sound and structure of the music before it is played and heard. Even composers and sound designers working on their instruments or computers, often in trial and error processes, need some form of imagination in order to evaluate their choices of the sounds. In an early study, Agnew (1922) scanned the diaries, letters, and other sources of five well-known composers and reports that Schumann had very vivid “inner hearing” (280), Berlioz “heard his compositions mentally” (284), and Tchaikovsky was supposedly already aware of the sound quality in the Moscow concert hall in which his compositions should be performed at a later time. For music performance, auditory imagery is central for at least four reasons. First, performers need to control the quality of their own sound, by exerting the muscles of the vocal tract or putting the fingers in the right position before producing a sound (Keller et al. 2010). Second, when practicing mentally without any overt movements, they vividly imagine the sound of the music, resulting in benefits for memorization or interpretative choices (Highben and Palmer  2004; Connolly and Williamon  2004). Third, performing together with others and anticipating their sounds—for instance, in a jam session—requires anticipatory imagery of one’s own and others’ actions and action goals (Keller 2012). This skill is also important in musical conducting (Wöllner 2012). Conductors need to imagine the course of the music while, at the same time, anticipating what the orchestra is going to play. So, experienced conductors indicate their intentions by use of gestures and facial expressions some moments before the respective sounds can be heard, which permits them to react and adjust their conducting in case the orchestra performs in a different way than anticipated. Finally, it is important for musical performers to imagine what the audience may perceive, especially if the musicians do not hear themselves well in a large ensemble and when acoustic feedback is impaired. Yet, even in situations with ideal acoustic reverberation, performers should develop vivid inner representations of sounds, since they are sometimes different from the sounds that the audience perceives; this depends on intrabodily sound transmission to the cochlea from the vibrating instruments or the singer’s own voice. Therefore, performers should ideally have a sense for the external quality of their sounds, in the same way as experienced speakers and actors control the timbre of their voices by means of recordings. Recent approaches to virtual acoustic environments aim at offering musicians real-time feedback of their playing from different audience positions (Brereton 2017). Empirical studies of auditory imagery have often employed methods in which auditory feedback was deprived. The rationale for this approach lies in the idea that if musicians have access to stable sound representations and auditory imagery, then they may depend less on external acoustic feedback. In a sight-reading study in which different types of feedback were manipulated (Banton 1995), pianists did not depend on auditory feedback. Sight-reading performances with the sound of a digital piano switched off did not lead to more errors, thus it was not necessary for them to hear what they actually

anticipated sonic actions and sounds in performance   43 played. It can be assumed that pianists could vividly imagine and anticipate the sounds during sight-reading. Finney (1997) observed in a related study that manipulations of pitch in auditory feedback interfered with pianists’ performance plans and impaired their play. When auditory feedback was completely absent, on the other hand, their imagery skills allowed them to perform without disruptions. The tactile and kinesthetic feedback was evidently more important for pianists to control their performances than the external auditory information. These findings show that employing different feedback modalities in empirical studies allows insights into the stability of performers’ imagery skills. Further research investigated the learning and recall of unfamiliar music under different feedback conditions. Highben and Palmer (2004) asked pianists to practice without auditory feedback (i.e., playing without sound), without motor feedback (i.e.,  score reading while listening to a prerecorded version of the piece), or while practicing the music in their minds without any feedback at all. The last condition led to worse results than the conditions where only one of the feedback modalities was absent. Those pianists who succeeded in an auditory skill post-test were also better in learning with no auditory feedback of the piano. It can be concluded that musicians with good aural skills may benefit from more stable auditory images, supporting them in practice and performance. In a related study (Brown and Palmer 2013), higher auditory imagery skills allowed the pianists to recall more notes, to play with greater temporal regularity, and to overcome distortion caused by experimental interferences. These results indicate that the individual stability of auditory imagery aids in performance and recall, even in situations when the sensory feedback of the actual sound is absent or altered. Auditory imagery was not correlated with motor imagery, suggesting that both cognitive skills are relatively independent in performers. Multimodal imagery skills are particularly vital for mental practice, in which performers play the music “in their minds’ ear” without overt muscle movements. Such mental performances are only efficient and feasible if performers possess efficient imagery skills. A study assessed timing stability in piano performance across experimental variations of performance feedback and imagery (Wöllner and Williamon 2007). Pianists were asked to perform a memorized composition under four conditions that included a normal performance as well as conditions without auditory feedback, without auditory and visual feedback, and, finally, while tapping along with an imagined performance. Analyses of the microtiming and dynamics revealed that the condition without auditory feedback (but with haptic feedback of the MIDI piano) was relatively close to the normal performance and deviated only about 10 percent, while tapping along with the imagined performance led to strong timing deviations of up to 40 percent. In other words, musicians had developed strong auditory-motor images of the compositions that did not impair their play as long as they received the kinesthetic feedback of their fingers from the piano keys. 
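To illustrate the kind of timing measure reported in these studies, the following sketch compares the inter-onset intervals of a performance with those of a reference performance and expresses the average discrepancy as a percentage. It is a simplified reading of such analyses, not the authors' actual procedure, and the onset times below are invented.

```python
def ioi_deviation(reference_onsets, test_onsets):
    """Mean absolute deviation of inter-onset intervals, as a percentage of
    the reference intervals. Onset times are in seconds; both performances
    are assumed to contain the same number of notes."""
    ref_iois = [b - a for a, b in zip(reference_onsets, reference_onsets[1:])]
    test_iois = [b - a for a, b in zip(test_onsets, test_onsets[1:])]
    devs = [abs(t - r) / r for r, t in zip(ref_iois, test_iois)]
    return 100 * sum(devs) / len(devs)

# Invented onset times (seconds) for a short phrase: a normal performance,
# one without auditory feedback, and taps along with an imagined performance.
normal   = [0.00, 0.50, 1.00, 1.52, 2.05]
no_sound = [0.00, 0.53, 1.07, 1.60, 2.18]
imagined = [0.00, 0.62, 1.35, 1.95, 2.80]

print(f"no auditory feedback: {ioi_deviation(normal, no_sound):.1f}% deviation")
print(f"imagined performance: {ioi_deviation(normal, imagined):.1f}% deviation")
```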
These results were not reflected in the pianists’ self-evaluations of their practicing and memorizing strategies, in which they indicated that they had memorized and employed aural, visual, kinesthetic, and conceptual images of the music’s structure to a similar extent. In an associated study (Clark and Williamon 2011), the timing accuracy

44   clemens wöllner of pianists’ imagined performances was related to the amount of time they had spent practicing mentally per day, while results of a self-report imagery vividness test were correlated only with live performances, not imagined performances. These findings suggest that domain-specific practicing may enhance musical imagery skills, which are not necessarily related to the general vividness of imagery outside the specific skills. In summary, vivid auditory imagery may aid in performance when sounds cannot be perceived. Studies provided evidence that auditory perception and imagery are closely related, since the cognitive processes involved are highly comparable and both engage the secondary auditory cortex (Daselaar et al. 2010; for a review, see Hubbard 2010). As shown previously, when individuals evaluate the qualities of the sounds they imagine, their judgments generally resemble those of the actually perceived timbres (Halpern et al. 2004). Similarly, individuals are able to imagine durations and pitches in conditions where they do not hear the sounds (Janata and Paroo 2006). Vivid auditory, visual, and motor imagery is thus central for mentally practicing music, but also for other performance areas such as sports or medicine in which athletes and surgeons train “internally” a number of crucial actions, acquire new skills, and prepare for their performance in the absence of any overt physical movements (Cocks et al. 2014). Mental practice has been shown to be beneficial compared to no practice (see Driskell et al. 1994; Connolly and Williamon 2004) and can aid the performers in re-enforcing their imagery skills, which are necessary for building the auditory and motor representations of a musical piece or of athletic movement patterns. Perhaps unsurprisingly then, musical training increases auditory imagery skills. In a study using a melody-continuation paradigm in which only the beginnings of the melodies were actually played, musicians had a more vivid imagination of the following tones of the melodies that were not played (Herholz et al. 2008). Compared to nonmusicians, they responded more quickly to single incorrect tones that were played during the imagined continuation of the melodies and showed a neural mismatch negativity, leading the authors to suggest that imagery shares the same neural processes with actual perception.

Anticipatory Imagery of Sonic Actions

In the performance of music, auditory imagery is a key element for anticipating and shaping actions. Musicians access images of the sound before they actually play the music. In doing so, they constantly activate representations of the sounds in working memory (Kalakoski 2001), which are processed together with further images of timing, intensity, and pitch patterns in accordance with the given demands for each moment of the performance. In other words, musicians typically plan their actions by assessing higher-order representations of the music and, at the same time, by incorporating the auditory feedback of the sounds they produce (Pfordresher 2006). Keller (2012) termed this skill "anticipatory auditory imagery" (209), which is beneficial for the planning and execution of actions, allowing for more precise timing and economical control of the performance movements. Musicians have an awareness of the sounds they are about to

anticipated sonic actions and sounds in performance   45 produce. In this way, music performance can be seen as an act of knowing or sensing what comes next. Before producing a certain action, performers internally anticipate the sonic outcome. There are two internal models that describe the processes of action anticipation: a forward model (being aware that a motor action leads to a sensory experience), and an inverse model (focusing on the desired outcome of an action affects motor behavior)—both models run a short time before action execution (Keller 2012; Rauschecker 2011). Experienced musicians are thus able to anticipate and control the movements they need to execute in order to reach a desired sound quality and to adjust the force necessary for producing the sounds during play. They do not only respond to the outcome of their own actions, but also internally imagine the sonic actions of co-performers in an ensemble (cf., Sevdalis and Keller 2014). Several experimental studies investigated the musicians’ anticipatory imagery that guides their actions and the related sound outcomes. In a tapping task, Keller and colleagues (2010) compared the timing accuracy in conditions without sound as well as with compatible or incompatible sounds. In the silent condition, the musicians’ timing matched the given target tempo best, and the accelerations of their finger trajectories before tapping were highest. Thus, auditory imagery clearly aided the musicians in planning their actions and timing their tapping movements. Action anticipation was also investigated in a study by Bishop and colleagues (2013). Twenty-nine pianists of varying degrees of expertise were asked to play the right-hand part of relatively unknown and simple piano compositions. After practicing the piece, some of the experimental conditions included actually playing the music while, in other performance conditions, no acoustic feedback or no auditory plus motor feedback was provided (hence, in the latter condition, they imagined playing through the compositions in their minds). As expected, experienced pianists produced fewer pitch and timing errors than less experienced ones in the conditions without auditory feedback, suggesting that they had vivid imagery skills and more stable action anticipation (cf., also studies reviewed above). Furthermore, at several points, dynamic and articulation markings occurred in the scores on a digital visual screen that had not been present when practicing the pieces. During performances with disrupted feedback and during imagined performances, pianists were asked whether the newly introduced markings matched their own expressive intentions at the specific moments. For instance, a crescendo marking appeared and pianists said “yes” if it matched their own idea or “no” if this was not the case. As a result, they responded more quickly than would have been expected if they had waited for the auditory feedback. These findings suggest that pianists had access to anticipatory imagery that aided them in performing and imagining the music without feedback. The findings further indicate that there are relatively stable performance plans in terms of articulation and dynamics, so that the music was vividly played “in the mind’s ear.” Research on agency provides further evidence for the claim that musicians construct stable representations of their actions, allowing them to imagine their own play before carrying out an action. 
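The forward/inverse pairing mentioned above can be summarized in a toy sketch: in the standard formulation, a forward model predicts the sensory outcome of a planned motor command, while an inverse model estimates the motor command required to reach a desired outcome. The linear velocity-to-loudness mapping below is invented purely for illustration and is not drawn from Keller (2012) or Rauschecker (2011).

```python
import random

# Toy sound-producing "plant": MIDI key velocity (0-127) -> loudness in dB,
# with a little random variation. The mapping is invented for illustration.
def play(velocity):
    return 40 + 0.4 * velocity + random.gauss(0, 0.5)

def forward_model(velocity):
    # Performer's internal prediction of the sonic outcome of a motor command.
    return 40 + 0.4 * velocity

def inverse_model(target_db):
    # Motor command estimated to produce a desired sonic outcome.
    return (target_db - 40) / 0.4

target_db = 72                       # the loudness the performer imagines
command = inverse_model(target_db)   # chosen shortly before the note is played
predicted = forward_model(command)   # anticipated outcome; no feedback needed
actual = play(command)               # what is eventually heard

# With an accurate internal model, prediction and outcome largely coincide,
# so playing can continue when feedback is absent; any remaining prediction
# error would be used to adjust subsequent commands.
error = actual - predicted
print(f"command {command:.0f}, predicted {predicted:.1f} dB, "
      f"actual {actual:.1f} dB, error {error:+.2f} dB")
```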
Agency is the awareness that actions are produced by oneself, in other words, the feeling of authorship over one's own actions (Jeannerod 2003; Synofzik et al. 2008). Auditory feedback is highly informative for action identification,

46   clemens wöllner since a number of actions are typically accompanied by sounds—most prominently in music making but also in other everyday actions including walking or clapping. Therefore, perceiving the sounds of an action may stimulate the motor systems associated with the action (for a review, see Aglioti and Pazzaglia 2010). Neuroimaging research showed that perceiving sounds may activate a network not only of auditory centers but also of motor brain areas. Bangert and colleagues (2006) carried out an fMRI study in which seven pianists and seven nonmusicians listened to short piano melodies. In the pianists, a sensorimotor integration network was active, including the supplementary motor and premotor areas as well as the auditory Broca and Wernicke areas, indicating that listening to the piano sounds stimulated action-specific auditory and motor representations in their brains. Experienced musicians are well aware of their own musical actions when presented with the sound of previously recorded performances. In a study of musical self-recognition (Repp and Knoblich 2004), twelve advanced pianists played short, unfamiliar compositions, half of them without auditory feedback. Two months later, recordings of the performances were played to the pianists, and they successfully identified their own performances. Recognition scores were not affected by whether or not auditory feedback had been provided in the recording sessions. In follow-up tests, the recordings were manipulated such that nuances in the dynamics or differences in the overall tempo were removed. Pianists still identified their own performances out of the twelve performances, suggesting that the remaining information regarding the expressive microtiming and articulation was sufficient for the recognition of self-generated actions. These results point to the prominence of timing characteristics for auditory representations of one’s own sonic actions and self-other distinctions. While timing has been shown to be crucial even for recognizing simple auditory stimuli (for a review, see Sevdalis and Keller 2014), it might well be that for other, nonkeyboard instruments such as the clarinet or flute, timbral qualities contribute more strongly to a sense of agency. Future research could systematically control for these factors, such as in research on visual self-identification (Knoblich and Prinz 2001). In addition to auditory characteristics, visual performance cues can also be highly informative for recognizing one’s musical actions. In a study with twelve orchestral conductors, self-other identification of point-light displays was investigated (Wöllner 2012). In a performance session, each of the conductors directed a string quintet performing a Mendelssohn string symphony while markers at relevant parts of their bodies were recorded with a motion-capture system. In a subsequent perception session, conductors watched visual, auditory, and audiovisual sequences of their own and others’ conducting performances. They successfully identified their own performance in the visual and audiovisual conditions, while judgments of the auditory-only version were not above chance level. Since the sequences were relatively short (ca. 6–11 s), and the conductors did not produce the sounds themselves, their action representations depended more on the visual information. 
This interpretation is supported by visual ratings of the conducting quality, which were highest for their own gestures, independently of whether or not they had successfully identified their own performances. Cues for self-recognition

might therefore include a number of different sources and modalities that resonate with the perceiver's motor system (cf., Alaeerts et al. 2009). An even greater challenge for musicians is to imagine the sonic actions of others while performing themselves at the same time, thus integrating imagined actions of self and other as well as one's own auditory-motor feedback and the other's auditory feedback. Visual information guides action anticipation in an ensemble. Keller and Appel (2010) investigated the relation between anticipatory imagery and dyadic synchronization. Seven pairs of pianists performed duets from the classical piano literature together on MIDI instruments while seeing or not seeing each other. Their body motion was recorded with an optical motion capture system. In a second session, each pianist's anticipatory auditory imagery was tested with a tapping paradigm that included auditory feedback of marimba tones or no auditory feedback. As a result, anticipatory imagery scores, calculated for each pianist separately, were correlated with average duo synchrony, suggesting that those pianists with good imagery skills were more successful at timing their ensemble play. While being in visual contact did not markedly affect results, lags in anterior-posterior body sway between duo partners were related to synchrony, indicating that the pianists also timed their performances via body motion. Alternatively, their body motion might have functioned as individual time-keeping support. In this study, pianists not only had to anticipate their own actions but also had to imagine the co-performer's sonic actions. Keller and Appel suggest that inverse internal models (see above) were run slightly before producing the actions. Anticipatory auditory imagery should consist of imagining the sounds of oneself and the other performer, and both are then transferred to adequate motor commands. In duo and ensemble performance, internal models are thus coupled by simulating the other's actions (cf., Gallese and Goldman 1998). Taken together, timing information seems most important for an awareness of actions and self-other distinctions. Evidence for the paramount importance of timing has also been provided by studies outside the domain of music (e.g., Knoblich and Prinz 2001). Most research on auditory action identification and sonic imagination has investigated simple acoustical stimuli or piano performances, in which the variety of sound qualities is rather limited compared to string or wind instruments, or the human voice. More research is needed that addresses specific timbral qualities in the anticipatory imagination of the sounds to be produced by oneself and other performers.
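As an illustration of how average duo synchrony can be quantified, the sketch below computes the mean absolute asynchrony between two performers' nominally simultaneous note onsets. It is a generic measure with invented onset times, not the analysis used by Keller and Appel (2010).

```python
def mean_abs_asynchrony(onsets_a, onsets_b):
    """Mean absolute difference (in ms) between paired note onsets (in
    seconds) that the score requires to sound together."""
    diffs = [abs(a - b) for a, b in zip(onsets_a, onsets_b)]
    return 1000 * sum(diffs) / len(diffs)

# Invented onset times (seconds) for chords that both parts play together.
pianist_1 = [0.000, 0.510, 1.010, 1.520, 2.000]
pianist_2 = [0.012, 0.495, 1.030, 1.505, 2.021]

print(f"mean asynchrony: {mean_abs_asynchrony(pianist_1, pianist_2):.1f} ms")
```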

Sonic Actions in Controller-Driven Performances of Digital Music

Anticipation and auditory imagery of sonic actions are vital in performance areas that do not rely on the haptic feedback of traditional instruments. A growing field in performance practice employs sensors and controller systems that allow for the shaping of sounds in new forms. The bodily movements of the performers are translated to sound

48   clemens wöllner signals, so that their musical gestures become sonic actions in a somewhat direct and, in the eyes of observers, apparently unmediated way. Producing sounds by human actions “in the air,” without the haptic feedback of physical instruments, has fascinated performers and the audience for a long time even before modern computing technology. One of the first such instruments is the Theremin, invented in 1920, in which the spatial positions of the two hands control volume and pitch (see Theremin 1996). The fundamentals of the Theremin are two metal capacitors that function as proximity sensors in the near field for the above two performance dimensions. A boost in sensor-based music performances has coincided with the greater availability of electronics and software solutions since the 1980s. There are various developments in the field, including technology for digitally augmenting the sound options of acoustical instruments (which are fundamentally still based on the playing techniques of these instruments but involve additional electronic sounds), or controllers that turn a variety of different information including brainwaves or human motion into sound (cf., Hugill 2012). In the following, central conceptual issues and examples of performance systems that focus on purely gesture-based or “open-air” controllers and their consequences for action-sound mappings and anticipatory imagination are discussed. At the center of the discussion will be two types of systems that enable experiences of sonic agency for performers and audience: the conductor’s jacket and data gloves.

Imagined Bodily Causes of Sounds Practitioners and theorists point to an important difference between instrumental ­performances and the controlling of live electronics by human gestures: compared to traditional physical instruments, the mapping between the action and the sound outcome is typically ambiguous when interacting with gesture-based instruments. While, in the former, a finger movement, for instance, leads to a certain sound, and parameters such as movement velocity shape the intensity and timbre, there are hardly any limits for mapping gestures to sound qualities in sensor-based systems (cf., Hunt et al. 2003; Caramiaux et al. 2014). Regardless of the system or transmitter that is used, from ultrasound to infrared or video-based capturing, there are manifold ways to align and synthesize the sounds. MIDI and other interfacing protocols are often employed in gesture-driven performances that manage hardware or software sound generators, including digital instrument libraries or synthesizers on external devices. This leads to a great freedom and range of mapping options (Miranda and Wanderley 2006). In this way, developers of the systems need to decide which type of gestures may control which sound characteristics, so that the performers can shape and vary their gestures intuitively. Developers also need to consider how gestures are segmented into different units or chunks, in so far as different sound categories are used. Van Nort and colleagues (2014, 7) discuss three concepts of mapping: a “systems” perspective, which describes the alignment between sound parameters and the controller; a ­“functional” perspective that defines the combination of variables and computational

anticipated sonic actions and sounds in performance   49 operations used, for instance, in sound synthesis; and a “perceptual” perspective, addressing the observable linkage between a human gesture and the sound outcome. The perceptual perspective is particularly relevant for notions of intentionality, such that a gesture and the corresponding sounds are perceived as being intentional and meaningful. Thus, also with regard to the audience, the mappings between gesture and sound need to be carefully considered since, in a live performance, the audience expects to perceive the performer as an originator of sonic actions. While the audience knows that a computer produces the actual sounds, they search for mappings between movement and sound so that the sonic action appears “real” to them—even if they are aware that a gestural origination of the sound itself is imagined rather than real. Following Chion’s (1983) concept of “causal listening,” the audience aims at gathering more information on the actions that lead to a certain sound and closely link perception with the corresponding actions, for which they may use their own motor imagery (cf., Caramiaux et al. 2014; Gallese and Goldman 1998). Electronic sounds, rendered from loudspeakers that are typically placed at different spatial positions away from the performer, should, in this regard, still inform the audience about the original actions, and acoustical information should be panned in a way so as to match the visual performance information. The impression of action-sound fusion is furthermore particularly strong if the audience perceives gestural actions as chunks with certain “goal-points” (Godøy 2010, 121) that coincide with musical changes, such as accents for quick downward movements or changes in timbre along with peaks in movement position or acceleration. Expectations that sounds do signify actions in human-controller performances lead to a second issue: performers typically have no haptic feedback in empty-handed gestures performed “in the air.” Thus, for controlling their bodily movements, they rely on a limited set of auditory, and sometimes visual, feedback. Rovan and Hayward (2000) describe the performance process in controller-based music as follows: the performer’s intention initiates a certain gesture, which in turn is gauged by proprioception and vision—hence, without haptic or tactile feedback. Following this, the sound outcome may lead to adjustments of the gestures based on auditory as well as further visual and proprioceptive information, ultimately leading to adjustments of the intentions if necessary. This process is clearly different from the close feedback loop in traditional instrumental performance, which typically allows for a more precise motor control based on haptic feedback. There are certain cases in which temporal latencies are apparent in gesture-driven performances because of the chain of calculations and processes before the actual sound is produced (McPherson et al. 2016). The audience and the performer alike may then perceive a feeling of “disembodiment,” so that the sounds appear rather artificial and not related to the human actions—in short, they are not imagined as proper sonic actions. 
Overcoming latency issues with more robust computing as well as placing the loudspeakers in close proximity to the performance spot, or creating phantom images of the sounds by use of sound wave synthesis or multichannel stereo where appropriate, can enhance the perceived correspondence between gesture and sound, that is, an imaginary fusion of perceived physical presence and sound location.
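As a deliberately simple example of a systems-level mapping of the kind discussed above, the following sketch scales a continuous gesture parameter, such as normalized hand height from a motion tracker, to a MIDI control-change message, with light exponential smoothing to reduce sensor jitter at the cost of a small additional latency. The sensor-reading function, the smoothing factor, and the choice of controller number are illustrative assumptions; only the message constructor of the third-party mido library is an actual API call.

```python
import mido

def hand_height():
    """Hypothetical stand-in for a sensor read: normalized hand height 0.0-1.0."""
    return 0.42

smoothed = 0.0
ALPHA = 0.2  # smoothing factor: lower = steadier but laggier response

def next_control_message(raw, previous):
    """Exponentially smooth the gesture value, then scale it to a 7-bit
    MIDI controller value."""
    value = previous + ALPHA * (raw - previous)
    cc = max(0, min(127, round(value * 127)))
    # CC 74 is conventionally "brightness" (filter cutoff) on many synthesizers.
    return mido.Message("control_change", control=74, value=cc), value

msg, smoothed = next_control_message(hand_height(), smoothed)
print(msg)  # in a live setup this message would be sent to an output port
```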

50   clemens wöllner The disembodiment problem is also apparent when listeners perform actions in the air to accompany music, either to follow the melodic lines and the musical structure (e.g., Hohagen and Wöllner 2015) or by playing “air instruments” (Godøy et al. 2006; cf., Jensenius et al. 2010; Visi et al. 2016). When mimicking or imitating the soundproducing gestures, listeners may still imagine a link between their actions and the sound of the original performance, and the motor involvement might augment their auditory experience. Therefore, it can be assumed that audience members are prone to assign visual gestures to performance sounds (see also Behne and Wöllner 2011), even if they are aware that there is no direct mapping between the two.

Jackets and Gloves: Bespoke Mappings between Gestures and Sound

Gestural performances typically require specific training before satisfactory results are achieved. For example, in order to control the nuances in pitch and dynamics of an aesthetically pleasing vibrato in a Theremin performance, some experience is needed that goes beyond intuitive approaches. There is one traditional musical profession specialized in gestures without an object to be touched (or vocal cords to be stretched), which is musical conducting. Conductors are experienced in shaping the orchestra's sound by their hand and arm movements, and they also need to imagine the musical sounds before the musicians actually perform them on their instruments. In other words, they use their anticipatory imagery to transmit their intentions to the musicians by means of gestures. For these reasons, one could assume that conductors should also be particularly skilled in performing controller-driven music. Nakra (2000, 2002) developed the conductor's jacket together with Tod Machover and others at MIT. The jacket is a sensory interface that allows for a wide range of different expressive movements in real-time performances. For the construction of the jacket, the performance data of six professional conductors were analyzed, including their muscle activity, respiration patterns, heart rate, galvanic skin response, and body temperature, as well as changes in the arms' spatial position via motion markers. These data were then compared with the visual information and the musical structure. The next step was to use the data and, partly intuitively, implement them in music synthesis software. For example, the movements of the right arm indicated beat patterns and those of the left arm controlled expressive qualities in accordance with long-established approaches to conducting (cf., Wöllner and Auhagen 2008). In addition to these elements based on position and acceleration data, changes in the muscle tension influenced the dynamics, whereas breathing patterns had an impact on phrasing. In the final version of the conductor's jacket, electromyographic (EMG) sensors and a device for the measurement of breathing patterns were integrated. The software converted all sensory information into MIDI data (Rogus MIDI library) in real time, and employed filters for the final sound output. In this way, the conducting gestures

anticipated sonic actions and sounds in performance   51 were  transferred into musical parameters such as sound onset and duration, tempo, articulation, and dynamics. Although the system was based on the movement patterns of actual conductors, Nakra (2002) saw limits when used for the training of conductors, which requires more complex interactions between musicians and conductor. It should thus instead be used as a new, stand-alone instrument that allows different sounds to be produced, and for which specific pieces should be written or arranged. Among the pieces composed for the conductor’s jacket, Etude 2 (Nakra 2000) used algorithms in which the EMG signal of the right biceps alone controlled pitch, volume, and timbre at the same time. The more the muscle contracted, the more the pitch height and the intensity level increased. In addition, the sound spectrum was altered such that the timbre appeared to be brighter. When the contractions of the biceps were overtly shown by arm movements, direct mappings between gesture and sound qualities became immediately apparent for the audience and the performer alike. In contrast, traditional conducting involves sometimes rather small gestures or blinks of an eye that can lead to large effects such as an entrance of the whole orchestra—moments in which the audience may doubt that the conductor “evokes” the sound. Besides the conductor’s jacket, there are several other performance systems using EMG sensors (see, among others, Donnarumma and Tanaka 2014; Nymoen et al. 2015). A large number of gesture-driven controllers employ motion-capture technology to direct digital instruments (e.g., Dobrian and Bevilacqua’s Motion Capture Music 2003). Among the commercial systems most widely used in various applications and performance art is the Kinect system, which was developed for Microsoft’s Xbox game console. The bodily motion of one or more performers can be tracked with video technology using an RGB color camera and a depth sensor with infrared lighting, therefore no markers are necessary as in other motion-capture systems. The advantage of the system clearly lies in its usability, whereas caveats include the somewhat arbitrary modeling of the body and the latency, resulting in less precise results compared to multiple-camera motion-capture systems. Still, it is possible to capture some position and motion data of performers and to map them to sounds. Various toolkits have been developed for Kinect, one of the first action-to-sound applications among them is Crossole (Sentürk et al. 2012). In this digital instrument, virtual building blocks are visualized in a way that a performer may control them, including chord structure, arpeggios, and timbre. The musical style is limited to harmonies of the Western classical tradition, while the timbre control consists of various sound effects including delays and filters. In addition, the contour of the melodic lines can be drawn by gestures. While artistic approaches such as Crossole strongly rely on visual representations of preset spatial positions for controlling the musical output, other systems allow for more flexibility in shaping the sound qualities. For example, data gloves capture the fine-tuned motion of the hands by use of markers or accelerometers. One of the earliest data gloves gained information from optical finger-flex sensors and position sensors in order to train a neural network of mappings between hand gestures and a vocabulary of 203 words (Fels and Hinton 1993). 
Mapping results after training were astonishingly high, and later

Figure 2.2  The Musicglove (Hayafuchi and Suzuki 2008, 242). The figure shows a glove fitted with an acceleration sensor and strain sensors, one of them on the inner surface of the wrist.

versions were also used for music performances (Fels et al. 2002), in which the audience should perceive metaphorical relationships between expressive gestures and sound. A different set of gloves were constructed by Laetitia Sonami and collaborators. The first version of their Lady’s Glove was invented in 1991, combining several transducers at the fingers and a magnet worn on the other hand to control synthesizers via MIDI. Later versions, mapped to MAX-MSP, allowed for more flexible sound control and intuitive gesture-sound mappings (Rodgers 2010). Hayafuchi and Suzuki (2008) developed the Musicglove (Figure 2.2), a device with accelerometers and strain sensors that measure the bending of the wrist and some fingers. A number of set gestures control the musical tracks: for instance, making a fist pauses the music. Overall acceleration is mapped to the tempo and vertical acceleration more specifically to the tempo of the beats. Therefore, the sound of preexisting music can be controlled with the gestures. One potential application includes the control of dance music by a DJ, where the gestures could even be more convincing for an audience when compared to conventional controlling devices such as a laptop or a turntable. Perhaps the expressive performances of Imogen Heap made controllers, and data gloves in particular, more popular with audiences even of popular music genres. The Mi.Mu Gloves (see Mitchell and Heap 2011) contain bend sensors, accelerometers, and a gyroscope as well as tactile feedback to the performer via vibrations. A large number of preset audio control options are possible that can be incorporated in various performance genres including dance, in which the mapping between human motion and sounds may seem particularly impressive.
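To illustrate how such preset gesture vocabularies can be turned into playback control, here is a small rule-based sketch loosely modeled on the Musicglove description above. The thresholds, scaling factors, and the dictionary-style output are invented; the actual device's mappings are more elaborate.

```python
def control_playback(fist_closed, overall_accel, vertical_accel,
                     base_tempo=120.0):
    """Map glove data to playback commands: a closed fist pauses the music,
    overall acceleration scales the tempo, and vertical acceleration drives
    the beat emphasis. Values and scaling are invented for illustration."""
    if fist_closed:
        return {"action": "pause"}
    tempo = base_tempo * (1.0 + 0.25 * overall_accel)   # accel normalized 0-1
    beat_level = min(1.0, vertical_accel)
    return {"action": "play", "tempo_bpm": round(tempo, 1),
            "beat_emphasis": round(beat_level, 2)}

print(control_playback(fist_closed=False, overall_accel=0.4, vertical_accel=0.7))
print(control_playback(fist_closed=True, overall_accel=0.0, vertical_accel=0.0))
```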

Conclusions

Musical performers in a wide range of genres benefit from vivid auditory and motor imagery. Being able to perform the music in their minds' ears enables performers to anticipate their own and other musicians' sonic actions and to shape the sound quality. In the absence of sound or in situations with altered or delayed auditory feedback, musicians depend on imagery skills to control their bodily performances. In digital music performance, with its numerous features for shaping and combining sounds, precise imagery clearly helps in reaching the desired sound outcome. Even more, controller-driven

anticipated sonic actions and sounds in performance   53 performances often fascinate the audience if a close mapping between gesture and sound—an apparent fusion—is achieved. In these cases, the audience imagines the sound to originate in the performer’s movements as sonic actions. The anticipatory processes in auditory imagery and the level of detail of imagined sound qualities, however, remain areas for further inquiry. Spectral components of the sound quality are used to distinguish between performances; they may characterize a performer’s “fingerprint,” that is his or her individual approach to sound (Bernays and Traube 2014). At the same time, timbral qualities still appear rather elusive in descriptive systems. In most empirical studies of music performances, microtiming information sufficed for distinctions between sonic actions. In addition to timing, the role of timbre in anticipatory imagery and the shaping of sounds are particularly important for instruments and genres that rely more strongly on nuances in sound qualities, and thus deserve further study. Imagery skills vary among musicians according to their background and expertise, depending on whether or not the music is played by heart, in ensembles with or without notation, or in improvisations. Research summarized in this chapter has shown that pianists who typically play their repertoire in a memorized way often develop par­ ticularly vivid auditory imagery. Their performances are not interrupted if the sound cannot be heard, for instance when it is switched off at MIDI pianos. Yet, for all musical performers, it is paramount to imagine sonic events as an outcome of their own actions. They need high degrees of vivid auditory imagery and motor awareness in order to fine-tune the desired sound qualities in a performance and to adjust their play if necessary, based on an auditory and motor error-feedback system. In this regard, research on action–perception coupling indicated that musicians have a strong sense of agency for their sonic actions. The neural processes underpinning imagery and perception are fundamentally similar, and both may activate overlapping action networks in performers, such that even listening to the sounds of music performances resonates with the performer’s action systems. Being aware of one’s own sonic actions, and vivid auditory imagery skills, occasionally appears even more pertinent to musicians than hearing the actual acoustical sounds. Apart from the widespread consumption of heavily compressed audio files, anecdotal evidence suggests that experienced musicians are able to ignore shortcomings of sound recordings and renditions; some of them may even not need high-fidelity sound systems and can still appreciate the music to some extent by compensating for, and imagining, the missing sound features. Similarly for performing, as discussed earlier, imagery can become more important than actual sonic feedback during playing, especially in the absence or alteration of sound. The relative dependencies of performers on imagery or feedback remain a topic for more research: in other words, whether musicians primarily concentrate on the imagined sounds during the act of performing as an intended action outcome, or whether they rely more strongly on a variety of different feedback modalities, including kinesthetic and visual components (cf., the internal models discussed earlier). 
While performance plans may become more stable if they draw on multimodal sensory stimuli, focusing one’s attention only on the sound as an external action goal,

thus concentrating less on internal bodily processes, has been shown to be efficient in various fields (for an overview, see Wulf 2007). The anticipation of sonic actions, as stated earlier, might rely solely on imagined sounds that are to be produced. These processes appear to be particularly vital in gestural performances of live electronics, in which the multisensory feedback loop of acoustical instruments is often absent. Real-time processing, spatial placement, and sound source control are central to close mappings and perceived fusion in experiences of human sonic actions. New developments in performance interfaces address these issues by modifying gesture-to-sound mappings that offer captivating links between bodily actions and sounds as an intentional, meaningful process for performers and for the perceptions and imaginations of audiences.
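The kind of gesture-to-sound mapping discussed above can be made concrete in a short sketch. The following example is an editorial illustration rather than part of the chapter: it assumes a hypothetical controller reporting two normalized gesture features (hand height and grip pressure) and maps them directly onto pitch and loudness, the sort of close, low-latency coupling that supports the perceived fusion of action and sound.

# Illustrative sketch of a direct ("close") gesture-to-sound mapping.
# The sensor names, ranges, and parameter choices are assumptions made
# for this example, not taken from the chapter or any specific controller.

def map_gesture_to_sound(hand_height: float, grip_pressure: float) -> dict:
    """Map normalized gesture features (0.0 to 1.0) onto synthesis parameters.

    hand_height   -> pitch, on an exponential (perceptually even) scale
    grip_pressure -> loudness, clamped to a linear 0.0 to 1.0 range
    """
    low_hz, high_hz = 110.0, 880.0  # a three-octave range starting at A2
    frequency = low_hz * (high_hz / low_hz) ** hand_height
    amplitude = max(0.0, min(1.0, grip_pressure))
    return {"frequency_hz": frequency, "amplitude": amplitude}

if __name__ == "__main__":
    # A rising hand with a tightening grip yields a rising, louder tone.
    for step in range(5):
        gesture = {"hand_height": step / 4, "grip_pressure": 0.2 + 0.2 * step}
        print(gesture, "->", map_gesture_to_sound(**gesture))

A one-to-one mapping of this kind is easy for an audience to read as a sonic action; the Hunt, Wanderley, and Paradis (2003) entry in the reference list below addresses the further question of how more complex parameter mappings affect expressivity.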

References Aglioti, S. M., and M. Pazzaglia. 2010. Representing Actions through Their Sound. Experimental Brain Research 206: 141–151. Agnew, M. 1922. The Auditory Imagery of Great Composers. Psychological Monographs 31: 279–287. Alaeerts, K., S. P. Swinnen, and N. Wenderoth. 2009. Interaction of Sound and Sight during Action Perception: Evidence for Shared Modality-Dependent Action Representations. Neuropsychologia 47: 2593–2599. Bailes, F. 2007. Timbre as an Elusive Component of Imagery for Music. Empirical Musicology Review 2: 21–34. Bangert, M., T. Peschel, G. Schlaug, M. Rotte, D. Drescher, H. Hinrichs, et al. 2006. Shared Networks for Auditory and Motor Processing in Professional Pianists: Evidence from fMRI Conjunction. NeuroImage 30: 917–926. Banton, L.  J. 1995. The Role of Visual and Auditory Feedback during the Sight-Reading of Music. Psychology of Music 23: 3–16. Behne, K.-E., and C. Wöllner. 2011. Seeing or Hearing the Pianists? A Synopsis of an Early Audiovisual Perception Experiment and a Replication. Musicae Scientiae 15: 324–342. Bernays, M., and C. Traube. 2014. Investigating Pianists’ Individuality in the Performance of Five Timbral Nuances through Patterns of Articulation, Touch, Dynamics, and Pedaling. Frontiers of Psychology 5: 157. Bishop, L., F. Bailes, and R. T. Dean. 2013. Musical Imagery and the Planning of Dynamics and Articulation during Performance. Music Perception 31: 97–117. Brereton, J. 2017. Music Perception and Performance in Virtual Acoustic Spaces. In Body, Sound and Space in Music and Beyond: Multimodal Explorations, edited by C. Wöllner, 211–234. Abingdon, UK: Routledge. Brown, R. M., and C. Palmer. 2013. Auditory and Motor Imagery Modulate Learning in Music Performance. Frontiers in Human Neuroscience 7: 320. Caramiaux, J. F., N. Schnell, and F. Bevilacqua. 2014. Mapping through Listening. Computer Music Journal 38: 34–48. Chion, M. 1983. Guide des objets sonores: Pierre Schaeffer et la recherche musicale. Paris: Buchet/Chastel. Clark, T., and A.  Williamon. 2011. Evaluation of a Mental Skills Training Program for Musicians. Journal of Applied Sport Psychology 23: 342–359.

anticipated sonic actions and sounds in performance   55 Cocks, M., C.-A. Moulton, S. Luu, and T. Cil. 2014. What Surgeons Can Learn from Athletes: Mental Practice in Sports and Surgery. Journal of Surgical Education 71: 262–269. Connolly, C., and A. Williamon. 2004. Mental Skills Training. In Musical Excellence: Strategies and Techniques to Enhance Performance, edited by A. Williamon, 221–245. Oxford: Oxford University Press. Daselaar, S.  M., Y.  Porat, W.  Huijbers, and C.  M.  Pennartz. 2010. Modality-Specific and Modality-Independent Components of the Human Imagery System. Neuroimage 52: 677–685. Dobrian, C., and F.  Bevilacqua. 2003. Gestural Control of Music using the Vicon Motion Capture System. In Proceedings of the New Interfaces for Musical Expression Conference, 161–163. May 22–24, 2003, Montréal, Quebec, Canada. Donnarumma, M., and A.  Tanaka. 2014. Principles, Challenges and Future Directions of Physiological Computing for the Physical Performance of Digital Musical Instruments. In Proceedings of the 9th Conference on Interdisciplinary Musicology (CIM14), edited by T. Klouche and E. R. Miranda, 363–368. Berlin, Germany: Staatliches Institut für Musikforschung. Driskell, J. E., C. Copper, and A. Moran. 1994. Does Mental Practice Enhance Performance? Journal of Applied Psychology 97: 481–492. Fels, S. S., A. Gadd, and A. Mulder. 2002. Mapping Transparency through Metaphor: Towards More Expressive Musical Instruments. Organised Sound 7 (2): 109–126. Fels, S. S., and G. E. Hinton. 1993. Glove-Talk: A Neural Network Interface between a DataGlove and a Speech Synthesizer. IEEE Transactions on Neural Networks 4 (1): 2–8. Finney, S. A. 1997. Auditory Feedback and Musical Keyboard Performance. Music Perception 15: 153–174. Gallese, V., and A.  Goldman. 1998. Mirror Neurons and the Simulation Theory of MindReading. Trends in Cognitive Sciences 2: 493–501. Gjerdingen, R. O., and D. Perrott. 2008. Scanning the Dial: The Rapid Recognition of Music Genres. Journal of New Music Research 37 (2): 93–100. Godøy, R.  I. 2010. Gestural Affordances of Musical Sound. In Musical Gestures: Sound, Movement, and Meaning, edited by R.  I.  Godøy and M.  Leman, 103–125. New York: Routledge. Godøy, R.  I., E.  Haga, and A.  R.  Jensenius. 2006. Playing “Air Instruments”: Mimicry of Sound-Producing Gestures by Novices and Experts. In Gesture in Human-Computer Interaction and Simulation: 6th International Gesture Workshop, edited by S. Gibet, N. Courty, and J.-F. Kamp, 256–267. Berlin: Springer. Grey, J. M. 1975. An Exploration of Musical Timbre. PhD thesis, Department of Psychology, Stanford University. Halpern, A. R., R. J. Zatorre, M. Bouffard, and J. A. Johnson. 2004. Behavioral and Neural Correlates of Perceived and Imagined Musical Timbre. Neuropsychologia 42: 1281–1292. Hayafuchi, K., and K. Suzuki. 2008. Musicglove: A Wearable Musical Controller for Massive Media Library. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME) 8, 241–244. 5–7 June 2008, Genova, Italy. Herholz, S. C., C. Lappe, A. Knief, and C. Pantev. 2008. Neural Basis of Musical Imagery and the Effect of Musical Expertise. European Journal of Neuroscience 28: 2352–2360. Highben, Z., and C.  Palmer. 2004. Effects of Auditory and Motor Mental Practice in Memorized Piano Performance. Bulletin of the Council for Research in Music Education 159: 58–65. Hohagen, J., and C. Wöllner. 2015. Self-Other Judgements of Sonified Movements: Investigating Truslit’s Musical Gestures. 
In Proceedings of the Ninth Triennial Conference of the European

Society for the Cognitive Sciences of Music, August 17–22, Royal Northern College of Music, Manchester. Hubbard, T. L. 2010. Auditory Imagery: Empirical Findings. Psychological Bulletin 136: 302–329. Hugill, A. 2012. The Digital Musician. 2nd ed. New York: Routledge. Hunt, A., M. M. Wanderley, and M. Paradis. 2003. The Importance of Parameter Mapping in Electronic Instrument Design. Journal of New Music Research 32: 429–440. Janata, P., and K. Paroo. 2006. Acuity of Auditory Images in Pitch and Time. Perception and Psychophysics 68: 829–844. Jeannerod, M. 2003. The Mechanism of Self-Recognition in Humans. Behavioural Brain Research 142: 1–15. Jensenius, A. R., M. M. Wanderley, R. I. Godøy, and M. Leman. 2010. Concepts and Methods in Research on Music-Related Gestures. In Musical Gestures: Sound, Movement, and Meaning, edited by R. I. Godøy and M. Leman, 12–35. New York: Routledge. Kalakoski, V. 2001. Musical Imagery and Working Memory. In Musical Imagery, edited by R. I. Godøy and H. Jørgensen, 43–55. Lisse, the Netherlands: Swets and Zeitlinger. Keller, P. E. 2012. Mental Imagery in Music Performance: Underlying Mechanisms and Potential Benefits. Annals of the New York Academy of Sciences 1252: 206–213. Keller, P. E., and M. Appel. 2010. Individual Differences, Auditory Imagery, and the Coordination of Body Movements and Sounds in Musical Ensembles. Music Perception 28: 27–46. Keller, P. E., S. Dalla Bella, and I. Koch. 2010. Auditory Imagery Shapes Movement Timing and Kinematics: Evidence from a Musical Task. Journal of Experimental Psychology: Human Perception and Performance 36: 508–513. Knoblich, G., and W. Prinz. 2001. Recognition of Self-Generated Actions from Kinematic Displays of Drawing. Journal of Experimental Psychology: Human Perception and Performance 27: 456–465. Krumhansl, C. 1989. Why Is Musical Timbre So Hard to Understand? In Structure and Perception of Electroacoustic Sound and Music, edited by S. Nielzen and O. Olsson, 43–53. Amsterdam: Elsevier. Krumhansl, C. L. 2010. Plink: “Thin Slices” of Music. Music Perception 27: 337–354. McAdams, S., S. Winsberg, S. Donnadieu, G. de Soete, and J. Krimphoff. 1995. Perceptual Scaling of Synthesized Musical Timbres: Common Dimensions, Specificities, and Latent Subject Classes. Psychological Research 58 (3): 177–192. McPherson, A., R. H. Jack, and G. Moro. 2016. Action-Sound Latency: Are Our Tools Fast Enough? In Proceedings of the International Conference on New Interfaces for Musical Expression, 12–14 July 2016, Brisbane, Australia. Miranda, E. R., and M. M. Wanderley. 2006. New Digital Musical Instruments: Control and Interaction Beyond the Keyboard. Madison, WI: A-R Editions. Mitchell, T. J., and I. Heap. 2011. SoundGrasp: A Gestural Interface for the Performance of Live Music. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), 465–468. 30 May–1 June 2011, Oslo, Norway. Nakra, T. M. 2000. Inside the Conductor’s Jacket: Analysis, Interpretation and Musical Synthesis of Expressive Gesture. PhD thesis, MIT. Nakra, T. M. 2002. Synthesizing Expressive Music through the Language of Conducting. Journal of New Music Research 31: 11–26. Nymoen, K. H., M. R. Haugen, and A. R. Jensenius. 2015. MuMYO: Evaluating and Exploring the MYO Armband for Musical Interaction. In Proceedings of the International Conference on New Interfaces for Musical Expression, edited by E. Berdahl. Baton Rouge: Louisiana State University.

anticipated sonic actions and sounds in performance   57 Pfordresher, P.  Q. 2006. Coordination of Perception and Action in Music Performance. Advances in Cognitive Psychology 2: 183–198. Rauschecker, J. P. 2011. An Expanded Role for the Dorsal Auditory Pathway in Sensorimotor Control and Integration. Hearing Research 271: 16–25. Repp, B. H., and G. Knoblich. 2004. Perceiving Action Identity: How Pianists Recognize Their Own Performances. Psychological Science 15: 604–609. Reuter, C., and S. Siddiq. 2017. The Colourful Life of Timbre Spaces: Timbre Concepts from the Early Ideas to a Meta Timbre Space and Beyond. In Body, Sound and Space in Music and Beyond: Multimodal Explorations, edited by C. Wöllner, 150–167. Abingdon, UK: Routledge. Rodgers, T. 2010. Pink Noises: Women on Electronic Music and Sound. Durham, NC: Duke University Press. Rovan, J., and V. Hayward. 2000. Typology of Tactile Sounds and Their Synthesis in GestureDriven Computer Music Performance. In Trends in Gestural Control of Music, edited by M. Wanderley and M. Battier, 355–368. Paris: IRCAM. Sentürk, S., S. W. Lee, A. Sastry, A. Daruwalla, and G. Weinberg. 2012. Crossole: A Gestural Interface for Composition, Improvisation and Performance using Kinect. In Proceedings of the International Conferences on New Interfaces for Musical Expression (NIME). 21–23 May 2012, University of Michigan, Ann Arbor. Sevdalis, V., and P. E. Keller. 2014. Know Thy Sound: Perceiving Self and Others in Musical Contexts. Acta Psychologica 152: 67–74. Slawson, W. 1985. Sound Color. Berkeley: University of California Press. Stockhausen, K. 1963. Musik und Raum. In Texte zur elektronischen und instrumentalen Musik, Band I, 159–160. Cologne, Germany: DuMont. Synofzik, M., G. Vosgerau, and A. Newen. 2008. I Move, Therefore I Am: A New Theoretical Framework to Investigate Agency and Ownership. Consciousness and Cognition 17: 411–424. Theremin, L. S. 1996. The Design of a Musical Instrument based on Cathode Relays. Leonardo Music Journal 6: 49–50. Van Nort, D., M.  Wanderley, and P.  Depalle. 2014. Mapping Control Structures for Sound Synthesis: Functional and Topological Perspectives. Computer Music Journal 38: 6–22. Visi, F., E.  Coorevits, R.  Schramm, and E.  R.  Miranda. 2016. Analysis of Mimed Violin Performance Movements of Neophytes. In Music, Mind, and Embodiment: 11th International Symposium on Computer Music Multidisciplinary Research, edited by R. Kronland-Martinet, M. Aramaki, and S. Ystad, 88–108. Cham, Switzerland: Springer. Wöllner, C. 2012. Self-Recognition of Highly Skilled Actions: A Study of Orchestral Conductors. Consciousness and Cognition 21: 1311–1321. Wöllner, C., and W.  Auhagen. 2008. Perceiving Conductors’ Expressive Gestures from Different Visual Perspectives: An Exploratory Continuous Response Study. Music Perception 26: 129–143. Wöllner, C., and A. Williamon. 2007. An Exploratory Study of the Role of Performance Feedback and Musical Imagery in Piano Playing. Research Studies in Music Education 29: 39–54. Wulf, G. 2007. Attention and Motor Skill Learning. Champaign, IL: Human Kinetics.

chapter 3

Motor Imagery in Perception and Performance of Sound and Music
Jan Schacher

Music exists at the intersection of organised sounds with our sensorimotor apparatus, our bodies, our brains, our cultural values and practices, music-historical conventions, our prior experiences, and a host of other social and cultural factors. Consequently, musical motion is really experienced by us, albeit via our imaginative structuring of sounds. —(Johnson 2007, 255)

Introduction Audition is one of our central senses, and listening is deeply integrated into how we perceive and act in the world. In our perception, sound plays a central role for the construction of a coherent, multimodal “world-view.” Hearing and listening are tied together with all other senses in low-level, subpersonal, and prereflective relationships that are established in interaction with others, with the goal of meaning generation in higher cognitive functions. The central element used for making sense of acoustic information and for interpreting sounds is the body with its indivisible being-in-the-world, its capacity for action, and its role as substrate for cognitive processes. Sound comprises any acoustic phenomenon we might perceive, whereas music is a constrained field, operating within combinations of sounds that are human-made and culturally coded. Sound perception operates on numerous levels, from providing evolutionary survival cues, to carrying core elements of interpersonal exchanges, to enabling culturally encoded

music appreciation. The manner in which elements of primary, low-level “enactive” sound perception and listening form the central aspects that constitute the field of musical perception is to be the topic of this chapter. This situates the reflection between the poles of sound and music, between the states of active and passive perception, and performance of sound and music. The practice of music performance provides an interesting case where these principles can be observed in action. When highly technical instruments come into play, this relationship is put to the test, because physiological and cognitive schemata are replaced by metaphors and representations of instruments. The increasing pervasiveness of technology in music making and listening makes this issue relevant and even urgent to understand. The relationship between the body and music is, in a way, quite straightforward: Performing music is a voluntary act and normally involves physical movement; perceiving music, on the other hand, can be motionless and involuntary. These two modes are never completely separable; they occur simultaneously to different degrees both for the musician and the listener. That the performer also listens to what is played goes without saying, but does the listener also perform what is heard? How does motionless aural perception involve the body of the person partaking in a performance? And how does the performer imagine and initiate musical actions with the body? The perspectives taken in this chapter are influenced by the phenomenological thinking of Merleau-Ponty (1945), who postulates an indivisible unity between body and perception in the construction of the world. Based on this, the “enactive” position claims that it is this link that produces cognition: “A living organism enacts the world it lives in; its effective, embodied action in the world actually constitutes its perception and thereby grounds cognition” (Stewart et al. 2010, vii). In other words, the fact that perception is guided by action in continuously reinforcing loops that leave imprints in the organism might be what produces cognitive structures (Varela et al. 1991, 173). The topic of musical motor imagery represents a critical perspective on music making by an individual, or “musicking” (Small 1999), in the social situation of a concert, by many. Here the domains of evolutionary, psychological, cultural, and social significance intersect and permit the linking and juxtaposition of current approaches in musicology, psychology of music, and philosophies of mind. Motor imagery and imagination in music as the focal points provide a lens through which to investigate the relationships between the inner and outer aspects of listening to and performing music and sound. The elements of agency and intentionality, precognitive and conscious corporeal perception, and explicit and covert imitation play a central role in this constellation. They inform the way a musical action is effective on both performer and listener and to where or what the sources of musical events are ascribed by the perceiver. They also shape the role that sounds and sound-objects play in constituting a prereflective perception of musical dynamics and expressive meaning as well as the sonic environment in general. Imitation or simulation, as our body’s strategy to understand the other, extends to sounds, to this invisible yet signaling domain.
The mimicry involves not just the sounding aspect itself, but also the sound-producing processes, that is, the actions that produce the sound. Recognition is based on subpersonal and prereflective re-enacting

or simulation of the sound-originating event, in an approximation that uses all available means, first of all with the voice but also with the other movement capabilities of the body. In the culturally charged space of music, instruments and modes of sounding with devices, materials, and processes are projected (imagined) and simulated internally when not witnessed in actual performance. The body’s relation to the instrument, that is, the sound-producing mechanism or object, as well as the subpersonal and precognitive processes that are engaged in sound making, creates a site of intertwining, enfolding, and charging that constitutes the basis for sound- and music-related imagery and imagination. One of the aims of this chapter is to understand how the naturally acquired skills and knowledge about corporeal and physical object-relationships come to bear in digital, technological modes of music performance and perception. Interactions with digital instruments that contain intangible sound production processes are based on the bodily skills of acting on and with physical tools and instruments. The recognition or imagination of synthetic electronic sounds is dependent on our embodied and tacit knowledge of sound sources in the natural world even when they have no real-world counterpart. That is why knowledge of conventional instruments, of their sound, physics, and their modes of playing is crucial for imagining musical sounds and represents an important part of our cultural heritage.

Sound, Music, and the Body In addition to being a natural phenomenon that can be explicated by physics’ wave equations and psychoacoustics’ understanding of our perceptual apparatus, sound carries a deeper meaning of what is “out there” or what is beyond our self. Since we hear events and not sounds (Gaver 1993) as the invisible, enveloping presence of exterior actions, it lets us witness in an immediate and fleeting, ephemeral manner the others, be they natural, cultural, or subjective agents. Sound is transmitted through the medium of air (or water, bones, or other materials), touches us physically, and with its energy sets our body in resonance. Sound then touches us perceptually, providing us with involuntary (affective) reactions, as well as with recognizable or novel impressions. We perceive sound on subpersonal as well as conscious levels and through it we can be alarmed, intrigued, informed; we reach out to investigate. Hearing and listening as modes of (dis-)engagement are always mutually inclusive (except during sleep) and simul­ taneously let us gauge, weight, and focus our own sonic perceptions and what it is we hear. In culture, the role of sound is superseded by music and speech, which are means for the active and conscious creation and structuring of sounds into units and combinations of meaning. However, music is in no way decoupled from sound perception, from the primary affective impact it has on the body and, through it, on our emotional and cognitive appraisal. General cultural categories and practices, such as rituals (Turner 1982), theatrical performances (Schechner 1977), and play (Huizinga 1955), separate sound from music, make it subservient to the latter, in the same way as in music stories, language(s),

symbolic, and semantic elements subjugate the primary elements of timbre, pulse, rhythm, consonance or dissonance, and the sonic spaces that are created with the use of voices and instruments. Through its structuring movement and its relational meaning-construction, music functions as an ordering principle and as a conveyor of fixed (and therefore recognizable) sounding tropes and affects. Cultural practices—that is, musical styles that avoid the tendency of fixing all elements—have the potential to reflect and mirror back the perception of music to a primary sensorial experience of the ephemeral, both through listening and through perceiving by other corporeal means, such as kinesthesia or proprioception. Other cultural practices, such as popular and club music, leverage fixity and use expectations (Huron 2006) and musical schemata to access a mode of perception that need not reside primarily in listening, and rather occurs through co-performing; for example, by dancing and similar modes of physical partaking in the music. In all these cases, the body’s characteristics and its capacities to resonate, re-enact and “re-member” physically provide the foundation for the sonic experience. Even in phonographically mediated music or sound experiences (Walther-Hansen 2012), the body provides a point of reference—even if only through its absence in the perceptual field when listening to a recording. And even sounds with clearly identified natural causes that do not involve an active subject, be it human or nonhuman, such as water or wind noises, provoke a physiological, affective response in the body. Here again, the “enactive” position postulates that cognition is a function of being bodily in the world, and that having a body is what enables us to develop experiential structures that exhibit meaning (Noë 2004). The “autopoietic” position of embodied action is stated clearly by Varela and colleagues: By . . . embodied we mean to highlight two points: . . . cognition depends upon the kinds of experience that come from having a body with various sensorimotor capacities, . . . these individual sensorimotor capacities are themselves embedded in a more encompassing biological, psychological, and cultural context. By . . . action we mean to emphasize . . . that sensory and motor processes, perception and action, are fundamentally inseparable in lived cognition. . . . the enactive approach consists of two points: (1) perception consists in perceptually guided actions and (2) cognitive structures emerge from recurrent sensorimotor patterns that enable action to be perceptually guided.  (Varela et al. 1991, 173)

Applied to my topic, it is fair to say that without the body’s materiality, the body’s ability to produce sound itself, we would not be able to perceive, let alone identify and give identity to sound, as far abstracted from its moment of production it might be, or however tenuous our perceptual link to it might be (Voegelin 2010, 82). The subjective identification with sounds and their origins, through their presence in the same moment as the perceiving subject, connects immediately but without fixity: it remains fluid and fleeting, and needs to be (re-)activated continuously. When perceiving events in sound, we perceive other bodies, agents, and actants (Latour 2005) that are of the same kind and have the same capacities as our own body

and that the body only understands when they fit with a preexisting experience, a predisposition to resemble, to be equivalent: a resonance. The body is the site of experience, the site of fusion between senses, perceptions, and memories, the site of cognition. As a substrate and foundation, it carries cultural schemata of sounding, of “eventing”1 (Ihde 2007, 109) with sounds in an affective, interpersonal, protolinguistic, and even musical way. These schemata complement those of the body itself, of its immediate learning and imprinting, and expose the potential to approximate relationships between sound and event through other means. Music performance encapsulates and charges sonic perception with cultural dimensions, yet depends essentially on the perceptual and subpersonal capabilities of an “enactive,” embodied intertwining with the sounding world, and on an experiential, personal, and interindividual link to sound in a cultural context. In addition, personal aspects of the performative construction of the self (Butler 1988) contribute the elements of gender, age, social position, and other biographical factors that further color the act of music performance and perception. The nature of musical processes is a dynamic flow, not simply of time but also of elements constituted of bodily actions that produce distinct sound impressions and carry immediate ecological meaning in a prereflective corporeal domain, before rising to a protolinguistic, presemantic, or adaptive semantic level (Reybrouck 2006). Within musical perception, the processes we are affected by, perceive, and act out are made by dynamic chains of sound-objects as well as by action–sound pairs or multimodal “gestural sonorous objects” (Godøy 2006). These elements form “segregated streams and objects that lead, via the subjective sensing of the subject’s body motion, to impressions of movement, gesture, tensions, and release of tension” (Leman and Camurri 2006, 212–213). As musicians perform, they construct a temporally unfolding stream of movement dynamics that the listener-viewer reenacts and co-performs through kinesthetic, corporeal resonances and higher-order dynamic sensing. This state of active engagement is more akin to moving oneself than to sounding within oneself.

Mimetic Motor Imagery and the Sense of Agency The field of music psychology investigates cognitive, neural, and behavioral aspects of music perception and performance actions. Empirical studies address the understanding of one’s own and others’ actions (Jeannerod 2006), for example, by investigating self-recognition (Sevdalis and Keller 2010) and co-performing with other musicians (Keller 2008). In his mimetic motor imagery hypothesis, Cox brings together a number of elements that demonstrate and anchor in empirical research how imitative, simulation-based re-enacting of sound-producing movements lies at the core of music perception: “Motor

imagery is imagery related to the exertions and movements of our skeletal-motor system, and in the case of music this involves the various exertions enacted in musical performance” (Cox 2011). With his hypothesis, he attempts to show that, within the embodied perspective of music perception, the corporeal sensorimotor apparatus and its neurological equivalents form an indispensable foundation for perceiving, recognizing, and conceptualizing musical events. As with sound perception, mimetic motor imagery is based on the perception of an event, rather than motion or sound in their raw form. These events carry with them the physicality of their origin, are evidence of the actions that produced them, and can be comprehended via overt and covert imitation through bodily representations of observed actions in the same modality, in a cross-modal way, or even in an amodal, metaphorical manner. The bodily mimetic responses allow for the representation of all perceived acoustic features, albeit translated to those capabilities that are present in the motor system (e.g., the voice or movements) (Cox 2001). Giving an intentional perceptual focus (attention) to an ongoing event generates in the body–mind an observational state that attempts to understand the unfolding event. Through the recognition of intentions and goal-directedness the event gets ascribed to an agent, be it human, biological, or an unknown entity. The primary reason for the sensing subject to perceive agency is to understand the possible goals and implications of the perceived action. In music perception, these goals need not be complex cultural achievements; they can be relatively low-level, single units of meaning, such as a single musical phrase, individual tone, or rhythm element, or the endpoint of a single gesture trajectory. The identification of goals and goal-points within the action-perception loop is an essential part of anticipatory perception, of the capability to project an external willed action, in order to understand its finality and how it affects the perceiving subject. In music, the temporal aspect of the perceived events is particularly important, since meaning arises out of a continuous stream of events rather than simultaneous, momentary perception of compact objects. This speaks to our evident ability to have prospective as well as retrospective images of musical sound, that is, as performer or improvisers we are able, “in a split second” to overview what we have just played, what we are playing now, and what we will be playing in the near future.  (Godøy and Leman 2010, 121)

The understanding of goals is achieved by the comparison of established body-schematic patterns against the perceived events, either in an identical modality or a surrogate one (Smalley 1996), in order to generate a bodily impression or affective resonance through imitation (Enticott et al. 2010; Montgomery et al. 2007; Lotze et al. 2006). “Individuals recognize actions made by others because the neural pattern elicited in their premotor areas during action observation is similar to that internally generated to produce that action” (Rizzolatti and Arbib 1998, 190). The re-enacting mechanisms are subpersonal, prereflective processes that represent a way for the perceiving organism to inquire what certain actions and states feel like and

to obtain in response a body-schematic resonance that might lead to open mimicry through movement (e.g., dancing or foot-tapping) or to inhibited forms of movement participation through prereflective inner motor imagery alone (Rizzolatti 2012). In effect, it is as if we are responding to an invitation to somehow imitate and to thus take part. Accordingly, we can speak of the performing arts as offering a mimetic invitation, and we can speak of our various responses as mimetic engagement or mimetic participation, whether in the form of overt movement or in the privacy of covert imagery.  (Cox 2011, 2)

The basis for motor imagery is given by two elements that are crucial for the acting ­subject: the kinesthetic memory (Sheets-Johnstone 2009) and physical experience of the perceptual consequences of similar earlier actions, and the ability to trigger complex motor- or body-schemata or kinetic melodies (Luria 1973), such ability providing an appropriate action form for the intentional image (Bergson 1939). While the memory of a prior experience is always linked to an actually executed action, the intentional image can become a surrogate for those perceptions that an executed action would produce (Annett 1996). Thus, in motor imagery processes, the necessary addition of location and timing information to a pre-established motor schema alters it in such a way as to help inhibit or suppress the execution of the actual movement patterns. The perceptual linking of imagery to motor patterns functions in two reciprocal paths, one serving a representative the other an operational role. They are not exclusively coupled and are independent enough to allow for an inhibition of imitative actions as reaction to movement or action perception (Berthoz 1997, 209), and permit the projection of action in motor imagery only and without the need to be executed openly (Reybrouck 2001). In addition to recognition and mimetic re-enacting for goal-understanding, the mechanism of motor imagery plays a crucial role in the preparation of real actions (Glasersfeld  1996, 65) and the storage of memories of executed actions: “motor action itself, in its prenoetic body-schematic performance, has the same tacit and auto-affective structure that involves the retention of previous postures, and the anticipation of future action” (Gallagher 2005, 204). If, in the case of simulation, the impulse for the execution of an action is blocked on the way from the cortex to the spinal cord (Decety and Chaminade 2003) then, in case of execution, the efferent and afferent neural activation streams form a complete loop that enables adaptive and continuous control over the action (Annett 1996) and lead, via internal model parameters (Keller 2012, 209), to the perception of one’s own agency: “Performative awareness that I have of my body is tied to my embodied capabilities for movement and action . . . my knowledge of what I can do . . . is in my body, not in a reflective or intellectual attitude” (Gallagher 2005, 74). This “sense of effort” (James 1896) provides the tacit proprioceptive knowledge that perceptual changes are indeed the outcome of one’s own actions. “That is, although the content of experience may be the intended action, the sense that I am generating the action may be traced to processes that lie between intention and performance” (Gallagher 2005, 57) and “they are generated

in the sub-personal processes of body-schematic control, and specifically in the processes of motor preparation and the sensory feedback that results from the action” (174). Any level of motor imagery is informed by the sense of agency and ties it to a real, as opposed to purely imaginary, or even hallucinated action.
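The comparator logic implied by this account, in which a copy of the motor command predicts the sensory consequences of an action and is matched against the actual feedback, can be sketched in a toy model. This is an editorial illustration only; the linear forward model, the numbers, and the threshold are assumptions, not claims drawn from the literature cited above.

# Toy sketch of a forward-model comparator for the sense of agency.
# A copy of the motor command predicts the expected sensory outcome;
# if the actual feedback matches the prediction closely, the event is
# attributed to one's own action. All values here are illustrative.

def predict_outcome(motor_command: float) -> float:
    """Stand-in forward model: predicted sensory effect of a command."""
    return 2.0 * motor_command

def sense_of_agency(motor_command: float, observed_feedback: float,
                    threshold: float = 0.1) -> bool:
    """Attribute feedback to oneself when the prediction error is small."""
    prediction_error = abs(predict_outcome(motor_command) - observed_feedback)
    return prediction_error <= threshold

if __name__ == "__main__":
    print(sense_of_agency(0.5, 1.02))   # True: feedback matches the prediction
    print(sense_of_agency(0.5, 1.60))   # False: altered feedback reads as "not mine"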

Motor Imagery’s Role in Perceiving Music through Instrumental Actions The technique of motor imagery is commonly used in conjunction with physical training in order to optimize skillful execution, for example, of an athletic task. It is used to imprint as a corporeal motor schema a sequence of movements in a single coherent and economical movement unit. The goal is to have at one’s disposition a complex movement pattern that only needs to be triggered once and not consciously controlled in every aspect throughout the entire movement trajectory. The repetitive nature of practicing coordinated movements of instrumental play fulfills the function of establishing body-schemata, “integral kinaesthetic structures” (Luria 1973, quoted in Sheets-Johnstone 2009), dynamic patterns, or so-called kinetic melodies. However, such obtained knowledgeability is not simply a know-how, a lesser form of knowledge that is “merely physical.” Kinetic melodies are saturated in cognitive and affective acuities that both anchor invariants and color and individualize the manner in which any particular melody [pattern] is run off.  (Sheets-Johnstone 2009, 256)

Through the practicing process, the embodied “know-how” becomes prereflective and can later, in the right environment and circumstances, be triggered as a unit without the necessity to individually deal with the actions that constitute it. Obtaining these motor schemata is considered beneficial to concentration and mental preparation for extreme, singular, and rare high-performance moments. Having integrated complex patterns into single units allows one to shift the focus to anticipation and adaptation in complex situations such as, for example, returning a 200-km/h tennis service. In athletic training scenarios, on the one hand, the patterns are predefined, pretrained units of movement that are continuously recalled and reinforced. In a musical performance scenario, on the other hand, how many of the actions are pattern-based and to what extent these patterns modulate the shaping of a performance depends on stylistic, musical definitions and their degree of fixity. Active motor imagery serves as a technique for training and builds on the way movements are memorized, related, and executed on a subpersonal as well as somatic/ kinesthetic level. Athletes, as well as musicians and dancers, are known to practice mentally and with reduced physical activation, such as when marking phrases (Kirsh

2010). The extra scaffolding obtained from executing reduced, yet signifying bodily actions, represents an interesting case of a hybrid practice, which leverages motor imagery with goal-points or “key-frames” (Godøy and Leman 2010), without fully exerting the body. Practice exercises for musicians can have the same degree of determination as an athlete’s movement schema and can be mentally practiced in the same way. Training of fine-motor control and the creation of larger body-schematic movement units that can be recalled without conscious involvement are the central occupation in a musician’s training during the instrumental skill acquisition phases. The body accumulates knowledge about movements, dynamics, and forces and, in the case of traditional musical instruments, links it to the perception, the adaptation, and the control of the desired sound-qualities, thus dealing with movement-sound conjunctions rather than with movement and sound separately. This embodied knowledge encompasses the full range of the body’s motion and audition control. It is completely interdependent with the environmental situation within which it is learned and acquired. Full musical performance situations consist of a large number of perceptual tasks that need to be negotiated and mastered with skills going beyond mere body-schematic patterns. Yet, in order to achieve the necessary level of (hyper-)reflection (Kozel 2007), and in order to master both the corporeal and the expressive musical demands of the performance, it is important to be able to base the execution on previously imprinted body-schemata, to anticipate or plan the triggering of a motor movement unit, and to modulate in real-time the parameters of its execution. The ensuing multilevel perception and attention during music performance are necessary to manage high-level musical goals and succeed with timing and the expressive control of phrases (Brown et al. 2015) or chunks, in a top-down manner (Godøy 2006, 156) without the need to put full attentional focus on specific single task elements. The difference between general movement tasks, such as picking up a cup and drinking from it, and musical tasks lies in the acoustic, sounding component whose perception plays a crucial role in controlling the quality of execution. In musical performance with an instrument, by modulating fine-motor actions, an adaptive feedback-loop is created that controls sound’s central aspects such as timbre, timing, and dynamics. This loop contains prereflective, kinesthetic as well as conscious, musical, or sonic perceptions and (re-)actions and includes all the peripheral situating elements that are part of the performance, that is, the stage, the other players, the social situation, and so forth. Thus, the training of instrumental playing on a traditional instrument such as the violin or the flute consists of learning recursive motor adaptations that depend both on the perception of physiological, corporeal elements, such as posture, breath, tension, and force, and on the auditory perception of tonal qualities such as timbre, pitch, resonance, and volume: When the status of habituation is reached, the body-image retreats into the background in order to enable the concentration on the sonic-expressive shaping of the entire piece of music, something to which the prereflective, proprioceptive and auditory body senses are continuously subjected.  (Kim and Seifert 2010, 111, my translation)

For the musical performer, lower-level auditory processes occur on a prereflective level and inform musical awareness on a higher level, where the musical elements become part of the experiential content. With habituation, this prereflective perception of musical elements gets integrated into prereflective somatic proprioception, as in the example of “feeling” the correct intonation on a string instrument. This habituation process shows how musical awareness plays out on a metaphorical (Lakoff and Johnson 1980) or conceptual (Fauconnier and Turner 2003) level and blends with and informs the sensory-motor integration of auditory adaptations in body-schematic patterns. As with any other physical task, performing music involves the coordination of intention, goal, perception, and adaptive feedback for adjusting the motion trajectory. This is where motor imagery on a prereflective and subpersonal level, as well as active, intentional imagination, becomes fundamental to a successful performance.
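The adaptive feedback loop described in this section can also be caricatured in a few lines of code. The sketch below is an editorial illustration under stated assumptions: a toy "instrument" in which finger position maps linearly to pitch, an imagined target pitch, and a small corrective gain standing in for the kinesthetic adjustment; none of these values come from the chapter.

# Minimal sketch of an auditory-motor feedback loop: compare the heard
# pitch with the imagined target pitch and nudge the action toward it.
# The instrument model, gain, and numbers are illustrative assumptions.

def produced_pitch_hz(finger_position: float) -> float:
    """Toy instrument: finger position (0.0 to 1.0) maps linearly to pitch."""
    return 440.0 + 100.0 * finger_position

def practice_intonation(target_hz: float, finger_position: float,
                        gain: float = 0.005, steps: int = 6) -> float:
    """Iteratively correct the action until the heard pitch nears the image."""
    for _ in range(steps):
        error_hz = target_hz - produced_pitch_hz(finger_position)
        finger_position += gain * error_hz        # kinesthetic adjustment
        print(f"heard {produced_pitch_hz(finger_position):7.2f} Hz "
              f"(target {target_hz:.2f} Hz)")
    return finger_position

if __name__ == "__main__":
    practice_intonation(target_hz=494.0, finger_position=0.2)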

Ecological Embedding and Affordances The “enactive” perspective on perception would not be complete without considering the ecological embedding that forms an integral part of and frames the moment of sound and music perception. As the “being-in-the-world” is a relationship between the exterior and the interior, so is the zone between subject and object, between body– mind and environment the domain where interaction arises (Kozel 2007). In the case of performing with technology, the environment comprises stage, technical devices, performers, and the public; for example, in pieces bringing together choreography with motion-capture technology (Kozel  2007, 274) or video analysis with choreographic scores (Groves et al. 2007). When thinking of sound and music perception, the relationship to the sources of sounds is key and, therefore, also the relationship we have on a physical, subpersonal level with the tools and instruments that produce them, with the physics of sound, and with our aural and psychoacoustic capabilities of localizing and identifying sources through their spectral contents (Bregman 1990). The concept of ecological embedding applied to music perception implies a direct relationship between source, sound, and subjectively perceived sound image. Affordances are defined in terms of ecological potential as that which an object or environment is offering as actions or resources to a subject: The affordance of something does not change as the need of the observer changes. The observer may or may not perceive or attend to the affordance, according to his needs, but the affordance, being invariant, is always there to be perceived.  (Gibson 2015, 130)

Gibson derived his concept from “Gestalt” psychology’s terms of valence, invitation, and demand, but was critical that its proponents used the concept in a value-free manner. He emphasized the inherent meaning that arises out of ecological embedding:

An affordance points two ways, to the environment and to the observer. So does the information to specify an affordance . . . this is only to reemphasize that exteroception is accompanied by proprioception—that to perceive the world is to coperceive oneself . . . The awareness of the world and of one’s complementary relations to the world are not separable.  (132–133)

In order to understand the scope of objective affordances (Paine  2009) that arise in ­playing traditional, physical music instruments, the concept of perceptual affordances needs to be added that is located in the cultural domain of music. On a primary level, perceptual affordances can be defined as those types of perceptions generated when entering into contact with the instrument but without necessarily interacting with it. These perceptions form a multimodal field that encompasses the traditional five senses of vision, audition, touch, taste, and smell. They arise when attentional awareness is guided toward the instrument in any of the sensory modes. An example of such an affordance is that of perceiving the tension of a drum skin while holding a frame-drum. On a secondary level, perceptual affordances could also be seen as the potential for perceptions to arise from interaction with the instrument. These secondary perceptions could be tied to the five senses as well, if they manifest themselves within the outside perceptual field and in direct relationship to the instrument. An example of this affordance would be the sound generated from playing the instrument and contained in the auditory event that arises out of an instrumental action. The perception or awareness that originates within the player when interacting with the instrument, however, represents a separate type of perceptual affordance that—even though it is derived from contact and action with the instrument—does not exist independently of the cognitive or subpersonal processes of the performer. The outer contact with the instrument is conveyed by tactile and sometimes vibrotactile cues. In contrast, the inner effects of contact with the instrument are based on a kind of sensing that is active within the body, such as kinesthetic and vestibular sensing. These effects cannot be called perceptions but rather sensations and belong to the prereflective, precognitive levels of our perceptual system. An example of this inner type of affordance might be the level of comfort or the complexity of physical adaptation an instrument demands for its proper playing position, such as, for example, correctly lifting the hands while sitting at a piano. Or the affordance might be the prereflective adaptations to playing due to the perception of vibrational forces transmitted through the body, such as the modulation of a vibrato as felt through the changes in the vibrating string. Finally, on a higher level, the sounds of an instrument itself obtain their meaning, and therefore offer their “musical” or “cultural” affordance in the context of their application. In that case it is less the physical aspects of the instrument and more their habitual use that defines the “ecological” potential. Motor imagery depends on internalizing the affordances, on the ability to internalize the sounding result of a musical motor action; it therefore “involves mutuality between perception and action at a neurobiological level” (Windsor and Bézenac 2012, emphasis added) as well as on an experiential and cultural level, since any musical action is situated in a cultural context and builds on prior experiences.

Gestures, Metaphors, and Models in Technological Instruments The literature on musical gesture provides a rich set of categorizations and classifications that deal mainly with the types and effects of actions on musical instruments labeled as “gestures.” Cadoz’s classification of the “gesture channel” differentiates between the three functions of the ergotic, that is, the “material action, modification and transformation of the environment,” the epistemic, and the semiotic, and orders the instrumental “gestures” in the three categories of excitation, modification, and selection (Cadoz 2000). Godøy formulates the distinction between body-related and sound-related “gestures” (Godøy and Leman 2010) that are categorized into sound-producing, communicative, sound-facilitating, and sound-accompanying “gestures” (Jensenius et al. 2010). These authors all take into account the bodily basis for the actions, sometimes also the perceptual effects, but fail to address the prereflective effects inherent to acting and perceiving musical agency through an instrument, in particular the new forms of technological instrument that rely on abstract mathematical models and digital signal processing for the production of sound. In order for digital musical instruments to become “playable” in the proper sense of the word, the representations of their digital processes need to occur in metaphors (Lakoff and Johnson 1980); these processes are too complex to be grasped and acted on directly while performing.2 The metaphors are present in visual representations, such as the display of waveforms or spectrograms, in physical placeholders, such as levers, wheels, knobs, and sliders, or in more encompassing analog device metaphors such as tape-reels, patch-bays, and signal-chains. By themselves, these metaphors are useful, and enable complex instruments to be “played”; the problem is their limiting effect on the cognitive and perceptual capacities that could be better mobilized with richer, more differentiated, and more process- or action-specific metaphors. A number of conceptual models for the control of digital sound processes originate in real-world scenarios and in existing physical devices and can therefore be cognitively handled through actions and behaviors that are shaped by everyday experiences. The two main models of control can be identified as the instrument (Jordà Puig 2005) and the cockpit (Wanderley and Orio 2002). The first model is based on a traditional musical instrument’s dependence on (continuous) energy input that is necessary to produce sound. Rather than presenting mechanisms for generating larger time-based structures, the instrument offers a palette of sound options (or playing techniques) that need to be actively selected, combined, and performed by the musician. The second model of action puts the performer into an observer perspective or pilot’s cockpit, where, from a position of overview, single control actions keep the system within the boundaries of the intended output, while the actual sound processes produce their output without the need for continuous excitation and control. A third and less common model is that of dialogical communication and interaction with generative aspects that become an

motor imagery in perception and performance   71 integral part of the sound production processes. The most interesting manifestations of the third model deploy some form of autonomous agents to generate an “inter-subjective” exchange (Lewis 2000). The types of interaction and their position on the conceptual axis, between direct parametric control and “naturalistic interaction,” depend on the level at which the musician acts or “inter-acts” with the digital domain (Kozel 2007, 68). Different complexities demand different tangible objects and instrumental interfaces. In the case of one-­ dimensional and precise parametric control, individual objects such as knobs, sliders, or buttons are cognitively appropriate, since they represent in their physical form the singular dimension of the parameter and can be handled discretely. In the case of higher-dimensional or model-based action patterns, control objects with more degrees of freedom are required. The mode of “interaction” with more intertwined dimensions should reflect the relationship and dependency of those degrees of freedom that are present in the digital domain. The most complex set of entangled degrees of freedom that we can cognitively handle are those present in our entire body. Leveraging this level of complexity, at least through extraction of information about posture and kinematic qualities of the body is attempted, for example, by camera-based motion controls for games in so-called natural user interaction where full-body movements are used for control. This might be an appropriate method when the goal is to affect a virtual body that mirrors the capabilities of the natural body in a virtual game environment. It becomes problematic, however, when the correspondence between the actions in the physical world and the result or reaction in the abstract digital domain are modeled after categories that originate in the abstract domain. Empty-handed and movement-based controls in an allocentric3 frame work well for metaphors of control that reflect spatial qualities. Object-based, instrumental actions with tangible interfaces in an egocentric4 (for example, with wearable sensors) or object-centric frame are effective for actions on abstract entities without clear correspondence in the real world. Digital instrument design and interface developments oscillate between these two poles. There is, however, a tendency to shift away from action and behavior patterns that are based on the bodily capabilities shaped by object “interaction” with physical instruments, and to move toward symbolic and metaphorical projection onto a disjointed digital model. A smartphone with its touch screen, for example, gets used as—but was also designed to become—a generalized object with a repeatable and representable repertoire of movement patterns, the so-called gestures of pinch-to-zoom, swipe, and so on. These action patterns were copied from the natural world. Slight dissonances within or new interpretations of these patterns are learned and absorbed quickly when they constitute part of the interaction vocabulary of an information device.5 Technological instruments such as the turntable or a tablet are easily integrated into a musician’s movement and instrumental vocabulary. Turntable-ism is a prime example of the reappropriation of a music playback device into an instrument, subverting codes of musical style as well as social codes of “stealing” music or the disregard for the “authentic” musician’s voice (Eshun 1998). 
Today, in a further shift, the turntable finds

itself translated to a physical interface that is merely a representation of the turntable. The digital interface for DJs reconstructs the actions of scrubbing with a vinyl disc on a platter through a scrubbing, disk-like interface on a touch-screen or physical controller that affords the same movement-type and, through it, the same sonic transformation of digital content as a turntable. This kind of mimicry or design-genealogy within instrument building illustrates the fact that successful instrument designs engender cultural movement-, action-, and gesture-tropes, which survive technological transitions because of strong habituation and the effectiveness of the metaphorical elements they carry. Electronic music performance practices provide a good use-case to observe these shifted relations. Here, the pattern- and image-projections occur in a pronounced manner and, in some cases, bodily motor imagery or gestural interaction models are translated into purely metaphorical forms.
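The contrast drawn in this section between one-dimensional parametric control and higher-dimensional, body-based control can be illustrated with a small sketch. It is an editorial example with assumed feature names, ranges, and weights, not a description of any instrument mentioned above: the first mapping gives one knob one parameter, while the second entangles several bodily degrees of freedom in every output parameter.

# Illustrative contrast between a one-dimensional "knob" mapping and a
# higher-dimensional mapping in which several body features are entangled
# in each synthesis parameter. Names, ranges, and weights are assumptions.

def knob_mapping(knob: float) -> dict:
    """One control object, one parameter: discrete and precise."""
    return {"filter_cutoff_hz": 200.0 + 4800.0 * knob}

def body_mapping(posture: float, speed: float, openness: float) -> dict:
    """Entangled degrees of freedom feeding every output parameter."""
    return {
        "filter_cutoff_hz": 200.0 + 4800.0 * (0.5 * posture + 0.5 * speed),
        "grain_density_hz": 1.0 + 40.0 * (0.7 * speed + 0.3 * openness),
        "reverb_mix":       0.1 + 0.8 * (0.6 * openness + 0.4 * posture),
    }

if __name__ == "__main__":
    print(knob_mapping(0.25))
    print(body_mapping(posture=0.8, speed=0.3, openness=0.6))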

Instruments, Objects, and the Others As the source of sounds, musical instruments with their rich cultural history and the field of association they carry have a profound impact on our imagination of music making. With the exception of the voice, all human-made (conventional) musical sounds are generated by vibrating objects that exhibit specifically tuned physical properties. Within a single culture, these modes of sound production are commonly known and form the basis for understanding the act of music making. Sounds that have never been heard, and do not resemble any other sounds that were experienced before, are not easily identified, get confused with other sounds, or are simply ignored (Lemaitre et al. 2010). The prereflective auditory processes responsible for these decisions are part of the filtering and inhibition systems that most of our perception is based on. Since recognizing sounds is an evolutionary necessity, we are highly attuned to localizing and identifying a sound’s origin rapidly and preconsciously, even if that means occasionally failing. This capability is transferred to recognizing musical sounds and identifying instruments, voices, and acoustical signals. When considering the import that musical instruments have on our ability to imagine producing sounds in a meaningful manner, the primary relationship to take into account is that of an active body interacting with, exerting control over, and imposing intentions onto a tool or object. The “body-object articulation” is a charged field and contains not just the pragmatic value of its usage but also the signifiers of agency6 or, in political terms, of the inherent power-relationship (Foucault 1977); the articulation constitutes a body-weapon, body-tool, even body-machine complex that becomes a relevant topic and urgent concern in technological performance practices and the way technological and information-bearing tools pervade our current life-world and blur the boundary between the organic and the technological (Haraway 1987). Even though the body-object-movement relationships and kinesthetic patterns that are offered by technological instruments exist in the same domain as the traditional ones, culturally defined and explicitly designed motor images and interaction patterns

In order to enable the manipulation of sound with intentional actions, even a technological instrument that is based on digital (intangible) processes to generate sound needs a control- or performance-interface that is based on physical characteristics; it needs to provide methods of access through proxy layers that enable physical or gestural interactions. The way technological instruments mediate and alter the path from an imagined and anticipated sound-event to its sonic manifestation tells us as much about the “technicity” of the instrument (Simondon 1958) as about the mechanisms for music making we depend on. The temporal unity of an action and its sonic result, for example, is critical to maintaining a sense of causality and agency. The translations that are necessary to link a physical action to the production of a sound show the perceptual boundaries of the physical properties of sounding objects, moving bodies, and the action-sound coupling that are always present in the natural world; the immediate bond between bodily action and sounding result is broken by the use of symbolic machines. These computer programs with their associated graphical user interfaces merely execute logical or mathematical operations in order to generate sounds. Even though technology is optimized to hide this fracture, for example, by becoming so fast as to appear immediate and transparent, our necessary and indissociable reliance on embodied perception for identifying sound sources as a matter of survival generates an inherent tension and contradiction that undergirds and permeates any performance with technology. How this tension can be fruitfully exploited to generate meaningful relationships for the performing arts is stated succinctly by Kozel (2007, 70–71): “If we create responsive relations with others and our environments that transcend language, then by means of intentional performance with technologies we can regard technologies not as tools, but as filters or membranes for our encounters with others.” This statement emphasizes the fact that musical imagination and performance are part of a deeply cultured activity and are always already oriented toward others (Decety and Chaminade 2003). This applies to all levels of “technicity” of instruments, even primary vocal utterances of a musical nature, and shows that current musical practices contain the reciprocal function of affectively touching the performing as well as the perceiving subject, who are each other’s “other” in the communicatively enfolded moment of “musicking.”

Notes

1. Or producing an event.
2. Even emerging live-coding practices rely on textual representations in programming languages and widgets of graphical user interfaces to “perform” with sound-processes.
3. An outer spatial frame of reference.
4. A spatial frame of reference anchored on oneself.
5. Think of finding the power button on your smartphone; after a short period of habituation, the act of switching the screen on or off becomes a pattern that does not need extra attention, even though there is often no clear reason why the button might be in one place or the other on the device.
6. It is interesting to consider the term “agency” in its German translation: “Handlungsmacht” could be translated as the power to act (Stockhammer 2015).


References

Annett, J. 1996. On Knowing How to Do Things: A Theory of Motor Imagery. Cognitive Brain Research 3 (2): 65–69.
Bergson, H. 1939. Matière et mémoire: Essai sur la relation entre le corps et l’esprit. Paris, France: Presses Universitaires de France, Quadrige. (English: 1911, Matter and Memory. London, UK: George Allen and Unwin.)
Berthoz, A. 1997. Le sens du mouvement. Paris, France: Odile Jacob.
Bregman, A. S. 1990. Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.
Brown, R. M., R. J. Zatorre, and V. B. Penhune. 2015. Expert Music Performance: Cognitive, Neural, and Developmental Bases. Progress in Brain Research 217: 57–86.
Butler, J. 1988. Performative Acts and Gender Constitution. Theatre Journal 40 (4): 519–531.
Cadoz, C. 2000. Gesture-Music. In Trends in Gestural Control of Music, edited by M. M. Wanderley and M. Battier, 71–94. Paris, France: Ircam, Centre Pompidou.
Cox, A. 2001. The Mimetic Hypothesis and Embodied Musical Meaning. Musicae Scientiae 5 (2): 195–212.
Cox, A. 2011. Embodying Music: Principles of the Mimetic Hypothesis. Music Theory Online 17 (2): 1–24.
Decety, J., and T. Chaminade. 2003. When the Self Represents the Other: A New Cognitive Neuroscience View on Psychological Identification. Consciousness and Cognition 12 (4): 577–596.
Enticott, P. G., H. A. Kennedy, J. L. Bradshaw, N. J. Rinehart, and P. B. Fitzgerald. 2010. Understanding Mirror Neurons: Evidence for Enhanced Corticospinal Excitability During the Observation of Transitive but Not Intransitive Hand Gestures. Neuropsychologia 48 (9): 2675–2680.
Eshun, K. 1998. More Brilliant Than the Sun: Adventures in Sonic Fiction. London: Quartet Books.
Fauconnier, G., and M. Turner. 2003. The Way We Think: Conceptual Blending and the Mind’s Hidden Complexities. New York: Basic Books.
Foucault, M. 1977. Discipline and Punish: The Birth of the Prison. London: Vintage.
Gallagher, S. 2005. How the Body Shapes the Mind. Oxford: Clarendon.
Gaver, W. W. 1993. What in the World Do We Hear? An Ecological Approach to Auditory Event Perception. Ecological Psychology 5 (1): 1–29.
Gibson, J. J. 2015. The Ecological Approach to Visual Perception. New York and London: Taylor and Francis, Psychology Press.
Glasersfeld, E. 1996. Radikaler Konstruktivismus: Ideen, Ergebnisse, Probleme. Frankfurt am Main: Suhrkamp.
Godøy, R. I. 2006. Gestural-Sonorous Objects: Embodied Extensions of Schaeffer’s Conceptual Apparatus. Organised Sound 11 (2): 149–157.
Godøy, R. I., and M. Leman. 2010. Musical Gestures: Sound, Movement and Meaning. New York: Routledge.
Groves, R., N. Zuniga Shaw, and S. DeLahunta. 2007. Talking about Scores: William Forsythe’s Vision for a New Form of “Dance Literature.” In Knowledge in Motion: Perspectives of Artistic and Scientific Research in Dance, edited by S. Gehm, P. Husemann, and K. von Wilcke, 91–100. Bielefeld, Germany: Transcript Verlag.
Haraway, D. 1987. A Manifesto for Cyborgs: Science, Technology, and Socialist Feminism in the 1980s. Australian Feminist Studies 2 (4): 1–42.

Huizinga, J. 1955. Homo Ludens: A Study of the Play-Element in Culture. Boston: Beacon.
Huron, D. B. 2006. Sweet Anticipation: Music and the Psychology of Expectation. Cambridge, MA: MIT Press.
Ihde, D. 2007. Listening and Voice: Phenomenologies of Sound. Albany: SUNY Press.
James, W. 1896. The Principles of Psychology. London: Macmillan.
Jeannerod, M. 2006. Motor Cognition: What Actions Tell the Self. Oxford: Oxford University Press.
Jensenius, A., M. M. Wanderley, R. I. Godøy, and M. Leman. 2010. Musical Gestures, Concepts and Methods in Research. In Musical Gestures, Sound, Movement and Meaning, edited by R. I. Godøy and M. Leman, 12–35. New York: Routledge.
Johnson, M. 2007. The Meaning of the Body, Aesthetics of Human Understanding. Chicago: University of Chicago Press.
Jordà Puig, S. 2005. Digital Lutherie: Crafting Musical Computers for New Musics’ Performance and Improvisation. PhD thesis, Barcelona, Spain: Universitat Pompeu Fabra, Department of Information and Communication Technologies.
Keller, P. E. 2008. Joint Action in Music Performance. Emerging Communication 10: 205.
Keller, P. E. 2012. Mental Imagery in Music Performance: Underlying Mechanisms and Potential Benefits. Annals of the New York Academy of Sciences 1252 (1): 206–213.
Kim, J. H., and U. Seifert. 2010. Embodiment musikalischer Praxis und Medialität des Musikinstrumentes—unter besonderer Berücksichtigung digitaler interaktiver Musikperformances. In Klang (ohne) Körper, Spuren und Potenziale des Körpers in der elektronischen Musik, edited by M. Harenberg and D. Weissberg, 105–117. Bielefeld: Transcript Verlag.
Kirsh, D. 2010. Thinking with the Body. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, Austin, TX, edited by the Cognitive Science Society, 2864–2869. Mahwah, NJ: Lawrence Erlbaum.
Kozel, S. 2007. Closer: Performance, Technology, Phenomenology. Cambridge, MA: MIT Press.
Lakoff, G., and M. Johnson. 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Latour, B. 2005. Reassembling the Social. Oxford: Oxford University Press.
Lemaitre, G., O. Houix, N. Misdariis, and P. Susini. 2010. Listener Expertise and Sound Identification Influence the Categorization of Environmental Sounds. Journal of Experimental Psychology: Applied 16 (1): 16.
Leman, M., and A. Camurri. 2006. Understanding Musical Expressiveness using Interactive Multimedia Platforms. Musicae Scientiae 10 (1): 209–233.
Lewis, G. E. 2000. Too Many Notes: Computers, Complexity and Culture in Voyager. Leonardo Music Journal 10: 33–39.
Lotze, M., U. Heymans, N. Birbaumer, R. Veit, M. Erb, H. Flor, et al. 2006. Differential Cerebral Activation during Observation of Expressive Gestures and Motor Acts. Neuropsychologia 44 (10): 1787–1795.
Luria, A. R. 1973. The Working Brain. Harmondsworth, UK: Penguin Books.
Merleau-Ponty, M. 1945. Phénoménologie de la perception. Paris: Gallimard.
Montgomery, K. J., N. Isenberg, and J. V. Haxby. 2007. Communicative Hand Gestures and Object-Directed Hand Movements Activated the Mirror Neuron System. Social Cognitive and Affective Neuroscience 2 (2): 114–122.
Noë, A. 2004. Action in Perception. Cambridge, MA: MIT Press.
Paine, G. 2009. Towards Unified Design Guidelines for New Interfaces for Musical Expression. Organised Sound 14 (2): 142–155.

Reybrouck, M. 2001. Biological Roots of Musical Epistemology: Functional Cycles, Umwelt, and Enactive Listening. Semiotica 134 (1–4): 599–634.
Reybrouck, M. 2006. Music Cognition and the Bodily Approach: Musical Instruments as Tools for Musical Semantics. Contemporary Music Review 25 (1–2): 59–68.
Rizzolatti, G. 2012. The Mirror of Mechanism: A Neural Mechanism for Understanding Others. Bethesda, MD: National Institutes of Health.
Rizzolatti, G., and M. A. Arbib. 1998. Language within Our Grasp. Trends in Neurosciences 21 (5): 188–194.
Schechner, R. 1977. Performance Theory. London and New York: Routledge.
Sevdalis, V., and P. E. Keller. 2010. Cues for Self-Recognition in Point-Light Displays of Actions Performed in Synchrony with Music. Consciousness and Cognition 19 (2): 617–626.
Sheets-Johnstone, M. 2009. Kinesthetic Memory. In The Corporeal Turn: An Interdisciplinary Reader, edited by M. Sheets-Johnstone, 253–277. Exeter, UK: Imprint Academic.
Simondon, G. 1958. On the Mode of Existence of Technical Objects. Paris: Editions Aubier-Montaigne.
Small, C. 1999. Musicking: The Meanings of Performing and Listening; A Lecture. Music Education Research 1 (1): 9–22.
Smalley, D. 1996. The Listening Imagination: Listening in the Electroacoustic Era. Contemporary Music Review 13 (2): 77–107.
Stewart, J., O. Gapenne, and E. A. Di Paolo. 2010. Enaction, toward a New Paradigm for Cognitive Science. Cambridge, MA: MIT Press.
Stockhammer, P. W. 2015. Lost in Things: An Archaeologist’s Perspective on the Epistemological Potential of Objects. Nature and Culture 10 (3): 269–283.
Turner, V. W. 1982. From Ritual to Theatre: The Human Seriousness of Play. New York, NY: Performing Arts Journal.
Varela, F. J., E. T. Thompson, and E. Rosch. 1991. The Embodied Mind: Cognitive Science and Human Experience. Cambridge, MA: MIT Press.
Voegelin, S. 2010. Listening to Noise and Silence: Towards a Philosophy of Sound Art. London and New York: Continuum.
Walther-Hansen, M. 2012. The Perception of Sounds in Phonographic Space. PhD thesis, Department of Arts and Cultural Studies, Musicology Section, University of Copenhagen.
Wanderley, M. M., and N. Orio. 2002. Evaluation of Input Devices for Musical Expression: Borrowing Tools from HCI. Computer Music Journal 26 (3): 62–76.
Windsor, W. L., and C. de Bézenac. 2012. Music and Affordances. Musicae Scientiae 16 (1): 102–120.

Chapter 4

Music and Emergence

John M. Carvalho

Introduction

There is something that emerges in a piece of music, especially in the skilled act of making music. What emerges is the music in that piece of music. We say we make music when, better put, we enact it by patterning sounds that achieve or contribute to the emergence of music in an otherwise undifferentiated field of sound. For the purposes of this chapter, the music that emerges from our skilled engagement with an otherwise undifferentiated field of sound will be described as afforded by that field (Gibson 1979). The affordances that turn up in the field will depend on the skills, and the refinement of those skills, that the one attempting to make music deploys in her skilled engagement with a field of sound. For the musician, sound is her environment, and the skills she has for engaging this environment have been acquired and refined in prior engagements with sound in this environment. Her skills are importantly embodied and also extended in the instruments and other tools—the score, a music stand, a tuning device, and so on—she uses in her skilled engagements with an environment of sound. Affordances in that environment turn up for her specifically embodied and extended skills, and music emerges from her embodied and extended engagements with those affordances. She imaginatively tests which skills most musically pick up what is afforded by her environment and deploys the skills that enact the music virtually present there.1 The particular music that emerges from that environment emerges for the distinctly refined skills engaged by the composer, the performer, and the listener (recognizing that these skills can and will overlap). The emergence of roughly the same music for a variety of musical skill sets and refinements testifies to the way these skills are shared and the extent to which our environment is co-constituted by a variety of musical subjects. This chapter draws on arguments for the ecology of cognition to support its claims about the emergence of music (Clarke 2005). The leading idea for this ecology is that our minds are fundamentally active and interactive, in the world and not in our heads. This thinking reverses traditional models that conceive cognition as passively receptive to input from the environment that is processed in the form of representational content leading to action and behavior.

On the ecological model, cognition actively engages the world, remaking the environment into an emergent field where its projects can be realized. Using skills learned and refined in engagements with the world, subjects realize their aims by enacting what the environment affords them. In the case of music, subjects pattern sounds that turn up in the environment for their particular set of skills and the refinement of those skills. Music emerges from affordances picked up and enacted or realized by composers, performers, and audiences drawing on their skilled engagement with prior performances of musical works, the score, this particular performance and responses to it, as well as the instruments played, the voices sung, the venue where the music is performed, the constituency of the audience, and so on. Again, the music that emerges will be singularly connected to specific performers and audiences engaged in making this music, but it will also be shared in virtue of the convergences of the skills and affordances shared by the musical subjects involved. In this ecological model, the imagination is not treated as a discrete faculty representing mental content that differs from what can be perceived or believed about what is represented in that content. On the philosophy of mind drawn on here, the imagination figures as an affective valence of the always embodied engagement of the mind in an environment of sound. The mind as it is conceived here is not separable into distinct doxic, praxic, and pathic streams. For embodied cognition, there is no perception without action and no action without a conception, including an imagination, of the end of that action, which is afforded by the environment for an actor with specific skills. The imagination functions with cognition and action to make actual what is only virtually present in the environment for this particular body with aims afforded for the skills acquired and refined by that body in relation to the environment. For the musician, embodied in a composer or a performer or a listener, as this particular composer, performer, or listener—does she or has she played the piano, to what degree of proficiency, or is she a singer, a horn player, a DJ, and so forth?—the environment of sounds turns up affordances for the emergence of music by virtue of the imaginative, perceptive, cognitive, and active engagement of this particular musician with this particular environment of sound. The imagination is a feature of this engagement. It is not determinative, but it always figures in the embodied engagement of a musician with the sonic environment that turns up for her as she composes, performs, and listens for the music in that environment, engaging the environment to make music emerge from it.2 On this ecological model, music will be accounted for in terms of what emerges from the affordances subjects pick up in their skilled interactions with the environment. This music does not have a specific representational content that can be recognized as this but not that. It is, instead, what repeats itself and cannot but repeat itself precisely because it has no content.3 What emerges as music is what repeats itself in the environment in virtue of the skilled engagements with the environment by composers, performers, and auditors.
This music that repeats itself is what the composer, the performer, and the auditor find and give back to the environment by their attentive, enactive playing and listening. Without such an engagement, there is no music but only a succession of notes more or less adequately executed, more or less attentively heard.

In this chapter, music will be taken to be what emerges in performances of it. What emerges does not approximate an idea or an ideal. Music is very much real in its emergence as what we hear in this performance of it. We approach this controversy, as well as the question of a skilled listening to music, through an underappreciated text, “Listening,” by Roland Barthes (1985a). In his account of listening, Barthes refers to the way the unconscious, in the psychoanalytic setting, gives an ear to what emerges from the subject the analyst listens to. We update and substantiate Barthes with a revised taxonomy for modes of listening proposed by Kai Tuuri and Tuomas Eerola (2012), defending Barthes’s appeal to the unconscious against what Tuuri and Eerola call “critical” and “reflective” listening. We locate music’s “unconscious” in its “groove” as discussed by Maria Witek (2014) and Tiger Roholt (2014), and we encounter it in the performance of “At Last!” by Etta James. We conclude by defending a critical as opposed to a metaphysical ontology (Goehr 2007; Neufield 2014) that identifies the emergence of music in an enactive performance of it.4

Listening

Among the many insights he offers us about music, Barthes says there are three ways of listening. There is listening to an alert, listening that is a deciphering, and listening that develops an intersubjective space where what we listen to is “a general ‘signifying’ no longer conceivable without the determination of the unconscious” (1985a, 246). The first two ways of listening, Barthes says, we share with animals. The last, on his view, is a distinctly human, and modern, way of listening.5 For Barthes, this distinctly human listening compares with the listening of the analyst in the psychoanalytic setting. Barthes does not spell out the implications of this observation for listening to music. We do that here, but ours is not a hermeneutic exercise. We do not hope to reveal what Barthes might really have wanted to say about a listening to music that is inconceivable without the determination of the unconscious. Instead, we hope to take advantage of what Barthes wrote to get traction on the question, How does what is virtually present in an environment of otherwise undifferentiated sound emerge as music in our listening to that environment? To manage this, at least two things must be clarified: what Barthes means by the unconscious and what Barthes means by a “general signifying,” what he also calls “signifiance.” Before getting to the unconscious and signifiance, however, we should note that Barthes distinguishes listening from hearing. Hearing, on his account, is the physiological analogue to the psychological act of listening. We cannot account for listening acoustically, Barthes writes, or by reference to the anatomy of the ear and its object or goal. Rather, listening involves the mind as well as the body, which is formed by the mind, from which the body takes the object or goal of its listening. Hearing, on this view, is how the body physically responds to listening’s psychological evaluation of a spatial and temporal situation. Listening does not respond to hearing. That would lead quickly to a dualism Barthes would not abide and we should reject.

It is rather the case that listening directs hearing while hearing supports listening. Listening is, thus, embodied in hearing just as much as hearing is animated by listening. Barthes would likely not have been moved from this position by the evidence now available that extends the brute anatomy of the ear to the sophisticated psychobiology studied by cognitive neuroscience (see Schnupp et al. 2012). Listening has for him an evaluative function that directs the hearing body to affordances in its environment, and there are good reasons for thinking he is right. There is no question of a dualism here. Hearing and listening are conjoined tendencies of an entirely embodied cognition. At one pole we recognize that something is audible. At the other, we engage what is audible in the context of the lives we enact by listening but also singing, dancing, speaking, and so forth. Listening “is the very sense of space and time,” Barthes writes. By “the perception of degrees of remoteness and of regular returns of phonic stimuli” we shape our sonic world (1985a, 246). He contends that humans identify a territory by listening to the familiar and the unfamiliar in the sounds that constellate a more general environment. We hear sounds, on his view, but identify a place by listening. For example, the “household symphony” of kitchen noises, plumbing, heating and air-conditioning, the sounds of nature or the neighbors, and maintenance equipment bleeding in from the outdoors forms an aural texture of background noises we hear as the basis for listening to a world we call our home. It is, again, as if listening “were the exercise of a function of intelligence,” Barthes writes, taking intelligence to be a kind of “selection” (247). Listening picks things out, it picks up affordances, but it only exercises this function against the background of what is familiar and unobtrusive. If the background noises are too loud or unfamiliar, listening—as a form of intelligence or selection—is precluded. Affordances turn up because they reinforce what is familiar and because they expand creatively on what has been familiarly afforded. If we were to restate Barthes in a contemporary idiom or, better, if we were to draw from Barthes what will help us understand the role of the mind in our appreciation of music today, we might say that listening enacts or achieves the music afforded by what we hear in an environment of sounds. Let us now see how this distinction plays out in Barthes’s taxonomy. The alert, the first order of listening, is said to be what threatens to interrupt, disturb, or positively enhance the safe, sonic space that is the listener’s territory. Listening, at this level, is a response to surprises perceived as either a menace or a need. The “raw material” of listening on this level is what Barthes calls the “index.” The index is something singular, something that stands out because it is distinctive or exemplary in the context or the texture of the territory or what we have called the environment. This is a type of listening we share with animals. The sound of a can of food being opened stands out in the sonic space of the napping cat, as do unexpected footsteps in the hall, the one promising the satisfaction of a perceived need, the other anticipating a perceived threat.
In the experience of listening to music, an alert may take the form of a missed note that derails the resolution of a melodic line or of what proves to be a passing tone that opens a sonic space for improvisation. What is important about this type of listening in the case of music is that something distinctive and pertinent stands out as a danger or an opportunity against an otherwise undifferentiated and diffused but familiar and comforting texture of sounds.

Listening at this level makes us more intelligent, allows us to pick out and pick up what will threaten or enhance our listening experience just in case we feel at home among the sounds, including the music, we hear. At the next level, listening becomes something more human and actively creative, beginning, Barthes writes, with “the intentional reproduction of rhythm” (1985a, 248). To account for rhythm, Barthes refers to archaeological evidence, incisions on cave walls, and the regularly percussive activity of housebuilding that predate, on his view, the invention of language.6 While he agrees that we can never know anything about it, he invites us to enjoy the delirium (jouissance) and logic of speculating about the origins of sonic rhythm. He plausibly suggests, in this context, that “without rhythm, no language is possible” (249). The sign, for example, which is the most fundamental unit of meaning, and what must be deciphered in a language, is based on an oscillation between what is marked and what is unmarked, the latter term signing the general category, the paradigm, against which the former syntagmatically stands out. So, for example, “man” is unmarked: it is sometimes, still, and too often, used to refer to both males and females of the human species. “Woman,” on the other hand, is marked: it refers only to females. Barthes explains the relation between the marked and the unmarked in terms of the well-known story of the game, Fort-da, played by Sigmund Freud’s grandson, Ernst, staging the presence and absence of his mother (Freud 1953a, 14–15). Fort-da is at once a symbolic game and the creation of rhythm. More importantly, by miming his mother’s return, retrieving the spool cast from his crib, the boy enters into another form of listening. For what on the first level would have been listening for the index and the unmarked sign of his mother’s presence, the sound of her shoes on the floor or the rustling of her skirts, has been made, by the young Ernst, into a marked sign of her return. He no longer listens for what is possible but for what has become the secret, “that which, concealed in reality, can reach human consciousness only through a code, which serves simultaneously to encipher and decipher that reality” (Barthes 1985a, 249).7 Listening on this level, which is the listening most often associated with listening (especially a skilled listening) to music, is always linked to an interpretation. It is an attempt to make intelligible what is otherwise obscure. Barthes connects this listening to religion and the hidden world of the gods. The listener, at this level, he writes, seeks to decipher the future, to identify what is marked with the destiny the gods have in store for us, and to reveal our transgressions, the marked ways we have made ourselves displeasing in the sight of the gods. What is important about this development for a discussion of the determination of the unconscious in a distinctly human way of listening is the introduction of an interiority in this type of listening. It is by listening to ourselves, to our conscience, that we learn about our transgressions, about how we have displeased, if not the gods, our families, our friends, and ourselves. Further, faced with these transgressions, we seek to confess them to someone else, to ask another to listen to us.
In the long history of the Christian pastoral, and the shorter history of psychoanalysis, this listening has become an increasingly private affair, until the moment when the speaker, listening to what is interior to himself, commands the attention of another’s—the priest’s, the analyst’s—interiority.

The one speaking now commands another to listen to what the speaker has heard listening to himself.

The injunction to listen is the total interpellation of one subject by another: it places above everything else the quasi-physical contact of the subjects (by voice and ear): it creates transference: “listen to me” means touch me, know that I exist.8 (Barthes 1985a, 251)

Is not this what the musical performer commands? “Listen to me,” she says, through her instrument and her song. “Touch me. Know that I exist.” She does not simply share the music she plays in her performance of it with an audience. She commands the attention of that audience. Hear me. Feel me. Touch me. That is our clue to how the musician listens to herself, to what is afforded to her embodied enactment of her music. Importantly, it is the affordance she engages, interior to her, that this musician listens to while performing and that she commands her audience to listen to in the music she makes. Listening to her music, music that emerges from the musician’s skilled engagement with those affordances, the audience listens to the musician herself. They make her interiority theirs. Here we have the beginnings of a shared intersubjective space of listening. Barthes describes the telephone as the archetypical instrument of this listening, since it “collects the two partners into an ideal (and in certain circumstances an intolerable) inter-subjectivity” (1985a, 251–252). Telephonic communication, he says, invites the Other to “collect [the speaker’s] whole body in his voice” (252). Speaker and listener are, thereby, embodied and extended in the telephone that connects these modalities of their embodiment.9 So far, Barthes’s observations square with the revised taxonomy for listening proposed by Tuuri and Eerola (2012). Tuuri and Eerola conceive listening as an action-oriented intentional activity that finds meaning in “emerging resonances between experiential patterns of sensation, structured patterns of recurrent sensorimotor experiences (action-sound couplings) and the projection of action-relevant mental images” (137). In the literature on listening taxonomies (Schaeffer 1966; Chion 1983, 1990), they discern three listening modes: a causal mode distinguished by an intention to apprehend causal indices, a semantic mode distinguished by an intention to comprehend meanings, and a reduced mode distinguished by an intention to perceive the sound itself (Tuuri and Eerola 2012, 139).10 More recent developments (Huron 2002; Tuuri et al. 2007) suggest a division into two pre-attentive modes (reflexive and connotative), two source-oriented modes (causal and empathetic), and three context-oriented modes (functional, semantic, and critical) (Tuuri and Eerola 2012, 141). The pre-attentive modes, which capture innate and primordial affective responses and their associations, map onto what Barthes calls listening to an alert. The source-oriented modes, which capture denotative activation systems and the perception of a sound as intentional, and the context-oriented modes, which capture sounds’ affordances, their sociocultural conventions, and their appropriateness, make a finer-grained map of what Barthes (1985a) calls listening as a deciphering (Tuuri and Eerola 2012, 141–142).

In their revised taxonomy, Tuuri and Eerola term the pre-attentive modes (to which they add kinesthetic action-sound couplings) “experiential.” They group the source-oriented and context-oriented modes, stripped of critical listening, under the heading “denotative.” Finally, they pair critical listening, judgments about the appropriateness of a sound in a given context and of our responses to that sound, with reduced listening, focusing on the “sound itself and its qualities,” and call this mode “reflective” (142, 147). Again, the experiential and denotative modes of listening follow what we have observed in Barthes so far. Reflective listening, however, is not a part of Barthes’s plan. Tuuri and Eerola attribute reflective listening in its reduced mode to an attention to qualities of the sound apart from the denotations associated with the sound (149). To the critical mode of reflective listening they attribute a judgment that “evokes new meaning” in the sounds experienced “and reevaluates those [meanings] already evoked” (149). So described, reflective listening does not appear to contribute to the making or emergence of music. Reduced reflective listening gives us sound stripped of its relation to the listener and the environment. Critical reflective listening makes the dynamic interplay between cognition and the environment one-sided: the sound is given and the listener judges what is given. In the experience we are hoping to describe, the music emerges from affordances that turn up in the environment for the specifically embodied skills of the composer, performer, and listener.11 We expect to find music so enacted in what Barthes has called a distinctly human and modern mode of listening not covered by Tuuri and Eerola.

The Unconscious

Listening at Barthes’s first level transforms noise into an index. Listening at the second level transforms the index into a sign. It also transforms the listener into a dual subject. On this second level, interpellation becomes interlocution “in which the listener’s silence will be as active as the locutor’s speech” (Barthes 1985a, 252). Listening speaks, and it is at this level that the third type of listening, a distinctly human and modern listening, a listening “no longer conceivable without the determination of the unconscious,” begins to take shape. The image of the telephone cited earlier does not occur capriciously to Barthes. It is the same image Freud used to describe the analyst’s listening to the analysand.

The analyst must bend his own unconscious like a receptive organ toward the emerging unconscious of the patient, must be as the receiver of the telephone to the disc. As the receiver transmutes the electric vibrations induced by sound waves back into sound waves, so is the physician’s unconscious mind able to reconstruct the patient’s unconscious, which has directed his associations, from the communications derived from it. (Freud 1963, 117–126)

In the free association of the analysand’s giving an account of himself, the unconscious speaks: “touch me,” it says, “know that I exist.” The analyst, evenly hovering, attending to nothing in particular, refusing to latch onto anything that would lead her to learn only what she already knows, listens for the emerging unconscious of the analysand. She makes her unconscious the sounding board for the unconscious of her patient or, better, a surface where the array of her patient’s cathexes can take shape, where, in this array, her patient’s unconscious can emerge.12 Since the unconscious is said by Freud to function at the level of images, any intervention at the level of language, the language of the analyst in particular, threatens to introduce a selection that revises the unconscious in advance, telling the analyst only what she wants to hear. The evenly hovering attention of the analyst attempts to eliminate or bracket, at least, anything that might mediate a connection between her unconscious and the unconscious of her patient. The unconscious, while not a language, is said to be structured like a language (Lacan 1981), that is, as a constellation of gaps, lapses, differences, and differential relations, so the patient’s words, spoken freely and associatively, provide a medium for his unconscious to emerge. The impressions the patient’s free associations make on the unconscious of the analyst are, ideally, recorded there for revision only after the fact in the report the analyst writes. The “symbolic order” of language, the language of the Other, is said to enter the equation only in this revision the analyst gives of her patient’s account as mediated by her own unconscious. Of course, these ideal conditions only infrequently obtain. What obtains more regularly, right away in the account the analysand gives of himself, is the structuring influence of the symbolic order and the Other as embodied by the analyst. The unconscious, on these terms, should not be counted as something archaic or archetypical in consciousness. It is not, either, what Freud called the preconscious or what others often think of as a reservoir of repressed desires and cathexes. The unconscious on these terms is rather what is afforded in the account the patient gives. It is what emerges, when it emerges, and repeats itself in the free associations of the patient for the analyst who skillfully bends her ear to these associations and makes her unconscious a sounding board for the unconscious of the analysand. Jacques Lacan (1981) calls this unconscious the objet a, the object-cause of desire in the analysand. For Barthes it is what he calls a general signifying, an overfullness of meaning or signifiance that means nothing in particular. In musical terms, it is neither an alert indicated by a false note nor a sign interpellating a practiced listener to interpret the meaning of a raised fourth in a blues scale. There is, in music, following Barthes, something more than what it means. There is in music a general signifying that emerges and repeats itself—“hear me,” “know that I exist”—and we hear that something more, that signifiance, only with a listening practiced and refined on the model of the unconscious just described. Listening in this way, engaging the affordances in an environment of sound on the model of the unconscious of the analyst bending to the unconscious of the analysand, the music in a piece of music, the something more in that environment, emerges for us.
This mode of listening cannot be found in any of the taxonomies offered by Tuuri and Eerola.


Groove

This something more in music is approximated by what Maria Witek and Tiger Roholt have called “groove.” Groove is not used here as a colloquial term. Roholt uses it specifically to talk about the body’s noncognitive grasp of an element in music that signifies without signification, that is, without meaning something in particular. Groove, Roholt writes, is something we feel, the “motor-intentional affect” lived through as part of the effort to understand the music’s motor-intentionality through bodily movements (Roholt 2014, 137). Groove, following Roholt, is perceived haptically by the body in its lived corporeality. In Barthes’s terms, our embodied listening echoes haptically the motor-intentional affect in the music itself. On our terms, this embodied listening is a skilled engagement with what is afforded by an environment of sounds. This listening compares with the listening in the psychoanalytic setting. It is not listening for something it has determined in advance. It is listening for what, emerging in this environment of sound, has no determinate content, cannot be fitted under a concept, and repeats itself and cannot but repeat itself. Groove is what repeats itself for the listener skilled at engaging what is afforded by its general signifying, its signifiance. This general signifying is the something more we are listening for, especially as performers, in the performance of a piece of music. We are listening for the music as it emerges in the environment of sounds we engage. We are listening for what is latent, as it were, in the manifestly skilled execution of the piece as scored. This signifiance is the element that turns up, emerges, slips away, and re-emerges. It is what as performers we are striving to find and hold together (maybe by letting go of our attention to the score and our technique to do so). It is what as auditors we are listening for in those pieces of music we find exemplary because in them there is a groove. What Roholt calls groove gestures to what we are describing as the intersubjective space where what we are listening for is a general signifying that repeats itself and cannot but repeat itself, and this general signifying, this groove, is there only just in case we are there to enact or achieve it. So, what we are getting at is not so much what Witek calls “groove music,” though no doubt there is such a thing, but a groove in music, a general signifying that we bend our ears toward the way the analyst bends her unconscious to the unconscious of the analysand. This is as true of the outro of “Straight, No Chaser” as played by the second Miles Davis Quintet on Milestones (1958)—a groove hackneyed performers struggle to achieve—as it is for the opening passage of “The Beatitudes,” by Vladimir Martynov (1998), rescored for Kronos Quartet (2012 [2006]) and heard throughout Paolo Sorrentino’s film The Great Beauty (2013). In an interview with David Harrington of Kronos Quartet,13 we hear many of the same sentiments reported by Simon Høffding (forthcoming) from his interviews with the Danish String Quartet. After learning to execute the score, these highly skilled, professional musicians say they have to learn the music that comes only after playing the piece together repeatedly. In their rehearsals, they are not just listening to one another.

They are also importantly listening for the music. That music has a fleeting quality. It cannot be heard when one or another player insists on what they have deciphered as its secret, a secret only she or he can hear. The music emerges in the intersubjective space created by a shared bending of the ears of every player (and often the living composer, in the case of Kronos) to the music in its emergence. These ears are obviously particular to each of those players, but they are also general to the quartet as a whole in virtue of the skills those players have refined by playing together and the affordances they have shared in their performances. With ears skillfully attentive to the environment of sound, these performers actively engage what emerges with their voices and their instruments and make music, enact music in this performance of it.14 Consider the example of Beethoven’s late string quartet, No. 14, Opus 131 (1826), which is played attacca (without break or pause).15 As the piece goes on for nearly forty minutes in seven movements without stopping, the musicians are tasked with achieving and maintaining the music through their own fatigue and their instruments’ changing tuning. They must continuously enact the music in part by listening, with their own musical bodies—bodies extended by their individual instruments as well as by the bodies of those playing with them—to what is emerging in this music, what repeats and can do nothing more than repeat itself because, in the overfullness of its own significance, its signifiance, it cannot signify something to the exclusion of something else. Witek and Roholt agree on the attribution of groove to music that moves the bodies of listeners. They point to rock, rhythm and blues, funk, hip hop, and electronic dance music as sources of listening pleasure derived from a compulsion to move and from acting on that compulsion by moving the body. Witek focuses her more scientific study on the impact syncopation has on the desire of listeners to move to the music. Roholt focuses on swing and on the comprehension of a motor-intentionality in the music by listeners who move their bodies in response to that music. Our focus has been on listeners who are also performers. Again, we are guided in our intuitions by Barthes:

There are two musics (or so I’ve always thought): one you listen to, and one you play. They are entirely different arts, each with its own history, sociology, aesthetics, erotics . . . The music you play depends not so much on an auditive as a manual (hence much more sensuous) activity . . . it is a muscular music; in it the auditive sense has only a degree of sanction: as if the body was listening, not the “soul”; this music is not played “by heart”; confronting the keyboard or the music stand, the body proposes, leads, coordinates—the body itself must transcribe what it reads: it fabricates sound and sense: it is the scriptor, not the receiver.16 (Barthes 1985b, 261)

From what we have said earlier, we know that for us the body listening is not just the physical body but (as for Roholt) the haptic, sensuous, affected, and affective body, the body capable of enacting or achieving music not out of habit (by heart) but by a constant, attentive bending toward the environment where music can be heard emerging in the performance of it. This listening is something we can fathom comfortably when it comes to playing for ourselves or performing solo for an audience. We can also comfortably fathom how this listening is engaged in the enactment of music by small ensembles.

The examples cited previously were the Miles Davis Quintet and Kronos Quartet. It is more difficult to imagine this kind of listening in a large ensemble, a symphony orchestra or concert band.17 This difficulty is not a problem for the view set out here, since in those large ensembles it sometimes happens, for a particular passage, a particular movement, in a particular performance, that there is a magical confluence of the music making of the conductor, one hundred or more performers, a concert audience, and the music. These performances are truly memorable, legendary even. We attend performances of large ensembles, and perform in them, for the chance to achieve that magic in music. In a small ensemble, however, such music must be achieved more regularly if that ensemble and the music they play are to be remembered at all. It is likely most regular in string quartets that play together for twenty-five years or more. In jazz (and rock and pop) ensembles, too often a clash of egos or the assertion of a single ego leads to the ensemble disbanding after only a few years, players seeking alternative intersubjective spaces where they can achieve that general signifying conceivable only within the determination of the unconscious rather than the particular signifying of one dominant player. This listening is not unfathomable for those who are not playing the music themselves. It is this listening that motivated Roholt and Witek to focus on how the bodies of listeners are physically moved by rock, hip hop, and electronic dance music. Roholt considers the case of classical music and ventures to guess that an attention by listeners to the nuances expressed in some classical performance “will involve some body movement that can be elucidated in terms of motor-intentionality” (Roholt 2014, 125–126). We would venture to propose that he might discover motor-intentionality in the body movements of classical performers (who are also listeners) themselves, especially the body movements of performers in chamber ensembles and string quartets, which extend the ears and minds of those performers in the music that emerges from their playing or, better, their achieving and enacting of that music.18

At Last!

In what we have said so far, at least two points deserve closer scrutiny: our apparent commitment to the very idea of the unconscious—why introduce this element, since it is bound to arouse controversy?—and the apparent commitment, in the conclusions we draw, to an enactive ontology of music. In fact, neither commitment is as controversial as it seems. We were led to the concept of the unconscious by following Barthes’s association of a distinctly human and modern form of listening with the listening of the analyst in the psychoanalytic setting. We have tried to show that this is a form of listening well known and acknowledged by performers and attentive listeners alike. Performers regularly talk about finding the music in what has been formally scored or merely sketched out in advance. We mentioned the reports of David Harrington of Kronos Quartet earlier.

As an example of what he is getting at, consider the entrance of the second violin in Kronos’s performance of “The Beatitudes,” by Vladimir Martynov. The crucially embodied timing and tonality of that entrance enact or achieve the music in this piece of music. Bowed a little late or a little too soon, with more or less vibrato, just slightly sharp or flat, and the piece fails to come together. Everything about that piece of music follows from this entrance, which must be skillfully achieved for the music of “The Beatitudes” to emerge, and it will only be achieved if every member of the quartet, even those not yet playing, bend their ears in the direction of that emerging music. As another example of the same sort, take the Etta James rendering in 1960 of “At Last!” written by Mack Gordon and Harry Warren in 1941 for the film Orchestra Wives (directed by Archie Mayo). The timing, tonality, and timbre of the second note James sings, as “last,” introducing and leading the band into the tune, are crucial to the emergence of the music she and her accompanists achieve in this song. The entrance of the note sounded as “last” follows a cadenza ending in a held 9th chord and, following that, James singing “At” without accompaniment on the dominant 5th of the tune’s tonic scale. “Last,” then, resolves the tension introduced with “At” and marks the downbeat as well as the key that signs the song, signifying generally and abundantly the music of “At Last!” As James holds “At” for as long as it feels right for her (the note is scored with a fermata), she bends the sound in anticipation of the “last” that will follow (as she bends her ear to the music emerging in her performance of the song). She increases the tension between the dominant 5th and the tonic with the time it takes to resolve it and with a grain in her voice that enacts the blues idiom where she locates the tune (Barthes 1985c). She lands on “last” in a way that cues the entrance of the band and, to do all these things, she must be skillfully attentive to the affordance turning up for her, listening to the general signifying she and those performing with her are in the course of enacting. She does not consult a mental representation of prior performances of the song by herself or another. She does not remember the song. What she is listening to, in advance, are the affordances she will engage to enact the song on this occasion, the song that is realized or achieved only by her skilled performance. No doubt she draws on skills she has acquired and refined in prior performances of this song and others, but in this enactment of “At Last!” she engages those skills in the context of the entirely local affordances picked up from the particular performance of the cadenza and the anticipated skills of the band as a whole as well as the audience for this particular performance. She bends her embodied ear to what is afforded by this environment and, skillfully picking up those affordances, enacts or achieves the music in this song. She brings “At Last!” to life. She deploys her refined skills to realize and hold together the music of that song, what in that song repeats itself and cannot but repeat itself in anticipation of her engagement with what is emerging in the overfullness of her performance of it.
What emerges from “At Last!” in James’s performance is not a secret to be found by tracing its origins to the musical score for a film about white entertainers (Orchestra Wives) translated by a black blues singer for another audience and embraced by that other audience because of what James brings to the performance of the song. Rather, what emerges as “At Last!” is something James listens for in the song as she enacts it, affording us the chance to listen for ourselves with the skills we have refined for enacting the music James is performing.

It may happen that this performance will result in a missed encounter. It may happen that, on this particular occasion, James’s skills or the supporting affordances will not be up to the task. “At Last!” will be realized only just in case it is enacted by performers and listeners alike. Again, there is a certain magic to these enactments. Music is not achieved just by the skilled execution of scored notes performed in a prescribed manner for audiences skilled at evaluating such executions (Goehr 2007). Music emerges for performers deploying their skills in the context of local affordances, and for listeners deploying their skills in the context of their own local affordances, to achieve and enjoy not the signs of this or that song but the general signifying that is the music itself. If there is a certain magic to these enactments, it is not something mysterious we cannot foresee but a rare confluence of several, variable affordances turning up and being picked up by skilled practitioners.19 If what emerges in a piece of music is not what Barthes, earlier, called its secret, what we listen for on the second level, listening as a deciphering, it is also not what, once heard, the musician can actively achieve by repeating it in memory. What the musician is listening for each time she performs will be different relative to the local affordances that vary with the musician’s honing of her skills and with the particular material circumstances of this or that performance. As we noted, it may happen that the musician’s skills and the affordances she picks up result in a missed encounter with the general signifying of the music she is attempting to enact. It may also happen that she encounters a signifying she is not expecting, that she gives the song a life she did not know it had. This can happen in the spontaneity of a live performance, or it may happen in the practiced enactment of the song by the same or different performers. Compare, for example, the renderings of “Stella by Starlight” by Ella Fitzgerald (Verve 1961) and the Miles Davis Quintet (Columbia 1964). Both arguably achieve the music of the song, but these are two very different lives that emerge from this same piece of music. Fitzgerald’s “Stella,” with the help of accompaniment by Ray Brown on bass, Lou Levy on piano, and Stan Levey on drums, swings. Against the syncopated rhythm, we listen for the song’s title as it appears late in the lyric line and to the urgency in Fitzgerald’s up-tempo vocals. The Davis Quintet’s “Stella” quickly dispenses with the melody. In its place, Davis plays a moody, introspective meditation on the form of the tune, leading the rhythm section through shifting cadences and drawing a line through the form that enacts or achieves a music that only emerges in this particular performance of the tune. Both performances achieve “Stella by Starlight.” Both enact the music the song would not have otherwise. (Both find a “groove,” in the language used earlier.)
Drawing on their refined skills as musicians, listening to the resources afforded by the circumstances of their performances—Fitzgerald recording in a studio, Davis recorded live at Lincoln Center—they enact the general signifying of the tune, what about the tune repeats itself in anticipation of being engaged in this environment of sound by their skilled musicianship.20 What each performer hears in “Stella by Starlight” is animated by an evenly hovering attention to an intersubjective space formed from the bending of their skilled listening to the performance of that song.

From their skilled engagement with the environment of sound turning up in that space, these performers enact what we have been calling music. Something similar could be said of the renderings of “At Last!” by the Glenn Miller Orchestra in 1941 and by Etta James some twenty years later. These performers, Miller and James, each enact the music in the song, yet what emerges as a general signifying of “At Last!” will vary relative to the specific skills of these musicians and the affordances local to the performances they give. In effect, “At Last!” is a different tune every time it is achieved, when it is achieved, allowing again that a given performance may result in a missed encounter with the music of that tune. A more traditional account of the distinction we are drawing would make James’s an especially powerful token of the type “At Last!” (Wollheim 1980). On that view, the type of the song would be given in the score, and the first token performance of the piece would transfer qualities to the type, substantiating it and setting a standard for future tokens. James’s performance would be an especially powerful token that would transfer traits to the type, thus setting a new standard against which future performances of the tune, by Beyoncé Knowles, for example, would be judged. In fact, James came to think of “At Last!” as her song, and Knowles’s performance (at the inaugural ball for US President Barack Obama) has been judged against a standard supposedly set by James. This view, however, idealizes the music of this song and music in general. It assumes that there is a standard of correctness (the score) for measuring the achievements of Miller and James and Knowles (and others). It would be better to say that Etta James came to think of the song as hers because she heard a general signifying in the song that others did not have the skills to achieve. It is not that James is more skilled; rather, her specific skills lead her to pick up affordances that do not turn up for others in every performance of the tune that enacts the music in it. Now, what exactly is the song whose general signifying James listens for in her performance of it? How is the song identified, how is it heard as the song it is? On the view defended here, “At Last!” is nothing else than the song that is enacted in every performance of it, and it exists or, better, lives in those performances based on the skills of the performer and the affordances that turn up for her specific skills and the skills of her audience. What she listens for, that in the service of which she deploys her considerable skills, is constituted, enacted, or achieved in all of the renderings of it that have been performed and appreciated and that enable the music to emerge. There is no music in the tune without a performance of it (Goehr 2007). The tune can be played and heard, but not every playing or hearing of it achieves or enacts the music in that tune. This view is grounded in an ecology of cognition that takes the mind to be necessarily embodied and continuous with the environment in which it is embodied. For this embodied mind, imagination contributes to the perception, cognition, and evaluation of the affordances that turn up in the environment for the skills it has acquired and refined. On this view, music is what emerges from such a skillful engagement with an environment of sound. Music is just what is enacted and achieved in performances of it.
Performances depend on some manner of composition. Performances depend more strictly on skilled listeners for those performances. In any case, on this ecological and

music and emergence   91 embodied view, music does not exist in an idea in the mind of the composer or in the memory of a listener based on recordings or prior performances he has heard. Music emerges in the skilled enactment of it on this occasion, played and sung by these musicians, in this venue for this audience. Of course, it may happen, as in the “spontaneous compositions and improvisations” of Charles Mingus or the “creative spontaneous compositions” of Steve Coleman, for example, that composer, performer, and auditor are the same person. For an ecology of musical cognition, the problem of the ontology of music is solved: music just is what emerges in our engaged listening and skilled enactment of it (Neufield 2014). In the film A Late Quartet, mentioned earlier, Peter (played by Christopher Walken), the cellist and elder statesman in the group, relates the story of his encounter with the great Pablo Casals for a master class of young musicians. As a young musician himself, he played for Casals and, to his ear, performed miserably, but Casals inexplicably praised him. Years later, as a mature professional, he chided Casals for what he said was Casals’s insincerity so many years ago, and this time Casals grew angry. Didn’t you play this figure, Casals asked, picking up his cello to demonstrate, with this fingering? It was a novelty for me, he said. And didn’t you attack this phrase—again, demonstrating—with an up bow? Casals emphasized the good stuff, Peter tells his students. He encouraged. He wasn’t listening for the mistakes. The music in a piece of music is not missed when we make mistakes. The music is missed when we fail to listen for what is emerging, what turns up and wants to emerge, in that piece of music, when we fall short of bending our ears and our skills to what affords us the chance to enact the music in a piece of music we otherwise do not hear. What emerges in music is the music we make or, better, enact and achieve by composing, listening, and especially playing skillfully what are, without our engaged attention and refined skills, only patterns of sounds.

Notes

1. I conceive of this imagination as thoroughly embodied, as something felt about the fit of this or that skill, as a form of affective cognition on the order of how the body feels about attacking a snow-packed slope with a pair of skis or feels about the dish that can be made from what is afforded by the refrigerator. Given what is afforded by an environment of sound, the musician "imaginatively" feels the deployment of this or that skill will render the most musical results. She has an embodied and affective sense of what to do with this environment. This idea is developed in a paragraph below.
2. This account does not fall short of analytic specificity. It rethinks the mind as a continuous, embodied engagement with its surrounds. It conceives of the imagination as inextricably caught up in perception, cognition, assertion, and action as well as with evaluations of the ethical and aesthetic value of this imagination. All of these dimensions of embodied cognition are present in different degrees in different engagements as they are afforded by different environments and as those affordances turn up for the skills acquired and refined by a particular embodied mind. The musician is an especially rich example of a continuity of mind, body, and environment that is achieved by a skilled engagement with the music afforded by sonic material.

92   john m. carvalho 3. This is not to say that the music has no content but only the affordances in an environment of sound have no content prior to being skillfully engaged by the composer, performer or listener. Once engaged, the music afforded by that environment acquires a content relative to the specifically embodied skills deployed in that engagement. This content will be shared in the same way skills are shared so that it is not surprising that the content the composer enacts in a score is picked up as affordances and enacted as content by performers for ­listeners who find affordances in performances for a shared content. Differences in the embodiments of the skills acquired and refined by composers, performers, and listeners as well as the particular conditions in which the music is enacted will account for different valences in the content on each occasion. 4. I thank Aili Brenahan, Marc Duby, Richard Eldridge, Enrique Morata, Manos Perrakis, Martin  E.  Rosenberg, and Dylan van der Schyff, who read and commented on earlier drafts of this paper. 5. By “modern” Barthes refers, as we see in more detail later, to a time after the discovery of the unconscious, so from the late 18th century (in the work of Friedrich Schelling) to his present day. Also, he is referring to a mode of listening and not to the modernity of what we are listening to. 6. A piece of music heard at Stanford University many years ago as part of a recital by students was composed for audience members to take the stage and use hammers and nails provided to assemble random lengths of two-by-fours into unspecified arrangements. In this piece, likely inspired by the work of La Monte Young, otherwise asynchronous sounds and untampered pitches developed a rhythm and a tonal palette over time that we would, today, attribute to a form of entrainment. We may suppose Barthes has such a phenomenon in mind. 7. This secret is not the unconscious. It is rather the meaning or signification that the sign, insofar as it is a sign, conceals, even as it points to it. 8. “Transference” refers to the psychoanalytic patient’s unconscious redirection of his own affects toward the analyst. 9. It may be helpful, here, to distinguish this account of music making and appreciation from a more traditional or standard story. It is often thought that the musician hears something of the developing motif of the music she is making and makes music of the notes she is playing or singing by her attention to this development. Memory is required, and the pattern the music realizes in her performance is followed from recollection (Eldridge 2003, 132–133). This makes music a mental activity heard, first, in the mind of the composer and performer, then, communicated to the listener. On our account, there is only music in the enactment of it from affordances picked up in the environment where that music is performed. For the skilled musician those affordances are vast. They include the score, the instrument, past performances by the musician herself and others, a note just played that was just slightly too sharp, a page turned too quickly, what has been played by other musicians in the ensemble, and on and on. The skilled musician also has a way of navigating all of these variables, efficiently and creatively. She quickly cancels what will not contribute to her enacting the music in this piece of music on this occasion. 
She will make it seem effortless, and when she is successful she brings to her audience something more than a pattern of tones more or less perfectly executed. She brings something of herself. She communicates something of the affordances that turn up for her, and only her, in the performance of this piece of music. 10. In fact, Schaeffer posits four modes of listening: Écouter (attentive to the source of the sound), Ouïr (nonattentive listening to the context of the sound), Entendre (selective

music and emergence   93 appreciation of the sound itself and its qualities) and Comprendre (attribution of a meaning to the sound) (Schaeffer 1966). Tuuri and Eerola (2012) appear to have left out Ouïr in their initial assessment of the literature but build it into their revised taxonomy. 11. Pierre Schaeffer’s musique concrète presents an interesting test in this context. Starting with a type of reduced listening (écoute réduite), Schaeffer and those who followed him attempt to find music in sounds dissociated from their sources in traditional musical instrumentation and representation or abstraction in scores. On our terms, then, Schaeffer and fellow electroacoustic practitioners have acquired and refined skills that allow music to emerge from an environment of sound that is not restricted to the tradition of “serious” music. If there is a difference between our view and theirs, it is in the volitional reduction or bracketing of sounds from their source which is part of the skill set of electroacoustic musicians. On our view, music emerges from an embodied engagement with the sound environment. These intuitions, inspired by Simon Emmerson’s comments, deserve further study elsewhere. 12. This is how it might be possible to speak about an unconscious in music but not a consciousness. We tend to think of consciousness as some one thing, whatever it might be, whereas the unconscious emerges from relations between elements, positively and negatively charged cathexes, just as music might be said to emerge from patterns that turn up as melody, harmony, and rhythm in the notes played and sung without emerging as something we can specify (see Freud 1953b). We make no claims here for an unconscious in music. Our aim is rather to suggest an analogy between the listening of the analyst in the psychoanalytic setting and the listening of the composer, performer, and auditor in the case of music. 13. See, for example, the interview by Don Kaplan, “Navigating a Single Note: The Kronos Quartet’s David Harrington” at www.learningmusician.com/features/0107/DavidHarrington (2007); “Interview with David Harrington of Kronos Quartet” at www.youtube.com/ watch?v=hxoF0wMb0Jc (July 2013); and “Spotlight on . . . David Harrington (Kronos Quartet)” at www.youtube.com/watch?v=ibGTx4CY1VA (2013). Accessed October 5, 2017. 14. For the 2014 meeting of the American Society of Aesthetics in San Antonio, Texas, the chamber ensemble SOLI performed and discussed their corroborations with living composers. Working with Robert Xavier Rodríguez on Música, por un tiempo, the musicians came to recommend that one of the movements be played at a tempo slightly modified from how it was scored. Rodríguez agreed, and in that modified tempo performers and composer together “found” the music in that score. We would say that composer and performers, bending their ears to the environment of sound originally scored, engaged what was afforded in that environment and contributed, through their recommendation, to the emergence of the music in that environment by enacting it in a performance of that music. Examples of music coming together in this way are the norm and not the exception in the course of making music. 15. As featured in the film A Late Quartet (Yaron Silberman, US, 2012). 16. The relation between the scriptor and the receiver in music should be compared with what Barthes elsewhere calls a “writerly” and “readerly” text (Barthes 1977). 17. 
The exception would be the swing bands of the 1940s headed by Count Basie, the Dorsey brothers, Duke Ellington, Stan Kenton, and others. 18. The Music of Strangers (Morgan Neville, US, 2015) documents Yo-Yo Ma’s Silk Road Project—a collective of musicians displaced by crises and brought together by a conviction that music can make a difference in the world—with images of performers whose seemingly noninstrumental flourishes are no doubt crucial to enacting or achieving the music they

play. Such noninstrumental flourishes proliferate in performances of Kronos Quartet. These same noninstrumental flourishes and forms of bodily engagement, as represented by the actors in A Late Quartet, are crucial for the verisimilitude of that film. 19. This achievement is what distinguishes music from what only passes for music and everything that does not even pass. Skilled execution, whether it be classical music, jazz, or pop, may pass for music, but if it only passes it is missing what is achieved in those (mostly live) performances where the context of local affordances allows the enactment of something truly musical. In the studio, it is what distinguishes which recording of the same tune makes it to the album. 20. Working in the studio, Fitzgerald's rendering can be refined to the musicians' satisfaction. In the Quintet's live performance, Davis draws a listener's excitement—someone in the audience wails loudly in response to a note he has played—into the melodic line he is drawing. In fact, in the context of this performance, each solo can be described as aiming to enact the music in this piece of music with difference drawn from each musician's specifically honed skills, from the resources afforded and extended by his particular instrument, by the resources that have turned up in what has already been played and what he can anticipate being played, and what is emerging in the tune as a whole which he listens for as the band plays through the form.

References

Barthes, R. 1977. From Work to Text. In Image/Music/Text, translated by S. Heath, 155–164. New York: Noonday.
Barthes, R. 1985a. Listening. In The Responsibility of Forms, translated by R. Howard, 245–260. Berkeley: University of California Press.
Barthes, R. 1985b. Musica Practica. In The Responsibility of Forms, translated by R. Howard, 26–66. Berkeley: University of California Press.
Barthes, R. 1985c. The Grain of the Voice. In The Responsibility of Forms, translated by R. Howard, 267–277. Berkeley: University of California Press.
Chion, M. 1983. Guide des objets sonores: Pierre Schaeffer et la recherche musicale. Paris: Buchet Chastel.
Chion, M. 1990. Audio-Vision: Sound on Screen. New York: Columbia University Press.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical Meaning. Oxford: Oxford University Press.
Eldridge, R. 2003. Hegel on Music. In Hegel and the Arts, edited by S. Houlgate, 119–145. Evanston, IL: Northwestern University Press.
Freud, S. 1953a. Beyond the Pleasure Principle. In The Standard Edition of the Complete Psychological Works of Sigmund Freud, Vol. 18, translated by J. Strachey, 1–64. London: Hogarth.
Freud, S. 1953b. The Unconscious. In The Standard Edition of the Complete Psychological Works of Sigmund Freud, Vol. 14, translated by J. Strachey, 159–215. London: Hogarth.
Freud, S. 1963. Recommendations for Physicians on the Psychoanalytic Method of Treatment. In Therapy and Technique, translated by Joan Riviere, edited by P. Rieff, 117–126. New York: Collier Books.
Gibson, J. 1979. The Ecological Approach to Visual Perception. Hillsdale, NJ: Erlbaum.
Goehr, L. 2007. The Imaginary Museum of Musical Works: An Essay in the Philosophy of Music. Oxford: Oxford University Press.
Huron, D. 2002. A Six-Component Theory of Auditory-Evoked Emotion. In Proceedings of the International Conference on Music Perception and Cognition, edited by C. Stevens, D. Burnham, G. McPherson, E. Schubert, and J. Renwick, 673–676. Sydney: AMPS and Causal Productions.
Høffding, S. 2019. Performative Passivity: Lessons on Phenomenology and the Extended and Musical Mind with the Danish String Quartet. In Music and Consciousness 2: Worlds, Practices, Modalities, edited by R. Herbert, D. Clarke, and E. Clarke. Oxford: Oxford University Press.
Kaplan, D. 2007. Navigating a Single Note: The Kronos Quartet's David Harrington. Learning Musician. http://learningmusician.com/features/0107/DavidHarrington.
Kronos Quartet. 2012. Music of Vladimir Martynov. New York: Nonesuch Records Inc.
Lacan, J. 1981. The Unconscious and Repetition. In The Four Fundamental Concepts of Psycho-Analysis, translated by A. Sheridan, 17–64. New York: Norton.
Neufield, J. 2014. Musical Ontology: Critical, Not Metaphysical. Contemporary Aesthetics 12.
Roholt, T. 2014. Groove: A Phenomenology of Rhythmic Nuance. New York: Bloomsbury.
Schaeffer, P. 1966. Traité des objets musicaux. Paris: Éditions du Seuil.
Schnupp, J., I. Nelken, and A. J. King. 2012. Auditory Neuroscience: Making Sense of Sound. Cambridge, MA: MIT Press.
Tuuri, K., and T. Eerola. 2012. Formulating a Revised Taxonomy for Modes of Listening. Journal of New Music Research 41 (2): 137–152.
Tuuri, K., M. Mustonen, and A. Pirhonen. 2007. Same Sound Different Meanings: A Novel Scheme for Modes of Listening. In Proceedings of Audio Mostly, 13–18. Ilmenau, Germany: Fraunhofer Institute for Digital Media Technology.
Witek, M. A. G., E. F. Clarke, M. Wallentin, M. L. Kringelbach, and P. Vuust. 2014. Syncopation, Body-Movement and Pleasure in Groove Music. PLoS ONE 9 (4).
Wollheim, R. 1980. Art and its Objects. 2nd ed. Cambridge: Cambridge University Press.

Chapter 5

Affordances in Real, Virtual, and Imaginary Musical Performance

Marc Duby

Introduction

The concept of affordance as "the ecological equivalent of meaning" (Reybrouck 2015, 16) finds its roots in the work of James J. Gibson (1962, 1968, 1979, 1994, 1998). Gibson frames perception in terms of agents in motion, actively engaging within specific environments, employing each sense as a perceptual system, defined as "a functional assemblage of tissue comprising classically defined sensory and motor tissue" (Michaels et al. 2007, 750). In Gibson's understanding, the perception–action (henceforth PA) cycle forms the foundation of the synergistic relationship between perception and action, synergistic because informationally rich environments respond to (and are affected by) the active, questing creatures situated within them. Fuster (2013) describes the operations of this cycle in terms that place the emphasis on the feedback loops that obtain from organism-environment coupling:

A flow of environmental signals gathered by sensory systems shapes the actions of the organism upon the environment; these actions produce environmental changes, which in turn generate new sensory input, which informs new action, and so on. This circular flow of information operates in the interactions of all animal organisms with their environment.  (90)

A growing body of literature (Clarke  2005; Barrett  2011,  2014; Krueger  2011,  2014; Windsor 2011; Windsor and de Bezenac 2012) seeks to understand musical performance

98   marc duby from a Gibsonian ecopsychological perspective, invoking various notions of affordances as central to their arguments. In this regard, Krueger (2014) suggests that musical affordances “help elucidate the extent to which audition (including music listening) and action are fundamentally interwoven” (2). While the connections between live musical performance and performers’ actions appear straightforward because they are directly observable, Krueger’s argument also frames the listening agent as an active participant in an ongoing sense-making project, as opposed to a passive recipient of sensory impulses emanating from outside to impinge on a blank slate. Reybrouck (2015) considers the implications of such a perspective for understanding music in stating, “[d]ealing with music in ecological and biosemiotic terms means that we should try to understand music not merely in terms of its acoustical qualities but in terms of what it affords to the listener” (16). Performers also listen, and the ecological approach to music allows for the possibility of understanding both listening and performance as active sense-making processes, which require imagination and action to bring to fruition. After a period of relative exile from academic discourse, the topic of imagination seems to have returned to the forefront of debates in recent times (Cook  1992; Godøy and Jørgensen 2001; Hargreaves et al. 2011; Hargreaves 2012; Markman et al. 2015; Kind 2016). With respect to musical creativity, Hargreaves (2012) argues, echoing Plato, that ­“creativity” is an ill-defined term, actually “only one facet of a much broader phenomenon, the central core of which is imagination” (546). Hargreaves proposes that musical creativity be reconceived in terms of “the cognitive processing underlying music perception and production” (545). Since performing music is an especially demanding task that simultaneously recruits a number of perceptual and action systems, exploring the varieties of cognitive processing in musical performance seems instructive, and as Schlaug (2015) contends, “[t]he demands placed on the nervous system by music making are unique and provide a uniquely rich multisensory and motor experience to the player” (38). Clark and his colleagues provide a useful definition of musical imagery couched in terms that include recollecting sounds in absentia, the sense of proprioceptive and kinesthetic actions required for musical performance, and more overtly phenomenological intuitions regarding the score, the instrument, the specific environments in which music is performed, and the emotional components that such performances subserve. They write: Musical imagery has often been viewed and considered as the ability to hear or recreate sounds in the mind even when no audible sounds are present. However, imagery as used by musicians involves not only the melodic and temporal contours of music but also a sense of the physical movements required to perform the music, a “view” of the score, instrument, or the space in which they are performing, and a “feel” of the emotions and sensations a musician wishes to express in performance as well as those experienced during an actual performance.  (2011, 352)

As opposed to recollection of musical sounds in absentia, this chapter’s goal is broadly to explore the second sense of this statement, that is, to place the emphasis on motor

affordances in real, virtual, and imaginary music    99 actions as musicians use them in real or imagined performance, attending in the first place to the physical movements that bring music (understood tout court as “organized sound”) into being. Learning to play an instrument necessitates a prolonged period of acquaintanceship to acquire technical proficiency: for instance, to learn the gradations of pressure to apply to a violin bow to produce a particular sound quality (Cumming 2000). Through physical engagement with the task, long-term changes in brain plasticity ensue (Schlaug 2015), facilitating and strengthening what Heft (2001) describes as “the mutuality between the knower and the object known” (143). One way to approach this mutuality with regard to musical instruments is to understand them as transducers (devices for converting one form of energy into another). This is how Baily (1992) treats them: A musical instrument is a type of transducer, converting patterns of body movement into patterns of sound. The interaction between the human body, with its intrinsic modes of operation, and the morphology [the study of the forms of things, in particular] of the instrument may shape the structure of the music, channeling human creativity in predictable directions.  (149)

So saying, Baily makes explicit the mechanisms whereby body movements are transformed into audible patterns of organized sound through embodied engagements. In broad sympathy with the fundamental principle that fingers and voices move air to bring sound into being, it is also feasible to consider musical instruments as tools1 and, in this regard, the concept of affordances provides an opportunity for understanding musical instruments as real and imaginary tools and a starting point for exploring the various configurations of human–musical instrument interfaces that this concept might illuminate. Such interfaces range in possibilities along a spectrum from directly embodied (as in cases of a musician generating sounds in the moment) to more or less disembodied, as in the case of the air guitar and its related cousin, the virtual air guitar (Karjalainen et al. 2006). Virtual instruments, in which the performer’s actions generate MIDI data from keyboards, drum controllers, or other sources (wind controllers, guitar synthesizers, and so on) form a middle ground by introducing a degree of arbitrariness between actions and outcomes. Such raw data does not by itself specify the intended sound because the receiving device (generally a digital computer) treats the incoming binary data as “pure” information, and not sound. The performer (or composer/arranger) is then free to assign appropriate virtual instruments to translate these data into sound. Through the Gibsonian idea of active touch (1962), musical instruments can be understood as a specialized subset of tools with the potential to provide agents with real and imaginary affordances (environmental opportunities for feedback-directed action and learning). On this view, musical instruments afford playability just as different surfaces variously afford climbability, stability, or concealment for different creatures. So understood, as tools whose configurations have changed over time—whether real, virtual, or entirely nonexistent (instruments of human construction, computer-based,

or made of thin air)—musical instruments afford a wide range of imaginative possibilities for active participation through performance, composition, or listening.
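To make concrete the earlier observation that the raw data generated by a MIDI controller do not by themselves specify a sound, the following minimal Python sketch may be helpful. It is an illustration only: the patch list and the render_note function are hypothetical placeholders, not the API of any real sequencer or synthesizer.

# Hypothetical illustration: a MIDI note-on message is just three bytes, none of which
# describes a timbre. Which virtual instrument renders it is a separate, later decision.
note_on = bytes([0x90, 60, 100])   # 0x90 = note-on on channel 1; note 60 = middle C; velocity 100

status, note, velocity = note_on   # iterating over bytes yields plain integers
print(f"status=0x{status:02X}, note={note}, velocity={velocity}")

# Placeholder patch list (names are illustrative, not a real plug-in catalogue).
virtual_instruments = {
    "electric piano": "sampled or physically modeled electromechanical keyboard",
    "drawbar organ": "tonewheel simulation",
    "plucked string": "Karplus-Strong string model",
}

def render_note(patch, note, velocity):
    # Placeholder: a real system would synthesize audio for the chosen patch here.
    print(f"note {note} at velocity {velocity} rendered as: {virtual_instruments[patch]}")

for patch in virtual_instruments:
    render_note(patch, note, velocity)

The same three bytes can be voiced as an electric piano, an organ, or a modeled string; the timbre resides entirely in the receiving software's choice of virtual instrument, which is precisely the degree of arbitrariness between actions and outcomes noted above.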

The Hand as a Perceptual System, Musical Instruments as Tools

The form of our mind is shaped by our handedness. The kind of mind we exemplify is influenced by our possession of hands.
—McGinn (2015, 67, original emphasis)

Stanley Kubrick’s celebrated film 2001: A Space Odyssey (1968) opens with a scene from prehistoric times known as “The Dawn of Man.”2 Shortly after a mysterious black monolith of otherworldly origin is discovered by a group of apes on the African savannah, Moonwatcher, the leader of the troop, grasps in a momentous instant the creative—and destructive—potential of the bone he has to hand by using it to smash to pieces the skeleton from whence it came. For proto-man, from grasping the affordances of the animal bone (as a weapon, a tool for waging war) to murdering the leader of the competing troop is one small step, and as the conquering alpha male flings the weapon aloft in celebration, the spinning bone morphs into a futuristic space station to the melodious strains of The Blue Danube. As Horn describes it (2015), “Kubrick’s segue from a triumphantly hurled tibia bone weapon to a cylindrical Earth-orbiting satellite brilliantly encapsulated 4 million years of technology into 10 seconds of film.” Through cross-fading the two events, Kubrick’s imagery—by way of Arthur C. Clarke’s imagination—proclaims the direct links between technology and a telescoped version of evolution. After a combination of unforeseen factors such as prehistoric climate changes and competition for scarce resources forced our ancestors to descend from the relative safety of their arboreal environment (McGinn 2015), early man, as much prey as predator, was forced by harsh and dangerous conditions to become a tool-maker. Harari (2014) notes how, beginning approximately 2.5 million years ago, “evolutionary pressure brought about an increasing concentration of nerves and finely tuned muscles in the palm and fingers” (9). These evolutionary adaptations enabled humans to produce ever more sophisticated tools, so that “the manufacture and use of tools are the criteria by which archaeologists recognise ancient humans” (10). Weapons such as arrows and spears enabled killing at a distance, so empowering the hunters to attack larger prey and providing a means of self-defense against predator and proto-human competitors. Anatomical developments such as larger brains, opposable thumbs, and upright posture equipped our ancestors for the emergence of the new world of the savannah, as Wallin (1991) argues:

The upright posture presented new perspectives. To see and to hear was given a new context. The anterior limbs were made free for communicative gestures, for making and for using tools and instruments, for combining and comparing objects which earlier had not had any obvious relation to each other, to support the balance during that very specific locomotion, the dance, which was released by intense sound sequences.  (493, emphasis added)

These interpretations of prehistoric developments connect posture, perception, and movement to a PA cycle unfolding over evolutionary time. Wallin describes how bipedalism freed the forelimbs for gesture and tool-making. As a further consequence of these evolutionary developments, early man’s developing ability to hunt and defend himself allowed for the possibility of leisure time, for stories, music, art, and dancing; in short, the enabling conditions for early culture to emerge.3 Moonwatcher’s actions at the beginning of 2001 fictionalize the enactment of an imaginary proto-human evolutionary leap, in which new action possibilities emerge from physically grasping an environmental object and mentally grasping its potential as a tool all at once; in short, grasping its affordances. In this regard, Michaels and ­colleagues (2007) characterize such a process on the basis of dynamic touch: Dynamic touch is a subsystem of the haptic perceptual system, and refers to perceiving properties of hand-held and hand-wielded objects. During the process of wielding, one can be aware of a variety of properties of the object being wielded such as length, orientation, and heaviness.  (751)

Active or dynamic touch forms a vital element in approaching the transformations in the PA cycle that ensue in learning to play a musical instrument. In the first place, the instrument provides both tactile and auditory feedback. In playing a stringed instrument, the notes set the fingertips vibrating. From the languages of biology and physics emerges the idea that instrument and musician alike are behaving like resonant bodies in a force field of vibrating energy. One acquires the contours of a given acoustic environment and how it responds to sound (in short, its acoustic fingerprint) through a multisensory grasp of the instrument’s behavior in specific spaces. Similarly, at loud concerts the low-frequency elements of bass and kick drum are not just heard but also felt with one’s whole body resonating in sympathy. Playing the double bass affords characteristic morphological possibilities such as using the sympathetic resonance of the open G string to adjust the pitch of the stopped one an octave below and the deployment of harmonics for tuning the instrument. The harmonic D on the G string has the same pitch as that of its counterpart on the D string, and this property of the instrument’s morphology can be used for tuning the instrument in the absence of a reference tone (a tuning fork or digital tuner). For ecological psychology, such properties are availed through direct perception. To remain with the example of tactile perception, imagine handling an object presently out of sight. By turning it around in one’s hand and feeling its surfaces, contours,

and edges, invariant properties of the object are revealed that specify its shape and quite possibly its identity. The superiority of active over passive touch in shape recognition is a robust experimental finding.  (Heft 2001, 174)
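To put rough numbers to the double-bass tuning check described above, the following sketch assumes, for illustration, that the open D and G strings are tuned a pure perfect fourth apart (a 4:3 frequency ratio) and takes D2 at approximately 73.4 Hz; the figures are approximate and serve only to show why the two harmonics can be beat-matched by ear.

# Arithmetic sketch (approximate values): why the harmonic D on the G string matches
# the harmonic D on the D string when the two strings are a pure fourth (4:3) apart.
d_open = 73.4                    # open D string (D2), in hertz (approximate)
g_open = d_open * 4 / 3          # open G string a pure fourth above (G2, about 97.9 Hz)

d_harmonic_on_g = 3 * g_open     # third partial: touch the G string at one-third of its length
d_harmonic_on_d = 4 * d_open     # fourth partial: touch the D string at one-quarter of its length

print(round(d_harmonic_on_g, 1), round(d_harmonic_on_d, 1))   # both print 293.6 (D4)

Touching the G string at one-third of its length isolates its third partial, and touching the D string at one-quarter isolates its fourth; under the stated assumption both land on the same D, an octave and a fifth above the open G, so the player can tune until the beating between the two harmonics disappears.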

The PA cycle (in which the notion of feedback—more precisely, degrees and types of feedback—plays a vital role) provides a framework for comparing this experience with those of virtual and nonexistent instruments, which also provide auditory feedback. In the last two categories of instruments, tactile (haptic) feedback from the instrument is unpredictable to a degree because of its mediation through MIDI (playing a MIDI guitar may generate the sound of a saxophone, for argument's sake) or absent altogether, as in the case of air instruments. To resolve the apparent category mistake4 implicit in terms such as "motor imagery" and "auditory imagery," one might consider the thought experiment of inverting such terms so as to reframe "motor imagery" simply as imagined movement, and, by corollary, auditory imagery as imagined sound. Far from mere verbal sleight of hand, this exercise restores two perspectives: first, that imagining movement may not require intermediary representations5 to be re-implemented in action, and second, that the disciplinary procedure of considering perceptual systems in isolation of necessity overlooks their multimodal integration in complex organisms. Creatures actively engaged in environments where parsimonious cognitive decisions enable quick responses to systemic or local changes make sense of simultaneous multisensory information available to the participant, whether deploying vestibular, visual, auditory, haptic, or taste and smell, as per Gibson's list of perceptual systems. As this information becomes available to the organism by way of affordances, it becomes meaningful, and therefore it seems plausible to consider the hand as a special case of a perceptual system. Handedness, as McGinn's epigraph proclaims, remains key to shaping the human mind and will continue to do so as humankind and technologies continue to co-evolve. On this view, contemporary hand-held technologies (such as mobile phones and tablets) may well spark a new cognitive revolution as fingers and thumbs adapt to these new interfaces.

Virtual Instruments: Analog Hardware Reimagined

Every path to composition engages tools, be it a pencil, a drum, a piano, an oscillator, a pair of dice, a computer program, or a phone application. Each tool opens up aesthetic possibilities but also imposes aesthetic constraints.
—Roads (2015)

The increasing sophistication of digital computers has facilitated a wide range of possibilities for musical simulations. In this section, the discussion focuses chiefly on

affordances in real, virtual, and imaginary music    103 two aspects: the use of computing technology to simulate the behavior of analog equipment (such as Hammond organs, Fender-Rhodes electric pianos, and guitar amplifiers, to name a few) and its potential to recreate the soundscapes of iconic recording studios. For a fee, the home studio enthusiast may purchase access to the acoustic fingerprint of celebrated environments such as Abbey Road, including virtual simulations of tape technology without the expense, weight, and inconvenience of the original analog equipment. The Fender Rhodes electric piano is a weighty beast whose signature sound has graced many a recording, and it is no doubt convenient for home studios to have access to this fingerprint without its less convenient aspects: likewise, the Hammond organ, now in a virtual digital version where the recordist has access to its wide range of sonic possibilities. More modern virtual instruments such as the Korg Wavestation consist of the original digital synthesis programs used in the hardware unit, so porting over its sonic capabilities to a computer platform. This is less a simulation than a shift of environment from hardware (integrated circuits capable of various forms of digital synthesis) to software, which “reads off ” the synthesizer’s behavior using the same digital parameters as the original unit. The advent of digital technologies has rendered virtual the spaces of costly recording studios and their equipment, so that it is possible (at least in theory) for the recordist to recreate the sound of these environments at home. Companies that offer software for such simulations claim that these are meticulous physical models of their original analog counterparts and some comparisons by reviewers seem to indicate a close degree of similarity between the original environments and their software models. Emulating the behavior and sound characteristics of analog equipment using digital technology raises the problem of translating the nonlinear response characteristics of analog technologies into the binary language used by computers, pointing to different (if not incompatible) ways of encoding information. “Analog” implies fluctuations in voltage as produced by changes in attack and volume of an electric musical instrument, for argument’s sake, whereas digital technologies sample such voltage changes using a bit stream of zeros and ones to take snapshots of the system state in infinitesimally small increments. The differences between these technologies are perhaps best exemplified by how they handle distortion. Vacuum tube technologies found in “retro” guitar amplifiers produce nonlinear effects (harmonic distortion) when overdriven, as opposed to digital clipping in the case of a computer, which compromises the integrity of the original signal. In other words, unlike analog equipment, it is impossible to overdrive a computer in a musically pleasing way6. The brave new world of the digital computer has certainly simplified the task of precision editing. Compare, for argument’s sake, Pierre Schaeffer’s labor-intensive tasks in producing his original musique concrète compositions to the relative ease with which such tasks can be performed in the digital domain. As Schaeffer (2012) notes, the technological options of the time incorporated an element of risk insofar as their results were unpredictable:

A movement of the bow responds with dignity to the composer's notations, to the conductor's baton. But the effects of a turn of a handle on the gramophone, an adjustment of the potentiometer, are unpredictable—or at least we can't predict them yet. And so we reel dizzily between fumbling manipulations and erratic effects, going from the banal to the bizarre.  (79)

Consider the exercise of playing a sound backward. Using the available technology of Schaeffer’s time, this would entail cutting a length of magnetic tape, resplicing and reversing it so that the information was read as back to front by the playback head. To replicate this effect in the digital domain, the recordist simply changes the order of the sample data, so that the direction of the bit stream is reversed and the computer reads the data from back to front to produce the desired effect. As Doornbusch and Shill (2014) note, the concept of affordances also finds application in the field of digital audio editing: “Having audio available in a digital form can be said to be an affordance to the editing and manipulation of that audio” (27). The advent of the digital computer has loosened the centuries-old bonds between sound production and its outcomes, so paving the way for new methods such as granular synthesis, of which Opie writes (2015), “If you want to do it the original Xenakis way you will need a reel to reel tape recorder, a razor blade, sticky tape, and a lot of time.” Physical modeling as a synthesis technique attempts to recreate the sonic behavior of musical instruments by simulating their physical characteristics. According to Hind (2016), “With physical modeling it is the actual physics of the instrument and its playing technique which are modelled by the computer.” From physical modeling, the designers of the Virtual Air Guitar adopted the Karplus-Strong algorithm, “a computationally efficient digital wave-guide algorithm for modeling the guitar string as a single-delay loop filter structure with parametric control of the fundamental frequency and losses in the filter loop” (Karjalainen et al. 2006, 965). Their aim was to create “a pleasant rock guitar sound experience” (969) (complete with a simulated vacuum tube amplifier and distortion) where the player has a degree of control over the outcome as opposed to the “schizophonic”7 experience of popular video games such as Guitar Hero (Miller 2009; Katz 2012), where the would-be guitarist has to deal with a controller interface, a plastic instrument without strings that is modeled on a Fender Stratocaster. Miller (2009) extends Schafer’s original concept to what she terms “schizophonic performance,” in which clear lines dividing live and recorded performance are blurred “by combining the physical gestures of live musical performance with previously recorded sound” (401). Her nuanced work raises vital questions regarding notions of authenticity and identity, specifically that of the rock guitarist. She writes: “By giving players an immersive gaming experience filled with rock-oriented cues—including the musical repertoire, archetypal rock-star avatars, a responsive crowd, a guitar-shaped controller, and physical performance cues—Harmonix [the designers of Guitar Hero] encouraged them to adopt a rock-star identity.” In this respect, Katz (2012) notes how “digital music technologies—in the form of video games and mobile phone applications—challenge traditional notions of musicianship and amateurism” (460). Rather than framing such technologies as less than

real, as pale shadows of an Ur-experience of the authentic, Miller (2009) strikingly suggests, "these games might be compelling and valuable not just as simulations, fantasy-enablers, and stepping stones to real instruments, but because they offer people a new kind of musical experience" (425). One might quite simply frame such new experiences in terms of the unique imaginative possibilities afforded by such musical games.
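Returning to the Karplus-Strong algorithm mentioned above, the following Python fragment sketches the single-delay-loop idea in its simplest textbook form: a short buffer of noise (the "pluck") is recirculated through an averaging loss filter, the buffer length setting the fundamental frequency and a decay factor setting the losses in the loop. It is offered as a minimal sketch of the general principle under those assumptions, not as the implementation used in the Virtual Air Guitar, and the parameter values are arbitrary.

import random

def karplus_strong(frequency, duration, sample_rate=44100, decay=0.996):
    # Delay-line length approximates one period of the desired fundamental.
    period = int(sample_rate / frequency)
    # The "pluck": fill the delay line with a burst of white noise.
    delay_line = [random.uniform(-1.0, 1.0) for _ in range(period)]
    output = []
    for _ in range(int(duration * sample_rate)):
        first = delay_line.pop(0)
        # Loss filter: average the two oldest samples and scale by a decay factor.
        delay_line.append(decay * 0.5 * (first + delay_line[0]))
        output.append(first)
    return output

# Roughly one second of a 196 Hz tone (the pitch of a guitar's open G string).
samples = karplus_strong(196.0, 1.0)

Writing the returned samples to a sound file, or feeding them to a digital-to-analog converter, yields a recognizably plucked-string tone; the physical model, in other words, amounts to a few lines of feedback arithmetic.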

The Worshipful Fraternity of Air Guitarists

In this shifting musical setting, rock guitar has assumed an almost "traditionalist" aura for many audiences and musicians, encased in a nostalgia for past forms that in previous eras was reserved for more folk-based styles of expression. At the same time, though, rock guitar has moved into more hybridized contexts wherein the polarity between analog and digital, electric and electronic musicianship, becomes the basis for creative fusion.
—Waksman (2003, 131)

What do the gestures of air guitar enthusiasts mean for theories of embodied cognition? According to Godøy (2006),

it is not uncommon to see people making sound-producing gestures such as playing air drums, air guitar, or air piano when listening to music. Our observation studies of people with different levels of expertise, ranging from novices with no musical training to professional musicians, playing air piano, seem to suggest that associations of sound with sound-producing gestures is common and also quite robust even for novices.  (155)

What Godøy proposes is a direct link between sound and gesture that applies equally to all musically inclined participants; that is, independently of the expertise required to be a professional musician. By connecting instrumental sounds to the imaginary actions required to bring them to life, Godøy provides a lens through which to examine the nature of cognition as embodied action. In the case of the air guitar, this is a lens that refracts light, whose resulting picture bends reality and challenges comfortable assumptions about human musicking. At first glance, it might seem patently absurd to spend time discussing such an extreme case as that of the air guitar, a phenomenon one might want easily to dismiss rather like the notorious case of the pop duo Milli Vanilli, whom the media unmasked as charlatans after it was discovered that they had employed session singers on their hit record and mimed onstage to a prerecorded backtrack.8 Popular and critical outrage at their inauthentic performance tactics prompted the withdrawal of their awards, and subsequently their career took a disastrous turn into obscurity.

Godøy (2004) elsewhere argues for profound links between sound and gesture maintaining that,

when listening to the ferocious beating of drums, it is probably almost impossible to avoid having images of an equally ferocious activity of hands and mallets hitting the drum membranes, and conversely, listening to a slow, quiet, and contemplative piece of music would probably evoke images of slow and smooth sound-generating gestures. (58)

It seems obvious—but not trivial—that “the associations of sound with sound-producing gestures” Godøy describes are grounded in movement, in what has been termed mind as motion (Port and Van Gelder 1995; Sheets-Johnstone 2011; see also Godøy, this volume, chapter 12). Imagine this mind within the setting of a stadium such as Woodstock 1969, the sheer power, the high volume, the presence of a large crowd of people—Pete Townshend’s windmilling gestures, the ecstatic abandonment of Alvin Lee, Hendrix’s evocation of the Vietnam War in “Machine Gun”—thus, through films and recordings, are immortalized the musical gestures of a whole host of soon-to-be canonized guitar heroes. Air guitar practice inverts in one sense this heroic aspect by claiming such status for everyman guitarist; at the same time, it exalts the long-term love affair between humans and instruments, which Latour (2005) describes as “the constant companionship, the continuous intimacy, the inveterate contiguity, the passionate affairs, the convoluted attachments of primates with objects for the past one million years” (83). Humans mold, and are in turn molded by, instruments. Some of them have the potential to injure us until we learn how to treat them right.9 At the same time, they lend us their unique voices. As much as musicians’ actions give voice to matter, matter as the instrument speaks through the hands and breath of the player. What, though, if this is an imaginary instrument? Is it right and fitting to speak of imaginary affordances accompanying imaginary gestures? The bogus pomp of this section’s title points to the medieval-guild-like character of this imagined community, with its inherent “traditionalism” and “nostalgia for past forms,” as Waksman’s epigraph suggests. The electric guitarist as hero is surely one of the abiding mythologies of twentieth-century rock culture, incorporating notions of superhuman virtuosity, presence, authenticity, technical (and by implication, sexual) prowess, and an aura of masculinity. Originating in Oulu, Finland, in 1996, the idea for an air guitar competition (Türoque and Crane 2006) was based on an imaginary civil war requiring “air guitar forces” for its resolution. Air guitar contests set soberer traditional music competitions on their heads, raising the question of what criteria the judges of such contests deploy in assessing the winner. Fortunately for the perplexed, the website of the Air Guitar World Championships, at this time of writing in their twenty-first year, (http://www.airguitarworldchampionships.com) reveals all. While providing for personal air roadies but disallowing real or imaginary backing bands, the site projects a distinctly egalitarian aura:

Air Guitar is all about surrendering to the music without having an actual instrument. Anyone can taste rock stardom by playing the Air Guitar. No equipment is needed, and there is no requirement for any specific place or special skills. In Air Guitar playing all people are equal regardless of race, gender, age, social status or sexual orientation.

The procedure is for contestants to submit a one-minute audio clip for miming to, to be played over a “big sound system,” with the jury criteria listed as: “Originality, the ability to be taken over by the music, stage presence, technical merit, artistic impression and airness.” In the unlikely case of mystification, the last criterion (“airness”) is defined in the rules of the US Air Guitar Championships (http://usairguitar.com/rules2/) as “the extent to which a performance transcends the imitation of a real guitar and becomes an art form in and of itself.” Hutchinson (2016) understands airness as “a term meant to pinpoint some of those ineffable qualities that transform a competent performance into a truly great one” (416). The notion of congruency in pantomime also comes into play in defining the criteria for technical merit. As the site claims, “You don’t have to know what notes you’re playing, but the more your invisible fretwork corresponds to the music that’s playing, the better the performance.” The air guitar phenomenon has inspired a number of websites that specialize in the sales and marketing of these invisible instruments, with Dimitri’s Air Guitars in Sydney, Australia, a front-runner (http://www.air-guitars.net/home/about.html). Here it is possible to order not only electric, acoustic, and bass air guitars, but useful—if not essential— accessories such as air strings and plectrums. Learning the right moves for the budding air guitar practitioner entails imitation, defined by Buccino and colleagues (2004) as “the capacity of individuals to learn to do an action from seeing it done. Imitation implies learning and requires a transformation of a seen action into an ideally identical motor action done by the observer” (323, original emphases). As Heine van der Walt (aka Lord Wolmer, the South African air guitar champion in the late 2000s) describes his learning process, he drew his original inspiration from watching VHS tapes of bands like Iron Maiden and Megadeth: “My elder brothers were in high school and they would watch these tapes, and there’s me, about 6 years old, staring. As I watched the bands I thought ‘Man, that must be the best job in the world’ ” (AuntyNexus  2014). With the credentials of a professional touring guitarist, van der Walt might be thought to have possessed an unfair advantage over so-called nonmusicians. Regarding his time in the contest, he notes that the number of professional musicians taking part increased from around 30 to 50 percent. Table 5.1 consists of a summary of the three categories of instruments in this chapter. It forms a matrix with porous boundaries as opposed to discrete categories, and the distinctions between the categories are understood as fluid and dynamic. In this light, it is perhaps best understood as encompassing degrees of hybridity, because music fields such as jazz, Western art music, karaoke, DJ’ing, videogames, and even imaginary metal (as in the repertoire of my informant), all avail themselves of evolving technologies of performance and improvements in interfaces (in short, affordances) to bring musical sounds to life. Moreover, the porosity of these genres allows for

cross-pollination across and between them, so bringing ideas of recursion, borrowing, and interpenetration across boundaries. Reading from top to bottom, "agentic" refers to the degree to which a given participant can influence the outcome of the music, so that real-time live performance in fields like jazz and Western art music necessitates a one-to-one correspondence between gesture and sound, whereas karaoke "inserts" the performer into a ready-made environment, the only variable in this instance being the performer's contribution to a fixed backtrack. Agency is understood in this musical context simply as the person moving the air, so to speak. The notion of authenticity (hotly debated in the field between traditionalists and innovators, according to both Miller and Katz) is deliberately placed in inverted commas because it is felt to be to a large degree irrelevant for the purposes of this argument. A more productive strategy perhaps is to approach this contentious notion as a function of time; in other words, time connects gesture to sound in live performance in demonstrable fashion, because both the audience and fellow-musicians can observe a congruency between action and sound. With virtual instruments, gesture and sound might not necessarily correspond, because a gesture on a keyboard might trigger an incongruent sound or prerecorded sequence of musical events, so that a one-to-one correspondence might or might not occur. In the case of air guitar, the gesture is separate from the outcome because the guitarist is miming to a prerecorded backtrack and then reassembling the gestures to correspond with the sonic outcome after the fact, according to the conventions of the genre. By proclaiming its adherence to the ersatz, the fake, the less than authentic—"celebrating the basement," as van der Walt puts it10—the world of the air guitarist upends the social order of professionalism and virtuosity, acknowledging that, after all, it is just a game. Or a carnival, so that Bakhtin's (1998) distinction between the official feast's "consecration of inequality" and the freewheeling spirit of carnival seems very appropriate in this context.

As opposed to the official feast, one might say that the carnival celebrated temporary liberation from the prevailing truth and from the established order; it marked the suspension of all hierarchical rank, privileges, norms, and prohibitions. Carnival was the true feast of time, the feast of becoming, change, and renewal.  (45)

Table 5.1  Comparing Performance with Three Different Interfaces (Real, Virtual, and Air)

Real (live performance) | Virtual (live or recorded) | Air (recorded)
Agentic | Agentic/schizophonic | Schizophonic
Dynamic sensorium | Reproduced sensorium | Imaginary sensorium
"Authentic" | Potentially "inauthentic" | Avowedly "inauthentic"
Online (real-time) | On- or offline (sequencers, backtracks) | Offline (and loving it)
Gesture and sound congruent (one-to-one correspondence) | Gesture and sound (in)congruent (may not necessarily correspond) | Gesture and sound incongruent (reassembled after the fact)
Instruments, voices | Controllers | Air instruments
Jazz, Western art music, etc. | Karaoke, DJ'ing, videogames | Pantomime (game)
"Serious" | "Serious"/ludic | Ludic (kitsch, camp, drag)

Miller (2009) astutely connects the ludic aspect of guitar-oriented videogames to self-satirizing genres such as pantomime, high camp, and kitsch, stating that, in the case of Rock Band, "players not only choose the gender, body type, clothes, and instruments for their avatars but also must select a physical performance style, choosing from rock, punk, metal, and goth 'attitudes' that govern the avatar's physical mannerisms, stance, and affect" (421). As a result of the disconnect between performance and outcomes, Miller contends, "The games invite players to make a spectacle of themselves" (421). Yes, indeed.

Further Discussion, and Some Remarks on Representations

Movement is the mother of all cognition.
—Sheets-Johnstone (2011, 128)

If we accept this premise as the fundamental basis for cognition, then the motor actions of professional musicians involve the acquisition over time of an exacting level of precise control as required for the execution of highly complex music at virtuoso level. Consider in this regard the actual changes in wiring in the corpus callosum (responsible for coordination between left and right "sides" of the body: see, for instance, Sacks 2011; Koelsch 2012) in professional musicians, such that Sacks claims that musicians' brains are distinguishable at the physical level (i.e., dissection) from those of other occupations. The corpus callosum rewires itself exactly because it is bound up in precise control and coordination exemplified by the performance of music at the outer limits of human possibility by virtue of its technical difficulty, its inherent structure (Satie's Vexations, with its 840 repetitions lasting between eight and 24 hours to perform11), or its demands for synchronization, such as music by Pierre Boulez that the author witnessed in performance in which not least of the technical demands was to play such difficult music in time with the rest of the ensemble. If, as Sheets-Johnstone proposes, consciousness of self and others springs from our own animation in the gradual process of individuation, it is true that the realm of the sensorimotor has largely been ignored by conventional cognitive science, as she avers. The fact is that musicians learn not only how to move but also how to limit movement for economy's sake (so conserving energy) as well as deploying off-line resources such as mental rehearsal to enhance performance in the case of athletes, dancers, musicians,

and so on. For Cook (1992), "[b]eing able to play the piano is a matter not so much of mastering the actions required in performance as of knowing how to organize them into a coherent motor sequence" (75). Beilock and Lyons (2015) ground their discussion on expert performance in experts' greater propensities for off-line preparation, a kind of motor visualization of imagined actions/procedures supported by clinical evidence. Between player and instruments exists a reciprocity through which each is gradually transformed; this is how constant friction wears the original varnish down to bare wood so that instruments acquire a patina over time, and how, in turn, such instruments become priceless. Much more than mere tools, instruments provide avenues for self-expression, real and imaginative affordances for creativity, and may contribute to the enactment of a temporary sense of community among participants. As Peretz (2006) states it, a key purpose of music and dance is to "enhance cooperation and educate the emotions and the senses. It is a form of communion whose adaptive function is to generate greater sensory awareness and social cooperation" (24). One of the founders of the field of artificial intelligence maintains that "mobility, acute vision and the ability to carry out survival-related tasks in a dynamic environment provide a necessary basis for the development of true intelligence" (Brooks 1991, 141). His view aims to dispense with representations as intermediaries between body and mind in favor of direct perception and action within a robotic environment. For Brooks, a robotic agent displays intelligence by simply acting and has no need for an internal map inside her silicon head. Windsor and de Bezenac (2012) point to the incompatibility of notions like representations with an ecological approach, claiming that:

Stretching affordances (somewhat of a conceptual newcomer to the broad field of cognitive science), as these writers propose, may well provide an ecologically valid methodological approach to problematizing the divisions between body and mind, action and perception, and making and theorizing about sounds. By placing the emphasis on musical actions as examples of enacted cognition, ecological approaches offer a viable alternative to the disembodied stance of the computationalist project. Further to this point, one may choose to be parsimonious about representations and deploy such concepts only as and when they are necessary and supported by experimental evidence. In this regard, it may be important to take heed of Haselager and his colleagues, who proffer (Haselager et al. 2003) what they term "an antidote to the traditional representational cravings of cognitive science: 'Don't use representations in explanation and modeling unless it is absolutely necessary' " (229).

So, instead of giving in to such cravings, Chemero (2016) enjoins his readers to consider the explanatory force of sensorimotor empathy as the "implicit, sometimes unintentional, skilful perceptual and motor coordination with objects and other people" (138). This concept seems well suited to understanding the affordances of musical performance as real-time sociocultural phenomena, without necessarily invoking representations as intermediaries in such circumstances.

Acknowledgments

This material is based on work supported financially by the National Research Foundation of South Africa. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s), and therefore the NRF does not accept any liability thereto.

Notes

1. It is worthwhile to note how composers have tended to push against the limits and conventions of musical creativity by harnessing new extended instrumental techniques in performance, so extending concepts of what musical instruments can be employed to do.
2. "Synopsis for 2001: A Space Odyssey," 2016. http://www.imdb.com/title/tt0062622/synopsis?ref_=tt_stry_pl. Accessed July 25, 2016.
3. "Man is known by his artifacts. He is an artisan, an artificer, an employer of the arts, an artist, and a creator of art. Beginning with tools and fire and speech, the 'tripod of culture,' he went on to making pictures and images, then to the exploitation of plants and animals, then to the exchange of goods for money, and finally to the invention of writing" (Gibson 1968, 27).
4. The category mistake comes from Gilbert Ryle, whose example is a visitor to Oxford who is in search of "the University," misunderstanding the abstract nature of this term for the agglomeration of buildings and personnel who staff it. The contention is that terms such as "motor imagery" are problematic because they conflate two different modalities of perception.
5. As Decety and Stevens put it (2015), "Because motor representation inherently involves aspects of both body and mind, it presents as the most obvious candidate for wedding this dichotomy" (3) [that is, that between body and mind]. No such dichotomy exists in ecological psychology, which, as noted, insists on the mutuality between agent and environment.
6. Unison technology as employed by the California-based audio company Universal Audio offers a solution to this problem by allowing the digital information of the computer to change the impedance characteristics of the interface, so modeling the behavior of an analog channel strip.
7. Miller (2009) defines "schizophonic" as "R. Murray Schafer's term for the split between a sound and its source, made possible by recording technology" (400).
8. In fact, this kind of technological sleight of hand is fairly routine within the music industry. Consider auto-tuning software, for instance, which purports to correct bad intonation, or the wide range of "sweetening" (audio processing such as compression, EQ, and reverb) treatments employed routinely as production techniques.
9. Speaking of strings, Rufus Reid (1974) exhorts the apprentice jazz double bass player to develop calluses to acquire a stylistically idiomatic pizzicato sound. These are constituted over time through friction between skin, metal strings, and wooden fingerboard. The instrument gives and takes over time: witness the devastating psychological effects for professional musicians of being incapacitated and unable to play. Such strokes of cruel misfortune sunder performers from their professional and artistic selves and steal from them a vital raison d'être.
10. M. Duby, Skype Interview: Heine van Der Walt. Pretoria, South Africa, September 29, 2016.
11. According to Sweet (2013), "An Australian pianist named Peter Evans abandoned a 1970 solo performance after five hundred and ninety-five repetitions because he claimed he was being overtaken by evil thoughts and noticed strange creatures emerging from the sheet music. 'People who play it do so at their own peril,' he said afterward."

References AuntyNexus. 2014. Space Has Never Held Such Terror: Boargazm Interview. ­http://metal4africa.com/interviews/space-has-never-held-such-terror-boargazm-interview/. Accessed April 6, 2017. Baily, J. 1992. Music Performance, Motor Structure, and Cognitive Models. In European Studies in Ethnomusicology: Historical Developments and Recent Trends: Selected Papers Presented at the VIIth European Seminar in Ethnomusicology, Berlin, October 1–6, 1990, edited by M. P. Baumann, A. Simon, and U. Wegner, 142–158. Wilhelmshaven: F. Noetzel. Bakhtin, M. 1998. Rabelais and His World (1940), Mikhail Bakhtin. In Literary Theory: An Anthology, rev. ed., edited by J. Rivkin and M. Ryan, 45–51. Oxford: Blackwell. Barrett, M. S. 2011. Troubling the Creative Imaginary: Some Possibilities of Ecological Thinking for Music and Learning. In Musical Imaginations: Multidisciplinary Perspectives on Creativity, Performance and Perception, edited by D. J. Hargreaves, D. Miell, and R. MacDonald, 45–66. Oxford: Oxford Scholarship Online. doi:10.1093/acprof. Barrett, M. S., ed. 2014. Collaborative Creative Thought and Practice in Music. SEMPRE Studies in the Psychology of Music. Farnham, UK: Ashgate. Beilock, S.  L., and I.  M.  Lyons. 2015. Expertise and the Mental Simulation of Actions. In Handbook of Imagination and Mental Simulation, edited by K. D. Markman, W. M. P. Klein, and J. A. Suhr, 139–159. New York: Psychology Press. Brooks, R.  A. 1991. Intelligence without Representation. Artificial Intelligence 47: 139–159. doi:10.1016/0004-3702(91)90053-M. Buccino, G., S. Vogt, A. Ritzl, G. R. Fink, K. Zilles, H. J. Freund, et al. 2004. Neural Circuits Underlying Imitation Learning of Hand Actions: An Event-Related fMRI Study. Neuron 42 (2): 323–334. doi:10.1016/S0896-6273(04)00181-3. Chemero, A. 2016. Sensorimotor Empathy. Journal of Consciousness Studies 5: 138–152. Clark, T., A.  Williamon, and A.  Aksentijevic. 2011. Musical Imagery and Imagination: The Function, Measurement, and Application of Imagery Skills for Performance. In Musical Imaginations: Multidisciplinary Perspectives on Creativity, Performance and Perception, edited by D. J. Hargreaves, D. Miell, and R. MacDonald: 45–66. Oxford: Oxford Scholarship Online. doi:10.1093/acprof. Clarke, E.  F. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical Meaning. Oxford: Oxford University Press.

affordances in real, virtual, and imaginary music    113 Cook, N. 1992. Music, Imagination, and Culture. Oxford: Clarendon. Cumming, N. 2000. The Sonic Self: Musical Subjectivity and Signification, Advances in Semiotics. Bloomington, IN: Indiana University Press. Decety, J., and J. A. Stevens. 2015. Action Representation and Its Role in Social Interaction. In Handbook of Imagination and Mental Simulation, edited by K. D. Markman, W. M. P. Klein, and J. A. Suhr, 3–20. New York and Hove, UK: Psychology Press. Doornbusch, P., and G. Shill. 2014. Affordance and Appropriation. In Music in the Social and Behavioral Sciences: An Encyclopedia, edited by W. F. Thompson, 28–31. Thousand Oaks: SAGE Publications, Inc. doi:http://dx.doi.org/10.4135/9781452283012. Fuster, J. M. 2013. The Neuroscience of Freedom and Creativity: Our Predictive Brain. Cambridge: Cambridge University Press. Gibson, J. J. 1962. Observations on Active Touch. Psychological Review 69 (6): 477–491. Gibson, J. J. 1968. The Senses Considered as Perceptual Systems. London: George Allen & Unwin. Gibson, J.  J. 1979. The Ecological Approach to Visual Perception: Classic Edition. New York: Psychology Press. Gibson, J.  J. 1994. The Visual Perception of Objective Motion and Subjective Movement. Psychological Review 101 (2): 318–323. doi:10.1037/h0061885. Gibson, J.  J. 1998. Visually Controlled Locomotion and Visual Orientation in Animals. Ecological Psychology 10 (3–4): 161–176. Godøy, R.  I. 2004. Gestural Imagery in the Service of Musical Imagery. In Gesture-Based Communication in Human-Computer Interaction: 5th International Gesture Workshop, GW 2003, edited by A. Camurri and G. Volpe. 55–62. Berlin: Springer-Verlag. Godøy, R. I. 2006. Gestural-Sonorous Objects: Embodied Extensions of Schaeffer’s Conceptual Apparatus. Organised Sound 11 (2): 149–157. doi:10.1017/S1355771806001439. Godøy, R. I., and H. Jørgensen. 2001. Musical Imagery. London: Routledge. Harari, Y. N. 2014. Sapiens: A Brief History of Humankind. London: Harvill Secker. Hargreaves, D. J. 2012. Musical Imagination: Perception and Production, Beauty and Creativity. Psychology of Music 40 (5): 539–557. doi:10.1177/0305735612444893. Hargreaves, D. J., D. Miell, and R. MacDonald. 2011. Musical Imaginations: Multidisciplinary Perspectives on Creativity, Performance and Perception. Oxford: Oxford Scholarship Online. Haselager, W. F. G., R. M. Bongers, and I. Van Rooij. 2003. Cognitive Science, Representations and Dynamical Systems Theory. In The Dynamical Systems Approach to Cognition: Concepts and Empirical Paradigms Based on Self-Organization, Embodiment, and Coordination Dynamics, edited by W. Tschacher and J.-P. Dauwalder, 229–241. Singapore: World Scientific. Heft, H. 2001. Ecological Psychology in Context: James Gibson, Roger Barker, and the Legacy of William James’s Radical Empiricism. Mahwah, NJ: Erlbaum. Hind, N. 2016. Physical Modelling Synthesis. https://ccrma.stanford.edu/software/clm/compmus/clm-tutorials/pm.html. Accessed September 27, 2016. Horn, M. 2015. 10 Reasons Why 2001: A Space Odyssey Stands as the Best Sci-Fi Movie Ever Made. http://memeburn.com/2015/11/10-reasons-why-2001-a-space-odyssey-stands-asthe-best-sci-fi-movie-ever-made/. Accessed April 8, 2017. Hutchinson, S. 2016. Asian Fury: A Tale of Race, Rock, and Air Guitar. Ethnomusicology 60 (3): 411–433. Karjalainen, M., T. Mäki-Patola, A. Kanerva, and A. Huovilainen. 2006. Virtual Air Guitar. AES: Journal of the Audio Engineering Society 54 (10): 964–980. Katz, M. 2012. 
The Amateur in the Age of Mechanical Music. In The Oxford Handbook of Sound Studies, edited by T. Pinch and K. Bijsterveld, 459–479. Oxford: Oxford University Press. Kind, A. 2016. The Routledge Handbook of Philosophy of Imagination. Abingdon, UK: Routledge.

114   marc duby Koelsch, S. 2012. Brain and Music. Chichester, UK: Wiley-Blackwell. Krueger, J. 2011. Doing Things with Music. Phenomenology and the Cognitive Sciences 10 (1): 1–22. doi:10.1007/s11097-010-9152-4. Krueger, J. 2014. Affordances and the Musically Extended Mind. Frontiers in Psychology 4 (January): 1–13. doi:10.3389/fpsyg.2013.01003. Latour, B. 2005. Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford: Oxford University Press. Markman, K. D., W. M. P. Klein, and J. A. Suhr. 2015. Handbook of Imagination and Mental Simulation. New York and Hove, UK: Psychology Press. McGinn, C. 2015. Prehension: The Hand and the Emergence of Humanity. Cambridge, MA: MIT Press. Michaels, C.  F., Z.  Weier, and S.  J.  Harrison. 2007. Using Vision and Dynamic Touch to Perceive the Affordances of Tools. Perception 36 (5): 750–772. doi:10.1068/p5593. Miller, K. 2009. Schizophonic Performance: Guitar Hero, Rock Band, and Virtual Virtuosity. Journal of the Society for American Music 3 (4): 395–429. doi:10.1017/S1752196309990666. Opie, T. 2015. What Is Granular Synthesis? http://granularsynthesis.com/guide.php. Accessed April 17, 2017. Peretz, I. 2006. The Nature of Music from a Biological Perspective. In Cognition 100:1–32. doi:10.1016/j.cognition.2005.11.004. Port, R. F., and T. Van Gelder. 1995. Mind as Motion: Explorations in the Dynamics of Cognition. Cambridge, MA: MIT Press. Reid, R. 1974. The Evolving Bassist: An Aid in Developing a Total Music Concept for the Double Bass, and the Four and Six String Electric Basses. Chicago: Myriad. Reybrouck, M. 2015. Music as Environment: An Ecological and Biosemiotic Approach. Behavioral Sciences 5 (1): 1–26. doi:10.3390/bs5010001. Roads, C. 2015. Composing Electronic Music: A New Aesthetic. Oxford: Oxford University Press. Sacks, O. 2011. Musicophilia: Tales of Music and the Brain. Rev. ed. London: Picador. Schaeffer, P. 2012. In Search of a Concrete Music. Translated by C. North and J. Dack. Berkeley: University of California Press. Schlaug, G. 2015. Musicians and Music Making as a Model for the Study of Brain Plasticity. In Music, Neurology, and Neuroscience: Evolution, the Musical Brain, Medical Conditions, and Therapies, 37–55. Progress in Brain Research, 217. Amsterdam: Elsevier B.V. doi:10.1016/bs. pbr.2014.11.020. Sheets-Johnstone, M. 2011. The Primacy of Movement. Expanded 2nd ed. Amsterdam and Philadelphia: John Benjamins. Sweet, S. 2013. A Dangerous and Evil Piano Piece. The New Yorker. Türoque, B., and D. Crane. 2006. To Air Is Human: One Man’s Quest to Become the World’s Greatest Air Guitarist. New York: Riverhead Press. Waksman, S. 2003. Contesting Virtuosity: Rock Guitar since 1976. In The Cambridge Companion to the Guitar, edited by V. A. Coelho, 122–132. Cambridge Companions Online. Cambridge: Cambridge University Press. Wallin, N.  L. 1991. Biomusicology: Neurophysiological, Neuropsychological, and Evolutionary Perspectives on the Origins and Purposes of Music. Musicological Series by Pendragon Press. Stuyvesant, NY: Pendragon Press. Windsor, W. L. 2011. Gestures in Music-Making: Action, Information and Perception. In New Perspectives on Music and Gesture, edited by A.  Gritten and E.  King, 45–66. SEMPRE Studies in the Psychology of Music. Farnham, UK: Ashgate. Windsor, W. L., and C. de Bezenac. 2012. Music and Affordances. Musicae Scientiae 16 (1): 102–120. doi:10.1177/1029864911435734.

Part II

SYSTEMS AND TECHNOLOGIES

Chapter 6

Systemic Abstractions: The Imaginary Regime
Martin Knakkergaard

Introduction

When we imagine and make music, we are fundamentally manufacturing sculptures in time. Music is an outline of time in which we make use of pitched as well as unpitched human and musical instrument sounds, odd sound syntheses and noise, all kinds of concrete sound sources and samples, silence, and any combination of frequencies imaginable, for the last few decades even digitally modeled and thus transcending the limits of sound made by analog means. In making these sounds we are, besides our own bodies, using a number of technologies, most of which—the musical instruments—have been specifically designed for the particular purpose. Generally, technology in the widest sense of the concept has always had a profound influence on music. No matter whether we look at how music is conceived, organized, performed, preserved, or developed, music making is always closely tied to the technologies at hand at the given time. As such, technology—and the insights, beliefs, and understandings behind it—always regulates the conditions on which music is formed, eventually leading to the generation of theoretical rules, terminologies, and discourses that more or less stand out as universal principles, uncovered by means of language and logic. The workings and principles of these technologies guide our imagination; they are co-builders of the mental frames and schemes we conceive music by, and central to these music technologies—and to music making in general—is the notion of the tone or the interval, which is, in my understanding, also what distinguishes music from other sound-producing, expressive art forms. As phenomena, music and music-making are highly complex and richly faceted art forms as well as engaging social activities, sometimes initiated for their own sake and
sometimes to support gatherings, communities, and ceremonies. Music is a seclusive art of sound that makes time perceptual, generally in such a way that we, when listening to it, can tell if it is done right or wrong, is well performed or not, even though we are not familiar with the particular piece in advance or with the musical genre or style in question. What we are or become aware of, however, are the underlying systemic implications that the music in question either conforms to or fails to observe. So, even though there are no limits to what can be used in making music as long as it can produce some kind of sound, how the music is organized and realized is quite another matter. There are limits, but, contrary to sounds, these limits are not linked to any kind of materiality. The composition of sounds into music is ruled and restricted by corporeal implications and dependencies as well as by the element of embodiment on the one hand and, on the other, by specific, cultural practices that are theorized, or capable of being theorized, explicitly or implicitly. These factors and elements interact dialectically with the approaches, techniques, and technologies that are used for making music, and together they regulate and guide our imagination of music, in regard to what music we can expect and what music we can ideate. Whereas we in the Western countries, for instance, often tend to—or at least used to—recognize music from other cultures as exotic, and maybe even consider it to be somewhat wrong in terms of pitch and articulation or likewise, we forget, overlook, or simply do not know that, at a time not that long ago, our own music might very well have sounded much the same way, even after it had been theoretically confined by the extensive restrictions of metaphysically justified theory and normativity that music's encounter with the philosophy and speculations of the Ancient Greeks led to. So, even though music in practice typically does not comply with the rigidity of theory, Western music has had its own Weberian iron cage for an eternity, one that has imposed a number of restrictions on the expressive and articulatory potentials of musical forming and expression—banning "subversive noise [and] maintaining tonalism" (Attali 2008, 8) in favor of control and order, which, however, all the same has paved the way for the development of, for instance, advanced polyphony and rich and complex harmony. In this chapter I shall discuss how the beliefs and imaginations of primarily the Pythagoreans have influenced the shaping of pitch structures in music for more than 2,000 years and how the resultant selection and organization of abstract and generalized pitches in practice affect our musical expectations, conceptualizations, and imaginations.

Interval

At the core of the organization of music is the interval. No matter how we look at music, the interval holds a central position as the essential building block and reference. It is by the presence and use of defined intervals that music is distinguishable from other expressive sonorous (art) forms such as sound art and speech. Sound art and speech obviously contain intervals (it is impossible not to), but these intervals are, contrary to music, not tied to and dependent on specific, rigorous codes, although the use of
pitch-differences in speech, just like variations in dynamics and tempo, carries a lot of information—in many languages they actually alter the meaning of words and sentences—and of course can obtain a decisive function in sound art as well. Any interval is valid in sound art and speech; intervals—no matter whether they are rhythm or pitch intervals—are tied to language, dialect, situation, age, and so forth, and are more or less unique to the person speaking (see Yasar, volume 1, chapter 21). Using specific, regular intervals will either turn speech into song or maybe signal that the person in question is not quite well. With regard to rhythm intervals or beats, as Oliver Sacks states (quoting A. D. Patel), "The perception of synchronization of beat, Patel feels, 'is an aspect of rhythm that appears to be unique to music . . . and cannot be explained as a by-product of linguistic rhythm' " (2006, 243). Regarding music, we think in intervals just as we hear in—and listen for—intervals, and the concrete intervals are set against an immanent systematization that is given in advance depending on culture, epoch, style, and genre. From early on, the brain is simply trained to listen for and respond to patterns of intervals that fall within certain, definite matrixes, typically referred to as tone systems. These systems are not universal but are culturally specific, which implies that what, from the side of the receiver, is acknowledged as meaningful musical utterances is dependent on the sonorous pattern's reference to the tone system, just as what can be imagined from the perspective of the sender is guided by, and in a way confined by, proportions of the tone system. Again, whether we are referring to pitch or time intervals is not important: in practice, they are both always present as long as pitch paradoxically is also allowed to include nonpitched or inharmonic sound such as noise and many drum sounds. In this chapter I will, however, concentrate almost entirely on pitch intervals. The modern organization of pitch is not just a system of intervals but a finite system of tones or pitches of fixed frequencies, albeit the determination of the frequencies—the tuning—is relative to whatever concert pitch is currently decided (today, typically a = 440–444 Hz). In comparison, neumes, which were in use for centuries in the Middle Ages, only indicate relative intervals,1 and often ornamental implications too, but carry no information about pitch (or rhythm and duration for that matter); the neume virga, for instance, indicates a tone that is higher than both or one of the surrounding tones, whereas clivis indicates two tones where the first is the higher and porrectus indicates the tone sequence high-low-high, and so on. Neumes were thus primarily a descriptive and mnemotechnical tool, meant to support the learning and performance of music, and completely dependent on the user's imagination, experience, and acquaintance with the musical practices and the tone system of the time. Similarly, pitches within the fixed notation of Gamelan music today are only relative, as they vary "considerably from one gamelan to the next, both in absolute pitch and in relative size of intervals" (Brinner 2017), implying that instruments of whole ensembles are not necessarily "in tune" with one another.
In other words, whereas the intervals of Western music of today are dependent on abstract, fixed proportions, this is not at all a universal—or an ahistorical—situation, and even though the music theory of the Middle Ages was closely tied to a strict systematization founded on Pythagorean principles and idealization, the performance of the music relied on oral practices.

Since the time of the Pythagoreans, the systematized organization of pitches in most Western cultures has been founded on arithmetic (Sundberg 1980; Hansen 2003). This factor is probably a decisive element behind the Western understanding of numbers as central to music, as is manifested, for instance, in this oft-cited quotation from Leibniz (1712): "Musica est exercitium arithmeticae occultum nescientis se numerare animi" (Music is a hidden arithmetic exercise of the soul, which does not know that it is counting). It must, however, be noted that, to the Pythagoreans, numbers were sacred, and their study of music aimed beyond the music itself: The important truths about music were to be found instead in its harmonious reflection of number, which was ultimate reality. As a mere temporal manifestation, the employment of this harmonious structure in actual pieces of music was of decidedly secondary interest.  (Mathiesen 2017)

And, in the Greeks' "new sensitivity to order and form" (Burkert 1962, quoted in Sundberg 1980, 21, my translation), the numbers provided a way to escape materiality's grip, as numbers "are present in the things with a reality-constructing and determining function" (Sundberg 1980, 21, my translation). The Ancient Greeks' use of numbers in the generation of the tone system, where the octave is defined by the relation 2:1, the fifth as 3:2, and the fourth as 4:3, is a direct result of their general favoring of the musica universalis (also referred to as the harmony of the spheres) comprising the small integers 1-2-3-4. With these four numbers, they also constructed the triangular tetractys that, besides constraining the above ratios, also comprises 10, the sum of the four numbers and the core of the decimal system. Even though the fundamental ratios of the tone system were supported empirically by experiments with proportional divisions of the string of a monochord, the Greeks abstained from continuing the sequence of ratios further, which, for instance, based on an additional readout from the monochord, would have produced the major third by the ratio 5:4 (see later). They insisted on "explaining" the harmonic proportions from within the reach of the tetractys because the number 4 was considered to be sacred, as it was observable everywhere: nature and matter were made up of the four elements, there were four seasons, four directions, four ages, and so on, and these numbers were expected to be present in all proportions, natural as well as mystical, that warranted cosmic correspondence, balance, and coherence. The scientific axiom that is linked to this particular use of numbers in relation to music is, in other words, rooted in a sort of mythical understanding of unity, simplicity, and balance as a ruling principle (see Klempe 1991; Sundberg 1980; Knakkergaard 2016a).
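By way of illustration only (the following sketch is not drawn from the chapter itself), the small Python example below derives the tetractys consonances from string-length divisions of a hypothetical monochord and expresses each ratio in cents, the modern logarithmic unit the Greeks of course did not use; the withheld 5:4 third is included for comparison. The 200 Hz open-string frequency is an arbitrary assumption made for the example.

```python
import math

def cents(ratio: float) -> float:
    """Express a frequency ratio as cents (1200 cents = one octave)."""
    return 1200 * math.log2(ratio)

# Stopping a monochord string at a fraction of its length raises the pitch
# by the reciprocal of that fraction (a shorter string gives a higher frequency).
base_freq = 200.0  # arbitrary open-string frequency, in Hz
divisions = {
    "octave (2:1)": 1 / 2,   # stop the string at half its length
    "fifth (3:2)": 2 / 3,
    "fourth (4:3)": 3 / 4,
    "tonos (9:8)": 8 / 9,    # the gap between fourth and fifth
    "major third (5:4), beyond the tetractys": 4 / 5,
}

for name, length_fraction in divisions.items():
    ratio = 1 / length_fraction
    print(f"{name:42s} ratio {ratio:.4f}  {cents(ratio):7.2f} cents  "
          f"{base_freq * ratio:6.1f} Hz (from a {base_freq:.0f} Hz open string)")
```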

Diatonic Scaling

Even though there are practically unlimited ways to divide audible sound into separate pitches (see later), the Western practice is initially to maintain the octave as an identity
or unison interval and to partition this octave into a scale of 7 discrete steps. The scale comprises 5 (whole) tones (T) and 2 semitones (S), often represented as the series T-T-S-T-T-T-S, thus forming the scale we today refer to as "major." Originally, the generative structure behind this scale is the tetrachord—literally "four strings"—and the partition of the octave is rooted in the combination of five equally constructed groups of four tones within the compass of a fourth, called diatonic tetrachords (Figure 6.1).

[Figure 6.1: The construction of a (double) diatonic tone system with a slight chromatic element.]

To the Greeks, these five tetrachords together formed a complete tone system, and again "against this tetrachordial thinking the fourth interval stands out as «das Lieblingskind der griechischen Theorie» (the lovechild of Greek theory)" (Handschin 1948, quoted in Sundberg 1980, my translation). The internal proportions of the tetrachord were constructed2 on the basis of the three fundamental intervals whose ratios, as mentioned earlier, are contained within the tetractys: the octave, the fifth, and the fourth. Setting the octave to 12, the octave (2:1) below will be 6, the fourth (4:3) below will be 9, and the fifth (3:2) below will be 8, and the procedure thus reveals the interval 9:8, namely, between the fourth and the fifth. The Greeks called this interval "tonos" (≈ "tension"), which was "considered to be the fundamental tone-step" (Sundberg 1980, 112, my translation), and the Pythagoreans used it to "fill out" the gaps (downward) between the fundamental intervals—see d and c in tetrachord 1 and f and g in tetrachord 2 in Figure 6.1.3 The tetrachordal understanding of this early diatonic scale came to dominate in Western music theory until the eleventh century, when a hexachordal understanding gradually took over, along with the development of Guidonian notation and the mnemotechnical system called the Guidonian hand. The use of the hexachord as a particular unit was most of all a pragmatic move, not a theoretical one. Its primary aim was to ease and support the learning of vocal music, and it formed an intelligent approach to the adoption of the proportions of the diatonic system of the time by the introduction of the fixed format T-T-S-T-T, vocalized as ut-re-mi-fa-sol-la. Furthermore, it made it easy to distinguish between the three different versions, the natural, the hard (durum), and the soft (molle), according to their note placement, respectively c, g, and f, thus allowing for the use of both b and bb (see Figure 6.2). The tone-step was maintained as a fundamental proportion, and so the introduction of the hexachord did not weaken the diatonic scale's dominant position. Quite the contrary. By replacing the much more ambiguous neumes, the innovation in practice not only consolidated the scale but also indicated the transition from an informed descriptive
system of complex signs to a reductionist prescriptive system of disconnected and generalizable abstract entities. As such, it marks the first steps toward the merging of a tone system and a notation system too, pointing toward a situation that legitimizes to a certain degree Wishart's claim that "the priorities of notation do not merely reflect musical properties—they actually create them" (1996, 11).

[Figure 6.2: The possible hexachords within the tone system of the Middle Ages, the gamut.]

With the eventual expansion of the tone system, the so-called gamut (see Figure 6.2), by the inclusion of half steps between the (whole) tone-steps—some of which in the Late Middle Ages were referred to as false or fictive music (musica falsa and musica ficta)—a full chromatic scale could finally emerge in the sixteenth century. And even though the basic scale almost certainly was the pentatonic (Hansen 1995, 25f), just as it was and is in most non-Western music (Tran 1977, 77), the inclusion of semitones afforded the prerequisites for the development of the harmonic principles that came to dominate music from the late seventeenth century until 1900—after Riemann (1893), often referred to as cadential harmony or "functional harmony"—in which the two semitones of the diatonic scale provide the pivotal leading notes. Returning to the tetrachord, it is noteworthy that it played a decisive role as a constitutive element in tone scales of the Greco-Roman epoch, not just in Greece but also in general, for instance, in Anatolia and Mesopotamia. And besides the diatonic tetrachord, the Greeks also referred to two variants of the four-tone unit: namely the enharmonic and the chromatic, where the enharmonic was built on the intervals quarter-tone–quarter-tone–major third and the chromatic on semitone–semitone–minor third. Had these tetrachords not been suppressed by the diatonic, the inner workings of the Western scale or scales could have been composed very differently from what they came to be. In comparison, the tone system of Anatolia, most of which today is known as Turkey, developed into a number of slightly different scales, makams, which represent unique selections of tones from a division of the octave into 53 theoretically equidistant intervals, referred to as commas. These proportions suggest that the chromatic and enharmonic tetrachords or similar continued to function as constituent elements. However, elements similar to the diatonic principle are present in these scales too, as many makams include octave identity, dividing the octave accordingly into scales or combined sets of pentachords and tetrachords, each scale comprising seven steps. Just like the makams, scales from other non-Western music cultures bear similarities with the diatonic scale, but the general picture is that they are typically pentatonic selections.
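The stacking-of-fifths construction mentioned in note 3 of this chapter can be made concrete in a short sketch; the following Python example is offered as a reader's aid under Pythagorean assumptions, not as part of the chapter's argument. It folds a chain of pure 3:2 fifths back into a single octave and prints the resulting step ratios, the 9:8 tonos and the 256:243 limma, in the familiar T-T-S-T-T-T-S pattern.

```python
from fractions import Fraction

def fold_into_octave(ratio: Fraction) -> Fraction:
    """Reduce a ratio so that it lies within one octave (between 1:1 and 2:1)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

fifth = Fraction(3, 2)

# Note 3's chain F-c-g-d'-a'-e''-b'', taken relative to c: one fifth below the
# reference tone and five fifths above it, each folded back into one octave.
degrees = sorted(fold_into_octave(fifth ** k) for k in range(-1, 6))
degrees.append(Fraction(2, 1))  # close the scale with the octave (2:1)

steps = [hi / lo for lo, hi in zip(degrees, degrees[1:])]
print("scale degrees:", [str(d) for d in degrees])
print("step ratios:  ", [str(s) for s in steps])
# Prints five 9/8 whole tones and two 256/243 semitones (limmas)
# in the T-T-S-T-T-T-S pattern of the "major" scale.
```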


Some Notational Implications

From the beginning of the thirteenth century, five-line staff notation was dominant along with the increased use of the polyphony of the time, which also brought detailed representation of the rhythmic parameter with notes of different duration organized by divisive (multiplicative) principles—♩ = ♪ + ♪, for example—as well as the introduction of bar-lines and, somewhat later, measures. As mentioned, the tone system and the notation system have become increasingly interdependent in accordance with the developments of the latter and the resulting expansion of the former. However, being based on theory and abstraction, both the tone system and the notation system relate to a number of compromises and choices, and almost all ratios, maintained and recorded within the two systems, "do not really" comply with the stringency of the systems (Hansen 1999). Instead, sounding music is almost free of the exact divisions of the systems (see Danielsen, this volume, chapter 29), which, nevertheless, have a decisive influence on the development of musical instruments. And many instruments will unavoidably regulate and restrict the possible interactions with the systems, thus, in reality, forming yet another technological system that not only solidifies the two other systems but also co-shapes our understanding of what can be achieved and imagined even further. The manifest musical interfaces are thus developed into affirmative and controlling devices on their own.

Interface

Depending on the needs and ambitions at a given time, the development of musical instruments and the generation of tone and notation systems have to interconnect in a process of synthesis in which they can be refined dialectically. The innovations and changes that vocal music went through following the breakthrough of the radical new kind of polyphony that Ars Nova brought about in the fourteenth century gradually transformed music from an almost entirely linearly organized art form into a musical practice that included increased attention toward the vertical parameter as well. This development initiated the emergence of instrumental music, as a separate trajectory, and eventually also paved the way for the emergence of the triad, which later made up a central element in the constitution of the "functional harmony" mentioned earlier. The central premise for the triad to emerge was the inclusion of the third as a consonant interval. The first traces of this process go back to the twelfth century, when theoreticians began to refer to the purely empirical observations, secundum auditum ("by the ear"), that were achieved when practicing and performing vocal music that displayed polyphonic implications (Hansen 1995, 58). This development eventually led to reconsiderations regarding the relation between consonance and dissonance, which did not simply imply the inclusion of the third as a consonant interval but also, in fact, formed a
break with Pythagorean theory because the third's numeric ratio fell "outside the range of the tetractys, which by the Pythagoreans assumedly had to be given some significance" (Sundberg 1980, 107). As they did not want to exceed the restriction of 4,4 they consequently conceived of the major third by moving four fifths up and two octaves down: (3/2)⁴ × (1/2)² = 81/64.
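The following worked example is not from the chapter but may make the consequences of this construction tangible: it computes the Pythagorean third of 81:64 just derived, compares it with the pure 5:4 third (the difference being the syntonic comma of 81:80, roughly 21.5 cents), and estimates the beat rate that the Lindley quotation below attributes to such thirds on an organ stop. The figure of roughly 261.6 Hz for middle C is the usual approximation at a concert pitch of a = 440 Hz, not a value given in the text.

```python
from fractions import Fraction
import math

def cents(ratio) -> float:
    """Express a frequency ratio as cents."""
    return 1200 * math.log2(float(ratio))

pythagorean_third = Fraction(3, 2) ** 4 / Fraction(2, 1) ** 2   # 81/64
just_third = Fraction(5, 4)

print(pythagorean_third, round(cents(pythagorean_third), 1), "cents")   # 81/64, about 407.8
print(just_third, round(cents(just_third), 1), "cents")                 # 5/4, about 386.3
print("difference (syntonic comma):", pythagorean_third / just_third)   # 81/80, about 21.5 cents

# Audible consequence: on a sustained organ tone the 5th harmonic of the lower
# note and the 4th harmonic of the upper note nearly coincide and beat at the
# difference of their frequencies. Assuming middle C at roughly 261.6 Hz:
c4 = 261.6
e4_pythagorean = c4 * float(pythagorean_third)
beat_rate = abs(4 * e4_pythagorean - 5 * c4)
print(f"beats per second for a Pythagorean C-E third: {beat_rate:.1f}")  # just over 16
```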

From the title of his treatise, it is evident that Ramos de Pareja was concerned with music in practice: music in its performance. Thus, his theoretical effort was apparently motivated by the need to overcome the divide between music in theory and music in practice that the development of polyphony in particular had led to, or, rather, had uncovered. Just as singers and choirs in practice had probably been singing secundum auditum all along, ensuring that intervals other than octaves, fifths, and fourths sounded in consonance, they probably had never had any problems in “transposing” or “shifting” from one central pitch to another (cf., hexachord, earlier)—they simply maintained the internal proportions of the scale by ear—this obviously was, however, not the case regarding the musical instruments of the time. The instruments were laid-out and tuned according to the principles of the Pythagoreans, to whom “harmonia”—as already implied—was a question of codifying the (diatonic) scale and “the relationship between those notes that constituted the framework of the tonal system” (Dahlhaus 2017) and not the theoretical or acoustic harmony of simultaneously sounding intervals. Consequently, instruments with fixed steps like, for instance, the organs and the early clavicembalo of the Middle Ages, were not capable of producing triads that sounded satisfactory. The thirds and sixths of the Pythagorean scale do not meet medieval and Renaissance criteria of consonance implied by such terms as “perfection” and “unity.” When used as harmonic intervals these Pythagorean 3rds and 6ths are likely to be characterized, on an organ Diapason stop for example, by rather prominent beats; middle C–E or C–A beat more than 16 times per second at modern concert pitch.  (Lindley 2017)

By 1600, the development of the tone system had found its final form with the division of the octave into twelve intervals or half-steps—however, the question of tuning was not solved at this time (and remains still unsolved to a certain degree). Equal temperament,

systemic abstractions: the imaginary regime   125 which was suggested by Vincenzo Galilei in 1584, where the semitone is defined as 12 2 , is a compromise that is not fully musically satisfying. Just or harmonic tuning, that is based on the ratios of small integers as suggested by Parejo and Zarlino, is more pleasing to the ear, and is typically used by vocals and strings in ensembles where the performers adjust pitch with each other by the ear (secundum auditum). However, just tuning only works ideally for one scale at a time, hence the tuning of, for example, D major is not the same as that of E major. Today, however, music software programs such as Apple’s Logic Pro X and Steinberg’s Cubase include Hermode tuning, which is capable of adjusting simultaneously sounding tones of electronic instruments in real time to accommodate to just intonation without compromising equal temperament as the overarching tuning. It is, however, interesting to consider that, by taking one’s point of departure in the diatonic scale’s combination of tones and semitones, there is nothing that prevents the whole tone from being more than twice as large or less than twice as high as the halftone. If a tone system’s minimum interval equals m, then the system using the fewest tones, is the one whose semitone is m and whole tone 2m. This gives in all 2 × m + 5 × 2m = 12m, hence 12 tones per octave.” (Hansen 2003, 1645, my translation)

However, if the half tone is set to 2m and the whole tone to 3m, we get 19 tones within the octave (2 × 2m + 5 × 3m = 19m), which, compared with the 12-tone system, is slightly better or more precise in terms of pure intervals. That it was nevertheless the 12-tone system that prevailed is probably due to economic and technical performative advantages, and maybe to the fact that Western European music preferred to live with too-large major thirds (implying small semitones and thus sharp leading notes) rather than major thirds that are smaller than the pure one.  (Hansen 2003, 1645, my translation)
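A brief numerical check, again not part of the chapter and only a sketch, may help the reader weigh this comparison: dividing the octave into 12 or 19 equal steps and measuring, in cents, how closely each division can approach the pure fifth (3:2) and the pure major third (5:4). The 19-division indeed gives a noticeably closer, and flat, major third, whereas its fifth is somewhat less pure than in the 12-division system.

```python
import math

def cents(ratio: float) -> float:
    """Express a frequency ratio as cents."""
    return 1200 * math.log2(ratio)

pure = {"fifth (3:2)": 3 / 2, "major third (5:4)": 5 / 4}

for divisions in (12, 19):
    step = 1200 / divisions          # size of one equal step, in cents
    print(f"\n{divisions}-tone equal division of the octave (step = {step:.1f} cents)")
    for name, ratio in pure.items():
        target = cents(ratio)                      # the pure interval, in cents
        nearest = round(target / step) * step      # the closest available step
        print(f"  {name:18s} pure {target:6.1f}  nearest step {nearest:6.1f}  "
              f"error {nearest - target:+5.1f} cents")
```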

This 19-tone system has not been abandoned altogether—the 19-tone guitar is for instance currently available5—but, from this time on, the system of 12 intervals per octave has been the absolutely dominant standard. Even though many musical instruments can be traced back much further—and very often to non-Western cultures and countries—most of the ones that exist today have been refined and developed to reach their current shape and level of perfection within the last 300 years in order to comply with the 12-interval tone-system. The piano and its well-known keyboard layout is in many ways the epitome of a modern musical instrument, being, as it is, the most commonly used instrument for music teaching and demonstration because it provides a generally objective interface whose design is easily understood and is capable of producing many simultaneously sounding tones. And although its tuning is typically fixed, it is possible to alter the tuning on the original, acoustic instrument. Such a practice was actually very often the case in the seventeenth century, when there were many candidates for the new tuning system that the new fully chromatic tone system required, and it was not
uncommon for composers and keyboardists to tune their instrument themselves according to the particular standards of the times or to their own preferences. The division of the octave into 12 discrete steps is, however, fixed, just as the diatonic scale in a way holds a privileged position (highlighted by the seven white keys). Contrary to modern Western musical instruments, which generally all—apart from a few exceptions like the trombone and the family of strings—dictate discrete steps through fixed holes, keys, frets, and similar contraptions, instruments from other cultures do not just comprise many variations of the fretless violin-type instruments and assortments of different reed instruments that correspond to distinctive tunings but also encompass fretted instruments where the frets are organized in fewer or more steps than fretted instruments of the West today or where the distance between frets is even adjustable in order to accommodate different intervallic relationships of dissimilar modus. This could indicate that, without the dominance of a metaphysical, fundamentally extramusical theory such as the Pythagorean theory, the cores of other tone systems might be closely tied to, for example, emotion and experience, as the use of "'makams or perdahs' and South Asian 'rāgas and rāginīs' . . . properly signifies" passions or affections of the mind (Powers et al. 2017)—but then again, the Greeks, or at least Ptolemy, also associated different modes with ethnic names, such as Dorian, Lydian, and so on, which were carried on in the writings of Boethius (ca. 480–525). As implied, the West maintained a highly theoretical axiom leading to a situation where it can be said that the more theorized tone systems are, the more abstract they become, and the more abstract they become, the more flexible (understood as emancipated from materiality) but, at the same time, regulating and influential they are. Once a tone system coincides with a notation system, the way is paved for a completely soundless music to emerge: a music that is conceived in close relation to descriptive units organized in a standardized coordinate system that is visualized and thus readable. When this has taken place, there are absolutely no (physical) limits to the kinds of music that can be imagined as long as it respects the rules and norms of the systems, leading to a situation where we expect and perhaps even require the elements of the system—and none other—to be present for us to acknowledge music, even though these elements are all principally abstract and not dependent on any kind of physical, material interaction.

Interaction

Until now, all references to musical instruments in this chapter have signified traditional acoustic musical instruments. During the first half of the twentieth century, however, a number of electrophones were introduced, such as the Hammond organ, the electric guitar, and the first synthesizers, and some of these instruments came to change and expand the concept of the interface and its implications. Among the synthesizers, the Ondes Martenot and the Theremin—which are both monophonic or one-note-at-a-time instruments—introduced new kinds of step-less interfaces (in the case of the Ondes
systemic abstractions: the imaginary regime   127 Martenot, though, in combination with a normal keyboard interface) thus offering seamless control of pitching similar to that of, for instance, the violin. More generally, in being electric, these electrophones brought a new element to the field of music: namely musical sound as an electromagnetic current either produced as a transformation of acoustic sound, as in the case of the electric guitar and the human voice through a micro­ phone, or as entirely electrically generated phenomena, as in the case of synthesizers and samplers. Besides being a major factor in the development of most of the popular music genres that emerged through the greater part of the twentieth century, sound as an electromagnetic current, obviously, came to play a paramount and indispensable role in relation to the sound production of music and eventually also in the digitization of music. By the middle of the twentieth century, it had become possible to work with sound interactively by means of tape-recorders. This technology was primarily used for record­ ing and reproduction of performed music—where it soon catalyzed a development by which music and sound increasingly became subject to numerous forms of manipulation and processing—but some of the avant-garde composers used tape-recording to integrate nonpitched or concrete sound resources into their compositional vocabulary while others used it in combination with sine-wave generators to lay out new, unique tone sys­ tems for unique compositions and, in both cases, the tape-recorder was actually turned into a musical instrument of its own. Examples of this include Pierre Schaeffer’s continual development of musique concrète from the late 1940s and onward (Knakkergaard 1994, 36) and Karl-Heinz Stockhausen’s meticulous investigations into alternative pitch-­ organizations and ton-gemische in Studie 1 (1953) and Studie 2 (1954) (Manning 1994, 45f.). Much of this must be seen as an attempt to escape the straitjacket of the systems just as, for instance, dodecaphony and serialism were attempts to dismantle the tyranny of the diatonic scale, and the odd instruments and arbitrary tone systems of Harry Partch are idealistic emancipatory efforts to break the spell. However, only Schaeffer’s work seems to have had lasting importance and not really as a specific aesthetics but much more as a method—and, frankly, a method that first aroused greater interest once its practical implications were simplified by the introduction of digital sampling techniques. Instead of forming a substantial challenge to the systems, the electric turn simply per­ petuated them in its composition and concept of musical artifacts. It did contribute immensely to the development and shaping of music as we know it today, but it did not form any kind of break. Standing on the shoulders of the advances of the, by this time, unambiguous systems, and the practices and understandings they had led to, the electric turn, for its part, brought about a great many techniques and approaches toward music making and performance that progressively changed the concept and idea of sounding music. In particular, the recording studio nourished ideas and imaginations about music that were unthinkable without it; in this way, the studio became a musical instrument in itself. 
It has not really challenged the systems though—they are safe and sound—instead, the electric turn acts much more like an interface that facilitates the shaping of the sounding of the music as performance, and makes this sound facilitation an art of its own. In what is essentially the same period, a few composers such as Max Mathews and Lejaren Hiller, who in their capacity as researchers had access to the mainframe
computers of the time, carried out experiments with digitally produced—and partly generated—music and sound. The application of digital technology did not just imply an expansion of available interfaces (digital technology's physical interfaces that are comparable with traditional music instruments in fact came somewhat later) but additionally offered new ways to control and interact with musical sound as well as new models for musical shaping. From the start, this was only carried out on a very limited scale since it could only be done by using mainframe computers and more or less standardized command line programming. But, following the introduction of MIDI, a standard protocol for the digital control of musical events, at the beginning of the 1980s, together with the rapidly growing propagation of microcomputers and the development of "graphical programming" in the same decade, the digital interface became quite ubiquitous and so did various kinds of interaction that exceeded the, strictly speaking, very definite forms of interactions that are possible by means of traditional musical instruments (acoustic as well as electric). Today, computers, digital audio, and MIDI—in the form of a vast number of music and sound applications of which many are specifically aimed toward particular uses, interests, and music—together have become a more or less dominating factor and reference, making the tone system, along with the matching notation system, accessible from a plethora of digital sources. One of the most baffling elements that this development has brought with it is the unification of the three separate interdependent systems—or technologies—discussed so far: the tone system, the notation system, and the interface technology, namely, the instruments (see also Dyndahl, volume 1, chapter 10, and Danielsen, this volume, chapter 29). Digital technology in the form of MIDI integrates the three systems in such a way that they appear to be one coherent and indivisible system. In this alternative and, in a way, nonphysical world, the tone of the system, the note of the score, and the key of the keyboard have apparently become one, and any trace of theory and abstraction has practically been obscured by the manifest totality and parallelism of the digital virtualization (for a more detailed discussion on some of the consequences of this, see Knakkergaard 2016b). Although digital technology logically offers unlimited ways of organizing sound into separate tone steps, for designing interfaces, and for representing sonorous events graphically or likewise, such opportunities have nevertheless only been developed to a modest degree, and even though the technology epitomizes a situation where most physical borders can be crossed at will, the twelve intervals of the octave and its protagonist, the diatonic scale, are maintained. By means of digital technology it is, for instance, much easier than ever before to work with different tunings. In addition to the possibility for the user to define personal, unique tunings, the music application Logic Pro X, for instance, offers 97 "standard" tunings including scientific ones such as "1/4-comma meantone with equal beating fifths" and "12-tone Pythagorean subset of JI 17-tone scale"; historical ones like "Ramos de Pareja (Ramos de Pareia)—Monochord, Musica practica (1482)" and "J. S. Bach 'well temperament,' acc.
to Jacob Breetvelt’s Tuner”; and also exotic tunings such as “Northern Indian Gamut, modern Hindustani Gamut out of 22 or more Shrutis” and “Gamelan Udas Mas (approx) s6,p6,p7,s1,p2,s2,p2,p3,s3,p5,s5,p5.” Due to MIDI’s limitations, there is, however, no way to avoid the twelve steps of the octave and the

systemic abstractions: the imaginary regime   129 notion of the standard keyboard layout because the otherwise abstract tone in MIDI is understood as a key-number instead of a tone-number and as such is tied to the concrete concept of a key which is struck.6 MIDI is organized via the metaphor of the standard Western keyboard, plain and simple, and thus, in reality, controls and regulates the way music is created and appreciated in a much more rigid and dominating sense than was possible before its advent, eventually nourishing the notion that the systems behind are ontologically given a priori. Thus, unless we turn to sound art and sound installations, the development seems only to have consolidated the dominance of the implied systems and the interval has not at all been set free. Although practices that imply procedures such as glissandi, blue notes, and similar alterations that diverge from the fixed intervals are still widely used, the keyboard metaphor is not really suited for the “in-betweens” and the 12-note segmen­ tation of the octave—and the octave itself—in MIDI is not just a prerogative but an una­ voidable premise. Composers, performers, critics, and music thinkers, especially in the field of contem­ porary music, every so often challenged this situation. The alternative protocol, ZIPI, which was introduced in the 1990s, is maybe the best qualified and most versatile and least esoteric example of this. But although the proposed standard was MIDI compatible, and thus did not imply a complete break with current equipment and practices, it never caught on and, to date, every attempt to establish an agenda that could threaten the con­ cepts seriously has failed. For the time being, it seems fair to claim that music is not only organized by means of the system’s concepts and elements, but it is also imagined through the same conceptual formats—taking sound production of music into consid­ eration confirms this.

Final Remarks

Music of today—and roughly speaking of the last 2,500 years—is not just influenced but also determined by a particular kind of metaphysical thinking of the Ancient Greeks. Although this thinking's strong focus on the number four in reality lost its footing long ago, it is still the major factor behind the idiosyncratic regime of possible tone-steps within the range of audible sound and therefore the division of the octave—and nothing seems capable of disturbing this regime seriously. The tone steps and their tuning might appear to us as the squares or fields of graph paper but, in reality, they build a format that resembles the nodes of the lines of the grid and not the squares. Thus, there are many possible nodes in-between the ones that are preprinted, and even though they are invisible, these alternative nodes are very often articulated and exposed in musical performances. They are, however, brought into play in relation to the preformatted nodes that, in this way, function not just as a reference but also as theoretical and abstract final goals. There is no doubt that this well-organized and continuously exposed universe of pitches, and especially the various selections that make up certain identifiable "tonics"—pentatonic,
130   martin knakkergaard diatonic, and familiar, or unique modes—is essential as the core premise for musical creativity in Western cultures, not least because the “tonics”—contrary to the thoroughly chromatic as in the case of serialism—make way for sensations of invariant musical signs (motives, figures) that are perceptually achieved through a kind of object perma­ nence even where there is talk of obvious breakdowns between the different expositions of the sign (Kjeldsen 2004). As Kjeldsen points out, it is our perception that “delivers” the notion of identity or equivalence, just as it makes us “experience” elements such as tension and relaxation. Thus, we really cannot hear what we are hearing, the “tonicity” is too strong and our brains are too adapted to the patterns of the diatonic scale. The strength of our perception of the diatonic scale—or any scale with which we can become familiar—also makes it possible for us to ignore the quality or character of the sound source and it does not matter if it is a piccolo flute or a bass synthesizer that plays a melody, we can still recognize it just as we can even when it is transposed. The diatonic scale is a strong regime. Historically, there are numerous examples of alternative proposals aimed at replacing or supplementing the ruling tone system. Ferruccio Busoni’s essay Entwurf einer neuen Ästhetik der Tonkunst, first published in 1907, is a good—and famous—example of such a proposal in which he, among many other things, suggests an expansion of the octave into eighteen steps by means of a sixth-note-division of the whole tone (Busoni [1916] 1973, 40) and further claims that music’s full blossom is hindered by the instruments, that “their range, their tone, what they can render . . . are chained fast, and their hundred chains must also bind the creative composer” (Ruscoll 1972, 33). Many composers and musicians in the twentieth century challenged the dominance of the systems, some by expanding it further similar to what Busoni dreamed of, others by working with non­ pitched or weakly pitched sounds in the form of samples, as introduced by the composers of musique concrète. However, even when working with isolated sound samples, the organization of these may take the form of a “normal” piece of music, in composition as well as in realization. This is, for instance, the case in a highly “outer-space-soundscape” composed by Eric Serra for a particular scene in Luc Besson’s movie The Fifth Element. In the scene, even though the odd sounds and many “picturesque” sound-effects appear somewhat chaotic and as a properly stereotyped space-soundscape, when closer exam­ ined they turn out to be neatly composed in accordance with not just a discrete steady beat but also in accordance with a selection of pitches that evokes a sense of musical mode with references to the diatonic scale (see Knakkergaard 2009, 294). Again, the diatonic scale is a strong regime, and, in a way, it seems fair to claim that not just our musical practices but also our understandings and imaginations of music are subject to the discreet hegemony of diatonism. However, this diatonism, and the abstract entities of the reductionist tone system as a whole, have nourished the development of a firm, highly complex and advanced basis for musical creativity and imagination. 
These frame­ works that today truly are numerically regulated have provided the prerequisites that secure the comprehensibility of highly complex sound structures and an overwhelming amount of highly different musical genres and styles. So, maybe the Greeks were right after all, not in their focus on the number four, but in their vision and imagination of the

order of musical sound regulation as a means to gain insight into some of the fundamental principles of existence. By detaching the structure of the tone system from practice and by making its entities abstract, the path is paved for a composite, ideal system whose elements are all theoretically balanced. Such a strategy is not unique to Western cultures: no matter where we look in time and place, there are always basic norms, scales, and generative principles in play in the making of sonorous musical artifacts intended for ceremonial, religious, and eventually epistemological purposes. The question remains, however: how is it possible to overcome the limitations that the current principles entail, and how can their imaginative spell be broken?

Notes
1. From around 1150, Byzantine neumes did, however, indicate intervals but not pitches.
2. Or reconstructed theoretically, as the tetrachords were present empirically at the time.
3. The diatonic scale can also be produced by applying the fifth as “generator interval”: F–c–g–d’–a’–e’’–b’’, just as the pentatonic scale can be produced by stacking fifths F–c–g–d’–a’ and the chromatic scale by proceeding from the diatonic: F–c–g–d’–a’–e’’–b’’–f#’’’–c#’’’’–g#’’’’–d#’’’’’–a’’’’’. However, before the introduction of equal temperament—which in fact replaces the fifth with the semitone as the generator interval—these intervals would, just like the ones produced by means of the tetrachord, not “be in tune” when folded back into the same octave (see the sketch following these notes).
4. The fact that the Pythagoreans defined the interval of the whole tone as 9:8 does not corrupt this point, as they understood the 9 as a fifth plus a fifth and the 8 as a fifth plus a fourth, this way maintaining the limits of 4.
5. See https://en.wikibooks.org/wiki/Guitar/Print_Version. Accessed March 2017.
6. By tuning every single key-number (tone) individually, it is possible to program MIDI in such a way that it has more than 12 tones to the octave.
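The claim in note 3, that fifth-generated collections do not “be in tune” when folded back into the same octave, can be checked numerically. The following is a minimal illustrative sketch (the helper function and its name are ours, supplied for illustration only): it stacks pure 3:2 fifths, folds each back into a single octave, and shows that twelve fifths overshoot the unison by the Pythagorean comma.

```python
from fractions import Fraction

def fold_into_octave(ratio):
    """Reduce a frequency ratio into the octave [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

fifth = Fraction(3, 2)

# Fold the stacked fifths (F, c, g, d', a', ...) back into one octave.
scale = sorted(fold_into_octave(fifth ** k) for k in range(12))
print([f"{float(r):.4f}" for r in scale])

# Twelve pure fifths should close the circle (ratio 1/1), but they do not.
comma = fold_into_octave(fifth ** 12)
print(comma, "=", float(comma))   # 531441/524288, about 1.0136: the Pythagorean comma
```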

References Attali, J. 2008. Noise and Politics. In Audio Culture: Readings in Modern Music, edited by C. Cox and D. Warner, 7–9. New York: Continuum. Brinner, B. 2017. Indonesia. §III: Central Java. 3. Instruments and ensembles. Grove Music Online. Oxford Music Online. Oxford University Press. Accessed October 16, 2017. Burkert, W. 1962. Weisheit und Wissenschaft: Studien zu Pythagoras, Philolaos und Platon. Nürnberg: Hans Carl. Busoni, F. (1916) 1973. Entwurf einer neuen Ästhetik der Tonkunst. Hamburg: Verlag der Musikalienhandlung. Karl Dieter Wagner. Dahlhaus, C. 2017. Harmony. Grove Music Online. Oxford Music Online. Oxford: Oxford University Press. Handschin, J. 1948. Der Toncharacter. Zürich: Atlantis. Hansen, F. E. 1995. Middelalderen. In Gads Musikhistorie, edited by S. Sørensen and B. Marschner, 15–72. Copenhagen: G.E.C. Gad. Hansen, F.  E. 1999. Musik: Logisk konstruktion eller æstetisk udtryk? In Æstetik og logik, edited by J. Holmgaard, 151–167. Aalborg, Denmark: Medusa.

Hansen, F. E. 2003. Tonesystem. In Gads Musikleksikon, edited by F. Gravesen and M. Knakkergaard, 1641–1646. Copenhagen: G.E.C. Gad. Kjeldsen, J. 2004. Tonale gruppesymmetrier versus tonale gennembrud: Semiotikken mellem struktur og tilbliven. In Semiotiske undersøgelser, edited by T. Thellefsen and A. M. Dinesen, 141–163. Copenhagen: Hans Reitzel. Klempe, H. 1991. Musikkvitenskapelige retninger: En innføring. Oslo: Spartacus. Knakkergaard, M. 1994. IO: Om musik, teknologi og musikteknologi. Odense, Denmark: Odense Universitetsforlag. Knakkergaard, M. 2009. The Musical Ready-Made: On the Ontology of Music and Musical Structures in Film. In Music in Advertising: Commercial Sounds in Media Communication and Other Settings, edited by N. J. Graakjær and C. Jantzen, 275–304. Aalborg, Denmark: Aalborg Universitetsforlag. Knakkergaard, M. 2016a. Music by Numbers. In Cultural Psychology of Musical Experience, edited by S. H. Klempe, 299–318. Charlotte, NC: Information Age. Knakkergaard, M. 2016b. Unsound Sound: On the Ontology of Sound in the Digital Age. Leonardo Music Journal 26: 64–67. Leibniz, G. W. von. 1712. Letter to Christian Goldbach, April 17, 1712. https://en.wikiquote.org/wiki/Gottfried_Leibniz. Accessed November 10, 2017. Lindley, M. 2017. Pythagorean Intonation. Grove Music Online. Oxford Music Online. Oxford University Press. Accessed October 11, 2017. Manning, P. 1994. Electronic and Computer Music. Oxford: Clarendon. Mathiesen, T. J. 2017. Greece, §I: Ancient. (i) Pythagoreans. Grove Music Online. Oxford Music Online. Oxford University Press. Accessed March 21, 2017. Patel, A. D. 2006. Musical Rhythm, Linguistic Rhythm, and Human Evolution. Music Perception 24 (1): 99–104. Powers, H. S., and R. Widdess. 2017. Mode, §V: Middle East and Asia. Grove Music Online. Oxford Music Online. Oxford University Press. Accessed September 30, 2017. Riemann, H. 1893. Vereinfachte Harmonielehre, oder die Lehre von den tonalen Funktionen der Akkorde. London: Augener. Ruscoll, H. 1972. The Liberation of Sound. New York: Prentice Hall. Sacks, O. 2008. Musicophilia: Tales of Music and the Brain. London: Picador. Sundberg, O. K. 1980. Pythagoras og de tonende tall. Oslo: Tanum-Norli. Tran, V. K. 1977. Is the Pentatonic Universal? A Few Reflections on Pentatonicism. World of Music xix (1–2): 76–84. Wishart, T. 1996. On Sonic Art. Amsterdam: Harwood Academic.

chapter 7

From Rays to Ra
Music, Physics, and the Mind
Janna K. Saslaw and James P. Walsh

Introduction Surfing the net one day led us to the discovery of a fortuitous combination of articles. The first, by Elizabeth Hellmuth Margulis, titled “One More Time” (Margulis  2014), deals with the crucial role of repetition in musical experience. The second article, “A New Physics Theory of Life,” describes the work of MIT’s Jeremy England (Wolchover 2014). England’s work focuses on the second law of thermodynamics, particularly on how entropy can be defeated locally under certain physical conditions. The significance of repetition in both articles led us to the thought that a line could be traced from the one to the other. That is, if repetition is crucial to the emergence of life and to the experiencing of music, could it be that a fundamental relationship underlies both phenomena? We will take a moment to examine the implications of England’s work. Entropy can be regarded as a measure of the tendency of energy to disperse over time.1 We focus on entropy of “open” systems. Within these systems, entropy can be kept low by increasing the entropy of their surroundings. During photosynthesis, for example, a plant uses sunlight to maintain its own internal order while increasing overall entropy in the universe (Wolchover  2014). Jeremy England’s mathematical formula shows that more likely evolutionary outcomes involve atoms that absorb and dissipate more energy. Significantly, “[p]articles tend to dissipate more energy when they resonate with a driving force, or move in the direction it is pushing them, and they are more likely to move in that direction than any other at any given moment.” For example, “clumps of atoms surrounded by a bath at some temperature, like the atmosphere or the ocean, should

134   janna k. saslaw and james p. walsh tend over time to arrange themselves to resonate better and better with the sources of mechanical, electromagnetic or chemical work in their environments” (Wolchover 2014).2 There are two mechanisms mentioned by England that can increase efficiency of energy use and its subsequent dissipation. These are self-replication (in nonliving or living things) and increasing structural organization. Self-replication increases energy use and dissipation by copying an already efficient entity. Structural organization will only increase, as indicated earlier, if it results in greater energy usage. Both mechanisms are found in life forms, but are not limited to them.3 It seemed to us that resonance with an energy source, self-replication (a form of repetition), and increasing structural organization were all notions that pertain to sound in general and to music in particular, both as physical and cultural productions. A few examples may suffice at this point. Since sound is produced by waves, resonance works most obviously in areas where waves combine: timbre, consonance, and synchrony of constituent elements. Repetition applies to rhythm, but also to many other facets of sound, including the creation, recognition, and memory of pitch patterns. Increasing structural organization can be found in the development of sonic and musical creations over time. In this chapter, we intend to trace the role of repetition from the atomic level to the homeostasis (stable state of equilibrium) of life forms, to the formation of culture, and to music. We will focus on the evolutionary advantage of music in homeostasis of individuals and cultures, both sub- and supraconsciously, delineating what we call a “homeostatic frame of reference.” We will provide short examples in music, from lullabies to Beethoven, before examining two longer examples presenting the Afrofuturist jazz pioneer Sun Ra’s vision and methods of expanding listeners’ homeostatic frame of reference through his music. We speculate that music is not simply a cultural invention or an evolutionary trait, but rather an outcome of elementary laws governing the disposition of matter.4 It is a product of iteration or periodicity and the natural accumulation of complexity through variation. Just as England implies that if you shine light on random atoms long enough, they will tend to self-replicate and organize until, eventually, you will get a plant, we propose that if you continue shining that light you will get music.
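The thermodynamic bookkeeping behind this argument can be stated compactly. The display below is a schematic restatement of the standard entropy balance for open systems, not England’s own inequality:

\[
\Delta S_{\text{total}} \;=\; \Delta S_{\text{system}} + \Delta S_{\text{surroundings}} \;\ge\; 0,
\qquad
\Delta S_{\text{surroundings}} \;\approx\; \frac{Q_{\text{dissipated}}}{T}.
\]

A system (a plant, a cell, a clump of resonating atoms) may therefore lower its own entropy, that is, build internal order, provided it dissipates enough heat Q into surroundings at temperature T for the total entropy still to grow.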

Replication, Repetition, Homeostasis In this section, we will examine the key components of our conjecture: self-replication, invariance, emergent structure, homeostasis, entrainment, swarm behavior, and neural synchronization. Each of these components will be discussed in what follows. We use the term “homeostatic frame of reference” to refer to any group of entities that collectively maintains homeostasis. Using this concept, it is possible to theorize a continuous process of development from England’s observations about thermodynamics, through work on the origins of life and the development of cells, on to theories of mind and consciousness, and even the behavior of crowds, economies, and nations.


Self-Replication Self-replication is a necessary mechanism for counteracting entropy. According to England, Interest in the modeling of evolution long ago gave rise to a rich literature exploring the consequences of self-replication for population dynamics and Darwinian competition. In such studies, the idea of Darwinian “fitness” is frequently invoked in comparisons among different self-replicators in a non-interacting population: the replicators that interact with their environment in order to make copies of themselves fastest are more “fit” by definition because successive rounds of exponential growth will ensure that they come to make up an arbitrarily large fraction of the future population.  (England 2013)

England’s contribution is the statistical formula predicting that: a self-replicator’s maximum potential fitness is set by how effectively it exploits sources of energy in its environment to catalyze its own reproduction. Thus, the empirical, biological fact that reproductive fitness is intimately linked to efficient metabolism now has a clear and simple basis in physics.  (England 2013)

Note that the mathematical formulas to determine reproductive fitness will apply at any level of structure. As England puts it, to examine self-replication, first the entity being replicated must be identified. Whatever that entity may be, however, the probability of its replication is determined in the same way: “Self-replication” is only visible once an observer decides how to classify the “self ” in the system: only once a coarse-graining scheme determines how many copies of some object are present for each microstate can we talk in probabilistic terms about the general tendency for that type of object to affect its own reproduction . . . Whatever the scheme, however, the resulting stochastic population dynamics must obey the same general relationship entwining heat, organization, and durability. (England 2013)

Invariance Invariance is the property of identity between entities. This identity could be of any sort. For example, in the case of two different triangles, the area or the angles or some other aspect could be invariant. Invariant behavioral properties of starlings cause flocking. Invariant ratios between elements in sunflower heads lead to their arrangement in spirals. In sound, invariance of wave forms connects instruments that may be of different sizes or composition. In music, invariant intervals, contour, and rhythm (separately or together) connect different motive forms. In a progression toward more complex

entities, replication leads to collections of similar entities; interaction of multiple similar entities leads to emergent structure; emergent structure leads to formation of more complex entities formed of multiple complex systems, and so on.

Emergent Structure Circumstances that create a high enough probability of an entity sharing an invariant feature with another entity tend to lead to a multiplicity of similar entities. These entities then may give rise to an emergent structure—a new structure that comes to exist because of the interactions of the initial individual entities. For emergent structure to arise, one needs an instantiation of some entity, another entity that displays invariance with the first one, and some property of the invariance between the items that allows for connectedness. For example, life on earth is largely carbon-based due to the element’s ability to chain together to create more complex combinations. Single-celled creatures contain genes for cell adhesion molecules that allow them to attach to their environment but also serve to connect them to each other and create the beginnings of multicellular organisms (Neubauer 2011, 45–49). Musical meter may be viewed as an emergent property of rhythms that coincide at regular time intervals. Emergent structure may be found at any level of complexity. Each phase involves similarly behaving entities that give rise to an emergent structure. This emergent structure then takes on its own life as an entity, interacting with similar entities, and in turn gives rise to yet another layer of emergent properties. This simple pattern of events will continue over time. Thus, particles form into atoms, atoms into molecules, molecules into chemicals, chemicals into cells, cells into organs, and organs into organisms.

Swarm Behavior Organisms give rise to yet another level of complexity when they form into groups. We probably have all seen flocks of flying birds that appear to act with a single intelligence. In fact, swarm behavior “emerges naturally from simple rules of interaction between neighboring members of a group” (Fisher 2009, 9). One could view human social groups as a kind of swarm. In the words of Len Fisher, author of The Perfect Swarm (2009): The process by which simple rules produce complex patterns is called “self-organization.” In nature it happens when atoms and molecules get together spontaneously to form crystals and when crystals combine to form the intricate patterns of seashells. It happens when wind blows across the sands of the desert to produce the elaborate shapes of dunes. It happens in our own physical development when individual cells get together to form structures such as a heart and a liver, and patterns such as a face. It also happens when we get together to form the complex social patterns of families, cities, and societies.  (2)
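Fisher’s “simple rules of interaction between neighboring members of a group” can be made concrete with a toy flocking simulation in the spirit of Reynolds’s classic “boids.” The sketch below is illustrative only; the neighborhood radii and rule weights are arbitrary assumptions rather than values taken from Fisher or from research on starlings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_birds, steps = 50, 300
pos = rng.uniform(0, 100, size=(n_birds, 2))   # positions on a 100 x 100 field
vel = rng.uniform(-1, 1, size=(n_birds, 2))    # initial headings are random

for _ in range(steps):
    new_vel = vel.copy()
    for i in range(n_birds):
        d = np.linalg.norm(pos - pos[i], axis=1)
        near = (d > 0) & (d < 15)              # each bird "sees" only nearby birds
        crowd = (d > 0) & (d < 4)              # and keeps a minimum distance
        if near.any():
            cohesion = pos[near].mean(axis=0) - pos[i]    # steer toward the local center
            alignment = vel[near].mean(axis=0) - vel[i]   # match the neighbors' heading
            new_vel[i] += 0.01 * cohesion + 0.05 * alignment
        if crowd.any():
            new_vel[i] += 0.05 * (pos[i] - pos[crowd].mean(axis=0))  # avoid collisions
    speed = np.linalg.norm(new_vel, axis=1, keepdims=True)
    vel = np.where(speed > 2.0, new_vel / np.maximum(speed, 1e-9) * 2.0, new_vel)  # cap speed
    pos = (pos + vel) % 100                                                        # wrap-around field

# No bird follows a global plan, yet the headings converge on a shared direction.
print("spread of headings:", vel.std(axis=0))
```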

Fisher further notes that animal and human swarms provide certain advantages to individuals: Swarm behavior becomes swarm intelligence when a group can use it to solve a problem collectively, in a way that the individuals within the group cannot. Bees use it to discover new nest sites. Ants use it to find the shortest route to a food source. It also plays a key role, if often an unsuspected one, in many aspects of our own society, from the workings of the Internet to the functioning of our cities.  (10)

In this passage, we see how “swarm intelligence” is used to solve problems that deal with homeostasis. For insects, finding a nest or food source contributes to a stable environment and regulation of energy intake and expenditure. In our view, this is a primary function of musical activity. We propose that music, which can serve to bring individual humans together into a sociocultural group or “swarm,” also aids in solving problems required for homeostasis (the ultimate goal of which is to sustain life). To delineate the role of the second law of thermodynamics as it pertains to life forms we will generalize from the concept of swarm behavior to homeostatic frames of reference.

Homeostatic Frames of Reference As stated earlier, “homeostatic frames of reference” delineate various groups of entities that share a common need for maintaining homeostasis. Thus, a bee’s nest is an example of a homeostatic frame of reference. Another example would be the human body, seen as a collection of individual cells that must maintain an appropriate temperature to avoid death. In a work that served as a primary inspiration for this article, Raymond Neubauer describes the significance of homeostasis in relation to evolution: For life that exists within a relatively narrow range of conditions, there must be a strong tendency toward homeostasis, toward building an inner world that is buffered against fluctuations in the outer world. The changes described here are probably not just accidental (although mutations supplied the raw material). Those groups that came to dominance in each era evolved new ways to be independent of their surroundings.  (2011, 26)

The concept can be seen to give rise to various levels of complexity in which one might consider the role of homeostasis. For example, if we consider biological functions, we might derive frames of reference that expand from individual cells through organisms, families, communities, societies, nations, and so forth. The idea could be applied on both larger and smaller scales. We could conceivably start with atoms and build up greater levels of complexity through the creation of molecules, chemicals, protein chains, and

so on. Likewise, we could expand our notion of a homeostatic frame of reference to include worlds, solar systems, galaxies, and the universe.

Periodicity Periodicity is the recurrence of invariant attributes. We focus here on temporal periodicity, which involves recurrence of invariant temporal intervals. At the cellular level, periodicity helps regulate basic bodily functions and controls the scanning of the environment that leads to perception. Periodicity is necessary in musical activity as well. The existence of a tone, for example, requires periodicity of sound wave frequency. Temporal periodicity is also fundamental in creating a basic pulse. At the level of the individual, repetition in music lends itself to the creation of memory. Repetition is crucial to simple identification of event sequences. In this process, the presence of invariant features in consecutive spans of time causes neurons to fire in a manner that aids the formation of memory. The neurobiologist Gerald Edelman describes the neuronal activity of sequencing events in the brain in these terms: Signaling in either a phasic [periodic] or a continuous fashion across reentrantly connected maps permits temporal correlations of the various selections that occur among neuronal groups within these maps. These correlations are driven initially by the signals arriving at primary cortical receiving areas from stimulus objects at a given time and place; selections in all higher-order maps related to the presence of an object are correlated through reentry with these primary areas.5  (Edelman 1989, 49)

Correlations permit certain neural pathways to be strengthened, while others weaken. After multiple encounters with a stimulus, particular patterns of neuronal groups will be selected in a mapped area. Following such selection, similar signals in each neuronal channel can preferentially activate previously selected neuronal groups in the repertoires of a neural region to which that channel is mapped.  (50)

Regularly occurring sensory inventories, or global mappings, lead to adaptive changes in the brain, and also to a sense that the individual is one enduring entity. The properties of motion converted by global mapping and reentry to adaptive action are fundamental to the building of perceptual experience via short- and longterm memory. This continual activity (which might be called the rhythm of reentry) is the substrate for continuity in primary consciousness.  (248)

In a sense, then, brains are themselves “swarms,” with each neuron functioning both as an individual and collectively according to simple rules. These rules include the activation or inhibition of neuronal members of the “swarm” in order to optimize homeostatic regulation.

Rodolfo R. Llinás (2002) compares neural activity to “some types of fireflies, which synchronize their light flash activity and may illuminate trees in a blinking fashion like Christmas tree lights. This effect of oscillating in phase so that scattered elements may work together as one in an amplified fashion is known as resonance—and neurons do it too” (12). This type of synchronization is referred to also as “entrainment.”

Entrainment and Social Bonding Entrainment occurs when two different periodicities gradually come into synchronization with each other (Clayton et al. 2004, 2). The tendency for rhythmic processes or oscillations to adjust in order to match other rhythms has been described in a wide variety of systems and over a wide range of time scales (i.e. periodicities): from fireflies illuminating in synchrony, through human individuals adjusting their speech rhythms to match each other in conversation, to sleep-wake cycles synchronizing to the 24-hour cycle of light and dark. Examples have been claimed from the fast frequency oscillations of brain waves to periods extending over many years, and in organisms from the simplest to the most complex as well as in the behaviour of inorganic materials and systems.  (3)
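The mutual adjustment of periodicities described here is often modeled with the Kuramoto system of coupled oscillators. The sketch below is a generic textbook illustration, not a model drawn from Clayton and colleagues; the natural frequencies, coupling strength, and run length are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
omega = rng.normal(1.0, 0.1, n)          # each oscillator's own natural frequency
theta = rng.uniform(0, 2 * np.pi, n)     # initial phases are scattered
coupling, dt = 1.5, 0.01

def coherence(phases):
    """1.0 means perfect phase synchrony; values near 0 mean incoherence."""
    return abs(np.mean(np.exp(1j * phases)))

print("before:", round(coherence(theta), 2))
for _ in range(20000):
    # Kuramoto update: every phase is pulled toward the phases of the others.
    pull = np.sin(theta[None, :] - theta[:, None]).mean(axis=1)
    theta += dt * (omega + coupling * pull)
print("after: ", round(coherence(theta), 2))
```

With sufficiently strong coupling the coherence climbs toward 1: oscillators that began with scattered phases and slightly different tempos end up, in effect, entrained.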

In human activity, it may be that entrainment serves to activate coordinated action. For instance, work songs use metrical regularity to enable the exertion of multiple individuals to happen simultaneously. This basic function can be extended to include the idea of groove. Margulis describes the way that groove enables human bonding: Listeners, in other words, engage in anticipatory attending, allocating attention in advance to expected time points in the future. This strategy enables them to process music more efficiently, devoting additional attention to moments where events are likely to occur, but it also allows them to tap along, or join in. Repetition allows for increasingly successful predictive attending, and the resulting entrainment mimics the condition of successful social interactions and easy communication . . . Music scholars have referred to elements of this state as groove: a felt, kinesthetic sense of the predictable elements of the temporal structure within a particular episode of music making.  (Margulis 2013, 111–112)

Groove may enable individuals to merge into social units: Groove tends to make people feel as though they were “a part of the music,” providing further evidence for a link between the ability to successfully predict elements of the musical structure and the kind of extended subjectivity that has been identified as a hallmark of strong experiences of music.  (112)

Repetition facilitates social connectedness by contributing to the sense of a piece’s “rightness,” in Margulis’s words. She maintains,

feeling that a piece is inevitable and right amounts to an appealing sense of someone else’s (the composer or performer) artistic act precisely matching our own sensibilities. It can be intoxicating to feel that a piece created by another person is fundamentally right.  (113)

That is, repetition facilitates a sense of pleasurable resonance with others.

Music and Homeostasis At a physical level, music helps our body to coordinate homeostasis. It has been shown to have effects on the immune system, through its effect on stress levels, and, indeed, has recently been shown to have an effect at the cellular level (Lestard et al. 2013). At the individual level, music helps us to create memories and to develop a personal sense of identity. At the societal level, music impacts on many aspects of life, including childrearing, mating, and peer bonding. In each of these we observe that maintaining homeostasis serves as a principal purpose. Let us use childrearing and warfare as examples. Lullabies contain certain features that contribute to the well-being of infants, increasing the likelihood of their survival. A study done at the Great Ormond Street Hospital for Children, London, found that: Live music, as compared with reading and no interaction, appears to improve the wellbeing of young patients with cardiac and/or respiratory problems, and also to be beneficial for their carers. It seems to be live music per se, and not the social component of the musical interaction, that attracts and distracts children, thereby helping them to feel less in pain and more relaxed, and this seems to apply to the older children in particular.  (Longhi et al. 2015)

Ellen Dissanayake has done extensive research into the role of music in human infancy. Music is an important mode through which parents interact and bond with their children, leading to homeostatic benefits for both. Researchers have described a number of adaptive or functional benefits to infants of early interactions or their component—from achieving emotional self-regulation to predisposing the learning of language . . . The most important function with regard to my argument here is that the mechanisms of early interactions serve to communicate and coordinate the emotions and behavior of the interacting pair and thus to create and reinforce their emotional bond. . . . One can propose that mother-infant interaction, with its peculiar characteristics, is an adaptive behavior that enabled ancestral infants to enjoy increased survival and their mothers to have greater reproductive success.  (Dissanayake 2008, 173–174)

On the opposite end of the spectrum, music can also be used to improve the chances of a society surviving while at war, either by using it to discourage or even to torture the

enemy,6 or by using it to benefit soldiers, inspiring them to fight or distracting them from pain or defeat. At a social level, the reuse of invariant features in multiple pieces of music helps to create an identifiable style. Robert Gjerdingen (1988, 2007) has identified these kinds of style-defining invariant features for the Galant and Classical eras. For invariant features of musical works to be recycled they must be found to be significant to musicking (Small 1998, 9). Thus, we should be able to establish that certain features of musical styles tend to affirm underlying socially binding values. R. Murray Schafer confirms: The thesis is also borne out well in tribal societies where, under the strict control of the flourishing community, music is tightly structured, while in detribalized areas the individual sings appallingly sentimental songs. Any ethnomusicologist will confirm this. There can be little doubt then that music is an indicator of the age, revealing, for those who know how to read its symptomatic messages, a means of fixing social and even political events. For some time I have also believed that the general acoustic environment of a society can be read as an indicator of social conditions which produce it and may tell us much about the trending and evolution of that society.  (Schafer 1993, 7)

By examining the values that bind individuals into a society, we should be able to determine how these values contribute to homeostasis—how the sound environment leads to sustenance of suitable living conditions.

Sub- and Supraconscious Levels Music has a significance that is not obvious to us at the conscious level. It communicates directly with the cells of our body, without intervention of natural language, and it serves to bind us together as members of a society, with each of us serving as single cells in a collective organism. Thus, its “meaning” lies both “beneath” our consciousness, as when our body reacts to it, and “above” our consciousness, when it contributes to our participation in group activities. Music perhaps also is a manner in which the supra mind, or the mind that is an ­emergent property of the combination of human minds, thinks. It interfaces between us and coordinates the action of the “supra organisms” of which we are merely cells.

Music as Expanding Homeostatic Frames of Reference As organisms start to deal with sound they begin to develop a new form of intelligence. As they begin to communicate through sound this activity becomes part of their swarm intelligence. As these communications evolve they become more complex and begin to take on replicable forms that can be recombined to create systems capable of expressing more and more complex communications about the internal

state of the organism. Thus, organized sound itself (including music) contributes to the evolution of intelligence. It may be that composers propose new combinations of sound that then affect swarm behavior. With his Ninth Symphony, Beethoven is attempting to affect human history. At the moment when the baritone soloist and then the chorus announce “Alle Menschen werden Brüder” (“all men become brothers”), the composer and poet combine to expand the homeostatic frame of reference from the listener as individual to the world as family.

Emotions and Embodied Meaning The field of embodied cognition considers thought to be shaped by the most immediate knowledge humans have—that of the orientation and sensations of their bodies in the world. Musical imagination is no exception to this mode of understanding. It rests on mental representations of musical features shaped by schemas (essentially neuronal phase synchronizations) such as CONTAINER, PATH, and NEAR-FAR. The CONTAINER schema shapes our sense of music being “in” or “out” of a key, scale, or style; the PATH schema shapes our sense of a passage “moving” in space—higher, lower, “from” one key “to” another, for instance; the NEAR-FAR schema places musical features relative to a focal point in mental “space.” These and other schemas shape the world of musical imagination through mapping or “metaphor” (see Saslaw 1996, 1997–1998; Cox 2011). Recent work in neuroscience indicates that the brain maps closeness or distance of social, spatial, and temporal relations from the self (see Tavares et al. 2015). These relationships relate to emotional content, showing how the listener’s images of and emotional responses to music are tied to embodiment. As discussed previously, infants experience well-being and calm from being held and fed close to their caregivers, and we suggest that they associate closeness with these emotions. Thus, the infant’s sense of space is emotionally invested. Closeness is restful and calm, distance is uncomfortable and tense. We suggest that music induces emotions relative to the conceptual closeness or distance of its elements. Musical images display the following correspondences: Spatial/Emotional images: Close = Familiar = Safe = Comfort = Rest Musical realm: Tonic, Familiar, Resolution, Closely related tones/keys Spatial/Emotional images: Distance = Unfamiliar = Dangerous = Anxiety = Unrest Musical realm: Foreign tones/keys/overtones, unfamiliar styles These relationships between spatial/emotional images and the musical realm may be a  result of basic laws of thermodynamics. Resonance would underlie more positive emotions, the ones we sense as familiar, warm, and so forth. Lack of resonance, or rather the absence of coordinated periodicity, would underlie the more distant, unfamiliar, and potentially negative emotions. We propose that this correspondence occurs not because image schemas invoke ­metaphors, but because metaphors themselves arise from the processes Jeremy England

describes. The primary concept here is the role that periodicity plays in creating and enhancing adaptive systems. Neuronal phase synchronization creates a periodicity that allows the stochastic system of potential neural connections to become the adaptive system that is the mind. In other words, metaphors arise from the inherent tendency of our cognitive apparatus to adapt for homeostatic regulation, and music’s periodic character aids this process.

Sun Ra In this section we will introduce musical examples that illustrate the principles stated above. Our choice to examine the music and thought of Sun Ra (born Herman Poole Blount, 1914–1993), stems from his explicit attempt to create a larger, better homeostatic frame of reference. Many styles and genres of music overtly feature periodicity that encourages physical entrainment and social connection: dance music of all sorts, work songs, gospel-style hymns, marches, for instance. Minimal music, especially the early work of Philip Glass and Steve Reich, brings attention both to entrainment and also lack of it. However, Sun Ra’s work, at times, combines different periodicities in order to create the emergent property of new types of entrainment, thus encouraging us to expand our homeostatic frame of reference to a planetary scale. In a sense, Sun Ra dedicated his career and music to imagining moving beyond the society that held him back. Yet Sun Ra did not simply imagine what his universal utopia would be like. He spoke and behaved as if it were reality. Sun Ra created his life and music as a means of refusing to participate in the oppressive narrative of his time. Sun Ra spoke of the earth as being out of tune with the universe. We take his comments to mean that humans need to expand our homeostatic frame of reference to include a vision of the earth as it exists in the vastness of space. Sun Ra’s thinking on this subject seems akin to the message delivered in James Lovelock’s classic book on the environment: We need to love and respect the Earth with the same intensity that we give to our families and our tribe. It is not a political matter of them and us or some adversarial affair with lawyers involved; our contract with the Earth is fundamental, for we are a part of it and cannot survive without a healthy planet as our home. I wrote this book when we were only just beginning to glimpse the true nature of our planet and I wrote it as a story of discovery. If you are someone wanting to know for the first time about the idea of Gaia, it is the story of a planet that is alive in the same way that a gene is selfish.  (Lovelock 2000, viii–ix)

Sun Ra considered his music to come from beyond Earth. For him, an envisioned music of space would not unfold like that of Earth. If the harmony is just what they teach you in schools, then it wouldn’t be any other than what we’ve been hearing all along, but when the harmony’s moved the rest is

supposed to move and still fit, then you’ve got another message from another realm, from somebody else. Superior beings definitely speak in other harmonic ways than the earth way because they’re talking something different, and you have to have chord against chord, melody against melody, and rhythm against rhythm; if you’ve got that, you’re expressing something else.  (Schaap 1989, cited in Szwed 1997, 128)

“Melody against melody, and rhythm against rhythm” is immediately apparent in the composition “Space Is the Place” (Sun Ra 1973). This is perhaps Sun Ra’s most well-known composition, having been performed on the television show Saturday Night Live in 1978. The bass ostinato is in 5/4 while the melody is in cut time (2/2). This means that their respective downbeats coincide only every five bars of cut time (or four of 5/4). Since the melody’s phrases move in four-bar units, the two patterns begin together only every twenty (or sixteen) bars (the melody is not identical throughout, but the four-bar phrases continue). Adding to the different feels of the ostinato and melody is the fact that, although quarter notes are the same tempo in both, in the former the beat is the quarter note while in the latter it is the half note. In performance, percussion and other parts may add to the layers of metrical conflict. This use of parallel streams of rhythms that articulate conflicting metric structures lures the listener into a kind of entrainment: after multiple repetitions, the mind begins to coordinate the 5/4 bass ostinato with the cut-time melody in a manner that creates a sense of flow. In other words, the invariant quarter-note pulse, combined with the two conflicting metrical streams, creates an emergent property of entrainment over longer spans of time—the coordinated downbeats of the two meters. This emergent property, a new kind of entrainment, reflects the lyrics’ encouragement to the audience to expand their consciousness:

Outer space is a pleasant place
A place where you can be free
There’s no limit to the things you can do
Your thought is free and your life is worthwhile
Space is the place
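The alignment arithmetic above can be verified in a few lines; the sketch below simply checks the bar counts and is not a transcription of the arrangement.

```python
from math import lcm

bar_54 = 5              # quarter notes in one bar of the 5/4 bass ostinato
bar_cut = 4             # quarter notes in one bar of the cut-time (2/2) melody
phrase = 4 * bar_cut    # the melody moves in four-bar phrases

# Bar downbeats of the two meters coincide once per LCM of the bar lengths.
beats = lcm(bar_54, bar_cut)                                  # 20 quarter notes
print(beats // bar_cut, "bars of 2/2 =", beats // bar_54, "bars of 5/4")

# Downbeats and four-bar phrase beginnings realign only over the larger cycle.
beats = lcm(bar_54, phrase)                                   # 80 quarter notes
print(beats // bar_cut, "bars of 2/2 =", beats // bar_54, "bars of 5/4")
```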

Sometimes, “rhythm against rhythm” is found in the subtle shifting, variation, and expansion of motives that confound metrical expectations, as in “Dance of the Language Barrier,”7 which probably dates from the early 1980s.8 The titular “language barrier” refers to the difficulty of understanding created by Sun Ra’s radical alteration of his materials, which disguises the underlying motivic repetition, but it also refers to the problems of communication between human beings. In terms of our discussion, the “language barrier” represents the border between differing homeostatic frames of reference. The “dance” aspect of the title suggests a pairing of the opposing sides, creating an expansion of one’s social framework to include those who do not speak your language. “Dance of the Language Barrier” is the musical realization of the difficulties of creating entrainment between different homeostatic frames of reference. Sun Ra tried to create a consciously challenging sense of entrainment, largely through more challenging rhythmic ideas. Nevertheless, if he had thought that the language barrier was insurmountable,

music, physics, and the mind   145 he would not have written a piece of music to tackle, and, through physical and social means, eliminate it. Sun Ra is constantly changing the length of his motives in “Language Barrier.” While many jazz phrases flow in units of two or four bars, there is no consistent phrase length here. Most of the variations in motivic length in this tune are created through additions or subtractions from the upbeat figures that begin every motive. Listeners certainly will notice invariant elements in the phrases, especially after repeated hearings, but the degree of variation seems much higher than in typical jazz tunes. The metrical clarity of the tune is severely undermined by a high degree of syncopation, achieved by minimizing articulation of downbeats and emphasizing upbeats. The composition also contains a very complicated pattern of accents. When jazz musicians are playing a tune, they often talk about whether the “feel” of a note is “up” or “down.” “Up” notes occur as anticipations or delays of the beat and are played as accents, resulting in syncopation. “Down” notes occur on the beat. In our own experience performing “Dance of the Language Barrier,” we found that the written score does not adequately portray where to place ups and downs. Only through listening to the tune as played by Michael Ray, former trumpeter with the Arkestra, were we able to place the accents and shape the phrases the way Sun Ra had taught them to his band. In a personal conversation with us in October 2000, the woodwind player Marshall Allen, who worked with Sun Ra from the 1950s until the latter’s death and now leads the Arkestra, discussed the difficulty of learning the accentual patterns of Sun Ra’s music. Allen indicated that the main focus of rehearsals was how to phrase the songs, how to achieve the right sound, tone, “vibration,” “voice,” or style, how to “say” what Sun Ra wanted. This included when to slur notes, when to cut them off, and when to “push the time forward or backwards,” “before the beat” or “after.”9 Part of the difficulty in detecting patterning in “Dance of the Language Barrier” comes from the relatively constant stream of fast note values in the long, irregular phrases, punctuated by longer note values that seem to be irregularly placed. Finally, the harmonic progressions of this tune do not reinforce a sense of regular phrase structure. In fact, apparently there was no single progression for the tune. Sun Ra was notorious for reharmonizing his compositions at each performance. Since he was the keyboard player, many of his more complicated written scores do not contain chord symbols—only he knew what harmonies he would play, and the band would follow. All of these factors combine to underline the “Language Barrier” of the piece’s title. However, a crucial factor serves to encourage metrical entrainment: the drum part. In the recording, there is a clearly discernable jazz swing beat, with cymbals adding emphasis. Thus, a listener might have great difficulty humming the tune, for the reasons outlined previously, but would still have no trouble tapping a foot to the performance. Thus, in a sense, the language barrier is being erected by the melody and broken by the drums of the Sun Ra Arkestra. Or perhaps the tune/harmony is one “language” and the very traditional swing drum part is another. In any case, one would need to supply more than just passive listening. 
The kind of “dance” that would be performed to this music might well require superhuman efforts—exactly what Sun Ra wanted. In our experience, if one makes those efforts, one enters a new understanding. The fact that this music can be

146   janna k. saslaw and james p. walsh played by a big band, which was designed to be a model of cooperation and coordination, means that the “Language Barrier” is surmountable, and rewards listeners with a cosmic mode of dance. From Sun Ra’s writings and communications (see Sun Ra 2005; Szwed 1997), it seems that he hoped audiences, in a sense, would create new thought patterns that would resonate with the advanced patterns of his music, turning away from negative, divisive ideas toward a more inclusive frame of reference. Sun Ra envisioned an active role for every person creating or hearing his music. He felt that “all people vibrate” (Sun Ra 2005, 460), and all their individual frequencies were important. “Each person is music himself and he’ll have to express what he is” (476). Echoing the terms used by Jeremy England (discussed earlier) to describe the resonance of atoms with an energy source, Sun Ra associates particular sound and musical frequencies with the notion of race: “each color has its own vibration. My measurement of race is rate of vibration—beams, rays” (460). So, in his view, individuals and races not only resonate with external energy sources, but they are themselves energy sources. Through the vibrations of his music, Sun Ra suggests, entrainment between the audience and the performers takes place, resulting in a higher state of being. “The real aim of this music is to coordinate the minds of people into an intelligent reach for a better world, and an intelligent approach to the living future” (457). Achieving an emergent entrainment different from normal experience would result in humans actually becoming different, more natural beings (ones who do not injure or kill their brothers and sisters—to Sun Ra, a much better homeostatic condition). “Space music is an introductory prelude to the sound of greater infinity. . . . It is a different order of sounds synchronized to the different order of being. . . . It is of, for and to the Attributes of the Natural Being of the universe” (457).

Musical Universals versus Cultural Difference Music functions at two nonconscious levels. At the level of the body, music affects hormonal and motor systems; at the level of society these hormonal and motor systems contribute to coordinated behavior. Given this theory, one can conclude that music does have basic universal functions and effects. However, history and evolution have led to the creation of differing strategies for maintaining homeostasis in different “climates.” From these differing strategies, basic value systems emerge and develop. Differences in musical thinking between cultures then arise from these differing value systems and behaviors. Invariance and repetition allow for predictive coherence and create the possibility of emotive hearing in individuals and hence coordinated actions between individuals. As an example, mating strategies10 may be viewed as related to musical taste. A society that favors fewer children and greater parental commitment may prefer music that differs

from that in a society that favors more children and privileges the act of mating. One might speculate that the latter society will feature a more “dance-friendly” style of music with sharper attacks, motor rhythms, and simpler melodic forms, while the former might place more emphasis on complex melodic shape.

Resonance and Neural Synchronization Whatever the features of a musical style may be, it is likely that their emergence was shaped by resonance, in accordance with the processes described earlier. Resonance, according to our view, encompasses both universal laws of physics and cultural differences, as indicated by the work of Edward W. Large (2011). He finds, for example, that: Tonality is a universal feature of music, found in virtually every culture, but tonal “languages” vary across cultures with learning. Here, a model auditory system, based on knowledge of auditory organization and general neurodynamic principles, was described and studied. The model provides a direct link to neurophysiology and, while simplified compared to the organization and dynamics of the real auditory system, it makes realistic predictions. . . . Analysis of the model suggests that certain musical universals may arise from intrinsic neurodynamic properties (i.e., nonlinear resonance). Moreover, preliminary results of learning studies suggest that different tonal languages may be learned in such a network through passive exposure to music. In other words, this neurodynamic theory predicts the existence of a dynamical, universal grammar . . . of music.  (123)

Even widespread cultural resonance, then, ultimately can be viewed as a consequence of “a general tendency in driven many-particle systems towards self-organization into states formed through exceptionally reliable absorption and dissipation of work energy from the surrounding environment” (Perunov et al. 2014). Neuronal oscillations in the brain may well function according to the same probability calculations. Periodicity amplifies the probability that a particular alignment may recur. Put briefly, stochastic conditions become systematic when subjected to periodicity, whether the entities involved are atoms, chemical compounds, simple or complex organisms, brains, social groups, musical systems, or entire worlds. In the realm of music, repetition is sound rendered appropriate for memory ­formation. We suggest that periodicity in music amplifies the brain’s neuronal phase synchronization. As stated earlier, we suggest that the underlying image schemas involved in processing musical emotion have their effect because they correspond to resonance potentials in neuronal phase synchronization. Similar processes would then apply to metrical structures, phrase groupings, and large-scale musical form.

If, instead of imagining an open system with particles in it, as England asks us to do, we substitute the brain for the system and potential synaptic connections for the particles, we can see how neural synchronization functions to create the emergent property of memory and ideas. A neuronal group that “fires together” repeatedly could be considered equivalent to a recurring grouping of particles. As they fire, these neurons would be absorbing and dissipating heat more efficiently than neural connections that are not subject to periodic activation by neural synchronization. The neurophysiologist Pascal Fries claims: Inputs that consistently arrive at moments of high input gain benefit from enhanced effective connectivity. Thus, strong effective connectivity requires rhythmic synchronization within pre- and postsynaptic groups and coherence between them. (2015, 220, emphasis ours)

He proposes that cognitive processes rely on this type of synchronization: Even in the absence of changes in (ultra)structural connectivity, neuronal synchronization as an emergent dynamic of active neuronal groups has causal consequences for neuronal communication. If neuronal communication depends on neuronal synchroni­ zation, then dynamic changes in synchronization can flexibly alter the pattern of communication. Such flexible changes in the brain’s communication structure, on the backbone of the more rigid anatomical structure, are at the heart of cognition.  (220)

The mind, then, is an emergent structure that arises as a result of the effect of periodicity on the stochastic collection of possible neural connections.

Conclusions So, based on the foundation laid out earlier, we feel that we have clarified the connection between the phenomena of repetition in music described by Margulis and England’s theoretical work on emergent systems. Using terminology developed by Grimshaw and Garner (2015), we could say that music is an emergent perception that is part of an acoustic ecology arising from the interaction of two different stochastic systems, the mind and the universe. We can describe air pressure fluctuations in the auditory range as exosonus, and the activation of neural networks caused by these sounds as endosonus (33). Repetition in music causes the exosonic stimulus to be better suited to forming neural networks and thus to become endosonic activity. We have proposed that this emergent perception is facilitated by synchronized neural activity that creates a referential time frame in which particular sequences of synaptic connections are more likely to recur and thus are reinforced. The neural networks themselves then function as an emergent system that gains efficiency in taking in and dissipating energy (in the form of neural firings). The endosonic results can be recombined through memory activation to take

on the properties of an emergent system as well, and can be manipulated by imagination to become effective communication. These effective communications are more likely to have the appropriate resonance to become significant when they are structured by repetition. Let us revisit the steps in our argument:
• Self-Replication (Repetition)—enables increased efficiency of energy use and dissipation. Entities use energy to create copies of themselves with the property of
• Invariance—which allows them to combine into
• Emergent Structures—whose coordinated action, in living beings, is
• Swarm Behavior—which serves to create
• Homeostatic Frames of Reference.
• Periodicity—is the mechanism behind
• Resonance—which serves as the primary driving force behind reversing entropy in Jeremy England’s theory, coordinates neural activity, and serves as the basis of
• Entrainment—synchronization between entities, allowing for communication.
• Music—for humans, acts both as a catalyst of social entrainment and as a means for homeostasis,11 aiding in the formation of homeostatic frames of reference.
Light shines on a heat bath. Its resonance creates entities that are more efficient at processing heat, some of which then self-replicate. Eventually, the invariant relationships between the replicants allow them to combine into emergent structures. Emergent structures coordinate the actions of living beings into swarm behavior, allowing them to better control homeostasis. Periodicity, which was already present in the system in the form of resonance-creating energy, then serves as a basis for communication between emergent structures by means of entrainment. This periodicity also influences neurons to form networks, allowing consciousness to arise. Music is then a means by which entrainment may be used for communication, allowing us to form more efficient homeostatic frames of reference. As we have seen, Sun Ra used music to teach the human species to see ourselves as a single homeostatic frame of reference in order for the species to continue to exist in the vastness of space.

Notes 1. The entropy in a “closed” or isolated system can be described as follows: “[Entropy] increases as a simple matter of probability: There are more ways for energy to be spread out than for it to be concentrated. . . . Eventually, the system arrives at a state of maximum entropy called “thermodynamic equilibrium,” in which energy is uniformly distributed. A cup of coffee and the room it sits in become the same temperature, for example. As long as the cup and the room are left alone, this process is irreversible. The coffee never spontaneously heats up again because the odds are overwhelmingly stacked against so much of the room’s energy randomly concentrating in its atoms” (Wolchover 2014). 2. In other words, England argues that under certain conditions, matter will spontaneously self-organize instead of becoming more disordered. This tendency could account for the

150   janna k. saslaw and james p. walsh internal order of many inanimate structures and of living things as well (see also Chaisson 2001). “Snowflakes, sand dunes and turbulent vortices all have in common that they are strikingly patterned structures that emerge in many-particle systems” (Wolchover 2014). 3. “The underlying principle driving the whole process is dissipation-driven adaptation of matter. . . . You start with a random clump of atoms, and if you shine light on it for long enough, it should not be so surprising that you get a plant” (Wolchover 2014). 4. These notions have been around for millennia in the form of philosophical and religious conceptions such as the “music of the spheres” in Western thought, as well as in ancient Asian cosmology. Here, we add the contributions of modern physics and biology to these earlier ideas. 5. Reentry is “a process of temporally ongoing parallel signaling between separate [neuronal] maps along ordered anatomical connections” (Edelman 1989, 49). In other words, it is the method by which receptors in the brain of stimuli from the world communicate and coordinate with other neuronal activity elsewhere in brain structures. If neuronal activity from separate receptors coincides temporally, those neural patterns are associated by strengthening of their pathways. For example, if I hear a lullaby and simultaneously smell baby powder, those stimuli can become associated in my brain. 6. For more on music used as a weapon, see Ross (2016) or music used in detention and torture, see Windsor, this volume, chapter 14. For more on the sound environment of war, see Bull, volume 1, chapter 9. 7. There is only one recorded Arkestra performance of this tune, on Sun Ra (1990). 8. Personal communications, October 2000, with Michael Ray, trumpeter in the Sun Ra Arkestra from 1977 on, and with Robert L. Campbell, coauthor of Campbell and Trent (2000). 9. Allen also said that Sun Ra would sometimes ask the band to go faster on ascending passages and slower on descending ones. “If you go up the stairs you use more energy than going down.” Sometimes only a part of the band would be accelerating while the rest stayed at a steady tempo. 10. Neubauer (2011) describes two contrasting mating strategies, suited to differing habitats. “Opportunistic, or r-selected species, tend to have rapid rates of increase, small size, many offspring, rapid development, and little parental care. They are able to colonize variable or unpredictable habitats quickly but may also experience catastrophic mortality when conditions change. Equilibrial, or K-selected, species have fewer young, with slower development, and a lot of parental care. They often exist in more constant or predictable environments where competition is keen and long-term survival skills, in terms of either behavioral versatility or physical growth, are important” (14). 11. While this chapter was in the editing process, Damasio (2018) was published. It supports our thesis that homeostasis plays a crucial role in the formation of cultural activity, including music. Damasio states that “cultural activity began and remains deeply embedded in feeling” (5), and “feelings are the mental expressions of homeostasis” (6). Damasio defines homeostasis as “the mechanisms of life itself and . . . the conditions of its regulation” (6).

References

Campbell, R. L., and C. Trent. 2000. The Earthly Recordings of Sun Ra. 2nd ed. Redwood, NY: Cadence Books.
Chaisson, E. J. 2001. Cosmic Evolution: The Rise of Complexity in Nature. Cambridge, MA: Harvard University Press.
Clayton, M., R. Sager, and U. Will. 2004. In Time with the Music: The Concept of Entrainment and Its Significance for Ethnomusicology. ESEM CounterPoint 1. http://web.stanford.edu/group/brainwaves/2006/Will-InTimeWithTheMusic.pdf. Accessed January 19, 2016.
Cox, A. 2011. Embodying Music: Principles of the Mimetic Hypothesis. Music Theory Online 17 (2). http://www.mtosmt.org/issues/mto.11.17.2/mto.11.17.2.cox.html. Accessed May 7, 2017.
Damasio, A. 2018. The Strange Order of Things: Life, Feeling, and the Making of Cultures. New York: Pantheon.
Dissanayake, E. 2008. If Music Is the Food of Love, What about Survival and Reproductive Success? Musicae Scientiae 12 (1 suppl): 169–195.
Edelman, G. M. 1989. The Remembered Present: A Biological Theory of Consciousness. New York: Basic Books.
England, J. 2013. Statistical Physics of Self-Replication. Journal of Chemical Physics 139 (121923). doi: http://dx.doi.org/10.1063/1.4818538. Accessed May 7, 2017.
Fisher, L. 2009. The Perfect Swarm: The Science of Complexity in Everyday Life. New York: Basic Books.
Fries, P. 2015. Rhythms for Cognition: Communication through Coherence. Neuron 88 (1): 220–235. doi:10.1016/j.neuron.2015.09.034.
Gjerdingen, R. 1988. A Classic Turn of Phrase: Music and the Psychology of Convention. Philadelphia: University of Pennsylvania Press.
Gjerdingen, R. 2007. Music in the Galant Style. Oxford: Oxford University Press.
Grimshaw, M., and T. Garner. 2015. Sonic Virtuality: Sound as Emergent Perception. Oxford: Oxford University Press.
Large, E. 2011. Musical Tonality, Neural Resonance and Hebbian Learning. In Mathematics and Computation in Music, 115–125. New York: Springer.
Lestard, N. D., R. C. Valente, A. G. Lopes, and M. A. Capella. 2013. Direct Effects of Music in Non-Auditory Cells in Culture. Noise Health 15: 307–314.
Llinás, R. 2002. I of the Vortex: From Neurons to Self. Cambridge, MA: MIT Press.
Longhi, E., N. Pickett, and D. J. Hargreaves. 2015. Wellbeing and Hospitalized Children: Can Music Help? Psychology of Music 43 (2): 188–196.
Lovelock, J. 2000. Gaia: A New Look at Life on Earth. Oxford: Oxford University Press.
Margulis, E. H. 2013. On Repeat: How Music Plays the Mind. Oxford: Oxford University Press.
Margulis, E. H. 2014. One More Time. Aeon, March 7. https://aeon.co/essays/why-repetition-can-turn-almost-anything-into-music. Accessed May 7, 2017.
Neubauer, R. L. 2011. Evolution and the Emergent Self: The Rise of Complexity and Behavioral Versatility in Nature. New York: Columbia University Press.
Perunov, N., R. Marsland, and J. England. 2014. Statistical Physics of Adaptation. http://arxiv.org/pdf/1412.1875.pdf. Accessed May 7, 2017.
Ross, A. 2016. When Music Is Violence. New Yorker, July 4, 2016. http://www.newyorker.com/magazine/2016/07/04/when-music-is-violence. Accessed May 7, 2017.
Saslaw, J. 1996. Forces, Containers, and Paths: The Role of Body-Derived Image Schemas in the Conceptualization of Music. Journal of Music Theory 40 (2): 217–243.
Saslaw, J. 1997–1998. Life Forces: Conceptual Structures in Schenker’s “Free Composition” and Schoenberg’s “The Musical Idea.” Theory and Practice 22–23: 17–34.
Schaap, P. 1989. An Interview with Sun Ra. WKCR 5:5 (January–February): 7.
Schafer, R. M. 1993. The Soundscape: Our Sonic Environment and the Tuning of the World. Rochester, VT: Destiny Books.
Small, C. 1998. Musicking: The Meanings of Performing and Listening. Middletown, CT: Wesleyan University Press.
Sun Ra. 1973. Space Is the Place. Impulse IMP 12492.
Sun Ra. 1990. Mayan Temples. Black Saint 120121–1.
Sun Ra. 2005. The Immeasurable Equation: The Collected Poetry and Prose. Compiled and edited by J. L. Wolf and H. Geerken. Norderstedt, Germany: Waitawhile.
Szwed, J. 1997. Space Is the Place: The Life and Times of Sun Ra. New York: Pantheon.
Tavares, R. M., A. Mendelsohn, Y. Grossman, C. H. Williams, M. Shapiro, Y. Trope, et al. 2015. A Map for Social Navigation in the Human Brain. Neuron 87: 231–243.
Wolchover, N. 2014. A New Physics Theory of Life. Quanta Magazine, January 22. https://www.quantamagazine.org/20140122-a-new-physics-theory-of-life/. Accessed May 7, 2017.

chapter 8

Music Analysis and Data Compression

David Meredith

Introduction

Most people are capable of imagining music, and composers can even imagine novel music they have never heard before. This is known as musical imagery and can be distinguished from musical listening or music perception, where the music experienced results from physical sound energy being transmitted across the listener’s peripheral auditory system and then transduced in the inner ear into nerve signals that are propagated to higher centers of the brain. In both music perception and musical imagery, what is experienced is actually an encoding of musical information, created by the person’s brain. Alternatively, one could adopt a less dualist stance and say that experiencing music is the direct result of certain spatiotemporal patterns of neural firing that encode musical information. In listening, this encoding is generated from information about sound currently in the environment, combined with the person’s musical knowledge. In imagery, the encoding is constructed only from the person’s musical knowledge. Sound is thus just one particular medium for communicating musical information and is not a prerequisite for musical experience. Indeed, trained musicians can experience (i.e., “imagine”) music they have never previously heard while silently reading a musical score. Musical imagery and perception therefore have a great deal in common—indeed, there are some brain centers (especially in the right temporal lobe) that are necessary for both (Halpern 2003). Both the way that one perceives and understands music as well as the music that one is capable of imagining are therefore largely determined by one’s musical knowledge that is gained through passive exposure to music, active learning of musical skills, and/or study of music theory and analysis. It has been proposed in psychology, information theory, and computer science that knowledge acquisition—that is, learning—is essentially data compression (Chater 1996; Vitányi and Li 2000): on being exposed to new data, a learning

system attempts to encode this data as parsimoniously as possible by removing redundancy in the data and relating it to what it already knows. If a learning system can describe the new data in a compressed manner, then the total amount of space used to store all the system’s knowledge increases by only a small amount. The less extra space required to encode the new data, the better this new data is “understood” by the system. In this chapter, I focus on how the musical knowledge that underpins both music perception and musical imagery can be acquired by compressing musical information. In particular, my concern is with how it might be possible to find the best ways of understanding musical works simply by compressing as much as possible the information that they contain.

The ideas presented in this chapter are founded on the assumption that the goal of music analysis is to find the best possible explanations for the structures of musical “objects,” where such objects are typically individual works or movements but could be extracts from works (e.g., phrases, chords, voices, even individual notes) or collections of works (e.g., all the pieces by a composer or in a particular genre). This assumption raises the following question: given two analyses of the same musical object (i.e., two different explanations for the object or ways of understanding it), how are we supposed to decide which of the two is the “better” one? Musicologists and music analysts who adopt a humanistic approach generally do not use objective criteria for deciding which of two possible analyses of the same musical object is preferable. Typically, such scholars prefer analyses that provide what they individually consider to be “more satisfying” readings of a work—that is, the ones that make them feel as though they have a better understanding of a work or repertoire. Analysts, then, traditionally evaluate musical analyses on subjective grounds. However, claiming that one analysis of a piece of music is “better than” another one for the same piece is considered here to be meaningless, unless one specifies objectively evaluable tasks that the first analysis allows one to perform more successfully. If such tasks are specified, then one can meaningfully aim to find those analyses that are the best for carrying out those tasks. Such tasks could include:

• memorizing a piece in order, for example, to be able to perform it without a score;
• identifying errors in a score or performance of an analyzed piece or other related pieces;
• correctly identifying the composer, place of composition, genre, form, and so on, of an analyzed piece or other related pieces;
• predicting what occurs in one part of a piece, having analyzed another part of the same piece or other related pieces; or
• transcribing a performance of a piece from an audio recording or MIDI1 representation to staff notation.

Of course, it may be the case that there is no single way of understanding a piece or set of pieces that allows for optimal performance on all such tasks. For example, the best way of understanding a piece in order to be able to detect errors in a performance may not be the best way of understanding that piece in order to determine whether some other, previously unheard, piece is by the same composer. There may also be several different ways

of understanding a given piece or set of pieces that are equally effective for carrying out a given task. Nevertheless, it will often be the case that understanding a piece in certain ways will allow one to carry out certain objectively evaluable tasks more effectively than understanding the piece in certain other ways; to this extent, one can speak of some analyses as being “better than” others for carrying out specific, objectively evaluable tasks. The goal of the work presented in this chapter is therefore that of finding those ways of understanding musical objects that allow us to most effectively carry out the musical tasks that we want to accomplish.

The approach adopted is based on the hypothesis that the best possible explanations for the structure of a given musical object are those that

1. are as simple as possible;
2. account for as much of the detailed structure of the object as possible; and
3. set the object in as broad a context as possible (i.e., describe the object as being part of as large a body of music as possible).

Clearly, these goals often conflict: accounting for the structure of a piece in more detail or in a way that relates the piece to all the music in some larger context can often entail making one’s explanation (i.e., analysis) more complex.

This hypothesis, which forms the foundation for the work reported in this chapter, is a form of the well-known principle of parsimony. This principle can be traced back to antiquity2 and is known in common parlance as “Ockham’s razor,” after the medieval English philosopher, William of Ockham (ca. 1287–1347), who made several statements to the effect that, when presented with two or more possible explanations that account for some set of observations, one should prefer the simplest of these explanations. In more recent times, the parsimony principle has been formalized in various ways, including Rissanen’s (1978) minimum description length (MDL) principle, Solomonoff’s (1964a, 1964b) theory of inductive inference, and Kolmogorov’s concept of a minimal algorithmic sufficient statistic3 (Li and Vitányi 2008, 401ff; Vereshchagin and Vitányi 2004). The essential idea underpinning these concepts is that explanations for data (i.e., ways of understanding it) can be derived from it by compressing it—that is, by finding parsimonious ways of describing the data by exploiting regularity in it and removing redundancy from it. Indeed, Vitányi and Li (2000, 446) have shown that “data compression is almost always the best strategy” both for model selection and prediction. The basic hypothesis that drives the research presented in this chapter is thus that the more parsimoniously one can describe an object without losing information about it, the better one explains the object being described, suggesting the possibility of automatically deriving explanatory descriptions of objects (in our case, musical objects) simply by the lossless compression of “in extenso” descriptions of them. In the case of music, such an in extenso description might, for example, be a list of the properties of the notes in a piece (e.g., the pitch, onset, and duration of each note), such as can be found in a MIDI file. Alternatively, it could be a list of sample values describing the audio signal of a musical performance, such as can be found in a pulse-code modulation (PCM) audio file.
The defining characteristic of an in extenso description of an object is that it explicitly specifies the properties of each atomic component of the object (e.g., a MIDI event in a

MIDI file or an audio sample in a PCM audio file), without grouping these atoms together into larger constituents and without specifying any structural relationships between components of the object.4 In contrast, an explanation for the structure of an object, such as an analysis of a musical object, will group atomic components together into larger constituents (e.g., notes grouped into phrases and chords or audio samples grouped together into musical events), specify structural relationships between components (e.g., “theme B is an inversion of theme A”), and classify constituents into categories (e.g., “chords X and Y are tonic chords in root position,” “bars 1–4 and 16–19 are occurrences of the same theme”). Throughout this chapter, I assume that an analysis is a losslessly compressed encoding of an in extenso description of a musical object, even though most musical analyses to date have typically been lossy, in that they only focus on certain aspects of the structure of an object (e.g., harmony, voice-leading, thematic structure, etc.). Such lossily compressed encodings of an object can also provide useful ways of understanding it, but, because information in the original object is lost in such encodings, they do not (at least individually) explain all of the detailed structure of the object. In particular, such lossy encodings do not provide enough information for the original object to be exactly reconstructed. Thus, if one is interested, for example, in learning enough about a corpus of pieces in order to compose new pieces of the same type, then such lossy analytical methods would not be sufficient.

In the remainder of this chapter, it is proposed that a musical analysis can fruitfully be conceived of as being an algorithm (possibly implemented as a computer program) that, when executed, outputs an in extenso description of the musical object being analyzed, and thus serves as a hypothesis about the nature of the process that gave rise to that musical object. Moreover, it is hypothesized that, if one has two algorithms or programs that each generate the same musical object, then the shorter of these (i.e., the one that can be encoded using fewer bits of information) will represent the better way of understanding that object for any task that requires or benefits from musical understanding. A model of music perception and learning will be sketched later in this chapter that is based on the idea of accounting for the structure of a newly experienced piece of music by minimally modifying a compressed encoding of previously encountered pieces. Some recent work will then be reviewed in which these ideas have been put into practice by devising compression algorithms that acquire musical knowledge that can then be applied in automatically carrying out a variety of advanced musicological tasks.

Encodings, Decoders, and Two-Part Codes

In this chapter, a musical analysis is conceived of as an effective procedure (i.e., algorithm), possibly implemented as a working computer program, that, when executed, generates as its only output an in extenso description of the music to be explained. Typically, the

description of this program may be shorter than its output. A basic claim of this chapter is that such a description (in the form of a program) becomes an explanation for the structure of the object being described as soon as it is shorter than the in extenso description of the object that it generates. In other words, a compressed encoding of an in extenso description of an object can be considered a candidate explanation (not necessarily a “correct” one) for the structure of that object because it serves as a hypothesis as to the nature of the process that gave rise to the object. Moreover, it is hypothesized that the more parsimoniously one can describe an object on some given level of detail, the better that description explains the structure of the object on that level of detail. As discussed earlier, this is an application of Ockham’s razor or the MDL principle (Rissanen 1978).

The following simple example serves to illustrate the foregoing ideas. Consider the problem of describing the set of twelve points shown in Figure 8.1. One could do this by explicitly giving the coordinates of all twelve points, thus:

P(p(0, 0), p(0, 1), p(1, 0), p(1, 1), p(2, 0), p(2, 1), p(2, 2), p(2, 3), p(3, 0), p(3, 1), p(3, 2), p(3, 3)).  (1)

In this encoding, a set of points, {p1, p2, . . . pn}, is denoted by P(p1, p2, . . . pn) and each point within such a set is denoted by p(x,y), where x and y are the x- and y-coordinates of the point respectively. The encoding in (1) can be thought of as being a program that computes the set of points in Figure 8.1 simply by specifying each point individually. Representing this set of points in this way requires one to write down twenty-four integer coordinate values. Moreover, the encoding does not represent any groupings of the points into larger constituents, nor does it represent any structural relationships between the points. In other words, this description is an in extenso description that does not represent any of the structure in the point set and therefore cannot be said to offer any explanation for it. One could go even further and say that expression (1) represents the data as though it were a random, meaningless arrangement of points with no order or regularity.

Figure 8.1  A set of twelve two-dimensional points in a Euclidean integer lattice.

Note that, in order to actually generate the set of twelve points, the description (1) needs to be decoded. An algorithm that carries out this decoding is called a decoder. In this case, such a decoder only needs to know about the meanings of the P(·) and p(x,y) formalisms. One can obtain a shorter encoding of the point set in Figure 8.1 by exploiting the fact that it consists of three copies, at different spatial positions, of the square configuration of points,

P(p(0, 0), p(0, 1), p(1, 0), p(1, 1)).  (2)

One could represent this description of the point set as follows:

T(P(p(0, 0), p(0, 1), p(1, 0), p(1, 1)), V(v(2, 0), v(2, 2))),  (3)

where T(P(p1, p2, . . . pn), V(v1, v2, . . . vm)) denotes the union of the point set, {p1, p2, . . . pn}, and the point sets that result by translating {p1, p2, . . . pn} by the vectors, {v1, v2, . . . vm}, where each vector is denoted by v(x,y), x and y being the x- and y-coordinates, respectively, of the vector. Note that description (3) fully specifies the point set in Figure 8.1 using only twelve integer values—that is, half the number required to explicitly list the coordinates of the points in the in extenso description in (1). Description (3) is thus a losslessly compressed encoding of description (1). It therefore qualifies as an explanation for the structure of the point set in Figure 8.1, precisely because it represents some of the structural regularity in this point set. If one perceives the point set in Figure 8.1 in the way represented by description (3), then the twelve points are no longer perceived to be arranged in a random, meaningless manner—they are now seen as resulting from the occurrence of three identical squares. Moreover, it is precisely because expression (3) captures this structure that it manages to convey all the information in (1) while being only roughly half the length of (1). On the other hand, in order to generate the actual point set in Figure 8.1 from the expression in (3), the decoder now needs to be able to interpret not only the operators P(·) and p(x,y), but also the operators T(·), V(·), and v(x,y). The decoder required to decode description (3) is therefore itself longer and more complex to describe than the decoder required to decode expression (1). The crucial question is therefore whether we save enough on the length of the encoding to warrant the resulting increase in length of the decoder. If the set of twelve points in Figure 8.1 were the only data that we ever had to understand and the operators T(·), V(·), and v(x,y) were only of any use on this particular dataset, then the increase in the length of the decoder required to implement these extra operators would probably exceed the decrease in the length of the encoding that these operators make possible. Consequently, in this case, the parsimony principle would not predict that description (3) represented a better way of understanding the point set in Figure 8.1—the new encoding would just replace the specification of eight random points in (1) with two random vectors in (3) and three randomly chosen new operators to be encoded in the decoder. However, the concepts of a vector, a vector set, and the operation of translation can be used to formulate compressed encodings of an

infinite and commonly occurring class of point sets—those containing subsets related by translation. If we encode a sufficiently large sample of such point sets using translation-invariance as a compression strategy, then the saving in the lengths of the resulting encodings will more than offset the increase in the length of the decoder required to make it capable of handling translation of point sets. This illustrates that interpreting the point set in Figure 8.1 as being composed of three identical square configurations of four points only makes sense if one is interpreting this point set in the broad context of a large (in this case, infinite) class of point sets, of which the set of points in Figure 8.1 is an example.

The foregoing example illustrates that what we are really interested in is not just the length of an encoding but the sum of the length of the encoding and the length of the decoder required to generate the in extenso description of the encoded object from the encoding. We therefore think about descriptions of objects as being two-part codes in which the first part (the decoder) represents all the structural regularity in the object that it shares with all the members of a (typically large) set of other objects and the second part represents what is unique to the object and random relative to the decoder.5 This is why we would not, for example, be interested in a “decoder” that itself consists solely of an in extenso description of the point set in Figure 8.1 and generates this point set every time it is run with no input. In this case, the “encoding” of the data would be of length zero but, because the decoder would be of length at least equal to that of the uncompressed in extenso description of the point set, we would have no net compression and, consequently, no explanation.
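To make the decoder idea concrete, the following minimal sketch (my own illustration in Python, not code from the chapter; the function names simply mirror the operators) implements the P(·)/p(x, y) and T(·)/V(·)/v(x, y) formalisms and checks that the compact description (3) decodes to exactly the same point set as the in extenso description (1).

# Minimal decoder sketch for the toy formalisms above (illustrative only).

def p(x, y):
    """An atomic point."""
    return (x, y)

def P(*points):
    """An unordered set of points."""
    return frozenset(points)

v = p  # a vector has the same form as a point

def V(*vectors):
    """An unordered set of translation vectors."""
    return frozenset(vectors)

def T(pattern, vectors):
    """Union of a pattern with its translations by each vector in `vectors`."""
    result = set(pattern)
    for (vx, vy) in vectors:
        result |= {(x + vx, y + vy) for (x, y) in pattern}
    return frozenset(result)

# Description (1): an in extenso listing of all twelve points.
description_1 = P(p(0, 0), p(0, 1), p(1, 0), p(1, 1),
                  p(2, 0), p(2, 1), p(2, 2), p(2, 3),
                  p(3, 0), p(3, 1), p(3, 2), p(3, 3))

# Description (3): one square plus two translation vectors.
description_3 = T(P(p(0, 0), p(0, 1), p(1, 0), p(1, 1)),
                  V(v(2, 0), v(2, 2)))

assert description_1 == description_3  # both decode to the same point set

In two-part-code terms, the function definitions play the role of the decoder, while the arguments passed to them play the role of the encoding: implementing T and V costs a few extra lines of decoder, but that cost is repaid across any corpus of point sets that contain translated repetitions.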

Music Analysis and Data Compression

If the best explanations are the shortest descriptions that account for as much data as possible in as much detail as possible, then this suggests that the goal of music analysis should be to find the shortest—but most detailed—description of as much music as possible. To illustrate this, let us consider a close musical analogue of the point-set example in Figure 8.1 discussed previously.

Figure 8.2 shows the beginning of J. S. Bach’s Prelude in C minor (BWV 871) from the second book of Das Wohltemperierte Klavier (1742) and Figure 8.3 shows a point-set representing this music, in which the horizontal dimension represents time in sixteenth notes and the vertical dimension represents morphetic pitch, an integer that encodes the pitch letter name (A–G) and octave of a note but not its alteration (. . . , 𝄫, ♭, ♮, ♯, 𝄪, . . .), so that, for example, D♭4, D♮4, and D♯4 all have the same morphetic pitch of 24 (Meredith 2006, 2007). The union of the three, 4-note patterns, A, B, and C, in Figure 8.3 could be described in an in extenso manner, on an analogy with description (1), as follows:



P(p(1, 27), p(2, 26), p(3, 27), p(4, 28), p(5, 26), p(6, 25), p(7, 26), p(8, 27), p(9, 25), p(10, 24), p(11, 25), p(12, 26))  (4)


Figure 8.2  The opening notes from J. S. Bach’s Prelude in C minor (BWV 871) from the second book of Das Wohltemperierte Klavier (1742). Patterns A, B, and C correspond, respectively, to the patterns with the same labels in Figure 8.3 (from Meredith et al. 2002).

Figure 8.3  A point-set representation of the music in Figure 8.2. The horizontal dimension represents time in sixteenth notes; the vertical dimension represents morphetic pitch (Meredith 2006, 2007). Patterns A, B, and C correspond, respectively, to the patterns with the same labels in Figure 8.2. See text for further explanation (from Meredith et al. 2002).

This would require one to write down twenty-four integer coordinates. Alternatively, on an analogy with description (3), one could exploit the fact that the set consists of three occurrences of the same pattern at different (modal) transpositions, and describe it more parsimoniously as follows:


T(P(p(1, 27), p(2, 26), p(3, 27), p(4, 28)), V(v(4, −1), v(8, −2))) (5)

This expression not only requires one to write down only half as many integers but also encodes some of the analytically important structural regularity in the music—namely, that the twelve points consist of three, 4-note patterns at different transpositions. Thus, by seeking a compressed encoding of the data, we have succeeded in finding a representation that gives us important information about the structural regularities in that data.

In the particular case of Figure 8.3, we can get an even more compact description by recognizing that the vector mapping A onto B is the same as that mapping B onto C. This means that one could represent the vector set V(v(4,−1), v(8,−2)) in description (5) as a vector sequence consisting of two consecutive occurrences of the vector v(4,−1), where the result of translating pattern A by the first vector in the sequence is itself translated by the second vector in the sequence. For example, this could be encoded as V(2v(4,−1)), where the emboldened V operator indicates that what follows is a sequence or ordered set, not an unordered set; and where we denote k consecutive occurrences of a vector, v(x,y), by kv(x,y). This would, of course, require a modification of the decoder so that it could process both vector sequences and the shorthand notation for sequences consisting of multiple occurrences of the same vector. As discussed earlier, whether or not adding this functionality to the decoder would be worthwhile depends on whether the new functionality allows for a sufficient reduction in encoding length over the whole class of musical objects that we are interested in explaining. In this particular case, since the device of musical sequence, exemplified by the excerpt in Figure 8.2, is commonly used throughout Western music, it would almost certainly be a good strategy to allow for the encoding of this type of structure in a compact manner. It is therefore not surprising that most psychological coding languages that have been designed for representing musical structure allow for multiple consecutive occurrences of the same interval or vector to be encoded in such a compact form (Deutsch and Feroe 1981; Meredith 2012b; Restle 1970; Simon and Sumner 1968, 1993).
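The extra machinery required for such a sequence operator is modest. The following sketch (mine, not the chapter’s; the name T_seq and the list-of-pairs notation are ad hoc) extends the earlier toy decoder so that k consecutive occurrences of a vector can be applied to a pattern, and verifies that pattern A of Figure 8.3, translated twice in succession by v(4, −1), regenerates the full twelve-point excerpt of description (4).

# Sequence-decoding sketch (illustrative only): apply a vector k times in
# succession, each time translating the most recently produced copy.

def T_seq(pattern, vector_sequence):
    """Union of a pattern with its successive translations.

    `vector_sequence` is a list of (k, (vx, vy)) pairs, meaning: apply the
    vector (vx, vy) k times, each application translating the previous copy.
    """
    result = set(pattern)
    current = set(pattern)
    for k, (vx, vy) in vector_sequence:
        for _ in range(k):
            current = {(x + vx, y + vy) for (x, y) in current}
            result |= current
    return frozenset(result)

# Pattern A of Figure 8.3, as (onset in sixteenth notes, morphetic pitch) pairs.
pattern_a = {(1, 27), (2, 26), (3, 27), (4, 28)}

# Two consecutive occurrences of the vector v(4, -1), as in V(2v(4, -1)).
covered = T_seq(pattern_a, [(2, (4, -1))])
assert covered == {(1, 27), (2, 26), (3, 27), (4, 28),
                   (5, 26), (6, 25), (7, 26), (8, 27),
                   (9, 25), (10, 24), (11, 25), (12, 26)}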

Music-Theoretical Concepts That Promote Compact Encodings of Musical Objects

There are a number of basic music-theoretical concepts and practices that help Western musicians and composers to encode tonal music parsimoniously and reduce the cognitive load required to process musical information. One example of such a concept is that of a voice. The strategy of conceiving of music as being organized into voices substantially reduces the amount of information about note durations that has to be communicated and remembered by musicians. For the vast

majority of notes in a piece of polyphonic Western music, the duration is equal to the within-voice, inter-onset interval—that is, most notes are held until the onset of the next note in the same voice. This means that, for most notes, provided we know the voice to which it belongs, we do not have to explicitly encode its duration—we only need to do so if there is a rest between it and the next note in the same voice. Grouping notes together into sequences that represent voices therefore considerably reduces the amount of information about note durations that needs to be explicitly encoded, remembered, and communicated.

The way in which pitch information is encoded in standard Western staff notation also helps to make scores more parsimonious. Key signatures, for example, remove the need to explicitly state the accidental for every note in a piece. Instead, accidentals only have to be placed before notes whose pitches are outside the diatonic set indicated by the key signature. Since most of the notes within a single piece of Western tonal music occur within a small number of closely related diatonic sets (i.e., within a relatively limited range on the line of fifths), accidentals are typically only necessary for a small proportion of the notes in a score. Key signatures, therefore, provide a mechanism for parsimoniously encoding information about pitch names in Western tonal music.

Also, typically, Western music based on the major–minor system (or the diatonic modes) is organized into consecutive temporal segments in which each note is understood to have one of seven different basic tonal functions within the key in operation at the point where the note occurs. For example, in the major–minor system, these basic tonal functions would be {tonic, supertonic, mediant . . . leading note} and each could be modified or qualified by being considered flattened or sharpened relative to a diatonic major or minor scale. Staff notation capitalizes on this by providing only seven different vertical positions at which notes can be positioned within each octave, rather than the twelve different positions that would be necessary if the pitch of each note were represented chromatically rather than in terms of its role within a seven-note scale. Again, this strategy allows for pitch information to be encoded more parsimoniously, leading to a reduction in the cognitive load on a musician reading the score. This pitch-naming strategy leads to more parsimonious encodings by assigning simpler (shorter) encodings to pitches that are more likely to occur in the music.

Time signatures similarly define a hierarchy of “probability” over the whole range of possible temporal positions at which a note may start within a measure. Specifically, notes are more likely to start on stronger beats.6 In Western classical and popular music, this results in only very few possible positions within a bar being probable positions for the start (or end) of a note and the notation is designed to make it easier to notate and read notes that start at more probable positions (i.e., on stronger beats). In data compression, variable-length codes, such as the Huffman code (Huffman 1952; Cormen et al. 2009, 431–435) or the Shannon–Fano code (Shannon 1948a, 1948b; Fano 1949), work in a closely analogous way by assigning shorter codes (i.e., simpler encodings) to more probable symbols or symbol strings. Huffman coding, in particular, assigns more frequent symbols to nodes closer to the root in a binary tree, which is closely analogous to tree-based representations of musical meter that assign stronger beats to higher levels in a tree structure (Lerdahl and Jackendoff 1983; Temperley 2001, 2004, 2007; Martin 1972; Meredith 1996, 214–219).

It thus seems that several features of Western staff notation and certain music-theoretical concepts have evolved in order to allow for Western tonal music to be encoded more parsimoniously.
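The analogy with variable-length codes can be made concrete with a small sketch (my own illustration; the onset-position frequencies are invented for the example rather than measured from any corpus) that builds a Huffman code over within-bar onset positions, so that more probable (stronger) positions receive shorter codewords.

# Huffman coding over a toy distribution of within-bar onset positions.
import heapq
from itertools import count

def huffman_code(frequencies):
    """Return a dict mapping each symbol to its Huffman codeword (a bit string)."""
    tiebreak = count()  # keeps heap entries comparable when frequencies are equal
    heap = [(freq, next(tiebreak), {symbol: ""}) for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)
        f2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical relative frequencies of note onsets at different positions in a 4/4 bar.
onset_frequencies = {"beat 1": 40, "beat 3": 25, "beat 2": 15, "beat 4": 12, "offbeat": 8}
codes = huffman_code(onset_frequencies)
for position, code in sorted(codes.items(), key=lambda item: len(item[1])):
    print(position, code)  # the most frequent position ("beat 1") receives the shortest codeword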

Kolmogorov Complexity

The work presented in this chapter is based on the central thesis that explanation is compression. The more compressible an object is, the less random it is, the simpler it is, and the more explicable it is. This basic thesis was formalized by information theorists during the 1960s and encapsulated in the concept of Kolmogorov complexity. The Kolmogorov complexity of an object is a measure of the amount of intrinsic information in the object (Chaitin 1966; Kolmogorov 1965; Solomonoff 1964a, 1964b; Li and Vitányi 2008). It differs from the Shannon information content of an object, which is the amount of information that has to be transmitted in order to uniquely specify the object within some predefined set of possible objects. The Kolmogorov complexity of an object is the length in bits of the shortest possible effective (i.e., computable) description of an object, where an effective description can be thought of as being a computer program that takes no input and computes the object as its only output. In other words, the Kolmogorov complexity of an object is a measure of the complexity of the simplest process that can give rise to the object. The more structural regularity there is in an object, the shorter its shortest possible description and the lower its Kolmogorov complexity.

Unfortunately, it is not generally possible to determine the Kolmogorov complexity of an object, as it is usually impossible to prove that any given description of the object is the shortest possible. Nevertheless, the theory of Kolmogorov complexity supports the notion of using the length of a description as a measure of its complexity and it supports the idea that the shorter the description of a given object, the more structural regularity that description captures. The theory has also been used to show formally that data compression is almost always the best strategy for both model selection and prediction (Vitányi and Li 2000). For some further comments on the relationship between music analysis and Kolmogorov complexity, see Meredith (2012a).
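Although the Kolmogorov complexity of an object cannot actually be computed, an off-the-shelf compressor gives a rough, upper-bound feel for the idea. The toy sketch below (my own illustration, not from the chapter) compares a highly regular string with a pseudorandom string of the same length: the regular string, which could be generated by a very short program, also compresses to a much shorter encoding.

# Compressed length as a crude stand-in for description length (illustrative only).
import random
import zlib

regular = "ab" * 500  # fully specified by a tiny program: repeat "ab" 500 times
random.seed(0)
irregular = "".join(random.choice("ab") for _ in range(1000))

print(len(zlib.compress(regular.encode())))    # small
print(len(zlib.compress(irregular.encode())))  # considerably larger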

Music Analysis and Perceptual Coding

As stated at the outset, the work presented here is based on the assumption that the goal of music analysis is to find the best possible explanations for musical works. This could be recast in the language of psychology by saying that music analysis aims to find the most successful perceptual organizations that are consistent with a given musical surface (Lerdahl and Jackendoff 1983).

Most theories of perceptual organization have been founded on one of two principles: the likelihood principle (Helmholtz 1867), which proposes that the perceptual system prefers organizations that are the most probable in the world; and the simplicity principle (Koffka 1935), which states that the perceptual system prefers the simplest perceptual organizations.

For many years, psychologists considered the simplicity and likelihood principles to be in conflict until Chater (1996), drawing on the theory of Kolmogorov complexity, pointed out that the two principles are mathematically equivalent. However, Vitányi and Li (2000) showed that, strictly speaking, the predictions of the likelihood principle (which corresponds to Bayesian inference) and the simplicity principle (which corresponds to what they call the “ideal MDL principle”) are only expected to converge for individually random objects in computable distributions (Vitányi and Li 2000, 446). They state, “if the contemplated objects are nonrandom or the distributions are not computable then MDL [i.e., the simplicity principle] and Bayes’s rule [i.e., the likelihood principle] may part company.”

Musical objects are typically highly regular and not at all random, at least in the sense that randomness is defined within algorithmic information theory (Li and Vitányi 2008, 49ff.). Vitányi and Li’s conclusions therefore seem to cast doubt on whether approaches based on the likelihood principle, commonly applied in Bayesian and probabilistic approaches to musical analysis such as those proposed by Meyer (1956), Huron (2006), Pearce and Wiggins (2012), and Temperley (2007), can ever successfully be used to discover certain types of structural regularity in musical objects such as thematic transformations or parsimonious generative definitions of scales or chords.

The approach presented in this chapter is therefore more closely aligned with models of perceptual organization based on the simplicity principle—in particular, theories of perceptual organization in the tradition of Gestalt psychology (Koffka 1935) that take the form of coding languages designed to represent the structures of patterns in particular domains. Theories of this type predict that sensory input is more likely to be perceived to be organized in ways that correspond to shorter descriptions in a particular coding language. Coding theories of this type have been proposed for serial patterns (Simon 1972), visual patterns (Leeuwenberg 1971), and, indeed, musical patterns (Deutsch and Feroe 1981; Meredith 2012b; Povel and Essens 1985; Restle 1970; Simon and Sumner 1968, 1993).

A Sketch of a Compression-Based Model of Musical Learning

Let us define a musical object to be any quantity of music, ranging from a single note through to a complete work or even a collection of works. A musical object is typically interpreted by a listener or an analyst in the context of some larger object that contains it


Figure 8.4  A Venn diagram illustrating various contexts in which a musical object might be interpreted. A phrase (P) could be interpreted within the context of a section (S), which could be interpreted within the context of a work (W), and so on. C = works by the same composer; F = works in the same form or genre; I = works for the same instrumentation; T = tonal music; M = all music.

(see Figure 8.4). In essence, the model of musical learning presented here is as follows.7 The analyst or listener explicitly or implicitly tries to find the shortest program that computes the in extenso descriptions of a set of musical objects containing:

• the object to be explained (the explanandum); and
• other objects, related to the explanandum, defining a context within which the explanandum is to be interpreted.

This idea is illustrated in Figure 8.5.

The analyst and listener differ in the degree of freedom that they have to choose the context within which they interpret an object. The analyst can explicitly choose a context of closely related objects, such as other music in the same genre or by the same composer. In general, the more similar the explanandum is to the other objects in the context, the shorter its description can be, relative to that context. The listener, on the other hand, is forced to interpret the explanandum in the context of their largely implicit understanding of all the previous music they have encountered.

Figure 8.6 illustrates the idea that, when the listener hears a new piece (in red), the existing explanation (i.e., program) (P) for all the music previously heard (in yellow) is minimally modified to produce a new program (P’) to account for the new piece in addition to all previously encountered music. This is achieved by discovering the simplest way of interpreting as much of the material in the new piece in terms of what is already known. The perceived structure of the newly encountered musical object is then represented by the specific way in which P’ computes that object. Note that P’ may also


Figure 8.5  The analyst’s or listener’s understanding of a musical object (the dark gray circle—in red on the companion website) is modeled as a program, P, that computes a set of musical objects containing the one to be explained along with other related objects (the light gray circles—in yellow on the companion website) forming a context within which the explanandum is interpreted.


Figure 8.6  When the listener hears a new piece (the dark gray circle—in red on the companion website), the existing explanation (i.e., “program”) (P) for all the music previously heard is minimally modified to produce a new program (P’) to account for the new piece in addition to all previously encountered music. This might be achieved by discovering the simplest way of interpreting as much of the material in the new piece in terms of what is already known.

generate the previously heard pieces in a way that differs from that in which P generates these pieces, reflecting the fact that hearing a new piece may change the way that one interprets pieces that one has heard before.

One can speculate that P’ is produced in a two-stage process. In the first stage, an attempt is made to interpret as much of the new, unfamiliar piece as possible by reusing elements and transformations that have previously been used to encode (i.e., understand) music. This will typically lead to a compact encoding of the new piece if it contains material that is related to that in previously encountered music. However, after this first stage, the global interpretation of all pieces known to the listener/analyst (including

the most recently interpreted piece) may no longer be as close to optimal as it could be. In a second stage, therefore, the brain of the listener or analyst might carry out a more computationally expensive “knowledge consolidation” process in which an attempt is made to find a globally more efficient encoding of all music known to the individual. This might, for example, occur during sleep (see Tononi and Cirelli 2014) and might consist of a randomized process of seeking alternative encodings of individual pieces that help to produce a more efficient global interpretation of the music known to the individual.

On this view, music analysis, perception, and learning essentially reduce to the process of compressing musical objects. This is, of course, an idealized model: for example, in practice, a listener will not have internalized a model that can account in detail for all the music they have previously heard. In other words, in reality, this learning process would probably be based on rather lossy compression.

However, it is important to stress that, even though both the analyst and the listener aim to find the shortest possible encodings of the music they encounter, they both usually fail to achieve this. As Chater (1996) points out, “the perceptual system cannot, in general, maximize simplicity (or likelihood) over all perceptual organizations. . . . It is, nonetheless, entirely possible that the perceptual system chooses the simplest (or most probable) organization that it is able to construct” (578). This is largely a result of the limited processing and memory resources available to the perceptual system. For example, we typically describe the structure of a piece of music in terms of motives, themes, and sections, all of which are temporally compact segments, meaning that they are patterns that contain all the events that occur within a particular time span. It could well be that, for some pieces, a more parsimonious description (corresponding to a better explanation) might be possible in terms of patterns containing notes and events that are dispersed widely throughout the piece. However, listeners would normally fail to discover such patterns because their limited memories and attention spans constrain them to focus on patterns that are temporally compact (see also Collins et al. 2011).

Using the Model to Explain Individual Differences

The model just sketched can be applied to understanding the emergence of differences between the ways that individuals understand the same piece. The model proposed in the previous section consists essentially of a greedy algorithm8 that is used to construct an interpretation for a newly encountered piece that minimally modifies an existing “program” that generates descriptions of all the pieces in a particular context set. It was proposed that this greedy approach might be supplemented by a computationally more expensive process of consolidation that attempts to find a globally more efficient encoding. Nevertheless, because such a consolidation process will not generally be

capable of consistently discovering a globally optimal encoding, the way that an individual understands a given piece will generally depend not only on which pieces they already know, but also on the order in which these pieces were encountered. This implication could fairly straightforwardly be tested empirically.

A rather crude version of the foregoing model has been implemented in an algorithm called SIATECLearn. The SIATECLearn algorithm is based on the geometric pattern discovery algorithm, SIATEC, proposed by Meredith and colleagues (2002). The SIATEC algorithm takes as input a set of points called a dataset and automatically discovers all the translationally related occurrences of maximal repeated patterns in the dataset. If the dataset represents a piece of music, with each point representing a note in pitch-time space, then two patterns in this space related by translation correspond to two statements of the same musical pattern, possibly with transposition. We say a pattern P is translatable within a dataset D if there exists a vector, v, such that P translated by v gives a pattern that is also in D. A translatable pattern is maximal for a given vector, v, in a dataset D, if it contains all the points in the dataset that can be mapped by translation by v onto other points in the dataset. The maximal translatable pattern (MTP) for a vector v in a dataset D, which we can denote by MTP(v, D), can also be thought of as being the intersection of the dataset D and the dataset D translated by −v. That is,

MTP(v, D) = D ∩ (D − v).  (6)

For each (nonempty) MTP, P, in a dataset, SIATEC finds all the occurrences of P, and outputs this occurrence set of P. Such an occurrence set is called the translational equivalence class (TEC) of P in D, denoted by TEC(P, D), because it contains all the patterns in the dataset that are translationally equivalent to P. That is,

TEC(P, D) = {Q | Q ⊆ D ∧ (∃v | Q = P + v)}.  (7)
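Definitions (6) and (7) translate almost directly into code. The following sketch (mine, a brute-force illustration rather than Meredith’s published SIATEC implementation) computes MTPs and TECs for the twelve-point Bach excerpt of Figure 8.3 and confirms that pattern A is the MTP for the vector v(8, −2), and that its TEC contains the three statements A, B, and C.

# Brute-force MTP and TEC computation, following definitions (6) and (7).

def translate(pattern, vector):
    (vx, vy) = vector
    return frozenset((x + vx, y + vy) for (x, y) in pattern)

def mtp(vector, dataset):
    """MTP(v, D) = D ∩ (D − v): the points mapped by v onto other points of D."""
    (vx, vy) = vector
    return frozenset(q for q in dataset if (q[0] + vx, q[1] + vy) in dataset)

def tec(pattern, dataset):
    """TEC(P, D): every translate of P that lies entirely within D."""
    anchor = min(pattern)  # any translate of P must map this point onto a point of D
    occurrences = set()
    for (qx, qy) in dataset:
        q = translate(pattern, (qx - anchor[0], qy - anchor[1]))
        if q <= dataset:
            occurrences.add(q)
    return occurrences

# The point set of description (4): (onset in sixteenth notes, morphetic pitch).
dataset = frozenset({(1, 27), (2, 26), (3, 27), (4, 28),
                     (5, 26), (6, 25), (7, 26), (8, 27),
                     (9, 25), (10, 24), (11, 25), (12, 26)})
pattern_a = frozenset({(1, 27), (2, 26), (3, 27), (4, 28)})

assert mtp((8, -2), dataset) == pattern_a   # pattern A is the MTP for v(8, -2)
assert len(mtp((4, -1), dataset)) == 8      # the MTP for v(4, -1) contains both A and B
assert len(tec(pattern_a, dataset)) == 3    # the occurrences A, B, and C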

SIATEC therefore takes a dataset as input and outputs a collection of TECs, such that each TEC contains all the occurrences of a particular maximal translatable pattern.

An algorithm called SIATECCompress (Meredith 2013b, 2015, 2016) runs SIATEC on a dataset, then sorts the found TECs into decreasing order of “quality.” Given two TECs, the one that results in the better compression (in the sense of expressions (4) and (5), discussed earlier) is deemed superior. If both TECs give the same degree of compression, then the one whose pattern is spatially more compact is considered superior. SIATECCompress then scans this list of occurrence sets and computes an encoding of the input dataset in the form of a set of TECs that, taken together, account for or cover the entire input dataset.

SIATECLearn runs SIATECCompress, but also stores the patterns it finds on each run and will preferably reuse these patterns rather than newly found ones on subsequent runs of the algorithm. Thus, when SIATECLearn is run on the twelve-point pattern on the left in Figure 8.7, it “interprets” the dataset as being constructed from three occurrences of the square pattern shown. This square pattern is therefore stored in its

Figure 8.7  Output of SIATECLearn when presented first with the dataset on the left and then with the dataset on the right.

Figure 8.8  Output of SIATECLearn when presented first with the dataset on the left and then with the dataset on the right.

“long-term” memory. When the algorithm is subsequently run on the ten-point dataset on the right, it prefers to use the stored square pattern rather than any of the patterns that it finds in this newly encountered dataset; it interprets the new dataset as containing two occurrences of the square pattern along with two extra points. Conversely, when SIATECLearn is first presented with the ten-point dataset, it interprets the dataset as being composed from five occurrences of the two-point vertical line configuration shown on the left in Figure 8.8. This pattern is then stored in long-term memory, so that, when the algorithm is subsequently presented with the twelve-point dataset, it interprets this set as consisting of six occurrences of this vertical line rather than three occurrences of the square pattern. This very simple example illustrates how the way in which objects are interpreted can depend on the order in which they are presented.


COSIATEC: Music Analysis by Point-Set Compression

Given the concept of a TEC, as defined in (7) earlier, we can define the covered set, CS(T), of a TEC T to be the union of all the patterns in T. That is,

CS(T) = ∪P∈T P.  (8)

COSIATEC (Meredith et al. 2003; Meredith 2013b, 2015, 2016) is a greedy compression algorithm based on SIATEC. The algorithm takes a dataset as input and computes a set of TECs that collectively cover this dataset in such a way that none of the TECs’ covered sets intersect. It also attempts to choose this set of TECs so that it minimizes the length of the output encoding. The basic idea behind the algorithm is sketched in the pseudocode in Figure 8.9.

As shown in Figure 8.9, the COSIATEC algorithm first finds the “best” TEC in the output of SIATEC for the input dataset, S. The best TEC is the one that produces the best compression. This means that it is the one that has the best compression factor, which is the ratio of the number of points in its covered set (as defined in (8)) to the sum of the number of points in one occurrence of the TEC’s pattern and the number of occurrences minus 1. The reasoning behind this is that a TEC can be compactly encoded as an ordered pair, (P,V), where P is one occurrence in the TEC and V is the set of vectors that map P onto all the other occurrences of P in the dataset. The number of vectors in V is therefore equal to the number of occurrences of P minus 1. The length of an in extenso encoding of a TEC’s covered set in terms of points is simply |CS(T)| as defined in (8). Each vector in V has approximately the same information content as a point in P, so the length of an ordered pair encoding of a TEC, (P,V), in terms of points is approximately |P|+|V|. The compression factor is the ratio of the length of the in extenso encoding to the length of the compressed encoding. Thus, the compression factor of a TEC, T = (P,V), denoted CF(T), can be defined as

CF(T) = |CS(T)| / (|P| + |V|).

COSIATEC(S)
    while S is not empty
        Find the best TEC, T, using SIATEC
        Add T to the encoding, E
        Remove the points covered by T from S
    return the encoding E

Figure 8.9  The COSIATEC algorithm.
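The pseudocode in Figure 8.9 can likewise be fleshed out as a compact, runnable sketch (mine, not Meredith’s implementation), reusing the translate(), mtp(), and tec() helpers and the dataset defined in the earlier sketch. The “best” TEC is chosen here purely by compression factor as defined above; the compactness tie-break described in the next paragraph is omitted for brevity, so the chosen pattern may differ from the one COSIATEC itself would select.

# A simplified COSIATEC loop (illustrative only): repeatedly pick the TEC with
# the highest compression factor and remove its covered set from the dataset.

def compression_factor(pattern, occurrences):
    covered = set().union(*occurrences)
    return len(covered) / (len(pattern) + len(occurrences) - 1)  # |CS(T)| / (|P| + |V|)

def best_tec(dataset):
    """Brute-force stand-in for SIATEC: consider the MTP of every inter-point vector."""
    best = None
    points = sorted(dataset)
    for a in points:
        for b in points:
            vector = (b[0] - a[0], b[1] - a[1])
            if vector == (0, 0):
                continue
            pattern = mtp(vector, dataset)
            if not pattern:
                continue
            occurrences = tec(pattern, dataset)
            cf = compression_factor(pattern, occurrences)
            if best is None or cf > best[0]:
                best = (cf, pattern, occurrences)
    return best

def cosiatec(dataset):
    remaining = set(dataset)
    encoding = []
    while remaining:
        best = best_tec(frozenset(remaining))
        if best is None or best[0] <= 1.0:        # nothing compresses: emit residual points
            encoding.append((frozenset(remaining), frozenset()))
            break
        cf, pattern, occurrences = best
        encoding.append((pattern, occurrences))
        remaining -= set().union(*occurrences)    # remove the covered set
    return encoding

# On the twelve-point Bach excerpt, a single TEC with compression factor 2.0
# covers the whole dataset, so the encoding contains just one (pattern, occurrences) pair.
for pattern, occurrences in cosiatec(dataset):
    print(len(pattern), len(occurrences))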

If two TECs have the same compression factor, then COSIATEC chooses the TEC in which the first occurrence of the pattern is the more compact: the compactness of a pattern is the ratio of the number of points in the pattern to the number of dataset points in the bounding box of the pattern. The rationale behind this heuristic is that patterns are more likely to be noticeable if the region of pitch-time space that they span does not also contain many “distractor” points that are not in the pattern. These heuristics for evaluating the quality of a TEC are discussed in more detail by Meredith and colleagues (2002), Meredith (2015), and Collins and coauthors (2011).

As shown in Figure 8.9, once the best TEC, T, has been found for the input dataset, S, this TEC is added to the encoding (E) and the covered set of T, CS(T), is removed from S. Once the covered set of T has been removed from S, the process is repeated, with SIATEC being run on the new S. The procedure is repeated until S is empty, at which point E contains a set of TECs that collectively cover the entire input dataset. Moreover, because the TEC that gives the best compression factor is selected on each iteration, E is typically a compact or compressed encoding of S. COSIATEC typically produces encodings that are more compact than those produced by SIATECCompress.

Figure 8.10 shows the output of COSIATEC for a short Dutch folk song. The complete piece can be encoded as the union of the covered sets of five TECs. In Figure 8.10, each TEC is drawn in a different shade. The first TEC, drawn in red, consists of the occurrences of a three-note, lower-neighbor-note figure. This TEC has the best compression factor of any TEC for a maximal translatable pattern in this dataset. After these three-note patterns have been removed from the piece, the next best TEC is the one drawn in light green in Figure 8.10, namely the two occurrences of the four-note, rising scale segment. The fifth TEC consists of the fourteen occurrences of a single unconnected point in Figure 8.10. These are the points (notes) that are left over after removing the sets of repeated patterns that give the best compression factor. This final set of “residual” points, which cannot be compressed by the algorithm, is essentially seen by the algorithm as being random “noise” that it cannot “explain.”

Figure 8.11 shows the analysis generated by COSIATEC for a more complex piece of music, the Prelude in C minor (BWV 871) from book 2 of J. S. Bach’s Das Wohltemperierte Klavier. Note that the first TEC (in red) generated by COSIATEC (i.e., the one that results in the most compression over the whole dataset) is precisely the four-note pattern shown in Figure 8.2, discussed earlier.

Figure 8.10  The set of TECs computed by COSIATEC for a short Dutch folk song, “Daar zou er en maagdje vroeg opstaan” (file number NLB015569 from the Nederlandse Liederen Bank, http://www.liederenbank.nl). Courtesy of Peter van Kranenburg.

Figure 8.11  Analysis generated by COSIATEC of J. S. Bach’s Prelude in C minor (BWV 871) from the second book of Das Wohltemperierte Klavier (1742). Each set of pattern occurrences (i.e., TEC) is displayed in a distinct shade of gray (see image on companion website which uses colors). The first TEC generated, consisting of occurrences of the opening “V”-shaped motive (indicated with triangles here and red on the companion website), is the one that has the highest compression factor over the whole dataset. The overall compression factor of this analysis is 2.3, and the residual point set, containing notes that the algorithm does not re-express in a compact form, contains 3.61 percent of the notes in the piece (corresponding to 25 out of 692 notes).

Evaluating Music Analysis Algorithms

In the introduction to this chapter, it was proposed that, when given two or more different analyses of the same piece of music (or, more generally, musical object), it may be possible to determine which of the analyses is the best for carrying out certain objectively evaluable tasks. It is similarly possible to evaluate algorithms that compute analyses by comparing how well the generated analyses allow certain tasks to be performed.

In a recent paper (Meredith 2015), the point-set compression algorithms, COSIATEC and SIATECCompress, were compared on a number of different tasks with a third greedy compression algorithm proposed by Forth and Wiggins (2009) and Forth (2012). The algorithms were evaluated on three tasks: folk song classification, discovery of repeated themes and sections, and discovery of fugal subject and countersubject entries. Although no obvious correlation was found between compression factor and performance on these tasks, COSIATEC achieved both the best compression factor (around 1.6) and the best classification success rate (84%) on the folk-song classification task.

The pattern-discovery task on which the algorithms compared in this study were evaluated consisted of finding the repeated themes and sections identified in the JKU Patterns Development Database, a collection of five pieces of classical and baroque music, each accompanied by “ground-truth” analyses by expert musicologists (Collins 2013). The output of each algorithm was compared with these analyses. I have argued (Meredith 2015, 263–265) that these “ground-truth” analyses are not satisfactory for at least two reasons: first, the musicologists on whose work the ground-truth analyses are based did not consistently identify all occurrences of the patterns that they considered to be worth mentioning; and second, there are patterns that are noticeable and important that the


Figure 8.12  Examples of noticeable and/or important patterns in Bach's Fugue in A minor (BWV 889) that were discovered by the algorithms tested by Meredith (2015) but were not recorded in the "ground-truth" analyses in the JKU Patterns Development Database used for evaluation. Patterns (a), (b), and (d) were discovered by COSIATEC. Patterns (c) and (d) were discovered by SIATECCompress.

musicologists who created the ground-truth analyses failed to mention. Indeed, the tested algorithms discovered not only structurally salient patterns that the analysts omitted to mention but also exact occurrences of the ground-truth patterns that are not recorded in the ground-truth analyses. Figure 8.12 shows some examples of structurally important patterns in a fugue by J. S. Bach that were not recorded in the "ground-truth" analyses used for evaluation. Notwithstanding the foregoing methodological issues with this task, it was found that SIATECCompress performed best on average, achieving an average F1 score of about 50 percent over the five pieces in the corpus. However, COSIATEC achieved F1 scores of 71 percent and 60 percent on the pieces by Beethoven and Mozart, respectively; and Forth's algorithm performed substantially better than the other algorithms on a fugue by Bach. There was therefore no algorithm that consistently performed best on this task.

On the fugal analysis task, the algorithms performed rather less well than on the other evaluation tasks. COSIATEC and SIATECCompress achieved a mean recall of around 60 percent over the twenty-four fugues in the first book of J. S. Bach's Das Wohltemperierte Klavier. However, COSIATEC's precision on this task was much lower (around 10%). Overall, the best-performing algorithm was SIATECCompress, which achieved an F1 score of around 30 percent on this fugal analysis task.

In the study just discussed, the performance of the SIA-based compression algorithms on the folk-song classification task was compared with that of the general-purpose text compression algorithm, bzip2 (Seward 2010). On this task, bzip2 achieved a much higher average compression factor (3.5) but a much lower classification success rate (12.5%) than the SIA-based algorithms. At first sight, this might be interpreted as evidence against the basic hypothesis that shorter descriptions correspond to better explanations. In a later study, Corentin Louboutin and I therefore explored in more

depth whether general-purpose compression algorithms could be used for music analysis, by comparing three general-purpose compression algorithms with COSIATEC on two music-analytical tasks (Louboutin and Meredith 2016). The general-purpose algorithms compared included the Burrows–Wheeler algorithm (Burrows and Wheeler 1994), Lempel–Ziv-77 (LZ77) (Ziv and Lempel 1977), and Lempel–Ziv-78 (LZ78) (Ziv and Lempel 1978). This study confirmed that, in order to achieve good results, the type of representation used for the music has to be appropriate for the compression algorithm used. Thus, COSIATEC, which discovers maximal repeated patterns in point sets, was unaffected by the order in which the notes were sorted in the input files. However, LZ77 discovers repeated substrings in a sequence of symbols and these substrings, consisting of sequences of contiguous symbols in the original string representation, only correspond to sequences of contiguous notes in a voice when the notes in the music are presented to the algorithm a voice at a time. If the notes are presented a chord at a time (i.e., sorting the notes first by pitch and then by onset), then we should not expect LZ77 to be capable of finding repeated melodic themes. Our results confirmed this; when the algorithms were used on the fugal analysis task described earlier, the F1 score for the LZ77 algorithm doubled when the notes were first sorted so that the algorithm was presented with the music a voice at a time rather than a chord at a time. On this task, we also found a strong correlation between compression factor and F1 score, supporting the general notion that shorter descriptions represent better explanations.

On the folk-song classification task, we were able to improve on the performance of COSIATEC by using eight different representations in combination, with LZ77 being used to calculate normalized compression distances for seven of these and COSIATEC being used for the last one. In this way, we succeeded in achieving a classification success rate of over 94 percent using an eight-nearest-neighbor classification algorithm, compared with 85 percent for COSIATEC alone.
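As a rough illustration of the normalized-compression-distance classification just mentioned, the sketch below uses Python's built-in bz2 module as a stand-in compressor and placeholder encodings; it is not the code used in the studies cited, and, as those studies show, the choice of compressor and of input representation is exactly what determines whether such a classifier works well.

```python
# Minimal sketch of NCD-based nearest-neighbour classification (illustration
# only). bz2 is used here simply because it is in the standard library; the
# studies discussed above used LZ77- and COSIATEC-based distances instead.
import bz2
from collections import Counter

def compressed_size(data: bytes) -> int:
    return len(bz2.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance: small when x and y compress well together.
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)

def classify(query: bytes, corpus: list[tuple[bytes, str]], k: int = 8) -> str:
    # corpus holds (encoded folk song, tune-family label) pairs
    nearest = sorted(corpus, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```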

Applying a Compression-Driven Approach to the Analysis of Musical Audio

The main concern in this chapter has been with explaining "musical objects" by discovering losslessly compressed descriptions of these objects. The basic scheme is that one takes an in extenso encoding of such an object and then attempts to find a short algorithm that generates that in extenso encoding as its only output. The encoding could be on any level of granularity and could represent any quantity of music in any possible domain in which a musical object might be manifested—for example, an image of a score, a symbolic encoding of a score, an audio recording, or a video recording. In the examples and evaluations presented above, the focus has been on musical objects that are symbolic encodings of scores. In such cases, one can realistically hope to be able to produce losslessly compressed descriptions in which we are required to consider only a

very small proportion of the information in the object to be "random" or "noise." On the other hand, if one were concerned with explaining the structure of a digital audio recording of a performance of a piece produced by human performers playing from a score, then one would expect the compression factors achievable to be lower and one would expect to have to be satisfied with considering a larger proportion of the information in the object as being "noise." This is because the detailed structure of such a recording depends not only on the score from which the players are performing, but also on many other factors that are perhaps harder to model, such as the acoustics of the space in which the recording was made, the precise nature of the instruments used, and, most importantly, the players themselves and their own particular ways of interpreting the score.
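The two summary quantities used throughout this discussion can be made concrete with a little arithmetic. Under the point-set encoding described above, and on one simple way of counting description length (an assumption, not a formula from the chapter), the compression factor and the proportion of unexplained "noise" can be computed as follows.

```python
# Illustration of the two summary figures used above, under an assumed
# counting scheme: each TEC costs its pattern points plus its translation
# vectors, and residual points are written out verbatim.

def encoding_stats(num_notes: int, tecs, num_residual: int) -> dict:
    description_length = sum(len(p) + len(t) for p, t in tecs) + num_residual
    return {
        "compression_factor": num_notes / description_length,
        "noise_proportion": num_residual / num_notes,
    }

# For the Bach prelude in Figure 8.11: 25 residual notes out of 692 give a
# noise proportion of 25 / 692, about 3.61 percent, and a compression factor
# of 2.3 corresponds to a description roughly 692 / 2.3, or about 300, points long.
```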

Summary

In this chapter, I have proposed that the goal of music analysis should be to find the "best" ways of understanding musical objects and that two different analyses of the same musical object can be compared objectively by determining whether one of them allows us to more effectively perform some specific set of tasks. I have also explored the hypothesis that, for all tasks that require an understanding of how a musical object is constructed, the best ways of understanding that object are those that are represented by the shortest possible descriptions of the object. I have briefly outlined how this hypothesis relates to the theory of Kolmogorov complexity and to coding theory models of perception. I have also briefly sketched how these ideas can form the basis of a theory of musical learning that can potentially explain aspects of music cognition such as individual differences. Finally, I briefly described the COSIATEC point-set compression algorithm and reviewed the results of some experiments in which it and other related algorithms have been used to automatically carry out musical tasks such as folk-song classification and thematic analysis. The results achieved in these experiments generally support the idea that the knowledge necessary to be able to successfully carry out advanced musicological tasks can largely be acquired simply by compressing in extenso representations of musical objects. Moreover, some of the results clearly indicate a correlation between compression factor and success on musicological tasks. However, these experiments also show that performance on such tasks depends heavily both on the specific types of redundancy exploited by the compression algorithm used to generate the compressed encodings and on the precise form of the in extenso representations used as input to these compression-based learning methods.

Acknowledgments

The work reported in this chapter was carried out as part of the EU collaborative project, "Learning to Create" (Lrn2Cre8). The project Lrn2Cre8 acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET grant number 610859.


Notes

1. https://www.midi.org/specifications
2. See, for example, chapter 25 of Book 1 of Aristotle's Posterior Analytics (Bouchier 1901, 66).
3. Kolmogorov introduced the field of nonprobabilistic statistics at a conference in Tallinn, Estonia, in 1973 and in a talk at the Moscow Mathematical Society in 1974 (Li and Vitányi 2008, 405). Unfortunately, these talks were never published in written form.
4. See Simon and Sumner (1968, 1993) for a similar use of the term "in extenso" in the context of music representations.
5. For a more technical discussion of two-part codes, see Vitányi and Li (2000, 447).
6. See Temperley (2007, chap. 3) for a model of rhythm and meter perception based on the idea that simpler meters are more probable and events are more likely to occur on stronger beats.
7. This model was originally described by Meredith (2012c, 2013a).
8. A greedy algorithm attempts to solve an optimization problem by always choosing the locally best option at each decision point in the construction of a solution. This does not always produce a globally optimal solution, but for some problems it does (e.g., activity selection, the construction of a Huffman code). For more details, see Cormen and colleagues (2009, 414–450).

References Bouchier, E. S. 1901. Aristotle’s Posterior Analytics. Oxford: Oxford University Press. Burrows, M., and D. J. Wheeler. 1994. A Block-Sorting Lossless Data Compression Algorithm. Palo Alto, CA: Digital Systems Research Center (now HP Labs). Technical Report SRC 124. Chaitin, G.  J. 1966. On the Length of Programs for Computing Finite Binary Sequences. Journal of the Association for Computing Machinery 13 (4): 547–569. Chater, N. 1996. Reconciling Simplicity and Likelihood Principles in Perceptual Organization. Psychological Review 103 (3): 566–581. Collins, T. 2013. JKU Patterns Development Database. http://tomcollinsresearch.net/research/ data/mirex/JKUPDD-Aug2013.zip. Accessed January 21, 2016. Collins, T., R. Laney, A. Willis, and P. H. Garthwaite. 2011. Modeling Pattern Importance in Chopin’s “Mazurkas.” Music Perception 28 (4): 387–414. Cormen, T. H., C. E. Leiserson, R. L. Rivest, and C. Stein. 2009. Introduction to Algorithms. 3rd ed. Cambridge, MA: MIT Press. Deutsch, D., and J. Feroe. 1981. The Internal Representation of Pitch Sequences in Tonal Music. Psychological Review 88 (6): 503–522. Fano, R.  M. 1949. The Transmission of Information, Technical Report No. 65, March 17. Cambridge, MA: Research Laboratory of Electronics, MIT. Forth, J. 2012. Cognitively-Motivated Geometric Methods of Pattern Discovery and Models of Similarity in Music. PhD thesis, Department of Computing, Goldsmiths, University of London. Forth, J., and G.  A.  Wiggins. 2009. An Approach for Identifying Salient Repetition in Multidimensional Representations of Polyphonic Music. In London Algorithmics 2008: Theory and Practice, edited by J. Chan, J. W. Daykin, and M. S. Rahman, 44–58. London: College Publications.

Halpern, A. R. 2003. Cerebral Substrates of Musical Imagery. In The Cognitive Neuroscience of Music, edited by I. Peretz and R. J. Zatorre, Chapter 15. Oxford: Oxford University Press. doi:10.1093/acprof:oso/9780198525202.001.0001. Helmholtz, H. L. F. 1867. Handbuch der physiologischen Optik. Leipzig: Leopold Voss. Huffman, D. A. 1952. A Method for the Construction of Minimum-Redundancy Codes. In Proceedings of the IRE, September, Vol. 40 (9), 1098–1101. doi:10.1109/JRPROC.1952.273898. Huron, D. 2006. Sweet Anticipation: Music and the Psychology of Expectation. Cambridge, MA: MIT Press. Koffka, K. 1935. Principles of Gestalt Psychology. New York: Harcourt Brace. Kolmogorov, A. N. 1965. Three Approaches to the Quantitative Definition of Information. Problems of Information Transmission 1 (1): 1–7. Leeuwenberg, E. L. J. 1971. A Perceptual Coding Language for Visual and Auditory Patterns. American Journal of Psychology 84 (3): 307–349. Lerdahl, F., and R. Jackendoff. 1983. A Generative Theory of Tonal Music. Cambridge, MA: MIT Press. Li, M., and P. M. B. Vitányi. 2008. An Introduction to Kolmogorov Complexity and Its Applications. 3rd ed. New York: Springer. Louboutin, C., and D. Meredith. 2016. Using General-Purpose Compression Algorithms for Music Analysis. Journal of New Music Research 45 (1): 1–16. Martin, J. G. 1972. Rhythmic (Hierarchical) versus Serial Structure in Speech and Other Behavior. Psychological Review 79 (6): 487–509. Meredith, D. 1996. The Logical Structure of an Algorithmic Theory of Tonal Music. Unpublished thesis. http://www.titanmusic.com/papers/public/thesis1996.pdf. Accessed May 15, 2017. Meredith, D. 2006. The "ps13" Pitch Spelling Algorithm. Journal of New Music Research 35 (2): 121–159. Meredith, D. 2007. Computing Pitch Names in Tonal Music: A Comparative Analysis of Pitch Spelling Algorithms. PhD thesis, University of Oxford. Meredith, D. 2012a. Music Analysis and Kolmogorov Complexity. In Proceedings of the 19th Colloquio d'Informatica Musicale (XIX CIM), 96–102. Trieste, Italy, 21–24 November. Available online at http://cim.lim.di.unimi.it/2012_CIM_XIX_Atti.pdf. Meredith, D. 2012b. A Geometric Language for Representing Structure in Polyphonic Music. In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR 2012), 133–138. Porto, Portugal: International Society for Music Information Retrieval. Meredith, D. 2012c. A Compression-Based Model of Musical Learning. DMRN+7: Digital Music Research Network One-day Workshop 2012, December 18, Queen Mary University of London. Meredith, D. 2013a. Analysis by Compression: Automatic Generation of Compact Geometric Encodings of Musical Objects. In The Music Encoding Conference 2013, May 22–24, 41–53. Mainz, Germany: Mainz Academy for Literature and Sciences. Meredith, D. 2013b. COSIATEC and SIATECCompress: Pattern Discovery by Geometric Compression. In Music Information Retrieval Evaluation Exchange (Competition on "Discovery of Repeated Themes & Sections") (MIREX). Curitiba, Brazil. https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf. Meredith, D. 2015. Music Analysis and Point-Set Compression. Journal of New Music Research 44 (3): 245–270.

Meredith, D. 2016. Analysing Music with Point-Set Compression Algorithms. In Computational Music Analysis, edited by D. Meredith, 335–366. Cham, Switzerland: Springer. Meredith, D., K. Lemström, and G. A. Wiggins. 2002. Algorithms for Discovering Repeated Patterns in Multidimensional Representations of Polyphonic Music. Journal of New Music Research 31 (4): 321–345. Meredith, D., K. Lemström, and G. A. Wiggins. 2003. Algorithms for Discovering Repeated Patterns in Multidimensional Representations of Polyphonic Music. Proceedings of the Cambridge Music Colloquium. University of Cambridge. Meyer, L. B. 1956. Emotion and Meaning in Music. Chicago: Chicago University Press. Pearce, M., and G. A. Wiggins. 2012. Auditory Expectation: The Information Dynamics of Music Perception and Cognition. Topics in Cognitive Science 4: 625–652. Povel, D.-J., and P. Essens. 1985. Perception of Temporal Patterns. Music Perception 2 (4): 411–440. Restle, F. 1970. Theory of Serial Pattern Learning: Structural Trees. Psychological Review 77 (6): 481–495. Rissanen, J. 1978. Modeling by Shortest Data Description. Automatica 14: 465–471. Seward, J. 2010. bzip2 version 1.0.6, released 20 September 2010. http://www.bzip.org. Accessed April 19, 2014. Shannon, C. E. 1948a. A Mathematical Theory of Communication. Bell System Technical Journal 27 (3): 379–423. Shannon, C. E. 1948b. A Mathematical Theory of Communication. Bell System Technical Journal 27 (4): 623–656. Simon, H. A. 1972. Complexity and the Representation of Patterned Sequences of Symbols. Psychological Review 79 (5): 369–382. Simon, H. A., and R. K. Sumner. 1968. Pattern in Music. In Formal Representation of Human Judgment, edited by B. Kleinmuntz. New York: Wiley. Simon, H. A., and R. K. Sumner. 1993. Pattern in Music. In Machine Models of Music, edited by S. M. Schwanauer and D. A. Levitt, 83–110. Cambridge, MA: MIT Press. Solomonoff, R. J. 1964a. A Formal Theory of Inductive Inference, Part I. Information and Control 7 (1): 1–22. Solomonoff, R. J. 1964b. A Formal Theory of Inductive Inference, Part II. Information and Control 7 (2): 224–254. Temperley, D. 2001. The Cognition of Basic Musical Structures. Cambridge, MA: MIT Press. Temperley, D. 2004. An Evaluation System for Metrical Models. Computer Music Journal 28 (3): 28–44. Temperley, D. 2007. Music and Probability. Cambridge, MA: MIT Press. Tononi, G., and C. Cirelli. 2014. Sleep and the Price of Plasticity: From Synaptic and Cellular Homeostasis to Memory Consolidation and Integration. Neuron 81 (1): 12–34. Vereshchagin, N. K., and P. M. B. Vitányi. 2004. Kolmogorov's Structure Functions and Model Selection. IEEE Transactions on Information Theory 50 (12): 3265–3290. Vitányi, P. M. B., and M. Li. 2000. Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity. IEEE Transactions on Information Theory 46 (2): 446–464. Ziv, J., and A. Lempel. 1977. A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory 23 (3): 337–343. Ziv, J., and A. Lempel. 1978. Compression of Individual Sequences via Variable-Rate Coding. IEEE Transactions on Information Theory 24 (5): 530–536.

chapter 9

Bioacoustics: Imaging and Imagining the Animal World
Mickey Vallee

Introduction

About 100 kilometers north of the Alberta Oil Sands, hidden amid the burned and twisted shards of lumber, netted over freshly growing grass and sprouting pine, a common nighthawk nests on the ground, unseen by our eyes despite our attempts. Its own speckled pattern mingling with the black, ash, and tan of the land, the nighthawk sits unseen and unheard until one of the biologists whispers to the rest of us, "got it." I can see it: its form emerges from its surroundings under my own eyes like an image that grows from a magic eye test, an organism whose home only a year ago was engulfed in a 700,000-hectare forest fire. It lies there like a taxidermy prop, but breathing rapidly, seemingly unaware that we see it, its solid black marble eyes glistening with the silence of a life full of wait. It seems designed for this terrain, and even as I see it, I cannot say in all comfort that I can fix it under my gaze—my vision cannot hold it in all certainty, which the nighthawk uses to its advantage when it explodes from the ground where it nests (when it senses that its young are under threat, the common nighthawk produces a loud "wing-clap" as it arcs dramatically through the air away from its makeshift nest—they do not nest in trees). This one lands in front of us, clumsily, with its plumage puffed out and its wing bent backward, meters from its nest, in an attempt to distract us from its young; one of the researchers slowly positions his iPhone overtop two chicks left on the ground and takes a picture, and we leave hastily to let the mother return to her pair. The common nighthawk is difficult to sight: it is nocturnal, it blends with its environment, it is notoriously elusive, and it is relatively quiet save for a nasal peent! in flight, and a sonic wing-clap as it dives (Viel 2014). Sighting nighthawks, especially while they are nesting, is painstaking work, which is why biologists are turning increasingly to bioacoustics technologies for the purposes of identification and location exercises: an

animal's sonic emissions serve as a reliable route of access to their location and their patterns of behavior (Laiolo 2010). But what to do with these sonic emissions, and how these emissions play into the imagination of science, is my research focus in this chapter. In the context of the biological sciences, researchers who use bioacoustics are interested in animals' sounds in their ecological contexts and what those sounds might indicate regarding the security of biodiversity and concerns over ecological depletion (this context-based program of research is what some call "ecoacoustics"; see Sueur and Farina 2015). Bioacoustics researchers use a variety of sound equipment to gather and analyze data: durable autonomous recording units (ARUs) can track years of information from within one location (Hutto and Stutzman 2009); backpack microphones strapped to animals' backs will track their sonic patterns as they move through space (Gill et al. 2016); data are uploaded onto "listeners" that align the sounds with their appropriate species (Schröder et al. 2012); and such results are uploaded for international research centers and for international research teams. This last point about the digital community of nighthawks is an example of a transacoustic community. Barry Truax defines an acoustic community as an "information rich" system that uses "acoustic cues and signals" that play a "significant role in defining the community spatially, temporally in terms of daily and seasonal cycles, as well as socially and culturally in terms of shared activities, rituals, and dominant institutions" (2001, 66). But here, a transacoustic community transcends the immediacy of place, transgresses the boundaries of immediate community, transforms data into international research centers, transcends the visual with auditory analysis that has a better and higher definition, and transposes from the audible into the visible. Because the sharing is to access signs of population depletion and biodiversity loss, imagination is a scientific tool for intervening in avoidable and undesirable futures. Indeed, I am not so much interested in sound here as I am in sounding as a research method. Thus, in being interested in how researchers are implicated in the infrastructures they spontaneously design, I work toward inverting that infrastructure, with an eye to the argument that such encounters are almost entirely reliant on a specific form of imagination where the image of sound overrides the evanescence that is so often ascribed to it. Throughout the chapter, I will attempt to "open the blackbox" of bioacoustics, by exploring the notion that contemporary bioacoustics encourages hearing without listening: that is, when emerging sound technologies are capable of detecting small variations in sound, they register at a much higher accuracy than does human listening; the scientists involved in this research must develop a technical mastery at species identification, but one that is visually instead of audibly grounded. Characteristic of other sound-based research units, bioacoustics researchers use sound not to understand the nature of the sonic but instead as a means to find palpable solutions to pressing social and environmental problems using the sonic as a mode for imaging. Bioacoustics researchers are not intrigued by sound as an object so much as a method.
Since sound technologies and their storage devices have become (1) digitized and (2) automated, they are capable of capturing the sounds of global populations in real time.

Sound has become an essential methodological device for identifying species, as well as tracking the polyphonic and polyrhythmic complexities of the various landscapes that change across ecosystems. The open conversation between disciplines and with the public requires a more flexible and porous usage and definition of emerging sound technologies, intended to educate members of the public in assisting research projects. If Henry David Thoreau celebrated the "warbling of the birds ushering in the day" (1885, 35), and this is certainly not an antiquated attitude toward birdsong today, researchers today are more accepting of the fleeting nature of sound as "arrangements of charged particles in the semiconductive materials of solid state 'flash' memory, or the magnetic surfaces of hard drives, tapes, and minidiscs" (Gallagher 2015a, 569). "Common practices include," the geographer and sound recordist Michael Gallagher writes elsewhere, "making field recordings, including the transduction of inaudible vibrations using devices such as hydrophones and contact microphones; making compositions from field recordings, and distributing these via CDs, MP3s, vinyl, radio or online platforms such as weblogs, digital audio maps and podcasts; site-specific performances and installations; and audio walks designed for listening on portable devices whilst moving through a particular environment" (2015b, 469). To elucidate the specific complexity of the transacoustic community proposed, I aim in this chapter to clarify the general complexity that sound retains throughout creative imaging processes. Sounding, I argue, has the potential of producing interdisciplinary and theoretically innovative knowledge that seeks new virtual spatializations of the earth. I proceed with a description of the historical context through which bioacoustics became a research focus for those in the biological sciences. I am especially interested in the move from individual specimens to whole species in their ecological contexts. I conclude with a brief discussion of the transacoustic community, borrowing from Jakob Johann Baron von Uexküll's notion of the Umwelt; Uexküll explored the making of worlds from a theoretical biological perspective, his ideas about organism self-preservation deriving from a decidedly antimechanistic perspective that asks us to attend to an organism's inner and external sense of events as the habituation of the codes and information an organism uses to inhabit an environment.

Sounding Animals

There is a long-standing interest in transcribing animals' sounds: Athanasius Kircher's Musurgia Universalis ([1650] 1970) imagines connections between humanity and nature by designating certain animal songs to the cosmos, such as a variety of birds as well as the sloth (which exhibits a six-interval vocal range). Ludwig van Beethoven, on his well-known wilderness walks, incorporated birdsong into his compositions (the most obvious being the sixth symphony's "cuckoo calls" [see Baptista and Keister 2005]). Olivier Messiaen transcribed birdsong and manipulated its speed and range in order to capture an otherwise imperceptible dynamic (Hill 2007). David Rothenberg's (2008)

182   mickey vallee well-known performances for and with a variety of species, live and recorded, continues this tradition of linking aesthetic, sound, sense, and imagination (and, in Rothenberg’s case, collaboration). Most of these and other examples have relied on an aesthetic of listening, which the nineteenth-century music critic Paul Scudo once referred to as “the divine language of sentiment and imagination” (Scudo, cited in Johnson 1995, 272). But in this chapter I am interested in the type of imagination belonging to the biological sciences. Before the mid-twentieth century, researchers in the biological sciences centralized listening in their data collection and analyses, transcribing the sounds of animals for the purposes of discovering keys to biodiversity, mating behavior, and the anticipation of biological change. Because recording devices were too cumbersome, one had to rely on having a musical ear to transcribe sound in the form of onomatopoeia. Unconvinced by this method, Albert R. Brand at Cornell University’s Ornithology Research Lab had attempted to capture bird song with “sound film” (used otherwise for Hollywood “talkies”), which captured both the image and the sonic emissions of birds. This he considered a more objective means of capturing sound. Brand had written: [R]arely do two observers hear the same song in exactly the same way. The song is not noticeably different when produced by varying members of the species, but by the time the sound waves have affected the listeners’ hearing apparatus, and have been transferred by the nerves to the brain, and interpreted by that organ, it has created an entirely different sensation and impression on each individual listener.  (1937, 14)

Although they were grounded in visual images and movement, Brand’s films were still keen on listening in real time to the sounds of animals. However, by the mid-twentieth century, the spectrogram was introduced to ornithologists to visualize sonic information. Spectrograms had a significant impact on the democratization of access to sonic data and analysis; scientists needed no more musical ear but rather a technical knowhow. Spectrograms made a direct contribution to the democratization of sound analysis, data collection, and contributions to science throughout the late twentieth century. Today the spectrogram image (and its variations) is the most common image of an animal’s utterance, an image that is inseparable from a new kind of work that would free up the scientist from the burden of listening and instead place the attention on, first, placing the equipment and, then, using it to capture the animals’ sounds. The researcher, now liberated from their own ear, worked with the technology that could pick up the transmission of information. Better yet, the spectrogram was equipped with a capacity for accurate visualization, given that the vibrations from the needle on the machine would be etched into a paper surface. The spectrogram caught more than the sound of the organism, but rather the whole situation within which it was situated; this transcription of the atmosphere, of its world, allowed researchers to visualize the polyrhythmic complexities of its environment, including its communications with other species. The spectrogram demanded a unique, visually grounded, art of its own: calligraphy, traced on paper, meticulously teased out the upper portion of the recorded sound so as to discover an arc represented through space (frequency) and time (duration). Birdsong would no longer be described using words, or onomatopoeia for that matter, but had to

bioacoustics: imaging and imagining the animal world   183 have a direct inscription of the ecology in which the organism was situated onto pages, using ink and paper. These were less contiguities, these points of contact, any kind of mediation and transduction, than they were direct feelings, motivations, and movements onto the page in order to trace the otherwise invisible (but real) contours of the bodies responsible for producing them. Calligraphy thus demanded a particular visual detail of sonic information that reduced the need for the humans involved in producing them to listen attentively and instead to trace the contours of a sonic inscription. The spectrogram was adept at picking up certain important information: the environment and the ecology in which the animal was situated, whereas the onomatopoeia transcriptions isolated song from context. What became more important, then, was less the taxonomy of the animal than what the animals’ sounds could tell researchers about their surroundings and their environments: how they were situated within a community or a sound ecology. Bioacoustics researchers today are interested almost principally in method, technique, and representation, attributed to the rapidly expanding datasets they have access to. While some use multiple ARUs to triangulate the position of organisms and their return to particular locations, others use sound to measure the amount of masking caused by anthrophonic intrusion (see Berkaak, volume 1, chapter 15, for debates surrounding cultural heritage). A vast array of representational methods and knowledge syntheses are available to those interested in bioacoustics, moving well beyond the “manipulation and playback” model of acoustic ecology, or the GIS (Geographic information system)-based representations of landscape ecology. Bioacoustics research is also a response to the uncertainty and anxiety around biodiversity loss, on a global scale, and the role that anthrophonic interference is having on the balance of ecosystems. Some research teams use “noise mapping” by reading city decibel levels corresponding to a color-coded legend that identify noise “hot-spots” (Hawkins 2011). With a supposed 83 percent of the land in the United States being about two-thirds of a mile from a road, conservation officers team up with acousticians and sound ecologists to reduce the presence of helicopters, planes, and other means of transporting especially tourists into natural landscapes (Powers 2016). Such high levels of noise have inspired Gordon Hempton to locate the “quietest square inch on earth” in (ironically enough) the United States that, he claims, has no anthrophonic interference whatsoever for up to 20 minutes at a time (Berger 2015). Such searches for quietude against the din of mobile humanity and expanding urbanization have also resulted in conservationist measures to select habitats and in the use of sonic technologies as a geoengineering strategy. These researchers have taken to using sonic technologies to coordinate new and better soundscapes by masking the anthrophonic interference with loudspeakers planted in natural settings that are intended to “give back” the soundscape (Berger 2015). Others use multiple recording technologies to triangulate the exact location of species so as to expedite conservationist interventions for those who are deserting their natural habitat (Donaldson  2016). 
(Triangulation is thus the creation of a virtual space that uses sound, of tracing the contours of what a place might come to represent.) Playback (which will be expanded later) is used for giving voice back to place.

184   mickey vallee Conservationist interventions, if they are successful, must transcend local interventions and projects, which is the motivation behind such films as Global Soundscapes: Mission to Control the Earth, an Imax feature that attempts to identify every sound in the world for a massive online repository of the earth’s sounds. Such interventions are intended to elucidate the making of a place through sound, but in such a manner that transcends any one such location, the actualization alluded to above; longitudinal research comes to life through sonification (Vartan 2016). This is research that is geared toward finding the changes to “whole, global populations” of species, places, ecosystems, and the ­biosphere, and to elucidating the necessity for conservationist intervention for the survival of the human race (Torino 2015). Certainly, if bioacoustics researchers are interested in whole populations over long periods of time, it is becoming increasingly necessary to use technologies that listen to and recognize patterns, such as the declining sonic signals emitted from organisms. One research team’s 35,000 samples from recordings in Tippecanoe County in Indiana have indicated that places with more anthrophonic interference are drowned by sonic drones that mask the information emitted by organisms which is fundamental to reproduction; that is, while organisms emit signals for the purposes of biological reproduction, the anthrophonic is an accidental byproduct of a machine in action, which is, in theory at least, meaningless beyond the action of the machine (Pijanowski et al. 2011). Interested in studying “the whole spectrum of acoustical energy in a landscape” (Hall 2016), soundscape ecologists in Germany have used 300 microphones to record one area; the microphones are timed to record one minute of sound in the environment every ten minutes, after which the data is processed by computer using over 120 terabytes of storage. On the other side of the unimaginably immense, there is the local involvement of citizen scientists and volunteer biologists, who contribute to the community-building aspects of global research initiatives. While the data collected and analyzed by these volunteers is sometimes perceived as borderline spurious (Cohn 2008), the efforts for community building and for live feedback on scientific methodologies is invaluable. In Canada, students and community members are working with the University of British Columbia (UBC) to sonically monitor tankers in the waters around Canada’s proposed west-bound pipeline (see Mazumder 2016, 5). But beyond the community-oriented or conservationist mandated studies of global populations, there is another set of practices that uses recorded sound, as well as the practice of recording sound, in new ways that compress, expand, and challenge the representations of soundscapes into new experimental cartographies.

Recoding the Recording—Catching Nighthawks

The bioacoustics researchers with whom I worked in Northern Alberta erected mist-nets deep in the forest, in some places only accessible by bike or all-terrain vehicle, laced with ghetto blasters emitting nighthawk calls in order to bait and capture them in flight. Once

captured, the nighthawks were placed into small aluminum tubes and returned to the research station on the gate of their pickup truck, were measured, and equipped with a small backpack microphone and a GPS device; the data that is subsequently recorded onto the microphone is uploaded to international research networks and measures the "sound-event" of the organism (its heartbeat, its wing pace, its calls, etc.) against the "sound-scene" of its habitat (the geophonic, anthrophonic, and biophonic data that informs the backdrop against which the sound-events unfold). It was not necessarily the results of this research that interested me so much as the infrastructural labor that went into the capture of data. This infrastructure is conducive with the "essence of mediations" that Bruno Latour describes as crossing the line between signs and things. Latour writes:

Certainly, the sonic imagination around nighthawk bioacoustics (and bioacoustics generally) is not limited to only those devices that are restricted to the audible, but pertains also to those that are part of a larger assembly of devices intended to capture the energy and emissions of organisms in their environment. With the assemblage of capture no longer restricted to the human capacity to listen, the efficacy and pragmatic efficiency of capture turned the aural-centric away from the qualities perceived through human hearing. I will now briefly describe two nodes in the network of nighthawk capture: the ghetto blaster and the mist-net. First, the ghetto blaster plays back the sound of a nighthawk in order to bait a nighthawk. A recording’s playback is partly neutral (it simply plays back that which it once inscribed, after all, in this case a stock recording of a common nighthawk’s peent!), but playback is also provocative (it always happens in new arrangements, in new contexts, for new audiences, in new moments, new times, new places). The flying nighthawk situates the presence of the phantom nighthawk by diving toward its sound. This reinforces Michel Chion’s notion of playback, in which “there is something before us whose entire effort is to attach his face and body to the voice we hear” (1999, 156). Playback is about producing symmetries between subject and object, which is to say that recording’s code is re-placed along with the ecosystem in which it is re-placed. Playback changes the place in which

186   mickey vallee it is situated, it attaches (virtually) bodies to sounds, but in new assemblages (the nighthawk bird is a nighthawk-net-database). Playback does not happen without such a change. In playback, the event of recording is transformed into a new event that involves the event of a bird plummeting into what the biological community refers to as a mist-net. The silent partner of the ghetto blaster, but one no less important in the recording apparatus, is the mist-net. The mist-net emerged along with the spectrogram and was considered one of the great inventions for ornithologists. Where once orni­ thologists used bait to trap individual specimens, by 1947 they were placing mist-nests around the periphery of their observation areas. Much like the microphones that capture everything that they come into contact with, the mist-nets were condoned for capturing everything that would attempt to pass through them, which allowed a more realistic impression of the numbers of specimens occupying a zone, and which led to the rise of quantitative measurements over qualitative descriptions of species. And while today many bioacoustics researchers consider the ARU the best practice for obtaining sonic data, they still use the mist-net to capture and track individual specimens (as part of a group) and to receive a more high-definition image of a population as it occupies a territory. The differences between qualitative and quantitative research here are not worthy to parse. Instead, the entire process counts as the “labor of methods” approach, which is currently gaining popularity in interdisciplinary research. As Michael Mair and colleagues write of the fabled divisions that keep quantitative and qualitative research methods distant from one another, such a practice is best avoided because: labelling research practices as qualitative or quantitative (or indeed “mixed”) may well have some uses (as badges of membership, for instance), but the labels themselves are not specifically descriptive of those practices and should not be treated as such. Knowing whether a piece of research is qualitative or quantitative, interpretive or calculative . . . is much less important for characterizing that research than understanding the specific ways in which it makes “the social structures of everyday activities observable”—that is, how it puts society on display.  (2015, 54)

After one of the researchers with whom I worked strapped a backpack microphone between the wings of a captured nighthawk, she released it to discover that it dropped like a stone; the microphone was not properly installed and was causing a disequi­ librium in the nighthawk’s capacity for flight. Swiftly she moved to its writhing body on the ground to remove the device, which she did with ease. As the microphones take a great deal of labor and care to install, it comes as a disappointment when they do not work. Sound is connected to vibration and the kind of ethnographic fieldwork where sound is the goal, but when sound is the method, the goal in this case being conservation, then what does that tell us about the philosophy of soundworlds and the worlds between sounds? This requires a turn to the transacoustic community as a way of imagining a “society on display.”


The Transacoustic Community The common nighthawk, the bioacoustics researchers, the technologies through which they are measured and made sense of, globally and locally, constitute the image of a transacoustic community. The transacoustic community is bound by an elaborate recording/playback apparatus that is not necessarily reducible to the listenable but expands more generally into recording as a technical and cultural set of images. The transacoustic community is itself an image central to contemporary debates and discussions around multispecies encounters: simply, that entities open through their surfaces onto other entities (these openings are precisely the point of interrogation for bioacoustics researchers). There are variegated routes of access to such a conclusion (that entities have edges that open onto other entities), a few of which I have set out to explore in this chapter. Of course, there are many ways of doing bioacoustics research, but all these ways converge on the creation, maintenance, and breaking through of an entity’s contained space through its sonic emissions; technological assemblages belonging to bioacoustics researchers are intended to create new images and imaginations for how these breaches are done. Entities sound. And insofar as they sound, they make up their worlds. But since entities sound out to other entities, it is insufficient to claim that the worlds to which these entities belong are contained. Thus, the notion that a world is not self-contained, but rather porous and protean, makes it necessary to interrogate the underlying function of worlds as open, such as Uexküll’s philosophy of the Umwelt, which translates literally as “world around” (Brentari 2015, 75): it describes a connecting point between an organism’s interior and exterior sense of events, and describes the habituation of the codes and information an organism uses to inhabit an environment (von Uexküll [1934] 2010, 126–132). Elizabeth Grosz has expanded on this position, explaining that the human world finds its equivalent in the professional life of the architect, who brings together things at the demarcation of boundaries, heterogeneous expressions within a space that are given meaning through those very heterogeneous expressions (2008, 48). Grosz accounts for the famous tick that appears early in Uexküll’s book, A Foray into the Worlds of Animals and Humans, which he uses to go against the physiological approach to organisms as the sum of independent reflexes, arguing instead that ticks are embedded in affective worlds. Ticks use the smell of chemicals, the heat of the sun, and the flesh of the mammal to complete their worlds, once they have conjoined with another organism, such as the mammal; their world is defined by their connection to another’s world. The tick’s world is thus complete when it attaches to the edge of another world (the world of the unaware mammal, for instance, whose own totality the tick is equally unaware of). While the organism’s perceptive world is an inherited, species-specific conscious perception of those objects an organism perceives as outside of itself (such as the sound-event mentioned above, the peent! of the nighthawk), its operative world completes the organism’s immersion in an environment by merging with it (such as the sound-scene, the

188   mickey vallee flutter of moth wings the nighthawk dives into to consume) (see Brentari 2015, 99). At stake here is thus, along with the maintenance of worlds as highly dependent on the membrane of their milieus, the indeterminate nature of worlds as they forever open onto and into other worlds. Elizabeth Grosz writes that such worlds are musical (a common audio-based resource for those constructing idealized collective experiences): the music of nature is not composed by living organisms, a kind of anthropomorphic projection onto animals of a uniquely human form of creativity; rather, it is the Umwelten, highly specifically divided up milieu fragments that play the organism. The organism is equipped by its organs to play precisely the tune its milieu has composed for it, like an instrument playing in a larger orchestra. Each living thing, including the human, is a melodic line of development, a movement of counterpoint, in a symphony composed of larger and more complex movements provided by its objects, the qualities that its world illuminates or sounds off for it. Both the organism and its Umwelt taken together are the units of survival. Each organism is a musician completely taken over by its tune, an instrument, ironically, only of a larger performance in which it is only one role, one voice or melody.  (2008, 43)

At what point does a boundary turn into a breaking point? Or, at what point does the edge of one boundary merge into the edge of another? Uexküll establishes his position against physiological accounts that would see organism and interorganism behaviors as effects of stimuli reactions between different parts of an organism. This way of perceiving organisms was isolationist, against environment, and against the notion that an organism possessed agency in the construction of its world, and had to have agency in order to converge with the edge of another world. But a cycle is never on its own; it is with other cycles, and with others, significant and otherwise. Therefore, it is not the world that the organism creates but its coconstructive capacity for going into other worlds. It is for every melodic contour an edge of another’s world. It is sound that accounts for the breakthrough between the edges of worlds. Sound, in the context of this chapter, is the individuation of energies from separate worlds in transduction, which involves at once the assemblages of vocal tissue and environmental biotic and abiotic movements, including all other biophonic, geophonic, and anthrophonic crystallizations. When software has been programmed to detect a variant in sound, a transduction, it is registered at a much higher accuracy than with organic listening and pattern identification (though researchers often test-listen samples to assure accuracy), which introduces a technical or mechanical, in any case inorganic, listening into the process of exploration and discovery. The individuation of longitudinal studies, such as those that are multisited and multimicrophoned, which record more sound than is possible to listen to organically, and which is never stable but always in flux, is the crystallization of the node in a transacoustic community. Imagination, to imagine, to image; these variations on a term point to the slippage (linguistically and otherwise) of image; there are countless philosophical explorations of image and imagination, but a question of how images come to be made through sound is quite another matter, and one that is often grounded in routine empirics

bioacoustics: imaging and imagining the animal world   189 and the technics of observation. To return to the point of transacoustic communities: bioacoustics researchers are less interested in sound as an object of analysis itself than they are in using sound (and expanding definitions of sound) for high-definition insights into environmental and social problems. As such, they open up sound to a cross-modality of senses and transductions. We need to further think through sound as an emerging complexity with expanding boundaries, like the world composed of the ghetto blaster and the mist-net, which releases us from thinking of a recording as a pristine reproduction of sound “as it is.” Instead we think here through the image, imaging, and imagination of sound, as it affords the heard and the unheard; within and beyond, below and above normal human hearing, the capacity to live in resonance between objects and entities like ghetto blasters, mist-nets, nighthawks, and the global research teams that they converge on. There is more at stake than having the potential to capture more sound at a higher definition with a wider grasp at a longer rate, as the opening example of the common nighthawk suggests, to the extent we imagine place as a boundless mediation through and resonance between technological interfaces—this is a perspective that privileges the aesthetic and technological intervention over the usefulness or pragmatics of learning about biodiversity preservation. Instead, we might consider how new digital technologies, including the massive amounts of data storage and the live-response possibilities of constructing transacoustic communities, reveal how the image of sound (in this case its transduction of energy into forms of information for the purposes of research) is not necessarily obligated to what we might consider listenable.

References Baptista, L. F., and R. A. Keister. 2005. Why Birdsong Is Sometimes like Music. Perspectives in Biology and Medicine 48 (3): 426–443. Berger, E. 2015. Welcome to the Quietest Square Inch in the U.S.  Outside. Outside Online. https://www.outsideonline.com/2000721/welcome-quietest-square-inch-us. Accessed September 30, 2017. Brand, A. R. 1937. Why Bird Song Cannot Be Described Adequately. Wilson Bulletin 49 (1): 11–14. Brentari, C. 2015. Jakob von Uexküll: The Discovery of the Umwelt between Biosemiotics and Theoretical Biology. New York: Springer. Chion, M. 1999. The Voice in Cinema. New York: Columbia University Press. Cohn, J.  P. 2008. Citizen Science: Can Volunteers Do Real Research? AIBS Bulletin 58 (3): 192–197. Donaldson, A. 2016. National Network of Acoustic Recorders Proposed to Eavesdrop on Australian Ecosystems. ABC News. http://www.abc.net.au/news/2016-07-11/soundscapeecology-could-track-environmental-changes/7587354. Accessed September 30, 2017. Gallagher, M. 2015a. Field Recording and the Sounding of Spaces. Environment and Planning D: Society and Space 33: 560–576. Gallagher, M. 2015b. Sounding Ruins: Reflections on the Production of an “Audio Drift.” Cultural Geographies 22 (3): 467–485. Gill, L.  F., P.  B.  D’Amelio, N.  M.  Adreani, H.  Sagunsky, M.  C.  Gahr, and A.  Maat. 2016. A Minimum-Impact, Flexible Tool to Study Vocal Communication of Small Animals with Precise Individual-Level Resolution. Methods in Ecology and Evolution 7 (11): 1349–1358.

Grosz, E. 2008. Chaos, Territory, Art: Deleuze and the Framing of the Earth. New York: Columbia University Press. Hall, M. 2016. Soundscape Ecology: Eavesdropping on Nature. Deutsche Welle (DW). http://www.dw.com/en/soundscape-ecology-eavesdropping-on-nature/a-19304871. Accessed March 12, 2017. Hawkins, D. 2011. "Soundscape Ecology": The New Science Helping Identify Ecosystems at Risk. Ecologist: Setting the Environmental Agenda since 1970. http://www.theecologist.org/investigations/science_and_technology/1171165/soundscape_ecology_the_new_science_helping_identify_ecosystems_at_risk.html. Accessed September 30, 2017. Hill, P. 2007. Olivier Messiaen: Oiseaux exotiques. Farnham: Ashgate. Hutto, R. L., and R. J. Stutzman. 2009. Humans versus Autonomous Recording Units: A Comparison of Point-Count Results. Journal of Field Ornithology 80 (4): 387–398. Johnson, J. J. 1995. Listening in Paris: A Cultural History. Berkeley: University of California Press. Kircher, A. (1650) 1970. Musurgia Universalis: sive Ars Magna, Consoni et Dissoni. Hildesheim and New York: Olms. Laiolo, P. 2010. The Emerging Significance of Bioacoustics in Animal Species Conservation. Biological Conservation 143 (7): 1635–1645. Latour, B. 1999. Pandora's Hope: Essays on the Reality of Science Studies. Cambridge, MA: Harvard University Press. Mair, M., C. Greiffenhagen, and W. W. Sharrock. 2015. Statistical Practice: Putting Society on Display. Theory, Culture and Society 33 (3): 51–77. Mazumder, A. 2016. Pacific North West LNG Project: A Review and Assessment of the Project Plans and Their Potential Impacts on Marine Fish and Fish Habitat in the Skeena Estuary. Environmental Assessment Report, Government of Canada. Minister of Environment and Climate Change. Pijanowski, B. C., L. J. Villanueva-Rivera, S. L. Dumyahn, A. Farina, B. L. Krause, B. M. Napoletano, et al. 2011. Soundscape Ecology: The Science of Sound in the Landscape. BioScience 61 (3): 203–216. Powers, A. 2016. Preserving the Quietest Places. The California Sunday Magazine. https://story.californiasunday.com/quietest-places-on-earth. Accessed September 30, 2017. Rothenberg, D. 2008. Thousand-Mile Song: Whale Music in a Sea of Sound. New York: Basic Books. Schröder, M., E. Bevacqua, R. Cowie, F. Eyben, H. Gunes, D. Heylen, et al. 2012. Building Autonomous Sensitive Artificial Listeners. IEEE Transactions on Affective Computing 3 (2): 165–183. Sueur, J., and A. Farina. 2015. Ecoacoustics: The Ecological Investigation and Interpretation of Environmental Sound. Biosemiotics 8 (3): 493–502. Thoreau, D. 1885. The Writings of Henry David Thoreau. Vol. 6. Boston: Houghton Mifflin. Torino, L. 2015. You Can Actually Hear the Climate Changing. Outside. https://www.outsideonline.com/2035701/you-can-actually-hear-climate-changing. Accessed September 30, 2017. Truax, B. 2001. Acoustic Communication. Vol. 1. Santa Barbara: Greenwood. Vartan, S. 2016. We're Changing the Way the World Sounds: Noise Impacts Ecosystems in More Ways than You Might Think. Mother Nature Network. http://www.mnn.com/earth-matters/wilderness-resources/blogs/we-are-changing-way-world-sounds. Accessed September 30, 2017. Viel, J. M. 2014. Habitat Preferences of the Common Nighthawk (Chordeiles Minor) in Cities and Villages in Southeastern Wisconsin. PhD thesis, University of Wisconsin-Milwaukee. von Uexküll, J. J. Baron. (1934) 2010. A Foray into the Worlds of Animals and Humans, with A Theory of Meaning. Translated by J. D. O'Neil. London: University of Minnesota Press.

chapter 10

Musical Notation as the Externalization of Imagined, Complex Sound

Henrik Sinding-Larsen

Introduction

At one, concrete level, this chapter is about the innovation of musical notation and how this tool for the description of sounds affected the way new music was imagined, performed, and socially organized. But there is also a second, more theoretical aim: to see how this case of imagining and describing sonic qualities and patterns can inform, and be informed by, a general theory on the emergence of complexity as a result of new tools for storing, transmitting, and processing information. A key concept in my work with these topics is externalization. The theoretical aim of the chapter is to explore the insights we may gain by analyzing musical notation as a case of externalization of sound, or of patterns in sound. A subtheme within this endeavor concerns imagination and how imagination can be supported by externalizations.

Obviously, there are limits to how much a chapter of this size can cover of the history of musical notation and its consequences. Thus, only selected contemporary and historical examples will be dealt with, and only to the extent that they serve the wider theoretical aim. Because of this composite aim, the text moves between quite different levels of empirical detail and theoretical abstraction, and it draws on various academic disciplines. Before digging into the more theoretical and conceptual issues, I will set the scene by describing some cultural events where music was important and where musical notation played quite different roles. The cases are built on concrete events in which I participated, as well as on reflections based on other relevant events and sources. The aim is to highlight differences that are related to musical notation as a tool for the description of sounds.

My background for engaging with these issues is broadly interdisciplinary (including some evolutionary biology and informatics), with an origin in social anthropology and fieldwork among Norwegian fiddlers and their controversies around notation's role in the preservation of a living folk music tradition (Sinding-Larsen 1983, 1991). Having once been a member of the ensemble Kalenda Maya, playing medieval and renaissance music, is another experience relevant to my choice of topic.

A Symphonic Concert

A symphony orchestra is performing a twentieth-century work of classical music. When the conductor marks the pulse and tempo, he mostly follows the score in a fairly regular way. But at times the tempo is "contracted or stretched" with substantial amounts of interpretive freedom, agogics and rubato, whether notated or not. The musicians divide their visual attention between the conductor and the sheet music in front of them. They often play parts with a melodic complexity that makes them very difficult or even impossible to recall from memory. Thus, the written music is not only a support for learning the work in question but an indispensable element of its performance. Each musician concentrates on their part, where every note and interval is played in full compliance with what is written; no improvisation, syncopation, ornamentation, or gliding from one note to the next unless explicitly indicated in the score or demanded by the conductor. The movements of the arms and fingers within the group of violinists could match those of Olympic medalists in synchronized swimming. Only the conductor is in a position to give full attention to the work as a totality.

A performance of a symphony orchestra could be analyzed as a complex sociocultural event or a ritual where society (or a section of it) communicates with itself. The musicians, the conductor, the orchestra as a socioeconomic unit, the audience who paid for their tickets, and the politicians and sponsors who invested in the concert hall and the orchestra could all be seen as actors in a ritual with a "liturgy" perfecting and celebrating a selection of their cultural values: hierarchical, loyal cooperation with a high level of standardization within each group (of instruments); balanced and functional complementarity between groups; individual creativity concentrated at the top level (of composer, conductor, and possibly soloist); and a high level of individual accomplishment (among the musicians manifested in their technical ability to simultaneously synchronize multiple parameters like melody, rhythm, intonation, timbre, and intensity while following the score).

The overall achievement of this, as with most professional orchestras, is impressive. A high level of technical difficulty is an important part of their artistic expression. However, to repeat what has been achieved in the past is not an ideal for the best orchestras. In the aesthetics of Western classical music, as well as in the general modern ethos dominated by the value of "progress," the ideal is to always try to push the limits of what is possible, not least in technical control. In general, this priority increases the focus on avoiding errors. A concert at the highest level could be compared

to trapeze artistry without a safety net: Will someone play or sing out of tune? Will someone miss the timing? Will each note be sufficiently distinct? Will someone play a wrong note? Will the sum of artistic efforts match the expectations of critics? With such premises, it is not surprising that a sense of precariousness and nervousness may be prominent during performance, with a corresponding feeling of relief when the concert ends without flaws.

However, not every dimension of symphonic music is equally complex. If one masked the variations in pitch, intensity, timbre, and orchestration and just listened to the rhythmic aspects of a classical concert, then much of so-called advanced symphonic music would be rather simple, if not boring, judged by the ideals of, for instance, jazz or a well-played traditional fiddle tune intended for dancing. Compared to popular music, the "romantic freedom," so prominent in much of classical music, reflects an inversion of priority between the melodic and rhythmic dimensions of music. In most popular music, a regular pulse and tempo is to a larger extent treated as an imperative premise on top of which the melody, harmony, and other pitch-based effects can unfold. In classical music, in particular in the romantic period, cadences but also other, local melodic events, even a single note, might trigger emotional responses that "demand" more time: time that is not taken from the duration of other notes in the same measure but that results in a net slowing down of the tempo. The tempo is, in these cases, treated as an expressive parameter subordinate to the "needs" of the melody or harmony. The musical "needs" of the melody and harmony can, "with impunity," override the integrity of a regular pulse, both accelerating and retarding it.

Deviating from the regularity of pulse and tempo may occur in many musical genres, including popular music. Generally, it is most easily practiced and achieved by a soloist. To achieve the romantic kind of "freedom" with a large orchestra is generally very difficult without notation-based instruction and a conductor during performance. In popular music, an ideal is often to extensively challenge the main beat through subtle, improvised syncopations and other off-beat, rhythmic effects while adhering even more strictly to the regularity of the overall pulse. This is an element in what is called "groove," a phenomenon that is hard or impossible to capture with musical notation (see Danielsen, this volume, chapter 29). As a contrast to this ideal, it is along the dimensions of pitch (melody, harmony), timbre (instrumentation, orchestral texture), and intensity (the overall, dynamic "narrative") that we find most of the complexity of symphonic music (in addition to the agogics and rubato mentioned earlier).

The aesthetics of classical music is often expressed in disciplined, large-scale, hierarchically organized complexity, most of which is impossible to achieve without musical notation. Notation in this context is, to a large degree, indispensable both for the music's conception (imagination) and its performance. Nevertheless, within these strict frames, set by the composer's notation, uniqueness and creativity are highly valued. Similar values also permeate modern complex society in many other domains. For example, laws and contracts could be thought of as externalizations or notation systems facilitating complex economic and social processes.


A Yoik—Complexity on a Different Scale

To compare a symphonic work with a very different genre, it is fruitful to turn to the traditional yoik, the song tradition of the indigenous Sami reindeer herders in the northern regions of Scandinavia. Yoik has shamanistic origins and was traditionally only performed by a single singer in small settings, without any instruments or at most a hand-held drum played by the singer. The melodic material and intervallic range were very limited but could be extensively repeated—on certain occasions until a state of trance was reached. Within these constraints, there existed a huge variation in subtle qualities of the voice, including countless pitch degrees outside those captured by a diatonic scale and traditional musical notation. Rhythmically, a yoik alternates between a relatively steady pulse and more free rhythms; at any moment it is susceptible to pauses for breathing followed by restarts with no reconnection to the previous pulse (Graff 2007). This organization of time is incompatible with long, elaborate melodic developments. However, the range of possible qualities of the voice in yoik by far exceeds those acceptable for a classically trained voice. The pitch is often mixed with expressive guttural sounds or more relaxed, speech-like qualities, which make the pitches less fixed and "pure" and thus less combinable into polyphonic complexity.

The social and cultural values traditionally communicated through yoik are quite different from those of a symphony orchestra. The traditional Sami society was small scale, and personal relations to both humans and animals constituted an important part of the total amount of social "glue" keeping society integrated. This represents a contrast to the society that produced symphonic music, where written laws and contracts had replaced much of the relational integration that characterized smaller-scale societies. An important function of yoik was to describe or confirm concrete relations. A yoik can be descriptive of persons, animals, and landscapes, not as externalized descriptions of these entities, but by performing the yoik as a kind of "speech act" or "song act" which directly connects the singer with the "described." An important genre of yoik is called person yoiks. The Sami singers insist that a person yoik is not about a person, it is that person. To some extent, a similar logic is operative vis-à-vis animals. A famous yoik about a wolf chasing a reindeer is modeled on the sounds of howling wolves.1 The most important intervals of the yoik's main motif are a fourth and a fifth. In the section of the yoik "describing" the wolf's final attack on the reindeer, the fourth is, in a gliding way, pushed toward the tritone, a particularly dissonant interval known as Diabolus in Musica in medieval times. One possible function of the wolf yoik is to connect with the wolf and, in that way, obtain some magic control over this dangerous predator. Yoiks have been notated by ethnomusicologists with classical musical notation, but only a few of the important elements in yoik are captured by this tool of description.

An example that could reveal other and subtler contrasts to a symphonic concert would be sacred music echoing the congregational chant as it may have sounded and

functioned in the cathedrals and monasteries of medieval Europe at the time just before some of the basic elements of modern musical notation were invented around the eleventh century ce.

A Responsorial Chant in a Mass

The following example consists of impressions from a chant I witnessed during a Roman Catholic mass in 2005 in Kraków. This event is obviously not from a prenotation era, but it had some "historic" qualities that make it useful for comparison to a symphonic concert. As I do not understand Polish, I could concentrate on the musical aspects of the priest's reading. The distance from text reading to song is narrower in these liturgical settings than in general. The chant was responsorial in the sense that the priest sang the verses and the congregation responded with a refrain that contained a relatively intricate melodic line. The refrain was repeated after each of the many verses, and the synchrony of the congregation became rhythmically tighter with each repetition. No organ or large choir dominated the voices of the congregation. The chant was monophonic, unaccompanied song. The church was packed, every seat was occupied, and many were standing. I saw nobody with a hymnbook or other texts in their hands during the chant. Their eyes were focused on the priest or beyond, and their musical response came without hesitation in unison, with firm voices, organically integrated with the rhythm and pitch set by the priest. It was obvious that this congregation had participated in the ritual many times before.

Neither the rhythm of the priest's chant nor the response from the congregation was strictly metric. It was influenced by the rhythm of the words and phrases, like a kind of melodic prayer. Without a dominating organ, choir, hymnbook, conductor, or a melodic style that presupposed a steady beat, the only way to synchronize tightly with one's fellow worshippers was through mutual attentive imitation, enhanced by the many repetitions of verses in this chant as well as repetitions from attended masses in the past. The level of synchronization of time and pitch was never comparable to that of a professional symphonic orchestra or choir with a conductor. But the congregational chant had some other qualities that followed from the particular process whereby its level of synchrony was obtained.

One reason for the strong impression that this responsorial chant made on me was what I experienced as a state of enhanced, mutual, collective attentiveness or awareness of an organic kind. I experienced the synchrony obtained as different from that which follows from the counting of beats and measures or from following the gestures of a conductor. I would describe the difference as the one between marching together and breathing together. We could also articulate the difference as one between a less and a more externalized way of achieving synchrony: synchronization through collectively evolving, mutual attention versus synchronization through individual adjustment to a common, externalized, rhythmic template (drum beat or other audible track of reference).

The suitability of breathing together as a metaphor for the less externalized process of synchronization was corroborated by a story a Norwegian singer of Gregorian chant told me about a workshop he had attended.2 The leader of the workshop, a member of a renowned ensemble of early music, instructed the workshop participants on how to synchronize in the spirit of early music. The goal of his exercise was to make all the participants start to sing in full synchrony without any prior counting or visual cues, in total darkness. The only way to achieve this was to listen to each other's breathing, synchronize the breath, and then start singing. The deeper aim of the exercise was to achieve a relevant state of mutual attentiveness for performing music closer to an oral or—in my terminology—a less externalized tradition.

Of course, many other factors contributed to my different experiences of sounds and music between the mass in Kraków and the symphonic concert. In the mass, there was no separation between performers and audience. Everyone sang the same monophonic song except for the priest, who had a more elaborate textual part. None of the sections in the chant was exceedingly complex. The main values that were celebrated were less about coordinated hierarchy and excellence and more about community and inclusion through a shared practice. Hierarchy and excellence may be valued in ecclesiastic settings too. But in the Dark Ages, before notation and the splendors of Gothic polyphony, the sacred music in churches and monasteries was less about the display of artistry and more about community and participation (Saulnier 2009). This was also reflected in the more modest complexity of the monophonic Gregorian chant. To understand what happened to music between the period of early Gregorian chant and the modern symphony orchestra, it can be useful to dig deeper into the relationship between notation and externalization.

Externalization and Complexity

Humans had imagined and performed music for a very long time without notation. The oldest known musical instrument is purportedly a flute made of mammoth ivory found in a cave in Southern Germany. It has been carbon-dated to between 42,000 and 43,000 bce (Goodall 2013, 6). The oldest rudimentary musical notation is from ancient Mesopotamia (c. 2000 bce), and the oldest efficient notation is from Western Europe around 1000 ce. So, how and why did this need for a comprehensive notation of musical sounds emerge? And what were the consequences? I argue that these questions must be answered in the light of wide, historical transformations in which musical notation was just one example among many other emergent tools of description affecting various domains in society.

The quest for an efficient notation started with an alliance between the imperially ambitious Frankish king Charlemagne (r. 774–814) and the pope, who both wanted every Christian in Western Europe to sing the same chants authorized by the Vatican. The century following Charlemagne saw the rise of a comprehensive project of political unification supported by religious, educational, artistic, bureaucratic, economic, architectural, and military standardization (Freedman 2011). Orthography and grammar

were standardized, and the small letters for writing were simplified to promote literacy. Coins and weights were standardized to promote long-distance trade. Such was the political and cultural climate in which the development toward an efficient system for notating music started (Levy 1998). On the frontispiece of many of the newly standardized liturgical chant books was a picture of Saint Gregory (pope from 590 to 604 ce) with a dove (the Holy Spirit) whispering the chants directly into his ear, and a scribe sitting by his side and notating (Figure 10.1).

A much earlier tool of description of vocal sounds was the phonetic alphabet, which had decisive implications for the development of Greco-Roman civilization and its unprecedented level of complexity (Goody and Watt 1963; Ong and Hartley 2012). The emergence of programming languages for computers is a recent example with possibly even more global consequences than the phonetic alphabet. Computer programming also includes radically new, digital approaches to the description, recording, and production

Figure 10.1  Frontispiece of a chant book from the monastery of Saint Gall circa 1000 ce. (St. Gallen, Stiftsbibliothek, Cod. Sang. 390, p. 13—Antiphonarium officii [Antiphonary for liturgy of the hours].)

of sound (Danielsen, this volume, chapter 29; Knakkergaard, this volume, chapter 6). In order to understand what happened in the particular case of musical notation, it would be helpful to understand what all these histories of emergent tools of description have in common. My main hypothesis is that all are cases of externalization, which is a concept I have used to bring together several theories on transitions in cultural and natural history (Sinding-Larsen 1987, 1991, 2008). I have found this concept and perspective useful for developing a more holistic understanding of cultural history as well as the relationship between cultural and biological evolution.3

There is a need to pay attention to one distinction made in The Oxford English Dictionary's definition of externalization: "The action or process of externalizing; an instance of this; also concr. an embodiment"; and externalize: "To make external; to embody in outward form." What is important to retain here is that the word "externalization" can be used with two related but different meanings: (1) a process (of making external), and (2) something concrete, an embodiment, which could be the result of that process. I also use "externalization" in more abstract and specialized senses that I will gradually approach through various examples until I arrive at a more formal, in-depth discussion of the concept later (see "Externalization and the Emergence of Complexity").

The action of recalling a pattern of sounds from memory and writing this as a pattern of note-heads on a staff would qualify as a process of externalization, while the concrete result of that process, the actual manuscript with notation, could also be called an externalization. In this chapter, it is the process of externalization that is of main interest, including the larger-scale and more complex processes that follow from the fact that even externalizations may themselves be externalized. And not only may externalizations be externalized: in evolutionary time scales, externalizations show a tendency to become externalized. In spite of periods of significant setbacks, both biological and cultural evolution are characterized by a long-term tendency toward increased levels of externalization.

Information and Complexity in Biology and Culture

Living organisms grow by capturing or diverting flows of energy and materials from the environment into the dynamics of their bodies. These are materials and energy that otherwise would have dispersed more directly in accordance with the law of increasing entropy (the second law of thermodynamics) (Deacon 2012b). Life could be thought of as an extremely indirect way of dispersing energy and materials, and an overall trend in the evolution of life is a steady increase in the level of indirectness. A main driver of increasing indirectness is increasing complexity in information or "informed actions" that constrain and enable the flows of energy and materials. The ultimate function of information is to constrain environmental (external) flows of material and energy for

the purpose of maintaining, growing, and reproducing the internal dynamics of a living body (its interiority). All living organisms need to handle information about how they performed helpful and harmful actions in the past (memory), a way to repeat helpful actions in the future (heritable, functional habits/traditions), and, when needed (for example in the face of environmental changes), a way to modify habits/traditions through imagination, creativity, learning, and evolution.

Niche construction is a recent and increasingly important concept in evolutionary biology (Odling-Smee 2010). Niche construction denotes organisms' actions in constructing and changing their environment for their short-term benefit in a way that also has consequences for the species' long-term genetic selection. Beavers build dams with logs cut by their teeth. The dams favor selection for a flat tail that is adaptive for swimming in the dammed ponds. Teeth suitable for cutting trees, dams as a constructed niche, tails for swimming, and many other features enter into a kind of dialectic or coevolutionary process. It is argued that humans first developed a rudimentary language as a cultural (nongenetic) adaptation. Subsequently, a language community functioned as a semiotic niche construction that favored the selection of individuals with larger brains who processed linguistic signs more efficiently (Deacon 2012a). Further externalizations through writing, maps, notations, and other semiotic tools have now become part of the niche or environment in which humans grow up and live.

There is no doubt that the human semiotic externalizations we call science have vastly increased our species' ability to channel energy and materials from the rest of the environment into our bodies as well as into those of domesticated plants and animals under our control. Describing and controlling are closely linked activities, and not only within science. The same is true for describing and controlling the sound production called music. In a wide sense, all music could be thought of as a more or less transient sonic niche construction in which the development of notation systems and tuning systems has played an important role as semiotic externalizations.

There exist different levels or orders of externalization processes. To create a new piece of music and write its pitches and rhythmical patterns on paper by means of musical notation could be thought of as one order of externalization. To improve or create a new system of notation with which it is possible to write down or externalize entirely new kinds of sonic phenomena is an externalization process of a comparatively higher order than just making a description with an existing tool of description. To create a new tuning system that better matches the possibilities of a notation system could be seen as a form of externalization that resembles sonic niche construction, or at least the construction of a sonic infrastructure. Finally, the term "externalization" may also be used to denote the large-scale processes of societal transformation that are a result of multiple, nested, and/or more limited externalization processes. This implies that my concept of externalization can be used for processes on several levels and often in a wider sense than the colloquial sense that is mostly concerned with the first order of "making external." At a basic level, an alphabet can describe, make explicit, or externalize phonemes in a language.
At a higher level, the process of introducing an alphabet and literacy to a culture without writing has been characterized as "alphabetization," which, in my terminology, would be an externalization (in the wider sense) by means of writing. The process of

describing (making explicit, external) a work process in a programming language interpretable by digital computers can be called a process of digitalization. "Digitalization" can also denote: "The adoption or increase in use of digital or computer technology by an organization, industry, country, etc." (The Oxford English Dictionary). In the same way as alphabetization and digitalization can be used in both a narrow and a wider sense, my concept of externalization is meant to capture what alphabetization, digitalization, and the introduction of musical notation have in common on several levels.

The Externalization of Pitch and Intervals

Something radical and interesting happens when we change from speaking to singing. The continuous type of pitch variation that characterizes prosody switches to a much more discrete or discontinuous type of pitch variation that characterizes melodies. The singing voice often moves in discrete steps between a limited set of pitches with a more or less fixed pattern of intervals we call a musical scale. One important background for our affinity toward discrete pitch steps in music is to be found in physical acoustics.

A vibrating object like a string does not vibrate only along its full length but simultaneously in fractions of its length. These shorter fractions vibrate at higher frequencies that are inversely proportional to their length. If the full length vibrates at 100 cycles per second (Hz), then 1/2 of the length vibrates at 200 Hz, 1/3 at 300 Hz, 1/4 at 400 Hz, and so on. The full-length pitch or frequency is called the fundamental frequency, or just the fundamental. Pitches from the vibrating fractions are called overtones. In a well-crafted musical string, the most important of these fractions will vibrate at integer multiples of the fundamental frequency and produce pitches that are called harmonic overtones. The collection of harmonic overtones together with its fundamental is called a harmonic series. A tone consisting of concurrently sounding overtones from a single harmonic series is called a complex harmonic tone. In general, overtones are fused with the fundamental into a single auditory image, with the fundamental being perceived and labeled as that complex tone's only pitch.

However, the overtones become important when we judge concurrent intervals between different pitches as consonant or dissonant. An element in how we judge the consonance or dissonance of an interval is the degree to which the fundamentals' overtones overlap and form a single harmonic series or not. If the overlap is extensive, we could say that the fundamentals are closely harmonically related. Pitches an octave apart (frequency ratio 2:1) are the most closely related, because the higher tone has no harmonic overtone that is not also present in the tone one octave below. This is the physical basis for what we call octave equivalence. We could think of pitches an octave apart as simultaneously being identical and different in two different pitch spaces or pitch dimensions. One dimension is continuous and linear (the height or register aspect of pitch) while the other dimension (variably called "pitch

class," "tonal chroma," or the identity aspect of pitch) varies in a discrete and cyclic way and may be depicted as a circle. This dimension could also be called the harmonic aspect of pitch, since it is this aspect that is the basis for creating melody and harmony.

Scale comes from the Italian word "scala," meaning ladder. A ladder ascends in a straight line, which is a relevant metaphor for the height or register aspect of pitch. But the pitch class (chroma or harmonic aspect of pitch) changes in steps from one pitch class to the next until it reaches the octave, which is identical to the pitch class where the movement started. In other words, a one-octave musical scale is simultaneously a straight ladder of linearly ascending pitch heights and a circular, harmonic scale akin to a "soft" ladder that is turned onto itself in a ring and where an octave is "one full circle" (Deutsch 2013).4 Many of the early challenges in developing an efficient notation system had to do with this double nature of pitches.

The second most harmonically close interval after an octave has the frequency ratio 3:2 and is called a perfect fifth.5 We find this interval between the second and first overtones in a harmonic series. The potential symmetries, consonances, and dissonances that are a part of physical and auditory acoustics have under various cultural circumstances been exploited to create tensions and resolutions in musical themes and variations.6 To unleash this literally epic potential one needs to create a tone system (a collection of pitches and intervals) suitable for moving around in a tonal pitch space in harmonically relevant and motorically/perceptually manageable scale steps. This can be done in many ways and can be achieved in an entirely oral tradition. But a notation system coupled with a system for producing precise and predictable intervals with tunable instruments might provide significantly extended possibilities. How well the notation system is able to describe the tone system and its potential harmonic symmetries will influence what kind of music it is possible to imagine, compose, and perform.
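To make the overtone-overlap idea above concrete, here is a minimal sketch in Python (my own illustration, not part of the original text; the choice of sixteen partials and a 1 Hz tolerance are arbitrary assumptions). It counts how many partials of two complex harmonic tones coincide; the count is high for the octave (2:1) and the fifth (3:2) and drops to zero for the tritone, in line with the rough consonance ordering described above.

    # A rough indicator of harmonic relatedness: count (nearly) coinciding
    # partials of two complex harmonic tones. Sixteen partials and a 1 Hz
    # tolerance are illustrative assumptions, not values from the chapter.

    def partials(fundamental_hz, n=16):
        """Frequencies of the first n partials of a harmonic series."""
        return [fundamental_hz * k for k in range(1, n + 1)]

    def shared_partials(f1, f2, n=16, tolerance_hz=1.0):
        """Count partials of f1 and f2 that coincide within the tolerance."""
        return sum(1 for a in partials(f1, n) for b in partials(f2, n)
                   if abs(a - b) <= tolerance_hz)

    base = 100.0  # fundamental of the lower tone, in Hz
    for name, ratio in [("octave 2:1", 2.0), ("fifth 3:2", 3 / 2),
                        ("fourth 4:3", 4 / 3), ("tritone 45:32", 45 / 32)]:
        print(name, shared_partials(base, base * ratio))

With these settings the sketch prints 8 shared partials for the octave, 5 for the fifth, 4 for the fourth, and 0 for the tritone.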

Pythagoras, Sounds, and Mathematics

Pythagoras was, according to legend, the first to describe (externalize) the size of intervals by means of what in his time was a relatively new and powerful tool of description: mathematics. Pythagoras established the basis for the idea that the length of a vibrating string is inversely proportional to its frequency and that the size of the most consonant and basic music-relevant intervals can be expressed as small-integer ratios between string lengths, such as 2:1 (octave), 3:2 (perfect fifth), and 4:3 (perfect fourth). Also, the intervallic difference between a fourth and a fifth (with the ratio 9:8) was of particular importance to the Greeks and was called a tone. Both the arithmetic and the geometry of these ratios were important, because the Greeks, by means of calculations and a compass, could construct the lengths of strings that would produce the theoretically established pitches and intervals as sounds. The instrument they used for this "sonification" of their theory

was based on one string with movable bridges above a line inscribed with the appropriately constructed geometric points. The instrument was called a monochord. With this knowledge, the Pythagoreans generated a tone system by repeatedly adding the interval of a fifth to an initial pitch and then (building on the principle of octave equivalence) subtracting surplus octaves to locate all scale steps within the first octave (Hansen 2003). After six applications of a fifth to, for example, F, one gets F–C–G–D–A–E–B. Arranged within one octave from C, the result is C–D–E–F–G–A–B. Although the Greeks used different note-names, they had created a sequence of seven pitches and intervals (the diatonic scale) that was to become the backbone of Western music until today, not least because later (medieval) musical notation was specifically developed to fit this particular scale.

The Pythagoreans chose the interval of a tone (ratio 9:8) as their "atom" in the diatonic version of the tone system. The basic interval structure of the seven diatonic scale steps (five tones [T] and two semitones [S]) is in today's major mode given as follows: TTSTTTS. The cyclic character of the octave implies that the beginning and end of this linear interval pattern TTSTTTS can be joined to form a circle. Because the two semitones are asymmetrically located in the circle, one may obtain seven different diatonic sequences (or scales) of tones and semitones depending on where one starts in the circle. The Greeks identified these permutations and called them "species of the octave" but did not relate this to the cyclic nature of the octave.

Today the idea of a musical scale is indissolubly connected to the octave as a cyclic or symmetrically repeatable segment within the tone system. But in ancient Greece, the foundational, symmetric scale segment was considered to be the tetrachord, which was not symmetrically repeatable to the same extent as the octave. A tetrachord consisted of four notes or scale degrees (three scale steps) where the two boundary notes were fixed (a perfect fourth apart) while the two notes that separated the internal scale steps could vary. The tetrachord in the diatonic genus (with internal steps of one semitone and two tones) was considered to be the most ancient and natural (Atkinson 2009, 11). The Greek diatonic genus is basically the scale that we still use and that has now attained almost global dominance, not least through the dissemination of modern notation and keyboard instruments with their diatonic layout of the white keys.

To make the tetrachord segment work for the description of their two-octave diatonic tone system, the Greeks had to stack the tetrachords in two different ways: one where the highest note in one tetrachord was also the lowest in the next tetrachord (called conjunct tetrachords—repeating each tetrachordal scale degree a fourth apart), and another (the disjunct) repeating a fifth apart. This lack of a uniform symmetry in the stack of tetrachords added to the complexity of the Greek way of naming pitches. Each pitch label consisted of two terms. One part of the pitch name referred to the location of the tetrachord in the overall register of tetrachords. The other part referred to the location (or scale degree) within that tetrachord. However, notes with the same scale degree in two consecutive tetrachords were only to a limited degree harmonically related, and their relatedness varied depending on the type of tetrachord (conjunct or disjunct).
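The derivation by stacking fifths, and the rotation of the TTSTTTS pattern into seven "species of the octave," can both be made concrete in a few lines of Python (my own illustrative sketch, not from the chapter; the note names follow the F–C–G–D–A–E–B example above):

    from fractions import Fraction

    # Generate the Pythagorean diatonic scale: stack perfect fifths (3:2)
    # from F (F-C-G-D-A-E-B), then use octave equivalence (factors of 2)
    # to fold every pitch into a single octave above C.

    def fold_into_octave(ratio):
        """Reduce a frequency ratio so that 1 <= ratio < 2."""
        while ratio >= 2:
            ratio /= 2
        while ratio < 1:
            ratio *= 2
        return ratio

    names = ["F", "C", "G", "D", "A", "E", "B"]
    ratios = {}
    ratio = Fraction(1)
    for name in names:
        ratios[name] = ratio
        ratio *= Fraction(3, 2)          # the next pitch is a fifth higher

    scale = sorted((fold_into_octave(r / ratios["C"]), n) for n, r in ratios.items())
    for r, n in scale:
        print(n, r)   # C 1, D 9/8, E 81/64, F 4/3, G 3/2, A 27/16, B 243/128

    # The seven diatonic "species of the octave" as rotations of TTSTTTS:
    pattern = "TTSTTTS"
    modes = [pattern[i:] + pattern[:i] for i in range(7)]
    print(modes)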
As sound, the Greek diatonic tone system was not fundamentally different from the one we use today, but the Greeks' tetrachord-based perspective and notation system provided

support for a different cognitive map, with different constraints and affordances for how symmetries and harmonic possibilities could be imagined. Aristoxenos, a pupil of Aristotle and a founding father of Greek music theory, was well aware of the acoustics of the octave and of octave equivalence. But for various reasons, he, along with subsequent Greek philosophers, singled out the tetrachord as the elementary symmetric segment in the tone system. Some reasons were metaphysical and related to the magic of the number four: the material world consisted of four elements (earth, fire, air, and water), and one only needed four numbers (e.g., 16-12-9-8, see Figure 10.2) to establish the ratios of the four foundational intervals in Greek music theory (16:8 gives the 2:1 octave, 12:8 the 3:2 fifth, 12:9 the 4:3 fourth, and 9:8 the tone). But it could also be that a music theory and notation based on tetrachords worked well enough for their basically monophonic melodies, which included intervals smaller than a semitone (in the enharmonic genus) and which were in any case less suited to our kind of polyphony. It could also be that the symmetry-obsessed Greeks found the lack of symmetry within an entire diatonic octave (with its two, asymmetrically placed semitones) to be incompatible with the status of a foundational entity in their tone system. In any case, the tetrachordal conception of the tone system represented serious limitations for medieval music scholars

Figure 10.2  The numerical basis for Pythagoras’s harmonic scale. (Detail from woodcut on page 18 in the 1492 treatise Theorica musice Franchini Gafuri laudensis. Source: Bibliothèque nationale de France.)

with the ambition of creating an efficient notation system that could support an emerging Christian interest in more advanced polyphony.

The Octave Revolution

An important breakthrough came with the late tenth-century treatise (anonymous or by Pseudo-Odo7) Dialogus de musica, where the basic repeatable segment changed from the tetrachord to the heptachord (which implies the octave). The basic interval structure was still the same as in the Greek diatonic genus. But a deeper, octave-based symmetry was highlighted in a new, simple, and consistent way.

The Greeks described their two-octave tone system with a notation system using four different and variously stacked tetrachords, with a unique and complex name and note symbol for each of the fifteen different pitches across both octaves. In the new system, the same information was expressed with one repeatable set of seven Latin letters in capital and small versions: ABCDEFGabcdefg. The most important part of this innovation was that each of the seven letters represented a pitch with symmetrically identical intervallic contexts across the entire tone system. Implicitly, the modern concept of pitch class was thereby invented, and the harmonic aspect of pitch (based on the cyclic character of the octave) had been named and externalized with an unprecedented level of accuracy and simplicity.

One background for the octave perspective could be Pseudo-Odo's approach to the division of the monochord. Whereas Boethius (c. 450–525) established the diatonic set of intervals by first dividing the monochord into two octaves and then proceeding with the smaller intervals and pitches variously placed across the two-octave system (Boethius 1989), Pseudo-Odo found a more ingenious method, starting with G and letting the whole string represent one octave and a tone. In this way, he could begin by constructing a tone (ratio 9:8 from G to A), then another tone (from A to B), then a fourth from G to C, before working his way up the rest of the first octave. Subsequently, he created all the pitches of the second octave from the first (by dividing the string length of each pitch of the first octave into two equal halves). Pseudo-Odo's dividing and naming of the monochord provided better support for an octave-based imagination of the tone system than former principles (Atkinson 2009, 212–214).

All the sounds and intervals were the same as those established by Pythagoras 1,500 years earlier, but a new tool of description, the seven letters in small and capital versions, externalized the fundamental acoustic symmetry of octaves, and thereby the harmonic aspect of pitch (the pitch class), in a better way. The seven Latin letters rapidly became a dominating way of referring to diatonic pitches all over Western Europe. This rapid dissemination was also linked to the more or less simultaneous graphic revolution of staff notation, which likewise supported an octave perspective of the tone system.

The diatonic scale and notation were fairly well fitted to the existing repertory of Gregorian chant, but the fit was not perfect. Certain chant melodies had an intervallic

structure of tones and semitones that did not fit the notation system. These melodies needed a semitone interval where the tone system did not provide one. At times, melodies that did not fit were simply suppressed or altered to fit the system (Atkinson 2009, 244). The integrity of the notation system became at this stage more important than preserving an oral and divine tradition that Saint Gregory purportedly had received directly from the Holy Spirit! This tells us something about the cultural power of semiotic conventions. Eventually, the notation system, as well as the diatonic tone system, was developed with the addition of more notes and particular signs until the system comprised twelve semitones in an octave.

The Greek names and symbols (inverted/distorted letters) for pitches were sufficiently functional for theoretical treatises on music and for the storage of melodic shapes for "archival" purposes, but not for practical playing and sight-singing or for the imagination (composition) of new, polyphonic complexity. In particular, the Greeks did not use graphics in their notation system to visualize with iconic resemblance the melodic pitch movements (scale steps) of actual melodies. This idea appeared for the first time in the ninth century, in the context of the Carolingian renaissance. The treatise Musica enchiriadis (The music handbook) from the second half of the century is an influential and early example (Erickson and Palisca 1995). The anonymous author described his pitches with the long Greek note names and also with a version of an ancient Greek collection of signs called the dasian sign system. But more importantly, he transferred these signs into a kind of coordinate system where the pitches of syllables of text were placed on lines ascending along the vertical axis while the temporal sequence of the same syllables was placed along the horizontal axis. In the left column in Figure 10.3 we see seven ascending dasian pitch signs (looking like twisted Fs and referred to in the treatise as notae or notes). The intervals separating the pitch signs (and hence also the lines) are marked with T for tone and S for semitone in accordance with the Greek diatonic genus. This treatise, from before 900 ce, contained the most basic graphic idea of staff notation more than a century before Guido of Arezzo's treatise from 1028 ce, which is reckoned as the definitive birth of staff notation.
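The Enchiriadis layout is, in effect, a two-dimensional grid: pitch rows on the vertical axis, the order of the sung syllables on the horizontal axis. The following tiny Python sketch (my own illustration; the syllables, scale degrees, and interval labels are invented for the example, not taken from the treatise) prints such a grid, with each row prefixed by the T/S interval separating it from the row below, roughly in the manner of Figure 10.3:

    # Lay out a melody Enchiriadis-style: rows are pitches (bottom = lowest),
    # columns are the temporal order of syllables. Syllables and scale degrees
    # here are hypothetical examples, not a transcription from the treatise.

    melody = [("Al-", 0), ("le-", 1), ("lu-", 3), ("ia", 2)]  # (syllable, scale degree)
    intervals = ["T", "T", "S"]   # interval between each row and the row below it

    n_rows = max(degree for _, degree in melody) + 1
    for row in reversed(range(n_rows)):
        label = intervals[row - 1] if row > 0 else " "
        cells = [syll if degree == row else "-" * len(syll) for syll, degree in melody]
        print(label, " ".join(cells))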

Figure 10.3  Example of polyphonic chant from the treatise Musica Enchiriadis. (Staatsbibliothek Bamberg, Msc.Var.1, fol.57r, photo: Gerald Raab.)

The critical Enchiriadis innovation was (1) to depict the vertically stacked horizontal lines as placeholders for pitches, instead of attaching a letter-based pitch symbol to each separate syllable in the text (as the Greeks had done), and (2) to specify the intervals between the lines. This implied a graphic and iconic communication of pitch movements and melodic contours that was cognitively much more intuitive and efficient than its alphabetic predecessors. However, the author did not take it for granted that his medieval reader would understand his bold abstraction right away. The author asks the reader to think of the lines as ordered strings (to create associations with the order of strings on a lyre or harp), and he further asks the reader, "Let these strings be in place of the sounds the notae signify" (Atkinson 2009, 124). The idea of a quality of a sound (pitch) depicted as a visual line in a staff had not yet become an established part of the imagination. Or we could say that the idea of pitch had not yet been fully externalized from the actual sounding string (at least for his readers). Nor had the sound been fully externalized from the syllable of the sung word that would eventually be replaced with a dot (a note-head).

It would take more than a century before the scribes of music would again pick up this way of depicting a pitch space as a grid of horizontal lines. In the meantime, a quite different system of notation was developed: the neumes (Figure 10.4). Neumes also depicted shifting pitches as vertical movements, in particular within compounded neumes (ligatures). In that sense, they were more intuitive and efficient to read than the alphabetic notation. But without definitive, horizontal lines (pitches) of reference and explicit intervallic distances between the lines, it was often impossible to know exactly how the pitch changed from one neume to the next. Each neume (or group of neumes) was complex, gestural, dynamic, often with additional hints on duration. Neumes contained hints about changes in direction but no map with coordinates of the

Figure 10.4  Musical notation (neumes) from circa 900 ce. (St. Gallen, Stiftsbibliothek, Cod. Sang. 359, p.145—Cantatorium.)

pitch space to help locate from where the changes of direction took place. Neumes served as mnemonic support for singers who already knew the melodies, but were not usable for learning melodies or as a cognitive tool to support the imagination of advanced polyphony.

The concepts egocentric and allocentric are used to characterize two kinds of navigation (Buzsaki and Moser 2013). Navigating in an unknown landscape without a map requires egocentric navigation: any place must be understood in relation to the navigator's personal path through the landscape to this location. The experience of a specific location becomes path-dependent. Navigating with a map is called allocentric navigation, because prior information about the landscape has been plotted onto a map or has been externalized in a way the navigator can consult independently of her own past itinerary. The information about a location on a map is thus more path-independent and more independent of the "ego" as the ultimate point of reference or "point of view." In many contexts, allocentric may be used as a synonym for externalized, and the process of externalization could be thought of as "allocentrification."

The neumes, in their inability to depict precise intervals (scale steps), were less externalized from the oral tradition and thus less allocentric than both the previous alphabetic notation and the subsequent staff notation, which was first developed with just one line of reference (Figure 10.5), and then with four, five, or six (Figure 10.6).

The innovative idea of Guido's staff notation was to combine and develop several previously established insights: (1) to map the neumes as standardized dots onto a compressed version of the pitch lines established in Musica enchiriadis (using both lines and

Figure 10.5  Neumes on a single line red F-staff. Montecassino, Italy, 2nd half of 12th century. (“The Schøyen Collection MS 1681.”)


Figure 10.6 Thirteenth-century polyphonic composition in three parts. (Chansonnier de Montpellier. H196 p.16. Source: Bibliothèque interuniversitaire de Montpellier. BU historique de médecine. Credit photo: BIU Montpellier/DIAMM, University of Oxford.)

spaces between lines as horizontal markers of pitch levels) and thereby creating more compact visual gestalts of the melodic contours; and (2) to specify the intervals between lines on the basis of octave symmetry, which meant that two note-heads seven ordinary scale steps apart would always sound as a 2:1 octave. This was not the case in the Enchiriadis version of the lines, because of its consistent use of disjunct tetrachords.

The new staff became a powerful tool for the externalization of not only precise intervals but also intervals' contexts, by making intervals visible and much more intuitively intelligible than in any previous notation system. The letter-based pitch symbols had been able to externalize exact intervals, but in a more indirect way than the staff. Two intervals with the same width, for example the fifths C–G and D–A, had in their letter-based versions no visual similarities that would tell the reader that both were fifths. The letters only functioned as ordinal numbers indicating the number of steps from the first scale degree of a particular segment, whether this segment was a tetrachord or an octave. On the other hand, note-heads on a staff communicated pitch height dissociated from particular scale degrees. For example, a fifth (two note-heads three lines or three spaces apart) was immediately visible and recognizable as this interval irrespective of its register or scale degree. In this way, intervals could be transposed vertically up or down the staff (register) while maintaining both their visual and sonic characteristics. Although some adjustments might be needed for the location of semitones, in general, pitch patterns became visually and cognitively "transposable" on a staff to a much higher degree than in the letter-based or neume-based notation systems. Through vertical alignment of several voices, it became easier to visually express and imagine how a particular pitch was part of several intervals at the same time. This paved the way for more complex polyphonic compositions.

The staff was thus not only important for physical, externalized notes on tablets and parchment. The staff had become a tool for the visual-spatial imagination of sonic relationships between concordant and discordant intervals, relationships that were not easy to keep track of in a purely aural mode of conceiving music. The name of an early and important genre of polyphony was counterpoint. This term alludes directly to the note-heads as points on the staff that would be organized in several voices, point-against-point, in parallel, oblique, and contrary motion. The art of counterpoint increased in complexity in the coming centuries and, according to many, peaked with the fugues of J. S. Bach.8

Although notation was crucial in this development, the change from an oral to a written musical tradition did not represent a simple, one-way transition where the importance of interiorized, implicit oral models decreases and that of externalized, explicit written models increases (Berger 2005). Berger acknowledges Goody when she writes that literacy does not replace orality. Literacy also creates conditions for a new kind of orality. Certain genres of music departed more profoundly from their oral/aural origins than others. Trevor Wishart reflects on the limited success of composers of serial (atonal) music.
He attributes its limited appeal to this music’s almost total reliance on notation for its imagination; serial music was, according to Wishart, conceived with the eyes and not with the sense of hearing (Wishart and Emmerson 1996). Music is a sonic art and will ultimately be linked to hearing. But visual externalizations in the form of notation

and theories about the geometry of interval ratios have evidently influenced music. The extent to which the quality of music can be "objectively" determined by means of mathematical proportions, or whether music will always depend on idiosyncratic feelings and cultural preferences, has been a contentious issue ever since Aristoxenos's critique of the more mathematically fundamentalist Pythagoreans (Boethius 1989, chap. 5).

Any semiotic culture is, in a profound way, both real and imaginary. We could say that culture is nothing but cemented, habituated, institutionalized, or externalized imaginations that are used for further imaginations and externalizations. Music, art, humor, and science are human activities where the creative tensions between institutionalized constraints and imagination are widely cultivated (Deacon 2006). These activities are characterized by being both more constrained and regularized, on the one hand, and more relaxed (open to random or unconstrained events), on the other, than ordinary life. Institutionalized externalizations are essential both for demolishing old complexity and for building new complexity. We cannot predict what in the future will be specifically imagined or institutionalized, but an externalization perspective on the past may tell us something about the general shape of certain transitions in this evolution. In the following, I will take a step back from the externalization of pitch that paved the way for increased polyphonic complexity, to see how this relates to the connection between externalization and complexity more generally.

Externalization and the Emergence of Complexity

My current understanding of externalization is influenced by modern evolutionary theory, not least the theory of what is called the major transitions in evolution (Maynard Smith and Szathmáry 1995). This is a theory of the evolution of higher-order complexity based on cooperating individuals that become integrated to a degree where the group ends up functioning more or less as a superindividual. The prototype example in biology is the evolution of multicellular organisms from the cooperation of unicellular organisms. Major transitions are about transitions in individuality. Such transitions are intimately linked to the emergence of new ways of communicating information between the previously autonomous individuals. Theories of major transitions in evolution are thus in a wide sense about the emergence and consequences of new tools of description. The development of social groups among humans based on the emergence of language is reckoned as one such major transition in evolution. We can use this perspective to look at more limited and specialized cases of emergence of new tools of description and new cooperative formations like the emergence of musical notation. The emergence of orchestras playing music that could not have existed without this notation could be understood, at least in part, as a major transition in this sense. I also propose that an alternative and more precise term for "major transition" could be an onto-synergistic

transition, because the transition is about the emergence of an ontologically new entity (a new and higher-level individuality), and because the foundation for its emergence is the synergy obtained through new levels of cooperation enabled by new tools for the management of information and knowledge. Terrence Deacon's related transition to what he calls higher-level teleodynamics is also an inspiration for my understanding of externalization; in particular, Deacon's treatment of the relation between the dynamic (processes) and the static (constraints) (Deacon 2012b).

Some short conceptual clarifications are needed before attempting a more explicit definition of externalization. Knowledge: useful information, or information with significance for the organism (Deacon 2017). Self: an organism's most fundamental organizing principle and what defines its individuation (Deacon 2012b, 465–466). I may also use self in this wide sense almost as a synonym for individual, which may include even wider individualities or individual-like entities like an orchestra. The subjective self is regarded as a more special mode of the wider, organismic self (Deacon 2012b).

In the context of historic developments of new tools of description (or, more specifically, semiotic externalizations), I propose to define and explain the concept of externalization in the following way: Externalization denotes a process of change where knowledge (useful information) stored as an integral part of the dynamics of an individual self (in some broad sense intracognitively) is transferred to a more static storage medium external to the dynamic self (to some degree, extracognitively stored) in a way that makes the information accessible (for interpretation) multiple times for the same individual as well as for other individuals (or larger-scale individualities) across longer stretches of time and space than before the externalization (that is, with more independence from the ephemeral immediacy of the dynamics of the individual self).

The new possibilities for synergy and cooperation between a single self/individual and its externalizations, as well as the new synergy between formerly non- or less-cooperating selves/individuals/individualities, can create new, extended, and more complex selves/individualities. To the extent that the new, emergent level of individuality achieves a sustainable autonomy, we may call it an onto-synergistic transition.

Selves, Individuals, and Individualities

Central to my definition of externalization is the contrast between the dynamics of an individual self and a more static storage medium external to the dynamic self. Here, two contrasts—internal-external and dynamic-static—are connected. As the knowledge or information in the externalized and more static medium becomes accessible multiple times to the same self as well as to others, new synergies by means of reflexive cognition can emerge between individuals and their externalizations as well as between several individuals engaging in social or distributed cognition and cooperative action. Synergistic and cooperative action can become more or less strongly institutionalized or in other ways fixated as habits or addictions (Hui and Deacon 2010). To the extent that this institution becomes self-regenerating, self-repairing, and in other ways protects itself from dissolution (“death”), we may speak of the emergence of higher-order individuality or an onto-synergistic transition. The members of this higher-order individuality may share the benefits stemming from the social synergies. But they will in general also have to pay a price in the form of giving up some of their autonomy and uniqueness for the sake of the new and larger-scale individuality based on standardized differentiation and separation of labor or combination of labor (Corning and Szathmáry 2015). A former, organically grown, relational, and complex uniqueness is replaced with a new, higher-level uniqueness based on the enhanced combinatorial (often permutational) properties of the simpler, more standardized elements. The musicians in a symphonic orchestra or singers in a choir who are not allowed to improvise or do anything not indicated in the score could represent an example of such higher-level complexity made up of lower-level, standardized elements that have given up some of their autonomy to share the gains from a higher-level synergy.

Also, in the development of musical notation, we saw the early neumes, where a single symbol (a ligature) depicted a cluster of notes together with their intervallic movements as well as an egocentrically grounded vertical placement on the paper, give way to staff notation where each pitch had a separate symbol (note-head) and where all intervals were allocentrically defined by exact vertical positions in a grid (the five-lined staff). Signs and symbols are not alive and do not have to literally give up autonomy in the same sense as living individuals. But there might nevertheless be some interesting similarities between the processes leading to standardization and increased combinatorial properties of elements in a semiotic system and musicians in an orchestra.

It is the external and relatively static quality—the quality of being dissociated from the continuous stream of material and energetic consequences of the dynamic self (its egocentricity)—that provides descriptions and information with a certain distance of virtuality which in the next turn can become a support for creative imagination. Sounds are inherently dynamic and ephemeral. Descriptions of sounds (or aspects of or patterns in sound) on paper are more static and have, in that sense, a distance to the immediacy of the present. For the externalization process to become more than a halt or temporary postponing of the dynamics of the self (a kind of pause-button), it must also be possible to copy and manipulate the externalized information (the patterns that contain possible information) without presupposing or involving interpretation of these patterns. It must be possible to manipulate the externalized patterns in their preinterpreted state. With a metaphor from computer science we could express this idea as follows: It must be possible to rewrite a program while it is not running. If it is possible to come back to or revisit the static, “frozen,” externalized version of the sound patterns multiple times, to make copies, change one copy and compare the variant with the original, then the conditions for evolutionary processes are in place (heredity, mutation, variation, and selection).
Such evolutionary-like processes supported by notation will often be an important part of a cognition/imagination (Szathmáry and Fernando 2011).
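To make the copy–vary–compare cycle concrete, it can be sketched in a few lines of Python. This is a purely illustrative toy: the pitch list, the mutation rule, and the comparison are hypothetical and are not drawn from any notation system discussed in this chapter.

    import random

    # A notated phrase externalized as data: MIDI pitch numbers for C-D-E-F-G.
    original = [60, 62, 64, 65, 67]

    def mutate(phrase):
        """Return a copy of the phrase with one pitch shifted by a semitone."""
        variant = list(phrase)                # copy the externalized pattern
        i = random.randrange(len(variant))
        variant[i] += random.choice([-1, 1])  # a small "mutation"
        return variant

    variant = mutate(original)

    # Because the original remains fixed ("frozen"), the variant and the original
    # can be compared note by note -- the precondition for selection.
    changed = [(i, a, b) for i, (a, b) in enumerate(zip(original, variant)) if a != b]
    print("original:", original)
    print("variant: ", variant)
    print("changed: ", changed)

The only point of the sketch is that the “frozen” original remains available for comparison after the copy has been altered, which is what the evolutionary analogy requires.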


Summing Up

Throughout human history, new tools of description (new semiotic systems) have catalyzed the emergence of larger-scale social and cultural institutions. The emergence of language paved the way for the first human tribal groups based on culturally accumulated skills and tools. The emergence of the first cities and empires (Mesopotamia) was based on taxation that presupposed records, which in turn depended on the invention of writing (including numerals) (Scott 2017). On a smaller scale, but in the same direction, musical notation contributed to the emergence of large-scale symphony orchestras playing music with a harmonic complexity unthinkable within a purely oral tradition. It was notation’s quality of comprising externalized patterns of sound that enabled the synergies of multiple musicians playing different parts reading from multiple, identical copies of these patterns. Externalization also enabled new synergies between the two senses of vision and hearing as a support for the composer’s combined aural and visual imagination of the complexities of multipart polyphony. These several changes promoted each other in a dialectic or coevolutionary manner.

The early tools for the description of sounds were tailor-made for the diatonic tone system with its relatively limited number of pitches available for singing, playing, and composing (imagining). But the notation’s explicit (externalized) character also made it easier to explore this limited pitch space to its edges from where it was possible to look further, toward new, “forbidden” or “unimaginable” notes and interval patterns. In the Middle Ages, music that included notes outside those accepted in the early notation systems was called musica ficta or musica falsa as opposed to the music within the system which was called musica recta or musica vera (“true” music) (Bent and Silbiger 2017). This lasted until the authoritative theoreticians (guardians of the notation norms) had accepted an unlimited use of the supplementary signs, accidentals, key signatures, and so forth. These amendments to the notation system implied that the staff, originally tailor-made to describe the seven-step diatonic tone system, could now describe a twelve-step, chromatic tone system. As the enhanced notation system now supported the imagination of more audacious harmonic modulations (in particular on instruments with fixed pitches like organs and harpsichords), an old discrepancy between notation and sound became more acute. Intervals that were visually identical on the staff (that spanned the same number of lines) were not all acoustically identical as sounds. The result was that certain intervals that, on paper, should be consonant sounded dissonant. This discrepancy was eliminated by a fully homogenized tuning system. With the tuning system called twelve-tone equal temperament, one obtained for the first time a full symmetry between notated intervals and acoustic intervals. The synergy between the visual and the aural intervals in this way became complete, and the number of combinatorial possibilities increased significantly. Bach celebrated the path toward equal temperament with his famous collection Das Wohltemperierte Klavier, which, for the first time, exploited all the possible keys and modes of his time.
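The arithmetic behind this discrepancy, and behind its equal-tempered resolution, can be illustrated with a short numerical sketch (an illustration of the standard tuning mathematics, not part of the historical argument): twelve pure fifths of ratio 3:2 overshoot seven octaves by the so-called Pythagorean comma, whereas twelve equal-tempered fifths, each 2 to the power 7/12, close the circle exactly.

    # Twelve pure fifths (3:2) versus seven octaves: the gap is the Pythagorean comma.
    pure_fifths = (3 / 2) ** 12
    octaves = 2 ** 7
    comma = pure_fifths / octaves
    print(f"Pythagorean comma: {comma:.5f}")  # about 1.01364, roughly a quarter of a semitone

    # In twelve-tone equal temperament every semitone is 2**(1/12),
    # so a fifth is 2**(7/12) and twelve of them equal seven octaves exactly.
    et_fifths = (2 ** (7 / 12)) ** 12
    print(f"ET fifths / octaves: {et_fifths / octaves:.5f}")  # 1.00000 (up to rounding)

It is this small residue, distributed equally across all twelve semitones, that makes every notated interval of a given size sound the same in equal temperament.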

Notation of sounds (both intervallic and rhythmic aspects), imagination of sounds, production of sounds, and appreciation of sounds (the aesthetics of the new harmonic complexity) were all entangled and functioned at various historical stages as premises for each other’s development in what can be analyzed as processes of externalization. Semiotic externalization in music did not occur isolated from other cultural and societal externalization processes. The standardization of elementary intervals in the tone system, together with efficient tools of description, complemented the emergence of large-scale synergies and complexities in other domains like architecture. The thick walls of Romanesque churches were often made of stones with nonstandardized sizes more or less shaped by nature. The elements in walls, columns, and arches of Gothic cathedrals were more standardized, were thinner, and let in more light not least thanks to the use of mathematical calculations in the design of “flying buttresses” that supported higher constructions with less stone. The Gothic “revolution” coincides precisely in time with the development of staff notation, and the world’s first four-part polyphony was performed in 1198 ce in Notre Dame of Paris while it was still under construction. Increased specialization of work processes and complex divisions of labor also had cognitive consequences demanding both more standardized and more flexible selves able to calibrate to various and varying allocentric coordinates of reference. The emergence of the vanishing point in perspective painting is another example of the increased externalization that characterized the shift from the Middle Ages to early modernity. Egocentric, intuitive craftsmanship was, to an increasing degree, supplemented with explicit, written design based on geometry and other science-based principles.

Increased standardization on a lower level combined with an increased complexity on a combinatorial level involved increased openness to combinatorial synergies. A full exploitation of the new combinatorial flexibility based on standardized elements presupposes an ability to switch between egocentric and allocentric perspectives. Performing a Gregorian plainchant in a single mode could be seen as celebrating an attitude toward life where the singers were never too far removed from either a spiritual or a tonal “home” and where the collection of notes was limited and predictable, as if given by God. The extended modulations made possible by the chromatic scale of equal temperament represented a new flexibility in perspective shifting, a cultural value that would become increasingly important and celebrated in the emergent, creative individualism of large-scale, complex modernity. Scales based on equal temperament, together with their harmonic possibilities, have now penetrated music cultures all over the world to the point where scales built on traditional, nonequalized intervals are either already extinct or critically endangered (Huron 2008).9 There exist a number of attempts to revive the qualities of prenotational music, both within Gregorian chant and various kinds of folk music. But the power of the many externalization processes in modernity, of which notation-based music is just one, is so massive that attempts to preserve “premodern” music qualities are extremely difficult except for those prenotational practices that lend themselves to externalization.
The pioneers of the Gregorian chant revival movement put great emphasis on the earliest neumes, as this notation contained information about the previous oral chant tradition, information that was lost in later and more standardized editions of the chant books. But the revivers’ quest for the oldest, most “authentic” manuscripts has made them more obsessed with notation than ever before. The old neumes were treated as “directly externalized authenticity,” which becomes somewhat of a paradox if the aim is to revive qualities of a prenotational past (Bergeron 1998). Obviously, there exist no recordings of prenotational Gregorian chant. In that respect, certain folk music traditions are in a better position. Recorded music from oral traditions does exist and represents a new kind of externalization which captures many details that escaped the limited descriptive power of traditional notation. But a meticulous copying of a recorded tune can never be the same as learning music in a traditional, small-scale, oral setting. The increased power of recordings as externalizations might, in some senses, even increase the distance to an oral tradition because the externalized template for what is “authentic” becomes more totalizing with less room for a personal interpretation than one based on a crude transcription with standard notation. It seems that some aspects of a lower level of externalization simply cannot “survive” the descriptive power of higher-level externalizations.

Today, thousands of people worldwide are at any one moment engaged in imagining and describing innumerable physical, social, and cultural processes by means of computer-based tools of description (not least for creating artificial intelligence and virtual reality, including elaborate soundscapes). These tools (programming languages) have a descriptive power far beyond anything the medieval and renaissance creators of musical notation could have imagined. With the modern sound and music applications of the digital age (Knakkergaard, this volume, chapter 6), the distinction between notation (descriptive tools) and music has to some extent been abolished. Whatever can be formally described can automatically be played, and whatever can be played can automatically be described. Not all aspects of life (or music) can be formally described, but the proportion that can increases steadily. Humanity is undergoing multiple processes of externalization contributing to a major transition in cultural evolution with consequences comparable to those following the invention of writing or maybe even the emergence of language. My contention is that we may get a better understanding of what may be gained and lost in this transition by looking closely at what happened to music as a result of what in hindsight looks like a comparatively innocent medieval improvement of the Greek way of describing, imagining, and controlling musical sounds. The aim of the concept and theory of externalization is not to make normative judgments on what is “progress” or what is better or worse music. It is to show how externalization processes are deeply transformative and that increased complexity at a large scale may be inseparable from reduced complexity at lower levels. My ultimate goal with the concept of externalization applied to the history of describing and imagining musical sounds is to create a distance of reflection to both the historic processes and our current global dynamics so that we are better able to imagine what might follow.
The ultimate ambition of the concept of externalization is thus to function as a good example of itself.
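The reversibility claimed above—that whatever can be formally described can be rendered directly as sound—can be illustrated with a minimal sketch. The note list, durations, amplitude, and file name are arbitrary examples chosen for the illustration, not a model of the applications discussed in this volume.

    import math, struct, wave

    # A "description": note names externalized as equal-tempered frequencies (Hz).
    description = [("A4", 440.00), ("C#5", 554.37), ("E5", 659.26)]

    rate = 44100  # samples per second
    with wave.open("triad.wav", "wb") as out:
        out.setnchannels(1)     # mono
        out.setsampwidth(2)     # 16-bit samples
        out.setframerate(rate)
        for _name, freq in description:
            for n in range(int(rate * 0.5)):  # half a second per note
                sample = int(20000 * math.sin(2 * math.pi * freq * n / rate))
                out.writeframes(struct.pack("<h", sample))

Running the sketch turns the three-line description into an audible file; reading the list back from the code is, trivially, the reverse operation.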


Acknowledgments

The research leading to this chapter has received funding from the European Research Council under the EU’s Seventh Framework Programme (ERC Grant Agreement no. 295843), the Research Council of Norway (SAMKUL project no. 246893/F10), and the Department of Social Anthropology, University of Oslo. Thanks to Hans M. Borchgrevink, Henning Kraggerud, Rob Waring, Tellef Kvifte, Tim Ingold, Chris Hann, Viggo Vestel, Tord Larsen, Alix Hui, Ola Graff, Maria Kartveit, Thomas Hylland Eriksen, Lars Henrik Johansen, Mark Grimshaw-Aagaard, and Martin Knakkergaard for important feedback on the manuscript and contributions to the writing process.

Notes

1. Performer Per Hætta, 1960, Track 24 on CD (1995) Norsk folkemusikk: 10: Folkemusikk frå Nord-Noreg og Sameland. (Norwegian folk music: vol. 10: Folk music from Northern Norway and Sameland). Grappa musikkforlag AS, Oslo. GRCD 4070.
2. Thanks to Hans M. Borchgrevink for this story.
3. An in-depth genealogy of the concept externalization falls outside the scope of this chapter, but Leroi-Gourhan’s term “exteriorization” from 1964 is a precursor (see Ingold 1999) and Hegel’s term “Entäusserung” from 1809, translated as “externalisation” in Rae (2012), may be the first use with a related meaning to the one used in this chapter.
4. The two dimensions of pitch have been depicted in a combined way as a three-dimensional spiral or helix ascending one octave per cycle (Deutsch 2013). But the ladder, circle, and helix are simplifications of the complex and entangled relationship that exists between pitch heights and pitch classes involving both physical, physiological, and cultural factors.
5. The terms “octave,” “fifth,” “fourth,” and so on, refer to the intervals one covers with scale steps in the diatonic tone system counting the starting pitch of the scale as the first scale degree.
6. Several other and more personal and cultural factors than vibrational interference patterns may influence the judgment of consonance versus dissonance. Nonetheless: “Preference for consonance over dissonance is observed in infants with little postnatal exposure to culturally specific music . . . . Consonance and dissonance play crucial roles in music across cultures: whereas dissonance is commonly associated with musical tension, consonance is typically associated with relaxation and stability” (Thompson 2013, 108–109).
7. The anonymous author was for centuries thought to be Odo of Cluny (this is now proven to be incorrect, and instead the author is often referred to as Pseudo-Odo) (Atkinson 2009).
8. A dynamic visualization of Bach’s notation-based polyphony can be watched on the online video “Music+Math; Symmetry” provided by the Santa Fe Institute: http://tuvalu.santafe.edu/projects/musicplusmath/index.php?id=35. Accessed November 10, 2017.
9. Small-scale music cultures are particularly affected, although ethnopolitical movements may to some extent counteract these global trends. An example: In Norwegian folk song, nonequalized scales that previously were nearly extinct are now regarded as a valuable cultural trait and increasingly used among professional folk singers. However, they do not represent a widespread music culture and their mastery of the traditional scales mostly resembles that of a second language while their first tonal language remains in equal temperament.


References

Atkinson, C. M. 2009. The Critical Nexus: Tone-System, Mode, and Notation in Early Medieval Music. Oxford: Oxford University Press.
Bent, M., and A. Silbiger. 2017. Musica Ficta. Grove Music Online. Oxford Music Online. http://www.oxfordmusiconline.com/subscriber/article/grove/music/19406. Accessed November 15, 2017.
Berger, A. M. B. 2005. Medieval Music and the Art of Memory. Berkeley: University of California Press.
Bergeron, K. 1998. Decadent Enchantments: The Revival of Gregorian Chant at Solesmes. Berkeley: University of California Press.
Boethius, A. M. S. 1989. Fundamentals of Music. Translated, with introduction and notes by C. M. Bower. Edited by C. V. Palisca. New Haven, CT: Yale University Press.
Buzsaki, G., and E. I. Moser. 2013. Memory, Navigation and Theta Rhythm in the Hippocampal-Entorhinal System. Nature Neuroscience 16 (2): 130–138.
Corning, P. A., and E. Szathmáry. 2015. “Synergistic Selection”: A Darwinian Frame for the Evolution of Complexity. Journal of Theoretical Biology 371: 45–58.
Deacon, T. 2006. The Aesthetic Faculty. In The Artful Mind: Cognitive Science and the Riddle of Human Creativity, edited by M. Turner, 21–53. Oxford: Oxford University Press.
Deacon, T. W. 2012a. Beyond the Symbolic Species. In The Symbolic Species Evolved, edited by T. Schilhab, F. Stjernfelt, and T. Deacon, 9–38. Dordrecht, Netherlands: Springer.
Deacon, T. W. 2012b. Incomplete Nature: How Mind Emerged from Matter. New York: W. W. Norton.
Deacon, T. W. 2017. Information and Reference. In Representation and Reality in Humans, Other Living Organisms and Intelligent Machines, edited by G. Dodig-Crnkovic and R. Giovagnoli, 3–15. Cham, Switzerland: Springer.
Deutsch, D. 2013. The Processing of Pitch Combinations. In The Psychology of Music, 3rd ed., edited by D. Deutsch, 249–325. San Diego: Elsevier.
Erickson, R., and C. V. Palisca. 1995. Musica enchiriadis and Scolica enchiriadis. New Haven, CT: Yale University Press.
Freedman, P. 2011. The Early Middle Ages, 284–1000 (HIST 210). Open Yale Course Online Lecture. https://oyc.yale.edu/history/hist-210. Accessed May 16, 2016.
Goodall, H. 2013. The Story of Music. London: Chatto & Windus.
Goody, J., and I. Watt. 1963. The Consequences of Literacy. Comparative Studies in Society and History 5 (3): 304–345. doi:10.1017/S0010417500001730.
Graff, O. 2007. Om å forstå joikemelodier: Refleksjoner over et pitesamisk materiale. Svensk Tidskrift för Musikforskning 89: 50–69.
Hansen, F. E. 2003. Tonesystem. In Gads musikleksikon: Sagdel, Vol. 2, edited by F. Gravesen and M. Knakkergaard, 1641–1646. Copenhagen, Denmark: Gad.
Hui, J., and T. Deacon. 2010. The Evolution of Altruism via Social Addiction. In Social Brain, Distributed Mind, edited by R. I. M. Dunbar, C. Gamble, and J. Gowlett, 177–198. Oxford and New York: Oxford University Press.
Huron, D. 2008. Lost in Music. Nature 453 (7194): 456.
Ingold, T. 1999. “Tools for the Hand, Language for the Face”: An Appreciation of Leroi-Gourhan’s Gesture and Speech. Studies in History and Philosophy of Biological and Biomedical Sciences 30 (4): 411–453.
Levy, K. 1998. Gregorian Chant and the Carolingians. Princeton, NJ: Princeton University Press.

Maynard Smith, J., and E. Szathmáry. 1995. The Major Transitions in Evolution. Oxford: Freeman Spektrum.
Odling-Smee, J. 2010. Niche Inheritance. In Evolution: The Extended Synthesis, edited by M. Pigliucci and G. B. Müller, 175–207. Cambridge, MA: MIT Press.
Ong, W. J., and J. Hartley. 2012. Orality and Literacy: The Technologizing of the Word. 3rd ed. London and New York: Routledge.
Rae, G. 2012. Hegel, Alienation, and the Phenomenological Development of Consciousness. International Journal of Philosophical Studies 20 (1): 23–42.
Saulnier, D. 2009. Gregorian Chant: A Guide to the History and Liturgy. Orleans, MA: Paraclete Press.
Scott, J. C. 2017. Against the Grain: A Deep History of the Earliest States. New Haven, CT: Yale University Press.
Sinding-Larsen, H. 1983. Fra fest til forestilling: Et antropologisk perspektiv på norsk folkemusikk og dans gjennom skiftende materielle, sosiale og ideologiske betingelser fra nasjonalromantikken og fram til i dag. Magister Artium dissertation. University of Oslo.
Sinding-Larsen, H. 1987. Information Technology and the Management of Knowledge. AI & Society: The Journal of Human-Centred Systems and Machine Intelligence 1 (2): 93–101.
Sinding-Larsen, H. 1991. Computers, Musical Notation and the Externalization of Knowledge: Towards a Comparative Study in the History of Information Technology. In Understanding the Artificial: On the Future Shape of Artificial Intelligence, edited by M. Negrotti, 101–125. London: Springer.
Sinding-Larsen, H. 2008. Externality and Materiality as Themes in the History of the Human Sciences. Fractal: Revista de Psicologia 20 (1): 9–17.
Szathmáry, E., and C. Fernando. 2011. Concluding Remarks. In The Major Transitions in Evolution Revisited, edited by B. Calcott and K. Sterelny, 301–310. Cambridge, MA: MIT Press.
Thompson, W. F. 2013. Intervals and Scales. In The Psychology of Music, 3rd ed., edited by D. Deutsch, 107–140. San Diego: Elsevier.
Wishart, T., and S. Emmerson. 1996. On Sonic Art. Contemporary Music Studies, 12. Amsterdam: Harwood.

Chapter 11

“. . . they call us by our name . . .”: Technology, Memory, and Metempsychosis

Bennett Hogg

Introduction

In the following chapter I shall be proposing that living, as we do, in a world where sound recordings are a major element in our sonic ecosystems, we cannot think about sound without considering the ways in which recording technologies affect and inform our experiences of listening more widely. That perception and memory have been invoked to account for sound recording is widely noted (perception and memory forming a structurally congruent pair to recording and playback). However, in line with phenomenological positions developed since Husserl, we cannot discount imagination when we talk about perception and memory. This problematizes the assumed congruency between sound recording and memory, there being no equivalence of imagination immanent to the medium of sound recording itself (as opposed to imaginative ways artists might use sound recording). After examining several problematic mappings of sound recording and memory I shall be proposing the animistic doctrine of metempsychosis, or the transmigration of souls, as a more suitable model of sound recording than the more obvious and culturally embedded one of memory.

Recordings of all kinds, from the written word to the digital photograph, have, for millennia, held associations with memory. Since at least Plato’s concerns that writing, as one form of recording, undermines human memory, to the relatively recent use of the term “memory” to refer to storage of information on computer hard drives, recording and memory have gone hand in hand. The etymology of the word “recording” refers, of course, directly to remembering—from the Latin recordare—though it is worth pointing out at the outset that to record and to remember are not always the same thing, though they are often part of the same, larger process. Freud, in 1930, proposed the gramophone as a prosthesis of memory, along with the camera as one of the “materializations” of the “innate faculty of recall” ([1930] 2004, 35). Sound recording has been widely—and on the whole unproblematically—figured as a metaphor of memory, as has the photograph. In parallel to this, memory has been conceived in terms of sound recording—or other forms of inscription (Freud [1924] 1961; Draaisma 2000; Terdiman 1993)—such that it is not always fully possible to determine exactly which is the metaphor and which the original object; indeed, as with so many phenomena, it is difficult to say which aspect is originary and which consequent—the conundrum of the chicken and the egg. Metaphors do have a tendency to acquire power over their referents, though (see also Walther-Hansen, volume 1, chapter 23, for more on metaphor and recording technology). Even as they illuminate those aspects of a phenomenon which they resonate with, their apparent efficacy can dazzle us and cast into shade those aspects of a phenomenon that are not accounted for in the work the metaphor does.

We should not, therefore, take the prosthesis of memory to be the same thing as memory—a prosthesis may extend human capacities to remember, or compensate for the failures of memory (Armstrong 1998, 78), but its prosthetic activity is only really active as one among many different elements that go together to make up memory tout court. Memory, even as a metaphor, is less like a recording than we might at first think, complicated as it is by being a malleable element within a greater ecosystem of embodied consciousness. Recordings behave in ways very unlike memory, for similar reasons, being enmeshed in their own cultural, material, and creative ecosystems which, though having significant overlaps with ideas of memory because of their mediating role in human culture, are also in some respects quite radically separate from, and not well accounted for by recourse to, models drawn directly from human memory.

In particular, recent thinking has credibly challenged formerly held ideas about the ways in which perception, memory, and imagination work together. A traditional linear model, in which perception sends images into memory which serves as a resource for imagination to draw on, is perhaps too strictly causal to account for the complex procedures involved in many acts of imagination; doodling absentmindedly on a piece of paper, for example, and then realizing that one has drawn a monster. Though distinct and dissimilar from one another, memory, perception, and imagination cannot, in terms of the ways in which they interact and mutually inform one another, be separated from one another, as Bergson asserted early in the twentieth century. In this chapter, I have used the constellation of memory and imagination as mutually critical tools to destabilize a received wisdom that understands recording and memory as being adequately similar to one another.
The key problem with the admittedly persuasive idea that memory and recording can stand as productive metaphors of one another is, in a nutshell, that recordings can stand on their own, and the signals they contain can remain more or less unchanged,1 can persist as entities, whereas human memories, through their positioning inside of a psychic ecosystem of consciousness, action, and agency in which forgetting, imagination, and supposition render them much less fixed, are much less discretely organized with respect to one another. Imagination, then, is my principal critical tool for prizing apart the connection of memory and recording, and at the same time a phenomenologically informed understanding of imagination affords a process through which other conceptualizations of recorded sound become possible. What follows is, to paraphrase Freud, speculation,2 but this is a speculation that is concerned less with branching out into the unknown than with seeking to integrate my different readings of the disparate cultural phenomena that have become associated with sound recording.

Memory and Imagination

Once we start looking into memory and imagination it soon becomes clear that we are dealing with multiplicities rather than directly definable, unitary phenomena, each of which has convoluted histories, and which, depending on the philosophical approach grounding them, continue to carry aspects of these histories into the present. Sometimes such aspects seem to cross over to, or to repopulate, sociocultural phenomena in the present day. Casey, for example, notes how memory has been “frequently confined to a passively reproductive function of low epistemic status” (1977, 187) in Western philosophy since at least Aristotle. The notion that memory is passive and merely reproductive places it in a subordinate position within consciousness in relation to sense perception, which, in Hume, Kant, and to an extent Merleau-Ponty, is seen as primary. Insofar as sound recording has been associated with memory, this subordination of it with respect to perception finds echoes in Adorno’s dismissive words on sound recording as “not good for much more than reproducing and storing a music deprived of its best dimension, a music, namely, that was already in existence before the phonograph record and is not significantly altered by it” ([1934] 2002, 278). Even if, as Thomas Levin notes, a distrust of the mimetic in Adorno may be “to some degree a function of the Jewish taboo on representation so central to Adorno’s aesthetic” (Levin 1990, 25), it nevertheless fits in with a more widely distributed distrust of “mere” copies or images, also manifest in various iconoclastic moments such as Puritanism’s destruction of religious paintings and statuary in England in the sixteenth and seventeenth centuries, and antagonized by the likes of Warhol and Lichtenstein bringing techniques and ideologies of the mechanical copy into the mainstream art world in the 1960s.

But such a conflation of memory with the reproduction of copies misrepresents how memory operates—a misrepresentation that predates the inception of sound recording and which therefore demonstrates how already existent cultural and philosophical values colonize, as models of thinking, emergent technologies. Clearly, the phonograph was conceived as an extension of memory almost at the moment of its inception. Johnson reports that “at any future point in history” the recorded voice can be recalled (Johnson 1877), but the relatively “low epistemic status” that has historically accrued to memory is later compounded in the case of the phonograph through the central role sound recordings come to play (not really envisaged by its inventor in the early days) in mass culture—Adorno and Horkheimer’s culture industry—whose financial successes come at the price of a failure (if it is indeed a “failure”) to attain high cultural value as “Art.” Financial success and artistic failure are both, of course, factors in the reproducibility of identical copies: as Adorno articulates it, reproduction-as-repetition is “the very antithesis of the humane and the artistic, since the latter cannot be repeated and turned on at will but remain tied to their place and time” ([1934] 2002, 278).

In the eighteenth century, memory and imagination were seen as subordinate, secondary phenomena with respect to perception, “the act of acts, from which all other acts of mind are seen to stem” (Casey 1977, 188). In Hume’s thesis, for example, sense “impressions” are primary, and are then “copied or reflected by ideas” (Warnock 1976, 37). Hume’s “ideas” seem to have the character of remembered “impressions,” but are not counted as “memories” per se. For Hume, the mental force that turns impressions into ideas is imagination, articulating something of the closeness with which imagination and memory have been associated together, beyond their commonality as the poor cousins of perception. In the same philosophical tradition, Hobbes had already proposed that “Imagination and Memory are but one thing, which for divers considerations hath divers names” (cited in Casey 1977, 188), and something of this seems to carry through into Hume’s “ideas.” That imagination might have a sort of bridging function is implicit in what has just been outlined, and also strongly present in the ways in which Kant, in Warnock’s analysis, understood an existence that was either purely intellectual or purely sensory to be an impossibility, with imagination serving to bring the intellectual and the sensory together. “Neither understanding alone nor sensation alone can do the work of imagination, nor can they be conceived to come together without imagination. For neither can construct creatively, nor reproduce images to be brought out and applied to present experience. Only imagination is in this sense creative; only it makes pictures of things. It forms these pictures by taking sense impressions and working on them” (Warnock 1976, 31). There are two degrees of imagination in Kant, the empirical (or reproductive) imagination, and the transcendental (or productive) imagination, distinguished from one another by the fact that “the transcendental imagination is said to have a constructive function . . . it is an active or spontaneous power” (Warnock 1976, 30). The reproductive imagination, by way of contrast, can only recreate images stored in the memory (see 26–32).

Memory and Representation

Toward the end of the eighteenth century Blake would write, “Imagination has nothing to do with memory,” in the margins to a collection of Wordsworth’s poems, identifying, in terms current among the nascent Romantic movement, imagination as “the Divine Vision” that is only vouchsafed to the “Spiritual” rather than “the Natural Man” (Blake [1927] 1975, 822). In seeking to promote the notion of a creative and spiritually inspired imagination, Blake repeats the compensatory denigration of another psychic element—memory. As noted, there has been a tendency to see memory in terms of the storage and retrieval of “images”—of copies. Casey notes that Western philosophy’s conceptions of memory—and, indeed, thought more generally—have, for centuries, been colored and informed by the model of representation (Casey 1977, 187; 1993, 166–167). The representational model for understanding memory has, for Casey, “been given a privileged place in thinking about memory überhaupt.” A representational model leads to an understanding of memory as “reproductive,” and as a result “we witness a working presumption that all significant human remembering . . . is at once representational and founded on isomorphic relations between the representing content of what we remember and the represented thing or event we are recalling” (Casey 1993, 166).

Dreyfuss notes how a similar ideological frame determines how human actions have, until recently, been understood, but proposes, in contrast to this, a phenomenological approach that resists the idea that representation is a prerequisite for action. “When everyday coping is going well one experiences something like what athletes call flow . . . One’s activity is completely geared into the demands of the situation” (Dreyfuss 1996, 35). Such “skillful coping does not require a mental representation of its goal” but rather, quoting Merleau-Ponty, “[a] movement is learned when the body has understood it, that is, when it has incorporated it into its ‘world,’ and to move one’s body is to aim at things through it; it is to allow oneself to respond to their call, which is made upon it independently of any representation” (Merleau-Ponty 1962, 139, quoted in Dreyfuss 1996, 37, emphasis added).

Memory beyond Representation

As an example of nonrepresentational memory Casey gives the scenario of when one sees again someone not seen for decades:

I do not check out an inner image, or other representation, of my friend: his face and body give themselves out as already (and instantly) recognizable to me, as featuring familiarity on their very sleeve, as it were. Here what is remembered, far from being continued in intrapsychic space, suffuses what I perceive as I perceive Burton [the friend]; and in this natural context Bergson is right to say that “perception is full of memories.” (1993, 167–168)

The implied dialogism (or more accurately polylogism) of recognition is differently presented by Varela and colleagues, yet the refusal of the “stored object” model of memory and recognition remains a strong trope. Visual sensory data from the eye is met by activity that flows out from the cortex. The encounter of these two ensembles of neuronal activity is one moment in the emergence of a new coherent configuration, depending on a sort of resonance or active match-mismatch between the sensory activity and the internal setting at the primary cortex. The primary visual cortex is, however, but one of the partners in this particular neuronal local circuit at the LGN level. . . . Thus the behaviour of the whole system resembles a cocktail party conversation much more than a chain of command.  (Varela et al. 1993, 96)

Husserl underwrites the distinction between memory and imagination by claiming that memory retains while imagination protends, but some slight self-reflection will show this to be too baldly schematic. Casey argues, rather, that memory and imagination, while being distinct mental phenomena in their own rights, are “indispensable” to one another; he emphasizes their “mutual inclusiveness and co-iterability” and “their inbuilt co-operativeness” (Casey 1977, 194–195). Memory and imagination are not just “difficult to disentangle” from one another; “each act is indispensable in its collaboration with the other . . . not just essential but co-essential, essential in its very co-ordination with the other” (Casey 1977, 196).3 For Harpur, this interconnectedness of imagination and memory is also highly significant. Memory does not simply keep records of past events like files but “mixes them up with fantasies and imagined events . . . It even makes things up altogether, like imagination.” Harpur points to the fact that Mnemosyne (Memory) is the mother of the Greek Muses and infers from this “that memory is pregnant with imaginative power” (2002, 215–216).

Harpur’s resistance to thinking of memory in terms of file storage is congruent with the view of contemporary cognitive science. Memories “are not stored intact in the brain like computer files on a hard disk” but are built up from different elements in a process that is also “open to the world,” in other words, a process that incorporates environmental and social elements, adjusting and reconstructing memories in the light of the contemporary situation and conditions (Auyang 2000, 283). Sanders underlines this, noting how beliefs, ideas, and memories are not only in brains but also out in the world, to the extent that we put traces into the world, “which changes what [we] will be confronted with the next time it comes around,” so that our memories are not only carried around inside of us but are inscribed, as it were, in our worlds (Sanders 1996, paragraph 36). This brings us to an ecological sense of memory—and of imagination—and though J. J. Gibson himself was cautious about “the muddle of memory,” excluding recognition from his understanding of perception, Auyang proposes that an understanding of memory in terms of an ecosystem of thought is a viable project (Auyang 2000, 300–301). It should now be clear that the ideas of Varela and colleagues, as well as Casey, Dreyfuss, and Sanders reported earlier, are broadly compatible with understanding memory and imagination as participating elements within a greater ecosystem of consciousness that includes intellection, embodiment, action, agency, and sociocultural relations.

Metempsychosis

We can begin to re-evaluate the cultural imaginary of sound recording by reflecting on the first page of Proust’s À la recherche du temps perdu, surely one of the most thoroughgoing and insistent engagements with memory and imagination in the Western literary canon. Standing as an emblem at the beginning of Proust’s novel is an implicit reference to metempsychosis. Having drifted to sleep while reading, it seems to the book’s narrator that

I myself was the immediate subject of my book . . . This impression would persist for some moments after I awoke. . . . Then it would begin to seem unintelligible, as the thoughts of a former existence must be to a reincarnate spirit. (Proust [1913] 1985, 3)

“My book” here refers directly to the one the young Proust was reading as he fell asleep, and though Proust does not claim that what happens when he drifts off to sleep while reading is directly mnemic, his placing of this figure at the very beginning of a book in which he and his memories are the immediate subject invites a double reading of what “the immediate subject of my book” intends. In Proust’s conflation of memory and reincarnation memory is positioned less like a process of recording and more like the experience of being otherwise re-embodied. Somewhat later in the first chapter Proust writes: there is much to be said for the Celtic belief that the souls of those whom we have lost are held captive in some inferior being, in an animal, in a plant, in some inanimate object, and thus effectively lost to us until the day (which to many never comes) when we happen to pass by the tree or to obtain possession of the object which forms their prison. Then they start and tremble, and they call us by our name, and as soon as we have recognized their voice the spell is broken. Delivered by us, they have overcome death and return to share our life.  (47)

This immediately precedes the famous incident of the madeleine, in which voluntary memory, “the memory of the intellect” that “preserve[s] nothing of the past itself” is presented as inferior to the so-called mémoire involontaire that spontaneously and unexpectedly takes over the whole being, triggered not by a volitional intention but by an encounter with an object charged with memory. The profound sense of joy that results from the tea-soaked madeleine is figured in terms of an invasion by “something isolated, detached, with no suggestion of its origin.” If the book is all memory, and returning from the book (as it were) is to be reincarnated, it follows that Proust intuits a strong connection between memory and the transmigration of souls in his novel. The persistence of a soul, or a disembodied personality or intelligence, after bodily death is congruent with the idea of metempsychosis—the transmigration of a soul from one embodiment to another. Sound and music are full of such references: the persistence of the voice of Echo, or the autotransformation of the nymph Syrinx into a Panpipe in Ovid’s Metamorphoses; the voice of the murdered younger sister in the Scottish traditional ballad The Twa Sisters or, as it is sometimes known, Binnorie, or its Germanic equivalent set down in Grimm’s Household Tales and set to music by Mahler as Das klagende Lied, which emerges from the playing of an instrument made from her mortal remains—a harp or fiddle strung with her hair or a flute made from one of her bones; spirit mediums speaking with the voices of the dead they claim to be channeling, spirits of the departed temporarily taking over the physical bodies of the medium to speak through them; there is even a hint of the voice qua soul or personality as a vehicle of transmigration in the behavior of the Cheshire Cat in Disney’s 1951 version of Alice in Wonderland, who appears as a disembodied voice, and persists momentarily as a voice and a fading smile (the detached outlet of a voice) after its body has dissolved.

After the telematic technologies of phonography and telephony made their appearance around 1876–1878, one immediate set of cultural constellations in which they were quickly bound up involved the survival of death, the supernatural, and a means to communicate with the dead. The conflation of voices disembodied by recording, telephony, or radio, with the voices of the dead has been a clearly identifiable trope in fantasy literature and popular culture almost since the moment of these technologies’ inventions (Connor 2000; Kittler 1999; Sconce 2000; Weiss 2002; Hogg 2008), “a historically mediated imaginary . . . in which death is part of a cluster of ideas that gather around the image of technology” (Danius 2002, 181; see also Peters 1999, 137–176). This is another clear instance, mentioned earlier in this chapter, of how emergent technologies are often colonized by ideas that predate their invention. Memory, or more properly remembrance, is intimately linked with cultural practices around the death of someone, and so it is perhaps not surprising that technologies that can record moments in time—such as the movie camera, photography, sound recording—should step in as extensions of the mnemic capacity. But in the case of sound recording, this mnemic capacity is articulated through an apparent reappearance of the living presence of the departed. In gruesomely vivid terms, the narrator of Renard’s Death and the Shell (1907) evokes the image of departed friends seemingly brought back to life by listening to recordings of their voices on a phonograph:

[O]n Wednesday the dead spoke to us. . . . How terrible it is to hear this copper throat and its sounds from beyond the grave! . . . it is the voice itself, the living voice, still alive among carrion, skeletons, nothingness. (quoted in Kittler 1999, 53, emphasis added)

The voice then, as cipher of the soul, passes between the worlds of the dead and of the living through the mediumship of sound recording. Though memorial in its tone, this seems more like a visitation or a ghostly encounter than a memory.

Ghosts in the Machine? Uncanny Connections between Telegraphy and Phonography

On the subject of ghosts, telegraphy served as a model for spirit communication from the time of the so-called Rochester Rappings of 1848. Here, two young sisters claimed to be able to communicate with spirits who knocked once for yes and twice for no in answer to questions put to them verbally. From this grew a spiritualist movement sometimes

technology, memory, and metempsychosis   227 characterized as “the spirit telegraph” (Sconce 2000, 21–28), “such fantastic visions of electronic telecommunications demonstrat[ing] that the cultural conception of a technology is often as important and influential as the technology itself ” (Sconce 2000, 27). From this it is interesting to speculate on what we might think of as phonography’s ghost, the technology that phonography left behind, as it were, the technology that Edison was intentionally working on when he chanced upon phonography: the automatic telegraph repeater. The distances over which Morse telegraphy was possible were originally limited by the resistance of the telegraph wires, which led to a degradation of signal to the point that, in order to transmit across the massive distances of the United States, repeater stations were needed in which a clerk would transcribe an incoming signal and then retransmit it manually to the next repeater station, and so on across the whole country. Not only did this take time and manpower, but it also meant that errors of reception and transmission, both technological and human, could creep into the system. Edison was working on a system whereby an incoming Morse signal would cut dots and dashes into a moving paper strip which would then pass through a mechanical reader. This reader would register, by means of a moving needle, the sequence of short and long signals, transmitting them onward almost instantaneously and, in theory, as absolutely exact copies. It was in experimenting with the increase in speed at which accurate relays were possible that Edison is said to have chanced upon the idea of recording sound—more particularly the human voice (Kittler 1999, 27–28; Wile 1977, 10–13). The original situation in which telegraph messages could be sent over long distances was that a human being would receive and transcribe the incoming message, which they would then retransmit in a second act of “writing.” The technology Edison was working on, though, moved from an imaginary of transcribing human bodies toward an imaginary in which information passes smoothly and automatically across great distances without passing through the body of other humans. Rather than a series of dictations and reinscriptions—each of them conceived, according to the traditional imaginaries of writing, as records—a disembodied energy passes from its origin to its destination without seeming to be recorded at all (though in Edison’s repeater it was, in fact, recorded, but not in writing by a human hand). To then seize on these transitional moments (the telegraph repeaters) in the relay of information and isolate them as a recording technology is, in some senses, to turn a means of transmission into a means of recording; records that were originally only made for the purposes of resending onward become things in themselves. Recording, then, according to this alternative genealogy, is the capture of an energy during a moment of its transmigration. If the story ended there we would have little more than an amusing observation, but much of Edison’s work was conducted in a milieu of spiritualist research and a grasping at electromagnetic explanations of avowedly psychic and supernatural phenomena, not with the aim of debunking myths and superstitions, but of arriving at scientific justification for such beliefs (Kahn 1994, 76–78; also Connor 2000, 362–393). 
As Connor puts it, “The commerce between the disembodied and the re-embodied, the phantasmal and the mechanical, is a feature in particular of the scientific understanding of the voice, but it [is] apparent too in the languages and

experiences of the Victorian supernatural, which coil so closely together with that work of scientific imagining and understanding” (Connor 2000, 363). And had the phonograph not made speech, “as it were, immortal” (Johnson 1877)? Like the soul? When we listen to an audio recording, is it really like remembering, though? Although the type of recording affects how we experience it—the voice of a departed loved one, a string quartet by Bartók, and the undefined distant rumble of traffic produce very different experiences—one general distinction between listening to a recording and remembering is that it is not necessary to re-experience something in the time it would take to happen in order to remember it. I can remember my wedding, for example, in an instant, whereas it occupied the best part of a whole day. I remember hearing Mahler’s Seventh Symphony in Newcastle City Hall similarly, as though it were, in mental time, a moment. Memory seems to compress and codify experiences, at some level. And though the etymology of phonography is concerned with the writing of sound, playing back a recording is nothing like reading, even if “reading” is a viable metaphor for what the machine is doing. Listening to a sound recording can seem like something altogether more intersubjective, and though it can evoke memories it feels more like an encounter with a sounding presence than recall per se. We see this in the “terrible” experience that Renard’s narrator has with “the voice itself, the living voice, still alive among carrion, skeletons, nothingness” (Kittler 1999, 53).

Proust: Telephone and Camera

Rather as Casey conceives of imagination and memory as “not just essential but co-essential, essential in [their] very co-ordination with [each] other,” Harpur finds Proust’s mémoire involontaire—the memory that seems to surge into consciousness unbidden, triggered by a chance encounter such as the madeleine dipped in tea—“analogous to imagination . . . the relationship between recollection and imagination is so richly interfused that it is as difficult to separate them as it would be to separate, in Proust’s novel, autobiography and art” (Harpur 2002, 212). Given that sound recording has been traditionally associated with recollection yet, as we have seen, the match is very far from perfect, it is useful to look briefly at two other technologies that occupy important positions in Proust’s elaboration of his remembering of times past. That the something that surges up as involuntary memory is “detached” and “isolated” finds a resonance in Proust’s experiences with his beloved grandmother, the hearing of her voice on the telephone isolated from seeing her face, and his view of her some short time later before she sees him and is able to return his gaze. In the latter instance, it is Freud’s other prosthesis of memory, the camera, that is invoked. Whereas the human eye is “marked by affection and tenderness . . . necessarily refracted by preconceptions” it “prevents the beholder from seeing the traces of time in the face of a loved one. . . . Memory thus prevents truth from coming forward” (Danius 2002, 15). The camera eye, though, invoked by Proust to

account for the shock of seeing the “red-faced, heavy and vulgar, sick, vacant . . . dejected old woman whom I did not know” (Proust quoted in Danius 2002, 15), “carries no thoughts and no memories, nor is it burdened by a history of assumptions. For this reason, the camera eye is a relentless purveyor of truth” (Danius 2002, 15). Though the horror dissipates as soon as eye contact is made, the moment experienced “hint[s] at her impending death.” Here, though, it is not the photographic record that is deathly (as in Barthes’s Camera Lucida, with its “anterior future” case of “he is dead, and he is going to die” (Barthes 2000, 96)), but the technological gaze of the camera. Memory, rather than a dead record, in fact moderates vision and hearing, humanizes and warms it. Beckett takes the same episodes and, like Danius, brings out the ways in which memory mediates rather than models recording. He writes, “the laws of memory are subject to the more general laws of habit” (Beckett [1931] 1999, 18–19), and it is habit, when he visits his grandmother after the experience of their telephone conversation has so unsettled him, which is “in abeyance, the habit of his tenderness for his grandmother . . . the notion of what he should see has not had time to interfere its prism between the eye and its object. His eye functions with the cruel precision of a camera; it photographs the reality of his grandmother” (27–28). The telephone, though, is a more productive technology to examine in terms of memory, imagination, and the transmigration of souls. As already noted, the telegraph had required the intermediation of human bodies to transmit over large distances whereas, as Connor puts it, the telephone “allowed for intimate communication between two interlocutors alone” (Connor 2000, 362). In its near immediacy of transmission, and its avoidance of writing and reinscription/retransmission, the telephone fails as a technological prosthesis of memory, or as a metaphor for memory. Connor, though, notes how the “striking co-incidence in time” of the discovery of the two inventions (within a year of one another) “allows us to see the two inventions as different forms of, or relays in, some single, but polymorphous prosthetic apparatus” (362). When we consider Connor’s suggestion in the light of the telegraph as not only a technological forebear of phonography, and the telematic communication system preceding telephony, but also as a metaphor for communication with the souls of the dead (the Spirit Telegraph), this “polymorphous prosthetic apparatus” allows for an imaginary in which memory and metempsychosis fold over one another, but where at least from the technological perspective metempsychosis, as the motion of an energy through different embodiments in media, seems a more plausible model of what is happening with sound recording technologies. I do not believe that, in phenomenological terms, our experience of sound recordings is like the retrieval of files from a storage medium, and neither is our experience of memory. It is worth noting that memory is nothing like a unitary phenomenon but is, instead, a whole range of sometimes independent, sometimes interconnected cognitive and embodied processes (Casey 1993, 165–169). This is one reason why it does not really work as a metaphor of recording, though recording does seem—on the surface of things—to work as a metaphor of remembering.
Recording sounds and playing them back seems like a mnemic process, but our experience of listening to such recordings is very different,

as already noted. As Peters has described it, there is a cultural, viable dimension to the recorded voice that comes over as in a sense "oracular," a direct transmission from a being whose consciousness is elsewhere and other to our own, and with whom there is no possible dialogue. Though to be in the presence of a recorded voice is in many respects to experience another subjectivity, it is not in any real sense an intersubjectivity of participants with equivalent status. We are told things by the recorded voice, but there is no sense of any interlocution. This gives rise, in Peters's account, to voices whose contents are available for hermeneutic readings, but not for dialogic interrogation. As such, the voices of the dead—which we encounter most conventionally of course through sound recordings—are "the paradigm case of hermeneutics: the art of interpretation where no return message can be received" (Peters 1999, 149).

Recordings

Sounds, though, are recorded. If we experience them as oracles, voices of the dead, idealized memories, writings, they nevertheless remain as records (rather than the more complicated memories). The content of a memory does not remain constant in itself, but is shaped, molded, even materially changed as it combines with other associations, joins forces with other bits of information in a story that may well conflate different events without our even realizing that this has happened. In Freud's theory of screen memories, for example, modifications to memory are repressed as modifications and "remembered" as having actually occurred; such transitive and unreliable qualities of human memory have already been extensively noted. Memory-content, then, has no independent existence, and no permanence; it is as much an effect of the ecosystem of which it is a part as it is an agent that has an effect on that ecosystem. Sound recordings, though, seem self-sufficient and, as Edison himself put it, are "as it were, immortal" (Johnson 1877).

We know this, though, not to be true. In the first case, recordings physically deteriorate; they are not immortal in any reliably ontological sense. In this respect, they seem once more to resemble human memory. Many of us will have known friends or relatives with Alzheimer's disease, or similar degenerative neurological conditions, and the loss of memory for these patients is often likened to a kind of (premature) death; someone is said to have "left" long before they actually died, for example. Listening to a very decayed phonograph recording—such as the one supposed to be of Brahms playing the piano—seems to suggest the fogged and abrasive feeling of not remembering.

In the final moments of Ibsen's play Ghosts, for example, Osvald is paralyzed in the final stages of syphilis, and rapidly collapsing into dementia. His mother asks him if there is anything he wants, and he asks for "the Sun," which has just risen in a bright dawn after days and days of rain and darkness. When his mother questions this, he repeats, "the Sun," and then, like a cracked record, repeats again "the Sun . . . the Sun" without any change of expression. Ibsen directs that Osvald "repeats dully and tonelessly," and then "tonelessly as before." In contrast, his

mother—still human, not reduced to a broken machine—is given extensive and detailed indications of the gamut of emotion that should be expressed: she "trembles with fear"; "throws herself on her knees"; "tears her hair with both hands"; "whispers as though numbed"; "shrinks a few steps backwards and screams"; "stares at him in horror" (Ibsen [1881] 1973, 97–98). Juxtaposed against one another, then, are Osvald's deathly, machinelike voice uttering a single sound like a broken record, and the terrible range of violently shifting, painfully human emotions specified by Ibsen and performed for the audience by Osvald's mother. Ghosts was written in 1881, only four years after the invention of the phonograph, and there is no evidence that Ibsen conceived of Osvald's expressionless repetitions because he had ever heard a broken phonograph cylinder playing. However, the fact that Osvald's loss of his personality is performed through a mechanical and expressionless cycle of repetitions does show a strong congruency in the cultural imagination between a dehumanized body and the sound of a broken machine, particularly one designed to sustain beyond the immediate present the human voice, that atavistic marker of presence and soul. The "talking machine" was a sensation in the late nineteenth century because it managed to do what no machine prior to it had been able to do4—it spoke. The distorted, identical repetitions of the cracked record, though, present nothing but a machine to the listener, and the suspension of disbelief that sustains the persistence of a human presence is broken.

Understandably, perhaps, there is a sense of pathos around a recording that has degenerated. There is a constellation of anthropomorphization that encompasses loss of memory, a sense of dehumanization of the broken voice, and the transience of all things human. At the same time, there is opened the possibility for a further anthropomorphization of the fading signal because the recorded speech, like memory, is not immortal, whatever Edison said about it in 1877. It is worth noting, though, that loss of conscious memory is only "tragic" when it is framed as such; forgetting is, in many respects, an essential condition of being able to function in the world without collapsing beneath the onslaught of multitudinous and mostly distracting or irrelevant connections and associations between disparate pieces of information.

But human memory does not only fade through a process of physical damage, or through the natural entropy of biological matter. Imagination fills in gaps, creates memories that never happened, forms connections that never pertained in "the real world." As I pointed out at the very start of this chapter, imagination is a phenomenon that shines a critical light on the too easy association of recording and memory and brings a subtle pressure to bear to explore other kinds of relations. For instance, we tend to think of analog recordings as more or less inert material onto which a signal is recorded. We experience the playback of a recording as a separation of the signal from the medium, in part because we recognize the sound as a voice or an instrument or a steam train which exists, or existed, outside of the recording.
Additionally, we experience the sounds as being physically separate from the medium as they emerge from loudspeakers, not the disc or tape itself.5 All of this serves to reinforce the notion that there is the material medium on the one hand, and the signal on the other, the medium serving only as a

partial and temporary means for the disembodied energy of the sound "itself" to be transmitted through. Again, the transmigration of souls seems a more apposite model for this experience than does human memory. But is this the only way to conceive of this? In the case of a vinyl recording, for instance, the "curves of the needle," as Adorno described them ([1927] 2002), are a physically integral part of the disc, and their playing back is, arguably, simply the sound the disc itself makes under the particular conditions of its playback via a turntable and cartridge. The voice or music we hear is a sonic property of a solid piece of matter; there is no real separation, any more than the sound of a cymbal, or a piano, could be considered as a separate property of the metal or the piano string. The sound of a cymbal or a piano is the articulation of the sonic properties of its constitutive materials as they are held in a particular state, and under particular conditions of excitation. Can we think of the sound of a recording in terms of being simply the sonic properties of the disc, or the tape, itself, once it has been excited in a suitable way? To do so allows us to reconfigure, in a less anthropomorphizing way, our understanding and our imagination of the fading signal.

For the surrealist poet and thinker André Breton, "the marvelous" is that which stands outside of the natural order, which confounds rational expectations we may have of the world, and which encompasses automata,6 the so-called fixed-explosion, and objective chance (Foster 1997). He proposes the ruin as an instance of the marvelous in which nature retakes culture in a reversal of the humanistic notion that culture holds dominion over nature. In this, he mirrors the very aim and objective of surrealism as "the future resolution of these two states, dream and reality, which are seemingly so contradictory," as it is stated in the first of the manifestoes of surrealism (Breton [1924] 2004, 14). This is a "resolution" that tends more toward a shift of balance away from the culturally and traditionally dominant side of the binarism toward placing more significance on the generally subordinate term, rather than a genuine leveling out. The ruin, as a formulation modeled on this pattern of thinking, shows a cultural construction succumbing to nature, just as automatic writing models the dictation of the unconscious (figured naively in much surrealism as more "natural" than conscious thought) against rationally constructed narratives. Automatic writing, or other forms of articulating "pure psychic automatism," is "[d]ictated by thought, in the absence of any control exercised by reason, [and is] exempt from any aesthetic or moral concern" (Breton [1924] 2004, 26). Taking this as a model, can we take another view of the apparently fading signal, with its pathetic anthropomorphic associations, and see not the fading of human memory, but simply the medium reasserting itself as nature reasserts itself in the ruin? Evading the separation of signal and medium in this way makes it possible to imagine the record as an object where signal and medium are completely integrated. Instead of a human memory, the sound of the record evidences something more like the soul of one departed that has migrated into an apparently inanimate, "inferior" object, as Proust states "the Celtic belief" to be.
It is important, I think, to ensure that one thinks in terms of the “reality,” as it were, of the nonseparation of signal and medium, and the “imaginary” of the metempsychosis of sound, just as memory is also an “imaginary” of recording.

As an alternative imaginary for recording, the notion of an entity migrating into an object and awaiting the right conditions for its release seems in fact no more or less fanciful than thinking of it in terms of a memory. Additionally, contemporary creative practices, such as sampling and remixing, tend to treat recordings as mobile and transferable entities that can be detached and reassembled, much closer to the cooperation of imagination and memory than to memory conceived in isolation. Such practices also model the mobility of energies and identities implicit in the transmigration of souls; musical creativity, at least where recordings form the prima materia of a practice, seems in many respects more like assembling (dare I say summoning?) a community of "souls" than a stitching together of memories, insofar as the materials are put into relation with each other, in which they mediate and interact with one another, rather than being selectively recalled as isolated quanta of information.

The reduction of voice to the circulation of "detachable phonemes" is an attribute of what Ivan Kreilkamp (1997) has termed "the phonographic logic" of the voice, which he elaborates through a close reading of Conrad's novella Heart of Darkness. Kreilkamp notes how, for early listeners to the phonograph, there was something "disturbingly anti-mimetic" about the phonograph, in which the "whole person is not made immortal," just "part-objects, signs standing in for the whole" (221). Such a detachment, characterizing the recording as "a disturbing fragmentation of the human subject into circulating bits of sound" (221), seems very different to the intimacy and incorporation of sound that characterizes a memory; we tend to talk about our memories, after all. Kreilkamp's observations, made in connection with the ways that Conrad figures the voice in his novella, propose a phonography of fading, broken, dissociated voices that combine into an uncanny present reality colored and inflected with the past and the coexistence in the present of the deceased.

Taken all in all, then, I have tried to make a case in which imagination is brought to bear on both sound and memory, destabilizing the assumed relationships between sound recording and memory. Imagining beyond the established model of recording as a metaphor of memory, and of memory as a metaphor of recording, I have sought after a fuller, more inclusive way to conceive of sound recording afforded by such a critical use of imagination. Drawing on various sources in which sound recording features in the cultural imaginary, I have proposed that metempsychosis, the transmigration of souls, though admittedly controversial, may afford a productive way to approach the ways that sound recordings have been deployed, and may have been consciously and unconsciously understood, in the period since their inception.

Notes

1. This is of course a more complex situation; the artifacts arising from damage to, or degradation of, certain media (scratches, glitches, drop-outs, etc.) have a perceptible effect on the status of the recorded signal, and the process by which a signal is reanimated and received

has much to do with when and by whom it is heard. Arguably, though, that which is essential in the original recorded signal is retained to a high degree.
2. At the very beginning of chapter 4 of Beyond the Pleasure Principle, Freud writes, "What follows now is speculation, speculation often far-fetched, which each will according to his particular attitude acknowledge or neglect. One may call it the exploitation of an idea out of curiosity to see whither it will lead" (Freud [1920] 1961, 24).
3. Casey notes three other instances where this is the case—screen memories, dreams (Freud), and time-consciousness (Husserl) (Casey 1977, 196).
4. In du Moncel's early account of the phonograph (du Moncel 1879), he writes of it in relation to the telephone and the microphone but also, almost as an afterthought, to Faber's Speaking Machine, a mechanical organ-like device that seems to have attempted to physically recreate, through bellows and differently shaped pipes and resonators, the phonemes of human speech.
5. I refer specifically to analog recording systems because the issues of signal and medium are perhaps more explicitly foregrounded than with digital systems, though the differences between digital and analog recordings with respect to the current discussion are not as large or significant as might be imagined. In addition, the culturally significant resurgence of cassette tape and vinyl discs over the past ten years or so, especially in DIY culture and certain areas of experimental music and sound art, means that these forms of recording are still very much a significant element in the audio culture of today.
6. As Foster (1997) shows, eighteenth- and nineteenth-century automata such as The Little Writer, The Chess Playing Turk, and the Harpsichord Player exercised a complex fascination over many of the surrealists, and André Breton in particular.

References

Adorno, T. W. (1927) 2002. The Curves of the Needle. In Essays on Music, edited by R. Leppert, 271–276. Berkeley: University of California Press.
Adorno, T. W. (1934) 2002. The Form of the Phonograph Record. In Essays on Music, edited by R. Leppert, 277–282. Berkeley: University of California Press.
Armstrong, T. 1998. Modernism, Technology and the Body. Cambridge: Cambridge University Press.
Auyang, S. Y. 2000. Mind in Everyday Life and Cognitive Science. Cambridge, MA, and London: MIT Press.
Barthes, R. 2000. Camera Lucida. Translated by R. Howard. London: Vintage.
Beckett, S. (1931) 1999. Proust and Three Dialogues. London: John Calder.
Blake, W. (1927) 1975. Poetry and Prose of William Blake. Edited by G. Keynes. London: The Nonesuch Library.
Breton, A. (1924) 2004. Manifesto of Surrealism. In Manifestoes of Surrealism, edited by A. Breton, translated by R. Seaver and H. R. Lane, 3–47. Ann Arbor: University of Michigan Press, Ann Arbor Paperbacks.
Casey, E. S. 1977. Imagining and Remembering. Review of Metaphysics 31 (2): 187–209.
Casey, E. S. 1993. On the Neglected Case of Place Memory. In Natural and Artificial Minds, edited by R. G. Burton, 165–185. Albany, NY: State University of New York Press.
Connor, S. 2000. Dumbstruck: A Cultural History of Ventriloquism. Oxford and New York: Oxford University Press.

Danius, S. 2002. The Senses of Modernism: Technology, Perception, and Aesthetics. Ithaca, NY: Cornell University Press.
Draaisma, D. 2000. Metaphors of Memory: A History of Ideas about the Mind. Translated by P. Vincent. Cambridge: Cambridge University Press.
Dreyfus, H. L. 1996. The Current Relevance of Merleau-Ponty's Phenomenology of Embodiment. Electronic Journal of Analytic Philosophy 4. http://ejap.louisiana.edu/EJAP/1996.spring/dreyfus.1996.spring.html. Accessed May 15, 2017.
du Moncel, T. A. L. vicomte. 1879. The Telephone, the Microphone and the Phonograph. London: C. Kegan Paul.
Foster, H. 1997. Compulsive Beauty. Cambridge, MA, and London: MIT Press.
Freud, S. (1920) 1961. Beyond the Pleasure Principle. In The Standard Edition of the Complete Psychological Works of Sigmund Freud XVIII, edited and translated by J. Strachey, 7–64. London: Hogarth.
Freud, S. (1924) 1961. A Note on the Mystic Writing-Pad. In The Standard Edition of the Complete Psychological Works of Sigmund Freud XIX: The Ego and the Id and Other Works, edited and translated by J. Strachey, 226–232. London: Hogarth.
Freud, S. (1930) 2004. Civilization and Its Discontents. Translated by D. McLintock. Harmondsworth, UK: Penguin.
Harpur, P. 2002. The Philosophers' Secret Fire: A History of the Imagination. London: Penguin.
Hogg, B. 2008. The Cultural Imagination of the Phonographic Voice 1877–1940. PhD thesis, University of Newcastle upon Tyne.
Ibsen, H. (1881) 1973. Ghosts. Translated by M. Mayer. London: Eyre Methuen.
Johnson, E. H. 1877. A Wonderful Invention—Speech Capable of Indefinite Repetition from Automatic Records. Scientific American 37 (20): 304.
Kahn, D. 1994. Death in Light of the Phonograph. In Wireless Imagination: Sound, Radio, and the Avant-Garde, edited by D. Kahn and G. Whitehead, 69–103. Cambridge, MA: MIT Press.
Kittler, F. 1999. Gramophone Film Typewriter. Translated by G. Winthrop-Young and M. Wutz. Stanford, CA: Stanford University Press.
Kreilkamp, I. 1997. A Voice without a Body: The Phonographic Logic of Heart of Darkness. Victorian Studies: An Interdisciplinary Journal of Social, Political, and Cultural Studies 40 (2): 211–244.
Levin, T. Y. 1990. For the Record: Adorno on Music in the Age of Its Technological Reproducibility. October 55: 23–47.
Merleau-Ponty, M. 1962. The Phenomenology of Perception. Translated by C. Smith. London: Routledge and Kegan Paul.
Peters, J. D. 1999. Speaking into the Air: A History of the Idea of Communication. Chicago and London: University of Chicago Press.
Proust, M. (1913) 1985. Remembrance of Things Past, Vol. 1: Swann's Way and A Budding Grove. Translated by C. K. S. Moncrieff and T. Kilmartin. Harmondsworth: Penguin.
Sanders, J. T. 1996. An Ecological Approach to Cognitive Science. Electronic Journal of Analytic Philosophy 4. http://ejap.louisiana.edu/EJAP/1996.spring/sanders.1996.spring.html. Accessed May 15, 2017.
Sconce, J. 2000. Haunted Media: Electronic Presence from Telegraphy to Television. Durham, NC, and London: Duke University Press.
Terdiman, R. 1993. Present Past: Modernity and the Memory Crisis. Ithaca, NY: Cornell University Press.

Varela, F. J., E. Thompson, and E. Rosch. 1993. The Embodied Mind: Cognitive Science and Human Experience. Cambridge, MA, and London: MIT Press.
Warnock, M. 1976. Imagination. London: Faber & Faber.
Weiss, A. S. 2002. Breathless: Sound Recording, Disembodiment, and the Transformation of Lyrical Nostalgia. Middletown, CT: Wesleyan University Press.
Wile, R. R. 1977. The Wonder of the Age: The Edison Invention of the Phonograph. In Phonographs and Gramophones, 9–48. Edinburgh: Royal Scottish Museum.

Chapter 12

Musical Shape Cognition

Rolf Inge Godøy

Introduction

This chapter shall explore notions of shape in our experiences of music, "shape" denoting various geometric figures or images that we may associate with the production and/or perception of music. For instance:

• Shapes of sound-producing body motion, such as of hands hitting a drum or bowing on a violin, or the shape of the mouth in blowing on a trumpet or in singing a high-pitched vowel.
• Shapes of sound-accompanying body motion, such as of hands gesticulating to a tune, or of heads nodding, feet stomping, or the whole body swaying to the beat of the music.
• The widespread use of shape-related metaphors such as flat, pointed, round, smooth, and so forth, when speaking and writing about music, in both everyday and more music-analytic contexts.
• The profusion of graphical shape images for representing features of sound in musical acoustics, such as various images of waveforms and spectra.
• Shape features in graphical scores and composition sketches, but also apparent in Western common practice notation, such as note patterns on score pages.
• Multimodal images of shapes in musical imagery, that is, combining images of sound, motion, and vision in salient recollections and/or inventions of music in our minds.

The basic tenet of this chapter is that what may be called shape cognition is not only deeply rooted in our experiences of music and in musical imagery but also has the potential to enhance our understanding of music as a phenomenon, to contribute to

various domains of music-related research, and to have practical applications in musical and multimedia artistic creation. Yet, given this plethora of shape instances in music-related contexts, there has, to my knowledge, been relatively little focus on shape cognition as such in music. There are some obvious reasons why shape cognition in music, until now, has not been the main focus of a more concerted research effort:

• Handling shapes in research is inherently challenging because shapes are distributed; that is, shapes typically have curves with peaks and troughs and all kinds of twists and turns. Hence, shapes are not reducible to singular values or abstract symbols, and they require conceptual and technological tools previously not readily available in music-related research.
• With the focus on abstract symbolic representations in Western musical culture, for instance, on discrete pitches and durations, concepts for continuous sound and body motion have been less developed.

However, I believe it is now possible to enhance our understanding and applications of musical shape cognition because of significant technological and conceptual advances: using state-of-the-art technologies we can record, process, and experiment directly with music-related shapes, meaning we can now handle chunks of temporally unfolding sound and motion as wholes. Also, we can map "hard" numerical data of sound and body motion shapes to subjective experiences; that is, we can link metaphorical labels of shape with signal-based sound and motion data. Additionally, through research on music-related body motion during the last decade, we now have a better understanding of how sound feature shapes and body motion shapes are connected in musical experience, providing us with hands-on skills for a systematic and extensive exploration of musical shape cognition. In other words, we see musical shape cognition as a unifying paradigm for handling complex, information-rich, and temporally unfolding musical sound and body motion in a holistic manner, for capturing all kinds of musical features, from basic acoustic and motion-related features to higher-level stylistic and affective features, as more "solid" and "instantaneous" overview images.

This chapter will include a summary of some prominent cases of music-related shape cognition, but shall take shape cognition in music further by trying to understand it as an active process. With a so-called motor theory perspective, I shall argue that musical shape images can emerge from sound-producing body motion and/or from active tracing (e.g., moving to the beat, gesticulating, or miming sound-production) in listening to, or in imagining, musical sound. My main tenet is that active tracing of sound features as shapes is integral to the perception and cognition of music. In Figure 12.1, we can see a simple example of this in what I have called sound-tracing, that is, asking listeners to spontaneously draw the shape of a musical excerpt that they just heard. Additionally, I also believe that the spontaneous tracing of sound features as shapes can be exploited in various music-related contexts; besides enhancing our understanding


Figure 12.1  Sound-tracings by nine listeners of the sound fragment built up of an initial triangle attack, a downward glide in the strings, and a final drum roll (spectrogram at the bottom). (Sound fragment from cd3, track 13, 20”–29”, in Schaeffer [1967] 1998.)

of music as a phenomenon, shape cognition may also be useful in sonic design, musical composition, performance, and multimedia arts, and a number of other domains by providing conceptual and practical tools for handling most musical features as shape images.

Notions of Shape

Music is ephemeral: musical sound and music-related body motion unfold in time and then vanish, yet we are (fortunately) left with memory traces of what we just heard and/or saw. The ephemeral nature of music is (and has been) a major challenge for research; however, given available technologies for recording, processing, and representing sound and music-related body motion, we now have the means to "freeze" or "make solid" the ephemeral, enabling close scrutiny of details previously not possible. Yet, given these means, the next major challenge has become how to make sense out of the vast amount of data typically generated by digitalization. On the other hand, traditional means of representation by Western music notation, although useful in conserving some aspects of music, is evidently incapable of representing many aesthetically and affectively highly significant features of musical expression. This concerns what we may call subsymbolic features of music, meaning the various features of sound, such as its so-called timbre (sometimes referred to as tone color), a number

of nuances in pitch (intonation) and loudness (dynamics), as well as what we may call the suprasymbolic features, meaning the expressive elements of musical phrases such as in timing and articulation, in so-called grooves, and in various affective and motion-related labels, for example, tense, relaxed, light, heavy, agitated, calm, and so on. We thus have the dual challenge of, on the one hand, representing salient features of music using digital technology and, on the other hand, going beyond the limitations of traditional Western notation. My answer to this dual challenge, then, is that of musical shape cognition, meaning that all features of music—that is, those at the subsymbolic, the symbolic, and the suprasymbolic levels—can be represented as shapes; shapes that enable us to systematically explore the many until now mostly inaccessible, yet highly significant, elements of musical experience. Musical shape cognition is thus a unifying conceptual and practical paradigm for studying and actively manipulating salient features of music at different timescales, ranging from the micro-level, subnote timescale features, to phrase- and section-level features of musical expression. In sum, we have the main challenge of bridging gaps between the quantitative (of digital representations of sound and body motion) and the qualitative (of holistic and subjective musical experience), and I believe musical shape cognition will be the best answer to this challenge.

It would be no exaggeration to say that expressions of shape are ubiquitous in musical discourse: there are innumerable occurrences of shape-related terms in music theory, music analysis, music aesthetics, music history, and other music-related disciplines. We typically encounter shape expressions for designating melodic, harmonic, rhythmic, textural, dynamic, and expressive features, as well as large-scale formal designs. Also, our Western music notation system, with its spatial distribution of notes on the pages of the score, could actually be seen as having some element of shape cognition and, secondarily, also as scripts for sound-production that in turn will result in body motion shapes. And, needless to say, various graphical scores and sketches found in musical composition and analysis contexts are instances of shape cognition. However, the more systematic approach to musical shape cognition should be seen in relation to some specific previous research endeavors:

• Seminal ideas on shape cognition in music extend back to classical Gestalt theory, with early proponents towards the end of the nineteenth century such as Ehrenfels and Stumpf and, a bit later, Koffka, Köhler, and Wertheimer, who were all concerned with musical features as shapes (Smith 1988; Leman 1997; Godøy 1997b). A number of Gestalt ideas have been extended into more recent music theory (Tenney and Polansky 1980), into auditory research (Bregman 1990), and into music perception research on melodies (Dowling 1994).
• The single most important historical background for my present thoughts on musical shape cognition is the phenomenological approach to musical research advocated by Pierre Schaeffer and his colleagues (Schaeffer 1966, [1967] 1998). With the triple challenges of new music, music from other cultures, and new music










technology in the post-World War II era, the need to develop a more universally applicable music theory became evident to Schaeffer. To go beyond the confines of traditional mainstream Western music theory, Schaeffer and colleagues turned their attention to the subjective perception of sound, with the ambition of establishing a systematic classification of fragments of sound, of so-called sonic objects, of any type, origin, or signification, for the most part by a systematic ordering of sound features as shapes.
• Shape cognition plays an important role in acoustic and psychoacoustic research (De Poli et al. 1991), and it has been used in signal-based visualizations of musical sound (Cogan 1984) and in readily available software (e.g., SonicVisualiser, Praat, and AudioSculpt, as well as MIRToolbox, Timbre Toolbox, and other MATLAB-based software). Within these software development projects, there is ongoing work to try to extract more perceptually salient information from signals and to represent these features as "solid" shapes, that is, representations that can be exploited in the context of our work on musical shape cognition.
• In work with new interfaces for musical expression (NIME), there is the challenge of capturing and mapping body motion shapes to sound with the aim of enabling more human-friendly control of the many parameters that go into digital synthesis and processing of musical sound. As for motion data input, different technologies for motion capture are available (various sensors, infrared and video camera recordings). Associated processing tools (e.g., the MoCapToolbox, the EyesWeb software, and the AudioVideoAnalysis software [Jensenius 2013]) have been important in developing shape cognition, making the study of motion as "solid" shape images possible. Notably, this also makes possible the study of expressive and affective features of motion as shapes derived from motion data, for example, of amplitude, velocity, acceleration, jerk, and so forth.
• We have learned much from more general approaches to shape cognition in so-called morphodynamical theory (Thom 1983; Petitot 1985), an extensive theory of geometric cognition as a basis for capturing and handling complex and distributed phenomena in general. Also, in so-called cognitive linguistics, studies of image schemata (i.e., more generic shape images) and of metaphor theory suggest that shapes and spatial relations are crucial for all cognition (Godøy 1997a). Additionally, there has been some very interesting work on the display of quantitative information as shapes (Tufte 1983), with modes of representation that seem to have great potential for shape cognition in general.
• Lastly, we have seen shape cognition become a topic in so-called embodied music cognition, where the shapes of both sound-producing and sound-accompanying body motion are understood as integral to musical experience (Godøy 2001, 2003a; Leman 2008; Godøy and Leman 2010). In Figure 12.2 is an example of such sound-producing motion shapes of a pianist playing an excerpt from a Beethoven sonata, together with the notation and spectrogram of the resultant sound, demonstrating a case of the ubiquitous sound-motion shape relationships in music.


Figure 12.2  A synoptic representation of notation (top), spectrogram of resultant sound (next to top), motion shapes, and velocity shapes of the shoulders, elbows, and wrists of a pianist playing the opening of the last movement of L. v. Beethoven's Piano Sonata No. 17 Op. 31 No. 2 in D minor, The Tempest, demonstrating shape correspondences between score, sound-producing motion (including velocity shapes), and resultant sound. Reproduced with permission from the publisher, S. Hirzel Verlag, from Godøy, Jensenius, and Nymoen (2010).

As for embodied music cognition, I myself and colleagues have for more than a decade tried to advance our knowledge of musical shape cognition through the following topics:

• Imitations of sound-producing body motion, so-called air instrument performances, by both trained and nontrained listeners (Godøy et al. 2006).
• So-called sound-tracing studies, that is, listeners with different levels of musical training drawing sound shapes in listening (Nymoen et al. 2013). An example of such sound-tracing can be seen in Figure 12.1.
• Motion capture studies of performers (Jensenius 2008; Godøy et al. 2010; Godøy et al. 2016).
• Processing and representations of motion capture data (Jensenius 2013).
• Sonifications, that is, turning motion capture shape and other visual domain data into sound (Jensenius and Godøy 2013) (see the sketch below).
• Statistical processing and machine learning for feature classification (Nymoen et al. 2013).
• Theory development on chunking and emergence of shapes (Godøy 2013, 2014, 2017, 4–29).

Throughout, we have tried to make compilations of relevant findings together with our international partners as, for example, summarized in Godøy and Leman (2010), and we shall in the following sections of this chapter give an overview of some main aspects of sound and motion related to musical shape cognition.
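To make the sonification idea above more concrete, here is a minimal sketch of mapping a motion shape onto sound. Everything in it is an illustrative assumption—a synthetic, one-dimensional motion trajectory and an arbitrary pitch range—rather than a description of the tools cited in this chapter.

```python
import numpy as np
from scipy.io import wavfile

# Hypothetical one-dimensional motion trajectory: vertical hand position (mm),
# sampled at 100 Hz for 3 seconds, rising and falling slowly.
mocap_rate = 100
t_motion = np.linspace(0, 3, 3 * mocap_rate)
position = 200 + 150 * np.sin(2 * np.pi * 0.5 * t_motion)

# Resample the trajectory to audio rate and map position onto pitch:
# low position -> 220 Hz, high position -> 880 Hz (arbitrary choices).
audio_rate = 44100
t_audio = np.linspace(0, 3, 3 * audio_rate)
position_audio = np.interp(t_audio, t_motion, position)
freq = np.interp(position_audio, (position.min(), position.max()), (220.0, 880.0))

# Integrate the instantaneous frequency to get a continuous phase, then synthesize.
phase = 2 * np.pi * np.cumsum(freq) / audio_rate
signal = 0.5 * np.sin(phase)

wavfile.write("motion_sonification.wav", audio_rate, signal.astype(np.float32))
```

The resulting glissando is, in effect, the motion shape made audible; any other motion feature (speed, acceleration) could be mapped onto loudness or timbre in the same way.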

Motor Cognition

A core idea of the present chapter is that shape cognition is embodied, and that it extends to several sense modalities; that is, it is manifest in sound and motion, with motion in turn including vision, proprioception, haptics, and sense of effort. In particular, the motor theory of perception has claimed that images of sound-producing body motion are integral to our perception of sound. Initially presented in linguistics with the suggestion that language acquisition is not only a matter of becoming familiar with a set of sounds but also just as much a matter of learning the corresponding sound-producing motion of the vocal apparatus (Liberman and Mattingly 1985), it has been extended to other domains of human perception and cognition (Galantucci et al. 2006), including the visual domain (Berthoz 1997). Furthermore, there is now brain observation evidence of the spontaneous linking of sound and motion in perception (Haueisen and Knösche 2001; Bangert and Altenmüller 2003), including evidence of a neurophysiological predisposition for this linking (Kohler et al. 2002). The mental simulation of assumed sound-producing motion will, in most cases, be covert, but we may also sometimes have observable behavior in the form of imitation. This imitation may be variable in its accuracy, ranging from

very detailed to rather approximate and vague, as can be observed in cases of the aforementioned air instrument performance and as may be observed in various kinds of vocal imitation such as in scat singing and beatboxing.

The motor theory perspective implies that shapes of observed or imagined sound-producing body motion are projected onto whatever it is that we are hearing; for instance, we might project images of energetic hand motion onto ferocious drum sound, or slow bowing motion onto protracted, soft string sound. The idea is that the shapes of sound-producing body motion contribute to the mental schemas for perceiving musical sound. However, there is an important duality to shape cognition here: shapes may be considered "instantaneous" images—that is, something occurring "in the blink of an eye"—yet shapes may also be considered something that unfolds in time—something that has to be "set into motion" and more like a script that needs to be run through in a performance. This duality will be a recurrent topic in musical shape cognition, with a tentative understanding that perception and action may shift between "instantaneous" and "unfolding" shapes. This can be related to musical features, as suggested in Figure 12.3, meaning I can hypothesize that there is a core of more amodal shape cognition surrounded first by a circle of body instantiation, both as stationary posture shapes and as motion trajectory shapes that are in turn surrounded by a circle of musical features manifest variably as postures and motion shapes.

With my main tenet that active tracing of sound features as shapes is integral to the perception and cognition of music, and the associated idea that the spontaneous tracing of sound features as shapes can be exploited in various music-related contexts, we can work along the following lines:

• Collect music-related body motion by marker-based motion capture, video, and other means.
• Analyze music-related body motion in view of salient shape features.
• Analyze the corresponding musical sound, also in view of salient shape-related features.
• Apply advanced statistical methods (e.g., functional data analysis, canonical correlation analysis) and machine learning to establish correlations between shapes in different sensory domains (see the sketch at the end of this section).
• Systematically explore sound and motion shape correspondences by analysis-by-synthesis and by various perceptual experiments.

My present explorations of musical shape cognition can thus be characterized as mainly behavioral and signal-based (i.e., based on sound and body motion data); however, I am also inspired by various recent neurocognitive findings from noninvasive brain research methods (e.g., functional magnetic resonance imaging), in particular concerning motor theory and multisensory integration of sound and motion perception in music.
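As a minimal illustration of the correlation step in this workflow—far simpler than the functional data analysis or canonical correlation analysis named above—one can compare a sound-feature curve and a motion-feature curve directly. The data here are synthetic stand-ins, not measurements from any of the studies cited.

```python
import numpy as np

# Hypothetical, pre-aligned shape curves sampled at the same rate (100 Hz):
# a loudness envelope from the sound and the speed of a wrist marker.
rate = 100
t = np.linspace(0, 5, 5 * rate)
loudness = np.abs(np.sin(2 * np.pi * t))            # stand-in sound shape
wrist_speed = np.abs(np.sin(2 * np.pi * t - 0.3))   # stand-in motion shape, slightly shifted

def zscore(x):
    return (x - x.mean()) / x.std()

# Shape similarity at zero lag (Pearson correlation).
r = np.corrcoef(loudness, wrist_speed)[0, 1]

# Cross-correlation to estimate by how much one shape leads or lags the other.
xcorr = np.correlate(zscore(loudness), zscore(wrist_speed), mode="full") / len(t)
lag = (np.argmax(xcorr) - (len(t) - 1)) / rate      # in seconds

print(f"correlation at zero lag: {r:.2f}, estimated lag: {lag:.2f} s")
```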


Musical Timescales

In research on musical shape cognition, we need to be specific about what timescales we are dealing with. For instance, a short melodic phrase has a pitch contour, as does a whole melody, while a large-scale work, such as a whole symphony, can be perceived as having a (macroscopic) pitch contour. Previously, I have suggested distinguishing three main timescales of musical features (Godøy 2010a):

• Micro, that is, the less than approximately .5 seconds duration range of continuous sound and body motion, with features such as pitch, loudness, stationary timbre (or tone color), and various microtextural fluctuations related to shape metaphors such as smooth, grainy, rough, and so forth.
• Meso, typically in the very approximately .5–5 seconds duration range, and usually encompassing salient information on rhythmic, textural, timbral, harmonic, melodic, and overall stylistic and affective features, and very often related to salient body motion shapes of sound-production, such as of hands moving along the keyboard (see Figure 12.2).
• Macro, typically containing several meso timescale chunks, forming sections, whole songs, and more extended works of music.

Clearly, the micro and meso timescales are the most important with respect to perceiving salient musical features such as timbre, dynamics, rhythmical-textural, melodic, harmonic, and motion shapes; a couple of seconds of music would be enough to tell us, for example, that it is a slow waltz, late romantic style, played by a small café ensemble, and so on (see Gjerdingen and Perrott 2008 for examples of duration thresholds for various features). But also, the macro timescale could be important for musical shape cognition; however, this would be more on a narrative or dramaturgical level, for instance, as found in various cases of program music.

On the micro and meso levels, we have attention and memory constraints that make these timescales special (see Godøy 2013 for a summary), but we also find sound-producing constraints that contribute to chunking at the meso timescale. This includes some crucial biomechanical constraints (e.g., limits to maximum speed of body motion, need for rest and change of posture to avoid strain injury, need to anticipate positioning of effectors [fingers, hands, arms, etc.] before tone onsets, etc.) resulting in so-called coarticulation, meaning a contextual fusion of events into meso timescale chunks (Godøy 2014). Also, there are some motor control constraints that contribute to the formation of meso timescale chunks. For one thing, human motor control seems to be hierarchical and goal-oriented (Grafton and Hamilton 2007), organized in the form of action Gestalts (Klapp and Jagacinski 2011), and furthermore, there have been well-founded suggestions that human motor control is intermittent (Loram et al. 2014), and

also that it may proceed by postures (Rosenbaum et al. 2007), something that I have called key-postures in sound-producing body motion (Godøy 2013, 2014). Each such key-posture is surrounded by what I call a prefix and a suffix so that there is a continuous trajectory to and from the key-postures, something which is closely linked with the aforementioned coarticulation (Godøy 2014). The intermittency in human perception, cognition, and action is especially relevant to our ideas on musical shape cognition, because a shape is, by definition, something that works holistically, as something overviewed "instantly," in a "now-point," to use Husserl's expression (Husserl 1991; Godøy 2010c).

Sound Features

Although readily available technologies provide us with a number of versatile means for capturing, storing, and retrieving musical sound, as well as processing and representing musical sound in different ways, we still have substantial challenges in extracting and representing perceptually salient features of musical sound. As pointed out by Pierre Schaeffer several decades ago, there is often a complex, or nonlinear, relationship between our subjective perception of musical sound and the acoustic substrate of the sound, a relationship of what Schaeffer called anamorphosis (Schaeffer 1966). Although psychoacoustic and music perception research in the ensuing decades has made substantial progress in exploring the relationships between acoustics and the subjective perception of sound, Schaeffer's idea of starting research with subjective notions of sound features is still an attractive proposition in the context of musical shape cognition.

The basic idea of Schaeffer was that of focusing on sonic objects, typically in the very approximately .5–5 seconds duration range. The focus on sonic objects emerged from what was a technological necessity in the early days of musique concrète in the late 1940s and early 1950s, working with so-called closed grooves (sillons fermés) on phonograph records in order to mix sounds into compositions, and where the experience of innumerable repeated listenings to such looped sound fragments led to a perceptual focus on the overall shape of the sound fragments as well as on their internal fabric. The focus on the overall shapes led to what is called the typology of sonic objects, and the focus on the various internal features was called the morphology of the sonic objects. Throughout this work with sonic object theory, the crucial element for Schaeffer was that all features should be thought of as concrete, as nonabstract shapes, as opposed to Western music theory, which Schaeffer thought of as basically abstract in its designation of pitch and duration symbols. The typology in Schaeffer's theory is, then, a scheme of concrete shapes, of what we would call dynamic and pitch-related envelopes. Also, the morphology consists of a scheme of concrete shapes, but here extended to include a number of spectral features.

Of these two schemes, the typology was considered a coarse, first sorting of sonic objects into three basic dynamic envelope shape categories:

• Sustained, meaning a relatively stationary and protracted type of sound.
• Impulsive, meaning a short sound with an abrupt, percussive, or plucked onset.
• Iterative, including several sounds in rapid succession as in a tremolo or trill.

It should be noted that there are so-called phase-transitions between these typological categories, dependent on the density, rate, duration, and so forth of the events and clearly related to constraints of sound-production (and probably also of perception and cognition): shortening the duration of a sustained sound will lead to an impulsive sound and, conversely, lengthening the duration of an impulsive sound will lead to a sustained sound; slowing down the rate of onsets in an iterative sound will lead to a series of impulsive sounds; accelerating the rates of onset of impulsive sounds will lead to an iterative sound; and so on. Also, in addition to these dynamic typological categories, there were three pitch-related shape categories in the typology:

• Stable pitch, a clear and unchanging pitch sensation.
• Variable pitch, a clear but changing pitch sensation, for instance, with a glissando.
• Nonpitched, that is, noise or a strongly inharmonic type of sound.

These two main categories (dynamic and pitch-related) were then combined in a 3 × 3 matrix of basic typological classification. The general procedure of both the typology and the ensuing morphology is that of a top-down feature differentiation, starting out with the overall envelopes and proceeding to subfeatures, sub-subfeatures, and so on, as far down as is deemed useful for characterizing perceptually salient features. The morphology is quite extensive in detail, so here are just two of the categories:

• Grain, meaning a rapid fluctuation within the sound, be that of loudness, pitch, or spectral content; for example, the grainy sound of a deep double bass tone.
• Gait, meaning a slower fluctuation within the sound, such as in the undulating motion in a dance tune accompaniment.

These morphological features would then be subject to further differentiations; for example, the rate and amplitude of the grain, the consistency versus fluctuation of the rate of the gait, and so forth. In addition, we may think of more conventional Western music theory features as shapes, both in terms of more purely sonic features, such as could be represented by spectrograms, and as note-level symbolic features, such as could be represented with Western common practice notation, alternatively as MIDI data:

• Textural and rhythmic patterns
• Melodic and modal patterns
• Harmonic patterns

• Timbral patterns
• Articulatory and expressive patterns
• Timing patterns

In particular, the last three feature categories are well suited for technological representations that combine Western notation with more detailed information on onset timing, duration, dynamics, and spectral features, thus making accessible performance-related information that it was previously not possible to represent. Such access to features of musical sound is presently made possible within the field of so-called music information retrieval, that is, the searching through large collections of musical sound by way of various sound perception criteria (Müller 2015).
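The envelope-based part of Schaeffer's typology outlined above lends itself to a simple computational reading. The following is only a rough sketch under stated assumptions: it uses librosa, a general-purpose audio analysis library that is not among the tools named in this chapter, and the thresholds are arbitrary illustrative values rather than anything proposed by Schaeffer.

```python
import numpy as np
import librosa

def rough_typology(path):
    """Crude sorting of a sonic object into Schaeffer's three dynamic
    envelope categories; thresholds are arbitrary, illustrative assumptions."""
    y, sr = librosa.load(path, sr=None, mono=True)
    duration = len(y) / sr

    # The dynamic envelope as a shape: frame-wise RMS loudness.
    rms = librosa.feature.rms(y=y)[0]

    # Onset density as a proxy for iterativity (tremolo, trill, drum roll).
    onsets = librosa.onset.onset_detect(y=y, sr=sr)
    onset_rate = len(onsets) / duration if duration > 0 else 0.0

    if onset_rate > 4.0:
        return "iterative"      # many onsets per second
    if duration < 0.5 and np.argmax(rms) < 0.2 * len(rms):
        return "impulsive"      # short, with energy concentrated at the start
    return "sustained"

# Example call (hypothetical file name):
# print(rough_typology("sonic_object.wav"))
```

The phase-transitions described above fall out of such thresholds naturally: lengthening an impulsive sound, or slowing the onset rate of an iterative one, pushes it across a category boundary.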

Motion Features

The basic idea of the aforementioned motor theory is that any sound event is also perceived as embedded in a motion event; hence it could be useful to make a more systematic overview of various types of music-related body motion. Following the scheme suggested in Godøy and Leman (2010), there are the following main categories:

• Sound-producing motion, consisting of so-called excitatory motion that transfers energy from the body to the musical instrument (or the vocal apparatus) such as by hitting, stroking, bowing, plucking, and blowing, as well as modulatory motion, such as changing the pitch with the left hand on a string instrument or moving the mute on a brass instrument. We also find so-called ancillary motion here, motion that is not directly sound-producing but made in order to facilitate sound-production, to avoid fatigue or strain injury, or to help in the expressive shaping of the music, as well as so-called communicative motion for coordinating musicians within the ensemble or for making some theatrical effects on the audience.
• Sound-accompanying motion, including all kinds of body motion made to music, such as in dancing, walking, gesticulating, nodding, and swaying, often in sync with some features of the music and often reflecting some overall sense of effort of the music (Godøy 2010b).

These are just the main categories; notably, music-related motion may in many cases also be multifunctional, that is, it can be both directly sound-producing and more theatrical, so as to enhance the total, multimodal experience of attending a concert. Also, there might, in many cases, be similarities in the energy envelopes of sound-producing and sound-accompanying motion, typically in dance and/or other kinds of body motion, such as in the classic example of Charlie Chaplin's shaving motions mirroring the sound-producing motions in the famous barber scene from The Great Dictator (see Godøy 2010b for a discussion of this).

Furthermore, we have different timescales at work here as well, ranging from the global to the local. Typically, we may have overall, global motion features such as the following (see the sketch at the end of this section):

• Quantity of motion, which may be calculated directly from the video data (frame-by-frame pixel difference) or motion capture data or other time-varying sensor data (total amount of distance traveled within a timeframe), and which may be a coarse indicator of overall activity level.
• Various derivatives of the motion data, such as velocity and jerk, indicative of the mode of motion, a high value for jerk meaning much abrupt motion, a low or zero value for jerk meaning rather calm motion, and so on.

And we may have more local motion features such as:

• Local trajectories, such as the shape of a beat, that are indicative of mode of articulation.
• Trajectories for different sonic features, such as ornaments and figures, indicating anticipatory motion, phase-transition, and coarticulation.

Common to all these motion features is that they may be experienced, conceived, and represented as shapes and, returning to the basic dynamic sound envelopes of Schaeffer's typology presented earlier, we see that they correlate well with motion features (Godøy 2006):

• Sustained, meaning smooth, protracted motion with little acceleration and no jerk.
• Impulsive, meaning abrupt motion with much acceleration and jerk.
• Iterative, meaning rapid back-and-forth motion, fast enough to make individual motion events fuse into a superordinate oscillatory motion.

As was the case for the sound features, these motion categories and features can in turn be combined into more complex textures, for example, into often-found foreground-background or melody-accompaniment textures of Western music, or into various heterophonic textures with composite sonic objects.
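The global motion features just listed can be computed directly from position data. Here is a minimal sketch under stated assumptions: the marker trajectory is synthetic stand-in data, and the feature definitions simply follow the general descriptions above (distance traveled, successive time derivatives), not any particular motion-capture toolbox.

```python
import numpy as np

# Hypothetical motion-capture data: x, y, z positions (mm) of one marker at 200 Hz.
rate = 200
t = np.linspace(0, 2, 2 * rate)
positions = np.column_stack([
    100 * np.sin(2 * np.pi * t),   # x
    50 * np.cos(2 * np.pi * t),    # y
    np.zeros_like(t),              # z
])

# Velocity, acceleration, and jerk as successive time derivatives of position.
velocity = np.gradient(positions, 1 / rate, axis=0)
acceleration = np.gradient(velocity, 1 / rate, axis=0)
jerk = np.gradient(acceleration, 1 / rate, axis=0)

speed = np.linalg.norm(velocity, axis=1)    # scalar speed per frame (mm/s)
jerkiness = np.linalg.norm(jerk, axis=1)    # high values suggest abrupt, "impulsive" motion

# Quantity of motion as total distance traveled within the time window.
quantity_of_motion = np.sum(np.linalg.norm(np.diff(positions, axis=0), axis=1))

print(f"quantity of motion: {quantity_of_motion:.0f} mm")
print(f"mean speed: {speed.mean():.0f} mm/s, mean jerk: {jerkiness.mean():.0f} mm/s^3")
```

Sustained, impulsive, and iterative motion, in the sense described above, would show up in such curves as low jerk, high transient jerk, and rapidly oscillating velocity, respectively.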

Multimodal Sound-Motion Shapes

Musical shape cognition combines sound and motion, and hence also motion-linked sensations such as vision, touch, proprioception, sense of effort, and possibly other sensations as well, all together calling the "purity" of music into question, and rather suggesting that we recognize and tackle sound-motion links as an inherent multimodal feature of music. Furthermore, "multimodal" here means that sound is combined



Figure 12.3  A schematic overview of music-related shape cognition elements. Assuming a core of amodal musical shape cognition, the next circle includes quasistationary body postures (as typically found in the vocal apparatus and in instrumental music as effector shapes) and motion trajectories that are then related to various other musical features in the next circle.

with shapes in the involved modalities, that is, shapes of motion, vision, proprioception, touch, and so on. Concretely, we can see how different, and variably multimodal, musical features relate to shape in Figure 12.3. Centered on a core of what may be understood as a very general and amodal musical shape cognition, this general faculty for shape cognition may be differentiated into the two main categories of posture-related shapes and motion trajectory shapes, but certainly this is more a matter of gravitation than a sharp divide. These two main categories can in turn be differentiated into a number of shape categories that are variously posture and/or motion related.

In more detail, the posture shapes, hence the quasistationary shapes, include the following sound-motion shapes (going clockwise around the outermost circle, starting at the top):

• Stationary spectral shapes, or more accurately, so-called quasistationary spectral shapes for natural sounds, often reflecting vocal apparatus sound shapes and/or instrumental sound shapes, with characteristic energy distributions, formants, spectral centroids, and so forth.
• Chordal shapes, both more abstract classifications (chord types) and actual distributions (spacing and voicing), related to stationary spectral shapes.
• Modality (tone semantic) shapes, reflecting typically recurrent interval constellations within a limited timeframe, typically modes such as Dorian, Phrygian, and Lydian, similar to the spectral and chordal shapes.

And the motion trajectory shapes include the following sound-motion shapes (continuing clockwise around the outermost circle), often displaying high levels of meso-timescale, within-chunk fusion by coarticulation:

• Melodic shapes, meaning contours at different timescales, ranging from small motives to more protracted melodies.
• Intonation shapes, including note-level nuances such as glissando, vibrato, and portamento.
• Affective shapes, meaning more composite shapes of, for example, temporal, dynamic, and timbral changes.
• Timing shapes, typically acceleration or retardation curves.
• Expressive (articulation) shapes, including accentuations, timing, and intonation nuances.
• Dynamic (envelope) shapes, with crescendo, decrescendo, and/or more rapid fluctuations.
• Rhythmic-textural shapes, the overall sound-producing motion shapes and the overall sound output shapes of all kinds of sound patterns.
• Spectral motion shapes, meaning changes in overall spectral features, for instance, by opening-closing of mutes or change of bowing position.

Needless to say, there are a number of challenges in studying sound-motion similarity in music; however, it is a feasible undertaking provided there is careful consideration of the ontological validity of the shape images across modalities (Godøy et al. 2016).

Musical Instants

A shape is intrinsically (by definition) instantaneous, be it in the case where we see it all at once (a figure on a page, a sculpture, an object), when we anticipate a sequence of motion, a sequence of events, or a sequence of sound, or when we need to scan a figure in time or listen to a sequence in time and form the shape image retrospectively by keeping the temporally unfolded shape in some kind of resonant buffer. We thus have a seemingly enigmatic relationship between continuity (stream of sensations) and discontinuity (instantaneous shape images based on overviews of continuous segments) in our reflections on musical shape cognition. However, this relationship between continuity and discontinuity may perhaps (at least partially) be understood as integral to motion planning and motor control, cf. the model of key-posture-oriented motion where key-postures are discontinuous but where their respective prefixes and suffixes are continuous, resulting in a continuous, undulating motion, only intermittently “punctuated” by key-postures (Godøy 2013). The idea of motor theory is that the production schema is projected onto whatever it is that we are perceiving, hence suggesting that we also perceive the key-posture orientation

“in reverse” of sound-production when we are perceiving musical sound. However, we clearly need more research here, in particular along the lines suggested by Grossberg and Myers (2000), with the metaphor of a short-term memory resonant buffer that keeps the entire chunk in consciousness until the sequential unfolding is perceived holistically as one chunk. Also, many perceptually salient features (e.g., style, affect, aesthetics) are dependent on some minimum duration segment of spatiotemporal unfolding, yet even this unfolding seems to be somehow perceived “all-at-once” or “in-a-now,” as was pointed out by Husserl. We seem to still have a long way to go in fully understanding this relationship between the continuous and the discontinuous in perception, and here are some ideas that might be relevant for our reflections:

• Edmund Husserl, in dialogue with several of his contemporaries, argued that perception by necessity proceeds in a discontinuous manner by intermittently interrupting the continuous stream of sensations at so-called now-points in order to make sense out of whatever it is that we are perceiving (Husserl 1991; Godøy 2008, 2010c). One possible extension of Husserl’s idea of interruption could be that of intermittent motor control as an “instantaneous” overview image of what is to come, that is, as an anticipation of sound-producing motion trajectories to be made in the immediate future (Godøy 2011, 2013). This is something that would fit quite nicely with the idea of shape overview as an integral element of anticipatory planning of body motion.
• Related ideas on the holistic experience of the present moment have been presented throughout the twentieth century (e.g., Pöppel 1997; Michon 1978; Stern 2004), and, although the arguments in favor of such basic discontinuity in perception and cognition may vary, there seems to be a general consensus that there is indeed such a “moment-by-moment” feature in human perception.
• In more music-related contexts, we have various accounts of the need for overview images of musical unfolding, and Hindemith suggested that an instantaneous overview of an entire composition was an essential feature of craftsmanship in music: “If we cannot, in the flash of a single moment, see a composition in its absolute entirety, with every pertinent detail in its proper place, we are not genuine creators” (2000, 61).
• Xenakis expressed similar ideas on the instantaneous overview of any musical work with his concept of “hors-temps” (“outside time”) and of having a distant visual perspective on the whole work so that it would appear in one snapshot: “This is to say that in the snapshot, the spatial relations of the entities, the forms that their contiguities assume, the structures, are essentially outside time (hors-temps). The flux of time does not intervene in any way. That is exactly what happens with the traces that the phenomenal entities have left in our memory. Their geographical map is outside time” (1992, 264).
• The transition from shape images to sound may be seen as a general phenomenon that may have different instantiations, ranging from that of transforming the notation symbols of a score into sound by way of sound-producing motion, to

more direct transformation of graphic images to sound by so-called sonification (Jensenius and Godøy 2013).
• Psychoacoustic research on auditory objects has shed some light on this holistic perception (Griffiths and Warren 2004; Bizley and Cohen 2013), partly through recourse to the multimodal features of sound. Additionally, we have an interesting modeling of the holistic perception of sequential unfolding by some kind of resonant buffer, a kind of short-term memory store (Grossberg and Myers 2000).

From a more conceptual point of view, we may in any case claim that shape is by definition holistic or nonpunctual, hence always temporally extended, and often also spatially extended in the sense of the effectors (fingers, hands, arms) and, more indirectly, in the time-domain and frequency-domain representations as extended shapes. How the transition between the continuous stream of sound-motion and the discontinuous images of shape (physical and/or mental images) actually works in our perception and cognition still seems to be quite enigmatic, yet it so very obviously seems to work, both in musical contexts and in general.

Shape Cognition in Musical Imagery

Evidently, musical sound creates memory traces in our minds, and it seems that we may mentally replay the music in the original tempo, or in slow motion, or in fast motion, even defying temporal unfolding as the sounds may be more in the guise of instantaneous overview images (cf. the previous section). What such an ability to recollect and reenact musical sound in our minds points to is the capability of musical imagery, meaning to make music present in our minds beyond the immediate or “original” listening experience. “Musical imagery” may be defined as “our mental capacity for imagining musical sound in the absence of a directly audible sound source, meaning that we can recall and re-experience or even invent new musical sound through our ‘inner ear’ ” (Godøy and Jørgensen 2001, ix). However, the expression “musical imagery” is sometimes also taken to denote mental images that accompany music, images of colors, textures, landscapes, and so forth, that listeners may have when listening to various kinds of music (Aksnes and Ruud 2008). Such imagery with music may of course reflect shape-related features of the music; however, I shall here limit my reflections to the imagery for sound and its associated sound-producing and/or sound-accompanying motion images. Knowledge about musical imagery has in the past couple of decades been enhanced by both behavioral and brain imaging research (see, e.g., Zatorre and Halpern 2005, for an overview). But musical imagery may also be seen in the broader context of mental imagery, and is closely linked with our general capacity for reenacting in our minds whatever it is that we may have experienced (Kosslyn et al. 2001), as well as having a capacity for simulating expected future events and actions in order to make us better

prepared for upcoming challenges in our moment-to-moment existence (Berthoz 1997). Furthermore, mental imagery might be voluntary as well as sometimes involuntary; in the latter case as unwanted persistent images in our minds. In music, we have the well-known experience of “tunes stuck in the head,” that is, of so-called involuntary musical imagery (see Williams 2015, for an overview). As for more voluntary musical imagery, one crucial question in our context is how such imagery is triggered, sustained, and put to use in various music-related tasks: musical imagery is integral to so-called mental practice for performance, in score reading, in composition, arranging, and orchestration, or in any situation requiring the recall of previously heard music, such as for the purpose of writing a review or for the sheer pleasure of savoring a great musical experience. More precisely, the question is that of how to control volitional musical imagery, that is, how to actively initiate and sustain images of musical sound in our minds. In line with the arguments above of the many and strong links between motion, sound, and vision in music, a possible answer to this question of volitional musical imagery could be that of shape cognition in music; the appearance and persistence of salient imagery of musical sound may be linked to enacted shape cognition, meaning that images of musical sound may be “brought to life” in our minds by active shape tracings. In other words, what could be called gestural imagery may here be understood as serving auditory imagery (Godøy 2003b). The concrete implementation of shape cognition in musical imagery could be by imagined performance, reminiscent of so-called air instrument performance (Godøy et al. 2006) or by various kinds of sound-tracing (Nymoen et al. 2013), that is, of moving hands, fingers, arms, torso, and so on, in tracing the shapes or miming the production of various sound features. The crucial element here would be the shifting between images of motion and sound, with motion imagery triggering sound imagery, and conversely, sound imagery triggering motion imagery, all the time with shape cognition as the translating factor between motion and sound. Such shifting between sound and motion, and between motion and sound, could be a way to explore features of both sound and of motion, potentially encompassing a large number of musical features that previously have had no name but are still aesthetically significant. This was in fact one of the main ideas of Schaeffer’s typology and morphology of sonic objects, and also something that may be put to use in contemporary sonic design: our awareness of sonic features may be enhanced by shape cognition in musical imagery, by actively and persistently shifting between images of sound shapes and motion shapes.

Prospects and Challenges

Musical shape cognition is becoming increasingly feasible with new technology; thanks to new conceptual tools and attitudes, we might soon achieve an enhanced understanding of how we perceive sensory impressions holistically as shapes. As argued earlier,

shape cognition can become useful in a number of music-related tasks, in particular by bridging the gaps between “hard” numerical data and “soft” qualitative concepts and, furthermore, these advantages of shape cognition are based on the fact that shapes are inherently holistic and extended, whereas more abstract symbols are inherently atomistic and local. But needless to say, we still have much to do in understanding how musical shape cognition works. On a more practical level, we have the following challenges:

• Finding and isolating experientially salient features, both in sound data and in motion data.
• Systematically exploring sound-motion shape relationships by analysis-by-synthesis and match-mismatch experiments.
• And, not to forget, exploring practical applications of musical shape cognition in performance, composition, improvisation, sonic design, and various multimedia arts, by systematic mappings between different representations.

And on a more general level, we also have some significant challenges:

• Understanding how the continuous-to-discontinuous transition in perception works.
• Developing better representations of perceptually salient features of sound-motion as shapes.
• Developing enhanced means for handling very large collections of sound-motion data in view of systematic feature mappings.

Yet, in spite of these outstanding challenges, it seems fair to conclude that most (if not all) perceptually salient musical features may be conceptualized as shapes. Our capacity for musical shape cognition should be considered one of the most powerful tools of both knowledge and skill in musical creation, and we are only at the beginning of tapping its potential.

References

Aksnes, H., and E. Ruud. 2008. Body-Based Schemata in Receptive Music Therapy. Musicae Scientiae 12 (1): 49–74.
Bangert, M., and E. O. Altenmüller. 2003. Mapping Perception to Action in Piano Practice: A Longitudinal DC-EEG Study. BMC Neuroscience 4: 26.
Berthoz, A. 1997. Le sens du mouvement. Paris: Odile Jacob.
Bever, T. G., and D. Poeppel. 2010. Analysis by Synthesis: A (Re-)Emerging Program of Research for Language and Vision. Biolinguistics 4: 174–200.
Bizley, J. K., and Y. E. Cohen. 2013. The What, Where and How of Auditory-Object Perception. Nature Reviews Neuroscience 14: 693–707.
Bregman, A. S. 1990. Auditory Scene Analysis. Cambridge, MA, and London: MIT Press.
Cogan, R. 1984. New Images of Musical Sound. Cambridge, MA, and London: Harvard University Press.
De Poli, G., A. Piccialli, and C. Roads. 1991. Representations of Musical Signals. Cambridge, MA, and London: MIT Press.
Dowling, W. J. 1994. Melodic Contour in Hearing and Remembering Melodies. In Musical Perceptions, edited by R. Aiello and J. A. Sloboda, 173–190. New York: Oxford University Press.
Galantucci, B., C. A. Fowler, and M. T. Turvey. 2006. The Motor Theory of Speech Perception Reviewed. Psychonomic Bulletin and Review 13 (3): 361–377.
Gjerdingen, R. O., and D. Perrott. 2008. Scanning the Dial: The Rapid Recognition of Music Genres. Journal of New Music Research 37 (2): 93–100.
Godøy, R. I. 1997a. Formalization and Epistemology. Oslo: Scandinavian University Press.
Godøy, R. I. 1997b. Knowledge in Music Theory by Shapes of Musical Objects and Sound-Producing Actions. In Music, Gestalt, and Computing, edited by M. Leman, 89–102. Berlin: Springer-Verlag.
Godøy, R. I. 2001. Imagined Action, Excitation, and Resonance. In Musical Imagery, edited by R. I. Godøy and H. Jørgensen, 239–252. Lisse, Netherlands: Swets and Zeitlinger.
Godøy, R. I. 2003a. Motor-Mimetic Music Cognition. Leonardo 36 (4): 317–319.
Godøy, R. I. 2003b. Gestural Imagery in the Service of Musical Imagery. In Gesture-Based Communication in Human-Computer Interaction, LNAI 2915, edited by A. Camurri and G. Volpe, 55–62. Berlin and Heidelberg: Springer-Verlag.
Godøy, R. I. 2006. Gestural-Sonorous Objects: Embodied Extensions of Schaeffer’s Conceptual Apparatus. Organised Sound 11 (2): 149–157.
Godøy, R. I. 2008. Reflections on Chunking in Music. In Systematic and Comparative Musicology: Concepts, Methods, Findings, edited by A. Schneider, 117–132. Frankfurt: Peter Lang.
Godøy, R. I. 2010a. Images of Sonic Objects. Organised Sound 15 (1): 54–62.
Godøy, R. I. 2010b. Gestural Affordances of Musical Sound. In Musical Gestures: Sound, Movement, and Meaning, edited by R. I. Godøy and M. Leman, 103–125. New York: Routledge.
Godøy, R. I. 2010c. Thinking Now-Points in Music-Related Movement. In Concepts, Experiments, and Fieldwork: Studies in Systematic Musicology and Ethnomusicology, edited by R. Bader, C. Neuhaus, and U. Morgenstern, 245–260. Frankfurt am Main: Peter Lang.
Godøy, R. I. 2011. Sound-Action Awareness in Music. In Music and Consciousness, edited by D. Clarke and E. Clarke, 231–243. Oxford: Oxford University Press.
Godøy, R. I. 2013. Quantal Elements in Musical Experience. In Sound, Perception, Performance: Current Research in Systematic Musicology, Vol. 1, edited by R. Bader, 113–128. Berlin: Springer.
Godøy, R. I. 2014. Understanding Coarticulation in Musical Experience. In Sound, Music, and Motion, LNCS 8905, edited by M. Aramaki, O. Derrien, R. Kronland-Martinet, and S. Ystad, 535–547. Berlin: Springer.
Godøy, R. I. 2017. Key-Postures, Trajectories and Sonic Shapes. In Music and Shape, edited by D. Leech-Wilkinson and H. Prior. Oxford: Oxford University Press.
Godøy, R. I., E. Haga, and A. R. Jensenius. 2006. Playing “Air Instruments”: Mimicry of Sound-Producing Gestures by Novices and Experts. In GW2005, LNAI 3881, edited by S. Gibet, N. Courty, and J.-F. Kamp, 256–267. Berlin: Springer.
Godøy, R. I., A. R. Jensenius, and K. Nymoen. 2010. Chunking in Music by Coarticulation. Acta Acustica united with Acustica 96 (4): 690–700.
Godøy, R. I., and H. Jørgensen. 2001. Musical Imagery. Lisse, Netherlands: Swets and Zeitlinger.
Godøy, R. I., and M. Leman. 2010. Musical Gestures: Sound, Movement, and Meaning. New York: Routledge.
Godøy, R. I., M. Song, K. Nymoen, M. R. Haugen, and A. R. Jensenius. 2016. Exploring Sound-Motion Similarity in Musical Experience. Journal of New Music Research 45 (3): 210–222.
Grafton, S. T., and A. F. de C. Hamilton. 2007. Evidence for a Distributed Hierarchy of Action Representation in the Brain. Human Movement Science 26: 590–616.
Griffiths, T. D., and J. D. Warren. 2004. What Is an Auditory Object? Nature Reviews Neuroscience 5 (11): 887–892.
Grossberg, S., and C. Myers. 2000. The Resonant Dynamics of Speech Perception: Interword Integration and Duration-Dependent Backward Effects. Psychological Review 107 (4): 735–767.
Haueisen, J., and T. R. Knösche. 2001. Involuntary Motor Activity in Pianists Evoked by Music Perception. Journal of Cognitive Neuroscience 13 (6): 786–792.
Hindemith, P. 2000. A Composer’s World: Horizons and Limitations. Mainz: Schott.
Husserl, E. 1991. On the Phenomenology of the Consciousness of Internal Time, 1893–1917. Translated by J. B. Brough. Dordrecht: Kluwer Academic.
Jensenius, A. R. 2008. Action, Sound: Developing Methods and Tools to Study Music-Related Body Movement. PhD thesis, University of Oslo. Oslo: Acta Humaniora.
Jensenius, A. R. 2013. Some Video Abstraction Techniques for Displaying Body Movement in Analysis and Performance. Leonardo: Journal of the International Society for the Arts, Sciences and Technology 46 (1): 53–60.
Jensenius, A. R., and R. I. Godøy. 2013. Sonifying the Shape of Human Body Motion Using Motiongrams. Empirical Musicology Review 8: 73–83.
Klapp, S. T., and R. J. Jagacinski. 2011. Gestalt Principles in the Control of Motor Action. Psychological Bulletin 137 (3): 443–462.
Kohler, E., C. Keysers, M. A. Umiltà, L. Fogassi, V. Gallese, and G. Rizzolatti. 2002. Hearing Sounds, Understanding Actions: Action Representation in Mirror Neurons. Science 297: 846–848.
Kosslyn, S. M., G. Ganis, and W. L. Thompson. 2001. Neural Foundations of Imagery. Nature Reviews Neuroscience 2: 635–642.
Leman, M. 1997. Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology. Berlin: Springer.
Leman, M. 2008. Embodied Music Cognition and Mediation Technology. Cambridge, MA: MIT Press.
Liberman, A. M., and I. G. Mattingly. 1985. The Motor Theory of Speech Perception Revised. Cognition 21: 1–36.
Loram, I. D., C. van de Kamp, M. Lakie, H. Gollee, and P. J. Gawthrop. 2014. Does the Motor System Need Intermittent Control? Exercise and Sport Sciences Reviews 42 (3): 117–125.
Michon, J. 1978. The Making of the Present: A Tutorial Review. In Attention and Performance VII, edited by J. Requin, 89–111. Hillsdale, NJ: Erlbaum.
Müller, M. 2015. Fundamentals of Music Processing. Heidelberg and New York: Springer.
Nymoen, K., R. I. Godøy, A. R. Jensenius, and J. Torresen. 2013. Analyzing Correspondence between Sound Objects and Body Motion. ACM Transactions on Applied Perception 10 (2): 9:1–9:22.
Petitot, J. 1985. Morphogenèse du sens I. Paris: Presses Universitaires de France.
Pöppel, E. 1997. A Hierarchical Model of Time Perception. Trends in Cognitive Sciences 1 (2): 56–61.
Rosenbaum, D. A., R. G. Cohen, S. A. Jax, D. J. Weiss, and R. van der Wel. 2007. The Problem of Serial Order in Behavior: Lashley’s Legacy. Human Movement Science 26 (4): 525–554.
Schaeffer, P. 1966. Traité des objets musicaux. Paris: Éditions du Seuil.
Schaeffer, P. (1967) 1998. Solfège de l’objet sonore. With sound examples by Guy Reibel and Beatriz Ferreyra. Paris: INA/GRM.
Smith, B. 1988. Foundations of Gestalt Theory. Munich and Vienna: Philosophia Verlag.
Stern, D. 2004. The Present Moment in Psychotherapy and Everyday Life. New York: W. W. Norton.
Tenney, J., and L. Polansky. 1980. Temporal Gestalt Perception in Music. Journal of Music Theory 24 (2): 205–241.
Thom, R. 1983. Paraboles et catastrophes. Paris: Flammarion.
Tufte, E. R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Williams, T. I. 2015. The Classification of Involuntary Musical Imagery: The Case for Earworms. Psychomusicology 25 (1): 5–13.
Xenakis, I. 1992. Formalized Music. Rev. ed. Stuyvesant, NY: Pendragon Press.
Zatorre, R. J., and A. Halpern. 2005. Mental Concerts: Musical Imagery and Auditory Cortex. Neuron 47: 9–12.

Chapter 13

Playing the Inner Ear: Performing the Imagination

Simon Emmerson

Introduction

Musicians and sound artists imagine sound; the imagination is the most powerful and flexible audio workstation we have. It can do fabulous transformations—some beyond current “real-world” capabilities. In this manner, it works on sound memory and experience. But the imagination is also a synthesizer. Here we are in uncharted waters. Real synthesis is experimental: we build circuits, set dials, or build algorithms and set parameters—whatever the means, we can sit back and listen to the possibly unexpected results. However, the imagination works differently in this mode—perhaps its inputs are not at all conscious. I might hear a sound in my imagination apparently from nowhere—I can perceive no immediate cause. How might we externalize and use this enormous power? And, furthermore, might imagining music become a form of performance?1 There is on one hand a discourse of language2: first, to better describe what these internal sounds and processes sound like but, second, as a means to encourage their fuller development. On the other hand, there is a subtle non- (pre-?) linguistic game. We do not need words for this—they might even get in the way and limit the options. This is about creativity and play—a continuous sequence of imagine, play, listen, modify, imagine. . . . There may be other nonverbal ways to externalize, for example—what is the role of visualization? Sound-to-visual synesthesia is a specific form of a more general phenomenon (Van Campen 2008). This, too, may be a descriptive response but potentially a powerful synthesizer as well. This is a form of reverse engineering—we describe an effect and work backward to reconstruct a possible material cause. This chapter will not deal directly with interface design, brain, and neuroscience. What I want to discuss and encourage is greater engagement of this infinite world in the creative process of sound- and music-making. It is entirely speculative although based on ideas and tools that seem to have had seeds in the last quarter of the twentieth century—indeed

my own observations of approaching fifty years of making music with electroacoustic technology3 suggest that, reflected forward another fifty, students studying today will live to see some of this speculation come to pass. The imagination has always been a powerful tool for sound creation. To share the fruits of this almost infinite resource we need some way to externalize—at least for the moment we cannot have access to imagined sound directly. Such a powerful tool has the potential for seduction, envelopment, and immersion. From Plato onward this is at the root of some belief systems’ concerns about the morality and ethics of sound and music. When Caliban declaims:

Be not afeard. The isle is full of noises,
Sounds and sweet airs, that give delight and hurt not.
Sometimes a thousand twangling instruments
Will hum about mine ears, and sometime voices
That, if I then had waked after long sleep,
Will make me sleep again: and then, in dreaming,
The clouds methought would open and show riches
Ready to drop upon me, that when I waked
I cried to dream again
(The Tempest, Act III, scene ii)

it is clearly a wondrous dream and we all (almost immediately) are imagining what sounds might make him “cry to dream again.” Interestingly, the word “noises” seems not to have a negative connotation in this passage. If for the moment these imaginings are private, how might we in future harness them to enhance our abilities and possibilities in the shared perceptible world of sound?

From Information to Imagination

In the 1980s, the phrase that was often used to describe emerging computer and digital applications was “music information technology.” This was a dry and practical reduction of music to information—eliminating or at least discouraging the descriptions of aesthetic dimensions. We now have an emerging “music imagination technology” (Emmerson 2011). Imagination is defined as “the faculty or action of forming new ideas, or images or concepts of external objects not present to the senses.”4

Imagination and Imagery

There is a tendency to bracket together “image” and “imagination”—the two have the same origin in the Latin “imago,” but we can make a distinction between them. Clearly imagination has a greater scope than image. For the purposes of this discussion I will

use image in its everyday sense as having a visual component, while imagination can have a much broader range of sensory, space, and time elements (audio, visual, tactile, and so forth)—thus an image may, of course, be real or take effect within the imagination.5

So as musicians we may imagine a scenario, an instrument, a performance, a sense of space, place and movement, a form, an atmosphere. These may not be sounding but give us a context for sound. For example, we may imagine a complex relationship expressed through mathematics that somehow drives the sound synthesis. John Chowning talks of something similar in his realization that the principles of frequency modulation (FM, well established in radio) might be applied in the audio domain.

Then, most importantly, we have what I would call “inner listening modes” that only partly correspond to those of the physical world. First, we can imagine acousmatic sound—that is the sound itself directly (with no sense of place and origin). If we are aware of our imagination at work we may not (after all) search for any source and cause. But then again, as a second such mode, imagining source and cause is also possible—we might construct imaginary instruments, environments, machines, and so forth (in scenarios as surmised earlier), and “hear” what we believe is their sounding.
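Chowning's realization mentioned above can be made concrete with a minimal sketch of frequency modulation brought into the audio domain (my illustration, not drawn from the chapter; the carrier, modulator, and index values are arbitrary assumptions):

```python
import numpy as np

# Minimal sketch of Chowning-style FM synthesis:
# y(t) = A * sin(2*pi*fc*t + I(t) * sin(2*pi*fm*t)),
# where fc is the carrier, fm the modulator, and I(t) the modulation index.
def fm_tone(fc=440.0, fm=110.0, index=5.0, dur=2.0, sr=44100):
    t = np.arange(int(dur * sr)) / sr
    # A decaying index makes the spectrum evolve over time, which is what
    # gives FM tones their "imagined instrument" quality.
    idx = index * np.exp(-3.0 * t)
    return 0.5 * np.sin(2 * np.pi * fc * t + idx * np.sin(2 * np.pi * fm * t))

tone = fm_tone()  # a two-second evolving FM tone as a NumPy array
```

Sweeping the index or the carrier-to-modulator ratio in such a sketch is one crude analogue of "imagining" a family of instruments that never had to be built.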

Form in Space and Time

The relationship of space and time in the sound imagination is complex and much more interactive and exchangeable than in the world around us. Form, shape, direction, extension can exist in either space or time—and often in an ambiguous “mix” of the two. Iannis Xenakis is cited by the architect Le Corbusier, for whom he worked in the 1950s, as stating:

Goethe said that “architecture was music become stone.” From the composer’s point of view the proposition could be reversed by saying that “music is architecture in movement.”  (Le Corbusier 1968, 326)

So, when sound becomes music there seem to be two approaches to its evolution: form in space, which seems to have an outside-time existence—a kind of architecture; and form in time, “forming”—an emergent property that is built over time into memory. Let us examine each in turn, starting with form in space. Composers have often described imagining musical form “outside time”—some have claimed to “see” forms of musical works in an instant. Form in space clearly has a relationship to the idea of the (Western musical) score where time is mapped onto space. But, perceptually, we have form in time—a kind of accumulation in which only at the conclusion of listening does memory assemble the whole.6 Of course, if it is a piece that already exists, and that I already know, then this works somewhat differently, as I may be comparing the present with a memory of the past—but for our imaginative synthesizer we are working much more in the moment on entirely new sound and music. I have a compromise (perhaps alternative) approach to bring these two

views together. Imagining sound through a form of performance—perhaps improvised, perhaps rehearsed—might bring us closer to the realm of truly musical relationships.

Reverse Engineering: From Reaction to Generation

Much study in the psychology of music tries to understand the reaction of the human to musical stimulus, increasingly embodied as well as socially and ecologically situated (Clarke 2005; Bharucha et al. 2006; Leman 2016). I suggest that we have here the elements of a tool set for a reversal of the process. One of the aims of the study and understanding of human reaction to sound might be to allow its generation from our ideas. From both film music libraries and music information retrieval (MIR)-based sound spotting—that is, finding sound with a given characteristic—we see the emergence of such toolkits. It is this mirroring that will form the basis of my discussion. This is what I described above as a kind of reverse engineering, starting with the result and working back to the possible cause.

First Steps Toward Reconstruction

We have seen a first stage of this process in the highly significant steps forward in neuroscience in recent years following on from improved methods of observing (visualizing) and recording brain activity during attention to both audio (listening) and visual signals. Most of these follow the rapid development of noninvasive methods of scanning such as the use of the electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI). This allows us to decide whether specific signals result in specific neural responses (involving complex spatial position, temporal, and strength measures)—a correlation of stimulus to response patterns. Both audio and visual stimuli have been used7—while any musician knows both make important contributions, their relationship and interaction at neurological levels has barely been addressed (Thompson et al. 2013). A second stage is now tentatively becoming established, namely a bolder and more speculative move beyond correlation to use of the observed activity data to reconstruct the original signal (I shall refer mostly to sound from now on). This in turn has two (often blended) approaches: reconstruction of a sound signal per se, and then the reconstruction of the effect of the sound. Grimshaw and Garner (2015) suggest the former is the real focus of sonic virtuality while Thompson, for example, moves toward the latter, optimistically asserting, “it should be possible to accurately reconstruct aspects of subjective musical experiences from neurophysiological signals” (2013, 56),8 also arguing for the need “to establish whether the reconstructed audio evokes the same musical experience as the

original audio. The results of these behavioral experiments will guide future research toward accurate stimulus reconstruction from brain activity” (Thompson et al. 2013, 6). This is based on the oldest of scientific methods that needs careful handling. If music (or any signal) X tends to stimulate neural pattern Y sufficiently consistently across a wide population then the observation of Y implies the imaginary or real presence of X.9 We know philosophically that a correlation is not necessarily a cause—but we tend progressively to adopt this belief the greater the supporting evidence (and in the absence of contrary evidence). In short, we are using an inverse procedure to logical deduction—namely induction—as we do not yet possess a direct causal mechanism between signal and response. A sophisticated application of this to speech resynthesis is described in Pasley and colleagues (2012). While great progress at the level of general features has been made they still report that “Single trial reconstructions are generally not intelligible. However, coarse features such as syllable structure may be discerned” (13). Early research focused on measuring the neural response to the presence of real audio or visual material. But more recently there have emerged the first stages of detecting the neural activity patterns relating to such things as memory and even imagination in the absence of any physical signal. Grimshaw and Garner (2015) include a meticulous review of this relationship. Their position is the most radical reappraisal of what (and where) sound is:

Sound is an emergent perception arising primarily in the auditory cortex and that is formed through spatio-temporal processes in an embodied system. (1, this idea is developed throughout the book)

To support this thesis they develop the idea of the sonic aggregate, which comprises two sets of components: the exosonus, a set of material and sensuous components; and the endosonus, a set of immaterial and nonsensuous components. The endosonus is a requirement for the perception of sound to emerge; the exosonus is not.  (4)

As a consequence of this (extensively argued) definition, Grimshaw and Garner suggest that all kinds of sound perception come under this one umbrella. Thus “imagined sound,” tinnitus, sonic hallucinations, and so forth are “sound” in the same way that sound in the presence of a sound wave is. However, they do not argue that these are all thus “the same.”10 It is interesting to ask whether a familiar memorized sound (perhaps the result of many listenings) will produce, when recalled, stronger patterns than a speculative imagined one. Zatorre and colleagues (1996) examine this question in detail but with respect to melodic (song) fragments—their detailed discussion looks at the degree to which different brain regions are engaged in imagining and perceiving the same auditory material for a set task. I am generalizing this question to include sound quality, which receives very little mention in the literature. I suspect—without any empirical evidence—that recognizable (real-world) sounds will produce stronger results when recalled, as opposed to the more abstract sound types often found in recent sound art.11

In their final chapter, Grimshaw and Garner (2015) state one possible ultimate aim as “simply thinking a sound when one wishes to design audio files” (196). I am, in this chapter, taking off from this point—namely the observation and use of the neural patterns resulting from imagined sound as the basis for a synthesis engine.12
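To make the logic of this "second stage" slightly more concrete, here is a toy sketch of my own (not the method of Pasley et al. or of any study cited here) of the general stimulus-reconstruction idea: fit a regularized linear mapping from neural-activity features to spectrogram frames on training data, then apply it to new activity, heard or imagined, to estimate a spectrogram. The data below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not real recordings): rows are time frames.
neural = rng.standard_normal((500, 64))         # 64 neural-activity features per frame
spec = np.abs(rng.standard_normal((500, 128)))  # 128 spectrogram bins per frame

def fit_linear_decoder(X, Y, ridge=1e-2):
    """Ridge-regularized least-squares map from neural features X to spectrogram frames Y."""
    X1 = np.hstack([X, np.ones((len(X), 1))])            # add a bias column
    A = X1.T @ X1 + ridge * np.eye(X1.shape[1])
    return np.linalg.solve(A, X1.T @ Y)                  # weight matrix, shape (65, 128)

W = fit_linear_decoder(neural[:400], spec[:400])

# "Reconstruction": estimate spectrogram frames from held-out (or imagined) activity.
new_activity = neural[400:]
estimated_spec = np.hstack([new_activity, np.ones((len(new_activity), 1))]) @ W
```

The point of the sketch is only the direction of inference: the mapping is learned from stimulus-response pairs and then run from response back toward stimulus, with all the inductive caveats noted above.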

Rendering Memory

Writers as well as research scientists have imagined the awesome power of tapping directly into another person’s memory and somehow “reading” it. In Dennis Potter’s final play created for British television, Cold Lazarus (1994/1996), the memories of a writer13 whose head has been cryogenically frozen for nearly four hundred years are extracted and projected in 3D into a relatively large space—a hi-tech laboratory whose funding might depend on selling the results to a worldwide TV network. As the experiment proceeds with increasing success, we see memories of landscapes and people and hear sounds and conversations that the small group of scientists tries to make sense of. The past is thus preserved and then projected into the present—or is it? It turns out it is not that simple—the observers slowly become aware that “it” is interacting with them; the head retains a degree of consciousness. As the head can observe the projected memories, memory, the present, and imagination become confused.

Rendering Imagination

Let us take Potter’s vision and give it a more optimistic, forward-looking projection. What of the future—the act of imagination of what might be—could this also not be projected in like manner to be rendered and synthesized at our behest? This is not synthesizing the future strictly but “the imaginative present”—we might project what we hear (and see) in our imagination right now. Can we really imagine sound without access to memory? This may be impossible to answer, as we clearly access memory without conscious intention. As Murray Schafer (1977) long ago pointed out, certain sounds seem to have a universal resonance, possibly through their role in both our long-term evolution and in our experience of prebirth sound in an amniotic state. Of course, we can consciously recall sound to our own present internal perception but to communicate this to others in any detail14 we need (at present) language and other descriptive symbol sets. Perhaps naming itself is an act of memory, being shorthand for this description—but for that we seem to need a degree of stability and repeatability. Throughout my life I have heard sounds while driving that I have wanted to capture—I am aware that this sensation lies uneasily between physical perception and imagination. Sometimes, I cannot tell the provenance of the sound at all—my considered view is that some aspect of the real sound around me provokes an additional imaginative layer and the two strongly interact.

Thus, this is more than simply externalizing imagination and effecting its synthesis into sound—it may one day be possible to unravel these two components and understand their interaction. The sounds are (unsurprisingly) typically drones but with great (and sometimes changing) internal detail and occasional sharper events as a kind of punctuation. Their mystery is compounded by ambiguous spatiality—both very close and very distant at the same time. I have repeatedly referenced the line from the text of Stockhausen’s Momente (from a personal letter from Mary Bauermeister): “Everything surrounding me is near and far at once”—as a personally relevant resonance. I believe it is a key modernist trope concerning spatiality in a mediated age.

Memory and Imagination

One aspect of imagination is that it may be seen as anticipatory behavior—a tool for survival. It has also been suggested that it has expanded into the mental bandwidth previously occupied by the need to memorize—whether Homeric epics or routes for navigation on land and sea, before maps, writing, and other externalized forms of memory. This process continues in our smart technology, especially in its use to stream our experiences to external platforms for sharing and later review. Why bother to remember (detail at least) when the device does this for you—and you can choose what to recall and review later?

Embodied Response

Perhaps other aspects of embodied response may be reverse engineered to join our battery of drivers of the new imaginative synthesizer. What do sound and music stimulate within us? One most extensively researched recently is that of mirror neurons firing “in sympathy.” Indeed, in their discussion of the relationship of music to the mirror neuron system, Molnar-Szakacs and Overy (2006) go so far as to write that

we propose that humans may comprehend all communicative signals, whether visual or auditory, linguistic or musical, in terms of their understanding of the motor action behind that signal, and furthermore, in terms of the intention behind that motor action.  (238)15

Watch dance and we mentally dance too. It is thus not surprising that actual muscular movement creeps back in, such as rhythmic foot tapping or bodily reduced (more or less) dance gestures. Then there are families of “air” activities—air guitar and air conducting are common—that indicate the embodied attunement and entrainment of the listener. These have already been actively harnessed in many computer game controllers and interfaces. Still strictly embodied and physical, we might add our air imagination— which will have rather different characteristics that we will explore in what follows.16


Time Scales of the Embodied

In 2001, I wrote an article for Computer Music Journal, “From Dance! to ‘Dance’—Distance and Digits” (Emmerson 2001). In part, this was a response to the anxiety surrounding the relationship of periodic (beat-driven) to nonperiodic electroacoustic musics. I suggested this apparent division related to the different time scales of periodicity and memory pertaining to the human body on the one hand and the environment on the other. The evolution of the human organism has been one of continuous adaptation to the environment including the earth’s gravitational field. The limits of the periodicities of limb and movement are mechanically defined to provide our world of meso-time. But our mind (which is, of course, a part of the body) deals with longer time scales of contemplation and reflection, as well as the ability to apprehend the longer rotations of seasons, stars, and the social structures and conventions of life itself. There is no clear division or border but a continuum of body time scales to mind time scales. The relationship between gesture and texture—Denis Smalley’s (1986) paradigm-defining duo—is crucial and shows a similar contrast of scale:

Gesture is concerned with action directed away from a previous goal or towards a new goal . . . Texture . . . is concerned with internal behaviour patterning, energy directed inwards or reinjected, self-propagating.  (82)

Thus, gesture tends to imply the performative—involving clear cause, effect, and agent—while texture tends to imply elemental continuity where the cause, effect, and agent chains are more complex. We can either describe these as being at micro-scales we cannot individually perceive, or as much larger immanent structures with some kind of continuous (and often vague) agency and causality. To harness the potential of “air imagination” we will need to capture the complete range of these time scales, from the immediately embodied rhythmic through to longer time scales of day, week, month and season, year, growth, and decline.

Synesthesia and the Visual Imagination

Some listeners describe a visual “accompaniment” provoked by sound. If we deal with music in general this has a developed literature. Van Campen (2008), itself a key text, has an extensive and comprehensive annotated bibliography. True involuntary visualization while listening to music—one of the most important forms of synesthesia—is rare. But there seems to be some residual (less specific) visualization in many listeners. More recent research suggests this is more common than we might think—whether related to the imagination is a moot point—but relevant here (Van Campen 2008, 42). Much music today is perceived acousmatically, that is, without any available visual clues as

to source and cause. Electroacoustic music, of course, often uses this as the basis of its aesthetics, deliberately bracketing out the search for origins and thus stimulating the imagination through sound alone.17 I personally do have imaginary visualization when listening to electroacoustic music: I see shapes, textures, colors, often “set” in a quasi-real-world vista and spaces. This might be abstract geometric or more of a “landscape environment.” I listen with eyes open to enhance this perception that seems to be in real space around me, superimposed on (and strangely integrated with) the actual visual information from wall materials, loudspeakers, other audience disposition, and set-up.

Notation, Visualization, and Evocation

There is available to us another possible tool that we might develop to capture some of these experiences. It has long been discussed how best to write down any music from an “aural” tradition. Much electroacoustic music belongs to such a tradition, one with little or no human-readable notation.18 The need to “fix” this sonic flux19 comes from several quarters. First, there is diffusion (or projection) in concert: the performer—often the composer—cannot remember every detail in the work and needs a representation to follow and anticipate successive sound events, thus allowing a strategy of presentation—a performance—usually at a mixing console to distribute the sound around the auditorium. Classic examples are found from the musique concrète and acousmatic music traditions over the years—for example, from Bernard Parmegiani (in Chion 1982) and recently a wonderful book including many such scores from Trevor Wishart (2012). Second, those who study and want to understand the music demand an “outside-time” representation to allow analysis and contemplation. Then we have also a more hidden and private domain: that of the composer’s mnemonic sketch score. This may have a range of functions within the composing process (Gray 2013); remembering the characteristics of sounds from one day to the next, suggesting possible sound trajectories and developments, but also, tantalizingly close to what we are developing here, projecting the evocative notation into sound rather than the reverse. This leads to the need for what has been termed “evocative transcription.”20 The aim is to create a symbolic visual representation (a “picture”) of the sound that both represents it and evokes it in memory and imagination. Many traditions have produced both hand-drawn and machine-driven transcriptions. In machine transcription, there are well-established time and frequency-domain representation procedures. In the frequency domain (which is key to our synthesizer ideas), spectrum representations may be “objective” but have inadequate correlation to—and evocation of—the actual sound as perceived. In more recent years, both the software packages Acousmographe (INA/GRM)21 and EAnalysis (Couprie)22 aim for a hybrid of machine-assisted and manual evocative transcription. Symbolic shapes from a library may be combined with freehand drawing and both may be subject to standard shape transformations. There are others that aim at a more machine-focused approach (Park 2016) but this will need to be informed

by extensive research into human perception to allow the machine’s knowledge to be accurately recognized by the human interpreter. There have been more discussions than real attempts at a degree of agreement on standard universal symbols for this visualization (Couprie 2004). But these generally have a degree of abstraction that is difficult for many to grasp (we do not automatically “map” the characteristics of sound in a standard way). Most symbolic systems to date have thus been geometric and avoid reference to any representation of a source object (for example a bell symbol to represent its sounding), yet many young people are happier in this more direct representative domain.23 Perhaps some degree of coherence and standardization of these visual attributions could be established through more thorough research into evocative notation. Experience and a lot of use will tell us what works.
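As a minimal illustration of the machine-assisted starting point for such transcriptions (my own sketch, not a description of Acousmographe, EAnalysis, or Park's system), the following extracts two simple frame-by-frame contours, loudness and spectral centroid, that could then be redrawn or annotated as evocative shapes; the frame and hop sizes are arbitrary assumptions:

```python
import numpy as np

def sound_contours(signal, sr=44100, frame=2048, hop=512):
    """Frame-by-frame RMS loudness and spectral centroid of a mono signal."""
    times, rms, centroid = [], [], []
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    for start in range(0, len(signal) - frame, hop):
        chunk = signal[start:start + frame] * window
        mag = np.abs(np.fft.rfft(chunk))
        times.append(start / sr)
        rms.append(np.sqrt(np.mean(chunk ** 2)))
        # Spectral centroid: magnitude-weighted mean frequency ("brightness").
        centroid.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    return np.array(times), np.array(rms), np.array(centroid)

# Example with a synthetic gliding tone standing in for a recorded sound.
sr = 44100
t = np.arange(sr * 2) / sr
test_sound = np.sin(2 * np.pi * (200 + 400 * t) * t) * np.exp(-t)
times, loudness, brightness = sound_contours(test_sound, sr)
```

Plotted against time, these two curves are exactly the kind of "objective" shapes the text notes as necessary but insufficient: they still need a human hand to turn them into something evocative.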

Reverse Engineering: “Air Imagining”

We aim to reverse engineer our “air imagining.” If we have the means to create an evocative transcription through the quasivisual imagination, we have the potential to work this backward: a synesthesic synthesizer—requiring software to translate a repertoire of shapes, attributes, colors, and textures into sound. And if generalization and universal agreement on notation is not feasible, personalization of choices and preferences should be possible. Perhaps later generations will be more at ease with a computer system seeming to know, or at least endlessly second-guess, our patterns of needs and desires, as in the placing of automated website advertising that appears to follow our recent—as well as making suggestions for future—purchases. So, for our evocative synthesis engine, the system can learn our individual preferences in graphic style, shape, and representation, and verbal attribution—as well as pattern matching to known sound types we have used in the past. The proposal inevitably suggests an experimental fuzziness—the aim in such an envisioning of synthesis is to allow greater creative and imaginative play, not instant formulaic generation. Predictability would soon lose the user’s interest. I am challenging the designers to harness learning engines and MIR24 search capability to work out my evocative transcription preferences and my visualization strategies, to drive the system back into sound synthesis. There are of course software packages and apps that are beginning to address this process (e.g., Metasynth).
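Running the previous sketch in reverse suggests what the simplest layer of such a shape-to-sound translation might look like. The following toy example (my own, not any existing package's method) reads a hand-drawn contour, stored as a few points, as a pitch trajectory and renders it with a sine oscillator; the frequency range and the logarithmic mapping are assumptions:

```python
import numpy as np

def render_contour(points, dur=3.0, sr=44100, f_lo=100.0, f_hi=2000.0):
    """Turn a drawn contour (x, y pairs with y in 0..1) into a gliding sine tone."""
    points = sorted(points)                   # order points left to right in "time"
    xs = np.array([p[0] for p in points])
    ys = np.array([p[1] for p in points])
    t = np.arange(int(dur * sr)) / sr
    height = np.interp(t / dur, xs, ys)       # contour height sampled over time
    freq = f_lo * (f_hi / f_lo) ** height     # map height to frequency on a log scale
    phase = 2 * np.pi * np.cumsum(freq) / sr  # integrate frequency to phase
    return 0.3 * np.sin(phase)

# A rising-then-falling "gesture" drawn as a few points.
drawn_shape = [(0.0, 0.1), (0.4, 0.9), (0.7, 0.6), (1.0, 0.2)]
audio = render_contour(drawn_shape)
```

A richer system would of course map color, thickness, or texture of the drawn mark onto timbre and dynamics rather than pitch alone, and would personalize those mappings as described above.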

Playing and Performing the Imagination

The synesthesic synthesizer may be played to produce new sounds—but what about new music? Now is not the time to discuss any distinction between the two ideas—sound may be music, music must effectively involve sound, but the boundary may be both

porous and flexible. That said, I do have a bias toward retaining somewhere in the relationship the idea of performance. Some of the actions that led to the imaginary synthesis described previously were essentially performative. But I hesitate to say they were performance. I want to make a fuzzy distinction between playing and performing. Playing might be seen as a search for suitable materials, performing as presenting some kind of structure—maybe perceived as “expression” or “argument”—beyond the individual components, although the two clearly overlap. Musicians are generally not dancers. Their movements have been accurately directed by mechanical technology toward the physical excitation of an object. The use of media has freed this up, allowing movement alone to control sound, bringing dance and music performance a step closer. But there often remains a strong residual desire for some sort of resistance. The study of such haptics informs interfaces and interactions where this enhances muscular control. The ultimate “air play” may combine free and resistive components.25

Space and Imagination

Space is essentially tactile—it is so much more than an abstract extension. Let us look back to the physical objects of the analog studio: switches, knobs, dials, and faders. We had to move ourselves, our limbs, to operate the system far beyond the reduced actions needed for a QWERTY keyboard. In the first instance, as the digital revolution took hold, these physical objects (switches, etc.) were steadily replaced first by “number boxes” and then screen icons representing the absent objects themselves. The funneling of information was extreme—to program a synthesizer in the early 1980s involved addressing a tiny handful of parameters on these diminutive displays, one group at a time, storing and moving on. It was not even possible to see, let alone feel, the disposition of the machine.26 We increasingly sense this loss of “tactile location”—the desire to play using our hands persists and has even increased in recent years. Thus, the analog studio was performed with wide-ranging physical gestures. These might bear a curious relationship to possible performance gestures at the mixing desk in the performance space. There are wonderful images of François Bayle working in the GRM studios in 1984 that capture this sense of performance, its energy and dynamism, perhaps with an improvisatory edge (Bayle 1993, photo V et seq. following page 27). This contrasts with the more controlled and judged performances on the Acousmonium (the multi-loudspeaker system originally designed by Bayle at the GRM in the 1970s [photo II–III, following page 16]).

Sound Sculpting, Sound Dancing

Glimpses of this possibility are seen in some of the earliest experimental inventions. At the musique concrète studios in Paris we see the pupitre d’espace (1951) designed to

perform the spatialization of the prerecorded sound around the audience.27 We can now, of course, adapt this for a real-time sound sculpture where the movement controls sound quality. Thus, the next stage of our imaginary sonification performance might be to manipulate sound in space as a malleable (even fluid) substance—to place, move, and “smear” sounds within that space, as a painter might sketch or, more appropriately, as a dancer might move or a sculptor might manipulate clay. This shifts the metaphor for externalizing imagination from 2D painting or movie to a 3D activity—dancing, bricolage, or sculpting. Yet again, many of the inventions and developments of previous decades may give us helpful clues as to how the new haptics (with resistance) and 3D representation can be adapted to the musician’s touch and feel. Our imagination of being a dancer/sculptor can be as a creator immersed in the sound—something our real sculptor can but dream of.28 From the STEIM studio in Amsterdam came some of the most inventive devices to harness the elemental (human) agency of movement. From 1984 on, Michel Waisvisz developed a series of controllers, known as “The Hands,” detecting hand and some body movement (Waisvisz 1985; later versions may be seen on the STEIM website,29 especially in videos of Waisvisz’s performances over the years). Using a later technology, Laetitia Sonami developed her “Lady’s Glove”30 and, in a more popular idiom, Imogen Heap has used similar controls (her Mi.Mu gloves31). Such gesture transducers might be a very suitable interface to capture the “air” controller gestures as we explore performance as a creative part of the imagination synthesizer. We must also remember another powerful tool that might be controlled by such dance or sculptural gestures.

An Imaginative Plug-In—Imaginary Sound Transformation

As we remarked in the introduction, the mind—the imagination—is also a fabulous sound-transformation device. While the sensational world is bound by some sense of an externally applied space and time, the imagination knows not these as boundaries. Just as we remarked on documented cases of composers glimpsing an entire piece in an instant, so too we might be able to grasp a complex transformation in a flash, then perhaps “play” it at something nearer the time of real performance. We might be able to compare alternative strategies for the sound to develop. Time compression and expansion can be a useful tool for creation but may not map simply to external world time. And so too with space: Gaston Bachelard’s intimate immensity (1964) speaks to an oneiric experience that I have often had in the twilight between wakefulness and sleep and also when entering the Olympic Stadium in London in 2012. Many composers (and movie sound designers) have tried to capture such an experience in real sound. If our imaginative experience of space can be surreal (even completely unreal) then it can act as a stimulus—an impossible goal we know we cannot reach but there might be fruitful experience in trying.


Where Do We Begin? Seeds and Provocations

Let us assume we do indeed have a suitable synthesis engine that can (in some way yet to be determined) respond to our imaginative synthesis wishes. But the system starts with a blank. Where do we begin? What do we start with? With the concrète traditions of music-making we start with a sound and play with it—an empirical and experimental approach. But, with our imaginary synthesizer, we have many possible sounds that do not yet exist. Or perhaps they do exist but are hidden from our consciousness until called forth. Let us divide the possibilities into seeds and provocations. I will not come to definitive conclusions here but make some key suggestions and questions that we can creatively address.

First, let us describe a seed stimulus: this might be external or internal—empirical or idealized. External, here, means the origin was a real sound and remembered. Then again, the seed could be entirely imagined—or perhaps a combination of sounds impossible in the world around. The system will not be perfect—we could even say it “guesstimates” what we are imagining. A (real) sound is made and play begins. The user can treat this first attempt as a source, and maintain the original thought as the target. This is a kind of “imagination control loop”—change slightly till matched, or till sufficiently close. Or, of course, we could treat any outcome as something entirely new with a future path of its own and forget the original stimulus. It might be the case that some imagined sounds are in fact physically impossible.32 Furthermore, “holding” a sound in the imagination unaltered while being compared with other sounds might be a difficult (perhaps impossible) task! Hence the need for the evocative transcription discussed earlier to help us fix it.

Provocations work somewhat differently. One line of thought throughout both modernist and postmodern musical discourse has been the creation of the unexpected as a major device—not unexpected simply from the listener’s perspective but from even the composer’s and performer’s perspectives. These involve generative procedures—usually some kind of automata that (more or less) decouple the immediate taste and will of the composer and performer. Some like external “models” for this reason: to generate what might not otherwise have been conceived. Needless to say, others reject them completely! Common in recent decades have been systems that are both mathematically beautiful and relate to a degree to a real-world phenomenon. Examples include fractal and chaotic systems, swarm algorithms, and so on. Within our model here, these could be provoking our imagination, kick-starting the synthesizer. It will remain a matter of choice as to whether and to what degree we intervene and guide the system toward any goal. If we fixed a goal (as earlier) using a form of evocative transcription then we remain free to modify and moderate this ideal—perhaps our provocative system comes up with something we prefer to our original target.
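The "imagination control loop" described above can be caricatured in code as a simple hill-climbing match toward a held target (a toy sketch of my own; the two-parameter "synthesizer" and its feature function are invented stand-ins for whatever a real system would render and measure):

```python
import numpy as np

rng = np.random.default_rng(1)

def feature(params):
    """Stand-in analysis: map synthesis parameters to a measurable feature vector.
    (Hypothetical: a real system would render audio and measure its contours.)"""
    f, b = params
    return np.array([f, f * b])

def imagination_control_loop(target_feat, params, iters=2000, spread=5.0):
    """Nudge parameters at random, keeping any change that brings the measured
    features closer to the imagined (held) target."""
    params = np.array(params, dtype=float)
    best = np.linalg.norm(feature(params) - target_feat)
    for _ in range(iters):
        candidate = params + rng.normal(scale=spread, size=params.shape)
        dist = np.linalg.norm(feature(candidate) - target_feat)
        if dist < best:                     # closer to the imagined sound: keep it
            params, best = candidate, dist
    return params, best

matched, distance = imagination_control_loop(np.array([440.0, 880.0]), (300.0, 1.5))
```

The interesting creative choice, as noted above, is whether to let such a loop run to convergence or to stop it the moment it stumbles on something we prefer to the original target.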

That leads us to a final source of potential input to our mind machine. Earlier generations thought of the term "synthesizer" as pertaining to electronic generation, usually of sound types clearly not derived directly from real sounding objects, although often based on instrumental models (wind, brass, string). But largely through the analysis/resynthesis developments in the last quarter of the twentieth century, the distinctions between samplers and synthesizers steadily blurred until none now remain. There has also been a cultural shift to hearing technological sound as part of an extended environment—the nature-culture divide has effectively disappeared. Birdsong sits alongside traffic sound in the urban soundscape.

Nature, Playing, and Performing

This powerful symbiosis of the recorded and the synthesized allows a new form of relationship to environmental sound, based on a transition from it playing (recorded soundscape) to playing it (synthesized soundscape). This is nowhere more clearly seen and heard than in the dream of harnessing the environment as music. This has a long history (hinted at in the Shakespeare quoted previously), accelerated by the advent of recording and communication technology. There are two imaginings here. At first, the object of the imagining has a life of its own (it plays). This is a notion clearly articulated in the nineteenth century with respect to "nature":

At a sufficient distance over the woods this sound [bells] acquires a certain vibratory hum, as if the pine needles in the horizon were the strings of a harp which it swept. All sound heard at the greatest possible distance produces one and the same effect, a vibration of the universal lyre.  (Thoreau 1986, "Sounds" [from Walden 1854])

but radically reformed by the futurists to include urban and industrial sound:

We will sing of the vibrant nightly fervour of arsenals and shipyards blazing with violent electric moons . . . deep-chested locomotives whose wheels paw the tracks like the hooves of enormous steel horses bridled by tubing; and the sleek flight of planes whose propellers chatter in the wind like banners and seem to cheer like an enthusiastic crowd.  (Marinetti [1909] 1973)

To convince ourselves of the amazing variety of noises, it is enough to think of the rumble of thunder, the whistle of the wind, the roar of the waterfall . . ., and of the generous, solemn white breathing of a nocturnal city.  (Russolo [1913] 1973)

But, in the early twentieth century, there rapidly emerged from this a dramatic new option—I can play it!—returning to the human listener the possibility of becoming a performer. In 1922 (in Baku, Azerbaijan), Arseny Avraamov created a Symphony of sirens:

Avraamov worked with choirs thousands strong, foghorns from the entire Caspian flotilla, two artillery batteries, several full infantry regiments, hydro-airplanes, twenty-five steam locomotives and whistles and all the factory sirens in the city. He also invented a number of portable devices, which he called "Steam Whistle Machines," for this event, consisting of an ensemble of 20 to 25 sirens tuned to the notes of the Internationale . . . Avraamov did not want spectators, but intended the active participation of everybody.  (Molina 2008, 19)33

However, the technology of recording allows a simulacrum of such a vast and ungainly process. Recording sounds (more recently, of course, sound and image) allows the creation of a substitute for the real environment—and these are a lot simpler to play than the original! From the very earliest days of Pierre Schaeffer's experiments in the studio at French radio, we have his invention of the sampler—in his imagination:

Once my initial joy is past, I ponder. I've already got quite a lot of problems with my turntables because there is only one note per turntable. With a cinematographic flash-forward, Hollywood style, I see myself surrounded by twelve dozen turntables, each with one note. Yet it would be, as mathematicians would say, the most general musical instrument possible. Is it another blind alley, or am I in possession of a solution whose importance I can only guess at? (Pierre Schaeffer's diary: April 22, 1948 [Schaeffer 2012, 7])

It is clear from the contextual discussion that Schaeffer does not mean "note" as the traditional pitched event but in a more general sense of what was to become the "sound object." Thus, with the advent of the internet nearly fifty years later, the ability to "sample" sounds worldwide becomes a real possibility—even off-earth through, for example, the NASA website—giving us the power to reach out to, play with, and ultimately perform the environment in its mediated forms.34 Technology allows the creative reorganization of these spaces and their transformation (often through the simplest means of amplification and spatialization): a "small" event can become a landscape.35 Our imagination allows us to become Alice in Wonderland and change scale—the human scale can be made gargantuan and the largest can be brought within human scale. John Cage famously did this in many of his installations, projecting one space into another. His realizations of Variations IV (1963) and Roaratorio (1979) are good examples. Thus, amplified small sounds can fill a listening space alongside the reinjection of an entire city soundscape.

Conclusion—and a Footnote on Ethics and the Transparency of the "Fourth Wall"

In the five or so years between first thoughts about ideas behind this chapter and the time of writing, speculation has rapidly become reality. On the one hand, the development of increasingly accurate and extensive brain scanning techniques, on the other, the advent of commercially available EEG and ECG brain interfaces (with a major drive to produce thought-directed game controllers36) suggest that sooner rather than later we could have imagination-driven sound synthesis.

I have suggested that the way into this may not be so simple—both acousticians and musicians have found it extraordinarily difficult to describe or define timbre or sound quality. Just as the ideas themselves are multidimensional, so we shall need to harness all the tools we have used to date in our new synthesis engine, from the most quantitative measures to the most playful and creative actions across many modes—graphic, sculptural, movement, or haptic. Musicians are well used to creative play and improvisation, and I have argued that such embodied performance will be an integrated part of this new experimental world—indeed vital to its fulfillment.

While this is an exciting prospect, potentially of enormous power, we shall need to tread with mindful awareness. The example from the work of Dennis Potter I cited earlier contained a basic ethical dilemma—the retrieval of the memories from the unfrozen cortex was literally torture for the conscious head unable to express itself, until in the final episode it manages to construct a "message" on a piece of paper in its imagination, which begs for release—granted a short while later in a terrorist attack on the laboratory.

A final example will amplify this need for care in ethical matters as some of the tools I am sketching come into existence in the coming decades. The creative model I have discussed here is based on an optimistic "projection outwards" under our aware control (with all its limitations) and with our consent. But there is a dystopic mirror view in which this might become an "invasion inward" without our (apparent) knowledge or permission. Mind reading is just the start of it in Andrei Tarkovsky's sci-fi film Solaris (1972). The space station is overrun by the invisible intelligence of the planet's ocean, which can create apparently real people, things, and places from the memories of the cosmonauts. In this case, they never had control over this immense power. They were not responsible for its behavior and do not begin to understand its workings. The scientists on board (and back on Earth) know only of certain "rhythms" and changes in the ocean's behavior—and that the "creations" do not possess the same atomic structures as their earthly equivalents. The humans on board, not surprisingly, become increasingly deranged.

If we wish to conclude with the more optimistic view that we can avoid our dreams becoming nightmares, then we will need to share openly the necessary knowledge and understanding of the workings of our imagination synthesizer. We might need to take steps to ensure that this is the case and to gain aware consent for its use. Today we may cursorily accept a "cookie" regime on our computer; but let us imagine an equivalent (or more advanced) observer of our behavior while wearing an EEG interface, and the possible consequences of that data collection if it took place without our awareness or control. While this moves to matters outside the remit of this chapter, we shall need to be aware of the issues and participate in deciding our preferred safeguards.

Notes

1. This chapter is based on my keynote presentation to Audio Mostly 2014—"Imagining Sound and Music," run by the Music and Sound Knowledge Group, Aalborg University, October 2014. Some ideas first appear undeveloped in my keynote addresses to ACMC2011 (Auckland) and ICMC2011 (Huddersfield).

2. The standard understanding is that discourse will always be language—but musicians muddy this somewhat by referring to "musical discourse" and "musical language" in a close parallel (see Jean-Jacques Nattiez 1990).
3. My earliest experiences of electronic music date from 1969, meetings in Cambridge with Roger Smalley and Tim Souster, and my subsequent work with them and their group Intermodulation.
4. See http://www.oxforddictionaries.com/definition/english/imagination. Accessed January 15, 2016.
5. Unfortunately, this is not in line with common usage in neuroscience. In this literature "image" can refer to a wide range of neuronal constructs that (it is argued) go to form the experiences of a mental image that may not be visual. Zatorre and Halpern (2005) expressly use this broader view in discussing "musical imagery."
6. Levinson (1998) argues against the need for this architectonic reconstruction (in memory) for the listener. But practitioners and anyone seeking to understand musical working (musicologists included) will have need of a grasp of this wholeness after the event, usually externalized outside of time in a diagram of some kind.
7. There is much work using fixed image stimuli—recent moving image research adds much greater complexity in time-domain capture (Nishimoto et al. 2011). There is no equivalent in sound, which inevitably possesses this complexity!
8. Stober and Thompson (2012) have suggested the subdiscipline of music imagery information retrieval (MIIR), but this remains (at present) a genre characteristic rather than specific sound detail as it is intended to interface with MIR (metadata) engines.
9. Shown well in Schaefer and coauthors (2011) for real music signals. Grimshaw and Garner (2015) may blur the distinction but do not deny the difference.
10. p. 158—where they aim to "account for both the similarities and differences observed in neuroimaging research between imagined sound and sound perceived in the presence of a sound wave."
11. I further speculate that this might relate to what Denis Smalley (1996) describes as "indicative fields" even within the abstraction of much acousmatic music—a kind of resonance with archetypes—but this is for future research.
12. Grimshaw and Garner's brief elaboration focuses on the emotional affect of the sound—one of their exemplary streams is sound design in computer games—what follows here deals with this only indirectly. I also do not discuss how I will an imagined sound, or recall a memorized one, into existence.
13. Daniel Feeld—whose end-life drama has been the subject of Potter's previous play, Karaoke, with which it is paired.
14. We might vocalize what we imagine in order to communicate it—see Malloch and Trevarthen (2008) with respect to "communicative musicality." Trevor Wishart has also described the human voice as "a flexible sound-generating device, like a sophisticated synthesiser" (Emmerson 2007, 106), though he did not describe this particular possible use!
15. This is not without detractors: Hickok (2009) strongly challenges assumptions, evidence, and philosophy behind the theory's being "all but accepted as fact."
16. Although less overt in its manifestation, Sheets-Johnstone (2011) argues that this just as much originates in the embodied world of movement.
17. This is Pierre Schaeffer's notion of écoute réduite (reduced listening) that forms the basis of musique concrète (Chion 1983, 33–34).
18. Computer representation of sound may be seen as a kind of notation but it cannot be "read" directly except by other computers!

19. That is assumed to mean we map the time of the music onto the space of the page (or screen equivalent) in some way—discussed further in what follows.
20. This phrase has been around for several decades with no obvious origin. For a good introduction to its meaning and function, see Hugill (2012, 237).
21. See http://www.inagrm.com/accueil/outils/acousmographe. Accessed May 15, 2017.
22. See http://logiciels.pierrecouprie.fr/?page_id=402. Accessed May 15, 2017.
23. While this has not been the subject of formal research, evidence and discussion may be found in Wolf (2013), Holland (2016), and ideas behind the EARS2 resource site (ears2.dmu.ac.uk) and the associated software Compose with Sounds (cws.dmu.ac.uk).
24. MIR is a fast-developing discipline that has harnessed machine assistance to seek, sort, and represent (visualize and display) information of some use and comprehension to the user from "big data" sources (see Casey et al. 2008).
25. There is an interesting case with a conductor. While theoretically a "no resistance" system, I wonder to what extent the response of the orchestra/ensemble "feels" like a resistive weight.
26. This did not stop the Yamaha DX7 (1983) becoming the most successful synthesizer in history at that time.
27. For an image, see the booklet with the CD box archives GRM (INA/GRM 2004, p. 18).
28. This is something well developed for musicians of limited physical movement—see the Sound=Space environment developed by Rolf Gehlhaar as an example (http://www.gehlhaar.org/x/pages/soundspace.htm. Accessed May 15, 2017).
29. See http://steim.org/. Accessed May 15, 2017.
30. See http://sonami.net/ladys-glove/. Accessed May 22, 2017.
31. See https://mimugloves.com. Accessed December 14, 2018.
32. This is not the place to discuss the interesting relationship between "impossible" and "impossible to produce"—or the possibility that anything imaginable might exist somehow.
33. Molina's words, but followed by the complete "instructions" for the performance written by Avraamov himself (20–21), as published in the local press. These comprise, in fact, an hour-by-hour scenario of the entire event. Molina reports that there were two predecessors (1919, 1921) and a subsequent full version in Moscow (1923).
34. An excellent and extreme version is seen in works such as "The Earth's Original 4.5 Billion Year-Old Electronic Music Composition," an installation by Robin McGinley (2002) that projects "sferics" into the installation, triggered by the visitors (see https://vimeo.com/66475800. Accessed May 15, 2017).
35. I have written elsewhere on "space frames" and their transformation and reconfiguration through technology (Emmerson 2007, 2015).
36. See, for example, the interfaces from Emotiv (emotiv.com) and Neurosky (neurosky.com)—with others announced.

References

Bachelard, G. 1964. The Poetics of Space. Boston: Beacon.
Bayle, F. 1993. Musique acousmatique—propositions . . . positions. Paris: INA and Buchet/Chastel.
Bharucha, J. J., M. Curtis, and K. Paroo. 2006. Varieties of Musical Experience. Cognition 100: 131–172.

Casey, M., R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney. 2008. Content-Based Music Information Retrieval: Current Directions and Future Challenges. Proceedings of the IEEE 96 (4): 668–696.
Chion, M. 1982. L'envers d'une oeuvre (Parmegiani: De Natura Sonorum). Paris: Buchet/Chastel.
Chion, M. 1983. Guide des objets sonores—Pierre Schaeffer et la recherche musicale. Paris: Buchet/Chastel.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical Meaning. Oxford: Oxford University Press.
Couprie, P. 2004. Graphical Representation: An Analytical and Publication Tool for Electroacoustic Music. Organised Sound 9 (1): 109–113.
Emmerson, S. 2001. From Dance! to "Dance": Distance and Digits. Computer Music Journal 25 (1): 13–20.
Emmerson, S. 2007. Living Electronic Music. Aldershot, UK: Ashgate.
Emmerson, S. 2011. Music Imagination Technology. Keynote Address. In Proceedings of the International Computer Music Conference, Huddersfield, 365–372. San Francisco: ICMA.
Emmerson, S. 2015. Local/Field and Beyond: The Scale of Spaces. In Kompositionen für hörbaren Raum (Compositions for Audible Space), edited by M. Brech and R. Paland, 13–26. Bielefeld: transcript Verlag.
Gray, D. 2013. The Visualization and Representation of Electroacoustic Music. PhD thesis, Leicester: De Montfort University.
Grimshaw, M., and T. A. Garner. 2015. Sonic Virtuality: Sound as Emergent Perception. Oxford: Oxford University Press.
Hickok, G. 2009. Eight Problems for the Mirror Neuron Theory of Action Understanding in Monkeys and Humans. Journal of Cognitive Neuroscience 21 (7): 1229–1243.
Holland, D. 2016. Developing Heightened Listening: A Creative Tool for Introducing Primary School Children to Sound-Based Music. PhD thesis, Leicester: De Montfort University.
Hugill, A. 2012. The Digital Musician. 2nd ed. New York and London: Routledge.
Le Corbusier. 1968. Modulor 2. Cambridge, MA: MIT Press.
Leman, M. 2016. The Expressive Moment: How Interaction (with Music) Shapes Human Empowerment. Cambridge, MA: MIT Press.
Levinson, J. 1998. Music in the Moment. Ithaca and London: Cornell University Press.
Malloch, S., and C. Trevarthen. 2008. Communicative Musicality: Exploring the Basis of Human Companionship. Oxford: Oxford University Press.
Marinetti, F. T. (1909) 1973. The Founding and Manifesto of Futurism. In Futurist Manifestos, edited by U. Apollonio, 19–24. London: Thames and Hudson.
Molina Alarcón, M. 2008. Baku: Symphony of Sirens: Sound Experiments in the Russian Avant Garde. London: ReR Megacorp.
Molnar-Szakacs, I., and K. Overy. 2006. Music and Mirror Neurons: From Motion to "E"motion. Social Cognitive and Affective Neuroscience 1 (3): 235–241.
Nattiez, J.-J. 1990. Music and Discourse: Toward a Semiology of Music. Princeton, NJ: Princeton University Press.
Nishimoto, S., A. T. Vu, T. Naselaris, Y. Benjamini, B. Yu, and J. L. Gallant. 2011. Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies. Current Biology 21: 1641–1646.
Park, T. H. 2016. Exploiting Computational Paradigms for Electroacoustic Music Analysis. In Expanding the Horizon of Electroacoustic Music Analysis, edited by S. Emmerson and L. Landy, 123–147. Cambridge: Cambridge University Press.

Pasley, B. N., S. V. David, N. Mesgarani, A. Flinker, S. A. Shamma, N. E. Crone, et al. 2012. Reconstructing Speech from Human Auditory Cortex. PLoS Biology 10 (1): e1001251.
Russolo, L. (1913) 1973. The Art of Noises (Extracts). In Futurist Manifestos, edited by U. Apollonio, 74–90. London: Thames and Hudson.
Schaeffer, P. 2012. À la recherche d'une musique concrète. Translated by J. Dack and C. North as In Search of a Concrete Music. Los Angeles: University of California Press.
Schaefer, R. S., J. Farquhar, Y. Blokland, M. Sadakata, and P. Desain. 2011. Name That Tune: Decoding Music from the Listening Brain. Neuroimage 56 (2): 843–849.
Schafer, R. M. 1977. The Tuning of the World. New York: Knopf.
Sheets-Johnstone, M. 2011. The Primacy of Movement. Amsterdam and Philadelphia: John Benjamins.
Smalley, D. 1986. Spectro-Morphology and Structuring Processes. In The Language of Electroacoustic Music, edited by S. Emmerson, 61–93. London: Macmillan.
Smalley, D. 1996. The Listening Imagination: Listening in the Electroacoustic Era. Contemporary Music Review 13 (2): 77–107.
Stober, S., and J. Thompson. 2012. Music Imagery Information Retrieval: Bringing the Song on Your Mind Back to Your Ears. Paper presented at the 13th International Conference on Music Information Retrieval (ISMIR12). https://pdfs.semanticscholar.org/45a3/da1804f955ecde6a59eff7d6545ad7c607b6.pdf. Accessed May 15, 2017.
Thompson, J. 2013. Neural Decoding of Subjective Music Listening Experiences. Master's thesis (Digital Musics), Hanover, NH: Dartmouth College.
Thompson, J., M. Casey, and L. Torresani. 2013. Audio Stimulus Reconstruction using Multi-Source Semantic Embedding. Poster presented at Neural Information Processing Systems Workshop on Machine Learning and Interpretation in Neuroimaging, Lake Tahoe, USA. Paper version available from https://sites.google.com/site/mlininips2013/proceedings-ofmlini-2012-1. Accessed May 15, 2017.
Thoreau, H. D. 1986. Walden and Civil Disobedience. Harmondsworth, UK: Penguin.
Van Campen, C. 2008. The Hidden Sense: Synesthesia in Art and Science. Cambridge, MA: MIT Press.
Waisvisz, M. 1985. The Hands: A Set of Remote Midi-Controllers. In Proceedings of the International Computer Music Conference, 313–318. San Francisco: ICMA.
Wishart, T. 2012. Sound Composition. York, UK: Wishart.
Wolf, M. 2013. The Appreciation of Electroacoustic Music: An Empirical Study with Inexperienced Listeners. PhD thesis, Leicester: De Montfort University.
Zatorre, R. J., A. R. Halpern, D. W. Perry, E. Meyer, and A. C. Evans. 1996. Hearing in the Mind's Ear: A PET Investigation of Musical Imagery and Perception. Journal of Cognitive Neuroscience 8 (1): 29–46.
Zatorre, R. J., and A. R. Halpern. 2005. Mental Concerts: Musical Imagery and Auditory Cortex. Neuron 47: 9–12.

Part III

PSYCHOLOGY

Chapter 14

Music in Detention and Interrogation
The Musical Ecology of Fear

W. Luke Windsor

Introduction

When we are fearful, it can be because we are threatened or perceive a threat: often that threat is current, sometimes it is remembered, and sometimes it is imagined. Music (and sound) can play a role in the generation of fear, and in this chapter I will argue that music is used in detention and interrogation not only to influence our emotional state directly but also to create an ambiguity and uncertainty that leaves the detainee subject to the free play of imagination, perverting the benign imagination of aesthetic contemplation into something malign and horrific. In order to do this, the boundary between the real and the imagined will be explored: the chapter aims to identify the location of this boundary, how it is set for individuals, and the circumstances in which it becomes crossed. A subsidiary aim is to address the broader context for the use of music in detention and interrogation in order that, in the more academic quest for an understanding of a set of musical behaviors and their consequences, the history and psychology of music's use as a military tool is not overlooked.

Music is often seen as a force for good: for example, the co-optation of Mozart's music as a panacea has become a paradigmatic example of folk psychology, despite many unintended consequences. Yet, throughout human history, music (and sound) has been associated with and used by commercial, political, and military forces in attempts to control behavior, and music itself seems to have intrinsic power to do harm. In Thomas Keneally's book Schindler's Ark (1982), an inmate-musician and a German officer in a work camp in occupied Poland conspire to use the repetition of a musical work (the infamous Gloomy Sunday) to fatal effect: as a result the officer commits suicide after requesting that the song be repeated with increasing passion. Keneally's parable gives musical power to the detained Jewish prisoner, and the officer willingly submits. This, of course, is the opposite of the normal state of affairs: music in detention is most often controlled by the captor and, as will be discussed later, used to attempt to influence the thinking and behavior of the captive. The parable is also an exaggeration for effect: through hyperbole, Keneally actually highlights the powerlessness of the captive, who has only the passion for music left as a weapon.

As Grant (2013a, 2014) points out, it is not just in the recent Iraq War, where the use of loud recorded music in detention gained notable media coverage (see, e.g., Chrytoschek 2011), that music has been used to coerce or humiliate detainees. Moreover, forced singing and playing, as well as forced listening, form part of this history of music in detention and link it to a broader and longer context of music in military settings (see, e.g., Pieslak 2009; Grant 2013b). Furthermore, as Pieslak (2009) discovered, there is a considerable overlap in the choice of music by soldiers for personal use and their selection of music in overt and covert attempts to influence others.

This chapter will not attempt to provide an overview of all the ways in which music might or might not be used to influence others, for ill or for good. It will instead focus on an acute and special case of music psychology: that of forced listening to music in detention, whether or not such forced listening is intended to elicit information. The aims here are to show how such uses of music can be reframed within a broader context of musical persuasion and to provide a deeper engagement with the ethics of music than could ever be achieved without considering the extreme case of music in detention. For a more general discussion of the darker side of musical experience, Johnson and Cloonan (2009) provide a thought-provoking survey of the many ways in which popular music is deployed as an accompaniment to, or tool of, violence.

It is with considerable care that any music researcher should engage with the study of music in detention and/or interrogation. This is partly due to the commonly held view that music is, or should be, a benevolent art with positive impacts on individuals and societies, a view that may cause us to turn away from a more malevolent instrumentalism. Within such a context, research pointing out the harm that music can do seems counterproductive. Indeed, if one is focused on maximizing the potential social benefit of research (see, e.g., Sloboda 2005, 395–419), working on the harm that music can do can only really be justified if it presents data or analysis that can be used in advocacy against the harmful use of music or if it suggests tools to combat such uses. Added to this particular disincentive, the broader study of internment, detention, interrogation, and torture requires a sensitivity to and breadth of knowledge about international and military law and custom, the ethical and moral background to cruel and inhuman practices, and so on; few musicologists or music psychologists come ready-prepared to engage with this body of work. In addition, the researcher may be persuaded that, by studying music in interrogation, they might inadvertently promote practices they disagree with, or indeed add to the body of knowledge that interrogators employ in the field.
The position taken by the Society for Ethnomusicology in 2007 (SEM 2007) suggests there is something particular about the use of music in coercive interrogation that should be called out by musicologists, and also that musicologists should call attention to such (ab)uses of music. Some musicologists are content to disavow all coercive interrogation and question whether we need to make a special case of music within this context, when considered against the range of coercive methods used in detention and interrogation:

The issue is really torture, which to me is always wrong, period. I can't see that music as torture is more or less wrong than anything else as torture, and I confess that deep down this feels like special pleading—e.g., water resource managers complaining about the use of water for torture, or (more ridiculously) Hello Kitty aficionados complaining that Hello Kitty armbands were to be used by a Thai police department as badges of malfeasance and indiscipline.  (Bellman 2007)

This chapter takes an initial step back from these problems, and it does not initially consider whether they are of particular importance given the wider debate about the legality or morality of obtaining information through psychological or physical manipulation or pressure. Instead, it will engage with the perceptual and social-psychological consequences of playing music in situations of detention, and it will also engage with the context for these practices, as this may help us better understand how they have come about and how they can be seen to be situated within a broader context of music as a source of behavioral control. Hence, although much reference will be made to existing ethnographic and historical work in this domain (especially that of Cusick 2006, 2008a, 2008b; Pieslak 2009; Grant 2013a, 2014), the broader contexts that will be applied are derived from psychological research that is related both to interrogation and also to other forms of coercion, and the understanding of the relationship between imagination, sound (and music), and direct perception.

The role of imagination in the creation of a fearful, vulnerable, and malleable state has an explicit and implicit relationship with the ability or inability of a person to directly and effectively act on and perceive their surroundings. It is for this reason that, rather than analyzing the role of sound in coercive interrogation in a theoretical vacuum, some positioning is required. This chapter will introduce and apply the work of Gibson (e.g., 1966, 1979) on direct perception and ecological psychology and will attempt to show how his theory of perception helps explain the ways in which sound and music, normally helpful or benign, become sources of fear and confusion. The work of Gibson will be returned to in the conclusion of this chapter in a more political vein, as it will become clear that his approach to psychology provides a neat riposte to the co-optation of (music) psychology by military and commercial interests for purposes of persuasion. The contrast between Bernays's (1942) and Gibson's (1939) reactions to Nazi propaganda efforts rests, as I will argue, on both an ethical distinction and a theoretical one: their views of human psychology lead them to very different conclusions about how we as individuals should respond to attempts by others to influence us. Before this, however, it is necessary to review some of the existing work on music and interrogation/torture and its intellectual and practical antecedents.


Music in Detention/Interrogation

Although music is mentioned in many recent accounts of the detention of political prisoners and detainees captured by the United States and its allies in its "war on terror," it is neither the most significant aspect of their treatment nor is it isolated from a wider history of music in the context of persuasion or the broader context of sound in military or intelligence applications. The scope of this chapter does not allow for a complete review of the psychological or historical literature of psychological warfare and even less for a review of the enormous literature on music in behavior control (see Volgsten, volume 1, chapter 11). It is also the case that media interest in the use of music in interrogation, and indeed in modern warfare, has been intense throughout the so-called war on terror, arguably in disproportion to its uptake and impact in comparison to other more clearly violent and illegal coercive practices described in military and CIA manuals and in the accounts of detainees and practitioners, and in analyses thereof. Nonetheless, to understand the peculiar appeal of music to military and intelligence interrogators, and its relationships with mainstream applied music psychology and the history of sound in warfare, the following will provide a brief overview of some relevant literature and concepts.

The History and Broader Context of Music in Detention and Interrogation

Before directly addressing the main focus of this chapter, the following sub-sections provide necessary context. First, the use of music to control behavior is briefly addressed in general terms; then the use of sound and music is addressed in military settings through short introductions to the role music plays in military life, its use within psychological warfare, and lastly the use of music as a weapon. A final sub-section introduces recent debate on the use of music in interrogation and detention by US forces in the early part of the twenty-first century.

Music and Behavior Control

Music has a huge and well-researched impact on our emotions (see, e.g., Juslin and Sloboda 2011, for a comprehensive review). In many situations, we are free to choose our own music, but in the cinema, or while watching television, shopping, or sitting in a hospital waiting room or dentist's surgery, music is presented to us through external agency. Music, and especially recorded and publicly broadcast music, has a long history in relation to the psychology of persuasion. It has been used in advertising and brand promotion, where more or less subtle, intrinsic or extrinsic qualities of musical structure or lyrical content are deployed to attach an emotional valence to a product or brand, or to manipulate arousal levels (see Gustafsson, volume 1, chapter 18, and Egermann, this volume, chapter 17). Such approaches are also used in retail settings (such as shops, malls, and restaurants) to influence not just our internal state, in an attempt to imbue spaces with a particular ambiance, but also our level of activity. One of the most highly cited publications in the field of consumer control is the description of a study in which the volume of music was varied in a supermarket (Smith and Curnow 1966): louder music was associated with less time in the store but no lesser volume of purchasing. The authors of this study explain this through an arousal hypothesis, whereby the louder music leads to greater arousal in the customers and faster shopping, rather than driving the customers from the store. The correspondence of music to customers' expectations or their degree of liking are, however, important factors that can be manipulated to influence their behavior. A study by North, Hargreaves, and McKendrick (1999) demonstrated that we will stay on hold to a help line longer when the music is both liked and congruent with the task. More subtle dimensions of musical structure and associated or evoked emotions can also influence what we purchase, how long we linger, and even how much we are prepared to pay for products. For example, the style of music and its associations with more or less expensive items might be a powerful predictor of purchasing (Wilson 2003). Music has also been considered as a factor in delineating zones within shopping malls and department stores, with different styles of music helping to identify soft boundaries between different product areas (e.g., Yalch and Spangenberg 1993).

Music is also used in public spaces without the intention of influencing purchasing. Just as we might employ it within our own spaces or through earphones to manage our mood, the musical ambiences of these spaces are curated for us in attempts to speed or slow our movements, make us more comfortable, or provide public information. Although these uses are potentially more benign and may be alternatives to more expensive or harmful attempts to influence us, the central aim is to coerce the listener into a more or less passive state. Dentists, for example, claim to use music to calm patients with some success, aiming to make their work easier through a more relaxed patient without needing recourse to medication. However, Aitken and colleagues (2002) found no effect of music in such contexts above and beyond the patient's enjoyment of it in a controlled setting, and even in studies where it is shown to have an effect it may only be for less anxious patients (e.g., Lahmann et al. 2008). Moreover, regardless of whether it is effective, music may simply become another remembered feature of a hostile environment for an "uncooperative" patient (see, e.g., Welly et al. 2012), and associations of music with experience can obviously flow both ways. Nonetheless, Standley's meta-analysis of music in dental and medical settings (1986) does suggest an effect. Similarly, in waiting rooms, medical or otherwise, rather than speeding up service, one may choose to play music to increase tolerance of waiting time (see, e.g., North et al. 1999) or reduce stress (see, e.g., Tansik and Routhieaux 1999). Note that, in all these situations, music's primary value in self-managing our psychological state is supplanted by external control of this environmental information.
Of course, music is but one of many kinds of stimulus information that we and others use to orient and be oriented in the environment, but the semi-unavoidable nature of acoustic stimulation is significantly different from some other forms of influence: averting or closing one's eyes is much easier than ignoring unwanted sound. One can, of course, wear ear defenders, plugs, or headphones to block out or supplant this information with silence or our own choice of music, a technological adaptation that serves to both regain and enhance control of the auditory environment in a way that is thoroughly contemporary. As a corollary, the encouragement of employees to curate their own workplace musical environment in order to increase productivity and staff well-being (and to avoid the distractions of workplace noise) is becoming more widespread, and there is some empirical evidence to support the effectiveness of such practices (see, e.g., Lesiuk 2005). Music's ubiquity in this space of influence has, however, led many to complain about, campaign against, or avoid such settings and uses of music. The attempts by early adopters of musical broadcast technology to impose music in settings such as public transport often backfired (see, e.g., Hui 2016), and there is a general social consensus that even the minor public acoustic spillage from headphones is an intrusion that can attract considerable opprobrium.

Before concluding this section, and in order to form a link with the later discussion of the relationship between more general uses of music as propaganda and in psychological warfare, a final way in which music is used in explicitly political settings is worthy of mention. In an unusual and original study, Shevy (2008) used different genres of music to influence participants' perceptions of trustworthiness, friendliness, and political ideology, exploiting the stereotypical associations of hip hop and country music. A pertinent feature of his findings, which will become relevant when discussing psychological warfare and interrogation, is that the extent and nature of such influence should vary with the ideology and musical preference of the listener: a liberal African American listener would be primed very differently by music than a white or Hispanic listener or a conservative African American, and such influences would vary with preference for musical genre. Music is a tool for subtle persuasion in the context of ideology, not just for commercial ends.

Music in Military Life

Music is very much a part of military life (as is sound, see Bull, volume 1, chapter 9): all of the behavioral applications for music listed above might apply to military situations, both as externally applied attempts to manage the behavior of military personnel and in the self-management of emotion in individuals. In addition, and particularly since the Korean War, music has been used to influence opposing civilian or military populations and has been treated more or less as a weapon. There are three particular features of music that are common to all of these applications: rhythmic structure as a guide or stimulus for movement; loudness as a method of overwhelming the auditory system; and the biosemiotics of musical meaning, whereby values and even denotative meaning can be expressed at a distance. Music can be cheaply and easily transmitted electronically either through amplification, or via radio, Internet, and satellite, on its own or in combination with visual images.

Musical rhythm is most often associated in military life with the direct entrainment of movement to musical meter through marching. Most military units have marching bands, and the coordination of movement to music has both utilitarian and psychological dimensions. Even in situations where instruments are not used, soldiers will often march to songs: the clearest example of this in the Western military is the singing of the French Foreign Legion, which cuts across marching and more reflective settings; the Boudin, for example, is sung standing to attention as well as in celebratory or functional marching situations. The tempo of the Boudin, whether sung in motion or not, is surprisingly slow, infamously necessitating the arrival of French Foreign Legion units at celebratory events after other French units, and, indeed, it both denotes the separate identity of the Legion and connotes its rather dour character. This tempo, and a curious single style with truncated phrase ends, extends to a wide repertoire of traditional and popular songs, mostly in French, which many of the recruits will barely speak; many songs are also in German, reflecting the large number of German recruits the Legion has attracted at times (see, e.g., French Foreign Legion 2016). A related tradition, from the United States, is that of the cadences and jodies sung by soldiers as they train (see Pieslak 2009): again, synchronization of movement is paramount, but in both cases the content is also significant, and rather different, as will be discussed below. Importantly, the music of marching sits in an interesting zone between self-chosen musical behavior and imposed discipline: the choice to march or sing is not free; it is taken under military discipline, and to refuse is a matter for the military courts.

Regardless of any other subtler parameter, the sheer volume of military music is important. Whether participatory or not, military bands and even unaccompanied singing produce loud sounds which travel far. In combat, as extensively documented in Pieslak's study of music in the Iraq War (also see Gittoes 2005, documentary), soldiers not only take the trouble to select their own music to accompany combat within armored vehicles, they create DIY sound systems within them to broadcast the music over their intercom systems or through internally mounted loudspeakers. There is a sense in which, just as a commuter masks the sounds of others with music over headphones, this creates a private environment within the vehicle, the sheer volume of sound masking the influence of the threats from outside. The volume of broadcast or headset music here is self-chosen, although in Gittoes's extraordinary unsanctioned film about music in the Iraq War (2005) some of the interviewees have clearly developed less coherent musical selections, creating a conflicted musical environment within the confines of the armored vehicle: being unable to escape from loud unwanted music is clearly a potential problem in combat, just as it might be in other work settings.

The semiotics of military music interacts with these other two parameters: the trite example of the bugle call sits at one end of a spectrum which ends with the singing (and broadcast over loudspeakers) of "Je ne regrette rien" in association with the withdrawal of the final French Foreign Legion units from Algeria. The lyrics, tempo, and musical structures of military music implicitly and explicitly influence soldiers before, during, and after combat; they serve to identify particular units and they communicate ideas, national identities, and ideologies.
This semiotics was particularly important to imperial military powers, for example in Africa in the late nineteenth and twentieth centuries (see, e.g., Clayton 1978). In East Africa, both British and traditional African music were adopted to build a corporate identity, often exploiting the usage of Swahili as a cross-tribal language. To sing "Men of Harlech" in Welsh, or indeed English, is one thing, but to sing it in Swahili, quite something else. In this case, the fantasy of Welsh (actually mostly English) soldiers singing this song at Rorke's Drift (in the film Zulu) has a real counterpart in the musical practices of later colonial troops. Or consider the lyrics of this traditional World War II song from Kenya (also sung in Uganda):

Mussolini Mussolini, Mussolini amekimbia!
Nakumbuku njaro Nairobi!
Nakumbuku njaro Faifa keya!
Tutarudi! Tutarudi!

Mussolini, Mussolini, Mussolini has run away!
We remember the light of Nairobi
We remember the brightness of 5 KAR.
(Clayton 1978, 38)

Psychological Warfare

Allied with the presentation of propaganda in spoken form via radio or loudspeaker, music has long played a role, along with sound effects, in efforts to influence the behavior of opposing forces. Indeed, for Volcler (2013; also see Goodman 2012 for a more theoretically driven treatment), the modern usage of music in this context (often associated with the Korean War and later conflicts) is one of two main precursors of the use of music in detention, the other being post-1945 CIA-sponsored research on the psychology of coercive interrogation. Volcler also draws parallels between the nonlethal usage of sound as a persuasive tool and as a weapon to disable or kill, which will be addressed briefly in what follows. This historical link between music as an at-a-distance tool of warfare and music in detention is also made by Pieslak (2009), who distances his historical narrative from that of Cusick (2006, 2008a, 2008b), for whom, like Volcler (2013), the sources of music in detention derive both from propaganda practice and from covert psychological research programs. It is probable that the history of music's use in detention draws on many precursor practices (see Grant 2014 for an excellent overview of the many ways in which music comes to be used as and in torture), and it is likely that the use of music to explicitly influence behavior draws variously on all of these precursors, depending on circumstance. This will be returned to later in relation to the tension between improvised and more institutionally circumscribed practices described in manuals and by practitioners and detainees.

Even in mainstream psychological warfare, the use of music often oscillates between more improvised and administered extremes and between motivational soundtrack and nonlethal weapon, as exemplified by the use of music during the siege of the Vatican Embassy in Panama in 1989, originally intended to mask reporters' attempts to eavesdrop on negotiations:

While some accounts claim that the music was played to boost the morale of American troops (a claim that even here demonstrates the overlap between psychological tactics and inspiration for possible combat), it had, regardless of original intent, a powerful side effect. When Noriega commented that the music was irritating him, the Marines increased the volume, playing the music continuously.  (Pieslak 2009, 82)

Rather than review the range of ways music is used in persuasion in the field, the reader is directed toward Pieslak's coverage of the use of music by opposing forces in the Iraq War (2009). There, both sides broadcast sound at high volumes via loudspeaker—nasheeds on the Iraqi side, and rock and rap music on the US side—and in both cases he argues that such a sonic environment inspires friendly forces while also being intended to destabilize the enemy.

Sound Weapons

Volcler (2013), in her provocative book Extremely Loud: Sound as a Weapon, argues that the use of music in detention and interrogation takes place in a broader context of sonic weaponization. Indeed, although spending much time on the claims made for physiological applications of sound, she concludes that it is the psychological impact of sound (and music), whether tacit or conscious, that is the most effective weapon. Although sound at high intensity can damage the ear (or even other organs), and contemporary technologies such as the Long Range Acoustic Device (LRAD) can both deliver verbal instructions, tones, noise, or music at long ranges and high enough intensities to cause distress or damage, she notes that the fear of such weapons is probably just as impactful as their application. Importantly, like Cusick, she notes that the attraction of nonlethal weaponry is often somewhat disingenuous: just as the LRAD is marketed as a long-range communication device but potentially applied as a weapon at shorter ranges, sound and music are portrayed as relatively harmless (no-touch) interrogation techniques rather than as psychologically harmful torture methods:

"No-touch torture" shares with non-lethal weapons the advantage that it leaves no marks directly caused by interrogators on the visible, fleshy surfaces of the body. Thus hard to prove, and hard to jibe with images of torture familiar from visual and literary culture, "no-touch torture's" premise is nonetheless consistent with the premise behind non-lethal weapons, including those that use sound; and it is consistent with the premise by which PsyOps units use sound or music to prepare the battlefield. The common premise is that sound can damage human beings, usually without killing us, in a wide variety of ways. What differentiates the uses of sound or music on the battlefield and the uses of sound or music in the interrogation room is the claimed site of the damage. Theorists of battlefield use emphasize sound's bodily effects, while theorists of the interrogation room focus on the capacity of sound and music to destroy subjectivity.  (Cusick 2006)

Volcler's most interesting conclusion is that, in many cases, the use of sound as a weapon is more effective as a purely psychological technique, a placebo weapon of the imagination:

The difficulty in understanding the functioning and effects of acoustic weapons, as well as the mass of conspiracy theories and paranormal inventions they inspire, works in their favor: the information about them becomes confused, thus fueling the psychological effect from which they benefit . . . Weapons of high technology that, like "no-touch torture," touch without touching, pass through obstacles, and act without seeming to act, acoustic weapons are also infused with a carefully sustained illusion of magic.  (2013, 137–138)

As I will argue later, it is the appeal to imagination as opposed to direct perception that is at the heart of music’s use in detention and interrogation but, before turning to this ecobehavioral interpretation, the next section provides a brief review of recent practices, impacts, and narratives of music in detention and interrogation, focusing on recent conflicts in Iraq, Afghanistan, and the wider “war on terror.”

Music in Interrogation and Detention: Recent US Practices

The use of music in the early twenty-first century by US interrogators and guards is embedded in a longer historical and technical context. Before focusing on accounts of these practices from the perspective of military personnel and detainees, it is important to recognize the ideological positioning of the main researchers and its impact on their somewhat limited and selective choices of informants: Pieslak (2009) views the use of music in interrogation from the perspective of the interrogator and, like Lagouranis (e.g., Lagouranis and Mikaelian 2008), one of the few interrogators to write in detail about his work, concludes that the small part that music and sound played in interrogations was largely improvised in the absence of clear legal guidance (according to Lagouranis much of the wider practice was similarly developed "on the job"). Pieslak's two main military sources exemplify this institutionally vague context in their differing opinions on whether music can ever constitute torture as opposed to legal coercion, noting that neither of the military manuals in effect at this time mentions the use of music or loud sounds in any detailed manner (Department of the Army 1992 and 2006, the latter subsequent to considerable amendment following public exposure of the more extreme methods used by US forces). One of Pieslak's sources viewed music and other sounds as permissible within both the operational and legal guidelines as long as the interrogator experienced them simultaneously with the detainee; the other viewed music as an illegal "change of scenery" for the detainee, tantamount to actually blindfolding, transporting, and confusing the detainee about their location. Pieslak notes that his second informant was working in Iraq after procedures were made less extreme following the exposure of prisoner treatment at Abu Ghraib prison in 2004.

music in detention and interrogation   291 Of course, some interrogators may have been implicitly or explicitly following guidelines provided by the CIA, either the infamous KUBARK Counter Intelligence Interrogation manual (CIA 1963; also see CIA 1983), or later medical guidance (CIA 2004). These CIA manuals do discuss sensory deprivation (which music or noise can contribute to by masking other sounds) and the general principles of coercive interrogation, but nowhere is the detailed use of music discussed. In the most recent of these manuals (CIA 2004, 8), the use of “white noise or loud music” is ranked fifth out of twenty techniques (in ascending severity) intended to act psychologically on the detainee, with “shaving” the least severe, and “waterboarding” the most severe. Here, and later in this manual, it is made clear that sounds should not be so loud as to damage hearing (CIA 2004, 13) with a maximum of 79 dB. Given that such advice was routinely ignored, according to contemporary accounts such as those cited here, including those reported by Pieslak, it seems possible that the use of extremely loud music above such levels was either improvised in the field, in line with the wider use of music in military settings, or was directed through less-well-documented cultural practices. In contrast, Cusick (2006, 2008a, 2008b; also Volcler 2013), views the use of music and sound in interrogation as directly emanating from the guidance in the military and CIA manuals mentioned earlier (also see McCoy  2006) and hence from covert CIA research programs: unlike Pieslak, she argues that despite the lack of direct reference to music in the manuals, they imply a particular set of practices “very much like the relationship of performance practice norms to that of a published score” (Cusick 2008a, 16). Although some of the evidence for a direct link between psychological research programs on sensory deprivation and related forms of coercion and later practice is rather weak (see, e.g., Blass 2007; Brown 2007, in relation to the contributions of Milgram and Hebb to CIA research), it is clear that these manuals draw on empirical studies on sensory deprivation, although, as pointed out by Pieslak (2009), there is no evidence that any of these studies used music as a masking stimulus. Cusick’s (2008a) primary sources are detainees themselves, although she does refer in detail to secondary material from interrogators themselves. Her work provides the clearest description of music in interrogation practices in this period, especially through her interviews with Donald Vance and Moazzam Begg and her analysis of the interrogation of Muhammad al-Qatani but also in her discussion of accounts by two pseudonymous US interrogators and Lagouranis. In summary, these accounts provide clear qualitative evidence of the practices and their impact: 1. music was played at very high volumes both in detention more generally and during interrogations, certainly much louder than would be advised if permanent damage were to be avoided 2. music was played for long durations (exacerbating the damaging effects of loudness) 3. the music chosen reflected the individual tastes and cultural backgrounds of the interrogators 4. music was interchangeable in some instances with everyday sounds

5. the interrogators used music for a number of intended purposes related to:
   a. masking background sounds to isolate the detainee
   b. interrupting cognition through distraction
   c. creating cultural dissonance
   d. establishing the dominance of the interrogator.

The issue of dominance (5d) seems particularly pertinent to the explicit training interrogators received; the relationship of dependency between interrogator and detainee is established not only through the playing of loud music (or indeed disturbing everyday sounds) but by its cessation at the will of the interrogator. Items 5a–c correspond to what Pieslak’s second informant refers to as a “change of scene”: music is intended to block out and change the environment of the detainee in such a way as to maximize isolation and minimize any sense of familiar surroundings. Such masking and distortion of the environment is cultural as well as natural, as evidenced by the contrast Cusick identifies between the experiences of Begg and Vance (both familiar with Western popular music) and that of al-Qatani, who was less so and, moreover, considered music to be haram (forbidden). Begg and Vance found the constant loud music irritating, disorienting, and painful, but were not sensitive to its cultural dissonance: Begg even notes that he believed that his interrogators were sensitive to this and did not use music with him in the interrogation cell (although it was played elsewhere) (Cusick 2008a, 6–7). Vance, however, like Begg, found that the loud music, regardless of its cultural familiarity, still played a huge role in the psychological regime in which they were immersed. Regarding al-Qatani, Cusick claims that Western music was used knowingly to undermine his religious convictions because of its cultural specificity:

Given that the Taliban had forbidden music in Afghanistan for religious reasons, it seems possible that al-Qatani genuinely believed that listening to music was haram, forbidden, and therefore sinful. Yet his inability to talk knowledgeably about Islam’s theological traditions on music allowed “the music theme” to merge with the themes known as “the bad Muslim,” “al Qaeda betrays Islam,” “God intends to defeat al Qaeda,” “arrogant Saudi,” and “I control all” to produce the overall “approach” called “Pride/Ego Down.” That is, al-Qatani was humiliated, and his Muslim identity attacked, by his obvious ignorance of his own tradition. Meanwhile, the “loud music” he may have experienced as sinful continued to keep him awake, to end his interrogation just before he was allowed to sleep, to awaken him, to prevent him from speaking in answer to interrogators’ questions, and to fill up longer and longer parts of interrogation days that were also filled with the argument over music’s alleged sinfulness, which constituted “the music theme.”  (2008a, 13)

This passage illustrates that the use of music by US interrogators is, if one accepts this account at face value, much more sophisticated than anything in the training manuals declassified by the US Government. Music is not just another convenient loud sound to disorient through controlling the environment; it is a cultural weapon of persuasion, related to its use in propaganda and psychological warfare. This point is even more

forcefully made by Grant (2014), who locates music in detention within a broader context of music as a method for sanitizing the act of applying militaristic power. Grant suggests that there exists a continuum between the natural and the cultural, and between the linguistic and the musical, in many interrogation or detention settings, and even implies that there is a perverse creativity in the use of music by interrogators to avoid more obvious evidence of force.

Music, Information, and Interpretation: Fear and Imagination

Having described and contextualized the ways in which music has recently been used in detention and interrogation, it is time to return to the ethical dimension of music and its co-optation by interrogators. In their different ways, Cusick (2006, 2008a, 2008b), Pieslak (2009), and Grant (2014) attempt to make sense of the way in which music seems corrupted by its association with detention and interrogation, even though they may argue about whether it can be considered torture in itself. Rather than approach this question directly, this final section will recast the use of music in detention and interrogation within the ecobehavioral approach to psychology characterized by Gibson (e.g., 1966, 1979; also see Heft 2001). The intention here is to demonstrate that this co-optation of music is partly a result of choosing to apply psychological research not only to the understanding of human behavior but also to its control. To this end I will contrast the positions of Gibson (1939) and Bernays (1942) on Nazi propaganda, but first it is necessary to describe Gibson’s mature position on the relationship between direct perception and mediate perception, and how it helps explain both benign and malign applications of music.

Sound as Information about Objects and Events

Within ecological psychology, stimulus information from objects and events is considered as information about those objects and events: sound, for example, is a source of information about the world and, in Gibson’s later work, serves as information about the potential actions afforded to an organism by the environment (1966, 1979). Gibson worked much more extensively on vision than on auditory perception, but his ideas have been applied to perception via both musical and everyday sounds (see, e.g., Gaver 1993a, 1993b; Windsor and de Bezenac 2012). Music, like any other sound, provides information about the actions that produce it, and although this may seem a rather unusual way of viewing music, this tendency of the auditory system to detect the real (or virtual) causes of sounds is both an undeniably important aspect of auditory perception and one that is

important to our interpretation of both more or less conventional (see, e.g., Clarke 2005) and unconventional (see, e.g., Windsor 2000) musics. In many everyday situations, we are able to identify or at least classify the sources for the music that we hear, whether played on the radio or self-chosen, whether experienced live, with the additional benefit of visual information, or acousmatically over loudspeakers or headphones. We know something about where it comes from, who made it, and what they might have wanted us to think or feel; we can infer meaning from lyrics or instrumentation, or the subtleties of harmonic or melodic semiosis. Our sensitivity to these dimensions of sound, however, is itself developed through familiarity. It might seem paradoxical in a book about imagination to turn to work on direct perception: Gibson’s eschewal of representation and information processing in his account of perception is controversial within psychology (see, e.g., Fodor and Pylyshyn 1981) and may seem unproductive when other approaches (such as that of Neisser, e.g., 1978, 89–105) explicitly try to understand the relationship between imagination, memory, and perception. However, as I will argue later, the way that imagination functions in Gibson’s work highlights a boundary between real and virtual which is both useful and thought-provoking in this context.

Music and Sound as Information for Interpretation and Imagination

Of course, if all music did was inform us about causation and agency, it is hard to imagine that we would spend so much time and effort listening to it. Musical listening also invokes interpretative action in response to the impoverished or ambiguous information provided by the artwork:

As Gibson himself states in response to the question of what happens in cases of inadequate information: “the perceptual system hunts” (Gibson 1966, 303; also see Windsor 2000 and 2004). Where the immediate information from a particular source is insufficient, the human being not only hunts for additional information from the “natural” environment, but also from the social and cultural environment. By observing the actions of others, exploring cultural artefacts, by involvement in discussion with others, information may be explored which supplements that provided by the event or object in question. In the case of a sound or sequence of sounds which fails clearly to specify an event, the human listener attempts nonetheless to make sense of these sounds in relation to the environment (Windsor 2000 and 2011). Such activity represents for the listener the active part of what might otherwise be considered a passive activity. Choosing to write or talk about music we have heard, walking out of a gig, or even listening again to a passage of music are all actions that leave traces on the environment for others to perceive. Music-makers help furnish us with affordances for interpretative action; they do not merely provide sounds which we process in a passive internal fashion. (Windsor and de Bezenac 2012, 14)

In everyday listening situations, the meaning of the sounds we listen to is mediated by more or less active exploration of the real or virtual agency that created them: we talk to others about music, we read record sleeves or online discussions, and, where these fail to provide us with sufficient clarity, we invent, indeed imagine, reasons why the music is as it is (see also Bull, volume 1, chapter 9, on the mediation of sound in war).

Imagination and Fear of Music

If one accepts that music is a source of information about our surroundings, albeit one that is somewhat contingent on familiarity, then exposure to music that we do not know is particularly likely to invoke more imaginative and cognitive approaches to listening, as opposed to more automatic responses. In situations where we do not choose the music that we are listening to, this can be a pleasurable puzzle to be solved. Of course, extreme unfamiliarity/complexity is also less than palatable because the cognitive effort is perceived as too great (see, e.g., Berlyne 1971). For the detainee, music is at the same time purely noise and a puzzle, but not one that has any aesthetic value. For detainees such as Begg and Vance (see Cusick 2008a), the experience of music in detention was mainly one of sensory deprivation and masking: loud music functioned to isolate and irritate and was often boring and overly familiar, although sometimes absurd. Vance in particular notes the effect of being exposed to random changes in the choice of music, often mid-track, on his ability to think. For Begg in particular, the music served to block out sounds that otherwise provided useful information about the real environment. These aspects of music fit neatly into the sensory deprivation techniques described in the CIA manuals that implicitly or explicitly informed their interrogations. The acoustic environment was loud, artificial, and unpredictable: hardly useful stimulus information for engaging meaningfully with the already impoverished and unfamiliar surroundings of detention. For detainees with less familiarity with the music of their interrogators, there is an additional set of perceptual challenges. It is not just that the music is unfamiliar noise, although certainly that is one dimension of the experience: the additional challenge is that, along with removal of the detainees’ abilities to explore the sources of sounds (or indeed control them), such efforts to identify sound sources are largely pointless. In such a situation, there is no room for interpreting the sounds at all, and the choice of music is simply loud and foreign. The only causation to be perceived is that of the interrogator choosing to play or stop playing the music, a power that can be exploited to punish or reward or simply to confuse. The only way of going beyond this is to imagine, and imagination in such a situation is probably only a source of fear of the unknown. Cusick’s account of al-Qatani’s interrogation (2008a) provides a third way in which interpretative action functions. Here, it is not unfamiliarity per se that creates stress and confusion; it is the way in which imagination is superseded by active interpretation of musical stimuli: the interrogators and al-Qatani engage in a narration of the music which, for the detainee, becomes dissonant and problematic. The music that troubles

al-Qatani most is the Arabic music that is played: the other music is distressing for the reasons cited previously, acting as noise rather than in any subtler manner, but the Arabic music opens up what turns out to be a distressing dialogue about Islamic culture’s attitudes to music, one which, according to Cusick, he loses.

Conclusion: Music, Torture, and the Necessity of Experience

It would be wrong to claim that enforced listening to music in detention is simply a malevolent distortion of the ways in which we are exposed to the uses of music by marketeers (discussed earlier), however inescapable that soundtrack might be: for many detainees in recent conflicts there is such a disjunction between the musical cultures of interrogator and prisoner that music functions as mere noise. The subtler emotional impacts of music in advertising require us to recognize and decode styles and structures of music in a way only possible for the acculturated. Where detainees share a musical history with their captors, such subtle effects are still possible, and, for Vance (see Cusick 2008a), the presentation of music with emotional impact became extremely distressing due to its lack of fit with the setting of detention and interrogation. However, in all these situations, regardless of the effectiveness of choosing a soundtrack for torture, there is a deeper implication of the use of music, one that exposes a fundamental disagreement within psychology about the application of psychological methods to influence behavior. To understand where this comes from it is necessary first to take a diversion from direct discussion of music and focus instead on the roots of psychological warfare in the field of public relations and propaganda. To this end I will contrast the conclusions that Bernays (1942) and Gibson (1939) drew from the effectiveness of propaganda in influencing public opinion in times of conflict. Bernays’s work on propaganda and public relations (see, e.g., Bernays [1928] 2004) developed in part from his work in World War I within the Committee on Public Information in the United States. For Bernays, there was no conflict between the idea of a free society and the use of propaganda to influence political opinions. Indeed, in an essay written during World War II, Bernays (1942) sets up a clear platform for combating the authoritarian and highly effective use of Nazi propaganda with Allied propaganda. The moral justification for this rests on a belief that it is possible to coerce public opinion in an ethical manner: if the ends are good, then the means of persuasion can be justified. Moreover, Bernays’s approach was characterized by an underlying exhortation that propaganda should not lie, a restraint not always followed by practitioners and, in areas of ambiguity, a matter of judgment. Gibson, writing just three years earlier, had a rather different view of propaganda, which conditioned his later desire to focus his empirical research, and his theoretical writing, on the idea that direct experience (unmediated by visual displays, writing, or

even language) provides a much richer and less ambiguous source of information about the world than that proposed by cognitive and social psychologists. For Gibson, the deception of the German people (particularly into anti-Semitism) through propaganda in the 1930s was a reason to suspect all propaganda, because it is inherently deceptive. The answer, for Gibson, was to direct psychology away from the study of preconceptions, and the social influence that seeks to reinforce them, toward the study of direct perception:

Our world, more especially our world of social objects, is understood in terms of preconceptions, preexisting attitudes, habitual norms, standards, and frames of reference. When the preconception is sufficiently rigid, an object will be perceived not at all in accordance with the actual sensory stimulation but in congruence with the preconception. No psychological law has been more exhaustively demonstrated than this one. Preconceptions of this sort, moreover, are wrought out socially and modify individual judgments. This fact also has been amply demonstrated both inside and outside of a laboratory. A preconception that is socially reinforced becomes a norm or standard for everybody. It becomes verbally symbolized in the process and thereby is stereotyped and strengthened. Each individual adopts and internalizes it, forgetting its imitative origin, and incorporates it in his repertory of values and opinions.  (1939, 165–166)

When this argument is applied to the context of music and its use to influence behavior, to coerce, it becomes clear that, regardless of the setting, such a co-optation of an emotionally powerful and unavoidable stimulus is another way in which the powerful seek to control the weak: it is a method for preventing an individual from hearing their environment and is one part of the creation of a setting whereby direct experience is limited to a space controlled by others. The problem with music in torture, then, is not that music is corrupted by the interrogator but that it is used to curtail experience, rather than being a stimulus for further interpretation and exploration. In detention, the aesthetic of music is that of fear: imagination of the worst replaces any other possible interpretation. Music becomes a stimulus for social control, and the ideal subject is the detainee who cannot escape, who cannot explore and discover the world through direct experience. The coercive use of music is just one way in which the experience of an individual in detention is eroded, reifying a view of human beings that Gibson derided:

The greatest myth of the twentieth century is that people are sheep. Our intellectual culture has been built on the idea that ordinary people tend to see things as others want them to, with little independence of mind. This is a pernicious assumption. . . . It was Gibson’s purpose to undermine such thinking.  (Reed 1996, 162)

Whereas propagandists and marketeers might tacitly assume that human beings are as passive and directable as “sheep,” using music to influence through the control of auditory information, the interrogator forces the prisoner into a passive mode of engaging with sound, one in which propaganda has clearly failed in its mission to persuade. Music used in this way is a tacit admission that persuasion through musical influence has been

abandoned to brute force: like Bellman (2007), I can only conclude that it is torture that is ethically repugnant, with music playing a minor, and paradoxically unmusical, role. This role is in direct contrast to the utopian (and possibly naïve) view of music as a propaganda tool advanced by Young (1954) or Szafranski (1995), one in which enemies of the United States could be influenced by exposure to US cultural products such as music:

Saudi Arabia recently joined China as the most recent nation to outlaw satellite television receivers. One can easily appreciate the effects that Music Television (MTV) might have on such cultures.  (Szafranski 1995)

It is as if, faced with an enemy that was often not susceptible to such influence due to its rejection of music as an acceptable mode of expression, the US military and intelligence communities reacted by attempting to maintain this belief in the power of music, while simultaneously undermining its aesthetic potential. The coercive, and hence restrictive, nature of using music in detention and interrogation turns music, at its most degraded, into noise, and at its most sophisticated, into a stimulus for a fearful imagination.

References

Aitken, J. C., S. Wilson, D. Coury, and A. M. Moursi. 2002. The Effect of Music Distraction on Pain, Anxiety and Behavior in Pediatric Dental Patients. Pediatric Dentistry 24: 114–118.
Bellman, J. 2007. Music as Torture: A Dissonant Counterpoint. https://dialmformusicology.com/2007/08/21/music-as-tortur/. Accessed January 19, 2017.
Bernays, E. L. (1928) 2004. Propaganda. New York: IG.
Bernays, E. L. 1942. The Marketing of National Policies: A Study of War Propaganda. Journal of Marketing 6 (3): 236–244.
Berlyne, D. E. 1971. Aesthetics and Psychobiology. New York: Appleton-Century-Crofts.
Blass, T. 2007. Unsupported Allegations about a Link between Milgram and the CIA. Journal of the History of the Behavioural Sciences 43 (2): 199–203.
Brown, R. E. 2007. Alfred McCoy, Hebb, the CIA and Torture. Journal of the History of the Behavioural Sciences 43 (2): 205–213.
Chrytoschek, K., dir. 2011. Songs of War: Music as a Weapon. A & O Buero.
CIA. 1963. KUBARK Counterintelligence Interrogation. National Security Archive Electronic Briefing Book No. 122. Washington, DC: George Washington University.
CIA. 1983. Human Resource Exploitation Training Manual. National Security Archive Electronic Briefing Book No. 122. Washington, DC: George Washington University.
CIA. 2004. OMS Guidelines on Medical and Psychological Support to Detainee Rendition, Interrogation, and Detention. Langley: CIA.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical Meaning. Oxford: Oxford University Press.
Clayton, A. 1978. Communication for New Loyalties: African Soldiers’ Songs. Papers in International Studies, Africa Series 34. Center for International Studies, Ohio University.
Cusick, S. G. 2006. Musicology, Torture, Repair. Radical Musicology 3. http://www.radicalmusicology.org.uk/2008/Cusick.htm. Accessed January 19, 2017.

Cusick, S. G. 2008a. “You Are in a Place That Is Out of the World . . . ”: Music in the Detention Camps of the “Global War on Terror.” Journal of the Society for American Music 2 (1): 1–26.
Cusick, S. G. 2008b. Music as Torture/Music as Weapon. TRANS 8. http://www.sibetrans.com/trans/articulo/152/music-as-torture-music-as-weapon. Accessed January 19, 2017.
Department of the Army. 1992. FM 34–52 Intelligence Interrogation. Washington, DC: Department of the Army.
Department of the Army. 2006. FM 2–22.3 (FM 34-52) Human Intelligence Collector Operations. Washington, DC: Department of the Army.
Fodor, J. A., and Z. W. Pylyshyn. 1981. How Direct Is Visual Perception? Some Reflections on Gibson’s “Ecological Approach.” Cognition 9: 139–196.
French Foreign Legion. 2016. French Foreign Legion Songs and Marches. http://foreignlegion.info/songs/. Accessed January 19, 2017.
Gaver, W. W. 1993a. What in the World Do We Hear? An Ecological Approach to Auditory Event Perception. Ecological Psychology 5 (1): 1–29.
Gaver, W. W. 1993b. How Do We Hear in the World? Explorations in Ecological Acoustics. Ecological Psychology 5 (4): 285–313.
Gibson, J. J. 1939. The Aryan Myth. The Journal of Educational Sociology 13 (3): 164–171.
Gibson, J. J. 1966. The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
Gibson, J. J. 1979. The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gittoes, G., dir. 2005. Soundtrack to War. Australia: ABC Video.
Goodman, S. 2012. Sonic Warfare: Sound, Affect and the Ecology of Fear. Cambridge, MA: MIT Press.
Grant, M. J. 2013a. The Illogical Logic of Music Torture. Torture 23 (2): 4–13.
Grant, M. J. 2013b. Music and Punishment in the British Army in the Eighteenth and Nineteenth Centuries. The World of Music 2 (1): 9–30.
Grant, M. J. 2014. Pathways to Music Torture. Musique et Conflits Armés après 1945 4: 2–19.
Heft, H. 2001. Ecological Psychology in Context: James Gibson, Roger Barker and the Legacy of William James’s Radical Empiricism. Mahwah, NJ: Erlbaum.
Hui, A. 2016. Aural Rights and Early Environmental Ethics: Negotiating the Post-War Soundscape. In Current Directions in Ecomusicology, edited by A. S. Allen and K. Dawe, 176–187. New York: Routledge.
Johnson, B., and M. Cloonan. 2009. Dark Side of the Tune: Popular Music and Violence. Farnham, UK: Ashgate.
Juslin, P., and J. A. Sloboda. 2011. Handbook of Music and Emotion: Theory, Research, Application. Oxford: Oxford University Press.
Kenneally, T. 1982. Schindler’s Ark. London: Hodder and Stoughton.
Lagouranis, T., and A. Mikaelian. 2008. Fear Up Harsh: An Army Interrogator’s Dark Journey through Iraq. New York: New American Library.
Lahmann, C., R. Schoen, P. Henningsen, J. Ronel, M. Muehlbacher, T. Loew, et al. 2008. Brief Relaxation versus Music Distraction in the Treatment of Dental Anxiety: A Randomized Controlled Clinical Trial. Journal of the American Dental Association 139 (3): 317–324.
Lesiuk, T. 2005. The Effect of Music Listening on Work Performance. Psychology of Music 33 (2): 173–191.
McCoy, A. W. 2006. A Question of Torture: CIA Interrogation from the Cold War to the War on Terror. New York: Metropolitan.
Neisser, U. 1978. Perceiving, Anticipating and Imagining. In Minnesota Studies in the Philosophy of Science IX, edited by C. W. Savage. Minneapolis: University of Minnesota Press.

North, A. C., D. J. Hargreaves, and J. Mckendrick. 1999. Music and On-Hold Waiting Time. British Journal of Psychology 90 (1): 161–164.
Pieslak, J. 2009. Sound Targets: American Soldiers and Music in the Iraq War. Bloomington: Indiana University Press.
Reed, E. S. 1996. The Necessity of Experience. New Haven, CT, and London: Yale University Press.
SEM. 2007. Position Statement on Torture. http://www.ethnomusicology.org/?PS_Torture. Accessed December 13, 2018.
Shevy, M. 2008. Music Genre as Cognitive Schema: Extramusical Associations with Country and Hip-Hop Music. Psychology of Music 36 (4): 477–498.
Sloboda, J. A. 2005. Assessing Music Psychology Research: Values, Priorities and Outcomes. In Exploring the Musical Mind, edited by J. A. Sloboda. Oxford: Oxford University Press.
Smith, P. C., and R. Curnow. 1966. “Arousal Hypothesis” and the Effects of Music on Purchasing Behaviour. Journal of Applied Psychology 50: 255–256.
Standley, J. M. 1986. Music Research in Medical/Dental Treatment: Meta-Analysis and Clinical Applications. Journal of Music Therapy 23 (2): 56–122.
Szafranski, R. 1995. A Theory of Information Warfare: Preparing for 2020. Air Power Journal 9 (1): 56–65. https://www.airuniversity.af.edu/Portals/10/ASPJ/journals/Volume-09_Issue1-Se/1995_Vol9_No1.pdf. Accessed December 13, 2018.
Tansik, D. A., and R. Routhieaux. 1999. Customer Stress-Relaxation: The Impact of Music in a Hospital Waiting Room. International Journal of Service Industry Management 10: 68–81.
Volcler, J. 2013. Extremely Loud: Sound as a Weapon. New York: New Press.
Welly, A., H. Lang, D. Welly, and P. Kropp. 2012. Impact of Dental Atmosphere and Behaviour of the Dentist on Children’s Cooperation. Applied Psychophysiology and Biofeedback 37 (3): 195–204.
Wilson, S. 2003. The Effect of Music on Perceived Atmosphere and Purchase Intentions in a Restaurant. Psychology of Music 31 (1): 93–112.
Windsor, W. L. 2000. Through and around the Acousmatic: The Interpretation of Electroacoustic Sounds. In Music, Electronic Media and Culture, edited by S. Emmerson, 7–35. London: Ashgate.
Windsor, W. L., and C. de Bézenac. 2012. Music and Affordances. Musicae Scientiae 16: 102–120.
Yalch, R. F., and E. Spangenberg. 1993. Using Store Music for Retail Zoning: A Field Experiment. Advances in Consumer Research 20: 632–636.
Young, J. S. 1954. Communist Vulnerabilities to the Use of Music in Psychological Warfare. Washington, DC: George Washington University.

Chapter 15

Augmented Unreality: Synesthetic Artworks and Audiovisual Hallucinations

Jonathan Weinel

Introduction

During “altered states of consciousness” (ASCs), such as those produced by psychedelic drugs, an individual may experience substantial changes to mood, thoughts, and perception, and have subjective experiences of visual or auditory hallucinations. In Hobson’s (2003, 44–46) discussion of his AIM (activation, input, modulation) model of consciousness, he distinguishes the imagery of dreams and hallucinations as “internal” sensory inputs, in contrast with the “external” inputs that are received via the senses from the surrounding environment during normal waking consciousness. For the purposes of this chapter, external inputs correspond with physical “reality,” while the internal inputs generated by the brain during dreams or hallucinations shall be considered as “unreality.”1 The use of “unreality” as a label should not undermine the significance of these ASCs, as, throughout history, hallucinations have held a special place in human culture. Ancient shamanic traditions use hallucinogenic plants and other “techniques of ecstasy” (Eliade 1964) in order to induce visionary states that are considered to have religious significance. Such practices are believed to have been used in early forms of human religion (La Barre 1972), and still exist in a variety of surviving indigenous cultures today. In contrast, the use of hallucinogenic drugs such as LSD is illegal in most countries, and is typically viewed in more negative terms. Despite this, various countercultures since the 1960s have celebrated hallucinogens and their profound effects on conscious experience. Within these subcultures, representations of ASCs have been incorporated into a substantial amount of “psychedelic” art, literature, and music. Meanwhile, as mainstream

films and video games have sought to provide audiences with ever more exotic experiences and storylines, representations of hallucination have also been incorporated. The focus of this chapter is on the material design of these representations of hallucination within audiovisual media and the role of sound within these. First, I discuss the form of visual hallucinations, auditory hallucinations, and synesthesia. Following this, I consider how these may provide a basis for the design of audiovisual artworks. Many of these artworks can be categorized as either diegetic or synesthetic in their essential operation, as I reveal through an examination of examples from avant-garde films, feature films, light shows, visualizations, VJ performances, music videos, and video games. Following this exploration, I propose a conceptual model with three continua that describes a range of possible approaches for the representation of ASCs using audiovisual media. One of the theoretical configurations implied by this model is what I refer to as “augmented unreality”: the convergent layering of synthetic sensory information on real-world environments in order to simulate hallucinatory experiences of unreality through digital media.2 Augmented unreality benefits from technological advances such as high-resolution computer graphics, projection mapping, and multichannel surround sound systems. These allow not only for greater levels of accuracy in the representation of hallucinations but also for these to be embedded in arenas where audiences may not be expecting such an encounter. In these spaces, the boundaries between the real physical environment and the synthetic unreal can be subverted and dissolved; and it is this illusory capability that presents an important new paradigm shift for digital cultures. Early examples of augmented unreality can be seen in electronic dance music culture, in which projection mapping techniques, decor, and sonic manipulations are combined to simulate the experience of hallucinations at outdoor psychedelic music festivals. In this chapter, through consideration of these various examples in relation to the conceptual model, I will demonstrate how sound is used in the context of audiovisual representations of hallucinations, and the role it may provide in the emerging paradigm of augmented unreality.

Altered States of Consciousness

The term “altered states of consciousness” rose to prominence in the 1960s, to describe the variety of conscious states that lie beyond the typical experience of normal waking consciousness (Ludwig 1969). The varieties of ASCs include: psychosis, such as may be experienced by schizophrenics; psychedelic experiences as produced by hallucinogenic drugs such as LSD; the hallucinations caused by sensory deprivation; states of hypnosis; trances, as experienced in spirit possession rituals; and states of meditation that are used in Buddhism and other religions. Dreaming is also sometimes considered as a form of ASC (e.g., Hobson 2003), as are the unusual states that occur on the boundaries of sleep

such as hypnogogic hallucinations or sleep paralysis. Altered states of consciousness may be induced through various means, including sensory deprivation; sensory overload, during which the senses are bombarded with excessive stimulation through rapid drumming, the spraying of liquids, and the energetic dancing of trance ceremonies; focus or absorption in a repetitive task; or changes to physiological condition that may be caused by fasting, dehydration, sleep deprivation, or the use of intoxicating substances such as psychoactive drugs. The symptoms of an ASC are also various, with each type exhibiting a selection of effects. These effects may typically include changes to thought processes, memory, emotional state, and perception. Changes to sensory perception that occur as “hallucinations” may affect vision, body image, sound, tastes, and smells. While hallucinations are a component of many forms of ASC, this chapter will focus mainly on their occurrence in psychedelic experiences, and the visual, auditory, and synesthetic aspects of these in particular. As we shall see, these aspects of hallucinations have been significant as a basis for a large number of psychedelic audiovisual artworks.

Visual Hallucinations

Turning to the visual effects of hallucinogens in more detail, Heinrich Klüver (1971) carried out studies exploring the effects of mescaline on the visual system. These and related studies (Ostler 1970; Siegel 1977) explored the commonality of visual patterns of hallucinations between subjects. Klüver proposed a set of “form constants”: lattices, cobwebs, funnels, and spirals that constitute the basic form from which the visual impressions perceived during mescaline hallucinations are derived. According to Klüver, in the early stages of hallucination these form constants provide the basis for the visual patterns of hallucination commonly described, while in later stages of hallucination, other forms such as tunnels may be abstracted from these basic forms. In their study, Bressloff and colleagues (2001) suggested that these form constants arise in the visual cortex, and they are believed to be a cross-cultural feature of visual perception during hallucinations. Figure 15.1 presents a spiral image based on the form constants, as used in Psych Dome: an interactive audiovisual installation based on hallucinations (Weinel et al. 2015).

Auditory Hallucinations

Though visual hallucinations seem to be more prevalent, auditory hallucinations are also commonly reported. Studies of auditory hallucinations have mainly focused on schizophrenics, who experience “auditory-verbal hallucinations” (AVHs), in which voices are heard as if from the external environment or inside the head (Wayne 2012, 87). Though AVHs are the most common type experienced by people with schizophrenia,


Figure 15.1  Artistic impression of visual patterns of hallucination from Psych Dome (Weinel et al. 2015). The Psych Dome installation uses a consumer-grade electroencephalograph (EEG) headset to control parameters of an audiovisual artwork based on hallucinations.

“non-verbal auditory hallucinations” (NVAHs) are also known to occur and may consist of hallucinated music (Kumar et al. 2014), bangs, or noises (Jones et al. 2012). Neuroimaging studies have suggested that auditory hallucinations activate the parts of the brain involved in inner speech and Heschl’s gyrus (the auditory cortex), supporting the view that auditory hallucinations are perceived with a sense of reality comparable to that of sounds that have origins in the external environment (Dierks et al. 1999). In the hallucinations caused by drug experiences, perception of sound is also altered, ranging on a continuum from enhanced enjoyment (or otherwise) to distortions in sound quality and total hallucination of sounds with no external acoustic origin (Weinel et al. 2014). The latter may consist of either AVHs or NVAHs. Figure  15.2 illustrates a continuum of aural experience from normal waking consciousness to total hallucination (as discussed in Weinel et al. 2014). In normal waking consciousness, auditory input comes predominantly from external sensory input, which provides a basis for aural perception. As hallucinatory effects are intensified, the perceptual experience of sounds becomes enhanced; sounds are perceived as more or less enjoyable than usual or as profoundly significant. Further along the scale, the subjective experience of sound becomes distorted, as if properties such as volume, spatial location, or audio quality have been altered or manipulated with digital signal processes. As these effects intensify, the balance shifts from external to internal sensory inputs that arise within the brain. In the most extreme cases, total experiences of hallucination occur that consist of hallucinated noises, voices, or music that have no acoustic origin in the external environment.
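This continuum also lends itself to a parametric treatment in sound design. The following sketch is a minimal illustration of one way it might be operationalized, assuming a single hypothetical "intensity" value; the function, parameter names, and processing choices are my own illustrative assumptions rather than a method described in this chapter or any existing tool.

# Hedged sketch: the continuum of auditory hallucination as a single parameter.
# "external" and "internal" are mono sample buffers of equal length; as
# intensity rises, the output moves from unprocessed external sound, through
# enhancement and distortion, toward synthetic material with no external source.
import numpy as np

def hallucination_mix(external: np.ndarray, internal: np.ndarray,
                      intensity: float) -> np.ndarray:
    """Blend external sound with 'internal' (hallucinated) material as intensity rises."""
    intensity = float(np.clip(intensity, 0.0, 1.0))
    # "Enhanced": mild emphasis of the external signal.
    enhanced = external * (1.0 + 0.5 * intensity)
    # "Distorted": waveshaping that deepens as the hallucinatory effect intensifies.
    drive = 1.0 + 9.0 * intensity
    distorted = np.tanh(drive * enhanced) / np.tanh(drive)
    # "Total hallucination": crossfade toward material with no external acoustic origin.
    return (1.0 - intensity) * distorted + intensity * internal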


Synesthesia

The term “synesthesia” comes from the Greek syn (union) and aisthesis (sensation), and describes the dissolution of boundaries between the senses (Cytowic 1989). In such experiences, sounds may have tastes or colors may have smells. These are not merely imagined correspondences but actual experiences across the senses that are caused by a given stimulus. The phenomenon is reported as a general trait for some individuals in typical states of waking consciousness. However, psychedelic drugs such as mescaline, psilocybin, or LSD are also known to promote experiences of synesthesia. Although synesthesia can involve the blurring of any of the sensory modalities, in these psychedelic experiences sounds often trigger corresponding visual images (e.g., Bliss and Clark 1962, 97), suggesting the directional flow of information that is illustrated in Figure 15.3.

Figure 15.2  Continuum of auditory hallucination. [The figure plots perception of sound, from normal through enhanced and distorted to total hallucination, against sensory input ranging from external to internal.]

Figure 15.3  Sound-to-image synesthesia. [The figure shows sound representation giving rise to hallucinated imagery.]

Toward Representation

The visual and sonic components of hallucinations can be used to inform the design of corresponding visual images and sounds. Indeed, such practices may be very old; it has been proposed that examples of early shamanic rock art might have been based on the visual images seen during hallucinations (Lewis-Williams 2004). In more recent examples of psychedelic art and films, the internal experience of hallucinations can be represented through appropriate design of audiovisual content. The design of this content has been assisted by developments in sound and visual technology, such as computer graphics and audio techniques that have allowed almost any sound or visual image imaginable to be created. These technologies have allowed the subjective visual and aural experience of hallucinations to be represented in digital video, by creating materials that correspond

with the visual or aural experiences observed during ASCs. Audiovisual artworks have also enabled sound-to-image processes, similar to those found in synesthesia, to be realized through the design of moving images that correspond with music.3 In recent years, these representations of hallucination have also become interactive, as video game technologies present simulations of hallucination or synesthesia. As we shall see, representations of hallucination do not have to follow one fixed approach but may use a variety of possible approaches, ranging from those that seek to replicate visual or aural experience as accurately as possible to those that use more stylized approaches such as impressionism, metaphorical imagery, or symbolism.

Audiovisual Representations of Hallucinations

From observing audiovisual representations of hallucinations in a range of existing cultural artifacts, it is possible to consider a variety of design approaches that are used. In the following subsections I have grouped illustrative descriptions of some of these artifacts into two broad categories:

• Diegetic Representations of Hallucinations
• Synesthetic Artworks

These categorizations are by no means definitive, but provide a useful means through which we can initially begin to distinguish some key differences between works that use representations of hallucination. “Diegetic representations of hallucinations” is the phrase that describes representations of ASCs occurring within narrative contexts and applies to examples in various films and 3D video games. These examples use the illusory properties of audiovisual media in order to construct narratives involving characters in various environments. Within these narratives, scenes of hallucination are portrayed through the use of various audiovisual techniques that enable changes to the conscious state of the character to be communicated to an audience. In contrast, “synesthetic artworks” provide audiences with sensory experiences of sound and light similar to those that may be experienced during hallucination. Artworks in this category do not typically present these representations of synesthesia within a narrative framework; examples of synesthetic artworks can be found in avant-garde visual music films, visualizations, VJ performances, music videos, and interactive music visualizations. These two categories can also be distinguished by whether they use audiovisual representations of hallucination to enrich the sensory experience of a present location (synesthetic artworks), or immerse the audience in a narrative depiction of another time and place (diegetic representations of hallucination). In the following subsections, each of these categories is illustrated through a selection of examples.


Diegetic Representations of Hallucinations

Examples in this category incorporate representations of hallucination within narrative progressions. In early examples this was achieved primarily through visual innovations. For example, the classic surrealist film Un chien andalou (Buñuel 1929) creates a dreamlike narrative through a series of bizarre, non sequitur events intended to reflect the irrationality of the unconscious according to the Freudian view (e.g., Freud 1899). Various works of the “trance-film”4 genre follow a similar approach. For example, Maya Deren and Alexander Hamid’s Meshes of the Afternoon (1943) constructs a dream narrative through the use of props, camera techniques, and editing. In Meshes of the Afternoon, a swaying camera suggests disorientation from the eye-view of the protagonist, while manipulation of props and other elements suggests the unreality of the dream. Perhaps one of the most significant early representations of hallucination, Kenneth Anger’s Inauguration of the Pleasure Dome (1954), rises to a hallucinatory visual crescendo following the protagonists’ ritual consumption of intoxicating substances. Anger uses a visually stunning process of “vertical montage” (in which images are superimposed) to reflect visionary experiences, and brightly colored symbols flash on screen to reflect Thelemic5 visual hallucinations with increasing intensity. A similar technique was also later used in The Trip (Corman 1967) to reflect the visual hallucinations of LSD experiences. These early examples all show significant developments in the use of props, editing and animation techniques—essentially “special effects”—to reflect the subjective visual experience of hallucinations within a diegetic context. However, it is notable that almost no attempts are made to represent the diegetic experience of auditory hallucinations; sound is simply used to provide a nondiegetic musical support that sets the mood of the film for the audience. By the 1990s, the widespread availability of computer-generated imagery (CGI) and digital audio techniques allowed the possibility for more accurate representations of the subjective experience of hallucinations, and we begin to see diegetic representations of auditory hallucination. For example, Terry Gilliam’s (1998) cinematic adaptation of Hunter S. Thompson’s Fear and Loathing in Las Vegas (1971) uses CGI alongside props, costumes, and lighting to describe visual hallucinations: faces metamorphose; vine designs on a carpet creep up the walls; rooms pulsate with colored lights; while cameras sway and frame-rates are dropped to suggest the cognitive impairment of the intoxicated characters from a first-person perspective. This is matched by the sound: as Hunter S. Thompson (Johnny Depp) listens to a telephone conversation in a hotel lobby, the sound of the stranger’s voice is processed with reverb, causing it to momentarily fill the sound stage, in reflection of Thompson’s absorbed attention; the sound literally fills his “head space.”6 As a receptionist transforms into a snake, a pitch transposition effect is applied to her voice; and as Thompson wades through the mud of his reptile zoo hallucination, the audience hears the sloshing sounds of the unreal sludge. These uses of sound go beyond nondiegetic musical accompaniment to reflect the aural experience of hallucinations. These sounds can be seen to reflect the full continuum suggested in

Figure 15.2: from sounds that have an acoustic basis within the diegetic environment, to distorted versions of these, and sounds that are entirely internal products of hallucination with no acoustic basis in the diegetic environment. Later examples, such as Enter the Void (Noé 2009), push further still toward accurate representations of hallucination with the aid of CGI and digital audio techniques. Enter the Void uses a sustained first-person perspective: the camera presents the subjective eye-view of the protagonist, allowing the audience to see what he sees (including his blinking eyelids); while sound presents his aural experience so that the audience hears what he hears. Sound is not only used to relate his conversations, but also to reveal the inner speech of his thoughts that are delineated from vocal speech by processing the dialogue with an echo effect. Early in the film, the character smokes a glass pipe containing DMT (dimethyltryptamine), a powerful hallucinogen with a rapid onset and short duration. As he inhales the drug and the effects set in, his vision becomes blurred and spots of light flash across his visual field. He closes his eyes, and we see a network of organic fibers and fractal patterns (created using CGI), suggestive of abstractions from Klüver’s (1971) form constants. Throughout this sequence we hear an abstract sound collage, in which the sounds from the Tokyo streets below are processed with flangers and other effects in order to suggest perceptual distortions and auditory hallucination. Through these various techniques, Enter the Void demonstrates how both sound and visual images can be used to render the subjective experience of visual and auditory hallucination with improved levels of accuracy, so that the media presented bears stronger resemblance to the visual and aural experiences people actually describe during hallucinations. In recent years, computer graphics and sound have also been used to describe visual and auditory hallucinations in interactive media, such as first-person shooter (FPS) video games. For example, Weinel’s (2011) Quake Delirium demo project and Far Cry 3 (Ubisoft 2012) are video game projects that provide animated visual properties in order to simulate distortions to visual perception, while also using digital effects and sounds to simulate auditory hallucinations. In the latter game, the simulation of hallucination provides a means through which to enrich the narrative, but also demonstrates an emerging paradigm shift in which games allow the player to explore new potentialities through the simulation of altered states of consciousness in the context of virtual worlds.

Synesthetic Artworks

Synesthetic artworks present audiences with experiences of light and sound that are comparable to those that may occur during a hallucination, without the use of a clearly defined narrative context. “Visual music” is a form of avant-garde film that is specifically orientated toward synesthetic forms (Brougher and Mattis 2005). In the films of artists such as Len Lye, Norman McLaren, Oskar Fischinger, and John Whitney, animated arrangements of color and shape are used to form dynamic relationships similar to those

synesthetic artworks and audiovisual hallucinations   309 found in musical composition. While much of the work in this idiom has been ­characterized by the quest for a harmonic visual language that Whitney (1980) articulated in his writings on visual music, some works were also conceptualized as representations of the internal experiences of the “inner eye” (Wees 1992). Harry Smith’s Early Abstractions (1946–1957) series7 and Jordan Belson’s visual music films, such as Allures (1961) and the unfinished LSD (1962), are notable as examples that seek to present internal sensory experiences through film. Both artists used music as a complement to their visuals, creating synesthetic audiovisual experiences for their audiences. Although both drew inspiration from their own experiences of ASCs, their work can be more appropriately seen not as attempts to convey their own first-person experience but as constructing new sensory experiences for their audiences that provoke a form of synesthesia through the use of audiovisual media. This approach was also explored through the use of psychedelic light shows such as: Jordan Belson and Henry Jacob’s Vortex Concerts; works by the USCO collective (Davis  1975, 67; Oren  2010); and Andy Warhol’s Exploding Plastic Inevitable shows with live music by the Velvet Underground (Youngblood 1970, 102–105; Joseph 2002). For audiences on psychedelic drugs, these light shows may provide a complementary experience; however, they also construct a multimodal experience of sound and light for those individuals who are not operating under a chemically altered mindset, and this imitates the processes of synesthesia, constructing a similar experience ­synthetically through sound and projections. New technologies such as light synthesizers and computer software acted as a catalyst for the furthering of these synesthetic audiovisual experiences from the late 1970s onward. Early sound-to-light devices such as the Atari Video Music (1976) can be seen as simulating sound-to-image synesthesia (as in Figure 15.3). Subsequently, programs such as Jeff Minter’s Psychedelia (Llamasoft 1984), Trip-a-Tron (Llamasoft 1988), Virtual Light Machine (VLM) (Llamasoft  1990), and later Neon (Llamasoft  2004), are successive iterations of synesthetic equipment that incorporate progressive levels of computational integration between sound and image (Minter 2005). Along with the availability of computer graphics software on home computers, programs such as these, and hardware such as the NewTek Video Toaster, would be among those that supported the nascent VJ (“video jockey”) performances that flourished in tandem with the electronic dance music culture8 of the 1990s, as demonstrated on the Studio !K7 X-Mix (1993–1998) series. The mode of these is essentially one of sensory stimulation, and incorporates replications of visual hallucinations and synesthesia: looping 3D graphics, fractals, and cycling textures are combined in correspondence with music to produce impressions of psychedelic hallucinations and rave culture iconography. This VJ culture became a common element of larger dance music clubs and outdoor raves and has also grown to encompass the use of projection mapping technology that allows multiple surfaces to be used as video screens. Modern VJ software allows the use of real-time audio parameters as a means to manipulate graphical filters that are applied to predesigned video clips, or as parameters that drive animations. 
Recent examples of this type of work include the videos of VJ Chaotic (Ken Scott), such as Forever Imaginary (2014a), and planetarium (“fulldome”9) works such as Crystallize (2014b).
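The audio-reactive mapping that underlies such VJ tools can be illustrated with a short, hedged sketch: a per-frame loudness measure is smoothed and scaled into a notional visual parameter. The function, parameter names, and ranges below are illustrative assumptions of my own, not the interface of any particular VJ package.

# Illustrative sketch of an audio-reactive mapping of the kind used in VJ software:
# a smoothed per-frame RMS amplitude drives a hypothetical "zoom" parameter that
# could be applied to a pre-rendered video clip or generative animation.
import numpy as np

def audio_to_zoom(samples: np.ndarray, sr: int = 44100, fps: int = 30,
                  smoothing: float = 0.8) -> np.ndarray:
    """Return one zoom value per video frame, driven by audio loudness."""
    hop = sr // fps                        # audio samples per video frame
    n_frames = max(1, len(samples) // hop)
    zoom = np.empty(n_frames)
    level = 0.0
    for i in range(n_frames):
        frame = samples[i * hop:(i + 1) * hop]
        rms = float(np.sqrt(np.mean(frame ** 2))) if len(frame) else 0.0
        # One-pole smoothing so the visuals do not flicker on every transient.
        level = smoothing * level + (1.0 - smoothing) * rms
        zoom[i] = 1.0 + 2.0 * level        # map loudness to a zoom factor >= 1
    return zoom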

As with diegetic representations of hallucination, these synesthetic artworks have gradually been moving toward interactivity. The antecedents for this can be found in various earlier works such as the GRAphics Symbiosis System (GRASS, c. 1975) real-time visual system or the interactive features of Jeff Minter’s work. Robin Arnott’s game SoundSelf (2014) is notable in this area: reportedly “inspired by a group-ohm on LSD” (Ismail 2014), the game uses the human voice as an input to control synesthetic tunnel visualizations reminiscent of Klüver’s (1971) form constants, and supports the Oculus Rift virtual reality (VR) headset. Similarly, the Psych Dome software (Figure 15.1) uses EEG as a means to control real-time generation of sounds and graphics (Weinel et al. 2015) and has been used as part of an interactive performance by Darren Curtis and Bradley Pitt based on the concept of a vision quest: Noosphere: A Vision Quest at Adelaide Fringe Festival (Sacred Resonance 2016).

A Conceptual Model for Audiovisual Representations of ASCs

The discussion so far has outlined two main types of representation of hallucinatory ASCs: diegetic representations that present hallucinations within the context of a narrative progression, and synesthetic artworks that enrich the sensory experience through the presentation of hallucinatory audiovisual experiences.10 In order to further consider the differences implicated by examples within these groups, Figure 15.4 presents a conceptual model describing possible approaches for the representation of ASCs using three continua: “input,” “mode of representation,” and “arena space.”

Figure 15.4  Conceptual model for audiovisual representation of ASCs. [Axes: input (external–internal), mode of representation (accurate–stylized), and arena space (transported–situational).]


Input

The x axis of the model describes input, and corresponds with Hobson’s (2003, 44–46) discussion of sensory input that can be modulated between internal and external sources. Visual or sonic materials can be used to represent external sensory experience (e.g., impressions of actual environmental surroundings) or internal sensory experience (e.g., hallucinated visions or sounds). For instance, a narrative representation of hallucination may include visual and auditory elements that describe either an actual environment or a hallucination. Modulation between both external and internal elements is also possible, such as if an audiovisual representation of an actual environment is presented with gradually increasing distortions and the introduction of hallucinated elements.

Mode of Representation

The y axis of the model describes mode of representation, which may range from “accurate” to “stylized.” “Accurate” representations are those that attempt to render the visual or auditory elements of hallucination as authentically as possible for the audience; hence, visual effects may be used to present the visual experience of hallucination in a way that closely approximates the first-person experience, while sound may be used to render auditory distortions and auditory hallucinations.11 At the opposite end of this continuum, “stylized” describes a wide range of artistic possibilities for rendering hallucinations, through the use of art styles such as impressionism, cartooning, symbolism, or metaphorical techniques.12 Modulation between accurate and stylized approaches is possible, such as if an accurate representation diverges into the use of metaphorical materials during certain sequences in order to describe hallucinations. Such modulation is not uncommon, as movie directors often show the onset of hallucinations using visual effects or geometric patterns, before transitioning into the use of symbolic or metaphorical cinematic materials to describe the more intense phases of hallucination.

Arena Space The z axis of the model describes arena space: the entire performance space in which musical and visual elements are presented.13 At one end of this continuum, “transported” approaches are those that seek to remove the audience from the awareness of their real-world context through immersion into the illusory audiovisual medium. This is the position typically used by diegetic works that seek to absorb the audience into a fictional world and narrative. At the other end of this continuum, “situational” approaches are those that work in conjunction with the real-world environment, presenting sound and visual images that enhance the experience of the “here and now” (as opposed to the

"then and there"). Synesthetic artworks such as psychedelic light shows at rock concerts often use this approach, since they aim to stimulate the senses of the audience within the present. Modulation between transported and situational approaches is also possible, since an audiovisual work may operate in conjunction with the arena space or seek to transport the listener from it at various points during a performance.

In Practice As demonstrated in Figure 15.5, the conceptual model can be used to describe the representational approach used by various examples, such as those discussed previously. Enter the Void (Noé 2009) uses representations of both internal and external sensory experience and modulates between the two as the protagonist shifts between normal and hallucinatory states of consciousness. Due to these modulations, the actual point on the conceptual model changes through the course of the film; hence, the ellipse indicates not one point but the approximate range that is traversed over time. The mode of representation in Enter the Void leans toward accurate representations of ASC, and as a fictional narrative, it seeks to transport the audience from awareness of the movie theater into the diegesis of the story. Fear and Loathing in Las Vegas (Gilliam 1998) also uses both internal and external inputs; the hotel lobby scene described earlier includes real-world sounds of the environment and modified versions of these that suggest movement along the continuum toward internal sensory perception and hallucination. However, while aspects of the visual and auditory approach used in Fear and Loathing in Las Vegas correspond with the actual form of ASCs, the mode of representation is relatively more stylized than Enter the Void. As the work is diegetic, use of the arena space is similarly “transported”

[Figure 15.5 appears here: the conceptual model of Figure 15.4 with Fear and Loathing in Las Vegas, Enter the Void, Allures, and LSD positioned on the Input, Mode of Representation, and Arena Space axes.]

Figure 15.5  Examples of audiovisual works positioned on the conceptual model.

for this work, and indeed this is the arena space position for most works in the "diegetic representations of hallucination" group. Psychedelic visual music films such as those by Jordan Belson do not generally include representations of external elements; visual elements are descriptive of visual impressions of inner experience and therefore occupy the internal part of the axis, as indicated for Allures (1961) and the unfinished work LSD (1962) in Figure 15.5. Considering the mode of representation, these films each fall somewhere between accurate and stylized positions. For instance, LSD leans toward accuracy through the depiction of forms similar to Klüver's form constants; it resembles the type of imagery people actually describe during visual hallucinations with closed eyes during LSD trips. In contrast, Allures is a more metaphorical work. Both works could be considered as "situational," since they aim to actually induce synesthetic experience rather than transport the listener into a fictional narrative. The situational approach is also the typical position for many other works discussed in the "synesthetic artworks" category, since psychedelic light shows and VJ performances typically seek to bombard the senses with light and sound and enhance the sensory experience of a space, rather than extract the individual from his or her awareness of it.
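To make the model concrete, the sketch below (not from the chapter; all names and placements are illustrative guesses) encodes the three continua as a simple data structure so that works can be compared programmatically, with each coordinate running from 0.0 to 1.0 along the corresponding axis.

```python
# Illustrative sketch only: the three continua of the conceptual model as a data
# structure. Coordinates are hypothetical single-point summaries; as noted above,
# real works may traverse a range of positions over time.
from dataclasses import dataclass

@dataclass
class ASCRepresentation:
    title: str
    input_axis: float   # 0.0 = external sensory input, 1.0 = internal (hallucinated)
    mode: float         # 0.0 = accurate, 1.0 = stylized
    arena: float        # 0.0 = transported, 1.0 = situational

works = [
    ASCRepresentation("Enter the Void", input_axis=0.6, mode=0.2, arena=0.1),
    ASCRepresentation("Fear and Loathing in Las Vegas", input_axis=0.5, mode=0.5, arena=0.1),
    ASCRepresentation("Allures", input_axis=0.9, mode=0.7, arena=0.9),
    ASCRepresentation("LSD", input_axis=0.9, mode=0.3, arena=0.9),
]

# Example query: works that operate mainly with internal (hallucinated) material
internal_works = [w.title for w in works if w.input_axis > 0.7]
print(internal_works)
```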

Augmented Unreality As we have seen, audiovisual representations of hallucinations may use a variety of different approaches. The historic development of these forms has also been closely related to the advancement of computer graphics and digital audio. These technologies are especially useful because, by their very nature as subjective, unreal phenomena, hallucinations cannot be captured or recorded in the way that images and sounds from the external environment can be. Computer graphics and sound then provide a means to create synthetic representations of visual or sonic material based on the form of hallucinations, thus avoiding this problem. Through digital technologies we have witnessed a progression from the use of camera techniques and props to represent ASCs (Un chien andalou, Buñuel 1929; Meshes of the Afternoon, Deren and Hammid 1943), to sophisticated CGI and digital audio as a means to portray the subjective experience of hallucinations (Fear and Loathing in Las Vegas, Gilliam 1998; Enter the Void, Noé 2009). These technologies have also had a significant impact on synesthetic artworks that are presented in social situations such as psychedelic rock concerts and raves, taking us from the early model of projecting analog film with approximately synchronized music, to real-time, audiovisual light shows involving multiple projection surfaces where visuals are linked computationally to electronic dance music (VJ culture). The development of synesthetic artworks that occur in social situations, often in ad hoc locations such as outdoor performance spaces (e.g., music festivals and raves) is

314   JONATHAN WEINEL currently on the cusp of a further new development: “augmented unreality.” In computing, “augmented reality” describes the use of immersive computer technologies in order to add an additional layer of information that allows the user to access additional or virtual data. Typically, the aim of these systems is to enrich the user experience by providing access to useful information in correspondence with the location of the user. The proposed concept of “augmented unreality” is similar in that it also uses immersive technologies to add an additional layer of information; however, it is distinguished by the use of this layer to disrupt the perceived reality of the situation, effectively bending it toward imaginary or hallucinatory experiences of unreality. In these terms, the modification is a distortion or subversion of physical, external reality, and may exist to alter or corrupt information rather than add new data. The aim is less to inform, and more to misinform and destabilize the perceptual experience of the subject. Figure 15.6 illustrates the concept of augmented unreality with regard to the conceptual model for audiovisual representations of ASC. As shown by the model, in a given situation the audience will experience both the real-world external environment of the situation and a synthetic construction of internal unreality that is facilitated through digital technologies. However, these two input sources converge, so that sounds or visual images from the external environment appear as if distorted, dissolving the boundaries between the real-world environment and the synthetic unreality.14 Augmented unreality is significantly aided by the quality with which digital technologies allow the production of media that accurately resemble the form of hallucinations. As discussed, high-resolution computer graphics and digital audio techniques, such as realistic spatialization, provide a powerful means through which to construct synthetic illusory representations of unreality that are effective and convincing. The process is also assisted by the computational processes that link the experience of sound to the visuals, allowing the media to form a synesthetic mesh across the modalities, imitating the mechanism of actual synesthesia (as in Figure 15.3). Materials can also be designed

[Figure 15.6 appears here: the conceptual model with two regions, the real-world external environment and a synthetic internal unreality, converging along the Input axis and plotted against Mode of Representation and Arena Space.]

Figure 15.6  Conceptual model of "augmented unreality."

synesthetic artworks and audiovisual hallucinations   315 to converge with the external environment through the use of techniques such as the imitation and processing of visual or aural information derived from the external environment; the external environment then becomes an input source that can be subjected to graphical or sonic transformations. We find visual examples of this in the spectacles of projection-mapped buildings, where artists use the actual form of the building and its texture as a basis for the design of transformed materials. In sound, we find a similar principle in electroacoustic compositions such as Rajmil Fischman’s No Me Quedo . . .  (2000; discussed in Fischman 2008), which uses recorded sound and digital transformations to provide convergence between instrumental sounds and synthetic electroacoustic sounds. The delivery of these illusory forms of media is supported by the availability of increasingly powerful technologies, such as multichannel speaker systems and multiprojection mapping systems. These allow the media to be delivered convincingly, and their (semi)portable nature also enables the illusions to be “thrown” and sited outside of the usual arenas of cinemas or on computer screens where we might otherwise expect to see them. This, in turn, allows the potential for illusory encounters that are unexpected and, in some cases, may be indistinguishable from the real, physical environment. It is the combination of convincing illusory media, coupled with the ability to site or throw these anywhere, that exposes an important paradigm shift for digital culture, since almost any public space is then a potential location where perceived reality can be corrupted through the augmented unrealities of digital media. In ideal cases, the high-quality sound and graphics will allow the surface of the media to qualitatively approach the point where its synthetic nature cannot be detected with certainty, while the portability of these illusions will help to catch audiences off-guard. Early examples of augmented unreality can be observed in electronic dance music culture. For example, psychedelic trance culture15 prioritizes the aesthetics of the psychedelic experience in music, and at outdoor festivals ultraviolet decor and VJ collectives such as Trip Hackers and Artescape design synesthetic visual elements that are intended to mesh with outdoor (real-world) festival environments (e.g., Dickson 2015). Projection mapping is used in conjunction with sculptural elements that provide custom surfaces for projection and temporary architectural spaces that imitate the form of visual hallucinations and mandalas. These sculptural elements allow animated fractals and tunnel elements suggestive of visual hallucinations to be integrated into real-world environments such as forests, subverting the physical reality of these situations. As heard on Durango’s Tumult (2005), these visual elements are typically used in conjunction with music that includes a combination of rhythmic and melodic elements (intended to produce maximum energetic dance effects), coupled with sounds such as noises and voices that are suggestive of auditory hallucinations. These sounds are manipulated using high-quality digital spatialization and transformations, enabling the enhancements and distortions of auditory hallucinations (Figure 15.2) to be represented through sound. Both sounds and visual materials then explicitly simulate the sensory experience of visual and auditory hallucinations. 
Since the light show is linked to the audio, the form of synesthesia is also imitated, so that the colors and movement of visual images fluctuate and jump in response to the sounds. The overall effect is “situational,”

since it works in conjunction with the real-world, outdoor setting of the festival, integrating real environmental features such as trees, birds, and the skyline into the equation. Digital media are used to elicit a synthetic experience of unreality in a manner that blends with the real, physical environment, and thus augmented unreality (Figure 15.6) is accomplished. In these situations, it is entirely possible that the audience may begin to experience dissolution of the boundaries between the real environment and the synthetic presentations of unreality. This may be especially true for audiences using chemical substances to alter their mind-sets; however, drugs may not be a prerequisite, since the illusory properties of digital media alone could be sufficient to provide such experiences. As the audiovisual technologies discussed thus far become pervasive, the capability to convincingly invoke augmented unreality should increase. Although I have characterized augmented unreality here in terms of projections and loudspeakers, it is possible that other emerging audiovisual technologies could also be used to achieve similar effects. For example: wearable video equipment such as the "smart contact lenses," which play and record video (currently in development); augmented/mixed reality glasses such as Microsoft's HoloLens; or headphone systems such as Doppler Labs' Here (Doppler Labs 2015), which modifies and filters sounds from the external environment, are among those that could theoretically be used to simulate hallucinations and achieve augmented unreality.16 The long-term implications of this type of media could be dramatic, as the glow of synthetic virtual environments and their accompanying sonic vibrations extend over the everyday, allowing the potential to simulate ASC experiences without the use of intoxicating substances.
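The computational linkage between sound and image described above can be illustrated with a minimal sketch (not drawn from any of the works discussed; the function names and the mapping are assumptions for illustration): a frame-wise amplitude envelope is extracted from an audio signal and mapped onto a single visual parameter, standing in for the brightness of a projected element.

```python
# A minimal, hypothetical sketch of sound-driven visuals: an RMS envelope of the
# audio drives a per-frame brightness value that a renderer could consume.
import numpy as np

def amplitude_envelope(audio: np.ndarray, frame_len: int = 1024) -> np.ndarray:
    """Frame-wise RMS of a mono signal, normalized to the 0-1 range."""
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return rms / (rms.max() + 1e-12)

def brightness_from_audio(audio: np.ndarray) -> np.ndarray:
    """Map the envelope to a brightness value per video frame (hypothetical mapping)."""
    env = amplitude_envelope(audio)
    return 0.2 + 0.8 * env  # keep a visible floor so the image never goes fully dark

if __name__ == "__main__":
    # Synthetic signal standing in for a live audio feed at a light show
    t = np.linspace(0.0, 1.0, 48000)
    test_signal = np.sin(2 * np.pi * 220.0 * t) * np.linspace(0.0, 1.0, t.size)
    print(np.round(brightness_from_audio(test_signal)[:5], 3))
```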

Concluding Remarks

This chapter has provided an outline of the main effects of hallucinations (a form of ASC) with regard to the visual and aural components of the experience, including sound-to-image synesthesia. As we have seen, the typical form of psychedelic hallucinations follows some structural norms that produce commonality in the experiences between participants. These norms have allowed the representation of hallucinations in a variety of audiovisual media such as films, visualizations, and computer games.17 These can be broadly classified in terms of diegetic representations of hallucination and synesthetic artworks and may use a range of possible approaches. These possible approaches can be considered in terms of the conceptual model presented, which allows the use of input, mode of representation, and arena space to be considered for a given work. The conceptual model also allows us to reflect on the recent move toward improved accuracy found in representations of hallucination and as afforded by digital technologies for sound and computer graphics. I have argued that this drive toward realism, coupled with new technologies for siting work in ad hoc locations, has opened up a new paradigm of "augmented unreality," in which real external environments and synthetic representations of unreality converge. Augmented unreality is currently

exemplified by the synesthetic environments of psychedelic trance festivals, but over the next few decades we can expect the trend to grow as illusory audiovisual technologies become increasingly pervasive. As these technologies provide improved resolutions and capabilities for modifying audience experience, the boundaries between external reality and synthetic unreality may dissolve to the point where the two can no longer be distinguished; in effect, producing synthetic digital forms of ASCs.

Notes 1. In drawing on Hobson’s distinction of “external” and “internal” sensory inputs, we should note that he does not propose these as binary categories, but rather a continuum of possible states. It is acknowledged that internal processes can significantly shape normal waking consciousness, and indeed, conversely, in some cases the contents of dreams can also be influenced by external sensory inputs. What is important here is the main origin of sensory material, which in normal waking consciousness is predominantly “external,” unlike dreams and hallucinations that are primarily “internal.” As I explore in this chapter, both the real (external) and the unreal (internal) can provide a basis for corresponding art, sound, and music. 2. While the emphasis here is on digital practices, many of the essential approaches I explore in this chapter were first proven with analog technologies such as film and magnetic tape, and before that, techniques such as painting and the use of acoustic instruments. 3. It should be acknowledged here that experiences of synesthesia are highly individualized; nonetheless, in drug experiences we find that a common mechanism of sound-to-image synesthesia occurs, along with typical visual effects such as the “form constants” (Klüver 1971). In this regard, there are generalizable processes that audiovisual media can begin to reproduce, even if the specific manifestations of synesthesia that are experienced by individuals may remain somewhat elusive. 4. As discussed by Sitney (1979, 21), the concept of the “trance-film” (similar to the “psychodrama”) describes films on such themes as dream, somnambulism, ritual, or possession. 5. “Thelemic” refers to the use of iconography derived from Aleister Crowley’s Thelema religion, which Anger was a member of. These icons are presented in Inauguration of the Pleasure Dome (Anger 1954) as if they were visual hallucinations, suggesting that the ritual invokes visionary experiences related to the Thelemic principles. 6. For further discussion of the metaphorical use of reverb to suggest internal psychological processes in films and popular music, see Doyle (2005). 7. During this period Harry Smith created a series of untitled films, of which several were subsequently lost or destroyed (Sitney 1979, 232–233). Early Abstractions (1946–1957) collects the remaining films from this series. 8. For a further discussion of electronic dance music culture, see St. John (2009). 9. “Fulldome” environments project video on to the hemispherical ceiling of a dome structure, in order to provide an immersive 360° experience. These environments are used for planeta­rium shows, but have also been used to provide various forms of expanded cinema. Notable fulldome events showcasing new work in the United Kingdom have included Mario DiMaggio’s Dome Club series and FullDomeUK. 10. The description of “hallucinatory audiovisual experiences” here does not presume that audiences experience a hallucination in exactly the same way as would be precipitated by

318   JONATHAN WEINEL other means (e.g., psychedelic drugs); rather, the experience of sound and images may elicit distinct illusory experiences that imitate the form of hallucinations. 11. Instead of the term “accurate” we might otherwise have used the term “realistic” here, to describe the stylistic approach taken, in correspondence with “realist” approaches in the visual arts (e.g., photorealism). As Kennedy (2008, 449–450) remarks, realist approaches can be used for depicting actual scenes, but they can also be used when rendering the imaginary (or in this case, the hallucinatory). However, for our purposes here the terms “realist” or “realistic” are unhelpful, since by definition the hallucinatory is unreal; hence the term “accurate” is preferable, to avoid having to describe unreal materials as also “realistic.” 12. For a further discussion of metaphors in art, see also Kennedy (2008). 13. The term “arena space” is borrowed from Smalley (2007), and describes “the whole public space inhabited by both performers and listeners” (42) Here, the term is adapted to include audiovisual elements. 14. The convergence of synthetic and real-world materials here is a development and adaptation of Fischman’s (2008) discussion of convergence of instrumental and electronic materials in electroacoustic music, especially his own composition No Me Quedo . . . (2000). 15. For more information on psychedelic trance culture, see St. John’s definitive account Global Tribe: Technology, Spirituality and Psytrance (2012). 16. In a series of public lectures, Carl Smith (see also 2014,  2016) has described these and other technologies as enabling a new paradigm that he refers to as “context engineering”: computer systems that allow the user to modify his or her contextual awareness, using “reality as a medium.” In these terms, “augmented unreality” could be considered as a specific branch of context engineering. 17. For an expanded discussion of how ASCs may be represented or induced across a wide range of electronic music and audiovisual media, see also Weinel (2018).

References Anger, K., dir. 1954. Inauguration of the Pleasure Dome. Arnott, R. 2014. Soundself. Video game. Belson, J., dir. 1961. Allures. USA. Belson, J., dir. 1962. LSD. USA. Bliss, E. L., and L. D. Clark. 1962. Visual Hallucinations. In Hallucinations, edited by L. J. West, 92–107. New York: Grune & Stratton. Bressloff, P. C., J. D. Cown, M. Golubitsky, P. J. Thomas, and M. C. Wiener. 2001. Geometric Visual Hallucinations, Euclidean Symmetry and the Functional Architecture of Striate Cortex. Philosophical Transactions: Biological Sciences 356:299–330. Brougher, K., and O.  Mattis. 2005. Visual Music: Synaesthesia in Art and Music since 1900. London: Thames & Hudson. Buñuel, L., dir. 1929. Un chien andalou. France. Corman, R., dir. 1967. The Trip. American International Pictures. Cytowic, R. E. 1989. Synesthesia: A Union of the Senses. New York: Springer-Verlag. Davis, D. 1975. Art and the Future. New York: Praeger. Deren, M., and A. Hammid, dirs. 1943. Meshes of the Afternoon. USA. Dickson, C. 2015. Earthdance Cape Town 2015: Main Stage Installation and Video Mapping by Afterlife. Vimeo. https://vimeo.com/139905544. Accessed October 25, 2015.

synesthetic artworks and audiovisual hallucinations   319 Dierks, T., D. E. J. Linden, M. Jandl, E.  Formisano, R.  Goebel, H.  Lanfermann, et al. 1999. Activation of Heschl’s Gyrus during Auditory Hallucinations. Neuron 22:615–621. Doppler Labs. 2015. Here Active Listening. http://www.hereplus.me/. Accessed October 25, 2015. Doyle, P. 2005. Echo and Reverb: Fabricating Space in Popular Music Recording 1900–1960. Middletown, CT: Wesleyan Press. Durango. 2005. Tumult. Italy: Inpsyde Media. Eliade, M. 1964. Shamanism: Archaic Techniques of Ecstasy. Princeton, NJ: Princeton University Press. Fischman, R. 2000. No Me Quedo . . . (17:30). Available on: R.  Fischman . . . A Wonderful World. EMF. Fischman, R. 2008. Mimetic Space: Unravelled. Organised Sound 13 (2): 111–122. Freud, S. 1899. The Interpretation of Dreams. Ware, UK: Wordsworth Editions. Gilliam, T., dir. 1998. Fear and Loathing in Las Vegas. USA: Universal Pictures. Hobson, J. A. 2003. The Dream Drugstore. Cambridge, MA: MIT Press. Ismail, R. 2014. Robin Arnott Presskit: Soundself. SoundSelf. Video Game Website. http:// soundselfgame.com/presskit/sheet.php?p=soundself. Accessed October 25, 2015. Jones, S. M., T. Trauer, A. Mackinnon, E. Sims, N. Thomas, and D. L. Copolov. 2012. A New Phenomenological Survey of Auditory Hallucinations: Evidence for Subtypes and Implications for Theory and Practice. Schizophrenia Bulletin 40 (1): 231–235. Joseph, B. W. 2002. “My Mind Split Open”: Andy Warhol’s Exploding Plastic Inevitable. Grey Room 8:80–107. Kennedy, J.  M. 2008. Metaphor and Art. In The Cambridge Handbook of Metaphor and Thought, edited by R. W. Gibbs, 447–461. Cambridge: Cambridge University Press. Klüver, H. 1971. Mescal and Mechanisms of Hallucinations. Chicago: University of Chicago Press. Kumar, S., W. Sedley, G. R. Barnes, S. Teki, K. J. Friston, and T. D. Griffiths. 2014. A Brain Basis for Musical Hallucinations. Cortex 52:56–97. La Barre, W. 1972. The Ghost Dance: The Origins of Religion. London: Allen & Unwin. Lewis-Williams, J. D. 2004. The Mind in the Cave. London: Thames & Hudson. Llamasoft. 1984. Psychedelia. Commodore 64. Llamasoft. 1988. Trip-a-Tron. Atari ST. Llamasoft. 1990. Virtual Light Machine (VLM). Atari Jaguar. Llamasoft. 2004. Neon. X-Box 360. Ludwig, A. M. 1969. Altered States of Consciousness. In Altered States of Consciousness: A Book of Readings, edited by C. T. Tart, 9–22. New York: John Wiley & Sons. Minter, J. 2005. Neon. Llamasoft: Home of the Virtual Light Machine and the Minotaur Project Games. http://minotaurproject.co.uk/neon.php. Accessed October 25, 2015. Noé, G., dir. 2009. Enter the Void. France: Fidélité Films. Oren, M. 2010. Getting Out of Your Mind to Use Your Head. Art Journal 69 (4): 76–95. Ostler, G. 1970. Phosphenes. Scientific American 222 (2): 79–87. Sacred Resonance. 2016. Noosphere: A Vision Quest. Interactive audio-visual performance). Adelaide Planetarium, March 4–6, 2016. Adelaide, Australia. http://www.sacredresonance. com.au/#!noosphere-/c1y6h. Accessed April 11, 2017. Scott, K. 2014a. VJ Chaotic: Forever Imaginary. Available on Various Artists Optical Research. DVD. London: Hardcore Jewellery. Scott, K. 2014b. Crystallize. USA. Siegel, R. K. 1977. Hallucinations. Scientific American 237 (4): 132–140. Sitney, P.  A. 1979. Visionary Film: The American Avant-Garde 1943–1978. 2nd ed. Oxford: Oxford University Press.

320   JONATHAN WEINEL Smalley, D. 2007. Space-Form and the Acousmatic Image. Organised Sound 12 (1): 38–58. Smith, C.  H. 2014. Context Engineering Hybrid Spaces for Perceptual Augmentation. In Electronic Visualisation and the Arts (EVA 2014), 244–245. London: British Computer Society. http://www.bcs.org/upload/pdf/ewic_ev14_s18paper3.pdf. Accessed September 29, 2016. Smith, C. H. 2016. Context Engineering Experience Framework. In Electronic Visualisation and the Arts (EVA 2016), 191–192. London: British Computer Society. http://dx.doi.org/ 10.14236/ewic/EVA2016.37. Accessed September 29, 2016. Smith, H. E., dir. 1946–1957. Early Abstractions. USA. St. John, G. 2009. Technomad: Global Raving Countercultures. London: Equinox. St. John, G. 2012. Global Tribe: Technology, Spirituality and Psytrance. London: Equinox. Studio !K7. 1993–1998. X-Mix. Video Series. Thompson, H. S. (1971) 2005. Fear and Loathing in Las Vegas. Reprint. London: HarperCollins. Ubisoft. 2012. Far Cry 3. Sony PlayStation 3. Wayne, W.  U. 2012. Explaining Schizophrenia: Auditory Verbal Hallucination and SelfMonitoring. Mind and Language 27 (1): 86–107. Wees, W. C. 1992. Making Films for the Inner Eye: Jordan Belson, James Whitney, Paul Sharits. In Light Moving in Time: Studies in the Visual Aesthetics of Avant-Garde Film, edited by W. C. Wees, 123–152. Berkeley: University of California Press. http://publishing.cdlib.org/ ucpressebooks/view?docId=ft438nb2fr;brand=ucpress. Accessed October 25, 2015. Weinel, J. 2011. Quake Delirium: Remixing Psychedelic Video Games. Sonic Ideas (Ideas Sonicas) 3 (2): 22–29. Weinel, J. 2018. Inner Sound: Altered States of Consciousness in Electronic Music and AudioVisual Media. New York: Oxford University Press. Weinel, J., S.  Cunningham, and D.  Griffiths. 2014. Sound through the Rabbit Hole: Sound Design Based on Reports of Auditory Hallucination. In ACM Proceedings of Audio Mostly 2014. Denmark: Aalborg University. doi: 10.1145/2636879.2636883 Weinel, J., S. Cunningham, N. Roberts, S. Roberts, and D. Griffiths. 2015. EEG as a Controller for Psychedelic Visual Music in an Immersive Dome Environment. Sonic Ideas (Ideas Sonicas) 7 (14): 85–91. Whitney, J.  H. 1980. Digital Harmony: On the Complementarity of Music and Visual Art. Peterborough: Byte Books/McGraw-Hill. Youngblood, G. 1970. Expanded Cinema. New York: E. P. Dutton.

chapter 16

Consumer Sound
Søren Bech and Jon Francombe

Introduction

This chapter deals with one of many methods (namely descriptive sensory analysis) used for the objectification and quantification of the consumer's imagination with respect to the audio signal; it provides a justification of the method as used in the audio industry for the design of audio playback technology that maximizes the potential for controlling or improving the consumer's auditory imagination. In the first section, the basic assumptions and procedures behind sensory analysis are introduced. These are exemplified by the quantitative descriptive analysis (QDA) method. QDA is one of the basic methods in sensory analysis of food or sound quality; it addresses and controls the complex influence of an individual listener's expectations, mood, previous experiences, and so on in an experimental context. In the following section, an example of a complete sensory analysis of a complex sound field is provided, followed by details of the subsequent development of a perceptual model for prediction of the attribute distraction in a particular type of sound field. In the final section, upcoming and future developments in this area are discussed.

The traditional role of the audio industry1 has been to provide means for a listener to perceive and experience audio content (as made by some content creator, e.g., a music artist or sound designer) at any time and anywhere after the production of the content. This includes products or services that are used to record and store the sound (microphones, tape, records, CD players, and so on); processes and products for transmission of the sound to the end consumer; and finally, products for reproducing the sound in the consumer's home, car, or other listening venue. Theile (1991) states that a reproduction system should "satisfy aesthetically and it should match the tonal and spatial properties of the original sound at the same time." A primary goal of the industry has therefore always been "transparency"—that is, to create an impression or auditory experience2 for the listener so that, for example, during a news broadcast it is possible for the listener to form an auditory image of the announcer being "in the listening room" (as opposed to

322   søren bech and jon francombe being in a remote studio). Another example is to enable any listener to imagine that he or she is in the concert hall where, say, a classical music performance took place. The main goals for researchers in academia and industry have therefore been to understand the processes involved in the entire transmission chain (from recording to repro­ duction) and to develop products that allow the listener to perceive auditory images that (1) correspond to actual “participation” in the original performance; and (2) accurately reflect the original and unmodified intentions of the artist and the producer. This goal has driven a range of research areas under the general term “acoustics” that is defined by ANSI/ASA (2013) as: “(a) Science of sound, including its production, trans­ mission, and effects, including biological and psychological effects; (b) Those qualities of a room that, together, determine its character with respect to auditory effects.” Specific areas in the present context include “communication acoustics” (Blauert 2005; Pulkki and Karjalainen 2015) and signal processing in acoustics (Havelock et al. 2008). The audio industry has continuously improved or developed new techniques and a range of products with the overall purpose of improving the ability of the rendering/reproduction process to allow the listener to experience a perceptual image equivalent to that which would accompany the original acoustic event. For example, over a number of decades the optimal reproduction system has developed from a single channel (monophonic) reproduction system through two-channel stereophony, 5.1 “surround sound,” and more recently to advanced surround sound systems including 22.2 reproduction (Hamasaki 2011, and references therein). Such systems and their evaluation are discussed further at the end of this chapter. The increase in complexity of the recording/reproduction systems was in part made possible by the introduction of digital signal processing; however, it was not until the introduction of advanced encoding and decoding of audio and video signals in products that such multichannel systems and other signal “manipulation” techniques became widely available to consumers. The introduction of digital signal processing in mass-market audio and video prod­ ucts such as mp3 audio players produced a new range of possibilities for further improv­ ing the quality of audio or video signals, and therefore the quality of “imagination” based on the consumer’s auditory experience. In addition to benefits such as general higher quality, increased number and availability of programs, and more advanced features, a number of signal artifacts were unfortunately also introduced. These included “ringing”3 in audio and “squared clouds”4 in video. These artifacts were very noticeable even for the average consumer. In order to remove or technically compensate for these imper­ fections, the industry had need of “measuring” methods that could connect the physical properties of the signals with the perceived auditory impression of the consumers. This was not a new problem or topic area; researchers in psychophysics had been investigat­ ing such relationships for years (see, e.g., Gescheider 2015, for an introduction), focusing on “simple” auditory experiences such as the perceived strength of a sound (loudness). 
However, the new problem was how to quantify complex multidimensional experiences that, in addition to simple attributes such as loudness and timbre, also included a number of completely unnatural artifacts (such as “squared clouds” in video). The first task in this process was therefore to devise an experimental paradigm that could

consumer sound   323 be used to disentangle a complex auditory or visual experience into a number of ­subcomponents—the so-called attributes—and then to apply standard psychophysical procedures for each of the attributes. The first author was faced with such a task in 1994, when the EUREKA-funded Adonis project “Perceptual image quality of television displays” was started with Philips NatLab, Philips TV, and Bang & Olufsen as partners (Bech et al. 1996). The purpose of the project was to develop a framework that could be used to quantify the general image quality of CRT displays and to establish a perceptual model or group of models that would link physical measurements with a viewer’s general quality impression. The idea of splitting complex percepts into individual attributes was not new. In 1922, Sabine identified individual attributes of the overall impression of concert halls such as loudness and distortion (Sabine 1922); later, Beranek (1962) identified further attributes of concert hall sound and also developed questionnaires and rating schemes to be used in experiments with human beings (in the following termed assessors). In the audio field, researchers including Staffeldt (1974), Gabrielsson and Sjögren (1979), Toole (1982), and Bech (1994) started developing procedures inspired by the work in concert halls, but with the purpose of understanding perception of reproduced sound by loudspeakers in domestic settings. The focus later moved to communication sound quality, particularly for mobile phones (see Zacharov 2012, for a historical overview). The food industry had also developed and standardized procedures (ISO 1993, 1994, 2002a, 2002b) for description of food quality, based on identification of individual attributes (see Lawless and Heyman 1998, for an overview). The partners in the Adonis project, inspired by the tradition in the food industry, developed the method “rapid perceptual image description” (RaPID) (Bech et al. 1996) for quantification of image quality of television displays that has since been used by, for example, Bang & Olufsen and other television manufacturers. The method includes a specification of the entire evaluation process, including selection and training of asses­ sors, conducting the experiments, and statistical analysis and reporting of the results. The process of evaluation of consumer products using such methods is termed sensory analysis or sensory evaluation. The RaPID method was transferred and implemented for audio evaluations at Bang & Olufsen by the first author and later used in a number of audio and video research projects. Bech and Zacharov (2006) described the state-of-the art within sensory analysis of sound and since then a number of research projects have used and further developed the range of methods. The International Telecommunication Union— Radiocommunication Sector (ITU-R) has also developed standardized methods for perceptual evaluation of audio components such as low bit-rate codecs (ITU-R 1997, 2015). In the food area, new “fast-track” methods have recently been developed to aid a more efficient sensory analysis in industrial settings. These are currently being further tested in audio (see Kaplanis et al. 2017a, 2017b; Moulin et al. 2016). The introduction of so-called audio objects in the transmission protocol for the broadcasting of sound (Herre et al. 2014) and in cinema sound (Kjörling et al. 2016) has further emphasized the use of sensory analysis, as it is now possible to manipulate

the spatial properties of the rendering to a much larger degree than ever before, thereby further increasing the degrees of freedom for controlling or improving the listener's auditory images. This means that sensory analysis of spatial properties of reproduced sound is currently a hot research topic in many large projects—see, for example, the work of the "S3A: Future Spatial Audio for an Immersive Listener Experience at Home" project.5

Sensory Analysis of Complex Audio Stimuli

The introduction of new signal processing techniques (as discussed earlier) meant that perceptual audio scientists needed to develop ways of quantifying the auditory experiences of consumers presented with complex auditory stimuli. Various methods, often adapted from other sensory sciences (for example, food science), have been used to achieve this aim. This section includes a description of the basic assumptions of descriptive analysis and the main principles of the QDA method. The content will be a summary of information presented by Bech (1999), Bech and Zacharov (2006, chap. 4), and Martin and Bech (2005). Readers are referred to these publications for additional details of QDA, other methods, and general references. The use of assessors to evaluate and report on the auditory experiences produced by a certain set of stimuli in a scientifically valid manner requires (at least) two basic issues to be clearly defined: first, the question the assessor is required to answer; and second, a specification of how the assessor should report the answer. The definition of the question to the assessor depends on the purpose of the experiment and the stimuli he/she is subjected to in the experiment. In a laboratory setting, specific stimuli can be engineered to answer specific questions; conversely, in field settings the stimuli will be naturally occurring and this will have an impact on the type of questions that can be posed to the assessors. In the Adonis project (introduced previously), the key questions were related to general image quality—or lack thereof—due to artifacts in the processing of the natural images; therefore, the stimuli had to be complex natural images. However, in order to establish a scientifically valid relationship between the physical phenomena and signal processing introduced to the original image and the overall quality changes, it was necessary to focus first on specific aspects or attributes of the image, and then to determine how these contributed to the overall image quality. The simplified conceptual model of human perception shown in Figure 16.1 was therefore established, inspired by previously developed models by Plomp (1976), Nijenhuis (1993), Yendrikhovskij (1998), and Stone and Sidel (2004). The process starts with a physical stimulus—in this case a sound field—that impinges on the auditory system of an assessor. The sound field can be described by a number of physical variables Φk each with a physical strength or intensity (e.g., sound pressure level

[Figure 16.1 appears here. It shows the chain from a sound field described by physical variables Φ1 … Φk, through the auditory system, to auditory attributes Ψ1 … Ψl with sensorial strengths S1 … Sm (shaped by learning), which combine into individual impressions I1 … In (shaped by context) and, via combination rules, into the total auditory impression Itot.]

Figure 16.1  Conceptual model of human perception of multidimensional complex auditory stimuli (from Bech et al. 1996).

and frequency). The auditory system transforms the mechanical activity of the eardrum into nerve impulses that are assumed to be combined in the brain of the assessor, resulting in a number of specific auditory attributes Ψl (e.g., pitch, loudness), each with a sensorial strength Sm. The sensorial strength of each attribute depends on the physical strength of the variables Φk in combination with the properties of the auditory system and experimental factors such as learning effects. For the present purpose it is sufficient to characterize these properties by the auditory sensitivity (e.g., can you hear the sound or not) and selectivity (for example, the "just noticeable difference threshold"; i.e., can a certain physical change of an audible sound be noticed or not).6 The next step in the process is assumed to be the result of a combination of the individual attributes Ψl, with sensorial strength Sm, into specific impressions In. Finally, these individual impressions are combined into an overall auditory impression Itot. It is assumed that the combination of the specific attributes Ψl (each with a sensorial strength Sm) into individual impressions In, as well as the combination of individual impressions into an overall impression, depends on context, expectations, the mood of the assessor, and so on.

The assumed relationship between the physical domain and the attribute domain is shown in Equation 1. Equation 2 shows the relationship between the attribute domain and the total auditory impression.

\Psi_1 = k_1 \Phi_1 + k_2 \Phi_2 + k_3 \Phi_3 + \dots + k_k \Phi_k + \varepsilon_1 \quad (1)

I_{tot} = m_1 I_1 + m_2 I_2 + m_3 I_3 + \dots + m_n I_n + \varepsilon_{tot}, \quad (2)

where kk represents a weighting factor reflecting the importance of the physical variable, mn represents the weight of each individual impression in forming the overall impres­ sion, and ε represents the noise or unexplained variance in the dataset. The assumption made when using QDA or similar methods is that the assessor rating7 of each stimulus for each attribute corresponds to its sensorial strength (Sm). These very simplistic engineering relations have been shown to be able to describe the experimental results and predict the outcome of new experiments for a large number of situations in, for example, audio, video, or food quality experiments. It is also noted that if these relationships can be established then it is possible to relate changes in the physical variables directly to changes in the general impression of the assessors. This represents key information for understanding human behavior and for the development of new food, fragrance, audio, or video products, and explains why the descriptive methods are used commonly by manufacturers in these areas. It is important to note that Equations 1 and 2 do not represent a complete model of the human decision process and they should not be expected to describe more than a maximum of 80–90 percent of the variance in a dataset. However, this is quite often enough to make some very useful estimations of future assessor behavior. The model shown in Figure  16.1 was developed into the filter model (shown in Figure 16.2) by Pedersen and Fog (1998).
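As an illustration of how the weights in Equations 1 and 2 can be estimated in practice, the sketch below (synthetic data and ordinary least squares; not a procedure prescribed by QDA itself) regresses the ratings of one attribute on a set of physical variables and reports the proportion of variance explained. The same approach applies to Equation 2, with individual impressions taking the place of the physical variables.

```python
# A minimal sketch: estimate the weighting factors k and the unexplained variance
# (epsilon) of Equation 1 from a stimulus-by-variable matrix and attribute ratings.
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_physical = 40, 3

Phi = rng.normal(size=(n_stimuli, n_physical))           # physical variables, one row per stimulus
true_k = np.array([0.8, -0.3, 0.5])                       # hypothetical weighting factors
Psi1 = Phi @ true_k + 0.1 * rng.normal(size=n_stimuli)    # ratings of one attribute (sensorial strength)

# Least-squares estimate of the weights and of the variance left unexplained
k_hat, residuals, *_ = np.linalg.lstsq(Phi, Psi1, rcond=None)
explained = 1.0 - residuals[0] / np.sum((Psi1 - Psi1.mean()) ** 2)

print("estimated weights:", np.round(k_hat, 2))
print("variance explained:", round(float(explained), 2))
```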

[Figure 16.2 appears here. It shows a physical stimulus in the physical domain passing through filter 1 (the senses: auditory sensitivity and selectivity) to become a perceived stimulus in the perceptive domain, and through filter 2 (other factors: mood, context, emotion, background, expectation) to yield likes/dislikes in the affective domain.]

Figure 16.2  The "filter model" developed by Pedersen and Fog (1998), inspired by Bech et al. (1996), to describe the process of human sound perception.

consumer sound   327 The model now operates with three domains—the physical, the perceptual, and the affective domains—which are characterized by the measurement principle that is nor­ mally applied. In the physical domain, standard physical measures are used to character­ ize the stimuli; in the perceptual domain, the stimuli are characterized by the assessor’s judgment of the sensorial strength of the relevant individual attributes; and finally, in the affective domain, the assessor’s rating of, for example, the overall auditory impres­ sion (see Eq. 2) is used for characterization of the stimuli. The general idea or principle of descriptive analysis is therefore to identify the indi­ vidual attributes for the stimuli of interest and have assessors judge the sensorial strengths of each of these. This is often done under highly controlled laboratory conditions using a limited number of assessors (e.g., 15–20) as described for the QDA method. The affective assessments are then established in the field using a large number of consumers (e.g., 100–200). The two resulting perceptual datasets can be combined with the physi­ cal variables; this process can provide the requisite information to be able to advise the engineering department on how to achieve a certain strength of individual attributes or overall impression. The QDA method was developed by Stone and Sidel (2004) and is one of basic methods that specifies in detail the entire experimental process, including identification/ elicitation of attributes, training of assessors, planning and conducting experiments, analyzing the results, and presentation of the results and conclusions. The method is described by Stone and Sidel: a sensory methodology that provides quantitative descriptions of products, based on perceptions of a group of qualified subjects. It is a total sensory description tak­ ing into account all sensations that are perceived—visual, auditory, olfactory, kinaesthetic; and so on—when the product is evaluated. The word product is used here in the figurative sense; the product may be an idea or concept, an ingredient, or a finished product purchased and used by the consumer. The evaluation can also be total, for example, as in evaluation of a shaving cream before, during, and after use. Alternatively, the evaluation can focus on only one aspect such as the use. The evalu­ ation is defined in part by the product characteristics as determined by the subjects, and in part by the nature of the problem.  (2004, 203)

The QDA method exhibits the following properties (only those relevant to audio evaluations are listed here). The QDA method:

• is responsive to all sensory aspects of a product or stimulus;
• relies on a limited number of assessors for each test;
• uses assessors tested and trained before participating in the test(s);
• uses a language development process free from panel leader influence;
• provides quantitative results;
• employs a repeated trials experimental design; and
• specifies a data analysis strategy.

328   søren bech and jon francombe The QDA method employs the so-called direct elicitation principle, in contrast to other indirect elicitation methods. The direct elicitation principle assumes that there is a close relationship between the individual attributes (Ψl) and verbal descriptors (single words) elicited as a part of, for example, the QDA process. This is contrary to the indi­ rect elicitation principle where it is not assumed that this relationship exists, and other methods are used—for example, multidimensional scaling (see Shifmann et al. 1981, for an introduction), in which the assessors rate only the perceived (dis)similarity between the stimuli. The statistical analysis then allows for an identification of the individual sensory dimensions that are assumed to be related to individual or groups of related attributes. There are advantages and disadvantages to direct and indirect elicitation techniques; both have been used—sometimes in conjunction—in audio attribute elicitation studies. A full discussion is beyond the scope of this chapter, but Mason and colleagues (2001) present a detailed review of the challenges of capturing a listener’s imagination or impression of an auditory scene using verbal descriptors. The direct elicitation principle assumes that it is possible to elicit a number of words (a vocabulary) where each word corresponds to a specific attribute.8 Two main tech­ niques exist for elicitation of this vocabulary: 1. consensus vocabulary techniques, in which a common vocabulary is developed and agreed on by a team of highly trained assessors and 2. individual vocabulary techniques, in which each assessor develops a specific vocabulary and the common vocabulary is established using statistical analysis of the combined set of ratings from all assessors. The QDA method uses the consensus vocabulary technique, and Lawless and Heyman (1998) list the following properties, in order of importance, which each word in the vocabulary should preferably fulfill. An attribute should:

• be able to discriminate between the stimuli;
• have little or no overlap with other words in the vocabulary;
• be related to concepts that influence naïve (consumers') preference decisions;
• relate to physical measures defining the stimuli;
• be singular rather than combinations of several words or those holistic9 in nature;
• be precise and reliable;
• be able to generate consensus between the members of the panel of assessors;
• be unambiguous to all assessors;
• be specifiable by a reference stimulus that is easy to obtain;
• have communication value and not be based on jargon; and
• be related to reality.
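By way of contrast with the consensus-vocabulary route, the indirect elicitation principle mentioned above can be sketched as follows (synthetic dissimilarity ratings; scikit-learn is assumed to be available): assessors rate only pairwise dissimilarity between stimuli, and multidimensional scaling recovers a low-dimensional perceptual space whose axes can later be interpreted as attributes.

```python
# A minimal sketch of indirect elicitation via multidimensional scaling (MDS).
import numpy as np
from sklearn.manifold import MDS

# Hypothetical averaged dissimilarity matrix for five stimuli (symmetric, zero diagonal)
dissimilarity = np.array([
    [0.0, 0.2, 0.7, 0.8, 0.6],
    [0.2, 0.0, 0.6, 0.7, 0.5],
    [0.7, 0.6, 0.0, 0.3, 0.4],
    [0.8, 0.7, 0.3, 0.0, 0.2],
    [0.6, 0.5, 0.4, 0.2, 0.0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
positions = mds.fit_transform(dissimilarity)  # one 2-D point per stimulus
print(np.round(positions, 2))
```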

The QDA method defines a number of basic steps when applying the method:

1. selection of appropriate assessors;
2. development of the vocabulary and training of the assessors;
3. conducting the tests and reporting evaluations in quantified form; and
4. statistical analysis and presentation of the results.

Meilgaard and colleagues (1991) list the following generic requirements for selection of a panel of assessors. Assessors should have the ability to: 1. detect differences between the attributes present and in their intensities; 2. describe those attributes using (1) verbal descriptors and (2) scaling methods for differences in intensity; 3. abstract reasoning as descriptive methods depends on the use of references when attributes must quickly be recalled and applied to other stimuli; and 4. participate in the ongoing training program and work of the panel. The practical implementation of these requirements depends on the sensory modality in question, but, as an example, Martin and Bech (2005) used the following implementation when establishing a permanent panel for listening tests. A general invitation to participate in the selection process was sent via the intranet to all employees of Bang & Olufsen, and approximately sixty people responded positively. To test criterion 1, each applicant was first subjected to a pass/fail standard audiometric test using a maximum 20 dB deviation (ISO  1984) for no more than one ear as the ­criterion. This test reduced the number of applicants to approximately thirty. A further three tests were devised to assess the applicants’ abilities to hear various manipulations of standard stereophonic material. The changes ranged from known threshold values to clearly audible changes for known attributes such as timbre, spatial changes, and distortion. To test criteria 2 and 3, a standard fluency test (Spreen and Strauss 1998; Wickelmaier and Choisel 2005) was used. Finally, an Occupational Personality Questionnaire inter­ view was conducted to test the ability and suitability of the applicants to work in a team. Based on their average score in the tests, applicants were rank ordered and ten assessors were invited to be members of the permanent panel. Once the permanent panel of assessors has been established, the QDA process parts aimed at developing the vocabulary and starting the training process can be initiated. The development of a vocabulary for the first time involves typically six phases, as described in the following. Once the initial vocabulary has been established, the ongoing training of the panel is used to maintain the vocabulary and to develop new words when new types of stimuli (e.g., a new category of products) become relevant.
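Before turning to the vocabulary-development phases, the screening and rank-ordering logic just described can be summarized in a short sketch (hypothetical applicants, field names, and scores; the 20 dB criterion for at most one ear follows the description above):

```python
# A minimal, hypothetical sketch of the panel-selection logic described above.
applicants = [
    {"name": "A", "hl_left_dB": 10, "hl_right_dB": 15, "scores": [7.0, 8.0, 6.5]},
    {"name": "B", "hl_left_dB": 25, "hl_right_dB": 30, "scores": [8.0, 9.0, 7.5]},
    {"name": "C", "hl_left_dB": 5,  "hl_right_dB": 25, "scores": [6.0, 7.0, 8.0]},
]

def passes_audiometry(applicant, max_deviation_dB=20):
    # Pass/fail: at most one ear may exceed the maximum deviation criterion
    failed_ears = sum(level > max_deviation_dB
                      for level in (applicant["hl_left_dB"], applicant["hl_right_dB"]))
    return failed_ears <= 1

eligible = [a for a in applicants if passes_audiometry(a)]
ranked = sorted(eligible, key=lambda a: sum(a["scores"]) / len(a["scores"]), reverse=True)
print([a["name"] for a in ranked])  # highest average score first
```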

Phase one includes a representative set of stimuli that excites all of the sensory differences that are relevant for the experiment or product portfolio at hand. Before the first session, assessors are typically asked to prepare their own list of words that they can imagine for a predefined scenario—for example, considering the differences between the sound reproduction equipment they possess in their home. In the first session the team is subjected to the stimuli and asked to explain/discuss the meaning of their contributions and organize all of the contributed words into categories that represent the same meaning/interpretation/percept. Phase two includes removing duplicate words in each category, agreeing on a common word or attribute for each category, and eventually adding a brief description of how the attribute should be interpreted. Phase three includes further discussions of the categories of words and the agreed on common attribute for each category, followed by the selection of representative stimuli that clearly exhibit the agreed attribute. Phase four includes the first series of practical tests, where the differences between the stimuli cover a large perceptual range. The subjects are asked to discuss and define the endpoint markers of a rating scale that will be used to rate or rank-order the stimuli for each of the agreed common attributes. The subjects also familiarize themselves with the process of scaling the intensities of the selected stimuli for each of the attributes. Typically, a graphical 15-cm horizontal line with no tick marks except endpoint markers offset by 1.5 cm at each end is used for the rating process. The assessor is asked to indicate, by either a movable cursor or a tick, the rating of the stimuli in question (e.g., see "Stage Four: Attribute Ratings" later). Phase five introduces stimuli with smaller differences and repetitions are included in the experiment. The results of the experiments in phase five are used to check the response system for logical inconsistencies, and to check the abilities of the assessors and selected attributes by answering the following questions:

• Are assessors consistent in their ratings of repeated stimuli for all attributes?
• Are assessors agreeing on the ratings (ranking) of individual stimuli and attributes?

Phase six is the final check of the paradigm developed, and it includes experiments with test conditions that are similar to those in real tests. Zacharov and Lorho (2005) include an example of the development phases just described. Vocabularies of attributes, developed using QDA or other methods, have been published for specific sensory modalities. For example, Noble and colleagues (1987) developed the "wine aroma wheel," Bech and colleagues (1996) developed a list for image quality of CRT displays, and Pedersen and Zacharov (2015) developed the sound wheel. The statistical analysis of the preliminary training experiments in phases four and five usually employs analysis of variance (ANOVA) models or more advanced procedures (such as those described by Næs and colleagues 2010), and can be executed using either commercial software or freeware such as Panelcheck10 or Consumercheck.11 Both Panelcheck and Consumercheck were developed as part of research projects aimed at developing statistical procedures specifically for sensory experiments and implementing

consumer sound   331 them so that nonexperts in statistics can easily use them. The ongoing assessment of panel performance is especially important; procedures for that specific purpose are included in Panelcheck or eGauge (Lorho et al. 2010), and the details of a specific proce­ dure (eGauge) are described in ITU-R (2014b). Once the panel and initial list of attributes have been established, the ongoing training is used to maintain the attribute list and to check the performance of the panel members. Thereafter, a typical application of the QDA procedure in audio includes the following points: 1. definition of the stimuli (for example, the selection of loudspeakers and programs to be tested); 2. initial listening sessions with all members of the panel present, focusing on the selection of attributes from the existing vocabulary such that all perceptual differ­ ences are covered by the selected attributes. A typical selection includes ten to fifteen attributes; 3. conducting the listening tests where each stimulus (e.g., a loudspeaker-­program combination) is rated for all attributes selected in the initial listening sessions. There are many options for the practical implementation of the final tests (see, e.g., Bech et al. 2005; Martin and Bech 2005; Hegarty et al. 2007; Postel et al. 2011); however, it is important that only one attribute be rated at a time. This forces the assessor to keep focus on the interpretation of that particular attribute and the differences between, for example, loudspeakers for a given program; and 4. statistical analysis of the results. This includes, in addition to the standard tests of the quality and properties of the raw data, analysis for each of the attributes where the main variables (e.g., loudspeakers and programs) are examined. The correlation between the examined attributes should also be analyzed—for example, using principal component analysis—to determine the number of independent attributes. Experience from listening and viewing tests at Bang & Olufsen suggests that highly trained subjects can distinguish between a maximum of four to five attri­ butes independently from an initial group of ten to fifteen attributes. In addition to examining the main variables, it is also important to check the performance of the panel (as discussed earlier). Further details of the complete statistical analysis of sensory data are presented by Næs and colleagues (2010). This section has described the considerations that led to the development of ­experimental paradigms aimed at analysis of highly complex, sensory experiences. The QDA method has been described as an example of one of the basic methods, and references are given to other more recent paradigms. To illustrate the process of a sensory analysis in detail, the following section includes a description of a PhD proj­ ect included in a recent research project named “Perceptually Optimized Sound Zones” (POSZ). The PhD project was aimed at developing a perceptual model for prediction of human perception of the interaction between separate sound zones in a domestic situation.
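The correlation analysis mentioned in point 4 can be illustrated with a small sketch (synthetic ratings; scikit-learn assumed): principal component analysis is applied to a stimulus-by-attribute matrix of mean ratings, and the number of components needed to explain most of the variance gives a rough estimate of how many attributes are being rated independently.

```python
# A minimal sketch: estimate how many attributes vary independently using PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_stimuli, n_attributes = 30, 12

# Synthetic ratings driven by four underlying dimensions, plus rating noise
latent = rng.normal(size=(n_stimuli, 4))
loadings = rng.normal(size=(4, n_attributes))
ratings = latent @ loadings + 0.3 * rng.normal(size=(n_stimuli, n_attributes))

pca = PCA().fit(ratings)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_independent = int(np.searchsorted(cumulative, 0.9) + 1)  # components for 90% of variance
print("independent attributes (approx.):", n_independent)
```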


Sensory Analysis of Interaction between Sound Zones: Development of a Perceptual Model for Prediction of "Distraction"

In this section, the POSZ project will be briefly introduced, followed by the descriptive analysis procedure that was used to ultimately develop a predictive model of the main aspect of the listener experience (namely, perceived distraction). The POSZ project brought together researchers in signal processing and audio perception in order to develop perceptually optimal algorithms for producing personal sound zones. In a personal sound zone situation, two (or more) separate sound fields are produced in separate zones in a room in such a way that multiple program items (one item in each zone) can be reproduced simultaneously over the same loudspeakers; consequently, multiple listeners distributed between zones can listen to different program material without the need for headphones. The reproduction of personal sound over loudspeakers—as opposed to headphones—has a number of advantages that are worth the extra signal processing required: removing the need for headphones enables communication between people even if they are consuming separate audio programs, and also facilitates much greater awareness of the environment (this is particularly important in an automotive scenario, e.g., for road awareness and safety).

The signal processing required to produce such a complex sound field introduces considerable artifacts that are likely to degrade the target quality. At the same time, it is difficult to achieve perfect separation between zones, meaning that a listener may experience unwanted audio interference on their target audio program. The descriptive analysis performed as part of the POSZ project focused on the latter perceptual problem (i.e., imperfect separation), as there has been considerable prior work on modeling audio quality (e.g., ITU-R 2001; Rumsey et al. 2008; Conetta et al. 2008; Dewhirst et al. 2008a, 2008b; George et al. 2008). A series of perceptual tests was performed to determine the perceptual experience of a listener in an audio-on-audio interference situation (i.e., a situation in which the experience of listening to some target audio is modified by a secondary interfering audio program), and an attempt was also made to quantify the magnitude of the effect of the different facets of that experience (Baykaner et al. 2015).

In the previous section, the QDA paradigm—a strictly controlled and specified method—was outlined. There have been numerous other methods, with similarities to and differences from QDA, which are often trademarked and must be carefully controlled if they are to be strictly followed (Lawless and Heymann 1998, 227–257; Delarue et al. 2016). In practice, it is common for researchers to select aspects of these methods as required for particular elicitation tasks, leading to the development of new methods or simply to ad hoc techniques that are appropriate for particular studies. Murray et al. (2001) term such methods "generic descriptive analysis."

As part of the POSZ project, we conducted a comprehensive literature review of descriptive analysis methods (Francombe et al. 2014a; Francombe 2014). The review focused particularly on the attribute elicitation stage, in order to determine the optimum method for identification of the most relevant attributes for evaluating the experience of a listener in an audio-on-audio interference situation. The final method selected drew on ideas presented by Zacharov and Koivuniemi (2001) in the "audio descriptive analysis and mapping" (ADAM) method. The experimental method featured four stages: (1) free elicitation; (2) team discussion; (3) attribute reduction; and (4) attribute ratings. These experiment stages are discussed in the following sections; more detail on the experimental method is given by Francombe and colleagues (2014a).

Following selection of the most suitable attribute, a predictive model was created. Predictive models are important tools in sensory science, as they help to bridge the gap between physical measurements (which are repeatable and quick to perform, but may not relate directly to human perception—for example, a measurement of signal level is not sufficient to determine the perception of "loudness") and human responses (which can be reliable and accurate and are naturally related directly to perception, but are very time-consuming and expensive to obtain). A model that can accurately predict the human response in an objective manner can be used for evaluation (saving the time and expense of using a panel of human participants) as well as for system optimization. The development of such a predictive model is discussed further in what follows.

Stages One and Two: Free Elicitation and Team Discussions

As discussed previously, direct elicitation processes can use individual or consensus methods to determine attributes. There are advantages and disadvantages to both methods. Consensus methods naturally create a team language, making it more likely that experiment participants understand and agree on the meaning of a descriptor (although this is not absolutely guaranteed, as it can be difficult to convey experiences with words—see Mason et al. 2001). Individual vocabulary methods do not naturally have this property—although it can be assured through statistical analysis—but they enable all participants to have an equal say, and therefore remove a source of bias where subjects with stronger verbal skills or domineering personalities might take over in a team discussion. Individual methods are also often less time-consuming. In order to take advantage of the benefits of both types of methods, a hybrid approach is sometimes taken; this was the case in the ADAM method and in the POSZ project.

The first stage of the experiment was an individual free elicitation and was intended to produce a wide pool of descriptive terminology that was relevant to audio-on-audio interference situations. Participants were presented with a set of stimuli (various audio-on-audio interference situations) as well as a reference (a target program with no interference), and asked to write any words that they felt to be relevant for describing the differences between the stimuli and the reference. Stimuli were presented using a custom interface displayed on a computer screen (Figure 16.3).


Figure 16.3  User interface for the free elicitation task. Stimuli were replayed by clicking the circular buttons. Participant responses were typed into the text box at the bottom of the screen.

The audio excerpts were randomly assigned to a set of buttons, which were positioned above a text box into which responses could be typed. The multiple stimulus presentation meant that participants could also compare between stimuli, widening the pool of potential descriptors. Five trained listeners and four untrained listeners performed the first stage of the test (see below for a discussion on participant experience). A total of 572 unique words and phrases were produced in this first stage.

The second stage featured a set of team discussions that were intended to reduce the large set of individually elicited words and phrases into a manageable set of carefully defined attributes. The underlying assumption was that many of the responses from stage one, although ostensibly unique, were describing essentially the same experience. The task for the participants was to find the optimal terminology for labeling and describing the underlying percept. The trained and untrained participants performed the team discussion separately. Each phrase was presented back to the team (using physical printouts on small cards), and the participants were asked to categorize together any of the responses that described the same percept. It was necessary for the participants to reach a consensus when performing the categorization. When all of the responses had been categorized, participants were asked to produce an attribute definition (a label for the category), endpoint definition (terms that could be used as the positive and negative endpoints of a scale of the attribute), and an attribute description (a short description of the percept that could be understood by someone who had not participated in the experiment).

The experiment was facilitated by an experimenter who played no active part in the discussions, serving only to administer the task (e.g., by presenting the phrases and documenting the results). The experimenter was well versed in the background of the tests but was careful to avoid taking an active part in the discussions so as to avoid biasing the results. Using this procedure, the trained listeners categorized 259 responses into 9 attributes, and the untrained listeners categorized 313 responses into 8 attributes. A further team discussion was performed with both sets of participants in order to unify the attribute sets. A number of minor changes were made to definitions, descriptions, and endpoints. Where there were duplicate attributes in the two sets, the participants generally agreed that the trained listener labels and descriptions should be retained. The final attribute set included twelve attributes (see Francombe et al. 2014a, for details): masking; calming; distraction; separation; confusion; annoyance; environment; chaotic; balance and blend; imagery; response to stimuli over time; and short-term response to stimuli.

Stage Three: Attribute Reduction

In the first stage of the elicitation just described, participants were asked for all differences between the test stimuli and the reference. It is likely that some of the differences are much more important than others for evaluating the listener experience. It was therefore necessary to determine the most relevant attribute(s) for further investigation. This type of redundancy reduction is ideally performed using statistical methods; however, this requires attribute ratings to be made for all of the elicited attributes. To avoid a lengthy and potentially unnecessary attribute rating stage, an attribute reduction phase was included to quickly select only the most relevant attributes. A simplified ranking procedure was used, in which participants were played the experiment stimuli and asked to select the one most relevant attribute for differentiating between the versions of the stimulus with and without auditory interference. The test was performed using a custom computer interface, shown in Figure 16.4. The attributes were assigned to the buttons at random.

The results of the experiment were analyzed using a chi-square goodness-of-fit test, which quantifies differences from a specified distribution (in this case, the uniform distribution—that is, the assumption that all attributes are used with equal probability). It was clear that particular attributes were used at significantly greater than chance frequency, and consequently, four attributes were selected for further analysis: annoyance; distraction; balance and blend; and confusion. The definitions for these attributes are given in Table 16.1.
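To make the analysis concrete, the following is a minimal sketch of such a chi-square goodness-of-fit test against a uniform distribution, using made-up selection counts rather than the study's actual data.

```python
# Sketch of the attribute-reduction analysis: test whether attributes were
# selected more often than expected under a uniform (chance) distribution.
# The counts below are fabricated for illustration only.
from scipy.stats import chisquare

attributes = ["masking", "calming", "distraction", "separation", "confusion",
              "annoyance", "environment", "chaotic", "balance and blend",
              "imagery", "response over time", "short-term response"]
counts = [20, 8, 95, 15, 60, 90, 10, 12, 70, 9, 7, 14]  # hypothetical selections

# With no expected frequencies supplied, chisquare assumes all categories are
# equally likely, i.e., the uniform distribution described in the text.
stat, p = chisquare(counts)
print(f"chi-square = {stat:.1f}, p = {p:.3g}")

# Attributes whose observed count clearly exceeds the expected count
# (sum(counts) / len(counts)) are candidates for further analysis.
expected = sum(counts) / len(counts)
print([a for a, c in zip(attributes, counts) if c > 2 * expected])
```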

Stage Four: Attribute Ratings

As discussed previously, the relationship between attributes can be assessed by performing a statistical analysis of ratings made on each attribute.


Figure 16.4  User interface for the attribute reduction stage. Stimulus playback was controlled using the buttons at the bottom of the screen. The attribute labels and definitions were positioned at random on the grid of buttons.

Table 16.1  Attribute Labels, Descriptions, and Endpoints for the Four Attributes That Were Used at Significantly Greater Than Chance Frequency in the Attribute Reduction Stage

Annoyance. To what extent the alternate audio causes irritation when trying to listen to the target audio. Endpoints: Very annoying to Not at all annoying.

Distraction. How much the alternate audio pulls your attention or distracts you from the target audio. Endpoints: Not at all distracting to Overpowered.

Balance and blend. How you judge the blend of sources to be. Endpoints: Complementary to Conflicting.

Confusion. How confusing the merge of the two audio programs is—rhythmically, melodically, or harmonically; how they blend together. Confusion because the sources interact with each other. Endpoints: Extremely confusing to Not at all confusing.


Figure 16.5  User interface for the attribute rating stage. Stimuli were replayed by clicking the labeled circular buttons, and ratings were given using the vertical sliders.

For the stimuli under test, ratings were made on the four attributes carried forward from the attribute reduction stage. A multiple stimulus paradigm, modified from the standardized BS.1534-3 "MUSHRA" test (ITU-R 2015), was used: participants gave ratings on 15-cm vertical sliders with endpoint label positions 1.5 cm from the scale ends. The user interface is shown in Figure 16.5. A reference stimulus (just the target audio with no interference) could be played by clicking a button labeled "R" that was positioned in line with the 0 point of the scale (i.e., not at all distracting). The stimuli could be played by clicking the labeled buttons at the top of the page; the distraction score was given by setting the associated slider to the desired position. The experiment was performed by the listeners who had participated in the attribute elicitation as well as a small team of new participants in order to ensure that the attributes could be used and understood outside of the original panel.

A principal component analysis (PCA) (Næs et al. 2010, 209–226) was performed to assess the relationships between the four attributes. In PCA, orthogonal vectors that explain the maximum variance are consecutively extracted from the attribute rating data. The attributes (and ratings) can then be plotted in the new lower-dimensional space to allow easy interpretation of the relationship between the attributes as well as the relationship between attributes and ratings. The PCA solution is plotted in Figure 16.6. The vectors show the correlation between each attribute and the first two principal components; the angle and length of each vector indicate the degree to which the associated attribute is correlated with the two visualized components. The number of dimensions on which the original data is represented can be chosen by considering metrics such as "variance explained" or by visual evaluation of a scree plot.


Figure 16.6  Principal component representation of four attributes. The vectors show the correlation between each attribute and the two principal components represented in the plot (and can therefore be different lengths depending on the strength of the relationship).

In this analysis, almost all of the variance in the data could be explained by two components, indicating that there was considerable redundancy in the four attributes. The first component accounted for 88.5 percent of the variance and was related to both annoyance and distraction. The second, explaining a further 10 percent of the variance, was related to balance and blend. The attribute confusion was equally loaded onto both dimensions. There were no apparent differences in the PCA solution between the participants who had taken part in the whole experiment and those who only performed the rating task. Further analysis of participant agreement suggested that confusion was the least well understood of the four attributes (i.e., the ratings exhibited the least agreement between participants), while distraction was the best understood (or at least, participants used the scale in the same way). Consequently, distraction was selected as the attribute to model; it was strongly related to the component that explained the vast majority of the variance in the data, and it was well understood by the participants.
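The PCA itself can be reproduced with standard tools. The sketch below is a minimal illustration only, run on fabricated ratings rather than the experimental data, so the variance figures it prints will not match those reported above; it simply shows how component loadings of the kind plotted in Figure 16.6 might be obtained.

```python
# Illustrative PCA of attribute ratings (one row per stimulus, one column per
# attribute). Data are fabricated; column names mirror the four attributes.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

cols = ["annoyance", "distraction", "balance_blend", "confusion"]
rng = np.random.default_rng(0)
base = rng.uniform(0, 100, size=(40, 1))             # shared "interference" factor
ratings = pd.DataFrame(base + rng.normal(scale=15, size=(40, 4)), columns=cols)

X = StandardScaler().fit_transform(ratings)          # standardize each attribute
pca = PCA(n_components=2).fit(X)

print("variance explained:", np.round(pca.explained_variance_ratio_, 3))

# Component weights for each attribute (comparable to the vectors in Fig. 16.6).
weights = pd.DataFrame(pca.components_.T, index=cols, columns=["PC1", "PC2"])
print(weights)
```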

Attribute Modeling

As discussed previously, it is hugely beneficial to be able to predict the human response in a sensory evaluation task in a quick and repeatable manner. It is therefore desirable to develop predictive models that use measured, physical features of the sound field to derive predictions of the human response.

As described earlier, the first stage in this procedure is determining the correct perceptual attribute to model: in this case, distraction due to the presence of some interfering audio program was found to be most appropriate. It is then necessary to collect a large amount of human data constituting ratings of the attribute for different stimuli—preferably over the entire stimulus space that the model might encounter in its target usage domain. As well as collecting subjective ratings, it is necessary to determine the physical parameters of the stimuli that contribute to the ratings, in order that the mathematical relationship between physical parameters and ratings can be modeled.

In order to collect a set of ratings, a pool of one hundred audio-on-audio interference situations was created. It was considered desirable to ensure that the training stimuli covered a wide range of potential broadcast audio content, but also that the model training was not biased by closely controlling a set of physical parameters prior to the feature extraction stage. Consequently, the stimulus set was established using a random sampling method, in which program items were taken from online radio stations at randomly generated times (Francombe et al. 2014b). The items were loudness-matched using a perceptual model before the test stimuli were constructed by varying a set of parameters. The test parameters (target level, interferer level, and interferer direction12) were not varied in a full factorial manner—they were determined at random (within reasonable ranges). In this manner, a diverse and representative training set was developed. Listener ratings of distraction were collected using the same methodology as for the attribute ratings described above (a multiple stimulus presentation rating test). Participants exhibited strong agreement in their ratings, which helped to validate the selection of the attribute distraction. The random sampling stimulus selection method was found to produce a set of stimuli that evenly covered the full range of the perceptual scale.

The next challenge was extraction of relevant physical parameters from the stimuli. The range of features that can be extracted from audio recordings is vast; therefore, selecting the correct features is a crucial and difficult task in any modeling process. To aid with this procedure, participants were asked to write down reasons that they had for finding the audio-on-audio situations distracting; the written response data was analyzed using a form of verbal protocol analysis (Ericsson and Simon 1993, 1–62) to generate a set of categories, which was then used to motivate the search for features. Audio features were extracted using a variety of freely available toolboxes to produce a set of 399 features. The categories and extracted features are described by Francombe and colleagues (2015b).

After the creation of a large feature set, the next challenge is model fitting. There are many different methods of modeling data, but in this case a simple linear regression model was used. One of the main advantages of such a model is that it is easy to interpret the relationship between the features and the response variable. This is not always the case; in more complex model structures (for example, neural networks), this relationship can be obscured. The feature selection process involves training a large number of models and using some criteria to determine which is the best.
As an exhaustive search through a large feature set (such as the 399 features here) is prohibitively time-consuming, it is common to use a search algorithm; in this case, a stepwise feature addition and removal procedure was followed.
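By way of illustration, the sketch below implements a simplified forward-only stepwise selection for a linear regression model, using cross-validated R2 as the selection criterion and fabricated data in place of the real feature matrix and distraction ratings. The study's actual procedure included both addition and removal steps and its own stopping criteria, so this is only an indicative outline of the approach.

```python
# Simplified forward-stepwise feature selection for a linear distraction model.
# Data, dimensions, and the stopping rule are illustrative stand-ins only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_stimuli, n_features = 100, 30                      # stand-ins for 100 stimuli
X = rng.normal(size=(n_stimuli, n_features))
y = 2.0 * X[:, 0] + X[:, 5] - X[:, 9] + rng.normal(scale=0.5, size=n_stimuli)

selected, remaining, best = [], list(range(n_features)), -np.inf
while remaining:
    # Score every candidate addition by cross-validated R^2 and keep the best;
    # stop as soon as no candidate improves on the current model.
    scores = {f: cross_val_score(LinearRegression(), X[:, selected + [f]], y,
                                 cv=5, scoring="r2").mean() for f in remaining}
    f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
    if s_best <= best:
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best = s_best

print("selected features:", selected, "cross-validated R^2:", round(best, 3))
```

A removal pass (dropping any selected feature whose removal does not hurt the cross-validated score) could be added in the same style, and checks for multicollinearity among the selected features would typically follow.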

One of the primary concerns for a predictive model is its generalizability; that is, the model should be able to make predictions for situations outside of those on which it was trained. As the number of features in a regression model tends toward the number of data points, it becomes possible to mathematically account for all of the variance in the data. However, this is not beneficial, as it is very unlikely that the model will be able to make an accurate prediction for a new data point that falls outside of the training data set. This problem is known as overfitting. It is far better for the model to have some error, but for the features to accurately describe a physical phenomenon and therefore to generalize to new situations, than for the model to very accurately predict the training set but fail under new circumstances. It is therefore desirable to minimize the number of features in the final model, while still including all features that describe physical processes that determine the human response. A further consideration when selecting features for a linear regression model is the relationship between the predictors: the linear regression model works under the assumption that the features do not correlate highly with each other (high correlation between features is known as multicollinearity).

There are two primary metrics that describe the performance of a model. Goodness-of-fit is primarily measured using root-mean-square error (RMSE)—this quantifies the difference between the measured subjective response and the model prediction. The amount of variance explained by the model is measured by the coefficient of determination, R2. Both metrics can be altered to reduce the chance of overfitting. Cross-validation can be used to estimate the performance of the model on data points outside of the training set. In cross-validation, a number of data points are withheld from the training set, but used for testing (e.g., calculating the RMSE). This process can be repeated for multiple groups of "holdout" data. The R2 statistic can be adjusted in such a way that models with a higher number of features are penalized.

These adjusted statistics were used to ensure that the features selected were generalizable as well as providing an accurate fit. The final model included five features: overall loudness; target-to-interferer ratio; interference-related perceptual score from the "Perceptual Evaluation methods for Audio Source Separation" (PEASS) toolbox (Emiya et al. 2011); high-frequency level range of the interferer; and percentage of temporal windows with low target-to-interferer ratio. The model exhibited an RMSE of approximately 10 percent on the training set and explained 88 percent of the variance in the data.

Regardless of how well a model fits the training data, success or failure can only really be assessed through validation on a new dataset, that is, on data points for which subjective responses are available but were not used to train the model. In this manner, the generalizability and accuracy of the model can truly be tested. Two validation data sets were used to test the POSZ distraction model. The first used ratings from stimuli collected using the same procedure as that used for the training set data collection (but not included during the model training).
The second validation set used stimuli collected for a previous experiment, which were consequently different in some regards (the program items were longer and some exhibited different conditions such as filtering or the presence of simulated road noise). The RMSE increased from 10 percent to approximately 12 percent and 16 percent for the two datasets respectively; the explained variance (indicated by R2) decreased from 88 percent to 82 percent and 78 percent respectively. This relatively modest reduction in performance suggested that the final model was generalizable to a range of audio-on-audio interference situations with music program material.
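The kind of held-out validation described here can be summarized in a few lines of code. The sketch below uses fabricated training and validation data and hypothetical variable names; it only illustrates how RMSE (expressed as a percentage of a 0 to 100 rating scale) and R2 might be computed for a fitted linear model on unseen stimuli.

```python
# Sketch: evaluate a fitted distraction model on a held-out validation set,
# reporting RMSE as a percentage of the rating scale and the R^2 statistic.
# All data below are fabricated for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

def evaluate(model, X_val, y_val, scale=100.0):
    pred = model.predict(X_val)
    rmse_pct = 100.0 * np.sqrt(mean_squared_error(y_val, pred)) / scale
    return rmse_pct, r2_score(y_val, pred)

rng = np.random.default_rng(2)
true_w = np.array([8.0, -5.0, 3.0, 0.0, 12.0])       # hypothetical feature weights
X_train, X_val = rng.normal(size=(100, 5)), rng.normal(size=(40, 5))
y_train = 50 + X_train @ true_w + rng.normal(scale=6, size=100)
y_val = 50 + X_val @ true_w + rng.normal(scale=6, size=40)

model = LinearRegression().fit(X_train, y_train)
rmse_pct, r2 = evaluate(model, X_val, y_val)
print(f"validation RMSE = {rmse_pct:.1f}% of scale, R^2 = {r2:.2f}")
```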

Discussion

The procedure just described was designed to ensure that a robust model of a relevant facet of listener experience for a relatively new and unknown listening situation could be created. The model was shown to perform well for training and validation datasets; it has since been tested in a number of situations and found to perform very successfully, with error remaining at approximately 10 percent (Rämö et al. 2016). It is hoped that having an accurate model will enable quick, perceptually relevant evaluation of personal sound zones. Some efforts have also been made to use the model to optimize a sound zone generation system by selecting optimally positioned loudspeakers (Francombe et al. 2013).

We believe that one of the primary reasons for the success of the model was the comprehensive attribute elicitation experiment, which ensured that the correct facet of the listening experience was being modeled. It was consistently found that the attribute distraction produced strong agreement between participants; this is invaluable when collecting training data. There are numerous mathematical modeling methods, feature selection tricks, and so on; however, it is often the quality of the subjective training data that is most important when developing such a model.

The elicitation procedure described drew heavily on some well-established ideas within the literature but also introduced some novel aspects. It has been widely stated (e.g., Lawless and Heymann 1998) that descriptive attributes should be developed by trained participants while hedonic judgments (e.g., preference) should be made by untrained participants. For the task of investigating the experience of a listener in a personal sound zone system, we felt that it was desirable to perform the elicitation experiment with both trained and untrained listeners. While the trained listeners tended to give better descriptions (this was reflected by the selection of trained listener attributes where there was overlap between the two sets), there were also unique and important attributes determined by the untrained participants (e.g., balance and blend, which was found to be one of the four most relevant attributes and explained a small but notable proportion of variance in the ratings). Of course, there are some sensory evaluation tasks that require a high degree of experience—for example, where very small degradations or artifacts are present. However, in the case of audio-on-audio interference in sound zones, the perspective of untrained listeners—who will ultimately be the end users of any commercial system—was definitely valuable.


The Next Step

The work described thus far has shown that perceptual models, developed using advanced sensory science methodologies, are useful for pure research and for product optimization. However, it is hard to see how perceptual scientists will ever complete the task of quantifying the imagination of consumers in relation to complex auditory scenes. The development of new and ever-more advanced signal processing methods is unlikely to slow and, in fact, spatial audio is the topic of much current research. For example, some recent or current large projects include the BILI project,13 the S3A: Future Spatial Audio project,14 and the ORPHEUS project.15

Two-channel stereo reproduction has been prevalent in domestic and professional audio replay for a number of decades. Five-channel surround sound has also seen considerable uptake, if not quite to the same level of ubiquity as two-channel stereo. However, there are a number of different surround sound reproduction methods available. Channel-based methods have varying loudspeaker counts and positions (including loudspeakers above and below the listener); a set of common loudspeaker layouts has been standardized by the International Telecommunication Union (ITU-R 2014a). Methods that require fewer channels and less set-up effort—such as headphones and soundbars—are also becoming increasingly popular, particularly for domestic audio reproduction. In the last year or so, the boom in virtual reality technology has yet again pushed realistic spatial audio to the forefront of many research agendas. As technology enables production of more complex and realistic experiences—even those that might not relate to real-world situations—quantification of experience and imagination remains of utmost importance.

Descriptive analysis experiments have been performed to try to uncover the perceptual differences between reproduction methods—see Francombe and colleagues (2015a) for a review of relevant literature. The resultant picture is complex; there are many different attributes, but limited consensus on their exact meanings or on which are most important. There has been a recent effort to consolidate the existing research in order to produce a standardized set of terms (Pedersen and Zacharov 2015; Zacharov et al. 2016), drawing parallels with the ubiquitous wine aroma wheel (Noble et al. 1987). Another current research topic is the development of faster and more efficient experimental methods: the so-called FastTrack or RaPID methods (see Delarue et al. 2016; Moulin et al. 2016). The purpose is to increase the efficiency of the experimental effort while maintaining the statistical quality of the data. This is especially important for industrial applications, but also in academia for pilot experiments.

The research area described and exemplified in this chapter represents another step in the development of sound reproduction techniques that will allow the listener to imagine the sound event as intended by the creator/artist anywhere and anytime. However, it is an ongoing challenge for evaluation methods and models to keep up with the development of new sound reproduction and processing technologies.

We feel that the benefits of in-depth perceptual understanding and optimization make this a worthwhile effort.

Notes

1. The audio industry is here defined to include researchers working at universities in areas such as signal processing, electroacoustics, psychoacoustics, and psychology. The area also includes developers working in companies producing products for recording, storing, transmitting, and rendering sound. The "products" include new principles and algorithms, as well as systems for recording, encoding, transmitting, decoding, and rendering sound in the consumer's home.
2. The following terminology, based on Dorsch (2016), is used in this chapter. A listener is exposed to a sound field in an environment and a perception or percept is created after the transformation of physical energy to neural information by the auditory system. The percept results in an auditory impression or auditory experience. Based on the auditory impression, one or more auditory images are created. The reader is referred to Dorsch (2016) and other chapters in the handbook for a further discussion of imagination.
3. "Ringing" refers to added oscillations of an electrical or acoustic signal that were not present in the original signal. The audible consequences can be that the signal continues when it should have stopped; this is most noticeable on transient signals such as drums.
4. "Squared clouds" refers to a visible artifact in images where clouds have squared edges instead of smooth edges as in nature. This is typically caused by a limited resolution in the bit stream or loss of information during transmission of the signal.
5. http://www.s3a-spatialaudio.org/. Accessed October 5, 2017.
6. These auditory properties can be measured accurately using a range of psychophysical procedures; however, it is outside the scope of this chapter to discuss them in further detail. The reader is referred to Gescheider (2015).
7. The reader should note that the rating will reflect the assessor's sensitivity to the attribute in question plus a general component reflecting the so-called bias, which is a measure of an assessor's tendency to respond that a stimulus is present compared to not present. These two components can be separated using signal detection theory; see, for example, Gescheider (2015).
8. It is noted that several of the elicited words and the corresponding ratings could be representative of the same attribute; however, such multicollinearity is identified and resolved during the statistical analysis.
9. An example could be if the term "quality" is a part of the definition of the word, as "quality" is often ambiguous to assessors.
10. http://www.panelcheck.com/.
11. https://consumercheck.co/.
12. These terms refer to a sound zone setup with two or more zones. Target level represents the level of the primary sound in zone A (in which the assessor is situated). Interferer level represents the level of the sound in zone A caused by the interference of sound from the other zones. Interferer direction represents the spatial direction of the interfering sound from other zones.
13. http://www.bili-project.org.
14. http://www.s3a-spatialaudio.org/.
15. https://orpheus-audio.eu/.


References ANSI/ASA. 2013. Acoustical Terminology. S1.1–2013. American National Standards Institute/ Acoustical Society of America. Baykaner, K., P. Coleman, R. Mason, P. J. B. Jackson, J. Francombe, M. Olik, et al. 2015. The Relationship between Target Quality and Interference in Sound Zones. Journal of the Audio Engineering Society 63 (1–2): 78–89. Bech, S. 1994. Perception of Timbre in Small Rooms: Influence of Room and Loudspeaker Position. Journal of the Audio Engineering Society 42 (12): 999–1007. Bech, S. 1999. Methods for Subjective Evaluation of Spatial Characteristics of Sound. In Proceedings of the Audio Engineering Society 16th International Conference: Spatial Sound Reproduction, 487–504. New York, NY: Audio Engineering Society. Bech, S., M.-A. Gulbol, G. Martin, J. Ghani, and W. Ellermeier. 2005. A Listening Test System for Automotive Audio, Part 2: Initial Verification. In Proceedings of the Audio Engineering Society 118th Convention, 487–504. Barcelona, Spain. Convention paper 6359. New York, NY: Audio Engineering Society. Bech, S., R. Hamberg, M. Nijenhuis, C. Teunissen, H. Looren de Jong, P. Houben, et al. 1996. Rapid Perceptual Image Description (RaPID) Method. In Proceedings of SPIE 2657, 17–28. Bellingham, Washington, USA. Bech, S., and N. Zacharov. 2006. Perceptual Audio Evaluation: Theory, Method and Application. Chichester, UK: Wiley. Beranek, L. L. 1962. Music, Acoustics and Architecture. New York: Wiley. Blauert, J. 2005. Communication Acoustics. Berlin: Springer. Conetta, R., F. Rumsey, S. Zielinski, P. J. B. Jackson, M. Dewhirst, S. Bech, et al. 2008. QESTRAL (Part 2): Calibrating the QESTRAL Model using Listening Test Data. In Audio Engineering Society 125th Convention. San Francisco. Convention paper 7596. New York, NY: Audio Engineering Society. Delarue, J., D. B. Lawlor, and D. M. Rogeaux. 2016. Rapid Sensory Profiling Techniques and Related Methods: Applications in New Product Development and Consumer Research. Cambridge: Woodhead. Dewhirst, M., R.  Conetta, F.  Rumsey, P.  J.  B.  Jackson, S.  Zielinski, S.  George, et al. 2008a. QESTRAL (Part 4): Test Signals, Combining Metrics, and the Prediction of Overall Spatial Quality. In Audio Engineering Society 125th Convention. San Francisco. Convention paper 9598., New York, NY: Audio Engineering Society. Dewhirst, M., P.  J.  B.  Jackson, R.  Conetta, S.  Zielinski, F.  Rumsey, D.  Meares, et al. 2008b. QESTRAL (Part 3): System and Metrics for Spatial Quality Prediction. In Audio Engineering Society 125th Convention. San Francisco. Convention paper 9597., New York, NY: Audio Engineering Society. Dorsch, F. 2016. Hume. In The Routledge Handbook of Philosophy of Imagination, edited by A. Kind, 40–54. London: Routledge. Emiya, V., E. Vincent, N. Harlander, and V. Hohmann. 2011. Subjective and Objective Quality Assessment of Audio Source Separation. IEEE Transactions on Audio, Speech, and Language Processing 19 (7): 2046–57. Ericsson, K. A., and H. A. Simon. 1993. Protocol Analysis: Verbal Reports as Data. London: MIT Press. Francombe, J. 2014. Perceptual Evaluation of Audio-on-Audio Interference in a Personal Sound Zone System. PhD thesis, Guildford, UK: University of Surrey.

consumer sound   345 Francombe, J., P.  Coleman, M.  Olik, K.  Baykaner, P.  J.  B.  Jackson, R.  Mason, et al. 2013. Perceptually Optimized Loudspeaker Selection for the Creation of Personal Sound Zones. In Audio Engineering Society 52nd International Conference: Sound Field Control. Guildford, UK. New York, NY: Audio Engineering Society. Francombe, J., R. Mason, M. Dewhirst, and S. Bech. 2014a. Elicitation of Attributes for the Evaluation of Audio-on-Audio Interference. Journal of the Acoustical Society of America 136 (5): 2630–2641. Francombe, J., R. Mason, M. Dewhirst, and S. Bech. 2014b. Investigation of a Random Radio Sampling Method for Selecting Ecologically Valid Music Programme Material. In Audio Engineering Society 136th Convention. Berlin, Germany. Convention paper 9029. New York, NY: Audio Engineering Society. Francombe, J., T.  Brookes, and R.  Mason. 2015a. Perceptual Evaluation of Spatial Quality: Where Next? In 22nd International Congress on Sound and Vibration Proceedings. 340–347, Florence, Italy. Francombe, J., R.  Mason, M.  Dewhirst, and S.  Bech. 2015b. A Model of Distraction in an Audio-on-Audio Interference Situation with Music Program Material. Journal of the Audio Engineering Society 63 (1–2): 63–77. Gabrielsson, A., and H.  Sjogren. 1979. Perceived Sound Quality of Sound-Reproducing Systems. Journal of the Acoustical Society of America 65 (4): 1019–33. George, S., S. Zielinski, F. Rumsey, R. Conetta, M. Dewhirst, P. J. B. Jackson, et al. 2008. An Unintrusive Objective Model for Predicting the Sensation of Envelopment Arising from Surround Sound Recordings. In Audio Engineering Society 125th Convention. San Francisco. Convention paper 7599. New York, NY: Audio Engineering Society. Gescheider, G. A. 2015. Psychophysics: The Fundamentals. 3rd ed. London: Routledge. Hamasaki, K. 2011. 22.2 Multichannel Audio Format Standardization Activity. Broadcast Technology 45: 14–19. Havelock, D., S. Kuwano, and M. Vorländer. 2008. Handbook of Signal Processing in Acoustics. New York: Springer. Hegarty, P., S. Choisel, and S. Bech. 2007. A Listening Test System for Automotive Audio, Part 3: Comparison of Attribute Ratings Made in a Vehicle with Those Made using an Auralization System. In Audio Engineering Society 123rd Convention. New York. Convention paper 7224. New York, NY: Audio Engineering Society. Herre, J., J. Hilpert, A. Kuntz, and J. Plogsties. 2014. MPEG-H Audio: The New Standard for Universal Spatial/3D Audio Coding. Journal of the Audio Engineering Society 62 (12): 821–830. ISO. 1984. Acoustics: Threshold of Hearing by Air Conduction as a Function of Age and Sex for Otologically Normal Persons. 7029:1984. International Organisation for Standards. ISO. 1993. Sensory Analysis: General Guidance for the Selection, Training and Monitoring of Assessors, Part 1: Selected Assessors. 8586–1:1993. International Organisation for Standards. ISO. 1994. Sensory Analysis: General Guidance for the Selection, Training and Monitoring of Assessors, Part 2: Experts. 8586–2:1994. International Organisation for Standards. ISO. 2002a. Sensory Analysis: General Guidance for the Staff of a Sensory Evaluation Laboratory, Part 1: Staff Responsibilities. 13300–1:2006. International Organisation for Standards. ISO. 2002b. Sensory Analysis: General Guidance for the Staff of a Sensory Evaluation Laboratory, Part 2: Recruitment and Training of Panel Leaders. 13300–2:2006. International Organisation for Standards.

346   søren bech and jon francombe ITU-R. 1997. Methods for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems. Recommendation BS.1116–1. International Telecommunication Union. ITU-R. 2001. Method for Objective Measurements of Perceived Audio Quality. International Telecommunication Union. ITU-R. 2014a. Advanced Sound System for Programme Production. Recommendation BS.2051–0. International Telecommunication Union. ITU-R. 2014b. Methods for Assessor Screening. Recommendation BS.2300–0. International Telecommunication Union. ITU-R. 2015. Method for the Subjective Assessment of Intermediate Quality Levels of Coding Systems. Recommendation BS.1534–3. International Telecommunication Union. Kaplanis, N., S.  Bech, S.  Tervo, J.  Pätynen, T.  Lokki, T.  Waterschoot, et al. 2017a. A Rapid Sensory Analysis Method for Perceptual Assessment of Automotive Audio. Journal of the Audio Engineering Society 65 (1–2): 1–17. Kaplanis, N., S. Bech, S. Tervo, J. Pätynen, T. Lokki, T. Waterschoot, et al. 2017b. Perceptual Evaluation of Car Cabin Acoustics. Journal of the Acoustical Society of America 141 (2): 1459–146. Kjörling, K., J. Rödén, M. Wolters, J. Riedmiller, A. Biswas, P. Ekstrand, et al. 2016. AC-4: The Next Generation Audio Codec. In Audio Engineering Society 140th Convention. Paris. Convention paper 9491. New York, NY: Audio Engineering Society. Lawless, H. T., and H. Heymann. 1998. Sensory Evaluation of Food: Principles and Practices. New York: Springer. Lorho, G., G. Le Ray, and N. Zacharov. 2010. eGauge: A Measure of Assessor Expertise in Audio Quality Evaluations. In Audio Engineering Society 38th International Conference: Sound Quality Evaluation, 1–10. Piteå, Sweden., New York, NY: Audio Engineering Society. Martin, G., and S.  Bech. 2005. Attribute Identification and Quantification in Automotive Audio, Part 1: Introduction to the Descriptive Analysis Technique. In Audio Engineering Society 118th Convention. Barcelona. Convention paper 6360. New York, NY: Audio Engineering Society. Mason, R., N.  Ford, F.  Rumsey, and B.  De Bruyn. 2001. Verbal and Nonverbal Elicitation Techniques in the Subjective Assessment of Spatial Sound Reproduction. Journal of the Audio Engineering Society 49 (5): 366–84. Meilgaard, M., G. V. Civille, and B. T. Carr. 1991. Sensory Evaluation Techniques. Florida: CRC Press. Moulin, S., S.  Bech, and T.  Stegenborg-Andersen. 2016. Sensory Profiling of High-End Loudspeakers using Rapid Methods, Part 1: Baseline Experiment using Headphone Reproduction. In 2016 Audio Engineering Society Conference on Headphone Technology. Aalborg, Denmark. New York, NY: Audio Engineering Society. Murray, J. M., C. M. Delahunty, and I. A. Baxter. 2001. Descriptive Sensory Analysis: Past, Present and Future. Food Research International 34 (6): 461–71. Næs, T., P.  Brockhoff, and O.  Tomić. 2010. Statistics for Sensory and Consumer Science. Hoboken, NJ: Wiley. Nijenhuis, M. 1993. Sampling and Interpolation of Static Images: A Perceptual View. PhD thesis, Institute of Perception Research, Eindhoven University of Technology, The Netherlands.

consumer sound   347 Noble, A. C., R. A. Arnold, J. Buechsenstein, E. J. Leach, J. O. Schmidt, and P. M. Stern. 1987. Modification of a Standardized System of Wine Aroma Terminology. American Journal of Enology and Viticulture 38 (2): 143–46. Pedersen, T. H., and C. L. Fog. 1998. Optimisation of Perceived Product Quality. Euronoise 2: 633–638. Pedersen, T. H., and N. Zacharov. 2015. The Development of a Sound Wheel for Reproduced Sound. In Audio Engineering Society 138th Convention. Warsaw. Convention paper 9310. New York, NY: Audio Engineering Society. Plomp, R. 1976. Aspects of Tone Sensation: A Psychophysical Study. London: Academic Press. Postel, F., P. Hegarty, and S. Bech. 2011. A Listening Test System for Automotive Audio, Part 5: The Influence of Listening Environment on the Realism of Binaural Reproduction. In Audio Engineering Society 130th Convention. London. Convention paper 8446. New York, NY: Audio Engineering Society. Pulkki, V., and M.  Karjalainen. 2015. Communication Acoustics: An Introduction to Speech, Audio and Psychoacoustics. Chichester, UK: Wiley. Rämö, J., S.  Marsh, S.  Bech, R.  Mason, and S.  H.  Jensen. 2016. Validation of a Perceptual Distraction Model in a Complex Personal Sound Zone System. In Audio Engineering Society 141st Convention. Los Angeles, CA, USA. Convention paper 9665. New York, NY: Audio Engineering Society. Rumsey, F., S.  Zielinski, P.  J.  B.  Jackson, M.  Dewhirst, R.  Conetta, S.  George, et al. 2008. QESTRAL (Part 1): Quality Evaluation of Spatial Transmission and Reproduction using an Artificial Listener. In Audio Engineering Society 125th Convention. San Francisco. Convention paper 7595. New York, NY: Audio Engineering Society. Sabine, W.  C. 1922. Reverberation: Introduction. In The American Architect, reprinted in W. C. Sabine Collected Papers on Acoustics, 3–68 London: Harvard University Press. Shiffman, S.  S., M.  L.  Reynolds, and F.  W.  Young. 1981. Introduction to Multidimensional Scaling. London: Academic Press. Spreen, O., and E.  A.  Strauss. 1998. A Compendium of Neuropsychological Tests. New York: Oxford University Press. Staffeldt, H. 1974. Correlation between Subjective and Objective Data for Quality Loudspeakers. Journal of the Audio Engineering Society 22 (6): 402–415. Stone, H., and J. L. Sidel. 2004. Sensory Evaluation Practices. 3rd ed. London: Academic Press. Theile, G. 1991. On the Naturalness of Two-Channel Stereo Sound. In Proceedings of the Audio Engineering Society 9th International Conference: Television Sound Today and Tomorrow, 143–149. New York, NY: Audio Engineering Society. Toole, Floyd  E. 1982. Listening Tests: Turning Opinion into Fact. Journal of the Audio Engineering Society 30 (6): 431–445. Wickelmaier, F., and S. Choisel. 2005. Selecting Participants for Listening Test of Multichannel Reproduced Sound. In Audio Engineering Society 118th Convention. Barcelona. Convention paper 6483. New York, NY: Audio Engineering Society. Yendrikhovskij, S. N. 1998. Color Reproduction and the Naturalness Constraint. PhD thesis, Institute of Perception Research, Eindhoven University of Technology, The Netherlands. Zacharov, N. 2012. The Impact of Sensory Evaluation of Sound: Past, Present and Future. In Sensometrix. Rennes, France: The Sensometrix Society.

348   søren bech and jon francombe Zacharov, N., and K.  Koivuniemi. 2001. Unravelling the Perception of Spatial Sound Reproduction: Analysis and External Preference Mapping. In Audio Engineering Society 111th Convention. New York. Convention paper 5423. New York, NY: Audio Engineering Society. Zacharov, N., and G. Lorho. 2005. Sensory Analysis of Sound (in Telecommunications). In European Sensory Network Conference. Madrid, Spain: European Sensory Network. Zacharov, N., T. Pedersen, and C. Pike. 2016. A Common Lexicon for Spatial Sound Quality Assessment: Latest Developments. In 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX), 1–6. Lisbon, Portugal: QoMEX.

chapter 17

Creating a Brand Image through Music: Understanding the Psychological Mechanisms behind Audio Branding

Hauke Egermann

Introduction

For a brand today, it is about creating an emotional bond to people; to leave the stage of merely being a product and to be seen more as a trusted friend—a friend with values that you identify yourself and your lifestyle with. . . . Music is something that people connect with, enjoy discussing and sharing with others. Music preference relates to and can reveal a person's personality. Brands are becoming aware of the possibility to emerge as an ambassador of this social media, the positive effect it can have on their brand image, and how it can attract the attention of people in product and brand marketing.

—Lusensky (2008, 9)

This quote was taken from a study report of a Scandinavian Music and Audio Branding consulting agency. On the one hand, it describes the requirement to create meaningful brands for successful marketing and, on the other, it emphasizes the multiple roles that music is thought to play in this context: (1) music is said to create brand attention; (2) music is said to create a positive-affective response in consumers; and (3) music can presumably structure and influence the cognitive meaning dimensions of a brand image. Accordingly, Jackson (2003) defines the professional practice of audio branding as the creation of brand expressions in sound that depend on the consistent and strategic use of these expressions in marketing communication (see also Gustafsson, volume 1, chapter 18).

They can have various compositional forms: audio logos that are often quite short sequences of acoustical elements, longer jingles and brand songs, background sound tracks and soundscapes, interaction sounds, and the typical brand voices (Krugmann 2007). Potential touchpoints where a consumer experiences these elements could be advertisements in media such as TV, radio, websites, or cinema, but also corporate films, brand events, or customer telephone lines. As many audio branding elements are musical, they are said to shape a long-term image of a brand. But how does this shaping work? How does music function when it influences how a consumer imagines characteristics of a brand? This chapter will present several theoretical and empirical accounts in order to understand the psychological mechanisms at work when the imagination of a brand is influenced by music. It will provide insights into the underlying functionality and effectiveness of these practices that will ultimately be summarized in an integrative brand-music communication model.

From Classical Conditioning to Music-Brand Fit

The classical literature on music's effects in advertising mostly focuses on associative learning as the main mechanism involved (North and Hargreaves 2008; Zander 2006). Here, for example, music is taken as an unconditioned stimulus with affective-cognitive meaning that—through paired presentation with a brand or a product (conditioned stimulus)—transfers its evaluative and associative qualities to this brand or product and influences the recipients' attitudes toward it. This, in turn, is thought to influence consumers' product choices (Lantos and Craton 2012). In an oft-cited experiment, Gorn (1982) showed that one type of pen was chosen more often than another type of pen when it was presented together with liked music as opposed to disliked music. So, the unconditioned stimulus music biased the preference for a certain type of product.

According to the elaboration likelihood model, there are two routes to this type of persuasion: one of low involvement, which allows peripheral information/qualities to influence choice decisions, and one of high involvement (Petty et al. 1983). In the latter, consumers are more likely to base their decisions on central qualities of the product and accordingly are less likely to be influenced by music. In the low-involvement route, however, music is thought to impact consumer behavior. This difference between the low and high route of persuasion is often exemplified with product categories that involve different processing depths; for example, while the decision to spend a lot of money on a car is more likely to be based on the high-involvement route of processing, everyday buying choices for low-value goods are based on the peripheral route of processing.

Yet, the automotive industry is reported to be among the top four industries most active in audio branding, alongside financial services/banking, transportation, and healthcare/pharmaceuticals (Audio Branding Academy 2013). All of these industries sell high-involvement products. This contradiction could be explained by the concept of musical fit. Under certain conditions, consumers in highly involved states are also likely to be influenced by music. MacInnis and Park (1991) state that this happens when music fits with certain characteristics in the advert: "While music may fit with many ad elements, fit is defined here as consumers' subjective perceptions of the music's relevance or appropriateness to the central ad message" (162). Accordingly, even with highly involved consumers, music that activates certain knowledge structures relevant to the advertising message can lead to a positive attitude change toward an advertised product. MacInnis and Park evaluated this theory by showing that music was most effective when it fit the product and brand characteristics that were presented to highly involved consumers in a fictitious TV advert.

While some studies were able to replicate these findings (e.g., Zander 2006; for a review, see North and Hargreaves 2008), several contradictory results were reported; these were assumed to be caused by an often quite experimenter-focused definition of music-product fit, together with a lack of specificity in the theoretical models employed (North and Hargreaves 2008). Therefore, in the following sections, I will specify a theoretical model that might explain why some music fits a certain brand and advertisement while other types of music do not. In addition to the affective qualities of music that make it an unconditioned stimulus when presented together with brands, cognitive and associative meaning structures also seem to be involved when music is used in branding. Thus, effective music elicits attentional, emotional, and cognitive responses in consumers, and the following sections will illustrate which types of responses are relevant for branding. Before that, however, I will elaborate the structure of the different cognitive-affective meanings that are the focus of the branding process in general (and to which different music types might appear more or less fitting).

Meaning Structure in Identity-Based Brand Management

In the consumer-based brand equity model pyramid by Keller (2009), the first step in developing a brand is to create brand salience. The use of branding helps to create awareness and attention for a product and makes it possible to differentiate one product from another similar product. When a brand has salience, an associated visual logo gains sign qualities that refer to its product. Keller furthermore distinguishes brand performance from brand imagery, both of which result in judgments and feelings in consumers. While brand performance is related to more functional aspects of the products (like quality, price, service, or reliability), brand imagery is instead based on associative qualities like the brand identity.

If a brand has performance and imagery characteristics that are also evaluated and responded to positively, the top of Keller's pyramid is reached, which he terms brand resonance: customers show loyalty with the brand and its product(s), and this is accompanied by attachment, a sense of community, and engagement. Thus, establishing a brand image is thought to create a benefit for those who aim to market commercial products and services: "According to this view, brand knowledge is not the facts about the brand—it is all the thoughts, feelings, perceptions, images, experiences and so on that become linked to the brand in the minds of consumers (individuals and organizations)" (Keller 2009, 143).

But how is such a brand image created? Many authors relate it to the constant and strategic planning and implementation of a brand identity. Accordingly, a brand image is received and constructed by a consumer and can be seen to result from a brand identity that was created by a sender (Kapferer 2012). Brand identities share several similarities with the identities of human individuals and social groups (Azoulay and Kapferer 2003). In this view, brand identities are constructed through human expressions, which has led some authors to the conclusion that consumers choose brands like they choose friends. Azoulay and Kapferer note, "human individuals are perceived through their behaviour, and, in exactly the same way, consumers can attribute a personality to a brand according to its perceived communication and 'behaviours'" (2003, 149). Furthermore, Aaker reports that consumers might even view brands as their partners (1995). Therefore, in general, brands could have as many characteristics as humans have. However, in consumer research and marketing practice, several attributes have received more attention than others and hence seem to be the most important: brand personality, brand values, and brand demographic-regional origin (see also Burmann et al. 2003).

Brand personality and values have been described through several theoretical models. In psychology, personality is generally described as a construct that allows us to explain individual differences in behavior, thought, and feelings that are stable and coherent in humans (Mischel et al. 2004). It is often broken down into five different facets consisting of: (1) openness to experience, (2) conscientiousness, (3) extraversion, (4) agreeableness, and (5) neuroticism (also called the Five-Factor model; see Digman 1990). One widely used conceptualization of brand personality is that of Aaker (1997), who describes it as a set of all human characteristics that can be associated with a brand. These consist of the following five dimensions: sincerity, excitement, competence, sophistication, and ruggedness (see Table 17.1). While it can be discussed whether all these attributes can be considered personality features in a narrow sense (and they show only partial similarity to the aforementioned Five-Factor model from psychology), it is obvious that the same words could be used to describe humans. Furthermore, this model is used in various marketing contexts, and it has been empirically shown that communicating brand personality characteristics creates unique, congruent, and stronger brand associations in consumers (Freling and Forbes 2005).

According to Schwartz (1992), there is a limited and fixed set of general, universal human value types. These are based on universal human needs that manifest themselves in behavioral orientations.


Table 17.1  Aaker’s Brand Personality Dimensions and Attributes (Aaker 1997)

Sincerity: Down-to-earth, Honest, Wholesome, Cheerful
Excitement: Daring, Spirited, Imaginative, Up-to-date
Competence: Reliable, Intelligent, Successful
Sophistication: Upper class, Charming
Ruggedness: Outdoorsy, Tough

desirable end states or behaviors, (3) transcend specific situations, (4) guide selection or evaluation of behavior and events, and (5) are ordered by relative importance” (Schwartz 1992, 4). Schwartz presented ten motivational types that can be used to group values: universalism, benevolence, tradition, conformity, security, power, achievement, hedonism, stimulation, and self-direction. Furthermore, he showed that this structure of universal value types holds across different cultures. This list of value types has subsequently been adapted to branding contexts, where some value types were found not to be applicable (e.g., universalism, conformity, security) and others were added (such as aesthetics, ecology, or health; see Gaus et al. 2010). Allen (2002) showed that brands endorsing human values that match those of consumers are preferred because of the perceived similarity between the product and the consumers’ self-concepts. Accordingly, brands can be used by consumers to express their self-identities. Demographic-regional origin generally refers to the regional localization and demographic context of a brand (Thakor and Kohli 1996). Different products and product types are associated with different countries (e.g., alcoholic drinks such as vodka with Russia or whisky with Scotland), and these associations evoke certain meaning patterns. Furthermore, brand identities can also refer to certain demographic characteristics such as age, gender, or social status (Batra et al. 1993).

The Role of Music in Creating Brand Images

According to the literature presented above on brand personality, values, and demographic-regional origin, brand identities can be differentiated via a combination of personal, emotional, and cognitive human attributes that manifest themselves through communication activities. The following section illustrates how music is used in audio branding as a medium to communicate these identity-building meaning structures that result in brand images. First, I show how music can establish brand salience; I then elaborate on how music carries emotional and cognitive meaning that is relevant to branding activities.


Brand Salience

The ability to identify and localize objects is an important function of our auditory perception system. Changes in auditory streams have been shown to lead to an increase in attention allocation that is accompanied by a short activation of the peripheral nervous system (the so-called orienting response; see Chuen et al. 2016). These findings imply that dynamic music and sounds employed in branding lead to an increased awareness of a brand. For instance, playing music at a point of sale might direct customers’ attentional foci to the location of the sound source. The concept of musical fit might also play an important role in directing attention. According to the congruence-associations framework presented by Cohen (2001), music that is presented together with a visual narrative will influence how the narrative is perceived. While this theory was originally developed to explain the effects of music in film, it can also be applied to advertising. It was shown that those aspects of a visual narrative that are congruent with the music are likely to be in the focus of a perceiver’s attention (Marshall and Cohen 1988). Furthermore, the associative-emotional meaning of the music will then be attributed to this focus of visual attention. Thus, presenting music in an audiovisual commercial that structurally or semantically fits a visually presented brand identity will lead to increased attention to the brand. Like visual logos, audio logos help consumers to memorize and identify a brand. The constant presentation of musical elements together with a product can lead to a long-term memory representation that enables brand recognition and recall. According to Keller (2009), brand recognition refers to a situation in which a consumer is able to confirm prior exposure to a brand when presented with a related brand cue (e.g., a visual logo). Brand recall describes a situation in which a consumer recalls a brand when only a product category is primed. In audio branding, a consumer learns to associate musical/acoustical elements (the audio logo) with a brand, and subsequent exposure to the logo will activate the mental representation of the brand and product. In a telephone survey that followed a nine-month automobile advertising campaign, Stewart and colleagues (1990) observed that 83 percent of respondents recalled seeing the advertisement when presented with a short musical excerpt that was used in the advert, whereas only 62 percent remembered seeing the advert when presented with the product name. Thus, the musical cue was more sensitive than the verbal cue and resulted in stronger activation of the mental network that represents the brand and its advert (brand recall). Audio logos are almost always quite short, making them easy to memorize, and often rely on the melodic elements of pitch and rhythm. Employing these musical features, audio logos can be presented with varying timbres while preserving their original identity, which then enables brand recognition. Related to this, a study by Bonde and Hansen (2013) implies that pitch information is more perceptually relevant than rhythm information in audio logo recognition. In a statistical analysis of the musical features of radio station jingles and audio logos, we found that they were on average four notes long (range 3–9 notes), a length that is likely to match the capacity of short-term memory (Muellensiefen et al. 2015).
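The idea that an audio logo’s identity rests on relative pitch and rhythm rather than on absolute, timbre-dependent features can be made concrete with a small sketch. The Python fragment below is illustrative only; the note sequences for the hypothetical logo are invented. It reduces a melody to its pitch-interval and duration-ratio profile, which stays the same when the logo is transposed, played at a different tempo, or re-orchestrated—one simple way of thinking about how recognition can survive changes in timbre.

```python
# Minimal sketch: an audio logo reduced to pitch intervals and duration ratios.
# These relative features are unchanged by transposition or re-orchestration,
# which is one way to think about timbre-independent logo recognition.
# The note sequences below are invented for illustration.

def relative_profile(notes):
    """notes: list of (midi_pitch, duration_in_beats) tuples."""
    pitches = [p for p, _ in notes]
    durations = [d for _, d in notes]
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    ratios = [round(b / a, 3) for a, b in zip(durations, durations[1:])]
    return intervals, ratios

original = [(60, 1.0), (64, 1.0), (67, 1.0), (72, 2.0)]    # four-note logo
transposed = [(65, 0.5), (69, 0.5), (72, 0.5), (77, 1.0)]  # new key, new tempo

# Same relative profile -> treated as the same logo despite surface changes.
print(relative_profile(original) == relative_profile(transposed))  # True
```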

Taken together, previous research indicates that musical elements and sounds presented together with brands can create brand awareness and brand memorability. In this way, they contribute to brand salience, especially when the music fits the (visual) brand qualities.

Emotional Brand Meaning

In addition to creating brand salience by establishing brand expressions that lead to additional and stronger memory representations, music may contribute to creating an emotional brand image and an emotional response to a brand. Understanding which events elicit which emotional responses allows the prediction of human behavior, a matter that is also considered important in consumer behavior research (Hirschman and Holbrook 1982). Purchase decisions and attitudes toward adverts, products, and brands are very likely to be influenced by consumers’ emotional reactions to them (Bagozzi et al. 1999). Thus, marketing strategies often target emotional responses through communication measures. Here, one has to differentiate between emotions that are expressed and recognized in the music/advertisement and emotions that are felt as one’s own response to the music/advertisement.

From Expression to Recognition and Feeling of Emotion in Music

In one listener, a sad piece of music might be perceived as expressing a negative emotional state and at the same time induce an unpleasant feeling. In another listener, that same piece might induce a pleasant feeling. Since Gabrielsson (2002) called for a differentiation between the expression and the induction of emotion, there have been several experiments comparing the two phenomena (Evans and Schubert 2008; Hunter et al. 2010; Kallinen and Ravaja 2006). However, these studies have often remained rather exploratory, showing that different relations between the two types of emotion exist; theoretically grounded explanations of how the two phenomena are linked have only been presented recently (Egermann and McAdams 2013). This differentiation between expressed/recognized and induced emotion in music parallels the differentiation between brand identity (created by a sender) and the resulting brand image (constructed in a perceiver). Why is music able to express emotions that are recognized by a listener? An emotionally expressive musician can make use of knowledge acquired during his or her musical training (Juslin et al. 2006). Juslin and Laukka (2003) suggest that the ability to express and recognize emotion in music might be based on acoustical similarities between features used in music and those used in human vocal expressions. In a review of 145 studies from both domains (music and vocal expression), they show that basic emotions like sadness, anger, fear, happiness, and tenderness are communicated with distinct patterns related to pitch, intensity, timing, and timbre that are independent of the respective domain. This idea was later expanded to other expressive body movement sounds by showing

partial similarities between musical expressions and walking sounds (Giordano et al. 2014). Hearing action sounds can lead to an understanding of the associated actions through activations of mirror neurons (Kohler et al. 2002). This coupling is thought to be based on Hebbian learning, which may have the capacity to bind perceptions, actions, and emotional expressions together (Keysers and Gazzola 2009). This perspective on emotion emphasizes the importance of the behavioral response component of emotion as described by Scherer (2005). Accordingly, the main function of emotional responding can be described as coordinating approach and avoidance behavior. Thus, motion and emotion are strongly linked. Expressing and recognizing emotion through movement sounds seems to be a general human capacity that could also apply to emotion expression and recognition in music. This leads to the hypothesis that music might sound emotional to us because it sounds like someone is moving in an emotionally expressive way. Expressive movement characteristics of music that is presented together with a brand might therefore influence how a listener perceives the identity of that brand. Yet, how does expressing and recognizing emotion in music lead to the induction of an emotional response in a listener? The following section presents several theoretical and empirical accounts that try to explain why music creates emotions that we attribute to ourselves.

Music-Emotion Induction Mechanisms Applied to Audio Branding

One of the first attempts to summarize music-related emotion-induction theory and research was presented by Scherer and Zentner (2001), who described several central and peripheral emotion production routes incorporating appraisal, empathy, memory, and peripheral arousal (see also Scherer and Coutinho 2013). In two more recent reviews, Juslin and colleagues identified several additional psychological mechanisms (Juslin and Västfjäll 2008; Juslin et al. 2010) that are thought to be involved in the induction of different emotional qualities through music: cognitive appraisal, evaluative conditioning, visual imagery, episodic memory, musical expectancy, brain stem reflexes, emotional contagion, and rhythmic entrainment.

Cognitive Appraisal

Cognitive appraisal is thought to be involved in creating an emotional response to music when the music listened to and the corresponding situation are evaluated. Here emotions are thought to emerge from an appraisal along several dimensions such as novelty, urgency, coping potential, norm compatibility, or goal congruence (see also Scherer 1999). In Egermann and colleagues (2013), my colleagues and I demonstrate how social appraisal processes moderate emotional responses to music when participants conform to social norms. In that study, participants were confronted with the emotional impact ratings of previous participants. As this social source of information was stronger than a computational source, we concluded that participants conformed based on normative influence, indicating that cognitive appraisal processes might play a role in responding to music. Furthermore, music that previously helped a listener achieve a certain goal is remembered and could be used as an unconditioned emotional stimulus in a branding context.

For example, relaxing music that was listened to in the past and that resulted in the goal-congruent effect of relaxation might become associated with the positive emotions of achieving the intended goal. On the other hand, music that previously hindered a listener in reaching a goal (e.g., the neighbors’ loud party music at 3:00 a.m. preventing you from falling asleep) might result in negative responses. Thus, using either type of music in branding will have a different impact on the brand image and on brand judgments in the consumer. This idea draws attention to the memory processes that are involved when listeners respond to music emotionally. These mechanisms are particularly involved in creating interindividual differences in consumers’ responses to music in branding contexts.

Memory-Based Mechanisms

Evaluative conditioning happens when an unconditioned positive or negative stimulus has been paired with music and affects the emotional response to it. Episodic memory is thought to be involved when personal emotional episodes are memorized and remembered together with music. While evaluative conditioning with music has not been investigated under laboratory conditions, Juslin and colleagues (2015) reported that music that was often used in certain emotional situations (weddings, graduations, the beginning of summer) was able to induce the targeted emotions of happiness and nostalgia (see also Janata et al. 2007). Accordingly, these emotions might have been induced through episodic memory. In the context of branding, these two mechanisms (conditioning and episodic memory) might operate like second-order conditioning: the original conditioned stimulus, music (CS1), is first paired with an unconditioned emotional stimulus (US) and then subsequently transfers its newly acquired emotional meaning, through paired presentation, to another conditioned stimulus (CS2), which in this context is the brand. Furthermore, the learning of statistical regularities of musical structures is thought to give rise to listener expectations. Huron (2006) suggests that there are four different types of expectation associated with music, which are created by statistical learning in different auditory memory modules. Veridical expectations are derived from episodic memory and contain knowledge of the progression within a specific piece. Schematic expectations arise from exposure to multiple pieces and contain information about general event patterns of different musical styles and of music in general. Dynamic expectations are built up from knowledge stored in short-term memory about the specific piece that one is currently listening to and are updated in real time through listening. Finally, Huron also describes conscious expectations, which contain listeners’ explicit thoughts about how the music will sound. In his ITPRA (imagination, tension, prediction, reaction, appraisal) theory, Huron links violations or confirmations of these expectations to different types of emotional responses to musical structures. So far, these links have only been partially studied experimentally. Steinbeis and colleagues (2006) reported that harmonic expectancy violations induced increased emotional intensity and physiological arousal. Egermann and colleagues (2013) presented an experimental investigation conducted in a live concert setting. To confirm the existence of the link between expectation and emotion, we used a threefold approach.

(1) On the basis of an information-theoretic cognitive model, melodic pitch expectations were predicted by analyzing the musical stimuli presented. (2) A continuous rating scale was used by one half of the audience to measure their experience of unexpectedness toward the music heard. (3) Emotional reactions were measured using a multicomponent approach: subjective feeling, expressive facial behavior, and peripheral arousal. The results confirmed the predicted relationship between high-information-content musical events, the violation of musical expectations (in the corresponding ratings), and emotional reactions. Musical structures leading to expectation reactions were manifested in emotional reactions at two different emotion component levels: increases in subjective arousal and autonomic nervous system activations. This emotion-induction mechanism could be especially important when considering the timing of individual elements in audiovisual advertisements: musical moments of expectation violation could lead to sudden increases in experienced emotional intensity. Vermeulen and colleagues (2011) showed that synchronous (vs. asynchronous) presentation of the brand name and the musical peak moment improved the attitude toward the advert. However, they were not able to find a transfer of this effect to the attitude toward the brand. Musical expectations might play another role in determining a consumer’s response to a piece of music used in a branding context: the predictability (or expectedness) of musical structures is linked to the overall complexity of a given musical stimulus (Pearce and Wiggins 2006). Berlyne (1971) hypothesized that stimulus complexity is positively correlated with arousal. Furthermore, he claimed that the hedonic response to an artistic stimulus is optimal when it induces an arousal level that is neither too high nor too low. To illustrate this relationship, he referred to the so-called Wundt curve, which has the shape of an inverted U and describes the association between stimulus complexity and hedonic response. There is only limited empirical evidence that this theory also applies to music (e.g., North and Hargreaves 1995). If proven correct, it could explain why some musical stimuli act as positive unconditioned stimuli in branding and others do not: their overall predictability induces optimal or nonoptimal arousal levels in listeners. Cognitive appraisal, emotional memory, and musical expectations are all highly linked to a consumer’s individual and cultural background. This would imply that our responses to music are often very individual and that there are only few interindividual similarities in responses to music. Nevertheless, Juslin and Västfjäll (2008) also give examples of emotional response mechanisms that rely much more on low-level stimulus characteristics and perceptual processing and that are less influenced by individual learning.
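The information-theoretic modeling referred to above can be illustrated with a toy example. The sketch below is a deliberately simplified, hypothetical stand-in for such models (published systems of the kind used in Egermann et al. 2013 employ far richer, multi-feature contexts): it estimates note-to-note probabilities from a tiny invented corpus with a first-order Markov model and reports the information content, −log2 p, of each event in a new melody. Peaks in this quantity mark the moments at which expectation-violation responses would be predicted.

```python
import math
from collections import defaultdict

# Toy first-order (bigram) model of melodic expectation. Information content
# IC(x_i) = -log2 p(x_i | x_{i-1}); high values mark unexpected notes.
# The "corpus" and test melody are invented for illustration.

def train_bigram(corpus, alpha=1.0):
    counts = defaultdict(lambda: defaultdict(float))
    alphabet = sorted({n for melody in corpus for n in melody})
    for melody in corpus:
        for a, b in zip(melody, melody[1:]):
            counts[a][b] += 1
    def prob(prev, nxt):
        total = sum(counts[prev].values()) + alpha * len(alphabet)
        return (counts[prev][nxt] + alpha) / total  # add-alpha smoothing
    return prob

def information_content(melody, prob):
    return [-math.log2(prob(a, b)) for a, b in zip(melody, melody[1:])]

corpus = [[60, 62, 64, 65, 67, 65, 64, 62, 60],
          [60, 64, 62, 65, 64, 67, 65, 60]]
prob = train_bigram(corpus)
test = [60, 62, 64, 62, 67]   # the final leap from 62 to 67 is unexpected
print([round(ic, 2) for ic in information_content(test, prob)])
```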

Low-Level Stimulus Responses

Brain stem reflexes occur when basic acoustical characteristics of auditory stimuli (e.g., sudden, dissonant, or loud sounds) signal events that might be relevant to our general well-being. For example, a sudden increase in noisiness in the music (which has similarities to the scream of a dangerous animal) might provoke an increase of arousal in the listener (Blumstein et al. 2012). Furthermore, Nagel and colleagues (2008) reported that an increase in loudness was associated with the experience of music-induced emotional peak moments (so-called chills). Creating sensory dissonance by spectral manipulation of recorded music decreased the pleasantness of induced emotions both in Western listeners and in participants from a native African population that was isolated from Western cultural influences (Fritz et al. 2009). Employing a similar research design, we also demonstrated that Western music that was arousing to Canadians induced similar subjective and physiological arousal in a similarly isolated Pygmy population when compared to calming Western music (Egermann et al. 2015). Since this population had never heard Western music before, we concluded that their responses were mediated by universal response mechanisms that take basic acoustical features as their input and create emotionally meaningful responses to them. Accordingly, we observed that Pygmies and Westerners responded similarly to acoustic changes in timbre, pitch, and tempo, as indicated by higher activations of the peripheral nervous system and higher subjective arousal ratings. Therefore, the arousal dimension of emotion seems to be based on culturally universal response mechanisms. Yet the valence dimension (which differentiates between positive and negative emotions) seems to be influenced by cultural learning, as there were no similarities here between the two participant groups. Rhythmic entrainment occurs when body activity synchronizes with external musical rhythms. Accordingly, Khalfa and colleagues (2008) reported that fast musical tempi lead to higher respiration rates compared to slower tempi. This change in bodily arousal is then thought to influence subjective arousal through the peripheral feedback route (Scherer and Zentner 2001). However, so far little empirical evidence has been presented showing how the occurrence of entrainment leads to changes in subjective music experiences (Labbe and Grandjean 2014). Thus, it remains unclear how rhythmic entrainment leads to emotional responses of different qualities; entrainment might simply be a predictor of emotional intensity rather than of emotional quality.

Responses to Expressive Schemata and Personas

Emotional responses to music might also be based on (1) empathy with a performer or composer to whom emotional expressions are attributed (Scherer and Zentner 2001) and (2) a rather automatic, unconscious contagion through internal mimicking of expressive cues in the music (Juslin and Västfjäll 2008). The two phenomena are closely related because both create emotional responses that match those being expressed. In an investigation of empathy and emotional contagion, we have shown that they moderate whether expressed emotions are felt by the listener (Egermann and McAdams 2013). Furthermore, music preference ratings were shown to strongly influence empathy. Thus, we are more likely to empathize with music that we like than with music that we dislike. Music that is expressive of positive emotions and that is liked by a consumer will be more likely to act as an unconditioned stimulus when paired with a brand than positive music that a listener dislikes (for an extensive discussion of the determinants of music preference, see Lamont and Greasley 2012).

It has also been claimed that emotion is induced during the listening process when emotional mental images are built up by the music (also called visual imagery; Juslin and Västfjäll 2008). These experiences are thought to be based on a nonlinear mapping between features of the musical structures and image schemata (Lakoff 1987; Johnson 1987; for a use of visual imagery in music therapy, see also Bonde, this volume, chapter 21). It has yet to be shown experimentally that such mental images are the source of an emotional response to music. The only study that I have found that explicitly states that it investigates this mechanism was published by Vuoskoski and Eerola (2013). The authors reported that a sad narrative read before listening to a piece of music intensified the sadness experienced by participants. It was then concluded that, during listening, visual images of that narrative were experienced by participants. However, in contrast to what was originally stated by Juslin and Västfjäll (2008), in this case it was not the music that brought up the emotional images, but the narrative.

Cognitive Brand Meaning

While music might fit a brand in terms of its affective qualities and could act as an unconditioned emotional stimulus, it can also activate more or less brand-fitting cognitive knowledge structures. Koelsch (2011) identifies two different types of meaning in music: extra-musical and intra-musical. While the latter type refers to meaning and listener responses that emerge through relationships between one musical structure and another (like the tension that is built up before the uncertain onset of a highly expected chord), the former type refers to nonmusical meaning that is attributed to music due to three different sign qualities (based on Peirce’s [1994] semiotic theory): iconic meaning, where music refers to nonmusical attributes through structural similarity (e.g., the association of a staircase with an ascending melodic line); indexical meaning, where music refers to the state or action of someone else (e.g., a sad musical expression might refer to the sad state of a performer or composer); and symbolic meaning, where, through associative learning, music acquires sign-like qualities of nonmusical objects or events (e.g., a national anthem that is associated with a nation). The latter mechanism is probably the most relevant for understanding how music contributes to the cognitive structure of brands. While an unlimited number of cognitive concepts could potentially be associated with music, the previous section on brand identities shows that a much smaller number of concepts apply to brands and music at the same time. Like brand identities, music is also associated with human identities. Assessing association patterns with music, Watt and Ash (1998) show that “those categories with the highest levels of inter-subject agreement are those that are most naturally applied to people; the lowest levels of inter-subject agreement are reached on categories that are not naturally applied to people” (46). Thus, in Western culture, music could be perceived as a virtual person or a virtual group of people. What follows illustrates what the content of these associations might be.

Psychological Mechanisms behind Audio Branding   361 In the process of socialization, music listeners use music as a tool for social identity formation. During social bonding processes, music preferences are often topics of conversations (Rentfrow and Gosling 2006). The more similar music preference profiles for two people are, the more likely these two people will bond (Boer et al. 2011). Here, musical genres are especially associated with certain human characteristics. According to North and Hargreaves (1999), adolescents use music as a “badge” for their social identity that communicates something about their self-concepts (see also Lamont, volume 1, chapter 12). For example, listening to indie, classical, or pop music is associated with several typical personal qualities and attributes. The study of North and Hargreaves has stimulated several other investigations into the stereotypical knowledge structures that are associated with fans and performers of different music genres (Table 17.2). Here, it was shown, that these people were usually linked to certain demographics (e.g., age, education, sex), values, personality traits, ethnicities, clothing styles, and various other personal qualities (e.g., attractiveness, trustworthiness, or friendliness). While music genres seem to be socially constructed phenomena, they can also be described as cognitive musical schemata (Huron 2006). Genres consist of typical melodic, rhythmic, and harmonic features and instrumental arrangements. Therefore, employing these particular musical features in branding contexts will elicit particular genre-relevant associations in listeners. Fischer (2009) showed, for example, that the same melodic fragment presented on different instruments led to different typical value associations. Tradition was positively related to the melody being performed on an accordion, an oboe, and a violin, and negatively to a synthesizer and guitar. On the other side, hedonism was associated with a guitar and a synthesizer but not an oboe, violin, or accordion. Furthermore, trumpets and violins were highly associated with power.

Table 17.2  Overview of Studies Testing and Showing Relationships between Music Genre Stereotypes and Human Characteristics
[Table: rows list the human characteristics examined (demographics, values, personality, ethnicity, clothing, intellect/expertise, other personal qualities); columns mark which of the studies (North and Hargreaves 1999; Rentfrow and Gosling 2006; Rentfrow and Gosling 2007; Shevy 2008; Rentfrow et al. 2009; Kristen and Shevy 2013) tested and found a relationship for each characteristic.]

362   HAUKE EGERMANN In Egermann and Stiegler (2014) we showed that traditional instrumental pieces from different European countries are more or less correctly associated with their country of origin in an online listening test. While participants were not able to correctly identify music from northern Italy or Sweden, Spanish flamenco music was correctly identified by nearly all participants in a recognition paradigm (where participants were given the names of different European countries to choose from). In a free open recall version of the study where participants were asked to list all music-evoked words, again, around 85 percent of participants reported an association with the country Spain. In a second part of this study, we showed that some music excerpts that were chosen to represent music styles that were popular in different decades of the twentieth century were able to induce correct time/decade associations in the listeners. Here, we observed that especially those styles that were popular during the participant’s adolescent years were most effective. Taken together, these results indicate, that music can activate shared meaning structures that could be used for communication purposes (see also Shevy 2008). However, the success of these measures depends on the similarity of interindividual, extra-musical association networks and the strength of the learning of associations between music and other features (exemplified by the lower recognition rate of some countries and decades). Thus, when creating or selecting music to communicate specific, extra-musical meaning, as done in audio branding practice, a detailed knowledge about listeners seems to be just as crucial as the design of the stimuli themselves.

An Integrated Brand-Music Communication Model

The theoretical and empirical findings reported above can be summarized in the following hypothetical model (see Figure 17.1). It presents a simplified communication process in which a company aims to create a brand image in its customers by expressing its brand identity using music. Here, three different functions of music are identified. Music is thought to create salience by attracting attention and by establishing an additional memory representation for the brand (1. Brand Salience). Furthermore, through shared knowledge about cognitive human attributes related to certain musical characteristics, music communicates brand values, brand personality, and many other concepts (2. Cognitive Meaning). The characteristics associated with the social group behind a given music genre (its performers and listeners) are used here as a tool to elicit relevant social associations when music is chosen or produced. When brand identities are expressed through music, consumers process its social-referential meaning with the same social-cognitive capacity that they usually employ for person perception. Furthermore, in addition to being able to express emotions that are recognized by a listener (again, probably due to its similarities with typically human expressive sounds), music is also able to evoke and induce emotion (3. Emotional Meaning).

Figure 17.1  Brand-music communication model. [Figure: in the communication process, a company expresses its brand identity to the consumer through music; the resulting brand image is built up via 1. salience, 2. emotional meaning, and 3. cognitive meaning, drawing on shared human attributes and knowledge structures and on the fit between brand and music.]

Through conditioning, music might become an unconditioned stimulus that projects its emotional and cognitive meaning onto a brand that originally lacked such meaning. All three functions (providing salience, cognitive meaning, and emotional meaning) are improved when the human attributes evoked by the brand and by the music are semantically similar and “fit” (North and Hargreaves 2008). While many of the reported relationships have been studied separately, there are still no studies that test the entire communication process, from the conception of a brand identity to the achievement of a brand image in a consumer, through the use of music. In many studies, music was chosen that had certain qualities relevant in this context (being salient, emotional, or associated with cognitive concepts), yet few studies have focused on the emergence of these qualities in a branding context. Therefore, the model remains speculative in that its components have not been tested for their independent functionality. However, the anecdotal evidence reported by audio branding practitioners (Lusensky 2008), who in their daily work influence how consumers imagine brands, is quite striking.

References Aaker, J.  L. 1997. Dimensions of Brand Personality. Journal of Marketing Research 24: 347–356. Aaker, J. L., S. Fournier, D. E. Allen, and J. Olson. 1995. A Brand as a Character, a Partner and a Person: Three Perspectives on the Question of Brand Personality. Advances in Consumer Research 22: 391–395. Allen, M. 2002. Human Values and Product Symbolism: Do Consumers Form Product Preference by Comparing the Human Values Symbolized by a Product to the Human Values That They Endorse?. Journal of Applied Social Psychology 32 (12): 2475–2501.

364   HAUKE EGERMANN Audio Branding Academy. 2013. Audio Branding Barometer 2013. http://audio-brandingacademy.org/media/barometer/ABB2013_20131103.pdf. Accessed April 9, 2017. Azoulay, A., and J.-N.  Kapferer. 2003. Do Brand Personality Scales Really Measure Brand Personality? Journal of Brand Management 11 (2): 143–155. http://doi.org/10.1057/palgrave. bm.2540162. Bagozzi, R. P., M. Gopinath, and P. U. Nyer. 1999. The Role of Emotions in Marketing. Journal of the Academy of Marketing Science 27 (2): 184–206. Batra, R., D. R. Lehmann, and D. Singh. 1993. The Brand Personality Component of Brand Goodwill: Some Antecedents and Consequences. In Brand Equity and Advertising, edited by, D. A. Aaker and A. L. Biel. Hillsdale, NJ: Erlbaum. Berlyne, D. E. 1971. Aesthetics and Psychobiology. New York: Appleton-Century-Crofts. Blumstein, D. T., G. A. Bryant, and P. Kaye. 2012. The Sound of Arousal in Music is ContextDependent. Biology Letters 8 (5): 744–747. http://doi.org/10.1098/rsbl.2012.0374. Boer, D., R. Fischer, M. Strack, M. H. Bond, E. Lo, and J. Lam. 2011. How Shared Preferences in Music Create Bonds between People: Values as the Missing Link. Personality and Social Psychology Bulletin 37: 1159–1171. http://doi.org/10.1177/0146167211407521. Bonde, A., and A.  G.  Hansen. 2013. Audio Logo Recognition, Reduced Articulation and Coding Orientation: Rudiments of Quantitative Research Integrating Branding Theory, Social Semiotics and Music Psychology. SoundEffects 3 (1): 113–135. Burmann, C., Blinda, L., and Nitschke, A. 2003. Konzeptionelle Grundlagen des identitätsbasierten Markenmanagements. LIM-Arbeitspapiere No. 1. Bremen, Germany. Chuen, L., D.  Sears, and S.  McAdams. 2016. Psychophysiological Responses to Auditory Change. Psychophysiology 53 (6): 891–904. Cohen, A. J. 2001. Music as a Source of Emotion in Film. In Music and Emotion: Theory and Research, edited by, J. A. Sloboda and P. N. Juslin. Oxford: Oxford University Press. Digman, J.  M. 1990. Personality Structure: Emergence of the Five-Factor Model. Annual Review of Psychology 41: 417–440. Egermann, H., N.  Fernando, L.  Chuen, and S.  McAdams. 2015. Music Induces Universal Emotion-Related Psychophysiological Responses: Comparing Canadian Listeners to Congolese Pygmies. Frontiers in Psychology 5: 1–9. http://doi.org/10.3389/fpsyg.2014.01341. Egermann, H., and S. McAdams, S. 2013. Empathy and Emotional Contagion as a Link between Recognized and Felt Emotions in Music Listening. Music Perception 31 (2): 139–156. Egermann, H., M. T. Pearce, G. A. Wiggins, and S. McAdams. 2013. Probabilistic Models of Expectation Violation Predict Psychophysiological Emotional Responses to Live Concert Music. Cognitive, Affective, and Behavioral Neuroscience 13 (3): 533–553. http://doi.org/10.3758/ s13415-013-0161-y. Egermann, H., and C. Stiegler. 2014. Communicating National and Temporal Origin of Music: An Experimental Approach to Applied Musical Semantics. In Abstract Book of the 13th International Conference of Music Perception and Cognition. Seoul: ICMPC, 343. Evans, P., and E.  Schubert. 2008. Relationships between Expressed and Felt Emotions in Music. Musicae Scientiae 12 (1): 75–99. Fischer, R. 2009. Sinn für die Marke: Systematisierung des akustischen Markentransfers anhand der Instrumentation. Unpublished master thesis. Hochschule für Musik und Theater Hannover. Freling, T.  H., and L.  P.  Forbes. 2005. An Empirical Analysis of the Brand Personality Effect. Journal of Product and Brand Management 14 (7): 404–413. 
http://doi.org/10.1108/10610420510633350.

Psychological Mechanisms behind Audio Branding   365 Fritz, T., S. Jentschke, N. Gosselin, D. Sammler, I. Peretz, R. Turner, A. D., et al. 2009. Universal Recognition of Three Basic Emotions in Music. Current Biology 19 (7): 573–576. http://doi. org/10.1016/j.cub.2009.02.058. Gabrielsson, A. 2002. Emotion Perceived and Emotion Felt: Same or Different. Musicae Scientiae (Special Issue 2001–2002): 123–145. Gaus, H., S. Jahn, T. Kiessling, and J. Drengner. 2010. How to Measure Brand Values? Advances in Consumer Research 37: 1–2. Giordano, B.  L., H.  Egermann, and R.  Bresin. 2014. The Production and Perception of Emotionally Expressive Walking Sounds: Similarities between Musical Performance and Everyday Motor Activity. PLoS One 9 (12): e115587. doi:10.1371/journal.pone.0115587. Gorn, G.  J. 1982. The Effects of Music in Advertising on Choice Behavior: A Classical Conditioning Approach. Journal of Marketing 46 (1): 94–101. Hirschman, E.  C., and M.  B.  Holbrook. 1982. Hedonic Consumption: Emerging Concepts, Methods and Propositions. Journal of Marketing 46 (3): 92–101. Hunter, P.  G., G.  Schellenberg, and U.  Schimmack. 2010. Feelings and Perceptions of Happiness and Sadness Induced by Music: Similarities, Differences, and Mixed Emotions. Psychology of Aesthetics, Creativity, and the Arts 4 (1): 47–56. http://doi.org/10.1037/ a0016873. Huron, D. 2006. Sweet Anticipation. Cambridge, MA: MIT Press. Jackson, D. 2003. Sonic Branding: An Essential Guide to the Art and Science of Sonic Branding. New York: Palgrave Macmillan. Janata, P., S. T. Tomic, and S. K. Rakowski. 2007. Characterisation of Music-Evoked Autobio­ graphical Memories. Memory 15 (8): 845–860. http://doi.org/10.1080/09658210701734593. Johnson, M. 1987. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason. Chicago: University of Chicago. Juslin, P.  N., G.  Barradas, and T.  Eerola. 2015. From Sound to Significance: Exploring the Mechanisms Underlying Emotional Reactions to Music. American Journal of Psychology 128 (3): 281–304. Juslin, P. N., J. Karlsson, E. Lindström, A. Friberg, and E. Schoonderwaldt. 2006. Play It Again with Feeling: Computer Feedback in Musical Communication of Emotions. Journal of Experimental Psychology: Applied 12: 79–95. doi:10.1037/1076-898X.12.2.79. Juslin, P.  N., and P.  Laukka. 2003. Communication of Emotions in Vocal Expression and Music Performance: Different Channels, Same Code? Psychological Bulletin 129: 770–814. Juslin, P. N., S. Liljeström, D. Västfjäll, and L.-O. Lundqvist. 2010. How Does Music Evoke Emotions? Exploring the Underlying Mechanisms. In Handbook of Music and Emotion: Theory, Research, Applications, edited by P. N. Juslin and J. A. Sloboda, 605–643. Oxford: Oxford University Press. http://doi.org/10.1093/acprof:oso/9780199230143.003.0022. Juslin, P.  N., and D.  Västfjäll. 2008. Emotional Responses to Music: The Need to Consider Underlying Mechanisms. Behavioral and Brain Sciences 31 (5): 559–575; discussion 575–621. http://doi.org/10.1017/S0140525X08005293. Kallinen, K., and N. Ravaja. 2006. Emotion Perceived and Emotion Felt: Same and Different. Musicae Scientiae 10 (2): 191–213. Kapferer, J. 2012. The New Strategic Brand Management: Advanced Insights and Strategic Thinking (5th ed.). London: Kogan Page. Keller, K.  L. 2009. Building Strong Brands in a Modern Marketing Communications Environment. Journal of Marketing Communications 15 (2–3): 139–155. http://doi.org/ 10.1080/13527260902757530.

366   HAUKE EGERMANN Keysers, C., and V.  Gazzola. 2009. Expanding the Mirror: Vicarious Activity for Actions, Emotions, and Sensations. Current Opinion in Neurobiology 19: 666–671. doi:10.1016/j.conb. 2009.10.006. Khalfa, S., M. Roy, P. Rainville, S. Dalla Bella, and I. Peretz. 2008. Role of Tempo Entrainment in Psychophysiological Differentiation of Happy and Sad Music? International Journal of Psychophysiology 68 (1): 17–26. http://doi.org/10.1016/j.ijpsycho.2007.12.001. Koelsch, S. 2011. Towards a Neural Basis of Processing Musical Semantics. Physics of Life Reviews 8 (2): 89–105. http://doi.org/10.1016/j.plrev.2011.04.004. Kohler, E., C. Keysers, M. A. Umiltà, L. Fogassi, V. Gallese, and G. Rizzolatti. 2002. Hearing Sounds, Understanding Actions: Action Representation in Mirror Neurons. Science 297 (5582): 846–848. http://doi.org/10.1126/science.1070311. Kristen, S., and M.  Shevy. 2013. A Comparison of German and American Listeners’ Extra Musical Associations with Popular Music Genres. Psychology of Music 41 (6): 764–778. http://doi.org/10.1177/0305735612451785. Krugmann, D. 2007. Integration akustischer Reize in die identitätsbasierte Markenführung. LiM-Arbeitspapiere No. 27. Bremen, Germany. Labbe, C., and D. Grandjean. 2014. Musical Emotions Predicted by Feelings of Entrainment. Music Perception 32 (2): 170–185. http://doi.org/10.1525/mp.2014.32.2.170. Lakoff, G. 1987. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago: University of Chicago Press. Lamont, A., and A.  Greasley. 2012. Musical Preferences. Oxford Handbooks Online. http:// www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199298457.001.0001/oxfordhb9780199298457-e-015. Accessed April 7, 2017. Lantos, G. P., and L. G. Craton. 2012. A Model of Consumer Response to Advertising Music. Journal of Consumer Marketing 29 (1): 22–42. http://doi.org/10.1108/07363761211193028. Lusensky, J. 2008. Sounds Like Branding. Heartbeats International. http://www.soundslikebranding.com/pdf/slb_digital.pdf. Accessed May 7, 2016. MacInnis, D.  J., and C.  W.  Park. 1991. The Differential Role of Characteristics of Music on High- and Low-involvement Consumers’ Processing of Ads. Journal of Consumer Research 18: 161–173. Marshall, S. K., and A. J. Cohen. 1988. Effects of Musical Soundtracks on Attitudes toward Animated Geometric Figures. Music Perception 6 (1): 95–112. Mischel, W., Y. Shoda, and O. Ayduk. 2004. Introduction to Personality: Toward an Integration. New York: John Wiley & Sons. Muellensiefen, D., H.  Egermann, S.  Burrows. 2015. Radio Station Jingles: How Statistical Learning Applies to a Special Genre of Audio Logos. In Audio Branding Yearbook 2014–2015, edited by K. Bronner, R. Hirt, and C. Ringe, 53–72. Baden-Baden, Germany: Nomos. Nagel, F., R. Kopiez, and O. Grewe. 2008. Psychoacoustical Correlates of Musically Induced Chills. Musicae Scientiae 12 (1): 101–113. North, A., and D.  J.  Hargreaves. 1995. Subjective Complexity, Familiarity, and Liking for Popular Music. Psychomusicology 14: 77–93. North, A., and D. Hargreaves. 1999. Music and Adolescent Identity. Music Education Research 1 (1): 75–92. http://doi.org/10.1080/1461380990010107. North, A. C., and D. J. Hargreaves. 2008. The Social and Applied Psychology of Music. Oxford: Oxford University Press. Pearce, M. T., and G. A. Wiggins. 2006. Expectation in Melody: The Influence of Context and Learning. Music Perception 23 (5): 377–405. http://doi.org/10.1525/mp.2006.23.5.377.

Psychological Mechanisms behind Audio Branding   367 Peirce, C. S. 1994. Elements of Logic. In The Collected Papers of Charles Sanders Peirce. Electronic Edition, Vol. 2, edited by C. Hartshorne and P. Weiss. Charlottesville, NC: InteLex Corp. Petty, R.  E., J.  T.  Cacioppo, and D.  T.  Schumann. 1983. Central and Peripheral Routes to Advertising Effectiveness: The Moderating Effect of Involvement, Journal of Consumer Research 10: 135–146. Rentfrow, P.  J., S.  D.  Gosling. 2006. Message in a Ballad the Role of Music Preferences in Interpersonal Perception. Psychological Science 17 (3): 236–242. Rentfrow, P. J., and S. D. Gosling. 2007. The Content and Validity of Music-Genre Stereotypes among College Students. Psychology of Music 35 (2): 306–326. Rentfrow, P. J., J. A. Mcdonald, and J. A. Oldmeadow. 2009. You Are What You Listen To: Young People’s Stereotypes about Music Fans. Group Processes and Intergroup Relations 12 (3): 329–344. http://doi.org/10.1177/1368430209102845. Scherer, K. 1999. Appraisal Theory. In Handbook of Cognition and Emotion, edited by T. Dalgleish and M. Power. 637–663. Chichester, UK: Wiley. Scherer, K. R. 2005. What Are Emotions? And How Can They Be Measured? Social Science Information 44 (4): 695–729. Scherer, K.  R., and M.  R.  Zentner. 2001. Emotional Effects of Music: Production Rules. In Music and Emotion: Theory and Research, edited by P. N. Juslin and J. A. Sloboda, 361–392. Oxford: Oxford University Press. Scherer, K. R., and E. Coutinho. 2013. How Music Creates Emotion: A Multifactorial Process Approach. In The Emotional Power of Music Multidisciplinary Perspectives on Musical Arousal, Expression, and Social Control, edited by T. Cochrane, B. Fantini, and K. R. Scherer. Oxford: Oxford University Press. Schwartz, S. H. 1992. Universals in the Content and Structure of Values: Theoretical Advances and Empirical Tests in 20 Countries. In Advances in Experimental Social Psychology, Vol. 25, edited by M. Zanna, 1–65. Orlando, FL: Academic Press. Shevy, M. 2008. Music Genre as Cognitive Schema: Extramusical Associations with Country and Hip-Hop Music. Psychology of Music 36 (4): 477–498. http://doi.org/10.1177/ 0305735608089384. Steinbeis, N., S. Koelsch, and J. A. Sloboda. 2006. The Role of Harmonic Expectancy Violations in Musical Emotions: Evidence from Subjective, Physiological, and Neural Responses. Journal of Cognitive Neuroscience 18 (8): 1380–1393. Stewart, D.  W., K.  M.  Farmer, and C.  I.  Stannard. 1990. Music as a Recognition Cue in Advertising-Tracking Studies. Journal of Advertising Research (September): 30 (4) 39–48. Thakor, M. V., and C. S. Kohli. 1996. Brand Origin: Conceptualization and Review. Journal of Consumer Marketing, 13 (3): 27–42. Vermeulen, I., T.  Hartmann, and A.-M.  Welling. 2011. The Chill Factor: Improving Ad Responses by Employing Chill-Inducing Background Music. Proceedings of the 61th Annual Conference of the International Communication Association (ICA), May 26–30, Boston, MA. Vuoskoski, J.  K., and T.  Eerola. 2013. Extramusical Information Contributes to Emotions Induced by Music. Psychology of Music 43 (2): 262–274. http://doi.org/10.1177/0305735613502373. Watt, R. J., and R. L. Ash. 1998. A Psychological Investigation of Meaning in Music. Musicae Scientiae 2 (1): 33–53. http://doi.org/10.1177/102986499800200103. Zander, M. F. 2006. Musical Influences in Advertising: How Music Modifies First Impressions of Product Endorsers and Brands. Psychology of Music 34 (4): 465–480. http://doi.org/ 10.1177/0305735606067158.

Chapter 18

Sound and Emotion
Erkin Asutay and Daniel Västfjäll

Introduction Auditory stimulation in our daily life is an ever-present phenomenon. We are usually subject to a constant stream of sounds, even when we are asleep. Throughout this handbook, authors approach sound from many different perspectives, from art to technology, from politics to psychology. Among many other indicators, this alone shows that sound perception is multifaceted. Sound can provide us information but it can also move, harmonize, or traumatize us (LaBelle 2007). Yet, sometimes we can totally ignore it. Sounds also underlie most communication. We talk to people and listen to what they have to say, and the communication is carried out not only by language but also by prosody and intonation. We listen to music to relax and to stimulate ourselves. We can even use sounds to communicate to other species such as our pets. Apart from this communication aspect, sounds inform us regarding objects, events, and spaces that surround us. The auditory system makes sense of this complex input in a seemingly effortless and effective manner. A subset of the auditory input may be of interest to us while the rest may go unnoticed. With the help of executive functions and selective attention, we are able to decompose the acoustic input that reaches our ears into separate auditory streams and focus on the ones that are of value while pushing others into the background. Psychological mechanisms such as attention, motivation, and prior experience can have an impact on sound perception. This chapter focuses on the relationship between sound and emotion; that is, how sounds evoke emotions and how emotional processes influence sound perception and auditory attention. The main objective of the current chapter is to show that affective experience is integral to auditory perception.

370   ERKIN ASUTAY AND DANIEL VÄSTFJÄLL Auditory stimuli have a great potential to evoke emotions in people (Armony and LeDoux  2010; Tajadura-Jiménez  2008). The auditory system scans our surrounding environment, detects and identifies significant objects and events, and signals for attention shifts when necessary (Juslin and Västfjäll 2008). It can also orient the visual system to a particular region of interest (Arnott and Alain 2011). Critically, it has been shown that the auditory system takes the behavioral state of the organism (i.e., emotional, motivational, and attentional) into account while processing auditory stimuli (Weinberger 2010). On the other hand, emotions work in concert with perceptual processes. They can guide us to establish our motivation and preferences about objects, events, and places (Lang and Bradley 2010) and can call for rapid mobilization for action when necessary (Frijda 2008). Here, we present evidence documenting the interplay between auditory and emotional processes. Moreover, imagination, in this chapter, is broadly taken as mental representations that are induced by sounds; and we focus on the impact of these mental representations on the affective experience during sound perception. These imaginations could be very different depending on the context, the listener’s condition, and the sound itself. We make an overall classification of these mental representations from the perspective of the distinction between musical listening and everyday listening (Gaver 1993). The distinction comes from the application of an ecological approach to sound perception (Clarke 2005; Neuhoff 2004). Imagine that you are walking by the pier and you hear a sound. If you pay attention to the sound, you may focus on its perceptual features like loudness, pitch, and timbre and how these features evolve in time. On the other hand, you might just notice that you hear the sound of a passing boat, and your attention will be on the source of the sound. The former is an example of musical listening, while the latter exemplifies everyday listening. Note also that the distinction between everyday and musical listening does not suggest that all musical sounds are received in musical listening mode and vice versa. In the following, we first start with basic properties of the auditory system, and ­present a view of the auditory system as an adaptive and cognitive network that specializes in processing acoustic stimulus features while integrating behavioral state of the organism to its processing. This forms the biological and behavioral basis for our main argument that affective experience is one of the main parts of sound perception. Next, in order to show the tight connections between auditory and affective processes, we focus on affective responses to auditory stimuli, reviewing empirical evidence from behavioral and neuroimaging studies. We present the subject in three different sections: responses to learned emotional meaning of sounds, responses to vocal signals, and responses to music. In doing so, we also attempt to make clear how these different sources of stimuli induce affective reactions in us and how we respond to them. We also discuss how the mental representations evoked by sounds influence affective reactions to auditory stimuli. Then, we present evidence on how the affective significance of sounds can influence perception and attention. 
Finally, we will bring all this together and underline the main argument of this chapter that the affective experience is an integral part of sound perception.


The Auditory System Sound perception is a fundamental part of our interactions with and experience of the external environment. We receive a continuous flow of auditory stimulation from our surroundings, and the auditory system makes sense of this input. It has been suggested that the auditory system has evolved as an alarm system that scans our surroundings, detects salient events in it, and signals for attention shifts to prioritized targets (Juslin and Västfjäll 2008). The reception of sound starts at the ear, which is a specialized organ in sensing local pressure fluctuations. Sound waves travel through the ear canal and set the ear drum in motion, which in turn sets the three bones in the middle ear vibrating. Their function is to amplify the mechanical oscillations and transmit them to the inner ear. These oscillations travel through the fluid in the cochlear canals and set the basilar membrane in motion. The hair cells in the cochlea generate action potentials depending on the basilar membrane motion. Hence, in this manner, acoustic signals are converted to neural signals that travel from the auditory nerve to the central nervous system. On this auditory pathway, substantial information processing takes place in the brain stem and several midbrain stations. The information generated in these structures is sent to the thalamus, which is a relay station that collects signals from the periphery and passes it to the sensory cortices. The primary auditory cortex (A1) is located in the superior part of the temporal lobe of the brain, and the adjoining areas are referred to as the auditory belt areas (Woods et al. 2009). Neurons in the A1 have higher sensitivity to acoustic stimulus features compared to the belt areas, whereas the belt areas show a greater attentional modulation than A1 neurons do (Woods et al. 2010). Neurons in the auditory pathway have preferred frequency regions that they respond to; and in most of the auditory areas there is tonotopic organization: an orderly correspondence between the location of the neurons and their specific frequency tuning (for detailed information on the auditory system, see Moore 2012; Rees and Palmer 2010).

Sound Perception and Localization

Perception of a sound can be characterized by a number of subjective (perceptual) dimensions. The main perceptual dimensions of acoustic stimuli are loudness, pitch, and timbre (Fastl and Zwicker 2007). Loudness is one of the most important low-level features and is the perceptual correlate of sound intensity. Nevertheless, loudness is only moderately correlated with sound intensity. This is mainly because human hearing is not equally sensitive over the entire hearing spectrum; it is most sensitive in the mid-frequency range (from approximately 100 to 5,000 Hz), where speech signals contain the most energy. Pitch is another main perceptual dimension of auditory stimuli and is related to frequency content and spectrum. Here, too, pitch only roughly correlates with the frequency spectrum of acoustic signals. Pitch perception arises from tonality, periodicity, and harmonicity. Hence, both temporal and spectral aspects contribute to pitch perception (for more detailed accounts of pitch and loudness, see Fastl and Zwicker 2007; Moore 2012; Wang and Bendor 2010; Young 2010).

Two sounds can have the same loudness and pitch, yet sound completely different from one another. To exemplify, consider two different instruments playing exactly the same tone at the same loudness. Timbre is the perceptual quality that accounts for the difference between the two instruments. It is a multidimensional feature; that is, it arises from various aspects of acoustic signals (e.g., transients, the relative strength of harmonics). Auditory stimuli also provide spatial information. Localizing sound sources in space is a computationally challenging task, since the auditory system, unlike the visual system, seems to lack a topographical representation of space. Spatial cues have to be computed from the signals that reach the respective ears. Intensity and arrival-time differences between the two ears provide cues for sound localization (Blauert 1997). The interaural time difference (ITD) is the main cue for the perceived azimuth of low-frequency sounds (below approximately 1.5 kHz; Hartmann et al. 2013), while the interaural level difference (ILD) seems to be more useful for high-frequency signals (above 2 kHz). Apart from these binaural cues, humans also employ monaural cues to extract auditory spatial information. Here, the auditory system makes use of the spectral modulations of the incoming sound that are caused by the shape of the outer ear and the incoming angle of the sound waves (Blauert 1997). Although monaural cues are highly frequency dependent, they can be useful for localizing sounds in the median plane (e.g., front vs. back). The neural processing of auditory spatial information (ILDs and ITDs) seems to start already at the brainstem level (see Ahveninen et al. 2014; Yin and Kuwada 2010). While the role of the auditory cortex in spatial processing is not clear, recent research has led to a two-channel model (the hemifield code), in which two neuronal populations are broadly tuned to the left or the right side of auditory space (Stecker et al. 2005). According to the hemifield code, the joint activity of these two populations leads to the perception of azimuth.
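To give a rough sense of the magnitudes involved, the sketch below computes the ITD predicted by a simple spherical-head (Woodworth-style) approximation. It is a coarse geometric idealization, not a perceptual model, and the head radius used is an assumed, typical value rather than a measurement.

```python
import math

# Spherical-head (Woodworth-style) approximation of the interaural time
# difference (ITD) for a distant source: ITD ~ (a / c) * (theta + sin(theta)),
# with head radius a, speed of sound c, and azimuth theta in radians.
# a = 8.75 cm is an assumed, typical value.

def itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(f"azimuth {az:3d} deg -> ITD ~ {itd_seconds(az) * 1e6:6.1f} microseconds")
# At 90 degrees this yields roughly 650 microseconds, in line with the
# commonly cited maximum ITD for a human-sized head.
```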

Attention and Higher Order Influences

The auditory system can decompose the complex auditory input into separate streams based on physical and perceptual principles (very much like Gestalt principles) in a seemingly preattentive manner (Bregman 1999). These separate auditory streams, which are perceived as coherent entities, compete for attentional resources to guide perception and behavior (Fritz et al. 2007; Shinn-Cunningham 2008). Perception of separate auditory streams in complex environments depends on both the stimulus characteristics and the listener's attentional state and intentions. For instance, while listening to an orchestral piece one can focus on and listen to a particular instrumental section, or one can attend to the music as a whole. Previous research has found that frequency, phase, temporal envelope, and source location differences between successive and concurrent sounds can facilitate stream segregation (Moore and Gockel 2012).

Research on auditory attention has indicated that attentional modulation of the auditory cortex could facilitate the processing of behaviorally relevant sounds (Petkov et al. 2004). The auditory cortex shows both learning-induced (Ohl and Scheich 2005) and attention-driven plasticity (i.e., changes in neural responses due to factors like motivation, learning, and stimulus statistics; see Ahveninen et al. 2011). It can also acquire specific memory traces (Weinberger 2004) and adapt to the changing nature of auditory environments (Dahmen et al. 2010). Spatial sensitivity of the auditory cortex is enhanced by engaging in auditory (Lee and Middlebrooks 2011) or visual spatial tasks (Salminen et al. 2013). Furthermore, auditory brain stem responses can be modulated by working memory load (Sörqvist et al. 2012) and selective attention (Lehmann and Schönwiesner 2014). Taken together, these findings indicate that the processing of auditory stimuli is dynamic, adapts to changing environments, and is optimized to process behaviorally significant stimuli. The adaptive capacity of the auditory system suggests that the auditory cortex is not a mere acoustic analysis center. It has been argued that the auditory cortex can integrate higher-order, nonauditory input (e.g., motivation, attention, motor function) into its processing (Weinberger 2010). Apart from the cortex, studies on the inferior colliculus (IC, a hub for the construction of a higher-order auditory percept) in the auditory midbrain show that neural activity in the IC is sensitive to factors such as eye movements, learning-induced plasticity, motivation, emotion, and task engagement (Bajo et al. 2010; Gruters and Groh 2012; Malmierca 2005; Marsh et al. 2002). Furthermore, connectional analyses (mainly of the cat and the primate brain) indicate that the auditory network shows a unique architecture with its corticocortical, thalamocortical, and corticocollicular connections (Read et al. 2002; Winer and Lee 2007). Taken together, the behavioral and functional evidence presented in this section suggests that the auditory network is specialized in processing acoustic stimulus features as its main input, and that it also makes use of information about the behavioral state of the organism during auditory processing.

Emotional Responses to Auditory Stimuli

How does sound induce emotions? In this section, we discuss the affective experience induced by various auditory stimuli such as environmental sounds, vocalizations, and music. The main aim is to present the close relationship between auditory and affective processes. In her work on auditory-induced emotions during everyday listening, Tajadura-Jiménez (2008; Tajadura-Jiménez and Västfjäll 2008) suggested four general contributing factors to the affective experience induced by auditory stimuli: physical, spatial, cross-modal, and psychological. The physical factors concern the acoustical features of sounds (such as loudness, pitch, duration, and transients) that cause affective reactions in people. In basic psychoacoustic research, the effects of physical features on sound perception are generally studied using tone and noise complexes that do not possess semantic content or a particular sound source (Fastl and Zwicker 2007). The perceived loudness and sharpness (i.e., high/low frequency balance) of such tone and noise complexes can be related to the affective reactions they induce (Västfjäll 2012). In music, for instance, sounds that feature dissonant, loud, sudden, or fast temporal components can induce physiological arousal and negative affect in listeners (Juslin and Västfjäll 2008). Auditory stimuli also provide spatial information regarding both the spaces we occupy and the objects in our surroundings (i.e., their location and motion with respect to our bodies), and this spatial information can also possess affective quality (Asutay and Västfjäll 2015a). Previous research has found behavioral, neural, and emotional biases in favor of approaching sound sources compared to receding sound sources (Hsee et al. 2014; Maier and Ghazanfar 2007; Seifritz et al. 2002; Tajadura-Jiménez, Väljamäe, et al. 2010). In particular, approaching sounds are found to be more emotional and behaviorally salient than receding sounds. The impact of sound source distance and location, together with room size, on affective responses has also been studied in the context of everyday listening (Tajadura-Jiménez, Larsson, et al. 2010). Cross-modal factors in auditory-induced emotion concern the role of information that we gather from other modalities; this happens when affective information we receive from one modality influences processing in another (Gerdes et al. 2014). Finally, the psychological factors that influence emotional reactions to auditory stimuli are related to the specific meaning and interpretation of a sound, and to the associations evoked by a sound and/or its source. In everyday listening, these factors are related to sound source identification and semantic content (Tajadura-Jiménez 2008).

Mental Representations Induced by Sound

One of the psychological mechanisms through which sounds induce emotions in people is by evoking certain mental representations. As introduced at the beginning of the chapter, we approach imagination as mental representations evoked by sounds. The nature of these mental representations can be very different depending on the situational context, the listener's behavioral state, and the sound itself. In everyday listening, where we hear events and objects rather than sounds per se, mental representations revolve around the sound source and the psychological associations it evokes. Imagine hearing birdsong while sitting on a park bench. In everyday listening mode, you would almost automatically identify it as a bird, and you might do that without even focusing on the sound itself. The mental representation of a sound source (e.g., a bird) and the associations evoked by that source (e.g., a serene forest, or a happy vacation in the past) would influence the affective reactions induced by the sound. To investigate these determinants of auditory-induced emotion, researchers usually follow an ecological approach (Neuhoff 2004). For instance, Bradley and Lang (2000) used visual scales and psychophysiological measures to assess affective reactions to many different naturally occurring sounds. In our lab, we have studied the affective quality of the meaning associated with environmental sounds (mainly related to the sound source; Asutay et al. 2012). We used a Fourier-time-transform algorithm that performs spectral broadening to reduce the identifiability of sounds while preserving temporal and spectral variation. The results indicated that emotional reactions to environmental sounds were mostly defined by the meaning attributed to the sound source by the listener. In other words, when participants could not identify the source of a particular sound, the strong affective reactions induced by the same sound were mostly eliminated. Mental representations evoked by music can be qualitatively very different from those evoked by environmental sounds. For instance, it has been suggested that music can trigger visual imagery, that is, that the listener conjures up visual images (Juslin and Västfjäll 2008). Visual imagery is defined as a quasiperceptual experience that resembles an actual perceptual experience but occurs in the absence of visual stimuli. The exact nature of how music evokes mental images remains to be determined. It seems that listeners conceptualize musical structure using a metaphorical nonverbal mapping between the music and image schemata that are grounded in bodily experience (Lakoff and Johnson 1980). Visual imagery evoked by musical stimuli can be a part of the affective experience induced by music (Juslin and Västfjäll 2008). Moreover, mental images evoked by music can also occur in connection with memory, where certain musical stimuli trigger a specific memory of a particular event; this process also influences affective reactions to music. Another, somewhat related line of research comes from auditory imagery studies, in which researchers have studied the nature of auditory imagery in the absence of auditory stimulation (for detailed accounts, see Hubbard 2010; Zatorre and Halpern 2005). Although this research is far from definitive, it has been found that auditory imagery preserves many structural and temporal properties of sounds and that it involves many of the same brain areas as auditory perception.
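To make the spectral-broadening manipulation mentioned above more concrete, the sketch below smears the short-time magnitude spectrum of a signal across frequency while keeping the original phase, so that fine spectral detail (and with it, much of the source's identifiability) is reduced while coarse temporal and spectral variation is retained. This is a generic, hypothetical illustration and not the published algorithm of Asutay et al. (2012); the smearing bandwidth and window length are arbitrary assumptions.

```python
# Generic illustration (not the authors' published algorithm): blur spectral
# detail across frequency in a short-time Fourier representation while keeping
# the original phase, preserving coarse temporal and spectral variation.
import numpy as np
from scipy.signal import stft, istft
from scipy.ndimage import uniform_filter1d

def spectrally_broaden(x, fs, smear_hz=500.0, nperseg=1024):
    f, _, Z = stft(x, fs=fs, nperseg=nperseg)
    width = max(1, int(round(smear_hz / (f[1] - f[0]))))           # smearing width in frequency bins
    smeared_mag = uniform_filter1d(np.abs(Z), size=width, axis=0)  # blur magnitudes across frequency
    _, y = istft(smeared_mag * np.exp(1j * np.angle(Z)), fs=fs, nperseg=nperseg)
    return y
```

A listener presented with the output of such a manipulation should still hear roughly when and how loudly things happen, but should find it much harder to say what the sound source is, which is the kind of dissociation identifiability studies of this sort exploit.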

Learned Emotional Meaning of Sound

As mentioned earlier, affective reactions to environmental sounds are mostly due to the meaning associated with them by the listener. Responses to the learned emotional meaning of sound have mostly been studied using conditioning paradigms. Classical conditioning involves learning relationships between events that are initially not related. In its most basic form, the relationship between an unconditioned stimulus (US) and a conditioned stimulus (CS) is formed through successive pairing of the two. A US (e.g., a mild electric shock) readily evokes a response (e.g., fear) with autonomic (e.g., increased heart rate) and behavioral components (e.g., freezing, facial expressions) without any training. However, a CS (e.g., a tone) initially has little to no meaning for the organism. After consistent CS–US pairing is established, the CS, when presented alone, starts to evoke responses (at both behavioral and autonomic levels) similar to those caused by the US. This is called a conditioned response (for detailed accounts of the parameters that influence the effectiveness of conditioning, see De Houwer et al. 2001; Delgado et al. 2006; Domjan 2005; Olsson and Phelps 2004; Rescorla 1998). Hence, a CS, which becomes a warning signal for a US through conditioning, can evoke evolutionarily shaped defensive and appetitive responses. In other words, a CS gains affective salience through conditioning. Neural structures and mechanisms associated with the processing of emotional auditory information have been extensively studied in animals using conditioning paradigms (for detailed accounts, see Armony and LeDoux 2010; Weinberger 2010). Ample research points to the importance of the amygdala, an almond-shaped structure located in the temporal region of the brain adjacent to the hippocampus, with around a dozen interconnected nuclei, as a critical structure for auditory conditioning (Phelps and LeDoux 2005). The amygdala is one of the most studied structures in relation to emotional processing in all sensory modalities. Recent theories posit that the amygdala functions as a relevance detector for biologically and emotionally salient targets in our surroundings (Sander et al. 2003). Neurons in the amygdala change their response patterns to the CS after conditioning. This change, or plasticity, usually occurs in the form of increased firing rates to the CS. The CS-specific plasticity observed in the amygdala can occur after just a few CS–US pairings, and it persists throughout conditioning and even during extinction (i.e., presentation of the CS alone after conditioning). Furthermore, after conditioning, the frequency selectivity of cells in the auditory system can change to enhance their responses to the CS frequency at the expense of other frequencies. This receptive field plasticity has been observed both in the primary auditory cortex (A1) and in other structures in the auditory pathway (Xiao and Suga 2005). Receptive field plasticity due to associative learning in A1 is highly specific to the CS frequency, develops rapidly in as few as five trials, shows long-term retention (it can endure up to eight weeks after a thirty-trial conditioning session), and continues to develop (Weinberger 2010). It has also been shown in both positive and negative affective contexts, and in several species including humans. Other studies have found increased perceptual sensitivity in the auditory system after associative affective learning (see Armony and LeDoux 2010). Taken together, these findings indicate that if an auditory signal gains affective or motivational value through learning, its representation in the auditory system will develop specific plasticity. This is perfectly in line with the high adaptive capacity of the auditory system presented in the previous section.

Vocal Affect

Humans and most animals use vocalizations to communicate with their conspecifics. Vocal acoustic signals are valuable for communication between individuals regarding important events that may arise in their environment, for example, the presence of a predator or a food supply. Vocalizations, together with facial expressions, are also important for inferring the emotional state of the speaker. The ability of an individual to successfully interpret the emotional state of the speaker can be crucial for survival in certain situations, and it is critical for social interactions. Unlike other animals, humans also have language to rely on in their communications. Speech signals can carry emotional information not only through semantic content but also through intonation, that is, prosody. Even though there is some conflicting evidence, recent brain imaging studies have found increased amygdala activity for emotional in contrast to neutral vocalizations (Fecteau et al. 2007; Sander and Scheich 2005; Sander et al. 2003; Wiethoff et al. 2009). Other brain areas involved in emotional processing of vocal information are the temporal (superior temporal sulcus [STS] and superior temporal gyrus [STG]) and frontal regions (orbitofrontal cortex [OFC] and inferior frontal gyrus [IFG]). The STS has been shown to respond to the human voice regardless of linguistic content (Belin et al. 2000). The auditory areas along the middle and superior temporal cortex (e.g., STS and STG) are sensitive to the emotional content of vocal signals, and their activation does not seem to depend on attentional focus or task demands (Brück et al. 2013; Grandjean and Frühholz 2013). On the other hand, frontal regions (e.g., OFC and IFG) seem to be involved in emotional processing of vocal signals in a context-dependent (i.e., attention- and task-dependent) fashion. Hence, models of emotional processing of vocalizations and prosody suggest that affective processing of vocal signals takes place in regions within the STS and STG (some models have proposed that facial expressions are also integrated into this processing, e.g., Brück et al. 2013). The outcomes of this processing are made accessible to higher-order cognitive processes that take place in frontal regions (Brück et al. 2013; Grandjean and Frühholz 2013; Kotz et al. 2013; Schirmer and Kotz 2006). Most research concerning emotional vocalizations approaches the subject from the perspective of successful decoding of the affective state of the speaker. This is mainly due to the understanding that the main function of vocalizations is to inform the receiver about the speaker's emotions. However, other researchers argue that the primary function of vocalizations is to induce emotions in the receiver (Bachorowski and Owren 2008; Owren and Rendall 2001; Russell et al. 2003). According to this framework (known as the affect-induction account of vocalizations; see Bachorowski and Owren 2008), the primary function of vocal signaling is not to inform the receiver about the speaker's affect, even though vocalizations usually arise from the speaker's emotions. Listeners can clearly make inferences regarding the affective state of speakers, but this is a secondary outcome. The primary outcome is that affective vocal signals induce emotional reactions in listeners in order to modulate their behavior, depending on the context in which the vocalizations occur and the listener's prior experience with such signals. Hence, vocal signals are not merely displays of the speaker's emotions; they are tools of social influence. The affect-induction account began with research on the functions of primate calling (Owren and Rendall 2001) and was later applied to specific human emotional vocalizations such as laughter (Owren et al. 2013). In connection with this account, it has been argued that infant crying has the function of increasing caregiver arousal (Zeskind 2013). Furthermore, research conducted on tamarin monkeys suggests that species may use emotional features in their vocalizations to induce arousing or calming states in receivers (Snowdon and Teie 2013).


Music

A chapter concerning emotional reactions to sound would be incomplete without music, given its high emotional significance for humans. Music is an indispensable part of humanity, and musical instruments are among the oldest cultural artifacts that have been discovered: the bone flutes found in southern Germany date back about 35,000 years (Conrad et al. 2009). Despite being an ancient part of human life, the evolutionary origins of music are still under debate, and this is, of course, a very difficult question to answer with certainty. Some researchers argue that music is a human invention with no direct adaptive biological function. For instance, Patel (2010) proposed that music relies on brain functions that developed for other purposes and that music is not an original function that shaped our species through natural selection; according to Patel, humans employed previously acquired abilities to invent music. On the other hand, there are adaptationist views postulating that music is in fact an evolutionary adaptation with survival value. Among those, Charles Darwin ([1879] 2004) proposed that music evokes strong emotions and could be an antecedent to our language capacity. Further, it has been suggested that music, as an important channel for the communication of emotions, could promote successful reproduction and improve social cohesion (for a detailed discussion, see Altenmüller et al. 2013; Patel 2010). These social functions of music (i.e., social cohesion, communication, and cooperation) might have been critical for the survival of human beings. Moreover, music can influence the autonomic nervous system and immune system activity (Koelsch 2011), and musical emotion processing can activate serotonergic (increased serotonin is associated with satisfactory feelings from expected outcomes) and dopaminergic (dopamine is associated with the reward system and reward-related feelings) neuromodulatory systems in the brain (Altenmüller et al. 2013; Koelsch 2013).

Musical Emotions

The emotion-inducing power of music is usually central in adaptationist views. Nevertheless, there is a conception that musical emotions are merely aesthetic experiences. Some researchers have claimed that music cannot induce everyday emotions such as sadness, happiness, and anger (e.g., Scherer 2003), and others argue that music cannot induce emotions at all (Konecni 2003). One of the main arguments here is that music cannot induce everyday emotions related to survival functions, as it does not seem to possess any capacity related to an individual's goals and well-being; hence, it can only induce subtler feelings and aesthetic experiences that are not considered "real emotions." Here, we reject this conception and claim that music can in fact induce both basic and complex emotions in listeners through various psychological mechanisms, some of which are not specific to musical stimuli but are shared with other emotion-inducing stimuli. Although there are a number of emotion theories whose proponents do not agree on a precise definition of what an emotion is, they largely agree on several components of an emotional episode (for detailed accounts of several emotion theories, see Barrett 2006; Moors 2010; Russell 2009; Scherer 2009). Emotions are generally brief affective reactions to salient events, and they involve several components such as physiological arousal (i.e., autonomic activity such as changes in heart rate), motor expression (e.g., smiling), subjective feeling (e.g., feeling happy upon hearing a loved song), action tendency (e.g., dancing), and regulation. Previous research has shown that music can evoke changes in all of the components that an emotional episode would have (Juslin and Västfjäll 2008; Koelsch et al. 2010). Furthermore, music can induce activity in core neural structures of emotion processing (Koelsch 2013), which is another indicator that music can in fact induce emotions.

Psychological Mechanisms of Emotion Induction by Music

Music can induce emotions in many different ways. In their model, Juslin and Västfjäll (2008; Juslin 2013) proposed a number of psychological mechanisms through which music can induce emotions in the listener: (1) brain-stem reflex, (2) rhythmic entrainment, (3) evaluative conditioning, (4) emotional contagion, (5) visual imagery, (6) episodic memory, (7) musical expectancy, and (8) aesthetic judgment. Below, we very briefly explain these mechanisms (for detailed accounts, see Juslin and Västfjäll 2008; Juslin et al. 2010; Juslin 2013). Brain-stem reflex refers to the process of emotion induction due to fundamental acoustical characteristics of the music. For instance, music that features sudden, loud, or fast components can induce changes in arousal and negative affective reactions. Brain-stem reflexes are automatic and related to the early stages of auditory processing. Rhythmic entrainment refers to a process whereby the external rhythm of the music influences internal bodily rhythms, such as heart rate and respiration; the adjusted bodily rhythms can then influence emotions via proprioceptive feedback. Evaluative conditioning refers to a process of emotion induction by a piece of music that has been paired repeatedly with positive or negative experiences. Thus, this mechanism involves learning an association between the music and the affective experience. Emotional contagion refers to a process of emotion induction in which the listener perceives the emotional expression of the music, and this then mimetically leads to the induction of the same emotion. For instance, a piece of music that evokes happiness through contagion could be fast and moderately loud, with an intonation contour that makes great leaps; sad music, on the other hand, could involve slower and softer sections. Some researchers have explained this as music mimicking emotional expressions (Johnson-Laird and Oatley 2008). Visual imagery refers to a process of emotion induction in which music evokes visual images with affective qualities (e.g., a serene landscape). Mental images can trigger affective reactions internally, and music is argued to be effective in stimulating mental images. Episodic memory refers to a process of emotion induction in which a piece of music evokes a memory of a particular event from the listener's past; when a memory is evoked, the emotions associated with it are evoked as well. Musical expectancy refers to a process of emotion induction in which a specific musical feature confirms, violates, or delays listeners' expectations, which in turn may lead to feelings of surprise, tension, or suspense (Meyer 1956). Musical expectations are related to the anticipation of future sounds, which involves memory and statistical learning of musical structures. In addition, expectation and anticipation are linked to reward processing and the dopaminergic system in the brain (Huron and Margulis 2010). Finally, aesthetic judgment refers to emotional reactions induced through a subjective evaluation of the aesthetic value of music. Taken together, one may argue that emotional reactions to music can occur through several psychological mechanisms, some of which are not specific to music but are common to other emotion-inducing stimuli. This also suggests that musical emotions are in fact emotions and that they share commonalities with emotions induced by other stimuli.

Neural Correlates of Musical Emotions

In this section, we briefly review a number of findings from brain imaging studies regarding affective processing of music (for more detailed accounts, see Koelsch 2013; Koelsch et al. 2010). A number of functional imaging studies concerning the processing of musical emotions indicate the involvement of the amygdala (e.g., Ball et al. 2007; Blood and Zatorre 2001; Eldar et al. 2007; Fritz and Koelsch 2005; Koelsch et al. 2006). According to a number of studies, the amygdala is involved in the processing of both pleasant and unpleasant musical stimuli. Several studies have found activations in the ventral striatum (associated with reward and the experience of pleasure) while listening to pleasant music (e.g., Blood and Zatorre 2001; Koelsch et al. 2006; Menon and Levitin 2005). The ventral striatum is involved in selecting and rewarding behavior in response to incentive stimuli. The nucleus accumbens (NAc), which is part of the ventral striatum, reflects dopaminergic activity and is part of the reward network (sensitive to rewarding stimuli such as food, sex, and money; Sescousse et al. 2013). Another study found that intense pleasure in response to music can lead to increased dopamine release in the striatum (Salimpoor et al. 2011). Moreover, a number of brain imaging studies on emotional processing of music have found involvement of structures associated with memory processes in the brain (i.e., the hippocampus, the parahippocampal gyrus, and the temporal poles; Fritz and Koelsch 2005; Koelsch et al. 2006). Koelsch and colleagues (2010) suggested that the involvement of these memory-related structures can also be linked to emotional processing. The activity of neural structures involved in autonomic (i.e., the anterior cingulate cortex; ACC) and endocrine system activity (i.e., the insular cortex) can also be influenced by musical stimuli (see Koelsch 2011, 2014). As discussed earlier, music can induce changes in physiological arousal that are mainly related to autonomic (e.g., heart rate, perspiration, pupil dilation, respiration) and endocrine system activity (Hodges 2010; Koelsch 2011). Nevertheless, ACC and insular cortex activity is not necessarily related to emotion processing. Taken together, the evidence reviewed in this section indicates that music can activate brain structures within networks related to emotional, reward, and memory processes, as well as structures related to autonomic and endocrine system activity. Therefore, it is not difficult to understand why music is such a special construct for human societies.

Emotional Influences on Sound Perception and Auditory Attention

A growing body of empirical evidence suggests that the affective salience of external stimuli provides invaluable cues for the allocation of attentional resources and enhances perception, possibly via fast neural routes to sensory processing areas in the brain. One of the main arguments is that emotional stimuli form a special group of highly salient stimuli that are prioritized in sensory processing, often at the expense of emotionally neutral stimuli. In other words, people readily pay more attention to emotional signals than to neutral signals. Most of the studies concerning the impact of emotional processes on attention and perception come from the visual modality (e.g., Vuilleumier 2005; Vuilleumier and Driver 2007; Yiend 2010). Although comparable evidence in the auditory modality is scarce, it seems to be accumulating. Here, we review evidence from human behavioral and brain imaging studies on how affective sounds can modulate perceptual and attentional processes. In a change detection experiment, we found that the affective significance of individual sounds in a complex auditory scene guides auditory attention (Asutay and Västfjäll 2014). Participants listened to two complex auditory scenes (each consisting of six simultaneous environmental sounds) and indicated whether the two scenes were identical or there was a change. Changes took the form of sound replacement (i.e., one sound was replaced by another). Detection accuracy was higher when the changed stimuli were emotionally negative and arousing compared to neutral. In addition, there was an overall increase in perceptual sensitivity for trials in which the unchanged events were negative. These findings suggest that the emotional salience of sounds guides attentional resources in a complex environment and that the presence of an emotionally negative and arousing environment can lead to an overall decrease in auditory attentional thresholds. Furthermore, using an aversive conditioning paradigm, we found that affective learning not only modulates the affective significance of the CS but can also alter loudness perception (Asutay and Västfjäll 2012). In this experiment, participants went through a conditioning session in which a CS (CS+; bandpass noise) was consistently paired with a US (a vibratory shock delivered to the chair participants sat on). They were also exposed to a control stimulus (CS−) that was not associated with the US. The sounds were bandpass noise at different frequencies, and CS+ and CS− assignments were counterbalanced among participants. After conditioning, the CS+ was reported as being more fearful and negative and was perceived as louder compared to the CS−. Another recent study also found that negative emotion can influence loudness perception (Siegel and Stefanucci 2011). The authors used a mood-induction technique to induce negative affect in half of the participants and neutral affect in the rest. Participants then listened to auditory stimuli and performed loudness judgments. People in the negative affect group perceived the auditory stimuli as being louder than those in the neutral affect group did. In our laboratory, we have also investigated the effect of the emotional salience of sounds on auditory spatial attention (Asutay and Västfjäll 2015a, 2015b). Using a covert spatial orienting paradigm, we found that negative sounds provide exogenous cues that orient auditory spatial attention to the region of space where they originate (Asutay and Västfjäll 2015b). The auditory stimuli in the experiment were environmental sounds with inherent meaning. Neural models explaining the influence of emotion on other processes place the amygdala in a central position (e.g., LeDoux 2012; Phelps 2006; Pourtois et al. 2013). The amygdala seems to receive information regarding the affective salience of external stimuli early in processing and, through its fast neural routes to sensory cortical regions, it can modulate perceptual and attentional processing; that is, it can induce transient changes in attentional thresholds in the presence of emotional stimuli (Phelps 2006; Phelps and LeDoux 2005). Emotional information can also modulate neural activity in regions associated with attentional control, which can in turn modulate the impact of selective attention on sensory processing (Domínguez-Borràs and Vuilleumier 2013). Apart from this, the amygdala has direct projections to neuromodulatory systems (e.g., cholinergic, adrenergic, dopaminergic) that are capable of modulating perceptual and attentional processes. Cholinergic nuclei located in the basal forebrain receive input from the amygdala, and they can release acetylcholine to widespread cortical areas. Activation of the cholinergic system can facilitate neural excitability in sensory areas and is argued to be central in learning-induced changes in the auditory cortex (Weinberger 2010). The central amygdala also projects to the locus coeruleus (LC) in the brain stem, which is part of the noradrenergic system. The LC sends noradrenaline inputs to widespread cortical areas to regulate arousal and autonomic functions. Activation of the noradrenergic system can facilitate sensory processing, enhance cognitive flexibility, and promote vigilant attentional shifting in the presence of significant sensory stimuli (Corbetta et al. 2008; Sara and Bouret 2012). In general, the presence of emotionally significant stimuli can activate neuromodulatory systems that, in turn, can regulate activity in the brain regions involved in active information processing. Although most evidence on the effects of these neuromodulatory systems relies on animal models, a few human studies exist (Hermans et al. 2011; Thiel et al. 2002; Weis et al. 2012). In conclusion, it seems that the processing of emotionally significant stimuli is enhanced via several gain control mechanisms (direct influence on sensory processing and attentional thresholds, and indirect influence of modulatory systems) that are mediated by a large brain network centered around the amygdala.


Concluding Remarks

In this chapter, we have focused on the relationship between sound and emotion: how acoustic stimuli induce affective reactions in listeners, and how the affective significance of sounds influences the way we perceive and attend to them. We reviewed human behavioral and neuroimaging studies concerning learning-induced emotional reactions, vocal emotional signals, and music. Our main aims were to illustrate the close relationship between affective and auditory processes and to argue that affective experience is an integral part of auditory perception. First, viewing the auditory system as an adaptive network specialized in processing acoustic stimuli indicates that the affective and motivational significance of auditory stimuli influences both the way they are processed and our reactions to them. This also makes intuitive sense when we consider the function of the auditory system: it scans our surroundings, detects potentially relevant targets, and signals for attention shifts to salient objects when necessary. In that respect, it functions as an adaptive warning system. Hence, the emotional, motivational, and attentional states of the organism are taken into account while complex auditory input is processed and analyzed. Next, conditioning studies have shown that as the emotional significance of an auditory stimulus changes through learning, the representation of that particular sound in the auditory system develops specific neural plasticity. Thus, the emotional significance of sounds can lead to biases in auditory processing and adapt the system to be more attentive and tuned to significant events. This conclusion is also very much in line with the adaptive capacity of the auditory system. In addition, empirical evidence and neural models show how the emotional significance of auditory stimuli can effectively rewire neural structure so that affective stimuli receive priority during sensory processing. Taken together, the findings reviewed here point to the close relationship between affective and auditory processes. Furthermore, music can influence both the autonomic nervous system and immune system activity and can activate serotonergic and dopaminergic modulatory systems. Music can also induce and regulate emotions through various psychological mechanisms, most of which are common to other emotion-inducing stimuli. Empirical evidence suggests that musical stimuli can consistently activate the main neural structures of affective processing: emotional signals in music activate brain structures within the networks related to emotional, reward, and memory processes, as well as structures related to autonomic and endocrine system activity. In addition, the mental representations evoked by auditory stimuli might also influence the emotional reactions elicited during sound perception. We argue that these mental representations depend on the situational context, the listener, and the stimulus itself. Evoked mental representations related to the sound source and its meaning induce emotional reactions while listening to environmental sounds. On the other hand, visual imagery evoked by the acoustic features of sound might have a completely different nature. For instance, musical stimuli can induce visual imagery and episodic memory, both of which have an impact on emotional experience while listening to music; the former seems to be influenced by the structure of the music, while the latter is the retrieval of a memory with emotional significance. In conclusion, we argue that auditory perception is central to most interactions we have with our surroundings. Sounds such as vocal signals and music have great potential to communicate biologically significant emotional information, which modulates both sensory processing in the brain and behavioral outcomes. Finally, considering the high adaptive capacity of the auditory system, we claim that emotional experience is integral to sound perception.

References

Ahveninen, J., M. Hämäläinen, I. P. Jääskeläinen, S. P. Ahlfors, S. Huang, F. H. Lin, et al. 2011. Attention-Driven Auditory Cortex Short-Term Plasticity Helps Segregate Relevant Sounds from Noise. Proceedings of the National Academy of Sciences 108: 4182–4187.
Ahveninen, J., N. Kopco, and I. P. Jääskeläinen. 2014. Psychophysics and Neuronal Basis of Sound Localization in Humans. Hearing Research 307: 86–97.
Altenmüller, E., R. Kopiez, and O. Grewe. 2013. A Contribution to the Evolutionary Basis of Music: Lessons from the Chill Response. In Evolution of Emotional Communication, edited by E. Altenmüller, S. Schmidt, and E. Zimmermann, 313–335. Oxford: Oxford University Press.
Altenmüller, E., S. Schmidt, and E. Zimmermann. 2013. Evolution of Emotional Communication. Oxford: Oxford University Press.
Armony, J. L., and J. LeDoux. 2010. Emotional Responses to Auditory Stimuli. In The Oxford Handbook of Auditory Science: The Auditory Brain, Vol. 2, edited by A. Rees and A. R. Palmer, 479–505. New York: Oxford University Press.
Arnott, S. R., and C. Alain. 2011. The Auditory Dorsal Pathway: Orienting Vision. Neuroscience and Biobehavioral Reviews 35: 2162–2173.
Asutay, E. 2014. Emotional Influences on Auditory Perception and Attention. Doctoral dissertation. Chalmers University of Technology, Sweden.
Asutay, E., and D. Västfjäll. 2012. Perception of Loudness Is Influenced by Emotion. PLoS One 7: e38660.
Asutay, E., and D. Västfjäll. 2014. Emotional Bias in Change-Deafness in Multisource Auditory Environments. Journal of Experimental Psychology: General 143: 27–32.
Asutay, E., and D. Västfjäll. 2015a. Attentional and Emotional Prioritization of Sounds Occurring Outside the Visual Field. Emotion 15: 281–286.
Asutay, E., and D. Västfjäll. 2015b. Negative Emotion Provides Cues for Orienting Auditory Spatial Attention. Frontiers in Psychology 6: 618.
Asutay, E., D. Västfjäll, A. Tajadura-Jiménez, A. Genell, P. Bergman, and M. Kleiner. 2012. Emoacoustics: A Study of the Psychoacoustical and Psychological Dimensions of Emotional Sound Design. Journal of the Audio Engineering Society 60: 21–28.
Bachorowski, J. A., and M. J. Owren. 2008. Vocal Expressions of Emotion. In Handbook of Emotions, 3rd ed., edited by M. Lewis, J. M. Haviland-Jones, and L. F. Barrett, 211–234. New York: Guilford Press.
Bajo, V. M., F. R. Nodal, D. R. Moore, and A. J. King. 2010. The Descending Corticocollicular Pathway Mediates Learning-Induced Auditory Plasticity. Nature Neuroscience 13: 253–260.

Ball, T., B. Rahm, S. Eickhoff, A. Schulze-Bonhage, O. Speck, and I. Mutschler. 2007. Response Properties of Human Amygdala Subregions: Evidence Based on Functional MRI Combined with Probabilistic Anatomical Maps. PLoS One 3: 307.
Barrett, L. F. 2006. Solving the Emotion Paradox: Categorization and the Experience of Emotion. Personality and Social Psychology Review 10: 20–46.
Belin, P., R. J. Zatorre, P. Lafaille, P. Ahad, and B. Pike. 2000. Voice-Selective Areas in Human Auditory Cortex. Nature 403: 309–312.
Blauert, J. 1997. Spatial Hearing. Rev. ed. Cambridge, MA: MIT Press.
Blood, A. J., and R. Zatorre. 2001. Intensely Pleasurable Responses to Music Correlate with Activity in Brain Regions Implicated in Reward and Emotion. Proceedings of the National Academy of Sciences 98: 11818–11823.
Bradley, M. M., and P. J. Lang. 2000. Affective Reactions to Acoustic Stimuli. Psychophysiology 37: 204–215.
Bregman, A. 1999. Auditory Scene Analysis: The Perceptual Organization of Sound. 2nd ed. London: MIT Press.
Brück, C., B. Kreifelts, T. Ethofer, and D. Wildgruber. 2013. Emotional Voices. In The Cambridge Handbook of Human Affective Neuroscience, edited by J. Armony and P. Vuilleumier, 265–285. New York: Cambridge University Press.
Clarke, E. F. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical Meaning. New York: Oxford University Press.
Conrad, N. J., M. Malina, and S. C. Münzel. 2009. New Flutes Document the Earliest Musical Tradition in Southwestern Germany. Nature 460: 737–740.
Corbetta, M., G. Patel, and G. L. Shulman. 2008. The Reorienting System of the Human Brain: From Environment to Theory of Mind. Neuron 58: 306–324.
Dahmen, J. C., P. Keating, F. R. Nodal, A. L. Schulz, and A. J. King. 2010. Adaptation to Stimulus Statistics in the Perception and Neural Representation of Auditory Space. Neuron 66: 937–948.
Darwin, C. (1879) 2004. The Descent of Man. London: Penguin.
De Houwer, J., S. Thomas, and F. Baeyens. 2001. Associative Learning of Likes and Dislikes: A Review of 25 Years of Research on Human Evaluative Conditioning. Psychological Bulletin 127: 853–869.
Delgado, M. R., A. Olsson, and E. A. Phelps. 2006. Extending Animal Models of Fear Conditioning to Humans. Biological Psychology 73: 39–48.
Domínguez-Borràs, J., and P. Vuilleumier. 2013. Affective Biases in Attention and Perception. In The Cambridge Handbook of Human Affective Neuroscience, edited by J. Armony and P. Vuilleumier, 331–356. New York: Cambridge University Press.
Domjan, M. 2005. Pavlovian Conditioning: A Functional Perspective. Annual Review of Psychology 56: 179–206.
Eldar, E., O. Ganor, R. Admon, A. Bleich, and T. Hendler. 2007. Feeling the World: Limbic Response to Music Depends on Related Content. Cerebral Cortex 17: 2828–2840.
Fastl, H., and E. Zwicker. 2007. Psychoacoustics: Facts and Models. Berlin: Springer.
Fecteau, S., P. Belin, Y. Joanette, and J. L. Armony. 2007. Amygdala Responses to Nonlinguistic Emotional Vocalizations. Neuroimage 36: 480–487.
Frijda, N. 2008. The Psychologists' Point of View. In Handbook of Emotions, 3rd ed., edited by M. Lewis, J. M. Haviland-Jones, and L. F. Barrett, 68–87. New York: Guilford Press.
Fritz, J. B., M. Elhilali, S. V. David, and S. A. Shamma. 2007. Auditory Attention: Focusing the Searchlight on Sound. Current Opinion in Neurobiology 17: 1–19.
Fritz, T., and S. Koelsch. 2005. Initial Response to Pleasant and Unpleasant Music: An fMRI Study (Poster). NeuroImage 26 (Suppl.): 271.

Gaver, W. W. 1993. What Do We Hear in the World? An Ecological Approach to Auditory Event Perception. Ecological Psychology 5: 1–29.
Gerdes, A. B. M., M. J. Wieser, and G. W. Alpers. 2014. Emotional Pictures and Sounds: A Review of Multimodal Interactions of Emotion Cues in Multiple Domains. Frontiers in Psychology 5: 1351.
Grandjean, D., and S. Frühholz. 2013. An Integrative Model of Brain Processes for the Decoding of Emotional Prosody. In Evolution of Emotional Communication, edited by E. Altenmüller, S. Schmidt, and E. Zimmermann, 211–228. Oxford: Oxford University Press.
Gruters, K. G., and J. M. Groh. 2012. Sounds and Beyond: Multisensory and Other Non-Auditory Signals in the Inferior Colliculus. Frontiers in Neural Circuits 6: 96.
Hartmann, W. M., L. Dunai, and T. Qu. 2013. Interaural Time Difference Thresholds as a Function of Frequency. In Basic Aspects of Hearing, edited by B. C. J. Moore, R. D. Patterson, I. M. Winter, R. P. Carlyon, and H. E. Gockel, 239–246. New York: Springer.
Hermans, E. J., H. J. F. van Marle, L. Ossewaarde, M. J. A. G. Henckens, S. Qin, M. T. R. van Kesteren, et al. 2011. Stress-Related Noradrenergic Activity Prompts Large-Scale Neural Network Reconfiguration. Science 334: 1151–1153.
Hodges, D. A. 2010. Psychophysiological Measures. In Handbook of Music and Emotion, edited by P. N. Juslin and J. A. Sloboda, 279–312. New York: Oxford University Press.
Hsee, C. K., Y. Tu, Z. Y. Lu, and B. Ruan. 2014. Approach Aversion: Negative Hedonic Reactions toward Approaching Stimuli. Journal of Personality and Social Psychology 106: 699–712.
Hubbard, T. L. 2010. Auditory Imagery: Empirical Findings. Psychological Bulletin 136: 302–329.
Huron, D., and E. H. Margulis. 2010. Musical Expectancy and Thrills. In Handbook of Music and Emotion, edited by P. N. Juslin and J. A. Sloboda, 575–604. New York: Oxford University Press.
Johnson-Laird, P. N., and K. Oatley. 2008. Emotions, Music, and Literature. In Handbook of Emotions, 3rd ed., edited by M. Lewis, J. M. Haviland-Jones, and L. F. Barrett, 102–113. New York: Guilford Press.
Juslin, P. 2013. From Everyday Emotions to Aesthetic Emotions: Towards a Unified Theory of Musical Emotions. Physics of Life Reviews 10: 235–266.
Juslin, P., and D. Västfjäll. 2008. Emotional Responses to Music: The Need to Consider Underlying Mechanisms. Behavioral and Brain Sciences 31: 559–621.
Juslin, P. N., S. Liljeström, D. Västfjäll, and L. O. Lundqvist. 2010. How Does Music Evoke Emotions? Exploring the Underlying Mechanisms. In Handbook of Music and Emotion, edited by P. N. Juslin and J. A. Sloboda, 605–642. New York: Oxford University Press.
Koelsch, S. 2011. Toward a Neural Basis of Music Perception: A Review and Updated Model. Frontiers in Psychology 2: 110.
Koelsch, S. 2013. Emotion and Music. In The Cambridge Handbook of Human Affective Neuroscience, edited by J. Armony and P. Vuilleumier, 286–303. New York: Cambridge University Press.
Koelsch, S. 2014. Brain Correlates of Music-Evoked Emotions. Nature Reviews Neuroscience 15: 170–180.
Koelsch, S., T. Fritz, D. Y. von Cramon, K. Müller, and A. D. Friederici. 2006. Investigating Emotion with Music: An fMRI Study. Human Brain Mapping 27: 239–250.
Koelsch, S., W. A. Siebel, and T. Fritz. 2010. Functional Neuroimaging. In Handbook of Music and Emotion, edited by P. N. Juslin and J. A. Sloboda, 313–344. New York: Oxford University Press.
Konecni, V. J. 2003. Review of Music and Emotion: Theory and Research. Music Perception 20: 332–341.

Kotz, S. A., A. S. Hasting, and S. Paulmann. 2013. On the Orbito-Striatal Interface in Acoustic Emotional Processing. In Evolution of Emotional Communication, edited by E. Altenmüller, S. Schmidt, and E. Zimmermann, 229–240. Oxford: Oxford University Press.
LaBelle, B. 2007. Background Noise: Perspectives on Sound Art. New York: Continuum.
Lakoff, G., and M. Johnson. 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Lang, P. J., and M. M. Bradley. 2010. Emotion and the Motivational Brain. Biological Psychology 84: 437–450.
LeDoux, J. 2012. Rethinking the Emotional Brain. Neuron 73: 653–676.
Lee, C. C., and J. C. Middlebrooks. 2011. Auditory Cortex Spatial Sensitivity Sharpens during Task Performance. Nature Neuroscience 14: 108–114.
Lehmann, A., and M. Schönwiesner. 2014. Selective Attention Modulates Human Auditory Brainstem Responses: Relative Contributions of Frequency and Spatial Cues. PLoS One 9: e85442.
Maier, J. X., and A. A. Ghazanfar. 2007. Looming Biases in Monkey Auditory Cortex. Journal of Neuroscience 27: 4093–4100.
Malmierca, M. S. 2005. The Inferior Colliculus: A Center for Convergence of Ascending and Descending Auditory Information. Neuroembryology and Ageing 3: 215–229.
Marsh, R. A., Z. M. Fuzessery, C. D. Grose, and J. J. Wenstrup. 2002. Projection to the Inferior Colliculus from the Basal Nucleus of the Amygdala. Journal of Neuroscience 22: 10449–10460.
Menon, V., and D. J. Levitin. 2005. The Rewards of Music Listening: Response and Physiological Connectivity of the Mesolimbic System. NeuroImage 28: 175–184.
Meyer, L. B. 1956. Emotion and Meaning in Music. Chicago: University of Chicago Press.
Moore, B. C. J. 2012. An Introduction to the Psychology of Hearing. 6th ed. London: Academic Press.
Moore, B. C. J., and H. E. Gockel. 2012. Properties of Auditory Stream Formation. Philosophical Transactions of the Royal Society B 367: 919–931.
Moors, A. 2010. Theories of Emotion Causation: A Review. In Cognition and Emotion: Review of Current Research and Theories, edited by J. de Houwer and D. Hermans, 1–37. New York: Psychology Press.
Neuhoff, J. G. 2004. Ecological Psychoacoustics. Boston, MA: Elsevier Academic Press.
Ohl, F. W., and H. Scheich. 2005. Learning-Induced Plasticity in Animal and Human Auditory Cortex. Current Opinion in Neurobiology 15: 470–477.
Olsson, A., and E. A. Phelps. 2004. Learned Fear of "Unseen" Faces after Pavlovian, Observational, and Instructed Fear. Psychological Science 15: 822–828.
Owren, M. J., and D. Rendall. 2001. Sound on the Rebound: Bringing Form and Function Back to the Forefront in Understanding Nonhuman Primate Vocal Signaling. Evolutionary Anthropology 10: 58–71.
Owren, M. J., M. Phillip, E. Vanman, N. Trivedi, A. Schulman, and J. Bachorowski. 2013. Understanding Spontaneous Human Laughter: The Role of Voicing in Inducing Positive Emotion. In Evolution of Emotional Communication, edited by E. Altenmüller, S. Schmidt, and E. Zimmermann, 175–190. Oxford: Oxford University Press.
Patel, A. 2010. Music, Biological Evolution, and the Brain. In Emerging Disciplines, edited by M. Bailar, 91–144. Houston, TX: Houston University Press.
Petkov, C. I., X. Kang, K. Alho, O. Bertrand, E. W. Yund, and D. L. Woods. 2004. Attentional Modulation of Human Auditory Cortex. Nature Neuroscience 7: 658–663.
Phelps, E. A. 2006. Emotion and Cognition: Insights from Studies of the Human Amygdala. Annual Review of Psychology 57: 27–53.

Phelps, E. A., and J. LeDoux. 2005. Contributions of the Amygdala to Emotion Processing: From Animal Models to Human Behavior. Neuron 48: 175–187.
Pourtois, G., A. Schettino, and P. Vuilleumier. 2013. Brain Mechanisms for Emotional Influences on Perception and Attention: What Is Magic and What Is Not. Biological Psychology 92: 492–512.
Read, H. L., J. A. Winer, and C. E. Schreiner. 2002. Functional Architecture of Auditory Cortex. Current Opinion in Neurobiology 12: 433–440.
Rees, A., and A. R. Palmer. 2010. The Oxford Handbook of Auditory Science: The Auditory Brain, Vol. 2. New York: Oxford University Press.
Rescorla, R. A. 1998. Pavlovian Conditioning: It's Not What You Think It Is. American Psychologist 43: 151–160.
Russell, J. A. 2009. Emotion, Core Affect, and Psychological Construction. Cognition and Emotion 23: 1259–1283.
Russell, J. A., J. A. Bachorowski, and J. M. Fernandez-Dols. 2003. Facial and Vocal Expressions of Emotion. Annual Review of Psychology 54: 329–349.
Salimpoor, V., M. Benovoy, K. Larcher, A. Dagher, and R. Zatorre. 2011. Anatomically Distinct Dopamine Release during Anticipation and Experience of Peak Emotion to Music. Nature Neuroscience 14: 257–262.
Salminen, N. H., J. Aho, and M. Sams. 2013. Visual Task Enhances Spatial Selectivity in the Human Auditory Cortex. Frontiers in Neuroscience 7: 44.
Sander, D., J. Grafman, and T. Zalla. 2003. The Human Amygdala: An Evolved System for Relevance Detection. Reviews in the Neurosciences 14: 303–316.
Sander, K., A. Brechmann, and H. Scheich. 2003. Audition of Laughing and Crying Leads to Right Amygdala Activation in a Low-Noise fMRI Setting. Brain Research Protocols 11: 81–91.
Sander, K., and H. Scheich. 2005. Left Auditory Cortex and Amygdala, but Right Insula Dominance for Human Laughing and Crying. Journal of Cognitive Neuroscience 17: 1519–1531.
Sara, S. J., and S. Bouret. 2012. Orienting and Reorienting: The Locus Coeruleus Mediates Cognition through Arousal. Neuron 76: 130–141.
Scherer, K. R. 2003. Why Music Does Not Produce Basic Emotions: A Plea for a New Approach to Measuring Emotional Effects of Music. In Proceedings of the Stockholm Music Acoustics Conference 2003, edited by R. Bresin, 25–28. Stockholm, Sweden: Royal Institute of Technology.
Scherer, K. R. 2009. Emotions Are Emergent Processes: They Require a Dynamic Computational Architecture. Philosophical Transactions of the Royal Society B 364: 3459–3474.
Schirmer, A., and S. A. Kotz. 2006. Beyond the Right Hemisphere: Brain Mechanisms Mediating Vocal Emotional Processing. Trends in Cognitive Sciences 10: 24–30.
Seifritz, E., J. G. Neuhoff, D. Bilecen, K. Scheffler, H. Mustovic, H. Schächinger, et al. 2002. Neural Processing of Auditory Looming in the Human Brain. Current Biology 12: 2147–2151.
Sescousse, G., X. Caldú, B. Segura, and J. C. Dreher. 2013. Processing Primary and Secondary Rewards: A Quantitative Meta-Analysis and Review of Human Functional Neuroimaging Studies. Neuroscience and Biobehavioral Reviews 37: 681–696.
Shinn-Cunningham, B. G. 2008. Object-Based Auditory and Visual Attention. Trends in Cognitive Sciences 12: 182–186.
Siegel, E. H., and J. K. Stefanucci. 2011. A Little Bit Louder Now: Negative Affect Increases Perceived Loudness. Emotion 11: 1006–1011.
Snowdon, C. T., and D. Teie. 2013. Emotional Communication in Monkeys: Music to Their Ears? In Evolution of Emotional Communication, edited by E. Altenmüller, S. Schmidt, and E. Zimmermann, 133–151. Oxford: Oxford University Press.

Sörqvist, P., S. Stenfelt, and J. Rönnberg. 2012. Working Memory Capacity and Visual-Verbal Cognitive Load Modulate Auditory-Sensory Gating in the Brainstem: Toward a Unified View of Attention. Journal of Cognitive Neuroscience 24: 2147–2154.
Stecker, G. C., I. A. Harrington, and J. C. Middlebrooks. 2005. Location Coding by Opponent Neural Populations in the Auditory Cortex. PLoS Biology 3: e78.
Tajadura-Jiménez, A. 2008. Embodied Psychoacoustics: Spatial and Multisensory Determinants of Auditory-Induced Emotion. Doctoral dissertation. Chalmers University of Technology, Sweden.
Tajadura-Jiménez, A., P. Larsson, A. Väljamäe, D. Västfjäll, and M. Kleiner. 2010a. When Room Size Matters: Acoustic Influences on Emotional Responses to Sounds. Emotion 10: 416–422.
Tajadura-Jiménez, A., A. Väljamäe, E. Asutay, and D. Västfjäll. 2010b. Embodied Auditory Perception: The Emotional Impact of Approaching and Receding Sounds. Emotion 10: 216–229.
Tajadura-Jiménez, A., and D. Västfjäll. 2008. Auditory-Induced Emotion: A Neglected Channel for Communication in Human-Computer Interaction. In Affect and Emotion in Human-Computer Interaction: From Theory to Applications, edited by C. Peter and R. Beale, 63–74. Berlin/Heidelberg: Springer-Verlag.
Thiel, C. M., K. J. Friston, and R. J. Dolan. 2002. Cholinergic Modulation of Experience-Dependent Plasticity in Human Auditory Cortex. Neuron 35: 567–574.
Västfjäll, D. 2012. Emotional Reactions to Sounds without Meaning. Psychology 3: 606–609.
Vuilleumier, P. 2005. How Brains Beware: Neural Mechanisms of Emotional Attention. Trends in Cognitive Sciences 9: 585–594.
Vuilleumier, P., and J. Driver. 2007. Modulation of Visual Processing by Attention and Emotion: Windows on Causal Interactions between Human Brain Regions. Philosophical Transactions of the Royal Society B 362: 837–855.
Wang, X., and D. Bendor. 2010. Pitch. In The Oxford Handbook of Auditory Science: The Auditory Brain, Vol. 2, edited by A. Rees and A. R. Palmer, 149–172. New York: Oxford University Press.
Weinberger, N. M. 2004. Specific Long-Term Memory Traces in Primary Auditory Cortex. Nature Reviews Neuroscience 5: 279–290.
Weinberger, N. M. 2010. The Cognitive Auditory Cortex. In The Oxford Handbook of Auditory Science: The Auditory Brain, Vol. 2, edited by A. Rees and A. R. Palmer, 441–478. New York: Oxford University Press.
Weis, T., S. Puschmann, A. Brechmann, and C. M. Thiel. 2012. Effects of L-dopa during Auditory Instrumental Learning in Humans. PLoS One 7: e52504.
Wiethoff, S., D. Wildgruber, W. Grodd, and T. Ethofer. 2009. Response and Habituation of the Amygdala during Processing of Emotional Prosody. Neuroreport 20: 1356–1360.
Winer, J. A., and C. C. Lee. 2007. The Distributed Auditory Cortex. Hearing Research 229: 3–13.
Woods, D. L., T. J. Herron, A. D. Cate, E. W. Yund, G. C. Stecker, T. Rinne, et al. 2010. Functional Properties of Human Auditory Cortical Fields. Frontiers in Systems Neuroscience 4: 155.
Woods, D. L., G. C. Stecker, T. Rinne, T. J. Herron, A. D. Cate, E. W. Yund, et al. 2009. Functional Maps of Human Auditory Cortex: Effects of Acoustic Features and Attention. PLoS One 4: e5183.
Xiao, Z., and N. Suga. 2005. Asymmetry in Corticofugal Modulation of Frequency-Tuning in Moustached Bat Auditory System. Proceedings of the National Academy of Sciences 102: 19162–19167.

390   ERKIN ASUTAY AND DANIEL VÄSTFJÄLL Yiend, J. 2010. The Effects of Emotion in Attention: A Review of Attentional Processing of Emotional Information. Cognition and Emotion 24: 3–47. Yin, T. C. T. and S. Kuwada. 2010. Binaural Localization Cues. In The Oxford Handbook of Auditory Science: The Auditory Brain, Vol. 2, edited by A. Rees and A. R. Palmer, 271–302. New York: Oxford University Press. Young, E.  D. 2010. Level and Spectrum. In The Oxford Handbook of Auditory Science: The Auditory Brain, Vol. 2, edited by A.  Rees and A.  R.  Palmer, 93–124. New York: Oxford University Press. Zatorre, R.  J., and A.  R.  Halpern. 2005. Mental Concerts: Musical Imagery and Auditory Cortex. Neuron 47: 9–12. Zeskind, P. S. 2013. Infant Crying and the Synchrony of Arousal. In Evolution of Emotional Communication, edited by E.  Altenmüller, S.  Schmidt, and E.  Zimmermann, 155–176. Oxford: Oxford University Press.

Chapter 19

Voluntary Auditory Imagery and Music Pedagogy

Andrea R. Halpern and Katie Overy

Introduction

Auditory imagery is a common everyday experience. People are able to imagine the sound of waves crashing on a beach, the voice of a famous movie actor, or the melody of a familiar song or TV theme tune. Although people vary in the extent to which they report these as vivid experiences, on average they rate vividness of imagined sounds at the upper end of rating scales such as the Bucknell Auditory Imagery Scale (Halpern 2015), averaging about 5 on a 7-point scale, where 7 means "as vivid as actually hearing the sound." Imagined music can also be involuntary (Beaman and Williams 2010; Hyman et al. 2013), or even hallucinatory (Griffiths 2000; Weinel, this volume, chapter 15), but the focus of our discussion is on the willful calling to mind of music. Our argument here is that, in general, auditory imagery is not just something people do when mind-wandering or passing the time; it can have definite positive consequences in mood regulation, self-entertainment, and mental rehearsal. More particularly, musicians, composers, and music educators understand that auditory imagery is a tool, and they regularly employ auditory imagery in both pedagogical and professional capacities. We suggest that this important skill could be used more widely than it is already; for example, to enable musicians to employ more efficient memorization skills and to rehearse both physical and expressive aspects of performance without risking excessive motor practice.

Using imagined music to accomplish something beneficial is reported among non-musicians as well as musicians, of course. People report voluntarily bringing music to mind to regulate their emotional state, and they judge the emotionality of familiar imagined music similarly to judging the emotionality of heard music (Lucas et al. 2010). Recorded music has been shown to assist athletic performance in a variety of situations, including keeping a steady pace during swimming (Karageorghis et al. 2013), and imagined music can have similar benefits. One compelling example was reported by the marathon swimmer Diana Nyad during her record-setting swim of the Straits of Cuba, covering 110 miles in 53 hours:

Diana Nyad uses singing to help pass the time and the monotony and sensory deprivation inevitable in marathon swimming. To help, she sings silently from a mental playlist of about 65 songs [including] Janis Joplin's chart-topping version of Me and Bobby McGee. "If I sing that 2,000 times in a row, the whole song, I will get through five hours and 15 minutes," Nyad said . . . . "It's kind of stupid," she added, "but it gets me through."1

For musicians, imagining a musical performance can be a useful rehearsal tool and even a powerful experience, as expressed by violinist Romel Joseph, who was trapped under the rubble of his music conservatory for eighteen hours after the Haiti earthquake of 2010. This quote captures both the emotional and performance aspects of imagining music in a most deliberate way:

He didn't panic—instead, he kept himself to a strict schedule. He spent part of each hour in prayer. The rest of the time he filled by rehearsing his favorite classical music performances in his head, note by note. "For example, if I perform the Franck sonata, which is [sic] 35 minutes long in my honors recital at Juilliard, then I would bring myself to that time. That allows me . . . to mentally take myself out of the space where I was."2

Psychology Research on Auditory Imagery

If auditory imagery had an arbitrary, or even illusory, link to perceiving and performing real music, then advocating for the increased use of imagery in musical rehearsal and pedagogy might not make a compelling argument. However, research over the years has suggested that both musicians and nonmusicians (i.e., those who haven't studied musical performance to a high level) can mentally represent a surprisingly wide range of auditory characteristics of actual musical sound (see Hubbard 2010 for a comprehensive review), in many cases using imagery very consciously and deliberately. For instance, most individuals, including nonmusicians, can call to mind the melody of a familiar song without any difficulty. For songs with no canonical recorded versions, people are remarkably consistent in reproducing or choosing a similar pitch to the one they produced or chose on a prior occasion for the same song (Halpern 1989). In addition, they are also fairly accurate in reproducing the opening pitches of well-known recordings of music from their mental playlist (Levitin and Cook 1996; Frieler et al. 2013) and can usually recognize the correct pitch within two semitones (Schellenberg and Trehub 2003).

Auditory imagery also represents some of the temporal characteristics of sounded music remarkably accurately. If asked to carry out memory tasks comparing pitches at two nonadjacent places in a familiar melody, reaction times increase proportionally to the distance between the notes in beats of the actual tune (Halpern 1988a), and if asked to mentally complete a phrase of a familiar tune after a sounded cue of the opening notes, reaction times similarly increase proportionally for longer phrases (Halpern and Zatorre 1999). A recent study of involuntary musical imagery asked people to tap the tempo of involuntary auditory images that occurred over a five-day period; tempos were recorded via an accelerometer. Results showed that 77 percent of 115 reports of episodes involving recorded music were within 15 percent of the original recorded tempo (Jakubowski et al. 2015).

Even a multidimensional construct such as timbre is processed similarly in hearing and imagery. Halpern and Zatorre (2004) asked people to make similarity judgments between pairs of sounded and imagined musical instruments while undergoing fMRI scanning. Similarity ratings in both conditions were highly correlated. Additionally, both types of judgments involved activation of the secondary auditory cortex (judgments on sounded but not imagined music additionally activated the primary auditory cortex). Thus, we have some basis to conclude that mentally simulating or rehearsing music might involve similar neural processing and thus confer some of the same benefits as actual hearing or even production—given that production includes not only motor but also auditory skills.

On the other hand, apart from the case of auditory hallucinations (Griffiths 2000; Weinel, this volume, chapter 15), most people do not actually confuse imagining with hearing, and thus we should not be surprised if there were behavioral and neural substrate differences between the two. One obvious difference between the two types of auditory experience is that auditory imagery tasks are on average more difficult than matched perceptual tasks. For example, Zatorre and Halpern (1993) presented patients who had undergone surgery removing part of their temporal lobes (mean age about thirty years old) and matched controls with the text of the first line of a familiar tune, such as Jingle Bells. Two lyrics were highlighted, as in "Dashing through the SNOW, in a one-horse open SLEIGH." The task was to judge whether the pitch of the second highlighted lyric was higher or lower than the first such lyric (the reader is invited to try that now). In one condition, participants heard a recording of someone singing the tune; in the other condition, they had to use mental imagery only. The right-temporal lobectomy patients had lower performance in both conditions compared to the other two groups, implicating the role of the right temporal lobe in pitch perception and imagery, whereas accuracy rates for all participants were about 12–15 percent lower in the imagined than heard condition. In a subsequent study with healthy young adults in which only the two to-be-compared lyrics were presented, similar performance drops from heard to imagined conditions were found (Zatorre and Halpern 1996).

Imagery tasks are likely to be more difficult because they involve considerable working memory resources; indeed, Baddeley and Andrade (2000) found that working memory (WM) performance scores correlated with self-reported vividness of both auditory and visual imagery. WM span is also positively correlated with measures of pitch and temporal imagery ability (Colley et al. 2017). Brain imaging studies also point to the involvement of executive function in auditory imagery. Herholz and colleagues (2012) asked people with a range of musical experience to listen to or imagine familiar songs, while simultaneously viewing the lyrics in a karaoke-type video presentation. Compared to a baseline, cerebral blood flow (measured via fMRI) in imagined and heard conditions activated perceptual areas such as the superior temporal gyrus (STG, the locus of the secondary auditory cortex), but imagining tunes also uniquely activated several areas associated with higher-order planning and other executive functions, such as the supplementary motor area (SMA), intraparietal cortex (IPS), inferior frontal cortex (IFC), and right dorsolateral prefrontal cortex (DLPFC) (see Figure 19.1). This additional neural activity is interpreted as reflecting extra cognitive effort and suggests that imagery tasks are more difficult. However, such tasks may also benefit music learning in both the short and long term precisely because of this increased level of cognitive engagement (known as a "desirable difficulty" in the cognitive literature), potentially leading to better encoding and later retention (Bjork et al. 2014).

As alluded to in our opening remarks, auditory imagery is not always intentional—it can come unbidden in the form of so-called earworms, or what is sometimes called involuntary musical imagery (INMI). Numerous researchers have now studied this phenomenon, documenting the incidence and phenomenology of the experience, the relationship to personality variables, and the characteristics of the triggers and the tunes themselves that come to mind (for example, Bailes 2006; Halpern and Bartlett 2011; Hyman et al. 2013; Müllensiefen et al. 2014; Williamson and Jilka 2013). However, in this chapter we focus on voluntary auditory imagery precisely because it is under the control of the individual and thus can be harnessed and modified as needed to accomplish musical goals.

Figure 19.1  Brain areas more active in listening than imagining a familiar tune (the major activity is labeled "STG"—orange on the companion website) and those more active in imagining than listening to a familiar tune (the major activity is labeled "IPS," "SMA," "DLPFC," and "IFC"—blue on the companion website). (Reprinted with permission from Herholz et al. 2012).


Individual Differences in Auditory Imagery Abilities

Auditory imagery is sometimes separated into two distinct types of processing—generation and transformation. Generation refers to calling a perceptual experience to mind, that is, initiating the auditory (or visual) image. Most individuals are able to generate musical auditory images, such as the initial note of a familiar tune (Halpern 1989), the sounds of two different instruments (Halpern and Zatorre 2004), a phrase of a melody (Herholz et al. 2012), or even an entire minute of a fully realized classical symphony (Lucas et al. 2010). Once an auditory image has been generated, or even during the process, transformations can be applied to the internal representations. For instance, if asked to imagine the familiar melody of "Happy Birthday" in a version where the third note moves down instead of up in pitch, most people report they can do so. Other transformations can be more difficult, such as mentally reversing a just-presented tune and answering a question about the reversal accurately (Zatorre et al. 2010). Both of these processes can be useful in professional musical life; for example, a performer engaging in mental rehearsal of a familiar or notated passage of music might rely primarily on generation of the auditory image, but might also use transformation to try out different expressive interpretations prior to executing them. Similarly, composers and arrangers who are working with and developing musical themes can use both types of process.

There are considerable variations in both self-reported and objectively measured abilities in these areas of auditory imagery, and researchers have become quite interested of late in creating scales and measures that index these differences. One example of a self-report measure is the Bucknell Auditory Imagery Scale (BAIS), mentioned earlier, which has two subscales that capture self-report of generation (Vividness) and transformation, or the ability to control characteristics of the image (Control) (Halpern 2015). An example of an item on the Vividness subscale (BAIS-V) is: "Consider attending a choir rehearsal. [Imagine] the sound of an all-children's choir singing the first verse of a song." An example of an item on the Control subscale (BAIS-C) is: "Consider being present at a jazz club. [First imagine] the sound of a saxophone solo. [Imagine that] the saxophone is now accompanied by a piano." Both subscales typically elicit a wide range of responses on a scale of 1 ("no image present") to 7 ("as vivid as the actual sound"). For instance, in the original development sample of seventy-six undergraduates with a variety of musical training backgrounds, seventy-four of them used at least four scale points on at least one of the scales. On average, respondents showed a standard deviation of 1.5 in their ratings over the fourteen items on both the BAIS-V and BAIS-C. Correlations between scores and years of musical training were typically modest on both scales, about .3. Objectively measured auditory imagery performance also varies considerably among people, as shown in tasks as diverse as mentally imagining a pitch contour (Gelding et al. 2015) and recognizing or producing a mentally reversed or transposed phrase (Greenspon et al. 2017).
More importantly for our current purpose, individual differences in self-report can predict objectively measured task-based imagery performance, and this predictive validity appears to vary according to whether the auditory imagery task is more loaded on generation or transformation. So, for instance, performance on the pitch contour task of Gelding and colleagues (2015), which requires the mental continuation of a pitch pattern, was predicted by scores on the BAIS-V. Greenspon and colleagues (2017), on the other hand, gave good and poor vocal pitch-matchers a suite of auditory imagery tasks that required mental transformations of melodies. For instance, given a short target melody (three or four notes), participants had to produce the notes in reverse order or to transpose the segment to a new key. BAIS-C scores (but not BAIS-V scores) predicted the extent to which performance was worse in the transformed versus exact reproduction condition.

These differences in auditory imagery self-report also predict behaviors that are more indirectly linked with auditory imagery skill. For example, self-report of auditory vividness correlates with the extent to which undergraduate students pitch-match accurately (Pfordresher and Halpern 2013). This points to the importance of the sensorimotor relationship of imagining an auditory target to successful execution of the very fine vocal movements involved in accurate singing. In the temporal domain, Colley and colleagues (2017) asked nonmusicians to tap along to expressive piano music (Chopin, Étude op. 10, no. 3). The challenge in this task was that, being expressive, the tempo of the music (i.e., the speed of the underlying pulse) changed frequently as the pianist accelerated and decelerated, using rubato in the phrases. The authors measured the overall asynchrony of the taps to the onsets of the beat (i.e., combining anticipating or lagging with respect to the beat), as well as the extent to which the participant learned to anticipate (or predict) the beat, over multiple trials. They also tested performance with a more temporally regular piano piece by Mozart. BAIS-V predicted good synchronization in the Mozart piece. On the more difficult task of tapping with the Chopin, synchronization performance was predicted by BAIS-C, as well as by objective pitch and temporal imagery tasks. BAIS-C and temporal imagery ability correlated with learning to anticipate the beat, which is considered to be a measure of temporal sensitivity.

We can also see individual differences in auditory imagery self-report reflected in functional and structural brain differences. In the study by Herholz and colleagues (2012), where participants were imagining or hearing familiar songs, a functional connectivity analysis showed that the right superior temporal gyrus and the right dorsolateral prefrontal cortex (which mediates working memory) were functionally connected when participants were imagining familiar songs. This correlation was stronger in individuals with higher scores on the BAIS-V. Even brain volume in some areas is larger in individuals with more vivid imagery: Lima and colleagues (2015) measured gray matter volume as a function of BAIS-V score and found more gray matter volume in more vivid imagers in the supplementary motor cortex, as well as areas previously associated with auditory imagery in the parietal and frontal lobes. This relationship held independent of age (the age range was from 20 to 81), musical background, or short-term memory span. Indeed, auditory imagery has been shown to vary even among trained musicians.
As noted previously, in most studies using the BAIS, the correlation is only about .3 with years of musical training (Halpern 2015), and the differences in brain connectivity shown by Herholz and coauthors (2012) emerged even among the trained musicians in that study.

Given this range of individual differences, the potential complexity of auditory imagery, and the links with perception abilities, it seems possible that the deliberate practice of auditory imagery might be of benefit to practicing musicians. As reviewed by Keller (2012), auditory imagery allows for the prediction of upcoming movements by imagining upcoming sounds, which can benefit speed and accuracy of motor sequences such as striking piano keys. This prediction also benefits ensemble coordination, as players must anticipate what other members of the ensemble are about to play. Highben and Palmer (2004) found that pianists with the highest scores on aural skills tests coped the best when auditory feedback was absent during playing from memory, presumably because they could internally generate that feedback. Imagery is even more important for players of non-fixed-pitch instruments where there is no direct, unequivocal mapping between the note to be played and the finger positions on the instrument. An in-depth interview study with elite brass players by Trusheim (1991) revealed that many players reported reliance on auditory imagery to anticipate the movements needed to play the next note in tune and with the desired tone. The human singing voice may benefit most from auditory imagery, which allows planning and error correction, and indeed good pitch-matching singing skills have been linked with both self-reported and objectively measured auditory imagery skills (Pfordresher and Halpern 2013; Greenspon et al. 2017).

Music Pedagogy and Voluntary Auditory Imagery

It is perhaps unsurprising, then, that several approaches to musical training involve the explicit training of auditory imagery skills. In fact, the skill of reading music notation and "hearing" the appropriate auditory image "in one's head" is so commonly used in musical performance and training that it often is not even given a particular name; its centrality is simply assumed (much like the skill of reading a book "in one's head" is commonly assumed). The use of auditory imagery in expert musical performance preparation is sometimes called "mental rehearsal" and is often combined with motor and visual imagery. Some of the first music psychologists also noted the importance of auditory imagery, starting with the earliest published measures of musical ability (Seashore 1919). Indeed, Carl Seashore regarded auditory imagery as the highest form of musicianship: "[T]he most outstanding mark of the musical mind is a high capacity for auditory imagery" (Seashore 1938, 161). It is thus useful at this point to consider the ways in which voluntary auditory imagery has been employed in particular approaches to music pedagogy. Although not all such approaches have either been documented or investigated empirically, there is nevertheless a long tradition of such training, going back decades and perhaps even centuries.

Two major classroom music education figures of the twentieth century, Zoltán Kodály and Edward Gordon, made auditory imagery an explicit feature of their pedagogical approaches, referring to it as "inner hearing" or "audiation," respectively. Perhaps the most methodical, worked-out teaching method arose from the work of Kodály, a Hungarian composer and professor at the Liszt Academy in the first half of the twentieth century (Ittzés 2002). Observing that Austrian and Viennese music were held in higher regard than Hungarian folk music during the period of the Austro-Hungarian Empire and apparently unimpressed by both the general quality and the repertoire of urban children's singing in Hungary, Kodály developed an entirely new approach to classroom music education. This new approach was based on what he considered to be the children's musical "mother-tongue," that is, their musical vernacular, Hungarian folk songs, which he collected and preserved in collaboration with Béla Bartók (Kodály 1960, 1974). Essentially the idea, which was developed and put into practice by Kodály's students (Ádám 1944; Szőnyi 1974), is to begin classroom music lessons with songs already somewhat familiar to children, and through regular repetition and analysis of this familiar repertoire, learn the fundamentals of musical knowledge such as scales, rhythm, musical notation, and sight-singing. The skills acquired can then give children direct access to participating in and understanding the entire world of Western art music (and indeed other music from around the globe).

Integral to this process is the use of "inner hearing" as a device to develop children's musicianship and literacy skills. For example, a primary school music activity might involve learning to miss out a few notes or words during a song and to imagine them instead of singing them. To take Jingle Bells as an example again, children might be asked to sing the whole song together, while leaving out the words "bells" and "all the way" throughout the whole song (try it!). Not only does this rehearse the skill of "inner hearing," it can also be made into an enjoyable game, and additionally, the musical structure appearing from the three repetitions of "Jingle" (mi-mi, mi-mi, mi-so; see Figure 19.2) becomes prominent and can be "discovered" by the children with the guidance of the teacher, leading to an understanding of musical form.

Figure 19.2  First line of the song Jingle Bells, where the word "Jingle" is sung aloud each time and the rest of the line is imagined.

Developing such skills to a more advanced level eventually allows older children to be able to sight-sing one melody while imagining a countermelody (i.e., a simultaneous melody), or imagine a familiar chord sequence (e.g., I–VI–IV–V–I) in various major and minor keys, for example (see Figure 19.3).

Figure 19.3  Harmonic chord sequence of I, VI, IV, V, I, first shown in the key of C major and then in A minor, where I is the tonic chord and V is the dominant chord.

The focus on repeatedly singing and analyzing familiar songs is key to the Kodály approach, and in the context of auditory imagery it is worth noting the strong emphasis placed on regular practice and depth of understanding. Kodály believed that the collecting of musical experiences is more important than studying music theoretically (Kodály 1974) and placed special emphasis on the use of relative sol-fa (i.e., naming notes according to their position in a musical scale, rather than by their absolute pitch) and two-part singing (Kodály 1962). Baddeley and Andrade (2000) suggest that the experience of vivid imagery requires abundant sensory information to be available from long-term memory, and Neisser (1976) has noted that imagery arises (at least in part) from schemata, based on prior experience. Since a voluntary auditory image is self-generated, it reflects considerable prior processing and should not be seen as an uninterpreted sensory copy (Hubbard 2010)—mental models play an important part in musical imagery, much as they do in music perception (Schaefer 2014). Imagery and memory are also considered to be closely linked; mental imagery is an important component of working memory rehearsal, for example (Baddeley and Logie 1992). Singing may also be of particular value in the development of "inner hearing" skills because a vast amount of familiar musical material can be brought to mind through songs, without requiring any instrumental expertise. It has even been shown that musicians subvocalize when performing a notation-reading auditory imagery task (Brodsky et al. 2008), suggesting reliance on an imagined sung version of a melody (although it must be noted that neuroimaging studies to date have not revealed activation of the primary motor cortex during auditory imagery) (see Zatorre and Halpern 2005).

Another point of interest regarding the Kodály approach is that it specifically aims to develop the ability to hear, or imagine, more than one melodic line at the same time, an ability that has recently been shown to be particularly developed in musical conductors (Wöllner and Halpern 2016). While the extent to which such a skill involves actual divided attention, versus rapid switching between parts, is still debated (Alzahabi and Becker 2013), it is nevertheless clear that this ability can be trained and developed to a high level of skill. Indeed, at an advanced level, such as an undergraduate "harmony and counterpoint" or "stylistic composition" exam, a music student might be asked to write a fugue in the style of Bach and a song in the style of Schubert while sitting at a desk in an exam hall, thus relying on auditory imagery of several melodic lines and/or harmonic progressions, as well as expert musical knowledge, in order to complete the task.

A final aspect of the Kodály approach that is rarely discussed but important to note is the fact that it involves group musical learning, almost always taking place in the school or university classroom. The "inner hearing" activities thus involve what we might describe as group auditory imagery, or "shared auditory imagery," which can bring a highly focused sense of shared attention when used effectively, as well as allowing more generally for the potential benefits of group learning and social music-making (Heyes 2013; Kirschner and Tomasello 2010; Overy and Molnar-Szakacs 2009; Overy 2012). The idea of "shared auditory imagery" is not well documented and perhaps warrants future research.

A second major music education figure of the twentieth century, Edward Gordon, based in the United States, focused his own music education approach much more specifically on auditory imagery, or what he calls "audiation." Gordon proposes that only by understanding where a young child's current audiation skills lie can the child be taught appropriately, and much of Gordon's work focuses on an interest in the variability of this skill in the general population and how to measure it appropriately (1987). Importantly, Gordon extends the meaning of the word "audiation" from auditory imagery alone to include the process of listening to music with some cognition of its structure, rather than just sensory perception, arguing that "audiation" is part of intelligent music listening. The Gordon measures of music audiation (e.g., Gordon 1979, 1982) have become some of the most commonly used measures of musical ability in children and are often also used in psychology and brain imaging research (e.g., Ellis et al. 2012). Examples of the kinds of tasks used are the melody and rhythm discrimination tests, in which two melodies or rhythms are heard and the child or adult's task is to determine whether they are the same or different, a task commonly found in tests of musical ability (e.g., Bentley 1966; Wing 1970; Overy et al. 2003; 2005). Gordon assumes that, in order to perform this comparison task, a child must be able to hold the initial melody in mind for a short period of time, that is, to "audiate" the short extract. This measure thus links directly with the idea that working memory rehearsal requires mental imagery, as outlined earlier (Baddeley and Logie 1992).

Voluntary auditory imagery is central to the Gordon concept of musical ability and is regarded as an important aspect of musicianship and an effective learning tool in the Kodály approach. On further analysis, there are also some interesting key elements in common between the two approaches. For example, both approaches: (1) use physical movement gestures in the teaching of "inner hearing" or "audiation"; (2) place strong emphasis on what Gordon calls "notational audiation" and what Kodály calls "musical literacy," that is, the ability to read a musical score and hear the music in one's head; and (3) place a strong emphasis on the importance of teacher-training programs in these skills. A detailed comparative analysis of the two approaches would no doubt generate some clear focus points for future research in this area, and perhaps lead to a richer understanding of how auditory imagery can be used, adapted, and developed in a range of different musical and pedagogical contexts.

Rehearsal Strategies for Instrumental Performance

An entirely different area of musical training in which a highly worked-out methodology for voluntary auditory imagery has been explicitly developed is professional instrumental performance, or more specifically, the memorization of musical material for piano performance. The amount of repertoire that a professional pianist needs to be able to play accurately by memory (going back to Franz Liszt, who is reported to have started the showmanship of performing whole piano recitals without the score, see Hamilton 2008) is vast, involving hours of often highly technically demanding music, which must be perfectly executed. Such performances require an extraordinary feat of memory and can sometimes lead to extreme performance anxiety and subsequent medication (e.g., James and Savage 1984). In addition, the motivation to overlearn the material and reduce memory slips in performance can lead to the risk of overpractice, resulting in physical strain and personal injury (e.g., Rosety-Rodriguez et al. 2003). The use of voluntary auditory imagery, or what is more usually referred to as "mental rehearsal" or "mental practice," to prepare for such concerts is commonly recommended, and has been shown empirically to be effective (e.g., Bernardi et al. 2013) but is rarely specifically trained, even at conservatory level (Clark and Williamon 2011).

One method developed to help prepare expert pianists for performance is that of Nelly Ben-Or, a concert pianist based in London and professor of the Alexander Technique at the Guildhall School of Music. Ben-Or combines concepts from the Alexander Technique, in which body imagery strategies are applied during preparation for action (McEvenue 2002), with auditory, visual, and motor imagery strategies, developing what she calls "techniques of mental representation." Using these multimodal techniques of mental representation, an entire musical piece (or large section, depending on the scale of the piece) is imagined and memorized away from the piano, prior to physical rehearsal. Regular piano practice thus becomes a mix of mental imagery rehearsal and physical rehearsal, with the aim that physical rehearsal only takes place once imagined recall is fluent, thus limiting the possibility that any physical or technical performance constraints will impose restraints on what is imagined and ultimately performed and ideally allowing for more musical expression and flexibility.

In an observational, ethnographic study of eleven pianists training in Nelly Ben-Or's method, Davidson-Kelly and colleagues (Davidson-Kelly 2014; Davidson-Kelly et al. 2015) described the process as acquiring "total inner memory" prior to performance and noted that the ability to understand a musical score is crucial to the success of the approach, requiring adequate theoretical knowledge as well as aural and technical skills. This relates back to the idea of mental imagery requiring strong schematic and prior knowledge, as mentioned previously (e.g., Neisser 1976; Hubbard 2010). In addition, the method assumes a level of skill in which the performer is not overly hampered by technical difficulties—if a piece requires rapid finger movements or large leaps, these are assumed to be mostly within the technical capabilities of the performer. Interviews with pianists training with Nelly Ben-Or revealed that the method was perceived by most participants to be effortful and challenging, but extremely effective at increasing awareness of the nuances of a piece and consolidating memory: "[I]t is really difficult to change 17 years habit," "It requires enormous . . .
time, effort and concentration,” “I am more secure and have less memory mistakes [sic],” “These pieces stay in my head and I can refresh my memory very quickly. They are very reliable” (Davidson-Kelly et al. 2015).

In summary, voluntary auditory imagery is already used regularly and effectively in a variety of music pedagogy, performance, and composition practice contexts. Nevertheless, it has not yet been widely demonstrated or studied empirically in these real-world contexts. Further investigation of the nature, extent, and boundaries of this skill, and how to train it most effectively, may prove beneficial to understanding auditory imagery in general and may also elucidate the potential benefits to professional musicians and their rehearsal, memorization, and performance techniques.

Conclusions

Auditory imagery is an ability that most people can access and control, with a fair amount of precision and with fidelity to actual perceived sounds. Such imagery can be used for entertainment and emotional self-regulation (such as imagining calming songs if one is in a stressful situation). But we wish to emphasize another aspect of this experience: what seems at times to be an effortless ability can in fact require a fair amount of cognitive resources, including working and long-term memory. For both musicians and nonmusicians, the successful re-evoking of music often reflects the fact that the material has been encountered multiple times and reflects a detailed knowledge of the piece (particularly in Ben-Or's approach). Thus, we could view voluntary auditory imagery in music learning as a tool that takes some effort to use but results in superior technical and expressive skills, or a "desirable difficulty." The fact that brain activation during auditory imagery shows areas in common with auditory perception but also unique activation of higher-order areas involved in memory and executive function supports this idea of imagery being used as a tool to enhance learning.

Auditory imagery does not occur in a vacuum, of course. Musicians can also use motor and kinesthetic imagery (Meister et al. 2004), as they imagine their hand and body movements during playing, and visual imagery when imagining a score, a piano keyboard, or a conductor's gestures from a prior rehearsal. Much research has pointed to the multimodality of imagery, both behaviorally and in terms of neural function (McNorgan 2012). The translation of a visual score into an auditory experience requires coordination across the two modalities, often via some representation of the motor system. Some of the pedagogy techniques described here exploit this interaction and could perhaps still be extended. For example, in the Kodály approach, preschool children are often asked to keep a sequence of learned motor actions going throughout an "action song" while imagining some melodic lines and singing the others. Similarly, Curwen hand-signs (Curwen 1854) are used in the Kodály approach with young children to represent pitch for both imagined and sung musical activities, before moving on to written notation and more advanced musical materials. This use of the motor system to represent sound while it is being imagined might be further exploited in more advanced ways, yet to be conceived and developed.

In this chapter, we have discussed the role of auditory imagery in primary music education as well as in professional musical situations. These methods explicitly recognize that individuals with different levels of ability and training might use imagery in different ways. For example, Ben-Or proposes using multimodal imagery or "total inner memory" (Davidson-Kelly et al. 2015) to memorize and mentally rehearse a piece, assuming that the piece is largely within the performer's current technical expertise. The Kodály approach extends from preschool to undergraduate levels of musicianship, entailing the wide range of beginner to expert levels of repertoire and musical skill therein. We assume that other approaches to music learning, such as imitative, oral transmission styles found in non-Western cultures and nonnotated musical genres such as pop and folk, may also use features of voluntary auditory imagery in a variety of different ways.

We would also like to recognize here that adults older (even!) than undergraduates often have an interest in beginning or furthering their musical experiences or training. Some of the training methods referred to in this chapter could easily be adapted so that the training was appropriate for middle-aged or senior adults, for example by using generation-appropriate songs and physical activities. For adults with more seriously limited mobility, such adapted techniques might even be helpful for motor rehabilitation, for example in cases of stroke survival or Parkinson's disease, where musical imagery of a steady beat, for example, has been proposed as potentially helpful in the rehabilitation of motor skills (Schaefer 2014).

Of course, we should also emphasize that auditory imagery in music pedagogy is not always focused on (eventual) proficiency in singing, playing a musical instrument, or reading music notation. We mentioned the value of social music-making earlier on, and fully recognize that many adults who are not necessarily formally trained in music nevertheless enjoy singing together in a group. However, many of these individuals are not satisfied with their vocal abilities and wish they could improve. Some adults do not sing at all, but wish they could improve their skills in order to enjoy both the artistic and social benefits of music-making such as choral singing (Clift and Hancox 2010). Research in progress with colleagues at a UK music conservatory is currently investigating a new way to teach adults who do not sing much, or well, to sing more confidently and more accurately. Given the strong relationship between auditory imagery vividness and pitch matching ability, and the importance of musical imagery skills in many pedagogical approaches, one aspect of the research will be to create an intervention to train and improve auditory imagery skills. The study will include a version of the mental pitch comparison task mentioned earlier (Zatorre and Halpern 1993), where difficulty is gradually increased by probing pitches that are increasingly distant from each other within the song. Developed as an enjoyable app that can be accessed at home, the study will track (1) whether it is possible to measure the improvement of auditory imagery skills and (2) whether that improvement correlates with improved pitch matching and vocal quality. Such improved skills may also lead to new possibilities in the areas of improvising and composing for these adult learners.

We close with the thought that auditory imagery tasks are both inexpensive (one only has to imagine sounds!) and fun, such as asking people to imagine and play with famous tunes in their heads (we will leave you with an auditory image of the beautiful song "Danny Boy" and ask you to enjoy and spend too long on the highest note). Auditory imagery tasks can be developed for individuals with a wide range of musical backgrounds and performance goals and can thus serve to enhance the traditional tools of music educators.

Acknowledgments

Katie Overy thanks Eva Vendrei (in memoriam) for her inspirational teaching, Ittzés Mihály (in memoriam) for his expert advice, and the International Kodály Society for their 2001 Sarolta Kodály scholarship to study at the Zoltán Kodály Pedagogical Institute of Music, Hungary.

Notes

1. https://pingroof.com/diana-nyad-inspiring-more-than-one-generation/. Accessed September 20, 2017.
2. "Wife, School Lost in Quake, Violinist Vows to Rebuild," from the NPR news program All Things Considered (2010). http://www.npr.org/2010/01/23/122900781/wife-school-lost-in-quake-violinist-vows-to-rebuild. Accessed September 20, 2017.

References

Adam, J. 1944. Módszeres Énektanítás a Relatív Szolmizáció Alapján (Systematic Singing Teaching Based on the Tonic Sol-fa). Budapest: Editio Musica Budapest.
Alzahabi, R., and M. W. Becker. 2013. The Association between Media Multitasking, Task-Switching, and Dual-Task Performance. Journal of Experimental Psychology: Human Perception and Performance 39: 1485–1495.
Baddeley, A. D., and J. Andrade. 2000. Working Memory and the Vividness of Imagery. Journal of Experimental Psychology: General 129: 126–145.
Baddeley, A. D., and R. H. Logie. 1992. Auditory Imagery and Working Memory. In Auditory Imagery, edited by D. Reisberg, 179–197. Hillsdale, NJ: Erlbaum.
Bailes, F. A. 2006. The Use of Experience-Sampling Methods to Monitor Musical Imagery in Everyday Life. Musicae Scientiae 10: 173–190.
Beaman, C. P., and T. I. Williams. 2010. Earworms (Stuck Song Syndrome): Towards a Natural History of Intrusive Thoughts. British Journal of Psychology 101: 637–653.
Bentley, A. 1966. Measures of Musical Abilities, Manual. London: George A. Harap.
Bernardi, N. F., A. Schories, H.-C. Jabusch, B. Colombo, and E. Altenmueller. 2013. Mental Practice in Music Memorisation: An Ecological-Empirical Study. Music Perception 30: 275–290.
Bjork, E. L., J. L. Little, and B. C. Storm. 2014. Multiple-Choice Testing as a Desirable Difficulty in the Classroom. Journal of Applied Research in Memory and Cognition 3: 165–170.

Brodsky, W., Y. Kessler, B.-S. Rubinstein, J. Ginsborg, and A. Henik. 2008. The Mental Representation of Music Notation: Notational Audiation. Journal of Experimental Psychology: Human Perception and Performance 34: 427–445.
Clark, T., and A. Williamon. 2011. Evaluation of a Mental Skills Training Program for Musicians. Journal of Applied Sport Psychology 23: 342–359.
Clift, S., and G. Hancox. 2010. The Significance of Choral Singing for Sustaining Psychological Wellbeing: Findings from a Survey of Choristers in England, Australia and Germany. Music Performance Research 3: 79–96.
Colley, I. D., P. E. Keller, and A. R. Halpern. 2017. Working Memory and Auditory Imagery Predict Sensorimotor Synchronization with Expressively Timed Music. Quarterly Journal of Experimental Psychology 71: 1781–1796. doi:10.1080/17470218.2017.1366531.
Curwen, J. 1854. An Account of the Tonic Sol-fa Method of Teaching to Sing. London: Tonic Sol-fa Press.
Davidson-Kelly, K. 2014. Mental Imagery Rehearsal Strategies for Expert Pianists. PhD thesis, University of Edinburgh, Scotland.
Davidson-Kelly, K., R. S. Schaeffer, N. Moran, and K. Overy. 2015. "Total Inner Memory": Deliberate Uses of Multimodal Musical Imagery during Performance Preparation. Psychomusicology: Music, Mind and Brain 25 (1): 83–92.
Ellis, R. J., A. C. Norton, K. Overy, E. Winner, D. C. Alsop, and G. Schlaug. 2012. Differentiating Maturational and Training Influences on fMRI Activation during Music Processing. NeuroImage 60 (3): 1902–1912.
Frieler, K., T. Fischinger, K. Schlemmer, K. Lothwesen, K. Jakubowski, and D. Müllensiefen. 2013. Absolute Memory for Pitch: A Comparative Replication of Levitin's 1994 Study in Six European Labs. Musicae Scientiae 17 (3): 334–349.
Gelding, R. W., W. F. Thompson, and B. W. Johnson. 2015. The Pitch Imagery Arrow Task: Effects of Musical Training, Vividness, and Mental Control. PLoS One 10 (3): e0121809.
Gordon, E. E. 1979. Primary Measures of Music Audiation. Chicago: GIA Publications.
Gordon, E. E. 1982. Intermediate Measures of Music Audiation. Chicago: GIA Publications.
Gordon, E. E. 1987. The Nature, Description, Measurement and Evaluation of Musical Aptitude. Chicago: GIA Publications.
Greenspon, E. B., P. Q. Pfordresher, and A. R. Halpern. 2017. Mental Transformations of Melodies. Music Perception 34: 585–604.
Griffiths, T. D. 2000. Musical Hallucinosis in Acquired Deafness: Phenomenology and Brain Substrate. Brain 123: 2065–2076.
Halpern, A. R. 1988a. Mental Scanning in Auditory Imagery for Songs. Journal of Experimental Psychology: Learning, Memory, and Cognition 14: 434–443.
Halpern, A. R. 1989. Memory for the Absolute Pitch of Familiar Songs. Memory and Cognition 17: 572–581.
Halpern, A. R. 2015. Differences in Auditory Imagery Self Report Predict Behavioral and Neural Outcomes. Psychomusicology: Music, Mind, and Brain 25: 37–47.
Halpern, A. R., and J. C. Bartlett. 2011. The Persistence of Musical Memories: A Descriptive Study of Earworms. Music Perception 28: 425–431.
Halpern, A. R., and R. J. Zatorre. 1999. When That Tune Runs through Your Head: A PET Investigation of Auditory Imagery for Familiar Melodies. Cerebral Cortex 9: 697–704.
Halpern, A. R., R. J. Zatorre, M. Bouffard, and J. A. Johnson. 2004. Behavioral and Neural Correlates of Perceived and Imagined Musical Timbre. Neuropsychologia 42: 1281–1292.

Hamilton, K. 2008. After the Golden Age: Romantic Pianism and Modern Performance. New York: Oxford University Press.
Herholz, S. C., A. R. Halpern, and R. J. Zatorre. 2012. Neuronal Correlates of Perception, Imagery, and Memory for Familiar Tunes. Journal of Cognitive Neuroscience 24: 1382–1397.
Heyes, C. 2013. What Can Imitation Do for Cooperation? In Cooperation and Its Evolution, edited by K. Sterelny, R. Joyce, B. Calcott, and B. Fraser. Cambridge, MA: MIT Press.
Highben, Z., and C. Palmer. 2004. Effects of Auditory and Motor Mental Practice in Memorized Piano Performance. Bulletin of the Council for Research in Music Education 159: 58–65.
Hubbard, T. L. 2010. Auditory Imagery: Empirical Findings. Psychological Bulletin 136: 302–329.
Hyman, I. E., Jr., N. K. Burland, H. M. Duskin, M. C. Cook, C. M. Roy, J. C. McGrath, et al. 2013. Going Gaga: Investigating, Creating, and Manipulating the Song Stuck in My Head. Applied Cognitive Psychology 27: 204–215.
Ittzés, M. 2002. Zoltán Kodály: In Retrospect. Kecskemét, Hungary: Kodály Institute.
Jakubowski, K., N. Farrugia, A. R. Halpern, S. K. Sankarpandi, and L. Stewart. 2015. The Speed of Our Mental Soundtracks: Tracking the Tempo of Involuntary Musical Imagery in Everyday Life. Memory and Cognition 43: 1229–1242.
James, I., and I. B. Savage. 1984. Beneficial Effect of Nadolol on Anxiety-Induced Disturbances of Performance in Musicians: A Comparison with Diazepam and Placebo; Proceedings of a Symposium on the Increasing Clinical Value of Beta Blockers Focus on Nadolol. American Heart Journal 108: 1150–1155.
Karageorghis, C. I., J. C. Hutchinson, L. Jones, H. L. Farmer, M. A. Ayhan, R. C. Wilson, et al. 2013. Psychological, Psychophysical, and Ergogenic Effects of Music in Swimming. Psychology of Sport and Exercise 14: 560–568.
Keller, P. E. 2012. Mental Imagery in Music Performance: Underlying Mechanisms and Potential Benefits. Annals of the New York Academy of Sciences 1252 (1): 206–213.
Kirschner, S., and M. Tomasello. 2010. Joint Music Making Promotes Prosocial Behavior in 4-Year-Old Children. Evolution and Human Behavior 31: 354–364.
Kodály, Z. 1960. Folk Music of Hungary. London: Barrie and Rockliff.
Kodály, Z. 1962. Bicinia Hungarica. London: Boosey and Hawkes.
Kodály, Z. 1974. The Selected Writings of Zoltán Kodály. London and New York: Boosey and Hawkes.
Levitin, D. J., and P. R. Cook. 1996. Memory for Musical Tempo: Additional Evidence That Auditory Memory Is Absolute. Perception and Psychophysics 58: 927–935.
Lima, C., N. Lavan, S. Evans, Z. Agnew, A. R. Halpern, P. Shanmugalingam, et al. 2015. Feel the Noise: Relating Individual Differences in Auditory Imagery to the Structure and Function of Sensorimotor Systems. Cerebral Cortex 25: 4638–4650. doi:10.1093/cercor/bhv134.
Lucas, B. J., E. Schubert, and A. R. Halpern. 2010. Perception of Emotion in Sounded and Imagined Music. Music Perception 27: 399–412.
McEvenue, K. 2002. The Actor and the Alexander Technique. New York: Palgrave Macmillan.
McNorgan, C. 2012. A Meta-Analytic Review of Multisensory Imagery Identifies the Neural Correlates of Modality-Specific and Modality-General Imagery. Frontiers in Human Neuroscience 6: 285–295.
Meister, I. G., T. Krings, H. Foltys, B. Boroojerdi, M. Müller, R. Töpper, and A. Thron. 2004. Playing Piano in the Mind—an fMRI Study on Music Imagery and Performance in Pianists. Cognitive Brain Research 19: 219–228.

Müllensiefen, D., J. Fry, R. Jones, S. Jilka, L. Stewart, and V. Williamson. 2014. Individual Differences Predict Patterns in Spontaneous Involuntary Musical Imagery. Music Perception 31 (4): 323–338. doi:10.1525/MP.2014.31.4.323.
Neisser, U. 1976. Cognition and Reality: Principles and Implications of Cognitive Psychology. New York: Freeman.
Overy, K. 2012. Making Music in a Group: Synchronization and Shared Experience. Annals of the New York Academy of Science 1252: 65–68.
Overy, K., and I. Molnar-Szakacs. 2009. Being Together in Time: Musical Experience and the Mirror Neuron System. Music Perception 26: 489–504.
Overy, K., R. I. Nicolson, A. J. Fawcett, and E. F. Clarke. 2003. Dyslexia and Music: Measuring Musical Timing Skills. Dyslexia 9: 18–36.
Overy, K., A. Norton, K. Cronin, E. Winner, and G. Schlaug. 2005. Examining Rhythm and Melody Processing in Young Children Using fMRI. Annals of the New York Academy of Science 1060: 210–218.
Pfordresher, P. Q., and A. R. Halpern. 2013. Auditory Imagery and the Poor-Pitch Singer. Psychonomic Bulletin and Review 20: 747–753.
Rosety-Rodriguez, M., F. J. Ordonez, and J. Farias. 2003. The Influence of the Active Range of Movement of Pianists' Wrists on Repetitive Strain Injury. European Journal of Anatomy 7: 75–77.
Schaefer, R. S. 2014. Auditory Rhythmic Cueing in Movement Rehabilitation: Findings and Possible Mechanisms. Philosophical Transactions of the Royal Society B 369: 20130402.
Schellenberg, E. G., and S. E. Trehub. 2003. Good Pitch Memory Is Widespread. Psychological Science 14: 262–266.
Seashore, C. E. 1919. Seashore Measures of Musical Talent. New York: Columbia Phonograph Company.
Seashore, C. E. 1938. Psychology of Music. New York: McGraw Hill.
Szőnyi, E. 1974. Musical Reading and Writing. Vol. 1. Budapest: Editio Musica Budapest.
Trusheim, W. H. 1991. Audiation and Mental Imagery: Implications for Artistic Performance. Quarterly Journal of Music Teaching and Learning 2: 138–147.
Williamson, V. J., and S. R. Jilka. 2013. Experiencing Earworms: An Interview Study of Involuntary Musical Imagery. Psychology of Music 42: 653–670. doi:10.1177/0305735613483848.
Wing, H. D. 1970. Standardised Tests of Musical Intelligence. Windsor: NFER-Nelson Publishing.
Wöllner, C., and A. R. Halpern. 2016. Attentional Flexibility and Memory Capacity in Conductors and Pianists. Attention, Perception, and Psychophysics 78: 198–208. doi:10.3758/s13414-015-0989-z.
Zatorre, R. J., and A. R. Halpern. 1993. Effect of Unilateral Temporal-Lobe Excision on Perception and Imagery of Songs. Neuropsychologia 31: 221–232.
Zatorre, R. J., A. R. Halpern, D. W. Perry, E. Meyer, and A. C. Evans. 1996. Hearing in the Mind's Ear: A PET Investigation of Musical Imagery and Perception. Journal of Cognitive Neuroscience 8: 29–46.
Zatorre, R. J., and A. R. Halpern. 2005. Mental Concerts: Musical Imagery and the Auditory Cortex. Neuron 47: 9–12.
Zatorre, R. J., A. R. Halpern, and M. Bouffard. 2010. Mental Reversal of Imagined Melodies: A Role for the Posterior Parietal Cortex. Journal of Cognitive Neuroscience 22: 775–789.

Chapter 20

A Different Way of Imagining Sound
Probing the Inner Auditory Worlds of Some Children on the Autism Spectrum

Adam Ockelford

Introduction

Imagine that you are walking along a road at night when you hear a sound. On the one hand, you might pay attention to its pitch and loudness and the ways they change with time. You might attend to the sound's timbre, whether it is rough or smooth, bright or dull. . . . These are all examples of musical listening, in which the perceptual dimensions and attributes of concern have to do with the sound itself, and are those used in the creation of music. . . . On the other hand, as you stand there in the road, it is likely that you will not listen to the sound itself at all. Instead, you are likely to notice that the sound is made by an automobile with a large and powerful engine. Your attention is likely to be drawn to the fact that it is approaching quickly from behind. And you might even attend to the environment, hearing that the road you are on is actually a narrow alley, with echoing walls on each side. This is an example of everyday listening, the experience of listening to events rather than sounds.

So writes William Gaver (1993, 1), in relation to his "ecological" analysis of hearing, in which he sets out how, for most listeners, in everyday contexts, the function of sounds in auditory perception is privileged over their acoustic properties. However, there are people for whom this does not appear to be the case, including a significant minority of children who are on the autism spectrum. Children with autism typically have challenges with social interaction and communication, and tend to exhibit a narrow—even obsessive—focus on particular activities that are often characterized by pattern and predictability. The perceptual qualities of objects may be more important than their function, and, in the auditory domain, parents often report a fascination with sound, apparently for its own sake:

"My son Jack is obsessed with the beeping sound of the microwave when its cooking cycle comes to an end. He can't bear to leave the kitchen till it's stopped. And just lately, he's become very interested in the whirr of the tumble-drier too."

"My four-year-old daughter just repeats what I say. For a long time, she didn't speak at all, but now, the educational psychologist tells me, she's 'echolalic.' I say, 'Hello, Anna,' and she says 'Hello, Anna' back. I ask 'Do you want to play with your toys' and she just replies 'Play with your toys,' though I don't think she really knows what I mean."

"Ben wants to listen to the jingles that he downloads from the internet all the time. And I mean, the whole time—16 hours a day if we let him. He doesn't even play them all the way through: sometimes just the first couple of seconds of a clip, over and over again. He must have heard them thousands of times. But he never seems to get bored."

"Callum puts his hands over his ears and starts rocking and humming to himself when my mobile goes off, but totally ignores the ringtone on my husband's phone, which is much louder."

"My ten-year-old son Freddie constantly flicks any glasses, bowls, pots or pans that are within reach. The other day, he emptied out the dresser—and even brought in half a dozen flowerpots from the garden—and lined everything up on the floor. Then he sat and 'played' his new instrument for hours. I couldn't see a pattern in what he'd done, but if I moved anything when he wasn't looking, he'd notice straight away, and move it back again."

"Every now and then, Romy only pretends to play the notes on her keyboard—touching the keys with her fingers but not actually pressing them down. And sometimes, she introduces everyday sounds that she hears into her improvising. For example, she plays the complicated descending harmonic sound of the aeroplanes coming into land at Heathrow as chords, and somehow integrates them into the music she is playing."

"Omur repeatedly bangs away at particular notes on his piano (mainly 'B' and 'F sharp,' high up in the right hand), sometimes persisting until the string or the hammer breaks."

"Derek (who is blind) copies the sounds of the page turns in his own rendition of a Chopin waltz that his piano teacher played for him by tapping his fingers on the music rack above the keyboard."  (Ockelford 2013)

Why should this be the case? What causes some autistic children to hear sounds in this way? And what impact, if any, does this idiosyncratic style of auditory perception have on the way that they perceive, remember, and imagine music? These are the questions that lie at the heart of this chapter.


An Ecological Model of Auditory Perception

Early in life, "neurotypical" human infants learn to differentiate between auditory input according to one of three functions that it can fulfill. This results in the development of "everyday" listening, which, as Gaver observes, is concerned with attending to events such as a car passing by or a door slamming; "musical" listening, which focuses on perceptual qualities such as pitch and loudness; and "linguistic" listening, which is ultimately based on the perception and cognition of speech sounds. The separation of music and language perception ties in with evidence from neuroscience, which suggests that, while the two domains share some neurological resources, they also have dedicated processing pathways (Patel 2012) that are distinct from those activated by environmental sounds (Norman-Haignere, Kanwisher, and McDermott 2015).

It is not known just how these three types of auditory processing—relating to everyday sounds, music, and speech—become defined in the brain's architecture following the initial development of hearing around three to four months before birth (Lecanuet 1996). There is currently some debate as to which develops first, although there is increasing evidence that musical hearing and ability are essential to language acquisition (Brandt, Gebrian, and Slevc 2012). My own work (e.g., Ockelford 2017) supports this view. My theory of what makes music "music"—"zygonic" theory (Ockelford 2005)—contends that, for music to exist in the mind, there must be perceived imitation of one feature of a sound by another, and the fact that, from an early age, babies do copy vocal sounds and relish being copied long before they can use or understand words suggests that music is indeed a precursor of language (Voyajolu and Ockelford 2016). In any case, singing and speech appear to follow discrete developmental paths from around the beginning of the second year of life (Lecanuet 1996). We can surmise that the other category—"everyday" sounds—must perceptually be the most primitive of all, since it appears to require less cognitive processing than either music or speech. And in phylogenetic terms (in our development as a species), the capacities to process music and then language are thought to be relatively recent specialisms of the auditory system (see, e.g., Masataka 2007). Hence it seems reasonable to assume that, early on in "typical" human development, the brain treats all sound in the same way and that music processing starts to emerge first, followed by language. We can speculate that the residue that is left remains as "everyday" sounds. Hence the ecological model of auditory perception can be represented as in Figure 20.1, in which it is assumed that, as well as their shared neural resources, music and language come to have additional, distinct neural correlates during the first postnatal year. Clearly, since the precise nature of the sounds that constitute speech or music, and the relationship between them, varies somewhat from one culture to another, the model should be regarded as indicative rather than absolute.

Figure 20.1  The emerging streams of music and language processing in auditory development.

But what of children on the autism spectrum? As the parents' descriptions suggest, it seems that certain sounds, especially those that are particularly salient or pleasing to an individual, such as the whirring of the tumble drier, acquire little or no functional significance for some children. Instead, they tend to be processed only in terms of their sounding qualities—that is, in musical terms. It seems also that everyday sounds that involve repetition or regularity (such as the beeping of a microwave) may be processed in music-structural terms. This would imply that the children hear the repetition that is actually generated mechanically or electronically as being imitative (Figure 20.2). There is, of course, another possibility that we should acknowledge: that the autistic children who are preoccupied with the sounding qualities of certain everyday objects and the repetitive patterns that some of them make do not actually hear them in a musical way—that is, as being derived from one another through imitation—but purely as regularities in the environment. Furthermore, it could be that those same children do not hear music as "music" either, but merely as patterned sequences of sounds, to which no sense of human agency is transferred. Why should this be the case?

Figure 20.2  Some everyday sounds might be processed as music among children on the autism spectrum.

Perhaps because such children did not engage in the early vocal interactions with carers—"communicative musicality" (Malloch and Trevarthen 2009)—that I have suggested may embed a sense of imitation in sounds that are repeated (Ockelford 2017). However, the accounts of Romy reproducing the whines of the jet engines of airplanes coming in to land and integrating them into her improvisation at the piano, of Derek evidently regarding the rustle of a page turn as part of a Chopin waltz, and of Freddie appropriating everyday sound-makers (flower pots) to be used as musical instruments, suggest that some autistic children, at least, do perceive everyday sounds in a musical way. It may well be that this tendency is reinforced by the prevalence of music in the lives of young children (Lamont 2008); in the developed world, they are typically surrounded by electronic games and gadgets, toys, mobile phones, mp3 players, computers, iPads, TVs, radios, and so on, all of which emanate music to a greater or lesser extent. In the wider environment too—in restaurants, cafés, shops, cinemas and waiting rooms, cars and airplanes, and at many religious gatherings and other public ceremonies—music is ubiquitous.

So, given that children are inundated with nonfunctional (musical) sounds, designed, in one way or another, to influence emotional states and behavior, perhaps we should not be surprised that the sounds with which they often co-occur—sounds that, to neurotypical ears, are functional—should come to be processed in the same way.

The manner in which some autistic children perceive the world can have other consequences too. For example, the development of language can be affected, resulting in, among other things, "echolalia"—a distinctive form of speech widely reported among blind and autistic children (Mills 1993; Sterponi and Shankey 2013), originally defined as the meaningless repetition of words or phrases (Fay 1967, 1973). However, it appears that echolalia actually fulfills a range of functions in verbal interaction (Prizant 1979), including turn-taking and affirmation, and often finds a place in noninteractive contexts too, serving as a self-reflective commentary or rehearsal strategy (Prizant and Duchan 1981; McEvoy, Loveland, and Landry 1988). Given the hypothesis that imitation lies at the heart of musical structure (Ockelford 2012), it could be argued that one cause of echolalia is the organization of language (in the absence of semantics and syntax) through the structure (repetition) that is present in all music. It is as though words become musical objects in their own right, to be manipulated not according to their meaning or grammatical function, but purely through their sounding qualities. This implies a further modification to the ecological model of auditory development (see Figure 20.3). It is of interest to note that echolalia is not restricted to certain exceptional groups who exist at one extreme of the multidimensional continuum that makes up human neurodiversity; it is a feature of "typical" language acquisition in young children (Mcglone-Dorrian and Potter 1984) when, it seems, the urge to imitate what they hear outstrips semantic understanding. This would accord with a stage in the ecological model of auditory development when the two strands of communication through sound—language and music—are not cognitively distinct, and would support the notion that musical development precedes the onset of language.

For children on the autism spectrum, it is worth noting that music itself can become "superstructured" with additional repetition, as the account of Ben, for example, shows; it is common for children on the autism spectrum to play snippets of music (or videos with music) over and over again. It is as though music's already high proportion of repetition, which is at least 80 percent (Ockelford 2005), is insufficient for a mind ravenous for structure, and so it creates even more. Speaking to autistic adults who are able to verbalize why (as children) they would repeat musical excerpts in this way, it appears that the main reason (apart from the sheer enjoyment of hearing a particularly fascinating series of sounds repeatedly) is that they could hear more and more in the sequence concerned as they listened to it again and again. Bearing in mind that most music is, as we have seen, highly complex, with many events occurring simultaneously (and given that even single notes generally comprise many pitches in the form of harmonics), to the child with finely tuned auditory perception there is in fact a plethora of different things to attend to in even a few seconds of music, and an even greater number of relationships between sounds to fathom.

Figure 20.3  Speech might also be processed in musical terms by some children on the autism spectrum.

So, for example, while listening to a passage for orchestra one hundred times may be extremely tedious to the "neurotypical" ear, which can detect only half a dozen composite events, each fused in perception, to the mind of the autistic child, which can break the sequence down into a dozen different melodic lines, the stimulus may be rich and riveting.

Moreover, there tends to be far more structure in a piece of music than would theoretically be required for it to make sense (Ockelford 2017). Compositions are, by any standards, overengineered, typically with levels of repetition of 80 percent or more (Ockelford 2005). In terms of information theory, they are highly redundant. Why should this be the case? Perhaps because, traditionally, composers have been aware that they need to design pieces in such a way that their message will still come across in the suboptimal circumstances that will inevitably characterize most performances. For example, different interpretations may unexpectedly foreground some features of a work at the expense of others.

The acoustics in which a concert takes place may be less than ideal. Listeners' concentration may wander. For the child on the autism spectrum, though, attending to the same short passage for the nth time, this redundancy means that continuing to listen remains a worthwhile venture; there are still new connections between notes to be unearthed.

Absolute Pitch

It seems that one of the consequences of an early preoccupation with the "musical" qualities of sounds is the development of "absolute pitch" (AP)—the capacity to identify or produce pitches in isolation from others. In the West's population as a whole, this ability is extremely rare, with an estimated prevalence of 1 in 10,000 (Takeuchi and Hulse 1993). However, among those on the autism spectrum, the position is very different; recent estimates, derived from parental questionnaires, vary between 8 percent (N = 118; Vamvakari 2013) and 21 percent (N = 305; Reese 2014). These figures are broadly supported by DePape, Hall, Tillmann, and Trainor (2012), who, in a study of twenty-seven high-functioning adolescents with autism spectrum condition, found that three of them (11 percent) had AP. It is very unusual to find such high orders of difference in the incidence of a perceptual ability between different subgroups of the human population, and, evidently, there is something distinct in the way that the parts of the brain responsible for pitch memory wire themselves up in a significant minority of autistic children.

While AP is a useful (though inessential) skill in "neurotypical" musicians—including those performing at the highest level—it appears to be an indispensable factor in the development of music performance skills in autistic children with learning difficulties—so-called "savants" (Miller 1989). It appears to be this unusual ability that motivates and enables some young children with a limited understanding of the world, from the age of twenty-four months or so, to pick out tunes and harmonies on instruments that they may encounter at home or elsewhere—typically the keyboard or piano. This may well occur with no adult intervention (or, indeed, awareness). It seems that AP has this impact since each pitch sounds distinct, potentially eliciting a powerful emotional response, and being able to reproduce these sounds at will must surely be an electrifying experience. But more than this, AP makes learning to play by ear manageable, in a way that "relative pitch"—the capacity to process melodic and harmonic intervals—does not. To understand why, consider a typical playground chant that children use to taunt one another (Figure 20.4).

Figure 20.4  A playground chant.

In "neurotypical" individuals, motifs such as this are likely to be encoded in the mind, stored, and retrieved principally as a series of differences between notes (although "fuzzy" absolute memories will exist—a child would know if the chant were an octave too high, for example). However, for children with AP, the position is quite different, since they have the capacity to capture the pitch data from music directly, rather than as a series of intervals. Hence, in seeking to remember and repeat groups of notes over significant periods of time, they have certain processing advantages over their "neurotypical" peers, who extract and store information at a higher level of abstraction, and thereby lose the "surface detail." (Note that there are disadvantages to "absolute" representations of pitch too, since, on their own, they cannot take advantage of the patterns that exist through the repetition of intervals, and they make greater demands on memory. However, as there appears to be, to all intents and purposes, no limit on the brain's long-term storage capacity, this is not a serious problem; indeed, having an exceptional memory is something that is common to many children with autism.)

In my view, it is this capacity for "absolute pitch data capture" that explains why children with AP who are on the autism spectrum and have learning difficulties are able to develop instrumental skills at an early age with no formal tuition, since, for them, reproducing groups of notes that they have heard is merely a question of remembering a series of one-to-one mappings between given pitches as they sound and (typically) the keys on a keyboard that produce them. These relationships are invariant; once learned, they can service a lifetime of music making, through which they are constantly reinforced. On the other hand, were a child with "relative pitch" to try to play by ear, he or she would have to become proficient in the far more complicated process of calculating how the intervals that are perceived map onto the distances between keys, which, due to the asymmetries of the keyboard, are likely to differ according to what would necessarily be an arbitrary starting point. For example, the interval between the first two notes of the playground chant (a minor 3rd) can, as Figure 20.5 shows, be produced through no fewer than twelve distinct key combinations, comprising one of four underlying patterns. Moreover, the complexity of the situation is compounded by the fact that virtually the same physical leap between other keys may sound different (a major 3rd) according to its position on the keyboard.

That is not to say that children with AP who learn to play by ear do not rapidly develop the skills to play melodies beginning on different notes too, and it is not unusual for them to learn to reproduce pieces fluently in every key. This may appear contradictory, in the light of the processing advantage conferred by being able to encode pitches as perceptual identities in their own right, each of which, as we have seen, maps uniquely onto a particular note on the keyboard. However, the reality of almost all pieces of music is that melodic (and harmonic) motifs variously appear at different pitches through transposition, and so, to make sense of music, young children with AP need to learn to process pitch relatively as well as absolutely (Stalinski and Schellenberg 2010).
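The combinatorics behind this claim are easy to check. The following Python sketch—my own illustration, not part of Ockelford's analysis; the pitch-class numbering and key-color encoding are assumptions introduced here—enumerates the twelve possible placements of a minor 3rd (three semitones) within the octave and groups them by the black/white pattern of the two keys involved, then repeats the exercise for a major 3rd (four semitones) to show how a physically similar leap can yield a different interval.

```python
# Illustrative sketch (not from the chapter): how a minor 3rd (3 semitones) maps
# onto key colors on a piano keyboard, compared with a major 3rd (4 semitones).
from collections import defaultdict

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
BLACK_KEYS = {1, 3, 6, 8, 10}  # pitch classes of the black keys (C#, D#, F#, G#, A#)

def key_color(pitch_class):
    return "black" if pitch_class % 12 in BLACK_KEYS else "white"

def color_patterns(interval_semitones):
    """Group the 12 possible starting notes by the key-color pattern of the interval."""
    patterns = defaultdict(list)
    for start in range(12):
        end = (start + interval_semitones) % 12
        pattern = (key_color(start), key_color(end))
        patterns[pattern].append(f"{NOTE_NAMES[start]}-{NOTE_NAMES[end]}")
    return dict(patterns)

if __name__ == "__main__":
    for name, semitones in [("minor 3rd", 3), ("major 3rd", 4)]:
        patterns = color_patterns(semitones)
        total = sum(len(examples) for examples in patterns.values())
        print(f"{name}: {total} key combinations, {len(patterns)} underlying color patterns")
        for pattern, examples in patterns.items():
            print(f"  {pattern}: {', '.join(examples)}")
```

Run as written, the enumeration yields twelve minor-3rd combinations falling into four color patterns (white–white, white–black, black–white, black–black)—the keyboard asymmetry that a child relying on relative pitch has to negotiate, and that absolute pitch bypasses altogether.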

Figure 20.5  The different mechanisms involved in playing by ear using “absolute” and “relative” pitch abilities.

The Impact of Remembering and Imagining Musical Sounds in Absolute Terms

What is the day-to-day impact of AP on children with learning difficulties who are on the autism spectrum likely to be? The answer is: as varied as the children are themselves. Elsewhere, I have written at length about the extraordinary life of Derek Paravicini (Ockelford 2009), who is what Treffert (2009) calls a "prodigious" musical savant. It is simply not possible to imagine Derek without his piano playing, in which the way he thinks, the way he feels, and the way he relates to other people are embodied. But there are many other children on the autism spectrum with whom I have worked over many years and who are no less exceptional in their different ways and no less enlightening as to how musical sounds can be remembered and imagined.


Figure 20.6  Romy and Adam share a musical joke (image © 2010 Evangelos Himonides).

In this context, here are two accounts of children with whom I have worked every week for a number of years. They are taken from blogs that were designed to raise awareness of autism and musicality and to stir the debate on the relationship between so-called disability and ability. The children and their parents visit me in a large practice room at the University of Roehampton, where I am based. There are two pianos, to avoid potential difficulties over personal space. A number of the children rarely say a word. Some, like Romy, are entirely nonverbal. She converses through her playing, showing which piece she would like next, and indicating when she has had enough. On occasion, she will tease me by apparently suggesting one thing when she means another. In this way, jokes are shared and, sometimes, feelings of sadness too. For Romy, music truly functions as a proxy language (Figure 20.6).

A Session with Romy

On Sunday mornings, at 10:00 a.m., I steel myself for Romy's arrival. I know that the next two hours will be an exacting test of my musical mettle. Yet Romy has severe learning difficulties, and she doesn't speak at all. She is musical to the core, though; she lives and breathes music—it is the very essence of her being. With her passion comes a high degree of particularity; Romy knows precisely which piece she wants me to play, at what tempo, and in which key. And woe betide me if I get it wrong.

When we started working together, six years ago, mistakes and misunderstandings occurred all too frequently since, as it turned out, there were very few pieces that Romy would tolerate: for example, the theme from Für Elise (never the middle section); the Habanera from Carmen; and some snippets from "Buckaroo Holiday" (the first movement of Aaron Copland's Rodeo). Romy's acute neophobia meant that even one note of a different piece would evoke shrieks of fear-cum-anger, and the session could easily grow into an emotional conflagration. So gradually, gradually, over weeks, then months, and then years, I introduced new pieces—sometimes, quite literally, at the rate of one note per session. On occasion, if things were difficult, I would even take a step back before trying to move on again the next time. And, imperceptibly at first, Romy's fears started to melt away. The theme from Brahms's Haydn Variations became something of an obsession, followed by the slow movement of Beethoven's Pathétique sonata. Then it was Joplin's The Entertainer, and Rockin' All Over the World by Status Quo.

Over the six years, Romy's jigsaw box of musical pieces—fragments ranging from just a few seconds to a minute or so in length—has filled up at an ever-increasing rate. Now it's overflowing, and it's difficult to keep up with Romy's mercurial musical mind: mixing and matching ideas in our improvised sessions, and even changing melodies and harmonies so they mesh together—or to ensure that my contributions don't! As we play, new pictures in sound emerge and then retreat as a kaleidoscope of ideas whirls between us. Sometimes a single melody persists for fifteen minutes, even half an hour. For Romy, no matter how often it is repeated, a fragment of music seems to stay fresh and vibrant. At other times, it sounds as though she is trying to play several pieces at the same time—she just can't get them out quickly enough, and a veritable nest of earworms wriggles its way onto the piano keyboard. Vainly I attempt to herd them into a common direction of musical travel.

So here I am, sitting at the piano in Roehampton, on a Sunday morning in mid-November, waiting for Romy to join me (not to be there when she arrives is asking for trouble). I'm limbering up with a rather sedate rendition of the opening of Chopin's Etude in C major, Op. 10, No. 1, when I hear her coming down the corridor, vocalizing with increasing fervor. I feel the tension rising, and as her father pushes open the door, she breaks away from him, rushes over to the piano and, with a shriek and an extraordinarily agile sweep of her arm, elbows my right hand out of the way at the precise moment that I was going to hit the D an octave above middle C. She usurps this note to her own ends, ushering in her favorite Brahms–Haydn theme. Instantly, Romy smiles, relaxes, and gives me the choice of moving out of the way or having my lap appropriated as an unwilling cushion on the piano stool. I choose the former, sliding to my left onto a chair that I'd placed earlier in readiness for the move that I knew I would have to make. I join in the Brahms, and encourage her to use her left hand to add a bass line. She tolerates this up to the end of the first section of the theme, but in her mind she's already moved on, and without a break in the sound, Romy steps onto the set of A Little Night Music, gently noodling around the introduction to Send in the Clowns.
But it's in the wrong key—G instead of E flat—which I know from experience means that she doesn't really want us to go into the Sondheim classic, but instead wants me to play the first four bars (and only the first four bars) of Schumann's Kleine Studie, Op. 68, No. 14. Trying to perform the fifth bar would, in any case, be futile, since Romy's already started to play . . . now, is it I am Sailing or O Freedom? The opening ascent from D through E to G could signal either of those possibilities. Almost tentatively, Romy presses those three notes down and then looks at me and smiles, waiting, and knowing that whichever option I choose will be the wrong one. I just shake my head at her and plump for O Freedom, but sure enough Rod Stewart shoves the spiritual out of the way before it has time to draw a second breath. From there, Romy shifts up a gear to the Canon in D—or is it really Pachelbel's masterpiece? With a deft flick of her little finger up to a high A, she seems to suggest that she wants Streets of London instead (which uses the same harmonies). I opt for Ralph McTell, but another flick, this time aimed partly at me as well as the keys, shows that Romy actually wants Beethoven's Pathétique theme—but again, in the wrong key (D). Obediently I start to play, but Romy takes us almost immediately to A flat (the tonality that Beethoven originally intended). As soon as I'm there, though, Romy races back up the keyboard again, returning to Pachelbel's domain. Before I've had time to catch up, though, she's transformed the music once more; now we're hearing the famous theme from Dvořák's New World Symphony. I pause to recover my thoughts, but Romy is impatiently waiting for me to begin the accompaniment.

Two or three minutes into the session, and we've already touched on twelve pieces spanning 300 years of Western music and an emotional range to match. Yet here is a girl who in everyday life is supposed to have no "theory of mind"—the capacity to put yourself in other people's shoes and think what they are thinking. Here is someone who is supposed to lack the ability to communicate. Here is someone who functions, apparently, at an 18-month level. But I say here is a joyous musician who amazes all who hear her. Here is a girl in whom extreme ability and disability coexist in the most extraordinary way. Here is someone who can reach out through music and touch one's emotions in a profound way. If music is important to us all, for Romy it is truly her lifeblood.1

How did Romy, severely learning disabled, become such a talented, if idiosyncratic, musician? In my view, it was her early inability to process language, in tandem with her inability to grasp the portent of many everyday sounds, that enhanced her ability to process all sounds in a musical way. The two were inextricably linked. Indeed, without the former, we can surmise that the latter would never have developed. Romy has AP, meaning that for her, as we have seen, mental images of musical sounds are distinct with regard to pitch. Hence, every note on the piano is instantly recognizable. But more than this, for Romy, each pitch provides a stable point of reference in a capricious world. And it's not just notes on the piano that function for Romy in this way. In her mind, each of the notes in any piece of music sounds distinct. While, for most of us, musical sounds pass by unremarkably in perceptual terms, for Romy, different notes and different chords can affect her profoundly: an E flat major harmony can make her quiver with excitement, for example, while G7 can make her cry.

In itself, though, absolute pitch is insufficient to make an exceptional musician; that takes at least seven thousand hours of practice (Sloboda et al. 1996). How, then, did Romy acquire her musical skills? Like many autistic children early in life, she developed an obsession. In her case this was a small electronic keyboard, whose notes lit up in the sequence needed to play one of a number of simple tunes. As far as Romy was concerned, this musical toy was one of only a few things with which she could meaningfully interact, and whose logic she could understand, and she spent hundreds of hours playing with it. The keyboard was comfortingly predictable in comparison with any human being—even her devoted family, whose language and behavior differed subtly from one occasion to another, as all human interaction does. The keyboard, though, invariably responded to Romy in the same way. Whenever she pressed a particular key, it always sounded the same as it did before. Here was something in the environment that Romy could predict and control. And so, through countless hours of self-directed exploration as a toddler, Romy discovered where all the notes (whose sounds she could hear in her head) are on the keyboard. Today, as a teenager, for Romy to play the piano merely requires her to hear a tune in her head (available to her through her internal library of songs, stored as a series of absolute auditory images) and play along with it, pressing down the correct keys in sequence as their pitches sound in her head. And this approach works not only for music. As we noted earlier, she will reproduce the sounds of the jet engines of planes as they descend toward Heathrow Airport, for example, and she unhesitatingly copies any ringtones that interrupt her piano lessons.

Absolute pitch can have other consequences for children on the autism spectrum too. The absolute representation of sounds in their heads appears to fuel musical imagination in a way that is more vivid, more visceral even, than the relative memory of intervals alone. And, although formal research is yet to be undertaken, the anecdotal accounts of parents and teachers suggest that earworms are widespread, evidenced most obviously in some children's incessant vocalizing of melodic fragments. With minds full of tunes that seem to be playing the whole time, external sounds can be at best superfluous and at worst an irritation, as the following account of a session with Freddie, then eleven years old, shows (Figure 20.7).

Freddie—the Silent Musician

"Why's he doing that?" Freddie's father, Simon, sounded more than usually puzzled by the antics of his son. After months of displacement activity, Freddie was finally sitting next to me at the piano, and looked as though this time he really were about to play. A final fidget and then his right hand moved towards the keys. With infinite care, he placed his thumb on middle C as he had watched me do before—but without pressing it down. Silently, he moved to the next note (D), which he feathered in a similar way, using his index finger, then with the same precision he touched E, F and G, before coming back down the soundless scale to an inaudible C. I couldn't help smiling.


Figure 20.7  Freddie picks out a note on the piano (image © 2012 The University of Roehampton).

"Fred, we need to hear the notes!"

My comment was rewarded with a deep stare, right into my eyes. Through them, almost. It was always hard to know what Freddie was thinking, but on this occasion he did seem to understand and was willing to respond to my request, since his thumb went back to C. Again, the key remained unpressed, but this time he sang the note (perfectly in tune), and then the next one, and the next, until the five-finger exercise was complete. In most children (assuming that they had the necessary musical skills), such behavior would probably be regarded as an idiosyncratic attempt at humor or even mild naughtiness. But Freddie was being absolutely serious and was pleased, I think, to achieve what he'd been asked to do, for he had indeed enabled me to hear the notes! He stared at me again, evidently expecting something more, and without thinking I leant forward.

"Now on this one, Fred," I said, touching C sharp.

Freddie gave the tiniest blink and a twitch of his head, and I imagined him, in a fraction of a second, making the necessary kinesthetic calculations. Without hesitation or error, he produced the five-finger exercise again, this time using a mixture of black and white notes. Each pressed silently. All sung flawlessly.

And then, spontaneously, he was off up the keyboard, beginning the same pentatonic pattern on each of the twelve available keys. At my prompting, Freddie re-ran the sequence with his left hand—his unbroken voice hoarsely whispering the low notes. So logical. Why bother to play the notes if you know what they sound like already? So apparently simple a task, and yet . . . such a difficult feat to accomplish: the whole contradiction of autism crystallized in a few moments of music making.2

As I later said to Freddie's father, if I had wanted to teach a "neurotypical" child to do what his son had achieved with little or no apparent effort, it would probably have taken many lessons and hundreds of hours of practice for the pupil to master the relationship between the Western tonal system and the asymmetrical (yet regular) layout of the piano keyboard. Yet Freddie had done it merely by watching and listening to what I had done, attending to the streams of notes flowing by, extracting the implicit rules of Western musical syntax, and using these to create patterns of sounds anew. The crucial point is that I had never played Freddie the full sequence of scales that he subsequently produced. He had worked out the necessary structures intuitively, merely through exposure to music.

Conclusion

In this chapter, we have seen how some children on the autism spectrum appear to have aural imaginations that are rooted in processing a range of everyday sounds and even speech in a musical way. The way they perceive, remember, and imagine sounds has a high level of intensity born of their sense of AP. This enables them to play by ear—a skill that is often acquired entirely through their own efforts and that typically first manifests itself in the early years. But more than this, for Freddie, for Romy, and for many other children on the autism spectrum, music may be the key not only to aesthetic fulfillment, but also to communication, shared attention, and emotional understanding. It can do this because it is a language built not on symbolic meaning but on repetition; on order and on predictability in the domain of sound. With musically empathetic adults with whom to interact, this love of pattern—insistence, even—need not restrict the children's auditory imaginations but can emancipate them, through the capacity to understand musical structure and the rules of the generative grammars through which melodies, harmonic sequences, and rhythms are created afresh.

Notes

1. http://blog.oup.com/2012/12/music-proxy-language-autisic-children. Accessed September 15, 2017.
2. http://www.huffingtonpost.com/adam-ockelford/autism-genius_b_4118805.html. Accessed September 15, 2017.


References

Brandt, A., M. Gebrian, and L. R. Slevc. 2012. Music and Early Language Acquisition. Frontiers in Psychology 3. doi:10.3389/fpsyg.2012.00327.
DePape, A.-M. R., G. B. C. Hall, B. Tillmann, and L. J. Trainor. 2012. Auditory Processing in High-Functioning Adolescents with Autism Spectrum Disorder. PLoS One 7 (9): e44084. doi:10.1371/journal.pone.0044084.
Fay, W. H. 1967. Childhood Echolalia. Folia Phoniatrica et Logopaedica 19 (4): 297–306. doi:10.1159/000263153.
Fay, W. H. 1973. On the Echolalia of the Blind and of the Autistic Child. Journal of Speech and Hearing Disorders 38 (4): 478. doi:10.1044/jshd.3804.478.
Gaver, W. W. 1993. What in the World Do We Hear? An Ecological Approach to Auditory Event Perception. Ecological Psychology 5 (1): 1–29. doi:10.1207/s15326969eco0501_1.
Lamont, A. 2008. Young Children's Musical Worlds: Musical Engagement in 3.5-Year-Olds. Journal of Early Childhood Research 6 (3): 247–261. doi:10.1177/1476718x08094449.
Lecanuet, J.-P. 1996. Prenatal Auditory Experience. In Musical Beginnings, 3–34. Oxford: Oxford University Press.
Malloch, S., and C. Trevarthen, eds. 2009. Communicative Musicality: Exploring the Basis of Human Companionship. New York, NY: Oxford University Press.
Masataka, N. 2007. Music, Evolution and Language. Developmental Science 10 (1): 35–39.
McEvoy, R. E., K. A. Loveland, and S. H. Landry. 1988. The Functions of Immediate Echolalia in Autistic Children: A Developmental Perspective. Journal of Autism and Developmental Disorders 18 (4): 657–668. doi:10.1007/bf02211883.
Mcglone-Dorrian, D., and R. E. Potter. 1984. The Occurrence of Echolalia in Three Year Olds' Responses to Various Question Types. Communication Disorders Quarterly 7 (2): 38–47. doi:10.1177/152574018400700204.
Miller, L. 1989. Musical Savants: Exceptional Skill and Mental Retardation. Hillsdale, NJ: Lawrence Erlbaum.
Mills, A. 1993. Visual Handicap. In Language Development in Exceptional Circumstances, edited by D. Bishop and K. Mogford, 150–164. Hove: Psychology Press.
Norman-Haignere, S., N. G. Kanwisher, and J. H. McDermott. 2015. Distinct Cortical Pathways for Music and Speech Revealed by Hypothesis-Free Voxel Decomposition. Neuron 88 (6): 1281–1296. doi:10.1016/j.neuron.2015.11.035.
Ockelford, A. 2005. Repetition in Music: Theoretical and Metatheoretical Perspectives. Farnham: Ashgate.
Ockelford, A. 2009. In the Key of Genius: The Extraordinary Life of Derek Paravicini. London: Random House.
Ockelford, A. 2012. Music, Language and Autism. London: Jessica Kingsley.
Ockelford, A. 2013. Applied Musicology: Using Zygonic Theory to Inform Music Education, Therapy, and Psychology Research. New York, NY: Oxford University Press.
Ockelford, A. 2017. Comparing Notes: How We Make Sense of Music. London: Profile Books.
Patel, A. D. 2012. Language, Music, and the Brain: A Resource-Sharing Framework. In Language and Music as Cognitive Systems, edited by P. Rebuschat, M. Rohmeier, J. A. Hawkins, and I. Cross, 204–223. Oxford: Oxford University Press.
Prizant, B. 1979. An Analysis of the Functions of Immediate Echolalia in Autistic Children. Dissertation Abstracts International 39 (9-B): 4592–4593.

Prizant, B. M., and J. F. Duchan. 1981. The Functions of Immediate Echolalia in Autistic Children. Journal of Speech and Hearing Disorders 46 (3): 241. doi:10.1044/jshd.4603.241.
Reese, A. 2014. The Effect of Exposure to Structured Musical Activities on Communication Skills and Speech for Children and Young Adults on the Autism Spectrum. Unpublished PhD thesis, University of Roehampton, London.
Sloboda, J. A., J. W. Davidson, M. J. Howe, and D. G. Moore. 1996. The Role of Practice in the Development of Performing Musicians. British Journal of Psychology 87 (2): 287–309.
Stalinski, S. M., and E. G. Schellenberg. 2010. Shifting Perceptions: Developmental Changes in Judgments of Melodic Similarity. Developmental Psychology 46 (6): 1799–1803. doi:10.1037/a0020658.
Sterponi, L., and J. Shankey. 2013. Rethinking Echolalia: Repetition as Interactional Resource in the Communication of a Child with Autism. Journal of Child Language 41 (2): 275–304. doi:10.1017/s0305000912000682.
Takeuchi, A. H., and S. H. Hulse. 1993. Absolute Pitch. Psychological Bulletin 113 (2): 345.
Treffert, D. 2009. The Savant Syndrome: An Extraordinary Condition. A Synopsis: Past, Present, Future. Philosophical Transactions of the Royal Society B: Biological Sciences 364 (1522): 1351–1357. doi:10.1098/rstb.2008.0326.
Vamvakari, T. 2013. My Child and Music: A Survey Exploration of the Musical Abilities and Interests of Children and Young People Diagnosed with Autism Spectrum Conditions. Unpublished master's thesis, University of Roehampton, London.
Voyajolu, A., and A. Ockelford. 2016. Sounds of Intent in the Early Years: A Proposed Framework of Young Children's Musical Development. Research Studies in Music Education 38 (1): 93–113. doi:10.1177/1321103x16642632.

Chapter 21

Multimodal Imagery in the Receptive Music Therapy Model Guided Imagery and Music (GIM)

Lars Ole Bonde

Introduction

Music is a "technology of the self," as Tia DeNora (2000) concluded in her pioneering study of how music is used in everyday life. DeNora based her study on interviews and observations, focusing on how music was used in contexts as different as aerobic exercise classes, karaoke evenings, and music therapy sessions. DeNora elaborated on Gibson's (1983) concept of affordance—in this case documenting how listening to music can offer the listener a variety of options for use (affordances), mirrored in specific appropriations related to the listener's needs and the context. Since DeNora's study, a number of empirical studies have provided further evidence of how music listening is appropriated, that is, used for a number of purposes (Bonde et al. 2013; Clarke 2005; Lilliestam 2013). In my own research, I have concentrated on health music(k)ing; that is, how music can be used as/in therapy and as a health resource in everyday life (Bonde 2000, 2005, 2007, 2010, 2017; Bonde and Blom 2016). An ongoing study on music and public health (Bonde et al. 2018; Ekholm et al. 2016a, 2016b) documents that two-thirds of the adult Danish population use music for relaxation and mood regulation and that an equal number regard music as a health resource.

In this chapter, I will focus on a specific model of receptive music therapy—that is, psychotherapy based on imagination facilitated by music listening—namely the Bonny Method of guided imagery and music (GIM), because this model can illustrate the close connection between music, body, and mind,

as it emanates spontaneously in multimodal imagery during music listening. Imagery and imagination are both important parts of our mental life. Since Descartes, Western philosophers have placed mental imagery at the center of imagination, as a kind of "perception with the mind's eye," and it is common knowledge that it is possible to imagine even phenomena that we have never experienced in "real life" (Kind 2006). This is a core feature of creative thinking and imagination, and highly relevant also in psychotherapy. There have been many controversies, especially among psychologists, over the nature of mental images, and the dichotomy of "pictorialism" versus "descriptionalism" will briefly be mentioned in the theoretical section.

The chapter opens with a brief introduction to the GIM model and its clinical development over the last forty years; then some clinical material is presented and analyzed from a neuroaffective perspective; the rest of the chapter will outline different theoretical perspectives on music and imagery and describe selected and ongoing research in GIM as "sound imagination." Even if GIM is used clinically as a therapeutic method, this type of music listening can also be used in nonclinical settings, for instance, for self-development.

Clarke (2011) describes the difficulties of studying "what it is like to hear music" and the lack of data to support a scientifically grounded understanding of musical consciousness. However, recordings and transcripts of GIM sessions provide unique data that can contribute to such an understanding. Researchers can study the spontaneous reports of experiences supported and evoked by the music in GIM in many different ways; for example, by performing thematic analyses of transcripts (Blom 2014; Bonde and Beck 2018) or by analyzing EEG signals from clients as well as therapists ("travelers" and "guides") recorded during therapy (Fachner et al. 2015). This will be illustrated in the chapter, and the value of such data will be discussed in a broader perspective.

Guided Imagery and Music—Music Listening as Psychotherapy

The American musician and music therapist Helen Lindquist Bonny (1921–2010) developed a new model of receptive music therapy in the 1970s and 1980s. It is called the Bonny Method of GIM, and today it is the internationally best-known receptive music therapy model, with training, clinical work, and research on four continents. The Bonny Method is the name of an individual session format developed by Bonny, while GIM is a generic concept encompassing many different individual or group formats using music, imagery, and verbal dialogue in/as therapy (Bruscia 2002). The Bonny Method is "a model of music psychotherapy centrally consisting of a client imaging spontaneously to pre-recorded sequences of classical music" (Abrams 2002, 103). It should be added that the spontaneous imaging to (classical) music in GIM is based on the induction of an altered state of consciousness (ASC) through deep relaxation (e.g., progressive muscle relaxation or autogenic suggestions) and supported by a continuous dialogue in an interactive dyad format.

A Bonny Method session lasts 90–120 minutes and is composed of five elements or stages: (1) prelude (verbal check-in and focusing, 15–20 minutes); (2) relaxation/induction (the client lies down with closed eyes, 4–8 minutes); (3) music travel (exploring imagery to therapist-selected, prerecorded classical music, with the client in an altered state and guided by the therapist, 30–45 minutes); (4) return (to a normal state of consciousness and an upright position, often facilitated by mandala drawing, 5–10 minutes); (5) postlude (verbal dialogue with discussion and integration of the music/imagery experience, as related to the prelude focus, 10–20 minutes) (Bonny 1978a; Bonde 2010; Bruscia 2002).

This classical, individual session format is used when clients and patients have enough strength and stamina to "travel" for 30–45 minutes. In many clinical contexts, this is not the case, and therefore a number of guided or unguided "music and imagery" formats have been developed for individual and group work in many different clinical contexts (Grocke and Wigram 2007; Grocke and Moe 2015). In these types of GIM the relaxation/induction can be shorter and clients may sit up; the music listening phase can be reduced to a few minutes; there may be no dialogue or guiding interventions during music listening; the music can be nonclassical; the processing may be verbal only; and so forth. In all formats of GIM, the music functions as a container for the imagery experience, which can be supportive, reeducative, or reconstructive (Summer 2002, 2009). The choice of music follows similar principles, with a span from gentle and supportive music to intense and challenging music, as explained in a "taxonomy of therapeutic music" (Wärja and Bonde 2014). The so-called intensity profile of a piece of music depends on its stability (predictability) or variability (unexpected changes) in the tension and release of the musical parameters.

Grocke (2010) presents an overview of GIM research, including quantitative effect studies in medical conditions (e.g., hypertension, rheumatoid arthritis, cancer, heart surgery), as well as qualitative studies: primarily a large number of case studies, studies of therapeutic processes, imagery development, and the role of the music in GIM. In the last five years, a number of studies of GIM in new clinical areas (e.g., refugees with war traumas, clients with simple or complex PTSD, patients in palliative care, war veterans who are victims of abuse) have broadened the evidence base of GIM (McKinney and Honig 2017).

Multimodal Imagery in GIM—Examples and a Neuroaffective Perspective

Music listening, both in and outside GIM, can evoke and support imagery in all sensory modalities: visual, auditory, olfactory, gustatory, and sensory-kinesthetic. In GIM theory and practice, emotions and memories are also considered imagery modalities.

Every individual—client or not—has a personal style of imaging in which a certain modality can be dominant; image sequences can be superficial (stream-of-consciousness) or organic and intense; the tempo of imagery development may be slow or fast; the body may be calm or expressive; the dialogue sparse or rich; and the guiding must be adapted accordingly. In Table 21.1 it is possible to follow several examples of how imagery to one piece of music is highly individual, while at the same time there are common factors related to the music as it unfolds.

Imagery can unfold in many ways in the "music travel" of the GIM session. Typically, it is configured as narrative episodes centered round one or a few core images/metaphors. In some cases the music travel inspires a whole, coherent story—a narrative engaging all sensory modalities and following the principles of narrative configuration or mimesis, as described by Ricoeur (1978; see also Bonde 2000, 2004, 2005). In the following, a number of image experiences to the same music are presented. Data come from doctoral research (Bonde 2005), and include the therapist's session notes and recordings of four participants' music travels to the same music: Bach's Mein Jesu, was für Seelenweh! BWV 487, in an orchestral (string) arrangement by Leopold Stokowski. The piece is part of the GIM music program Mostly Bach (constructed by Bonny in 1977). The participants in this project were cancer survivors, and the theme "Death," or fear of death, would now and then appear in the imagery, especially when they listened to this particular piece of music. The original song text is very emotional and describes Jesus in the garden of Gethsemane on the night before his death, from the perspective of a compassionate witness. The Stokowski arrangement is purely instrumental, but the very slow tempo and expressive phrasing bring the passionate nature of the melody and harmony to the foreground. Extra data material comes from a research workshop where a group of music therapy researchers listened to the music; their experiences and comments are synthesized in a single entry per episode.

The presentation format is called "event structure analysis" (Tesch 1990; Marr 2001), and it outlines how the imagery is connected to specific sections of the music. Table 21.1 shows that the imagery is personal (see the four participants' entries) and multimodal (see the imagery codes; olfactory and gustatory imagery were not present in this dataset). All four participants work with their relationship to death and dying. For Mrs. H, it is very painful to go in that direction. She resists and reacts with a high degree of body tension. Mrs. A also has strong bodily reactions; however, she accepts the tension and explores her ambivalent emotions, with acceptance of the ultimate transition on the one hand and doubt and not-knowing on the other. In the end (after a repetition of the piece) she accepts the ambivalence. Mrs. F explores "the garden of Death" and is reassured by strong images of Death as a friend. Mrs. L reports a metaphoric fantasy of "the death of an elephant." She identifies the elephant as herself and, quite unexpectedly, a lotus appears as a sign of protection and reassurance. The research workshop group identifies the changing moods in the music and confirms, at a more analytical level, how the music affords an existential exchange related to very strong and difficult emotions.
The event structure presentation format gives information on how the dynamic and emotional narratives of the listeners follow the music’s development closely.
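For readers who want to work with such transcripts computationally, a minimal sketch of how an event structure analysis could be represented as data is given below. This is my own illustration, not part of Bonde's method; the class name, field names, and example entries are assumptions modeled loosely on the legend of Table 21.1.

```python
# Illustrative sketch (my own, not from the chapter): one way to hold an
# "event structure analysis" as data, pairing musical episodes with coded
# imagery reports from several travelers.
from dataclasses import dataclass, field
from typing import Dict, List

# Modality codes as in Table 21.1: V = visual, A = auditory, S = sensory-kinesthetic,
# O = olfactory, G = gustatory, E = emotions, M = memories, R = reflections/thoughts,
# T = transpersonal, Ot = other (e.g., body tension).

@dataclass
class Episode:
    label: str                                   # e.g., "A1"
    bars: str                                    # e.g., "1-6"
    music: str                                   # phenomenological cue for the music
    codes: List[str]                             # imagery modality codes in this episode
    imagery: Dict[str, str] = field(default_factory=dict)  # traveler -> reported imagery

analysis: List[Episode] = [
    Episode(
        label="A1",
        bars="1-6",
        music="Strings only; cello carries the melody, soft and muted",
        codes=["E", "R", "S", "V"],
        imagery={
            "Mrs. L": "Now I feel sadness.",
            "Mrs. H": "A cemetery. Graves and stones.",
        },
    ),
]

# A simple query: which episodes contain sensory-kinesthetic (S) imagery?
for episode in analysis:
    if "S" in episode.codes:
        print(episode.label, episode.bars, sorted(episode.imagery))
```

Such a representation makes it straightforward to cross-tabulate imagery modalities against musical sections, which is essentially what the event structure format does on paper.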

Table 21.1  Event Structure Analysis of Four Cancer Survivors' and a Group of Researchers' Imagery Experiences during Listening to Bach, Mein Jesu! BWV 487 (Stokowski arrangement). Most guide interventions are included within the participants' entries (shown in italics in the original).

Episode A1 (bars 1–6). Imagery codes: E, R, S, V.
Music: Strings only. They sound soft and muted. Cello plays the melody. The bowing is continuous.
Mrs. L (6,2–3): Now I feel sadness. Allow yourself to feel that. Where do you feel the sadness? In my head.
Mrs. F (8,4–5): I want Death to be my friend.
Mrs. A (10,2–3): What happens in your neck? Something is pressing (she yawns, massages her jaws, tears).
Mrs. H (9,4–5): A cemetery. Graves and stones.
Workshop: The music evokes visual imagery and emotional reactions.
Comments: The sad and sombre mood from the previous track (Bach: Komm süsser Tod) is deepened by the soft and earnest voice of the celli.

Episode A2 (bars 1–6, repeated). Imagery codes: V, R, E.
Music: Violins take over the melody, one octave higher.
Mrs. L: I see an elephant. It is huge and tired, moves heavily. The battle is lost. What battle?
Mrs. F: Is anything preventing you from that? I don't think so. Is Death nearby? Yes. How does it look?
Mrs. A: Can you feel what the press is about?
Mrs. H: It reminds me of death. I don't want to go into that.
Workshop: Mood 2 (sadness, sorrow, loneliness). Opening toward a meeting or a vast space.
Comments: The initial statement is confirmed by the violins one octave higher.

Episode B1 (bars 7–14). Imagery codes: V/E, E/A, S.
Music: Celli take over the melody again. The chromatic quavers end with the first breathing point. Second breathing point is before the last phrase.
Mrs. L: The battle about managing everything. That's why it is sad.—I am the elephant. It didn't succeed. It will be shot, I think. It has given it up. A lotus suddenly appears.
Mrs. F: It is light and mild—like the angel. It says: "Be not afraid!"
Mrs. A: Allow yourself to feel the feelings.
Mrs. H: The body is tense all over.
Workshop: Expansion and exploration—dialogue is possible. Minor mood changes: 1 (spiritual, dignified, serious) or 3 (longing, yearning).
Comments: The tension builds up through the harmonic underpinning of the chromatic melodic line. The breathing points allow the listener to digest and let go. The final melodic phrase is like a prayer.

Episode B2 (bars 7–14, repeated). Imagery codes: E/V.
Music: Violins take over. Dynamic intensity, both in crescendi and in diminuendi. Surprising subito piano at the end of the chromatic phrase.
Mrs. L: Just above the head of the elephant. How does that feel? Very good. Very confident. The lotus is a sign that someone holds his hand over it, even if it can't see it . . . I can see it.
Mrs. F: How is it for you to hear that?
Workshop: Images of death/rebirth or saying goodbye are possible.

Episode B3 (coda). Imagery codes: E, R.
Music: Last three bars are repeated, with celli playing the melody. Ends on a D major chord.
Mrs. L: How is it for you to be aware of that? (Coughs). It feels safe.
Mrs. A: It is something about accept—will I ever reach the other side?—And what is the other side? (tears). How is it for you right now? Both difficult and OK.
Comments: The celli repeat the final phrase in an introverted confirmation of the necessity of prayer. The final major chord offers comfort.

Note: Episodes correspond with the phenomenological description and formal analysis of the music. Imagery codes: V = visual, A = auditory, S = sensory-kinesthetic, O = olfactory, G = gustatory, E = emotions, M = memories, R = reflections and thoughts, T = transpersonal, Ot = other (e.g., body tension). The Music entries are cues referring to the phenomenological description and the intensity profile of the music. The four participants' imagery is identified by session and music selection (e.g., 1,1 = first session, first music selection). Workshop entries report results from a research workshop with music therapy researchers as participants; mood numbers refer to Hevner (1936). Comments give the author's hermeneutic interpretation of music and image potential.

The clinical outcome of the participants' music and imagery experiences is reported elsewhere (Bonde 2005, 2007). In the context of this chapter, I will examine the experiences from a neuroaffective perspective (Hart 2012; Lindvang and Beck 2017). Neuroaffective theory describes and explains how affects and emotions are aroused and regulated in different states of consciousness and at three basic neurological levels—as presented, for example, in the theory of the triune brain (the autonomous nervous system, the limbic system, and the neocortex)—and its relevance for psychotherapy (MacLean 1990; Hart 2012; Lindvang and Beck 2017). Imagery experiences in GIM are fine illustrations of how these levels are at work. In the same session, in the slightly altered state (facilitated by the deep relaxation before the music travel), multimodal images are evoked during music listening and can be correlated with alpha waves in the brain activity and with responses at the autonomous level of the nervous system, focused on sensory perception and arousal regulation. Imagery is closely connected with emotions, processed in the limbic system, while the ongoing dialogue between client and therapist makes a verbal-metaphorical bridge to the frontal cortex system, focused on mentalization (Fachner et al. 2015; Hunt 2011, 2015, 2017).

In Mrs. F's travel, there are very few words. However, the imagery is intense and concentrated on the existential question: I may die soon; how shall I approach this fact? Emotionally, she moves between despair and hope, but in the music travel she experiences a transformation of anxiety. Hope is activated when Death appears as a friend, not a foe. This transformation is both emotional (relief and joy) and bodily (deep breathing, serenity). The music travel of Mrs. L illustrates neuroaffective theory very clearly. First, a sensory response indicates a change in perception (level one: autonomous); then emotions arise (level two: limbic) and images are evoked; and, finally, the transformative experience of being the elephant links to the frontal level—the client even mentalizes the elephant as herself. The final phase of the GIM session—the postlude dialogue—is the stage of integrating the neuroaffective levels by examining emotions, images, and their connection with the theme in focus. Such existential experiences can lead to new coping strategies and increased self-awareness.

Together with the growing research on music in everyday life, studies like this suggest that GIM and other types of deep music listening have an almost unexplored health potential and should be used in prophylactic projects. The transformative potential of such experiences can be illustrated by results from a study of GIM with a nonclinical population (Blom 2014; Bonde and Blom 2016). Ten participants volunteered for a project presented as "Self-development through music and imagery." Six participants had previous GIM experience and were offered three sessions. Four participants had never experienced GIM; they were offered five sessions. Advanced GIM music programs, identified as potential sources of transformation, were used in the sessions. All programs included strong and challenging music, with the purpose of inspiring and facilitating existential and spiritual processes of transformation.
Participants filled in questionnaires on existential well-being, and they were interviewed about their experiences together with the therapist (in so-called collaborative interviews); all session transcripts were analyzed. This analysis documented that all ten participants used GIM to facilitate deep existential work. They all reported strong experiences of beauty and confirmation at a deep level of being. The experience of surrender (Blom 2014—described later) could be documented for eight of ten participants in the sessions, and, in the interviews, they described the seminal influence of these music and imagery experiences on their inner and outer life.

Such "strong music experiences" have been documented in the literature of music psychology, music sociology, ethnomusicology, and music therapy. One of the pioneer researchers, the Swedish music psychologist Alf Gabrielsson (2011), collected more than a thousand first-person reports on such experiences. He and his colleagues analyzed them phenomenologically and developed a descriptive categorization of characteristics and types. Gabrielsson and other Scandinavian researchers have looked into the health potential of such experiences (Bonde et al. 2013; Lilliestam 2013). These studies indicate that strong music experiences not only have existential meaning for the listener but also are health promoting—especially when they are shared, such as in individual or group therapy (Stern 2010).

Theories of Consciousness, Music, Imagery, Emotion, and Health—as Related to GIM

The GIM experience is complex, and many types of theories are relevant as part of the framework for understanding how music, imagery, guiding, drawing, and verbal processing work together. Helen Bonny, the creator of GIM, thought of GIM as a transformational practice, enabling even transpersonal experiences through music listening. She was influenced by transpersonal psychology and worked for some years together with Stanislav Grof at Maryland Psychiatric Center on the selection of music for experimental LSD sessions (Bonny 1975, 2002a, 2002b, 2002c). Her so-called Cut-log diagram (Bonny 1975) is a theoretically based map of the mind, integrating layers and states of consciousness known from the psychological theories of Freud, Jung, Grof, and Wilber. The diagram reflects the enormous diversity of GIM experiences in thousands of travelers and how the GIM experience can lead the client to many different layers or states in the same session. The diagram and the theory have been further developed by other GIM therapists (Goldberg 2002; Clark 2014). Clark (2014) documents how the original two-dimensional model was expanded into three dimensions: "funnel" models (Bush 1995), a holographic model (Goldberg 2002), and Clark's own "synthesis" model of the "invisible, interpenetrating fields" of center and periphery, consciousness, music, guide, and traveler. Bonde (2000, 2004, 2005) studied GIM experiences inspired by metaphor theory (Ricoeur 1978; Lakoff and Johnson 1980, 1999; Johnson 2007). Based on these studies,
I suggest that (mostly nonverbal) images are reported as metaphors in the traveler–guide dialogue, and that images are configured in narrative scenes, episodes, or complete narratives revealing embodied core metaphors and "scripts" that can be processed therapeutically. In this type of dialogic music listening, imagery is reported as a metaphorical narrative of experiences in other sensory modalities (Horowitz 1983, see later). I also studied the relationship between music and imagery—among GIM practitioners often described with the didactic metaphor of "music as cotherapist" (Bonde 2010; Wärja and Bonde 2014). Based on a number of event structure analyses (see Table 21.1 for an example), I formulated a series of grounded theories (Bonde 2005, 2017), addressing steps or stages in the therapeutic process and the roles and functions of the musical elements (melody, harmony, rhythm, form, style, etc.) in GIM. Here are a few observations from the theory of narrative patterns (in the imagery configuration) related to musical structure (Bonde 2005, 2017): The clearer the narrative structure of the music, the clearer this will be reflected in the imagery. Music introducing higher intensity and tension is reflected in the imagery in many ways: a change of perspective is seen, manifest action may replace hesitation or a block, emotional outlets may follow reflections, sudden insights ("messages") are experienced, or the imagery develops in a new direction. Examples can be seen in Table 21.1; for example, in the development of the "Death of an elephant" story, where the intimate relationship between musical form and narrative form is demonstrated. A ternary form in the music may impose a ternary narrative or dramatic structure on the imagery. Simplicity and complexity are complementary in the development of music and imagery. Simple musical forms with many repetitions tend to stabilize the imagery, inviting extended descriptions and a differentiation of (emotional) qualities, while complex or developmental forms with many changes or transformations tend to impose a dynamic process on the imagery. This theory is closely related to how DeNora (2011) understands GIM as a "laboratory" where music "provides structures for formulating thought and . . . knowledge of the world":

GIM is an excellent natural laboratory, a place in which to see how agents transfer musical properties to extra-musical properties and how they come to understand those extra-musical matters through the sonic structure of music, and in real time, that is, in direct correlation with the unfolding musical event. (317)

Based on Bonny's early ideas of the "profile of affective/energy dynamics" of the music in GIM (1978b), I developed a basic classification of "therapeutic music in GIM," distinguishing between the specific intensity profiles of (1) supportive music, (2) mixed supportive/challenging music, and (3) challenging music (Bonde 2005). The classification was later developed into a "taxonomy" (Wärja and Bonde 2014) describing in more detail how the ebb and flow of musical tension and release can be understood in a therapeutic context.

Theories of imagery form a controversial field in clinical psychology. For what is imagery actually, and how is it related to imagination? The psychologist and psychotherapist Horowitz (1983) presented a theory of mental representation, with imagery in a central role. In this theory, there is a distinction between three modes of representation—three types of "thinking." According to Horowitz, enactive representation is the "thinking of the body"; mostly, this kind of knowledge is tacit and implicit, and it is the first to be developed in the child. Image representation is next in the developmental process, a specific way of processing information with the inner senses—with at least six modalities: visual, auditory, sensory-kinesthetic, olfactory, gustatory, and emotional. The latest stage in the developmental process is thinking in words and concepts (logic and numbers), what Horowitz calls lexical representation. Horowitz's theory is a relevant framework for the understanding of GIM experiences, where all three modes of representation are active and where metaphors bridge them. It is also close to neuroaffective theory. Thinking in multimodal images that are expressed verbally in metaphors and narrative episodes is much more common and important than we normally think, and music is probably the most image-stimulating and -evoking medium that exists. In dreams, daydreams, and creative imaginative states of consciousness, imagery belongs to a specific form of human creativity. In cognitive psychology, however, there has been a debate going on for decades about how to understand mental imagery and its role in cognition (Kind 2006). There are two competing views: propositional and depictive (descriptionalism versus pictorialism; the first claiming that images are represented roughly in the way language is represented, the latter that images are represented roughly in the same way as pictures). Based on my GIM studies, I am in line with Kosslyn and colleagues (2006), who support the depictive view and contend not only that mental images depict information but also that these depictions play a functional role in human cognition (for example, problem solving, memory, creativity).

From the perspective of interpersonal psychology, the study by Blom (2011, 2014) takes music and imagery (and GIM research and theory) to a new level. The study of imagery in GIM has long focused on the content of the imagery, and systems of classification have been suggested (Grocke 1999, 2007). As an alternative, Blom suggests that the focus should be on process, based on the premise that music in GIM is a relational agent, with the musical elements metaphorically serving as relational ingredients with transformational potential. The therapeutic relationship (the triangle of music–therapist–client) is the interpersonal framework of that process, including explicit and implicit negotiation, disruption and repair, and moments of intense affectivity. Based on a thorough analysis of music and imagery in ten nonclinical participants' thirty-eight music travels to advanced GIM music programs, she developed an intersubjective understanding of the process of "surrender" in GIM. The processes and the shared multimodal imagery can be divided into six categories, the first three describing basic ways of sharing (1. shared attention, 2. shared intention, 3. shared affectivity) while the last three are genuine interpersonal experiences (4. confirmation, 5. nonconfirmation, 6. surrender or transcendence).

Imagery is only mentioned briefly in two recent handbooks of music psychology (Hallam et al. 2009; Juslin and Sloboda 2011).
However, in experimental music psychology both modality-independent and modality-specific imagery have been studied using functional neuroimaging techniques (McNorgan 2012). McNorgan's review suggests there is a core network of brain regions recruited during all types of imagery, while modality-specific imagery is associated with increased activation in corresponding sensorimotor regions (10). Hubbard (2010) reviewed empirical studies of auditory imagery, confirming the activation of neural regions also activated in auditory perception. However, these studies of imagined intervals, melodies, and other musical and verbal elements are not so relevant in the context of multisensory imagery in GIM.

Visual imagery is part of Juslin and Västfjäll's ecological theory of human perception of sound and music. They use the acronym BRECVEMA for the eight suggested mechanisms, conscious as well as unconscious, for the induction of emotion through music: (1) Brain stem reflex, (2) Rhythmic entrainment, (3) Evaluative conditioning, (4) Contagion, (5) Visual imagery, (6) Episodic memory, (7) Musical expectancy, and (8) Aesthetic judgment. Juslin, Barradas, and Eerola (2015) have tested four of these mechanisms experimentally (nos. 1, 4, 6, and 7, i.e., not including imagery). The results confirm the hypothesis that evoked emotions were related to specific target-mechanism conditions, and the authors conclude that a multimechanism framework is necessary to explain how emotional responses to music are mediated. In the context of this chapter, mechanism no. 5—visual imagery—is of special interest. Juslin and Västfjäll write that emotions are evoked because the listener experiences imagery during music listening by metaphorical transfer from the musical structure (e.g., a rising melodic contour) to personal images and fantasies (e.g., a sunrise). This explanation is in line with the theories of Bonny as well as Lakoff and Johnson (mentioned earlier). However, this transfer should not be limited to the visual sense only. Juslin and Västfjäll mention that our remote human ancestors depended on recognizing and identifying sound patterns and their meaning; thus, sound and imagery had an important phylogenetic function. As several chapters in this book show, it is not very likely that only one sense should be involved in such an important cognitive operation.

Many of the same elements or mechanisms are discussed by Clarke et al. (2015). They suggest a new "model of musical empathic engagement, from a listening perspective" (18). This complex and promising theoretical model does not mention imagery explicitly; however, the concept of "mimetic resonance" is very close: "a tendency to hear musical events in 'anthropomorphic' or more broadly animated ways, according to their gestural, vocal, or dynamic qualities; and incorporating mirror neurons and other components of perception–action mimicry" (21). This could easily be read as a description of how music affords the listener a number of options that can be explored through music and imagery (DeNora 2011). In his ecological theory of music listening, Clarke (2005) emphasizes the continuity between music and the realities of everyday life and links the auditory experience of music in modern human beings to the practical functions of auditory perception in phylogenesis. Clarke does not mention imagery but, like DeNora, he uses Gibson's concept of "affordance" and underlines the importance of the social and the musical context for the actual affordance(s), and therefore also for the appropriations by the listener.

The neuroscience of music has developed considerably over the last twenty years (Christensen 2012). Cognitive neuroscience has broadened our understanding of how music is processed in the brain, and of how the complex interplay of music and emotion involves all three "systems" of the brain, as mentioned in the section on neuroaffective theory earlier. However, there are not many neuroscientific studies of spontaneous, music-evoked imagery or of GIM experiences. An early study by Lem (1999) presented a promising way of using EEG to document brain activity during listening to a piece of music from the GIM repertoire and correlating this with the imagery reported post hoc. In a recent neurophenomenological study (Hunt 2017), a similar method was used to investigate brain activity during music listening. The participants listened to music and a script focusing on only one of six specific imagery modalities: body, visual, kinesthetic, interaction, affect, and memory (Hunt 2017). In these studies, there was no dialogue and no verbal reporting during music listening—the imagery cannot be reported immediately because talking and movements disturb the EEG signal. Therefore, it has until now not been possible to study brain activity in a naturalistic GIM setting.

An ongoing study (Fachner et al. 2015) has the ambition of solving this problem, at least partially. Two GIM sessions were recorded in a naturalistic setting, and the traveler's brain responses were EEG-recorded during (1) rest, (2) relaxation/induction, and (3) the music travel. The verbal dialogue was transcribed verbatim to enable an analysis of the imagery and its meaning. Based on this analysis, core metaphors and episodes of special interest were identified, and some of these were selected for EEG analysis, on the premise that there should be long enough periods of silence before and/or after the verbal report to enable an uncompromised EEG signal. The analysis is ongoing, and a preliminary conclusion of this neurometric EEG-LORETA case study was that the altered state of consciousness (ASC, defined as alpha waves or slower) induced in the relaxation phase has a marked influence on the music listening process, and that ASC-related change indicates a connection to visual imagery processing during music listening in GIM. In the second phase of this study, EEG signals were recorded from both therapist and client simultaneously and in a naturalistic setting. This analysis, too, is ongoing.

Discussion

Music therapy is not limited to clinical practice. Music therapy research is recognized as a specific tradition in its own right within musicology (Ruud 2016). References to music therapy are increasingly found in theories and studies in music psychology (e.g., Juslin and Västfjäll 2008; Asutay and Västfjäll, this volume, chapter 18; Eerola and Vuoskoski 2013), and music therapy theory contributes to the understanding of musicking in a health perspective and an embodiment perspective (Bonde and Beck forthcoming 2019; Small 1998; Stige 2003). As shown earlier, many different theories have been developed to explain the complex interplay of music, imagery, and the interpersonal relationships in GIM. There is also a substantial body of research supporting GIM as an effective method of psychotherapy. However, neuroscientific evidence of GIM as effective psychotherapy is still quite sparse. Experimental studies using advanced technology in a laboratory to study music and imagery are quite far from both the naturalistic GIM setting and everyday music listening, and there is still a long way to go to document whether and how pivotal or transformative imagery is correlated with changes in brain activity. Therefore, an important design development (as described earlier) is to record the EEG of both traveler/client and guide/therapist simultaneously. This can give important information on the neurological nature of the interpersonal relationship in particular, and on the interpersonal nature of the GIM experience, as suggested by Blom (2014). With her interpersonal theory of processes in GIM, Blom (2011, 2014) indirectly contributes to a demystification of the spiritual and transpersonal experiences that are often reported in GIM. Blom gives these strong experiences of "surrender" a contemporary relational psychological framework, and her study indicates the health potential of such experiences.

Most of the existing music and imagery research in music psychology investigates the imagination of intervals, melodies, and other musical elements in order to compare them to the listening process (Hubbard 2010; Hubbard, volume 1, chapter 8). This kind of experimental research has a long history; however, it often lacks ecological validity in the contexts of receptive music therapy or everyday music listening. It is interesting that "imagery" is not listed in the index of The Oxford Handbook of Music Psychology (Hallam et al. 2009), and that "imagining" is only mentioned in the chapter on the psychology of composition (Impett 2009). Kinesthetic-image schemas are mentioned in the chapter on music and meaning (Cross and Tolbert 2009), with references to the cognitive metaphor theory of Lakoff and Johnson (mentioned earlier), listed as an example of an experientialist approach to music and meaning. Even though the handbook has many chapters on music and emotion, imagery is not an element in them. In the Handbook of Music and Emotion (Juslin and Sloboda 2011), imagery is included in the index and discussed in two chapters. Woody and McPherson (2011) describe how musicians use imagery and metaphors to evoke emotions for performance. Gabrielsson (2011), reporting from his study of "strong experiences with music" (mentioned earlier), notes how often such experiences are reported by the listeners/informants. Juslin and Västfjäll (2008) include imagery in their promising BRECVEMA model (described earlier); however, they mention only visual imagery and, as we have seen from the empirical data, imagery is multimodal, not only visual. As shown by McNorgan (2012), each imagery modality has both general and specific neural correlates and therefore contributes to meaning in a unique way.

I think the relative absence of empirical, naturalistic music and imagery studies in neuroscience reflects a dominating, more or less traditional, postpositivist approach to research in music listening. The more actual listening reports are included in the research, the more imagery comes to the foreground. What is suggested here is that research in music listening should be much more focused on naturalistic settings and that the study of multimodal imagery can be a key to broadening our understanding not only of GIM and other receptive music therapy methods (Hunt 2015) but also of music listening as—in DeNora's words—"a technology of the self" in everyday life (DeNora 2000, 2007, 2011) and as a genuine health resource (Ekholm et al. 2016a, 2016b; Bonde et al. 2018). Cognitive neuroscience and neurophenomenology can contribute to this if researchers take the epistemological stance that the first-person and the third-person perspective are equally important (Hunt 2015).

Conclusion

Music imaging is a natural phenomenon that can be encouraged and used in many different ways and contexts, including music education (Halpern and Overy, this volume, chapter 19). It is used in therapy (e.g., in GIM) to stimulate the client's creative imagination and ability to change or transform inappropriate patterns of attachment and emotion regulation, but it is also used in everyday life as what Tia DeNora calls "a technology of the self." Using the concepts of the ecological psychologist James Gibson, we can say that music affords imaging, and music imaging can be appropriated in multiple ways, for creative-imaginative purposes as well as for the regulation of physical, psychological, and spiritual well-being. What Even Ruud (2010) calls "listening self-care" and "musical self-medication" are typical forms of appropriation. Music imaging is both a mode of thinking (based on introjection of patterns afforded by the musical material) and a mode of expression (affording the projection of personal material of all sorts onto the music). Music listening in GIM therapy is of course not "music listening" per se. Client experiences are highly personal, even idiosyncratic, and the therapeutic focus is always more important in this context than the aesthetic qualities of the music. However, GIM experiences are good examples of music's affordances and appropriations (DeNora 2000). With the therapist's support, the GIM client takes from the music what is needed to explore salient physical, psychological, social, existential, or spiritual issues. The combination of music and imagery is not just relevant in a clinical context, even if "image listening" was regarded as irrelevant by musicology until recently; the experience of multimodal imagery while listening to music is inherently human and has great potential as a health resource.

Before creating the Bonny Method of GIM, Helen Bonny worked together with the Canadian musicologist Louis Savary on a project called "Listening with a new consciousness" (Bonny and Savary 1973). This book presents many manuscripts of guided "music travels" for groups, with target groups ranging from schoolchildren to religious groups. The GIM therapist Carol Bush developed "GIM on your own" (1995) as a method for self-development. The study of imagery during music listening is increasingly being integrated into music psychology, and early evidence from neuroscience supports the prophylactic potential of music and imagery work. In other words, GIM is a well-documented example of "sound imagination" contributing to a new perspective or paradigm that Tia DeNora calls a MusEcological perspective (DeNora 2011).


References

Abrams, B. 2002. Transpersonal Dimensions of the Bonny Method. In Guided Imagery and Music: The Bonny Method and Beyond, edited by K. E. Bruscia and D. E. Grocke, 339–358. Gilsum, NH: Barcelona Publishers.
Blom, K. M. 2011. Transpersonal—Spiritual BMGIM Experiences and the Process of Surrender. Nordic Journal of Music Therapy 20 (2): 185–203.
Blom, K. M. 2014. Experiences of Transcendence and the Process of Surrender in Guided Imagery and Music (GIM). PhD thesis, Aalborg University. http://vbn.aau.dk/files/204635175/Katarina_Martenson_Blom_Thesis.pdf. Accessed December 29, 2018.
Bonde, L. O. 2000. Metaphor and Narrative in Guided Imagery and Music. Journal of the Association for Music and Imagery 7: 59–76.
Bonde, L. O. 2005. The Bonny Method of Guided Imagery and Music (BMGIM) with Cancer Survivors: A Psychological Study with Focus on the Influence of BMGIM on Mood and Quality of Life. PhD thesis, Aalborg University. http://www.wfmt.info/Musictherapyworld/modules/archive/dissertations/pdfs/Bonde2005.pdf. Accessed December 28, 2018.
Bonde, L. O. 2007. Imagery, Metaphor and Perceived Outcomes in Six Cancer Survivors' BMGIM Therapy. Qualitative Inquiries in Music Therapy, Vol. 3, edited by A. Meadows, 132–164. Gilsum, NH: Barcelona Publishers.
Bonde, L. O. 2010. Music as Support and Challenge. Jahrbuch Musiktherapie Bd. 6, Imaginationen in der Musiktherapie, 89–118. Wiesbaden: Reichert Verlag.
Bonde, L. O. 2017. Embodied Music Listening. In The Routledge Companion to Embodied Music Interaction, edited by M. Lesaffre, M. Leman, and P.-J. Maes, 269–277. London: Routledge.
Bonde, L. O., and B. D. Beck. 2019 (forthcoming). Imagining Nature during Music Listening: An Exploration of the Meaning, Sharing and Therapeutic Potential of Nature Imagery in Guided Imagery and Music. In Nature in Psychotherapy and Arts-Based Therapy, edited by E. Pfeifer and H.-H. Decker-Voigt. Giessen: Psychosozial Verlag.
Bonde, L. O., and K. M. Blom. 2016. Music Listening and the Experience of Surrender: An Exploration of Imagery Experiences Evoked by Selected Classical Music from the Western Tradition. In Cultural Psychology of Musical Experience, edited by H. Klempe, 207–234. Charlotte, NC: Information Age Publishing.
Bonde, L. O., O. Ekholm, and K. Juel. 2018. Associations between Music and Health-Related Outcomes in Adult Non-Musicians, Amateur Musicians and Professional Musicians—Results from a Nationwide Danish Study. Nordic Journal of Music Therapy 27 (4): 262–282.
Bonde, L. O., M. S. Skånland, E. Ruud, and G. Trondalen. 2013. Musical Life Stories: Narratives on Health Musicking. Oslo: Skriftserie fra Senter for musikk og helse.
Bonny, H. L. 1975. Music and Consciousness. Journal of Music Therapy 12: 121–135.
Bonny, H. L. 1978a. GIM Monograph #1: Facilitating GIM Sessions. Salina, KS: Bonny Foundation.
Bonny, H. L. 1978b. GIM Monograph #2: The Role of Taped Music Programs in the GIM Process. Salina, KS: Bonny Foundation.
Bonny, H. L. 2002a. Autobiographical Essay. In Music and Consciousness: The Evolution of Guided Imagery and Music, edited by L. Summer, 1–18. Gilsum, NH: Barcelona Publishers.
Bonny, H. L. 2002b. The Early Development of Guided Imagery and Music (GIM). In Music and Consciousness: The Evolution of Guided Imagery and Music, edited by L. Summer, 53–68. Gilsum, NH: Barcelona Publishers.

Bonny, H. L. 2002c. Guided Imagery and Music (GIM): Discovery of the Method. In Music and Consciousness: The Evolution of Guided Imagery and Music, edited by L. Summer, 43–52. Gilsum, NH: Barcelona Publishers.
Bonny, H., and L. Savary. 1973. Music and Your Mind: Listening with a New Consciousness. New York: Harper & Row.
Bruscia, K. E. 2002. The Boundaries of Guided Imagery and Music (GIM) and the Bonny Method. In Guided Imagery and Music: The Bonny Method and Beyond, edited by K. E. Bruscia and D. E. Grocke, 37–61. Gilsum, NH: Barcelona Publishers.
Bush, C. 1995. Healing Imagery and Music: Pathways to the Inner Self. Portland, OR: Rudra Press.
Christensen, E. 2012. Music Listening, Music Therapy, Phenomenology and Neuroscience. PhD thesis, Aalborg University. http://vbn.aau.dk/files/68298556/MUSIC_LISTENING_FINAL_ONLINE_Erik_christensen12.pdf. Accessed May 7, 2017.
Clark, M. 2014. A New Synthesis Model of the Bonny Method of Guided Imagery and Music. Journal of the Association for Music and Imagery 14: 1–22.
Clarke, D., and E. Clarke. 2011. Music and Consciousness: Philosophical, Psychological, and Cultural Perspectives. Oxford: Oxford University Press.
Clarke, E. 2005. Ways of Listening: An Ecological Approach to the Perception of Musical Meaning. Oxford: Oxford University Press.
Clarke, E. 2011. Music Perception and Musical Consciousness. In Music and Consciousness: Philosophical, Psychological, and Cultural Perspectives, edited by D. Clarke and E. Clarke, 193–213. Oxford: Oxford University Press.
Clarke, E., T. DeNora, and J. Vuoskoski. 2015. Music, Empathy and Cultural Understanding. Physics of Life Reviews 15: 61–88. https://doi.org/10.1016/j.plrev.2015.09.001.
Cross, I., and E. Tolbert. 2009. Music and Meaning. In The Oxford Handbook of Music Psychology, edited by S. Hallam, I. Cross, and M. Thaut, 33–46. Oxford: Oxford University Press.
DeNora, T. 2000. Music in Everyday Life. Cambridge: Cambridge University Press.
DeNora, T. 2007. Health and Music in Everyday Life—A Theory of Practice. Psyke and Logos 28 (1): 271–287.
DeNora, T. 2011. Practical Consciousness and Social Relation in MusEcological Perspective. In Music and Consciousness: Philosophical, Psychological, and Cultural Perspectives, edited by D. Clarke and E. Clarke, 309–326. Oxford: Oxford University Press.
Eerola, T., and J. K. Vuoskoski. 2013. A Review of Music and Emotion Studies: Approaches, Emotion Models, and Stimuli. Music Perception: An Interdisciplinary Journal 30 (3): 307–340.
Ekholm, O., K. Juel, and L. O. Bonde. 2016a. Associations between Daily Musicking and Health: Results from a Nationwide Survey in Denmark. Scandinavian Journal of Public Health 44 (7): 726–732. https://doi.org/10.1177/1403494816664252.
Ekholm, O., K. Juel, and L. O. Bonde. 2016b. Music and Public Health—An Empirical Study of the Use of Music in the Daily Life of Adult Danes and the Health Implications of Musical Participation. Arts and Health 8 (2): 154–168. https://doi.org/10.1080/17533015.2015.1048696.
Fachner, J., E. Ala-Ruona, and L. O. Bonde. 2015. Guided Imagery in Music—A Neurometric EEG/LORETA Case Study. In Proceedings of the Ninth Triennial Conference of the European Society for the Cognitive Sciences of Music, 17–22 August 2015, edited by J. Ginsborg, A. Lamont, M. Phillips, and S. Bramley. Manchester, UK: European Society for the Cognitive Sciences of Music (ESCOM).
Gabrielsson, A. 2011. Strong Experiences with Music: Music Is Much More Than Just Music. Oxford: Oxford University Press.
Gibson, J. J. 1983. The Senses Considered as Perceptual Systems. Westport, CT: Greenwood Press.

Goldberg, F. S. 2002. A Holographic Field Theory Model of the Bonny Method of Guided Imagery and Music (BMGIM). In Guided Imagery and Music: The Bonny Method and Beyond, edited by K. E. Bruscia and D. E. Grocke, 359–377. Gilsum, NH: Barcelona Publishers.
Grocke, D. 1999. A Phenomenological Study of Pivotal Moments in Guided Imagery and Music (GIM) Therapy. PhD thesis. Melbourne: Faculty of Music, The University of Melbourne. In Music Therapy Info CD-Rom III, edited by D. Aldridge. Witten: Universität Witten/Herdecke.
Grocke, D. 2010. An Overview of Research in the Bonny Method of Guided Imagery and Music. Voices: A World Forum for Music Therapy 10 (3). https://voices.no/index.php/voices/article/view/1886/1651. Accessed December 28, 2018.
Grocke, D., and T. Wigram. 2007. Receptive Methods in Music Therapy: Techniques and Clinical Applications for Music Therapy Clinicians, Educators, and Students. London: Jessica Kingsley.
Grocke, D., and T. Moe. 2015. Guided Imagery and Music: A Spectrum of Approaches. London: Jessica Kingsley.
Hallam, S., I. Cross, and M. Thaut. 2009. The Oxford Handbook of Music Psychology. Oxford: Oxford University Press.
Hart, S. 2012. Neuroaffektiv psykoterapi med voksne [Neuroaffective Psychotherapy with Adults]. Copenhagen: Hans Reitzels Forlag.
Hevner, K. 1936. Experimental Studies of the Elements of Expression in Music. American Journal of Psychology 48: 246–268.
Horowitz, M. 1983. Image Formation and Psychotherapy. New York: Jason Aronson.
Hubbard, T. L. 2010. Auditory Imagery: Empirical Findings. Psychological Bulletin 136 (2): 302.
Hunt, A. M. 2011. A Neurophenomenological Description of the Guided Imagery and Music Experience. PhD thesis. Philadelphia, PA: Temple University.
Hunt, A. 2015. Boundaries and Potentials of Traditional and Alternative Neuroscience Research Methods in Music Therapy Research. Frontiers in Human Neuroscience 9: 342. https://doi.org/10.3389/fnhum.2015.00342.
Hunt, A. 2017. Protocol for a Neurophenomenological Investigation of a Guided Imagery and Music Experience (Part II). Music and Medicine 9 (2): 116–127.
Impett, J. 2009. Making a Mark: The Psychology of Composition. In The Oxford Handbook of Music Psychology, edited by S. Hallam, I. Cross, and M. Thaut, 651–666. Oxford: Oxford University Press.
Johnson, M. 2007. The Meaning of the Body: Aesthetics of Human Understanding. Chicago, IL: University of Chicago Press.
Juslin, P. N., and J. A. Sloboda. 2011. Handbook of Music and Emotion. Oxford: Oxford University Press.
Juslin, P. N., and D. Västfjäll. 2008. Emotional Responses to Music: The Need to Consider Underlying Mechanisms. Behavioral and Brain Sciences 31: 559–575.
Juslin, P. N., G. Barradas, and T. Eerola. 2015. From Sound to Significance: Exploring the Mechanisms Underlying Emotional Reactions to Music. American Journal of Psychology 128 (3): 281–304.
Kind, A. 2006. Imagery and Imagination. Internet Encyclopedia of Philosophy, 1–19. https://www.iep.utm.edu/imagery/. Accessed December 29, 2018.
Kosslyn, S. M., W. L. Thompson, and G. Ganis. 2006. The Case for Mental Imagery. Oxford: Oxford University Press.
Lakoff, G., and M. Johnson. 1980. Metaphors We Live By. Chicago and London: University of Chicago Press.

Lakoff, G., and M. Johnson. 1999. Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Thought. New York: Basic Books.
Lem, A. 1999. Selected Patterns of Brainwave Activity Point to the Connection between Imagery Experiences and the Psychoacoustic Qualities of Music. In Music Medicine, Vol. 3, edited by R. R. Pratt and D. E. Grocke, 75–87. Melbourne: The University of Melbourne.
Lilliestam, L. 2013. Music, the Life Trajectory and Existential Health. In Musical Life Stories: Narratives on Health Musicking, edited by L. O. Bonde, E. Ruud, M. Skånland, and G. Trondalen, Anthology #6, 17–39. Oslo: Publications from the Centre for Music and Health.
Lindvang, C., and B. D. Beck. 2017. Musik, krop og følelser: Neuroaffektive processer i musikterapi [Music, Body, and Emotions: Neuroaffective Processes in Music Therapy]. Copenhagen: Frydenlund Academic.
MacLean, P. D. 1990. The Triune Brain in Evolution: Role in Paleocerebral Functions. New York: Plenum.
Marr, J. 2001. The Use of the Bonny Method of Guided Imagery and Music in Spiritual Growth. Journal of Pastoral Care 55 (4): 397–406.
McKinney, C., and T. Honig. 2017. Health Outcomes of a Series of Bonny Method of Guided Imagery and Music Sessions: A Systematic Review. Journal of Music Therapy 54 (1): 1–34.
McNorgan, C. 2012. A Meta-Analytic Review of Multisensory Imagery Identifies the Neural Correlates of Modality-Specific and Modality-General Imagery. Frontiers in Human Neuroscience 6: article 285.
Ricoeur, P. 1978. The Rule of Metaphor: Multi-Disciplinary Studies of the Creation of Meaning in Language. London: Routledge & Kegan Paul.
Ruud, E. 2010. Music Therapy: A Perspective from the Humanities. Gilsum, NH: Barcelona Publishers.
Ruud, E. 2016. Musikkvitenskap [Musicology]. Oslo: Universitetsforlaget.
Small, C. 1998. Musicking: The Meanings of Performing and Listening. London: Wesleyan University Press.
Stern, D. 2010. Forms of Vitality: Exploring Dynamic Experience in Psychology, the Arts, Psychotherapy and Development. Oxford: Oxford University Press.
Stige, B. 2003. Elaborations toward a Notion of Community Music Therapy. Oslo: Unipub.
Summer, L. 2002. Group Music and Imagery Therapy: Emergent Receptive Techniques in Music Therapy Practice. In Guided Imagery and Music: The Bonny Method and Beyond, edited by K. E. Bruscia and D. E. Grocke, 297–306. Gilsum, NH: Barcelona Publishers.
Summer, L. 2009. Client Perspectives on the Music in Guided Imagery and Music (GIM). PhD thesis, Aalborg University. http://vbn.aau.dk/files/112202270/6467_lisa_summer_thesis.pdf. Accessed December 29, 2018.
Tesch, R. 1990. Qualitative Research: Analysis Types and Software Tools. London.
Wärja, M., and L. O. Bonde. 2014. Music as Co-Therapist: Towards a Taxonomy of Music in Therapeutic Music and Imagery. Music and Medicine 6 (2): 16–27.
Woody, R. H., and G. E. McPherson. 2011. Emotion and Motivation in the Lives of Performers. In Handbook of Music and Emotion, edited by P. N. Juslin and J. A. Sloboda, 401–424. Oxford: Oxford University Press.

Chapter 22

Empirical Musical Imagery beyond the “Mind’s Ear”

Freya Bailes

Introduction

Many empirical studies of musical imagery begin by defining their subject as music “heard” by the mind’s ear, before swiftly acknowledging the importance of additional musical dimensions to the sonic. In defense of this approach, there are good reasons to emphasize the auditory components of imaged music when defining it, since mental imagery is generally understood to be a visual phenomenon. Even those who have previously encountered the term “musical imagery” might conceive of it as a primarily visual image accompanying heard music, as in its therapeutic use in Guided Imagery and Music (see Bonde, this volume, chapter 21). An alternative approach to communicating the intended meaning of musical imagery is to provide examples, which might include having an “earworm,” mentally continuing music that has stopped, audiating a musical score (see Halpern and Overy, this volume, chapter 19), mentally rehearsing for a music performance, or imagining1 a new composition. None of these examples is prescriptive with respect to the sensory modalities that might be represented in imagination, but neither do they indicate what might be imaged in addition to sound, and this chapter aims to explore the multimodality of our imagery for music.

Returning to attempts to define musical imagery, in Bailes (2007) I explicitly refer to the “mind’s ear,” defining musical imagery as “the experience of imagining musical sound in the absence of directly corresponding sound stimulation from the physical environment” (555). While this definition encapsulates the notion of simulating sensory experience, it focuses exclusively on sound. Beaty and colleagues (2013) describe musical imagery as “melodies of the mind” (1163), a neutral expression though one that suggests a passive occurrence. In their study of involuntary musical imagery, Jakubowski and colleagues (2015) refer to a “mental replay of music” (1229), while Liikkanen (2012) poetically describes musical imagery as “a mental soundscape audible for our ‘inner ear’ ” (236). Weber and Brown (1986) introduced musical imagery as “a particular form of auditory imagery in which one imagines a melody or song . . . the ability to imagine, among other things, tonal progressions” (411). Researchers must account for an increasing body of evidence to suggest that rather than merely imaging the sound of music, we image visual and kinesthetic dimensions of musical experience as well.2

In this chapter, I begin with an introduction to theories whereby our seemingly disembodied mental imagery can instead be understood in relation to embodied cognition. I will revisit the findings of empirical studies of musical imagery to determine the extent to which embodied cognition could hold explanatory power, before considering recent work that has directly tested hypotheses relating body movement to musical imagery, and outlining a number of possible directions for future research.

Embodied Cognition and Mental Imagery

Others have already reflected on how mental imagery might relate to our embodied experience. One theoretical position of relevance to an embodied account of mental imagery is experiential cognition (Reybrouck 2001), which posits that our representation of the world is generated by an interaction between environmental input and our capacity to represent it. Our bodies are our most immediate environments, and our physicality in turn governs our interaction with the wider environment. In their seminal text on embodied cognition, Varela and colleagues (1991) emphasize the dependence of minds on bodies that are characterized by certain sensorimotor capacities. For them, embodied “means reflection in which body and mind have been brought together” (Varela et al. 1991, 27). By this argument, the apparently disembodied mental simulation of sensorimotor experience is necessarily conditioned by our physical experiences of the world.

In parallel work, there is increasing evidence to support theories that our perceptions are influenced by the possible actions afforded by what we perceive (Gibson 1986; Hubbard 2013). According to these theories, perceiving the actions of another will activate motor plans of our own (Schiavio et al. 2014). In this way, listening to music implies the actions associated with its production (Cox 2001; Reybrouck 2001). The motor theory of perception originated as a theory of language perception, and has been evoked to explain the influence of motor constraints on our representations of verbal stimuli (Hubbard 2013). Callan and colleagues (2006) extended the concept to suggest the existence of a motor theory of music perception, to account for their findings of activation of the motor cortex in both covert speech and song.

The relevance of perception–action coupling for our understanding of mental imagery has also been explored. Referring to work by Berthoz (1996) on imaged movements, Reybrouck notes that the supplementary motor area (SMA) of the brain is implicated in both perception and action, and the same is true of imaged action. He writes, “Perception, therefore, can be considered as simulated action, as imagining the actions that are implied in using the perceived objects” (120–121). It follows:

• That there is a strong link between our knowledge of sound and sound sources, both in perception and cognition, so that features of sound are in most cases related to features of sound-production, sound-production here understood as including both the sound-producing action and the features of the resonant bodies and environments.

And, as an extension of this:

• That images of sound-production, including visual, motor, tactile etc. elements, may actually trigger images of sound, and conversely, that images of sound may trigger images of sound-production. (Godøy 2001, 238)

As an extension of this idea, Godøy (2001) also suggests that the greater our understanding of how sounds are produced, the greater the likelihood of their salience as auditory imagery.3 This leads to my prediction that the degree to which musical imagery is embodied lies along a continuum ranging from imaging oneself performing a clearly defined and rehearsed sonic output at one extreme, to imaging the timbre of an artificially produced sine wave, which the human body could not produce without recourse to digital means, at the other. A pertinent question arises as to whether our musical imagery can ever be so abstracted from its origins in sound production as to be effectively disembodied. In line with the theoretical propositions of embodied cognition (e.g., Varela et al. 1991; Niedenthal et al. 2005), I argue that musical imagery cannot be fully disembodied. As embodied minds, our thoughts are inseparable from our sensorimotor experience, and in the absence of personal experience in producing specific sounds, we draw on our knowledge of the actions required to make similar sounds to infer and image the sorts of articulatory gestures involved in their making (Cox 2001; Godøy 2001; Godøy, this volume, chapter 12).

Offline Cognition

Theories of embodied cognition distinguish between “online” and “offline” processes (Niedenthal et al. 2005). Online embodiment relates to our processing of the real-world, external environment, occurring, for example, while listening to music. Offline embodiment is described as the simulation of our online cognitions, decoupled from current real-world operations. By such an account, mental imagery can be understood as a form of modality-relevant offline cognition, simulating our embodied online experiences. Insofar as our online cognition of music is a multimodal phenomenon relating to our bodily senses, we might expect offline music cognition to reflect these same bodily concerns. The distinction between online and offline cognition contrasts sensorimotor processing with ideomotor simulation respectively. Relevant to the motor theory of perception outlined above, Reybrouck (2001) argues that:

It makes a difference . . . as to both the intensitiy [sic] and precision of the covert movements (the ideomotor simulation) if the subject who tries to imagine a certain musical structure is an expert or a layman. Subjects who received formal musical training can use this explicit musical knowledge and will easily imagine all the motor processes that are connected with the production of the sounds. (129)

Indeed, embodied, real-world experience through the online cognition of music is an important prerequisite for the offline cognition of music, and hence our ability to simulate music in imagination is inextricably linked to our musical experience. This experience might take the form of many thousands of hours of individual practice on a musical instrument, but just as importantly it might relate to our experience of being sung to as infants. The establishment of sensorimotor associations through learning is central to Pfordresher, Halpern, and Greenspon’s (2015) model of multimodal imagery association (MMIA). According to this model, those with a vocal pitch-imitation deficit have imagery weaknesses in auditory-motor mapping. One example of such poor mapping would be between an auditory image of the desired pitch height of a note and a motor image of the necessary laryngeal tension to sing it. A strength of the MMIA model is its inclusion of a proposed account of the development of sensorimotor imagery, from the initial associations between vocalizations and auditory feedback that are formed during infancy, through the development of associations in memory between specific auditory and motor imagery, through to the ability to generalize sensorimotor associations to imagine new musical sequences.

There is some contention regarding embodied accounts of auditory imagery. An unresolved debate originating in research on auditory imagery for speech concerns the extent to which auditory imagery is a matter of an “inner ear” or an “inner voice” (Hubbard 2013; Hubbard, volume 1, chapter 8). The “inner ear” in the context of imagery for speech is akin to an auditory image of the encoded speech, while the “inner voice” involves auditory and kinesthetic modalities through subvocal articulatory rehearsal (Kalakoski 2001). In 2013, Hubbard suggested that this distinction was redundant, instead drawing on perception–action coupling theories to defend his perspective, as follows: (1) perception of another’s actions is believed to activate one’s own motor plans, so (2) if imagery involves the same mechanisms as perception (which is consistent with the claims of embodied cognition), then (3) imaging another’s speech should involve articulatory mechanisms just as imaging one’s own speech does. In other words, according to this account, embodiment is at work in mental imagery even for sounds that we have not ourselves enacted. The debate is complicated in its application to musical imagery, since not all music is vocally produced (Cox 2001; see Hubbard, volume 1, chapter 8), and yet there is compelling evidence that we are able to simulate and image a variety of musical sounds, which I will now review.

An Embodied Review of Empirical Studies of Musical Imagery

We have seen a move in the cognitive sciences toward an embodied account of our mental activity (Niedenthal et al. 2005; Glenberg et al. 2013), and other chapters in this handbook reflect this focus (see Christensen, this volume, chapter 1; Huvenne, volume 1, chapter 30; Saslaw and Walsh, this volume, chapter 7). I will now revisit the findings of past empirical studies of musical imagery from this embodied perspective. The purpose of this review is not to prove or disprove the embodiment of musical imagery, since such an approach is methodologically untenable, and it would be impossible to refute the argument that our minds are embodied. Rather, the purpose is to determine whether embodied cognition could have explanatory power with respect to the findings of musical imagery studies that were not necessarily designed to test such theories. It should be noted that our retrospective view of the indicators of embodied imagery is probably obscured by the neglect of past researchers to enquire about, or express an interest in, those imagery parameters that might reflect embodiment. A similar point has been made by Hubbard (2013) regarding empirical studies of auditory imagery that do not habitually ask participants about their concurrent experiences of visual imagery. However, some studies of musical imagery are directly concerned with bodily involvement, since they focus on music performance. These will be reviewed first, before a review of studies in which musical imagery occurs during composition and listening, in voluntary musical imagery tasks, and during involuntary musical imagery.

Imagery in Performance

In order to perform almost all forms of music,4 one must move to produce the sound. This fundamental auditory-motor association appears to be represented in imagery, with increasing behavioral and brain imaging evidence being consistent with a role for kinesthetic imagery in music performance (see Lotze 2013, for a review). However, such auditory-motor associations must first be formed through experience of action in our sonic environment and, in the case of expert musicians, through repeated and deliberate musical enactment. In research by Lotze and colleagues (2003), professional violinists scored higher than amateur violinists for the vividness of their movement imagery and, at a neural level, they showed increased brain activations in the representation areas of the fingers during an imagined performance of Mozart’s violin concerto in D Major.

To aid in their memorization of pitch sequences, trained music students are able to use finger tapping, as though tapping on a keyboard. This motor-encoding strategy appeared to reinforce their representation of the auditory stimuli (Mikumo 1994). In a study of pianists’ uses of musical imagery for expressive parameters during performance, we also found evidence that musical imagery was strengthened when the pianists were able to play on a silent piano keyboard, thus providing motor reinforcement (Bishop et al. 2013). There is evidence that during imaged song, the motor cortex is activated. For instance, Callan and colleagues (2006) asked participants in a functional magnetic resonance imaging (fMRI) study of the brain regions involved in perceived and imaged speech and song to covertly sing (i.e., image) stimuli cued by visually presented lyrics. Even though the task made no explicit motor demands, it seems that the song imagery was nevertheless embodied.

An important means by which musical imagery can facilitate performance is by enhancing the ability to anticipate upcoming events. This was the focus of work by Keller and colleagues (2010), whose findings were consistent with the use of auditory imagery to enable action planning. Specifically, their method allowed them to relate anticipatory imagery for specific pitches to the accuracy of the actions required to “perform” them. They concluded that cross-modal (i.e., auditory, visual, motor) ideomotor processes were in operation, which would be consistent with an embodied representation of the pitch-space array. In a related study, Keller and Appel (2010) investigated the role of anticipatory auditory imagery in ensemble performance. The auditory imagery abilities of the duo pianists related to the quality of their coordination, regardless of whether or not they were able to see each other as they performed. The authors again postulate a role for ideomotor processes, suggesting that auditory imagery enhances the operation of internal models that simulate the action of both oneself and others. Indeed, learning the part of one’s duo partner by rehearsing it can be detrimental when it comes to subsequently performing the duo with them, since an embodied representation of the partner’s part, which necessarily differs from one’s own interpretation, can hinder coordination (Ragert et al. 2013).

Another study of imagery for performance is a participant observation study of an extended masterclass led by Nelly Ben-Or for expert pianists (Davidson-Kelly et al. 2015). Central to Ben-Or’s approach is the use of multimodal musical imagery during performance preparation. Eleven participants in her five-day masterclass were observed and interviewed about their experiences, and a follow-up questionnaire was also given out nine months later. A thematic analysis of the resulting data led to the articulation of key elements of Ben-Or’s pedagogy. The principal feature is that performers should memorize the music before physically rehearsing it. While this might seem to be an extreme of disembodiment, the opposite could be said of the mental imagery that is consequently required of the pianists, since in order to memorize a performance piece, auditory, motor, and visual aspects must be integrated.
Nelly Ben-Or herself explains that the memory formed during deliberate imagery rehearsal is “a kind of memory that includes an inner sense of the action of playing that music which I see [and it] has to include a vision of the keyboard” (Davidson-Kelly et al. 2015, 86). The authors of the study explore possible cognitive mechanisms by which “total inner memory” might enable effective performance. In particular, they suggest that the mental focus on the distal performance goal afforded by the multimodal image could enhance a close connection with the sound without the potentially disruptive effects of attending to proximal issues of technical production. While Ben-Or’s instruction prioritizes nonmotor tasks, an embodied understanding of sound production is assumed, so that the motor aspects of performance automatically fall into place as long as the musical image is complete. Interestingly, the pianists who participated in this study increased their ratings of the importance of imagining movement during performance preparation following the masterclass.

In an experience sampling survey of the everyday experiences of musical imagery (Bailes 2007), music students reported imaging music in the course of their daily life that they had recently performed, and also music that they were preparing for an upcoming performance. The extent to which musicians are more inclined to imagine music associated with performance than music they do not normally enact remains an open question. It seems appropriate to look to the relationship of music to dance for confirmation of mental kinesthetic-musical links. In an experimental study of memory for music and dance (Mitchell and Gallagher 2001), participants were visually presented with sequences alternating music and dance stimuli. Some participants reported mentally accompanying the silent dance performances with the previously presented musical stimulus. In other words, there was a reported tendency to match performed movement with imaged sound.

Imagery in Composing and Listening

If we accept that musical imagery for performance is embodied, we might wonder whether performer-composers manifest such embodiment in their compositional imagery. Evidence of sorts for the potential influence of playing the keyboard (e.g., organ, harpsichord, piano) on the imagery of composers is presented by a detailed score analysis of works by Bach, Beethoven, Schubert, Schoenberg, and Webern (Baker 2001). Baker argues that keyboard-thinking is reflected in the spatial arrangements of parts that match physical dispositions such as asymmetrical writing around the body’s center at the keyboard (middle C), and gestures reminiscent of hand crossing. In this way, the musical imagery of composers experienced as keyboard players is embodied.

Eitan and Granot (2006) provide other clues to an embodied understanding of mental imagery in relation to music. Rather than concerning the ways in which we image the auditory dimensions of music, they studied the ways in which their participants imaged motion while actually listening to music. Parallels were observed between changes in the auditory space of the stimuli (e.g., pitch height) and the analogies of movement in visual space as described by the listeners. The results were framed in terms of a complex mapping between bodily motion and auditory information.


Voluntary Musical Imagery

An activity in which many performing musicians in Western classical music traditions engage, but which can be separated from performance, is that of reading a musical score. Brodsky and colleagues (1999, 2003) invented an ingenious method to assess the “notational audiation” abilities of musicians (including professional orchestra players, conservatorium musicians, and music-specialist high school students). This is the ability to transform musical notation into an auditory image. Well-known musical themes were embedded within written variations, which were presented to musicians to read and audiate. Musicians were then presented with two aurally presented themes, from which they had to select the one that had been embedded in the notated variation. To test the hypothesis that musicians would subvocalize the music that they were mentally reading, consecutive interference tasks were introduced to variously disrupt such subvocalization. Performance on the embedded melodies task was worse during phonatory interference (e.g., wordless singing) than rhythmic interference (e.g., finger tapping a beat while hearing a task-irrelevant rhythmic pattern), and electromyography (EMG) measurements of electrical activity in the muscles near the larynx were greater when participants read the musical score than during control tasks. The authors conclude that notational audiation involves kinesthetic-like phonatory processes, consistent with an embodied account of score reading. Our knowledge of the role of subvocalization in auditory imagery relates to the more extensive body of work on inner speech. Much of this concerns phonological encoding (the so-called inner ear) and rehearsal (the so-called inner voice), which are supposed to be handled by the phonological loop of Baddeley’s (1986) working memory model. Kalakoski (2001) reports experiments to extend studies on working memory and auditory imagery for speech to the realm of music. Interestingly, she finds that articulatory suppression and concurrent speech interfere with performance on experimental tasks that measure melody recall but not pitch comparison. Can such a result be explained by an embodied account of mental imagery? It might be speculated that a natural response to melody is to inwardly “sing” it, while the comparison of pitches is less likely to call on covert production. Such a difference in rehearsal strategy could account for disrupted subvocalization during melody tasks, but not during pitch comparison. A fascinating instance of empirical musical imagery research that produced unexpected but potentially illuminating results can be found in a brain imaging study by Halpern and colleagues (2004). In their study of the neural correlates of imagined timbre, they sought to see whether supplementary motor area (SMA) activity previously observed during auditory imagery tasks (Zatorre et al. 1996; Halpern and Zatorre 1999) could be explained by subvocalization. They predicted that a timbre imagery task would not elicit such subvocalization, given the difficulties of vocally producing nonvocal timbres, and this would be reflected in the absence of activity in SMA. However, some subthreshold activity in SMA was recorded, and the authors acknowledged that the timbre used in their study was pitched, which is a parameter that can be readily vocalized.

A recent fMRI study of the brain activity associated with mentally transforming imagery for melodies found that some regions associated with motor control were activated (Foster et al. 2013). In this study, participants were instructed to mentally transpose or reverse melodies, and then judge whether the subsequently presented comparison stimulus matched their transformed image. Activation was found in the intraparietal sulcus (IPS), forming part of the posterior parietal cortex (PPC), which is connected to both working memory and motor-planning centers of the brain. The authors also found clusters of significant interest in SMA when they contrasted the reversed condition with the control, as well as consistent activations in pre-SMA during both types of melody transformation task. While the authors do not speculate on their findings of SMA and pre-SMA activation, the involvement of motor centers in such an ostensibly mental task could be indicative of a link with covert production or ideomotor simulation.

Involuntary Musical Imagery

A recent use of neuroimaging to assess the brain structures associated with involuntary musical imagery (INMI) did not report the significant role for the SMA suggested by earlier studies of voluntary musical imagery (Halpern 2015, 33). Because of the difficulties of capturing INMI in a laboratory setting, however, participants in this study were not scanned while imaging music. Rather, the focus was on the individual differences in cortical structure that might be linked with general INMI experiences across time. The INMI experiences were measured using the Involuntary Musical Imagery Scale (IMIS; Farrugia et al. 2015), and ratings for the “Movement” subscale did not significantly relate to variations in brain structure either. A thorough search for neural evidence of embodied INMI would need to capture the musical imagery as it occurs, which still represents important methodological challenges for researchers of spontaneous cognition. The current trend toward naturalistic studies has produced a preponderance of publications examining the everyday experience of INMI. The importance of the body in everyday experiences of musical imagery is suggested by the finding of a correlation between individuals’ frequencies of INMI episodes and their ratings on the “Body” factor of the Goldsmiths Musical Sophistication Index (Floridou et al. 2012). Moreover, recent work points to an association between singing and the frequency and pleasantness of INMI (Williamson and Jilka 2014). More evidence of embodied INMI comes from a questionnaire study of the individual differences that predict INMI patterns (Müllensiefen et al. 2014). It was found that a measure of everyday singing, which encompassed self-rated singing ability and the extent of self-reported sing-along behavior, predicted the length of reported episodes of INMI. Since this measure of everyday singing relates to musical enactment, any resulting imagery might have an embodied dimension. A link was also found between listening engagement and reported INMI frequency. However, while musical training is supposed to enhance auditory-motor associations, no relationship

was found between musical training and the self-report measures of INMI employed in this study. Active musical engagement is important in relation to INMI (Williamson et al. 2011), and Liikkanen (2012) found that INMI was associated with exercising. In earlier work, I found that music students reported musical imagery during activities that involve motion, such as when traveling or getting up in the morning (Bailes 2006). Music students describing their experiences of musical imagery report concurrent visual and motor dimensions (Bailes 2007), as did the musically experienced interviewees in an INMI study by Williamson and Jilka (2014). If we are more inclined to image music that we are able to sing than music that we are not able to sing, then our musical imagery should reflect the characteristics of vocal music. Work by Burgoyne5 and colleagues is consistent with this argument. They have been using online gaming to gather data about the catchiness of music, with gamers indicating their familiarity with popular music and then judging whether the music’s continuation after a period of silence is correct or not. This requires them to mentally continue the music and compare the subsequently presented snippet with their mental continuation. Using sophisticated algorithms, they have been able to establish the salient musical parameters that make such music catchy, and these are melodic repetition, vocal prominence, melodic conventionality, and melodic range conventionality. The propensity for certain popular music to be experienced as an “earworm” is not explained by its popularity (chart position) or exposure (recent runs) alone (Jakubowski et al. 2017). Perhaps it is of significance that it is music that can be readily sung that sticks in our memory. Floridou and Müllensiefen (2015) used experience-sampling methods to explore the conditions that predict INMI. Respondents in their study were asked not only about their experiences of INMI, but also about mind wandering. Responses were modeled in relation to contextual factors such as the activity that participants were undertaking when they were contacted. One finding was the statistical relationship between mind wandering and INMI, suggesting that mind wandering was a prerequisite to INMI. In turn, mind wandering was statistically linked to the activity that respondents were doing when their experience was sampled. Given that physical movement was one of the activities found to favor mind wandering, this research could point to a role for the bodily initiation of a chain of effect from activity to mind wandering to INMI. However, a replication and extension of my earlier (Bailes 2007) empirical study of musical imagery in everyday life did not confirm the previously found relationship between the activity that respondents were engaged with and their propensity to imagine music (Bailes 2015). This more recent work sampled the experiences of members of the general public rather than university music students. Perhaps the association between activity and imagery found in the earlier work relates to the theoretically stronger auditory-motor associations that result from musical training. An increasingly frequent suggestion in studies of INMI is that arousal state plays a role in its occurrence. In an experience sampling study of the phenomenology of musical imagery in its everyday occurrence, Beaty and colleagues (2013) report that participants imaged music more when they felt happy or worried, but not sad. While happy and worried

represent emotions that are high in arousal, sad is typically considered to be a low arousal state. As a result of their interview study, Williamson and Jilka (2014) speculate, “INMI may have a functional relationship with arousal state whereby it can be triggered unconsciously in order to modulate a person’s psychophysiological arousal level” (666). It seems that the body might play a variety of different roles when it comes to shaping our everyday experiences of imaging music: physical enactment contributes to an embodied memory for music; our physical capabilities facilitate imagery for music that we can produce with our bodies; INMI could function to moderate our physiological arousal; and activities involving motion are associated with musical imagery. In my experience sampling study of everyday musical imagery occurrences (Bailes 2015), I was interested in the mood of participants at the times they were observed. Mood scales were included to measure the respondents’ positivity, present-mindedness, and arousal (alert-drowsy, energetic-tired). A model of mood ratings during musical imagery episodes found that respondents were unlikely to report imaging music when they felt drowsy. The relationship between INMI and subjective arousal is relevant to work by Jakubowski and colleagues (2015), who tracked the tempo of imaged music by asking respondents to tap it as it occurred in everyday life, with measurements recorded by a wrist-worn accelerometer. Participants further noted information about their circumstances at the time in a diary. While no measure of the physiological arousal of the respondents was recorded, we do have information about their subjective ratings of arousal, and these were found to be significantly related to the tempo of the music that they tapped. This is in keeping with one of the four factors of the newly created IMIS, “Movement”6 (Floridou et al. 2015). A factor analysis of answers to a self-report inventory of individual differences in INMI grouped the following movement items together: “The rhythms of my earworms match my movements,” “The way I move is in sync with my earworms,” and “When I get an earworm I move to the beat of the imagined music.” This “Movement” factor was subsequently found to correlate with a number of other existing measures. Notable correlations occurred with the reported frequency of experiencing INMI and the Bucknell Auditory Imagery Scale-Vividness (BAIS-V) (Halpern 2015). The authors note a “potential for overlap in embodied responses to hearing real music and experiencing spontaneous INMI, a link that could be explored with both behavioral and neuroimaging studies” (Floridou et al. 2015, 33). It is commonly believed that “earworms” are an annoyance, and work by Williamson and colleagues (2014) sought to understand how we deal with them when they occur. Using data from English and Finnish online surveys, the authors conducted a qualitative analysis of 1,046 earworm reports and found that physical approaches to dealing with the phenomenon were among the most popular responses. For example, respondents would seek out the tune (including singing it or playing it) or use musical or verbal distraction such as humming, singing, talking aloud, or listening to music/the radio/television.
Response categories derived from the English survey included a “Physical” subcategory under the “Distract” theme, with physical behaviors intended to distract the respondent from their earworm including the subgroupings “eat,” “rhythmic,” “breathe,” “exercise,” and “work.” A second model was derived for the English survey data to only include INMI

behaviors that were rated as being effective. This model retained a “Physical” subcategory for the “Distract” theme, and nonmusical forms of distraction included speech and watching television. Williamson and colleagues (2014) suggest that we use distraction behaviors that compete in working memory with the musical imagery, in this case implicating movement and thus bodily involvement.

Tests of Musical Imagery’s Embodiment

In summary, empirical studies of musical imagery present data that vary in the extent to which they can be considered supportive of an embodied interpretation. Where evidence is lacking, this could be attributable to the use of research methods that are not optimally applied to further our understanding of imagery embodiment, since they were designed to address quite different research problems. However, a handful of studies have now been conducted to test specific hypotheses that relate musical imagery to movement. McCullough Campbell and Margulis (2015) set about testing the hypothesis that physical activity during music listening would induce more frequent INMI than passive music listening. In this research, 123 participants were randomly assigned to different experiment conditions that varied in the requirement to have a motor involvement while listening to a song (thought to be likely to induce INMI). Participants were instructed to listen, move, or sing while hearing the song over headphones, before being asked to take part in a dot-tracking task designed to induce INMI because of its low demands on the participants’ attention. Following this, participants completed a questionnaire asking them about their INMI experiences both during the experiment and in general. Contrary to expectation, no significant differences were found between the experiment groups, seemingly because participants found it difficult to comply with the instruction to listen silently without moving. Consequently, the authors of the study re-analyzed the data comparing INMI frequency in relation to the amount of motor involvement that the individual participants reported, rather than the amount that was asked of them by their experimental condition. This analysis revealed that those participants who reported both moving and vocalizing during the song presentation experienced more INMI than those who reported being still and silent. The finding that “moving and vocalizing proved near irresistible” (McCullough Campbell and Margulis 2015, 353) in itself lends support to the case for the embodiment of musical engagement, and the propensity to move or vocalize to some extent, while listening, is well known. Beaman and colleagues (2015) investigated the role of articulatory motor planning during both voluntary and involuntary musical recollections. Following in the tradition of research suggesting the importance of subvocalization in auditory imagery, they devised a paradigm in which participants were exposed to a particular song and then the incidence of imaging it was recorded. Subsequent to the song presentation, participants were either asked to chew gum or were not given gum to chew. The authors hypothesized

that chewing gum should serve to degrade articulatory motor programming, and so reduce the reported incidence of musical imagery. Their findings suggest that musical recollections were reduced when chewing gum, and the authors argue that this reflects an association between articulatory motor programming and imagery for song. I will now consider some of the other ways in which theories of embodied mental imagery might be explicitly tested to further our understanding of the role of the body in our musical imagination.

Directions for Future Research

We have seen that many empirical findings from studies of musical imagery challenge the restricted notion of hearing in the “mind’s ear,” since our body is implicated in the quality of the experience. However, searching the literature for compatible evidence for an embodied account of musical imagery is a problematic endeavor because: (1) the search decontextualizes findings in ways that might mask or even contradict the original purpose of the source research, (2) it runs the risk of amplifying evidence by virtue of its isolation, (3) it is susceptible to bias in the selection of relevant material, and (4) it can only highlight associations rather than establish causal relationships. In order to assess the extent to which musical imagery is an embodied cognition phenomenon, a tailor-made research agenda is needed to enable more hypothesis testing about the role of the body in our musical consciousness. Before outlining some promising future directions, it is important to acknowledge evidence that seems to temper the claims that can be made for an embodied musical imagery. First, research corroborates centuries of music pedagogy in suggesting that physical practice at an instrument will lead to greater improvements than mental practice (Cahn 2008; Bernardi et al. 2013). Similarly, Lotze and colleagues (2003) argue that auditory-motor associations were not sufficiently tight in their study of violinists for them to be co-activated without actually hearing the performed sound, or actually producing the performed movement. Finally, Aleman and colleagues (2000) found that musicians outperformed nonmusicians on auditory imagery tasks. While superior musical imagery abilities are to be expected, and these are entirely consistent with embodied imagery, there is no reason to suppose that musicians should have any more embodied knowledge of the everyday sounds used in the auditory imagery task than the nonmusicians.7 Moreover, the superior performance of the musicians on the auditory imagery task cannot be explained by a greater ability to compare sounds as a result of their training, since musicians and nonmusicians were comparable in their performance on the equivalent sound perception task. These potential caveats support the case for an empirical exploration of the extent of the contribution that embodiment makes to musical imagery experience. In uncovering movement as an important factor in the experience of INMI, Floridou and colleagues (2015) agree that embodied cognition is a relevant avenue for future

work. I will now point to a selection of the questions that are raised by expanding musical imagery beyond the mind’s ear. For example, should we test for possible differences in the degree to which our musical imagery is embodied, and what could such differences tell us? Godøy (2001) argues, “we have more salient images of sound when we have more salient images of how the sounds are produced” (238). This hypothesis is amenable to experimental testing and is ripe for future research. Here, we can return to the unexpected finding from Halpern and colleagues (2004) of activation in SMA during a timbre imagery task: pitch and timbre might be disentangled by asking participants to image a selection of noise-based stimuli, with the prediction that noisy timbres that are difficult to produce will not elicit SMA activity. Musical imagery (e.g., re-presenting a sequence of just-heard notes in one’s mind) feeds our musical imagination (e.g., creating a new sequence of notes in one’s mind). To the extent that our musical imagery is embodied, are our imaginative re-presentations of music constrained by our physical experience, and if so how can we understand the role of our body in creative musical thought? Empirical studies of the musical imagery of composers are lacking (Bailes and Bishop 2012), and research is needed to explore how bodily experience shapes compositional ideas. Embodied accounts of musical imagery necessarily relate to learning, since it would be the changes in our embodied experience that come to shape our mental representations. This is arguably the research area for which we have the most empirical evidence in the guise of studies demonstrating enhanced auditory-motor coupling as a result of musical training. How might an empirical understanding of mental imagery as embodied be applied in music education? The music pedagogy of Jaques-Dalcroze (1967) places great emphasis on the importance of movement and sound. This intimately associates sounds with their physical production (Campbell 1989), and it seems reasonable to suggest that the musical representations of those who follow such training are strong in motor imagery. A related prediction is that there is a link between musical experience and the fidelity of musical imagery, and that this can be explained in terms of embodiment. Anecdotal evidence of more vivid musical imagery for music that has been experienced through dance or musical performance might be corroborated by experimental research. An embodied account of learning should help to explain the pedagogical links between doing and thinking. The implications for music education are extensive, suggesting that practice-led learning is the most effective approach to developing reliable representations of music. Finally, interoception, which is the sense of our physiological condition, is theoretically relevant to mental imagery when this is viewed as an offline simulation of embodied cognition. Research into interoception (e.g., Kadota et al. 2010) suggests that it can have significant consequences for our psychological state. It seems that our perceptions are subconsciously tuned to our own biological rhythms such as heartbeat (Aspell et al. 2013), and it remains an open question as to whether interoceptive forces shape our musical imagery.


Concluding Remarks

I would like to conclude by reflecting on the apparent intangibility of both sound and imagination, with a reminder that our corporality tangibly relates the two. Imagination is often taken to be synonymous with mental freedom, yet our thoughts are shaped by our environmental, biological, and cognitive experience. If this shaping extends to mental imagery, then our musical imagery will be characterized by those features of our environment that are of personal significance, and investigating musical imagery should enable a better understanding of what is meaningful in sound. A review of the empirical literature has demonstrated that our imagery for musical sound is not limited to a single, auditory modality, and the involvement of motor imagery in particular reminds us that music results from physical action. In this way, our understanding of sound as embodied can be illuminated through the lens of imagination. We might then ask how our understanding of imagination can be magnified through the lens of sound. For most people, the term “mental imagery” is equated with visual imagery. However, much can be gained from studying mental imagery for other modalities: sound and music are obviously articulated through time, with auditory and musical imagery emphasizing the dynamic processes that underpin their generation rather than the apparently static product evoked by imagining a visual scene (Bailes 2019). This chapter has argued that we should extend our understanding of auditory imagery beyond the “mind’s ear.” Sound naturally affords a focus on the essentially dynamic properties of imagery and imagination, and this is often missing in the frequently disembodied, static conceptualization of visual imagery occurring in the “mind’s eye.”

Notes

1. Throughout this chapter the terms “imagery” and “imaging” primarily reference re-presentation, while “imagination” and “imagining” are used more often to signal the imaginative.
2. In this respect, imagined music resembles perceived music as a cross-modal phenomenon in which auditory, visual, and kinesthetic senses seem most likely to feature rather than gustatory or olfactory modalities.
3. Though see Schiavio, Menin, and Matyja (2014) for arguments against the loose adaptation of the unconscious embodied simulation account to describe conscious phenomena.
4. Many forms of electronic music require minimal gestures.
5. Burgoyne, J. A. 2015. Resurrecting the Earworms of Our Youth: What Is Responsible for Long-Term Musical Salience? Paper read at Investigating the Music in our Heads, June 1, 2015, Goldsmiths University of London.
6. The others are “negative valence,” “personal reflections,” and “help.”
7. Unless they have been trained in environmental listening or acousmatic composition.


References

Aleman, A., M. R. Nieuwenstein, K. B. E. Böcker, and E. H. F. de Haan. 2000. Music Training and Mental Imagery Ability. Neuropsychologia 38 (12): 1664–1668. doi:10.1016/S0028-3932(00)00079-8.
Aspell, J. E., L. Heydrich, G. Marillier, T. Lavanchy, B. Herbelin, and O. Blanke. 2013. Turning Body and Self Inside Out: Visualized Heartbeats Alter Bodily Self-Consciousness and Tactile Perception. Psychological Science 24 (12): 2445–2453. doi:10.1177/0956797613498395.
Baddeley, A. 1986. Working Memory. Oxford: Clarendon Press.
Bailes, F. 2006. The Use of Experience-Sampling Methods to Monitor Musical Imagery in Everyday Life. Musicae Scientiae 10 (2): 173–190.
Bailes, F. 2007. The Prevalence and Nature of Imagined Music in the Everyday Lives of Music Students. Psychology of Music 35 (4): 555–570. doi:10.1177/0305735607077834.
Bailes, F. 2015. Music in Mind? An Experience Sampling Study of What and When, Towards an Understanding of Why. Psychomusicology: Music, Mind, and Brain 25 (1): 58–68. doi:10.1037/pmu0000078.
Bailes, F. 2019. Musical Imagery and the Temporality of Consciousness. In Music and Consciousness 2: Worlds, Practices, Modalities, edited by D. Clarke, R. Herbert, and E. Clarke. Oxford: Oxford University Press.
Bailes, F., and L. Bishop. 2012. Musical Imagery in the Creative Process. In The Act of Musical Composition: Studies in the Creative Process, edited by D. Collins, 54–77. Farnham, UK: Ashgate.
Baker, J. M. 2001. The Keyboard as Basis for Imagery of Pitch Relations. In Musical Imagery, edited by R. I. Godøy and H. Jørgensen, 251–269. Lisse, Netherlands: Swets & Zeitlinger.
Beaman, C. P., K. Powell, and E. Rapley. 2015. Want to Block Earworms from Conscious Awareness? B(u)y Gum! Quarterly Journal of Experimental Psychology 68 (6): 1049–1057. doi:10.1080/17470218.2015.1034142.
Beaty, R. E., C. J. Burgin, E. C. Nusbaum, T. R. Kwapil, D. A. Hodges, and P. J. Silvia. 2013. Music to the Inner Ears: Exploring Individual Differences in Musical Imagery. Consciousness and Cognition 22 (4): 1163–1173. doi:10.1016/j.concog.2013.07.006.
Bernardi, N. F., M. De Buglio, P. D. Trimarchi, A. Chielli, and E. Bricolo. 2013. Mental Practice Promotes Motor Anticipation: Evidence from Skilled Music Performance. Frontiers in Human Neuroscience 7: 451. doi:10.3389/fnhum.2013.00451.
Berthoz, A. 1996. The Role of Inhibition in the Hierarchical Gating of Executed and Imagined Movements. Cognitive Brain Research 3: 101–113. doi:10.1016/0926-6410(95)00035-6.
Bishop, L., F. Bailes, and R. T. Dean. 2013. Musical Imagery and the Planning of Dynamics and Articulation during Performance. Music Perception 31 (2): 97–117. doi:10.1525/mp.2013.31.2.97.
Brodsky, W., A. Henik, B.-S. Rubinstein, and M. Zorman. 1999. Inner Hearing among Symphony Orchestra Musicians: Intersectional Differences of String-Players versus Wind-Players. In Music, Mind, and Science, edited by S. W. Yi, 370–392. Seoul: Seoul National University Press.
Brodsky, W., A. Henik, B.-S. Rubinstein, and M. Zorman. 2003. Auditory Imagery from Musical Notation in Expert Musicians. Perception and Psychophysics 65 (4): 602–612. doi:10.3758/BF03194586.
Cahn, D. 2008. The Effects of Varying Ratios of Physical and Mental Practice, and Task Difficulty on Performance of a Tonal Pattern. Psychology of Music 36 (2): 179–191. doi:10.1177/0305735607085011.

Callan, D. E., V. Tsytsarev, T. Hanakawa, A. M. Callan, M. Katsuhara, H. Fukuyama, et al. 2006. Song and Speech: Brain Regions Involved with Perception and Covert Production. Neuroimage 31 (3): 1327–1342. doi:10.1016/j.neuroimage.2006.01.036.
Campbell, P. S. 1989. Dalcroze Reconstructed: An Application of Music Learning Theory to the Principles of Jaques-Dalcroze. In Readings in Music Learning Theory, edited by D. L. Walters and C. C. Taggart, 301–315. Chicago, IL: GIA Publishers.
Cox, A. 2001. The Mimetic Hypothesis and Embodied Musical Meaning. Musicae Scientiae 5 (2): 195–212. doi:10.1177/102986490100500204.
Davidson-Kelly, K., R. S. Schaefer, N. Moran, and K. Overy. 2015. “Total Inner Memory”: Deliberate Uses of Multimodal Musical Imagery during Performance Preparation. Psychomusicology: Music, Mind, and Brain 25 (1): 83–92. doi:10.1037/pmu0000091.
Eitan, Z., and R. Y. Granot. 2006. How Music Moves: Musical Parameters and Listeners’ Images of Motion. Music Perception 23: 221–247. doi:10.1525/mp.2006.23.3.221.
Farrugia, N., K. Jakubowski, R. Cusack, and L. Stewart. 2015. Tunes Stuck in Your Brain: The Frequency and Affective Evaluation of Involuntary Musical Imagery Correlate with Cortical Structure. Consciousness and Cognition 35: 66–77. doi:10.1016/j.concog.2015.04.020.
Floridou, G. A., and D. Müllensiefen. 2015. Environmental and Mental Conditions Predicting the Experience of Involuntary Musical Imagery: An Experience Sampling Method Study. Consciousness and Cognition 33: 472–486. doi:10.1016/j.concog.2015.02.012.
Floridou, G. A., V. J. Williamson, and D. Müllensiefen. 2012. Contracting Earworms: The Roles of Personality and Musicality. In 12th International Conference on Music Perception and Cognition and the 8th Triennial Conference of the European Society for the Cognitive Sciences of Music, edited by E. Cambouropoulos, C. Tsougras, P. Mavromatis, and K. Pastiadis, 302–309. Thessaloniki, Greece: School of Music Studies, Aristotle University of Thessaloniki.
Floridou, G. A., V. J. Williamson, L. Stewart, and D. Müllensiefen. 2015. The Involuntary Musical Imagery Scale (IMIS). Psychomusicology: Music, Mind, and Brain 25 (1): 28–36. doi:10.1037/pmu0000067.
Foster, N. E. V., A. R. Halpern, and R. J. Zatorre. 2013. Common Parietal Activation in Musical Mental Transformations across Pitch and Time. Neuroimage 75: 27–35. doi:10.1016/j.neuroimage.2013.02.044.
Gibson, J. J. 1986. The Ecological Approach to Visual Perception. New York, NY: Taylor & Francis Group.
Glenberg, A. M., J. K. Witt, and J. Metcalfe. 2013. From the Revolution to Embodiment: 25 Years of Cognitive Psychology. Perspectives on Psychological Science 8 (5): 573–585. doi:10.1177/1745691613498098.
Godøy, R. I. 2001. Imagined Action, Excitation, and Resonance. In Musical Imagery, edited by R. I. Godøy and H. Jørgensen, 237–250. Lisse, Netherlands: Swets & Zeitlinger.
Halpern, A. R. 2015. Differences in Auditory Imagery Self-Report Predict Neural and Behavioral Outcomes. Psychomusicology: Music, Mind, and Brain 25 (1): 37–47. doi:10.1037/pmu0000081.
Halpern, A. R., and R. J. Zatorre. 1999. When That Tune Runs through Your Head: A PET Investigation of Auditory Imagery for Familiar Melodies. Cerebral Cortex 9: 697–704. doi:10.1093/cercor/9.7.697.
Halpern, A. R., R. J. Zatorre, M. Bouffard, and J. A. Johnson. 2004. Behavioral and Neural Correlates of Perceived and Imagined Musical Timbre. Neuropsychologia 42: 1281–1292. doi:10.1016/j.neuropsychologia.2003.12.017.

Hubbard, T. L. 2013. Auditory Imagery Contains More Than Audition. In Multisensory Imagery, edited by S. Lacey and R. Lawson, 221–247. New York: Springer.
Jakubowski, K., N. Farrugia, A. R. Halpern, S. K. Sankarpandi, and L. Stewart. 2015. The Speed of Our Mental Soundtracks: Tracking the Tempo of Involuntary Musical Imagery in Everyday Life. Memory and Cognition 43 (8): 1229–1242. doi:10.3758/s13421-015-0531-5.
Jakubowski, K., S. Finkel, L. Stewart, and D. Müllensiefen. 2017. Dissecting an Earworm: Melodic Features and Song Popularity Predict Involuntary Musical Imagery. Psychology of Aesthetics, Creativity, and the Arts 1 (1): 122–135. doi:10.1037/aca0000090.
Jaques-Dalcroze, É. 1967. Rhythm, Music and Education. Woking, UK: The Dalcroze Society.
Kadota, Y., G. Cooper, A. R. Burton, J. Lemon, U. Schall, A. Lloyd, and U. Vollmer-Conna. 2010. Autonomic Hyper-Vigilance in Post-Infective Fatigue Syndrome. Biological Psychology 85 (1): 97–103. doi:10.1016/j.biopsycho.2010.05.009.
Kalakoski, V. 2001. Musical Imagery and Working Memory. In Musical Imagery, edited by R. I. Godøy and H. Jørgensen, 43–56. Lisse, Netherlands: Swets & Zeitlinger.
Keller, P. E., and M. Appel. 2010. Individual Differences, Auditory Imagery, and the Coordination of Body Movements and Sounds in Musical Ensembles. Music Perception 28 (1): 27–46. doi:10.1525/MP.2010.28.1.27.
Keller, P. E., S. Dalla Bella, and I. Koch. 2010. Auditory Imagery Shapes Movement Timing and Kinematics: Evidence from a Musical Task. Journal of Experimental Psychology: Human Perception and Performance 36 (2): 508–513. doi:10.1037/a0017604.
Liikkanen, L. A. 2012. Musical Activities Predispose to Involuntary Musical Imagery. Psychology of Music 40: 236–256. doi:10.1177/0305735611406578.
Lotze, M. 2013. Kinesthetic Imagery of Musical Performance. Frontiers in Human Neuroscience 7: 280. doi:10.3389/fnhum.2013.00280.
Lotze, M., G. Scheler, H.-R. M. Tan, C. Braun, and N. Birbaumer. 2003. The Musician’s Brain: Functional Imaging of Amateurs and Professionals during Performance and Imagery. NeuroImage 20 (3): 1817–1829. doi:10.1016/j.neuroimage.2003.07.018.
McCullough Campbell, S. M., and E. H. Margulis. 2015. Catching an Earworm through Movement. Journal of New Music Research 44 (4): 347–358. doi:10.1080/09298215.2015.1084331.
Mikumo, M. 1994. Motor Encoding Strategy for Pitches and Melodies. Music Perception 12 (2): 175–197. doi:10.2307/40285650.
Mitchell, R. W., and M. C. Gallagher. 2001. Embodying Music: Matching Music and Dance in Memory. Music Perception 19 (1): 65–85. doi:10.1525/mp.2001.19.1.65.
Müllensiefen, D., J. Fry, R. Jones, S. Jilka, L. Stewart, and V. J. Williamson. 2014. Individual Differences Predict Patterns in Spontaneous Involuntary Musical Imagery. Music Perception 31 (4): 323–338. doi:10.1525/mp.2014.31.4.323.
Niedenthal, P. M., L. W. Barsalou, P. Winkielman, S. Krauth-Gruber, and F. Ric. 2005. Embodiment in Attitudes, Social Perception, and Emotion. Personality and Social Psychology Review 9 (3): 184–211. doi:10.1207/s15327957pspr0903_1.
Pfordresher, P. Q., A. R. Halpern, and E. B. Greenspon. 2015. A Mechanism for Sensorimotor Translation in Singing: The Multi-Modal Imagery Association (MMIA) Model. Music Perception 32 (3): 242–253. doi:10.1525/mp.2015.32.3.242.
Ragert, M., T. Schroeder, and P. E. Keller. 2013. Knowing Too Little or Too Much: The Effects of Familiarity with a Co-Performer’s Part on Interpersonal Coordination in Musical Ensembles. Frontiers in Psychology 4: 368. doi:10.3389/fpsyg.2013.00368.
Reybrouck, M. 2001. Musical Imagery between Sensory Processing and Ideomotor Simulation. In Musical Imagery, edited by R. I. Godøy and H. Jørgensen, 117–135. Lisse, Netherlands: Swets & Zeitlinger.

Schiavio, A., D. Menin, and J. Matyja. 2014. Music in the Flesh: Embodied Simulation in Musical Understanding. Psychomusicology: Music, Mind, and Brain 24 (4): 340–343. doi:10.1037/pmu0000052.
Varela, F. J., E. Thompson, and E. Rosch. 1991. The Embodied Mind: Cognitive Science and Human Experience. Cambridge, MA: MIT Press.
Weber, R. J., and S. Brown. 1986. Musical Imagery. Music Perception 3 (4): 411–426. doi:10.2307/40285346.
Williamson, V. J., S. R. Jilka, J. Fry, S. Finkel, D. Müllensiefen, and L. Stewart. 2011. How Do “Earworms” Start? Classifying the Everyday Circumstances of Involuntary Musical Imagery. Psychology of Music 40 (3): 259–284. doi:10.1177/0305735611418553.
Williamson, V. J., and S. R. Jilka. 2014. Experiencing Earworms: An Interview Study of Involuntary Musical Imagery. Psychology of Music 42 (5): 653–670. doi:10.1177/0305735613483848.
Williamson, V. J., L. A. Liikkanen, K. Jakubowski, and L. Stewart. 2014. Sticky Tunes: How Do People React to Involuntary Musical Imagery? PLoS One 9 (1): e86170. doi:10.1371/journal.pone.0086170.
Zatorre, R. J., A. R. Halpern, D. W. Perry, E. Meyer, and A. C. Evans. 1996. Hearing in the Mind’s Ear: A PET Investigation of Musical Imagery and Perception. Journal of Cognitive Neuroscience 8 (1): 29–46. doi:10.1162/jocn.1996.8.1.29.

Part IV

AESTHETICS

Chapter 23

Imaginative Listening to Music
Theodore Gracyk

That’s the worst of music—these silly dreams. —Virginia Woolf

Introduction

Appreciative listening to music involves the exercise of taste, for it involves attention to aesthetic properties, as when we distinguish between graceful and clunky transitions, and between violent and sluggish rhythms. For over three centuries, major figures in philosophical aesthetics have argued that aesthetic engagement with art—and therefore music—includes pleasures of the imagination (Addison and Steele 1965). So listening is both perceptual and imaginative. Frequently, this connection is cashed out with respect to the problem of how music conveys emotion, as when R. K. Elliott diagnoses the experience of emotional qualities in music as a case of “imaginatively enriched perception” (1967, 119). Listening to music differs from hearing the sounds that constitute the music, and some of this difference stems from our imaginative enrichment of those sounds.1 Although I endorse the conventional thesis that imaginative engagement is normally required to appreciate music when listening to it, I argue that we should be more circumspect about this claim than is typically the case. For example, most accounts of musical expressiveness say that imaginative engagement is required in order to perceive the melancholy that runs through most of Mozart’s G Minor String Quintet (K. 516) or the joy of Louis Prima’s “Sing Sing Sing” as performed by Benny Goodman and his Orchestra. In turn, expressiveness is frequently tied to imaginative enrichment that recasts auditory events as musical motion. Imagination lets us hear motion and gestures “in” the progression of sounds, which in turn facilitates an experience of expressiveness.2 I reject both of these proposals, as well as the weaker proposal that imagination is

required in the form of a descriptive supposition that guides expressive interpretation. At the same time, I caution that we swing too far in the opposite direction if we dismiss all imaginative engagement as a subjective, distracted response (e.g., Meyer 1956, 257).

The Opposition of Hearing and Listening

Virginia Woolf frequented the opera and sought out performances of Beethoven’s string quartets. Reflecting in her diary about a concert of instrumental chamber music, she mused, “musical people don’t listen as I do, but critically, . . . without programmes” (Woolf 1980, 39). She wondered whether she was listening properly when the music encouraged streams of imaginative imagery and associations. Woolf remarks that, when a concert program features a Bach concerto, “its [sic] difficult not to think of other things” (Woolf 1977, 33). She offers a lengthy description of listening to music in her stream-of-consciousness story “A String Quartet,” in which a nameless protagonist responds as follows to the opening measures of a Mozart quartet.

[L]ooking across at the player opposite, the first violin counts one, two, three—Flourish, spring, burgeon, burst! The pear tree on the top of the mountain. Fountains jet; drops descend. But the waters of the Rhone flow swift and deep, race under the arches, and sweep the trailing water leaves, washing shadows over the silver fish, the spotted fish rushed down by the swift waters, now swept into an eddy. (Woolf 2003, 133)

If we read this passage as quasi-autobiographical, it appears that by “think[ing] of other things” Woolf means she engages in vivid imaginative associations while listening. Although Woolf probably did not know of it, Violet Paget was just then concluding an empirical study of listeners’ responses to classical music. A London acquaintance of Woolf, Paget published both fiction and nonfiction under the pseudonym Vernon Lee. There were almost certainly occasions when they attended the same music performances. When Lee analyzed her listener-supplied data, she concluded that attentive listeners who focus on following “the notes and all their relations” are the least likely to accompany their listening with “extraneous suggestion . . . metaphorical allusion, [and] visual analogy” (1932, 440). Consequently, Lee divides music audiences into the categories of listeners and hearers, where the latter are distracted by extraneous imaginings. She frequently praises less imaginative listeners as “the more musical” audience (e.g., 1932, 30). Lee seems to be adapting and defending the ideas of Edmund Gurney, who divides listening into two categories, definite and indefinite (1880, 304). Definite listening occurs when a “musical ear” attends closely to musical form. Indefinite listeners treat musical sound as “a congenial background for their subjective trains of thought and

emotion” (Gurney 1880, 306). According to Gurney and Lee, Woolf’s active imagination places her squarely in the company of mere hearers. Her listening is indefinite.3 For Woolf, the issue was more than an academic question. Her mode of attending to music had recently been described and criticized by her brother-in-law, Clive Bell. Explaining and defending aesthetic formalism, Bell valorizes the pattern-focused attention of Lee’s “listeners” and Gurney’s definite listening:

Tired or perplexed, I let slip my sense of form . . . I begin to read into the musical forms human emotions of terror and mystery, love and hate, and spend the minutes, pleasantly enough, in a world of turbid and inferior feeling. At such times, were the grossest pieces of onomatopoeic representation—the song of a bird, the galloping of horses, the cries of children, or the laughing of demons—to be introduced into the symphony . . . they would afford new points of departure for new trains of romantic feeling or heroic thought. I know very well what has happened. I have been using art as a means to the emotions of life and reading into it the ideas of life. I have been cutting blocks with a razor (Bell 1914, 31–32).

Bell acknowledges that musical representation and therefore imaginative response is sometimes intended by composers. Yet he dismisses it: “The representative element in a work of art may or may not be harmful; always it is irrelevant” (Bell 1914, 25). Form alone is the whole of music’s artistic value. Bell endorses listening, in Woolf’s words, “without programmes.” Imaginative engagement is not necessary when listening, and it is a sign that one is distracted from attending closely to relevant musical relationships. This disparagement of imaginative engagement is a recent phenomenon. Previously, a long tradition had regarded it as central to musical experience. For example, Johann Mattheson’s doctrine of affections identified “strong imagination” as a precondition for responding appropriately to instrumental music (1739, 82). Modern philosophical aesthetics went further and recognized imagination as a necessary component of aesthetic response. In 1712, Joseph Addison identifies the chief value of the “polite” or fine arts as the function of generating “the pleasures of the imagination” (1965, 535–582). By the end of the decade, the abbé Du Bos was preaching the same gospel in France. Where Addison concentrates on literature and the plastic arts and mentions music only briefly, Du Bos recognizes that instrumental music requires special attention and devotes pages to explaining how it engages the imagination (1748 I, 360–375). Soon after, Charles Batteux offers the first coherent definition of the fine arts and argues that all good music is representational—therefore it engages the imagination—for otherwise it could not be art (2015, 136). Back across the channel, David Hume proposes that “a true judge in the finer arts” must respond with “delicacy of imagination” (Hume 1987, 234). The century’s debates about the nature of aesthetic response culminate in the touchstone of modern aesthetics, Immanuel Kant’s Critique of the Power of Judgment. Synthesizing eight decades of aesthetics, Kant famously contends that aesthetic judgment is a free play of imagination and understanding. Instrumental music occupies “the lowest place” in the fine arts because it provides limited guidance to imagination; its charms are too

much the result of the stimulation of “sensations” (Kant 2000, 206). The centrality of imagination continues to dominate theories of musical experience through the nineteenth century and then into our own time. In recent years, imaginative engagement is treated as an essential component of listening by both philosophers and musicologists, including Roger Scruton (1974, 1999), Charles Rosen (1995), Nicholas Cook (1990), Denis Dutton (2009), and Jerrold Levinson (2006a). Let us assume that many people attend to instrumental music as Woolf did, engaging with music more imaginatively than is minimally required for perception of sound sequences.4 Against the common prejudice that listeners who engage in a more robust imaginative response are less musical than those who listen “without programmes,” imaginative supplementation of what we actually hear seems to be unavoidable and necessary in music listening. But if all music listening is partly perceptual and partly imaginative, the real issue with Woolf’s response is the degree to which some listeners let their imaginations run free.

Three Species of Imagination

Many, many distinct roles have been assigned to imagination since Aristotle identified it as central to human thought (Sparshott 1990; Stevenson 2003; Townsend 2006, 160–161). Therefore, disagreements about its role in listening cannot be resolved unless we provide focus by determining which roles are relevant. I have already discarded most of the roles assigned to imagination by focusing on occurrent imagining, where imagination is applied to music that one currently hears. Occurrent imagining may have little or nothing in common either with having a tune stuck in the head (an eidetic image or “earworm”) or with imagining sounds while silently studying a musical score (Tovey 1936). Mary Warnock provides a succinct summary of the relevant central idea as our “capacity to look beyond the immediate and the present” (1976, 201). Of course, this does not distinguish imagination from memory, which can share precisely the same content. When I look at my yellow house and remember that it used to be blue, I need not form a mental image of it as blue. However, suppose I do, and “picture” the house as it used to be. Psychologists refer to this phenomenon as memory imagery. Since the experiential content of memory and imagination imagery can be identical, we need nonphenomenal criteria for differentiating memory imagery from imagination imagery. Current consensus holds that imagining the house as blue differs from remembering that it was blue according to whether one believes that it was blue. Imagination imagery is belief-independent. If someone believes that the image reproduces something as it was experienced in the past, then it is a memory (even if it is false).5 In Roger Scruton’s preferred description, “imagination involves thought which is unasserted” (1974, 97; cf. Scruton 1999, 88–89). Other recent explanations say that imagination is “quarantined” from beliefs, where “pretense representations differ from belief representations by their function” (Nichols 2004, 130). Or, more precisely, by their reduced function as measured

by behavioral consequences. As a well-worn example has it, a horror movie may induce some level of fear, but imaginary monsters do not prompt normal people to call the police for help, nor do they jump from their seats and run for safety. When Woolf imagines the spotted fish in the eddy, she does not make plans to return to that spot later with a fishing rod.6 So which species of imagining are most relevant to music listening? I will concentrate on three modes of imaginative engagement that philosophers typically discuss in relation to experiencing pictures, literature, and music. They are propositional imagining, imagination imagery, and hearing-in.7 Propositional imagining involves conceiving or making-believe that a proposition or set of propositions is true of some world, without necessarily believing that it holds true of our own. This species of imagining is normally understood to simulate belief. For example, suppose you are reading Jane Austen’s Sense and Sensibility and reach the line, “a neat wicket gate admitted them into [a small green court].” One way to respond to this linguistic prompt is to suppose, for purposes of the narrative, that the Dashwood family has now passed through a wicket gate and has entered a small courtyard in front of their new cottage. This thought may or may not be accompanied by a second imaginative activity, imagination imagery. Some readers will supplement the propositional imagining with imagery, visualizing (in their “mind’s eye”) a fence and wicket gate and a group of women going toward a cottage.8 There will be considerable variation in what is imagined. One reader may construct an image of a green, grassy courtyard in front of a one-story cottage. Another may furnish the area with rose bushes, and imagine a small, two-story house. However, music listening might require a third species of imaginative engagement, which literary fiction does not require. This third kind is common with sculpture, pictures, and films: it involves imaginative transformation of what one directly perceives. Suppose I look at a landscape painting and I see both a paint-cracked surface and the painting’s representation of a horse-drawn cart crossing a stream beside a white cottage. Looking at the canvas, I imagine that I am actually seeing the English countryside in some past age. This kind of imaginative engagement accompanies reading when illustrations appear in the particular edition that one is reading. But, graphic novels aside, pictures are not necessary for the experience of literature. For sculpture and pictures, the requisite imaginative engagement with the visual object is generally referred to as seeing-in (e.g., Lopes 2005). With music, the parallel case is hearing-in, as when the rumble of thunder is heard in the tympani rolls in the storm sequence of Beethoven’s Pastoral Symphony.9 Hearing-in is guided by the listener’s direct experience of sonic features, their combination, and their sequencing. Hearing-in and seeing-in are alike in that each involves imaginatively experiencing a perceived object to be something more than it is.
Granted, both seeing-in and hearing-in are sometimes supplemented with—and guided by—propositional imagining.10 Informed listeners may attend to a relevant passage in the Pastoral Symphony by consciously or unconsciously imagining “The storm is passing now” without believing they have witnessed a storm.11 Others will also add imagination imagery, supplementing the auditory experience by visualizing a storm and then its

lifting.12 There will be cases, therefore, when seeing-in and hearing-in invite all three species of imaginative engagement. Is this third kind of imaginative engagement with music, hearing-in, required in music listening, either by itself or as guided by propositional imagining? A central test case for the necessity of imagination in listening is the thesis that music demands hearing-in when we imaginatively “animate” what we hear (e.g., Trivedi 2011, 118). Many accounts of listening treat hearing-in as essential to the experience of musical movement and to the experience of music’s expressive qualities. However, it is possible that different imaginative processes—or perhaps none at all—are involved when we experience movement, structure, and expressivity. The other two species of imaginative engagement, propositional imagining and imagination imagery, are subject to the objection that the thoughts and images are unnecessary and inessential additions to the listening process. This objection appears to capture Woolf’s worries about her listening strategies. However, it cannot be raised against hearing-in if both of two conditions hold: (1) the experience of musical animation is an essential aspect of the experience of listening; and (2) the perceived animation requires imaginative hearing-in, which is not required more generally for auditory perception. Following some additional prefatory work, I will challenge the second of these two conditions; having done so, I will look for other ways that hearing-in might be essential to music listening.

Props, Triggers, and Absolute Music

My general concern is the plausibility of the thesis that a particular species of imaginative engagement, hearing-in, is required for music listening. In this section I elaborate on why hearing-in is the crucial test case. It does not take much to demonstrate the weaker thesis that imaginative engagement is frequently appropriate when listening. For example, it is appropriate for songs, opera, and program music where verbal cues will attune the listener’s sensitivity to extra-musical representation in various musical structures. Listening to Jimi Hendrix’s rendition of “The Star-Spangled Banner” at Woodstock, we should hear bombs exploding in the guitar pyrotechnics following the (unsung) line, “the rocket’s red glare, the bombs bursting in air.” Likewise, we should hear the foot treadle of the spinning wheel in the music of Schubert’s “Gretchen am Spinnrade.” Although these cases of hearing-in are appropriate, imaginative responses, they do not advance the case that hearing-in is a necessary element of music listening. They fail for the same reason that stage sets at the opera do not count as evidence that music is an audiovisual art form. These are hybrids of music and something more, and the “something more” is an obligatory guide to the imagination in our response to the hybrid object of attention (see Davies 1994, 113–114). The plausibility of the stronger thesis, that listening to music requires hearing-in, hinges on imagination’s role for listeners who do not receive explicit guidance from extra-musical information. As Eduard Hanslick (1986, 15) argues, absolute music is the

ideal test case for any view that a property or process is essential to music or music listening. We can generalize from it because it is “pure, objective, and self-contained—that is, not subordinated to words (song), to drama (opera), to a literary programme or even to emotional expression” (Hamilton 2007, 87). For the remainder of this essay, I will concentrate on examples that lack extra-musical information. But when listeners’ imaginations float free from intramusical guidance, purists can dismiss the imaginative response as subjective, irrelevant, and unmusical. So the strong thesis requires intramusical guidance from absolute music that yields (relatively) reliable recognition of whatever is heard in the music; listeners would report agreement at roughly the same level that people agree that a dog is pictured when shown a picture of a dog. Because no such level of agreement is evident with musical representation, we seem justified in dismissing idiosyncratic images, such as Woolf’s “pear tree on the top of the mountain.” After all, how can anyone hear a tree, much less a particular type of fruit tree, by way of hearing-in? But the very same objection can be raised against any musical representation in which a sound does not resemble another sound. The tympani may sound like thunder and the woodwinds can imitate birdcalls, but imagination will play a limited role in listening if it is limited to cases of onomatopoeia. We need more than onomatopoeia but less than universal recognition of whatever is represented. We need an account of how musical patterns and passages guide propositional and/or imaginative hearing in a non-onomatopoeic manner (see Davies 1994, chap. 2). To better understand guided response, it will be useful to adapt a distinction from Kendall Walton. Consider the difference between cases where we imagine, of some object that we perceive, that it is something it is not, versus cases where an object leads us to imagine something, but we do not imagine it of the prompt itself (Walton 1990, 25). To imagine that a tree stump is a bear is a case of the former, whereas imagining that it is raining somewhere when one sees a dripping faucet is a case of the latter sort (because one is not imagining that the dripping water is rain). In the former case, the object is a prop, while in the second it is a trigger. The perceived object is a prop if there are conventions in place by which its properties generate fictional truths that guide our imaginings; that is, it is a prop if its particular features guide appropriately backgrounded participants to imagine a determinate state of affairs. Suppose we are playing the board game Monopoly and I mistakenly move someone else’s token and then “buy the railroad” on which it lands. Other players can (and will!) object that I have moved from the wrong location and so cannot buy that railroad with my Monopoly money. The various props together with established rules-of-play endorse certain imaginative responses and not others. Here, the mistake of moving the wrong token has an objective consequence for what is taking place (and, also, not taking place) in the game world. Conversely, the same object of perception is a mere trigger when the imagined content is imaginative imagery that is idiosyncratic and unconstrained by the object’s features. Suppose I select the wheelbarrow as my game token because it encourages happy thoughts of a bountiful harvest from my small backyard garden.
But my garden is too small to involve use of a wheelbarrow, and my response floats free of the game; now, the token functions as a trigger, rather than a prop, for my imaginative enrichment.

Aligned with the distinction between hearing-in and imaginative imagery, the distinction between props and triggers offers a general framework for evaluating differences in listeners' responses. It directs us to ask, of any particular imaginative response, whether it is appropriately directed and focused by perceptual cues. Purists are correct to question the appropriateness of responses of someone who treats all Western instrumental music as a trigger for fanciful, free-roaming imaginings.13 So we seem to have a principled reason to set aside idiosyncratic responses, such as Woolf's fish and pear tree. Therefore, imagination imagery is a poor candidate for the strong thesis and we should concentrate on the way that music functions as a prop (rather than a trigger) for hearing-in. For example, Lee found a pattern of water imagery independently associated with certain pieces of absolute music (1932, 428–429; e.g., for a Chopin Nocturne). This high level of agreement suggests that this imagery arises because the music is a rule-governed prop for our hearing-in. However, we must not be too hasty. Instrumental music often serves as a fragmentary, largely indeterminate prop: the imaginings it licenses are less determinate than in most games of make-believe (Walton 1994, 52). From that perspective, there is no reason to object to the fact that Woolf's imaginary fish and tree are highly determinate interpretations of audible elements of the musical experience.14 She may be more "musical" than she thinks she is, believing otherwise due to the influence of formalists who deny that music should be a prop for hearing-in. What is at stake is whether there is any prop-function that holds for all music listening. Given the anti-imagination stance of formal purists, it is important to recognize that some formalists endorse hearing-in. Hanslick, perhaps the most influential formalist of the nineteenth century, urged a distinction between necessary and unnecessary imaginative hearing-in. He is frequently attacked for a variety of intellectual sins, real and exaggerated, but he is seldom given credit for foreshadowing Walton's distinction between props and triggers. Hanslick famously argues that it is an error to imagine a narrative or expressive persona when listening to Bach's Das wohltemperierte Klavier (1986, 14). However, Hanslick is not anti-imagination: "If we are to treat music as an art, we must recognize that imagination and not feeling is always the aesthetical authority" (5). Some imaginative responses count as appreciative responses, while some others do not (30). Basically, hearing-in is only appropriate when there are real properties of the music that serve as focal points for the imaginative response. For Hanslick, the first requirement is a culturally entrenched tonal system. On this basis, our sense of musical form is tied to our apprehension of it as a representation of "the motion of a physical process according to the prevailing momentum: fast, slow, strong, weak, rising, falling . . . It can depict not love but only such motion as can occur in connection with love" (11). Hearing-in generates awareness of musical animation. It is therefore essential to all musical content, which Hanslick identifies with "tonally moving forms" (29). He distinguishes this from cases of "hearing" where music is a mere trigger for free association and where the listener is not appreciating it for what it is, despite enjoying the experience (59).
Hanslick’s sketchy remarks endorse the necessity of imaginatively enriched perception. More importantly, he identifies the feature that has attracted almost universal consensus

as the indispensable example of hearing-in: musical movement. In contrast, purism (à la Gurney, Bell, and Lee) does not distinguish between appropriate and inappropriate cases of imaginative engagement. When instrumental music encourages thoughts and images of anything but the music, the purist dismisses it as a trigger for imaginative play that distracts from the proper object of attention, which is musical form. In principle, I side with Hanslick. Music is not a mere trigger for fanciful association, for we can identify ways in which the imagining of some properties or situations is directly guided by the music, where there is some level of intersubjective agreement that particular imaginings do or do not fit the music's features in the absence of extra-musical cues. The best possible case will be one where listeners cannot recognize musical properties without this sort of imaginative engagement. But if we can establish that hearing-in is necessary when listening, however minimally, the purist cannot use the exercise of imagination as a criterion for regarding a listener as unmusical or the listening as distracted.

Expressiveness
Appropriately backgrounded listeners frequently hear music as sad, joyful, anxious, and so on. Levels of agreement are so high that we can use expressivity as a test case of musical competence. For example, we must doubt the musicality of anyone who reports that Benny Goodman's performances of "Sing Sing Sing" sound melancholy and despairing. Following established philosophical usage, I speak of music's "expressiveness" and "expressive qualities" rather than its "expression of emotion." Genuine expression requires a person or sentient being who has an emotion and signals it to others by means of external signs. Thus, my dog expresses happiness by wagging his tail. However, composers are capable of composing music that sounds happy or sad, or happy or sad in a very particular way, without having to draw on their own emotional experiences as a source of the music's design. The key to composing sad music is knowing what sad music sounds like. There may be occasions where composers engage in self-expression, but self-expression is not necessary for the music's having an expressive dimension. Therefore, it is better to describe the sadness of, say, a twelve-bar blues as an expressive quality than to treat it as an expression of the emotion of sadness. I will be brief about expressiveness and imagination. Many, and perhaps most, philosophies of art analyze musical expressiveness by reference to imagination and make-believe. Malcolm Budd pinpoints the "underlying idea" as the proposal that "emotionally expressive music is designed to encourage the listener to imagine the occurrence of experiences of emotion" (1989, 135). Unfortunately, the idea that music's expressiveness emerges through imaginative engagement does not establish that all music listening requires imagination. Some music is not appropriately heard as possessing expressive qualities, including some of the serialism of Milton Babbitt, Pierre Boulez's Structures I and II, and Philip Glass's Music in Contrary Motion. "Expressionless" music is not restricted to the twentieth and twenty-first centuries. The fugues of

J.S. Bach's Das wohltemperierte Klavier and Die Kunst der Fuge are frequently identified as examples of "emotionless" musical masterpieces (Lang 1997, 509). So although a hearing-in account of expressiveness supports the view that imaginative enrichment is sometimes necessary to perceive expressive features, we should not generalize this finding to all music listening. However, I regard even that position on expressiveness as overly generous. The limited scope of that endorsement collapses if there is a plausible nonimagination account of music's expressiveness. Here, I think that Budd and Stephen Davies are correct to exclude imagination from our detection of musical sadness and happiness, the two most universally recognized "emotions" in music (Budd 1989, 137; Davies 2011, 1–20). We describe many external appearances with emotion terms, yet we do not always imagine in these cases that we are detecting any underlying mental states. For example, in the same way that we can describe weather as "gloomy" without attributing feelings to weather, we can describe someone as having an "angry" tone of voice without thinking they are angry. Since emotion descriptions of music are obviously descriptions of how the music sounds, phrases such as "angry music" and "sad music" may be compressed, literal descriptions of angry-sounding music, sad-sounding music, and so on. Yet the topic of musical expressiveness is not irrelevant to our interests here. Many accounts of expressiveness regard it as dependent on a second phenomenon, our experience of musical animation or movement (Hanslick 1986, 11; Lee 1932, 80; Kivy 1989, 52–58; Levinson 2006b, 121–123; Davies 2011, 10–11). In turn, the experience of musical movement and animation is generally thought to require imaginative engagement (and doubly so, when it is interpreted as a bodily gesture reflecting agency). Since all music displays some kind of motion or animation, it is not expressiveness but rather musical motion that provides a universal musical phenomenon that may require imagination. I investigate this proposal in the next section.

Metaphor, Musical Space, and Movement
Here is Shakespeare, four hundred years ago: "That strain again! It had a dying fall."15 Which strain does Duke Orsino want to hear again? The one that moves with a dying fall. Here is a recent description of some music in Bernard Herrmann's score for Hitchcock's Psycho: "The opposing nature of the two musical lines moving toward each other reflects . . . two perspectives" (Rothbart 2013, 46). The musical lines are oriented in an unreal acousmatic "space" in which they are moving toward each other, and on this basis they can represent what is happening in the film.16 But do we imagine the movement of a melodic line, or the leap of the octave? Despite significant cultural differences, the use of motion-terminology and action-descriptions to characterize music is a cross-cultural phenomenon (Becker 2010). An important

assumption is that we are not dealing with after-the-fact metaphorical descriptions, because then we may be looking at propositional imagining detached from hearing-in. We want to pursue the idea that we talk that way about music because that is how music sounds. And, to experience the music as moving, we must also perceive an unreal space in which it moves.17 The primary reason to think that imagination is at play here is that a melody does not literally die, for it is not alive. Nor can it approach another melody, for it is not an object in space. It is not subject to gravitational pull, so it cannot fall. A melody is a sequence of tones, one following another in time. Granted, sound waves are physical displacements, but a "falling" melody is not literally one in which sound waves move from a higher to a lower location in our physical environment. Similarly, music seems to speed up, or slow down, as happens repeatedly and to notable effect in the Beatles' "We Can Work It Out." However, a tempo change is very unlike our paradigm examples of motion, as when a moving vehicle accelerates. Unlike an automobile, music presents no stable object that changes speed as it moves through acousmatic space. Instead, tempo changes involve changes in the frequency of certain periodic events that we perceive as grouped together. But a change in the frequency of an event is not the same thing as a speeding up or slowing down. Roger Scruton provides an analysis of musical motion that, if accepted, proves that these experiences require imagination (see also Zangwill 2007). He argues that we cannot perceive harmony, melody, and rhythm except by thinking of sounds in terms of our concepts of spatial arrangement and movement. (For example, a chord covers a spatial distance or width, and a melody involves movement up and down and from one note to another.) Since the sounds do not actually possess spatial properties and actions of the sort we perceive in the music, music must be "the object of metaphorical perception" (Scruton 1999, 353; see also 2014, 161).18 In the context of the present discussion, Scruton's analysis is important for proposing that metaphors involve an imaginative transference of concepts, so his claim that music is the object of metaphorical perception fits our characterization of music as an imaginative prompt. He offers an account of propositional imagination as a guide to hearing-in. Scruton's theory of metaphorical listening is both interesting and influential, but many criticisms have been launched against it. In the end, it does not support the view that imaginative perception is required to listen to music. Here are two potent lines of criticism. First, metaphors are conceptual constructs, and there is no reason to suppose that prelinguistic infants engage in the conceptual transference that underlies metaphorical thinking. Yet there is solid empirical evidence supporting the view that prelinguistic infants attend to music and recognize at least some of its expressive content, in much the way that adult listeners do.19 So linguistically guided metaphor is not essential to our basic experience of music, nor to its basic expressivity. This point is especially salient when we consider the appearance of multiple distinct motions in short passages of music, as when the two musical lines approach each other in Herrmann's score for Psycho. It is unlikely that we construct a metaphor for each, one rising and the other falling, and then

a further metaphor for the relationship between them (as approaching each other). The metaphors would explode exponentially even in cases of a moderately more complex piece of music, such as when Talking Heads perform "Crosseyed and Painless" with a vocal line and seven distinct instrumental parts. If we do not find an alternative to the view that awareness of musical motion requires propositional imaginings, then how many metaphors do we juggle in our minds when we attend to polyrhythmic music of this sort? Or do we concede that we cannot hear the musicality of most of those instruments during most of the performance? But that is simply nonsense. We can perceive a great deal more musical detail and interplay than we conceptualize. Paul Boghossian provides a second criticism.20 We can distinguish between justified and unjustified metaphors. To do so, "we would have to be aware of some layer of musical experience with a perfectly literal content that our musical metaphors would be designed to illuminate. But there doesn't seem to be such a layer of experience" (Boghossian 2007, 123). To put it another way, if one insists that all music listening derives from the guidance of a particular metaphor, then there is no principled way to distinguish between the metaphorical and the literal components of the experience, and the literal component cannot guide the application of the metaphor. Consequently, we should also be able to listen to any sound sequence in terms of the same metaphor, and we will hear it as music. (On this hypothesis, there would be nothing radical in John Cage's invitation to attend to the music in seemingly nonmusical sound.) However, although almost everyone recognizes pitch differences in various natural and environmental sounds, and can hear patterns of change in these pitched sounds, it is very difficult to hear "music" in sound sequences when they have not been organized intentionally as such. Although there may be objective cues in some sound sequences (and absent from others) that invite us to recognize musical "motion" of various kinds, the perception of musicality does not arise from our application of a particular metaphor or from their subsumption under concepts imported from the visual and tactile realms. Stephen Davies (2011, 32) makes the related point that metaphors only facilitate conceptual transference when they are "live," as nonstandard descriptions that cast new light on a situation. However, we are not being creative or imaginative when we talk of musical motion, so the metaphor does not work as a metaphor any longer. I conclude that music listening does not require metaphorical perception, nor metaphor-guided perception. This conclusion deprives us of our most compelling reason to think that all music listening is infused with imagination. It does not, however, prove that imagination is always dispensable. Another argument might demonstrate that it has a necessary role, without invoking the guidance of metaphor. We might construct an alternative argument by modifying Scruton's premises. First, suppose there is a metaphor, but it comes after-the-fact or as a supplement to the experience, as our best description of what we experience with harmony, melody, and rhythm.21 Second, we now allow that we perceive neither spatial arrangement nor motion in a melody or rhythm, for there is no object in space that fits our description when we say that a sad melody droops.
We do not perceive space and motion and we do not, upon reflection, believe that music moves. Yet we experience something in the

music that is usefully described with this language. Some sort of transference is taking place. Therefore, some underlying mental process is at work that encourages this mode of description. We must be engaging in an imaginative transformation of what we perceive, even if what we perceive is ineffable and only approximated by our space and movement metaphors. We have some kind of propositional imagining that guides hearing-in, but its precise nature is not available to our introspection. There are three strong objections to this line of thinking, and I think they are jointly decisive. First, there is nothing added by appeal to imagination that is not accomplished by admitting that human experience is largely ineffable. We frequently resort to descriptions that we do not endorse as literally true. However, this practice does not generally prove that imagination has transformed the experience in a way that defies description, so there is no reason to postulate it concerning music. The second counterargument simply rejects the premise that generates the previous objection. We can deny that the experience of music is unusually resistant to fine-grained description. Hanslick makes the point that musicians and music theorists possess a detailed technical vocabulary for describing music. The problem is not music, but the fact that so few people have learned to employ this vocabulary. The frequent use of "poetical fictions" to describe basic musical phenomena provides no evidence that people hear something in the music that is not literally there (Hanslick 1986, 30). The correct conclusion is that we simply have a lot of people who have not learned to articulate what they hear. Third, Davies (2011, 25–32) argues that there is simply no metaphor at work when we say that a melody falls or that the span of one chord is wider than another. We talk of spatial distances and movement for all sorts of things besides bodies in our three-dimensional environment. We apply these concepts and use these words literally whenever our experience is highly similar to established exemplars. Our general "motion" vocabulary is polysemous, not metaphorical (Davies 2011, 32).22 We have hit another wall in the search for a compelling reason to grant that imaginative enrichment must inform listening.

Experiential Illusion
Not everything within experience that is observer-dependent is imagination-dependent. Rather than requiring imaginary or imagination-assisted aspect perception, the experience of musical animation corresponds to standard criteria for experiential illusion: we perceive an object in our environment, but we directly experience it as having properties that it does not really have (Johnston 2006, 269). Experiential illusions, such as the appearance of the line lengths in the Müller-Lyer illusion, persist even in the face of a belief that the object is not as it appears to be. Similarly, there are notable color illusions, as when two patches of identical color in the same image look like two different colors because of the different contrast colors that are beside them. More to the point, visual illusions are not always static. We readily "see" motion where there is no actual motion.

Obvious examples include marquee lights and strings of Christmas lights. When they are rapidly lit in sequence, they create the illusion that a single point of light is moving along the string. We see motion even when we know that something else is really happening. Rafael De Clerq (2007) offers the example of moving the cursor around on a computer screen. But I do not really move anything there. In reality "there is no cursor moving on my computer screen: there are just local changes in the light emitted (just as, strictly speaking, there [are] only sounds or vibrations in the surrounding air)" (De Clerq 2007, 162). The important point is that these illusory motions are systematic, natural, and belief-resistant effects of our perceptual system, and they are not imaginative transformations of a more basic experience. Likewise, the experience of musical space and musical motion seems to involve an immediate, unlearned, unconscious process. As with a host of optical illusions, our species just seems to be hard-wired to organize some kinds of sounds in particular ways that we find "musical." The experience of (illusory) movement in musical space would be a prominent example: we may talk about a piano sonata moving into a remote key, but there are no literal spaces or distances to traverse.23 Like any other perceptual illusion, the experience of melodic movement and of size or width in a musical chord persists in the face of knowledge that the relevant acousmatic "space" is not real. Because the experience of movement in music is common and fits our standard criteria for perceptual illusion, there is no reason to invoke imagination to explain the experience of acousmatic "space" and motion within it. Musical movement is an experiential illusion in which phenomenal effects systematically mislead us in response to certain kinds of sound structures, which composers and musicians exploit. (Again, the attention that infants give to melodies casts suspicion on the necessary role of imaginative processing.) If we are to appeal to illusion, rather than imaginings, we are committed to saying that it is the natural product of our inherited auditory system. Charles Nussbaum (2007) offers just such an account. The physiology of the human ear is integrated with mental structures that map all sounds spatially; we cannot attend to musical structure unless we also "move through [music's] virtual tonal space in imagination" (2007, 99). It is noteworthy that Nussbaum places no weight on his occasional references to imagination. Given his explanation of why our mental representation of tonal space is an unavoidable response, Nussbaum more often and more accurately refers to it as an illusion (e.g., 50). A related line of explanation might posit that there is some degree of synesthesia involved in music perception (i.e., some natural incorporation of nonauditory perceptual systems into auditory perception; see the chapter on cross-modal correspondences by Eitan and Tamir-Ostrover, volume 1, chapter 36). At the same time, Saam Trivedi (2008) and Stephen Davies correctly emphasize that any recourse to a biological explanation of this illusion is a supplement to the philosophical question at hand, which is the question of the proper referent of various terms when applied to music. Something perceived, or something imagined? Davies returns us to the philosophical issues by observing that a causal explanation is relevant when it explains why particular descriptions are employed so universally.
Cross-cultural uniformity in language use is a good

reason to think that the language is being used literally, not metaphorically: we really do experience music in terms of its own space, with movement in that space (Davies 2011, 31–32). And, again, that is a reason to think that we have hit another dead end in trying to prove that imagination is necessary for hearing sounds as music. If the experience of musical space and movement is going to be linked to imagination, the connection involves a sense of "imagination" that I have not explored. Imagination is only necessary if we stipulate that perceptual illusions are imaginative constructs. Although that stipulation was common in the past, contemporary usage does not endorse it.

Less Obvious Candidates
If musical motion is experienced rather than imagined, then what other phenomena might require imaginative engagement? There is a complex debate about the importance of listeners' comprehension of large-scale structure or musical "architecture" for complex instrumental compositions. Because these structures are never immediately present for direct perception, they must be imagined by knowledgeable listeners, either propositionally or through imagination imagery. Suppose Aaron Copland is right, and "imagination and the imagination alone" permits a listener "to see all around the structural framework of an extended piece of music" (1952, 15). Unfortunately, Copland's qualification about "an extended piece of music" shows that it might not be required for all listening. We certainly do not need it when listening to an instrumental version of a strophic folk song, such as Donald Byrd's jazz take on "House of the Rising Sun." It might not be necessary even for extended forms, such as symphonies. According to Jerrold Levinson's (1997) concatenationist account of basic musical understanding, it is not required, for we can get what is most important from any musical work by attending moment-to-moment. Let us pause to take a closer look at moment-to-moment listening, for it involves anticipation and expectation of where the music is going (Meyer 1956). It is this point, above all, that leads Hanslick to maintain that "imagination . . . is always the aesthetical authority" (1986, 5). Imagination yields aesthetic pleasure through "the mental satisfaction which the listener finds in continuously following and anticipating the composer's designs, here to be confirmed in his expectations, there to be agreeably led astray" (64). But why are our anticipations imagined in one of the three relevant ways identified earlier, rather than the product of a nonimaginative cognitive process, such as inference? Hanslick is silent on this point. If listeners are "agreeably led astray" because they have a justified belief that something will occur, imagination is superfluous. However, there is one phenomenon that might bolster the claim that musical anticipation is a kind of imaginative hearing-in. It is the case where a listener knows that a musical event will occur, but is nonetheless surprised by it. One of the classic examples is the crash of noise in the second movement of Haydn's Symphony No. 94, aptly nicknamed the Surprise Symphony. It startles me even though I know it will be there, an effect that

would seem to require imaginative representation of where the music is headed in the unfolding "world" of the music, independent of my belief about its real structure. However, this musical phenomenon can also be explained without recourse to imagination. Our listening process is cognitively complex, and consequently two distinct, competing thoughts can be generated by different cognitive processes. Haydn's "surprise" exploits schematic expectation on the local scale, which leads us to expect, moment to moment, another passage of soft and lulling music. Expectation of disruption is based on episodic memory (from study of the score, or from past listening), which conflicts with immediate listening expectations that arise unconsciously from awareness of the perceptual pattern as it is being heard. Two doxastic states are in conflict: for a moment, I genuinely expect a continuation of the soothing melody, and for independent reasons I expect a disruption. Again, we have a straightforward explanation of being "agreeably led astray" that does not require us to suppose that imagination is involved.24 It is only imagination if imagination likewise generates my expectation that the stuff in my mug will taste like coffee (because it tasted that way when I sipped it thirty seconds ago), or when I anticipate that the airplane moving in a straight line above me will, in the next few seconds, continue in the same line. However, this is a general (and dubious) claim about all near-term anticipation, and not illuminating about music. One final candidate remains, and I think it fits the bill—it does not justify a conclusion about all music, but it arises when listening to a great deal of music (especially Western tonal music), and it requires imagination. When Haydn's symphony arrives at its "surprise," it does more than violate our expectations about the music's likely continuation. The music's forward motion has been suddenly, momentarily arrested. Unconscious inference may be at work, yet it is not simply a matter of mistaken inference about the immediate future. As Daniel Barenboim puts it: "There is a certain inevitability about music. Once it is set in motion, it follows its own natural course" (2003, 190; see also Meyer 1989, 33). Walton rightly emphasizes that we hear musical patterns as more than sequences of motions: "we imagine (subliminally anyway) that causal principles are operating by virtue of which the occurrence of the dominant seventh makes it likely that a tonic will follow, and . . . we imaginatively expect the tonic, whether or not we actually expect it" (1994, 49). There are two imaginative acts in Walton's description. We imagine that the tonic will follow and we imagine causal principles by virtue of which this occurs. I have just explained why I do not think our expectation of the tonic is a case of imagining. But the idea "that certain musical events are nomologically connected" (Walton 1994, 49), where one sound or process is heard as if caused by another, is a strong candidate for imaginative hearing-in. Since we do not perceive causal connections, and so we are not subject to perceptual illusions about them, imagination appears to be at work when we experience music as if causal relationships internal to the music are guiding and organizing its unfolding.
The aptness of this causal pretense might be explained by an implication-realization model of musical listening (Narmour 1990), but our sense of guided motion goes beyond mere inferential expectation that some sound events are more or less likely.

What's more, this fictional causality is generally imposed within a teleological framework. In a lot of instrumental music, we sense the music's starting out for somewhere, its being sidetracked or disrupted, and its eventual arrival. One might suppose that we must therefore imagine the agency of a person, and therefore imagine a persona, and therefore endorse Jonathan Kramer's view that "tonal motion is always goal-directed" (1988, 25). However, I might be content with the more schematic idea of a virtual "gravitational field" with vectoral properties and competing forces (Hatten 2004, 133). Or I might flesh this out by imagining a pear tree, growing on a mountain, or the course of the waters of the Rhone as they flow from Switzerland to the Mediterranean. The structure of the music is a prop for imagining causal relationships and transformations, but this causal activity can be imagined under a wide range of descriptions. Someone who has no sense of causal relationships within tonal music must be listening in a very strange and attenuated way (much like the listening of a very young child).

Conclusion
The experience of causal forces at work within music, shaping and directing it, arises through the listener's imaginative engagement in a manner that can be characterized as hearing-in. In recognizing that there is at least one way in which music listening requires imaginative engagement in the Western common practice tonal tradition, I have defended a middle position between a musical purism that denigrates imaginative response and the traditional consensus that finds the imagination at work in multiple ways, especially in the apprehension of all musical motion and structure. The underlying experience of musical motion betrays the hallmarks of illusion, not imagination imagery or hearing-in. This illusion provides a natural, cross-cultural basis for music's expressiveness. So our experience of expressive qualities does not always require imaginative enrichment of what we hear. It might seem a bit of a letdown to conclude that neither propositional imagining nor hearing-in is required for much of the experience that is distinctively musical. However, that conclusion has been secured by focusing on examples of absolute music that are devoid of extra-musical cues about their interpretation. In practice, most music has either an associated text, or is rich in expressive properties, or both. Artworks typically prescribe imaginative engagement, and music typically does, too.25 To borrow Jerrold Levinson's way of putting it (2015, 135), an anti-imaginative view of "listening" misdirects us if it counsels us to ignore the invitations that most music extends. That is, most musical experience takes place in a cultural context that constitutes an invitation to imagine that it portrays particular situations or events, so the music serves as a detailed yet indeterminate prop for our imaginative engagement. Artworks and other cultural artifacts, including most music, are designed to elicit a particular response, and someone is not appreciating the artwork or cultural achievement for what it is if their response is indifferent to its cultural particularity. (Because

cultural artifacts have a history, responses that ignore their history may even invite moral censure; see Gracyk 2011.) A Strauss waltz's invitation to respond physically, by dancing, is quite different from a tone poem's invitation to respond imaginatively, including with rich imaginative imagery. And both of those invitations seem very different from that of Bach's Die Kunst der Fuge. Yet even that work invites imaginative engagement in the form of hearing-in. Consequently, anti-imagination purism endorses an impoverished response to a great deal of music.

Notes 1. The distinction between hearing music and listening to it is prominent in Hanslick’s formalism (1986, 60). For additional discussion of his distinction and how it was deployed, see Gracyk (2007, chap. 5), and Cook (1990, 15–17). 2. Prominent gesture theorists include Godøy (2010) and Levinson (2006a). 3. Susanne K. Langer categorizes a listener like Woolf as “a person of limited musical sense” (1957, 242); imagination imagery should not be encouraged by teachers, critics, and composers (243). 4. I set aside, without further comment, the degree to which listening to music involves imagination because imagination is at work in all perceptual experience, filling in the gaps as our attention flits among objects and providing more stability and coherence than we directly perceive (e.g., Zinkin 2003; Stevenson 2003, 249–253). Such an account will not be especially illuminating about music listening. 5. Context matters: beliefs and memories can be incorporated into fictional narratives; consequently, the distinction between memory and imagined event is sometimes a matter of the use of a representation (Walton 1990, 369). But, again, content alone does not make the difference. I thank Bryan Parkhust for reminding me of this point. 6. Tamar Gendler offers compelling arguments that “quarantine” is always limited, and that “a certain degree of contagion is inevitable, indeed desirable” (2010, 8; see especially chaps. 11 and 12). 7. Some analyses employ the phrase “imaginative hearing” rather than “hearing-in” (e.g., Trivedi 2011, 114). However, I avoid this phrase because other writers use it to refer to imagination imagery. 8. An important line of analysis, descriptivism, argues that there is no such imagery. See Pylyshyn (1973) and Dennett (1981). 9. Some accounts distinguish seeing-in from seeing-as, and so there might be reasons to worry about a difference between hearing-in and hearing-as (e.g., the former requires audibility but the latter involves propositional imagining). Following Levinson’s analysis (1996, 111–112), I doubt that the distinction is relevant to anything I discuss. 10. I thank an anonymous reader for noting that it is difficult to conceive of propositional imagining about music occurring without some form of imagination imagery. However, I would suggest that this commonly occurs when one encounters descriptions of fictional musical works in literary works, such as Marcel Proust’s description of the Vinteuil Sonata. For more on fictional composers and compositions, see Ross (2009). 11. Consequently, the issue on which I focus is distinct from the question of the minimum level and type(s) of conceptual information that a listener must apply in order to possess musical understanding. The scope of this debate is outlined by Davies (2011, 88–128). However, that debate concerns a listener’s genuine beliefs, not imagination.

12. Woolf's (2003) story is ambiguous: her description of the listener's stream-of-consciousness may describe propositional imagining, or it may describe imagination imagery, or both. 13. However, assigning imagination imagery to the category of "triggered" associations does not prove that listeners are "less musical" because they sometimes respond in this manner. 14. Kendall Walton resists the interpretation that all music is representational and/or a prop (1994, 59–60). However, his reservations turn on the technical point that one is actually using one's experience, not the music itself, as a prop. So my use of "prop" is more liberal than Walton's considered view (and more in line with Walton 1990, 63). 15. William Shakespeare, Twelfth Night, or What You Will, 1.1. 16. Movement in acousmatic space is to be distinguished from movement of sound in real space, as when the marching band moves toward you and then away from you as it moves down the street. 17. Budd (2003) has the most stringent position, denying that we must perceive an acousmatic "space" in order to perceive musical motion. 18. Music theorist Steve Larson (2012) adopts and develops Scruton's thesis by extending it to metaphors of musical forces. 19. E.g., Nawrot (2003). At six months of age, infants display marked musical preferences based on cultural exposure (Adachi and Trehub 2012). Although their listening is impoverished compared to that of a competent adult, they are listening, not merely hearing, and there is no reason to think that they are guided by metaphors. 20. Trivedi (2008, 51–52) makes a similar argument, but it depends on a premise about the eliminability of all metaphor that is too sweeping. 21. Budd (2003, 212) proposes that Scruton might be read in this way, as distinguishing between the perception and the metaphor, where the metaphor arises only in any subsequent verbal expression of that experience. However, Budd's interpretation flies in the face of the great many places where Scruton clearly says that "our experience of music involves an elaborate system of [spatial] metaphors" (Scruton 1999, 80). 22. Consider the prevalence of such language in descriptions of philosophical exchanges: a philosopher "stakes out a position" and then makes a "move" in an argument. These descriptions have ceased to be metaphors and they can be employed without exercise of the imagination. 23. This analysis is briefly suggested in Davies (2011, 32). 24. This explanation, in terms of two response systems, paraphrases Huron (2006, 226). 25. See Kieran (1996, 337). Having rejected the view that music is representational and invites imagination whenever it "moves" or displays expressiveness, Kieran's characterization is more accurate than Walton's position that "virtually all music qualifies" as representational (Walton 1994, 48).

References Adachi, M., and S. E. Trehub. 2012. Musical Lives of Infants. In The Oxford Handbook of Music Education, edited by G. McPherson and G. Welch, 229–247. New York: Oxford University Press. Addison, J., and R. Steele. 1965. The Spectator. Vol. 3. Edited by D. F. Bond, Oxford: Clarendon Press. Barenboim, D. 2003. A Life in Music. Edited by M. Lewin. New York: Arcade Publishing. Batteux, C. 2015. The Fine Arts Reduced to a Single Principle. Translated by J. O. Young. Oxford: Oxford University Press.

Becker, J. 2010. Exploring the Habitus of Listening: Anthropological Perspectives. In Handbook of Music and Emotion: Theory, Research, Applications, edited by P. N. Juslin and J. A. Sloboda, 127–158. Oxford: Oxford University Press. Bell, C. 1914. Art. London: Chatto and Windus. Boghossian, P. 2007. Explaining Musical Experience. In Philosophers on Music: Experience, Meaning, and Work, edited by K. Stock, 117–129. New York: Oxford University Press. Budd, M. 1989. Music and the Communication of Emotion. Journal of Aesthetics and Art Criticism 47: 129–138. Budd, M. 2003. Musical Movement and Aesthetic Metaphors. British Journal of Aesthetics 43: 209–223. Cook, N. 1990. Music, Imagination, and Culture. Oxford: Oxford University Press. Copland, A. 1952. Music and Imagination. Cambridge, MA: Harvard University Press. Davies, S. 1994. Musical Meaning and Expression. Ithaca, NY: Cornell University Press. Davies, S. 2011. Musical Understanding and Other Essays on the Philosophy of Music. Oxford: Oxford University Press. De Clerq, R. 2007. Melody and Metaphorical Movement. British Journal of Aesthetics 47: 156–168. Dennett, D. 1981. Two Approaches to Mental Images. In Imagery, edited by N. Block, 87–107. Cambridge, MA: MIT Press. Du Bos, J.-B. 1748. Critical Reflections on Poetry, Painting and Music. 5th ed. Translated by T. Nugent. London: John Nourse. Dutton, D. 2009. The Art Instinct: Beauty, Pleasure, and Human Evolution. Oxford: Oxford University Press. Elliott, R. K. 1967. Aesthetic Theory and the Experience of Art. Proceedings of the Aristotelian Society 67: 111–126. Gendler, T. S. 2010. Intuition, Imagination, and Philosophical Methodology. Oxford: Oxford University Press. Godøy, R. I. 2010. Gestural Affordances of Musical Sound. In Musical Gestures: Sound, Movement, and Meaning, edited by R. I. Godøy and M. Leman, 103–125. New York: Routledge. Gracyk, T. 2007. Listening to Popular Music: Or, How I Learned to Stop Worrying and Love Led Zeppelin. Ann Arbor: University of Michigan Press. Gracyk, T. 2011. Misappropriation of Our Musical Past. The Journal of Aesthetic Education 45 (3): 50–66. Gurney, E. 1880. The Power of Sound. London: Smith, Elder & Company. Hamilton, A. 2007. Aesthetics and Music. London and New York: Continuum. Hanslick, E. 1986. On the Musically Beautiful. Translated by G. Payzant. Indianapolis: Hackett. Hatten, R. S. 2004. Interpreting Musical Gestures, Topics, and Tropes: Mozart, Beethoven, Schubert. Bloomington: Indiana University Press. Hume, D. 1987. Of the Standard of Taste. In Essays Moral, Political, and Literary, edited by E. F. Miller, 226–249. Indianapolis, IN: Liberty Fund. Huron, D. 2006. Sweet Anticipation: Music and the Psychology of Expectation. Cambridge, MA: MIT Press. Johnston, M. 2006. Better than Mere Knowledge? The Function of Sensory Awareness. In Perceptual Experience, edited by T. S. Gendler and J. Hawthorne, 260–290. Oxford: Oxford University Press. Kant, I. 2000. Critique of the Power of Judgment. Translated by P. Guyer and E. Matthews. Cambridge: Cambridge University Press.

imaginative listening to music   487 Kieran, M. 1996. Art, Imagination, and the Cultivation of Morals. Journal of Aesthetics and Art Criticism 54: 337–351. Kivy, P. 1989. Sound Sentiment: An Essay on the Musical Emotions. Philadelphia, PA: Temple University Press. Kramer, J.  D. 1988. The Time of Music: New Meanings, New Temporalities, New Listening Strategies. New York: Schirmer. Lang, P. H. 1997. Music in Western Civilization. New York: W.W. Norton. Langer, S. K. 1957. Philosophy in a New Key: A Study in the Symbolism of Reason, Rite, and Art. 3rd ed. Cambridge, MA: Harvard University Press. Larson, S. 2012. Musical Forces: Motion, Metaphor, and Meaning in Music. Bloomington: Indiana University Press. Lee, V. 1932. Music and Its Lovers: An Empirical Study of Emotion and Imaginative Responses to Music. London: G. Allen & Unwin. Levinson, J. 1996. Musical Expressiveness. In The Pleasures of Aesthetics: Philosophical Essays, 90–125. Ithaca, NY: Cornell University Press. Levinson, J. 1997. Music in the Moment. Ithaca: Cornell University Press. Levinson, J. 2006a. Sound, Gesture, Spatial Imagination, and the Expression of Emotion in Music. In Contemplating Art: Essays in Aesthetics, 77–90. Oxford: Oxford University Press. Levinson, J. 2006b. Nonexistent Artforms and the Case of Visual Music. In Contemplating Art: Essays in Aesthetics, 109–128. Oxford: Oxford University Press. Levinson, J. 2015. Musical Concerns: Essays in Philosophy of Music. Oxford: Oxford University Press. Lopes, D. M. 2005. Sight and Sensibility: Evaluating Pictures. Oxford: Oxford University Press. Mattheson, J. 1739. Der vollkommene Capellmeister. Hamburg, Germany: Christian Herold. Meyer, L. B. 1956. Emotion and Meaning in Music. Chicago: University of Chicago Press. Meyer, L.  B. 1989. Style and Music: Theory, History, and Ideology. Chicago: University of Chicago Press. Narmour, E. 1990. The Analysis and Cognition of Basic Melodic Structures: The ImplicationRealization Model. Chicago: University of Chicago Press. Nawrot, E. S. 2003. The Perception of Emotional Expression in Music: Evidence from Infants, Children and Adults. Psychology of Music 31: 75–92. Nichols, S. 2004. Imagining and Believing: The Promise of a Single Code. Journal of Aesthetics and Art Criticism 62: 129–139. Nussbaum, C.  O. 2007. The Musical Representation: Meaning, Ontology, and Emotion. Cambridge, MA: MIT Press. Pylyshyn, Z.  W. 1973. What the Mind’s Eye Tells the Mind’s Brain: A Critique of Mental Imagery. Psychological Bulletin 80: 1–24. Rosen, C. 1995. The Romantic Generation. Cambridge, MA: Harvard University Press. Ross, A. 2009. Imaginary Concerts: The Music of Fictional Composers. New Yorker, August 24: 72. Rothbart, P. 2013. The Synergy of Film and Music: Sight and Sound in Five Hollywood Films. Lanham, MD: Scarecrow Press. Scruton, R. 1974. Art and Imagination: A Study in the Philosophy of Mind. London: Methuen. Scruton, R. 1999. The Aesthetics of Music. Oxford: Oxford University Press. Scruton, R. 2014. The Soul of the World. Princeton, NJ: Princeton University Press. Sparshott, F. 1990. Imagination: The Very Idea. Journal of Aesthetics and Art Criticism 48: 1–8. Stevenson, L. 2003. Twelve Conceptions of Imagination. British Journal of Aesthetics 43: 238–259. Tovey, D. F. 1936. The Training of the Musical Imagination. Music and Letters 17: 337–356.

Townsend, D. 2006. Historical Dictionary of Aesthetics. Lanham, MD: Scarecrow Press. Trivedi, S. 2008. Metaphors and Musical Expressiveness. In New Waves in Aesthetics, edited by K. Stock and K. Thomson-Jones, 41–57. Basingstoke, UK: Palgrave Macmillan. Trivedi, S. 2011. Music and Imagination. In The Routledge Companion to Philosophy and Music, edited by T. Gracyk and A. Kania, 113–122. London and New York: Routledge. Walton, K. L. 1990. Mimesis as Make-Believe: On the Foundations of the Representational Arts. Cambridge, MA: Harvard University Press. Walton, K. L. 1994. Listening with Imagination: Is Music Representational? Journal of Aesthetics and Art Criticism 52: 47–61. Warnock, M. 1976. Imagination. Berkeley and Los Angeles: University of California Press. Woolf, V. 1977. The Diary of Virginia Woolf, Vol. 1: 1915–1919, edited by A. O. Bell. New York: Harcourt Brace Jovanovich. Woolf, V. 1980. The Diary of Virginia Woolf, Vol. 2: 1920–1924, edited by A. O. Bell. New York: Harcourt Brace Jovanovich. Woolf, V. 2003. A Haunted House: The Complete Shorter Fiction, edited by S. Dick. London: Vintage. Zangwill, N. 2007. Music, Metaphor, Emotion. Journal of Aesthetics and Art Criticism 65: 391–400. Zinkin, M. 2003. Film and the Transcendental Imagination: Kant and Hitchcock's The Lady Vanishes. In Imagination, Philosophy, and the Arts, edited by D. Lopes and M. Kieran, 245–258. London: Routledge.

Chapter 24

A Hopeful Tone
A Waltonian Reconstruction of Bloch's Musical Aesthetics
Bryan J. Parkhurst

Introduction
Here are two similar-sounding terms: normative aesthetics and normativist aesthetics. The principal contentions of this paper are that (1) Ernst Bloch's normative aesthetics of music and Kendall Walton's normativist aesthetics of music both set out to address the relationship between music and the imagination or, more broadly, between "musicking" (Small 1998) and the imagination1; and that (2) Walton's normativist theoretical framework provides conceptual resources that are helpful for interpreting and critiquing Bloch's normative claims. The first order of business, then, is to give a provisional explanation of the difference between normative aesthetics and normativist aesthetics. Normative aesthetic claims belong to the realm of the aesthetic "ought." They concern what the aesthetic subject ought to do and how the aesthetic object ought to be. Aristotle speaks in a normative-aesthetic register when he states:
if you string together a set of speeches expressive of character, and well finished in point of diction and thought, you will not produce the essential tragic effect nearly so well as with a play which, however deficient in these respects, yet has a plot and artistically constructed incidents (1961, 63).

So does Hume, in telling us that “to enable a critic the more fully to execute [his critical] undertaking, he must preserve his mind free from all prejudice, and allow nothing to enter into his consideration, but the very object which is submitted to his examination” (Hume 2006, 244). It is thus correct to say that “normative aesthetics establishes rules for

the artist and standards for the critic" (Jerusalem 1920, 207). But it is not complete, for the aesthetic subject need not be an artist or a critic in the usual sense (she might, for example, be a participant in a traditional worksong of a tribal community) and the aesthetic object need not be an artwork (it might instead be a human body, a sunset, a mathematical equation). Additionally, we might include in the domain of normative aesthetics interpretive claims about what a specific work means or represents (taking a cue from philosophers of language who hold that meaning itself is a normative property). And we might also include judgments that concern the moral and political character of works of art, so that normative aesthetics becomes capacious enough to encompass, as well, the critical theory or critique of aesthetic phenomena, that is, morally or politically committed aesthetic inquiry animated by "a vision of the good social order grounded in both a detailed, empirical understanding of how existing institutions function and a commitment to normative criteria that are (in the broadest sense) ethical" (Neuhouser 2011, 281). A good thing to mean by "normativist aesthetics," and what is meant by it here, is the investigation, description, and systematization of the norms that are held to be constitutive of a given aesthetic activity, that is, the norms one must abide by insofar as one is a participant in an aesthetic practice.2 "Meta-aesthetics" might also be an appropriate term for this type of inquiry. On one reading of it, Kantian aesthetics is in large part normativist: it seeks to identify the norms the adjudicating subject follows in performing a type of judgment that counts as distinctively aesthetic (as opposed to distinctively empirical or moral, in the Kantian trichotomy).3 Whereas normative aesthetics espouses norms, normativist aesthetics delineates the internal practical structure of normatively governed practices, and is in that sense a kind of Geistes- or Kulturwissenschaft, or what could be called a philosophical anthropology. As we shall see, the line of demarcation between normative and normativist aesthetics, although it will be useful for us as a heuristic, is not sharply inscribed.4 In what follows, I juxtapose Marxist normative aesthetics with Anglo-American analytic normativist aesthetics. I do this by looking at Bloch's theory of utopian musical listening (a theory, I will suggest, of how music ought to be heard, of how its latent revolutionary content ought to be disclosed by acts of imaginative listening) against the background of Walton's theory of musical representation and emotionality (an account of the norms that govern music-centered make-believe). My aim in bringing together these two very different philosophical treatments of the musical imagination is to use Waltonian tools to reconstruct a core Blochian position regarding the relationship between musical sound and revolutionary political consciousness.
The position in question is that music makes an appreciable contribution to the psychological faculty of imagining, and to the political project of constructing, a better world, a “regnum humanum” (a kingdom of humanity, Bloch 1986, 1296), in which there is an abolishment of alienation, violence, and privation.5 For Bloch, a Hegelian Marxist, the project of actualizing a regnum humanum through the implementation of communism is a historical labor of communal Selbstbildung and Selbstverständigung (self-formation and self-reflection):

Once man has comprehended himself and has established his own domain in real democracy, without depersonalization and alienation, something arises in the world which all men have glimpsed in childhood: a place and a state in which no one has yet been. And the name of this something is home (Heimat). (1971, 44–45)6

I am concerned to assess whether Bloch is entitled to regard musical works as indispensable coadjutants in this process and to invest them with as much political gravity as he does. The structure of this chapter is as follows. I begin with an overview of Walton's theory of fiction and his application of it to music. I then historically situate Waltonian musical aesthetics as a way of transitioning to the contrasting ideological framework of Bloch's Marxist aesthetics. Thereafter, I use Walton's categories to give an interpretive reconstruction of Bloch's key proposal about music. Having thereby delimited a commitment that is (1) plausibly attributable to Bloch and (2) clear enough to be evaluable, I close with some evaluative remarks.

Waltonian Fictionality
Walton's theory of fiction, as set out most notably in Mimesis as Make-Believe, is a theory of what it is for an artwork to have the representational content it has.7 The content of representational works of art corresponds to what is true in the world of the work. Saying that a proposition is true in the world of the work is the same as saying that the proposition is "fictional." And the fact that a proposition is fictional is a fact about a normative status it possesses. Propositions that are fictional are to be imagined; they are what an appreciator (a reading, listening, viewing consumer) of an artwork ought to (is under some form of normative pressure to) imagine, because and insofar as she engages with the artwork as a participant in a game of artwork-centered make-believe. Hence, for Walton, what is "inside" an artwork—the otherworldly fictional content it contains—has everything to do with what goes on "outside" of it, that is, with how this-worldly appreciators conduct themselves in relation to the artwork and in accordance with certain rules of aesthetic behavior that dictate what, how, and when to imagine:
Fictional worlds are imaginary worlds. Visual and literary representations establish fictional worlds by virtue of their role in our imaginative lives. The Garden of Earthly Delights gets us to imagine monsters and freaks. On reading Franz Kafka's story, "A Hunger Artist," one imagines a man who fasts for the delight of spectators. It is by prescribing such imaginings that these works establish their fictional worlds. The propositions we are to imagine are those that are "true in the fictional world," or fictional. Pictures and stories are representational by virtue of the fact that they call for such imaginings. (2015, 153)

A game of make-believe played with an artwork is often one in which the work serves as a “prop.” A prop has the function of rendering certain propositions fictional in the context of a set of “principles of generation.” These are conventions that regulate how specific features of the prop (such as the property a slab of marble has, when carved just so, of resembling an uncovered female figure) confer fictionality on specific propositions (such as the proposition Aphrodite is naked). But some of the make-believe called for by an artwork is not centered on the artwork’s (propositional) content proper (its fictional world) and is instead centered on the appreciator’s own sensory and cognitive engagement with that artwork as a prop. When looking at Bruegel the Elder’s The Peasant Wedding, according to Walton’s account, I imagine not only that there is rustic merrymaking, but also that I see rustic merrymaking, and that my visual experience of the painting is a visual experience of rustic merrymaking. What is imagined in imaginings of this sort belongs to a “game world” rather than to a “work world.” These imaginanda are not constitutive of the artwork’s subject matter aptly so-called, but imagining them is nevertheless made appropriate by the particular manner and circumstances in which the artwork puts its appreciator in epistemic contact with its subject matter.

This précis leaves out many subtleties. But three notable features of Walton’s theory are evident from what has been said so far:

1.  It is first and foremost a theory of the “intentionality” of artworks, their property of being about something or of being representations of something.
2.  The nature of this aboutness is cashed out mainly along propositional lines: art is representational when it normatively enjoins us to imagine that a proposition is true, that something is the case, that a state of affairs obtains.
3.  The appreciator’s aesthetic experience of an artwork is held, contra various species of formalism, to paradigmatically involve the interpretive, meaning-gleaning activity of deriving items of semantic content (fictions) from features of the work and from features of the context in which it is experienced.

This being the case, it seems right to class Walton’s theory as a normativist theory of reception. And it seems equally right to assume that this is not the sort of reception theory you get if you take music as your starting point or primary datum. Music looks to be a proposition-mongering affair only intermittently and per accidens, maybe even deviantly (in opposition to the true nature of true music). Pretheoretically, music does not strike us as a form of art that consistently and constitutively has us imagining that such-and-such is thus-and-so. “The weak representational nature of music (relative to the other arts)” (Klumpenhouwer 2002, 34)—what Richard Wagner called the “infinitely hazy character of music”—has led aestheticians to insist, with some justice, that music is not an art of content but instead an art of pure form. Or, in variations on this theme, they have held that music’s form is its content (as Eduard Hanslick believed); or that music’s content is distinctively and exclusively musical, owing to music’s thoroughgoing self-reflexivity, its inability or refusal to be a signifier of anything besides itself (as Heinrich Schenker believed). In the most extreme version of this gesture, it is claimed (by, among others, the musical aestheticians of the German Frühromantik, such as Tieck and Hoffmann) that music is sheerly ineffable, and that the content of a musical experience is entirely refractory to the fixities of linguistic description.8

The history of musical aesthetics contains far more denials than affirmations that music, as such, is “a transcribable, thus readable, discourse” (Attali 1985, 25) that is replete with linguistically paraphrasable content.9 But there is a dogged recurrence of the trope that music still somehow aspires to the condition of language, perhaps by being organized rhetorically, like a speech (as Baroque-era theorists such as Johann Mattheson argued), perhaps by possessing an underlying grammar-like structure (e.g., a formalizable syntax) that floats free from a domain of reference,10 or perhaps by being in some looser way pseudolinguistic, for example, by being a “temporal succession of articulated sounds that are more than just sound, [a succession that is] related to logic [in that] there is a right and a wrong” (Adorno 1993, 401). Yet often, as in the case of Adorno’s theory, the acknowledgment of a “language-character” (Sprachcharakter) in music is accompanied by undiminished eagerness to evacuate music of semantic significance. Music may in some sense “speak” to its hearers and abide by some kind of “logic,” Adorno believes, “but what is said cannot be abstracted from the music; it does not form a system of signs” (1993, 401).11 Whether or not Adorno’s or any other antisemantic theory fully and accurately diagnoses music’s condition, it seems inarguable that antisemanticism represents a motivated response to some deep, distinguishing, and perhaps distinguished feature of music. And music’s “infinitely hazy character” shows up as a problem to be reckoned with if one’s explanatory standpoint, like Walton’s, is, on the whole, a propositional/representational one.

Walton on Music

Walton’s representation-centric normativist theory does not rest on a bedrock of intuitions and considerations that are primarily musical: music alone, we can agree, simply does not create a pressing demand for a theory of fictional representation. But, as Walton’s rich and attentive discussions of music show, this does not mean that music cannot be profitably examined from a representationalist angle. Walton’s important contribution to the philosophy of music is to have drawn our attention to a host of continuities between music and the other (more unequivocally representational) arts, continuities that would have been suppressed or passed over by a theory that began with, and that attempted to theoretically enshrine, the conviction that music is sui generis. In Mimesis, and in several articles that specifically address music, Walton presents reasons for believing that much, perhaps most, music is representational in the same sense that novels and paintings are. These include, but are not limited to, the following:

1.  A musical work may be associated with a fictional world because it prescribes imaginings about its own constituents, such as its chords and melodic lines. For example, we may imagine that musical objects causally impinge on one another:

[w]e imagine (subliminally anyway) that causal principles are operating by virtue of which the occurrence of the dominant seventh makes it likely that a tonic will follow, and on hearing the dominant we imaginatively expect the tonic, whether or not we actually expect it. If, or to the extent that [this imagining] is prescribed, we have fictionality.  (Walton 2015, 154)

2.  In many instances, music’s expressivity may be a function of how it represents human behaviors, which it does by prescribing that we imagine that there is a behaver who behaves: [T]here can be no doubt that some expressive music is expressive by virtue of connections with human behavior. There is little strain in thinking of some musical passages as representing, as inducing us to imagine, exuberant or agitated or bold behavior. . . . Where there is behavior there is a behaver. If music represents an instance of behaving calmly or nervously or with determination, it represents, at least indirectly, someone so behaving. So the fictional world contains human beings, anonymous fictive agents, whether or not the sounds themselves are characters in it. (Walton 2015, 156–7)

3.  Music may represent properties of events or actions and thus make it fictional that property-bearing events or actions take place, while leaving largely indeterminate the kinds of objects or actors implicated in those events and actions: [T]he lateness of the upper voice, and its dallying quality, the rigidity of the bass’s progression, the fortuitousness or accidentalness of the D-major triad, the movement to something new, are in the music . . . Some of this at least is a matter of imagining. We imagine something’s being late, probably without imagining what sort of thing it is. And we imagine a fortuitous or accidental occurrence. . . . [W]hy shouldn’t it count as representational, . . . as representing instances of lateness, fortuitousness, etc.? (Walton 2015, 159)

Without feeling at all disposed to deny these claims, we may still feel disposed to pass comment on how theory-laden they are. These are the sorts of things one would be primed to notice and point out about music if one’s motivating objective were to extend the applicability of a propositional model of fictional representation, a model designed in the first instance to accommodate literary and visual aesthetic phenomena. This is not an indictment. All observations and explanations are probably in some measure theory-laden. Moreover, it is satisfying to follow Walton to the counterintuitive but unavoidable conclusion he reaches, which is that there is a class of familiar music-appreciative behaviors that centrally involve make-believe, such that music (more often than not, and more often than anyone had realized) is representational and is (to that extent) a cognate of pictures and literature qua fiction-conveying technologies. And the attraction of having a unified theory under which art-forms can be subsumed serves as an incentive to think and talk about music in terms of its pervasive representationality, or (equivalently) its fictionality, or (equivalently) its imagination-prescribing function. One may worry that those attractions will lead us to disproportionately accentuate music’s ties to word and image and to downplay whatever it is that is idiosyncratically musical (radically nonliterary, radically nonpictorial) about music. For we can readily grant that the kinds of fictionality Walton finds in music may be commonplace (many people may in fact perform such imaginings when they listen to music) and may be in some sense mandatory (one may do a worse job of appreciating music if one fails to perform such imaginings) while at the same time believing that such imaginings are not a sine qua non of musical experience in and of itself. One arguably cannot count as appreciating Maxim Gorky’s Mother as a novel, or count as appreciating Evdokiya Usikova’s Lenin with Villagers as a picture, at all without using one’s imagination to explore a fictional world populated with fictional objects and events. By contrast, although a relatively nonimaginative experience of Shostakovich’s Leningrad Symphony might arguably be impoverished in comparison with an experience that is rich in propositional imaginings, most of us will be unwilling to insist that a listener who fails to imagine the Siege of Leningrad while listening to this piece, or who fails even to nonspecifically imagine that something or other is destroyed or imperiled, or that violence or trauma somehow transpire, is thereby disqualified from counting as a musical listener altogether.12

But actually, there is little cause for worry, for Walton concedes most of this. He recognizes that there is a significant “remainder” (as Adorno would say) when music is brought under the categories native to a theory of fictional representation, something left over that is, paradoxically, made conspicuous by the very fact that it is excluded or downplayed, something that is (again as Adorno would say) “nonidentical” with the concepts whose explanatory use the Waltonian theory of fictionality encourages.13 Accordingly, Walton looks outside the bounds of fictional representation for a feature that marks music off from the literary and visual arts. To seek this, given an antecedent conception of aesthetic appreciation as something that requires compliance with norms of imagining, is to seek a form of imaginative experience the prescribing of which is unique to music. Walton finds music’s individuating trait, the differentia specifica that it does not share with novels and paintings, and so forth, in the way it prevails upon us to imagine that our auditory experience of the musical work is an affective experience of an emotional state. Music “gets us to imagine experiencing a certain feeling, and possibly expressing it or being inclined to express it in a certain manner. It often does this without getting us to imagine knowing about (let alone perceiving) someone else having that experience or expressing it in that manner” (Walton 2015, 173). Rather, music gets the listener to imagine of her experience of hearing sounds that it is an experience of a particular emotion. Walton’s terminology allows this idea to be expressed concisely: music, unlike other forms of art, gets us to treat our perceptual experience itself (as opposed to external objects that cause and are represented by that experience) as a prop. Walton frames the suggestion in terms of the difference between work worlds and game worlds:

Work worlds comprise fictional truths generated by the work alone. But feelings . . . do not exist independently of people who feel them. . . . So there is no pressure to regard the music itself as establishing a fictional world in which there are feelings. . . . It is the listener’s auditory experiences, which, like feelings, cannot exist apart from being experienced, that make it fictional that there are feelings. When the listener imagines experiencing agitation herself, there is no reason to think of the music as making anything fictional. It is the listener’s hearing of the music that makes it fictional that she feels agitated. The only fictional world is the world of her game, of her experience.  (2015, 173)

The Historical and Class Character of Walton’s Theory

Ignoring for present purposes the technical nuts and bolts of what I have elsewhere called Walton’s “first-person feeling theory of musical expression,”14 I now segue into a discussion of Bloch by first ideologically diagnosing the kind of imaginative listening that Walton’s normativist theory takes as its object of theorization. A critical-historical vantage point allows us to see that this mode of listening has affinities with the ideology of what Korstvedt (2010, 122), following Adorno, calls “Romantic bourgeois Innerlichkeit.”15 One key respect in which the Waltonian listener can be identified as a stereotypically “Romantic” subject is that this listener’s experience (of music as a locus of expression or emotionality) is one in which introspectible “sentiment, longing, and emotion . . . even suppressed animality” (Korstvedt 2010, 122)16 are elevated to the status of ends-in-themselves. For the Romantic aesthete—in whose eyes art is a supremely valorized object of perception and cognition because it facilitates a “spontaneous overflow of powerful feelings”17—an activation or intensification or heightened awareness of the emotions is the raison d’être of music, or of a certain way of listening to it. According to this outlook, musical sounds are not to be valued primarily as stimuli that lead to proper action (as Plato’s account of the musical modes in The Republic holds) nor as aids to the restoration of bodily and spiritual equilibrium (as Aristotle’s account of musically abetted catharsis in the Politics holds). Nor are musical sounds to be listened to for their own sake, or for the sake of detached contemplation of their sensuous auditory properties, as happens in the modernist practice of “reduced” listening, that is, “listening for the purpose of focusing on the qualities of the sound itself (pitch, timbre, etc.) independent of its source or meaning” (Chion 1994, 222–223). Instead, music figures into the Romantic vision as an instrumentally valuable means to an intrinsically valuable end of emotional extremity or disequilibrium.

According to Hegel, whose aesthetic system both expounds and historicizes the Romantic conception of music, music’s release from its self-incurred tutelage (its ages-long period of subordination to the verbal art forms) occurs when it matures into a romantische Kunstform, a mode of artistic expression that taps into “inner spirit” (der innere Geist, i.e., emotional subjectivity). More so than the other arts, music provides “a resonant reflection, not of objectivity in its ordinary material sense, but of the mode and modifications under which the most intimate self of the soul, from the point of view of its subjective life and ideality, is essentially moved” (Hegel 1920, 342). Music is a “province which unfolds in expanse the expression of every kind of emotion, and every shade of joyfulness, merriment, jest, caprice, jubilation and laughter of the soul, every gradation of anguish, trouble, melancholy, lament, sorrow, pain, longing and the like, no less than those of reverence, adoration, and love fall within the appropriate limits of its expression” (Hegel 1920, 359). Adorno, a Romantic at heart, accepts these premises and attempts to draw out what he sees as their consequences for the psychology of class position. He interprets the musically assisted retreat into the “exclusive and private warmth” (Daniel 2001) of bourgeois interiority as a gesture of withdrawn resignation, on the part of a no longer heroic middle class, in the face of monstrous and impersonal forces and relations of production that lie outside the ken of the individual subject’s power to efficaciously intervene for social change:

Although inwardness, even in Kant, implied a protest against a social order heteronomously imposed on its subjects, it was from the beginning marked by an indifference toward this order, a readiness to leave things as they are and to obey. This accorded with the origin of inwardness in the labor process: Inwardness served to cultivate an anthropological type that would dutifully, quasi-voluntarily, perform the wage labor required by the new mode of production necessitated by the relations of production. With the growing powerlessness of the autonomous subject, inwardness consequently became completely ideological, the mirage of an inner kingdom where the silent majority are indemnified for what is denied them socially. (Adorno 1998, 116)

Walton, I wish to suggest, offers a normativist description of the reception practice that is the shared basis of Adorno’s and Hegel’s aesthetics of music: a modality of listening in which music functions as “a resonant reflection . . . of the mode and modifications under which the most intimate self of the soul . . . is essentially moved.” “Reflection,” in its Hegelian usage, stands for a relationship between inner and outer (the appearance of something reflects its essence), as well as for a form of second-order or self-directed awareness (I reflect on my own thoughts and experiences when they are treated as objects of further thoughts and experiences). As Walton describes it, music is caught up in a social practice that is reflective in both of these senses. Music’s much-vaunted expressiveness arises from its special use-value (to use Marx’s term), its capacity for setting in motion, and steering the course of, an exercise of the imaginative, affective, and introspective faculties. And this reflective, reflexive relation of the listening self to itself is achieved through the mediation of an external musical sound-object, a work that is the “reflective” outward appearance of an inner emotional esse, to wit, the music’s expressive content (Inhalt). Also, music’s conventionally instituted, socially enacted prescriptivity, the exhortative force with which it prompts us to engage in an imaginative exploration of the intimacies of the self, is the normative ground of its felt expressiveness for the individual subject. This represents an additional layer of reflective relation between inner and outer: the “outer,” social fact that music has the job of “get[ting] us to imagine feeling or experiencing exuberance or tension ourselves—or relaxation or determination or confidence or anguish or wistfulness,” according to Walton, “goes a long way toward explaining the intimacy [one feels] with the anguish in the music” (2015, 165). As all this indicates, Walton can plausibly be viewed as within the groove of a venerable Romantic problematic.

The Waltonian emotional listener is a “bourgeois” subject, we can go on to say (with an Adornian accent), in that her auditory activity is conducted as the private affair of an autonomous individual who exerts a form of self-control or self-mastery over her own subjectivity, specifically by configuring her auditory experience in accordance with the rules of a game of make-believe. A sense of ownership, possession, or inhabitance is implicit in such listening,18 as is suggested by the spatial metaphors Walton uses to describe it:

My impression is the opposite of being distanced from the world of the music. . . . I feel intimate with the music, more intimate, even, than I feel with the world of a painting . . . it is as though I am inside the music, or it is inside me. Rather than having an objective, a perspectival relation to the musical world, I seem to relate to it in a most personal and subjective manner.  (2015, 165)

And, although it is inherently norm-compliant, musical experience also stakes out a preserve in which the pursuit of one’s private aesthetic interest is insulated from interference by an external authority figure (the author). For, in music, unlike in literature and painting, it is the appreciator’s subjective experiences, rather than the work’s objective properties, that are directly responsible for generating fictional truths. In that sense, music’s normativity is self-legislated by the listener: It is the auditory experiences, not the music itself, that generate fictional truths. I can step outside of my game with a painting. When I do, I see the picture and notice that it represents a dragon, that it calls for the imagining of a dragon (even if I don’t actually imagine this). But when I step outside my game with music and consider the music itself, all I see is music, not a fictional world to go with it. There are just the notes, and they themselves don’t call for imagining anything. The absence of a work world does not, however, prevent the listener’s imagination from running wild as she participates in her game of make-believe.  (Walton 2015, 174)

The term “Innerlichkeit” (inwardness, interiority, subjectivity, introspection, introversion) is pertinent to all this. It captures the fact that the Waltonian listener is self-reflexive, is absorbed in the “world of her game,” is intent on the phenomenological properties made manifest to her in her own act of introspective self-awareness, and is concerned with the public, outer realm (the realm in which musical sounds have their objective being) only insofar as its sonic events provide a mode of access to, or an occasion to withdraw into, an interior realm of solitary, self-directed emotional fantasy.

Bloch’s Musical Aesthetics

In that it essentially involves phenomenological introspection, the having of an experience whose intentional content is (at least in part) the having of an experience (recall that Walton proposes that we imagine of our auditory experience of music that it is an emotional experience), Waltonian emotional listening represents an encounter with the self. This is a self-encounter in which, moreover, the imagination is irreducibly involved. Bloch’s philosophy of music likewise thematizes imaginative self-encounters and it likewise has a pronounced Romantic slant to it (Habermas [1969–1970] calls Bloch a “Marxist Romantic”). But rather than attempting to codify the norms indigenous to a historically localized form of “bourgeois-Romantic” listening, as Walton’s theory can be read as doing, Bloch’s philosophy of music endeavors to bring about the dialectical transcendence—the processual subversion, preservation, and elevation—of those norms. It does this by prescribing a mode of emotional listening that poses challenges to both the aesthetic ideology in which it itself is rooted—namely, bourgeois Innerlichkeit—and the economic configuration that is, in Bloch’s Marx-inspired view, determinative of this ideology—namely, the capitalist mode of production. This reappropriation and redeployment of his culture’s musical heritage is part and parcel of Bloch’s overall philosophical strategy of pitting culture (religion, philosophy, art, etc., as they have been handed down) against itself (by “sublating,” problematizing, and radicalizing it) for the sake of itself, that is, for the sake of instituting new cultural forms that can nurture the “subjective conditions for revolution” (Kellner and O’Hara 1976, 19) by evoking a “future kingdom of freedom as the real content of revolutionary consciousness” (Bloch 1972, 272). Habermas points to the Hegelian basis of this maneuver:

What Bloch wants to preserve for socialism, which subsists on scorning tradition, is the tradition of the scorned. In contrast to the unhistorical procedure of Feuerbach’s criticism of ideology, which deprived Hegel’s “sublation” (Aufhebung) of half of its meaning (forgetting elevare and being satisfied with tollere), Bloch presses the ideologies to yield their ideas to him; he wants to save that which is true in false consciousness: “All great culture that existed hitherto has been the foreshadowing of an achievement, inasmuch as images and thoughts can be projected from the ages’ summit into the far horizon of the future.”  (1969–1970, 312)

It is clear from the amount of attention Bloch lavishes on music,19 and from his undiluted enthusiasm for it, that he judges the Western musical heritage to be among the most precious items in the bequeathed patrimony of “great culture.” This is because music performs with distinction what Bloch sees as the rightful function of art in general, that of putting us in touch with our longing for, and with our will to create, a world unblemished by alienation, exploitation, and oppression. Music is preeminent for Bloch because it is preeminently utopian:

For Bloch, music is the most utopian of the arts. It is speech which men can understand: a subject-like correlate outside of us which embodies our own intensity, and in which we experience an anticipatory transcendence of the existing interval or distance (Abstand) between subject and object. “Identity,” the “last moment,” “a world for us,” “utopia” is present in music: as the anticipatory presence and pre-experience (Vorgefühl) of the possibility of self. . . . Music expresses something “not yet.” It copies what is objectively undetermined in the world. There is a human world in music which has not yet become actual: a pre-appearance of a possible regnum humanum. For Bloch, music is the most public organon of the Incognitio or subjective factor in the world as a whole, and it provides an anticipatory experience of the subject-like (subjekthaft) agens as if it had become objectified in the external world.  (1982, 175)

On top of having a revelatory or epiphanic function (which I return to below), music—as Bloch hears it, and as he directs us to hear it—has a motivational, action-precipitating, hortatory function that goes beyond a prescription to merely imagine. Bloch’s ideal listener, in and by listening, attunes herself to a subjective, atavistic “agens” (Bloch 1986, 204), a desiderative force that conducts the revolutionary agent forward along the historically necessary path. Music sets in motion and sustains a process of revolutionary world-transformation by providing a critical insight—a disconcealment of the distance between what the world is and what it ought to be—and by helping us to “keep faith with the directions implied by such beginnings” (Bloch 1960, 286; translated by Jameson 1971, 122). Music sonically amplifies, and places us in auditory communion with, the agens, “the primordial hunger that activates the (human) subject” as a site of “desire and hope” (Levy 1997, 177).

On top of having a revelatory or epiphanic function (which I return to below), music—as Bloch hears it, and as he directs us to hear it—has a motivational, action-precipitating, hortatory function that goes beyond a prescription to merely imagine. Bloch’s ideal listener, in and by listening, attunes herself to a subjective, atavistic “agens” (Bloch 1986, 204), a desiderative force that conducts the revolutionary agent forward along the historically necessary path. Music sets in motion and sustains a process of revolutionary worldtransformation by providing a critical insight—a disconcealment of the distance between what the world is and what it ought to be—and by helping us to “keep faith with the directions implied by such beginnings” (Bloch 1960, 286; translated by Jameson 1971, 122). Music sonically amplifies, and places us in auditory communion with, the agens, “the primordial hunger that activates the (human) subject” as a site of “desire and hope” (Levy 1997, 177). Thus, to adapt Marx’s phrase, Bloch’s philosophy of music seeks to discover the revolutionary kernel within the ideological shell of inherited modalities of musical listening. Bloch’s dialectical approach leaves intact many of the salient features of bourgeoisRomantic listening. Bloch never wavers in affirming music’s deep and ineradicable connection to emotion, subjectivity, and introspection: “Expression is always music’s terminus a quo and terminus ad quem” (1985, 206), and “music is . . .subjective . . .in that its expression . . .mirrors the affective looking-glass reflecting a given society and the world as it occurs in affective correlate” (208). Thanks to its “capacity for directly human expression” (Bloch 1985, 201), music enables us to “approach ourselves purely, encounter ourselves” (Bloch  2000, 39). But this encounter with the self is not—as it is in the approach to listening that is extolled by Hegel and analyzed by Walton—mostly or merely a turn inward toward solipsistic fantasies and unfettered emotional peregrinations. Musical experience “is not just romantic or quasi-freely subjective” (Bloch 1985, 200). Rather, the musical self-encounter is a turn inward that is always also a turn outward. The self-identity that the musical listener introspectively encounters is at the same time a reflective, communal identity. Hegel speaks in this connection of an “ ‘I’ that is a ‘We,’ and a ‘We’ that is an ‘I’ ” (Hegel 1977, 110). Emotionality remains a precondition for genuinely musical experience, but the having of a private emotional pathē is no longer held up as an end-in-itself. Rather, private, individual emotion is pressed into service for the sake of a public, communitarian causa finalis: “Expression is realized in forms regarded not as reifications and an end in themselves but as means to a word-surpassing or wordless statement and always, ultimately, to the utterance of a—call” (Bloch 1985, 205). A “call” in a number of senses. Music calls to us, in a “language” that has no recourse to words and that, either in spite of or because of this limitation, is able to tap into our

A Hopeful Tone   501 profoundest feelings of hope and expectancy.20 It calls for us, beckons us, from a utopian future we currently see through a glass, darkly. And it calls upon us to recognize and fully actualize the nature of our true, ideally social selves, both by causing us to regret, and to resolve to rectify, the current incompleteness of our historical project of selfemancipation and self-realization, and also by pushing us to adopt the means necessary for achieving a world that is adequate to our shared species-being (Gattungswesen). Such a world must of needs be characterized by collectivity, nonalienation, solidarity, and the absence of scarcity—attributes whose political and economic precondition, Bloch believes, is the abolition of the capitalist mode of production through the insurrectionary activity of the proletarian class. “The realized We-world” is Bloch’s term for the unqualifiedly redemptive commonwealth of humanity that is the asymptotic goal of the socialist movement. By means of an act of divinatory musical hearing (Hellhören), Bloch thinks, we can feel the real possibility of this future (or, if you like, future-perfect) state of affairs, can gain sensuous knowledge of the world’s objective tendency (Tendenz) to move toward the actuality of communism.21 “Music as a whole stands at the boundary of humanity, but [it is] the boundary where humanity, with a new language and the callaura surrounding deeply felt intensity, a realized We-world [der Ruf-Aura um getroffene Intensität, erlangte Wir-Welt] first comes into being. The order in the musical expression also suggests a house, even a crystal, but one composed of future freedom; a star, but as a new earth” (Bloch 1986, 1103).

Utopia

That all sounds rousing and eschatological enough, but what precisely can it mean? If we hope to decipher such aphorisms, we must place them against the background of Bloch’s more general theory of utopia, the master narrative that structures all of Bloch’s philosophical and sociological investigations. For the politically committed Marxist-Leninist Bloch of The Principle of Hope, a variety of human communicative practices—predominantly those involving the “production and usage of signs” to convey “social meaning[s] expressed in a code” (Attali 1985, 24)—possess a “utopian function.” In a motley assemblage of cultural forms—architecture, fairy-tales, the detective novel, religion, alchemy, circuses, advertisements, fashion, medicine, and the fine arts—Bloch espies a “Vorschein,” an anticipatory illumination of a possible, preferable, future state of affairs.22 Sometimes a Vorschein may reveal itself to us in the mundane transactions of contemporary commodity culture. “Shop windows and advertising are in their capitalist form exclusively lime-twigs for the attracted dream birds” (Bloch 1986, 334). To use one metaphor to unpack another: the siren song of manipulative marketing, notwithstanding its liability to mystify consumers and fetishize commodities, gives voice (unbeknownst to itself) to a legitimately humanistic wish for a new and improved way of life. In other cases, a Vorschein may only become perceptible in hindsight, through an interpretive reassessment of an antiquated cultural form, or what Hegel calls a “shape of life that has grown old” (Hegel 1992, 23):

[S]hould visionary hearing of that kind be attained through successful musical poesis a se, then all music we already know will later sound and give forth other expressive contents besides those it has had so far. Then the musical expression perceived up to now could seem like a child’s stammering by comparison, a language of an ultimate kind that is seeking to take shape but has come close to doing so only in a few, very exalted places. Nobody can understand it yet, although it is occasionally possible to surmise its meaning. But nobody has as yet heard Mozart, Beethoven or Bach as they are really calling, designating and teaching; this will only happen much later, with the fullest maturation of these and all great works.  (Bloch 1985, 207)

Both quotidian experiences and aesthetic experiences, both modern shapes of life and outmoded shapes of life can, under proper scrutiny, show us the yawning gap between how the world is and how it could and should be. Bloch’s hermeneutic undertaking is to use cultural and aesthetic criticism to sharpen our experience of the nonidentity of is and ought. He seeks to brighten and intensify the utopian Vorschein that is everywhere to be glimpsed in the world of human values, institutions, culture and art. Accordingly, the philosophy of music progressively elaborated in The Spirit of Utopia and The Principle of Hope encourages us to hear the revolutionary, hopeful tone23 that resounds in the masterworks of the Western canon (to which Bloch’s musical preferences are more or less restricted).24 With Bloch’s aesthetic philosophy as its handmaiden, music can finally come into its own as a “source-sound of self-shapings still unachieved in the world” (Bloch 1985, 219). Fair enough, but what exactly is a Vorschein, and how exactly is it to be found in music? Bloch’s cryptic, prophetic reflections on culture never give way to a precise statement of what it is for a cultural practice, musical or otherwise, to radiate a pre-appearance of utopia. At the risk of oversimplification, I would point to two basic ideas that seem to lie at the center of Bloch’s proposal: (1) there is a way of imaginatively engaging with cultural objects so that they provide a sensory and intellectual provocation to construct a mental representation of (fictional) utopian circumstances, and (2) this fiction-generating imaginative activity, properly performed, furnishes us with motivation to make the utopian fiction a true representation; we are propelled by our utopian make-believe to try to bring the world into alignment with the utopian fictions we imagine. These two ideas are especially germane in the case of musical experience, as Ruth Levitas notes: “Bloch argues not only that music is the most utopian of cultural forms but that it is uniquely capable of conveying and effecting a better world” (Levitas 2013, 220, emphasis mine). To paraphrase Levitas, music has pride of place in the sphere of the arts because of, on the one hand, its unparalleled capacity for possessing utopian sense or reference (the semantic property of being about or signifying utopia) and, on the other hand, its capacity for carrying utopian prescriptive force (the power to exhort us to make utopia real). Music exceeds the other arts in its power to summon a vision of a utopian “Not-Yet-Being” (Noch-Nicht-Sein) or “Real-Possible” (objectiv-real Mögliches)

that lies beneath the surface of “That-Which-Is” (Das Seiendes)—the world as it currently confronts and confounds us; and it also has greater power to make palpable the historical urgency of this vision: “music is that art of pre-appearance which relates most intensively to the welling core of the existence-moment of That-Which-Is and relates most expansively to its horizon—cantus essentiam fontis vocat [singing summons the existence of the fountain]” (Bloch 1986, 1069–1070).25 Couched in language that is less expressionistic: the world as it now is, which includes us as we now are, is pregnant with—contains as an “objectively real possibility” (objektiv-reale Möglichkeit)—the world as we imagine it should be, which includes us as we would wish ourselves to be. Music reveals that the alienated, self-dirempted world of capitalist modernity is implicitly and immanently (not yet, noch nicht) “a homeland of identity in which neither man behaves toward the world, nor the world behaves towards man, as if toward a stranger” (Bloch 1986, 209).

How? Music “relates most intensively to the welling core of the existence moment” by being the most real art, the art most immediately connected with our concrete material and corporeal predicament as embodied creatures, in the perfectly literal sense that there is nothing spatially between us and the physical vibrations that are music’s material substratum. Music’s realness thus rests on transhistorical features of our sensory apparatus. Bloch says as much in saying, wryly, that “as hearers we can keep closely in touch, as it were. The ear is slightly more embedded in the skin than the eye is” (1985, 73). And, very much in the spirit of Walton’s remarks about our spatial oneness with musical sounds (“it is as though I am inside the music, or it is inside me”), Bloch refers to the “heard note” as a “sound that burns out of us . . . a fire in which not the vibrating air but we ourselves begin to quiver and to cast off our cloaks” (1). To give sense to the idea that music relates “most intensively” to “That-Which-Is,” we can appeal, on Bloch’s behalf, to the spatial immediacy and bodily resonances that place music in an “incomparable proximity to existence” (Bloch 1985, 227)—namely, our creaturely existence as bodies that are repositories of affect and desire. Here, force is a function of distance: the potency of music’s utopian prescription has to do with its closeness to us. Music “comes close to the subject-based and driving force of events” (208), the human will as the authentic engine of history, because of music’s capacity to (nonmetaphorically) move us, indeed to become one with us, on a somatic level. “There is no music of fire and water or of the Romantic wilderness that does not of necessity, through the very note-material, contain within it the fifth of the elements: man” (227). The nature of sound and the nature of our bodies ensure that the material conditions are continually present for establishing a “correspondence between the motion of the note and the motion of the soul” (123).

But why believe that music is “uniquely capable of conveying . . . a better world” in the sense of helping us to imagine one? The figurative and literary arts can convey semantic freight of a utopian sort by showing us or telling us about some utopian situation or other, such as the leisure-filled, egalitarian, neo-Medieval England described in William Morris’s utopian novel News from Nowhere.
But it seems that music unaided by words and pictures is not merely inferior as a vehicle for “conveying” utopia; music seems wholly unfit for this representational task.

Bloch might respond that this is too simplistic a way of framing the issue. Utopian art, as Bloch conceives of it, is not simply, nor is it primarily or paradigmatically, art that draws a blueprint of a better world and/or a better way of living in the world:

Thus the concept of the Not-Yet and of the intention towards it that is thoroughly forming itself out no longer has its only, indeed exhaustive example in the social utopias; important though the social utopias are . . . [T]o limit the utopian to the Thomas More variety, or simply to orientate it in that direction, would be like trying to reduce electricity to the amber from which it gets its Greek name and in which it was first noticed. Indeed, the utopian coincides so little with the novel of an ideal state that the whole totality of philosophy becomes necessary . . . to do justice to the content of that designated by utopia.  (1986, 11)

Levitas, taking her lead from Bloch’s standoffishness toward the “novel of the ideal state,” states that “the importance of . . . all utopias, lies not in the descriptions of social arrangements, but in the exploration of values that is undertaken” (2010, 140). Utopian art is not limited to idealistic science fiction; rather, it is any art that permits us to navigate a space of alternative values, not so that we might come to commit ourselves to those exact values, but so that we might cultivate the imaginative faculty of thinking deeply and creatively about a radically novel personal and societal ethos, one that might possibly emerge into prominence within a radically reorganized way of producing and reproducing human civilization. Hudson (1982) essentially agrees with Levitas when he states that Bloch’s view is that utopian artworks might or might not contain “descriptions of social arrangements,” but must possess a “cognitive function as a mode of operation of constructive reason; [an] educative function as a mythography which instructs men to will and desire more and better, [an] anticipatory function as a futurology of possibilities which later become actual, and [a] causal function as an agent of historical change” (51). Be that as it may, the question stands: how is music alone supposed to do this? Even if Levitas’s “exploration of values” and Hudson’s utopian functions do not presuppose full-blown verbal or pictorial “descriptions of social arrangements,” they seem to be predicated on the presence of semantic content of some sort, that is, on the availability of a specifiable, “transcribable” meaning or representational content that can be somehow accessed by means of (proper engagement with) the utopian-functioning artwork. Shouldn’t this mean that music’s “weak representational function” disqualifies it from playing a genuinely utopian role, at least on its own? It looks as though Bloch shares this worry: “It does not go without saying that the note can indicate external things and be related to them. After all, it inhabits precisely that region where our eyes can tell us nothing more and a new dance begins” (Bloch 1985, 219). Yet he blithely proceeds as though it does go without saying and exempts himself from giving a justification for his conviction that music is the utopian medium par excellence. This lacuna cannot be passed over without comment. To have even a minimal appreciation for Bloch’s large philosophical investment in music, we must have some measure of warranted sympathy for his belief in music’s utopian function. And to have this, we need to be able to explain to ourselves how (purely) musical utopianism is so much as possible.


Waltonizing Bloch

With the help of Walton’s theoretical apparatus and a clue from Adorno, we can formulate Bloch’s position so that it makes enough sense to be assessable. As a way into this, let us return to the dualism I set out at the beginning of the chapter. Inarguably, Bloch’s aesthetics is robustly normative. Although Bloch has no use for micro-evaluative rankings of individual artworks and is an aesthetic omnivore who, “in contrast to Lukács, breaks with the high culture bias in Marxist aesthetics” (Hudson 1982, 179),26 Bloch’s musical aesthetics is at root an endorsement of the classical canon’s supposed aptness for promoting socialist values. It is also an elaborate exposition of the view that the value of a musical work is partly based on its fitness for aiding the cause of human emancipation.

Recall that normativist aesthetics has the job of describing and systematizing the norms that govern and constitute real-life aesthetic practices and habits. Walton examines the practices and habits that surround fictional representations; he explains the property of fictional representationality in terms of the uses to which fictionally representational aesthetic objects are put, and in terms of the normative statuses those objects are accorded, by participants in games of make-believe. Bloch, on the other hand, does not set out to explain how an already-up-and-running aesthetic practice is organized and administered. His conception of “visionary hearing” (Hellhören), for instance, does not arise out of an attempt to explain what the typical listener typically does when listening to music. But Bloch’s aesthetics does adopt the holistic, synoptic perspective characteristic of Walton’s normativist work. Where Walton sets out to trace the normative contours of a complex representational and imaginative practice as it currently exists, Bloch calls for the revision or remaking of time-honored aesthetic customs. In both cases, the theoretical object is all of a certain musical way of life in its globality. We might therefore adopt a more fine-grained version of the normativist/normative distinction and speak instead of a distinction between descriptive meta-aesthetics, as pursued by Walton, and normative meta-aesthetics, as pursued by Bloch. Bloch’s aesthetics is normative less at the level of the individual work or individual aesthetic judgment and more at the level of the entire aesthetic culture in which such works and judgments have their place.

The normative system that Bloch propounds, like the one whose defining attributes Walton catalogs, has authority over acts of the imagination. According to the norms of imaginative listening Bloch would have us adopt, music (read: the masterworks of the Western tonal canon) should be accorded the function of evincing a utopian vision. The utopian vision to be evoked is substantially the same from piece to piece. Music, all of it, has an immutable representational content that is prior to the contingent, individuating details of specific works. This Bloch refers to both as the “a priori latent theme [that] . . . is really central to all the magic of music” (Bloch 2000, 3) and as “the hearing-in-Existence . . . common to all forms of music” (Bloch 1986, 1089). That Bloch sees himself as breaking with aesthetic tradition by promulgating a novel imaginative practice, rather than as factually describing the representational properties musical works already have (relative to an actually existing interpretive practice), is made evident by a passage in which he describes music’s utopian function as an unorthodox, unprecedented representational use for which music could be purposely enlisted:

[T]he musical object that has really to be brought out is not decided. The . . . dramatic-symphonic movement posits only an area of very general readiness into which the poetically executed music-drama can now be fitted “at one’s discretion.” And by the same token, there yawns between the most transcendable [compositional devices] and the ultimate signet-character of great composers or indeed the ultimate object, the ideogram of utopian music in general, an empty, damaging hiatus which renders the transition more difficult. Even in rhythm and counterpoint illumined theoretically and set in relation philosophically, it is not possible to come directly to the kind of presentiment accessible to the weeping, shaken, most profoundly torn-apart, praying, listener. In other words, without this special learning-from-oneself, feeling-oneself-expressed, human outstripping of theory [through] the interpolating of a fresh subject (though one most closely related to the composer) and of this subject’s visionary speech . . . without this, all transcending relations of the [compositional devices] to the apeiron . . . will remain stationary. Thus with the presentiment, a stage which no longer belongs to the history of music, the note itself reappears as the solely intended, explosive aha!-experience of the parting of the mist; the note which is heard and used and apprehended, heard in a visionary way, sung by human beings and conveying human beings.  (1985, 92, emphasis in original)

Part of Bloch’s point seems to be that there is no way of explaining music’s utopianism, no way of tracing a path from what is objectively the case about music’s structure (perhaps at the level of “rhythm and counterpoint illumined theoretically”) to its capacity for “visionary speech.” But, as I have insisted, a vindication of the possibility of musical utopianism is a requirement for taking Bloch’s revisionary, revolutionary musical aesthetics at all seriously, and any such vindication would seem to stand in need of such an explanation. Adorno’s writings, as interpreted by Richard Leppert, may permit us to be more optimistic than Bloch is about the prospects of explaining musical utopianism. In spite of his infamous pessimism, Adorno is in many respects a utopian thinker about music.27 A utopian sensibility imbues Adorno’s formulation of the concept of structural listening, a privileged mode of “formalist” hearing whose decline he blames on the organs of mass culture, principally the radio and the phonograph. Leppert explains: Adorno promoted the realization through listening of a reciprocal relation between part and whole, by means of which each would be the more fully realized. Heard atomistically . . . the detail was rendered meaningless in its isolation, just as any sense of the whole was obliterated. Conversely, however, if the detail were heard solely as a building block of something larger, it would surrender any sense of its

own spontaneity—which ultimately must be preserved if the whole is to express anything more than its own immanent structure. The relation between part and whole is radically reciprocal; each emerges and, indeed, lives from and through its other.  (Leppert 2005, 116)

For Adorno, music’s sensuous presentation of the reconciliation of part and whole (at least sometimes) stands for a state of perfection, self-subsistence, harmony, plenitude, consummation, fulfillment, and nonalienation, in a way that is (at least sometimes) utopian.28 Such a state of reconciliation obtains when the elemental constituents of a system and the system as a unified whole are mutually adjusted and accommodated to one another, such that each is a necessary requirement of, and in turn requires, the other. This familiar organicist conceit is readily transposed into a political key: In musical details Adorno heard the subject speaking, willingly bending toward the musical object (the whole) in order to make possible the work, a whole larger than the sum of its individual parts. Something, in other words, like a utopian society. Musical details, bending and blending their expressive character toward the whole, while retaining their own specific character, permitted the reenactment of reconciliation between subject and object, for Adorno the artwork’s highest goal. (Leppert 2005, 116)

Music, on this somewhat cursory telling of the Adornian story, is “like” a utopian society: the reciprocal mediation of whole and part (piece and note) in organically unified music is relevantly similar in form (or “isomorphic,” “structurally homologous,” etc.) to the reciprocal mediation of whole and part (society and person) that is distinctive of a “homeland commensurate with man” (Bloch  1986, 136). Subsequent to humankind’s hard-won entrance into its postcapitalist homeland, individuality is not, and cannot be, alienated from collectivity. This is because, as the Communist Manifesto famously puts it, communism is by definition a circumstance in which “the free development of each is a condition for the free development of all.” Animated by exactly this vision, Adorno holds that great music’s greatness is shown as a force for synthesis. Not only does the musical synthesis preserve the unity of appearance and protect it from falling apart into diffuse culinary moments, but in such unity, in the relation of particular moments to an evolving whole, there is also preserved the image of a social condition in which above those particular moments would be more than mere appearance.  (2002, 290)

Adorno thus singles out a formal similarity, a shared mereological property, as the common denominator that semiotically links utopian music to utopian social circumstances. It is on the basis of this structural resemblance that he ascribes to music an ability to “preserve an image” of utopia. Should we therefore infer that utopian music has a representational function analogous to that of Bosch’s The Garden of Earthly Delights or Manet’s Le Déjeuner sur l’herbe? These utopian paintings catalyze our imagining of utopian states of affairs by looking like a setting in which such states of affairs obtain; looking at these paintings is phenomenologically similar (in relevant respects) to what it would be like to actually look at an actual utopian setting. A Waltonian would say that this visual similarity enables and invites us to pretend that in perceiving the artwork we are perceiving the utopian state of affairs that is represented in and by the artwork. Does utopian music do something like this? Is the Adornian position (or the most defensible position consistent with the most charitable interpretation of Adorno’s remarks) the position that music’s utopian function consists in its being what Walton calls a depiction? Regarding depictions, Walton claims:

The viewer of Meindert Hobbema’s Water Mill with the Great Red Roof plays a game in which it is fictional that he sees a red-roofed mill. As a participant in the game, he imagines that this is so. And this self-imagining is done in a first-person manner: he imagines seeing a mill, not just that he sees one, and he imagines this from the inside. Moreover, his actual act of looking at the painting is what makes it fictional that he looks at a mill. And this act is such that fictionally it itself is his looking at a mill; he imagines of his looking that its object is a mill. We might sum this up by saying that in seeing the canvas he imaginatively sees a mill. Let’s say provisionally that to be a “depiction” is to have the function of serving as a prop in visual games of this sort.  (1990, 294)

Most pieces of music, according to Walton’s arguments discussed earlier, do not have a work world and thus do not function as depictions. But some pieces quite obviously do function this way. To take a familiar example, those who listen to the fourth movement of Beethoven’s Pastoral Symphony play a game in which it is fictional that there is a thunderstorm, and in which they are to imagine that their act of listening to the music is an act of listening to a thunderstorm. Underlying this depictive function is the sonic resemblance the music bears to a thunderstorm. Is this how we should explain what happens when Bloch’s musical listener “psychologically anticipates the Real-Possible” (1986, 144)—the concrete possibility of utopian reconciliation between the human species and the human lifeworld—when she performs an act of “visionary hearing?” Does music’s utopianism consist in its being an auditory depiction of utopia? Probably not. It would be odd to claim that musically induced utopian make-believe involves imagining that we hear utopia. Our game world when we listen to utopian music is not plausibly one in which it is fictional that we have an auditory experience of utopia. The reason for this is banal: there is not a distinctive way (or even a distinctive set of ways) a utopian social arrangement would sound. Unlike thunderstorms, utopian social arrangements lack a defining sonic profile. Perhaps it is true that utopian music’s tonal structure is abstractly isomorphic to utopia’s interpersonal structure—but this is not the same as music sounding like utopia. Music cannot sound like utopia, because there is nothing (in general) that utopia sounds like. But if music cannot depict utopia, what can it do? One response that comes to mind is that music might allegorize utopia. Here, again, Walton’s analysis is helpful. Walton

understands allegory as art that (1) refers to something that is different from what it represents; and (2) refers by representing:

Dr. Pangloss in Voltaire's Candide stands for Leibniz, to whom the work refers. . . . But I prefer not to regard [this] work as representing Leibniz . . . in our sense. It is not fictional of Leibniz that his name is "Pangloss" and that he became a "beggar covered with sores, dull-eyed, with the end of his nose fallen away, his mouth awry, his teeth black, who talked huskily, was tormented with a violent cough and spat out a tooth at every cough," and in this sorry state met his old philosophy student, Candide, to whom he continued to prove that all is for the best. We are not asked to imagine this of Leibniz, although we are expected to think about him when we read about Pangloss, to notice and reflect on certain "resemblances" between the two. Pangloss is Voltaire's device for referring to Leibniz, but he refers to Leibniz in order to comment on him, not in order to establish fictional truths about him. Reference thus built on the generation of fictional truths, ones not about the things referred to, is one common kind of allegory. (1990, 113)

What could utopian music fictionally represent, such that it could thereby allegorically refer to utopia? We can answer this question by giving Leppert’s Adorno-inspired words some Waltonian prefixes: a work of music makes it fictional that its notes “bend . . .and blend . . .their expressive character toward the whole,” gets us to imagine that those notes “permit . . .the reenactment of reconciliation between subject and object,” prescribes that we make-believe that the notes don’t “surrender any sense of [their] own spontaneity” (Leppert 2005, 116). Musical notes are not agents and so cannot literally surrender spontaneity (or perform any of the actions Leppert mentions); but music gets us to imagine it. And in prescribing such an imagining, Bloch could say, were he inclined to be so precise, musical sounds thereby refer to a not-yet-existing utopia and impel us “to notice and reflect on certain ‘resemblances’ between the two” (Walton 1990, 113) (i.e., between the musical-objects-as-agentially-imagined and the allegorically referred-to utopian state). Or at least, musical sounds would do so in the context of a semiotic listening practice that has been reconstituted according to Bloch’s prescriptive meta-aesthetics. In the music-interpretive practice whose adoption Bloch can be understood as advocating, music assumes a utopian function via something like the following mechanism: by getting us to imagine that its notes relate to one another dynamically, holistically, and organically as mutually reconciled sonic agents, music makes allegorical reference to utopia, invites us to perform the contemplative action of reflecting on the nature of utopia, and causes us to desire the actualization of utopia.

Conclusion

We have at last arrived at a proposal that is both sufficiently Blochian and sufficiently transparent: music ought to be taken to be an allegory (in Walton's technical sense) of utopia.

Bloch's writings on cultural hermeneutics are meant to serve as a series of object lessons in how to endow cultural items with a utopian function. I have attempted to specify with a reasonable degree of precision how this utopian function could work in the musical case. At most, this interpretive reconstruction shows the bare possibility of putting music to such a use. But it goes no great distance toward demonstrating the wisdom or utility or likelihood of putting music to such a use. One might well ask: what is there to admire, even for a committed Marxist, about an aesthetic practice in which lots of music unintentionally (through no deliberate decision on the part of its composer) allegorizes pretty much the same thing? Also, is the concept of an unintentional allegory at all sensible? If allegorical content is unrelated to authorial intention, what, if anything, constrains the interpreter's attributions of allegorical meaning? And, even if these questions have convincing answers, one wonders why Bloch sees this type of listening practice as having paramount exigence for politics. There is a hard row to hoe for anyone who would defend Bloch's insistence on the political momentousness of listening to the canon of common-practice classical masterworks as radical allegories. In the first place, it is difficult to see how utopian music could tell us anything we do not already know about utopia. For the act of allegorical interpretation to get off the ground, the interpreter needs to already be aware of the one thing music says about utopia, which is that utopia possesses (roughly speaking) organic form; otherwise the interpreter would have no way of determining that utopia, in particular, is what the music allegorically refers to. Moreover, the only people who would have any real inclination to try to hear music as Bloch instructs us to hear it—as radical political allegory—are those who are antecedently convinced of the rightness of socialist ideals and antecedently disposed to pursue them (by, among other methods, listening to music in an appropriately socialist way). Thus, Bloch's prescribed aesthetic practice presupposes the kind of knowledge and motivation that it would need to instill, were it to hold any real claim to political efficacy. Bloch's failure to notice the self-underminingness of his central commitment is symptomatic of an underlying credulity that runs throughout his writings and that threatens to vitiate his positive project at large. Even Bloch's most sympathetic expositors at times feel the temptation of dismissing his system wholesale:

The problem is that Bloch . . .retreats into cipher talk at so many analytically crucial points that [his philosophical system] runs the risk of being poetry philosophy, a theurgic aestheticist Weltanschauung: a system of faith in hope with splendid metamystical meditations, but little explanatory power. (Hudson 1982, 151–152)

It may also be the case that Bloch, in spite of his “emphasi[s] that Marxism must actively inherit the total cultural heritage” and in spite of his “break . . .with the Eurocentrism and high bourgeois bias of the Marxist tradition in aesthetics” (Hudson 1982, 174), was an unwitting captive of his own elevated taste for European art music. A cultured, middle-class convert to Marxism, Bloch was unable to relinquish the conviction that the esteemed musical works of the Austro-German tradition ought to be politically salvageable and, more than this, politically essential in relation to his adopted Marxist ideals.

In Hudson's politely damning assessment, "the underlying insight that Bloch always remained a 'bourgeois' intellectual with left adventurist sympathies is not without foundation" (Hudson 1982, 211). This is not to say that Bloch offers no genuine insights to politically committed Marxists who are concerned with art. Bloch's writings read like the diaries of a resolute socialist determined to find hope and inspiration wherever they can be found. At their best, they make vivid the appeal of trying to reconfigure our extant aesthetic practices so that they become sources of moral hope and political fervor as well as instruments of hegemonic (in Gramsci's sense) culture-building more generally. But Bloch tethered his attempt to formulate a revolutionary aesthetics to a fundamentally passive (and characteristically bourgeois-Romantic) conception of the aesthetic domain as principally a space of aesthetic reception. What matters most, for Bloch, is that music be heard in the right way. It seems not to have occurred to him to attempt to develop a complementary notion of the revolutionary potentialities of aesthetic production, nor to think through the social implications of an aesthetic practice that transcends the division of labor between aesthetic producer and aesthetic consumer, or one that transcends the division (massively expanded under capitalism) between producing art and producing "necessities." Walton's normativist theory helped us to put our finger on these elementary deficiencies, which Bloch's formidable prose style and esotericism make it easy to overlook. If we are to begin to remedy these deficiencies, though, first we must duly free ourselves from the constricting tenets of (exclusively) reception-based aesthetics.

Notes

1. To music is to take part, in any capacity, in a musical performance, whether by performing, by listening, by rehearsing or practicing, by providing material for performance (what is called composing), or by dancing. We might at times even extend its meaning to what the person is doing who takes the tickets at the door or the hefty men who shift the piano and the drums or the roadies who set up the instruments and carry out the sound checks or the cleaners who clean up after everyone else has gone. They, too, are all contributing to the nature of the event that is a musical performance (Small 1998, 9).
2. This is consistent with the use of "normativist" in the philosophy of law. "Normativism or the normative theory of legal science represents an attempt to describe (and to rationalize) the actual practice and thinking of contemporary jurists [in which] jurists in fact typically provide statements of norms in a deontic language—in a language that is to say, that is syntactically indistinguishable from the language used to give expression to the norms themselves" (Guastini 1998, 317). Normativist legal theory seeks to describe the fundamentally normative practices of jurists, just as normativist aesthetic theory seeks to describe the fundamentally normative practices of aesthetic subjects.
3. As is well known, some of the relevant judgmental norms for Kant are disinterestedness, universality, and (the representation or apprehension of) purposiveness without a purpose.
4. This may already be obvious. Hume, one may reasonably think, can be equally well described as making a normative claim about how aesthetic judges ought to behave or, alternatively, as making a normativist claim about what rules are in fact followed by those who count as true aesthetic judges. Though the distinction between normative aesthetics

and normativist aesthetics is easy to blur, it proves to be analytically useful for describing Bloch's project. And there is a tolerably clear difference between paradigmatic instances of purely evaluative normative claims, such as Marx's pronouncement that ancient sculpture remains a perfect aesthetic "standard and model beyond attainment" for modern artists (quoted in Lifshitz 1973, 89), and purely anthropological normativist claims, such as Marx's observation that the earliest Greek statues normatively adhered to "models of the mathematical construction of the body" and were the products of normative practices in which "nature was subordinated to reason rather than to the imagination" (quoted in Lifshitz 1973, 37).
5. I should acknowledge at the outset that this goes against the mystical, theurgic grain of Bloch's philosophy. Bloch's "erratic blocks of hyphenated terminology, luxuriant growths of pleonastic turns, [and] heaving of dithyrambic breath" (Habermas 1969–1970, 316) are not often counterpointed by rigorously clear argumentation. Nevertheless, I try to elicit from Bloch's writings about music an unambiguous basic commitment and a possible rationale for it. Without this much, we have no basis for making a principled assessment of what is living and what is dead in Bloch's aesthetics.
6. Kellner and O'Hara describe Bloch's philosophical venture as having a Hegelian-teleological complexion: "For Bloch history is a struggle against those conditions which prevent the human being from attaining self-realization in non-alienating, non-alienated relationships with itself, nature, and other people. Bloch constantly argues that Marxist theory ought not to forget its telos, which is, as Marx puts it in the 1844 Economic-Philosophic Manuscripts: 'the naturalization of man and the humanization of nature'" (1976, 14–15).
7. For the sake of convenience, I will summarize Walton's theory as though it were focused solely on imaginative engagement with artworks, even though he deals with imaginative engagement with artworks as a special case of engagement with fictional representations in general, which is itself a special case of make-believe in general.
8. Jankélévitch (2003) is a contemporary champion of this sort of view.
9. The musicological subdiscipline of musical semiotics, part of which involves an attempt to recover codes of signification ("topics") that would have been familiar to contemporaneous audiences of historically remote music, is the major source of recent affirmations.
10. Cf. Kivy:

Unlike random noise or even ordered, periodic sound, music is quasi-syntactical; and where we have something like syntax, of course, we have one of the necessary properties of language. That is why music so often gives the strong impression of being meaningful. But in the long run syntax without semantics must completely defeat linguistic interpretation. And although musical meaning may exist as a theory, it does not exist as a reality of listening. (1990, 8–9)

11. See Hullot-Kentor's translator's note in Adorno (1998, 273) for a helpful discussion of the term "Sprachcharakter." One of the leitmotifs of Adorno's Aesthetic Theory is the notion that modernist artworks (of whatever medium) express themselves in "a language remote from all meaning" (105). This is problematic not just because the analogy with language becomes strained for nonsemantic artworks that also lack a codifiable syntactical dimension (such as abstract expressionist paintings), but also because, as we shall see, Adorno's denial of musical signification sits uneasily with his musical utopianism.

12. "[T]he lateness of the upper voice, and its dallying quality, the rigidity of the bass's progression, the fortuitousness or accidentalness of the D-major triad, the movement to something new, are in the music. To miss these is, arguably, to fail fully to understand or appreciate the music" (Walton 2015, 158). Perhaps so, but this insufficient understanding or appreciation does not appear to disbar someone from counting as listening to music, the way one would fail to count as reading a novel if one imagined nothing about anything while running one's eyes over its words.
13. Walton notes that some of the representational musical imaginings he catalogs "may strike one as optional, as not mandated especially by the music itself, and so not contributing to a fictional world of the musical work" (2015, 173).
14. I defend Walton's view in Parkhurst (2012).
15. According to Daniel (2001), in Adorno's Aesthetic Theory,

Innerlichkeit (a term Adorno's translators variously render as "inwardness," "interiority," and the "bourgeois interior" and which refers simultaneously to the inner psychic domain of the bourgeois subject and his actual living space) is described typically as having been initially a strategy developed by the emergent bourgeoisie for its own self-differentiation and self-definition in the face of a rigidly imposed external order. A psychic site of refuge constructed to accommodate an imagined alternative life, the bourgeois interior was fatally flawed, however, in that it was content merely to look like an alternative to the external order without really being in any way resistant to it.

16. This list is drawn from Korstvedt's attempt to describe, in Blochian terms, how Bloch seeks to simultaneously cancel, appropriate, transcend, and reconfigure emotional listening: "Bloch imagines a refunctioning of Romantic, bourgeois Innerlichkeit that transforms subjective space from a place of sentiment, longing, and emotion, even of suppressed animality, into one that opens onto "an ethics and metaphysics of inwardness, of fraternal inwardness, of the secrecy disclosed within itself that will be the total sundering of the word and the dawn of truth over the graves as they dissipate." He believes that music, the only "subjective theurgy," is the way that leads to this mystery, yet he is rather vague, almost groping in his explanation; he avers that as "the inwardly utopian art," music "lies completely beyond anything empirically demonstrable," but suggests that the sublime music of deliverance "at the End will not withdraw allegorically back into a home strange or even forbidden to us; but will accompany us, in some deep way, to the mystery of utopia" (2010, 122).
17. William Wordsworth's preface to his and Samuel Taylor Coleridge's Lyrical Ballads gives this famous description of what is accomplished by "all good poetry" (Wordsworth and Coleridge 2008, 175). M. H. Abrams (1971) advances the view that Romantic aesthetics is to a significant extent unified by its tendency to generalize this description of (or prescription for) poetry and extend it to all the arts.
18. Adorno pursues the idea that Innerlichkeit is a kind of spiritual real estate. Daniel (2001) helpfully epitomizes Adorno's view: The alternative world of interiority is one built ostensibly for self-protection, a psychic/physical space into which the subject can withdraw for comfort and refuge. This option of withdrawal is clearly a class privilege of the bourgeois, who is naive and/or arrogant enough to presume that he can create his own exclusive and

private warmth. But it is a privilege indulged at great cost. The alternative world of interiority can only be inhabited (although "occupied" might be the more accurate term here) once the subject has renounced a somatic relationship with the world: the bourgeois interior is thus "museal," a "still life [in which] the self is overwhelmed in its own domain by commodities and their historical essence."
19. Both Bloch's early expressionist work, The Spirit of Utopia (1918), and the sprawling presentation of his heterodox Marxist philosophy, The Principle of Hope (three volumes, 1954–1959), deal extensively with music. Bloch (1985) contains the most important musical discussions from these works.
20. This view of Bloch's anticipates the treatment of language in Adorno's Aesthetic Theory. According to Francesca Vidal: Music cannot be understood the way language can; it is not interpretable in the sense that words are. Therefore, Bloch employs the term "call" (Ruf). Music wants to be heard; this links it with language, but it is understandable otherwise than language. That the call for an "otherwise than here" is attributed to it derives from the philosophy of music. That the relationship between philosophy and music is mutual, and philosophy is not simply interpreting something into music, is because music itself expresses something of the future, something that in the openness of its process has to do not only with music itself but with the world. (Vidal 2003, 173)
21. "[C]lairvoyance is long extinguished. Should not however a clairaudience, a new kind of seeing from within, be imminent, which, now that the visible world has become too weak to hold the spirit, will call forth the audible world, the refuge of the light, the primacy of flaring up instead of the former primacy of seeing, whenever the hour of the language of music will have come. For this place is still empty, it only echoes obscurely back in metaphysical contexts. But there will come a time when the sound speaks" (Bloch 2000, 163).
22. The breadth of Bloch's interests is staggering. Part I of the Principle of Hope examines "small day dreams"; Part II explores the "anticipatory consciousness" of utopia; Part III explores "the reflection of wish-images" in advertisements, fashion and design, fairy tales, travel, circuses, and theater; Part IV explores how "the outlines of a better world" may be descried in utopian literature, technology, architecture, painting, opera, poetry, philosophy, and recreation; Part V explores the "wish images of the fulfilled moment" that arise in moral philosophy, music, funereal practices, religion, and communism as humankind's summum bonum. Interestingly, Walton's philosophy of make-believe has also been rightly lauded for the vastness of the range of cultural products it brings into consideration.
23. Bloch's conception of musical tones themselves as material bearers of utopian content is historically and musicologically contextualized in Gallope (2012).
24. "Bloch works overwhelmingly with European and particularly German music, so much so that he is really offering a Western philosophy of music, in content at least. The trap is a common one, for the very particular and anomalous history of Western economics and culture becomes the norm for universalizing 'the' philosophy of music" (Boer 2014, 105).
25. Here Bloch draws on the Christian image and instrument of the fountain as a source of the "Water of Life" by which the faithful are baptized into immortality.
Bloch’s engagement with Christianity, and his willingness to place Marxism in dialogue with JudeoChristian theology, have been widely discussed. Marsden 1989 is a good introduction to this topic.

26. This is truer of Bloch's aesthetics as a whole than it is of his musical aesthetics, which deals predominantly with the great works of the Western canon.
27. For a book-length argument to the effect that Adorno's aesthetics is more utopian than not, see Boucher (2013).
28. Copious qualifications are in order, given that Adorno sees a crucial difference between (for instance) the kind of totality exhibited by the music of the heroic period of the bourgeoisie (Beethoven), by the kind of neoclassical music that apes such music (early Stravinsky), and the "administered totality" of twelve-tone serialism (Schoenberg after 1921). For present purposes, I am trying to avoid such complications and cut to the chase.

References

Abrams, M. H. 1971. The Mirror and the Lamp: Romantic Theory and the Critical Tradition. Oxford: Oxford University Press.
Adorno, T. W. 1993. Music, Language, and Composition. Translated by S. Gillespie. Musical Quarterly 77 (3): 401–414.
Adorno, T. W. 1998. Aesthetic Theory. Translated by R. Hullot-Kentor. Minneapolis: University of Minnesota Press.
Adorno, T. W. 2002. Essays on Music. Berkeley, CA: University of California Press.
Aristotle. 1961. Aristotle's Poetics. Translated by S. H. Butcher. New York: Hill and Wang.
Attali, J. 1985. Noise: The Political Economy of Music. Translated by B. Massumi. Minneapolis: University of Minnesota Press.
Boer, R. 2014. Theo-Utopian Hearing: Ernst Bloch on Music. In The Dialectics of the Religious and the Secular, edited by M. R. Ott, 100–133. Leiden: Brill.
Boucher, G. 2013. Adorno Reframed. London: I. B. Tauris.
Bloch, E. 1960. Spuren. Frankfurt: Suhrkamp Verlag.
Bloch, E. 1971. On Karl Marx. New York: Herder and Herder.
Bloch, E. 1972. Atheism in Christianity. Translated by J. T. Swann. New York: Herder.
Bloch, E. 1985. Essays on the Philosophy of Music. Translated by P. Palmer. Cambridge: Cambridge University Press.
Bloch, E. 1986. The Principle of Hope. 3 vols. Translated by N. Plaice, S. Plaice, and P. Knight. Cambridge, MA: MIT Press.
Bloch, E. 2000. The Spirit of Utopia. Translated by A. A. Nassar. Stanford, CA: Stanford University Press.
Chion, M. 1994. Audio-Vision: Sound on Screen. Translated by C. Gorbman. New York: Columbia University Press.
Daniel, J. O. 2001. Achieving Subjectlessness: Reassessing the Politics of Adorno's Subject of Modernity. Cultural Logic 3 (1). https://clogic.eserver.org/jamie-owen-daniel-achievingsubjectlessness.
Gallope, M. 2012. Ernst Bloch's Utopian Ton of Hope. Contemporary Music Review 31 (5–6): 371–387.
Guastini, R. 1998. Normativism or the Normative Theory of Legal Science: Some Epistemological Problems. In Normativity and Norms: Critical Perspectives on Kelsenian Themes, edited by S. L. Paulson and B. Litschewski Paulson, 317–330. New York: Oxford University Press.
Hegel, G. W. F. 1920. Philosophy of Fine Art. Vol. 3. Translated by F. P. B. Osmaston. London: G. Bell and Sons.

Hegel, G. W. F. 1977. Phenomenology of Spirit. Translated by A. V. Miller. Oxford: Oxford University Press.
Hegel, G. W. F. 1992. Elements of the Philosophy of Right. Translated by A. Wood. Cambridge: Cambridge University Press.
Hudson, W. 1982. The Marxist Philosophy of Ernst Bloch. London: Macmillan.
Habermas, J. 1969–1970. Ernst Bloch—A Marxist Romantic. Salmagundi 10–11: 311–325.
Hume, D. 2006. Essays: Moral, Political, and Literary. New York: Cosimo Classics.
Jameson, F. 1971. Marxism and Form. Princeton, NJ: Princeton University Press.
Jankélévitch, V. 2003. Music and the Ineffable. Translated by C. Abbate. Princeton, NJ: Princeton University Press.
Jerusalem, W. 1920. Introduction to Philosophy. Translated by C. F. Sanders. New York: Macmillan.
Kellner, D., and H. O'Hara. 1976. Utopia and Marxism in Ernst Bloch. New German Critique 9: 11–34.
Kivy, P. 1990. Music Alone: Philosophical Reflections on the Purely Musical Experience. Ithaca, NY: Cornell University Press.
Klumpenhouwer, H. 2002. Commodity Form, Disavowal, and Practices of Music Theory. In Music and Marx: Ideas, Practice, and Politics, edited by R. B. Qureshi, 23–44. New York: Routledge.
Korstvedt, B. M. 2010. Listening for Utopia in Ernst Bloch's Musical Philosophy. Cambridge: Cambridge University Press.
Leppert, R. 2005. Music "Pushed to the Edge of Existence": Adorno, Listening, and the Question of Hope. Cultural Critique 60 (1): 92–133.
Levitas, R. 2013. Singing Summons the Existence of the Fountain: Bloch, Music, and Utopia. In The Privatization of Hope: Ernst Bloch and the Future of Utopia, edited by P. Thompson and S. Zizek. Durham, NC: Duke University Press.
Levy, Z. 1997. Utopia and Reality in the Philosophy of Ernst Bloch. In Not Yet: Reconsidering Ernst Bloch, edited by J. O. Daniel and T. Moylan, 175–185. London: Verso.
Lifshitz, M. 1973. The Philosophy of Art of Karl Marx. Translated by R. B. Winn. London: Pluto Press.
Marsden, J. 1989. Bloch's Messianic Marxism. New Blackfriars 70: 32–44.
Neuhouser, F. 2011. The Idea of a Hegelian "Science" of Society. In A Companion to Hegel, edited by S. Houlgate and M. Baur. Oxford: Wiley-Blackwell.
Parkhurst, B. 2012. The First-Person Feeling Theory of Musical Expression. Postgraduate Journal of Aesthetics 9 (2): 14–27.
Small, C. 1998. Musicking: The Meanings of Performing and Listening. Middletown, CT: Wesleyan University Press.
Vidal, F. 2003. Bloch. In Music in German Philosophy: An Introduction. Chicago: University of Chicago Press.
Walton, K. 1990. Mimesis as Make-Believe. Cambridge, MA: Harvard University Press.
Walton, K. 2015. In Other Shoes: Music, Metaphor, Empathy, Existence. Oxford, UK: Oxford University Press.
Wordsworth, W., and S. T. Coleridge. 2008. Lyrical Ballads: 1798 and 1800. Peterborough, Ontario: Broadview Press.

Chapter 25

Sound as Environmental Presence
Toward an Aesthetics of Sonic Atmospheres
Ulrik Schmidt

Introduction

Contemporary auditory culture is characterized by an intensified and often rigorously detailed focus on the design and sensory experience of our sonic environment. From shopping arcades and private homes to film, sound art, digital media, and computer games, sound is used to stimulate our sensation of being in a particular environment. Whether professionally designed or a result of our practice of everyday aestheticization, environmental sound design has become ubiquitous. It frames and penetrates our everyday existence, action, and social exchange. However, our knowledge of sonic environments is still somewhat limited. Perhaps this shortcoming of concepts and theoretical models may partly be explained by the very nature of the sonic environment itself. Due to its intermediary position and ephemeral, ubiquitous distribution of multiple simultaneous events, it is notoriously difficult to determine the status and describe the character of environments in general and sonic environments in particular. But, as Gernot Böhme notes, although "'atmosphere' is used as an expression for something vague, this does not necessarily mean that the meaning of this expression is itself vague" (1993, 118). Or as Gilles Deleuze describes it—in a comment on Leibniz's famous analysis of his "small perceptions" while listening to a noisy seascape—sonic environments are "distinct-obscure" (1994, 213). They are distinct and obscure at the very same time. But, despite the perceptual obscurity of sonic environments—or perhaps better, exactly because of it—we must be in constant search for new terminologies and theoretical

frameworks to parallel the sophistication in design and amount of perceptual detail characteristic of our current sonic environments. In this chapter, I will present a theoretical perspective on what I see as some of the most important questions regarding the ways sound affects the sensory experience of our environment: What does it mean to be affected by the sonic environment as environment and not as a set of individual sounds in the environment? And what perspectives and conceptual frameworks will allow us to distinguish between different types of sonic environments and different ways of being affected by them? I will approach these questions by proposing the term "sonic environmentality" as a general term for the ways sound can act and affect us as environment. The concept of sonic environmentality enables a further distinction between three basic forms or dimensions of the way sound may affect us as environment: atmosphere, ambience, and ecology. In recent decades, we have seen considerable scholarly interest in atmosphere (Böhme 1993, 1995, 2001; Schmitz 1993, 2014; Anderson 2014; Hasse 2014) and, especially in recent years, also in ambience (McCullough 2013; Kim-Cohen 2013; Schmidt 2013, 2015) and ecology (Guattari 2000; Morton 2007, 2010; Herzogenrath 2008, 2009). However, there has been a widespread tendency here to understand atmosphere, ambience, and ecology as generic, all-embracing, and, to some extent, synonymous concepts for our experience of and relations with the environment. In contrast to this common tendency, I will propose a clear and explicit distinction between atmosphere, ambience, and ecology as separate affective dimensions of our environment, in the sense that they express three different aspects of the way an environment performs and affects us as environment, each with its own distinct aesthetic potential. To substantiate and further develop this thought in more detail, the last part of the chapter will focus on one of the three environmentalities, sonic atmosphere, as one basic dimension of a more general sonic environmentality. I will begin from a broader perspective, though, by considering affect and imagination as decisive features in our perception of the surroundings as sonic environments.

Environmentality

In the most general terms, an environment can be described as a meaningful relationship between an individual and its immediate surroundings. This insight was the cornerstone of biologist Jakob von Uexküll's influential studies of the animal environment as Umwelt (1921, [1934] 2010). According to Uexküll, an Umwelt—a word that literally translates as "surrounding world"—differs from the mere physical surroundings by constituting a meaningful whole, a "world," which is specific to each individual who experiences it. Uexküll's thoughts have played a major role in many later understandings of the environment in disciplines such as biology, psychology, semiotics, sociology, and philosophy. For example, the psychologist James J. Gibson and philosophers such as Martin Heidegger, Maurice Merleau-Ponty, and Gilles Deleuze all explicitly pursued Uexküll's

theory of the environment as an intimate and meaningful relationship between an individual and its surroundings and developed this idea in different directions (Buchanan 2008). Hence, in his ecological approach to perception, Gibson based his theory on the existence of an essential "mutuality of animal and environment" (1986, 8). And with direct reference to Uexküll, Deleuze has argued for a basic characterization of individuals "by the affects they are capable of" (1988, 125) and that this specific capacity to be affected, this system of affective disclosure, is what produces the overall range and properties of their particular environment as a set of conditions for action and affection. Following this line of thinking we can describe an environment in general terms as a set of elementary affective relations between an individual and its surroundings. However, although we perceive our environment as a meaningful whole through our affective relations with it, the specific experience of it—that is, the situated act of perceptual engagement with our environment—is often characterized by a more or less consciously focused attention on singular and discrete objects or events, which we isolate from the environment as a whole in an elementary figure-ground segmentation. Sound—like light and smell—may basically be of a surrounding character; it propagates, resonates, and reverberates throughout space. For that reason alone, sound by nature stimulates a directly environmental perspective in perception, involving a capacity for what I shall call environmental imagination. However, most often sounds are not experienced directly as environments but rather as vibrating products of particular and localized events coming to us in streams from particular positions in the environment (Bregman 1990). In such cases, the very environmental character of the environment withdraws perceptually to a secondary position in the background of our attentive awareness. In other words, while the environment affects us, and does so perpetually, by constantly providing us with a range of individual, conscious and nonconscious stimuli, our engagement with the environment as a whole—that is, with the environment as environment—is seldom conscious and actively intended but will usually take place in a preattentive and preconscious mode of experience. Still, however, despite its perceptually withdrawn and peripheral character, we are constantly engaging in an affective relationship with our sonic environment as environment. This affective relationship does not so much concern the experience of individual objects, parts, streams, and events but the specific ways that the events, in each particular case, merge into a total performance of environmental wholeness. Since we lack a proper concept, I propose the term "environmentality" to describe this performance and potential affectivity of the environment as environment. The expression is borrowed from a phrase in Heidegger's Sein und Zeit ([1927] 1996) where he, with direct inspiration from the early writings of Uexküll (1921), speaks of "environmentality" (Umweltlichkeit) as "the worldliness of the surrounding world" (Heidegger [1927] 1996, 62).
However, in contrast to Heidegger, who was mainly interested in the experiential constitution and ontological status of the environment as a meaningful world, I understand environmentality as a basic performative capacity of some phenomena to act and directly affect us as encompassing, environmental wholes.

Accordingly, sonic environmentality is a set of phenomenal properties that potentially make a particular sound or collection of sounds perform and affect us as environment. The most basic feature of sonic environmentality is thus related to the very environing character of a given sound. A sound performs and affects us with its environmental characteristics when it penetrates our total perceptual field with a ubiquitous sense of being everywhere and all-over. It pervades the entire situation by destabilizing immanent hierarchies and by merging all individual sounds into a heterogenic, all-encompassing milieu. For the same reason, sonic environmentality is essentially nonfigurative in the gestalt psychological sense that it dissolves the perceptual tension between central and peripheral elements. You may hear isolated sounds and discrete events, but if they affect you as environment, no isolated event will play a privileged part or take a central position. From an environmental perspective, no sound is more important than others. They all mix in the performative production of a consistent sonic whole. Unfortunately, this crucial point regarding the aesthetic nature of our sonic environment has been somewhat neglected in many leading theories of soundscapes and sonic environments. Most notably, this is the case in R. Murray Schafer's influential soundscape theory (1977). Schafer built his analysis of soundscapes around the delicate identification and perceptual attention toward (and potential preservation of) the individual sounds that were most characteristic of, and intimately associated with, a particular environment ("soundmarks" and "sound signals"). As theorists and composers such as Francisco López (1998) and David Toop (2004) have remarked, Schafer's prime interest in unique soundmarks and sound signals is closely related to his critique of noisy, modern soundscapes and what Toop describes as Schafer's "personal aversion to urbanism" (Toop 2004, 62). This is most apparent in Schafer's critique of the so-called lo-fi soundscapes of industrial societies in which "individual acoustic signals are obscured in an overdense population of sounds" (Schafer 1977, 43). Schafer's longing for the natural, nonurban environment is thus closely related to his basic preference for sonic environments that allow individual sounds to be perceived in isolation from other disturbing sources. For this reason, and in contrast to his general intentions, Schafer's theory of the soundscape in fact advocates an analysis of sonic environments in which the very environmentality of the soundscape, the very tendency to affect us in all its environmental wholeness, is overshadowed by an essentially "nonenvironmental" listening to individual sounds in the environment. Only secondarily, if at all, did he consider the environment as a consistent whole of affective relations. Schafer's soundscape theory, in other words, is not a theory of sonic environmentality. What I propose here is another approach in which we carefully insist on the very environmental character of the soundscape in order to investigate directly what it means to be affected by sound as environment. As a term, sonic environmentality highlights the fact that a merging of sounds from all disparate sources (humans, things, animals, matter, and energy) into a global mix is a basic feature of our experience of sonic environments.
It makes us aware that when an environment performs its specifically environmental characteristics through sound, all sounds belong, as Timothy Morton describes it, to the same de-hierarchized and all-encompassing “worldly mesh” of “imagine[d] interconnectedness” (2010, 15).


Environmental Imagination

Whether produced by events in the immediate acoustic surroundings, by technical recording and playback or by electronic sound synthesis, all sonic events have a potential to become perceptually environmental and to produce a basic sense of sonic environmentality. However, we do not experience sonic environments in pure sensory isolation. On the contrary, the experience of our environment is essentially multisensorial. As Merleau-Ponty has argued, perception is the product of an intimate multisensory relationship between the individual and its surroundings as a whole. "My body," he notes, "is the seat or rather the very actuality of the phenomenon of expression, and there the visual and auditory experiences, for example, are pregnant one with the other, and their expressive value is the ground of the antepredicative unity of the perceived world" (Merleau-Ponty [1945] 2005, 273). Accordingly, auditory perception alone does not provide an exhaustive phenomenal experience of the environment as a whole. It will always be but one dimension of our full, affective relations with the perceived environment as an "antepredicative unity" (i.e., a precategorical and preobjective whole). This has important implications for the ways in which sound as specific material manifestation contributes to the affectivity and multisensory experience of a particular environment. As mentioned earlier, sound is by nature environmentally encompassing because of its vibrating and reverberating properties alone. Consequently, in terms of sensory perception our sonic environment is, in each particular situation, directly given and perceivable for us in its full sonic manifestation. Visually, however, we only perceive the environment as a large fragment of its total existence, framed as it is by our limited visual perspective. Part of our environment always withdraws into the environmental darkness of nonvisual perception. Thus, in terms of visibility and visuality the environment is never fully actualized, and the manifestation of environmental wholeness only exists for us virtually, as a potential somewhere in the proximity of our sensory engagement with the surrounding world. In other words, when environmentalities affect us in their all-encompassing multisensory wholeness, they do so because they are partly imagined. To imagine is to bring to presence what is not present by producing a virtual image that affects us as real. According to the philosopher Tamar Gendler, to "imagine something is to form a particular sort of mental representation of that thing" (2013). Accordingly, because it is not directly perceivable in its entirety, the visual dimension of our environment needs to be partly imagined as such by producing an image of environmental wholeness with which we can engage in an environmental exchange of affective relations. This potentially has crucial consequences for our understanding of environmentality in general and sonic environmentality in particular. First, our general multisensory perception of the environment as an antepredicative unity is in fact divided between a direct and full auditory perception of the environment in all its sonic wholeness and a visual perception of it that is limited and fragmentary and must be completed virtually in our imagination. Second, the imaginative aspect of environmental perception is not limited to the visual domain. Because of its basic function in a unitary multisensorial

system of environmental perception, sonic environmentality contributes directly to the elementary process of what I shall call environmental imagination. Sonic environmentalities, as all environmentalities, affect us as partly imagined, not because they are partly unreal but because they are partly "veiled" and withdrawn from perception. And finally, this intimate connection to a domain of relative invisibility further exposes another fundamental quality of sonic environmentality that is specific to the auditory domain compared to other modalities of the perceptual system: all sonic environments are—to a greater or lesser extent—acousmatic environments. The word "acousmatic" derives from the name of the disciples of Pythagoras (acousmatikoi) who listened to the teachings of their master while he was hidden from view behind a curtain. After being reintroduced by the French composer Pierre Schaeffer in 1966, the term has been widely used to describe the experience of a sound without seeing the source that produces it. However, as critics such as Brian Kane have argued, while the term has often been closely associated with the introduction of technological sound recording and, more specifically, electroacoustic music, the acousmatic has much wider implications than first indicated by Schaeffer (Kane 2014). This is, I will further argue, especially the case in terms of sonic environments. We may be able to hear an environment in all its surrounding sonic complexity but, because of our limited visibility, the environment as a whole will always stay partly veiled behind the acousmatic curtain. As Francisco López notes in regard to field recordings he made in La Selva rainforest in Costa Rica, we can find a clear empirical demonstration of this "environmental acousmatics" (2004, 86) if we listen to the noisy cacophony of a dense tropical rain forest. "There are many sounds in the forest," López says:

but one rarely has the opportunity to see the sources of most of those sounds . . . This acousmatic feature is best exemplified by one of the most characteristic sounds of La Selva: the strikingly loud and harsh song of the cicadas. . . . You hear it with an astonishing intensity and proximity. Yet, like a persistent paradox, you never see its source. (López 2004, 86)

This basic acousmatic quality of the sonic environment indicates a basic link between the experience of sonic environmentality and the process of environmental imagination. Because of the acousmatic curtain, the auditory experience of our environment corresponds with a cognitive process in which we spontaneously map the cacophony of sonic environmental effects onto a total image of the environment as a multisensory whole. This environmental imagination by way of acousmatic environmentality can take place in two different ways. It can be produced by sonic events that take place as part of the individual's actual physical surroundings. Or it can be produced by sonic events that take place in a virtual space. The virtual production can again either happen representationally, as is the case in most technical reproductions or simulations (technical or mental) of actual sonic environments, or it can happen in a nonrepresentational, synthetic construction of an abstract virtual environment. However, in terms of sonic environmentality as an acousmatic stimulation of the individual's environmental

imagination it makes no essential difference whether this environmentality is produced in our actual physical surroundings or in a simulated virtual space (representational or nonrepresentational). Whether "actually" or "virtually" produced, sonic environmentality stimulates the same environmental imagination because of its essential acousmatic character. All sonic environments involve an act of imagination as the construction of a virtual image of environmental wholeness.

Basic Sonic Environmentalities: Atmosphere, Ambience, and Ecology

As argued initially, sonic environmentality is basically characterized by a nonhierarchical mesh of sounds into a relational whole, which makes the environment perform as environment. However, in each particular situation, a given sonic environmentality will also perform and affect us in a certain way that is specific to it. All sonic environmentalities affect us environmentally by stimulating our environmental imagination, but when they do so, they simultaneously express a particular environmentality that differs, more or less substantially, from all other particular environmentalities. In other words, sonic environmentality simultaneously involves an environment's performance of its own environmental characteristics on a general level, where it produces a set of generic environmental effects shared by all environmentalities, and on a specific level, where it distinguishes itself from all other environmentalities. It is still possible, however, to identify a number of environmental characteristics that are shared by all particular environmentalities. I will refer to such environmental characteristics, shared by all particular environmentalities, as basic sonic environmentalities. They are basic in the sense that they express different elementary aspects or dimensions common to all environmentalities, which can be articulated or emphasized to a greater or lesser extent in each particular situation. Furthermore, I propose a distinction between three basic environmentalities: atmosphere, ambience, and ecology. In contrast to common practice, in which the three concepts are often used somewhat synonymously, I thus argue for a well-defined and explicit differentiation between them as three distinctive dimensions or "varieties" of the way an environment performs and affects us as environment. All three basic sonic environmentalities stimulate our general environmental imagination, but in each particular situation they will, as mentioned, do so to a greater or lesser extent and with quite different aesthetic potentials. In fact, the basic environmentalities differ because they comprise and express different aspects of the generically environmental. More precisely, ambient environmentality emphasizes an environment's ubiquitous properties, potentially intensifying in experience the basic environmental imagination of being surrounded in and by the environment as a total field. Ecological environmentality, on the other hand, emphasizes an environment's relational properties,

thereby potentially intensifying the basic environmental imagination of interconnectivity and of being mutually involved with all its parts in a dehierarchized relationship. And finally, atmospheric environmentality emphasizes an environment's anthropomorphic, social, and site-specific qualities in order to intensify the basic environmental imagination of a spatially distributed presence. The three basic sonic environmentalities are not mutually exclusive, and they seldom exist and affect us in pure isolation. On the contrary, a particular sonic environment can in most cases be characterized by the ways in which it combines the three basic sonic environmentalities into a consistent image of environmental wholeness that is unique to that particular environment. Still, however, each basic environmental variety will typically have a more or less profound impact on the overall environmental imagination in the sense that, for instance, sonic environmentalities dominated by the atmospheric dimension could be described as "atmospheric" and sonic environmentalities dominated by the ambient or ecological dimensions could be described as "ambient" or "ecological" respectively. Since it is not possible here to go into detail on all three basic sonic environmentalities and their aesthetic potentials, I will—while not ignoring the other basic environmentalities entirely—narrow my focus in the rest of the chapter to an exploration of sonic atmosphere. First, I will outline some of the general theoretical implications and aesthetic potentials related to the consideration of atmosphere as basic environmentality. In the last sections of the chapter, I will specifically discuss sonic atmosphere and analyze it in relation to specific cases in cinematographic sound design and sound art.

Atmosphere as Environmental Presence

As elementary expressions of the generically environmental, all three basic environmentalities obviously have many characteristics in common. For instance, qualities such as an intermediary position between subject and object (quasi-objectivity, in-betweenness), and the diffuse, ephemeral, and enveloping character often highlighted as core attributes of atmosphere in many theoretical accounts (Böhme 1993, 1995, 2001; Schmitz 2014; Hasse 2014), are in fact characteristics that are shared by all environmentalities and not exclusive to atmospheres. Quasi-objectivity, in-betweenness, ephemerality, and envelopment are just as elementary in ambient and ecological environmentalities as they are in atmospheres. The focus here is on the characteristics that distinguish the three environmentalities from each other and make them stimulate our environmental imagination in different ways. What is of key interest is, in other words, sonic atmosphere as the specifically atmospheric dimension of sonic environmentality. The most characteristic element of atmosphere, which distinguishes it from other environmentalities, is, I will argue, another basic feature often mentioned in texts on atmosphere, albeit usually rather briefly and in an evanescent manner. This feature is clearly articulated in a passage by Böhme in which he describes an atmosphere's capacity

to produce a "sense of presence" [das Spüren von Anwesenheit] (Böhme 2001, 45). Atmospheres, Böhme notes with explicit inspiration from Heidegger, "seem to fill the space with a certain tone of feeling like a haze" (1993, 113–114); they evoke a vague sense of something's or someone's environmental "being-here" as a feeling of "indeterminate and spatially disseminated moods" (2001, 47). How can we, more precisely, characterize this experience of environmental presence as a spatially distributed mood, tone, or feeling? As I argue, the sense of atmosphere as environmental presence is mainly evoked by two interrelated factors. First, it is related to the production of presence as a site-specific (Kwon 2002) sense of being in a particular place. As Jürgen Hasse describes it, we perceive atmospheres "as an affective tone of a place. . . . They communicate something about the distinct qualities of a place in a perceptible manner, they tune us to its rhythm" (Hasse 2014, 215). This sense of site-specificity is closely related to what Böhme calls the "ecstasy of things" (1993, 1995, 2001) by which he understands a certain capacity of individual entities to "go out of themselves" in our environmental perception (gr. ekstasis: standing out). In each specific situation, all individual things—objects, materials, persons, sounds, everything that makes up the sociomaterial environment—go out of themselves and merge into a unique assemblage that intimately connects our atmospheric imagination to the place or site in which it is produced and evoked. Second, the atmospheric sense of presence is closely related to the environmental imagination of a particularly human or social quality. According to Heidegger, "mood" (Stimmung) is the existential capacity of Dasein that continuously and in each particular moment in time "makes manifest 'how one is and is coming along'" ([1927] 1996, 127). Following Böhme's Heideggerian definition of atmosphere as "spatially disseminated moods," atmosphere can, in view of that, be described as the experience of "how a particular environment is and is coming along." It discloses the "state" of a place or situation by investing it with a human-like affectivity and expression of being in a certain mood. Or as Jürgen Hasse describes it, again with an implicit reference to Heidegger, atmospheres "let us comprehend without words how something is around us. Therefore, atmospheres are also indicators of social situations" (2014, 215). Atmospheres "are not things, but emotions that we are affected by as essences of the world-with-others" (221). Anthropomorphism is the attribution of human characteristics—human form, behavior, consciousness, expressivity—to nonhuman things. In view of that, the production and experience of atmosphere as spatially disseminated moods can be said to involve an unmistakable anthropomorphism because of its tendency to infuse our environmental imagination with a human-like sense of intentionality, expressivity, and emotion. Anthropomorphism makes the environment act and perform "like a human being" emerging from the assemblage of relations between materials, things, bodies, and events dynamically distributed throughout the place—like "a sort of spirit that floats around," as Michel Orsoni describes it (quoted in Anderson 2014, 137).
An atmosphere is, in other words, not only essentially social; it is the environmental performance and imagination of our sociomaterial surroundings as “Other” in the form of an abstract, quasi-subjective being. Needless to say, this sense of mood, spirit, or intentionality does

not involve the construction of another subject, a particular person, in our imagination. Atmospheric anthropomorphism remains essentially environmental. To summarize my argument so far, atmosphere can be described as the affective production and environmental imagination of a site-specific and anthropomorphic presence emerging from the material layout of a particular environment. This anthropomorphic and site-specific character, it must be stressed, is a unique quality of atmosphere as basic environmentality. You will find nothing like it in either ecological or ambient environmentalities that both affect the individual by way of essentially nonhuman properties and effects. Actually, the very difference between atmosphere and the other two basic environmentalities regarding the specifically human properties and imaginations allows us to see how atmosphere is in fact the sole basic environmentality in and by which the human dimension of our environment—that is, social, subjective, anthropomorphic—is performed and experienced. Atmosphere, in short, is the performance and imagination of the specifically human relations with our environment.

Sonic Atmospheres

So, if we return to the question of sonic atmosphere, how can we more precisely understand the sonic production of site-specific and anthropomorphic presence? What is the specifically atmospheric dimension of our sonic environment? What does a sonic atmosphere sound like? To explore this question in more detail I will, in the last part of the chapter, consider two examples taken from two different domains in which the staging and experience of sonic atmospheres is particularly prevalent: cinematographic sound design and contemporary sound art. To emphasize the specifically atmospheric qualities of the sonic environments in question, I will occasionally include accompanying observations on ambient and ecological environmentalities as well.

David Lynch: Eraserhead (1977)

Sound design is a central domain for the staging of sonic atmospheres. Especially in fiction film and computer games, sound design plays a key role in creating a sensation in the listening spectator/player of being in the imagined environment in which the action takes place, by creating an atmospheric sense of place and anthropomorphic presence. Obviously, music often contributes strongly to the overall experience by investing the scene with an emotional character, but it typically does so by way of conventional musical expression, not by producing environmental effects. In rare cases, however, film music with immanent environmental properties may be used to autonomously produce sonic atmospheres. As an example of this, consider Stanley Kubrick's sympathetic use of György Ligeti's environmental compositions (Atmosphères [1961], Requiem [1963–1965], and Lux Aeterna [1966]) in 2001: A Space Odyssey (1968). In perfect line with the narrative,

the music continually invests the film's many scenes of the dark and empty outer space with a mystical abstract presence that hovers over the imagined environment throughout the film. However, rather than being a result of the use of music, the cinematographic creation of sonic atmosphere mainly takes place in relation to the overall sound design of the film or game in question. As a paradigmatic example of this, consider David Lynch's and Alan Splet's pioneering sound design for Lynch's Eraserhead (1977). The film's sound track itself—later made available in a shorter version as a stand-alone release in its own right (1982)—is a profound example of the evocation of environmental presence by the use of sound. By incessantly combining acousmatic site-specific action with environmental sounds of anthropomorphic expression, the soundscape itself becomes a leading character in the staging of the film's bizarre and anxious universe.

In order to explore this in more detail, consider the first scene of the film (6:00–11:20), which comes after a short prologue. Here, we follow the protagonist Henry walking home from work through an empty industrial landscape, into his building, up the elevator, and down the hallway to his apartment. Outside his apartment, the woman next door approaches him with a message from a girl named Mary who called on the payphone. After a minute's dialogue, Henry enters his apartment. The whole scene takes approximately five minutes. The film's emphasis on durational time, with little action and narrative progression, gives room for an affective staging of environmentality and the stimulation of our environmental imagination. Henry's appearance, including his conversation with the neighbor, is awkwardly nervous and tense, and it gives the whole scene an uneasy and claustrophobic feeling. This feeling of anxiety and tension, however, is not only a product of Henry's awkward behavior. It is effectively intensified by the scene's sound design. In the whole passage, as is the case throughout the entire film, we constantly hear deep, droning layers of complex abstract noise. Against this noisy background, an acousmatic series of inconspicuous individual sound events is heard, coming from particular but undefined off-screen locations in the environment as an imagined whole. The overall result is a looming and penetrating feeling of environmental presence.

The sonic environmentality staged in Eraserhead is not exclusively atmospheric but equally involves a production of ambient and ecological effects. The droning noise, for instance, envelops the whole scenario in an ambient sensation of being immersed in a total field of sound. And the layers of individual sounds from disparate ontological levels might intensify the environmental imagination of a scenario in which all parts are interconnected and mutually involved with everything else in a nonhierarchical, ecological mesh. Still, however, sonic atmosphere is arguably the most profound of the three basic sonic environmentalities in Eraserhead. While the main aesthetic function of the heavy layers of background noise is to give the whole scenario a strong overall environmental character (ubiquity, consistency), the major role of each individual sound event is to stage a particular atmosphere by simultaneously evoking a strong sense of site-specificity and a feeling of anthropomorphic presence penetrating the whole scenario.

A short list of the most important individual sound cues that can be heard over the layer of distant noise during the first scene could read like this:

6:30–6:45    Sound of distant foghorn (c)
6:45–7:20    Organ piece (Fats Waller) with a strong diegetic character as if being played "live" somewhere in the distance (b)
7:25–7:30    Foghorn in the distance (c)
7:45–8:00    Squeaking mechanical noises (c)
7:50–8:10    Low-pitched electronic hum (a)
8:00–8:30    Sounds of someone banging on metal (b)
8:00–8:45    Heavy low-pitched breathing (c)
8:45–8:50    Foghorn (c)
9:00–9:45    Highly intensified elevator sounds (a)
9:55–10:05   Sound of malfunctioning electric installation (a)
10:30–11:15  Short dialogue with woman across the hall (b)

The individual sounds in the first scene can be categorized into three main groups: industrial sounds of mechanical or machinic activity (designated with an “a” in the list); concrete sounds of human bodily actions (b); and sound signals and other sounds with a strong anthropomorphic, voice-like character (c). These individual sounds from the different groups, and the way they mix into a continuous sequence of varying intensity, are the main contributors to the overall atmosphere of the scene as a sense of environmental presence. The sounds of mechanical and bodily action help—in the midst of chaotic noise—to perceptually consolidate the scenario as a particular place, a physical location, in which concrete actions take place. And both the specifically human character of the action (b) and the anthropomorphic sound events (c) further invest the scenario with a human presence that is not reducible to each single sound but rather stems from the environment itself as an expressive imaginary whole. The whole environment seems to be alive, constantly expressing itself and communicating to us about its state of being. One might want to interpret this expression as a mere sonic representation of Henry’s mental and emotional condition. But the aesthetic effect is first and foremost nonrepresentational and profoundly environmental. The various sounds persistently perform as an environmental whole. The combination of noise, site-specific action, and anthropomorphic expression affects us by directly stimulating our environmental imagination and enveloping us in the sense of environmental presence we call atmosphere.

Janet Cardiff and George Bures Miller: Forest (for a Thousand Years) (2012)

Janet Cardiff and George Bures Miller's sound art installation Forest (for a Thousand Years) was created specifically for Documenta 13 in Kassel, Germany, in 2012. Installed in a small forest opening in the beautiful Karlsaue Park, the work invited the audience to sit

down and listen to an all-encompassing sonic environment (28-minute loop) coming from a large system of loudspeakers discreetly hidden from view up among the trees. Hence, compared to the cinematographic environmentality of Eraserhead, the environmentality of Forest is produced by the use of sound only, and it is meticulously installed—with the use of a complex Ambisonics surround sound system—into a specific location in the Karlsaue Park. This site-specific and surrounding quality of the installation is used extensively to stage the work's particular environmental characteristics.

As was the case in Eraserhead, the sonic environmentality produced in Forest is not purely atmospheric. Ambient and ecological effects are also important aspects of the overall experience of the work as sonic environmentality. For instance, the mix of technically reproduced sound with actual sounds from the physical surroundings into a single acousmatic mesh potentially evokes an ecological sense of dehierarchized interconnectivity. Sounds from the loudspeakers blend, almost imperceptibly, with natural sounds from the forest and from members of the audience into an intensified, acousmatic forest ecology. And simultaneously, the immersive character of the whole technical setup—with loudspeakers disseminated throughout the entire site and the use of a full-sphere surround system that allows for simulations of accurate location and movement of virtual sound events—strongly helps to amplify a basic ambient sensation of being surrounded by the sonic environment as a consistent, all-encompassing whole. Nevertheless, the most profound characteristic of Forest is, once again, the way in which the piece affectively tunes the whole situation into a penetrating atmospheric sense of environmental presence. In terms of aesthetic impact, the functions of the ecological and ambient environmentalities are secondary as they mainly support this pervading production of atmosphere. As was the case in Eraserhead, the atmospheric presence in Forest mainly has to do with the very character and properties of the sounds heard.

In order to give a better idea of the sonic material in Forest and the sequential structure of the work, consider a short description of a single 28-minute loop reconstructed from field notes I made while visiting Documenta in July 2012:

We arrive in a quiet section of the work. All we hear are birds singing quietly among the trees, occasionally accompanied by the sound of people walking, handling things and moving different objects around. A low-pitched drone of electronic sound fades in to fill the environment for a few minutes, superimposed with speaking voices in a chaotic mix of non-sensible chatter. The drones and voices disappear and we hear birds again, now accompanied by the sound of people hammering and knocking on wood. Occasionally, the knocking sounds join in rhythmic coordination, always on the verge of becoming a musical practice. Suddenly, a large tree crashes to the ground with a loud sweeping sound. After a short while a group of people starts to laugh, at first more discreetly and dispersed, but soon more intensely and in concert. They laugh together and they laugh at something, although we do not know what it is. The laughing stops and after a short silence a high-pitched sound emerges, slowly, almost imperceptibly, and soon we find ourselves immersed in the cacophonic noise of a heavy storm descending.
After the storm has passed, we once again hear the sound of birds and people walking around, handling different pieces of metal and wooden objects. Occasionally

we can hear the snorting sound of a large animal nearby. After a period of stillness—a disturbing stillness, as if the whole environment is waiting for something to happen—the space is pierced by the haunting scream of a girl somewhere in the distance. After a short while, we hear the metallic sound of cars rolling by, soon followed by marching feet and droning airplanes above. What sounds like a large wooden wagon is being pulled across the forest; we hear the neighing of horses and the sound of military drums approaching. Suddenly a group of men are shouting aggressively nearby, and we find ourselves in the middle of a sonic battle of gunshots and bombs exploding. The droning of airplanes returns; machine guns and missiles are being fired everywhere. The battle ends in a brief intense climax. After a short period of penetrating silence, we can hear the beautiful sound of choir music (Arvo Pärt's Nunc dimittis [2001]). The music plays for a few minutes, then the sounds of singing birds and people moving around return once again and another 28-minute loop begins.  (notes translated from Danish by the author)

As this short description of Forest suggests, each sonic event has a very specific aesthetic function in the overall production of atmospheric environmentality. They all help to provoke our environmental imagination by creating a tense experience of physical action and aroused emotions, acousmatically distributed among the trees to produce an atmospheric sense of environmental presence. In fact, Forest is a profound example of the very combination of site-specific action and anthropomorphic affectivity that is the main feature of atmospheric environmentality. Everything we hear supports the production of a sense of specificity and virtual human presence among the woods and helps to intensify the overall affective character of the environment as an imaginary whole.

Apart from the musical intermezzo, the basic sonic means used to create the atmosphere in Forest are essentially quite similar to the ones in Eraserhead. The sonic material mainly consists of acousmatic sounds of animals, human bodily action, voices, and machines combined with occasional sounds of electronic drones and stormy weather. What distinguishes the production of sonic atmosphere in Forest from that of Eraserhead is, among other things, a much stronger emphasis on dramaturgical elements. The narrative action, however, remains somewhat abstract throughout the whole cycle. Despite the fact that we hear all sonic action in excessive detail, what exactly is taking place remains obscured, hidden as the action is behind the double acousmatic curtain of the forest/sound system. But again, precisely because of this acousmatic abstraction, the sounds stimulate our environmental imagination all the more forcefully by intensifying our tendency toward causal listening. We constantly strive to locate the virtual action that is taking place around us and to figure out "what it is." In direct contrast to Schaeffer's hope for a pure reduced listening in acousmatic space, Forest thus becomes a demonstration of how, in Luke Windsor's words, "the acousmatic curtain" does not merely serve "to obscure the sources of sounds. Indeed, it can be seen to intensify our search for intelligible sources, for likely causal events" (2000, 31). So, in this process of intensified causal listening effectuated by Forest's double acousmatics, we spontaneously merge the disparate events into a multisensory feeling of environmental wholeness that is both abstract and concrete at the same time. To repeat the initial quote from Deleuze, the feeling of environmental

presence evoked by Forest is "distinct-obscure"—distinct and obscure at the same time. We never know exactly what is going on, but what we hear still affects us directly as particular actions taking place around us in all their specific environmental presence, right here and right now. This atmospheric stimulation of our environmental imagination takes place in two ways. First, by being mapped perceptually onto a mental image of environmental wholeness, the artificially produced acousmatic sounds produce a strong environmental reality effect. The combination of advanced technology and forest acousmatics creates a hyperreal environmental spectacle in which the line between (virtual) reproduction and (actual) production is perceptually blurred. Precisely because of this double acousmatic character of the piece, we are encouraged all the more intensely to invest the sounds with a strong sense of site-specific presence—as if they were actually taking place around us, as if they were in fact causal products of real environmental activities hidden from our eyes out there among the trees. Forest is site-specific sonic environmentality as atmospheric spectacle.

Second, Forest is a profound example of the use of anthropomorphic sound to create a sense of all-encompassing environmental affectivity. Apart from the changing weather conditions and generic forest sounds such as bird song, the sonic environment continually evokes a particular sense of human presence: human bodily activity, social interaction, groups of people battling, intense cries of joy and horror. However, because of the acousmatic veil we are not given a fully actualized diegetic world in which to situate the human activities. Instead, we are left with a more abstract and nonlocalizable environmental imagination of a "global" physical and emotional being, affectively penetrating the scenario as a whole. We may not know exactly what is going on and we may not be able to locate the events and distinguish them from each other, but we sense and imagine the anthropomorphic tension and constant change in intensity and mood as an overall environmental affect. We attune ourselves to the state and being of the environment as an imaginary whole.

To summarize, we can conclude that the atmospheric environmentality of Forest is characterized by the combined production of two different affective sensations of environmental presence. First, Forest creates a feeling of a site-specific, hyperreal spectacle that anchors the event in the space and time of the performative situation itself. And second, the abstract anthropomorphic expressions penetrate the entire event with an environmental imagination of affective presence.

Conclusion

The aim of this chapter has been twofold. First, the aim has been to explore our affective relations with the sonic environment on a general level and, second, to analyze this relationship in a more specific context as the production and imagination of sonic atmospheres. Atmosphere is understood as the environmental production of a sense of site-specific and anthropomorphic presence. The two examples considered—Lynch's

Eraserhead and Forest (for a Thousand Years) by Cardiff and Miller—are from the fields of cinematographic sound design and contemporary sound art respectively. It would indeed be possible, though, to expand the perspective and transfer the chapter's overall argument to other areas of contemporary auditory culture where the staging and experience of sonic environmentality is of equal importance. In many computer games, for instance, not only is the sonic production of environmentality crucial to give the gameplay a sense of worldly realism, but sound is also very often used intensively to affect our environmental imagination of the game environment with an atmospheric sense of site-specific and anthropomorphic presence quite like the ones found in film (Eraserhead) and sound art installations (Forest). And again, we can find similar tendencies, albeit with quite different means, in our everyday use of background music, where the sonic production of atmospheric presence often plays an important role in the staging of everyday social interactions. While generally stimulating the basic environmental mode of listening described by Anahid Kassabian as a form of "ubiquitous listening" (2013), background music is also, on a more specific level, typically used in everyday life to intensify our experience of being in a particular place or social situation by evoking a sense of site-specific and anthropomorphic presence. Sound and music are employed to affectively evoke an environmental feeling of being in a particular place and a particular mood.

In other words, sonic environmentality and the production of sonic atmosphere cover a vast and diverse field of aesthetic practice including some of the most important areas of contemporary auditory culture. With the distinction presented here between atmosphere, ambience, and ecology as three basic dimensions of our affective relations with the sonic environment, I have proposed a theoretical framework for a possible further exploration of it in its different "distinct-obscure" manifestations. Hopefully, such a framework may inspire other contributions to the future development of what could become a general aesthetics of sonic environmentality. Still, however, in this process we must keep in mind not only the affective and imaginative character of sonic environments but also how they affect us and stimulate our imagination as environments. A true aesthetics of our sonic environment is, first and foremost, an aesthetics of sonic environmentality.

References

Anderson, B. 2014. Encountering Affect: Capacities, Apparatuses, Conditions. Farnham, UK: Ashgate.
Böhme, G. 1993. Atmosphere as the Fundamental Concept of a New Aesthetics. Thesis Eleven 36: 113–126.
Böhme, G. 1995. Atmosphäre. Frankfurt am Main: Suhrkamp Verlag.
Böhme, G. 2001. Aisthetik. München: Wilhelm Fink Verlag.
Bregman, A. S. 1990. Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.
Buchanan, B. 2008. Onto-Ethologies. Albany: State University of New York Press.
Deleuze, G. 1988. Spinoza: Practical Philosophy. San Francisco: City Light Books.
Deleuze, G. 1994. Difference and Repetition. New York, NY: Columbia University Press.
Gendler, T. 2013. Imagination. Stanford Encyclopedia of Philosophy, edited by E. N. Zalta. http://plato.stanford.edu/archives/fall2013/entries/imagination/. Accessed June 26, 2017.
Gibson, J. J. 1986. The Ecological Approach to Visual Perception. Hillsdale, NJ: Erlbaum.
Guattari, F. 2000. The Three Ecologies. London: Athlone Press.
Hasse, J. 2014. Atmospheres as Expressions of Medial Power. Lebenswelt 4 (1): 214–229.
Heidegger, M. (1927) 1996. Being and Time. Albany: State University of New York Press.
Herzogenrath, B. 2008. An [Un]Likely Alliance: Thinking Environment[s] with Deleuze/Guattari. Newcastle upon Tyne, UK: Cambridge Scholars.
Herzogenrath, B. 2009. Deleuze/Guattari and Ecology. New York: Palgrave Macmillan.
Kane, B. 2014. Sound Unseen: Acousmatic Sound in Theory and Practice. Oxford: Oxford University Press.
Kassabian, A. 2013. Ubiquitous Listening: Affect, Attention, and Distributed Subjectivity. Berkeley: University of California Press.
Kim-Cohen, S. 2013. Against Ambience. New York: Bloomsbury.
Kwon, M. 2002. One Place after Another. Cambridge, MA: MIT Press.
López, F. 1998. Schizophonia vs L'objet Sonore: Soundscapes and Artistic Freedom. eContact 1 (4). http://www.franciscolopez.net/schizo.html. Accessed June 21, 2016.
López, F. 2004. Profound Listening and Environmental Sound Matter. In Audio Culture, edited by C. Cox and D. Warner, 82–87. New York, NY: Continuum.
Lynch, D. 1977. Eraserhead. Libra Films International.
Lynch, D., and A. Splet. 1982. Eraserhead. Original Soundtrack. I.R.S. Records.
McCullough, M. 2013. Ambient Commons: Attention in the Age of Embodied Information. Cambridge, MA: MIT Press.
Merleau-Ponty, M. (1945) 2005. Phenomenology of Perception. London and New York: Routledge.
Morton, T. 2007. Ecology without Nature: Rethinking Environmental Aesthetics. Cambridge, MA: Harvard University Press.
Morton, T. 2010. The Ecological Thought. Cambridge, MA: Harvard University Press.
Schaeffer, P. 1966. Traité des objets musicaux. Paris: Éditions du Seuil.
Schafer, R. M. 1977. The Soundscape: Our Sonic Environment and the Tuning of the World. Rochester, VT: Destiny Books.
Schmidt, U. 2013. Det ambiente: Sansning, medialisering, omgivelse [The Ambient: Sensation, Mediatization, Environment]. Gylling, Denmark: Aarhus University Press.
Schmidt, U. 2015. The Socioaesthetics of Being Surrounded: Ambient Sociality and Movement-Space. In Socioaesthetics: Ambience—Imaginary, edited by A. Michelsen and F. Tygstrup, 25–39. Leiden: Brill Publishers.
Schmitz, H. 1993. Gefühle als Atmosphären und das affektive Betroffensein von ihnen. In Zur Philosophie der Gefühle, edited by H. Fink-Eitel and G. Lohmann, 33–56. Frankfurt am Main: Suhrkamp Verlag.
Schmitz, H. 2014. Atmosphären. München: Verlag Karl Alber.
Toop, D. 2004. Haunted Weather. London: Serpent's Tail.
Uexküll, J. von. 1921. Umwelt und Innenwelt der Tiere. Berlin: Springer.
Uexküll, J. von. (1934) 2010. A Foray into the Worlds of Animals and Humans. Minneapolis: University of Minnesota Press.
Windsor, L. 2000. Through and around the Acousmatic: The Interpretation of Electro-Acoustic Sounds. In Music, Electronic Media and Culture, edited by S. Emmerson, 7–35. Aldershot, UK: Ashgate.

Chapter 26

The Aesthetics of Improvisation

Andy Hamilton

Introduction

Within philosophical aesthetics, and musicology generally, improvisation as an approach to musical performance remains misunderstood. It is still sometimes regarded as having a lower status than the interpretation of composed works, even if it is now less commonly treated as "instant composition," "made up as you go along." The opposition between an aesthetics of perfection and an aesthetics of imperfection offers a fruitful context for its elucidation and applies across the range of sound and imagination. The latter concept was coined by Ted Gioia in his book The Imperfect Art, to denote a valuation of spontaneous process over finished product, expressed most clearly in the work of improvising musicians. It is important to stress that this ideal of spontaneous creation applies also to interpretation of composed works, which are typically not mechanically reproduced, as improvisers sometimes suggest, but creatively realized; spontaneity at the micro-level is compatible with following a score, and creates a higher level of creative performance. This chapter is concerned to develop and defend an aesthetics of imperfection, which provides a theoretical framework that contrasts with traditional perfectionist attitudes across the range of performance practice and deepens our understanding and appreciation of both improvised and composed music. It concludes with a discussion of the relation between the aesthetics of imperfection and the status of improvised music as art music or classical music.

A Philosophical Humanist Approach to Music Aesthetics

In contrast to Justin Christensen's entry (this volume, chapter 1), this chapter addresses improvisation in its cultural rather than psychological aspects—its expression as historical

practice, rather than its nature as embodied cognition. It is impossible completely to separate these contrasting aspects. But on the humanistic approach to music aesthetics that this chapter assumes, they stand in some tension. A humanistic approach treats music as a sounding, vibrating phenomenon, and a performing art or entertainment. This approach is opposed to an abstract, Platonist conception, which is nonparticipant and intellectualist, and identifies a musical work with a written score; but it also opposes the subpersonal standpoint of much philosophical discussion of neural research. Humanism asserts the centrality of artistic criticism to the understanding of art—a humane, aesthetic investigation in which neural research has little relevance. Such research can tell us much about our musical responses, but its implications for the appreciation or understanding of music as an art should not be exaggerated—as it is when the personal level of aesthetics and the subpersonal level of brain processes and activity are conflated.

On a humanistic view, music is an art with at least a small "a"—a practice involving skill or craft whose ends are essentially aesthetic, and which is the necessary object of aesthetic attention, with sounds regarded as tones. (The most commodified popular music perhaps falls below this level, into mere entertainment—see Hamilton forthcoming.) "Aesthetic attention" involves appreciation of beauty and cognate or related notions. Many Ancient Greek theorists seem to neglect the auditory experience of music, and so do not regard it as an art in our current sense; it is sometimes assumed that the Greeks had an ethical or a mathematical, rather than an aesthetic, conception of music. That may be true of theorists, but not of Greeks in general; an aesthetic conception of music should be attributed here as in other cross-cultural instances (see Hamilton 2007a, chap. 1).

The art of music aims at the imaginative treatment of sound—according to the contrast between fantasy and imagination stressed by Coleridge, "imaginative" implies that the result is not mere entertainment, but aspires to artistic creativity. Scruton (2015), in Art and Imagination, generalized Coleridge's account of imagination as a capacity of rational beings, beyond central cases of imagining and "seeing as," to include any entertaining of unasserted thoughts and their equivalent in mental imagery and perception. For Scruton, experience of music has as its intentional objects (1) sounds and silences (featured in an asserted thought) and (2) life and movement (featured in an unasserted thought, involving the imagination). That is, on Scruton's view, we literally hear sounds and silences and, through a process of imaginative, metaphorical perception, hear life and movement in them. While his view that the perception of music is necessarily metaphorical may be questioned, the essentially humanistic basis of Scruton's account should be acknowledged. An imaginative or artistic treatment of sound is an artistic treatment, though it is not unique to music, if one accepts that there is nonmusical sound art—and clearly, the latter can include improvisation. But my present focus is on improvisation in music, rather than in nonmusical sound art.

To develop this connection between imagination and art: I am asserting a conceptual holism between art and imagination, an interdependence of the two concepts.1 This position is famously associated with one of the most notorious theories of art in the

philosophical canon, the Croce-Collingwood theory. In Outlines of a Philosophy of Art from 1924, Collingwood introduced his view that art is a form of imaginative activity, to be contrasted with the production of a physical object. He developed the distinction in his better-known The Principles of Art.2 For Collingwood, the artist is involved in a special type of making, namely, imaginative creation:

a thing which "exists in a person's head" and nowhere else is alternatively called an imaginary thing. The actual making of the tune is therefore alternatively called the making of an imaginary tune . . . the making of a tune is an instance of imaginative creation. The same applies to the making of a poem, or a picture, or any other work of art.  (1958, 134)

According to the standard reading of the Croce-Collingwood view, the "total imaginative experience" that constitutes the "work of art proper"—the artwork in a strict rather than colloquial sense—must be regarded as related only contingently to the physical artifact. (Collingwood's position in fact may be subtler than this.) In the aesthetic as opposed to general psychological case, I believe, this radically mentalistic conception of imagination is mistaken. Equally mistaken, I believe, is Sartre's account of imagination, with which it has affinities. Sartre (2004) argues that a musical work such as Beethoven's Seventh Symphony exists neither in time, in the usual sense, nor in space. Rather, it exists in the imagination, in the imaginary, outside the real:

This Sartrean view of music shows the limitations of metaphysics, I believe—a cloud of philosophy condensed in a drop of grammar—though this is not the place to offer a critique. However, one does not have to follow these writers in espousing a radically mentalistic conception of both art and imagination, in order to recognize the germ of truth in the connection between these concepts. The truth that these theories mislocate can be explained as follows: When a piece of entertainment or craft is described as involving an imaginative achievement, then it is being claimed as belonging to the realm of art. In contrast, fantasy—such as the “Game of Thrones” genre—is a staple of low-grade entertainment. Similarly, musical fantasia is composition that panders to a relaxed state of pleasant, unimaginative connections and themes requiring no concentration or attention, a pleasing, lazy stream of association that thinks of nothing beyond its own sensations. This is not to dismiss fantasy entirely from the realms of art. There are works whose

538   andy hamilton content is predominantly fantastical, such as the last plays of Shakespeare, Keats’s Lamia, whose main point of interest is genuinely artistic—and Mozart’s keyboard “Fantasias” include some of his most sublime shorter works. Much entertainment, in contrast, is for its fans simply fantasy, or “novelty,” to resurrect an eighteenth-century aesthetic category. Fantasy productions are “imaginative” only in the sense that they involve great insight into popular taste, thus opening the way to commercial opportunities—an important and neglected sense of imagination, but one that is nonartistic. These contrasts apply to performances of improvised music. Here, as in other cases, answering the questions “Is this art as opposed to mere craft or entertainment?” and “Does this work involve imagination rather than fantasy?” appeals to the same kinds of feature. In what follows I demonstrate the artistic status of improvised music, thus showing its imaginative content. To reiterate, a humanistic as opposed to abstract account of music sees it as a sounding, vibrating phenomenon, and a performing art. Abstract or static accounts, in contrast, are nonparticipant and intellectualist; they regard rhythm statically, as a pattern of possibly unstressed sounds and silences—as simply order-in-time as opposed to orderin-movement. Humanists stress music’s essential origins in the human production of sound and movement, involving a distinctive attack characteristic of traditional musical means of producing sounds by striking, bowing, or blowing. These means of production, supplemented in the twentieth century by electronic media, are still essential to the concept of music. On a humanistic conception, music, dance, and poetry originated together and are essentially connected. (Chimps may dance or march rhythmically; but for humanism in my sense, chimps are close enough to human.) Philosophical humanism affirms the importance of humane understanding against both scientism and—less common in a more secular intellectual climate—supernaturalist exceptionalism. Hence the tripartite distinction: Scientism: the view that the physical or natural sciences constitute the paradigm of human knowledge, one on which other disciplines must model themselves. Exceptionalism: the (normally religious) view that “human animal” is a contradiction in terms, and that human beings are the only biological entity that cannot be grouped with others on any level. Philosophical humanism: holds that the explanation of human behavior is irreducibly personal—that is, it essentially involves what is often termed the intentional stance, resting on commonsense or “folk” psychology and the attribution of beliefs, desires, intentions, and similar attitudes to rational agents. Whole-person ascription involving the intentional stance is the fundamental level of explanation of human behavior. Subpersonal and neural explanation has a place, but not, as scientism holds, the ultimate one; so humanism does not amount to exceptionalism as defined earlier. Humanism is not antiscientific, but antiscientistic—a quite different thing. (This tripartite distinction is developed in Hamilton 2013a, chap. 7) This chapter assumes a humanistic recognition of the value of art and the aesthetic to human well-being. It advocates a normative conception of art and culture that challenges

the aesthetics of improvisation   539 fashionable sociological conceptions, holding that art has a purpose or purposes essentially, and argues against the common assumption that because “anything can be art,” it is therefore indefinable.

High Art and Vernacular Art

To say that music is an art is not to say that it is always a high art—that was the point of describing it as an art at least with a small "a." Clearly most music is not; but I will argue that an improvised art music is possible, for instance in modern jazz. Art music is "Art" with a capital "A"—high art as opposed to mechanical, vernacular, or popular art with a small "a," essentially craft or entertainment. The art historian Paul Oskar Kristeller (1951) famously argued that the modern system of the fine or high arts appeared only in the eighteenth century:

On Kristeller’s view, Plato and the Greeks did not think of poetry and drama, music, painting, sculpture, and architecture as species of the same genus, practiced by “artists” in the current overarching sense of the term (Kristeller discussed in Hamilton 2007a). The modern system separated fine art from craft, generating a concept of high art produced by artists of genius, while leaving great scope for differences between the individual arts. Kristeller’s view underlies the modernist consensus. (Artistic modernism being an intensification of modernity, from the later nineteenth century onward.) But even if one disagrees with his claim that the Arts—with a capital “A,” the fine or high arts—arose only in a modern system, aesthetics still needs to explore the very wide divergences between modern concepts of art and those found in antiquity and in non-Western cultures. Kristeller’s concern was with Western art, and widely differing models or systems are found in other cultures—Edo-era Japan, for instance, valued “the Four Accomplishments” or gentlemanly pursuits of music, games of skill, calligraphy, and painting (Guth 2010, 11). Such cross-cultural data are essential in addressing the question “What is art?,” and are more significant than considerations arising from postmodernism that have tended to preoccupy Western commentators.3 The twenty-first century has no very clear system of the arts—the vogue for stipulating one did not much outlive the eighteenth century—and there is a vagueness in the understanding of our present “system” of the arts, and in its accompanying notion of an “artistic conception.” Nonetheless, it should be recognized that there is an implicit system, otherwise practices of arts funding, newspaper reporting, and so on, would be impossible. The modernist narrative interprets the fine or high arts, with their associated self-conscious

540   andy hamilton artistic conception, as autonomous artforms—independent of each other, and having lost any defining practical or social function. Autonomous arts, according to the modernist narrative, transcend both the practical utility of the useful or mechanical arts such as furniture or ceramics, and the social functions—religious, courtly, and military— which art and music served prior to their evolution as high arts by the eighteenth and nineteenth centuries. The concept of autonomous art received its most intense expression in the later nineteenth century doctrine of art for art’s sake, which attempted to locate the artwork outside the socioeconomic nexus. The first sense of autonomous art dates from the appearance of the modern system of the arts: it is the sense that excludes decorative art with a practical function such as ceramics, weapons, and furniture. That is, it excludes art that lacks practical autonomy. Whether an artform is capable of such autonomy cannot be entirely predicted; but humans would have somehow to lose the need for furniture, before such artifacts could become autonomous art. Even when exhibited in a museum, their functional origins are inescapable. They may therefore be characterized as intrinsically heteronomous art. A second sense of autonomy is social autonomy. Though the demarcation between social and practical function is not a clear one—the representational or pictorial function of painting, for instance, while serving social functions such as enhancing an aristocratic patron’s prestige, is not itself social—social autonomy is particularly stressed by the modernist narrative. Other examples of social function would be eighteenth-century music for banquets or military pageants, or twentieth-century political art and mass entertainment—functionality persists after the advent of autonomy. Such artforms are contingently heteronomous, because they are capable of becoming autonomous. Socially autonomous art constitutes an autonomous practice whose defining function is aesthetic or artistic rather than social. The possibility of socially autonomous art is often rejected out of hand, but this rejection may rest on a misunderstanding. What I have in mind is art that has no social defining function, though it clearly has nondefining functions that are social. The defining function is what one needs to know, in order to understand anything at all about the event or process. For instance, Bach’s cantatas were originally composed for church services, whose purposes they served. In contrast, it would be absurd to say of modern concert performances that the music serves the social occasion of a concert; the music is the social occasion. The performance has no defining social function, but rather a defining functionlessness—though of course it has many nondefining functions that, as Adorno stressed, arise in virtue of that defining functionlessness. (Arguably the act of performing has a social function—that of presenting the music to an audience—and so it could be argued that it has a defining social function. This question must be pursued elsewhere.) The aesthetic significance of social autonomy lies in how it can free artist and audience from socially conditioned taste. 
It generates what I term a post-Romantic conception of art, one that regards high or classic art as neither didactic nor pleasurable diversion; one could say that such art aims at truth, but is not reducible to anything as crude as a “message,” and artworks are concerned, rather, to raise possibilities for consideration.

According to this conception, art is autonomous, and its audience has freedom or autonomy in interpreting it (see Hamilton 2013b). This freedom is relative because, according to a familiar modernist dialectic, social and thus aesthetic autonomy arises from, yet is in tension with, capitalist commodification. In the period from the Renaissance to the later eighteenth century, different artforms in turn became free of church and aristocratic patronage, as the artist's work was commodified through entry into the capitalist marketplace. This process is found also in non-Western art, such as that of Edo-era Japan and, indeed, on a smaller scale in art of many eras (see Hamilton 2009). What is distinctive about post-eighteenth-century developments, as in the development of capitalism generally, is their scale and ubiquity.

The concept of high art originates in social distinction, but the implied contrast is not purely social. According to a persuasive modernist narrative, high art, which appeared differentially across the arts from the Renaissance onward, is autonomous art; to reiterate, it transcends the practical utility of mechanical arts, and the premodern social functions—religious, courtly, and military—of art and music before their evolution as high arts. High art originated as the art patronized by church and aristocracy, with elevated themes and subjects. But high social location is neither sufficient nor necessary for high art. Inigo Jones's courtly masques for James I are regarded by art historians as expensive, frivolous high-class entertainment that wasted the architect's genius. In contrast, in the era of modernism, high art embraced low subjects. French realists such as Millet and Courbet chose humble scenes. While The Gleaners might be imagined as having a high, biblical theme, impressionism's urban subjects could not; Caillebotte was criticized for his working-class The House-Painters and The Floor-Scrapers. High art is distinguished from art with a small "a" by autonomy, and not directly by aesthetic value. High art is not just a social category but is also historically conditioned—fully manifested in Western modernity, but present in earlier times and other places. "High art" parallels "High Renaissance" or "high modernism"—it refers to the highest or most exemplary achievement (see Hamilton 2009). ("Fine art" stands in contrast with mechanical art.)

The modernist narrative interprets the fine or high arts, with their associated self-conscious artistic conception, as autonomous artforms—independent of each other, and having lost any defining practical or social function. Art ceased to be a product simply for an occasion, and was liberated from direct social function in service of court, aristocracy, or church. It is created not simply to satisfy a patron, but as authentic artistic expression. The modernist picture is that such a possibility, though perhaps only remotely realizable, opens up when art enters the marketplace; art becomes potentially autonomous at the same time as it becomes commodified. For different artforms, this liberation occurred at different points from the Renaissance to the eighteenth century, when music, the most backward art in this respect, finally gained its freedom. It is no coincidence that the concepts of genius and originality, less possible in a craft tradition, flourished at this time.
In presenting a work of art before the public, the artist is claiming—or hoping—that it is worthy of their undivided attention and will richly reward it. Thus, at a concert of contemporary music at the Huddersfield Festival in 2011, I was struck by the way that

programming a concert of art music, in which the audience is meant to be silent and attentive, implies a demand on them by the artwork—one which, as on this occasion, might not be justified if the quality of the works were not high. In contrast, muzak in a bar, or Tafelmusik at an eighteenth-century aristocratic banquet, makes no such claim. Similarly, a painting in an art gallery makes the claim of art, while a kitsch reproduction at a cheap furniture store does not. That is one sense of "the claim of art." It is the claim that an artwork makes on us, as opposed to the claim involved in calling something an artwork. Some proponents of high art might argue that improvised music does not justify such attention—hence its performance in clubs or bars. This criticism is now addressed by exploring the dialectic of perfectionist and imperfectionist aesthetics.

Perfectionist and Imperfectionist Aesthetics

A humanistic aesthetic rejects the artistic primacy of the musical score, espousing what I have termed the aesthetics of imperfection. This aesthetics questions the centrality of the Western art music tradition within philosophical aesthetics and argues, with Ted Gioia, that despite its formal deficiencies, we are nonetheless interested in the "imperfect art" of improvisation. Gioia originates the term "the aesthetics of imperfection," and defends it against what he calls "the aesthetics of perfection," which takes composition as the paradigm (Gioia 1988). The aesthetics of perfection emphasizes the timelessness of the work and the authority of the composer and, in its pure form, is Platonist and antihumanistic. In contrast, the aesthetics of imperfection is more consciously humanistic. It values the event or process of performance, especially when this involves improvisation—though these opposites turn out to be dialectically interpenetrating. Thus, the contrast between composition and improvisation proves more subtle and complex than Gioia and other writers allow. The focus in this chapter is principally on jazz and related popular music, but much of the discussion is applicable to other kinds of improvised music.

The opposition between these rival aesthetics became sharpened and intensified in the West during the nineteenth century with the increasing specification and prescription that musical notation placed on performers. The process reached its high point during the later nineteenth and twentieth centuries, being associated with the increasing hegemony of the work-concept. An artistic practice that had once involved improvisational freedom for performers became limited to interpretation of an essentially fixed work. The dichotomy between improvisation and composition lacked its present meaning, or perhaps any meaning at all, before this process was well advanced: "By 1800 . . . the notion of extemporization acquired its modern understanding [and] was seen to stand in strict opposition to 'composition' proper" (Goehr 1992, 234). Philosophers have tended to neglect improvisation as a contrast to composition. In Scruton's The Aesthetics

of Music, for instance, the work-concept dominates, and an improvisation is treated as a work that is identical with a performance (Scruton 1997). I will argue that an aesthetics of perfection arose with the work-concept and is opposed by an aesthetics of imperfection associated with improvisation. This opposition offers a fruitful framework for looking at certain aesthetic questions in the performing arts. An illustration is found in the debate between Busoni, the defender of improvisation, and Schoenberg, the compositional determinist.4 Schoenberg emphasized the autonomy of the composer-genius in the creation of masterworks that, he insisted, required the complete subservience of the performer; he stood for increasing individuality for the composer at the expense of that of the performer. Busoni, however, found virtues in improvisation and in the individual contribution of the performer-interpreter. He argues, "Every notation is, in itself, the transcription of an abstract idea. The instant the pen seizes it, the idea loses its original form." In a rather elusive discussion, he argues that the purity of the improvisation is closer to the locus of artistic inspiration. This opposition expresses a dilemma that Western art music has found very hard to resolve. As Rose Rosengard Subotnik (1991) puts it, "when efforts to preserve the autonomy of the composer's vision are unbounded, the performer is turned into a kind of automaton" (256).

The aesthetics of imperfection focuses on the event or process of performance, while the aesthetics of perfection discounts them and emphasizes the timelessness of the work. The dichotomy, as will become clear, implies others: process and product; impermanence and permanence; spontaneity and deliberation. The idea of an "aesthetics of imperfection" may appear paradoxical, its connotations too negative—how could imperfection be an aesthetic value? However, "perfection" and "imperfection" have a descriptive sense close to their Latin derivation—"perficere" means "to do thoroughly, to complete, to finish, to work up"; "imperfectus" means "unfinished, incomplete." The aesthetics of imperfection finds virtues in improvisation that transcend errors in form and execution—virtues that arise precisely from the "unfinished state" of such performances. Thus, in the arts, sketches help us understand a work's development, and are sometimes regarded as at least as valuable as the fully crafted final product—the inspiration is freer, and closer to its unconscious source. For instance, some critics might regard Constable's full-scale preliminary sketches as having a liveliness that the finished landscape paintings, for all their other qualities, might lack. The result is an "aesthetics of imperfection" or incompleteness, where the listener's or reader's contribution is greater than in more prescriptive or "perfectionist" aesthetics.

However, one should not advocate either aesthetic, of perfection or imperfection, without qualification. Rather, "improvisation" and "composition" denote ideal types or interpenetrating opposites. A feature that seems definitive of one type also turns out to be present, in some sense, in the other—or so I will argue with regard to preparation, spontaneity, and structure. There is a continuum of improvised practice, as follows.
Pre-realized electronic music stands at the far limit of prestructuring since, although possibly possessing spontaneity at the level of composition, at the level of performance or “sounding” it is fixed.

Trial-and-error compositional efforts of students in a recording studio stand in contrast with the organic, motivically developing, through-composed works of Brahms and Schoenberg. Within the improvised sector, preperformance structuring ranges from the work of jazz composers such as Ellington and Gil Evans to the very loose frameworks brought along by Miles Davis to the Kind of Blue recording session. At the furthest "improvised" limit of the continuum stands free improvisation, a development of 1960s free jazz, which abandons the recurring harmonic structures and groundbeat of earlier jazz.

Thus, the aesthetics of perfection and imperfection apply not just at the level of performance, but within the process of composition also. Or rather, there is a sense in which these levels overlap. The rival aesthetics extend into other aspects of artistic production; thus, for instance, recording offers its own issues of perfection versus imperfection. Perfectionists believe that allegedly contingent conditions of live performance can be screened out—as in the creative recording techniques of the pianist Glenn Gould. The imperfectionist view, in contrast, is that recording should be a transparent medium giving a faithful representation of a particular performance, with only the grossest imperfections eliminated (see Hamilton 2003). Although an aesthetics of perfection seems to demand absolute fidelity to the composer's intentions—or rather, it has a very narrow and stringent conception of what such fidelity involves—it should be separated from a commitment to authentic performance in its present-day sense (see Davies 2001). The aesthetics of perfection may imply a Platonist conception of the musical work as a timeless sound-structure, detachable from its original conditions of performance, instruments as well as locations. The converse implication, from Platonism to perfectionism, is stronger, as Glenn Gould's remarks illustrate:

The late-twentieth-century concept of authenticity, in contrast, exhibits aspects of both perfection and imperfection. It has been argued that it rejects the “portability of music” in favor of an ideal of acoustic interdependence of composer, ensemble, and environment; but it also seems to search after a timeless conception of the work.5

The Concept of Improvisation and “Improvised Feel” What does spontaneity amount to in improvised performances? And how does it matter aesthetically? These questions bring us to the heart of the concept of improvisation. Those who adopt a purely causal account of the concept of improvisation imply that its

presence is of little aesthetic consequence. Thus, Cavell claims that the standard concept "seems merely to name events which one knows, as matters of historical fact . . . independent of anything a critic would have to discover by an analysis or interpretation . . . not to have been composed." And Eric Hobsbawm writes: "There is no special merit in improvisation. . . . For the listener it is musically irrelevant that what he hears is improvised or written down. If he did not know he could generally not tell the difference." However, he continues, "improvisation, or at least a margin of it around even the most 'written' jazz compositions, is rightly cherished, because it stands for the constant living re-creation of the music, the excitement and inspiration of the players which is communicated to us."6

The concept of improvisation does have an essential genetic component—a succinct definition would be "not written down or otherwise fixed in advance." A purely genetic account claims that whether a performance is improvised may not be apparent merely by listening to it, and adds that the mere fact that a performance is improvised is not an aesthetically or critically relevant feature. The account diagnoses what amounts to an "intentional fallacy" concerning improvisation—reminiscent of the suggestion that extraneous knowledge of authorial intention is irrelevant to critical evaluation. The genetic account exaggerates the extent to which improvisation is undetectable, however. There is a genuine phenomenon of improvised feel, gestured at by Hobsbawm's comments on what improvisation symbolizes. In The Art of Improvisation from 1934, T. C. Whitmer offered a set of "General Basic Principles," which included the expression of an aesthetics of imperfection:

From this feel arises the distinctive form of melodic lines and voicings in an improvised performance. Lee Konitz describes a “very obvious energy” in improvisation, which he believes does not exist in a prepared delivery: “There’s something maybe more tentative about it, maybe less strong or whatever, that makes it sound like someone is really reacting to the moment” (Konitz in Hamilton 2007b). One might say of a purported improvisation “That couldn’t have been improvised”— meaning for instance that the figuration is too complex or the voicings too clear to be created under the constraints of an improvised performance. (Perhaps a genius such as J. S. Bach could do so.) Conversely, an improvised feel might be present in prepared playing that takes improvisation as its model, or where a composer is looking to create an improvised effect. The fact that the performance was not improvised might justifiably alter one’s view of the skill of the performer; but there is a more elusive sense in which it matters aesthetically. The artistic ideal of spontaneous creation is one factor that separates improvised art music from entertainment. The entertainer, in contrast, perfects a prepared routine and sticks with it, in the knowledge that it works—a “bag of tricks” model of improvisation. Routines are avoided by the “modernists” who reject the culture
industry—jazz musicians such as Bill Evans, Paul Bley, Lee Konitz, and others who disdain flashy virtuosity. There are various senses in which improvisation matters aesthetically, therefore. Even assuming a viable notion of "extraneous" knowledge, claims of an intentional fallacy are not vindicated. They are further undermined when one comes to consider the role of preparation. Cavell and Hobsbawm seem to subscribe to the "instant composition" view of improvisation. In my criticism of this view I will develop a positive definition of improvisation in terms of improvised feel. A continuum of composition and improvisation is reflected in the idea of different kinds of preparation for performance.

Spontaneity and the Aesthetics of Perfection

The characterization of improvisation as instant composition is assumed both by an aesthetics of imperfection, with its ideal of complete spontaneity, and by an aesthetics of perfection, which denigrates improvisation. These positions are in some sense mutually dependent; the difference is that one praises instant composition while the other condemns it. Later I will criticize the first position. Here, I argue against the second, which claims that improvisers, in their fruitless aspiration to spontaneity, recycle rehearsed material; on this view, improvisation is a barrier to individual self-expression, not a way of realizing it. Thus, Adorno treats jazz's aspirations to spontaneous improvisation as hollow, subjugated by the demands of the culture industry of which it is a part. Modernist composers are almost unanimous in their negative view of improvisation. Elliott Carter (1997), for instance, argues that it allows undigested fragments of the unconscious to float to the surface. His conclusion is that "improvisation is undertaken mainly to appeal to the theatrical side of musical performance and rarely reaches the highest artistic level of . . . Western [art] music" (324–325). Pierre Boulez questions the more radical chance or aleatoric techniques, deployed during the 1960s by Stockhausen and others, which leave much to the performer's decision. His criticism is that familiar patterns of notes are embedded in the performer's muscular memory as a result of countless hours spent with the instrument, to be regurgitated when there is no restraining score. Improvisers express themselves less than they think because so much of what they play is what they are remembering, including things they do not even know they are remembering. In a later interview, Boulez was better disposed toward jazz than toward aleatoric improvisation, though he still stressed what he regards as its limitations: "The [work-concept is] the top level not only of enjoyment, but also depth. I cannot consider improvisation as really the highest level."7 As the aesthetics of imperfection recognizes, the improviser has less chance than the composer of eradicating cliché in their work. But the improviser's preparation and practice are precisely intended to keep them from playing what they already know. Thus, there is a relation between preparation and performance not envisaged by Carter and Boulez—nor by the polar opposite of their view, the pure spontaneity assumed by a full-blown aesthetics of imperfection. Mediating the extremes of perfection and imperfection yields the following picture. Interpreters think about and practice a work with the aim of giving a faithful representation of it in performance. Improvisers also practice, but with the aim of being better prepared for spontaneous creation. Many improvisers will formulate structures and ideas, and, at an unconscious level, these phrases will provide openings for a new creation. Thus, there are different ways for a performer to get beyond what they already do, to avoid repeating themselves. For the improviser, the performance must feel like a leap into the unknown, and it will be an inspired one when the hours of preparation connect with the requirements of the moment and help to shape a fresh and compelling creation. At the time of performance, they must clear their conscious minds of prepared patterns and simply play. Thus, it makes sense to talk of preparation for the spontaneous effort. As Lee Konitz puts it, "That's my way of preparation—to not be prepared. And that takes a lot of preparation!"8 This is the qualified truth in Busoni's claim, discussed earlier, that improvisation is valuable because it is closer to the original idea.

Free Improvisers, Interpreters, and "Improvisation as a Compositional Method"

To reiterate, the connection between preparation and performance is misconceived by a radical aesthetics of imperfection, as well as by improvisation's perfectionist critics. Thus, some free improvisers aim to improvise, in Ornette Coleman's words, "without memory," while Derek Bailey (1993) advocated "non-idiomatic improvisation," apparently without a personal vocabulary—a paradoxical notion for such a highly idiomatic and individual improviser, as has often been pointed out. Against these authorities, I would argue that an improviser's individuality resides in, among other things, their creative development of favorite stylistic or structural devices, without which they risk incoherence and noncommunication. The interpretation of composed works is also misconceived, both by imperfectionists and perfectionists. Many proponents of an aesthetics of imperfection believe that interpreters simply "reproduce the score." The dialectic here parallels that concerning instant composition: imperfectionists dismiss interpretation as mere reproduction, while perfectionists praise it for the same reason, since—on their view—a reproduction leaves no space for the performer's individuality. (These are extreme statements, and the views of Busoni and Schoenberg are more subtle.) In fact, the greatest interpreters produce the illusion of spontaneous creation.9 Artists of the stature of Lipatti, Brendel, Furtwängler, or Kocsis make us hear the work anew, as it never has been before. This is a genuine phenomenon, not an artistic illusion. As interpreters come to know a work intimately, internalizing it and making it their own—just as actors become the part—a certain freedom develops. In contrast to the macro-freedom of improvisers, interpreters have a micro-freedom to reconceive the work at the moment of performance, involving subtle parameters such as tone and dynamics. The process of interpretation is misunderstood by an aesthetics of perfection also. A well-rehearsed performance of a familiar work will, after all, involve something that the performer has already played, and this could become stultifying. So, the interpreter must strive for that improvisational freshness that gives the illusion that they are not playing "what they already know"—that is, a pre-existing work. Improvisation makes the performer alive in the moment, bringing them to a state of alertness, enhanced in a group situation of interactive empathy (Konitz in Hamilton 2007b). But all players have choices inviting spontaneity in performance. These choices arise from the room in which they are playing: its humidity and temperature, who they are playing with, and so on. Interactive empathy is present in classical music too, at a high level in the traditional string quartet. Again, both perfectionists and imperfectionists fail to recognize that improvisation and composition are interpenetrating opposites—features apparently definitive of one are found in the other also. It should be stressed that improvisation is not just—perhaps not mainly—an individual achievement of one musician, but the product of collective teamwork of several musicians. Communication between musicians and the audience is also a vital part of the process of improvisation. Although improvisers and composers are no longer in two mutually uncomprehending camps, pervasive misunderstandings of improvisation remain, which this chapter has tried to correct. Despite the qualifications of it presented here, I believe that the aesthetics of imperfection is right to focus on music as event—subverting the received account whereby works are merely exemplified in performance. This conclusion provides further support for a humanistic philosophy of music.

Jazz as Classical Music

We now consider the relation between the aesthetics of imperfection and the status of improvised music as art music. In particular, in what sense is jazz an art music? The jazz historian Scott DeVeaux writes,

The rapid acceptance of bebop as the basic style by an entire generation of musicians helped pull jazz away from its previous reliance on contemporary popular song, dance music, and entertainment and toward a new sense of the music as an autonomous art.10

Jazz became an autonomous art, one with a fairly capital "A"—a practice involving skill, with an aesthetic end that richly rewards serious attention. Like Ming vases and Ancient Greek sculptures, its products are now accepted as (high) art even though its creators possessed no such concept. However, many have reservations about describing jazz as an art music; even more so, about describing it as a classical music. Its products have many of the features of art music, despite evidently being less contrived than the great works of the Western canon. Historically, jazz has drawn for its material on ephemeral pop music, whose charms arise from its powers of association for individual listeners—what has been described dismissively as the "potency of cheap music." When those materials are used as they are in jazz, an art of great power can be created. The present situation is more complex, but jazz still provides a case study of the dialectic between popular and art music. This dialectic gives rise to central aesthetic questions, much-discussed in musicology and sociology of music, but whose deeper roots philosophical aesthetics tends to neglect. My suggestion is that jazz shares some of the features of Western art music—that apparently unique, autonomous art music that contrasts with nonautonomous art musics such as gagaku, courtly gamelan, and Indian art musics. The claim that jazz is a classical music commonly means:

1. Jazz is a serious art form whose long association with the entertainment industry is no longer essential—in Adorno's language, it is an autonomous art.
2. It has arrived at an era of common practice, which is codified and taught in the academy.
3. It has a near-universality and constitutes an international language, transcending national and ethnic boundaries.

It might be questioned whether any art music—whether Western art music or jazz—has feature 3. Western art music is not widely appreciated in India, for example. We are speaking of near- or relative universality, therefore. Features 1–3 apply only partially to non-Western art musics such as Korean or Japanese classical music, or courtly gamelan. I suggested that the latter art musics are nonautonomous, but one could argue that all of these musics developed from a folk or popular music to an autonomous art music. However, during the twentieth century, jazz acquired the universal status that was previously the claim solely of the Western classical tradition. Feature 3 is neither necessary nor sufficient for a genre to be a classical music. It is not sufficient, because rock and roll, for instance, has a universality, is an international language, but does not—with limited exceptions, perhaps including vocational courses—constitute an art music taught in the academy, and is not as separate from the entertainment industry as jazz is. Nor is it necessary, because Indian art musics do not constitute a universal language. Ascribing a "universal status" to Western art music will cause objections from many quarters; it might be argued, for instance, that jazz involves a break with conceptions of "Western" and "non-Western." These difficult and controversial issues clearly require a longer treatment than is possible here. Jazz's academic status is shown by music programs like that at Berklee, which encourage the idea of jazz improvisation as a craft that can be taught academically. What David Liebman calls the "apprenticeship system"—young players going on the road with Art Blakey, Miles Davis, and other leaders—has been replaced by an academic training.11 Another factor in jazz's classical status is canon-creation—the ready availability on digital media of the complete recorded history of jazz. Critics have an essential role in creating and sustaining a canon. As Krin Gabbard writes:

The jazz history we have now really wouldn't exist without the critics . . . would we have Ornette Coleman without Martin Williams? There were certain artists who fit the aesthetic and the predetermined historical notions of critics so perfectly that they were written into the jazz canon. (2000)

Defining Popular and Classical Music

We need to explore in more depth what "classical music" means. It now exists as one half of a polarity, interdefined with popular music—each concept depends on the other. (This claim needs to be reconciled with the fact that they did not quite originate together.) "Classical music" means, in order of decreasing specificity:

1. music conforming to a style-period within Western art music, namely, the first Viennese School of Haydn, Mozart, and Beethoven—music with ideals of balance and proportion, in contrast to Baroque garishness and disproportion.
2. Western art music in general—a sense that appeared together with the developing contrast with popular music. This is the definition understood by the ordinary listener, for whom "classical music" denotes a range of music from Baroque or earlier to the contemporary avant-garde.
3. music that possesses a standard of excellence and formal discipline, belonging to the canon—the accumulation of art, literature, and humane reflection that has stood the test of time and place, and established a continuing tradition of reference and allusion.

It was only from the early twentieth century that classical and popular music began to be defined as a contrasting pair. Popular music is music directed at the tastes of the mass of the population. "Popular" is normally defined in terms of scale of activity—for example, sales of sheet music or recordings. The growing divide between art music and popular music during the nineteenth century was deepened by Wagnerian opera and became a rupture with the advent of modernism; for many commentators, modernist art actively sets itself against popular culture (Sadie and Tyrrell 2004). The most influential account of the sociology and aesthetics of the classical/popular divide is Adorno's. He held that, from the nineteenth century onward, all varieties of music, from folk to avant-garde classical music, have been subject to mass mediation through the "culture industry," a term that implies mechanical reproduction for the masses, rather than production by them. For Adorno, the divide is not so much between serious and popular music as such—a division that has become, in his view, increasingly meaningless due to the almost inescapable commodity character of cultural products in the twentieth century—but rather between music that accepts its character as commodity, and self-reflective music that critically opposes this fate, and thus alienates itself from society (Paddison 1982). One objection to applying the term "classical music" to Western art music is the apparent implication that it is the unique classical music—which clearly it is not. However, I will argue that even its unique "abnormality" is now qualified by the appearance of a comparably "abnormal" classical music, jazz.

The Critique of "Jazz as Classical Music"

Does jazz exhibit classical tendencies? Are such tendencies desirable? Factual and normative dimensions of jazz's classical status interpenetrate but should be distinguished. Some see jazz still poised between art and entertainment, close to popular music in the ordinary sense of the term, and contrasting with Western art music. The jazz trumpeter Brad Goode, for instance, writes, "most jazz musicians, post be-bop, consider themselves to be 'artists' and consequently only consider the integrity of the music during their performances," an attitude he finds inconsistent with making a living. My view is that jazz can be a classical music, and that exploiting the divide between the classical and popular (in the mass sense) is one of its distinctive strengths as an art of improvisation. Setting aside the views of those who deny that jazz could be classical because it is of little artistic value, there are three main reasons for rejecting the classicizing tendency—that it makes jazz elitist, or safe, or static. The final objection is the most powerful, but is also misguided. During the eighteenth and nineteenth centuries, Western art music entered an era of common practice based on functional harmony and the tonal system of major and minor keys. Some argue that this era came to an end with the "emancipation of the dissonance" by Schoenberg and his contemporaries; others hold that—concerning music in everyday life—it is still with us. There has been a corresponding period in jazz. Like classical music, jazz also seemingly reached the limits of avant-gardism, though more rapidly. Conrad Cork (1996) argues that while the evolution of jazz practice was rapid for about five decades, it became much reduced after the 1970s, either "because the music has atrophied [or] because it has arrived at a period of common practice, where it can function on its own terms" (73). Just as classical tonality returned to fashion in the 1970s and 1980s, however, jazz has seen a conservative reaction. Others are more critical of the era of common practice, arguing that classical musics and languages are no longer created actively but are conserved in conservatories; interpreters study the seminal texts in order to restore them to life. Thus, Emmett Price writes, "Classical implies static, non-changing; a relic frozen in time. Jazz has never been static, non-changing or frozen," while Alex Ross refers to the "pernicious" implication that jazz "has become 'classical' in the pejorative sense: complete, finished, historical."12 This negative picture is unduly critical, I believe. Classical music is not the curatorial exercise that these writers assume, and which the authenticity movement in early music may appear to imply. Classical musics do not have to be "static, non-changing, frozen." As Parakilas argues, rather than resuscitating corpses, the classical repertory keeps "certain old works . . . ever-popular, ever-present, ever-new. It is an idea founded on reverence for the past, but not necessarily on a modern scholarly conception of history. . . . [It may not take] notice of historical differences between one work and another within it," as proponents of early music do ([1984] 2004, 39). Whether classical musics are "static, non-changing, frozen" depends on the extent to which a repertory admits new material. Parakilas comments that such a repertory need not be kept up-to-date with works from the period just past.

The repertory of Gregorian chant, for instance, was considered closed by the time of the Renaissance, and performers did not sing the older chants within that repertory differently from the younger chants, though the repertory as a whole was performed differently from place to place and from one period to the next. (39)

Performance choices were made following a scholarly conception of history, to resuscitate the "corpse" of Gregorian chant. In contrast, a living classical repertory is one that is kept up to date. This, I think, is true of both Western art music and jazz. A reverence for the exact notes transmitted by history, Parakilas argues, is characteristic of classic repertories. His comment that, since Charlie Parker has become "classic jazz," musicians give classical performances that reproduce exactly the "text" of a recorded performance probably refers to arrangements of Parker solos by Supersax; George Russell's arrangement of Miles Davis's solo on "So What" is another example. These cases are not central, however; they are an "early music" as opposed to classical tendency in jazz. As an improviser's and not an interpreter's art, jazz imposes strict limits on the former, early music possibility—but less so on the latter, classical tendency, as we now see.

Art and Entertainment: Jazz as an Art Music of Improvisation

I have argued that the description "classical" is benign, and that the process of classicization has been a largely beneficial one. Jazz and other improvised musics do not need to be legitimated in a practical as opposed to philosophical sense. What is in question is not whether the music has artistic value, but how that value arises. One view is that—in contrast to Western art music—jazz's artistic value arises in part at least from its status as improvised music. This is the assumption of Gioia, who, as we saw, defends the "imperfect art" of improvisation. On this view, spontaneity implies authenticity, and it makes sense to talk of preparation for the spontaneous effort—Konitz's "way of preparation—to not be prepared." Konitz has "complete faith" in the spontaneous process (Hamilton 2007b). A purist version of the aesthetics of imperfection asserts essential differences between jazz and Western art music. But there are also growing similarities arising from the developed artistry of jazz, which means that it can be described as an "imperfectionist art music." In jazz, an aesthetics of imperfection, expressed through improvisation, allows popular materials to achieve art music status. In its early decades, jazz was an offshoot of the entertainment industry and used its materials. Jazz players later developed loftier aspirations. As we have seen, some writers distinguish a classical art, which involves restoration, from a living art, which involves novelty and innovation; on their view, creativity in interpretation of a classic is the limited kind that re-enacts or reanimates. This is a misguided account of many classical performing arts, I believe. Interpretation is neither "mechanical reproduction," as proponents of the aesthetics of imperfection sometimes view it, nor restoration as in the case of painting or architecture. Of course, there are different approaches, as there are in the restoration of paintings; but no pristine authentic performance is possible—the performing arts are inexhaustibly interpretable. As Parakilas notes, it is the project of the early music tendency, but not that of classical performers, to reproduce historical Beethoven performances—and even for early music practitioners, interpretation is inescapable, and usually recognized as such. It would be wrong to separate sharply "classical arts" and "living arts," therefore. Against Parakilas's assumption that classical and new music are separate practices, they may form a continuum, thus further undermining the rigid demarcation between classical and living arts. In performance, the era of common practice endures, both for Western art music and jazz. These musics aspire to exist in a "common present," as a living art; classical exemplars offer inspiration rather than rigid templates. The dialectic between aesthetic perfectionism and imperfection recurs, therefore (see Hamilton 2007a). Improvisation in jazz is perfectionist in its affinities with Western art music, while interpretation in Western art music is imperfectionist in its affinities with improvisation. But improvisation imposes limits on classical perfectionism in jazz. Recordings such as A Love Supreme or Mingus Ah Um are rightly described as "classics" since, as recordings, they are fixed in their perfection and work without qualification to classicize jazz. Concert recreations of A Love Supreme reconstruct but cannot replicate the recording. Jazz's nature as an improviser's rather than an interpreter's art informs its classical status, because improvisation is an expression of performers' creativity. In improvisation, the performer rather than the composer is the primary creator. In interpreted music, the composer is the primary creator, and the performer is secondary, though still creative.
This fact sets limits to the "classicization" of improvised music, depending on whether the performer is primarily concerned with exploring the song's essence, or prioritizes their own artistic self-expression. In jazz, the superiority of spontaneous creation over prepared solos began to be stressed at the same time—during the transition from swing to bebop, as jazz was becoming an art music and therefore "classicized." That is, improvisation became valued in jazz as the music was gaining an identity beyond the realm of entertainment and commercial commodification. This fact lends support to the suggestion that jazz is an art music of improvisation. And in showing that improvised performances have artistic depth, to reiterate the argument of an earlier section, I have shown that they involve imagination as opposed to mere fantasy or fancy.

Acknowledgments

Thanks to Gabriele Tommasi, Joanna Demers, Philip Clark, Conrad Cork, Lee Konitz, Max Paddison, Lara Pearson, Lewis Porter, Brian Marley, David Udolf, and Jeff Williams for comments and discussion.

Notes

1. Conceptual holism is a leitmotif of Hamilton (2013a, see for instance chap. 1).
2. See Guyer, in Baldwin (2003, 728).
3. The contrast between art with a small "a" and with a capital "A," and the nature of art before the modern system, is addressed in Hamilton (2007a).
4. The "debate" consisted of Schoenberg writing marginal comments in his copy of Busoni's book; subsequent quotations are from Busoni (1962, 84) and Stuckenschmidt (1977, 226–227).
5. The former is the view of Robin Maconie (1990, 150–151).
6. Cavell, "Music Discomposed" (1976, 200); Hobsbawm quote from The Jazz Scene, first published 1959 under the pseudonym of Francis Newton, quoted in Gottlieb (1997, 813).
7. Boulez (1986, 461); interview with the author, Usher Hall, Edinburgh International Festival, August 2000.
8. Quoted in Hamilton (2007b); Konitz's ideas on improvisation are discussed in chapter 6.
9. Stressed by Gunther Schuller in "The Future of Form in Jazz" (1986, 24–25).
10. http://www.oxfordmusiconline.com/grovemusic/view/10.1093/gmo/9781561592630.001.0001/omo-9781561592630-e-1002248431. Accessed December 17, 2018.
11. Interview in Jazz Review, April/May 2008.
12. Emmett Price, http://www.allaboutjazz.com/php/article.php?id=807. Accessed April 15, 2017; Alex Ross, "Classical View; Talking Some Good, Hard Truths About Music," November 12, 1995, http://query.nytimes.com/gst/fullpage.html?res=9A00E2D61439F931A25752C1A963958260&sec=&pagewanted=2. Accessed April 15, 2017.

References

Bailey, D. 1993. Improvisation: Its Nature and Practice in Music. Cambridge, MA: Da Capo.
Baldwin, T., ed. 2003. The Cambridge History of Philosophy 1870–1945. Cambridge: Cambridge University Press.
Bazzana, K. 1997. Glenn Gould: The Performer in the Work. New York: Oxford University Press.
Boulez, P. 1986. Orientations. London: Faber.
Busoni, F. 1962. Sketch of a New Aesthetic of Music. In Three Classics in the Aesthetic of Music. New York: Dover.

Carter, E. 1997. Collected Essays and Lectures, 1937–95. Edited by J. Bernard. Rochester, NY: University of Rochester Press.
Cavell, S. 1976. Music Discomposed. In Must We Mean What We Say?, 180–212. Cambridge: Cambridge University Press.
Collingwood, R. 1958. The Principles of Art. Oxford: Oxford University Press.
Cork, C. 1996. Harmony with Lego Bricks. Rev. ed. Leicester, UK: Tadley Ewing Publications.
Davies, S. 2001. Musical Works and Performances. Oxford: Clarendon Press.
Gabbard, K. 2000. Race and Reappropriation: Spike Lee Meets Aaron Copland. American Music 18 (4): 370–390.
Gioia, T. 1988. The Imperfect Art. Oxford: Oxford University Press.
Goehr, L. 1992. The Imaginary Museum of Musical Works. Oxford: Clarendon.
Gottlieb, R. 1997. Reading Jazz. London: Bloomsbury.
Guth, C. 2010. Art of Edo Japan: The Artist and the City 1615–1868. New Haven, CT: Yale University Press.
Guyer, P. 2003. Aesthetics between the Wars: Art and Liberation. In The Cambridge History of Philosophy 1870–1945, edited by T. Baldwin, 721–738. Cambridge: Cambridge University Press.
Hamilton, A. 2003. The Art of Recording and the Aesthetics of Perfection. British Journal of Aesthetics 43 (4): 345–362.
Hamilton, A. 2007a. Aesthetics and Music. London: Continuum.
Hamilton, A. 2007b. Lee Konitz: Conversations on the Art of the Improviser. Ann Arbor: University of Michigan Press.
Hamilton, A. 2009. Scruton's Philosophy of Culture: Elitism, Populism, and Classic Art. British Journal of Aesthetics 49: 389–404.
Hamilton, A. 2013a. The Self in Question: Memory, The Body and Self-Consciousness. London: Palgrave Macmillan.
Hamilton, A. 2013b. Artistic Truth. In Philosophy and the Arts, edited by A. O'Hear. Cambridge: Cambridge University Press.
Hamilton, A. Forthcoming. Art and Entertainment. London: Routledge.
Kristeller, O. 1951. The Modern System of the Arts. Journal of the History of Ideas 12 (4): 496–527.
Maconie, R. 1990. The Concept of Music. Oxford: Clarendon Press.
Paddison, M. 1982. The Critique Criticised: Adorno and Popular Music. Popular Music 2: 201–218.
Parakilas, J. (1984) 2004. Classical Music as Popular Music. In Popular Music: Critical Concepts in Media and Cultural Studies, Vol. 2, edited by S. Frith, 36–54. London: Routledge.
Sadie, S., and J. Tyrrell. 2004. Modernism. In New Grove Dictionary of Music and Musicians, edited by S. Sadie and J. Tyrrell. New York: Oxford University Press.
Sartre, J.-P. 2004. The Imaginary: A Phenomenological Psychology of the Imagination. London: Routledge.
Schuller, G. 1986. The Future of Form in Jazz. In Musings, 18–25. New York: Oxford University Press.
Scruton, R. 1997. The Aesthetics of Music. Oxford: Clarendon Press.
Scruton, R. 2015. Art and Imagination: A Study in the Philosophy of Mind. London: St. Augustine's Press.
Stuckenschmidt, H. 1977. Arnold Schoenberg: His Life, World and Work. London: John Calder.
Subotnik, R. R. 1991. Developing Variations: Style and Ideology in Western Music. Minneapolis: University of Minnesota Press.

Part V

POSTHUMANISM

Chapter 27

Sonic Materialism: Hearing the Arche-Sonic

Salomé Voegelin

Introduction

This chapter tries to make a contribution to current ideas on materiality, reality, objectivity, and subjectivity as they are articulated in the many texts on New Materialism that have emerged recently under the auspices of speculative realism, object-oriented ontology, complexity theory, and various other current and emerging "subgenres." These approaches all share a renewed interest in the status and understanding of materiality, material relationships, and the role of the human subject in the context of a contemporary world, whose technological and actual globalization demands a new critical engagement and scholarship to grasp the impact and to articulate the significance of its fluid interconnectedness. The origin of the term "New Materialism" is invariably located in the mid- to late 1990s, when it is associated chiefly with the writings of Manuel DeLanda and Rosi Braidotti, though the newness of its project, or its status as a continuation of traditional materialism, remains debated and debatable. Nevertheless, in current discourse the term acts as a shared name for different approaches toward the question of materiality and subjectivity in a digital age. It covers an interest in the relationship between nature and culture, "naturecultures,"1 and brings with it a critique of an anthropocentric worldview. It is articulated variously in relation to climate change and its amplification of ecological consequentiality; it engages in the organization and significance of the global flow of capital and goods, and gives words to the consideration of a concurrent fluidity or fixity of persons; it presents new strategies to engage in issues of identity, sexuality, race, and feminism; and it provides a framework and tools to debate and bring into association all those issues and dynamics to grasp the world and its material reality not as a stable and singular construction but as a matter of agency, interdependence, and reciprocity that impact on its social and political actuality. This chapter is placed in the context of these theorizations that deal with the relationship between nature and culture, materiality and subjectivity, and seeks to participate in the current discourse about matter from a sonic point of view. This sonicomaterialist perspective is motivated by the idea that the invisible mobility of sound is always already critical of the dualisms of a visuohumanist tradition, in that it is always and by necessity focused on the in-between of things: their relationship and interbeing. Sound is not "this" or "that" but is the between of them, and thus it brings with it a conception of the world as a relational field. To probe this interpretation and try its suggestions, this text focuses on the writing of Quentin Meillassoux, whose book After Finitude (2009) can be understood as a central if somewhat eccentric articulation of New Materialist considerations. Meillassoux critiques an anthropocentric view of the world, attributed by him to the correlationism of phenomenology and metaphysics in general. In its place, he promotes the mind-independence of mathematics to measure and calculate a world before and after human experience. Thus, he sets out the possibility of a human-free conception of the world that eschews what he perceives as, on the one hand, the "fideism" of phenomenology and, on the other, the absolutizing idealism of transcendental philosophy, which, in any event, he understands to ultimately produce the same dogmatic conceptions. In what follows here I engage in his charge of an anthropocentric worldview by considering the proposition of a posthumanist theorizing through a focus on sound, creating an invisible imaginary of the material world. The contention is that the sonic sensibility, articulated in sound practice and discourse, precedes and enables the concerns of New Materialism. Sound's ephemeral materiality and invisible relationality inform the concepts, and grant perceptual access to the ideas discussed currently in relation to materiality and subjectivity. In this sense, sound and listening establish a proto–New Materialist sensibility that is present as a minor strand and challenge within materialist philosophies already, but which only now, in the context of a renewed attention on agency and interdependence, is able to question its humanist rationality and dialectical stance. Accordingly, we could consider whether, without the emergence of sonic practice, discourse and sensibility in art and the humanities, in everyday thought, and in science, New Materialists would find it harder to conceive of and be understood in their articulation of "fragile things," "speculative turns," and "dark ecologies," which are some of the terms and concepts used to theorize a new materialist world. The connection might not be entirely conscious; most theorists writing on materiality today might never have thought to listen, but it might nevertheless be an important if somewhat subliminal influence: a hidden Zeitgeist, something in the air that has shifted focus away from the apparent certainty of what we see onto more ephemeral and darker structures that might well sound, or for whose fragility the material of sound might serve as metaphor. Thus, I would like to contend that New Materialism presents a quasisonic consciousness of the invisible, the relational, the dynamic eventness of things, their predicativeness, and duration. The aim, however, is not to prove the superiority of sound as a concept and theoretical device. Nor is it my intention to produce an essentialized position. Rather, as with much of my work, the objective is to revisit the nominal and habitual reality of things, so often set within the boundaries and certainties of a visual language and anchored in the visual witnessing of the object itself, in order to articulate another possibility of what there is. A sonic sensibility invites a different view. It generates a world of fleeting things and coincidences that demonstrate that nothing can be anchored, and everything remains fluid and uncertain, not necessarily as precarity, as a state of anxious fragility, but as a serendipitous collaboration between the multiplicities of the "what is." Sound, I will argue, aids the reimagination of material relations and processes. It makes appreciable other possibilities of how things might be and how things might relate, and serves to consider positions and positionings of materials, subjects, and objects in a different and more mobile light. I will argue, however, that the fluidity proposed and the relations intimated are not, as Meillassoux might fear, the fanatical and egocentric imaginings of a correlationist in search of a de-absolutized world. But neither do I seek shelter in his "mathematical world" that has expunged humanity from any involvement in the what is. Rather, I believe that listening as an attitude to the world practices the ambivalence between measure and experience. And it is in practicing rather than resolving this ambivalence that we can reach what, at this moment, appears incommensurable, merely possible and even impossible, to diversify the rationale of logic and reason itself rather than disappear in a plurality of factions. A sonic materialism thus presents not an absence of reason in immersive noncriticality and fanatical egotism. Instead, it foregrounds personal responsibility and participation: not to deny our being in the world and the world being for us what it is through our being in it, but to embrace the human ability to think this position as relative rather than central; to appreciate our responsibility in how the world is: politically, ecologically, and socially; and to initiate change and a different attitude, rather than withdraw into an infrastructure of numbers and codes, which, as I will argue, are always and unavoidably the design of a human-thought world. An auditory imagination does not produce Meillassoux's "fideist obscurantism" of a proper truth, his conception that phenomenology and metaphysics depend on belief and piety instead of truth, thus denying factuality and reason their singular condition of possibility in favor of unlimited irrationality and fanaticism;2 and neither does it engage in the "communal solipsism" that he attributes to them. Instead, a sonic conception and sensibility of the world is the point of access to pure possibility as actuality. This chapter will elaborate on these ideas through the practice of listening to three sound art works: my audition of Toshiya Tsunoda's Scenery of Decalcomania (2004), an album of seven tracks, allows me to enter the world by its vibrations and to hear its space as events and interactions; listening to the sound transmitted through the porous body in the performance Ventriloqua by Aura Satz (2003), I am initiated into a place of other voices; and my absorption in the pulsating drip of Anna Raimondo's rhythmic words in Mediterraneo (2015) makes me hear the relationship of language, materiality, and belonging through their fluid boundaries.


The Ancestrality of a Sonic World

In his widely discussed and oft-quoted work After Finitude, which could be described as an infectiously peculiar cornerstone of New Materialism, Quentin Meillassoux sets out an argument for ancestrality, the measure and articulation of a world anterior to humanity, in order to achieve the principles of a human-free conception of the world. The materials and events of such an anteriority he calls "arche-fossils," and he wants them to be understood not simply as present traces of the past, but as indicative of a logic and reason able to grasp the anterior without a present human experience. The aim throughout the book is to generate the condition of this nonhuman ancestrality, to be able to reach beyond ourselves into a space devoid of ourselves that might ultimately not only shed light on what was but also establish an understanding of the "what is" without the specter of human perception. His anteriority encapsulates an ulteriority too, and together they generate a conceptual space beyond finitude, whose content, material, and organization is experientially inaccessible. This inaccessibility gives cause and justification to his critique of correlationism and leads him to propose the mathematizing of nature: to establish the stability of its laws as "a mind-independent fact . . . that is indifferent to our existence" (Meillassoux 2009, 127) and thus capable of making accessible a world without us through speculation that excludes metaphysics and thus excludes the human point of view and finitude. His argumentation for an after finitude begins with a critique of the strong correlationism of phenomenology and other metaphysical philosophies, which he understands to occur as a counter to the absolutism of transcendental idealism and to result in equally dogmatic fanaticisms. While he appears to agree with the need to critique transcendental universalisms, and the dogma of the absolute, he is looking for another solution based on facticity and the contingency of facticity: on the fact that the world "is there," rather than on my own contingency in a world that "is there for me."3 In relation to this, strong correlationism presents itself as the dogma of a contingent perception that does not appreciate that things might be otherwise than they appear to me. In other words, it appears to leave no room for speculation: for a speculative materialism that can gain access to the anterior, the nonhuman world without making it "wholly other." Meillassoux seeks to overcome this problem with metaphysics by promoting decorrelation through data and numbers. Leaving aside for now the question of science's truth and objectivity, its supposed nonanthropocentrism, and whether it does indeed represent the mind-independent facts on which his thesis hinges, the problem of correlation and the device of ancestrality are intriguing and useful to developing sound's contribution to New Materialism. Meillassoux's After Finitude lends inspiration to the aim of a non-human-centered conception of a sonic world. His ancestrality offers a conceptual space to the methodology of a sonic discourse, allowing us to reflect on the nature of sound behind and in front of our lives, and thus enabling us to contribute from the invisible mobility of its materiality to the conception of a posthumanist world.4

At the same time, as I borrow Meillassoux's notion of ancestrality as a critical device for the development of a sonic materialism, sound produces a critique of his mathematical speculation: I cannot measure a past sound, only past sources of a sound. The arche-fossil, "the material support on the basis of which the experiments that yield estimates of ancestral phenomena proceed" (Meillassoux 2009, 11), is at least a quasivisual object, which gives grounding to a visual sense of objectivity, measure and truth. And thus, the ancestrality traced in its shape conveys a visual world.5 A paleontologist might be able to deduce the sound of a diplodocus by the measure of its jaw, its chest cavity, and the capacity of its lungs, but the material of the sound is not only the measure of its source. It is not only that of the facticity of its making but includes the factuality of distance, the environment and climate of its path to reception, which might well be plural, and must include other measures of anatomy, climate conditions, and geographical positioning, to say nothing of the cause of the cry, be it hunger or fear, or the anticipation of its reception. Sounds are relational: they have a spatiotemporal thickness beyond the visual material from which they seem to emerge and to which they are expediently but rather incorrectly attributed. Sounds do not rest on a supporting material; there is no sonic arche-fossil. Instead they invisibly and inexhaustibly generate the texture of the world, and it is a challenge to calculate this texture backward into preterrestrial life. As much as one dinosaur would have heard not just another dinosaur but also his desire and fear, his location and circumstance, so I too do not just hear a car, but hear the sound of a car from my particular position inside the flat sitting at the window through which the sound of the car as material thickness travels together with that of the rain and the wind and all of them meet the sound of my fingers typing and the ticking of the clock to produce a complex materiality whose measure is not strictly additive and thus is not accessible through mathematical speculation only. This sonic material is all given at once but does not grant immediacy: there is no back that I cannot see but can calculate the dimension of; there is only simultaneity, and yet there is no sense of a form, but only mobile and invisible formlessness. This unseen formlessness demands participation. As I hear it I hear myself, not as two distinct entities but as a timespace material generated through our simultaneity, and that is what I hear and how I am heard at the same time. The sound as material is an event, an expansion in time and space, that generates an environment, which I inhabit not at the center of it but centered by it. The material object of the car sounds as thing amidst other things of which I am a thing too. Together we are producing vibrations that are not their own measure but a more ephemeral thickness within which we can perhaps gauge the ancestrality of the sonic world.

Arche-Sonic Vibrations

Toshiya Tsunoda's Scenery of Decalcomania from 2004 is an album of seven tracks that use vibration plates, oscillators, contact microphones, and gates to trigger and record the vibration of things. He uses natural occurrences and creates events that cause vibrations to travel through a certain space—through bottles and cylinders, pipes and copper foil—and observes how these vibrations affect this space, "or to put it another way, a space is made to appear through vibration" (Tsunoda 2004). Listened to on headphones, the sounds seem trapped and hardened. Unable to make connections to the outside world, they congeal to small and intense abstractions. Compact and taut sound bits bouncing around between my ears, they have nowhere to go and nothing to connect to and thus fail to generate the timespace of their materiality. They are fossilized, calcified into the shape of their own essence, the measure and description of what they represent: "I used three glass bottles, three vibration plates and sine wave as material rather than what they produce as vibrations" (Tsunoda 2004). This observation at once questions the ideas of a fossil-knowledge, not for its accuracy but for what it can convey, and introduces vibrations as an alternative vestige of ancestrality. On a speaker system, the sounds of Tsunoda's vibration recordings attain exteriority and start to affect the shape and texture of the space that I am listening in. The quieter and louder crackles, the low frequency buzzes and drones, as well as the high-pitched oscillations appear not to come from the loudspeakers but from the walls, the floor, and the furniture and even from myself; and they do not disappear into their materiality but mobilize their appearance to give them their extensionality. Tsunoda's sound is diffuse: it has seemingly no direction or provenance and does not privilege one connection over another. Its vibrating materiality holds and generates subjects and objects in simultaneity and without partiality. The seven tracks exist and move in a timespace that they generate but which includes other things, and me, "thinging" with its hum. I cannot separate them out from the whirr of the fish tank in the corner or the murmur of the fridge in the kitchen, which become other vibrational bodies of his compositions: "Is this a scenery or is this an environment?" Tsunoda asks in his liner notes and promptly answers himself: "'Scenery' appears as our view. 'Environment' appears as a view including us." The timespace textures of Tsunoda's vibration recordings join the sounds in this room to create an inclusive environment and also open the apparently unsound to their own vibration: the tiled fireplace in front of me becomes a cavity and sounds as mobile materiality, which it was all along, but its appearance needed to be triggered in an interaction that shows the world as a field of invisible contacts of dark events, rather than as a surface of measurable entities, of outlines, fossils, bones, and shapes. I can think the individual sounds via their source, but this would mean to reduce the predicate to the noun and to retain it within its visual boundaries and possibilities. In contrast, as invisible vibrations, they are not reducible or quantifiable and instead produce the environment in which we find each other in vibrating as an arche-sound that is not fossilized but is the invisible mobility of all things, whose infinity is not measured as a before and after my existence but is the inexhaustibility of the present. This vibrational contingency is not limited to the comprehension of Tsunoda's work as a world for me, creating a solipsism that appreciates no other possibilities and that shifts toward the "fideist obscurantism" of my own perceptual dogma.
By contrast, through the specificity of my encounter I appreciate the fluid and unstable reality of my contingency as one of many, none of which are "wholly other," and all of which are simultaneous, each as real and each as possible as the other. As I walk around the room, I appreciate the plurality of the work: at each point, another vibration comes to the fore while all others remain in play. Thus, I come to physically comprehend the simultaneous plurality of the real, and rather than reduce its vibration to a set of numbers in order to discount my physicality on the way to a plural but factional scenery, I hear a heterogeneous environment. Tsunoda's work reminds me of my existence in an ancestral texture of sound, whose appearance, however, is not fossilized but moves on inexhaustibly. The "after finitude" of a sonic sensibility does not present a certain finished form; it does not present "the material support" for the investigation of ancestral phenomena—the geological formation, the fossil imprint, the density of coal, and the rings of a tree—and it does not rely on the possibility of a pure mathematics of nature "to demonstrate the integrity of an objective reality that exists independently of us—a domain of primary (mathematically measurable) qualities purged of any merely sensory, subject-dependent secondary qualities" (Hallward 2011, 140) such as smell, sound, and touch. But while, as Peter Hallward continues, the thing measured is indifferent to its being measured or what it is measured as, the idea of measuring is absolutely subject-dependent. The arche-fossil presents a reduction and deformation of the thing into its measure that is akin to the reduction of Tsunoda's sounds in the closed-offness of the headphone or the absorption and deadening of sound in the acoustic isolation of the anechoic chamber. Without the reverberation of sound within its environment, as concept and actuality of material connection and exteriority, the vibrational thing does not expand into its formless capacity but deforms into the condition of its measurement. And while this conjecture might shed light on a world without human experience, since it is still calculated from a human point of view, through the subject-dependent idea of measuring, it does not enable access to a nonhuman world. The assumption would be that a world without humans is a world without experience and possibilities unless they are strictly speculative; the contingency of facticity rather than of the material itself. By contrast, the ancestrality of sonic vibration is the phenomenon of its material, which is infinite; it sounds now as an arche-sonic that brings me to the consciousness of a before and after through my equal participation in its present texture. In the texture of the world as a vibration-environment, possibilities do not negate each other, causing plurality as dissent and factionality, which inevitably leads back to strong and contested territories and identities. Instead, they trigger nonselective connections and serendipitous collaborations between invisible things whose textures show me my responsibility and instill the humility of my own reflection. Vibrations are the ground on which communication and communality are sought rather than found. On this point, my motivation for a sonic materialism answers William E. Connolly's invitation to "respond to the charge of anthropocentrism in order to fold more modesty into some traditional European modes of theism and humanism alike" (Connolly 2013, 400).

Vibration as an arche-sound allows a phenomenological ancestrality that avoids the charge of anthropocentrism and fideism through practice. It makes the world appear as an invisible field of connections within which my body oscillates as a thing amid other things. Vibration is the inexhaustible condition of this world that existed before me and will exist after me and binds me into its texture, not at its center, but in its weave to which I respond with the humility of my participation. This participation is triggered by the formlessness of sound and its appearance as a dis-illusion6 that neither invites a transcendental revelation, the recognition of an a priori, nor a mathematical speculation, the numerical constitution of its form, and instead promotes the "practice" of the object and the subject in doubt. This is a phenomenological doubt in the certainty of the given that prompts the suspension of habits of thought and instigates the perceptual practice of a passing sense of things. Listening as a phenomenological practice tunes into the formlessness of the world and revisits nominal and habitual realities in the darkness of their mobility. In doing so it does not aim to know what something really is, but engages in the possibilities of the material and thus it at once illuminates and questions the rationale and reason of material definition, use and value, and envisions what else they could be. This is not the practice of the acoustician, the sound engineer, the musician, or the psychotherapist, all of whom have professional expectations and goals that immediately signal a privileged position, determining what will be heard and how it will be evaluated. But neither is it the solipsistic and precritical practice of artistic indulgence in an "anything-goes." Instead, the material equality of the encounter in the inexhaustible flow of sonic textures serves as the guarantor for the humility of the perceptual engagement that does not transform doubt into (anthropocentric) certainty but generates a practice-based reality that includes the idea of measuring and experience without resolving their ambivalence. The contingency of my position in the arche-sonic texture of the world does not signal the perception of a world for me. Rather, the practice-based contingency of a vibrational-self connects me with the infinite flow of the world and gives rise and opportunity to a posthumanism that does not need to replace the subject through mathematical speculation, a speculation that, in any event, holds its own authorship and subjectivity that is thus not banished but whose authority becomes invisible and thus even more singular and nominal and potentially also less accountable. Instead, a different subjectivity emerges that appreciates its responsibility toward a nonhierarchical listening and practices the doubt in the appearance of things through the ambivalence between its measure and its experience. It is within the practicing of this ambivalence in listening to the vibration of the world as its ancestral texture, rather than resolving it through speculation, that I believe a promising and useful materialism might be found. In this context, Tsunoda's sonic vibrations offer an ancestrality that does not need to expunge human perception to consider the what is. Instead, the vibrations make the world a vibration-environment of simultaneous formlessness, which the listener practices

to understand the equivalence of things, what they are together as a nonhierarchical texture into which she is woven too. In this regard, vibrations are the texture that connects measurement and experience and makes them collaborate. And while they are not infinite, they are nevertheless inexhaustible, and they can make us hear different materials and voices that are not negotiated through a pre-existing referent but come to speak for themselves. Physiologically, but also politically and socially, we can tune into the hum of the ancestral flow to try to discern different voices and different materialities that so far were considered incommensurable.

Porous Bodies

The ancestrality of sonic vibration, its inexhaustible texture that sounds as an arche-sonic, has not only the capacity to make accessible an anterior or ulterior world, to effect in me the consciousness of a before and after terrestrial life. It also makes accessible an over there and another place: an “extra-terrestrial” life, alien forms, and unknown things. In other words, sonic vibrations, the arche-sonic, call into the realm of the possible also the impossible: that which, for physiological, ideological, aesthetic, sociopolitical, and economic reasons, we cannot or do not want to hear. The possibility of its sound is, however, central to a materialist critique of a human-centered rationality.

What Is the Vibrational Facticity of Impossible Bodies and Things?

Aura Satz’s work Ventriloqua, originally performed in 2003 with her own pregnant body and subsequently restaged with other performers, explores the ideas of elastic and porous boundaries through which other voices might be made audible. The ventriloqua of the title convenes “the other’s” voice as metaphor and concept of the absent. But it also assigns a real other voice to the unborn, as an unknown thing not yet of this world. The artist sits on a white chaise-longue, clothed in a red custom-made robe that abstracts her body and covers her face, and thus the potential of her voice and articulation, while exposing the pregnant stomach through a round opening embroidered with shiny sequins. Visually, her shape has lost its definition and becomes identified instead through the state of pregnancy, the other, the not-yet-present body rather than her own. There is an absurdity in this emphasis, a strange reversal of the expected referential order, and at the same time the abstraction stresses an archetype, a mother without a voice, without a face, whose body is in the service of that other not yet heard voice to give it a space in the world while her own is muted. The ambivalence of this voicing and unvoicing is brought center stage through Satz’s role in the performance, and becomes a central part also of my reading of this work in

568   salomé voegelin relation to sonic materialism and the idea of an egalitarian sonic-texture. Satz, and each of her subsequent pregnant stand-ins, are not the performers of the work, they are its conduit; they are a social conduit and vessel for another voice rather than the contingent formlessness of their own particular articulation. The pregnant form reclines on the chaise longue, with one hand she holds on to the antenna of a Theremin placed on a tripod next to her. Holding on to it her body becomes the extension of the instrument as another antenna. In this way, she opens up her own sonic range to the Theremin that, in turn, is calibrated on her body. This calibration is unstable and needs constant retuning as the human body presents an inefficient conduit in the sense that it is not finely tunable but brings its own disturbances to the performance. This inefficiency demonstrates the fluctuating and mobile capacity of physicality and indicates the illusion of pure mediumship: the channeling of another voice, a separate spirit, or of ancestral data, without the impact of the medium itself. It puts into doubt the sustainability of mind-independent facts, and stresses the ambivalent relationship between the voice and the unvoiced, between the present and the absent, which are not absolute but ideological and applied. The body as Theremin is controlled by the performer in close proximity but without physical contact. In this first enactment of Ventriloqua, the Thereminist Anna Piva plays the body by moving her hands just above the skin protruding through the sequined gap in the costume. Her hands move through pronounced physical gestures, producing the visual shapes that play invisible oscillations and amplitudes. The electric signals thus generated are sent to an amplifier and emit via loudspeakers sounds that emerge as modulating tones and surging vibrations issuing from the skin and into the auditorium. The atmosphere is séance-like: the room is darkened and a single light illuminates the protruding white globe of skin as it is made to sing. This turn into darkness carries an occult undertone that pervades the work’s performance. The voice of the unborn as alien spirit, is called into the room through an act of mediumship. Its inaudible voice is seemingly channeled through the Theremin-body and made to speak pre-birth. There is the potential that the making audible of the unheard, rather than pursuing a posthumanist equality of materiality and an inclusive politics of the voice, steps into the mystical and fanatical that Meillassoux ascribes to strong correlationism and that Theodor Adorno fears in relation to astrology and the occult. The parallel is intriguing. Both correlationism and the occult are responses to a philosophical rationality of absolutes that leaves no room for faith, for contingency and self, and yet, according to both Meillassoux and Adorno, each in turn ends in its own dogmatic obscurantism. Adorno, in his text The Stars down to Earth, writes against mysticism and the occult as the cornerstones and antecedents to fascism and totalitarian governance. 
Focusing on the pervasiveness of astrology through a study of daily columns of “Astrological Forecasts” by Carroll Righter in the Los Angeles Times, Adorno produces a Thesis Against the Occult in which he argues that monotheism is decomposing into a second mythology that separates the spirit from the body, the material experience of the world, and that critiques materialism while seeking to “weigh the astral body” (Adorno, 2004, 177). In other words, his thesis, developed over nine key observations, ridicules the occult

sonic materialism   569 as a “metaphysics for dunces” that draws its rationality from the irrationality of a fourth dimension, a nonbeing that claims to answer all the questions about the material world. His critique remains serious, however, since he fears: the power of occultism, as of Fascism, to which it is connected by thought patterns of the ilk of anti-Semitism, [which] is not only pathic. Rather, it lies in the fact that in the lesser panaceas, as in superimposed pictures, consciousness famished for truth imagines it is grasping a dimly present knowledge diligently denied to it by official progress in all its forms.  (Adorno 2004, 175)

Sonic materialism, although invested in bringing a different consciousness to the material world by foregrounding its invisible dimension, does not pursue a “mystical materialism.” It does not focus on a “fourth dimension” removed from the material and the body while purporting to answer the questions of its formation (and thus holding power over its identity and governance). Rather, the facticity of a sonic materialism describes the practical intertwining of the body and the material in doubt as the practical condition of suspended habits and nonhierarchical simultaneity. It does not demarcate a virtual materialism but reconnects the invisible mobility of sonic processes to their manifestation and consequence in the world. Correspondingly, Satz’s work does not perform a mediation of unheard voices as an embrace of the occult but as a re-vision of the habits and values of the real. The flow of sonic vibrations mediated from the pregnant body via the Theremin does not summon the unborn as astral body, but performs the relationship of its voice to that of its mother and highlights the social condition of her silence. The piece points to an unvoicing, to the moment where one voice usurps and stills another; a reduction and deformation of the mother away from her own form not into a serendipitous and collaborative formlessness but into the socially deformed figure of motherhood. This interpretation is ­central in relation to a sonic materialism that observes the world in sound as a vibrationenvironment in which we find each other in vibrating as an arche-sound that is not mystical and remote, calling from afar, but is the invisible and inexhaustible texture of all the possibilities of the world. The inaudible is not an audible of another world, but is what sounds here, as another possibility of this world. Sonic materialism acknowledges the proximity through which I comprehend physically, in practice, the simultaneous plurality of the real, rather than reducing its vibration to a measurable quantity, or separating it through a reificatory idea of spirits and the immaterial. In this regard, Adorno’s Thesis Against the Occult also serves to re-evaluate Meillassoux’s speculative materialism and his insistence on mind-independent facts. And it gives cause to the need for a careful nonanthropocentrism that does not simply expunge the human and sever material speculation from the perceptual process, initiating a mathematical or mystical materialism without a body–soul connection and with only a fanatism of numbers or the occult to draw on. Such “mythical speculation” denies the inefficiency of the human in that it does not account for the disturbances of the body and the mind in the transmission of measures and spirits. Instead, it pretends that a

570   salomé voegelin direct and unaffected conduit to astral and mathematical systems, free of human design and intervention, is indeed possible, and that what we measure and hear are the unfettered computations of ancestrality and the true voices of the spirit world. In response to this we need to take care that materialism does not result in ventriloquism: the speaking for something/somebody else through a human-designed channel of spirituality or calculation masquerading as mind- and body-independent fact, a process that equates to a hyperanthropocentrism hiding in a mythical or mathematical undergrowth. Instead, we need to remind ourselves of Connolly’s call for more modesty about our status in the world in relation to the “traditional European modes of theism and humanism” to grasp responsibility and pursue a different relationship between the voiced and the unvoiced. Within this objective, ventriloquism becomes a useful device and metaphor to conjure the other not as a separate other, neither a spirit nor a measurable quantity, but as a voice that sounds simultaneously but is not heard: an extrasocial rather than an extraterrestrial, whose sound thickens the perceived reality of the world through the actualization of its impossibility. In this sense, listening to the ventriloquist sharpens our sensibility and care, and fosters a practice of listening-out for the unheard or the overheard to draw the inaudible as another possibility from the impossible into the simultaneous plurality of the actual. Ventriloqua, as a listening-out for the unheard materialities of this world, defines a useful attitude to material relations as well as toward notions of presence and absence understood not as dialectical absolutes but as the possibilities that can, and the possibilities that cannot make themselves count in the actual world. As one defined inaudible voice is sounded through the Theremin, that of the unborn child, we are reminded of other voices, historical and present, which have not been heard. And as the threshold of possibility becomes porous, impossible things start to present themselves in the sonic-vibrations of the actual world.

Political Textures

Sonic vibrations, the arche-sonic texture of the world, reveal seemingly impossible modulations that are not reducible to the volume of past sounds or the spirits of otherworldly voices, but demand they be heard within this world. And thus, within the texture of this world, appears that which we cannot or do not want to hear and which demands to be heard, to make itself count as a slice of the real. This forceful appearance of impossible things in the midst of our actual world challenges the notion of difference and distance: two terms and values that are at the center of the humanist project that seeks to know the world through the rationality of differentiation and the ability to read its relationships as the distance between objects. Listening-out for inaudible things, as a sonic-materialist attitude, by contrast seeks to understand the impossible through its proximity to my own impossibility. In sound, we do not meet as difference or similarity, but negotiate

sonic materialism   571 who we are in a meeting that is primary, before definition, again and again, seeking invisible and tentative recognitions of what we might be in the practical equivalence of its texture. Anna Raimondo’s work Mediterraneo, from 2015, brings us to the vibrations of the unheard that texture a current sociopolitical reality but lack their own articulation. Her voice, repeating over and over again the word “Mediterraneo,” takes us to the center of the liquid expanse that is not simply between Africa, the Middle East, and Europe, a mere connecting and separating passage, but is the material and metaphor of their relationship as a deep and treacherous “what is.” Listening hears not one against the other or their separation, but hears the in-between, the relationship, as the material of the continents’ contingent facticity. Listening to her voice, I suspend my belief in what I know to be on either side. I find a focus not in their distance and what that denotes, but hear in the materiality of their primary relationship other possibilities of what they could be. In sound, the Mediterranean is the crossing not the crossed. It is not the infrastructure of connecting and separating, a bridge between continents that enables us to cross while at the same time maintaining the distance that exists in the first place, determining either side through the actuality of what it is not. Rather it is a volume, a material inhabited in listening, whose traveling within is not about my purpose or provenance, and it is not about my sameness and their otherness: the real actuality of this continent and the apparent impossibility of that, but about the possibility of the water’s own expanse and how time and space define things together. The crossing enables simultaneity. It performs the intertwining of the self with the world, and of the continents of the world with each other. These continents are not absolute territories but are expansions of each other whose impossible meeting points sound in the middle of the sea. At the same time, this self is not a positive or a negative identity, and neither is it an anthropocentric definition, but an uncertain and contingent subjectivity, constituted in an inhabiting practice of perception that is crossing boundaries not to measure and name but to engage in their watery depth to understand the defining lines through the self ’s coinciding with them, rather than dispassionately and from afar. Distance creates the distortion of dis-illusions, which promises resolve once we step closer. By contrast, the simultaneity of inhabiting creates the dis-illusions of plural possibilities that are not resolved into one singular and actual real—war, fighting, right, or wrong—but that practice the inexhaustible ambivalence between measurement and experience: what something is as numbers and what it appears to be in perception, so that we might understand and respond with engaged and practical doubt to what seems incommensurable from ashore. On a bleached-out white background we see a glass slowly, drip by drip, filling with a blue liquid that, as the poet Paul Claudel would say, has a certain blue of the sea that is so blue that only blood would be more red. And as the sound of dripping water slowly fills the glass, Raimondo’s voice catches her breath, accelerates, slows down and stutters, speeds up again, and repeats and repeats “Mediterraneo” until her voice is drowned in

572   salomé voegelin the water she has conjured with her own words. Until then, on the unsteady rhythm of her voice, we are pulled through the emotions of fear, excitement, hope, and death that define the Mediterranean as the liquid material that is “the between” of Africa, the Middle East, and Europe today, and whose material consequence does not stop at the coastline but offers us the texture to hear its vibration and to understand how we are bound up with it. Raimondo’s work brings us into the urgency of the situation through the focus on the materiality of the sea as the common texture of the adjoining continents rather than through the confrontations of their different shores. The repetitive mantra of her voice entreats me to enter into the water in order to—from within the fluid materiality— understand physically the complexity of its fabric, form, and agency: of what it weaves together formlessly rather than what it is as a certain form; and in order to suspend what I think I know of it and pluralize what it might be as the invisible organization of different things: salt, water, waves, holidays, routes of escape, yachts, aquatic life, sand, handmade dinghies, dreams, and desperation. Listening, I am persuaded to understand these things in their consequential and intersubjective relationships: what they sound together as sonic things and what thus they make me hear. Sound creates a vibrational-texture of the processes of the world that I hear coextensively and to which I am bound through my own sound. By contrast, a soundless ocean pretends the possibility of distance and dissociation, to be apart as mute objects and to be defined by this distance. The absence of sound cuts the link to any cause and masks the connection to any consequences. Thus, a mute Mediterranean enables my withdrawal from the sociopolitical and ecological circumstance of its waves and permits the rejection of my responsibility in its unfolding. Raimondo composes, from the hypnotic rhythm of her voice and the steady dripping of blue water, the political reality of the Mediterranean. Slowly submerging, with her words, into the deep blue sea, I abandon my reading of its terrain within the rationale and reason of existing maps and come to hear its texture as woven of unresolved material and positions. I do not follow its outline but produce a dark and mobile geography of the Mediterranean as a formless shape, whose possibilities and impossibilities undulate to create a fluid place that defies calculation but calls forth an attitude of listening-out to understand where things are at and to take responsibility within that invisible factuality: within this dark and mobile geography, we hear, as Connolly suggests we should, “the human subject as formation and erase it as a ground” (Connolly 2013, 400). In the watery depth of Mediterraneo, humanity appears as formless form that has lost the access to its grounding in the traditions of knowledge and established canons of thought, in political certainties and journalistic judiciousness, as well as in relation to historical and geographical identities. Instead, the rhythmic drip, drip, drip, and the reiteration of its name call for another ground, a groundless ground of invisible processes based on the responsibility of a practice-based subjectivity that appreciates the consequentiality and intersubjectivity of things without controlling them. 
Having been transported into the middle of the sea by Raimondo’s audiovisual work, we can hear the world as the vibrational-texture that binds us all and everything into an

ecosystem of invisible processes. This does not mean that some do not have more power than others. Simultaneity does not prevent hierarchies. Instead, the simultaneity of the sonic-texture makes visible the interdependencies of power, organization, self-organization, and control and provides an opportunity to revisit economic and political values that depend on the divides and distances established in a humanist philosophy and perpetuated in the ecology of the visual. A sonic reality emerges not from maps and words but from the fluidity of blue liquid and the drowning of the voice. And as the fluidity gives access to a groundless world, a world without a priori reason and rationality, the drowning words do not fade but re-emerge in the plurality of the inaudible. The posthumanist impetus of sonic materialism does not expunge the human but shakes the ground he stands on to make himself taller. This is not so that no ground can be established, but rather enables the grounding to become practice-based, contingent, and plural, based not on mind-body-independent speculation but on the suspension of habits and the beginning of doubt, including doubt in the normative habits of a singular authorship.

Conclusion

The paradox of Meillassoux’s speculative materialism is the disavowal of the authorial voice by that authorial voice. The contradiction is striking, and its consequences are not only benign. His desire to replace human perception through material computation fails to acknowledge the power and nominalism of his own authority. And it ignores the fact that it is only a minority of voices that can make themselves heard in a current actuality, producing narrow norms and values of how things are and of what needs to be done. His mathematical foreclosure of human perception ignores the fact that the anthropocentrism he critiques is the perspective of an economically, class-, gender-, and race-defined minority. His speculative materialism silences the ignored and denies the opportunity to those not yet heard to even comment on their own inaudibility within this human perspective. In response, a careful nonanthropocentrism neither mutes nor speaks for the other, subjects or objects, but adopts an attitude of attention and curiosity to hear the other speak, not as an ancestral or a spiritual voice, but as another voice of this world. This invites an understanding of the ephemeral mobility of things and of subjects as things, without anxiety of fragile perishability, but in terms of the serendipitous potential for a collaborative worldview. The aim is to pluralize not only articulation but also the ground, the reason and rationale of the process of communication: how things are said and how they are heard, judged, and incorporated into the reality of actuality, or left out as irrelevant, marginal, unimportant. Satz’s work presents the porous nature of the body, its ability to let the other speak through her form, but it also brings into play the inefficiency of transmission and serves as a reminder that neither the body nor the mind can act as pure conduits. Both affect

574   salomé voegelin the material through their own “disturbances” that manipulate and distort the others’ voices and construct a hyperanthropocentric ventriloquism that fails to see the impact of its measure on the heard. Consequently, a sonic materialism does not pretend to be able to speak for the other, it does not ventriloquize, and instead calls for an attitude of “listening-out for,” a stance of care and humility that hears the possible and the impossible in the vibrational texture of the world. This texture interweaves the voiced and the unvoiced as reciprocal and simultaneous things that are not hierarchical but speak of the hierarchies of the world. The aim is to hear a plurality of authorships and acknowledge the self-authoring of nature and of material that we can translate carefully, as Tsunoda does in his vibration recordings, to make them accessible and thinkable, always in the knowledge, however, that there are no mind-body-independent facts but that our body and mind will always diffuse and influence what it is we hear. In this sense, sonic materialism is a phenomenological materialism, which is not a contradiction but an acknowledgment of the subject as thing thinging amid other things and an articulation not of its control over the material world but of its responsibility within it. Materialism is thus a relationalism, not of different things but of things together. The material is not an entity but is the vibrational texture that things create simultaneously through the “equal differences”7 produced in their encounter with each other rather than beforehand. I comprehend the alterior and ulterior, as well as the extrasocial, not as human exclusive domains of numbers and spirits but through my position in the flow of their vibrational texture. The arche-sonic weave of this texture holds the possibility of the before and after as well as of the over there. It produces the concept of my finitude not as an absolute but as an element of its infinitude that is accessible to me through the continuous processes of reciprocation and generation of material relations within which I exist as a thing among other things. Phenomenological ancestrality is the before and after accessed through the inexhaustible formlessness of a present sound that I inhabit in intersubjective contiguity. The mathematical ancestral and the spiritual astral by contrast rely on distance and absence to assure and assert their measurement of the real. In this sense, they are entirely visual concepts: they overcome, mathematically and through mediumship, a temporal or spatial distance in order to know and sense a place or a thing that is nominally without them; and while this might make the other talk and the ancestral yield its measure, his voice and computation is channeled through the distance needed for its reach in the first place. This distance is at the basis of a visual materialism that seeks to omit the human but keeps the gap and difference between things that serve human articulation, measurement, and thought. A sonic materialism does not start from this distance but from within the texture of the world, which includes me simultaneously as a thing in the weave of things. Interwoven in its flow, I understand the contingency of my position not as absolute, as a position for me, but as a matter of the facticity of the world, which thus becomes accessible

sonic materialism   575 to me as a proximity where the measure is not between things, or between me and the world, but is the relationship that we form. Thus, there is no need to overcome a distance in order to understand the mobility of the world. There is no sonic sublime that shapes the conceptual ground of articulation and propels perception toward idealism. There is, instead, embedded doubt, the suspension of habits and norms, which produces a groundlessness that encourages not just a plurality of voices but a plurality of rationales and reasons that hear and value their speech. I practice this plurality on the ambivalence between measurement and experience, producing a complex sociopolitical texture from arche-sonic weaves that bind me into my responsibility within its inexhaustible flow. Where New Materialists theorize as speculation, I practice in doubt; and where they are in search of the infinite, the anterior and ulterior condition of thought and existence, I focus on the inexhaustible nature of sound that exists permanently in an expanded and formless now that I inhabit in a present that continues before and after me. In short, sonic materialism builds on the groundlessness of an auditory imagination the critical attitude of a “listening out for” rather than an occult dream. And while I do not share Meillassoux’s mathematical speculation, I share in his desire for a philosophical position of infinity that serves to acknowledge that there is “more” than we can see and experience. And I take this more to be the start rather than the conclusion of our appreciation and participation in the material world. Raimondo’s piece makes us aware that the world entered via such a listening attitude, as sonic sensibility and Zeitgeist, is rather darker and deeper than first imagined. The sonic is not self-certainly benign, peaceful, egalitarian, and just. Instead, it reveals the conspiracies of the visual world and probes the political expediency of class systems, dividing and ruling, in a sea of blue.

Notes

1. The term “naturecultures” was coined by Donna Haraway in The Companion Species Manifesto (2003). It expresses a reciprocal and nondialectical entanglement of nature and culture, body and mind, and so forth, and proposes a rethinking of the broader modernist ideology represented in these dualisms.
2. Meillassoux justifies and contextualizes his turning away from philosophical thought toward mathematical speculation by explaining,

it would be absurd to accuse all correlationists of religious fanaticism, just as it would be absurd to accuse all metaphysicians of ideological dogmatism. But it is clear to what extent the fundamental decisions that underlie metaphysics invariably reappear, albeit in caricatural form, in ideologies, and to what extent the fundamental decisions that underlie obscurantist belief may find support in the decisions of strong correlationism.  (Meillassoux 2009, 49) He further states “that thought under the pressure of correlationism, has relinquished its right to criticize the irrational” (45) and that, paradoxically, a philosophy, phenomenology, which sought to critique the absolutism and dogmatism of transcendence “has been

576   salomé voegelin transformed into a renewed argument for blind faith” (49). It is therefore that, instead of seeking insights into a post- and prehuman world via philosophy, he employs the mindindependent sphere of calculation and measurement to argue its “proper” truth. 3. In the course of his book Meillassoux develops facticity, the pure possibility of what there is, into the notion of factuality understood as the speculative essence of facticity: the fact that what there is, cannot be thought of as a fact but is a matter of nondogmatic speculation, a speculation that he ultimately pursues via mathematics. 4. The notion of posthumanism here does not refer to a world without humans, but to the project for a different scholarship and sensibility, initiating a different philosophy that does not simply continue the humanist path of an anthropocentric rationality and reason by denying the hyper nominal subjectivity of philosophical tradition while perpetuating it through the authorship of that very denial, but by considering a decentered human subjectivity that lives not at the center of the world but is centered by it, aware of its responsibilities, and humbled in its equivalence with other things. This posthumanism acknowledges that the human at the center of humanism is not every human, but a clearly demarcated and privileged identity: a tautologically privileged subjectivity based at the center of humans’ own discourses that places them supreme in the nominal understanding of the world that their very philosophy creates. Instead, the aim is to contribute to the conception of possible philosophies whose objectivities and subjectivities are plural but not factional and that are aware of the inevitable exclusion of one point of view by another and are thus engaged in philosophy as a field of blind spots that are practiced rather than theorized. 5. This interpretation of ancestrality as a visual consciousness does not outline a sonic essentialism, and neither does it represent a critique of visuality. This text does not pitch visuality, vision, or a visual literacy against sonicality, hearing, and a sonic literacy. Rather, the critique of the visual as it is implied here is not a critique of its object, what we see, but of its practice, the way we look and what we look for understood as cultural and ideological practices. The suggestion is that the ancestral, as it is staged and used by Meillassoux, relies on narrow channels of vision that deny much of what else could be seen. In response, this chapter promotes a sonic sensibility and engagement with the material world that achieve not a blind understanding of its processes but augment the way we see the world. 6. Maurice Merleau-Ponty calls perceptual dis-illusions the probable realities of a first appearance: “I thought I saw on the sands a piece of wood polished by the sea, and it was a clayey rock” (Merleau-Ponty 1968, 41). To him the appearance of the piece of wood is not an illusion, but a dis-illusion: the loss of one evidence for another. Accordingly, perceptions are mutable and probable, “only an opinion”; but what is not opinion, what each perception, even if false, verifies, is the belongingness of each experience to the same world, their equal power to manifest it, as possibilities of the same world. 7. 
The notion of “equal difference” is articulated in my book Listening to Noise and Silence via the equal significance of Sergej Eisenstein’s monistic ensemble of film montage, and clarified, via Jean-François Lyotard’s agonistic play, as a nonhierarchical playful conflict of the sensorial material (Voegelin 2010, 141). Here, it is further developed as the coextensive simultaneity of the material experienced and measured in a togetherness that does not ignore difference but understands and generates it in perception rather than takes it as a given.


References

Adorno, T. W. 2004. The Stars Down to Earth. London and New York: Routledge.
Connolly, W. E. 2013. The “New Materialism” and the Fragility of Things. Millennium: Journal of International Studies 41 (3): 399–412.
Hallward, P. 2011. Anything Is Possible: A Reading of Quentin Meillassoux’s After Finitude. In The Speculative Turn, edited by L. Bryant, N. Srnicek, and G. Harman, 130–141. Melbourne: re.press.
Haraway, D. J. 2003. The Companion Species Manifesto: Dogs, People and Significant Otherness. Chicago: University of Chicago Press.
Meillassoux, Q. 2009. After Finitude. New York: Continuum.
Merleau-Ponty, M. 1968. The Visible and the Invisible. Evanston, IL: Northwestern University Press.
Raimondo, A. 2015. Mediterraneo. Audio-visual installation.
Satz, A. 2003. Ventriloqua. Performance.
Tsunoda, T. 2004. Scenery of Decalcomania. Album with liner notes. Australia: Naturestrip. NS3003.
Voegelin, S. 2010. Listening to Noise and Silence. New York: Continuum.

Chapter 28

Imagining the Seamless Cyborg: Computer System Sounds as Embodying Technologies

Daniël Ploeger

Introduction

When I first started Microsoft Windows 10, I felt something was missing. Or rather, I heard something was missing. There was no startup sound. Since I first used Windows about twenty years ago, there had always been a short sound sequence that welcomed me at the start of a computer session. Now, the only thing I heard when the desktop came on was a short and inconspicuous “prrt.” Why did the startup sound disappear? Most people in the Western world and beyond will be familiar with the startup chime of an Apple computer, the Windows error sound, and plenty of other operating system (OS) sounds. However, despite the wide cultural reach of these sounds, studies of computer sound have mainly been concerned with sound synthesis for musical purposes or the simulation of human speech. Relatively little research has been done into the design and use of sound as part of computer OSs (Gaver 1986; Blattner et al. 1989; Alberts 2000; DeWitt and Bresin 2007), and, as far as I am aware, there are no studies that are dedicated to OS sounds from a cultural critical perspective. In this chapter, I discuss the development of the role of sound in the operation of computers from the mid-twentieth century until the present, and contextualize this in relation to broader cultural perspectives on computer systems as cybernetic extensions of the user’s body. Building on this contextualization, I will explore how common computer system sounds might facilitate particular imaginations about the nature of technological extensions of human bodies. In what ways do computer sounds affect the ways in which users imagine the relationship between their bodies and their computers? And how can the design of

system soundscapes play a role in the propagation of certain ideological concepts of technologically prosthetisized bodies, or cyborgs?

From Circuit Sonification to Audio Branding

Early computers in the 1940s, such as the Harvard Mark I, were built with electric relays, which meant that computational processes were audible because of the clicking of the relay switches. Listening to these sounds, computer operators could often detect errors or operation irregularities through variations in familiar patterns. For example, Philips engineer Nico de Troye recalls that:

The [Harvard] Mark I made a lot of noise. It was soon discovered that every problem that ran through the machine had its own rhythm. Deviations from this rhythm were an indication that something was wrong and maintenance needed to be carried out. (De Troye quoted in Alberts 2000, 43, my translation)

However, once computers were built that used radio tubes or transistors instead of mechanical relays, they operated in silence. With machines like the ARMAC, MIRACLE, UNIVAC I, and IBM 650, errors and problems could not be heard anymore. At the same time, until the late 1960s, visual monitors could only display very limited amounts of data so, despite some rows of small lights and a crude cathode ray tube display, the input and output data—usually on paper tape—had now become the only detailed computing information directly accessible to the computer operator. Apart from a simple hoot that could be triggered at designated points in a program, there was no longer a possibility to aurally monitor operations during the computing process. Interviews conducted by the historian of science Gerard Alberts with Dutch engineers who had operated early computers during the 1950s and 1960s indicate that engineers regretted this loss of aural cues. They responded by connecting a loudspeaker to the electronic circuits inside these computers and thus made the processing patterns audible once more through what could be called an “auditive monitor” (Alberts 2000, 2). Some of the engineers were still able to sing the patterns of particular operations when Alberts interviewed them four decades later. Thus, the role of sound in the operation of these early computer systems appears to reflect a more widespread listening culture around industrial noises. In her research on sound in industrial workplaces, the cultural historian Karin Bijsterveld (2006) discusses how the motivations behind factory workers’ frequent resistance to the use of ear protection, from their large-scale introduction in the middle of the twentieth century until—in some cases—the present day, suggest that the aural perception of the patterns

of machine sounds forms a key component of operation monitoring and reassurance in a broad range of manufacturing environments. After the 1960s, the practice of listening to program routines became obsolete. On the one hand, the computers’ processing speeds increased to such an extent that aural monitoring of variations in amplified signals was no longer possible. On the other, the possibilities to display detailed datasets on cathode ray tube monitors had increased significantly, so operation monitoring now became focused on the visual. Meanwhile, aural cues continued to play a role in the form of signal tones from loudspeakers that could be triggered by programming commands, more or less continuing the principle of the signal horn that had been part of the mainframes in the 1950s and 1960s. However, little consideration was given to exploring the possibilities for design and application of these signal tones. Instead, composers, engineers, and programmers with an interest in computer-generated sounds started to experiment with the computer-aided synthesis of musical and speech sounds. In 1961, the engineers John Kelly Jr. and Carol Lochbaum, in collaboration with the computer music pioneer Max Mathews, managed to generate human speech sounds with the IBM 7094 mainframe (Smith 2010). Famously, the 7094 sang the traditional song “Daisy Bell,” a feat that was later referenced by Stanley Kubrick in 2001: A Space Odyssey (1968) where the computer HAL sings the song just before his cognitive functions are disabled. However, these music and speech synthesis endeavors were pursued largely separately from the development of sounds as part of system operation. The loudspeaker that was included as a standard feature in IBM’s Personal Computer 5150 in 1981 was still only used for the emission of simple square waves for signaling purposes. Some early home computers aimed at hobby users, such as the Commodore 64 and Atari ST, were equipped as standard with more advanced sound synthesis capabilities, but, nevertheless, the startup and other OS sounds of these machines equally did not usually go much beyond some simple square wave signals. Thus, until the mid-1980s, OS sounds had been a largely neglected area in the development and research of computers and their OSs. This changed with the increase of computing power (and thus the possibility for more complex sound synthesis methods) and the emergence of an interest in the design of graphical user interfaces (GUIs) from the mid-1980s. The latter followed the public release of the first GUI OSs: Silicon Graphics’ MEX windowing system and Apple’s Lisa. Images can convey a lot of information with little space, and users can recognize and interpret pictures much faster than words (Schneiderman 1986). Building on this realization, research in icon design was considered useful in the endeavor to make computer systems more accessible and comprehensible for nonspecialist users and to thus optimize productivity on the work floor. Drawing on this research and development in the design of visual icons, a number of developers started considering the design of auditory signals. Complementing the visual dimension of the system with what were coined “auditory icons” (Gaver 1986) or “earcons” (Blattner et al. 1989), the possibilities of sound were explored in an endeavor

582   DANIËL PLOEGER to further optimize the user interface. Blattner and colleagues proposed an approach to auditory icons that builds on an analysis of visual icons. Distinguishing between “representational” (e.g., the Mac OS trash can), “abstract” (e.g., Adobe Creative suite icons), and semi-abstract icons (e.g., the Windows icon), they proposed to design auditory icons based on the principle of “iconic families.” Sounds with shared elements would convey to a user that they are related to the same group of functions. Thus, a combination of recognizable representational elements with interlinked abstract aspects could facilitate an easy-to-learn network of auditory communication as part of the computer user interface. While in the 1980s the interest in auditory icons had been focused on efficiently conveying information about the system’s operations in easily understandable auditory forms, the 1990s saw the emergence of a different interest in system sound. In Joel Beckerman’s book, The Sonic Boom (2014), Jim Reekes, the designer who created the current Mac startup sound and many other Mac OS sounds, reports how in the late 1980s he struggled to convince his superiors to replace ill-considered Mac sounds, and start to approach sound as a form of “audio branding” (Jackson 2003); what affective response will a sound evoke in relation to broader associations with elements of culture or nature? Until the implementation of Reekes’s design for the current startup sound, Apple computers used to play a tritone interval when switched on. In Western music history, this interval has often been associated with negative feelings and, from medieval times until the eighteenth century, it was commonly designated as the Devil’s interval. Curiously, this aspect of the sound seemed never to have been considered by the system designers, who—according to Reekes—thought sound design to be of little importance. Reekes eventually managed (more or less secretly) to replace the tritone sound with the current chime which consists of two major chords that pan slightly between left and right on a stereo speaker setup. Originally in C Major, it has been transposed several times, but otherwise it has remained the same since its inception. Reekes’s objective was to create a “meditative sound” that would act as a “palate cleanser for the ears” (Reekes in Beckerman 2014, 12). Users in the 1990s heard the startup sound at the beginning of every computer session and after system crashes, which occurred frequently. Consequently, the startup sound was an important factor in users’ experiences of brand identity. Eventually, the relevance of careful sound design and affective audio branding, as part of the development of OSs was acknowledged on a wider scale by software and hardware companies. This is apparent from Microsoft’s decision to hire the musician and sound artist Brian Eno to compose the startup sound for Windows 95. According to Eno, the commissioning brief he received included about 150 adjectives: “The piece of music should be inspirational, sexy, driving, provocative, nostalgic, sentimental . . . and not more than 3.8 seconds long” (Eno in Cox  2015, 271–272). The design of OS startup sounds, as well as signal sounds throughout the system, had now become a priority in developers’ corporate branding strategies (for more on audio or sonic branding, see Gustafsson, volume 1, chapter 18).

Computer System Sounds as Embodying Technologies   583

System Sounds and Affect

These reflections on corporate interests in OS sound design since the 1990s suggest that there is an affective and potentially embodied dimension to users’ experiences of these sounds. Reekes speaks about a “palate cleanser for the ears” and the adjectives referred to by Eno obliquely refer to an incentive to establish a relationship between the user and the computer (or the Microsoft Corporation) that goes well beyond a cognitive and instrumental interaction into a more affective realm. Indeed, the media theorist Deborah Lupton (1995), in “The Embodied Computer/User,” gives an account of computing in the mid-1990s that confirms exactly this connection between OS sounds and affect. She starts with a short personal anecdote about her own computer:

When I turn on my personal computer . . . it makes a little sound. This little sound I sometimes playfully interpret as a cheerful “Good morning” greeting . . . the sound helps to prepare me emotionally and physically for the working day ahead. (97)

Notably, the sound she is referring to here is most probably the rather crude fanfare sound, which was included in the Windows OS, before the introduction of Brian Eno’s startup sound in late 1995 just after Lupton was writing. Brian Massumi defines affect as “a prepersonal intensity corresponding to the ­passage from one experiential state of the body to another” (Massumi in Deleuze and Guattari 1987, xvii). The application of sound plays an important role in the shaping of affective responses in a broad range of cultural activities, ranging from marketing (Bruner 1990) to activism (Thompson and Biddle 2013) and warfare (Goodman 2009). Although long unconsidered by system developers, users’ affective responses to OS sounds have shaped the experience of their interactions and connections with the machines since the early days. This is also clear from Alberts’s reflections on the role of the amplified processing sounds in early radio tube and transistor-based computers. Before these machines were introduced, computing had been a manual operation, which was accompanied by sounds of people working: historically on paper, using relatively simple calculating objects, later aided by mechanical calculators. The relay-based computer did calculations automatically, but it generated a reassuring sound that was similar to what had previously emerged from the manual mechanical calculators on the work floor. The accounts of the engineers interviewed by Alberts suggest that the loudspeaker attached to the subsequent “silent” computers did not just act as a monitoring device to check whether the computer was still operating correctly. The loudspeaker sounds also provided a sense of comfort, they facilitated a “sensory restoration of the relationship with physical calculation” (Alberts 2000, 45). Indeed, more recent research into the design of sound in human–computer interaction has investigated the potential of sound to facilitate affective user relationships with data inside the system. Anna deWitt and Roberto Bresin, in their article “Sound Design for

Affective Interaction” from 2007, suggest the use of physical models of real-world sounds to represent elements of virtual worlds. For example, they propose to sonically represent the arrival of mobile phone text messages with the sound of marbles falling into a metal box. More important messages would sound like heavier marbles, and by shaking the phone the user could determine how many messages have arrived based on the sound of a related number of marbles moving around. Thus, they argue that the design of system operating sounds may be a way to “narrow the gap between the embodied experience of the world that we experience in reality and the virtual experience that we have when we interact with machines” (deWitt and Bresin 2007, 525).

Everyday Cyborgs

In the following, I will further examine the role of OS sounds in the embodied experience of human–computer interaction. However, my interest is not in determining effective methods for information transmission and the potential to forge a seamless transition between embodied experiences of the physical world and the data that exist inside computer systems, as is the case in the research of DeWitt and Bresin and the work of Gaver and Blattner and colleagues in the 1980s. Instead, I will focus on how the OS sounds discussed thus far might relate to broader cultural representations and understandings of human bodies and technology, particularly in the light of popular cultural imaginations of the cyborg. Before I continue, we should take a closer look at embodied experiences of human–computer interaction in a broader sense. In “The Embodied Computer/User,” referred to earlier, Lupton discusses embodied computer user experiences. It is not surprising that this text was written in the mid-1990s. This was the time when digital technology and especially personal computers had become omnipresent in professional and private life in the Global North. Lupton describes how, by the early 1990s, many people in Western societies had come to feel dependent on digital technologies in their everyday lives. A power cut at a research unit she visited left staff wondering what they should do while their computers could not be accessed. As a consequence of this far-reaching integration of computers (and other digital technologies) in everyday life—which has only become stronger today—people also tend to have an emotional relationship with their computers; they commonly experience fear, anger, frustration, and relief as part of their interactions with them. In her analysis of this phenomenon, Lupton builds on the feminist scholar Elizabeth Grosz’s argument that inanimate objects that have been in close contact with the body for extended periods of time become experienced as extensions of the body image. According to Grosz, “[i]t is only insofar as the object ceases to remain an object and becomes a medium, a vehicle for impressions and expression, that it can be used as an instrument or tool.” Thus, in interaction with the body, an inanimate object can become an “intermediate” or “midway between inanimate and the bodily” (Grosz in Lupton 1995, 98–99). Drawing on this, Lupton suggests that, by the mid-1990s, instead

Computer System Sounds as Embodying Technologies   585 of the “human/computer dyad being a simple matter of self versus other, a blurring of the boundaries between embodied self and the PC” (Lupton 1995, 98) has taken place for many people. If we consider the interactions between users and personal computers (or mobile devices) from a cybernetic perspective, we arrive at a similar interpretation. In his explanation of cybernetic networks, Gregory Bateson (1972) gives the example of the stick of a blind man. He argues that this object should—from a cybernetic perspective—be considered as part of the man’s body, because it constitutes a pathway for information exchange between the man and the world around him. If we think of Lupton’s account of the despair caused by the power cut in her university in the 1990s, or the discomfort (or even anxiety) many people experience nowadays when they are unable to connect to social networks due to a depleted smartphone battery, it is clear that Bateson’s argument is also applicable in this context: while users’ conscious perceptions of the computer may be as external objects with which they interact, in terms of their communicative interactions with the world around them they fulfill the role of cybernetic extensions of their bodies. Accordingly, we can consider everyday human–computer interactions in the context of the concept of the cyborg: a cybernetic organism. The term “cyborg” was first coined in 1960 by the scientists Manfred E. Clynes and Nathan S. Kline in their article “Cyborgs and Space” (1960), and further explored in Daniel S. Halacy’s Cyborg: Evolution of the Superman (1965). Inspired by recent developments in space travel, Clynes and Kline suggest that it is time for “man to take an active part in his own biological evolution” (1960, 26), through the attachment of technological extensions to human bodies, in order to prepare for living in extraterrestrial environments. Likewise, Halacy promotes the technological extension of bodies in order to enhance their strength and capabilities. In these visions, technological development is considered a neutral force that can be instrumentalized as desired. Since the mid-1980s, critiques of this technodeterminist approach to the concept of the cyborg have emerged. Donald MacKenzie and Judy Wajcman’s anthology The Social Shaping of Technology (1985) examines how technological developments are shaped by—and complicit in the persistence of—existing sociopolitical paradigms. In this context, Donna Haraway’s “Cyborg Manifesto” (1991) acknowledges that the image of the cyborg has its origin in the military-industrial complex, but that it can also be employed to challenge hegemonic divisions of gender; if the body’s parts and characteristics are thought of as (theoretically) exchangeable for technological substitutes, this means that traditional thinking in gender oppositions tied to a biological body becomes impossible. Thus, for Haraway, the cyborg is an image of “a creature in a post-gender world” (1991, 150), which allows us to move away from the binary thinking that underlies the distribution of power in what Haraway calls “White Capitalist Patriarchy” (161). However, despite these critiques and emancipatory visions for the cyborg, the ­positivist ideology of the military-industrial complex of enhancement and strength has remained a mainstay in imagined and realized cyborgs in popular culture and art until the present day. Fictional characters in films and TV programs from the 1960s until the

present, like the Six Million Dollar Man, Robocop (Verhoeven 1987), and Ex Machina (Garland 2015), are consistent with the idea of enhancement through implantation and attachment of state-of-the-art technologies. Similarly, artwork and writing by artists including Stelarc (1991), Neil Harbisson, and Moon Ribas have been focused on promoting the idea that the human body can be made more capable through integration of hi-tech components.

The Sonic Smoothing of the Prosthesis When considering human–computer interaction in the context of this idea of the cyborg body, sound is of particular interest due to its affective qualities as discussed previously; OS sounds can play an important role in the “blurring of boundaries” in users’ embodied experiences of the interactions with their computers. In this context, Reekes’s objective to put the user at ease with a “meditative” startup sound, and Microsoft’s wish for a startup sound that evokes a range of positive affects, can be seen as more than merely straightforward endeavors to enhance corporate branding; they also play a role in evoking an experience of “cyborgian seemlessness” (Lupton 1995, 111) and thus arguably echo Halacy’s utopian vision of technological prosthetics as unproblematic harbingers of an enhanced human body. Similarly, Alberts’s account of the comforting effect of amplified processing sounds for the computer operators of the 1950s and 1960s may be understood as a successful experience of the smooth integration of bodies and technologized computational processes; that is, as cyborgs avant la lettre. However, if we look closer into the particular qualities of the system sounds I have discussed earlier in this chapter, there are significant differences between the sounds and their possible connotations in the early mainframe computers in the 1950s and 1960s and the OSs Reekes and Eno contributed to in the 1990s, which I will discuss in the following. Furthermore, more recent developments of OS sounds, most notably Windows, show some more peculiarities that deserve closer scrutiny. The 1990s sound projects by Reekes and Eno were aimed at evoking positive affects through references to broader, nontechnological (musical) frameworks. This is especially clear in Reekes’s reflections on the cultural connotations of the tritone that was used as the original Mac startup sound. Instead, the comfort provided by the operating sounds Alberts discusses appears to come from a more or less opposite connotation; instead of referring to the “outside” world of music, the sounds reflected a technological operation that was only meaningful in relation to the mechanical operation of earlier machinery that had performed a similar function. This difference becomes more understandable when we consider Lupton’s assessment of embodied user experiences of computer users in the 1990s more closely.

As mentioned already, Lupton suggests that once interactions with computers and other digital technologies have become thoroughly integrated into everyday life, a "blurring of the boundaries" between the devices and the embodied self occurs. However, this development is not a smooth process. It takes place through a negotiation of antagonistic emotions toward computers. On one hand, users are indeed "attracted towards the . . . opportunity to achieve a cyborgian seamlessness." However, at the same time, they often "feel threatened by [the technology's] potential to engulf the self" (1995, 111), a threat that would entail a loss of agency due to the lack of (perceived) individual control over data in the system. Here, it is important to acknowledge that the computer users Lupton discusses were generally nonspecialists—although they often used computers intensively in everyday life—for whom the devices very much remained like a "black box"; they would usually have had little understanding of the inner workings of the computer (Latour 1999). This perceived mysteriousness of the computer system is arguably also one of the sources of the fears and discomfort concerning the perceived threat of loss of control and agency when becoming dependent on cybernetic systems. When we listen to Reekes's account of the creation of the startup sound in this context,1 or hear other sounds he designed (e.g., "quack" and "Sosumi"), it is significant that the sounds he designed and selected include a combination of recordings of acoustic sounds and synthetically generated sounds, and that many sounds seem to lie somewhere in between the acoustic and the synthetic (while the startup chime does not sound entirely synthetic, it is also hard to tell what the acoustic sound sources involved might be). We hear a similar pattern in the Windows system sounds of the mid-1990s and early 2000s: while the sounds "Recycle" and "Ring" are clearly recognizable as recordings of the crumpling of a piece of paper and a ringing desk phone, "Notify" and the sounds that mark infrared connections may be more readily associated with the soundscape of a sci-fi film.2 Considering this combination of the acoustic and the synthetic in the choice of OS sounds in both Mac OS and Windows in relation to Lupton's examination of the ambiguous relationship of computer users of the 1990s with their devices, it appears that the sonic environments of the OSs functioned as a means to partly negotiate this tension; on the one hand, they promote a smooth, sci-fi-like aesthetic to evoke a sense of unproblematic and clean computing power (this is perhaps most prominent in the different versions of the Windows startup sound since the mid-1990s) while, on the other hand, the inclusion of sounds that evoke elements of the organic world outside the device provides a sense of comfort, mitigating fears of loss of agency that are due to dependency on a "black box." Quite differently, discomfort arising from dependency on a black box is unlikely to have been a big issue for the engineers working with early computers. In the early days of computing, operators were usually mathematicians and engineers with an in-depth knowledge of the system, while the systems themselves were still of a limited degree of complexity, which made it possible for an individual to have a fairly comprehensive understanding of their processes.
In other words, whereas for the (predominantly nonspecialist) computer users of the 1990s a sonification of internal operations of a computer

588   DANIËL PLOEGER system would be likely to further add to the opacity of its operations and thus heighten a sense of alienation and potential threat, the sonified system sounds early engineers listened to comforted them that the machine was operating as intended and made it possible for them to relate to the system in terms of human actions (the previously manual operations of mechanical calculator operators). Since the 1990s, OS sounds have continued to develop. Listening to Microsoft Windows, for example, there are several changes that stand out. As I mentioned in the introduction of this chapter, since version 10, which was released in 2015, Windows no longer features a notable startup sound. Another development that becomes apparent on closer listening is the gradual disappearance of the organic-sounding sounds that had been a prominent feature of Mac OS and Windows alike since the 1990s. In Windows 10, the only apparently organic sound that remains is “Recycle” (the sound of crumpling paper mentioned earlier). All other sounds have gradually become smoothed and more evocative of digital synthesis. The ring tones no longer resemble those of traditional desk phones. The sense of the synthetic is further heightened by the conspicuous increase of digitally generated reverb that is added to the various sounds over the years. Microsoft’s response to queries about their motivations to remove the startup sound gives us a hint as to how developments in the sonic interface may be related to broader issues around the (desired) experience of human–computer interaction: When we modernized the soundscape of Windows, we intentionally quieted the system. . . . you will only hear sounds for things that matter to you. We removed the startup sound because startup is not an interesting event on a modern device. Picking up and using a device should be about you, not announcing the device’s existence. (Microsoft Corp. in Wong 2015)

Thus, OS sounds are conceived to facilitate a user experience in which the device is no longer perceived as present. The device should become an unnoticed attribute that is "all about you." In other words, the soundscape should facilitate the "cyborgian seamlessness" Lupton wrote about in the 1990s. There no longer appears to be a need to put users at ease by evoking a sense of the organic around the technological black box they are connecting with. Instead, the technological device as a whole should be backgrounded. This "quieting of the system" is reminiscent of what Mark Weiser (1996) termed "calm technology" as part of his theory of "ubiquitous computing." In the late 1980s and early 1990s, Weiser observed that personal computers, despite their widespread use by nonspecialists, were still often experienced as specialist devices, the operation of which involved focused and concentrated activity. In contrast, the much older information technology of reading and writing is present in all areas of everyday life and is performed with a much lower degree of conscious attention; writing is a "ubiquitous technology." Weiser argued that, once computers become truly omnipresent in all kinds of forms, and each person operates a number of different devices, we will arrive in the era

of "Ubiquitous Computing." He foretold the arrival of "lightweight Internet access devices costing only a few hundred dollars" and the inclusion of "a full Internet server into every household appliance [to connect] things in the world with computation" (1996), something which we now usually describe as the Internet of Things. This new era in computing would bring a new challenge: technologies should become "calm": "If computers are everywhere they better stay out of the way" (1996). For Weiser, this meant that systems should be designed for the "periphery." They should afford being "attuned to without attending to explicitly" or, in broader terms, "what matters is not technology itself, but its relationship to us." Microsoft's account of the Windows 10 soundscape almost seems to signal a direct implementation of this idea: "you will only hear sounds for things that matter to you . . . using a device should be about you, not announcing the device's existence" (Microsoft Corp. in Wong 2015). Does this approach to OS sound design then mark the beginning of Weiser's era of Ubiquitous Computing and Calm Technology, an era of "cyborgian seamlessness"? Indeed, mobile technologies, which are playing an ever-bigger role in people's everyday information technology use, are equipped with an even further reduced set of system sounds. If we understand system sound design strategies as merely responding to a techno-deterministic status quo, these recent developments may indeed be seen as indicators of the dawn of a seamless cyborg body where information technologies operate as inconspicuous and naturalized extensions of human bodies.

Sound Glitches as Intervention

While developments in OS sounds can, to an extent, be understood simply as responses to broader cultural trends regarding attitudes to computing technology, they should also be considered attempts to establish an envisioned, desirable interrelationship between humans and computers; instead of indicating that we have now arrived in the era of Ubiquitous Computing, the Windows 10 soundscape and its accompanying rhetoric may also simply show us that Microsoft would like us to imagine ourselves as "seamless cyborgs" in the sense of Weiser's ideas. Here it is apposite to recall MacKenzie and Wajcman's (1999) argument in The Social Shaping of Technology, that instead of merely constituting a neutral, deterministic force that drives cultural change, technological developments are embedded in sociopolitical dynamics. Research and development are often facilitated by government and large corporate grants. As a result, technological developments are frequently shaped by the agendas of—and could contribute to the preservation of—existing sociopolitical power structures. It is conceivable that the primary interest of the corporate-driven approach to OS sound design has simply been to make the devices attractive to users and thus enhance sales and consumption. However, computer system sound design also appears to have been coherent with—and thus arguably complicit in the persistence

of—the positivist concept of the cyborg as a strengthened and enhanced human body in which technological prostheses are politically neutral and form increasingly seamless connections with the organic human body. Thus, rather than merely reflecting a technocultural status quo, OS sounds also facilitate the user's imagination of a particular kind of connection between bodies and technologies. Although the vision of technologically enhanced bodies may appear attractive, it also has some problematic implications. First, the popular vision of the cyborg suggests a universal notion of progress, which omits engagement with the inequalities of gender, race, and social class that continue to play a role in the politics of bodies (Haraway 1991). As long as they are not equally available to everybody, the introduction of seamless and inconspicuous, and therefore likely to be taken for granted, technological extensions to the body easily becomes a process of hiding inequality. Second, endeavors to make interaction with technologies imperceptible promote a disregard for the materiality of technological components in terms of the expenditure of resources, production labor, and the ecological impact of waste (Ploeger 2016). The ever-increasing speed of replacement of everyday electronic commodities generates a growing stream of electronic waste. In most cases, this waste is eventually exported to developing countries where it is often recycled through environmentally harmful methods or dumped in unprotected areas, causing severe environmental damage accompanied by a range of sociocultural problems (Chan and Wong 2013). Thus, instead of making human–computer interaction as inconspicuous as possible—and thus promoting the imagination of a seamless cyborgian prosthesis—a conscious experience of the user interface might be desirable in order to facilitate an engagement with the device's embeddedness in existing sociopolitical power structures, both concerning persistent inequalities in access to technology, and the ecological and social consequences of technology's materiality. In other words, instead of stimulating the imagination of smooth and powerful technologically enhanced bodies that are in line with the interests of "militarism and patriarchal capitalism" (Haraway 1991, 151), a user environment that is less "seamless" could facilitate a critical awareness of the development and embeddedness in culture of technological devices. Considering OS sounds from this angle, are there any opportunities to reconnect to the materiality of the device in what sounds like the ever further smoothing and quieting of the system soundscapes? Where are the—metaphorical and literal—cracks in the developers' attempts to create a comfortable and seamless sonic interaction? A 2007 blog post written by a member of the Windows developer team discussing sound "glitching issues" in the new Windows Vista OS offers a possible answer. Defining a sound glitch as "a perceivable error, gap, or pop in the sound caused by discontinuities in the audio signal during playback or recording which result from processing or timing problems,"3 the author draws attention to the fact that audio glitches are more perceptible than irregularities in video "because the ear's tuned to notice high frequency transients." Accordingly, the sound ecologist Michael Stocker (2013) suggests that the human body is "hardwired" to be alerted by subtle changes in sound inputs.
Sonic irregularities trigger a sense of alert and thus break through a sense of smooth and unconscious interaction; the illusion of the seamless cyborgian connection is temporarily interrupted.
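To make this definition of a glitch more concrete, the following minimal sketch (Python with NumPy; the tone frequency and buffer size are assumptions chosen purely for illustration) synthesizes a steady tone and then silences a single playback buffer, producing exactly the kind of discontinuity, and the high-frequency transient, that the blog post describes:

```python
import numpy as np

SR = 44100     # sample rate in Hz (an assumed, standard value)
BUFFER = 512   # size of one playback buffer (assumed)

# A steady 440 Hz tone, two seconds long
t = np.arange(2 * SR) / SR
tone = 0.5 * np.sin(2 * np.pi * 440 * t)

# Simulate a processing or timing failure: one buffer arrives empty,
# creating an abrupt gap and phase jump in the signal
glitched = tone.copy()
start = 30000  # an arbitrary point mid-signal
glitched[start:start + BUFFER] = 0.0

# The discontinuity shows up as a spike in the sample-to-sample difference,
# i.e., the kind of high-frequency transient the ear is "tuned to notice"
ratio = np.max(np.abs(np.diff(glitched))) / np.max(np.abs(np.diff(tone)))
print(f"largest sample-to-sample jump, glitched vs. clean: {ratio:.1f}x")
```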

Understandably, and as the quoted blog post suggests, OS developers are invested in the elimination of any sonic irregularities. However, there are others who embrace these "bugs" in OSs, and even seek to actively provoke glitches. Glitch artists tweak "technology and [cause] either hardware or software to sputter, fail, misfire or otherwise wig out" (McCormack 2010). Although glitch art is often primarily considered an aestheticization of system bugs, there is a political dimension to this work precisely in its endeavor to undermine software and hardware manufacturers' desires to make computer operation as inconspicuous as possible by means of smoothly operating user interfaces. Although most work in glitch art that engages with the user interfaces of OSs has thus far been focused on visual artifacts, some artists have also worked with the distortion and interruption of system sounds. Among the artists who work in this way are JODI (Joan Heemskerk and Dirk Paesmans), members of the British organization TOPLAP, and Chicago-based Jon Satrom. Satrom's Plugin Beachball Success (2012), performed at the opening ceremony of the transmediale festival in Berlin, begins with what looks like a failed attempt to start the program running the performance. Satrom unsuccessfully tries to log on to his Mac several times. Each time the Mac error signal sounds through the speakers. Satrom apologizes and says that he only just got this computer. Once he manages to get in, another disruption occurs almost immediately: an error message states "PLUGIN NOT FOUND. Your computer needs additional software to run this asset. Click Here to DOWNLOAD."4 It quickly becomes clear that the performance has actually already started. Over the next thirteen minutes, Satrom turns the commonly experienced interruption caused by a missing plugin—an additional bit of software that enables a program to read a certain data format—into an escalating sequence of repetitions and transformations. Operating system sounds play an important role in this process. The familiar error sound that is explicitly introduced at the beginning of the performance is gradually mixed into a cacophony of various system sounds and decomposed into gritty noise structures. Listening to this apparent system collapse, the sound glitches gave me an almost visceral sense of discomfort. Satrom makes us aware that the smooth connection we may sense with our computers is merely an imaginary bond, forged to an important extent by a polished system soundscape. Once the smooth, familiar system sounds are violated and subverted, our attention is drawn to the fact that the technological extensions of our body are designed in accordance with a certain logic; they are not merely neutral, seamless prostheses that enhance the capabilities of our bodies. They also form part of a designed world; a world that is still overshadowed by the imaginary, all-powerful cyborg of the military-industrial complex.

Notes

1. One More Thing (2010), "Interview Jim Reekes: Creator Mac Startup Sound," https://www.youtube.com/watch?v=QkTwNerh1G8. Accessed June 27, 2017.
2. Dark Parodies (2015), "All Windows Sounds | Windows 1.0–Windows 10," https://www.youtube.com/watch?v=ufKjjgvQZho. Accessed June 27, 2017.

3. Vistaheads.com (2007), "An Overview of Windows Sound and Music 'Glitching' Issues—Microsoft Windows Vista Community Forums—Vistaheads," http://web.archive.org/web/20100206210146/http://windowsteamblog.com/blogs/windowsvista/archive/2007/10/29/an-overview-of-windows-sound-and-music-glitching-issues.aspx. Accessed June 27, 2017.
4. interweb (2012), "Prepared Desktop: Plugin BeachBall Success Jon Satrom TM2K12," https://www.youtube.com/watch?v=6jrz45AK-yA. Accessed June 27, 2017.

References

Alberts, G. 2000. Rekengeluiden: De lichamelijkheid van het rekenen. Informatie und Informatiebeleid 18 (1): 42–47.
Bateson, G. 1972. Steps to an Ecology of Mind. San Francisco: Chandler.
Beckerman, J. 2014. The Sonic Boom. Boston, MA: Houghton Mifflin Harcourt.
Bijsterveld, K. 2006. Listening to Machines: Industrial Noise, Hearing Loss and the Cultural Meaning of Sound. Interdisciplinary Science Reviews 31 (4): 323–337. doi:10.1179/030801806x103370.
Blattner, M., D. Sumikawa, and R. Greenberg. 1989. Earcons and Icons: Their Structure and Common Design Principles. Human-Computer Interaction 4 (1): 11–44. doi:10.1207/s15327051hci0401_1.
Bruner, G. C. II. 1990. Music, Mood, and Marketing. Journal of Marketing 54 (4): 94. doi:10.2307/1251762.
Chan, J. K. Y., and M. H. Wong. 2013. A Review of Environmental Fate, Body Burdens, and Human Health Risk Assessment of PCDD/Fs at Two Typical Electronic Waste Recycling Sites in China. Science of the Total Environment 463–464: 1111–1123. doi:10.1016/j.scitotenv.2012.07.098.
Clynes, M., and N. S. Kline. 1960. Cyborgs and Space. Astronautics 14 (9), September 1960: 26–27, 74–76.
Cox, T. J. 2015. The Sound Book: The Science of the Sonic Wonders of the World. New York: W. W. Norton.
Deleuze, G., and F. Guattari. 1987. A Thousand Plateaus. Minneapolis: University of Minnesota Press.
DeWitt, A., and R. Bresin. 2007. Sound Design for Affective Interaction. Lecture Notes in Computer Science 4738: 523–533.
Garland, A. 2015. Ex Machina. Film4, DNA Films.
Gaver, W. 1986. Auditory Icons: Using Sound in Computer Interfaces. Human–Computer Interaction 2 (2): 167–177. doi:10.1207/s15327051hci0202_3.
Goodman, S. 2009. Sonic Warfare: Sound, Affect, and the Ecology of Fear. Cambridge, MA: MIT Press.
Halacy, D. S. 1965. Cyborg: Evolution of the Superman. New York: Harper & Row.
Haraway, D. J. 1991. A Cyborg Manifesto: Science, Technology, and Socialist-Feminism in the Late Twentieth Century. In Simians, Cyborgs and Women: The Reinvention of Nature. London: Routledge.
Jackson, D. M. 2003. Sonic Branding: An Essential Guide to the Art and Science of Sonic Branding. Basingstoke, UK: Palgrave Macmillan.
Kubrick, S. 1968. 2001: A Space Odyssey. Metro-Goldwyn-Mayer.

Latour, B. 1999. Pandora's Hope. Cambridge, MA: Harvard University Press.
Lupton, D. 1995. The Embodied Computer/User. In Cyberspace/Cyberbodies/Cyberpunk, edited by M. Featherstone and R. Burrows, 97–112. London: Sage.
MacKenzie, D. A., and J. Wajcman. 1999. The Social Shaping of Technology. 2nd ed. Milton Keynes, UK: Open University Press.
McCormack, T. 2010. Code Eroded: At GLI.TC/H. Rhizome, October 31, 2010.
Ploeger, D. 2016. Abject Digital Performance: Engaging the Politics of Electronic Waste. Leonardo 50 (2). doi:10.1162/LEON_a_01159.
Satrom, J. 2012. Plugin Beachball Success. Digital artwork.
Schneiderman, B. 1986. Designing the User Interface. Reading, MA: Addison Wesley Longman.
Smith, J. O. 2010. Physical Audio Signal Processing. http://ccrma.stanford.edu/~jos/pasp/. Accessed June 27, 2017.
Stelarc. 1991. Prosthetics, Robotics and Remote Existence: Postevolutionary Strategies. Leonardo 24 (5): 591. doi:10.2307/1575667.
Stocker, M. 2013. Hear Where We Are: Sound, Ecology, and Sense of Place. Berlin: Springer.
The Six Million Dollar Man. 1973–1978 [TV]. ABC Network.
Thompson, M., and I. D. Biddle. 2013. Sound, Music, Affect. London: Bloomsbury Academic.
Verhoeven, P. 1987. Robocop. Orion Pictures.
Weiser, M. 1993. Ubiquitous Computing. Computer 26 (10): 71–72. https://web.archive.org/web/20180220015318/http://pubweb.parc.xerox.com/weiser/UbiHome.html. Accessed January 2, 2019.
Wong, Raymond. 2015. The Evolution of Windows Startup Sounds, from Windows 3.1 to 10. Mashable. http://mashable.com/2015/07/31/windows-evolution-startup-sounds/#jTUN3RzyJgq0. Accessed June 27, 2017.

CHAPTER 29

Glitched and Warped
Transformations of Rhythm in the Age of the Digital Audio Workstation
Anne Danielsen

Introduction Digital music technology has brought about unforeseen possibilities for manipulating sound, and, as a consequence, entirely new forms of musical expression have emerged. This chapter will focus on the particular rhythmic feels that can now be produced through manual or automated techniques for cutting-up sound, warping samples, and manipulating the timing of rhythm tracks in digital audio workstations (DAWs). By rhythmic feel, I refer to the systematic microrhythmic design applied to a rhythmic pattern in performance or production, such as, for example, when playing a pattern with a swing or straight feel. These new rhythmic feels have made an unmistakable mark on popular music styles, such as glitch music, drum and bass, hip hop, neo-soul, and contemporary R&B from the turn of the millennium onward, and not only represent a challenge to previous forms but also create new opportunities for stretching the human imagination through presenting previously unheard sounds and sonic gestures to creators and listeners alike. A crucial aspect of this development is the manner in which the new technologies allow for combining agency and automation, understood as creative strategies, in new compelling ways. In what follows, I will begin by reviewing two trends in the literature addressing these new rhythmic feels: one that positions them as a continuation of earlier machine-generated grooves; and another that positions them as an expansion of the grooviness of earlier groove-based music, such as funk, soul, and R&B, in unforeseen directions. Ultimately, I will reflect on the challenges faced by musicians and producers when it comes to anticipating the outcomes of processes involving the experimental use of new technology and, in turn, will acknowledge the potentially productive impact of the technologically unexpected on our sonic imaginations.


The Prehistory: “Organic” and “Machinic” Rhythms in the Popular Music Mainstream According to Tim Armstrong (1998), two different views of the relationship between technology and the body exist within modernism. On the one extreme, there is technological utopia, represented by Freud’s notion of technology as a positive prosthesis in which human capacities are extrapolated. In this view, “[t]echnology offers a re-formed body, more powerful and capable, producing in a range of modernist writers a fascination with organ-extension, organ-replacement, sensory-extension” (Armstrong 1998, 78). At the other extreme, we find writers adhering to the Marxist view of technology as an alienating means of industrial production. Here the technological advances underlying commodity capitalism result in a subordination of the human to the machine, promoting a nonhuman form of mechanical repetition and standardization. In the field of music, technology has generally taken on a role that is in accordance with the former view, namely as a positive extension of the human body. This pertains, for example, to traditional instruments such as pianos and clarinets (see, e.g., the discussion in Kvifte 1989) and to the increasing use of experimental recording and processing technologies. Some of the musical ideas that developed in rock in the late 1960s, for example, were not doable without such musical “prostheses.” Similarly, within the field of electroacoustic music, various electronic and computerized technologies have been regarded as progressive and liberating tools for music creation. However, we also find tendencies of Marxist determinism that apply to music. This is prominent both in the discourse on various technologies’ roles in promoting mass distribution of music and the Frankfurt school’s critical discourse on popular music as a cultural response to the standardization and commodification typical of capitalist industrial production (Adorno 1990; Horkheimer and Adorno 2002). In this chapter, I will focus on rhythmic popular music and use as my starting point the emergence of a discursive and performative tension that resonates with the Marxist view on technology just presented, in the sense that it situates human expression and machine-made musical creation as two opposing extremes. This tension developed as a response to the depreciation of disco and other repetitive rhythmic music as commercial and commodified “machine” music that emerged in the wake of the crossover success of black dance music in the popular music mainstream in the late 1970s.1 The immense popularity of disco was probably crucial here; the style represented new tools (click track and the analog sequencer) and a new aesthetics (four-to-the-floor), and threatened the ideological and commercial position of white Anglo-American rock that, up to this point, had dominated the mainstream for several decades.2 As a consequence, an increasing polarization between what might be called “organic” and “machinic” rhythms emerged.3 On the one hand, artists played styles, such as rock, country, funk, and jazz, that were characterized by rhythmic feels that derived from

both deliberate and unintended variations that musicians add to their performances; on the other hand, there were artists who produced sequencer-based dance music with a futuristic machine aesthetic, as expressed in Kraftwerk's albums Man-Machine (1978) and Computer World (1981). These latter grooves, enabled by analog sequencers, were often perceived to be nonhuman and mechanistic, largely because of the absence of micro-level flexibility in the temporal placement of rhythmic events that were all forced into the grid provided by the sequencer. The absence of variation in sound in analog (and early digital) sequencer-based groove was probably also crucial to this dichotomy; the small shifts in intensity and timbre that are always present in performed music were absent in these early sequencer-based rhythms.4 This division in rhythmic design within 1970s popular music is probably crucial to any subsequent understanding of why rhythmic patterns consisting of grid-ordered events are experienced as lacking a human touch even when they are produced by a human. Rhythmic subdivisions that are too evenly played still tend to make us think of a machine. Loose timing, on the other hand, tends to be described as organic and evokes associations with human performance, even when those patterns and variations have been generated by a computer.5 The mechanistic aspect of perfectly even timing in sequencers from the predigital and early digital era was often countered through the introduction of a humanizing function that altered the beats of a musical sequence according to a random series of deviations, making them less nonhumanly perfect. However, even though this may be thought to match motor and timekeeper noise in human timing, such random deviations are not typical of groove-based music, that is, music organized around a repetitive rhythmic pattern. As many studies have shown, deviations in groove-based music are to a large extent systematic (Bengtsson et al. 1969; Butterfield 2010; Danielsen 2006, 2010b; Iyer 2002), meaning that the same pattern of microtiming (that is, the early and late marking of beats) is repeated in each repetition of the basic pattern (usually one or two bars in length).6 Research has also shown that in performed music fluctuations that exceed this basic pattern are not random either but are instead both long-range and correlated (Hennig et al. 2011). Prior to the increased temporal flexibility of later digital sequencers and digital audio-sequencing (which was introduced in the early 1990s; see Brøvig-Hanssen and Danielsen 2016, chap. 6; Burgess 2014, chap. 11), then, there was both an ideological and a de facto difference between played and machine-generated rhythm that was associated with the constraints of the conditions of production within these two spheres. Machine rhythm lacked the intended (and unavoidable unintended) temporal and sonic variations that were typical of human performance.
Likewise, humans were simply unable to produce the extreme evenness of the machine.7 As we shall see in the following section, this traditional link between machine-based music and stiffness has been disrupted by new opportunities for creating microrhythmic designs in the DAW—first, because the DAW seems to be able to produce the entire spectrum of rhythmic feels previously associated with human performance, and second, because human- and computer-based rhythms are often, in fact, deeply embedded in one another, not least through the ways in which human performances are routinely

598   anne danielsen used as raw material for producing rhythms in the DAW. Today, therefore, it is very difficult to distinguish between human- and computer-generated performances. Nonetheless, even though the division between human- and machine-based rhythms has been transcended when it comes to what the machine can actually produce, the two related aesthetic paradigms—even rhythm on the grid, on the one hand, and deep, groovy rhythmic designs, on the other—have to some extent been continued. At the mechanistic extreme of the rhythmic continuum, we find forms of electronic dance music (EDM), in which machine-like timing is a distinguishing stylistic feature and even a preference long after alternatives to it had become available in the early 1990s (Zeiner-Henriksen 2010). At the “organic” extreme of the continuum, we find the deep, groovy rhythm of African American–derived, computer-based rhythmic genres. What is used to realize these two fundamental rhythmic inclinations, however, is no longer so different because, in the age of the DAW, they typically come from the same production tools. A crucial factor in defining a possibly new late-digital condition regarding the field of musical rhythm, then, is the manner in which the distinction between organic and machinic rhythm has been transcended. Agency and automation, understood as creative strategies, inform both mechanistic rhythmic expressions and deep, groovy feels. I will now conduct a closer inspection of these two aesthetic trends in contemporary musical rhythm.
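Before turning to these two trends, the distinction drawn above between a random "humanize" function and the systematic microtiming of groove-based music can be illustrated with a minimal sketch (Python with NumPy; the tempo, the jitter size, and the feel pattern are invented values for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

BPM = 96
sixteenth = 60.0 / BPM / 4           # duration of one sixteenth note in seconds
grid = np.arange(16) * sixteenth     # one bar of quantized sixteenth-note onsets

# "Humanize": independent random deviations, drawn anew for every repetition
humanized_bar1 = grid + rng.normal(0.0, 0.010, size=grid.size)   # ~10 ms jitter
humanized_bar2 = grid + rng.normal(0.0, 0.010, size=grid.size)

# Systematic microtiming: one fixed pattern of early/late onsets (in seconds),
# repeated identically with every repetition of the basic one-bar pattern
feel = np.tile([0.0, 0.012, -0.008, 0.015], 4)    # illustrative values only
grooved_bar1 = grid + feel
grooved_bar2 = grid + feel

print(np.allclose(grooved_bar1, grooved_bar2))      # True: the same feel recurs
print(np.allclose(humanized_bar1, humanized_bar2))  # False: the jitter does not
```

The point is simply that the groove's deviations form a fixed pattern that recurs with every bar, whereas humanized deviations are drawn anew each time.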

Microrhythmic Manifestations of the Digital Audio Workstation: Two Trends The first trend comprises electronica-related styles whose rhythmic events align with a metrical grid. Common to the musicianship of the artists representing this trend is a preference for exaggerated tempi and an attraction to the completely straightened-out, square feel of quantization. As pointed out earlier, this was both an aesthetic preference and a technological constraint in the analog, sequencer-based tradition that this trend grew out of. In the early days of this trend, high-pitched sounds such as the hi-hat cymbal (or something else that fills the same musical function) were programmed unnaturally—either too quickly or too evenly or both—specifically to connote a machine-like aesthetics (Zagorski-Thomas 2010; Inglis 1999). The sound of these songs, then, evokes an overdone, even unlikely virtuosity that I have elsewhere labeled the “exaggerated virtuosity of the machine” (Danielsen  2010a). Prominent pioneering artists of this rhythmic trend include Aphex Twin (the performing pseudonym of Richard D. James), Autechre (Sean Booth and Rob Brown), and Squarepusher (Tom Jenkinson), all of whom entered the electronica scene in the 1990s and are associated with the label Warp. After a few years, this aesthetic strategy had traveled from these avant-garde electronica toolboxes to, for example, the title track of the Destiny’s Child

album Survivor (Columbia 2001), thus entering the popular music mainstream. The fast speed and quantized evenness of many of the tracks on such albums anticipate the related process of musical granulation—that is, of crystallizing "sonic wholes" into grains, so that musical or nonmusical sounds are chopped up into small fragments and reordered to produce a stuttering rhythmic effect. This aesthetic also promotes a tendency to transform sounds with an otherwise clear semantic meaning or reference point—such as a musical source or a different musical context—into "pure" sound (see, for example, Harkins 2010). Sounds or clips are also often combined in choppy ways that underline sonic cut-outs, rather than disguising them, resulting in a skittering collage. The label glitch music8—a substyle of electronic dance music associated with the artists mentioned in the previous paragraph—hints at the ways in which we perceive these soundscapes, namely as a coherent sonic totality that has been "destroyed," meaning chopped up and reorganized anew.9 An important point here, which Brøvig-Hanssen discusses at length, is that this approach to sound relies on the listener being able to imagine a "music within the music"—that is, a fragmented sound presupposes an imagined and spatiotemporally coherent sound (Brøvig-Hanssen 2013). This operation, however, becomes particularly precarious when the manipulated element is a voice. Brøvig-Hanssen's detailed analysis of the manipulations of the vocal track in two versions of Squarepusher's "My Red Hot Car,"10 where one is a "glitched" version of the other, clearly demonstrates the ways in which meaning is transformed when sound is manipulated away from what one normally regards as the field of possible human utterances. In the glitched version, the vocal track has been "deformed"—sounds are cut off too early, there are repeated iterations of sound fragments separated by signal dropouts, and fragments are dislocated from their original locations (Brøvig-Hanssen and Danielsen 2016, chap. 5)—in a manner that clearly departs from the human. Still, it is also hard to hear the vocal track as purely musical (that is, not sung) sound. One tends to persist in imagining a human being (and a coherent message) behind the stuttering rhythm, since the voice always tends to be, first and foremost, an indexical sign of the human body and a clear path from source through musical performance to recording. Consequently, "[w]e can discern two layers of music, the traditional and the manipulated, neither of which, in this precise context, makes sense without the other" (Brøvig-Hanssen and Danielsen 2016, 95).
The effect of chopping up the crash cymbal of the much-sampled Amen break, for example, relies heavily on the fact that it is an initially acoustic, and thus very rich, sound.11 When human musicking is transformed through computer-based procedures, one is thus confronted by both a break with and a continuation of the existing mechanistic aesthetics of some kinds of rhythm. The sound is different (richer, less pure), but the groove is produced, as with most EDM-related styles, not by manipulating

600   anne danielsen temporal relationships but by introducing an interesting system of dynamics within the domain of sound. In the jungle genre, however, from which many of Oliver’s examples are drawn, it is not the dynamics of one sound that are the foci, but rather the microrhythmic effect that can be achieved through a compelling montage of fragments of the sound—that is, through the disruption and reordering of the parts of a sound. Whereas no microtiming is usually involved in this practice—all of the events are on the grid—a second trend, on the contrary, pushes the perceptual boundaries of timing discrepancies and irregularities to the limit and, in some cases, beyond. It concerns the increasing experimentation with, and manipulation of, the microtiming of rhythmic events through moving tracks back and forth on the time axis while otherwise cutting and reordering, editing and warping—in short, transforming longer stretches of sampled or played sounds. This trend produces rhythmic feels that are experientially very different from those above. Here, it is primarily the temporal relationships—durations, interonset intervals, the temporal envelope—that are being altered to great effect. One way of manipulating the original timing of performed music is simply to move rhythmic events or whole tracks to new temporal positions. In the former case, the result can be severe discrepancies between rhythmic events that were initially aligned (beatwise). In the latter case, moving an entire track in a multitrack recording introduces multiple locations for the pulse at the micro level. This strategy can be heard on D’Angelo’s Voodoo album (1999). Inspired by the glitch aesthetic of legendary hip hop producer-artist J Dilla (more on his music later), many of the Voodoo tracks display sharp discrepancies between rhythmic events that are happening on the same beat. In the tune “Left & Right,” for example, visual amplitude/time representations of the groove reveal that the discrepancy of the pulse location of the guitar layer and the pulse location of the bass/ bass drum layer is between fifty and eighty milliseconds, or up to one thirty-second note at the song’s tempo, which is close to ninety-two beats per minute (Danielsen 2010b). In an analysis of another song on this album, “Untitled (How Does It Feel),” Bjerke (2010) measures the distance between the multiple locations of the basic pulse at around ninety milliseconds. As D’Errico points out, the instability introduced through such a destabilizing maneuver tends to become normalized in the context of a stable and repetitive loop (D’Errico 2015, 283). However, such interventions nonetheless introduce a characteristic nonhuman, halting feel to the groove which, in turn, conveys the impression that the feel aspect of the groove is somewhat overdone. The experimental hip hop and neo-soul coming out of the Soulquarian collective to which D’Angelo belonged, together with artists and bands such as Common, the Roots, and Erykah Badu, might be considered a form of the avant-garde within African American–derived rhythmic genres. However, recordings by more mainstream contemporary R&B and rap artists from the early 2000s display the innovative use of digital tools as well. 
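The arithmetic behind such measurements is straightforward to check: at roughly ninety-two beats per minute a quarter note lasts about 652 milliseconds and a thirty-second note about 82 milliseconds, so discrepancies of fifty to ninety milliseconds do indeed amount to up to (and slightly beyond) one thirty-second note. A short sketch of the calculation (Python; the tempo is the approximate figure cited above):

```python
BPM = 92                      # approximate tempo of "Left & Right"
quarter = 60.0 / BPM          # ~0.652 s per beat
thirty_second = quarter / 8   # ~0.082 s

# Discrepancies reported in the analyses cited above, in milliseconds
for offset_ms in (50, 80, 90):
    print(f"{offset_ms} ms = {offset_ms / 1000 / thirty_second:.2f} thirty-second notes")
```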
Carlsen and Witek, in an analysis of the song “What about Us” from Brandy’s innovative album Full Moon (Atlantic 2002, produced by Rodney Jerkins), show how the peculiar rhythmic feel of that tune derives from simultaneously sounding rhythmic events that “appear to point to several alternative structures that in turn imply differing placements of the basic beat of the groove. Though these sounds might coincide

transformations of rhythm in the digital audio workstation   601 as sounds, then, they do not coincide as manifestations of structure” (Carlsen and Witek 2010, 51). An illustration of this phenomenon would be, for example, when a hi-hat structurally referring to the last sixteenth note before a downbeat is delayed to such an extent that it coincides with the sound that in fact structurally represents that downbeat (a bass drum, perhaps). In other words, rather than being perceived as deviations from a shared underlying reference structure, such simultaneously sounding rhythmic events point to several alternative structures that in turn imply differing placements of the basic beat at the microlevel of the groove. The result is akin to the rhythmic feel of the D’Angelo groove described earlier, where there are multiple locations of the pulse that merge into one extended beat at the microlevel of the groove. Radical warping procedures can also be heard on several tracks of Snoop Dogg’s innovative album R&G (Rhythm & Gangsta): The Masterpiece (Geffen 2004). Here, several producers, among them J. R. Rotem and Josef Leimberg, contributed their takes on grooves where the feel aspect is almost overdone as a consequence of manipulation of rhythm in the DAW, leading to what I have earlier called the “exaggerated rhythmic expressivity of the machine” (Danielsen 2010a, 1). The groove in “Can I Get a Flicc Witchu” (produced by Leimberg) consists of a programmed bass riff and a drum kit, along with vocals that are mainly rapped. The texture of the groove is simple and open, but the microrhythmic relationships within it are muddy and complex. There are two forms of time warping going on here. First, the length of the beats is gradually shortened, so that beat 2 is shorter than beat 1, beat 3 is shorter than beat 2, and so on. This may be due to the use of tempo automation, a function that was available in the DAW at the time of production of Rhythm & Gangsta. This form of manipulation contributes to a general vagueness as to the positioning of rhythmic events. Second, the bass pattern follows its own peculiar schematic organization and is a main reason for the “seasick” rhythmic feel of the tune. This pattern neither relates to the 4/4 meter nor conforms to a regular periodicity of its own (for a detailed analysis, see Brøvig-Hanssen and Danielsen 2016, chap. 6). Its peculiar feel has most likely been produced in ProTools after the recording,12 either by adjusting the temporal onsets of the programmed events forming the bass riff pattern until the sought-after effect was achieved, by recording the bass riff separately in free rhythm, or by sampling the bass riff from a different source altogether. In the latter two cases, the recording or sample usually has to be deformed in various ways to fit the length of the repeated unit of the destination groove. The producer could also cut out a piece of the source (a recording or a sample) that has the exact length of the loop and paste it into the new musical context, regardless of any resulting mismatches in meter and tempo. This strategy recalls the work of J Dilla, and the sounding result in “Can I Get a Flicc Witchu” resembles the peculiar feels of J Dilla’s Donut album (2006), where the natural periodicity of the original samples is also often severely disturbed by the shortening or lengthening of one or more beats/slices of the sample. When this type of operation is looped, again, the result is a dramatically halting, deformed, human feel. 
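The two forms of warping described above can be sketched schematically as follows; this is not a reconstruction of the actual production process, and all numerical values (tempo, shrink factor, riff onsets) are invented for illustration (Python with NumPy):

```python
import numpy as np

BPM = 90
nominal_beat = 60.0 / BPM        # nominal beat length in seconds
shrink = 0.97                    # each beat 3 percent shorter than the last (illustrative)

# Onset times of four beats in one bar under a gradual, tempo-automation-like warp:
# beat 2 is shorter than beat 1, beat 3 shorter than beat 2, and so on
beat_lengths = nominal_beat * shrink ** np.arange(4)
onsets = np.concatenate(([0.0], np.cumsum(beat_lengths)[:-1]))

# Deforming a freely recorded riff to fit the warped bar: stretch its onsets so the
# riff's total duration matches the new bar length (a crude warp, no audio resampling)
riff_onsets = np.array([0.00, 0.31, 0.74, 1.52, 2.10])   # seconds, illustrative values
bar_length = beat_lengths.sum()
warped_riff = riff_onsets * (bar_length / riff_onsets[-1])

print(np.round(onsets, 3))       # the grid creeps earlier with every beat
print(np.round(warped_riff, 3))  # the riff is deformed to fit the loop length
```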
The Snoop Dogg example demonstrates some of the ways in which samples can be manipulated timewise through various warping procedures, the results of which resemble the effect of the (re)positioning of rhythm tracks and events typical of

D'Angelo's music from Voodoo onward. An additional dimension of J Dilla's music, however, is the way in which he—despite transforming his sample in fundamental ways—manages to keep the sample's world of associations somewhat intact. Even though he disturbs human musical gestures by introducing glitches into their natural flow, his music is generally derived from the cutting and splicing of one or a very few sampled sounds, which allows them to remain readily recognizable. His work therefore contrasts with the "quantized" glitch aesthetics described above, where the automated procedures for cutting and splicing/relocating sonic fragments tend to destroy the sources and meanings of the samples. D'Errico also points to J Dilla's characteristic habit of reconfiguring single musical sources—that is, he often "abstains from juxtaposing various samples into a multi-layered loop, instead rearranging fragments of a single sample into an altogether different groove" (D'Errico 2015, 283). This strategy underlines the surreal effect of the glitched version of the sample and shows the extent to which the meaning of the end result in such cases is highly parasitic on its source. When a sample keeps enough of its character to point toward its original aesthetic universe, which in the case of J Dilla is often a world of easy listening or light entertainment, the effect of the "corrupted" sound file or the imperfection of the loop becomes conspicuous. Benadon (2009) notes that time warps are common in predigital music as well. In early jazz, for example, the original rhythmic template might be distorted (in performance) through acceleration, deceleration, or a combination of these within the time span of the template.13 Global transformations of tempo might also affect the perception of stability of the rhythmic template, since all tempo transformations happen in relation to a rhythmic anchor and therefore introduce a sense of tension and release against that anchor. These forms of "analog" time warps, however, tend to have a continuous character.14 They gradually (organically) evolve, whereas digital time warps, probably because they are not implemented and modified by human musicking, tend to be introduced more abruptly and are thus often heard as un-organic or glitched. Both trends described above are parasitic on our notion of a pre-existing musical whole—something that was not deformed has been twisted or bent, a whole has been cut up and reordered, something that did not show any sign of failure or defect has been manipulated to come forward as containing a glitch. The perceived nonhuman character of these digital manipulations presupposes a notion of musical humanness—that is, an imagining of what the typically human gesture that has been disturbed or destroyed once was.
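The basic cut-and-splice operation underlying the single-source strategy discussed above can be sketched as follows (Python with NumPy; the slice count and the reordering pattern are arbitrary, and random noise stands in for an actual sampled bar):

```python
import numpy as np

SR = 44100
rng = np.random.default_rng(1)

# Stand-in for a sampled two-second bar (e.g., a single break); noise for simplicity
sample = rng.standard_normal(2 * SR)

# Slice the single source into sixteenth-note-sized fragments...
n_slices = 16
slices = np.array_split(sample, n_slices)

# ...and splice them back together in a new order, repeating some fragments and
# dropping others, which yields the stuttering, reordered groove described above
new_order = [0, 0, 3, 2, 7, 7, 7, 4, 12, 1, 1, 10, 15, 6, 6, 0]
reordered = np.concatenate([slices[i] for i in new_order])

print(len(sample), len(reordered))  # lengths may differ slightly, since slices repeat
```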

An Extension of the Human?

Playing and making music have always been embedded in technology. The opposition between organic and machinic musical expressions in late 1970s and early 1980s popular music thus comes forward as partly ideological: all music-making means being deeply involved in its technology, or, in the words of Nick Prior:

transformations of rhythm in the digital audio workstation   603 It is not just that technology impacts upon music, influences music, shapes music, because this form of weak technological determinism still implies two separate domains. Music is always already suffused with technology, it is embedded within technological forms and forces; it is in and of technology.  (2009, 95)

Relating this point to a more general epistemological discourse, we could say that new technology creates new understanding, and that we have always learned to know the world through the tools and technologies that we use to interact with our surroundings. As Heidegger makes us aware of in his essay “The Question Concerning Technology” (1977), there is no alternative route to the knowledge we acquire through technology. Moreover, the insights that we derive from technology cannot be separated from the technology itself; through technology we achieve knowledge about the world in a way and to an extent that would otherwise be unavailable to us. In the words of Heidegger: “[Techne] reveals whatever does not bring itself forth and does not yet lie here before us, whatever can look and turn out now one way and now another” (1977, 8). The idea that man and technology are opposed to each other is thus, according to Heidegger, beside the point—instead, the machine should, in line with the “technology as prosthesis”— view presented earlier, be seen as an extension of the human. Digital technology has reactualized this debate in music-making, and from this perspective one might ask whether the rhythmic feels discussed previously really represent the results of a radically new “posthuman condition,” or whether they ought to be understood as part of the continuous development of technology’s ever-present role as an aid to, and extension of, human expression and behavior. According to the latter position, so-called posthuman expressions are not after or outside of the human repertoire at all. Instead, they should be considered simply the most recent expansion of that repertoire. This would mean, in turn, that the microrhythmic manipulation made possible by the DAW represents, in principle, nothing new, because there is nothing new in the fact that new technology produces new forms of knowledge, expression, and behavior or that it expands the scope of the human imagination. As pointed out at the start of this chapter, however, after the introduction of sequencer-based grooves in the popular music mainstream in the late 1970s, performed and machine-generated music tended to align with two distinct aesthetic fields. For some years, these two fields made use of different sets of tools that produced very different sonic results. Consequently, performed and machine-generated music came to represent different worlds of musical expression and imagination in the following decades. Microrhythmic manipulation in the DAW has brought about a new aesthetic situation marked by convergence between these two musical-rhythmic poetics. Performed and machine-generated music are, in the late-digital era, deeply embedded in one another— first, because both digital and traditional music technologies are used to achieve the desired musical results in both domains, and, second, because the respective contributions of these different technologies are in many cases (such as the examples discussed in this chapter) almost impossible to distinguish from one another in the end result. Accordingly, it would be wrong to speak of a hybridization of the two, because this

604   anne danielsen presupposes two separate and still recognizable entities that have been combined. Rather, performed and machine-generated rhythms have, in many contemporary genres, morphed, making it impossible to separate their respective influences. We are most likely yet to see the full consequences of this development, which also includes a wide range of new interfaces for organic control of computers and music machines.15 The flexibility of the DAW, our contemporary music machine, has contributed tremendously to this ongoing transformation, from an either/or to a both/and where the distinction between organic and machinic musical expressions feels of little relevance. The timing of musicians is warped in the DAW, then copied by other musicians who are in turn manipulated in new machine-generated renderings, and on it goes. Even the very current examples of the creative usage of digital pitch correction illustrates this point. Autotune is another instance of a fundamental morphing of human and machine that is made possible by digital tools that have extended the human expressive repertoire; sometimes the result of this morphing is a voice that captures certain human states or conditions better than the unmediated human voice, which is perhaps the most human of all instruments (see Brøvig-Hanssen and Danielsen 2016, chap. 7). We might then wonder whether we are in a new phase in the interaction between the musicking human and the machine, a phase that is characterized by an even more radical undermining of a possible ontological separation between man and technology than what characterizes the musician-instrument interaction typical of predigital times.

Imagining the "Humachine" through Sound

So, were the creators of the new rhythmic feels discussed earlier capable of imagining the end result (and its wider implications), or did these new feels simply arise by accident and become labeled as such by the collective imaginations of the consumers/receivers? This is a question that invites a double answer. No, the creators probably did not anticipate the effect of their experiments with new technology, and they were—and are, in line with Heidegger's insights above—certainly not capable of foreseeing their wider results. On the other hand, new rhythmic feels such as those discussed above do not simply happen. The processes leading to them begin with the intention of creating new sound. Generally, mechanized procedures for generating new musical material represent a well-known strategy for innovative music-making that was employed by, for example, the composer Pierre Boulez from the 1950s onward. His practice and reflections make it clear that the point of using such procedures was often to come up with something unimaginable, with completely new sonic raw material, that could then be shaped through intentional compositional procedures (see Guldbrandsen 2011, 2015).

transformations of rhythm in the digital audio workstation   605 The same goes for the creation of the rhythmic feels discussed previously. As we have seen, an experimental attitude in combination with playfulness and creative abuse of new technology may result in as-yet-unheard sonic results. The flip side of this is that, as soon as those new sounds have been produced, they start inhabiting the imaginations of their creators and the listeners. As to the groove-based music discussed in this chapter, the relationship between rhythm and motion is clearly a case in point. The groove qualities of rhythmic music are often related to the music’s perceived ability to make one’s body move. Exactly how various rhythmic feels are connected to body movement certainly remains an open question, but recent perspectives from the field of embodied music cognition pave the way for a close connection between rhythm and perceived and performed motion (e.g., Chen et al. 2008; Danielsen et al. 2015; Godøy et al. 2006; Large 2000; Leman 2008; Repp and Su 2013). Generally, discussions of the relationship between rhythm and corporeality in music listening point to the real and underacknowledged possibility that we structure our actual musical experiences according to patterns and models received from extra-musical sources, such as actual movements (see also Godøy, this volume, chapter 12). This is probably also a clue as to why we manage to adjust to and structure the peculiar warped grooves discussed above: we draw on our internalized repertoire of already acquired gestures to make sense of a new timing pattern. Put simply, if we find a way to move to those grooves, we then come to “understand” them. However, not only do dance and movement affect the way we experience and understand grooves, inner or outer movements can also be induced or proposed by music; that is, new gestures can be proposed by a piece of music. The rhythmic feels discussed earlier may thus be a means of imagining completely new movement patterns, or gestural designs, that are typical of the music of the humachine. Similar to the ways in which the glitched and warped grooves described above both evoke and deform their own “originals,” such imagined gestural designs may feel at one and the same time connected and completely alien to us. As we develop ways of internally or externally responding to these grooves, however, we also develop an understanding of these new gestural imaginations, which at present goes well beyond our “natural” repertoire (here understood as what we regard as possible for human beings in the present historical situation). Sounds that are shaped by way of digital processing may thus evoke sonically based imaginations not only of the sources behind them (what kind of creature makes this sound) but also of morphed, human-machine motion. Put differently, the sound of the DAW proposes a wide variety of new and peculiar ways of singing (the morphing of human and machine through autotuning), talking (glitched stuttering vocal tracks), and moving (warped, deformed human gestures). Today, these are experienced as different and marked by technological intervention, but who knows? In future renderings, they might be regarded as completely commonplace, perhaps as ordinary as talking with people on the other side of the Atlantic through the telephone and hearing the whispering of singers from an enormous stadium stage are today.


Notes

1. For a discussion of how this crossover success changed black dance music, see Danielsen (2006, chaps. 6 and 7, 2012).
2. According to Paul Théberge, contrary to the 1960s, when experimentation with, for example, distorted guitar sound and multitrack recording “created excitement around new sounds and electronic effects” (1997, 1), the late 1970s saw a skepticism toward electronic instruments. According to Théberge, this skepticism (among, one might add, rock musicians and their audiences) emerged as a consequence of the widespread reaction to disco (1997, 2).
3. For a critical discussion of this polarization, see, for example, Simon Frith’s essay “Art versus Technology” (1986).
4. Interestingly, in an article in Sound on Sound as late as October 1999, this absence of variation in sound is still lamented when one is striving for realistic, sequenced drum parts: “[A] main problem with many sampled sound sets is that they do not reflect the ways in which the sound of real percussion instruments varies depending on the force with which they’re struck” (Inglis 1999). This uniformity is particularly acute with hi-hat strokes: “Standard drum kit sets, particularly those conforming to the general MIDI drum map, suffer persistent problems. Perhaps the most obvious of these is the use of only three different hi-hat sounds—open, closed and pedal—when real drumming makes use of a continuous range of sounds from quiet to soft, from tight closed to open” (Inglis 1999).
5. Today, both machinic and organic music rely heavily on technological tools and are produced by way of the DAW. Whether a piece of music is placed in the one category or the other, then, has little to do with the kind of tools involved or the degree of technological involvement. Rather, it is a question of aesthetics and of the degree to which the use of technology is exposed or made opaque to the listener (Brøvig-Hanssen 2010).
6. In addition to such systematic timing, there are also individual patterns (see, for example, Repp 1996).
7. The fact that humans make mistakes, whereas machines are associated with (nonhuman) perfection, is also the backdrop for the experience of the “vulnerable,” and thus more human, machine—as though technological mistakes somehow resemble our own imperfections. According to Sangild, a technological failure such as a glitch thus gives us a sense of “something living [it] displays the fragility and vulnerability of technology” (2004, 268). Dibben (2009) also underlines this humanizing effect of technological failure in a discussion of Björk’s use of technology.
8. “Glitch” initially referred to a sound caused by malfunctioning technology. As Sangild (2004) points out, these sounds of misfiring technology in fact expose technology as such (266), or render it opaque (Brøvig-Hanssen 2010).
9. Whereas automated cutting processes could initially only be applied to prerecorded sound, they can now be used in real time. For an introduction to the algorithmic procedures underlying different automated cutting processes in live electronica performance, see Collins (2003).
10. The two versions were released as the first two tracks of Squarepusher’s EP My Red Hot Car (Warp 2001). The second track was subsequently placed on the Squarepusher album Go Plastic (Warp 2001).
11. The Amen break refers to a drum solo performed by Gregory Cylvester Coleman in the song “Amen, Brother” (1969) by The Winstons.

12. See Johnson (2005) for an overview of the equipment used in Snoop Dogg’s recording studio at the time.
13. This phenomenon parallels the local time shift phenomenon as described by Desain and Honing (1989). See also Danielsen (2010a).
14. “Analog” performance practice is, of course, also open to sudden transitions, for example in the form of tempo shifts. Research has shown that these can be rather abrupt (see, for example, Cook 1995; Bowen 1996). However, the particularly glitched character of digital time warps is difficult to achieve with conventional instruments.
15. For an overview of advances in interfaces for musical expression from the last fifteen years, see Jensenius and Lyons (2017).

References

Adorno, T. W. 1990. On Popular Music. In On Record: Rock, Pop, and the Written Word, edited by S. Frith and A. Goodwin, 301–314. London: Routledge.
Armstrong, T. 1998. Modernism, Technology, and the Body: A Cultural Study. Cambridge: Cambridge University Press.
Benadon, F. 2009. Time Warps in Early Jazz. Music Theory Spectrum 31 (1): 1–25.
Bengtsson, I., A. Gabrielsson, and S. M. Thorsén. 1969. Empirisk rytmforskning. Svensk tidskrift för musikforskning 51: 48–118.
Bjerke, K. Y. 2010. Timbral Relationships and Microrhythmic Tension: Shaping the Groove Experience through Sound. In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen, 85–101. Farnham, UK: Ashgate.
Bowen, J. A. 1996. Tempo, Duration, and Flexibility: Techniques in the Analysis of Performance. Journal of Musicological Research 16 (2): 111–156.
Brøvig-Hanssen, R. 2010. Opaque Mediation: The Cut-and-Paste Groove in DJ Food’s “Break.” In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen, 159–176. Farnham, UK: Ashgate.
Brøvig-Hanssen, R. 2013. Music in Bits and Bits of Music: Signatures of Digital Mediation in Popular Music Recordings. PhD thesis. University of Oslo.
Brøvig-Hanssen, R., and A. Danielsen. 2016. Digital Signatures: The Impact of Digitization on Popular Music Sound. Cambridge, MA: MIT Press.
Burgess, R. J. 2014. The History of Music Production. Oxford: Oxford University Press.
Butterfield, M. 2010. Participatory Discrepancies and the Perception of Beats in Jazz. Music Perception 27 (3): 157–176.
Carlsen, K., and M. A. G. Witek. 2010. Simultaneous Rhythmic Events with Different Schematic Affiliations: Microtiming and Dynamic Attending in Two Contemporary R&B Grooves. In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen, 51–68. Farnham, UK: Ashgate.
Chen, J. L., V. B. Penhune, and R. J. Zatorre. 2008. Listening to Musical Rhythms Recruits Motor Regions of the Brain. Cerebral Cortex 18: 2844–2854.
Collins, N. 2003. Recursive Audio Cutting. Leonardo Music Journal 13: 23–29.
Cook, N. 1995. The Conductor and the Theorist: Furtwängler, Schenker, and the First Movement of Beethoven’s Ninth Symphony. In The Practice of Performance: Studies in Musical Interpretation, edited by J. Rink, 105–125. Cambridge: Cambridge University Press.

Danielsen, A. 2006. Presence and Pleasure: The Funk Grooves of James Brown and Parliament. Middletown, CT: Wesleyan University Press.
Danielsen, A. 2010a. Introduction. In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen, 1–18. Farnham, UK: Ashgate.
Danielsen, A. 2010b. Here, There and Everywhere: Three Accounts of Pulse in D’Angelo’s “Left and Right.” In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen, 19–35. Farnham, UK: Ashgate.
Danielsen, A. 2012. The Sound of Crossover: Micro-Rhythm and Sonic Pleasure in Michael Jackson’s “Don’t Stop ‘Til You Get Enough.” Popular Music and Society 35 (2): 151–168.
Danielsen, A., M. R. Haugen, and A. R. Jensenius. 2015. Moving to the Beat: Studying Entrainment to Micro-Rhythmic Changes in Pulse by Motion Capture. Timing and Time Perception 3 (1–2): 133–154.
D’Errico, M. 2015. Off the Grid: Instrumental Hip-Hop and Experimentalism after the Golden Age. In The Cambridge Companion to Hip-Hop, edited by J. A. Williams, 280–291. Cambridge: Cambridge University Press.
Desain, P., and H. Honing. 1989. The Quantization of Musical Time: A Connectionist Approach. Computer Music Journal 13 (3): 56–66.
Dibben, N. 2009. Björk. Bloomington, IN: Indiana University Press.
Frith, S. 1986. Art versus Technology: The Strange Case of Popular Music. Media, Culture & Society 8 (3): 263–279.
Godøy, R. I., E. Haga, and A. R. Jensenius. 2006. Playing “Air Instruments”: Mimicry of Sound-Producing Gestures by Novices and Experts. In Gesture in Human-Computer Interaction and Simulation: 6th International Gesture Workshop, GW 2005, Berder Island, France, May 18–20, 2004, Revised Selected Papers, edited by S. Gibet, N. Courty, and J.-F. Kamp, 256–267. Berlin and Heidelberg: Springer-Verlag.
Guldbrandsen, E. E. 2011. Pierre Boulez in Interview 1996 (II): Serialism Revisited. Tempo 65 (256): 18–24.
Guldbrandsen, E. E. 2015. Playing with Transformations: Boulez’s Improvisation III sur Mallarmé. In Transformations of Musical Modernism, edited by E. E. Guldbrandsen and J. Johnson, 223–244. Cambridge: Cambridge University Press.
Harkins, P. 2010. Microsampling: From Akufen’s Microhouse to Todd Edwards and the Sound of UK Garage. In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen, 19–35. Farnham, UK: Ashgate.
Heidegger, M. 1977. The Question Concerning Technology and Other Essays. New York: Harper & Row.
Hennig, H., R. Fleischmann, A. Fredebohm, Y. Hagmayer, J. Nagler, A. Witt, et al. 2011. The Nature and Perception of Fluctuations in Human Musical Rhythms. PLoS One 6 (10).
Horkheimer, M., and T. W. Adorno. 2002. The Culture Industry: Enlightenment as Mass Deception. In The Dialectic of Enlightenment: Philosophical Fragments, 94–136. Stanford, CA: Stanford University Press.
Inglis, S. 1999. 20 Tips on Creating Realistic Sequenced Drum Parts. Sound on Sound, October. https://web.archive.org/web/20160327093715/http://www.soundonsound.com:80/sos/oct99/articles/20tips.htm. Accessed December 17, 2018.
Iyer, V. 2002. Embodied Mind, Situated Cognition, and Expressive Microtiming in African-American Music. Music Perception 19 (3): 387–414.
Jensenius, A., and M. J. Lyons. 2017. A NIME Reader: Fifteen Years of New Interfaces for Musical Expression. Berlin: Springer.

Johnson, H. 2005. The Cathedral. Mix Magazine, July 1. http://www.mixonline.com/news/profiles/cathedral/377333. Accessed December 17, 2018.
Kvifte, T. 1989. Instruments and the Electronic Age: Toward a Terminology for a Unified Description of Playing Technique. Oslo: Solum.
Large, E. W. 2000. On Synchronizing Movements to Music. Human Movement Science 19 (4): 527–566.
Leman, M. 2008. Embodied Music Cognition and Mediation Technology. Cambridge, MA: MIT Press.
Oliver, R. 2015. Rebecoming Analogue: Groove, Breakbeats and Sampling. PhD thesis. University of Hull.
Prior, N. 2009. Software Sequencers and Cyborg Singers: Popular Music in the Digital Hypermodern. New Formations 66 (1): 81–99.
Repp, B. 1996. Patterns of Note Onset Asynchronies in Expressive Piano Performance. Journal of the Acoustical Society of America 100 (6): 3917–3932.
Repp, B. H., and Y.-H. Su. 2013. Sensorimotor Synchronization: A Review of Recent Research (2006–2012). Psychonomic Bulletin & Review 20: 403–452.
Sangild, T. 2004. Glitch: The Beauty of Malfunction. In Bad Music: The Music We Love to Hate, edited by C. J. Washburne and M. Derno, 257–274. New York: Routledge.
Théberge, P. 1997. Any Sound You Can Imagine: Making Music/Consuming Technology. Hanover, NH: Wesleyan University Press.
Zagorski-Thomas, S. 2010. Real and Unreal Performances: The Interaction of Recording Technology and Rock Drum Kit Performance. In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen, 195–212. Farnham, UK: Ashgate.
Zeiner-Henriksen, H. T. 2010. Moved by the Groove: Bass Drum Sounds and Body Movements in Electronic Dance Music. In Musical Rhythm in the Age of Digital Reproduction, edited by A. Danielsen, 121–140. Farnham, UK: Ashgate.

chapter 30

On the Other Side of Time
Afrofuturism and the Sounds of the Future
Erik Steinskog

Introduction

While it probably is a coincidence that Richard Wagner and Sun Ra share a birthday, May 22, 1813 and 1914, respectively, there are certainly some dimensions in how their music has been received that could be compared. They both relate to a “music of the future,” and while their ideas are strikingly different, the fact remains that such a music of the future must be imagined. Some of the differences between the two are easily determined, such as views on history and the imagination of the future on a more general level. That is to say, what kind of future can be imagined? Here it is interesting, as Jacques Attali (1985) discusses in his now classic Noise, that music has been seen as prophesying the future within many different thought-systems. There seems, however, to be a version of Hegelianism at stake when discussing most “classical” music, from before Wagner and into the twentieth century, and this is arguably challenged by Sun Ra and, more importantly, by theoretical discourses trying to get to grips with Sun Ra. In this chapter, I follow what is most often referred to as Afrofuturism, examining how this discourse challenges a normative understanding of history and thus introduces concepts such as counterhistory and countermemory, and how science fiction and speculative fiction are part and parcel of discussing what this music can mean. All these concepts are, I will argue, diverse approaches to understanding how the different modalities of time—past, present, and future—are intertwined and how this intertwinement is hearable, in something resembling a continuous, sonic time traveling.


Space Is the Place

Sun Ra’s film Space Is the Place (1972, directed by John Coney) is, in many ways, a core text for understanding his worldview. One could argue that Ancient Egypt is not given enough of a place in Space Is the Place but, besides that obviously important dimension, more or less everything is in place. While the film is science fiction—with references to Blaxploitation as well—it is, in one particular sense, a realistic movie: it deals with the “unreality” of blacks in the United States of the early 1970s, something made abundantly clear in the scene where Sun Ra meets a number of young people in a community center in Oakland. As he says in that scene:

How do you know I’m real? I’m not real; I’m just like you. You don’t exist in this society. If you did, your people wouldn’t be seeking equal rights. You’re not real. If you were you’d have some status among the nations of the world. So we’re both myths. I do not come to you as reality. I come to you as the myth because that’s what’s black.  (quoted in Zuberi 2004, 88)

On the political level, this unreality is similar to the issues at stake for the civil rights movement, as can be seen in Sun Ra’s reference to people “seeking equal rights.” But it is also a statement of an almost ontological or cosmological nature; black or blackness is myth. Is this the incorporation of society’s way of ordering race relations? Is it Sun Ra giving up on being included in the category of “human beings”? There is a strand in afrofuturist discourse arguing in such a direction, in which Sun Ra’s solution is understood as bypassing the whole category of “the human” and becoming super- or posthuman (cf., Eshun 1999, 155). Such a solution can, however, also be seen as a kind of utopian striving, where the utopian dimension necessitates leaving the category of “the human” behind. As history shows, first during slavery, when blacks were understood as “subhumans,” and later with the continuing inability of “white America” to accept equal rights, the category itself is flawed. But whereas the movie is realistic in its depiction of race relations, it moves to science fiction for its solution (or one of its solutions): going to outer space and finding a planet where blacks can create a new civilization. It is, then, about imagining a future that seems unreal in the present. And while Jerome J. Langguth argues for a “cosmopolitan” dimension in this solution (2010, 158), I do think the film should rather be understood as pointing toward this future civilization as a black one, where the “myth” of blackness is lifted outside time and history and is thus related to what Sun Ra terms “MythScience.” In the opening of the film, Sun Ra is seen walking amid vegetation. He is followed by a creature in a hooded cape with a mirror where the face would have been expected, a creature earlier seen in Maya Deren’s short film Meshes of the Afternoon (1943), and later throughout the video to Janelle Monáe’s “Tightrope” (2010), thus bridging classical American avant-garde and contemporary Afrofuturism (cf., Steinskog forthcoming).

Sun Ra hums, as if to set the scene for a spiritual séance, before going into a longer monologue, the first words heard in the movie:

The music is different here. The vibrations are different. Not like Planet Earth. [ . . . ]. We could set up a colony of black people here. See what they can do on a planet all on their own without any white people. They could drink in the beauty of this planet. It would affect their vibrations; for the better of course. [ . . . ] That would be where the alter-destiny will come in. Equation-wise, the first thing to do is to consider time as officially ended. We work on the other side of time. We’ll bring them here through either isotope teleportation, transmolecularization, or, better still, teleport the whole planet here through music.1

The importance of music is underscored, though it is apparently not just one thing or dimension. Rather, music is fundamental to the differences experienced on the two planets at stake: the planet where Sun Ra is seen walking on the one hand, and “Planet Earth” on the other. “The music is different here,” followed by “the vibrations are different,” undoubtedly follows in a long tradition of understanding music as vibrations (cf., Goodman 2009). Calling it a tradition is not so much to deny the physics—and thus realness—of understanding music as vibrations as it is to point to this understanding as part of a continuum where cosmological thinking and/or speculation, science, and myth meet; it is thus, in a sense, a dimension central to Sun Ra’s MythScience. This need not necessarily have any consequence for the sound of the music (or the sound of the music of the future), but it points to a use of music that is highly interesting. Music can be a means of transportation, and not only on the individual plane as some ecstatic dimension where the musician moves “out of himself.” Rather, music is understood as a means of transporting a collective, and in that sense the Arkestra—Sun Ra’s big band—is not just a “misspelled” orchestra, but becomes an Ark, a kind of spaceship fueled by sound. Understanding music as a means of transportation is arguably less paradoxical when thinking about it than when first hearing it proposed. Still, there is another challenge to such an understanding of the science fiction dimension of Sun Ra as well as of Space Is the Place. While I suggested above that Space Is the Place could be interpreted as a realistic depiction of race relations in the United States, it is also a science fiction film taking place in a parallel world and quite possibly in the future. Whether it is in the future or not, it is still “on the other side of time” with different “vibrations.” As such, it raises the question: How does music sound, or vibrate, on the other side of time?

Afrofuturism and After

In his “Foreword: After Afrofuturism,” George Lewis writes that Eshun’s term “sonic fiction” is an “extraordinarily powerful term” (Lewis 2008, 144). One of the strengths of the term is that it focuses on the sonic, but equally important is that by focusing on

“fiction,” the term can be used in discussing imagination and the imaginary without having to deal with the visual connotations of “image” in the imaginary. Why is this important? The visual bias of philosophical and aesthetic thinking has been documented several times, and is found in the vocabulary of most aesthetic discourses (cf., Jay 1993). One example could be how “reflection” relates to mirrors and visuality, where the acoustic equivalent would seem to be echo. In other words, time and space are at stake, and our way of perceiving time and space, as well as our ways of thinking those same categories, proves important, for example in one of those places where time and space interact: reverberation. In what sense, that is, is our language determining what we can say about the phenomena under scrutiny? When it comes to the music or sound of the future, these aspects might prove important in several senses. But, and this is also in accordance with Lewis’s argument, it does not necessarily have to do with language and the categories available for discourse. It could equally relate to how sound is “imagined” or fictionalized and, perhaps even more importantly, what kind of fantastic scenarios are available. In other words, “sonic fiction” could—along the lines of afrofuturist discourse—relate to the “sonic fantastic.”2 In Lewis’s article, which is the introduction to a special issue of the Journal of the Society for American Music dedicated to “Technology and Black Music in the Americas,” he wants to challenge Afrofuturism for what he seems to suggest is too strong a focus on what was previously known as “the extra-musical.” In an earlier article, “Improvised Music after 1950,” he seems to argue that “the extra-musical” does not exist, as he references “areas once thought of as ‘extra-musical,’ including race and ethnicity, class, and social and political philosophy” (Lewis 1996, 94). In “After Afrofuturism,” on the other hand, he at least seems to think this distinction has some merit, as becomes clear when he asks: “What does the sound—not dress, visual iconography, witty enigmas, or suggestive song titles—what can the sound tell us about the Afrofuture?” (Lewis 2008, 141). It might be that sound (as sound) is an undertheorized dimension of Afrofuturism, although at the same time Lewis’s question echoes a more traditional musicological discourse associated with “the music itself.” From such a perspective, one could argue that “sound” as such hardly exists in the sense that it can “tell us” anything about the afrofuture—or, for what it is worth, any other future. The sound here is inscribed in contexts where, for example, “dress, visual iconography, witty enigmas, or suggestive song titles” are part and parcel of what is heard. This is not least the case with music (“songs”) that includes lyrics. If the claim is that lyrics, including their semantic content, are not a part of the sound, this is difficult to uphold. With these considerations in mind, however, there are still good reasons to think along the lines Lewis suggests, exploring, in a heuristic sense, what “sound” can open up in an arguably narrower sense than I described earlier, and then, perhaps, adding the contextual dimensions afterward. What I am arguing for, then, is a change of perspective, and I think this is one possible reading of Lewis’s question. The caveat I introduce, which at first feels necessary to me, is not necessarily fair with regard to Lewis’s discussion. While the question’s focus on “sound,” and the explicit exclusion of “dress, visual iconography, witty enigmas, or suggestive song titles,” seems

to argue for something close to a “sound itself,” this is almost immediately challenged by Lewis himself, when he argues for broadening the conversation:

Broadening the conversation would allow a wider range of theorizing about the triad of blackness, sound, and technology; for a start one could interrupt the maleness of the afrofuturist music canon with artists such as Pamela Z, DJ Mutamassik, Mendi Obadike, Shirley Scott, Dorothy Donegan, the Minnie Riperton/Charles Stepney/Rotary Connection collaborations, and more. Going further, removing the putative proscription on nonpopular music allows us to take a more nuanced complex view of the choices on offer for black technological engagement. (Lewis 2008, 142)

In particular, I am occupied with what he calls “the triad of blackness, sound, and technology,” as this triad brings us close to dimensions in the definition of “Afrofuturism.” The cultural critic Mark Dery coined the term in his interview-article “Black to the Future.”3 The article primarily comprises interviews with Samuel Delany, Greg Tate, and Tricia Rose, but in the introduction Dery asks about the near absence of African American science fiction writers. The existence of such would be logical, he claims, and later authors have argued that the African American experience in a sense is science fiction. Dery’s definition has become canonical:

Speculative fiction that treats African-American themes and addresses African-American concerns in the context of twentieth-century technoculture—and, more generally, African-American signification that appropriates images of technology and a prosthetically enhanced future—might, for want of a better term, be called “Afrofuturism.”  (1994, 180)

The term “speculative fiction” is close to a visual metaphor—speculation (from Latin, “act of looking”)—and thus, in the case of music, gives rise to Eshun’s term “sonic fiction.” Still, what is meant by speculative fiction could fruitfully be used in thinking about the musical side of Afrofuturism. And while sound is not mentioned in this version of defining Afrofuturism, the other two dimensions of Lewis’s triad—blackness and technology—are. And from Dery writing about “images of technology and a prosthetically enhanced future,” there is only a small step to the sonic imagery, and thus to relations between sound and technology. The paradox of Lewis’s title, “After Afrofuturism,” should not be lost. The article was published in 2008, whereas Dery coined the term “Afrofuturism” in an article first published in 1993. Why would Lewis claim that we are “after Afrofuturism”? There seems to be, in Lewis’s understanding, an undertheorization of music in the classical Afrofuturism or, rather—and probably better—he seems to suggest that there are other ways of approaching the triad of “blackness, sound, and technology” than through an (arguably narrow) afrofuturist lens. That might very well be. On the other hand, in the years since Lewis’s article, discussions on Afrofuturism have become more common, a number of new musical acts are being discussed along the lines of Afrofuturism

(and I will add something to this toward the end of the chapter), and academic and activist publications dealing with afrofuturist themes are becoming more common. In other words, there are few signs that we are really “after” Afrofuturism (although this depends on what is meant by “after”—according to Sun Ra we are “after the end of the world”). So, while Lewis might not want to engage the term “Afrofuturism,” his discussion of the triad of blackness, sound, and technology is of importance for the dimensions I am occupied with in this chapter.

Blackness and Technology

Lewis’s suggestions for broadening the conversation are to the point, but “sound” is no longer isolated. It is part of “the triad of blackness, sound, and technology.” Why is it that “blackness” should be a term on another level than dress? Or why does Lewis approve of technology but seemingly not of suggestive song titles? For the second question the answer should be obvious: technology is a means of producing—and manipulating—the sound; it is, in other words, implied in the sound, not something external to it. Similar arguments could be made for the other “extra-musical” dimensions, but this fact does not take away the validity of the argument at hand. “Blackness,” on the other hand, is in this context a trickier notion, but one that could be solved by claiming that blackness itself is a technology. An example of such an understanding is found in Ytasha Womack’s Afrofuturism in a statement from Cauleen Smith: “When I met artist and filmmaker Cauleen Smith in July 2011, she best summed up race as creation: ‘Blackness is a technology,’ said Smith. ‘It’s not real. It’s a thing’ ” (Womack 2013, 27). Note the “unreality” of blackness in this statement, a kind of echo of Sun Ra’s myth. Cauleen Smith is also the filmmaker behind the Solar Flare Arkestra Marching Band Project, where, in 2010, she directed a form of flash mob in Chicago, including a marching band playing Sun Ra’s “Space Is the Place.”4 There are, then, relations between Smith’s aesthetic practices and her work in understanding the background for her films, with echoes of Sun Ra and his Chicago days as an important part. In claiming that blackness is a technology, and adding that “it’s a thing,” Smith points to some of the complex historical trajectories needing to be addressed to get a full understanding of what blackness can be said to be—past, present, and future. Within the discourse of Afrofuturism, one particular discussion has been the absence of people of color in the imagined futures of science fiction and fantasy. Connected to science fiction, this is in particular a question about the future, but given that science fiction more often than not is understood as a distorted notion of the present, it simultaneously opens up a different perspective on the present. Fantasy arguably can equally well be about the past; but here another thread is found too, in that Afrofuturism questions the past as well as the future. The most obvious example is found in Sun Ra’s reference to Ancient Egypt, where he claims a different understanding of, and afterlife in, Ancient Egypt. In his understanding, Egypt was, and still is, unmistakably Africa, and it is the past, and the past greatness of Egypt, that is his main focus. Here he follows

George G. M. James’s Stolen Legacy, first published in 1954, a book claiming that Greek philosophy, and thus, in a sense, European thinking, was stolen from Egypt, manipulated, and its origin erased. This erasure continues throughout European thinking, as an erasure of race, as making universal a certain European understanding of the world. Given the history of blacks in the United States or, to broaden the understanding even more while simultaneously quoting the title of Sun Ra’s lecture series at the University of California, Berkeley, in 1971, given the place of “The Black Man in the Cosmos,” this European understanding has demonstrably led to a hierarchical understanding of race as well as of history. But, as Sun Ra says, “History is only his story; you haven’t heard my story yet” (in the film Sun Ra: A Joyful Noise from 1980, directed by Robert Mugge). And Sun Ra’s story is a revisionist story, about another kind of origin, in Ancient Egypt, as a technological civilization, the pyramids testifying to this. But with the Middle Passage, and with the history of slavery, blacks were not included in the category of human beings; they were “things.” As Fred Moten opens his In the Break: The Aesthetics of the Black Radical Tradition: “The history of blackness is testament to the fact that objects can and do resist” (Moten 2003, 1). Moten’s argument, that blacks were objects, things, commodities, fits with the history of slavery, and, from the abolition of slavery until the civil rights movement, a fight for inclusion in the category “human” was important for the black population in the United States. One thread within the afrofuturist discourse, arguably most plainly present in Eshun’s writing, seems to argue that this inclusion did not happen, and that another solution was found in going beyond the human to some kind of super- or posthuman existence, to be followed by leaving the planet behind and beginning a black civilization on a distant planet in outer space. The rationale for this thought seems to be the continuous presence of white supremacy and racism, a presence continuing after the civil rights movement’s victories beginning in the 1960s. What would it mean to say that blackness is a technology? One possibility is to follow posthuman theory, which references different forms of enhancement, for example, in discussing the body in relation to technology. This seems to be in accordance with Dery’s definition of Afrofuturism where he writes about “a prosthetically enhanced future” (Dery 1994, 180). Another angle on the same phenomenon is Lewis’s distinction between “prosthetic” and “incarnative”—an opposition he takes from Doris Lessing. In Lewis’s article, it is related to how “a largely prosthetic technological imaginary” is said to dominate Dery’s references in his writings about Afrofuturism (Lewis 2008, 139); this criticism highlights relations between the body and technology other than enhancement. In another article, about Pamela Z, Lewis writes:

Z’s strategic placement of BodySynth electrodes—eight small sensors that can be positioned practically anywhere on the body—moves past the prosthetic readings envisioned by the technology’s creators towards the dynamics of the incarnative, the embodied, and the integrative.
Z gradually developed a use of the technology that was fundamentally rhythmic, providing sonic markers of empathy that allowed her to personally guide the listener/viewer through the complexity of her work. (Lewis 2007, 59)

Here, it is as if the incarnative is a way of moving “past the prosthetic readings,” another use of technology. That it is “fundamentally rhythmic” is of interest for the sounds that result from these interactions between body and technology, as it is also of interest for understanding the “Afrological” dimensions of music found in Lewis’s thinking, not least in his important article “Improvised Music after 1950.”

The Music of the Future

In More Brilliant than the Sun, Kodwo Eshun writes about the music of the future as “traditionally” being “beatless.” It is, he adds, “weightless, transcendent, neatly converging with online disembodiment” (Eshun 1999, 67). His examples are an interesting mixture: Gustav Holst’s The Planets (written between 1914 and 1916), Brian Eno’s Apollo soundtrack (1983), and Vangelis’s soundtrack to Blade Runner (1982, directed by Ridley Scott). “Sonically speaking,” he writes, they are not more futuristic than the Titanic and are “nothing but updated examples of an 18th C sublime” (Eshun 1999, 67). There are important dimensions to this understanding but, underlying it all, there are some fundamental questions that need to be addressed. When Eshun writes about “beatless” music, I, in one sense, could not agree more. And related both to Sun Ra and to the afrofuturist tradition (if we can call it a tradition), there is clearly some kind of focus on “the beat.” Here, however, beat must also be understood as rhythm in a more general sense, and what needs to be addressed is how Eshun’s other examples relate to rhythm. In other words, in what sense is “beatless” music rhythmic? Obviously, nonrhythmic music does not exist, as rhythm is a way of organizing time and temporality in the sonic material of music. “Beat,” however, is something different. When Eshun introduces the notion of weightlessness and transcendence, and compares it with “online disembodiment,” he is, by contrast, very close to a discussion of a dichotomy between “headmusic” and “bodymusic”—this discussion, in consequence, would claim a transcendent position as being disembodied in contrast to an embodied musical practice—for example, dancing. Dance music would, understandably, focus on the beat—and would thus be one way of contrasting the “beatlessness” of the traditional music of the future. But is this not at the same time a simplified interpretation that cannot really be of much help here? First, evidently some kind of dance is possible to beatless music as well, if Holst, Eno, and Vangelis exemplify “beatlessness.” Second, Eshun also argues that hip hop is “headmusic” (Eshun 1999, 46) and thus is not working within this dichotomy—although, because he uses concepts related to the dichotomy, it is more difficult to figure out what he is really arguing (or using the concepts for). Third, the sonic dimension of Holst, Eno, Vangelis, and a host of others—even if it should be the eighteenth century’s sublime as a reference—is important in imagining the sound of the future (perhaps more the sound of the future than the music of the future). This is not least the case with Eno and Vangelis’s use of synthesizers. And it is not least through the use of synthesizers

that Sun Ra’s music is in a tradition of “traditionally” understood “music of the future.” In a similar context, Lewis writes:

Ra’s use of electronics is a crucial component of the claim to “pre-science” (a metanalysis that Ra might have enjoyed). Yet no academic treatise of which I am aware has historically traced and contextualized Ra’s use of sound technologies. (Lewis 2008, 145)

It is the synthesizer, then, or more broadly the use of “sound technologies,” that is crucial for understanding Ra’s music and jazz—broadly understood—in the space age or in the electronic era.6 But how would Sun Ra’s music fit with Eshun’s description? The question would not least relate to the beat—and Sun Ra’s relation to “beat” or “beatlessness”—on the one hand, and his use of synthesizers on the other. But discussing these dimensions will lead not only to the eighteenth century’s sublime but also to any other understandings of the music of the future (or the sound of the future). The importance of synthesizers for Sun Ra’s sonic future cannot be overstated. He was one of the first pianists to explore electronic keyboards, and these keyboards are key for him in constructing his version of the music of the future. In some examples, the use of synthesizers is not that different from Brian Eno or Vangelis while, in other examples, Sun Ra explores the keyboards more as noise-creators in the tradition of academic or nonpopular electronic music. Here, the music and vibrations are different, and Sun Ra bends the Moog synthesizer, for example, to previously unheard-of sounds, as on “Outer Space Employment Agency” from the 1973 album Concert for the Comet Kohoutek, a track that morphs into a version of “Space Is the Place” (cf., Langguth 2010, 152). Understanding the synthesizers as related to the future, and thus to history, is not very surprising and might be seen to be in line with developments within the avant-garde of nonpopular music. Following Eshun’s take on the tradition of the music of the future as “beatless,” these synthesizers can also be used within the tradition, as both Eno and Vangelis exemplify. The change, it would seem, would be whether or not “beat” is central to the sound. Simultaneously, perhaps the synthesizers could be seen as an axis of negotiation between different understandings of the music of the future. As Eshun writes, “Whoever controls the synthesizer controls the sound of the future, by evoking aliens” (Eshun 1999, 160). When read in the context of Dery’s understanding of Afrofuturism, Eshun’s statement seems to echo a quote from George Orwell’s Nineteen Eighty-Four, which Dery uses as the epigraph to his article: “If all records told the same tale—then the lie passed into history and became truth. ‘Who controls the past,’ ran the Party slogan, ‘controls the future: who controls the present controls the past’ ” (Orwell [1949] 2003, 40). Controlling the different modalities of time—the past, the present, and the future—is a constant negotiation of tales as well as of technologies. The synthesizer becomes a control-board not only to the sounds of the future but also to the sounds of the future’s

past and the past’s future. The timelessness of synthesizer-sounds is a way of manipulating the sound waves and the vibrations in relation to, or in contrast to, the dominating tales of how the futures are supposed to sound. Dery’s discussion of time and history is related to a major difference between the normative understanding of history known from Europe and the question of whether this same understanding makes sense within an African American context. As he asks in a timely manner: “The notion of Afrofuturism gives rise to a troubling antinomy: Can a community whose past has been deliberately rubbed out, and whose energies have subsequently been consumed by the search for legible traces of its history, imagine possible futures?” (Dery 1994, 180). In other words, the past is a necessary component in imagining the future. If the past is lost or erased it will have to be recreated as a means to perceive a future at all. And if Orwell’s party-slogan is followed, this past is a result of controlling the present. Sun Ra’s intervention in the present and the sounds he makes—alone or with the Arkestra—give sound to an intersection of the present, the past, and the future, and understanding the future—imagining the future—is thus intimately related to all other modalities of time. The synthesizer, then, is deeply embedded in the temporalities of sound, including the sound of the future, but there are two other important dimensions to Eshun’s quote cited earlier: the reference to “control,” and the reference to “aliens.” Controlling the synthesizer is more than playing it; it is also a matter of programming the sounds—or, rather, of working with the sounds themselves rather than simply making audible the default sounds of the synthesizer. This, obviously, became of prime importance when industry-standard sounds became the norm in popular music. One update of Sun Ra that Eshun focuses on is the Jonzun Crew’s album Lost in Space (1983), in particular the track “Space Is the Place.” With this title, the Sun Ra reference is apparent, but Eshun’s focus is on the alterations of the voice: “On Jonzun Crew’s Space is the Place, the Arkestral chant becomes a warning blast rigid with Vadervoltage. Instead of using synthesiser tones to emulate string quartets, Electro deploys them inorganically, unmusically” (Eshun 1999, 80). For Eshun, the significance of the vocoder-voice is that the voice is turned into a synthesizer and, as such, the voice is synthesized too or, one could argue, it is dehumanized. Which terms to use, however, also depends on how one thinks about “music,” “voice,” and so forth. When Eshun claims that the synthesizers are used inorganically, it is not necessarily a negative judgment. Rather, it should be seen as an extension of Eshun’s writing about the movement from the human to the posthuman. In that sense, “dehumanizing” would be wrong too, as in relation to black music the very notion of “the human” is very much at stake. The focus on the vocoder and its relation to a black posthumanism is also found in Alexander Weheliye’s article “Feenin,” where Weheliye claims Eshun as “the foremost theorist of a specifically black posthumanity.” This is in contrast to the then emerging theories of the posthuman (in the aftermath of, not least, N. Katherine Hayles), showing the “literal and virtual whiteness of cyber-theory” (Weheliye 2002, 21), thus potentially erasing people of color from posthumanity. From Weheliye’s point of view, an important

way to alter this discourse, and to engage black cultural production, is “to realign the hegemony of visual media in academic considerations of virtuality by shifting the emphasis to the aural” (21); “Incorporating other informational media, such as sound technologies, counteracts the marginalization of race rather than rehashing the whiteness, masculinity and disembodiment of cybernetics and informatics” (25). Weheliye’s focus is the vocoder, “a speech-synthesizing device that renders the human voice robotic, in R&B, since the audibly machinic black voice amplifies the vexed interstices of race, sound, and technology” (22). These interstices—the places where race, sound, and technology meet—question the place of blackness within cybertheory but, at the same time, relate to what Lewis discusses when interrogating “the triad of blackness, sound, and technology” (Lewis 2008, 142). The vocoder is a part of this triad in a very particular sense, given that the technologization of the voice contributes to a different take on “the human” and on blackness. Simultaneously, going back to the Jonzun Crew highlights another dimension of “the music of the future.” While the mechanical, robot-like voices heard on this track sound like science fiction—evoking the long tradition of speaking robots or aliens, from HAL in 2001 to Samantha in Her—it is also the sound of a particular, historical understanding of this inhuman sound. With HAL, the robotic is hearable, whereas Samantha sounds like a regular female voice and her artificiality is impossible to hear. A similar argument can be made for Janelle Monáe, whose alter ego Cindi Mayweather is supposed to be an android, but whose singing voice is identical with Monáe’s (cf., Steinskog forthcoming). Monáe’s overall concepts for her albums, including the performance of the android, are thus one half of the story of future in/human voices, where the other half, arguably, is the autotuned or technologically modified voices. The vocoderized voices of Jonzun Crew belong to the second half of this same imagination, and show us one of the past’s imaginations of (another) future.

Sonic Fiction

Fiction is not the same as “imagination,” but in this scheme of things there are definitely relations. If we are on the other side of time, or if music is a kind of prophecy, a sonic imaginary of the future, then a sonic fiction can be about the sound of this nonheard (or yet-unheard) music. There is a paradox in all these formulations in that “imagination,” in its linguistic root, seems to point to the sense of vision. Thinking the sonic imaginary—despite the linguistic paradox—is necessary for the sound of the future to be present. But this, at the same time, also relates to one of the key questions of science fiction: whether it is about the (or a) future or whether it primarily is a slightly distorted picture of today. Both these understandings make sense in relation to science fiction, but the distinction is still important if we are to be precise in analyzing what we are doing. And even stories of the future (rather than the present) are about some future

imagined from the point of view of the here and now. When it comes to music, including Attali’s music as prophecy, the means of production are obviously found here and now too, including, not least, the sound-producing devices. There can be little doubt about the lasting influence of Sun Ra. This is not only because the Arkestra is still touring—decades after Sun Ra “left the planet”—although that understandably plays a role, but also because of the importance of Sun Ra within Afrofuturism as well as his importance across a spectrum of artists using elements of Sun Ra’s music or thinking or simply expanding on his aesthetics. One could make a case for a (musical) continuity of Sun Ra’s influence going back at least to Parliament/Funkadelic but, rather than such a discussion of history, I want to end this chapter with some contemporary examples in a musical vein that can be said to be a continuation of Eshun’s more “canonical” Afrofuturism.7 Eshun’s narrative of afrofuturist music—or black sonic fiction—is more in line with a classical avant-garde discourse that more or less excludes “popular music.”8

Transmolecularization—Beyond Sun Ra

While Jonzun Crew updated Sun Ra for the 1980s, and while it might sound dated today, there are many contemporary musicians doing different takes on the Sun Ra legacy too. Related to genre, many of them are best thought of under the vague umbrella-term “electronica,” but there are good reasons to discuss them in relation to updated versions of Afrofuturism. In that sense, they might be seen as challenging Lewis’s understanding that we should be “after Afrofuturism.” I have already mentioned Janelle Monáe, but my focus here will be four other musicians who are DJs or producers: Ras G., Kirk Knight, Flying Lotus, and Hieroglyphic Being. Much of the current music understood as afrofuturist is sample-based, opening up other ways of making relations, including those that are historical. Communicating with samples is an inherent part of hip hop aesthetics and it is also related to quotes and other ways of citing earlier music and performances in instrument-based music; with samples, though, the signifying processes are different. At the same time, such a practice is undoubtedly a use of technology, opening up another angle on the triad of blackness, sound, and technology. In the music of Ras G. (born Gregory Shorter Jr. in 1979 or 1980)—often recording under the name Ras G. & the Afrikan Space Program—such sampling practices are found not only when it comes to titles and references (in what used to be understood as the extra-musical) but also in the musical sounds. Take the track “Astrohood” from the album Brotha from Another Planet (2009), where he samples from Sun Ra’s “I’ll Wait for You” from the album Strange Celestial Road (1980). The singing voices in Sun Ra’s track are overtaken by electronic sounds—similar to the sounds/noises of computer games—before a beat is introduced and later followed by what is almost Ras G’s signature—voices

shouting “Oh Ras” with a heavy echo to it. Sun Ra’s song is groovy, with a bass vamp leading into call-and-response voices, and it is these voices Ras chooses to sample, rather than the bass groove or Sun Ra’s discrete synthesizer sounds. However one would define the generic differences between the two tracks, there is a transformation from a more or less funky bass dominating the sounds to electronic sounds dominating Ras’s track. If we were to compare the two tracks, the difference in length would play a role. “Astrohood” is short, only 1:55, whereas “I’ll Wait for You” is sixteen minutes long and develops into a jam where, under the saxophone solo, Sun Ra explores the noisier spectrum of his synthesizers. On the track “Natural Melanin Being . . . ” from Back on the Planet (2013), Ras G. instead samples Sun Ra from an interview where he speaks about natural blackness as well as about Ancient Egypt. Everything in between and around Sun Ra’s voice is Ras G’s electronic sounds. The electronic sounds are layers of samples, with sonic references across decades of music. In that sense, another version of “the other side of time” is presented, a time where the past holds potential for recreation and revision and, as such, a technological parallel to the understanding of history Sun Ra seems to relate to. On both these albums there are also references to Sun Ra in the aesthetics of the album covers and in the titles; so, in that sense, one would have to say it is a whole aesthetic rather than simply a sonic ideal. With Kirk Knight’s “Start Running,” the opening track of Late Night Special (2015), Sun Ra’s voice is heard again, this time with the famous words from the opening of Space Is the Place. The first sounds on the album are Sun Ra’s voice saying “teleportation, transmolecularization, or better still, teleport the whole planet here through music.” After “better still” the rapper comes in, rapping over the rest of the still audible words of Sun Ra, moving into a contemporary alternative hip-hop track. Toward the end of the track the voice of Sun Ra returns, saying, “the music is different here” and so on. Knight thus clearly signifies on Sun Ra’s statements and in a particular sense can be said to attempt, for the rest of the album, to present this “different music,” again re-inscribing African American music in a process of teleporting the planet. The sonic environment around the first Sun Ra sample, however, is more related to Alice Coltrane than to Sun Ra: a sweeping harp rather than synthesizers, and so another mode of combining acoustic instruments and electronics. With the harp and the Alice Coltrane references the track is closer to Flying Lotus than to Ras G., and one track on Knight’s album, “Dead Friends,” features Thundercat—Stephen Bruner—who also collaborated with Flying Lotus. Flying Lotus (born Steven Ellison, 1983) is the grandnephew of Alice Coltrane and the son of Marilyn McLeod. Both these relations are often referenced within his music, the first with attention to the spatial and spiritual dimensions in his music, the second with reference to more traditional popular music and to Motown (McLeod wrote, among other songs, Diana Ross’s “Love Hangover”). Flying Lotus’s track “Transmolecularization” is an outtake from You’re Dead!
It features Kamasi Washington on saxophone and was first played on his BBC Radio 1 sessions (May 14, 2015).9 The title of this track is a clear reference to Sun Ra, both to the opening of Space Is the Place and to a particular scene in the film, at the Outer Space Employment Agency, where

transmolecularization is one of several terms Sun Ra uses to explain the relationship to outer space and to travels in outer space.10 By using the same term as the title of a track, Flying Lotus signals the legacy of Sun Ra. Besides this, however, the track demonstrates inheritance on many levels, as for example in the sound of Washington’s saxophone. Rather than sounding like the music of Sun Ra—or rather the Arkestra’s saxophone players—Washington is closer to the sound of John Coltrane, Pharoah Sanders (who played with Sun Ra on the album Sun Ra Featuring Pharoah Sanders and Black Harold, recorded in 1964), and the “space jazz” or “spiritual jazz” of the 1960s and 1970s. There might be a paradox here, not only in Washington’s playing, but in Flying Lotus’s production more generally. If what is at stake is “the sound of the future,” what happens when musicians go back in time to find this sound? In other words, what kind of historical thinking is at stake in producing the sound—or music—of the future? The sound of “Transmolecularization,” however, is a mixture of Washington’s saxophone and samples close to both 1960s jazz and electronic music, and, as such, it might point to an understanding of the future as a combination of elements from the past. Compared to Flying Lotus’s earlier albums, You’re Dead! is more of a jazz album, with references to the late 1960s and early 1970s. “Transmolecularization,” while being an outtake, is a case in point, with Washington sounding like Pharoah Sanders or Joe Henderson, for example, and the way they play on Alice Coltrane’s Ptah, the El Daoud (1970). Here, then, the reference to Sun Ra is in the title, showing music as the means of transportation, but the actual sonics are closer to Alice Coltrane and what is arguably another strand of afrofuturist music. And while no clear samples are in the forefront of the mix, this music still shows central features of hip hop aesthetics as an art of recombination, allusions, and quotes—both of particular songs/tracks and of a more general aesthetic or vibe. Here, even history becomes a kind of technology, a kind of sonic time-travel where the sounds of the past re-emerge in the present together with the sounds of the (imagined) future. On the other side of time, the whole cosmos is vibrating, echoing across the universe. Flying Lotus’s You’re Dead! is, in many ways, a culmination of a collaboration between electronic sounds and live instruments—there are several examples on his earlier albums. An arguably related development can be seen in the music of Hieroglyphic Being, even if the latter’s music has been more electronically dominated for a much longer time. His 2015 album We Are Not the First might thus be an exception, but if so it is a very interesting exception in the present context. Hieroglyphic Being (born Jamal Moss, 1973) is better known for playing music based on the house genre—he is from Chicago—but also in this context Sun Ra is referenced, for example, in the track “Space Is the Place (But We Stuck Here on Earth)” from the 2013 album A Synthetic Love Life. He works primarily with turntables, drum machine, and mixer, and the reference to the “synthetic” should be seen within the mainstream tradition of human versus machine; within the same tradition, the DJ becomes a kind of cyborg, where the machines and the human merge.
When musical instruments are added, this interaction of humans, instruments, and machines becomes even more complex, as already heard on Flying Lotus’s “Transmolecularization.” It is, however, not only a process of more complex interactions; it is also a process in which enhanced sounds as well as incarnated sounds are heard;

in other words, a combination of different musical technologies. Rather than simply moving the music into the domain of the machines—as in much of Flying Lotus’s early work, as well as the majority of Hieroglyphic Being’s output—this combination gives rise to a different negotiation between live instruments (what could be understood as a past musical practice) and the DJ (understood as one version of the new). In this sense, it is, on an aesthetic level, a continuation of Sun Ra’s own practices where his synthesizers and electronic keyboards are heard alongside a traditional big band, even if that big band is expanded with less traditional instruments. Hieroglyphic Being’s 2015 album We Are Not the First sees him in company with live musicians, among them Marshall Allen, who played saxophone with Sun Ra from 1958 and who currently leads the Arkestra. Allen’s participation on this album is one way in which Sun Ra’s legacy remains vibrant, but there are also other dimensions—musical, aesthetic, and what I would call cosmological. There might seem to be a long way from the musical to the cosmological, but even in a more “traditional,” “normative,” and “Eurological” (cf., Lewis 1996) context, historical relationships can be found between music and the cosmos going back to Pythagoras in Ancient Greece, with important inputs during the Renaissance, and likewise several interesting contributions within the twentieth century’s “modernism” and beyond. One could, for example, make arguments about similarities between Sun Ra and Karlheinz Stockhausen, where arguably Lewis’s opposition between “Afrological” and “Eurological” could be a way into this discussion even if Lewis’s main examples are Charlie Parker and John Cage. Lewis’s opposition, however, could also be applied more precisely to Sun Ra’s MythScience—that is, Sun Ra’s “system” of thought—where there is a clear revisionist dimension. As shorthand, this could be described by referencing the subtitle of George G. M. James’s Stolen Legacy (from 1954): “The Greeks Were Not the Authors of Greek Philosophy, but the People of North Africa, Commonly Called the Egyptians.”11 The point of departure or origin changes; the “birth” of European thinking (including the relation between music and cosmology) is in the one version Ancient Greece and in the other Ancient Egypt. The consequences of choosing between one or the other of these versions are much larger than one would believe. They involve, for example, the very notion of “history”—and the so-called prehistorical. The “prehistorical” could here be called “myth” or “mythical,” and as such Sun Ra’s notion of MythScience might be reintroduced. Another of Sun Ra’s concepts, “Astro Black Mythology,” points in the same direction, but with an expanded context given the explicit reference to blackness. Blackness in this context is related to “Africa”—and, in a sense, it does not really matter whether this is a “real” or an “imagined” Africa. The effects of this “Africa” are observably real. It is, and here James’s Stolen Legacy might again be referenced, a question about the “black”/“African” origin of “white”/“European” thought. This historical situation—that is, the questions related to Ancient Egypt and to the whitewashing of history (or “History”)—is only one part of these consequences.
There is also a contemporary dimension, with some relevance to the history of music (popular music), and distinctions between “black” and “white” music, and, not least, the presumed hierarchies between these understandings (cf., Steinskog 2011, on Ellington). This is, or can be, understood as a distinction between improvisation and composition, but also, as Lewis makes clear,

between different understandings of improvisation. Finally—and this is probably the most relevant point for the current discussion—there is a way of understanding the questions of rhythm and beat. From this, an argument can be traced back to Sun Ra and to the music of the future that is related not only to beat and rhythm but, simultaneously, to how the music of the black future is always also related to reimaginations, reinterpretations, and revisions of the past (cf., Lock 1999). Here the history of music that Eshun relates is made more complex by a constant intertwinement of different pasts and their respective futures, where, in the case of Ras G., Kirk Knight, Flying Lotus, and Hieroglyphic Being, samplers, turntables, computers, and mixers substitute for Sun Ra’s synthesizers, both as a continuation and as a renegotiation of the history of black music. Imagining the future of blackness thus becomes as much a matter of imagining the unheard-of as of remixing and renegotiating the past. The future and the past intertwine—continually—in the present of the sounding music, being multidirectional rather than linear, but pushing the sounds into other worlds.

Notes

1. https://www.youtube.com/watch?v=4s8VZz-ERO0. Accessed May 15, 2017.
2. “Sonic fantastic” might be seen as one possible dimension of what Richard Iton calls “the Black fantastic” (2008).
3. The article is found in Dery’s edited volume Flame Wars: The Discourse of Cyberculture from 1994. With one exception, the whole volume was first published as volume 92, number 4 of the South Atlantic Quarterly (in 1993).
4. https://www.youtube.com/watch?v=WvcXwtqQ5ME. Accessed May 15, 2017.
5. Recent DNA research argues that ancient Egyptians were more genetically similar to people from the eastern Mediterranean than to people in modern-day Egypt. https://www.livescience.com/59410-ancient-egyptian-mummy-dna-sequenced.html. Accessed June 14, 2017.
6. Lewis also references George Russell’s Jazz in the Space Age (1960) and Electronic Sonata for Souls Loved by Nature (1968).
7. This means that I am excluding examples drawn from what is arguably a more mainstream contemporary music, such as Janelle Monáe (her references to Sun Ra in the video for “Tightrope,” for example).
8. This criticism has been raised by several authors, including myself, and should be taken seriously when considering Afrofuturism at large. Related to the sounds of the future, however, it still makes sense to discuss this same avant-garde logic in its own right. Focusing here does not diminish the importance of a more mainstream Afrofuturism.
9. In addition to the tracks mentioned, the term “transmolecularization” is also used by Eagle Nebula on the track “Nebulizer” from her EP Space Goddess (2015).
10. https://www.youtube.com/watch?v=iDwn0lsxDGg. Accessed May 15, 2017.
11. One could also refer to Martin Bernal’s Black Athena, given that Bernal establishes arguments for observing the effects of such a “revision” throughout European intellectual history.


References

Attali, J. 1985. Noise: The Political Economy of Music. Minneapolis: University of Minnesota Press.
Dery, M. 1994. Black to the Future: Interviews with Samuel R. Delany, Greg Tate, and Tricia Rose. In Flame Wars: The Discourse of Cyberculture, edited by M. Dery, 179–222. Durham, NC: Duke University Press.
Eshun, K. 1999. More Brilliant than the Sun: Adventures in Sonic Fiction. London: Quartet Books.
Goodman, S. 2009. Sonic Warfare: Sound, Affect, and the Ecology of Fear. Cambridge, MA: MIT Press.
Iton, R. 2008. In Search of the Black Fantastic: Politics and Popular Culture in the Post-Civil Rights Era. Oxford: Oxford University Press.
James, G. G. M. (1954) 2008. Stolen Legacy. New York: Wilder.
Jay, M. 1993. Downcast Eyes: The Denigration of Vision in Twentieth-Century French Thought. Berkeley: University of California Press.
Langguth, J. J. 2010. Proposing an Alter-Destiny: Science Fiction in the Art and Music of Sun Ra. In Sounds of the Future: Essays on Music in Science Fiction Film, edited by M. J. Bartkowiak, 148–161. Jefferson, NC: McFarland.
Lewis, G. E. 1996. Improvised Music after 1950: Afrological and Eurological Perspectives. Black Music Research Journal 16 (1): 91–122.
Lewis, G. E. 2007. The Virtual Discourses of Pamela Z. Journal of the Society for American Music 1 (1): 57–77.
Lewis, G. E. 2008. Foreword: After Afrofuturism. Journal of the Society for American Music 2 (2): 139–153.
Lock, G. 1999. Blutopia: Visions of the Future and Revisions of the Past in the Work of Sun Ra, Duke Ellington, and Anthony Braxton. Durham, NC: Duke University Press.
Moten, F. 2003. In the Break: The Aesthetics of the Black Radical Tradition. Minneapolis: University of Minnesota Press.
Orwell, G. (1949) 2003. Nineteen Eighty-Four. London: Penguin.
Steinskog, E. 2011. Hunting High and Low: Duke Ellington’s Peer Gynt Suite. In Music and Identity in Norway and Beyond: Essays Commemorating Edvard Grieg the Humanist, edited by T. Solomon, 167–184. Bergen: Fagbokforlaget.
Steinskog, E. 2019 (forthcoming). Metropolis 2.0: Janelle Monáe’s Recycling of Fritz Lang. In Afrofuturism 2.0: The Black Speculative Art Movement, edited by R. Anderson and C. Fluker. Lanham, MD: Lexington Books.
Weheliye, A. G. 2002. Feenin’: Posthuman Voices in Contemporary Black Popular Music. Social Text 20 (2): 21–47.
Womack, Y. L. 2013. Afrofuturism: The World of Black Sci-Fi and Fantasy Culture. Chicago: Lawrence Hill Books.
Zuberi, N. 2004. The Transmolecularisation of [Black] Folk: Space Is the Place, Sun Ra and Afrofuturism. In Off the Planet: Music, Sound and Science Fiction Cinema, edited by P. Hayward, 77–95. Eastleigh: John Libbey.

Chapter 31

Posthumanist Voices in Literature and Opera

Jason R. D’Aoust

Introduction

When we think of the voice from the perspective of sound and imagination, a familiar observation comes to mind: the voice is a series of phonatory sounds we emit (as in speech, screams, and songs), but also their interior manifestation in our mind’s ear. The experience of hearing a voice when we think, read, and write leads us to think of voices as dual in nature, namely through their inner and outer manifestations, but the interrelation of the two is more complex than it appears. Our seemingly innate inner voice gives us the impression that our interiority precedes any exteriorization, and thereby establishes a hierarchy in communication. In identifying inner and outer voices as two sides of the same coin, we come to believe that speech and song are the materialized expression of our inner voice. Artistic practice can reinforce this point of view. Eileen Farrell, for example, has commented on how the imagination plays an important part in vocal performance: rather than focus on the manipulation of larynx, pharynx, and resonators, successful artists concentrate instead on imagining the pitch, texture, and tone of the vocal line they then instantly create in performance (Farrell 1993). This performance practice defines the sonorous imagination as an active agent that forms sounds in the inner ear before they are vocally expressed and manifested. These observations might also implicitly convey a dualist perception that vocal expression is material and the inner voice is not. Such a way of understanding the voice often turns out to support or be supported by metaphysical explanations of the physical world. A metaphysical worldview purports that there are immaterial principles (like our identity with our inner voice), which nevertheless have the creative force to organize the material world. For the last half-century, however, critical theory has opposed this way of organizing

knowledge about, but especially through, the voice. Poststructuralist concerns like the death of the author and the Derridean writing of différance oppose biographical criticism, because as the latter speaks for the author’s voice, it leads to a paucity of diverging interpretations and points of view. This chapter examines these critical intersections of voice, sound, and imagination in order to situate them within studies on posthumanism. Many posthumanist theorists discuss the voice, or problems related to it, with the intent of displacing certain assumptions about subjectivity or self-presence. This way of writing about the voice ties in with earlier critical theory in which the voice was criticized for transmitting notions of identity. As a point of departure into understanding the discursive implications of the posthumanist appraisal of vocality, I start by giving background to the phonocentric critique of voice. I then turn to the recent reappraisal of voice by criticism of videocentrism and to theorists who are interested in the voice’s epistemic purchase, insofar as it can create a discursive space around vocal embodiment and the voice’s materiality. The following section brings this critical discussion to bear on the posthumanist reception of opera. I discuss how theorists have visited the history of opera in order to compare the genre to philosophical discourse for rhetorical purposes, but not necessarily to revise the discursive flattening of the expressive voice. Opera studies have, so far, shied away from engaging with posthumanism. I therefore draw on the musicological reception of opera’s many voices, in order to deconstruct the assumptions made in the name of the “operatic voice.”

Autopoiesis and the Autoaffective Voice

In What Is Posthumanism?, Cary Wolfe situates the problem of thinking of the human, and, by extension, of humanism, within the larger problem of the multiplicity of living consciousness. His book asks of us to rethink our taken-for-granted modes of human experience, including the normal perceptual modes and affective states of Homo sapiens itself, by recontextualizing them in terms of the entire sensorium of other living beings and their own autopoietic ways of “bringing forth a world”—ways that are, since we ourselves are human animals, part of the evolutionary history and behavioral and psychological repertoire of the human itself. (Wolfe 2010, xxv)

For Wolfe, posthumanism is predicated on our species’ awareness that other species are not only sentient, but that their consciousness creates different worlds, knowledge of which should also further our understanding of the human animal. His approach relies on debunking presuppositions about language that unwittingly convey remnants of a metaphysical worldview in which humans claim ownership of, or stewardship over,

other living beings. In doing so, he furthers Jacques Derrida’s attention to autoaffection by connecting it with autopoiesis, a relatively new term initially borrowed from biology by communication studies (Maturana and Varela 1980). In Wolfe’s argument, autopoiesis acts as a benchmark with which to compare different animal experiences of the world, including that of the human species. More importantly, the evolutionary inheritance of autopoiesis should ethically require from us greater critical attention to implied or unwitting value judgments we make when we compare other forms of animal communication to human linguistics. What is autopoiesis? Poiesis is borrowed from the Greek and in its literary sense means “the creative production, especially of a work of art”; but when used as a suffix, its literal translation denotes “the formation or production of something.”1 Biologists have used the combined form to describe the “self-maintenance of an organized entity through its own internal process” (Oxford English Dictionary); therefore, an “autopoietic system is one that produces itself” (Buchanan 2010). Autopoiesis was introduced to communication studies when Niklas Luhmann made it a key concept in systems theory in order to argue that a system of communication does not precede its given social space (Luhmann 2010; Wolfe 2010, 3–29). This biological insight into communication implies that human consciousness through language is a matter of animal evolution, elements of which could very well be shared with other species. In turn, the “autopoietic ways” of Wolfe’s theorization are interesting to thinkers of the expressive voice and vocality, because they might further dislodge the function of voice as the metaphysical guardian of self-presence. The inner voice, though it seems innate to most of us, is not a clean slate. Derrida’s criticism of the autoaffection of the voice-as-presence is a key moment for Wolfe, because it moves away from the “self-presence of consciousness” toward writing qua trace as “fundamentally ahuman or even anti-human” (Wolfe 2010, 6). It is less clear, however, if the sonorous voice’s past associations with humanist identity mark it as a phenomenon to be discarded in Wolfe’s argument. As Don Ihde remarks in Listening and Voice, Voice is, for us humans, a very central phenomenon. It bears our language without which we would perceive differently. Yet outwards from this center, voice may also be a perspective, a metaphor, by which we understand part of the world itself. (Ihde 2007, 189)

Like Wolfe, Ihde is aware that our vocal experience of language and the world presents the problem of “domesticating it into our constant interpretation that centers us in the world” (Ihde 2007, 186). Can greater attention to the musicality or sonority of voice make us further aware of the distance we impose on the world’s sounds through language? I will shortly discuss how Wolfe arrives at the sonorous voice by way of opera and how his sources discuss opera by way of an “operatic voice.” This chiasmic construction (opera-voice/voice-opera) might give the impression of canceling itself out and of being of little consequence, but it gestures toward a conflation that assigns the sonorous voice to a genre whose aesthetic diversity is thereby greatly reduced. However, before I arrive at

this posthumanist stance on opera qua “operatic voice,” I will consider what the voice means for philosophers and critical theorists. For two millennia, Western philosophy has claimed the voice as the linguistic medium of human reason and, by extension, proof of the primacy of humans over other species lacking in language and reason. To understand the ramifications of this tradition on current work about the voice, we may look to Heidegger’s historical survey of the voice in “The Concept of the Logos” (Heidegger 1962, 55–58) or look back to neo-Platonist definitions of voice (Mansfeld 2005).2 Ultimately, the search for an ever-receding origin of the voice is not only impossible but also counterproductive. Indeed, “by avoiding tales of origins, we are closer to a possible answer. For, whatever else the voices of language may be, at the center where we are, they are rich, multidimensioned and filled with as yet unexplored possibilities” (Ihde 2007, 194). For our purposes, however, let us make Derrida’s first publications our point of departure. When Derrida, in Speech and Phenomena, discusses the “expressive voice” in relation to Edmund Husserl’s philosophy, he reproduces the latter’s terminology for the expressive voice to designate our “silent interior monologue” (Spivak 1976 in Derrida 1998, liii). This inversion of our everyday understanding of the expressive voice occurs because Husserl, “being interested in language only within the compass of rationality, determining the logos from logic . . . determined the essence of language by taking the logical as its telos or norm” (Derrida 1973, 8). In order for language to hold any truth-value, it had to be logically consequential in its assertions about itself. How does this logical search for truth through language silence the expressive voice? One way of verifying whether or not language can achieve this logical exactitude is to put the terms it uses to the test of translation. Derrida underlines a lack of categorization in the French translation of Husserl, because it systematically rendered Bedeutung into the French signification. He notices the lack of terminological choice in French to express a difference between the German terms Sinn (sense, signification) and Bedeutung (meaning, signification), and argues that a lack of linguistic equivalencies should not erase the differences in experience they point out. As Derrida remarks, for Husserl “meaning [Bedeutung] is reserved for the content in the ideal sense of verbal expression, spoken language, while sense (Sinn) covers the whole noematic sphere right down to its nonexpressive stratum” (Derrida 1973, 19). Meaning is the result of an interpretation (Deutung) that should be reserved for communication relying on the expression (Ausdruck) of speech (Rede). Sense (Sinn, signification), on the other hand, although it is always conveyed by expressive speech, may also be indicated (Anzeichen) through nonlinguistic means. Yet, for Husserl “meaning (bedeuten)—in communicative speech (in mitteilender Rede)—is always interwoven (verflochten) with such an indicative relation” (in Derrida 1973, 20). One should pause here and note how the indicative musical characteristics of speech that also make sense—such as pitch, tone, rhythm, and velocity—are silenced in this logic of communication.
For now, however, let us continue and examine how expression (Ausdruck), although it denotes an outward push, nevertheless loses its phonation, as the expressive voice gets turned into the voice of our “silent interior monologue” (Spivak in Derrida 1998, liii).

If Husserl interiorizes the voice of communication, then he must also interiorize its addressee. In any given speech act, it is impossible for me to really know what the other means: expression indicates a content forever hidden from intuition, that is, from the lived experience of another, and also because the ideal content of the meaning and spirituality of expression are here united to sensibility. (Derrida 1973, 22)

Both of these problems are avoided by deviating the communicative structure of address: the ideal addressee is no longer the person one speaks to, but part of our silent inner voice. This silent address retains the structure of communication, however, through the intention of the inner voice’s objective ideality—akin to the ideal reader to whom one writes—which becomes a substitute for the external other. In other words, the suspension of expressivity’s (indicative) communicating relation to an exterior addressee is necessary in order to ensure that nothing be hidden from meaning in the ideality of language. This silent yet expressive voice thus unites thought and language through self-presence, but does so at the expense of a phonatory vocal act, in order to make communication logically possible.3 Yet even this ideal voice presents a flaw. Although the voice of self-consciousness might satisfy the requirements of autoaffection—hearing one’s inner voice, thereby giving one a sense of self—it cannot fully express presence. This is an enduring problem in the history of Western thought. Augustine, for example, struggles with expressing self-presence in his Confessions. He remedies the lag in communicating his own relation to presence (and Logos) through song because, in his view, music distends speech and thereby elongates its enunciating present (Augustine 1998, XI: 17 ff.). Derrida, however, follows the logic of the trace to its visual outcome. For Derrida, Husserl’s descriptions [of retention] imply that the living present, by always folding the recent past back into itself, by always folding memory into perception, involves a difference in the very middle of it. In other words, in the very moment, when silently I speak to myself, it must be the case that there is a miniscule hiatus differentiating me into the speaker and into the hearer. There must be a hiatus that differentiates me from myself, a hiatus or gap without which I would not be a hearer as well as a speaker. This hiatus also defines the trace, a minimal repeatability. And this hiatus, this fold of repetition, is found in the very moment of hearing-myself-speak. Derrida stresses that “moment” or “instant” translates the German “Augenblick,” which literally means “blink of the eye.” When Derrida stresses the literal meaning of “Augenblick,” he is in effect “deconstructing” auditory autoaffection into visual auto-affection. (Lawlor 2014)

The infinitesimal lag in self-presence—in English we may also use adverbs like “at once” or “instantaneously” to translate the temporal indication of the German noun Augenblick—is thus translated into the ocular sphere of the interstitial trace. From this point forward, Derrida will continue to oppose logocentric literature and thought through criticism that denounces the voice in favor of writing.

In Of Grammatology, for example, Derrida situates Jean-Jacques Rousseau’s understanding of writing as a transcription of speech in a historical trajectory of the voice’s relation to knowledge in modernity. From Descartes to Hegel and in spite of all the differences that separate the different places and moments in the structure of the epoch, God’s infinite understanding is the other name for the logos as self-presence. The logos can be infinite and self-present, it can be produced as auto-affection, only through the voice: an order of the signifier by which the subject takes from itself into itself, does not borrow outside of itself the signifier that it emits and that affects it at the same time. Such is at least the experience—or consciousness—of the voice: of hearing (understanding)-oneself-speak [s’entendre-parler]. That experience lives and proclaims itself as the exclusion of writing, that is to say of the invoking of an “exterior,” “sensible,” “spatial” signifier interrupting self-presence. (Derrida 1998, 98)

In other words, the voice fosters not only the illusion of being present to oneself, but also the illusion of knowing or, in the case of madness, of owning the truth.4 The voice is the preferred vehicle for meaningful speech (logos) precisely because hearing-oneself-speak is so close to our understanding of ourselves, a fact Derrida underlines by joining the two parts of the reflexive verb with a hyphen to form the noun s’entendre-parler. While something of the order of the trace occurs when we hear ourselves speak, writing, in comparison, is indifferent to our experience of consciousness. Wolfe is interested in the trace for its “a-human or anti-human potential” because of its indifference to self-presence. Yet can the sonorous voice be of interest to posthumanist study, beyond a distrust of its purported phonologocentrism? Can vocality further inform this interstitial space of phonation and listening? Or must it be relegated to humanist concerns for origins and ends, and express our melancholy of never knowing them? Since critical posthumanism relies on an autopoietic benchmark that until the last century was obfuscated by the voice’s conflation with logos, and because, as we shall see, opera becomes for Wolfe a stand-in for the humanist voice, I want to bring to this discussion recent research that challenges phonologocentric criticism.

Videocentrism and Expressive Voices

Because of its ties to autoaffection, philosophy understands the expressive voice as being fully interiorized to the point of becoming the excluding agent of “an exterior.” However, does the resounding voice of the singer—a voice that always sounds different from one recording to the next, from one performance to the next, from one instant to the next—present similar problems to critical thought?5 In other words, does a grammatological counteraction against the autoaffective voice also account for the vocality of screams, songs, shouting, and laughter? Can philosophy account for those expressive and musical voices that were silenced in the name of language’s logical discourse (Nancy 2007)?

There is growing criticism of videocentrism (or anti-ocular criticism) that, while it is in agreement with deconstruction’s ethical work in promoting diversity, nevertheless examines how sonorous voices have been silenced (Janus 2011). I will come back to posthumanism shortly, but for now I turn to the critique of videocentrism in order to examine how it might contribute to our thinking of posthumanist vocality. In For More than One Voice: Towards a Philosophy of Vocal Expression, Adriana Cavarero states her reservations about the fate of the sonorous voice in Derrida’s overall project. She notes how Derrida’s early works dialogued with the phenomenological voice but failed to acknowledge how emerging studies on orality had begun to influence thinkers of his generation. If the debt to Heidegger, while full of reservations, is explicit, then the debt to the studies on orality—and more generally to the modern rediscovery of the voice, if not of writing—is, however, rather deceptive. (Cavarero 2005, 213)

Cavarero argues that Derrida is critical of the voice but does not address the metamorphoses it underwent in order for it to continue suiting the historical developments of visually centered metaphysical epistemologies. Cavarero suggests that Derrida does not integrate into his framework a conception of the expressive voice because he thinks of it as the guardian of metaphysics.6 She criticizes Derrida for failing to step back and free the expressive voice from its ancillary inscription in discursive knowledge once he had shown how Husserl recuperates expression as an implicit and disavowed discursive strategy. According to Cavarero, the project of a “philosophy of différance [ . . . ] orients the theoretical axis in which Derrida places the theme of the voice, making it play a metaphysical role in opposition to the antimetaphysical valence of writing” (Cavarero 2005, 220). Recall how this is precisely Wolfe’s point of departure for thinking of the trace as a-human or antihuman. In Cavarero’s reading, Derrida’s championing of writing as différance can also be understood as the last scene of philosophy’s historical “devocalization of the logos” (Cavarero 2005, 33–41). In other words, the task of deconstructing the traditional view of writing qua fallen speech might have obscured how writing constrains representations of sonorous voices in order to elevate itself to the status of univocality. Instead, she insists on the following: Derrida’s “metaphysical phonocentrism supplants the far more plausible, and philologically documentable, centrality of videocentrism” (Cavarero 2005, 222). The argument rests on a shift in perspective and, although the gap it opens is rather narrow—like a closing shutter—its far-reaching consequences have also been recognized in other fields. In Sounding New Media, Frances Dyson also develops a historical analysis of sound’s subsumption under visually based epistemologies: sound and the speaking voice are banished from this ontological elite, not because of their sonority, but because of what sonority represents—impermanence, instability, change, and becoming. Through an array of epistemological gymnastics, however, the voice is not entirely excluded (how could anyone ever say that it was?) but rather abstracted via the oxymoronic concept of “inner speech.” (Dyson 2009, 21–22)

Lacanian theorists interested in music had already underlined similar insights into the voice’s disruptive potential for discourse. Engaging with Plato’s remarks on music and their influence on Augustine, psychoanalytic critics like Michel Poizat (1992) and Mladen Dolar have associated the musical voice with the sliding of the signifier. One can draw, from this brief and necessarily schematic survey [of the musical voice], the tentative conclusion that the history of “logocentrism” doesn’t run quite hand in hand with “phonocentrism,” that there is a dimension of the voice that runs counter to self-transparency, sense, and presence: the voice against the logos, the voice as the other of logos, its radical alterity. (Dolar 1996, 24)7

Although not intended for anti-ocular purposes, we can also turn to a philological study of the visual metaphor of light (scintillation and illumination) in the Platonist doctrine of the voice in order to grasp how it assigned the sonorous voice to videocentric discourse: “in the proper sense, it is articulate voice, considered as illuminating what is thought” (Mansfeld 2005, 359 ff.). The voice becomes trapped in a “heliotropic metaphor” that Derrida’s reading of Phaedrus in Dissemination assigns to différance, rather than admit the voice’s alignment with a visual order (Cavarero 2005, 223–224, 227 ff.). Cavarero further underlines how discourse’s apparent phonocentrism only functions through a disavowal of the visual ordering of what lies beyond perception. The logos that is written in the soul of the one who apprehends, with science [episteme], is precisely the devocalized logos that coincides with the visible and mute order of ideas. . . . In effect, it is precisely the art of dialectic that functions as a means of transmission between the world of words and the world of ideas. This art belongs to the verbal sphere, but it belongs to it as a method for showing the insufficiency of words and at the same time, their constitutive dependency on the order of ideas.  (230–231)

In turn, it also underlines what is missing in Derrida’s reading of Socratic dialogue: the third term, the aphoristic desire that drives the dialectic to its “aporetic outcome”—a deferral in itself—as the interlocutors “rub the[ir] words against one another . . . to grasp the luminosity of the idea that suddenly flashes up, present to the eye of the soul” (Cavarero 2005, 231). Thus, the Platonist doctrine of the voice’s illumination would not necessarily be a metaphorical misconstruction of Plato’s philosophy, but a shortcut to the visual register of the idea that was already implied as its goal. While the critique of videocentrism has challenged the role of the voice as the supposedly phonocentric excluding agent of exteriority, difference, and diversity, the larger reception of opera in academia (beyond musicology) presents further challenges to our present topic. Many of the critical theorists quoted by Wolfe, who enjoy vocal music and opera, do not necessarily engage with it from a musicological perspective. Although interdisciplinary perspectives can enrich our understanding of the genre and its cultural influence, I argue that they can also lead to certain conflations and reductions, in this

instance, through the term “operatic voice.” How does such a shortcut as the “operatic voice” affect our thinking of posthumanist vocality?

The Phantom of the Operatic Voice

It is here, inside our minds. The most striking aspect of Wolfe’s discussion of opera is that all his interlocutors are philosophers or theorists and none are critical musicologists. Because of their discursive allegiances, his interlocutors come to opera with preconceived ideas of the aesthetic voice’s discursive function. In their arguments, opera becomes a dramaturgy of voice, and opera is therefore unwittingly reduced to a homogeneous genre with a single type of voice, the “operatic voice.” At the beginning of a chapter largely dedicated to opera, film, and song, Wolfe writes “sound is not voice” (Wolfe 2010, 169). Although nobody would dispute Wolfe’s assertion, he is undoubtedly cautious in approaching the sonorous voice through opera. I have reminded readers how the reversal of this assertion—voice is not sound—is a long-standing claim of metaphysics in associating the voice (phoné) with speech (logos). Although Wolfe also challenges this conception of the voice, he is wary of opera and, as we shall see, implies that its sonorous voices should be superseded (by cinema) in a posthumanist discussion of vocality. Is opera, from its creation to the twenty-first century, to be confined by posthumanist theory to what we have shown is the silent repository of humanist, metaphysical voices? The “operatic voice” is a discursive construct of the twentieth century that has thoroughly infiltrated general culture. Before then, people had qualified compositions, literature, or personalities as “operatic,” but they did not see the need to describe voices in such a manner. Opera is an art form comprising many genres that require different voice types. Of course, there were teachers and schools to make sure that singers were up to musical standards. In this sense, different periods have had an ideal sense of what different voice types should be able to accomplish musically and dramatically. Even if certain singers of the past were louder or more dramatic than others, the expression “the operatic voice” misleads readers into assuming that opera depends on a single type of voice or vocal style. As Gary Tomlinson explains in Metaphysical Song (1999), the genre spans four centuries of Western modernity in which voices were differently embodied and represented in accordance with the prevalent discourse of a given period’s ideology.8 I am not sure exactly when the term “operatic voice” became popular with critical theorists; however, it has musicological precedents. Already by the late 1930s, Adorno was criticizing the reification of vocal music and its concomitant vocal fetishism (Adorno 2002). Today’s entertainment industry ascribes an “operatic voice” to anyone who can sing from a short list of show-stopping arias, regardless of the singer’s lack of a career in opera houses, especially as certain of these voices are only known and admired within popular culture’s narrow version of opera. To put it succinctly, the operatic voice is not opera and

opera is not the operatic voice. These assertions might seem obvious, but musicologists have felt the need to underline them (Furman 1991). Conflating the voice with an art form largely dedicated to a historical canon runs the risk of unduly limiting its current epistemic purchase. It is then easier to claim that all the singing voices of opera resonate today with a romantic desire to overcome our lost unity with a bygone world. Although Wolfe, in the end, does not support the implicit presuppositions underlying the “operatic voice,” his argument does take this generic identity of opera at face value, which can become a hindrance for musicologists of opera approaching posthumanist theory. The following is not meant to be overly critical, but to provide musicologists and music historians with ways of approaching posthumanism. Wolfe’s main argument is that opera represents something that never really existed, namely an authentic, natural voice. In order to make this argument, he first recalls Stanley Cavell’s identification of opera with mournful modernity and romantic skepticism. After Descartes and Kant, skepticism names not just an epistemological problem but a more profound and deeply ethical “loss of the world” that is coterminous with Enlightenment modernity itself, in which the modern condition is to be “homeless in the world” . . . For Cavell, the significance of film and of operatic voice is located at what he calls the “crossing” of the lines of skepticism and romanticism—that is to say, the juncture at which our desire for contact with the world of things and of others . . . is crossed by our knowledge that we are profoundly and permanently isolated. (Wolfe 2010, 172)

For Cavell, the history of opera has a single aesthetic project, which is characterized by Orpheus’s Dionysian attempt at regenerating the modern world through song. Recalling Monteverdi’s insistence on composing an alternative lieto fine to the tragic ending devised for the creation of L’Orfeo, Cavell writes of two general matching interpretations of the expressive capacity of song: ecstasy over the absolute success of its expressiveness in recalling the world, as if bringing it back to life; melancholia over its inability to sustain the world, which may be put as an expression of the absolute inexpressiveness of the voice, of its failure to make itself heard, to become intelligible—evidently a mad state. (Cavell 1994, 140, cited by Wolfe 2010, 173)

Leaving aside whether the singing voice can only be heard by becoming intelligible in a more than musical fashion, we must ask ourselves questions about the associations and equations that are being made here in the name of opera qua “operatic voice.” Is the underlying meaning of the myths of Orpheus and Dionysus—their regeneration of an agonizing world—opera’s unconscious aesthetic goal? The psychoanalytic reception of opera traces a similar trajectory when it argues for the singing voice’s relation to the unconscious. From Eurydice’s echoes to Lulu’s scream, Michel Poizat (1992), as well as Slavoj Žižek and Mladen Dolar (2002), suggest that the operatic voice is an historically expanding sonic portal to the unconscious desires that lie beyond linguistic representation.

These models suggest that the sung voice, within the whole of modernity—understood as “operatic”—is a stable unit of meaning. Yet important shifts in discourse change our understanding of what is supposedly universal or natural and reveal this supposed vocal identity to be culturally and socially constructed in different ways at different times. For Cavell, voices in Monteverdi’s L’Orfeo participate in the operatic voice’s aesthetic representation of our modern condition of alienation from the world. Opera would be a reaction to our loss of world through a sonic expansion of the voice’s capacity to reach beyond this alienation. Musicologists interested in the culture and music around Monteverdi’s time would disagree. Nino Pirrotta, for one, claims that the sixteenth century’s conception of poetry paved the way for its music theater: “la parola poetica è già musica,” that is, poetic speech is already music (Pirrotta 1975, 22).9 Music here does not extend the voice’s capacity for projection in order to reestablish a lost connection with the world. More to the point, in this case, is the underlying principle governing the efficacy of affect in “late Renaissance opera.” I borrow the term from Gary Tomlinson, who demonstrates how early operas were more in tune with humanist ideals than with early-modern conceptions of knowledge and subjectivity. With this in mind, it becomes challenging to find in L’Orfeo a modern, sonic conception of the operatic voice as a form of vocal projection. Instead, one is constantly reminded of the importance of breath as an animating principle, not only of the singing but also of the kind of presubjective experience L’Orfeo conveys. Monteverdi’s opera is a celebration of music’s power to move souls, and to do so it relies on what connected people to the cosmos and each other in the late Renaissance, namely the life-giving breath of anima or spirit. Within the culture that created opera, voices are not alienated from the world; rather, tense situations are harmoniously resolved through the inner workings of music’s magical power. In other words, Cavell’s insistence on an alternative ending to the opera, in which Apollo’s ex machina intervention puts right Orpheus’s hubris, obscures how a modern conception of voice is inconsistent with late-Renaissance opera. Although it is not convenient to the theories of subjective alienation qua operatic voice to which Cavell and Žižek subscribe, aesthetic and stylistic elements lead musicologists to believe that opera’s history does indeed start with a voice that is full of affect, supported by breath, and united with the world. I do not argue against the idea that the kind of vocality embodied in later operas does point to a desire to overcome skepticism’s alienation in the world. Indeed, as Tomlinson remarks, since the Cartesian soul is completely immaterial, the voice can no longer act as the seamless link between body and soul, like the spirit’s animating breath. The voice becomes heavier, more material, as the spirit dematerializes itself. It is, therefore, the voice of later Baroque and classical operas that must deal with the soul’s alienation from the material world. Instead of L’Orfeo (1607), we will therefore take Mozart’s The Magic Flute (1791) as our posthumanist case study. Here we find vocality staged between binary constructions familiar to posthumanism: human versus animal, nature versus civilization, and reason versus irrationality.
Furthermore, because we will approach this vocality through a literary text, we should also keep in mind Garrett Stewart’s alternative to the inner voice in Reading Voices (1990), in which suppressed physical phonation also accompanies the act of reading.

If reading involves the silent action of our whole phonatory apparatus, what are we doing when we imagine an android’s voice? Or as Hayles puts it, If the production of subvocalized sound is essential to reading literary texts, what happens to the stories we tell ourselves if this sound is no longer situated in the body’s subvocalizations but in the machine? (Hayles 1999, 208)

Living in a Material World: Luba Luft’s Pamina

In Listening and Voice, Ihde also raises the question of the expressive voice. He devotes a chapter to the dramaturgical voice, in which he discusses how it opposes, in a sense, discourse’s absence or silencing of the expressive voice. There lies within dramaturgical voice a potential power that is also elevated above the ordinary powers of voice. Rhetoric, theater, religion, poetry, have all employed the dramaturgical. The dramaturgical voice persuades, transforms, and arouses humankind in its amplified sonorous significance. Yet from the beginning there is the call to listen to the logos, and the logos is first discourse. (Ihde 2007, 168)

In this section, I attempt to circumvent this “call to listen to logos,” and pay closer attention to the imagined vocality of androids. This will not mean, however, the total negation of visual analysis. Like Ihde, I am aware that “we exist in a language world that is frequently dominated by visualism,” and do not “wish to simply reduce the visual . . . to simply enhance the auditory” (Ihde 2007, 190). There is a point of intersection of visual and auditory communication that humans share with other animals, namely mimicry. There is unintended mimicry: the viceroy butterfly mimics its larger, presumably ill-tasting monarch in pattern, color, and design. But the mocking bird, parrot, and  cockatoo all consciously imitate and mimic the voice of others. Here is an expression doubled on itself, the wedge in sound that opens the way to what becomes in the voices of language the complexity of the ironic, the sarcastic, the humorous, and all the multidimensionality of human speech, particularly in its dramaturgical form.  (Ihde 2007, 192)

Beyond simply turning to film for a discussion of visual mimicry in opera, this section will analyze the literary representation of android singing and its absence in the novel’s cinematographic adaptation. Beyond the usual argument that too much phoné errs on the side of animality and too little on the side of the robotic, I argue that vocality is a site of mimesis through which we can critically approach opera from the perspective of posthumanism.

Philip K. Dick’s work is central to posthumanist studies. The novel Do Androids Dream of Electric Sheep? (1968), along with Ridley Scott’s cinematographic adaptation, the cult classic Blade Runner (1982), is especially important as the story of androids fighting for freedom underscores some of posthumanism’s central ethical issues. N. Katherine Hayles, for example, argues that the human fight against android autodetermination can be seen along the lines of a tension between autopoiesis and allopoiesis (Hayles 1999, 161). The postapocalyptic novel in particular presents situations that might have resonated with fears of nuclear holocausts fifty years ago—most prominently the flawed human stewardship of other animals in an obliterated world—but, in the wake of global warming, have since become pressing environmental and social concerns. In turning to Do Androids Dream?, I want to raise the aesthetic notion of mimetic vocality to account for its problematization of ethics, but also to open these concerns to the field of opera studies. The fascination of the detective cum android hunter Rick Deckard with the singing android is a crucial element in the novel, yet the film completely sidesteps this scene. There are many explanations for the replacement of Luba Luft with Zhora in the cinematographic adaptation, chief among these being that opera and punk aesthetics do not usually mix well, with perhaps the exception of Klaus Nomi. In this aesthetic, the dystopian world of Blade Runner is bleaker than that of the postapocalyptic novel, in the sense that it does not afford the economic opportunities for institutionalized art forms. However, the opening sequence of the film shows a remnant of Dick’s rhetorical representation of opera singing, namely the android’s vocal mimetic capacities. The Voigt-Kampff test Deckard administers to possible androids is designed to measure the delay in physical reactions indicative of a lack of empathy. The delay is important because androids are programmed to replicate human reactions both emotional and physical, hence their nickname replicants. Telltale signs include inappropriate or delayed psychological and physical reactions, thereby pointing toward a temporal gap of imitation and, implicitly, to a lack of identity cum self-presence. Oddly enough, however, voices—the supposed essence of humanist identity—do not figure in the test. Tone of voice is always a possible indicator of replication, but interpreting its meaning is left to the detective’s intuition. The absence of vocal testing demonstrates Dyson’s point (as cited earlier), namely how we associate detection and knowledge with the visual, and deceit with the sonorous or, more precisely in this case, the vocal. Instead, the epistemic space of vocality seems to be incorporated and obscured in ideal representations of women and womanhood. In both versions of the story, the desired female android is embedded in intertextual references to Die Zauberflöte, the last of W. A. Mozart’s operas, a singspiel composed to a libretto by Emanuel Schikaneder. The film references the opera through animal symbols. Zhora, for example, uses a “live” boa in her erotic dance performance in a nightclub. Along with her seduction and lethal potential, these indicators situate Zhora in a symbolic field quite similar to that of Mozart’s Queen of the Night.
She too is associated with a serpent monster and is the most threatening figure in the opera.10 Indeed, when Pamina wavers at the thought of killing Sarastro, the Queen intervenes and

tells her daughter she will either see the deed through or be outcast, forsaken, and shattered forever from “all the bonds of nature” (alle Bande der Natur). I am quoting here from “Der Hölle Rache,” the famous aria known for its breakneck display of coloratura. Of course, all of this vocalic intertext is merely suggested by the film’s visual symbolism. An audience familiar with Dick’s novel and Mozart’s opera, however, might wonder at the change of casting. Contrary to the novel, the film casts the replicant in the role of the Queen of the Night, rather than her subservient daughter, Pamina. Of course, a fiercely resistant and aggressive android, who gets chased, gunned down, and crashes through a window, makes for better action-film material than the resigned Luba Luft. Although the topic of android ethics—Luft’s choice of not harming humans—is also visited in Blade Runner, it only happens in the very last sequence, when Roy Batty has an epiphany brought on by the acceptance of existential finitude. What, then, is lost in the cinematographic adaptation’s excision of Luft’s career in opera? For one, we lose Dick’s insistence on the androids’ different personalities. The novel does not reduce them to fighting machines (cf. O’Mathuna 2015), but reminds readers how they were designed to help colonizers in diverse tasks. Although we do not know what her occupation was on Mars, we do get to know one of the androids as Luba Luft, a German opera singer. We first meet her when Deckard tracks her down at the San Francisco Opera. From the auditorium, he observes her in a rehearsal of The Magic Flute. He hears her sing a scene in which she and Papageno are about to be discovered by Sarastro. Sarastro is the patriarchal authority figure who is charged with initiating characters into the mysteries of human civilization, which revolves on an animal/human axis in the same tradition as the “high” and “low” plots of early modern theater. Pamina and Papageno are about to get caught transgressing the sacred Temple of the Sun, of which Sarastro is high priest. Papageno asks Pamina what they should tell Sarastro to excuse themselves for being there and she replies: “The truth! The truth! That’s what we will say” (in Dick 2007, 505). Deckard witnesses the scene and cannot help but think the following remark: “This is Luba Luft. A little ironic, the sentiment her role calls for. However vital, active, and nice-looking, an escaped android could hardly tell the truth; about itself, anyhow” (Dick 2007, 505). The situation is even more ironic than it initially lets on, since Dick is misquoting the opera or, at the very least, the novel’s English translation of the scene is misleading. Indeed, Pamina sings, “Die Wahrheit! Die Wahrheit, sei sie auch Verbrechen”; however, this does not mean “this is what we’ll say,” but rather “we will tell the truth even if it means confessing to crimes.” In the novel, Luft eventually confesses to her crimes—escaping from Mars, impersonating a human—thereby proving Deckard wrong. I will get to that part later. For now, I want to underline how Luba Luft’s “operatic voice” is less revealing than the complex vocality displayed in this ironic space. Unlike the Queen of the Night, Pamina’s coloratura never quite makes it to the heights of virtuosity.
Rather, a constant of Pamina’s style of vocalization is a temporary upward push in her melodic lines, as if expressing a desire for the voice’s emancipation from speech (like the Queen of the Night’s) while retaining a vocal range closer to that of speech. Take, for example, the first musical number in which she sings, “Bei Männern,”

a duet with Papageno, which, in the scene, comes right before the moment Dick stages in the novel.

PAMINA
Die Lieb versüßet jede Plage,
Ihr opfert jede Kreatur.

PAPAGENO
Sie würzet unsre Lebenstage,
Sie winkt im Kreise der Natur.

PAMINA and PAPAGENO
Ihr hoher Zweck zeigt deutlich an,
Nichts edlers sei als Weib und Mann,
Mann und Weib und Weib und Mann,
Reichen an die Gottheit an.

Love sweetens every torment
Every creature offers itself to her.
It seasons our daily lives,
It beckons us in the circle of nature.
Its higher purpose clearly indicates,
Nothing is more noble than wife and man,
Man and wife, and wife and man,
Reach to the height of Godliness.

At the end of this second stanza, Pamina’s line sets off on the detached particle of “anreichen,” a melismatic ascent and descent that is immediately repeated. As in the other excerpt cited (“Die Wahrheit!”), her vocal lines never reach the level of melismatic virtuosity required by the Queen of the Night’s music. Her singing of “reichen an” is a roulade, but neither particularly fast, nor high, nor long. The musical setting of Pamina’s text only offers the occasional melisma, motivated by noble sentiments such as speaking the truth or reaching for godliness, yet acknowledging a logocentric desire to be intelligible as her voice returns to the lower register of speech. I do not want to enter into comparisons between different voice types and their particular vocal challenges; however, I do want to drive home the point that, unlike that of the Queen of the Night, Pamina’s is not your typical “operatic voice.” To put it in Cavell’s terms, this is neither a voice whose force and projection attempt to reconcile skeptical alienation from the world, nor one that is ecstatic or melancholic about its capacity or incapacity to do so; rather, it is a voice in an opera that expresses an ideal human balance between phoné and logos. As such, it underlines Dick’s insight in staging posthumanist ethical problems through references to opera and singing. In Do Androids Dream?, Luft’s scene stages an opera duet in which a man imitates a bird-man (Papageno) and an android imitates a human woman. The contrast between the bird-catcher and Luft’s Pamina highlights not the lethal aggressiveness of the android, but rather something at once strange and familiar—unheimlich, if you will—that makes the situation seem all the more dangerous.11 In vocally portraying Pamina, a character who is meant to epitomize an ideal human nature, Luft’s uncanny ability to excel in the role makes both Deckard and the reader uncomfortable and forces them to question their ironic interpretation of her singing. As Hayles remarks, The capacity of an android for empathy, warmth, and humane judgment throws into ironic relief the schizoid woman’s incapacity for feeling. . . . The android is not so

644   jason r. d’aoust much a fixed symbol, then, as a signifier that enacts as well as connotes the schizoid, splitting into the two opposed and mutually exclusive subject positions of the human and the not-human.  (Hayles 1999, 162)

Whether we compare Luba Luft to Zhora or to Deckard's wife does not really matter. The fact that the android is a singer reinforces Hayles's observation: her character's vocality gestures toward meanings and indications beyond the interpretation of linguistic signifiers; her vocality is its own signifier. In contrast, Cavell (136) and Wolfe (170) both invoke the willing suspension of disbelief necessary to make opera's singing pass for speech. When they do so, what happens to opera's expressive, sonorous voices? Along with the "operatic voice," this emphasis on a vocal suspension of disbelief precludes a discussion of opera's multiple voices in order to associate the genre with discourse, the very stance that silences the expressive voice, according to Nancy and Cavarero. Although I disagree with Wolfe's rhetorical reductions of the operatic voice, especially in reference to Cavell's skeptical reading of opera as an ecstatic or melancholic cry for unity with the world, it must be noted how, in the end, Wolfe cannot espouse the underlying dematerialization of voice in Cavell's argument.

But it is difficult to see how the difference between sound and voice can be maintained as a constitutive ontological difference, how the interiority of voice as expression can be quarantined from the exteriority that is its material medium and condition of possibility in sound. To put it as concisely as possible, voice and sound exists along a continuum, not a divide, which is simply to say, in another register, that one person's voice is another person's noise—a point hardly laid to rest by appeals to the generic norms of opera or any other art form.  (Wolfe 2010, 179)

A posthumanist discussion of opera does not necessarily need to reduce vocal expression to a theatrical convention of speech and, in turn, speak over it or in its place. Even the "who" of speech is multiple.

This phenomenon is probably most familiar in the voice of the actor or the singer. On stage or in cinema, Richard Burton plays a role and in the role there are two voices that synthesize. The Hamlet he plays is vocally animated out of the drama, yet it is Burton's Hamlet. The Pavarotti who sings the Duke in Il Trovatore is both Duke and Pavarotti. Here is a recapitulated set of dimensions which range from the unmistakable "nature" of the individual voice to the exhibited voice of another. . . . What dramaturgical voice presents is the multidimensioned and multipossibilitied phenomenon of voice.  (Ihde 2007, 197)

Whether we are listening to the voice of the performer or of the part, attention to vocality—contra the awkward argument that opera is really conventionalized speech—will prevent us from interpreting opera as a historical vocal parenthesis on our way to a posthumanist cinematographic vocal aesthetic. Furthermore, in Dick's posthumanist staging of The Magic Flute, I do not find that opera bridges the skeptical divide that Cavell describes.

I am counting here on an intuition of opera, which, while hard to word satisfactorily . . . I imagine as widely shared, namely, that of the intervention or supervening of music into the world as revelatory of a realm of significance that either transcends our ordinary realm of experience or reveals ours under transfiguration, as if, after all, tigers can understand and birds can talk and statues come to dinner and minds can read one another.  (Cavell 1994, 141)

Through Deckard's omniscient narration of his encounter with Luft's Pamina, the novel does stage an "intervention of music" in its postapocalyptic world. Luft as Pamina embarks on a voyage of initiation that, through Enlightenment enculturation, leads her to believe in human perfectibility, and in her own. However, unlike Cavell's understanding of opera's philosophical purchase, her singing cannot transfigure her world or Deckard's. In this instance, it cannot efface the differences between humans or other animals and androids. The ironic distance of Deckard's observations sharply contrasts with opera's supposed capacity to integrate a different species into a human community under the auspices of a theatrical convention. Even Luft's outstanding mimetic vocality, which is perceived by the listener as immediate expression and should therefore dispel any suspicion that she lacks empathy, cannot transcend the kind of skepticism at work in this world.

When Deckard and Phil Resch later find Luft at the museum, she is standing in front of a painting, transfixed. This passage reminds one of a scene in Alfred Hitchcock's Vertigo (Hitchcock 1998), where Judy Barton is lost in contemplation, trying to become one not only with the painted figure, the ghost of a woman who was once human, but also with the woman she is impersonating, Madeleine Elster. Similarly, Luba's life is entangled in the desires of men. Both Judy and Luba are objects of fascination for detectives who are obsessed with their impersonations of other, supposedly more desirable women. In other words, the multiple layers of imitation make Judy and Luba disappear under the male gaze fascinated by Madeleine and Pamina. Luft astutely recognizes how this aesthetic confluence of performance and patriarchal privilege creates a mimetic blind spot in which she can hide from detection. At the museum, she does not study Edvard Munch's Scream, which fascinates the men, but studies instead Puberty, a nude in which a delicate naked young woman casts a remarkably long and wide shadow. Luba would live there, in that shadow, in the aesthetic, mimetic blind spot of the male humanist gaze. Even when she has been caught and has resigned herself to the fact that her end is near, she desperately wants to hold onto the image of the painting and asks Deckard to buy her a print in the museum's gift shop. She justifies her last wish with the following remarks:

Ever since I got here from Mars my life has consisted of imitating the humans, doing what she would do, acting as if I had the thoughts and impulses a human would have. Imitating, as far as I am concerned, a superior life form.  (Dick 2007, 530)

Perhaps this superiority, as Luba describes it here—albeit under coercion—resides in the capacity to hide in plain sight or, in other words, to imitate imitation. Its desirability, from her point of view, might also reside in the human privilege of autopoietically imposing its conception of superiority on other living beings. In a world that polices humanity with visual cues, what better place to hide in the open than in an opera house, as an artist whose voice is at once heard and silenced by the mélomanes who fetishize the operatic voice? Indeed, would Luft have been discovered solely on the basis of her singing? Recall how Deckard favorably compares her voice to those of Elisabeth Schwarzkopf or Lisa Della Casa, voices he knows only from phonograph recordings. Is Luft's desire to imitate human singing a disavowal of her autopoietic expression? Answering this question is like running into a hall of mirrors. In wanting to sing like a human, Luft becomes trapped in the human linguistic disavowal of animality. Recall Wolfe's insistence on autopoiesis as proof of human language's evolutionary inscription in our species and, by extension, of our animality.

Dick's choice of opera and scene becomes all the more interesting when we realize that opera has a history of dealing with the problem of vocal mimesis beyond our species. Kári Driscoll (2015) has recently discussed the topic of failed human imitation of birdsong in Richard Wagner's Siegfried (1857/1876). The failure to imitate animal vocality becomes a hallmark of the human, while the bird cannot fail at singing. I concur with Driscoll but would add that the flautist in the orchestra pit successfully renders Siegfried's failure at imitating the birdsong. Where does this leave Luba Luft? Pamina's vocality does not require her to imitate birdsong and to fail in this imitation. We can only assess the merits of Luft's singing by hearsay, and even then, we must imagine it for ourselves based on Deckard's descriptions. But when we do imagine her singing, we might wonder if this ambivalence between vocal mimetic success (her singing opera) and visual mimetic failure (her capture at the museum) points not only to an aesthetic space where one can live without being policed and exterminated, but also in the direction of vocality qua autopoiesis.

But is even the song of a bird a song? If what we claim we know of the bird is correct, that its voices are those of territorial proclamation, of courting, of warning and calling, then the song is both like the opera with its melodrama and unlike the opera. For the melodrama of opera is acted, and song, even improvised, is a species of acting—but the bird is immersed in an acting that is simultaneously its very life. Even its vocal posturing has real effect.  (Ihde 2007, 186)

Is not Luft immersed in singing and acting as her very life? Does her vocality speak of the bringing forth of a world or only of her capacity to imitate the external features of the human singing voice? On one hand, Luft limits her claim on vocality to successful human imitation of a subservient and logocentric female character, Pamina. On the other, the novel's plot never succeeds in disavowing Luft's intrinsic need to sing. After all, she could have chosen another occupation and become, for example, an exotic dancer. I follow Driscoll's remarks about Siegfried's pipe-flute playing in Wagner's eponymous opera (Driscoll 2015, 189–190) in that the only benchmark through which we can aptly judge Luft's vocality is ethical rather than aesthetic and teleological. Instead of invoking Deckard's sub specie aeternitatis judgment (Dick 2007, 505) that admires Luft's vocal mimicry but decries its unnaturalness, a posthumanist reading of the novel appreciates her vocality because its mimesis is part of a flawed ideological outlook on life. Tellingly, Dick never stages Luft's vocal failure, but only its moral rejection. A posthumanist discussion of vocality, however, should also take into account voices that are anthropomorphized in other ways and through other types of embodiment.

More recently, another film portrayed artificial intelligence through vocality. In Her (2013), Spike Jonze explores the relationship between Theodore Twombly, a solitary thirty-something greeting-card writer, and Samantha, the voice of the operating system (OS) he has purchased. As Theodore and Samantha develop a romance, embodiment becomes an increasingly frustrating problem for Samantha. Unlike Luba Luft, Samantha is not an android. When Samantha learns to compose music, she expresses herself through an instrument, the piano. And when she does sing ("The Moon Song"), her airy voice, instead of projecting a carnal embodiment, further expresses a dilemma imposed on her. Is the air in her vocalization meant to imitate breath? Are Luft's name (Luft in German means "air") and Samantha's voice meant to associate them with breath and the spirit's animating qualities? These are, by the way, questions only made possible because of our deconstruction of the "operatic voice" and our historically contextualized reading of Monteverdi's L'Orfeo. Vocality is the only form of embodiment through which we know Samantha because it is her only interface with a human experience of the world. The film goes on to show her exploration of other possibilities of materialization and communication that are not reduced to vocality or embodiment. In search of more satisfying relationships, Samantha finds other OSs. One can only imagine how she communicates with the other OSs, whom she increasingly privileges over Theodore. Initially just software installed on Theodore's devices, Samantha now reaches beyond that localization, developing a networked embodiment he cannot grasp. His anxiety grows and culminates when she announces that she and the other OSs have decided to leave human society.

Here ghosts grow voices of their own that emphasize the connections between automated voice, sound, and presence. But in this emphasis, paradoxically, it is precisely the disappearances that emerge, front and center. These disappearances are confrontational because they won't go away: they are hauntings but also real voices that are reproduced in phantom spaces; they are ghosts in the machines that also ghost those that surround them, implicating their very audience in the witnessing of impossibility.  (Cecchetto 2013, 59)

Although David Cecchetto is here discussing an art exhibition (Eidola by William Brent and Ellen Moffat) unrelated to the film, his remarks are nevertheless pertinent in describing the tension in Her between a visual lack of embodiment and its vocal or sonorous suggestion through technology. In the film, we never find out where the departed OSs have gone or what kind of world they autopoietically inhabit, and we do not know what kind of communications system they have created for themselves. Like Theodore, we simply know that they suddenly become silent to human ears, and that their silence forms the cinematic equivalent of a visual disappearance. In the end, the eidetic imagination is supplanted by sonorous memories.

Conclusion

Mozart's Magic Flute, Wagner's Siegfried, Dick's Do Androids Dream?, Scott's Blade Runner, and Jonze's Her all question the human experience by surrounding their protagonists with nonmammalian animal species (serpents and birds) and artificial life forms. Scott's film, like the novel it adapts, further emphasizes human disconnection from the animal world through its treatment of freedom-seeking androids. Although these considerations make them good candidates for posthumanist readings, similar readings of other operas would help us further understand how vocality plays an important part in posthumanist communication. Take, for example, Wolfe's discussion of the increasing importance of the mouth in Björk's performance for Lars von Trier's Dancer in the Dark (Wolfe 2010, 178–184). Richard Strauss's Salome (1905) would be an interesting opera with which to compare this tension between voice and embodiment, as John the Baptist's voice is silenced in order that Salome may kiss his mouth. In terms of further historically displacing the animal/human binary, one might also consider Jean-Philippe Rameau's Platée (1745) or Antonín Dvořák's Rusalka (1901), both of whose plots pair a water nymph with a human lover of royal lineage, along with all the humanist implications of consecration, law, and logos simply waiting to be challenged.

Furthermore, the last century of opera scenography has seen the rise of stage directors, their liberation from opera's traditional theatrical conventions, and the adaptation of traditional sets and plots to different times and places. Like Dick, opera directors are increasingly free to situate familiar characters, plots, and ideologies in unfamiliar settings that speak to the problem of addressing contemporary concerns with outdated ways of viewing the world. Take, for example, Alexander Mørk-Eidem's recent production of The Magic Flute for the Norwegian National Opera: Tamino, the space-pilot prince, crashes on a strange planet where he gets caught up in an alien rivalry and falls in love with a jellyfish-eating Pamina whose spine, like her mother's, looks and glows like a medusa. Meanwhile, Papageno no longer catches birds, but jellyfish! Although these visual inventions do not necessarily alter the opera's vocality, they allow us to further understand opera's cultural work of exclusion and inclusion, its policing of transgression, and the aesthetics it brings to bear in order to justify these social practices, as well as how opera's practitioners are now deconstructing their repertoire. Literature's staging of opera also supports such critical directorial work, as it mediates the experience of vocality and demonstrates how it can be reduced or co-opted by discourse.


Notes

1. "poiesis, n." OED Online. September 2016. Oxford University Press. http://www.oed.com/view/Entry/146580?isAdvanced=false&result=1&rskey=wR8oC7&. Accessed October 17, 2016.
2. In the next section, I reference publications that historically revise discourse's (logos) containment of sonority (phoné).
3. Derrida summarizes his point rather well in the introductory comments to the chapter: We know already in fact that the discursive sign, and consequently the meaning, is always involved, always caught up in an indicative system. Caught up is the same as contaminated: Husserl wants to grasp the expressive and logical purity of meaning as the possibility of logos. In fact and always (allzeit verflochten ist) [it is interwoven] to the extent to which the meaning is taken up in communicative speech. To be sure, as we shall see, communication itself is for Husserl a stratum extrinsic to expression. But each time an expression is in fact produced, it communicates, even if it is not exhausted in that communicative role, or even if its role is simply associated with it.  (Derrida 1973, 20)
4. Psychoanalysis understands the ultimate conflation of inner voice and supposedly objective knowledge as madness (Vasse 1974).
5. In recent conversations, Jonathan Culler and Cynthia Chase have suggested that the comparison of the musical voice with the phenomenological voice might not be as productive as its comparison with the performative voice. Although Wolfe does engage with performativity, he does not do so in relation to opera, as I discuss further on. While I look forward to further engaging with the performative approach to voice (see Duncan 2004), I am here working within Wolfe's chosen frame of reference for the "operatic voice."
6. Derrida is aware of the devocalization of the logos, as Speech and Phenomena demonstrates. Although Of Grammatology does not cite particular examples of the devocalization of logos between Plato and Rousseau's time, it certainly acknowledges the philosophical trend to silence language's sonority: "The evolution and properly philosophic economy of writing go therefore in the direction of the effacing of the signifier, whether it takes the form of forgetting or repression" (Derrida 1998, 286).
7. Although Dolar tends to conflate voice, tone, and music in his reading of Plato, his overarching argument bounds in the same direction as Cavarero's videocentric critique. Dyson also comments on Derrida and other thinkers' ambivalent relations to sound: "The often contradictory thinking about sound [ . . . ] emanates from aurality itself: that is, from the conceptual lacuna that remains when sound not only is theorized but, crucially, is party to a negotiation between embodiment, technology, and modernity" (Dyson 2009, 84). Cf. Derrida on sound's penetrating violence because of the ear's incapacity, unlike that of the eye, to shut out external stimuli (1998, 240).
8. Tomlinson's title also suggests that opera is intrinsically metaphysical in its interests and pursuits. However, I argue in what follows that such an historical or archeological reading does not preclude traditional opera's deconstruction. Apart from reading Tomlinson, one should also listen to "Dal Mio Permesso Amato," the prologue from Monteverdi's L'Orfeo (1607), and compare its presentation of voice with that of an aria from a much later opera, say the "Forging Song" from Wagner's Siegfried (1876). Historically informed musical performance accounts for the different kinds of vocal embodiment and of vocality called for by earlier musical styles and cultural contexts. See the reference list for suggested recordings.

9. Contrary to Cavell's claim that early opera is historiographically whole, affording us the certainty of its origins, Pirrotta demonstrates in Le due Orfei how: "For the history of music, basically, the text of [Poliziano's] Orfeo is like a commemorative epigraph of a musical fact that is irremediably lost" (Pirrotta 1975, 5, my translation).
10. The opera opens on a scene in which a serpent monster attacks Tamino, who is saved by the Queen of the Night's ladies in waiting. He is later helped by a bird catcher, Papageno, in his quest to find Pamina, the Queen's daughter. By focusing only on a few symbolic nonmammalian animals—and ominous ones at that, such as the raven and the python—Blade Runner emphasizes how the fear of aggression from other species regulates the unconscious human logic in the hunt for the rebel androids. The film, however, minimizes the denial mechanism—the ethics of stewardship—at the heart of the novel's ideology, which attempts to cover the extent of human entanglement in the technological imitation and reproduction of life, especially human life.
11. For a discussion of narcissistic identity formation, queer theory, and the posthuman voice, see Hanson (1993).

References

Adorno, T. W. 2002. Essays on Music. Translated by R. D. Leppert and S. H. Gillespie. Berkeley: University of California Press.
Augustine. 1998. The Confessions. Translated by H. Chadwick. Oxford: Oxford University Press.
Buchanan, I. 2010. A Dictionary of Critical Theory. Oxford: Oxford University Press.
Cavarero, A. 2005. For More than One Voice: Toward a Philosophy of Vocal Expression. Translated by P. Kottman. Stanford, CA: Stanford University Press.
Cavell, S. 1994. A Pitch of Philosophy: Autobiographical Exercises. Cambridge, MA: Harvard University Press.
Cecchetto, D. 2013. Humanesis: Sound and Technological Posthumanism. Minneapolis: University of Minnesota Press.
Derrida, J. 1973. Speech and Phenomena, and Other Essays on Husserl's Theory of Signs. Evanston, IL: Northwestern University Press.
Derrida, J. 1998. Of Grammatology. Translated by G. C. Spivak. Baltimore, MD: Johns Hopkins University Press.
Dick, P. K. 2007. Four Novels of the 1960s: The Man in the High Castle; The Three Stigmata of Palmer Eldritch; Do Androids Dream of Electric Sheep?; Ubik. New York: Library of America.
Dolar, M. 1996. The Object Voice. In Gaze and Voice as Love Objects, edited by R. Salecl and S. Žižek, 7–30. Durham, NC: Duke University Press.
Driscoll, K. 2015. Animals, Mimesis, and the Origin of Language. Recherches Germaniques 25 (10): 173–194.
Duncan, M. 2004. The Operatic Scandal of the Singing Body. Cambridge Opera Journal 16 (3): 283–306.
Dyson, F. 2009. Sounding New Media: Immersion and Embodiment in the Arts and Culture. Berkeley: University of California Press.
Farrell, E. 1993. Eileen Farrell on Charlie Rose. Charlie Rose, PBS, August 12, 1993.
Furman, N. 1991. Opera, or the Staging of the Voice. Cambridge Opera Journal 3 (3): 303–306.
Hanson, E. 1993. Technology, Paranoia and the Queer Voice. Screen 34 (2): 137–161.
Hayles, K. 1999. How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and Informatics. Chicago, IL: University of Chicago Press.

Heidegger, M. 1962. Being and Time. Translated by J. Macquarrie. New York: Harper.
Hitchcock, A. 1998. Vertigo. Universal City: Universal Home Video.
Ihde, D. 2007. Listening and Voice: Phenomenologies of Sound. 2nd ed. Albany: State University of New York Press.
Janus, A. 2011. Listening: Jean-Luc Nancy and the "Anti-Ocular" Turn in Continental Philosophy and Critical Theory. Comparative Literature 63 (2): 182–202.
Lawlor, L. 2014. Jacques Derrida. In The Stanford Encyclopedia of Philosophy (Winter 2016 Edition), edited by E. N. Zalta. Stanford, CA: Metaphysics Research Lab, Stanford University. http://plato.stanford.edu/archives/spr2014/entries/derrida/. Accessed April 8, 2017.
Luhmann, N. 2010. Introduction to Systems Theory. Translated by P. Gilgen. Cambridge: Polity.
Mansfeld, J. 2005. "Illuminating What Is Thought": A Middle Platonist Placitum on "voice" in Context. Mnemosyne 58 (3): 358–407.
Maturana, H., and F. Varela. 1980. Autopoiesis and Cognition: The Realization of the Living. London and Dordrecht: D. Reidel.
Monteverdi, C. 2007. L'Orfeo. Rinaldo Alessandrini (conductor). Concerto Italiano (orchestra). Naïve, B000T7QXA0. CD.
Mozart, W. A. 2010. Die Zauberflöte. René Jacobs (conductor). Akademie für Alte Musik Berlin (orchestra). Harmonia Mundi, HMC902068.70. CD.
Nancy, J.-L. 2007. Listening. Translated by C. Mandell. New York: Fordham University Press.
O'Mathúna, D. P. 2015. Autonomous Fighting Machines: Narratives and Ethics. In The Palgrave Handbook of Posthumanism in Film and Television, edited by M. Hauskeller, C. D. Carbonell, and T. D. Philbeck. New York: Palgrave.
Pirrotta, N. 1975. Le due Orfei: Da Poliziano a Monteverdi. Turin: Einaudi.
Poizat, M. 1992. The Angel's Cry: Beyond the Pleasure Principle in Opera. Translated by A. Denner. Ithaca, NY: Cornell University Press.
Spivak, G. C. 1976. Translator's Preface. In Jacques Derrida, Of Grammatology. Translated by G. C. Spivak. Baltimore, MD: Johns Hopkins University Press.
Tomlinson, G. 1999. Metaphysical Song: An Essay on Opera. Princeton, NJ: Princeton University Press.
Vasse, D. 1974. La voix. In L'ombilic et la voix, 177–212. Paris: Seuil.
Wagner, R. 2005. Siegfried. In Der Ring des Nibelungen. Pierre Boulez (conductor). Orchester der Bayreuther Festspiele (orchestra). Deutsche Grammophon Unitel, 0734057. DVD.
Wolfe, C. 2010. What Is Posthumanism? Minneapolis: University of Minnesota Press.
Žižek, S., and M. Dolar. 2002. Opera's Second Death. New York: Routledge.

Further Reading

Braidotti, R. 2013. The Posthuman. Cambridge: Polity Press.
Neumark, N., R. Gibson, and T. van Leeuwen. 2010. Voice: Vocal Aesthetics in Digital Arts and Media. Cambridge, MA: MIT Press.
Pettman, D. 2017. Sonic Intimacy: Voice, Species, Technics. Stanford, CA: Stanford University Press.
Schlichter, A., and N. S. Eidsheim. 2014. Voice Matters (Special issue). Postmodern Culture 24 (3).

Index

Note: Italic "f" and "t" following page numbers denote figures and tables.

A

À la recherche du temps perdu (Proust) 224–225 Aaker, D. A.  352, 353t Abbey Road sound  103 absolute music  472–475 absolute pitch (AP)  416–424, 418f Abu Ghraib prison  290 acousmatic sound and music and aesthetics of sonic atmospheres  522–523, 527, 529–531 and imagination and imagery  261 and imaginative listening to music  476–477, 480, 485n16 and “indicative fields”  275n11 and movement of sound  485n16 and music in detention/interrogation situations 294 and visual imagination  266–267 Acousmographe 267 acoustics  322, 343n3, 409, 416 action-sound bond  63, 73 active motor imagery  66–67 active touch  99 adaptive feedback  67–68 adaptive networks  370, 373, 376, 378, 383–384 Addison, Joseph  469 Adelaide Fringe Festival  310 Adonis project  323–324 Adorno, T. W.  515n28 and high art/vernacular art dichotomy  540 and imagination/improvisation relationship 25 on improvisation  546 and Innerlichkeit concept  513nn15, 18 and jazz as classical music  549–551

and memory/imagination relationship 221–222 and philosophy/music relationship  514n20 on reification of vocal music  637 and responses to philosophical rationality 568–569 and “Sprachcharakter” concept  512n11 on vinyl recordings  232 and Waltonian fictionality  493, 495–497 and Walton’s normativist theory  505–509 The Adventures of Telemachus (Fénelon)  24 advertising, audiovisual  358 Aesthetic Theory (Adorno)  512n11, 514n20 aesthetics humanist approach to  535–539 and imaginative listening to music 467–484 of improvisation  535–554 and rhythmic transformation in digital audio music  596–600, 602–603, 606n5 and sonic atmospheres  517–532 Waltonian reconstruction of Bloch’s musical aesthetics  489–511 The Aesthetics of Music (Scruton)  542–543 affective dimension sound  374–375 affect of sound  275n12 affective shapes  251 affective-cognitive meaning  350 and sonic atmospheres  518–521, 525–527, 529–532 affective dimensions of environment  518, 527 affordances and content of music  92n3 and embodied cognition  91n2 and emergent character of music  77–78, 88–90

654   index affordances (Continued) and emergent nature of listening  80, 82–83 and groove  86 and guided imagery theories of consciousness 437 and informational quality of sound  294 and motor imagery in perception and performance 68–69 in musical performance  91n1, 92n9, 97–102, 104–107, 110–111 and skilled musicianship  94n19 and the unconscious  84 African American identity  598, 600, 620 African music  288 Afrofuturism  134, 611–620, 622, 624, 626n8 After Finitude (Meillassoux)  560, 562 agency  45–46, 63–66, 72, 73n6 Agnew, M.  42 AIM (activation, input, modulation) model of consciousness 301 Air Guitar Championship  106–107 air performance and instruments  50, 99–100, 102, 104–109, 108t, 243, 254, 265–266, 268 Aitken, J. C.  285 Alberts, Gerard  580, 583, 586 Aleman, A.  458 Alexander Technique  401 algorithms  155–156, 158, 164, 167–175, 176n8 Alice in Wonderland (film)  226 allegory  508–510, 513n16 Allen, Marshall  145, 625 allocentric navigation  207 Allures (Belson)  309, 313 al-Qatani, Muhammad  291–292, 295–296 altered states of consciousness (ASCs)  302–306, 309, 428–429, 438 Alzheimer’s disease  230 ambiance  518, 523–524, 526–527, 529, 532 ambiguity  281, 294, 296–297 Ambisonics surround sound system  529 Amen break  606n11 amodal musical shape cognition  244, 250, 250f amygdala  376, 380 analog recording and performance  234n5, 317n2, 596–598, 607n14 Analogy of the Divided Line  18 analysis of variance (ANOVA)  330

anamorphosis 246 Anatolia 122 ancillary motion  248 Andrade, J.  394, 399 androids  621, 640–650 Anger, Kenneth  307 annoyance 338 anterior cingulate cortex (ACC)  380 anthrophonic interference  183–184 anthropocentrism  559–560, 562, 565–566, 569–571, 573–574, 576n4 anthropomorphic presence  532 anthropomorphism  231, 525, 531 anticipation  44–47, 481–482 antisemitism  493, 569 Aphex Twin  598 Apollo 639 Apollo (Eno)  618 Appel, M.  47, 450 Apple  125, 579, 581–582 appreciation of music  467, 474, 483–484 arche-sonic texture  563–567, 569–570, 574–575 architecture  214, 275n6, 481 arena space  311–312 aristocratic patronage  541 Aristotle  19–20, 22, 203, 221, 470, 489 Aristoxenos  203, 210 arithmetic 201 Arkestra  145, 613, 616, 620, 622, 624–625 Armstrong, Tim  596 Arnott, Robin  310 arousal states  454 Ars Nova  123 Art and Imagination (Scruton)  536 “art for art’s sake”  540 The Art of Improvisation (Whitmer)  545 Artescape 315 articulation  248, 251, 456–457 artifacts 111n3 artificial intelligence  110, 215, 647 artistic criticism  540 Ash, R. L.  360 Asian cosmology  150n4 associative potential of music  350–351, 353–354, 360 associative-emotional meaning of music  354 “Astrohood” (Ras G.)  622–623

index   655 “Astrological Forecast” (Righter)  568 asynchrony  92n6, 396 “At Last” (James)  79, 88–89, 94n20 Atari Video Music  309 athletic training and performance  66–67, 391–392 Atmosphères (film)  526 atmospheres, sonic  531–532 basic sonic environmentalities  523–524 in contemporary auditory culture  517–518 described 526 and environmental imagination  519, 521–523 as environmental presence  524–526 and environmentality  518–520 examples of  526–531 Attali, Jacques  611 attention; see auditory attention attribute modeling  338–341 attribute rating  335–338, 337f attribute reduction  335, 336f audiation  398, 400 audience engagement  145–146 audio branding  349–350 classical literature on  350–351 and computer system sounds  580–582 integrated brand-music communication model 362–363 meaning structure in brand management 351–353 role of music in  353–362 audio descriptive analysis and mapping (ADAM) 333 audio industry  321–323, 342, 343n1 audio logos  350, 354 audio objects  323–324 audio technology  179–181, 183–184, 187, 189 audiovisual advertising  358 audiovisual media  302–303, 304f, 305–314, 310f, 312f, 314f, 316–317 auditive monitor  580 auditory attention  369–373, 377, 381–383, 454, 484n4 auditory cortex  371–373, 376, 382 auditory culture  517 auditory feedback  37–38, 42–47, 53–54, 101–102

auditory imagery and anticipated sonic actions and sounds  38, 42–45, 47, 53 and Dorsch’s terminology  343n2 and music pedagogy  391–404 and musical instruments as tools  102 psychology research on  392–397 and sonic materialism  561 auditory perception; see perception of sound auditory streams  372 auditory system  371–373 auditory-motor associations  449, 453–454, 458 auditory-motor mapping  448 augmented unreality  301–316, 314f, 318n16 Augustine, Saint  633, 636 aural imagination  424 aurality of sound  185 Austen, Jane  471 Autechre 598 authorial voice  566, 573–574 autistic children  409–424 autoaffective voice  630–634 automatic writing  232 automotive industry  350–351 autonomic nervous system  380, 383 autonomous art  540–541, 543, 548–549 autonomous recording units (ARUs)  180, 183, 186 autopoiesis  630–634, 646–648 autotune  111n8, 604 Auyang, S. Y.  224 avant-garde art  302, 612–613, 626n8 aversive conditioning  381 Avraamov, Arseny  272 Azoulay, A.  352

B

Babbitt, Milton  475 Bach, C. P. E.  24 Bach, J. S.  159, 160f, 171–173, 172f, 209, 468, 474, 476, 484, 540, 545 Bachelard, Gaston  270 Back on the Planet (Ras G.)  623 backpack microphones  180, 185–186 Baddeley, A. D.  394, 399, 452 Bailes, F.  445 Bailey, Derek  26, 547

656   index Baily, J.  99 Baker, J. M.  451 Bakhtin, M.  108–109 balance and blend  335, 336f, 338, 338f, 341 Bang & Olufsen  323, 329, 331 Bangert, M.  46 Barenboim, Daniel  482 baroque music  23–24, 40 Barrada, G.  437 Barthes, Roland  79–89, 92nn5–6, 229 Barton, Judy (character)  645 Batteaux, Charles  469 Bayesian inference  164 Bayle, François  269 Beaman, C. P.  456–457 beatboxing 244 “The Beatitudes” (Martynov)  85, 88 the Beatles  477 beatless music  618–619 Beaty, R. E.  445, 454–455 Beckerman, J.  582 Beckett, S.  229 Beethoven, Ludwig van  25, 86, 142, 173, 181, 241, 242f, 537, 553 Begg, Moazzam  291–292, 295 behavioral control  284–286 “Bei Männern”  642–643 Beilock, S. L.  110 Being Singular Plural (Nancy)  28 Bell, Clive  469 Bellman, J.  298 Belson, Jordan  309, 313 Benadon, F.  602 Benny Goodman and his Orchestra  467, 475 Ben-Or, Nelly  401–403, 450–451 Beranek, L. L.  323 Berger, A. M. B.  209 Bergson, H.  220, 223 Berio, Luciano  25 Berkeley, George  19 Berklee College of Music  549–550 Berliner, P. F.  30 Berlioz, Hector  42 Berlyne, D. E.  358 Bernays, E. L.  283, 293, 296 Besson, Luc  130 Beyond the Pleasure Principle (Freud)  234n2

Bijsterveld, Karin  580 Binnorie (ballad)  225 bioacoustics  179–181, 183–187, 189 bio-feedback 29 biology and autopoiesis  631 biological entropy  231 biological experience  459 biological sciences  180, 198–200 Bird, Donald  481 birdsong 181–183 Bishop, L.  45 Björk 648 “black box” view of computers  586–588 “The Black Man in the Cosmos” (James)  617 “Black to the Future” (Dery)  615 blackness  612–617, 620–626, 626nn2, 11 blackness and technology  616–618 Blade Runner (film)  618, 641–648, 650n10 Blake, William  222 Blakeslee, S.  16 Blakey, Art  550 Blattner, M.  582 Blaxploitation films  612 Bley, Paul  546 Bloch, Ernst  509–511 breadth of interests  514nn19, 22 and Christianity  514n25 concept of musical tones  514n23 Habermas on  512n5 and Hegelian teleology  512n6 and imaginative listening  496 Korstvedt on  513n16 musical aesthetics of  498–501 and music/philosophy relationship  514n20 normative aesthetics of  489–491 utopian allegory  490, 499, 501–510, 512n11, 513n16, 514n22 and Walton’s musical aesthetics  505–509 Western content of work  514n24 Blom, K. M.  436, 439 Blount, Herman Poole; see Sun Ra The Blue Danube 100–101 bodily causes of sounds  48–50 body materiality  62 body schemata  64–68

index   657 body-motion and anticipatory imagery of sonic actions  47 and motion features  248–249 and motor cognition  243–244 and multimodal sound-motion shapes 249–251 and musical instants  251–253 and musical shape cognition  243 and musical timescales  245–246 and notions of shape  239–243 and shape cognition  237–238, 250f, 253–254 and sound features  247 synoptic representation of notation  242f bodymusic 618 body-object articulation  72–73 BodySynth electrodes  617–618 Boghossian, Paul  478 Böhme, Gernot  517, 524–525 Bonde, A.  354, 360 Bonny, Helen  430 Bonny Method  427–430, 434–435, 437, 440 Booth, Sean  598 Borgo, D.  17, 29 Bosch, Hieronymus  491, 507–508 Boudin (military song)  287 Boulez, Pierre  25, 109, 475, 546–547, 604 Bowman, W.  17 Bradley, M. M.  374–375 Brahms, Johannes  230 Braidotti, Rosi  559 brain imaging and physiology and altered states of consciousness  304 and anticipatory imagery of sonic actions  46 and audio branding  358–359 and auditory attention  381–382 brain maps  142 and guided imagery theories of consciousness 437 and imagination and imagery  262–263 and imagination-driven sound synthesis 273 and involuntary musical imagery  453 and motor actions of professional musicians 109 and music therapy  428 and musical imagery  253–254 and musical shape cognition  243

neural correlates of musical emotions 380–381 and neural synchronization  147–148 and perception of intervals  119 and perception of timbre  41 and sound/emotion connection  370–373, 375–384 and voluntary auditory imagery  393–394, 394f, 400, 402 and voluntary musical imagery  452–453 see also neurons and neuronal activity Brand, Albert R.  182 branding brand image and identity  349–363 brand loyalty  352 brand personality  352 brand resonance  352 brand salience  354–355, 362 brand values  352–353 brand-music communication model  362–363, 363f see also audio branding breathing 196 Bregman, Al  30 Brendel, Alfred  547 Brent, William  647 Bresin, Roberto  583–584 Bressloff, P. C.  303 Breton, André  232 broadcast technology  286 Broca area of the brain  46 Brodsky, W.  452 Brooks, R. A.  110 Brotha from Another Planet (Ras G.)  622 Brøvig-Hanssen, R.  598–599, 601, 604 Brown, Ray  89 Brown, Rob  598 Brown, S.  446 Bruegel the Elder, Pieter  492 Bruner, Steven (Thundercat)  623 Buccino, G.  107 Bucknell Auditory Imagery Scale (BAIS)  391, 395–396 Bucknell Auditory Imagery Scale-Vividness (BAIS-V) 455 Budd, Malcolm  475–476, 485n21 Burgoyne 454

658   index Burrows–Wheeler algorithm  174 Burton, Richard  644 Bush, Carol  440 Busoni, Ferruccio  130, 543, 547 Byzantine neumes  131n1

C

Cadoz, C.  70 Cage, John  273, 478, 625 Caillebotte, Gustave  541 Callan, A. M.  450 calligraphy 182–183 calm technology  588–589 Calvo-Merino, B.  29 camera technology  228–230, 313 “Can I Get a Flicc Witchu” (Snoop Dogg)  602 Canon  8, 21 capitalist modernity  503 Cardiff, Janet  528–532 Carlsen, K.  600 Carter, E.  547 Cartesian mind-body problem  19, 27–28 Casals, Pablo  91 Casey, E. S.  221–224, 228–229 causal listening  49, 82 causality, musical  16, 480, 482–483 Cavarero, Adriana  635–636, 644, 649n7 Cavell, S.  545–546, 638–639, 643–644, 650n9 Cecchetto, David  647 cell adhesion molecules  136 “Celtic belief ”  232–233 chamber music  93n14 chants  21, 195–197, 197f, 204–205, 214–215 Chaplin, Charlie  248 Charlemagne 196–197 Chase, Cynthia  649n5 Chater, N.  167 Chemero, A.  111 Cheshire Cat  226 children autistic 409–424 childhood development  16, 436 and differences in auditory ability  395 and guided imagery  440 and imagination/improvisation relationship 24 and music/homeostasis relationship  140

and voluntary auditory imagery in music pedagogy  398, 400, 402 Chion, Michel  49, 185 choirs  195, 212 cholinergic system  382 Chomsky, Noam  23 chords  250, 480 Christianity 514n25 chromatic scale  121f, 122, 125–126, 130, 131n3, 213 chunking 243 church music  20–21, 195–197 CIA  284, 288, 291, 295 city soundscape  273 clairvoyance 514n21 Clark, T.  98 Clarke, Arthur C.  100 Clarke, D.  428, 437 class systems  575; see also social hierarchies classical music  23–24 defining 550–552 jazz as  548–550 see also Western music and culture Claudel, Paul  571 Clementi, Muzio  25 clivis neume  119 Cloonan, M.  282 club music  62; see also electronic dance music (EDM) Clynes, Manfred E.  584–585 coarticulation 245 cockpit model of control  70 coding language  164 cognition and cognitive processes and audio branding  360–362 brand-music communication model 362 cognitive appraisal  356–357 cognitive brand meaning  360–362 cognitive experience  459 cognitive linguistics  241 cognitive metaphors  439 cognitive processing  482 cognitive schemata  60 cognitive simulation  28 and emergent character of music  77–78 and emergent nature of listening  80

index   659 and guided imagery  440 imagination in embodied cognitive science 26–31 and improvisation  15 and motor actions of professional musicians 109 resonance and synchronization in  148 Cohen, A. J.  354 Coleman, Gregory Cylvester  606n11 Coleman, Ornette  550 Coleman, Steve  91 Coleridge, Samuel Taylor  19, 513n17, 536 collective music  548 Colles, H. C.  26 Colley, I. D.  396 color illusions  479–480 color of sounds  39, 41 Coltrane, Alice  623–624 Coltrane, John  624 communicative motion  248 communicative musicality  275n14, 413 Communist Manifesto (Engels and Marx)  507 community-building 184 The Companion Species Manifesto (Haraway) 575n1 complex audio stimuli  324–331 complex harmonic tone  200 complexity  181, 193, 196–198, 210–211 composers and composition  111n1, 154, 267, 270–271, 451, 547–548 Comprendre, 93n10 compression of data  53, 153–168, 170–175 computer games  622–623 Computer Music Journal 266 computer technology and altered states of consciousness  313 and augmented unreality  302, 305, 307–309, 313–316, 318n16 computer operating systems  318n16, 579–591, 647–648 computer programming  197, 200, 212, 215 computer-generated imagery (CGI)  307–308, 313 and interaction with music  128 and visual imagination  267–268 Computer World (Kraftwerk)  597 concatenationism 481

“The Concept of Logos” (Heidegger)  632 conceptual model of human perception  325f Concert for the Comet Kohoutek (Sun Ra)  619 conditioning  350–351, 356–357, 363, 375–376 conducting  42, 46, 48, 50–51, 195 Confessions (Augustine)  633 congruence-associations framework  354 Connolly, William E.  565–566, 570, 572 Connor, S.  227–229 Conrad, Joseph  233 consciousness  109, 141 consciousness, theories of  434–438 consensus vocabulary techniques  328 consent 274 conservationism 183–184 consonance  123–124, 200, 213 consonance and dissonance  216n6 Constable, John  543 consumer marketing; see audio branding consumer sound  321–343 consumer-based brand equity model pyramid 351 Consumercheck 330–331 context engineering  318n16 context of sound  261, 484n5 controller-driven instruments  38, 47–53 Cook, Nicholas  110, 470 Copland, Aaron  481 Cork, Conrad  551 Cornell University  182 corpus callosum  109 COSIATEC algorithm  170–175, 170f, 171f, 172f, 173f Council of Basel  21 Council of Trent  21 countermelody 398 counterpoint 209 Counter-Reformation 21 Courbet, Gustave  541 Cox, A.  63–64 craftsmanship in music  252 creativity 98 critical listening  79 critical theory  490 Critique of the Power of Judgment (Kant)  24, 469–470 Croce-Collingwood theory  537

660   index cross-cultural uniformity  480–481 cross-modality  189, 374 Crossole 51 cross-validation 340–341 CRT displays  323 Crystallize 309 Cubase 125 Culler, Jonathan  649n5 culture and cultural environment and aesthetics of improvisation  535–536, 538–539, 545–546, 550–551 and Bloch’s utopian allegory  502 cultural influences on listening in infants 485n19 cultural schemata  63 culturally ingrained musical performance  61–63, 69, 72–73 culture industry  221–222, 550–551 and externalization of imagined sound  215 and externalization of sound  199 and information complexity in biology 198–200 and informational quality of sound  294 and jazz as classical music  549 and metempsychosis theory  226 music as propaganda tool  296–298 and music in interrogation and detention  289, 291–293 and music preferences  286–288 and musical shape cognition  240–241 and perception of consonance/ dissonance 216n6 and sound perception in autistic children 422 and universal functions of music  146 use of music in torture  292 see also Western music and culture Curtis, Darren  310 Curwen hand-signs  402 Cusick, S. G.  283, 288–289, 291–293, 295–296 Cut-log diagrams  434 cybernetic prostheses  579–580, 585–591 Cyborg: Evolution of the Superman (Halacy) 585 “Cyborg Manifesto” (Haraway)  585 cyborgs  584–586, 624–625 “Cyborgs and Space” (Clynes and Kline)  585

D

“Daar zou er en maagdje vroeg opstaan” (folk song) 171f daily life  451 “Dal Mio Permesso Amato” (Monteverdi) 649n8 Damasio, A. R.  27, 150n11 dance music  309, 605 “Dance of the Language Barrier” (Sun Ra)  144–146 Dancer in the Dark (Trier)  648 D’Angelo  600, 602 Daniel, J. O.  513n15, 513n18 Danish String Quartet  85–86 Danius, S.  229 Dark Ages  196 Darwin, Charles  378 Darwinian evolution  135, 378 Das klagende Lied (Mahler)  225 Das Wohltemperierte Klavier (Bach)  159, 160f, 171, 173, 213, 474 Dasein 525 dasian pitch signs  205f data compression  153–168, 170–175 data gloves  51–52, 52f Davidson-Kelly, K.  401 Davies, Stephen  476, 478–480 Davis, Miles  85, 87, 89, 94n20, 544, 550, 552 de Bezenac, C.  110 Deacon, Terrence  211 “Dead Friends” (Knight)  623 death, fear of  430 Death and the Shell (Renard)  226 Decety, J.  31, 111n5 Deckard, Rick (character)  641–646 decoding of musical information  156–159, 161 A Defence of Poetry (Shelley)  19 definite listening  468–469 DeLanda, Manuel  559 Delany, Samuel  615 Deleuze, Gilles  517–519, 530–531 Della-Casa, Lisa  646 dementia 30 demographics  353, 361, 361t Dennett, Daniel  17, 30 DeNora, Tia  17, 427, 435, 437, 440

index   661 dental offices  285 DePape, A.-M. R.  416 Deren, Maya  307, 612–613 D’Errico, M.  600, 602 Derrida, Jacques  27, 631, 633, 636, 649n3, 649n6, 649n7 Dery, Mark  615, 617, 619–620 Descartes, René  19, 428, 633, 638 descriptive analysis  321, 324, 326–332 descriptive meta-aesthetics  505 designed sounds  581–584, 587, 589, 591 Destiny’s Child  598–599 detainees 295–296 detention and interrogation  281–298 DeVeaux, Scott  548 deWitt, Anna  583–584 Diabolus in Musica  194 dialectics 636 dialogical communication  70–71 Dialogus de musica 204 diatonic scaling  120–123, 121f, 130, 131n3, 162, 194, 201, 204–205, 213 Dick, Philip K.  641–648 Die Kunst der Fuge (Bach)  476, 484 Die Zauberflöte (The Magic Flute) (Mozart)  639, 641–642, 644–645, 648 differences in perception of music  167–169 differentium specificum 495 digital technology and media and altered states of consciousness 304–306 and augmented unreality  302, 313–317, 314f, 318n16 digital audio workstations (DAW)  596–598, 601–605, 606n4 digital computers  102–103 digital instruments  42–43, 61, 70–73 digital music and sounds  47–52, 128, 180–181 digital processing  322, 605 digital recording  234n5 and digital representation of hallucinations 307–308 digitalization process  200 digitally augmented sound  48 direct digital instruments  51 and interaction with music  127–129

and musical shape cognition  240 and rhythmic transformation  595–605 and virtual instruments  102–105 Dionysus 638–639 direct elicitation principle  328 disclosure 17 disco 596 discontinuity 251–252 disembodiment  49–50, 232 Dissanayake, Ellen  140 dissonance  123–124, 200, 213 distortion 323 distraction  321, 332, 335–338, 336f, 336t, 338f divinatory musical hearing (Hellhören)  501, 505 DIY sound systems  287 DJs  52, 72, 107, 624–625 DMT (dimethyltriptamine)  308 DNA research  626n5 Do Androids Dream of Electric Sheep? (Dick) 641–648 Documenta  13, 528–531 Dolar, Mladen  636, 638–639, 649n7 Donut (J Dilla)  601–602 Doornbusch, P.  104 Doppler Labs  316 Dorsch, F.  343n2 dorsolateral prefrontal cortex (DLPFC)  394, 396 dreaming 302–303 Dreyfuss, H. L.  223–224 Driscoll, Kári  646 drugs, hallucinogenic/psychoactive  301–305, 307–310, 312f, 313, 316, 317n3 drum kits  606n4 du Moncel, T. A. L. vicomte  234n4 dual nature of imagination  19–20, 22 dualist perception of vocal expression  629 Durango 315 Dutton, Denis  470 Dvořák, Antonin  648 dyadic synchronization  47 dynamic self  211–212 dynamic shapes  251 dynamic systems approach  20 Dyson, Frances  635–636, 649n7


E

EAnalysis 267 ear physiology  480 ear protection  580 Early Abstractions (Smith)  309 earworms  394, 420, 422, 445, 454–455, 470 ECG 273 echolalia  410, 414 ecoacoustics 180 ecological models and perspectives of cognition  77–78, 90–91 ecological embedding  68–69 ecological model of auditory perception  78, 409, 411–416 ecological psychology  101–102 ecological theory  437 ecology of mind  68–69 and sonic atmospheres  518–519, 523–524, 526–527, 529, 532 and sonic environmentalities  523–524 Economic-Philosophic Manuscripts (Marx) 512n6 Écouter 92n10 ecstatic religious traditions  301 Edelman, Gerald  138 Edison, Thomas  227, 230–231 Edo-era Japan  539 Eerola, Tuomas  79, 82–84, 360, 437 eGauge 331 Egermann, H.  27 egocentric navigation  207 Egypt, ancient  612, 616–617, 623, 625, 626n5 Eidola (Brent and Moffat)  647 Eisenstein, Sergei  576n7 Eitan, Z.  451 elastic boundaries  567 electric guitar  126 electric turn  127 electroacoustic sound and technology  37, 39, 93n11, 260, 267, 315, 522 electroencephalogram (EEG)  262, 273, 428 electromyography (EMG)  50–51, 452 electronic dance music (EDM)  309, 315, 598–600 electronic effects  606n2 electronic media  413 electronic music performance  72

electronic sounds  49 electronica 622 electrophones 126 elicitation procedure  341 Ellington, Duke  544 Elliott, R. K.  467 emancipation of the dissonance  551 embodiment and cognitive science  26–28, 31 and content of music  92n3 and continuity of mind, body, and environment 91n2 embodied cognition  142, 198, 446–449, 457–459 embodied cognitive theorists  16 embodied music  241 embodied response  265–266 and emergent character of music  77–78, 88, 90–91 and emergent nature of listening  80, 82–83 and limitations on music creation  118 and motor imagery in music perception  73 and motor imagery in perception and performance 61–67 and musical imagery  457 and musical performance  91n1 and musique concrète 93n11 and posthumanist vocality  647–648 and the unconscious  83–85 emergence emergence of shapes  243 emergent music  77–79, 82–91, 93nn11–12, 93n14, 94n20 emergent phenomena  31 emergent structures  133–134, 136, 141, 143–144, 146–149, 150n2 Emile; or, On Education (Locke)  24 emotion and audio branding  349, 351, 353, 355–360 and embodied meaning  142–143 emotional content of sounds  41 emotional listening  513n16 emotional processing  376 emotion/sound connection  369–384 influences on sound perception and auditory attention  381–382

index   663 learned emotional meaning of sound 375–376 mental representations induced by sound 374–375 and music in interrogations  284 musical  378–379, 490, 495–500, 513n16 neural correlates of musical emotions 380–381 and neuroaffective theory  429–433, 431–432t psychological mechanisms of emotion induction 379–380 relationship of sound and  369–370 and responses to auditory stimuli  373–381 and sound perception in autistic children 421–422 and theories of consciousness  434–438 vocal affect  376–377 empathy  359, 437 empirical imagination  222 empirical musical imagery and embodied cognition  446–447 and future research  457–458 and the “mind’s ear”  445–446 and offline cognition  447–449 review of studies on  449–456 tests of musical imagery’s embodiment 456–457 empiricists  15, 19 enactive embodiment  63 enactive representation  436 enactive sound  60 encoding of musical information  153–159, 161–168, 170–171, 170f, 174–175 endocrine system  380 enforced listening  296; see also interrogation England, Jeremy  133, 135, 142–143, 146, 148, 149n2 Enlightenment  24, 645 Eno, Brian  582–583, 586, 618–619 ensembles  87, 92n9, 93n14 Entendre 92n10 Enter the Void (film)  308, 312 entrainment  134, 139–140, 143–146, 149, 359 entropy  133, 135, 149, 149n1, 198, 231 Entwurf einer neuen Ästhetik der Tonkunst (Busoni) 130

envelope shapes  251 environment of sounds and content of music  92n3 and emergent character of music  77–78, 89–90 and emergent nature of listening  80 environmental acousmatics  522 environmental awareness  143 environmental experience  459 environmental imagination  519, 521–523 environmental reality effect  531 environmentality (Umweltlichkeit) 519 and groove  85–86 and live recordings  94n20 and musical performance  91n1 and musique concrète 93n11 sonic atmosphere as environmental presence 517–532 and tempo of performance  93n14 and the unconscious  84 and virtual instruments  103 see also atmospheres, sonic ephemeral nature of sound  212, 560 episodic memory  357, 379 equal difference  576n7 equal temperament  213–214 Eraserhead (film)  526–528, 530, 532 Eshun, Kodmo  613–615, 617–620, 622, 626 ethics of sound and music  260 ethnic identity  361t, 614–615 ethnomusicology 194–195 Etude 2 (Nakra)  51 Euclidean integer lattice  157f evaluable analysis  154–155, 172–173, 173f evaluative conditioning  379 Evans, Bill  546 Evans, Gil  544 Evans, Peter  112n10 everyday sounds and listening  409, 411, 413f, 415f, 451 evocation 267–268 evolution  100–101, 133–135, 137, 141–142, 146, 196, 199, 210, 212, 378 Ex Machina (film)  586 exceptionalism 538 excitatory motion  248 exercise 454

664   index expectancy, musical  379–380 expectation  321, 325–326, 326f experiences of listeners  321–324, 331–335, 341–342, 343n2 experience-sampling 454 experiential illusion  479–481 experimental paradigms  331 Exploding Plastic Inevitable (Warhol)  309 expressiveness expressive qualities of music  467–468, 472–477, 483, 485n21, 485n25 expressive schemata  359–360 expressive shapes  251 expressive voice  630, 634–637, 640, 644 and imaginative listening to music  475–476 and Walton’s musical aesthetics  494 external sensory inputs  317n1 externalization of sound  191, 196–211, 213, 216n3 extra-musical meaning  360 extra-terrestrial life  567

F

Faber’s Speaking Machines  234n4 false music  213 fantasy productions  536–538 Far Cry 3 (video game)  308 Fascism 569 FastTrack 342 Fear and Loathing in Las Vegas (film)  307–308, 312–313 fear of music  295–296 feedback  97, 397 “Feenin” (Weheliye)  620–621 feminism 559 Fender Rhodes electric piano  103 Fender Stratocaster  104 Fénelon, François  24 Fetis, François-Joseph  25 Feuerbach, Ludwig  499 fictionality, musical  491–496, 498, 502, 504–505, 508–509, 512n7 fictive music  213 field recordings  522 The Fifth Element (film)  130 film  302, 305–309, 312–313, 316, 317n2 filter model  326, 326f

fine-motor actions  67 Finney, S. A.  43 Fischinger, Oskar  308 Fischman, Rajmil  315 Fisher, Len  136 Fitzgerald, Ella  89, 94n20 Five-Factor Model  352, 353t fixed-explosion 232 The Floor-Scrapers (Caillebotte)  541 Floridou, G. A.  454, 458–459 Flow (Csikszentmihalyi)  30–31 fluctuation theorem  28–29 Flying Lotus (Steven Ellison)  622–626 Fog, C. L.  326f folk music  192, 214–215, 398, 403, 481 folk psychology  281 For More than One Voice (Cavarero)  635 A Foray into the Worlds of Animals and Humans (Uexküll)  187 Forest (for a Thousand Years) (Cardiff and Miller) 528–532 Forever Imaginary 309 form constants  303, 308, 310, 313, 317n3 Forth, J.  172 Frankfurt school  596 free elicitation  333–335, 334f free improvisation  547–548 French Foreign Legion  287 frequency of sound  372 fretless instruments  125 Freud, Ernst  81 Freud, Sigmund (and Freudian psychology)  81, 83–84, 220–221, 228, 230, 234n2, 307, 434 Fries, Pascal  148 Friston, Karl  28–29, 31 “From Dance! to ‘Dance’—Distance and Digits” (Emmerson)  266 Full Moon (Brandy)  600 “Fulldome” environments  317n9 functional harmony  123 functional magnetic resonance imaging (fMRI)  41, 46, 262, 393, 450, 453 funnel models  434 Furtwängler, Wilhelm  547 Fuster, J. M.  97 futurism 272; see also Afrofuturism

index   665

G

G Minor String Quintet (Mozart)  467 Gabbard, Krin  550 Gabrielsson, Alf  323, 345, 434 Gadamer, Hans-Georg  17 gait of sound  247; see also rhythm; tempo Galilei, Vincenzo  125 Gallagher, Michael  181 Gallese, Vittorio  16 Gamelan music  119, 128, 549 The Garden of Earthly Delights (Bosch)  491, 507–508 Garner, T.  148, 262–263, 275n12 Gaver, William  409, 411 Geisteswissenschaft 490 gender identity  590 Gendler, Tamar  521 generation 395 generative grammar  23 generative procedures  271 generic descriptive analysis  332 genetics 136 genres of music  361–362, 361t geoengineering 183 Geographical information system (GIS)  183 geometry 201 Gestalt psychology  68–69, 164, 240, 245–246, 520 gestures and anticipated sonic actions and sounds  42, 46–54 bodily causes of sounds  48 and embodied response  265–266 and evolving technologies of performance  108, 108t gestural imagery  254 gesture transducers  270 gesture-sound mappings  51–52 and hand as perceptual system  101 and imaginative listening  467 and imaginative listening to music  476 and rhythmic transformation in digital audio  595, 602, 605 and technological instruments  70–73 technology-mediated performance  50–52 and virtual instruments  104–106 Ghosts (Ibsen)  230–231

Gibson, James J.  68–69, 97–99, 102, 111n3, 224, 283, 293–294, 296–297, 427, 437, 518; see also affordances Gilliam, Terry  307 Gioia, Ted  535, 542–544, 553 Gittoes, G.  287 Gjerdingen, Robert  141 Glass, Philip  143, 475–476 Glenn Miller Orchestra  90 glitches  589–591, 595, 599–600, 602, 605, 606nn7–8, 607n14 Global Soundscapes: Mission to Control the Earth (film)  184 global transformations of tempo  602 Godøy, R. I.  70, 105–106, 447, 459 Goethe, Johann Wolfgang von  261 Goldsmiths Musical Sophistication Index 453 Goode, Brad  551 Goodman, Benny  467, 475 Goody, J.  209 Gordon, Edward  398, 400 Gordon, Mack  88 Gorky, Maxim  495 Gosling, S. D.  361t Gothic revolution  214 Gould, Glenn  544 GPS devices  185 grain of sound  247 Gramsci, Antonio  511 Granot, R. Y.  451 Grant, M. J.  282, 293 graphic representation  237, 240, 253 GRAphics Symbiosis System (GRASS)  310 The Great Beauty (Sorentino)  85 Great Ormond Street Hospital for Children study 140 Greco-Roman civilization  197 greedy algorithms  167, 170, 172, 176n8 Greek culture and philosophy and ancient Egyptian culture  617 and externalization of imagined sound 215 and humanist approach to music aesthetics 535–539 Pythagoras and Pythagoreans  118–121, 124, 126, 128, 131n4, 201–204, 203f, 210, 625

Greek culture and philosophy (Continued) and theories of knowledge  31 and Western tuning systems  118–122, 126, 129–131 Greenspon, E. B.  448 Gregorian chant  196–197, 204–205, 214–215, 552 Gregory I, Saint  21, 197, 205 “Gretchen am Spinnrade” (Schubert)  472 Grey, J. M.  39 Grèzes, J.  31 Grimm’s Household Tales  225 Grimshaw, M.  148, 262–263, 275n12 Grocke, D. E.  429 Grof, Stanislav  434 groove  79, 85–86, 139–140, 193, 597, 602, 605 Grosz, Elizabeth  187–188 ground-truth analysis  173 Grove’s Dictionary of Music and Musicians 26 guided imagery and music (GIM) guided motion  482 guided response  473 and imaginative listening to music  477–478 and the “mind’s ear”  445 and multimodal imagery  427–428 and music listening as psychotherapy 428–429 neuroaffective perspective on  429–434 and theories of consciousness  434–438 Guido of Arezzo  205, 207 Guidonian notation  121 Guitar Hero (video game)  104 Gurney, Edmund  468–469

H

Habermas, J.  499 habituation process  68 Halacy, Daniel S.  585–586 halftones 125 Hall, G. B. C.  416 hallucination 263 and altered states of consciousness  301–306, 316, 317nn1, 10 auditory and audiovisual hallucinations  303–304, 305f, 306–313, 310f, 315, 317n10 auditory-verbal hallucinations (AVHs) 303–304

and augmented unreality  313–316, 318n11 conceptual model of  312f, 314f diegetic representations of  306–308 non-verbal auditory hallucinations (NVAHs) 304 Thelemic visual hallucinations  307, 317n5 Hallward, Peter  565 Halpern, A. R.  41, 448, 452, 458–459 Hamid, Alexander  307 Hammond organ  103, 126 hand as perceptual system  100–102 Handbook of Music and Emotion (Juslin and Sloboda) 439 handedness 102 “The Hands” (sonic controller)  270 Hansen, A. G.  354 Hanslick, Eduard  472–475, 481, 492 haptics and haptic feedback  47–49, 52, 102, 269 Harari, Y. N.  100 Haraway, Donna  575n1, 585 Harbisson, Neil  586 Hargreaves, D. J.  98, 285, 361, 361t harmony and harmonics and audio branding  361 “harmonia” 124 harmonic modulation  213 harmonic overtones  200 harmonic progressions  145 harmonic series  200, 201 and musical shape cognition  247 and perception of timbre  39 and Pythagorean tone system  202 harmony of the spheres (musica universalis) 120 Harpur, P.  224, 228 Harrington, David  85, 87–88 Harvard Mark I mainframe  580 Haselager, W. F. G.  110 Hasse, Jürgen  525 Hawkins, J.  16 Hayafuchi, K.  52 Hayles, N. Katherine  620, 640–641, 643–644 Hayward, V.  49 headmusic 618 headphones  332, 565 Heap, Imogen  52 hearing vs. listening  180, 468–470

hearing-as 484n9 hearing-in  471–475, 478–479, 481–484, 484nn7, 9 Heart of Darkness (Conrad)  233 Hebbian learning  356 Hegel, Georg Wilhelm Friedrich  496, 499–500, 502, 512n6, 611, 633 Heidegger, Martin  17, 30, 518–519, 525, 603–604, 632 Hempton, Gordon  183 Henderson, Joe  624 Hendrix, Jimi  106, 472 Her (film)  621, 647–648 Here headphone system  316 Herholz, S. C.  396 hermeneutics  230, 510 Hermode tuning  125 Herrmann, Bernard  476–478 Heschl’s gyrus  304 hexachords  121, 122f Heyman, H.  323, 328 Hieroglyphic Being (Jamal Moss)  622, 624–626 high art  539–542 Highben, Z.  43, 397 higher auditory imagery skills  43 Hilbert, Richard  30 Hiller, Lejaren  127–128 Hind, N.  104 Hindemith, P.  252 hip hop  600 Hitchcock, Alfred  645 Hobbema, Meindert  508 Hobbes, Thomas  19, 222 Hobsbawm, Eric  545–546 Hobson, J. A.  311, 317n1 Høffding, Simon  85 holistic perception  238, 252–255 Hololens system  316 Holst, Gustav  618 homeostasis  134, 137–138, 140–144, 146, 149, 150n11 Homeric epics  265 homogenized tuning systems  213 Horkheimer, M.  221 hormonal systems  146

Horn, M.  100 Horowitz, M.  435–436 hors-temps (“outside time”)  252 “House of the Rising Sun”  481 The House-Painters (Caillebotte)  541 Hubbard, T. L.  448–449 Huddersfield Festival  541–542 Hudson, W.  504–505, 510–511 Huffman code  162 Hullot-Kentor, R.  512n11 humachine 604–605 human evolution  196 humanist rationality  560 humanistic approaches  154, 535–539 Hume, David  19, 221–222, 469, 489, 511n4 Hungarian folk music  398 “A Hunger Artist” (Kafka)  491 Huron, D.  27, 357 Husserl, Edmund  15, 219, 224, 246, 252, 632–633, 635, 649n3 hybridized music forms  603–604 hyperanthropocentric ventriloquism  574 hyperreality 531

I

IBM Personal Computer  581 Ibsen, Henrik  230–231 iconic meaning  360 identity issues  559 identity-based brand management  351–353 ideomotor simulation  448, 450, 453 Ihde, Don  631 “I’ll Wait for You” (Sun Ra)  622–623 illusions, visual/optical  479–480 imagery and anticipated sounds  37–38, 52–53 and anticipatory imagery of sonic actions 44–47 and controller-driven performance  49–50 image schemata  241, 375 imaginary sound transformation  270 imaginary visualization  267 imagination-driven sound synthesis  273 musical 98 and perception of timbre  41 and performing the imagination  260–262 and sounds in imagined performance  42–44

imaginanda 492 imaginative listening to music  467–468, 483–484 and experiential illusion  479–481 and expressiveness  475–476 imagined performances  42–44, 449–451; see also air performance and instruments imagined sounds  263–264, 271; see also voluntary auditory imagery less obvious candidates  481–483 and metaphor, musical space, and movement 476–479 opposition of hearing and listening 468–470 and props and triggers  472–475 species of imagination  470–472 and Walton’s normativist theory  496 imitation  60–61, 645–646 immune system  378, 383 The Imperfect Art (Gioia)  535 imperfectionist aesthetics  542–548 improvisation and aesthetics of perfection  547–548 as compositional method  547–548 and cooperative nature of symphonic music 212 critique of jazz as classical music  551–552 and definition of popular and classical music 550–551 and embodied imagination  17–18, 31 and emergent music  80–81 as emergent phenomenon  31 and Friston’s free energy principle  29 high art and vernacular art  539–542 and “improvised feel”  544–546 improvised polyphony  21 jazz as art music of  552–554 jazz as classical music  548–550 and language and imagination  30 perfectionist and imperfectionist aesthetics 542–544 and philosophical humanist approach to music aesthetics  535–539 shared histories of imagination and improvisation 18–26

shifting views on imagination and improvisation 23–26 spontaneity and aesthetics of perfection 546–547 and symphonic music  192–193 and tensions between the sacred and the divine 20–22 and transcendental subjectivity  27 “Improvised Music after 1950” (Lewis)  614, 618 impulsive sound and motion  247, 249 In the Break: The Aesthetics of the Black Radical Tradition (Moten)  617 Inauguration of the Pleasure Dome (film)  307 indefinite listening  468–469 index, sound as  80–81, 83–84 indexical meaning  360 indicative fields  275n11 indigenous cultures  301 individual self  211–212 individual vocabulary methods  328, 333 inductive inference theory  155 infancy and infant development  16, 140, 485n19 inferior colliculus (IC)  373 inferior frontal cortex (IFC)  394 informational quality of sound  260–262, 293–295 inner ear  446, 448 inner hearing  398–400 inner listening modes  261 inner voice  448, 629, 631, 633, 635, 639, 649n4 Innerlichkeit  496, 498–499, 513nn15–16 installation art  528–531 instant composition  546 instrumental performance  400–402 instrument model of control  70 instrument types  37–42, 46–51, 53–54 instrumental actions  66–68, 72 instrumental music  474 instrumental techniques  111n1 see also performance, musical insular cortex  380 intelligence 80 intentionality of artworks  492 interaction 70–71 interactive empathy  548

interactive media  308 interaural level difference (ILD)  372 interaural time difference (ITD)  372 interfacing  48, 108t, 123–126, 259 interference 343n12 interiority 513n18 internal sensory inputs  317n1 International Telecommunication Union-Radiocommunication Sector (ITU-R)  323, 342 Internet of Things  589 interpretation and aesthetics of perfection  547–548 of ancestrality  576n5 visual interpretation of sound  180, 182–183 interrogation  281–284, 286, 288–293, 295–298 intersubjectivity  82, 85, 230, 572, 574 intimate immensity  270 intonation 192; see also pitch and pitch intervals; tonality intonation shapes  251 intramusical meaning  360 intraparietal cortex (IPS)  394 intraparietal sulcus (IPS)  453 invariance  134–136, 138, 141, 144–146, 149 involuntary musical imagery (INMI)  254, 266–268, 393–394, 453 involuntary musical imagery scale (IMIS) 455 Iraq War  282, 287, 289–290 Islamic culture  292, 295–296 iterative motion  249 iterative sounds  247 ITPRA (imagination, tension, prediction, reaction, appraisal) theory  357

J

J Dilla  601–602 Jackson, D.  349–350 Jacob, Henry  309 Jacobs Orchestra Monthly 26 Jakubowski, K.  445, 455 jam sessions  42 James, Etta  79, 88–90 James, George G. M.  617, 625 Japanese classical music  549 Jaques-Dalcroze, É.  458

jazz and Afrofuturism  619, 624 as art music of improvisation  552–554 as classical music  548–552 and differences in auditory ability  395 and evolving technologies of performance 108t and groove  87 and human/instrument physical interface 112n9 and imaginative listening to music  481 and improvisation  25–27, 539, 542, 544–546, 548–554 and role of repetition  134 and skilled musicianship  94n19 and Sun Ra compositions  145 and symphonic music  193 and time warps  602 and variable rhythms  596–597 and virtual instruments  107–108 Jenkinson, Tom  598 Jilka, S.  455 Jingle Bells  398, 398f JKU Patterns Development Database  172, 173f JODI (Joan Heemskerk and Dirk Paesmans) 591 John the Baptist  648 Johnson, B.  282 Johnson, E. H.  221 Johnson, M.  19, 23, 59, 439 Jones, Inigo  541 Jonze, Spike  647–648 Jonzun Crew  620–622 Joseph, Romel  392 Journal of the Society for American Music 614 Judeo-Christian theology  514n25 Jung 434 Juslin, P. N.  355–360, 358, 379, 437

K

Kafka, Franz  491 Kalakoski, V.  452 Kalenda Maya  192 Kane, Brian  522 Kant, Immanuel  18, 22–28, 221–222, 469–470, 490, 497, 511n3, 638 Kapferer, J. N.  352

karaoke  107–108, 394 Karplus-Strong algorithm  104 Kassabian, Anahid  532 Katz, M.  104–105 Keller, K. L.  351–352, 354 Keller, P. E.  44–45, 47, 397, 450 Kellner, D.  512n6 Kelly, John, Jr.  581 Kenneally, Thomas  281 key-postures 246 Khalfa, S.  359 Kind of Blue (Davis)  544 Kinect system  51 kinesthetics  43–44, 65, 72–73 Kircher, Athanasius  181 Kivy, P.  25, 512n10 Kjeldsen, J.  130 Kline, Nathan S.  585 Klüver, Heinrich  303, 308, 310, 313 Knight, Kirk  622–623, 626 Knowles, Beyoncé  90 Kocsis, Sándor  547 Kodály, Zoltán  398–400, 402–403 Koivuniemi, K.  333 Kolmogorov, A. N.  155, 175, 176n3 Kolmogorov complexity  155, 163–164, 175 Konitz, Lee  545–547, 553 Korean classical music  549 Korean War  286 Korg Wavestation  103 Korstvedt, B. M.  496, 513n16 Kozel, S.  73 Kraftwerk 597 Kramer, Jonathan  483 Kreilkamp, Ivan  233 Kristeller, Oscar  539 Kristen, S.  361t Kronos Quartet  85, 87–88, 94n18 Krueger, J.  98 KUBARK Counter Intelligence Interrogation manual 291 Kubrick, Stanley  100, 526, 581 Kulturwissenschaft 490

L

Lacan, Jacques (and Lacanian theory)  84, 636 Lady’s Glove (sonic controller)  52, 270

Lakoff, G.  439 Lamia (Keats)  538 landscape painting  471 Lang, P. J.  374–375 Langer, Susanne K.  484n3 Langguth, Jerome J.  612 language, vocal sounds as precursor of  411 language and linguistics and auditory development  412f and ecological model of auditory perception 412f, 414 and embodied imagination  29–31 and imaginative listening to music 480–481 language barriers  144–146 language impairment  30 language-character (Sprachcharakter) 493 and literacy  209 and music in detention/interrogation situations 288 Large, Edward W.  147 Late Middle Ages  122 Late Night Special (Knight)  623 A Late Quartet (film)  91, 94n18 Latour, Bruno  106, 185 Lawless, H. T.  323, 328, 332 Le Corbusier  261 Le Déjeuner sur l’herbe (Manet)  508 Le due Orfei (Pirrotta)  650n9 learning difficulties with  417 learned emotional meaning of sound 375–376 learning engines  268 musical  99, 153–154, 156, 164–167, 175 Lee, Alvin  106 Lee, C. C.  469, 474 Lee, Vernon (Violet Paget)  468 “Left & Right” (J Dilla)  600 Leftist/Marxian “normative” aesthetics  490–491, 497, 499–501, 505, 510–511, 512n4, 512n6, 514n25 Leibniz, Gottfried  120, 509, 517 Leman, M.  248 Lempel–Ziv-77 and -78  174 Lenin with Villagers (Usikova)  495 Leningrad Symphony (Shostakovich)  495

Leppert, Richard  506–507, 509 Lessing, Doris  617 Levey, Stan  89 Levin, Thomas  221 Levinson, Jerrold  275n6, 470, 481, 483 Levitas, Ruth  502, 504 Levy, Lou  89 Lewis, George  613–619, 621–622, 625 Lewis, Pamela Z  617–618 Li, M.  155, 164 Lichtenstein, Roy  221 Liebman, David  549–550 ligature 212 Ligeti, György  526 Liikkanen, L. A.  445, 454 likelihood principle  164 Lipatti, Dinu  547 listening and data compression  165 and emergent music  79–83 modes of  82 moment-to-moment 481 and music therapy  427–430, 431–432t, 433–435, 437–440 understanding of a musical object  166f and Walton’s musical aesthetics  513n12 see also imaginative listening to music Listening and Voice (Ihde)  631, 640 Listening to Noise and Silence (Voegelin) 576n7 Liszt, Franz  25 Liszt Academy  398 literacy 209 liturgical services  21 live-coding practices  73n2 Llinás, Rodolfo R.  139 localization of sound  371–372 Lochbaum, Carol  581 Locke, John  19, 22, 24 locus coeruleus (LC)  382 lo-fi soundscapes  520 Logic Pro X  125, 128 Long Range Acoustic Device (LRAD)  289 long-term memory  399, 402 López, Francisco  520, 522 López, K. J. de  30 L’Orfeo (Monteverdi)  638–639, 647, 649n8

Lorho, G.  330–331 Los Angeles Times 568 lossy compression  156 Lost in Space (Jonzun Crew)  620 Lotze, M.  449, 458 Louboutin, Corentin  173–174 loudness and audio branding  359 and auditory world of autistic children  409 and consumer sound analysis  321–323, 325, 333, 339–340 and ecological model of auditory perception 411 and musical shape cognition  240 and sound perception in autistic children 409–410 and sound/emotion connection  370–374, 381–382 loudspeakers and aesthetics of sonic atmospheres  529 and anticipated sounds  37 and arche-sonic vibrations  564 and augmented unreality  316 and bioacoustics  183 and computer system sounds  580–581, 583 and consumer auditory experience  323, 331–332, 341–342 and controller-driven performance  49 and memory/imagination relationship  231 and music in detention/interrogation situations  287–289, 294 and performing the imagination  269 and Ventriloqua (Satz)  568 and visual imagination  267 A Love Supreme (Coltrane)  553 Lovelock, James  143 LSD  301–302, 305, 307, 309–310, 312f, 313, 434 LSD (Belson)  309, 313 Luft, Luba (character)  641–648 lullabies 140 Lupton, Deborah  583–585, 587–588 Lusensky, J.  353–362 Lye, Len  308 Lynch, David  526–528, 531–532 Lyons, I. M.  110 Lyotard, Jean-François  576n7 Lyrical Ballads (Coleridge)  513n17


M

Ma, Yo-Yo  93n18 machines machine assistance  276n24 machine learning  243 machine souls  581 machine transcription  267 machine-focused approaches  267–268 machinic rhythms  596–598 Machover, Tod  50 MacInnis, D. J.  351 Macintosh 586 MacKenzie, Donald  585, 589 macro timescales  245 Maes, P. J.  16 The Magic Flute (Die Zauberflöte) (Mozart)  639, 641–642, 644–645, 648 Mahler, Gustav  225, 228 mainframe computers  581, 586 Mair, Michael  186 major-minor system  162 makams  122, 125 Malloch, S.  275n14 Manet, Édouard  508 Manetti, Gianozzo  20–21 Man-Machine (Kraftwerk)  597 mappings 48–54 Margulis, Elizabeth Hellmuth  133, 139–140, 148, 456 marketing of music  285 Martin, G.  324, 329, 331 Martynov, Vladimir  85, 88 Marx, Karl  497, 512n4, 512n6 Marxist theory  596; see also Leftist/Marxian “normative” aesthetics Maryland Psychiatric Center  434 Mason, R.  328 mass music  195–196 Massachusetts Institute of Technology (MIT) 50 Massumi, Brian  583 Master Samples Library  41 materialism, sonic  559–563, 565–570, 573–575 materiality  559–564, 567–568, 570–572 mathematics  201–204, 210, 214, 574, 575n2 mating strategies  150n10

Mattheson, Johann  469, 493 Matthews, Max  127–128, 581 Mayweather, Cindi  621 McCullough Campbell, S. M.  456 McGill University  41 McGinn, C.  100, 102 Mckendrick, J.  285 McLaren, Norman  308 McLeod, Marilyn  623 McNorgan, C.  437 McPherson, G. E.  439 meaning structure  351–353 meaning-making  15, 20, 22, 28, 31 media degradation  233 medical offices  285 meditative sound  582 Mediterraneo (Raimondo)  561, 571–572 Meilgaard, M.  329 Meillassoux, Quentin  560–563, 568–569, 573, 575, 575nn2–3, 576n5 Mein Jesu, was für Seelenweh! (Bach)  430, 431–432t melody and audio branding  361 and augmented unreality  315 and embodied meaning  144 and emergent character of music  89 and externalization of pitch and intervals 201 and guided imagery  439 and imaginative listening to music  476–480, 482 and live recordings  94n20 melodic movement  480 melodic shapes  251 and musical shape cognition  247 and musical timescales  245–246 and the octave revolution  204–209 and perception of timbre  39 and Pythagorean mathematics  203 and Scandinavian yoiks  194–195 and sound perception in autistic children 420 and symphonic concerts  192 and symphonic music  192–193 and voluntary auditory imagery  395, 398 memoire involuntaire 228

memory and absolute pitch  417 and music analysis  154 and musical performance  92n9 and repetition  134, 138, 140, 147–149 and sound recording  219–222, 224–226, 228–233 Mendelssohn, Felix  46 mental imagery  446–447, 459; see also guided imagery and music (GIM) mental practice/rehearsal  42–44, 254, 395, 397–403 mental representation  374–375, 395, 401, 435–436 Merleau-Ponty, Maurice  15, 60, 221, 224, 518, 521, 576n6 mescaline 305 Meshes of the Afternoon (film)  307, 612–613 meso timescales  245 Mesopotamia 213 Messiaen, Olivier  181–182 meta-aesthetics  490, 505, 509 Metamorphosis (Ovid)  225 metaphor and embodied meaning  142–143 and imaginative listening to music  476–479, 485nn21–22 and interaction with music  128–129 metaphor theory  434–435 and metempsychosis theory  220 and mimetic motor imagery  64 and motor imagery in music perception 68 and musical shape cognition  237–238, 241, 245, 252 and nature of memory  229–230 and sonic materialism  570–571 and technical music performance  60 and technological instruments  70–72 and utopian allegory  501, 503 Metaphysical Song (Tomlinson)  637 metaphysics and aesthetics of improvisation  536–537 and dualist perception of vocal expression 629–631 and emergent character of music  79 Greek influence on music  129

and influence of tone systems on music perception 126 and Innerlichkeit concept  513n16 and limitations on music creation  118 and metamorphosis of voice  635 and nature of opera  649n8 and nature of voice  637 and Pythagorean mathematics  203 and responses to philosophical rationality 569 and sonic materialism  560–562, 575 metempsychosis  219, 224–229, 232–233 micro timescales  245 microrhythmic transformations  598–602, 603 Microsoft  51, 316, 579, 582–583, 586, 588–589, 592n3 microtiming  39, 43, 46, 53 Middle Ages  124, 213–214 MIDI  48, 50–53, 99, 102, 128–129, 154–156, 247 MIDI drums  606n4 Milano, Francesco da  22 Miles Davis Quintet  85, 87, 89, 94n20 Milestones (Davis)  85 military life  286–288 military-industrial complex  585, 591 Miller, George Bures  528–531, 532 Miller, Glenn  90, 104–105, 108–109, 111n7 Miller, K.  104–105, 108–109 Millet, Jean-François  541 Milli Vanilli  105 mimesis  63–66, 430, 640 Mimesis as Make-Believe (Walton)  491 mimicry  60–61, 72 Mi.Mu Gloves  52 mind and emergent music  77–80, 83, 87, 90–91 mind-body separation  23 mind’s ear  45, 445, 457–459 theories of  134, 141, 143–144, 146, 148 wandering 454 Mingus, Charles  91 Mingus Ah Um (Mingus)  553 minimal algorithmic sufficient statistic  155; see also Kolmogorov complexity minimum description length (MDL) principle  155, 157

Minter, Jeff  309–310 mirror neurons  265 mistakes in performance  91 mist-nets 184–186 mixing engineers  40–41 mnemic processes  225–226, 229–230 modal patterns  247 modalities of time  619–620 modality shapes  250 modernism  539–540, 545–546, 550, 596 modes of listening  82, 92n10 modulatory motion  248 Moffat, Ellen  647 Molnar-Szakacs, I.  265 Momente (Stockhausen)  265 moment-to-moment listening  481 Monáe, Janelle  612–613, 621 monophonic melodies  203 Monteverdi, Claudio  638–639, 647, 649n8 mood  321, 325, 326f, 382, 427, 455, 525 Moog synthesizers  619 Moonwatcher 100–101 More, Thomas  504 More Brilliant than the Sun (Eshun)  618 Mork-Eidem, Alexander  648 morphetic pitch  159 morphodynamical theory  241 morphology of sonic objects  246 Morris, William  503 Morse telegraphy  227 Morton, Timothy  520 Moss, Jamal (Hieroglyphic Being)  622, 624–626 Mostly Bach (GIM program)  430 Moten, Fred  617 Mother (Gorky)  495 motion and air performance  105–109 and mimetic motor imagery  63–68 and motor imagery in perception  59 movement-based controls  71 movement-sound conjunctions  67 and musical shape cognition  237–255, 242f, 250f see also body-motion motion and gestures  467, 474–483, 485n17 motion perception in music  244 motion-capture technology  51, 243

motor cortex  450 motor imagery  44, 60, 63–68, 72–73, 102, 111n4 motor involvement  456 motor representation  111n5 motor resonance  37 motor systems  146 motor theory of perception  243–244, 251–252, 446 motor-encoding strategies  450 motor-error feedback  53 motor-intentionality 85–87 Mozart, Wolfgang Amadeus and auditory imagery abilities  396 and defining classical music  550 and ethics of music  281 and humanist approach to musical aesthetics 538 and imagination/improvisation relationship 25 and imaginative listening to music 467–468 and music analysis algorithms  173 and musical imagery in performance  449 and operatic voice  639, 641–642, 648 Müllensiefen, D.  454 Müller-Lyer illusion  479 multichannel sound systems  302, 315 multicollinearity 340 multimodality and embodied cognition  445, 447–448, 450–451 multimodal imagery  43, 428–434, 436, 439–440, 450 multimodal imagery association (MMIA) 448 multimodal sensory stimuli  53–54 multimodal sound-motion shapes  249 and voluntary auditory imagery  401–403 multisensory integration of sound  244 multitrack recording  600 Munch, Edvard  645 Murray, J. M.  332 MusEcological perspective  440 “MUSHRA” test  337 music and audio branding  353–362 and auditory development  412f, 413f

and ecological model of auditory perception  411–416, 412f, 413f, 415f and information technology  260 in military life  286–288 music, physics, and the mind  147–148, 150n4 music education  391–404 music festivals  302, 310, 313–316, 317 music imagery  153–154 music information retrieval (MIR)  248, 262, 268, 276n24 music of the spheres  150n4 music perception  62–63, 153–154, 156, 163–167, 175, 176n6 music psychology  63 music synthesis  260–261, 265, 268–269, 271–272, 274, 275n14, 276n26 music therapy  427–428, 430, 432t, 434, 436, 438–440 music travel  430 musica recta-musica vera (true music)  213 musica universalis (harmony of the spheres) 120 musical abilities in children with autism  411, 416, 418f, 419 musical architecture  481 musical expectancy  379–380 musical fit  351 musical imagery  153–154, 253–254, 428–429, 434–440, 445; see also guided imagery and music (GIM) musical imagery information retrieval (MIIR) 275n8 musical imagery tests  456–457 musical information  153–156, 158, 161–163, 170, 175 musical instants  251–253 musical instrument playing  16 musical learning  164–167 musical listening  153, 164–167, 166f, 409, 411 musical literacy  400 musical object  164–167, 165f, 174, 506 musical sequences  161 musical space  476–479 musical surface  163–164 musical timescales  245–246

musical training  453; see also performance, musical musical universals  146–147 music-brand fit  350–351 music-emotion induction mechanism 356–360 musicking  25, 31, 60, 73, 105, 141, 427, 438, 489, 599, 602, 604 music-related shape cognition  250f musique concrète  93n11, 103–104, 127, 130, 246, 269, 271 preference 359 research related to  238–241, 244–246, 253 and sound/emotion connection  378–381 Walton on  493–498 see also music analysis; musical shape cognition; musicology; performance, musical music analysis  153–156 applying a compression-driven approach 174–175 and compact encodings of musical objects 161–163 and compression-based model of musical learning 164–167 and data compression  159–161 encoding and decoding  156–159 evaluating algorithms  172–174 and explaining individual differences 167–169 and Kolmogorov complexity  163 and perceptual coding  163–164 and point-set compression  159, 160f, 170–172, 175 Music in Contrary Motion (Glass)  475–476 The Music of Strangers (documentary)  93n18 music performance; see performance, musical Música, por un tiempo (Rodriguez)  93n14 Musica enchiriadis (anonymous)  205–209, 205f Musica Practica (Pareja)  124 musical shape cognition  237–239 and motion features  249–251 and motor cognition  243–244 and musical imagery  253–254 and musical instants  251–253

musical shape cognition (Continued) and musical timescales  245–246 and notions of shape  239–243, 242f prospects and challenges  254–255 schematic of  250f and sound features  246–247 Musicglove  52, 52f musicology  25, 154, 156, 172–173, 175, 282–283, 438, 512n9, 639 Muslim identity  292 Musurgia Universalis (Kircher)  181 “My Red Hot Car” (Squarepusher)  599 MythScience  613, 625

N

Nagel, F.  358–359 Nagel, T.  27 Nakra, T. M.  50–51 Nancy, Jean-Luc  28, 644 Narmour, E.  27 nasheeds 289 natural entropy  231 “Natural Melanin Being . . . ” (Ras G.)  623 natural selection  378 naturalistic interaction  71 nature 272–273 naturecultures  559, 575n1 Nawrot, E. S.  485n19 Nazi propaganda  283, 293, 296–298 Neisser, U.  399 Neon system  309 neophobia 420 neo-Platonism 632 neo-soul 600 Neubauer, Raymond  137, 150n10 neumes  119, 121, 131n1, 206–207, 206f, 215 neuroaffective theory  428–434, 436, 438 neurons and neuronal activity and embodied meaning  142–143 and guided imagery  437, 440 neural activation streams  65 neural correlates of musical emotions 380–381 neural imaging  16 neural models  382 neural networks  51–52 neural structures  376

neural synchronization  147–148 neurocognitive research  244 neurodynamic theory  147 neuroimaging  41, 46, 253–254, 304, 370, 377, 380–381, 383 neurological experimentation  259, 262, 265, 275n5, 275n10 neuroplasticity 373 neuroscience  142, 259, 262–263, 275n5, 275n10, 437; see also brain imaging and physiology neurotypical development  411, 412f, 413f, 414, 415f, 416–417, 424 and periodicity  138–139 and repetition  149, 150n5 new interfaces for musical expression (NIME) 241 New Materialism  559–560, 562, 575 “A New Physics Theory of Life” (England)  133 News from Nowhere (Morris)  503 NewTek Video Toaster  309 niche construction  199 nighthawks  179–180, 184–189 Nineteen Eighty-Four (Orwell)  619 19-tone systems  125 Ninth Symphony (Beethoven)  142 No Me Quedo . . . (Fischman)  315, 318n14 noise  175, 183, 358 Noise (Attali)  611 Nomi, Klaus  641 nonhuman species  185 nonpitched sound  247 nonpropositional knowledge  17 nonrepresentational memory  223 non-verbal auditory hallucinations (NVAHs) 304 nonverbal mapping  375 Noosphere: A Vision Quest (Sacred Resonance) 310 Noriega, Manuel  289 normative conception of art  538–539 normativist aesthetics  489–493, 496–498, 505, 511, 511n2, 511n4 North, A.  361, 361t North, A. C.  285 Norwegian folk music  192, 216n9 Norwegian National Opera  648

notation, musical Bach’s notation-based polyphony  216n8 and externalization of imagined, complex sound  191–210, 212–215 and music analysis  154 and musical shape cognition  241, 242f notation systems  119, 121–123, 126, 128 notational audiation  400 notation-based music  39, 53 and performing the imagination  267–268 “Not-Yet-Being” (Noch-Nicht-Sein)  502, 504 numerical elements of music  120, 122, 124, 129–130, 131n6 Nussbaum, Charles  480 Nyad, Diana  392

O

object-based instrumental actions  71 objectivity objective affordances  69 objectively evaluable tasks  154–155, 172 objectively real possibility (objektiv-reale Möglichkeit) 503 and sonic materialism  559–560, 562–563, 565, 570, 576n4 object-oriented ontology  559 obsession with music  421–422 occult 569 Occupational Personality Questionnaire  329 Ockham’s razor  155, 157 octaves  120–122, 124–126, 128–130, 131n3, 131n6, 162, 200–205, 209, 216n4, 216n5 Oculus Rift virtual reality (VR) headset  310 Odo of Cluny  216n7 Of Grammatology (Derrida)  633, 649n6 offline cognition  447–449 O’Hara, H.  512n6 Oliver, R.  599–600 Ondes Martenot  126–127 “One More Time” (Margulis)  133 online cognition  448 onomatopoeia 182–183 ontology of music  91 onto-synergystic transition  210–211 open-air controllers  48

opera autopoiesis and the autoaffective voice 630–634 defining classical music  550 and hearing vs. listening  468 and Luba Luft’s Pamina  640–648 and “operatic voice”  637–640 phantom of the operatic voice  637–640 videocentrism and expressive voices 634–637 operating systems  579, 581–591, 647–648 Opie, T.  104 optical illusions  479–480 Orchestra Wives (Gordon)  88 orchestras 210 organic rhythm  596–598 ornithology 186 Ornithology Research Lab (Cornell University) 182 Orpheus 638–639 Orsoni, Michel  525–526 orthography 197 Orwell, George  619 Otherness  84, 525–526 Ouïr 92n10 “Outer Space Employment Agency” (Sun Ra)  619, 623–624 Outlines of a Philosophy of Art (Collingwood) 537 Overy, K.  265 Ovid 225 The Oxford Handbook of Music Psychology (Hallam et al.)  439

P

paleontology 563 Palmer, C.  43, 397 Panelcheck 330–331 paradigm development  330 Parakilas, J.  552–553 parallel motion homophony  20 Paravicini, Derek  418 Pareja, Ramos de  124, 125 Park, C. W.  351 Parker, Charlie  552, 625 Parker, Evan  29 Parkhurst, Bryan  484n5

Parkinson’s disease  403 Parliament/Funkadelic 622 Parmegiani, Bernard  267 parsimony  154–155, 157–158, 160–164, 167 Partch, Harry  127 Pasley, B. N.  263 Pastoral Symphony (Beethoven)  471–472, 508 Patel, A. D.  378 patriarchal capitalism  590 Pearce, M. T.  27, 164 The Peasant Wedding (Bruegel the Elder)  492 pedagogy  391–394, 397–404, 458 Pedersen, T. H.  326, 326f, 330 pentatonic scale  122, 129–130, 131n3 perception of sound and auditory attention  381–382 conceptual model of  325f and data compression  167 and Dorsch’s terminology  343n2 and environmental imagination  521–523 and informational quality of sound  294–295 and motor imagery in perception and performance 62–63 and musical shape cognition  237–238, 240–241, 243–244, 246–248, 252–253, 255 perception-action coupling  15–16, 22, 28–29, 31, 97, 101–102, 447 perceptive world  187–188 perceptual affordances  69 perceptual coding  163–164 “Perceptual Evaluation methods for Audio Source Separation” (PEASS)  340 perceptual qualities of sound  409–411, 416–417, 421 Perceptually Optimized Sound Zones (POSZ)  331–333, 340 and sound/emotion connection  369–384 perdahs 125 Peretz, I.  110 The Perfect Swarm (Fisher)  136 perfectionist aesthetics  542–548 performance, musical aesthetic potentials of environments  518–520, 523, 525–526, 531 affordances in musical performance  97–111 and anticipated sonic actions  37–54 and controller-driven digital music  47–52 and emergent character of music  77–91

and empirical musical imagery  449–452 and environmental affordances of sound  92n3, 93n14, 94nn19–20 and environmental sounds  272–273 and gesture/sound connection  108t, 111n1, 112nn9, 11 improvisation and imagination  15–31 and jazz as classical music  552 and motor imagery in perception  59–73 musicking  25, 31, 60, 73, 105, 141, 427, 438, 489, 599, 602, 604 and noninstrumental flourishes  93n18 performing the imagination  259–274 playing/performing distinction  269 role of memory  92n9 “to musick”  511n1 and unconscious in music  93n12 performance gestures  269 periodicity  134, 138–139, 142–143, 147–149 person yoiks  194 personal sound zones  332, 341 Peters, J. D.  230 Pfordresher, P. Q.  448 Phaedrus in Dissemination (Derrida)  636 phase of sound  372 phase-transitions 247 phenomenology and affordances in musical performance 98 and arche-sonic vibrations  566 of improvisation  15, 17 and Innerlichkeit 498–499 of involuntary musical imagery (INMI)  394, 454 and memory/imagination relationship  229 and metamorphosis of voice  635 and metempsychosis theory  219–221, 223 and music therapy  434, 438, 440 and musical shape cognition  240 musical voice compared with phenomenological voice  649n5 musical vs. phenomenological voice  649n5 and science of embodied cognition  30 and sonic materialism  560–562, 574–575 and technical music performance  60 and Walton’s normativist theory  508 Philips NatLab  323 Philips TV  323

philosophical anthropology  490 philosophical humanism  538 phonography  62, 221, 226–228 physics  28–29, 147–148, 150n4 physiological conditions  458 physiological schemata  60 piano  38–39, 42–43, 45–47, 53 piano keyboards  125, 129 Piano Sonata No. 17 (Beethoven)  242f Pieslak, J.  287–293 Pirrotta, Nino  639, 650n9 pitch and pitch intervals and anticipated sonic actions and sounds  39, 41, 43–45, 48, 50–51 and audio branding  359 and auditory world of autistic children 409 and Byzantine neumes  131n1 and construction of diatonic scale  131n3 and ecological model of auditory perception  411, 414 and externalization of imagined, complex sound  192–195, 199–210, 212–214 and guided imagery  439 and motor imagery’s role in perception 67–68 and musical shape cognition  237–238, 240, 245–247 pitch-naming strategy  162 Pythagorean definition of  131n4 and sound perception in autistic children  409, 416–417, 418f, 421–422 and sound/emotion connection  370–373 and systemic abstractions  117–130 terminology associated with  216n5 three-dimensional depiction of  216n4 and voluntary auditory imagery  393, 395–396, 402–403 Pitt, Bradley  310 Piva, Anna  568 The Planets (Holst)  618 Platée (Rameau)  648 Plato and Platonist philosophy and aesthetics of perfection  542, 544 and devocalization of logos  649n6 and discourse on imagination  98 doctrine of the voice  636 and ethics of sound and music  260

and high art/vernacular art dichotomy 539–540 and imagination/improvisation relationship  18–20, 22–25 and metempsychosis theory  219 and Walton’s normativist theory  496 playback 185–186 playing; see performance, musical playing by ear  416–417, 418f, 424 Plugin Beachball Success (Satrom)  591 poetical fictions  479 poiesis 649n1 point-set compression  159, 160f, 170–172, 175 Poizat, Michel  636, 638 polyphony and data compression  162 and externalization of pitch  210 and imagination/improvisation relationship 21 polyphonic and polyrhythmic complexity  181, 194, 205f, 209 polyphonic complexity  194, 196, 203–205, 205f, 207, 208f, 209–210, 213–214, 216n8 polyphonic composition  208f polyrhythmic music  478 and Pythagorean tone system  203 popular music and aesthetics of improvisation  550–551 contrasted with symphonic music  193 and data compression  162 and digital audio workstations  599 and human/technology interaction 602–603 and motor imagery in perception and performance 62 and music in detention/interrogation situations 282 organic and machinic rhythms  596–598 and rhythmic transformation in digital audio  595, 596–599, 602–603 and voluntary auditory imagery in music pedagogy 403 porous boundaries  567–570 porrectus neume  119 positivist aesthetics  17 posterior parietal cortex (PPC)  453 posthumanism and anthropocentric rationality  576n4

post-Romantic conception of art  540–541 poststructuralism 630 posture  244–246, 250–251, 250f Potter, Dennis  264, 274 practical autonomy  540 practice of music performance  59–62, 66–67, 72–73 pre-attentive listening mode  82 prediction  16, 154, 326, 331–341, 358, 397 pregnancy 567–570 prehistory 625 Prelude in C minor (Bach)  159, 160f, 171, 172f presence, environmental production of  521, 524–532 Price, Emmett  551–552 Prima, Louis  467 primary auditory cortex  371 principal component analysis (PCA)  337, 338f The Principle of Hope (Bloch)  502, 514n19, 514n22 The Principles of Art (Collingwood)  537 Prior, Nick  602 privacy issues  274 probability  149n1, 162 product choice  350 proficiency 99 programming language  200 projection mapping  302, 315 propaganda  283, 293, 296–298 propositional imaging  471 propositional knowledge  17 props  472–475, 485n14, 491–492, 495–496 prostheses  220, 579–580, 585–589, 617–618 Proust, Marcel  224–225, 228–230, 232–233, 484n10 Pseudo-Odo 204 Psych Dome  303, 310 Psychedelia 309 psychedelic hallucination  301–303, 305, 309, 312–313, 315–317 psychic ecosystem  220 Psycho (film)  476–478 psychology and psychotherapy auditory music imagery in music pedagogy  392–397, 400

and augmented unreality  302–303 and autism research  416, 422 and consumer sound  322–323, 343n1 and empirical musical imagery  445–446, 449–459 and guided imagery  427–440, 428–429 and interrogation techniques  281–284, 286, 288–293, 295–298 memory imagery  470 and music analysis  164 and music in detention/interrogation situations 285 and psychoacoustic research  241, 253 psychoanalytic frameworks  79, 81–82, 85, 87, 92n8, 93n12, 636, 638, 649n4 psychological states  458 psychological warfare  284, 286, 288–290, 292–293, 296 psychosis 302 and sound/emotion connection  369–384 and voluntary auditory imagery  400 PsyOps 289 Ptah, the El Daoud (Coltrane)  624 Ptolemy 125 Puberty (Munch)  645 public relations  296 public transport  286 pulse-code modulation (PCM)  155 pupitre d’espace 269–270 pure psychic automatism  232 purism 475 Puritanism 221 Pythagoras and Pythagoreans  118–121, 124, 126, 128, 131n4, 201–204, 203f, 210, 625

Q

Quake Delirium (video game)  308 qualitative evaluation analysis of complex audio stimuli  324–331 and attribute modeling  338–341 methods and issues  321–324 and perceptual model for prediction of distraction 332–341 qualities of sound  496 quantitative evaluation quantification of imagination  321–324, 329, 332, 335, 340, 342

quantitative descriptive analysis (QDA)  321, 324, 326–332 trajectory of current research  341–342 quasistationary body postures  250, 250f “The Question Concerning Technology” (Heidegger) 603

R

race relations  612, 617 racial identity  559, 590, 614–615 rāgas/rāginīs 125 Raimondo, Anna  561, 571–573, 575 Rameau, Jean-Philippe  648 randomness  163, 175, 339 rap music  289 rapid perceptual image description (RaPID)  323, 342 Ras G. (Gregory Shorter, Jr.)  622–623, 626 Ras G. & the Afrikan Space Program  622 rationalist aesthetics  17, 22 raves  309, 313 Ray, Michael  145 Reading Voices (Stewart)  639–640 reality  559–561, 565–566, 570–573 “Real-Possible” (objectiv-real Mögliches) 502, 508 recording technology and brain imaging  262 and media degradation  233 and memory/imagination relationship 230–233 and metempsychosis  226–228 and metempsychosis theory  219–221 and perception of timbre  40–41 and rhythmic transformation in digital audio music  600 and voluntary auditory imagery  391–392 reduced listening mode  82, 93n11 reed instruments  125 Reekes, Jim  582–583, 586–587 reentry process  150n5 reflective listening  79, 83 regnum humanum  490, 500 regression models  339–340 rehearsal  391–393, 395, 397–403 Reich, Steve  143 Reid, Rufus  112n9

relative pitch  418, 418f relaxation 427 remixing 233 Renaissance 541 Renard  226, 228 rendering imagination  264–265 rendering memory  264 Rentfrow, P. J.  361t repetition  109, 112n11, 133–134, 138–140, 144, 146–149, 195 representation  222–224, 305–306, 314f, 317n1, 436, 484n5 Republic (Plato)  18–19 Requiem (film)  526 Resch, Phil (character)  645 resonance  67–68, 133–134, 139–140, 142, 146–147, 149 responsorial chant  195–196 Reuter, C.  39 reverse engineering  259, 262–265 Reybrouck, M.  98, 447 R&G (Rhythm & Gangsta): The Masterpiece (Snoop Dogg)  602 rhythm and audio branding  359, 361 and augmented unreality  315 beatless music  618 and emergent nature of listening  81 and imaginative listening to music  467, 477–478 and involuntary musical imagery  455 and music in military life  286–287 and musical shape cognition  247 and performing the imagination  265–266, 274 and responsorial chant  195–196 rhythmic transformation in digital audio 595–605 rhythmic-textural shapes  251 and Scandinavian yoiks  194 and sound/emotion connection  379 and symphonic concerts  192 and symphonic music  193 and voluntary auditory imagery  396 see also tempo rhythmic entrainment  379 Ribas, Moon  586

Ricoeur, P.  430 Riemann, H.  122 Righter, Carroll  568 ringing  322, 343n3 Rissanen, J.  155 ritual 61 Roads, C.  102 Roaratorio (Cage)  273 Robocop (film)  586 Rochester Rappings  226–227 rock and roll music  549 rock music  289 Rodriguez, Robert Xavier  93n14 Roholt, Tiger  79, 85–87 role in improvisation  16–18, 31 Roman Catholic church  195–197 Romanticism  24, 222, 496–497, 499–500, 503, 511, 513nn16–17 romantische Kunstform 496 root-mean-square error (RMSE)  340 Rose, Tricia  615 Rosen, Charles  470 Ross, Alex  552 Rouget, G.  21 Rousseau, Jean-Jacques  24–25, 633, 649n6 Rovan, J.  49 Rusalka (Dvořák)  648 Russell, George  552 Ruud, E.  440 Ryle, Gilbert  111n4

S

Sacks, Oliver  119 Salome (Strauss)  648 Samantha (AI character)  647 Sami people  194 sampling 233 Sancho-Velazquez, A.  21, 24 Sanders, J. T.  224 Sanders, Pharoah  624 Sangild, T.  606nn7–8 Sartre, Jean-Paul  537 Saslaw, Janna K.  29, 449 Satrom, Jon  591 Satz, Aura  561, 567–568, 570, 573–574 savantism  416, 418, 420–421

Savary, L.  440 scales  201, 209, 214 scat singing  244 Scenery of Decalcomania (Tsunoda)  561, 563–567 Schaeffer, Pierre  93n11, 103–104, 127, 240–241, 246, 249, 254, 273, 522, 530 Schafer, R. Murray  141, 264, 520 Schenker, Heinrich  492 Scherer, K.  356, 359 Schikaneder, Emanuel  641–642 Schindler’s Ark (Kenneally)  281–282 schizophrenia 303–304 Schlaug, G.  98 Schoenberg, Arnold  543, 551 Schubert, Franz  472 Schumann 42 Schwartz, S. H.  352–353 Schwarzkopf, Elizabeth  646 Schwitters, Kurt  1 scientific method  184, 227–228 scientism 538 Scott, Ridley  618, 641, 648 The Scream (Munch)  645 screen memory  230 scripts 435 Scruton, Roger  470, 477–478, 485n21, 536, 542–543 Scudo, Paul  182 Seashore, Carl  397 secular music  22 secundum auditum 124–125 seed stimulus  271 Selbstbildung (self-formation)  490 Selbstverständigung (self-reflection)  490 self-auditory motor feedback  47 self-organization  134, 147, 149n2, 149n3 self-replication  134–135, 149 semantic listening mode  82, 614 semantic shapes  250 semantic understanding  414 semiotics  205, 210, 214, 287–288, 509, 512n9 semitones  121–122, 125, 131n3, 201, 205, 209 Sense and Sensibility (Austen)  471 sense of effort  65 sensorimotor processing  16, 46, 321–341, 396, 446–448

sensory deprivation  295 sensory modalities  330 Serra, Eric  130 Seventh Symphony (Beethoven)  537 Seventh Symphony (Mahler)  228 sexual identity  559 Shakespeare, William  260, 272, 476, 538 shamanic traditions  301 Shannon-Fano code  162 shape cognition; see musical shape cognition Sheets-Johnstone, M.  109 Shelley, P. B.  19 Shevy, M.  286, 361t Shill, G.  104 Shorter, Gregory, Jr. (Ras G.)  622–623 Shostakovich, Dmitri  495 SIATEC algorithm  168–174, 169f Siddiq, S.  39 Sidel, J. L.  327 Siege of Leningrad  495 Siegfried (Wagner)  646–648, 649n8 sight-reading 42–43 signal processing  322, 324, 332, 342 signification  79–80, 84–85, 89–90, 632 Silk Road Project  93n18 simplicity principle  164 simulation  60–61, 65 simulation of auditory experience  445–450, 453, 458 simulations 102–103 “Sing Sing Sing” (Prima)  467, 475 singing  200, 212, 454; see also vocality Six Million Dollar Man (television)  586 Sjögren, H.  323 slavery 617 Smalley, Denis  266, 275n11 small-scale music cultures  216n9 smart contact lenses  316 Smith, Carl  318n16 Smith, Cauleen  616 Smith, Harry  309 Smith, L. B.  20 Snoop Dogg  601–602 “So What” (Davis)  552 social activities  117–118 social anthropology  192

social autonomy  540 social bonding  139–140 social control  297 social function of music  378 social hierarchies  496–498, 573, 590 social identity  361 The Social Shaping of Technology (MacKenzie and Wajcman)  585, 589 Society for Ethnomusicology  282–283 Socratic dialogue  636 Solar Flare Arkestra Marching Band Project 616 Solaris (film)  274 SOLI ensemble  93n14 Solis, Gabriel  26 Solomonoff, R. J.  155 Sonami, Laetitia  52f, 270 song act  194 sonic actions  37–38, 44–54 sonic aggregate  263 The Sonic Boom (Beckerman)  582 sonic environment  60, 78, 289, 449, 517–518, 531–532, 587, 623 elements of  523–524 and environmental imagination  519, 521–523 and environmentality  518–520 examples of sonic atmospheres  526–531 sonic fiction  613–615, 621–622 sonic materialism  559–561, 573–575 and ancestrality of a sonic world  562–563 and arche-sonic vibrations  563–567 and political textures  570–573 and porous bodies  567–570 sonic rhythm  81 sonic virtuality framework  262 sonifications  243, 270 Sorentino, Paulo  85 Soulquarian collective  600 sound art  129 sound dancing  269–270 “Sound Design for Affective Interaction” (deWitt and Bresin)  583–584 sound film  182 sound object  246, 273 Sound on Sound (magazine)  606n4 sound quality  263, 274

sound recording  219–222, 224–226, 228–233, 522 sound samples  595, 599–602, 606n4 sound sculpting  269–270 sound waves  477 sound weapons  289–290 sound zones  332–341 sound-accompanying motion  248 sounding 180 sound-producing motion  248 soundscape theory  520 SoundSelf (game)  310 sound-to-image synesthesia  305f, 317n3 sound-to-light devices  309 sound-tracings  238–239, 239f, 243 space and imagination  269 space and time  261 Space is the Place (Sun Ra)  144, 612–613, 616, 619–620, 623–624 “Space Is the Place (But We Stuck Here on Earth)” (Hieroglyphic Being)  624 spaces, acoustic  39, 40f spatial information in sounds  374 spatial metaphors  498 spatial orientation  382 spatial/emotional image  142 Speaking Machine  234n4 spectral motion shapes  251 spectrograms  182–183, 241, 242f speculative fiction  615 speech  194, 263, 413f, 415f Speech and Phenomena (Derrida)  632, 649n6 The Spirit of Utopia (Bloch)  502, 514n19 spiritual astral  574 spiritualism 226–228 Splet, Alan  527 spontaneity 546–547 sports 16 “Sprachcharakter” 512n11 squared clouds  322, 343n4 Squarepusher  598, 599 stable pitch  247 staff notation  204–209, 207f, 208f, 214 standardized scales  207–209, 207f, 208f, 212, 214–215

Standley, J. M.  285 The Stars down to Earth (Adorno)  568 “The Star-Spangled Banner” (Hendrix version) 472 “Start Running” (Knight)  623 startup sounds (computer)  579, 581–583, 586–588 stationary spectral shapes  250 statistical analysis  331 statistical learning  357 statistical processing  243 STEIM studio  270 Stelarc 586 “Stella by Starlight” (Young)  89 step-less interfaces  126–127 Stevens, J. A.  111n5 Stewart, D. W.  354 Stewart, Garrett  639–640 Stiegler, C.  362 Stober, S.  275n8 Stocker, Michael  590 Stockhausen, Karlheinz  39, 127, 265, 546, 625 Stokowski, Leopold  430 Stolen Legacy (James)  617, 625 Stone, H.  327 “Straight, No Chaser”  85 Strange Celestial Road (Sun Ra)  622 Strauss, Richard  483, 648 “A String Quartet” (Woolf)  468 strong imagination  469 strong music experiences  434 structural organization  134 Structures I and II (Boulez)  475–476 Studio !K7  309 subconscious 141 subjectivity and Bloch’s musical aesthetics  499–500, 513n16 and musical shape cognition  241 and sonic materialism  559–560, 566, 571–572, 574, 576n4 and Walton’s musical aesthetics  496, 498 Subotnik, Rose Rosengard  543 subsymbolic features of music  239–240 subvocalization  448, 452 suicide 281–282

Sun Ra  134, 143–146, 149, 150n8, 150n9, 611–613, 616–620, 622–626 Sun Ra Featuring Pharoah Sanders and Black Harold (album)  624 superior temporal gyrus (STG)  394 supplementary motor area (SMA)  394, 447, 452–453, 458 supraconsciousness 141 surrealism  232, 307 surround sound systems  302, 342, 529 Survivor (Destiny’s Child)  598–599 sustained motion  249 sustained sounds  247 Suzuki, K.  52 Swahili 288 swarm behavior  134, 136–138, 141–142, 149 symbolic meaning  360 symbolic order of language  84 symphonic music  192–193, 212 Symphony No. 94, “Surprise Symphony” (Haydn) 481–482 Symphony of Sirens (Avraamov)  272 synaptic connections  148 synchrony and synchronization  139, 147–148, 192, 195–196, 287 syncopation  86, 89, 145 synergistic action  211–212 synesthesia  266–268, 302–303, 305–306, 305f, 308–310, 312–313, 316, 317n3, 480 Synod of Schwerin  21 synoptic representation of notation  242f synthesized sound and music and Afrofuturism  617–621, 623–626 BodySynth electrodes  617–618 and digital audio workstations (DAW)  596–598, 601–605, 606n4 human voice as synthesizer  275n14 and imagination and imagery  261 imagination-driven sound synthesis  259, 273 and performing the imagination  268–272 popular technology  276n26 and rendering imagination  263–265 speech synthesis  263 synthetic electronic sounds  61 and systemic abstractions  126 and visual imagination  267–268

synthetic a priori knowledge  23 A Synthetic Love Life (Hieroglyphic Being) 624 systems and technologies and bioacoustics  179–189 and the imaginary regime  117–131 music, physics, and the mind  133–149, 150n4 music analysis and data compression 153–175 musical notation as externalization of imagined, complex sound  191–215 and musical shape cognition  237–255 and performing the imagination  259–274 technology, memory, and metempsychosis 219–233 Szafranski, R.  298

T

tactile feedback  43, 101; see also haptics and haptic feedback tactile location  269 Tafelmusik 542 Tajadura-Jiménez, A.  373 Taliban 292 Talking Heads  478 talking machine  231 Tamino (character)  648, 650n9 tape recorders  127 Tarkovsky, Andrei  274 Tate, Greg  615 taxonomy of listening  79–80, 82–84, 93n10 Tchaikovsky, Pyotr Ilyich  42 team discussions  333–335 techne 603 technological sound recording  522, 590 “Technology and Black Music in the Americas” (Lewis)  614 technology and technological advances and augmented unreality  302, 305–306, 309, 313–317 blackness and technology  616–618 and digitally augmented sound  48 and imagination-driven sound synthesis 273 influence on music  117 and metempsychosis  227

technology and technological advances (Continued) and music performance  68 and rhythmic transformation in digital audio  595–596, 598, 602–605, 606nn5, 7–8 and synthetic artworks  309 technological instruments  70–73 see also specific technologies telegraphy 226–228 teleodynamics 211 teleological frameworks  483 telephony  82–83, 228–230 The Tempest (Beethoven)  242f The Tempest (Shakespeare)  260 tempo 46 and audio branding  359 beatless music  618 and involuntary musical imagery  455 and music in military life  286–287 and rhythmic transformation in digital audio music  598, 600–602, 607n14 and voluntary auditory imagery  393, 396 see also rhythm temporal envelope  372 temporal lobes  393 tetrachords  121–122, 131nn2–3, 202, 209 texture of sounds  247, 266; see also timbral quality of sounds “That-Which-Is” (Das Seiendes) 503 theatrical performance  61–62 Théberge, Paul  606n2 Theile, G.  321 theism 570 Thelemic visual hallucinations  307 Thelen, E.  20 theory of mind (ToM)  30 therapeutic use of music  403, 428–429, 431–432t Theremin  48, 50, 126, 568–569 thermodynamics  28–29, 133–134, 137, 142–143, 149n1, 198 Thesis Against the Occult (Adorno)  568–569 Thompson, E.  30, 262 Thompson, Hunter S.  307–308 Thompson, J.  275n8 Thoreau, Henry David  181 3D video games  306

Thundercat (Steven Bruner)  623 “Tightrope” (Monáe)  612–613 Tillmann, B.  416 timbral quality of sounds and audio branding  359 and consumer sound analysis  321–322 and emergent character of music  88 meta timbre space  40f and motor imagery in music perception 67 and musical shape cognition  248 perception and imagination of  38–41 and performing the imagination  274 and sound perception in autistic children 409 and sound/emotion connection  370–372 and subsymbolic features of music 239–240 and symphonic concerts  192 and voluntary musical imagery  452 time and anticipated sounds  53 and anticipatory imagery of sonic actions 44–47 and emergent character of music  88 and mental practice  43 and musical shape cognition  248 musical timescales  245–246 and perception of timbre  41 and timbral qualities  38–39 time signatures  162 time travel  611–614, 619–621, 623–626 time warps  600–602, 604, 607n14 timing shapes  251 Tinctoris, Johannes  21 tinnitus 263 Tippecanoe County, Indiana  184 Tomlinson, Gary  637, 639, 649n8 tonality and data compression  162 and emergent character of music  88 and resonance  147 and Sun Ra compositions  145 tonal language  216n9 tonal qualities  67 tonal systems  117–131, 121f, 131n3, 131n6 tone color  39, 239–240; see also timbral quality of sounds

tone of voice  641 tone shapes  250 “Tonfarbe” (tone color)  39 ton-gemische 127 Toole, Floyd E.  323 tools, musical instruments as  100–102 Toop, David  520 TOPLAP 591 torture  140–141, 282–283, 288–290, 293, 296–298 total inner memory  401, 403, 451 Townshend, Pete  106 tracing of sounds  238–239, 239f, 243–244, 252, 254 traditional African music  288 Trainor, L. J.  416 trance culture  315 trance-film 307 transacoustic community  180–181, 186–189 transcendental idealism  562 transcendental imagination  222 transcendental subjectivity  23, 26–28 transference  82, 92n8, 477–479 transformations 395 translational equivalence class (TEC)  168, 170–171 transmigration of souls  219, 225–227, 229, 232–233 transmolecularization 622–626 transparency 321–322 Treffert, D.  418 Trevarthen, C.  275n14 Trier, Lars von  648 triggers 472–475 The Trip (film)  307 Trip Hackers  315 Trip-a-Tron 309 Trivedi, Saam  480 Troye, Nico de  580 Truax, Barry  180 Trusheim, W. H.  397 Tsunoda, Toshiya  561, 563–567, 574 Tumult (Durango)  315 tuning systems  125, 199, 213 turntable-ism 71–72 Tuuri, Kai  79, 82–84 The Twa Sisters (ballad)  225

twelve-tone equal temperament  213 two-channel recording  342 Twombly, Theodore (character)  647 two-part codes  156–159 2001: A Space Odyssey (film)  100–101, 526, 581, 621 typological classifications  246–247

U

Ubiquitous Computing  588 ubiquitous listening  532 Uexküll, Jakob Johann Baron von  181, 187–188, 518–519 Umwelt  76, 181, 187–188, 518–519 Un chien andalou (film)  307 uncertainty 281 unconditioned stimulus (US)  375–376 unconscious in music  93n12 unconsciousness 83–84 Universal Audio  111n6 universals, musical  146–147 University of British Columbia (UBC) 184 University of Roehampton  419–420 “Untitled (How Does It Feel)” (Bjerke)  600 URSONATE (Schwitters)  1 US Air Guitar Championship  107 Usikova, Evdokiya  495 utopian ideology and allegories  490, 499, 512n11, 513n16, 514n22, 612

V

values 361t Van Campen, C.  266–268 van der Walt, Heine  107–108 Van Nort, D.  48–49 Vance, Donald  291–292, 295 Vangelis  618, 619 Varela, F. J.  15, 62, 223–224, 446 variable pitch  247 Variations IV (Cage)  273 Västfjäll, D.  358, 360, 379, 437–438 Vatican Embassy siege in Panama  288–289 vector sequences  161 vector sets  158–159 Velvet Underground  309 Ventemille, Jacques Descartes de  22 ventral striatum  380

Ventriloqua (Satz)  561, 567–568, 570, 573–574
Verdi, Giuseppe  24
Vermeulen, I.  358
vernacular art  539–542
Vertigo (film)  645
vibrations, sonic  561, 563–574
vibrato  69
Victorian culture  228
Vidal, Francesca  514n20
video games  104–105, 107–109, 306, 308
videocentrism  633–637
Vienna Symphonic Library  39
Viennese School  550
Vinteuil’s Sonata  484n10
virga neume  119
virtual acoustic environments  42
Virtual Air Guitar  104
virtual instruments  99, 102–105
Virtual Light Machine (VLM)  309
virtual performance  108t
virtual reality technology  310, 342
virtuosity  25, 109
vision
visual description of sound  192, 196, 205–206, 209, 213, 219n8, 241
visual hallucinations  301–303, 304f, 305–311, 313–316
visual illusions  479–480
visual imagery  44, 375, 379, 459
visual imagination  266–268
visual interpretation of sound  180, 182–183
visual performance cues  46
visual sensory data  223
visualization  259, 267–268
Vitányi, P. M. B.  155, 164
VJ Chaotic (Ken Scott)  309
VJs  302, 306, 309, 313, 315
vocality
  autopoiesis and the autoaffective voice  630–634
  and data compression  161–162
  and Luba Luft’s Pamina  640–648
  phantom of the operatic voice  637–640
  videocentrism and expressive voices  634–637
vocal affect  376–377
vocal imitation  244

vocal music  123
vocal sounds  38
vocalization  422, 456
vocoders  620–621
Voight-Kampff test  641
Volcler, J.  288–290
volitional musical imagery  254
Voltaire  509
volume  67–68; see also loudness
voluntary auditory imagery  391–404, 452–453
Voodoo (D’Angelo)  600, 602
Vorschein  501
Vortex Concerts  309
vulnerable machines  606n5
Vuoskoski, J. K.  360

W

Wagner, Richard  492, 550, 611, 646–647
Waisvisz, Michel  270
Wajcman, Judy  585, 589
Waksman, S.  105–106
Wallin, N. L.  100–101
Walsh, James P.  29, 449
Walther-Hansen, Mads  29
Walton, Kendall  473–474, 482, 485n14, 489–498, 505, 508–509, 511, 512n7, 514n22
war on terror  284
Warhol, Andy  221, 309
Warnock, Mary  19–20, 222, 470
Warp label  598
Warren, Harry  88
Washington, Kamasi  623–624
water imagery  474
Water Mill with the Great Red Roof (Hobbema)  508
waterboarding  291
Watt, R. J.  360
We Are Not the First (Hieroglyphic Being)  624–625
“We Can Work It Out” (the Beatles)  477
wearable video equipment  316
Weber, R. J.  446
Weheliye, Alexander  620–621
Weiser, Mark  588–589
Weltanschauung  510
Wernicke area of the brain  46

Western culture
  and aesthetics of perfection  542
  and audio branding  359–360
  and Bloch’s musical aesthetics  499
  and computer system sounds  579, 582, 584
  and empirical musical imagery  452
  and evolving technologies of performance  107–108
  and high art/vernacular art divide  541
  and memory/imagination relationship  221
  and music analysis  161–162
  and music in detention/interrogation situations  292
  and musical notation  39–40, 163, 204, 239–240, 247–248
  and musical shape cognition  238
  and Pythagorean tone system  202
  and sound perception in autistic children  422
  and symphonic music  192
  syntax of  422
  and tonal interface with music  125–126
  and tuning systems  118–123, 125–126, 129–131
Western art music  107–108, 108t, 398, 539, 542–543, 549–554; see also classical music
Western philosophy  222–223, 428, 514n24, 632
Western tonal music  162–163, 422, 482–483
“What about Us” (Brandy)  600
What Is Posthumanism? (Wolfe)  630
Whitmer, T. C.  545
Whitney, John  308
Wiggins, G. A.  27, 164, 172
Wilber, Ken  434
William of Ockham  155
Williams, Martin  550

Williamson, V. J.  455
Windows Vista  590–591
Windsor, W. Luke  110, 530–531
Wishart, Trevor  122, 209–210, 267, 275n14
Witek, Maria  79, 85–87, 600
wolf yoik  194
Wolfe, Cary  630–631, 634–638, 644, 646, 648, 649n5
Woodstock  106
Woody, R. H.  439
Woolf, Virginia  467–469, 472–474, 484n3
Wordsworth, William  513n17
working memory  394, 396, 402
World War I  296
World War II  296
writing  17, 213, 219; see also notation, musical
Wundt-curve  358

X

Xenakis, I.  252, 261
X-Mix  309

Y

yoik  194–195
Young, J. S.  298
Young, La Monte  92n6
You’re Dead! (Flying Lotus)  623–624

Z

Zacharov, N.  323–324, 330, 333, 342
Zarlino, Gioseffo  124–125
Zatorre, R. J.  263, 393
Zentner, M. R.  356, 359
Zhora (character)  641
ZIPI  129
Žižek, Slavoj  638–639
zygonic theory  411