SOUND DESIGN THEORY AND PRACTICE
Sound Design Theory and Practice is a comprehensive and accessible guide to the concepts which underpin the creative decisions that inform the creation of sound design. A fundamental problem facing anyone wishing to practice, study, teach or research sound is the lack of a theoretical language to describe the way sound is used, and of a comprehensive and rigorous overarching framework that describes all forms of sound. With the recent growth of interest in sound studies, there is an urgent need for scholarly resources that can be used to inform both the practice and analysis of sound. Using a range of examples from classic and contemporary cinema, television and games, this book provides a thorough theoretical foundation for the artistic practice of sound design, which is too frequently seen as a ‘technical’ or secondary part of the production process. Engaging with practices in film, television and other digital media, Sound Design Theory and Practice provides a set of tools for the systematic analysis of sound for both practitioners and scholars.

Leo Murray is a lecturer in sound at Murdoch University, Australia. He spent ten years as a broadcast engineer in the UK before moving into teaching and researching in sound, principally working in film and television. His research interests include sound design, semiotics and media ethics.
SOUND DESIGN THEORY AND PRACTICE Working with Sound
Leo Murray
First published 2019 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
52 Vanderbilt Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2019 Leo Murray

The right of Leo Murray to be identified as author of this work has been asserted by him in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
Names: Murray, Leo, 1968- author.
Title: Sound design theory and practice : working with sound / Leo Murray.
Description: London ; New York, NY : Routledge, 2019. | Includes bibliographical references and index.
Identifiers: LCCN 2019003326 (print) | LCCN 2019009673 (ebook) | ISBN 9781315647517 (ebook) | ISBN 9781138125407 (hardback : alk. paper) | ISBN 9781138125414 (pbk. : alk. paper)
Subjects: LCSH: Motion pictures--Sound effects. | Motion pictures--Aesthetics. | Sound--Social aspects.
Classification: LCC PN1995.7 (ebook) | LCC PN1995.7 .M87 2019 (print) | DDC 791.4301--dc23
LC record available at https://lccn.loc.gov/2019003326

ISBN: 978-1-138-12540-7 (hbk)
ISBN: 978-1-138-12541-4 (pbk)
ISBN: 978-1-315-64751-7 (ebk)

Typeset in Bembo by Taylor & Francis Books
For my family
CONTENTS
List of illustrations
Foreword
Acknowledgements

1 Introduction
2 Theories of sound
3 Audiovisual theories of sound
4 Sound as a sign
5 Analysing sound with semiotics
6 King Kong (1933)
7 No Country for Old Men
8 Sound in non-fiction
9 Sound in video games
10 Sound in practice

Appendix A
Appendix B
Appendix C
Index
ILLUSTRATIONS
Figures
5.1 The Conversation – opening shot of the city square
5.2 The Conversation – the rooftop position
5.3 The Conversation – the rooftop ‘sniper’
5.4 The Conversation – the couple seen from the rooftop sniper POV through the telescopic sight
5.5 The Conversation – Harry Caul goes to the parked van
7.1 No Country for Old Men – the gas gun
7.2 No Country for Old Men – approaching the door
7.3 No Country for Old Men – “Where does he work?”
7.4 No Country for Old Men – Chigurh hears the toilet flush
7.5 No Country for Old Men – the shadow outside the hotel room door
7.6 No Country for Old Men – the shadow moves away
8.1 The Mighty Conqueror – establishing shot of Sydney Harbour Bridge
8.2 The Mighty Conqueror – the same shot repeated 14 times
8.3 Among the Hardwoods – men work to cut down the giant trees
8.4 Among the Hardwoods – horses and oxen used to haul the fallen trees to the railway line
8.5 Dock Ellis & the LSD No-no
8.6 Hidden Slaves
8.7 Stranger Comes to Town
8.8 It’s Like That
8.9
8.10 Warsaw Uprising
9.1 Pong
9.2 Space Invaders
9.3 Asteroids
9.4 FIFA 18
9.5 The Last of Us
9.6 Call of Duty WWII
9.7 Limbo
9.8 Cuphead
Table
5.1 A comparison of Chion’s and Peirce’s concepts
FOREWORD
My own interest in sound began as a child, as it did for many people: through listening to music, watching television and films, and playing games. When I was growing up there seemed to be a never-ending series of children’s television shows with fantastic theme music, an impressive array of character voices, and vivid sound effects. The cartoon series Tom and Jerry often entailed Tom the cat receiving an incredibly painful though temporary injury. In one particular episode, the details of which are otherwise long gone, he made a screaming sound that stuck with me: such a desperate, comical, elongated yelp of animated pain. I can still remember it even now. Though probably not consciously at the time, I was somehow aware that someone, somewhere had made that sound, because I knew that cartoons did not talk by themselves. Someone, therefore, had chosen that sound for the cartoon; they may even have made the sound themselves, and it fitted perfectly.

From then on, over the years I became increasingly interested in how things sounded. As a teenager in the 1980s this was usually through a newfound interest in music in general, and music production in particular. It seemed that music and music technology were developing rapidly side by side, with original and interesting new sounds and styles coming one after another. Later I discovered that many other sound practitioners of approximately my age had experienced their own particular sonic enlightenments around the same time. Whether it was the electronic sounds of 1970s gaming consoles, or the character voices for Warner Bros cartoons, or the clue that Steve Austin in the Six Million Dollar Man was using his bionic powers, or the futuristic theme music for Doctor Who, each was fascinating to me. I did not know about Mel Blanc, though I would have seen his name on hundreds of cartoons. I did not know how someone had come up with the idea of auralising the feat of bionic vision. I did not know that Delia Derbyshire and her contemporaries had created many of the most revolutionary sounds in the BBC Radiophonic
Workshop. Experimenting with a borrowed video camera allowed me a first glimpse of the ability to provide a sound track of my own choosing to whatever images were appearing on the screen. As an adult, I am still fascinated by sound, and particularly by what happens when it is synchronised with moving images.
ACKNOWLEDGEMENTS
I would like to acknowledge several of my colleagues at Murdoch University, in particular Gail Phillips, Martin Mhando and Alec McHoul, without whom this book would never have seen the light of day. Several other colleagues at Murdoch have also helped, whether by giving me time to do the work or by their timely advice, support, patience or humour. My thanks go to Helena Grehan, Chris Smyth, Anne Surma, Rikki Kersten, Johannes Mulder, Simon Order, Timothy Eng, Cam Merton and Jacqui Baker.

As with any teacher, I am also very grateful to a number of my students, both undergraduate and postgraduate. Thanks in particular to Ben Morton and Jordan Sweeting and the many others whose work prompted me to think differently about the subject and opened up different avenues of interrogation. I am also grateful to a number of filmmakers who gave me the opportunity to collaborate on their productions, particularly Damian Fasolo, Nathan Mewett, Glen Stasiuk, and Rebecca Cialella.

I am also indebted to the UK- and Australian-based practitioners who gave their time and insights during the preliminary research for this book: James Currie, Rolf de Heer, Graham Ross, Tony Murtagh, Kallis Shamaris, Steve Haynes, Ric Curtin and Glen Martin. More recently, another group of US-based industry practitioners kindly agreed to be interviewed for this book: Charles Maynes, Emar Vegt, Jay Peck, Matt Haasch, Mark Mangini, Paula Fairfield, Tim Nielsen and Tom Fleischman.

Portions of Chapter 5 were originally published as ‘Peirce and Sound Design Practice’ in 2015 in the Journal of Sonic Studies (10). Portions of Chapter 8 were originally published in 2013 as ‘With my own ears: the ethics of sound in non-fiction film and TV’ in the Australian and New Zealand Communication Association (ANZCA) Annual Conference.

Cover illustration – ‘Shoe Machine’ artwork by permission of Michelle Home. Created for the Australian short film Sol Bunker (2015), written and directed by Nathan Mewett. The film’s eponymous character is an eccentric and obsessive sound designer.
1 INTRODUCTION
This book began with the desire to improve the alignment of the teaching of sound design concepts with sound design practice.1 For anyone wishing to integrate the analysis or theory of sound design with the practice of sound design there remains a substantial gap in the existing literature between the analysis of the finished article and the practice that creates it. For any practitioner working to create the soundtrack for documentary and drama production, or the interactive sound design for digital video games, the work presents an array of challenges and requires a range of skills, both technical and artistic, which deserve articulation in their own right. Apart from purely theoretical disquisitions on sound, industry practice may be conveyed by practitioners themselves reflecting on their own work or by interviews with practitioners in more academic texts. However, currently there are few texts that tie theory and practice together systematically. That is the aim of this book.

Previous generations of sound practitioners often learnt their craft through an apprenticeship model in which the student sat at the teacher’s side, gradually soaking up experience from real-world examples. The artistic and technical demands of the work could be learned side by side under the guidance of an experienced hand. Contemporary training is more likely to take place in technical and tertiary institutions, rather than through a master-apprentice model, and this should ideally provide the opportunity for teaching theory alongside practical instruction. There is also a sizable proportion of people working in the industry who may not have formally studied sound but who have nevertheless arrived at their positions through a combination of talent, providence and the motivation to learn for themselves through practice.

Studying sound design involves a range of approaches to the treatment of sound. Sounds may originate from authentic recordings, or may be replaced in a studio, or even designed from scratch.2 Music may be written for the production or the production may be edited to fit the music. Sometimes the focus is on accurate and
faithful recordings, and other times on the wholesale replacement or careful editing of the original recordings in such a way as to hide the artifice of the process. The real happily lives alongside the fabricated. The synchronous coexists with the asynchronous. Each and every sound element can be subsumed by the needs of the story being told or the game being played. When combined with the images, subtle or major changes in meaning or emphasis are routinely discovered, discarded, or built upon.

In this environment, the aim of this book is to illuminate some of the processes at work in the creation of the soundtrack in such a way as to make it of practical use to those working in the industry, and to those engaged in studying, analysing and teaching sound. Where traditional analytical approaches focus on the product, the aim of this more practitioner-based approach is also to give due attention to the process. A practice-centred model has been adopted in which the theory and concepts can be applied in the context of the actual practice. The aim is therefore to devise a theory of sound design based on the actual practice of sound designers.

In working on various productions and in teaching the practice of sound design I have found many enlightening works of theory written from the perspectives of the critic, analyst or theorist, as well as interviews and articles from practitioners. However, the two have rarely correlated and there has appeared to be a disconnection between the theory and the practice. This was particularly true in the area of sound design, which has been comparatively neglected in film theory. There appeared to be an opportunity to address two issues simultaneously by articulating a theory of sound design firmly grounded in industry practice. This book, therefore, addresses the following question: What theoretical model best matches how sound is used by sound designers in actual practice?

The soundtrack, and the depth and complexity of the work that leads to its creation, are often ignored altogether, reduced to a series of seemingly purely technical stages. The quotation below, from 1938, illustrates the long-held view that from an outside perspective the production of the soundtrack is seemingly preoccupied with the technical details of microphones, mixers, equalisers and the like:

When you hear a famous violinist in Carnegie Hall you think of him only as a great creative artist – certainly not as a mechanical technician. Yet, if you were to spend hour after hour with him during practice, it is probable that you would become quite conscious of the meticulous placing of his fingers on the strings, his bowing, and even the kind of strings used on the violin. If, during all this time he appeared by necessity to be engrossed with technique, it is quite possible that when you went to the concert you would still think of him as a fine technician and of his violin as a mere mechanical tool. His concert would actually be just as beautiful a creation, but your point of view would have spoiled your appreciation of it.
It is fortunate that the audience seeing a finished picture has not seen it being rehearsed over and over again in the re-recording rooms. If the re-recorder is successful, the audience is not conscious of his technique but only of the result achieved. The director and producer watching the re-recorder work out the details, however, may think him and his tools very mechanical, for in spite of what is going on inside his head (the important part of re-recording), his hands are performing a multitude of mechanical operations, and his conversations with his assistants are in terms of machines.
(Research Council of the Academy of Motion Picture Arts and Sciences 1938, 72)

The musical analogy is a useful one. In order to produce anything approaching a pleasant sound, one must first master the mechanical and technical aspects of the instrument so that they become invisible to the listener. Sound practitioners are all too aware that often the better their work, the more invisible it becomes. To the audience there may be no appreciation at all that any work has gone into the production of the sound track. To those involved in video games, television or film production, but not directly in sound, the process can seem mysterious, technical or irrelevant.

On a film set, for example, the majority of the film crew are concerned with the production of the image, including the camera, grip, lighting, art, hair and make-up departments and so on. Their work is visible, particularly where on-set video playback shows the fruits of their labour. The sound crew, by contrast, typically comprises just two people, and often only they are aware of what is being recorded, the rest of the crew being in the sonic equivalent of ‘the dark’. Headphones give access to a separate world denied to the others. The work of other departments is visible first hand, but also through the camera’s lens and on the playback screen; any problems, solutions or changes in the sonic world are all but hidden.

It is readily accepted that the photographer (or cinematographer) must fully understand and master the technology and techniques of the camera in order to produce artistic work. The means of producing the art become irrelevant once the art is exhibited. Sound could and should be perceived in a similar way. However, due to a lack of awareness, knowledge and understanding of its processes, it is relegated to a secondary, subsidiary or, worse still, purely technical series of processes, rather than an artistic enterprise. Akin to the photographer’s camera or the violinist’s instrument, the sound practitioner uses techniques and technologies that need to be understood and mastered in order to create the artwork, and these are hidden once the work is performed or exhibited.

This book intends to form a theoretical and conceptual foundation for the practice of sound design, which is too frequently seen as a ‘technical’ or secondary part of the production process. It is hoped that this will facilitate a means of analysing and describing the ways in which sound, in its broadest sense, can be used to ‘create meaning’. As well as being of some practical use to those engaged in the study and practice of sound design, it is also hoped that delineating the theoretical underpinnings of sound design practice will contribute in some way to the
perception of sound design as a creative enterprise, rather than a predominantly technical one.

While I was researching sound theory, film theory and film criticism, I was also working on films, watching films and reading various practitioners’ own writing about their methods and approaches. In reading the practitioners’ opinions and descriptions of their own working practices, it seemed that there were commonalities and methods of work that were partly attributable to schools of thought or to industrial and organisational structures. There also appeared to be commonalities in their working methods that revealed a consistent and meaningful approach to the production of the soundtrack. Their approach was not typically concerned with the technical details or the minutiae of the sonic elements, but with the overall effect of the sound and picture combination as a whole. There was an overriding emphasis on the story, and how the sound helps to tell the story. The function of the soundtrack, rather than its components, increasingly became the focus of my research. What does each part do, in terms of the overall story? How are these effects achieved? How do sound design professionals see their role and their work?

If the fundamental role of the soundtrack is to help tell the story, then accordingly the fundamental role of the sound design practitioner is also to help tell the story. Taking this seemingly common-sense, but altogether necessary, starting point, it follows that the sound choices – whether they are about the clarity of a line of dialogue, the choice and timing of music against a sequence of images, the suggestive possibilities of the background ambiences, or the realistic portrayal of an event – are all concerned with how they help to tell the story. This led to an examination of the theories that have been used in film thus far, to see which might be best suited to the particular task of explaining sound design.

The field of semiotics has provided a good deal of the raw theoretical material from which film theory has grown. Semiotics is the study of signs, and for many people (myself included), signs are a tremendously useful way of thinking about how we make sense of the world. The ideas of the Swiss linguist Ferdinand de Saussure have been adapted by generations of theorists in a range of academic fields. The Saussurean model of the sign, or an adaptation of it (Barthes 1968; 1972; Metz 1974), is most frequently used in film analysis. Saussure’s model, developed from his study of linguistics, takes into account the arbitrary nature of signs and the importance of the sequential order of signs in creating or modifying meaning. Since the visual perspective of films can be easily divided into scenes, shots and still frames, there is a temptation to use the analogy of a filmic language, whereby the film’s structure is the equivalent of the words, phrases and sentences of a language. There has been less success in adapting this model adequately to describe sound in film. However, the work of Peter Wollen (1998) indicates that there might be another semiotic pathway. The model of the sign proposed by the American Charles Sanders Peirce takes account of a great many of the functions, properties and uses of sound which cannot be accommodated in the linguistic, Saussurean model. Further study gave me confidence to argue that the Peircean model of the sign and its related concepts
provide a comprehensive and generally applicable method of analysing how sound is used, and how the listener might make sense of what they hear, whilst also providing a language that can be adapted to describe that use. This semiotic model is described later.
Sound analysis and sound-image analysis

Whilst the main purpose of this book is to extrapolate sound theory from sound practice, it is also essential to apply the theory and concepts ‘after the fact’ to the completed production, in order to illustrate the flexibility of the theoretical model being used. In doing so the intention is to demonstrate how the concepts of Peirce’s semiotic model can be applied to finished productions to explain how the meaning for the audience is arrived at from the material presented in the production.

A fundamental problem facing anyone wishing to practice, study or teach sound design is the lack of a theoretical language to describe the way sound is used, and of an overarching framework that adequately explains how sound design works in practice. Whilst there are a growing number of books, conferences and journals geared toward the critical examination of sound design, or sound in other audiovisual media, there are few texts that are sufficiently grounded in the corresponding day-to-day work of practitioners. This research is an attempt to use the experience of practitioners within the industry to inform the theory and, through the application of Peirce’s semiotic model, identify and provide a critical language that allows theory to inform the practice.

This book shows how semiotics can provide the framework and the language to analyse and describe both the product and the practice of sound design. It can be applied at micro and macro levels of analysis, being equally applicable to individual sound elements and the soundtrack as a whole. It can be used to explore what the audience hears and what the practitioners do as they manipulate sound for effect. Its flexibility also allows it to accommodate other sound theories. The Peircean model has the potential to provide both the justification for sound production practice and the means to bring sound production out of the shadows. By gaining the means to elucidate previously concealed sound production processes, it is possible to give this practice the acknowledgement it merits as a fundamental and influential element of any audiovisual artefact, one that cannot, and should not, be taken for granted.

In this book a number of case study productions are discussed to ascertain how sound is used in conjunction with images to perform its various functions: to create the narrative, to create a sense of immersion, to present reality (or a sense of reality), and so on. Two film examples are discussed in some detail, and their soundtracks are analysed at both macro and micro levels to highlight the flexibility of the approach: from the specifics of the individual representative sounds (and musical motifs) in King Kong (Cooper and Schoedsack 1933), to the unfolding sound-image relationship and the subsequent creation of meaning in No Country for Old Men
(Coen and Coen 2007). Whilst the analyses are primarily based on the actual films, supporting data is also drawn from published accounts of those who worked on them. Other types of audiovisual works, such as non-fiction production (news, documentary, television sport) and interactive media (e.g. various video game genres), are also examined to illustrate the range of productions which can be analysed.

An initial driving force during the research for this book was the belief, or at least the suspicion, that there was an underpinning rationale shared by many sound practitioners in their approach to their work that had not yet been satisfactorily explained. During this research, many of the sound editors, recordists, designers and mixers interviewed brought up the issue that work in sound is often inadvertently downplayed. This is partly because it is ‘invisible’, in the sense that it is difficult to determine what work has actually been done, but also because the work in progress is heard by relatively few, and even where it is heard, it is not often immediately apparent what end the work is achieving.

Part of the problem of analysing and talking about sound is the lack of an appropriate or consistent vocabulary. A language with which to describe the purpose and function of sounds and sound practices might go some way to removing the shroud of unfamiliarity around the work of the sound practitioner. The semiotic model and vocabulary used in this book can be useful in analysing both the soundtrack as a whole and the specific sounds and sound/image combinations. It provides the means to go beyond the technical operations to describe the functional aspects of sounds. Semiotics, used as an overarching model into which other elements of sound theory can be incorporated, has shown itself versatile enough to provide a comprehensive and universal approach that can be applied equally to the productions themselves and to the practice of sound design.

The task of creating the soundtrack can be described in terms of a series of questions about what the audience should know, feel or think as they experience the production. This type of approach focuses attention on the decisions that influence how well the story is told, or rather how the story will be understood or experienced by the audience. It shifts the focus from the classification of the sound to its function in the soundtrack, where each element of the soundtrack is selected and manipulated to serve the needs of the work. The types of focused listening described by Michel Chion affect the decisions that are made about the individual elements of the soundtrack, as well as the overall view of the soundtrack. In Peircean terms, these properties of sounds are reframed as sound-signs, having iconic, indexical or symbolic relationships to the things they represent, which are manipulated by the producer and interpreted by the audience.

This model of the soundtrack does not seek to overthrow the many useful, existing sound design models developed by theorists and practitioners. Indeed, the models suggested by Rick Altman, Michel Chion, Walter Murch and Tomlinson Holman can be successfully integrated into the broader Peircean model. Similarly, whilst the industrial model typically delineates sounds as either dialogue, music or effects, the Peircean model can be applied to each family of sounds,
enabling the classification of sound elements in terms of their particular function and role within the soundtrack.

In some sense, the creation of the soundtrack is an exercise in problem-solving. There are sounds that need to be used, but which may require editing or replacement in order for the narrative to be maintained whilst the artifice remains hidden. There are sounds that are added or manipulated that are designed to go unnoticed, but are judged to suggest a feeling that would not otherwise be present. There are relationships between sounds and images that can yield an interpretation of the narrative, either explicit or implicit. Each element, be it sound or image, is only a part of the whole, and they are co-dependent or interdependent. The soundtrack depends on the image and the image is dependent on the soundtrack. As seen through the Peircean semiotic lens, the practice of sound design can be more clearly illustrated as a creative endeavour rather than a technical one.

For those involved in sound design, a recurring concern is the perception of sound practitioners and their work by others working in other areas of film, television or interactive media. Sound production can too easily be dismissed as a sequence of overly technical operations, reinforcing the belief that the work being done is technical rather than creative. The 1938 Motion Picture Sound Engineering handbook presents the analogy of the violinist at Carnegie Hall whose technical mastery appears effortless:

If the re-recorder is successful, the audience is not conscious of his technique but only of the result achieved. The director and producer watching the re-recorder work out the details, however, may think him and his tools very mechanical, for in spite of what is going on inside his head (the important part of re-recording), his hands are performing a multitude of mechanical operations, and his conversations with his assistants are in terms of machines.
(Research Council of the Academy of Motion Picture Arts and Sciences 1938, 72)

A semiotic model highlights the process of meaning-making. When applied to sound design practice, it uncovers the way sound is creatively manipulated at every stage, often using technical means, to achieve a particular aim. This aim is frequently described in broad terms as serving the needs of the story. In more practical terms, it means serving the director’s wishes, so that sound can be integrated into the overall narrative to suit the needs of the story. For those working in sound design, there is an implicit acknowledgement that these decisions, whilst often technically based, are made in the interests of the production as a whole.

It is a process that begins with a question about the purpose or intended effect of the various aspects of the soundtrack. It might be that a significant role for the soundtrack is in creating the illusion of seamlessness, so that a series of disjointed visual shots appears to be a single continuous piece of action, or in lending a sense of believability to a fiction such as a fight sequence or a fantastic monster or animated vehicle. It might be that the soundtrack needs to set a particular mood, or geographical location, or historical time period, or time of
day. Whatever the requirements, once determined, the second part of the process is spent answering the question: how best to go about achieving this end? This book is an attempt to examine that process, to demonstrate even to those not directly involved in the production of soundtracks that the practice of sound design is end-directed, in the sense that decisions about sound are taken based on the effect they produce. Rather than perceiving sound as a superfluous or auxiliary step applied at the completion of a creative process, it can instead be scrutinised as a fundamentally important aspect of the creation of the entire work.

In talking to industry practitioners for this book it became apparent that they shared concerns about the way sound was viewed within the wider industry. Often this concerned the increasing pressure of time to complete the work to the standard required, especially in post-production. Each worked with the utmost professionalism on their productions and valued the work of their collaborators and colleagues in sound and in the wider industry. Relatively few were particularly concerned with technology or with tools, but each was passionate about sound and its potential to help in telling stories.

Sound is frequently employed as a means of representing reality in concert with the images. However, practitioners are keenly aware that often there is a need, and the potential, for this to be fabricated, manipulated or recreated if necessary. For those involved in sound for non-fiction, a high standard of professional and ethical practice is the benchmark. For practitioners, critics, analysts, teachers and students of sound design, some guidelines may prove useful in beginning an overdue discussion about the ethics of sound practice. Whilst not exclusively the domain of those working in non-fiction, the ethics of sound is increasingly becoming a concern given the difficulty that the audience, and even fellow professionals, have in discriminating between the real and the fabricated, and the many types of sonic ‘sleight of hand’.

Constraints caused by insufficient time or money are problems common to most sound practitioners at some time or other, as they are in any other creative endeavour. Occasionally, the biggest impediment to a more imaginative use of sound is colleagues whose limited understanding or appreciation of sound’s creative potential leads to limited involvement of sound in the design of the production, or insufficient acknowledgement of the potential for sound to affect the meaning of the whole. Where a director or producer values sound and sees its importance in the finished product, the impact is felt throughout the production. Similarly, some practitioners enjoy a less traditional method of production, collaborating with those who value sound and facilitating the early involvement of their sound colleagues, whose opinions, concerns and suggestions can then be incorporated into the foundation of the production, rather than being used as a coat of paint applied to the finished structure.

By uncovering some of the underlying principles of the way sound functions as a sign, this book adds weight to the argument that sound’s storytelling potential is too often limited by its being seen as an appendage or afterthought, rather than an essential element of any audiovisual artefact. Often the work is done so deftly as to hide itself, which means the creative labour involved risks being attributed elsewhere.
For those looking in on the world of sound, the attention is easily put on
the technology and the tools of the trade rather than the value that sound adds, and with no knowledge of the journey taken to arrive at the finished soundtrack, any work done is already invisible. This book adds to the call for sound to be recognised as a full and genuine creative collaborator in the creation of audiovisual productions, whose potential should be maximised at each stage of development and production. To those behind the curtain, intimately involved in the production as the soundtrack takes shape, the impact of sound on the finished article, and how the soundtrack affects the audience, is clearer. With the help of semiotics we can uncover the underlying principles, rationales and philosophies which underpin the soundtrack itself, as well as the practice of sound design.
For those wary of ‘theory’

Many filmmakers are innately dubious of, or outright critical about, film theory itself or the theories about the work that they do. The British director Alan Parker (Midnight Express, Angel Heart, Evita) made a documentary for Thames Television, commissioned to commemorate British Film Year in 1986. After playing clips of two revered critics, Derek Malcolm (critic) and Anthony Smith (BFI), he commented:

The academic spiv is a new phenomenon in the film industry: articulate, opinionated, and not above a little old-fashioned, self-serving hype; a sort of Lew Grade with ‘O’ levels. And about as useful to the rest of the film industry as a scratch on the negative.
(Parker 1986)

This view is shared by some institutions and individuals whose instinctive position is that you learn more about film by doing it and who, furthermore, rely on the idiom: those that can, do. Toby Miller describes it as a “line of reasoning that says: a) film-makers work with their imagination and practical knowledge; b) film-goers work with common sense; and c) film theorists work to undo the special magic and evacuation of cinema, on the basis of their lack of these three qualities” (Miller 1992).

Of course, a number of historically important film-makers also had a lot to say about theory. Russian filmmaker-theorists like Vertov, Kuleshov, Pudovkin and Eisenstein alone make up an impressive list of practitioners who made films yet devoted considerable time and energy to film theory and criticism.3 Obviously, then, being a film-maker does not preclude or prevent the individual from also being a film theorist. Indeed, it could easily be argued that being a film-maker affords a much better position from which to originate theory, since it is based on first-hand lived experience. Aside from arguing for the intellectual merits of film theory, authors like Lapsley and Westlake also make a convincing case for its continued relevance:

In watching a film the spectator is not merely a passive receptacle imbibing its meaning, but is engaged in a succession of interpretations which depend on a
whole set of background beliefs.… On the basis of such beliefs – or theories, whether formalised or not – the spectator sees faces, telephones, desert landscapes rather than patches of colour; ascribes motives to characters; judges certain actions as good and others as bad; decides that this film is realistic and that one is not; distinguishes the happy from the unhappy ending; and so on. The apparently simple act of spectating thus involves theories of representation, of human nature, of morality, of the nature of reality, of the conditions for human happiness, etc. Similarly, for the filmmaker, however self-consciously intuitive the approach, there is inevitably a comparable set of theories underlying the production of film. For the critic, or anyone engaged in a discussion of cinema, judgements also involve theories. For example, the suggestion that the increase in muggings can be traced to the increased incidence of violence in films involves at least a theory of signification (how meaning is produced), and one of subjectivity (how spectators are affected by texts).
(Lapsley and Westlake 1988, vi)

This book is partly an attempt to resolve the contradiction between these two extremes: at one extreme, the position that filmmakers work with their imaginations, practical knowledge and common sense; at the other, the idea that ‘theory’ is inescapable, since everything other than a fact is a belief or idea.

Many influential sound practitioners have also written insightfully about how sound is, and could be, used. Aside from industry-focused texts written by experienced practitioners, there are a number of practitioners such as Walter Murch, Randy Thom and Rob Bridgett who have written about the potential for sound and how it can be used, and who have talked about sound in an unashamedly theoretical way, backed up by practical experience and concrete examples of their own work. Around the turn of the century, usergroups and websites began to appear which encouraged discussion about the non-technical side of sound production.4 In recent years the proliferation of websites devoted to discussion about sound, such as DesigningSound.org, has created a place where practitioners of all levels can discuss ideas. Articles range from discussions of technologies and techniques to more reflective pieces on the broader artistic or philosophical aspects of what is recognised as a creative endeavour. More recently, podcasts such as TheAudioNowcast and Tonebenders, along with dozens of Facebook groups and YouTube channels, allow anyone interested in sound greater access to like-minded people. Interested students, amateurs and professionals alike are able to discuss their experiences, philosophies and choices.
What this book is not about

Having stated that this book is an attempt to navigate the path between practitioners’ views of their work and a theoretical explanation for why that work succeeds, it is worthwhile explaining some things that this book will not cover. Firstly, it is not a
philosophical investigation into sound as such, whether as a concept or in a physical or acoustical sense. Students of film sound will no doubt be aware of the landmark books which brought together many of the key theoretical writings on film sound with practitioner interviews to give a broad and comprehensive introduction to the field: Film Sound (Weis and Belton 1985) and Sound Theory, Sound Practice (Altman 1992) have each been valued companions. There are also an increasing number of authors who have taken up the challenge of creating a nuanced philosophical discussion of sound and perception, particularly Casey O’Callaghan’s Sounds (2007), along with others such as Roberto Casati and Jérôme Dokic (2009). Those wishing to look at sound from a psychoacoustic perspective will find no better starting place than Albert Bregman’s Auditory Scene Analysis (1990).

Neither is this book intended to be a how-to guide for sound software or hardware, or for recording, editing and mixing. This book instead attempts to provide a framework for the way we analyse sounds and soundtracks. It does so by co-opting a model of semiotics which appears to be particularly suited to the task at hand. It also lends itself to the analysis of the actual practice of sound design, if we take the liberty of using the term ‘sound design’ in its broadest possible sense: the deliberate use of sound.
Notes

1 The term ‘sound design’ is immediately slightly problematic in that it is both very general and very specific at the same time. It frequently refers to a number of quite different practices which will be discussed later, but for now it can be taken to mean the designed or deliberate use of sound.
2 It is relatively rare for new sounds to be created completely from scratch, since a majority of the time new sounds begin life as existing sounds that are used as a starting point and manipulated to create the desired sound.
3 See for example ‘Kino-eye: The Writings of Dziga Vertov’ (Vertov and Michelson 1984), ‘Film Technique and Film Acting – The Cinema Writings of V.I. Pudovkin’ (Pudovkin 1958), ‘Writings: Sergei Eisenstein Selected Works’ (Eisenstein and Taylor 1996) and ‘Kuleshov on Film: Writings by Lev Kuleshov’ (Kuleshov and Levaco 1974).
4 Yahoo’s Sound-article-list and sound-design usergroups, and the website filmsound.org, were all important mechanisms for bringing ideas about how sound can be used to a broader audience of interested parties.
References

Media referenced

Coen, Joel, and Ethan Coen. 2007. No Country for Old Men. Miramax. Motion Picture.
Cooper, Merian C., and Ernest B. Schoedsack. 1933. King Kong. RKO Radio Pictures. Motion Picture.
Other sources

Altman, Rick. 1992. Sound Theory, Sound Practice. AFI Film Readers. New York: Routledge.
Barthes, Roland. 1968. Elements of Semiology. New York: Hill and Wang.
Barthes, Roland. 1972. Mythologies. Translated by Annette Lavers. New York: Hill and Wang.
Bregman, Albert S. 1990. Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.
Casati, Roberto, and Jérôme Dokic. 2009. Sound and Events. In Philosophy of Sound. Nîmes: Jacqueline Chambon. Original edition, 1994.
Eisenstein, S. M., and Richard Taylor. 1996. S.M. Eisenstein: Selected Works. London: British Film Institute.
Kuleshov, L. V., and Ronald Levaco. 1974. Kuleshov on Film: Writings by Lev Kuleshov. Berkeley; London: University of California Press.
Lapsley, Robert, and Michael Westlake. 1988. Film Theory: An Introduction, 2nd edition. Manchester, UK; New York: Manchester University Press.
Metz, Christian. 1974. Film Language: A Semiotics of the Cinema. New York: Oxford University Press.
Miller, Toby. 1992. “(How) Does Film Theory Work?” Continuum: The Australian Journal of Media and Culture 6(1): 186–212. doi:10.1080/10304319209359391.
O’Callaghan, Casey. 2007. Sounds: A Philosophical Theory. Oxford: Oxford University Press.
Parker, Alan. 1986. A Turnip Head’s Guide to the British Film Industry: A Personal View. Thames Television.
Pudovkin, Vsevolod Illarionovich. 1958. Film Technique and Film Acting, memorial edition. London: Vision Press.
Research Council of the Academy of Motion Picture Arts and Sciences. 1938. Motion Picture Sound Engineering. New York: D. Van Nostrand Company, Inc.
Vertov, Dziga, and Annette Michelson. 1984. Kino-eye: The Writings of Dziga Vertov. Berkeley, CA: University of California Press.
Weis, Elisabeth, and John Belton, eds. 1985. Film Sound: Theory and Practice. New York: Columbia University Press.
Wollen, Peter. 1998. Signs and Meaning in the Cinema, expanded edition. London: BFI Publishing.
2 THEORIES OF SOUND
Definitions of sound

An acoustic definition of sound would typically be something like ‘mechanical vibrations transmitted by an elastic medium’, or even ‘a longitudinal compression wave which distorts a medium by creating moving fronts of high and low particle compression’. There is no way to directly capture or transmit sound itself, even though, to a young child at the beach holding up a seashell to listen for the trapped sound of the ocean, this might seem a perfectly reasonable idea. In order to work with sound we need to convert it from the acoustic domain into a more useful one, such as an electrical signal, by means of a transducer. Usually this means a microphone, which allows the variations in pressure to be converted into an electrical representation that can then be transmitted or recorded. On playback another transducer (usually a loudspeaker of some sort) converts the electrical signal back into an acoustical one.

Since the majority of sound design is for an audience of human beings, we may begin by saying that for our purposes sound is in the range of human hearing, which is typically understood as a roughly ten-octave range of 20–20,000 cycles per second (20 Hz–20 kHz). Going back a step, do we then define sound as something audible to a person? For example, what about sounds that can be heard by dogs but which humans cannot hear? If we record such sounds and play them back at a slower speed, we can then hear the original sound at a lower pitch that is now audible to us. For convenience we might assume that sound refers to acoustic waves which are in the range of human hearing. Sounds that are not audible to us can be considered ultrasonic when they are composed of frequencies higher than those our ears are able to deal with. Similarly, sounds below our audible frequency range can be considered infrasonic.
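As a minimal numerical sketch of two of the claims above (assuming the nominal 20 Hz–20 kHz limits; the ultrasonic source and playback speed are arbitrary illustrative values, not figures from the text):

```python
import math

# Nominal limits of human hearing, as given above.
LOW_HZ, HIGH_HZ = 20.0, 20_000.0

# Each octave doubles the frequency, so the span in octaves
# is the base-2 logarithm of the ratio between the limits.
octaves = math.log2(HIGH_HZ / LOW_HZ)
print(f"audible span: {octaves:.2f} octaves")  # ~9.97, i.e. roughly ten

# Slowed playback scales every frequency in a recording by the
# playback-speed factor, shifting ultrasonic content into audibility.
source_hz = 40_000.0    # hypothetical ultrasonic source (e.g. a bat call)
playback_speed = 0.25   # replayed at quarter speed
print(f"heard pitch: {source_hz * playback_speed:.0f} Hz")  # 10000 Hz
```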
For a more general definition, the Oxford English Dictionary describes sound as ‘vibrations that travel through the air’ and, alternatively, ‘a thing that can be heard’. Most (though not all) sound designers would have at least some basic knowledge of the physics of sound, and many might have a much more thorough scientific understanding, having studied physics, electrical engineering, electronics or some other related discipline. There is also an alternative definition of sound which relates to the perceptual sense, referring to how these vibrations are heard. In the English language we do not use different words to distinguish between sound in a physical sense, as vibrations in a medium, and sound in a perceptual sense. Obviously the two are inextricably linked, but anyone with a microphone and recording device will testify that what we perceive as human listeners in a given situation and what is objectively picked up by a microphone can be quite different.
Theories of sound

Whether regarding its physical, mathematical and musical characteristics, the organ which senses it, the elements of speech and language, or its comparison to our other senses, sound has long been a topic that prompted thoughtful inquiry.

Pythagoras (c. 570–495 BCE) was intrigued by sound when, so the story goes, he passed a blacksmith’s and found that the sounds of several hammers struck simultaneously could be both pleasant and unpleasant (consonant and dissonant) (Caleon and Subramaniam 2007, 173–174). On further investigation he determined that there was a relationship between the weights of the hammers which produced a harmonious sound: they were in ratios of whole numbers (2:1, 3:2, 4:3, etc.). This in turn led him to create a simple stringed instrument on which to conduct experiments, through which he found that two stretched strings whose lengths were in a ratio of 2:1 produced the same note, one octave apart. Other simple ratios also produced consonant sounds. Ratios of 1:2 and 2:1 give octaves, 2:3 and 3:2 give fifths, and ratios of 3:4 and 4:3 give fourths. Whilst all integers can be used in this way to create ratios, Pythagoras realised that the numbers from 1 to 4 were sufficient to construct a musical scale. These first four integers, for Pythagoreans, were special in other ways, since they added up to 10 (the perfect number, signifying unity of the highest order) and could also be displayed in a triangle – the Tetractys of the Decade.1

Pythagoras had a significant influence on theories of both music and sound, being the first to describe many of the fundamental relationships between them. He found a mathematical relationship between the lengths of a pair of strings, the tension in the strings and the sound they made when both were played. Where the lengths were in a proportion that made a simple ratio (1:1, 2:1, 3:2, 4:3) their sound was also consonant, or agreeable. Where the ratio was not a simple fraction, the sound produced was dissonant.2
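Reading these length ratios as the frequency ratios they imply (an equivalence only made explicit centuries later, as described below), the intervals can be sketched numerically; the 220 Hz base note is an arbitrary choice for illustration, not a figure from the source:

```python
# A minimal sketch of the Pythagorean whole-number ratios, applied to an
# assumed base note of 220 Hz (the A below middle C).
BASE_HZ = 220.0

# Consonant intervals and their whole-number frequency ratios.
INTERVALS = {
    "unison": (1, 1),
    "fourth": (4, 3),
    "fifth": (3, 2),
    "octave": (2, 1),
}

for name, (num, den) in INTERVALS.items():
    print(f"{name}: {num}:{den} -> {BASE_HZ * num / den:.1f} Hz")

# unison: 1:1 -> 220.0 Hz
# fourth: 4:3 -> 293.3 Hz
# fifth:  3:2 -> 330.0 Hz
# octave: 2:1 -> 440.0 Hz
```

Any ratio outside this small whole-number family (say 16:15) falls between these consonances, which is the numerical face of the dissonance Pythagoras described.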
Plato (c. 428–348 BCE) was also interested in the relationship between sound and the hearing process, and in the relationship between the body and soul in acoustic sensation. In Plato’s theory of perception, sound is distinct from the perceptive process. In other words, sound does not have to be heard to exist in its own qualities, which could be interpreted as an early version of the pop-philosophical question: if a tree falls in a forest and no-one is there to hear it, does it make a sound? The following passage concerning the faculty of hearing contains an intriguing mixture of insight and ill-founded common sense:

In considering the third kind of sense, hearing, we must speak of the causes in which it originates. We may in general assume sound to be a blow which passes through the ears, and is transmitted by means of the air, the brain, and the blood, to the soul, and that hearing is the vibration of this blow, which begins in the head and ends in the region of the liver. The sound which moves swiftly is acute, and the sound which moves slowly is grave, and that which is regular is equable and smooth, and the reverse is harsh. A great body of sound is loud, and a small body of sound the reverse. Respecting the harmonies of sound I must hereafter speak.
(Timaeus [360 BCE] Plato 2013)

Whilst many elements of Plato’s analysis are intriguing, there are occasions where it is demonstrably incorrect, such as the belief that the perceived pitch of a sound depends on the speed with which it spreads. Like Plato, Aristotle was interested in the perception of sounds. In book two of his treatise De Anima [On the Soul] especially, he laid out the groundwork for a more thorough analysis of sound:

Actual sound requires for its occurrence (i, ii) two such bodies and (iii) a space between them; for it is generated by an impact. Hence it is impossible for one body only to generate a sound – there must be a body impinging and a body impinged upon; what sounds does so by striking against something else, and this is impossible without a movement from place to place.
(Aristotle 350 BCE, II.8)

Aristotle also offered a coherent explanation for an echo:

An echo occurs, when, a mass of air having been unified, bounded, and prevented from dissipation by the containing walls of a vessel, the air originally struck by the impinging body and set in movement by it rebounds from this mass of air like a ball from a wall. It is probable that in all generation of sound echo takes place, though it is frequently only indistinctly heard. What happens here must be analogous to what happens in the case of light; light is always reflected otherwise it would not be diffused and outside what was directly illuminated by the sun there would be blank darkness; but this reflected light is not always strong enough, as it is when it is reflected from water, bronze, and other smooth bodies, to cast a shadow, which is the distinguishing mark by which we recognize light.
It is rightly said that an empty space plays the chief part in the production of hearing, for what people mean by ‘the vacuum’ is the air, which is what causes hearing, when that air is set in movement as one continuous mass; but owing to its friability it emits no sound, being dissipated by impinging upon any surface which is not smooth. When the surface on which it impinges is quite smooth, what is produced by the original impact is a united mass, a result due to the smoothness of the surface with which the air is in contact at the other end.
(Aristotle 350 BCE, II.8)

In each case above Aristotle correctly describes fundamental aspects of sound: firstly, that sound requires an object to be moving or vibrating, and secondly, that echoes occur wherever sound is reflected off another object, even if only indistinctly heard. Aristotle also considered sound’s relation to language, describing linguistic elements such as the vowel (‘that which without impact of tongue or lip has an audible sound’), the semivowel (‘that which with such impact has an audible sound, as S and R’) and the mute (‘that which with such impact has by itself no sound, but joined to a vowel sound becomes audible, as G and D’):

These are distinguished according to the form assumed by the mouth and the place where they are produced; according as they are aspirated or smooth, long or short; as they are acute, grave, or of an intermediate tone; which inquiry belongs in detail to the writers on meter.
(Aristotle 350 BCE, part xx)

He also distinguished between things that are capable of making a sound and those that are not (Aristotle 350 BCE, II.8); that is, things which when struck emit a sound. Aristotle’s examples of the former are bronze and any smooth solid object, with sponge or wool for the latter. He also correctly surmised that the air between listener and sounding object must be continuous for hearing to occur, and that when the air outside the ear is moved, the air inside the ear is also disturbed (Johnstone 2013, 632). By ‘moved’ we mean, of course, vibrated: the air between the sounding object and listener does not necessarily move from one place to the other.

The first-century BCE Roman architect Marcus Vitruvius Pollio, or simply Vitruvius, made important discoveries on the motion of sound as a wave:

Now the voice is like a flowing breath of air, and is actual when perceived by the sense of hearing. It is moved along innumerable undulations of circles; as when we throw a stone into standing water. Innumerable circular undulations arise spreading from the centre as wide as possible. And they extend unless the limited space hinders, or some obstruction which does not allow the directions of the waves to reach the outlets. And so when they are interrupted by obstacles, the first waves flowing back disturb the directions of those which follow. In the same way the voice in like manner moves circle fashion. But
while in water the circles move horizontally only, the voice both moves horizontally and rises vertically by stages. Therefore as is the case with the direction of the waves in water, so with the voice when no obstacle interrupts the first wave, this in turn does not disturb the second and later waves, but all reach the ears of the top and bottom rows without echoing. (Vitruvius and Granger 1931, Chapter iii, 5–8) He was particularly interested in the acoustical properties of theatres: For we must choose a site in which the voice may fall smoothly, and may reach the ear with a definite utterance and without interference of echoes. For there are some places which naturally hinder the passage of voice: the dissonant which the Greeks call katechountes; the circumsonant which are named by them perichountes; the resonant also which are called antechountes; the consonant which they name synechountes. The dissonant places are those in which the voice, when first it rises upwards, meets solid bodies above. It is driven back, and settling down, overwhelms the following utterance as it rises. The circumsonant are those in which the voice moves round, is collected and dissipated in the centre. The terminations of words are lost and the voice is swallowed up in confused utterance. The resonant are those in which the words, striking against a solid body, give rise to echoes and make the termination of the words double to the ear. The consonant also are those in which the voice reinforced from the ground rises with greater fulness, and reaches the ear with clear and eloquent accents. (Vitruvius and Granger 1931, Book V, Chapter viii, 1–2) The wave motion described by Vilnius appears to be transverse (such as the motion of a ripple in a pond) as opposed to longitudinal (which is how a sound wave actually travels) and this model of sound transmission survived for centuries. Nevertheless, Vitruvius did outline some important acoustical characteristics which have an impact on sound intelligibility; interference (dissonance) and reflections (echo). Giambattista Benedetti (1530–1590) was the first to hypothesise that musical sounds, such as those made by vibrating strings of a musical instrument, travel through the air as a series of pulses; and how high or low they sound (its pitch) is a direct result of how frequently the vibrations occur (Helm 2013, 147–148). This then led to further questions about which other factors influence the pitch – the tension, the length, and so on. A countryman of Bendetti, Galileo Galilei (1564–1642) was also interested in sound.3 He was among the first to describe some of the physical properties of sound, particularly in comparison to light: “Everyday experience shows that the propagation of light is instantaneous; for when we see a piece of artillery fired, at great distance, the flash reaches our eyes without lapse of time; but the sound reaches the ear only after a noticeable interval” (Galilei 1939, 42). Galileo agreed with Benedetti that a string’s frequency of vibration is connected to both its length and its tension and gave
the more general principle that the ratios of the lengths of strings are the inverse of the ratios between frequencies of vibration (Galilei 1914, 103); the relationship is sketched numerically at the end of this section. Whilst Galileo had no means to measure the frequency of vibration of strings, the Franciscan friar Marin Mersenne (1588–1648) was able to confirm Galileo’s hypotheses experimentally. Importantly, Mersenne was also able to predict the frequency of pipe organ notes based on their dimensions, as well as to measure the frequency of vibration in a string. Mersenne experimented and corresponded widely with his peers, including Galileo and Descartes, and published his account Harmonicorum Liber in 1636, before Galileo’s own, though Galileo’s discovery pre-dates Mersenne’s publication. René Descartes (1596–1650) also described sound in terms of the objects which make it, but in The Passions of the Soul considered that what we actually hear are not the objects themselves, but some “movements coming from them” (Descartes and Voss 1989, [1649, XXIII]). Descartes thus separated the visible or audible object from our perception of it: Thus when we see the light of a torch, and hear the sound of a bell, this sound and this light are two different actions which, simply by the fact that they excite two different movements in certain of our nerves, and by these means in the brain, give two different sensations to the soul, which sensations we relate to the subjects which we suppose to be their causes in such a way that we think we see the torch itself and hear the bell, and do not perceive just the movements which proceed from them. (Descartes, Haldane and Ross 1997, 369) As well as the nature of hearing, Descartes went on to describe the function of the components of the ear: CXCIV. Of hearing. Fourthly, there are two nerves within the ears, so attached to three small bones that are mutually sustaining, and the first of which rests on the small membrane that covers the cavity we call the tympanum of the ear, that all the diverse vibrations which the surrounding air communicates to this membrane are transmitted to the mind by these nerves, and these vibrations give rise, according to their diversity, to the sensations of the different sounds. (Descartes, Lindsay and Veitch 1912, 218) Isaac Newton (1643–1727) built upon Galileo’s ideas in Principia Mathematica and went further in outlining one of the first clear manifestations of a physical description of sound: The last Propositions respect the motions of light and sounds; for since light is propagated in right lines, it is certain that it cannot consist in action alone (by Prop. XLI and XLII). As to sounds, since they arise from tremulous bodies,
they can be nothing else but pulses of the air propagated through it (by Prop. XLIII); and this is confirmed by the tremors which sounds, if they be loud and deep, excite in the bodies near them, as we experience in the sound of drums; for quick and short tremors are less easily excited. But it is well known that any sounds, falling upon strings in unison with the sonorous bodies, excite tremors in those strings. (Newton, Cajori and Motte 1962, 368) Newton also confirmed the speed of sound at 1142 feet per second (of English measure, or 1070 of French measure), before the standardisation of units. Whilst the mathematician and physicist Newton examined sound’s mathematical and physical characteristics, one of his contemporaries focused on sound as the object of our senses. Though John Locke is better known as a philosopher, he was also a distinguished physician, collaborating with some of the most eminent scientists of his century, including the chemist Robert Boyle, and working with Richard Lower in experimental physiology (Rogers 1978, 223–224). In An Essay Concerning Human Understanding John Locke (1632–1704) took up the classical idea of balance in describing all of the human senses, recognising that they appeared perfectly suited to our surroundings: If our Sense of Hearing were but 1000 times quicker than it is, how would a perpetual noise distract us. And we should in the quietest retirement be less able to sleep or meditate than in the middle of a sea-fight. (Locke and Fraser 1959, 302–303) Writing long before Darwin’s theory of natural selection could account for the apparent matching of our sense faculties with the world around us, Locke instead attributed it to “the all-wise Architect”. Locke’s writing was in part a refutation of the doctrine of ideas put forward by Descartes. Instead of ideas being innate, Locke suggested that all our ideas come from either Sensation or Reflection. Our senses provide us with the “Objects of Sensation”, but our ideas may also come about by reflection from the “Operations of our Minds” (Locke 1690, Book II, Chapter 1, 1–5). Those investigating sound had until now tended to do so in an all-encompassing, universal sense. Around the period of the Enlightenment this holistic examination of sound began to separate into more distinct areas of study. The scientific study of the properties of sound itself began to be undertaken alongside, but now separate from, the study of how humans make sense of sound, either as a philosophical entity or as the object of human perception.
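The string relationships established by Galileo and Mersenne can be stated compactly in modern terms. The following minimal Python sketch is illustrative only: it assumes an ideal, perfectly flexible string, and the function name and figures are mine rather than anything drawn from the historical sources.

```python
import math

def string_frequency(length_m, tension_n, mass_per_metre_kg):
    """Mersenne's law for an ideal string: f = (1 / 2L) * sqrt(T / mu).

    Frequency is inversely proportional to length (Galileo's inverse
    ratio of lengths and frequencies) and proportional to the square
    root of the tension (cf. note 2 to this chapter).
    """
    return (1 / (2 * length_m)) * math.sqrt(tension_n / mass_per_metre_kg)

f1 = string_frequency(0.65, 60.0, 0.001)    # a guitar-like string
f2 = string_frequency(0.325, 60.0, 0.001)   # the same string at half length
print(f2 / f1)   # 2.0 -- halving the length doubles the frequency (an octave)

f3 = string_frequency(0.65, 240.0, 0.001)   # quadruple the tension instead
print(f3 / f1)   # 2.0 -- the square-root law gives the same octave
```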
The scientific study of sound

Various people in the seventeenth century, including Wallis in England and Sauveur in France, had observed that a stretched string could oscillate in parts with certain nodes, and that these oscillations were simple multiples of the frequency of
the fundamental. Daniel Bernoulli subsequently showed that it was possible for these different frequencies to be produced simultaneously, meaning that the resultant oscillation was the algebraic sum of the individual harmonics. The French mathematician Joseph Fourier built on this idea, initially in a memoir of 1807 and later in his Analytical Theory of Heat (1822).4 His insight was that any periodic function can be expressed as a series of sinusoidal waves. The equations of the movement of heat, like those which express the vibrations of sonorous bodies, or the ultimate oscillations of liquids, belong to one of the most recently discovered branches of analysis, which it is very important to perfect.… The same theorems which have made known to us the equations of the movement of heat, apply directly to certain problems of general analysis and dynamics whose solution has for a long time been desired. (Fourier 1878, 6–7) Fourier’s account was initially met with some resistance from mathematicians, including his mentors and former teachers Laplace and Lagrange (O’Connor and Robertson 1997), but his work was eventually proven correct and his theorem became fundamentally important in its application to digital audio, among other fields of mathematics and science. Other discoveries relating to sound and hearing followed Fourier’s insights into complex waves. The German Georg Simon Ohm first suggested that the ear was sensitive to spectral components, and gave his name to the less well-known Ohm’s Law of Hearing (1843), in which he suggested that a musical sound is perceived by the ear as a set of constituent pure harmonic tones.5 Fourier demonstrated that a continuous-time signal, such as sound, can be described by a series of pure sinusoidal waves. Harry Nyquist (1928) theorised, and Claude Shannon (1948) later proved, with the Nyquist–Shannon sampling theorem, that any band-limited signal may be represented as a series of equally spaced snapshots or samples.6 The rate at which the samples are taken needs to be at least double the highest frequency to be reproduced. In effect, for human hearing, which is normally agreed to occupy the range of approximately 20 Hz to 20 kHz, this means that a sample rate of 40 kHz is sufficient for any audible signal to be sampled and replayed with the signal completely intact. In practice there are some limitations, chiefly concerned with creating a band-limited signal by removing out-of-band (ultrasonic) portions before sampling so that aliasing does not occur, but Nyquist’s and Shannon’s ideas have proven to be both correct and far-reaching (both ideas are sketched in code at the end of this section). Benjamin Peirce (1809–1880) compiled An Elementary Treatise on Sound, whose aim was to update a version “written by Sir John Herschel for the Encyclopedia Metropolitana, adapted to the purposes of instruction” (Peirce 1836, iii).7 In it, Peirce catalogued the various writings on sound from antiquity up to the time of writing – from the general treatises of Aristotle, through the enormously scientifically
productive eighteenth century, up to and including the most recent scientific discoveries by Michael Faraday and Wilhelm Eduard Weber. Benjamin Peirce’s catalogue is a comprehensive overview of the entire science of sound up to the year of its publication. As well as a general overview of the historical development of theories on sound and its propagation, there are substantial sub-sections on other areas of sound, such as musical sounds, as well as further sub-sections on instruments according to category: Keyed, Bowed, Blown by Lungs, Blown by Bellows, and so on. The treatise concludes with The Voice, its organs and production, and also, somewhat intriguingly, sections on imitations of the human voice, speaking trumpets and ventriloquism. The section on The Voice begins: In all animals, without exception, (unless perhaps, the grasshopper with its chirp, or the cricket be such,) the sounds of the voice are produced by a wind instrument, the column of air contained in the mouth, throat and anterior part of the windpipe being set in vibration, by the issue of a stream of air from the lungs through a membranous slit in a kind of valve placed in the throat. Before going on to describe the physiological means of voice production he continues in a more reflective mood: Almost every animal has a voice, or cry, peculiar to itself; the voice being most perfect, and varied, in man and in birds, which, however, differ extremely in the degree in which they possess this important gift. In quadrapeds, it is limited to a few uncouth screams, bellowings, and other noises, perfectly unmusical in their character, while in many birds it assumes the form of musical notes, of great richness and power, or even of articulate speech. In the human species alone, and that only in some rare instances, we find the power of imitating with the voice every imaginable kind of noise, with a perfect resemblance, and of uttering musical tones of a sweetness and delicacy attainable by no instrument. (Peirce 1836, 202) Benjamin Peirce’s section on the development of a mathematical theory of sound also gave a wonderful analogy of wind rippling a field of standing corn for sound’s longitudinal wave motion: The gust in its progress depresses each ear [of corn] in its own direction, which, so soon as the pressure is removed, not only returns, by its elasticity, to its original upright situation, but by the impetus it has thus acquired, surpasses it, and bends over as much, or nearly as much, on the other side; and so on alternately, oscillating backwards and forwards in equal times, but continually through less and less spaces, till it is reduced to rest by the resistance of the air. Such is the motion of each individual ear; and as the wind passes over all of
them in succession, and bends each equally, all their motions are so far similar. But they differ in this, that they commence not at once but successively. (Peirce 1836, 19) The analogy expresses quite poetically the longitudinal wave motion of sound, as opposed to the transverse wave of ripples in a pool of water, the view that had stood for centuries; though Aristotle had hinted at the longitudinal nature of sound waves in air, which is “set in motion … by contraction or expansion or compression” (Caleon and Subramaniam 2007, 174). Many further developments in acoustics and technologies related to sound came about in the nineteenth and twentieth centuries, but the fundamental ideas and theories around sound and perception first outlined by the ancient Greeks and Romans had been debated for much longer.
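The two results described above, Fourier’s decomposition of a periodic signal into sinusoids and the Nyquist–Shannon sampling requirement, can be sketched in a few lines of Python. This is a rough illustration only; it assumes the NumPy library is available, and all of the variable names and figures are mine.

```python
import numpy as np

sample_rate = 48_000                       # comfortably above 2 x 20 kHz
t = np.arange(sample_rate) / sample_rate   # one second of sample instants

# Fourier synthesis: a 100 Hz square wave approximated by summing its
# odd harmonics (100, 300, 500, ... Hz) with amplitudes of 1/n.
square = np.zeros_like(t)
for n in range(1, 40, 2):
    square += np.sin(2 * np.pi * 100 * n * t) / n
# `square` now approximates a square wave and could be written to file.

# Nyquist-Shannon: a band-limited signal is fully captured only if the
# sample rate exceeds twice its highest frequency. A 30 kHz tone sampled
# at 48 kHz violates this and aliases down to 48 - 30 = 18 kHz.
aliased = np.sin(2 * np.pi * 30_000 * t)
spectrum = np.abs(np.fft.rfft(aliased))
freqs = np.fft.rfftfreq(len(aliased), d=1 / sample_rate)
print(freqs[np.argmax(spectrum)])          # ~18000.0 Hz, not 30000.0
```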
Auditory perception

Our conceptions of sound’s scientific or acoustical properties have gradually become more exact and quantified. We can measure the speed of sound more accurately in any medium. We can confidently predict how sound will behave in complex situations, whether for noise abatement in outdoor festivals, or for predicting natural reverberation in the design of new auditoria. Our understanding of the perception of sound in psychoacoustics and the psychology of hearing has also increased markedly, though knowledge here is less absolute since it is the result of perceptual testing. Some ancient ideas about hearing have been supported while others have been left as a historical record of the process of discovery, which necessarily involves trial and error, hypothesis and new hypothesis. Albert Bregman’s Auditory Scene Analysis (Bregman 1990) remains the preeminent work in the field of auditory perception. It is important to note that auditory perception involves the study of not simply the ears, but the auditory system. Two fundamental and interrelated concerns of the field of auditory perception are to understand, firstly, what this system does and, secondly, how it works. Bregman’s work has also sought to answer other related questions: What is the purpose of perception? How much of our aural capacity is innate rather than learnt? How much does conscious attention affect auditory perception? Some of the practical applications of auditory perception research include improvements in artificial speech recognition, smart hearing aids, and auditory display. Bregman’s laboratory conducted its work using the overarching hypothesis that the various phenomena of auditory perception were actually part of a larger process of perception, which was given the name Auditory Scene Analysis. According to Bregman the principal purpose of perception is to provide the means through which we construct our representation of reality (Bregman 2008).
General principles of perceptual organisation

One fundamental principle of perceptual organisation is that we hear sounds not as individual elements but as streams. When we hear a violin, an orchestra or someone talking, we do not hear the individual harmonic components, percussive noises, grunts, squeaks or breaths. Rather we hear a sound ‘stream’. It has been likened to the visual parsing that takes place when light hits the eye’s retina: shapes or patterns are recognised. Though the Gestalt psychologists such as Koffka (1935) were primarily discussing vision when they developed their factors and principles of perceptual organisation, many of the principles, or laws, can be applied to sound. Prägnanz, the central law, states that “every stimulus pattern is seen in such a way that the resulting structure is as simple as possible” (Goldstein 2010, 105). The Laws of Proximity, Similarity, Continuity, Closure and Connectedness also appear to fit very well with our experience of how the faculty of hearing actually works. There are a number of principles of auditory grouping, akin to the more familiar principles concerning visual perception, namely: Location, Similarity of Timbre and Pitch, Proximity in Time, Auditory Continuity, and Experience (Goldstein 2010, 300–303): Location – Sounds from a particular source come from one position or, where the source moves, follow a continuous path. Similarly, if two similar sound streams are grouped together it is difficult to separate them out. The so-called cocktail party effect demonstrates this phenomenon, where auditory attention is able to focus selectively on one sound stream and tune out others. This effect is usually explained by our ability to localise sound’s direction due to binaural hearing.8 Similarity of Timbre and Pitch – Sound elements will be grouped together if they share similar attributes. For example, a group of voices saying or singing a similar word or sound will be grouped. Notes in a chord, though different in pitch, share similar timing or tonal characters and so will be grouped together as a chord rather than individual notes. Proximity in Time – Sounds that occur in rapid succession are likely to be produced by the same source and will be grouped together. This effect, along with the similarity of timbre and pitch, can be demonstrated by auditory stream segregation, in which alternating high and low notes are played in succession. Where the notes alternate slowly the similarity of timbre and pitch dominates and they are perceived as a single high–low–high–low sequence. Where the notes alternate quickly, they are perceived as two distinct groups, one higher and one lower. Auditory Continuity – In the natural world a physical property of objects is that they remain intact even if temporarily partially hidden from view. Where a quiet sound source is temporarily masked by a louder sound source, once the masking sound is removed, if the quiet source remains there is a tendency to believe it was continuous throughout. This is routinely exploited in film where a music cue must be synchronised with some action on screen. For
example, suppose a piece of music has two cues thirty seconds apart which must coincide with two specific moments on screen. Unfortunately, after picture editing the two moments on screen are twenty-five seconds apart: five seconds of the music must be edited out. This can be achieved by placing a masking sound over the edit point. The sound of a car or plane going by, an old-fashioned camera-flash or some other sufficiently loud sound effect will help mask the edit. When the music returns it will not be perceived as being discontinuous although the edit has been made (a sketch of such an edit appears at the end of this section). Experience – Once a pattern has been established or is recognised, an initially unfamiliar sequence can then be recognised. This is known as a melody schema: if we do not know that a melody is present then we do not look for it, but once we know it is there we recognise it, as the melody is stored in our memory. For linguists the smallest units of speech are often syllables, but these in turn can be analysed into smaller units: the speech sounds, or phonemes. The English language has around forty phonemes made from its 26 letters. Brian Moore also draws our attention to perceptual studies of sound which are of great interest to sound editors, designers and mixers, such as the way our ear/brain combination discriminates between contrasting sounds or the way we understand elements of speech perception: To clarify the nature of phonemes, consider the following example. The word ‘bit’ is argued to contain three phonemes: an initial, a middle, and a final phoneme. By altering just one of these phonemes at a time, three new words can be created: ‘pit’, ‘bet’ and ‘bid’. Thus for the linguist or phonetician, the phonemes are the smallest units of sound that in any given language differentiate one word from another. Phonemes on their own do not have a meaning or symbolize an object (some are not even pronounceable in isolation), but in relation to other phonemes they distinguish one word from another, and in terms of what is perceived, rather than in terms of acoustic patterns. Thus, they are abstract, subjective entities, rather like pitch. (Moore 1989, 300) Most sound practitioners are well aware that our auditory system is well suited to the analysis of change. In my own experience of teaching sound design classes, it is common to use a room with an air conditioner or background noise of some sort and, after some time, point out the sound of the air conditioner. Students invariably notice only then, for the first time, a sound which had been with them continuously for the preceding ten minutes. Then, after predicting that they would not notice the sound in even a few minutes’ time, some of the students are once more surprised when we have moved on to another topic, only for the sound of the air conditioner to be pointed out once more. They had duly, and unconsciously, assimilated it once again. Our attention can be focused on things that are constant but it is likely to wander to other things when there is little variation. We
are much better at perceiving change: “The perceptual effect of a change in a stimulus can be roughly described by saying that the preceding stimulus is subtracted from the present one, so what remains is the change. The changed aspect stands out from the rest” (Moore 1989, 191). Michael Forrester suggests that the dominant psychoacoustic view of auditory perception might not tell the entire story, proposing the idea that “when hearing a sound, our imagination often plays an important part in recognising what it might be” (Forrester 2007). It is also worthwhile trying to unpick the term ‘hearing’ from ‘listening’. Auditory perception is frequently concerned with hearing, focusing on some atom of sound stimuli. In everyday listening, we tend to ascribe meaning to sounds from the context surrounding them. As we converse with a friend or family member, we may not listen to the individual sounds that make up speech, as they are so naturalised that we instantly recognise who is talking and focus instead on the content of the speech, and perhaps on any other cues in the sound of the voice that may indicate some further meaning. That our previous cognitions may play a part in determining what we hear, and what we make of it, seems somewhat outside the realm of testing, since it is impossible to account for everything that a test subject has heard or may be thinking. As such, auditory perception remains somewhat distanced by its focus on the apparatus of hearing rather than the resulting thoughts that follow the sound stimulus. There is also increasing evidence that the auditory cortex is in some way shaped by experience. Training oneself to listen for a particular frequency band, instrument or use of reverberation, for example, will increase the ability to discriminate it from others, because auditory areas of the brain can be shaped by training. Some of the most interesting and far-reaching research into perception involves infants’ hearing. A 1980 study showed that even a three-day-old baby recognised the sound of its mother’s voice (DeCasper and Fifer 1980). For newborn babies the preference for the maternal voice has a biological and evolutionary imperative which hastens mother–infant bonding. This is made possible because the neonate’s auditory competency is already sufficiently sensitive to recognise rhythm, intonation, frequency variation and potentially the phonetic components of speech (DeCasper and Fifer 1980, 1176). The field of perception has for some time broadly divided into two camps and their respective advocates: direct perception and indirect perception (Michaels and Carello 1981). Here the scientific model adopted depends on a philosophical model at its base. Either we believe that our senses give us direct information about our environment, or that our senses add to our existing understandings about the world and are instead used to build representations of reality. The choice is whether our knowledge of the world is “unaided by inference, memories, or representations” or “that perception involves the embellishment or elaboration of inadequate stimulus input” (Michaels and Carello 1981, 1–2). The question of perception is far-reaching, and is fundamentally important for such things as neural networks for learning systems such as autonomous vehicles,
character recognition or medical diagnostics. At its most basic, a neural network does not directly perceive reality from its inputs. Instead, it develops the ability to recognise patterns from a sufficiently large number of inputs in order to create a model from which it can generate novel iterations. These can include natural speech recognition and synthesis, colourising black and white photographs, or a picture created from a text description alone: “There does not seem to be any limit to what can be done, except for human imagination (and training datasets)” (VanRullen 2017). Neural networks take in data as we take in sensory information as experience. If we concede that, like neural networks, human perception is indirect and that our understanding of reality is the result of a number of representations, built on existing schema, then we must also concede that these representations or human models of the world are mental constructs rather than replicas. If we leave the physical properties of sound to one side, being more suited to the province of acoustics, we can nevertheless say a few things about sound and how we perceive it. Thinkers since Plato and Aristotle have wrestled with the curiosities of sound, at once mathematical and musical, beautiful and dangerous. There are a number of important differences between sound and its sensory cousin, vision. First, sound only exists in a stream. For a sound to be heard something must have happened or be happening. Light and colour exist as long as there is a light source: a table is visible, but it does not make a noise. So, in a sense, sound tells you what is happening – wind through leaves, birds, distant noises, footsteps, traffic, conversations, machines, music, and so on. Second, sound is three-dimensional, and is perceived in that way. Our aural apparatus allows us to differentiate between left and right, fore and aft, up and down. Conversely, our eyes have only a small portion of the world open to them at any one time, with each human eye being oriented so similarly to its partner that the peripheral field is sacrificed in favour of depth perception. Third, sound spreads out as a spherical wave from its origin, and at a pedestrian speed compared to that of light. This allows us to determine localisation (the direction of the sound source, the sound’s origin) by focusing on the sound that arrives first rather than any reflections that arrive later. The perception of these acoustic reflections gives rise to echoes and reverberations which give acoustically reflective environments their character. As far as the perception of sound is concerned there is still work to do in determining the exact nature of the processes at work in the ear/brain system (Glyde et al. 2013). Auditory perception research might increasingly point to a more philosophically interesting discussion on the nature and development of our perception. Another fundamental difference between visual perception and auditory perception is the idea of objects. This idea has a basis in everyday lived experience, which will also be familiar to a great many sound designers: we see visual objects, but we hear sound events. The sounds may indeed come from objects but the sound is fundamentally caused by something actually happening.9 Whilst sounds may originate from, and be physically tied to, the location of an object (such as a character, weapon or vehicle), the sound itself is triggered when an event occurs.
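This event-based view can be caricatured in a few lines of code. The sketch below is loosely modelled on the way audio middleware attaches sounds to game events (see note 9); it is purely illustrative, and every name in it is hypothetical rather than taken from FMOD or WWISE.

```python
# A hypothetical event-to-sound mapping: the sound belongs to something
# happening, not to the object itself.
sound_bank = {
    "footstep_grass": "assets/footstep_grass.wav",
    "door_slam": "assets/door_slam.wav",
}

playback_queue = []  # stands in for an audio engine's playback queue

def post_event(event_name, position=(0.0, 0.0, 0.0)):
    """Trigger the sound attached to a game event at a world position."""
    sample = sound_bank.get(event_name)
    if sample is not None:
        playback_queue.append((event_name, sample, position))

# The footstep sound is tied to the character's position, but it is the
# event (a foot striking grass) that actually triggers it.
post_event("footstep_grass", position=(2.0, 0.0, 5.5))
post_event("door_slam", position=(0.0, 1.0, 3.0))
print(playback_queue)
```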
The study of sensory perception was centred initially on colour, perhaps since the visible, here as in many other areas, appears to be privileged over the audible and the other sense perceptions in assumptions about the usefulness of sensory information gathering (Casati and Dokic 2014). It is worth reminding ourselves here of some fundamental differences between our sense perceptions: sight, hearing and taste, for example. For analysts of colour or taste there may be some fundamental qualities which are perceptually elemental and from which other secondary qualities are made. For colour, the primary qualities would be the primary hues: red, green and blue, from which all other colours can be made. For taste, these may be the four basic components: sweetness, sourness, saltiness and bitterness (Byrne and Hilbert 2008, 385–386). These ‘sensible qualities’ are the fundamental properties: “A sensible quality is a perceptible property, a property that physical objects (or events) perceptually appear to have” (Byrne and Hilbert 2008, 385). Yet, if we mix together two colours we no longer see two colours, but instead see a different resultant colour formed from the two individual components. If we mix together two sounds, such as two notes on a piano, we can still hear two sounds. It would also be difficult to imagine fundamental elements of sound from which all other sounds could be derived. There is of course a physiological counterpart to the models of both colour and taste perception. Our eyes have receptors which are aligned with the three primary colours (as well as other more sensitive receptors for intensity rather than colour). Similarly, the mouth, and in particular the tongue, has receptors for the four primary taste sensations. Our ears, on the other hand, have a large number of tiny hair cells lining the cochlea which respond to particular frequency bands. Hair cells in the large end of the cochlea respond to very high frequencies, and those in the small end respond to lower frequencies. Once stimulated, these hair cells, which are connected to nerves, fire impulses that are sent to the brain via the auditory nerve. These impulses are then interpreted as sound (Goldstein 2010, 259–276).
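Returning briefly to the Auditory Continuity principle described earlier, the masked music edit can be expressed as simple array surgery. The following is a minimal sketch, assuming NumPy arrays at a common sample rate; the function and variable names are mine and stand in for a real editing workflow.

```python
import numpy as np

def masked_music_edit(music, cut_start_s, cut_len_s, mask, rate=48_000):
    """Remove cut_len_s seconds of music at cut_start_s and lay a louder
    masking effect over the join, exploiting auditory continuity."""
    a = int(cut_start_s * rate)
    b = a + int(cut_len_s * rate)
    edited = np.concatenate([music[:a], music[b:]])  # music is now shorter
    # Centre the masking sound on the edit point so the join is hidden.
    start = max(0, a - len(mask) // 2)
    end = min(len(edited), start + len(mask))
    edited[start:end] += mask[: end - start] * 2.0   # mask louder than music
    return edited

# e.g. cut five seconds from a cue at the 12-second mark, masked by a
# passing car (`music` and `car` being hypothetical sample arrays):
# shortened = masked_music_edit(music, cut_start_s=12.0, cut_len_s=5.0, mask=car)
```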
Philosophies of sound

The following thought experiment was published in the notes and queries section of the Scientific American in April 1884: (18) S. A. H. asks: If a tree were to fall on an uninhabited island, would there be any sound? A. Sound is vibration, transmitted to our senses through the mechanism of the ear, and recognized as sound only at our nerve centers. The falling of the tree or any other disturbance will produce vibration of the air. If there be no ears to hear, there will be no sound. The effect of the transmission of the vibrations upon surrounding objects will be the same, with or without the presence of sentient conditions for recognizing them. Hence there will be vibration, but no sound to the things that cannot hear. (Scientific American 1884, 218)
Of course, depending on your definition of sound the answer could be either no or yes. If you imagine sound to be a physical, measurable thing created by movement then yes, of course a tree falling to the ground would make a sound.10 On the other hand, if the definition of sound relies on human (and possibly animal) perception then, with no-one and nothing to hear it, it does not make a sound. For philosophers of sound, and especially of sound perception, there are several fundamental issues that go beyond this simple question. First, there is a distinction between those who view perceptions of sounds as either objects or streams. For some philosophers of sound, such as Robert Pasnau, the everyday language which is used to describe our perceptions is the first problem that needs to be addressed: Our standard view about sound is incoherent. On the one hand we suppose that sound is a quality, not of the object that makes the sound, but of the surrounding medium. This is the supposition of our ordinary language, of modern science and of a long philosophical tradition. On the other hand, we suppose that sound is the object of hearing. This too is the assumption of ordinary language, modern science and a long philosophical tradition. Yet these two assumptions cannot both be right – not unless we wish to concede that hearing is illusory and that we do not listen to the objects that make sounds. (Pasnau 1999, 309) Pasnau argues that sound is in fact a property of the object rather than the medium, but that our lack of appropriate language makes sound difficult to discuss in philosophical terms. If we compare the aural to the visual sense, colours are thought of as qualities whereas sounds are particular and individual: “Sounds … have identity, individuation, and persistence conditions that require us to distinguish them from properties of the sources that we should understand to make or produce sounds” (O’Callaghan 2009, 22). For O’Callaghan we nearly always experience sounds as “sounds of something” and attribute “the properties to the ordinary objects we commonly think of as sound sources” (O’Callaghan 2009, 19). This is undoubtedly often the case. Yet, when we think about the sound of wind, the sound we mean depends on where the wind is and what it is moving. Wind in trees is heard as the rustling of leaves. Wind in a tunnel is the resonant rush of sound in a large open pipe. The wind that you hear in a speeding car when the window is down is different again, as is that of pan pipes or wind chimes. Here the sound is more concerned with the event that is happening than the object itself. In their review of philosophies of sound, Casati and Dokic argue that it is often helpful to consider the spatial element of sounds across the different theoretical viewpoints: If the sounds we hear have spatial locations, they can be thought to be located either where the material sources are (distal theories), or where the hearers are
(proximal theories), or somewhere in between (medial theories). In the proximal theories the sounds are either a) sensations experienced by hearers, or b) proximal stimuli – that is, even though we might hear a sound as coming from a particular source or direction, the sound is nevertheless heard wherever the listener happens to be. In medial theories the emphasis is on the movement of the medium (e.g. air) itself, into which both the sounding object and listener are immersed. This is also where the physical/acoustical wave motion explanation would sit, as would Aristotle’s conception of sound. In distal theories, sound is considered as processes or events in the medium inside (or at the surface of) sounding objects, or in the stuff of the sounding object. (Casati and Dokic 2014)11 Whilst seeming slightly strange initially, the proximal stimuli standpoint does take account of the different experience different listeners will have of the same sound. Take for example the sound of an orchestra: if we are close to it, we may hear the first violin overly loud in the balance. Someone at the back of the room will hear a different balance of instruments, as well as a more reverberant balance overall; and someone backstage will hear a muffled version of the whole, even though they are all listening to the same source. Sounds, being temporal, also possess a duration. Indeed, it is impossible to have a sound that has no duration. Just as light travelling from our sun to the earth shows us what happened eight minutes ago, when we listen we are listening back in time. We are listening to some event that happened not contemporaneously with our hearing it, but some time ago. Anyone who has seen and heard, from the opposite end of the field, a ball being kicked will have experienced the delay between seeing the event and hearing it. If the distance were 100 metres (roughly the length of a football pitch) the time gap would be around 1/3 second (the arithmetic is sketched at the end of this section). Hearing thunder illustrates the same point, but since the original event has much more energy, and is consequently much louder, the sound can be heard much further away. Casey O’Callaghan argues that sounds should be thought of as objects in themselves: “Sounds, I want to suggest, have identity, individuation, and persistence conditions that require us to distinguish them from properties of the sources that we should understand to make or produce sounds” (O’Callaghan 2007, 22). At first glance this may seem quite strange, but take the example of a police car’s siren heard from different listening positions. The relative speed of the listener and object, their distance apart, the reflectiveness of the environment, and the orientation of one to the other will all have a substantial effect on the sound heard. The same applies to microphones and their positioning when capturing the sound of an object. It would be difficult to argue that there is only one particular sound of an object, since much depends on the listening position. The most complex of sounds – the human voice – has a sound source which is relatively small, being the mouth and nostrils. Compare this to a piano, where the complex sound emanates from all parts of the object in question; the strings themselves as well as
the sounding board and all the other elements which vibrate and resonate in sympathy. Mohan Matthen argues that rather than trying to decide whether sounds should be considered objects in their own right, or as one of the sensible properties along with colour, shape and size, we should recognise that we sense objects and their features: we recognise things, as well as their characteristics (Matthen 2010). R. Murray Schafer and Barry Truax were both influential in reframing the study of sound as a study of listening, as ‘acoustic ecology’, focusing on the relationship between humans and their acoustic environment. In Acoustic Communication (2001) Truax also describes a number of different levels of listening attention. He differentiates between ‘listening-in-search’ of a sound, such as speech in a noisy environment, and ‘listening-in-readiness’, which is listening in expectation of a familiar sound or signal (Truax 2001, 22). Truax also differentiates subtle differences between background listening, where we are aware of the sound but it is of no immediate significance, and subliminal listening, which is a total lack of conscious awareness (Truax 2001, 24–25). Whilst hearing is often compared to colour perception, it is fundamentally different both in its physical properties and in the way our senses deliver our perception of it. When we listen, we might say we hear the source or origin of a sound. When we see an object, we are often not seeing the source of the light, but rather the reflection of the light source. Where we see colour, what we are often sensing is the amount of red, green or blue light that is being reflected by the object. This could be argued to be analogous to hearing echoes or reverberation, since we can perceive a sound source directly as well as the echoes which reflect from other objects and accompany it shortly afterward. Our senses of sight and hearing have different strengths and weaknesses. How then do we consider hearing, along with our other senses? How do we describe our experience? According to Matthew Nudds, our senses work together giving us a holistic experience of the world: We perceive a single, unitary, world: in our day to day interaction with the world we don’t distinguish its objects and features in terms of the senses using which we perceive them.… We tend to think of the senses as distinct things: as five different perceptual inlets to the mind. We tend to think that each of the senses is, in some significant way, distinct from and independent of the rest, that each sense is separable from and can function independently of the others. And we tend to think that our perception of the world is the result of the combined use of whatever senses we have, so that what we perceive is the sum total of what each sense alone provides. (Nudds 2001, 223–224) Whatever we hear contributes to this perception of reality: provided it corresponds to, or at least does not stray too far from, our ongoing perception, it will be incorporated and assimilated into it.
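The audio-visual lag described above is simple arithmetic: distance divided by the speed of sound. A minimal sketch, assuming a nominal speed of sound of 343 metres per second (air at around 20°C); the figures are illustrative only.

```python
SPEED_OF_SOUND = 343.0  # metres per second, in air at about 20 degrees C

def delay_seconds(distance_m):
    """Time taken for a sound to travel distance_m metres."""
    return distance_m / SPEED_OF_SOUND

print(delay_seconds(100))    # ~0.29 s: the kicked ball across a pitch
print(delay_seconds(1000))   # ~2.9 s: thunder one kilometre away
```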
Summary

Two main strands of historical inquiry into sound have been conducted in parallel: the scientific nature of sound (its speed, wave motion, frequency and so on) and the human perception of sound (musical pitch, harmony, intelligibility, etc.). Both have incrementally increased our understanding of what is an extraordinarily complex and sometimes mysterious sense. This chapter began with two quite different definitions of sound: one scientific, “vibrations that travel through the air”, and the other more perceptually oriented, “a thing that can be heard”. The study and practice of sound design is informed by both viewpoints. Philosophers since Aristotle have attempted to describe and categorise the world, and the world of sound. Piece by piece our scientific understanding of sound has evolved, as an acoustical wave travelling through a medium, which can be measured, modelled and reproduced. If we also consider the perception of sound as a type of sign system, we may be able to examine how sounds become meaningful. Our understanding of sound as the object of our auditory perception has also become more nuanced, but there is less agreement on the theoretical models that describe it. In one sense human language is the most obvious example of a sonic sign system, but there is still debate about its origin. Noam Chomsky’s idea of Universal Grammar – the innate genetic ability to use the rules of language – has been criticised on the grounds of the diversity and non-uniformity of different language structures (e.g. Evans 2014). For a Chomskian Universal Grammar to hold true every human language must share structural characteristics. Whilst there are indeed commonalities across all human languages, this, so the criticism goes, is not necessarily evidence of a pre-encoded Universal Grammar, but rather of a functional system which is a learned skill. More recently, an evolutionary model of language has emerged which sees the human neural system in its totality as the basis for the language ability, since language use requires cognitive capacity combined with the development and fine control of the organs of speech production, such as the tongue and larynx, and the biological development of the human hearing system for both speaker and listener (Lieberman 2006). Whilst there is debate about the supposed biological origins of language, the complexity and ubiquity of languages point to a natural and innate human ability to create, use and share sign-systems in order to communicate. Whether examining a complex formal sign-system like language, or the use of non-verbal sound signals such as alarms, semiotics can be used to examine any type of sound as part of a sign, either alone or in conjunction with a visual accompaniment, from the very simplest to the most complex.
Notes

1 The tetractys also symbolised the four classical elements of fire, air, water and earth (Ritter and Morrison 1838, 363).
2 The tensions in the strings were also later found to be in a similar relationship, though this time the pleasant pairs were related to the square root of the ratio of the tensions. For example, if the tensions were in the ratio 9:4, the square root of 9/4 gives the simple fraction 3/2 and thus a harmonious relationship.
3 Galileo Galilei, after publishing his heliocentric theory in support of Copernicus, met with the opposition of the Catholic Church, which formally declared the theory to be heretical. The geocentric view, which tallied with the Bible, put the earth at the centre of the universe, whereas the heliocentric view put the Sun at the centre with the planets orbiting it.
4 Fourier’s paper “On the Propagation of Heat in Solid Bodies” was presented to the Paris Institute in 1807.
5 Ohm is better known for Ohm’s Law of Electrical Resistance, known to every student of physics: voltage is equal to current multiplied by resistance, or V=IR.
6 The Sampling Theorem states: “If a function f(t) contains no frequencies higher than W cps [cycles per second], it is completely determined by giving its ordinates at a series of points spaced 1/2W seconds apart” (Shannon 1948, 11).
7 Benjamin Peirce was the father of Charles S. Peirce, whose theory of semiotics is applied later in this book.
8 Interaural time difference (ITD) and interaural level difference (ILD) aid in understanding speech “at significantly poorer signal-to-noise ratios when speech is spatially separated from the noise rather than co-located” (Glyde et al. 2013).
9 This view of sound sources has been fundamentally grasped by interactive sound designers in particular, since sounds are often literally described and allocated to an ‘event’ in audio middleware such as FMOD or WWISE.
10 For the pedantic: the tree falling through the air makes very little sound compared with the moment it comes into contact with the ground, which makes a much greater sound.
11 There are, perhaps inevitably, further categories or subcategories of theories. Sounds are considered Aspatial where the theory denies sounds having spatiality of their own, arguing instead that the spatiality of sound is superimposed on a physical or visible world. If this world could not be seen or touched, so the theory goes, would our hearing give us the objective perception of the sounds being located in space? Another category would be sounds as Pure Events: where we hear sounds as music, we can hear them as events detached from their physical causes (Casati and Dokic 2014).
References

Aristotle. 350 bce. On The Soul, Book II.
Aristotle. 350 bce. Poetics. Raleigh, NC: Generic NL Freebook Publisher.
Bregman, Albert S. 1990. Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: MIT Press.
Bregman, Albert S. 2008. “Home Page of the Auditory Research Laboratory.” Available online at http://webpages.mcgill.ca/staff/Group2/abregm1/web/index.htm
Byrne, Alex, and David Hilbert. 2008. “Basic Sensible Qualities and the Structure of Appearance.” Philosophical Issues 18: 385–405.
Caleon, Imelda S., and R. Subramaniam. 2007. “From Pythagoras to Sauveur: Tracing the History of Ideas about the Nature of Sound.” Physics Education 42(2): 173.
Casati, Roberto, and Jérôme Dokic. 2014. “Sounds.” In Edward N. Zalta (ed.), Stanford Encyclopedia of Philosophy. Stanford University: Metaphysics Research Lab.
DeCasper, Anthony J., and William P. Fifer. 1980. “Of Human Bonding: Newborns Prefer their Mothers’ Voices.” Science 208(4448): 1174–1176.
Descartes, René, Elizabeth Sanderson Haldane, and G. R. T. Ross. 1997. Key Philosophical Writings, Wordsworth Classics of World Literature. Ware: Wordsworth Classics.
Descartes, René, A. D. Lindsay, and John Veitch. 1912. A Discourse on Method. London; New York: J. M. Dent & Sons; E. P. Dutton & Co.
Descartes, René, and Stephen Voss. 1989. The Passions of the Soul. Indianapolis: Hackett Pub. Co.
Evans, Vyvyan. 2014. The Language Myth. Cambridge: Cambridge University Press.
Forrester, Michael A. 2007. “Auditory Perception and Sound As Event: Theorising Sound Imagery In Psychology.” Sound Journal. Available online at www.kent.ac.uk/arts/soundjournal/forrester001.html
Fourier, Jean Baptiste Joseph. 1878. The Analytical Theory of Heat. Translated by Alexander Freeman. Cambridge: Cambridge University Press.
Galilei, Galileo. 1914 [1638]. Dialogues Concerning Two New Sciences. Translated by Henry Crew and Alfonso de Salvio, with an introduction by Antonio Favaro. New York: MacMillan.
Galilei, Galileo. 1939. Dialogues Concerning Two New Sciences. Evanston, IL: Northwestern University Press.
Glyde, Helen, Jörg M. Buchholz, Harvey Dillon, Sharon Cameron, and Louise Hickson. 2013. “The Importance of Interaural Time Differences and Level Differences in Spatial Release from Masking.” The Journal of the Acoustical Society of America 134(2): EL147–EL152. doi:10.1121/1.4812441
Goldstein, E. Bruce. 2010. Sensation and Perception, 8th edition. Belmont, CA: Wadsworth, Cengage Learning.
Helm, E. Eugene. 2013. Melody, Harmony, Tonality: A Book for Connoisseurs and Amateurs. Lanham, MD: Scarecrow Press.
Johnstone, Mark A. 2013. “Aristotle on Sounds.” British Journal for the History of Philosophy 21(4): 631–648. doi:10.1080/09608788.2013.792239
Koffka, Kurt. 1935. Principles of Gestalt Psychology. London: Routledge & Kegan Paul.
Lieberman, Philip. 2006. Toward an Evolutionary Biology of Language. Cambridge, MA: Belknap Press of Harvard University Press.
Locke, John. 1690. Of our Complex Ideas of Substances. In An Essay Concerning Human Understanding, Book II: Of Ideas.
Locke, John, and Alexander Campbell Fraser. 1959. An Essay Concerning Human Understanding. London: Dover Publications.
Matthen, Mohan. 2010. “On the Diversity of Auditory Objects.” Review of Philosophy and Psychology 1(1): 63–89. doi:10.1007/s13164-009-0018-z
Michaels, Claire F., and Claudia Carello. 1981. Direct Perception. Englewood Cliffs; London: Prentice-Hall.
Moore, Brian C. J. 1989. An Introduction to the Psychology of Hearing, 3rd edition. London: Academic Press.
Newton, Isaac, Florian Cajori, and Andrew Motte. 1962. Mathematical Principles of Natural Philosophy and His System of the World, 2 volumes. Berkeley: University of California Press.
Nudds, Matthew. 2001. “Experiencing the Production of Sounds.” European Journal of Philosophy 9(2): 210–229. doi:10.1111/1468-0378.00136
Nyquist, Harry. 1928. “Certain Topics in Telegraph Transmission Theory.” Transactions of the American Institute of Electrical Engineers 47(2): 617–644. doi:10.1109/T-AIEE.1928.5055024
O’Callaghan, Casey. 2007. Sounds: A Philosophical Theory. Oxford: Oxford University Press.
O’Callaghan, Casey. 2009. “Constructing a Theory of Sounds.” Oxford Studies in Metaphysics 5: 247–270.
O’Connor, J. J., and E. F. Robertson. 1997. “Jean Baptiste Joseph Fourier.” Available online at www-groups.dcs.st-and.ac.uk/history/Biographies/Fourier.html
Pasnau, Robert. 1999. “What is Sound?” The Philosophical Quarterly 49(196): 309–324. doi:10.1111/1467-9213.00144
Peirce, Benjamin. 1836. An Elementary Treatise on Sound; Being the Second Volume of a Course of Natural Philosophy. Boston: J. Munroe and Co.
Plato. 2013. Timaeus. In The Project Gutenberg EBook of Timaeus. Gutenberg.org.
Ritter, Heinrich, and A. J. W. Morrison. 1838. The History of Ancient Philosophy. Oxford: D. A. Talboys.
Rogers, G. A. J. 1978. “Locke’s Essay and Newton’s Principia.” Journal of the History of Ideas 39(2): 217–232. doi:10.2307/2708776
Schafer, R. Murray. 1993. The Soundscape: Our Sonic Environment and the Tuning of the World. Rochester, VT: Destiny Books.
Scientific American. 1884. “Notes and Queries.” Scientific American 50(14): 218.
Shannon, C. E. 1948. “A Mathematical Theory of Communication.” The Bell System Technical Journal 27 (October 1948): 623–656.
Truax, Barry. 2001. Acoustic Communication, 2nd edition. Westport, CT: Ablex.
VanRullen, Rufin. 2017. “Perception Science in the Age of Deep Neural Networks.” Frontiers in Psychology 8(142). doi:10.3389/fpsyg.2017.00142
Vitruvius, Pollio, and Frank Stephen Granger. 1931. Vitruvius On Architecture, 2 volumes. London; New York: W. Heinemann; Putnam.
3 AUDIOVISUAL THEORIES OF SOUND
Introduction

Until there was some technological means of converting, recording or transmitting sound, all sound was live.1 The nineteenth century saw the birth of three important sound technologies: i) the telephone, invented by Alexander Graham Bell in 1876, ii) the phonograph, invented by Thomas Edison in 1877, and iii) radiotelegraphy, developed by Guglielmo Marconi in 1895 (Dombois and Eckel 2011).2 Through the use of a transducer, we can change a signal from one form of energy to another. A microphone changes variations in pressure into electrical energy and a loudspeaker does the reverse, while a phonograph converts sound into analogous grooves cut into a spiral on a disc. As a result, we can amplify, store or send audio as an electrical signal.3 Cinema sound, for example, began to be adopted wholesale once the Vitaphone system was introduced. This was a disc-based sound recording process that meant that all the sound that was to accompany a scene had to be recorded simultaneously. A few short years later, once sound could be translated to an optical (and later a magnetic) medium, several things became possible. First, editing could be performed to sequence different sounds together or to remove silences or unwanted material. This enabled multi-layered soundtracks, with dialogue, music and sound effects being recorded separately and mixed together to create a composite soundtrack. Second, superposition of sounds was possible, meaning sounds could be layered by placing one over the other in a predictable manner. Third, sounds could be easily sped up, slowed down, or reversed (these manipulations are sketched in code at the end of this introduction). As a result, audiovisual representation was now easily open to manipulation. Our senses could be fooled. Through sound editing we can replace, or augment, or enhance the sounds of any object or place imperceptibly. Whether using sound alone, such as in a musical performance or a radio drama, or using
sound with images, there is no visible trace of any manipulation. However, in an audiovisual production the visible tends to supply convincing corroboration that what we can hear is real. The visible supports the sonic illusion just as much as the audible supports visual illusions or special visual effects.
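The manipulations just described (editing, superposition, reversal and speed change) map directly onto simple operations on sampled audio. The following minimal NumPy sketch is purely illustrative; the sine tones merely stand in for recorded tracks, and all names are mine.

```python
import numpy as np

rate = 48_000
t = np.arange(rate * 2) / rate                 # two seconds of samples
dialogue = 0.5 * np.sin(2 * np.pi * 220 * t)   # stand-ins for recorded tracks
effects = 0.3 * np.sin(2 * np.pi * 555 * t)

# Editing: cut material out and re-sequence what remains.
edited = np.concatenate([dialogue[: rate // 2], dialogue[rate:]])

# Superposition: layer sounds by summing them into a composite track.
mixed = dialogue + effects

# Reversal: play the samples backwards.
reversed_sound = dialogue[::-1]

# Speed change: taking every second sample doubles the speed (and, on
# playback at the same rate, raises the pitch by an octave).
double_speed = dialogue[::2]

print(len(dialogue), len(edited), len(double_speed))  # 96000 72000 48000
```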
Cinema and sound

Rick Altman has written comprehensive histories of silent film sound and numerous articles on sound’s place (and absence) in film theory (Altman 1980a; 1992; 2004). In likening cinema sound to ventriloquism, he provided a novel approach to the positioning of the sound/image/meaning construct in which the acousmatic sound is seen as a distraction, and “we are so disconcerted by a sourceless sound that we would rather attribute the sound to a dummy or a shadow than face the mystery of its sourcelessness” (Altman 1980a, 76). The visual image of a person speaking is a redundant duplication, yet the shot/reverse shot is still the staple of filmmaking. In “Four and a Half Film Fallacies”, Altman also highlights the twin aims of early sound cinema, persuasive illusion and intelligibility, which were born of necessity given the technology available, and traces the development of cinema sound as technology began to be used deliberately to construct reality, rather than just to observe it (Altman 1980c, 58). In addition to being the foremost historian of cinema sound, Altman closely examined the language of film theory, tracing its origins to the early theorists who were distrustful of sound (including Kracauer, Pudovkin, Wright and Arnheim) and who consequently framed cinematic theory in visual terms, often ill-suited for application to the soundtrack. He described a number of fallacies (historical, ontological, reproductive, nominalist and indexical) which permeated much of the early and subsequent writing on cinema and which served to diminish, or ignore altogether, sound’s place in cinema theory (Altman 1980b, 65).
Early film sound theory

The advent of dialogue in film created a new set of questions, for example around the artistic purpose of film as compared to the theatre. Before the advent of sound, the medium could never aspire to full-blown realism, and so filmmakers had headed off in the opposite direction, creating a new stylised representational art form. Critics were divided on whether the advent of sound would enhance or destroy film as an artistic medium. Many of the classical era theorists of the new sound film were sceptical of talkies, and hoped they might fade away once the public’s initial fascination with synchronous sound had worn off. Still, there remained the concern that audiences would not be able to concentrate on the visual image if they had to concentrate too much on dialogue (Kracauer 1985). Some audiences were well used to booming voices, though. In American cinemas a master of ceremonies (MC) was frequently employed to read
and interpret the films as they were projected (Fielding 1980, 4–5), providing a link between the familiar vaudeville and the new cinema. The MC would read the subtitles to the audience, many of whom would not be able to read, or at least, not be able to read in English. Lastly, and perhaps most importantly, he was there to explain the film to the audience. The increasingly sophisticated editing and camera positioning were therefore mediated to an audience more used to live theatre than to seeing stories presented on screen. The pioneering Russian filmmakers Sergei Eisenstein, Vsevolod Pudovkin and Grigori Alexandrov were, in 1928, even more fearful of the introduction of dialogue to a mature cinema, stating that “actual language posed a threat to figurative language that had evolved” (Eisenstein, Pudovkin, and Alexandrov 1985, 75). The threat sound posed was obvious. The ornate visual language that had been developed for silent cinema was under attack. These ideas took hold among filmmakers in the UK and France, more so than those working in the US. Eisenstein proposed the theory of montage, in which the sequencing and juxtaposition of cinematic elements provided the distinctive difference that marked out cinema as a new art form. Pudovkin accepted the sonic component of film, and saw sound films as a legitimate avenue to pursue as long as the principle of asynchronism was adopted, but argued particularly against the idea of dialogue-driven films (1985). The isolation of dialogue as the problem, rather than sound as a whole, was also taken up by René Clair in 1929. He was pragmatic enough to recognise that the adoption of sound in film was here to stay, but was careful to delineate between sound films and the derided talking pictures: The talking film is not everything. There is also the sound film – on which the last hopes of the advocates of the silent film are pinned. They count on the sound film to ward off the danger represented by the advent of talkies, in an effort to convince themselves that the sounds and noises accompanying the moving picture may prove sufficiently entertaining for the audience to prevent it from demanding dialogue, and may create an illusion of “reality” less harmful for the art than the talking film. (Clair 1985, 92) Others were less fearful of dialogue and saw its potential. By 1938 Rudolf Arnheim contended that film dialogue makes storytelling easier, as “a device for saving time, space and ingenuity”, and whilst he could accept that opera as an art benefits from the libretto, providing a “skeleton of the dramatic action”, he dismissed the idea that dialogue could have a similar function in film (1985, 112). Rather than seeing dialogue as modifying the art form, Arnheim viewed it as ruining the existing art form by interfering with the expression of the image.4 Practitioner-theorists, such as Alberto Cavalcanti writing in 1939, argued convincingly that sound could be used as the “medium of suggestion”, an idea that has grown in acceptance over the years as the use of sound became less functional or literal. Being able to hear a sound without being able to immediately see its origin
allows the filmmaker to create suspense, fear or confusion. In doing so, the audience would be less aware of the mechanics of film, since sound is not as directly attributable to its source as visual information: “That is why noise is so useful. It speaks directly to the emotions” (Cavalcanti 1985, 109). Filmmaker Béla Balázs saw a creative potential in film sound that had, on the whole, gone unfulfilled. Working primarily as a writer who straddled both silent and sound eras, he wrote in 1949 about some of the dramaturgical possibilities in sound, which “can intervene to influence its course” rather than simply accompanying images in the story (1952, 200). He described how tension and surprise could be achieved and maintained by manipulation of the soundtrack. Where a sound is diegetic but its source is not visible to the audience, the actor may hear the sound, see its source and understand what it is before the audience does. Conversely, the audience may see the origin of the sound before the character (1952, 209–210). Whether creating suspense, tension or surprise, the ability to have the characters and audience not necessarily listening to, or seeing, the same thing puts the audience or the actor in the privileged position of prior knowledge. He also highlighted the dramatic usefulness of visible reactions to sounds rather than their origin, as well as warning of the potential for sound metaphors and similes to become obvious or stereotypical (1952, 218). Dialogue for some, and synchronised sound for others, were the specific threats rather than sound in itself, although sound as a whole became the target of the distrust. Whether one considers the silent era of cinema a mature and pure form of art, or a preface to real cinematic greatness, there is little disputing the effect that the introduction of sound (and with it, dialogue) had on the way films were written, acted, directed and received. The prevailing view amongst many early theorists was that the introduction of sound brought cinema closer to reality and therefore further from art. However, audiences were now used to seeing and hearing films. Taking sound away, or even simply the dialogue, was not an option. Once the Vitaphone system had been developed to such an extent that synchronism between pictures and sound was achievable, and therefore transparent, the talking pictures were off and running.5 Filmmakers could then begin using the sound component of their films creatively, rather than as a mere marketing instrument to declare that their films ‘talked’. In the words of Scott Eyman, “Talkies were not an evolution, but a mutation, a different art form entirely” (1999, 22). Even Alfred Hitchcock, whilst defining ‘pure film’ as film that expresses its meaning visually, specifically through montage, used sound in his films as another creative input whose purpose was to hold the audience’s attention (Weis 1978, 42). This pragmatic approach exemplifies the emerging bifurcated attitude to the introduction of sound: on the one hand holding the view that pure cinema is visual, whilst at the same time proceeding to utilise sound to create films beyond what was possible with the purely visual.
The realistic or fidelity approach to the recording and use of sound in cinema is exemplified by the ideas of another influential figure whose work straddled the coming of sound to cinema, J. P. Maxfield. He wrote on cinema acoustics (Maxfield and Flannagan 1936) as well as on recording techniques before sound in film had been properly established (Maxfield and Harrison 1926). Maxfield, being more empirical than artistic, used the human model of perception and common sense as the basis for his recommendations on the practicalities of recording sound for films. There was necessarily a compromise between intelligibility and faithfulness to real life (Lastra 2000). For intelligibility Maxfield proposed the three-walled set as preferable, since it reduced reverberation, and recommended the placement of a single microphone along the line of the camera so that “characters approaching the camera automatically approach the microphone as well” (Altman 1980c, 50). His practical understanding of the differences between natural binaural hearing and hearing via a monaural microphone recording will be recognised by anyone who has ever recorded or mixed dialogue:

The loss of direction brought about by the use of one ear only, causes some rather unexpected results. When two ears are used, a person has the ability to consciously pay attention to sounds coming from a given direction to the partial exclusion of sounds coming from other directions. With the loss of the sense of direction which accompanies the use of monaural hearing, this conscious discrimination becomes impossible and the incidental noises occurring in a scene, as well as any reverberation which may be present, are apparently increased to such an extent that they unduly intrude themselves on the hearer’s notice. It is, therefore, necessary to hold these noises, including the reverberation, down to a lower loudness than normal if a scene recorded monaurally is to satisfactorily create the illusion of reality when listened to binaurally.
(Maxfield 1930, 86)

Maxfield also applied this common-sense approach to the matching of sound scale to the image, in which sound and image work together to present a realistic depth and scale:

When a person is viewing a real scene in real life, he is viewing it with lenses – that is, the eyes, and pickup devices – that is, the ears, which are in a fixed relationship, one to the other.… The point of importance, however, is the fact that the eyes and ears maintain a fixed relationship to one another.
(Maxfield 1930, 85)

The conventions of visual editing brought about a conundrum. Films are fundamentally different representations to real life in that they are edited. We do not experience a continuous fixed relationship between eyes and ears. In real life a
person’s voice matches the direction of their mouth. In a film, the person may be shown in the centre of the screen, or moving around, or alternating between a wide and a close-up shot, or they may not be on screen at all. Should the sound then match the scale and perspective of the picture? That is, if the scene cuts from a wide shot to a close-up, should the sound go from a relatively quiet and reverberant perspective to a louder, more direct one? This presents a problem for the fidelity model of sound recording, since slavishly following the visual representation merely serves to draw attention to each visual edit, which works against the need to develop a sense of engagement or immersion in the story. As a result, Maxfield’s advocacy was largely abandoned by an industry which valued pragmatism and storytelling over supposed scientific correctness. The industry adopted the view that the soundtrack could be employed to hide the artifice of the cinematic apparatus and its processes, even if it meant a departure from scientific or common-sense realism.
Later film sound theory

Despite the abandonment of strict adherence to scientific recording and reproduction, cinema routinely required a ‘lifelike’ experience from the soundtrack. Pictures without sound could never attain the level of immersion of a sound film. As Mary Ann Doane pointed out, the audience engages and is immersed if the visual and aural world in which they are asked to believe is realised well enough (Doane 1980a). Hand in hand with a believable or seemingly natural cinematic representation goes the need to hide the cinematic apparatus itself: “‘the less perceivable a technique, the more successful it is’, the invisibility of the work on sound is a measure of the strength of the soundtrack” (Doane 1980b, 48). Michel Chion has written extensively on sound and the cinema, his influential books including Audio-Vision: Sound on Screen (1994) and The Voice in Cinema (1999). Like Mary Ann Doane, Chion sees cinema as inherently ‘vococentric’, with the voice being the constant throughout production whilst other sounds are the accompaniment. Chion coined a number of useful terms relating to the phenomena of sound in film. In what he calls the ‘audiovisual contract’, sound and image work together to create the perception of a unified whole. Chion describes the idea of ‘added value’, in which sound enriches an image without appearing to do so, creating the impression that the expression or meaning comes naturally from the image by itself: “that sound merely duplicates a meaning which in reality it brings about” (1994, 5). For Chion sound is often used to guide the eye, either by synchronising particular sounds to visual movements or by creating a sleight-of-hand effect, providing a sound for an event that did not actually happen (such as a punch or a quickly opening door). Chion highlights the power of the acousmêtre in cinema, the unseen voice, created by deliberately withholding the visible source of the voice (1999, 17–29). He also created the term synchresis, a melding of synchronism and synthesis, to describe the welding of an
auditory phenomenon and a visual phenomenon occurring at the same time into a single cinematic or mental object (Chion 1994, 63). In addition, Chion discusses the question of off-screen space and point of audition, highlighting the difficulty in isolating a single point of audition, in contrast to the visual point of view (1994, 89–92). Indeed, it is the image that largely determines the supposed point of audition, typically either through a close-up on the character whose point of audition we share, or through a subjective image of the character’s viewpoint. Chion also describes different listening modes, originally identified by Pierre Schaeffer (1967), which relate to the purpose of listening: ‘causal’ listening, which seeks information about a sound’s source; ‘semantic’ listening, which seeks to decode what the sounds mean; and ‘reduced’ listening, which focuses on the characteristics of the sound itself, rather than its meaning or cause (Chion 1994, 25–31). Chion has also examined the lack of critical attention to synchronous sounds, such as dialogue in film. He argues convincingly that synchronous sounds tend to be swallowed up by the visual element: the meaning of the combined sound-image becomes attributed solely to the image, and the sound itself comes to be thought of as redundant (Chion 1999, 4). This partly explains why there has been more critical attention on musical and voiceover elements of the soundtrack, since they are not tied to visual elements.6 Chion also recognised that a principal problem for sound analysis is the lack of a common language, and its dependence on blurred terminology and concepts. He chastises theorists who complain that “the vocabulary is too poor to describe the sound, but never think to use more specific words that exist in their language” (Chion 2012, 9 [my translation]).7 David Sonnenschein provides a unique and comprehensive guide to sound design, bringing together sound and narrative theory with psychoacoustics, music, voice and image (2001). Departing from more conventional guides, he splits the development of the soundtrack into various areas of focus: chronological, acoustic, perceptual, phonetic, emotional, descriptive, and so on. Preferring the term voice to dialogue, Sonnenschein (2001, 133) also highlights the difficulty we have in listening to the characteristic sounds of a voice speaking our native language, as the “involuntary urge for understanding” takes over, mirroring Chion’s delineation of listening modes. Similarly, he draws attention to non-language speech elements, such as prosody, the tonal inflections and emphases, which add meaning to the words. Sonic hierarchies and sounds in general are related to Gestalt principles, such as figure and ground, closure, common fate and belongingness, which can then be used or manipulated (Sonnenschein 2001, 79–83). In addition, Sonnenschein (2001, xxi–xxii) takes account of the audience and the meanings that are derived from the physical, emotional, intellectual and moral elements of the soundtrack. There has been something of an explosion in sound theory in recent years. From closer studies of the sound histories of particular media industries to anthropological, political, personal, philosophical, aesthetic and cultural examinations of sound, the scope is still expanding, and the renewed attention on sound is both exhilarating and slightly overwhelming. Sound Studies as a discipline has seen exponential
growth recently, with authors finding an environment in which to discuss their ideas rather than working at the fringes of other disciplines such as musicology, philosophy or cultural studies. Increasingly, modern researchers are returning to the transition from the silent era to sound. Kathryn Kalinak’s Sound: Dialogue, Music, and Effects examines the history of film sound from the silent to the digital era. Kalinak also makes the point that ‘silent film’ is a label applied retrospectively to an era that was never silent, but which is now defined by an absence of synchronised sound: “created during the industry-wide conversion to synchronised sound and projected back onto an era that generally used the term “moving pictures” to describe the phenomenon” (Kalinak 2015, 2). Hollywood Soundscapes (Hanson 2017) examines the era of the transition to sound from the perspective of the sound technicians, whose role fundamentally changed: from being relied on for technical expertise, their role expanded to include more artistic requirements in support of the ‘story values of the film’. The early recognition of sound as a close collaborator of cinematography and screenwriting is traced through writings, magazines, journals and technical manuals of the day. There has also been a deepening interest in the technologies of sound and an accompanying focus on sound aesthetics. Beyond Dolby: Cinema in the Digital Sound Age (Kerins 2011) examines both in light of the widespread adoption of digital cinema sound technologies. Though multichannel sound had been around since Fantasia, it was not widely adopted in cinemas until Dolby’s introduction of its digital 5.1 format, and later the rival formats of Digital Theatre Systems (DTS) and Sony’s SDDS system. Contemporary cinemas, digital soundtracks and modern cinema speaker systems are capable of louder and less distorted reproduction than their predecessors, but they also enable very quiet sounds to be heard. This increased dynamic range of digital reproduction meant that films could now have extremely quiet as well as extremely loud sections. It marks the cinematic sound equivalent of the move from the era of the harpsichord to that of the pianoforte.8 Dolby Digital 5.1 also became the de facto surround standard for consumers as it was part of the specification for the DVD Video standard. More recently, Dolby has sought to make its Atmos system the twenty-first-century standard for multichannel cinema. As well as being assured that the majority of audiences could now hear their films as they were intended, filmmakers were also encouraged to take advantage of the ability to position sounds anywhere in space.9 Jay Beck’s Designing Sound: Audiovisual Aesthetics in 1970s American Cinema (2016) also highlights the influence of Dolby in cinema sound, which largely coincided with the increasing popularity of pop-music-scored films like The Graduate (Nichols 1967) and Easy Rider (Hopper 1969) and the rise to prominence of sound-aware directors like Francis Ford Coppola and George Lucas. The debuts of Dolby’s technologies were frequently on films whose musical soundtrack was an important selling point: Dolby-A noise reduction was first used in the production of Stanley Kubrick’s A Clockwork Orange (1971), which prominently featured classical music as well as the synthesised compositions of Walter Carlos, while the
quadraphonic Dolby Stereo system was first used on the musical drama A Star is Born (Pierson 1976). Star Wars (Lucas 1977) was released shortly afterward.10
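A back-of-envelope calculation makes the dynamic range point above concrete (this worked example is mine, not the book’s). Linear PCM audio gains roughly 6 dB of usable dynamic range per bit, so the 16-bit soundtracks that arrived with the digital formats span close to 96 dB between the quietest and loudest reproducible levels:

import math

def pcm_dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM at a given bit depth."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits}-bit PCM: {pcm_dynamic_range_db(bits):.1f} dB")
# prints: 8-bit PCM: 48.2 dB / 16-bit PCM: 96.3 dB / 24-bit PCM: 144.5 dB

Analogue optical soundtracks offered considerably less usable range, which is why the harpsichord-to-pianoforte analogy is apt: the new medium did not simply get louder, it gained expressive headroom at both extremes.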
Practitioner’s sound theory

Although largely unrepresented in academic literature, industry practitioners have not been silent in debates about the role of sound in film, television, games and other media, and their observations have added a dimension to the critiques offered by cinema-focussed theorists. Their views can be found in their own books or in the industry press, and increasingly through specialist blogs and other websites. Practitioners are in the privileged position of writing from the perspective of the creator, whereas the analyst and theorist can only examine the finished results. Understanding the processes and technologies, rationales and methodologies that go into the production of the soundtrack adds a new dimension to sound criticism. By explicitly describing their working methods, practitioners enable us to better understand the contribution of sound to the creation of a story. As well as discussing sound as a discrete and isolated element, it can also be discussed in terms of the hierarchy of filmmaking, the practicalities of the industry, and the decision-making processes which affect how sound is used and could be used. There are a number of texts either written by industry practitioners or which contain in-depth interviews with practitioners. Film Sound: Theory and Practice (Weis and Belton 1985) and Sound and Music in Film and Visual Media (Harper, Doughty and Eisentraut 2009) comprise collections of interviews and profiles. One recent collection is The Palgrave Handbook of Sound Design and Music in Screen Media: Integrated Soundtracks (Greene and Kulezic-Wilson 2016). Its title gives the clue that it covers areas beyond cinema, with a welcome focus on a number of television series which have increased in importance and acclaim this century. There are also a number of books written by experienced practitioners which have not received widespread acclaim or influence. As with any text, their impact is dependent on the prominence of the author within the industry. For example, The Art of the Sound Effects Editor (Kerner 1989), Sound Effects: Radio, TV, and Film (Mott 1990), and The Foley Grail: The Art of Performing Sound for Film, Games, and Animation (Ament 2009) cannot point to such high-profile or high-status films as some of their contemporaries. For instance, practitioners from television have historically been accorded less status than those from film. Hence, both Kerner (The Man From U.N.C.L.E.) and Mott (Days of our Lives), being associated with television drama, suffer in comparison to their feature film cousins. In addition, Ament, who works in an area (foley) that is far from glamorous moviemaking, may suffer in comparison to her colleagues writing on other aspects of soundtrack production. Randy Thom, Sound Designer and Supervising Sound Editor, and Director of Sound Design at Skywalker Sound, takes up many of the same themes that Robert Bresson discussed in his influential book Notes on the Cinematograph (Bresson 1975).11 In the paper “Designing a Movie for Sound”, Thom suggests that starving the eye of information forces the brain to use the ear instead
(2000, 8).12 Thom puts forward some refreshing views for those working within and alongside film sound. He stresses the importance of visual ambiguity, such as darkness, camera movement, slow motion, black-and-white photography, and subjective camera angles, to provide room for expression (Thom 2000, 8). Conscious that we are rarely as aware of sound as we are of images, Thom (2000, 10) argues that “movies are about making connections between things that couldn’t possibly be connected in a single real life moment or, at least, in a way that you could be aware of in any sense. Sound is one of the best ways to make those connections.” He also makes a good case for the early involvement of sound producers in film production and is distrustful of the ‘tyranny of competence’, the idea that “as a professional you should know what you are going to do – which is anathema to the creative process” (Thom 2000, 16). Thom also advocates a restrained approach, one which is always end-directed in favour of the most efficient means of moving the narrative forward:

I think the listener/viewer does need something more than the chassis of a story. They need at least a few details, a few very specific clues that are pliable enough in terms of their meaning to act like little spring boards, propelling each audience member into a journey informed by his/her own imagination. But I do also think it’s crucial to avoid supplying an overload of details, which will often cancel each other out. It’s usually not a good idea to “sonify” every single thing in a scene that could conceivably make a sound. It’s better to choose, and through that choice create a little story vector.
(Thom in Farley 2012)

Tomlinson Holman (2010) is one of the few widely known figures in film sound, and in Sound for Film and Television covers the technical aspects as well as a brief theory of film sound. With a career spanning audio engineering and film theory, Holman covers much of the same territory as Yewdall, but also provides insights regarding the theoretical rationale for particular practices. Holman describes sound as having a narrative role, comprising direct narrative and subliminal narrative functions, and a grammatical role, in which sound is used as a form of “connective tissue” in the filmmaking process, producing meaning (Holman 2010, xi–xii). Dialogue is an example of a sound with a direct storytelling role, while music is typically an example of subliminal narrative. Particular sounds, such as background ambient sounds which straddle picture edits to indicate continuity, perform a grammatical role. Mark Mangini (Mad Max: Fury Road, Blade Runner 2049) describes three principal routes that a sound designer may take in creating sounds: the original source itself, a stand-in for the original source, or a purely symbolic representation (Mangini 1985, 364). The first is often the domain of sound recordists in getting clean recordings of actual objects or environments, such as authentic weapons or places. The second is a ‘cognitive route’ where a metaphorical connection exists between the stand-in and the original. Mangini gives the classic example of
cornstarch being used as a stand-in for the sound of footsteps in snow. Cornstarch itself, being a white powdery material, may have initially suggested the substitution, much as half coconut shells came to be used in place of horses’ hooves. The third path is the most abstract, the sound having no actual counterpart in the original physical object; examples include scrunched cellophane being used for the crackling of fire. Walter Murch is perhaps the most respected film sound writer/practitioner. In interviews and his own articles, he explains and develops his theory of film, and the place of sound within it. As a highly respected and successful practitioner, Murch is able to quash some sound myths. He points out that anyone who is aware of the processes of film sound soon realises that “there is no necessary connection between ends and means” (Murch 2005b). He overturns the idea that filmic sound is somehow no different from the actual sound being represented. Nor is film necessarily a realist medium, or at least it should not aim to be so. Arguing against the idea of completeness, Murch suggests that the “best sound is the sound inside somebody’s head” (cited in Kenny 1998). In order to engage the audience, there should be some room for sense-making, to “provoke the audience to complete a circle of which you’ve only drawn a part” (Jarrett and Murch 2000, 8). Several ideas flow from this deliberate artistic restraint. Rather than literal representation, sound can act as a metaphor from which meaning is made:

This metaphoric use of sound is one of the most flexible and productive means of opening up a conceptual gap into which the fertile imagination of the audience will reflexively rush, eager (even if unconsciously so) to complete circles that are only suggested, to answer questions that are only half-posed.
(Murch 2005b)

The metaphorical use of sound extends into both the blending of sounds and the actual search for representative sound elements which isolate the essential quality being sought: “I always try to be metaphoric as much as I can and not to be literal. When you’re presented with something that doesn’t quite resolve on a normal level, that’s what makes the audience go deeper” (Jarrett and Murch 2000, 8). Murch comes closest to a practical theory of sound when describing sound as lying on a continuum between coded and embodied: “The clearest example of Encoded sound is speech. The clearest example of Embodied sound is music” (Murch 2005a). As both a sound editor and picture editor, Murch is in a privileged position to judge the relative importance and synergies between sight and sound:

I see sound and picture as natural allies, the flip side of the same coin. In football terms: I’m able to pass the ball to myself. When I’m editing a picture I can think of something in sound that will help me with the picture: I can leave an idea ambiguous in the image, knowing that I can complete or amplify it later with the right sound.
(Cowie and Murch 2003)
It is evident from this brief review that sound practitioners on the whole do not tend to have as high a profile as sound theorists. Instead, their views are more typically found in the industry press. The practitioners of film sound are generally not as well known, or as commercially exploitable, as actors and directors, and neither are they held in the esteem enjoyed by cinematographers, composers or picture editors. Relatively few sound practitioners have the status to be interviewed on their own merit in magazines that are geared toward cinema. Instead, magazines such as Mix: Professional Audio and Music Production and Audio Technology often only interview prominent sound mixers and editors when the film they are discussing is a blockbuster or Oscar hopeful (Jackson 2010a, 2010b). The quoted material typically involves a description of the approach as it relates to the story itself and how that approach integrated with the director’s wishes. Similarly, interviews with those working on high-status television series typically cover technical topics, such as the peculiarities of large-scale period production in Boardwalk Empire (Jackson 2011a) or the use of particular recording technologies in Game of Thrones (Jackson 2011b). As a result, the content is more journalistic, whereby the ‘angle’ and point of interest of the article is the film itself rather than the practitioner and their work. Rob Bridgett is a game sound designer and the author of “Designing a Game for Sound” (2009), written as an extension of Randy Thom’s original article “Designing a Movie for Sound” (2000). Sharing many commonalities with film sound, Bridgett likewise advocates the early integration of sound, both technically and aesthetically, into the planning stages of games. Like Thom, Bridgett sees collaboration as the path to better integration of audio in games:

Design is at the heart of video games ‘direction’ and working directly as part of that team is the best place to start. Art direction and technical direction all play heavily into the decision making process. Finding an audio direction that can not only support these other disciplines but can lead and inspire them is the goal for a game sound designer in pre-production.
(Bridgett 2009)

Murch, Thom and Mangini have been influential because of their standing within the industry, which is a result of both commercial and critical success. Each gives a thoughtful and rational explanation of their personal philosophy, illustrated with examples of their own work. In particular, Murch has gained enormous respect through his filmmaking, both in picture-editing and sound, which gives weight to his ideas on cinema and how sound can be used most effectively in it. Thom emphasises the creative aspects of the soundtrack and encourages other filmmakers, in particular writers, to experiment with the narrative possibilities of sound. Whilst Holman articulates a meta-analysis of the functional aspects of cinema sound, Mangini and Sonnenschein advocate methodologies and strategies for sound along the lines of Edward de Bono’s ‘thinking hats’, which can be used for creative ends. Bridgett, though working in a different medium, echoes the
thoughts of Thom in particular in advocating a co-operative approach with other disciplines in an area that does not typically have a single authorial voice. Each of the practitioners outlined offers either an explanation of the range of sounds that inhabit the soundtrack or guidelines for their creative construction, illustrated with examples of their own work which highlight both the applicability and usefulness of their approach. Whilst the models are sparse, their simplicity and flexibility are key to their success. Both Murch and Holman delineate between attributes of sounds that perform different narrative functions. Both Thom and Sonnenschein articulate a rationale for a particular approach, or a range of creative possibilities, that allows sound to be further embedded in the fabric of the film or to be a more meaningful partner in the filmmaking process.
Forums and blogs

The practitioner’s perspective is immensely useful in analysing the ways that sound in film is used to create meaning. Whilst there are relatively few who manage to get their thoughts and personal philosophies into print, there are increasing opportunities for practitioners to publish their thoughts and ideas online, and to engage in discussion that is not geared toward technical or commercial interests. Since the turn of the century a number of online forums and blogs have quickly become places where practitioners can discuss their work and ideas in a collaborative setting. FilmSound.org became a repository for a huge number of articles and interviews on sound theory and practice, not strictly limited to film sound, but including television, games and other interactive media. It carried an excellent collection of articles, both original pieces and those first published elsewhere, on film sound history and theory, including classic articles from Balázs, Clair, Pudovkin and Kracauer, and more contemporary works from Gustavo Constantini, Gianluca Sergi and Jeffrey Ruoff. Increasingly, online forums and blogs provide an opportunity for practitioners, educators and theorists to discuss theory and practice. Long-running Yahoo groups, such as Sound Article List, provided practitioners with a space to discuss their own and their peers’ work with academics and others with an interest in film sound theory and practice.13 New blogs that focus on sound, such as Designing Sound: Art and Technique of Sound Design (Isaza 2012), have successfully extended this movement, combining practitioner articles with technical how-to guides and videos, augmented with regular sound design challenges for both novices and more experienced practitioners to compare styles and treatments. Frequently, there is a focus on contemporary guest practitioners who discuss their work in detail in articles and interviews, and take part in online question and answer sessions that more deeply explore the rationale for particular approaches or treatments. As well as more established names, other interesting and influential industry figures, such as Ann Kroeber, Paul Davies and Rob Bridgett, have been the subject of month-long specials, which involve extended interviews, question and answer sessions with blog participants, and articles written by the particular practitioner.14
Summary

In a soundtrack, the seemingly obvious and taken-for-granted sounds are routinely designed to appear natural. Even where there is explicit meaning, such as the language being spoken in dialogue, there are other characteristics of the voice and its treatment that contain supplementary information, which can be consciously acknowledged, intuited, partially perceived or ignored depending on the audience and other social, cultural and historical factors. Sound effects and music are often rich in meaning but are less easy to analyse since they are frequently overlooked, and indeed are often designed to be overlooked. At other times, the designed-ness of the element is hidden and it is meant to be taken as an unmediated representation, as though it somehow passed from the events being filmed directly to the audience. The mimetic sleight of hand employed tends to attribute the manipulation or transformation of meaning elsewhere; for example, to the acting, directing, writing or cinematography. Early cinema sound’s twin aims could be summarised as persuasive illusion and intelligibility. Later cinema added other duties for the soundtrack to perform. Mary Ann Doane underlined the role that sound has been given in creating atmosphere and mood:

If the ideology of the visible demands that the spectator understand the image as a truthful representation of reality, the ideology of the audible demands that there exist simultaneously a different truth and another order of reality for the subject to grasp. The frequency with which the words ‘mood’ or ‘atmosphere’ appear in the discourse of sound technicians testifies to the significance of this other truth. Most apparent is the use of music tracks and sound effects tracks to establish a particular ‘mood’.
(Doane 1980b, 49)

This task, whether knowingly or subconsciously, is routinely taken on by practitioners in every area of sound practice. The word ‘atmosphere’ is used to describe a category or type of sound which, though it has no actual origin on screen, is nevertheless fundamentally important in creating a sense of something which is not otherwise contained within the image or its synchronous sound. It is a mood or feeling which is specifically created through design. In other words, the actual sounds themselves are less important than the feeling which is created. The sounds are merely the vehicles for the desired outcome. Whether we are looking at the macro level of genres and styles, or the minutiae of individual sound elements in a sequence, a sign-based analysis can help shed light on the ways particular sound representations work, and how explicit and implicit, overt and covert meanings are transmitted through the elements of the soundtrack which are received by the audience.
Notes

1 Even when there is a slight discrepancy between the event and the sound produced, such as when an echo reflects from a distant object, or the reverberation is heard inside a large church, once it has passed it is gone, never to return.
2 There is evidence that Marconi was not the sole inventor of the wireless, with separate claims from both Nikola Tesla and Jagadish Chandra Bose among others (Salazar-Palma, Sarkar, and Sengupta 2009). Similarly, there is some dispute over the origins of the invention of the telephone, with both Antonio Meucci and Elisha Gray having made discoveries prior to Bell’s successful patent (Finn 2009). Regarding the phonograph, other inventors had produced devices to record sounds, but Edison’s device was also able to reproduce the sound.

3 It also means that we can convert other forms of data into sonic data. This field is known as sonification or audification (see Dombois and Eckel 2011).

4 Indeed, the gloomy outlook for sound is highlighted in the title of the essay “A New Laocoön: Artistic Composites and Talking Film”, a reference to G. E. Lessing’s 1766 essay “Laocoön”, which highlighted the differences between painting and poetry (Dalle Vacche 2003, 166). Laocoön refers to the Trojan priest who tried to warn his countrymen about the dangers of the Greek horse. For Arnheim, sound is the gift that cinema is foolish to accept.

5 Vitaphone is the system of sound synchronised to pictures developed by Warner Bros. Although used to provide a synchronised score for Don Juan (1926), it was made famous by Warner Bros.’ first partially talking picture, The Jazz Singer (1927).

6 Whilst the terms are often used interchangeably, there is a distinction in practice between voiceover and narration, particularly in documentary production, where “voiceover is a disembodied voice derived from a character in the interview material. Narration is usually studio-recorded and not directly linked to a field recording” (Purcell 2007, 347).

7 La seconde question est que le son est un domaine où règne encore un grand flou terminologique et conceptuel, que tout le monde peut constater, et que ma recherche depuis longtemps vise à réduire. Cependant, ce flou, ne s’y complait-on pas un peu trop, chez les chercheurs et les intellectuels? En effet, ils vont se plaignant que le vocabulaire soit trop pauvre pour désigner les sons, mais ne songent jamais à se servir de mots plus précis qui existent bel et bien dans leur langue – qui certes ne disent pas tout, mais sont tout de même plus satisfaisants, moins passe-partout que ceux de “bruits” ou de “sons”. [In English: The second issue is that sound remains a domain of great terminological and conceptual vagueness, as anyone can observe, and one which my research has long sought to reduce. Yet do researchers and intellectuals not indulge in this vagueness a little too readily? Indeed, they complain that the vocabulary is too poor to designate sounds, but never think to use the more precise words that do exist in their language – words which admittedly do not say everything, but which are nevertheless more satisfactory, less catch-all, than ‘noises’ or ‘sounds’.]

8 The harpsichord produced notes of equal volume no matter how hard or softly the key was pressed, whereas the clavichord, and later the pianoforte (literally ‘soft-loud’), used mechanisms which allowed for dynamic volume control depending on how firmly the key was struck. Though the instruments appear similar, the harpsichord, being of uniform volume, relied on the busy baroque style to create interest, whereas the nuances possible on the piano allowed new forms of musical expression to flourish.

9 Typically, this approach to mixing is used sparingly. Though it is possible to position sound for the audience from any direction (or at least any direction where there is a speaker, or along a line between adjacent speakers), there is a reluctance to do so if there is a chance that this will break the sense of story immersion happening on the screen. In practice, only non-narrative sound effects come from speakers which are not behind the screen, unless it is for a deliberate effect, such as during the flyover effects for the pod race sequence in The Phantom Menace.
Though Walter Murch used surround sound for the first time in Apocalypse Now for its ability to create an enveloping soundscape for the audience, it was typically introduced imperceptibly over the duration of a long scene so as not to create a disjunction between picture and sound.

10 Dolby also relocated its headquarters in that year to San Francisco, the city which was the base for Francis Ford Coppola and George Lucas’s Zoetrope company.

11 It is worth noting that the original French title Notes sur le Cinématographe does not refer to the English-language term ‘cinematography’, which is concerned only with images. Though sometimes translated as ‘Notes on the cinematographer’, Cinématographe refers instead to the filmmaking enterprise as a whole.

12 Later amended and republished in Soundscape: The School of Sound Lectures, 1998–2001 (Sider, Freeman and Sider 2003), and in an updated form as “Screenwriting for Sound” (Thom 2011).

13 See http://groups.yahoo.com/group/sound-article-list/
14 Paul Davies is best known for his collaborations with Lynne Ramsay (Ratcatcher, Morvern Callar, and We Need To Talk About Kevin). Ann Kroeber was the partner and collaborator of Alan Splet, and worked on several films with Splet and/or David Lynch (The Elephant Man, Blue Velvet).
References

Altman, Rick. 1980a. “Cinema/Sound.” Yale French Studies 60.
Altman, Rick. 1980b. “Four and a half film fallacies.” In Sound Theory, Sound Practice, edited by Rick Altman, 35–45. New York: Routledge.
Altman, Rick. 1980c. “Sound Space.” In Sound Theory, Sound Practice, edited by Rick Altman, 46–64. New York: Routledge.
Altman, Rick. 1992. Sound Theory, Sound Practice, AFI Film Readers. New York: Routledge.
Altman, Rick. 2004. Silent Film Sound, Film and Culture. New York: Columbia University Press.
Ament, Vanessa T. 2009. The Foley Grail: The Art of Performing Sound for Film, Games, and Animation. London: Focal Press.
Arnheim, Rudolf. 1985. “A New Laocoon: Artistic composites and talking film.” In Film Sound: Theory and Practice, edited by Elisabeth Weis and John Belton, 112–115. New York: Columbia University Press.
Balázs, Béla. 1952. Theory of the Film: Character and Growth of a New Art. Translated by Edith Bone. London: Dennis Dobson.
Beck, Jay. 2016. Designing Sound: Audiovisual Aesthetics in 1970s American Cinema. London; New Brunswick, NJ: Rutgers University Press.
Bresson, Robert. 1975. Notes on Cinematography. Translated by Jonathan Griffin. New York: Urizen Books.
Bridgett, Rob. 2009. “Designing a Game for Sound.” DesigningSound.org. Available online at http://designingsound.org/2009/11/24/rob-bridgett-special-designing-a-game-for-sound/
Cavalcanti, Alberto. 1985. “Sound in Films.” In Film Sound: Theory and Practice, edited by Elisabeth Weis and John Belton, 98–111. New York: Columbia University Press.
Chion, Michel. 1994. Audio-Vision: Sound on Screen. Translated by Claudia Gorbman. New York: Columbia University Press.
Chion, Michel. 1999. The Voice in Cinema. Translated by Claudia Gorbman. New York: Columbia University Press.
Clair, René. 1985. “The Art of Sound.” In Film Sound: Theory and Practice, edited by Elisabeth Weis and John Belton, 92–95. New York: Columbia University Press.
Cowie, Peter, and Walter Murch. 2003. “The Synergy of Sight and Sound.” Berlinale Talent Campus #3, Berlin, February 11.
Dalle Vacche, Angela. 2003. The Visual Turn: Classical Film Theory and Art History. New Brunswick, NJ; London: Rutgers University Press.
Doane, Mary Ann. 1980a. “The Voice in the Cinema: The Articulation of Body and Space.” Yale French Studies (60): 33–50. doi:10.2307/2930003
Doane, Mary Ann. 1980b. “Ideology and the Practice of Sound Editing and Mixing.” In The Cinematic Apparatus, edited by Stephen Heath and Teresa De Lauretis, 47–56. New York; London: St. Martin’s Press; Macmillan.
Dombois, Florian, and Gerhard Eckel. 2011. “Audification.” In The Sonification Handbook, edited by T. Hermann, A. Hunt and J. G. Neuhoff. Berlin: Logos Publishing House.
Eisenstein, Sergei, Vsevolod Pudovkin, and Grigori Alexandrov. 1985. “A Statement.” In Film Sound: Theory and Practice, edited by Elisabeth Weis and John Belton, 83–85. New York: Columbia University Press.
Eyman, Scott. 1999. The Speed of Sound: Hollywood and the Talkie Revolution, 1926–1930. Baltimore: Johns Hopkins University Press.
Farley, Shaun. 2012. “Ideas in Sound Design: Deprivation and Barriers.” DesigningSound.org. Available online at http://designingsound.org/2012/02/22/ideas-in-sound-design-deprivation-and-barriers-part-1/
Fielding, Raymond. 1980. “The Technological Antecedents of the Coming of Sound: An Introduction.” In Sound and the Cinema, edited by Evan William Cameron. New York: Redgrove Publ. Co.
Finn, Bernard S. 2009. “Bell and Gray: Just a Coincidence?” Technology and Culture 50(1): 193–201.
Greene, Liz, and Danijela Kulezic-Wilson, eds. 2016. The Palgrave Handbook of Sound Design and Music in Screen Media: Integrated Soundtracks. London: Palgrave Macmillan.
Hanson, Helen. 2017. Hollywood Soundscapes: Film Sound Style, Craft and Production in the Classical Era. London: British Film Institute.
Harper, Graeme, Ruth Doughty, and Jochen Eisentraut. 2009. Sound and Music in Film and Visual Media: An Overview. New York: Continuum.
Holman, Tomlinson. 2010. Sound for Film and Television, 3rd edition. Amsterdam; Boston: Elsevier/Focal Press.
Isaza, Miguel. 2012. “Designing Sound: The Art and Technique of Sound Design.” Available online at http://designingsound.org/
Jackson, Blair. 2010a. “Avatar: James Cameron and audio team create a new world of futuristic sounds.” Mix Online. Available online at http://mixonline.com/post/features/avatar-0110/
Jackson, Blair. 2010b. “Christopher Nolan’s ‘Inception’.” Mix Online. Available online at http://mixonline.com/post/features/christopher_nolan_inception/
Jackson, Blair. 2011a. “Boardwalk Empire: Bringing the 1920s to life.” Mix Online. Available online at http://mixonline.com/post/features/boardwalk_empire/
Jackson, Blair. 2011b. “Game of Thrones: Production Sound for HBO fantasy series.” Mix Online. Available online at http://mixonline.com/post/features/game_of_thrones/
Jarrett, Michael, and Walter Murch. 2000. “Sound doctrine: An interview with Walter Murch.” Film Quarterly 53(3): 2–11.
Kalinak, Kathryn Marie, ed. 2015. Sound: Dialogue, Music, and Effects, Behind the Silver Screen. New Brunswick, NJ: Rutgers University Press.
Kenny, Tom. 1998. “Walter Murch – The Search for Order in Sound & Picture.” Mix, April.
Kerins, Mark. 2011. Beyond Dolby (Stereo): Cinema in the Digital Sound Age. Bloomington, IN; Chesham: Indiana University Press; Combined Academic.
Kerner, Marvin M. 1989. The Art of the Sound Effects Editor. London: Focal Press.
Kracauer, Siegfried. 1985. “Dialogue and Sound.” In Film Sound: Theory and Practice, edited by Elisabeth Weis and John Belton, 126–142. New York: Columbia University Press.
Lastra, James. 2000. Sound Technology and the American Cinema: Perception, Representation, Modernity, Film and Culture. New York: Columbia University Press.
Mangini, Mark. 1985. “The Sound Designer.” In Film Sound: Theory and Practice, edited by Elisabeth Weis and John Belton. New York: Columbia University Press.
Maxfield, J. P. 1930. “Acoustic Control of Recording for Talking Motion Pictures.” Journal of the Society of Motion Picture Engineers 14(1): 85–95.
Maxfield, J. P., and C. Flannagan. 1936. “Wide-Range Reproduction in Theaters.” Journal of the Society of Motion Picture Engineers 26(1): 67–78. doi:10.5594/J01286
Maxfield, J. P., and H. C. Harrison. 1926. “Methods of high quality recording and reproducing of music and speech based on telephone research.” The Bell System Technical Journal 5(3): 493–523. doi:10.1002/j.1538-7305.1926.tb00118.x
Mott, Robert L. 1990. Sound Effects: Radio, TV, and Film. Jefferson, NC: McFarland & Company, Inc.
Murch, Walter. 2005a. “Dense Clarity – Clear Density.” The Transom Review, 2 February.
Murch, Walter. 2005b. “Womb Tone.” The Transom Review, 16 May.
Pudovkin, V. I. 1985. “Asynchronism as a Principle of Sound Film.” In Film Sound: Theory and Practice, edited by Elisabeth Weis and John Belton, 86–91. New York: Columbia University Press.
Purcell, John. 2007. Dialogue Editing for Motion Pictures: A Guide to the Invisible Art. Amsterdam; Boston: Focal Press.
Salazar-Palma, M., T. K. Sarkar, and D. Sengupta. 2009. “A brief chronology of the origin and developments of wireless communication and supporting electronics.” Applied Electromagnetics Conference (AEMC), 14–16 December.
Schaeffer, Pierre. 1967. Traité des Objets Musicaux. Paris: Seuil.
Sider, Larry, Diane Freeman, and Jerry Sider. 2003. Soundscape: The School of Sound Lectures, 1998–2001. London: Wallflower Press.
Sonnenschein, David. 2001. Sound Design: The Expressive Power of Music, Voice, and Sound Effects in Cinema. Studio City, CA: Michael Wiese Productions.
Thom, Randy. 2000. “Designing a Movie for Sound.” In Cinesonic: Cinema and the Sound of Music, edited by Philip Brophy, 1–28. North Ryde, NSW: Allen & Unwin.
Weis, Elisabeth. 1978. “The sound of one wing flapping.” Film Comment 14(5): 42.
Weis, Elisabeth, and John Belton, eds. 1985. Film Sound: Theory and Practice. New York: Columbia University Press.
4 SOUND AS A SIGN
If we are listening to a familiar voice on a telephone, we are doing several things simultaneously. We recognise the voice as familiar even though it is markedly different from the real voice, having been filtered down to a bandwidth of only three to four octaves of the natural voice. We also immediately understand the words and language being spoken. Where there are competing sounds, we are still able to follow the speech. Where we miss a word, we are able to guess what we missed from the previous or following words, or from other context. We are also able to gauge other information from the sound of the voice and how it is being used as a vehicle for the words, rather than only from the words themselves. We might be able to tell whether the speaker is upset, or happy, or with someone else, or keeping something secret. Many of these processes are so naturalised that we do not give them a moment’s thought, which is not to say that they are not happening. How then do we explain how these abilities come about? Think for a moment about the processes happening when we first learn what someone sounds like or looks like. Or consider what happens when we hear a familiar sound, or an unfamiliar one: do we compare it to a mental inventory of sounds we have heard before? What happens when we hear a sound at the same time as we see something or someone moving? In the previous section we looked briefly at perception. The dominant view of perception is that we do not directly perceive reality through our senses, but instead create models of reality. As is the case with an artificial neural network, our dataset at birth is largely blank and we must gradually make mental connections between the new information our senses are giving us and our existing and developing ideas.
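To make the telephone example above concrete (a minimal sketch of my own, not from the text; the function name and synthetic test signal are invented for illustration): traditional telephony passes roughly 300–3400 Hz, and since log2(3400/300) ≈ 3.5, this is the ‘three to four octave’ bandwidth referred to. The sketch below band-limits a signal to that range using SciPy:

import numpy as np
from scipy.signal import butter, sosfilt

def telephone_band(samples, sample_rate):
    """Band-pass a signal to the nominal telephone band of 300-3400 Hz."""
    sos = butter(4, [300.0, 3400.0], btype="bandpass", fs=sample_rate, output="sos")
    return sosfilt(sos, samples)

sr = 44100
t = np.arange(sr) / sr
# A stand-in 'voice': a 120 Hz fundamental plus harmonics up to ~3.6 kHz.
voice = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 31))
narrowband = telephone_band(voice, sr)

Notably, the 120 Hz fundamental falls below the passband and is removed entirely, yet listeners still perceive the original pitch from the surviving harmonics (the ‘missing fundamental’ effect), one reason a familiar voice remains recognisable down a telephone line.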
The importance of signs

The importance of signs was recognised by the ancient philosophers Plato and Aristotle, with Augustine and John Locke revisiting the topic centuries later. By the beginning of the twentieth century two principal models of semiotics – the study
of signs – had become established.1 In Switzerland, the linguist Ferdinand de Saussure developed his theory about the structure of language, and his lectures were published after his death as the Course in General Linguistics (Saussure et al. 1960). This proved to be immensely useful to European film scholars as a model which could be applied to the analysis of cinema. Around the same time, in the United States, Charles Sanders Peirce was also developing his ideas about signs. For anyone unsure of whether signs are actually useful as a means of analysing or describing our world, it is worth pausing for a moment to ask whether there is an alternative. Semiotician Daniel Chandler uses examples from literature to examine the question (Chandler 2011). Jonathan Swift’s Gulliver’s Travels presents such a situation, encountered by the eponymous traveller in the land of Lagado, where the professors of language sought to improve it by such things as cutting out verbs and participles, since all things imaginable are but nouns. But the grander project was aimed at the inadequacy of words themselves standing directly for physical things in the world around us. Gulliver reports that they suggested a novel idea:

A scheme for entirely abolishing all words whatsoever; and this was urged as a great advantage in point of health, as well as brevity … that since words are only names for things, it would be more convenient for all men to carry about them such things as were necessary to express a particular business they are to discourse on. But for short conversations, a man may carry implements in his pockets, and under his arms, enough to supply him; and in his house, he cannot be at a loss. Therefore the room where company meet who practise this art, is full of all things, ready at hand, requisite to furnish matter for this kind of artificial converse.
(Swift and Roscoe 1841, 50)

Rather than using words to describe mental concepts, what was instead being advocated was a completely literal, physical representation of objects with which to communicate. Swift’s satirical take on those critics who argued that we can only mean what we say in plain language, rather than through metaphor, was later echoed in Lewis Carroll’s Through the Looking Glass. Alice is bewildered by Humpty Dumpty’s use of words which she had previously assumed had a stable and agreed meaning.2

‘When I use a word,’ Humpty Dumpty said in rather a scornful tone, ‘it means just what I choose it to mean – neither more nor less.’
‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’
‘The question is,’ said Humpty Dumpty, ‘which is to be master – that’s all.’
(Carroll 1911, 211)

Whilst Carroll and Swift were writing fiction, subsequent linguists and philosophers have taken their work very seriously. Many, including Saussure, argued that
the traits of language are much more conventional and malleable than previously thought. It is, of course, nonsensical to reduce oneself to communicating only in relation to the physical objects around us. Yet, as with many of Gulliver’s unusual encounters, there was a logic underlying the plans of the language professors of Lagado:

Another great advantage proposed by this invention was, that it would serve as a universal language, to be understood in all civilised nations, whose goods and utensils are generally of the same kind, or nearly resembling, so that their uses might easily be comprehended. And thus ambassadors would be qualified to treat with foreign princes, or ministers of state, to whose tongues they were utter strangers.
(Swift and Roscoe 1841, 50)

The lack of a universal means of communication is a problem which has preoccupied other writers. Douglas Adams’s solution in The Hitchhiker’s Guide to the Galaxy was a leech-like translator called the Babel fish, which could be put into a person’s ear, enabling the wearer to understand every language in the universe (Adams 1979, 49–50). Obviously, the ability to communicate using a ‘common tongue’ (to use an old phrase made new in Game of Thrones) would be most welcome, but is perhaps unlikely. Instead, we might aim to better understand the structure of signs in sign systems, since sound designers routinely create meaningful non-verbal sounds expecting the audience to fathom what is meant. Here the underlying organisation of the sign system, whether a complex structure such as a formal language or a set of non-spoken signs, will be examined to see whether (and, if so, how) it might be adapted to show us something useful about how we use sound and, by the same token, how sound might be understood.
Written and spoken language

If we look at written language, two different ways of writing the same word might make very little difference to the word itself, but the clothes that the word arrives in might give some contextual understanding.3 Hearth, hearth, HEARTH, hearth, hearth and hearth are all the same word, but the style in which it is written might (or equally, might not) be taken as some clue as to how we should interpret the word in its context. For most writing – online, newspapers, books – there is very little variation in the typeface once it has been chosen. Every now and then a new font arrives, such as Times New Roman or Calibri, which is then used as the vehicle for all types of writing.4 In written English we have 26 letters and a number of supplementary characters used for punctuation. In spoken English we also have some limitations provided by the building blocks of that language:
While the number of roots in a living, natural language is infinite and constantly increasing, and the number of prefixes and suffixes is large but limited, the number of phonemes (the phonemic inventory) within a language is very limited, anywhere between 12 and 60 for each language. This means that linguistic signs are built up from a limited, finite number of elements, the phonemic inventory, that serves to separate meaning but has itself no meaning. In other words, linguistic signs are made up of non-signs.
(Johansen and Larsen 2002, 45)

In spoken language people say the words, and people have unique voices and ways of expressing themselves through their voices. A huge amount of information can be gleaned from the voice, or is betrayed by the voice – the gender, approximate age, nationality or region, political or sexual orientation, education, class. Many of the things that we ascertain from a voice might be stereotypical, or based on past experience, and many might also be wrong. Whatever the case, we make use of the extra information that is given to us by the iconic and indexical representation of the voice, as well as the symbolic meaning of the words themselves. Few voices are completely neutral, and what serves for some as the very idea of neutral (a 1950s BBC announcer, a GPS navigation voice) others will immediately recognise as the privileging of one voice over others. If we compared a language to sound in all its manifestations, we would quickly come to the view that sound has infinitely more variety than can be communicated by language alone. But as we know, in a spoken language the voice adds a significant layer to the words themselves, as any actor knows full well. The standardisation of written language (alphabets, fonts, punctuation, etc.) stands in contrast to the limitless forms of variation of the spoken word. A simple distinction, such as a child’s versus an adult’s voice, or a male as opposed to a female voice, will render the sound of the speech, and thus its particular meaning and context, significantly different. Imagine reading a simple phrase such as the following:

You liked what I did.

Now imagine saying it several times, each time emphasising a different word. Depending on the emphasis and inflexion on the particular word, the meaning of the text changes, and in many cases becomes a question rather than a statement:

You liked what I did. [You, of all people]
You liked what I did. [I can’t believe anyone would like that]
You liked what I did. [… but maybe not when I did it]
You liked what I did. [You noticed me?]
You liked what I did. [… but not what I am doing now]

Any spoken language would then have at least two separate semiotic layers which have the potential to be meaningful: first, the actual coded language itself – the order and meaning of the words, phrases, sentences and so on. Second, the
contextual information contained in the voice speaking the words, which may give an enormous amount of supplementary information about the speaker (gender, age, mood) as well as about the actual content of the message. Similarly, imagine those same two layers (the words of the language and the voice which is their vehicle) where the language is unfamiliar to the listener. Here the words themselves could not be easily decoded, but there would still be meaning contained in the sound of the voice, which might allow the listener to ascertain something about what is being said, or about the mood of the person speaking.
Sound as a sign

The founding fathers of modern semiotics, Ferdinand de Saussure in Switzerland and Charles S. Peirce in the US, independently described the fundamental nature of signs, their work appearing almost simultaneously. In the Course in General Linguistics (Saussure et al. 1960), Saussure proposed a simple two-part (dyadic) structure of the sign, namely the relationship between that which is signified and its signifier. Importantly, he analysed the sign not merely as the ‘name’ of a ‘thing’ but as a concept along with its sound-image (1960, 66). Saussure also described a fundamentally important characteristic of linguistic signs: that they are arbitrary. The English word ‘dog’ and the French word ‘chien’ refer to the same animal, yet neither bears any natural resemblance to it; the link between word and concept is purely conventional. Saussure viewed language as the ideal example of the arbitrary sign system: “That is why language, the most complex and universal of all systems of expression, is also the most characteristic; in this sense linguistics can become the master-pattern for all branches of semiology although language is only one particular semiological system” (1960, 68). Saussure also pointed out a paradoxical element in the use of signs in language: they are at once fixed and forever changing, displaying both immutability and mutability (1960, 72–76). They are immutable in the sense that signs are fixed as far as the individual is concerned, yet mutable in the sense that language changes and evolves. Saussure separated language into langue (the system of language) and parole (the spoken language), thus placing emphasis on the speaker of the language rather than the listener.

Theo van Leeuwen’s adaptation of Halliday’s social semiotics is perhaps the pre-eminent application of Saussurean semiotics to sound. In Speech, Music, Sound, the stated aim is to “explore the common ground between speech, music and other sounds” (van Leeuwen 1999, 1), without using the language of mainstream musicology and linguistics. However, van Leeuwen accepts that doing so proves difficult:

A semiotics of sound should therefore not take the form of a code book, but the form of an annotated catalogue of the sound treasures Western culture has collected over the years, together with the present experience. A semiotics of sound should describe sound as a semiotic resource offering its users a rich
array of semiotic choices, not as a rule book telling you what to do, or how to use sound ‘correctly’. (van Leeuwen 1999, 6)

Whilst an ‘annotated catalogue’ is useful, it does not take us much farther down the road towards understanding how sound works in practice, or the decisions practitioners take in creating sound designs or soundtracks. A ‘code book’ need not be prescriptive, but may provide both a language and a range of concepts that can be applied to any use of sound. Van Leeuwen’s approach is both original and broad-ranging, but it is not sufficiently specific or explicit to be easily adapted to an analysis of sound design theory or practice. What is required instead is a system that can be used to analyse the soundtrack elements both individually and in combination with other elements, both sound and picture.

In contrast to Saussure’s two-part sign, Peirce proposed a three-way (triadic) structure to describe the nature of the relationship between the representamen or signifier (the sign vehicle, the form of the sign), the object (that which is being represented), and the interpretant (the sign in the mind, at the ‘receiving end’: how the object is interpreted). He highlighted the difference between the idea and the sign: “A sign receives its meaning through its subsequent interpretation” (Peirce and Hoopes 1991, 7). His focus was therefore not simply on signs but on semiosis, or meaning-making. Where Saussure focused on language, Peirce’s theories encompassed any type of sign, including natural signs. For Peirce, the interpretation of a sign was not necessarily the end of the process, but perhaps the beginning of another set of signs, with the interpretant becoming the representamen of another sign, and so on. Umberto Eco (1979, 69) developed Peirce’s principle that one sign might beget another, naming the process ‘unlimited semiosis’. Following Peirce’s groundwork, Eco stated that “every act of communication to or between human beings – or any other intelligent biological or mechanical apparatus – presupposes a signification system as its necessary condition” (1979, 9). Furthermore, it is not necessary for signs to be intentionally emitted, any more than it is for them to be intentionally interpreted. Eco also described the polysemy of the sign: multiple interpretations are possible from the same sign or text, depending on the reader/interpreter of the sign/text.

The Peircean semiotic model allows a fuller examination of the meanings contingent on the context of the soundtrack and on sound–image relationships. These include, but are not limited to, the culturally determined information (which can be thought of as naturalised or embedded) in the sounds of dialogue: its pitch, authority, speed of delivery, accent, volume, closeness and so on. Although the exact origin of an unfamiliar sound effect may not be known, it can nevertheless contain subconscious clues to its source, which then affect the way we interpret what we are experiencing in light of what we have previously experienced.
Semiotics and sound analysis

Semiotic film theory has traditionally paid little heed to the practice of film sound, even though logically it should, since sound is an intrinsic part of the film experience. Instead, cinema has tended to be defined as a visual medium, and it is the visual that has been the prime focus of analysis. The implication has been that since the image was there first, and sound came second, ipso facto sound cannot be fundamental to cinema (Altman 1980a). Altman has done much to reveal the various fallacies underpinning the idea of film as a ‘visual medium’. The other problem is that semiotic film theory, in utilising a Saussurean approach, necessarily applies an analogy of language to film (Metz 1974a). This inevitably leads to the representation of film in linguistic terms, with hierarchical elements standing in for the units of language: image for phoneme, shot for word, scene for sentence or phrase, and so on. It also means that the soundtrack has tended to be ignored, as it is difficult to describe in linguistic elemental terms. This necessitates a closer look at how critics have adapted the Saussurean semiotic schema to film.

A classical textual analysis of a film tends to break down the elements and arrangement of the image, from the still to the textual whole: frame–shot–scene–sequence–generic stage–work as a whole (Iedema 2001, 189). By comparison, in Halliday’s (1996, 23–29) sociosemiotic theory of language the elements are converted into film-text terms in order to analyse the ways the message will be received: text–situation–register–code–social structure. Gregory Currie (1993) argued against a language-based approach to cinema on the grounds that meaning in the cinema is not analogous to the way meaning is derived through language. The linguistic approach left unanswered such fundamental questions as whether the meaning conveyed by cinematic images can correspond to the way that meaning is conveyed in language, by words and sentences. Indeed, after years of trying to fit the square peg of linguistics into the round hole of cinema, it appears enthusiasm for a linguistics-based approach is on the wane. It nevertheless remains influential in film theory discourse, partly because of the influence of Bakhtin’s heteroglossia (Bakhtin and Holquist 1981), with its differentiation between the system of language and a new emphasis on the utterance, and thus the separation of structure from form.

Whilst useful for the visual analysis of audiovisual production, and for explaining the multiple paradigmatic and syntagmatic relationships at work, a Saussurean-derived analysis of the soundtrack is ill-suited to explaining the myriad ways the soundtrack is created, edited and mixed. Unlike written texts, and to a large extent the image portion of audiovisual media, the soundtrack usually consists of multiple simultaneous streams of information, rather than single streams that neatly end as another takes its place; and its elements are often not simple representations of what they appear to be. Neither does such an analysis provide a means of deconstructing the way sound is used. The inevitable hierarchical nature of the levels does not take into account the stable
elements of the soundtrack, or the shifts that may take place as music is introduced or removed. Nor does it take into account the difference between the visual and the aural: sound cannot be confined to a single stable moment, such as a frame, since sound does not exist except as part of a stream. It cannot be frozen in time in the way that an image captures or represents a moment. Neither does this model make any reference to the synchronised or non-synchronised elements of an audiovisual production.
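To make the contrast concrete, here is a minimal sketch of that difference. Everything in it – the `Cue` class, the stream names and the timings – is invented purely for illustration and is not drawn from any of the theories discussed here; it shows only why a single linear chain of units suits a written text but not a soundtrack, whose elements overlap in time rather than succeeding one another.

```python
from dataclasses import dataclass

@dataclass
class Cue:
    """One sound element with a start and an end time, in seconds."""
    name: str
    start: float
    end: float

# A written text is a single ordered stream: one unit ends as the next begins.
text_stream = ["frame", "shot", "scene", "sequence", "generic stage", "work"]

# A soundtrack is several simultaneous streams whose elements overlap freely.
soundtrack = {
    "dialogue": [Cue("line 1", 0.0, 4.2), Cue("line 2", 5.0, 9.5)],
    "effects":  [Cue("door knock", 3.8, 4.6)],
    "ambience": [Cue("room tone", 0.0, 12.0)],
    "music":    [Cue("underscore", 2.0, 12.0)],
}

def sounding_at(tracks, t):
    """Return every cue audible at time t - usually more than one."""
    return [c for cues in tracks.values() for c in cues if c.start <= t < c.end]

print([c.name for c in sounding_at(soundtrack, 4.0)])
# -> ['line 1', 'door knock', 'room tone', 'underscore']
```

At almost any instant several sound elements are active at once, which is precisely what a frame-by-frame, unit-by-unit hierarchy cannot capture.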
The sign system of Charles S. Peirce

In the penultimate paragraph of his major work An Essay Concerning Human Understanding ([1690] Locke and Fraser 1959), John Locke suggested that science could be divided into three sorts: first, Physica, the knowledge of things, their properties and operations, which he termed ‘natural philosophy’; second, Practica, the skill of applying our own powers and actions ethically. The third branch he named Semeiotike:

the doctrine of signs; the most usual whereof being words, it is aptly enough termed also Logike, logic: the business whereof is to consider the nature of signs, the mind makes use of for the understanding of things, or conveying its knowledge to others. For, since the things the mind contemplates are none of them, besides itself, present to the understanding, it is necessary that something else, as a sign or representation of the thing it considers, should be present to it: and these are ideas. And because the scene of ideas that makes one man’s thoughts cannot be laid open to the immediate view of another, nor laid up anywhere but in the memory, a no very sure repository: therefore to communicate our thoughts to one another, as well as record them for our own use, signs of our ideas are also necessary: those which men have found most convenient, and therefore generally make use of, are articulate sounds. The consideration, then, of ideas and words as the great instruments of knowledge, makes no despicable part of their contemplation who would take a view of human knowledge in the whole extent of it. And perhaps if they were distinctly weighed, and duly considered, they would afford us another sort of logic and critic, than what we have been hitherto acquainted with. (Locke and Fraser 1959, Ch 21, section 4)

Other philosophers took up the idea from Locke, but none to the extent of Charles Sanders Peirce.5 Whilst Ferdinand de Saussure saw semiotics as a branch of linguistics, Peirce’s semiotic model was, literally, an attempt to explain everything. Its stated ambition and purpose was

to make a philosophy like that of Aristotle, that is to say, to outline a theory so comprehensive that, for a long time to come, the entire work of human reason, in philosophy of every school and kind, in mathematics, in psychology,
in physical science, in history, in sociology, and in whatever other department there may be, shall appear as the filling up of its details. (Peirce et al. 1982, 6.168–69)

Obviously, in making such a bold claim Peirce was opening himself up to accusations of breath-taking arrogance, but there was method to the madness. Peirce was not talking idly when he described his intent. It was a genuine attempt to create a “philosophical edifice that shall outlast the vicissitudes of time” (“A Guess at the Riddle” [1887–1888] Peirce et al. 1982, 6.168). There can be few individuals who have been noted for original and pioneering work in such a diversity of intellectual fields. Leonardo da Vinci springs to mind as the archetypal genius who followed his intellectual curiosity wherever it might lead. Peirce made strides in such a range of endeavours that he might one day rank alongside such luminaries.

Who is the most original and the most versatile intellect that the Americas have so far produced? The answer ‘Charles S. Peirce’ is uncontested, because any second would be so far behind as not to be worth nominating. Mathematician, astronomer, chemist, geodesist, surveyor, cartographer, metrologist, spectroscopist, engineer, inventor; psychologist, philologist, lexicographer, historian of science, mathematical economist, lifelong student of medicine; book reviewer, dramatist, actor, short story writer; phenomenologist, semiotician, logician, rhetorician and metaphysician. He was, for a few examples, the first modern experimental psychologist in the Americas, the first metrologist to use a wave-length of light as a unit of measure, the inventor of the quincuncial projection of the sphere, the first known conceiver of the design and theory of an electric switching-circuit computer, and the founder of ‘the economy of research.’ He is the only system-building philosopher in the Americas who has been both competent and productive in logic, in mathematics, and in a wide range of sciences. If he has had any equals in that respect in the entire history of philosophy, they do not number more than two. (Max Fisch in Sebeok 1981, 17)

Peirce’s list of intellectual achievements is impressive, but he was not without problems in his personal life.6 Despite a privileged start in life – his father, Benjamin Peirce, was the Professor of Mathematics and Astronomy at Harvard University – Peirce suffered from what was then termed ‘facial neuralgia’, a cripplingly painful chronic condition likened to sharp, stabbing electric shocks, which may have been related to behaviours that would now be recognised as symptoms of mental illness. He suffered many scandals and made many enemies. Regardless of his personal troubles, the subject of signs was one to which he would return throughout his intellectual life.

Peirce produced a huge amount of published and unpublished material.7 He wrote on many subjects, including the physical sciences, mathematics (especially
logic), economics, psychology and other social sciences, but returned to the area of semiotics throughout his career. He perceived all his work as related to his study of semiotics: “For, as the fact that every thought is a sign, taken in conjunction with the fact that life is a train of thought, proves that man is a sign” (Peirce, Hartshorne, and Weiss 1960, 5.253). Peirce held a pansemiotic view, in that we are surrounded by signs and it is through signs that we make sense of the world: “The entire universe is profused with signs, if it is not composed exclusively of signs” (Peirce, Hartshorne and Weiss 1960, 5.448). For Peirce, whatever is in the mind is only a representation – at once common sense and a startlingly important distinction.

In one of his earliest writings on signs, entitled “Some Consequences of Four Incapacities” (Peirce et al. 1982, 2.213), Peirce indicated the importance he attached to signs, and therefore to his study of signs. Taking each of the incapacities in turn:

1. We have no power of Introspection, but all knowledge of the internal world is derived by hypothetical reasoning from our knowledge of external facts.

This suggests the fundamental importance of hypothesis in making sense of the world, as against the Cartesian model that treats thought as immediate perception. For Peirce, thought comes instead from an interpretation of the external world.

2. We have no power of Intuition, but every cognition is determined logically by previous cognitions.

For Peirce, there is no completely new cognition, or thought. Instead, every thought is one of a series of thoughts with which we make sense of the world. Cognition is a process. Peirce used the analogy of a ‘train of thought’ to describe this continuous process, wherein “each former thought suggests something to the one which followed it, i.e. is the sign of something to this latter” (Peirce et al. 1982, 2.213).

3. We have no power of thinking without signs.

Here, Peirce explicitly states that signs are absolutely fundamental to understanding the world: instead of directly experiencing external reality, we mediate it. Our eyes give us a sign, as do our ears and all the other sensory organs through which we conceive our world. This is what is meant by Peirce’s unorthodox claim that we are, to ourselves, a sign (Peirce, Hartshorne and Weiss 1960, 5.253).

4. We have no conception of the absolutely incognizable.

For Peirce, not only are meaning and cognition directly related, but the incognizable can have no meaning because it cannot be conceived.
Some of the ideas of Peirce’s semiotics are broadly similar to those of Ferdinand de Saussure, which is understandable since both were building on centuries-old ideas about signs. The major difference between the two is that Peirce’s model is a ‘general system’ – one that can be applied to any type of sign. For us this is important, since it means the model can be applied to sound of any kind, just as it can to any other type of sign system, such as maps, traffic lights, mathematics, or smells.
The universal categories

In order to build a framework for his semiotic system, Peirce revisited the concept of Categories – the idea that everything can be described in terms of what P. F. Strawson called a “descriptive metaphysics” concerned with describing the general features of our conceptual structure of the world (Strawson 1959). In essence, it is an attempt to answer the most fundamental of questions: what is there? To understand what Peirce meant by his system of Universal Categories, it is worth mentioning those of Aristotle and Kant, which Peirce’s system was designed to replace.8 Aristotle classified the world (the classes of things that exist, or rather, that we can conceive) into ten categories. Kant also suggested a list of categories, in four classes of three, but thought of them as properties or characteristics of any object. Whilst acknowledging that there might be others, Peirce sought to define three basic phenomenological categories, which he called the Universal Categories: Firstness, Secondness, and Thirdness. These correspond to the three ways in which anything can be conceived: as a First, only through qualities of its own; as a Second, with reference to a correlate (a something else); and as a Third, a mediation, bringing a second and a third into relation to each other.

At first glance, Peirce’s categories seem so imprecise as to have little use, with the names of the three categories – Firstness, Secondness, Thirdness – only adding to the sense of vagueness. Yet they are fundamentally important to the system he developed. Peirce first formally outlined them in his article “On a New List of Categories” [1867] and continued to work on them throughout his life. When his contemporary William James congratulated him on the novelty and originality of his system, Peirce was somewhat offended:

It rather annoys me to be told that there is anything novel in my three categories; for if they have not, however confusedly, been recognized by men since men began to think, that condemns them at once. To make them as distinct as it is in their nature to be is, however, no small task. I do not suppose they are so in my own mind; and evidently, it is not in their nature to be sharp as ordinary concepts. But I am going to try to make here a brief statement that, I think, will do something for them. (Peirce, Hartshorne and Weiss 1960, 8.264)
In “One, Two, Three: Fundamental Categories of Thought and of Nature” [1885] Peirce further defined these categories:

It seems, then, that the true categories of consciousness are: first, feeling, the consciousness which can be included with an instant of time, passive consciousness of quality, without recognition or analysis; second, consciousness of an interruption into the field of consciousness, sense of resistance, of an external fact, of another something; third, synthetic consciousness, binding time together, sense of learning, thought. (Peirce, Hartshorne and Weiss 1960, 1.377)

In an alternative definition from 1904 he described the three categories somewhat more clearly:

Firstness is the mode of being of that which is such as it is, positively and without reference to anything else. Secondness is the mode of being of that which is such as it is, with respect to a second but regardless of any third. Thirdness is the mode of being of that which is such as it is, in bringing a second and third into relation to each other. (Peirce, Hartshorne and Weiss 1960, 8.328)

As such, firstness is an unreflected feeling, immediacy, or potentiality. Secondness is a relation of one to another, a comparison, an experience, or action. Thirdness is a mediation, synthesis, habit, or memory (Noth 1990, 41). Given Peirce’s own variety of definitions, it would be easy to misrepresent his intentions; however, at the risk of over-simplifying, we could say that:
Firstness is a raw feeling about something, a naïve, unanalysed impression.
Secondness is an apprehension, or recognition, of a cause or relation.
Thirdness is the bringing together, mediation, or synthesis.
Thirdness is the sense made from the interaction of the three elements in a sign – the object, its signifier, and the interpreting thought of the sign: “In its genuine form, Thirdness is the triadic relation existing between a sign, its object, and the interpreting thought, itself a sign, considered as constituting the mode of being of a sign” (Peirce, Hartshorne and Weiss 1960, 8.328). In the Peircean model, therefore, signs are a phenomenon of thirdness. Thirdness also gives rise to concepts such as context, meaning and significance:

I will only mention here that the ideas which belong to the three forms of rhemata are firstness, secondness, thirdness; firstness, or spontaneity; secondness, or dependence; thirdness, or mediation. (Peirce, Hartshorne, and Weiss 1960, 3.422)9
If we were to imagine the simplest forms of life millions of years ago, we might imagine them to have the beginnings of conscious thought. For the simplest organisms, life is composed of firstnesses only – raw feelings, with nothing being related to anything else. Perhaps eyes are still at the simple light-or-dark stage of their evolution, hearing has not yet developed, and a sense of touch, taste or smell does not yet yield any information that can be acted upon.
The structure of a sign

Peirce defined the sign as consisting of three inter-related parts: a representamen, an object, and an interpretant. In Peirce’s own terminology, the word ‘representamen’ is used to describe the sign itself, as opposed to its signifying element or the object to which the sign refers. The representamen can be equated in some ways with the signifier from Saussure’s model, or the sign-vehicle from Charles Morris (1971):10

A sign is an object which stands for another to some mind. (Peirce et al. 1982, 3.66)

Since Peirce used different terms for the same idea, with ‘representamen’ and ‘sign’ used interchangeably, in the interests of clarity this book will use the term signifier in place of the less familiar ‘representamen’ or ‘sign’ when discussing the actual signifying element of the triadic sign. For our purposes, a signifier is usually a sound.

It is the relationship between the signifier, the object and the interpretant that determines how the sign will be interpreted. For example, on seeing smoke coming from a house a person may come to the understanding that there is a fire in that house. Smoke is the signifier. Fire is the object. The interpretant is the impact of the sign in the receiving mind: the connection of smoke with fire, and therefore the understanding that there is fire in the house. Another example might be the audible or visual output of a Geiger counter in identifying the presence of radioactivity. The presence of radioactivity (the object) is inferred through the audible clicks or visual readout of the counter (the signifier), creating the interpretant (the understanding that radioactive material is nearby).

The object is what is represented in the sign. It need not be a physical object; it is simply whatever is being represented or signified, which could be an idea, a person, an inanimate object, a film or anything else:

An object is anything that can be thought. (Peirce, Hartshorne, and Weiss 1960, 8.184)

The object determines the signifier. The interpretant itself acts as a signifier for a further triad, and thus semiosis is a never-ending process in which each interpretant acts as a sign-vehicle for the next sign:
An interpretant is the mental effect of the sign, the signification or interpretation of the sign. (Peirce, Hartshorne, and Weiss 1960, 8.184)

The ability to create an interpretant requires some reasoning by some mind. Peirce suggested a new class of reasoning to add to the classifications of deductive and inductive reasoning, which he called abductive reasoning. Abduction is the first step in logical reasoning, in that it sets up a hypothesis on which to base further thought:

All the ideas of science come to it by the way of Abduction. Abduction consists in studying facts and devising a theory to explain them. Its only justification is that if we are ever to understand things at all, it must be in that way. (Peirce, Hartshorne, and Weiss 1960, 5.145)

Abduction is not useful only in science; it is how we make sense of the world from the moment we are born. Imagine, if you can, the thought processes of a new-born baby in its first few days: What is that face? What is that sound? Is that sound coming from that face? What does it mean? Is that person the same person as before? Ouch, that hurts. If I do that again, will it hurt again? Yes. If I cry, will that person help me? If I smile, will they feed me? Obviously this exact sequence of thoughts is unlikely, but there would be some such sequence of cognitions as a baby takes in information from its surroundings. Abduction is so naturalised that we take it for granted. Rarely do we consciously acknowledge that we hold a working or provisional understanding of events, or of what we think will happen; yet without one we would never be able to make a decision or judgement about anything. Any social interaction or everyday action – driving a car, riding a bicycle, or even walking down a street in any city – requires us to make a multitude of these abductions in order to go about our business. We might guess at other people’s motives, or predict their movements and future actions, based on little more than a hunch.

Thus the question is how to understand the relationship between the object and the sign, between the sign and the interpretant, and the representation of the sign itself. For Peirce, there were “three species of representations: copies, signs, and symbols” (Peirce et al. 1982, 1.174). For example, suppose there are painted human footprints on a pavement. The footprint can act as a sign that a foot has been there: the size, colour and other details are incidental; it is the existence of the footprint itself that signifies that a foot has been there. The footprint acts as a sign by being a representation of part of the foot or of the shoe itself, as well as by being an evidentiary sign of a person walking. The footprint can also symbolically indicate an instruction to walk in the direction shown – for example, a stencilled footprint marking a recommended path for children walking to school, a particular route to follow without need for written instructions.
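Before moving on, the triadic structure just described can be rendered as a simple data structure. This is purely an illustrative sketch of the smoke/fire example – the class, field names and string values are mine, not Peirce’s terminology cast into code:

```python
from dataclasses import dataclass

@dataclass
class Sign:
    """A Peircean sign: signifier (representamen), object, interpretant."""
    signifier: str     # the form of the sign, e.g. a sound or a sight
    obj: str           # that which is being represented
    interpretant: str  # the effect of the sign in the interpreting mind

# Seeing smoke (signifier) stands for fire (object) and produces the
# understanding "the house is on fire" (interpretant).
smoke = Sign(signifier="smoke rising from a house",
             obj="fire",
             interpretant="the house is on fire")

# Unlimited semiosis: an interpretant can serve in turn as the signifier
# of a further sign, and so on without a natural end-point.
next_sign = Sign(signifier=smoke.interpretant,
                 obj="danger to the occupants",
                 interpretant="someone should call the fire brigade")
```

The Geiger-counter example fits the same shape: the clicks are the signifier, radioactivity the object, and the understanding that radioactive material is nearby the interpretant.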
Classification of signifier–object relations

As Peirce states, objects determine their signifiers: the nature of the signifier–object link puts constraints or boundaries on the form of the signifier. This provides a fundamental division of signs, based on the relation between the signifier and the object:
Icon (Firstness) – when the link is a quality or property.
Index (Secondness) – when the link is existential.
Symbol (Thirdness) – when the link is conventional, from habit or rule.
The distinction is rarely absolute and, to a degree, there may always be elements of each relation in a particular sign.

Icon – The word icon is familiar to us, but it is important first to clarify Peirce’s definition of the icon as it will be used here. Peirce modified his definition of the icon over time. Both visual and musical semioticians frequently adopt the definition from Peirce’s early work, which focuses on the idea of similarity or resemblance. This definition from 1885 is typical: “I call a sign which stands for something merely because it resembles it, an icon. Icons are so completely substituted for their objects as hardly to be distinguished from them” (Peirce 1998, 1:226). Yet by 1903 Peirce used an alternative and altogether more useful definition, which rested less on similarity than on characteristics or properties: “An icon is a representamen which fulfils the function of a representamen by virtue of a character which it possesses in itself, and would possess just the same though its object did not exist” (Peirce, Hartshorne, and Weiss 1960, 5.73); and this from 1904: “An icon is a sign fit to be used as such because it possesses the quality signified” (Peirce 1998, 2:307). Any kind of ‘iconic sign’ might therefore be provisional:

A sign must in the first place have some qualities in itself which serve to distinguish it, a word must have a peculiar sound different from the sound of another word; but it makes no difference what the sound is, so long as it is something distinguishable. (Peirce et al. 1982, MS 217:1873)

In simple terms, to describe a sound-sign as iconic means that the sound has audible qualities, but it need not be recognised as coming from a particular object, or synchronised with a particular action; nor need it be recognisably similar to another sound from which we might make sense of it. We may perceive basic or fundamental qualities of the sound but, as yet, have nothing with which to link it in a meaningful way.

Index – Where an icon has properties that hold irrespective of anything else, an index possesses a genuine relation to its object. It can be seen as having a referential or evidentiary quality, since it cannot exist except through reference to its object. In sound terms, an indexical sound points to its origin. It is evidence of the object,
or action that created it. Like the word icon, the word index brings with it a number of existing understandings. We are used to the word index as a pointer to something else, as in the index of this book: a list of names and subjects with links to the pages on which they are mentioned. For a long time the consensus view was that photographs are broadly indexical, where paintings are iconic. This view can be shown to be both incomplete and insufficient. Whatever stood before the camera as the image was taken does indeed have an indexical relationship to the photographic image, but this is only one of several indexical relations that the photograph may have. There is an indexical link to the photographer, the type of camera used, the point of perspective of the camera apparatus (body and lens), the time of day (e.g. from sunlight), the season. Indeed, a blueprint can be said to have an indexical relationship with a building and its architect, just as a portrait can have an indexical link to the subject of the painting and to the artist. It follows that, as there are multiple potential indexical relations in even the simplest of signs, there will be a great many symbolic relations, given the many different contexts in which an image can be viewed:

since any worldly thing whatsoever – whether it be a photograph, a film, a painting, or a CGI – is dyadically connected to the world (or reality) in a potentially limitless number of ways, each one of them can form the basis for an indexical function. This implies that it is absurd to pretend that a photograph is more indexical than a painting or a CGI, since it is impossible to quantify the number of ways in which something may serve as a sign. (Lefebvre 2007, 228)

In order to determine the object of a sign we must determine its use, or rather the way it is being used, since it may have multiple uses. Even the simplest indexical sound – a knock at the door – is evidence of someone knocking at a door, or rather, evidence that some action has happened to create the distinctive sound of tapping on wood. For it to be understood as knocking at a door, some other process needs to have taken place.

Symbol – For the sound of knocking on wood to have its intended effect, it needs to be understood by the listener: that sound means that someone is on the other side of the door and would like me to open it. The sound of knocking is a deliberate communication, and its meaning has to be learned. There is no natural law determining that the particular sound of knuckle on wood means anything. It is understood as the result of habit, or of a rule or law, which governs that shared understanding. The two basic forms of sign relation, icon and index, would alone be relatively useless without some way of linking similar concepts or events. In the Geiger-counter example we may know that the crackling sound of the counter indicates the presence of radioactivity, but the sound that conveys that information has absolutely no link whatsoever to the concept it represents. It is simply a learned association, or convention.
If we think about the process of learning, without being shown, what object has made a noise and what it means, we would probably begin by recognising the sound as having some potential meaning against the backdrop of all the natural sounds of the world. We may hear a sound initially and find it literally meaningless, though we may perceive some characteristics or qualities it possesses. Later we may, perhaps through combination with other senses, perceive that it happens at the same time as a visible action. Conversely, it may have no visual component at all, yet the sound alone would still be evidence of some object or action which caused it. If we heard the sound again, we might have further collateral information leading us to understand what the sound actually signified. It could be the knocking at a door, or the sound of the family car arriving home, or any other sound we have ever experienced and come to a realisation of what it meant.

If we take an example of everyday sound design, the sounds of our computer operating systems offer a good illustration. The various iterations of Microsoft’s and Apple’s operating systems use short non-verbal sounds to indicate a multitude of warnings, alerts and notices. The different sounds that mean notification, or popup blocked, or hardware failure, or email received, are all by themselves fairly meaningless. We might notice small differences between many of the sounds, but it is through context and familiarity that we learn that these sounds have some significance. Gradually we come to recognise the sounds and their associated meanings through the sound alone.11

This process of icon to index to symbol happens so frequently and so seamlessly that we do not often realise it is a process at all. Imagine being introduced to someone for the first time. You are already using a fairly complicated sign system (language itself) to communicate, and you are now being given a new sound which will represent a new object; in this case, a person you have just met. Unpicking the various layers of signs being used shows the massive complexity of the sound signs we use every day. Interpretation frequently requires collateral information, such as knowledge of the language being used, with which to interpret the sign. If we slow the process down, we may see its components more clearly. If, for example, one is in discussion with a stranger speaking an unfamiliar language, the simple task of using the words ‘me’ and ‘you’ is aided by pointing at each person in turn. Such collateral information helps to clarify the meaning of the word and its related concept.
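This icon-to-index-to-symbol progression can be caricatured as a small associative tally. Everything here – the sound names, the event labels and the threshold of three co-occurrences – is invented for illustration; the sketch claims only that repeated co-occurrence turns a bare quality into evidence, and evidence into convention:

```python
from collections import Counter

# How often each (sound, event) pairing has been experienced together.
observations = Counter()

def hear(sound, event):
    """Register one experience of a sound co-occurring with an event."""
    observations[(sound, event)] += 1

def stage(sound, event, habit_threshold=3):
    """Rough stage of the icon -> index -> symbol progression for a pairing."""
    seen = observations[(sound, event)]
    if seen == 0:
        return "icon: perceivable qualities only, linked to nothing yet"
    if seen < habit_threshold:
        return "index: heard alongside the event, evidence of it"
    return "symbol: a learned habit - the sound now stands for the event"

for _ in range(4):                       # repeated co-occurrence builds the habit
    hear("chime", "email received")

print(stage("chime", "email received"))  # -> symbol: a learned habit - ...
print(stage("buzz", "popup blocked"))    # -> icon: perceivable qualities only, ...
```

A real listener is not, of course, counting; the tally merely stands in for the gradual naturalisation of the association described above.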
Dividing the object

Peirce later emphasised an ‘end-directed’ process of inquiry rather than endless semiosis. The focus then falls on the object as it stands at the end of the process, in light of collateral experience, as opposed to the object referred to in the signification. The first object is the immediate object; the subsequent object is the dynamical object. The immediate object is the initial object – what the thing first appears to be, an unmediated approximation. The dynamical (mediate) object is the result ‘at
the end of the line’. The term ‘real object’ might also have been used, were it not for the fact that the object might not actually be real:

We must distinguish between the Immediate Object, – i.e., the Object as represented in the sign, – and the Real (no, because perhaps the Object is altogether fictive, I must choose a different term; therefore:), say rather the Dynamical Object, which, from the nature of things, the Sign cannot express, which it can only indicate and leave the interpreter to find out by collateral experience. (Peirce 1998, 2.498)

This differentiation between the Immediate Object and the Dynamical Object is useful in that it accounts for a modification of understanding dependent on experience and on other information or qualities possessed by the interpreter of the sign. It allows the same object to be identified differently depending on external factors, such as the interpreter’s experience.
Dividing the interpretant

Having divided the object, thereby moving away from the necessity of an infinite chain of signs and thus infinite semiosis, Peirce offers a corresponding differentiation of types of interpretant: the Immediate Interpretant, the Dynamical Interpretant and the Final Interpretant. The Immediate Interpretant can be thought of as a recognition of the syntax of the sign, a surface-level understanding, or “the total unanalyzed impression which the sign might be expected to produce, prior to any critical reflection upon it” (Savan 1988, 53). The Dynamical Interpretant can be thought of as “the effect produced in the mind” (Peirce et al. 1982, 8.343), which is reached in combination with collateral experience – a movement toward a final meaning. The Final Interpretant is the end of the process, once ‘all the numbers are in’, and can be thought of as the idealised end-point. It is the interpretant “which would be reached if a process of enriching the interpretant through scientific enquiry were to proceed indefinitely. It incorporates a complete and true conception of the objects of the sign; it is the interpretant we should all agree on in the long run” (Hookway 1985, 139). Dividing the interpretant in this way allows for the gradual unfurling of meaning from the sign, even though the signifier itself need not change.
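As a loose illustration of that unfurling, the knock-at-the-door example from earlier can be sketched as successive refinements of one unchanged signifier. The stages and their wordings below are my glosses, not Peirce’s, and the pieces of collateral experience are invented:

```python
def interpret(signifier, collateral=()):
    """Refine the interpretant of an unchanged signifier as collateral
    experience accumulates (illustrative glosses, not Peirce's wording)."""
    if not collateral:
        return "Immediate Interpretant: an unanalysed impression of tapping"
    if "doors are knocked on to request entry" in collateral:
        return "Final Interpretant: a visitor wants the door opened"
    return "Dynamical Interpretant: evidence that someone struck the door"

knock = "sharp tapping on wood"
print(interpret(knock))
print(interpret(knock, ["the sound came from the front door"]))
print(interpret(knock, ["the sound came from the front door",
                        "doors are knocked on to request entry"]))
```

The signifier passed in never changes; only the accumulated collateral experience does, and with it the interpretant.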
Significance and signs

Semiotics, being the study of signs, can be a powerful tool for unpacking the complexity of how sound is used and how meaning is made. Peirce’s semiotic model is particularly useful here because it applies at the most minute scale of sounds as well as at the macro scale, shedding light on how our interpretation or understanding of sounds develops or unfolds over time. If we step back to the point before meaning is made, it
may be worth distinguishing between significance and meaning. To notice that something could have significance is a necessary stop on the way to understanding some meaning. Morag Grant describes the usefulness of the distinction, and of focusing on significance, in relation to experimental music and contemporary music composition:

Significance, as I see it, is a broader term: on the one hand more relative (it may be entirely dependent on the context in question, the same element being significant in one context and relatively banal in another) and at the same time more specific (since it is only ever significant in relation to a given context). It can be free of the connotational function of meaning (but does not have to be), is not limited to the semantic sphere (though it also does not exclude it) and does not carry the implicit expressionism which colours the term meaning. Most importantly, if significance is contextual, then the process by which we decide that something is significant becomes an integral part of its significance: things become significant because we relate them to something else in the context of which they become significant. Furthermore, this very significance carries its own implications for the further course of the process. (Grant 2003, 175)

This view of experimental music might equally apply to any type of sound design. It is also worth noting that Peirce’s explanation of signs offers, in some sense, a reasonable account of how any type of significance or learning occurs. It provides an explanation for the fundamentals of intelligence and how information can be generated from sensory input. A biological study involving birds of one species learning to associate the sound of alarm calls of other species shows that this type of sound-sign learning is not limited to the human world (Magrath et al. 2015a). In the study, a group of birds was introduced to novel sounds and learned to associate them with alarm calls: over a few training sessions, the unfamiliar calls were presented at the same time as predatory birds. In some cases the unfamiliar calls were from real birds; in others, entirely artificial calls were used. Though they initially ignored the unfamiliar calls, after two days the fairy-wrens in the study had learned to flee on hearing them. There are further examples of inter-species eavesdropping to gather information intended for others: around seventy vertebrate species gather information from closely or distantly related species (Magrath et al. 2015b). There appears to be an evolutionary benefit if an individual can understand the significance of information about predators, or learn to recognise distress calls from outside its own family or community. Some animals appear able to recognise a new sound quickly and associate an idea with it. What is initially an unfamiliar sound, once associated with another mental concept – such as the presence of a predator or some other danger – can then stand for that related concept. This same process is at work from the moment we ourselves begin to make sense of the information provided by our ears, eyes and other senses. There are implications for how we might choose to
use sound significantly or meaningfully. As we shall see later in this book, many of the ways sound practitioners go about the processes of selecting, recording, editing, creating, synchronising, or mixing sounds appear implicitly to apply concepts described in Peirce’s theory of signs. This should not be taken as a criticism of Peirce’s theory. Rather, it illustrates his belief that he was simply giving formal acknowledgement to something that already existed but had not previously been well articulated.
Reality and semiotics

For Peirce, one of the founders of the philosophical school of pragmatism, the value of a method of inquiry lay in its usefulness.12 Even virtual scientific certainties that have stood unshaken for centuries, such as Euclidean geometry or Newtonian physics, have sometimes required amendment. The pragmatic view is that all knowledge is provisional, and Peirce employed the idea of fallibilism: “the doctrine that our knowledge is never absolute but always swims, as it were, in a continuum of uncertainty and of indeterminacy” (Peirce, Hartshorne and Weiss 1960, 1.171). That is not to suggest that we do not know anything. On the contrary, fallibilists do not require absolute certainty for belief, but recognise that “people cannot attain absolute certainty concerning questions of fact” (Peirce and Buchler 1955, 59). Indeed, it would be very difficult to make any kind of decision if no belief were possible without absolute certainty. The implication of this view for realistic representation is that whatever is sufficient to satisfy the requirements of belief will work, in the absence of a better explanation. We do not require absolute certainty; rather, we build up a picture of realism piece by piece, seeking out the best explanation for the representations presented to us. Where a sound is synchronous with an action, we might well believe that the two are related, and that one caused the other. In the absence of a better explanation, this rational belief will suffice. Where the sound/action pairing is repeated, we may consider this belief to be well founded.
Summary

Where Fourier took a seemingly innocuous everyday phenomenon such as heat as the source of his ground-breaking theoretical work, Peirce looked on signs as something simultaneously intuitive yet not fully understood. Fourier recognised the profound implications that his work on heat would have:

Primary causes are unknown to us; but are subject to simple and constant laws, which may be discovered by observation, the study of them being the object of natural philosophy. Heat, like gravity, penetrates every substance of the universe, its rays occupy all parts of space. The object of our work is to set forth the mathematical laws
which this element obeys. The theory of heat will hereafter form one of the most important branches of general physics. (Fourier 1878, 1)

Peirce applied his intellect to a broad range of subjects, but semiotics preoccupied him for his entire life. Like Fourier, Peirce took a commonplace and seemingly common-sense topic and attempted to deduce general principles from it. He was attempting to supplant the categories of Aristotle and Kant with ones that were more general, and that lay within the grasp of everyday thought rather than the domain of metaphysics. This is an important point, and one that I believe is central to his model’s usefulness: in his model of categories, Peirce was simply articulating what he believed was a natural and logical way of organising thoughts, even though we may rarely view them in such terms. His model of semiotics gives an account of how our thoughts and ideas come about, how we learn, and how we make sense of things or have knowledge of our world – or of the fictional and virtual worlds that can also be created.
Notes
1 The term semiotic comes from the Greek σημειωτικός (se-meio-tikos), meaning ‘observant of signs’, where σημεῖον (se-meion) means ‘a sign, or mark’. The alternative spelling ‘semeiotic’ was coined by John Locke and was also sometimes used by Peirce.
2 Lewis Carroll was the pen name of Charles Lutwidge Dodgson, who lectured in mathematics at Oxford University.
3 The written word ‘hearth’ also gives no indication of how it might be pronounced, and a person unfamiliar with the word might pronounce it using its beginning ‘hear-’ as a clue to its pronunciation.
4 Times New Roman was commissioned for The Times newspaper in 1931. Calibri was developed 2002–2004 and replaced Times New Roman as the default font in Microsoft Office software in 2007.
5 Many of his works are brought together in edited collections, primarily Collected Papers of Charles Sanders Peirce (volumes 1–8) and Writings of Charles S. Peirce: A Chronological Edition (volumes 1–6 and 8).
6 In philosophy he is recognised as a founder of the philosophical movement of Pragmatism (and later Pragmaticism). In mathematics he is best known for his work in linear algebra, but his work was primarily in logic and he was the first to show how Boolean algebra could be done via a single binary operation. He was also one of the founders of modern statistics. His day-to-day work was in science and led him to pioneering advances in geography: in cartography, his quincuncial projection is an alternative two-dimensional representation of the globe. In gravimetrics (the measurement of the earth’s gravity) Peirce developed a special pendulum with which he was able to measure accurately the acceleration due to gravity.
7 Though a prodigious intellectual talent, he alienated many important people; his employment in his only university position, at Johns Hopkins, was terminated as a result, and he instead made his living working for the US Coast and Geodetic Survey (see Joseph Brent’s biography Charles Sanders Peirce: A Life). His list of accomplishments is nevertheless staggering in its scope. The Encyclopaedia Britannica entry for Peirce states: “Peirce is now recognized as the most original and the most versatile intellect that the Americas have so far produced. The recognition was slow in coming.”
8 Aristotle’s ten categories are substance, quantity, qualification, relation, place, date, posture, state, action, passion. Kant’s table of categories contains four classes of three: Quantity (unity, plurality, totality); Quality (reality, negation, limitation); Relation (inherence and subsistence, causality, community); Modality (possibility, existence, necessity) (see Thomasson 2013).
9 Rhemata is the plural of rheme: “A rheme is any sign that is not true nor false, like almost any single word except ‘yes’ and ‘no’, which are almost peculiar to modern languages. [—] A rheme is defined as a sign which is represented in its signified interpretant as if it were a character or mark (or as being so)” (Peirce et al. 1982, 8.337 [original emphasis]).
10 Morris (1938, 6–7) adapted Peirce’s tripartite sign and focused his view of semiotics around semantic, syntactic and pragmatic levels of signs. Semantics relates to the comprehension of the preferred reading of the sign, the syntactic level to the recognition of the sign, and pragmatics to the interpretation of the sign. Morris used the term sign-vehicle in place of the Peircean representamen.
11 It is also quite a nostalgic trip to hear old operating system sounds and try to remember what each sound meant. Thankfully, YouTube contains many archives of such OS sounds of yesteryear, from Windows 95 and XP to various Apple OS iterations.
12 Peirce later founded an alternative to pragmatism, which he named pragmaticism to distinguish his ideas from those of his contemporaries. The unwieldy name was selected to be “ugly enough to be safe from kidnappers” (Peirce, Hartshorne and Weiss 1960, 5.414).
References

Adams, Douglas. 1979. The Hitchhiker’s Guide to the Galaxy. London: Picador.
Altman, Rick. 1980. “Four and a Half Film Fallacies.” In Sound Theory, Sound Practice, edited by Rick Altman, 35–45. New York: Routledge.
Bakhtin, M. M., and J. Michael Holquist. 1981. The Dialogic Imagination: Four Essays. Austin: University of Texas Press.
Brent, Joseph. 1998. Charles Sanders Peirce: A Life. Bloomington: Indiana University Press.
Carroll, Lewis. 1911. Alice’s Adventures in Wonderland, and, Through the Looking-Glass and What Alice Found There. London: Macmillan and Co. Limited.
Chandler, Daniel. 2011. “Semiotics for Beginners.” Available online at http://visual-memory.co.uk/daniel/Documents/S4B/
Currie, Gregory. 1993. “The Long Goodbye: The Imaginary Language of Film.” British Journal of Aesthetics 33(3): 207–219.
Eco, Umberto. 1979. A Theory of Semiotics. Advances in Semiotics. Bloomington: Indiana University Press.
Fourier, Jean Baptiste Joseph. 1878. The Analytical Theory of Heat. Translated by Alexander Freeman. Cambridge: Cambridge University Press.
Grant, Morag Josephine. 2003. “Experimental Music Semiotics.” International Review of the Aesthetics and Sociology of Music 34(2): 173–191.
Halliday, M. A. K. 1996. “Language as a Social Semiotic.” In The Communication Theory Reader, edited by Paul Cobley, 359–383. London: Routledge.
Hookway, Christopher. 1985. Peirce. London: Routledge & Kegan Paul.
Iedema, Rick. 2001. “Analysing Film and Television: A Social Semiotic Account of Hospital: An Unhealthy Business.” In Handbook of Visual Analysis, edited by Theo van Leeuwen and C. Jewitt, 183–203. London: Sage.
Johansen, Jørgen Dines, and Svend Erik Larsen. 2002. Signs in Use: An Introduction to Semiotics. London: Routledge.
Lefebvre, Martin. 2007. “The Art of Pointing. On Peirce, Indexicality, and Photographic Images.” In Photography Theory, edited by James Elkins. New York; London: Routledge.
Locke, John, and Alexander Campbell Fraser. 1959. An Essay Concerning Human Understanding. New York: Dover Publishing.
Magrath, Robert D., Tonya M. Haff, Jessica R. McLachlan, and Branislav Igic. 2015a. “Wild Birds Learn to Eavesdrop on Heterospecific Alarm Calls.” Current Biology 25(15): 2047–2050. doi:10.1016/j.cub.2015.06.028
Magrath, Robert D., Tonya M. Haff, Pamela M. Fallow, and Andrew N. Radford. 2015b. “Eavesdropping on Heterospecific Alarm Calls: From Mechanisms to Consequences.” Biological Reviews 90(2): 560–586. doi:10.1111/brv.12122
Metz, Christian. 1974. Film Language: A Semiotics of the Cinema. New York: Oxford University Press.
Noth, Winfried. 1990. Handbook of Semiotics. Advances in Semiotics. Bloomington: Indiana University Press.
Peirce, Charles S. 1998. The Essential Peirce. Bloomington: Indiana University Press.
Peirce, Charles S., and Justus Buchler. 1955. Philosophical Writings of Peirce. Edited by Justus Buchler. New York: Courier Dover Publications.
Peirce, Charles S., and James Hoopes. 1991. Peirce on Signs: Writings on Semiotic. Chapel Hill: University of North Carolina Press.
Peirce, Charles S., Charles Hartshorne, and Paul Weiss. 1960. Collected Papers of Charles Sanders Peirce. Cambridge, MA: Belknap.
Peirce, Charles S., Max Harold Fisch, Edward C. Moore, and Christian J. W. Kloesel. 1982. Writings of Charles S. Peirce: A Chronological Edition. Bloomington: Indiana University Press.
Saussure, Ferdinand de, Charles Bally, Albert Riedlinger, and Albert Sechehaye. 1960. Course in General Linguistics. 1st British Commonwealth edition. London: Owen.
Savan, David. 1988. An Introduction to C. S. Peirce’s Full System of Semiotic. Monograph Series of the Toronto Semiotic Circle, Issue 1. Toronto: Victoria College in the University of Toronto.
Sebeok, Thomas A. 1981. The Play of Musement. Bloomington: Indiana University Press.
Strawson, P. F. 1959. Individuals: An Essay in Descriptive Metaphysics. London: Methuen & Co.
Swift, Jonathan, and Thomas Roscoe. 1841. The Works of Jonathan Swift, Containing Interesting and Valuable Papers not Hitherto Published. With Memoir of the Author by Thomas Roscoe, etc. London: Henry Washbourne.
Thomasson, Amie. 2013. “Categories.” In The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta. Available online at https://plato.stanford.edu/entries/categories/
van Leeuwen, Theo. 1999. Speech, Music, Sound. New York: Macmillan Press.
5 ANALYSING SOUND WITH SEMIOTICS
Using semiotics

Whilst such a comprehensive system as Peirce’s semiotics might not immediately appear to be practically useful, its strength lies in its flexibility and universality – qualities that lend it to analyses that would otherwise be elusive or unworkable. The focus of this model is on the process of semiosis: the meaning of a sign is not entirely contained within it; rather, meaning arises through its interpretation. The three principal elements of the sign – the object, the signifier and the interpretant – allow each sign to be examined through the relations between those elements. Each object–signifier link can be thought of as operating either through some immediate property (firstness), through causality or evidence (secondness), or through convention or habit (thirdness). So how does this apply to the use of sound? Can sounds be interpreted as ‘sound-signs’? Let us examine sound in light of five basic principles of our semiotic model; a short illustrative sketch follows the list:
1 A given sign need not be assigned to one and only one class. Take, for example, the ticking sound of a clock. Its relationship to its object can be iconic, indexical or symbolic, depending on the context. The sound of the clock is iconic in the sense that it represents the clock through the distinctive properties of the mechanical sound of its ticking; we recognise the sound because of this iconic quality. The sound of the clock is also indexical, in that it is evidence of the presence of the clock that makes the sound. And from our experience of the concept of time we have learned that each tick of a clock represents one second of time passing; the sound of the ticking is thus also symbolically linked with the idea of time.
2 A sign's classification can change with its function, with history, and with perspective and interpretation. Continuing with the ticking clock example, we can show where this classification might change. Imagine the following three scenarios, where the only accompanying sound is the ticking of a clock:

(A) shot of a man lying in a darkened room staring at the ceiling
(B) shot of a man racing through a crowded city street
(C) shot of an unattended package at a railway station
In each case, what is represented by the ticking (the origin of which may or may not be visible) differs according to its contextual use. The first example might represent time moving slowly for someone staring at the ceiling (unable to sleep? nothing to do?). The second example might represent time moving quickly (running out of time?). The third example might be an indexical or symbolic sign for a bomb that may (or may not) be inside the package. The context of the sound-sign shifts its meaning.
3 The focus is on the process of semiosis rather than the content of the sign. From the previous example we can see that the process of semiosis is critical in determining meaning from the sound-sign. The meaning of the sound-sign has changed but the sound-signifier has not changed in any way except in its context.
4 The initial object may not be the same as the eventual object once the sign has been 'mulled over', and similarly the initial interpretant need not be fixed, since the object determines the interpretant. What begins as one interpretation (the immediate interpretant) could be simply the sound of the ticking, which may then lead to a modified interpretation that becomes an indexical link to an actual timepiece somewhere nearby. It can be further modified to the extent that the sound is actually not a clock at all, but a bomb, since the sound is symbolic (as well as indexical and iconic) of a bomb. This 'mulling over' of the interpreting mind creates meaning through a process. The abduction stage is required to create new meaning through a new hypothesis, which is then tested; it is at work when we create meaning through comparison with another thing, allowing the incorporation of new signs into our schemata. The capacity to compare is used to create meaning. Literary devices, such as figurative language, depart from a literal meaning in order to compare the characteristics of one thing to another, through simile or metaphor. (A computational sketch illustrating these first four principles follows this list.)
5 The Peircean model of the sign takes account of the role of the interpreter in coming to meaning. The interpretation of the ticking sound in all the ways suggested requires a mind to interpret. The apprehension of the sound-sign, prior experience and mental processing are all required to create meanings. Indeed, different minds might well make different meanings from the same sound-sign. Therefore, the role of the interpreter is fundamental not only in whether meaning is created, but also in relation to what meanings are created.
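The context-dependence of the sound-sign described in these principles can be illustrated with a small computational sketch. To be clear, this is not an implementation of Peirce's model, and none of the names or mappings below come from the book; they are invented purely to mirror the ticking-clock discussion. The sound-signifier is held constant while the context selects an interpretant (principles 1–3), and new collateral experience revises that interpretant (principle 4).

```python
# Hypothetical sketch only: the same sound-signifier yields different
# interpretants depending on context, and an interpretant can be
# revised ('mulled over') in light of new evidence.

SIGNIFIER = "ticking sound"

# Context -> a first, immediate interpretant (principles 1-3).
IMMEDIATE = {
    "man staring at the ceiling": "time moving slowly",
    "man racing through a crowd": "time running out",
    "unattended package": "possibly a bomb",
}

def interpret(context: str) -> str:
    """Return an immediate interpretant for the ticking in a context."""
    return IMMEDIATE.get(context, "merely a sound, not yet classified")

def mull_over(interpretant: str, new_evidence: str) -> str:
    """Principle 4: revise the interpretant when evidence demands it.
    One hard-coded revision stands in for abduction in general."""
    if new_evidence == "the ticking comes from inside the package":
        return "a bomb, not a clock"
    return interpretant

for context in IMMEDIATE:
    print(f"{SIGNIFIER!r} | {context}: {interpret(context)}")

print(mull_over(interpret("unattended package"),
                "the ticking comes from inside the package"))
```

Running the sketch simply prints each scenario's interpretant and then the revised one; the point is only that the signifier never changes while its meaning does.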
Firstness, Secondness and Thirdness in sound

Peirce's Universal Categories of Firstness, Secondness and Thirdness are particularly relevant to sound, as opposed to vision, since sound only ever exists as a stream, rather than as a constant or static object. Our awareness of a sound develops as the sound develops. We can stare at a photograph and the photograph will remain the same, although our interpretation of it may change as we dwell on it. As we listen to a novel sound, however, first it begins, and we may initially only notice that it simply is a sound. Then it becomes a sound with respect to some other sound or silence, or some physical thing, and thereafter comes some mediation and attribution of meaning, if the process gets that far. Whilst the Universal Categories are difficult to pin down, the idea behind them highlights some phenomena that can exist prior to our eventual understanding, or before the process of semiosis is complete. There is a logical sequence from Firstness to Thirdness.

First, we notice something without conscious acknowledgement: imagine a sound X that one notices, but that is not recognised. It merely has vague characteristics, without pointing to any particular thing.1 This would be considered a firstness, relating to the immediate properties or characteristics of a thing without reference to another.

Then we notice it with respect to some other thing: a sound that stops or starts does so in comparison to, or with respect to, silence. We now notice that sound X stops, or that it starts. We do not yet attribute any meaning to the sound but notice that it exists or ceases to exist, perhaps as a result of some other action. This would be called a secondness, directly relating to a second thing.

Finally, we incorporate the knowledge into our broader understanding through reference to a third: if we later notice sound X once more and realise a cause for the sound, link it with some action, or otherwise describe it in terms of a general rule, then this would be called a thirdness, a synthesis of meaning or mediation.

We can take the example of the three 'pips' from a speaking clock, which uses a spoken description of the time, such as "At the third stroke, the time will be twelve forty-six and ten seconds", with three beeps following.2 The first pip indicates simply a pip, which has sonic qualities of its own that are independent of any other thing. On hearing a second pip we recognise it as being similar to the first – indeed, it is only once we hear the second pip that the first can be recognised as a reference. This
second pip, identical in sound to the first pip, allows the inference that what we are hearing is a sequence, and establishes an interval, a tempo. The third (and altogether necessary) pip is identical in sound to the first two, and is the only one that signifies the exact time. It is by then predictable and meaningful, since the listener knows to expect a third pip, when to expect it and what the pip represents. The three pips illustrate Peirce's categories: the first is the quality, the second the fact, and the third the habit or rule.
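The sequential character of the three pips lends itself to a small sketch. Again, this is illustrative only – a paraphrase of the analysis above in code, with invented names – showing that no single pip carries the time signal; only the established sequence does.

```python
# Hypothetical sketch: which of Peirce's categories is in play depends
# on how many pips have been heard so far, not on the pips themselves
# (all three are sonically identical).

def category(pips_heard: int) -> str:
    if pips_heard == 1:
        return "Firstness: a sound with qualities of its own"
    if pips_heard == 2:
        return "Secondness: similar to the first; a sequence and interval emerge"
    return "Thirdness: the expected pip that signifies the exact time"

for n in (1, 2, 3):
    print(f"pip {n} -> {category(n)}")
```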
Sound signs

Whilst Peirce's work primarily focuses on the classification of the signifier and the interpretant, it is important to briefly address the classification of signifier–object relations, dealing initially with familiar sounds. Here are some examples of sound signs described using the Peircean typology:
Sound Icons – onomatopoeia, a car's horn, a car's engine, a musical sound.
Sound Indexes – a cry of pain, the beep from an ATM, the sound of a car's indicator, a footfall or footstep.
Sound Symbols – a record scratch, spoken language, Morse code.
Of course, many, if not all, sounds will exhibit the properties of icons, indexes and symbols. For example, a telephone ring is indexical as evidence of the phone being rung by someone; iconic in that the inherent properties of the ringing sound make it useful for drawing attention; and symbolic as a sound meaning that someone wants to communicate in the first place. How then can we apply Peirce's model of the sign to the new or the unfamiliar? First, we revisit Peirce's ideas from "Some Consequences of Four Incapacities", the first three being:

1 We have no power of Introspection, but all knowledge of the internal world is derived by hypothetical reasoning from our knowledge of the world.
2 We have no power of Intuition, but every cognition is determined logically by previous cognitions.
3 We have no power of thinking without signs. (Peirce et al. 1982, 2.213)
From these three incapacities we can surmise that we arrive at understanding through our own reasoning, based on our knowledge of the world, and that our reasoning is itself conducted in signs, since our thinking comes through signs and their associations. Therefore, any new sign is determined by previous cognitions, and we can only arrive at new thoughts through other signs. When confronted with new signs, we arrive at new understandings through hypothesis and reasoning – an example of Peircean abductive reasoning, as distinct from inductive and deductive reasoning.
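The distinction between the three kinds of reasoning can be stated schematically. The following uses Peirce's well-known bean-bag illustration, which is a standard gloss of abduction rather than an example drawn from this book; the code merely prints the three inference patterns.

```python
# Peirce's bean-bag illustration of deduction, induction and abduction:
#   Deduction: Rule + Case -> Result
#   Induction: Case + Result -> Rule
#   Abduction: Rule + Result -> Case (the explanatory hypothesis)

rule = "all the beans from this bag are white"
case = "these beans are from this bag"
result = "these beans are white"

print("deduction:", f"from [{rule}] and [{case}], infer [{result}]")
print("induction:", f"from [{case}] and [{result}], infer [{rule}]")
print("abduction:", f"from [{rule}] and [{result}], guess [{case}]")
```

Abduction, the pattern on the last line, is the guess that would best explain what is observed; it is this mode of reasoning that the discussion below repeatedly draws upon.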
Natural(ised) and arbitrary sound signs

Natural signs are signs that appear in nature, which we can recognise and from which we can derive meaning. Lightning and thunder are such natural signs. The sound of thunder in conjunction with lightning provides a literal sound representation of the storm.3 The sound of thunder without its visual counterpart also mimics the natural world, since we often hear thunder without seeing its lightning source, and so deduce that a storm is some distance away. When seeing lightning without hearing thunder we can also induce that the sound of thunder is about to arrive; and the shorter the time between the two, the closer the storm. Hence the sense of foreboding that accompanies a destructive event such as a storm. In addition, we learn (a naturalised sound code) that loud rumbles tend to come from significant physical events or man-made events, such as explosions. We know from our own experience of sounds that loud, powerful sounds come from large, powerful events.

Before thunder and lightning were scientifically explained as phenomena associated with the discharge of energy caused by the ionisation of storm clouds, they were simply recognised as a 'sign' signifying a storm.4 Further back in history they were interpreted as a sign that the thunder god was unhappy and was making his or her feelings known.5 Equally, they could be taken as a positive sign if there had been a period of drought, with lightning and thunder indicating an end to the drought. Natural sound signs can have referential and explicit meaning (the sound of thunder, the accompaniment of lightning), and they can have implicit and symptomatic meanings (the gods are angry, the drought is over).

Whilst there are natural signs all around us, there are also created signs that are not grounded in a natural occurrence but are conventional signs designed for the purpose of communication (Cobley and Jansz 1999, 5). Both natural and conventional sound signs are used in film soundtracks: the natural sounds that the audience intrinsically knows, and those that have to be learned through association. Sometimes these associations are not obvious – for example, the pairing of music to particular characters – yet the effect of the symbolic link does its job nonetheless. Few audience members watching The Empire Strikes Back (Kershner 1980) for the first time would consciously recognise Darth Vader's 'theme music', but after it has accompanied his appearances on screen the particular musical theme becomes linked to his character. Thereafter, that particular music portends his imminent arrival.

Peirce's definition of the symbolic representation largely coincides with Saussure's 'arbitrary signifier', in that there is neither similarity nor direct connection through cause and effect, nor other evidence: "The symbol is connected with its object by virtue of the idea of the symbol-using mind, without which no such connection would exist" (Peirce, Hartshorne and Weiss 1960, 2.299). For example, the word 'dog' has no natural connection with the concept of dog, but the order of the letters d-o-g creates the meaning for the concept of dog in English, just as the letters c-h-i-e-n create the meaning for the same concept in French.
In this sense, it is a socially learned association, which points to a conventional link between the object and its sign (Chandler 2007, 28). The system only works because it is recognised or agreed that the sign represents the object. Similarly, a violin appears to have little about it intrinsically or naturally to make it representative of a romantic moment between lovers, or of the many other things it represents. Instead, some types of music can be said to have acquired a socially constructed meaning.

The symbolic relationship between signifier and object, leading to the principle that a signifier and its object need no natural connection whatsoever, can be illustrated in film sound. Once the simple association of sound and image is made, one can later be used to suggest the other. For example, the double basses in Jaws (Spielberg 1975) are associated with the shark through their synchronisation with a moving underwater shot in the opening title sequence, and later during the first shark attack. The image takes the shark's point of view (POV) and is used to indicate the shark rather than physically showing the shark itself in the first part of the film. Both times we are given the POV shot of the shark, we hear the accompanying double basses. By the second double basses/POV shot, an abduction can be made that one symbolises the other: when one is heard, the other is present and an attack is imminent. By the time the double basses are heard later in the film, the audience can induce (correctly) that another shark attack is likely, although visually nothing more than a calm ocean need be shown. The double basses now represent the shark without the film having to show either the shark itself or the shark's POV.

Music is used in a similar way in Fritz Lang's M, in which Peter Lorre's character whistles two bars of a piece of music (Grieg's 'In the Hall of the Mountain King'). The music is thereafter used to symbolise the murderer of the film:

I seem to recollect quite clearly that this harmless little tune became terrifying. It was the symbol of Peter Lorre's madness and bloodlust. Just a bar or two of music. And do you remember at what points (toward the end) the music was most baleful and threatening? I do. It was when you could hear the noise, but could not see the murderer.
(Cavalcanti 1985, 108)

The visual image is therefore liberated from a simple functional representation. A sense of anticipation is created whilst showing little in the mise-en-scène, and the desired cinematic effect is produced in the most efficient manner possible.
Music in the soundtrack

Music can contain referential or symbolic links to things outside its immediate context. It could provide a link to a genre, or to the musicians who wrote and performed it, or perhaps to an era with which the music is associated. Music associated with a character can become fused with that character, and any lyrical content, or associations with the music, its performer or its writer, can then be attached to the character. Non-coded or semi-coded lyrics can offer a deeper or
alternative reading of what we are experiencing and how we interpret the characters on screen. For example, the lyrics of the song 'Everybody's Talkin'' from Midnight Cowboy (Schlesinger 1969), performed by Harry Nilsson, are used early in the film in a montage sequence which follows protagonist Joe Buck (Jon Voight) as he begins his new life in New York City. Whilst there may be no direct or easy connection between the two, the mere fact that they are synchronised will lead the audience to look for a connection: "Everybody's talking at me, I can't hear a word they're saying. Only the echoes of my mind" (Neil 1966). The fact that the lyrics of the song are in the first person suggests that there is a link between the lyrics and the character on screen, and the audience will seek to make sense of what they are presented with. In this example, the music and lyrics are able to offer an alternative view of naïve optimism, which is in opposition to the visual representation of the character and his situation (Potter 1990, 93–100).

Music can be viewed as a second text that is embedded in the film text, bringing with it a range of meanings and references outside the filmic text that can be overlaid on to its new context. This intertextuality between film and music is overt, with the music being embedded into the film. For example, in Dr. Strangelove (Kubrick 1963), the music in the opening sequence is a lush, orchestral piece. By itself it is not unusual as a piece of 'film music', although it is perhaps a strange choice for an opening title sequence showing two US Air Force aeroplanes refuelling in mid-air. The romantically orchestrated music seems at odds with the visual images, yet when combined with the distinctive comic-book hand-drawn titles, which again do not seem to match either images or music, we are getting contradictory messages, especially given the prominence of the full title of the film (Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb). Closer inspection of the music, for those familiar with it, will show it is actually an instrumental version of the love song "Try a Little Tenderness" (Woods, Campbell and Connelly 1933). As such, we are invited to make a connection between the now sexualised images (through the associated romantic music) of one plane 'refuelling' another. This being the opening sequence, such a combination of serious images being subverted by the musical soundtrack provides an invitation to see the rest of the film in light of this illuminating allusion. As well as requiring (or at least encouraging) the audience to make connections between the images, the opening titles and the music, the music refers extra-textually to a work that is itself a recognisable media representation of 'romance' or 'love'.

It is important also to recognise that the secondary interpretation of the music need not be immediate. It may well be noticed immediately, or sometime later during the film, sometime after the film, or not at all. Peirce's differentiation between immediate and dynamical objects and interpretants allows for interpretation and reinterpretation in the light of collateral experience.6 Music used in this way is more indirect, requiring audience expectation and recognition of musical forms to add meaning through synchronisation (or juxtaposition) with images. Written in 1932, the song had been recorded by many artists, including Bing Crosby, and notably by Otis Redding in 1966, two years after the film.
This made the music more recognisable still to a new audience in the years
following the film's release, thereby increasing the connotative content of the sequence and illustrating the unlimited semiotic potential of extra-textual influences affecting the interpretation of an historical and supposedly fixed film. Our understanding of the film might then be modified by time or collateral experience, which influences our interpretation of the content of the sign in the film. In Peircean semiotic terms, we may well have an immediate interpretant for the film (or for this piece of music in the film) as we first experience it, and later this interpretant may be modified, whether during the course of that viewing or when watching the same film years later.

On-screen music, such as that used in a musical or where characters play or sing, has a literal, indexical, signifying quality, as well as other embedded and connotative qualities.7 In scored music even the simplest musical sequence has a multitude of possible meanings and readings, since music is so loaded with cultural, historical and social meaning. Even where completely original music is used and the individual instruments and voices are unrecognisable, the music itself stands alone to add meaning to the film, since musical sounds have iconic properties regardless of their origin or associations. The musical characteristics (instrumentation, arrangement, musical style, tempo, and so on) will force the images, or the film as a whole, to be interpreted through this filter by the simple act of synchronisation with the images. Where existing music is incorporated into a film, the original music's meaning works with the new context to create further meaning potential through synchronisation, or juxtaposition, with images. The music may be sympathetic to the time and period of the film, or it may be in contrast to it. Just as, in the eighteenth century, Mozart signified action taking place in a Muslim country through the use of instruments associated with Turkey (Deutsch 2007, 7), cinema has taken on the same idea, using instrumentation to indicate or reinforce a particular era or location in sympathy with the diegesis of the film. Conversely, films can also dispense with a musical approach that is dependent on, or sympathetic to, the mimesis or diegesis. For example, Vangelis' score for Chariots of Fire (Hudson 1981) is seemingly at odds with the world of the story in terms of style and orchestration, which may therefore invite the audience to make connections between the music and the images it accompanies.

Music can embody specific meanings dependent on cultural specificity and history. To a British listener, the hymn Amazing Grace (Newton 1779) may conjure up images of pipers and the Scottish Highlands, or some loose idea of 'Scottishness', funerals, or a sense of hope. The hymn was composed by a former slave trader, John Newton, who converted to Christianity and subsequently became a minister. Since its popularisation in the US, the same music heard by an American may represent the Ku Klux Klan and slavery, as it has been closely associated with both the organisation and that period of American history. Since the 1960s it has also been linked with the American civil rights movement. More recently, US President Barack Obama performed an impromptu version of the song at a funeral service for a victim of the Charleston shooting (Sack and Harris 2015). Rather than simply a
hymn, Amazing Grace is then "taken seriously as cultural artefact loaded with a history of greed, oppression and hope" (Yenika-Abbaw 2006, 353). While cultural differences such as these would obviously make symbolic or indexical use of music fraught with danger, music is often rich in meaning and can be put to service in films to either augment or contrast with the other visual or sonic elements. Examples abound, but in terms of instrumental music alone, the use of music in 2001: A Space Odyssey (Kubrick 1968) is an example par excellence. Again, the Peircean concept of the dynamical interpretant takes account of the changing understanding derived from a seemingly stable sign object. The music that accompanies the opening title sequence is Richard Strauss's Also Sprach Zarathustra (1896), a majestic piece of music fitting to open such an impressive film. Stephen Deutsch points out that the music was composed as a eulogy to Friedrich Nietzsche, who wrote a work of the same name that contains the line, "What is ape to man? A laughing stock, an object of shame and pity. Thus is man to Superman" (Deutsch 2007, 13). Given the subject matter of the film (the three stages of mankind – the ape, present-day man and starchild/superman), there is what appears to be an obvious reference to the Nietzsche work from its opening sequence, which then informs the whole film. The same music accompanies the transition from ape to man, and from man to starchild. Kubrick himself, though, seemed to distance himself from that interpretation of the music, arguing that he simply did not have time to use the score that Alex North had written, and in any case preferred the 'temporary' music he had been using during editing (Ciment 2003, 177). Talking about his attitude to film music in general, Kubrick also indicated his broader view:

But there doesn't seem to be much point in hiring a composer who, however good he may be, is not a Mozart or a Beethoven, when you have such a vast choice of existing orchestral music which includes contemporary and avant-garde work. Doing it this way gives you the opportunity to experiment with the music early in the editing phase, and in some instances to cut the scene to the music.
(Ciment 2003, 153)

The employment of pre-existing music as the soundtrack score, to transfer some of its embedded meaning to the film, is commonplace. Often filmmakers will select pre-existing music precisely because of the connotations it brings with it. For example, Martin Scorsese (1990, 1995, 2004) often uses music as a shorthand way of describing a precise era in films which span many decades, and simultaneously uses particular pieces of music for their associations. The music then fulfils a dual role: efficiently indicating the era through the particular music choices, whilst allusions to the titles of the songs, their lyrics, or their performers provide extra-textual information that can be brought to the filmic text. From the use of Tony Bennett's Rags To Riches (Adler and Ross 1953) in the early part of Goodfellas, to The Rolling Stones' Gimme Shelter (Jagger and Richard 1969) years later, the music adds layers of possible meaning. The title and lyrics of each song can be
seen as a reflection of the state of mind of the protagonist, Henry Hill (played by Ray Liotta): as he begins his mafia career in the mid-1950s, with Rags to Riches ("I know I'd go from rags to riches / If you would only say you care / And though my pocket may be empty, I'd be a millionaire"), and in his eventual decline into cocaine-induced paranoia in the late 1960s, with Gimme Shelter ("Oh, a storm is threat'ning my very life today / If I don't get some shelter / Oh yeah, I'm gonna fade away"). The music employed in this way uses a symbolic link to the type of music, the artist, the performer, the title and so on, but also suggests a link to a particular time, acting as an indication of the era being presented.

Peircean semiotics, when applied to music, has often been hamstrung by the need to incorporate and account for two seemingly intractable necessities: for music to resemble its object, while also being seen as an abstracted art which need not refer to anything outside of itself. By adopting Peirce's later definition of the iconic sign, musical analysis is freed from the necessity to always look for resemblance or similarity, and may instead focus on the music's own properties and characteristics. By doing so, a clearer separation between iconic, indexical and symbolic signifying elements of music can be made, which also enables analysis of the way music is used and functions alone or in other media texts.

Music in film has multiple simultaneous functions, and can be analysed using a semiotic approach. Whilst a focus on the iconic is productive in terms of a film score's musical function, the other filmic functions of the music can benefit from an examination of the music's indexical and symbolic representational characteristics. Whether using entirely original scored music, or reusing existing music and so bringing specific extra-textual knowledge and experience into play, music's ability to convey meaning makes it a powerful tool for filmmakers. Simple synchronisation of music with visual images, or repeated usage of a theme with a character, can serve to signify indexical and symbolic links, which might point to a figurative link to a character, provide simple factual information such as a time frame for the drama, or provide a musical backdrop to the narrative.
Emotion and signs

We can see how signs might convey meaning, but how do signs convey emotions? If we look at emotional expression, we might consider three basic types: natural expression, action expression and artistic expression. Natural expressions are involuntary or spontaneous behaviours, such as facial expressions (smiles, pouts), vocalisations (sobs, hoots) or tone of voice (e.g. higher pitch, brittleness) (Glazer 2017, 190). Action expressions are intentional displays of emotion, which are separate from natural expressions. For example, a speaker might report that they are happy, but it is the voice and facial expression that would express the emotion of happiness. Similarly, there is a difference between expressing an emotion and coping with an emotion, which might actually involve trying to suppress the emotion; here, two behaviours might be at odds with each other. Artistic expressions are the artefacts that convey emotions. These are figurative or
metaphorical representations of emotions. Picasso's Guernica might represent the horror or tragedy of a brutal moment of the Spanish Civil War. J. S. Bach's Toccata or Richard Strauss's fanfare from Also Sprach Zarathustra might each be a musical expression of the awesome, whether secular or religious. While action expression is under the control of the individual, natural expression is involuntary, and may betray the emotion. Indeed, it is impossible for natural expression to be insincere. Natural expressions may be obvious, such as laughing or crying, or they may be indicators of some emotion that is potentially difficult for an observer to gauge.8

In the context of sound design and emotion, much of the work that is done is in recreating or simulating these kinds of emotional signs. Instinctively, picture editors and dialogue editors select takes that contain the most effective balance between the overt message of the dialogue and the 'natural' expressions which contain the other signs that reinforce, nuance or contradict the emotions being deliberately conveyed.
Integrating semiotics with sound theory

In some ways an example par excellence of semiotics, sound design in all its forms demonstrates in practical terms how signs are manipulated, juxtaposed and combined in order to change, create or suggest a meaning. The task now is to see how far this model of sound as a sign fits with existing sound theory. Film sound theory from Rick Altman, Michel Chion, Tomlinson Holman and Walter Murch can be re-examined to determine the potential of Peircean semiotics to integrate other aspects of audiovisual sound theory.
Rick Altman

In "Four and a Half Film Fallacies", Rick Altman describes the myth that cinematic sound is always indexically linked to the original sound as a recording, in the same way that the photographic image in a portrait is an indexical 'death mask' of its subject (Altman 1980a). In fact, the reproduced sound is always achieved through the mediation of the technologies of recording and reproduction. Applying the Peircean model, we can more accurately describe cinematic sound as retaining, reconstructing or creating the illusion of an indexical link, depending on, for example, whether the dialogue used was synchronously recorded, re-recorded as a replacement for the original synchronous recording through ADR, or created as the voice for an animated character. As far as the audience is concerned, each sound representation is indexical, since it appears to be dependent on, and a direct consequence of, its on-screen source; but the process that creates it cannot always be said to be an unmediated and natural consequence. The semiotic model similarly supports the idea of Altman's 'sound ventriloquism', since each sound representation can carry with it not only an indexical link to its object but also iconic properties and symbolic meaning: "Far from being subservient to the image, the sound track uses the illusion of subservience to serve its own ends" (Altman 1980a, 67).
Michel Chion

In cinesemiotic terms, there are a number of filmic codes, including the use of a close-up or a reverse shot, the use of scored music, the use of titles, and so on. Each of these codes needs to be learned in order to function. Whatever we can say about a film or a genre of film, or some other classification of a group of films, styles of photography or characters, we do so on the basis of our experience of those things. Above all of these codes is the super-code of lived reality, our everyday experience. One of the principal codes of reality is that of synchronisation between sound and image. If we are talking with someone, their lip movements match the sound they make. If they put a glass on the table, we hear the sound of the glass on the table matching the visible event.9 This fundamental code can be mimicked and then manipulated in film.

Michel Chion (1994) described as synchresis the process wherein one synchronous sound is replaced with a different sound to create a new effect, whilst retaining a sense of realism since it is still synchronous.10 Altman's rejection of the indexical fallacy is taken further through the description of this process, long common to film sound practice, of replacing or substituting sounds whilst retaining the sense of unity with their apparent visible source. Whilst the source in everyday life is an object that is both visible and sonic, the source in film appears to be the visible object, which is separated from its sonic source. For example, a person speaking can be seen to speak, and the sound of their voice comes from their mouth. In a film the person is visible on the screen, but the sound of their voice comes not from the position of their mouth on the screen; it is instead produced by a loudspeaker, usually positioned at the centre of the screen, which is typically used for most, if not all, dialogue in the film.11

Film sound manipulates our understanding of the lived experience of synchronised sound to allow different meanings to be created. From the blending of two representations, one visible and one aural, the characteristics of one can be transposed or transplanted onto the other. The new sound is selected or modified to create a new iconic relationship, different in some way from the original, and designed to be interpreted in a different way. In the process of synchresis, the two sources are initially separate. The original object (the visual plus the aural, if it exists) is separate from the new sonification. The new sonification is swapped for the original sound to forge a new indexical link, removing any trace of the duality of the objects and creating a oneness with the new visible and audible object, mimicking the synchronisation code of lived reality.

Synchresis can thus be explained by Peirce's concepts of icon and index, whereby sound and image fuse into one single cinematic object. The iconic properties of the new sound are attributed to the visible object, the source of the synchresised sound. The new object of synchresis (the 'synchresised' sound) uses the new iconic properties and newly fashioned indexical link to create a new sound for the object. As Altman (1980b) points out, we tend to hear sounds as coming from their visual source; once synchronised, the image will be seen as producing the sound.
The new sonification will have 'disappeared' into the object it has helped to create, leaving one conceptual object derived from its constituent visual and auditory components. The image object and synchresised sound could be related – one character's voice being replaced by a better recording of his or her voice from an alternative 'take', or by a recording made in the process of ADR. Or the image object and synchresised sound could be completely unrelated until they are synchronised, as with an animated character. Since, in an animation, the original object is virtual rather than filmed, there is no indexical sound recording to be replaced. Instead, a new indexical link is formed to create the single audiovisual object.

Chion also expands on the three listening modes put forward by Schaeffer (1967): causal listening, which consists of listening for information; semantic listening, which involves the interpretation of a message, such as spoken language or another coded sound; and reduced listening, which involves listening to "the traits of the sound itself, independent of its cause and meaning" (Chion 1994, 25–29). These three modes of listening (causal, semantic and reduced) correspond closely with Peirce's concepts of the indexical, symbolic and iconic. It is possible to relate Chion's listening modes to Peirce's division of signifier–object relations (see Table 5.1). Whilst the focus of Chion's listening modes is on the reception of sounds, the Peircean concepts extend the model to include both the creation of the sounds and their reception. By attending to the three modes of listening and the three divisions of sound-signs, we can frame the different ways of manipulating the sounds that form the soundtrack.

The fact that there is such an apparently clear connection between the two models is not, I would argue, luck; nor is it the result of copying. It is more likely because the underlying idea is common sense. Peirce used the terms Firstness, Secondness and Thirdness, which at first glance might appear wilfully obscure, but we can see how they might be applied to the different types of sound signs. Listening to a sound purely in terms of its own characteristics involves no other object. Listening to a sound which is evidence of an object or action means the sound and object are now two things which are linked. A sound which, as a result of continued usage through learning, habit or social convention, brings three things together – the sound, the object, and the recognition of the link to previous usages – thereby creates its meaning.

TABLE 5.1 A comparison of Chion's and Peirce's concepts

Listening mode | Signifier–object relation | Description
Reduced | Icon | The character of the sound itself, rather than the meaning of the sound or the object which makes the sound.
Causal | Index | The reference to the thing, place or object which makes the sound.
Semantic | Symbol | The learned association or rule which is generated as a result of the use of the sound in a particular context.
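Since Table 5.1 is a strict one-to-one correspondence, it can be restated as a simple lookup. The snippet below merely re-expresses the table; the function name and structure are invented for illustration.

```python
# Table 5.1 as a lookup: Chion's listening mode -> Peirce's
# signifier-object relation. A restatement of the table, not an
# extension of either theory.

CHION_TO_PEIRCE = {
    "reduced": "icon",     # the character of the sound itself
    "causal": "index",     # reference to what makes the sound
    "semantic": "symbol",  # learned association or rule
}

def signifier_object_relation(listening_mode: str) -> str:
    """Map a listening mode to its Peircean counterpart."""
    return CHION_TO_PEIRCE[listening_mode.lower()]

assert signifier_object_relation("Reduced") == "icon"
assert signifier_object_relation("Causal") == "index"
assert signifier_object_relation("Semantic") == "symbol"
```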
Walter Murch

Murch's idea of the conceptual gap between what the audience is presented with and what sense is made of it fits with Peirce's description of abductive reasoning, in which the film is designed to be thought-provoking:

That's the key to all film for me – both editorial and sound. You provoke the audience to complete a circle of which you've only drawn a part. Each person being unique, they will complete that in their own way. When they have done that, the wonderful part of it is that they re-project that completion onto the film. They actually are seeing a film that they are, in part, creating – both in terms of juxtaposition of images and, then, juxtaposition of sound versus image and, then, image following sound, and all kinds of those variations.
(Jarrett and Murch 2000, 3)

Therefore, for Murch, a key role of the filmmaker is to lay the foundations for meaning to be created by the individual audience member. An audiovisual 'trail of breadcrumbs' can then be used to allow abduction to take place, which may then be supported or modified by subsequent events, as well as by recollections of previous scenes of the film. These conceptual dots can then be joined together to create links between narrative elements. In semiotic terms, it means leaving sufficient space to allow the audience to formulate the object of the sign and then to create the interpretant, which gives meaning to that sign.
Tomlinson Holman

In Sound for Film and Television, Holman divides the soundtrack into its three functional components: direct narrative, subliminal narrative and grammatical (Holman 2002; 2010). The focus is not on the individual elements of the soundtrack, but rather on their combined function. Whilst the direct narrative functions of the sound are often synchronised sounds, such as dialogue or sync sound effects, what separates direct narrative from the subliminal is that it is meant to be noticed. Subliminal narrative components are designed not to be directly noticed, or not noticed at all. Similarly, the grammatical function of a sound may relate to its treatment as much as to the particular sound being used.

As a direct narrative component, we can say that in addition to the actual language being spoken, which is a symbolic system, character dialogue has both indexical and iconic functions and elements which can be manipulated. It is indexical in that it provides a causal link to the character speaking, just as the visual representation of the character refers to the actual character. It acts as a reference,
or proof of the validity and reality of the film representation, where film and sound work together to portray a recognisable reality. It is also iconic in the sense that the voice has characteristics and particular qualities that are separate from the words being spoken. Subliminal narrative sounds can be understood as sounds whose characteristics allow them to feed into the process of meaning-making in the mind of the audience; they are the objects that may not, or need not, be immediately or fully recognised. Dialogue is, of course, most certainly symbolic in the truest sense: we make sense of dialogue as a literal language, since the words of the language have symbolic meaning.

The various semiotic concepts, as described earlier, can now be applied to the process of creating a soundtrack from the perspective of the sound practitioner. Background sounds, the use of sound metaphors, the addition of sounds, the augmentation of sounds, or the emotional content of sounds can all inform the process of semiosis as meaning is gradually created. The division of both object and interpretant, in the light of external information, allows for meaning to become ascribed to sound signs when they are first used, and when they are subsequently re-introduced.

For Holman's grammatical function of sound, Peirce's different kinds of reasoning (abduction, induction and deduction) can be applied to examine the process of meaning-making from sound signs. The existing codes and conventions of film sound allow us to make sense of what we hear – theme music, score music, narration, character dialogue, continuous backgrounds across picture edits, and so on – so that we can deduce or induce their meaning. Abduction is the process that takes place in the absence of sufficient information. Frequently, for example, in the opening scene of a film, or when first playing a video game, we are presented with an 'incomplete picture': What is the sound? What is its origin? How does the sound I can hear link with what I am seeing? And so on. It is this guesswork that encourages the audience to suggest or create meaning for themselves, which in due course will either be corroborated or will require some amendment. This grammar of sound is partly determined by our past experience of films, which allows us to recognise certain sounds or the sound conventions used – for example, theme music, or the use of sound across a picture edit to indicate continuity. For many sound signs, grammatical meaning is dependent on the image in order to function, for example in POVs, which can be created using sound, pictures or both. Indeed, POV shots and their sound equivalents require a certain amount of learning to be identified as such, since they require a shift in the perspective of the viewer from objective observer to one who shares the viewpoint of a participant.
Analysis example – The Conversation

Francis Ford Coppola's The Conversation (1974) is both an excellent film and a useful illustration of some of these concepts in application. The opening sequence provides examples of some elements of the model and how they may be applied to sound, to sound/image combinations, and to their role in the developing narrative. The opening title shot (Figure 5.1) is a very long, slow zoom of a busy city square
FIGURE 5.1 The Conversation – opening shot of the city square
accompanied by an echoey musical performance, and gradually some very peculiar metallic noise also becomes audible. We do not see the origin of the music, nor hear anything that we can see happening in the (still very long) shot. The picture cuts to a man observing the scene below from the roof of a building (Figures 5.2 and 5.3), with another picture cut showing his POV of the square below through a telescopic sight typical of a sniper's rifle (Figure 5.4). Accompanying this POV shot is the strange metallic sound. Gradually the picture and sound begin to align, and a couple is now both visible (in a long telephoto shot) and audible (with some occasional accompanying metallic distortion). As the peculiar sound appears to be synchronised with the visual images of the people talking, we gradually become aware that there is some link between the two. Not until we see Harry Caul (played by Gene Hackman) climb into a parked van (Figure 5.5), where his associate monitors the conversation through recording equipment while we continue to hear the couple's conversation, do we realise that the 'sniper' on the roof is actually pointing a microphone rather than a rifle. Harry asks his associate how the recording is going, and we hear the results while the images show each microphone position in turn.

We can now apply several of the concepts under discussion to these opening minutes of The Conversation. Examples of icon, index and symbol, abduction, and immediate and dynamical objects and interpretants are all exhibited in order to create an intriguing and narratively inventive opening scene – the creative way the scene is constructed, the deliberate withholding of information, and an implicit understanding of the way sound and image combinations can be set up to deliberately manipulate or obscure the way they will be understood or interpreted. In semiotic terms the metallic sound was initially purely iconic – it contained no indexical link to anything so far in the story – and was also devoid of symbolic meaning. If we were using Chion's terminology, our listening would
FIGURE 5.2 The Conversation – the rooftop position
FIGURE 5.3 The Conversation – the rooftop 'sniper'
FIGURE 5.4 The Conversation – the couple seen from the rooftop sniper POV through the telescopic sight
FIGURE 5.5 The Conversation – Harry Caul goes to the parked van
be reduced, since there was no causal or semantic meaning to the sound. Our understanding of the immediate object of the 'sniper' is modified in light of the new information and becomes a different dynamical object: a man pointing a microphone. The immediate object of the metallic noise, once associated with what is picked up via the long-range microphone, likewise becomes a dynamical object in light of this new information. Similarly, each of these dynamical objects now suggests something different, and so creates a new dynamical interpretant, which in this case is the imperfect result of a covert microphone recording. The initially iconic sound gradually attains an indexical link to its origin (the microphone) and thereafter becomes symbolically meaningful as a surveillance recording of the couple we have been watching. The abductions we make as the scene develops – concerning the origin or meaning of the reverberant music, the strange metallic sound, the 'sniper' and the two people talking – have to be modified in light of new experience or new information. As we are given more information, our assumptions and guesses are either supported or have to be modified in light of collateral experience.
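The trajectory of the metallic sound through this scene can be summarised in one more small sketch. The stage labels below paraphrase the analysis above; none of this is Peirce's own notation, and the code simply prints the progression.

```python
# Hypothetical summary of the metallic sound's shifting status in the
# opening of The Conversation, following the analysis above.

stages = [
    ("opening zoom", "iconic only: a strange metallic quality, no known source"),
    ("rooftop POV through the sight", "still unexplained; abductions begin"),
    ("inside the van", "indexical: evidence of the long-range microphone"),
    ("playback of the recording", "symbolic: a surveillance recording of the couple"),
]

for moment, status in stages:
    print(f"{moment}: {status}")
```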
Summary

Understanding our sonic world, or our wider world in general, involves making sense of our senses. Hearing and the other senses give us percepts – impressions of something through the senses. Whether anything happens with those percepts, and what happens, depends on the ability to combine them to make concepts. Iconic sounds, or percepts, in this terminology remain merely vague impressions. As soon as recognition ties the sound to an object, an event, a source, then it becomes linked to another thing – and becomes a concept. Once other concepts can be linked to it by some kind of pattern, or habit, or rule, this concept can become symbolic.
There has been relatively little adoption of Peirce's semiotic model in media studies or sound studies. Partly, that is explained by the histories of those two disciplines, and by the relative delay in the publication, and translation, of the majority of Peirce's writing. The writings of Peirce's contemporary Ferdinand de Saussure, in French, offered a ready-made semiotic system which was readily adopted by theorists in need of a model. Saussure's model lives on, but is fundamentally different to Peirce's. Though Peirce lived in the age of photography and silent film, sound recording and transmission were new technologies in their infancy, and few of his writings incorporated sound signs as examples. Where sound had previously been ephemeral, once sound recording and reproduction became possible, they each "loosened the bonds of causality and lifted the shadow away from the object" (Murch 2000, 2.1).

We are intimately familiar with incredibly complex symbolic systems – human language, for instance – which humans have developed, and learned to use, understand and manipulate, over millennia. At their core is the simple idea that the simplest of sounds can be relatively meaningless, or can be linked to its source, and may in turn indicate a concept for no other reason than that this is socially agreed or convenient. These concepts can be so naturalised as to appear rigid and obvious, or flexible enough that they can be replaced by better concepts where required. Psychoacoustics has shown us that sound operates in streams, which can be grouped together by the listener in order to separate sources from unwanted sounds. From this ability we can determine, by virtue of our binaural hearing, the location of sounds. Coupled with stereoscopic vision and the ability to move and tilt our heads to check and verify, we can use two independent senses to cross-verify the origin of sounds – a bird in a tree, or a person behind us calling our name.

Semiotics provides a conceptual framework that can be seen as the building blocks of meaning-making through which learning can take place, as well as a comprehensive model that gives a coherent account of the processes of meaning-making, of arriving at an understanding, and of how we come to know things. Audiovisual productions are, in a sense, a perfect place to apply some of these concepts. Whether in a film or a videogame, we are usually presented initially with a blank slate, and gradually we are provided with sounds and images from which we are supposed to make sense. Abduction is in effect the 'hypothesis stage' of creating meaning: we are continually making abductions as to what things might mean. New information is continually presented to our current understanding, and it is either assimilated or requires adaptation: either the information supports the current hypothesis, in which case it is assimilated and strengthens that hypothesis, or the hypothesis must be modified to account for the new information or experience. At the start of a film the opening shot, character, location, event, music, words, and so on, will almost always be interpreted as being significant purely because they are the first things in the film. As our understanding develops, it may be that what we initially do not recognise we later do recognise, or remember from earlier. Here, then, the object may not be recognised or known initially, and so our understanding will be incomplete. Later this
may change, and so our initial hypothetical understanding of what we heard, and what it might mean, develops over time.
Notes

1 For example, Peirce used the sense of a colour, which can exist without specific reference to another. Although this is only partially true, since we can only tell a colour in comparison to other known colours.
2 Thanks to Alec McHoul for this elegant example.
3 'Literal' here is used in the sense that it is non-figurative.
4 Thunder is caused by the explosive expansion of air during a lightning strike, which causes a shock wave that is heard as thunder (see www.bom.gov.au/info/thunder/).
5 For example, Zeus (Greek), Jupiter (Roman), Thor (Norse), Lei Gong (Chinese), Xolotl (Aztec), Set (Egyptian) and Namarrkun (Australian Aboriginal) are all gods of thunder (Spence 2005).
6 In his Prolegomena to an Apology for Pragmaticism [1906] Peirce writes "we have to distinguish the Immediate Object, which is the Object as the Sign itself represents it, and whose Being is thus dependent upon the Representation of it in the Sign, from the Dynamical Object, which is the Reality which by some means contrives to determine the Sign to its Representation" (Peirce, Hartshorne and Weiss 1960, 4.536).
7 'On-screen' music, or 'diegetic' music, is often referred to as 'source' music to distinguish it from 'score', which is typically non-diegetic. The distinction is not clear cut, however, since a good deal of on-screen music is augmented by non-diegetic music, blurring the line between on-screen and off-screen, diegetic and non-diegetic. 'Scored music' refers to music that is non-diegetic; whilst it frequently refers to orchestral music, score here is used to mean any element of the soundtrack that is musical and does not emerge from the narrative story world.
8 Of course, there might be a problem of determining whether someone was laughing or crying, or, if someone is crying, which emotion was being expressed, since crying with relief and crying in pain may appear similar.
9 There is a difference between the speed of light and the speed of sound, such that sound will always lag vision (sound travels at around 340 metres per second compared to light's 3 × 10⁸ metres per second, which for practical purposes is instantaneous). This means we can always expect the sound to arrive slightly after its visible source, and increasingly so as distances become greater.
10 Synchresis is the synchronisation of the new sound to the image, with the synthesis of the two components creating the new audiovisual object. See Appendix A: 'synchresis'.
11 This loudspeaker, which is usually positioned immediately behind the screen, plays back material in what is referred to as the 'centre channel', as opposed to the left and right speaker channels or the surround speaker channels. The centre channel has been adopted almost universally by filmmakers for dialogue, to provide consistency, rather than attempting to match the physical sound source to the visible source on screen.
References

Media referenced

Adler, Richard, and Jerry Ross. 1953. Rags to Riches. His Master's Voice. Song.
Bennett, Tony. 1953. Rags to Riches. Columbia Records. Song.
Coppola, Francis Ford. 1974. The Conversation. Paramount. Motion picture.
Crosby, Bing. 1933. Try a Little Tenderness. Brunswick Records. Song.
Grieg, Edvard. 1876. In the Hall of the Mountain King. From Peer Gynt Suite. Leipzig: C. F. Peters. Orchestral music.
Hudson, Hugh. 1981. Chariots of Fire. 20th Century Fox. Motion picture.
Jagger, Mick, and Keith Richard. 1969. Gimme Shelter. Decca Records/ABKCO. Song.
Kershner, Irvin. 1980. Star Wars: Episode V – The Empire Strikes Back. Lucasfilm/Twentieth Century Fox. Motion picture.
Kubrick, Stanley. 1963. Dr. Strangelove or, How I Learned to Stop Worrying and Love the Bomb. Columbia Pictures. Motion picture.
Kubrick, Stanley. 1968. 2001: A Space Odyssey. Warner Brothers Pictures. Motion picture.
Lang, Fritz. 1931. M. Vereinigte Star-Film GmbH. Motion picture.
Neil, Fred. 1966. Everybody's Talkin'. Carlin Music Corporation. Song.
Newton, John. 1779. Amazing Grace. Song.
Nilsson, Harry. 1969. Everybody's Talkin'. RCA/Victor. Song.
Redding, Otis. 1966. Try a Little Tenderness. Volt/Atco. Song.
Schlesinger, John. 1969. Midnight Cowboy. United Artists. Motion picture.
Scorsese, Martin. 1990. GoodFellas. Warner Bros. Motion picture.
Scorsese, Martin. 1995. Casino. MCA/Universal Pictures. Motion picture.
Scorsese, Martin. 2004. The Aviator. Miramax. Motion picture.
Spielberg, Steven. 1975. Jaws. Universal. Motion picture.
Strauss, Richard. 1896. Also Sprach Zarathustra. Munich: Joseph Aibl. Orchestral tone poem.
Williams, John. 1980. Star Wars: Episode V – The Empire Strikes Back. Lucasfilm/Twentieth Century Fox. Soundtrack.
Woods, Harry M., Jimmy Campbell, and Reginald Connelly. 1933. Try a Little Tenderness. Song.
Other references
Altman, Rick. 1980a. “Four and a Half Film Fallacies.” In Sound Theory, Sound Practice, edited by Rick Altman, 35–45. New York: Routledge.
Altman, Rick. 1980b. “Moving Lips: Cinema as Ventriloquism.” In Cinema/Sound, edited by Rick Altman. New Haven, CT: Yale University Press.
Cavalcanti, Alberto. 1985. “Sound in Films.” In Film Sound: Theory and Practice, edited by Elisabeth Weis and John Belton, 98–111. New York: Columbia University Press.
Chandler, Daniel. 2007. Semiotics: The Basics, 2nd edition. Oxford: Routledge.
Chion, Michel. 1994. Audio-Vision: Sound on Screen. Translated by Claudia Gorbman. New York: Columbia University Press.
Ciment, Michel. 2003. Kubrick. New York: Faber & Faber.
Cobley, Paul, and Litza Jansz. 1999. Introducing Semiotics. Cambridge: Icon Books.
Deutsch, Stephen. 2007. “Editorial.” The Soundtrack 1(1): 3–13.
Glazer, Trip. 2017. “The Semiotics of Emotional Expression.” Transactions of the Charles S. Peirce Society 53(2): 189–215.
Holman, Tomlinson. 2002. Sound for Film and Television, 2nd edition. Boston, MA: Focal Press.
Holman, Tomlinson. 2010. Sound for Film and Television, 3rd edition. Amsterdam; Boston: Elsevier/Focal Press.
Jarrett, M., and W. Murch. 2000. “Sound Doctrine: An Interview with Walter Murch.” Film Quarterly 53(3): 2–11.
Murch, Walter. 2000. “Stretching Sound to Help the Mind See.” New York Times, October 1, (2.1).
Peirce, Charles S., Max Harold Fisch, Edward C. Moore, and Christian J. W. Kloesel. 1982. Writings of Charles S. Peirce: A Chronological Edition. Bloomington: Indiana University Press.
Peirce, Charles S., Charles Hartshorne, and Paul Weiss. 1960. Collected Papers of Charles Sanders Peirce. Cambridge, MA: Belknap.
Potter, Cherry. 1990. Image, Sound, & Story: The Art of Telling in Film. London: Secker & Warburg.
Sack, Kevin, and Gardiner Harris. 2015. “President Obama Eulogizes Charleston Pastor as One Who Understood Grace.” New York Times, June 27. Available online at www.nytimes.com/2015/07/04/arts/obamas-eulogy-which-found-its-place-in-history.html
Schaeffer, Pierre. 1967. Traité Des Objets Musicaux. Paris: Seuil.
Spence, Lewis. 2005. A Dictionary of Non-Classical Mythology. New York: Cosimo.
Yenika-Agbaw, Vivian. 2006. “Capitalism and the Culture of Hate in Granfield’s Amazing Grace: The Story of the Hymn.” Journal of Black Studies 36: 353–361.
6 KING KONG (1933)
Introduction
Released at the height of the Great Depression, King Kong (Cooper and Schoedsack 1933) was a ground-breaking film in several ways. First, it was the first cinematic ‘blockbuster’, returning two million dollars in the year of its release alone, from a production budget of $672,000. Second, its use of stop-motion techniques (by Willis O’Brien and Marcel Delgado) was an enormous advance on the technique pioneered by Georges Méliès. Third, it was among the first feature sound films whose musical score was entirely original.1 Fourth, it was one of the first non-cartoon films for which the character sounds were designed. The processes and approach pioneered in King Kong have been adopted and adapted by generations of filmmakers. The creation of the desired filmic effect through the manipulation of existing recordings bearing some trace of metaphorical meaning has been the de facto standard for much of the work of sound designers ever since. What began as a means of sonic problem-solving is also an enormously productive avenue to explore for sound design, given the wealth of fertile sounds with which to begin.
Sound effects in King Kong
Ever since sound recording became possible and thereby “loosened the bonds of causality and lifted the shadow away from the object” (Murch 2000, 2.1), sound representations have offered a potential wealth of creative opportunities. At the time of the making of King Kong the ability to record sound and later synchronise it to pictures was still in its relative infancy, though those working in animated cinema had arguably done more creative work in this field than those working in live action, since, unlike live-action films, cartoons were not tied to any realistic portrayal.2
Murray Spivack had worked as a symphonic percussionist before joining FBO Pictures (which later became RKO Pictures) in 1929, at the beginning of the industry’s changeover to talking pictures. The sound effects he created for King Kong play an important part in creating believable monster characters, which could only really come to life when they made a sound. Spivack had researched the likely sounds made by the dinosaurs in the film, but was informed that the creatures would not have roared, as they had no vocal cords, and would instead have been more likely to hiss (Goldner and Turner 1976, 190). Departing from an authentic representation, Spivack chose to adopt a more dramatic approach, combining recordings from the RKO sound library of various animals to which he added some vocalisations of his own. In an interview with Popular Science magazine during the making of King Kong, Spivack explained the problem of creating the sound of massive monsters:

“But why,” I asked, “wouldn’t some living animal’s roar have done the trick? A lion, for instance?”
“The trouble with the roars of living animals,” Spivack said, “lies in the fact that audiences recognize them. Even the most terrifying notes would be recognized. Also, the majority of roars are too short. The elephant, with the longest roar of which I know, sustains the sound only eight or nine seconds. Kong’s longest continues for thirty seconds, including six peaks and a three-second tail.
“The triceratops resembles an enlarged boar or a rhinoceros; more like a boar, perhaps, because of the three large horns protruding from his head. This little fellow measures only twenty-five feet long, yet in the picture he bellows like a bull, gores a man and tosses him into some long-forgotten bush—to the accompaniment of a reversed and lengthened elephant roar.”
(Boone 1933, 21)

Spivack described the start of the process, in which his secretary would look over the script to see what sort of sounds would be needed; these were noted on the document “Sound Effects List and Cost Estimate, July 19, 1932” (Goldner and Turner 1976, 195–199). The document lists various materials which would need to be purchased as well as a ‘spotting list’ of effects which would need to be created. Among the sounds required were the relatively straightforward ones, such as wind, the ship’s foghorn, the officer’s whistle, gunshots, airplanes, and so on, as well as the creature voices (roars, hisses, snarls, whimpers and grunts) and other body sounds, such as footsteps and fight sounds, that would need to be created. Initially Spivack contacted the curator of the Carnegie Museum to ask for an opinion on what Kong and the other creatures might have sounded like. With no useful answer forthcoming, Spivack instead used his own reasoning:

I thought, “Well, what the hell am I going to do with this? Now, I’ve got to come up with some sound and I’ve got to come up with something that’s
practical.” I didn’t want any toys, I didn’t want any cartoon sounds. I wanted something that would be believable. So the first thing I did is I went to the zoo. And I gave one of the men who feed the animals a ten dollar bill and I said, “I want to get this lion to roar and I want to get the tiger to roar, and I want to get some sounds out of these animals … the giraffe, the elephants. I wanted to get as much sound as I could, and I had a portable microphone and a portable recorder so I could record all of these sounds.” So he said, “Well, the best time to do that would be at feeding time.” So he said, “I’ll feed them. And then after I feed them, I’ll get them to roar.” So he fed them, went outside the cage and made as though he were going to steal their food. And believe me, they bellowed. They raised hell! And it was the very sound that I wanted.
(Spivack and Degelman 1995, 54–55)

Armed with recordings of lions and tigers, Spivack then had to make them unrecognisable:

I got what I needed, but it wasn’t sufficient, because you could recognize a lion and you could recognize a tiger. Well, I conceived the idea like a phonograph record. If you slow the record down, the pitch drops in direct proportion. So if I wanted to slow it down a whole octave, I knew that if I played it at half-speed the pitch would drop an octave. And the octave would be outside of the human range. There wasn’t any animal that could bellow that low, and of course, if we slowed it down to half-speed it lengthened the sound. So consequently, I knew then that I’ve got the start of this thing. So I took the tiger growl, went through the same process, and played that backwards.… And played the lion growl forwards, also at the same [speed], and got a mixture of those two so that it was an unrecognizable sound but it certainly was a roar. And being that low, it was out of the human range. And it was quite lengthened. So I made all sorts of shadings and all sorts of sizes on that. And then fitted them.
(Spivack and Degelman 1995, 55)

The sound of the pterodactyls originated as a bird recording, while the tyrannosaurus was again made from an original animal recording, slowed and pitched down to hide its origin whilst achieving the desired effect (Spivack and Degelman 1995, 56). Spivack also recalled the difficulties of the sound design process:

You see, I started making the sounds before they had concluded putting the whole picture together. So I had most of the sound that I was going to use earlier. And it was just a matter, then, of synchronizing it to the picture, and that wasn’t the most difficult part of it. The most difficult part of it was trying to get some sound that made some sense, without including a bunch of toys and junk like the average cartoon is.
(Spivack and Degelman 1995, 60)
Getting ‘something that’s practical’ that ‘would be believable’ and ‘that made some sense’ gives us a clue to the underlying conceptual reasoning for the choice. Spivack adopted a realistic (although completely inauthentic) treatment using common-sense understandings of everyday life: first, that a larger animal makes a deeper sound; and second, that the characteristic sounds of animals, such as the screech of a bird or the roar of a big cat, can be used as the basis for new creatures bearing similar characteristics, but on a larger scale. In semiotic terms this can be described as transplanting the recognisable iconic and symbolic sounds of animals, and shifting them sufficiently so that they indicate their new source while retaining their expressive power, thereby creating a new indexical link between sound and image. Once the original sounds had been recorded, the relative levels were mixed on a purpose-made four-channel mixer allowing Spivack to play the newly designed sounds back in combination with the pictures:

I did that for all of the noises that I needed. I didn’t need it for an elevated train. I knew what those sounded like. But any of the noises that were strange noises I worked out. And that’s how I got to the sounds.
(Spivack and Degelman 1995, 57)

Where Kong battles with the Tyrannosaurus rex, each creature needed to be distinguished. Since both creature voices were based on pitched-down and slowed-down recordings, they lost some of their distinguishing features, and so the dinosaur’s voice was created using different elements:

Spivack had mixed an old puma sound track with the steam-like noise from a compressed air machine and added a few screeches from his own throat, uttered a few inches from a microphone in the soundproof room. Exactly in what proportions they were blended I cannot say, for each sequence demands many trials before the mixed noises come through the loud speaker in such volume and of such quality that the small audience of men expert in diagnosing sound declare, “Kong and the tyrannosaurus [sic] must have sounded like that.”
(Boone 1933, 106)

Spivack’s two basic processes – (a) reversing the recordings, and (b) slowing down the recordings – retain some of the iconic qualities of the sounds whilst breaking the indexical link that ties them to their recognisable origin. By reversing a sound, the tonal character is retained but the sense that is made of the sound is lost. For example, a reversed speech recording no longer makes sense (as the symbolic sequencing of the different language elements no longer creates linguistic meaning), but the tonal character of the sound itself (and thus much of its iconic quality) is recognisably human in origin, as well as being recognisably male. Slowing down the recording further shifts the links between the sound and its origin – our natural world tells us that a lower pitch is an indication of the size of the creature which produces it. Younger animals tend to sound higher in pitch than they do as adults.
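The arithmetic behind Spivack’s varispeed trick is simple enough to sketch in code. The following is a minimal illustration only, not a reconstruction of RKO’s actual optical process; it assumes the Python soundfile library, and “lion_roar.wav” is a hypothetical stand-in for the original zoo recording:

```python
# A sketch of Spivack's two manipulations: (a) reversal and
# (b) half-speed playback. "lion_roar.wav" is a hypothetical input.
import soundfile as sf

samples, rate = sf.read("lion_roar.wav")

# (a) Reversal: the tonal (iconic) character survives, but the
# attack-and-decay shape that identifies the animal is scrambled.
reversed_roar = samples[::-1].copy()

# (b) Half-speed playback: pitch drops in direct proportion to
# speed, so writing the samples out at half the original sample
# rate lowers the pitch one octave and doubles the duration.
sf.write("kong_roar.wav", reversed_roar, rate // 2)
```

Each further halving of the speed drops the pitch another octave, trading recognisability for apparent size – precisely the trade Spivack describes.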
Using this approach, natural-sounding creature voices were created which carried the properties borrowed from their original recordings, manipulated so as to hide their origin whilst retaining their metaphorical meaning. To create the sound of Kong beating his chest, Spivack’s team experimented with timpani and with the floor, eventually settling on hitting an assistant’s chest whilst placing the microphone at his back. For other creature sounds Spivack and his assistant Walter G. Elliot adopted similar approaches, involving their own vocalising, to create what later became known as sound design:

Some of the miscellaneous sounds were created with the simple instruments one would find in any studio sound laboratory. One hour Elliot was grunting into a hollow double gourd with a microphone conveniently placed to pick up deep growls and grunts of the triceratops; later in the morning he was half-reclined on the floor, grunting through a water-filled mouth into a megaphone, thus producing the animal’s death gurgles.
(Boone 1933, 106)

The Kong ‘love grunt’ was not recorded using this method; instead Spivack recorded himself grunting through a megaphone, pitching and slowing down the recording as before to take it out of the range which would make it recognisably human in origin (Spivack and Degelman 1995, 56; Cooper et al. 2006). Just as we may often tell the sound of an angry animal from a happy one (such as the sounds of dogs and cats when angry or contented), performing the sound in this way imbues it with the desired characteristics and allows meaning to be drawn from it. In Peircean terms the meaning of the sound (its interpretant) is created because we recognise some of the emotional intent of the sign from our own experience of animal and human sounds. We can therefore create the metaphorical link between the partially recognised iconic sounds of vocalisation (which when synchronised become Kong’s vocalisations, providing an indexical link to him) and a learned symbolic meaning of the sound that can inform our understanding of his emotional state. It is also interesting, and slightly ironic, to note that in the sequences showing Denham’s film crew on the island there is no sound crew accompanying the camera crew. Even within King Kong itself sound is not shown as being a mediated part of the filmmaking process.

It is as if sound in a film has no technological base, involves no work, is natural, and will simply “show up,” just like the spectacle Denham witnesses. Further, the classical paradigm would have us believe that no work has gone into the sound of what we witness. Sound is just there, oozing from the images we see.
(Gorbman 1987, 75)
Sound appears to happen without choice, creativity or design playing a part, and this depiction from 1933 presages the attitude towards the hierarchy between picture and sound that would prevail for years to come.
Music in King Kong

It is not simply a musical piece being played over the top of the images. He is weaving his music around the cuts of the film.
(Peter Jackson on Max Steiner’s score for the original version of King Kong, in Cooper and Schoedsack 2005)
One of the principal differences between scoring for the sound film and scoring for the silent film was the use of silence itself. In the silent film, continuous music was the norm, since discontinuities could not be hidden by other sounds, whereas the sound film had dialogue and sound effects to fill the sonic spaces between sections of music. In the silent film the musical accompaniment, whether pre-existing, improvised or written specifically for the film, could only be loosely synchronised to the images, at best using key moments to signify changes. If the score was to match the action closely it was only practical for a single instrumentalist who could improvise, since improvising musical time to pictures whilst attempting to play in time with other musicians would present enormous difficulty. The synchronous soundtrack allowed for an approach which combined the silent-score practice of underscoring the mood of the scene with the accentuation of key story elements (Buhler, Neumeyer and Deemer 2010, 322). This allowed the composer some degree of flexibility, without having to provide the “structural consummations required by the syntax of Western Music” (Brown 1994, 94). Instead, shorter phrases and stand-alone motifs could be used where necessary, alongside longer, more traditional themes, in a way that would not be possible in a stand-alone piece of music. Max Steiner was one of the foremost composers of the formative years of Hollywood’s conversion to sound. A proponent of non-source (non-diegetic) music, Steiner, in adapting his compositions, helped to define the parameters of film music language and the standard of the Hollywood film soundtrack (Beheim 1993, 122; Witkin 1991, 218). Steiner was a child prodigy in his native Austria, having composed a successful operetta at the age of sixteen which ran for a year in Vienna’s Orpheum Theatre, with several others of his compositions published and played by the Vienna Philharmonic (Goldner and Turner 1976, 191). Steiner’s score for King Kong was entirely original and composed specifically for the film. There is no music at all in the first twenty minutes. Steiner reasoned that since the first part of the film concerned the ongoing Depression it would be more effective to leave it without score (Steiner 1976). The absence of music from the opening New York scenes also makes the characters’ motives difficult to determine, particularly Denham’s in convincing Anne to join his expedition.
Once the film leaves present-day concerns behind and enters the unknown, the soundtrack too leaves behind New York and the reality of the times, moving towards an expressive mode of storytelling. As the ship approaches Skull Island through fog, music is used to create a sense of mystery and of the unknown. The score is used throughout most of the remainder of the film, accompanying around 68 of the film’s remaining 79 minutes (Handzo 1995, 47), within a total running time of approximately 100 minutes.3 Acknowledging the prestige of orchestral music, and the fact that it could serve as an indicator of the scale and status of the film, the studio hired Steiner to compose an entirely original score for King Kong.

It is relatively easy for any composer worth the name to provide music to fit a particular mood, far harder to judge what mood is required by the scene, and how much of it should be provided musically.
(Deutsch 2007, 6)

Steiner adopted the Wagnerian style of leitmotif to associate musical themes with particular characters, rather than simply accentuating a particular mood. The love theme, which is used to signify Anne’s character and later her relationship with Driscoll, is established early in the film. Steiner also punctuated elements of the narrative in a style disparagingly referred to as Mickey-Mousing, which contemporaries often found disagreeable.4 Fellow composer Miklos Rozsa was critical: “I intensely dislike it. One of the reasons I did not want to come to Hollywood was that I thought that was what you had to do here” (Brown 1994, 273). Steiner, though, took the view that film music was there to serve the dramatic content, arguing that to criticise the practice was to miss the point:

‘Mickey Mousing’. It is so darn silly. You think soldiers marching. What are you going to do? The music going one way and the image another? It would drive you nuts. It needs a march. It has to be with marching feet. So if you call it ‘Mickey Mousing’, it’s fine with me.
(Schreibman and Steiner 2004)

The style became much more familiar through its employment in cartoons, from which it gets its name. Music used in this way can also be described in semiotic terms. Accentuation of the dramatic content through the musical soundtrack is used to reinforce the sign already present in the story, creating a metaphorical and symbolic link between music and story. This technique is used extensively throughout the film; for example, when the tribal chief first notices Denham’s party he walks down the steps to the accompaniment of a low bowed string, synchronised to his footsteps, which continues for his entire approach to Denham. Elsewhere in King Kong further examples of Mickey-Mousing abound:
The music ascending as Anne is forced up the steps to be chained to the pillars.
Driscoll ‘sneaking’ past the dead tyrannosaurus accompanied by ‘tiptoeing’ music.
Kong seemingly ‘tickling’ Anne accompanied by the synchronised trills of the woodwind.
Driscoll ascending the rocks to reach Anne.
The elevated train coming down the line.
As well as Mickey-Mousing, the score is also used to highlight or accent other story elements. Where the captain translates the conversation with the island chief, music is used to accent the islanders’ speech, adding a second layer of non-verbal meaning to the foreign dialogue. Similarly, back on board the ship, the love theme for Anne/Driscoll is interrupted by the captain’s question: “Mr Driscoll. Are you on deck?” The music resumes for the shot of their embrace and is interrupted once more when the captain asks “Can you please come up on the bridge?” Simple symbolic devices such as a run of ascending notes may have a very literal link to what is happening on screen. Through synchronisation with narrative elements we are able to use the concept of abduction to interpret the meaning of the film through prior experience of musical themes or conventions, which then inform our reading of the narrative. In semiotic terms, through synchronisation, the music is used to create a symbolic meaning or learned association which informs the interpretation of the broader narrative. For other relatively direct links between music and narrative, such as the ‘sneaking’, ‘tiptoeing’ or ‘tickling’ music, the audience is expected to interpret the music as having a clarifying role in the interpretation of the story. For example, where Kong tickles Anne the musical treatment encourages us to view Kong as inquisitive and gentle, where previously he was fearsome. Steiner’s use of leitmotif mirrors the symbolic linking of music with character, planting the seed for a character to be suggested through music alone. Kong’s main theme, a three-note descending figure, also accompanies the opening titles, while one variation is used for Anne’s motif and another for the main love theme. Jack’s ‘courage’ theme is a four-note figure. The score interweaves some of the themes in sequence, such as the aboriginal dance being interspersed with Anne’s theme, and Jack’s four-note motif accompanying the rescue attempt. The aboriginal theme is also whistled by Denham, accompanied by the (non-diegetic) orchestra. There is no strict delineation between diegetic and non-diegetic music in King Kong, which James Buhler describes as ‘fantastic’:

The music on the island, for instance, is neither diegetic nor nondiegetic. I would locate the fantastic, in fact, in the gap between what we hear and what we see. But without some sort of distinction between diegetic/nondiegetic sound, such a gap isn’t even really audible.
(Buhler et al. 2003, 77)
Even where the film features source music, such as in the ‘aboriginal sacrificial dance’, drums are the only instruments seen, and they are augmented in the score by orchestral instruments. It is a musical shorthand to indicate the unfamiliar. Its only specificity is its non-westernness, and this is accentuated, positioned as it is alongside more western traditions of orchestral musical forms and orchestration. For a general, western audience an emphasis on percussion and repetition was sufficient to indicate a tribal or strange setting. The ‘open fifths’ of the score used for Skull Island are structurally identical to music used to represent Native Americans, a shorthand for the exotic and/or primitive. By 1933 the use of stereotypical music was already well established, with part of Tchaikovsky’s The Nutcracker being used to represent a wide variety of locations from North Africa through to south-east Asia (Buhler, Neumeyer and Deemer 2010, 205–208). The music which accompanies our first sight of Kong is his theme (introduced in the opening credits), which is repeated during Kong’s battles. When Kong puts Anne on the ledge near his Skull Island cliff-top home, Kong’s theme has been replaced by Anne’s. We, like Anne, are therefore guided to be less fearful of Kong and to see him as a protector, and he does indeed save her from a giant snake. The Empire State Building finale brings back the love theme which previously was used for Anne, or for Anne and Driscoll. Used here, it now indicates the relationship between Kong and Anne. As the aeroplanes move in, we see Kong as he holds Anne for what will be the last time. Through the use of Anne’s rather than Kong’s theme, the score indicates Kong no longer as threat but as protector; his character has moved from savage beast to loving protector. After Denham’s final line, “It was beauty killed the beast”, we hear the Kong theme repeated, but this time as a mournful coda as he lies dead on the pavement. In semiotic terms the symbolic identifications which are created by the music allow meaning to be made by the audience. On viewing the film for the first time the individual pieces of music could not be recognised, since the score was original. Instead, some of the musical themes used are recognisable or have qualities which can be recognised, and which have been used before to represent particular places or emotions. These pieces of music act as sound symbols, which are learned associations or rules “which represent their objects, independently alike of any resemblance or any real connection, because dispositions or factitious habits of their interpreters insure their being so understood” (Peirce 1998, 460–461). Steiner’s score for King Kong illustrates many of the semiotic concepts that apply to film music. The use of Wagnerian leitmotif establishes a symbolic link between character and musical theme, in which iconic and symbolic representations are created from particular orchestration and from culturally accepted representations of places, peoples, actions and moods. Close synchronisation between music and action attempts to create a pseudo-causal (indexical) link between film and score, where the score responds and reacts to the action rather than pre-empting or describing it. The synchronisation also encourages or demands that the music be related to the narrative in some symbolic way, and our prior knowledge or familiarity with
musical themes and orchestrations allows us to bring extra-textual knowledge to aid in our interpretation. Having established symbolic links between score and narrative, we are invited to interpret meaning through abduction, such as empathising with Kong’s motives and his relationship with Anne.
Summary
King Kong was a resounding success with audiences on its release in 1933 and remains a hugely influential film for filmmakers. It set the template for blockbuster cinema by creating a fantastic world never seen before. The methods employed were ground-breaking, not least its use of music and sound as genuine creative partners in its success. There is a direct line from Murray Spivack’s work in transplanting sounds of the normal world to serve as the creature voices of King Kong to Ben Burtt’s sound design for Star Wars (Lucas 1977). Similarly, Max Steiner’s score created the blueprint for cinematic scoring which has been used on countless films since. Using some of the tools of semiotics we can also examine the processes, as well as the end results, of the filmmakers who worked on the soundtrack. The starting point for the sound design of the monsters could not be the real creatures, since they either existed only as fantasy or were long since extinct. This forced, or at least encouraged, a way of working where the desired result dictated the sound choices. If the need was for a massive, fearsome, yet believable monster, then sounds bearing those characteristics could be sought out and manipulated where necessary to achieve the goal. In the musical score, Max Steiner was given licence to create a score that interwove with the story and the characters. Just as importantly, King Kong illustrates the collaborative nature of the creative processes that go into creating the soundtrack. Whether King Kong’s soundtrack is viewed in terms of cinematic art, or sound design, or simply sonic problem-solving, the techniques that were developed and the choices which were made to create its extraordinary characters and their world have proven a fruitful source of inspiration for generations of sound designers and composers.
Notes
1 Original music had been used in conjunction with existing music in previous sound films, and some original scores had been written for silent films such as The Fall of a Nation (1916) and Battleship Potemkin (1925) (see Buhler, Neumeyer and Deemer 2010, 274–275). Steiner had also written original music for three films preceding King Kong: Symphony of Six Million (1932), Bird of Paradise (1932) and The Most Dangerous Game (1932).
2 For example, see Steamboat Willie (Iwerks 1928) or the Merrie Melodies and Looney Tunes series, which began in 1929, and Sinkin’ in the Bathtub (Harman and Ising 1930), which has a wealth of expressive sound effects and music.
3 A three-minute overture precedes the opening titles.
4 The term Mickey-Mousing is usually attributed to King Kong executive producer David O. Selznick.
References

Media referenced
Cooper, Merian C., and Ernest B. Schoedsack. 1933. King Kong. RKO Radio Pictures. Motion picture.
Cooper, Merian C., and Ernest B. Schoedsack. 2005. King Kong – DVD Special Edition disc two. RKO Radio Pictures. DVD.
Cooper, Merian C., Fay Wray, Ernest B. Schoedsack, and Robert Armstrong. 2006. The Passion and the Fury – Special Feature on King Kong disc two. Warner Home Video.
Harman, Hugh, and Rudolph Ising. 1930. Sinkin’ in the Bathtub. Warner Brothers. Short animated film.
Iwerks, Ub. 1928. Steamboat Willie. Walt Disney. Short animated film.
Lucas, George. 1977. Star Wars. Twentieth Century-Fox. Motion picture.
Other references
Beheim, Eric. 1993. “Review of ‘Film Music 1’ by Clifford McCarty.” American Music 11(1): 121–123.
Boone, Andrew R. 1933. “Prehistoric Monsters Roar and Hiss for Sound Film.” Popular Science, April.
Brown, Royal S. 1994. Overtones and Undertones: Reading Film Music. Berkeley: University of California Press.
Buhler, James, Anahid Kassabian, David Neumeyer, and Robynn Stilwell. 2003. “Panel Discussion on Film Sound/Film Music.” The Velvet Light Trap 51: 73–91.
Buhler, James, David Neumeyer, and Rob Deemer. 2010. Hearing the Movies: Music and Sound in Film History. New York: Oxford University Press.
Deutsch, Stephen. 2007. “Editorial.” The Soundtrack 1(1): 3–13.
Goldner, Orville, and George E. Turner. 1976. The Making of King Kong. New York: Ballantine Books.
Gorbman, Claudia. 1987. “Classical Hollywood Practice.” In Unheard Melodies: Narrative Film Music, 70–98. Bloomington, IN: Indiana University Press.
Handzo, Steven. 1995. “Sound and music in the movies: The golden age of film music.” Cineaste 21(1–2): 46.
Murch, Walter. 2000. “Stretching Sound to Help the Mind See.” New York Times, October 1.
Peirce, Charles Sanders. 1998. The Essential Peirce, Volume 2: Selected Philosophical Writings (1893–1913). Bloomington, IN: Indiana University Press.
Schreibman, Myrl A., and Max Steiner. 2004. “On Gone with the Wind, Selznick, and the Art of ‘Mickey Mousing’: An Interview with Max Steiner.” Journal of Film and Video 56(1): 41–51.
Spivack, Murray, and Charles Degelman. 1995. An Oral History with Murray Spivack. Beverly Hills: Academy of Motion Picture Arts and Sciences – Center for Motion Picture Study.
Steiner, Fred. 1976. King Kong: Entr’acte. Liner notes from LP.
Witkin, Tamar. 1991. “Review of ‘The Composer in Hollywood’ (1990) by Christopher Palmer.” The Musical Quarterly 75(2): 218–219.
7 NO COUNTRY FOR OLD MEN
Introduction
No Country for Old Men (Coen & Coen 2007) was the recipient of several awards, including Oscars for directing, film and writing, with nominations for sound and sound editing. It also won a Cinema Audio Society award for sound mixing, two Motion Picture Sound Editors awards and a BAFTA nomination for best sound. It is also, perhaps, a rare film in that the soundtrack was noticed and singled out for praise in reviews:

[T]he leading character in this reverberating movie is silence, save for the sights and sounds of air and breath.… Silence deepens the horror of the drug-deal massacre that the lone hunter Moss first glimpses through his binoculars – he spies scuttled pickup trucks, sprawled bodies, even a slain and rotting dog. (More so than that of any of his none-too-blabby co-stars, most of Brolin’s work is wordless.) Silence heightens the exquisite tension as Chigurh tracks Moss, on the run, from motel to motel. (Silence is broken by the beep on Chigurh’s radar of a certain tracking transponder that chirps a warning of impending mayhem.) Silence accompanies the mournful sheriff as he drives his Texas highways, and silence is what hangs in the air after Chigurh raises his grotesque, sound-muffling weapon to snuff out one life and then another, cold as hell.
(Schwarzbaum 2007)

And particularly in this article from the New York Times:

In one scene a man sits in a dark hotel room as his pursuer walks down the corridor outside. You hear the creak of floorboards and the beeping of a transponder, and see the shadows of the hunter’s feet in the sliver of light
under the door. The footsteps move away, and the next sound is the faint squeak of the light bulb in the hall being unscrewed. The silence and the slowness awaken your senses and quiet your breathing, as by the simplest cinematic means – Look! Listen! Hush! – your attention is completely and ecstatically absorbed. You won’t believe what happens next, even though you know it’s coming.
(Scott 2007)

Skip Lievsay, who has worked on each of the Coen brothers’ films, describes the process of creating the soundtrack for the film:

“Suspense thrillers in Hollywood are traditionally done almost entirely with music,” he said. “The idea here was to remove the safety net that lets the audience feel like they know what’s going to happen. I think it makes the movie much more suspenseful. You’re not guided by the score and so you lose that comfort zone.”
(Lim 2008)

Carter Burwell, another long-time Coen collaborator, agreed with the direction for the soundtrack and created a very restrained, minimalist score. The music in the film is practically ‘inaudible’, in Claudia Gorbman’s sense of the word, in that for an audience it is imperceptible and is not consciously heard.1

Whenever I played a traditional musical instrument against No Country for Old Men, the presence of music made the film less real; it relaxed the tension, diminished the experience. Eventually the music in No Country was reduced to steady-state tones generated with sine waves and singing bowls that had no attack whatsoever. These tones would fade in under sound effects like wind or automobile tones, then shift in pitch and volume for dramatic effect, but like a frog in a slowly warming pot, the audience’s ears were unaware of what was going on. Skip Lievsay, the sound designer, collaborated with me in matching the pitch of the effects to the music.
(Burwell 2013, 169)

With so few elements, any that are added become noticeable, and each affects the other. The relatively quiet and uncluttered soundtrack lends more meaning to the sounds that are included, since they are thereby highlighted. Rather than using musical score to accentuate a feeling that is already created by other means, the silence surrounding the sparse sounds accentuates the inherent drama of the situation.
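Burwell’s description of attack-free, slowly gliding tones can be sketched in code. The following is only an illustration of the kind of tone he describes, not his actual materials (which also included singing bowls); the frequencies, durations and levels are invented for the example, and the Python numpy and soundfile libraries are assumed:

```python
# A sketch of a steady-state drone with "no attack whatsoever":
# a sine tone that glides slowly in pitch and fades in and out so
# gradually that it can sit unnoticed beneath the sound effects.
import numpy as np
import soundfile as sf

rate = 48000
duration = 20.0  # seconds
t = np.linspace(0.0, duration, int(rate * duration), endpoint=False)

# A slow glide from 110 Hz to 123 Hz: integrating the changing
# frequency gives a click-free phase ramp.
freq = np.linspace(110.0, 123.0, t.size)
phase = 2.0 * np.pi * np.cumsum(freq) / rate
tone = np.sin(phase)

# Eight-second fades at each end, and a low overall level, so the
# tone can slip in and out under wind or automobile sounds.
envelope = np.clip(np.minimum(t / 8.0, (duration - t) / 8.0), 0.0, 1.0)
tone = 0.2 * envelope * tone

sf.write("drone.wav", tone.astype(np.float32), rate)
```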
For Walter Murch, silence is the ultimate metaphorical sound:

If you can get the film to a place with no sound where there should be sound, the audience will crowd that silence with sounds and feelings of their own making, and they will, individually, answer the question of, “Why is it quiet?” If the slope to silence is at the right angle, you will get the audience to a strange and wonderful place where the film becomes their own creation in a way that is deeper than any other.
(Murch 2005)

Silence is used, for example, when Llewellyn Moss (played by Josh Brolin) surveys the scene of the botched drug deal, as he takes in what he has come across, as we do; and also later where he escapes by swimming down the river, pursued by the dog, a sequence which again is wholly without music. Sonically it contains only the natural sounds of the river, and the exertions of both man and dog, until the moment Moss shoots it.

Here we will focus on some key ideas and their development using a semiotic analysis. In this way we can look at some of the sounds in the film as they are used as signs. If we are to take every sound as a sign then we must also take every image as a sign; the whole range of signs is interrelated, and the context of each sign affects its interpretation. In a film such as No Country for Old Men, techniques of providing or withholding information, providing ambiguous information, and encouraging the audience to infer meaning and then playing on the audience’s expectations are used to immerse the audience in the narrative. What we know, what we have seen and heard, what we think we know, and what we think is happening or will happen are all used to allow the viewer/listener to create the meaning from the narrative for themselves as they watch and listen. As a consequence, at several stages of the film there are information gaps which need to be filled in order to know what is happening and therefore what is about to happen.

Early in the film some of the important sound elements are introduced. The opening sequences of a film do not simply introduce themes, characters or locations; they also show the audience how the soundtrack interacts with the images. The first sequences which show each of the three main characters introduce the characters themselves and also illustrate something of the approach taken to the soundtrack. First, Anton Chigurh’s chosen murder weapon, a gas-powered cattle gun, is shown to indicate what is about to happen through the use of relatively innocuous images and sounds: a bottle of compressed air, and the slight hissing sound it makes when turned on. Having established the air cylinder and the sound of the valve being opened prior to its use, thereafter simply seeing the tank being carried or put down deliberately, or hearing the sound of the valve being turned on, is sufficient to signify its imminent use as a weapon. The sight of the bottle and the hissing sound of the valve are thus established as being related to the bottle’s use as a murder weapon. In semiotic terms the sound of hissing in particular is now used symbolically to suggest an imminent attack. Second, the character dialogue, especially that of Anton Chigurh (played by Javier Bardem), is used both for its language meaning and for its particular delivery, as well as being a signifier of a course of action. Once introduced, the stylised dialogue in which Chigurh repeats his question, and its immediate aftermath, become associated and are linked thereafter in the film: when we subsequently hear Chigurh repeating a
question we then expect a violent outcome. As with the sound of the gas bottle, in semiotic terms the repeated dialogue suggests a future course of action based on what we have seen and heard and how the two are linked. Third, the withholding of information, both visual and aural, forces the audience to make meaning from the sounds and images that they are given. Rather than being given a clear cause and effect, the audience is positioned to propose links (or, in Peirce’s term, abductions) from the signs they are given. This again aligns the audience’s perspective with that of the characters. We, like the characters, are trying to understand what is happening from the information we have available to us.
The sounds of the gas bottle and cattle gun
We first see the gas bottle being put into the police car by the deputy sheriff who has arrested Chigurh, and later the deputy describes the arrest to his sheriff: “He had an oxygen tank for emphysema or something and a hose that ran down the sleeve” (Figure 7.1). After killing the deputy, Chigurh picks up the gas bottle and its attachment, which makes a distinctive ‘ping’ as it taps against the chair. Though quiet, the sound of the gas bottle is clear and isolated in the soundtrack, which is uncluttered and noticeably free of other sonic distraction, including music. Driving a stolen police car, Chigurh pulls over a man on the highway and approaches him with the bottle at his side before killing him. When Chigurh arrives at Moss’s trailer we see a shot of the gas bottle being carried up the steps to the trailer. We see only the boots of the man walking, but the sight of the cylinder and hose indicates that it is Chigurh. We see the gas bottle being turned on and hear the accompanying ‘hissing’ sound. We still do not see the character’s face, since he is suggested by the sight of the bottle alone and by his distinctive boots (Figure 7.2). The cattle gun is then used to blow out the door lock. Elegant and efficient filmmaking is achieved through the establishing of images and sound signs whose significance is enhanced by their relative scarcity. Simply seeing and hearing the gas tap being turned on is sufficient to create a sense of tension and foreknowledge in the viewer/listener. We are primed to expect a violent confrontation from a visual shot of a gas bottle and the sound of the gentle hissing of its tap. By highlighting visual elements and the sounds of the device – the hissing of the tap, and the ping of the cylinder – our attention is drawn to the device itself as it becomes a sound-sign of imminent violence.
FIGURE 7.1 No Country for Old Men – the gas gun
FIGURE 7.2 No Country for Old Men – approaching the door

The use of repeated dialogue
SCENE: ROADSIDE – Before Chigurh’s first use of the cattle gun his dialogue sets the tone for several other meetings with individuals: “Step out of the car, please” and “I need you to step out of the car, please, sir”. The repetition becomes a sound sign which precedes his use of the cattle gun. The repeated dialogue pre-empts the first murder and thereafter acts as a sign of an impending one. The symbolic link between the character’s speech and the event which follows it might not, by itself, be strong, but it becomes strong because these two themes, once established, are repeated during the film. The character reiterates questions throughout the film, as he uses the cattle gun or other weapons several times.
SCENE: GAS STATION – By now Chigurh has killed two people: the policeman and the man he pulled over in the car. At the gas station Chigurh engages the attendant in conversation, eventually convincing him to call on a coin toss: “Call it”, “Just call it” and “You need to call it. I can’t call it for you”. Eventually, the attendant calls correctly, and Chigurh leaves. We are set up to believe we know what will happen, but since the man guesses correctly Chigurh does not carry out what we assumed he would. The abduction is not fully correct, but neither is it wrong. We have learned a little more about Chigurh’s character, though our assumption about the link between the repetitive dialogue and an attack is modified, as the attack did not take place on this occasion.
SCENE: CHIGURH AT TRAILER OFFICE – Later, Chigurh visits the woman in the trailer park office looking for Moss. Chigurh repeatedly asks the woman in the office: “Where does he work?” (Figure 7.3). Exasperated, the woman replies “Did you not hear me? We can’t give out no information.” At this point, the
repeated dialogue mirrors what we have seen earlier and we now have a fair idea of Chigurh’s intended course of action. Although the woman is unaware of it, we can make the abduction based on the previously established link between the repeated questions and the likelihood of an attack. Instead, though, Chigurh is interrupted by the sound of a toilet flush (Figure 7.4), and while the woman is oblivious to the importance of the sound, he leaves without another word. The toilet flush is an indication that someone else is nearby, and so breaks the sequence that had been established. Again the expected course of action is foiled, and again our hypothesis remains, though again it needs to be modified. The process of meaning-making – and, as in this case, of modifying an abduction in light of further evidence – requires the audience to actively participate in the production of meaning rather than being a passive recipient.

FIGURE 7.3 No Country for Old Men – “Where does he work?”
FIGURE 7.4 No Country for Old Men – Chigurh hears the toilet flush
Withholding information
We have seen how Chigurh is pinpointing the location of the money and of Moss. We see and hear Chigurh’s tracking receiver (whose transmitter is hidden inside the briefcase of money), which is flashing and beeping faster as he approaches the motel in a car. The sequence unfolds as follows:
HOTEL ROOM – Moss notices the tracking transmitter buried in the money he has been carrying as he sits in a room in a new hotel. He now knows how he has been traced. Hearing a slight noise, he rings down to the desk clerk he had just spoken to – we clearly hear the dial tone and ringing tone from the phone, as well as the sound of a phone ringing downstairs. The call goes unanswered and we hear a distant noise from outside the room. Moss sits on the bed pointing his gun at the door, and turns off the lights. We begin to hear approaching footsteps and the faint, quickening beeps of the tracking receiver, and then a click as it is switched off. Moss notices the shadow of something outside his door (Figure 7.5) – most likely Chigurh’s legs. We see a close-up of Moss’s gun and hear the sound of it being cocked. After a moment’s pause we see the shadow move away (Figure 7.6), and then hear a very faint squeak as the corridor light goes out. In the ensuing darkness and silence, we see that Moss is puzzled. He, and we the audience, have a second or two to work out that the squeaking sound was the corridor light bulb being unscrewed. The lock is blown out and hits Moss, who fires through the still-closed door and wall, and then escapes through the window, pursued by Chigurh.
REAR OF HOTEL – Moss lands on the ground outside the hotel, picks up his gun and the case of money and re-enters the hotel.
HOTEL LOBBY – Moss walks back through the reception area of the hotel and sees the cat’s milk overturned and the desk, previously occupied by the clerk, now empty.

FIGURE 7.5 No Country for Old Men – the shadow outside the hotel room door
FIGURE 7.6 No Country for Old Men – the shadow moves away

There are several ways that this sequence could have been written, filmed and edited. It would be understandable to be presented with Chigurh’s entry to the hotel, his killing of the clerk, the quickening pace of the tracking receiver as he slowly ascends the stairs, the picture cutting between Moss and his pursuer as he makes his way ever closer to the room. Tension could be built in such a scene either through source music or score. Instead, the sequence takes place inside Moss’s hotel room and avoids showing Chigurh’s approach; indeed, Chigurh is not shown at all. By limiting
the sights and sounds to only that which Moss can see and hear, the audience effectively experiences the film from Moss’s position in real time, seeing and hearing only what he sees and hears. We see only the sliver of light and the shadows under the door, hearing only what he hears from his hotel room: the stillness and small sound cues such as the telephone, the beeping, the silence after the beeping stops, and the squeaking of the light bulb being removed. Through the subtle use of simple sounds, important consequences can be inferred. We are given an opportunity to notice the sounds, understand their origin, and then interpret their meaning in that particular context, both from the perspective of the character and from our knowledge of the film up until that point.
Moss’s telephone earpiece tone – we hear that his call is not answered.
A distant telephone ringing – we presume that it was the lobby phone he was calling.
Footsteps – we hear slow footsteps approaching the room.
Beeps – we hear the quiet but quickening sound of beeps that suggest to us that Chigurh has located Moss.
Squeak – it is unclear initially what this means, though when we see the light go out we have a moment to realise what is happening.
In our model of the sign neither the object nor its meaning is necessarily constant. Noticing a sound is merely the first step; recognising the origin of the sound is the next. Understanding the meaning of that sound sign is a process in which context is crucial and abductive thinking is required to create an immediate interpretant, which can be tested by induction. The sequence progresses with an accumulation of snippets of information, from which we make a guess at the source of each sound, and then at its meaning. We hear a distant thud. We see Moss dial a number. It is a short number, so we might guess that it is an internal hotel number. We hear it ringing out through the phone’s earpiece, so we know it is not being answered. We also hear a distant ringing – perhaps it is downstairs in the lobby? Why would the man at the desk not pick up the phone? By manipulating the available sounds and available story information
through POV, framing, editing and mixing, the audience is required to engage with the film in order for meaning to be created.
Summary
The use of sound signs to represent the object allows the object to be represented without need for a visual representation: the sound of the object stands in for the object. For example, the sound of the hiss stands in for the gas bottle, and thus the cattle gun. The use of repeated dialogue followed by violence suggests a sequential link. Once this sequence is repeated, a symbolic link can be made with the action that follows it. The first occasion does not create a symbolic link by itself, which is in a sense Peirce’s idea of Firstness, of possibility. By the second use of repeated dialogue, we are reminded of the first and can make an abduction that a similar course of action will take place, which is Peirce’s idea of Secondness, or relation between the two. By the third use of repeated dialogue we might guess that a pattern has been established, which is Peirce’s idea of Thirdness, a rule or habit. In both cases, the guesses that we are invited to make, and then to test when we next come across the pattern, are only partially correct. We can guess the intent of Chigurh’s character but not his actions. By withholding information, the sounds which remain are relatively simple, but they may have very particular contextual meaning. The sound-signs’ objects may be simple, but their interpretation conveys something other than the simple sound object itself. The sound of a toilet flushing is not merely a literal sign that a toilet has been flushed, but a sign that a murder has been averted, though the potential victim is unaware of its significance. A vague pair of shadows that pause, then move and disappear, leaving only a sliver of light beneath a door, which itself then disappears as the light source is removed, indicates not simply an object outside the door, but a killer outside the door. The squeaking of the light bulb being unscrewed is a sound-signifier, but it is not clear what the source of this sound is, or what it means, until a few moments after the light goes out. The deliberate withholding of information forces more significance onto what remains, as well as encouraging a filling-in of the conceptual gaps. Here the soundtrack, in combination with each of the other filmmaking elements (script, picture editing, cinematography, acting, direction), provides sufficient information but no more. It allows the audience to attempt to create a meaning, and to have that meaning challenged or modified. It is worth noting that the hotel room scene in the finished film precisely follows the way it was scripted.2 Some sounds are described in deliberately vague terms, such as the moment when Moss hears a sound before he rings down to the lobby:
From somewhere, a dull chug. The sound is hard to read – a compressor going on, a door thud, maybe something else. The sound has brought Moss’s look up. He sits listening. No further sound.
(Coen and Coen 2006, 100–101)

The sequence was written and designed to work as it does, using sound and image together to create the narrative. Only through a sympathetic orchestration of all the elements of sound and image, and of their editing, can each work with the other. The entire sequence is shown from the perspective of Moss inside the hotel room. It contains no dialogue. As elegant as it is efficient, the soundtrack contains relatively few sounds, but that relative rarity bestows meaning. In the moment immediately after Chigurh removes the light bulb, in the silence that surrounds the characters, the sound of the gas bottle’s hiss can be imagined. It need not even be there, and it is as close to Walter Murch’s idea of the ideal sound as is possible to attain. We may have just heard it, or may think we have heard it, and, in a sense, it does not matter – the mere suggestion is enough.
Notes
1 Claudia Gorbman describes seven principles of composition, mixing and editing in ‘Classical Hollywood Practice’, the first being invisibility and the second inaudibility: “Music is not meant to be heard consciously. As such it should subordinate itself to dialogue, to visuals – i.e. to the primary vehicles of the narrative” (Gorbman 1987, 73).
2 See the No Country for Old Men draft script (Coen and Coen 2006).
References

Media referenced
Coen, Joel, and Ethan Coen. 2007. No Country for Old Men. Miramax. Motion picture.
Other references
Burwell, Carter. 2013. “No Country for Old Music.” In The Oxford Handbook of New Audiovisual Aesthetics, edited by John Richardson, Claudia Gorbman and Carol Vernallis, 168–170. New York; Oxford: Oxford University Press.
Coen, Ethan, and Joel Coen. 2005. “No Country for Old Men” (draft script, 28 November). Available online at www.simplyscripts.com/n.html
Coen, Ethan, and Joel Coen. 2006. “No Country for Old Men” (draft script, 18 May). Available online at http://finearts.uvic.ca/writing/websites/writ218/screenplays/award_winning/no_country_for_old_men.pdf
Gorbman, Claudia. 1987. “Classical Hollywood Practice.” In Unheard Melodies: Narrative Film Music, 70–98. Bloomington, IN: Indiana University Press.
Lim, Dennis. 2008. “Exploiting Sound, Exploring Silence.” New York Times, 6 January. Available online at www.nytimes.com/2008/01/06/movies/awardsseason/06lim.html?_r=1&oref=slogin
Murch, Walter. 2005. “A Conversation w/ Walter Murch.” The Transom Review 5(1).
Schwarzbaum, Lisa. 2007. “No Country for Old Men (review).” Entertainment Weekly, 7 November.
Scott, A. O. 2007. “He Found a Bundle of Money, and Now There’s Hell to Pay.” New York Times, 9 September.
8 SOUND IN NON-FICTION
Sound and non-fiction

From the earliest days of documentary cinema, techniques were in evidence that we would now associate principally with fiction. In 1922 Robert Flaherty made what is often considered to be the first documentary film, Nanook of the North. The film follows Nanook, an Inuit man in the Ungava Peninsula of eastern Canada, going about his traditional work in a very harsh landscape, surviving by hunting and fishing, and constructing a traditional igloo home. Whilst Nanook of the North is a ground-breaking film in many respects, it has been the touchstone of debate around documentary ethics ever since. Nanook employs several recognisable documentary styles: “re-enactment, staging, observational mode, ethnography, exploration, poetic experimental film, participatory mode, fiction, portrait, travelogue, landscape, adventure film, nature film, hybrid forms combining fiction and documentary” (Zimmermann and Zimmermann Auyash 2015). The film was made as a collaborative effort, with Flaherty spending sixteen months living with the Inuit, and follows Nanook and his family through the seasons. During filming, whether at Flaherty’s request or at their own instigation, the participants used old-fashioned harpoons even though they already had rifles for hunting, and the same event might be filmed multiple times in order to cut between angles. Flaherty’s concern was telling the story of the Inuit through his focus on Nanook.

For Nanook of the North and others that came later there were few hard and fast rules concerning documentary practices, and indeed there was, as Daniel Levi puts it, “a lot of two way traffic across a weak ontological frontier” (in Juhasz and Lerner 2006). Flaherty’s motives in departing from a strictly factual representation to one in which he instead presented the Inuit as he imagined them to be, or wished to present them, are still debated. Documentarians and makers of various other forms of non-fiction sought to navigate what is actually meant by the
terms ‘documentary’ or ‘non-fiction’, and these questions were soon made even more complex by the addition of sound.
Newsreels

Newsreels were the first widespread non-fiction genre, beginning over a hundred years ago when Pathé introduced The Pathé Journal in France. An English version started soon afterward in the UK, with Pathé’s Animated Gazette edition number one appearing in June 1910 and an American-produced edition appearing the following year. The silent newsreels continued to appear in theatres until the introduction of sound in 1927. Silent newsreels such as the industry-leading weekly Kinograms, which had been selected by New York’s Capitol Theatre as their exclusive news weekly, went the way of silent films and had gone by 1931 (Fielding 1972, 84).

The advent of sound fundamentally changed the types of programs that were possible. Fox Movietone shifted its sound newsreels’ focus to broadly three categories to take advantage of sound: “talking celebrities, soundscapes and musical performances” (Deaville 2015, 47). For regular news items sound made infrequent incursions. One rare occurrence was an assassination attempt on the Italian Prince Umberto II whilst on a visit to Brussels to lay a wreath at the tomb of the Unknown Belgian Soldier. The chaos of the immediate aftermath was captured on camera and in the sound recording: a band playing, a gunshot followed by screams, the band continuing to play, unaware of what had happened (Deaville 2015, 48). This witnessing of real life through actuality recording was almost a side effect of the availability of sound recording.

As newsreels developed so did the practices of sound. The soundtrack of archival non-fiction films routinely contains dialogue, music and sound effects that were recorded after (or separately from) filming. Whilst in a literal sense the newsreel is a very obvious archive, there are some caveats regarding both sound and image. Some of the earliest uses of sound in non-fiction film are examples of the addition of sound to image to present a reality which never existed in the first place. In many cases the techniques used in newsreel films predate their widespread use in documentary. In a 1946 article on newsreel sound, Warren M. McGrath described the techniques and equipment that had been developed to create the newsreel soundtracks:

It was inevitable that newsreel sound should pass through an era of growing pains before settling down to a specific treatment acceptable to all major newsreel producing companies. Since early 1932, the commentary type of newsreel story has increased in popularity until today it is accepted as the most lucid manner in which to present current events. This, of course, has resulted in a steady decrease in the amount of natural sound recorded in the field and thus the work of the newsreel synchronizer has become increasingly important. It is through his efforts that commentary mixed with music and sound effects, and an occasional interpolation of natural sound, results in a pleasing
composite sound track at a level constant throughout the reel and unvarying from week to week.
(McGrath 1946, 371–372)

Whether for reasons of convenience, economic/narrative necessity, or aesthetics, the sound which accompanies newsreels is often some way from the authentic record that is expected of its visual counterpart. It was a deliberate and designed accompaniment to the visual image. At times it could include sound which could not be recorded at the time but which was added in post-production. In Air Battles Over Europe (British Pathé 1944) there are several sequences of aircraft in battle which would have presented enormous, if not insurmountable, difficulty for the recording of synchronous sound, yet which nevertheless contain the sounds of aircraft along with music and narration. As the title suggests, it features incredible footage of actual ‘dogfights’ between fighter aircraft – the documentary commentary revealing that ‘concealed in cameras built into the wings of their aircraft is the evidence of victory’. Such visual evidence is augmented by sound almost certainly added after the fact, though this is of course difficult to confirm absolutely, such is the nature of sound production. This audiovisual newsreel, seen from today, is presented as newsreel of the time, but no indication is given as to whether it was originally made with synchronous sound and, if not, when the sound was added afterward. The British Pathé online archive contains a great many films which contain sound, but which in all likelihood would have been filmed silent with the sound added after the fact.

The British Time to Remember series of documentaries made in the late 1950s was created using archival newsreel footage. The episode Enough of Everything… 1917 (Bayliss 1957) features narration along with occasional sound effects and music. For an audience watching the documentary in 1957 it may have been obvious that the narration was from the present day, featuring as it did the recognisable voice of narrator Stanley Holloway, famous for his character roles and comic monologues. Less obvious as a recent addition, perhaps, would be the additional sounds and music which accompany the archival images. Footage of explosions, machine guns and aircraft features matching sounds, though these were not recorded concurrently with the images of the Great War, or the Russian Revolution, which are depicted in the film.

The normal processes of filmmaking routinely rely on a basic principle of ‘disguised artifice’ as a foundational production practice. Somewhat ironically, actual realism can itself sometimes sound unrealistic. Realism in film is often in practice the illusion of reality. The soundtrack may be recorded simultaneously with the images but, equally, may also be created, wholly or in part, after filming is complete. As an audience, how are we to know whether what we are seeing and hearing is authentic? Does authenticity in this context mean not changing the recording? Can it mean removing the parts that make it appear inauthentic, or combining different elements to recreate a more representative version of (sound) events?
Newsreel footage which includes the sound of the crowd at a football match or the sound of a fighter plane diving in wartime would almost certainly have originally been filmed without sound, and had sound elements such as narration, music and sound effects added some time later. Looking back at newsreels of the early twentieth century, there is a tendency for this mismatch between archival image and supplementary sound to disappear, and for such historical and archival material to gradually acquire authenticity as it ages. Though the soundtrack has clearly been added afterward, it retains its archival quality since in a sense it illustrates how the newsreels came to be presented to their contemporary audiences, though the added sonic elements are, strictly speaking, fabrications in which the soundtrack purports to be something it is not.

Sound’s usefulness in creating different types of narrative was beginning to be explored in many productions. In Australia a newsreel titled The Mighty Conqueror (McDonagh 1931) was made about the famous Australian racehorse, Phar Lap. It utilises some techniques that mark it out as an early example of the new possibilities available to filmmakers open to the creative use of sound.1 Amongst the dramatic techniques employed in The Mighty Conqueror is the opening title shot of the horse rolling on its stable floor, accompanied by the actuality sound recording of its vocalisations rather than music. The second sequence is a series of shots of two actors enjoying a drink looking out over a view of the Sydney Harbour Bridge, shot from behind to allow for a scripted voiceover as they discuss the famous horse (Figure 8.1). This voiceover is then used as a sound bridge (pre-lapping the picture cut) from outdoor to indoor and across a travel sequence montage as they make their visit to the racecourse. Interestingly, in this newsreel fakery is not limited to sound. In one sequence discussing the horse’s prodigious stride, sound is used to hide the artifice in the picture editing. The voiceover focuses
FIGURE 8.1 The Mighty Conqueror – establishing shot of Sydney Harbour Bridge
attention on the stride rather than on the discontinuity in the film, in which the same short sequence is repeated 14 times (Figure 8.2).
Sound from location recordings

In 1926 another Australian filmmaker, Bert Ive, travelled to the south west of Australia to make a silent documentary film on the timber-getters who worked there. The forests of giant Karri and Jarrah trees are amongst the tallest and hardest woods, and the difficulty and remoteness of the work made for a film which created a romantic view of it (Figure 8.3). With the advent of sound recording Ive returned to the subject and made Among the Hardwoods (Maplestone 1936), recording with a Kinevox recorder which allowed sound to be recorded in sync with the film camera.

Ive and freelance cameraman Lacey Percival had filmed an earlier and silent film of the same name in 1927.… But in mid-1936 Ive and a sound recordist revisited the Pemberton region of south-west Western Australia to make a film that focussed more on the atmosphere of the forest as it was irrevocably changed by the axe and saw of the loggers. While this new film was half the length of the silent, Ive’s director Lyn Maplestone drew added impact by lingering on the sights and sounds of fewer phases of logging trees in the Jarrah and Karri forest.
(NFSA Films 2013)

The 1936 film contains very distinctive sounds of logging, with the high-pitched blows from the loggers’ axes echoing through the forest, as well as the calls from
FIGURE 8.2 The Mighty Conqueror – the same shot repeated 14 times
FIGURE 8.3 Among the Hardwoods – men work to cut down the giant trees
the men in charge of the teams of horses and oxen used to haul the fallen trees to the railway line and from there on to the timber mill (Figure 8.4). There is no music in the film, and a few title cards are used rather than voiceover, but Ive’s choice to return to the same location to remake the earlier documentary was based purely on the accessibility of sound recording technology. There are some differences worth noting about the two versions of the film.
FIGURE 8.4 Among the Hardwoods – horses and oxen used to haul the fallen trees to the railway line
First, the fact that the film was remade at all suggests that something was felt to be gained by essentially making a new film with sound. Second, there is the tacit acknowledgement that adding sound allowed for longer takes which ‘added impact’ while the new film ‘focussed more on the atmosphere of the forest’, whilst at the same time being more efficient, in that the new sound film was shorter than its silent equivalent. Third, the specifics of the sound recording contribute something that pictures alone could not possibly capture: the sounds of the axes in the forest, the portable motor-saws, the whistle of the steam train, and the calls of the men as they corral the teams of bullocks and horses to pull the giant trees through the forest. The sounds of this documentary are all synchronous sounds recorded along with the observational-style images. The filmmakers took their role to ‘document’ seriously, and so we see and hear the work of the loggers.
Early documentary films

Around the same time in the UK, the General Post Office Film Unit made what was to be a landmark documentary, Night Mail (Watt and Wright 1936), about the London, Midland and Scottish Railway’s mail train. It featured a musical score composed by Benjamin Britten and sound recordings supervised by sound director Alberto Cavalcanti.2 Much of the soundtrack comprises occasional explanatory narration to accompany the actuality recordings of the railway’s stations, along with the real postal worker participants’ stilted conversations, which were written by directors Harry Watt and Basil Wright. It is in the final section of the film that the soundtrack is left to Benjamin Britten’s score, along with a reading of Auden’s specially composed poem, whose meter was designed to match that of the sound of the steam train:

This is the night mail crossing the border,
Bringing the cheque and the postal order,
Letters for the rich, letters for the poor,
The shop at the corner, the girl next door.
(Auden 1936)

Alongside Auden’s rhythmical poem and Britten’s musical score, the third sonic element was provided by Alberto Cavalcanti, a Brazilian filmmaker who had worked for some time in French commercial avant-garde cinema prior to accepting the invitation to work for the GPO Film Unit. For some, the techniques displayed by Cavalcanti’s documentary soundtracks not only matched but exceeded those of his feature film contemporaries “in sophisticated and multi-layered sound” (Ellis 2018). Jack Ellis describes him in a way that many modern-day sound practitioners would aim to emulate:

Cavalcanti seemed always to be the artist, personal creator and, especially, consummate technician. He applied himself to the basic modes of film art –
narrative fiction, avant-garde, and documentary – in a full range of capacities – set designer, sound recordist, producer, and director. A charming journeyman artist with a cosmopolitan and tasteful flair, he taught and influenced a lot of other filmmakers and was responsible for noteworthy innovation and experimentation in many of the films with which he was associated.
(Ellis 2018)

Cavalcanti was a pioneering figure in early film sound as well as a successful and innovative director in his own right, being one of the founders of Ealing Studios and later directing influential films such as Went the Day Well? (1942) and They Made Me a Fugitive (1947). Since the field of film sound was quite open at the time there was little distinction between fiction and non-fiction techniques. Cavalcanti’s 1939 article describes his approach to creating the sound for another GPO film, North Sea, best described as a dramatized documentary. By this time he was in charge of the GPO Film Unit and is credited only as producer on this film. He described the usefulness of unrecognised or unidentified sound:

when we made North Sea we had to do a studio crash, to represent a sudden catastrophe on board a ship. The sound staff approached the B.B.C. and everybody else, but they could not get a combination of sounds that would be sufficiently terrifying. They asked me. I told them at once that they would have to get a loud, unidentifiable sound to stick into the crash. They got it. A horrid metallic squeal which suggested that the vessel had been squeezed diagonally and had started all her seams. It was a wonderful noise because it was unrecognizable.
(Cavalcanti 1985, 108–109)

The technique that Cavalcanti describes here is still fundamentally useful in modern sound design. Cavalcanti and others’ work straddled what we might consider non-fiction and fiction, but is there a distinction between fiction and non-fiction techniques? In practice there is no real difference at all: each uses speech as voiceover/narration and for such things as exposition, character, drama and humour; each also uses music for mood, emphasis, comedy and suspense; and each uses sound effects for drama, realism, comedy or mood. It is also worth noting that the sound effects in each case may be authentic in origin and, equally, may not. The non-fiction or documentary film shares the same basic elements with the fiction film, with similar techniques being used. Given the potential for manipulation, it would be enormously difficult for an audience to determine or detect any manipulation of the soundtrack. Whilst this is less of an issue for fiction filmmakers, it could be a concern to those working in non-fiction. Even participants who were present during filming and recording would not be sure whether the recordings used were real or authentic, or had been altered in some way, or replaced. The only people who would know for sure are the ones who had done
the work. Leaving aside the ethical questions this poses for a moment, we can say that whilst there are obvious similarities between documentary and fiction productions, there are of course some notable differences. Documentaries typically focus on location sound rather than post-production, since scenes cannot (or perhaps should not) usually be repeated.
Conventions of documentary

Whether the subject is natural history, politics, war or biography, the sound of a documentary occupies a curious position. The principle of ‘disguised artifice’ coexists with the audience expectation of truthful representation. Documentary realism often means ‘the illusion of reality’ – realism as a representation (Murray 2010). At times authentic recordings are the focus of the documentary story, while at other times the images are the focus and synchronous sound may not even have been available. We also accept that authentic sound in a documentary film can be potentially rich in meaning, whether speech, actuality sounds or music.

The development of portable quiet cameras such as the Mitchell NC (newsreel camera) in 1933 and the soundproof BNC (Blimped Newsreel Camera) model a year later led to changes in documentary practice.3 With their associated sync sound recording equipment, both camera and sound could work together to react to what was going on. The simple ability to put the camera on the shoulder rather than on a tripod meant that both camera and sound were mobile, free to move and respond to events rather than dictating them. “The new news reporting potential of the mobile camera with sync sound was electrifying. And we had a whole bunch of rules: we were shooting handheld, no tripods, no lights, no questions; never ask anybody to do anything” (Richard Leacock in Wintonick 2000). This new style of observational filmmaking, called free cinema in the UK, cinéma vérité in France, and direct cinema in the US, meant a change of emphasis, not simply in what was technically possible but in the rationale for doing it in the first place: “It was absolutely the opposite of the scripted, the conceived, the planned, the argument-led documentary. It was wanting what you got, rather than going out to get what you want, if you see what I mean” (Karel Reisz in Wintonick 2000).

The availability of technologies such as hand-held cameras and sync sound recording equipment allowed for this new style of filmmaking. At its heart, it became another option for the film-maker who wished to record actuality as it occurred – unvarnished and unrehearsed. At its extreme, direct cinema was purely observational, and its impact on more conventional documentary was substantial, as standard documentary practice gradually adopted and assimilated some of the observational method. A poor recording might still be used if it were narratively important, but continuity conventions still applied and chronological order was generally followed as a cause-and-effect narrative was created. This inevitably required some forms of manipulation in post-production, even if the shooting was principally in the observational style. Jeffrey Ruoff (1993) described some of the types of manipulations that are
encountered in seemingly observational documentary, using the TV series An American Family (1973) as a case study, highlighting occasional instances of cheating in the soundtrack such as the rerecording of sound for a sequence that was originally filmed without sound.

Contemporary documentaries, such as those from Michael Moore, use the full range of the soundtrack’s persuasive and narrative possibilities. The sidewalk sequence from Moore’s Roger and Me (Moore 1989) uses a combination of voiceover narration, ironic music, location music and location recording, laid over images which are sometimes in sync and sometimes juxtaposed for effect. Perhaps most notably, Moore’s Fahrenheit 9/11 (2004) used a simple but effective black screen to accompany the actuality sound of explosions, crashing masonry, confusion and screams in the wake of the attack, rather than showing the now instantly recognisable video footage captured on the day.

More recently, in Senna (2010) and Amy (2015), documentary director Asif Kapadia and producer James Gay-Rees adopted a different approach, which relies heavily on archival materials of their subject or, in the case of Amy, interview recordings with people who knew her. Whilst both films make extensive use of archival materials there is a difference between the two. For Senna the images were typically very high-quality footage shot by professionals, on film or high-quality video, including helicopter shots from Formula 1 races. For Amy the footage is typically amateur material, often shot on mobile phones, resulting in images that were less than cinematic. The subjects of the documentaries partially explain the difference in archival material and the decision to rely on relatively poor-quality images for Amy. In Senna the focus is on motor racing, which in television terms is a primarily visual experience, whereas in Amy the focus is on music and the sound of her voice. The sound of the documentary (her music, her singing, her speaking voice) is the focus, which allows the relatively amateurish archival footage to be used.
Animated documentary

Animated documentaries use real-life audio interview material as the basis for an animated film. Whilst Aardman Animations found great success with Creature Comforts (1989) and its successors, the technique had been used before by animators such as the American husband-and-wife team of John and Faith Hubley in films such as Moonbird (Hubley 1959) and Windy Day (Hubley 1968). In Creature Comforts Nick Park used various real, unscripted interviews with residents of a council estate and an old-people’s home discussing their own surroundings, and with people visiting London Zoo. These interview recordings are synchronised to stop-motion animated zoo animals who perform the speech. The effect is both charming and humorous, but it would not be what most people would consider to be a ‘documentary’ film, since the images are entirely fabricated, although the soundtrack is composed of documentary material.

Given that documentaries routinely contain documentary footage to which an entirely fabricated soundtrack is added (speech, music, sound effects), can we think
of a film like Creature Comforts in the same way? For many people the defining characteristic of documentary is not in cinematic devices such as camerawork or editing, but in the intention of its portrayal. We become aware very quickly that the interviews we are hearing, unlike the animation we are watching, are of real and recognisable individuals rather than scripted performances. In this sense the documentary element is in the recording and editing of the interview soundtrack. The anthropomorphic animation performs a rather different task in supporting the narrative, providing humour, counterpoint and pathos.

Whilst it may be a stretch to interpret films like Creature Comforts as documentary, the idea of an animated documentary has been taken up by others. Dock Ellis & the LSD No-No (Blagden 2009) began as a two-hour radio interview with the baseball player Dock Ellis, which was edited into a four-minute radio documentary of his recollections of an infamous 1970 ‘no-hitter’ game he pitched while high on LSD. After hearing the show, producer Chris Isenberg commissioned James Blagden to create an accompanying animation (Figure 8.5). The oral history or long-form interview seems to lend itself to an animated treatment – an amalgam of ‘live action sound’ and animated images – particularly where there are practical difficulties in a visual representation of the subject of the documentary. Swedish filmmakers David Aronowitsch and Hanna Heilborn use the technique as their films are often about vulnerable subjects, such as children being held in detention centres, who may not be shown. In their animated documentary Hidden (Heilborn, Aronowitsch and Johansson 2002) the sound is from an interview with Giancarlo, a 12-year-old boy, while the images are animated (Figure 8.6). Giancarlo describes his life living homeless and alone in Peru, being chased by police and living in hiding. His identity remains hidden. In Slaves (Heilborn and Aronowitsch 2008) the story is of another vulnerable child, Abuk,
FIGURE 8.5 Dock Ellis & the LSD No-No
FIGURE 8.6 Hidden
who was abducted at the age of five by government-sponsored militia in Sudan (Figure 8.7). Other film-makers have adopted this technique, using real recordings combined with animated representations of the individuals. In the US, Stranger Comes to Town (Goss 2006) uses cartoon World of Warcraft-style characters, or avatars, to voice the recordings of the individuals interviewed (see Figure 8.8). In Australia, It’s Like That (Craddock et al. 2005) uses a range of hand-drawn and computer-generated images and stop-motion styles using models and puppets to accompany the real audio recordings of three children held in indefinite detention (see Figure 8.9).
FIGURE 8.7 Slaves
FIGURE 8.8 Stranger Comes to Town
FIGURE 8.9 It’s Like That
Sound in television non-fiction: current affairs, news and sport

Sound design is probably not the first thing that springs to mind when we think of television news, current affairs or sport. Yet the same technologies are available to those gathering and editing the raw material as to those working in fiction genres and documentary. Here the sound is used to convey factual, explanatory and background information, and, as with documentary filmmaking, production sound
recordings are key. Sound effects may be used to indicate reality or may be important in understanding the events. In news and current affairs productions, voiceover or narration, often recorded after the fact, accompanies the images to tell, or clarify, the story. Though music is not uncommon in longer current affairs stories, it is not generally used in news genres, as it would immediately indicate a shift in production where an authorial ‘narrative’ is being made rather than a mere presentation of ‘facts’.

Though still in the realm of non-fiction, television sport is perhaps best explained as a composite model of television genres lying somewhere “between drama, journalism and light entertainment” (Whannel 1992, 56). Increasingly, big-budget televised sports events demand larger-than-life audio to rival feature film and game aesthetics. From a production perspective the sound might be there to enhance the experience rather than to provide an objective version of events. Here again the decisions about sound production, and what specific purpose sound serves, will determine how sound is used and whether a truly authentic approach is warranted or required.

Televised horse-racing in the 1970s presented a problem for the presentation of sound. On a course much larger than the venues of most stadium- or gymnasium-based sports, capturing the sounds of the horses running around the track was a problem routinely solved through fabrication. Whilst the sound of the race commentator could be easily acquired, the sounds which accompanied the long-lens shots of the horses galloping were frequently a sound effect loop, played back until the ‘live sound’ of the event could again take over once the field reached the home straight (and the microphones).

This type of sleight of hand is not confined to horse-racing. Sound actuality is considered by some to be “the fingerprint of the event on the media event, or a stamp of authenticity” (Raunsberg and Sand 1998, 168). Arndt Maasø describes a process of “taming wild sound” in sports like skiing: recording the sound in practice sessions and playing it back from a sampler during the live broadcast, where competing crowd and machinery noise could otherwise make the sound of skiing too hard to hear (Maasø 2006). Other sports are televised pseudo-live, or nearly live, such as golf, where large numbers of competitors playing simultaneously require time-shifting so that key moments which happened a few moments earlier can be witnessed. Whilst television cameras armed with telephoto lenses, and mounted on raised platforms, can have a relatively close perspective on virtually every part of a course, a limited number of roving microphones may not be able to capture every swing and every putt. Indeed, the shots themselves are not the only element that can be replaced. CBS were embarrassed when found to be inserting birdsong into coverage of the PGA Masters golf tournament. Fox Sports also admitted to ‘sweetening’ baseball highlights through the addition of sound effects for hits or catches which were real recordings but which were not synchronous with the action (Sandomir 2004).

Whilst some audiences found these enhancements objectionable, since the added sound effects were not actually the real sound of the event,
little attention is usually paid to the overall design of broadcasts. The simple placement of a microphone, or lack of a microphone, has an enormous impact on the sound that will accompany any televised sports coverage. Anyone watching the 2010 FIFA World Cup in South Africa would be familiar with the distinctive sound of the vuvuzela, a plastic horn which, when played en masse, produces a sound reminiscent of a swarm of angry wasps. The vuvuzela was a recognisable and constant accompaniment to each of the 64 games played in the tournament, which in turn created a conundrum for broadcasters. The pitch of the vuvuzela, being relatively constant (around 233 Hz, or the B-flat below middle C), could be removed through filtering for the television broadcast, which would potentially please the many in the audience who quickly grew tired of its constant drone. On the other hand, seen as an event which is being televised, the sound of the event was most certainly dominated by the sound of the vuvuzela. Removing this would mean the broadcast departing from being a representation of what it felt like to be in the stadium, which is what ‘effects microphones’ have traditionally been used for in sports stadium broadcasts.
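To make the filtering option concrete, the sketch below shows one plausible way a steady-pitched drone like the vuvuzela could be attenuated: narrow notch filters centred on the fundamental and its first few harmonics. This is only an illustrative sketch, not any broadcaster’s actual processing chain; the sample rate, the Q value and the number of harmonics are assumptions made for the example.

```python
# A minimal, illustrative sketch (not any broadcaster's actual signal chain)
# of attenuating a steady-pitched drone such as the vuvuzela with narrow
# notch filters at its fundamental and first few harmonics.
import numpy as np
from scipy.signal import iirnotch, filtfilt

FS = 48_000          # assumed broadcast audio sample rate in Hz
FUNDAMENTAL = 233.0  # approximate vuvuzela pitch (B-flat below middle C)

def attenuate_drone(audio: np.ndarray, fs: int = FS, f0: float = FUNDAMENTAL,
                    n_harmonics: int = 4, q: float = 30.0) -> np.ndarray:
    """Notch out f0 and its harmonics; a higher Q gives a narrower notch."""
    out = audio
    for k in range(1, n_harmonics + 1):
        freq = k * f0
        if freq >= fs / 2:          # stay below the Nyquist frequency
            break
        b, a = iirnotch(freq, q, fs=fs)
        out = filtfilt(b, a, out)   # zero-phase filtering avoids phase smear
    return out
```

The trade-off the broadcasters faced is visible in the parameters: very narrow notches remove the drone while leaving most of the crowd sound intact, whereas wider notches or more harmonics suppress the drone more thoroughly at the cost of a duller, less authentic stadium atmosphere.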
Semiotics and non-fiction sound

Sound in non-fiction presents many of the same attributes as its fiction counterpart. Its principal difference lies in the fact that we are usually dealing in some sense with reality. Given that the soundtrack is routinely manipulated to be perceived as real, while concealing any sign of such manipulation, the semiotic approach can be applied to analyse how concepts of realism and reality, authenticity and truth relate to sound practice. In Peirce’s work, reality and truth were important subjects, with abduction being the process by which one fundamentally identifies and determines what one believes to be real or true:

Abduction is reasoning which professes to be such that in case there is any ascertainable truth concerning the matter in hand, the general method of this reasoning, though not necessarily each special application of it, must eventually approximate to the truth… there is only a relative preference between different abductions; and the ground of such preference must be economical. That is to say, the better abduction is the one which is likely to lead to the truth with the lesser expenditure of time, vitality, etc.
(Peirce 1976, 4:37–38)

Many of the choices in sound practice are about clarifying particular sound-signs so that they are interpreted in a certain way to convey a sense of realism rather than representing reality. For example, synchresis allows a sound that is synchronised and appropriate to expectations to be taken as the actual sound. It is reasonable to believe that the visible object makes the sound, although we know this is often not the case. Without foreknowledge, we would not be aware that it was not an original, truthful sound. Through synchronisation it has become the real sound. Our
determination of whether something is truthful or real depends on our interpretation of the information we are presented with. In the absence of conflicting information, we will seek out an economical interpretation and accept it as being real. In Some Consequences of Four Incapacities, Peirce’s first idea concerns reality: “We have no power of Introspection, but all knowledge of the internal world is derived by hypothetical reasoning from our knowledge of the world” (1982, 2.213). Our ability to determine whether something is real is dependent on our senses and perception, which enable us to experience the world. As a code of filmmaking, realism tries to mimic this lived experience, although the codes that are created to mimic realism are subject to change. That which is convincing for one era reeks of artificiality in another. Photographs, gory horror films, 2-D and 3-D cinema all use elements of realism that attempt to simulate lived experience.

In sound realism, the role of the sound designer (recordist/editor/mixer) lies in selecting, creating or manipulating sounds or sound sequences that will be perceived by the audience as real. The normal processes of filmmaking routinely rely on disguised artifice to create replicas of reality. Shots are edited to make sense where there was little sense before. Elements are edited out and edited in. Shots are framed to highlight certain things and avoid others. Reactions are inserted where reactions were not filmed. Each film element is “subsumed by the needs of the story” (Metz 1974, 45). The soundtrack is recorded but also created after filming is complete. So how can we know whether what we are seeing and hearing is authentic? Any viewer of a documentary may at some stage question the material presented to them: What happened before that cut? Did these two events really happen one after another? What is outside the frame?

Compared to fiction filmmaking, sound in documentary filmmaking often has a more obvious production bias. Unlike fiction films, there is no option to rerecord poor speech recordings later in a studio. This does not preclude the possibility that some parts of the soundtrack might still be created or manipulated during post-production; however, the emphasis is usually on authenticity or truthfulness. As with drama filmmaking, the sound may be modified or additional sound may be incorporated for aesthetic reasons. For example, unwanted sounds, such as microphones being bumped, may be removed. Elsewhere, the recorded sound quality may not match the pictures, and thus give the appearance of inauthenticity, so sounds may be added to the soundtrack to make it appear more realistic. Occasionally the accompanying sounds may need to be recreated, for example where a sequence is used for which no location sound was recorded. Or sound may need to be replaced, for example where copyrighted music is recorded during filming but cannot be used without payment of excessive licensing fees (Aoki, Boyle and Jenkins 2006, 13–14).

Some of the earliest uses of sound in non-fiction film included the addition of sound to an image to present a reality that never existed in the first place. Undoubtedly, there are occasions where synchronous sound is not recorded and so is later added to footage without destroying the verisimilitude of the representation. At other times, an actual recording might, deliberately or inadvertently, give a
misleading representation. Between these two extremes lies the range of practical applications of sound in documentary. Decisions about levels of realism and authenticity in sound in individual films reside with individual sound editors and directors. Whilst the audiovisual literacy of the audience has changed since the days of newsreels, it would be a mistake to assume that an audience would be able to determine the origin and authenticity of material in the soundtrack, especially when compared with its visual counterpart (Murray 2010).

Fidelity in terms of sound most frequently relates to the idea of its truthfulness or accuracy. For Bordwell and Thompson, fidelity “refers to the extent to which sound is faithful to the source as we conceive it” (2010, 283). For others, fidelity suggests some notion of quality. Ultimately, the best judges of fidelity or authenticity are the filmmakers and participants themselves. This illustrates the curious notion of authenticity when discussing documentary film, and documentary film sound in particular. Does authenticity in this context mean not changing the recording? Can it mean removing the parts that make it appear inauthentic, or combining different elements to recreate a more representative version of (sound) events?
Truth-telling and sound

For some, the advent of cinema, and in particular the coming of sound, meant that a more noble purpose than simple entertainment could be achieved. Yet even in the earliest days of silent non-fiction film, whether documentary films like Nanook of the North or silent newsreels, there had been a temptation to ‘massage’ the truth. An exposé in The Literary Digest described some of the techniques used to create Great War ‘footage’ from the front:

A contributor to Popular Science Monthly and the World’s Advance (New York, November) tells us that clever mechanical devices, the unstinted use of electricity, spring bayonets, gunpowder-bladders, and underground explosives are used in the production of these war-pictures, which are so realistic that they seem to bear the earmarks of the French and Belgian trench and the Polish battle-field. We read: “Agricultural laborers, farmers’ sons, and village youths, drest in the uniforms of the British and German armies, are drilled in their new duties and initiated into the mysteries of disappearing bayonets, exploding fake shells, trench-warfare, and make-believe ‘gassing.’”
(The Literary Digest 1915, 1079)

All filmmaking is to some extent reliant on disguised artifice to create replicas of reality. An audience presented with a reasonable synchronous representation which appears to meet expectations of lived experience would be expected to accept the representation as being real. It is, after all, the most economical explanation, presented in the absence of an alternative. Sound is no different. In some ways it is preferable to fake sound rather than images, since it is so difficult for anyone, even an expert, to determine whether any manipulation has actually taken
place. As a result, a significant degree of trust is placed on the non-fiction filmmaker in regard to sound. Is there, then, an ethical duty to adhere to a degree of authenticity in documentary material? Seemingly there is, though differences exist over how to define authenticity in this context. Brian Winston argues convincingly that the focus of documentary ethics should be squarely on the relationship between documentary maker and participant, as opposed to “an amorphous ‘truth-telling’ responsibility to the audience” (Winston 2000, 5). Whilst it is hard to disagree with Winston’s primary concern, it would be a mistake to dismiss this secondary truth-telling responsibility of non-fiction. Adopting a quasi-legal determination of a victimless manipulation in a documentary overlooks the potential for a variety of other forms of misrepresentation, as well as the rewriting of history.

Occasionally styles and genres emerge which adapt techniques of fiction and non-fiction to create a composite. They Shall Not Grow Old (Jackson 2018) combines restored film footage from the Imperial War Museum’s Great War archive with audio from the BBC’s Great War series interviews, recorded in 1964, to commemorate the hundredth anniversary of the armistice. Warsaw Uprising (Komasa 2014) is a dramatic film assembled from six hours of silent, black-and-white newsreel footage filmed during the Warsaw Uprising of 1944. Enormous effort was spent tracking down the types of clothing dye used at the time so the film could be colourised properly (Figure 8.10). Lip-readers were also employed to create the script for the newly recorded dialogue, which was performed by actors. Sound effects and music were also added to the silent original footage.4 The tagline for the film’s trailer is “87 minutes of truth”. Warsaw
FIGURE 8.10 Warsaw Uprising
Uprising was a collaboration with the Warsaw Uprising Museum and as such is an ongoing project to commemorate the uprising, with the film’s website containing photographs of people in the original footage who it hopes will yet be identified. In each case the seriousness of the subject matter means that the material itself would need to be faithful to both the participants and the events shown, even if the techniques used to enhance images and sound might risk being considered, by themselves, to be inauthentic.

Authentic sound, which can be defined here as ‘sound recordings of what they purport to be’, can be considered a benchmark for authentic non-fiction sound, whilst at the same time acknowledging that authenticity is in itself a slippery term, since a recording of any particular event can be radically different depending on the perspective or placement of the microphone, or the sound’s context. It is the choices in sound recording, editing and mixing that largely determine whether the sound representation taken as authentic is actually of the thing it purports to be. Augmentations and manipulations inevitably occur, as we have seen. Sometimes sound might be authentic but not synchronous. For example, it could have been recorded at that location around the same time as the filming, where it was not possible or practical to do both simultaneously. It is only those who have actually done the work in creating the finished result who know whether it is a truthful representation, a more readily believable facsimile, or a dramatic interpretation of the actual events.
Impact on the audience

The recognition that audiovisual material appears to be a faithful representation of actuality fuelled early fears over its misuse. The potential impacts of such representations on an audience, whether the material was real or not, were also increasingly coming under scrutiny. In the specific area of film, The Production Code (MPPDA 1930–1967) attempted to instil an ethical framework into the previously laissez-faire film industry, inadvertently underscoring the power of film, unlike other media, to present a convincing replica of reality. The first general principle of the code was that “No picture shall be produced that will lower the moral standards of those who see it. Hence the sympathy of the audience should never be thrown to the side of crime, wrongdoing, evil or sin” (MPPDA 1930–1967). The section titled “Reasons Supporting the Preamble of the Code” outlines the moral obligations on the part of filmmakers, particularly because of the powers of film: “The reaction of a reader to a book depends largely on the keenness of the reader; the reaction to a film depends on the vividness of the presentation” (MPPDA 1930–1967, III D c).

We could also look to other areas of non-fiction production for guidance on ethics. Many journalistic bodies have developed practical ethical guidelines. The American Society of Magazine Editors’ (ASME) ethical codes outline basic principles about the readers’ entitlement to fairness and accuracy, the relationship of trust, the difference between editorial and marketing messages and the protection of editorial integrity (ASME 2013). The Consultative Group of International and
Regional Organisations of Journalists (CGIROJ) noted “the important role which information and communication play in the contemporary world, both in national and international spheres, with a growing social responsibility being placed upon the mass media and journalists” and set out ten principles of professional ethics, the first two of which were “The people’s right to true information” and “The journalist’s dedication to objective reality” (CGIROJ 1983). Recognising that objective reality is sometimes a slippery concept, the principle is given some context:

with due deployment of the creative capacity of the journalist, so that the public is provided with adequate material to facilitate the formation of an accurate and comprehensive picture of the world in which the origin, nature and essence of events, processes and state of affairs are understood as objectively as possible.
(CGIROJ 1983)

Brian Winston describes some of the practical difficulties which beset working journalists:

Ethics in general, and ethical systems in particular, tend to be restrictive in free expression terms. They also tend to be individualistic and posit a free person facing choices as the norm. Like other workers, journalists for the most part are not free. They are responsible to their employers as well as to their readership/audience and to the participants, informants or contributors who interact with them; and these responsibilities can be at odds with each other. The owners need profit and, sometimes, a platform; the consumers information and/or titillation; the participants privacy or, sometimes, a platform.
(Winston 2000, 118)

Replacing ‘journalist’ with ‘sound practitioner’ in these two examples goes some way to describing the position of those working in non-fiction sound. In what is primarily a field which concerns reality, there are pressures to present the work in the form that suits the employer’s wishes, or in a way more pleasing or dramatic to the audience, or in a way that casts the participants in a particular light. The manipulation of non-fiction material, whether in newsreel or in documentary, has been possible since sound recording’s introduction. Documentary and newsreel makers were quick to adapt their techniques to take advantage of what sound could offer:

Soon the style and structure of the sound newsreel began to change. In order to provide smooth transitions between sequences, editors added narration and music. Instead of presenting the raw, ‘single system’ recordings, technicians began mixing, or rerecording the original sound track with other tracks, to modulate the signal strength uniformly from shot to shot and to blend in other supplementary sound tracks in a smooth fashion. Also, whenever natural
sounds were lacking in the original recordings, editors added artificial or spuriously recorded sound effects from the rapidly growing newsreel sound library to add a bogus authenticity to the sequence.
(Fielding 1972, 167)

Paul Rotha, one of the fathers and leading theorists of the documentary tradition, did not share the same concerns. He was excited by the new-found possibilities of sound but concerned about the slavish adoption of synchronous sound at the expense of its creative use in documentary:

No matter how intimately observed and edited, the material on the screen is separated from the audience by an unreal sense of illusion. But with the addition of sound, whether synchronous or not, that barrier is partially removed. The audience cannot prevent itself from participating in the action shown on the screen. This peculiar quality of sound can, and often is, a source of danger. On the one hand, it tends to permit carelessness to arise in the making of the sound band and to permit the selection of one sound rather than another, not because of its particular significance but because of its capacity for producing a sense of reality.
(Rotha 1939, 216)

Rotha’s view, largely shared by Alberto Cavalcanti, imagines sound in documentary as a genuine creative partner, not one in which the soundtrack strictly correlates with the reality of the images. Indeed, such a marriage between sound and vision often tends to be dull if there is insufficient thought given to one or the other. The elements of the soundtrack through which the audience understands the text can be manipulated at every stage: whether in recording, through microphone placement, re-recording and pre-recording; or through editing and mixing to remove, augment or replace sounds. It has the potential to hide its tracks to such an extent that it is impossible to determine how something was recorded, whether it was actually live, and whether it was authentic.
Sound and ethics

Though fiction and factual genres, and their respective soundtracks, are generally recognisable to an audience, the technologies and techniques involved in their creation are often virtually identical. They each aim “to give sensory embodiment to a representation that will engage us, a representation that is more imaginary than not for fiction and a representation that is more historical than imaginary for documentary” (in Rogers 2015, x–xi). Are ethical guidelines appropriate for non-fiction? In comparison to visual material, sound manipulations are often fiendishly difficult to detect. How then could these or other guidelines be put into practice, and is there any way of knowing if they are being observed? Picture edits in television and film, though not consciously noted by the audience, indicate the places where a particular sequence begins and ends. Just as in photography, a frozen moment can attempt to represent a broader reality while offering no guarantee of being a faithful representation of the whole story: “[a]ny medium
in which hours of raw material are boiled down to minutes of consumption inevitably requires some manipulation. The challenge is to ensure the process engages but doesn’t deceive its audience” (Mercer 2011). Camera choices, sound choices and editing choices make truth-telling a subjective idea. Selection, editing in and editing out, highlighting and de-emphasising, re-recording and augmentation are all used in non-fiction material. This manipulation, or potential for manipulation, indicates the need for a discussion of the ethical dimensions of such practice.

In comparison to picture editing, sound editing (layering, replacement, augmentation, addition) is usually designed to be inaudible and undetectable, which makes scrutiny or analysis a difficult if not impossible task for the audience. Sound is a less obviously delimited representation for the audience – it is not bound by the screen. There is therefore enormous difficulty in determining to what extent the sound representation is appropriate, misleading, authentic or fabricated. The audience is largely unaware of the techniques of sound manipulation and uninformed about its potential to misrepresent. If, for whatever reason, authentic sound is augmented, suppressed or replaced, the impact of the mediation should be examined to ensure that the result does not deceive the participants or audience, or compromise the material’s accuracy or its legitimacy as archival material.

All sound recording is to some extent mediated, and the mediation, whether consciously designed or unintentional, suggests that there is some onus on the practitioner to act in an ethical manner. In moving from a view of the creation of the soundtrack as passive registration to one of active design, we must recognise its potential to mislead. Is there then a duty on the practitioner to work in a manner consistent with a high ethical standard of professional practice, one which goes beyond the production of simple entertainment, which takes account of context and of accuracy, and which does not take undue advantage of the difficulty for the audience of detecting the manipulation? In non-fiction productions there is at the least an implication that the material being presented is authentic – that the sounds which accompany the images are what they purport to be. Even where authentic sound is used, it requires attention to ensure that it does not mislead or misrepresent. For some, ethical guidelines, whether personal or part of a professionally organised framework such as those in journalism, are central to the work they do. For others, a laissez-faire approach is the norm, where the process is less important than the end result. Yet it is precisely the point that ethical work is all about the process. The choices that are made in determining what is kept in, and what is left out, become fundamentally important in non-fiction genres.
Summary

When Walter Murch called for a creative supervising sound designer role in feature films, a role he believed to be largely absent, it should be noted that this absence is largely confined to a Hollywood mode of film production. When the territories of fiction and non-fiction genres were not yet clear-cut, film-makers experimented with technologies and forms. In documentary film particularly
there has been a long tradition of authorial and creative control that has often extended to the design of an integrated soundtrack, in which sound effects and speech are deployed in ways that sometimes align them closely with the forms and functions of music.
(Birtwistle 2016, 387)

There has also been a tension between creativity and authenticity since the emergence of non-fiction filmmaking. Documentary makers frequently walk the line between art and reality. There is no inherent requirement to tell only absolute truth, and indeed it would be impossible in any case. There is, though, an ethical dimension governing the choices that are made. For Bill Nichols, this is no contradiction: “What if documentaries aren’t simply informational but rhetorical? What if they aren’t only rhetorical but also poetic and story driven?” (in Rogers 2015, x). What appear, on the face of it, to be simple, truthful sound representations in a non-fiction context can be examined to reveal a number of choices at every stage: whether sound is going to be recorded, augmented or replaced. Sounds which are useful to the filmmakers are not always authentic, and sounds which are authentic are not always useful. A key factor in determining the sound philosophy at work is the aim of the filmmakers and how they ultimately view the production itself. Whether it is considered to be documentary, history, ethnography, anthropology, entertainment or drama, the choices that are part of the overall sound design of the production will always be choices. Some elements are left in while others are deliberately de-emphasised, augmented, replaced or omitted entirely for the benefit of the story being told.
Notes

1 Also worth mentioning is the practice then of a title credit for sound (William Shepherd) alongside photography (Mark Friend).
2 Cavalcanti was also author of the influential 1939 article "Sound in Films" which was reproduced in the anthology Film Sound: Theory and Practice (Weis and Belton 1985). Benjamin Britten also appeared to take at least some credit for the use of sounds in Night Mail, describing his role within the GPO Film Unit as 'writing music & supervising sounds' and so was responsible for the co-ordination of everything to do with the soundtracks of the films (Mitchell 2000, 83–84).
3 Mitchell also introduced the BNCR – the reflex version of the BNC camera – once it became possible.
4 See http://warsawrising-thefilm.com/multimedia/
References

Media referenced

Bayliss, Peter. 1957. Time to Remember – Enough of Everything … 1917. British Pathé.
Blagden, James. 2009. "Dock Ellis & the LSD No-No." Available online at https://vimeo.com/45983332
British Pathé. 1944. Air Battles Over Europe. Available online at www.britishpathe.com/video/air-battles-over-europe
Craddock, Louise, Susan Earl, Sally Gross, Emma Kelly, Nicole McKinnon, Elizabeth McLennan, Sharon Parker, Dell Stewart, Sophie Raymond, Yuki Wada, Justine Wallace, and Diana Ward. 2003. It's Like That. Animated documentary.
Flaherty, Robert Joseph. 1922 (1999). Nanook of the North. DVD.
Goss, Jacqueline. 2007. Stranger Comes to Town. Animated documentary.
Heilborn, Hanna, and David Aronowitsch. 2008. Slaves. Animated documentary. Available online at https://vimeo.com/58632132
Heilborn, Hanna, David Aronowitsch, and Mats Johansson. 2002. Hidden. Available online at https://vimeo.com/73122152
Hubley, John, and Faith Hubley. 1959. Moonbird. Storyboard. Animated short.
Hubley, John, and Faith Hubley. 1968. Windy Day. Storyboard. Animated short.
Jackson, Peter. 2018. They Shall Not Grow Old. Warner Bros. Feature documentary.
Kapadia, Asif. 2010. Senna. Universal Pictures. Feature documentary.
Kapadia, Asif. 2015. Amy. Altitude Film. Feature documentary.
Komasa, Jan. 2014. Warsaw Uprising. Next Film. Feature documentary.
Maplestone, Lyn. 1936. Among the Hardwoods. Available online at www.youtube.com/watch?v=4dRsu3vbMj4
McDonagh, Paulette. 1931. The Mighty Conqueror. Available online at https://youtu.be/IOM5jp9QrFo
Moore, Michael. 1989. Roger and Me. Warner Bros. Feature documentary.
Park, Nick. 1989. Creature Comforts. Aardman Animations. Animated short.
Watt, Harry, and Basil Wright. 1936. Night Mail. UK: Associated British Film Distributors, General Post Office Film Unit.
Other references

Aoki, Keith, James Boyle, and Jennifer Jenkins. 2006. "Bound by Law." Center for the Study of the Public Domain. Available online at www.law.duke.edu/cspd/comics/digital.html (accessed 25 March).
ASME. 2013. "Editorial Guidelines: ASME Guidelines for Editors and Publishers." Available online at www.magazine.org
Birtwistle, Andy. 2016. "Electroacoustic Composition and the British Documentary Tradition." In The Palgrave Handbook of Sound Design and Music in Screen Media: Integrated Soundtracks, edited by Liz Greene and Danijela Kulezic-Wilson, 387–402. London: Palgrave Macmillan UK.
Bordwell, David, and Kristin Thompson. 2010. Film Art: An Introduction, 9th edition. New York: McGraw-Hill.
Cavalcanti, Alberto. 1985. "Sound in Films." In Film Sound: Theory and Practice, edited by Elisabeth Weis and John Belton, 98–111. New York: Columbia University Press.
CGIROJ. 1983. "International Principles of Professional Ethics in Journalism." Available online at ethicnet.uta.fi
Deaville, James. 2015. "Music and Sound in Documentary Film." In Music and Sound in Documentary Film, edited by Holly Rogers, 41–55. London: Routledge.
Ellis, Jack C. 2018. "Alberto Cavalcanti – Director." Available online at www.filmreference.com
Fielding, Raymond. 1972. The American Newsreel, 1911–1967, 1st edition. Norman, OK: University of Oklahoma Press.
Juhasz, Alexandra, and Jesse Lerner. 2006. F Is for Phony: Fake Documentary and Truth's Undoing, Visible Evidence. Minneapolis: University of Minnesota Press.
Maasø, Arndt. 2006. "Designing Sound and Silence." Nordicom Review 2 (Plenary II).
McGrath, Warren. 1946. "Newsreel Sound." Journal of the Society of Motion Picture Engineers 47(5): 371–375. doi:10.5594/J12763
Mercer, Julian. 2011. "Trust and Observational Documentary." bbc.co.uk. Available online at www.bbc.co.uk/journalism/ethics-and-values/trust-and-choices/trust-and-observational-doc.shtml
Metz, Christian. 1974. Film Language: A Semiotics of the Cinema. New York: Oxford University Press.
Mitchell, Donald. 2000. Britten and Auden in the Thirties: The Year 1936: The T.S. Eliot Memorial Lectures Delivered at the University of Kent at Canterbury in November 1979. New edition, Aldeburgh Studies in Music. Woodbridge; Rochester, NY: Boydell Press.
MPPDA. 1930–1967. The Production Code of the Motion Picture Industry.
Murray, Leo. 2010. "Authenticity and Realism in Documentary Sound." The Soundtrack 3(2): 131–137.
NFSA Films. 2013. "Among the Hardwoods." Available online at https://youtu.be/4dRsu3vbMj4
Peirce, Charles S. 1976. The New Elements of Mathematics, edited by Carolyn Eisele. 5 vols. The Hague: Mouton.
Peirce, Charles S., Max Harold Fisch, Edward C. Moore, and Christian J. W. Kloesel. 1982. Writings of Charles S. Peirce: A Chronological Edition. Bloomington: Indiana University Press.
Raunsberg, Preben, and Henrik Sand. 1998. "TV Sport and Rhetoric: The Mediated Event." Nordicom Review 19(1): 159–173.
Rogers, Holly, ed. 2015. Music and Sound in Documentary Film. London: Routledge.
Rotha, Paul. 1939. Documentary Film. London: W. W. Norton & Company Inc.
Ruoff, Jeffrey. 1993. "Conventions of Sound in Documentary." Cinema Journal 32(3): 24–40.
Sandomir, Richard. 2004. "Fox Uses Noise From Nowhere." New York Times, 28 October.
The Literary Digest. 1915. "Fake War-Movies." The Literary Digest, 13 November, 1079.
Weis, Elisabeth, and John Belton, eds. 1985. Film Sound: Theory and Practice. New York: Columbia University Press.
Whannel, Garry. 1992. Fields in Vision: Television Sport and Cultural Transformation. London: Routledge.
Winston, Brian. 2000. Lies, Damn Lies and Documentaries. London: British Film Institute.
Wintonick, Peter. 2000. Cinéma Vérité: Defining the Moment. National Film Board of Canada. Documentary feature.
Zimmermann, Patricia R., and Sean Zimmermann Auyash. 2015. "Nanook of the North." (Essay) Library of Congress. Available online at www.loc.gov/programs/static/national-film-preservation-board/documents/nanook2.pdf
9 SOUND IN VIDEO GAMES
Introduction

For a long time, there was a commonly held view that video games were only of interest to children and teenagers, whether in arcades or on inexpensive dedicated devices, and they were not seen as an industry or medium which warranted serious examination, unlike cinema or television. Games have since come to occupy a significant position in the cultural landscape, thanks to various successful and influential games and the proliferation of games hardware, whether PCs, handheld devices or home entertainment systems such as Xbox and PlayStation.1 The development of digital video games has accelerated to the point that they are virtually unrecognisable from the humble beginnings of their arcade forebears. In their use of sound, the complexity and creativity that was once only available in the cinema can now be heard in high-end streaming television productions as well as AAA game titles. For sound practitioners video games are increasingly important, as the sound recordists, editors and designers who may have initially worked in film and television increasingly apply their skills across different media. As the games industry has grown to outstrip film, interactive audio has become a hugely important part of the sound design landscape. In this chapter, we will examine some prominent games to explore the functions and aims of their sound designs, as well as some of the theoretical models or frameworks used in both game audio analysis and production. One of the aims of this book is to outline a theory that is sufficiently broad to encompass any type of sound design, but specific enough to be of practical use. As with the earlier analysis of the different theoretical approaches to film sound, we can compare the game and game audio models with the semiotic model to see whether it provides a useful way of examining video game sound and how well it applies to the practice of interactive sound design.
Similarities between interactive and linear media

First, it is worth mentioning some commonalities that game audio has with its linear cousins. Both use sound deliberately and meaningfully, to create stories in a believable or realistic manner, even if the actual context may be fantastical, just as it is in fantastical genres of cinema or television. Both seek an emotional or intellectual investment from their audiences through the use of creative and artistic techniques in the creation of the sound design. Games practitioners often work with sound in similar categories of dialogue, music and sound effects. Both linear and interactive media use recorded sounds which are then edited or 'designed' to better suit their purpose. In each, some sounds are used in such a way as to be explicitly noticed or heard by the audience. At other times each uses sound which is designed to go unnoticed, but which is still intended to have some effect on the listener. Both use sound for a range of different purposes and take advantage of its various functions. Sound might have an informational function by using a concrete sound happening either on-screen or off-screen, which provides useful information for the audience or player. Both use sound for an emotional function, such as through music to accompany a battle sequence or highlight a particularly exciting or poignant moment. There are also creative and technical challenges that need to be overcome for each project. The pre-production, production and post-production phases of games are slightly different in focus to linear production, but largely involve the move from writing, planning and artistic conception in pre-production, through to the creation of the content itself, such as the building of environments, characters and objects, in production, and on to the testing and fine-tuning of the final work.

What is different about sound in video games?

Although it shares several similarities with linear sound design, sound for interactive media is significantly different in its implementation:
Mixing – Modern game audio is usually 'mixed' in real time, each time it is played. As a result, when it is heard by the audience it is likely to be different each time. For the sound designer the focus is on creating a set of sounds that will interact according to a set of rules for different scenarios.

Shared resources – Where linear media has consistent and dedicated storage and delivery pipelines for audio content, games often impose limitations depending on the available resources, which are shared with the game engine, AI elements, physics, and so on. These resources could be the amount of available RAM, disc streaming bandwidth, or CPU cycles which can be allocated to audio. This means that the implementation of sound needs to take into consideration a hierarchy of sounds in which less important elements can be discarded in order to free up system resources.

It is also different in the way it is heard:
Variability – For any game sound designer the way that their work will be consumed by the player is unpredictable. The exact sequence of events which unfolds during gameplay is not known when the game is built.

Duration – Games may be played for as long as the user wishes, or for as long as the game takes to reach its conclusion. There is no definite imposed time limit. Even where a game has a finite duration, it can be played again and may be different in each iteration.
In creating sound elements for a game, the sound designers may need to consider not just each sound but how it will play back and interact with other sounds, and how it will be affected by or interact with other elements in the context of the game. For example, the physical environment of the game may change how the sound will be heard. Speech may be heard naturally or through an intercom effect. It would be impractical or impossible to account for every possible location or circumstance each individual sound may inhabit. As a result, the equivalent of the re-recording mixing stage, where each sound element is brought together, is instead focused on the logical rules that will be applied in given circumstances at run time.
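To make the idea of run-time mixing rules concrete, the following is a minimal sketch in Python of a mixer that applies a priority hierarchy and a single contextual rule each frame. It is illustrative only: the class names, the voice budget and the 'intercom' rule are invented for this example and are not drawn from any particular game or middleware.

import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class SoundRequest:
    priority: int                              # lower number = more important
    name: str = field(compare=False)
    volume: float = field(compare=False, default=1.0)

class RuntimeMixer:
    """Decides, each frame, which requested sounds actually play."""

    def __init__(self, max_voices):
        self.max_voices = max_voices           # budget shared with the rest of the engine

    def mix(self, requests, environment):
        # Rule 1: keep only the most important sounds; the rest are
        # discarded to free up resources (the hierarchy described above).
        survivors = heapq.nsmallest(self.max_voices, requests)
        playing = []
        for req in survivors:
            # Rule 2: a contextual rule applied at run time, e.g. speech
            # routed through an 'intercom' effect in a radio context.
            effect = "intercom" if environment == "radio" and req.name.startswith("speech") else "dry"
            playing.append(f"{req.name} ({effect}, vol={req.volume:.2f})")
        return playing

mixer = RuntimeMixer(max_voices=3)
frame_requests = [
    SoundRequest(0, "speech_squad_leader"),
    SoundRequest(1, "gunfire_enemy", 0.8),
    SoundRequest(2, "footsteps_player", 0.5),
    SoundRequest(3, "ambience_wind", 0.3),     # dropped: exceeds the voice budget
]
print(mixer.mix(frame_requests, environment="radio"))

Even in this toy form, the design work lies in choosing the rules – the priorities and contexts – rather than in fixing a single, final balance of sounds.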
Early game sound design – creating meaningful sounds

Slot machines and pinball machines were the forerunners of video games. The mechanical or electro-mechanical sounds that accompanied aspects of these games were often bells, just as they were in a telephone, because they produced a lot of 'bang for buck'. Different bells could indicate to everyone within earshot that a slot machine was paying out, or warn the arcade owner that a machine was being 'tilted' (Collins 2016). As with modern video games, the sounds of these machines were used to provide player feedback, such as indicating that a particular threshold score had been reached or announcing some other reward. An added benefit was that sound was clearly audible to nearby potential players and could be used as a means of attracting new customers.2 Spacewar! (1961) is generally considered to be the first widespread video game and was originally intended to have sound, but memory was at a premium and so sound was omitted.3 A later game, Computer Space (1971), used a standard General Electric portable television and did use sounds, listed in the owner's manual as "beeps, missile scream, rocket thrust, and explosions" (Bushnell 1972, 1). One of the first video games to achieve commercial success was Pong (1972), originally made by Atari for the arcade, and later ported to other hardware such as the Binatone system for the home market. The sound design of Pong consisted of a relatively simple, but nevertheless meaningful, system of synthesised tones synchronised to the 'ball' hitting either player's bat or the sidewall, or the scoring of a point.4 Through synchronisation with the action the sounds were quickly recognisable and meaningful. Asteroids (1979) brought a greater range of sounds: ten in all, none of which exceeded one second in duration.
There were separate sounds for fire; small, medium and large explosions; thrust; small and large flying saucers; bonus life; and two samples for the music, which consisted of two alternating notes. For the two-dimensional fixed shooter game Space Invaders (1978), the four individual sound effects were: shot; player die; invader die; and UFO, which repeats as a continuous loop whereas the other sound effects are all 'one-shots'. A fifth sound is the musical accompaniment, a looping four-note descending sequence. While early games used simple synthesised sounds, the introduction of sample playback brought with it the potential to use recordings of real sounds. By the time we reach modern games, delivered on platforms that support several gigabytes of storage, the focus has shifted from the sheer number of sounds to the flexibility of sounds. For example, in LA Noire (2011) there were over a hundred ambience tracks made from loops of between ninety seconds and two minutes (Theiler 2013). More recently still, a more flexible and efficient approach to creating ambiences uses smaller component sounds to 'spawn' a constantly changing soundscape, placed in an environment with varying volume and spatialisation.5 At the time of Space Invaders, microcomputers were insufficiently powerful to run the game, so its creator Tomohiro Nishikado designed a purpose-built board to handle the graphics, and the mono sound was based around the newly available Texas Instruments SN76477 sound chip. The limitations of the hardware inadvertently influenced the gameplay mechanics:

Originally I wanted to move 55 Invaders at the same time, but the hardware only let me move one Invader every 1/60 second. As a result, Invaders began to move faster as they decreased in number. But in the end, this actually added more thrills to the game. (Nishikado 2004, 43)
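The 'spawning' approach to ambience described above can be sketched in a few lines. The component names, level ranges and timing values below are invented for illustration; they are not taken from LA Noire or from any published toolchain.

import random

# A hypothetical pool of short component recordings (name, duration in seconds).
COMPONENTS = [("bird_chirp", 2.1), ("distant_car", 4.0), ("wind_gust", 3.2), ("dog_bark", 1.4)]

def spawn_ambience(total_seconds, seed=0):
    """Schedule component sounds at irregular times, levels and positions,
    producing a soundscape that never repeats exactly."""
    rng = random.Random(seed)
    clock, events = 0.0, []
    while clock < total_seconds:
        name, duration = rng.choice(COMPONENTS)
        events.append({
            "start": round(clock, 2),
            "sound": name,
            "volume": round(rng.uniform(0.2, 0.9), 2),  # varying level
            "pan": round(rng.uniform(-1.0, 1.0), 2),    # varying spatial position
        })
        clock += duration + rng.uniform(0.5, 5.0)       # irregular gaps between spawns
    return events

for event in spawn_ambience(30.0):
    print(event)

A long loop stored on disc is replaced by a handful of small assets plus a rule for recombining them, trading storage space for run-time logic.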
FIGURE 9.1 Pong
FIGURE 9.2 Space Invaders
FIGURE 9.3 Asteroids
As the gameplay speed increases, the tempo of the music also increases, peaking at the end of the level when only one speedy alien invader remains, which is therefore the most difficult to finish off. Asteroids, by contrast, has its difficulty and dramatic peak long before the end, since the final asteroid is no more difficult to destroy than the first. In both Asteroids and Space Invaders the sound implementations can be seen as examples of the use of symbolic sound signs which are meaningful.
By using simple characteristic differences between sounds, an existential link between the sound itself and the event that caused it, and a meaning relating to the differences between the sounds, the relatively simple sounds give useful feedback to the player. In the case of Space Invaders, a by-product of the hardware choice is that the speed of the game, and of its sounds and music, increases as each level becomes more difficult. The musical tempo increases as the number of aliens declines. This simple relationship between the game state and the musical accompaniment was a forerunner of the techniques that provide complex player feedback through interactive musical accompaniment.
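In Space Invaders the accelerating pulse was a by-product of the hardware, but the same state-to-tempo mapping is easy to express deliberately. A sketch, with illustrative numbers rather than the arcade original's values:

def music_tempo(invaders_remaining, total=55, slow_bpm=60.0, fast_bpm=240.0):
    """Map the number of surviving invaders to the tempo of the
    four-note descending loop: fewer invaders -> faster music."""
    progress = 1.0 - invaders_remaining / total   # 0.0 at the start, ~1.0 near the end
    return slow_bpm + (fast_bpm - slow_bpm) * progress

for remaining in (55, 30, 10, 1):
    print(remaining, "invaders ->", round(music_tempo(remaining)), "BPM")

The mapping function is trivial; what matters semiotically is that the player learns to read the tempo as a sign of the game state.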
Theoretical models for video games

Sound designers for games, like their counterparts in other industries, collaborate with colleagues involved in other aspects of the production. In common with the analysis of sound in other audiovisual media, an important concern is the language used to talk about games broadly and game audio in particular. There is inevitably some specific technical terminology that must be learned by practitioners to communicate technical details when working in a team environment. In games there may be conversations discussing topics such as asset counts, memory budgets and bandwidth. Whilst these are not concerned with the actual production or the experience of the person watching, listening to or playing the finished article, they do ultimately have an impact on it. There are several theoretical models of game structures, such as the rules, story and gamespace, along with descriptions of the different typologies for games overall and classifications of the dozens of game genres. It is difficult to disentangle categories of sounds, types of sounds and functions of sounds without talking about their usage in the context of a game. Chris Crawford's The Art of Computer Game Design (1984) was influential in that it set out a path to what games could become despite limitations of hardware and regardless of individual game genres. A more modern book with widespread influence, Jesse Schell's The Art of Game Design (2008), also focuses on the ultimate aim of the design process, which is to create an experience. It contains numerous lenses through which to analyse games and isolates four major building blocks: technology, mechanics, story and aesthetics, which form the 'elemental tetrad' of games (Schell 2008, 44–45). Each aspect of game design, including sound design, is examined using a series of lenses such as needs, problems to be solved, or indirect control, creating many fruitful starting points for designers. One framework to describe games that has become influential is the mechanics-dynamics-aesthetics (MDA) model, which emerged from a workshop presented at the annual Game Developers Conference. Its aim was to "bridge the gap between game design and development, game criticism, and technical game research" (Hunicke, Leblanc and Zubek 2004, 1). In this model, just as the design requirements must have an effect on the code required, it follows that the coding decisions, algorithms, tools and vocabulary will manifest themselves in the gameplay. For example, Space Invaders' technological limitations and implementation had a profound impact on the gameplay, just as the design requirements determined the visual style, the weapons and the sound choices.
The MDA model takes the components of a user-perspective model of games and pairs each with its design counterpart: "From the designer's perspective, the mechanics give rise to dynamic system behaviour, which in turn leads to particular aesthetic experiences. From the player's perspective, aesthetics set the tone, which is born out in observable dynamics and eventually, operable mechanics" (Hunicke, Leblanc and Zubek 2004, 2). In short, Rules ≡ Mechanics, System ≡ Dynamics, Fun ≡ Aesthetics:

Mechanics describes the particular components of the game, at the level of data representation and algorithms. Dynamics describes the run-time behavior of the mechanics acting on the players' inputs to the game and the results of this player interaction in the game over time. Aesthetics describes the desirable emotional responses evoked in the player, when she interacts with the game system. (Hunicke, Leblanc and Zubek 2004, 2)

The MDA model also separates the aesthetics component into different elements such as challenge (obstacles to be overcome), fellowship (game as social framework), discovery (game as uncharted territory), narrative (game as drama), and so on (Hunicke, Leblanc and Zubek 2004, 2).6 Its influence can be attributed to the insights that are possible through its application to different games and aspects of games. It can reveal aspects of the gameplay, and we can examine the design implications in dynamics and mechanics that support the desired player experience. In terms of the way the rules or mechanics for a game can be explained, the semiotic concept of abduction, or hypothesising, is fundamentally important in learning and playing games. When presented with a new game, whether in an arcade or on a PlayStation, a large part of the gameplay needs to be learnt simply from playing. The feedback to the player, both from the controller itself and from the reactions to events in the game, determines whether or not a player will persevere with the game. A sweet spot, neither too easy nor too hard, in which feedback can be incorporated is required to build up the conceptual framework necessary to progress through the game. We can also look at the aesthetic aspects of the way sounds are implemented to examine what is communicated to the gamer from a semiotic perspective. For example, where challenge is a significant element and a particular level needs to be completed, the sound implementation might provide negative as well as positive feedback. Sound feedback here is intended to make the player feel something. A mocking commentary on the gameplay, such as the dog in Duck Hunt mimicking laughter, is described by game audio designer Scott Martin Gershin: "Using a modulated synthesiser it almost has a laughing quality … it's kind of fun when games tease you – is that the best you can do?" (Gershin et al. 2017). Here it is the effect on the player of the use of sound which is important. The 'end result', or interpretant in semiotic terms, is to entice the player to overcome the difficulty of the game through teasing or gentle mockery. The sound most appropriate to that design aim is used.
Game audio models

Along with models of games as a whole there are also a number which look at game audio specifically. An aspect of games that is subtly different from linear media concerns the space or world that the game and the player inhabit. In a film there is a relatively distinct boundary between diegetic sound – the story world that is heard by the characters – and non-diegetic sound – the sound which can only be heard by the audience, such as narration or music score.7 In a video game there is a diegetic space of and from the game world, but also a separate space which the individual player can hear, but which other players or characters cannot. This differentiates a space unique to games, which Kristine Jørgensen describes as one "where the participatory nature of games allows the players a dual position where they are located on the outside of the gameworld but with power to reach into it" (Jørgensen 2011, 79). The diegetic/nondiegetic model, being one of the simplest and most widespread of the film sound models, has also found a use in game audio. Games, like films, have a story world or space which extends beyond the frame of the visible screen. There is also the idea that sound can occupy space which may be onscreen, or offscreen, or outside the gameworld itself. Where the diegetic/nondiegetic divide has proven to be of use to film theory teachers (if not widely adopted by practitioners), it becomes more problematic in describing the game's sonic world (Jørgensen 2011).8 How should one view character dialogue or voiceover that is directly addressed to the player or player's character, for example?9 Similarly, how do we characterise talk between players in a multiplayer game? Would this be different if the players are physically in the same room, compared to communicating through headsets via the game? In a game, the player occupies a space that spans the gameworld and the real world. We can use the term gamespace to accommodate such sounds that are part of the conceptual space in which the game is played. Menu elements, or spoken communication between players, would be part of the gamespace but not part of the gameworld, in the same way that players of poker or Monopoly might have conversations directly relating to the game which are part of the gamespace. Similarly, using a straightforward dialogue/music/effects classification potentially downplays the functional aspect of the usage. For example, speech sounds might be conversational, their only function being to provide a background ambient bed, or they could be primarily informational or instructional (e.g. the leader of the mission giving a briefing). Speech might also be primarily expressive, where the content is secondary to the emotion being conveyed, or it could perform a secondary function as a locatable sound source, giving concrete information such as an enemy's direction and distance. Speech might also operate as simple narration. Tomlinson Holman's model of the functional aspects of the soundtrack for film sound can also be adapted to game audio.
In Holman's model, the soundtrack performs three functions: direct narrative, subliminal narrative or grammatical (Holman 2010, xi–xii). In film, the grammatical function might be to hide picture edits by having consistent and continuous sound before and after a cut, for example. Since there is no direct equivalent of the picture cut in games, the grammatical rules are slightly different. Rather than examining individual sounds to position them, we can then look at how a sound's function contributes to the overall experience. In film and television it is relatively unusual to attract attention off-screen, since the narrative is unfolding on screen and any threat to break that engagement in the story is potentially fraught with danger.10 In a game the player often exists in a three-dimensional world and has agency to change their own perspective and field of view. The virtual objects that emit a sound in a video game are already attached to visible objects positioned in three-dimensional space, and those same coordinates can be used by the sound engine to map each sound's origin. Here the sound's directional information is intimately tied to the mechanics of the game, and its primary function might be to signal the player to change their field of view (their visual display) to somewhere currently off screen. In a film this localisation of directional information would be a panning decision, whereas in a video game it is a semi-automated process that is required for interaction with the game. Jørgensen emphasises the interface aspect of many of the audio elements of games and positions them along a continuum depending on their function, where metaphorical interface sounds such as music are at one extreme, while at the other end are iconic interface sounds that would be diegetic or hard sound effects synchronised with actions or events (Jørgensen 2011, 91–92). Many other models of game audio use the diegetic/nondiegetic divide as a starting point and incorporate a more functional perspective. For example, Alexander Galloway uses a two-dimensional model in which sounds are described in terms of their origin in the gameworld, with the two axes being Operator/Machine and Diegetic/Nondiegetic (Galloway 2006, 17). Other models, such as that from Westerberg and Schoenau-Fog, expand the simple diegetic/nondiegetic model into different sound types depending on their function:

Diegetic sound – Both the sound and to what it refers to, exists within the world of the game, e.g. the sound of a shot when the player shoots a gun.

Masking sound – The sound belongs to the world of the game but contains a non-diegetic message, e.g. a monster's roar as a warning to the player when it is in range to attack.

Symbolic sound – The sound does not belong to the world of the game but is caused by a mechanic in the game, e.g. music playing just before an ambush to warn the player.

Non-diegetic sound – The sound neither belong to the world of the game, nor does it refer to anything in the world of the game e.g. the sound of hovering the mouse over the menu interface. (Westerberg and Schoenau-Fog 2015, 47–48)
In this model of game audio, sounds can also be characterised as being proactive or reactive (from the perspective of the game, rather than the player), such that proactive sounds supply information to the player, while reactive sounds respond to player action (Westerberg and Schoenau-Fog 2015, 49). The Interface-Effect-Zone-Affect (IEZA) framework has also been influential in describing the usage of sound in games, using diegetic/non-diegetic as the vertical axis and setting/activity as the horizontal axis, creating four quadrants with which to categorise the use of sound (Huiberts 2010, van Tol and Huiberts 2008). Effect sounds are perceived as being produced by, or are attributed to, sources that exist within the game world, while Zone sounds, though also coming from the world, are environmental rather than being tied to objects or sources. Interface sounds originate outside the fictional game world, but often communicate abstract information such as health or score through symbolic sounds. Finally, the Affect group of sounds does not relate to a particular activity but instead "communicates the setting of the nondiegetic side of the game environment and is used to add or enlarge social, cultural and emotional references" (Huiberts 2010, 29). The IEZA model has been combined with other models from film sound (Wilhelmsson and Wallén 2011) and with Schell's Elemental Tetrad (Ralph and Monu 2015).
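As noted above, sounds attached to game objects can reuse the coordinates the engine already tracks for them. The following is a minimal sketch of deriving a stereo pan, a distance attenuation and a 'behind you' cue from emitter and listener positions. It is an assumption-laden toy: the function and parameter names are invented, and a real engine would add occlusion, reverberation and full 3D rendering.

import math

def localise(listener_xy, facing_deg, emitter_xy, ref_dist=1.0):
    """Derive pan and gain for a sound-emitting object from the same
    coordinates the game uses to draw it."""
    dx = emitter_xy[0] - listener_xy[0]
    dy = emitter_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    # Angle of the emitter relative to where the player is facing.
    bearing = math.degrees(math.atan2(dx, dy)) - facing_deg
    bearing = (bearing + 180) % 360 - 180           # wrap to [-180, 180)
    pan = math.sin(math.radians(bearing))           # -1 = hard left, +1 = hard right
    gain = min(1.0, ref_dist / max(distance, 1e-6)) # simple inverse-distance law
    behind = abs(bearing) > 90                      # may prompt a change of view
    return pan, gain, behind

pan, gain, behind = localise((0, 0), facing_deg=0, emitter_xy=(-4, -3))
print(f"pan={pan:+.2f} gain={gain:.2f} behind={behind}")

Here the directional information falls out of the game's own mechanics, in contrast to film, where the same localisation would be a panning decision made in the mix.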
Game audio and semiotics

A common complaint about working with collaborators across artistic disciplines is the lack of a shared language. For Doug Church, "the primary inhibitor of design evolution is the lack of a common design vocabulary" (Church 1999). The semiotic model described in this book can be thought of as an overarching framework into which other models can be integrated. If we compare the various models of game sound, we can see how well they can be incorporated into the semiotic model of sound. Game design is, by definition, a design process concerned with problem solving. The various models of games and game audio highlight different aspects of the process of design, or of the gameplay itself. Whether it is the focus on different aspects of experience, story, mechanics and technology in the Elemental Tetrad, or the IEZA's categorisation of sounds along the twin axes of effect/affect and diegetic/nondiegetic, there is space for examination of the way games are understood or learned by the player. Semiotics is a potentially fruitful way of examining what the intended consequence or effect of each component might be and then examining the way that end is achieved. For sound designers it also provides a way of demonstrating the particular advantages of using sound to provide solutions to design problems. In the game Galaxian (1979) – a variation on Space Invaders – shooting the enemy aliens as they periodically swoop down is rewarded with bonus points, and this is signified through a sound motif and a visible number, indicating the points won, at the spot where the alien was destroyed.
Amongst all the shooting and swooping of aliens it might initially be difficult to determine what each sound referred to, but eventually it comes to have meaning, which renders the numerical display slightly redundant as visual attention is needed elsewhere. In Bomb Jack (1984) different signature sounds accompanied game events – the birth of a new enemy and their fall to earth (which enabled them to move), the player's jump, the player's death, and so on. When a power pill became available it was accompanied by a sound, and when the pill was used it was both signified and timed by a musical motif that played for its duration. This gave very clear feedback about the power pill's duration without requiring any countdown timer or visual representation to indicate the moment the pill would abruptly wear off. In each case there was little or no need of instructions or rules to introduce these concepts, as they became self-evident with play. What the sound designer considers is how a sound is going to be interpreted by the player, and how it might change or react to player actions, or respond to changing game dynamics, which can also give clues to the player. For example, if music plays when an object is picked up, should there be some acknowledgement or sense of reward? At other times the feedback to the player might be subtler. If the ambience or music changes in some way immediately before the appearance of an enemy boss, that association may initially go unnoticed but might eventually be recognised as being somehow significant as the player becomes more accustomed to the game. In order to play a game a player brings with them a set of existing paradigms – familiarity with other games or genres of games which function in a broadly similar way: side-scrollers, platform games, first-person shooters, etc. Each player builds up a conceptual map that helps them understand the grammar of the game, and what they are required to do in order to progress through it. The sound that accompanies an action is partly an aesthetic choice. The fact that the sound actually accompanies an action or event in the game is part of the game mechanics itself. For a sound designer, the sounds themselves, the way the sounds interact or change as a result of the player's actions, and the game context are all interrelated and part of the sound design process. The semiotic concept of abduction as the preliminary stage of logical reasoning is a very useful one to apply to gameplay. We literally have to learn how to play, and learn what things are significant in the world that is presented to us as we interact with it. Sounds become linked with objects, actions and events. The exact choice for both synchronous event sounds and non-synchronous sounds might be entirely new, or might bring with it metaphorical echoes of previous similar sounds heard in the natural world, in other games, or in other media. As each sound usage becomes more familiar it symbolises its object and can be recognised and interpreted. Sounds that accompany events or actions have a built-in indexical link to the event or action. Once that action is repeated and a similar sound is heard, a very strong link is made between the object (action or event) and the signifier (sound). By the same token there may be some deliberate ambiguity about the sound that is accompanying an action, or about which object is being denoted by a particular sound.
Some classic arcade games used this link between sounds and game actions to allow visual attention to remain elsewhere. The process of learning how to play a game is so naturalised that it might escape analysis, but it is a process nonetheless. At the start of a game we are presented with minimal information and work out the most reasonable explanation. We carry on with this hypothesis until a better one emerges or is forced on us. Each new piece of information might be assimilated into our understanding or might modify it.
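The Bomb Jack power-pill pattern described earlier – a temporary state whose duration is communicated by the length of a music cue rather than by a visual timer – can be expressed as a simple coupling of game state to cue length. This sketch is hypothetical; the engine call and the durations are invented for illustration.

class PowerPill:
    """Couple a temporary game state to a music motif, so the motif
    itself tells the player how long the state will last."""

    def __init__(self, motif_seconds):
        self.motif_seconds = motif_seconds   # the cue's fixed length
        self.time_left = 0.0

    def collect(self, audio):
        audio.play("power_pill_motif")       # hypothetical engine call
        self.time_left = self.motif_seconds  # the state expires with the music

    def update(self, dt):
        """Advance time; returns True while the pill is still active."""
        self.time_left = max(0.0, self.time_left - dt)
        return self.time_left > 0.0

class LoggingAudio:
    def play(self, cue):
        print("playing cue:", cue)

pill = PowerPill(motif_seconds=8.0)
pill.collect(LoggingAudio())
for _ in range(3):
    print("pill active:", pill.update(dt=4.0))

No countdown display is needed: the player who has learned the association hears exactly when the effect will end.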
Modern games

Andy Farnell, author of Designing Sound (2010), views sound from three different perspectives which form the three fundamental pillars of sound design: physical, mathematical and psychological. The 'physical' pillar involves understanding the transmission and propagation of sound in real life, whereas the 'mathematical' is concerned with how computers render a representation of the real world. Finally, the 'psychological' is about perception and "how we extract features and meaning from them, and how we categorise and memorise them" (Farnell 2010, 7–8). These three pillars correspond in part to the development of the study of sound over time, which began as a unified study in ancient Greece and only began to fracture along acoustical, philosophical and psychological paths around the time of Isaac Newton and John Locke. The digital domain is fundamentally concerned with the specific tools of sound design, and with how interactive sound software and middleware allow sounds to be manipulated interactively.
FIFA

Genre games have the advantage of reducing the learning curve of a new game, and genres have grown somewhat since the days of being classified as "racin', fightin' or shootin'" (Martin Campbell-Kelly in Collins 2008, 123). Supergenres such as fighting, strategy, adventure, role-playing, racing and sports games each have consistent characteristics that make categorisation almost inevitable. Where electronic sound sources lent themselves to early space-themed games involving aliens, rockets and the like, modern games cover a much broader range of topics, each with their own aesthetic specificities and sound requirements. Sport-based games like the FIFA series (1993–present) and its principal rival Pro Evolution Soccer (1995–present) each have annual updates that parallel the seasons of football. Both attempt a lifelike portrayal of the physical world and the players themselves to create as believable a replica of reality as possible. Each involves detailed attention to the graphical presentation, modelling real teams, kits, stadia and players, whilst supporting the game with statistics about individual player attributes on which to base the gamer's team selection, formations and player trading. FIFA 18 was the best-selling boxed game in the UK in 2017 and also made the top 20 US best-selling video games (despite the relatively minor interest in soccer in the US).
The sound design for both games looks to the real-life and media presentation of soccer. For FIFA, match commentary is supplied by voices recognisable from television broadcasts, which helps lend the gameplay an air of reality. For the English-language versions, Martin Tyler and Alan Smith are familiar voices who record hours of commentary covering both specific and general scenarios, often using the same commentary microphones that they would use during a live broadcast. Interestingly, for FIFA 16 the commentators did not have any visual reference as they recorded the commentary, which is unscripted, as Martin Tyler explains:

Everything we do is unscripted which is why I guess it feels natural – because it is natural – but they will give us the scenario. They'll say, "A team's 3–0 up with ten minutes to go, and then they concede a goal. Give us three versions of what you say." (Copa90 2015)

The commentators are not actors in the sense that they are not expected to read dialogue of what someone imagines commentary to be. Instead, they perform as they would in their normal role as television commentators. By performing as they would, with natural-sounding commentary that matches the game scenario, a very realistic-sounding commentary is the end result. The purpose or function of the commentary is a feeling of realism, matching the experience outside of the game. Whether that feeling is created in this particular situation is dependent on the recognisable voice of the commentator, the natural delivery of their voice, and its suitability for the game scenario. FIFA and many other sports-centred games also rely on familiarity and the authentic correctness of key aspects of the sound design.
FIGURE 9.4 FIFA 18
A typical focus of any game's sound is as part of the game mechanics, which can be learned and mastered, and which is often as close as possible to a realistic depiction. In the FIFA series the commentary exists so that the player feels as though the game is as close to the real thing as possible. Here, presenting the 'real thing' in sound design has two quite different meanings and requirements: for the commentary it relates to the televisual production of the professional game, whereas for the gameplay sounds it relates more closely to the actual experience of playing football.
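The recording process Tyler describes implies a selection problem at run time: choose a line that fits the current match state, preferably one the player has not heard recently. A hedged sketch of one common solution – cycling through shuffled variants – with invented scenario keys and lines; it is not drawn from FIFA's actual implementation.

import random

class CommentarySelector:
    """Pick a recorded line for a scenario, cycling through the variants
    so no phrase repeats until all of them have been heard."""

    def __init__(self, lines_by_scenario, seed=None):
        self.lines = lines_by_scenario
        self.unused = {key: [] for key in lines_by_scenario}
        self.rng = random.Random(seed)

    def say(self, scenario):
        if not self.unused[scenario]:        # refill and reshuffle when exhausted
            self.unused[scenario] = self.lines[scenario][:]
            self.rng.shuffle(self.unused[scenario])
        return self.unused[scenario].pop()

lines = {
    "late_goal_when_3_0_down": [
        "They've pulled one back, but is it too late?",
        "A consolation goal, or the start of a comeback?",
        "Ten minutes left - suddenly this is a contest again!",
    ],
}
commentary = CommentarySelector(lines, seed=1)
for _ in range(4):                           # a line repeats only after all three have played
    print(commentary.say("late_goal_when_3_0_down"))

The larger the pool of variants for each scenario, the longer such a system can sustain the impression of spontaneous, live commentary.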
The Last of Us

Set in a post-apocalyptic US, much of the focus of the sound design for The Last of Us (2013) is on creating complex and natural-sounding environments that also support a particular feeling or mood. Many of the sound design elements in The Last of Us presented an immediate challenge in that the environments in the game are all non-industrial. The game's senior sound designer, Derrick Espino, explains:

We knew that it was going to be challenging for a world that had no technology, no electricity for the most part. Also we wanted to create environments that were natural but underscored the mood of the game.

Here again the dual requirements of the background sound ambience are explicit. The need for believability is married with a secondary or subliminal requirement to underscore a sense of atmosphere in keeping with the story. This multiple-purpose approach to sound design extended beyond ambience to the whole soundtrack. The approach to the aesthetic of the sound design is summed up by Game Director Bruce Straley as a question: "What is the most minimal that you need to pull off what you are trying to achieve?" (Coleman 2013). The game also gave us enemy characters referred to as 'clickers', which use a combination of clicking, screeching and moaning to create a frightening effect, but which are completely human in origin, being performed. An early idea for the creatures, based on the early conceptual artwork, was that they would be blind but would have the ability to use a form of echo-location. The performances of the principal characters were recorded using motion capture and voice recording for both cinematics and in-game audio. During one of the recording sessions one of the voice actors made a particular sound, in the back of the throat, combined with a squeak, which appeared to create the desired result, as Creative Director Neil Druckmann explains:

Here is this benign clicking sound that on its own isn't very freaky, but put in this other context all of a sudden you hear this sound and you have this different symbolism in the game and it becomes this fear factor. And again using audio, you don't know where this is, but you hear it echoing down the hallway and people get very scared when they hear this audio cue. (Druckmann in Coleman 2013)
FIGURE 9.5 The Last of Us
Once this sound is recognised or learned it also becomes a functional part of the game which gives the gamer useful information – hearing a clicker before seeing the character provides a locatable enemy. The underlying minimalistic approach was what Audio Lead Phillip Kovats referred to as using sound in a "Hitchcockian way", concerned "more about the psychology of what is happening on the audioscape" (Coleman 2013). For The Last of Us there was a deliberate choice to create a set of sounds that supported a common goal for each category but which often had multiple simultaneous functions. The ambiences needed to be a believable portrayal of a world with no technology as well as supporting the feeling of isolation. The clickers were distinctive and inhuman whilst simultaneously being human in origin; with the ability to be performed by either actors or members of the game's sound team, they could be manipulated for variety, and once learned they could be recognised and act as a warning of an unseen enemy nearby.
Call of Duty WWII

Call of Duty WWII (2017) was the best-selling game of 2017. It is a first-person shooter that adopts a composite approach to sound design typical of many games using a historical setting. As well as a requirement for intensity, a level of authentic realism, with attention to detail for certain key elements such as the weaponry and the battlefield, was a fundamental goal. Audio Director Dave Swenson "wanted the game to feel immersive and authentic to World War II. Every sound was custom recorded for the game.… we wanted the battlefield to sound intense and realistic" (Walden and Andersen 2018). For the gamer, such attention to accurate sonic representation might support a sense of believability and heightened drama.
FIGURE 9.6 Call of Duty WWII
In Call of Duty WWII (COD WW2) the appropriateness and accuracy of the sound design was also tested by someone with first-hand experience: a veteran of the Second World War acted as advisor to the project and could describe, from a first-hand account, what a soldier would hear in those circumstances:

What was happening on Normandy Beach when they were landing? What would the soldiers hear? Were they doing a lot of shooting themselves? Were they mostly just trying to find cover from the MG 42 machine guns that Germans were firing at them from the bluffs above? He would tell us stories and talk to us about real experiences of the soldiers and that would help my team to understand what it might have sounded like. (Swenson in Walden and Andersen 2018)

The extra effort and attention given to creating a more realistic sound design, rather than aiming purely for engagement or excitement, went beyond what might be expected of a traditional first-person shooter:

I wouldn't want a World War II veteran to sit down and experience the game and say, "That's not what it was like." That would be a real failure for me. My hope would be that a World War II veteran would say, "Yeah, that's exactly what it was like." That was our goal and that's why we put in all of the effort that we did. (Swenson in Walden and Andersen 2018)

This attention to a believable and accurate portrayal of the experience of warfare uses the intuition and skills of the sound designer along with the guidance of an eye- and ear-witness. For other aspects of the sound design, such as fantasy characters, there is no relevant historical source to consult.
For the sound of the zombies in COD WW2, a starting point was the fact that they were meant to be recently deceased German soldiers. For these characters a special zombie language was created by Don Veca, which he named "Zom-Deutsche" (Walden and Andersen 2018). Recordings of actual German dialogue were split into component syllables and then scrambled. The resultant gibberish language retains the component sounds of German speech whilst being divorced from its semantic meaning. This zombie dialogue has enough familiarity to sound Germanic without being recognisable, in a semantic sense, as German. In the same way that Murray Spivack first played back a tiger's roar to disembody the sound from its source, recognisable language is here used as the starting point in a manipulation that takes recognisable characteristics and re-associates them, transplanting or suturing them to a new object and a different context.
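The published account gives the principle – split recordings of real German dialogue into syllables, then scramble them – but not the tooling. The toy sketch below works on text rather than on audio slices purely to show the logic; the syllables are invented, and a production version would shuffle and concatenate audio segments instead of strings.

import random

def zombify(syllables, seed=0, words=4):
    """Scramble syllables taken from real speech so the output keeps the
    phonetic character of the language but loses its semantic meaning."""
    rng = random.Random(seed)
    return " ".join(
        "".join(rng.sample(syllables, rng.randint(2, 4)))
        for _ in range(words)
    )

# Hypothetical syllables sliced from German dialogue recordings.
german_syllables = ["ach", "tung", "schnel", "ler", "kom", "men", "sol", "dat"]
print(zombify(german_syllables))

The output is gibberish, yet still reads (and, with audio, would sound) Germanic – the indexical link to the source language survives while the semantic content is destroyed.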
Limbo

Few recent games have made such an impact in the sound design community as Limbo (2010). A 2D puzzle-platform game in essence, with a style of gameplay sometimes characterised as 'learning by dying', Limbo was an unusual and influential game which, despite its apparent simplicity, was nevertheless gripping and demanded an emotional investment from its players. The game and its sound design garnered numerous awards, yet it was the first game for its sound designer, Martin Stig Andersen.11 Limbo combines a number of interesting sound ideas, and Andersen brought with him ideas about sound and music in general:

I remember when playing the piano in my childhood I had this abstract inner vision of pulling the keys on the keyboard apart, and entering the sound, like I wanted to be inside sound itself. Today I haven't been using a keyboard for over ten years, and I've learned to form sound as if it was a piece of clay. (Andersen and Kastbauer 2011)

Working with the project's director, who had been thinking about sound before employing Andersen, several counterintuitive directions emerged: first, that the sound design should "avoid music that would manipulate the emotions of the player" (Andersen and Kastbauer 2011). Since this is precisely the function normally required of music in many audiovisual productions, it is worth unpacking what he might mean here. Rather than being 'manipulated' or steered into a set emotional response by the music, the player can instead be allowed to create a response to the world for themselves.

It reminded me of the aesthetics of light and sound; you have something recognizable and realistic, but at the same time it's abstract…. It's the same as what I love about how we use sound. We have all these slight references that focus on ambiguity, so it's more about what the listener imagines, rather than what I want to tell them. (Bridge 2012)
When Andersen discusses the intention of the sound design it is in terms of the feeling that is created or instilled in the player: of foreboding and a sense of isolation (Bridge 2012). Whilst deliberate manipulation often has negative associations, in Andersen's view it is well within the role, where "audio designers try to use their craft to manipulate players in new and different ways as an enhancement" (Bridge 2012). Andersen's approach of forming sound "as if it was a piece of clay" points to a process in which the original sound source is virtually unimportant and, in a sense, meaningless, because it will be so divorced from its usage that only the merest hint of its origin remains. With the aim of using sound to make the player "feel a bit on edge", his approach was not one which focused on the specific sounds that would be included, but rather on the effect of the choices: "'What did that make me feel?' and it might have been the best or coolest or most wonderful sound, but if it's not contributing to the emotion or atmosphere of the game, unfortunately it had to go into the basket" (Bridge 2012). This approach is one borrowed from electroacoustic music, in which a sound is transposed from its original context: "I might just extract the texture or colour and then use it to transform another sound. It leads to a slightly un-natural but useful quality allowing me to create an audio world that is generic and yet unique" (Broomhall and Andersen 2011). The use of sound was designed to encourage the player to gradually become more involved and to reflect on their own emotional response to what was happening in the game.
FIGURE 9.7 Limbo
I was trying to achieve the creation of a world structure with the audio going from quasi-realistic sound that you hear in the forest – naturalistic – then as the boy progresses through the world things become more and more abstract, almost transcendent.… So what I wanted to contribute was more along the lines that the boy got habituated to the violence – rather than the player, with the player almost wondering how to feel and with the music sometimes almost representing forgiveness. (Broomhall and Andersen 2011)

As with each of the games described, it is possible to see both the intention of the sound design and the actual soundtrack implementation in semiotic terms. From the perspective of the sound designer, the aim was to work with sound "as if it was a piece of clay", where its use is "more about what the listener imagines, rather than what I want to tell them". The actual sounds themselves are the vehicles for a meaning which may not be completely clear. How a sound is interpreted or understood is left quite open, since there is a deliberate, gradual removal of any strong indexical link to concrete objects in the real world.
Immersion – modelling reality or ignoring reality?

Immersion as a concept is quite difficult to pin down, particularly when talking about games, since it often has an implicit reality component: the more realistically the graphics, the world, the characters and the sound are presented, the idea goes, the more immersive the experience. Yet we can talk about being immersed, or engrossed, in a book, which is obviously not a realistic experience. Imaginative immersion is achievable in a good novel; it is not dependent on any lifelike quality of either audio or visual representation. We might get pleasure from watching a football match, or watching others play competitive games, where the interactivity is vicarious. The unfolding story or experience is what is immersive: if it holds our interest, it has the potential to be immersive. Different types of immersion (sensory, challenge-based, imaginative) will require different approaches and designs (Ermi and Mäyrä 2005). In virtual environments sound is thought to have a fundamental requirement to provide accurate and/or authentic models of acoustics in order to achieve a sense of immersion. Frequently this accuracy is pursued through the simulation of sounds, sound propagation and room acoustics, and binaural rendering, often using hardware and techniques adapted from graphical rendering (see Serafin et al. 2018). It is interesting to note that in virtual environments the audible and visual functions are often considered separate and mutually exclusive, but there is evidence to suggest that perceptually they can affect each other. In a study where both high- and low-quality auditory displays were paired with high-quality visual displays, the audio affected the perceived quality of the visual display (Storms and Zyda 2000). Better quality audio made the images appear better than when they were presented without sound. Equally important, lower quality audio made the images appear worse than when they were used without sound. This finding corresponds with the widespread belief and lived experience of many visual special effects artists as well as sound designers.
Fidelity is seen as important, but 'sonic realism' in games is a concept that benefits from some unpacking. How much realism is required? For Droumeva Milena the concept of verisimilitude provides a broader perspective than simple fidelity or realism with respect to game audio, where verisimilitude "concerns itself with the experience and nature of truthfulness and authenticity in a game context, as conveyed through the game soundscape" (Milena 2011, 143). Repetition can undermine a sense of immersion if what is being presented is an otherwise realistic scene. While Pong or Space Invaders using the same sounds over and over again is not considered a problem, hearing a single phrase used too frequently in FIFA might inadvertently reveal the limitations of a large but ultimately finite commentary database. This runs counter to the required sense of spontaneity in commentary, which is the hallmark of a live event. Context is crucial. In a game like Dragon Age (2009), footsteps are a significant part of the game, giving natural feedback about the player character's movements and also those of potential enemies. Accordingly, it is important to the player that footsteps never appear to repeat, as here repetition is the enemy of believability. Technology need not necessarily be hidden to create a sense of immersion. In Simon McBurney's theatre piece The Encounter, the audience listens to the production on individual headphones, often via a binaural dummy-head microphone located onstage. Inspired by the meeting between photographer Loren McIntyre and the Amazonian Mayoruna tribe, the technological advantages of this approach are many and varied:

When you put the headphones on you get the impression of being alone. McIntyre was being alone so you have to reproduce the feeling of being alone, being in an audience of several hundred people, and this does seem to work. I wanted a feeling of intimacy because I wanted people to examine their own empathy. Empathy and proximity are intimately connected. (McBurney 2016)

Delivering the intimate sound to each audience member through a binaural microphone aims for a different effect to traditional narrative storytelling: a "binaural stereo experience [that] moves the listener into the scene of the original performance, in contrast to other space-related recording techniques, where the acoustic event is moved to the listener" (McBurney cited in Orozco 2017, 35). The impression of intimacy is particularly effective despite the artificiality of the conceit, and despite the audience being purposely shown how this technology works and how it is being used. In games, as in cinema and television, there has been a tendency to assume that greater realism will lead to greater immersion. Whilst it is tempting to imagine that greater immersion will be the inevitable result of technological advancement and ever more accurate sonic representation, there is evidence from other fields that this is not necessarily the case.
the 1930s (Altman 1980, 50–54). More specifically in games, Mark Grimshaw is similarly wary of this approach, arguing for “perceptual realism (as opposed to a mimetic realism) where verisimilitude, based on codes of realism proves as effective if not more efficacious than emulation and authenticity of sound” (Nacke and Grimshaw 2011, 273).
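To make the footstep example above concrete: one common way to stop a small pool of recordings from sounding repetitive is to randomise both which sample plays and how it plays, while never repeating the same sample twice in succession. The minimal Python sketch below is purely illustrative; the class, file names and variation ranges are invented for demonstration and do not represent the API of any particular engine or middleware:

```python
import random

class FootstepPlayer:
    """Illustrative sketch: randomised sample selection with small pitch
    and volume offsets, so successive footsteps never sound identical.
    Sample names and ranges are hypothetical."""

    def __init__(self, samples):
        self.samples = list(samples)  # recorded footstep variations
        self.last = None              # remember the previous choice

    def next_step(self):
        # Exclude the previous sample so the same recording is never
        # heard twice in a row (fall back if only one sample exists).
        candidates = [s for s in self.samples if s != self.last] or self.samples
        self.last = random.choice(candidates)
        pitch = random.uniform(0.95, 1.05)   # +/- 5% playback-speed variation
        volume = random.uniform(0.85, 1.0)   # slight level variation
        return self.last, pitch, volume

player = FootstepPlayer(["step_01.wav", "step_02.wav", "step_03.wav"])
for _ in range(4):
    print(player.next_step())  # e.g. ('step_02.wav', 1.03, 0.91)
```

Even with only a handful of recordings, the combination of non-repeating selection and per-play variation is usually enough to disguise the finite pool from the listener.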
Meaningful sound design

The SoundLibrarian (sound designer Stephan Schutze) series 101 Games You Must Listen to Before You Die contains a video analysis of the audio in Bioshock 2 (2010). In it he describes the use of the beautiful music composed for the game and the initial introduction of the narrative, and in particular the relationships between the player character and other characters in the game. His own reaction when hearing the ‘little sister’ in the game being threatened was one of fury:

For reference, I don’t have children and yet when playing this game I found myself reacting in this way when I heard one of the little sisters scream. I was just furious that anything would threaten them and defending them was the only thing on my mind at the time. (Schutze 2013)

One fundamental difference between game sound and linear sound design is that in games the player is not a passive audience member. We listen to a film, but we interact with a game. Through their actions the gamer has an influence on what is heard:

This means that game sound has a double status in which it provides usability information to the player at the same time as it has been stylized to fit the depicted universe. This may create confusion with respect to the role of the sound since it appears to have been placed in the game from the point of view of creating a sense of presence and physicality to the game universe while it actually works as a support for gameplay. (Jørgensen 2011, 81)

Sound in a game is individualised for a single player. Bartle suggests four categories of player – achievers, socialisers, explorers and killers – with each individual having different preferences and different ways of enjoying or consuming games (Bartle 2004, 77). Whatever the motivations for playing the game, the sound that is heard will be meaningful since it provides the context for the game world as well as responding to the players’ own actions. This also creates a potential problem for sonic clarity, since the lack of predetermination means there is a chance that the mix could at times be overwhelmed with competing sounds. As a result, as with any other type of mixing, a series of decisions needs to be made in which a
hierarchy of sounds is used to prioritise which of all the available sounds are actually heard. In some cases, a mixing system is implemented which accommodates the most important sounds at the expense of less important ones (a minimal sketch of such a priority scheme appears at the end of this section). Given that the sound will not be heard alone but alongside the visual elements of the game, there is potentially an overload of information which would work against the overall design aesthetic. In a game like Cuphead (2017), this means removing a great deal of the sound that was originally intended to be heard:

When we were mixing the game, it became more about removing sounds than anything else. The most important sounds were the ones that help notify the player of either enemy intentions or their own (whether they had built up their super weapons, etc.)… The music itself is such a huge character and driving force in the game that it needed to be the star of the show for the most part, with the sound design being the force to communicate messages to the player during the chaos. (Samuel Justice in Kuzminski 2018)

FIGURE 9.8 Cuphead

This idea of sound being used to communicate messages to the player is a simple function but an important one. At the beginning of this chapter we looked at the similarities and differences between interactive and linear media. There is another similarity that affects those working in either industry. For games, as with cinema, there remains a prevailing assumption that the visual contains the essence of the medium, and the sonic is mere augmentation:

What if, as a result of accident or magic, your character loses the ability to see or hear? In graphical worlds, the answer is simply “It Does Not Happen.” It
would be possible to continue playing without sound, but without vision there simply isn’t enough other sensory information available to compensate. (Bartle 2004, 231)12

Whilst this seems a common-sense view (pardon the pun), there are examples of games which disprove it, games which can be played by unsighted players even where the game was not designed for that purpose. ‘You could almost play it with your eyes closed’ is a phrase often used to compliment a game’s sound design, whilst stopping short of actually claiming that it is possible. Many unsighted players (blind, partially blind, temporarily blind) play games designed specifically for that audience, but many also play regular games, using only the standard sound design, which provides enough feedback to play the game effectively. Though Super Mario 64 (1996) was originally marketed for its innovative 3-D worlds on the Nintendo 64 system, a blind gamer describes playing it nonetheless:

There were all these musical cues and cues in the sound design.… You could tell when you’d picked up a star, or when the player was jumping, or when you picked up coins. When you can’t rely on visual cues, having a different audio cue for each event in the game is more than just a bonus. (Webber 2014)

There are a growing number of audio-only games.13 They may use a heightened sense of hearing as a strength in a world plunged into darkness, such as Three Monkeys (2015). Games can also serve as training tools, such as Legend of Iris (2015), inspired by the Legend of Zelda (1986), which helps blind children learn navigation skills in the context of a fantasy game (Allain et al. 2015). Initially each player is presented with small tasks to become familiarised with the controls. Increasingly complex tasks are then introduced to develop different auditory skills such as localisation, focusing, following moving objects, avoiding moving objects, orientation using environmental sound, and spatial memory (Allain et al. 2015, 2.2). Game audio can be designed in such a way that its influence or value may be obvious or less obvious. In the context of a game, the process of learning about the game’s characters, weapons, enemies, and so on provides an opportunity for the significance of each to be learned in order to progress through the game. Their recurrence will allow the gamer/player to make connections between their actions and responses. Recurrent elements of the sound design can also serve to reinforce particular details about environments or particular characters or enemies. Emotional responses can be guided or suggested. At each stage, in whatever game genre or style, a similar end-directed process is undertaken. The sound design is created with its purpose in mind. Whilst there may be a substantial amount of ‘spotting’ of particular sounds that will occur, the overriding concern is the effect the soundtrack will have on the listener, emotionally, or as a narrative element; in other words what the player should know, think or feel as a result of the sound design. Since games also use sound as a means of
feedback to guide the gameplay, we might add to this list what the designer might intend (in an ideal world) the player to do as a result.
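As a concrete illustration of the priority-based mixing described earlier in this chapter: when more sounds compete than can be heard clearly, the mix can be restricted to the most important ones. The Python sketch below is a simplified, hypothetical illustration of that principle; the names and priority values are invented, and real games implement far more nuanced systems (ducking, per-category buses, distance-based culling):

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class SoundRequest:
    priority: int                      # higher = more important to gameplay
    name: str = field(compare=False)   # compared by priority only

def mix(requests, max_voices=3):
    """Keep only the most important sounds when the scene would otherwise
    be overwhelmed; the remainder are culled (or could instead be ducked)."""
    ranked = sorted(requests, reverse=True)  # most important first
    return [r.name for r in ranked[:max_voices]]

# A hypothetical moment of chaos, loosely modelled on the Cuphead example:
active = [
    SoundRequest(9, "enemy attack tell"),
    SoundRequest(8, "player super-weapon ready"),
    SoundRequest(7, "music"),
    SoundRequest(3, "distant ambience"),
    SoundRequest(2, "cosmetic debris"),
]
print(mix(active))  # ['enemy attack tell', 'player super-weapon ready', 'music']
```

The design choice reflected here mirrors the Cuphead mix philosophy quoted above: gameplay-critical cues and the music survive, while cosmetic detail is the first thing to be removed.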
Summary

R. Murray Schafer’s view of the natural soundscape (1993) – analogous to the visible landscape in that it considers everything within audible range – can be applied to games. Some of this manufactured soundscape is the background, the bedrock of the audible world that we inhabit, which gives us the setting for our story or game to unfold. Other parts are foregrounded, or are the consequence of actions and events that demand our attention. Others still might reward the careful listener with information that will not easily be revealed to the casual or inattentive player. The work of a sound designer is in creating this multiplicity of sonic elements to support the many and various requirements of gameplay. One of the fundamentally useful suggestions in Schell’s The Art of Game Design is leaving space for imagination, a theme that will find correspondence in many areas of sound design outside game audio. While there are many differences in technology, approach and intention for interactive sound designers, there are nevertheless commonalities with other areas of sound design. Any sound design worth its salt takes account of the purpose or function and the effect of the sound.

It is essential that the exploration of the usage of sound in some interactive experience does not end up confused with the mere placing of sounds on top of things. Designers should not be searching for excuses to use sound: they should be designing ways in which sound may contribute to the purpose of the application. To put it another way, in this context, sound is a means, not an end. It is not about fitting; it is about profiting. (Valter and Licínio 2011, 364)

Here, as with any other type of audiovisual sound design, the aim is always the effect, and so the work lies in creating the sounds through which the effect is achieved. Writing about game development in a time of 8-bit, 64K, 1 MHz systems, Chris Crawford posed two hypothetical futures: a choice between no further technological development or no further artistic development.

Probably, neither of these futures will come to pass; we will have both technological development and artistic development. Yet we must remember that technological development, while entirely desirable, will never be the driving force, the engine of change for computer games.… Artistic maturation will be the dynamo that powers the computer game industry. (Crawford 1984, 106–107)

Much of the current research on audio is related to immersive audio, but as with the study of sound itself, a fundamental split concerns what we actually mean and
understand by the term ‘immersive audio’ and the assumptions that go with it. From a technological perspective, it is often about the accurate modelling of acoustic phenomena and their reproduction either through headphones or a loudspeaker network.14 The alternative interpretation is concerned with the psychology of hearing and the feeling of immersion, which may or may not relate to an accurate physical model of human auditory perception. A book can be immersive, but few would argue that it is a lifelike experience that accounts for the immersive quality. Audio research into acoustic modelling and immersive and interactive audio systems has led to new technologies and new ways to implement sound, particularly in virtual and augmented reality applications. For aspiring creative sound designers, though, there is a final thought to consider. Almost a century of audiovisual design experience has shown that a well-meaning duplication or simulation of the real world is not necessarily the best path to creating interesting, moving or engaging works or experiences. It would be foolish to ignore or stand in the way of technological development. Time and time again technological development has led creative artists to new heights, but whilst it may be the inspiration it is seldom the reason for their success. It would be wise to remember that often the most effective sound design comes from judicious use of the tools at hand, a problem-solving approach, and a good deal of imagination.
Notes
1 At the time of writing the most recent figures from the Entertainment Software Association indicate that in the US, the largest market for games, consumer spend on video games in 2017 was around $29 billion on content and $6.9 billion on hardware and accessories (www.theesa.com).
2 A common name for arcades filled with video games and slot machines (‘one-armed bandits’) remained ‘amusements’.
3 According to its creator Steve Russell, Spacewar! used only 4000 18-bit words of memory split between program and data, and so sound was not included (Takahashi and Russell 2011).
4 In the original Atari arcade version, the tones used a combination of pitch and duration: the players’ bat hit (500Hz for 0.04s, the approximate duration of one frame of video), the wall (250Hz for 1 frame) and a point scored by either player (250Hz for 10 frames). In the Binatone version of the game, the sounds were differentiated only by pitch: either player’s bat (1.5kHz), the wall (1kHz) or a point scored (2kHz).
5 This spawning system is supported in audio middleware such as Audiokinetic Wwise and Firelight Technologies FMOD (‘scatterer’) as well as many bespoke audio engines.
6 The full list of aesthetic components in games is: 1. Sensation – game as sense-pleasure; 2. Fantasy – game as make-believe; 3. Narrative – game as drama; 4. Challenge – game as obstacle course; 5. Fellowship – game as social framework; 6. Discovery – game as uncharted territory; 7. Expression – game as self-discovery; 8. Submission – game as pastime (Hunicke, Leblanc and Zubek 2004, 2). It has been noted that aesthetics are generally person-specific, with different players reacting differently. The term ‘aesthetics’ could just as easily be swapped for the term ‘affect’.
7 It should be noted that there are numerous examples where this boundary is deliberately blurred, such as the musical score of Blazing Saddles becoming visible as Count Basie’s orchestra, or the transition of musical score (non-diegetic) into source music (diegetic) originating in the story world through a radio or a band playing in the background.
8 The diegetic/non-diegetic model is often replaced by the musically derived source/score model, in which music that originates in the world of the characters is described as source (and is treated accordingly), while music that is heard by the audience but not audible to the characters is called score.
9 Some might argue that this is similar to Alfie or Annie Hall or High Fidelity, where a character breaks the ‘fourth wall’ and directly addresses the camera/audience. For example, in the opening scene of Alfie (1966) Michael Caine directly addresses the camera and introduces himself as the film’s title credit appears and then adds “I suppose you think you’re going to see the bleeding titles now. Well, you’re not. So you can all relax.”
10 In cinema this tendency to confine narrative sounds to the front speaker channels is to avoid what is known as the ‘Exit Sign Effect’, where an object might disappear off screen while its sound appears to come from somewhere in the actual physical space of the cinema (Holman 2010, 30).
11 Though Andersen was the sound designer, he is credited only as composer in several places such as the game’s Wikipedia page. This is not a new development. Donkey Kong’s Wikipedia entry lists the game’s director, producer, designers and composer while ‘sound’ is merely credited to the type of Intel microcontroller chip used (i8085).
12 Interestingly perhaps, in a footnote the author mentions that he had almost been deafened aged five; after surgery his hearing was restored, though a lasting effect is the inability to determine the direction of sounds (Bartle 2004, 403).
13 Audiogames.net lists over 600 audio-only games at the time of writing: www.audiogames.net/list-games/
14 See for example acoustic modelling for computer games (Miga and Ziólko 2015), headphone-based immersion (Yao 2017) or immersive audio using wave field synthesis (Lim et al. 2014). There are also interesting hybrid studies which examine physical responses related to audio, such as Usher, Robertson and Sloan (2013).
References

Games

Spacewar! 1961. Russell.
Computer Space 1971. Nutting Associates.
Pong 1972. Atari.
Space Invaders 1978. Taito.
Asteroids 1979. Atari.
Galaxian 1979. Namco.
Bomb Jack 1984. Tehkan.
Duck Hunt 1984. Nintendo.
Legend of Zelda 1986. Nintendo.
FIFA 1993–present. Electronic Arts.
Super Mario 64 1996. Nintendo.
Limbo 2010. Playdead.
Bioshock 2 2010. 2K Marin.
LA Noire 2011. Rockstar.
The Last of Us 2013. Sony Computer Entertainment.
Three Monkeys 2015. Incus Games.
Legend of Iris 2015. TU Delft.
Call of Duty WWII 2017. Activision.
Cuphead 2017. Studio MDHR.
Other references

Allain, Kevin, Bas Dado, Mick Van Gelderen, Olivier Hokke, Miguel Oliveira, Rafael Bidarra, Nikolay D. Gaubitch, Richard C. Hendriks, and Ben Kybartas. 2015. “An audio game for training navigation skills of blind children.” IEEE 2nd VR Workshop on Sonic Interactions for Virtual Environments (SIVE), Arles, France.
Altman, Rick. 1980. “Sound Space.” In Sound Theory, Sound Practice, edited by Rick Altman, 46–64. New York: Routledge.
Andersen, Martin Stig, and Damian Kastbauer. 2011. “Limbo – Exclusive Interview with Martin Stig Andersen.” Designing Sound. Available online at http://designingsound.org/2011/08/01/limbo-exclusive-interview-with-martin-stig-andersen/
Bartle, Richard A. 2004. Designing Virtual Worlds, 1st edition. Indianapolis, IN: New Riders Pub.
Bridge, Caleb. 2012. “Creating Audio That Matters.” Gamasutra.com. Available online at www.gamasutra.com/view/feature/174227/creating_audio_that_m
Broomhall, John, and Martin Stig Andersen. 2011. “Final Cut: Limbo.” AudioMedia, July, 32–33.
Bushnell, Nolan K. 1972. Computer Space Owner’s Manual Instructions. Nutting Associates.
Church, Doug. 1999. “Formal Abstract Design Tools.” Gamasutra.com. Available online at www.gamasutra.com/view/feature/131764/formal_abstract_design_tools.php
Coleman, Michael. 2013. SoundWorks Collection – The Sound and Music of The Last of Us. Available online at https://vimeo.com/68455513
Collins, Karen. 2008. Game Sound: An Introduction to the History, Theory, and Practice of Video Game Music and Sound Design. Cambridge, MA: MIT Press.
Collins, Karen. 2016. “Game Sound in the Mechanical Arcades: An Audio Archaeology.” Game Studies: The International Journal of Computer Game Research 16(1).
Copa90. 2015. FIFA16 Behind the Scenes: Match Commentary. Available online at https://youtu.be/_asye0IWFxY
Crawford, Chris. 1984. The Art of Computer Game Design. Berkeley, CA: Osborne/McGraw-Hill.
Ermi, Laura, and Frans Mäyrä. 2005. “Fundamental components of the gameplay experience: Analysing immersion.” DiGRA Conference: Changing Views – Worlds in Play.
Farnell, Andy. 2010. Designing Sound. Cambridge, MA; London: MIT Press.
Galloway, Alexander R. 2006. Gaming: Essays on Algorithmic Culture. Minneapolis: University of Minnesota Press.
Gershin, Scott Martin, Russell Brower, Tommy Tallarico, and Pedro Seminario. 2017. “Classic Video Game Sounds Explained by Experts (1972–1998) | Part 1.” Wired.com. Available online at https://youtu.be/jlLPbLdHAJ0
Holman, Tomlinson. 2010. Sound for Film and Television, 3rd edition. Amsterdam; Boston: Elsevier/Focal Press.
Huiberts, Sander. 2010. “Captivating Sound: The Role Of Audio For Immersion In Computer Games.” Doctoral thesis, Utrecht School of the Arts (HKU) and University of Portsmouth.
Hunicke, Robin, Marc Leblanc, and Robert Zubek. 2004. “MDA: A Formal Approach to Game Design and Game Research.” AAAI Workshop Challenges Game, pp. 1–5. Available online at http://www.aaai.org/Papers/Workshops/2004/WS-04-04/WS04-04-001.pdf?utm_source=cowlevel
Jørgensen, Kristine. 2011. “Time for New Terminology?: Diegetic and Non-Diegetic Sounds in Computer Games Revisited.” In Game Sound Technology and Player Interaction: Concepts and Developments, edited by Mark Grimshaw, 78–97. Hershey, PA: IGI Global.
Kuzminski, Adriane. 2018. “Cuphead Sound Design.” ASoundEffect.com. Available online at www.asoundeffect.com/cuphead-sound/
Lim, H., C. Kim, E. Ekmekcioglu, S. Dogan, A. P. Hill, A. M. Kondoz, and X. Shi. 2014. “An approach to immersive audio rendering with wave field synthesis for 3D multimedia content.” IEEE International Conference on Image Processing (ICIP), 27–30 October.
McBurney, Simon. 2016. “The Encounter Simon McBurney | Complicité.” YouTube. Available online at https://youtu.be/2cxnzcsHuKM
Miga, Bartlomiej, and Bartosz Ziólko. 2015. “Real-time acoustic phenomena modelling for computer games audio engine.” Archives of Acoustics 40(2).
Milena, Droumeva. 2011. “An Acoustic Communication Framework for Game Sound: Fidelity, Verisimilitude, Ecology.” In Game Sound Technology and Player Interaction: Concepts and Developments, edited by Mark Grimshaw, 131–152. Hershey, PA: IGI Global.
Nacke, Lennart E., and Mark Grimshaw. 2011. “Player-Game Interaction Through Affective Sound.” In Game Sound Technology and Player Interaction: Concepts and Developments, edited by Mark Grimshaw, 264–285. Hershey, PA: Information Science Reference.
Nishikado, Toshihiro. 2004. “Nishikado-San Speaks.” Retro Gamer 3: 35.
Orozco, Lourdes. 2017. “Theatre in the Age of Uncertainty: Memory, Technology, and Risk in Simon McBurney’s The Encounter and Robert Lepage’s 887.” In Risk, Participation, and Performance Practice: Critical Vulnerabilities in a Precarious World, edited by Alice O’Grady, 33–55. Cham: Springer International Publishing.
Ralph, Paul, and Kafui Monu. 2015. “Toward a Unified Theory of Digital Games.” The Computer Games Journal 4(1): 81–100. doi:10.1007/s40869-015-0007-7
Schafer, R. Murray. 1993. The Soundscape: Our Sonic Environment and the Tuning of the World. Rochester, VT: Destiny Books.
Schell, Jesse. 2008. The Art of Game Design: A Book of Lenses. Amsterdam; Boston: Elsevier/Morgan Kaufmann.
Schutze, Stephan (SoundLibrarian). 2013. “Bioshock 2 Game Audio Analysis.” YouTube. Available online at www.youtube.com/watch?v=_gdJA5tF7f4
Serafin, S., M. Geronazzo, C. Erkut, N. C. Nilsson, and R. Nordahl. 2018. “Sonic Interactions in Virtual Reality: State of the Art, Current Challenges, and Future Directions.” IEEE Computer Graphics and Applications 38(2): 31–43.
Storms, Russell L., and Michael J. Zyda. 2000. “Interactions in perceived quality of auditory-visual displays.” Presence: Teleoperators and Virtual Environments 9(6): 557–580.
Takahashi, Dean, and Steve Russell. 2011. “Steve Russell talks about his early video game Spacewar!” Venturebeat.com. Available online at https://venturebeat.com/2011/01/12/fifty-years-later-video-game-pioneer-steve-russell-demos-spacewar-video-interview/
Theiler, Michael. 2013. “Sound in the Banner Saga.” Available online at https://stoicstudio.com/sound-in-the-banner-saga/
Usher, Raymond, Paul Robertson, and Robin Sloan. 2013. “Physical responses (arousal) to audio in games.” The Computer Games Journal 2(2): 5–13. doi:10.1007/bf03392340
Valter, Alves, and Roque Licínio. 2011. “Guidelines for Sound Design in Computer Games.” In Game Sound Technology and Player Interaction: Concepts and Developments, edited by Mark Grimshaw, 362–383. Hershey, PA: IGI Global.
van Tol, Richard, and Sander Huiberts. 2008. “IEZA: A Framework for Game Audio.” Gamasutra.com. Available online at www.gamasutra.com/view/feature/3509/ieza_a_framework_for_game_audio.php?
Walden, Jennifer, and Asbjoern Andersen. 2018. “Creating Call of Duty WWII’s Historic Sound – an in-depth interview with Dave Swenson.” Available online at https://www.asoundeffect.com/call-duty-wwii-sound/
Webber, Jordan Erica. 2014. “Video games which open the door for the blind to play.” The Guardian, 13 October.
Westerberg, Andreas, and Henrik Schoenau-Fog. 2015. “Categorizing video game audio: an exploration of auditory-psychological effects.” Proceedings of the 19th International Academic Mindtrek Conference, Tampere, Finland.
Wilhelmsson, Ulf, and Jacob Wallén. 2011. “A Combined Model for the Structuring of Computer Game Audio.” In Game Sound Technology and Player Interaction: Concepts and Developments, 98–132. Hershey, PA: IGI Global.
Yao, Shu-Nung. 2017. “Headphone-based immersive audio for virtual reality headsets.” IEEE Transactions on Consumer Electronics 63(3).
10 SOUND IN PRACTICE
Sound analysis revisited

Historically, serious inquiry into sound initially considered it as a unified whole, but it eventually split into two main strands of investigation. The scientific, acoustical study of sound was separated from its study as an object of human perception. The history of sound’s technological development has many identifiable landmarks and familiar names who paved the way for those working with sound decades or centuries later. The work of mathematicians Fourier, Nyquist and Shannon, and technologies developed by Bell, Edison and Marconi, have each had a profound effect on the development of sound technology and the uses of sound in all its forms. Once it became possible to capture or transmit sound and reproduce it at will, sound also became something that could be manipulated whilst retaining the appearance of naturalness. When synchronised or re-synchronised with a visible partner, whether a real physical object, or one seen on a screen or in a virtual environment, the two worked together to hide their own artifice. This sleight of hand could happen in plain view, hidden by the experience of lived reality, which teaches us that we can trust our own perceptions, and believe what we hear and see. The other main branch of the study of sound is concerned not with acoustics or technology but with how sound is perceived, and, subsequent to that perception, how it is understood or interpreted. This is of particular interest to sound designers. Sound may be put in service to support or suggest the idea of a realistic portrayal of objective reality, and at other times to make real the fantastic or synthetic. It can be used to suggest a mood or to give insight into a character’s thoughts or intentions. This fundamental problem-solving usage of sound is what has led to the term sound design being used liberally throughout this book. Whilst it continues to be a problematic term for some, given its specific histories across
different industries, it is one which carries with it a sense of the breadth of potential. Sound is often used to problem-solve, and this task in no way undermines the artistic and aesthetic aspects of the work that this umbrella term is adopted to describe. The term ‘design’ emphasises the end-directed, problem-solving nature of sound wherever it is used: whether in film, television, games or product design.1 Much of the work in sound design is spent in determining how a particular usage of sound will be read – what an audience, or player, will make of it. Semiotics, being the study of signs, is potentially well suited to examine both the product and the practice of sound design. The semiotics of Charles S. Peirce in particular seems especially well adapted to explore the range of sound’s possibilities. Starting with the premise that “we have no power of thinking without signs”, Peirce built a model that is flexible enough to describe any kind of sign usage, which is its principal benefit over language-based semiotic models (Peirce, Hartshorne and Weiss 1960, 5.284). Sounds can be thought of as signifiers in a sign, where the sound signifies an object, which in turn can be interpreted in some mind. This then allows us to look at any kind of sound usage, whether musical or spoken, familiar or unfamiliar, commonplace or unique. This model can also be used alongside more specific theoretical or practitioner-derived models of sound such as those offered by Rick Altman, Michel Chion and Walter Murch. Equipped with the tools of Peirce’s semiotic model we can put them to use to reveal the particularities of sound’s role in telling stories, such as the gradual unfolding of meaning in Coppola’s The Conversation, where information is accumulated piece by piece to shape our understanding of events and what to make of them. They may also be applied at the micro level when dealing with individual sounds created by Murray Spivack for King Kong. They might also be used at the macro level of a series of recurring sound signs which point to a meaningful and systematic usage in No Country for Old Men. The use of sound in non-fiction contexts presents similar problems to its fictional counterparts, as well as ample potential for creative possibilities. Since the earliest days of the non-fiction film, the techniques and technologies it adopted were broadly similar to those used in any other audiovisual production. Where it differed was in the motivations of its filmmakers and the recognition that their choices had an effect on how the audience would view their work. Non-fiction filmmaking presented an overtly ethical dimension to the near endless technical, artistic and socio-political decisions that each film might require. However, the truth-telling nature of non-fiction should not be seen as a smothering requirement or an indication of a lack of creativity. On the contrary, a great deal of creative work goes into the soundtracks of many non-fiction works, sometimes in order to hide the artifice that had to be used to give the impression of verisimilitude, and at other times to make explicit to the audience the types of choices that are made in all non-fiction works to tell a story from a particular point of view. More recently, the world of interactive sound, and with it Virtual Reality (VR) and Augmented Reality (AR), presents yet more opportunities for sound designers
to create new worlds. Though increasingly technologically complex, each decision that goes into the creation of the sound content for an interactive context is still fundamentally a creative, aesthetic and design-oriented one. Ever since the video game Pong we have been aware that once the simplest of synthesised sounds is synchronised with a user action, that sound becomes meaningful to the player. Part of sound’s usefulness is that it frees the eye from attending to information already conveyed elsewhere. As with cinema, games routinely require sound to provide a sense of realism, alongside other more emotional guidance for the player, as well as being part of a broader audiovisual feedback system, which can be learned and mastered. Yet all too often for media that involve sound, writers (whether theorists or critics) display misconceptions or a fundamental ignorance about the practicalities of the production processes, and such analyses are at best incomplete, and all too frequently unhelpful. In an article titled “A Statement on Sound Studies” Mark Kerins pointed to potentially fruitful avenues of research for the field of sound studies, and film sound studies in particular:

Many of those who write on film sound, whether from inside or outside cinema studies, display only a limited understanding of the various processes and people involved in the creation of motion picture soundtracks. As one common example, there is still a frequent assumption that onscreen dialogue is recorded on set, despite the pervasiveness of Automated Dialogue Replacement (ADR) in modern filmmaking. This level of ignorance about actual production strategies and techniques results in incomplete, and possibly inaccurate, analyses; indeed, even using such a seemingly fundamental term as ‘sound design’ is historically problematic, as that designation did not appear until the 1970s, and remains a subject of debate among film sound professionals. More work that combines primary research (including interviews with film sound professionals and study of archival records where available) with a basic understanding of common production practices would provide useful context about the ways film soundtracks are made, the possible limitations (both technical and aesthetic) filmmakers confront at various times, and why moviemakers arrive at particular decisions. (Kerins 2008, 116–117)

In order to develop a fuller understanding of sound design and its role in audiovisual productions such as film, television and video games it is fundamentally important to hear practitioners describe their work. The individual rationales that inform the different approaches taken can reveal a great deal about the fundamental underpinnings of their work. This chapter examines the practitioner’s view to reveal some of the fundamental philosophies and working practices that have been developed, and to see what commonalities there are across a range of different practitioners, industries, roles and genres.
Re-examining sound design

For those working in any area of audiovisual sound production, there is an implicit acknowledgement that their decisions are made in the interests of the production as a whole. It is a two-part process that begins with questions about what the audience should know, or think, or feel. Once this is determined, the second part of the process is spent answering the question: how best to go about achieving this end? For many practitioners this process is so intuitive or ingrained that it is not often put into words. Pedro Seminario describes the thought process that goes into the sound design for a game:

When you are trying to create sound for things that don’t really exist the best thing you can do is try to figure out the intention of it first.… Oh, this a zombie. Ok, so what portions of it do we need to focus on. It’s trying to make something that is universally thought of as a zombie. (Gershin et al. 2017)

A semiotic model can also help reveal to those not directly involved with the production of the soundtrack or soundscape that the practices of sound design are end-directed, in the sense that decisions about sound are taken based on the effect they produce: what the listener will think, or feel, or know as a result. Most storytellers, or those involved in a communication of any kind, would like to affect their audience in some way. Simply providing information may in some cases be enough to effect a change. A journalist reporting on a devastating drought or war may need no embellishment (and it may well detract from the story) for the audience to respond in a visceral way. For others operating somewhere along a continuum between truth and fantasy there are choices in how to represent, what to represent, and by what means. The term sound design has been used liberally throughout this book, though it should be acknowledged again that for many it remains quite a problematic term. It has different meanings for people across different industries, and often for those within the same industry. In using the term sound design there is an attempt to incorporate the idea that there is rational decision-making involved; that whatever is being done is a result of choice. That is not to say that the choice is not artistic. Design and art are not necessarily polar opposites. Indeed, they co-exist in virtually every situation wherever sound is used purposefully. In a very real sense, sound design is an exercise in problem-solving. There are sounds that need to be used, but which may require editing or replacement in order for verisimilitude to be maintained whilst any artifice remains hidden. There are sounds that are added or manipulated that are designed to go unnoticed, but which are adjudged to be useful in suggesting a feeling that would not otherwise be present, or sufficiently present. Tim Nielsen, sound editor on There Will Be Blood (Anderson 2007), describes his work creating the 15-minute opening sequence, which is without dialogue and largely without music, in which we are with Daniel Plainview (Daniel Day Lewis) in a deep hole, mining for silver:
A lot of it is to the credit of the movie that you are so drawn into the movie that you are not paying attention to the sound … the important thing is this stark world of wide open spaces and this claustrophobic existence of this miner trying to find oil that he spends all his day doing. To that end everything we do is trying to sell that contrast and make it emotionally strong. (Nielsen and Murray 2015)

There are relationships between sounds and images or objects that can yield an interpretation of the narrative, either explicit or implicit. Each element, be it sound or image, is only a part of the whole and they are interdependent. The soundtrack depends on the image and the image is dependent on the soundtrack. Whilst the focus here is on the use of sound in media, there are similar problems to solve in other fields of sound design such as product design, interaction design or auditory display. As well as the fundamental functional aspects of the sound being fulfilled there are usually other requirements such as an aesthetic quality, or an expressive or interesting quality, particularly where the sound design is applied to something new such as a new product or interaction. Here “sound has to have a character, an identity of its own, which is based on sonic qualities or qualities of sonic processing, rather than sonic references to familiar, ‘natural’ sounds” (Hug and Misdariis 2011, 25). Strategies to develop sound ideas may take a conceptual starting point in another modality, such as painting, music or architecture, as a source of inspiration. Here the aims are usually outlined not in sonic terms but in terms of function and aesthetics, as well as potentially fitting in to an existing world or family of sounds that has been established. For electric vehicles, for example, there is a good deal of new territory to map out. BMW sound designer Emar Vegt describes the problem of creating the sound for sports cars that no longer contain an internal combustion engine, but which, nevertheless, should convey a sense of sportiness:

In an electric car, or a ‘sporty sound’ it’s a new process that we are still discovering. For a combustion engine, yeah, we know what to expect. We know what a sporty or dynamic driving sound is. But for an electric car the parameters, the dimensions, are the same but the content has changed. (Vegt and Murray 2015)

Though it may sound strange, sound design is often about giving character to creatures or inanimate objects. For one new model of BMW, Vegt outlines how the guiding principles relating to the character of the car involved conveying the idea of a ‘good citizen’ along with more traditional aims:

EV: One of the key words for the sound which we discussed with the sound was ‘friendliness’ because many BMWs are loud, and have a certain level of aggressiveness to it: of the stance, that is dynamic, and wants to move and has the power to do so. And with [this model] aggressiveness was not one of the goals.
LM: It’s character design as much as sound design in a way?
EV: Yeah, because friendly should not sound weak. It should also not sound like you can just step in front of the car and it will stop for you because it will give in to anything. So precise, friendly, sort-of powerful still, but friendly and powerful is a difficult mix. Lightweight was a goal because the car is lightweight and it should be conveying that through sound as well. If I have a car and it sounds heavy then you have a contradiction. (Vegt and Murray 2015)

The practices of sound recording, editing and mixing can be conceptualised as a design task, with the sound being a means to an end rather than the end in itself. The semiotic model can also illustrate to those not directly involved with sound design that the practice is end-directed, in the sense that decisions about sound are taken based on the effect they produce: what the audience, or player, or user will think, or feel, or know. Rather than perceiving sound as a superfluous or auxiliary step that is applied at the completion of a creative process, it can instead be scrutinised as a fundamentally important aspect of the creation of the artefact. Along with realism and believability, sound is often required to give character to individuals or places. Charles Maynes describes the contrast in expectations between sound and image:

I think one of the core issues with that landscape is that sound is often called on to bring reality to a visual dimension. So there might be a fantastic computer generated image so sound is expected to give it physical credibility. So a lot of times we are really drawn into this. Of course we are supporting the storytelling which is the primary measure of what we do – but we don’t have the same kind of latitude that visual effects has because if we do something fantastical that doesn’t have a visual tied to it, it is extremely distracting, and oftentimes counterproductive to the storytelling. (Maynes and Murray 2015)

This interplay between sound and image imposes some limitations but also allows a great deal of artistic licence. Mark Mangini’s first job in sound was at Hanna Barbera’s animation studio in the 1970s. The animation would be done in Korea or Taiwan, then assembled, and then sound design work would begin.

It was a great training ground, animation. It’s a unique skill. We used to joke that once you got good at it.… When the season would get very busy and you didn’t have enough trained cartoon sound editors, we’d hire from the local union and they just send anybody. A guy that had been cutting a sitcom or a Magnum PI and they just didn’t get the cartoon aesthetic which was that you don’t … you always want to use metaphorical sound. You don’t want to use real sound. So it was always that interesting process having to retrain those individuals. (Mangini and Murray 2015)
That approach of stretching the bond between sound and image as far as possible, pioneered by people like Treg Brown in particular, proved fruitful, and became a fundamentally useful approach to sound design more broadly, especially when designing sounds for things that do not exist.

I found that I admired the people like Treg who just went further and further out with that use of metaphor. Well when you see that action happen, how far afield can you go with sound and still tell the story and yet have an absurd noise. And that was the fun part. And that leads to sound design, that leads to that metaphorical thing which is critical for sound design. (Mangini and Murray 2015)

If we were to consider the practice of sound design in terms of the use of signs, we might think of the role of a sound designer as an arranger of sounds. The sounds chosen may sometimes be instantly familiar, or they may be completely foreign. Each sound’s source or purpose may be instantly recognisable from the type of sound being used or from its context, such as when a sound is synchronised with a visible action, or where a type of music is used to introduce or accompany a scene. Other sounds may not reveal themselves, instead requiring the listener to guess at their meaning, and that guess may need to be revisited in light of some new information or after a period of reflection. Sound designers may control the sounds themselves but they do not have complete control over the way those sound signifiers might be heard, what exactly they will signify, or how they will be interpreted or understood. Sound designers intuitively work to manipulate or control how they intend the audience to read the soundtrack or soundscape that is created. Some sounds, such as lines of dialogue, are most likely required to be heard for story exposition. Other sounds may be used as clues which are not necessarily instantly understandable but which may provide an association that can be interpreted in some way. Equally, some other sounds may be used purely to invoke a feeling or emotional response. David Lynch’s film and television soundtracks are frequently used as examples where the aim of a particular sonic treatment is a particular feeling rather than any explicit information about a character or the narrative. Simply creating a feeling of confusion, or dread, or familiarity in the audience may be exactly the desired outcome. Whatever the desired effect, the sound designer’s job is to assemble the sounds in order to influence the audience in some way. For some projects sound plays a supporting role, often creating a sense of realism. There may be some irony that shows or movies praised for their sense of authenticity or realism achieve this partly because of the invisible work of those involved in the Foley track, which performs its role so well. Matt Haasch and his Foley partner Jay Peck have worked on a number of productions, such as The Wire and True Detective, and films like The Wrestler and Precious. While it may be nice to hear the result of one’s work, given the time and effort that might go into creating
the sound for a scene, for Matt Haasch there is always a sense that the needs of the production come first.

I think there is some truth [to] the fact that if we are doing our work really correctly, you don’t perceive it. We know what was ours or not but it’s not going to be shown. (Peck, Haasch and Murray 2015)

The specific approach to Foley to be adopted when starting a new project can bring different philosophies, depending on the specifics of the script and any particular requirements relating to character or mood to be enhanced or suggested.

I would say my perception of it, generally speaking, unless noted – our job is to do very realistic performance that will fit in with production. Once you get into a piece, the mood of a piece. You could say this is, and I’m not trying to toot our own horns, but we work with the same people a lot, so they know that we’re perceptive enough to see a character and try and reinforce a character or situation. (Peck, Haasch and Murray 2015)

In order for this to work to its fullest, those collaborators not directly involved in the soundtrack must trust the sound specialists enough to let them do their work. Matt Haasch credits showrunner David Simon and HBO for the farsightedness to allow such attention to detail in The Wire:

If we knew we were going to be in the schools in the next season, in the off-season they would go into schools and do the recordings. They did Baltimore harbour recordings. They did Baltimore container yard. They did specific recordings for that and David Simon was big on hiring local people to do ADR. Jen had this ragtag group of misfits that were actually – some were characters from The Corner who then got jobs on The Wire – who were getting together a group of ADR loop group people who were doing all the touts in the streets ‘Red Tops. Blue Tops’. They were all kids right off the streets, so there was a real concern for legitimacy and authenticity and the same thing with his other jobs. (Peck, Haasch and Murray 2015)

While authenticity or realism play a central role in some tasks, in the world of fantasy sound plays a different role. On the face of it, designing sounds for imaginary creatures or monsters might seem far removed from Foley, but each sonic decision taken is still concerned with the overall story. Paula Fairfield, sound designer on Game of Thrones, describes her approach to designing sound as one where each individual sound decision still relates to the bigger picture:
Anybody can stick a bunch of cool sounds on something, and you go “Ooh, isn’t that pretty.” And you know what, it probably would be, but I have no interest in that.… How do you want to connect with your audience? What is it you want to make them feel? What is the story you are telling? If they don’t have a story, I make the story up. I need a story. (Fairfield and Murray 2015)

For Paula Fairfield it begins with the attention to emotion: how does this make you feel? In Game of Thrones the role of Sound Designer is primarily creating the sounds for fantastical elements of the show: the dragons, the dire wolves, the white walkers, the mammoths, the giants, the dreams and the ravens. This does not mean that it is entirely divorced from reality. Indeed, the sound design still needs to have some purpose so that it meshes with the characters and the narrative.

To me, in my mind, for something that makes sense, for someone to enter it there needs to be a logic. You may not understand it but you must feel it, that there is something.… It was never conscious, because like with the dragons they [the producers] often said “we want to make everybody cry.” That’s not an easy thing to do with a bunch of dragons. For me what’s beautiful about is I tell you my story through the dragons, and when you say “My god, those dragons moved me.” you hear my story, though you don’t know necessarily what it is, and then it becomes your story, but you heard me. And that is what art is about. (Fairfield and Murray 2015)

For the sound of the dragons in Game of Thrones Paula Fairfield adopts a figurative approach to sound design which is grounded in real life, often from unusual sources.

My whole way that I deal with the world is generally through metaphor and analogy. That’s what you are doing – applying sonic analogies to things. When I am doing the dragons, for instance, I am inspired by visual effects. When it comes in and you see, like, an eye flicker or something move in the face, or whatever, to find something for that. This past season I put my dog’s nasal whistles in Drogon. They’re beautiful. The sound for me, was my sweet dog who’s thirteen. He’s really strong, powerful. And one of the sweetest sounds when he’s up here [up close] and I hear this almost barely audible nose whistle, and it’s so sweet and gentle. And you hear that and attach it to a dragon and you get those beautiful.… So imagine taking a beautiful delicate sound like that and attaching it to a dragon. Look what happens. It’s so simple, not dragon-y at all. There’s a vulnerability. (Fairfield and Murray 2015)

These sonic analogies transplant familiar feelings from one situation to another. As an audience we may not be able to recognise the exact origin of a sound but
the aim is that there is sufficient there to provide a metaphorical link that accomplishes the desired outcome, whatever it may be.
Communication and collaboration

In creative areas which are seen as specialist or ‘black arts’ of some kind, communicating the requirements becomes vitally important. For Charles Maynes this lack of a common language sometimes presents a problem.

It’s so variable. There’s really no hard and fast metric to be able to determine how an outcome might happen. The one thing that seems as the more experience you get, people tend to feel more comfortable communicating in abstract terms as opposed to concrete directives. They’ll say “Yeah, I want it to feel like this.” And they will just expect that you will be able to establish that. (Maynes and Murray 2015)

“I want it to feel like this” may appear vague, but for many sound designers it is a very useful starting point. What a sound person brings is their judgement on how best to achieve that aim.

It’s tough though because sometimes actually saying “I want it to sound like the Matrix” is ridiculously helpful because there is an aesthetic attached to that which is very clean. You don’t have a lot of activity happening off the edges of the screen. It’s hyper-real. If a director’s speaking that and they are not able to enunciate that and you provide something that is more of a ‘Terence Malick’ kind of thing, or a lyrical exposition of sound to the image, you would totally fail. I guess you almost have to start from the top and say “what is the feeling you want, and is there any existing work that you would like to use as a point of reference for this?” That’s pretty vague still because the way that that can be interpreted is generally very broad. So I think that that is a really helpful way to talk. (Maynes and Murray 2015)

In other areas the processes are quite different and there are specific preparations required in order for the process to go smoothly. In the seemingly arcane world of Foley, the language and conventions are different to other areas of sound. Part of the battle is the fact that there is not always a common language between the people who actually do the work and those who commission it.

I think it is more than half the battle. And I think for us, partly because we are up here now, and we’re not in the city. At Sound One a supervisor could bring a director to our stage and supervise us: “Oh, how’s the pogo stick coming.” “Well here’s what we’ve done.” “Well I like it but let’s try something different.” Now we’re.… Yes, we can send stuff back and forth but it’s
not the same as that face-to-face thing. And we miss that, but we’re also only as good as our supervisors. So if you have a sound super[visor] on a job, and we often work for the same guys. But then you throw in a new director and a new producer. And it becomes a game of telephone on some level sometimes where our supervisor has to invent a new language with a new person. (Peck, Haasch and Murray 2015)

For those who choose to write about sound there is a fundamental problem in that words often cannot convey what is easy to show through sound. If writing about music is like dancing about architecture, then writing about sound as a whole can surely be equally abstract, because we do not even possess a way of articulating the way sound is used much beyond a first layer of meaning, whether in written language or musical notation:

In recording, writing, and theorizing sound and sound’s affects, how do we then represent the fullness of sound in print? Can words and figures on the page do justice to the many sensations of hearing, feeling and understanding sound? Do notes, letters or graphs provide enough information to understand and perform a piece of music or recreate a theatrical moment? Can linguistic transcriptions represent tonal languages, heightened forms of speech genres, idiosyncratic dialect, and verbal performance events in their dialogical and heteroglossic complexities? (Patch 2013)

An initial driving force for this book was the belief, or at least the suspicion, that there was an underpinning rationale shared by many sound practitioners in their approach to their work that had not yet been made clear. Many of the practitioners interviewed brought up the issue that creative work in sound is often inadvertently downplayed. This is partly because it is ‘invisible’, in the sense that it is difficult to determine what work has actually been done, but also because the work in progress is heard by relatively few, and even where this work is heard, it is not often immediately apparent what end is being achieved by it. Part of the problem of analysing and talking about sound is the lack of appropriate or consistent vocabulary. A language with which to describe the purpose and function of sounds and sound practices might go some way to removing the shroud of unfamiliarity around the work of the sound practitioner. Using this sign-based model and its concepts can, I hope, be useful in analysing the soundscape as a whole as well as the specific sounds and sound/image combinations. It provides the means to examine the functional, representational, aesthetic, and symbolic aspects of sound usage. Semiotics, used as an overarching model into which other elements of film sound theory can be incorporated, has shown that it is versatile enough to provide a comprehensive and universal approach that can be applied equally to both the analysis and practices of sound design. In conversations with collaborators who are not sound specialists there is a need to find a common language that is useful and
meaningful to each party. Describing a specific sound that is required may seem sensible but may not be particularly useful in practice. Metaphors borrowed from elsewhere may be a staging post on the way to the final destination. Describing a feeling or emotion is often useful because that is in fact closer to the ultimate aim of the design, and probably the production as a whole. For Star Wars, Ben Burtt was given artwork and the overall theme of a ‘used future’ rather than specific sounds required for creatures, weapons or vehicles (Whittington 2013, 63). Exactly how this aim was achieved was left up to the skill of the sound designer. Productive conversations between sound designers and their clients, usually directors or producers, typically set out the ultimate aim:

Usually we will try and leave the technology talk out of it. The technology is there to help us do our job but hopefully is the least important part of our job. First thing, always talk about sound in an emotional way, which is difficult because emotion is so subjective but it is the goal of the film – to tell an emotional story so we usually start talking to the director about emotions. (Nielsen and Murray 2015)

This approach leaves the door open to the artist or designer to create the content that fulfils that requirement without unnecessarily limiting them or painting them into a corner. Effectively communicating what is required means not imposing limitations on creative specialists, in whatever field, but allowing them to create meaningful objects (whether physical, artificial, visual or sonic) that will have the intended effect or lead to the intended interpretation. When practitioners discuss their approach, their work and what they see as important, the words which come up over and again are story and feeling. For re-recording mixer Tom Fleischman, story and feeling are the two fundamentals important to anyone going into mixing:

As long as those people [have the] proper kind of training, the proper aesthetic, and the feel, and the idea. Again here we go – story – do they know. Are they really mixing for the story or are they mixing to make it sound cool? (Fleischman and Murray 2015)
Sound authorship

One of the frustrations for those working in sound is the difficulty of having the work recognised as a creative endeavour. This is inadvertently caused by sound practitioners' success in hiding any trace of their work. Whilst it is easy to understand that every word in a book is there by choice because someone decided to put it there, it is less common to draw parallels with the soundtrack. For Sound Designer Mark Mangini (Mad Max: Fury Road, Blade Runner 2049), sound authorship is no different: "There is no accidental sound in a film. Everything is chosen, selected, created, placed, designed… 'orchestrated' is a much better word which gives credit to the designer" (Mangini and Murray 2015). Yet sound is much more difficult to
separate into authored pieces. It often appears completely natural, even in animated or completely artificial computer-generated images. As a result, unlike cinematography or music, there is often no trace of any authorship at all in the soundtrack. We understand that music doesn't appear from nowhere. Anyone viewing an image can, with some imagination, envisage that someone must have positioned a camera in a particular place, pointed in a particular direction, to achieve the result:

That is because most human beings are visually literate. And this gets into a whole other aspect of appreciation, but we are not sonically literate and that is a function of the inability to reproduce sound up until the last 150 years. Maybe, arguably the first recorded sound is late 1800s, but reproducible images go back thousands of years. So we don't have an aural literacy to understand how to deconstruct sound. But of course you can deconstruct sound. You can listen to the soundtrack of a movie and understand: here's what I am hearing. Why was that decided upon? Well, we are in a subjective moment for this character. We see an environment but we don't hear an environment. That must have been a choice. What was that choice and how does it support the narrative? You can deconstruct aurally just like you can visually. (Mangini and Murray 2015)

Sound is so naturalised that it is difficult for those not involved in the process to imagine it as being a series of choices. Mangini describes an attitude to the work of sound which is common to many sound designers in film, as well as other disciplines:

Arguably, the contribution that sound makes is every bit as artistic as what a composer brings, a production designer brings, a cinematographer brings. They all bring an aesthetic. They analyse the script. They read with the director and they develop an approach and an aesthetic that describes in maybe – I don't want to use the word subconscious – even the writer is always leading the audience in ways that they don't know they are being led in. In using word play, or lenses or lighting, or colour, to tell a story that isn't described in the form of words. Sound can do the same thing. So it should always have an aesthetic but we are not appreciated for applying one, because most people don't understand even the rudiments of what we do, or what we do is even working on a subliminal level, or even an artistic level, let alone that there is an aesthetic applied to create that. (Mangini and Murray 2015)

There are inevitably frustrations that arise from being defined by the technological aspects of the work involved in sound. Though they may use a range of specialised equipment, cinematographers are viewed as being creative rather than as technologists. Though the technologies involved may be specialised, the aim is a finished work through successful collaboration. In any design or artistic discipline there may
be a range of options that could conceivably have been used, and a range of options that flow on from each individual choice. Ultimately, in a creative industry which combines a range of sound specialists, what is being offered is an opinion; one that comes from an understanding of how best to use sound in a given context. Sound Editor Charles Maynes describes the underlying philosophy behind this approach:

I would like to think that basically the biggest success for me would be for the director to feel that essentially he did exactly what I would have done with his skill set. And I don't see that as being at all diminishing in my ideas. Because I enjoy being in a collaborative process. I'll certainly have an opinion and a viewpoint. And a reaction to the story but my number one job is trying to make sure that the director feels that as little got in the way of his story process as possible. (Maynes and Murray 2015)

Not everyone can be a master of every element in such a massive enterprise as a big budget television show, film or video game. Successful collaboration often comes because those leading the project implicitly understand this process, which includes sound as an equal creative partner that can be built into the DNA of a project. On successful projects the early consideration and integration of sound ideas is taken for granted. Tom Fleischman describes the way long-time collaborators director Martin Scorsese and editor Thelma Schoonmaker involve music in a project:

The whole preproduction … during the story boards he is thinking about music.… With Goodfellas, with those guys – Marty and Thelma – the music is always planned way ahead of time and very often it's in the script. And that I think is true for a lot of filmmakers. It really depends on what kind of music it is. If it is needle drops. If it's records – scoring with records. I think in film school they call it diegetic and nondiegetic. We call it source and score. But if something is scored with a record usually that's decided on way, way early. (Fleischman and Murray 2015)

Music, like other sonic elements, is then part of the blueprint for the project rather than the coat of paint applied once the structure is complete.
A practice-oriented theory of sound

Fundamentally, sound can be defined as an acoustical phenomenon, or as the object of human perception. Sound can also be examined from a number of disciplinary perspectives by those who work in sound. An actor and an acoustician
might both care about the intelligibility of a human voice but have little in common when it comes to its semantic content. From a content perspective, even the most fundamentally important of human sounds, speech, is still a resource that is yet to be fully mined. Linguistics, whilst closely related to semiotics, is still limited in its ability to describe the use of sound broadly, or even the human voice specifically: "Linguistics takes the text of speech as its object, but disregards the voice, the social articulation of language, perpetually pregnant with meaning" (Patch 2013). What advantages then does the semiotic model described here have when applied to sound design? In addition to providing an overarching framework and a language with which to describe sound, the Peircean semiotic model has a number of specific benefits:
- It can be used as a means of analysing individual sounds, sound-image combinations, and sound-object relationships.
- It helps uncover and explain the creative processes involved in the production of the soundtrack by illustrating a coherent conceptual basis for the practice.
- It facilitates the critical examination of the practice by providing a language to explain processes and the theory embedded in the practice, to explain how sound is used meaningfully.
- It takes into account the role of individual interpretation in creating meaning.
- It takes into account the possibility of interpretation becoming modified over time or through collateral experience.
- It integrates with existing sound theories, and thus provides a pedagogical framework with which the practices of sound design can be taught.
- It supports the development of a conceptual framework that can be applied to all kinds of sounds, regardless of their function or position in the hierarchy of the soundtrack or soundscape.
- It supports collaborative and fruitful discussion between individuals and colleagues, sound specialists and non-specialists, practitioners and theorists alike.

Adopting Peirce's semiotic model allows for sound to be conceptualised as a system of signs rather than simply sound types, such as dialogue, sound effects or music. It provides a means of describing how each sound element might be used to fulfil its many functions. It also furnishes the language tools to help explain not only the sound itself but what happens when it is heard by the listener, and the process of deriving meaning when sounds are heard in a particular context. The task of creating the sound design or soundscape of any sort can be reframed as a series of questions about what the listener should know, feel or think as a result of hearing it. This type of approach focuses attention on the decisions that influence how well the story is told, what emotion is conveyed, what information is communicated, how it will be understood by its audience, and how believable it might appear. It shifts the focus from the classification of each sound to its particular function, where each element of the sound design is selected and manipulated to serve a particular end.
The types of focused listening described by Michel Chion affect the decisions that are made about the individual elements of the soundtrack, as well as the view of the soundtrack as a whole. In semiotic terms, these properties of sounds are reframed as sound-signs, having iconic, indexical or symbolic relationships to the things they represent, each of which can be manipulated by the sound designer and is interpreted by the audience. This model of the soundtrack does not seek to overthrow the many useful, existing film sound models developed by theorists and practitioners. Indeed, models such as those suggested by Rick Altman, Michel Chion, Walter Murch and Tomlinson Holman can be integrated into the broader semiotic model. Similarly, whilst the industrial model typically delineates sounds as either dialogue, music or effects, and subcategories of each, the semiotic analysis can be applied to each family of sounds, or to individual sounds, enabling the classification of sound elements in terms of their particular function or role within the soundtrack.
Working with sound

In talking to industry practitioners for this book, it became apparent that they had a shared concern about the way sound was viewed within the wider industry, and about the increasing pressure on the time available to complete the work to the standard required, especially in post-production. Each worked with the utmost professionalism on their productions and valued the work of their collaborators and colleagues in sound and in the wider industry. Few were particularly concerned with technology or with tools other than being diligent in their mastery of them, but each was passionate about sound and its potential to help in telling stories. There is still a sense that even after the 1970s renaissance of sound, a certain amount of proselytising is required to maintain the idea that sound is an artistic and creative enterprise and that proper attention to it is worthwhile and extremely effective. Tim Nielsen describes the conundrum faced by many:

Unfortunately, the view of sound is that it is a highly technical craft which it is by nature. We have to know a lot about computer software and recording microphones and all these things but at its heart it should not be a technical endeavour. It is a creative endeavour as much as choosing the right lighting for a scene is, or choosing where to cut is. And so the directors who recognise that are in a much better position to use sound in a way that will really benefit their movie as opposed to those who relegate it to a secondary sort of necessary evil at the end of the process. (Nielsen and Murray 2015)

The practitioners interviewed here are intended to be a snapshot rather than being representative of the whole sound industry. At the same time, they illustrate the breadth of work across genres and industries that is now undertaken. Foley artist Jay Peck worked in theatre prior to his film and TV work, while his Foley mixer
Matt Haasch worked in live music concerts. Paula Fairfield is best known for her sound design and editing work in film and television but she also works on Virtual Reality projects. Mark Mangini is best known for sound editing and mixing feature films but worked initially in television cartoons and also has several credits as composer. Tom Fleischman is best known for his collaborations with Martin Scorsese in feature films and television drama but throughout his career has worked in documentary. The list goes on. Where previously people worked a whole career in a particular field, it is now more common to work across very different fields, primarily because there are commonalities that apply across the different industries and media which use sound.

Constraints caused by insufficient time or money are problems common to most sound practitioners at some time or other, as they are in any other creative endeavour. Occasionally, the biggest impediment to a more imaginative use of sound is colleagues whose limited understanding or appreciation of sound's creative potential leads to limited involvement of sound in the original conception of the production, or insufficient acknowledgement of the potential for sound to influence the work as a whole. Where a writer or director or producer genuinely values sound and sees its importance in the finished work, the impact is felt throughout the production. Similarly, some practitioners enjoy this method of production, collaborating with those who value their creative input and expertise, and facilitating the early involvement of their sound colleagues, when their opinions, concerns and suggestions can be better incorporated into the foundations of the production rather than being used as a 'coat of paint' applied to the finished structure.

There are obviously differences in working methods between someone who gathers ambience recordings, someone who edits together sound for fantastical creatures, and someone who mixes a range of disparate elements into a whole. There are, though, a number of commonalities. Whichever practitioner described their work or their process, they often talked in terms of an ultimate overriding aim – whether it is the form of the narrative, the effect on the audience, or the feeling or emotion that is required – the end point is not a sound in itself. Sound is the means of reaching the goal. The tools are secondary.
Summary

This book makes frequent use of the term sound design to describe all manner of sound roles and tasks that in other contexts are more accurately described as sound effects editing, sound effects design, or sound supervision. Walter Murch's advocacy for the term reflected an idealised method of working and role where one person oversaw the entire soundtrack, a director of sound. Tim Nielsen, reflecting on the term, described the work of sound design: "sound design is an intentioned use of sound. It is not haphazard. You are doing something with a purpose" (Nielsen and Murray 2015).

Whilst the technology involved in sound design is often complicated, it is not the most important thing by any means. Picture editors, visual effects designers and
cinematographers also use complicated technology, but somehow there is an idea that sound is technical rather than artistic. This is partly because the artistic and creative work that is done to achieve the result is often deliberately or necessarily made invisible or imperceptible. The model suggested here – that sounds can be understood as part of a sign system – can be applied to the work that is done. Whilst many recordists, editors and mixers would not necessarily see their day-to-day work in terms of semiotic concepts (abduction, signifier-object relations, dynamical objects and interpretants, etc.), the fundamental processes that are used to undertake this work can frequently be framed in semiotic terms. Whether the work done is convincing the audience of a new realism through Foley, or creating the sound of a fantastical character, or mixing a complex web of sounds into something that appears to be natural and exactly what is required moment to moment, the fundamental approach involved is often quite similar. By describing both the sounds themselves and the processes used by practitioners to create them, we can shed some light on the ways that sound is used meaningfully. In doing so, the decision-making behind the choice of sounds to be created, recorded, manipulated and combined can be examined to reveal some fundamental principles.

When Charles S. Peirce first outlined his semiotic system he was surprised that anyone considered it novel. He had merely codified what he believed had been "recognised since men began to think" (Peirce, Hartshorne and Weiss 1960, 8.264). Similarly, the system applied here is not really novel in any way. It is simply a way of applying some conceptual tools to sounds as they are used and understood, so that sound designers of all kinds can view their work in a way (to borrow Peirce's phrase) "that, I think, will do something for them" (Peirce, Hartshorne and Weiss 1960, 8.264).

It is my intention that by exploring the ways in which sound can be considered to function as a sign, this book adds weight to the argument that sound's storytelling potential is too often limited by its being seen as an appendage or afterthought, rather than an essential element of any audiovisual work. Often the work is done so deftly as to hide itself, which means the creative labour involved risks being attributed elsewhere or ignored entirely. For those looking in on the world of sound, attention is easily drawn to the technology or the tools of the trade rather than the value that sound adds. Without appreciation or knowledge of the effort taken to arrive at the finished sound design, any intellectual and artistic work carried out is already invisible. By discussing sound in terms of its influence and how it is made to perform its roles and multiple duties, we may inch a little closer to a shared understanding of the work of sound designers and the impact of sound design more broadly. To those already working 'behind the curtain', intimately involved as the sound design takes shape, the effect it has on the whole, and thus how it will affect those who will hear it, is also made a little clearer.
Note 1 The Oxford English Dictionary defines design as ‘Do or plan (something) with a specific purpose in mind’.
References

Fairfield, Paula, and Leo Murray. 2015. "Transcript of Interview with Paula Fairfield." 5 October.
Fleischman, Tom, and Leo Murray. 2015. "Transcript of Interview with Tom Fleischman." 25 September.
Gershin, Scott Martin, Russell Brower, Tommy Tallarico, and Pedro Seminario. 2017. "Classic Video Game Sounds Explained by Experts (1972–1998) | Part 1." Wired.com. Available online at https://youtu.be/jlLPbLdHAJ0
Hug, Daniel, and Nicolas Misdariis. 2011. "Towards a conceptual framework to integrate designerly and scientific sound design methods." Proceedings of the 6th Audio Mostly Conference: A Conference on Interaction with Sound, Coimbra, Portugal.
Kerins, Mark. 2008. "A statement on sound studies." Music, Sound and the Moving Image 2(2): 115.
Mangini, Mark, and Leo Murray. 2015. "Transcript of Interview with Mark Mangini." 5 October.
Maynes, Charles, and Leo Murray. 2015. "Transcript of Interview with Charles Maynes." 12 October.
Nielsen, Tim, and Leo Murray. 2015. "Transcript of Interview with Tim Nielsen." 30 September.
Patch, Justin. 2013. "Caught in The Current: Writing Ethnography That Listens." Journal of Sonic Studies 4(1).
Peck, Jay, Matt Haasch, and Leo Murray. 2015. "Transcript of Interview with Jay Peck and Matt Haasch." 28 September.
Peirce, Charles S., Charles Hartshorne, and Paul Weiss. 1960. Collected Papers of Charles Sanders Peirce. Cambridge, MA: Belknap.
Vegt, Emar, and Leo Murray. 2015. "Transcript of Interview with Emar Vegt." 25 June.
Whittington, William. 2013. "Lost in Sensation: Reevaluating the Role of Cinematic Sound in the Digital Age." In The Oxford Handbook of Sound and Image in Digital Media, edited by Carol Vernallis, Amy Herzog and John Richardson, 61–74. New York: Oxford University Press.
APPENDIX A
Glossary – Sound, audio, film, television and games production terms

This glossary includes terms that are mentioned in the text from sound and film theory as well as film, television and game audio terminology with specific meanings. Where there is potential for ambiguity, job titles are capitalised whereas equipment is not; for example, dialogue editing is performed by a Dialogue Editor.

Acousmatic: Sound that is heard without seeing its source. Acousmatic music is music that is heard with the source remaining unseen.
Acousmêtre: A voice-character, specific to the cinema, which derives power from its visual origin being withheld.
ADR, 'looping': Automatic Dialogue Replacement or Automated Dialogue Replacement. A process that involves the rerecording of an actor's dialogue in post-production for technical, performance or other storytelling reasons. Usually done in a recording studio in sync with picture.
Ambience: The background sound of a location that is often created in post-production to either augment or replace the original location's background sound. See also 'Atmosphere' and 'Room tone'.
Atmosphere, atmos: Although sometimes used interchangeably with ambience, atmos is best described as the result where ambience is the raw material. See also 'Ambience' and 'Room tone'.
CGI: Computer Generated Imagery. The process of generating and animating elements to be composited into a scene as if the elements were present when it was shot.
Cut, Cutting: Cutting is a term used to describe any type of editing, coming from the days of magnetic tape, and before that film, where the recording media was literally cut and spliced together with a blade in order to perform an edit.
Diegetic sound / non-diegetic sound: Music or sound effects that appear to emanate from the 'world' of the film. This is in contrast to non-diegetic sound, such as the music score, which accompanies the film but often does not appear to come from within the story world. In practice music is more often referred to as being either source or score, where source music exists in the story world, whereas score is heard only by the audience and not the characters.
Establishing shot: A shot generally shown at the beginning of a scene to indicate a change in location or time.
Foley: Formerly known as post sync effects. Named after the technique's pioneer, Jack Foley. The creation of footsteps and other effects performed in sync with a projected image. Where production dialogue is replaced by ADR, foley is often required to fill in the missing sound effects. This new recording is free from any background noise, and can be used to support or replace the sync sound in the final mix and may also be used to fill the Music and Effects-only (M+E) mix.
Foley Artist, Foley Walker: A person who performs the foley sound effects.
Gamespace: The conceptual space or arena in which a game is played, independent of any possible fictional universe in which it may be set.
Gameworld: A unified and self-contained universe that is functionally and environmentally designed for the purpose of playing a specific game.
Leitmotif: A leitmotif is a musical theme that comes to represent a character, place or theme through association. Leitmotif, from Grove's Dictionary of Music, is defined as: "a theme, or other coherent idea, clearly defined so as to retain its identity if modified on subsequent appearances, and whose purpose is to represent or symbolise a person, object, place, idea, state of mind, supernatural force or any other ingredient in a dramatic work, usually operatic but also vocal, choral or instrumental."
Non-diegetic sound: Music or sound effects that do not appear to emanate from the 'world' of the film. Frequently the music score that accompanies the film does not appear to come from within the story world and cannot be heard by the characters in the story, but sits outside it.
Pre-lap: A sound edit that precedes the picture edit.
Post-production: The final stage of the filmmaking process, which commonly involves picture editing, sound design, visual effects, and outputting the production to a format suitable for broadcast or exhibition.
Pre-production: The planning stage before shooting commences, including casting, location scouting, and budgeting.
Principal photography: The main period of filming in which shooting occurs with the main actors. This differs from visual effects photography and B-camera or 'second unit' shooting. See also 'Production'.
Production: The stage at which principal photography occurs. The main period of filming in which shooting occurs with the main actors. This differs from visual effects photography and B-camera or 'second unit' shooting. Also known as 'principal photography'.
Production sound: Audio recorded on set during production, which is typically dialogue. This is in contrast to ADR, foley and audio created by the Sound Designer.
Rerecording (dubbing): The process of mixing together all of the elements of a soundtrack that have been created during editing. The dialogue, foley, other effects and music are literally rerecorded as they are played back from their recorders onto a new master recorder, hence the term 'rerecording' (known as 'dubbing' in the UK). The rerecording (or Dubbing) Mixer is responsible for the balance of the final soundtrack.
Reverberation: A reflection of a sound from multiple surfaces. This is in contrast to an echo, where there is generally only one surface reflecting the sound and the echoed sound is much clearer.
Shot/reverse shot: A film editing technique where the perspective of a visual shot views the action from the opposite side of the previous shot. For example, during a conversation between two actors, giving the effect of looking from one actor to the other, and also suggesting that the two characters are looking at each other.
Sound Designer, sound design: The term sound design has (at least) two specific but quite different meanings. The role of Sound Designer can refer to the person responsible for the overall sound of the film. It also describes a person responsible for creating unique sounds from scratch. This term, borrowed from live theatre, has come to mean the sound equivalent of production designer or director of photography.
Spotting list: After filming is complete and during the editing phases of a project, a sound spotting session or music spotting session often takes place which creates a spotting list. The composer or music supervisor, or sound editor will meet with the director and/or editor to spot the film for placement of sound effects and music. Scenes or moments where specific musical cues will occur can be discussed. Problematic dialogue recordings, particularly those that need ADR, as well as any required foley effects and designed sound effects, and the overall feel of the soundtrack can be revisited or determined.
Stereo: Multichannel audio that is often (but not limited to) two channels. Two is the minimum number of channels for stereo; 5.1 is a stereo format, for example. Two-channel audio is often combined into a single stereo file or track with outputs for the left and right speakers.
Sync, sync sound: Synchronised or synchronous recordings that were recorded at the same time as the visible images and which need to retain a synchronous relationship.
APPENDIX B
Glossary – Peircean semiotic terms

Abbreviated from The Commens Dictionary of Peirce's Terms, except where otherwise indicated.

Abduction: Hypothesis, as a form of reasoning (a posteriori reasoning).
Categories, Universal Categories: Modes of being, fundamental conceptions, which include Firstness, Secondness, and Thirdness.
Dicent: A sign represented in its signified interpretant as if it were in real relation to its object. A sign that is a sign of actual existence for its interpretant.
Dynamical interpretant: The actual effect that the sign has upon its interpreter. See also 'Final interpretant' and 'Immediate interpretant'.
Dynamical object: The object determined through collateral experience. See also 'Immediate object' and 'Object'.
Final interpretant: An idealised interpretant. The effect that the sign would produce upon any mind on which the circumstances should permit it to work out its full effect. See also 'Dynamical interpretant' and 'Immediate interpretant'.
Firstness: Indicated by a quality of feeling. The mode of being of that which is, positively and without reference to anything else. See also 'Secondness' and 'Thirdness'.
Icon: A Representamen whose Representative Quality is a Firstness that possesses the quality signified.
Immediate interpretant: The effect that the sign first produces or may produce upon a mind, without any reflection on it. See also 'Dynamical interpretant' and 'Final interpretant'.
Immediate object: The initial object as represented in the sign. See also 'Dynamical object' and 'Object'.
Index: A Representamen whose Representative Quality is a Secondness that is in real reaction with the object denoted.
Interpretant: The effect of the sign produced in the mind of the interpreter. See also 'Dynamical interpretant', 'Final interpretant', and 'Immediate interpretant'.
Legisign: A sign whose nature is of a general type, law or rule.
Object: That which the sign stands for. See also 'Dynamical object' and 'Immediate object'.
Qualisign: A quality that is a sign (Firstness).
Reality: The state of affairs as they are, irrespective of what any mind or any definite collection of minds may represent it to be. The real may be defined as comprising characteristics that are independent of what anybody may think them to be.
Representamen: The representation is the character of a thing by virtue of which, for the production of a certain mental effect, it may stand in place of another thing. A Representamen is the subject of a triadic relation to a second, called its Object, for a third, called its Interpretant, this triadic relation being such that the representamen determines its interpretant to stand in the same triadic relation to the same object for some interpretant.
Rheme: A sign whose signified interpretant is a character or property.
Secondness: Dependence. The mode of being of that which is, with respect to a second but regardless of any third. See also 'Firstness' and 'Thirdness'.
Sign: "A sign is anything which is so determined by something else, called its Object, and so determines an effect upon a person, which effect I call its Interpretant, that the latter is thereby mediately determined by the former".
Signifier/Sign Vehicle: see 'Representamen'.
Sinsign: An actual existent thing or event that is a sign (Secondness).
Symbol: A Representamen whose representative quality is a Thirdness, which represents its object, independently of any resemblance or any real connection. A symbol is a Representamen whose representative character consists precisely in it being a rule or habit that will determine its interpretant.
Thirdness: Tending toward a law or general character. The mode of being of that which is, in bringing a second and third into relation to each other. The mode of being whereby the future facts of Secondness will take on a determinate general character. See also 'Firstness' and 'Secondness'.
Universal Categories: see 'Categories'.
APPENDIX C
List of interviews

A number of interviews were conducted for this book. The goal was to test some of the theoretical aspects of the book by speaking to several sound practitioners about their work, and often about particular productions in some detail. The interviewees were chosen either because of their work on particular influential productions (e.g. a film, game, television series, or product) in which I was interested, or because of their expertise and broad experience in a particular area of sound. I am enormously grateful to the interviewees for giving their time, for being so forthcoming with their thoughts and reflections on their own work, and for articulating their individual working methodologies and philosophies.
Charles Maynes (interview 12 October 2015)

Charles Maynes is a Sound Recordist and Sound Editor. He has been involved in the film sound business since 1994, winning two Emmy awards for best sound editing for HBO's miniseries The Pacific and the History Channel's Gettysburg, and has worked on Academy Award Sound Editing projects U-571 and Letters From Iwo Jima. His film credits also include Twister, Starship Troopers, From Dusk Til Dawn and Jackie Brown. His game credits include Ghost Recon: Future Soldier and the Call of Duty series. He now works mainly as an independent sound designer, has developed a reputation as a sound effects recordist specialising in firearms recording, and has worked on numerous AAA videogame titles.
Jay Peck and Matt Haasch (interview 28 September 2015)

Jay Peck (Foley Artist) and Matt Haasch (Foley Mixer) work as a team from a base in rural New York State. They have carved out a niche as a highly sought-after Foley team, particularly on East Coast-based projects. Their work can be heard on several high-profile TV series such as The Wire and True Detective, and on the features The Wrestler and Beasts of No Nation.
Tim Nielsen (interview 30 September 2015)

Initially studying at the University of Southern California to be a cinematographer, Tim Nielsen heard a lecture by Skywalker Sound's Gary Rydstrom and changed focus to sound. He began working at Skywalker Sound in 1999 and has worked as a sound editor on several films that have won sound awards, including Avatar, War Horse, The Lord of the Rings series and Moana, as well as several other high-profile films such as the Pirates of the Caribbean series and There Will Be Blood.
Tom Fleischman (interview 25 September 2015)

Tom Fleischman is the son of Film Editor Dede Allen and Television Documentary Maker Stephen Fleischman. He knew from an early age that he wanted to be in the business of filmmaking. His career began with working up through the ranks of the New York film sound industry, and he eventually began mixing independent documentaries and feature films under the tutelage of Richard Vorisek. He has since developed long-standing working relationships with several directors including Jonathan Demme, Spike Lee and, most notably, Martin Scorsese, with whom he has collaborated since The King of Comedy. He has won an Academy Award for Hugo, and was part of the first New York sound mixing team to be nominated for an Oscar, for Warren Beatty's Reds.
Mark Mangini (interview 5 October 2015)

Mark Mangini grew up in Boston and studied foreign languages before moving to LA, where he landed a job in animation studio Hanna-Barbera's sound department working on TV shows like Yogi Bear, Huckleberry Hound, Flintstones, Scooby Doo and Captain Caveman. He now works as a supervising sound editor and rerecording mixer. He has won sound editing awards for his work on Blade Runner 2049, The Lion King and Raiders of the Lost Ark, including an Oscar for Mad Max: Fury Road.
Paula Fairfield (interview 5 October 2015)

A long-time collaborator with Robert Rodriguez, working on Hands of Stone, Sin City and Spy Kids, Paula Fairfield first came to work in sound from art school and has worked as sound effects editor or sound designer on a range of cinema, television, interactive and immersive virtual reality projects. She is best known for her award-winning sound design for Game of Thrones and Lost.
Emar Vegt (interview 25 June 2015)

After completing his undergraduate degree Emar Vegt chose to specialise in sound design. In his MSc in Industrial Design at Eindhoven University he focused on the industrial design of soundscapes. Now working at BMW, he leads a team of automotive sound designers, a discipline which is becoming increasingly important in the motor industry for user experience as well as for the transition from internal combustion engines (ICE) to other forms of propulsion which do not inherently have the same acoustic signature.
INDEX
abduction 66, 77–81, 89–94, 105–107, 112–117, 134, 151, 155, 191 acoustic ecology 30 ADR 86, 88, 176, 181 Altman, Rick 36, 39, 59, 86 Andersen, Martin Stig 161–163 animation sound 88, 129–132, 179, 186 Aristotle 15–16, 20, 22, 53, 60, 63 Arnheim, Rudolf 37 asynchronism 37 Auden, W.H. 126 auditory perception 14–15, 18–19, 22–28, 30–31 Balázs, Béla 38 Beck, Jay 42 Benedetti, Giambattista 17 Bernoulli, Daniel 20 Bregman, Albert 22 Bridgett, Rob 46–47 Britten, Benjamin 126 Burwell, Carter 110 Carroll, Lewis 54 Casati and Dokic 27–29 categories see universal categories Cavalcanti, Alberto 37–38, 81, 126–127, 140 Chion, Michel 40–41, 87–88, 189 Clair, René 37 Coen, Joel and Ethan 109, 118 collaboration 42, 46, 107, 110, 150, 154, 181, 183–184, 186–190
Descartes, Rene 18–19 Dialogue 36, 39–41, 44, 86, 87, 111–114, 117, 152, 157; foreign language 105, 161; see also ADR diegetic and non-diegetic 38, 103, 105, 152–154, 187 Doane, Mary Ann 40, 48 ear, physiology of 8; see also hearing Eco, Umberto 58 Editing, picture 24, 39–40, 45, 84, 89–90, 123–124, 135, 140–141; see also sound editing Eisenstein, Sergei 37 Elemental Tetrad 150 emotion and sound 38, 85–86, 90, 102, 106, 146 ethics: filmmaking 127–128, 138–139; documentary 120, 137, 175; sound practitioner 140–142 events: audiovisual 26–29, 40, 48, 87, 149; understanding and interpretation of 66, 68, 80, 89, 93; real-life 120–122, 128, 133–136 Fairfield, Paula 181–182 fidelity, sonic 39, 136, 164 firstness see universal categories Flaherty, Robert 120 Fleischman, Tom 185, 187 foley 43, 180–181, 183, 191 Forrester, Michael 25 Fourier, Joseph 20, 72–73
Galilei, Galileo 17–18 Game of Thrones 181–182 Gestalt, principles of 23–24, 41 GPO Film Unit 126–127 Grimshaw, Mark 165 Gulliver’s Travels 54, 55 Haasch, Matt 180–181, 183–184 Hanson, Helen 42 hearing: as a sense 13–16, 18–20; binaural 23, 39, 94, 163–164; compared to listening 6, 13–14, 25, 29–30; see also auditory perception Holman, Tomlinson 44, 46–47, 89–90, 152–153 icon 67–69, 77, 79, 83, 85–93, 101–102, 106, 189 IEZA model 154 index 67–69, 77, 79, 83, 85–93, 101–102, 106, 189 intelligibility 17, 31, 36 interpretant 77; immediate 70, 77, 82–84, 116; dynamical 70, 82–84, 91–94; final 70 Jørgensen, Kristine 152–153, 165 Kalinak, Kathryn 42 Kerins, Mark 42, 176 Kinevox 124 Kubrick, Stanley 42, 82, 84 language: structure 15–16, 24, 28, 31, 53–57, 160; of film 36–37, 59, 103, 183; of technology 150, 154; of sound 41, 183–184, 188; foreign 105, 160; see also speech, dialogue; voiceover and narration Lievsay, Skip 110 listening 13–14, 25, 29–31, 39, 53, 57, 78, 94, 164, 186, 188; modes 41, 44, 88, background 24, 30, 44, 90, 152, 158, 168; subliminal 30, 44, 89–90, 152, 158, 186 Locke, John 19, 60, 69 Mangini, Mark 44–46, 179–180, 185–186 Matthen, Mohan 30 Maxfield, J.P. 39–40, 164–165 Maynes, Charles 179, 183, 187 MDA model 150–151 Mersenne, Marin 18 Moore, Brian 24–25 Moore, Michael 129
motif: musical 104–106, 155; sound 154 Murch, Walter 45–47, 89, 94, 110–111, 118, 189–190 music and meaning 41, 48, 67, 71, 81–84 music in film 23–24, 42–48, 80–84, 110, 141–142, 187; scored 90, 98, 103–107, 126–127; popular 42, 84–85, 129 music in games 148–150, 152–153, 155, 161–163, 165–167 musical sounds 14, 17, 20–21 Nanook of the North 120 narrative function of sound 41, 44–47, 89–90, 104–107, 111, 123, 128–130, 133, 152–153, 164–167, 180, 182, 186, 190 Newton, Isaac 18–19 Nielsen, Tim 177–178, 185, 189 Night Mail 126 Nishikado, Tomohiro 148 Nudds, Matthew 30 Nyquist, Harry 20 O’Callaghan, Casey 28–29 object: sound 16, 18–19, 23–24, 26–31; audiovisual 40–41, 44–45, 86–88, 90, 134, 153, 155, 160, 174, 178, 185; of a sign 58, 64–69, 76, 79, 85, 116–117; immediate 69–70, 77, 82, 91–94; dynamical 69–70, 82, 91–94 Ohm, Georg Simon 20 Park, Nick 129 Parker, Alan 9 Pasnau, Robert 28 Pathé newsreel 121–122 Peirce, Benjamin 20–22 Peirce, Charles. S. 54, 57–58, 60–67, 69–73 perception 15, 18–19; direct and indirect 25–26; sonic compared to visual 23, 27; sensory 26–27, 62, 71; see also auditory perception phonemes 24, 56 Plato 14–15, psychoacoustics 41, 94; see also auditory perception Pudovkin, Vsevolod 36, 37 Pythagoras 14 realism (audiovisual) 36–40, 45, 48, 87, 90, 98, 101, 148, 156, 157–165, 176–181; in nonfiction 121–122, 126–130, 133–136, 138, 140, 142; actuality 121, 123, 126, 128–129, 131, 133, 138; hyperrealism 183
reality 22, 25–26, 30, 36–37, 38–40, 62, 68, 72, 87, 90, 121–122, 128, 133–136, 138–139; see also realism reasoning 62, 66, 71, 79, 89–90, 134–135, 155–156; see also abduction Rotha, Paul 140 Saussure, Ferdinand de 54–55, 57–60, 94 Schafer, Murray 30, 168 secondness see universal categories sensible qualities 27, 30; see also perception Shannon, Claude 20 sign (Peirce) 58, 60–70, 82–84 sign (Saussure) 54, 57–62; applied to film 59–60; applied to language 54–57 silent film 36–38, 42, 103, 121, 124 Sonnenschein, David 41, 46–47 sound: definitions of 13, 14; propagation as a wave 16, 17, 20–20, 26, 29; as an object 18, 19, 23, 26, 29; as a stream 28 sound design: approaches to 41, 43–44, 46–47, 86, 127, 132–135, 141–142, 155–163, 165–168, 174, 177, 183–186, 188–189; authorship 185–187; cartoon 179–180; creature 98–102, 158–159, 182; automotive 178–179; individual elements 98–102, 107, 157, 178, 181–182; industrial 69 sound editing 35, 45, 136, 99–103, 141, 177–178, 185–187 sound effects 99–103, 111–114, 121–123, 133, 140, 148 sound recording: technology 14, 35, 44–46, 122, 189; techniques 39–40, 91, 93, 98–102, 140, 164; synchronous 44, 121–123, 124–128, 133–135, 138, 140, 157, 164; 86–88, 98–102, 122, 123, 129; sound effects 133, 181; see also ADR, Foley speech 25, 38, 44, 53, 57, 114, 118, 131, 146–147, 149, 154, 170, 176–177, 186, 213, 217: intelligibility and recognition
22, 26, 30, 41, 45, 53; linguistic study of 24–26, 56, 188; intonation 56–57; functions of 85, 152 Spivack, Murray 99–102 Steiner, Max 103–107 Swift, Jonathan 54, 55 symbol 67–69, 77, 79, 83, 85–91, 93, 101–102, 106, 153, 189 synchresis 40, 87–88, 134 synchronised sound 36–38, 40–42, 72, 82–83, 85–89, 98, 100, 102–106, 122 talking film 37 technology and sound 35–36, 42–43, 102, 125, 150–151, 158, 164, 168–169, 174–176, 185–186, 189–191 The Wire 180–181 theme music 80, 85, 90, 103–107 thirdness see universal categories Thom, Randy 43–44, 47 Through the Looking Glass 54, 55 Transducer 13, 35 Truax, Barry 30 universal categories 63–65, 73, 78–79; Van Leeuwen, Theo 57–58 Vegt, Emar 178–179 Ventriloquism 36, 87 Vitaphone 35, 38 Vitruvius 16–17 voice: recognition 25, 53; acousmatic 40; characteristics 88, 90; production of 17, 21, 31; in film 39–41, 36, 39–40, 87–88, 113–114, 131; non-human 99, 101, 107; see also speech voiceover and narration 122–123, 125–127, 129, 133, 139, 152 vuvuzela 134 Westerberg, Andreas, and Henrik Schoenau-Fog 153–154