English Pages 227 [240] Year 1980
Toward a Structural Psychology of Cinema
Approaches to Semiotics 55 Editorial Committee Thomas A. Sebeok Roland Posner Alain Rey
MOUTON PUBLISHERS · THE HAGUE · PARIS · NEW YORK
John M. Carroll
ISBN: 90-279-3447-9 Jacket design by Jurriaan Schrofer © 1980, Mouton Publishers, The Hague, The Netherlands Printed in Great Britain
The forward movement of our epoch in art must blow up the Chinese Wall that stands between the primary antithesis of the 'language of logic' and the 'language of images'. We demand from the coming epoch of art a rejection of such opposition. Sergei Eisenstein 1929
Contents
ACKNOWLEDGMENT
CHAPTER 1: INTRODUCTION
CHAPTER 2: RECURRENT PROBLEMS IN CINEMA THEORY
1. Theories of Cinema
2. The Early Theorists
3. Four Recurring Problems
4. Summary
CHAPTER 3: FILM AS LANGUAGE
1. Filmolinguistics
2. Theory, Methodology, and Metz
3. Subjective Phenomena as Data and Analysis
4. The Grande Syntagmatique: An Assessment
5. Poetry, Symbolism, and Cinema
CHAPTER 4: A LINGUISTIC APPROACH TO CINEMA THEORY
1. Preliminaries
2. Phrase Structure Grammar
3. Acceptability
CHAPTER 5: TRANSFORMATIONAL-GENERATIVE CINEMA GRAMMAR
1. Inadequacies of Phrase Structure Grammar
2. Transformational-Generative Grammar
3. Some Implications
CHAPTER 6: DELETION AS A CASE STUDY
1. What Isn't There in Cinema and Language
2. A Taxonomy of Deletion Types in Cinema
3. The Representation of Deletion in the Grammar
4. Explanatory Adequacy
5. Universals
CHAPTER 7: ACTIONS AND SHOTS AS PSYCHOLOGICAL UNITS
1. The Psychological Reality of Cinema Grammar
2. Cinema Perception: Experiment 1
3. Cinema Perception: Experiment 2
4. Extending the Analysis of Cinema Perception
CHAPTER 8: THE PLACE OF CINEMA GRAMMAR IN CINEMA THEORY
1. Overview
2. Grammar and Perception
3. Grammar and Aesthetics
CHAPTER 9: EPILOGUE
APPENDIX: DESCRIPTIONS OF EXPERIMENTAL SCENES FOR EXPERIMENTS 1 AND 2
REFERENCES
Acknowledgment I am very grateful to several people and to two institutions for making this work possible. The encouraging free-spiritedness of Professor Thomas G. Bever's laboratory at Columbia University, and a fellowship from the Graduate Faculty of Arts and Sciences at Columbia for the years 1972-1976, allowed me to begin systematically thinking about cinema. The actual writing work was carried out at the IBM Watson Research Center, on Research Assignment during 1976-1977 and as a Research Staff Member since 1977. I am grateful to Dr. Lance A. Miller, Manager of the Behavioral Sciences Group, for his continuing support of my work. Several individuals have contributed a great deal to this work by discussing cinema with me more than I suspect they really cared to. Thomas Bever suggested that I pursue this study and was unwarrantedly encouraging in the early stages. Leon Gellman saw and critiqued a huge number of films with me, stubbornly pointing out problematic aspects of the data. Margot Lasher read every draft with insight and patience, improving both the argument and the prose. I dedicate this book to Margot Lasher.
Introduction
This book is about cinema theory, cognitive psychology, and semiology, probably in that order. The viewpoint adopted, and the fundamental thesis advanced, is that contemporary cinema theory can take seriously the 'film as language' metaphor presupposed by Eisenstein and other early cinema theorists. Further, it is argued that such a research orientation offers important and distinctive advantages to cinema theory by placing the study of cinema within the broader intellectual contexts of semiology and cognitive psychology: the study of cinema can be the study of human intelligence.

The slogan 'film is language' is as old as cinema theory itself. And its fundamental justification is purely intuitive: film sequences seem to have a syntax.1 The same images scrambled into a different ordering would have an entirely different meaning, or no meaning at all. Thus, consider various orderings of the three images glossed in English below.

(i) A close-up shot (i.e. face only) of a man, A, smiling.
(ii) A medium-shot (i.e. from the waist up) of two men, A and B, engaged in conversation.
(iii) A long-shot (i.e. revealing both men completely) of the two men A and B parting; they wave to one another as they walk off.2

The order (i), (ii), (iii) suggests that the smiling gesture of A in (i) invited the conversation. The order (ii), (i), (iii) suggests that something in the conversation pleased A. The order (ii), (iii), (i) suggests A's overall satisfaction with meeting B. Each of these orderings suggests a slightly different meaning for the sequence of images.
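The ordering argument can be made concrete with a small enumeration. The following is only an illustrative sketch, not part of the analysis developed in this book: it assumes a single constraint, namely that under a simple linear reading the parting shot (iii) may not precede the conversation shot (ii). The shot labels, the function names, and the constraint itself are assumptions introduced for illustration.

```python
from itertools import permutations

# Hypothetical labels for the three shots glossed in the text.
SHOTS = {
    "i": "close-up of A smiling",
    "ii": "medium-shot of A and B in conversation",
    "iii": "long-shot of A and B parting",
}


def is_linearly_coherent(order):
    """Assumed constraint: in a simple linear reading, the conversation
    shot (ii) must come before the parting shot (iii)."""
    return order.index("ii") < order.index("iii")


def classify_orderings():
    """Split all orderings of the three shots into (coherent, incoherent)."""
    coherent, incoherent = [], []
    for order in permutations(SHOTS):
        if is_linearly_coherent(order):
            coherent.append(order)
        else:
            incoherent.append(order)
    return coherent, incoherent


if __name__ == "__main__":
    coherent, incoherent = classify_orderings()
    print("coherent:  ", coherent)
    print("incoherent:", incoherent)
```

Under this one assumed constraint, the six possible orderings divide evenly: the three discussed above are accepted, and the three in which the men part before they converse are rejected. A flashback reading would require relaxing the constraint.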
There are three other possible orderings of the three images: (i), (iii), (ii); (iii), (i), (ii); and (iii), (ii), (i). All three are incoherent in that A and B part prior to their conversation.3 Consider the sequence (i), (iii), (ii): A smiles (i), A and B wave as they part from each other (iii), and A and B converse (ii). This is visual nonsense.

Rather analogous things are routinely observed in language. The sentence:

Fred hit Sam.

is quite different from the sentence:

Sam hit Fred.

and totally different from the meaningless sequence:

* Hit Fred Sam.4

Film theorists very early on were impressed with the prima facie similarities between cinema and language, and sought to develop the analysis of cinema as a sort of filmolinguistics (e.g. Eisenstein 1949). This book stands in that same tradition. The task we shall undertake is to apply the analytical techniques of modern generative linguistics to the theory of cinema.5

Generative linguistics emphasizes the importance of characterizing the intuitive knowledge that all speakers of a language share. For example, speakers of English recognize the first of our two example sequences above as a bona fide sentence of English; and they recognize the second as a nonsentence. Speakers of English know that the declarative sentence:

Fred hit Sam.

is synonymous with, that is, has the same meaning as, the passive sentence:

Sam was hit by Fred.6

What can such basic and obvious facts tell us about the structure of human language? They can tell us much. Accounting for the systematic relation of declarative and passive sentence pairs, for example, has stimulated numerous theoretical elaborations in generative linguistics (Chomsky 1957: 43-48, 79; 1965: 103-106; 1970; Grinder and Elgin 1973: 144-145; Perlmutter and Postal 1977). And yet the fundamental empirical motivations for these elaborations of linguistic theory are not buried in obscurity, or jargon, or tortured chains of reasoning.
For the most part, they are immediately accessible to any speaker of English. Language, perhaps the most intricately structured human activity, gives up many of her secrets to us almost directly.

The same sorts of points can be made with respect to cinema. To continue with the synonymy example, most of us have noticed cinema sequences that remind us of other similar sequences; sometimes we have noticed that a sequence is repeated in a film, perhaps with a slight variation. Woody Allen's film Play It Again Sam contains many sequences that play on synonymy relations. Many of the sequences in this film are intentionally contrived to be synonymous (to some degree) with corresponding sequences in the film Casablanca.7 Surely we can all imagine filming the same scene from two different camera angles, and thereby creating a pair of synonymous sequences. Suppose our earlier example (i), (ii), (iii) substituted a long-shot for (ii): the two men are talking, but in a more inclusive view. Call this shot (ii'). What then is the relation of the sequence (i), (ii), (iii), to the sequence (i), (ii'), (iii)? One might ask whether accounting for such cases of synonymy in cinema might reveal underlying principles of structure. Does the fact that we agree on basic judgments of synonymy imply that we share a tacit knowledge of cinema structure (perhaps in just the sense that speakers of English share a tacit understanding of English syntax)?

As in the case of language, there is no shortage of clear, intuitively accessible facts about cinema. However, in contrast to the case of language, there is presently no systematic theoretical analysis of these facts. Cinema theorists have typically looked far beyond simple structural relations like synonymy in favor of analyzing complex aesthetic or political relations. These more complex levels of analysis are, of course, necessary for a comprehensive theory of cinema.
However, any theory of cinema will be fundamentally inadequate if it does not analyze basic and intuitively apparent structural relations, like synonymy. This constitutes the initial motivation for the present study of cinema. A generative grammar of cinema borrows its methodology and theoretical vocabulary from linguistics. The aim of such a cinema grammar is to provide an analysis of the intuitive knowledge people have about the structure of cinema sequences. Such an analysis can provide a foundation for coherent investigations of
the more typical concerns of cinema theorists. Later (Chapter 8), we will consider a specific proposal as to how cinema grammar might provide some necessary theoretical foundation for theories of cinema aesthetics. However, the most fundamental motivation for a linguistic program of cinema theory is not merely that it provides a foundation for traditional questions. Rather, it is that a linguistic approach raises new issues and possibilities. For example, when we analyze two complex symbolic modes, like language and cinema, within the same formal framework (in this case generative grammar), there is a possibility that certain intermodal generalizations will become apparent (Chapters 6, 7, and 8). Such generalizations might reflect basic properties of human intelligence. That is, perhaps the human mind defines and manipulates the structure of complex sequences of symbols in narrowly prescribed universal ways. This proposal is hardly novel; in fact, it offers the only tenable account of the underlying samenesses of human behavior and experience. What may seem novel is the belief that the analysis of cinema can provide further empirical and theoretical elaboration of the nature of human intelligence. Yet this assumption was routine in the writing of early cinema theorists like Eisenstein (1949). If the proposal that the study of cinema can contribute to the study of human intelligence strikes us as novel, it is because modern cinema theory has increasingly failed to confront these issues. In the present study, these matters are of paramount concern.

In Chapters 2 and 3, certain historical issues are presented. Chapter 2 considers, in overview, some of the problems that have prevented cinema writing from truly becoming cinema theory, in the sense science uses the term theory. Chapter 3 turns specifically to the film-as-language metaphor and the problems and confusions regarding its continued use in discussions and analyses of cinema.
These two chapters are intended to develop two points: first, many of the chronic problems of cinema theory stem from a failure to deal with cinema systematically as a problem in the study of human intelligence. Second, many of the problems in studying film as if it were a language stem from basic confusions about what it means to study film as language. These chapters
are, therefore, remedial; they attempt to clear ground for a new linguistic approach to cinema theory.

Chapters 4 and 5 develop a generative linguistic approach to cinema theory. Chapter 4 defines phrase structure grammar as an approach to the description of cinema structure. Chapter 5 elaborates the phrase structure approach and defines transformational grammar. These chapters present an introduction to and a handbook for a particular linguistic approach to the study of film. They characterize cinema grammar as the central component in a structural psychology of cinema.

Chapters 6, 7, and 8 explore the implications of cinema grammar. Chapter 6 explores deletion as a case study in grammatical analysis, and considers the implications of the analysis for the study of universals. Chapter 7 reviews some studies of cinema perception and considers the role that the grammatical structure of cinema sequences plays in organizing their perception. Chapter 8 explores how cinema grammar might be embedded in a comprehensive theory of cinema, focusing on the relations between grammar and perception, and between grammar and aesthetics. These final chapters work outward from cinema grammar toward the goal of describing a comprehensive structural psychology of cinema. They attempt to connect cinema grammar to traditional questions: How do we understand cinema sequences? What is the basis for cinema aesthetics?
NOTES
1. The term syntax is a technical term from linguistics. It refers to the fact that the particular ordering of words in a sequence significantly determines various properties of the sequence, such as whether or not it can be recognized as a sentence, and if it is so recognized, what meaning it will be recognized as having. Consult Grinder and Elgin (1973) for a review of basic terminology and theory in generative linguistics. Linguistic terminology, as well as cinema terminology and terminology from psychology and semiotics, has been minimized whenever possible. Some familiarity with the terminologies of these areas will, however, certainly be helpful to the reader.
2. This discussion focuses exclusively on the analysis of narrative cinema, and principally on the narrative of the classic Hollywood period. Starting from a corpus of clearly and homogeneously structured films seemed prudent for both scientific and didactic purposes. Terms like close-up, medium-shot, and long-shot are used in their standard senses (e.g. Reisz 1953).
3. There are surely ways to interpret these sequences such that they are not incoherent, e.g. in the order (iii), (i), (ii), (ii) might be a flashback memory for A: he parts from B (iii), he smiles reflectively (i), and then recalls their pleasant conversation (ii). The remarks in the text presuppose a simple linear interpretation for the various possible orderings of (i), (ii), and (iii). (See Chapter 4 for comments on flashback constructions.)
4. By convention, an asterisk will be used to indicate that a given linguistic sequence is recognized by native speakers as a nonsentential sequence. This is the standard symbology in linguistics (Grinder and Elgin 1973).
5. Most cinema viewers are not filmmakers as well. At first this may seem to contrast significantly with the situation for language, where virtually all hearers are speakers as well. Since there are filmmakers, however, cinema as a form would, like language, be expected to be structured both by its production and its perception, even if this is not always true for particular viewers. Grinder and Elgin (1973), and many other recent introductions to linguistics, provide an elementary discussion of generative linguistics. Some background material is incorporated into Chapters 4 and 5, where we will consider film grammar at a more mechanical level.
6. It is possible to characterize differences between the declarative and passive versions that one might want to call meaning differences. We certainly do not mean to suggest otherwise: a successful account of the relation between declarative and passive forms will obviously have to describe both the sameness, what we have called synonymy, and the difference. These are technical questions, however, that clearly go far beyond the point of the example in the text.
7. Synonymy in cinema is discussed in greater detail in Chapter 5.
Recurrent Problems in Cinema Theory
1. THEORIES OF CINEMA
This chapter briefly reviews some historically important positions in film theory, and attempts to characterize some of the problems that have chronically hampered analyses of film. One goal of the cinema theory which will be developed later in this book will be to confront and resolve each of these traditional deficiencies within a systematic theoretical framework.

The unfortunate, but understandable, lack of systematicity with respect to goals and methods in the work of early cinema theorists has been virtually institutionalized in the study of cinema. It is not the case that we are faced with choosing between three or four conceptions of what a cinema theory is a theory of; rather, we presently have no choices at all. Tudor (1974) has recently advocated a position that, in the context of the study of cinema, seems almost novel: he argues that a theory of cinema should be a theory in the usual scientific sense of the word. He enumerates three requirements for scientific theories in general: (1) they should make generalizations about that which they describe, (2) they should be systematic, and (3) they should have a creative aspect, that is, they ought to lead to new questions and predictions. When a set of systematically related statements analyzes a coherent fragment of the world, we call that set a theory. If the theory captures significant generalizations about the world-fragment which it describes, it may ultimately be thought of as an explanation of phenomena pertaining to that world-fragment.

The criteria outlined by Tudor certainly do not seem to be too severely restrictive or idiosyncratic. In film, however, theories do
not accommodate these three requirements, a view earlier considered by Spottiswoode (1950 [1933]: 154-159). Tudor argues that most of what is called film theory in fact consists of examinations of the assumptions underlying film criticism (1974: 11).1 He gives as examples the work of Bazin, Kracauer and the proponents of the auteur school. Of course, as Tudor points out, we can still learn a great deal from the consideration of a particular critic's worldview as it interacts with, and in part determines, his appraisal of given films. However, we should not confuse the critic's worldview with a theory of film. Other works of so-called film theory consist almost entirely of descriptive anecdotes (e.g. Balázs 1970 [1945]). An unfortunate consequence of the unsystematic usage of the word 'theory' as regards discussions of film is that a critic's worldview or a catalog of anecdotes, mislabeled as a theory of film, may then erroneously be taken as some sort of explanation of film.

The roots of the problem can be traced to the earliest work in cinema theory. Theorists like Eisenstein, Pudovkin, and Balázs wanted to establish a scientific program for cinema theory. Eisenstein especially attempted to integrate the scientific study of film with psychology, sociology, and linguistics. He believed that the scientific method could be, and indeed had to be, applied to the study of art: science and art could not be separated from each other. However, unclear and confusing statements in these early works have ultimately led to enterprises in film theory that are quite antithetical to principles such as those discussed by Tudor.
2. THE EARLY THEORISTS 2.1. Eisenstein Most of the very early work in film theory was directed at an analysis of montage: the joining together of spatially and temporally continuous shots by means of cutting. To recall our example from Chapter 1, we see two men talking and then instantaneously we see them parting. What has happened? How do we make sense of this juxtaposition? How do we interpret the abrupt discontinuity bridging two relatively continuous shots? These are the questions that first intrigued cinema theorists.
Eisenstein viewed montage as the major formative element in cinema. Throughout his life he attempted to formally perfect a taxonomic classification of montage types and to understand its psychological basis. His first published article (Appendix 2 of Eisenstein 1947 [1923]) sketched an approach to theater direction that he called the montage of attractions. He argued that elements of a production could be arranged in a formally determinable order so that a viewer would be aroused ('shocked' in Eisenstein's terminology) in precisely the intended manner and to the intended degree. In the late 1920s Eisenstein generalized this notion of collisionary montage (or kino-fist) under the metaphor film grammar of conflicts. In this view, the contrasts within and between shots give rise to conflict or tension which renders a film sequence emotionally exciting, aesthetically pleasing, and even narratively coherent. Eisenstein (1949: 60) argues that '... in regard to the action as a whole, each fragment-piece is almost abstract'. Something photographed in one montage-piece can only minimally reveal itself: it provides only the barest skeleton of information. Only when it is reconstructed via a montage of fragments can it be fully revealed: narratively, emotionally, and aesthetically. However, Eisenstein cautioned against a too-uncritical and too-simple conception of montage. His own initial proposal was a five-tiered analysis: he proposed five montage types which simultaneously coexist in any film sequence.

Metric montage refers to the absolute lengths of the montage-pieces. Patterns of cutting lengths may be repeated to establish a background measure, analogous to the notion of a musical measure. Lengthening the strips produces a calm measure, while shortening them creates tension. Alternatively, two juxtaposed sequences in a film, or two kinds of content within a sequence, can be contrasted by structuring each under a different metrical measure.
Eisenstein contrasts metric montage with rhythmic montage. According to the latter method, the piece lengths are determined not by a formula but by the visual content of the pieces, particularly the rhythm of actions in the shots. Eisenstein recalls the Odessa steps sequence in his film Potemkin. The rhythm of the soldiers' feet creates a counterpoint to the cutting which is basically metric montage. This counterpoint is then itself violated,
first by Eisenstein's switching to rhythmic cutting at key points in the sequence, and later by his changing the rhythm of the action from that of the soldiers' feet descending the steps to that of the baby carriage rolling down the steps. The original tension created by the contrast in the rhythm of the soldiers' feet and the metric cutting is compounded by the alternation of rhythmic cutting with metric cutting. The acceleration in the rhythm of the action when the baby carriage rolls down the steps even further compounds the counterpoint and therefore the tension. Tonal montage is characterized by Eisenstein with a musical metaphor: it is the emotionally dominant chord of a sequence. The notion can be objectivized by reference to physical parameters, such as camera angles, grain, light tonality, etc., in relation to content elements of the shots, such as the shape of the objects photographed and the commonalities of their movements from shot to shot. Eisenstein gives as an example the fog sequence from Potemkin which repeats a tiny rocking rhythm in the motion of the water, the ships and buoys, the sea birds, and the rising mists. Overtonal montage, the fourth category, is rather elusive. Eisenstein again makes an analogy to music. He notes that accompanying the sound of any dominant tone are a range of related tones called overtones and undertones (1949: 66). The feeling of the shot must, then, be some complex combination of the entire complex of overtonal and dominant aspects. Eisenstein exemplifies this montage method with sequences from his film Old and New, but his discussion is not clear. A sequence's overtones are apparently the partial contradictions of its dominant tone. Overtonal montage is an emergent quality, derived from the interaction of the other three montage types. Eisenstein underscores this principle of conflict among montage types. 
He specifies that a method of montage becomes a montage construction only when placed in contrast to other methods. Rhythmic montage grows out of the conflict between metric montage and the movements within the shot. Tonal montage derives from the conflict of rhythmic montage and the tonal elements of the sequence. Finally, overtonal montage emerges from the conflict between the dominant tone and the overtones. Eisenstein's fifth montage type is intellectual montage. This method, unlike the other forms, does not appeal directly to the
emotions but to more rational modes of experience. Intellectual montage juxtaposes elements which are similar in some thematic sense, forcing the viewer to abstract this similarity and to develop some comment upon it. Thus, in October Eisenstein presents a sequence of gods from various cultures. The spectator is encouraged to abstract from these images the notion god and the comment Eisenstein intends.

In the following decade Eisenstein broadened the notion of montage. Less emphasis was placed on conflict. In his book The Film Sense (1942), he makes an analogy between montage and aspects of word blending. Thus Lewis Carroll's blend word 'frumious' is not the sum of furious and fuming but rather, says Eisenstein, an entirely distinct word. The notion of conflict is entirely unnecessary for an account of the processes of creative neologism (Carroll and Tanenhaus 1975; Halle 1973). By the early 1940s Eisenstein acknowledged that in his early work perhaps too much attention had been given to juxtaposition of shots, with too little attention given to the analysis of what was actually being juxtaposed. Even so, in The Film Sense he generally assumes the montage framework that he had previously developed, although avoiding any explicit use of the notion conflict. He elaborates the earlier five-tiered montage taxonomy, including audio-visual interactions as a new montage type. The synthesis of the sound track and the image track is dubbed vertical montage. Finally, he introduces the term chromophonic montage in his discussion of the synchronization of music and color.
2.2. The constructivists
For Eisenstein, the montage construction and its constituent elements were indivisible. Kuleshov, Pudovkin, and Balázs, in contrast, maintained a conception of montage as a linkage of pieces. The view of montage they established was that of a conceptual glue that pasted together otherwise independent components. Shot by shot, and brick by brick, as it were, a concept is built out of a sequence of elements. In his 1929 textbook, On Film Technique, Pudovkin describes three experiments that he and Kuleshov performed in 1920.
In one experiment, they joined a neutral close-up shot of the actor Mosjukhin to three other shots. In the first version, they spliced in a shot of a plate of soup standing on a table. In the second version, they followed the shot of Mosjukhin with a shot of a coffin containing a (dead) woman. Finally, in the third sequence the shot of Mosjukhin preceded a shot of a little girl playing merrily with a toy bear. When they showed the sequences to viewers they found that Mosjukhin's expression was rated as pensive in the first case, deeply sorrowful in the second, and happy in the third. Viewers apparently constructed a concept of the sequences and then attributed an appropriate emotion to the neutral expression of Mosjukhin's face (see Pudovkin 1958 [1929]: 168). In another experiment, Kuleshov photographed the hands, feet, eyes and heads of several different women in motion. Edited together these parts gave the impression of the movement of a single person (see Pudovkin 1958 [1929]: 145). In a third experiment, Kuleshov joined the following five shots into a scene: (1) A young man walks from left to right. (2) A young woman walks from right to left. (3) They meet and shake hands. The man points out of frame. (4) A large white building is shown, a broad flight of steps in front. (5) The two people mount the steps. Viewers accepted the scene as representing a coherent, unified event transpiring in a real location. In fact, however, each of shots (1), (2), (3), and (5) was filmed in a different place in Russia — and shot (4) was of the White House! The juxtaposition created what Pudovkin calls a filmic space — what Kuleshov himself calls a creative geography (Pudovkin 1958 [1929]: 88-89). Kuleshov and Pudovkin described the phenomenology of montage as being constructed by expectations, inferences, deductions, and associations. 
Similarly, Balázs (1970 [1945]: 119) argued that the viewer presupposes the existence of a typically human intelligence underlying any particular sequence of images displayed. Based on this presupposition, the viewer strives to unravel the relations that bind these images together and the meaning they are intended to convey. Constructivist theorists like Balázs and Pudovkin explored the analysis of a far broader range of cinematic devices than had
Eisenstein. Balázs and Pudovkin examined mise-en-scène devices other than cuts: camera angle, camera distance, focus, distortion, pans, tracks, etc. Balázs notes, for example, that the use of very unusual camera angles is typically motivated thematically. He discusses the use of the subjective camera, the repetition of camera set-ups for effect, flashbacks, and expressionistic sequences (see below in section 3.4). He points out the conventional uses of dissolves, fades, irises, pans, and the close-up shot. However, while this exposition is full of interesting observations, to which we will have occasion to return in subsequent chapters, it does not attain a very high level of systematization.

Indeed, the only analysis that approaches any significant degree of systematization is Eisenstein's montage taxonomy. And the sense in which this taxonomy provides a cinema theory remains unclear.2 The taxonomy classifies various cutting arrangements and, at least in its early form, advises that conflict and contrast among the various montage types enrich cinema presentation. But the later Eisenstein backed off somewhat from this categorical statement. And clearly, something must be said of a qualitative nature: the mere creation of conflict among montage types will never in itself guarantee a successful cinema construction. Even the early Eisenstein would not have claimed this. But how is optimal contrast to be achieved? All we have in answer to this is a series of anecdotal citations from Eisenstein's own films.
3. FOUR RECURRING PROBLEMS Built upon this uncertain foundation of anecdote, cinema theory has tended to wander, in terms both of its aims and its methodologies. As a result, we have made very little progress toward cinema theory in Tudor's sense of the word: There is no comprehensive scheme available for the description of the structure of cinema sequences. And there is not even a ghost of a comprehensive theoretical analysis of the film medium. The questions which intrigued Eisenstein, how does the medium work?, how do we understand cinema constructions?, why are we pleased and stimulated by certain arrangements?, are still with us, and for the most part completely unanswered.
In the remainder of this chapter, we will try to define more specifically four problem areas that have chronically plagued studies in cinema theory. The linguistic approach to cinema theory which we will construct in later chapters has as one of its specific goals the successful confrontation of each of these problems. One of them is the failure of film theorists to distinguish between description and prescription. Only the former can count as theory in the typical scientific usage (recall Tudor's three requirements). A second problem is that, even when film theory is descriptive, no consistent distinction has been made between the structural operation of the film medium and the bases for judgments of quality. Both concerns are theoretically valid, of course, but neither can be significantly advanced if they are not distinguished from one another. A third problem area is the role of psychological explanation in analyses of cinema. Characteristically, theories of film have been elaborated and justified by appeals to vague psychological principles. Many theorists, for example, have argued that film is experienced on analogy to real world experience. But the principle of analogy, whatever it in fact is, is certainly not in itself a comprehensive psychology of cinema. Fourth, and finally, the project of discovering and formalizing the intrinsic laws of the medium has simply failed to progress. It is generally paid lip service in cinema theory, but little more.
3.1. Prescription versus description

The distinction between prescription and description can be put succinctly: in the latter case we account for what is, while in the former we outline what should be. As a first approximation, we might map the prescriptive function onto film critics, and the descriptive function onto film theorists. However, the mapping between prescription and description, on the one hand, and criticism and theory, on the other, is more involved than one might at first suppose, and we will not bother to fully explore it here (see Tudor 1974: 10-11). The point is that film theory, as defined by Tudor, is fundamentally directed toward describing the formal properties, mechanisms, etc., of cinema, and not toward prescribing to directors and actors how film devices ought to be properly employed.

There are style manuals, of course, one of the expressed purposes of which is to prescribe for the student of film how certain devices should be used. Pudovkin's 1958 [1929] book explicitly adopts this function as one of its purposes. A more recent cinema manual is Karel Reisz's (1968 [1953]) handbook. In such a handbook, one expects to be told where to place a cut in a given type of action sequence, how to pace cutting, what camera angles to use, etc. However, such maxims are often baldly presented in works purporting to be film theory.3 It is difficult to see how such slogans can add anything to a theoretical analysis of how film works either as a communication medium or as an art form. No matter how well-founded, such works merely prescribe how film works; they presuppose, but do not offer, a descriptive theory.

There is no mystery in this confusion. The early theorists undiscriminatingly mixed theory and criticism, description and prescription. For example, when Eisenstein considers Kuleshov and Pudovkin's constructivist theory of montage (Eisenstein 1949: 36-39), part of his argument is that montage does not work that way; this is the descriptive enterprise. But his argument goes beyond this. He claims that to use montage as one uses bricks to construct a wall is wrong. This latter point is simply prescription. Eisenstein is instructing filmmakers about the use of montage. The fact that he does so in an essay on film theory could be expected to cause confusion, and it has.

Spottiswoode's (1950 [1933]) work on what he called cinema grammar is fraught with examples of prescriptivism, mislabeled as theory. He considers, for example, the use of superposition (pp. 169-172), or what is now more commonly called double-exposure, in expressive montage sequences in which confusion and chaos are portrayed by overlaid shots containing conflicting elements of motion.4 Spottiswoode contends that one would not convey the idea of boredom by boring a viewer, and hence that one cannot convey confusion by potentially confusing overlays. Is Spottiswoode uncovering a basic property of cinema structure, or is he simply disguising an opinion as film theory?

Spottiswoode's discussion of visual simile represents another such example. He points out (pp. 247-254) that there are problems in establishing the relation X is like Y between two images. In particular, the viewer may incorrectly assume that the second image simply continues the action of the narrative. From this he concludes that film needs a pictorial sign for the concept 'like'. The use of fades or dissolves is ruled out because they are otherwise useful and restricting their use in this way would limit the director. Spottiswoode decides that the wipe is the proper candidate, mainly because he sees no other use for it. Surely, this view of how the wipe should be used is prescription, and nothing else.

Prescriptivism, by its definition, cannot provide a theoretical account of cinema and, it certainly seems, does not in itself provide any sound basis for theory. In fact, it must assume a descriptive theory which in general is not explicitly provided. Therefore, we should not be surprised to find that prescriptivism usually leads to empty polemics. The prescriptivist tendency is certainly not limited to the examples discussed above. Nilsen in The Cinema as a Graphic Art (n.d.) was able to list sixteen prescriptive maxims, noting that not a single one had any general validity (p. 114).
3.2. Models versus aesthetics

The second problem area which we will address is the confusion of the structural operation of the film medium with the bases for judgments of quality. Following the terminology of Tudor (1974), we refer to the former goal as being that of a model of film and to the latter as being that of an aesthetic of film. A model of cinema is concerned with how it is possible to perceive, understand, and create cinema sequences. An aesthetic describes the mechanisms that allow us to value and prefer certain of these constructions. Even when writing on cinema theory has succeeded in presenting truly descriptive analysis, it has generally failed to separate model from aesthetic in that analysis. Basically, it will be held that the failure to distinguish between models and aesthetics has prevented film theory from adequately formulating either. In addition, much pointless debate has resulted from this confusion.

Any aesthetic, insofar as it aspires to be theory, necessarily builds upon a model. The model provides structural descriptions
for entities, to which the aesthetic then makes assignments of value. The particular manner in which the aesthetic theory assigns value to given structural configurations is the crucial element of the aesthetic theory itself. But the structural descriptions which it presupposes are provided by a model. Because of our poor understanding of the structural basis for any artistic medium, this program is rarely realized. As a result, aesthetic theory has remained, for the most part, extremely subjective and often obscure. Note that we do not claim that the operational distinction between models and aesthetics is easily accomplished; we merely claim that it is necessary. The early theorists were quite concerned with cinema aesthetics. One of Eisenstein's chief goals was to have cinema accepted as an art form. However, these theorists were much more concerned with discovering the structural mechanisms of the film medium. Thus, treatment of film aesthetics by the early theorists was scant. Not until Bazin, in the 1940s, is there anything approaching an aesthetic theory. And only Eisenstein has had a greater influence on thinking about film. Bazin's basic assumption is that film as a medium possesses essential properties centering about its propensity for revealing reality. The aesthetic qualities of cinema, he argues, emerge from its realism. Bazin contrasts what he calls true realism with pseudorealism. The latter merely deceives the viewer by means of illusions. The former concretely and faithfully projects actual physical reality. Unfortunately, this distinction doesn't hold up at all: if pseudorealism fools the eye and the mind, how can it ever be distinguished from true realism? Bazin acknowledged this paradox and reformulated his dichotomy: the essential reality of cinema was now seen as that reality of which the viewer is convinced.5 This he sets in contrast to unconvincing reality. 
The paradox is resolved, but the analysis is now completely enclosed in Bazin's own subjectivity. It rests entirely on his judgment of how convincing, plausible, conceivable, a film reality is. Bazin's implicit model of film assigns a description to films which rates their reality; in other words, the model of film consists of some procedure for isomorphic mappings into the real world. This model asserts that the same knowledge that allows us to interact with the real world allows us to understand and create cinema sequences, by analogy as it were.6 His aesthetic then
simply assigns greater value to films that are more real (allow a better mapping). The aesthetic is very straightforward and simple.7 However, as might be expected, the specification of the model is problematic. In spite of this, Bazin remains one of the few film theorists to have drawn out the model/aesthetic distinction. (For other contrasts of model/aesthetic interest see Nilsen n.d.: 21, 39; Arnheim 1957 [1933]: 210; Wollen 1969: 16-17; Reisz 1968 [1953]: 216-217; Tudor 1974: 14.)

Followers of Bazin, like Metz and Barr, however, have not been as careful in distinguishing models and aesthetics. They are responsible for establishing the theoretical opposition of Bazin and Eisenstein. It is not surprising that this happened. Bazin frequently makes outrageous prescriptivist statements as he wrestles with the inconsistencies and contradictions of the model presupposed by his reality aesthetic (see Note 5). Thus, he labels montage 'the anticinematic process par excellence' (1971: 46) and claims that '... when the essence of a scene demands the simultaneous presence of two or more factors in the action, montage is ruled out' (1971: 50).8 But, of course, it is precisely when the action is comprised of two or more factors that cutting is typically employed in cinema. Even Siegfried Kracauer (1960: 29), Bazin's partner in the realist aesthetic, admits that editing is the most general and most indispensable property of cinema. On the other hand, although Eisenstein's analysis of montage pertains fundamentally to a model of montage and not a montage aesthetic, his frequent advocacy of montage suggests to the reader an implicit montage aesthetic.9 Hence, we see that the opposition of Eisenstein and Bazin is rooted in the confusion of model and aesthetic: Eisenstein addressed his analysis primarily to the former, while Bazin concerned himself with the latter. To oppose the two is to contrast incommensurables; it is an error of category.
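The division of labor described in this section, in which a model supplies structural descriptions and an aesthetic assigns value to them, can be made concrete with a small sketch. Everything named here is hypothetical and chosen only for illustration: the `StructuralDescription` fields, the `toy_model` that treats fewer cuts as a closer mapping onto the real world, and the numeric scale are assumptions, not anything proposed by Bazin or Tudor.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class StructuralDescription:
    """What a model assigns to a sequence: a description, not a judgment."""
    shot_count: int
    realism: float  # assumed 0.0-1.0 rating of the mapping onto the real world


# A model maps a sequence of shots to a description;
# an aesthetic maps a description to a value.
Model = Callable[[Sequence[str]], StructuralDescription]
Aesthetic = Callable[[StructuralDescription], float]


def toy_model(shots: Sequence[str]) -> StructuralDescription:
    """Toy stand-in for a model (assumption): each cut lowers the realism rating."""
    cuts = max(len(shots) - 1, 0)
    return StructuralDescription(shot_count=len(shots), realism=1.0 / (1 + cuts))


def reality_aesthetic(description: StructuralDescription) -> float:
    """Reality aesthetic as characterized in the text: more real means more valued."""
    return description.realism


def evaluate(shots: Sequence[str], model: Model, aesthetic: Aesthetic) -> float:
    """The aesthetic never inspects the shots; it sees only the model's description."""
    return aesthetic(model(shots))
```

On this arrangement the aesthetic is a one-line function while all of the difficulty sits in the model, which is the situation the text attributes to Bazin. Swapping in a different aesthetic leaves the model untouched, and the two can then be criticized separately.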
3.3. Appeals to psychology

A third problem area in film theory is the uncertain role which it has characteristically given to psychological justification. Frequently a theoretical viewpoint is built upon a quasipsychological metaphor, but typically the metaphor is vague and employed
loosely. Münsterberg (1970 [1916]) represents the first attempt to deal with the psychology of cinema. Indeed, this work appeared so early that there was rather little psychological theory for Münsterberg to bring to bear on the analysis of cinema. Nevertheless, Münsterberg manages to comment on the film experience with respect to depth, movement, memory, imagination, attention, and emotion, and even to raise the question of a cinema aesthetic. His model contains elements of two purer viewpoints which were later more fully developed by other theorists: the empathy theory of Balázs, Kuleshov, and Pudovkin, and the Arnheim theory of partial illusion. He empathetically interprets what would now be called expressive or Hollywood montage (Note 4), arguing that we apprehend such montage sequences on analogy to our own private dream fantasies. Similarly, Münsterberg analyzes the close-up shot as an objectification of perceptual attention (p. 38), the flashback as an objectified act of memory (p. 41), and the sequencing of shots in a cinema scene as an objectification of the sequencing of attentional foci in ordinary real world looking behavior (p. 35).

People can, of course, easily discriminate between real world experience and cinema experience; thus the appeal to analogy cannot provide a comprehensive film psychology. Münsterberg backs off from the extreme view that all aspects of cinema experience are defined by analogy to real world experience. He develops the foundation of a thesis similar to that of Arnheim (1957 [1933]), characterizing the experience of cinema as only partially analogous to real world experience. He claims, for example, that depth in cinema is comparable to the experience of depth when the world is viewed through a glass plate (pp. 22-23). Because of lens distortion and nonoptimal viewing conditions, the illusion of depth can be only partial. Münsterberg takes a similar position regarding the perception of motion in cinema, and concludes that both depth and motion are partial illusions of the real visual world.

3.3.1. Empathy. Balázs, Pudovkin, and Kuleshov further develop the notion of empathy. In their view, film mimics the ordinary perceptual or imaginal experiences of the viewer. The viewer then sees what would be the most likely or reasonable real world interpretation of the sequence of images on the screen. For example,
Pudovkin (1958 [1929]) describes the sequence from Griffith's Intolerance, in which the Dear One hears the court's verdict. Griffith cuts from a shot of her face, a trembling smile, to a close-up of her fidgeting, folded hands. For Pudovkin, the scene is gripping because it hastens the naturalistic perceptual processes which would have involved a pan down from the face to the hands (pp. 93-94). Pudovkin claimed that the essential importance of editing is its ability to guide the attention of the viewer to particular elements of the action. The laws of editing, he contended, were isomorphic to the determinants of ordinary looking (p. 115). However, as illustrated by the example from Intolerance, cinema experience can also improve on real world experience by being more directive. Instead of glancing about to find what is significant in a scene, the cinema viewer's attention can be directed to certain elements by means of a cut.

Another aspect of the empathy theory is the claim that the viewer sees what is the most likely interpretation of a given film sequence, given ordinary real world experience. Thus viewers see a single woman, not a catalogue of body parts, in the Kuleshov experiments discussed earlier. It is also on this basis that Balázs argues that the close-up shot signifies poignant loneliness (1970 [1945]: 63): the character is alone on the screen, he is lonely.

In recent years, Vorkapich (1974) has developed the notion of kinesthetic empathy, a view earlier considered by Spottiswoode (1950 [1933]: 154-159).10 Basically, kinesthetic empathy asserts that the viewer represents a perceived cinema sequence kinesthetically in his or her own body. A bad cut is therefore jolting because the viewer's body is disoriented in the space of the sequence.

The empathy theory must surely be correct, in some measure. The film viewer does often seem to experience as if the film were real life. 
However, the proponents of the empathy theory have failed to systematically draw out the consequences of this for cinema experience; they have remained at the level of vague metaphor. Without elaboration this theory makes numerous silly predictions — for example, if some cuts are bad in that the viewer is empathetically thrown through space, why are not all cuts bad for just this reason? A cut from outside a second story window to
a front porch, it seems, could be either good or bad, even though the viewer would have 'fallen' just as far in both cases.

3.3.2. Partial illusion. Indeed, Arnheim (1957 [1933]) makes just this point in developing his theory of partial illusion. He notes that if cinema indeed created a strong kinesthetic empathy, editing would be impossible: the viewer would feel empathetically 'tossed about'. To the contrary, he argues, it is the fact that cinema creates only a partial illusion of reality that allows cutting at all. Moreover, his argument continues, it is in this partial illusion of reality that the aesthetic potential of cinema resides. Sizes and shapes appear in distorted perspective, color is absent, the field of view is cropped off at the screen's borders, spatial and temporal continuity is interrupted.11 However, film, unlike theater, can portray real life in naturalistic surroundings. Hence, its dual nature: on the one hand a strong affinity for reality, on the other, a flat, black and white, distorted picture. For Arnheim these observations establish film as art: 'Art begins where mechanical reproduction leaves off' (1957 [1933]: 57; see also Pudovkin 1958 [1929]: 57).

The partial illusion viewpoint has come to be identified with several reactionary positions. One such argument is that cinema should not have sound because a sound track creates too complete an illusion of reality and to that extent impugns the aesthetic effect of the film. Analogous arguments have been made against color, the wide screen, and other technical innovations. Time has failed to bear out these arguments to even the slightest degree, and the partial illusion theory has accordingly come to look a little silly. However, it is clear that cinema does present only a partial illusion of the real visual world. This is as true of black-and-white silent films as it is of wide-screen talkies in living color. 
Therefore, it follows that the essential properties of cinema, if there are any, reside in the differences between this partial cinema illusion and the world itself. Unfortunately, proponents of the partial illusion theory failed to identify these essential properties, as the empathy theorists failed to more than anecdotally account for the essential similarities between the cinema experience and other facets of the real world.
3.3.3. Association. Both the empathy theory and the partial illusion theory assume a cinema model of analogy: sequences are interpreted as if they were real life experiences. One difficulty with such approaches is that analogy is not a particularly well-defined psychological principle, and this makes it hard to see what sort of explanation is being offered. Another approach to psychological explanation in cinema models has been to make use of existent psychological theories. Often these turn out to be little better than analogy as explanations, but at least they can be related to other work in psychology.

Balázs and Eisenstein shared an interest in the stimulus-response psychology of Pavlov and Watson. Balázs argues (1970 [1945]: 91) that repeating a camera set-up will evoke the prior scene in which the set-up was employed. The two scenes are associated and the viewer will experience some degree of déjà vu. Balázs adopts the view that the meaning of a cinema sequence is comprised of the associations between ideas which the viewer already has and the ideas evoked by the various presented images and image sequences (pp. 179, 185, 197, etc.).

The early Eisenstein held a very orthodox Pavlovian view of montage. Conflicting montage patterns, he believed, directly elicited unconditioned emotional responses: shocks of affect. In Eisenstein's view, this affect could be realized in the viewer as pathos. It could raise the viewer out of himself and into the film, and in particular into the intellectual montage of the film. The proper stimuli elicit the proper response: if the filmmaker provides the proper constellation of metric/rhythmic/tonal/overtonal montage, a frenzied pathos will be elicited. Eisenstein elaborated this early view by accepting a richer conceptualization of stimulus-response association. 
He writes of images as being structured, constructed out of sequences of cinematic representations that evoke in the viewer's consciousness the meanings and implications intended by the filmmaker (pp. 15, 19, 30, 216, etc.). Notions like structure, meaning, and consciousness fall well beyond the simple reflex psychology of Pavlov. The notion of association appealed to is far more rich and complex.

A second respect in which Eisenstein enriched his underlying psychological model is credited to Wundt. Eisenstein calls it inner speech. Inner speech is the language of thought; the laws of 'inner speech' must accordingly predict, at least in part, the principles of montage, or more broadly, what Eisenstein called film speech. He argued that it was therefore of the greatest importance to discover these '... general laws of form, which lie at the base not only of works of film art, but of all kinds of arts in general' (1949: 251).

Although Eisenstein made use of far more sophisticated psychological concepts than mere analogy, he consistently failed to move his project beyond the prolegomena stage. He argues forcefully for montage as a Pavlovian unconditioned emotional response, and later for montage as a rich structuring of associations. He makes the important point that general laws of form may underlie principles specific to cinema. However, he does not progress very far in analyzing how montage is understood by means of a rich associative network, and he never cites a single law of inner speech.
3.4. General laws

Eisenstein's concern with general laws of inner speech is the most eloquent articulation of the oldest project in film theory. The search for general laws of the film medium was of paramount interest for the early theorists. Later theorists (e.g. Bazin) were also fundamentally concerned with discovering such principles. Unfortunately, not a great deal has resulted from all this interest. Relatively few general laws of cinema have been characterized, almost none have been systematically examined or formalized, and none has been seriously considered from the perspective of what Eisenstein called general laws of inner speech.

It is not the case, of course, that nothing has been learned about film with respect to the goals of models and aesthetics. Indeed, many generalizations become evident from the discussion in this chapter: the analogy model as it has been developed (or presupposed) by Bazin, Arnheim, Münsterberg, Pudovkin, and Balázs; the several aesthetics based upon this model (Bazin's reality aesthetic, the very similar empathy theory of Pudovkin and Balázs, and the Arnheim theory of partial illusion); and Eisenstein's several characterizations of the mechanics of montage. However, the general laws of cinema that emerge from these many analyses are relatively few in number, relatively unsystematically related to one another, and relatively limited in their generality. Eisenstein's project for investigating the underlying laws of inner speech has never really begun.

Virtually all of the theorists we have considered discussed the use of close-up shots (Arnheim 1957 [1933]: 77-85; Balázs 1970 [1945]: 52-59; Pudovkin 1958 [1929]: 93-94, 106-107; Spottiswoode 1950 [1933]: 138-141; Eisenstein 1949: 237-243). The consensus seems to be that the close-up intensifies by selecting out and magnifying critical details in the long shot. Typical early examples cited are from the work of Griffith, who frequently invoked the close-up to reveal the emotional state of a character. Lillian Gish in Way Down East or Broken Blossoms does all of her dramatic acting in the close-up. Hence, we have the law:12

    close-up ==> emotional intensity

Of course, our law is really only a generalization with many exceptions. For example, if an actor picks up a small object in a long shot the director might employ an expository close-up just to reveal the nature of the object. Here:

    close-up ==> detail

Such formulations lack rigor and fail to specify their own epistemological status in a cinema model. Nevertheless, something about the fundamental nature of the close-up is captured in such laws. And at the very least they provide hypotheses to test against real films.

Another topic that many of the early theorists explored was cutting rate and cutting rhythm. Eisenstein addressed the interaction of metric and rhythmic montage, pointing out that alternation in the piece-lengths on either method would elicit tension in the viewer. Pudovkin also claimed that editing rhythms can evoke emotional responses (1958 [1929]: 144). He illustrates this with a parallel montage sequence in Griffith's Intolerance (pp. 47-48). Toward the end of the film there is a close-up scene. 
A woman in a car must catch up with an express train in order to save her condemned husband by revealing the identity of the real murderer. Griffith cuts from the husband's cell back to the car. He cuts back and forth between the preparations for execution and the efforts for rescue. The montage-pieces become successively shorter as the climax approaches. Eisenstein (1949: 223) applauded this. This sort of editing construction has now become standard.
Consider now camera angle. Clearly, there are a lot of choices here for the filmmaker. There are eye-level shots, which abound in television shooting; over-the-shoulder shots, typical in dialogue scenes; and even the straight-down high shots, like those used in Dreyer's Joan of Arc or Hitchcock's Psycho. Arnheim (1957 [1933]: 42-48) emphasizes that camera angle can be used to stress aspects of what is being shown and heighten the viewer's sensitivity to it. He notes, for example, that oblique camera angles tend to especially intensify movement. Nilsen (n.d.: 46) points out an example from Pudovkin's Mother: feelings of emotional depression and extreme grief as expressed by the actors are reinforced by a slight foreshortening accomplished by filming from a slightly higher than eye-level camera position. Balázs reviews an example from Eisenstein's Potemkin. Eisenstein, as director, wanted to bring the mutiny sequence to an emotional crescendo as it ended. He introduced a variety of high shots and low shots as an alternative to shorter and shorter montage-pieces (1970 [1945]: 100-101). As a generalization, Balázs proposes that unusual camera angles imply something unusual about the material.

Consider finally the use of subjective camera techniques. Pudovkin (1958 [1929]: 115) points out that the camera can adopt the viewpoint of a character in the scene. Occasionally this technique is employed to represent something unusual (e.g. Leo G. Carroll's suicide in Hitchcock's Spellbound). Balázs (1970 [1945]: 90) emphasizes the dramatic effect of taking on the eyes of a person in the scene. More often, however, its only real significance is that of involving the viewer (see Nilsen n.d.: 37-38), or possibly only providing variety. Nevertheless, the generalization is of some interest.

Much has been accomplished in the study of cinema. However, the theories and the generalizations alike fail to be comprehensive and systematic. 
Those generalizations that have been noticed have not been integrated into a theoretical system; they are just unrelated statements. (The single exception to this is Eisenstein's montage theory.) Not even lip service is paid to Eisenstein's more general underlying laws of inner speech. None of this work has seriously addressed the task of placing the study of cinema in the broader context of the human sciences. In sum, not a great
deal of progress has been made toward what Eisenstein conceived of as a union of logic and art, a science of cinema.
4. SUMMARY

The sorts of difficulties we have been examining in this chapter are fundamental problems for contemporary cinema theory. Very little of what is called cinema theory meets even the three minimal requirements outlined by Tudor. And no current work successfully confronts the four recurrent problems which have been examined in this chapter. Prescriptive maxims continue to be mistakenly offered as theoretical analyses, models are confused with aesthetics, the psychological bases of the cinema experience are misunderstood or ignored, and the general laws of cinema remain, for the most part, unknown.

One strategy for dealing with these problems has been to adopt the research paradigms and theoretical constructs of neighboring disciplines to the analysis of film. Cinema theorists have begun to seriously develop methodological analogies like film is dream, film is a sociopolitical microcosm, or film is language. Psychoanalytic, sociopolitical, or linguistic approaches to cinema theory will satisfy requirements like Tudor's exactly in the way that their parent disciplines do. And indeed, the most important recent work in cinema theory has been undertaken within one or another of these borrowed methodological orientations.

The particular linguistic approach we shall develop here attempts to organize the subject matter of cinema theory by making it analogous to generative linguistics, and in this way tries to satisfy Tudor's three requirements. Furthermore, this linguistic approach attempts to successfully confront each of our four recurrent problems. The fundamental concern of a generative linguist is the description of a grammatical model of language. In Chapters 4, 5, and 6, we outline a generative model of cinema grammar. The generative linguist is also concerned with the psychological bases and implications of grammatical descriptions. In Chapters 7 and 8, we consider the relation of cinema grammar to cinema perception. 
Finally, the generative linguist is concerned with general laws of language, principles that are called language universals. In Chapters 6 and 8, we explore putative universals of
cinema, including a proposal about the relation of the grammatical model of cinema to cinema aesthetics. In the next chapter, we consider some recent discussions of linguistic approaches to cinema theory. Much of this literature rests on confusions about what it might mean to adopt a linguistic approach. Accordingly, it will be useful to have first clarified some of these misunderstandings before turning to the substantive task of developing a generative theory of cinema.
NOTES

1. Such work would certainly pertain to a theory of film criticism, in Tudor's sense of the word theory. Thus, Tudor's position is certainly not a unilateral attack on criticism. His point is merely that a work that is prolegomena to a theory of film criticism is not necessarily relevant to a scientific analysis of how the film medium works psychologically, sociologically, and politically. This issue is further clarified below in the text.

2. For further, but brief, consideration of taxonomic theories, with particular regard to the theory of Metz (1974), see Chapter 3, section 4 (also Fodor, Bever, and Garrett 1974: Chapter 2).

3. And these maxims are often not only oversimplifications for the student but also of rather questionable validity. For such a case compare Reisz (1968 [1953]: 220) with Arnheim (1957 [1933]: 50).

4. Expressive montage, sometimes also called Hollywood montage, refers to sequences which depict a train of events or the passage of time by blending many brief images. For example, to suggest that many months have passed, one might blend shots of the pages of a calendar flipping by with shots of autumn activities, then winter activities, then spring activities. (Indeed, this very sequence occurs in many films.) The effect is that in the space of a few seconds, a considerable span of time can be represented. Similarly, an expressive montage sequence can set a mood by blending many shots which evoke that mood (this is the type of sequence Spottiswoode has in mind).

5. This shift and the change in terminology make his theory very hard to follow. In the revised version of his theory Bazin defines convincing realism as the union of true realism plus pseudorealism; these two terms are, however, renamed documentary realism and aesthetic realism, respectively. In 1951 Bazin was able to claim that 'some marvel or some fantastic thing on the screen' not only fails to disturb reality, but is cinema reality's 'most valid justification' (1971: 108). In 1945, however, he had lashed out against 'the pseudorealism of a deception aimed at fooling the eye (or for that matter the mind); ... content in other words with illusory appearances.' (1971: 12).

6. Bazin's implicit model, characterized in this way, is remarkably similar to the empathy model of Münsterberg, Pudovkin, Balázs, etc. (see below in section 3.3.1). To my knowledge, though, Bazin himself never explicitly acknowledged this, probably because his fundamental concern was the cinema aesthetic that could be defined by this model and not the model itself.

7. Other theorists have followed Bazin in developing a simple and elegant formula for aesthetics with a more complicated model doing the real work of the theory. Wollen (1969) argued for a cinema grammar. The complexity of the grammatical description assigned to each cinema entity by the grammar was to be proportional to its aesthetic value. Metz (1974) argued that at some points in the diachronic development of cinema grammar, ungrammaticality was equivalent to aesthetic value. (We return to Metz in the next chapter.) Unfortunately, these simple and elegant aesthetics only appear plausible when, as in the case of Bazin, Metz, and Wollen, the grammatical model has not been specified but only imagined. In Chapter 8, we will suggest that the relationship between model and aesthetic is rather more complex and indirect than these theorists hold.

8. It is important to note, and quite obvious from the quotes in the text, that Bazin's work is fraught with prescriptivism. His argument often attempts to set standards for realism (however defined) instead of probing the mechanism of analogy that supposedly allows us to understand film as we understand the real world. Ruling out editing in film is probably the most absurd statement one could possibly make.

9. Often Eisenstein juxtaposes his description and endorsement of montage methods with an argument that film is the epitome of art forms (e.g. 1949: 181-182). Naturally, this has tended to encourage the misinterpretation that his fundamental contribution was that of an aesthetic rather than of a model. In any case, the term montage, as we have seen, means much more to Eisenstein than editing; and it is editing only that Bazin attacks. It seems that the move to oppose Eisenstein and Bazin is misinformed in almost every possible way.

10. Spottiswoode argues against the stereoscopic film, claiming that it would necessarily make the viewer feel 'hurled through space in an instant' (1950 [1933]: 154). This prescription, on the part of Spottiswoode, seems to presuppose a kinesthetic empathy similar to that of Vorkapich. Of course, it is difficult to really assess Spottiswoode's theoretical position since he does not make it explicit. Ironically, there is an interesting anticipation of and criticism of Vorkapich's kinesthetic empathy in Arnheim's Film as Art (1957 [1933]: 30-32; see also below in the text).

11. Arnheim was writing before the advent of color photography, the wide screen, and even before sound film had completely established itself as dominant over silent film.

12. In the theory which we eventually will develop in Chapters 4, 5, and 6, such a law might be interpreted as a rule of semantic interpretation. In fact, however, we will have little to say about the description of meaning in cinema, affective or denotative, as our major concern will be to confront only structure.
Film as Language
1. FILMOLINGUISTICS

One traditional theme in the study of cinema interprets the structure of film as a metaphorical language. Such a conceptualization does seem to have prima facie validity. A cinema scene attempts to portray an event by imposing a structure of cuts, zooms, tracks, pans, framings, and the like on that event. Analogously, a sentence represents an idea by imposing a syntactic and phonological structure on a set of lexical items.

Characteristically, the film-as-language metaphor has been used in two separable, but not always separated, ways. First, film-as-language can be interpreted as a theoretical claim. That is, one can assert that language and cinema are, in some fundamental sense, members of the same natural kind. This is a substantive claim, and must, of course, be directly evaluated empirically. However, there is a second interpretation of the film-as-language metaphor. In this second interpretation, film-as-language is taken to be a methodological assumption: the slogan asserts nothing theoretically substantive, but it guides research and thus does have theoretical consequences.

The basic thesis of this chapter is that the traditional metaphor of film as language has mistakenly been treated as a theoretical claim, when it should have been treated as a methodological assumption. As a result, film theory has concerned itself with a complex of issues involving the definition of language and the applicability of the terms used in this definition to cinema. Indeed, these issues have so distracted theorists that almost no
attention has been given to the consequences of film-as-language for understanding cinema.

References to film as a language abound in the classic works of cinema theory. Balázs, Eisenstein, Pudovkin, Nilsen, and Spottiswoode all describe cinema as a language. Eisenstein entitled a 1934 essay 'Film language', referring to montage as film-syntax (Eisenstein 1949: 112). Balázs (1970 [1945]) entitles one of his early chapters 'A new form-language'. He argues that cinema viewers must learn the picture-language in order to comprehend film sequences (pp. 33-38). Spottiswoode (1950 [1933]) explicitly stated that his goal was to make precise '... the language and grammar which the film ... has to acquire' (p. 29).

Eisenstein elaborated the film-as-language metaphor in his claim that there is a structural correspondence between the linguistic word and the cinematic shot, and between the linguistic sequence (or montage phrase) and the cinematic sequence (1949: 236). He argued that, like the word, the shot is incomplete as an independent unit. Both the word and the shot derive the bulk of their significance from the interactions (or collisions) in which they partake. Balázs (1970 [1945]: 118) shared this view: 'The meaning of a single note in a tune, the meaning of a single word in a sentence manifests itself only through the whole. The same applies to the position and the role of the single shot in the totality of the film.' Pudovkin (1958 [1929]) also emphasized the correspondence between the shot and the word (e.g. p. 100).

Balázs (1970 [1945]) emphasizes another sort of film-as-language metaphor. He describes a language of gesture which, he argues, cinema tends to nurture. The language of gesture, he contends, is a universal language, understood by all. He argues that in the course of civilization, man has lost the power of expressive gesture (for various reasons, including the development of verbal language).
The cinema (he has particularly silent cinema in mind) returns to this common linguistic instrument and in doing so reconnects all of humankind — each person to himself and to others (pp. 39-45). Balázs points out, for example, that in cinema a movement as simple as walking, an activity we presuppose but hardly notice in daily life, can become quite significant (pp. 134-137).

In succeeding chapters of this book, we shall try to make much of these two conceptualizations of film as language —
Eisenstein's notion that cinema is a language of shots and sequences of shots and Balázs's notion that cinema is a language of gestures, actions, and events. In the present chapter, we will focus more closely on making precise the sense in which cinema can be regarded as a language.

The writings of Balázs, Eisenstein, and the other early theorists seem to equivocate on the issue of whether film-as-language is a theoretical or a methodological statement. Certainly, if it is to be analyzed as being the former, it is quite poorly explicated and defended. But if it is to be analyzed as being the latter, it is, after all, rather little used by these theorists. The difficulty with these early writings is that this issue is never really confronted directly. The early theorists simply seem to have ignored the entire matter, and to have freely moved back and forth between the two positions.
2. THEORY, METHODOLOGY, AND METZ

In more contemporary treatments of cinema theory, the film-as-language metaphor has remained. However, until rather recently almost nothing has been done to ensure that the metaphor had any content — either as a methodological assumption or as a theoretical claim. As Wollen (1969: 7) very aptly put it, 'Writers about the cinema have felt free to talk about film language as if linguistics did not exist ...'. In general, cinema theorists have failed to find and/or pursue any consequence of the film-as-language slogan.

A notable exception to this is Christian Metz, who over the last fifteen years has investigated film from a seriously linguistic point of view.¹ Unfortunately, Metz's important work has led as much to confusion as it has to clarification. The status of what Eisenstein called filmolinguistics, or what is now called film semiotics, in cinema theory is still very much in question. In this chapter, we will review the argument of Metz's book Film Language: A Semiotic of the Cinema, summarizing and criticizing some of the major points raised by Metz and his critics. It is important to examine this literature in detail since the present work shares many assumptions with Metz's work. To the extent that we can recognize and dismiss pseudo-issues and unravel confusions in the
Metz literature, we can improve the focus and clarity of the present theory.²

The task undertaken in Film Language is that of applying the methods and models of Ferdinand de Saussure (1959 [1916]) to the study of film. In de Saussure's approach, the signification of linguistic signs (for de Saussure, principally speech sounds) is determined by the ways in which they contrast with other linguistic signs in a sequence, or syntagm, of signs: 'In the syntagm a term acquires its value only because it stands in opposition to everything that precedes or follows it or both' (de Saussure 1959 [1916]: 123). In this sense, de Saussure's approach to linguistic theory envisions a taxonomy of contrasts.

De Saussure himself was quite interested in, and optimistic about, extending his linguistic theory to other domains. This enterprise he named semiology or semiotics. Thus, the approach of Metz is very much that countenanced by de Saussure. According to de Saussure, the semiotic of any given field rests on the dual foundation of linguistics and the structural peculiarities of the particular field in question. Metz enriches this program by adopting a threefold support: the study of film, linguistics, and narratology, the study of narrative structure (Bremond 1968; Greimas 1968; Propp 1958 [1928]; Todorov 1969). His intention is to begin the semiology of cinema with the semiology of the narrative film.

Before considering any truly substantive issues, Metz poses two metatheoretical questions: first, in what (theoretical) sense is film like a language? Second, is the selection of the narrative film as the object of study sufficiently justified? The first question leads Metz to several points. He argues that the shot in film is like the sentence in language and therefore not like the linguistic word, the phoneme, etc. Presumably, Metz makes this argument against theorists like Eisenstein who assumed that the cinematic shot corresponded to the linguistic word (see above).
He addresses the question of whether film is langue or langage. In de Saussure's theory, langue refers to the abstract code, or grammar, which forms the observable, more superficial, aspects of langage. In considering this matter, Metz argues that cinema actually has very few signs, and that it lacks a double articulation (Martinet 1960). Language has a double articulation in the sense that sentences are articulated on a set of morphemes (minimal units of meaning), which in turn are articulated on a set
of phonemes (segments of speech noise). He finds that cinema is only a one-way communication, unlike language, and that, also unlike language, it has a very minimal syntax. Ultimately, he concludes that film is not langue, but that in some sense it can be studied as langage.

The second metatheoretical question leads Metz to argue that film has a natural affinity for narrativity; that a sequence of images cannot help but tell a story. Moreover, his argument continues, it was in adopting the narrative as its central genre that film developed its similarities with language. Metz concludes here that to study the narrative film is therefore to proceed most swiftly to the heart of film semiotics.

Metz's positions on these two metatheoretical questions have further consequences to which we will return below. However, it is important to note first that both questions, and the arguments addressed to them, are simply irrelevant to Metz's project. Metz's conclusions here, which he justifies by argument, amount to methodological assumptions. Methodology, however, is justified not by argument, but by its empirical efficacy. The centrality of the narrative film, for example, is a point Metz simply doesn't need to argue. He may choose to study narrative film; indeed the choice may even turn out to have been a good choice, in the sense of heuristically good. But the choice cannot be antecedently justified.

It is the former metatheoretical question, however, that is of more concern here. If film-as-language is taken as a methodological principle, a great deal of Metz's argument simply falls away as irrelevant. For example, the various facts that cinema is a one-way communication, that it lacks a double articulation, and that it has few real signs may be of use in organizing research; they may even convince someone who seriously believes film to be a language that he is mistaken; but they cannot call into question the methodological principle of film-as-language.
Indeed, there is only one sort of demonstration that could reject the methodological assumption that cinema can be studied by a linguistic approach. That would be to adopt a linguistic approach — and this is just what we shall do in subsequent chapters. Metz's fundamental confusion in Film Language is that he takes film-as-language to be both a methodological principle and a theoretical claim without ever distinguishing one from the other.
The theoretical claim that X is a Y must involve demonstration, proof, and argument, as well as careful and precise definitions of X and Y — something is being asserted about the nature of things. The methodological assumption that X is a Y, however, is only an orienting hunch: for the sake of argument X is taken to be a Y, and the consequences of this assumption are investigated. In Film Language, Metz confuses his methodological principles, his orienting assumptions, with theoretical claims. This is why his argument tends to become somewhat convoluted and why his ultimate conclusion that film is a sort of langage but not langue is unsatisfying.³ Indeed, one might well ask how langage, the observable aspects of the system, can exist without langue, an abstract underlying form. Metz himself seems dissatisfied with his conclusion, and continually requestions it throughout Film Language.

Reviewers of Metz's work have often compounded the confusion by elaborating this misguided argument. Thus, Nichols (1975: 39) argues that film may be analyzed like a language but not like a verbal language. If this claim is true (as a theoretical claim), it is trivial (i.e. film does not consist of words, etc.). If the claim addresses issues of methodology, it is simply tendentious and without empirical significance.

Metz's confusion regarding the status of his methodological assumptions leads to another difficulty, which in turn has instigated further very fundamental confusions in the writings of his critics. In arguing that his methodological principles are antecedently justified, Metz casts his study of film semiotics not as a contribution to a new field, but as the contribution to a new field. Thus, the irrelevant argument he presents for essentially unarguable principles serves a somewhat perverse purpose with respect to the dialectic between differing theories of cinema.
It is hardly surprising that Henderson (1975: 32), in reviewing Film Language, criticizes Metz for not providing a complete theory of cinema: Metz's own presentation implicitly suggests that he thinks he has. The argument that one's approach or theory is in some sense quintessential is a bizarre one. Metz's implicit claim that his approach is quintessential cinema theory, even though it is unintentional, is very destructive to cinema theory and, in particular, to linguistic programs of cinema theory. It does a double disservice: first, it confuses its own modest contribution with a nonexistent grand system and, second, it confuses the critical debate which follows it. To be specific, Metz's implicit position that his structural-linguistic program could provide the cinema theory has led to the response by some of his critics that no structural-linguistic approach can provide any cinema theory.

Let me exemplify this situation in the Metz literature. As we have noted above, Metz takes as one of his starting points the centrality of the narrative to film studies. This irrelevant argument leads him to make the equally irrelevant corollary argument that denotation is more central to cinematic codification than is connotation (Metz 1974: 117-118). Metz argues that connotation is simply a derived form of denotation. I will make three observations on this point. First, as before, all Metz's system needs to do is to take its predilection for the study of denotative structure as a methodological principle. Second, prima facie, such a move would appear to be reasonable since the mere fact that we all discuss the distinction between connotation and denotation ensures that the distinction can be operationalized within limits. Finally, applying this argument where it is not needed serves to gratuitously impugn any approach, complementary to Metz's, which does address the structure of connotation (e.g. any sociopolitical approach). Given this, it is hardly surprising to find Metz criticized for 'denying the image's expressive nature' (Nichols 1975: 35). This is exactly what he has done (but, cf. Metz 1974: 111).

Once the irrelevant argument in Metz's book is dismissed, the question arises whether there is anything salvageable left. What remains is quite straightforward: Metz uses a phenomenological method to analyze the segment as a structural unit of film and ultimately develops a Saussurian taxonomy of the segment based on the ways different segment-types can be contrasted.
This taxonomy he calls the grande syntagmatique. I will approach the discussion of the grande syntagmatique in two stages: first, I will examine the use to which Metz puts phenomenology; second, I will examine the taxonomy he derives.
3. SUBJECTIVE PHENOMENA AS DATA AND ANALYSIS

Next to the confusion of film-as-language, the most basic confusion in Metz's work as well as in the Metz literature is the confusion of phenomenology — qua method versus qua theory. The question is this: does Metz use phenomenology, the personal observation of one's own internal states, as a method to study the way in which cinema structure is experienced, or does he find in his own phenomenology a theory of the structure of cinema?

All inquiry is to some extent phenomenological since observation itself is an interaction of the observer and the observed. There is no such thing as objective observation. Thus, there is no approach to inquiry which can eliminate the subjectivity of experience. In fields as widely separate as psychoanalysis, quantum mechanics, and generative linguistics this fact has become a commonplace. The dream report is viewed as the interaction of (at least) the dreamer and the dream; the momentum of a subatomic particle is a function of its mass, its velocity, and the occasion of its observation; and the acceptability judgment associated with a given linguistic sequence is a function of its structural analysis and various properties of the speaker/hearer, the speech context, etc. (see below and Chapters 4 and 5).

In assessing the role of phenomenology in Metz's work, it is again useful to introduce the method versus theory dichotomy. Phenomenology qua method is simply the acknowledgment that nothing about a scientific inquiry is purely objective. This is just to say that so-called logical positivism is futile: requiring the definition of a theoretical entity to include an objective verification procedure for instances of that entity ensures that there can be no theoretical entities, ergo there can be no theories. (For discussion of logical positivism, see Ayer 1966; Quine 1963.) The actual achievements of the disciplines that espoused the logical positivism of the early twentieth century (e.g.
Radical Behaviorism in psychology, American Descriptivism in linguistics) are indeed few. That they are not totally null is primarily due to the fact that the objectivity standards of logical positivism were always to some extent compromised (Chomsky 1959; Fodor 1968). In the view of psychologists like Freud (1900) and Köhler (1929), and of linguists like Chomsky (1959), scientific inquiry can focus upon and explain the systematicities of subjective phenomena rather than ineffectively and inconsistently trying to outlaw subjectivity.
3.1. Phenomenology qua method

Phenomenology qua method locates in subjective experience data for scientific analysis. Thus, for the psychoanalyst, the latent content of the dream is inferred from the dream report of the dream's manifest content. This inference, which derives from psychoanalytic theory, abstracts the analysis from the subjective experience. Similarly, in generative linguistics, the structural description of a sentence predicts the pattern of acceptability intuitions associated with it.

In contrast, phenomenology qua theory locates in subjective experience the theoretical analysis itself. This latter form of phenomenology has come to be viewed with a good deal of deserved suspicion. Consider, for example, the analysis of certain visual illusions by the psychologist Titchener (Titchener 1907 [1897]; Boring 1953). Titchener discusses, for example, the perception of a square inscribed in a circle. Typically, he points out (pp. 195-197), observers report that the circle seems to be flattened and constricted at the four points where it intersects the inscribed square. That is, observers experience the visual illusion that the circle is not a circle, but a somewhat squarish closed curve.

Titchener's approach to the analysis of such phenomena was to introspect on the perceptual processes which phenomenally create them (pp. 39-49). In the case of the square and circle illusion, he claimed that movements of the eyes cause the angles formed by the sides of the square with the four arcs of the circle to appear larger than they in fact are. Note that this analysis was not based upon recordings of eye movements, but on Titchener's own introspections about eye movements and their relation to perceptual experience. Thus, the subjective experience of the trained observer (or, more precisely, the report of the subjective experience) is the analysis.
There is no reason to believe that such an introspective analysis truly reveals anything about the actual mental processes involved — no reason to believe that observers can be trained to notice minute eye movements phenomenally.
For contrast, consider the generative approach to linguistics. (This contrast is of particular interest, since in the next chapter we shall adopt a generative approach to the study of cinema.) In this approach, intuitions of sentencehood as rendered by fluent speaker/hearers of a language constitute the basic data. Insofar as these judgments are reliable across speakers, they are taken to derive from the subjective yet systematic knowledge that people have when they know the language in question. The linguist constructs a formal theory, or grammar, which predicts the particular patterning of judgments that have been collected. If the grammar generates all sequences that fluent speaker/hearers take to be sentences of their language and only such sequences, the grammar constitutes a theory of the knowledge the fluent speaker/hearer has when he or she knows the language. (We return to and elaborate this approach in much greater detail in the next chapter.) The theory, or grammar, is an abstraction from the phenomenological data, it is not literally in the data. In Metz we find both varieties of phenomenology undiscriminatingly mixed and unlabeled. First let us consider examples of his phenomenology qua method. Metz uses his own intuitions of structural constituency to define the units of structure in the narrative and in the film itself. In the latter case, he writes '... the film sequence is a real unit — that is to say, a sort of coherent syntagma within which "shots" react (semantically) to each other.' (Metz 1974: 115). In the former case, Metz seeks to establish what he calls the event as a unit of narrative structure. A narrative is comprised of a series and a sum of event units (Metz 1974: 24). These units, events and sequences, are widely alluded to in writing on film theory and narratology; they appear to be at least relatively systematic and general subjective phenomena. 
Here again, however, Metz provides further irrelevant argument and citation to sustain his phenomenological method. All he needs to do is to establish his terms operationally in subjective experience. The validity of the sequence as a unit of cinema structure or of the event as a unit of the narrative rests on the empirical consequences of taking them to be units (evidenced in viewer intuitions, perceptual and cognitive measurements, etc. — see Chapters 4 through 7), and not on Metz's spurious arguments and citations.
In criticizing Metz on this point, however, Henderson (1975: 28) blunders in claiming that 'to extract the unit designations of a narrative system without the syntagmatics and the overall model that go with them is meaningless'. The sentence, for example, is a coherent unit of phenomenology which can be taken over from whatever linguistic system one wishes to take it from — without any obligation to take over that linguistic system in full. Henderson's confusion may be due to the abstruse way in which Metz introduces event as a theoretical term.
3.2. Phenomenology qua theory

Metz also relies on phenomenology qua theory. Here I take his treatment of analogy as an example. For Metz, the formative elements of his units, the event and the sequence, are images. These elements are not, as Metz argues irrelevantly, doubly articulated, as are the formative elements of linguistic units. They are direct analogs of the world: an image denotes that which it represents. Hence, the internal analyses of the event and of the sequence are to be given directly in experience. This has the effect of relieving Metz's theory of the burden of having to address any level of structure below that of the sequence/event.

But Metz has gotten off too easy: analogy is not an account, it is merely a term, and a term that requires some sort of definition. Metz mistakenly takes the intuition that the screen's images convey a strong impression of reality (a datum) for an account of that very fact (i.e. it's analogy!). If we take the analogy position seriously, we are stuck with an embarrassing fact: viewers know perceptually that they are not witnessing reality when they see a film (e.g. there are no binocular depth cues — Hochberg 1964). Moreover, we know that ordinary world perception is coded — we even know some of the code (Kling and Riggs 1971, Chapters 4-13). Given this, what work is a term like analogy doing for us?

It might make sense to speak of partially congruent mental representations. For example, the word 'square' and a line drawing of a square are reliably differentiated and yet their respective mental representations are also denotatively congruent (to the extent that they denote the same concept). Congruent relations between cinematic images
and world objects and events would of course need to be described in far more detail than this. But in any case, the mere word 'analogy' provides no analysis whatsoever.

It is not surprising that Metz's critics have mistakenly interpreted him on phenomenology. Henderson (1975: 29-31), in particular, takes him to task over this, but basically for the wrong reasons. Henderson scathingly derides the appeal to experience, but astonishingly gives no argument at all. He merely quotes Lévi-Strauss at length in a passage which simply says that he (Lévi-Strauss) avoided phenomenology. Henderson goes on to claim that psychoanalysis and historical materialism both 'require a break with experience in order to construct its concept, in order to construct a model of the system which produces ordinary experience either at the psychological or the political level.' (Henderson 1975: 30). But this is a non sequitur if we read it literally: methodology is not something we need to vote on. Put directly, who cares what psychoanalysis and historical materialism do?

The error goes yet deeper. On the one hand, Henderson misunderstands psychoanalysis if he thinks that it makes no use of phenomenology: the dream report is phenomenology par excellence. Moreover, since many forms of psychoanalysis countenance self-analysis, it is not impossible for the entire psychoanalytic discourse to be circumscribed within subjective experience. On the other hand, he misunderstands Metz if he believes that the system Metz constructs from intuition is itself completely given in conscious experience.

The line of reasoning that leads Henderson to his conclusion about Metz's work is this: he criticizes the phenomenological method — 'Metz attempts a syntagmatics of the part, determined empirically, i.e. without reference to an overall theoretical model.' (Henderson 1975: 29). Henderson continues (p.
32), 'And, since there was no theoretical model which launched the inquiry, he cannot account for what led to the collection of these data in the first place. Empirical studies often exhibit this doubly isolated condition.' As we have already noted, Metz places himself in the theoretical program of de Saussure; hence his enterprise is not really undertaken 'without reference to an overall theoretical model'. Again, it appears that Metz has led his critics into confusing
phenomenology as method with phenomenology as analysis. With regard to the grande syntagmatique, the method is phenomenological, the theory is Saussurian. Henderson's conclusion is based on the misunderstanding that Metz finds the taxonomy itself in the raw data of experience (cf. 'determined empirically').
4. THE GRANDE SYNTAGMATIQUE: AN ASSESSMENT The theory of cinema structure that Metz develops is a taxonomy of segment-types, differentiated by several binary oppositions. This taxonomy is the grande syntagmatique. The properties which are contrasted in the taxonomy have to do with the form of the segment as well as the type of event represented in the segment. Sequences are categorized according to the sort of event they represent and the manner in which they represent it. One goal of the grande syntagmatique is to provide an account of the relation between the two levels of cinema structure, the event and the segment (Metz 1974: 143-145). The grande syntagmatique is displayed in Figure 1 (adapted from Metz 1974: 146). Note that the eight segment-types are defined by seven binary oppositions. This structuredness of the taxonomy differentiates it from less formal taxonomic schemes like that of Eisenstein. At the apex of the taxonomy, the class of autonomous segments is divided into into autonomous shots and syntagmas. Autonomous shots are segments that consist of only a single shot, syntagmas are constructions of more than one shot. The class of syntagmas divides into chronological syntagmas and achronological syntagmas. In the latter the temporal relations between the various shots and images is not specified by the film, while in the former it is. Achronological syntagmas are of two types. In one, the Bracket syntagma, a series of brief images are presented which suggest a common theme, but which are not temporally related to one another. In the other, the parallel syntagma, two visual themes are interwoven, but no temporal relations, either between shots pertaining to the same theme or between themes is specified; for example, alternating shots of riches and shots of poverty. Chronological syntagmas are also of two types. One is the descriptive syntagma. In the descriptive syntagma all elements presented in a
Figure 1.
series of shots are understood to be simultaneous; for example, shots of a landscape (usually just prior to some action). The other type of chronological syntagma is narrative. Narrative syntagmas can be divided into two types, alternating syntagmas and linear syntagmas. The alternating syntagma interweaves two temporally parallel action sequences; for example, the cavalry's advance and the siege of the wagon train. Linear syntagmas represent a single action sequence. They divide into scenes and sequences. The scene represents a spatially and temporally integral event. The sequence, in contrast, presents a discontinuous event. There are two types of sequence, the ordinary sequence and the episodic sequence. In the ordinary sequence, moments in the represented event that have no consequence for the overall plot are simply omitted (the viewer does not need to see a character fumbling for correct cab fare). In the episodic sequence, these omissions are more systematic; for example, a segment depicting the life of a mailman might show him delivering letters at a certain house each day, omitting the rest of his daily activities.4

The question that must now be addressed is what the value of Metz's work is. Even if we grant that all narrative cinema segments could be exhaustively and uniquely classified by the binary oppositions of the grande syntagmatique, so what?5 Clearly, Metz has identified and systematized several important structural oppositions in his taxonomy. For example, the dimension of time is theoretically treated in the opposition of chronological and achronological segments. Accordingly, it seems proper to consider the grande syntagmatique as a theory of segment structure in just the sense that Saussurian analyses of language are considered linguistic theories.

The next question we must ask is to what sorts of new insights about cinema the theory leads. On this point, Metz is quite ambivalent.
In several places Metz (1974: 48, 56-57) seems to be suggesting that grammaticality, as defined perhaps by his grammar, can be used to define 'aesthetically pleasing'. He describes a period in film history when 'grammatical' corresponded to 'bad'. But certainly he would not want to claim that segments analyzable by the grande syntagmatique (i.e. grammatical vis-à-vis the grande syntagmatique) are all aesthetically bad! And certainly he would not want to claim that segments not analyzable by the grande syntagmatique (i.e. ungrammatical on the grande syntagmatique) are good! Such formulae relating models of film structure (the domain of theoretical terms like 'grammatical') and aesthetic theories of value are embarrassingly oversimplistic. The relation between cinema grammar and cinema aesthetics is in all likelihood very much more complex (Tudor 1974; and Chapter 8 below).

In some contexts Metz seems to entertain the idea that the grande syntagmatique could constitute a contribution to cognitive theory. Thus, he states that the role of montage, or editing, in film is formed '... by a certain structure of the human mind' (Metz 1974: 47; see also pp. 103, 136, 145). Thus, to study the principles of montage would be to collect evidence about the nature of the human mind. Metz seems to have no illusions about the difficulties involved with studying underlying cognitive structures by studying cinema: 'In every human phenomenon of some magnitude, the cinema included, various cultural systems intervene and overlap in complex ways.' (Metz 1974: 74; see also pp. 112, 143). But if Metz seriously regards the grande syntagmatique as a vehicle to study cognitive structures, he has failed to make this clear. References are scattered and vague, and nowhere does he concisely delineate any specific hypotheses.

Finally, Metz occasionally attempts to consider the perception of cinema. Often, and unfortunately, his perceptual analyses are quite superficial. For example, he appeals to 'the spontaneous psychological mechanisms of filmic perception' (Metz 1974: 103). The use of terms like this seems to suggest, first, that the mechanisms of cinema perception are understood (a totally erroneous suggestion), and second, that they are trivial (cf. 'spontaneous').
When Metz directly confronts the role of grammatical structures in perception, he demonstrates that he is simply uninformed. For example, he writes, 'as linguists have observed, the sentence is first of all a unit of speech, not of thought, reality, or perception.' (Metz 1974: 217). Personally, I cannot fathom what it could mean to be a unit of speech but not of reality. Aside from this, the facts offered are incorrect. The sentence is, besides being a unit of speech, a unit of perception (Miller 1950; Miller and Isard 1963) and of thought (Valian and Wales 1976; Wanner 1974). What is particularly unfortunate about Metz's misunderstandings here is that they lead him to expect that the structural
units described by cinema grammar will fail to be actual perceptual units. Indeed, Metz seems committed to insist on even less potential psychological interest for his theory. He argues that, 'Many people, misled by a kind of reverse anticipation, antedated the language system; they believed they could understand the film because of its syntax, whereas one understands the syntax because one has understood, and only because one has understood, the film.' (Metz 1974: 41). Elsewhere, however, Metz seems to relent and offers a glimmer of hope. He suggests that his grammar, the grande syntagmatique, accounts for 'the intelligibility of the co-occurrences of filmic discourse' (Metz 1974: 145). As in the case of his consideration of the cognitive implications of cinema grammar, Metz's comment on the perceptual reality of the structural units of his theory is scattered, vague, and undeveloped.

Perhaps it is enough to develop a comprehensive, or even partially comprehensive, taxonomic scheme for cinema segments. Perhaps this is enough to ask of a cinema theory. Perhaps it is too much to ask that the theory make comments regarding cinema aesthetics, cognition, and perception. Or, perhaps Henderson is correct when he says, 'The taxonomy that results identifies certain patterns and gives various labels to these, but it says little or nothing about them, neither why these patterns exist nor what is important about them.' (Henderson 1975: 31).

Metz's own characterizations of the theoretical implications of the grande syntagmatique are profoundly unsatisfying. His presentation is confused and often infuriating. No more attention will be directed to evaluating the grande syntagmatique, since in the next two chapters I will begin to outline a complementary linguistic program for cinema theory which potentially remedies the inadequacies of the grande syntagmatique noted above.
In the final section of this chapter, I would like to return to some of the earlier, and perhaps more important, issues of the Metz literature. Specifically, I want to review some of the writing of two other theorists who, like Metz, apparently confuse the theoretical claim that film is a language with the methodological assumption that film can be studied as language.
5. POETRY, SYMBOLISM AND CINEMA

In different ways the work of Pier Paolo Pasolini (1966) on the so-called cinema of poetry and of Daniel Sperber (1974) on symbolism represent further confusions of the film-as-language issue. Pasolini has made two claims which I will try to refute: first, that the structure of cinema is of an irrational type and, second, that there can be no distinction between grammar and rhetoric in cinema theory. Sperber makes the claim that symbolic phenomena cannot be analyzed in semiotic or linguistic frameworks, and argues that they should instead be analyzed as deriving from a symbolic attitude of human cognition and memory.

Basically, I want to show that there is really nothing to these various claims. Sperber's argument fails because each of the criticisms he raises against his linguistic straw-men is just as true of the program he offers as a solution. Moreover, he attacks the simplest sorts of linguistic analyses of symbolic phenomena. Pasolini mistakes his own subjective impressions for analyses of phenomena. I must point out, however, that much more is at issue in the discussions presented in Sperber (1974) and Pasolini (1966), and I shall only address those aspects of the texts that are relevant to the Metz literature.

Pasolini, like Metz, is concerned with the validity of the theoretical claim that film is a language.6 Pasolini focuses on the communicative nature of language. He argues that the essence of verbal language is communication, but that this is not true of cinematic language. As evidence, he argues that men communicate, after all, with words and not with images (Pasolini 1966: 35). There are two flaws in this argument. First, it is factually incorrect: it is not at all clear that the primary and essential feature of verbal language is its communicative aspect.
It has often been argued that the primary objective of human verbal language transcends communication and actually involves things such as representing the self to the self (e.g. Terrace and Bever 1976). Aside from this, what will Pasolini say of the sign language of the deaf? Is it excluded from language because it does involve the use of images by men to communicate? Insofar as there is an issue here at all, Pasolini's position is untenable. But there is a second flaw in the argument, and this is that the argument is irrelevant. Like Metz, Pasolini has attacked the
theoretical claim that film is a language, a claim that no one, not even the most dedicated filmolinguist, must defend.

But this irrelevant argument is just a prelude. Pasolini's main contention is that film is of an irrational type. He argues phenomenologically, mistaking similarities in subjective experience for similarities in unpresented analyses. On this basis, he concludes that film, dreams, and memories are of the same irrational type, in contrast to language which is of a rational type. He labels dreams and memories virtually prehuman, pregrammatical, and premorphological. The argument fails to go through for the same two reasons: factually, the evidence given is inadequate and pervasively contradicted; logically, the point is moot.

We cannot know what Pasolini means by dreams and memories for he gives no citations. We cannot know what he means by ascriptions like prehuman, pregrammatical, and premorphological either, for he does not elaborate or sufficiently exemplify them. However, we can examine the psychological and psychoanalytic literature on the topics of dreams and memories, and compare what we find there with Pasolini's implicit analysis. In the case of memory, there is extensive research in experimental psychology which has fruitfully charted the rational basis for storage and retrieval in human memory. The most general conclusion is that items are stored and retrieved according to shared features of form, content, and context (Bartlett 1932). Memory, it seems, is not in any conventional sense premorphological or irrational.

Pasolini has similarly confused the psychoanalytic analysis of dreams. Freud (1900: 661) describes dream thoughts in these terms:

'These usually emerge as a complex of thoughts and memories of the most intricate possible structure, with all the attributes of the trains of thought familiar to us in waking life .... The different portions of this complicated structure stand, in the most manifold logical relations to one another. They can represent foreground and background, digressions and illustrations, conditions, chains of evidence and counter-arguments.'

In Freud's theory, dream thoughts constitute the underlying (latent) structure of the dream image (manifest content). The two levels are related by dramatization, a relation which maps
several elements in the dream thought material onto each element of the manifest content of the dream (Freud 1901: 652, 659). The techniques of dream interpretation use the dream report to infer the structure of the dream itself and, ultimately, to construct a theory of the dream thought. The structure of the project is very much like that of generative linguistic analysis in which the speaker/hearer intuition of sentencehood is used to infer the structural organization underlying manifest sentence segments. Pasolini's use of pregrammatical, premorphological, and irrational seems obscure in this context. But again, the truly destructive point, for Pasolini, is that his argument is irrelevant. Whatever characteristics, impressions, and similarities are shared by the experience of cinema and experience of something else cannot be used as an analysis of those very facts — this is the fallacy of phenomenology as theory. However, even if such an analysis could be constructed, it would not bear at all on the methodological principle that film can be studied like a language. A third argument Pasolini presents in defense of his view that cinema is not like language is the argument that there is no distinction between grammar and rhetoric in cinema. Pasolini explains that the filmmaker must first create his images from chaos and then invest these newborn morphological entities with his own aesthetic expression: 'While the writer's work is aesthetic invention, that of the filmmaker is first linguistic invention, then aesthetic.' (Pasolini 1966: 36). Metz concurs with this point, 'To "speak" a language is to use it, but to "speak" cinematographic language is to a certain extent to invent it.' (Metz 1974: 101; see also pp. 117, 224). And once again, I claim that the argument fails to go through for factual and logical reasons. 
The logical failure is the usual one; Metz and Pasolini point out a difference between film and language to serve the larger argument that film is not a language. But this point, as we have noted over and over, is beside the point — film is clearly not language, but it may still be studied linguistically. Aside from the logical problems, the argument is clearly wrong. It is a linguistic commonplace that language use plays a determining role in the evolution of language structure — synchronically and diachronically (e.g. Bever and Langendoen 1971; Bever, Carroll, and Hurtig 1976). Accordingly, rhetoric and grammar
are not functionally separable in the theory of language either: to speak a language is to a certain extent to invent it. But from this we clearly need not conclude, as Pasolini and Metz do, that grammar and rhetoric are one and the same. In the theory of language, systems of language structure (e.g. grammar) and systems of language use (e.g. rhetoric) are conceived of as participating in a functional interaction. The equilibration of this interactive relation instigates diachronic and synchronic changes in each of the contributory systems. In noticing that the use of cinematic forms results in the invention of new cinematic forms, Metz and Pasolini seem to have overlooked the fact that linguistic invention is a daily experience: language behavior is creative, everyone has produced novel sentences (for discussion, see Chomsky 1968). (Even if we restrict ourselves, as Pasolini does, to the comparison of words with cinematic images, the point still holds; see Halle 1973; Carroll and Tanenhaus 1975.)

To sum up, there seems to be no profit in arguing that film is more like poetry than prose, ergo it has no grammar and cannot be studied linguistically. The three arguments we have considered all rest on defective factual and logical bases. Finally, note that there has been a considerable amount of work done on the analysis of various aspects of poetic structure and stylistics, based explicitly on linguistic analyses (e.g. Freeman 1970). Thus, even if it could be proved that film is more like poetry than prose, this would not serve as an argument that film should not be studied linguistically.

In the remainder of this section, I would like to briefly examine the work of Sperber (1974) on symbolism. It is clear, I think, that film routinely exploits and creates symbolism. Therefore, Sperber's arguments against linguistic approaches to the study of symbolism can be construed by extension as arguments against taking a linguistic approach to the study of cinema.
The first portion of Sperber's book concerns itself with attempting to refute the semiotic approach to the study of symbols. He dismisses several candidate approaches that fall under the semiotic heading, finally turning to the work of Lévi-Strauss. Relative to his assessment of other semiotic approaches, Sperber is positive about the contribution of Lévi-Strauss. However, he refuses to label Lévi-Strauss's work as semiotic. Finally, Sperber
concludes that symbolic phenomena are cognitive phenomena, the result of a symbolic attitude.

There is much that is positive here. Most centrally, Sperber quite correctly maligns that school of thought which would analyze the complexities of symbolic phenomena by mapping putative symbols into putative meanings, one to one. But this point is not completely original; it is the clarion call of modern-day structuralism, as championed by Lévi-Strauss, Chomsky, and many others. Indeed, it is entirely consistent with the position of most contemporary linguists and semiologists. Thus, one question we want to ask regarding Sperber is whether or not the positions he attacks are strawmen. It seems that the answer to this question may very well be yes.

Sperber's characterization of semiotic approaches is quite narrow. For example, in attacking the psychoanalytic school, Sperber directs all of his remarks to the writings of Freud on sexual symbolism. One wonders why he does not at least note the much broader perspectives of a psychoanalytic theory like that of Jung, which explicitly refuses to limit itself to sexual symbols. Similarly, in attacking linguistic approaches, Sperber addresses only de Saussure, totally ignoring the contemporary generative school.

Sperber is quite right when he asks of semiotic approaches, in what sense have you explained a symbol when you translate it (e.g. wearing butter on the head => having semen on the genitals; p. 46)? However, this point only holds against the rather simple semiotic programs he chose to consider. Certainly, it would be possible to adopt a methodologically linguistic orientation to the study of symbolism. This would not necessarily mean that symbols would be translated into language. It would not necessarily involve the theoretical claim that symbolism is a language in any substantive sense.
It would only mean that the techniques of gathering and organizing data into an analysis would be, at least partially, imported from linguistics. De Saussure was the founder of semiotics, but that hardly means that Saussurian linguistics must be the basis for semiotic analysis (this is in fact one problem with Metz's work).

Indeed, if one assumes that the linguistic model underlying the semiotic of symbols is generative grammar, many of Sperber's comments regarding Lévi-Strauss contradict his commentary on semiotics. Sperber (1975: 69-70) notes two advantages, in particular, of Lévi-Straussian structuralism. First, he approves of the fact that the sort of description rendered by Lévi-Straussian analyses includes multiple layers of meaning. Second, and vis-à-vis the first, Sperber approves of the fact that Lévi-Strauss maintains both underlying and manifest oppositions in his analysis. However, he complains that it is difficult to see just what a system of binary oppositions, even one that simultaneously maintains both underlying and manifest levels, gets us in terms of explaining symbolic phenomena (p. 69).

From the perspective of a linguistic approach, there are several points that need to be made here. First, and we have already made this point before, Lévi-Strauss's approach is not at all inconsistent with linguistic or semiotic approaches. Contemporary generative linguistics is founded on the principle that in any adequate linguistic grammar both underlying and manifest levels of sentence structure must be simultaneously addressed (see Chapter 4). Moreover, given sentences routinely receive more than one structural description at one or both of these levels. Thus, contrary to Sperber's view, alternative analyses are univocally resolved. Sperber seems to confuse the language user, who typically does interpret linguistic entities univocally, with the theory of linguistic grammar, which in contrast must exhaustively describe all possible interpretations of all possible linguistic segments.

Sperber's reservations about the explanatory efficacy of Lévi-Strauss's approach raise other, somewhat more difficult, questions. As argued earlier in the discussion of Metz's use of binary oppositions, I also have some difficulty in seeing just what a set of binary oppositions, underlying or manifest, buys us in terms of explanation. At the least, I can envision other approaches that appear to offer more (see Chapters 5, 6, and 7).
However, it is fair to ask the question of what Sperber proposes to construct in place of the binary oppositions of Lévi-Straussian structuralism. Throughout his book he constantly refers to the alternate approaches he discusses as party games (e.g. p. 64), thus he leads the reader to expect that he indeed does have a way out for the theory of symbolism. Unfortunately, he does not. Sperber's own analysis of symbolic phenomena is that they are cognitive phenomena: when we experience things which, on the basis of our encyclopedic knowledge of the world, strike us as incongruous or odd or inexplicable,
we adopt a symbolic attitude and interpret these situations as being symbolic. This proposal is somewhat of a letdown, for how are we to recognize symbolic attitudes except by playing the same mentalistic party games that psychoanalysis and structuralism must play? What is a cognitive phenomenon anyway, or better yet, what is not? How is encyclopedic knowledge of the world to be defined? Where, finally, is the discovery procedure that Sperber seems to be promising us?

I don't want to overstate the case against Sperber, since his proposals are quite sensible, although quite programmatic. The real point here, as in the discussion of Pasolini, is that the polemical antisemiotic viewpoint presented has very little to recommend it. Most of the existent literature on the subject of film-as-language is simply beside the point. The film-as-language metaphor needs to be adopted, taken seriously, and explored in order to be rejected or sustained. Fifty years of cinema theory has concerned itself with whether or not film is a language, but whatever the answer to this question we still must ask, can we learn anything about cinema by studying it as if it were a language? The answer to this question is quite independent of the answer to the first question, and far more important.

NOTES

1. Even prior to 1974, when Film Language appeared in English translation, Metz had begun to abandon the filmolinguistics of that work. In spite of this, the approach taken in Film Language had a significant impact on cinema studies of the mid-1970s (Carroll 1977b; Henderson 1975; Nichols 1975), and remains the classic work of that period.
2. For the specific purposes of the present study, it is actually more important for us to examine the theoretical and especially methodological underpinnings and assumptions of Metz's work than it is for us to examine the substantive results of his analysis.
As I have argued in Carroll (1977b), the confusions created by Metz's work and the critical reviews it has received unwarrantedly impugn structural linguistic approaches to cinema theory.
3. Bettetini (1973: 31, 45, etc.) also makes this confusion.
4. Metz (1974) goes into more detail regarding these segment-types, and the reader is referred there for more discussion. This brief consideration is sufficient for present purposes; basically, the problem we want to pose is this: once you have a taxonomy, leaving aside questions of its adequacy (and it is clear that many existent cinema segments would find no
account in Metz's system), what do you do with it? This has never been made clear by Metz or anyone else; see below in text.
5. Metz himself unhappily notes that some segments are not exhaustively classified by the taxonomy. He does, however, recognize this as a deficiency of the analysis.
6. Pasolini (1965; 1966), and Mitry (1967) as well, seem to see a fundamental antithesis between the claim that cinema has an abstract denotative structure (a syntax) and the principle that cinema is a creative medium. If true, this is certainly an unfortunate misbelief (see Bettetini 1973: 63). However, it suggests a possible motivation for the arguments raised by Pasolini and critiqued in the text.
A Linguistic Approach to Cinema Theory
1. PRELIMINARIES

In this study we shall assume the methodological principle that film can be studied meaningfully as if it were language. As we saw in the preceding chapter, there is really no good reason to reject such an approach out of hand. Those who have done so have proceeded either on the basis of no argument and evidence, or on the basis of fallacious argument and irrelevant evidence. However, the best way to deal with these misunderstandings of the film-as-language metaphor is to adopt the program and produce a successful analysis. This is precisely the project we shall undertake.

Further, we shall, like Metz, limit our study to the narrative film, and, in general, to commercial narrative films. There are two reasons for this. First, it is primarily the narrative that has concerned traditional works in film theory. Therefore, it is by analyzing the narrative that our project can potentially make the greatest contact with prior work. Second, people have the greatest experience with the narrative film, as opposed to documentary or experimental genres. The commercial narrative, in particular, has relatively great accessibility. Accordingly, the reader can check the examples cited, either by recollection or by seeing the film in question. It must be acknowledged that, for many purposes, other corpora might be more efficacious. However, we leave this for future consideration.1

In the present chapter we adopt a simple grammatical formalism, that of phrase structure grammar. Phrase structure grammar is an improvement over the approach of the grande syntagmatique
in that it can describe an infinity of potential cinema structures (instead of just eight).
2. PHRASE STRUCTURE GRAMMAR

2.1. What phrase structure grammar is

Everything that is called a theory is not seeking the same thing. A casual reading of Chapter 2, or any other review of those works that have been called film theory, makes this abundantly clear. In saying this, I do not mean to disparage any particular notion of film theory. There is room in the study of the human mind and its artifacts for a great variety of theories; indeed, having a great number of theories might even improve our chances of finding something out.

Phrase structure grammar exemplifies one sort of conception of theory. A phrase structure grammar is a formal algebra which recursively enumerates all and only the members of a particular class of objects, a possibly infinite class. The rules of the grammar are principles that predict whether or not a given object can belong to the class. They describe the underlying structural regularities which inhere in that class of objects. The best way to understand what this means is to examine some examples. We will later want to contrast Metz's grande syntagmatique approach with the theoretical program implied by phrase structure grammar.

Suppose that in a study of a very simple (and rather monotonous) phenomenon, the following corpus of sequences was observed:

aaaabbb
aabbbb
abbb
aaaab
aab
ab
aaaaabb
abbb
abb
aaabbbbbbbbbb
The example strings are clearly abstractions from some actual class of phenomena. They can be regarded as symbolically representing a wide variety of behavioral or physical processes. For example, 'a' might correspond to 'makes a left turn', and 'b' might correspond to 'makes a right turn' — thus the example sequences might represent the behavior of a rat running through various t-maze combinations. In the first maze, the rat turns left four times and then right three times; in the second he turns left twice and then right four times; etc. Clearly, the sequences could as well represent many other phenomena: a child who either hops or skips, layers of shale that either contain oil or do not, a machine that either stamps out a waffle iron or a machine gun, etc. The abstract a-b sequences in the example above describe all of these sequential structures — namely, by describing that which is common to all of them. There are many things about the various physical/behavioral interpretations of the sequences that are different, and, at some level of analysis, we would want to be able to describe this too. Nevertheless, for present purposes we will concern ourselves only with the common abstract sequential structure underlying all of these phenomena. What sort of description will the phrase structure grammar provide for our a-b sequences? We might first ask what there is to say about these sequences at all. I am ready to be corrected on this, but the only regularity I see in these sequences is that some number of a's are followed by some number of b's. That is to say, we never have the sequence 'aaaabbbba' or 'bbbbbaaaaa'. Some a's must occur, and then some b's must occur — but nothing else. The simple phrase structure grammar below recursively enumerates, or generates, all of the example sequences. To that extent, this grammar is a theory of the structure of those sequences. 
Rule 1: S => aB
Rule 2: B => aB
Rule 3: B => bC
Rule 4: C => bC
Rule 5: C => φ

(The special symbol φ is referred to as the null symbol.) The five rules of this regular grammar apply by rewriting what is on the
left of the arrow by what is on the right of the arrow. The initial symbol is S. Thus, the first sequence listed in our example set above can be generated in the following way:

Starting symbol: S
Apply Rule 1: aB
Apply Rule 2: aaB
Apply Rule 2: aaaB
Apply Rule 2: aaaaB
Apply Rule 3: aaaabC
Apply Rule 4: aaaabbC
Apply Rule 4: aaaabbbC
Apply Rule 5: aaaabbb

The result of this series of rule applications can be rendered as a tree structure diagram, as in Figure 2. The terminal, or lowest, nodes of this structure list out the sequence 'aaaabbb'. The upper branches, or nonterminal nodes, describe the structural relations between the various a's and b's. The series of rule applications we went through in order to generate the sequence is called the derivation of the sequence in this phrase structure grammar. We could construct similar derivations for each of the example sequences. However, we could not construct a derivation for any of the sequences listed below:

aaaaaaaa
aaaaaaaaba
bb
abab
bbbbbaaa
baaaabb
aaba
bbbba
abababab
aaabbba

It is in this sense that the grammar is an adequate theory of the sequences listed in our first example.2 All of those sequences are generated, as they should be, by the grammar which is an adequate theory of their structure. The grammar simply formalizes as an explicit algebra what we recognize as the significant structural property of our sequences, namely, that a group of a's precedes a group of b's.
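The five rules and the derivation procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original analysis: the names RULES and derive are hypothetical, and the null symbol is encoded, by assumption, as an empty string with no successor symbol.

```python
# Hypothetical encoding of the five rewrite rules.
# Each rule is (left-hand symbol, terminal emitted, nonterminal that remains).
# Rule 5 rewrites C as the null symbol: nothing is emitted, nothing remains.
RULES = {
    1: ("S", "a", "B"),
    2: ("B", "a", "B"),
    3: ("B", "b", "C"),
    4: ("C", "b", "C"),
    5: ("C", "", None),
}


def derive(sequence):
    """Return the derivation of `sequence` as a list of
    (rule number, resulting string) steps, or None if the
    grammar provides no derivation for it."""
    state, emitted, position = "S", "", 0
    steps = [(None, "S")]  # the starting symbol, before any rule applies
    while state is not None:
        # Next input symbol, or "" once the whole sequence has been consumed.
        symbol = sequence[position] if position < len(sequence) else ""
        for number, (lhs, terminal, successor) in RULES.items():
            if lhs == state and terminal == symbol:
                emitted += terminal
                position += len(terminal)
                state = successor
                steps.append((number, emitted + (successor or "")))
                break
        else:
            return None  # no rule applies, so the sequence is not generated
    return steps


for rule, form in derive("aaaabbb"):
    print("Start" if rule is None else f"Rule {rule}", form)
```

Run on the first sequence of the corpus, the sketch retraces the rule applications listed above; for a sequence such as 'abab' it returns None, because no rule rewrites C with an 'a'.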
Figure 2. (Tree structure diagram for the derivation of the sequence 'aaaabbb'.)
2.2. A phrase structure grammar of cinema
Consider now the problem, as stated by Metz, of a grammar of the segment in cinema. (I prefer the term 'scene' to Metz's 'segment'. In any case, the former is the more traditional in cinema writing.) 3 Metz's taxonomy manages to distinguish eight separate types of cinema scenes. It is entirely possible that this partitioning draws a significant generalization concerning the possible ways in which narrative cinema scenes can represent bits of a narrative. 4 However, it must be equally acknowledged that, perhaps at a different level of analysis, there are a bewildering number of possible ways that a cinema representation can be arranged and re-arranged: there may well be an unlimited number of potential scene structures. Metz's grande syntagmatique, or any purely taxonomic approach, can never capture this fact in the description it provides. A phrase structure grammar, like G1, enumerates an infinite class of potential structures and hence can overcome this limitation. In the remainder of this section, I will briefly discuss two aspects of the structure of cinema scenes and how these facts can find expression in a phrase structure grammar of the scene. I will not bother to develop any complete phrase structure grammar analysis of even a single scene, for reasons that will be clearer in the next chapter.5 However, it does seem prudent to remark, albeit briefly, on the methodological assumption that an investigation of cinema can coherently limit itself to the description of the structure of something called scenes. This question can be divided into two parts: first, can we be sure that such a coherent level of structure exists (in viewers' phenomenology)? Second, is it sensible to study scenes, apart from the larger contexts of the narratives in which they appear? We will consider these two questions in order.
The existence of scenes cannot be objectively established: logical positivism cannot be satisfied.6 However, as in the case of linguistics, we can appeal to common intuitions in order to operationally establish our object of inquiry. In the study of linguistic syntax, one appeals to intuitions about what constitutes a sentence. In the study of cinema grammar, we appeal to intuitions about what constitutes a cinema scene. There is, to be sure, some slack here. Not everyone agrees in every case about what sequences of words comprise a sentence and what sequences do not. And we can expect that some will differ as to what they consider to be a scene (we return to this matter in the next section). However, the fact that every film theorist from Eisenstein to Metz has made reference to the scene as a coherent unit of cinema narratives ensures that our appeal to this unit can be operationalized within limits. (Experimental investigations discussed in Chapter 7 lend somewhat more objective support on this point.) Just as in linguistics, the grammar we seek to construct is a formalized definition of the object of our inquiry. A linguistic grammar provides a theory of sentence structure; it defines the sentence in terms of its internal structure. Analogously, a cinema grammar (of the sort countenanced here) provides a theory of scene structure, and defines the scene unit in terms of its internal structure. It is perhaps chiefly in this sense that the study of scene structure can contribute to the more traditional concerns of cinema theory: theorists from Eisenstein to Metz have freely used the conception of the scene unit, but no one has ever provided an adequate definition of the scene. No one has even really undertaken this project. Accordingly, a grammar of the scene can provide a structural foundation for a more comprehensive analysis of cinema (analyses of entire feature length cinema discourses), and for the more traditional questions of cinema aesthetics and style, in particular. 7
Much of linguistic research addresses the structure of individual sentences in isolation. Thus, a grammar of a natural language generates all and only the sentences of that language; not paragraphs, words, phrases, or novels. It is assumed that a theory of language structure will include a theory of sentence structure and that the subject matter of the theory of sentence structure can be investigated at least somewhat independently of larger and smaller units of language structure.
This assumption is surely wrong in the limit. That is, it is most certainly the case that sentence structure is not independent of paragraph structure, of discourse context, of social context, or even of the speaker's and hearer's intelligence. Everyone knows that you don't say the same things to your mother that you would say to fellow members of the wrestling team. You say things differently when speaking to a child. What you consider to be an appropriate sentence might vary quite a bit depending on who you
are talking to, and when, where, and about what you are talking. Nevertheless, it may still be a useful research strategy to first tackle only the structure of isolated sentences, and then to move on to deal with these other factors. This is precisely what has happened in modern linguistics: the first breakthroughs in the study of discourse structure and so-called speech acts have just begun to appear after fifteen years of groundwork in the study of sentence structure.
The statements in (A) and (B) below cite certain regularities in the structure of cinema scenes.
(A) Every scene begins with a long-shot, that is, a shot which reveals the entire geography relevant to the action of the scene.
(B) A scene may have embedded within it another scene, representing pieces of the narrative occurring at another time or place: a flashback, flashforward, or cutaway scene.
Principle (A) describes the master-scene technique common in the Hollywood films of the 1930s and 1940s. It was developed by D.W. Griffith and is one of the striking characteristics of his cinema. In a film like Way Down East, Griffith typically begins the action of a scene in an establishing long-shot, and then weaves in a series of detail close-up shots to reveal actors speaking lines, reacting to events, etc. Nilsen (n.d. p. 22) recommends this sort of directorial treatment, and it is extremely prevalent in the work of the Hollywood era directors. It has often been called the master-scene approach. Phrase structure grammar can describe this structural regularity by means of the following rule:
Rule 1: S ==> L + D*
This rule can be read: A scene (S) can consist of a long-shot (L) followed by a sequence consisting of one or more detail-shots (D*).8 (We do not claim that all scenes have this structure, of course; but we do take up the issue of the adequacy of grammars in the following subsection.) Now consider principle (B), in particular, the case of flashback scenes.
Several examples of this type can be found in the Griffith film Broken Blossoms. In these scenes, during a detail close-up shot of the Chinaman (Richard Barthelmess), Griffith will often cut in a brief flashback to the Chinaman's youth. The phrase structure rule below attempts to describe this:
Rule 2: D ==> D + S + D
A detail-shot (D) can be realized as a detail-shot sequence interrupted by another scene (S).9 The important point about these phrase structure rules is that they formalize the underlying regularities about scene structure which we are tacitly aware of and can characterize discursively (although with much less precision). It is easy to move from these two isolated rules to a simple derivation of a scene structure, formally analogous to the derivation corresponding to Figure 2. The following derivation describes the scene structure in Figure 3.
Figure 3.
Starting symbol: S
Apply Rule 1: L + D + D
Apply Rule 2: L + D + S + D + D
Apply Rule 1: L + D + L + D + D + D
This scene consists of a long-shot followed by two detail-shots. Within the first of these detail-shots there is a flashback scene, itself consisting of a long-shot followed by a detail-shot. Note that the terminal nodes of the scene structure in Figure 3 are quite abstract: they are not images, or pieces of film, but rather abstract categories of scene structure.10 Such a structural description constitutes a partial specification of what a cinema scene is: it is a step toward a definition of the scene as a unit in cinema. There are a great many potential scene structures which will not be enumerated by the grammar consisting of our Rules 1 and 2 above. And this observation raises the question of what an adequate phrase structure grammar of cinema must be responsible for describing. We now turn to this question, as it has been framed in recent linguistic inquiries.
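The derivation above can be replayed step by step. What follows is a minimal sketch in Python under stated assumptions: the function names apply_rule_1 and apply_rule_2 and the list representation of a sentential form are illustrative choices, not part of the grammar itself. Rule 1 is given the number of detail-shots explicitly, since D* stands for one or more of them.

```python
def apply_rule_1(form, index, n_details):
    """Rule 1: rewrite the S at `index` as L followed by `n_details` D's (at least one)."""
    if form[index] != "S" or n_details < 1:
        raise ValueError("Rule 1 rewrites an S as L plus one or more D's")
    return form[:index] + ["L"] + ["D"] * n_details + form[index + 1:]


def apply_rule_2(form, index):
    """Rule 2: rewrite the D at `index` as D + S + D (a detail-shot interrupted by a scene)."""
    if form[index] != "D":
        raise ValueError("Rule 2 rewrites a D")
    return form[:index] + ["D", "S", "D"] + form[index + 1:]


form = ["S"]                      # starting symbol
form = apply_rule_1(form, 0, 2)   # L + D + D
form = apply_rule_2(form, 1)      # L + D + S + D + D
form = apply_rule_1(form, 2, 1)   # L + D + L + D + D + D
print(" + ".join(form))
```

Because Rule 2 reintroduces S, the two rules can be applied again and again, which is how even this small fragment enumerates an unbounded class of scene structures.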
2.3. Observational adequacy
There have been many applications of phrase structure grammar in current linguistic research. For example, Chomsky (1957: 7-27) presents the following fragment of a grammar for English.
S ==> NP + VP
NP ==> Det + N
VP ==> V + NP
Det ==> the
N ==> man, ball, etc.
V ==> hit, took, etc.
This grammar yields sentence structures like the one for 'The man hit the ball.' in Figure 4.
Figure 4.
However, like our simple phrase structure grammar for scenes, Chomsky's grammar is certainly not an adequate grammar of the English language.11 There are a great many sentences (that is, linguistic sequences which we feel are sentences) for which this grammar provides no description. Chomsky suggests that one evaluation criterion for a grammar is that it generate all and only those linguistic sequences which native-speaking persons take to be well-formed sentences of the language. This criterion of adequacy for a grammar is referred to as observational adequacy (Chomsky 1965). An observationally adequate grammar of English must enumerate all and only those word sequences that speakers of English accept as well-formed sentences of English. Analogously, we may specify that an observationally adequate grammar of cinema must enumerate all and
only those sequences of images and shots that competent film viewers take to be well-formed film scenes. Unlike the grande syntagmatique, observationally adequate grammars of language and cinema must enumerate infinite classes of sentence and scene structures. Notice that in both cases we are relying on a tacit knowledge that speakers and film viewers have that allows them to distinguish film scenes and sentences from other sorts of sequences. This phenomenological data is then turned back onto the object of inquiry and employed as the empirical basis for a grammar which constitutes a definition of that object. But what are these judgments of sentencehood and scenehood that we are relying upon?
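The "all and only" criterion can be made concrete for the grammar fragment quoted above. The following Python sketch rests on several assumptions that are not in the original: the lexicon is cut down to the words actually listed (ignoring "etc."), the names GRAMMAR, expand and check_sample are illustrative, and the accepted and rejected sets are a tiny hypothetical sample of judgments, not data.

```python
from itertools import product

# The quoted fragment, with the lexicon limited to the words listed in the text.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["man"], ["ball"]],
    "V": [["hit"], ["took"]],
}


def expand(symbol):
    """Yield every word sequence derivable from `symbol` (finite: the fragment has no recursion)."""
    if symbol not in GRAMMAR:
        yield [symbol]  # a word of the language
        return
    for right_hand_side in GRAMMAR[symbol]:
        alternatives = [list(expand(part)) for part in right_hand_side]
        for choice in product(*alternatives):
            yield [word for words in choice for word in words]


def check_sample(generated, accepted, rejected):
    """Return (acceptable sequences the grammar misses, unacceptable ones it generates)."""
    return accepted - generated, rejected & generated


generated = {" ".join(words) for words in expand("S")}

# Hypothetical judgments, for illustration only.
accepted = {"the man hit the ball", "the cat ate the mouse"}
rejected = {"the the mouse cat ate"}

missed, wrongly_generated = check_sample(generated, accepted, rejected)
print(len(generated), "sequences generated")
print("missed:", missed)
print("wrongly generated:", wrongly_generated)
```

On this sample the fragment generates nothing that is rejected but misses an acceptable sentence, so it satisfies "only" while failing "all". A finite sample can expose a failure of observational adequacy in this way, but it cannot establish adequacy.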
3. ACCEPTABILITY
Part of what speakers know can be termed acceptability: they know when a sequence of words comprises a well-formed sentence of their language. Analogously, film viewers know that something is wrong when a film they are viewing comes unsprocketed in the projector and begins to jitter and jump. Such a scene is unacceptable, in this sense, in a narrative film. The mental processes that underlie such acceptability judgments are in all likelihood very complex and, in any case, as yet quite poorly understood (Carroll, Bever, and Pollack 1981). But it is nevertheless true that speakers of English recognize certain sequences as bona fide sentences of their language, as for example:
The cat ate the mouse.
Harry brought a bagel to Carol, and Oscar the creamcheese to Janet.
The teachers each spoke to Mary.
It happened that Mary spoke with her teachers.
and other sequences as nonsentences, for example:
* The the mouse cat ate.
* Harry brought Carol a bagel, and Oscar Janet the creamcheese.
* Mary spoke to the teachers each.
* That Mary spoke with her teachers happened.
Film viewers likewise recognize certain sequences as scenes, and others as nonscenes. And most strikingly, these acceptability
judgments are rendered with relatively good reliability and agreement.
3.1. Acceptability in cinema
Evidence of this can be cited from the texts of traditional cinema theory. Nilsen (n.d. p. 44) discusses how certain combinations of shots can be unintelligible. Vorkapich (1974) partitions cinema sequences into those that are filmic and those that are unfilmic. Reisz (1968 [1953], Chapter 16) also seems to have the notion of acceptability in mind when he speaks of cutting combinations that are not smooth, etc. Various other theorists have talked about the same sort of phenomenon, labeling combinations we might call unacceptable as confusing, meaningless, or even unmotivated. Eisenstein (1942) provides further characterization of acceptability. In a discussion of the cinematic representation of battle scenes, he notes (p. 210) that there is often a lack of overall topographical and strategic logic. Accordingly, '... most film battles melt into an hysterical chaos of skirmishes through which it is impossible to discern the general picture of the whole developing event.' Thus, locally each shot may be well-formed; however, the lack of a clear geography renders the scene as a whole unacceptable.12 These prior, albeit somewhat unsystematic, references to the concept of acceptability encourage the hope that acceptability can be sufficiently operationalized to serve as the basis of a cinema grammar. Clearly, however, acceptable, in linguistics, cannot simply be equated with speakable, since not everything a speaker of a language actually utters would be judged upon reflection to be a sentential sequence (even by the speaker). Consider these:
Seems like a good idea.
Wanna?
Oscar will.
Harry and the warden is waiting.
Analogously, I think, we must acknowledge that not every existent cinema sequence comprises an acceptable scene. The fact that a given sequence is found in a feature film does not guarantee that viewers will find it acceptable.
Reisz (1968 [1953]: 216-217) considers the sort of unacceptability that arises when actions are not matched during a cut: an actor begins an action in one shot, during the action there is a cut to a new camera position, and then the action is completed in a second shot. Occasionally, especially in older films that may have broken and have been respliced, the two pieces are not joined together smoothly: some part of the action is either deleted or shown twice. To quote Reisz, a cut from a long-shot of an actor standing by the fireplace to a medium-shot of the same actor already seated in an armchair would be unacceptable. The wedding scene of Pabst's The Three Penny Opera is a more subtle example of this. The scene takes place in a loft that has been decorated ostentatiously for the wedding of Mack the Knife. Because the background of each shot is so rich in complicated detail, the viewer can lose all sense of geography. The various shots cannot be synthesized into a coherent whole. (For discussion of the role of visual landmarks in perception, see Lynch 1960 and Gellman 1977.) Balázs (1970 [1945]: 109) raises similar points. He recalls a German film (he doesn't recall the director of the film) called Phantom. In this film, a series of vignettes from the hero's life are flashed onto the screen in a stream-of-consciousness format. Balázs does not necessarily say that this is bad cinema, but he does conclude that such cinema sequences cannot be said to represent coherent events; they are unacceptable as narrative scenes. Other instances of apparent unacceptability raised by Balázs involve visual distortion. He argues (1970 [1945]: 93, 102) that things may be so distorted as to become effectively unrecognizable and thus no longer a representation of reality. One example, not given by Balázs himself, is the early Abel Gance film The Follies of Dr. Tube. The use of distorting lenses in this film renders some of the sequences unacceptable.
A final example might be the camera tracking scenes in Hitchcock's Murder (1934). Hitchcock's use of the moving camera in this early film was very innovative, but since he could not smoothly refocus the lens as the camera moved there is constant blurring, focusing, and blurring as the camera movement and the focus leapfrog along. This distortion seems to render some of the scenes in the film unacceptable.
It is important to stress, however, that instances of unacceptability in commercial narrative cinema should be rather more difficult to isolate than they are in the study of language. Free speaking and listening can reveal and suggest much to us about language vis-à-vis acceptability. However, existent narrative cinema sequences are not in general freely created but rather are pieced together, tested, rearranged, and reorganized by editors and directors. With respect to cinema acceptability, it is probably true that the very best data is on the cutting-room floor. Linguists working within the framework of transformational-generative grammar take acceptability judgments as primitive data. They reason that the systematicities in acceptability judgments among different speakers of a language community comprise important indications of the underlying knowledge which speakers have about their language. A grammar which generates all and only the sentences that speakers take to be acceptable sentences of their language is, in this sense, an explicit theory of (part of) the knowledge the speaker uses in making acceptability judgments. Just as a linguistic theory of sentence structure constitutes a theory of knowledge which speakers of a language have, the theory of the structure of cinema scenes should aspire to be a theory of knowledge which viewers have when they know cinema. Thus, the cinema grammar should be evaluated by criteria like observational adequacy, just as a linguistic grammar is.
3.2. Filmic versus cinematic
Acceptability cannot be equated with having value: much of poetry is unacceptable, in the technical sense, although it is probably a much better quality of language than is Chomsky's sentence, 'The man hit the ball.', which is a fully acceptable sentence. Similarly, it seems that many acceptable cinema scenes are merely pedestrian, while a great many artistically important pieces of film are downright unacceptable (in this technical sense). It is important to emphasize this in a study of cinema since film is perhaps fundamentally thought of as an art form.
Classifying some sentence as unacceptable does not mean that it is a poor sentence, that the speaker of it is a fool or an illiterate, or indeed anything like this. Acceptability is an immediate intuition about a sentence's well-formedness; it is a basic data term to linguists and means nothing outside of the framework of the study of sentence structure. Analogously, classifying a cinema sequence as an unacceptable scene simply fails to bear directly on its aesthetic properties in any way whatsoever.13 However, there may be a tendency to misunderstand that unacceptable implies bad or unskilled. In order to avoid this confusion in the study of cinema, I propose the following distinction (based on terms introduced by Slavko Vorkapich). The term 'cinematic' will hereafter be used to refer to cinema sequences that are artistically and stylistically valuable, well-executed, etc. The term 'filmic' will be used to refer to sequences which are structurally correct and therefore well-formed (acceptable, in linguistic terms).14 The distinction between filmic and cinematic aspects has often been latently instrumental in discussions of film (see Nilsen, n.d. p. 21, 39; Eisenstein 1942: 17; Wollen 1969: 16-17; Tudor 1974: 16; Balázs 1970 [1945]: 179). However, the failure of film theorists to observe the distinction explicitly has led to much confusion. As an example, consider Rudolf Arnheim's (1957 [1933]) theory of partial illusion. Arnheim very eloquently develops the thesis that film is art to the extent that it does not provide absolute verisimilitude with reality. Thus, he argues against color and sound, since they break down the abstraction of cinema and bring the cinematic experience closer to real life experiences. However, his argument continues, form for form's sake alone is bad cinema: the illusion must be only partial, but the illusion must be there.
Thus, he criticizes directors, like George Pabst, who characteristically overstructure cinema sequences with many cuts. Arnheim attacks Pabst for cutting to a reverse angle in a sequence from The Diary of a Lost Girl 'for no apparent reason'. Arnheim's position is confused by his failure to draw the filmic-cinematic distinction. Cutting, for example, often seems to be a matter of filmicity, or film structure, rather than film aesthetics (cinematicity). Consider the work of Andy Warhol, films like Empire State Building, as contrasted to some of the cinema of Jean-Luc Godard, for example Breathless. In the former case, no
cutting is employed. The film is resultingly unfilmic, although, some would argue, it is cinematic. In the latter case, some of the cutting employed (for example the famous cut to the gun) is clearly unfilmic, absolutely unnecessary to the structure of the film and quite jarring, but again, some would argue, quite cinematic. Cutting, or not cutting, for no apparent reason can equally render a sequence unfilmic. Both approaches make the viewer aware that he is watching a film and not life; they destroy the (partial) illusion of cinema. However, both approaches are not inconsistent with cinematicity. Examples to the converse of Warhol and Godard abound in television editing: moronic, sing-song cutting patterns which maintain the illusion and do not intrude on the viewer's awareness, but which are wholly uncinematic (although filmic). (There are, however, cases in which such cutting patterns may lead to unfilmic sequences; see Chapter 7.) Arnheim's attack on Pabst, although made on aesthetic grounds, really bears on filmicity, not cinematicity. Arnheim's failure to distinguish between these two aspects has the effect of confusing and blurring his important observation. (In a subsequent work, Arnheim (1957 [1938]) does appear to differentiate filmic and cinematic aspects. He writes 'a phenomenon may not disturb us in a purely psychological sense and still be objectionable artistically.' (p. 210)) In addressing the issue of cinematicity, we again confront the distinction between prescriptive and descriptive cinema theory (Chapter 2). Raymond Spottiswoode (1950 [1933]), for instance, attacks the use of wipes and superposition (double exposures) on grounds which at first appear only to be confusing cinematicity and filmicity. However, on reflection, it seems quite clear that the mere fact that devices like wipes and superpositions are employed bears neither on filmicity nor on cinematicity.
In any given instance, such a determination could presumably be made, but these devices cannot be ruled out categorically. Spottiswoode, in addition to confusing filmicity and cinematicity, has mistaken a prescriptive maxim for an empirically motivated descriptive generalization. Hopefully, this introductory characterization of the distinction between the filmic and the cinematic has not been unnecessarily belabored. However, the failure to make the distinction appears
to have caused a great deal of confusion in cinema theory both traditionally and recently.
3.3. Some constraints on filmicity
In closing the discussion of the present chapter, we will consider several examples of filmicity judgments. For present purposes, these will take the form of ordinary language statements, inductive generalizations about what conditions lead to filmic and unfilmic cinema sequences. In the next two chapters, we will return to the task of expressing such principles as rules of cinema grammar.
(C) The various pieces of the narrative represented by a scene must be ordered in the scene as they are ordered in the narrative.
(D) The contours of successive shots in a scene must be substantially displaced with respect to each other.15
(E) Intrashot composition must be consistent with the narrative to which the scene corresponds. For example, jitter and shaking of the camera can only be acceptable when motivated in the narrative; lighting, lighting differences, and shadows must be consistent with facts about the narrative situation.
(F) If an actor casts a glance out of the frame of the shot (casts a look of outward regard), the shot immediately following will be interpreted as a subjective-shot, that is, as a shot from the actor's point of view, revealing what it was that he looked at. If the shot immediately following a look of outward regard cannot be interpreted as a subjective-shot, the sequence will be unfilmic.
(G) During a subjective-shot, the actor with whose point of view the camera position coincides does not physically take part in the event represented in the shot. That is, the actor for whom the shot is subjective plays only a passive role in the action recorded from his viewpoint.
Principle (C) is exemplified by most narrative scenes. However, a clearly unfilmic violation of the principle occurs in a three-shot sequence from Leo McCarey's film Big Business. In this scene Laurel and Hardy are destroying an adversary's house,
throwing its contents out onto the street. In the first shot, a jug rolls onto a policeman's foot (he is watching the spectacle). In the second shot, Laurel throws a jug out of the house through a window. The third shot reveals Hardy, who has been smashing things as Laurel throws them out, shattering the jug on the policeman's foot (and unluckily smashing the foot as well). Of course, shot two must precede shot one in order to maintain the temporal sequence of the narrative.
Principle (D) is discussed by Reisz (1968 [1953]: 223). He is describing the so-called a-b style of rendering a dialogue scene: the camera is positioned just over the shoulder of one interlocutor (say the left shoulder) while the other participant speaks. Then, often when speaking turns change, there is a cut to a camera position just over the other shoulder of the other interlocutor (the right shoulder). Cutting to opposite shoulders allows the two characters to maintain their respective screen positions (screen-right, screen-left) and varies the two shots. When someone is speaking, that person is shown as further from the camera, as shown in Figure 5 (after Reisz 1968 [1953]).
Figure 5.
If one were to cut from one participant's right shoulder to the right shoulder of the other participant the two shots would resemble the drawings in Figure 6 (after Reisz 1968 [1953]).
Figure 6. Such a juxtaposition confuses viewers as to who is who in the shot. The various cuts between the two camera positions are initially seen as bizarre motion, and only later recognized as actual cuts (see Note 15 and discussion in Reisz 1968 [1953]). Often these types of unfilmic scenes are the result of cutting across the axis. Consider a further example from John Huston's Key Largo. Bogart and Bacall are walking across the camera frame laterally, the camera at right angles to the axis of motion. Huston cuts 180° across the motion with the result that Bogart and Bacall seem to have turned around and exchanged places during the cut. This exchange severely disrupts the continuity of the narrative, and renders the sequence quite strikingly unfilmic. Vorkapich (1974) discusses further cases of unacceptability of
this general type. He suggests that while it is always filmic to cut across the parallel-to-ground plane, it is sometimes unfilmic to cut across the sagittal plane.
Principle (E) can be illustrated by handheld camera work. Sometimes the jitter and shaking that accompanies such filming is filmic, such as when the narrative involves hiking through a rugged terrain. But in a scene for which jitter is not well motivated by the narrative, say a scene taking place in Buckingham Palace, handheld camera work is unfilmic. Many relevant examples of this type involve material in the background of the scene as photographed, for example the tracks of a large truck or a plane flying across the sky in the background of a western. These examples can often involve lighting inconsistencies. In Charles Chaplin's Monsieur Verdoux there is a scene in which Chaplin (Verdoux) crosses his furniture store to answer the door. This is done in two shots, but the background illumination of the two shots varies vastly, and there is no apparent reason why half of the store should be uniformly and significantly brighter than the other half.
Principles (C), (D), and (E) may seem rather obvious and extreme. Principles (F) and (G) are somewhat more subtle by comparison. Many film theorists, however, have cited the observation contained in Principle (F): Balázs (1970 [1945]: 161), Pudovkin (1958 [1929]: 25), Vorkapich (1974). In Hitchcock's film The Birds there is a sequence which appears to contradict Principle (F). The film's major character, Tippi Hedren, is seated outside of the schoolhouse while the birds are massing for another attack. She sits in front of the playground monkeybars, her back turned. Hitchcock cuts between a shot of the bars, filling with deadly birds, and a shot of Hedren looking beyond the camera. However, the viewer is confused. On the one hand, the relation of Hedren and the monkeybars has been established: she is in front of the bars with her back turned to them.
On the other hand, when she looks beyond the camera prior to the cut back to the shot of the bars, the viewer is encouraged (Principle F) to locate the bars in front of her (beyond the camera). In other words, Hedren appears to be watching the monkeybars which are in fact behind her! A similar example can be cited from Eisenstein's Ivan the Terrible, Part I. In the wedding banquet scene, Prince Kurbsky
casts a glance out of a medium-shot. The following shot is not a subjective-shot from his viewpoint, but rather a long-shot along the same camera axis as the previous shot. The more inclusive shot reveals what it is that Kurbsky glanced at: the Czarina. This sequence does not confuse the scene's geography as badly as does the example sequence from The Birds, since in the long-shot the viewer can directly recover the spatial relation of Kurbsky and what he glanced at. However, the look of outward regard implies a subjective-shot, and the fact that a subjective-shot does not follow renders the sequence somewhat unfilmic. Another example of (F) comes from Fellini's Nights of Cabiria. Friends take Cabiria for a ride in an automobile and drop her off in a high-class part of Rome. Cabiria walks about, staring at the unfamiliar surroundings, and then casts a glance out of the frame. In the following shot, a very well-dressed lady turns and looks directly at the camera. The viewer assumes (Principle F) that the lady is looking around at Cabiria, who probably stands out in this part of town. However, in the next shot Cabiria is shown to be in front of the lady: when the lady turned around to look at the camera she actually looked away from Cabiria! (Fellini also does not include a subjective-shot from the lady's viewpoint after her look of outward regard.) Thus, instead of revealing what Cabiria looks at with a subjective-shot, Fellini shows the person she looks at looking away; in neither case does he provide the subjective-shot. Accordingly, this sequence is unfilmic. (From the cinematic viewpoint, of course, Fellini has served his goal of making the camera an assertive personality in the action of the sequence.) In John Ford's How Green Was My Valley we find another example of Principle (F). Walter Pidgeon confronts Maureen O'Hara at the chapel door after she has stormed out of the church disgusted by the hypocritical deacons.
The first shot is a medium-shot from the chapel door; the two talk. Suddenly, O'Hara turns and runs away. Pidgeon, left behind, turns to look after her. This look of outward regard anticipates a subjective-shot. However, the next shot is a three-quarters profile-shot, with Pidgeon screen-left and O'Hara screen-right. Since in the second shot Pidgeon is partially masked by a tree in the background, the cut is particularly jarring and unclear.
As a final example, consider a scene from D.W. Griffith's Way Down East (other work of Griffith provides many similar examples). Griffith often used the look of outward regard to represent the emotional state of self-reflection. For instance, in the shot that introduces Richard Barthelmess in Way Down East, he is fetching water from a well. During the shot, he pauses and stares intently off-screen, his eyebrows furrowed. The modern viewer expects a cut revealing what has caught Barthelmess's attention, but there is no cut: Barthelmess is simply lost in thought. Perhaps in Griffith's day these constructions were perfectly filmic; for modern viewers, however, they are not (Principle F).
Finally, consider Principle (G). An extreme example of a violation of this principle is the entire first half of the film Dark Passage. Halfway through the film Bogart has massive plastic surgery; however, the viewer never sees his face before the surgery. Thus, throughout the first half of the film we see disembodied arms turning on a phonograph, fumbling for a cigarette, paging through a photograph album, opening drawers, etc. In order to make all this seem reasonable, that is, in order to stay the question of why we do not see more than the arms and hands, all of these sequences were composed of subjective-shots. Thus, when Bogart turns on the phonograph, the camera has his viewpoint and hence of course records only the arms and hands. All of these sequences are, however, unfilmic. Another example comes from the film A Farewell to Arms. In the very last scene of this film director Frank Borzage has Helen Hayes kiss the hero Gary Cooper. But he arranged the shot as subjective from Cooper's viewpoint. Hence, Hayes kisses the camera. Even though Cooper (the camera viewpoint) is relatively passive with respect to Hayes in this kiss, it seems that this is still too active a role for the subjective viewpoint to play in the action of the sequence. A more pointed case is from Hitchcock's Spellbound. At the end of the film Ingrid Bergman exposes Leo G. Carroll as the evil force in the plot.
As Bergman begins to leave the room, there is a subjective-shot from Carroll's viewpoint. He slowly raises a revolver. As he does so, the camera pivots — reflecting his changing aim. Thus, the viewer sees a disembodied arm moving through space with the camera pivoting after it. The sequence is unfilmic. Hitchcock employs another unfilmic construction in The Man Who Knew Too Much when he has Peter Lorre slap Leslie Banks in a subjective-shot from Lorre's viewpoint.
Hitchcock uses a similar arrangement in The Paradine Case. In one sequence, an arm reaches up from the subjective viewpoint to caress Gregory Peck. The sequence is unfilmic. In another sequence, Hitchcock represents two lovers coming together by opposing two subjective-shots: first, the man viewing the woman in a medium to long-shot, then the woman viewing the man in a close to medium-shot, then the man viewing the woman in a closer shot, etc. Again the sequence is unfilmic. A somewhat marginal case comes from Dreyer's Vampyr. In the dream sequence, in which David Grey looks out of his own coffin, we have a subjective-shot, straight up and jerking, to simulate the carrying motions. The sequence seems to be unfilmic, but this could be due to the relatively poor technical execution of the sequence.
To summarize, there does seem to be quite convincing evidence for Principle (F) and at least suggestive evidence for Principle (G). The latter principle really needs to be further clarified: perhaps what really cannot be tolerated in subjective-shots is motion and not involvement in the scene's action per se. This would explain all of the examples except the one from A Farewell to Arms. What we would really want to do would be to contrast subjective-shots with nearly subjective viewpoints, and see what filmicity effects hold up.16 For example, in the film Harper there is a sequence which is not subjective in which a man holds out a phone to Janet Leigh. She refuses to take the phone call and there is a protracted shot during which the phone is being held by a disembodied arm. The scene seems perfectly filmic, indicating perhaps that it is the subjectivity of the sequences in our earlier examples that renders them unfilmic.
NOTES
1. For example, when we discuss acceptability and filmicity (see below), we would in all likelihood find a far more diverse selection of unfilmic data in a corpus of student films, or rough-cut versions of commercial narratives.
The problem here would be that the data would not be easily available for inspection. Also in this regard, see Note 16 below.
2. See Gross and Lentin (1971), Minsky (1967), and Wall (1972) for further discussion of the theory of formal grammars.
3. Another term often used in discussions of cinema to refer to the same class of unit as Metz's segment and my scene is the term sequence. In the present discussion, sequence will be reintroduced in Chapter 5 as a technical term.
4. I note again, however, that Metz certainly failed to make clear the significance of any generalizations drawn by the grande syntagmatique taxonomy. See Chapter 3 for discussion.
5. In fact, it will be argued in Chapter 5 that no phrase structure approach can express certain facts about scene structure that seem crucial. However, phrase structure grammar continues to play an important role in the version of transformational grammar that is developed in Chapters 5 and 6. The present discussion, even more so than the discussions in the two later chapters, should be viewed as primarily directed at the task of exploring questions at the foundation of cinema grammar, rather than championing any particular candidate grammar, or narrowed class of grammars.
6. By this I mean that we are not likely ever to build a recognition device for scene units that is sensitive only to the physical properties that comprise scenes. But, as we observed in Chapter 2, there is no important consequence of this, except perhaps the recommendation that we give up on the logical positivism program. We are certainly not driven to unsystematic subjectivity. Indeed, the grammar we construct partially objectivizes the definition of 'scene'.
7. We are assuming that a coherent description of scene need not include complementary music, background noise, speech and other sounds correlated with expressed or implied actions, etc. We envision a theory limited to cinematographies. This is typical of the sort of pretheoretical decomposition and idealization that organizes scientific inquiry. For example, syntactic structure is typically investigated as separate from speaking contexts, questions of acoustics and articulation, phonology, etc. However, n.b., we are not claiming that the scene is like the sentence in any theoretical sense: they are corresponding levels of structure in two parallel but quite separate investigations.
8. The asterisk convention used in this rule has only a notational significance. Notice that the same generalization regarding scene structure could be stated with the two rules below:
S ==> L + D
D ==> D + D
The first rule describes the possibility that a scene can consist of a long-shot followed by a detail-shot. The second rule describes the possibility that a detail-shot can consist of two detail-shots in sequence. Of course, the two detail-shots generated by the second rule may themselves be derived into a sequence of two detail-shots. Thus, these two rules allow the possibility that scenes can consist of a long-shot followed by an arbitrary number of detail-shots, though at least one. This is precisely the effect of the rule cited in the text.
9. It seems that the flashback sequences that can be inserted within scenes can themselves often be more extensive structures than scenes. In the film Casablanca when Bergman confronts Bogart after hours at Rick's Cafe, they recall their affair in Paris through an extended flashback. This sort of structure lies beyond the scope of the present rule.
10. In this vein, note that the structural description in Figure 2 makes no mention of right or left turns, etc. The structure in Figure 2 is also an abstract description of structure. These remarks also apply to syntactic structures like Figure 4 (below in text). The graphic symbols which are the terminal nodes of this structure are considerably abstracted from their own acoustic realizations as spoken words.
11. For example, the grammar given in the text generates the sentence 'The ball took the man' (which it probably should not), and fails to generate sentences like 'The man I like is absent', which it probably should.
12. Eisenstein suggests that the basis of this intuition of unacceptability is perceptual (1942: 31). The argument has been made many times in recent linguistic research that intuitions of acceptability are sometimes due to functional interactions of the separate mental systems of grammar and perception (e.g. Bever, Carroll, and Hurtig 1976). In Chapter 8, we will develop analogous proposals regarding the analysis of some acceptability judgments in cinema.
13. In Chapter 8 we will speculate on the relation between grammatical descriptions (which are determined in part by acceptability judgments) and aesthetic qualities. Therefore, we would certainly not want to say that there is no relation between grammatical structure (as determined in part by acceptability) and aesthetic properties. This relation is both complex and indirect, and never as simple an equation as:
acceptable = grammatical = good
or
acceptable = grammatical = bad
In Chapter 3 we took Metz to task for this kind of oversimplification: it is rarely the case that unacceptability causally implies lack of artistic value.
14. We have acknowledged that, like linguistic judgments of acceptability, judgments of filmicity are absolutely subjective. It is therefore impossible to objectively describe or rigorously define the distinction between cinematic and filmic. Intuitions of sequence well-formedness can only bear on gross subjective impressions. Without prior cognizance of a theory of film stylistics and aesthetics (i.e. cinematicity) and a theory of film structure (i.e. filmicity), a viewer could hardly be expected to consistently separate the two in his subjective experience. Insofar as a film spectator can accept the filmic-cinematic distinction we must conclude that he has a tacit theory of these terms, i.e. they are aspects of his knowledge. The generalities which do obtain in these judgments of filmicity represent an enormously important, unexpected, source of insight into the nature of cognitive and perceptual competences which the viewer brings to the film experience.
15. The apparent motion illusion upon which cinema is based fundamentally depends on the fact that the human eye cannot resolve very minute, and rapid, contour displacements. Thus, we see motion in cinema as smooth and continuous, although physically it consists of rapid but completely discrete transitions between succeeding frames. Accordingly, when it is intended that a discrete change be perceived by the viewer as such, the filmmaker must take care that the transition is not (mis)perceived as smooth motion. For related discussion see Hochberg (1964; 1970).
16. Clearly, one way to elaborate the analysis developed here is to conduct actual experimental studies of filmicity phenomena: scene sets could be constructed that are almost completely identical in both form and content, differing only in a single detail of structure. Differences in obtained filmicity ratings could then be attributed directly to this detail of scene structure. Such an approach allows for a more subtle analysis of filmicity than can be achieved by inspection of existent narrative films, although it forces the investigation to base itself on artificial and possibly unnatural scene exemplars.
Transformational-Generative Cinema Grammar
1. INADEQUACIES OF PHRASE STRUCTURE GRAMMAR
Phrase structure grammar can in principle provide an account of filmicity facts; and in this sense it can provide a grammar of the structure of cinema scenes. A phrase structure grammar as simple as our two-rule example grammar in the last chapter can potentially enumerate an unlimited set of scene structures. Phrase structure grammar allows a far more comprehensive description of scene structure than a taxonomy with a small finite number of classes. However, as we shall see presently, there are many important facts about the structure of cinema scenes that go beyond considerations of filmicity and that cannot in principle be described by phrase structure grammar. An examination of these facts leads us to adopt a richer grammatical formalism, that of transformational-generative grammar. In this section we will examine some of the inadequacies of phrase structure description. In section 2, we will develop a transformational-generative approach to cinema grammar. And in section 3, we will comment briefly on some of the implications of this approach.
1.1. Proper constituencies
One of the strongest sorts of intuitive knowledge people have about sentences and cinema scenes can be called intuitions of constituency. In the sentence 'The man hit the ball', we feel that the first 'the' and the 'man' belong together in some sense, as do the words 'hit the ball'. These intuitions are accounted for in the phrase structure description of this sentence in Figure 4: 'the man' is isolated as the subject noun phrase (NP) group, and the other words are grouped as the verb phrase (VP). Analogously, our phrase structure grammar for cinema scenes isolates and groups together the shots comprising the embedded flashback in Figure 3. However, phrase structure grammar fails in many cases to reveal all aspects of the constituent structure of sentences and scenes in the structures it provides. Consider so-called respectively constructions, as exemplified by the sentence 'John, Harry, and Francine sang, danced, and read a book, respectively'. A phrase structure description for this sentence appears in Figure 7. The sentence seems to contain the separate clauses 'John sang', 'Harry danced', and 'Francine read a book'. However, these clausal constituents are discontinuous in the actual sentence sequence, and are not represented as coherent units of structure in the phrase structure description of Figure 7. Thus, phrase structure description cannot adequately render our constituency intuitions for such sentences.
Structures somewhat analogous to respectively constructions occur in cinema. Consider, for example, the device of parallel cutting, first introduced by Griffith. In a parallel-cutting arrangement, a film traces the simultaneous activities of several characters or groups of characters. In a typical western, one might reveal the stranded and besieged folks from the wagon train, holding off thousands of Indians, and periodically cut to shots of the cavalry's hurried charge. Typically, just before the two separate trains of activity are joined (in this case just as the cavalry is about to appear from behind a hill), the cutting rate is increased. Griffith found that such a structure heightens tension in the viewer. For present purposes, observe that the separate activities of the cavalry and the wagon train are coherent wholes for the viewer.
Even though they overlap the viewer regards them as integral constituents. However, phrase structure descriptions, like Figure 4, cannot recognize this constituency.
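The discontinuity at issue can be made concrete with a small sketch. What follows is only an illustration under stated assumptions: the shot list, the thread labels, and the function names are hypothetical and are not drawn from any actual film or from the grammar developed in this study. The sketch groups the shots of a parallel-cut sequence by the activity they belong to and checks whether each group occupies an unbroken span of the sequence, which is the only kind of grouping a phrase structure tree over the shot sequence can mark as a constituent.

```python
from collections import defaultdict

# Hypothetical parallel-cut sequence: (activity label, shot description).
# Labels and descriptions are invented for illustration only.
shots = [
    ("wagon_train", "settlers fire from behind the wagons"),
    ("cavalry", "the cavalry rides out"),
    ("wagon_train", "ammunition runs low"),
    ("cavalry", "the cavalry nears the hill"),
    ("wagon_train", "the attackers close in"),
]

def constituents(shot_sequence):
    """Map each activity to the screen positions of its shots."""
    groups = defaultdict(list)
    for position, (activity, _description) in enumerate(shot_sequence):
        groups[activity].append(position)
    return dict(groups)

def is_contiguous(positions):
    """True if the positions form one unbroken span of the sequence."""
    return positions == list(range(positions[0], positions[-1] + 1))

groups = constituents(shots)
for activity, positions in groups.items():
    print(activity, positions, "contiguous" if is_contiguous(positions) else "discontinuous")
```

In this toy sequence both activities come out discontinuous, so neither can be a single node dominating adjacent shots, even though each is intuitively one coherent unit for the viewer.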
Figure 7.
In citing these examples from cinema and language, we do not intend to claim that phrase structure grammars, such as those we have considered, provide no accounting of constituency facts; rather we argue that such grammatical descriptions cannot account for all constituency facts. In this sense, phrase structure grammars do not render the proper constituency representations for sentences and scenes.
Another type of sequence for which phrase structure grammar fails to provide proper constituent structures involves what we will call deletion. What deletion is can best be seen by means of an example (due to Ross 1970 [1967]): Joe ate rice and Harry fish. The phrase structure description for this sentence appears in Figure 8.
[Figure 8. Phrase structure description of 'Joe ate rice and Harry fish.']
Again, it seems that the actual sentence contains separate clauses: 'Joe ate rice' and 'Harry ate fish'. But the phrase structure description fails to represent this.
And again, we can identify arrangements of cinema scenes that parallel the example sentence in relevant ways. Often, films reveal a character doing something, and then cut to a shot of another character doing that same thing. For example, a young recruit approaches the quartermaster's counter and receives his combat boots; then another does it, then another, etc. This sort of scene often involves deletion: it is not necessary to show the entire episode for the second soldier. The film may simply show that the second soldier approaches the counter, and let the viewer make the inference that he too receives boots. The viewer does indeed make this inference, and the fact that the second soldier received boots is a real part of the scene's structure. However, a phrase structure grammar will not reveal this, and is to that extent further demonstrated to be inadequate.1
1.2. Ambiguity and paraphrase
In addition to the problems of constituency that we have just considered, phrase structure descriptions are unable to characterize certain important relations that scenes and sentences enter into. One of these is paraphrase. The sentences below are synonymous; they are paraphrases of one another.
The man hit the ball.
The ball was hit by the man.
This relation is intuitively apparent to speakers of English; it is part of the basic knowledge one has when one knows the language. We have already considered the phrase structure of the first sentence; it is given in Figure 4. Compare this structure with the phrase structure assigned to the second sentence, as sketched in Figure 9. The descriptions assigned to these two sentences by phrase structure grammar are completely distinct. There is no way to tell that the two sentences corresponding to these structures are synonymous. These phrase structure representations fail to relate the two paraphrases. As we noted in Chapter 1, cinema also displays the relation of paraphrase.
A scene consisting first of a close-shot of a man followed by a more encompassing long-shot of a man hitting a ball seems to be a paraphrase of a scene consisting first
Figure 9.
of a close-shot of a ball, followed by a long-shot of a man hitting the ball. It is not crucial that we analyze this pair of scenes as being strictly analogous to the phenomenon of inversion of subject and object nouns in our sentences. The two scenes are paraphrases of one another.
Examples of paraphrase can be cited from existent films as well. Balázs (1970 [1945]: 91) provides an interesting example of the use of paraphrase. He discusses the phenomenon of déjà vu and notes that in cinema, as in ordinary life experience, repetition can evoke memory. He describes a scene from the film Narcosis which he made along with Alfred Abel. In the film a man and a woman encounter one another after many years of separation. Time has changed them both greatly, and the man does not even recognize the woman. In the film, the woman arranges a meeting staged to be identical to a meeting between them of many years past. She seats the man in the same chair, arranged in front of the same fireplace, and seats herself in the same position, her face illuminated by the fire just as it was so long before. (Of course, the man then recognizes her.) Thus, Abel and Balázs repeat the event of the prior scene with the same participants; only the time of the scene is changed.
Filmmakers sometimes repeat sequences for emphasis (Pudovkin 1958 [1929]: 103), and this provides us with a fund of synonym pairs. There is a famous example of synonymic repetition in Eisenstein's film Potemkin: the scene in which the ship's doctor is thrown overboard. Eisenstein also used repetitions in October. Wiene used repetition in The Cabinet of Doctor Caligari, when the somnambulist, Cesare, arrives at the heroine's bedroom window. In a long-shot, he stands up at the window, framed in the light, and leers at the girl he is about to kidnap. However, in the following medium-shot, he performs the same action; repeating the action in the tighter medium-shot intensifies the viewer's empathy with the helpless girl.
In more contemporary work, repetition has also been exploited as a formal device. In The Mother and the Whore, Eustache shows the character Veronika lighting a cigarette twice, in each of two adjoining shots. Godard engaged in this sort of synonymy in Les Carabiniers, Resnais in Je t'aime, je t'aime, Last Year at Marienbad, and Stavisky. Consider this example, drawn from Stavisky: the Spanish arms buyer and Stavisky's wife are on the
golf-course. In a long-shot the viewer sees them talking. Immediately, the scene is repeated in a medium-shot. The repetition is visually compelling since the viewer does not know for sure that the medium-shot is actually synonymous with its predecessor until the woman takes her sweater from the man with exactly the same gesture.
There seem to be different degrees of synonymy that can obtain between cinema scenes. For instance, consider all of the lovemaking scenes ever filmed. Clearly, a great many of these scenes represent the same basic event. Now consider the same cinema scene (any scene at all) acted out by different sets of actors or in different places. The underlying events may be more than just similar, they might well be identical; only the personages and the background scenery differ. Finally, consider the case of the same event filmed from two camera viewpoints, or filmed from the same camera viewpoint but edited differently. In this example, everything but camera position and/or montage (the actors, the props) is absolutely identical (Balázs 1970 [1945]: 32, 94, 165-166). We can easily construct such an example. Recall the final scene of Casablanca, the scene in which Bergman and Bogart say goodbye at the airport. We see various shots of Bergman, her husband, and Bogart, as Bogart reveals his decision to stay in Casablanca. Bergman and the husband then walk off toward the waiting airplane and freedom. As they walk off to the plane (and into the camera) we do not see Bogart behind them, just dense fog. In the next shot, we see Bogart from the back, watching the plane take off. But imagine a scene in which Bogart is visible in the shot of Bergman walking off with her husband. In this synonymous version, Bogart is no longer the self-sacrificing man-of-steel, who takes their three fates in his hands and decides unselfishly. Now he is the man left behind in the fog, the unrequited lover, the soon-to-be-forgotten memory of the past.
As we noted above, the cinema grammar envisioned here addresses only cinematographies. The massive difference in affective signification between these two synonymous scenes is beyond the scope of the present theory. Just as our phrase structure grammar for English could not represent the paraphrase relations of the active and passive sentences, a phrase structure grammar of cinema cannot represent the
synonymy of scenes. Insofar as we wish the descriptions generated by grammar to account for relations like paraphrase, phrase structure grammar will be inadequate. Paraphrase, however, is not the only such relation, and hence the inadequacy of phrase structure grammar is even more acute. Ambiguity provides further problems for phrase structure description. Sentences that are ambiguous have two (or more) separate meanings: The duck is ready to eat. The separate meanings for this ambiguous sentence are indicated below, The duck is ready to be fed. The duck is ready to be eaten. The structure for the example sentence appears in Figure 10.
[Figure 10. Phrase structure description of 'The duck is ready to eat.']
Phrase structure grammar assigns only one structure to sentences like our example, and thus treats them just as sentences like 'The man hit the ball'. Accordingly, phrase structure grammar fails to represent the fact of ambiguity.
We can imagine several ways in which a scene might be called ambiguous. First, an object in the scene could be ambiguous. For example, we see an actor throw something. It is small and round and reddish in color. Did he throw an apple or a bomb? Or again, we see someone enter the room and commit the murder. Was it the butler or the duke? Balázs (1970 [1945]: 110) notes that this sort of ambiguous presentation is typically employed when the character of Jesus Christ appears in a film. One class of such examples involves the use of mirrors. In Charles Chaplin's The Circus, there is a scene in a circus funhouse in which the viewer seems to lose track of which image is the real Charlie and which is the reflection. Orson Welles used mirrors in just this way in his Lady From Shanghai.
A second potential variety of ambiguity could involve actions. For example, we see some sequence of motions performed on the screen, but we cannot determine which of two events these motions represent. Possible examples of this also come from Chaplin's work. In The Idle Class, the viewer sees Chaplin with his back turned to the camera, heaving and shuddering. Is he sobbing at his wife's refusal to forgive him? No, for when he subsequently turns around we see that he has been shaking a drink, apparently indifferent to whether his wife forgives him or not. A similar arrangement was used in Chaplin's The Immigrant: here we see him leaning over the rail of a rolling boat, legs flailing. He is not experiencing an attack of sea sickness, as we might suppose, but rather he is fishing. In both examples, the effectiveness of the scene resides in the fact that it is the less likely of two interpretations of a scene that is in fact confirmed by subsequent material.
Another example can be cited from the film The Shakiest Gun in the West. In one scene, Barbara Rhoades and Don Knotts are defending themselves against an Indian attack. Both are firing away. The typical western cutting arrangement here would be to show one of the two fire off a shot, and then to cut to an Indian falling off his horse. In the actual film, however, a shot of Knotts firing is followed immediately by a shot of Rhoades firing, and only
then is there a shot of an Indian falling from a horse. The viewer cannot determine from this which of the two in fact fired the fatal bullet. (Indeed, one does find this out quite a bit later in the film.) Phrase structure grammar, as we have noted, may assign only one description to such scenes and thus fail to adequately represent their structure.2
1.3. Descriptive adequacy
Faced with the facts considered in the previous two sections, what can we do? One alternative is to be satisfied with accounting for filmicity alone. There has never been an explicit theory of filmicity, and a grammar that generated all and only the filmic scene structures would certainly tell us much about the knowledge people must have in order to phenomenally make this distinction. Phrase structure grammar can potentially give us such a theory of scene structure. Another alternative is to augment our grammatical formalism, to part company dramatically with taxonomic and phrase structure approaches. This latter alternative is the one we shall proceed with.
The criterion of observational adequacy only addresses the nature of the output of the grammar. If the grammar generates all and only scene structures corresponding to filmic scenes, it is observationally adequate. If we consider how the grammar enumerates the set of scene structures it enumerates, we can construct a stricter criterion for adequacy, and hence better ensure that an adequate grammar is a more reasonable theory of the knowledge people have about the structure of cinema scenes. Chomsky has called such a criterion descriptive adequacy: a linguistic grammar is descriptively adequate if it generates all and only the acceptable sentences of the language with their correct structural descriptions (Chomsky 1965: 26-27). But what are correct structural descriptions? They are structural descriptions that capture generalizations about paraphrase, ambiguity, proper constituency, and the like. Thus, the correct structural descriptions for two synonymous scenes or sentences must show that the two are related and how. They must characterize not only the fact of synonymy, but also the differing degrees or types of synonymy that inhere in the experience of viewing cinema. They must mark not only the similarity of synonymous scenes, but also that which differentiates them from each other (Eisenstein 1942: 11). The correct structural description for an ambiguous scene or sentence must show that it has two meanings, and must indicate what they are. Finally, the structural description must represent the proper constituencies of the scene or sentence.3
A descriptively adequate cinema grammar must account for filmicity, paraphrase, ambiguity, and constituency, and whatever other properties and relations viewers take to be systematic and important. It is important to bear in mind, however, that other sorts of basic properties and relations may need to be defined in order to attain criteria strict enough for evaluating candidate grammars. These additional criteria may, of course, be quite unlike any used in linguistics itself. However, in the present study we will be focusing only on facts pertaining to these.
2. TRANSFORMATIONAL-GENERATIVE GRAMMAR
Transformational-generative grammar has developed in linguistics as a means of addressing the considerations of descriptive adequacy. A transformational-generative grammar generates two phrase structures for a sentence: one is referred to as the sentence's deep structure, and the other as the sentence's surface structure. These two levels of structure are related by transformational rules. Synonymy, ambiguity, and issues of proper constituency are described by relationships between these two levels of structure. A pair of synonymous sentences share a common deep structure, but differ in their surface structure. Conversely, an ambiguous scene has two distinct deep structures but only a single surface structure. Problems of constituency are addressed by allowing sentences to have differing constituent structures at the deep and surface levels.
2.1. Linguistic transformational-generative grammar
A transformational-generative grammar, or TGG, consists of several components. There is a Base Component, which is a
phrase structure grammar and which generates deep structures in a manner not unlike the example phrase structure grammars we considered in Chapter 4. Additionally, there is a Transformational Component, which transforms these deep structures into surface phrase structures. Deep structures are related to meanings by a set of semantic interpretation rules (Katz and Fodor 1963; Katz 1972), and surface structures are related to manifest (i.e., acoustical) utterances by a set of phonological interpretation rules (Chomsky and Halle 1968; Schane 1973). The grammar is schematically blocked out in Figure 11.

Sound
| Phonological Interpretation Rules
Surface Structure
| Transformational Rules
Deep (Base) Structure
| Semantic Interpretation Rules
Meaning

Figure 11.

In order to better understand the type of description produced by transformational-generative grammar, let us return briefly to our earlier example sentences.

The man hit the ball.
The ball was hit by the man.

Both of these sentences have the same deep structure. Simplifying a bit, this structure is given in Figure 4. The relevant rules of
the Base Component are those given in discussion of that example.

S ==> NP + VP
NP ==> Det + N
VP ==> V + NP
Det ==> the
N ==> man, ball, etc.
V ==> hit, etc.

What differentiates these two sentences is their transformational derivations. The active sentence 'The man hit the ball' has only a trivial transformational derivation. For our purposes, its deep structure and its surface structure are identical (Figure 4).4 The passive version of the sentence, however, undergoes the Passive transformation:

Passive Rule: NP1 + V + NP2 ==> NP2 + was + V + by + NP1

When this rule is applied to the deep structure in Figure 4, it produces the structure in Figure 9 above. This structure (again simplifying somewhat) is the surface phrase structure of the passive sentence 'The ball was hit by the man'. The transformational-generative derivation of our active and passive sentences accounts for the paraphrase relation between them. Both sentences share a common deep structure; in this way the grammar theoretically reconstructs the phenomenal sameness of the sentences. However, the two sentences have distinct transformational derivations and surface structures. In this way the grammar theoretically reconstructs the fact that synonymous sentences are different too. For an ambiguous sentence, we would have two (or more) different deep phrase structures, each of which could be derived transformationally into the same surface structure. The multiple deep structures would correspond to the multiple separate meanings that speakers understand ambiguous sentences to have.
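The derivation just described can be made concrete with a small sketch. It assumes trees encoded as nested tuples; the function names and the 'Aux' and 'PP' labels on the passive output are illustrative assumptions, not part of the grammar as stated.

```python
# Trees are nested tuples: (label, child, child, ...); leaves are plain strings.

def noun_phrase(noun):
    # NP ==> Det + N, with Det ==> the
    return ("NP", ("Det", "the"), ("N", noun))

def deep_structure(subject, verb, obj):
    # S ==> NP + VP and VP ==> V + NP
    return ("S", noun_phrase(subject), ("VP", ("V", verb), noun_phrase(obj)))

def terminal_string(tree):
    """Read the words off a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(terminal_string(child))
    return words

def passive(deep):
    """Passive Rule: NP1 + V + NP2 ==> NP2 + was + V + by + NP1.

    The 'Aux' and 'PP' node labels are assumptions of this sketch.
    """
    _, np1, (_, verb, np2) = deep
    return ("S", np2, ("VP", ("Aux", "was"), verb, ("PP", ("P", "by"), np1)))

deep = deep_structure("man", "hit", "ball")
active_surface = deep              # trivial derivation: surface equals deep
passive_surface = passive(deep)    # one application of the Passive Rule

print(" ".join(terminal_string(active_surface)))
print(" ".join(terminal_string(passive_surface)))
```

Both surface strings are derived from the single object `deep`, which is the sense in which the two sentences share a deep structure while differing in derivation.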
2.2. Methodological preliminaries

As we have noted, some of the limitations of phrase structure descriptions of language are also apparent in considering cinema. What we would like, then, is to develop a
transformational-generative approach to cinema grammar.5 This approach, if successful, would allow us to confront problems like ambiguity and paraphrase in ways analogous to linguistic TGG description. Our goal is to construct a grammatical formalism that can potentially frame a descriptively adequate grammar of scenes. We do not have to look far to find two significant levels of scene structure that appear to share a relation analogous to that of linguistic deep and surface structure. Cinema scenes represent visual events, things that really happen in a three-dimensional world. These events are realized as a sequence of long-shots, detailed close-ups, profile-shots, etc. The grammar, then, relates each event to its corresponding scene or scenes. In fact, what we have been calling synonymy, or paraphrase, seems to be precisely the case of two differing scenes representing the same visual event. A grammar that generated an event structure and from it transformationally derived a sequence structure could in principle account for the phenomenon of synonymy understood in this way. Other aspects of descriptive adequacy also seem to develop naturally from this proposal. An ambiguous scene would have multiple distinct event structures. However, its transformational derivations would converge on a single sequence structure. The problem of proper constituencies also seems tractable on this approach. Consider our example of the parallel action scene involving the wagon train and the cavalry. We would like to show that all of the material dealing with the cavalry's advance forms a coherent unit, even though this material is interspersed with the wagon train material in the actual scene. In a transformational-generative approach, we would separately generate these two event structures: the cavalry advances, the wagon train is attacked.
The two event structures are then transformationally combined to obtain a single sequence structure.6 Such a treatment accounts both for the fact that the scene itself is a coherent unit, and for the fact that the scene consists of two interspersed coherent units. The components of a TGG of cinema, which are by design rather parallel to those of linguistic TGG, are blocked out in Figure 12.7
Film
| "Photographic Interpretation Rules"
Sequence Structure
| Transformations
Event Structure
| Semantic Interpretation Rules
Meaning

Figure 12.

Without a doubt, such a formal device is ambitious. A linguistic grammar must enumerate the class of all possible propositional deep structures and then formally combine and transform them into surface structure representations. The grammar of the cinema scene must enumerate the class of all visual events and then transform them (i.e. frame, focus, organize them into moving and static shots, etc.) into the sequence structures of cinema scenes. Before actually launching the project of such a cinema grammar, it is important to say a few words about the scope of what we will be doing. We will not be presenting a comprehensive theory of cinema, or even a complete grammar of cinema scenes. And the reader should not expect to find this. Nevertheless, there is no reason to doubt that the analysis outlined can be greatly extended. Of the many and important mise-en-scene devices available to the filmmaker, we shall concern ourselves primarily with cuts. Certainly, this simplification does not intend to undervalue other mise-en-scene techniques, like pans or dissolves or
tracks. This important extension of the present work is left for future research. We shall also ignore many of the nuances of scene structure, such as slight variations in camera angle and camera distance. For some purposes, of course, it is of interest to distinguish between a camera angle of 45° from the horizontal, and one of 45.001° from the horizontal. In the present work, we shall limit ourselves to the grosser categories of 15° increments. Similarly, one can specify the dimension of camera distance as '5 feet from the object', or '5.001 feet from the object', or simply say 'a close-up'. We will adopt the latter level of precision. Again, the choices are heuristic, not theoretical. There is no intent to devalue camera distance or camera angle as dimensions of scene structure. Consider the analogy in linguistics: there is a sense in which there are an infinite number of ways to say the word 'table'; variations in pitch, intonation, stress, etc. will create an unlimited diversity. From the viewpoint of acoustic phonetics, this is of great interest. However, from the viewpoint of more molar levels, like the level of syntax, the word 'table' is a single element. From the perspective of syntax, the multitude of utterances of 'table' forms a category, and syntactic analysis does not further partition this category. This sort of idealization serves every science, and we shall employ it in simplifying the diversity in event and sequence structures. The final arbiter of whether or not our idealizations are justified is the extent to which we are able to shed light on the structure of cinema scenes and, in particular, to construct a descriptively adequate grammar of the scene.
2.3. An illustrative fragment of the grammar

In this section, we will develop the basic mechanics of a TGG for scenes. We will discuss a few further rules in Chapters 6 and 8, but this presentation contains the core of the grammar. In the first subsection, we will develop the phrase structure component which generates event structures. In the second subsection, we will present some transformational rules that derive sequence structures from these event structures.
2.3.1. Phrase structure component: event structure.

Recent work in the analysis of narrative cinema has often distinguished a 'level' of events, or action sequences (Bettetini 1973; Eco 1967; Metz 1974; Mitry 1967; Pasolini 1965; see also Barthes 1974). The present theory also distinguishes the level of events.8 The phrase structure generation of an event is initialized by the rule:

E ==> N + Sq

The rule can be read: 'Event' consists of 'Nominals' adjoined to 'Action Sequence'. (Thus, E = 'Event', N = 'Nominals', and Sq = 'Sequence'.) The rules that elaborate the N node develop a description of the actors and background props of the scene. The transformational component of the grammar then organizes and distributes this geography with respect to the derived sequence of 'shots'. Some of the geography and personages of the event are revealed in long-shots, some in moving camera shots, etc. We will consider the nature of the N node and the manipulations that the grammar performs on the N node in Chapter 6. (See McCawley 1968, for the analogous basis of this analysis in linguistic grammar.) The rules that elaborate the Sq node develop a description of the action sequence of the scene. The phrase structure generation of the Sq node begins with the rule:

Sq ==> A*

The symbol Sq is rewritten as a sequence of A nodes; an action sequence becomes a sequence of actions. There are several ways in which A nodes can become elaborated. First, an A node can be rewritten as a sequence of A nodes:

A ==> A*

But some actions have a distinct structure of preparation and completion (e.g. a pitcher's wind-up prior to the release of the ball). A nodes can also be expanded by this rule:

A ==> P + F

This rule allows the node type A to be rewritten as 'Preparatory Action' plus 'Focal Action'. Preparatory and focal actions themselves are rewritten as A nodes:

P ==> A*
F ==> A*
This idea might seem a little odd at first, but it allows us to characterize the notions 'preparatory' and 'focal' at several levels of phrase structure. For example, the approach of the thugs is a preparatory component of the hero's beating, but the flexing of the thug's arm is, at a finer level of structure, a preparatory component of the thug's action of punching the hero. Similarly, the preparatory component of taking a step plus the focal component of walking appear both to be components of the approach of the thugs. By allowing the nodes P and F to embed actions we might be able to capture the generalities between preparatory and focal components at many different levels of event structure. Basic actions, that is, A nodes which do not themselves embed P, F, or A nodes, are expanded with the following rules:

A ==> Pred. + Arg.*
Pred. ==> wind-up, tell, throw, etc.
Arg. ==> pitcher, baseball, umpire, etc.

These rules specify that an action consists of a predicate plus its set of arguments.9 For example, in the case of a pitcher winding-up, we might have: wind-up + pitcher. The terminal symbols here (e.g. wind-up) are names which recognize and isolate the units of events. As Barthes (1974: 19) puts it, 'the sequence exists when and because it can be given a name, it unfolds as this process of naming takes place.' As indicated in section 2.2 above, it would certainly be possible to analyze the structure of actions more finely, describing the minute muscle twitches that comprise a pitcher's wind-up. However, this level of detail is inappropriate for our project. And in any case, there is every reason to believe that such descriptions are irrelevant to the way people actually conceive of actions.10 (For related approaches to this problem see Barker and Wright 1955; Dickman 1963; Newtson 1977.) The phrase structure rules we have just now enumerated will provide us with a fund of event structures as input to the transformational component of the grammar.
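As a rough illustration, the event-structure rules enumerated above can be written down as a well-formedness check over nested tuples. This is only a sketch under stated assumptions: the tuple encoding, the helper names, and the reading of A* as 'one or more A nodes' are choices made here for illustration, and the N node is left unelaborated.

```python
# Nodes are tuples: (label, child, child, ...).

def basic(pred, *args):
    # A ==> Pred. + Arg.*
    return ("A", ("Pred", pred), *[("Arg", a) for a in args])

def well_formed(node):
    """Check a node against the phrase structure rules for event structures."""
    label, children = node[0], node[1:]
    kinds = [child[0] for child in children]
    if label == "E":                      # E ==> N + Sq (N is not inspected)
        return kinds == ["N", "Sq"] and well_formed(children[1])
    if label in ("Sq", "P", "F"):         # Sq ==> A*, P ==> A*, F ==> A*
        return bool(children) and all(k == "A" for k in kinds) \
            and all(well_formed(c) for c in children)
    if label == "A":
        if kinds and kinds[0] == "Pred":  # basic action
            return all(k == "Arg" for k in kinds[1:])
        if kinds == ["P", "F"]:           # A ==> P + F
            return all(well_formed(c) for c in children)
        return bool(children) and all(k == "A" for k in kinds) \
            and all(well_formed(c) for c in children)   # A ==> A*
    return False

def predicates(node):
    """List the predicates of the basic actions, in left-to-right order."""
    if node[0] == "Pred":
        return [node[1]]
    if node[0] in ("Arg", "N"):
        return []
    found = []
    for child in node[1:]:
        found.extend(predicates(child))
    return found

pitch = ("A",
         ("P", basic("wind-up", "pitcher")),
         ("F", basic("throw", "pitcher", "baseball")))
event = ("E", ("N", "pitcher", "baseball"), ("Sq", pitch))

print(well_formed(event))
print(predicates(event))
```

A structure that places F before P, for instance, fails the check, since no rule rewrites A as F + P.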
However, given an event in the world, they do not tell us what its phrase structure will be like. In assigning event structures to actual events we rely on our intuitive knowledge about events, augmented by various operational procedures. In linguistics, one such sort of procedure is called a test of constituency (see Hudson 1967; Wells 1947). These tests of constituency are empirical criteria that are used as
guides for which linguistic elements should be grouped with which other linguistic elements in a structural representation of the sentence. In order to obtain a hierarchical phrase structure description for events we can devise some constituency criteria. For purposes of discussion, we will make use of two such tests (to be complete, we would need others as well): First, to the extent that an action cannot be coherently separated from its predecessor in an event sequence, the action and its predecessor are co-constituent. In order to operationalize this test, imagine inserting another scene (e.g. a flashback scene) between two action units. Insofar as this manipulation would render an unfilmic scene, the two actions are co-constituent. Second, to the extent that the same agent instigates two adjacent actions in a sequence, the actions are co-constituent.11 (We will make use of these principles below.) It must be kept in mind that these operational procedures are merely heuristics; they are not a part of the cinema grammar. They are used in order to systematize the assignment of event structures to events. They are justified only to the extent that the assignments lead to descriptively adequate accounts of the cinema scenes in question. Descriptive adequacy is the final standard.

2.3.2. Transformational component: sequence structure.

By means of a formal transformational derivation, the event structure is realized as the sequence representation of a cinema scene. Certain actions are rendered as long-shots, others as close-shots. Some actions are deleted, and therefore are not represented in the derived sequence structure. As the A node is the major subconstituent of an event, the S node is the major subconstituent of a sequence structure.
The following transformational rule carries A nodes (actions) into S nodes (shots):

Shot Rule: A ==> S

Two or more A nodes can, of course, be collapsed into a single shot.12 Accordingly, the Shot Rule can be restated as such:

A* ==> S

However, there are cases in which this statement of the Shot Rule does not seem to be correct. In particular, it has been argued that cuts are unfilmic when they coincide precisely with
changes in the action, that the viewer needs to have visual linking between successive shots (Balázs 1970 [1945]: 53). Balázs claims that each shot in a scene must divide actions so that an action begun or implied in one shot is fully revealed and completed in the following shot. An example comes from the Fellini film Nights of Cabiria. Early in the film there is a full medium-shot of Cabiria standing outside of her house. She turns abruptly and enters the house in a profile long-shot. The cut from medium- to long-shot coincides quite precisely with the abrupt thematic action change. And the scene is unfilmic, just as Balázs's principle would predict. In Balázs's terms, what we lack is a link between the action of the two shots. In the terms of the grammar, the cut between the two shots coincides with the boundary between two A nodes. Consider the action of Cabiria entering her house. This action seems to consist of a preparatory component, perhaps turning toward the door and reaching for the doorknob, and a focal component, pushing the door open while stepping into the house.13 If we elaborate Balázs's notion of linking in this way, we can say that a cut could obtain at the juncture between the preparatory and focal components (say, just after Cabiria turns toward the door, reaching) but not before the preparatory component (as it does in the existent film). In violating this restriction, the cut as it occurs in the film renders the scene ungrammatical. But not all actions consist of preparatory and focal components, and not all cuts coinciding with action boundaries render scenes unfilmic. Consider a subjective-shot scene, such as those discussed in connection with Principles (F) and (G) in Chapter 4. An actor looks out of the shot, there is a cut, and the next shot reveals what was looked at. The action of looking is not preparatory to whatever action is being looked at, and the cut is perfectly filmic.
In fact, when two adjacent actions have different agent arguments, the cut from one action to the other can occur right on the A node boundary. Two adjacent actions will have different agent arguments when the lowest node they are mutually dominated by is Sq. This leads to another restatement of the Shot Rule:

Shot Rule: X + A* + Y ==> X + S + Y
Condition: X and Y are either null or, with A*, are mutually dominated by no A node, that is, by no node lower than Sq.

We also want to provide rules for the case in which there is a preparatory/focal structure and the A nodes in question are dominated by a node lower than Sq. Such rules must break A nodes between the subconstituents P and F (Preparatory and Focal).

Related-Agent Shot Rule: X + P1 + F1 + Y + P2 + F2 + Z ==> X + P1 + S + F2 + Z
Conditions: (1) X and Y are not null. (2) F1 and F2 are mutually dominated by some A node.

An F node plus some unspecified material (the variable Y) plus a P node are rewritten as an S node. In this way, the shot boundaries derived do not coincide with the action boundaries precisely, but they do lawfully correspond to these action boundaries, as specified in the rule scheme. With this analysis in hand, perhaps we can return to the example from Nights of Cabiria. Recall that in the example the viewer first sees a full medium-shot of Cabiria, standing just outside her home. There is a cut and, immediately coinciding with the cut, she turns and enters the house in a profile long-shot. Apparently, the Shot Rule has applied when, according to its condition, it should not. The Related-Agent Shot Rule is the appropriate rule here. According to it, the cut can only obtain between the corresponding P and F nodes, that is, between the time Cabiria turns toward the door, and the time she enters the house. Thus far we have concerned ourselves with transformations that determine where in a sequence of actions a cut can be placed, rendering the actions as a sequence of shots. We also want to consider rules that delete or expand a part of the action and thus create different event and sequence structure constituencies (recall our example of the cavalry and the wagon train).
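The two shot rules stated above can be sketched as operations on sequences of node labels. This is a minimal illustration, not a full implementation: the function names and the tuple encoding of a shot are assumptions, and the dominance conditions on each rule are not modelled because the sketch carries no tree information.

```python
def shot_rule(x, actions, y):
    """Shot Rule: X + A* + Y ==> X + S + Y.

    A run of A nodes is collapsed into a single shot, so the shot's
    boundaries fall exactly on action boundaries.
    """
    return [*x, ("S", tuple(actions)), *y]

def related_agent_shot_rule(x, p1, f1, y, p2, f2, z):
    """Related-Agent Shot Rule:
    X + P1 + F1 + Y + P2 + F2 + Z ==> X + P1 + S + F2 + Z.

    The new shot spans F1 + Y + P2, so its edges fall between a
    preparatory component and its focal component.
    """
    shot = ("S", (f1, *y, p2))
    return [*x, p1, shot, f2, *z]

# Placeholder labels stand in for actual actions.
simple = shot_rule([], ["A1", "A2"], ["A3"])
linked = related_agent_shot_rule(["X"], "P1", "F1", ["Y"], "P2", "F2", ["Z"])

print(simple)
print(linked)
```

In the second output the shot begins after P1 and ends before F2, which is the sense in which shot boundaries correspond lawfully to action boundaries without coinciding with them.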
And we want to establish rules that specify what sort of shot can follow what sort of shot, how a given sequence of actions is to be realized: long-shot, profile-shot, close-up, etc. One type of deletion is exemplified by this sort of arrangement: toward the end of a scene (and of its corresponding event) the viewer sees an action begun; however, instead of seeing the entire action completed, there is a cut to an entirely new scene
(event). This technique is used especially to spare the viewer unpleasant details. Thus, if the hero is about to be beaten by thugs, we might see the thugs approach him, perhaps even raise their fists menacingly. However, before the action is completed there is a cut to a new scene (e.g. the hero being nursed back to health by a beautiful nurse). The division of actions into focal and preparatory components provides a possible basis for treating this type of construction. Thus, consider the following rule.14

Event-Final Deletion Rule:
X + P + F ==> X + P
X + S + F ==> X + S

This rule specifies that the right-most action of an event structure (i.e. there is nothing to the right of P and F, or S and F, in the rule, therefore take them to be the right-most elements in the event structure), may be rewritten as P (or as S); the F node is deleted. In our example, the action is 'hero is beaten-up by thugs'. The focal component is, of course, the actual beating. The preparatory component is the approach of the thugs. In light of the rule just defined, the focal component is deleted and only the preparatory component remains to be realized in the sequence structure. The rule is stated in two versions because it interacts with the Related-Agent Shot Rule in scene derivations. Notice that when the latter rule applies to a structure like 'P + F + P + F', the output structure is 'P + S + F'. Now the Event-Final Deletion Rule tells us what we can do with such trailing F nodes: we delete them. If the Related-Agent Shot Rule applies before the Event-Final Deletion Rule, we will need the second statement of the Event-Final Deletion Rule to accomplish this.
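The ordering just described, with the Related-Agent Shot Rule applying first, can be traced step by step. The sketch below treats a structure as a flat list of node labels; that encoding and the function names are assumptions made for illustration, and the variable X is simply whatever precedes the matched labels.

```python
def related_agent_shot(symbols):
    """'P + F + P + F' ==> 'P + S + F': the inner F + P pair becomes one shot."""
    if symbols[-4:] == ["P", "F", "P", "F"]:
        return symbols[:-4] + ["P", "S", "F"]
    return symbols

def event_final_deletion(symbols):
    """X + P + F ==> X + P and X + S + F ==> X + S.

    Applies only when F is the right-most element and follows P or S.
    """
    if len(symbols) >= 2 and symbols[-1] == "F" and symbols[-2] in ("P", "S"):
        return symbols[:-1]
    return symbols

event = ["P", "F", "P", "F"]
after_shot_rule = related_agent_shot(event)          # second version will be needed
after_deletion = event_final_deletion(after_shot_rule)

print(after_shot_rule)
print(after_deletion)

# The first version of the rule covers a bare preparatory/focal pair.
print(event_final_deletion(["P", "F"]))
```

The trailing F left by the shot rule is preceded by S rather than P, which is why the second statement of the deletion rule is the one that applies in this ordering.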
If the ordering is the reverse, we will need the first statement.15 We will have need of further ancillary rules, for example, these:

Preparatory Shot Rule: X + P ==> X + S
Event-Final Shot Rule: S1 + F ==> S1 + S2

The Preparatory Shot Rule allows us to treat the P node left over in the output of the first version of the Event-Final Deletion Rule
and of the Related-Agent Shot Rule. The Event-Final Shot Rule can apply in case the Related-Agent Shot Rule applies but the Event-Final Deletion Rule does not apply. The remaining leftmost F node itself becomes a shot.16 The other side of deletion, in a sense, is amalgamation. As we noted in our discussion of the cavalry and the wagon train example, we will need rules to intersperse material from two or more event structures into a single sequence structure (flashback scenes are another example). The rule below attempts to express the parallel action arrangement:

Parallel Action Rule:17 Sq(S*) & Sq'(S*) ==> (S1 + S1') + (S2 + S2') + ... + (Sn + Sn')

We have two Sq nodes each of which consists of a sequence of S nodes. The rule has the effect of pairing off the shot nodes, one from one Sq structure and one from the other. We will return to this rule below in section 2.3.3, where we try to exemplify the role which each of the rules we have been discussing might play in actual derivations. In Chapter 4 we briefly dealt with the problem of determining what sort of shot can follow what sort of shot. The phrase structure grammar we developed expressed one such relation as indicated below.

S ==> L + D*

Griffith, as we noted, often began a scene with a long-shot and then followed it with a sequence of detail close-up shots. But clearly, this so-called master-scene approach is too restrictive. Many filmic scenes do not abide by this rule. Long-shots can certainly appear elsewhere in a scene than in the initial position. Moreover, we will have to worry about what scenes can be realized as over-the-shoulder shots, which can be subjective-shots, profile-shots, medium-shots, head-on-shots, etc. These properties are indicated in sequence structure by the presence of framing features. For example, a medium framing feature indicates that the directly dominating shot node corresponds to a medium-shot. A variety of transformations function to attach these framing features to appropriate shot nodes.
One such rule we will need ensures that Principle (F) is not violated: when an actor casts a look of outward regard the immediately following shot must be subjective.
Subjective Sequence Rule: S(Pred. + Arg.*) + S'(Pred.' + Arg.*') ==> S(Pred. + Arg.*) + S'(Pred.' + Arg.*' + )
Conditions: (1) Pred. = 'outward regard' (2) Arg.'1 of S' is not identical to Arg.1 of S (3) Arg.1 of S is

As an illustration, consider the scene-fragment from The Birds: S has Pred. = 'outward regard' and Arg.1 = 'Hendren'. S' has Pred.' = 'fill monkeybars' and Arg.'1 = 'birds'.
Figure 13. Tree diagrams for the scene-fragment from The Birds: S (Pred. = 'outward regard'; Arg.1 = 'Hendren') and S' (Pred.' = 'fill'; Arg.'1 = 'birds'; Arg.'2 = 'monkeybars').
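Two of the rules stated above, the Parallel Action Rule and the Subjective Sequence Rule, can be sketched in a few lines. The encoding is an assumption made here for illustration: shots are dictionaries, sequences are lists, and the framing feature is represented by the string 'subjective'. Only conditions (1) and (2) of the Subjective Sequence Rule are modelled.

```python
def parallel_action(sq, sq_prime):
    """Sq(S*) & Sq'(S*) ==> (S1 + S1') + (S2 + S2') + ... + (Sn + Sn')."""
    if len(sq) != len(sq_prime):
        raise ValueError("the rule pairs shots one-to-one")
    interleaved = []
    for shot, shot_prime in zip(sq, sq_prime):
        interleaved.extend([shot, shot_prime])
    return interleaved

def subjective_sequence(s, s_prime):
    """Attach a 'subjective' feature to the shot that follows an outward regard.

    Condition (1): the first shot's predicate is 'outward regard'.
    Condition (2): the first arguments of the two shots differ.
    """
    if s["pred"] == "outward regard" and s_prime["args"][0] != s["args"][0]:
        features = [*s_prime.get("features", []), "subjective"]
        return s, {**s_prime, "features": features}
    return s, s_prime

cavalry = ["cavalry 1", "cavalry 2"]
wagons = ["wagons 1", "wagons 2"]
print(parallel_action(cavalry, wagons))

look = {"pred": "outward regard", "args": ["Hendren"]}
birds = {"pred": "fill", "args": ["birds", "monkeybars"]}
_, marked = subjective_sequence(look, birds)
print(marked["features"])
```

The interleaved list keeps each strand's internal order intact, which is how a single sequence structure can still contain two coherent units.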
As indicated in Figure 13, the Subjective Sequence Rule attaches a subjective framing feature to S', making it subjective from Hendren's viewpoint. Note that conditions (2) and (3) guarantee Principle (G): the agent of the subjective shot, in this case S', cannot be the one from whose viewpoint the shot is subjective (i.e. the agent of shot S, as specified by condition 3). A second illustration of the placement of framing features involves a-b dialogue scenes. Such scenes involve alternating reversed angle 3/4-perspective-shots (recall our discussion of Figures 5 and 6).

Dialogue Rule: S(Pred. + Arg.*) ==> S(Pred. + Arg.* + )
Condition: Pred. = 'talks to'

Dialogue Sequence Rule: S(Pred. + Arg.* + [...]

[...] NP1 + VP1 + CONJ. + VP2
Condition: NP1 = NP2, VP1 ≠ VP2.

The application of Coordination Reduction to the structure in Figure 20 yields the structure in Figure 21. Having such a rule, we can generally account for a wide range of deletion cases. The rule can be extended to sequences of clauses, as indicated in the sentences below.
The King of Siam captured the flying shark, the King of Siam killed the giant turtle, and the King of Siam returned peace and safety to the realm.
The King of Siam captured the flying shark, killed the giant turtle, and returned peace and safety to the realm.

With relatively minor elaborations this same rule can be modified to account for the synonymy relationship of the sentences in the second pair of examples (Carroll 1978; Harries 1973). Indeed, the rule of Coordination Reduction, in its most general form, accounts for an extremely broad range of linguistic deletion phenomena in many languages. The deletions we have been considering to this point are really of a rather special sort. (Later, we will call them deletions under identity.) There are many instances of deletion in which the interpretation is not nearly so unambiguous as in our snowball and fish examples. But regardless of our ability or inability to pretheoretically analyze the workings of deletion, we cannot ignore the importance of deletion processes to the structure of language or of cinema. On first reading, the undeleted version of our snowball example seems a rather awkward sentence of English. Certainly, it is awkward in comparison with the version which has undergone deletion. Vygotsky (1965 [1934]: 139) illustrated the point this way: '... imagine that several people are waiting for a bus. No one will say, on seeing the bus approach, "The bus for which we are waiting is coming." The sentence is likely to be an abbreviation "Coming" or some such expression, because the subject is plain from the situation.' The same sort of principle appears to hold in cinema (Carroll 1973). A principle governing the well-formedness of cinema scenes stipulates that deletion, in the form of cutting and framing (see below), be employed periodically (and, in fact, fairly frequently).
The most clearcut illustrations of this principle are scenes which are unfilmic because they violate it; for example, Alfred Hitchcock's suppressed film Rope (see Reisz 1968 [1953]: 233-236), and some of the experimental films by Andy Warhol, like Empire State Building. Another example is the Eustache film The Mother and the Whore. Veronika's long speech near the end of the film is held in what seems to be an endless close-shot. Such scenes are judged to be unfilmic.
Balázs (1970 [1945]: 131) relates this constraint on scene filmicity to visual interest. A relatively brief long-shot of an anthill, he argues, might be phenomenally far more lengthy (and more boring) than the same long-shot with an inserted close-shot that reveals the activity of the anthill. However, Balázs also points out that cutting is not the only factor involved here. He acknowledges that visual events 'full of internal movement' can be realized as cinema scenes with minimal cutting (e.g. pp. 139, 254). Nilsen and Spottiswoode have developed the matter of cutting rate from a more perceptual viewpoint. Nilsen (n.d. p. 65) argues that the proper cutting rate for a scene is a function of the amount of time required by the viewer to fully apprehend the event represented in the scene. A scene representing a relatively simple event should be cut at a more rapid rate than one representing a complex event. Spottiswoode (1950 [1933]: 215-218) took an almost quantitative approach to this question. He referred to the balance between content complexity and cutting rate as cutting tone. He argued that events of varying complexity are apprehended at differing rates, and that the optimal place to cut is just as the effect of the shot (in part, at least, a function of rate of apprehension) reaches its maximum. To put this simply, one should cut when the viewer understands and appreciates the shot maximally, leaving him, perhaps, wanting just a little more, but never bored. We cannot pursue these analyses too far since the theoretical terms employed are after all rather vaguely defined. As a result, we cannot really state our must-cut constraint at this time, even as generally as Principles (A) through (G). However, the upshot of this discussion of must-cut, for our purpose, is that cutting, and therefore quite often deletion, is quite mandatory in the derivation of cinema scenes from visual events.
2. A TAXONOMY OF DELETION TYPES IN CINEMA

In this section, we will divide deletion into four categories. Each category will be exemplified and discussed separately. In the next section, we will attempt to formalize the description of two of these deletion categories.
The first question we ought to address is the question of what deletion is in cinema. We conceive of a cinema scene as a structured packaging of a sequence of actions and visual scenes. An intrinsic property of this packaging is that it reduces the detail of the event and descriptions it represents. A camera, or for that matter a human observer, can only apprehend a fraction of what goes on around it: things happen behind it, to the left, to the right, and in front of it; the choice must be made as to which direction will be attended to. In any slice of time, the camera records only one subset of those things it could record. It deletes the rest. This deletion takes place linearly in time as well. Suppose action X begets action Y just as action A begets action B. The camera selects X over A in one time slice, implicitly deleting A. But now, perhaps, the camera will choose B over Y so that it does not inadvertently eliminate the entire episode of A-B. Thus, the resultant linear cinema scene X-B represents A-B and X-Y, deleting both A and Y. On all accounts, properly composed cinema ought to focus the viewer's attention on what is essential, and eliminate or subordinate that which is not. This is, in a sense, the primary teleological justification for deletion processes. Thus, Pudovkin (1958 [1929]: 93) speaks of 'clear selection, the possibility of the elimination of those insignificances that fulfill only a transition function and are always inseparable from reality, and of the retention only of climatic and dramatic points'. Arnheim (1957 [1933]: 89) writes 'From the time continuum of a scene he [the film artist, JMC] takes only the parts that interest him, and of the spatial totality of objects and events he picks out only what is relevant. Some details he stresses, others he omits altogether.' Finally, Nilsen (n.d. p. 21) describes '...
the necessity so to organize the expressive elements in the space and time of the shot that the idea at the basis of the directorial treatment will be clearly manifested, ... this will only be achieved if fortuitous, unessential elements are suppressed.'

In this discussion, we will divide the topic of deletion into two parts, spatial deletion and linear (temporal) deletion. The former refers to elements of the static mise-en-scène which are omitted, the latter refers to segments of the linear action which are omitted. If a character reaches out of the frame for something and we
are not shown what it is that he reached for, there has been a spatial deletion. If we see that he has picked up a cigar, but we are not actually shown his reaching, there has been a linear deletion.

A second division which we make is that between deletion under identity and deletion without identity. Deletion under identity refers to situations in which something already shown to the viewer can be presupposed subsequently (and hence not actually shown). Deletions without identity also presuppose something instead of showing it, but in these cases what is not shown has also not been shown before. Thus, to continue our earlier example, if the character reaching for the cigar had previously been shown reaching for pipes and cigarettes, not showing the particular act of reaching for the cigar would be a linear deletion under identity. We have seen him reaching already; a new object of his reaching is all that needs to be specified. Had we not seen him reaching before, the deletion would have been a linear deletion without identity. Similarly, if our character was shown reaching out of the frame, but we had not previously seen the extension of the frame (i.e. the space which surrounds the manifest space of the frame of the shot), this would be a spatial deletion without identity. Had we previously been given a long-shot revealing a cigar box just within reach of the character, the deletion would have been a spatial deletion under identity.

We must hasten to note that nothing theoretical rests on this taxonomy. It has been constructed as a descriptive tool for convenience of discussion. However, without some preliminary structuring of our data we would be unable to tackle theoretically substantive questions. We will now proceed to an examination of the various classes in greater detail, with examples. Then, we will comment, in particular, on the class of linear deletions under identity as they may relate to the linguistic rule of Coordination Reduction.
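The two divisions drawn above cross-classify into four categories. The following is a minimal sketch in Python that tabulates the four variants of the cigar example. It is an illustrative aid only, not part of the grammar developed in this book; the names Dimension, Deletion, and previously_shown are hypothetical labels introduced for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Dimension(Enum):
    SPATIAL = "spatial"  # part of the static mise-en-scène is omitted
    LINEAR = "linear"    # a segment of the linear action is omitted


@dataclass(frozen=True)
class Deletion:
    dimension: Dimension
    previously_shown: bool  # hypothetical flag: was the omitted material shown earlier?

    def label(self) -> str:
        identity = "under identity" if self.previously_shown else "without identity"
        return f"{self.dimension.value} deletion {identity}"


# The four variants of the cigar example described in the text.
cases = {
    "reaches out of frame, surrounding space never shown":
        Deletion(Dimension.SPATIAL, False),
    "reaches out of frame after a long-shot revealing the cigar box":
        Deletion(Dimension.SPATIAL, True),
    "holds cigar, no reaching shown at any point":
        Deletion(Dimension.LINEAR, False),
    "holds cigar, earlier reaching for pipes and cigarettes was shown":
        Deletion(Dimension.LINEAR, True),
}

for description, deletion in cases.items():
    print(f"{description}: {deletion.label()}")
```

The sketch reduces identity to a single yes-or-no flag. The examples that follow suggest that what counts as having been shown before is itself a matter of degree and inference.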
It should be clear that the sorts of deletions we will want to consider at this stage of the investigation will for the sake of convenience be the rather simpleminded and mundane sorts of deletion that occur in almost every reasonably structured film. Many eloquent examples of deletion can be cited which would not fit into our simple taxonomy at all. Spottiswoode (1950 [1933]:
171) noticed one such example. In Pudovkin's film Deserter, the image of a ship is superimposed on a detail shot of a piece of metal being riveted and hammered into shape. The myriad details and stages of ship-building have been deleted, metaphorically compressed, as it were, into a single detail plus the image of the completed ship. This appears to be both a spatial and a linear deletion. We will not be able to consider such complicated formal constructions here.
2.1. Spatial deletion

2.1.1. Spatial deletion without identity. Arnheim (1957 [1933]: 85) notes that the filmmaker directs the attention of the audience by deleting what he does not want them to look at. The interest of the viewer is directed at whatever is on the screen, because there is nothing else to see in a dark theater. Arnheim (p. 82) points out that these cases of deletion can be unfilmic. He cites the example of Dreyer's The Passion of Joan of Arc. In the opening segment of this film there is an extended series of close-up shots. The viewer is given no idea about the surroundings of what he sees on the screen. Arnheim says that such a deletion of space without identity, '... easily leads to the spectators having a tiresome sense of uncertainty and dislocation.'

Consider two further examples from the work of Dreyer. First, in the last scene of The Passion of Joan of Arc, when Joan is burned, there is a series of A-B cuts between a shot of Joan tied to the stake and a pile of smoking wood. There is never a connecting long-shot, hence we as viewers do not know visually that Joan is actually burning on the pile of wood. Without these two shots connected we have to rely on our historical knowledge that Joan was in fact burned in order to infer that the pile of smoking wood is under her and will ultimately be the vehicle of her death. A second case from Dreyer's work comes from the film Vampyr. After Sybille Schmitz, as the character Leone, is attacked, there is a scene in which she glares menacingly, and one supposes hungrily, at Rena Mandel, her as yet unbitten sister Gisele. A series of one-shots, with each sister looking out of the shot in the presumed direction of the other, connect the two sisters. Generally, this look of outward regard suffices to connect
a shot with the shot following it in that the following shot is taken to be subjective (Principle A). Here, however, the subjective-shot device fails and the spatial deletion (i.e. the lack of a connecting long-shot) renders the scene unfilmic.

Another source of such examples is the Resnais film Last Year at Marienbad. I find many of the scenes in this film to be only marginally filmic. However, these are certainly classic cases of spatial deletion without identity. The first long tracking segment in the film, in fact, presents one example. The camera tracks across walls and ceilings, doors, around hallways, and all the time remains very close to these surfaces; we do not see which room we are in, which way we turned out of the last room, etc.

However, filmic cases of this type abound. Pudovkin (1958 [1929]: 93) described an example from D.W. Griffith's Intolerance. It is the scene in which Mae Marsh, as The Dear One, hears the death sentence passed on Robert Harron, in the role of her husband. Griffith shows the face of the woman in close-up. Then there is an abrupt cut away to a shot of her hands; her fingers are clasping at one another. Pudovkin notes that we never see the entire woman, only the face and the hands. The remainder of the woman's body has, in our term, been deleted without identity. We understand that an entire woman's figure exists in the extension of the screen, but we never see this figure. Balázs (1970 [1945]: 123) hypothesizes another example. 'We see someone leave a room. Then we see the room in disorder, showing traces of a struggle. Then a close-up: the back of a chair with blood dripping from it. This would be sufficient. There would be no need for us to see the struggle or the victim. We would guess it all.' (Note that this example involves linear deletion without identity as well: we do not see the struggle but we know it to have taken place.)
These examples by Balázs and Pudovkin make it plain that some spatial deletion without identity is possible and quite filmic. What seems to be required is that the viewer can infer what was deleted from that which remains at the point of the deletion. Thus, we know that there was a struggle if we know that there is a setting for the aftermath of the struggle. Similarly, we know that there is a woman if we see the hands and face of the woman. However, in Arnheim's Joan of Arc example, we cannot infer the nature of the deleted space and are not informed of its nature until much later in the film. (I have
no simple accounting of the other two Dreyer examples, but I believe that an account of them will hinge on what sorts of inferences viewers are willing or able to make in interpreting deleted segments.)

In M Fritz Lang shows a guard searching for Peter Lorre, who plays the role of a child molester. Lorre is hiding in a storeroom. The scene is structured in such a way that we see the guard and Lorre in separate shots. Never do we see Lorre and the guard connected in a single shot, even though we realize that the two must occupy the same space, namely the storeroom. Since the geographies of the two shots are not connected, the scene is somewhat unfilmic. In the initial scene of Pickup on South Street, Samuel Fuller constructed a similarly confusing scene. He shows three people glancing nervously about on a crowded subway train. However, he never gives a unifying long-shot; we see each person in a separate shot. The viewer never finds out what the exact geographical relation is between the three people. In Hitchcock's Psycho there is a comparable scene. When the detective climbs the stairs (just before he is murdered), there is a cut to the base of a door just beginning to open. Subsequently, we are able to infer that the door opening was in fact the bedroom door. However, at the time of the cut the viewer does not know where the door is, and the scene is resultingly somewhat unfilmic.

2.1.2. Spatial deletion under identity. The second class of spatial deletion is spatial deletion under identity. Nilsen (n.d. p. 34) speaks of '... giving special emphasis to one or another part of the scene [read as setting, JMC] being filmed, by isolating it from the long-shot of the general scene.' Such examples are very common in the cinema of D.W. Griffith. Griffith developed a formula for constructing scenes, which later evolved into the Hollywood master-scene approach of the 1930s and 1940s. Griffith shot the scene in long-shot and then went back and did close-up detail-shots.
Scenes were constructed by inserting these detail-shots into the long-shot master. Typically the close-ups do not involve the major action of the scene but rather reveal the expression with which an actor reacts to that action. As Münsterberg (1970 [1916]: 38) put it, 'The close-up has to furnish the explanations.'
Since none of the linear plot line is deleted, these are spatial deletions. Since we have already had a chance to see the space of the entire scene in the long-shot before any of the close-up insert-shots, they are deletions under identity.

Another example of this type of deletion comes from the Hitchcock film The Man Who Knew Too Much. In the Albert Hall scene, there is a cut from a very long-shot of the Ambassador seated in the hall to a close-up. In the very long-shot the viewer cannot see that one of the many faces in the crowd is in fact that of the Ambassador at whom the following close-up shot is directed. The combination of long-shot and close-up is resultingly somewhat unfilmic. (The formal converse of this sort of combination occurs in Roman Polanski's film What? In the piano room scene, there is a close-up shot preceding a long-shot. The two shots are joined in that they both include a vase of flowers. However, in the long-shot, the vase is insignificant and the viewer is not likely to notice it. Resultingly, the two shots may appear to be disconnected: a close-up shot of a vase followed by a long-shot of something totally different and unrelated.)

So far we have only been discussing deletion which is accomplished by means of cutting. Perhaps this is the most common method, but it is surely not the only method. Consider an example from Nicolas Roeg's film The Man Who Fell to Earth. Rip Torn tricks David Bowie with a T.V. remote control tuner that triggers an X-ray photograph. Torn goes into the kitchen in order to get some ice and to hide the specially rigged tuner. As he leaves Bowie, the camera tracks along with him. Thus, Bowie's space is deleted under identity (we have seen it already), but by a track, not a cut.
2.2. Linear deletion

2.2.1. Linear deletion without identity. Complementary to spatial deletion in our taxonomy is linear deletion. This class is characterized by the elimination of pieces of the linear plot line, that is, actions which for one reason or another must have taken place are not explicitly represented in the manifest series of shots. We have already considered some examples of linear deletion. Recall the Balázs case of showing the aftermath of a struggle. In this scene
we need not see the struggle in order to know that it has taken place: logically the struggle is a part of the film's linear plot line, but the viewer never actually sees it. The Balázs example is an instance of linear deletion without identity. We do not need to have previously witnessed a struggle in order to understand from witnessing the aftermath of a struggle that a struggle has taken place. In the Balázs example we have not previously seen a struggle.

Many nonhypothetical examples can be cited as well. Eisenstein (1949: 11) cites one such example from the reediting of the film Danton. In the original film, Emil Jannings, as Danton, rushes up to Robespierre, who has just condemned a friend of Danton's to the guillotine, and spits in his face. Robespierre wipes the spit from his face with a handkerchief. In reediting the film, Benjamin Boitler simply spliced out the portion of the scene in which Danton spits. The resulting scene was entirely changed: the impression given the viewer was that just as Danton rushes up to Robespierre to object to the sentence, the latter wipes away a tear. The implication of the reediting is that Robespierre condemned Danton's friend with great regret.

In Jean-Charles Tacchella's film Cousin, Cousine, there are several examples. Early in the film one of the cousins approaches his grandfather who is digging out a swimming pool. They have a conversation in a standardly arranged pattern of A-B shots. The grandfather asks the boy to join in the digging. In the very next shot the boy is digging in the pool beside his grandfather. The viewer understands that he has climbed down into the pool, but does not see him do so. Later in the film, Victor Lanoux (the character Ludovico) and Marie-France Pisier (the character Karine) return home just in time to see Ludovico's daughter Nelsa go running through the house into her bedroom, very upset and loudly bursting into tears.
There is a cut to Ludovico's reaction, he turns his head noticing the racket Nelsa is making. The very next shot is of Ludovico and Karine seated on the bed comforting Nelsa. Again, the viewer knows that they must have walked to the bedroom, but has not actually seen them do so. Another example comes from Wertmuller's The Seduction of Mimi. There is a long scene in which the hero, Giancarlo Giannini (the character Mimi), meets Mariangela Melato (the character Fiora) during the time he is away from Sicily working in Turin.
Toward the end of the scene, they are walking along a riverbank. There is a series of cuts structuring the scene, but the last cut is particularly interesting for us. In one shot they are strolling along the river; in the second they are standing beside the road which runs parallel to the river. They argue and part from one another yelling insults. These two shots are joined directly by a cut. The viewer never sees the two characters turn away from the river and walk from the riverbank to the road.

One of the most impressive examples of this is from Orson Welles' film Citizen Kane. There is a Hollywood montage arrangement depicting the deterioration of Kane's (played by Welles) marriage to his first wife (Ruth Warrick). We see the Kanes at breakfast, first sitting close to each other, talking about how much time Kane spends with his newspaper (and not with his wife). Finally, they decide to go back to bed. The shot dissolves (with a flick pan) into another breakfast scene. Kane is paying less attention to his wife's protestations and is increasingly annoyed with her inability to understand his work. A further breakfast dissolves in and out and the Kanes are shown at opposite ends of the table, absolutely silent and separate. Entire days, and probably weeks, are deleted from this scene; the viewer can only very generally infer what sorts of activities must have or might have occurred.

Many examples of linear deletions without identity involve scene boundaries. For example, the action of a scene is represented up to some point beyond which the filmmaker expects that the viewer can figure out what happened next, and then a cut is employed and the next scene begins. As Münsterberg (1970 [1916]: 47) put it, 'There is no need of bringing the series of pictures to its logical end, because they are pictures only and not the real objects.' Consider one such example from the John Ford film The Grapes of Wrath.
After the grandmother, played by Zeffie Tilbury, dies on the family's perilous journey to California, someone makes the statement 'we'll bury her in the flowers'. The scene ends right there with a cut to the family once again on its way to California. As viewers, we never see the actual burial, and yet we understand it to have taken place. Another similar example comes from The Man Who Fell to Earth. In one scene David Bowie and Candy Clark are touring the countryside in their chauffeured car. Bowie stares mesmerized at some country people who are
just as fascinated by his limousine. There is a series of A-B cuts back and forth, and finally a white-out of the country scene and the limousine. The following scene opens with the limousine pulling up to its final destination on the trip, the site of Bowie's landing on earth. However, a certain amount of connecting material in the linear plot structure has been omitted. We never see the car travel from the location of the country people to the point at which it finally comes to a halt. Another example of this type comes from Alfred Hitchcock's Family Plot. Here the jeweler administers an injection to the heroine (Barbara Harris), and he and his wife (Karen Black) prepare to move the sedated girl to their basement vault. There is a cut, and in the following shot the two are riding in a car returning from a ransom pick-up and examining the large diamond they have extorted. The viewer does not see them place Barbara Harris in the vault, or indeed, see any part of the ransom episode, although all this is understood to have transpired.

Balázs (1970 [1945]: 149) noted an interesting case of this sort of linear deletion without identity across a scene boundary. In this type of combination one scene ends with a closing diaphragm-shot '... the last shot of the scene is a close-up in which only a single face or hand or object remains in the frame, thus being lifted out of its space. This last shot of the scene is at the same time the first shot of the next scene, which emerges as the diaphragm is opened up.' Although the diaphragm is no longer a common mise-en-scène device, this type of arrangement has now been widely adopted in television: a scene ends with a cut to a close-up. When the director cuts back from the close-up, what had been in the close-up is now embedded in a new setting and a new scene begins.

Just as was the case for spatial deletion without identity, it seems that many examples of linear deletion without identity are unfilmic.
As Reisz (1968 [1953]: 217) puts it, 'If the editor ... skips a piece of action (say he cuts from the shot where the door is half open to another where it is already closed) there will be a noticeable jump in the continuity and the cut will not be smooth.' Often, this sort of linear deletion without identity is due to a simple error in matching shots. In D.W. Griffith's Broken Blossoms, there seems to be an example of this. The character Battling has his robe off in a close-shot, but when he rises to fight
in the following long-shot, he has the robe on. The viewer does not see him put the robe on, but must assume that this took place. It seems that this detail was simply overlooked when the film was put together. There are two further examples of this sort of thing in the Lubitsch film Ninotchka. In one scene Melvyn Douglas is trying to get the three Russian agents drunk. The eyeglasses of the agent Boyenoff fall off in a close-shot. However, when Lubitsch cuts to the medium-shot, the eyeglasses have mysteriously been put on again. The viewer does not see Boyenoff put the glasses back on, but assumes that he did so. However, the transition from close-up to medium does not inform the viewer that any time has passed. To the contrary, in fact, such combinations usually indicate that no time has passed. Hence, the cut is perceived as unfilmic, or to use Reisz's term, as not smooth. Later in the same film, Douglas takes Greta Garbo to a restaurant where they encounter the Grand Duchesse (played by Ina Claire). When the Duchesse joins their table a chair suddenly appears out of nowhere for Douglas to offer her. Again, the viewer can infer that the chair was brought to the table by an alert waiter, even though this is not explicitly shown. However, since no time is indicated as having been deleted, the cut is unfilmic.

2.2.2. Linear deletion under identity. The fourth category we have described is that of linear deletion under identity. In these cases what has been deleted need not be inferred on the basis of general knowledge about the world, but can be determined from what has been explicitly represented in the manifest structure of the scene. In our earlier example from The Man Who Fell to Earth, one scene ended and then the car had to travel some unknown distance before the next scene began.
In our earlier example from Wertmuller's The Seduction of Mimi we observed that the two characters had to turn away from the river and walk from the riverbank to the road between shots, as it were. What was deleted had to be filled-in based upon general knowledge about states of the world. In other examples from these same films it seems that what has been deleted can be defined, or recovered, purely by referring to what has actually been shown.
For example, in The Man Who Fell to Earth, there is a scene in which Bowie and Clark are making love. (This entire scene is intercut with another scene in which Bowie presents Clark with a telescope as a gift, but we will ignore this complication for the purposes of the discussion.) As they make love, there are nonmatching cuts to other segments of the action. Thus, subepisodes of their lovemaking are deleted under identity with the subepisodes which are actually represented in the film. Similarly, in Wertmuller's The Seduction of Mimi (in the same portion of the film we referred to above), there is a scene between Mimi and Fiora when he first approaches her at her street-shop kiosk. As they talk to each other and begin to flirt, there are cuts to other segments of time in which they continue to talk and flirt. The cuts do not match and the viewer is presumably supposed to assume that something has been deleted. What has been deleted, however, is just more talking and flirting. In the riverbank example from The Seduction of Mimi, the viewer cannot assume that only more walking along the riverbank has been deleted. He must further infer that Mimi and Fiora turned away from the river, changing their direction, and approached the road. If he did assume that only more walking along the riverbank was deleted, he could not locate them at the road. In the kiosk example the viewer can assume that only functionally identical segments of talking and flirting have been deleted. For our taxonomy this seems to be a limiting contrast: if in the riverbank example the vector of motion for Mimi and Fiora could have remained constant and still permitted the viewer to locate them at the road in the next shot, this would have been a linear deletion under identity. Several examples of linear deletion under identity obtain in Lang's Metropolis. One example is very similar to the previous examples from The Man Who Fell to Earth and The Seduction of Mimi. 
When Maria, the robot (as played by Brigitte Helm), dances for the industrial leaders of Metropolis, there is a single medium-shot of her dancing. While the shot is held in terms of its spatial inclusiveness, segments of time are deleted. Thus, we see the robot dance a bit, then there is an abrupt cut to more dancing in the same spatial shot, and so on. What has been deleted is just more of the dancing.
Later in the film there is another deleted scene somewhat different from the ones we have discussed so far. The robot Maria is leading the workers to riot on the surface of Metropolis (and, of course, tricking them into drowning their own children). Maria ascends from the workers' subterranean city on an elevator car, and the viewer is shown the entire traversing of the elevator through the bottom aperture of the elevator shaft, up into the nonvisible part of the shaft. However, when subsequent carloads of workers ascend the elevator shaft, we only see glimpses enough to suggest that they fully ascend. Still later, when the power fails, the elevators fall back down through their shafts. The viewer sees the first one fall through the entire aperture at the bottom of the shaft and explode into pieces. Then he sees another just landing and bursting apart (the beginning of its fall through the aperture has been deleted by virtue of its identity with the fall of its predecessor). Notice that there are actually two sorts of linear deletion operating here. First, there is a linear deletion without identity: we never see the elevators traverse the entire length of the shaft (this would be, after all, a rather boring and dark shot). Thus, we must infer that the elevators do travel the entire length of the shaft. Second, there is a linear deletion under identity: the deletion of detail from the ascending and crashing shots on the basis of their partial identity with the initial ascending and crashing shots does not require an inference based upon knowledge about elevators or elevator shafts. A more recent example, similar to the latter examples from Metropolis, comes from Michael Ritchie's film The Bad News Bears. There are several examples involving scenes at baseball practice and games. For instance, the team is practicing sliding into a base. 
The first child is shown starting to run, the camera pans with him as he runs the distance of the baseline, and finally he slides into the base. Subsequent team members are shown just sliding with a short pan revealing only the last few steps of their approach to the base. Most of the approach and the start in these latter instances have been deleted from the manifest film. They are, of course, to be taken as identical to that of the first player. Similarly, when the team is batting in a game, we see the wind-up of the pitcher, the release of the ball, the ball's trajectory to home plate, and the swing and hit of the batter. There is a pan along
with the ball as it finally lands in the outfield. The fielders scramble after it, while the batter rounds the bases. Subsequently, we only see swing, hit, fielders scrambling, and shots of running the bases. Ultimately, we see only swing and hit, swing and hit. Again, the other material can be deleted by virtue of its identity with the material which appeared explicitly in the initial full version.

Another, somewhat metaphorical, example of this occurs in The Man Who Fell to Earth. In the scene in which Buck Henry (the character Farnsworth) is murdered, we see in real-time detail two men approach him, pick him up, and, after two attempts, finally manage to throw him out of the window. Then we see several shots of his body falling through the air earthward. Next, we see the same two men accost his roommate. There is a cut to a barbell falling through the air, just as did Farnsworth's body. Now, in the context of the film, it is clear that we are to take the barbell to be a metaphor for the roommate. Ignoring this aspect of the scene's structure, we have a deletion under identity. The scene in which Farnsworth is thrown out of the window is shown in complete detail. The subsequent segment in which his roommate is thrown out is abbreviated: we see him accosted and we see him (his barbell) fall. We do not see the preparation for and execution of the act of throwing him out of the window, but we know that what happened was the same as that which happened to Farnsworth.
3. THE REPRESENTATION OF DELETION IN THE GRAMMAR

By way of summary, we have partitioned the task of describing deletion processes in cinema into four subtasks, each being that of describing one of the four classes in our deletion taxonomy. Of these four classes, the two that involve deletion under identity appear to be most amenable to formalization. Recall though that our Event-Final Deletion Rule, presented in Chapter 5, accounts for at least some linear deletions without identity. In the remainder of this chapter, however, we will turn to the task of formalizing deletion under identity, both spatial and linear. Our fundamental goal, as in Chapter 5, is to illustrate the descriptive tool of TGG, and not to worry about presenting a completely adequate and comprehensive statement of particular rules.

The Included Space Constraint (ISC), or Principle (H), stated informally below, initially formulates some of the properties of filmic spatial deletion.

(H) The space included in each of the shots belonging to a common scene is encompassed by some long-shot in that scene. If the scene includes more than one such long-shot, these must be connected either by overlapping directly or by being joined in a superlong-shot.

It is asserted that any scene satisfying this principle will be filmic, but it is not claimed, and it would be clearly false to claim, that all filmic scenes involving spatial deletion are predicted by this principle. As we observed above, there are cases of spatial deletion without identity which are fully filmic.

The question now is how to formalize this principle. One approach would be to have material copied from the N node into a terminal node dominated by Sq deleted in virtue of its satisfying some criterion of identity with material copied from the N node into another terminal node dominated by Sq. Recall that the entire complex event E underlying a cinema scene divides into two portions: the description of the elements of the scene's geography, and the spatial relations between these elements, is encoded under the N (naming) node; the action of the shots is encoded under the Sq node. When an event is derived into a sequence structure, information from the N node is copied into each terminal node under Sq, transforming these descriptions of space and actions into a sequence structure representation (e.g. corresponding ultimately to, say, the Ambassador in a long-shot seated in the Albert Hall and the Ambassador in a close-up seated in the Albert Hall, in our example from The Man Who Knew Too Much).
In the Albert Hall example, various elements of the overall scene's geography can be deleted from the close-shot of the Ambassador, in virtue of their identity with elements of the preceding long-shot. The Included Space Constraint, Principle (H), can be restated and further formalized as below.
(H') Whenever material dominated by A nodes is transformed into an S node, delete from N all the material mentioned in the newly created S node. When no A nodes remain under Sq: if N is null (i.e. if nothing remains under N), delete E and N (i.e. the derivation is complete; the event structure is now a sequence structure); if N is non-null (i.e. if N is non-empty), mark the derivation as ill-formed (i.e. the scene will violate the included space constraint).

This is a fairly explicit statement of the ISC, but it is not a transformational rule. In the course of converting the event representation into a derived sequence representation, the grammar keeps track of how much relevant geography has been made explicit. It does this by beginning with a formal model of all relevant geography, the N node, and by deleting from that formal model anything which appears in the description of any S node created in the course of the derivation. When the derivation is complete, if nothing remains under the N node then the ISC is satisfied and all of the relevant geography has been included in the scene.

We can begin to recast this formulation as a transformational rule. For example, consider the following statement.

Included Space Shot Rule:
N + Sq(A1 + A2 + ... + An) ==> Sq(S«LONG» + A2 + ... + An)
Condition: S«LONG», as derived by this rule, contains all of the elements of N.

This rule rewrites the left-most A node of an event structure as a long-shot, revealing all of the relevant geography of the scene.2 The empty N node is deleted, the ISC is satisfied. Note that this treatment is precisely the master-scene approach of Griffith, which we considered in Chapter 4. Figure 22 illustrates the Albert Hall example. Our phrase structure approach rendered this same relation with the rule:

S ==> L + D*
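The bookkeeping in the restated Included Space Constraint can be sketched procedurally. The following Python fragment is a minimal illustration under strong simplifying assumptions: the N node is modeled as a flat set of element names (the spatial relations it also encodes are ignored), each derived shot is modeled as the set of elements it shows, and the function name satisfies_isc is hypothetical. It is not a transformational rule and is not offered as the grammar's own formalism.

```python
def satisfies_isc(geography, shots):
    """Apply the bookkeeping of the restated ISC to a derived scene.

    geography: set of element names standing in for the N node (assumption:
               spatial relations among elements are not modeled).
    shots:     list of sets, each the elements shown in one derived S node.
    Returns (ok, leftover), where leftover is whatever remains under N.
    """
    remaining = set(geography)         # formal model of all relevant geography
    for shown in shots:                # each A node rewritten as an S node
        remaining -= set(shown)        # delete from N what the new shot mentions
    return (not remaining, remaining)  # empty N: the constraint is satisfied


# Albert Hall example, using the legend a = Ambassador, b = balcony, c = crowd.
n_node = {"Ambassador", "balcony", "crowd"}

# Master-scene treatment: an encompassing long-shot, then a close-up.
master_scene = [{"Ambassador", "balcony", "crowd"}, {"Ambassador"}]

# Close-ups only: the balcony and the crowd are never made explicit.
close_ups_only = [{"Ambassador"}, {"Ambassador"}]

print(satisfies_isc(n_node, master_scene))    # (True, set())
print(satisfies_isc(n_node, close_ups_only))  # not satisfied; balcony and crowd remain
```

Because the sketch only checks that every element of N eventually appears in some shot, it corresponds to the bookkeeping statement and not to the stronger informal Principle (H), which additionally requires an encompassing long-shot.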
Figure 22. Key: a = Ambassador, b = balcony, c = crowd.

As we noted in Chapter 4, not all filmic scenes are master scenes, and indeed, not all filmic scenes with spatial deletions under identity are master scenes. A somewhat more general statement would allow the N node information to be distributed over several shot nodes, hence avoiding the limitation of a scene-initial encompassing long-shot. In this approach, all shot rules would be elaborated to delete those elements of N that are copied into the S node created. The various shot rules we have developed (recall Chapter 5) would simply continue to reapply and S nodes would continue to be developed until there were no A nodes remaining. At that point in the derivation, if anything remained under N, the scene derived would violate the ISC and would consequently be unfilmic.3

We turn now to consideration of how the facts regarding linear deletion under identity might be given a preliminary formal treatment. Consider the following statement:
(I) When actions are repeated in a scene, noninitial iterations may be reduced by having some subsets of their identical repetitive elements deleted.

Principle (I) attempts to capture the essence of the deletion processes we observed in the examples discussed from The Bad News Bears, The Man Who Fell to Earth, and Metropolis. Here material is deleted from a node dominated by Sq in virtue of an identity with material contained in some other node under Sq. Under the Sq node is a schematic hierarchical formula of the scene's action (e.g. C runs from Y to Z, in our example from The Bad News Bears), supplemented by the naming information represented under the N node (e.g. C = Charlie, Y = first base, Z = second base). We might begin formalizing this principle as follows:

Coordination Reduction:
A(A1 + A2 + ... + An) + A'(A'1 + A'2 + ... + A'n) ==> A(A1 + A2 + ... + An) + A'(A'j + A'k + ... + A'n)
Conditions:
(1) A'j + ... + A'n is a continuous sub-sequence of A'1 + ... + A'n
(2) Ai = A'i for all i

This rule stipulates that two adjacent and identical action nodes (A and A') can be reduced to a single full A node structure, plus a subsegment of the other A node. The operation of the rule in our example from The Bad News Bears is illustrated in Figure 23. The key condition on the rule, of course, is identity (condition 2). The criterion of identity for linear deletion under identity is algebraically defined. Even though the actors in adjacent base-sliding actions are different, the corresponding A nodes in the scene's event structure are identical. Since A3 = A5 (in Figure 23), A5 can be deleted.
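The rule of Coordination Reduction stated above can be sketched as a small Python function. This is a rough illustration, not a definitive implementation: action nodes are modeled as lists of schematic sub-action strings, the retained material is read as the contiguous tail A'j ... A'n, and the names `coordination_reduction` and `keep_from` are hypothetical.

```python
def coordination_reduction(first, second, keep_from):
    """Reduce the second of two identical adjacent action sequences.

    first, second: lists of schematic sub-actions (A1..An and A'1..A'n).
    keep_from: 0-based index j; the retained part is the contiguous tail A'j..A'n.
    """
    if list(first) != list(second):       # condition (2): Ai = A'i for all i
        raise ValueError("rule applies only to identical action nodes")
    if not 0 <= keep_from < len(second):
        raise ValueError("keep_from must index into the second sequence")
    # condition (1): what survives is a continuous sub-sequence of the second node
    return list(first), list(second)[keep_from:]


# Schematic base-sliding action: identity holds over the schema, not the actors.
slide = ["X runs from Y to Z", "X slides into Z"]
full, reduced = coordination_reduction(slide, slide, keep_from=1)
print(full)
print(reduced)
```

Comparing schematic formulas rather than named actors mirrors the algebraic criterion of identity: the naming information under N (Bill, Charlie) plays no part in deciding whether the rule applies.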
Figure 23. Key: B = Bill, C = Charlie, Y = First Base, Z = Second Base.

4. EXPLANATORY ADEQUACY

This consideration of deletion phenomena concludes our grammar fragment. We now turn to the implications of the grammar for various aspects of the analysis of cinema. This discussion requires us to once again extend the empirical criteria for evaluating grammars. Even if we could comprehensively establish a transformational-generative grammar of the structure of cinema scenes, we would still have to ask whether that grammar could fruitfully support the analysis of cinema aesthetics, cinema comprehension, filmmaking, criticism, and the like. If the answer to these questions is negative, the grammar will be of limited theoretical utility. Our approach to this matter will be to define a higher level of adequacy for cinema grammar: explanatory adequacy. A grammar that achieves explanatory adequacy is a descriptively adequate grammar that facilitates the statement of general principles of
human intelligence.4 To make this clear, consider three ways a grammar might do this.

Recall our distinction between models and aesthetics in Chapter 2. We have concentrated on the goals of the former, expressly avoiding any confusion with the latter. However, at some point in anybody's study of film it must be acknowledged that film is art, and that the experience of film viewing can be an aesthetic experience, in the traditional sense. A grammar of scene structure, while it does not directly address aesthetic questions, might provide a substantive basis for stating and investigating hypotheses about the nature of aesthetic experience. Such a grammar would achieve a degree of explanatory adequacy.5

Another route to explanatory adequacy involves the psychology of cinema production and comprehension. The grammar we have been exploring is a theory of knowledge, but not necessarily a theory of how that knowledge is put to use in filmmaking and film viewing. However, the possibility certainly exists that the rules and structures defined by the grammar can contribute to the analysis of cinema production and perception. A grammar that does this would be more than descriptively adequate; it would achieve explanatory adequacy. We turn to this question in the next chapter.

A third way a grammar can begin to achieve explanatory adequacy would be to state rules that are intermodally universal. That is, suppose that one of our cinema transformations turned out to be isomorphic with a particular linguistic transformation. This might happen by chance, but it could also reflect a universal property of the way the human mind structures complex sequences, be they cinema scenes or sentences. If such universals exist, they represent profound generalizations about the nature of human intelligence. A grammar whose statements facilitate the discovery of such principles achieves a level of explanatory adequacy.
In the next section, we will discuss the possibility that the rule of Coordination Reduction, stated above, is one such universal principle. Unlike the criteria of descriptive adequacy, the criteria of explanatory adequacy are vaguely defined by examples. A grammar that contains generalizations about cinema that go beyond descriptive adequacy approaches explanatory adequacy. The list of such generalizations is an open list: when we have a grammar
that contributes to the analysis of production, perception, aesthetics, and universals, we try for something more. Explanatory adequacy is necessarily vague; it is a goal that continually recedes from what we have been able to accomplish.
5. UNIVERSALS

One class of universal is that of universal transformational rules of grammar. The rule of Passive, as we have seen, relates sentences like those below.

Ralphie hit Joey on the head.
Joey was hit on the head by Ralphie.

The first sentence is the active version, the second is the passive version. The Passive rule is viewed as expressing a language universal relation.6 If it is true that certain relations like Passive are truly universal, expressing themselves in all human languages, then in order to achieve explanatory adequacy, a grammar of English would have to, first, include these rules, and second, distinguish these rules from other nonuniversal transformational relations.

The real significance of all this transcends issues of formalism, however. What does it mean to say that a rule is universal? Clearly, it means that the structural relationships described by the rule obtain in all, or very nearly all, languages. What does this mean? One could, naturally, attribute these common properties to historical accident, but this seems strained as an explanation of their occurrence. Surely a more interesting hypothesis is Chomsky's (1968) suggestion that there are language universals precisely because there are shared basic properties of the human mind: a core of linguistic structure common to all human beings, a part of our common genetic heritage. If there is but one interesting question in the study of human knowledge (i.e. cognitive science), this is it. If the existence of universals is to be accounted for by Chomsky's suggestion, then we are in the position of making rather strong claims about the nature of the a priori structure of the human mind, the innate knowledge children are born with.

Chomsky (1975) provides a simple and compelling illustration of this line of argument. He (pp. 30-33) asks us to imagine a child
just learning how to form questions in English from corresponding actives, thus:

the man is tall – is the man tall?
the book is on the table – is the book on the table?

From this pattern of behavior we might understandably conclude that the child simply searches through the active version (left-to-right) for the first occurrence of the word 'is' (or another such verbal element), and shifts this element to the front of the sentence sequence. When exposed to declarative-interrogative pairs like this the language-learning child might well conclude that this is the rule for forming questions in English. Following Chomsky, we will call this hypothesis structure independent, since it relies only on the left-to-right order in word sequences. An example like the one below presents problems for a structure independent account.

the man who is tall is in the room – is the man who tall is in the room?

Chomsky notes that children never mistakenly utter such incorrect forms, although they do, of course, make many mistakes in the course of language learning. But this is a puzzle. On the basis of the simple cases (e.g. the man is tall – is the man tall?), simple parsimony would direct the child toward the structure independent hypothesis. And this, in turn, would suggest that at least some of the time some children would mistakenly utter incorrect forms like 'is the man who tall is in the room?' The fact that children do not do so demands that an imparsimonious hypothesis be entertained, a hypothesis that conceives of the child's acquisition of language as guided by a priori knowledge and not merely by parsimonious empirical inductions. Chomsky presents such an account:

The child analyzes the declarative sentence into abstract phrases; he then locates the first occurrence of 'is' (etc.) that follows the first noun phrase; he then preposes this occurrence of 'is', forming the corresponding question. (p.
32) Chomsky calls this hypothesis structure dependent and poses the following question: why does the language-learning child consistently make use of the structure dependent hypothesis in attempting to master his language, even though the structure independent
hypothesis is far more parsimonious? Or, put another way, why is the rule of question formation in English grammar structure dependent? Chomsky's answer is that the principle of structure dependence is an a priori condition on grammar and the child's language acquisition capacities; it is a universal.7

The evidence that there are indeed language universals is at this point quite conclusive (Bach and Harms 1968; Greenberg 1963; Jakobson 1941; Ross 1967) and, while Chomsky's own interpretation of universals as aspects of innate mental capacities may still be regarded as radical by some, there are in fact no serious alternative interpretations. The question that we must now pose is what the existence of universals may mean for cinema grammar, and vice versa.

It does seem that certain types of relationships, often cited in linguistic investigations as universal, are analogously manifest in the structure of cinema. For example, the Passive relation can be reinterpreted cinematically: Ralphie is shown in a close medium-shot. As he prepares to deliver a blow, there is a cut to a long medium-shot revealing that he is hitting Joey on the head. This scene seems to correspond to the active sentence 'Ralphie hit Joey on the head.' Now consider the passive version of the sentence, 'Joey was hit on the head by Ralphie.' To cinematically interpret this sentence, one could first show Joey in the close medium-shot. Then, just as he is being struck, there is a cut to a long medium-shot, this time revealing that Ralphie is the attacker.

So far, though, our consideration has been at least somewhat frivolous. In a certain sense, it must be so at this stage of the investigation: without a fixed and well-developed grammatical theory of scene structure, we cannot examine nonuniversal transformational rules in any great detail or rigor, let alone putatively universal rules.
Nevertheless, there are a few further remarks pertaining to the rule of Coordination Reduction that we can briefly consider. On the basis of our discussion of the 'snowball' and 'fish' examples in section 1, we have considered the possibility that the linguistic rule of Coordination Reduction, widely analyzed as a universal rule (Koutsodas 1971; Harries 1973), was in fact more than just a linguistic rule. Indeed, the transformational description we have offered for the phenomena of linear deletion under identity in cinema was labeled Coordination Reduction. The question then is whether the linguistic and the cinema grammar
formulations have anything fundamental in common: are they really the same rule? If so, it could be that what have been called language universals are in some instances special cases of broader cognitive universals.

The starting point for our analogy is the intuitive similarity between scenes like the murder of Farnsworth scene from The Man Who Fell to Earth, and sentences like,

I began to eat fish, and Bill rice.

But, as noted above, without a fixed formalism, this remains a purely empirical analogy. The formalism we have so far provided is simply not detailed enough: nothing we have said rejects the analogy, but nothing in the formalism independently predicts a rule of Coordination Reduction in cinema grammar. However, there is an interesting anecdotal argument that I believe should be raised. If two phenomena are similar in the way I have claimed that linguistic Coordination Reduction and cinematic linear deletion under identity are, we might be able to reveal this similarity as merely accidental or artifactual by manipulating both phenomena identically and observing whether they behave differently under the manipulation. As a crude analogy, you could show that an egg and an egg-shaped rock are in fact different, even though they look the same, by dropping both on the floor: only one will make a mess.

What I decided to do was to look at Coordination Reduction in more than two adjacent scenes. In The Bad News Bears, there are instances of such deletion in the base-sliding and hitting scenes. What we find is that increasing deletion can take place throughout the entire scene. We see the first batting scene in its entirety: the batter steps up to the plate, the pitcher winds up and releases the ball, the ball travels to home plate, the batter swings and hits, the ball travels to the outfield, it is fielded, the hitter rounds the bases, etc. This first scene is represented in several shots and almost none of the linear plot line is deleted.
For the next batter, less material is explicitly represented in the cinema scene, but we understand that the same sequence of events has taken place. Finally, we see only a swing and a hit, but we still understand that the entire sequence of events has taken place. I have never found a case of less and less linear deletion in such a construction, but try to imagine it. Consider the base-sliding scene from The Bad News Bears. The first child runs the
entirety of the base-line and then slides into the base; we only see the second child slide into the base; then the third child is shown running, say, half the distance of the base-line and then sliding into the base. I have the distinct intuition that such scenes are unfilmic (but of course they are scenes of imagery not cinema). Perhaps the fact that I have never seen one of this type in an existent film is prima facie evidence that they are unfilmic, and that they end up on the cutting-room floor.

In any case, I was quite surprised when I returned to the linguistic analysis of Coordination Reduction. Ross (1970 [1967]) had noticed that the rule can apply in coordinations of more than two clauses, and had discussed data like those below.

I began to eat fish, Bill began to eat rice, and Harry began to eat roast beef.
I began to eat fish, Bill to eat rice, and Harry to eat roast beef.
I began to eat fish, Bill rice, and Harry roast beef.

However, no one had looked at Coordination Reduction analogous to that in the examples from The Bad News Bears. In fact, the linguistic data appear to be quite parallel to the cinema data. First, consider what happens when successive deletion is not nondecreasing:

* I began to eat fish, Bill rice, and Harry to eat roast beef.
* Max wanted to love Harriet, Fred Sue, and Tom to love Kathy.

The resulting sentences are unacceptable. Now, observe the result when the successive deletions are nondecreasing.

I began to eat fish, Bill to eat rice, and Harry roast beef.
Max wanted to love Harriet, Fred to love Sue, and Tom Kathy.

The sentences are fully acceptable. The matter of the linguistic rule of Coordination Reduction in sequences of more than two conjuncts has been somewhat oversimplified here. Elsewhere, I have gone into greater detail regarding the specific changes in the linguistic rule of Coordination Reduction that these data necessitate (Carroll 1978a).
For our purposes, it is enough to note that the sentences analogous to our unfilmic gedanken scene from The Bad News Bears are quite unacceptable, while the filmic example corresponds to an acceptable class of sentences.
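The pattern in these data, that successive conjuncts may delete the same amount or more but never less, can be checked mechanically. The sketch below is illustrative only: measuring deletion by counting omitted words of the verbal sequence 'began to eat' is an assumption made here for convenience, the helper names are hypothetical, and the detailed linguistic rule (Carroll 1978a) is not reproduced.

```python
FULL_VERB = ("began", "to", "eat")   # verbal material of the unreduced conjunct


def deleted_count(retained, full=FULL_VERB):
    """Number of words deleted, assuming the retained words form a tail of `full`."""
    retained = tuple(retained)
    if retained != full[len(full) - len(retained):]:
        raise ValueError("retained material must be a tail of the full sequence")
    return len(full) - len(retained)


def is_nondecreasing(counts):
    """True when each conjunct deletes at least as much as the one before it."""
    return all(a <= b for a, b in zip(counts, counts[1:]))


# "I began to eat fish, Bill to eat rice, and Harry roast beef."
acceptable = [deleted_count(r) for r in (("began", "to", "eat"), ("to", "eat"), ())]
# "* I began to eat fish, Bill rice, and Harry to eat roast beef."
unacceptable = [deleted_count(r) for r in (("began", "to", "eat"), (), ("to", "eat"))]

print(acceptable, is_nondecreasing(acceptable))
print(unacceptable, is_nondecreasing(unacceptable))
```

The same test applies to the imagined base-sliding scene if each child's action is scored by how much of the run is omitted: full run, no run, half run is a sequence in which deletion first rises and then falls, which is the pattern judged unfilmic.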
Recently, linguistic approaches have begun to emerge in diverse areas (e.g. in music, Sundberg and Lindbloom 1976; in architecture, Eisenman 1971; in poetics, Freeman 1970; in dance, Lasher 1978a,b). In many cases, these investigations converge in concluding that at least some grammatical universals are not uniquely linguistic. Sundberg and Lindbloom (1976), for example, present a formal analysis of some Swedish nursery tunes. They demonstrate that certain aspects of the transformational-generative phonology of Chomsky and Halle (1968) (assignment of stress contour to surface structures) can be generalized to both linguistic and music grammar. They argue that these parallels between linguistic structure and musical structure reflect cognitive-perceptual universals pertaining to the ways people deal with complex sequences.

In many places (e.g. Chomsky 1975: Chapter 1), Chomsky has uncritically adopted the position that universal properties of the structure of language are ipso facto uniquely properties of the structure of language. This position rules out by fiat the possibility that at least some of the properties of language are actually more general in nature. In particular, Chomsky ignores the possibility that some putative language universals are actually properties of all human symbolic systems, or even of all of complex human behavior. It seems patent that this approach can only lead to a distorted analysis of both language and the more general cognitive systems which theoretically circumscribe it. Indeed, the analysis of Coordination Reduction outlined above would constitute a specific disconfirmation of Chomsky's position.

NOTES

1. For additional discussion of deletion phenomena, see Carroll (1981).

2. Another formulation, one that formalizes the condition of the rule a little more, would be this:
N(N1 + N2 + ... + Nm) + Sq(A1 + A2 + ... + An) ==> Sq(S_LONG + A2 + ... + An)
Condition: S_LONG includes N1, N2, ..., Nm.
Note that on either statement we would need a convention that if the N node has been deleted, that is, copied into an S node, other A nodes can have access to that information when some shot rule or another applies to them.

3. Since we have not really explored the internal structure of the N node, there is little point in recasting our earlier shot rules here.
4. The term explanatory adequacy is taken from Chomsky (1965). The particular characterization of it, while consistent with Chomsky's approach, differs in detail from his own definition.

5. We shall indeed assert this about the present grammar and make a preliminary study of this claim in Chapter 8.

6. For a fairly recent consideration, see Perlmutter and Postal (1977). Note though that the particular statement of the Passive rule given earlier in the text is quite clearly not the correct universal formulation (Perlmutter and Postal 1977). It may not even be adequate for English. The purpose of that statement of Passive was purely didactic in the present context.

7. Indeed, Frith and Robson (1975) have recently demonstrated that young children are better able to recognize cinema scenes that conform to Hollywood cutting convention than they are to scenes that fail to do this. Since I do not know any specific details about the cutting rules Frith and Robson studied, I cannot now comment on the relevance of their work to the explanatory adequacy issue.
7 Actions and Shots as Psychological Units
1. THE PSYCHOLOGICAL REALITY OF CINEMA GRAMMAR

In the preceding three chapters, a linguistic program for research in cinema theory was developed. Some preliminary hypotheses about the structure of cinema scenes were advanced, assessed, and at least partially formalized as a transformational-generative grammar. Since the descriptions enumerated by the grammar conform to the requirements of descriptive adequacy, the grammar can be viewed as part of an axiomatic theory of human knowledge, a cognitive theory.

Further questions can be raised regarding the role these cognitive structures play in the implementation of behavior and the organization of experience pertaining to cinema. For example, how, if at all, does the knowledge imputed to the film viewer by cinema grammar structure the experience of viewing films? How, if at all, does this knowledge structure the behavior of the filmmaker in creating films? How, if at all, is grammatical knowledge relevant to aesthetic experience and to film criticism? It would be surprising, in a sense, if the descriptions generated by the grammar were to shed light on these questions: the empirical criteria of observational and descriptive adequacy do not address questions of cinema perception, cinema production, etc. But surely we would like a grammatical theory to be able to shed light on such matters. As defined in Chapter 6, a grammar that does so achieves some measure of explanatory adequacy.

There is indeed a straightforward interpretation of transformational-generative grammar that places it in the broader psychological context of a theory of perceptual and production
behavior. In linguistics and psycholinguistics this interpretation has been called the derivational theory of complexity, or DTC. The claim is simply this: in the course of actually uttering a sentence, one must mentally reconstruct the linguistic derivation of that sentence; and in understanding a sentence, one must reconstruct the inverse of the sentence's derivation. Thus, it is claimed that we understand a sentence, in part, by successively detransforming it back to its underlying deep structure. Sentences with more complex linguistic derivations are accordingly predicted to be more complex behaviorally (i.e. to speak and understand), hence the name derivational theory of complexity.1

The DTC can be simply interpreted in terms of the cinema grammar: in order to understand a cinema scene, it is necessary to detransform the scene back to its underlying event structure. That is, it is the event that is understood. Conversely, to produce a cinema scene, one begins with an event and then transforms it into a series of shots. Intuitively, this seems to be just what happens. It is true that in the case of sentence production much of the mechanics is purely mental, and almost all of it goes on inside the speaker, whereas in cinema production the mechanics involves editing devices, projectors, and many other cybernetic extensions. But these differences may be more superficial than they at first appear.

Many psycholinguistic experiments have been performed to test and elaborate the DTC hypothesis. McMahon (1963) presented subjects with various types of sentences. In each case he asked the subject to determine whether the sentence was true or false. The time required for this decision was measured. He found that sentences that had more transformations in their grammatical derivations took longer to verify than sentences with fewer transformations in their derivations.
For example, active sentences were verified faster than passive sentences (recall section 2.1 of Chapter 5). The grammatical derivation of passive involves an extra transformation, and hence, in the DTC view, it is predicted to require more comprehension (and therefore verification) time.2

In the present chapter, we will review some recent work in cinema perception that attempts to establish a corollary of the DTC. The claim is that the event and sequence structures described by the grammar are real units of comprehension to the
cinema viewer.3

A routine observation in perceptual psychology is that people do impose a structuring on continua of stimuli they are exposed to. When a descriptively adequate grammatical theory of those stimuli is available, it is reasonable to ask whether people in fact use units of grammatical structure as perceptual units. Bever, Lackner, and Kirk (1969) and Fodor, Bever, and Garrett (1974) have developed the claim that the listener treats the linguistic clause as a perceptual unit. Thus, in their model, sentence perception consists in part of isolating and segmenting together word sequences which correspond to linguistic clauses. Fodor and Bever (1965) and Chapin, Smith, and Abrahamson (1972) have also defined a perceptual unit in terms of a level of linguistic structure. They propose that major surface structure constituents are perceptual units in sentence perception (see Carroll 1976, for review). What these views share is the claim that listeners use their grammatical knowledge to define the units they construct in the course of understanding sentences. On both accounts, then, linguistic transformational-generative grammar is seen as providing insights into human perception, and therefore as achieving some level of explanatory adequacy.

In the remainder of this chapter, we will consider analogous possibilities with regard to the cinema grammar. As psycholinguists have tested the hypothesis that the grammatically defined structure of the sentence organizes sentence perception, we may test the hypothesis that the grammatically defined structure of cinema scenes organizes cinema perception. In particular, we ask whether the A nodes and S nodes of event and sequence structure are coherent units to the film viewer.4
2. CINEMA PERCEPTION: EXPERIMENT 1

We proceed in this enterprise using our linguistic analogy. First, we will review a recent psycholinguistic experiment which purports to demonstrate that the structural units defined by linguistic grammar play a role as units during sentence perception. Then,
we will illustrate how the techniques of such psycholinguistic research can be imported into the study of cinema perception.

Bever and Hurtig (1975) presented experimental subjects with sentences which were interrupted with superimposed clicks. Thus, an experimental subject heard something like the following:

By making his plan known CLICK Jim brought out the objections of everyone.

In this example, the interrupting click co-occurs with the juncture between the two clauses of the sentence. Bever and Hurtig were able to show that when a click is superimposed on the final word of a clause, e.g. 'known' in the example, it was detected less often than when it occurred in other positions, such as in the clause boundary, as indicated in the example, or just after a clause boundary, e.g. the word 'Jim'. Bever and Hurtig interpret their results as indicating that the linguistically defined clause is a unit in sentence perception. In their analysis, a sentence is perceived clause by clause. During a clause the listener gathers information, identifying grammatical elements, subphrases, etc. At the end of the clause, all of the foregoing material is segmented and recoded into a holistic mental unit. This recoding diverts processing capacities that otherwise could have been directed outward gathering information. Thus, it is just prior to clause boundaries that the listener's ability to detect unexpected interruptions, like clicks, is somewhat impaired.

One of our first cinema perception experiments used the click detection idea of Bever and Hurtig. We replaced the emulsion of a single frame in a filmstrip with a yellow filter. At normal silent film projection rates of 16 frames per second, these yellow flashes had a duration of only about 60 milliseconds. We then showed the films to spectators who were asked to rate the brightness of the flashes on a scale of 1 to 7 (7 being the brightest).
Following Bever and Hurtig, we predicted that if people were engaged in segmentation and recoding, their ability to attend to interrupting stimuli like the flashes would be somewhat impaired. Thus, we expected that lower perceived brightness ratings would indicate increased perceptual processing.

A set of brief narrative film scenes was prepared. Each of the scenes depicted some ordinary daily life experience, such as conversation, scenes of people walking around in rooms or outside, traffic scenes, or meals. The scenes were constructed to be clearly filmic and were of three structural types: Type EC scenes consisted of two shots joined by a cut, each shot corresponding to an action unit. Type C scenes consisted of two shots joined by a cut, but only a single action was represented. The cuts in the Type EC and Type C scenes involved changes in camera distance (medium-shot to close-shot, or medium-shot to long-shot) or in camera angle (a lateral change of 45° or 90°). Type E scenes consisted of two action units (like Type EC), but contained no cuts. Type E scenes did, however, involve the use of other mise-en-scène devices (e.g. pans, dollies, tilts; see Appendix I for details). The changes in the action of the Type EC and Type E scenes involved relatively abrupt thematic discontinuities (e.g. a small girl is rummaging through a drawer // she turns and walks away down the hall).

The structural analysis of the scenes was intended to accord with the significant levels of the cinema grammar outlined in Chapters 4, 5, and 6. The Type EC scenes have the event and sequence structures sketched in Figure 24. Note that the shot boundary in Type EC scenes is a major boundary (as defined in section 3.2 of Chapter 5).5 The Type C scenes have the structure sketched in Figure 25; and the Type E scenes have the structure outlined in Figure 26. The shot boundary in the Type C scenes is a nonboundary in our earlier terms. (Descriptions of the scenes used in this experiment are presented in the Appendix.)

Figure 24. Event and sequence structures of the Type EC scenes.

Five exemplars of each type were constructed. One of the five in each type classification was randomly assigned to each of five conditions. In Condition -5, the emulsion of the frame five frames before the predicted structural boundary (i.e. the cut or change in the action) was scraped off and replaced by a yellow filter. In Condition -2, the emulsion was scraped off and replaced by a yellow filter for the frame two frames before the predicted structural boundary.
In Condition 0, the emulsion was replaced for a frame coinciding with the structural boundary. (The actual frame replaced was randomly either the frame immediately before the boundary or immediately after it, the boundary itself does not, of course, consist of any frames.) In Condition +2, the frame two frames after the predicted structural boundary was replaced by a yellow filter, and in Condition +5 the frame five frames after the boundary was replaced.
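The assignment of exemplars to conditions and the choice of flash frame can be sketched in code. This is only an illustrative reading of the procedure described above, not the procedure actually used: the function names, the exemplar numbering, and the frame-indexing convention (the boundary is identified with the index of the first frame after it, and frames are counted outward from the boundary) are all assumptions introduced here.

```python
import random

TYPES = ["EC", "C", "E"]
CONDITIONS = [-5, -2, 0, 2, 5]  # flash position in frames relative to the boundary


def assign_conditions(rng):
    """For each scene type, pair its five exemplars with the five conditions at random."""
    design = {}
    for scene_type in TYPES:
        exemplars = [1, 2, 3, 4, 5]  # hypothetical exemplar labels
        rng.shuffle(exemplars)
        design[scene_type] = dict(zip(CONDITIONS, exemplars))
    return design


def flash_frame(boundary, condition, rng):
    """Index of the frame to replace with the yellow filter.

    Assumed convention: `boundary` is the index of the first frame after
    the structural boundary, so boundary - 1 is the last frame before it.
    """
    if condition < 0:
        return boundary + condition      # e.g. -2 -> second frame before the boundary
    if condition > 0:
        return boundary + condition - 1  # e.g. +2 -> second frame after the boundary
    # Condition 0: the boundary contains no frame, so pick a neighbour at random.
    return rng.choice([boundary - 1, boundary])


rng = random.Random(1980)  # fixed seed only so the sketch is repeatable
design = assign_conditions(rng)
for scene_type in TYPES:
    print(scene_type, design[scene_type])
```

Under this convention every exemplar of a type appears in exactly one condition, which is what yields the fifteen experimental scenes.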
Figure 25.
Figure 26.
These fifteen film scenes (3 types × 5 conditions) were combined with twelve distractors. The distractors were scenes with yellow flash frames located at random positions. They were intended to discourage participants from simply attending to the cuts and action boundaries. The 15 experimental scenes and 12 distractors were organized into 3 blocks. Each block consisted of 5 experimental scenes, one from each of the five conditions, and 3 distractors. The types were rotated so that each was about equally represented at all serial positions in the presentation order. The final film was shown to twelve observers who rated each yellow flash they saw on a scale of 1 to 7, with 7 being brightest. As an ancillary task, they were asked to classify each scene they saw as 'old', 'new', or a 'paraphrase'. An old scene was defined as a literal repetition of a scene previously displayed in the experiment; a new scene was one not already presented; and a paraphrase was defined as a scene technically new yet very similar to an old scene. The purpose of this second task was to ensure that participants gave full attention to processing the experimental scenes (and therefore that they did not merely look for flashes and ignore the scenes). Mistaking old for new scenes (or vice versa) was rare. The mean ratings across subjects are summarized in Table 1, and the overall means for each condition, collapsing across type, are graphed in Figure 27.
Table 1. Mean Brightness Ratings

Type of Stimulus     Condition (Number of Frames from Structural Boundary)
Sequence               -5      -2       0      +2      +5
EC                   2.58    1.91    2.50    3.58    2.33
C                    2.91    2.91    4.75    3.00    2.91
E                    1.75    1.91    3.58    2.50    2.08
All                  2.41    2.25    3.61    3.02    2.44
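As a rough arithmetic check on Table 1, the 'All' row can be compared with the unweighted mean of the three type rows in each condition. The sketch below assumes that this is how the 'All' row was computed (the text does not say so explicitly) and uses the cell values as read from the table; because those values are printed to two decimals, agreement can only be expected to within rounding. The variable and function names are illustrative.

```python
CONDITIONS = [-5, -2, 0, 2, 5]

# Mean brightness ratings by scene type, as read from Table 1.
ratings = {
    "EC": [2.58, 1.91, 2.50, 3.58, 2.33],
    "C":  [2.91, 2.91, 4.75, 3.00, 2.91],
    "E":  [1.75, 1.91, 3.58, 2.50, 2.08],
}
reported_all = [2.41, 2.25, 3.61, 3.02, 2.44]


def column_means(table):
    """Unweighted mean over scene types for each condition (assumed definition of 'All')."""
    rows = list(table.values())
    return [sum(column) / len(column) for column in zip(*rows)]


computed_all = column_means(ratings)
for condition, computed, reported in zip(CONDITIONS, computed_all, reported_all):
    print(f"Condition {condition:+d}: computed {computed:.3f}, reported {reported:.2f}")

# Difference between the overall means at the boundary and two frames before it.
boundary_effect = reported_all[CONDITIONS.index(0)] - reported_all[CONDITIONS.index(-2)]
print(f"Condition 0 minus Condition -2: {boundary_effect:.2f}")
```

The last line gives only the size of the Condition 0 versus Condition -2 difference in the overall means; it says nothing about its statistical reliability, which requires the individual ratings.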
Figure 27. Overall mean brightness ratings by Condition (Number of Frames from Structural Boundary).
Bever and Hurtig found a great difference in click detection between the position just before a clause boundary, where detection was quite low, and detection in the boundary, where it was quite high. Analogously, we find that flashes are rated as being quite bright when they occur in a structural boundary of a scene, but as relatively less bright when they occur just before a structural boundary. The difference overall between brightness ratings for Condition 0 and Condition -2 is statistically significant collapsing across participants and scenes (p