English Pages [217] Year 2003
Outstanding Dissertations in Linguistics
Laurence Horn, General Editor

The Serial Verb Construction Parameter (Osamuyimen Stewart Thompson)
Long-Distance Dependencies (Mihoko Zushi)
The Morphosyntax of the Algonquian Conjunct Verb: A Minimalist Approach (Julie Brittain)
Turn-Taking in English and Japanese: Projectability in Grammar, Intonation and Semantics (Hiroko Furo)
Morphologically Governed Accent in Optimality Theory (John Alderete)
Minimal Indirect Reference: A Theory of the Syntax-Phonology Interface (Amanda Seidl)
Distinctiveness, Coercion and Sonority: A Unified Theory of Weight (Bruce Moren)
Phonetic and Phonological Aspects of Geminate Timing (William H. Ham)
Vowel Reduction in Optimality Theory (Katherine Crosswhite)
An Effort Based Approach to Consonant Lenition (Robert Kirchner)
The Synchronic and Diachronic Phonology of Ejectives (Paul D. Fallon)
Grammatical Features and the Acquisition of Reference: A Comparative Study of Dutch and Spanish (Sergio Baauw)
Auditory Representations in Phonology (Edward S. Flemming)
The Typology of Parts of Speech Systems: The Markedness of Adjectives (David Beck)
The Effects of Prosody on Articulation in English (Taehong Cho)
Parallelism and Prosody in the Processing of Ellipsis Sentences (Katy Carlson)
Production, Perception, and Emergent Phonotactic Patterns: A Case of Contrastive Palatalization (Alexei Kochetov)
Raddoppiamento Sintattico in Italian: A Synchronic and Diachronic Cross-Dialectical Study (Doris Borrelli)
Presupposition and Discourse Functions of the Japanese Particle Mo (Sachiko Shudo)
The Syntax of Possession in Japanese (Takae Tsujioka)
Compensatory Lengthening: Phonetics, Phonology, Diachrony (Darya Kavitskaya)
The Effects of Duration and Sonority on Contour Tone Distribution: A Typological Survey and Formal Analysis (Jie Zhang)
Existential Faithfulness: A Study of Reduplicative TETU, Feature Movement, and Dissimilation (Caro Struijke)
Pronouns and Word Order in Old English: With Particular Reference to the Indefinite Pronoun Man (Linda van Bergen)
Ellipsis and wa-marking in Japanese Conversation
John Fry
Routledge
New York & London
Published in 2003 by Routledge, 29 West 35th Street, New York, NY 10001, www.routledge-ny.com

Published in Great Britain by Routledge, 11 New Fetter Lane, London EC4P 4EE, www.routledge.co.uk

Routledge is an imprint of the Taylor & Francis Group.

Printed in the United States of America on acid-free paper.

Copyright © 2003 by Taylor & Francis Books, Inc.

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publisher.

10 9 8 7 6 5 4 3 2 1

Library of Congress Cataloging-in-Publication Data for this book is available from the Library of Congress.

ISBN 0-415-96764-3
Contents

List of Tables
List of Figures
Acknowledgments

1 Introduction
  1.1 Overview of the book
    1.1.1 Part I: The CHJ corpus
    1.1.2 Part II: Ellipsis and wa-marking
  1.2 Notes to the reader
    1.2.1 Intended audience
    1.2.2 Availability of data
    1.2.3 Japanese language examples

I The CHJ corpus

2 Corpora and conversation
  2.1 Introduction to language corpora
    2.1.1 The role of the corpus in linguistics
    2.1.2 Basic features of corpora
    2.1.3 Annotated corpora
  2.2 Speech corpora
    2.2.1 Spoken vs. written language
    2.2.2 Planned speech
    2.2.3 Pragmatic or task-oriented dialogues
    2.2.4 Casual conversations
  2.3 Characteristics of conversation
    2.3.1 Turn-taking behavior
    2.3.2 Backchannel behavior
    2.3.3 Disfluencies
    2.3.4 Conversational structure

3 The CHJ corpus
  3.1 The LDC CallHome corpora
  3.2 About the CHJ corpus
  3.3 About the speakers
  3.4 The CHJ transcripts
    3.4.1 Morphological segmentation
    3.4.2 Size of the CHJ corpus
    3.4.3 Other transcription conventions
    3.4.4 Alterations to the transcripts

4 Annotating the CHJ corpus
  4.1 Introduction
    4.1.1 Native-speaker annotators
    4.1.2 NTT Goi-Taikei semantic dictionary
  4.2 The CHJ lexicon
    4.2.1 Overview of the lexicon
    4.2.2 GT semantic categories
  4.3 Semantic and POS annotations
    4.3.1 Format of the annotated transcripts
    4.3.2 POS annotations
  4.4 Predicate-argument annotations
    4.4.1 Structural annotation
    4.4.2 Goi-Taikei transfer dictionary
    4.4.3 Hand-tagging of predicate-argument relations
    4.4.4 Results of the hand tagging
    4.4.5 Predicate-argument annotation format
  4.5 Acoustic annotations
    4.5.1 Review of speech processing concepts
    4.5.2 Processing the CHJ speech data
    4.5.3 F0 measurements
    4.5.4 Word segmentation
    4.5.5 Format of acoustic annotations

II Ellipsis and wa-marking

5 Ellipsis
  5.1 Introduction to Part II
  5.2 Introduction to ellipsis
    5.2.1 What is ellipsis?
    5.2.2 Examples of ellipsis
    5.2.3 Functions of ellipsis
  5.3 Argument ellipsis
    5.3.1 Argument ellipsis in the CHJ corpus
    5.3.2 Subject ellipsis
    5.3.3 Ellipsis in transitive and intransitive predicates
    5.3.4 Conclusion: argument ellipsis
  5.4 Particle ellipsis
    5.4.1 Introduction
    5.4.2 Sex and dialect
    5.4.3 Syntactic factors in particle ellipsis
    5.4.4 Animacy and definiteness
    5.4.5 Focus and particle ellipsis
    5.4.6 Conclusion: particle ellipsis

6 Wa-marking
  6.1 Introduction
    6.1.1 Topic and subject in Japanese
    6.1.2 Mechanics of wa-marking
  6.2 Semantics of wa- and ga-phrases
    6.2.1 Kuno’s taxonomy of wa and ga
    6.2.2 Categorical vs. thetic judgments
    6.2.3 Wa as a backgrounding particle
    6.2.4 Old vs. new information
    6.2.5 File card-based accounts of wa and ga
    6.2.6 The Strong Familiarity Condition
    6.2.7 Conclusion: semantics of wa- and ga-phrases
  6.3 Intonation and wa and ga
    6.3.1 Intonation and focus
    6.3.2 F0 correlates of wa-phrases
    6.3.3 F0 correlates of wa and ga in CHJ
    6.3.4 Conclusion: intonation and wa and ga
  6.4 Properties of wa-marked nouns
    6.4.1 Accessibility to wa-marking
    6.4.2 Semantic properties of wa- and ga-marked nouns
    6.4.3 Conclusion: properties of wa-marked nouns

III Appendices

A Background on the Japanese language
  A.1 Introduction
  A.2 Grammar
  A.3 Dialects
  A.4 Sentence-final discourse particles

Bibliography
Author Index
Subject Index
List of Tables

1.1 Translations for Japanese grammatical morphemes
2.1 Examples of English corpora (Jurafsky and Martin 2000)
2.2 Vocalizations in the CHJ transcripts
3.1 CHJ transcription conventions
4.1 Fragment from our CHJ lexicon
4.2 Most common POS categories in the lexicon
4.3 GT argument roles (Bond and Shirai 1997)
4.4 Light verbs excluded from predicate-argument tagging
4.5 Missing verbs excluded from predicate-argument tagging
4.6 Most frequent tagged predicate types
4.7 Distribution of predicate sense judgments over annotators
4.8 Word boundaries for first utterance of transcript 0696
5.1 Ellipsis rates for argument roles
5.2 N1 ellipsis rates for particular predicates
5.3 Transitivity and humanness in N1 ellipsis
5.4 Ellipsis rates by speaker sex
5.5 Fill rates for transitive and intransitive predicates
5.6 Fill rates for transitive and intransitive predicates in CHJ
5.7 Fill rates for A, O, and S in Sacapultec and CHJ
5.8 Particles following nouns in five argument roles
5.9 Presentation format of particle ellipsis data
5.10 Particle ellipsis rates by sex for N1 and N2
5.11 Particles following N1 by speaker sex
5.12 Particle ellipsis rates by dialect for N1 and N2
5.13 Particle ellipsis rates for N1 and N2 wh-words
5.14 Particle ellipsis rates for N1 and N2 in questions
5.15 Particle ellipsis rates for N1 and N2 in short sentences
5.16 Particle ellipsis rates for monosyllabic N1 and N2
5.17 Particle ellipsis rates for verb-adjacent N1 and N2
5.18 NP classes and o-ellipsis (Minashima 2001)
5.19 Particle ellipsis and animacy in CHJ
5.20 Particle ellipsis and strongly definite NPs in CHJ
5.21 Particle ellipsis in grammatically defocused positions
5.22 Particle ellipsis and prosodic focus
6.1 Four sentence types
6.2 Japanese definite NP data from P&Y (1998)
6.3 Complex NPs in the CHJ corpus satisfying (i)-(iii)
6.4 Scale of accentability (Matsunaga 1984)
6.5 Average F0 values (Hz) from Finn (1984)
6.6 Average F0 measurements (Hz) over 7,106 nouns
6.7 ANOVA results for 4,035 topics and subjects
6.8 Particles following nouns in five argument roles
6.9 Accessibility to wa-marking in the CHJ corpus
6.10 Proper noun classes followed by particles
6.11 Concrete common noun classes followed by particles
6.12 Abstract common noun classes followed by particles
6.13 Examples of semantic categories
6.14 Proportion of times each noun type was ga-marked
6.15 Proportion of times each noun type was wa-marked
A.1 Japanese hiragana and katakana syllabaries
A.2 Examples of NP-final particles
A.3 Sentence-final discourse particles in the CHJ corpus
List of Figures

3.1 Sex of callers and recipients
3.2 Sex and age classifications of the 269 CHJ speakers
3.3 Regional accents or dialects of the 269 CHJ speakers
3.4 Years of education of 84 CHJ callers
3.5 Age range of 74 CHJ callers
3.6 Fragment from CHJ transcript no. 0696
4.1 GT common-noun semantic hierarchy (top four levels)
4.2 GT proper-noun semantic hierarchy (top three levels)
4.3 Fragment from annotated transcript 0696
4.4 Two senses of the verb toru in the GT transfer dictionary
4.5 Fragment from CHJ transcript no. 0696
4.6 Screen shot of verb sense disambiguation application
4.7 Final format of annotated transcript 0696
4.8 Screen shot of ESPS/waves+ software used for analysis of CHJ speech
4.9 Acoustic and phonetic data file for nouns
6.1 CART tree fragment from Venditti (2000)
6.2 Peak F0 values from Table 6.6
6.3 Keenan-Comrie accessibility hierarchy
Acknowledgments

This manuscript is a revised version of my 2001 Ph.D. dissertation at Stanford University. The transition from thesis to book was facilitated by Larry Horn, the series editor, and Paul Foster Johnson, the dissertations editor at Routledge.

I’m grateful to my dissertation committee (David Beaver, Chris Manning, Yoshiko Matsumoto, Stanley Peters, and Peter Sells) for their advice, encouragement, and patience while the dissertation was progressing, and to the other faculty, staff, and students at the Stanford Linguistics Department for creating such a rich and enjoyable environment for learning about language.

Stanley Peters’ lab at CSLI is where this book was conceived and written, and I’m grateful to the CSLI researchers and staff, especially Stanley himself, for making that daily experience so enjoyable. I’m particularly indebted to CSLI visitors Tim Baldwin, Francis Bond, and Yasuharu Den for so freely lending me their considerable expertise in Japanese NLP. Thanks also to Chris Manning and Anubha Kothari for catching a mistake in my original particle ellipsis counts.

This book would not have been possible without generous financial support from two sources: U.S. National Science Foundation grant BCS-0002646, and a dissertation grant in Japanese Studies from the Institute for International Studies at Stanford. These grants enabled me to employ a group of nine very talented and hard-working native speaker research assistants: Motohide Hatanaka, Misa Miyachi, Tomoko Momiyama, Juno Nakamura, Yuko Okado, Emi Suzuki, Ai Takahama, Michiko Ueda, and Rika Yonemura. Extra special thanks are due Misa, Juno, Yuko, and Emi for their long and dedicated service.

This book and the research it describes were produced using free, open source software, notably Debian GNU/Linux, GNU Emacs, LaTeX, Perl, and Ruby. I thank the open source software community for developing so many tools of unrivaled quality.
Finally, I thank my wife ChiSook Julie Hwang for her unflagging love and forbearance, little Jen and Shu for their companionship, and my parents, Frances and Steve, who made it all happen.
CHAPTER 1
Introduction

Contents
1.1 Overview of the book
  1.1.1 Part I: The CHJ corpus
  1.1.2 Part II: Ellipsis and wa-marking
1.2 Notes to the reader
  1.2.1 Intended audience
  1.2.2 Availability of data
  1.2.3 Japanese language examples

1.1 Overview of the book
This book investigates the use and function of two linguistic mechanisms, ellipsis and wa-marking, in colloquial Japanese speech.

Ellipsis is the phenomenon whereby a speaker omits from an utterance normally obligatory elements of syntactic structure. In this book we focus on two types of ellipsis in colloquial Japanese: the omission of arguments to predicates, and the ellipsis of grammatical particles after noun phrases.

The term wa-marking refers to a Japanese speaker’s use of the particle wa to mark an NP or other phrase. The particle wa, among other functions, marks the topic of a Japanese utterance; that is, what the utterance is ‘about.’

Our approach to ellipsis and wa-marking is largely empirical: we rely on quantitative and qualitative analyses of a large Japanese speech corpus in order to elucidate how and when these linguistic mechanisms are exploited by speakers in natural, spontaneous speech.

One of the advantages of working with a large corpus, as we remark in Section 2.1.1 of the next chapter, is that in many cases it allows us to make robust statistical generalizations based on thousands of examples of the phenomena under study. Indeed, this book presents a number of such quantitative generalizations concerning the use of ellipsis and wa-marking in Japanese colloquial speech.

At the same time, one does well to heed the warning that “statistical results themselves reveal nothing, and require careful and systematic interpretation by the investigator to become linguistic data” (Pustejovsky et al. 1993, p. 354). In that spirit, this book also devotes considerable attention to the relevant theoretical linguistics literature on ellipsis and wa-marking, in order to provide context and motivation for our empirical investigations. In some cases, our quantitative results serve to support or to undermine specific theoretical claims about ellipsis or wa-marking from the Japanese linguistics literature. In other cases, we attempt to formulate our own interpretations or explanations of our observations. Finally, in some cases we simply report surprising results or interesting data in the hope that future theories, or current theories unknown to us, might account for them.

The book is organized into two parts, whose contents are summarized below.

1.1.1 Part I: The CHJ corpus
In empirical research (in linguistics and elsewhere), it is important to be explicit about the data on which the research is based. How, when, and where were the data collected? In what format? Are the data meaningful, precise, and reliable? Do they adequately capture and represent the phenomena under study? Could other data have been used instead?

We therefore devote Part I of this book to describing, explaining, and justifying the language data used in our research on ellipsis and wa-marking. Part I consists of three chapters:

• Chapter 2 introduces certain basic facts about language corpora and how they are used in linguistics. It also explains how spontaneous conversational speech, the kind of language examined in this book, differs from other types of language data such as planned monologues or written texts.

• Chapter 3 describes the CallHome Japanese (CHJ) corpus (LDC 1996), a collection of 120 recorded telephone conversations that serves as the basis of our annotated corpus.

• Chapter 4 presents a detailed description of the linguistic annotations (comprising various types of acoustic, phonetic, syntactic, and semantic information) that we subsequently added to the original CHJ data described in Chapter 3.

1.1.2 Part II: Ellipsis and wa-marking
Part II, the main part of the book, investigates the phenomena of ellipsis and wa-marking in Japanese conversation. We examine how ellipsis and wa-marking are actually used in natural, everyday Japanese speech, based on quantitative and qualitative analyses of our annotated CHJ corpus. Part II consists of two extended chapters:

• Chapter 5 is concerned with ellipsis. We focus on two types of ellipsis in colloquial Japanese: the omission of arguments to predicates, and the ellipsis of grammatical particles after noun phrases (NPs). First, we demonstrate that Japanese conversation obeys certain principles of argument ellipsis that appear to be language universal: namely, the tendency to omit transitive and human subjects and the tendency to express at most one argument per clause. Next, we identify a set of syntactic and semantic factors that correlate significantly with the ellipsis of grammatical particles following an NP. These factors include the grammatical construction type (question, idiom), length of the NP (in syllables), utterance length (in words), proximity of the NP to the predicate, and the animacy and definiteness of the NP.

• Chapter 6 is concerned with wa-marking. The particle wa is generally said to mark the topic of a Japanese utterance; that is, what the utterance is ‘about.’ However, the semantic and pragmatic functions of wa are quite subtle and complex, as we discuss in Section 6.2. Chapter 6 also identifies a set of semantic and prosodic properties that tend to distinguish wa from the subject-marking particle ga. In terms of lexical semantics, we show that nouns marked by ga tend to be animate, while wa is strongly associated with references to locations and times. We also show that wa-marked phrases exhibit more prominent intonation, as measured by peak F0, than ga-marked phrases in the CHJ speech data.

1.2 Notes to the reader
1.2.1 Intended audience

We imagine that this book will be of interest primarily to researchers in Japanese linguistics and Japanese natural language processing (NLP). These are fields in which ellipsis and wa-marking have traditionally been important topics of inquiry. Others who might find this book of interest include language typologists, corpus linguists, and conversation analysts.

In any case, our intention is to present the material in this book in such a way as to make it easily accessible to as wide an audience as possible. To that end, we do not presuppose too much knowledge of linguistics, corpus research, speech processing, or even the Japanese language. In the case of Japanese, Appendix A covers basic linguistic facts about the language for readers who are unfamiliar with it.

Those readers who are already familiar with corpus linguistics and with the properties of conversational language can safely skip the introductory material on these topics in Chapter 2. Readers whose eyes glaze over at the fine (often tedious) details of corpus annotation might choose to skim through parts of Chapter 4, although those who wish to use our annotated CHJ corpus in their own research should find those details useful.

1.2.2 Availability of data

The original CHJ speech data and transcripts, described in Chapter 3, are available to the general research community through the Linguistic Data Consortium (LDC 1996). The annotated CHJ transcripts and other data that we created for this book, described in Chapter 4, are also being published through the Linguistic Data Consortium under the title “Annotated CallHome Japanese Corpus.”

1.2.3 Japanese language examples

At many points in this book we present examples of Japanese utterances. Almost all of our Japanese examples represent actual utterances taken from the transcripts of the CHJ corpus. The examples are numbered and presented in the standard linguistic format shown in (1.1).
(1.1)
じゃ   ここ   は    大丈夫      だ    と     思う    けど
zya    koko   wa    daizyoubu   da    to     omou    kedo
well   here   TOP   all right   COP   COMP   think   however
‘Well, I think it’s all right here, but... ’    (0696; 416)
In (1.1), the utterance is rendered first in Japanese characters, next in romanized pronunciation (kunreisiki transliteration), then in word-by-word translation, and finally in an English gloss. (In fact, our Japanese language examples generally do not include Japanese characters except where they are necessary or relevant.)

Along the right margin of each Japanese language example, across from its English gloss, is printed the CHJ transcript number from which the utterance was taken, and the approximate start time of the utterance in that conversation. For instance, utterance (1.1) is taken from the transcript of conversation number 0696, where it occurs approximately 416 seconds into the conversation. Detailed information on the structure and format of the transcripts themselves is presented in Chapter 3.
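The presentation format just described can be pictured as a small data structure plus a rendering step. The following Python sketch is only an illustration of that layout: the class name `GlossedExample`, the function `render`, and the field names are hypothetical and are not part of the CHJ corpus or its tools. It omits the Japanese-character line, as most examples in the book do.

```python
from dataclasses import dataclass


@dataclass
class GlossedExample:
    """One numbered example (hypothetical container, for illustration only)."""
    number: str        # example number, e.g. "1.1"
    romaji: list[str]  # romanized words (kunreisiki transliteration)
    gloss: list[str]   # word-by-word translation, one item per word
    translation: str   # free English translation
    transcript: str    # CHJ transcript number
    start_sec: int     # approximate start time, in seconds


def render(ex: GlossedExample) -> str:
    """Return the example as aligned plain-text lines."""
    if len(ex.romaji) != len(ex.gloss):
        raise ValueError("each word needs exactly one gloss")
    # Pad every column to the width of its longest cell.
    widths = [max(len(r), len(g)) for r, g in zip(ex.romaji, ex.gloss)]
    line1 = "  ".join(r.ljust(w) for r, w in zip(ex.romaji, widths)).rstrip()
    line2 = "  ".join(g.ljust(w) for g, w in zip(ex.gloss, widths)).rstrip()
    ref = f"({ex.transcript}; {ex.start_sec})"
    return "\n".join([f"({ex.number})", line1, line2,
                      f"'{ex.translation}'  {ref}"])


example = GlossedExample(
    number="1.1",
    romaji=["zya", "koko", "wa", "daizyoubu", "da", "to", "omou", "kedo"],
    gloss=["well", "here", "TOP", "all right", "COP", "COMP", "think",
           "however"],
    translation="Well, I think it's all right here, but...",
    transcript="0696",
    start_sec=416,
)
print(render(example))
```

Keeping the romanized words and their glosses in parallel lists makes the one-gloss-per-word constraint easy to check before anything is printed.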
Abbreviation   Function            Example
COMP           complementizer      to
COP            copula              da, desu
FP             final particle      ne, sa
GEN            genitive            no
GER            gerund              -te
NEG            negative            nai
NOM            nominalizer         no, koto
OBJ            object              wo
PASS           passive             rare
PAST           past tense          -ta
POT            potential/ability   rare
PRES           present tense       -ru, -i
Q              question            ka
QUOT           quotation           tte
SUBJ           subject             ga
TOP            topic               wa

Table 1.1: Translations for Japanese grammatical morphemes

Supplying word-by-word translations for Japanese language examples is made difficult by the fact that many Japanese morphemes do not translate directly into individual English words. This is the case, for example, with the morphemes wa, da, and to in example (1.1). Rather than translating morphemes such as these into English, we simply supply abbreviations of their grammatical and discourse function, using the notation listed in Table 1.1.

Finally, a note on personal pronouns. Inspired by Wolters (2001), our policy throughout this book is to refer to the speaker as “she” (mnemonic: She=Speaker) and to the addressee as “he” (mnemonic: He=Hearer).
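For readers who process the examples by program, the notation in Table 1.1 amounts to a simple lookup table. The sketch below copies the abbreviations and functions from Table 1.1; the names `ABBREVIATIONS` and `expand_gloss` are hypothetical and serve only to illustrate how a gloss line might be expanded.

```python
# Abbreviations and functions copied from Table 1.1.
ABBREVIATIONS = {
    "COMP": "complementizer",
    "COP": "copula",
    "FP": "final particle",
    "GEN": "genitive",
    "GER": "gerund",
    "NEG": "negative",
    "NOM": "nominalizer",
    "OBJ": "object",
    "PASS": "passive",
    "PAST": "past tense",
    "POT": "potential/ability",
    "PRES": "present tense",
    "Q": "question",
    "QUOT": "quotation",
    "SUBJ": "subject",
    "TOP": "topic",
}


def expand_gloss(gloss):
    """Replace each abbreviation with its function name.

    Ordinary English glosses (e.g. "think") are returned unchanged.
    """
    return [ABBREVIATIONS.get(item, item) for item in gloss]


# Word-by-word line of example (1.1).
gloss_line = ["well", "here", "TOP", "all right", "COP", "COMP", "think",
              "however"]
print(expand_gloss(gloss_line))
```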
Part I
The CHJ corpus

CHAPTER 2
Corpora and conversation

Contents
2.1 Introduction to language corpora
  2.1.1 The role of the corpus in linguistics
  2.1.2 Basic features of corpora
  2.1.3 Annotated corpora
2.2 Speech corpora
  2.2.1 Spoken vs. written language
  2.2.2 Planned speech
  2.2.3 Pragmatic or task-oriented dialogues
  2.2.4 Casual conversations
2.3 Characteristics of conversation
  2.3.1 Turn-taking behavior
  2.3.2 Backchannel behavior
  2.3.3 Disfluencies
  2.3.4 Conversational structure

2.1 Introduction to language corpora

2.1.1 The role of the corpus in linguistics
Corpus linguistics refers to the systematic study of large collections of text or speech for the purpose of formulating and empirically testing scientific hypotheses about language (McEnery and Wilson 1996; Biber et al. 1998).

Corpus-based research has a long history in 19th- and 20th-century linguistics (see McEnery and Wilson 1996, pp. 2-4). Even before the development of digital computers, linguists made use of corpora in order to compile grammars and study language acquisition by children. Later, in the 1960s and 1970s, these empirical, corpus-based approaches fell out of favor, as linguists adopted the more rationalist program inaugurated by Chomsky (1957).

By the 1990s, however, the popularity of corpus linguistics had rebounded, due in part to a shift in interest away from isolated sentences and onto longer, connected streams of text and speech. At the material level, the growth of corpus linguistics was also fueled by the increasing availability to researchers of two resources: very large corpora of text and speech, and the computer power and storage capacity required for analysis of these large data sets.

One manifestation of this empirical turn in linguistics was the founding, in 1992, of the Linguistic Data Consortium (LDC) at the University of Pennsylvania. The LDC serves as a repository and clearinghouse for hundreds of corpora, including our CallHome Japanese collection of telephone conversations.

Types of linguistic data

The corpus is not the only source of language data available to the linguist. It is therefore appropriate to consider how the corpus differs from other types of language data, and what kinds of linguistic inquiry can benefit from corpus data.

Our discussion of this topic is based on Williams’ (1996, p. 4) taxonomy of linguistic data. Williams classifies language data as systematic or unsystematic, depending on whether or not the data are collected or sampled in an organized, objective, and premeditated fashion. Generally speaking, systematic data are necessary in order to make quantifiable generalizations, whereas non-systematic data provide attestation, or a kind of ‘existence proof,’ for qualitative observations.

The debate over the relative merits of quantitative versus qualitative research methods is a familiar one in the social sciences. On the one hand, quantitative researchers point out that results obtained from small uncontrolled samples often cannot be replicated and do not extend reliably to the general population. On the other hand, qualitative methods are capable of generating objectively verifiable claims and have led to important discoveries.
In fact, the methodologies are often complementary: initial, qualitative insights and theories are subjected to empirical testing, at which point they are verified, falsified, or refined as necessary. Part II of this book, which subjects a number of theoretical claims in the Japanese linguistics literature to empirical testing using the CHJ data, is an example of this process at work.

Non-systematic linguistic data

The most familiar type of non-systematic linguistic data in Williams’ taxonomy is the linguist’s own introspection. As Chomsky often points out, there is a vast reservoir of linguistic knowledge in the mind of every native speaker that is readily available via introspection, and which any linguistic theory will have to account for. “The problem for the grammarian,” suggests Chomsky (1965, p. 20), “is to construct a description, and, where possible, an explanation, for the enormous mass of unquestionable data concerning the linguistic intuition of the native speaker (often, himself).”

One advantage of introspective linguistic data is that it provides the researcher immediate access to hypothetical examples of language, both well- and ill-formed, with which to test hypotheses. On the other hand, linguistic intuitions can vary considerably from speaker to speaker, and there is always the danger that the researcher will favor, consciously or not, those data which help to support his or her claim (Schütze 1996).

A second type of non-systematic data is anecdotal evidence, which refers to data that the linguist comes upon (reads, overhears, etc.) by chance. Such data were an important source for Fromkin (1973) in her research on slips of the tongue and other speech errors.

Systematic linguistic data

The two types of systematically gathered linguistic data in Williams’ taxonomy are elicited data and corpus data. Elicitation experiments involve the active acquisition of a specific type of data from native speakers in order to test a predetermined hypothesis. Svartvik and Quirk (1980) distinguish between performance elicitations, which are used to test hypotheses about language production, and judgment elicitations, which are used to test language perception. Asking a subject to read aloud a list of words into a tape recorder is a common type of performance elicitation, whereas asking a native speaker about the well-formedness or naturalness of an expression is a common type of judgment elicitation.

The second type of systematic linguistic data is the corpus. A corpus is usually defined as a large but coherent collection of text or speech.
In other words, rather than being just a concatenation of random text or speech samples, a corpus reflects some particular, well-defined language variety or varieties. In contemporary linguistics, corpora are also of necessity machine-readable, so that computers can be used to count, classify, and search for relevant linguistic features.

The two most important properties of a corpus to a linguist are its representativeness and its size. To yield valid results, the text or speech in a corpus must be sampled so as to be maximally representative of the type of language under consideration. In addition, the corpus should contain “as large a mass of data as is practicable, with the aim of smoothing out any potential bias due to the operation of linguistic variables” (Williams 1996, p. 6). Of course, no database can possibly be large enough to yield generalizable results for every conceivable linguistic inquiry. The results of a corpus experiment are necessarily circumscribed by the limitations of the corpus itself.

Corpus                          Medium             Tokens      Types
Complete works of Shakespeare   written            884,647     29,066
Brown corpus                    written            1,000,000   61,805
Switchboard corpus              telephone speech   2,400,000   20,000

Table 2.1: Examples of English corpora (Jurafsky and Martin 2000)

A corpus has both qualitative and quantitative applications in linguistics. In qualitative research, the corpus is searched in order to identify and describe aspects of usage and to provide ‘real-life’ examples of particular phenomena. In quantitative research, linguistic features are classified, counted, and correlated with other features, and statistical techniques are used to explain observations and to generalize findings to larger populations. It is this capacity to support objective, quantitative measurement of linguistic phenomena that sets the corpus apart from other types of linguistic data.

Another unique advantage afforded by corpus data is that the linguistic features to be investigated need not be spelled out in advance of the compilation of the corpus. This means that when surprising results or observations emerge from an experiment, new theories and hypotheses can be formulated and tested without compiling a new data set.

2.1.2 Basic features of corpora
The size of a language corpus is often described in terms of the number of words it contains. However, it is sometimes useful to describe a corpus in units other than words. The number of types in a corpus corresponds to the size of its lexicon; i.e., the number of distinct words found in the corpus. The number of tokens is the sum of the instances of all types in the corpus, i.e. the total number of running words.

A similar distinction is involved in the notions of wordform and lemma. A wordform is an inflected form of a word as it appears in the corpus. A lemma refers to a set of lexical forms which share the same stem, the same major part-of-speech, and the same sense (much like a headword in a dictionary).

Table 2.1 lists the number of wordform tokens and wordform types found in three large, well-known English language corpora (Jurafsky and Martin 2000, p. 195). The Switchboard corpus and the Brown corpus from Table 2.1 are two well-known instances of a speech corpus and written language corpus, respectively. The Switchboard corpus (Godfrey et al. 1992), compiled by Texas Instruments, is a collection of 2,430 recorded telephone conversations between strangers. The Brown corpus (Kucera and Francis 1967), compiled at Brown University starting in 1961, is credited with being the first large machine-readable corpus of English writing. It contains sample texts from a variety of genres such as fiction, newspaper articles, and scholarly writing.
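The type, token, and lemma counts defined in this section can be made concrete with a short Python sketch. Everything in it is an assumption made for illustration: the sentence is invented, whitespace splitting stands in for real tokenization, and the hand-made `LEMMA_OF` table ignores part-of-speech and sense, which a genuine lemmatizer would have to take into account.

```python
from collections import Counter


def type_token_counts(words):
    """Return (number of tokens, number of types) for a list of wordforms."""
    counts = Counter(words)
    n_tokens = sum(counts.values())  # total number of running words
    n_types = len(counts)            # number of distinct wordforms
    return n_tokens, n_types


# Hypothetical lemma table; wordforms not listed are their own lemma.
LEMMA_OF = {"sees": "see", "saw": "see", "dogs": "dog"}


def lemma_count(words):
    """Return the number of distinct lemmas among the wordforms."""
    return len({LEMMA_OF.get(w, w) for w in words})


# Invented toy text, not taken from any corpus.
words = "the dog sees the dogs and the dogs saw the dog".split()

tokens, types = type_token_counts(words)
print(tokens, types, lemma_count(words))  # 11 tokens, 6 types, 4 lemmas
```

On this toy input the three counts diverge in the expected direction: grouping wordforms into lemmas can only reduce the number of distinct items, just as counting types can only give a number no larger than the token count.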
Annotated corpora
In a linguistically annotated corpus, the individual morphemes, words, or sentences are marked with one or more phonetic, syntactic, semantic, or discourse classification tags.

The most common type of corpus annotation is the part-of-speech (POS) tag. Traditional POS categories include noun, verb, pronoun, adverb, and adjective. For purposes of annotating every word in a corpus, however, a larger and more fine-grained inventory of POS categories is typically required. The Brown corpus, for example, was tagged using an inventory of 87 English POS categories (Francis and Kucera 1982). A corpus containing accurate POS annotations is useful as training data for computer programs called taggers that automatically determine the POS categories of their language input (Jurafsky and Martin 2000). Automatic POS taggers are used in a number of language technologies including speech recognition, information retrieval, and spelling correction.

A second common type of annotation, this time at the sentence level, is the syntactic parse tree. A corpus of syntactically parsed sentences is called a treebank. The best known example is the Penn Treebank (Marcus et al. 1993), a collection of parsed sentences taken from several existing corpora, including Brown and Switchboard. The syntactic trees in the Penn Treebank were parsed by computer and then hand-corrected by linguists. Treebanks are useful in NLP as training data for probabilistic parsing programs (Manning and Schutze 1999).

Word senses are another type of linguistic information sometimes found in annotated corpora. Word sense annotations specify the particular sense of each word token in a corpus, based on some pre-existing inventory of meanings. For example, the WordNet electronic dictionary (Fellbaum 1998) lists two senses for the noun tree, one meaning ‘tall woody plant’ and the other meaning ‘diagram’ (the sense typically associated with linguistics texts). A large sense-tagged corpus of English is described by Kilgarriff and Rosenzweig (2000).

Recently, attempts have been made to annotate natural dialogue corpora with tags that indicate which ‘dialogue game move’ was intended by the speaker. For example, Levin et al. (1999) annotated the CallHome Spanish corpus with dialogue game moves drawn from the following set of eight basic types: seeking information, giving information, giving directive, action commit, giving opinion, expressive, seeking confirmation, and communication filler.

In Chapter 4 we describe the annotations we made to the CHJ corpus in order to facilitate our research into ellipsis and wa-marking. These annotations include a combination of phonetic, acoustic, POS, semantic, and word sense tags.

Reliability of corpus annotations

Linguistic analyses that are based on corpus annotations can only be as meaningful, accurate, and reliable as the annotations themselves. Meaningful annotations are those that adequately capture the distinctions they were designed to make. Reliable annotations are those for which the same annotations can be obtained on successive trials. Reliability is especially important for hand-annotated linguistic data. If human coders cannot agree on how to tag specific cases, then the validity of the coding system, and of any experimental results derived from it, is open to question.

Techniques for determining the reliability of human-coded linguistic annotations are discussed by Lampert and Ervin-Tripp (1993) and by Carletta (1996). One common method is to calculate the proportion of coder agreement; that is, the number of times the coders agree on a tag divided by the total number of cases. The closer the proportion is to 1, the higher the degree of inter-annotator agreement and reliability.

However, the proportion of agreement alone is insufficient as a reliability judgment, because it fails to take into account chance agreement. After all, if two annotators were to classify cases into one of two categories at random, with equal probability, we would expect their judgments to agree half the time simply by chance. In order to determine the reliability of human tagging, then, one must control for expected chance agreement.
A descriptive statistic that is used for this purpose is the kappa statistic (Cohen 1960; Siegel and Castellan 1988). Kappa (K) represents the proportion of observed agreement that is not attributable to chance. It is computed as the ratio of the proportion of agreement (corrected for chance agreement) to the maximum possible proportion of agreement (also corrected for chance agreement):

\[ K = \frac{P(A) - P(E)}{1 - P(E)} \]

Here P(A) is the proportion of times that the coders agree and P(E) is the proportion of times that we would expect them to agree by chance.

Siegel and Castellan (1988, pp. 284-291) note that the K statistic in fact represents a family of agreement measures, in which P(A) and P(E) might be computed in various ways depending on the particular coding scheme used. In Section 4.4.4 we address the issue of the reliability of the hand-coded predicate-argument annotations that we performed on the CHJ corpus.
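To make the computation concrete, the following is a minimal sketch for the simplest case: two coders and a nominal tag set. It assumes that P(E) is estimated from each coder's own tag frequencies, as in Cohen's (1960) formulation; this is only one member of the family of measures mentioned above and is not necessarily the variant applied to the CHJ annotations. The function name and the toy tag sequences are invented for illustration.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Return Cohen's kappa for two equal-length sequences of nominal tags."""
    if len(tags_a) != len(tags_b) or len(tags_a) == 0:
        raise ValueError("both coders must tag the same non-empty set of cases")
    n = len(tags_a)

    # P(A): proportion of cases on which the two coders chose the same tag.
    p_agree = sum(a == b for a, b in zip(tags_a, tags_b)) / n

    # P(E): agreement expected by chance, estimated from each coder's own
    # tag frequencies (Cohen's 1960 formulation; an assumption of this sketch).
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)
    p_chance = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_agree - p_chance) / (1 - p_chance)


# Invented toy data: ten cases tagged N(oun) or V(erb) by two coders.
coder_1 = list("NNNNNVVVVV")
coder_2 = list("NNNNVVVVVN")
print(round(cohen_kappa(coder_1, coder_2), 3))
```

In this toy example the coders agree on 8 of 10 cases, so P(A) = 0.8. Each coder uses each tag half the time, so P(E) = 0.5, and K comes out at about 0.6, noticeably lower than the raw proportion of agreement.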
2.2 Speech corpora
A corpus typically contains samples of either spoken language or written language, but not both. This separation is a sensible one, given our stipulation in Section 2.1.1 that a corpus should be maximally representative of the style of language being studied. As we will see below, spoken and written language often exhibit considerable differences.

A spoken-language corpus may contain either recorded speech (that is, actual acoustic data, such as Switchboard), or written transcriptions of spoken language. The London-Lund corpus (Svartvik and Quirk 1980; Svartvik et al. 1982) is an example of the latter: it contains prosodically-annotated transcriptions of conversations, but not the original speech data. Ideally, of course, a corpus would offer both of these formats together.

A true speech corpus, then, contains actual recorded speech, generally in a format suitable for phonetic and prosodic analysis in the laboratory. The LDC distributes its speech corpora on compact discs containing digitized speech data that is readable by speech processing software such as ESPS/waves+ (Entropic 1999) (Section 3.2).

2.2.1 Spoken vs. written language

The language found in speech corpora like Switchboard usually differs in a number of respects from that found in collections of writing like the Brown corpus.¹ First of all, spoken language is typically more terse, less grammatically correct, less well-structured, and more ambiguous than text (Tannen 1982; Brown and Yule 1983). Natural speech is replete with extra-grammatical disfluencies such as pauses, interruptions, hesitations, false starts, repetitions, and repairs. Spoken language also tends to be highly elliptical. In conversation, a speaker is unlikely to respond to a question using a full grammatical sentence; more likely, the speaker will answer with a partial phrase such as oh, just to the store, or no, sorry. In Chapter 5 of this book we thoroughly investigate two types of ellipsis in the CHJ conversations: argument ellipsis and particle ellipsis. The casual, elliptical style of the CHJ corpus makes it an ideal data set for studying ellipsis in Japanese.

¹ We emphasize that these differences are only tendencies, not defining features. Quantitative studies by Biber (1988) demonstrate that the genre or type of a text is a better predictor of its lexical and grammatical properties than the medium of the text (spoken or written).

A number of researchers have observed particular lexical tendencies that distinguish speech from writing. Studies of comparable stretches of spoken and written English have found the spoken texts to exhibit fewer words per sentence, fewer syllables per word, more one-syllable words, more references to people, and fewer attributive adjectives (Drieman 1962; Gruner et al. 1967). Another common observation is that speech exhibits less lexical variety (that is, a smaller vocabulary) than writing. Examination of Table 2.1 on page 12, for instance, reveals that the Switchboard speech corpus contains considerably fewer wordform types than the two written corpora (Brown and Shakespeare), even though the overall number of tokens in Switchboard is far greater. Halliday (1989, 1994) observes that written language displays not only greater lexical variety, but also greater lexical density, as measured by the number of nouns and other open-class or ‘lexical’ words per clause. In speech, on the other hand, clauses tend to contain more functional words such as determiners, pronouns, prepositions, and conjunctions.

Finally, it has been observed that spoken and written discourse often differ with respect to the degree of emotional involvement they display (Chafe 1982; Tannen 1982; Maynard 1993). In written language, the focus tends to reside on the content of the message being conveyed. Spoken language, on the other hand, often reveals the emotional involvement of the speaker in the content of her message. This involvement is manifested linguistically by more frequent use of first-person reference, more overt expression of mental states, and greater use of emphatic expressions. In Japanese, this emotional involvement manifests itself most notably through the use of discourse particles (Section A.4).

2.2.2 Planned speech
Some speech corpora consist of planned speech of one kind or another. The Lancaster/IBM corpus (Knowles et al. 1996), for instance, offers 50,000 words of planned monologues in British English. A much larger corpus of Japanese monologues is under development by Furui (2000). Some planned speech corpora are simply sequences of isolated utterances for use in speech recognition research. Examples include the JEIDA corpus of Japanese (Itahashi 1991) and the TIMIT corpus of American English (LDC 1993).
The distinction between planned and unplanned speech can be an important one in linguistic research. Whether an utterance is spontaneous or planned in advance is known to affect a number of its linguistic features, from discourse structure (Ochs 1979) to prosody (Beckman 1997). In the case of prosody, there have been a number of cases where laboratory results obtained with spontaneous, natural speech data turned out to diverge substantially from results obtained with planned, experimentally controlled speech (Yaeger-Dror 1985; Silverman et al. 1992). We will see another example of this phenomenon at work in our examination of the intonation of Japanese topics and subjects in Section 6.3.2.

2.2.3 Pragmatic or task-oriented dialogues

The telephone conversations examined in this book represent casual, rather than pragmatic, interactions. This typological distinction has important consequences for the structure and language style of the conversations.

A pragmatic dialogue or conversation is motivated by a clear pragmatic purpose; for example, buying or selling, seeking help, or making an appointment. Brown and Yule (1983) call these transactional, as opposed to interactional, dialogues. Pragmatic interactions tend to be more formal in tone than casual conversations, and they typically come to an end once the pragmatic goal has been achieved.

One type of pragmatic interaction is the task-oriented dialogue, in which the participants attempt to complete some well-defined task such as assembling a piece of machinery together. Grosz (1977) shows how the structure of a task-oriented dialogue parallels the structure of the task, with sub-dialogues corresponding to sub-tasks and so on.

Several large corpora of pragmatic or task-oriented conversations in English have been developed, including:

• The Edinburgh HCRC Map Task corpus (Anderson et al. 1992), a collection of short two-person dialogues in which one speaker tries to draw a route on a map, following the instructions of another speaker.
• The MADCOW/ATIS corpus of human-computer airline reservation dialogues (Hirschman et al. 1993).
• The Air Traffic Control corpus of radio traffic between controllers and pilots (LDC 1994).
• The TRAINS railroad transportation route-planning dialogues (Heeman and Allen 1995).
Japanese task-oriented dialogues

There are also several collections of Japanese task-oriented dialogues. These include a Japanese version of the Map Task corpus (Horiuchi et al. 1999), and the ATR dialogue database, or ADD (Ehara et al. 1990). The ADD data include dialogues between a travel agent and her customers, and dialogues between the secretariat and the participants of an international conference. The ADD dialogues are simulated; the participants were simply role-playing, not actually trying to carry out these tasks in the real world. Some of the ADD exchanges are spoken, while others were typed on a keyboard.

ATR has also developed an annotated corpus of 500 spoken task-oriented dialogues, called the ATR-ITL speech and language database (Takezawa et al. 1998). The ATR-ITL dialogues are annotated with coreference tags that specify, among other things, explicit referents for elided (i.e., ‘omitted’) arguments to predicates (Section 5.3).

2.2.4 Casual conversations

Casual conversations exhibit a number of differences from the pragmatic or task-oriented dialogues discussed above. First of all, casual conversations are not motivated by a clear pragmatic purpose. Stylistically, casual conversations tend to display more informality, humor, and colloquial language than pragmatic ones (Eggins and Slade 1997). Chatting, gossiping, and ‘shooting the breeze’ are all forms of casual conversation.

The Japanese telephone conversations used as data for this book are examples of a particular genre of casual conversation that Pridham (2001) labels comment and elaboration. This genre is characteristic of informal conversations between speakers who know each other well. Pridham (2001, p. 64) associates the following eleven features with comment and elaboration dialogues:

1. Topics switch freely.
2. Topics are often provoked by what speakers are doing, by objects in their presence, or by some association with what has just been said.
3. There does not appear to be a clearly defined purpose for the conversation.
4. All speakers can introduce topics and no one speaker appears to control the conversation.
5. Speakers comment on each other’s statements.
6. Topics are only elaborated on briefly, after follow-up questions or comments from listeners.
7. Comments in response to a topic often include some evaluation.
8. Responses can be very short.
9. Ellipsis is common.
10. The speaker’s cooperation is often shown through speaker support and repetition of each other’s vocabulary.
11. Vocabulary typical of informal conversation is used, for example cliches, vague language, and taboo language.

The Switchboard corpus is perhaps the best known collection of casual conversational speech data in English. Conversational speech data for several other languages are also available through the CallHome family of corpora (Section 3.1), which includes the CHJ corpus.
2.3 Characteristics of conversation
The data set for this book, the CHJ corpus, is a collection of recorded Japanese telephone conversations. It is therefore appropriate here, in the final section of this chapter, to reflect briefly on what conversation is and how it is different from other types of language use. We will focus our attention on four essential characteristics of conversation: turn-taking behavior, backchannel expressions, speaker disfluencies, and conversational discourse structure.

2.3.1 Turn-taking behavior

In idealized models of dialogue, one person speaks while the other listens. In real life, conversation is more chaotic, and gaps and overlaps in speech are common. Nonetheless, participants in a conversation do engage in sophisticated turn-taking behavior in order to determine who currently has ‘the floor,’ who gets to speak next, and for how long.

Conversational turn-taking is sometimes said to be projective rather than reactive (Sacks et al. 1974; Clark 1996). In other words, speakers do not simply wait until the current turn has ended before initiating their own turn; rather, they actively exploit linguistic and nonverbal clues in the conversation in order to time or ‘project’ the end of the current turn, thus minimizing the gaps and overlaps in speech.

A variety of linguistic cues are used to help coordinate turn-taking, including intonation, syntax, and pauses. Turn transitions tend to occur at the boundaries of syntactic constituents such as sentences, clauses, phrases, or one-word constructions (Sacks et al. 1974). In addition, a speaker will generally signal the end of her turn intonationally, for example by a characteristic rise or fall in pitch or an elongation of the final syllable.

There is a considerable literature on Japanese conversational turn-taking behavior, both verbal and non-verbal; however, since it is not directly relevant to this book, we will not review it here. Interested readers are referred to Hayashi (2003), Hayashi and Mori (1998), Hinds (1982b), Maynard (1989), Koiso et al. (1998a), Uchida (1998), Mori (1999), and Tanaka (1999).

2.3.2 Backchannel behavior
Clark (1996) observes that participants in a conversation appear to operate in two tracks simultaneously. First is the ‘official’ track, in which we find the ideational or propositional content of the interaction. The second, meta-communicative track is used for managing the conversation that is taking place in track 1. Speakers use track 2 to ask for confirmation, invite sentence completions, make repairs, and reveal their intentions and interpretations.

Use of track 2 is not restricted to the speaker who currently has the floor, however. Listeners also make use of track 2 in order to signal understanding of what the speaker is saying, and to provide ‘continuer’ signals (Schegloff 1982) which encourage the speaker to continue talking. When track 2 is used for listener response in this way, it is referred to as the conversational backchannel (Yngve 1970; Jefferson 1984).

The backchannel is home to a variety of verbal behaviors, from sympathetic chuckles to expressions of understanding such as uh-huh, mhm, and yeah. A number of nonverbal backchannel behaviors have also been investigated, including gesticulation, head nods and shakes, and the direction and duration of gaze (Duncan and Fiske 1977).

Japanese backchannel behavior

It is often observed (Hayashi 1996; Maynard 1997; Mori 1999) that backchannel behavior is extremely common among Japanese speakers. There is even a commonly-known metaphorical term for this behavior (aizuti, the sound of two blacksmiths hammering iron in turn), which would seem to imply a kind of folk status for conversational backchannel behavior among Japanese (Mori 1999, p. 13). Indeed, there is evidence that Japanese speakers engage in both verbal and nonverbal backchannel behaviors more frequently than do Americans (Maynard 1989) or Australians (Elzinga 1978).

The most frequent form of verbal backchannel behavior in Japanese is the articulation of short affirmative expressions such as un (‘uh-huh’), hontou (‘really’), hai (‘yes’), ee (‘yeah’), and sou (‘right/indeed’) (Ward 1998). The most common backchannel expression in Japanese is un, roughly translatable as ‘uh-huh’ in English. The un backchannel sound is extremely common in Japanese and has been characterized as “almost like background music during the speaker’s talk” (Maynard 1989, p. 162). In fact un, at 15,236 tokens, is the most frequent word type in the CHJ corpus by a wide margin.

2.3.3 Disfluencies
Natural conversation is filled with disfluencies, or disruptions in the fluent presentation of speech (Levelt 1983, 1989; Clark 1996). Given the highly colloquial nature of the CHJ telephone conversations, it is not surprising that speaker disfluencies are ubiquitous in the CHJ data.

Disfluencies in CHJ can be organized into two major classes: repairs and hesitation sounds. We illustrate both types of disfluency below using examples from the CHJ corpus.

Repairs

A repair is a type of disfluency in which the speaker interrupts her speech production, returns to an earlier point in her current utterance, and then restarts the utterance from that point. Those cases in which the speaker does not actually change the content of her utterance are often labeled restarts, false starts, or repetitions rather than repairs. However, the term repair is typically used generically to refer to all instances of stopping and restarting.

Two examples of repairs from the CHJ corpus are given in (2.1) and (2.2). Note that the = symbol is used to mark partial word tokens in the CHJ transcripts (Section 3.4.3).

(2.1)  datte  tii=  tiisai  hikouki  zya  nai  no?
       but    sma=  small   plane    COP  NEG  Q
       ‘But isn’t it a small plane?’  (1899; 587)

(2.2)  samu=  sonnani  samuku  nai,  ima
       col=   so much  cold    NEG   now
       ‘It’s not so cold now.’  (1899; 720)
In (2.1), the speaker restarts the word tiisai (‘small’). In (2.2), the speaker apparently begins to say samuku nai (‘not cold’), but then repairs her utterance to the weaker assertion sonnani samuku nai (‘not so cold’).

Repairs are a common occurrence in spontaneous speech. The percentage of utterances that contain repairs has been estimated for a number of different speech corpora: figures of 10% or less are cited for the MADCOW/ATIS corpus (Shriberg et al. 1992) and for the Japanese ADD corpus (Murakami and Sagayama 1991), while 25% is reported for the TRAINS dialogues (Heeman and Allen 1995), and 34% for a collection of spontaneous Dutch dialogues (Levelt 1983). We have not attempted to determine the total number of repairs in the CHJ corpus. However, the CHJ corpus does contain 2,530 partial word tokens marked with =. These partial word tokens are found in 2,373 of 37,716 speaker turns (about 6% of turns).
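Counts of this kind are straightforward to obtain once the = convention is in place. The sketch below is illustrative only: it assumes that each speaker turn is available as a romanized, whitespace-tokenized string, which is a simplification and does not model the actual CHJ transcript format. The function name and the third sample turn are invented.

```python
def partial_word_stats(turns):
    """Count '='-marked partial words and the turns that contain them.

    `turns` is assumed to be a list of romanized, whitespace-tokenized
    speaker turns; the real CHJ transcript format is not modeled here.
    """
    partial_tokens = 0
    turns_with_partials = 0
    for turn in turns:
        # Strip trailing punctuation so that a token such as 'tii=,' still counts.
        hits = sum(1 for tok in turn.split() if tok.rstrip(",.?!").endswith("="))
        partial_tokens += hits
        if hits:
            turns_with_partials += 1
    return partial_tokens, turns_with_partials


# Examples (2.1) and (2.2) from the text, plus one invented turn without a repair.
sample_turns = [
    "datte tii= tiisai hikouki zya nai no?",
    "samu= sonnani samuku nai, ima",
    "un sou da ne",
]
print(partial_word_stats(sample_turns))

# Proportion reported in the text: 2,373 of 37,716 speaker turns.
print(f"{2373 / 37716:.1%}")
```

The last line simply reproduces the arithmetic behind the figure of about 6% of turns (more precisely, 6.3%).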
Hesitation sounds

Hesitation sounds, also called ‘filled pauses,’ are sounds or words that are associated with utterance planning or hesitancy on the part of the speaker. English hesitation sounds include vocalizations such as um, uh, and er, as well as lexical expressions like well... and y’know. In Japanese, hesitation sounds include vocalizations like aa and ee, as well as lexical items like ano (‘that’) and nanka (‘what’).

The CHJ corpus contains thousands of examples of hesitation sounds. Obtaining an exact count would be difficult, since tokens such as ano (‘that’) and nanka (‘what’) are not always fillers. Vocalizations, on the other hand, are more easily quantified, since all non-lexical vocalizations are marked with the % symbol in the CHJ transcripts (Section 3.4.3). This is illustrated in utterance (2.3).

(2.3)  %eeto  rondon  zyanakute  %eeto  nan   da   kke?  bosuton  ka
       %umm   London  is not     %umm   what  COP  Q     Boston   Q
       ‘Umm, it’s not London. Umm, what was it again? Boston?’  (1032; 558)
Many of the %-marked vocalizations in the CHJ transcripts, including eeto and uunto, are typical utterance-initial hesitation sounds. Others, notably aa, function as backchannel utterances as well as hesitation sounds. Still others, such as hee and ara, express surprise or disbelief rather than hesitation.
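Because the % marker is applied consistently, vocalizations can be tallied mechanically. The following sketch shows one way this might be done; it again assumes romanized, whitespace-tokenized turns rather than the actual CHJ transcript format, and the function name and the second and third sample turns are invented for illustration.

```python
from collections import Counter


def vocalization_profile(turns):
    """Tally %-marked vocalizations and their turn-initial/turn-final use.

    Returns (counts per vocalization, number of turns that begin with a
    vocalization, number of turns that end with one).
    """
    counts = Counter()
    turn_initial = 0
    turn_final = 0
    for turn in turns:
        tokens = turn.split()
        if not tokens:
            continue
        for tok in tokens:
            if tok.startswith("%"):
                # Drop the marker and any trailing punctuation.
                counts[tok[1:].rstrip(",.?!")] += 1
        turn_initial += tokens[0].startswith("%")
        turn_final += tokens[-1].startswith("%")
    return counts, turn_initial, turn_final


sample_turns = [
    "%eeto rondon zyanakute %eeto nan da kke? bosuton ka",  # utterance (2.3)
    "%aa sou",      # invented
    "hontou %hee",  # invented
]
counts, n_initial, n_final = vocalization_profile(sample_turns)
print(counts.most_common(), n_initial, n_final)
```

A turn consisting of a single vocalization is counted as both turn-initial and turn-final under this scheme, which is a design choice of the sketch rather than a documented property of the CHJ counts.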
Token    Count    Token          Count
aa       2,175    hou               28
aq       1,442    kyaa               9
haa        574    hyaa               7
huun       540    kaq                6
eq         484    kuu                4
hee        276    heheen             3
ee         238    atya               2
ara        135    atta               1
uun        132    ahaha              1
eeto       112    katyon             1
oo          60    huhu               1
ha          49    hun                1
uu          42    waa                1
uunto       30    [illegible]        1
u           29    Total          6,384

Table 2.2: Vocalizations in the CHJ transcripts

Table 2.2 lists the complete inventory of %-marked vocalizations found in the CHJ transcripts. The transcripts contain a total of 6,384 vocalization tokens. These tokens are found within 5,608 of the 37,716 speaker turns.

The vocalizations in the CHJ corpus tend to occur at the beginning of speaker turns: 4,618 speaker turns begin with vocalizations, while only 1,927 turns end with vocalizations (these counts include 206 turns that both begin and end with vocalizations). This result is consistent with the finding by Takezawa et al. (1994) that nearly 80% of hesitation sounds in their data (the ATR Speech and Language Database) occurred in sentence-initial positions.

2.3.4 Conversational structure
When the text or transcript of a conversation is examined, various aspects of the structure of the conversation reveal themselves. In this section, we briefly consider four types of discourse structure that are associated with conversation. These are (i) openings and closings, (ii) adjacency pairs, (iii) discourse cohesion, and (iv) thematic structure.
Openings and closings

At the most basic level, a conversation consists of an opening, a body, and a closing (Schegloff and Sacks 1973). The opening sequence is typically initiated by one of the participants using a greeting or summons like hello, excuse me, or good morning. Such greetings are examples of conversational routines: fixed, formulaic expressions that are closely bound to a specific function or communication situation (Coulmas 1981; Aijmer 1996). In the case of telephone conversations, the speakers typically take time to verbally establish their identities (since they cannot see each other), their roles in the conversation, and the nature of the business at hand (Schegloff 1968).

We will not concern ourselves with conversational openings and closings in this book, since our CHJ transcripts do not include them. As we will see in Section 3.4, the CHJ speech segments selected for transcription are taken exclusively from the body of the conversation.

Adjacency pairs

Schegloff and Sacks (1973) observe that natural conversation seems to consist largely of back-and-forth exchanges that they call adjacency pairs. These are pairs of adjacent utterances made by different speakers, the second being a response to the first. Examples of adjacency pairs include question and answer, thanks and acknowledgment, and summons and response. Many types of conversation, for instance doctor-patient interviews and courtroom interrogations, can be characterized as long sequences of adjacency pairs.

Discourse cohesion

Telephone conversations, like other genres of discourse, typically exhibit a certain amount of order and structure in the way succeeding utterances ‘flow’ together. This is a reflection of the coherence or cohesion of the discourse.

Cohesion is achieved in a number of ways. Halliday and Hasan (1976) distinguish between lexical cohesion, by which a discourse is held together by clusters of semantically related words, and referential cohesion, which relies on coreference links such as those between pronouns and their antecedents in the discourse.

Discourse cohesion is also maintained through the use of discourse markers (Schiffrin 1987), words or phrases that make explicit the boundaries and underlying relations between sentences or other units of language. Examples of discourse markers in English include but, and, oh, well, although, anyway, and I mean.
Thematic structure

Earlier in this section we divided conversations into three parts: an opening, body, and closing. Traditionally, the transition from the opening to the body of a casual conversation is taken to be the point at which the initiator of the conversation introduces the first discourse topic for discussion. After discussing the first topic, the participants then proceed to take up new topics, one after another, until they mutually decide to close the conversation. Structurally, each new discourse topic can be said to define a new section of the conversation, with sub-topics corresponding to sub-sections, and so on, forming a tree-like hierarchy (van Dijk 1985).

The topic of a section of discourse is whatever the discourse is ‘about’ at that point: the particular person, object, or proposition about which the speaker is either providing or requesting information. Casual conversations, as we noted in Section 2.2.4, typically exhibit frequent changes in discourse topic.

A number of linguistic behaviors are known to accompany the transition from one topic to another in a conversation (Clark 1996). These include:

• A substantial lapse in the conversation, possibly accompanied by backchannel behavior.
• Minimal or abbreviated responses to adjacency pair initiatives, indicating the speaker’s desire to ‘move on’.
• A formulation, summarization, or evaluation of the topic just discussed.
• Use of specific discourse markers like anyway, by the way, and that reminds me that are associated with topic transition.
CHAPTER 3
The CHJ corpus

Contents
3.1 The LDC CallHome corpora
3.2 About the CHJ corpus
3.3 About the speakers
3.4 The CHJ transcripts
3.4.1 Morphological segmentation
3.4.2 Size of the CHJ corpus
3.4.3 Other transcription conventions
3.4.4 Alterations to the transcripts

3.1 The LDC CallHome corpora
This chapter introduces the CallHome Japanese (CHJ) speech corpus (LDC 1996), a collection of 120 recorded telephone conversations that serves as the basis of our annotated corpus. The next chapter, Chapter 4, describes the linguistic annotations that we subsequently added to the original CHJ data described in this chapter.

CallHome refers to a family of speech corpora¹ that were assembled by the LDC beginning in 1995. The collection was sponsored by the Large Vocabulary Conversational Speech Recognition (LVCSR) project of the U.S. Department of Defense. CallHome corpora were developed for the following languages: American English, Egyptian Arabic, German, Japanese, Mandarin Chinese, and Spanish.

¹ http://www.ldc.upenn.edu/ldc/about/callhome.html

All of the CallHome corpora consist of telephone calls that were initiated from the U.S. or Canada and placed to the country of the caller’s choice. The participants were native speakers of their respective language who agreed to be recorded in exchange for a free long-distance telephone call of up to 30 minutes. Participants were recruited via World Wide Web postings, newspaper advertisements, on-site presentations, personal contacts, and telephone solicitation. The identities of the participants were not collected. The calls were initiated via a toll-free robot operator that completed the call only if both parties agreed to be recorded.

Each CallHome conversation was recorded on two separate channels, one each for the caller and recipient. The speech signal on each channel was digitally sampled at the rate of 8,000 samples per second, using eight bits per sample in the μ-law format (Section 4.5.1). The resulting speech data was then encoded in the NIST Sphere header format and distributed on compact disc by the LDC.
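The recording parameters above determine both the size of the audio data and how each stored byte maps to a waveform value. The sketch below works through both points. It assumes that the μ-law encoding follows the standard G.711 telephony companding scheme, which is customary for 8-bit, 8 kHz telephone speech but is an assumption here rather than something stated in the corpus documentation; the function names are invented, and the size estimate ignores the NIST Sphere header.

```python
SAMPLE_RATE_HZ = 8000   # samples per second, per channel (from the text)
BYTES_PER_SAMPLE = 1    # eight bits per sample (from the text)
CHANNELS = 2            # caller and recipient (from the text)


def raw_audio_bytes(minutes):
    """Size of the sample data alone, ignoring the NIST Sphere header."""
    return minutes * 60 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def mulaw_to_linear(code):
    """Expand one 8-bit mu-law code to a signed linear sample (G.711 style)."""
    code = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = code & 0x80               # top bit: sign
    exponent = (code >> 4) & 0x07    # next three bits: segment number
    mantissa = code & 0x0F           # low four bits: position within segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# A full 30-minute, two-channel call occupies about 28.8 million bytes.
print(raw_audio_bytes(30))

# Silence and the two extreme codes.
print(mulaw_to_linear(0xFF), mulaw_to_linear(0x00), mulaw_to_linear(0x80))
```

The logarithmic spacing of the μ-law codes is what allows eight bits per sample to cover a dynamic range that would otherwise require roughly fourteen bits of linear quantization.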
3.2 About the CHJ corpus
The research in this book is based on the 1996 release of the CallHome Japanese (CHJ) corpus (LDC 1996). The 1996 release consists of 120 spontaneous, unscripted telephone conversations between native speakers of Japanese. The conversations were recorded between June 17, 1995 and November 15, 1995.

The CHJ corpus includes transcripts of each conversation as well as digitized speech data. Each transcript covers a contiguous five- or ten-minute segment taken from a recorded conversation lasting up to 30 minutes. The transcriptions were performed by Texas Instruments and the LDC. In all, 200 telephone calls were transcribed. Of these, 80 were designated as training calls, 20 as development test calls, and 100 as evaluation test calls for purposes of the LVCSR project. For each of the training and development test calls, a contiguous ten-minute region was selected for transcription; for the evaluation test calls, a five-minute region was transcribed. The 1996 release of the CHJ corpus includes all 20 of the ten-minute development test calls, all 80 of the ten-minute training calls, but only 20 of the five-minute evaluation test calls, for a total of 120 calls. The remaining 80 test calls were held in reserve for future LVCSR benchmark tests. The transcripts are morphologically segmented; i.e., analyzed into individual morphemes (Section 3.4.1).

In addition to the speech data and transcripts, the 1996 CHJ release from the LDC also includes (i) an 80,688-word lexicon, and (ii) source code for the finite-state transduction software that was used for the morphological segmentation of the transcripts.
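The partition figures given above can be checked with a few lines of arithmetic. The sketch below merely restates the numbers from the text; the derived total of transcribed minutes is a nominal upper bound that assumes every released call contributes its full five- or ten-minute region, which the text does not guarantee.

```python
# (calls transcribed, calls in the 1996 release, minutes transcribed per call)
PARTITIONS = {
    "training":         (80, 80, 10),
    "development test": (20, 20, 10),
    "evaluation test":  (100, 20, 5),
}

transcribed = sum(t for t, _, _ in PARTITIONS.values())
released = sum(r for _, r, _ in PARTITIONS.values())
held_back = transcribed - released

# Nominal amount of transcribed speech in the release, in minutes.
released_minutes = sum(r * m for _, r, m in PARTITIONS.values())

print(transcribed, released, held_back, released_minutes)
```

The totals agree with the text: 200 calls transcribed, 120 released, and 80 held in reserve, with a nominal 1,100 minutes of transcribed conversation in the release.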
3.3
About the speakers
In this book we refer to every participant in the CHJ conversations as a speaker. In each conversation we distinguish two particular speakers: the caller, who placed the call from North America, and the recipient, who received the phone call in Japan.
Figure 3.1: Sex of callers and recipients (vertical axis: number of callers)

The 120 phone calls therefore involve a total of 120 callers and 120 recipients. Figure 3.1 presents data on the sex (as judged by the transcribers) of each caller and recipient. As the figure shows, most (83/120, 69%) of the callers were female. However, an even greater percentage (100/120, 83%) of the call recipients were female; this is because most of the male callers (23/37, 62%) placed calls to female recipients, and the vast majority of female callers (77/83, 93%) also called female recipients. The imbalance in the number of female recipients is remarkable. Anecdotal evidence suggests that many of the male callers are husbands, perhaps temporarily assigned to work in the U.S., who placed calls to their wives back in Japan. Many of the female callers, on the other hand, seemed to be single women, often students, who placed calls to their mothers or to close female friends.

In 101 of the conversations, only the caller and the recipient spoke. In 18 of the conversations, the recipient of the call passed on the phone to one or more additional speakers (usually children) at some point in the conversation. In these cases, the new speakers would converse with the caller for a brief period and then pass the phone back to the recipient. In one case, it was the caller who briefly passed the phone to a child. Because of these additional participants, the total number of CHJ speakers is in fact 269 rather than 240 (the total number of callers and recipients).

The transcribers made judgments as to the sex, age group, and regional accent or dialect of each of the 269 speakers. The sex and age judgments are summarized in Figure 3.2. The regional accent or dialect judgments are summarized in Figure 3.3 (see the discussion of Japanese dialects in Section A.3). As that figure shows, the majority of speakers (156/269, 58%) spoke the standard Tokyo dialect.

Figure 3.2: Sex and age classifications of the 269 CHJ speakers

Figure 3.3: Regional accents or dialects of the 269 CHJ speakers (vertical axis: number of speakers)

Many of the callers reported demographic information about themselves when they completed the call. Eighty-four of the 120 callers reported their education level (number of years of education completed), and 74 callers reported their age. These self-reported education and age data are summarized in Figures 3.4 and 3.5. No such demographic information was collected from the recipients.

Figure 3.4: Years of education of 84 CHJ callers (number of callers by years of education: 8-10, 11-13, 14-16, 17-19, 20+)

Figure 3.5: Age range of 74 CHJ callers (number of callers by age: 16-23, 24-31, 32-39, 40-47, 48-55, 55+)
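The proportions quoted in this section follow from a small cross-tabulation of caller sex by recipient sex. The sketch below rebuilds that table from the counts given in the text (23 of 37 male callers and 77 of 83 female callers reached female recipients) and recomputes the percentages. The table layout and helper name are illustrative assumptions, not part of the corpus documentation.

```python
# (caller sex, recipient sex) -> number of calls, derived from the counts in the text.
calls = {
    ("M", "F"): 23,
    ("M", "M"): 37 - 23,
    ("F", "F"): 77,
    ("F", "M"): 83 - 77,
}

def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

total_calls = sum(calls.values())
female_callers = sum(n for (caller, _), n in calls.items() if caller == "F")
female_recipients = sum(n for (_, recipient), n in calls.items() if recipient == "F")

# Speakers beyond the 120 callers and 120 recipients (mostly children who took the phone).
additional_speakers = 269 - 2 * total_calls

print(total_calls, pct(female_callers, total_calls), pct(female_recipients, total_calls))
print(additional_speakers, pct(156, 269))
```

The difference between 269 speakers and 240 callers and recipients implies 29 additional speakers across the 19 conversations in which the phone was passed on.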
3.4
The CHJ transcripts
The CHJ transcripts are distributed as text files containing EUC-encoded Japanese characters. Each transcript covers a contiguous five- or ten-minute segment taken from a recorded conversation lasting up to 30 minutes. The five- or ten-minute speech segments selected for transcription generally begin around the 120-second mark of the conversation. This two-minute delay in beginning the transcription was implemented in order to give the speakers time to grow accustomed to being recorded and hence to speak more naturally. This technique of delaying transcription in the interest of obtaining more natural speech is standard in analyses of spontaneous conversations (Duncan and Fiske 1977; Maynard 1989, 1993; Pridham 2001). A fragment of a transcript is illustrated in Figure 3.6. The figure shows the first ten speaker turns from the first transcript of the CHJ corpus, along with an
[Figure 3.6: fragment of a CHJ transcript in which each speaker turn is prefixed by two time stamps and a speaker label (A or B), beginning at 120.20]

Translation of first three speaker turns:

A: uso, atasi syoku ga nakute sa nihon ni kaetta ra.
   false, I job SUBJ lack PRES GER y’know Japan to
   ‘No way! I’d still have no job, y’know, if I return to Japan.’

B: un.
   ‘Uh-huh.’

A: tonii no o tanzyoubi kugatsu hutsuka da kara nanika katte age you to omotta kedo, atasi
   Tony GEN HON birthday Sept. 2nd is because something buy give VOL COMP think but, I
Figure 4.3: Fragment from annotated transcript 0696
4.3.2
POS annotations
Assigning POS tags to words is a common enterprise in the field of NLP, where this task is referred to as POS tagging or simply tagging (Section 2.1.3). Since some words fall under more than one syntactic category, the task for a tagger is to assign the correct POS tag to a word based on how it is used in a particular utterance or sentence.

In Japanese, as in English, there are hundreds of common words that can assume more than one syntactic category. A large number of Japanese words occur both as free-standing nouns and as the stems of verbs or i-adjectives. In most cases, these noun/verb pairings are highly predictable; for example, color terms like aka (‘red’) and ao (‘blue’) occur alone as nouns, but are also the stems of their corresponding predicative adjectives akai (‘is red’) and aoi (‘is blue’). Another common example is the class of Japanese verbal nouns. These are words (generally of Chinese origin) that can either stand alone as nouns or else combine with the verb suru (‘make/do’) to form verbs. Since the verb suru contributes no particular semantic value of its own, it is often referred to as a ‘light’ verb.

There are other POS ambiguities in Japanese besides noun/verb ambiguities. We already noted in Section 4.2.1 that kara is both a grammatical particle (meaning ‘from’) and a clausal conjunction (meaning ‘because’). An accurate POS tagger must find ways to resolve these kinds of categorical ambiguities. Our own CHJ tagger, for example, had to determine that the token of ga in Figure 4.3 is a postpositional NP particle (part), as shown in the figure, rather than a clausal conjunction (conj).

Tagging the CHJ transcripts

There are two general approaches to implementing taggers. The first approach, stochastic tagging (e.g. Merialdo 1994), uses a training corpus to compute the probability of a given word having a given tag in a given context; these precomputed probabilities are then used to select the most likely tags for the new input text. The second approach, rule-based tagging, involves the compilation of a large database of ad-hoc disambiguation rules.¹

We chose a rule-based tagging approach over the stochastic approach for two reasons. First was the bootstrapping problem: we had no corpus that was pretagged with the LDC’s POS categories to use as training data for a stochastic tagger. The second reason was the simplicity of the LDC’s Japanese tagset (see Table 4.2 on page 41), which made it possible to produce accurate tags using a relatively small number of rules. For example, the LDC’s particle category part covers not only grammatical particles like ni but also sentence-final discourse particles like ne; thus there is no need for our tagger to disambiguate particles such as no, which can act as both a grammatical and a discourse particle.

We implemented a rule-based tagger that operated in two stages. First, the entire CHJ lexicon was read as input, and a large table was filled with the set of possible POS tags for each token. Next, CHJ transcripts were read as input, and one or more POS tags were output for each token in the format shown in Figure 4.3.

¹ In most cases (e.g. Voutilainen 1995), the rules are hand-crafted by linguists. Brill (1995), however, proposes a mixed stochastic/rule-based approach in which the tagging rules are automatically induced from a training corpus using a machine learning algorithm.
Cases of POS ambiguity were resolved using a list of hand-crafted rules. These tagging rules included examples like the following:

• If the token are is followed by ba, mark as r5 (verb); otherwise, mark as dem (demonstrative).
• If the token nara is followed by a form of nai, mark as r5 (verb); otherwise, mark as part (particle).
• If the token ga or kara follows a nominal, demonstrative, or interrogative, mark it as part (particle); otherwise, mark as conj (conjunction).

Our tagging of the 120 CHJ transcripts proceeded as follows. First, a CHJ transcript was selected and run through the rule-based tagger as described above. The resulting POS tags were then hand-corrected by an annotator who was familiar with the LDC tagset. The annotator’s corrections were then generalized and incorporated into new tagging rules, and regression tests were used to ensure that the new rules introduced no unwanted consequences in previously tagged transcripts. This style of iterative, rule-based tagging is roughly the same method adopted for the ENGTWOL tagger (Voutilainen 1995).
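The two-stage design and the three example rules above can be illustrated with a minimal Python sketch. This is not the tagger that was used for the CHJ corpus: the miniature lexicon, the tag name int for interrogatives, the tag adj, the placeholder unk, and the list of forms of nai are all assumptions made for the example. Only the tags r5, dem, part, conj, noun, and cop and the three rules themselves come from the text.

```python
from typing import Dict, List, Set

# Stage 1 (assumed miniature lexicon): token -> set of possible POS tags.
LEXICON: Dict[str, Set[str]] = {
    "are": {"r5", "dem"},
    "nara": {"r5", "part"},
    "ga": {"part", "conj"},
    "kara": {"part", "conj"},
    "syoku": {"noun"},
    "nakute": {"adj"},
    "da": {"cop"},
}

NAI_FORMS = {"nai", "nakute", "nakatta"}  # assumed, incomplete list
NOMINAL_TAGS = {"noun", "dem", "int"}     # nominal, demonstrative, interrogative

def tag(tokens: List[str], lexicon: Dict[str, Set[str]]) -> List[str]:
    """Stage 2: assign one tag per token, applying the disambiguation rules."""
    tags: List[str] = []
    for i, token in enumerate(tokens):
        candidates = lexicon.get(token, {"unk"})
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        previous_tag = tags[-1] if tags else None
        if len(candidates) == 1:
            chosen = next(iter(candidates))
        elif token == "are":
            chosen = "r5" if following == "ba" else "dem"
        elif token == "nara":
            chosen = "r5" if following in NAI_FORMS else "part"
        elif token in ("ga", "kara"):
            chosen = "part" if previous_tag in NOMINAL_TAGS else "conj"
        else:
            # No rule applies: output every candidate tag, sorted for stability.
            chosen = "|".join(sorted(candidates))
        tags.append(chosen)
    return tags

print(tag(["syoku", "ga", "nakute"], LEXICON))
print(tag(["da", "kara"], LEXICON))
```

Because each rule looks only at the neighbouring token or the tag already assigned to the previous token, a single left-to-right pass suffices in this sketch. Keeping a set of previously tagged transcripts as regression tests, as described above, is what makes it safe to keep adding rules.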
4.4
Predicate-argument annotations
So far, this chapter has covered the semantic and POS tags that we assigned to individual words in the CHJ corpus. In this section, we move up from the word level to the sentence level as we detail the semantic predicate-argument annotations that we performed on individual sentences in the CHJ transcripts.
4.4.1
Structural annotation
We noted in Section 2.1.3 that it is sometimes desirable to annotate a corpus with structural information; that is, information indicating how the individual words in an utterance relate to, depend on, or form constituents with one another. Of course, the first question that arises in this regard is what kind of structural descriptions are appropriate for the task at hand.

We immediately rejected the idea of producing a treebank of phrase structure annotations of CHJ utterances (Section 2.1.3). There were four reasons for this. Most importantly, the ‘messiness’ of spontaneous spoken utterances (i.e., their ambiguity, terseness, and disfluency) makes it notoriously difficult to assign complete, well-formed phrase structure trees to them. The second reason is that the relatively unstructured word order of Japanese, together with its case-marking system, means that phrase structure analyses are not particularly helpful for clarifying the grammatical and semantic relations between words in a Japanese utterance (i.e., determining ‘who did what to whom’). The third problem is that there is no general consensus among syntacticians about what the appropriate phrase structure representation for Japanese is, even for the most basic sentence types (Section A.2). And finally, at the purely practical level, full phrase structure annotation is a heavily time- and labor-intensive undertaking, even when it can be partly automated (Marcus et al. 1993).

Predicate-argument structure

Rather than pursue phrase structure annotations, we instead annotated sentences in the CHJ corpus with a limited type of dependency structure (Section A.2) showing basic predicate-argument relations within each sentence. An argument is a noun or other syntactic element that is normally required, or at least (in the case of Japanese) specifically permitted, by a verb or other predicate. For example, the Japanese verb kaeru (‘return (home)’) is a predicate that permits three arguments: an agent (who does the returning), a ‘source’ (the location from which one returns), and a ‘goal’ (the location returned to). In the case of utterance (4.1), the first speaker turn of transcript 0696, the agent argument to kaeru is represented by the noun atasi (‘I’), and the goal argument is nihon (‘Japan’):

(4.1) uso, atasi syoku ga nakute sa nihon ni kaetta ra.
      false, I job SUBJ not have y’know Japan to returned if.
      ‘No way! I’d still have no job, y’know, if I return to Japan.’ (0696; 120)
The source argument to kaeru is not explicitly stated by the speaker of (4.1), but it is clear from context that the intended meaning is ‘return to Japan from the United States.’ The dependency graph in (4.2) represents the explicit argument role relations in utterance (4.1). In particular, the labels AGENT and GOAL indicate the specific roles played by the arguments atasi and nihon to the predicate kaetta (‘returned’).

(4.2) uso,  atasi  syoku  ga  nakute     sa  nihon  ni  kaettara
            I      job        not exist      Japan      returned
      [dependency arcs include AGENT (atasi, kaettara) and GOAL (nihon, kaettara)]
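One simple way to hold the information in a graph like (4.2) is as a list of labeled arcs from predicates to their arguments. The Python sketch below is only an illustration: the class and function names are hypothetical, and it encodes just the two arcs whose labels are named in the text (AGENT and GOAL).

```python
from typing import List, NamedTuple, Tuple

class Arc(NamedTuple):
    """A labeled dependency from a predicate to one of its arguments."""
    predicate: str
    argument: str
    role: str

# The two arcs of (4.2) whose labels are named in the text.
ARCS_4_2: List[Arc] = [
    Arc("kaettara", "atasi", "AGENT"),
    Arc("kaettara", "nihon", "GOAL"),
]

def arguments_of(predicate: str, arcs: List[Arc]) -> List[Tuple[str, str]]:
    """Return (argument, role) pairs for the given predicate, in arc order."""
    return [(arc.argument, arc.role) for arc in arcs if arc.predicate == predicate]

def roles_of(word: str, arcs: List[Arc]) -> List[Tuple[str, str]]:
    """Return (predicate, role) pairs for every arc in which the word is an argument."""
    return [(arc.predicate, arc.role) for arc in arcs if arc.argument == word]

print(arguments_of("kaettara", ARCS_4_2))
print(roles_of("atasi", ARCS_4_2))
```

Since arcs are stored independently of one another, the same word can appear as an argument of more than one predicate without any change to the representation.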
Note that in (4.2) the noun atasi (‘I’) plays two roles: the indirect object of the verb nakute (‘not exist’) (lit., a job does not exist for me), and the agent of the verb kaetta (‘returned’).

Example (4.2) depicts precisely the type of information captured by our predicate-argument annotations. Our annotators connected each predicate to its explicitly mentioned arguments in the same sentence, and then labeled each such argument with its semantic or grammatical role(s) in the utterance. Note that this annotation format represents dependency structure at the bunsetsu level, rather than the word level (Section A.2). In other words, dependencies between bunsetsu, such as between syoku and the verb nakute, are indicated, while dependencies within bunsetsu, such as between syoku and the particle ga, are not.

In the remainder of this section, we first present specific details about how the predicate-argument annotation was accomplished, and then address the question of the accuracy and reliability of the resulting annotations.
4.4.2
Goi-Taikei transfer dictionary
Our predicate-argument tags are based on the GT predicate transfer dictionary. This dictionary, designed for machine translation applications, specifies translation correspondences between Japanese and English predicates (i.e., verbs and adjectives). In fact, GT offers two transfer dictionaries: (i) a common transfer dictionary, containing 10,000 standard predicates; and (ii) an idiomatic transfer dictionary, containing 5,000 idiomatic predicates. For each predicate, the dictionaries list one or more particular meanings or senses. An average of 2.3 senses per verb are specified in the dictionaries.

By way of illustration, Figure 4.4 shows two senses of the verb toru (‘take’) that appear in the common transfer dictionary. The first (and more general) sense is translated as ‘take’ in English, whereas the second sense is translated as ‘reserve’ (as in “reserve a hotel room”). Each predicate sense in the dictionary is associated with one or more argument slots, which are labeled N1, N2, etc. Each argument slot contains information such as its grammatical function, case marker, case role, selectional restrictions, and default order (not all these features are shown in the figure). The selectional restrictions on arguments refer to specific semantic nodes in the GT ontology (Section 4.2.2).²

² Selectional restrictions (Katz and Fodor 1963; Chomsky 1965) are semantic sortal constraints imposed on potential arguments to a predicate in order to capture certain regularities in its interpretation. For example, the verb toru, in its ‘reserve’ sense, selects for a human agent, and for objects like hotel rooms and cars (Figure 4.4). Despite their limitations (e.g., failure to handle metaphorical language use), selectional restrictions have in practice proven useful in NLP tasks such as word sense disambiguation, syntactic disambiguation, and anaphora resolution.

Pattern-ID: -302246-
Semantic class: (action, transfer)
Japanese  pred  取る (toru)
          N1    case-role: Agent; case-marker: が (ga) ‘NOM’; restriction: agent
          N2    case-role: Object-1; case-marker: を (o) ‘ACC’; restriction: *
English   pred  take
          N1    function: subject; case: nominative
          N2    function: direct-object; case: accusative

Pattern-ID: -302253-
Semantic class: (action)
Japanese  pred  取る (toru)
          N1    case-role: Agent; case-marker: が (ga) ‘NOM’; restriction: agent
          N2    case-role: Object-1; case-marker: を (o) ‘ACC’; restriction: lodging, vehicle, ...
English   pred  reserve
          N1    function: subject; case: nominative
          N2    function: direct-object; case: accusative

Figure 4.4: Two senses of the verb toru in the GT transfer dictionary

Label   Role            Case marker             English gloss
N1      Subject/Agent   ga (kara, towa) [wa]    (Subject)
N2      Object-1        o (nituite) [ga]        (Object)
N3      Object-2        ni, e, to, kara, ...    (Indirect Object)
N4      Source          kara, yori              from
N5      Goal            ni, e, made             to (until)
N6      Purpose         ni                      for
N7      Result          ni, to                  as
N8      Locative        ni, o, de, e, kara      in/at/on
N9      Comitative      to                      with
N10     Quotative       to
N11     Material        kara, yori, de          with, from
N12     Cause           kara, yori, de          for
N13     Instrument      de                      with
N14     Means           de                      by
S10     Clause          to                      that
QUANT   Quantity
ADV     Adverb
TIME    Time duration

Table 4.3: GT argument roles (Bond and Shirai 1997)

As Figure 4.4 shows, the Japanese argument slots in each pattern are followed by a corresponding set of English argument slots for use in the English translation of the predicate. It is possible for an argument slot to appear only on one side, Japanese or English; this is useful for verbs in one language that ‘incorporate’ information that must be mentioned explicitly in the other.

Recall that in the dependency graphs presented earlier in Section 4.4.1, labels like AGENT were used to indicate the role of each argument. Our actual predicate-argument annotations, however, use the GT-style argument labels shown in Figure 4.4: N1, N2, etc. A complete inventory of these GT argument labels is presented in Table 4.3. Each argument label is listed along with its semantic or grammatical role, the Japanese particles most commonly associated with it, and a corresponding English preposition or grammatical function. Labels beginning with N (N1, N2, etc.) represent nominal arguments. The label S10 is used for the sentential arguments of verbs like omou (‘think’) or iu (‘say’). Other argument labels include QUANT for numeric expressions of quantity, ADV for certain required adverbs (mainly in idiomatic expressions), and TIME for expressions of temporal duration.
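To make the structure of a transfer-dictionary entry concrete, the following Python sketch encodes the two senses of toru from Figure 4.4 as simple records. The class names, field names, and the matching function are hypothetical and are not the GT file format. The sketch also simplifies selectional restrictions to exact label matching, whereas the real restrictions refer to nodes in the GT ontology, and it lists only the two restriction values that are legible for the ‘reserve’ sense.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class ArgSlot:
    case_role: str                # e.g. "Agent", "Object-1"
    case_marker: str              # romanized particle, e.g. "ga", "o"
    restriction: Tuple[str, ...]  # semantic categories; "*" means unrestricted

@dataclass
class Pattern:
    pattern_id: str
    semantic_class: Tuple[str, ...]
    japanese_pred: str
    english_pred: str
    slots: Dict[str, ArgSlot]     # keyed by GT argument label (N1, N2, ...)

TORU_SENSES: List[Pattern] = [
    Pattern("302246", ("action", "transfer"), "toru", "take", {
        "N1": ArgSlot("Agent", "ga", ("agent",)),
        "N2": ArgSlot("Object-1", "o", ("*",)),
    }),
    Pattern("302253", ("action",), "toru", "reserve", {
        "N1": ArgSlot("Agent", "ga", ("agent",)),
        "N2": ArgSlot("Object-1", "o", ("lodging", "vehicle")),
    }),
]

def admits(slot: ArgSlot, category: str) -> bool:
    """Simplified check: exact category match, or an unrestricted slot."""
    return "*" in slot.restriction or category in slot.restriction

def candidate_senses(patterns: List[Pattern], label: str, category: str) -> List[str]:
    """English glosses of the senses whose slot `label` admits `category`."""
    return [p.english_pred for p in patterns
            if label in p.slots and admits(p.slots[label], category)]

print(candidate_senses(TORU_SENSES, "N2", "lodging"))
print(candidate_senses(TORU_SENSES, "N2", "food"))
```

Under these simplifications, an object in the lodging category is compatible with both senses, while an object in some other category (here the made-up label food) is compatible only with the general ‘take’ sense.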
It should be noted that GT’s inventory of argument roles, listed in Table 4.3, was designed specifically for the practical application of Japanese-to-English machine translation, and therefore represents a mixture of semantic and grammatical roles that is more fine-grained than traditional grammatical categories such as subject and object. For example, the Japanese verb wataru (‘cross over’) takes an argument that is typically marked with the direct object particle o, as illustrated in (4.3):

(4.3) burukkrin tte kawa o watatte iku n ya ne
      brooklyn QUOT river OBJ cross over go NOM COP FP
      ‘At Brooklyn, you cross over the river.’ (1057; 343)
In the GT dictionary, however, this argument to wataru (‘cross over’) is specified as N8 (locative), so the noun kawa (‘river’) in (4.3) is annotated N8 rather than N2 (direct object) in our corpus. In Part II of the book we will sometimes refer to the N1 and N2 arguments as the “subject” and “object” of a particular predicate; however, it should be kept in mind that our GT argument annotations do not overlap perfectly with these traditional grammatical categories.

GT’s inventory of semantic and grammatical argument roles also stands in contrast to the purely semantic interpretation that has been associated with what are referred to variously as semantic-, thematic-, or θ-roles (Jackendoff 1972; Dowty 1991). The GT inventory is also smaller and more rough-grained than the elaborate system of English participant roles developed by Fillmore and colleagues in the FrameNet project (Baker et al. 1998). In fact, there seems to be no consensus among linguists on what the best set of argument roles might be, or even whether these roles should be replaced by more abstract primitives (Levin and Hovav 1996). Nevertheless, argument roles such as those in Table 4.3 have proved useful in practice for NLP applications, and are relied on by most machine translation systems (Bond and Shirai 1997).
4.4.3
Hand-tagging of predicate-argument relations
In the remainder of this section we describe how the predicate-argument tagging of the CHJ corpus was carried out by hand by our native speaker annotators.
agaru (‘rise’)        ageru (‘raise’)    aru (‘have’)       ataru (‘strike’)
dasu (‘put out’)      deru (‘go out’)    hairu (‘enter’)    ireru (‘put in’)
kakaru (‘require’)    kakeru (‘hang’)    naru (‘become’)    suru (‘do’)
tuku (‘attach’)       utu (‘hit’)

Table 4.4: Light verbs excluded from predicate-argument tagging

Which predicates to tag?

Manual annotation is a time- and labor-intensive enterprise, and it is sometimes difficult to obtain reliable agreement among multiple annotators when their task involves making subjective, contextual judgments. For these reasons, we decided to attempt predicate-argument annotation of only a subset of the predicates (verbs and adjectives) in the CHJ transcripts.

The predicates that we decided not to annotate with predicate-argument tags fall into five major classes. The first class is that of auxiliary verbs, including modals and other supporting verb types which in Japanese follow the main verb in a clause. In other words, we annotated the predicate-argument relations of only the main verb in each clause.

The second class encompasses nominal adjectives and the copula. These predicates do not typically subcategorize in interesting ways, which is why we did not annotate them. Their typical argument pattern consists of a single subject (N1) with minimal selectional restrictions (Fry and Bond 2000).

The third class that we did not annotate is light verbs. In particular, 14 very common verbs that carry ‘light’ or underspecified meanings, listed in Table 4.4, were excluded from annotation. The reason for this was the large number of senses (more than 28 each) that are listed for these verbs in the GT transfer dictionary. A pilot annotation study (Fry and Bond 2000) revealed that our native speaker annotators frequently disagreed when choosing senses for these verbs, and often failed even to find a suitable sense listed in GT. For instance, the light verb suru (‘do’) frequently appears in the CHJ transcripts in the sense shown in (4.4):

(4.4) un, ato ni syuu-kan sita ra
      yeah after two weeks-long did if
      ‘yeah, in about two weeks’ (0743; 342)
This sense of suru does not appear in GT. In written Japanese suru would not normally be used in this way; rather, a more specific verb such as tatsu (‘pass’) would be preferred.
arigatai (‘be thankful’)          awasu (‘combine/merge’)
daburu (‘duplicate’)              darakeru (‘be lazy’)
dekasu (‘do/commit’)              harasu (‘dispel’)
hikkurikaesu (‘turn around’)      hineru (‘get stale’)
hottarakasu (‘neglect’)           hutekusareru (‘become sulky’)
kan suru (‘be connected’)         kudasaru (‘give’)
kurinuku (‘gouge out’)            maturu (‘sew’)
nazimu (‘become familiar’)        nazoru (‘trace’)
nekasu (‘put to sleep’)           nesoberu (‘sprawl’)
okumaru (‘lie deep in’)           okurasu (‘retard’)
otikoboreru (‘fall/drop’)         saboru (‘ditch’)
seppatumaru (‘be cornered’)       sindoi (‘be difficult’)
tamageru (‘be astonished’)        tutau (‘go along’)
yokotawaru (‘lie down’)           zireru (‘fret’)

Table 4.5: Missing verbs excluded from predicate-argument tagging

The fourth category of predicate that we excluded encompasses causative and passive constructions. When a verb stem is followed by causative (-(sa)se) or passive (-(ra)re) morphology, the valency of its argument structure changes. In the case of passives, the agent role is suppressed, while in the case of causatives, a new causal agent role is added (Shibatani 1976, 1985). Since these altered argument structures no longer match the argument pattern specified for the stem in the GT transfer dictionary, we chose not to annotate them.

The fifth category we excluded is that of missing verbs. A total of 28 predicate types in the CHJ transcripts could not be found in the GT transfer dictionary and so were not annotated. These are listed in Table 4.5.

Finally, one predicate that does not fall into any of the above five classes was also excluded from annotation: sugoi (‘terrible’). We excluded sugoi because in natural speech it is frequently used not as a predicate but rather as either an exclamation or an intensifying adverb (despite the fact that it is grammatically an i-adjective). This is illustrated in example (4.5) from CHJ:

(4.5) uindoozu ga sugoi koutyou da si
      Windows SUBJ terrible good shape COP and
      ‘Windows is terribly successful.’ (3006; 465)
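The exclusion criteria described above amount to a filter applied to each predicate token before annotation. The Python sketch below shows one way such a filter might look. The function name, its parameters, and the part-of-speech and voice labels are hypothetical; only the five classes, the special case of sugoi, and the list of light verbs (Table 4.4) are taken from the text.

```python
from typing import Optional

# The 14 light verbs of Table 4.4.
LIGHT_VERBS = frozenset({
    "agaru", "ageru", "aru", "ataru", "dasu", "deru", "hairu",
    "ireru", "kakaru", "kakeru", "naru", "suru", "tuku", "utu",
})

def exclusion_reason(lemma: str, pos: str, *, is_auxiliary: bool = False,
                     voice: str = "active", in_gt: bool = True) -> Optional[str]:
    """Return why a predicate token is skipped, or None if it should be annotated.

    `pos`, `voice`, and `in_gt` are assumed to be supplied by earlier
    processing steps; their value sets are illustrative.
    """
    if is_auxiliary:
        return "auxiliary verb"
    if pos in ("nominal-adjective", "copula"):
        return "nominal adjective or copula"
    if lemma in LIGHT_VERBS:
        return "light verb"
    if voice in ("causative", "passive"):
        return "causative or passive"
    if not in_gt:
        return "missing from GT"
    if lemma == "sugoi":
        return "sugoi"
    return None

print(exclusion_reason("suru", "verb"))
print(exclusion_reason("kaeru", "verb"))
```

The order of the checks mirrors the order in which the classes are presented; a token that falls into several classes is reported under the first one that applies.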
In the end, a total of 19,290 predicate tokens from the CHJ corpus were manually tagged with predicate-argument relations. Of these predicates, 2,804 were i-adjectives and the remaining 16,486 were verbs.

Which arguments to tag?

After deciding which predicates to tag, we then had to establish how the annotators should go about selecting the appropriate arguments for each predicate.

One question that arises concerns the appropriate syntactic unit of an argument. Should an argument slot be identified with a single word (e.g., a noun), or with a complex phrase (e.g., an NP)? The policy we adopted was to identify an argument with a single word, and in the case of complex NPs or other phrases, with the head word of that phrase. We chose this policy for several reasons. First was the difficulty of identifying and annotating phrase structure nodes such as NPs in spontaneous speech data like CHJ. Fortunately, however, the fact that Japanese is head-final (Section A.2) means that it is relatively easy to identify the head words at the end of such phrases. For instance, in possessive NPs of the form X no Y, and in appositive NPs of the form XY, the head is invariably the noun Y. We extended this ‘rightmost noun’ policy to cover other types of complex NP for which the notion of head is less well defined. For example, in the case of conjunctive NPs like X to Y (‘X and Y’) and X ka Y (‘X or Y’), we selected the rightmost noun Y as the head by default. Furthermore, identifying the head word of an argument was deemed sufficient for our purposes because of the head’s status as the locus of linguistic information in a phrase: the head word is what determines a phrase’s syntactic category, as well as its most salient semantic properties.

Another issue that arose concerned where to look for arguments. Should the annotators identify only those arguments that are found in the same clause as the predicate? Or should potential arguments be identified anywhere within the same speaker turn as the predicate? The policy we adopted in this regard was to identify appropriate arguments anywhere within the same sentence as the predicate in question.
One advantage of tagging at the sentence level is simply that the sentence is a relatively well-defined linguistic entity. In spoken language, sentence boundaries are marked not only syntactically (in Japanese, sentences end with a verb, often followed by auxiliaries and sentence-final particles) but also prosodically, with a steep rise or fall in pitch. In the case of the CHJ data, of course, the sentence boundaries are already transcribed using sentence-final punctuation: question marks, exclamation points, and periods. Although many CHJ speaker turns are interjections or sentence fragments, with no sentence-final punctuation, for purposes of predicate-argument annotation we simply counted these as sentences.
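The sentence-level unit described above can be approximated by splitting each speaker turn at the transcribed sentence-final punctuation and treating any unpunctuated remainder as a sentence in its own right. The Python sketch below illustrates this on romanized text. It is a simplification with a hypothetical function name, and it assumes that periods, question marks, and exclamation points occur in a turn only as sentence-final punctuation.

```python
import re
from typing import List

# A run of non-terminator characters, optionally closed by one terminator.
_SENTENCE = re.compile(r"[^.?!]+[.?!]?")

def split_turn(turn: str) -> List[str]:
    """Split one speaker turn into sentence-level annotation units."""
    units = [match.group().strip() for match in _SENTENCE.finditer(turn)]
    return [unit for unit in units if unit]

print(split_turn("mou hitotu dasita, papa? un."))
print(split_turn("un"))
```

A turn such as the backchannel un, which carries no sentence-final punctuation, comes out as a single unit, in line with the policy of counting fragments as sentences.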
Another basic reason why the sentence is the appropriate level for predicate-argument annotations is simply that the other potential levels are not appropriate. The clause level, for example, is too small for capturing the essential semantic relations between predicates and arguments. This point is illustrated by example (4.1) on page 49, where the agent of the verb kaetta (‘returned’) is mentioned only in the previous clause. There is also the case of the Japanese ‘post-verbal construction’ (Shibatani 1990; Kaiser 1998), in which an argument is ‘post-posed’ to a position after its predicate. Post-verbal constructions like (4.6) are quite common in colloquial speech like the CHJ corpus.

(4.6) mou hitotu dasita, papa?
      another one sent Papa
      ‘Did Papa send another one?’ (1003; 579)
On the other hand, annotating predicate-argument relations beyond the sentence level is equally inappropriate. Such an enterprise ceases to be predicate-argument annotation and becomes coreference annotation, a considerably more complex (and less reliable) undertaking (see e.g. Eckert and Strube 1999).

An example of predicate-argument annotation

Let us illustrate with a concrete example of predicate-argument annotation of a small fragment of the CHJ corpus. We illustrate, once again, with the opening fragment from transcript 0696, shown in Figure 4.5. The first speaker turn, spoken by A, was tagged by our annotators as shown in (4.7):

(4.7) uso,  atasi  syoku  ga  nakute     sa  nihon  ni  kaettara
            I      job        not exist      Japan      returned
      [argument arcs labeled with GT argument labels such as N1]
The predicate-argument tagging of (4.7) proceeded as follows. First, the verbs nakute (‘not exist’) and kaetta (‘returned’) were identified as the main predicates of their respective clauses. Main predicates were automatically distinguished from auxiliary verbs by the rule-based tagger described in Section 4.3.2. Next, the an notators chose appropriate verb sense patterns for these main predicates from a list of options taken from the GT transfer dictionary. The annotators picked an appro priate sense pattern for each verb based on their own intuitions about the particular
Ellipsis and wa-marking in Japanese Conversation
58 A:
•3*\ fi,
Si
*
uso, atasi syoku ga na false, I job s u b j lack
!)§~>tz
tonii no Tony GEN H* nanika something
Mi
te
*'
o HON
l i t katte buy
CO
musyoku no jobless GEN
frb
tanzyoubi kugatsu hutsuka da kara birthday Sept. 2nd is because £>»f i .1 t JBofc l f t \ & £ age you to omotta kedo, atasi sa give VOL COMP think but, I y’know
APbI ningen da. person am.
‘Tony’s birthday is Sept. 2 so I want to buy him something, but I’m unemployed.’ Figure 4.5: Fragment from CHJ transcript no. 0696 context in which the predicate was uttered, as revealed by the transcript (and, oc casionally, the audio recording) of the conversation. Recall that these GT sense patterns specify, among other things, a particular set of argument slots (labeled N l, N3, etc.). Finally, the annotators identified and labeled the particular words in the sentence that, in their judgment, explicitly filled those argument slots. In this case, the nouns atasi (T ), syoku (‘job’), and nihon (‘Japan’) were so identified and labeled as depicted in (4.7). The second speaker turn in Figure 4.5, spoken by B, consists only of the backchannel expression un (‘uh-huh’), and so of course was not annotated with predicate-argument relations. Finally, the third speaker turn, spoken by A, was tagged by our annotators as shown in (4.8). (Since the complete utterance is rather long and complex, only the most essential words, including all of the tagged
Annotating the CHJ corpus
words, are shown in (4.8).)
(4.8)  N2
       nanika     katte ...
       something  buy ...    thought
The annotation application

We developed a browser-based tagging application specifically for this project, in order to facilitate the manual annotation of predicate-argument relations described above. The application, shown in Figure 4.6, was implemented using HTML forms within a standard web browser. A total of 19,290 forms like the one shown in Figure 4.6 were created, one for each predicate to be annotated.

As shown in Figure 4.6, the left frame of the application displays the transcript of the particular CHJ conversation being annotated. In the case of Figure 4.6, once again, the opening speaker turns of conversation 0696 are shown (cf. Figure 4.5). The predicates to be annotated are underlined and hyperlinked, so that clicking on a predicate in the transcript (left frame) brings up a verb sense menu for that verb in the right frame. In the case of Figure 4.6, the verb sense menu for the verb token nai (‘not exist’), from the first utterance in the left frame, has been brought up in the right frame. The three GT senses of nai are listed as menu choices. Each sense offers a unique subcategorization frame, including a set of argument role labels from Table 4.3 and a set of semantic selectional restrictions. The selectional restrictions are also underlined and hyperlinked to a diagram of the complete GT ontology (Figure 4.1), so that the annotators can see examples of each semantic category and how that category fits into the broader GT ontology. Finally, an English gloss of each verb sense (from the GT transfer dictionary) is given at the end of each subcategorization frame.

A ‘Problems’ field, shown at the bottom of the right-hand frame in Figure 4.6, was provided so that the annotator could report any errors or difficulties encountered during annotation. The most common use of the Problems field was to report the lack of an appropriate verb sense option for a particular verb token. For example, the verb kiru (‘to cut’) has a sense meaning ‘to hang up (the phone).’ This sense, which appeared frequently in the CHJ transcripts, is missing from the GT transfer dictionary. For these tokens of kiru (meaning ‘hang up’), the annotators used the Problems field to report that no appropriate verb sense choice was offered, and in the end we excluded these cases from annotation.
Figure 4.6: Screen shot of verb sense disambiguation application
4.4.4 Results of the hand tagging
A total of 19,290 predicate tokens from the CHJ transcripts were manually tagged with predicate-argument relations as described above. However, we discarded 525 of these predicate tokens because of errors reported in the Problems field, such as missing senses or mistranscriptions. This left a total of 18,765 tagged predicate tokens.

Table 4.6 lists the 42 most frequent tagged predicate types, along with the number of senses that are listed for each predicate in the GT common transfer dictionary. This number represents the minimum number of verb sense choices available to the annotator for that predicate in the annotation application. In addition, certain idiomatic senses from the GT idiom transfer dictionary were sometimes added as choices for particular predicate tokens in the annotation application. These idiomatic senses were added for predicate tokens that appeared in sentences in which the other relevant words from the idiom also appeared. For example, the idiomatic sense atama ni kuru (‘get angry’) was added as a verb sense choice for those tokens of the predicate kuru (‘come’) that occurred in sentences that also contained the word atama. In other words, idiomatic senses like atama ni kuru were only presented to the annotator when they had a reasonable chance of being appropriate choices. We adopted this strategy in order to make sure that idioms got properly sense-tagged, and at the same time to avoid overburdening the annotators with large numbers of irrelevant idiomatic verb sense choices for each predicate token.

Annotator agreement on verb sense choices

Each of the 18,765 predicate tokens was tagged by at least two different annotators, so that inter-annotator agreement and reliability could be measured. In most cases only two annotators were used, but for a limited number of conversations we were able to employ three different annotators: 2,761 of the 18,765 tokens (14.7%) were tagged by three annotators, while the remaining 16,004 tokens (85.3%) were tagged by two annotators. The total number of predicate sense judgments was thus 16,004 × 2 + 2,761 × 3 = 40,291. Table 4.7 shows how these judgments were distributed among the different annotators.

In order to estimate the reliability of our hand-coded predicate sense annotations, it is first of all necessary to determine the proportion of times that the annotators agreed in their judgments (Section 2.1.3). In our case, we cannot directly compare agreement rates on particular tokens, since the number of annotators for each token varies (from two to three). Naturally, tokens that are tagged by only
Count  Predicate               Senses    Count  Predicate               Senses
1,331  iu (‘say’)              6         134    au (‘meet’)             2
1,276  iku (‘go’)              9         130    denwa suru (‘phone’)    2
1,080  yoi (‘good’)            5         109    owaru (‘finish’)        8
895    omou (‘think’)          6         108    taberu (‘eat’)          2
670    nai (‘not exist’)       3         107    tuku (‘arrive’)         3
641    wakaru (‘understand’)   7         101    warui (‘bad’)           5
552    iru (‘be, exist’)       2         97     tukuru (‘create’)       14
548    kuru (‘come’)           8         92     takai (‘tall/high’)     5
491    yaru (‘do’)             12        90     kangaeru (‘ponder’)     3
354    kaeru (‘return’)        2         88     ookii (‘big’)           7
310    kaku (‘write’)          3         87     yasui (‘cheap’)         4
287    miru (‘see’)            12        86     syaberu (‘chat’)        2
271    kau (‘buy’)             5         71     noru (‘ride’)           8
249    okuru (‘send’)          11        69     asobu (‘play’)          3
229    tigau (‘differ’)        5         69     ganbaru (‘persist’)     2
228    dekiru (‘be able’)      16        67     morau (‘receive’)       5
214    kiku (‘hear/ask’)       8         66     todoku (‘reach’)        6
204    siru (‘know’)           4         66     neru (‘sleep’)          2
176    motu (‘hold’)           10        65     ooi (‘numerous’)        7
155    toru (‘take’)           14        63     hanasu (‘speak’)        4
152    tukau (‘use’)           5         63     iru (‘need’)            4

Table 4.6: Most frequent tagged predicate types
Annotator         Number of judgments
Misa Miyachi      15,339
Yuko Okado        9,109
Ai Takahama       6,047
Michiko Ueda      6,270
Rika Yonemura     3,526
Total:            40,291

Table 4.7: Distribution of predicate sense judgments over annotators
two annotators are more likely to exhibit ‘universal’ agreement than those tagged by three annotators.

Rather, the appropriate metric for annotator reliability in our case is overall pairwise agreement; that is, the proportion of times that any two annotators agree with one another. Since $k$ annotators can be paired in $\binom{k}{2}$ ways, three annotators can agree with each other in $\binom{3}{2} = 3$ ways, whereas two annotators can only agree in $\binom{2}{2} = 1$ way. Our data comprise 16,004 tokens judged by two annotators, representing 16,004 pairwise judgments, and 2,761 tokens judged by three annotators, representing 2,761 × 3 = 8,283 pairwise judgments. We thus have a total of 16,004 + 8,283 = 24,287 possible pairwise agreements between annotators. The observed pairwise agreement, $P(A)$, can then be computed as the ratio of the number of pairs for which there was agreement, 21,425, to the number of possible pairwise agreements. In our case we have

$$P(A) = \frac{21425}{24287} = 0.882$$

In other words, our annotators agreed with each other about predicate senses about 88% of the time.

However, pairwise agreement alone is insufficient as a reliability judgment, because it fails to take into account chance agreement (Section 2.1.3). It is therefore necessary to determine how often we might expect our verb sense annotators to agree by chance, and then to correct for this chance agreement appropriately.

Our annotators’ task is to assign a token $t$ to one of $m_t$ sense categories. Recall that the number of sense categories $m_t$ can be different for each token $t$. This is not simply because different predicate types have different numbers of senses in the GT common transfer dictionary (see Table 4.6), but also because one or more idiomatic senses may have been added to the list of choices for $t$, depending on the other words in the sentence besides $t$.

Following Scott (1955) and Fleiss (1971), we assume that the annotators’ judgments are statistically independent, and that each of the $m_t$ categories is equally likely, as would be the case if the coders were choosing categories at random. In this case, the chance of a pair of coders choosing the same category for token $t$ is $1/m_t$. The overall expected chance pairwise agreement $P(E)$ can then be computed as the mean of the expected chance agreements over all 24,287 pairwise judgments:

$$P(E) = \frac{1}{24287} \sum_{i=1}^{24287} \frac{1}{m_{t_i}} = 0.294$$
In other words, if we had asked the annotators to assign predicate senses randomly,
we would expect them to agree about 29% of the time simply by chance.

We can now estimate the overall reliability of our predicate sense annotations using the kappa statistic K (Section 2.1.3). Typically, a K score greater than 0.80 indicates good reliability, while scores between 0.67 and 0.80 allow tentative conclusions to be drawn (Carletta 1996). In our case, we have

$$K = \frac{P(A) - P(E)}{1 - P(E)} = \frac{0.882 - 0.294}{1 - 0.294} = \frac{0.588}{0.706} = 0.833$$
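The arithmetic behind these agreement figures can be checked with a short script. The following is a minimal sketch rather than the procedure used in the study: the function names are illustrative, the token counts are those given in the text, and the list of sense-menu sizes passed to `expected_chance_agreement` is a made-up example, since the per-token values of m_t are not reproduced here.

```python
from math import comb


def pairwise_comparisons(tokens_by_annotator_count):
    """Total number of annotator pairs, given {annotators per token: token count}."""
    return sum(comb(k, 2) * n for k, n in tokens_by_annotator_count.items())


def expected_chance_agreement(menu_sizes):
    """Mean of 1/m over pairwise judgments, assuming all m senses are equally likely."""
    return sum(1.0 / m for m in menu_sizes) / len(menu_sizes)


def kappa(p_a, p_e):
    """Chance-corrected agreement: (P(A) - P(E)) / (1 - P(E))."""
    return (p_a - p_e) / (1.0 - p_e)


# Counts reported in the text: 16,004 tokens with two annotators, 2,761 with three.
pairs = pairwise_comparisons({2: 16_004, 3: 2_761})
p_a = round(21_425 / pairs, 3)   # observed pairwise agreement, rounded as in the text
k = kappa(p_a, 0.294)            # 0.294 is the reported P(E)
print(pairs, p_a, round(k, 3))

# Hypothetical sense-menu sizes, one per pairwise judgment (illustration only).
toy_menu_sizes = [3, 3, 6, 9, 2]
print(round(expected_chance_agreement(toy_menu_sizes), 3))
```

Because P(E) is an average of 1/m_t, tokens with short sense menus contribute most to the expected chance agreement.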
Of course, the K value reported above represents an estimate of overall pairwise inter-annotator agreement on the task of assigning senses to predicates in the CHJ corpus. We treated each assignment as a statistically independent event, and did not attempt to assess the reliability of particular annotators, or of judgments on individual predicate types. It is possible, for example, that certain specific verb types exhibited low agreement, while others exhibited high agreement. Still, the relatively high K score reported above is encouraging, for it suggests that, on the whole, our predicate sense annotations are reliable.

Discarding unreliable annotations

While our predicate sense annotations may be reliable in the aggregate, the fact remains that there were thousands of predicate tokens on which our annotators could not agree on an appropriate GT sense tag. In order to help ensure the accuracy of our annotations, we eliminated the tags for all tokens on which there was insufficient agreement.

For 16,403 of the 18,765 annotated predicate tokens (87.4%), the annotators were in agreement; that is, the annotators all chose the same sense for that token. We kept the tags for all of these tokens. The remaining 2,362 predicate tokens (12.6%) produced at least one disagreement among the annotators. These 2,362 tokens fell into three classes: (i) for 1,886 tokens, there were only two annotators, and they disagreed; (ii) for 452 tokens, two of three annotators agreed; and (iii) for 24 tokens, three annotators all disagreed. We eliminated the annotations for all predicates in classes (i) and (iii), but kept the annotations for the tokens in class (ii). In other words, the standard we adopted is that one pairwise agreement on a predicate sense for a particular token was sufficient. We are left with a total of 16,403 + 452 = 16,855 annotated predicate tokens.
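The retention standard described above (a sense tag is kept when at least one pair of annotators agrees on it) can be sketched as follows. This is an illustrative reading of the rule, not the original tooling: the function name and the sense labels are hypothetical. The second half of the block only cross-checks the totals reported in the text.

```python
from collections import Counter


def retained_sense(judgments):
    """Return the sense shared by at least two annotators, or None if no two agree."""
    sense, votes = Counter(judgments).most_common(1)[0]
    return sense if votes >= 2 else None


# Hypothetical sense labels, one list of judgments per token.
print(retained_sense(["s1", "s1"]))        # two annotators agree: kept
print(retained_sense(["s1", "s2"]))        # class (i): discarded
print(retained_sense(["s1", "s2", "s1"]))  # class (ii): kept
print(retained_sense(["s1", "s2", "s3"]))  # class (iii): discarded

# Cross-check of the reported totals.
two_annotators_agree = 16_004 - 1_886       # two-annotator tokens without disagreement
three_annotators_agree = 2_761 - 452 - 24   # three-annotator tokens with full agreement
fully_agreed = two_annotators_agree + three_annotators_agree
kept = fully_agreed + 452
# One agreeing pair per two-annotator token, three per fully agreed
# three-annotator token, and one per class (ii) token.
agreeing_pairs = two_annotators_agree + 3 * three_annotators_agree + 452
print(fully_agreed, kept, agreeing_pairs)
```

The cross-check shows that the class counts are consistent with the 21,425 agreeing pairs used to compute P(A).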
Argument annotations

So far we have described the selection of the appropriate sense for each predicate token. Now we turn to the results for the second half of the coding task: filling argument slots. This task involves two types of judgment on the part of the annotator. First, she must decide, for each slot, whether or not it can be filled; that is, whether or not the argument role in question is actually represented by some entity mentioned in the same sentence. Second, once she decides that a particular slot should be filled, she must decide which particular word in the sentence best fills that slot. Both of these decisions are sources of potential disagreements among annotators.

Since different predicate senses have different subcategorization patterns, we can only compare slot filling agreement rates for those predicate tokens whose sense has already been established. We therefore examined the 16,855 predicate tokens that had been successfully sense-annotated to see how similar the annotators’ slot filling judgments were for those predicates.

Overall agreement was high. Two or more annotators agreed on their slot filling tags for 15,618 (92.7%) of our 16,855 sense-annotated predicate tokens. A total of 15,101 tokens (89.6%) exhibited no disagreements at all, while the remaining 1,754 (10.4%) exhibited one or more slot filling disagreements. Of the 1,754 predicate tokens that exhibited disagreements, 1,538 were cases in which two or more annotators differed in the number of slots they chose to fill. For the remaining 216 tokens, the annotators agreed in the number of slots they filled, but disagreed on one or more of the particular fillers they chose.

How should these slot filling disagreements be dealt with? One approach to handling these cases would be simply to eliminate them, and content ourselves with the 15,101 predicates that exhibited full agreement, or perhaps with the 15,618 predicates for which two or more coders agreed. However, this approach would unfairly eliminate predicate tokens in more verbose sentences, simply because those sentences offer more slot filling options to the annotator (hence more opportunities for disagreement). Our database of predicate-argument annotations would then be biased towards short utterances.

Instead, we decided to keep all of our 16,855 sense-annotated predicate tokens. We resolved disagreements by favoring the judgments of our two designated expert coders, Misa Miyachi and Ai Takahama (see Section 4.1.1).
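One way to picture this resolution step is sketched below. It is only an illustration under stated assumptions: the text says that the judgments of the two expert coders were favored, but it does not specify the data format, or what happens when both experts tagged the same token and disagree. The annotator IDs, the priority order in `EXPERTS`, the dictionary layout, and the fallback to the first listed annotator are therefore all hypothetical.

```python
# Hypothetical annotator IDs; the priority order between experts is an assumption.
EXPERTS = ("expert_a", "expert_b")


def resolve_slot_fillers(judgments):
    """Pick one slot-filling judgment for a token.

    `judgments` maps an annotator ID to that annotator's {slot label: filler}
    dictionary, where a filler of None means the slot was left unfilled.
    """
    candidates = list(judgments.values())
    if all(c == candidates[0] for c in candidates):
        return candidates[0]              # no disagreement to resolve
    for expert in EXPERTS:
        if expert in judgments:
            return judgments[expert]      # favor the expert coder's judgment
    return candidates[0]                  # assumed fallback: first listed annotator


# Made-up judgments for a single token (the fillers are illustrative only).
example = {
    "coder_c": {"N1": "atasi", "N2": None},
    "expert_b": {"N1": "atasi", "N2": "syoku"},
}
print(resolve_slot_fillers(example))
```

Resolving rather than discarding disagreements keeps every sense-annotated token, which is what avoids the bias towards short utterances noted above.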
[Figure: annotated transcript listing, start of transcript 0696 (token IDs 0001-0007; POS tags include pro, noun, and neg,i-adj)]
10 tokens)
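Following particle?        yes      no       Total
Short, N1 (proportion)     .59      .41      1.00
Short, N2                  436      633      1,069
          (proportion)     .41      .59      1.00
Long, N1                   1,914    710      2,624
          (proportion)     .73      .27      1.00
Long, N2                   759      713      1,472
          (proportion)     .52      .48      1.00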
Table 5.15: Particle ellipsis rates for N1 and N2 in short sentences

(particle included), this exception is most likely a case of dialectal variation. Outside of this conversation there were no cases of ki tuke(ru) (without the particle) in the corpus.

Utterance length

Another syntactic factor that has been claimed to condition particle ellipsis is utterance length. Jorden (1974, p. 44) notes that “ga, wa, and o are frequently omitted, particularly in short sentences.” Similarly, Alfonso (1966, p. 1198) observes that “in short expressions, wa and o are often omitted.”

Neither author offers a precise characterization of a “short” expression or sentence. In order to operationalize the notion for our purposes, let us arbitrarily define a short utterance in the CHJ transcripts as a sentence that contains fewer than ten tokens, excluding punctuation.10 Sentences of ten or more tokens we will call “long.”

Table 5.15 compares the particle ellipsis rates for N1 and N2 within short sentences and long sentences. As the table shows, the particle ellipsis rates within short sentences are significantly higher. The particle ellipsis rate for N1 in short sentences is .41, compared to just .27 in long sentences, a significant difference ($\chi^2$ = 97.78, p < .001). Similarly, the rate for N2 is .59, significantly higher than the .48 rate in long sentences ($\chi^2$ = 28.87, p < .001).

Why would short sentences exhibit higher rates of particle ellipsis than longer ones? A plausible explanation, suggested directly by Alfonso (1966) and hinted at by Hinds (1982a), is that longer sentences exhibit more syntactic complexity, and hence introduce more potential ambiguities. As Hinds (1982a, p. 155) puts it, particles are likely to be omitted “when they do not contribute necessary,

10 See Section 3.4.1 on morphological segmentation of the CHJ transcripts. As in Section 4.4.3, we identify sentences in the CHJ transcripts using sentence-final punctuation. Interjections and fragments are also counted as sentences.
                       Monosyllabic                     Multisyllabic
Following particle?    N1              N2               N1               N2
yes                    235     .79     61      .53      2,792   .66      1,133   .47
no                     63      .21     55      .47      1,423   .34      1,291   .53
Total                  298     1.00    116     1.00     4,215   1.00     2,424   1.00
Table 5.16: Particle ellipsis rates for monosyllabic N1 and N2

non-redundant information.” As syntactic complexity increases, particles become more necessary for semantic disambiguation.

Word length

Tsutsui (1984) proposes that the length of the NP itself, as measured in syllables, is an important factor in determining whether or not that NP gets explicitly marked with a particle. Specifically: “The ellipsis of a case particle marking a monosyllabic NP is less natural than that of a case particle marking a multisyllabic NP” (Tsutsui 1984, pp. 90-101).

In order to test this hypothesis on the CHJ data, we used our phonetic annotations (see Figure 4.9 on page 75) to distinguish monosyllabic noun tokens from multisyllabic noun tokens in the CHJ transcripts. Since Tsutsui’s generalization concerns syllables rather than morae (Section A.2), we counted as monosyllabic words like hon (‘book’) that end in the nasal mora n and words like kyou (‘today’) with long vowel sounds (see Tsujimura 1996, p. 66). We excluded the word ki (‘spirit’) from our counts, since it appears almost exclusively within idioms in which the particle cannot be omitted.

Table 5.16 compares the particle ellipsis rates for monosyllabic and multisyllabic N1 and N2 arguments in the CHJ corpus. As the table shows, the particle ellipsis rates for multisyllabic NPs are indeed higher than those for monosyllabic NPs, as Tsutsui would predict. Most notably, the particle ellipsis rate for multisyllabic N1, .34, is significantly higher than the .21 rate for monosyllabic N1 ($\chi^2$ = 20.07, p < .001). In the case of N2, however, the difference is not significant. The particle ellipsis rate for multisyllabic N2 is .53, not significantly higher than the .47 for monosyllabic N2 ($\chi^2$ = 1.52).

Verb adjacency

Several researchers have pointed out that the direct object particle o tends to be omitted when the object argument appears immediately adjacent to (i.e., in front
Ellipsis
                       Verb-adjacent                    Other
Following particle?    N1              N2               N1               N2
yes                    1,529   .69     697     .41      1,498   .65      498     .58
no                     694     .31     983     .59      792     .35      363     .42
Total                  2,223   1.00    1,680   1.00     2,290   1.00     861     1.00
Table 5.17: Particle ellipsis rates for verb-adjacent N1 and N2

of) the verb. As Tsutsui (1984, pp. 132-135) puts it: “Unless the speech is very formal, the ellipsis of o is natural if the NP is immediately followed by the predicate of the sentence.” This observation is supported by Matsuda’s (1996) quantitative study of o-ellipsis. In Matsuda’s data, sentences where the direct object was adjacent to the verb went unmarked 59% of the time, whereas nonadjacent direct objects went unmarked only 26% of the time.

Saito (1983, 1985) presents an account of this phenomenon within the syntactic framework of government-binding theory (Chomsky 1981). In Saito’s account, abstract accusative case in Japanese is assigned by the verb, whereas nominative case is ‘inherent,’ that is, not assigned by any element. Saito then invokes the adjacency requirement (an assumption that abstract case assignment, across languages, licenses an NP only when it is adjacent to the verb) to explain the fact that the particle o can be elided when the object is immediately followed by the predicate.

The CHJ data, summarized in Table 5.17, confirm that adjacency to the predicate is an important predictor of particle ellipsis for N2. The rate of particle ellipsis for N2 arguments adjacent to the verb is .59, the same rate found by Matsuda. This rate is significantly higher than the rate of .42 exhibited by non-adjacent N2s ($\chi^2$ = 61.10, p < .001). In the case of N1, on the other hand, being adjacent to the verb makes no significant difference in rates of particle ellipsis ($\chi^2$ = 5.78).

5.4.4 Animacy and definiteness
In this section we examine how two semantic properties, animacy and definiteness, correlate with particle ellipsis in Japanese.

Differential Object Marking across languages

Animacy and definiteness turn out to be important properties in the direct object case marking systems of many languages, including Japanese.
Aissen (2000), following Bossong (1985), observes that it is common for languages with overt case marking of direct objects to mark certain classes of objects, but not others. Aissen refers to this phenomenon as Differential Object Marking (DOM). Examples of languages that exhibit DOM include: Hebrew, in which only definite objects are case marked; Sinhalese, in which only animate objects are case marked; and Romanian, in which case marking of animate personal pronouns and proper nouns is obligatory.

The general pattern, observed in more than 300 DOM languages, seems to be that the direct objects that get case marked in these languages are more semantically and pragmatically prominent than those that do not get case marked. Aissen characterizes semantic and pragmatic prominence along two dimensions: animacy and definiteness. Objects that are higher in one or both of the hierarchies shown in (5.10) are more likely to be case marked in DOM languages.

(5.10) a. Animacy scale: Human > Animate > Inanimate
       b. Definiteness scale: Personal pronoun > Proper noun > Definite NP > Indefinite specific NP > Non-specific NP
Working within the framework of syntactic Optimality Theory (see e.g. Legendre et al. 2001), Aissen formalizes the prominence hierarchies in (5.10) into two ‘iconicity constraints,’ one for animacy and the other for definiteness. The first constraint states, in effect, that it is more important for more highly animate nouns to receive overt case marking than less animate nouns, and similarly for the definiteness constraint.

As the term ‘iconicity’ suggests, there would seem to be a ready functional explanation for why the DOM phenomenon is so pervasive across human languages. The properties of animacy and definiteness are most often associated with grammatical subjects, at least in transitive constructions (Keenan 1976; Comrie 1989). It is therefore natural to speculate, as Aissen does, that the function of DOM is to disambiguate subject from object. In other words, DOM languages mark semantically prominent direct objects, and not others, because they are the ones most likely to be confused with subjects if left unmarked. Indeed, as Comrie (1989, Sec. 6.2.2) reports, the case-marking systems of a wide range of languages seem to operate on precisely this principle.

In Japanese, the particle o can mark any direct object, regardless of its semantic properties. Japanese is therefore not a DOM language. However, the fact that the majority of direct objects in the CHJ corpus are not overtly marked with o (see Table 5.8 on page 99) raises the question of whether these iconicity constraints might help explain how the ellipsis of o is distributed in our corpus. It seems
NP class                 Unmarked   o-marked   Total   Rate of o-ellipsis
Personal pronoun         1          115        116     .01
Proper noun              5          100        105     .05
Def. animal noun         4          58         62      .07
Def. human noun          13         104        117     .11
Def. inanimate noun      222        1,006      1,228   .18
Indef. human noun        8          32         40      .20
Indef. animal noun       11         20         31      .35
Indef. inanimate noun    179        175        354     .51
Table 5.18: NP classes and o-ellipsis (Minashima 2001)

at least plausible that one function of the particle o is to help Japanese speakers disambiguate objects from subjects.

Minashima (2001) on animacy and definiteness in o-ellipsis

Minashima (2001) examines the effects of animacy and definiteness on the ellipsis of o in Japanese. He reports that Japanese direct objects that are explicitly marked with o tend to be high in animacy and definiteness, while unmarked objects tend to be low in animacy and definiteness.

Minashima’s results are based on a data set of 2,053 direct object NPs. Minashima explains his data collection as follows:

I collected data from Japanese colloquial speech because the Japanese accusative case marker is more frequently deleted in colloquial speech than in literary language. The data cited in the paper was collected from conversations in Japanese novels and comics. (Minashima 2001, p. 176)

Minashima extracted from these sources 2,053 direct object NPs, and classified each one’s animacy and definiteness properties by hand. Following Silverstein (1976), Minashima adopts the animacy hierarchy in (5.11):

(5.11) Inanimate noun > animal noun > human noun > proper noun > personal pronoun

As for definiteness, Minashima classifies as definite any NP “that refers to an entity or group of entities whose identity is presumably known to addressee.” All personal pronouns and proper nouns are classified as inherently definite.
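The rates in Table 5.18 are the unmarked count divided by the class total, and the same counts can be pooled by definiteness. The sketch below recomputes them from the table. The variable and function names are illustrative, and grouping personal pronouns and proper nouns with the definite classes follows Minashima’s statement that both are inherently definite.

```python
# (unmarked, o-marked) counts per NP class, transcribed from Table 5.18.
TABLE_5_18 = {
    "personal pronoun": (1, 115),
    "proper noun": (5, 100),
    "def. animal noun": (4, 58),
    "def. human noun": (13, 104),
    "def. inanimate noun": (222, 1006),
    "indef. human noun": (8, 32),
    "indef. animal noun": (11, 20),
    "indef. inanimate noun": (179, 175),
}


def ellipsis_rate(unmarked, marked):
    """Proportion of NPs that appear without the particle o."""
    return unmarked / (unmarked + marked)


def pooled_rate(classes):
    """Ellipsis rate over the combined counts of several NP classes."""
    unmarked = sum(TABLE_5_18[c][0] for c in classes)
    marked = sum(TABLE_5_18[c][1] for c in classes)
    return ellipsis_rate(unmarked, marked)


INDEFINITE = [c for c in TABLE_5_18 if c.startswith("indef.")]
DEFINITE = [c for c in TABLE_5_18 if c not in INDEFINITE]

for name, (unmarked, marked) in TABLE_5_18.items():
    print(f"{name:24s}{ellipsis_rate(unmarked, marked):.3f}")

total_nps = sum(u + m for u, m in TABLE_5_18.values())
print(total_nps, round(pooled_rate(DEFINITE), 3), round(pooled_rate(INDEFINITE), 3))
```

Pooling the counts in this way gives a single ellipsis rate for definite NPs and one for indefinite NPs, which makes the contrast between the two groups easy to read off.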
                       Animate                          Not animate
Following particle?    N1              N2               N1               N2
yes                    1,312   .64     167     .52      1,715   .69      1,028   .46
no                     724     .36     154     .48      762     .31      1,192   .54
Total                  2,036   1.00    321     1.00     2,477   1.00     2,220   1.00
Table 5.19: Particle ellipsis and animacy in CHJ

Minashima’s data are summarized in Table 5.18. As the table shows, personal pronouns exhibited the lowest rate of o-ellipsis, .01, while indefinite inanimate nouns exhibited the highest rate, .51. More generally, Minashima’s data show that definite NPs (top five rows of Table 5.18) are explicitly marked with o more often than indefinite NPs (bottom three rows). Furthermore, animate nouns are explicitly marked with o more often than inanimate nouns are. Minashima’s data therefore support Aissen’s iconicity constraints, which hold that direct objects that are higher in animacy and definiteness are more likely to be explicitly case marked.

CHJ data

We now examine how animacy and definiteness correlate with particle ellipsis in the CHJ data. Our results promise to complement Minashima’s results in a couple of respects. First, our CHJ data set offers a more representative sampling of colloquial Japanese speech, since it consists of naturally-occurring colloquial speech from recorded telephone conversations rather than fictional dialogues from novels and comics. Second, where Minashima confined his study to direct objects, we examine particle ellipsis in subjects (N1) as well as direct objects (N2).

In order to operationalize the notion of animacy in the CHJ data we turn to our GT semantic annotations (Section 4.2.2). Animacy is typically considered a property of humans and animals only (Crystal 1997). We therefore classify as animate those nouns that fall under the person class in the GT common noun or proper noun ontology. In addition, we include all nouns that fall within the living class of the common noun ontology, except for those within the plant subclass, in order to exclude plants from being counted as animate.

Table 5.19 shows the effect of animacy on particle ellipsis in the CHJ data. In the case of direct objects (N2), our animacy results weakly support those of Minashima. In particular, the rate of particle ellipsis for animate nouns in CHJ, .48, is lower than the rate for other nouns, .54, although the result is not significant
                       Proper noun or personal pronoun  Other
Following particle?    N1              N2               N1               N2
yes                    746     .62     92      .57      2,281   .69      1,103   .46
no                     452     .38     69      .43      1,034   .31      1,277   .54
Total                  1,198   1.00    161     1.00     3,315   1.00     2,380   1.00
Table 5.20: Particle ellipsis and strongly definite NPs in CHJ

($\chi^2$ = 3.68). Turning to N1, we find strong results in the opposite direction. In particular, the rate of particle ellipsis for animate nouns, .36, is higher than the rate for other nouns, .31, a significant difference ($\chi^2$ = 11.64, p < .001).

Now let us turn to definiteness. In contrast to English, which possesses the indefinite and definite articles a and the, Japanese does not grammatically encode the definiteness of nouns. As a result, there is no automatic method to determine whether a given nominal token in the CHJ corpus is intended to refer in a definite way to some specific entity.

On the other hand, personal pronouns and proper nouns, the two most definite NP categories identified by Aissen and Minashima, are readily identifiable in our annotated corpus. Let us use the term ‘strongly definite’ to refer to the class of personal pronouns and proper nouns, thus capturing the top two levels of Aissen’s and Minashima’s definiteness scales. The strongly definite nouns in the CHJ corpus include:

• All nouns assigned the POS class prop (proper noun)
• The first-person referring expressions boku, ore, ware(ware), and (w)ata(ku)si
• The second-person referring expressions an(a)ta, kimi, omae, temae, and temee
• The third-person referring expressions kanozyo, kare(ra), and mina(sama).

Table 5.20 shows the rates of particle ellipsis for strongly definite nouns (i.e., proper nouns and personal pronouns) in the CHJ data. In the case of direct objects (N2), the CHJ data once again support the earlier results of Minashima and the iconicity constraints of Aissen. The rate of particle ellipsis for strongly definite nouns, .43, is significantly lower than the rate for other nouns, .54 ($\chi^2$ = 7.06, p < .01). In the case of N1, once again the results are the converse: the ellipsis rate for
strongly definite nouns, .38, is significantly higher than the rate for other nouns, .31 ($\chi^2$ = 17.03, p < .001).

Discussion: animacy and definiteness

In this section we found that semantically prominent nouns (namely proper nouns and nouns that refer to people and other animate entities) are less likely to exhibit o-ellipsis than other types of nouns in our conversational speech data. In other words, we found evidence that Aissen’s iconicity constraints, under which animate and definite direct objects are more likely to be explicitly case marked, apply to Japanese as well as to DOM languages.

We also found that animacy and definiteness correlate strongly with subject (N1) particle ellipsis, but in the opposite direction. That is, subjects that are animate and definite are less likely to be explicitly case marked. Our results for N1 and N2 are therefore consistent with the functionalist explanation offered for Aissen’s iconicity constraints: namely, that explicit case marking serves to disambiguate grammatical functions (in the case of DOM, disambiguating subject from object). Since the properties of animacy and definiteness are most often associated with grammatical subjects, those subjects that are animate and definite are precisely those that are least in need of explicit case marking.

Comrie (1989, Sec. 6.2.2) reports that animacy and definiteness play similar roles in the case-marking systems of a wide range of human languages, including virtually all Australian languages. The fact that we discovered a similar effect for Japanese in the CHJ data therefore helps to reinforce Comrie’s observation that “case marking, which has so often been viewed as an area of language-specific idiosyncrasy ... can [nevertheless] be the subject of fruitful language universals” (Comrie 1989, p. 136).

5.4.5 Focus and particle ellipsis
Particle ellipsis in Japanese has been claimed to correlate with the phenomenon of focus. Specifically, NPs that represent the focus (the point of contrastive or important information) of an utterance must keep their particles, while NPs that are not under focus are licensed to drop their particles. This observation has been given a number of different formulations:

wa is never dropped if the NP it marks is the focus (or part of the focus) of the sentence. (Tsutsui 1981, p. 297)
If the referent of X in X wa is psychologically close to the speaker and the hearer, wa tends to drop unless X is under focus. (Makino and Tsutsui 1986, p. 24) The direct object marker o can be omitted unless the NP o is under focus. (Makino and Tsutsui 1986, p. 25) My assumption is that whenever the pertinent NP is “deemphasized” or “defocused,” the case marker can be deleted. (Masunaga 1988, p. 147) None of these authors offers a precise characterization of what it means for an NP to be under focus. Masunaga (1988), however, proposes that a subject or object NP can be indirectly defocused if some other element in a sentence is marked with an emphatic particle. The defocused subject or object is then free to drop its grammatical particle. Masunaga gives three examples of emphatic particles that have this defocusing effect: the NP particle mo (‘too/even’), and the sentence-final discourse particles yo and zo (see Section A.4). Masunaga lists a handful of example sentences in which the use of these parti cles causes the subject or object of the sentence to be defocused, thereby licensing particle ellipsis. Her example sentences include the following (particle ellipsis is indicated by the notation -0): (5.12) Boku I
wa Ran-0 top
R a n -(O B J)
‘I sa w Ran (as
san
do
mo
mita.
th re e
tim e s
even
saw
m a n y as) th re e tim e s.’
(5.13) kinou boston de Ran-0 mita yo. yesterday Boston inR an -(O B J) saw FP ‘I saw Ran yesterday in Boston.’ (5.14) Onnanoko-0 kita zo. girl-(S U B J) came f p ‘A g irl came.’ Masunaga asserts that the particle ellipses in examples (5.12)-(5.14) would be unacceptable if not for the presence of the emphatic particles mo, yo, and zo, which serve to license the ellipses.
                      Grammatically defocused            Other
Following particle?   N1             N2              N1              N2
yes                   627    .63     209    .37      2,400   .68     986    .50
no                    367    .37     351    .63      1,119   .32     995    .50
Total                 994   1.00     560   1.00      3,519  1.00     1,981 1.00

Table 5.21: Particle ellipsis in grammatically defocused positions

Evidence for focus constraints on particle ellipsis in CHJ

In the remainder of this section we examine the CHJ corpus for empirical evidence that particle ellipsis is constrained by focus. Since our CHJ annotations include acoustic as well as syntactic information, we are able to consider two types of focus phenomena: (1) grammatical ‘defocusing’ (as described above by Masunaga), and (2) prosodic focus, as measured by F0 measurements of individual nouns.

Let us use the term “grammatically defocused” to describe sentences such as (5.12)-(5.14); that is, sentences that contain a subject or direct object that has presumably been defocused because some other element in the sentence has been marked with an emphatic particle. We can then assemble the set of grammatically defocused sentences in the CHJ corpus.

Our inventory of emphatic particles begins with Masunaga’s set: mo, yo, and zo. To these we add the restrictive focus NP particles dake, bakari, and sika (which all translate roughly as ‘only’). Finally, we add the remaining emphatic sentence-final discourse particles listed in Section A.4: yone, yona, wa, wayo, wayone, and wana. (Note that the emphatic discourse particle wa, written with the character わ, is a different particle than the topic particle wa, which is written は.)

Table 5.21 shows the rates of particle ellipsis for N1 and N2 arguments in grammatically defocused sentences in the CHJ data. As the table shows, arguments in grammatically defocused positions exhibit higher rates of particle ellipsis than other arguments. The ellipsis rate for defocused N2s, .63, is significantly higher than the rate for other N2 arguments, .50 (χ² = 27.17, p < .001), and similarly for N1 arguments (χ² = 9.21, p < .01).

Our results in Table 5.21 are once again quite similar to the results of an experiment carried out by Matsuda (1996, p. 171).
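The χ² values reported for Table 5.21 can be recomputed directly from the counts in the table. The following is a minimal sketch, assuming that the reported statistics are uncorrected Pearson χ² values on the 2×2 tables (the text does not say which variant was used); the function name pearson_chi2 is our own and is not part of any CHJ tooling.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic for a 2x2 table, without continuity correction.

    Assumption: this is the variant behind the reported values; the text
    does not state it explicitly.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts from Table 5.21.
# Rows: (grammatically defocused, other); columns: (particle kept, particle dropped).
n2_table = [[209, 351], [986, 995]]
n1_table = [[627, 367], [2400, 1119]]

chi2_n2 = pearson_chi2(n2_table)
chi2_n1 = pearson_chi2(n1_table)

print(f"N2: chi2 = {chi2_n2:.2f}")  # N2: chi2 = 27.17
print(f"N1: chi2 = {chi2_n1:.2f}")  # N1: chi2 = 9.21
```

With one degree of freedom, the conventional critical values are 6.63 for p < .01 and 10.83 for p < .001, which agrees with the significance levels given above.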
Matsuda found that direct objects in utterances with sentence-final particles dropped the o particle 62% of the time, while direct objects in utterances without sentence-final particles dropped o only 52% of the time, a significant difference (p < .001).

At first glance, the empirical findings reported by Matsuda (1996) and in Table 5.21 seem to support Masunaga’s claim that grammatical defocusing licenses particle ellipsis, even though the effect is far from categorical.

On the other hand, these empirical results are also subject to an alternative interpretation. As Matsuda (1996, p. 176) points out, particle ellipsis and sentence-final particles are both associated exclusively with colloquial speech (see Section A.4), so it is not at all surprising that these two phenomena should correlate highly. Conversations that are highly colloquial will exhibit high rates of both particle ellipsis and sentence-final particles, while conversations that are more formal will exhibit less of each. Note that this explanation can also account for Masunaga’s acceptability judgments about her examples (5.12)-(5.14): the use of emphatic particles renders those sentences more colloquial, and hence makes the particle ellipses sound more natural. Under this interpretation, the alleged ‘defocusing’ phenomenon is a red herring.

Prosodic focus and particle ellipsis

We began this section by reporting the claim that focused NPs do not license particle ellipsis. So far we have tested this claim only indirectly, by examining particle ellipsis rates for NPs when other elements of the same sentence are focused. We are now in a position to test the claim directly using our acoustic annotations of the CHJ corpus.

A wide range of human languages use intonation to mark the focus of an utterance. In particular the focus (the point of new, contrastive, or otherwise important information) tends to be prosodically stressed, while information that is not in focus tends to be unstressed. (This is discussed further in Sections 6.2.4, 6.2.5, and 6.3.1 in the next chapter.)
In the case of Japanese, this phenomenon is manifested in terms of pitch: expanded pitch range is used to indicate a point of focus or contrast, while compressed pitch range correlates with non-prominent or given information (Matsunaga 1984; Pierrehumbert and Beckman 1988; Hirose et al. 1996; Venditti and Swerts 1996; Venditti 2000). In physical terms, this expansion or compression of pitch range is achieved by raising or lowering the F0 topline (corresponding to peak high tones) rather than modifying the bottom range of F0; as a result, peak F0 values are useful clues for identifying points of focus in Japanese utterances (Venditti 2000).

Recall that our acoustic annotations of nouns in the CHJ corpus, depicted in Figure 4.9 on page 75, let us determine whether or not a particular noun carries the highest F0 value in the sentence in which it occurs. Nouns that have this property we label prosodically focused. In other words, our set of prosodically focused nouns consists of those nouns associated with the maximal point of peak F0 in the sentence in which they occur; if a noun does not contain the maximum F0 value in its sentence, we do not count it as prosodically focused. This definition is admittedly rather crude, since it does not take into account natural pitch declination among other factors (Section 6.3.1), but will have to serve as a rough approximation of prosodic focus in lieu of manually interpreting the pitch contours of thousands of utterances.

                      Prosodically focused               Other
Following particle?   N1             N2              N1              N2
yes                   510    .69     179    .51      2,517   .67     1,016  .46
no                    227    .31     171    .49      1,259   .33     1,175  .54
Total                 737   1.00     350   1.00      3,776  1.00     2,191 1.00

Table 5.22: Particle ellipsis and prosodic focus

Table 5.22 shows the effect of prosodic focus, as defined above, on particle ellipsis in the CHJ data. In the case of N1, there is essentially no difference in particle ellipsis rates between focused and unfocused arguments (χ² = 1.80). Turning to N2, we find that prosodically focused N2s exhibit a slightly lower rate of particle ellipsis, but again the difference is not statistically significant (χ² = 2.76). In other words, prosodic focus is not a significant predictor of particle ellipsis.

Conclusion: focus and particle ellipsis

Our results on focus and particle ellipsis can be summarized as follows. First, we saw that alleged evidence for the role of focus in particle ellipsis, in the form of grammatical ‘defocusing’ using emphatic particles (Masunaga 1988), can be explained quite plausibly without the notion of focus. Then, investigating more direct evidence for the role of focus in particle ellipsis, we discovered that arguments that are prosodically focused, as measured by sentence-maximum peak F0 values, do not differ significantly from other arguments in how often they drop their particles. In sum, the claim that focus is associated with particle ellipsis does not seem to be supported by evidence in the CHJ corpus.

On the other hand, it would perhaps be premature to accept our negative results as conclusive. One could imagine carefully controlled experiments that would probe the relation of focus and particle ellipsis more directly than we have been able to do here.
For example, the grammatical defocusing effect proposed by Masunaga (1988) might be tested in a way that controls for the level of colloquial language. Furthermore, our definition of prosodic focus based on the peak F0 value of the sentence is rather crude; one could imagine a better operationalization, based perhaps on perception studies or on F0 values predicted by a phonological model (e.g. Venditti 2000). In any case, we leave these efforts to future research.

5.4.6 Conclusion: particle ellipsis

Our explorations in the CHJ corpus have identified a number of syntactic and semantic factors that correlate significantly with particle ellipsis in Japanese. These include the following:

• Animacy and definiteness
• Questions and idioms
• Utterance length (in words)
• Word length (in syllables)
• Verb adjacency

Two sociological factors, sex and dialect, were also investigated, but did not yield robust results. Furthermore, we were unable to find evidence in support of the claim that focus is associated with particle ellipsis.

Our results seem to raise as many questions as they answer. For example, what explains the correlation between particle ellipsis and questions? Or word length? Is there a common principle that unifies the entire set of factors above? This last question underscores the fact that in spite of (or perhaps because of) the large quantities of data we were able to marshal in our investigation, we have not been guided towards any unified theory of particle ellipsis in Japanese. On the contrary, if any conclusion seems justified by our results, it is that theories of particle ellipsis that rely on a single categorical explanation, such as adjacency (Saito 1985) or focus (Masunaga 1988), are doomed to inadequacy.

Our results also suggest that if a comprehensive account of Japanese particle ellipsis ever does emerge it is likely to be functionalist in nature. We have seen that a single functionalist explanation, the need to disambiguate grammatical roles like subject and object, accounts quite plausibly for the way particle ellipsis correlates with both utterance length (Section 5.4.3) and with animacy and definiteness (Section 5.4.4). The animacy and definiteness constraints discussed in Section 5.4.4 are particularly interesting, because these act as ‘hard’ or categorical constraints on case marking in some languages, but are ‘soft’ constraints (i.e., statistical preferences) in Japanese.
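As a supplementary note to Section 5.4.5, the sentence-maximum criterion for prosodic focus can be stated as a short procedure. The sketch below is illustrative only: the tuple layout, the function name prosodically_focused, and the F0 values are all invented for the example and do not reflect the actual format or measurements of the CHJ annotations.

```python
def prosodically_focused(sentence):
    """Return the nouns that carry the maximum peak F0 of their sentence.

    `sentence` is a list of (word, is_noun, peak_f0_hz) tuples. This layout
    is a hypothetical stand-in for the CHJ acoustic annotations.
    """
    if not sentence:
        return []
    max_f0 = max(f0 for _, _, f0 in sentence)
    # A noun counts as focused only if the sentence-wide F0 maximum falls on it.
    return [word for word, is_noun, f0 in sentence if is_noun and f0 == max_f0]


# Invented F0 values (Hz), for illustration only.
utterance = [
    ("tegami", True, 210.0),
    ("kaita", False, 185.0),
    ("denwa", True, 240.0),
    ("sinakatta", False, 200.0),
]
print(prosodically_focused(utterance))  # ['denwa']

# If the F0 maximum falls on a non-noun, no noun is labeled as focused.
print(prosodically_focused([("inu", True, 180.0), ("hasitte", False, 220.0)]))  # []
```

Because the criterion compares raw peak values, a sentence-initial noun will tend to win simply on account of pitch declination, which is one reason the definition is described above as crude.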
CHAPTER 6

Wa-marking

Contents

6.1 Introduction
    6.1.1 Topic and subject in Japanese
    6.1.2 Mechanics of wa-marking
6.2 Semantics of wa- and ga-phrases
    6.2.1 Kuno’s taxonomy of wa and ga
    6.2.2 Categorical vs. thetic judgments
    6.2.3 Wa as a backgrounding particle
    6.2.4 Old vs. new information
    6.2.5 File card-based accounts of wa and ga
    6.2.6 The Strong Familiarity Condition
    6.2.7 Conclusion: semantics of wa- and ga-phrases
6.3 Intonation and wa and ga
    6.3.1 Intonation and focus
    6.3.2 F0 correlates of wa-phrases
    6.3.3 F0 correlates of wa and ga in CHJ
    6.3.4 Conclusion: intonation and wa and ga
6.4 Properties of wa-marked nouns
    6.4.1 Accessibility to wa-marking
    6.4.2 Semantic properties of wa- and ga-marked nouns
    6.4.3 Conclusion: properties of wa-marked nouns

6.1 Introduction
This chapter investigates the use and function of the topic-marking particle wa in Japanese. Parts of the chapter are devoted to a review of the large existing literature on Japanese wa. However, just as with our investigation of ellipsis in Chapter 5, we also make use of the CHJ corpus data to provide quantitative and qualitative characterizations of how the participants in a Japanese conversation actually use wa in natural, spontaneous speech.

The chapter is organized as follows:

• Here in Section 6.1 we present introductory information about Japanese topics and the use of the particle wa.
• Section 6.2 examines the contrasting semantics of wa- and ga-phrases, a subtle and perplexing issue that has received considerable attention in the Japanese linguistics literature.
• Section 6.3 examines intonational differences between wa- and ga-phrases.
• Finally, Section 6.4 examines certain lexical semantic properties of wa- and ga-marked nouns in the CHJ corpus, and lists the types of argument roles most frequently marked by wa and ga.

6.1.1 Topic and subject in Japanese

Section A.2 in Appendix A offers a brief introduction to Japanese postpositional particles, including wa and ga. We observe in that section that sentences like (6.1), with an explicit ga-marked subject, exhibit a subject-predicate organization.

(6.1) uindoozu ga   sugoi    koutyou    da  si
      Windows  SUBJ terrible good shape COP and
      ‘Windows is terribly successful.’    (3006; 465)
The subject-predicate distinction, which dates back to Aristotle, is a familiar one in English and other European languages. The subject of a sentence, in the traditional Aristotelian sense, is an entity (e.g., a person, place, or object) of which something is said, or predicated (Seuren 1998). In linguistics, the subject is usually defined in grammatical terms; that is, in terms of specific syntactic and semantic properties (Keenan 1976). For example, in English and other European languages the agent of a transitive verb is almost always encoded grammatically as a subject.¹

In Japanese, there is another very common type of sentence organization called the topic-comment structure (Li and Thompson 1976; Gundel 1988). This can be illustrated by replacing the subject particle ga in (6.1) with the topic particle wa, as in (6.2):

(6.2) uindoozu wa  sugoi    koutyou    da  si
      Windows  TOP terrible good shape COP and
      ‘As for Windows, it’s terribly successful.’

¹ Ergative languages (Dixon 1994) are an important exception but are beyond the scope of this book.

The precise nature of the semantic difference between the ga-sentence in (6.1) and the wa-sentence in (6.2) is an issue that we take up next in Section 6.2. For now, we simply note that since the topic-marking particle wa shares many syntactic and semantic properties with the subject-marking particle ga, the two particles are often discussed together so that their differences can be highlighted. In fact, linguistic accounts of wa are typically formulated explicitly in terms of how wa differs from ga.

Semantic and syntactic notions of topic

The topic of an utterance is traditionally defined as the particular semantic entity (e.g., person or object) that the utterance is ‘about.’ Gundel (1988) offers the following more precise definition:

    An entity E is the topic of a sentence, S, iff in using S the speaker intends to increase the addressee’s knowledge about, request information about, or otherwise get the addressee to act with respect to E. (Gundel 1988, p. 210)

The term topic phrase, in contrast, refers to syntactic units, such as the NP uindoozu wa (‘Windows’) in (6.2). In other words, the topic, a semantic entity, is the referent of (i.e., is referred to by) the topic phrase.

In this chapter we will generally use the term topic interchangeably for both the semantic and syntactic senses, except where doing so might introduce confusion. The term comment generally refers to all those constituents in a topic-comment construction that are not part of the topic phrase. The comment of the sentence expresses information that is in some sense ‘about’ the topic.

The semantic notion of topichood also extends beyond the sentence level, as we saw in Section 2.3.4. For example, an entire conversation, or part of a conversation, might have a particular discourse topic, which might or might not be mentioned explicitly in the discourse.

6.1.2 Mechanics of wa-marking
In Japanese, not only subjects but also direct objects, locatives, and other arguments can be made into topics. Subjects (marked by ga) and direct objects (marked by o) can be topicalized by simply replacing the particle (ga or o) with wa, just as was done in (6.2). In the case of (6.3) below, it is hon (‘books’), the direct object of katte (‘buying’), that is wa-marked.

(6.3) nanka dizunii no  hon  wa  nanka ippai katte
      uh    Disney  GEN book TOP uh    lots  buying
      ‘Uh, I’ve been buying lots of Disney books.’    (0986; 615)

The English gloss for (6.3) might also be (awkwardly) rendered: ‘As for Disney books, I’ve been buying a lot of them.’

NPs followed by particles other than ga and o can be topicalized by attaching wa to the particle. This is illustrated in (6.4) and (6.5):

(6.4) nihongo  ni wa  yama     toka kawa  toka itimonzi      de
      Japanese in TOP mountain etc. river etc. straight line COP
      ‘In Japanese, ‘mountain,’ ‘river,’ etc. are (written with) straight lines.’    (1048; 671)

(6.5) watasi no  toko  no  ie    kara wa  kanari tooi no
      I      GEN place GEN house from TOP pretty far  FP
      ‘From my house, (it’s) pretty far.’    (1288; 350)

The particles ga and o, however, can never be followed by wa.

Locative arguments that are normally followed by ni frequently drop that particle when they are topicalized. This is illustrated by the wa-marking of nihon (‘Japan’) in (6.6):

(6.6) ima, nihon wa  hurousya ga   ooi      no?
      now  Japan TOP vagrant  SUBJ numerous NOM?
      ‘Are there lots of vagrants in Japan now?’    (1237; 235)

Note that wa-marking, in the account given above, is a mechanism or process: we say the particle wa ‘replaces’ or ‘attaches to’ other particles, thereby creating a topic phrase. Put another way: a Japanese speaker ‘marks’ a topic phrase with wa. (This latter formulation sidesteps the issue of whether other particles are involved or not.) This formulation echoes our earlier characterization of both ellipsis and wa-marking as “optional linguistic mechanisms” in Section 5.1.

So far, we have described the function of wa as indicating the topic of a sentence. However, wa is sometimes claimed to have other semantic and pragmatic uses as well, such as indicating contrastive information in an utterance. This is the issue we take up next.
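The ‘replace’ and ‘attach’ mechanics described in this section can be summarized as a small string-rewriting rule. The following is a toy sketch only: the function name topicalize and the drop_ni flag are our own illustrative choices, and the sketch says nothing about when a speaker actually chooses to wa-mark a phrase.

```python
# Particles that are replaced by wa rather than followed by it.
REPLACED_BY_WA = {"ga", "o"}


def topicalize(noun_phrase, particle, drop_ni=False):
    """Build a wa-marked topic phrase from a romanized NP and its particle.

    drop_ni is a hypothetical flag modeling the optional loss of locative ni.
    """
    if particle in REPLACED_BY_WA:
        # ga and o can never be followed by wa: wa takes their place.
        return f"{noun_phrase} wa"
    if particle == "ni" and drop_ni:
        # Locative ni is frequently (but not always) dropped under topicalization.
        return f"{noun_phrase} wa"
    # All other particles are kept, with wa attached after them.
    return f"{noun_phrase} {particle} wa"


print(topicalize("uindoozu", "ga"))               # uindoozu wa
print(topicalize("dizunii no hon", "o"))          # dizunii no hon wa
print(topicalize("nihongo", "ni"))                # nihongo ni wa
print(topicalize("watasi no toko no ie", "kara")) # watasi no toko no ie kara wa
print(topicalize("nihon", "ni", drop_ni=True))    # nihon wa
```

The five calls correspond to the topic phrases in examples (6.2) through (6.6), respectively.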
6.2 Semantics of wa- and ga-phrases

This section is devoted to the semantics of wa-marking in Japanese. Our goal is to shed light on the meaning that wa-marked phrases contribute to a sentence.

The semantics of wa and ga in Japanese is in many ways a well-worn research topic, as the reader of this section will quickly appreciate during our review of the voluminous literature on this subject. The size of this literature is perhaps justified, however, by the subtlety of the semantic and pragmatic distinctions evoked by these two expression types.

After our review of the literature, we devote the last part of this section, and subsequent sections of this chapter, to analyzing certain aspects of how wa and ga are actually used in the CHJ corpus. Our empirical findings will help to illustrate, and in some cases evaluate, observations from the theoretical literature.

This section is organized roughly chronologically. We begin in Sections 6.2.1 through 6.2.3 by considering three of the most influential characterizations of wa and ga that were published in English in the 1970s: those of Kuno (1973), Kuroda (1972), and Martin (1975). Then, in Section 6.2.4, we examine a more recent class of theoretical descriptions of wa- and ga-phrases that are formulated in terms of givenness; i.e., whether the phrases in question represent information that is ‘old’ or ‘new’ to the discourse. Section 6.2.5 considers Japanese wa-marking from the perspective of the file card-based approach of Vallduví (1992), which seeks to situate the functions of Japanese wa and ga within a broader, cross-language account of how information is ‘packaged’ linguistically by speakers using means such as prosody and syntax. Finally, in Section 6.2.6 we criticize one particular file card approach to Japanese topics, that of Portner and Yabushita (1998).

As usual, our descriptions of various theoretical approaches to wa and ga in this section are illustrated with example utterances taken from the CHJ corpus.
6.2.1 Kuno’s taxonomy of wa and ga

One of the most influential linguistic descriptions of the particles wa and ga is the one presented by Kuno (1973, Chap. 2).

Kuno on wa

Kuno enumerates two different uses of wa. The first is the standard thematic wa, which marks an NP as the theme or topic of the sentence. This thematic use is illustrated in utterances (6.7) and (6.8).

(6.7) watasi wa  de    na  katta na.
      I      TOP leave NEG PAST  FP
      ‘As for me, I didn’t leave.’    (0924; 477)

(6.8) tiba  tyan wa  kenzai  da  kedo.
      Chiba Ms.  TOP healthy COP but
      ‘As for Ms. Chiba, she is healthy.’    (1725; 537)
According to Kuno, themes are restricted to NPs that are either anaphoric (i.e., NPs like ‘I’ or ‘Ms. Chiba’ that refer to particular entities) or generic (i.e., NPs like ‘mankind’ that refer to an entire class). Furthermore, the NPs marked by thematic wa are generally restricted to the beginning of a sentence.²

Kuno’s second category is contrastive wa. Utterances (6.9) and (6.10) from CHJ illustrate contrastive uses of wa.³

(6.9) maa  nusuma re   ru   koto  wa  nai
      well steal  PASS PRES thing TOP NEG
      ‘Well, it’s not that it was stolen.’    (0988; 271)

(6.10) tegami kaita n   da  kedo, denwa wa  sinakatta n   da  kedo
       letter wrote NOM COP but,  phone TOP didn’t    NOM COP but
       ‘I wrote a letter, but I didn’t call.’    (2188; 180)
According to Kuno, phrases marked by the contrastive wa receive prominent intonation. Furthermore, unlike the case of thematic wa, NPs marked by contrastive wa are not restricted to being anaphoric or generic.

Kuno notes further that many instances of thematic wa phrases are potentially ambiguous between the thematic and contrastive readings. For example, given the proper context or intonation, (6.7) could be interpreted contrastively to mean something like ‘I didn’t leave (but the others did).’

² This property, being sentence-initial, seems to be a defining feature of themes across languages (Vallduví 1992).
³ For the sake of consistency we will continue to gloss wa as TOP, even in cases where it does not seem to mark a topic.

McGloin on wa

McGloin (1986) elaborates somewhat on Kuno’s account of wa phrases, most notably by investigating the semantic interactions between wa and negation. Like Kuno, McGloin assumes that there are two distinct uses of wa, a thematic and contrastive one. In support of this, she notes a number of properties that seem to distinguish the two uses of wa:

1. Contrastive wa carries emphatic stress.
2. Only generic or anaphoric NPs take thematic wa.
3. Thematic wa does not occur in embedded sentences.
4. Contrastive wa signals the target of negation.

With respect to the second property, McGloin emphasizes that not only can the contrastive wa attach to NPs that are neither generic nor anaphoric, but in fact it can attach to a variety of other phrase types besides NPs. These include adverbials like zyouzu ni (‘skillfully’), quantifiers like minna (‘every’), deverbal nouns like tukuri (‘construction’), and verb stems like iki (‘go’). For example, in B’s utterance in (6.11), wa attaches to the adverbial form waruku (‘badly’):

(6.11) A: ii   uti   desyou?
          good house COP
          ‘It’s a good house, right?’
       B: un.   maa  waruku wa  nai kedo ne.
          yeah, well badly  TOP NEG but  FP
          ‘Yeah. Well, it’s not bad anyway.’    (1123; 383)
With respect to the fourth property, negation, McGloin observes that thematic wa phrases typically lie outside the domain of negation, whereas contrastive wa phrases act as a target for negation. In fact, we already illustrated this phenomenon using utterance (6.7) above. Recall that under its thematic reading, (6.7) means something like ‘as for me, I didn’t leave’. Under a contrastive reading, however, (6.7) implies that someone else besides the speaker left: ‘I didn’t leave (but the others did).’

Kuno on ga

Kuno enumerates three different uses of the subject-marking particle ga. First is the use of ga in a neutral description of actions, as illustrated in (6.12).

(6.12) myuuziamu ga   aru   no  yo
       museum    SUBJ exist NOM FP
       ‘There is a museum (here).’    (0924; 278)
Sentences of neutral description such as (6.12) “present an objectively observable action, existence, or temporary state as a new event” (Kuno 1973, p. 51). According to Kuno, only the subjects of action verbs, existential verbs, and adjectives that represent changing states can be followed by the ga of neutral description.

Kuno’s second use of ga is for exhaustive listing. This use is illustrated by speaker B’s answer to A’s question in (6.13):

(6.13) A: nani ga   ii   no  tte  itta ra
          what SUBJ good NOM that said if
          ‘What did (he) say was good?’
       B: sonna anta kyuuryou ga   ii   tte  itta
          such  you  salary   SUBJ good that said
          ‘(He) said your salary was good.’    (2235; 522)
Here, B is providing a definitive account, or exhaustive listing, of the things that were said to be good. A paraphrase might be ‘What he said was good was your salary.’ In contrast to the neutral description ga, the exhaustive listing ga can also mark the subject of predicates that denote more stable or permanent states. Kuno claims that when a ga-marked subject appears with a dynamic predicate or action verb, the sentence is often ambiguous between the neutral description and exhaustive listing readings.

Finally, Kuno notes that there are certain cases where ga is used to mark an object, not a subject, as in (6.14).

(6.14) uti no  oyazisan ninniku ga   suki da
       we  GEN father   garlic  SUBJ fond COP
       ‘Our father likes garlic.’    (2217; 215)

This ‘objective ga’ marks the object of all transitive adjectives (e.g., hosii ‘want’) and a handful of other stative transitive predicates (e.g., suki ‘be fond of’, above).

6.2.2 Categorical vs. thetic judgments
Kuroda (1972) offers an account of the difference between wa and ga that is based on the philosophical distinction between categorical and thetic judgments. These notions originate in the 19th-century theory of judgment proposed by Franz Brentano and elaborated by Anton Marty.

The categorical judgment conforms to the traditional Aristotelian paradigm of the subject and predicate. That is, a categorical judgment consists of two separate acts: (i) the act of recognizing the subject, an autonomous individual, and (ii) the act of affirming or denying what is expressed by the predicate about the subject. English sentences like John walks, The book is on the table and That tree is an elm represent categorical judgments.

The thetic judgment, on the other hand, represents simply the recognition or rejection of the material of a judgment, without first individuating an autonomous subject. That is, a state of affairs is grasped as a whole, without the analysis characteristic of the categorical judgment. Existential sentences like There is a God and impersonal sentences like It is raining express thetic judgments.

According to Kuroda, Japanese sentences with wa-marked topics represent categorical judgments, while sentences using ga rather than wa represent thetic judgments. Kuroda illustrates with the pair in (6.15a-b).

(6.15) a. inu ga   hasitte iru
          dog SUBJ running is
          ‘A dog is running.’
       b. inu wa  hasitte iru
          dog TOP running is
          ‘The dog is running.’

The ga sentence (6.15a) corresponds to a thetic judgment; that is, a direct recognition of the event of something running. Here, the judgment which underlies the act of naming is subordinated to the kernel judgment of the event. In contrast, sentence (6.15b), using wa, corresponds to the categorical judgment; that is, a judgment with a logical (rather than grammatical) subject-predicate structure. Here the judgment also involves a specific event of running, but in this case the speaker’s interest is first directed towards the entity referred to by the logical subject, and then this entity is connected to the occurrence of the event.

Shibatani’s ‘emphatic judgment’

Kuroda’s characterization of wa as indicating a categorical judgment seems to be endorsed by Shibatani (1990, pp. 264-265). The particle wa, according to Shibatani, “separates an entity from the rest of things and has the effect of making an emphatic judgment.” He suggests that this characterization of wa obviates the need for Kuno’s separate contrastive wa:

    Kuno’s “contrastive wa”, then, is due to the inherent nature of wa as an emphatic particle (understood as above), whose emphatic force becomes more pronounced when there is a contrasting proposition. That is, there aren’t two distinct wa’s, or two distinct meanings associated with wa, as suggested by the labels “thematic” wa and “contrastive” wa; rather, one and the same wa has the effect of emphasizing the contrast when the discourse environment provides a background for contrast. (Shibatani 1990, p. 265)

6.2.3 Wa as a backgrounding particle
Martin (1975, Section 2.3) offers a somewhat different perspective on wa. In Martin’s account, the central function of wa is to subdue or background its material.

The backgrounding effect of wa, and its contrast with ga, are perhaps most clearly illustrated in examples of questions and answers. Martin notes that wa phrases are used in wh-questions in order to redirect the focus away from that phrase and onto the interrogative word. This is illustrated in examples like (6.16) and (6.17) from CHJ:

(6.16) oheya wa  nan nin?
       room  TOP how many people?
       ‘How many people are in the room?’    (0924; 720)

(6.17) ne kyou  wa  nan  niti da  kke?
       FP today TOP what day  COP FP
       ‘Say, what’s the date today?’    (1538; 467)
Martin (1975, p. 62) also observes that the particle wa is used in the answer to a question when one wishes to deny some particular element of the question. He illustrates this point with the question-answer pairs in (6.18) and (6.19). Utterance (6.18b), the negative answer to (6.18a), highlights the negation partly by using wa to background the negated entity:⁴

(6.18) a. tabako    ga   aru   ka?
          cigarette SUBJ exist Q
          ‘Are there cigarettes?’
       b. tabako    wa  nai.
          cigarette TOP NEG
          ‘There are no cigarettes.’

In the response in (6.19b), on the other hand, the focus is on tabako rather than on the negation, and so ga is used.

(6.19) a. nani ga   nai ka?
          what SUBJ NEG Q
          ‘What don’t we have?’
       b. tabako    ga   nai
          cigarette SUBJ NEG
          ‘There are no cigarettes.’

⁴ In Section 5.4.3 we noted that questions like (6.18a) of the form X ga aru? in fact sound unnatural in Japanese.

Under Martin’s analysis, wa bears a relationship of symmetrical focusing with the emphatic particle mo (‘even’, ‘too’). That is, mo highlights while wa subdues. The hearer’s attention is concentrated or focused by mo, while wa directs attention elsewhere. Similarly, Rickmeyer (1985, p. 280) characterizes the effect of wa as right-focusing, i.e., backgrounding the phrase it marks and limiting the focus to the remainder of the sentence. This contrasts with the left-focusing effect of mo.

6.2.4 Old vs. new information
The givenness of an entity mentioned in a discourse refers to whether that entity represents information that is in some sense ‘old’ or ‘new.’ While it is recognized that givenness plays an important role in prosody and other aspects of linguistic structure, there has nevertheless been some disagreement about how givenness itself should be defined (Wolters 2001). Halliday (1967, p. 204), for example, observes that, in English, prominent intonation is associated with new information, i.e., information which “the speaker presents ... as not being recoverable from the preceding discourse.” Prince (1981) offers a more detailed taxonomy of givenness. In Prince’s system, given information is either textually evoked (mentioned in the discourse) or else situationally evoked. Prince also introduces the category of inferable, to describe those entities which the speaker assumes the hearer can infer from other evoked entities (e.g., the driver is inferable from the bus).

The Japanese particles wa and ga are frequently characterized in terms of the givenness of the entities marked by the particles.5 In particular, it is observed that wa often serves to indicate old, given, or presupposed information, whereas normal syntactic marking of a subject and object with particles ga and o suggests new information. For example, in one traditional Japanese discourse pattern, ga is used to mark the subject of a sentence when the individual denoted by the subject is first introduced. Then, in subsequent utterances, wa replaces ga, and the subject moves into the background. This is illustrated in the following pair of ‘fairy tale’ sentences from Makino and Tsutsui (1986):

(6.20) a. Mukasimukasi, hitori no oziisan ga sundeimasita.
          ‘Once upon a time, there lived an old man.’
       b. Oziisan wa totemo binbou desita.
          ‘The old man was very poor.’

5 According to Makino (1982, p. 134), this observation goes back at least to Kasuga (1918).
Another traditional discourse paradigm, discussed by Hinds and Hinds (1979) and Maynard (1980), is one in which an NP is introduced first with ga, then highlighted in a second reference using wa, and subsequently elided. However, quantitative studies of oral narratives by Clancy and Downing (1987) and of written children’s stories by Watanabe (1989) show that NP-ga followed by ellipsis is the most common technique for introducing new individuals, and that the intervening wa is in fact rare. In any case, Watanabe’s results from children’s stories do support the givenness analysis of wa and ga: in those stories, 88% of new referents were marked by ga, versus 12% by wa, and 95% of the wa tokens marked old referents.

Makino (1982) presents a givenness-based critique of Kuno’s account of wa. “Underlyingly, both thematic and contrastive wa indicate old information,” claims Makino (1982, p. 137), adding “I cannot think of any context where wa indicates important new information.” Starting from this premise, Makino goes on to argue that, pace Kuno, there is only one wa, and it is inherently both thematic and contrastive:

    But the thematic meaning carries with it contrastive meaning simultaneously. Such a dual function is actually quite reasonable, because the moment the listener assumes that a theme has been chosen out of some possible candidates, he will get the contrastive interpretation on top of the thematic interpretation. (Makino 1982, p. 137)

The observed difference between Kuno’s two interpretations of wa, according to Makino, can be explained by the fact that thematic NPs tend to refer to information that is older, or otherwise less important, than information referred to by contrastive NPs. In fact, Kuno’s entire taxonomy of wa and ga types can be subsumed under a hierarchy of ‘communicative dynamism,’ representing the degree of newness or importance of the information that these particles mark. Makino’s hierarchy is displayed in (6.21), from newest to oldest:

(6.21) neutral description ga > exhaustive listing ga > contrastive wa > thematic wa
Inoue (1982), too, invokes the old/new distinction in order to account for the difference between Kuno’s thematic and contrastive senses of wa. Inoue observes, following Kuroda, that wa constructions make categorical judgments about particular individuals, to the exclusion of others, and are in this sense inherently contrastive. When a wa phrase refers to old information, however, “its discourse function of topic becomes highlighted, subduing the sense of contrast” (Inoue 1982, p. 285). When a wa phrase is new, on the other hand, then “the sense of contrast becomes prominent, since it is deprived of the discourse function of topic.”

6.2.5 File card-based accounts of wa and ga
In this section, we examine wa and ga from the perspective of the ‘information packaging’ framework of Vallduví (1992). This framework uses a file card metaphor to provide a cross-language account of how information is ‘packaged’ linguistically by speakers using means such as prosody and syntax. We begin by introducing the central metaphor underlying Vallduví’s theory: the file.

File change semantics

Karttunen (1976) introduced into semantic theory the metaphor of a file that stores records of the entities (‘discourse referents’) that are talked about in a discourse. The notions of files and discourse referents were later incorporated into the influential semantic theories of Heim (1982) and Kamp and Reyle (1993). Heim in particular used the file metaphor in her analysis of the semantics of definite and indefinite noun phrases.

In Heim’s theory, the difference between definite and indefinite NPs is accounted for in terms of their presuppositions: definites must be familiar, while indefinites must be new. Familiarity is modeled using a metaphorical system of indexed, numbered file cards that hold the information conveyed by the speaker as a discourse progresses. For example, the utterance A woman saw a dog, with no preceding context, leads the hearer to construct two cards and to number them, say, 1 and 2. On card 1, the hearer writes “is a woman” and “saw 2,” and then fills in card 2 with “is a dog” and “was seen by 1.” The difference between definite and indefinite NPs, then, can be summarized by the rule: For every indefinite, start a new card; for every definite, update a suitable old card. The meaning of an utterance is equated with its “file change potential,” a function which assigns to every file F the result file F' which is brought about by making the utterance in a situation in which F obtains.

The information packaging framework of Vallduví (1992) is a model of how the hearer retrieves the informational content of an utterance and enters it into his file card system. Each file card is associated with a set of records containing descriptions (attributes and relations) about the entity it denotes, possibly making reference to other cards standing for other individuals and entities. Updating a file card with new information is a two-step process of finding (i) the relevant file card and (ii) the relevant record on it, if any, that is to be modified.

Focus and ground as instructions

Crucial to Vallduví’s account is the distinction between the focus, the important or new information in an utterance, and the ground, the parts of the utterance that are not in the focus. The ground subdivides further into link and tail elements, described below. The focus, link, and tail, the three primitives of Vallduví’s informational component,6 are then associated with three specific file update instructions. In other words, Vallduví’s system constitutes a formalization of the notion that speakers not only convey information to their listeners, but also provide them with instructions on how to accommodate and integrate the information.

Within the ground, the link indicates where, i.e. on which file card, the information is to be entered. The link corresponds to what is traditionally called the topic. It appears sentence-initially, since its function is to point to the location where subsequent information is to be recorded. The tail, on the other hand, indicates how the update is to be performed. Vallduví distinguishes two subtypes of the update instruction: UPDATE-ADD, where a new record (containing the information in the focus) is simply added, and UPDATE-REPLACE, where an old record is replaced with the new information.

Vallduví stipulates that all utterances contain a focus (on the assumption that speakers seek to be informative), but utterances may be link-less or tail-less. The result is a taxonomy of four sentence types, with four corresponding instruction types, listed in Table 6.1.
The focus of a sentence S signals an instruction to update the file system with the new information of the sentence, I_S. In the table, fc refers to the file card specified by the link, and record(fc) refers to a specific record on fc that predicates some property or attribute of the entity represented by fc. Note that there is no GOTO instruction in the link-less sentence types (3 and 4); that is, these sentence types do not instruct the hearer to move to a particular file card for update. Such an utterance might be one in which the locus of update is inherited from earlier in the discourse, or else an all-focus utterance with no presuppositions or previous context.

6 This tripartite information structure contrasts with traditional bipartite divisions such as topic/comment (Section 6.1.1) and theme/rheme. See Vallduví (1992, Chap. 3) for discussion.
     Sentence type      Instructions
1    Link-Focus         GOTO(fc), UPDATE-ADD(I_S)
2    Link-Focus-Tail    GOTO(fc), UPDATE-REPLACE(I_S, record(fc))
3    All Focus          UPDATE-ADD(I_S)
4    Focus-Tail         UPDATE-REPLACE(I_S, record(fc))

Table 6.1: Four sentence types

Example using English prosody

In English, according to Vallduví, the focus, link, and tail are associated with particular intonation patterns. The focus is associated with a simple high pitch accent, usually followed by a falling boundary tone. This focus accent is called an ‘A accent’ by Jackendoff (1972), and labeled H* in the system of Pierrehumbert (1980). Links, on the other hand, are associated with a complex fall-rise accent, sometimes called a B accent and often labeled L+H*. Finally, the remaining, prosodically weaker phrases in an utterance can be assigned to either the tail or the focus, depending on their status as old or new.

Consider Vallduví’s examples (6.22) and (6.23) below. Here we indicate A ACCENT using upper-case letters and B accent using boldface.

(6.22) The boss  [hates BROCCOLI]
       LINK      FOCUS

(6.23) The boss  HATES  broccoli.
       LINK      FOCUS  TAIL
Both (6.22) and (6.23) express the same proposition, but in each case the speaker is prosodically “packaging” that proposition in a different way. This distinction is captured in Vallduví’s update model. Normally, (6.22) assumes only the definite NP the boss as a discourse referent, and then gives new information about the boss, namely her attitude towards broccoli. That is, the packaging of (6.22) instructs the hearer to GOTO the file card for the boss and UPDATE-ADD the new information hates broccoli. Utterance (6.23), in contrast, seems to take for granted some previous discourse connection between the boss and broccoli, and gives new information as to the nature of that relationship. Once again, the link in (6.23) instructs the hearer to GOTO the file card for the boss. But where the card previously recorded that the boss had attitude P to broccoli (where P might have been left underspecified), the hearer is now instructed to UPDATE-REPLACE P with hates.
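The update instructions just described can be made concrete with a small sketch. The Python below is an illustration only and is not part of Vallduví’s formalism: the class and method names (File, FileCard, goto, update_add, update_replace) are hypothetical, and keying each record by the tail argument (here broccoli), with None standing for the underspecified attitude P, is a simplifying assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FileCard:
    """A card for one discourse referent.

    records maps an argument (e.g. "broccoli") to the relation the referent
    bears to it; None stands for an underspecified relation P.
    (This record layout is an assumption of the sketch.)
    """
    referent: str
    records: Dict[str, Optional[str]] = field(default_factory=dict)


class File:
    """The hearer's file: a set of cards plus the current locus of update."""

    def __init__(self) -> None:
        self.cards: Dict[str, FileCard] = {}
        self.locus: Optional[FileCard] = None

    def goto(self, referent: str) -> FileCard:
        """GOTO(fc): the link names the card to move to (created if absent)."""
        self.locus = self.cards.setdefault(referent, FileCard(referent))
        return self.locus

    def update_add(self, argument: str, relation: str) -> None:
        """UPDATE-ADD(I_S): enter the focus as a new record at the locus."""
        if self.locus is None:
            raise RuntimeError("no locus of update")
        self.locus.records[argument] = relation

    def update_replace(self, argument: str, relation: str) -> None:
        """UPDATE-REPLACE(I_S, record(fc)): the tail picks out an existing
        record, and the focus replaces its content."""
        if self.locus is None or argument not in self.locus.records:
            raise KeyError(f"no existing record for {argument!r}")
        self.locus.records[argument] = relation


def process_6_22() -> File:
    """(6.22): link = 'the boss', focus = 'hates broccoli'."""
    f = File()
    f.goto("the boss")
    f.update_add("broccoli", "hates")
    return f


def process_6_23() -> File:
    """(6.23): link = 'the boss', focus = 'HATES', tail = 'broccoli'."""
    f = File()
    # Prior context: the boss bears some unspecified attitude P to broccoli.
    f.goto("the boss").records["broccoli"] = None
    f.goto("the boss")
    f.update_replace("broccoli", "hates")
    return f
```

Both functions leave the same record on the card for the boss; what differs is the route taken, which is the sense in which (6.22) and (6.23) package one proposition in two ways. In this sketch, UPDATE-REPLACE fails when no earlier record exists, mirroring the observation that (6.23) takes a previous connection between the boss and broccoli for granted.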
Information packaging and wa and ga

The question of how Japanese wa and ga fit into Vallduví’s model has been taken up by several authors, including Heycock (1994), Kaiser (1998), Portner and Yabushita (1998), and Komagata (1999). These authors all characterize wa and ga in the following terms:

1. Links are marked by wa.
2. The particle ga can mark the focus or tail, but not the link.

In fact, Watanabe (1989), predating Vallduví’s dissertation, also describes the function of wa explicitly in terms of file addresses:

    In terms of cognitive process, wa triggers our attention to search for an existing file under the referent name marked by wa... In other words the new information (i.e., the predicate part of a sentence or clause where the wa-marked NP resides) is fed in the existing file, and...[the] wa-marked referent serves as a filing address or file name into which incoming information is to be fed. (Watanabe 1989, p. 181)

We can illustrate the file card-based information packaging functions of wa and ga using example (6.24) from the CHJ corpus.

(6.24) gakkou wa  nee, roku-nensei ga   ooi      n   da  yo
       school TOP FP   six-grader  SUBJ numerous NOM COP FP
       ‘At that school, y’know, there are a lot of sixth graders.’ (1041; 460)
Here, the topic phrase gakkou wa (‘school’) serves as a link; that is, a pointer to a file card to be updated with the information contained in the focus. The ga-marked phrase rokunensei ga ooi (lit., ‘sixth graders are numerous’) corresponds to the focus, or the new information to be added to the file card pointed to by the link. This use of ga as a focus marker thus corresponds in function to the A accent in English.

Discussion: wa as a link marker

The information packaging account of wa and ga, as outlined above, is appealing in some respects. The file card system offers a vivid metaphor for human sentence processing, and Vallduví’s set of file card instruction types represents a concrete implementation of the plausible notion that speakers provide listeners with explicit instructions on how to accommodate and integrate the information they are communicating.

On the other hand, the file card account seems overly simplistic, especially considering the nuanced functions that we have seen attributed to wa in this section. For example, if one accepts Kuno’s taxonomy (Section 6.2.1), then the claim that wa marks the link in Japanese is presumably a claim about only the thematic wa and not the contrastive one. The association of thematic wa with linkhood is consistent with Kuno’s observation that a Japanese sentence is restricted to at most one thematic wa but may contain any number of contrastive wa’s, since under Vallduví’s system the information structure of a sentence is restricted to at most one link, but any number of focus and tail elements. Furthermore, the fact that thematic wa is generally restricted to sentence-initial position is consistent with Vallduví’s conception of links as exclusively sentence-initial. The contrastive wa, which marks new information and receives prominent intonation, seems better classified as a focus marker rather than a link marker.

We will return to the issue of the adequacy of file card accounts of wa at the end of Section 6.4. Meanwhile, in Section 6.2.6 below we critique the claim by Portner and Yabushita (1998) that the file card framework is not simply appealing, but actually necessary for an adequate account of wa.

6.2.6 The Strong Familiarity Condition
The file card account of wa outlined above is adopted, in modified form, by Portner and Yabushita (1998) as part of their theory of Extended File Change Semantics. Portner and Yabushita (P&Y) use the file card account of wa in order to explain a certain pragmatic constraint on Japanese definite NPs. Specifically, P&Y seek to account for their observation that, in Japanese, “a discourse entity can be most readily picked out with information that has been attributed to it while it is the topic” (p. 121). They illustrate this phenomenon with the sequence of utterances in (6.25a)-(6.25d) (P&Y pp. 125-126):

(6.25) a. John wa  kafe de onnanohito ni  aimasita.
          John TOP cafe at woman      OBJ met
          ‘John met a woman at a cafe.’
       b. Kanozyo wa  pianisuto desita.
          She     TOP pianist   was
          ‘She was a pianist.’
       c. ?? Kare ga   kafe de atta onnanohito wa  totemo omosiroi    hito   desita.
             He   SUBJ cafe at met  woman      TOP very   interesting person was
          ‘The woman he met in the cafe was a very interesting person.’
       d. Pianisuto no onnanohito wa  totemo omosiroi    hito   desita.
          pianist   of woman      TOP very   interesting person was
          ‘The woman who was a pianist was a very interesting person.’

P&Y predict that sentence (6.25d) is a natural and felicitous continuation of (6.25a) and (6.25b), but that (6.25c) is not. The reason that (6.25d) sounds natural is that the woman in question was a wa-marked topic when it was asserted that she was a pianist. In contrast, (6.25c) is infelicitous because John, not the woman, was the topic when it was asserted that they had met at a cafe. P&Y formalize their observation into the following condition on definite NPs in Japanese:

(6.26) Presupposition of Definite NPs [the ‘Strong Familiarity Condition’]
       A definite NP αᵢ is only felicitous if the information that it represents an α is already entered on card i.

Condition (6.26) explains example (6.25) as follows. Utterance (6.25b) causes the information that the woman is a pianist to be entered on the woman’s file card. The NP pianisuto no onnanohito (‘the pianist woman’) in (6.25d) therefore obeys the Strong Familiarity Condition (SFC), because the woman was a wa-marked topic when it was asserted that she was a pianist. In contrast, the NP kare ga kafe de atta onnanohito (‘the woman he met in a cafe’) in (6.25c) violates the SFC, and is therefore infelicitous, because John, not the woman, was the topic when it was asserted that they had met at a cafe.

According to P&Y, the SFC on Japanese NPs demonstrates the predictive power of file card-based semantic theories. In order to enforce the SFC on Japanese NPs, they argue, a semantic theory requires some mechanism like file cards for connecting up the information established in a discourse with those individuals that the information is ‘about.’ Theories that have such a mechanism (e.g. Vallduví’s and P&Y’s) are superior in this respect to theories that lack one (e.g. focus-based approaches like von Fintel’s (1994) or Roberts’ (1996)), because the latter are presumably unable to account for Japanese data like (6.25).
No.   Definite NP                                                File Card

NP1   pianisuto no  onnanohito₁                                  the woman₁
      pianist   GEN woman₁
      ‘the woman₁ who was a pianist’

NP2   ?? kare₁ ga   kafe de  atta onnanohito₂                    John₁
         he₁   SUBJ cafe LOC met  woman₂
      ‘the woman₂ he₁ met in the cafe’

NP3   chikin  furaido suteeki o   chuumon sita onnanohito₁       the woman₁
      chicken fried   steak   OBJ order   did  woman₁
      ‘the woman₁ who ordered a chicken fried steak’

NP4   ?? otokonohito₁ ga   raketto o   tewatasita onnanohito₂    the man₁
         man₁         SUBJ racquet OBJ handed     woman₂
      ‘the woman₂ who the man₁ had handed a racket to’

NP5   wookuman o   katta  onnanohito₁                            the woman₁
      Walkman  OBJ bought woman₁
      ‘the woman₁ who bought a Walkman’

NP6   ?? nihon ningyoo o   katta  onnanohito₁                    Kyoto₂
         Japanese doll OBJ bought woman₁
      ‘the woman₁ who bought a Japanese doll’ [in Kyoto]

NP7   imiron    o   senkoo   site  iru gakusei₁                  the student₁
      semantics OBJ majoring doing is  student₁
      ‘the student₁ who is majoring in semantics’

NP8   ?? X kyouzyu₁   ga   osiete   iru gakusei₂                 Prof. X₁
         X professor₁ SUBJ teaching is  student₂
      ‘the student₂ whom Professor X₁ is teaching’

Table 6.2: Japanese definite NP data from P&Y (1998)

In the remainder of this section we examine the SFC more closely, and also look for evidence for it in the CHJ corpus.

Evidence for the SFC

As further evidence for the SFC, P&Y offer three similar mini-discourses along the lines of (6.25). We do not reproduce the three other mini-discourses in their entirety because they are largely identical in form to (6.25). Rather, we simply list P&Y’s felicity judgments on the final NPs from each of these sequences in Table 6.2. The ‘File Card’ field in the table indicates, for each NP, the entity that was the topic when the information in the NP was asserted; that is, the file card holding that information. In P&Y’s account, the felicitous NPs (NP1, NP3, NP5, and NP7) are those whose head matches the ‘File Card’ entity; the infelicitous NPs (NP2, NP4, NP6, and NP8) are those whose head does not match, in violation of the SFC.

P&Y assert that the SFC “applies to all definite NPs in Japanese” (p. 123). If this is the case, then we would expect to be able to find examples in the CHJ corpus of definite NPs that obey the SFC and are therefore felicitous. Furthermore, we would expect not to find (too many) examples of definite NPs that disobey the SFC, and are therefore infelicitous. If these expected results bear out in the corpus, then this would be strong evidence for the SFC and for P&Y’s claims about its implications for semantic theory.

Before searching the CHJ corpus for evidence of the SFC, however, let us take a closer look at P&Y’s own data. An examination of the constructed example NPs in Table 6.2 shows that they share the following three properties:

(i) They are complex NPs consisting of a nominal head preceded by a verbal or nominal modifier (representing descriptive information).
(ii) They are definite and referring; that is, they identify a specific entity in the current context.
(iii) They are presupposed; that is, at some point earlier in the discourse, the information represented by the modifier had already been predicated of the entity denoted by the NP.

Examples of the SFC in the CHJ data

Now let us supplement P&Y’s constructed examples in Table 6.2 with some naturally-occurring NPs from the CHJ corpus. We searched the CHJ transcripts for examples of NPs comparable to P&Y’s in that they satisfied (i)-(iii) above. We first counted 29,681 NPs in the corpus that were likely to be definite and referring. We therefore excluded question words, the complementizer koto, and discourse-deictic NPs like zitu-wa (‘the truth is...’).
Of these 29,681 NPs, 3,439 were complex NPs as defined in (i). In order to find complex NPs built out of presupposed information, as defined in (iii), we decided to limit ourselves to those NPs containing information that was textually given. In other words, we considered NPs whose head nominal and modifier had both appeared at least once somewhere earlier in the conversation.7

7 More precisely, their morphological stems were textually given.
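The textual-givenness criterion just described can be sketched as a simple filter. The code below is a minimal illustration under stated assumptions, not the procedure actually used on the CHJ transcripts: the ComplexNP record, the textually_given function, and the representation of a conversation as a list of per-utterance stem lists are all hypothetical, and the toy conversation is invented for the example.

```python
from typing import List, NamedTuple, Sequence, Set


class ComplexNP(NamedTuple):
    """A complex NP reduced to the stems the filter needs (hypothetical layout)."""
    position: int        # index of the utterance containing the NP
    modifier_stem: str   # morphological stem of the preposed modifier
    head_stem: str       # morphological stem of the head nominal


def textually_given(
    nps: Sequence[ComplexNP],
    stems_by_utterance: Sequence[Sequence[str]],
) -> List[ComplexNP]:
    """Keep the NPs whose head stem and modifier stem both occurred in some
    utterance strictly earlier in the same conversation."""
    kept: List[ComplexNP] = []
    for np in nps:
        seen: Set[str] = set()
        for stems in stems_by_utterance[: np.position]:
            seen.update(stems)
        if np.head_stem in seen and np.modifier_stem in seen:
            kept.append(np)
    return kept


if __name__ == "__main__":
    # Invented toy conversation: one list of stems per utterance.
    conversation = [
        ["undou", "soru"],          # utterance 0
        ["kyou", "tabako"],         # utterance 1
        ["soru", "undou"],          # utterance 2, contains the NP 'soru undou'
        ["mezurasii", "rusuden"],   # utterance 3, both stems are new here
    ]
    candidates = [
        ComplexNP(2, "soru", "undou"),
        ComplexNP(3, "mezurasii", "rusuden"),
    ]
    print(textually_given(candidates, conversation))
```

A filter of this kind only establishes that head and modifier are potentially presupposed; whether the modifier’s information had actually been predicated of the head’s referent still requires inspection of each surviving NP.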
We therefore eliminated 1,537 of the 3,439 complex NPs because the head was textually new, and another 975 whose modifier was textually new. This left us with 927 complex NPs whose head and modifier were potentially presupposed since they had both appeared earlier in the conversation.

At this point, our annotator Juno Nakamura manually inspected each of the 927 complex NPs. Her task was to find examples of NPs whose modifier represented information that had previously been attributed to the entity denoted by the head, as in P&Y’s examples. We were only able to find a total of four examples that met this criterion. These four NPs, labeled CHJ1-CHJ4, are listed in Table 6.3, along with their location (transcript number and utterance time) in the corpus.

Label  Location   Complex NP                    Earlier status of NP head

CHJ1   0862;156   soru undou                    o-marked object at 144s
                  arch exercise
                  ‘[back] arching exercises’

CHJ2   1048;352   nijuuhakkai no  baikingu      ga-marked subject at 341s
                  28-floor    GEN buffet
                  ‘28th-floor buffet’

CHJ3   1509;671   otoko-buri ga   ii   musuko   discourse-salient at 462s
                  man-style  SUBJ good son
                  ‘manly son’

CHJ4   1541;724   mezurasii rusuden             unmarked object at 697s
                  unusual   message
                  ‘unusual messages’

Table 6.3: Complex NPs in the CHJ corpus satisfying (i)-(iii)

The ‘Earlier status’ field in Table 6.3 indicates the grammatical status of the earlier occurrence of the NP head (the final noun of each NP), just before the information represented by the preposed modifier was predicated of it. For example, in example CHJ1, undou (‘exercise’) had been an o-marked object at 144s into transcript 0862 when soru (‘arching’) had been predicated of it. Since undou’s earlier status was that of object rather than a wa-marked topic, CHJ1 violates the SFC.

In fact, as shown in the table, none of the heads in examples CHJ1-CHJ4 had been wa-marked topic phrases as demanded by the SFC. In examples CHJ1 and CHJ2 the heads had been an o-marked object and a ga-marked subject, respectively, while in CHJ4 the head had been an unmarked object. Although CHJ3 had not been a wa-marked topic, it could be regarded as having been a salient discourse topic, because the infant under discussion had been audible in the background when the speaker asserted it was otoko-buri-ga ii (‘manly’). If P&Y’s file card model were extended to include unmentioned but otherwise salient entities as links, then CHJ3 could plausibly be interpreted as not in violation of the SFC. Otherwise, however, the CHJ corpus does not seem to present any evidence that the Strong Familiarity Condition holds for definite NPs in Japanese.

Discussion

The fact that we found no evidence for the SFC in the CHJ corpus does not necessarily contradict P&Y’s claim that the SFC applies to all definite NPs in Japanese. In fact, P&Y suggest that violations of the SFC should be viewed as cases of presupposition failure that can be accommodated in the right circumstances (pp. 124-127). Returning to example (6.25), for instance, P&Y note that (6.25a) can in fact be followed felicitously by (6.25c) directly (i.e., skipping over (6.25b)). The relative naturalness of the combination (6.25a)-(6.25c) is attributed to the fact that in this case there is no alternative way to pick out the woman; however, once the information in (6.25b) is added, (6.25c) becomes infelicitous in comparison to (6.25d). In other words, the SFC should not be interpreted as a strict semantic constraint on definite NPs, but rather as a predictor of relative felicity ceteris paribus. The fact that NPs in the CHJ corpus do not appear to be sensitive to the SFC could just mean that the SFC is being overridden or outranked by competing constraints.

While this may be the case, we think a more plausible interpretation of our corpus results is that the phenomenon that P&Y seek to explain does not really exist, and the SFC is not an actual linguistic constraint on Japanese definite NPs.
What justifies this conclusion is the fact that P&Y’s example data can be explained without the use of file cards or similar devices, using principles of greater robustness and generality than the SFC.

In particular, we note that all of the infelicitous NPs in Table 6.2, except NP6, have a subject in the modifier, whereas none of the felicitous NPs do. In syntactic terms, the infelicitous NPs NP2, NP4, and NP8 are cases of relativization out of object position, whereas the felicitous examples NP1, NP3, NP5, and NP7 represent relativization out of subject position. P&Y’s felicity judgments are therefore consistent with the well-established generalization, due to Keenan and Comrie (1977), that subjects are more accessible to relativization than objects (see also Section 6.4.1). That is, relativized subjects are the unmarked case in all languages that permit relative clauses. Evidence for this generalization in the case of Japanese in particular is found in Baldwin’s (1998) analysis of the EDR corpus: out of 4,615 relative clauses, 3,004 involved relativized subjects, while only 306 involved direct objects and 15 indirect objects.

The Keenan-Comrie generalization accounts for three of P&Y’s four examples. What remains to be explained is why P&Y judged NP5, wookuman o katta onnanohito (‘the woman who bought a Walkman’) to be more felicitous than NP6, nihon ningyoo o katta onnanohito (‘the woman who bought a Japanese doll’). The complete context that P&Y provide for the felicitous example, NP5, is presented in (6.27):

(6.27) a. Futari no  onnanohito ga   nihon e  ikimasita.
          two    GEN women      SUBJ Japan to went
          ‘Two women went to Japan.’
       b. Hitori no  onnanohito wa  Tookyoo de wookuman o   kaimasita.
          one    GEN woman      TOP Tokyo   at Walkman  OBJ bought
          ‘One woman bought a Walkman in Tokyo.’
       c. Moo hitori no  onnanohito wa  Kyooto de nihon ningyoo o   kaimasita.
          other one  GEN woman      TOP Kyoto  at Japanese doll OBJ bought
          ‘The other woman bought a Japanese doll in Kyoto.’
       d. Wookuman o   katta  onnanohito wa  rai-nen   mata  nihon e  iku soo   desu.
          Walkman  OBJ bought woman      TOP next-year again Japan to go  seems COP
          ‘The woman who bought a Walkman will go to Japan again next year.’

The context for the infelicitous example, NP6, is (6.28):

(6.28) a. Futari no  onnanohito ga   nihon e  ikimasita.
          two    GEN women      SUBJ Japan to went
          ‘Two women went to Japan.’
       b. Tookyoo de wa  hitori no  onnanohito ga   wookuman o   kaimasita.
          Tokyo   at TOP one    GEN woman      SUBJ Walkman  OBJ bought
          ‘In Tokyo, one woman bought a Walkman.’
       c. Kyooto de wa  moo hitori no  onnanohito ga   nihon ningyoo o   kaimasita.
          Kyoto  at TOP other one  GEN woman      SUBJ Japanese doll OBJ bought
          ‘In Kyoto, the other woman bought a Japanese doll.’
       d. ?? Nihon ningyoo o   katta  onnanohito wa  rai-nen   mata  nihon e  iku soo   desu.
             Japanese doll OBJ bought woman      TOP next-year again Japan to go  seems COP
          ‘The woman who bought a Japanese doll will go to Japan again next year.’
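The comparison between the SFC and the relativization-based account of Table 6.2 can be laid out as a small table-driven check. The sketch below is illustrative only: the Item record, the boolean coding of each NP, and the two predictor functions are assumptions introduced here, and treating “no subject in the modifier” as a categorical predictor of felicity deliberately simplifies what Keenan and Comrie state as a preference.

```python
from typing import Callable, List, NamedTuple


class Item(NamedTuple):
    label: str
    head_matches_card: bool    # SFC: head was the topic when the modifier's information was asserted
    subject_in_modifier: bool  # the modifier contains an overt ga-marked subject
    felicitous: bool           # P&Y's felicity judgment


# Coding of the eight NPs in Table 6.2, read off the table and the discussion of it.
TABLE_6_2: List[Item] = [
    Item("NP1", True,  False, True),
    Item("NP2", False, True,  False),
    Item("NP3", True,  False, True),
    Item("NP4", False, True,  False),
    Item("NP5", True,  False, True),
    Item("NP6", False, False, False),
    Item("NP7", True,  False, True),
    Item("NP8", False, True,  False),
]


def sfc_predicts(item: Item) -> bool:
    """SFC: felicitous iff the head matches the 'File Card' entity."""
    return item.head_matches_card


def relativization_predicts(item: Item) -> bool:
    """Simplified relativization account: felicitous iff the head is a
    relativized subject, i.e. the modifier has no subject of its own."""
    return not item.subject_in_modifier


def mismatches(predict: Callable[[Item], bool]) -> List[str]:
    """Labels of the NPs whose predicted felicity differs from P&Y's judgment."""
    return [item.label for item in TABLE_6_2 if predict(item) != item.felicitous]


if __name__ == "__main__":
    print(mismatches(sfc_predicts))             # []
    print(mismatches(relativization_predicts))  # ['NP6']
```

On this coding the relativization-based predictor leaves exactly one judgment unexplained, NP6, which is the residue that the contexts in (6.27) and (6.28) are meant to address.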
Once again, there is a simple explanation, not involving file cards, for why NP6 might sound more stilted than NP5. Namely, NP6 in (6.28d) repeats information (the buying of the doll) that was given in the immediately previous utterance (6.28c). Although NP5 in (6.27d) also repeats earlier information (from (6.27b)), there is at least an intervening utterance in (6.27c). As a result NP5 sounds somewhat less awkward than NP6.

In conclusion, the Keenan & Comrie hierarchy of accessibility to relativization, a well-established and apparently universal linguistic principle, explains three of P&Y’s four felicity judgments, and their fourth judgment can also be plausibly accounted for without the use of file cards or the SFC. This fact, together with the lack of evidence for the SFC in the CHJ data, casts doubt on the SFC’s legitimacy as a constraint on Japanese NPs. If we are correct, and the SFC does not hold, then P&Y’s argument for the superiority of file card-based semantics over rival theories is also invalid.

6.2.7 Conclusion: semantics of wa- and ga-phrases
In this section we explored the contrasting semantics of wa- and ga-phrases, a subtle and perplexing issue that has received considerable attention in the Japanese linguistics literature.

We began in Section 6.2.1 with a review of Kuno’s taxonomy of wa and ga, which includes his distinction between the thematic wa and contrastive wa. Kuno’s approach was then compared to Kuroda’s explanation of wa and ga based on categorical and thetic judgments (Section 6.2.2), and Martin’s account of wa as a backgrounding particle (Section 6.2.3). Then, in Section 6.2.4, we examined a class of theoretical descriptions of wa and ga that are formulated in terms of the givenness of the entities marked by these particles. This led to the discussion in Section 6.2.5 of the file card-based approach, which seeks to situate the functions of Japanese wa and ga within a broader, cross-language account of ‘information packaging.’ Finally, in Section 6.2.6 we focused on one particular file card approach, that of Portner and Yabushita (1998), and discovered a flaw in their argument for the necessity of file cards in semantic theory.

In the remaining two sections of this chapter we discuss certain other (non-semantic) characteristics of wa and ga, and further explore how these particles are actually used in the CHJ corpus. In particular, Section 6.3 examines the intonational correlates of wa- and ga-phrases, and Section 6.4 investigates which NP classes are most frequently marked by wa and ga.

6.3 Intonation and wa and ga
The issue of the intonation of wa-phrases was briefly raised above in Section 6.2 during our discussion of the so-called contrastive wa, which was described as receiving “prominent intonation” (Kuno 1973) or “emphatic stress” (McGloin 1986). This section examines this and other aspects of the intonation of wa- and ga-phrases in more detail.
Intonation refers to perceived patterns of pitch or melody in speech. Intonation is therefore a perceptual or psychological phenomenon: contrastive phonological events (e.g., rises or falls in pitch) are interpreted by the hearer and assigned linguistic significance (Ladd 1996). On the other hand, pitch variation correlates with specific acoustic phenomena that can be measured and represented as continuous numeric values. The physical basis of perceived pitch, as we saw in Section 4.5.1, is F0. In this section we will examine measurements of F0 variation in actual Japanese utterances, including those in the CHJ corpus, to help shed empirical light on the relationship between pitch and the particles wa and ga.
This section is organized as follows. First, in Section 6.3.1, we examine the association between intonation and focus. Next, in Section 6.3.2, we review three previous studies of the intonation of wa- and ga-phrases in Japanese. Finally, in Section 6.3.3, we present the results of our own study of F0 and wa- and ga-phrases in the CHJ corpus.
6.3.1 Intonation and focus
Variation in F0 range within Japanese utterances is affected by a complex assortment of local phonological effects (Pierrehumbert and Beckman 1988) and
broader discourse effects (Venditti 2000). One example is declination: the gradual decrease in the mean F0 value, and the compression of the range of observed F0 values, over the course of an utterance. Another factor affecting pitch range is discourse structure. Utterances at the beginning of a discourse segment tend to exhibit higher peak F0 than those elsewhere in the segment, while discourse-final utterances tend to exhibit compressed pitch range. These effects have been observed in both English (Hirschberg and Nakatani 1996) and Japanese (Venditti and Swerts 1996; Venditti 2000).
There is also evidence that Japanese speakers manipulate pitch range in order to indicate the givenness or salience of the entities mentioned in the utterance (Pierrehumbert and Beckman 1988; Hirose et al. 1996; Venditti and Swerts 1996; Venditti 2000). In particular, expanded pitch range is used to indicate a point of focus or contrast, while compressed pitch range correlates with non-prominent or given information (Section 5.4.5). The use of higher pitch to mark new or important information is by no means restricted to Japanese. As Chafe (1976) observes:
    The principal linguistic effects of the given-new distinction, in English and perhaps in all languages, reduce to the fact that given information is conveyed in a weaker and more attenuated manner than new information. This attenuation is likely to be reflected in two principal ways: given information is pronounced with lower pitch and weaker stress than new, and it is subject to pronominalization.
Contrastive information also tends to be marked with expanded pitch range in Japanese. We have already seen that phrases marked with contrastive wa are described as receiving ‘prominent intonation’ or ‘emphatic stress,’ unlike NPs marked with thematic wa. In fact, this description seems to have been verified experimentally by Finn (1984) and by Nakanishi (2001), as we report in Section 6.3.2 below.
Sentence accentability of wa and ga
Matsunaga (1984) considers how wa and ga affect the placement of sentence accent in Japanese. The sentence accent, the most prominent accent phrase in a sentence, marks the point of information focus or contrastive stress in an utterance. Phonologically, the locus of Japanese sentence accent falls on the leftmost accentable constituent of the accent phrase, in contrast to English, where the accent normally falls on the rightmost word in a phrase.
Most accentable     ga-phrases, interrogatives
                    nouns
                    time adverbs
                    wa-phrases
Least accentable    verbs, particles
Table 6.4: Scale of accentability (Matsunaga 1984)
Matsunaga appeals explicitly to the notions of old and new information in his account of the “focus-accentability” of phrases marked by wa and ga. Specifically, Matsunaga notes that thematic wa phrases, which refer anaphorically to already salient entities, are less likely to receive sentence accent than ga phrases, which refer to ‘new’ information. He illustrates with the pair in (6.29):
(6.29) a. waTAsi ga satou desu.
          I SUBJ Sato COP
          ‘I’M the one whose name is Sato.’
       b. watasi wa SAtou desu.
          I TOP Sato COP
          ‘My name is SATO.’
Utterance (6.29a), which displays sentence accent on the phrase watasi ga, is an acceptable response to the question “Who is Mr. Sato?” but it cannot be used to introduce oneself, since it presupposes that the name Sato is already mutual knowledge. Sentence (6.29b), on the other hand, with sentence accent on the name Satou, represents a standard self-introduction. Here the unstressed thematic wa-phrase refers to the speaker, who is clearly salient in the context of an introduction, while the new information, the speaker’s name, receives prominent intonation.
Matsunaga also considers the ‘accentability’ of a number of other constituent types besides wa- and ga-phrases. He notes that interrogative or ‘wh-words’ such as doko (‘where’), dou (‘how’), and dare (‘who’), which prompt for missing information, tend to attract sentence accent regardless of where they appear in a sentence.8 Furthermore, nouns, with the exception of thematic wa phrases, are more accentable than adverbs of time and verbs. Matsunaga’s complete ‘scale of accentability’ is displayed in Table 6.4.
8 Recall from Section 5.4.3 that Japanese wh-words are strongly associated with o-ellipsis as well as sentence accent. This fact poses yet another problem for the claim that focus inhibits particle ellipsis (Section 5.4.5).
6.3.2 F0 correlates of wa-phrases
In this section we review three phonetic studies of the F0 correlates of wa-phrases. These studies were carried out by Finn (1984), Venditti (2000), and Nakanishi (2001). Nakanishi’s study, which uses a small amount of pitch data extracted from the CHJ corpus, examines pitch differences between the thematic and contrastive uses of wa. The Finn and Venditti studies both compare the pitch patterns of topics to subjects, but arrive at opposite conclusions about the matter. In Section 6.3.3, therefore, we carry out our own investigation of the F0 correlates of wa and ga in the CHJ speech data. After presenting our own results, we speculate on the reasons for the discrepancy between the Finn and Venditti studies.
The Nakanishi study
Nakanishi (2001) compares the F0 patterns of thematic and contrastive wa. Her data include selected CHJ utterances as well as speech elicited in the laboratory. First, under laboratory conditions, Nakanishi had several native Japanese speakers read aloud a set of constructed example sentences containing instances of thematic and contrastive wa. She then supplemented these data with ten utterances containing wa extracted from the CHJ corpus. Five of these CHJ utterances were judged by Nakanishi to be thematic, and the other five contrastive. By way of example, two of these CHJ utterances, the first thematic and the second contrastive, are given below in (6.30) and (6.31).
(6.30) mukou wa syoubai desyou
       there/him TOP business COP
       ‘For him, it’s his job.’ (1263; 517)
(6.31) gohan toka wa oisii kedo ne...
       food etc. TOP good but FP
       ‘The food is good but...’ (1263; 864)
For all of her collected utterances, Nakanishi measured the peak F0 value immediately preceding wa and the peak F0 value immediately following wa. In the case of thematic wa, the two peaks tended to be roughly equal in value. In the case of contrastive wa, however, the F0 peak preceding wa tended to be significantly higher than the F0 peak following wa. Nakanishi’s results therefore support the hypothesis that the thematic and contrastive uses of wa are distinct, and that they can be distinguished on the basis of pitch. Similar results were also obtained by Finn (1984), as we report next.
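The peak comparison just described can be made concrete with a short sketch. This is only an illustration of the kind of measurement involved, not Nakanishi's actual procedure. The data layout (a list of time and F0 samples), the function names, and the fixed 0.5-second search window on either side of wa are all assumptions introduced here, and the sample values are invented for the example rather than taken from any study.

```python
from typing import List, Tuple

# Assumed layout: an F0 track is a list of (time in seconds, F0 in Hz)
# samples, with unvoiced frames already removed.
F0Track = List[Tuple[float, float]]


def peak_f0(track: F0Track, start: float, end: float) -> float:
    """Return the highest F0 value (Hz) among samples with start <= time < end."""
    values = [f0 for t, f0 in track if start <= t < end]
    if not values:
        raise ValueError("no voiced F0 samples in the requested window")
    return max(values)


def peak_ratio(track: F0Track, wa_start: float, wa_end: float,
               window: float = 0.5) -> float:
    """Ratio of the F0 peak before wa to the F0 peak after wa.

    The `window` size is a hypothetical choice for illustration; a real
    analysis would more likely search within accent-phrase boundaries.
    A ratio near 1 means the two peaks are roughly equal; a ratio well
    above 1 means the peak preceding wa is the higher one.
    """
    before = peak_f0(track, wa_start - window, wa_start)
    after = peak_f0(track, wa_end, wa_end + window)
    return before / after


# Invented sample values, for illustration only.
example_track: F0Track = [
    (0.00, 200.0), (0.10, 240.0), (0.20, 220.0),  # material before wa
    (0.30, 180.0),                                # wa itself (0.30 to 0.40 s)
    (0.45, 210.0), (0.55, 200.0),                 # material after wa
]
ratio = peak_ratio(example_track, wa_start=0.30, wa_end=0.40)
```

On this reading, thematic wa would be expected to yield ratios close to 1, and contrastive wa ratios noticeably greater than 1. Any threshold separating the two would have to be set empirically.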
Token noun-w