Chinese Lexical Semantics: 22nd Workshop, CLSW 2021, Nanjing, China, May 15–16, 2021, Revised Selected Papers, Part I (Lecture Notes in Artificial Intelligence) 3031067029, 9783031067020

The two-volume proceedings, LNCS 13249 and 13250, constitutes the thoroughly refereed post-workshop proceedings of the 2

English Pages 552 [546] Year 2022

Table of contents:
Preface
Organization
Contents – Part I
Contents – Part II
Lexical Semantics and General Linguistics
On the Differences Between Dōua and Dōub
Abstract
1 An Introduction: The Necessity of Comprehensively Clarifying the Differences Between dōua and dōub
2 The Syntactic Differences Between dōua and dōub
2.1 The Difference of the Topic Components Between dōua Sentences and dōub Sentences
2.2 The Differences of Modifiers Before dōua and dōub
2.3 The Differences of ‘dōua/dōub + VP/AP’ as the Complements
3 The Semantic Differences Between dōua and dōub
4 The Pragmatic Differences Between Dōua and Dōub, and Between Their Sentences/Constructions
4.1 The Interpersonal Function and Textual Function of Dōua and Its Sentences/Constructions
4.2 The Interpersonal Function and Textual Function of Dōub and Its Sentences/Constructions
5 Epilogue
Acknowledgments
References
From Falling to Hitting: Diachronic Change and Synchronic Distribution of Frost Verbs in Chinese
Abstract
1 Introduction
2 Frost Verbs in Ancient Chinese
3 Frost Verbs and Frost Damage: Evidence from Geographical Distribution
4 Conclusion
Acknowledgement
References
The Senses of Mandarin Deadjectival Verbs
Abstract
1 Introduction
2 Against the Coercion Hypothesis
3 Deadjectival Verbs in Mandarin
3.1 Time Phrases as Diagnostic Test
3.2 Time Phrases in Mandarin
3.3 Aspectual Class of Deadjectival Verbs in Mandarin
3.4 The Semantics of Mandarin Deadjectival Verbs
4 Concluding Remarks
References
A Corpus-Based Study of Factive Verbs and Its Influencing Factors
Abstract
1 Introduction
2 Factive-Sum Verbs in the Actual Corpus
3 Supplementation of Factuality-Sum Influencing Factors
3.1 Counter-Factuality Influencing Factors
3.2 Non-factuality Influencing Factors
4 The Theoretical Study of Factive-Sum Verbs and the Factuality-Sum Influencing Factors
5 Conclusion
Acknowledgments
References
The Discrimination of the Synonyms of yǐnqǐ: A Corpus-Based Study
Abstract
1 Introduction
2 Related Works
3 Research Design
3.1 BCC Corpus
3.2 Instruments
3.3 Methods
4 Research Analysis
4.1 Differences in Register Distribution
4.2 Analysis of Colligation
4.3 Analysis of Significant Collocations
4.4 Analysis of Semantic Prosody
5 Conclusion
References
A Corpus-Based Semantic Analysis of the Pattern ‘bǎ+NP+VP’ in Mandarin Chinese
Abstract
1 Introduction
1.1 “bǎ+NP+VP” in Mandarin Chinese
1.2 Pattern Grammar
1.3 Research Questions and Innovation
2 Methods
2.1 Corpus
2.2 Tool and Software
2.3 Procedure
3 Results and Discussion
3.1 Semantic Distribution of N and V at the First Level
3.2 Key Semantic Subcategories at the Second Level
3.3 Typical Usages of the Pattern “bǎ + N + V”
4 Conclusion
Appendix
References
Cross-Categorial Behaviors of Mandarin Physical Contact Verbs: A Frame-Based Constructional Analysis of qiāodǎ 敲打
Abstract
1 Introduction
1.1 Previous Works on Semantic Extensions
1.2 Semantic Extension of Physical Contact Verbs
1.3 Frame-Based Constructional Approach
2 Frame-Based Analysis of qiāodǎ 敲打 ‘Knock’
2.1 qiāodǎ 敲打 ‘Knock’ as a Physical Contact Verb
2.2 Cross-Categorial Behaviors of qiāodǎ 敲打 ‘Knock’
2.3 Corpus Distribution of qiāodǎ 敲打 ‘Knock’
3 Cognitive Motivations
4 Conclusion
References
The Correspondence Between Semantic Functions and Syntactic Constructions of Guà Verbs
Abstract
1 Introduction
2 The Reclassification of Guà Type Verbs
3 The Correspondence Between the Different Semantic Functions and the Syntactic Constructions of Guà Type Verbs
3.1 The Correspondence Between the Semantic Functions and the Syntactic Constructions of “Strong State Verbs”
3.2 The Correspondence Between Semantic Functions and the Syntactic Constructions of “Strong Action Verbs”
4 Discussion
Acknowledgments
References
The Collostruction-Based Definition Model in Language-Specific Chinese-English Learner’s Dictionaries: The Case of Chinese Collective Classifier ‘Bǎ’
Abstract
1 Introduction
2 A Corpus-Based Collostructional Analysis of “Bǎ”
2.1 Research Overview
2.2 Result
3 A Comparative Study on “Bǎ” and Its English Counterparts
3.1 English Counterparts of “Bǎ”
3.2 A Comparative Study on “Bǎ” and Its English Counterparts
4 A Study on the Definition Models for Chinese Collective Classifiers in Existing CELDs
5 The Language-Specific Collostruction-Based Model for Defining Chinese Collective Classifiers
6 Conclusion
Acknowledgements
References
The Collocations of Chinese Color Words
Abstract
1 Introduction
2 Related Research
3 Research Methods
3.1 Corpus Selection and the Research Tool
3.2 Corpus Selection and Annotation
4 The Experiential Domain and the Abstract Domain of the Chinese Color Words 白 bái ‘White’ and 黑 hēi ‘Black’
5 Semantic Analysis of the Collocates of the Color Words 白 Bái ‘White’ and 黑 hēi ‘Black’
5.1 Comparison Under the Subject-Predicate Relation
5.2 Comparison Under the Attributive-Head Relation
5.3 The Similarities and Differences of the Collocates’ Distribution Under the Subject-Predicate Relation and the Attributive-Head Relation
6 Conclusion
Acknowledgment
References
The Differences Between Jiùshì and Jiùsuàn as Conjunctions and Their Formation Mechanisms
Abstract
1 Introduction
2 The Differences Between Jiùshì and Jiùsuàn
2.1 Differences in the Syntax of the Two Words
2.2 Differences in the Semantics of the Two Words
2.2.1 The Multifunctionality of Jiùshì
2.2.2 The Subjectivity of Jiùsuàn
3 The Evolution of Jiùshì and Jiùsuàn
3.1 The Evolution of Jiùshì and the Formation of Multifunctionality
3.2 The Evolution of Jiùsuàn and the Highlight of Subjectivity
3.3 The Differences and Influence of the Evolution Paths of Jiùshì and Jiùsuàn
4 Conclusion
Acknowledgments
References
A Deontic Modal SFP in Chengdu Chinese
Abstract
1 Introduction
2 Data Presentation
2.1 The Semantics of Təu
2.2 Təu-Induced Condition: Telicity
2.3 The Təu-Induced Modality
3 Syntactic Analysis of Təu-Containing Sentences
3.1 Non-relativization of Təu
3.2 Stacking of SFPs
3.3 Further Evidence for Conditional Modality of Təu
4 Concluding Remarks
References
An Analysis of the Grammaticalization, Coercion Mechanisms and Formation Motivation of the New Construction ‘XX Zi’ from the Cognitive Perspective
Abstract
1 Introduction
2 Structural Features and Grammaticalization Process of ‘XX Zi’ Construction
2.1 Grammaticalization of ‘Zi’ and Definition of New ‘XX Zi’ Construction
2.2 Structural Characteristics of the New ‘XX Zi’ Construction
2.3 Grammaticalization of the New ‘XX Zi’ Construction
3 Reconstructive Operational Mechanisms of the New ‘XX Zi’ Construction
3.1 Lexical Coercion in the New ‘XX Zi’ Construction
3.2 Construction Coercion in the New ‘XX Zi’ Construction
3.3 Inertial Coercion of the New ‘XX Zi’ Construction
4 Formation Motivation and Popular Mechanism of the New ‘XX Zi’ Construction
4.1 Transmission of Fan Culture in Europe, America, Japan and South Korea
4.2 The Replication and Reinforcement of Strong Memes
4.3 Influence of Social Culture and Boost of New Media
5 Pragmatic Function of the New ‘XX Zi’ Construction from the Cognitive Perspective
5.1 Needs of Verbal Communication in Specific Context
5.2 Expression Needs of ‘Defamiliarization’ and ‘Subjectivization’
5.3 Presentation and Cognitive Prominence of Thematic Meaning
6 Conclusion
References
A Study on Lexical Knowledge and Semantic Features of Speech Act Verbs Based on Language Facts
Abstract
1 Introduction
2 The Construction of Lexical Knowledge System of Speech Act Verbs
2.1 The Lexical Knowledge System of Verbs
2.2 The Collocation Knowledge System of Speech Act Verbs
2.3 The Syntactic Framework Information of Speech Act Verbs
3 The Construction of Semantic Feature Knowledge System of Speech Act Verbs
3.1 The Semantic Feature Knowledge System of Verbs
3.2 The Semantic Feature Extraction of Speech Act Verbs
3.3 The Semantic Feature Knowledge System of Speech Act Verbs
4 Conclusion
Acknowledgments
References
Investigating Verbs of Confession Through a Syntactic and Semantic Annotation Tool
Abstract
1 Introduction
2 Related Research
3 Research Methods
3.1 Research Procedures
3.2 The Syntactic and Semantic Annotation Tool
4 Syntactic and Semantic Analysis of Verbs of Confession
4.1 Syntactic Analysis of Verbs of Confession
4.2 Semantic Roles of the Words Collocated with Verbs of Confession
4.3 Comparison with Chinese Resources
5 Conclusion
Acknowledgement
References
The Pragmatic Distribution and Semantic Explanation of Evidential Prepositions
Abstract
1 Introduction
2 Methods of Analyzing the Semantic Structure of Prepositions
3 Semantic Structure of “jü (据)” and “yijü (依据)”
3.1 Semantic Structure of “jü (据)”
3.2 Semantic Structure of “yijü (依据)”
3.3 Distinguishing “jü (据)” from “yijü (依据)”
4 Semantic Structure and Modality Classification of Evidential Prepositions
4.1 Semantic Structure of Evidential Prepositions
4.2 Modality Classification of Evidential Prepositions
5 Semantic Map of Evidential Prepositions
6 Conclusion
Acknowledgments
References
The Discourse Functions of Shell Nouns in Mandarin: A Genre-Based Study in Popular and Professional Science Articles
Abstract
1 Introduction
2 Corpus and Procedures
3 Results
3.1 The Distinction of Semantic Types
3.2 The Distinction of Grammatical Collocation
3.3 The Distribution of Content and Correlation Information
4 Discussion
4.1 Discourse Organization
4.2 Subjectivity
4.3 Knowledge Construction
5 Conclusion
Acknowledgements
References
The Interpersonal and Attitudinal Function of the Modal Particle A in the Middle of the Sentence
Abstract
1 Discourse Distribution of A(啊) in a Sentence
2 To Strengthen Attitudinal Function
2.1 Adding Subjective Expression Components of the Speaker
2.2 Causing the Subjective Deviation of ‘Objective Quantity’
2.3 Strengthening the Speaker’s Subjective Modality
3 To Weaken the Speaker’s Illocutionary Force
4 Discourse Function of A(啊)
4.1 Construction of Discourse Style
4.2 Marking the Intervention of the Listener
5 Conclusion
References
Verb Meaning Representation Based on Structured Semantic Components
1 Introduction
2 The Flexibility of Natural Language Meanings
3 The Granularity and Structure of Semantic Role Systems
4 The Meaning of za Represented in Structured Semantic Components
4.1 The Semantic Components of za
4.2 The Hierarchical and Metaphoric Structure of the Semantic Component System
5 The Application of the Structured Semantic Components on Other Verbs
6 Conclusive Remarks
References
Activation of Alternatives by Mandarin Sentence-Initial and Sentence-Internal Foci: A Semantic Priming Study
Abstract
1 Introduction
2 Methods
2.1 Participants and Materials
2.2 Procedure
2.3 Measurements
3 Results
4 Concluding Remarks
Acknowledgments
References
On the Limitations of Constructional Innovation: A Case Study of the “ni bixu zhidao de NP”
Abstract
1 Introduction
2 The Form and Semantic Distinctiveness of the Construction “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know)
2.1 Variety and Inclination of the Productive Element X
2.2 Subjectivisation of the Fixed Element “你必须知道” [ni bixu zhidao de NP] (you must know)
2.2.1 Over-Extension of the Shifter “你” [ni] (you)
2.2.2 Semantic Weakening and Tokenization of “你必须知道的” [ni bixu zhidao de NP] (you must know)
2.3 Conventionalization of the “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know)
3 Discourse Function of the “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know)
3.1 Highlighting High-Value Information
3.2 Expressing the Author’s Stance
3.3 Recruiting Readers’ Empathy
4 Conclusion
Acknowledgements
References
Formalized Chinese Sentence Pattern Structure and Its Hierarchical Analysis
Abstract
1 Introduction
2 An Overview of Sentence Pattern Structure System
2.1 The Syntactic Structure System of SPS
2.2 The Lexical Structure System of SPS
3 The Hierarchical Characteristics of SPS
3.1 The Sentence Pattern Level Outweighs Phrasal Hierarchy
3.2 The SPS Level Can Be Measured by Vertical Distance Between Horizontal Lines and the Nesting Depth of Bracket in SPS Expressions
3.3 Lexical Structure Analysis Does not Increase SPS Level
4 The Teaching Application of SPS
4.1 Retrieval and Acquisition of Examples of Sentence Patterns
4.2 Analysis of the Structure of Sentence Pattern Expression
5 Conclusion
Acknowledgments
References
On the Semantic Ambiguity of Chinese Causative Resultative V-Vs
Abstract
1 Introduction
2 The Syntactic-Semantic Interface
2.1 Syntactic Structure
2.2 Argument Realization
2.3 Semantics of CR V-Vs
3 Cases of Semantic Ambiguity and Constraints
3.1 吃饱 Chi-bao ‘Eat-Full’
3.2 骑累 qi-lei ‘Ride-Tired’
3.3 追累 Zhui-lei ‘Chase-Tired’
3.4 想哭 Xiang-ku ‘Miss-Cry’
4 Conclusion
References
On the Dynamic Semantics of Fǒuzé and Bùrán in Mandarin Chinese
Abstract
1 Introduction
2 Literature Review
3 Modality, Dynamic Semantics and Fǒuzé/bùrán
4 Conclusion
Acknowledgments
References
Functions of Non-subject Topics in Mandarin Conversations
Abstract
1 Introduction
2 Method and Data
3 Functions of NS-Topics in Sentences
4 Functions of NS-Topics in On-Going Conversations
4.1 NS-Topics as Lexical Cohesions
4.2 Inverted Object as Response to Alternative Questions
4.3 NS-Topics as Episodic Topics in Conversations
5 Conclusions
References
A Comparative Study of Two Motion Verbs Lái and Guòlái
Abstract
1 Introduction
2 The Conceptual Meaning of Lái and Guòlái
3 Lái and Guòlái as Directional Complements
3.1 Syllable Number of Verb
3.2 Verb Meaning
4 Summary
References
Morpheme Zú “Tribe” in Mandarin Chinese
Abstract
1 Introduction
1.1 Zú is a Nominal Suffix
1.2 Research Questions
2 Four Morpheme Types (Packard [1])
2.1 Morpheme Classifications
2.2 Headedness Principle and X-bar Morphology
3 Word Formation with Morpheme Zú
3.1 Word Forms with Zú
3.2 Relationship Between Components and Zú in Word Forms
3.3 Internal Structure of Word Formation Zú
3.4 Discussion
4 Conclusion
References
Interpretation of Complex Event and the Semantics Structure of General Verbs
Abstract
1 Introduction
2 The Semantic Framework of gao (搞)
2.1 Semantic Types of Objects and Distribution of Meaning Items
2.2 Semantic Level of gao (搞)
2.3 Summary of Section 2
3 Object Complexity and Eventual Whole Scanning
3.1 Complex Events
3.2 Plural Events
3.3 Summary of Section 3
4 Conclusion
References
Perfectivity via Locative Non-coincidence: Pre-verbal TAU in the Xiaolongmen Dialect
Abstract
1 Introduction
2 Semantic Properties of Pre-verbal TAU
2.1 Directional/Movement Meaning
2.2 Overt or Covert Locative Arguments
2.3 Perfectivity
2.4 Semantic Constraints for TAU
2.5 Summary
3 TAU as Locative Predicate: A Formal Analysis
4 Conclusion
References
The Development Trend and Form-Meaning Features of Contemporary Chinese Lexical Patterns
Abstract
1 Introduction
2 Productivity of Contemporary Chinese Lexical Patterns
2.1 Quantitative Analysis of Word Formation of Lexical Patterns
2.2 Prosodic and Grammatical Mode of Lexical Patterns
3 Analysis of the Combination of Forms and Meanings of Contemporary Chinese Lexical Patterns
3.1 The Prosodic Structural Features of Contemporary Chinese Lexical Patterns
3.2 Grammatical Structure Features of Lexical Patterns in Contemporary Chinese
4 Conclusion
Funding
References
Natural Language Processing and Language Computing
Automatic Prediction and Linguistic Interpretation of Chinese Directional Complements Based on BERT Model
Abstract
1 Introduction
2 Related Work and Our Framework
3 Design of the Prediction Model for CDC
3.1 Data Processing
3.2 Model Analysis
4 Analysis of Accuracy Rate for Neural Network Models
5 How Does BERT Pay Attention to CDC Selection?
5.1 Model Analysis with Sampling and Occlusion
5.2 Linguistic Interpretation of CDC Selection Constraints
5.3 Using BERT Model for Educational Purpose
6 Conclusion
Acknowledgments
References
Chunk Extraction and Analysis Based on Frame-Verbs
Abstract
1 Introduction
2 Representation System of the Interaction Between Frame-Verb and Chunk
2.1 Definition of Frame-Verb and Its Quantity
2.2 Construction of Chunk System
3 Frame-Chunk Extraction
3.1 Chunk Extraction System
3.2 Chunk Extraction Process
4 Results Statistics and Analysis
5 Conclusion
Acknowledgement
References
From `It's Your Funeral' to `Mouse Tail Juice': A Quantitative Study of the Mishearings in Danmu Videos
1 Introduction
2 Definition, Motivation, and Semantics of Danmu Mishearings
2.1 Definition of Danmu Mishearings
2.2 Motivations of Danmu Mishearings
2.3 Semantics of Danmu Mishearings
3 Data Collection
4 Results
4.1 Semantic Opposition
4.2 Semantic Competition
4.3 Semantic Emergence
5 Conclusion
References
A Textual Entailment Recognition Method Fused with Language Knowledge
Abstract
1 Introduction
2 Related Work
3 Model
3.1 K-Attention
3.2 RoBERTa
3.3 Output
4 Experiment
4.1 Dataset
4.2 Experimental Setup
4.3 Experimental Results
5 Result Analysis
5.1 Category Analysis
5.2 Ablation Analysis
5.3 Sample Analysis
5.4 Discussion
6 Conclusion
Acknowledgments
References
Semantic Similarity of Inverse Morpheme Words Based on Word Embedding
Abstract
1 Introduction
1.1 An Overview of Previous Work on Inverse Morpheme Words
1.2 Word Embedding and Distributed Representation
2 Methods
2.1 Pre-trained Word Embedding Model
2.2 Cosine Similarity
3 The Experiment
3.1 Word List Extraction
3.2 Experimental Results
4 Discussion
4.1 Classification of Inverse Morpheme Words
4.2 Analysis of Factors Influencing the Semantic Similarity of Inverse Morpheme Words
5 Conclusion
Acknowledgements
References
A Hybrid Model for Chinese Confusable Words Distinguishing in Proofreading
Abstract
1 Introduction
2 Model
2.1 Model Structure
2.2 Pre-trained Module
2.3 Ensemble Module
3 Experiments
3.1 Data Preparation
3.2 Baseline Models
3.3 Evaluation Metrics
3.4 Main Results
3.5 Ablation Studies
3.6 Case Studies
4 Conclusions
Acknowledgments
References
Translating Classical Chinese Poetry into Modern Chinese with Transformer
Abstract
1 Introduction
2 Related Work
3 Method
4 Experiment
4.1 Data and Setups
4.2 Main Results
5 Case Study
6 Conclusion and Future Work
Acknowledgments
References
Disambiguation of Network Informal Language Expressions Based on Construction Grammar
Abstract
1 Introduction
2 Related Work
3 Analysis of NILE Characteristics
3.1 Analysis of NILE Features at the Lexical Level
3.2 Analysis of NILE at the Constructional Level
4 A NILE Disambiguation Method Based on Construction Grammar
4.1 The Interaction Between Constructional and Lexical Meanings in NILEs
4.2 NILE Disambiguation System Design
5 Conclusion
References
Themes and Sentiments of Online Comments Under COVID-19: A Case Study of Macau
Abstract
1 Introduction
2 Methodology
2.1 Data
2.2 Tools
3 Results
3.1 Themes
3.2 Sentiments
4 Discussion
5 Conclusion
Acknowledgements
References
Multilingual China-Related News Identification Framework Based on Multiple Strategies
Abstract
1 Introduction
2 Related Work
2.1 China-Related News Analysis
2.2 Multilingual Text Classification
3 Framework
3.1 Basic Framework
3.2 Text Classification Based on XLM Model
3.3 Corpus Generation Based on Pseudo-Label Technology
3.4 China-Related Characteristic Words Based on Sequence Labeling
3.5 Loss Function
4 Experiment
4.1 Dataset
4.2 Experiment Setup
5 Results and Analysis
6 Conclusion
Acknowledgements
References
Research and Implementation of Buzzword Detection Technology Based on the Dynamic Circulation Corpus
Abstract
1 Introduction
2 Scheme
2.1 Solution for Data Statistics and Data Storage
2.2 Buzzword Detection Models
2.3 Buzzword Quality Scoring Model
3 Dataset and Metrics
4 Analysis of Experimental Results
4.1 Performance Analysis of Data Storage Solutions
4.2 Performance Analysis of Buzzword Detection Models
4.3 Performance Analysis of Buzzword Quality Scoring Model
5 Conclusion
References
Author Index

Author / Uploaded
Minghui Dong (editor)
Yanhui Gu (editor)
Jia-Fei Hong (editor)

LNAI 13249

Minghui Dong Yanhui Gu Jia-Fei Hong (Eds.)

Chinese Lexical Semantics 22nd Workshop, CLSW 2021 Nanjing, China, May 15–16, 2021 Revised Selected Papers, Part I

Lecture Notes in Artificial Intelligence
Subseries of Lecture Notes in Computer Science

Series Editors
Randy Goebel (University of Alberta, Edmonton, Canada)
Wolfgang Wahlster (DFKI, Berlin, Germany)
Zhi-Hua Zhou (Nanjing University, Nanjing, China)

Founding Editor
Jörg Siekmann (DFKI and Saarland University, Saarbrücken, Germany)

More information about this subseries at https://link.springer.com/bookseries/1244

Editors
Minghui Dong (Institute for Infocomm Research, Singapore, Singapore)
Yanhui Gu (Nanjing Normal University, Nanjing, China)
Jia-Fei Hong (National Taiwan Normal University, Taipei, Taiwan)

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-031-06702-0
ISBN 978-3-031-06703-7 (eBook)
https://doi.org/10.1007/978-3-031-06703-7
LNCS Sublibrary: SL7 – Artificial Intelligence

© Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

The 2021 Chinese Lexical Semantics Workshop (CLSW 2021) was the 22nd event since the establishment of this series in 2000. CLSW has been held in different Asian cities including Beijing, Hong Kong, Taipei, Singapore, Xiamen, Hsin Chu, Yantai, Suzhou, Wuhan, Zhengzhou, Macao, Leshan, and Chia-Yi. Over the years, CLSW has become one of the most important venues for scholars to report and discuss the latest progress in Chinese lexical semantics and related fields, including theoretical linguistics, applied linguistics, computational linguistics, information processing, and computational lexicography. CLSW has significantly impacted and promoted academic research and application development in the related fields. CLSW 2021 was hosted by Nanjing Normal University, China. This year, 261 papers were submitted to the workshop, the highest number in the history of the series. All submissions went through a double-blind review process, with at least two independent reviewers assigned to each paper. Of these submissions, 91 (34.8%) were accepted as oral presentations and 85 (32.6%) as poster presentations. Among the accepted papers, the top-rated English papers were further selected to be included in the proceedings. They are organized in topical sections covering all major topics of lexical semantics, semantic resources, corpus linguistics, and natural language processing. We are pleased that these shortlisted papers are published by Springer as part of their Lecture Notes in Artificial Intelligence (LNAI) series and are to be submitted for indexing by Ei and Scopus.
The Organizing Committee would like to express our gratitude to the conference chairs: Ting Liu (Harbin Institute of Technology) and Jie Xu (University of Macau), the honorary members of the Advisory Committee: Shiwen Yu (Peking University), Chin-Chuan Cheng (University of Illinois), Chu-Ren Huang (Hong Kong Polytechnic University), and Xinchun Su (Xiamen University), and the other members of the Advisory Committee for their guidance in promoting and running the workshop. We sincerely thank the invited speakers for their outstanding keynote talks: Yiming Yang (Jiangsu Normal University), Chu-Ren Huang (Hong Kong Polytechnic University), Qun Liu (Huawei Noah’s Ark Lab), and Ge Xu (Minjiang University). Also, we would like to acknowledge the members of the Organizing Committee, Nanjing Normal University, and the student volunteers for their tremendous contribution to this event. Our gratitude also goes to the Program Committee members and reviewers for their time and efforts in the paper review work. Last but not least, we thank all the authors and attendees for their scientific contribution and participation, which made CLSW 2021 a great success.

April 2022

Weiguang Qu
Minghui Dong
Jia-Fei Hong
Yanhui Gu

Organization

Conference Chairs
Jie Xu (University of Macau, Macao SAR, China)
Ting Liu (Harbin Institute of Technology, China)

Academic Committee Chairs
Shiwen Yu (Peking University, China)
Chin-Chuan Cheng (Taiwan Normal University, Taiwan)
Ka Yin Benjamin T’sou (The Education University of Hong Kong, Hong Kong SAR, China)

Academic Committee
Yanbin Diao (Beijing Normal University, China)
Jia-Fei Hong (Taiwan Normal University, Taiwan)
Chu-Ren Huang (The Hong Kong Polytechnic University, Hong Kong SAR, China)
Donghong Ji (Wuhan University, China)
Peng Jin (Leshan Normal University, China)
Zhuo Jing-Schmidt (University of Oregon, USA)
Kim Teng Lua (Chinese and Oriental Languages Information Processing Society, Singapore)
Meichun Liu (City University of Hong Kong, Hong Kong SAR, China)
Qin Lu (The Hong Kong Polytechnic University, Hong Kong SAR, China)
Xinchun Su (Xiamen University, China)
Zhifang Sui (Peking University, China)
Shu-Kai Hsieh (Taiwan University, Taiwan)
Jie Xu (University of Macau, Macao SAR, China)
Hongying Zan (Zhengzhou University, China)
Yangsen Zhang (Beijing Information Science and Technology University, China)
Sung Lin Chen (Cheng Kung University, Taiwan)

Program Committee Chairs
Minghui Dong (Agency for Science, Technology and Research, Singapore)
Jia-Fei Hong (Taiwan Normal University, Taiwan)
Yanhui Gu (Nanjing Normal University, China)

Program Committee
Kathleen Ahrens (The Hong Kong Polytechnic University, Hong Kong SAR, China)
Xiaojing Bai (Tsinghua University, China)
Dabhur Bayar (Inner Mongolia University, China)
Shu Cai (Google, USA)
Cairangjia (Qinghai Normal University, China)
Siaw Fong Chung (Chengchi University, Taiwan)
Ren-Feng Duann (Taitung University, Taiwan)
Minxuan Feng (Nanjing Normal University, China)
Wenhe Feng (Wuhan University, China)
Helena Gao (Nanyang Technological University, Singapore)
Shu-Ping Gong (Chiayi University, Taiwan)
Shulun Guo (Shanghai Jiao Tong University, China)
Chunjie Guo (Nanjing University of Aeronautics and Astronautics, China)
Yingjie Han (Zhengzhou University, China)
Lin He (Wuhan University, China)
Chan-Chia Hsu (Taipei University of Business, Taiwan)
Yuxiang Jia (Zhengzhou University, China)
Yuru Jiang (Beijing Information and Science Technology University, China)
Shengyi Jiang (Guangdong University of Foreign Studies, China)
Peng Jin (Leshan Normal University, China)
Yonghong Ke (Beijing Normal University, China)
Huei-Ling Lai (Chengchi University, Taiwan)
Lung-Hao Lee (Central University, Taiwan)
Baoli Li (Bozhi Technology, China)
Bin Li (Nanjing Normal University, China)
Chihkai Lin (Tatung University, Taiwan)
Jingxia Lin (Nanyang Technological University, Singapore)
Maofu Liu (Wuhan University of Science and Technology, China)
Yao Liu (Institute of Scientific and Technical Information of China, China)

Pengyuan Liu (Beijing Language and Culture University, China)
Meichun Liu (City University of Hong Kong, Hong Kong SAR, China)
Zhifu Liu (China Three Gorges University, China)
Donghong Liu (Central China Normal University, China)
Yunfei Long (University of Nottingham, UK)
Chiarung Lu (Taiwan Normal University, Taiwan)
Wei-Yun Ma (Columbia University, USA)
Mengxiang Wang (Beijing Union University, China)
Lingling Mu (Zhengzhou University, China)
Weiming Peng (Beijing Normal University, China)
Likun Qiu (Alibaba, China)
Weiguang Qu (Nanjing Normal University, China)
Gaoqi Rao (Beijing Language and Culture University, China)
Yanqiu Shao (Beijing Language and Culture University, China)
Yangyang Shi (Meta, USA)
Jihua Song (Beijing Normal University, China)
Chunyang Song (Shanghai Jiao Tong University, China)
Zuoyan Song (Peking University, China)
Qi Su (Peking University, China)
Xuri Tang (Huazhong University of Science and Technology, China)
I-Ni Tsai (Taiwan University, Taiwan)
Jin Wang (Yunnan University, China)
Shan Wang (University of Macau, Macao SAR, China)
Meng Wang (Jiangnan University, China)
Lei Wang (Peking University, China)
Yunwang Wu (Peking University, China)
Jiun-Shiung Wu (Chung Cheng University, Taiwan)
Hongbing Xing (Beijing Language and Culture University, China)
Jiajuan Xiong (The University of Hong Kong, Hong Kong SAR, China)
Jie Xu (University of Macau, Macao SAR, China)
Dengfeng Yao (Beijing Union University, China)
Shuangyun Yao (Huazhong Normal University, China)
Dong Yu (Beijing Language and Culture University, China)
Hongying Zan (Zhengzhou University, China)
Weidong Zhan (Peking University, China)
Junping Zhang (Beijing Language and Culture University, China)
Keliang Zhang (Information Engineering University, Luoyang, China)
Lei Zhang (Northeast Normal University, China)

x

Organization

Kunli Zhang Qingqing Zhao Zezhi Zheng Hua Zhong Yu-Yun Chang Zhimin Wang Shih-Wen Chyu

Zhengzhou University, China Institute of Linguistics, Chinese Academy of Social Sciences, China Xiamen University, China Fujian Normal University, China Chengchi University, Taiwan Beijing Language and Culture University, China Taiwan Normal University, Taiwan

Organizing Committee Chairs Weiguang Qu Junsheng Zhou Shehui Liang Dongbo Wang Bin Li Minxuan Feng

Nanjing Normal University, China Nanjing Normal University, China Nanjing Normal University, China Nanjing Agricultural University, China Nanjing Normal University, China Nanjing Normal University, China

Publication Committee Chairs Qi Su Pengyuan Liu Xuri Tang

Peking University, China Beijing Language and Culture University, China Huazhong University of Science and Technology, China

Contents – Part I

Lexical Semantics and General Linguistics

On the Differences Between Dōua and Dōub
    Hua Zhong
From Falling to Hitting: Diachronic Change and Synchronic Distribution of Frost Verbs in Chinese
    Sicong Dong and Chu-Ren Huang
The Senses of Mandarin Deadjectival Verbs
    Xiaoqian Zhang
A Corpus-Based Study of Factive Verbs and Its Influencing Factors
    Yu Wang and Yulin Yuan
The Discrimination of the Synonyms of yǐnqǐ: A Corpus-Based Study
    Xian Wang and Yuelong Wang
A Corpus-Based Semantic Analysis of the Pattern ‘bǎ+NP+VP’ in Mandarin Chinese
    Congcong Yang and Yunhua Qu
Cross-Categorial Behaviors of Mandarin Physical Contact Verbs: A Frame-Based Constructional Analysis of qiāodǎ 敲打
    Tianqi He and Meichun Liu
The Correspondence Between Semantic Functions and Syntactic Constructions of Guà Verbs
    Caiying Yang, Gaofeng Shi, Hongbing Xing, and Xingsan Chai
The Collostruction-Based Definition Model in Language-Specific Chinese-English Learner’s Dictionaries: The Case of Chinese Collective Classifier ‘Bǎ’
    Zeng Li and Chen Congmei
The Collocations of Chinese Color Words
    Shan Wang, Le Wu, and Qiaomin Gong
The Differences Between Jiùshì and Jiùsuàn as Conjunctions and Their Formation Mechanisms
    Wei Bian
A Deontic Modal SFP in Chengdu Chinese
    Jiajuan Xiong
An Analysis of the Grammaticalization, Coercion Mechanisms and Formation Motivation of the New Construction ‘XX Zi’ from the Cognitive Perspective
    Chen Li
A Study on Lexical Knowledge and Semantic Features of Speech Act Verbs Based on Language Facts
    Linlin Zhang and Hongbing Xing
Investigating Verbs of Confession Through a Syntactic and Semantic Annotation Tool
    Shan Wang
The Pragmatic Distribution and Semantic Explanation of Evidential Prepositions
    Enxu Wang and Zheng Zhang
The Discourse Functions of Shell Nouns in Mandarin: A Genre-Based Study in Popular and Professional Science Articles
    Xin Kou
The Interpersonal and Attitudinal Function of the Modal Particle A in the Middle of the Sentence
    Minfeng Wang
Verb Meaning Representation Based on Structured Semantic Components
    Long Chen and Weidong Zhan
Activation of Alternatives by Mandarin Sentence-Initial and Sentence-Internal Foci: A Semantic Priming Study
    Tsun-Ming Ma, Yu-Yin Hsu, Tianyi Han, and Daria Tack
On the Limitations of Constructional Innovation: A Case Study of the “ni bixu zhidao de NP”
    Lei Zhang, Shanshan Yang, and Sicong Dong
Formalized Chinese Sentence Pattern Structure and Its Hierarchical Analysis
    Weiming Peng, Zuntian Wei, Jihua Song, Shiwen Yu, and Zhifang Sui
On the Semantic Ambiguity of Chinese Causative Resultative V-Vs
    Jiaojiao Yao
On the Dynamic Semantics of Fǒuzé and Bùrán in Mandarin Chinese
    Jiun-Shiung Wu
Functions of Non-subject Topics in Mandarin Conversations
    Yanmei Gao and Guoyan Lyu
A Comparative Study of Two Motion Verbs Lái and Guòlái
    Ziyan Li
Morpheme Zú “Tribe” in Mandarin Chinese
    Huahung Yuan and Yan Li
Interpretation of Complex Event and the Semantics Structure of General Verbs
    Jie Fan
Perfectivity via Locative Non-coincidence: Pre-verbal TAU in the Xiaolongmen Dialect
    Xia Liu and Vincent Jixin Wang
The Development Trend and Form-Meaning Features of Contemporary Chinese Lexical Patterns
    Jiapan Li

Natural Language Processing and Language Computing

Automatic Prediction and Linguistic Interpretation of Chinese Directional Complements Based on BERT Model
    Young Hoon Jeong, Ming Yue Li, Su Min Kang, Yun Kyung Eum, and Byeong Kwu Kang
Chunk Extraction and Analysis Based on Frame-Verbs
    Chengwen Wang, Gaoqi Rao, Endong Xun, and Zhifang Sui
From ‘It’s Your Funeral’ to ‘Mouse Tail Juice’: A Quantitative Study of the Mishearings in Danmu Videos
    Yihan Zhou
A Textual Entailment Recognition Method Fused with Language Knowledge
    Yalei Liu, Lingling Mu, Wenyan Chu, and Hongying Zan
Semantic Similarity of Inverse Morpheme Words Based on Word Embedding
    Jiaomei Zhou and Zhiying Liu
A Hybrid Model for Chinese Confusable Words Distinguishing in Proofreading
    Luozheng Li, Peipei Song, Dan Zhang, and Dongyan Zhao
Translating Classical Chinese Poetry into Modern Chinese with Transformer
    Peng Jin, Hailiang Wang, Limin Ma, Bing Wang, and Shushan Zhu
Disambiguation of Network Informal Language Expressions Based on Construction Grammar
    Rongjing Xia and Keliang Zhang
Themes and Sentiments of Online Comments Under COVID-19: A Case Study of Macau
    Xi Chen, Vincent Xian Wang, and Chu-Ren Huang
Multilingual China-Related News Identification Framework Based on Multiple Strategies
    Lianxi Wang, Xiaotian Lin, and Nankai Lin
Research and Implementation of Buzzword Detection Technology Based on the Dynamic Circulation Corpus
    Yingying Wang, Huaqiu Liu, Erhong Yang, and Yuxuan Jiang

Author Index

Contents – Part II

Cognitive Science and Experimental Studies

Human Body Metaphor in News Headlines
    Bingbing Yang and Zhimin Wang
Human Cognitive Constraints on the Separation Frequency and Limit of Separable Words
    Xiaoming Han and Haifeng Wang
A Quantitative Research on the Spatial Imageries for Among Flowers
    Ning Cheng
Study on the Order of Double-Syllable Double Attributives and Selection Restrictions—Take the Structures of a1 + a2 + De + n and a1′ + De + a2′ + n′ as Examples
    Rui Song, Wenjie Zhao, and Zhimin Wang
Embodied Grounding of Concreteness/Abstractness: A Sensory-Perceptual Account of Concrete and Abstract Concepts in Mandarin Chinese
    Yin Zhong, Chu-Ren Huang, and Kathleen Ahrens
A Diachronic Study on Linguistic Synesthesia in Chinese
    Qingqing Zhao and Yunfei Long
The Relationship Between Lexical Richness and the Quality of CSL Learners’ Oral Narratives
    Qing Ma and Xingsan Chai
Interpreting Accomplishments by Script Knowledge: A Comparison Study Between Chinese and French
    Yingyi Luo and Xiaoqian Zhang

Lexical Resources and Corpus Linguistics

Creation and Significance of Database of Dictionary of Cognate Words
    Shuyi Fang and Liangyue Xu
Chinese Predicate Chunk Knowledge Base Construction and Internal Boundary Recognition
    Chengwen Wang, Xiang Liu, Gaoqi Rao, Endong Xun, and Zhifang Sui
Translational Equivalents for Culture-Specific Words in Chinese-English Dictionaries
    Qian Li
Quantitative Analysis of Chinese and English Verb Valencies Based on Probabilistic Valency Pattern Theory
    Jianwei Yan and Haitao Liu
A Quantitative Approach to the Stylistic Assessment of the Middle Chinese Texts
    Bing Qiu and Wei Bian
Construction and Evaluation of Chinese Word Segmentation Datasets in Malay Archipelago
    Shengyi Jiang, Yingwen Fu, and Nankai Lin
Social Changes Manifested in the Diachronic Changes of Reform-Related Chinese Near Synonyms
    Longxing Li, Vincent Xian Wang, and Chu-Ren Huang
Semantic Classification of Adverbial Adjectives Based on Chinese Chunkbank
    Zhenzhen Qin, Tian Shao, Gaoqi Rao, and Endong Xun
Prepositional Frame Extraction and Semantic Classification Based on Chinese ChunkBank
    Liyang Pang, Chengwen Wang, Guirong Wang, Gaoqi Rao, and Endong Xun
From Complex Emotion Words to Insomnia and Mental Health: A Corpus-Based Analysis of the Online Psychological Consultation Discourse About Insomnia Problems in Chinese
    Xiaowen Wang, Yunfei Long, Panyu Qin, Chunhong Huang, Caichan Guo, Yong Gao, and Chu-Ren Huang
A Quantitative Study on the Measure Index of Syntactic Complexity in Textbooks for Chinese as a Second Language
    Caihong Cao, Wenting Cao, and Fang Tian
Construction and Quantitative Analysis of Jiangsu Dialect Function Word Knowledgebase
    Xiaoru Wu, Yuling Dai, Xuefen Mao, Minxuan Feng, and Bin Li
Construction of Event Annotation Corpus for Political News Texts
    Ruimin Wang, Yajuan Ye, Kunli Zhang, Hongying Zan, and Yingjie Hang
Developing a Syntactic and Semantic Annotation Tool for Research on Chinese Vocabulary
    Shan Wang, Xiaojun Liu, and Jie Zhou
An Overview of the Construction of Near-Synonyms Discrimination Resources
    Juan Li
Contemporary Chinese Social Vocative Terms
    Xue Zhang
The Corpus Construction of Basic Noun Compound Phrase in Literature Domain and Its Comparison with News Domain
    Yuan Zhong, Ying Zhang, and Pengyuan Liu
Predicate Annotation for Chinese Intent Detection
    Yan Li, Likun Qiu, and Zhe Zhao
The Development of the Chinese Monosyllabic Motion-Directional Constructions: A Diachronic Constructional Approach
    Fangqiong Zhan
A Corpus Study of Anaphora in Chinese Conditionals
    Shunting Chen
Research on the Distribution and Characteristics of Negative Quasi-Prefixes in Different Registers
    Yonghui Xie and Erhong Yang
A Quantitative Study on Mono-Valent Noun and Its Ellipsis
    Xiangyu Chi, Gaoqi Rao, and Endong Xun
Construction of Chinese Obstetrics Knowledge Graph Based on the Multiple Sources Data
    Kunli Zhang, Chenxin Hu, Yu Song, Hongying Zan, Yueshu Zhao, and Wenyan Chu

Author Index

Lexical Semantics and General Linguistics

On the Differences Between Dōua and Dōub

Hua Zhong

Overseas Education College of Fujian Normal University, Fuzhou, China

Abstract. In the existing literature, many scholars advocate a dichotomy of the adverb dōu (都). However, most definitions of the semantic and pragmatic functions of dōua (都a) and dōub (都b) are not rigorous, accurate, or comprehensive, and the differences between the two have not been sorted out comprehensively and systematically. As a result, the academic community still cannot distinguish them clearly. Based on the redefinition of the semantic and pragmatic functions of dōua and dōub in Zhong [6–8, 11], this paper attempts to reveal the differences between them comprehensively, from multiple levels and dimensions, in the hope of reaching an early consensus on the dichotomy of the adverb dōu.

Keywords: The differences between Dōua and Dōub · Three planes · Ideational function · Interpersonal function · Textual function

1 An Introduction: The Necessity of Comprehensively Clarifying the Differences Between dōua and dōub

In the existing literature, many scholars advocate a dichotomy of the adverb dōu, namely dōua (dōu1) and dōub (dōu2, dōu3); see, for example, Paris [1], Wang [2], Jiang [3], Zhang [4], and Xu [5]. However, most of them define the semantic and pragmatic functions of dōua and dōub by means of analytical descriptions with Chinese synonyms, which is not rigorous, accurate, or comprehensive enough. After Zhong [6–10] redefined the semantic and pragmatic functions of dōua and dōub, Zhong [11] extracted their distinctive features and made it clear that the adverb dōu divides into dōua and dōub. But the differences between them have not been sorted out comprehensively and systematically. This ‘classification without distinction’ still confuses the academic community and has even given rise to some rather ad hoc ‘unity theories’ (e.g., Li [12], Xu [13], Shen [14], Zhang [15]). Therefore, it is still necessary to make a comprehensive comparison between dōua and dōub and clarify their differences. In view of this, this paper attempts to compare dōua and dōub comprehensively, from multiple levels and dimensions, in the hope of reaching an early consensus on the dichotomy of the adverb dōu.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 3–21, 2022. https://doi.org/10.1007/978-3-031-06703-7_1


2 The Syntactic Differences Between dōua and dōub

In sentences, dōua and dōub are both usually used as adverbials modifying the predicate, so there seems to be no difference between them. This is probably why few scholars have discussed their differences at the syntactic level. However, a slightly closer look at their collocations and combinations in sentences reveals relatively neat syntactic differences between them. According to the author’s investigation, there are at least the following oppositions.

2.1 The Difference of the Topic Components Between dōua Sentences and dōub Sentences

First of all, the topics of dōua sentences can be ‘suǒyǒu NP (all NP), quán NP (whole NP), měi NP (every NP), yíqiè (everything), dàjiā (everyone), dàochù (everywhere), chùchù (everywhere), rénrén (everyone), jiājiā (every family)…’ and other universal elements, while the topics of dōub sentences usually cannot be these. For example:

(1) Suǒyǒu rén dōua shì duǎnxiù báoshān.
    All people DOUa are short-sleeve thin-shirt
    All the people wore thin shirts with short sleeves.
    → * (Lián) suǒyǒu rén dōub shì duǎnxiù báoshān.
    (Including) all people DOUb are short-sleeve thin-shirt
    All the people even wore thin shirts with short sleeves.

(2) Quánjiā rén dōua huáiniàn tā.
    Whole-family people DOUa miss her
    She was missed by the whole family.
    → * (Lián) Quánjiā rén dōub huáiniàn tā.
    (Including) Whole-family people DOUb miss her
    Even the whole family missed her.

(3) Měi zhōu dōua yǒu liǎng fēng xìn yuèguò Chángjiāng hé Hànshuǐ.
    Every week DOUa have two CL letter cross Yangtze River and Hàn River
    There are two letters every week across the Yangtze River and Han River.
    → * (Lián) Měi zhōu dōub yǒu liǎng fēng xìn yuèguò Chángjiāng hé Hànshuǐ.
    (Including) Every week DOUb have two CL letter cross Yangtze-River and Hàn River
    Even two letters cross the Yangtze River and the Han River every week.

(4) Yíqiè dōua tūrán biànde wútóu wúxù.
    Everything DOUa suddenly become headless disorderly
    Everything suddenly became headless and disorderly.
    → * (Lián) Yíqiè dōub tūrán biànde wútóu wúxù.
    (Including) Everything DOUb suddenly become headless disorderly
    Even everything suddenly became headless and disorderly.

(5) Dàjiā dōua gǔlì tā.
    Everyone DOUa encourage her
    Everyone encouraged her.


    → * (Lián) Dàjiā dōub gǔlì tā.
    (Including) Everyone DOUb encourage her
    Even everyone encouraged her.

(6) Dàochù dōua kěyǐ tīngjiàn hùxiāng gǔlì de shēngyīn.
    Everywhere DOUa can hear mutual encourage PAT voice
    Voices of mutual encouragement can be heard everywhere.
    → * (Lián) Dàochù dōub kěyǐ tīngjiàn hùxiāng gǔlì de shēngyīn.
    (Including) everywhere DOUb can hear mutual encourage PAT voice
    Even voices of mutual encouragement can be heard everywhere.

(7) Rénrén dōua nénggòu dǒngdé zhè ge dàolǐ.
    Everyone DOUa can understand this CL truth
    Everyone can understand this truth.
    → * (Lián) Rénrén dōub nénggòu dǒngdé zhè ge dàolǐ.
    (Including) Everyone DOUb can understand this CL truth
    Even everyone can understand this truth.

(8) Jiājiā dōua wèi tāmen fūqī dàkāi lǜdēng.
    Every-family DOUa for they couple big-open green-light
    Every family gave the green light to the couple.
    → * (Lián) Jiājiā dōub wèi tāmen fūqī dàkāi lǜdēng.
    (Including) Every-family DOUb for they couple big-open green-light
    Even every family gave the green light to the couple.

2.2 The Differences of Modifiers Before dōua and dōub

Secondly, many modifiers appear very freely before dōua, such as ‘quán (entirely), yìbān (usually), dàbùfēn (largely); yìzhí (all the time), cónglái (all along), xiànglái (all along), céngjīng (ever); bù (not), méi (no)’ and so on. Before dōub, however, adverbs and other adverbial elements are rarely used to modify it. For example:

(9) Gāngcái nà gǔ qì, yíxiàzi quán dōua(*dōub) xiāo le.
    Just-now that CL anger at-once entirely DOUa(*DOUb) disappear ASP
    The anger just now disappeared all at once.

(10) Nà ge shíhòu, dàlǐtáng yìbān dōua(*dōub) méiyǒu kōngtiáo.
    That CL time, auditorium usually DOUa(*DOUb) none air-conditioner
    At that time, there was usually no air conditioner in the auditorium.

(11) Zhèxiē rén dàbùfēn dōua(*dōub) zhīchí tā.
    These people largely DOUa(*DOUb) support him
    Most of these people support him.

(12) Tā yìzhí dōua(*dōub) hěn guānxīn nǐ.
    She all-the-time DOUa(*DOUb) very care you
    She has always cared about you.

(13) Zhè jǐge háizi cónglái dōua(*dōub) bú qù túshūguǎn.
    These several kid all-along DOUa(*DOUb) not go library
    These children never go to the library.

(14) Zhè ge rén xiànglái dōua(*dōub) zhème yū.
    This CL man all-along DOUa(*DOUb) so pedantic


    This man has always been so pedantic.

(15) Zhèxiē qínshǒu céngjīng dōua(*dōub) shì zhùmíng yuètuán de yuèshī.
    These player ever DOUa(*DOUb) are famous orchestra DE musician
    These players used to be musicians of famous orchestras.

(16) Tāmen bù dōua(*dōub) dài le yǎnjìng ma?
    They not DOUa(*DOUb) wear ASP glasses PAT
    Don’t they all wear glasses?

(17) Dàjiā hái méi dōua(*dōub) dào quán.
    Everyone yet not DOUa(*DOUb) arrive entire
    Not everyone has arrived yet.

(18) Dōub kuài liùdiǎn le.
    DOUb will six-o’clock ASP
    It’s even almost six o’clock.
    → * Yìzhí/cónglái/bù/méi + dōub kuài liùdiǎn le.
    All-the-time/all-along/not/no + DOUb will six-o’clock ASP
    All-the-time/all-along/not/no + it’s even almost six o’clock.

2.3 The Differences of ‘dōua/dōub + VP/AP’ as the Complements

In addition, when ‘dōu + VP/AP’ is used as a complement, it can only be a stative complement in the pattern ‘(NP) + VP + De + dōua/b + VP/AP’. The main difference between the two sentence patterns is that a dōua sentence must have a plural topic NP (expressed or implied) at its beginning. For example:

(19) Tiàowǔ de rén dǎbàn de dōua bǐjiào yǒu gèxìng.
    Dance PAT people dress PAT DOUa fairly have personality
    Dancers are dressed with more personality.

(20) Yíqiè jìnzhǎn de dōua fēicháng shùnlì.
    Everything get-along PAT DOUa awfully smooth
    Everything is going very smoothly.

(21) (Zhèxiē nián) Tā yìzhí zuò de dōua hěn hǎo.
    (These years) He all-the-time do PAT DOUa very well
    He has been doing very well all these years.

In these sentences, ‘Dancers’, ‘Everything’, and ‘(These years)’ are plural. A dōub sentence, by contrast, must begin with a singular topic NP (expressed or implied), or with a plural NP that is read collectively as a single-element set. For example:

(22) Bǎ wǒ xià de dōub wàng le chīfàn le.
    BA me scare PAT DOUb forget ASP eat-rice ASP
    I was so scared that I forgot to eat.

(23) Nà shíhòu wǒ de jiǎo dòng de dōub làn le ge dà kūlōng.
    That time I PAT foot freeze PAT DOUb rot ASP CL big hole
    My feet were rotten with cold at that time, and there were big holes.

The word ‘wǒ (I)’ in example (22) is singular, while the word ‘jiǎo (foot/feet)’ in example (23) may be singular or plural, but it is only interpreted as a single-element set


in sentences. This difference is shown more clearly in the ambiguous pattern ‘bǐ + NP + VP + de + dōua/b + VP/AP’. For example:

(24) a. Tā bǐ wǒ chàng de dōub(*dōua) hǎo.
    He than I sing PAT DOUb(*DOUa) well
    He even sings better than me.
    b. Tā bǐ wǒmen chàng de dōub(*dōua) hǎo, gèng bú yòng shuō qítā rén le.
    He than we sing PAT DOUb(*DOUa) well, let-alone not use say other people ASP
    He even sings better than us, not to mention others.
    c. Tā bǐ wǒmen chàng de dōua/b hǎo.
    He than we sing PAT DOUa/b well
    He sings better than all of us. / He even sings better than us.

The NP ‘wǒ (I)’ in (24a) is singular. Although the NP ‘wǒmen (we)’ in (24b) is plural, it also counts as a single-element set in contrast with ‘qítā rén (others)’, so dōu in this sentence can only be dōub, not dōua. In the ambiguous sentence (24c), however, the NP ‘wǒmen (we)’ can be understood either as a unitary set formed by a collective or as a plural-element set formed by plural individuals.

3 The Semantic Differences Between dōua and dōub

Zhong [7] defined dōua as a distributive operator for quantifying eventualities globally: its semantic function is to quantify the eventualities expressed in the sentence/proposition, and the quantitative value it expresses is ‘total, plural, and approximate’. Zhong [6, 8] defined dōub as a counter-expectation (hereafter CE) discourse marker. Its semantic function is a procedural pragmatic meaning without truth value: it indicates that the speaker makes a judgment on the eventuality described by a proposition and believes that the possibility of the eventuality is inferior-to-expectation (or to the norm). From the above definitions, we can roughly extract four groups of differences at the semantic level (or in the conceptual function) of dōua and dōub.

Firstly, in terms of semantic nature, dōua is an objective quantification of the eventualities expressed in the sentence/proposition, whereas dōub is a subjective evaluation of the eventuality expressed in the sentence/proposition. For example:

(25) Lǐ Xiàozhǎng, Wáng Yuànzhǎng hé Zhāng Zhǔrèn dōua kuài wǔshí le.
    Lǐ president, Wáng dean and Zhāng director DOUa will fifty ASP
    President Li, Dean Wang and Director Zhang are almost fifty.

(26) a. Dōub kuài wǔshí le, yīnggāi xiǎngshòu xiǎngshòu le.
    DOUb will fifty ASP, should enjoy enjoy ASP
    It’s almost fifty; it’s time to enjoy life.


    b. Dōub kuài wǔshí le, gèng bùnéng làngfèi shíjiān le.
    DOUb will fifty ASP, more cannot waste time ASP
    It’s almost fifty, so all the more reason not to waste time.

The opposition between examples (25) and (26) shows that dōua is an objective quantification used to convey objective information, while dōub is a subjective evaluation used to express subjective cognition, thoughts, feelings, and so on. As a CE discourse marker, dōub has no objective meaning but has subjective meanings and strong subjectivity. On the one hand, this subjectivity means that the expectation (or norm) against which a possibility is judged may differ with the people and situation involved. For example:

(27) Dōub kuài wǔshí le, yīnggāi xiǎngshòu xiǎngshòu le, bú yào zài nàme láolèi.
    DOUb will fifty ASP, should enjoy enjoy ASP not ask-for again so tire
    It’s almost fifty, so you should enjoy life and stop being so tired.

(28) Dōub kuài wǔshí le, gèng bùnéng làngfèi shíjiān le, yīnggāi gèng nǔlì gōngzuò.
    DOUb will fifty ASP, more cannot waste time ASP, should more hard work
    It’s almost fifty, so all the more reason not to waste time. We should work harder.

On the other hand, the informative intention of a dōub sentence is to describe an eventuality whose possibility of occurrence or existence is inferior-to-expectation; as long as the addressee can understand this informative intention, the truth or falsehood of the proposition in a dōub sentence is insignificant. Therefore, a speaker can even draw on things from fiction, myths, and legends for purposeful exaggeration, presenting them as extreme situations that deviate from the expectation (or norm), in order to express his or her attitudes and feelings emphatically. For example:

(29) Lián shàngdì dōub děi pà tā sānfēn.
    Including God DOUb have-to afraid he three-points
    Even God has to be afraid of him.

(30) Tā bǐ xiǎoguǐ dōub huài.
    He than little-devil DOUb evil
    He’s even more evil than a little devil.

(31) Tā qì de fèi dōub kuài zhà le.
    She angry PAT lungs DOUb will explode ASP
    She was so angry that her lungs almost exploded.

(32) Zài tā nà zhuórè de mùguāng xià, tā juéde zìjǐ dōub kuàiyào rónghuà le.
    On he that burning PAT sight down, she feel herself DOUb almost melt ASP
    Under his burning eyes, she felt that she was about to melt.

In addition, dōub as a CE marker does not have any propositional function. It has a commentary function on the relevant proposition and obvious evaluative functions (see also Fraser [16]), and it can also be regarded as a high-level predicate with an ‘implicit assertion’. As a result, dōub cannot be negated, and, except as an echo question, a dōub sentence normally cannot be used as an interrogative sentence.


[Note: * in front of an example sentence means that it is unacceptable; ?* means that it is tenable as an echo question but untenable as a general question.] For example:

(33) Shílǐ wài de wēixiǎn, tā dōub néng gǎnjué dào.
    Ten-miles away PAT danger, he DOUb can feel get
    He can feel the danger ten miles away.
    → * Shílǐ wài de wēixiǎn, tā bù(méi) dōub néng gǎnjué dào.
    Ten-miles away PAT danger, he not DOUb can feel get
    He even cannot feel the danger ten miles away.
    → ?* Tā dōub néng gǎnjué dào? / * Shéi dōub néng gǎnjué dào?
    He DOUb can feel get? / * Who DOUb can feel get?
    Can he even feel it? / * Who even can feel it?

(34) Dōub kuài liùdiǎn le.
    DOUb will six-o’clock ASP
    It’s even almost six o’clock.
    → * Bù(méi) dōub kuài liùdiǎn le.
    Not DOUb will six-o’clock ASP
    It is not even almost six o’clock.
    → ?* Dōub kuài liùdiǎn le? / ?* Dōub kuài jǐdiǎn le?
    DOUb will six-o’clock ASP? / ?* DOUb will what-time ASP?
    Is it even almost six o’clock? / ?* What time is it almost?

(35) Gǒu yǒushí bǐ rén dōub hǎo.
    Dog sometimes than person DOUb good
    Sometimes dogs are even better than human beings.
    → * Gǒu yǒushí bù(méi) bǐ rén dōub hǎo.
    Dog sometimes not than person DOUb good
    Even sometimes dogs are not better than human beings.
    → ?* Gǒu yǒushí bǐ rén dōub hǎo?
    Dog sometimes than person DOUb good
    Are sometimes dogs even better than human beings?

(36) Wǒ xué dōub xué bù lái.
    I learn DOUb learn not come
    I can’t learn anything.
    → * Wǒ xué bù(méi) dōub xué bù lái.
    I learn not DOUb learn not come
    I can’t learn anything.
    → ?* Nǐ xué dōub xué bù lái?
    You learn DOUb learn not come
    Can you not learn anything?

(37) Fàn dōub liáng le, kuài chī ba!
    Dinner DOUb cold ASP, hurry eat PAT
    The dinner already got cold, just eat it!
    → * Fàn bù(méi) dōub liáng le, kuài chī ba!
    Dinner not DOUb cold ASP, hurry eat PAT
    The dinner didn’t already get cold, just eat it!
    → ?* Fàn dōub liáng le ma?

Dinner DOUb cold ASP PAT?
Did the dinner already get cold?
→ ?* Shénme dōub liáng le?
What DOUb cold ASP
What did already get cold?

From the point of view of systemic functional linguistics, the ideational function of dōua is the experiential function, which participates in the construction of propositional meaning, and the ideational function of dōub is the logical function, which participates in the construction of logical relations (see Halliday [17]). That is, dōub constructs a relation in which the eventuality expressed by the sentence/proposition is less likely to occur or exist than the expectation (or normal). Alternatively, from the point of view of discourse marker theory, the semantics of dōua belongs to truth-conditional semantics, which expresses propositional content, and the semantics of dōub belongs to non-truth-conditional semantics, which expresses propositional attitudes; the latter is a kind of procedural meaning that leads or guides discourse comprehension (see Blakemore [18, 19]; Sperber and Wilson [20]). For example:

(38) a. Tāmen dōua shàngxué qù le | * Tā dōua shàngxué qù le
They DOUa go-to-school go ASP | * He DOUa go-to-school go ASP
They all went to school. | * He all went to school.
b. Tāmen shàngxué qù le | Tā shàngxué qù le
They go-to-school go ASP | He go-to-school go ASP
They went to school. | He went to school.

(39) a. ?* Shūbāo lǐ de dōngxi dōua línshī le, búguò, kèběn hái méi línshī.
Schoolbag inside PAT thing DOUa get-wet ASP, but, textbook yet not get-wet
Everything in the schoolbag is wet, but the textbook is not wet yet.
b. Shūbāo lǐ de dōngxi línshī le, búguò, kèběn hái méi línshī.
Schoolbag inside PAT thing get-wet ASP, but, textbook yet not get-wet
The things in the schoolbag are wet, but the textbook is not wet yet.
c. * Tā dōua dàxué bìyè le
He DOUa university graduate ASP
He all graduated from a university
d. Tā dàxué bìyè le
He university graduate ASP
He graduated from a university

(40) a.
Shūbāo lǐ de dōngxi dōub línshī le, búguò, kèběn hái méi línshī.
Schoolbag inside PAT thing DOUb get-wet ASP, but, textbook yet not get-wet.
Everything in the schoolbag is wet, but the textbook is not wet yet.

b. Shūbāo lǐ de dōngxi línshī le, búguò, kèběn hái méi línshī.
Schoolbag inside PAT thing get-wet ASP, but, textbook yet not get-wet
The things in the schoolbag are wet, but the textbook is not wet yet.
c. Tā dōub dàxué bìyè le
He DOUb university graduate ASP
He already graduated from a university.
d. Tā dàxué bìyè le
He university graduate ASP
He graduated from a university.

The difference between (39a) and (40a) shows that the semantic functions of dōua and dōub are indeed different. The contrast between sentences (39a) and (39b), and between (39c) and (39d), shows that dōua changes the truth value of the sentence or the content of the proposition. The contrast between sentences (40a) and (40b), and between (40c) and (40d), shows that dōub does not change the truth value of the sentence or the content of the proposition, but it can lead or guide discourse understanding. For example:

(41) a. Bǎ tā chǎo xǐng le.
BA he make-a-noise wake ASP
Woke him up.
b. Bǎ tā dōub chǎo xǐng le.
BA he DOUb make-a-noise wake ASP
Even woke him up.

The truth-conditional meanings of sentences (41a) and (41b) are the same: ‘woke him up’. In an underspecified context, (41a) admits several pragmatic interpretations: ‘woke him up’ can be understood either as ‘the voice is too loud’ or as ‘(the voice is very low) he is alert when he sleeps’. Once dōub is added, the hearer's pragmatic understanding of the propositional meaning in (41b) is restricted: ‘woke him up’ can usually be understood only as ‘the voice is too loud’. The difference between the truth-value meaning of dōua and the non-truth-value meaning of dōub is also reflected in information prosody and syntactic transformations. For example:

(42) a. Tāmen dōua qù le Běijīng.
They DOUa go ASP Beijing
They all went to Beijing.
b. Tāmen yǒu duōshǎo rén qù le Běijīng?
They have how-many person go ASP Beijing
How many of them have gone to Beijing?

(43) (Lián) Tāmen dōub qù le Běijīng. → ?
(Including) They DOUb go ASP Beijing
They even went to Beijing.

Dōua in (42a) expresses truth-value meaning, so it can carry focus stress. ‘How many people’ can replace dōua to form a question, and sentence (42a) can be used to answer sentence (42b). Dōub in (43), by contrast, expresses non-truth-value meaning, so it cannot carry focus stress and can only be read lightly. No question word can replace dōub to form a question, and sentence (43) cannot be used to answer a question independently. Thus a dōua sentence can answer a question, while a dōub sentence cannot. For example:

(44) Q: a. Tāmen bān yǒu duōshǎo tóngxué kǎoshàng le yánjiùshēng?
Their class have how-many classmate admitted ASP graduate-student
How many students in their class have been admitted to graduate school?
A: b. Quán bān dōua kǎoshàng le.
Whole class DOUa admitted ASP
The whole class was admitted.
A: c. (Lián) Chéngjì zuìchà de Lǐ Míng dōub kǎoshàng le.
(Including) grades worst PAT Lǐ Míng DOUb admitted ASP
Even Li Ming, who got the worst grades, was admitted.
→ c1. (Lián) Chéngjì zuìchà de Lǐ Míng dōub kǎoshàng le,
(Including) grades worst PAT Lǐ Míng DOUb admitted ASP
Even Li Ming, who got the worst grades, was admitted.
Quán bān dōua kǎoshàng le.
Whole class DOUa admitted ASP
The whole class was admitted.
→ c2. ?* (Lián) Chéngjì zuìchà de Lǐ Míng dōub kǎoshàng le,
(Including) grades worst PAT Lǐ Míng DOUb admitted ASP
Even Li Ming, who got the worst grades, was admitted.
Kěshì háiyǒu jǐwèi xuéxí jiàohǎo de méi kǎoshàng.
However still-have a-few study better PAT not admitted
However, there are still a few who have done well in their studies, who did not pass the examination.

(44a) can be answered by (44b), but not by (44c). The reason is that, besides the fact that (44c) can hardly stand as a sentence on its own, the complete sentence built on (44c) is not fixed: it can be continued as c1, c2, and so on. Although c1 can be used to answer (44a), the real answer to (44a) is still the following ‘The whole class was admitted’. But c2, etc.
still cannot be used to answer (44a).

Secondly, in terms of the number of eventualities in dōua and dōub sentences, dōua quantifies the eventualities expressed in the sentence/proposition, and the number of eventualities is ‘total, plural and approximate’ (see Zhong [7] for details). For example:

(45) a. Tāmen dōua qù le Běijīng.
They DOUa go ASP Beijing
They all went to Beijing.

b. * Tā dōua qù le Běijīng.
He DOUa go ASP Beijing
He all went to Beijing.

Sentence (45a), ‘They went to Beijing’, denotes a set of eventualities composed of the sub-eventualities ‘A went to Beijing, B went to Beijing…’. The number of eventualities is ‘total, plural, approximate’, and the sentence is grammatical. Sentence (45b), ‘He went to Beijing’, denotes a single-eventuality set (unique set), and the sentence is not grammatical. Dōub, in contrast, shows the speaker's judgment that the possibility of the eventuality stated in the proposition is lower than the expectation (or normal), so the number of eventualities expressed in a dōub sentence/proposition is ‘singular, exact’. For example:

(46) (Lián) Tāmen/Tā dōub qù le Běijīng,
(Including) They/He DOUb go ASP Beijing,
Even they/he went to Beijing, why didn't you go?

In sentence (46), ‘They/He went to Beijing’ is in both cases a single-eventuality set (unique set).

Thirdly, due to the different semantic functions of dōua and dōub, the corresponding semantic/pragmatic conditions of dōua and dōub sentences are different.

I. The critical factor in determining the validity of a dōua sentence is that the eventualities expressed in the sentence must be plural (see Zhong [7]). The corpus shows that none of the following factors ultimately determines whether a dōua sentence is valid: whether the nominal component before dōua is singular or plural, whether the action indicated by the verb or predicate modified by dōua is singular or plural, and whether the predicate is a collective predicate or a symmetric predicate. For example:

(47) * Tāmen dōua bǎ zǒngtǒngfǔ bāowéi zhe.
They DOUa BA presidential-palace surround ASP
They all are surrounding the presidential palace. (‘They’ is plural.)

(48) * Máo Zédōng hé Liú Shàoqí dōua shì Húnán lǎoxiāng.
Mao Zedong and Liu Shaoqi DOUa are Hunan fellow-villager
Mao Zedong and Liu Shaoqi are both fellow Hunanese. (‘Mao Zedong and Liu Shaoqi’ is plural.)

(49) Tā bǎ nà ge píngguǒ dōua chī le.
He BA that CL apple DOUa eat ASP.
He ate all that apple. (‘He, that apple’ are singular.)

(50) Tāmen dōua chídào le yícì/liǎngcì.
They DOUa be-late ASP one-time/two-times
They are all late once/twice. (‘Once/twice’ is singular/plural.)

(51) Zhè jǐtiān tāmen tiāntiān dōua bǎ zǒngtǒngfǔ bāowéi zhe.
These several-days they day-day DOUa BA presidential-palace surround ASP

In the past few days, they have surrounded the presidential palace every day. (‘Surround’ is a collective predicate.)

(52) Zhèxiē rén dōua shì fūqī.
These person DOUa are couple
These people are all couples. (‘Be couples’ is a symmetric predicate.)

Sentences (47, 48) are not valid because they both denote a singular eventuality, and sentences (49–52) are valid because they all denote plural eventualities. Moreover, what ultimately determines whether the eventualities expressed in a dōua sentence are plural is not the nominal/topical elements before dōua or the predicates after dōua, but the predicates in the sentence and the noun components combined with them (see Lasersohn [21], Zhang [4], Yuan [22]). For example:

(53) a. *? Tā bǎ nà dī shuǐ dōua hē le.
He BA that CL water DOUa drink ASP.
He drank all the drop of water. (‘A person drinks a drop of water’ is hardly plural events)
b. Tā bǎ nà bēi shuǐ dōua hē le.
He BA that cup water DOUa drink ASP.
He drank all the cup of water. (‘People drink a glass of water’ can easily become plural events)
c. *? Zhè zhī wénzi bǎ nà dī shuǐ dōua hē le.
This CL mosquito BA that CL water DOUa drink ASP.
The mosquito drank all the drop of water. (‘A mosquito drinks a drop of water’ can easily become plural events)

(54) a. *? Zhè zhī dà gōngjī bǎ nà lì mǐfàn dōua chī le.
This CL big cock BA that grain rice DOUa eat ASP
The big cock ate all that grain of rice. (‘A big cock ate a grain of rice’ is hardly plural events)
b. Zhè zhī dà gōngjī bǎ nà wǎn mǐfàn dōua chī le.
This CL big cock BA that bowl rice DOUa eat ASP
The big cock ate all the bowl of rice. (‘A big cock ate a bowl of rice’ can easily become plural events)
c. Zhè zhī xiǎo mǎyǐ bǎ nà lì mǐfàn dōua chī le.
This CL small ant BA that grain rice DOUa eat ASP.
The small ant ate all that grain of rice. (‘A small ant ate a grain of rice’ can easily become plural events)

II.
The critical factor in determining the validity of a dōub sentence is that the possibility of occurrence or existence of the eventuality expressed in the sentence must be lower than the expectation (or normal) (see Zhong [6, 8]). Because a dōub sentence expresses an eventuality whose possibility is lower than the expectation (or normal), it can be imagined that the occurrence or existence of the eventuality

at the lowest point of the pragmatic scale is the most likely to become the content expressed by a dōub sentence, and its frequency of use is also the highest. For example:

(55) Zhōngguó rén lián sǐ dōub bú pà, hái pà kùnnán ma?
China people including death DOUb no afraid, yet afraid difficulty PAT
Chinese people are not afraid of death. Are they still afraid of difficulties?

(56) Měitiān cìhòu nǐ, yí jù hǎohuà dōub dé bú dào.
Everyday serve you, one sentence good-word DOUb gain not get
I can't get a good word when I serve you every day.

(57) Jiùjiù shì zhōngguótōng, bǐ shàngdì dōub gèng liǎojiě Zhōngguó rén.
Uncle is china-hand, than God DOUb more learn China people
My uncle is a China hand, and he knows Chinese people better than God.

In turn, we can also predict that dōub sentences are untenable when the possibility is higher than or equivalent to that of the expected (or normal) eventuality in a particular context, or when the eventuality has no possibility at all (unless it is exaggerated on purpose), because in these cases the possibility cannot be lower than the expectation (or normal). For example:

(58) * Àiyīnsītǎn bǐ xiǎoxuéshēng dōub cōngming.
Einstein than pupil DOUb smart
Einstein is even smarter than pupils.

(59) * Dōub xiàtiān le, zěnme hái bù xiàxuě ne?
DOUb summer ASP, how still not snow PAT
It’s already summer, why doesn’t it snow?

(60) a. * Tā lián jiéjiàrì dōub xiūxi.
He including holiday DOUb rest
He even rests on holidays.
b. Tā lián zhōumò dōub xiūxi.
He including weekend DOUb rest
He even rests at weekends.
c. Tā lián gōngzuòrì dōub xiūxi.
He including weekday DOUb rest
He even rests on weekdays.

(61) a. * Tā lián xiàshǔ dōub gǎn dézuì
He including subordinates DOUb dare offend
He even dares to offend his subordinates.
b. * Tā lián píngjítóngshì dōub gǎn dézuì
He including peers DOUb dare offend
He even dares to offend his peers.
c. Tā lián lǐngdǎo dōub gǎn dézuì
He including leaders DOUb dare offend
He even dares to offend his leaders.

d. Tā lián shàngdì dōub gǎn dézuì
He including God DOUb dare offend
He even dares to offend God.

(62) a. * Zhè wèi tóngxué lián yìfēn dōub néng kǎo dào.
This CL student including one-point DOUb enable exam get
This student even can get one point in an examination.
b. * Zhè wèi tóngxué lián liùshífēn dōub néng kǎo dào.
This CL student including sixty-points DOUb enable exam get
This student even can get sixty points in an examination.
c. Zhè wèi tóngxué lián bāshífēn dōub néng kǎo dào.
This CL student including eighty-points DOUb enable exam get.
This student even can get eighty points in an examination.
d. Zhè wèi tóngxué lián yìbǎifēn dōub néng kǎo dào.
This CL student including one-hundred-points DOUb enable exam get
This student even can get one hundred points in an examination.

(63) a. * Xiǎowáng bǐ tā bàbà dōub xiǎo.
Xiaowang than he dad DOUb young
Xiaowang is even younger than his dad.
b. * Xiǎowáng bǐ tā bàbà dōub dà.
Xiaowang than he dad DOUb old
Xiaowang is even older than his dad.
c. Xiǎowáng zhǎng de bǐ tā bàbà dōub lǎo.
Xiaowang look PAT than he dad DOUb old
Xiaowang looks even older than his dad. (exaggerated on purpose.)

‘To rest on holidays’, ‘Einstein is smarter than pupils’, ‘it doesn’t snow in summer’, ‘he dares to offend his subordinates’ and ‘one can get one point in an examination’ are at the high end of the pragmatic scales of possibility. ‘To rest at weekends’, ‘he dares to offend his peers’, ‘one can get sixty points in an examination’ and ‘Xiaowang is younger than his dad’ have a possibility equivalent or close to the expectation (or normal). And the possibility of ‘Xiaowang is older than his dad’ does not exist. So these dōub sentences are untenable. Normally ‘Xiaowang looks older than his dad’ is unlikely, but it can be used for exaggeration on purpose.
For most students, ‘can get sixty points’ is the expected or normal exam result, but for a student who is doing very badly at school (whose exam result is expected or normally to be thirty or forty points), the possibility is lower than the expectation (or normal), and in that case the dōub sentence is tenable. For example:

(64) Tā xiànzài dōub néng kǎo dào liùshífēn le, zhēnde jìnbù hěndà le.
He now DOUb enable exam get sixty-points ASP really progress very-large ASP

He can even get sixty points now; he has really made great progress.

Fourthly, in terms of the information content expressed by dōua, dōub and their sentences/constructions, dōua quantifies the eventualities and expresses neutral information, with a medium level of information, while dōub, as a CE discourse marker, judges the eventuality and expresses CE information, with a high level of information (for reasons of space we do not elaborate here; see Dahl [23] and Wu [24] for details).
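The tenability condition for dōub sentences discussed in this section can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the class Eventuality, the function doub_tenable and every numeric value are hypothetical and introduced here purely for illustration, since the analysis itself does not assign numbers to possibility or expectation.

```python
from dataclasses import dataclass


@dataclass
class Eventuality:
    """A described state of affairs with assumed, purely illustrative scores."""
    description: str
    likelihood: float          # hypothetical chance of the eventuality (0.0 to 1.0)
    expected: float            # hypothetical contextual expectation ("normal") level
    exaggerated: bool = False  # deliberate exaggeration by the speaker


def doub_tenable(e: Eventuality) -> bool:
    """Return True when the toy condition for a doub sentence is met."""
    if e.likelihood <= 0.0:
        # No possibility at all: acceptable only as purposeful exaggeration.
        return e.exaggerated
    # Otherwise the possibility must be strictly lower than the expectation.
    return e.likelihood < e.expected


# All figures below are invented for illustration only.
examples = [
    Eventuality("can get one point, cf. (62a)", 0.99, 0.5),
    Eventuality("can get sixty points, cf. (62b)", 0.5, 0.5),
    Eventuality("can get eighty points, cf. (62c)", 0.2, 0.5),
    Eventuality("is older than his dad, cf. (63b)", 0.0, 0.5),
]

for e in examples:
    print(f"{e.description}: {'tenable' if doub_tenable(e) else 'untenable'}")
```

With these assumed values, only the ‘eighty points’ case comes out as tenable, which mirrors the judgments in (62a–c) and (63b). Changing the expected value models a change of context, as with the weak student in (64).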

4 The Pragmatic Differences Between Doua and Doub, and Between Their Sentences/Constructions

The pragmatic functions of dōua and dōub, and of their sentences/constructions, are quite complicated. For convenience of presentation, we draw on the theory of three metafunctions in functional linguistics (see Halliday [25], Fraser [16]) and compare the pragmatic functions of dōua and dōub, and of their sentences/constructions, mainly in terms of the interpersonal function and the textual function.

4.1 The Interpersonal Function and Textual Function of Doua and Its Sentences/Constructions

The interpersonal function of dōua is relatively simple: it mainly conveys objective, quantitative information, and its informative intention and communicative intention are basically the same (see Sperber and Wilson [20]). The textual function of dōua sentences/constructions is context-free: they can form various associations with other sentences or stand as independent sentences.

4.2 The Interpersonal Function and Textual Function of Doub and Its Sentences/Constructions

As a CE discourse marker, dōub has a strong interpersonal, interactive function (see Zhong [6] for details). Dōub roughly encodes two layers of procedural meaning for its sentence: one is to introduce some kind of expectation (or normal); the other is to judge that the possibility of a state of affairs deviates from that expectation (or normal). It also gives a dōub sentence an emphatic meaning through pragmatic inference (see Zhong [8] for details). As we can see, adding dōub changes the original proposition from neutral information to CE information, and raises its informational value from medium to high. It places a procedural restriction on, or gives guidance for, understanding the relevant discourse, which helps the addressee obtain more information with less pragmatic effort. This is the Economy Principle at work at the discourse level. The informative intention of a dōub sentence is to state that the possibility of occurrence or existence of a state of affairs is lower than the expectation (or normal). But this is neither the chief purpose of a dōub sentence nor its communicative intention. Its chief purpose is to assist in expressing a variety of the speaker's emotions, such as exclamation, astonishment, blame, complaint and regret, through stating the state

of affairs deviating from the expectation (or normal), or to assist in expressing advice, reminders and so on. For example:

(65) Lián cǎogǎo dōub gōnggōngzhěngzhěng, zhēn shì nándé a!
Including draft DOUb neat-and-tidy, really is rare PAT
Even the draft is neat and tidy, it’s really rare!

(66) Nénggòu sīxiǎng duōme hǎo! Hǎo de wǒ dōub bù xiǎng shuìjiào.
Enable think how nice! Good PAT I DOUb not want sleep
How nice it is to be able to think! So good that I don’t even want to sleep.

(67) Jiùjiù shì zhōngguótōng, bǐ shàngdì dōub gèng liǎojiě Zhōngguó rén.
Uncle is china-hand, than God DOUb more learn China people
My uncle is a China hand, and he knows Chinese people better than God.

(68) Nǐ dōub bù zhīdào wèishénme guà, nà hái guà tā gànmá?
You DOUb not know why hang, then still hang it do-what?
If you don’t know why you hang it, why hang it up?

(69) Duōshǎo rén zuòmèng dōub mèng bù dào de hǎo shì, nǐ jìngrán hái tuīcí!
How-many people dream DOUb dream not get PAT good thing, you unexpectedly still refuse
This is a good thing that many people can’t even dream of, and you even refuse to do so!

(70) Zhè dōub shá shíhòu le, nǐ hái bù qǐchuáng?
This DOUb what time ASP, you still not get-up
What time is it, you still don’t get up?

From the corpora of dōub sentences, it is not difficult to find that dōub sentences have the communicative intention of assisting in expression: they provide evidence and support for the speaker’s attitude and assertion (i.e. ‘evidentiality’) and strengthen illocutionary force, mood and tone, as in examples (65–70). This is an effort to strengthen the speaker’s exclamation, astonishment, blame, complaint, regret, etc., and it embodies the communicative intention or meta-pragmatic awareness of a dōub sentence (see Verschueren [26]). Compare:

(71) a. Dōub dà gūniang le, yào zhùyì zhěngjié!
DOUb big girl ASP, should pay-attention tidiness
You are already a big girl, you should pay attention to tidiness!
b. Zhùyì zhěngjié!
Pay-attention tidiness
Pay attention to tidiness!

(72) a. Zhōngguó rén lián sǐ dōub bú pà, hái pà kùnnán ma?
China people including death DOUb not afraid, yet afraid difficulty PAT
Chinese people are not afraid of death. Are they still afraid of difficulties?
b. Hái pà kùnnán ma?
Yet afraid difficulty PAT
Are they still afraid of difficulties?

Obviously, compared with sentences (71b, 72b), which only say ‘Pay attention to tidiness!’ and ‘Are they still afraid of difficulties?’, sentences (71a, 72a) are much stronger in illocutionary force, mood and tone; they are more evidential, make it harder for the addressee to refuse, and show the communicative intention of reinforcing the speaker’s attitude and assertion. The procedural meaning of dōub as a CE marker is essentially a comparison (among different pragmatic scales of possibility, or between the state of affairs deviating from the expectation and the reality or result, etc.). But the comparison itself is not the chief purpose of a dōub sentence; its chief purpose is to show the speaker’s attitude and assertion and to provide relevant evidence, which strengthens illocutionary force and achieves the best communicative effect. Therefore, in a relatively complete discourse or context, dōub is normally used in a sentence that helps to declare the speaker’s attitude and assertion, and is seldom used in an independent sentence, as in examples (65–72) above. Thus, whether the targets of comparison are explicit or implicit, and whether the sentences that declare the attitude and assertion are explicit or implicit, a dōub sentence in a specific discourse must have very strong context dependency and textual relevance. The textual function of dōub and its sentences/constructions is therefore context-dependent. Due to the procedural meaning and pragmatic features of dōub, the main types of textual relevance of dōub sentences are progressive, adversative, certificative, causal, suppositional, etc., and dōub sentences are usually used in spoken discourse (examples are omitted; see Zhong [6] for details).

5 Epilogue

All scholars who study the adverb dōu must first consider two closely related issues: how to classify the semantic and pragmatic functions of the adverb dōu, and how to define them. The solution of these two issues is the fundamental basis for other related studies of the adverb dōu. Ignoring them often leads to the situation of the blind men and the elephant, each with a different experience. However, solving these two issues involves a chicken-and-egg difficulty. If the semantic and pragmatic functions were clearly defined, the classification would be easy. However, the semantic and pragmatic functions of dōua (dōu1) and dōub (dōu2, dōu3) are too complex to define. If the classification were clear, it could eliminate tangles between sub-dōus. It would also be easy to reveal the internal consistency of the different contextual usages of the same sub-dōu and to define the semantic and pragmatic functions of the sub-dōus precisely. However, it is not easy to classify well when the semantic and pragmatic functions are not clearly defined. Classification only requires finding a few distinctive features; there is no need to define the semantic and pragmatic functions of dōu comprehensively and accurately. Taking the easier task before the harder one, Zhong [11] extracted the distinctive features and made it clear that the adverb dōu divides into dōua and dōub.

This paper shows that there are systematic differences in syntax, semantics and pragmatics between dōua and dōub. By clarifying these differences, we can fully establish a consensus on the dichotomy of the adverb dōu.

Acknowledgments. The study is supported by the Social Science Foundation of Fujian Province (FJ2020B130).

References

1. Paris, M.C.: Lián…Yě/Dōu in Mandarin Chinese. Linguist. Abroad. 3, 50–55 (1981)
2. Wang, H.: Analysis of grammar meaning of adverb Dōu. Chin. Learn. 6, 55–60 (1999). (In Chinese)
3. Jiang, J.: Evolution and classification of totalizing. Chin. Learn. 4, 72–76 (2003). (In Chinese)
4. Zhang, Y.S.: Grammaticalization and subjectivization of adverbs Dōu. J. Xuzhou Norm. Univ. 3, 56–62 (2005). (In Chinese)
5. Xu, L.J.: Similarities and differences of Shanghai Dialect Chai and Mandarin Dōu. Dialects 2, 97–102 (2007). (In Chinese)
6. Zhong, H.: [Doub(Dou2, Dou3)] as a counter-expectation discourse-marker: on the pragmatic functions of Doub from the perspective of discourse analysis. In: CLSW 2015, LNAI, vol. 9332, pp. 392–407 (2015)
7. Zhong, H.: On the quantification of events in Dōua construction. In: CLSW 2017, LNAI, vol. 10709, pp. 41–60 (2018a)
8. Zhong, H.: The conventional implicature of Dōub: on Semantics of Dōub from the Perspective of Discourse Analysis. In: CLSW 2018, LNAI, vol. 11173, pp. 44–60 (2018b)
9. Zhong, H.: The distributive-index function of interrogative pronouns in dōu sentences. Lang. Teach. Ling. Stud. 4, 79–90 (2021). (In Chinese)
10. Zhong, H.: Comprehensively seeking differences, seek common ground deeply. Linguist. Sci. 4, 383–401 (2021). (In Chinese)
11. Zhong, H.: The centennial controversy: How to classify the Chinese adverb dōu. In: CLSW 2019, LNAI, vol. 11831, pp. 63–73 (2020)
12. Li, W.S.: On the semantic complexity of Dōu in Mandarin: a partially unified account. Chin. Teach. 3, 319–330 (2013). (In Chinese)
13. Xu, L.J.: Is Dōu a universal quantifier? Stud. Chin. Lang. 6, 498–507 (2014). (In Chinese)
14. Shen, J.X.: Leftward or rightward? The quantifying of Dōu. Stud. Chin. Lang. 1, 3–17 (2015). (In Chinese)
15. Zhang, J.J.: A unified account on the semantics of dou: maximal quantity of event. Lang. Teach. Ling. Stud. 1, 55–66 (2021). (In Chinese)
16. Fraser, B.: Towards a theory of discourse markers. In: Fischer, K. (ed.) Approaches to Discourse Particles, pp. 189–204. Elsevier Ltd, Amsterdam (2006)
17. Halliday, M.A.K.: An Introduction to Functional Grammar, 3rd edn. Edward Arnold, London (2004)
18. Blakemore, D.: Understanding Utterances. Basil Blackwell, Oxford (1992)
19. Blakemore, D.: Relevance and Linguistic Meaning: The Semantics and Pragmatics of Discourse Markers. Cambridge University Press, New York (2002)
20. Sperber, D., Wilson, D.: Relevance: Communication and Cognition, 2nd edn, pp. 58–61. Blackwell, Oxford (1995)
21. Lasersohn, P.: Plurality, Conjunction and Events. Kluwer, Dordrecht (1995)

22. Yuan, Y.L.: The summative function of Dōu and its distributive effect. Contemp. Linguis. 4, 289–304 (2005). (In Chinese)
23. Dahl, Ö.: Grammaticalization and the Life Cycles of Construction, p. 27. Ms. Stockholm University (2000)
24. Wu, F.: On the pragmatic function of the construction X bu-bi YZ. Chin. Lang. 3, 222–231 (2004). (In Chinese)
25. Halliday, M.A.K.: An Introduction to Functional Grammar, 2nd edn. Edward Arnold, London (1994)
26. Verschueren, J.: Understanding Pragmatics. Edward Arnold, London (1999)

From Falling to Hitting: Diachronic Change and Synchronic Distribution of Frost Verbs in Chinese

Sicong Dong1(✉) and Chu-Ren Huang2

1 School of Humanities and Social Sciences, Harbin Institute of Technology, Shenzhen, China
2 Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom, Hong Kong

Abstract. The verbs indicating the occurrence of frost in Chinese have undergone a diachronic change. Ancient Chinese chiefly uses non-volitional verbs with downward movement meanings, while Sinitic languages widely adopt 打 dǎ ‘to hit’, an action verb with high transitivity. This modern usage develops from the transitive verb 打 dǎ ‘to hit’ denoting frost damage in ancient Chinese through conventionalization and semantic bleaching. Speakers of Chinese using this verb have experienced frost distinctively, which leads to the linguistic innovation. The geographical distribution of frost verbs and frost damage provides clues to this relation between weather and language.

Keywords: Weather verb · Frost · Geolinguistics · Diachronic change · Sinitic languages

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 22–30, 2022. https://doi.org/10.1007/978-3-031-06703-7_2

1 Introduction

How language encodes weather has received increasing attention in recent years from typological perspectives. Most studies, within the framework of Eriksen et al. [1, 2], investigated whether certain weather events are encoded as predicates or arguments in different languages. Huang et al. [3], in addition, showed that typological research on weather expressions can be expanded to the diversity of verbs that collocate with arguments carrying meteorological meanings, such as Mandarin 下 xià ‘to fall’ in 下雨 xiàyǔ ‘to rain’ and 起 qǐ ‘to rise’ in 起雾 qǐwù ‘to fog’. This paper will look into such verbs in expressions indicating the occurrence of frost, henceforth frost verbs, in Sinitic languages and ancient Chinese.

Frost verbs behave quite differently from other Chinese weather verbs describing similar weather phenomena [4–7]. For example, frost verbs tend to be non-directional in Sinitic languages, while other condensed atmospheric water, such as fog and dew, tend to adopt verbs with downward meanings. This seems more interesting given that frost is said to move downwards in Archaic Chinese [8, 9], and tends to ‘fall’ crosslinguistically [10]. Moreover, frost verbs in Sinitic languages, such as 打 dǎ ‘to hit’ in

打霜 dǎshuāng ‘to frost’, can be used as transitive verbs to denote the extreme weather phenomenon of frost damaging crops [11], but other weather verbs are almost always intransitive. For such behaviour of frost verbs, Huang et al. [3] provided an account using the mass and speed involved in a weather phenomenon: weather events with bigger weather substances and faster weather processes tend to select action verbs with high transitivity or high kinesis. Frost is heavier than fog and dew, and is thus inclined to be expressed by the action verb 打 dǎ ‘to hit’. Based on previous research, we want to further address the following questions:

a) Is the relation between directional verbs and action verbs a diachronic change of frost verbs? If yes, why would it happen?
b) How are frost verbs in Sinitic languages distributed geographically? Can the distribution provide any clue to the use of action verbs?

2 Frost Verbs in Ancient Chinese

In this section, we will look into how frost is expressed in ancient Chinese, in order to check whether the use of the action verb is a result of diachronic change. Ren and Dong [8] examined frost verbs used from oracle bone inscriptions to the Southern and Northern dynasties, i.e., in Archaic and early Middle Chinese. According to their investigation, frost verbs in that period, such as 降 jiàng, 陨 yǔn and 下 xià, predominantly have the meaning of moving downwards. To draw a whole picture of frost verbs in ancient Chinese, therefore, we need to investigate their usage after the Southern and Northern dynasties (420 A.D.–589 A.D.).

We examined the classical texts from the Sui-Tang to Qing dynasties, i.e., from 581 A.D. to 1911 A.D., in the database of the Chinese Text Project (accessed at https://ctext.org), and found frost verbs in the following works: Yiwen Leiju (艺文类聚), Tongdian (通典), Quan Tangshi (全唐诗), Annotations to Classic of Filial Piety (孝经注疏), Taiping Yulan (太平御览), Taiping Guangji (太平广记), Zhuzi Yulei (朱子语类), Journey to the West (西游记) and Jin Ping Mei (金瓶梅). Such verbs can be grouped into three major types, based on their meaning, as shown below. Among these verbs, the ones of the first type are predominantly used to indicate the occurrence of frost.

a) Verbs meaning to fall or to drop, e.g., 降 jiàng, 落 luò, 下 xià, 陨/殒 yǔn, 零 líng. See (1)–(3).
b) Verbs meaning to condense, e.g., 凝 níng, 结 jié. See (4).
c) Verbs meaning to exist or to appear, e.g., 有 yǒu, 出 chū. See (5).

(1) 五月草始生, 八月霜雪降。
wǔ yuè cǎo shǐ shēng bā yuè shuāng xuě jiàng
fifth month grass begin grow eighth month frost snow fall
‘Grass begins to grow in May, while frost and snow appear in August.’ (Dangxiang 党项, in Tongdian)


(2) 庐山顶上有一池, 水池中有三石雁, 霜落则飞。
lúshān dǐng shàng yǒu yī chí shuǐchí zhōng yǒu sān shí yàn shuāng luò zé fēi
Mount-Lu summit up exist one pond pond middle exist three stone wild-goose frost fall then fly
‘There are three stone wild geese in a pond on top of Mount Lu. They would fly when frost forms.’ (Mount Lu 庐山, in Taiping Yulan)

(3) 此桃霜下始花, 隆冬可熟。
cǐ táo shuāng xià shǐ huā lóng dōng kě shú
this peach frost fall begin bloom deep winter can mature
‘This type of peach begins to bloom when frost forms, and will be ripe in the dead of winter.’ (Melon 瓜, in Taiping Guangji)

(4) 薤露落而暮田寒, 玄霜凝而垄草白。
xiè lù luò ér mù tián hán xuán shuāng níng ér lǒng cǎo bái
scallion dew fall then dusk field cold thick frost condense then ridge-in-field grass white
‘The field at dusk turns cold when dew forms on scallions, while the grass on the ridges in the field turns white when thick frost is condensed.’ (Emperor Wu of Chen 陈武帝, in Yiwen Leiju)

(5) 若有霜雪, 必有灾异。
ruò yǒu shuāng xuě bì yǒu zāi yì
if exist frost snow surely exist disaster oddity
‘When there are frost and snow, there must be disasters and strange occurrences.’ (Wheat 麦, in Taiping Yulan)

Apart from the verbs listed above, some other verbs can also be found, but only in rhetorical language, such as the poems in Quan Tangshi. They include verbs with downward movement meanings, e.g., 沉 chén ‘to sink’, 坠 zhuì ‘to drop’, 雨 yù ‘to fall like rain’, 垂 chuí ‘to droop’, and others such as 来 lái ‘to come’, 飞 fēi ‘to drift’ and 起 qǐ ‘to rise’. In addition, the action verb 打 dǎ ‘to hit’ co-occurs with 霜 shuāng ‘frost’ in four of the Tang poems, as shown in (6)–(9).
(6) 浓霜打叶落地声, 南溪石泉细泠泠。
nóng shuāng dǎ yè luò dì shēng nán xī shí quán xì línglíng
thick frost hit leaf fall ground sound south creek stone spring tiny cool
‘Thick frost causes leaves to fall to the ground, making sounds, while the cool spring streams slowly through the stones along the south creek.’ (Autumn dusk at a temple 洞宫秋夕, in Quan Tangshi)

(7) 觅句唯顽坐, 严霜打不知。
mì jù wéi wán zuò yán shuāng dǎ bù zhī
seek sentence only firmly sit severe frost hit NEG know
‘(Jia Kuang) sits firmly, composing lines of a poem, unaware of the severe frost formed on him.’ (Missing Jia Kuang from Kuangshan 思匡山贾匡, in Quan Tangshi)


(8) 霜打汀岛赤, 孤烟生池塘。
shuāng dǎ tīng dǎo chì gū yān shēng chítáng
frost hit islet island bare sole smoke generate pond
‘Frost makes the islets bare of life, while a line of smoke rises from a pond.’ (Missing Shilang Wang Qi at night Ganxiaoting 干霄亭晚望怀王棨侍郎, in Quan Tangshi)

(9) 自怜酷似随阳雁, 霜打风飘到日边。
zì lián kù sì suí yáng yàn shuāng dǎ fēng piāo dào rì biān
self pity extremely resemble follow sun wild-goose frost hit wind float arrive sun side
‘I pity myself as a man like a wild goose following the sun, suffering from frost and wind all the way.’ (Presenting to Shijun Lu 上卢使君, in Quan Tangshi)

It is obvious, however, that the above instances do not simply describe the occurrence of frost. Instead, the focus is on the damage caused by frost. In other words, 打 dǎ ‘to hit’ is not a typical frost verb here, but a transitive action verb denoting frost inflicting damage on plants and humans. In fact, 霜 shuāng ‘frost’ is often used as the subject of a transitive action verb in ancient Chinese. See (10), accessed via CBETA (https://cbetaonline.dila.edu.tw/), and (11), for examples with 伤 shāng ‘to hurt’ and 杀 shā ‘to kill’. This usage and frame reflect the original meaning of 霜 shuāng: a substance that makes things perish and brings farming to completion [5]. Moreover, the use of these transitive verbs is correlated with speakers’ experience of, and the impact of, meteorological events, in this particular case the damage done to crops or other plants by frost. This is predicted by the theory of Huang et al. [3], namely, that weather events of high kinesis (big mass and/or fast speed) are correlated with verbs of high kinesis (transitive action verbs). Similarly, 打 dǎ ‘to hit’ is used as a transitive verb in many Sinitic languages to describe lightning strikes [11], also a meteorological disaster, which further supports the claim.
(10) 卒有雹霜伤杀谷实
cù yǒu báo shuāng shāng shā gǔ shí
suddenly exist hail frost hurt kill grain seed
‘Suddenly hail and frost damaged the grain.’ (Xiuxing Daodijing 修行道地经)

(11) 昨夜霜一降, 杀君庭中槐。
zuò yè shuāng yī jiàng shā jūn tíng zhōng huái
yesterday night frost once fall kill you courtyard middle pagoda-tree
‘Frost formed last night, and damaged the pagoda trees in your courtyard.’ (Advising my friend 谕友, in Quan Tangshi)

Together with the findings of Ren and Dong [8], it can thus be seen that non-volitional verbs meaning moving downwards are the norm for frost verbs in ancient Chinese, and that expressions using the action verb with high transitivity, i.e., 打 dǎ ‘to hit’, in modern Sinitic languages should be a linguistic innovation. However, the verb 打 dǎ ‘to hit’, though not used as a typical frost verb in ancient Chinese, does co-occur with 霜 shuāng ‘frost’ to denote the damage caused by this weather phenomenon. The reason for this diachronic change, as we demonstrate in Sect. 3, can be better explored on the basis of the geographical distribution of frost verbs.


3 Frost Verbs and Frost Damage: Evidence from Geographical Distribution

In dialectology and geolinguistics, the synchronic distribution of linguistic forms is considered valuable for the enquiry into diachronic changes and variations. Huang et al. [3] made a preliminary exploration of the geographical distribution of frost verbs in Sinitic languages, based on 98 languages/dialects. However, the data points in northern China are not dense enough to depict the general distribution. In other words, among the scattered locations sharing the same frost verb, there may be locations using other verbs. Therefore, it is necessary to examine frost verbs in more locations, especially in northern China, to obtain a more detailed distribution.

We collected the use of frost verbs in 163 languages/dialects. The data were collected by consulting dictionaries and enquiring of native speakers. The dictionaries we consulted are Li [12], Xu and Miyata [13], Tao [14] and Zhang and Mo [15]. For data points not included in the dictionaries, we consulted native speakers individually via the social media application WeChat. All the collected data were then organized, and the data points were grouped into four categories according to the frost verbs used:
a) locations using the action verb with high transitivity, namely, 打 dǎ ‘to hit’;
b) locations using verbs meaning to fall, including 下 xià, 落 luò and 降 jiàng;
c) locations using both of the above types of verbs;
d) locations using other verbs.
Finally, we uploaded the data to ArcGIS Online by Esri (accessed at https://www.arcgis.com/), and generated a map of frost verbs, as shown in Fig. 1.
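The four-way grouping of data points described above can be sketched in code. This is only an illustrative sketch, not the authors' actual procedure: the names `FrostVerbCategory`, `HIT_VERBS`, `FALL_VERBS` and `categorize` are hypothetical, and the treatment of a location that uses 打 dǎ alongside some verb outside the 'fall' group (here counted as a 'hit' location) is an assumption that the text does not spell out.

```python
from enum import Enum


class FrostVerbCategory(Enum):
    """Hypothetical labels for the four groups of data points."""
    HIT = "hit"        # action verb 打 dǎ 'to hit'
    FALL = "fall"      # verbs meaning to fall: 下 xià, 落 luò, 降 jiàng
    BOTH = "hit+fall"  # both of the above types attested
    OTHER = "other"    # any other frost verb


HIT_VERBS = {"打"}
FALL_VERBS = {"下", "落", "降"}


def categorize(verbs):
    """Assign one data point to a category from its attested frost verbs.

    Assumption: verbs outside HIT_VERBS and FALL_VERBS only matter when
    neither a 'hit' nor a 'fall' verb is attested at the location.
    """
    attested = set(verbs)
    has_hit = bool(attested & HIT_VERBS)
    has_fall = bool(attested & FALL_VERBS)
    if has_hit and has_fall:
        return FrostVerbCategory.BOTH
    if has_hit:
        return FrostVerbCategory.HIT
    if has_fall:
        return FrostVerbCategory.FALL
    return FrostVerbCategory.OTHER
```

For instance, a location whose sources record both 打 and 下 would be placed in the mixed group, which is the group that matters for tracing the boundary between the two areas on a map.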

Fig. 1. Distribution of frost verbs in Sinitic languages


The pattern in Fig. 1 is straightforward. The data points using 打 dǎ ‘to hit’ are located in the central area of China, roughly along the Yangtze River, while the data points using verbs meaning to fall surround this area: they are located in the north as well as in the south-eastern coastal region of China. This pattern conforms to the centre-periphery structure of concentric distribution [16] in geolinguistics. That is, new forms are used and diffused in the centre, while old forms remain in the periphery. Normally, the central area in a concentric distribution is the political, economic and cultural centre of a country, such as Kyoto in ancient Japan [17]. The area of 打 dǎ ‘to hit’, however, has not been such a centre of China. Rather, the linguistic innovation in this area is due to the fact that speakers there have experienced frost differently. According to research on frost in China [18, 19], frost in the area of 打 dǎ ‘to hit’ differs meteorologically from frost in the areas using verbs with downward meanings in two important respects. First, the periods of frost days, i.e., from the first to the last frost date of a year, differ in length. Frost days in the area of 打 dǎ ‘to hit’ number roughly between 50 and 150, while most of the areas of downward verbs have either fewer than 15 or more than 150 frost days; see Fig. 2(c) in Xu et al. [18]. Second, the area of 打 dǎ ‘to hit’ roughly coincides with the regions suffering from frost damage to crops; see Fig. 2 below, which shows the isogloss of 打 dǎ ‘to hit’ based on Fig. 1 and the area with frost damage based on data in Feng et al. [19].

Fig. 2. Distribution of frost expressions with 打 dǎ ‘to hit’ and frost damage area


Combining the above two meteorological differences, we propose that the linguistic innovation at issue is motivated by the meteorological behaviour of frost in the central area, and that the usage of 打 dǎ ‘to hit’ as a frost verb originates from its transitive usage with 霜 shuāng ‘frost’ in ancient Chinese. Specifically, 打 dǎ ‘to hit’ is first used to denote the damage inflicted by frost in ancient Chinese, as shown in (6)–(9). Such a usage has a higher frequency in the central area than in other regions, as this area regularly suffers from frost damage. Importantly, frost days in the central area do not last very long, but once frost appears, severe damage is highly likely to accompany it, so frost can be prominently perceived as a disastrous weather phenomenon in this area. The highly frequent use of 打 dǎ ‘to hit’, in turn, leads to the conventionalization and semantic bleaching of this verb when it co-occurs with the frost noun. That is, frequent use allows 打 dǎ ‘to hit’ to serve both as a generic verb for frost damage events and as a weather verb for frost, so it starts to compete with frost verbs with downward movement meanings, and eventually replaces the earlier frost verbs in the central area, completing the process of this linguistic innovation. By contrast, frost is frequently seen in northern China and rarely seen in the south-eastern coastal region, and both areas lie outside the region with frequent frost damage. They thus lack the motive for linguistic innovation, and the old forms are kept.

Our proposal accounts for the expressions of frost damage in Sinitic languages. According to the investigation of Dong et al. [10], most of the recorded forms for frost damage in Sinitic languages are 霜打 shuāngdǎ frost-hit, while the languages spoken in Yinchuan and Lanzhou adopt 霜杀 shuāngshā frost-kill.
This distribution should be due to the fact that 霜打 shuāngdǎ ‘frost inflicts damage to’ has been conventionalized in the central area, and verbs such as 杀 shā ‘to kill’, which were also used with 霜 shuāng ‘frost’ in ancient Chinese, can only be found outside the area regularly suffering from frost damage, such as Yinchuan and Lanzhou. Note that when the two usages of 打 dǎ ‘to hit’, namely, to frost and to inflict damage, co-exist in one Sinitic language, e.g., Shanghai Wu, the occurrence of frost is 打霜 dǎshuāng, while the occurrence of frost damage is 霜打 shuāngdǎ.

Another piece of evidence supporting our proposed account is the idioms concerning frost-affected vegetables in Chinese, as shown in (12). Other verbs for frost, either frost verbs such as 下 xià ‘to fall’ and 落 luò ‘to fall’, or verbs denoting frost damage such as 伤 shāng ‘to hurt’ and 杀 shā ‘to kill’, cannot have this kind of conventionalized usage.

(12) a. 霜打萝卜赛人参
shuāng dǎ luóbo sài rénshēn
frost hit daikon surpass ginseng
‘The daikon is even better than ginseng after frost damage.’
b. 霜打的茄子——蔫了
shuāng dǎ de qiézi niān le
frost hit DE aubergine wither PFV
‘(Someone is dispirited like) the withered aubergine after frost damage.’


c. 霜打青菜分外甜
shuāng dǎ qīngcài fènwài tián
frost hit bok-choy particularly sweet
‘The bok choy is particularly sweet after frost damage.’

4 Conclusion

Our investigation of classical texts shows that the occurrence of frost is predominantly expressed as ‘falling’ in ancient Chinese, and that the wide use of 打 dǎ ‘to hit’ as a frost verb in modern Sinitic languages is a result of diachronic change. Based on our data on geographical distribution, we find that the region suffering from frost damage is roughly where the new frost verb is used. The severity of frost and the damage it causes are argued to be cognitively salient, playing an important part in the perception of frost. Such distinctive experience and perception lead to the adoption of 打 dǎ ‘to hit’, which has been conventionalized and semantically bleached from its transitive usage denoting frost damage in ancient Chinese. Our study corroborates previous hypotheses on how weather shapes Sinitic languages and also provides an unusual case of concentric distribution, in which the central area using new linguistic forms is not the economic or cultural centre.

Acknowledgement. We would like to express our gratitude to Dr. He Huang for his helpful comments.

References

1. Eriksen, P., Kittilä, S., Kolehmainen, L.: Linguistics of weather: cross-linguistic patterns of meteorological expressions. Stud. Lang. 34(3), 565–601 (2010). https://doi.org/10.1075/sl.34.3.03eri
2. Eriksen, P., Kittilä, S., Kolehmainen, L.: Weather and language. Lang. Linguist. Compass 6(6), 383–402 (2012). https://doi.org/10.1002/lnc3.341
3. Huang, C.-R., Dong, S., Yang, Y., Ren, H.: From language to meteorology: kinesis in weather events and weather verbs across Sinitic languages. Hum. Soc. Sci. Commun. 8, 4 (2021). https://doi.org/10.1057/s41599-020-00682-w
4. Dong, S., Yang, Y., Huang, C.-R., Ren, H.: Directionality and momentum of water in weather: a morphosemantic study of conceptualisation based on Hantology. In: Hong, J.-F., Zhang, Y., Liu, P. (eds.) CLSW 2019. LNCS (LNAI), vol. 11831, pp. 575–584. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38189-9_59
5. Huang, C.-R., Dong, S.: From lexical semantics to traditional ecological knowledge: on precipitation, condensation and suspension expressions in Chinese. In: Hong, J.-F., Zhang, Y., Liu, P. (eds.) CLSW 2019. LNCS (LNAI), vol. 11831, pp. 255–264. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38189-9_27
6. Dong, S.: A study on meteorological words in Chinese. Postdoctoral report, Peking University (2019). https://doi.org/10.13140/rg.2.2.22648.29445. (in Chinese)
7. Dong, S., Yang, Y., Ren, H., Huang, C.-R.: Directionality of atmospheric water in Chinese: a lexical semantic study based on linguistic ontology. SAGE Open 11(1), 1–13 (2021). https://doi.org/10.1177/2158244020988293
8. Ren, H., Dong, S.: Ways to encode meteorological events in Classical Chinese: from a typological perspective. Bull. Linguist. Stud. 27, 265–283 (2021). (in Chinese)
9. Ren, H.: “Noun-Verb” Conversion in Archaic Chinese: from the Perspective of Lexical-Semantic Analysis. China Social Sciences Press, Beijing (2020). (in Chinese)
10. Dong, S., Huang, C.-R., Ren, H.: Towards a new typology of meteorological events: a study based on synchronic and diachronic data. Lingua 247, 102894 (2020). https://doi.org/10.1016/j.lingua.2020.102894
11. Dong, S., Xu, J., Huang, C.-R.: Angry thunder and vicious frost: remarks on the unaccusativity of Chinese weather verbs. In: Liu, M., Kit, C., Su, Q. (eds.) CLSW 2020. LNCS (LNAI), vol. 12278, pp. 64–73. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81197-6_6
12. Li, R. (ed.): Great Dictionary of Modern Chinese Dialects (42 Volumes). Jiangsu Education Publishing House, Nanjing (1993–2003). (in Chinese)
13. Xu, B., Miyata, I. (eds.): A Comprehensive Dictionary of Chinese Dialects. Zhonghua Book Company, Beijing (1999). (in Chinese)
14. Tao, G. (ed.): Dictionary of Nantong Dialect. Jiangsu People’s Publishing Ltd, Nanjing (2007). (in Chinese)
15. Zhang, W., Mo, C.: Dictionary of Lanzhou Dialect. China Social Sciences Press, Beijing (2009). (in Chinese)
16. Yanagita, K.: A Study of Variants of Escargot. Toko Shoin, Tokyo (1930). (in Japanese)
17. Ang, U.: Theories of diffusion and classification in spatial development of language. Lang. Linguist. 16(5), 639–661 (2015). https://doi.org/10.1177/1606822X15583250
18. Xu, Y., Wang, G., Wang, P.: Climatic change of frost in China in recent 50 years. Scientia Meteorologica Sinica 29(4), 427–433 (2009). (in Chinese)
19. Feng, Y., He, W., Sun, Z., Zhong, X.: Climatological study on frost damage of winter wheat in China. Acta Agron. Sin. 25(3), 335–340 (1999). (in Chinese)

The Senses of Mandarin Deadjectival Verbs

Xiaoqian Zhang
Institute of Linguistics, Chinese Academy of Social Sciences, Beijing, China

Abstract. This work focuses on the senses of deadjectival verbs in Mandarin. We argue that the change-of-state reading of deadjectival verbs is a basic reading, rather than a derived one. Building on the interpretations of different time phrases in combination with deadjectival verbs, we propose that they belong to the aspectual class of achievements. Deadjectival verbs in Mandarin are divided into two classes: open scale-type verbs are derived from adjectival bases that are associated with an open property scale and their basic sense is [become Adjective-er]; closed scale-type verbs are derived from adjectival bases that are associated with a closed property scale and their basic sense is [become Adjective].

Keywords: Deadjectival verbs · Coercion · Time phrases · Become A(-er)

1 Introduction

Mandarin has a class of words that function both as predicative adjectives and as degree achievement verbs [1]. Typical examples are liang ‘cool’, di ‘low’, gui ‘expensive’, etc. Take liang ‘cool’ for instance. When it is used as a predicative adjective, it expresses a stative reading, e.g. (1); when it is used as a degree achievement verb, it expresses a change-of-state reading, e.g. (2)–(4). (Abbreviations used throughout the glosses: CLF = classifier; DEM = demonstrative; MODE = mood; PFV = perfective; PTCL = particle.)

(1) Tang tebie liang.
soup extremely cool
‘The soup is extremely cool.’
(2) Tang liang-le, keyi he le.
soup cool-PFV MODE drink PTCL
‘The soup cooled. You can drink it.’
(3) Tang gang liang, keyi he le.
soup just cool MODE drink PTCL
‘The soup just cooled. You can drink it.’
(4) Tang mashang liang, deng-deng zai he.
soup soon cool wait-wait then drink
‘The soup will cool soon. Drink it later.’

© Springer Nature Switzerland AG 2022
M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 31–41, 2022. https://doi.org/10.1007/978-3-031-06703-7_3


Scholars are interested in the relation between the stative reading and the change-of-state reading of this class of words. Two hypotheses exist in the literature. The first assumes that the stative reading of liang ‘cool’ is basic, i.e. liang ‘cool’ belongs to the aspectual class of states. When the perfective verbal suffix -le or a temporal adverb like gang ‘just’ or mashang ‘soon’ occurs, liang ‘cool’ has to be re-analysed via an implicit coercion operator, which gives rise to a change-of-state reading, cf. [2–9]. By contrast, the second hypothesis assumes that no derivational relation exists between the stative reading and the change-of-state reading, both of which are basic. In other words, liang ‘cool’ has two independent lexical entries: it is either a predicative adjective or a degree achievement verb, more precisely a deadjectival verb. The event structure templates of the two lexical entries are provided in (5).

(5) Predicative adjective: [x <STATE>]
Deadjectival verb: [BECOME [x <STATE>]]

Although the coercion hypothesis avoids the redundancy of two lexical entries in the lexicon, we will point out the limits of this analysis in Sect. 2, and argue in favour of the hypothesis of double lexical entries. In Sect. 3, we will provide a fine-grained analysis of the senses of deadjectival verbs before concluding in Sect. 4.

2 Against the Coercion Hypothesis

If the coercion hypothesis presented in Sect. 1 is on the right track, the following two assumptions must hold. First, verbal -le should trigger a change-of-state reading when co-occurring with any predicative adjective; if only some predicative adjectives can express a change-of-state reading, the hypothesis seems rather ad hoc. Second, temporal adverbs like gang ‘just’ and mashang ‘soon’ should express a change-of-state reading only when modifying predicative adjectives, and not with other aspectual classes of predicates such as activities, accomplishments and achievements. Otherwise, it makes no sense to treat liang ‘cool’ as a state, for it is impossible to exclude the possibility that it belongs to another aspectual class.

To begin with, predicative adjectives are not all equally compatible with verbal -le. According to [10], there are two classes of Mandarin predicative adjectives: the first includes adjectives of base form, while the second comprises adjectives of complex form such as reduplicated forms. However, neither class uniformly yields a change-of-state reading when combined with verbal -le, as shown by examples (6)–(13). Note that examples (6)–(9) present adjectives in their base form, while examples (10)–(13) include reduplicated adjectives.

(6) gaoxing-le
happy-PFV
‘become happy’
(7) shuangkuai-le
comfortable-PFV
‘become comfortable’
(8) * zixi-le
careful-PFV
(9) * shenmi-le
mysterious-PFV
(10) yun-yun-hu-hu-le
dizzy-PFV
‘become dizzy’
(11) * gao-gao-le
tall-tall-PFV
(12) * shuang-shuang-kuai-kuai-le
comfortable-PFV
(13) * wei-wei-qu-qu-le
grieved-PFV

As far as the distribution of temporal adverbs like gang ‘just’ and mashang ‘soon’ is concerned, the situation is even more troublesome. Examples (14)–(17) present a predicative adjective liang ‘cool’, an activity ting yinyue ‘listen to music’, an accomplishment kan na ben shu ‘read that book’ and an achievement ying ‘win’, respectively. Despite their different aspectual classes, they can all co-occur with the temporal adverb gang, which highlights the initial point of the eventuality denoted by the predicate and expresses a change-of-state reading.

(14) Tang gang liang.
soup just cool
‘The soup just cooled.’
(15) Lisi gang ting yinyue.
Lisi just listen music
‘Lisi started to listen to music.’
(16) Lisi gang kan na ben shu.
Lisi just read DEM CLF book
‘Lisi just began that book.’
(17) Lisi gang ying.
Lisi just win
‘Lisi just won.’

The same applies to the distribution of mashang ‘soon’, which is clearly compatible with all aspectual classes of predicates and expresses a reading according to which the eventuality denoted by the predicate will start, cf. (18)–(21).

(18) Tang mashang liang.
soup soon cool
‘The soup will soon cool.’
(19) Lisi mashang ting yinyue.
Lisi soon listen music
‘Lisi will soon listen to music.’
(20) Lisi mashang kan na ben shu.
Lisi soon read DEM CLF book
‘Lisi will soon read that book.’
(21) Lisi mashang hui ying.
Lisi soon MODE win
‘Lisi will soon win.’


In a nutshell, we have argued that predicative adjectives do not all express a change-of-state reading in the presence of verbal -le, and temporal adverbs such as gang ‘just’ and mashang ‘soon’ can always induce a change-of-state reading regardless of the aspectual classes of predicates they modify. These two pieces of evidence clearly challenge the coercion hypothesis. We are thus led to adopt the proposal of double lexical entries according to which words such as liang ‘cool’ have two separate lexical entries in the lexicon: (i) as predicative adjectives with a stative use and (ii) as deadjectival verbs with a change-of-state use. The next section will speciﬁcally concentrate on the second.

3 Deadjectival Verbs in Mandarin

3.1 Time Phrases as Diagnostic Test

In the linguistic literature on aspect, time phrases are often used as a classical test to investigate the aspectual classes of the predicates they modify, given that they express contrasting readings when co-occurring with different aspectual classes of predicates. Take in-adverbials and for-adverbials in English, for instance. When in-adverbials co-occur with states in the past tense, they specify the duration of the interval preceding the starting point of the state under consideration, thus conveying a delay reading.

(22) State: delay reading
John was at the summit in an hour.
‘At the end of an hour John was at the summit.’ [11]

When in-adverbials modify activities in the past tense, they do not measure the duration of the activities, but can only express a marginal event delay reading.

(23) Activity: #duration reading/ ?? marginal delay reading
They chatted in an hour.
# ‘They chatted, and the chatting event had a duration of an hour.’
?? ‘After an hour they (finally) began to chat.’ [11]

As to accomplishments, which denote a complex event composed of a process and a telos, in-adverbials measure the duration of the whole complex event and express a whole event duration reading, e.g. (24).

(24) Accomplishment: duration reading
John wrote the letter in ten minutes.
‘John wrote the letter, and the whole letter-writing event had a duration of ten minutes.’ [11]


Finally, with respect to achievements, which are quasi-instantaneous, in-adverbials do not specify the duration of the achievements per se, but the delay before the culmination of the whole event, and trigger a whole event delay reading, e.g. (25).

(25) Achievement: delay reading
John reached the summit in an hour.
‘At the end of an hour John reached the summit.’ [11]

As far as for-adverbials are concerned, when they modify states and activities, they induce a duration reading, e.g. (26)–(27): a week and an hour specify the duration of the state of being sick and the duration of the activity of walking in the park, respectively.

(26) State: duration reading
Mary was sick for a week. [4]
(27) Activity: duration reading
Mary walked in the park for an hour. [4]

However, when they modify accomplishments and achievements, they do not measure the duration of the events per se, but the duration of the resultant state ensuing from the culmination of accomplishments and achievements. In (28), the window was open for five minutes, whereas in (29) Rebecca was absent for three hours.

(28) Accomplishment: resultant state duration reading
Rebecca opened the window for five minutes. [12]
(29) Achievement: resultant state duration reading
Rebecca left for three hours. [12]

Additionally, when sentences describe an iteration of activities, accomplishments or achievements, for-adverbials specify the duration of the iteration of events, as shown in (30)–(32).

(30) Iteration of activities
John jogged for ten years.
(31) Iteration of accomplishments
John took buses to school for ten years. [13]
(32) Iteration of achievements
John noticed miserable looking people for several hours. [13]

Since in-adverbials and for-adverbials can be used to distinguish different aspectual classes of predicates, [11] resorted to them to determine the semantics of English deadjectival verbs. According to [14], deadjectival verbs in English can be divided into two classes.
The first class comprises deadjectival verbs that are derived from adjectives associated with an open property scale, e.g. widen, which will henceforth be referred to as open scale-type deadjectival verbs. The second class includes deadjectival verbs that are derived from adjectives associated with a closed property scale, e.g. quieten, which will henceforth be called closed scale-type deadjectival verbs.
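The in-adverbial and for-adverbial diagnostics illustrated in (22)–(29) amount to a lookup from an adverbial type and an aspectual class to a reading. The following is a minimal sketch of that lookup, assuming hypothetical names (`READINGS`, `reading`, `classes_with`); the iteration readings in (30)–(32) are deliberately left out, and the marginal status of (23) is recorded only as a label.

```python
# Readings of English time adverbials by aspectual class, as summarized
# in examples (22)-(29). Keys are (adverbial type, aspectual class).
READINGS = {
    ("in", "state"): "delay",
    ("in", "activity"): "marginal delay",
    ("in", "accomplishment"): "whole event duration",
    ("in", "achievement"): "whole event delay",
    ("for", "state"): "duration",
    ("for", "activity"): "duration",
    ("for", "accomplishment"): "resultant state duration",
    ("for", "achievement"): "resultant state duration",
}


def reading(adverbial, aspectual_class):
    """Return the reading expected for an adverbial/class combination."""
    return READINGS[(adverbial, aspectual_class)]


def classes_with(adverbial, observed_reading):
    """Return the aspectual classes compatible with an observed reading."""
    return {
        cls
        for (adv, cls), r in READINGS.items()
        if adv == adverbial and r == observed_reading
    }
```

Read in reverse, the table shows why a single diagnostic is often not decisive: a resultant state duration reading with a for-adverbial is shared by accomplishments and achievements, so the in-adverbial test is needed to separate them.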


[11] pointed out that all deadjectival verbs in English, both open scale-type and closed scale-type, have a whole event delay reading with in-adverbials and a resultant state duration reading with for-adverbials. Specifically, the in-adverbials in (33a)–(33b) specify the lapse of time prior to the culmination of the whole event of becoming A(djective)-er, while the for-adverbials in (34a)–(34b) specify the duration of the resultant state ensuing from the culmination of the event of becoming A(djective)-er.

(33) whole event delay reading
a. The gap widened in a few minutes.
‘At the end of a few minutes the gap became wider.’ [11]
b. The room quietened in a few minutes.
‘At the end of a few minutes the room became quieter.’ [11]
(34) resultant state duration reading
a. The gap widened for a few minutes.
‘(Having widened), the gap was wider for a few minutes.’ [11]
b. The room quietened for a few minutes.

Building on the interpretations of (33)–(34), [11] concluded that both open scale-type and closed scale-type deadjectival verbs in English belong to the aspectual class of achievements whose basic sense is [become A(djective)-er], which refers to a minimal telic event. In addition to the whole event delay reading, [11] found that closed scale-type deadjectival verbs can also express a whole event duration reading when combined with in-adverbials, e.g. (35).

(35) whole event duration reading
The room quietened in a few minutes.
‘The room was becoming quieter throughout a period of a few minutes, and at the end of that period the room was quiet.’ [11]

In a few minutes specifies the duration of an event that comprises two sub-events: one consisting of an iteration of minimal events of becoming quieter, and the other providing a standard telos determined by the context, that is, ingression into the state of being quiet. Quieten on this reading belongs to the aspectual class of accomplishments, whose sense is [become A] rather than [become A-er].
3.2 Time Phrases in Mandarin

Similarly to English, Mandarin time phrases in various syntactic positions also trigger different interpretations when combined with different aspectual classes of predicates [15]. With respect to post-verbal time phrases, when they go with states and activities, they express a duration reading: liang nian ‘two years’ and liang ge xiaoshi ‘two hours’ specify the duration of the state of liking Mary and that of the activity of listening to music in (36) and (37), respectively.

(36) State: duration reading
Lisi xihuan-le Mali liang nian.
Lisi like-PFV Mary two year
‘Lisi liked Mary for two years.’
(37) Activity: duration reading
Lisi ting-le liang ge xiaoshi yinyue.
Lisi listen-PFV two CLF hour music
‘Lisi listened to music for two hours.’

Yet, when they co-occur with accomplishments, they do not measure the duration of the event, but rather that of the resultant state ensuing from the culmination of the event. Contrast (38) with (39).

(38) Accomplishment: *duration reading
* Lisi chi-le shi ke caomei wu fenzhong.
Lisi eat-PFV ten CLF strawberry five minute
(39) Accomplishment: resultant state duration reading
Lisi ku-xing-le shi fenzhong you shuizhao-le.
Lisi cry-wake-PFV ten minute again fall asleep-PFV
‘Lisi cried so that he woke up for ten minutes before falling asleep again.’

Likewise, the combination of achievements and post-verbal time phrases gives rise to a resultant state duration reading, e.g. (40), in which wu tian ‘five days’ measures the duration of Lisi’s stay.

(40) Achievement: resultant state duration reading
Lisi lai-le wu tian jiu zou-le.
Lisi come-PFV five day then leave-PFV
‘Lisi came for five days before leaving.’

Crucially, pre-verbal time phrases have different distributions and interpretations from post-verbal time phrases. They are incompatible with states and activities, cf. (41)–(42), but can co-occur with accomplishments and express a whole event duration reading, e.g. (43), where five minutes specifies the duration of the event of eating ten strawberries. They can also co-occur with achievements and induce a whole event delay reading, e.g. (44), in which five minutes specifies the duration of the delay prior to the culmination of the achievement of coming.

(41) State: *duration reading
* Lisi liang nian xihuan-le Mali.
Lisi two year like-PFV Mary
(42) Activity: *duration reading
* Lisi liang ge xiaoshi ting-le yinyue.
Lisi two CLF hour listen-PFV music

38

X. Zhang

(43) Accomplishment: whole event duration reading Lisi wu fenzhong chi-le shi ke CLF Lisi five minute eat-PFV ten ‘Lisi ate ten strawberries in five minutes.’ (44) Achievement: whole event delay reading * (jiu) lai-le. Lisi wu fenzhong Lisi five minute only come-PFV ‘Lisi came in five minutes.’

caomei. strawberry

Note that achievements do not combine easily with pre-verbal time phrases: an adverb such as jiu ‘only’ has to be added to highlight the delay prior to the culmination of the achievement.

3.3 Aspectual Class of Deadjectival Verbs in Mandarin

Given the differences between post-verbal and pre-verbal time phrases, in this part we will determine the aspectual class of deadjectival verbs in Mandarin. As with the classification of deadjectival verbs in English, we will divide Mandarin deadjectival verbs into two classes: (i) open scale-type deadjectival verbs such as pang ‘fatten’, derived from the open-scale adjective fat, and (ii) closed scale-type deadjectival verbs such as gan ‘dry’, derived from the closed-scale adjective dry. As illustrated in (45)–(46), post-verbal time phrases are compatible with both open scale-type and closed scale-type deadjectival verbs and trigger a resultant state duration reading. Two months specifies the duration of the resultant state of being fatter in (45) and of being dry in (46). The resultant state duration reading clearly rules out classifying deadjectival verbs as states or activities, which leaves accomplishments and achievements as the only candidates.

(45) resultant state duration reading
Lisi pang-le liang ge yue le.
Lisi fat-PFV two CLF month PTCL
‘It has been two months since Lisi became fatter.’

(46) resultant state duration reading
Lumian gan-le liang ge yue le.
road dry-PFV two CLF month PTCL
‘It has been two months since the road became dry.’

Similarly to post-verbal time phrases, pre-verbal time phrases are compatible with both open scale-type and closed scale-type deadjectival verbs and express a whole event delay reading, as shown by (47)–(48). Two days specifies the duration of the delay prior to the culmination of becoming fatter in (47) and becoming dry in (48).

(47) whole event delay reading
Lisi liang tian *(jiu) pang-le.
Lisi two day only fat-PFV
‘Lisi became fatter in two days.’

(48) whole event delay reading
Lumian liang tian *(jiu) gan-le.
road two day only dry-PFV
‘The road became dry in two days.’
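The elimination argument built on these diagnostics can be made explicit with a small sketch. The table below only restates the readings reported in Sect. 3.2; the dictionary layout and the function name `compatible_classes` are illustrative assumptions, not part of the original analysis.

```python
# Readings reported in Sect. 3.2 for time phrases, by position and aspectual class.
# None marks an unacceptable combination, cf. (38), (41) and (42).
READINGS = {
    "post-verbal": {
        "state": "duration",
        "activity": "duration",
        "accomplishment": "resultant state duration",
        "achievement": "resultant state duration",
    },
    "pre-verbal": {
        "state": None,
        "activity": None,
        "accomplishment": "whole event duration",
        "achievement": "whole event delay",
    },
}


def compatible_classes(observations):
    """Return the aspectual classes consistent with every observed reading.

    `observations` maps a time-phrase position to the reading it triggers.
    """
    classes = set(READINGS["post-verbal"])
    for position, reading in observations.items():
        # Keep only the classes that yield this reading in this position.
        classes &= {c for c, r in READINGS[position].items() if r == reading}
    return classes


# Readings observed for pang 'fatten' and gan 'dry' in (45)-(48).
deadjectival = {
    "post-verbal": "resultant state duration",
    "pre-verbal": "whole event delay",
}
print(compatible_classes(deadjectival))  # {'achievement'}
```

The post-verbal diagnostic alone leaves both accomplishments and achievements open; only the pre-verbal diagnostic narrows the set to a single class.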


The whole event delay reading, however, rules out an accomplishment analysis of Mandarin deadjectival verbs, which suggests that they fall into the aspectual class of achievements. The obligatory presence of the adverb jiu ‘only’ in (47)–(48) provides further evidence for their achievement status, since they behave in the same way as other “canonical” achievements in Mandarin, cf. (44).

3.4 The Semantics of Mandarin Deadjectival Verbs

Having determined the aspectual class of deadjectival verbs in Mandarin, we can now turn to their different senses. As illustrated in (45) and (47), when the post-verbal and the pre-verbal time phrases co-occur with an open scale-type verb like pang ‘fatten’, they specify the duration of the resultant state ensuing from becoming fatter and that of the delay prior to the culmination of becoming fatter, respectively. We thus conclude that the basic sense of open scale-type verbs is [become A-er]. In contrast, when a closed scale-type verb such as gan ‘dry’ co-occurs with post-verbal and pre-verbal time phrases, they specify the duration of the resultant state ensuing from becoming dry in (46) and that of the delay prior to the culmination of becoming dry in (48). The basic sense of closed scale-type verbs is thus [become A]. The semantic difference between open scale-type and closed scale-type verbs in Mandarin is also observed in the contrast between (49) and (50).

(49) Lisi zuijin pang-le, yi tian bi yi tian pang.
Lisi recently fat-PFV one day than one day fat
‘Lisi fattened recently. He is fatter and fatter day by day.’

(50) Lumian zuijin gan-le, ???yi tian bi yi tian gan.
road recently dry-PFV one day than one day dry
‘The road dried recently. The road is drier and drier day by day.’

As the basic sense of open scale-type verbs is [become A-er], we can continue the first clause in (49) with a clause asserting that Lisi’s weight keeps changing. By contrast, the basic sense of closed scale-type verbs is [become A], which suggests that the standard telos of gan ‘dry’ has already been reached and that the road has already entered the state of being dry. It follows that it is difficult to continue the first clause in (50) with a clause asserting that the road undergoes further changes in its dryness. Lastly, note that the [become A] reading of closed scale-type deadjectival verbs can be coerced into a [become A-er] reading in the presence of degree adverbials. Consider (51).

(51) Lumian gan-le bushao.
road dry-PFV a lot
‘The road dried a lot.’

When the degree adverbial bushao ‘a lot’ appears, the closed scale-type verb gan is re-interpreted and gives rise to a coerced reading according to which the road became drier to the extent specified by the degree adverbial a lot.
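The sense assignment argued for in this subsection can be summarized as a small function. This is only a sketch of the generalization stated above; the function name `basic_sense` and its parameters are hypothetical labels introduced for illustration.

```python
def basic_sense(scale_type, degree_adverbial=False):
    """Sense of a Mandarin deadjectival verb under the analysis of Sect. 3.4.

    scale_type: "open" (e.g. pang 'fatten') or "closed" (e.g. gan 'dry').
    degree_adverbial: True if a degree adverbial such as bushao 'a lot' is present.
    """
    if scale_type == "open":
        return "[become A-er]"
    if scale_type == "closed":
        # A degree adverbial coerces the comparative reading, cf. (51).
        return "[become A-er]" if degree_adverbial else "[become A]"
    raise ValueError(f"unknown scale type: {scale_type!r}")


print(basic_sense("open"))                           # [become A-er]
print(basic_sense("closed"))                         # [become A]
print(basic_sense("closed", degree_adverbial=True))  # [become A-er]
```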


4 Concluding Remarks

In this work, we have studied the senses of deadjectival verbs in Mandarin. Firstly, we have provided two pieces of evidence to support the hypothesis that Mandarin deadjectival verbs are not a derived class of verbs but rather form a basic class of verbs. Building on the interpretations and distributions of different time phrases in co-occurrence with deadjectival verbs, we have proposed that both open scale-type and closed scale-type deadjectival verbs belong to the aspectual class of achievements. Open scale-type deadjectival verbs express [become A-er] as their basic reading and closed scale-type deadjectival verbs express [become A] as their basic reading. The senses of Mandarin deadjectival verbs differ from those of their English counterparts, for open scale-type deadjectival verbs in English belong to the aspectual class of achievements whose sense is [become A-er], whereas closed scale-type deadjectival verbs in English are ambiguous between achievements and accomplishments, expressing [become A-er] and [become A] readings, respectively.

References

1. Dowty, D.R.: Word Meaning and Montague Grammar. The Semantics of Verbs and Times in Generative Semantics and in Montague’s PTQ. D. Reidel Publishing Company, Dordrecht (1979)
2. Lin, J.W.: Aspectual selection and negation in Mandarin Chinese. Linguistics 41(3), 425–459 (2003)
3. Lin, J.W.: Temporal reference in Mandarin Chinese. J. East Asian Linguis. 12(3), 259–311 (2003)
4. Smith, C.S.: The Parameter of Aspect. Kluwer Academic Publishers, Dordrecht (1991)
5. Smith, C.S.: Aspectual viewpoint and situation type in Mandarin Chinese. J. East Asian Linguis. 3(2), 107–146 (1994)
6. Tai, J.H.Y.: Verbs and times in Chinese: Vendler’s four categories. In: Testen, D., Mishra, V., Drogo, J. (eds.) Papers from the Parasession on Lexical Semantics, Chicago Linguistic Society, vol. 20, pp. 289–296 (1984)
7. Wu, J.S.: The semantics of the perfective le and its context-dependency: An SDRT approach. J. East Asian Linguis. 14(4), 299–336 (2005)
8. Xiao, R., McEnery, T.: Aspect in Mandarin Chinese: A Corpus-based Study. John Benjamins Publishing Company, Amsterdam (2004)
9. Yuan, Y.: Shi lun hanyu de ti yazhi [On aspectual coercion in Chinese]. Shijie Hanyu Jiaoxue [Chinese Teach. World] 25(3), 334–345 (2011)
10. Zhu, D.: Xiandai Hanyu Yufa Yanjiu [Studies on the Grammar of Modern Chinese]. Shangwu Yinshuguan [The Commercial Press] (1980)
11. Kearns, K.: Telic senses of deadjectival verbs. Lingua 117, 26–66 (2007)
12. Piñon, C.: Durative adverbials for result states. In: Bird, S., Carnie, A., Haugen, J.D., Norquest, P. (eds.) Proceedings of 18th West Coast Conference on Formal Linguistics, pp. 420–433 (1999)
13. Landman, F., Rothstein, S.: Incremental homogeneity in the semantics of aspectual for-phrases. In: Rappaport Hovav, M., Doron, E., Sichel, I. (eds.) Lexical Semantics, Syntax and Event Structure, pp. 229–251. Oxford University Press (2010)
14. Hay, J., Kennedy, C., Levin, B.: Scalar structure underlies telicity in “degree achievements”. In: Matthews, T., Strolovitch, D. (eds.) Proceedings of SALT 9, pp. 127–144 (1999)
15. Zhang, X.: Expressions durative en chinois mandarin. Une étude sur l’aspect. Ph.D. thesis, Université Paris Diderot (2016)

A Corpus-Based Study of Factive Verbs and Its Influencing Factors

Yu Wang and Yulin Yuan

Department of Chinese Language and Literature, Peking University, Beijing, China

Abstract. Factive verbs are verbs that presuppose the truth value of their subordinate clauses. Previous studies have examined factive verbs with canonical sentences, yet few of them have investigated these verbs systematically with instances from a corpus. Taking “知道(know)”, “认为 (think)” and “幻想(fantasize)” as examples, this paper investigated 1464 sentences in the CCL corpus and systematically described the factuality of factive verbs in the actual corpus. This paper also proposed that the subordinate clause of the non-factive verb “think” is not neutral and that the subordinate clause of the counter-factive verb “fantasize” is not always false. Additionally, this paper identified a variety of factuality influencing factors that have not yet been put forward in the literature, such as tense, modal verbs, double negation, rhetorical questions, subjects, descriptive adverbials, etc. Finally, this paper introduced the theory of intersubjectivity and appraisal systems to unify the various factuality influencing factors.

Keywords: Factive verbs · Intersubjectivity · Appraisal systems

© Springer Nature Switzerland AG 2022. M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 42–55, 2022. https://doi.org/10.1007/978-3-031-06703-7_4

1 Introduction

“Factive” was first proposed by Kiparsky and Kiparsky [1] and further divided into three categories by Leech [2]: “A predicate (or more precisely, a feature in a predicate) may be classified as factive, non-factive, or counter-factive.” (p. 302, line 15) In more detail, Li Xinliang [3] gave a definition of “factive verbs” and “factuality” by combining Yuan Yulin’s [4] and Leech’s [2] theories: Factuality is a semantic function of verbs, that is, the ability of verbs to presuppose the truth value of their subordinate clause. Specifically, a verb whose subordinate clause is always true whether the verb is positive or negative is factive, and the ability to presuppose the truth of its subordinate clause is called factuality; a verb whose subordinate clause is always false whether the verb is positive or negative is counter-factive, and the ability to presuppose the falsity of its subordinate clause is called counter-factuality; a verb whose subordinate clause is neither true nor false whether the verb is positive or negative is non-factive, and the ability of not presupposing the truth value of its subordinate clause is called non-factuality. “知道(know)”, “幻想(fantasize)”, and “认为(think)” are instances of factive, counter-factive and non-factive verbs, respectively.
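The three-way definition above amounts to a check on the truth value of the subordinate clause under both polarities of the verb. A minimal sketch follows; the function name `classify_verb` and the string labels are assumptions made for illustration, and the "unclassified" fallback is not part of the cited definition.

```python
def classify_verb(clause_if_positive, clause_if_negative):
    """Classify a verb by the truth value of its subordinate clause.

    Each argument is "true", "false" or "neutral": the truth value of the
    clause when the verb is positive and when it is negated.
    """
    values = (clause_if_positive, clause_if_negative)
    if values == ("true", "true"):
        return "factive"
    if values == ("false", "false"):
        return "counter-factive"
    if values == ("neutral", "neutral"):
        return "non-factive"
    # Mixed patterns fall outside the definition quoted above.
    return "unclassified"


print(classify_verb("true", "true"))        # factive, e.g. 知道 (know)
print(classify_verb("false", "false"))      # counter-factive, e.g. 幻想 (fantasize)
print(classify_verb("neutral", "neutral"))  # non-factive, e.g. 认为 (think)
```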


In the following sections, in order to distinguish “factive” from the sum of “factive”, “non-factive”, and “counter-factive”, we will use “factive-sum” to represent the sum. Similarly, we will use “factuality-sum” to represent the sum of “factuality”, “non-factuality”, and “counter-factuality”. Factive-sum verbs account for the great majority of factive-sum predicates; therefore, most studies of factive-sum predicates have focused on factive-sum verbs.

In the past decades, the constancy of factuality-sum has been the basis of the study of factive-sum verbs. Recently, however, many Chinese scholars, such as Li Xinliang, Yuan Yulin [5], Chen Zhenyu, Zhen Cheng [6], Guo Guang, Chen Zhenyu [7], Yuan Yulin [8, 9], etc., have found that the factuality-sum of factive-sum verbs may shift under some specific conditions. For example, factive verbs may lose their factuality and become non-factive or even counter-factive verbs; non-factive verbs may become factive verbs or counter-factive verbs, etc. Specifically, Li Xinliang and Yuan Yulin [5, 10] have studied the factuality-sum shifting of counter-factive verbs and the factive verb “know”, respectively. Chen and Zhen [6] discussed the factuality shift of “regret” and “know” and argued that “factuality is both lexical semantics and rhetorical pragmatics” (p. 21). Li Xinliang [11, 12] further analyzed non-factive verbs and semi-factive verbs. Yuan Yulin and Kou Xin [13] extended the study of factive-sum words from verbs to nouns, and explored the factuality-sum shift of factive-sum nouns, etc.

In conclusion, previous studies on factive-sum verbs are quite fruitful. However, there are still many shortcomings. First of all, most of the previous studies are based on constructed canonical sentences and introspection. Such data are narrow in scope and less natural than corpus data, and cannot systematically show the actual language use of factive-sum verbs. Secondly, although researchers have found many factuality-sum influencing factors, these findings have not been systematically analyzed and examined according to the same criterion. Finally, existing research on the factuality-sum shift is still at the level of description and lacks a convincing theoretical explanation.

This situation leads us to the following questions. First, if the factuality-sum of factive-sum verbs can shift, can they still be said to have factuality-sum? Second, under what conditions do factive-sum verbs in modern Chinese presuppose the truth value of the subordinate clause? What are the factors that affect factuality-sum? Are there any unidentified factors affecting factuality-sum? Why do different factors influence the factive-sum verbs to different degrees?

The above questions have not yet been answered. The complexity of factive-sum verbs encourages us to explore them more deeply and systematically with evidence from a corpus. The study of factive verbs will contribute to the study of semantic presupposition, cognitive collaboration, and the theory of teaching Chinese as a foreign language.

2 Factive-Sum Verbs in the Actual Corpus

In order to investigate factive-sum verbs in the actual corpus, we took “知道 (know)”, “幻想 (fantasize)”, and “认为 (think)”¹ as examples. For each word, we downloaded 5000 positive and negative sentences from the CCL corpus², and randomly extracted 250 positive and 250 negative sentences with a Python program. Counter-factive verbs were very rare in negative environments: the corpus contained only 214 negative sentences with “fantasize” in all, so all of them were used. This yields 5 × 250 + 214 = 1464 sentences. Table 1 gives the statistics of the factuality-sum of these 1464 sentences, and a general summary of the results of our investigation follows the table.

Table 1. Statistics of 1464 factive-sum verbs.

                                            The factive verb “know”   The non-factive verb “think”   The counter-factive verb “fantasize”
                                            Positive    Negative      Positive    Negative           Positive    Negative
Sentences with truth value / All sentences  111/250     61/250        250/250     232/250            29/250      38/214
Subordinate clauses: True
Subordinate clauses: False
Subordinate clauses: Neutral
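The sampling step behind Table 1 can be sketched as follows. The original Python program is not given in the paper, so this is only one plausible way to do it: the function name `sample_sentences`, the fixed seed, and the placeholder sentence pools are all assumptions.

```python
import random


def sample_sentences(positive, negative, n=250, seed=0):
    """Draw up to n sentences from each polarity pool without replacement."""
    rng = random.Random(seed)  # fixed seed so that the draw is reproducible

    def draw(pool):
        # If fewer than n sentences exist, keep all of them.
        return rng.sample(pool, min(n, len(pool)))

    return draw(positive), draw(negative)


# Placeholder pools standing in for sentences downloaded from the CCL corpus.
fantasize_pos = [f"pos-{i}" for i in range(5000)]
fantasize_neg = [f"neg-{i}" for i in range(214)]  # only 214 negatives exist

pos_sample, neg_sample = sample_sentences(fantasize_pos, fantasize_neg)
print(len(pos_sample), len(neg_sample))  # 250 214

# Five full samples of 250 plus the 214 negatives of "fantasize".
print(5 * 250 + len(neg_sample))  # 1464
```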

Firstly, as shown in the blue font in Table 1, although the factuality-sum of factive-sum verbs may shift under some specific conditions, it remains relatively stable overall.

¹ For conciseness, in the following sections we use “know”, “think” and “fantasize” to represent “知道(know)”, “认为(think)”, and “幻想(fantasize)”, respectively.
² The CCL corpus is one of the largest corpora of Modern Chinese.


Secondly, for the non-factive verb “think”, the truth value of its subordinate clause is not as unpredictable as previous studies stated. When “think” is positive or negative, its subordinate clause has a strong tendency to be true or false, respectively, and the strength of the tendency depends on the context. As shown in (4), most sentences in the corpus cannot be followed by both a “true” clause and a “false” clause, since the truth value of the subordinate clause has a strong tendency.

The strong tendency in the truth value of non-factive verbs’ subordinate clauses is also in line with the maxim of “quality” in the theory of conversational implicature. In view of this, we believe that it would be more appropriate to change the definition of non-factive verb from “A verb whose subordinate clause is neither true nor false whether it is positive or negative” to “A verb whose subordinate clause tends to be true while the verb is positive and false while the verb is negative”.

Thirdly, the counter-factive verbs are relatively stable. However, as shown in Table 1, the truth value of their subordinate clause is not “always false” as previous studies stated, but can be “neutral” or “true”. We will discuss this in more detail in the next section.

Fourthly, in the process of investigating the actual corpus, we found some factuality-sum influencing factors that have not yet been pointed out, such as the descriptive adverbial “mistakenly”, the double negation structure “cannot not”, and so on. The details will be elaborated in the next section.


3 Supplementation of Factuality-Sum Influencing Factors

Based on the previous studies and the results of our investigations, we added some factuality-sum influencing factors. The details are as follows.

3.1 Counter-Factuality Influencing Factors

At present, very few studies discuss the factors that influence the counter-factuality of counter-factive verbs, and the counter-factive verb is therefore uniformly regarded in the literature as “the sign of falsity”. However, during our investigation, we found that the counter-factuality of the counter-factive verb “fantasize” is clearly influenced by tense. Specifically, when the tense of the subordinate clause is “future tense”, the counter-factuality of the counter-factive verb “fantasize” disappears, and the truth value of the subordinate clause changes from “false” to “neutral”. The example is as follows.

When the tense of the main clause is “past tense”, the counter-factuality of the counter-factive verb “fantasize” is also lost, because the subordinate clause is still in the “future tense” relative to the main clause. In this case, the truth value of the subordinate clause is determined by the context. Consider the following example.

3.2 Non-factuality Influencing Factors

In addition to subject, tone, and tense [11], we found that the non-factuality of the non-factive verb is affected by a variety of factors such as modal verbs, double negation, rhetorical questions, and descriptive adverbials. Due to the length limit, we will describe them only briefly.

Firstly, modal verbs have a considerable influence on the non-factive verb “think”. Specifically, when a weak modal verb (e.g., may) modifies a non-factive verb, the tendency of the non-factive verb will be weakened and its non-factuality will remain unchanged; when a strong modal verb modifies a non-factive verb, the non-factuality of the non-factive verb will shift. More specifically, the truth value of the subordinate clause will strongly tend to be true while the verb is positive and strongly tend to be false while the verb is negative. The examples are as follows.

Secondly, the tone of the double negation is stronger than that of modal verbs, so the double negation makes the non-factuality of non-factive verbs shift completely. That is, when the non-factive verb “think” is modified by the double negation, its non-factuality will become factuality, and its subsequent subordinate clause will be true.


Thirdly, in Chinese, the rhetorical question will change the non-factuality of the non-factive verbs completely. Specifically, when the rhetorical question as a whole indicates the positive (e.g., 怎么能不认为? Nobody cannot think that…), the truth value of the subordinate clause is true; when the rhetorical question as a whole indicates the negative (e.g., 怎么能认为? Nobody thinks that…), the truth value of the subordinate clause is false. Examples are as follows. (Only the Chinese examples below are rhetorical questions; the English ones are translations.)

Fourthly, when the subject is missing or when the person is “the author”, etc., it will also have an impact on the non-factuality of “think”. See the following example.

Fifthly, besides tense adverbials, some descriptive adverbials also have an effect on factuality. However, not all descriptive adverbials have such an effect. Some adverbials, such as “wisely”, make the truth value of the subordinate clause of “think” shift from “neutral” to “true”, while others, such as “mistakenly” and “stupidly”, make it shift from “neutral” to “false”. Why is there such a difference? What factors are responsible for it? We will discuss these questions in the next section.
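The factors described in this subsection can be read as a set of rules on the subordinate clause of “think”. The sketch below encodes them under stated assumptions: the function name, the small word lists (taken from the examples mentioned in the text, with “must” standing in for a strong modal), and above all the order in which the rules are checked are illustrative choices, since the text does not say how the factors interact when several co-occur.

```python
WEAK_MODALS = {"may"}
STRONG_MODALS = {"must"}  # assumed example of a strong modal verb
TRUE_ADVERBIALS = {"wisely"}
FALSE_ADVERBIALS = {"mistakenly", "stupidly"}


def think_clause_value(positive=True, modal=None, double_negation=False,
                       rhetorical=None, adverbial=None):
    """Truth value (or tendency) of the subordinate clause of "think".

    rhetorical: None, "positive" or "negative", i.e. what the rhetorical
    question as a whole indicates.
    """
    # Factors described as shifting non-factuality completely.
    if double_negation:
        return "true"
    if rhetorical == "positive":
        return "true"
    if rhetorical == "negative":
        return "false"
    if adverbial in TRUE_ADVERBIALS:
        return "true"
    if adverbial in FALSE_ADVERBIALS:
        return "false"
    # Otherwise only a tendency remains, following the polarity of the verb.
    direction = "true" if positive else "false"
    if modal in STRONG_MODALS:
        return f"strongly tends to be {direction}"
    if modal in WEAK_MODALS:
        return f"weakly tends to be {direction}"
    return f"tends to be {direction}"


print(think_clause_value(positive=False, modal="must"))  # strongly tends to be false
print(think_clause_value(adverbial="mistakenly"))        # false
```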


We summarize the various types of factuality influencing factors in the following table, where the red parts are newly added by us (Table 2).

Table 2. The factuality-sum influencing factors³
Columns: Main clause (Subjects; Tone; Tense; Modal verbs; Descriptive adverbials). Subordinate clause (Content; Tense). The whole (Sentence pattern). The conceptual structure of words.
Factive verb “know”: (+) / / - / + - -
Non-factive verb “think” and counter-factive verb “fantasize”: (+) + + (+) + / / + / / / + / / + + / / +

³ “+” means the factor influences the factuality-sum, “-” means the factor does not influence the factuality-sum, “(+)” means the factor influences the factuality-sum to some degree, and “/” means the influence of this factor has not yet been discussed in the literature.

4 The Theoretical Study of Factive-Sum Verbs and the Factuality-Sum Influencing Factors

What is the essence of factive-sum verbs? Yuan Yulin [14] gave us a reasonable answer. “Most verbs with factuality-sum are verbs that can have subordinate clauses. These verbs will form complementary constructions together with the subject and subordinate clauses. According to Verhagen [15], it is the subordinate clauses that really express the content in the complementation constructions, while the main clause predicate verb is mainly a “mental-space builder” that expresses the mental state of the subject. Accordingly, the subject of the main clause is an “onstage conceptualizer”. The discourse function of the complement construction is to invite the conceptualizer (listener) in the ground to cognitively coordinate with the conceptualizer in the foreground (subject of the main clause) and the speaker, and to instruct the listener to cognitively interpret the subordinate clause in the way specified by the speaker through the main clause (including the subject of the main clause and the predicate verb of the main clause). Thus, the complementary construction is an interactive subjective linguistic expression (intersubjectivity).” (p. 4).

We agree with the above view. We believe that the essence of the factuality-sum is that the speaker expresses his or her understanding of the subordinate clause through the factive-sum verb. As a sign, the factive-sum verb can guide the listener to understand the subordinate clause in the way that the speaker wants. The structure of “main clause with factive-sum verb + subordinate clause” involves three perspectives, specifically: i) the speaker’s perspective; ii) the perspective of the subject of the sentence; and iii) the listener’s perspective.

The speaker has the most radical effect on factuality-sum. This is because, according to the theory of conversational implicature, by default the speaker meets the demand of “quality” and the listener believes that his or her words are true, so the listener naturally comprehends the sentence by following the speaker’s guidance. Therefore, the speaker’s influence on the factuality-sum of the sentence is the most radical. The influence of the subject on factuality-sum is secondary to the speaker’s, and limited to the content of the sentence itself. Finally, the influence of the listener is the least radical, since the subjectivity of the listener is aroused only in two situations: first, when the sentence clearly contradicts the listener’s background knowledge; and second, when the truth value of the subordinate clause is undetermined and needs to be judged by the listener.

In what way does the speaker guide the listener’s cognitive understanding of the sentence? We believe that the speaker incorporates his or her point of view into the sentence mainly through “appraisal”. “In the late 1990s, James R. Martin, a linguist in the Department of Linguistics at the University of Sydney, Australia, created the theoretical framework of appraisal systems.” [16] “Appraisal systems include three major subsystems: engagement, attitudes, and graduation. They are subdivided separately. Attitudes are subdivided into emotion, judgment, and appreciation. Engagement is subdivided into self-referential and borrowed speech. Graduation is subdivided into gestalt and focus.” [16] “Appraisal is mainly expressed by lexical choices, while only a few grammatical structures have appraisal functions and have their own formal features” [17].

We consider the “main clause with factive-sum verb + subordinate clause” structure a typical appraisal structure. The speaker introduces his or her own appraisal (e.g., factive-sum words, descriptive adverbials, etc.) into the main clause to guide the listener in judging and understanding the entire sentence. This appraisal is of two types: the appraisal of the truth value of the subordinate clause and the appraisal of the behavior of the main clause subject. The appraisal of the truth value of the subordinate clause thoroughly affects the factuality-sum of the sentence, while the appraisal of the behavior of the main clause subject affects the factuality-sum of the sentence only slightly or not at all. Based on the above theory of intersubjectivity and appraisal systems, we can classify factive-sum verbs and factuality-sum influencing factors as follows (Table 3).


Table 3. The classification of the factuality-sum influencing factors

The speaker’s perspective
  Type of appraisal: Appraising the truth value of the subordinate clause
  Degree of influence: Thorough
  Categories: Factive verbs (e.g. know); Rhetorical questions, double negation; Descriptive adverbials (e.g. mistakenly); Tone; Tense

  Type of appraisal: Appraising the main clause subject’s behavior
  Degree of influence: Not thorough
  Categories: Modal verbs (e.g. must); Descriptive adverbials (e.g. insistently)

The listener’s perspective
  Type of appraisal: /
  Degree of influence: Thorough
  Categories: Background knowledge; The content of the subordinate clause; The tense of the subordinate clause

The subject’s perspective
  Degree of influence: Not thorough
  Categories: Reliability of the subject; Person of the subject

As shown above, with the different perspectives and appraisal types, we can distinguish between the radical factuality-sum influencing factors and the slight factuality-sum influencing factors.

The effect of “appraising the truth value of the subordinate clause from the speaker’s perspective” on factuality-sum is thorough. This is because the speaker is responsible for the truth or falsity of the subordinate clause when the object of the speaker’s appraisal is the truth value of the subordinate clause. According to the theory of conversational implicature, the listener assumes by default that the speaker’s opinion is true. For example, when the speaker appraises with “know”, it implies his or her own opinion that the subordinate clause is a “true fact”, and in this case the speaker is responsible for the truthfulness of the subordinate clause, so “know” is factive. In contrast, when the speaker uses “think” to appraise, it does not imply the speaker’s own opinion, and the speaker is not responsible for the truth of the subordinate clause, so “think” is non-factive. Similarly, descriptive adverbials (e.g., “mistakenly”), tone markers (e.g., “actually”, “surprisingly”), rhetorical questions, double negation, etc., all reflect the speaker’s appraisal of the truth value of the subordinate clause, and thus the influence of such factors on the truth value of the subordinate clause is always thorough.

In “appraising the main clause subject’s behavior from the speaker’s perspective”, the speaker only appraises the subject’s behavior. These appraisals are not responsible for the truth value of the subordinate clause, but only strengthen or weaken the subject’s degree of reliability, so their influence on factuality-sum is only a matter of tendency. For example, modal verbs (e.g., 一定 “definitely”, 可能 “probably”, etc.) can strengthen or weaken the strength of the subject’s opinion, and some descriptive adverbials (e.g., sincerely, firmly) also belong to this case. Such influences apply only to the subject within the sentence, and do not represent the speaker’s point of view. Therefore, they cannot completely predetermine the truth value of the subordinate clause. Notably, when the speaker’s view does not agree with that of the subject, the truth value of the subordinate clause will change. Examples are as follows.

As we can see, the influence of “appraising the truth value of the subordinate clause from the speaker’s perspective” on the factuality-sum is irrevocable, while the influence of “appraising the main clause subject’s behavior from the speaker’s perspective” on the factuality-sum is revocable. The influence of the subject’s perspective is similar to that of “appraising the main clause subject’s behavior from the speaker’s perspective”: it only strengthens or weakens the truth value tendency of the subordinate clause, and can be cancelled.

Finally, regarding the influence of the listener, the listener is activated only when the content of the subordinate clause contradicts the listener’s background knowledge or requires the listener to judge its truth or falsity in relation to the content. At this point, the content of the subordinate clause has been moved from the background to the foreground by the listener’s awareness, its presupposition is cancelled, and the truth value of the subordinate clause is determined by the listener’s background knowledge and his or her understanding of the context. Therefore, the influence of listeners on the truth value of the subordinate clause is also thorough. Examples are as follows.

As can be seen, when the listener’s consciousness is activated, previous presuppositions are cancelled, and the truth value of the subordinate clause is determined by the listener’s background knowledge and his or her understanding of the context.

In summary, the “main clause with factive-sum verb + subordinate clause” structure is a typical appraisal structure. The comprehension process of this appraisal structure is an intersubjective process of mutual cognitive collaboration between the speaker and the listener. When the speaker appraises the truth value of the subordinate clause in the main clause using factive-sum verbs, tone, descriptive adverbials, etc., the truth value of the subordinate clause is presupposed by the main clause and cannot be cancelled. When the speaker appraises the behavior of the subject of the main clause, or when the subject of the main clause itself changes its degree of reliability, the main clause only has a tendential influence on the truth value of the subordinate clause, which can be cancelled. In either case, when the content of the subordinate clause contradicts the listener’s background knowledge or requires the listener to judge its truth or falsity in the context of the content, all the presuppositions and tendencies are cancelled, the background information is brought to the foreground, and the truth value of the subordinate clause is determined by the listener’s background knowledge and his or her understanding of the context. These are the results of our study of factive-sum verbs and factuality-sum influencing factors.

5 Conclusion

In this paper, taking “know”, “think” and “fantasize” as examples, we investigated 1464 sentences in the CCL corpus and systematically described how factive verbs behave in the actual corpus. The truth value of the subordinate clause after the non-factive verb “think” has a strong tendency, and the truth value of the subordinate clause after the counter-factive verb “fantasize” is not always “false”, but can also be “neutral” or “true”. Moreover, the paper has identified a variety of factuality-sum influencing factors, such as tense, modal verbs, double negation, rhetorical questions, subjects, and descriptive adverbials, which have not previously been pointed out in the literature. Finally, the paper introduced the theory of intersubjectivity and appraisal systems to generalize and unify factive-sum verbs and the various factuality-sum influencing factors, and provided an explanation for them.

This study is only a preliminary attempt, and it still has many shortcomings. For example:

1. The article investigated only three factive-sum verbs, namely “know”, “think” and “fantasize”. In the actual corpus there are far more factive-sum verbs than these three, and the scope of investigation needs to be expanded.
2. The article’s perception and processing of the corpus are based on the authors’ own language intuition, which is highly subjective. The credibility of the judgments should be improved by collecting the perceptions of multiple people through surveys and other means.
3. The article adopts a linguistic perspective, which is subjective. Other perspectives could be tried. For example, an analysis from the perspective of event extraction (in the field of natural language processing) may have more application value and be more objective.

In the future, we will continue our efforts to make a breakthrough in data and theory, and conduct more in-depth and well-founded research on factive-sum verbs, so as to contribute to research on semantic presupposition, cognitive collaboration, teaching Chinese as a foreign language, and natural language processing.


Acknowledgments. I am grateful to the anonymous reviewers of CLSW 2021 for helpful suggestions and comments. This paper is supported by the Major Program of State Commission of Science Technology of China (2020AAA0106701) and the National Social Science Fund (18ZDA295). All errors remain my own.


The Discrimination of the Synonyms of yǐnqǐ: A Corpus-Based Study

Xian Wang and Yuelong Wang

College of Humanities, Huaqiao University, Quanzhou, China

Abstract. Most Chinese dictionaries simply gloss the three words dǎozhì, yǐnzhì and zàochéng as yǐnqǐ, which does not help distinguish these synonyms in Teaching Chinese as a Foreign Language (TCFL). Based on corpus linguistics and qualitative analysis, this paper analyzes the synonym group yǐnqǐ, dǎozhì, yǐnzhì, and zàochéng in detail from four aspects: register distribution, colligation, significant collocation, and semantic prosody. We find that each synonym in this group has its own preferred uses, which provides directional guidance on word selection for Chinese teaching and Chinese learners.

Keywords: Yǐnqǐ · Dǎozhì · Zàochéng · Colligation · Collocation · Semantic prosody · Synonyms

1 Introduction

With the proposal of the concept of International Chinese Education and the accelerating internationalization of Chinese, the systematic and in-depth discrimination of Chinese synonyms has become a critical and challenging part of international Chinese education, and it also poses a challenge to Chinese language teachers. For Chinese learners, distinguishing the subtle differences between synonyms plays a vital role in Chinese writing, translation, and the Hanyu Shuiping Kaoshi (HSK). Traditional synonym discrimination mainly draws on lexical meaning, grammatical meaning, and color meaning. These traditional methods rely mainly on intuition and lack the support of scientific data, which lowers the credibility of the results. By contrast, using a corpus to study the differences between synonyms is an effective research method. It helps to discover new facts about language use and the characteristics of word behavior [1]. It can make up for the lack of objectivity in traditional word discrimination, broaden its scope, deepen the multi-dimensional understanding of synonyms, and capture the use of words more accurately [2].

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 56–67, 2022. https://doi.org/10.1007/978-3-031-06703-7_5

2 Related Works

Related works include Tao, who studies the different prosodies of various Chinese words meaning “occur”; Dam-Jensen and Zethsen, who find systematic prosodic differences between Danish verbs roughly translatable as “cause” and “lead to”; and Wang Yan, who discriminated the synonymous English verbs “arouse,” “provoke,” and “evoke” [3–5]. However, there are few corpus-based studies of the Chinese synonym group of yǐnqǐ. The Chinese synonyms of yǐnqǐ mainly include dǎozhì, yǐnzhì, and zàochéng. In the Modern Chinese Dictionary (7th Edition) [6], yǐnqǐ is defined as “one thing, phenomenon or activity makes another thing, phenomenon or activity appear.” The other three words are explained mutually and circularly, with no significant difference among them. For example, dǎozhì is defined as “yǐnqǐ,” yǐnzhì as “yǐnqǐ; dǎozhì,” and zàochéng as “yǐnqǐ; to form (a bad result).” A search of the HSK dynamic composition corpus (http://hsk.blcu.edu.cn) found that dǎozhì was misused as yǐnqǐ in 5 cases, yǐnzhì in 3 and zàochéng in 1; there were 16 cases of misusing yǐnqǐ as zàochéng, 5 of misusing dǎozhì and 1 of misusing yǐnzhì. This shows that dictionaries cannot help Chinese learners distinguish the usage differences of these synonyms, and that learners still encounter obstacles in writing and communication. Therefore, a systematic and detailed analysis of the synonyms yǐnqǐ, dǎozhì, yǐnzhì, and zàochéng is needed.

3 Research Design

3.1 BCC Corpus

This study is based on the BCC corpus of Beijing Language and Culture University. BCC (http://bcc.blcu.edu.cn) is an online corpus composed mainly of Chinese, together with other languages, with a total scale of tens of billions of words. It is characterized by multilevel corpus processing, the combination of modern and ancient Chinese corpora, multiple styles, and both synchronic and diachronic data. In the era of big data, language phenomena can be verified, falsified, or discovered through corpora [7]. In this study, we choose one subcorpus of BCC, the multi-domain corpus, which is a balanced corpus drawing on substantial text sources and classified into four main categories: News, Literature, Science & Technology, and Microblog. The contents of these categories are independent of one another and do not overlap. BCC is still being continuously updated, so it can reflect actual language use in today’s social life more comprehensively.

3.2 Instruments

The following two research instruments are used: Corpus Online and AntConc. Corpus Online (http://corpus.zhonghuayuwen.org/CpsWParser.aspx) is used for online word segmentation of the raw data downloaded from the BCC corpus. AntConc is used for analysis and statistics; its functions include concordancing (index lines), word-list generation, keyword calculation, and the extraction of collocates and clusters. One of its essential functions is to analyze the keyword features of the observation corpus against a reference corpus.

3.3 Methods

Because a large number of instances of yǐnqǐ, dǎozhì and zàochéng were retrieved from BCC, we first randomly extracted 10,000 raw records of the three words, then used Corpus Online for word segmentation, and removed problems such as garbled characters, empty lines, and character repetition. Once a relatively clean text was obtained, the four words yǐnqǐ, dǎozhì, yǐnzhì, and zàochéng were analyzed statistically with AntConc. After the index lines were extracted, frequencies and other statistics were computed, and the differences among these synonyms in register distribution, colligation, significant collocation, and semantic prosody were summarized and analyzed.
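The sampling and cleaning steps described above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure: the function names, the use of U+FFFD as the marker of garbled text, and the threshold of four repeated characters are all assumptions made for the example.

```python
import random
import re


def clean_lines(raw_lines):
    """Drop empty, garbled, and duplicate lines; collapse long character runs.

    Assumptions (not from the paper): garbled text is detected by the
    Unicode replacement character U+FFFD, and "character repetition"
    means one character repeated four or more times in a row.
    """
    seen = set()
    cleaned = []
    for line in raw_lines:
        text = line.strip()
        if not text:
            continue  # empty line
        if "\ufffd" in text:
            continue  # undecodable (garbled) characters
        # Collapse a run of 4+ identical characters to a single character.
        text = re.sub(r"(.)\1{3,}", r"\1", text)
        if text in seen:
            continue  # verbatim duplicate of an earlier line
        seen.add(text)
        cleaned.append(text)
    return cleaned


def sample_lines(lines, k, seed=0):
    """Randomly draw k lines (or all of them if fewer than k are available)."""
    if len(lines) <= k:
        return list(lines)
    return random.Random(seed).sample(lines, k)
```

A fixed seed is used so that the same sample can be drawn again when the counts are rechecked.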

4 Research Analysis

4.1 Differences in Register Distribution

Halliday holds that language is a form of activity of human beings in societies [8]. Under the influence of contextual factors, language develops a variety of variants, which form different registers. The differences between registers are mainly reflected in the choice of vocabulary. Palmer pointed out that synonyms can be used in different styles or registers, even in similar situations [9]. By comparing the distribution of synonyms across registers, we can gain a clear understanding of these synonyms, improve the appropriateness and accuracy of learners’ use, and help learners realize that the discrimination of synonyms depends not only on perception but also on rationality. The retrieval results from the multi-domain corpus give the frequency of yǐnqǐ, dǎozhì, yǐnzhì and zàochéng in the different registers, as shown in Table 1.

Table 1. Frequency per million words of the four synonyms

Register               yǐnqǐ    dǎozhì   yǐnzhì   zàochéng
News                   11.98    7.13     0.02     17.69
Literature             11.90    2.66     0.1      9.36
Science & Technology   66.12    46.19    0.25     61.80
Microblog              6.19     8.94     0.04     7.20
Total                  96.19    64.92    0.41     96.06

Table 1 shows that yǐnqǐ and zàochéng have the highest overall frequency in the BCC multi-domain corpus, and their frequencies in each register are relatively close, indicating that these two words are the most commonly used and occur in relatively similar contexts. The frequency of dǎozhì is about two-thirds that of yǐnqǐ and zàochéng, so it is also a common word. However, the overall frequency of yǐnzhì is far lower than that of the other three words, indicating that yǐnzhì is not a common word in any register.


Across registers, the frequency of the four words in the Science & Technology register is much higher than in the other registers, indicating that the four words have relatively pronounced written-language features. Generally speaking, the language of Literature and Science & Technology is more formal than everyday language, having been reprocessed from it. By contrast, the language of News and Microblog, which expresses the reality of life, is closer to daily life. Therefore, we can rank the four registers covered by BCC in ascending order of formality as Microblog, News, Literature, and Science & Technology.

Table 2. The distribution of the four synonyms across registers

            News      Literature   Science & Technology   Microblog
yǐnqǐ       12.46%    12.36%       68.75%                 6.44%
dǎozhì      10.98%    4.10%        71.15%                 13.78%
yǐnzhì      5.00%     22.50%       62.50%                 10.00%
zàochéng    18.42%    9.74%        64.35%                 7.50%
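The register shares in Table 2 can be approximated from the per-million frequencies in Table 1. The following is a minimal sketch with assumed function names. It also sums the shares of Literature and Science & Technology, the two registers ranked as most formal above, as a rough formality indicator. Because Table 1 is rounded, the results for the rare word yǐnzhì may differ slightly from Table 2, which was presumably computed from raw counts.

```python
# Per-million frequencies copied from Table 1.
per_million = {
    "yǐnqǐ": {"News": 11.98, "Literature": 11.90,
              "Science & Technology": 66.12, "Microblog": 6.19},
    "dǎozhì": {"News": 7.13, "Literature": 2.66,
               "Science & Technology": 46.19, "Microblog": 8.94},
    "yǐnzhì": {"News": 0.02, "Literature": 0.1,
               "Science & Technology": 0.25, "Microblog": 0.04},
    "zàochéng": {"News": 17.69, "Literature": 9.36,
                 "Science & Technology": 61.80, "Microblog": 7.20},
}


def register_shares(freqs):
    """Return each register's share (in percent) of a word's total frequency."""
    total = sum(freqs.values())
    return {register: 100 * f / total for register, f in freqs.items()}


def formal_share(freqs):
    """Combined share of the two most formal registers (illustrative indicator)."""
    shares = register_shares(freqs)
    return shares["Literature"] + shares["Science & Technology"]


# Words ordered from the highest to the lowest combined formal-register share.
formality_ranking = sorted(
    per_million, key=lambda w: formal_share(per_million[w]), reverse=True
)

for word in formality_ranking:
    print(f"{word}: {formal_share(per_million[word]):.2f}%")
```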

The distribution in Science & Technology and Literature can be used to analyze the stylistic differences among the four words (Table 2). The combined proportion of yǐnzhì in the Science & Technology and Literature registers reaches 85%, the highest degree of formality; the proportion of yǐnqǐ in these two registers is 81.11%, that of dǎozhì is 75.25%, and that of zàochéng is 74.09%, the lowest degree of formality. On the whole, although all four words are synonyms with prominent formal stylistic features, the detailed analysis shows that their degree of formality, from high to low, is yǐnzhì > yǐnqǐ > dǎozhì > zàochéng.

4.2 Analysis of Colligation

Colligation is a critical concept proposed by Firth, a famous British linguist, for the study of collocation. In his opinion, colligation is not an abstraction parallel to collocation but a higher-level abstraction [10]. That is to say, colligation is the grammatical framework or structure in which collocations occur. A colligation represents a class of collocations, such as of + V-ing, N + of, N + N (here V denotes verb, V-ing denotes gerund, and N denotes noun), whereas the concrete word combinations that fill such frames are collocations [11]. Establishing colligations is a crucial step in observing and summarizing semantic prosody and in describing keywords’ significant collocations. Given the large number of instances of yǐnqǐ, dǎozhì, yǐnzhì and zàochéng in the corpus, we use interval (interlaced) extraction to obtain index lines. Since language usage is to a large extent a matter of probability, random extraction can better reflect its probabilistic attributes [1]. We randomly select 100 index lines for each word and count their colligations; the results are shown in Tables 3, 4, 5 and 6.
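The extraction-and-counting step can be sketched as follows. This is a hedged illustration only: the paper does not specify how the interval extraction was carried out, so evenly spaced positions are assumed here, and the colligation labels are assumed to come from manual annotation of each sampled index line. Both function names are hypothetical.

```python
from collections import Counter


def interval_sample(lines, k):
    """Pick k lines at evenly spaced positions (assumed reading of
    "interlaced extraction"); return everything if fewer than k lines exist."""
    if k <= 0 or not lines:
        return []
    if len(lines) <= k:
        return list(lines)
    step = len(lines) / k
    return [lines[int(i * step)] for i in range(k)]


def colligation_table(labels):
    """Tally manually assigned colligation labels.

    Returns {label: (frequency, percentage)} for the sampled index lines.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100 * n / total) for label, n in counts.items()}
```

With 100 sampled lines per word, the frequency and the percentage of each colligation coincide numerically.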

Table 3. Colligation statistics of yǐnqǐ in the corpus

Yǐnqǐ              Frequency   Percentage
V + N/NP           31          31%
yóu + NP/VP + V    3           3%
V + V/VP           66          66%

Table 3 shows three types of colligation for yǐnqǐ: V + N/NP, yóu + NP/VP + V, and V + V/VP (NP denotes noun phrase, VP denotes verb phrase). Among them, V + V/VP accounts for the highest proportion; that is, yǐnqǐ is commonly followed by a predicate or predicate phrase. In addition, yǐnqǐ can be preceded by a prepositional phrase introduced by “yóu” to form “yóu + NP/VP + V,” but the proportion is low.

Table 4. Colligation statistics of dǎozhì in the corpus

Dǎozhì             Frequency   Percentage
V + N/NP           7           7%
suǒ + V + de       1           1%
V + V/VP           86          86%
V + PP             6           6%

Table 4 shows four types of colligation: V + N/NP, suǒ + V + de, V + V/VP, and V + PP (PP denotes prepositional phrase). Here too, V + V/VP is the main colligation of dǎozhì. Dǎozhì can also form “suǒ + V + de,” as in “jīnróng wēijī suǒ dǎozhì de,” “rìyuán shēngzhí suǒ dǎozhì de,” and “dīxuètáng suǒ dǎozhì de,” but this colligation has a low proportion in our selected texts. Three types of colligation can be found for yǐnzhì (Table 5): V + V/VP, V + N/NP and suǒ/yóu + V + de. Similarly, V + V/VP is the main colligation. Yǐnzhì can also be followed by a noun phrase or occur in the frame suǒ/yóu + V + de, which makes up 12% of the selected texts.

Table 5. Colligation statistics of yǐnzhì in the corpus

Yǐnzhì             Frequency   Percentage
V + N/NP           25          25%
V + V/VP           63          63%
suǒ/yóu + V + de   12          12%

Table 6. Colligation statistics of zàochéng in the corpus

Zàochéng           Frequency   Percentage
V + N/NP           35          35%
V + V/VP           65          65%


There are only two types of colligation for zàochéng (Table 6), namely V + V/VP and V + N/NP, and the proportion of V + N/NP is lower than that of V + V/VP. To sum up, the statistical analysis reveals that these synonyms have both shared and individual characteristics in colligation. The colligations common to the four synonyms are V + V/VP and V + N/NP. Moreover, V + V/VP is the most frequent colligation for all four synonyms, which indicates that it is the main form. The analysis of colligation also shows that yǐnqǐ can co-occur with the other three synonyms, forming the syntactic pattern “yǐnqǐ + VP, dǎozhì/yǐnzhì/zàochéng + VP,” as in “yǐnqǐ guānjié yālì zēnggāo dǎozhì ruǎngǔ ruǎnhuà” and “yǐnqǐ zhōu zhījiān zhēng tóuzī, cóngér yǐnzhì yībùfen shuìkuǎn liúshī hòuguǒ”. It can be seen that there is a logical sequence between yǐnqǐ on the one hand and dǎozhì, yǐnzhì and zàochéng on the other, which is the premise for discussing their semantic prosody.

4.3 Analysis of Significant Collocations

The concept of collocation was proposed by Firth, who holds that collocation is the company that words keep; that is, words have their fixed partners, which are their collocations [10]. Palmer pointed out that some words are collocationally restricted, i.e., they occur only in conjunction with other words [9]. We believe that “the collocation behavior of words shows a certain semantic choice: certain lexical items will habitually attract a certain kind of lexical items with the same semantic characteristics to form a collocation” [11]. Therefore, even synonyms with the same or similar meanings may choose different collocations. The preceding discussion of colligation concluded that the common colligations of this group of synonyms are V + N/NP and V + V/VP; that is, yǐnqǐ, dǎozhì, yǐnzhì, and zàochéng are transitive verbs that can be followed directly by an object. In Chinese, there are often modifiers between the verb and its direct object. Therefore, under the Collocates column in AntConc, we select the Same option of Window Span and set the span to [0, 6R], and under the Collocates item in Tool Preferences we select MI from the Collocate Measure drop-down list. MI (mutual information) measures the strength of collocation between words: the larger the MI, the more strongly the search term attracts its co-occurring words [12]. Calculating the MI of the collocates of these synonyms can reflect how Chinese native speakers retrieve words from the mental lexicon. It can help Chinese learners understand more intuitively how native speakers use these synonyms in actual communication, and it also indirectly shows that lexical access is not based entirely on conceptual categories.
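To make the MI measure concrete, the following sketch computes it for the collocates found in a [0, 6R] window, using one common formulation, MI = log2(O × N / (f(node) × f(collocate))), where O is the observed co-occurrence count and N is the corpus size in tokens. This is an illustrative assumption rather than a description of AntConc's internals, whose exact formula (for example, any span normalization) may differ. The function name and parameters are hypothetical.

```python
import math
from collections import Counter


def mi_scores(tokens, node, right_span=6, min_cooccurrence=1):
    """Mutual information of each word found within `right_span` tokens
    to the right of `node` in a segmented corpus (list of tokens)."""
    n_tokens = len(tokens)
    freq = Counter(tokens)  # corpus frequency of every token
    cooc = Counter()        # co-occurrence counts inside the window
    for i, tok in enumerate(tokens):
        if tok == node:
            cooc.update(tokens[i + 1 : i + 1 + right_span])

    scores = {}
    for collocate, observed in cooc.items():
        if observed < min_cooccurrence:
            continue
        # Co-occurrences expected if node and collocate were independent.
        expected = freq[node] * freq[collocate] / n_tokens
        scores[collocate] = math.log2(observed / expected)
    return scores
```

Raising `min_cooccurrence` is a common way to keep very rare words, which MI tends to overrate, out of the collocate list.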
Retrieval may rest on the frequency of word collocations rather than on conceptual categories alone, which differs from the traditional interpretation of synonyms [13]. The following is a detailed analysis of the significant collocations of the four synonyms. The observation frequency of yǐnqǐ in AntConc is 8034, and its collocations fall mainly into the following five categories:


(1) Words expressing the attitude and emotion of the subject, such as guānzhù, zhòngshì, zhùyì, fǎnxiǎng, bùmǎn, xìngqù, gòngmíng, fǎnyìng, etc.
(2) Words from the field of medicine, such as gǎnrǎn, zhòngdú, xìbāo, xīnjī, línchuáng, huànzhě, chūxiě, xīnzàng, bìngfāzhèng, etc.
(3) Words expressing degree, such as qiángdù, gāodù, guǎngfàn, etc.
(4) Words expressing quantity, such as yìxiē, yíxìliè, yízhèn, etc.
(5) Words indicating actors such as government departments or officials, for example, gōng’ān gànjǐng, gōng’ānjú, guójiā, guójì, shèhuì, lǐngdǎo, etc.

The observation frequency of dǎozhì in AntConc is 10741, and its collocations fall mainly into the following six categories:

(1) Words from the medical field, such as sǐwáng, gǎnrǎn, huànzhě, zhàng’ài, xīnjī, sǔnshāng, xìbāo, jíbìng, etc.
(2) Words indicating the appearance of phenomena and changes in situations, such as fāshēng, xiàjiàng, chūxiàn, zēngjiā, jiàngdī, xíngchéng, etc.
(3) Words from the economic field, such as shìchǎng, jīngjì, jiàgé, etc.
(4) Words expressing degree, such as yánzhòng, zhòngyào, shènzhì, etc.
(5) Words expressing quantity, such as yìxiē, dàliàng, etc.
(6) Words referring to an event or trouble, such as wèntí, jiēguǒ, hòuguǒ, etc.

The observation frequency of yǐnzhì in AntConc is 924, and its collocations fall mainly into the following three categories:

(1) Words referring to events or troubles, such as wèntí, hòuguǒ, yuányīn, etc.
(2) Words from the economic field, such as tóuzī, fēngxiǎn, xūqiú, jīnróng, sǔnshī, fāzhǎn, shìchǎng, tōnghuò péngzhàng, etc.
(3) Words from the medical field, such as xīnjī, jīngshén, zhàng’ài, xìbāo, zǔzhī, jījiàn, jíbìng, chūxiě, etc.

The observation frequency of zàochéng in AntConc is 8607, and its collocations fall mainly into the following six categories:

(1) Words expressing the harmful consequences of an event, such as sǐwáng, shòushāng, shāngwáng, shānghài, etc.
(2) Words referring to events or troubles, such as yǐngxiǎng, kùn’nán, shìgù, hòuguǒ, è’liè, etc.
(3) Words from the economic field, such as sǔnshī, jīngjì, sǔnhài, qǐyè, shìchǎng, cáichǎn, guóyǒu zīchǎn, etc.
(4) Words expressing degree, such as yánzhòng, bùliáng, zhòng, jùdà, etc.
(5) Words expressing quantity, such as yìxiē, dàliàng, yígè, etc.
(6) Words referring to the natural environment, such as wūrǎn, wēixié, wēihài, làngfèi, etc.

The above analysis shows that the four synonyms have both similarities and differences in their significant collocations. Yǐnqǐ collocates mainly with abstract words that express the subject’s subjective evaluation, attitude, and emotion. The collocation of dǎozhì is relatively complex: it can occur with words for concrete, tangible objective things as well as with words that indicate changes in phenomena. Yǐnzhì is often used with abstract words for economic changes or medical phenomena, and zàochéng is often used with words from the medical field or words for problems arising from physical events; most of its collocations are negative words denoting something unsatisfactory or unpleasant.

4.4 Analysis of Semantic Prosody

Viewed in isolation, specific words do not have obvious emotional tendencies, but when they form collocations with words that share the same semantic characteristics, they are affected by the latter and take on related semantic characteristics, for example, emotional meanings of praise or criticism. This is semantic prosody. Semantic prosody means that some node words (keywords) habitually attract a certain kind of collocates with the same or similar semantic characteristics, and their meanings infect and permeate each other, thus forming a semantic atmosphere in the context [14]. Therefore, semantic prosody is also called “semantic infiltration.” Stubbs classified semantic prosody into three categories: positive, negative, and neutral [15]. In positive semantic prosody, the keyword mostly attracts words with positive semantic characteristics. In negative semantic prosody, the keyword mostly attracts words with negative semantic characteristics. In neutral semantic prosody, the keyword attracts words with both negative and positive semantic characteristics; because the situation is complex, neutral semantic prosody is also called complex semantic prosody. In most cases there is no absolutely positive or absolutely negative semantic prosody; rather, it is a question of the ratio of negative to positive collocates. Louw believes that semantic prosodies have been largely inaccessible to human intuition about language and cannot be retrieved reliably through introspection [16]. Morley and Partington believe that semantic prosody is a constraining mechanism; as with so many mechanisms in grammar, we cannot simply put any old bunch of words together [17]. Therefore, the study of semantic prosody based on a corpus of actual texts can link collocation in language form with semantics. At the same time, it is found that in most cases the real meaning of words is expressed not by isolated words but by collocation [18].

The analysis of colligations and significant collocations in the preceding sections provides a reference for the semantic prosody of the four synonyms. Because the observation corpus is large, we again randomly extract 100 index lines for each word to study its semantic prosody. The analysis of significant collocations concluded that the collocates of yǐnqǐ mainly express the emotion, attitude, and evaluation of the subject. Case retrieval (Table 7) further shows that the words yǐnqǐ attracts have both positive and negative semantic features, while most of the collocates from the medical field have negative semantic features. Among the 100 collocations, 33% showed positive semantic features, 21% showed neutral semantic features, and 46% showed negative semantic features. Therefore, yǐnqǐ can be regarded as having complex semantic prosody.

Table 7. Analysis of semantic prosody of Yǐnqǐ

Semantic prosody   Frequency   Percentage   Significant collocations of Yǐnqǐ
Positive           33          33%          guānzhù/ zhòngshì/ xìngqù/ zhùyì/ tóngqíng/ qīdài/ hàoqíxīn
Negative           22          22%          wèntí/ bùmǎn/ jǐngjué/ bù’ān/ pīpíng/ mièshì
Neutral            21          21%          fǎnxiǎng/ zhèndòng/ sīkǎo/ gòngmíng/ línchuáng
Negative           24          24%          fǎnyìng/ jíbìng/ gānyìngbiàn/ gǎnrǎn/ xīnlǜshīcháng/ sǐwáng

Table 8. Analysis of semantic prosody of Dǎozhì

Semantic prosody   Frequency   Percentage   Significant collocations of Dǎozhì
Positive           1           1%           zēngzhǎnglǜ
Neutral            4           4%           shōusuō/ zhèngzhuàng/ fǎnyìng/ xiàoyìng
Negative           32          32%          tānhuàn/ wěn’luán/ bìngbiàn/ zhàng’ài/ yìyù/ shīmíng/ liúchǎn
Negative           63          63%          lànyòng/ fànzuì/ tāntā/ shīzhījiāobì/ pòchǎn/ shīlì/ shībài

The significant collocations of dǎozhì alone show no apparent semantic prosody tendency. However, among 100 randomly selected index lines (Table 8), only 5% of the collocations have neutral or positive semantic features, while 95% have negative semantic features, mainly indicating the harmful results of events in the medical and social fields. Therefore, dǎozhì can be identified as having negative semantic prosody.

Table 9. Analysis of semantic prosody of Yǐnzhì

Semantic prosody   Frequency   Percentage   Significant collocations of Yǐnzhì
Positive           16          16%          fāmíng/ mǎnzú/ mùbiāo/ zēngzhǎng/ tóurù/ zēnggāo/ duōyuánhuà
Neutral            18          18%          fǎnyìng/ chǎnshēng/ yǐngxiǎng/ biàndòng/ zhuàngkuàng
Negative           28          28%          shīmíng/ lànyòng/ shòusǔn/ bìngfāzhèng/ bàofā/ zhàng’ài
Negative           38          38%          biǎnzhí/ fēngxiǎn/ wēijī/ hùnluàn/ guānbì/ sǔnshī/ bàodiē


The previous discussion of the collocation of yǐnzhì focused mainly on the fields in which it occurs; that is, the collocates of yǐnzhì appear mostly in the economic and medical fields. According to the semantic prosody statistics in Table 9 above, 16% of the collocates attracted by yǐnzhì have positive semantic features, 18% have neutral semantic features, and 66% have negative semantic features, in the field of medicine or economy. On the whole, yǐnzhì can be considered to have negative semantic prosody. As for zàochéng, the significant collocations in Sect. 4.3 suggest that it has negative semantic prosody. To verify this conjecture, we also randomly selected 100 of the 8607 index lines for observation (Table 10 below). We found that 80% of the collocations had negative semantic features, only one collocation had positive semantic features, and neutral semantic features accounted for 19%. Therefore, zàochéng can be judged to have negative semantic prosody.

Table 10. Analysis of semantic prosody of Zàochéng

Semantic prosody   Frequency   Percentage   Significant collocations of Zàochéng
Positive           1           1%           jīhuì
Neutral            19          19%          yìnxiàng/ fēnliú/ xíngwéi/ yǐngxiǎng/ gǎibiàn/ pínggū/ yìzhì
Negative           80          80%          zhémó/ ěguǒ/ shāngwáng/ shuāijiǎn/ kùnnán/ shāngwáng/ shīzōng/ wēijī

To sum up, yǐnqǐ has complex semantic prosody, yǐnzhì is partially negative, and dǎozhì and zàochéng are negative. As mentioned in the analysis of colligation, yǐnqǐ can form a syntactic pattern with yǐnzhì, dǎozhì, and zàochéng respectively, forming a logical sequence in semantics. The content of yǐnqǐ primarily refers to the appearance of an objective event, situation, or phenomenon. The result is then introduced by dǎozhì or zàochéng to indicate a series of problems or accidents; semantically, the result is unsatisfactory and unexpected. In Chinese, dǎozhì, zàochéng and other event trigger words, together with the victims, are often used to highlight the harmfulness of emergencies [19]. It can be said that there are no absolute synonyms: no two words have exactly the same meaning. Indeed, it would seem unlikely that two words with the same meaning would survive in a language [9]. Therefore, even if the dictionary glosses the last three words as yǐnqǐ, they are words with different semantic colors. It can be seen that the core collocations of these synonyms are all related to harmful items. If negative semantic prosody is taken as the standard and graded by degree, the negativity of the semantic prosody of these four synonyms can be roughly ordered, from strongest to weakest, as dǎozhì > zàochéng > yǐnzhì > yǐnqǐ.
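The ordering above follows directly from the negative shares in Tables 7 to 10. As a minimal sketch (the dictionary layout and function name are assumptions for illustration), the two Negative rows of each table are summed and the words are ranked by their share of negative collocates:

```python
# Counts per 100 sampled index lines, copied from Tables 7-10.
# Where a table has two Negative rows, their frequencies are added.
prosody_counts = {
    "yǐnqǐ": {"positive": 33, "neutral": 21, "negative": 22 + 24},
    "dǎozhì": {"positive": 1, "neutral": 4, "negative": 32 + 63},
    "yǐnzhì": {"positive": 16, "neutral": 18, "negative": 28 + 38},
    "zàochéng": {"positive": 1, "neutral": 19, "negative": 80},
}


def negative_share(counts):
    """Percentage of sampled collocations with negative semantic features."""
    return 100 * counts["negative"] / sum(counts.values())


# Words ordered from the most to the least negative semantic prosody.
negativity_ranking = sorted(
    prosody_counts, key=lambda w: negative_share(prosody_counts[w]), reverse=True
)

for word in negativity_ranking:
    print(f"{word}: {negative_share(prosody_counts[word]):.0f}% negative")
```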


5 Conclusion

Naturally occurring examples from corpora provide up-to-date information on word usage and effectively supplement traditional synonym discrimination based on intuitive judgment or introspection. From the perspective of event structure, yǐnqǐ can be classified as an objective factor that indicates the occurrence of events, whereas the other three words indicate the occurrence of events with negative consequences or harmfulness. Yǐnzhì is the least commonly used and the most formal, so it is easy to distinguish. Zàochéng and dǎozhì are harder to tell apart, but dǎozhì is the least formal and the most negative. When the more severe consequences of daily events need to be expressed, dǎozhì is more appropriate. This study follows a corpus-based approach to synonym discrimination, which makes the results more precise and more targeted. It can provide guidance for Chinese teaching, help Chinese learners distinguish these words, and serve as a reference for the compilation of Chinese dictionaries.

References

1. Wei, N.X.: Corpus-based and corpus-driven approaches to the study of collocation. Contemp. Linguist. 4(2), 101–114 (2002). (in Chinese)
2. Pan, F., Feng, Y.J.: The differences of synonyms—a corpus-based study. Shandong Foreign Lang. Teach. 4, 8–12 (2000). (in Chinese)
3. Tao, H.: Towards an emergent view of lexical semantics. Lang. Linguist. 4(4), 837–856 (2003). (in Chinese)
4. Dam-Jensen, H., Zethsen, K.K.: Pragmatic patterns and the lexical system—a reassessment of evaluation in language. J. Pragmat. 39(9), 1608–1623 (2007)
5. Wang, Y.: A Corpus-Based Analysis of English Synonyms: take the verbs “arouse”, “provoke” and “evoke” as Examples. Coll. Engl. 13(2), 68–73 (2016). (in Chinese)
6. The Linguistics Institute of Chinese Academy of Social Sciences: The Dictionary of Modern Chinese, 7th edn. Commercial Press, Beijing (2016). (in Chinese)
7. Xun, E.D., Rao, G.Q., Xiao, X.Y., Zang, J.J.: The construction of the BCC corpus in the age of big data. Corpus Linguist. 3(1), 93–109 (2016). (in Chinese)
8. Halliday, M.A.K., McIntosh, A., Strevens, P.: The Linguistic Sciences and Language Teaching. Longman, London (1964)
9. Palmer, F.R.: Semantics: A New Outline, pp. 59–62. Cambridge University Press, London (1976)
10. Firth, J.R.: Papers in Linguistics 1934–1951. Oxford University Press, London (1957)
11. Wei, N.X.: General methods of semantic prosody research. Foreign Lang. Teach. Res. 34(4), 300–307 (2002). (in Chinese)
12. Deng, Y.C.: Statistics in collocation study. J. Dalian Maritime Univ. 4, 74–77 (2003). (in Chinese)
13. Zhang, J.D., Liu, P.: Corpus-based approaches to the differentiation of English synonyms. J. PLA Univ. Foreign Lang. 28(6), 49–52+96 (2005). (in Chinese)
14. Wei, N.X.: A Corpus-based contrastive study of semantic prosodies in learner English. Foreign Lang. Res. 5, 50–54+112 (2006). (in Chinese)
15. Stubbs, M.: Text and Corpus Analysis. Blackwell Publishers, Oxford (1996)


16. Louw, B.: Irony in the text or insincerity in the writer?—the diagnostic potential of semantic prosodies. In: Baker, M., Francis, G., Tognini-Bonelli, E. (eds.) Text and Technology: In Honour of John Sinclair, pp. 157–176. John Benjamins, Amsterdam (1993)
17. Morley, J., Partington, A.: A few frequently asked questions about semantic—or evaluative—prosody. Int. J. Corpus Linguist. 14(2), 139–158 (2009)
18. Liang, M.C., Li, W.Z., Xu, J.J.: Using Corpora: A Practical Coursebook. Foreign Language Teaching and Research Press, Beijing (2010). (in Chinese)
19. Huang, J.: The Connotation of “Event Trigger” and Chinese event expression system—taking the emergency trigger as an example. Lang. Transl. 1, 36–42 (2020). (in Chinese)

A Corpus-Based Semantic Analysis of the Pattern ‘bǎ+NP+VP’ in Mandarin Chinese

Congcong Yang and Yunhua Qu(✉)

School of International Studies, Zhejiang University, Hangzhou, China
{yangcc0905,qu163hua}@163.com

Abstract. Great significance has been ascribed to the pattern “bǎ+NP+VP” due to its high frequency and various usages. Based on a 2.2-million-word corpus, this study aims to answer the following questions: How are lexical items associated with this pattern? How should semantic associations within it be best described? What are the key semantic subcategories and typical usages of this pattern? The major innovation of this study lies in the adoption of the three-level semantic coding system of the New Synonym Dictionary, and the utilization of co-occurrence networks for better visualization and description of language data. The results indicate that the pattern exhibits a restriction on the choice of N(P) and V(P). Moreover, semantic categories of head nouns and verbs are associated with each other, and simultaneously confined by this pattern. Ab (person) and Id (action of upper limbs) constitute the most important semantic subcategories of noun and verb, respectively, in this pattern. Finally, four typical usages of this pattern are summarized through co-occurrence networks.

Keywords: bǎ+NP+VP · Semantic analysis · Pattern grammar · Co-occurrence network

1 Introduction

1.1 “bǎ+NP+VP” in Mandarin Chinese

The bǎ-sentence is one of the most typical sentence patterns in Mandarin Chinese. Its high frequency, various usages, as well as significance for language description and language learning, have motivated a substantial body of related research for several decades. Traditional research of the bǎ-sentence mainly concerns syntax, such as Ziegeler [1] and Bender [2]. With the development of teaching Chinese as a foreign language, some literature investigates the acquisition of bǎ construction by non-native speakers of Chinese [3, 4]. Studies of the bǎ-sentence also exist from the perspectives of discourse [5] and prosodic distribution [6]. Recently, requirements of NLP and machine translation have driven some research of the bǎ-sentence, but most of these studies focus only on annotation of syntax and semantic role [7–9].

Project supported by the National Social Science Fund of China (No. 17BYY002).

© Springer Nature Switzerland AG 2022
M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 68–83, 2022. https://doi.org/10.1007/978-3-031-06703-7_6


Extant literature on the bǎ-sentence can be approximately divided into three main aspects: (1) the grammatical meaning of the bǎ-sentence; (2) the generation of the bǎ-sentence; and (3) characteristics of the components of “bǎ+NP+VP”. These investigations, however, possess two main deficiencies. First, existing studies about semantic features are mostly on the lexical level. However, the generation of sentences is a process of transformation from meaning to form, and the relationship between meaning and form is usually one-to-many. Therefore, the present study is based on the frequency of semantic categories, rather than word frequency. Second, previous research only analyzes isolated attribute data from the perspective of individual words, and ignores the connection between different semantic categories, which constitutes an essential factor in language expression. Consequently, the present research draws on social network analysis, establishes associations among semantic categories, and explores relational data through co-occurrence networks.

1.2 Pattern Grammar

Most existing research on the bǎ-sentence is based on systemic-functional grammar, transformational-generative grammar, and dependency grammar. Although these studies have yielded certain findings, due to the complexity of “bǎ” sentences, it is difficult to explain the characteristics of the constituents from a unified perspective. Therefore, a novel perspective is needed to elucidate this issue. As a descriptive approach, pattern grammar is proposed by Hunston and Francis [10], and focuses on the association between meaning and structure. In pattern grammar, a word is regularly associated with certain words and structures, which are regarded as patterns. For example, patterns of the verb explain include V N, V wh, V about N, V N to N, V that, V to N, V to N that, etc. A pattern possesses three main features: (1) the combination of words occurs relatively frequently; (2) it is dependent on a particular choice of words; and (3) it is associated with a clear meaning. In addition to its high frequency, “bǎ+NP+VP” often co-occurs with particular words and is assigned a certain meaning. As a consequence, it is viewed as a pattern for further analysis. Pattern grammar asserts that patterns always occur with relatively restricted lexis [10]. This corpus-driven approach has added a novel dimension to the traditional observation that meaning and pattern, or sense and syntax, are associated.

1.3 Research Questions and Innovation

Based on a 2.2-million-word corpus, the present study aims to investigate semantic features of the pattern “bǎ+NP+VP”, and explore the interaction between meaning and structure. Specifically, it intends to answer the following three research questions:

1. How are lexical items associated with the pattern “bǎ+NP+VP”? How should semantic associations within this pattern be best described?
2. What are the key semantic subcategories of head nouns and verbs in this pattern?
3. What are typical usages of this pattern?


The innovation of the present study lies in two aspects.

First, innovation in research methodology. Previous studies on the bǎ-sentence are primarily qualitative, whereas the present study extracts data from a balanced corpus to identify the linguistic behavior of the pattern “bǎ+NP+VP”, which is of great use in verifying conclusions drawn by previous research. Moreover, in past investigations, the classification of semantic categories mostly relied on the subjective judgments of native speakers. The three-level coding system of the New Synonym Dictionary [11] adopted in this study, however, is based on ontology, which is both more objective and more accurate. Another major innovation of this study is that it introduces co-occurrence networks into semantic analysis for better visualization and description of language data.

Second, enrichment of theoretical foundation. Pattern grammar is adopted to analyze the bǎ-sentence, which can better reflect its actual usages in authentic environments, compared with rule-based grammar. Furthermore, pattern grammar provides us with a practical methodology to elucidate the interaction between pattern (form) and semantic features (meaning) from a structural perspective.

2 Methods

2.1 Corpus

The study was based on a 2.2-million-word corpus built in 2016, i.e., the Zhejiang University Spoken and Written Corpus of Mandarin Chinese, which contains 1.1 million words each of spoken and written Mandarin Chinese. This corpus is tagged by the Institute of Computing Technology, Chinese Lexical Analysis System (ICTCLAS). The composition of this corpus is shown in Table 1.

Table 1. The Zhejiang University Spoken and Written Corpus of Mandarin Chinese

Written part                                  Spoken part
News reports                          8.7%    Telephone calls              25.0%
News editorials                       5.0%    Television talk shows        11.4%
News reviews                         14.4%    Face-to-face conversation     5.0%
Religion                              2.9%    Plays and movies              8.6%
Skills, trades and hobbies            9.2%    Debates                       7.1%
Popular lore                          8.1%    Internet speech               5.5%
Biographies and essays                2.2%    Chinese folk arts             3.6%
Reports and official documents        5.9%    Oral narratives               9.3%
Academic prose                       17.8%    Edited oral narratives       24.5%
General fiction                       4.7%
Mystery and detective fiction         4.0%
Science fiction                       1.0%
Adventure and martial arts fiction    4.6%
Romantic fiction                      5.3%
Humor                                 1.5%
Lyrics                                4.7%

2.2 Tool and Software

Semantic System of the New Synonym Dictionary. The three-level coding system of the New Synonym Dictionary [11] is based on ontology, which has become an important concept in computer science since the end of the 20th century. In 1998, Studer et al. defined ontology as “a formal explicit specification of a shared conceptualization” [12]. It constitutes a scientific method to describe the system of linguistic conceptions and knowledge. In semantic analysis of machine translation, ontology can provide us with manifold information about words and reveal semantic relations among words [13]. Indeed, this is precisely the reason that the dictionary based on ontology is chosen to annotate words in the pattern under investigation.

This semantic system comprises 15 categories at the first level, i.e., A [human beings], B [natural entities], C [man-made entities and tools], D [abstract entities], E [events], F [time], G [space], H [number], I [physical behavior], J [psychological behavior], K [social activities], L [phenomena and condition], M [properties], N [relation], and O [others]. In addition, there are 203 subcategories at the second level (a lowercase letter is added, e.g., Aa [form of address]), and 1,477 more specific subcategories at the third level (a number is added, e.g., Aa 01 [honorific title]). Regarding part of speech: categories from A to H are generally nouns; categories I, J, K, N, and part of L are generally verbs; categories M and part of L are adjectives; and the final category O is a miscellaneous category, which comprises mainly function words.

KH-coder. The KH-coder is a free software program for quantitative content analysis or text mining developed by Koichi Higuchi. It can also be utilized for computational linguistics. Texts of 13 languages, including Chinese (simplified), English, Japanese, etc., can be analyzed by this software [14]. The KH-coder is a convenient tool to generate co-occurrence networks.
A co-occurrence network shows semantic tags (nodes) with similar appearance patterns, i.e., with high degrees of co-occurrence, connected by lines (edges). In a co-occurrence network, larger nodes indicate higher-frequency semantic tags, and thicker lines mean higher degrees of co-occurrence. To measure the strength of co-occurrence, Jaccard coefficients are calculated for all possible combinations of target semantic tags. The Jaccard coefficient is a measure of the percentage of overlap between sets, defined as:

\[ J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} \tag{1} \]

The Jaccard coefficient takes a value between 0 and 1, with 0 indicating no overlap and 1 complete overlap between the sets [15]. When patterns are used as the unit of analysis, the Jaccard coefficient is computed as follows:

A: Number of patterns that contain both word X and Y.
B: Number of patterns that contain word X only.
C: Number of patterns that contain word Y only.

\[ J(X, Y) = \frac{A}{A + B + C} \tag{2} \]
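As a rough illustration of Eq. (2), the following Python sketch counts A, B, and C over a list of patterns, each represented as a set of semantic tags. The function name and the toy data are hypothetical and are not taken from the corpus or from the KH-coder.

```python
def jaccard(patterns, tag_x, tag_y):
    """Jaccard coefficient of two tags, with patterns as the unit of analysis."""
    both = sum(1 for p in patterns if tag_x in p and tag_y in p)          # A
    only_x = sum(1 for p in patterns if tag_x in p and tag_y not in p)    # B
    only_y = sum(1 for p in patterns if tag_y in p and tag_x not in p)    # C
    total = both + only_x + only_y
    # If neither tag occurs at all, report no overlap instead of dividing by zero.
    return both / total if total else 0.0


# Toy patterns, invented for illustration only (one noun tag and one verb tag each).
toy_patterns = [
    {"Cf", "Id"},
    {"Cf", "Id"},
    {"Cf", "Ii"},
    {"Ab", "Id"},
    {"Ab", "Kb"},
]

print(jaccard(toy_patterns, "Cf", "Id"))
```

Here Cf and Id share two patterns, while each occurs once without the other, so A = 2, B = 1, and C = 1.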

Aiming to discover the most significant semantic tags and determine typical usages of the pattern “bǎ+NP+VP”, the present study utilizes the KH-coder to generate co-occurrence networks for visualization of semantic relations.

2.3 Procedure

Data Extraction and Selection. First, Python is used to retrieve concordance lines of bǎ (把) from the 2.2-million-word corpus. In different contexts, bǎ can be used as a verb, a noun morpheme, a measure word, a preposition, or an auxiliary in Chinese. Because this study is interested in the pattern “bǎ+NP+VP”, only concordance lines containing the prepositional bǎ are selected and analyzed. The next step is to import the retrieved data into Excel. For each instance of “bǎ+NP+VP” in the corpus, the head words (N/V) are extracted from the phrases (NP/VP). The resulting Excel document has two columns: the first lists the N items of this pattern, and the second lists the V items. For example, in the concordance line “他把崭新的茶杯放在桌上。” (He put the brand-new cup on the table.), the noun “茶杯” (cup) and the verb “放” (put) are extracted and listed in one row of the Excel document.

Semantic Annotation. Nouns and verbs extracted from the patterns of “bǎ+NP+VP” are manually annotated according to the semantic system of the New Synonym Dictionary. For instance, the noun “茶杯” (cup) is annotated as Cf 13, and the verb “放” (put) is annotated as Id 12. In the tag Cf 13, C represents “man-made entities and tools”, Cf means “utensil” and, more specifically, Cf 13 refers to “tableware”. Likewise, in the tag Id 12, I represents “physical behavior”, Id means “action of upper limbs”, and the whole tag Id 12 refers to “put or hang”. For polysemous words that are classified into different categories or subcategories of the semantic system, tags are annotated according to their contextual meaning by two native speakers with the assistance of the Modern Mandarin Dictionary [16].

Clustering and Statistical Analysis. The following step is to calculate the frequency of semantic tags at the first level (e.g., C and I) and at the second level (e.g., Cf and Id) for nouns and verbs, respectively. SPSS is utilized to explore the semantic associations between nouns and verbs in the patterns of “bǎ+NP+VP”. Subsequently, these semantic tags are imported into the KH-coder, and co-occurrence networks are generated for further analysis.
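The selection and counting steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the corpus lines are assumed to be space-separated `word/tag` tokens, the preposition tags `p` and `pba` are assumed names, and both helper functions are hypothetical rather than the authors' scripts.

```python
from collections import Counter


def prepositional_ba_lines(tagged_lines, prep_tags=("p", "pba")):
    """Keep lines in which 把 carries a preposition tag (tag names are assumed)."""
    kept = []
    for line in tagged_lines:
        for token in line.split():
            word, _, tag = token.rpartition("/")
            if word == "把" and tag in prep_tags:
                kept.append(line)
                break
    return kept


def split_tag(tag):
    """Return the first- and second-level codes of a tag such as 'Cf 13'."""
    code = tag.replace(" ", "")
    return code[0], code[:2]


def level_frequencies(tags):
    """Count first- and second-level codes over a list of third-level tags."""
    first, second = Counter(), Counter()
    for tag in tags:
        level1, level2 = split_tag(tag)
        first[level1] += 1
        second[level2] += 1
    return first, second


# Toy input: the example sentence from the text with assumed part-of-speech tags,
# plus a line in which 把 is a measure word and should be dropped.
lines = ["他/r 把/p 崭新/a 的/u 茶杯/n 放/v 在/p 桌/n 上/f 。/w", "一/m 把/q 椅子/n"]
print(prepositional_ba_lines(lines))

# "Cf 13" comes from the text; "Cf 02" and "Ab 01" are made-up codes.
noun_first, noun_second = level_frequencies(["Cf 13", "Cf 02", "Ab 01"])
print(noun_first, noun_second)
```

Counting at the first two levels only requires the leading characters of each tag, which is why the third-level number is simply ignored here.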

3 Results and Discussion

The results show that, in our corpus, bǎ (把) appears in a total of 4,543 concordance lines, including 4,336 prepositional bǎ and 207 bǎ of other functions (i.e., verb, noun, or measure word). It is worth noting that among 4,336 valid patterns of “bǎ+NP+VP”, only 1,985 patterns are from the written corpus, while the other 2,351 patterns are from the spoken one, which reflects the higher frequency of this pattern in spoken language.

3.1 Semantic Distribution of N and V at the First Level

Hunston and Francis [10] demonstrated that words that frequently co-occur with a pattern tend to group in certain aspects of meaning. In other words, particular patterns are associated with words with particular meanings. This research explores the pattern “bǎ+NP+VP” through semantic analysis of the groups of nouns and verbs attached to this pattern. In this part, the semantic distribution of N and V at the first level is discussed, and how semantic tags of N and V co-occur with each other is also analyzed.

Table 2. Semantic distribution of N and V at the first level

Semantic category                  N        %         V        %
A [human beings]                   1,282    29.57%    -        -
B [natural entities]               422      9.73%     -        -
C [man-made entities and tools]    1,099    25.35%    -        -
D [abstract entities]              1,021    23.55%    -        -
E [events]                         117      2.70%     -        -
F [time]                           32       0.74%     -        -
G [space]                          59       1.36%     -        -
H [number]                         67       1.55%     -        -
I [physical behavior]              16       0.37%     1,773    40.89%
J [psychological behavior]         35       0.81%     167      3.85%
K [social activities]              73       1.68%     1,578    36.39%
L [phenomena and condition]        29       0.67%     719      16.58%
M [properties]                     84       1.94%     -        -
N [relation]                       -        -         99       2.28%

Table 2 indicates that the pattern “bǎ+NP+VP” has a tendency to co-occur with certain types of meaning groups of N and V, respectively. Specifically, approximately 88.2% of nouns extracted from the 4,336 patterns refer to human beings (29.57%), man-made entities and tools (25.35%), abstract entities (23.55%), and natural entities (9.73%). For the general distribution of V in this pattern, the table shows that 93.86% of verbs fall in the three semantic categories of physical behavior (40.89%), social activities (36.39%), and phenomena and condition (16.58%). This semantic distribution indicates the variety of nouns and the centralization of verbs in the same pattern.

Table 3. Results of Chi-square test

                      Value       df    Asymptotic significance (2-sided)
Pearson chi-square    418.928 a   12    0.000
Likelihood ratio      424.217     12    0.000
N of valid cases      3,824

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 8.83.


The next task is to determine the interaction between N and V in this pattern. Table 6 in the Appendix shows the N*V cross tabulation of all 4,336 patterns, but several cells have an expected count of less than 5. For the convenience and accuracy of chi-square tests, patterns comprising nouns tagged A, B, C, D and verbs tagged I, J, K, L, N are selected for further analysis, which account for 88.19% of all of the patterns (3,824 of 4,336). The chi-square test in Table 3 shows that the semantic categories of nouns and verbs in this pattern are strongly associated (χ² = 418.928, df = 12, p < 0.001). This result provides evidence for the assertion of pattern grammar that syntax and sense are associated, and a pattern tends to co-occur with lexical items sharing meanings of an aspect [17]. In other words, a particular pattern selects certain lexical items constrained in meanings. Only words that share an aspect of meaning groups have a possibility of co-occurring with the pattern “bǎ+NP+VP”.

Table 4. Distribution of semantic tags in this pattern

N        Statistic            V: I      J       K       L       N       Total
A        Count                432       49      587     176     38      1282
         Adjusted residual    −8.0      0.7     8.4     −1.7    2.7
B        Count                260       6       88      63      5       422
         Adjusted residual    8.3       −2.5    −7.1    −0.1    −1.4
C        Count                671       15      293     112     8       1099
         Adjusted residual    14.5      −4.6    −8.1    −5.4    −3.7
D        Count                272       65      429     226     29      1021
         Adjusted residual    −12.2     5.7     4.3     7.3     2.0
Total    Count                1635      135     1397    577     80      3824
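The Pearson chi-square statistic can be recomputed from the observed counts in Table 4 with plain Python. This is a minimal sketch rather than the SPSS procedure used in the study; the function name is hypothetical, and only the counts are taken from the table.

```python
# Observed counts from Table 4.
# Rows: noun categories A-D; columns: verb categories I, J, K, L, N.
observed = {
    "A": [432, 49, 587, 176, 38],
    "B": [260, 6, 88, 63, 5],
    "C": [671, 15, 293, 112, 8],
    "D": [272, 65, 429, 226, 29],
}


def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom, N) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            # Expected count under independence of noun and verb categories.
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df, n


chi2, df, n = pearson_chi_square(observed)
print(round(chi2, 3), df, n)
```

With four noun categories and five verb categories the test has (4 − 1) × (5 − 1) = 12 degrees of freedom, and the statistic should land close to the value reported in Table 3.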

It is also noted that nouns of four different semantic categories have their own statistically-preferred verbs. As shown in Table 4, the results of adjusted residual imply that nouns’ semantic categories exert an impact on the choice of verbs. More specifically, according to the adjusted residuals, the following four essential functions of this pattern can be determined:

• bǎ + N [man-made entities and tools] + V [physical behavior]
• bǎ + N [human beings] + V [social activities]
• bǎ + N [natural entities] + V [physical behavior]
• bǎ + N [abstract entities] + V [phenomena and condition]
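The adjusted residuals behind these four functions can be approximated with the standard formula (observed − expected) / sqrt(expected × (1 − row share) × (1 − column share)). The sketch below assumes that this is the definition SPSS applied; the function name is hypothetical, and the counts are those of Table 4.

```python
from math import sqrt

# Observed counts from Table 4 (columns: I, J, K, L, N).
observed = {
    "A": [432, 49, 587, 176, 38],
    "B": [260, 6, 88, 63, 5],
    "C": [671, 15, 293, 112, 8],
    "D": [272, 65, 429, 226, 29],
}
verb_columns = ["I", "J", "K", "L", "N"]


def adjusted_residuals(table):
    """Adjusted standardized residual for every cell of a contingency table."""
    labels = list(table)
    rows = [table[label] for label in labels]
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    result = {}
    for r, label in enumerate(labels):
        result[label] = []
        for c, obs in enumerate(rows[r]):
            expected = row_totals[r] * col_totals[c] / n
            variance = expected * (1 - row_totals[r] / n) * (1 - col_totals[c] / n)
            result[label].append((obs - expected) / sqrt(variance))
    return result


residuals = adjusted_residuals(observed)
for noun, values in residuals.items():
    # The largest positive residual marks the most over-represented verb category.
    best = max(range(len(values)), key=lambda i: values[i])
    print(noun, verb_columns[best], round(values[best], 1))
```

A positive residual marks a noun and verb pairing that occurs more often than independence would predict. For each noun category the largest positive residual falls on the verb category named in the list above; in particular, the cell for natural entities (B) with physical behavior (I) comes out positive.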

It is obvious that the pattern “bǎ+NP+VP” has a restriction on the choice of N(P), and then the choice of V(P) is restricted by the meaning of N(P). Although lexical items vary considerably, their meaning groups are much more limited. The meanings of nouns and verbs are connected, and simultaneously conﬁned by this pattern.

3.2 Key Semantic Subcategories at the Second Level

In response to the second research question, frequencies of semantic tags at the second level are calculated first. According to Biber [18], the threshold of frequency, concerning whether it can be regarded as a stable collocation, is 10 per one million words. Since there are 2.2 million words in our corpus, semantic tags that occur at least 22 times at the second level are counted for the analysis. It is found that 45 subcategories of nouns have a frequency of more than or equal to 22, while only 28 subcategories of verbs are eligible, which confirms the variety of nouns and the centralization of verbs in this pattern. The top-15 subcategories of nouns and verbs are listed in Table 5, and all of the subcategories with a frequency of more than or equal to 22 are presented in Table 7 in Appendix.

Table 5. Frequency of semantic tags at the second level

Noun                            Example                   Freq.   %
Ab (person)                     tā (he)                   715     16.5
Cf (utensil)                    bēi zi (cup)              257     5.9
Bf (body part and organ)        shǒu (hand)               195     4.5
Da (situation)                  qíng kuàng (situation)    160     3.7
Dg (science and subject)        yǔ yán (language)         146     3.4
Ax (person’s name)              lǐ léi (Li Lei)           136     3.1
Cd (food and drug)              fàn (meal)                131     3.0
Ck (creation)                   shū (book)                131     3.0
Cm (money)                      qián (money)              111     2.6
Cc (clothing and ornaments)     dà yī (overcoat)          108     2.5
Ak (family and relative)        qī zǐ (wife)              106     2.4
Dd (function)                   yōu diǎn (advantage)      97      2.2
Dn (economy)                    lì rùn (profit)           94      2.2
Aa (form of address)            shì mín (citizen)         89      2.1
Bb (substance)                  shuǐ (water)              86      2.0

Verb                                Example                   Freq.   %
Id (action of upper limbs)          tuī (push)                904     20.8
Kb (social activity)                yāo qǐng (invite)         486     11.2
Lj (static relation)                dāng zuò (regard as)      307     7.1
Kc (daily routine)                  dǎ sǎo (clean up)         287     6.6
Ka (language activity)              shuō (speak)              232     5.4
Ib (passing of title)               gěi (give)                223     5.1
Ii (action with tools)              jiǎn (cut off)            145     3.3
Ic (change of properties)           gǎi biàn (change)         140     3.2
Kk (public administration)          ān pái (arrange)          128     3.0
Kg (education and research)         fān yì (translate)        116     2.7
Kf (transportation)                 sòng (transport)          110     2.5
Ld (state of objects)               hùn hé (mix)              107     2.5
Ke (economic activity)              mǎi (buy)                 104     2.4
Le (state of affairs)               wán chéng (complete)      103     2.4
Ie (action of five sense organs)    chī (eat)                 102     2.4


As shown above, Ab (person) and Id (action of upper limbs) are the most frequent semantic subcategories of noun and verb, which indicates that the pattern “bǎ+NP+VP” is frequently used to describe people and is concerned with actions of the upper limbs, especially the hands. However, frequency by itself cannot fully elucidate the essential role of these two semantic subcategories in this pattern. Consequently, the method of co-occurrence network is utilized to seek further confirmation.

[Figure: network diagram of second-level semantic tags, with legends for Coefficient (0.025 to 0.100) and Frequency (250 to 750).]

Fig. 1. Co-occurrence network of semantic tags at the second level based on centrality [N = 73 (number of semantic tags), E = 781 (number of edges), centrality type: Eigenvector.]
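The eigenvector centrality named in the caption of Fig. 1 can be illustrated with a small power-iteration sketch. This is not the igraph routine called by the KH-coder; the function name and the toy graph are hypothetical, and the scores are scaled so that the most central node receives 1.

```python
def eigenvector_centrality(adjacency, iterations=200, tol=1e-10):
    """Power iteration on an undirected graph given as {node: set of neighbours}."""
    nodes = sorted(adjacency)
    score = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Adding the node's own score iterates on A + I, which has the same
        # eigenvectors as A but does not oscillate on bipartite graphs.
        new = {v: score[v] + sum(score[u] for u in adjacency[v]) for v in nodes}
        top = max(new.values())
        new = {v: s / top for v, s in new.items()}
        if max(abs(new[v] - score[v]) for v in nodes) < tol:
            return new
        score = new
    return score


# Toy graph, invented for illustration: a verb tag linked to several noun tags.
toy_graph = {
    "id": {"cf", "cc", "bf", "ab"},
    "cf": {"id"},
    "cc": {"id"},
    "bf": {"id"},
    "ab": {"id", "kb"},
    "kb": {"ab"},
}

centrality = eigenvector_centrality(toy_graph)
for node in sorted(centrality, key=centrality.get, reverse=True):
    print(node, round(centrality[node], 3))
```

In the toy graph, "cf" has a single link but still ranks above "kb", because its only neighbour is the well-connected hub: a node's score depends on how central its neighbours are, not only on how many it has.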

This co-occurrence network presented in Fig. 1 is based on centrality from social network analysis, i.e., determination of the centrality of the role that each semantic tag plays in the network. Here, circles representing nodes are in light yellow to blue color, indicating their ascending degree of centrality. The centrality type eigenvector, which is also termed Bonacich's centrality, means that one's centrality is a function of how many connections one has and how many connections the actors in the neighborhood have [19]. In other words, the more connections the actors in your neighborhood have, the more central you are. The KH-coder uses igraph to calculate these centrality values. In total, there are 73 nodes representing different semantic subcategories, including 45 subcategories of nouns and 28 of verbs. The result demonstrates that the significance of Ab (person) and Id (action of upper limbs) not only depends on their high frequency, but more importantly, lies in the central position regarding the function of connecting other semantic subcategories in this pattern.

3.3 Typical Usages of the Pattern “bǎ + N + V”

To complete the determination of typical usages of the pattern “bǎ+NP+VP”, the method of co-occurrence network is used again. In addition to centrality, the KH-coder can also utilize igraph to generate co-occurrence networks based on community detection. Since plotting all co-occurrence information may simply fill the diagram with lines, it is helpful to indicate which edges are most important in a network. Their importance depends on whether or not they are part of the minimum spanning tree, which is based on the strength of co-occurrence using Prim’s algorithm. To reduce the diagram to a simple network, only edges forming the minimum spanning tree are selected and displayed.

[Figure: network diagram of second-level semantic tags with Jaccard coefficients on the edges, with legends for Subgraph (01 to 06), Coefficient (0.025 to 0.100), and Frequency (250 to 750).]

Fig. 2. Co-occurrence network of semantic tags at the second level based on community detection [N = 73 (number of semantic tags), E = 72 (number of edges), community detection type: Modularity.]
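The reduction to a spanning tree, which leaves N − 1 = 72 edges for the 73 tags in Fig. 2, can be sketched with Prim’s algorithm. The sketch assumes that the cost of an edge is 1 minus its Jaccard coefficient, so that the strongest co-occurrences are kept; the source does not state this weighting, and the function name and toy values are hypothetical.

```python
import heapq


def strongest_spanning_tree(nodes, weighted_edges):
    """Prim's algorithm with cost = 1 - Jaccard (an assumed weighting)."""
    neighbours = {v: [] for v in nodes}
    for u, v, jaccard in weighted_edges:
        cost = 1.0 - jaccard
        neighbours[u].append((cost, u, v))
        neighbours[v].append((cost, v, u))
    start = nodes[0]
    visited = {start}
    frontier = list(neighbours[start])
    heapq.heapify(frontier)
    tree = []
    while frontier and len(visited) < len(nodes):
        cost, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # this edge would close a cycle
        visited.add(v)
        tree.append((u, v, round(1.0 - cost, 10)))
        for edge in neighbours[v]:
            if edge[2] not in visited:
                heapq.heappush(frontier, edge)
    return tree


# Toy network with invented Jaccard coefficients (not the values of Fig. 2).
toy_nodes = ["id", "cf", "ii", "cc"]
toy_edges = [
    ("id", "cf", 0.11),
    ("id", "ii", 0.07),
    ("cf", "ii", 0.03),
    ("id", "cc", 0.05),
    ("cf", "cc", 0.04),
]

tree = strongest_spanning_tree(toy_nodes, toy_edges)
print(tree)
```

For a connected network the tree always has one edge fewer than the number of nodes, which is why the five toy edges shrink to three.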

Modularity refers to the fast greedy community detection method. This function attempts to find dense subgraphs, also called communities, by directly optimizing a modularity score [20]. As shown in Fig. 2, there are six subgraphs in different colors representing six communities, centered around Id (Subgraph 01), Ib (Subgraph 02), Ka (Subgraph 03), Ab (Subgraph 04), Lg (Subgraph 05), and Ij (Subgraph 06). Considering both frequency and Jaccard coefficient, this co-occurrence network helps to confirm four typical usages of the pattern “bǎ+NP+VP”:

– Typical Usage 1
Nouns: Cf (utensil), Cc (clothing and ornaments), Bf (body part and organ), Cd (food and drug)
Verbs: Id (action of upper limbs), Ii (action with tools), Ld (state of objects), Ie (action of five sense organs)
Example: bǎ zì xíng chē tuī zǒu (wheel the bicycle away), bǎ kù zi chuān shàng (put on the pants), bǎ shǒu bì tái qǐ (lift one’s arm), bǎ cài duān shàng lái (serve the dishes), bǎ zhuō zi xiū hǎo (repair the table), bǎ shǒu shēn kāi (stretch out one's hand), bǎ fàn chī le (have a meal), etc.

– Typical Usage 2
Nouns: Ck (creation), Dh (information), Cm (money)
Verbs: Ib (passing of title), Ke (economic activity)
Example: bǎ shū gěi (give the book), bǎ xìn gěi (give the letter), bǎ qián gěi (give money), bǎ qián huā le (spend money), etc.

– Typical Usage 3
Nouns: Da (situation), Dg (science and subject), Ea (state of affairs)
Verbs: Ka (language activity), Le (state of affairs), Kg (education and research), Kk (public administration)
Example: bǎ qíng kuàng shuō míng (explain the situation), bǎ huà shuō wán (speak sth.), bǎ shì qíng jiǎng (tell sb. sth.), bǎ mìng yùn lián jiē (share the same destiny), bǎ shù xué bǔ xí (tutor sb. in math), bǎ shì chǔ lǐ (deal with sth.), etc.

– Typical Usage 4
Nouns: Ax (person’s name), Ak (family and relative), Ab (person)
Verbs: Kb (social activity), Lj (static relation), Kc (daily routine), Kf (transportation)
Example: bǎ tā qǐng lái (invite him), bǎ wǒ dāng chéng (regard me as), bǎ tā yǎng dà (bring him up), bǎ wǒ sòng qù (pick me up), bǎ lǐ léi piàn le (mislead Li Lei), bǎ ér zi jiè shào (introduce the son), etc.
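The six subgraphs in Fig. 2, from which the usages above are read off, result from optimizing a modularity score. As a hedged illustration, the sketch below only evaluates the modularity Q of a given partition of an unweighted graph; it does not perform the greedy search itself, and the function name and toy graph are hypothetical.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    member = {v: i for i, group in enumerate(communities) for v in group}
    internal = [0] * len(communities)     # edges with both ends in the community
    degree_sum = [0] * len(communities)   # summed degrees of the community's nodes
    for u, v in edges:
        degree_sum[member[u]] += 1
        degree_sum[member[v]] += 1
        if member[u] == member[v]:
            internal[member[u]] += 1
    return sum(
        internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


# Toy graph, invented for illustration: two triangles joined by a single edge.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
]

q_split = modularity(toy_edges, [{"a", "b", "c"}, {"d", "e", "f"}])
q_single = modularity(toy_edges, [{"a", "b", "c", "d", "e", "f"}])
print(round(q_split, 4), round(q_single, 4))
```

Splitting the toy graph into its two triangles scores higher than keeping all nodes in one community, which is the kind of gain a greedy method looks for when it decides which groups to merge.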

4 Conclusion

This study explores the pattern “bǎ+NP+VP” from the perspective of semantic analysis. The results indicate that this pattern has a restriction on the choice of N(P) and V(P). In addition, semantic categories of head nouns and verbs are associated with each other, and simultaneously confined by this pattern. Ab (person) and Id (action of upper limbs) are the most important semantic subcategories of noun and verb in this pattern. Finally, four typical usages of the pattern “bǎ+NP+VP” are summarized through co-occurrence networks, which focus on the action of upper limbs, passing of title, person, and language activity, respectively.


The signiﬁcance of the present study comprises two aspects. In the linguistic aspect, a holistic understanding of the pattern “bǎ+NP+VP” is achieved from the perspective of semantic analysis. This study also provides new evidence for the interaction of structure, meaning, and function in a pattern. In the practical ﬁeld of education, this study may provide valuable guidance and reference for the teaching and learning of Chinese as a foreign language.

Appendix

Table 6. N * V Cross tabulation (complete version). Rows are noun categories (n), columns are verb categories (v); each cell gives the count followed by the expected count in parentheses.

n \ v    I                J              K                L              N            Total
A        432 (524.2)      49 (49.4)      587 (466.6)      176 (212.6)    38 (29.3)    1282 (1282.0)
B        260 (172.6)      6 (16.3)       88 (153.6)       63 (70.0)      5 (9.6)      422 (422.0)
C        671 (449.4)      15 (42.3)      293 (400.0)      112 (182.2)    8 (25.1)     1099 (1099.0)
D        272 (417.5)      65 (39.3)      429 (371.6)      226 (169.3)    29 (23.3)    1021 (1021.0)
E        17 (47.8)        10 (4.5)       57 (42.6)        30 (19.4)      3 (2.7)      117 (117.0)
F        9 (13.1)         2 (1.2)        18 (11.6)        3 (5.3)        0 (0.7)      32 (32.0)
G        28 (24.1)        2 (2.3)        17 (21.5)        11 (9.8)       1 (1.3)      59 (59.0)
H        29 (27.4)        5 (2.6)        25 (24.4)        7 (11.1)       1 (1.5)      67 (67.0)
I        3 (6.5)          2 (0.6)        5 (5.8)          5 (2.7)        1 (0.4)      16 (16.0)
J        10 (14.3)        1 (1.3)        10 (12.7)        12 (5.8)       2 (0.8)      35 (35.0)
K        10 (29.8)        2 (2.8)        15 (26.6)        40 (12.1)      6 (1.7)      73 (73.0)
L        5 (11.9)         3 (1.1)        7 (10.6)         13 (4.8)       1 (0.7)      29 (29.0)
M        27 (34.3)        5 (3.2)        27 (30.6)        21 (13.9)      4 (1.9)      84 (84.0)
Total    1773 (1773.0)    167 (167.0)    1578 (1578.0)    719 (719.0)    99 (99.0)    4336 (4336.0)
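The expected counts in Table 6 appear to follow the usual independence formula for a contingency table, namely row total × column total / grand total. The following Python sketch recomputes them from the observed counts as a sanity check; the names `OBSERVED`, `expected_counts` and `cell` are introduced here for illustration and are not part of the original study.

```python
# Observed counts from Table 6: one list per verb category (I, J, K, L, N),
# ordered by noun category A..M.
NOUNS = list("ABCDEFGHIJKLM")
OBSERVED = {
    "I": [432, 260, 671, 272, 17, 9, 28, 29, 3, 10, 10, 5, 27],
    "J": [49, 6, 15, 65, 10, 2, 2, 5, 2, 1, 2, 3, 5],
    "K": [587, 88, 293, 429, 57, 18, 17, 25, 5, 10, 15, 7, 27],
    "L": [176, 63, 112, 226, 30, 3, 11, 7, 5, 12, 40, 13, 21],
    "N": [38, 5, 8, 29, 3, 0, 1, 1, 1, 2, 6, 1, 4],
}


def expected_counts(observed):
    """Expected counts under independence: row_total * col_total / grand_total."""
    row_totals = {verb: sum(row) for verb, row in observed.items()}
    col_totals = [sum(col) for col in zip(*observed.values())]
    grand_total = sum(row_totals.values())
    return {
        verb: [row_totals[verb] * c / grand_total for c in col_totals]
        for verb in observed
    }


EXPECTED = expected_counts(OBSERVED)


def cell(verb, noun):
    """Return (observed count, expected count rounded to one decimal)."""
    j = NOUNS.index(noun)
    return OBSERVED[verb][j], round(EXPECTED[verb][j], 1)


if __name__ == "__main__":
    for verb, noun in [("I", "A"), ("I", "C"), ("L", "K")]:
        obs, exp = cell(verb, noun)
        print(f"v={verb}, n={noun}: observed {obs}, expected {exp}")
```

Cells whose observed count lies well above the expected count (for example verb I with noun C, or verb L with noun K) are the combinations that occur more often than independence would predict.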

A Corpus-Based Semantic Analysis of the Pattern ‘bǎ+NP+VP’

Table 7. Frequency of semantic tags at the second level (complete version)

Noun  Freq.  %       Verb  Freq.  %
Ab    715    16.5    Id    904    20.8
Cf    257    5.9     Kb    486    11.2
Bf    195    4.5     Lj    307    7.1
Da    160    3.7     Kc    287    6.6
Dg    146    3.4     Ka    232    5.4
Ax    136    3.1     Ib    223    5.1
Cd    131    3.0     Ii    145    3.3
Ck    131    3.0     Ic    140    3.2
Cm    111    2.6     Kk    128    3.0
Cc    108    2.5     Kg    116    2.7
Ak    106    2.4     Kf    110    2.5
Dd    97     2.2     Ld    107    2.5
Dn    94     2.2     Ke    104    2.4
Aa    89     2.1     Le    103    2.4
Bb    86     2.0     Ie    102    2.4
Ea    81     1.9     Lh    84     1.9
Db    71     1.6     Ia    76     1.8
Ch    68     1.6     Il    73     1.7
Mf    65     1.5     Jb    64     1.5
Dv    61     1.4     Ij    61     1.4
Ai    50     1.2     Na    50     1.2
Ca    48     1.1     Nc    45     1.0
Do    48     1.1     Li    41     0.9
Ac    47     1.1     Lg    36     0.8
Dk    43     1.0     Jf    34     0.8
Be    42     1.0     Kh    34     0.8
Bd    40     0.9     Kn    25     0.6
Dq    40     0.9     Kd    22     0.5
Ce    39     0.9
Dh    38     0.9
Cb    36     0.8
Ba    35     0.8
Ci    35     0.8
Eb    35     0.8
Dc    31     0.7
Df    31     0.7
Cj    30     0.7
Ga    30     0.7
An    28     0.6
Cn    27     0.6
Dl    25     0.6
Ar    23     0.5
Di    23     0.5
Al    22     0.5
Dj    22     0.5


Cross-Categorial Behaviors of Mandarin Physical Contact Verbs: A Frame-Based Constructional Analysis of qiāodǎ 敲打

Tianqi He1,2 and Meichun Liu2

1 College of Literature and Journalism, Central South University, Changsha, China
2 Department of Linguistics and Translation, City University of Hong Kong, Kowloon Tong, Hong Kong

Abstract. This study conducts a frame-based and corpus-based analysis of the cross-categorial behaviors of the prototypical Physical Contact verb qiāodǎ 敲打 ‘knock’ and explores the cognitive motivations involved in the semantic extension process. In addition to the well-accepted mapping from physical impact to mental impact, qiāodǎ 敲打 ‘knock’ also implies the result or manipulative purpose of mental impact. It can therefore be further extended to the Communication or Manipulation frame by syntactically coding a verbal element or an intended act as participant roles. The semantic extension of qiāodǎ 敲打 ‘knock’ is motivated not only by conceptual metaphor but also by metonymy-based metaphor, and it is accompanied by semantic-syntactic changes. This study analyzes and accounts for the cross-categorial behaviors of qiāodǎ 敲打 ‘knock’ and discusses the cognitive motivations and semantic-syntactic criteria for conceptual transfer, which constitutes a major task in the construction of Mandarin VerbNet and plays a significant role in the study of verbal semantics.

Keywords: Physical contact verbs · Cross-categorial extension · Metonymy-based metaphor · From contact to communication

1 Introduction

Cross-categorial verbs, also termed polysemous or multi-faceted verbs, are verbs with multiple meanings and usages. It is generally considered that cross-categorial verbs result from semantic extensions, which are mainly motivated by cognitive mechanisms or pragmatic contexts. Previous studies mainly explored the semantic transfers of certain verb classes, along with the cognitive, syntactic and contextual mechanisms involved. Among them, some scholars focused on the cross-categorial behaviors of Physical Contact verbs and summarized their possible semantic extensions and cognitive motivations.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 84–95, 2022. https://doi.org/10.1007/978-3-031-06703-7_7

1.1 Previous Works on Semantic Extensions

Semantic extensions are both universal and language-specific, as attested by numerous typological studies and cross-linguistic comparisons. [1] compared the diachronic usage of Swedish ‘gå’ and its English equivalent ‘go’, and further discussed their language-specific characteristics, while [2] argued, on the basis of linguistic evidence from Australian Perception verbs, that a semantic domain can be both universal and relativistic (culture-specific). In the literature, most attention has been paid to the cognitive mechanisms for semantic extension of various verb classes, including Perception verbs [3–5], Spatial verbs [6], Giving verbs [7], Motion verbs [8], Conflict Action verbs [9], Appearance verbs [10], Speech verbs [11], and Emotion verbs [12]. Among them, [5] examined perception verbs such as see and listen in 30 languages and concluded that “colexification of perception and cognition-related meanings” is prevalent across languages. [7] compared verbs of giving in Thai and Mandarin and found that give-verbs in the two languages share a similar range of meanings, from coding possession transfer to dative, benefactive, and causative relations. These works laid a solid foundation, with a diverse spectrum of case studies, for research on cross-categorial verbal extensions.

The cognitive mechanisms highlighted in the above works are mainly of two types: metaphor and metonymy. Metaphor accounts for conceptual transfers across domains, normally from external, more accessible domains to internal, less accessible ones, while metonymy accounts for conceptual associations within the same domain [13–15]. Other works provide accounts of the conventional morphological or lexical rules involved [16], of the intricate interactions between verbs and associated syntactic expressions [17–19], or of corpus collocations and pragmatic factors [20]. In particular, the sources for extension may be accounted for in terms of the surrounding contexts [21], prototypes [22, 23], or image schemas [22, 24].

1.2 Semantic Extension of Physical Contact Verbs

PHYSICAL CONTACT is one of the most basic bodily experiences and is considered a fundamental semantic prime [25]. Physical Contact verbs are therefore undoubtedly the most fundamental verb category in human language. Numerous studies have noted the polysemous nature of Physical Contact verbs and discussed the potential semantic extension paths of the two distinguishable subtypes [17]: verbs of Contact (e.g., touch) and verbs of Contact by Impact (e.g., hit). It has been proposed that touch may easily be used as an Emotion verb, as in ‘I’m deeply touched’ [3], or as a Perception verb, as in ‘I touch/feel the hardness of the table’ [26]. On the other hand, some scholars proposed that Contact by Impact verbs such as ‘hit’ are closely related to Motion [27, 28] and Creation [29] through metaphorical or metonymic extension. Other works focused on the polysemous semantics of the representative Mandarin Physical Contact verb dǎ 打 ‘hit’ and argued that dǎ 打 ‘hit’ can be used as a Communication [30], Produce [15] and even Eat verb [31]. However, less attention has been paid to the conceptual transfer from physical contact to communication manifested in Mandarin Physical Contact verbs, to the corresponding semantic-syntactic behaviors, and to the cognitive motivations involved in such extensions. Thus, this study is dedicated to the novel polysemous relations observed in Mandarin Physical Contact (MPC) verbs [32, 33], with a pilot study that explores the potential semantic extensions of its representative member qiāodǎ 敲打 ‘knock’. Taking a corpus-based and frame-based constructional approach, the analyses are carried out on the basis of the Center for Chinese Linguistics Corpus (CCL) (http://ccl.pku.edu.cn:8080/ccl_corpus/), with reference to the semantic framework provided by Mandarin VerbNet (http://mega.lt.cityu.edu.hk/*yufechen/#/).

1.3 Frame-Based Constructional Approach

In tackling the cross-categorial behavior of qiāodǎ 敲打 ‘knock’, we adopt a hybrid analytical framework based on Frame Semantics [34, 35] and Construction Grammar [36–38]. The approach incorporates both semantic and syntactic analyses in defining the event frames that verbs are anchored in. It uses a hierarchically structured frame taxonomy to organize various classes of verbs with collo-constructional definitions. Each semantic frame is represented by a conceptual schema with core frame elements (i.e., participant roles characteristic of the frame), a set of defining constructions, and a list of salient collocational features [39]. Frames are further structured into a hierarchical scheme [40]: Archi-frame (major semantic domains with a well-defined conceptual schema), Primary frame (subparts of the schema with varied sets of core frame elements), and Basic frame (basic-level categories with distinct semantic profiles). Moreover, the approach has been adopted to build a cognitively motivated, linguistically valid, computationally feasible and Chinese-appropriate lexical semantic database called the Mandarin VerbNet (http://mega.lt.cityu.edu.hk/*yufechen/#/, see [41, 42]), with the aim of providing ‘lexical semantic information based on grammatical descriptions and anchored in linguistic theories’ [39].
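As a rough illustration of the frame representation described above, the following Python sketch models a frame as a record holding core frame elements, defining constructions and collocational features, nested in the three-level hierarchy. The class name `Frame`, its field names and the sample entries are assumptions made for illustration only; they are not taken from the Mandarin VerbNet database.

```python
from dataclasses import dataclass, field
from typing import List

# Hierarchy levels named in the text, from most general to most specific.
LEVELS = ("Archi-frame", "Primary frame", "Basic frame")


@dataclass
class Frame:
    """Hypothetical record for one semantic frame (not the VerbNet schema)."""
    name: str
    level: str
    core_elements: List[str] = field(default_factory=list)
    defining_constructions: List[str] = field(default_factory=list)
    collocational_features: List[str] = field(default_factory=list)
    subframes: List["Frame"] = field(default_factory=list)

    def add_subframe(self, child: "Frame") -> None:
        """Attach a child frame that sits exactly one level below this frame."""
        if LEVELS.index(child.level) != LEVELS.index(self.level) + 1:
            raise ValueError(f"{child.level} cannot be nested under {self.level}")
        self.subframes.append(child)

    def paths(self, prefix=()):
        """Yield every root-to-node path of frame names in this subtree."""
        here = prefix + (self.name,)
        yield here
        for child in self.subframes:
            yield from child.paths(here)


# Illustrative entries only: the labels below are assumptions loosely based on
# the Contact / Contact by Impact distinction mentioned in Sect. 1.2.
physical_contact = Frame(
    name="Physical Contact",
    level="Archi-frame",
    core_elements=["Agent", "Affected Figure"],
)
contact_by_impact = Frame(
    name="Contact by Impact",
    level="Primary frame",
    core_elements=["Agent", "Affected Figure"],
    defining_constructions=["[Agent] + [V] + [Affected Figure]"],
    collocational_features=["impactive manner", "impactive result"],
)
physical_contact.add_subframe(contact_by_impact)
```

Keeping the level check inside `add_subframe` is one possible design choice: it makes a mis-nested frame fail early instead of silently producing an ill-formed taxonomy.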

2 Frame-Based Analysis of qiāodǎ 敲打 ‘Knock’

A frame-based analysis of qiāodǎ 敲打 ‘knock’ is carried out in this section to explore the cross-categorial extension of this verb. It is found that qiāodǎ 敲打 ‘knock’ is mainly used as a Physical Contact verb, with potential transfers to Creation, Emotion, Communication and even Manipulation. Detailed analysis and corpus-based evidence are provided below.

2.1 qiāodǎ 敲打 ‘Knock’ as a Physical Contact Verb

Contact by Impact verbs are generally recognized as crucial members of Mandarin Physical Contact verbs. It has been proposed that Contact by Impact verbs can easily enter transitive patterns, tend to collocate with impactive manner/result, and can encode the salient frame element Affected Figure as either direct object or post-verbal locative complement [33]. Thus, qiāodǎ 敲打 ‘knock’ is considered a prototypical Physical Contact verb in terms of its semantics and construction patterns, as shown in (1). All examples in this article were selected from the Center for Chinese Linguistics Corpus (CCL) (http://ccl.pku.edu.cn:8080/ccl_corpus/).

(1) a. 妻子用胶棒敲打了他的头
Qīzǐ yòng jiāobàng qiāodǎ-le tā de tóu
Wife use glue stick knock-LE his DE head
‘The wife knocked his head with a glue stick.’
b. 人们狠狠地敲打它,一直把它打得粉碎
Rénmen hěnhěn de qiāodǎ tā, yīzhí bǎ tā dǎ de fěnsuì
People violently knock it until BA it knock-DE pieces
‘People knocked it violently until they smashed it to pieces.’
c. 小小的冰球敲打在屋顶上
Xiǎoxiǎo de bīngqiú qiāodǎ-zài wūdǐng shàng
Small ice ball knock-ZAI roof
‘Small ice balls knock on the roof.’

Qiāodǎ 敲打 ‘knock’ in the above examples is a typical Physical Contact verb and can be used in transitive patterns: (1a) encodes the Affected Figure (e.g., tā de tóu 他的头) as direct object; (1b) co-occurs with impactive manner (e.g., hěnhěn de 狠狠地) and result (e.g., fěnsuì 粉碎); (1c) encodes the Affected Figure (e.g., zài wūdǐng shàng 在屋顶上) as the locative complement. However, it is found that qiāodǎ 敲打 ‘knock’ also displays cross-categorial behaviors. This is further discussed in the next section on the basis of corpus data.

2.2 Cross-Categorial Behaviors of qiāodǎ 敲打 ‘Knock’

The conceptual transfers from Physical Contact to other semantic domains such as Emotion, Creation and Communication have been thoroughly explored and supported by cross-linguistic evidence [3, 28, 30]. It is found that qiāodǎ 敲打 ‘knock’ also undergoes the semantic transfer from physical contact to creation by coding the incremental theme as direct object (2):

(2) incremental theme as the direct object:
职工硬是用铁锤敲打出四个大基坑
Zhígōng yìngshì yòng tiěchuí qiāodǎ chū sìgè dàjīkēng.
Worker obstinately-SHI use hammer knock out four foundation pit
‘Workers obstinately hammered out four large foundation pits.’

The above sentence encodes the incremental theme (e.g., sìgè dàjīkēng 四个大基坑) as the direct object, and therefore enters the defining pattern of Create verbs ([Agent] + [CREATE] + [incremental theme]). On the other hand, it is generally accepted that the conceptual transfer from physical contact to mental contact is usually motivated by metaphorical mapping and accompanied by a change of participant roles, as illustrated in (3).


(3) a. physical contact:
几位伊拉克雇员敲打着门板
Jǐwèi yīlākè gùyuán qiāodǎ zhe ménbǎn.
Several Iraqi employee knock-ZHE door
‘Several Iraqi employees are knocking on the door.’
b. mental or emotional contact:
这个触目惊心的数字敲打着他的心
Zhègè chùmùjīngxīn de shùzì qiāodǎzhe tā de xīn.
This shocking-DE number knock-ZHE his heart
‘This shocking number is knocking on his heart.’

As exemplified above, the metaphorical mapping from physical impact (3a) to mental impact (3b) preserves the syntactic pattern, yet the semantic elements involved are mapped from a concrete domain to an abstract domain: [Agent] + [qiāodǎ 敲打] + [*Asp_le] + [Affected Figure] in (3a); [Stimulus] + [qiāodǎ 敲打] + [*Asp_le] + [Experiencer] in (3b). However, in some contexts, the verb qiāodǎ 敲打 ‘knock’ not only expresses mental impact (3b), but also implies a manipulative purpose or result of mental impact, usually a positive one, as shown in (4).

(4) 咱这个人是个粗人, 得有个
Zán zhègè rén shì gè cūrén, děi yǒu gè
I this person SHI a boor should have a
仔细的人敲打咱, 咱才能进步。
zǐxì de rén qiāodǎ zán, zán cái néng jìnbù.
careful-DE person remind me I only can progress
‘I’m a boor and can only make progress under the urging of a careful person.’

Qiāodǎ 敲打 ‘knock’ in the above example encodes both the mental impact exerted by the Agent (e.g., zǐxì de rén 仔细的人) and its potential result (e.g., jìnbù 进步). It implies that the mental impact may cause some cognitive or even actional responses of the Affected Patient (e.g., zán 咱). This subtle semantic difference can be further observed in the following examples, where qiāodǎ 敲打 is used as a verb of verbal interaction, followed directly by a human object as interlocutor (5a), by a clausal complement in the form of a direct quotation (5b), or by a proposed act (5c):

(5) a. with interlocutor as direct object:
我没有用我们已经掌握的具体事实来敲打他
Wǒ méiyǒu yòng wǒmen yǐjīng zhǎngwò de jùtǐ shìshí lái qiāodǎ tā.
I didn’t use our already known DE specific fact LAI remind him
‘I didn’t remind him with the specific facts that we already knew.’
b. with direct quotation:
罗趁机敲打余:“你在我们这里拿了
Luó chènjī qiāodǎ Yú: “nǐ zài wǒmen zhèlǐ ná-le
Luo take the chance remind Yu you ZAI us here take-LE
这么多工程, 你应该有一个说法。”
zhème duō gōngchéng, nǐ yīnggāi yǒu yīgè shuōfǎ.”
So many project you should have one explanation
‘Luo took the chance to remind Yu: you should give us an explanation after taking so many projects from here.’
c. with indirect proposal:
您不一直敲打我要学知识学文化
Nín bú yīzhí qiāodǎ wǒ yào xué zhīshí xué wénhuà.
You not always remind me to learn knowledge learn culture
‘Didn’t you always remind me to learn knowledge and culture?’

As shown above, the conceptual transfer involved here is from physical to verbal contact with a direct object in a simple transitive construction (5a), then from verbal contact to utterance verb with a direct quotation (5b), and then from utterance to coding a manipulative relation with the affected interlocutor and the intended impact or purpose (5c). In sum, apart from the metonymic mapping from physical contact to creation, the semantic extension path of qiāodǎ 敲打 ‘knock’ can be summarized as follows: first, from physical impact (3a) to mental impact (3b) and to manipulative purpose (4); next, from mental impact to verbal impact as a crucial means (5a), and then to verbal impact with a direct quotation (5b); finally, from verbal impact to manipulative impact coding an Affected Patient and a Proposed Act as double objects (5c).

2.3 Corpus Distribution of qiāodǎ 敲打 ‘Knock’

As mentioned above, qiāodǎ 敲打 ‘knock’ can act as a Creation, Emotion, Communication or Manipulation verb, as attested by corpus data. However, these semantic domains are not mutually exclusive, owing to the continuity of conceptual transfers. It is thus not feasible to classify the cross-categorial behaviors into absolutely separate verb frames. A close examination of the corpus data yields a preliminary summary of the distribution of qiāodǎ 敲打, as shown in Table 1.

Table 1. Cross-categorial distributions of qiāodǎ 敲打 ‘knock’ in CCL. (A total of 1026 entries were retrieved from CCL, and 818 (79.7%) of them are typical physical contact usages. Thus, this table only displays the distribution of cross-categorial usage, with a total of 208 entries.)

Verb category                   Frequency         Example
Mental Contact (MC)             45/208 (21.6%)    (3b)
MC + Manipulative (MCM)         85/208 (40.9%)    (4)
MC + Verbal instrument (MCV)    34/208 (16.3%)    (5a)
Telling (T)                     11/208 (5.3%)     (5b)
Manipulative (M)                6/208 (2.9%)      (5c)
Creation (CR)                   27/208 (13%)      (2)
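The proportions in Table 1 can be re-derived from the raw counts. The short Python sketch below does so as a consistency check; the variable and function names (`TABLE_1`, `shares`, and so on) are hypothetical and serve only this illustration.

```python
# Counts of cross-categorial uses of qiāodǎ as reported in Table 1.
TABLE_1 = {
    "Mental Contact (MC)": 45,
    "MC + Manipulative (MCM)": 85,
    "MC + Verbal instrument (MCV)": 34,
    "Telling (T)": 11,
    "Manipulative (M)": 6,
    "Creation (CR)": 27,
}
TOTAL_ENTRIES = 1026     # all entries retrieved from CCL
PHYSICAL_CONTACT = 818   # typical physical contact usages


def shares(counts):
    """Return each category's percentage of the summed counts, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


cross_categorial_total = sum(TABLE_1.values())
percentages = shares(TABLE_1)
physical_share = round(100 * PHYSICAL_CONTACT / TOTAL_ENTRIES, 1)

if __name__ == "__main__":
    print("cross-categorial entries:", cross_categorial_total)
    for name, pct in percentages.items():
        print(f"{name}: {pct}%")
    print(f"physical contact share: {physical_share}%")
```

The six categories add up to the 208 cross-categorial entries, which together with the 818 physical contact usages account for all 1026 retrieved entries.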


The above table can be represented by Fig. 1:

Fig. 1. Corpus-based cross-categorial distributions of qiāodǎ 敲打 ‘knock’.

As shown above, qiāodǎ 敲打 ‘knock’ displays entangled, overlapping semantic distributions. Besides the clear metaphorical mapping from physical contact to mental contact, the conceptual transfers involved are motivated by the joint operation of metaphor and metonymy, or the so-called ‘metonymy-based metaphor’ [43]. Further discussion of the cognitive motivations involved is given in the next section.

3 Cognitive Motivations

It is well documented that metaphor and metonymy are the most fundamental cognitive motivations of semantic extension [13, 44]; however, most conceptual transfers are not triggered by clear-cut metaphor or metonymy but by their combined motivation [45, 46]. Numerous studies have discussed the intricate relationship between the two dominant cognitive mechanisms: [47] adopted the term ‘metaphtonymy’ to refer to ‘metaphor from metonymy’ and ‘metonymy within metaphor’; [23] further argued that metonymy plays a more important role than metaphor in semantic extensions; and [43] proposed ‘metonymy-based metaphor’ and claimed that all metaphors for emotion are motivated by metonymy. The examples are cited and re-numbered below:

(6) a. John’s head drooped (sadly).
b. I am in low spirits.

According to [43], the above examples indicate that the well-known metaphor ‘SADNESS IS DOWN’ is actually motivated by the ‘effect for cause’ metonymy. More specifically, the downward behavior (e.g., head drooped) in (6a) is the effect of an emotional cause (e.g., sadness) and is therefore motivated by the metonymic mapping of ‘effect for cause’. Then, the ‘downward oriented bodily posture’ in (6a) is further reduced (also considered as metonymy in this article) to a purely spatial domain, namely, the ‘downward spatial orientation’ in (6b), which becomes the source domain of the metaphor ‘SADNESS IS DOWN’.

In addition, [47]’s work indicates how metaphor and metonymy may interact and work together in semantic shift, which is also found in the conceptual transfer of qiāodǎ 敲打 ‘knock’ as detailed in Sect. 2. It is proposed that the transfer may still be motivated by a metaphorical mapping from physical impact to mental impact, but it involves further constructional extensions to constructions that are typical of utterance verbs, which take a direct quotation of a message, as well as of manipulative, object-control verbs, which take a direct object and a verbal complement expressing an intended act. It appears that while qiāodǎ 敲打 ‘knock’ lexically encodes manner of contact, what is semantically profiled in (5) is the purpose of contact or the effect of verbal impact. Rather than being purely metaphorical, the semantic extension is also motivated by the metonymic shift from Manner of contact to Purpose of contact, as they are two closely related components in the event of qiāodǎ 敲打. In a sense, the metonymic shift enables the semantic shift from coding impactive action to coding impactive message and purpose, which opens the possibility for further transfers from manipulative mental impact (yellow block in Fig. 2) to Manipulating (red block) and Telling (green block) verbs. The conceptual transfer from mental impact with manipulative purpose to manipulation is motivated by the metonymic-metaphorical mapping from mental purpose to actional response, while the extension to communication also involves a metaphorical mapping from mental impact to verbal impact and a metonymic mapping of ‘utterance for mental purpose’. The semantic transfers of qiāodǎ 敲打 and the cognitive motivations involved can be represented by Fig. 2.

Fig. 2. Cognitively motivated conceptual transfers of qiāodǎ 敲打 ‘knock’.
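The transfers described in this section can be summarized, very roughly, as a small labeled graph. The Python sketch below encodes only the links and motivations stated in the running text; the node names, the `TRANSFERS` table and the helper functions are illustrative assumptions and do not reproduce Fig. 2 itself.

```python
# Each edge (source sense, target sense) maps to the cognitive motivation
# that the text associates with that transfer.
TRANSFERS = {
    ("physical impact", "creation"): "metonymy",
    ("physical impact", "mental impact"): "metaphor",
    ("mental impact", "manipulative mental impact"):
        "metonymy (Manner of contact -> Purpose of contact)",
    ("manipulative mental impact", "manipulating"):
        "metonymic-metaphorical mapping (mental purpose -> actional response)",
    ("manipulative mental impact", "telling"):
        "metaphor (mental -> verbal impact) + metonymy (utterance for mental purpose)",
}


def reachable(start):
    """Return every sense reachable from `start`, in breadth-first order."""
    seen, queue = [], [start]
    while queue:
        node = queue.pop(0)
        for src, dst in TRANSFERS:
            if src == node and dst not in seen:
                seen.append(dst)
                queue.append(dst)
    return seen


def path_motivations(path):
    """List the motivation attached to each consecutive step of a path."""
    return [TRANSFERS[(a, b)] for a, b in zip(path, path[1:])]


if __name__ == "__main__":
    print(reachable("physical impact"))
    for step in path_motivations(
        ["physical impact", "mental impact", "manipulative mental impact", "telling"]
    ):
        print("-", step)
```

Listing the motivations along a path makes explicit that, under this reading, the extension to Telling passes through a metaphorical step followed by steps that involve metonymy.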

Thus, the semantic extension of qiāodǎ 敲打 ‘knock’ may be viewed as illustrative of metaphorical extension triggered and followed by metonymic extension. It shows how metaphor and metonymy may work hand in hand in semantic shift, as also discussed in the literature on metonymy-based metaphor [43]. The analysis may be further applied to other verbs of a similar nature, which will be the direction of future research.

4 Conclusion

This study provided a frame-based constructional analysis of the cross-categorial behaviors of the typical Mandarin Physical Contact (MPC) verb qiāodǎ 敲打 ‘knock’ according to its semantic-to-syntactic peculiarities. The conceptual transfers of qiāodǎ 敲打 ‘knock’ were analyzed and assigned to different semantic frames on the basis of corpus distributions. It is proposed that qiāodǎ 敲打 ‘knock’ extends from coding physical impact to coding mental impact via conventional metaphor, and the extension is further stretched to the potential result and the cause-effect relation in the actional response of the Affected Patient. The mental impact may be realized by verbal communication, which allows the verb to encode verbal impact with a direct quotation, so that it can be viewed as a Communication verb. The change from coding mental impact to coding verbal impact is due to the ‘means for effect’ metonymic transfer, as verbal communication is a common means of instilling mental impact. On the other hand, the mental impact with manipulative purpose can enter the manipulation domain by coding the Proposed Act as the direct object. Unlike the well-accepted metaphorical mapping from physical impact to mental impact, the further semantic transfers of qiāodǎ 敲打 ‘knock’ are motivated by the combined effect of metaphor and metonymy, as illustrated in Sect. 3.

In sum, this study presents a comprehensive account of the cross-categorial behaviors of qiāodǎ 敲打 ‘knock’ from a frame-based constructional approach. It identifies the cognitive motivations involved in the semantic-syntactic behaviors that realize the conceptual transfer. The proposed semantic extension paths can be viewed as illustrative of metonymy-based metaphor and of how these two prevalent mechanisms work hand in hand in formulating verbal semantics. It ultimately contributes to the advancement of Chinese lexical semantic studies and facilitates the construction of lexical databases such as Mandarin VerbNet (http://mega.lt.cityu.edu.hk/*yufechen/#/).

References 1. Viberg, Å.: The polysemous cognates Swedish gå and English go: universal and languagespeciﬁc characteristics. Lang. contrast 2(1), 87–113 (1999) 2. Evans, N., Wilkins, D.: In the mind’s ear: the semantic extensions of perception verbs in Australian languages. Language, pp. 546–592 (2000) 3. Eve, S.: From Etymology to Pragmatics: Metaphorical and Cultural Aspects of Semantic Structure. Cambridge University Press, Cambridge (1990)

Cross-Categorial Behaviors of Mandarin Physical Contact Verbs

93

4. Mariana, N.: What is universal and what is language speciﬁc in the polysemy of perception verbs? Revue roumaine de linguistique LVIII. 3, 329–343 (2013) 5. San Roque, L., Kendrick, K.H., Norcliffe, E., Majid, A.: Universal meaning extensions of perception verbs are grounded in interaction. Cogn. Linguist. 29(3), 371–406 (2018) 6. Bowerman, M., Choi, S.: Shaping meanings for language: universal and language-speciﬁc in the acquisition of spatial semantic categories. Language acquisition and conceptual development, pp. 475–511 (2001) 7. Thepkanjana, K., Uehara, S.: Semantic extension of the verb of breaking in Thai and Japanese. Manusya J. Hum. 10(3), 95–114 (2007) 8. Kessakul, R., Ohori, T.: Polysemy and pragmatic change in the Japanese conditional marker ba. Studies in Japanese Grammaticalization, pp. 135–162 (1998) 9. Myhalets, O.I.: Semantic peculiarities of the verbs with the highest degree of polysemy denoting conflict actions. Acad. J. Interdiscipl. Stud. 9(3), 17–28 (2020) 10. Fukada, C.: On semantic extensions of verbs of appearance. Papers Linguist. Sci. 2, 63–86 (1996) 11. Orta, M.M.G.: The syntax and semantics interface of present-day and Old English speech verbs: Say and Tell versus Secgan and Tellan. J. English Stud. 3, 81–98 (2002) 12. Mathieu, Y.Y., Fellbaum, C.: Verbs of emotion in French and English. In: Proceedings of the 5th International Conference of the Global WordNet Association (2010) 13. Lakoff, G., Johnson, M.: Metaphors We Live By. University of Chicago Press, Chicago (1980) 14. Fauconnier, G.: Mental Spaces: Aspects of Meaning Construction in Natural Laguage. Cambridge University Press, Cambridge (1994) 15. Wu, C.-S., Wu, Q.-X.: Problems of polysemy processing in a case study of Da. J. Sanm. Univ. 36(1), 7–13 (2019) 16. Copestake, A., Briscoe, T.: Semi-productive polysemy and sense extension. J. Semant. 12 (1), 15–67 (1995) 17. Levin, B.: English Verb Classes and Alternations: A Preliminary Investigation. 
University of Chicago Press, Chicago (1993) 18. Pustejovsky, J.: The Generative Lexicon. MIT press, Cambridge (1998) 19. Boas, H.C.: Frame Semantics as a framework for describing polysemy and syntactic structures of English and German motion verbs in contrastive computational lexicography. In: Proceedings of Corpus Linguistics, pp. 64–73. University Centre for computer corpus research on language, Lancaster (2001) 20. Zeschel, A.: Exemplars and analogy: semantic extension in constructional networks. In: Glynn, D., Fischer, K. (eds.) Quantitative Methods in Cognitive Semantics: Corpus-Driven Approaches, pp. 201–220. Mouton de Gruyter, Berlin (2010) 21. Nerlich, B., Clarke, D.D.: Ambiguities we live by: towards a pragmatics of polysemy. J. Pragmat. 33, 1–20 (2001) 22. Lakoff, G.: Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. University of Chicago Press, Chicago (1987) 23. Taylor, J.R.: Linguistic Categorization: Prototypes in Linguistic Theory. Oxford University Press, Clarendon (1995) 24. Wang, X.-J.: A Hundred Questions of Linguisitcis. Shanghai Education Press, Shanghai (1983) 25. Goddard, C.: The natural semantic metalanguage approach. In: Hoffman, T., Trousdale, G. (eds.) The Oxford Handbook of Construction Grammar, pp. 49–69. Oxford University Press, Oxford (2010) 26. Al-Ameedi, R.T., Mayuuf, L.H.H.: Semantic extension in verbs of touch in English and Arabic. J. Hum. Sci. 1(23), 532–544 (2016)

94

T. He and M. Liu


Cross-Categorial Behaviors of Mandarin Physical Contact Verbs


The Correspondence Between Semantic Functions and Syntactic Constructions of Guà Verbs

Caiying Yang2, Gaofeng Shi1, Hongbing Xing2, and Xingsan Chai2(&)

1 College of International Education and Exchange, Tianjin Normal University, Tianjin, China
[email protected]
2 Institute on Educational Policy and Evaluation of International Students, Beijing Language and Culture University, Beijing, China
[email protected]

Abstract. Chinese guà (挂) type verbs have attracted the attention of grammarians because they can express the continuous function of “state” and have a complex process structure. This paper applies corpus-based statistical analysis to the actual usage of these verbs. First, we reclassify guà type verbs according to the distribution of their semantic functions. We then analyze statistically the richness, concentration, dominant syntactic constructions and common syntactic constructions of the different members of the guà type. Based on the results, we highlight the correspondence between different semantic functions and syntactic constructions.

Keywords: Guà type verbs · Verb classification · Semantic functions · Syntactic constructions · Correspondence

1 Introduction

Guà type verbs carry the meaning of “attached” and require a locative component to appear with them in a sentence, so the term “attached verb” was proposed [1]. Their particularity lies in expressing both “action” and “state” [2]. Because of this double semantic feature, many scholars have described their situation features and process structures in detail [3–6]. The individual verbs belonging to the guà type differ in their characteristics [7], which prompts a fresh look at the guà type as a whole and at the semantic functions of its different members. But if we simply treat guà type verbs as one class with common characteristics when analyzing their situation characteristics and process structures, it will be difficult to observe the distribution characteristics of various semantic

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 96–105, 2022. https://doi.org/10.1007/978-3-031-06703-7_8


functions of guà type verbs and to explore how those distributions differ among guà members. The method of combining transformation analysis with semantic feature analysis has been applied to differentiate sentence patterns and explain the reasons for the differentiation, which has led Chinese grammatical analysis to combine form and meaning [8, 9]. The situation type of a verb is determined both by its own characteristics and by other components in the sentence [10]. Verbs and syntactic constructions select each other: verbs choose syntactic constructions, and syntactic constructions have their appropriate verbs [11, 12]. Therefore, we hold that it is practical and necessary to investigate whether different guà members differ in the types and distribution of the syntactic constructions they use, and what correspondences exist between different verbs and syntactic constructions and between different semantic functions and syntactic constructions. Answering these questions not only deepens our understanding of this kind of verb, but also tests whether the reclassification of guà type verbs in this research is sound. It is also a useful attempt to explore the motivation for the mutual selection between verbs and syntactic constructions.

This paper adopts the method of corpus knowledge extraction to generalize and re-categorize the distribution characteristics of the semantic functions of the 49 guà type verbs summarized by [4]. It then analyzes statistically the usage of the syntactic constructions of these verbs in terms of richness, concentration, and dominant and common syntactic constructions, examines the correspondences between different verbs and syntactic constructions and between semantic functions and syntactic constructions, and finally discusses the causes of the correspondences between verbs and syntactic constructions.

2 The Reclassification of Guà Type Verbs

The corpus data of this paper are extracted from the “Modern Chinese Corpus of State Language Commission”, and the guà type verbs studied are shown in Table 1. The amount of annotation for each verb depended on the number of corpus examples retrieved. When a verb had fewer than 200 examples, we annotated and analyzed all of them; when it had more than 200, we randomly selected 200. The number annotated for each verb is listed in Table 1. We annotated and analyzed only the original meaning of each verb. When a verb expresses state semantics in a sentence, we define its function as “state”, as in “墙上挂着一幅画。” (There is a picture hanging on the wall.); when a verb expresses action semantics, we define its function as “action”, as in “他正在挂国旗。” (He is hanging the national flag.). Based on the classification criteria of [7], the 49 guà type verbs were classified again. According to the proportions in which a verb expresses “action” and “state”, we divided guà type verbs into three categories: “strong state verbs”, “strong action verbs” and “action-state verbs”. If the “state” function accounts for 60% or more of a verb's uses, we classify it as a “strong state verb”; if it accounts for less than 40%, we classify it as a “strong action verb”. When the proportion falls between 40% and 60%, we name


it an “action-state verb”. As shown in Fig. 1, there are 19 “strong action verbs”, the largest group (38.8%), and 13 “strong state verbs”, the smallest group (26.5%); the “action-state verbs” lie in between, accounting for 34.7% of all the verbs.

Table 1. The analyzed number of guà type verbs.

| 1–10 | Number | 11–20 | Number | 21–30 | Number | 31–40 | Number | 41–49 | Number |
|------|--------|-------|--------|-------|--------|-------|--------|-------|--------|
| 挂 | 200 | 压 | 196 | 拿 | 198 | 套 | 136 | 长 | 200 |
| 摆 | 200 | 扣 | 71 | 泡 | 103 | 拴 | 98 | 捆 | 96 |
| 戴 | 200 | 抓 | 111 | 堆 | 133 | 咬 | 200 | 缠 | 100 |
| 穿 | 200 | 松 | 60 | 插 | 100 | 锁 | 73 | 埋葬 | 37 |
| 掩盖 | 117 | 卷 | 148 | 夹 | 200 | 盖 | 200 | 绑 | 90 |
| 裹 | 89 | 包 | 122 | 藏 | 100 | 垫 | 70 | 停 | 200 |
| 关 | 186 | 盛 | 80 | 晾 | 200 | 堵 | 100 | 搁 | 96 |
| 放 | 200 | 记 | 200 | 混 | 56 | 埋 | 200 | 铺 | 200 |
| 开 | 176 | 举 | 125 | 存 | 94 | 捏 | 163 | 踩 | 100 |
| 隐藏 | 84 | 装 | 200 | 腌 | 21 | 弯 | 99 | | |

Fig. 1. The classiﬁcation of guà type verbs.
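The three-way classification described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' procedure: the function names and the example counts are hypothetical, and the handling of the exact boundary values (40% counted as “action-state”, 60% as “strong state”) is an assumption read off the wording “60% or more” and “less than 40%”.

```python
def state_ratio(state_count: int, action_count: int) -> float:
    """Percentage of annotated uses in which the verb expresses 'state'."""
    total = state_count + action_count
    if total == 0:
        raise ValueError("the verb has no annotated uses")
    return 100.0 * state_count / total


def classify_gua_verb(ratio: float) -> str:
    """Map a 'state' percentage (0-100) onto the three categories."""
    if not 0.0 <= ratio <= 100.0:
        raise ValueError("ratio must be a percentage between 0 and 100")
    if ratio >= 60.0:          # "60% or more"
        return "strong state verb"
    if ratio < 40.0:           # "less than 40%"
        return "strong action verb"
    return "action-state verb"  # between 40% and 60%


# Hypothetical counts for one verb: 150 'state' uses out of 200 annotated uses.
example_ratio = state_ratio(state_count=150, action_count=50)
example_label = classify_gua_verb(example_ratio)
```

With the hypothetical counts above, the verb would fall into the “strong state” category; real ratios would have to come from the annotated corpus data.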

3 The Correspondence Between the Different Semantic Functions and the Syntactic Constructions of Guà Type Verbs

For the annotation of syntactic constructions, we selected 7 verbs from each category, 21 in total (from 挂 (guà) to 拿 (ná) in Table 1), according to how typical each verb is of its category and how many corpus examples it has, and annotated their syntactic constructions. The inventory of syntactic constructions used in this paper follows the research of the Sentence Pattern Research Group of Beijing Language and Culture University [13] and [12], and has been refined according to actual corpus usage. When annotating the syntactic constructions used by each verb, we adopted the subcategory


syntactic constructions; when investigating the correspondence between the semantic functions and the syntactic constructions, we adopted the parent-category syntactic constructions. The specific construction types and the distribution of syntactic constructions corresponding to the three types of verbs are shown in Table 2.

3.1 The Correspondence Between the Semantic Functions and the Syntactic Constructions of “Strong State Verbs”

The syntactic constructions used differ from one “strong state verb” to another, and the proportions of different “strong state verbs” within the same syntactic construction also differ. As shown in Table 2, such verbs use nine syntactic constructions, among which S3 “(S) + V + O” has the highest proportion, accounting for 54.1%. Within S3, S3-5 “(S) + V + zhe/le/guò + O” takes the first place, accounting for 54.3%. S3 is followed by S2 “(S) + V + C”, accounting for 15.1%. The remaining types of syntactic constructions are less commonly used. This part concentrates on the specific distributions from four aspects: the richness and concentration of the syntactic constructions, the dominant syntactic constructions and the common syntactic constructions.

Table 2. The syntactic construction types and their distribution for the three kinds of verbs.

| Parent-category syntactic construction | Subcategory syntactic construction | Strong state verbs (%) | Strong action verbs (%) | Action state verbs (%) |
|---|---|---|---|---|
| S1 (S) + V | S1-1 (S) + (yī/le) + repeated verb; S1-2 (S) + V; S1-3 (S) + (yī/le) + V; S1-4 (S) + V + le; S1-5 (S) + V + zhe + (le) | 3.7 | 9.6 | 4.4 |
| S2 (S) + V + C | S2-1 (S) + (bù/méi) + V + C + (le); S2-2 (S) + V + le + C | 15.1 | 8.5 | 17.2 |
| S3 (S) + V + O | S3-1 (S) + V + O; S3-2 (S) + V + C + O; S3-3 (S) + V + C + zhe/le/guò + O; S3-4 (S) + (yī/le) + V + O; S3-5 (S) + V + zhe/le/guò + O; S3-6 (S) + (yī/le) + repeated verb + O | 54.1 | 39.9 | 34.3 |
| S4 (S) + V + O + C | S4-1 (S) + V + O + C | 0 | 0.1 | 0.6 |
| S5 Ba sentence | S5-1 Ba sentence | 10.7 | 15.3 | 13.1 |
| S6 Bei sentence | S6-1 Bei sentence | 2.8 | 3.4 | 2.3 |
| S7 Sentences with serial verbs 1 (after verb) | S7-1 Sentences with serial verbs 1 (after verb) | 5.5 | 6.6 | 13.1 |
| S8 Sentences with serial verbs 2 (before verb) | S8-1 Sentences with serial verbs 2 (before verb) | 6.7 | 14.2 | 11.2 |
| S9 Pivotal sentence 2 (before verb) | S9-1 Pivotal sentence 2 (before verb) | 1.3 | 1.9 | 3 |
| S10 Subject-predicate predicate sentence | S10-1 Subject-predicate predicate sentence | 0.5 | 0.8 | 0.1 |

The “Richness” and “Concentration” of the Syntactic Constructions of “Strong State Verbs”

The richness of syntactic constructions refers to the number of types of syntactic constructions a verb uses: the higher the richness, the more types of syntactic constructions; the lower the richness, the fewer. The concentration¹ of syntactic constructions refers to the degree to which a verb's uses concentrate on one or a few syntactic constructions. We use the Z-score to divide the richness and concentration of syntactic constructions into three grades: high (Z ≥ 0.5), medium (0.5 > Z > −0.5) and low (Z ≤ −0.5). The richness and concentration of the syntactic constructions of “strong state verbs” are shown in Table 3. In terms of “richness”, 5 “strong state verbs” (关 (guān, close), 裹 (guǒ, wrap), 穿 (chuān, wear), 戴 (dài, wear) and 隐藏 (yǐn cáng, hide)) lie in the medium and low grades, accounting for 71.43%, and only 2 verbs (挂 (guà, hang) and 摆 (bǎi, set)) have high richness, accounting for 28.57%. In terms of “concentration”, “strong state verbs” are mainly distributed in the medium and high grades, which include 5 verbs (戴 (dài, wear), 挂 (guà, hang), 穿 (chuān, wear), 隐藏 (yǐn cáng, hide) and 摆 (bǎi, set)), accounting for 71.43%, while only 2 verbs (裹 (guǒ, wrap) and 关 (guān, close)) have low concentration, accounting for 28.57%. The verb with low richness but high concentration is 戴 (dài, wear), while the verb with both high richness and high concentration is 挂 (guà, hang). From the distributions of richness and concentration, it can be inferred that the types of syntactic constructions of “strong state verbs” are relatively concentrated. That is to say, “strong state verbs” have their corresponding syntactic constructions, and the correspondence is very strong.

¹ $\text{Concentration} = \sqrt{\dfrac{\sum (x - \bar{X})^{2}}{N}}$, which means the standard deviation of the percentage distribution of different syntactic constructions used by a verb.
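The two measures defined above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names and the percentage distribution are hypothetical, and the standard deviation is taken over the constructions a verb actually uses (one reading of “used by a verb”), with N in the denominator as in the footnote.

```python
from statistics import pstdev


def richness(distribution: dict) -> int:
    """Number of syntactic construction types with a non-zero share."""
    return sum(1 for share in distribution.values() if share > 0)


def concentration(distribution: dict) -> float:
    """Population standard deviation (divisor N) of the percentage shares
    of the constructions that the verb actually uses."""
    used = [share for share in distribution.values() if share > 0]
    return pstdev(used)


# Hypothetical percentage distribution of one verb over parent categories.
hypothetical_verb = {"S1": 10.0, "S2": 20.0, "S3": 70.0, "S4": 0.0}

verb_richness = richness(hypothetical_verb)            # three construction types
verb_concentration = concentration(hypothetical_verb)  # spread of the shares
```

A verb whose uses are spread evenly over its constructions would have a concentration of 0, while a verb dominated by one construction would have a high value.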


Table 3. The richness and concentration of “strong state verbs”.

| Classification | Verbs | Richness | Z-score | Verbs | Concentration | Z-score |
|---|---|---|---|---|---|---|
| High | 挂 (guà, hang) | 9 | 2 | 戴 (dài, wear) | 24.88 | 1.4 |
| High | 摆 (bǎi, set) | 8 | 1 | 挂 (guà, hang) | 22.31 | 0.91 |
| Medium | 关 (guān, close) | 7 | 0 | 穿 (chuān, wear) | 19.5 | 0.36 |
| Medium | 裹 (guǒ, wrap) | 7 | 0 | 隐藏 (yǐn cáng, hide) | 17.26 | −0.07 |
| Medium | 穿 (chuān, wear) | 7 | 0 | 摆 (bǎi, set) | 16.59 | −0.2 |
| Low | 戴 (dài, wear) | 6 | −1 | 裹 (guǒ, wrap) | 12.58 | −0.97 |
| Low | 隐藏 (yǐn cáng, hide) | 6 | −1 | 关 (guān, close) | 10.14 | −1.44 |

Table 4. The richness and concentration of “strong action verbs”.

| Classification | Verbs | Richness | Z-score | Verbs | Concentration | Z-score |
|---|---|---|---|---|---|---|
| High | 卷 (juǎn, roll) | 10 | 1.73 | 松 (sōng, loose) | 22.18 | 1.72 |
| High | 开 (kāi, open) | 9 | 0.87 | 掩盖 (yǎn gài, cover up) | 16.7 | 0.5 |
| Medium | 扣 (kòu, buckle) | 8 | 0 | 放 (fàng, put) | 15.83 | 0.3 |
| Medium | 抓 (zhuā, grasp) | 8 | 0 | 开 (kāi, open) | 15.15 | 0.15 |
| Low | 松 (sōng, loose) | 7 | −0.87 | 抓 (zhuā, grasp) | 11.25 | −0.72 |
| Low | 放 (fàng, put) | 7 | −0.87 | 卷 (juǎn, roll) | 8.46 | −1.34 |
| Low | 掩盖 (yǎn gài, cover up) | 7 | −0.87 | 扣 (kòu, buckle) | 11.74 | −0.61 |
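The Z-score grading used in Tables 3 and 4 can be sketched as follows, using the richness values of the “strong action verbs” in Table 4. The function names are hypothetical. The paper does not say which standard deviation underlies its Z-scores; the sample standard deviation (divisor N − 1) is used here because it appears to reproduce the tabulated values, which is an inference rather than a documented choice.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values with the sample standard deviation (assumed)."""
    centre = mean(values)
    spread = stdev(values)
    return [(value - centre) / spread for value in values]


def grade(z: float) -> str:
    """Grades as defined in the text: high (Z >= 0.5), low (Z <= -0.5), else medium."""
    if z >= 0.5:
        return "high"
    if z <= -0.5:
        return "low"
    return "medium"


# Richness of the seven "strong action verbs" as listed in Table 4.
richness_by_verb = {"卷": 10, "开": 9, "扣": 8, "抓": 8, "松": 7, "放": 7, "掩盖": 7}

zs = z_scores(list(richness_by_verb.values()))
grades = {verb: grade(z) for verb, z in zip(richness_by_verb, zs)}
```

Under this assumption, the rounded scores for 卷 and 开 agree with the 1.73 and 0.87 given in Table 4, and the resulting grades match the table's classification column.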

“Dominant Syntactic Constructions” and “Common Syntactic Constructions” of “Strong State Verbs”

Overall, the 7 “strong state verbs” are similar in their syntactic constructions: they tend to choose certain syntactic constructions. The above analysis of “richness” and “concentration” has also shown that these verbs correspond strongly to particular syntactic constructions. To identify the specific types of the corresponding syntactic constructions, it is necessary to analyze the “dominant syntactic constructions” and the “common syntactic constructions”. “Dominant syntactic constructions” are the high-frequency syntactic constructions that native speakers use with a given verb, while “common syntactic constructions” are the sub-high-frequency ones. The Z-score of “dominant syntactic construction” is 1; when the Z-score is […]

[…] Collostructional strength > 1.30103 => p < 0.05: significant; Collostructional strength > 2 => p < 0.01: very significant; Collostructional strength > 3 => p < 0.001: extremely significant. When the collostructional strength > 1.30103, the construction and the lexical item have a significant correlation. The higher the collostructional strength is, the more typical the lexical collocation is. To ensure statistical consistency, accuracy, and reliability, only nouns with a total frequency of over 50 in the corpus are analyzed as target lexical items.

2.2 Result

There are 1,028 concordance lines of “bǎ” as a classifier in the corpus. After manually excluding its uses as an individual classifier and as a verbal classifier, 252 lines of “bǎ” remain. According to the frequencies, “bǎ” has two main types of constructions: “numeral + bǎ + noun” and “yī + bǎ + noun”. Next, the Collostructional Analysis is conducted on the corpus data. For “numeral + bǎ + noun”, the collostructional strengths are listed in Table 1. Owing to limited space, only the top 5 nouns by collostructional strength are listed here (the same below).

Z. Li and C. Congmei

Table 1. The collostructional strengths between “numeral + bǎ + noun” and nouns

| No. | Nouns | Relation | Expected frequency | Coll. strength |
|---|---|---|---|---|
| 1 | “mǐ” | Attraction | 0.005444586 | 46.568840 |
| 2 | “dào gǔ” | Attraction | 0.00103877 | 13.360055 |
| 3 | “cǎo” | Attraction | 0.006626634 | 13.019142 |
| 4 | “tǔ” | Attraction | 0.00703856 | 10.018836 |
| 5 | “chái” | Attraction | 0.00149547 | 9.273537 |
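The significance thresholds quoted for the collostructional strength can be sketched in Python. The sketch assumes that the strength is the negative base-10 logarithm of a p-value, which is consistent with the quoted cut-offs (1.30103 for p = 0.05, 2 for p = 0.01, 3 for p = 0.001) but is not spelled out in this part of the text; the function names are hypothetical.

```python
import math


def p_value_from_strength(strength: float) -> float:
    """Invert the assumed relation strength = -log10(p)."""
    return 10.0 ** (-strength)


def significance_label(strength: float) -> str:
    """Apply the thresholds quoted in the text, strictest first."""
    if strength > 3:
        return "extremely significant"   # p < 0.001
    if strength > 2:
        return "very significant"        # p < 0.01
    if strength > -math.log10(0.05):     # approximately 1.30103
        return "significant"             # p < 0.05
    return "not significant"


# Strengths of the first and fifth nouns in Table 1 ("mǐ" and "chái").
labels = [significance_label(s) for s in (46.568840, 9.273537)]
```

Both nouns lie far above the strictest threshold, which matches the “Attraction” relation reported for every row of the table.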

According to the collostructional strengths, the construction attracts 46 nouns, all of which denote concrete objects that can be held or grasped with the hands. According to the shapes of the objects, the nouns in “numeral + bǎ + noun” fall into two categories, powder-shaped or granular things and strip-shaped things, as shown in Table 2.

Table 2. The categorization of nouns in “numeral + bǎ + noun”

| Construction | Features of objects denoted by the nouns | Relations with hands | Examples |
|---|---|---|---|
| Numeral + bǎ + noun | Powder-shaped or granular things | Being held | “mǐ”, “dào gǔ”, “tǔ” |
| | Strip-shaped things | Being grasped | “cǎo”, “chái”, “dào cǎo” |

The features of the objects denoted by the nouns indicate the collostructional features of “bǎ”. According to Table 2, the nouns in the first category refer to powder-shaped or granular things, which are quantified by being held in the hand, and the nouns in the second category refer to strip-shaped things, which are quantified by having their whole cross section grasped with the hand. Only rough and vague overall quantities are measured with hands. Thus, “numeral + bǎ + noun” is used for quantifying concrete things held or grasped with hands. Next, the collostructional features of “yī + bǎ + noun” are explored. The calculated collostructional strengths between “yī + bǎ + noun” and nouns are listed in Table 3.

Table 3. The collostructional strengths between “yī + bǎ + noun” and nouns

| No. | Words | Relation | Exp. freq | Coll. strength |
|---|---|---|---|---|
| 1 | “huǒ” | Attraction | 0.006816011 | 66.558764 |
| 2 | “hàn” | Attraction | 0.002362228 | 48.350278 |
| 3 | “lì” | Attraction | 0.009836463 | 26.904929 |
| 4 | “yǎn lèi” | Attraction | 0.003020452 | 18.075687 |
| 5 | “jìn” | Attraction | 0.002319166 | 15.320911 |

The “yī + bǎ + noun” construction has collocational attractions to 11 nouns according to the calculated statistics. It is found that these nouns can be divided into

The Collostruction-Based Deﬁnition Model


two categories according to whether the objects denoted by them are abstract or concrete, as shown in Table 4.

Table 4. The categorization of nouns in “yī + bǎ + noun”

| Construction | Features of nouns | Relations with hands | Examples |
|---|---|---|---|
| yī + bǎ + noun | Abstract nouns | No direct relations | “lì”, “jìn”, “nián jì” |
| | Abstracted concrete nouns | No direct relations | “huǒ”, “hàn”, “lèi” |

The abstract nouns in the first category, which have no direct relation with hands, collocate with “bǎ” through metonymy. For instance, in the collostructions “yī bǎ lì” and “yī bǎ jìn”, both “lì” and “jìn” mean strength, which is associated with hands: “Nǐ de shǒu jìn zhēn dà” is said to describe the strength of someone's hands. Through diachronic development, “yī bǎ lì” can also be interpreted as “contributions” or “efforts”. For example, in the sentence “wéi shì jiè hé píng chū yī bǎ lì”, “yī bǎ lì” means “making contributions to the world peace”. The collocation of “yī bǎ” with “nián jì”, by contrast, is attributed to metaphor. The nouns originally selected by “yī bǎ” refer to a collection of things with a certain spatial length, while “nián jì” is a collection of abstract concepts with a certain temporal length. The collostruction “yī bǎ nián jì” thus results from a metaphorical mapping from the space domain to the time domain [8]. The nouns in the second category are typical abstracted concrete nouns. Although “huǒ”, “hàn” and “lèi” denote concrete objects in their original meanings, they do not refer to concrete objects in “yī + bǎ + noun”. For example, in the sentence “rú cǐ dà liàng zī jīn de tóu rù, shǐ bú shǎo rén dōu niē zhe yī bǎ hàn”, the word “hàn” does not mean actual sweat, but describes psychological anxiety and tension when collocating with “yī bǎ”. Moreover, the numeral slot of the construction can only be filled by “yī”: the sentence *“shǐ bú shǎo rén dōu niē zhe liǎng bǎ hàn” is not acceptable. “Yī + bǎ + noun” is an instantiation of the rhetorical construction “yī + classifier + noun” used for describing feelings or states, in which the classifier deviates from its original meaning [11].

3 A Comparative Study on “Bǎ” and Its English Counterparts

To design language-specific CELDs that help English-speaking learners of Chinese acquire Chinese collective classifiers, the linguistic features of Chinese and English are compared. The English counterparts of the main constructions of “bǎ” are first identified, and the comparative study of “bǎ” and its English counterparts is then conducted on that basis.

3.1 English Counterparts of “Bǎ”

English counterparts of “bǎ” are identiﬁed from Chinese-English bilingual parallel corpora (including CCL Chinese-English Bilingual Corpus, and the Chinese-English


parallel corpus of Four Great Classical Novels and Lu Xun's novels) [12]. A comprehensive retrieval of the Chinese-English bilingual parallel corpora yields 90 Chinese-English bilingual pairs of “bǎ”. “Numeral + bǎ + noun” has two main English counterparts, as shown in Table 5.

Table 5. The English counterparts of “numeral + bǎ + noun”

| Construction | Collostructions | English counterparts | Examples |
|---|---|---|---|
| Numeral + bǎ + noun | Numeral + bǎ + powder-shaped or granular things | A handful of + noun | Kě jiāng yī bǎ cū táng fàng rù gǎn lǎn yóu nèi jiǎo bàn (Mix a handful of coarse sugar crystals with some olive oil) |
| | Numeral + bǎ + strip-shaped things | A bundle of + noun | Fèng jiě ér yòng shǒu jīn guǒ le yī bǎ yá zhù, zhàn zài dì xià (Xi-feng stood on the floor below, a bundle of ivory chopsticks wrapped up in a teatowel in her hand) |

According to Table 5, “numeral + bǎ + noun” corresponds to different English counterparts when quantifying different objects. When quantifying powder-shaped or granular things, it corresponds to “a handful of + noun”; when quantifying strip-shaped things, it corresponds to “a bundle of + noun”. Next, examples of the main English counterparts of “yī + bǎ + noun” found in the bilingual parallel corpora are shown in Table 6. Since “yī + bǎ + noun” is a rhetorical construction collocating with abstract nouns or abstracted concrete nouns, the English counterparts are listed under these two kinds of collostructions. Table 6 shows that there are no ready-made English constructions corresponding to “yī + bǎ + noun”, since “yī + bǎ + noun” is mainly used for describing feelings or states. Thus, the free translation method is adopted.

3.2 A Comparative Study on “Bǎ” and Its English Counterparts

As discussed in Sect. 3.1, “numeral + bǎ + noun” corresponds to two English constructions, “a handful of + noun” and “a bundle of + noun”, both of which are frequently used English constructions for denoting collectiveness. The collostructional nouns of “a handful of + noun” and “a bundle of + noun” are studied on the basis of English language material from GloWbE, a collection of English data from twenty countries. For each English construction, the top 50 nouns by frequency in GloWbE are chosen for the Collostructional Analysis.


Table 6. The English counterparts of “yī + bǎ + noun”

| Construction | Collostructions | English counterparts | Examples |
|---|---|---|---|
| yī + bǎ + noun | yī + bǎ + abstract nouns | Greater efforts; an effort | Tè bié shì nà xiē hái méi yǒu xué huì shēng chǎn de dì qū, jīn nián yīng dāng gèng dà dì nǔ yī bǎ lì (Greater efforts must be made this year, particularly in areas which have not yet learned to develop production) |
| | yī + bǎ + abstracted concrete nouns | Great anxiety; fear | Táng sēng niē zhe yī bǎ hàn, zhī jiāo: “mò yào shēng shì!” (“Whatever you do, don't get into a row,” Sanzang said again in great anxiety) |

The collostructional strengths between “a handful of + noun” and nouns, calculated with the Collostructional Analysis, are listed in Table 7.

Table 7. The collostructional strengths between “a handful of + noun” and nouns

| No. | Words | Relation | Exp. freq | Coll. strength |
|---|---|---|---|---|
| 1 | People | Attraction | 10.961980 | Inf |
| 2 | Times | Attraction | 1.936827 | Inf |
| 3 | Others | Attraction | 1.824769 | Inf |
| 4 | Dust | Attraction | 0.116018 | Inf |
| 5 | Games | Attraction | 0.968880 | Inf |
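The “Inf” values in Table 7 are not explained in the text. One plausible reading, offered only as an assumption, is that the strength is computed as the negative base-10 logarithm of a p-value and that the p-value is too small to be represented as a floating-point number, so it underflows to zero. The hypothetical helper below illustrates that behaviour.

```python
import math


def strength_from_p(p_value: float) -> float:
    """Assumed relation: strength = -log10(p); an underflowed p of 0.0 gives infinity."""
    if p_value < 0.0 or p_value > 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value == 0.0:
        return math.inf
    return -math.log10(p_value)


# A double-precision float cannot represent 1e-400, so the literal becomes 0.0.
underflowed = 1e-400
strength_of_underflow = strength_from_p(underflowed)
strength_of_small_p = strength_from_p(0.001)
```

On this reading, “Inf” would signal an extremely strong attraction whose exact magnitude is lost to numerical precision, not a literally infinite association.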

According to the collostructional strengths, “a handful of + noun” significantly attracts the top 50 nouns. As concluded in Sect. 3.1, “numeral + bǎ + noun” corresponds to “a handful of + noun” when collocating with powder-shaped or granular things held in hands. According to whether the collocations of “a handful of + noun” correspond to those of “numeral + bǎ + powder-shaped or granular things”, the nouns in “a handful of + noun” are divided into two categories, correspondence and non-correspondence, as shown in Table 8.

Table 8. The categorization of nouns in “a handful of + noun”

| Construction | Correspondences to “numeral + bǎ + powder-shaped or granular things” | Examples |
|---|---|---|
| A handful of + noun | Correspondence | Dust, nuts, sand |
| | Non-correspondence | Times, people, games |


“A handful of + noun” corresponds to “numeral + bǎ + powder-shaped or granular things” when collocating with nouns denoting powder-shaped or granular things, such as “dust”, “nuts” and “sand”. In most cases, however, there is no correspondence: “a handful of + noun” collocates with other items, including nouns referring to abstract concepts, people or other entities, such as “times”, “people” and “games”, which are not collocations of “bǎ”. Next, the collostructional nouns of “a bundle of + noun” are studied. The calculated collostructional strengths between “a bundle of + noun” and nouns are listed in Table 9.

Table 9. The collostructional strengths between “a bundle of + noun” and nouns

| No. | Words | Relation | Exp. freq | Coll. strength |
|---|---|---|---|---|
| 1 | Nerves | Attraction | 0.004928 | 259.156515 |
| 2 | Joy | Attraction | 0.025955 | 192.024005 |
| 3 | Sticks | Attraction | 0.007135 | 130.556274 |
| 4 | Contradictions | Attraction | 0.003044 | 96.242181 |
| 5 | Energy | Attraction | 0.135162 | 88.300823 |

According to the collostructional strengths between “a bundle of + noun” and nouns, the construction significantly attracts the top 50 nouns. As concluded in Sect. 3.1, when collocating with nouns denoting strip-shaped objects, “numeral + bǎ + noun” corresponds to “a bundle of + noun”. According to whether the collocations of “a bundle of + noun” correspond to those of “numeral + bǎ + strip-shaped things”, the nouns in “a bundle of + noun” are divided into two categories, correspondence and non-correspondence, as shown in Table 10.

Table 10. The categorization of nouns in “a bundle of + noun”

| Construction | Correspondences to “numeral + bǎ + strip-shaped things” | Examples |
|---|---|---|
| A bundle of + noun | Correspondence | Sticks, straw, firewood |
| | Non-correspondence | Nerves, joy, contradictions |

“A bundle of + noun” corresponds to “numeral + bǎ + strip-shaped things” only when “bundle” quantifies strip-shaped things, such as “sticks”, “straw” and “firewood”. The two constructions do not correspond when “a bundle of + noun” collocates with abstract concepts, such as “nerves”, “joy” and “contradictions”.


In summary, “numeral + bǎ + noun” and its English counterparts “a handful of + noun” and “a bundle of + noun” correspond only partially, and the nouns collocating with “bǎ” and with its English counterparts overlap only in part. The non-correspondences should be treated specifically in language-specific CELDs compiled for English-speaking learners of Chinese.

4 A Study on the Definition Models for Chinese Collective Classifiers in Existing CELDs

In this section, the microstructure definitions of “bǎ” in six CELDs published in this century are studied. The six CELDs are A Concise Chinese-English Dictionary (Revised Edition) (hereinafter referred to as CCED) [15], ABC Chinese-English Comprehensive Dictionary (hereinafter referred to as ABC) [16], A Chinese-English Dictionary (hereinafter referred to as CED) [17], A Chinese-English Dictionary for Foreign Learners (hereinafter referred to as CEDFL) [18], CED(C-E) and PLECO (a mobile integrated Chinese-English dictionary for Chinese learning, founded in May 2000 by Michael Love). The definitions of “bǎ” are summarized in Table 11; the labels before the definitions are the labels used in the dictionaries.

Table 11. The definitions of “bǎ” in the six CELDs

| Names of dictionaries | Definitions |
|---|---|
| CCED | [量] ②handful ③for some abstract nouns, if used with a numeral, only 一 can be used |
| ABC | ②for handfuls ③for certain abstract concepts |
| CED | ❷一手抓起的数量 a handful of ❹用于某些抽象的事情 for something abstract |
| CEDFL | ➊一*花生 a handful of peanuts |
| CED (C-E) | Unit word handful of |
| PLECO | MEASURE WORD 2 handful of 3 [for certain abstract concepts] |

According to Table 11, all six CELDs include “bǎ” as a lexicographical entry. Four dictionaries, CCED, ABC, CED, and PLECO, divide the meaning of “bǎ” into two senses. For the first sense, CCED provides the word-type English collective noun “handful”, while CED, CED(C-E), and PLECO provide phrase-type English constructions of “handful”. The first sense indicates the meaning of quantifying concrete things held or grasped with hands, but only CED points out the


meaning “一手抓起的数量”. However, “一手抓起的数量” is the constructional meaning rather than the isolated lexical meaning of “bǎ”, so the constructional form needs to be presented explicitly. Collocational information is not presented either, except that ABC provides “for handfuls”, which might lead English-speaking learners of Chinese to transfer the collocational information of “handful” negatively to “bǎ”. The second sense indicates that “bǎ” is used for some abstract nouns. Only CCED provides the constructional form, with the sentence “if used with a numeral, only 一 can be used”. ABC, CED, and PLECO define it in the form “for + quantified objects”, such as “for certain abstract concepts” (ABC and PLECO), without presenting the features of the abstract nouns or typical examples. Although learners can pick up some collocations from lexicographical examples, the effect is limited. Furthermore, among the six dictionaries, CEDFL provides only one example as the definition. Although dictionary users can learn a lexical collocation from the example, they gain no understanding of the meaning potential and usage of the headword. Generally speaking, the existing definitions lack complete and clear presentations of usage information and do not cater to English-speaking learners of Chinese; learnability and language-specificity need to be reinforced when compiling language-specific CELDs.

5 The Language-Specific Collostruction-Based Model for Defining Chinese Collective Classifiers

Language-specific CELDs should be tailored to the native language backgrounds of dictionary users while following the learnability requirement of CLDs, since they are CLDs in nature. According to the findings on existing definition models, the usage information provided is by and large incomplete and unclear. Based on the concept of the collostruction, a usage-based definition model that fully integrates and presents constructions, collostructions, collocations and English equivalents is proposed to enhance the learnability and language-specificity features. According to the concept of the collostruction, the meanings of words are interrelated with their constructions (form-meaning pairs), collostructions, and collocations. In terms of definition contents, language-specific CELDs should fully provide the usage information and the distinctive features of headwords, which can be realized by providing the associated constructions, collostructions, collocations and English equivalents of headwords in the definition of an entry. The construction is the basic unit of a language, and the meaning of a word comes from its usage-based constructions. Constructional information provides users with the basic grammatical rules to produce the target language correctly, while showing the usage-based constructional meaning of headwords. Collostructions are the instantiations of usage-based constructions, and they are presented in dictionaries through the collostructional features summarized from multiple collostructions. Collocations are the concrete items at the lexical level which co-occur conventionally with the constructions of headwords and serve as the supplementary information of collostructions. The presentations of high-frequency lexical collocations

The Collostruction-Based Deﬁnition Model


in definitions help non-natives further understand the collostructional features of headwords and acquire the concrete collocations at the lexical level. Besides, English equivalents, the items from the native language system of English-speaking learners, are in partial correspondences to Chinese headwords in most cases. They should be presented as the triggers for activating the linguistic knowledge of native language in English-speaking Chinese learners’ mental lexicon to achieve potential positive transfer. Integrating constructions, collostructions, and collocations involved in the concept of the collostruction, and the English equivalents of headwords, the collostruction-based definition model is proposed. Constructions, pairs of forms and meanings, are taken as the leading point of the definition, and the collostructions explored from the conventional co-occurrences of the construction and concrete collocational lexical items are listed under constructions, and the statistically-significant collocations at the lexical level are provided as the instantiations of the leading constructions and the collostructional features. Besides, the English equivalents, which are in partial-correspondence relations with Chinese headwords, are presented along with the collostructional items. And there are possibilities that more than one construction is listed under the same headword, more than one collostruction is listed under the same construction, and more than one typical lexical collocation is listed under the same collostruction. The collostruction-based definition model is illustrated in Fig. 2.

Fig. 2. The collostruction-based deﬁnition model in language-speciﬁc CELDs

The collostruction-based definition model in Fig. 2 shows that constructional information serves as the leading point of the whole definition, the phrase-type collostructions under constructions are highlighted in bold, several lexical collocations are presented in brackets, and English equivalents are presented along with the corresponding Chinese collostructions. The language-specific collostruction-based definition model describes the meaning of headwords from abstractness to concreteness in a dynamic way while fully providing the usage information and distinctive features of headwords, so as to reinforce the learnability and language-specificity features of language-specific CELDs. The collostruction-based definition model is applied in writing the definition of “bǎ”. For “bǎ”, it is found in Sect. 2.1 that the two common constructions of “bǎ” are “numeral + bǎ + noun” and “yī + bǎ + noun”, leading to two main constructional


meanings of “bǎ”. The constructional meaning of “numeral + bǎ + noun” is quantifying concrete things held or grasped with hands. Under “numeral + bǎ + noun”, two categories of objects including powder-shaped or granular things and strip-shaped things, are highlighted in bold with the deﬁnition deixis “用于” since the two categories of objects indicate two main collostructional features of “numeral + bǎ + noun”. And since the headword “bǎ” is in partial correspondences to “handful” when collocating with the nouns denoting powder-shaped or granular things, and to the English collective noun “bundle” when collocating with the nouns denoting strip-shaped things, the English counterparts are provided along with the collostructions to achieve potential positive transfer. Besides, “yī + bǎ + noun” is the rhetorical construction for describing feelings or states, collocating with abstract things and abstracted concrete things, which should be highlighted in bold with the deﬁnition deixis “for”. And the typical lexical collocations under collostructions are presented in brackets to serve as the supplementary and instantiated information of constructions and collostructions. And the exemplar entry of “bǎ” is provided in Fig. 3.

Fig. 3. The exemplar entry of “bǎ”

In Fig. 3, sense ② indicates the deﬁnition of “bǎ” as the collective classiﬁer, which is different from its deﬁnitions in six existing CELDs since it presents the complete and clear usage information of “bǎ”, including constructional meanings, forms, collostructional features and some high-frequency collocations, highlighting the distinctive features of “bǎ”.
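The layering that the model describes (construction, then collostruction with its English equivalent, then lexical collocations in brackets) can be sketched as a small nested data structure. This is only an illustrative sketch: the class names, field names and the `render` layout are assumptions made here, not the authors' entry format, and the single collocation shown (花生 ‘peanuts’) is taken from the dictionary example quoted in Table 11 rather than from the corpus results.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Collostruction:
    feature: str                      # semantic class of the collocating nouns
    equivalent: Optional[str] = None  # partially corresponding English item, if any
    collocations: List[str] = field(default_factory=list)


@dataclass
class Construction:
    form: str
    meaning: str
    collostructions: List[Collostruction] = field(default_factory=list)


@dataclass
class Entry:
    headword: str
    constructions: List[Construction] = field(default_factory=list)


def render(entry: Entry) -> str:
    """Lay out an entry from abstract (construction) to concrete (collocation)."""
    lines = [entry.headword]
    for cx in entry.constructions:
        lines.append(f"  {cx.form}: {cx.meaning}")
        for cs in cx.collostructions:
            line = f"    {cs.feature}"
            if cs.equivalent:
                line += f" = {cs.equivalent}"
            if cs.collocations:
                line += " (" + ", ".join(cs.collocations) + ")"
            lines.append(line)
    return "\n".join(lines)


# Content follows the description of the entry for "bǎ" in the text.
ba = Entry("bǎ", [
    Construction(
        "numeral + bǎ + noun",
        "quantifying concrete things held or grasped with hands",
        [
            Collostruction("powder-shaped or granular things", "handful",
                           ["花生 'peanuts'"]),
            Collostruction("strip-shaped things", "bundle"),
        ],
    ),
    Construction(
        "yī + bǎ + noun",
        "rhetorical construction for describing feelings or states",
        [
            Collostruction("abstract things"),
            Collostruction("abstracted concrete things"),
        ],
    ),
])

print(render(ba))
```

Keeping the English equivalent on the collostruction rather than on the headword mirrors the point made in the text that “handful” and “bundle” correspond to “bǎ” only partially, each for one class of collocating nouns.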


6 Conclusion

In this study, the collostruction-based definition model is proposed. Firstly, the corpus-based Collostructional Analysis of “bǎ” summarizes the collostructional features of its two main constructions, “numeral + bǎ + noun” and “yī + bǎ + noun”. “Numeral + bǎ + noun” has the meaning of quantifying concrete things held or grasped with hands, collocating with powder-shaped or granular things and strip-shaped things. “Yī + bǎ + noun” is the rhetorical construction for describing feelings or states, in which the meaning of the classifier deviates from the original meaning, collocating with abstract nouns and abstracted concrete nouns. The comparative study on the collostructional features of “bǎ” and its English counterparts shows that “numeral + bǎ + noun” and the English counterparts “a handful of + noun” and “a bundle of + noun” are in partial-correspondence relations. Moreover, the study of the microstructure definitions of “bǎ” in existing CELDs finds that the definitions of “bǎ” lack complete and clear presentations of usage information and do not cater to English-speaking Chinese learners. Based on these preliminary findings, the language-specific collostruction-based definition model for Chinese collective classifiers is proposed. It is a usage-based definition model fully integrating and presenting the constructional information, collostructional features, high-frequency collocations and English equivalents. The model is intended to help English-speaking Chinese learners acquire Chinese collective classifiers more effectively and to enhance the learnability and language-specificity features of language-specific CELDs. Other elements of the microstructure definition will be followed up in a future study.

Acknowledgements. This work is supported by Graduate Innovation Fund of Southwest University of Science and Technology (20ycx0050).

References

1. Zheng, D.O.: Chinese essential dictionary (Chinese-English). Beijing Language University Press, Beijing (2017)
2. Stefanowitsch, A., Gries, St.Th.: Collostructions: investigating the interaction between words and constructions. Int. J. Corpus Linguist. 8(2), 209–243 (2003)
3. Zhou, J.: Innovative thinking about interpreting Chinese classifiers. Appl. Linguist. (2), 74–82 (2016). (in Chinese)
4. Xu, Z.Y.: Classifiers in modern Chinese learner’s dictionary. Foreign Lang. World (2), 63–64+62 (1996). (in Chinese)
5. Zhou, J.: The etymological motivations in Chinese classifier dictionaries. Lexicographical Stud. (3), 48–53 (2009). (in Chinese)
6. Zheng, Z.Z., Lin, L.L.: The research of quantifiers annotation pattern based on corpus. J. Int. Chin. Stud. 5(2), 90–100 (2014). (in Chinese)
7. He, J.: Addendum to the Study of Classifiers in Modern Chinese. Beijing Language and Culture University Press, Beijing (2008). (in Chinese)
8. Zong, S.Y.: A cognitive study on collective classifiers. Doctoral dissertation of Shanghai Normal University (2008). (in Chinese)
9. Zhan, W.D., Guo, R., Chang, B.B., et al.: The building of the CCL corpus: its design and implementation. Corpus Linguist. 6(1), 71–86+116 (2019). (in Chinese)
10. Gries, St.Th.: Coll. analysis 3.2a. A program for R for Windows 2.x (2007)
11. Qiu, Y., Shi, C.H.: The generative basis and mechanism of the rhetoric construction “yi+CI+N” and the possibility and reality of the generation of rhetoric construction. Lang. Teach. Linguist. Stud. (6), 65–77 (2019). (in Chinese)
12. Ren, L., Sun, H., Yang, J.: Chinese-English Parallel Corpus of A Dream of Red Mansions (2010). http://corpus.usx.edu.cn/. Accessed 31 Oct 2020
13. Wu, X., Sun, H., Yang, J.: Parallel Corpus of Romance of Three Kingdoms (2010). http://corpus.usx.edu.cn/. Accessed 31 Oct 2020
14. Zhu, X., Sun, H., Yang, J.: Parallel Corpus of Journey to the West (2010). http://corpus.usx.edu.cn/. Accessed 31 Oct 2020
15. Compilation Committee of A Concise Chinese-English Dictionary: A concise Chinese-English dictionary (revised edition). Commercial Press, Beijing (2002)
16. DeFrancis, J.: ABC Chinese-English comprehensive dictionary. Chinese Dictionary Press, Shanghai (2003)
17. Qian, W.S., Yao, N.Q.: A Chinese-English dictionary. Foreign Languages Press, Beijing (2007)
18. Compilation Committee of Chinese-English Dictionary for Foreign Learners: A Chinese-English dictionary for foreign learners. Shanghai Translation Publishing House, Shanghai (2008)

The Collocations of Chinese Color Words

Shan Wang1,2(&), Le Wu1, and Qiaomin Gong3

1 Department of Chinese Language and Literature, Faculty of Arts and Humanities, University of Macau, Taipa, Macau, China [email protected]
2 Zhuhai UM Science & Technology Research Institute, Guangdong, China
3 School of Art Design, Hankou University, Wuhan, China

Abstract. Color words, as the basic words in a language, play an important role in people’s understanding of the world. The existing studies at home and abroad focus on the comparison or usage of color words of different nationalities, but there is a lack of corpus-based research on the collocations of Chinese color words. The typical color adjectives white and black are common in languages. This article took the Chinese color words 白 bái ‘white’ and 黑 hēi ‘black’ as representatives and used a large-scale corpus to extract their collocations. It then annotated the meanings of 白 bái ‘white’ and 黑 hēi ‘black’ in the collocations, as well as the semantic categories of the collocated words one by one. It further compared and analyzed the characteristics of their collocations in the experiential domain and the abstract domain in order to probe into the distribution rules. This research helps to deepen the understanding of the characteristics of color words used in different cognitive domains.

Keywords: Color words · White · Black · Collocations · The experiential domain · The abstract domain · Cognition

1 Introduction

Color is ubiquitous in our lives. It is not unique to objects, but the cognitive result of the interaction between people’s eyes and objects. The perception of color is a basic cognition of human beings. When people see color, they unconsciously react in their brains and sense organs, and people’s reactions are different when they see different colors. Therefore, color words have been valued by linguists for a long time. Since white and black have a strong focus, they are the colors most easily perceived by the human visual organs. The two colors 白 bái ‘white’ and 黑 hēi ‘black’ exist in almost every language in the world. Moreover, they are two basic words in different languages. Some words composed of 白 bái ‘white’ and 黑 hēi ‘black’ are also widely used. Examples include Chinese compound words like 枯白 kūbái ‘dry white’, 死白 sǐbái ‘dead white’, 白晃晃 báihuǎnghuǎng ‘dazzling white’, 烏黑 wūhēi ‘black’, 黑沉沉 hēichénchén ‘very dark’, and 黑壓壓 hēiyāyā ‘dark’; commendatory color words like 白亮亮 báiliàngliàng ‘white light’ and 黑黝黝 hēiyǒuyǒu ‘shiny black’; and derogatory color words with affixes like 白不呲咧 báibúcīliě ‘white’ and 黑不溜秋 hēibúliūqiū ‘black’ [1]. Therefore, 白 bái ‘white’ and 黑 hēi ‘black’, as typical color words, deserve a thorough study.

© Springer Nature Switzerland AG 2022
M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 121–143, 2022. https://doi.org/10.1007/978-3-031-06703-7_10


S. Wang et al.

Over the long course of their use, color words have developed different meanings in addition to their original meaning as color. This phenomenon reflects people’s cognitive process: when something without a color feature is given color, it reflects that people use familiar, concrete concepts to express unfamiliar and abstract entities, which forms color metaphors, enabling us to understand the surrounding things more vividly. Metaphor is not only a means of rhetoric, but also a cognitive phenomenon in essence. This paper divides the collocations of 白 bái ‘white’ and 黑 hēi ‘black’ into the experiential domain and the abstract domain according to whether 白 bái ‘white’ and 黑 hēi ‘black’ use the original or the metaphorical meaning in the collocations. It then examines the semantic distribution rules of their collocates to explore the characteristics of color words used in different cognitive domains.

2 Related Research

Color words, as a category of words commonly found in languages, play an important role in people’s cognition of the world. The existing research on color words at home and abroad mainly focuses on the following aspects. First, the comparison of color words in different countries. Ge and Yang [2] selected the color words 白 bái ‘white’ and 黑 hēi ‘black’ in Chinese and English to investigate their lexicalization, and found that the conceptual domains of Chinese and English partly overlap and partly do not, while Chinese and English people have different thinking patterns. The different cognitive models play a decisive role in the evolution of the meanings of 白 bái ‘white’ and 黑 hēi ‘black’, in which metonymy plays a more fundamental role than metaphor. Xu [1] compared the similarities and differences between English and Chinese color words white and black, and pointed out that when expressing the unique national culture, the extended meanings of the words show differences. Liu and Wang [3] explored the semantic differences between the Chinese and Thai color words 白 bái ‘white’ and 黑 hēi ‘black’ from the perspective of the conceptual meaning and the associative meaning. Secondly, some studies about color words are carried out to provide certain suggestions for teaching. Luo [4] took the color words 白 bái ‘white’, 黑 hēi ‘black’ and 紅 hóng ‘red’ as examples and proposed teaching strategies for color words in teaching Chinese as a foreign language. She pointed out that teachers should pay attention to the additional meanings and cultural connotation of color words, and also pay attention to comparing and summarizing cross-cultural meanings of color words, so as to make students draw inferences from one example to achieve better teaching effect. 
Shwesinwin [5] compared the difference between Korean and Burmese color words from the perspective of language and cultural characteristics based on the symbolic meaning, explained the limitation of color term expressions, and provided the possibility for the application of color words in teaching. Chang and Lai [6] found that languages, tasks and English proﬁciency are the factors that affect the learning performance of English and Chinese color words. The research also revealed the development model of Chinese learners’ acquisition of English color words, and put forward some enlightenment and suggestions for teaching English vocabulary. The survey by Natalia, Elena, and Svetlana [7] found that language learners have difﬁculty when using

The Collocations of Chinese Color Words

123

phrase units with the color indicator black in communication. Therefore, the authors investigated the language backgrounds of different learners. Their usage is largely inconsistent with that of the Russian language, which causes difficulty for learners. The authors proposed a teaching system in response to this phenomenon. Thirdly, based on specific color adjectives, some scholars studied the special phenomenon of color words in certain nations. Liu and Liu [8] believed that a language itself is a kind of language meme, which has an important influence on the development of a language, especially the lexical meanings. Therefore, they analyzed the balance and imbalance of the development of the meanings of the three color words 白 bái ‘white’, 黑 hēi ‘black’ and 灰 huī ‘grey’ based on the language memetic theory, and discussed the reasons for the changes of the meanings of Chinese color words so as to explore the development of Chinese vocabulary. Grossmann and D’Achille [9] studied the compound color terms in Italian from synchronic and diachronic perspectives, and gave a descriptive overview of the characteristics of their various compound modes. The study found that compound color terms were only used with certain frequency after the 18th century. Kuriki [10] analyzed the number of Japanese basic color words and their distribution in the color space through a survey of 57 native Japanese speakers and summarized the changes of Japanese color words in the past 30 years. It can be seen that color words have been a hot spot in the field of linguistics and language acquisition in recent years. Existing research involves cognition, culture and other aspects, as well as synchronic and diachronic perspectives. However, there is a lack of research on the collocations of color words based on corpora. Therefore, on the basis of a large-scale corpus, this study selects the typical color words 白 bái ‘white’ and 黑 hēi ‘black’ to explore the distribution rules of their collocations, which provides a new perspective on the research of color words.

3 Research Methods

3.1 Corpus Selection and the Research Tool

Chinese Gigaword was first released on the Linguistic Data Consortium (https://catalog.ldc.upenn.edu/LDC2003T09) in 2003 and contains Chinese data from the Mainland and Taiwan. The second edition (https://catalog.ldc.upenn.edu/LDC2005T14), released in 2005, covers Chinese language materials from Mainland, Taiwan and Singapore. After doing part-of-speech tagging and proofreading, it developed into Tagged Chinese Gigaword (https://catalog.ldc.upenn.edu/LDC2009T14) [11–14]. Analyzing Chinese color words on the basis of this large-scale corpus can comprehensively show their characteristics.

Sketch Engine [12, 13] is a tool developed based on large corpora. In addition to providing general keywords and contextual queries, it also provides a quick description of vocabulary features. It can display the grammatical knowledge list of target words and clearly present the actual usage rules of most words. The system can be applied to linguistic research, language teaching and natural language processing. Based on Sketch Engine, the Chinese Word Sketch Engine (CWS) [15, 16] is developed by the Vocabulary Research Group of Academia Sinica. This system is to sketch the lexical characteristics of target words on the basis of a Chinese corpus, which is suitable for the analysis of Chinese vocabulary, and there are many studies using this system [17–26]. Based on the grammatical list of Chinese color words presented by CWS, this paper analyzes their collocations, which are one category of multiword expressions [27].

This study selects the typical Chinese adjectives 白 bái ‘white’ and 黑 hēi ‘black’ in the color domain. Based on the metaphor theory of cognitive linguistics, this paper analyzes the distribution of their collocates in the experiential domain and the abstract domain (the subject of the subject-predicate relation and the head of the attributive-head relation) and explores the similarities and differences. The reason for choosing 白 bái ‘white’ and 黑 hēi ‘black’ as target words is that the pair of 白 bái ‘white’ and 黑 hēi ‘black’ have a stronger perception than other color words, and their grammatical relationship is consistent. That is, most of them present the subject-predicate relation and the attributive-head relation, which is convenient for a comparative study and can help us to better explore their collocational rules.

3.2 Corpus Selection and Annotation

In order to ensure the comparability of search results, before using CWS to search, we set the same search conditions for 白 bái ‘white’ and 黑 hēi ‘black’: first the Tagged Chinese Gigaword Corpus is selected, and then the minimum frequency is set to one, the maximum range of its grammatical relation is set to 999, and the minimum saliency is set to 0. This setting guarantees the scope of the search to the largest extent, so that the target words will not be missed. After searching for 白 bái ‘white’ and 黑 hēi ‘black’, it is found that 白 bái ‘white’ presents three grammatical relations, while 黑 hēi ‘black’ presents two grammatical relations. The grammatical relation presented by 白 bái ‘white’ but not by 黑 hēi ‘black’ is the verb-object relation (Object of) and there are only two cases, but after screening, it is found that they are wrong collocations, so the verb-object relation of 白 bái ‘white’ is excluded. The other two grammatical relations of 白 bái ‘white’ are shared with 黑 hēi ‘black’, namely the subject-predicate relation (Subject) and the attributive-head relation (Modifies). After data retrieval was completed, we manually screened the collocations of 白 bái ‘white’ and 黑 hēi ‘black’ one by one and deleted wrongly extracted collocations. The results are shown in Table 1.

Table 1. The number of the collocations for 白 bái ‘white’ and 黑 hēi ‘black’

| Color word | Original collocations | Correct collocations |
|---|---|---|
| 白 bái ‘white’ | 429 | 79 |
| 黑 hēi ‘black’ | 296 | 73 |


“Original Collocations” in Table 1 are those automatically extracted from the corpus, and “Correct Collocations” are those that remain after manual screening removed the many wrongly extracted collocations. In the process of manual inspection, we excluded two types of wrong collocations: (a) names of people and places (such as 白沙堡 Báishābǎo and 白驹容 Bái Jūróng); (b) cases in which 白 bái ‘white’ is wrongly classified as a predicate or an attributive. The errors of type (b) concern the syntactic function of 白 bái ‘white’, which is the opposite of what the grammatical list indicates. For example, 白 bái ‘white’ should be a predicate rather than an attributive in the collocations of the subject-predicate relation, but there are many cases in which 白 bái ‘white’ is used as an attributive, such as 頭綁白布條 tóu bǎng bái bùtiáo; in collocations of the attributive-head relation, 白 bái ‘white’ should be an attributive, but in many cases it is a predicate, such as 顔色較白的産品 yánsè jiào bái de chǎnpǐn ‘products with relatively white color’.

It can be seen from Table 1 that although the original collocations of 白 bái ‘white’ far outnumber those of 黑 hēi ‘black’, after screening the two numbers differ little: there are 79 and 73 correct collocations of 白 bái ‘white’ and 黑 hēi ‘black’ respectively. This shows that CWS extracts more wrong collocations for 白 bái ‘white’. Table 2 shows the number of collocations of 白 bái ‘white’ and 黑 hēi ‘black’ under the two grammatical relations. For both words, the collocations in the attributive-head relation outnumber those in the subject-predicate relation. The former account for 68.4% and 68.5% respectively, while the latter account for 31.6% and 31.5% respectively.

Table 2. Number of different grammatical relations

| Relation | 白 bái ‘white’: Number | 白 bái ‘white’: Percentage | 黑 hēi ‘black’: Number | 黑 hēi ‘black’: Percentage |
|---|---|---|---|---|
| The subject-predicate relation | 25 | 31.6% | 23 | 31.5% |
| The attributive-head relation | 54 | 68.4% | 50 | 68.5% |
| Total | 79 | 100.0% | 73 | 100.0% |
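The screening and relation figures can be recomputed directly from the counts in Tables 1 and 2. The snippet below is a minimal sketch for checking the arithmetic; the variable names and the `pct` helper are introduced here for illustration, and the "retained" rate (correct collocations as a share of extracted ones) is a derived figure that the paper does not report explicitly.

```python
# Counts taken from Tables 1 and 2; "bai" = 白 'white', "hei" = 黑 'black'.
original = {"bai": 429, "hei": 296}   # automatically extracted collocations
correct = {"bai": 79, "hei": 73}      # collocations kept after manual screening
relations = {
    "bai": {"subject-predicate": 25, "attributive-head": 54},
    "hei": {"subject-predicate": 23, "attributive-head": 50},
}


def pct(part, whole):
    """Percentage rounded to one decimal place, as in the tables."""
    return round(100 * part / whole, 1)


# Share of extracted collocations that survived screening (derived here).
retained = {w: pct(correct[w], original[w]) for w in original}

# Share of each grammatical relation among the correct collocations.
shares = {
    w: {rel: pct(n, correct[w]) for rel, n in counts.items()}
    for w, counts in relations.items()
}

for w in original:
    print(w, "retained:", retained[w], "% | shares:", shares[w])
```

The lower retention rate for 白 is another way of stating the observation that CWS extracts more wrong collocations for 白 than for 黑.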

This study then manually annotated the meanings of 白 bái ‘white’ and 黑 hēi ‘black’ in all the collocations and divided them into the experiential and the abstract domain according to the meanings. It further annotated the semantic categories of the nouns collocated with 白 bái ‘white’ and 黑 hēi ‘black’ in the two grammatical relations. It then explored the semantic distribution and metaphorical expansion of the collocations.


4 The Experiential Domain and the Abstract Domain of the Chinese Color Words 白 bái ‘White’ and 黑 hēi ‘Black’

According to their senses in The Contemporary Chinese Dictionary (the seventh edition) [28], we manually annotated the senses of 白 bái ‘white’ and 黑 hēi ‘black’ in each sentence. The distribution of different senses is shown in Table 3 and Table 4. The sense ⑨ of 黑 hēi ‘black’ is a newly added sense based on the corpus.

Based on cognitive linguistics, among the senses of 白 bái ‘white’ and 黑 hēi ‘black’, only the first senses “color like frost or fresh snow (as opposed to ‘black’)” and “color like coal or ink (as opposed to ‘white’)” are the original meanings of them respectively, while the remaining meanings are all extended meanings. The sense ① of “白1” is the original sense of 白 bái ‘white’, while the other senses are derived from this one. Firstly, 白 bái ‘white’ can be projected from its original domain of color to objects with white properties, and then it can be extended from objects to the expression of the sense ⑨ related to funerals; secondly, 白 bái ‘white’ can be projected from its original domain to express the bright light, and then it can be projected to the cognitive domain to express the clarity and understanding, shown in sense ④; thirdly, 白 bái ‘white’ can be extended from its original sense to state void, which means nothing, and this sense can then be projected to the human cognitive domain to express that something is in vain or has no effect, shown in senses ⑥ and ⑦. Finally, the original meaning of 白 bái ‘white’ can be extended to express the meaning of giving a scornful look, shown in sense ⑩, which contains certain emotion. The sense ① is the original meaning of 黑 hēi ‘black’. Different from the extension of the senses of 白 bái ‘white’, the extension of the senses of 黑 hēi ‘black’ is a kind of chain radiation as a whole; that is, other senses are successively derived from the original sense. 
黑 hēi ‘black’ ﬁrst extends from the word expressing color to the senses ② and ③ expressing the darkness of light, and then extends to the sense ⑤ expressing that someone is wicked. If a person is bad and vicious, what he does should be secret and illegal, which leads to the sense ④. The sense ④ is further extended to express an action, shown in ⑥ and ⑦. The newly added sense ⑨ is a negative emotional sense derived from the combination of 黑 hēi ‘black’ and cultural connotation. The original meanings of 白 bái ‘white’ and 黑 hēi ‘black’ reflect the characteristics of their experiential domain, while the extended meanings reflect their abstract domain. Therefore, this article classiﬁed the domain of the collocates according to their senses (including subjects in the subject-predicate relation and the heads in the attributivehead relation). That is, if the original meaning is used, the collocates belong to the experiential domain; otherwise they belong to the abstract domain. Table 3 shows the sense distribution of 白 bái ‘white’, and only sense ① in 白1 is its original meaning. It can be seen that the proportions of 白 bái ‘white’ under the subject-predicate relation and the attributive-head relation are 88.0% and 94.4%, respectively. According to the classiﬁcation criteria for the experiential domain and the abstract domain, the distribution proportion of the collocates of 白 bái ‘white’ in the experiential domain is much higher than that in the abstract domain. It can be seen that when it is in the subject-predicate relation or the attributive-head relation, 白 bái

‘white’ tends to indicate the collocation’s embodied and experiential usage; that is, it is mostly used to describe the color of the collocated words.

Table 3. Sense distribution of 白 bái ‘white’

| 白 | Senses | Subject-predicate: Number | Subject-predicate: Percentage | Attributive-head: Number | Attributive-head: Percentage |
|---|---|---|---|---|---|
| 白1 | ① Adj. color like frost or fresh snow (as opposed to ‘black’) | 22 | 88.0% | 51 | 94.4% |
| 白1 | ② something white or nearly white | 0 | 0.0% | 0 | 0.0% |
| 白1 | ③ Adj. bright; shiny | 1 | 4.0% | 0 | 0.0% |
| 白1 | ④ clear; plain; make a clear | 1 | 4.0% | 2 | 3.7% |
| 白1 | ⑤ pure; plain; blank | 0 | 0.0% | 1 | 1.9% |
| 白1 | ⑥ Adv. in vain; to no purpose; for nothing | 0 | 0.0% | 0 | 0.0% |
| 白1 | ⑦ Adv. free of charge; gratis | 0 | 0.0% | 0 | 0.0% |
| 白1 | ⑧ Adj. white (symbol of reactionary political ideology) | 0 | 0.0% | 0 | 0.0% |
| 白1 | ⑨ funeral | 0 | 0.0% | 0 | 0.0% |
| 白1 | ⑩ V. give a scornful look; stare coldly | 1 | 4.0% | 0 | 0.0% |
| 白1 | ⑪ a surname | 0 | 0.0% | 0 | 0.0% |
| 白2 | Adj. (of a Chinese character) wrongly written or mispronounced | 0 | 0.0% | 0 | 0.0% |
| 白3 | ① state; explain | 0 | 0.0% | 0 | 0.0% |
| 白3 | ② spoken lines in an opera | 0 | 0.0% | 0 | 0.0% |
| 白3 | ③ dialect | 0 | 0.0% | 0 | 0.0% |
| 白3 | ④ vernacular | 0 | 0.0% | 0 | 0.0% |
| 白4 | Bai nationality | 0 | 0.0% | 0 | 0.0% |
| Total | - | 25 | 100.0% | 54 | 100.0% |

Table 4 shows the sense distribution of 黑 hēi ‘black’. In The Contemporary Chinese Dictionary (the seventh edition) [28], 黑 hēi ‘black’ has eight senses. The ninth sense is a new sense added by the authors when we found that there is no suitable meaning in the dictionary. It is used to express unhappy or dissatisfied mood. The data in Table 4 shows that both in the subject-predicate relation and the attributive-head relation, the ratio of 黑 hēi ‘black’ in the experiential domain is higher than that in the abstract domain, with the proportions of 69.6% and 74.0% respectively, indicating that the color adjective 黑 hēi ‘black’ is more often used to describe the color of the collocates.

Table 4. Sense distribution of 黑 hēi ‘black’

| Senses | Subject-predicate: Number | Subject-predicate: Percentage | Attributive-head: Number | Attributive-head: Percentage |
|---|---|---|---|---|
| ① Adj. color like coal or ink (as opposed to ‘white’) | 16 | 69.6% | 37 | 74.0% |
| ② Adj. dark | 4 | 17.4% | 9 | 18.0% |
| ③ Adj. night; dark night | 0 | 0.0% | 1 | 2.0% |
| ④ secret; illegal | 1 | 4.3% | 3 | 6.0% |
| ⑤ wicked; sinister | 1 | 4.3% | 0 | 0.0% |
| ⑥ V. secretly entrap, deceive or attack | 0 | 0.0% | 0 | 0.0% |
| ⑦ V. illegally intrude into other people's computer system through Internet to view, change, steal confidential data or interfere with computer programs | 0 | 0.0% | 0 | 0.0% |
| ⑧ (hēi) a surname | 0 | 0.0% | 0 | 0.0% |
| ⑨ Adj. unhappy [a newly added sense by the researchers] | 1 | 4.3% | 0 | 0.0% |
| Total | 23 | 100.0% | 50 | 100.0% |
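The classification rule used in this section (a collocation belongs to the experiential domain only when the color word carries its original sense ①, and to the abstract domain otherwise) can be sketched as follows. The sense labels, the `domain` function and the dictionary layout are assumptions made for illustration; the non-zero counts are those read from Tables 3 and 4.

```python
# A sense is labelled (homograph, sense number); (1, 1) stands for sense ① of 白1 / 黑.
ORIGINAL_SENSE = (1, 1)


def domain(sense):
    """Experiential domain for the original color sense, abstract domain otherwise."""
    return "experiential" if sense == ORIGINAL_SENSE else "abstract"


# Non-zero sense counts per word and grammatical relation (from Tables 3 and 4).
sense_counts = {
    ("bai", "subject-predicate"): {(1, 1): 22, (1, 3): 1, (1, 4): 1, (1, 10): 1},
    ("bai", "attributive-head"): {(1, 1): 51, (1, 4): 2, (1, 5): 1},
    ("hei", "subject-predicate"): {(1, 1): 16, (1, 2): 4, (1, 4): 1, (1, 5): 1, (1, 9): 1},
    ("hei", "attributive-head"): {(1, 1): 37, (1, 2): 9, (1, 3): 1, (1, 4): 3},
}


def experiential_share(counts):
    """Percentage of collocations whose color word is used in its original sense."""
    total = sum(counts.values())
    experiential = sum(n for s, n in counts.items() if domain(s) == "experiential")
    return round(100 * experiential / total, 1)


for key, counts in sense_counts.items():
    print(key, experiential_share(counts))
```

Under these counts, the experiential share is higher for 白 than for 黑 in both relations, which matches the comparison drawn in the text.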

5 Semantic Analysis of the Collocates of the Color Words 白 Bái ‘White’ and 黑 hēi ‘Black’

Words belonging to the same semantic category can comprehensively reflect their generic characteristics. After dividing the experiential domain and the abstract domain, we annotated the semantic categories of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ using A Thesaurus of Modern Chinese [29]. Then, through comparative analysis, we examined the similarities and differences of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in different grammatical relations.

5.1 Comparison Under the Subject-Predicate Relation

This section mainly compares and analyzes the distributional characteristics of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ under the subject-predicate relation. The number of ﬁrst-level semantic categories of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in the experiential domain and the abstract domain is shown in Table 5. The total number of the collocates of 白 bái ‘white’ is 25, while the number of 黑 hēi ‘black’ is slightly less than 白 bái ‘white’, with 23 in total. However, the ﬁrst-level semantic categories of 白 bái ‘white’ and 黑 hēi ‘black’ are not much different in general, and their collocates occur in the same four ﬁrst-level semantic categories, but the proportions are different: 白 bái ‘white’ accounts for the highest proportion in the semantic categories of Living Things, with a proportion of 48.0%, followed by Concrete Entities, accounting for 44.0%, Abstract Entities and Space&Time are relatively

Table 5. The semantic category distribution of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in the subject-predicate relation (each cell gives the number of collocates, with the percentage in parentheses)

| First-level semantic category | 白 bái ‘white’: experiential domain | 白 bái ‘white’: abstract domain | 白 bái ‘white’: total | 黑 hēi ‘black’: experiential domain | 黑 hēi ‘black’: abstract domain | 黑 hēi ‘black’: total |
|---|---|---|---|---|---|---|
| 抽象事物 chōuxiàng shìwù ‘Abstract Entities’ | 0 (0.0%) | 1 (33.3%) | 1 (4.0%) | 1 (6.3%) | 5 (71.4%) | 6 (26.1%) |
| 具體物 jùtǐ wù ‘Concrete Entities’ | 10 (45.5%) | 1 (33.3%) | 11 (44.0%) | 10 (62.5%) | 1 (14.3%) | 11 (47.8%) |
| 生物 shēngwù ‘Living Things’ | 11 (50.0%) | 1 (33.3%) | 12 (48.0%) | 5 (31.3%) | 0 (0.0%) | 5 (21.7%) |
| 時空 shíkōng ‘Space&Time’ | 1 (4.5%) | 0 (0.0%) | 1 (4.0%) | 0 (0.0%) | 1 (14.3%) | 1 (4.3%) |
| Total | 22 (100.0%) | 3 (100.0%) | 25 (100.0%) | 16 (100.0%) | 7 (100.0%) | 23 (100.0%) |

The Collocations of Chinese Color Words 129


small, each accounting for 4.0%; for 黑 hēi ‘black’, the categories rank from high to low as Concrete Entities, Abstract Entities, Living Things and Space&Time, accounting for 47.8%, 26.1%, 21.7% and 4.3% respectively. Next, we will make a more detailed semantic comparison of the collocates in the experiential domain and the abstract domain in Table 5 to explore the similarities and differences of their distribution.

(1) Comparison of the collocates of 白 bái ‘white’ between the experiential domain and the abstract domain

The data in Table 5 show that when 白 bái ‘white’ is used as a predicate, there are 22 collocates in the experiential domain and three collocates in the abstract domain, so the former is about seven times the latter. The experiential domain reflects the original meaning, while the abstract domain reflects the extended meanings. This distribution shows that when 白 bái ‘white’ is used as a predicate, people are inclined to use the original sense of 白 bái ‘white’ to describe the objective color of the collocates, indicating that the prototypical experience of 白 bái ‘white’ is strong. The number of collocates of 白 bái ‘white’ in the abstract domain is smaller, indicating that people make less use of metaphorical extension; that is, the metaphorical usage of 白 bái ‘white’ is relatively rare. In the following part, we will compare the semantic categories of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in the experiential domain and the abstract domain to explore the semantic distribution. The collocates of 白 bái ‘white’ in the experiential domain are mainly distributed in the two semantic categories of Living Things and Concrete Entities, accounting for 50.0% and 45.5% respectively, while the collocates in the abstract domain are fewer and evenly distributed: they fall into Abstract Entities, Concrete Entities and Living Things, each with a proportion of 33.3%.
The collocates of 白 bái ‘white’ in the experiential domain and the abstract domain are jointly distributed in the two semantic categories of Living Things and Concrete Entities. We will further compare the collocations’ differences of 白 bái ‘white’ in these two semantic categories. We further annotated the second-level semantic categories of the collocates in the experiential domain and the abstract domain, and found that the second-level semantic categories of the collocates belonging to Living Things include Biological Parts (頭髮 tóufà ‘hair’, 鬚髮 xūfà ‘beard’, 皮膚 pífū ‘skin’, 發須 fāxū ‘beard’, 鼻肉 bíròu ‘nose’, 鬢髮 bìnfà ‘hair on the temples’, 肌膚 jīfū ‘skin’, 牙齒 yáchǐ ‘tooth’, 羽毛 yǔmáo ‘feather’, and 臉 liǎn ‘face’) and Plants (甜玉米 tián yùmǐ ‘sweet corn’), of which the category of Biological Parts is the largest, while the collocates of the secondlevel semantic categories of Living Things in the abstract domain are People (愛人 àirén ‘lover’), indicating that the distribution of the collocates in the experiential domain and the abstract domain is different. In the second-level semantic category of Concrete Entities, the distribution of the collocates in the experiential domain and the abstract domain is overlapping: the second-level semantic category of the collocates in the experiential domain is richer, including ﬁve second-level semantic categories: Materials (沙 shā ‘sand’), Buildings (房 fáng ‘house’, 墻 qiáng ‘wall’), Daily Necessities (筷子 kuàizǐ ‘chopsticks’, 竹筷 zhúkuài ‘bamboo chopsticks’, 碗 wǎn ‘bowl’), Food (饅頭 mántóu ‘steamed bread’) and Natural Objects (顔色 yánsè ‘color’, 色 sè ‘color’, 雲 yún ‘cloud’), while the second-level semantic categories of the


collocates belonging to Concrete Entities in the abstract domain include only Buildings (墻壁 qiángbì ‘wall’). Buildings is thus the one second-level category of Concrete Entities shared by the experiential domain and the abstract domain.

(2) Comparison of the collocates of 黑 hēi ‘black’ between the experiential domain and the abstract domain

It can be seen from Table 5 that when 黑 hēi ‘black’ is used as a predicate, there are 16 subjects in the experiential domain and seven subjects in the abstract domain. The former is about 2.3 times the latter. In the subject-predicate relation, the number of collocations of 黑 hēi ‘black’ in the experiential domain is higher than that in the abstract domain, and the meaning in the experiential domain is the original meaning of 黑 hēi ‘black’, indicating that 黑 hēi ‘black’ is more often used to describe the color of the collocated words in the subject-predicate relation. However, the gap between the experiential domain and the abstract domain is smaller for 黑 hēi ‘black’ than for 白 bái ‘white’, which shows that people more often use the extended meanings of 黑 hēi ‘black’, derived through metaphorical cognitive extension on the basis of the original physical experience. Next, we will explore the distribution rules of 黑 hēi ‘black’ by comparing the collocational difference between the experiential domain and the abstract domain. The collocates of 黑 hēi ‘black’ in the experiential domain are concentrated in the two semantic categories of Concrete Entities and Living Things, accounting for 62.5% and 31.3% respectively, while up to 71.4% of those in the abstract domain are distributed in Abstract Entities, followed by Space&Time and Concrete Entities, each accounting for 14.3%.
An analysis of the second-level semantic categories shows that when the collocates of 黑 hēi ‘black’ in the experiential domain belong to Concrete Entities, the second-level semantic categories include Natural Objects (顔色 yánsè ‘color’, 色澤 sèzé ‘luster’, 色 sè ‘color’, 膚色 fūsè ‘skin color’, 池水 chíshuǐ ‘pool water’, 前鎮河 qiánzhènhé ‘Qianzhen river’), General Terms (算珠 suànzhū ‘calculation beads’, 它 tā ‘it’), Daily Necessities (水桶 shuǐtǒng ‘bucket’) and Cultural Goods (墨水 mòshuǐ ‘ink’). The collocates in the experiential domain that belong to Living Things fall into the second-level categories Biological Parts (頭髮 tóufà ‘hair’, 皮膚 pífū ‘skin’, 臉 liǎn ‘face’), Animals (烏鴉 wūyā ‘crow’), and Human (人 rén ‘people’). The collocates of 黑 hēi ‘black’ in the abstract domain under Abstract Entities have the following second-level categories: Things (過程 guòchéng ‘process’, 天候 tiānhòu ‘weather’), Attributes (夜色 yèsè ‘night’, 泰森 Tàisēn ‘Tyson’), and Consciousness (人心 rénxīn ‘human heart’). When the collocates of 黑 hēi ‘black’ in the abstract domain belong to Space&Time, the second-level semantic category is Space (天 tiān ‘sky’), which is used to describe a dark sky. Generally speaking, the collocates of 黑 hēi ‘black’ in the experiential domain are very different from those in the abstract domain. People prefer to describe Concrete Entities and Living Things with 黑 hēi ‘black’, because these two types of things bring a more direct embodied experience, namely visual impact. Through metaphorical extension, 黑 hēi ‘black’ tends to be used to describe Abstract Entities and Space&Time.
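The experiential-to-abstract ratios quoted for 白 bái ‘white’ and 黑 hēi ‘black’ under the subject-predicate relation can be checked with simple arithmetic on the domain totals in Table 5. The sketch below is only illustrative; the helper name `domain_ratio` and the dictionary keys are assumptions introduced here.

```python
def domain_ratio(experiential, abstract):
    """How many times larger the experiential count is than the abstract count."""
    if abstract == 0:
        raise ValueError("no collocates in the abstract domain")
    return experiential / abstract

# (experiential, abstract) collocate totals under the subject-predicate relation, from Table 5.
counts = {"bai_white": (22, 3), "hei_black": (16, 7)}

ratios = {word: round(domain_ratio(e, a), 1) for word, (e, a) in counts.items()}
print(ratios)
```

A larger ratio means that the original color sense dominates more strongly; the smaller value for 黑 hēi ‘black’ corresponds to its more frequent extended uses.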


(3) Comparison of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in the experiential domain Comparing the data in Table 5, we can see that the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in the experiential domain are mainly in the two semantic categories of Concrete Entities and Living Things. The difference is that the highest percentage of collocates in the experiential domain of 白 bái ‘white’ is Living Things, accounting for 50.0%, followed by Concrete Entities, accounting for 45.5%, while 黑 hēi ‘black’ is different, with Concrete Entities accounting for a highest proportion of 62.5%, followed by Living Things, with the proportion of 31.3%. When the collocates belong to Concrete Entities, the second-level semantic categories of 白 bái ‘white’ include ﬁve semantic categories: Materials (沙 shā ‘sand’), Buildings (房 fáng ‘house’, 墻 qiáng ‘wall’), Daily necessities (筷子 kuàizi ‘chopsticks’, 竹筷 zhúkuài ‘bamboo chopsticks’, 碗 wǎn ‘bowl’), Food Items (饅頭 mántóu ‘steamed bread’), and Natural Objects (顔色 yánsè ‘color’, 色 sè ‘color’, 雲 yún ‘cloud’), while there are four second-level semantic categories of 黑 hēi ‘black’ including Natural Objects (顔色 yánsè ‘color’, 色澤 sèzé ‘luster’, 色 sè ‘color’, 膚色 fūsè ‘skin color’, 池水chíshuǐ ‘pool water’, 前鎮河 qiánzhènhé ‘Qianzhen river’), General Terms (算珠 suànzhū ‘calculation beads’, 它 tā ‘it’), Daily Necessities (水桶 shuǐtǒng ‘bucket’) and Culture Goods (墨水 mòshuǐ ‘ink’). And the similarity between 白 bái ‘white’ and 黑 hēi ‘black’ is that they both include the second-level semantic categories of Natural Objects and Daily Necessities. 
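The comparison of second-level categories under Concrete Entities amounts to set intersection and difference. As a rough sketch (the variable names are assumptions; the category labels are those listed in the paragraph above), the shared and unique categories can be derived as follows:

```python
# Second-level categories under Concrete Entities in the experiential domain
# (subject-predicate relation), as listed in the text.
white = {"Materials", "Buildings", "Daily Necessities", "Food Items", "Natural Objects"}
black = {"Natural Objects", "General Terms", "Daily Necessities", "Cultural Goods"}

shared = white & black      # categories used with both color words
only_white = white - black  # categories found only with bai 'white'
only_black = black - white  # categories found only with hei 'black'

print(sorted(shared))
print(sorted(only_white))
print(sorted(only_black))
```

The same three operations apply to any pair of category lists in this section, for example the second-level categories under Living Things.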
When the collocates belong to Living Things, the second-level semantic categories of 白 bái ‘white’ include Biological Parts (頭髮 tóufà ‘hair’, 鬚髮 xūfà ‘beard’, 皮膚 pífū ‘skin’, 發須 fāxū ‘beard’, 鼻肉 bíròu ‘nose’, 鬢髮 bìnfà ‘sideburns’, 肌膚 jīfū ‘skin’, 牙齒 yáchǐ ‘tooth’, 羽毛 yǔmáo ‘feather’, and 臉liǎn ‘face’) and Plants (甜玉米 tián yùmǐ ‘sweet corn’), while the second-level semantic categories of 黑 hēi ‘black’ include Biological Parts (頭髮 tóufà ‘hair’, 皮膚 pífū ‘skin’, 臉 liǎn ‘face’), Animals (烏鴉 wūyā ‘crow’), and Human (人 rén ‘people’). And the similarity between 白 bái ‘white’ and 黑 hēi ‘black’ is that they both include the description of the biological part. In addition, 白 bái ‘white’ focuses on the description of plants, while 黑 hēi ‘black’ focuses on the animals. These similarities reflect the commonality of people’s cognition between 白 bái ‘white’ and 黑 hēi ‘black’, and the differences reflect people’s preference of choice. (4) Comparison of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in the abstract domain By comparing the data in Table 5, we found that there are three collocates for 白 bái ‘white’ and seven collocates for 黑 hēi ‘black’ in the abstract domain, accounting for 12.0% and 30.4% respectively. It shows that compared with 白 bái ‘white’, people more often use the extended meaning of 黑 hēi ‘black’ to achieve metaphorical extension. The data in Table 5 show that the collocates of 黑 hēi ‘black’ in the abstract domain are more concentrated, and those of 白 bái ‘white’ are more evenly distributed. The regularity of the distribution of 黑 hēi ‘black’ in the abstract domain is strong, mainly distributed in Abstract Entities, accounting for up to 71.4%, indicating that the basic usage of 黑 hēi ‘black’ as a color word has been extended to the abstract ﬁeld, and further exploration shows that the second-level semantic categories of 黑 hēi


‘black’ in the abstract domain include three categories: Things (過程 guòchéng ‘process’, 天候 tiānhòu ‘weather’), Attributes (夜色 yèsè ‘night’, 泰森 Tàisēn ‘Tyson’) and Consciousness (人心 rénxīn ‘human heart’). Different from 黑 hēi ‘black’, the second-level semantic category of 白 bái ‘white’ is Science&Education (話 huà ‘talk’). In the category of Concrete Entities, the second-level semantic category of 黑 hēi ‘black’ is Natural Objects (天 tiān ‘sky’), while that of 白 bái ‘white’ is Buildings (墻壁 qiángbì ‘wall’). In the category of Living Things, the second-level semantic categories of Biological Parts (心 xīn ‘heart’) and People (愛人 àirén ‘lover’) belong to 黑 hēi ‘black’ and 白 bái ‘white’ respectively. The exploration of the collocations in the abstract domain found that the choice of collocates for 白 bái ‘white’ and 黑 hēi ‘black’ is unique; that is, their collocates are not interchangeable. When the collocates are from Living Things, the sentences in the corpus are 人心太黑了 rénxīn tài hēi le ‘the heart is too dark’ and 愛人白了他一眼 àirén bái le tā yīyǎn ‘the lover gave him a white look’. Here, 黑 hēi ‘black’ and 白 bái ‘white’ both express extended meanings, which respectively represent the viciousness of a person and the lover’s contempt or dissatisfaction. But we cannot say 人心太白了 rénxīn tài bái le ‘the heart is too white’ or 愛人黑了他一眼 àirén hēi le tā yīyǎn ‘the lover gave him a black look’: such sentences are grammatical but not semantically acceptable. In addition, in the semantic category of Space&Time, there are no collocates for 白 bái ‘white’, while the second-level semantic category of 黑 hēi ‘black’ is Space (天 tiān ‘sky’); that is, we can usually say 天黑了 tiān hēi le sky_dark_ASP ‘It’s getting dark’, but not 天白了 tiān bái le sky_white_ASP ‘It’s getting white’.
These findings not only reflect the uniqueness of the collocates’ choice between 白 bái ‘white’ and 黑 hēi ‘black’, but also reflect the asymmetry of people’s cognition.

5.2 Comparison Under the Attributive-Head Relation

The number of collocates (the heads under the attributive-head relation) of 白 bái ‘white’ and 黑 hēi ‘black’ in the experiential domain and the abstract domain is shown in Table 6. There are 54 different collocates of 白 bái ‘white’ and 50 collocates of 黑 hēi ‘black’. As under the subject-predicate relation, 白 bái ‘white’ has slightly more collocates than 黑 hēi ‘black’, indicating that the heads modified by 白 bái ‘white’ cover a wide range of meanings, but the overall difference in number between 白 bái ‘white’ and 黑 hēi ‘black’ is not big. The top three semantic categories rank in the same order for 白 bái ‘white’ and 黑 hēi ‘black’: according to Table 6, Concrete Entities is the largest, with a proportion of 59.3% for 白 bái ‘white’ and 42.0% for 黑 hēi ‘black’, followed by Living Things, accounting for 31.5% and 30.0% respectively. Abstract Entities ranks third for both, accounting for 7.4% and 14.0% respectively (for 黑 hēi ‘black’, Space&Time also accounts for 14.0%). Next, we will make a more detailed comparison of the collocates in the experiential domain and the abstract domain of Table 6 to explore the rule of their distribution.

Table 6. The semantic distribution of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in the attributive-head relation (each cell gives the number of collocates, with the percentage in parentheses)

| First-level semantic category | 白 bái ‘white’: experiential domain | 白 bái ‘white’: abstract domain | 白 bái ‘white’: total | 黑 hēi ‘black’: experiential domain | 黑 hēi ‘black’: abstract domain | 黑 hēi ‘black’: total |
|---|---|---|---|---|---|---|
| 抽象事物 chōuxiàng shìwù ‘Abstract Entities’ | 2 (3.9%) | 2 (66.7%) | 4 (7.4%) | 5 (13.5%) | 2 (15.4%) | 7 (14.0%) |
| 具體物 jùtǐ wù ‘Concrete Entities’ | 31 (60.8%) | 1 (33.3%) | 32 (59.3%) | 18 (48.6%) | 3 (23.1%) | 21 (42.0%) |
| 生物 shēngwù ‘Living Things’ | 17 (33.3%) | 0 (0.0%) | 17 (31.5%) | 13 (35.1%) | 1 (7.7%) | 15 (30.0%) |
| 社會活動 shèhuì huódòng ‘Social Activities’ | 1 (2.0%) | 0 (0.0%) | 1 (1.9%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| 時空 shíkōng ‘Space&Time’ | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (2.7%) | 7 (53.8%) | 7 (14.0%) |
| Total | 51 (100.0%) | 3 (100.0%) | 54 (100.0%) | 37 (100.0%) | 13 (100.0%) | 50 (100.0%) |


(1) Comparison of the collocates of 白 bái ‘white’ between the experiential domain and the abstract domain

It can be seen from Table 6 that there are 51 collocates in the experiential domain and three collocates in the abstract domain. The former is about 17 times the latter. This shows that when 白 bái ‘white’ is used as an attributive modifier, people usually use its original meaning; that is, 白 bái ‘white’ has a stronger sense of prototypical experience and its metaphorical extension is rarely used. The collocates of 白 bái ‘white’ in the experiential domain are mainly distributed in Concrete Entities, with a proportion as high as 60.8%, followed by Living Things and Abstract Entities, accounting for 33.3% and 3.9% respectively, while in the abstract domain they are concentrated in Abstract Entities and Concrete Entities, accounting for 66.7% and 33.3% respectively. The collocates of 白 bái ‘white’ in both the experiential domain and the abstract domain are thus distributed in the semantic categories of Concrete Entities and Abstract Entities.
In the category of Concrete Entities, the second-level categories show that the collocates of 白 bái ‘white’ in the experiential domain focus on the description of Materials (花崗石 huāgǎngshí ‘granite’, 沙 shā ‘sand’), General Terms (上等貨 shàngděnghuò ‘premium goods’, 妝 zhuāng ‘makeup’), Buildings (花壇 huātán ‘flower bed’, 展廳 zhǎntīng ‘exhibition hall’, 舞臺 wǔtái ‘stage’), Daily Necessities (碟具 diéjù ‘dishware’, 韓服 hánfú ‘Hanbok’, 行頭 xíngtóu ‘actor’s costumes and paraphernalia’, 長裙 chángqún ‘long skirt’, 新衣 xīnyī ‘new clothes’, 棉衣 miányī ‘cotton-padded clothes’, 金針 jīnzhēn ‘golden needle’, 球衣 qiúyī ‘athlete’s sport shirt’, 制服 zhìfú ‘uniform’, 服裝 fúzhuāng ‘clothing’, 褲裝 kùzhuāng ‘trousers’), Food Items (味增 wèizēng ‘miso’, 沙律 shālǜ ‘salad’, 饅頭 mántóu ‘steamed bread’, 牛肉 niúròu ‘beef’), Cultural Goods (報表紙 bàobiǎozhǐ ‘report paper’, 小球 xiǎoqiú ‘ball’, 布景 bùjǐng ‘set’, 熒幕 yíngmù ‘screen’, 國旗 guóqí ‘flag’, 背景 bèijǐng ‘background’) and Natural Objects (雲朵 yúnduǒ ‘cloud’, 雲彩 yúncǎi ‘cloud’, 雲 yún ‘cloud’). In contrast to the diversified second-level semantic categories of Concrete Entities in the experiential domain, the collocates in the abstract domain fall into only one second-level category, namely Natural Objects (聲音 shēngyīn ‘sounds’). In the category of Abstract Entities, the collocates of 白 bái ‘white’ in the experiential domain belong to the second-level category Things (花絮 huāxù ‘features’, 角度 jiǎodù ‘angles’), while those in the abstract domain belong to Science&Education (話語 huàyǔ ‘discourse’, 話 huà ‘words’). In addition, the collocates of 白 bái ‘white’ in the experiential domain are also distributed in the semantic category of Living Things.
And the collocates in the second-level semantic categories are distributed in Animals (白鶴 bái hè ‘white crane’, 羊群 yángqún ‘flock of sheep or goats’, 鳥 niǎo ‘bird’), Biological Parts (貝殼平片 bèiké píngpiàn ‘flat shell’, 灰發 huīfā ‘gray hair’, 血肉 xuèròu ‘flesh and blood’, 花瓣 huābàn ‘petal’) and Plants (藕 ǒu ‘lotus root’, 蓮藕 liánǒu ‘lotus root’, 花菜 huācài ‘cauliflower’, 水仙 shuǐxiān ‘narcissus’, 芙蓉 fúróng ‘hibiscus’, 荷花héhuā ‘lotus’, 百合 bǎihé ‘lily’, 大蒜 dàsuàn ‘garlic’, 棉花 miánhuā ‘cotton’, 小麥 xiǎomài ‘wheat’).


(2) Comparison of the collocates of 黑 hēi ‘black’ between the experiential domain and the abstract domain There are a total of 37 collocations of 黑 hēi ‘black’ in the experiential domain and a total of 13 collocations in the abstract domain. The former is about 2.8 times the latter. This shows that the original meaning is used more than the extended meaning when 黑 hēi ‘black’ is used as an attributive modiﬁer to modify the head. The collocates of 黑 hēi ‘black’ in the experiential domain is mainly distributed in Concrete Entities and Living Things, accounting for 48.6% and 35.1% respectively. Under Concrete Entities, the collocates are widely distributed in second-level semantic categories, including Natural Objects (膚色 fūsè ‘skin color’, 濃煙 nóngyān ‘dense smoke’, 微粒 wēilì ‘particle’, 顏色 yánsè ‘color’), Stationery (招牌 zhāopái ‘signboard’, 紙 zhǐ ‘paper’), Daily Necessities (手環 shǒuhuán ‘bracelet’, 外套 wàitào ‘coat’, 夜行衣 yèxíngyī ‘night clothes’, 衣服 yīfú ‘clothes’, 表面 biǎomiàn ‘surface’), Utensils (磨刀石 módāoshí ‘whetstone’), Buildings (建築 jiànzhù ‘building’, 排水溝 páishuǐgōu ‘drain’), General Terms (物質 wùzhì ‘substance’) and Materials (煤 méi ‘coal’, 泥沙 níshā ‘silt’, 石油 shíyóu ‘oil’); under Living Things, except for a small number of modiﬁcation to Animals (馬 mǎ ‘horse’, 烏鴉 wūyā ‘crow’, 雄獅 xióngshī ‘lion’), the rest are mostly descriptions of People (隊員 duìyuán ‘team member’, 兒子 érzi ‘son’, 孩子 háizi ‘child’, 黑馬 hēi mǎ ‘dark horse’, 票販 piàofàn ‘ticket vendor’, 人 rén ‘people’) and Biological Parts (辮子 biànzi ‘braid’, 臉 liǎn ‘face’, 鬢毛 bìnmáo ‘hair on the temple’, 身體 shēntǐ ‘body’). In addition, the collocates of 黑 hēi ‘black’ in the experiential domain also have a slight distribution in Abstract Entities, accounting for 13.5%, and the second-level semantic category includes Science&Education (線條 xiàntiáo ‘line’) and Attributes (方克火 fāngkèhuǒ ‘Kehuo Fang’, 框架 kuàngjià ‘frame’, 劉青雲 liúqīngyún ‘Qingyu Liu’, 虎頭蜂 hǔtóufēng ‘hornet’). 
Most collocates of 黑 hēi ‘black’ in the abstract domain belong to Space&Time, accounting for 53.8%, followed by Concrete Entities, accounting for 23.1%. Under Space&Time, Time (夜晚 yèwǎn ‘night’, 白天 bái tiān ‘day’, 中午 zhōngwǔ ‘noon’, 夜 yè ‘night’) and Space (天空 tiānkōng ‘sky’, 會場 huìchǎng ‘venue’, 天 tiān ‘sky’) are roughly evenly distributed; under Concrete Entities, the collocates are all Buildings (隧道 suìdào ‘tunnel’, 舞臺 wǔtái ‘stage’, 車站 chēzhàn ‘station’). A comparison shows that although some collocates of 黑 hēi ‘black’ belong to Buildings in both the experiential domain and the abstract domain, they are used differently. In the experiential domain, 黑 hēi ‘black’ mainly describes the color of the building itself, while in the abstract domain, 黑 hēi ‘black’ often expresses an emotional or other extended meaning. For example, in 全國最黑的車站 quánguó zuì hēi de chēzhàn the whole country_most_bad_DE_station ‘the worst station in the country’, 黑 hēi ‘black’ no longer describes the color of the station but its service. Similarly, when 黑 hēi ‘black’ describes a “stage” or a “tunnel”, it refers not to the color of the object but to its lack of light.


(3) Comparison of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in the experiential domain It can be seen from Table 6 that the semantic category distribution of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in the experiential domain is consistent. The highest percentage of collocates in the experiential domain is Concrete Entities, accounting for 60.8% and 48.6% respectively, followed by Living Things, accounting for 33.3% and 35.1%, followed by Abstract Entities, with a ratio of 3.9% and 13.5%. This study further annotated the second-level semantic categories of the collocates. It is found that the collocates of 白 bái ‘white’ belong to these second-level categories under Concrete Entities: Materials (花崗石 huāgǎngshí ‘granite’, 沙 shā ‘sand’), General Terms (上等貨 shàngděnghuò ‘Premium goods’, 妝 zhuāng ‘makeup’), Buildings (花壇 huātán ‘flower bed’, 展廳 zhǎntīng ‘exhibition hall’, 舞臺 wǔtái ‘stage’), Daily Necessities (碟具 diéjù ‘dishware’, 韓服 hánfú ‘hanbok’, 行頭 xíngtóu ‘actor’s costumes and paraphernalia’, 長裙 chángqún ‘long skirt’, 新衣 xīnyī ‘new clothes’, 棉衣 miányī ‘cotton-padded clothes’, 金針 jīnzhēn ‘golden needle’, 球衣 qiúyī ‘athlete’s sport shirt’, 制服 zhìfú ‘uniform’, 服裝 fúzhuāng ‘clothing’, 褲裝 kùzhuāng ‘trousers’), Food Items (味增 wèizēng ‘miso’, 沙律 shālǜ ‘salad’, 饅頭 mántóu ‘steamed bread’, 牛肉 niúròu ‘beef’), Stationery (報表紙 bàobiǎozhǐ ‘report paper’, 小球 xiǎoqiú ‘small ball’, 布景 bùjǐng ‘setting’, 熒幕 yíngmù ‘screen’, 國旗 guóqí ‘flag’, 背景 bèijǐng ‘background’) and Natural Objects (雲朵 yúnduǒ ‘cloud’, 雲彩 yúncǎi ‘cloud’,雲 yún ‘cloud’), while the collocates of 黑 hēi ‘black’ belong to these second-level categories under Concrete Entities: Natural Objects (膚色 fūsè ‘skin color’, 濃煙 nóngyān ‘dense smoke’, 微粒 wēilì ‘particle’, 顏色 yánsè ‘color’), Cultural Goods (招牌 zhāopái ‘signboard’, 紙 zhǐ ‘paper’), Daily Necessities (手環 shǒuhuán ‘bracelet’, 外套 wàitào ‘coat’, 夜行衣 yèxíngyī ‘night clothes’, 衣服 yīfú ‘clothes’, 表面 biǎomiàn ‘surface’), Utensils (磨刀石 módāoshí ‘whetstone’), 
Buildings (建築 jiànzhù ‘building’, 排水溝 páishuǐgōu ‘drain’), General Terms (物質 wùzhì ‘substance’) and Materials (煤 méi ‘coal’, 泥沙 níshā ‘silt’, 石油 shíyóu ‘oil’). The difference between the two is that the former contains Food Items while the latter does not, and that the latter includes Utensils while the former does not. When the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in the experiential domain belong to the semantic category of Living Things, there are three second-level categories for 白 bái ‘white’, including Animals (白鶴 báihè ‘white crane’, 羊群 yángqún ‘flock of sheep or goats’, 鳥 niǎo ‘bird’), Biological Parts (貝殼平片 bèiké píngpiàn ‘shell flat’, 灰發 huīfā ‘gray hair’, 血肉 xuèròu ‘flesh and blood’, 花瓣 huābàn ‘petal’) and Plants (藕 ǒu ‘lotus root’, 蓮藕 liánǒu ‘lotus root’, 花菜 huācài ‘cauliflower’, 水仙 shuǐxiān ‘narcissus’, 芙蓉 fúróng ‘hibiscus’, 荷花 héhuā ‘lotus’, 百合 bǎihé ‘lily’, 大蒜 dàsuàn ‘garlic’, 棉花 miánhuā ‘cotton’, 小麥 xiǎomài ‘wheat’); and the secondlevel semantic categories of 黑 hēi ‘black’ includes Animals (馬 mǎ ‘horse’, 烏鴉 wūyā ‘crow’, 雄獅 xióngshī ‘lion’), People (隊員 duìyuán ‘team member’, 兒子 érzi ‘son’, 孩子 háizi ‘child’, 黑馬 hēimǎ ‘dark horse’, 票販 piàofàn ‘ticket scalper’, 人 rén ‘people’) and Biological Parts (辮子 biànzi ‘braid’, 臉 liǎn ‘face’, 鬢毛 bìnmáo ‘hair on the temple’, 身體 shēntǐ ‘body’).


When the collocates are Abstract Entities, the collocates of 白 bái ‘white’ are mainly distributed in the second-level semantic category Things (花絮 huāxù ‘interesting sidelights’, 角度 jiǎodù ‘angles’), while the second-level semantic categories of the collocates of 黑 hēi ‘black’ include Science&Education (線條 xiàntiáo ‘line’) and Attributes (方克火 fāngkèhuǒ ‘Fang Kehuo’, 框架 kuàngjià ‘frame’, 劉青雲 liúqīngyún ‘Liu Qingyun’, 虎頭蜂 hǔtóufēng ‘hornet’).

(4) Comparison of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ in the abstract domain

As shown in Table 6, the collocates of 白 bái ‘white’ in the abstract domain are concentrated in Abstract Entities, with a proportion as high as 66.7%, followed by Concrete Entities, accounting for 33.3%, while the collocates of 黑 hēi ‘black’ in the abstract domain are mainly distributed in Space&Time, accounting for 53.8%, followed by Concrete Entities with a ratio of 23.1%. The second-level semantic category of the collocates of 白 bái ‘white’ in Abstract Entities is Science&Education (話語 huàyǔ ‘discourse’, 話 huà ‘talk’), and that in Concrete Entities is Natural Objects (聲音 shēngyīn ‘sound’). For 黑 hēi ‘black’, the second-level categories of the collocates in Space&Time are Time (夜晚 yèwǎn ‘night’, 白天 bái tiān ‘day’, 中午 zhōngwǔ ‘noon’, 夜 yè ‘night’) and Space (天空 tiānkōng ‘sky’, 會場 huìchǎng ‘venue’, 天 tiān ‘sky’), and the category in Concrete Entities is Buildings (隧道 suìdào ‘tunnel’, 舞臺 wǔtái ‘stage’, 車站 chēzhàn ‘station’). In addition, in the semantic category of Living Things, there is one collocate (人 rén ‘human’) for 黑 hēi ‘black’, while there is no collocate for 白 bái ‘white’.

5.3 The Similarities and Differences of the Collocates’ Distribution Under the Subject-Predicate Relation and the Attributive-Head Relation

The semantic category distribution of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ under the subject-predicate relation and the attributive-head relation is shown in Table 7. In general, in the subject-predicate relation, the collocates belong to four semantic categories: Living Things, Abstract Entities, Concrete Entities, and Space&Time. In the attributive-head relation, the collocates belong to one more category, Social Activities, giving five categories in total. Although the number of semantic categories is different, the overall distribution of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ under the two relations is consistent. Taking the two color words together, Concrete Entities is the largest category under both relations, accounting for 45.8% of the collocates under the subject-predicate relation and 51.0% under the attributive-head relation. The second category is Living Things, accounting for 35.4% and 30.8% respectively. Next, we will make a more detailed comparison of the collocates from the perspective of the two grammatical relations to explore the differences between them.

Table 7. The semantic category distribution of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ (each cell gives the number of collocates, with the percentage in parentheses)

| First-level semantic category | Subject-predicate relation: 白 bái ‘white’ | Subject-predicate relation: 黑 hēi ‘black’ | Subject-predicate relation: total | Attributive-head relation: 白 bái ‘white’ | Attributive-head relation: 黑 hēi ‘black’ | Attributive-head relation: total |
|---|---|---|---|---|---|---|
| 抽象事物 chōuxiàng shìwù ‘Abstract Entities’ | 1 (4.0%) | 6 (26.1%) | 7 (14.6%) | 4 (7.4%) | 7 (14.0%) | 11 (10.6%) |
| 具體物 jùtǐ wù ‘Concrete Entities’ | 11 (44.0%) | 11 (47.8%) | 22 (45.8%) | 32 (59.3%) | 21 (42.0%) | 53 (51.0%) |
| 生物 shēngwù ‘Living Things’ | 12 (48.0%) | 5 (21.7%) | 17 (35.4%) | 17 (31.5%) | 15 (30.0%) | 32 (30.8%) |
| 社會活動 shèhuì huódòng ‘Social Activities’ | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (1.9%) | 0 (0.0%) | 1 (1.0%) |
| 時空 shíkōng ‘Space&Time’ | 1 (4.0%) | 1 (4.3%) | 2 (4.2%) | 0 (0.0%) | 7 (14.0%) | 7 (6.7%) |
| Total | 25 (100.0%) | 23 (100.0%) | 48 (100.0%) | 54 (100.0%) | 50 (100.0%) | 104 (100.0%) |


Speciﬁcally, there are differences in the semantic categories of the collocates of 白 bái ‘white’ under the subject-predicate relation and the attributive-head relation. Under the former, the collocates are mainly distributed in the two categories of Living Things and Concrete Entities, with a ratio up to 48.0% and 44.0%, respectively. Under the attributive-head relation, the collocates are mainly distributed in Concrete Entities and Living Things, accounting for 59.3% and 31.5%, respectively. It can be seen that regardless of the grammatical relationship, the collocates of 白 bái ‘white’ are mainly distributed in the two categories of Concrete Entities and Living Things, but the proportion is different. Similarly, whether it is under the subject-predicate relation or the attributive-head relation, Concrete Entities is the largest category for the collocates of 黑 hēi ‘black’, accounting for 47.8% and 42.0%, respectively. The difference is that, under the subject-predicate relation, the distribution of the collocates of 黑 hēi ‘black’ in Abstract Entities is the second highest, accounting for 26.1%; while under the attributive-head relation, the distribution of the collocates of that in Living Things is the second highest, with a ratio of 30.0%.
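The combined shares reported for each grammatical relation can be reproduced by adding the per-word counts and dividing by the relation total. The following is a minimal sketch for the subject-predicate relation only; the variable names are assumptions, and the counts are copied from Table 7.

```python
from collections import Counter

# Subject-predicate counts per first-level category, from Table 7.
white_sp = Counter({"Abstract Entities": 1, "Concrete Entities": 11,
                    "Living Things": 12, "Space&Time": 1})
black_sp = Counter({"Abstract Entities": 6, "Concrete Entities": 11,
                    "Living Things": 5, "Space&Time": 1})

# Counter addition sums the counts category by category.
combined = white_sp + black_sp
total = sum(combined.values())

# Share of each category among all subject-predicate collocates, in percent.
shares = {cat: round(100 * n / total, 1) for cat, n in combined.items()}
print(total, shares)
```

The same steps, applied to the attributive-head counts, give the figures in the right-hand half of Table 7.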

6 Conclusion

Color words play an important role in people’s understanding of the world, and color metaphors are often used in people’s daily life. However, there are few corpus-based statistical analyses of the collocations of Chinese color words. This paper selected the representative adjectives 白 bái ‘white’ and 黑 hēi ‘black’ in the color domain, used CWS to automatically extract their collocates from the Tagged Chinese Gigaword Corpus, and manually screened the collocations to filter out the incorrect ones in the subject-predicate relation and the attributive-head relation. We annotated the sense of 白 bái ‘white’ and 黑 hēi ‘black’ in each collocation. According to whether 白 bái ‘white’ and 黑 hēi ‘black’ have the original meaning or an extended meaning in each collocation, each collocate is assigned to the experiential domain or the abstract domain. Then we carried out semantic category annotation and explored the distributional differences of the collocates of the two adjectives in the experiential domain and the abstract domain. Through a comparative analysis, we found that the distribution of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’ reflects people’s preference for semantic choice to a certain extent. 白 bái ‘white’ and 黑 hēi ‘black’ are most often used to describe collocates belonging to Concrete Entities, followed by Living Things. There are some similarities and differences for 白 bái ‘white’ and 黑 hēi ‘black’ in different grammatical relations. Under the subject-predicate relation, the collocates of 白 bái ‘white’ are mainly distributed in the first-level semantic category Living Things, while those of 黑 hēi ‘black’ are mainly distributed in Concrete Entities. Under the attributive-head relation, words related to Concrete Entities are the most frequent collocates of both 白 bái ‘white’ and 黑 hēi ‘black’. The results are summarized in Table 8 and Table 9.


Table 8. First-level semantic categories of the collocates of 白 bái ‘white’ and 黑 hēi ‘black’

| Collocates | Domain | First-level semantic categories of the subject-predicate relation | First-level semantic categories of the attributive-head relation |
|---|---|---|---|
| Collocates of 白 bái ‘white’ | The experiential domain | Concrete Entities (45.5%), Living Things (50.0%), Space&Time (4.5%) | Abstract Entities (3.9%), Concrete Entities (60.8%), Living Things (33.3%), Social Activities (2.0%) |
| | The abstract domain | Abstract Entities (33.3%), Concrete Entities (33.3%), Living Things (33.3%) | Abstract Entities (66.7%), Concrete Entities (33.3%) |
| Collocates of 黑 hēi ‘black’ | The experiential domain | Abstract Entities (6.3%), Concrete Entities (62.5%), Living Things (31.3%) | Abstract Entities (13.5%), Concrete Entities (48.6%), Living Things (35.1%), Space&Time (2.7%) |
| | The abstract domain | Abstract Entities (71.4%), Concrete Entities (14.3%), Space&Time (14.3%) | Abstract Entities (15.4%), Concrete Entities (23.1%), Living Things (7.7%), Space&Time (53.8%) |
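The largest first-level category in each cell of Table 8 can be read off mechanically. The following Python sketch is only an illustration and not part of the original study: the percentages are copied from Table 8 (with the cells delimited so that each sums to roughly 100%), while the dictionary layout and the helper name `largest_categories` are assumptions introduced here.

```python
# Percentages copied from Table 8. The nested-dict layout and all names
# below are illustrative assumptions, not the authors' own tooling.
TABLE_8 = {
    ("bái", "experiential", "subject-predicate"): {
        "Concrete Entities": 45.5, "Living Things": 50.0, "Space&Time": 4.5},
    ("bái", "experiential", "attributive-head"): {
        "Abstract Entities": 3.9, "Concrete Entities": 60.8,
        "Living Things": 33.3, "Social Activities": 2.0},
    ("bái", "abstract", "subject-predicate"): {
        "Abstract Entities": 33.3, "Concrete Entities": 33.3, "Living Things": 33.3},
    ("bái", "abstract", "attributive-head"): {
        "Abstract Entities": 66.7, "Concrete Entities": 33.3},
    ("hēi", "experiential", "subject-predicate"): {
        "Abstract Entities": 6.3, "Concrete Entities": 62.5, "Living Things": 31.3},
    ("hēi", "experiential", "attributive-head"): {
        "Abstract Entities": 13.5, "Concrete Entities": 48.6,
        "Living Things": 35.1, "Space&Time": 2.7},
    ("hēi", "abstract", "subject-predicate"): {
        "Abstract Entities": 71.4, "Concrete Entities": 14.3, "Space&Time": 14.3},
    ("hēi", "abstract", "attributive-head"): {
        "Abstract Entities": 15.4, "Concrete Entities": 23.1,
        "Living Things": 7.7, "Space&Time": 53.8},
}


def largest_categories(cell):
    """Return every category tied for the highest share, plus that share."""
    top = max(cell.values())
    return [name for name, share in cell.items() if share == top], top


for key, cell in TABLE_8.items():
    names, share = largest_categories(cell)
    print(key, names, share)
```

Returning a list rather than a single name keeps the three-way tie in the abstract domain of 白 bái ‘white’ under the subject-predicate relation visible instead of silently picking one category.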

Table 9. Second-level semantic categories of the collocates from the largest first-level semantic category

| Collocates | Domain | Largest first-level semantic category of the subject-predicate relation | Second-level semantic categories (subject-predicate relation) | Largest first-level semantic category of the attributive-head relation | Second-level semantic categories (attributive-head relation) |
|---|---|---|---|---|---|
| Collocates of 白 bái ‘white’ | The experiential domain | Living Things (50.0%) | Biological parts, Plants | Concrete Entities (60.8%) | Materials, General Terms, Buildings, Daily Necessities, Food Items, Cultural Goods, Natural Objects |
| | The abstract domain | Abstract Entities (33.3%), Concrete Entities (33.3%), Living Things (33.3%) | Buildings, People | Abstract Entities (66.7%) | Science&Education |
| Collocates of 黑 hēi ‘black’ | The experiential domain | Concrete Entities (62.5%) | Natural Objects, General Terms, Daily Necessities, Cultural Goods | Concrete Entities (48.6%) | Natural Objects, Cultural Goods, Daily Necessities, Utensils, Buildings, General Terms, Materials |
| | The abstract domain | Abstract Entities (71.4%) | Things, Attributes, Consciousness | Space&Time (53.8%) | Space, Time |


Acknowledgment. This study is funded by the research project of National Language Commission (Project Number: YB135-159).


The Differences Between Jiùshì and Jiùsuàn as Conjunctions and Their Formation Mechanisms

Wei Bian

Department of Chinese Language and Literature, School of Humanities and Social Sciences, Tsinghua University, Beijing, China

Abstract. The conjunctions Jiùshì (就是) and Jiùsuàn (就算) are not always interchangeable. This paper describes the differences between the two words in terms of syntax and semantics. Syntactically, the two words differ in five respects; semantically, Jiùshì is multifunctional, while Jiùsuàn has a single function and strong subjectivity. The semantic differences give rise to the syntactic ones. The mechanisms behind these differences need to be explored from the diachronic perspective of lexicalization and grammaticalization. We find that Jiùshì and Jiùsuàn followed different evolutionary paths. Although both underwent lexicalization, Jiùshì was gradually grammaticalized into a conjunction through decategorialization, whereas Jiùsuàn developed directly from an adverbial phrase into a conjunction of hypothesis and concession through abduction. The multifunctionality of Jiùshì constrains its use as a conjunction. Jiùsuàn, with its single function, acquired strong subjectivity under the action of persistence and abduction, and is more readily used in rhetorical questions and adversative complex sentences, where it can co-occur with Nándào (难道), Dànshì (但是) and so on. The paper ends with a discussion of the usage of the two words in the Ming and Qing Dynasties. The data confirm that the synchronic differences result from diachronic evolution, and that diachronic evolution can be used to analyze synchronic differences.

Keywords: Zòngyǔ conjunction · Jiùshì · Jiùsuàn · Multifunctionality · Subjectivity · Grammaticalization

© Springer Nature Switzerland AG 2022
M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 144–159, 2022. https://doi.org/10.1007/978-3-031-06703-7_11

1 Introduction

The conjunctions Jiùshì (就是) ‘even if’ and Jiùsuàn (就算) ‘even if’ are used in compound sentences expressing hypotheses and concessions. What they introduce is usually a hypothetical or virtual situation, which may not actually exist. Conjunctions of this type are called Zòngyǔ conjunctions (纵予连词) ‘conjunctions for hypotheses and concessions’.

Previous research on the conjunctions Jiùshì and Jiùsuàn has mainly focused on their lexicalization and grammaticalization. [1] believed that both words were formed through lexicalization. He regarded Suàn (算) as a clitic, which was directly attached to the conjunction Jiù (就) by analogy, and he regarded Shì (是) as a suffix that can be attached to almost all types of conjunctions. [2] believed that Jiùshì developed from a judgment phrase into a conjunction for hypotheses and concessions, formed through the absorption of contextual meaning in enumerative sentences. [3] and [4] believed that there is a transitional stage in the process of grammaticalization, at which Jiùshì acts as a mood adverb with an emphatic function. Regarding the formation of Jiùsuàn, [5] pointed out that Jiùsuàn was originally an adverbial phrase and developed into a conjunction through the mechanisms of component shedding and metonymy.

The previous research has two shortcomings. First, it lacks description and explanation of the synchronic differences between the two words. Although [6] made a preliminary comparison in terms of syntax, semantics, and pragmatics, she did not explore in depth the mechanisms behind the differences. Second, previous studies did not link the diachronic evolution of the two words with their synchronic differences. As a result, neither the causes of the synchronic differences nor the important influence of diachronic evolution can be seen clearly. Therefore, this article analyzes the differences between Jiùshì and Jiùsuàn as conjunctions and their formation mechanisms. We believe that the multifunctionality of Jiùshì and the subjectivity of Jiùsuàn are the key factors behind the differences between the two words, and that the two words underwent different developmental processes.

2 The Differences Between Jiùshì and Jiùsuàn

Jiùshì and Jiùsuàn have some similarities: both can combine with Yě (也) to form the “Jiùshì/Jiùsuàn p, Yě q” format. In this case they are interchangeable, and the meaning of the sentence does not change. The use of Yě makes the concessive relation more obvious, and Yě can even be used alone, without Jiùshì/Jiùsuàn. However, when Yě is absent, or in some other situations, the uses of Jiùshì and Jiùsuàn differ, because Jiùshì is multifunctional and potentially ambiguous, while the meaning of Jiùsuàn is clear and more strongly subjective.

2.1 Differences in the Syntax of the Two Words

According to the description of the usage of the two words in [7], combined with our investigation of the corpus, we compare the two words from the following five perspectives: the order of the clauses, the matching of correlative words, the sentence types, the insertion of modal particles, and the ability to connect nominal components.¹

¹ For the convenience of comparison and analysis, most of the Modern Chinese language data in this article are self-constructed. For the retrieval of Ancient Chinese data, we used the CCL corpus and the Ancient Chinese Corpus of Chinese Electronic Literature of Academia Sinica.

First, the two words differ in their position in compound sentences. Jiùsuàn and the clause it introduces can be moved back to the position of the main clause; the original main clause then comes first, and the Yě in it must be removed. Jiùshì and the components it introduces cannot be moved backward. E.g.:

(1) Wǒmen yídìng jìnlì qiǎngjiù bìngrén, jiùsuàn/*jiùshì zhǐ yǒu yíxiàn xīwàng.
1PL must manage rescue patient even.if/*but² only have faint hope
‘We must do our best to rescue the patient, even if there is only a glimmer of hope.’³

Second, the two words differ in how they combine with correlative words. 800 Words in Modern Chinese states that Jiùsuàn can be paired with Dànshì (但是), Kěshì (可是), etc., where the latter clause emphasizes the adversative relation on the premise of concession, while Jiùshì cannot be used in this way. E.g.:

Third, the two words differ in their ability to form rhetorical questions. Jiùsuàn can combine with Nándào (难道), and its ability to form rhetorical questions is stronger than that of Jiùshì. E.g.:

Fourth, the two words differ in their ability to take a pause and a following modal particle. A pause is possible after Jiùsuàn, which can then be followed by the modal particle Ne (呢) or Ba (吧). Jiùshì cannot be used this way. E.g.:

Fifth, the two words differ in their ability to connect nominal components. [8] pointed out that Jiùshì can directly connect nominal components, whereas Shì is usually added after Jiùsuàn to make the sentence read more smoothly. E.g.:

² In the translation process, due to the multifunctionality of Jiùshì, we choose possible understandings other than Zòngyǔ, which will help us compare Jiùshì and Jiùsuàn.

³ Due to space limitations, we only show the translation of the correct sentence.


In short, Jiùshì and Jiùsuàn are unevenly distributed syntactically. The first four cases favor Jiùsuàn, and only the fifth mostly uses Jiùshì, because the morpheme Shì makes it easier to connect nominal components.

2.2 Differences in the Semantics of the Two Words

Semantically, Jiùshì is multifunctional, while Jiùsuàn has a single function and greater subjectivity. The semantic differences between the two words are fundamentally related to the meanings of Shì and Suàn. Shì itself can express judgment, and when modified by the emphatic adverb Jiù it expresses a positive judgment. Jiùshì developed from a phrase into an adverb, and further into a Zòngyǔ conjunction. This diachronic evolution has given rise to the multifunctionality and ambiguity of Jiùshì. E.g.:

It can be seen that, in addition to expressing hypothesis and concession, Jiùshì also expresses affirmation and emphasis, and is sometimes used as an adversative. E.g.:

In contrast, Suàn expresses a subjective judgment: the speaker classifies a certain thing or state as a certain situation, mostly a virtual one, and the adverb Jiù strengthens this subjectivity. Because Jiùsuàn often introduces hypothetical concession clauses, it gradually solidified into a conjunction under the effects of persistence and abduction.

2.2.1 The Multifunctionality of Jiùshì

[9] pointed out that multifunctionality is the phenomenon whereby a coding form (lexical form, grammatical component, grammatical category, or structural formula) in a language has two or more different but related functions. With reference to [10], we find that Jiùshì can be used as an auxiliary, an adverb and a conjunction, whereas in terms of part of speech Jiùsuàn can only be used as a conjunction.⁴ In the “multi-domain” part of the BCC corpus, we examined the first 50 sentences containing both Jiùshì and Nándào. The functional distribution of Jiùshì is as follows (Table 1):

Table 1. Function distribution of Jiùshì and Nándào collocation

| | Adverb | Conjunction | Particle (助词) | Phrase | Total |
|---|---|---|---|---|---|
| Jiùshì | 37 | 1 | 1 | 11 | 50 |

An example sentence where Jiùshì is used as a conjunction is as follows:

In the first 50 sentences containing both Jiùshì and Dànshì, the situation of Jiùshì is as follows (Table 2):

Table 2. Function distribution of Jiùshì and Dànshì collocation

| | Adverb | Conjunction | Particle (助词) | Phrase | Total |
|---|---|---|---|---|---|
| Jiùshì | 42 | 0 | 0 | 8 | 50 |

Examples of Jiùshì used as a Zòngyǔ conjunction in this environment are not completely absent, but they are rare. E.g.:

We also investigated the first 50 sentences in which Jiùsuàn co-occurs with Nándào in the “multi-domain” part of the BCC corpus. Among them, 48 cases are conjunctions and 2 are phrases. The same is true for Jiùsuàn and Dànshì. It can be seen that when paired with Nándào or Dànshì, Jiùshì is mainly used as an adverb and rarely as a conjunction. In contrast, the function of Jiùsuàn is relatively simple: it is mainly used as a conjunction. Although there are examples such as (8) and (9), which may result from analogy and functional expansion, they are too few to affect our basic point of view.
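The contrast in these counts is easier to see as proportions of each 50-sentence sample. The sketch below is a minimal illustration under stated assumptions: the counts are copied from Tables 1 and 2 and from the Jiùsuàn and Nándào figures reported in the text, while the dictionary layout and the helper name `shares` are introduced here and are not part of the original study.

```python
# Counts copied from Tables 1 and 2 and from the Jiùsuàn + Nándào figures
# in the text (48 conjunctions, 2 phrases). The layout and the helper name
# `shares` are illustrative assumptions.
SAMPLES = {
    ("Jiùshì", "Nándào"): {"adverb": 37, "conjunction": 1, "particle": 1, "phrase": 11},
    ("Jiùshì", "Dànshì"): {"adverb": 42, "conjunction": 0, "particle": 0, "phrase": 8},
    ("Jiùsuàn", "Nándào"): {"conjunction": 48, "phrase": 2},
}


def shares(counts):
    """Convert raw counts into percentages of the sample, rounded to one decimal."""
    total = sum(counts.values())
    return {function: round(100 * n / total, 1) for function, n in counts.items()}


for pair, counts in SAMPLES.items():
    print(pair, shares(counts))
```

On these counts, Jiùshì is a conjunction in 2% of the Nándào sample and in none of the Dànshì sample, against 96% for Jiùsuàn with Nándào.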

⁴ The function of Jiùshì as a conjunction is closely related to its function as an adverb, but not to its function as an auxiliary word. Therefore, this article does not discuss the auxiliary use in detail. Jiùsuàn can also be used as an adverbial phrase, which affects its conjunction function.


2.2.2 The Subjectivity of Jiùsuàn

Subjectivity refers to the fact that, in producing an utterance, the speaker also expresses his or her position, attitude, and feelings towards it, thus leaving a personal imprint on the discourse [11, 12]. In this article it is embodied, first, in the speaker’s epistemic modality: in the word Jiùsuàn, Suàn carries the meaning of subjective classification, and persistence and abduction further promote the subjectivity of the word. Second, it is embodied in the speaker’s affect: Jiùsuàn expresses different emotions in different contexts. We examined the use of Jiùsuàn in the BCC corpus and categorized it according to semantic features and emotional expressions:

Table 3. Semantic features and emotional expressions of Jiùsuàn

| Semantic intensity | Psychological expectation | Conventional reasoning | Real psychology | Examples of emotions and attitudes | Types |
|---|---|---|---|---|---|
| High, exaggerated | − | − | + | Fearless, Resolute | I |
| High, exaggerated | + | + | − | Disappointed, Critical | IIA |
| High, exaggerated | + | − | + | Praise, Approve | IIB |
| Low, realistic | − | − | + | Dissatisfied, Disputed | III |
| Low, realistic | + | + | − | Negative, Pessimistic | IVA |
| Low, realistic | + | − | + | Positive, Proper | IVB |

As shown in Table 3, in terms of semantic intensity, the components following the Zòngyǔ conjunctions Jiùshì and Jiùsuàn can be divided into a high-intensity exaggerated type and a low-intensity realistic type. The former is typically a virtual situation with high semantic intensity, which does not exist or is difficult to achieve in reality. The latter is a real situation, which is conceded. Psychological expectation is divided into positive and negative expectation [13]. The former is the hope that the event will occur, generally a positive situation; the latter concerns an undesired event, mostly a negative situation. Conventional reasoning is the result that follows naturally from the hypothetical situation, in line with general cognition. It is divided into positive and negative cases and is implicit in the sentence. Real psychology is reflected in the speaker’s actual thoughts in the main clause, which are often contrary to conventional reasoning; it is also divided into positive and negative cognition. Emotional attitude is expressed through hypothetical concession and conveys the subjectivity of the sentence. There are four types overall, of which II and IV can each be divided into two cases according to positive and negative cognition. Taking Jiùsuàn as an example, the example sentences are as follows:

By examining the “multi-domain” part of the BCC corpus, we find that in the 50 sentences in which the Zòngyǔ conjunction Jiùsuàn is paired with Nándào and the 50 in which it is paired with Dànshì, rhetorical questions and adversatives mostly occur in type III and type IVA (Table 4).

Table 4. Semantic distribution of Nándào and Dànshì

| | I | IIA | IIB | III | IVA | IVB | Total |
|---|---|---|---|---|---|---|---|
| Nándào | 1 | 0 | 0 | 35 | 14 | 0 | 50 |
| Dànshì | 0 | 0 | 0 | 27 | 23 | 0 | 50 |
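How strongly the two samples concentrate in types III and IVA can be checked with a few lines of Python. This is a hedged sketch rather than the authors' procedure: the counts are copied from Table 4, and the names `TYPES`, `TABLE_4` and `share_of_types` are assumptions made for illustration.

```python
# Counts copied from Table 4; the variable and function names are
# illustrative assumptions.
TYPES = ["I", "IIA", "IIB", "III", "IVA", "IVB"]
TABLE_4 = {
    "Nándào": [1, 0, 0, 35, 14, 0],
    "Dànshì": [0, 0, 0, 27, 23, 0],
}


def share_of_types(word, wanted):
    """Percentage of the sample for `word` that falls into the types in `wanted`."""
    counts = dict(zip(TYPES, TABLE_4[word]))
    return 100 * sum(counts[t] for t in wanted) / sum(counts.values())


for word in TABLE_4:
    print(word, share_of_types(word, ["III", "IVA"]))
```

Types III and IVA together cover 49 of the 50 Nándào sentences and all 50 of the Dànshì sentences.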

It can be seen from the table that when Jiùsuàn is combined with Nándào or Dànshì, it concentrates on the low-intensity realistic type. Although type IVB is also realistic, it expresses a positive tone and is therefore not suitable. Types III and IVA mainly express emotions and attitudes such as dissatisfaction, rebuttal, and negation, which is consistent with what Nándào and Dànshì express. Owing to the persistence of subjective classification, Jiùsuàn is more subjective and therefore better suited than Jiùshì to expressing these emotions and attitudes. We replaced Jiùsuàn with Jiùshì in each example and found that the replacement was relatively smooth in examples (10) and (11), while it was more difficult for type III in example (12) and type IVA in example (13), because these contain Nándào and Dànshì. This is not to say that Jiùshì cannot express types III and IVA; rather, it is not suitable for rhetorical questions and adversative sentences, and such strongly subjective sentences occur mostly in types III and IVA. If the sentence is transformed into a declarative, Jiùshì can be used, but the tone of dissatisfaction and rebuttal is greatly weakened. For example:

In short, in collocation with words such as Nándào and Dànshì, Jiùsuàn is generally used instead of Jiùshì. Comparing the syntactic and semantic differences between the two words, we find that semantics largely determines syntax: the syntactic differences are caused by the semantic ones. The syntactic environments available to Jiùshì are limited because its multifunctionality brings ambiguity, especially through its uses as an adverb and as an adversative conjunction. The syntactic distribution of Jiùsuàn is wider precisely because of its single function and strong subjectivity. How, then, did these semantic differences come into being? To answer this, we need to look at the diachronic evolution of the two words.

3 The Evolution of Jiùshì and Jiùsuàn

[14] pointed out that around the fourth and fifth centuries, Shì was commonly added after other words, resulting in “*Shì” type words such as Fēishì (非是), Yóushì (犹是) and so on. Through a corpus search, we found that words such as Fēishì and Yóushì did appear in large numbers in Middle Chinese, but we found no case of Jiùshì used as a Zòngyǔ conjunction; this use did not appear until the end of the Yuan Dynasty. Before the Zòngyǔ conjunction use appeared, Jiùshì was first used as a phrase and as an emphatic adverb. This shows that the Zòngyǔ conjunction Jiùshì was not formed by adding the suffix Shì to Jiù, but underwent lexicalization and grammaticalization. Similarly, Jiùsuàn was not formed by directly attaching Suàn to Jiù. In fact, both words evolved from adverbial phrases into Zòngyǔ conjunctions, but their specific evolutionary paths differ.

3.1 The Evolution of Jiùshì and the Formation of Multifunctionality

Jiù was used as a Zòngyǔ conjunction no later than the Wei, Jin and Six Dynasties.

In the above two examples, the clause containing Jiù and the following clause stand in a hypothetical concession relationship, and Jiù can be understood as Jíshǐ (即使) ‘even if’. [15] pointed out that the Zòngyǔ use of Jiù may date from the Eastern Han Dynasty, while Jiùshì was used for Zòngyǔ in the Yuan and Ming Dynasties; Jiùsuàn probably appeared later. [4] argued in detail that the Zòngyǔ conjunction Jiù originates from the emphatic adverb Jiù. [2] pointed out that Jiù and Shì probably began to be used together in the Song and Yuan Dynasties, initially as an adverb-verb phrase expressing judgment. E.g.:

In the above example, Jiùshì expresses judgment and is used as a verb phrase. Jiùshì expressing judgment is often followed by a noun phrase. As its function expanded, it could also be followed by a predicate phrase or a clause, and it developed into a mood adverb. E.g.:

Example (18) can be seen as representing a transitional stage, and two interpretations are possible. It can be understood as a judgment identifying the person mentioned, with Jiùshì as an adverbial phrase; it can also be understood as emphasizing the act of ‘saving that woman’, with Jiùshì as an adverb. Jiùshì in example (19) clearly no longer expresses judgment, but emphasizes an action or state. The Zòngyǔ use of Jiùshì dates roughly from the end of the Yuan Dynasty. E.g.:

Syntactically, the format “Jiùshì NP/VP Yě VP” was formed; semantically, it expresses a relation of hypothesis and concession. Jiùshì had become a typical Zòngyǔ conjunction. During the Ming and Qing Dynasties, the Zòngyǔ conjunction Jiùshì appeared in large numbers and was often combined with Yě. E.g.:

On the whole, the evolution of Jiùshì is as follows. At first it was an adverbial phrase expressing judgment; when another verb appeared in the sentence, Jiùshì became an adverb with a strong modal force. Since Jiùshì often appeared in the sentence pattern “Jiùshì A Yě B”, whose meaning is that even if situation A holds, the truth of B is not affected by it, Jiùshì underwent grammaticalization and became a Zòngyǔ conjunction.

As mentioned in the introduction, scholars disagree on whether Jiùshì passed through a modal adverb stage. We believe that this stage exists, and that it is the source of the multifunctionality of Jiùshì. The evidence is as follows. First, the development of adverbs into Zòngyǔ conjunctions has typological support: [4] demonstrated the evolution of Jiùshì from emphasis to Zòngyǔ from a typological perspective. Second, the functions of judgment and emphasis are closely related, and the boundary between them is sometimes unclear, as in (18). Third, in terms of decategorialization, this evolution is in line with the cline of categoriality. [16] points out that there is such a cline: major category (> intermediate category) > minor category. In this schema the major categories are noun and verb (categories that are relatively “open” lexically), and minor categories include preposition, conjunction, auxiliary verb, pronoun, and demonstrative (relatively “closed” categories). Adjectives and adverbs comprise an intermediate degree between the major and minor categories and can often be shown to derive straightforwardly from (participial) verbs and (locative, manner, etc.) nouns respectively. The adverbial function objectively exists in the evolution of Jiùshì, and the direction of evolution is in line with the cline of categoriality. Finally, a direct development from adverbial phrase to Zòngyǔ conjunction is not possible. In terms of position, the adverbial phrase Jiùshì is generally located within the sentence, while the conjunction Jiùshì is generally located at the beginning. In terms of semantics, the components following the phrase Jiùshì are mostly objective and real, while those following the conjunction Jiùshì are mostly virtual and exaggerated.

Following the research of [17] on the relationship between the critical environment and the grammaticalizing element, we believe that in the frame “Jiùshì A, Yě B”, Jiùshì and Yě are unchangeable items that frequently appear in the same frame, while A and B are changeable items, i.e. replaceable components of the frame. The conditions for the grammaticalization of the Zòngyǔ conjunction Jiùshì are as follows. First, the unchangeable item Jiùshì is generally located at the beginning of the sentence, where it emphasizes A, and the clause it introduces serves as a premise. Second, the changeable item A is generally a virtual exaggeration, which can be a predicate or be extended to a substantive, indicating a kind of hypothesis. Finally, the unchangeable item Yě is essential: the clause introduced by Yě forms a semantic contrast with the preceding clause and thus produces the meaning of concession. When the hypothetical concession relationship arises within the entire frame, the Zòngyǔ conjunction Jiùshì is formed.

3.2 The Evolution of Jiùsuàn and the Highlight of Subjectivity

The subjectivity of Jiùsuàn is reflected in two respects: first, the speaker’s epistemic modality, that is, the persistence of the subjective classification meaning of Suàn in Jiùsuàn and the effect of abduction;⁵ second, the affect of the speaker, since Jiùsuàn conveys different emotions in different contexts. The earliest combination of Jiù and Suàn appeared in the Ming Dynasty, with the adverb Jiù modifying the verb Suàn to form an adverbial phrase. Quoting from [5]:

Jiùsuàn expresses subjective classification and usually refers to a virtual situation. In example (23), Jiùsuàn means that the situation is subjectively regarded as credit. The use of Jiùsuàn as a Zòngyǔ conjunction also originated in the Ming Dynasty. E.g.:

⁵ [16] holds that when a form undergoes grammaticalization from a lexical to a grammatical item, some traces of its original lexical meanings tend to adhere to it, and details of its lexical history may be reflected in constraints on its grammatical distribution. This phenomenon has been called “persistence”.


The Zòngyǔ conjunction Jiùsuàn continued to be used in the Qing Dynasty. In addition to Yě, Jiùsuàn could also be used with Nándào, etc., in rhetorical questions. E.g.:

[5] holds that Jiùsuàn was at first an adverbial phrase indicating subjective classification. In the format “A Jiùsuàn B”, A, which indicates the condition, gradually dropped out, and “Jiùsuàn B” came to take a follow-up main clause. The function of Jiùsuàn thus shifted from introducing virtual results to introducing virtual conditions. Through metonymy, it expresses that even if the situation is the case, the fact still holds. As it frequently introduced hypothetical concessive clauses, Jiùsuàn gradually condensed into a Zòngyǔ conjunction. However, Shang Guowen [5] leaves two issues unexplained: first, why Jiùsuàn did not develop an adverbial use; second, how the subjectivity of Jiùsuàn developed and what impact it had on the syntax. We offer the following explanations.

First, the difference has to do with the meanings of Shì and Suàn themselves. Shì can by itself express judgment. Modified by the modal adverb Jiù, it expresses a positive and objective judgment, so it readily develops a modal adverb use. In contrast, Suàn means ‘treat as’, which is a subjective judgment, and the modal adverb Jiù reinforces this subjectivity. Jiùsuàn therefore failed to develop an adverbial use.

Second, we believe that abduction plays a more important role in semantic evolution and in the highlighting of subjectivity.6 The abduction process that takes Jiùsuàn from ‘treat as’ to ‘assume’ is as follows:

The Result: The phrase Jiùsuàn indicates a subjective classification of virtual reality.
The Law: Assuming that a situation exists generally involves virtual reality.
The Case: It may be assumed that something happened.

6 [16] pointed out that abduction is to infer from the observed results, according to the law, that something may be a case of the law. Compared with inductive reasoning and deductive reasoning, abduction is more important for grammaticalization.


Under the influence of abduction, Jiùsuàn acquires the meaning of hypothesis. When it is paired with words such as Yě, a further abduction can take place, yielding the meaning of concession:

The Result: The virtual condition assumed by Jiùsuàn is compared with the actual situation represented by Yě.
The Law: To express a concessive relation, it is necessary to assume a virtual condition and compare it with the real situation.
The Case: The sentence may be expressing a concession.

At this point, Jiùsuàn has evolved into a Zòngyǔ conjunction expressing hypothesis and concession. It is worth noting that Suàn in the Zòngyǔ conjunction Jiùsuàn is not grammaticalized as deeply as Shì in Jiùshì. [18] pointed out that the Shì in the conjunction Jiùshì has been further grammaticalized into a word-internal element. Owing to the persistence of subjective classification and the influence of abduction, Jiùsuàn is more subjective and expresses a richer range of emotions.

3.3 The Differences and Influence of the Evolution Paths of Jiùshì and Jiùsuàn

The above analysis shows that the evolutionary paths of Jiùshì and Jiùsuàn are not exactly the same. Although both have undergone lexicalization, the Zòngyǔ conjunction Jiùshì was grammaticalized through decategorialization, whereas Jiùsuàn developed directly from an adverbial phrase into a Zòngyǔ conjunction through abduction. The details are shown in Figs. 1 and 2. The two evolutionary paths gave rise to the multifunctionality of Jiùshì and the subjectivity of Jiùsuàn, which determine the differences in the usage of the two words.7 The modal adverb use of Jiùshì affects semantic interpretation, whereas Jiùsuàn belongs to a single part of speech and is strongly subjective, so it is more flexible in expressing semantic strength and emotion and can be combined with Dànshì, Nándào, Kěshì and so on. Next, we analyze the use of the two words in the Ming and Qing Dynasties for further explanation. Table 5 clearly shows that the differences in the diachronic evolution of the two words bring about their synchronic differences. First, because it underwent decategorialization, Jiùshì has become multifunctional: it can be used as an adverb in addition to a conjunction, whereas Jiùsuàn belongs to a single part of speech.8 The multifunctionality of Jiùshì accounts for the first and fourth perspectives in Sect. 2.1. Second, owing to the influence of persistence and abduction, Jiùsuàn is more subjective. When used

7 [19] pointed out that the process of evolution from a form of greater than a word to a word is generally called “lexicalization” in the Chinese academic circle, regardless of whether the resulting word is lexical or grammatical. This is good for examining Chinese vocabulary (content words and function words) as a whole, otherwise many function words will be excluded. We accept this view and believe that Jiùshì has experienced both lexicalization and grammaticalization from phrase to adverb.

8 We don’t count the usage of Jiùshì and Jiùsuàn as phrases here. Although their use of phrases is also common in corpus surveys, their syntax and semantics are obviously different from those of Zòngyǔ conjunctions, so they are not discussed.


Fig. 1. The evolution path of Jiùshì

Fig. 2. The evolution path of Jiùsuàn

Table 5. The use of Jiùshì and Jiùsuàn in the Ming and Qing Dynasties (a)

          Jiùshì                                  Jiùsuàn
          Items after   Part of       Rhetorical  Items after   Part of       Rhetorical
          the word      speech        question    the word      speech        question
Titles    NP     VP     Conj.  Adv.               NP     VP     Conj.  Adv.
Journey   0      0      0      3      0           0      0      0      0      0
Water     1      2      3      1      0           0      0      0      0      0
Golden    36     41     77     12     0           0      2      2      0      0
Dream     22     82     104    15     1           0      3      3      0      1
Legend    17     7      24     10     0           0      11     11     0      5
World     78     71     149    4      3           0      2      2      0      0
Total     154    203    357    45     4           0      18     18     0      6

(a) Because of layout restrictions, only one word in the title of each book is retained. The complete book titles are as follows from top to bottom: Journey to the West, Water Margin, The Golden Lotus, Dream of the Red Mansion, Legend of Heroes and Heroines, Marriages to Awaken the World. These works are language materials and are not included in the references of this paper.

in rhetorical questions, Jiùsuàn is more likely to be chosen, in terms of both overall quantity and proportion. In Legend of Heroes and Heroines, all five rhetorical questions expressing a Zòngyǔ relation use Jiùsuàn, accounting for almost half of the sentences that use the Zòngyǔ conjunction Jiùsuàn. Although three cases in Marriages to Awaken the World use Jiùshì in rhetorical questions, they make up a very small proportion of the 149 sentences, and only two sentences in that work use Jiùsuàn as a Zòngyǔ conjunction. In contrast, Jiùshì is restricted by its multifunctionality and is often combined with Yě in a fixed, limited pattern. This accounts for the second and third perspectives in Sect. 2.1. Third, when the two words are used as Zòngyǔ conjunctions, the components that follow them differ. Jiùshì can take both verbal and nominal components, the former generally outnumbering the latter, whereas Jiùsuàn usually takes only verbal components. This is also related to abduction: abduction is a type of reasoning, and reasoning generally consists of propositions, in which predicate components are indispensable. This is in line with the fifth perspective mentioned in Sect. 2.1. Finally, as a Zòngyǔ conjunction, Jiùshì is used much more frequently than Jiùsuàn, possibly because Jiùshì emerged earlier than Jiùsuàn, the combination of Jiùshì and Yě has become fixed, and Jiùshì can be followed by a nominal component. In contrast, Jiùsuàn can only make up for its quantitative disadvantage through richer collocation and expression.
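The proportions discussed above can be recomputed from the counts in Table 5. The following is a minimal Python sketch and not part of the original study: the dictionary layout and the helper names (`TABLE5`, `totals`, `rq_share`) are illustrative assumptions, and only the counts themselves are taken from the table.

```python
# Counts transcribed from Table 5 as (conjunction uses, rhetorical-question uses).
# The data layout and the helper names are illustrative assumptions.
TABLE5 = {
    "Journey": {"jiushi": (0, 0),   "jiusuan": (0, 0)},
    "Water":   {"jiushi": (3, 0),   "jiusuan": (0, 0)},
    "Golden":  {"jiushi": (77, 0),  "jiusuan": (2, 0)},
    "Dream":   {"jiushi": (104, 1), "jiusuan": (3, 1)},
    "Legend":  {"jiushi": (24, 0),  "jiusuan": (11, 5)},
    "World":   {"jiushi": (149, 3), "jiusuan": (2, 0)},
}


def totals(word):
    """Sum conjunction uses and rhetorical-question uses over all six works."""
    conj = sum(row[word][0] for row in TABLE5.values())
    rq = sum(row[word][1] for row in TABLE5.values())
    return conj, rq


def rq_share(conj, rq):
    """Share of conjunction uses that occur in rhetorical questions (None if no uses)."""
    return None if conj == 0 else rq / conj


for word in ("jiushi", "jiusuan"):
    conj, rq = totals(word)
    print(f"{word}: {rq}/{conj} = {rq_share(conj, rq):.1%}")
```

With these counts, the overall shares are 4/357 (about 1.1%) for Jiùshì and 6/18 (about 33.3%) for Jiùsuàn, and the Legend of Heroes and Heroines row gives 5/11 for Jiùsuàn, which is the "almost half" mentioned in the text.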

4 Conclusion

This paper has analyzed the syntactic and semantic differences between Jiùshì and Jiùsuàn and discussed the relationship between syntax and semantics. We further explored, from a diachronic perspective, how these differences were formed, and found that the multifunctionality of Jiùshì and the subjectivity of Jiùsuàn are the key factors. The former is mainly brought about by decategorialization, while the latter is mainly caused by the effects of persistence and abduction. Finally, the use of Jiùshì and Jiùsuàn in the Ming and Qing Dynasties shows that the synchronic differences between the two words are brought about by the differences in their diachronic evolution, which once again illustrates the importance of analyzing synchronic differences through diachronic evolution. In addition, the discussion in this article provides a reference for discriminating pairs of words that share one morpheme and differ in another, such as Zòngshǐ (纵使) and Zòngrán (纵然). Such pairs can be described in both syntactic and semantic terms, and analyzed in terms of the differences between the distinct morphemes and the differences in the evolution of the words. This article also summarizes some rules for the discrimination and analysis of synonyms in teaching Chinese as a foreign language.

Acknowledgments. I am grateful to the anonymous reviewers of CLSW 2021 for helpful suggestions and comments. The work was supported by the Social Science Foundation of Beijing, China (Grant No. 17YYC019) and Tsinghua University Initiative Scientific Research Program (Grant No. 2019THZWLJ28).

References

1. Xi, J.: Modern Chinese Conjunctions, pp. 368–371. China Social Sciences Press, Beijing (2010). (in Chinese)
2. Zhang, Y.S.: The Cohesive Function of Jiùshì and Its Grammaticalization. Chin. Teach. World (3), 80–90 (2002). (in Chinese)
3. Ling, Y.: A Case Study of Concession Conjunction Evolution and Grammatical Function. Zhejiang University, Hangzhou (2007). (in Chinese)
4. Zhang, L.L.: On the Production of the Zòngyǔ Conjunctions Ji(即), Bian(便) and Jiù(就). Humanitas Taiwanica (71), 136–138 (2009). (in Chinese)
5. Shang, G.W.: Characteristics and lexicalization of the usage of “X Suàn(算)”. J. Chin. Lang. Hist. (13), 26–29 (2013). (in Chinese)
6. Xie, Y.L.: Comparison of concession conjunctions Jiùshì and Jiùsuàn. J. Minxi Vocat. Tech. Coll. (2), 50–53 (2013). (in Chinese)
7. Lü, S.X.: 800 Words of Modern Chinese, pp. 319–322. Shangwu Press, Beijing (1980). (in Chinese)
8. Zhou, G.: Conjunctions and Related Issues, p. 137. Anhui Education Press, Anhui (2002). (in Chinese)
9. Wu, F.X.: Multifunctional morphemes and semantic map model. Stud. Lang. Linguist. (1), 25–26 (2011). (in Chinese)
10. Compiled by the Dictionary Editing Office of Institute of Linguistics, CASS: Modern Chinese Dictionary, 7th edn, p. 701. The Commercial Press, Beijing (2018). (in Chinese)
11. Lyons, J.: Semantics, 2 vols, p. 739. Cambridge University Press, Cambridge (1977)
12. Shen, J.X.: A survey of studies on subjectivity and subjectivisation. Foreign Lang. Teach. Res. (4), 268–269 (2001). (in Chinese)
13. Shi, J.S., Sun, H.Y.: Internal differences and formation mechanisms of “Dàn (Shì)但(是)” transition conjunctions. Linguist. Res. (4), 34–40 (2010). (in Chinese)
14. Mei, Z.L.: The Origin of the Syntax of Alternative question in Modern Chinese. Mei Zulin’s Linguistic Essays, pp. 10–11. The Commercial Press, Beijing (2007). (in Chinese)
15. Ota, T.: A Historical Grammar of Modern Chinese, pp. 309–310 (S.Y. Jiang and C.H. Xu Transl.). Peking University Press, Beijing (2003). (in Chinese)
16. Hopper, P.J., Traugott, E.C.: Grammaticalization, 2nd edn, pp. 42–107. Cambridge University Press, Cambridge (2003)
17. Peng, R.: On the interaction between critical context and grammaticalizing elements. Linguist. Sci. (3), 278–290 (2008). (in Chinese)
18. Dong, X.F.: Further grammaticalization of “Shì”: from functional word to word-internal element. Contemp. Linguist. (1), 40–41 (2004). (in Chinese)
19. Jiang, S.Y.: Overview of Chinese Historical Lexicology. The Commercial Press, Beijing (2015). (in Chinese)

A Deontic Modal SFP in Chengdu Chinese

Jiajuan Xiong

Southwestern University of Finance and Economics, Chengdu, China

Abstract. This study identifies the SFP təu in Chengdu Chinese as a C-element encoding deontic modality, such as commission or permission, depending on the person(s) of the subject. Moreover, it conveys conditionality in that the implementation of an action requires the completion of another action. It is conditionality that gives rise to a temporally sequential reading, which is dominant enough to earn təu the label “antecedent aspect marker”. Syntactically, conditionality occurs in an adjunct CP, and this CP is further adjoined to an elided IP, in which deontic modality is located. This IP further projects to a higher CP accommodating the deontic modal SFP təu.

Keywords: Təu · SFP · Chengdu Chinese · Antecedent · Conditional · Deontic modality

1 Introduction

In Chengdu Chinese, the particle təu occurs in sentence-final position in conversational contexts. As shown in (1), speaker A suggests that both the speaker and the addressee(s) go and watch a film. The addressee, or one of the addressees, noted as speaker B, replies to A by committing to the suggestion with strings attached: the commitment will be fulfilled on the condition that another action is completed, e.g., “finishing the meal” in (1B). Notably, speaker B employs the particle təu, which contains two semantic components: (i) making a commitment with regard to a syntactically-elided but contextually-salient action, i.e., “go and watch a film”; (ii) providing a condition for the elided action, i.e., “finish the meal first”.

(1) A: ŋomən tɕʻiɛ kʻan tian’in ma.
       1PL go watch film SFP
       ‘Let us go and watch a film.’
    B: tsʻɿ no fan təu.
       eat PERF rice SFP
       ‘(We go and watch a film) on the condition that we finish the meal.’

In the literature, the function of the SFP təu in Chengdu Chinese is usually analyzed as marking “antecedent aspect” (see, e.g., Zhang et al. 2001), on the grounds that the event in a təu-containing sentence, i.e., “finish the meal” in (1B), is required to antedate the elided action, i.e., “go and watch a film”. However, the antecedent meaning is not sufficient to capture the semantic features and syntactic behaviors of təu, e.g.: (i) why does təu differ from other antecedent adverbials, e.g., ɕian ‘first’ and tsɿxau ‘after’, in terms of compatibility with clausal conjunction and topic contexts? (ii) why does təu encode a kind of modality that is not attested with other antecedent adverbials? These issues will be addressed in Sect. 2, in which the semantic components of təu will be analyzed in detail. In Sect. 3, we will propose a syntactic analysis of təu, in which its C-feature and C-layer will be revealed. We then conclude this paper in Sect. 4.

© Springer Nature Switzerland AG 2022
M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 160–171, 2022. https://doi.org/10.1007/978-3-031-06703-7_12

2 Data Presentation

In this section, we present data on təu to generalize its semantics of conditional modality. Contrary to the literature, we reject the analysis of təu as an antecedent marker and instead analyze it as a modal SFP encoding both conditionality and deontic modality. We will further show that its antecedent meaning is not inherent but derived from conditionality. Moreover, the modality encoded by təu will be identified and scrutinized, so as to determine its syntactic status.

2.1 The Semantics of Təu

The presence of təu prohibits the occurrence of a subsequent clause encoding a sequentially ensuing action, as exemplified by the unacceptability of (2a). If təu were purely antecedent-encoding, the unacceptability of (2a) would be unexpected, since an antecedent-consequence collocation is well licensed in Chengdu Chinese, as shown in (2b) and (2c), in which the antecedent meaning is expressed by the adverbs ɕian ‘first’ and tsɿxau ‘after’, respectively.

(2) a. *tsʻɿ no fan təu, ŋomən tsʻai/tsai tɕʻiɛ kʻan tian’in.
        eat PERF rice SFP 1PL then/then go watch film
        Intended: ‘We go and watch the film on the condition that we finish the meal.’
    b. ɕian tsʻɿ no fan, ŋomən tsʻai/tsai tɕʻiɛ kʻan tian’in.
       first eat PERF rice 1PL then/then go watch film
       ‘After we finish the meal, we go and watch the film.’
    c. tsʻɿ no fan tsɿxau, ŋomən tsʻai/tsai tɕʻiɛ kʻan tian’in.
       eat PERF rice after 1PL then/then go watch film
       ‘After we finish the meal, we go and watch the film.’

Given the above, we arrive at the generalization that təu, unlike ɕian ‘first’ and tsɿxau ‘after’, is not an antecedent marker. In our analysis, the particle təu expresses a modality with regard to a syntactically-elided but contextually-salient action that is contingent on a condition. Thus, we define təu as an SFP marking conditional modality. The modality-governed action, though elided, can be made explicit in the form of a topic, as shown in (3a). However, a similar topic is not compatible with sentences containing time adverbials such as ɕian ‘first’ and tsɿxau ‘after’, as exemplified in (3b) and (3c).


(3) a. kʻan tian’in a, tsʻɿ no fan təu.
       watch film regarding eat PERF rice SFP
       ‘As for watching a film, let us do it on the condition that we finish the meal.’
    b. */??kʻan tian’in a, ɕian tsʻɿ no fan.
       watch film regarding first eat PERF rice
       Intended: ‘As for watching a film, we have a meal first.’
    c. */??kʻan tian’in a, tsʻɿ no fan tsɿxau.
       watch film regarding eat PERF rice after
       Intended: ‘As for watching a film, we have a meal first.’

The above sentences show that təu should not be considered an antecedent marker. In Sect. 2.2, we will explore how the antecedent meaning arises in the presence of təu.

2.2 Təu-Induced Condition: Telicity

The particle təu requires the presence of the perfective marker no, indicating a future perfective meaning. The absence of no, or the replacement of the perfective marker no with the progressive markers tɕʻi or tau, leads to ungrammaticality, as shown in (4) and (5).

(4) Bare VP:
    A: ŋomən tɕʻiɛ kʻan tian’in ma.
       1PL go watch film SFP
       ‘Let us go and watch a film.’
    B: *tsʻɿ fan təu.
       eat rice SFP
       Intended: ‘(We go and watch a film) on the condition that we eat the meal.’

(5) Progressive Aspect:
    A: ŋomən tɕʻiɛ kʻan tian’in ma.
       1PL go watch film SFP
       ‘Let us go and watch a film.’
    B: *tsʻɿ tɕʻi/tau fan təu.
       eat PROG/PROG rice SFP
       Intended: ‘(We go and watch a film) on the condition that we are having the meal.’

Apart from the perfective marker no, some other means applicable to the VP are attested in the presence of the particle təu. For example, the perfective marker tɕʻi,1 [(numeral) + classifier] modification of an NP, [numeral + classifier] modification of a VP, and the presence of a durative timeframe are all compatible with the particle təu. They are exemplified in (6)–(10).

(6) The perfective tɕʻi:
    A: ŋomən tɕʻiɛ kʻan tian’in ma.
       1PL go watch film SFP
       ‘Let us go and watch a film.’
    B: tsɛ tɕʻi ifu təu.
       fold PERF clothes SFP
       ‘(We go and watch a film) on the condition that we complete folding the clothes.’

(7) The [(numeral) + classifier] modification of NP:
    A: ŋomən tɕʻiɛ kʻan tian’in ma.
       1PL go watch film SFP
       ‘Let us go and watch a film.’
    B: tsʻɿ (i/san) wan fan təu.
       eat one/three bowl rice SFP
       ‘(We go and watch a film) on the condition that we finish the meal.’

(8) The [numeral + classifier] modification of VP:
    A: ŋomən tɕʻiɛ kʻan tian’in ma.
       1PL go watch film SFP
       ‘Let us go and watch a film.’
    B: pʻau liaŋ tɕʻuan təu.
       run two circle SFP
       ‘(We go and watch a film) on the condition that we run two circles.’

(9) The approximate durative time xɚ ‘a while’:
    A: ŋomən tɕʻiɛ kʻan tian’in ma.
       1PL go watch film SFP
       ‘Let us go and watch a film.’
    B: ɕiəuɕi xɚ təu.
       rest a_while SFP
       ‘(We go and watch a film) on the condition that we take a rest for a while.’

(10) A durative timeframe:
     A: ŋomən tɕʻiɛ kʻan tian’in ma.
        1PL go watch film SFP
        ‘Let us go and watch a film.’
     B: pai sɿ fəntsoŋ noŋməntsən təu.
        speak ten minute chat SFP
        ‘(We go and watch a film) on the condition that we chat for ten minutes.’

1 The particle tɕʻi can give rise to either a durative or a perfective reading, as illustrated in (i) and (ii) below:
  (i) ŋo tsɛ tɕʻi ifu no. (perfective)
      1SG fold PERF clothes SFP
      ‘I folded the clothes.’
  (ii) ŋo tsɛ tɕʻi ifu tsai. (progressive)
       1SG fold PROG clothes SFP
       ‘I am folding the clothes.’
  Despite the two interpretations, tɕʻi in (6B) can only be interpreted as a perfective marker, not as a progressive one.


The above examples show that the predicates in təu-containing sentences encode bounded events. We further argue that təu encodes the conditional meaning and that its antecedent meaning derives from the boundedness of events. For example, the future time mintʻian ‘tomorrow’ in (11) gives rise to the interpretation ‘on the condition that it turns tomorrow’, but not ‘after tomorrow’.

(11) A: ŋomən tɕʻiɛ kʻan tian’in ma.
        1PL go watch film SFP
        ‘Let us go and watch a film.’
     B: mintʻian təu.
        tomorrow SFP
        ‘(We go and watch a film) on the condition that it turns tomorrow.’

To sum up, the presence of təu encodes a conditional meaning, which in most cases leads to an antecedent reading. In Sect. 2.3, we discuss the modality conveyed by təu.

2.3 The Təu-Induced Modality

The occurrence of təu expresses deontic modality, viz., commission and direction, regarding an event that is syntactically implicit but contextually retrievable. The interpretation of attitude depends on the persons who participate in the implicit action at issue. We ﬁnd that the use of təu is compatible with all kinds of persons, be they plural or singular, as shown in (12)–(17). In the case of ﬁrst person subjects, the speaker expresses her/his commission under a speciﬁc condition; and as for second and third person subjects, the speaker conveys her/his direction, in particular, permission, which is also condition-bound.


(12) The first person plural: (commissive)
     A: ŋomən tɕʻiɛ kʻan tian’in ma. (=1)
        1PL go watch film SFP
        ‘Let us go and watch a film.’
     B: ŋomən tsʻɿ no fan təu.
        1PL eat PERF rice SFP
        ‘(We’ll go and watch a film) on the condition that we finish the meal.’

(13) The first person singular: (commissive)
     A: ni tɕʻiɛ ɕoɕi.
        2SG go study
        ‘You go to study.’
     B: ŋo tsʻɿ no fan təu.
        1SG eat PERF rice SFP
        ‘(I’ll study) on the condition that I finish the meal.’

(14) The second person plural: (permissive)
     A: ŋomən iau kʻan tiansɿ.
        1PL will watch TV
        ‘We’ll watch TV.’
     B: nimən tsʻɿ no fan təu.
        2PL eat PERF rice SFP
        ‘(You can watch TV) on the condition that you finish the meal.’

(15) The second person singular: (permissive)
     A: ŋo iau kʻan tiansɿ.
        1SG will watch TV
        ‘I’ll watch TV.’
     B: ni tsʻɿ no fan təu.
        2SG eat PERF rice SFP
        ‘(You can watch TV) on the condition that you finish the meal.’

(16) The third person plural: (permissive)
     A: tʻamən iau kʻan tiansɿ.
        3PL will watch TV
        ‘They’ll watch TV.’
     B: tʻamən tsʻɿ no fan təu.
        3PL eat PERF rice SFP
        ‘(They can watch TV) on the condition that they finish the meal.’

(17) The third person singular: (permissive)
     A: tʻa iau kʻan tiansɿ.
        3SG will watch TV
        ‘S/he’ll watch TV.’
     B: tʻa tsʻɿ no fan təu.
        3SG eat PERF rice SFP
        ‘(S/he can watch TV) on the condition that s/he finishes the meal.’

Note that the persons of the modality-encoding clause might not be identical with the persons in the conditional clause, as shown in (18)–(23). Regardless of the persons in the conditional clause, the type of deontic modality remains contingent on the persons in the main clause.

(18) The first person plural: (commissive)
     A: ŋomən tɕʻiɛ kʻan tian’in ma.
        1PL go watch film SFP
        ‘Let us go and watch a film.’
     B: ŋo/ni/nimən/tʻa/tʻamən tsʻɿ no fan təu.
        1SG/2SG/2PL/3SG/3PL eat PERF rice SFP
        ‘(We’ll go and watch a film) on the condition that I/you/he/she/they finish(es) the meal.’

(19) The first person singular: (commissive)
     A: ni tɕʻiɛ ɕoɕi.
        2SG go study
        ‘You go to study.’
     B: ŋo/ni/nimən/tʻa/tʻamən tsʻɿ no fan təu.
        1SG/2SG/2PL/3SG/3PL eat PERF rice SFP
        ‘(I’ll study) on the condition that I/you/he/she/they finish(es) the meal.’

(20) The second person plural: (permissive)
     A: ŋomən iau kʻan tiansɿ.
        1PL will watch TV
        ‘We’ll watch TV.’
     B: ŋo/ni/nimən/tʻa/tʻamən tsʻɿ no fan təu.
        1SG/2SG/2PL/3SG/3PL eat PERF rice SFP
        ‘(You can watch TV) on the condition that I/you/he/she/they finish(es) the meal.’

(21) The second person singular: (permissive)
     A: ŋo iau kʻan tiansɿ.
        1SG will watch TV
        ‘I’ll watch TV.’
     B: ŋo/ni/nimən/tʻa/tʻamən tsʻɿ no fan təu.
        1SG/2SG/2PL/3SG/3PL eat PERF rice SFP
        ‘(You can watch TV) on the condition that I/you/he/she/they finish(es) the meal.’

(22) The third person plural: (permissive)
     A: tʻamən iau kʻan tiansɿ.
        3PL will watch TV
        ‘They’ll watch TV.’
     B: ŋo/ni/nimən/tʻa/tʻamən tsʻɿ no fan təu.
        1SG/2SG/2PL/3SG/3PL eat PERF rice SFP
        ‘(They can watch TV) on the condition that I/you/he/she/they finish(es) the meal.’

(23) The third person singular: (permissive)
     A: tʻa iau kʻan tiansɿ.
        3SG will watch TV
        ‘S/he’ll watch TV.’
     B: ŋo/ni/nimən/tʻa/tʻamən tsʻɿ no fan təu.
        1SG/2SG/2PL/3SG/3PL eat PERF rice SFP
        ‘(S/he can watch TV) on the condition that I/you/he/she/they finish(es) the meal.’


Moreover, the difference of subjects illustrated in (18)–(23) indicates that a təu-containing sentence is a complex one, consisting of a conditional clause and a main clause. The syntactic analysis will be presented in Sect. 3.
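The generalization drawn from (12)–(23) can be stated as a small decision rule. The following Python sketch is only an illustration of that rule, not a formal analysis from the paper: the names `Person`, `tou_modality` and `interpret` are hypothetical, and number (singular vs. plural) is deliberately left out because the examples show it does not affect the reading.

```python
from enum import Enum


class Person(Enum):
    FIRST = 1
    SECOND = 2
    THIRD = 3


def tou_modality(main_subject: Person) -> str:
    """Deontic reading of təu, keyed to the subject of the elided main clause."""
    return "commissive" if main_subject is Person.FIRST else "permissive"


def interpret(main_subject: Person, condition_subject: Person) -> str:
    # condition_subject is accepted only to make explicit that, as in (18)-(23),
    # the subject of the conditional clause does not change the modality.
    return tou_modality(main_subject)


for main in Person:
    readings = {interpret(main, cond) for cond in Person}
    print(main.name, sorted(readings))
```

Each main-clause person yields exactly one reading whatever the subject of the conditional clause is, which mirrors the observation that the modality is contingent on the main clause alone.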

3 Syntactic Analysis of Təu-Containing Sentences

We propose that a təu-containing sentence is a complex one, containing an adjunct clause and an elided main clause, as illustrated in Fig. 1. This analysis is supported by two pieces of evidence. First, the adjunct clause and the main clause may have their own subjects, which are not necessarily identical (see (18)–(23)). Second, the main clause, though elided, is contextually implied, and such a clause cannot be made explicit by adding it to a təu-containing sentence in any form of conjunction (see (2a)).

Fig. 1. The syntactic structure of a təu-containing sentence

The above analysis captures the characteristics of təu-containing sentences. The meaning of conditionality is encoded by the adjunct CP, in which TP is required to be telic. It is the telicity of TP that leads to the antecedent reading of the adjunct CP with reference to the main clause IP. As for the elided IP, it is contextually present. The speaker expresses either a commission or a permission, depending on the person of the subject within IP. If the subject is first person, be it singular or plural, the speaker conveys a commission to conduct an action; if the subject is second or third person, the speaker expresses her/his permission. In both cases, commission and permission are contingent on the fulfillment of a condition. Given this, the function of təu is defined as “conditional modality”. Syntactically, təu is a sentence-final particle that occupies the C position.

3.1 Non-relativization of Təu

Təu-containing sentences defy relativization, as exemplified in (24a). By contrast, a similar meaning expressed by two consecutive predicates allows relativization, as shown in (24b).

(24) a. *tsu no tso-iɛ təu ni nei ko ua-ua
         do PERF homework SFP DE that CL child
         Intended: ‘the child who performs an action (e.g., watches TV) after finishing homework’
     b. tsu no tso-iɛ tsʻai kan tiansɿ ni nei ko ua-ua
        do PERF homework then watch TV DE that CL child
        ‘the child who watches TV after finishing homework’

We adopt the adjunction view of relativization (e.g., Schachter 1973; Vergnaud 1974; Xu 1993; Huang et al. 2009; Lin 2010) and treat ni (the Chengdu counterpart of Mandarin de) as base-generated in C. Thus, the non-relativization of təu-containing sentences leads us to the conclusion that təu is a C element. Specifically, as analyzed in Sect. 2.3, təu is an SFP encoding deontic modality, such as commission and permission. In the following section, we will examine the interactions between təu and some other SFPs.

3.2 Stacking of SFPs

The SFP təu can be followed by other SFPs stacked on top of it, such as ma, tɛma, san and xa, a few of which are exemplified in (25) and (26). These added SFPs are sensitive to subjects. For instance, xa in (25d) tends to encode a question, whereas the same SFP in (26d) is likely to be interpreted as the speaker’s suggestion. Despite these differences, the SFPs ma, tɛma, san and xa share one commonality, i.e., they express a propositional modality taking scope over the whole sentence. Thus, they are expected to be syntactically higher than the deontic modal SFP təu, which scopes over the elided action.


(25) Stacking of SFPs on the commissive təu:
     a. ŋo tsʻɿ no fan təu ma.
        1SG eat PERF rice SFP SFP
        ‘I suggest that (I’ll perform an action) on the condition that I finish the meal.’
     b. ŋo tsʻɿ no fan təu tɛma.
        1SG eat PERF rice SFP SFP
        ‘It is evident (e.g., as you promised me) that (I’ll perform an action) on the condition that I finish the meal.’
     c. ŋo tsʻɿ no fan təu san.
        1SG eat PERF rice SFP SFP
        ‘It should be the case that (I’ll perform an action) on the condition that I finish the meal.’
     d. ŋo tsʻɿ no fan təu xa?
        1SG eat PERF rice SFP SFP
        ‘Is it possible that (I’ll perform an action) on the condition that I finish the meal?’

(26) Stacking of SFPs on the permissive təu:
     a. ni tsʻɿ no fan təu ma.
        2SG eat PERF rice SFP SFP
        ‘I suggest that (you perform an action) on the condition that you finish the meal.’
     b. ni tsʻɿ no fan təu tɛma.
        2SG eat PERF rice SFP SFP
        ‘It is evident (e.g., as I already suggested to you) that (you perform an action) on the condition that you finish the meal.’
     c. ni tsʻɿ no fan təu san.
        2SG eat PERF rice SFP SFP
        ‘It should be the case that (you perform an action) on the condition that you finish the meal.’
     d. ni tsʻɿ no fan təu xa.
        2SG eat PERF rice SFP SFP
        ‘I suggest to you that (you perform an action) on the condition that you finish the meal.’

3.3 Further Evidence for Conditional Modality of Təu

As per our analysis, təu takes up dual meanings, i.e., conditional sense and deontic modality. For example, when a question is raised regarding the listener’s permission, as in (27A), the interlocutor may express her/his consent by using təu, as shown in (27B1). If təu is a pure antecedent marker, (27B1) should be semantically identical with (27B2), which, however, cannot serve as an appropriate answer to a question about permission modality.


(27) A: ŋo kʻo-i kʻan tiansɿ pu?
        1SG can watch TV SFPQ
        ‘Can I watch TV?’
     B1: tsʻɿ no fan təu.
         eat PERF rice SFP
         ‘(You can do it) on the condition that you finish the meal.’
     B2: #ɕian tsʻɿ fan.
         first eat rice
         ‘You have your meal first.’

4 Concluding Remarks

In this study, we reanalyze the so-called antecedent marker təu as an SFP encoding conditional modality. Syntactically, a təu-containing sentence is a complex one, with an adjunct CP adjoined to an elided IP, which further projects a CP. The adjunct CP is a conditional clause, which requires its TP to be telic. It is telicity that leads to the interpretation of the CP as sequentially antecedent to the IP. This may be part of the reason why təu has long been termed an “antecedent marker”. As for modality, təu expresses the deontic modality of commission or permission, the choice of which depends on the subject of the elided IP. Specifically, the first person is usually associated with commissive modality, whereas the second and third persons are generally tied to permissive modality. The deontic modal SFP is situated in the CP periphery and is shown to be syntactically lower than epistemic and interrogative SFPs. In this sense, Chengdu Chinese provides an ideal ground for teasing apart modal SFPs of various kinds, e.g., the deontic, epistemic and evidential types.

References Huang, C.-T.J., Li, Y.-H.A., Li, Y.: The Syntax of Chinese. Cambridge University Press, Cambridge (2009) Li, B.: Chinese ﬁnal particles and the syntax of the periphery. Ph.D. Thesis. Universiteit Leiden (2006) Lin, T.-H. J.: Syntactic structures of complex sentences in Mandarin Chinese. Nanzan Linguistics, vol. 3: Research Results and Activities, pp. 63–97 (2005) Lin, Y.-A.: The de-marked modiﬁcation structure in Mandarin Chinese. In: Clements, L.E., Liu, C.-M.L. (eds.) Proceedings of the 22nd North American Conference on Chinese Linguistics (NACCL-22) & the 18th International Conference on Chinese Linguistics (IACL-18), vol. 2., 254–270. Harvard University, Cambridge (2010) Rizzi, L.: The ﬁne structure of the left periphery. In: Haegeman, L. (ed.) Elements of Grammar, pp. 281–337. Kluwer, Dordrecht (1997) Searle, J.: R: Intentionality. Cambridge University Press, Cambridge (1983) Xu, D.: A CP analysis of Mandarin Chinese. Linguistics. 10(1), 189–200 (1993)


Zhang, Y., Zhang, Q., Deng, Y.: Grammar of Chengdu Chinese (Chengdu Fangyan Yufa Yanjiu). Bashu Publishing House (Bashu Shushe) (2001)
Vergnaud, J.-R.: French relative clauses. Ph.D. thesis. Massachusetts Institute of Technology (1974)

An Analysis of the Grammaticalization, Coercion Mechanisms and Formation Motivation of the New Construction ‘XX Zi’ from the Cognitive Perspective

Chen Li

School of Humanities, Shanghai Jiao Tong University, Shanghai, China
[email protected]

Abstract. The new ‘XX Zi’ construction was influenced by the transmission of fan culture from Europe, America, Japan and Korea, and became popular through the effect of strong online catchwords, the influence of social culture and the help of new media. Through the retrieval and analysis of a network corpus, combined with a questionnaire survey, this paper analyzes the structural features, grammaticalization process, and construction coercion mechanism of the popular online construction ‘XX Zi’ (such as ‘Xin Xin Zi (欣欣子)’, ‘Wuyu Zi (无语子)’, ‘Bucuo Zi (不错子)’, etc.), and discusses its formation motivation, popularity mechanism and pragmatic function from the cognitive perspective. This paper holds that the redundancy of the construction itself and its semantic, phonetic and syntactic features are all manifestations of its grammaticalization. The construction meets the communicative needs of young people in specific contexts, adapts to their expressive needs of ‘defamiliarization’ and ‘subjectivization’, and conforms to the cognitive law of people’s pursuit of highlighting the theme meaning.

Keywords: ‘XX Zi’ · Cognitive linguistics · Construction grammar · Coercion · Grammaticalization

© Springer Nature Switzerland AG 2022
M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 172–186, 2022.
https://doi.org/10.1007/978-3-031-06703-7_13

1 Introduction

Catchwords become popular elements in a language because certain language items agree with the prevailing social mentality and leave a popular mark on semantics or structure [1]. Recently, as several talent shows such as ‘Youth With You 2’, ‘Creation Camp 2020’ and ‘Sister Riding the Wind and Waves’ have been widely broadcast, many Chinese audiences and netizens have begun to follow the practice of the ‘fan circle’ in Europe, America, Japan and Korea, using the popular construction ‘XX Zi’ to refer to their favorite contestants or idols. Yu Shuxin and An Qi (two contestants from the talent show ‘Youth With You 2’) are called ‘Xin Xin Zi’ and ‘Qi Qi Zi’. After the name pattern ‘XX Zi’ became popular, many other network catchwords also began to appear in the form of ‘XX Zi’, such as ‘Wuyu Zi’ (wordless), ‘Ke’ai Zi’ (lovely), ‘Gaoxiao Zi’ (funny), ‘Meili Zi’ (beautiful) and ‘Shengqi Zi’ (angry). Even some idioms such as ‘Da Ke Bu Bi’ (it’s not necessary) and ‘Cheng Feng Po Lang’ (riding the wind and waves) can be used with a ‘Zi’. The network catchword ‘XX Zi’ has appeared and prevailed in China only recently, and its new usage has rarely appeared in traditional Chinese, so this paper regards ‘XX Zi’ as a new network catchword. Compared with the original words, the new words formed by adding ‘Zi’ have a meaning that is ‘roughly the same’, yet there seems to be some difference.

How did network catchwords like ‘XX Zi’ appear and become popular? How can the evolution of their grammaticalization be traced? Can ‘XX Zi’ be regarded as a construction? What are its characteristics in terms of configuration, grammar, and semantics? Is ‘XX Zi’ a redundancy phenomenon of language? What kind of pragmatic function and cognitive mechanism does it present? What attitudes and opinions do users have on the usage of ‘XX Zi’? Because the new Internet catchword ‘XX Zi’ has been popular in China for only a short time, few related papers have explored it in depth. Through the retrieval and analysis of a network corpus, combined with a questionnaire survey, this paper aims to explore the structural characteristics, grammaticalization, construction mechanism, pragmatic function and cognitive mechanism of the popular new network construction ‘XX Zi’ from the cognitive perspective, so as to answer the above questions.

The research object of this paper is the new network construction ‘XX Zi’ with three or more syllables. On this basis, this paper uses corpus retrieval, a questionnaire survey, statistical analysis and other methods to conduct a comprehensive investigation and analysis of the new ‘XX Zi’ construction. Part of the corpus related to ‘XX Zi’ was randomly collected from Sina Microblog¹, Zhihu², Douban³, Baidu Tieba⁴ and other online platforms. This paper also issued a questionnaire on the usage of the new ‘XX Zi’ construction, and the rest of the corpus comes from the ‘XX Zi’ expressions provided by 146 respondents in the questionnaire. After repeated items were excluded, a total of 626 corpus samples related to ‘XX Zi’ were obtained for analysis. The 146 valid questionnaires also investigated the respondents’ viewing of talent shows, the way and frequency of their use of the ‘XX Zi’ construction, and their attitudes toward ‘XX Zi’.

2 Structural Features and Grammaticalization Process of ‘XX Zi’ Construction

2.1 Grammaticalization of ‘Zi’ and Definition of New ‘XX Zi’ Construction

The concept of grammaticalization was ﬁrst put forward by Meillet. He interpreted grammaticalization as the process of ‘transforming an originally independent word into an element expressing grammatical function’ [2]. Later, ‘grammaticalization’ refers to a

1 Website address: https://weibo.com/.
2 Website address: https://www.zhihu.com/.
3 Website address: https://www.douban.com/.
4 Website address: https://tieba.baidu.com/index.html.


process or phenomenon in which words with real meaning are transformed into components with no real meaning that serve a grammatical function. Traditional Chinese linguistics calls it ‘grammaticalization’ [3]. ‘Zi’ was originally a word with independent real meaning, with an independent syllable and a fixed tone. Its original meaning was ‘baby’ [4]. In the development of the Chinese vocabulary system, the meaning of ‘Zi’ has gradually become blurred. At least as early as ancient times, the word ‘Zi’ showed signs of becoming a suffix; by the middle ancient period, ‘Zi’ as a suffix was already well developed and had the ability to form new words [5]. The blurring of the meaning of ‘Zi’ is a continuous process, in which ‘Zi’ takes a fixed position in word formation and the pronunciation of its syllable is constantly weakened. Suffix ‘Zi’ mostly refers to the situation in which the meaning of ‘Zi’ is empty and its pronunciation is soft. The new ‘XX Zi’ construction discussed in this paper is the new catchword structure composed of the variable ‘XX’ plus ‘Zi’. The variable ‘XX’ before ‘Zi’ in this structure is disyllabic or polysyllabic, never monosyllabic.

Cognitive linguistics proposes that cognitive abilities interact with and are influenced by language. In a sense, the study of language is actually the study of human ways of expressing and exchanging ideas [6]. Construction grammar is an important part of cognitive linguistics. A construction is a language unit in which one or more formal or meaning features cannot be strictly predicted from the constitutive components of the construction or from other constructions in the grammar [7]. This paper holds that the new catchword ‘XX Zi’ can also be regarded as a construction: its form and meaning are not a simple addition of the meaning of the variable ‘XX’ and the meaning of ‘Zi’. A construction is restricted by factors such as syntax, semantics, pragmatics and intonation [8]. Therefore, this paper investigates the characteristics of the ‘XX Zi’ construction from these perspectives.

2.2 Structural Characteristics of the New ‘XX Zi’ Construction

The part of speech of the variable ‘XX’ generally includes nouns, adjectives, verbs, idioms, and English abbreviations. First, the cases where the variable ‘XX’ is a noun can be divided into two categories. The ﬁrst type is that ‘XX’ is used to represent a person’s name, and the variable ‘XX’ often involves some popular people nowadays. In this situation, the variable ‘XX’ is divided into the following situations: 1) ‘XX’ is ‘surname (or overlapping form of surname)’. For example, the crosstalk actor Yue Yunpeng is called ‘Yue Yue Zi’. 2) ‘XX’ is ‘the overlapping form of the last character of the ﬁrst name’. This usage is numerous, such as ‘Xin Xin Zi’ (Yu Shuxin), ‘Mi Mi Zi’ (Yang Mi) and ‘Qi Qi Zi’ (Li Jiaqi). 3) ‘XX’ is ‘the overlapping form of the ﬁrst word of the name’, such as ‘Xue Xue Zi’ (Kong Xue’er), ‘Min Min Zi’ (Liu Mintao) and so on. 4) ‘XX’ is ‘ﬁrst name (or a homophony of ﬁrst name)’, such as ‘Yaowen Zi’ (Zhu Yaowen), ‘Yijin Zi’ (Wang Yijin) and so on. The second type is that ‘XX’ is a common noun (or the overlapping form of the last character of the noun), and the part of speech of the noun is maintained after adding ‘Zi’, such as ‘Kafei Zi’ (coffee), ‘Guan Guan Zi’ (museum) and ‘Mituan Zi’ (mystery). Secondly, the variable ‘XX’ is an adjective, such as ‘Ke’ai Zi’ (cute), ‘Youqu Zi’ (interesting), ‘Piaoliang Zi’ (beautiful), etc. The structural variation of ‘adjective + Zi’ also includes ‘adverb + adjective + Zi’, such as ‘Hao Mei Zi’ (so beautiful), ‘Zhen


Xiang Zi’ (so fragrant), and ‘adjective overlap + Zi’, such as ‘Kun Kun Zi’ (sleepy) and ‘Jue Jue Zi’ (so great). The third kind of variable ‘XX’, which is frequently used, belongs to the category of verbs. Examples include ‘Xi’ai Zi’ (like), ‘Mobai Zi’ (worship), ‘Zhichi Zi’ (support), etc. There are three types of variants of ‘verb + Zi’: 1) ‘the overlapping form of verb + Zi’, such as ‘Qin Qin Zi’ (kiss), ‘Zha Zha Zi’ (burn), etc.; 2) ‘verb phrase + Zi’, such as ‘Zhang Zhishi Zi’ (get knowledge), ‘Que Xinyan Zi’ (lack mind; dim-witted), etc.; 3) ‘verb + Zi’ with modifiers before or after the verb, such as ‘Bu Zhidao Zi’ (don’t know), ‘Bu Dong Zi’ (don’t understand) and ‘Xiao Si Zi’ (so funny). Finally, there are two unusual uses, in which the variable ‘XX’ is an idiom or an English abbreviation, such as ‘Cheng Feng Po Lang Zi’ (ride the wind and waves), ‘Da Ke Bu Bi Zi’ (unnecessary) and ‘OMG Zi’ (Oh my God).

Through the retrieval and sorting of the corpus and the classification of the part of speech of the variable ‘XX’, this paper finds that the new ‘XX Zi’ construction presents the following structural characteristics. First, in the ‘XX Zi’ construction, the configuration ‘surname or first name + Zi’ is the most frequently used, and ‘first name + Zi’ is used more frequently than ‘surname + Zi’. Besides, ‘adjective + Zi’ and ‘verb + Zi’, including their variants, also appear frequently. Second, the ‘XX Zi’ construction often uses the overlapping form as a means of word formation. Overlapping appears whether the variable ‘XX’ is a surname or first name, a common noun, an adjective or a verb. In the ‘XX Zi’ construction, overlapping adjectives can express an increased degree; for example, ‘Kun Kun Zi’ deepens the degree of ‘Kun’ (sleepy). The overlapping of nouns and verbs strengthens the speaker’s intimate and lively tone.
Third, the variables ‘XX’ in the new ‘XX Zi’ construction are mainly disyllabic words, with only a few polysyllabic words. This is closely related to the tendency of Chinese vocabulary towards two-syllable words: disyllabic words outnumber polysyllabic words in Chinese. Among all 58481 entries (including phrases) in the Modern Chinese Dictionary (Revised Edition) (《现代汉语词典(修订本)》), 67.63% are disyllabic and 9.335% are polysyllabic (from 3 to 12 syllables) [9]. The syllable characteristics of the new ‘XX Zi’ construction will be further discussed in the next section.

2.3 Grammaticalization of the New ‘XX Zi’ Construction

The evolution of grammaticalization can occur in all aspects of language, involving pronunciation, morphology, syntax, semantics, pragmatics and so on [10]. Based on the evolution of the usage of ‘Zi’ in Chinese and the analysis of the relevant corpus of the new ‘XX Zi’, this paper holds that the new changes in the semantics, pronunciation, syntax and redundancy of the new ‘XX Zi’ construction are manifestations of its own grammaticalization. These four aspects are discussed separately below.

Semantic Evolution of the New ‘XX Zi’ Construction. A comparison of the original usage of ‘Zi’ with the new ‘XX Zi’ construction shows that the semantic connotation of ‘XX Zi’ has changed significantly. First of all, ‘Zi’ could originally be used to refer specifically to learned men, as in ‘Kong Zi’ (Confucius) and ‘Fu Zi’ (master). At present, the objects of ‘Zi’ show a trend of generalization. ‘Surname (or overlapping form of surname) + Zi’ can be used to


refer to both men (‘Yue Yue Zi’ refers to Yue Yunpeng, a male crosstalk actor) and women (‘Cai Cai Zi’ refers to Cai Yilin, a female singer). Secondly, it is very common to add ‘Zi’ after a morpheme as a noun suffix. The components before ‘Zi’ are generally nominal, adjectival and verbal morphemes. A predicate component plus the suffix ‘Zi’ forms a substantive component, such as ‘Dan Zi’ (duster) and ‘Luan Zi’ (mess). However, in the new ‘XX Zi’ construction, after ‘Zi’ is added to nouns, adjectives and verbs, the variable ‘XX’ still maintains its original part of speech, as in ‘Mituan Zi’ (mystery), ‘Daoqian Zi’ (apology) and ‘Jingyan Zi’ (amazing). Finally, ‘Zi’ in the new ‘XX Zi’ construction appears in most cases as an affix without obvious real meaning, which is a clear feature of grammaticalization. When ‘XX Zi’ forms a sentence alone, ‘Zi’ usually appears at the end of the sentence, where it is used as a modal particle or auxiliary word (see sentences 1 and 3 below). There is no clear boundary between different parts of speech, and the differences among atypical members are often fuzzy [11]. As mentioned above, ‘Zi’ was originally an independent notional word whose meaning then continued to blur. The ‘Zi’ in the ‘XX Zi’ construction also shows the usage of modal particles or auxiliary words. This evolution of ‘Zi’ is a process of grammaticalization from the category of substantive nouns to the category of modal particles, auxiliary words and affixes.

Phonetic Changes of the New ‘XX Zi’ Construction. The grammaticalization of the new ‘XX Zi’ construction is also reflected in phonetics. The process of grammaticalization runs from substantive to grammaticalized, and from grammaticalized to empty. This evolution sequence is ‘notional word > functional word > attached form > inflected form (affix)’ [12]. Grammaticalization is not a process in which a new form replaces an old one or one meaning suddenly replaces another. It is common for two or more forms and meanings to coexist [13]. When ‘Zi’ is read as ‘zǐ’⁵, it can be used to refer to children, people, learned men, or the second person ‘you’ in ancient times. It can also be used to refer to specific things, such as seeds, eggs, lumps or granules. In these cases, ‘Zi’ has an independent notional meaning and carries part of the real meaning of the whole word. With the grammaticalization of ‘Zi’, its pronunciation shows a weakening trend. In the structure ‘surname or first name + Zi’, most of the ‘Zi’ syllables are pronounced as ‘zǐ’. In the cases of ‘noun + Zi’ and ‘adjective or verb + Zi’, the pronunciation of ‘Zi’ is further weakened, mostly to the neutral tone ‘zi’. Therefore, in the new ‘XX Zi’ construction, the use of ‘Zi’ in the ‘surname or first name + Zi’ structure can still be regarded as retaining substantive meaning to a certain extent, but in the case of ‘adjective or verb + Zi’, the grammaticalization process is particularly obvious.

Syntactic Features of the New ‘XX Zi’ Construction. In actual use, the part of speech of the whole new ‘XX Zi’ construction is basically consistent with that of the variable ‘XX’. From the perspective of syntactic components, however, adding ‘Zi’ to the variable ‘XX’ brings some syntactic changes. First, when the variable ‘XX’ is a ‘last name’, ‘first name’ or ‘common noun’, it often acts as the subject, object and sometimes attribute in the sentence. Generally, nouns

5 There are two pronunciations of ‘Zi’ in Chinese: one is pronounced in the third tone (zǐ), and the other is pronounced in a neutral tone (zi).


cannot use overlapping forms to express a common grammatical meaning [14], but overlapping is frequently used in the ‘XX Zi’ construction, as in Guan Guan Zi (馆馆子) ‘library’. Second, when the variable ‘XX’ is an adjective, the ‘XX Zi’ construction still maintains the characteristics of adjectives, but there are also some syntactic changes. Chinese adjectives are typically used as attributives, predicates or predicate centers. The variable ‘XX’ with ‘Zi’ can be regarded as a sentence once intonation and tone are added, see (1) and (2). ‘XX Zi’ and its variants usually form a causal relationship with the clauses before and after them. When ‘adjective + Zi’ is used, some examples show a trend of using ‘Zi’ to replace ‘了’ (le). In this case, ‘Zi’ is used as a modal particle or an auxiliary word at the end of the sentence, as in (3). At the same time, the variable ‘XX’ in the ‘XX Zi’ construction is mostly a property adjective (beautiful, strange, good, calm, etc.), and there are basically no descriptive adjectives.

(1) 这实在太让我生气了!无语子!
Zhè shízài tài ràng wǒ shēngqì le! Wúyǔ zǐ!
‘It really makes me angry! Speechless!’

(2) 哇, 好有感觉的饭绘!喜欢子!可爱子!
Wa, hǎo yǒu gǎnjué de fàn huì! Xǐhuān zǐ! Kě’ài zǐ!
‘Wow, what a fan painting full of feeling! I like it! It is cute!’

(3) 比赛时我妈眼睛一闭一睁, (他) 被挤到第十四, 现在真香子。
Bǐsài shí wǒ mā yǎnjīng yī bì yī zhēng, (tā) bèi jǐ dào dì shí sì, xiànzài zhēn xiāng zǐ.
‘My mother closed her eyes for a moment during the competition, (he) was pushed down to fourteenth place, and now he does well.’

Third, when ‘XX Zi’ is ‘verb + Zi’ or one of its variants, the verb with ‘Zi’ basically takes no object; instead, the object is placed in the preceding clause or omitted directly. In example (4), the object of ‘like’ is ‘two golden retrievers’. After ‘Zi’ is added, the verb ‘like’ no longer takes an object, which is instead expressed in the clause in front of it. Besides, some monosyllabic verbs can be reduplicated to form ‘XX Zi’, such as ‘Dengdeng Zi’ (wait), ‘Qinqin Zi’ (kiss) and ‘Zha Zha Zi’ (burn), as in sentence (5).

(4) 一进小区看到了两只金毛, 喜爱子!
Yī jìn xiǎoqū kàndào le liǎng zhī jīn máo, Xǐ’ài zǐ!
‘As soon as I entered the community, I saw two golden retrievers. I like them!’

(5) 我真是被这部电视剧的预告震撼到!一股子热血上了我的头!炸炸子!
Wǒ zhēn shì bèi zhè bù diànshìjù de yùgào zhènhàn dào! Yī gǔ zǐ rèxuè shàng le wǒ de tóu! Zhà zhà zǐ!
‘I was really shocked by the preview of this TV play! A rush of hot blood went to my head! It’s burning and hot!’

Regardless of the part of speech of the variable ‘XX’ and the syntactic component it serves as in sentences, the ‘XX Zi’ construction shows a tendency to form a sentence alone and often appears at the end of the sentence. To sum up, the new ‘XX Zi’ construction shows obvious grammaticalization in its syntactic features.

Redundancy in the New ‘XX Zi’ Construction. In the process of information transmission, there is more information than the minimum amount of language expression,


which is redundant information [15]. Redundancy refers to the phenomenon that the expression form of language exceeds the needs of the actual content of language in language activities [16]. It is pointed out that the essence of redundancy is a rhetorical phenomenon, which is to create a sense of rhythmic harmony, visual balance and beauty. This phenomenon reflects the psychology of convergence [16]. The ‘Zi’ in ‘Qizi’ (wife) in modern Chinese, like ‘er’, ‘zi’ and ‘tou’ attached after the words shì ér (事儿) ‘thing’, gài zǐ (盖子) ‘lid’ and shí tóu (石头) ‘stone’ in Chinese, is the product of meeting the disyllabic requirements of modern Chinese expression, to make up enough syllables, which belongs to the phenomenon of ‘redundancy’[16]. This paper holds that although ‘redundancy’ means that the language component is more than the actual content being expressed, it is not necessarily ‘surplus’ in terms of grammar and expression form. From the perspective of language composition, in many ‘XX Zi’ expressions such as ‘Wuyu Zi’ (wordless) and ‘Haokan Zi’ (good-looking), deleting ‘Zi’ seems making no difference to language expressions, but from many aspects such as grammatical features and pragmatic functions, ‘Zi’ still has its own role. In terms of syllables, in the corpus collected in this paper, the variable ‘XX’ of the new ‘XX Zi’ construction is mostly composed of two syllable words (93.02%), but also contains a small number of three syllable and four syllable words (6.98%). Disyllabic construction is a lexical construction with the highest frequency and prototype effect in modern Chinese word formation, which has the absolute right of ﬁrst use in word formation [17]. For the ‘XX Zi’ construction, a large number of disyllabic words abandon the original word formation usage with high utilization and prototype effect but add ‘Zi’ to form three syllable words, which is obviously not in line with people’s tendency to save effort in the process of using language. 
The principle of redundancy is necessary in a certain context [18]. Therefore, the ‘XX Zi’ construction of three syllables and even more syllables appears and becomes popular under the influence of redundancy mechanism. In fact, it is still a form of grammaticalization of the new ‘XX Zi’ construction.
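The syllable and reduplication properties discussed in this section can be checked mechanically once an expression is written in Chinese characters. The following is only a minimal sketch, not the procedure used in this paper: it assumes that each character corresponds to one syllable (so it does not handle mixed forms such as ‘OMG Zi’), the function names are hypothetical, and the sample list is a handful of examples cited above rather than the 626-sample corpus. The characters given for the two idioms are a transcription of the romanized forms ‘Cheng Feng Po Lang’ and ‘Da Ke Bu Bi’.

```python
ZI = "子"  # the fixed element of the 'XX Zi' construction


def variable_part(expression: str) -> str:
    """Return the variable 'XX' by stripping the final 子."""
    if len(expression) < 2 or not expression.endswith(ZI):
        raise ValueError(f"not an 'XX Zi' expression: {expression!r}")
    return expression[:-1]


def describe(expression: str) -> dict:
    """Describe the variable 'XX', assuming one character = one syllable."""
    xx = variable_part(expression)
    syllables = len(xx)
    return {
        "variable": xx,
        "syllables": syllables,
        "disyllabic": syllables == 2,
        # overlapping (reduplicated) form, e.g. 困 -> 困困
        "reduplicated": syllables == 2 and xx[0] == xx[1],
    }


def disyllabic_share(expressions) -> float:
    """Percentage of expressions whose variable 'XX' is disyllabic."""
    flags = [describe(e)["disyllabic"] for e in expressions]
    return 100 * sum(flags) / len(flags)


# Illustrative sample only (assumption): a few expressions cited in this paper.
SAMPLE = [
    "欣欣子", "无语子", "不错子", "困困子",
    "真香子", "喜爱子", "乘风破浪子", "大可不必子",
]

if __name__ == "__main__":
    for item in SAMPLE:
        print(item, describe(item))
    print(f"{disyllabic_share(SAMPLE):.2f}% disyllabic variables")
```

On this toy list, six of the eight variables are disyllabic and two of them (欣欣, 困困) are reduplicated; the share computed here describes the sample only and says nothing about the proportions reported for the collected corpus.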

3 Reconstructive Operational Mechanisms of the New ‘XX Zi’ Construction

The cognitive construction grammar theory holds that the overall syntactic environment can force words to change their grammatical and semantic characteristics, which is the connotation of ‘construction coercion’ [19]. The emergence and popularity of the new ‘XX Zi’ construction is the result of multiple reconstructive operational mechanisms such as lexical coercion, construction coercion and inertial coercion.

3.1 Lexical Coercion in the New ‘XX Zi’ Construction

Lexical coercion means that vocabulary occupies a dominant position in the process of meaning production [20]. Vocabulary can suppress both construction and vocabulary. On the one hand, in the new ‘XX Zi’ construction, most construction meanings are suppressed by the variable ‘XX’. The ‘Zi’ in the new ‘XX Zi’ construction, which shows a tendency of grammaticalization, is further suppressed by the semantics of the variable ‘XX’, so that the part of speech and semantics of the whole ‘XX Zi’


construction are mainly controlled by ‘XX’. In the new ‘XX Zi’ construction, when the variable ‘XX’ is a verb, the construction meaning of ‘XX Zi’ mainly comes from the verb. For example, in the expressions ‘Haoqi Zi’ (curious) and ‘Zhichi Zi’ (support), ‘Haoqi’ and ‘Zhichi’ suppress ‘Zi’ and force it to change semantically. In the construction ‘adjective + Zi’, adjectives also suppress the semantics of ‘Zi’. In the construction ‘surname or first name + Zi’, although the meaning of ‘Zi’ itself is retained to a certain extent, it is still suppressed by the variable ‘XX’. On the other hand, in some cases, the meaning of the variable ‘XX’ is also suppressed by ‘Zi’ to a certain extent. For example, ‘laoshi’ (teacher) and ‘fuqin’ (father) in (6) and (7) are usually used as honorifics. Since ‘Zi’ has the meanings of ‘son, daughter’ and ‘young, small’, using it in the ‘XX Zi’ construction narrows the ‘sense of distance’ from the honorific objects (i.e., ‘teacher’ and ‘father’, etc.). It further expresses the emotional meaning of intimacy, ridicule, and liveliness.

(6) 这是我和老师子的合影!我和她说我要用一个很厉害的相机拍合照!
Zhè shì wǒ hé lǎoshī-zǐ de héyǐng! Wǒ hé tā shuō wǒ yào yòng yīgè hěn lìhài de xiàngjī pāi hézhào!
‘This is a photo of my teacher and me! I told her I would take a group photo with a good camera!’

(7) 图二中是父亲子煮得忘了关火糊了的玉米。
Tú èr zhōng shì fùqīn-zǐ zhǔ dé wàng le guān huǒ hú le de yùmǐ.
‘Figure 2 shows the corn that was burnt because my father forgot to turn off the fire.’

Such two-way lexical coercion generally exists in the process of understanding the meaning of the new ‘XX Zi’ construction, and it has an impact on the meaning of the overall construction.

3.2 Construction Coercion in the New ‘XX Zi’ Construction

As mentioned earlier, the new ‘XX Zi’ construction was originally used to refer to names. With the popularity of the new ‘XX Zi’ construction, the coverage of ‘XX’ has also increased. Variables such as common nouns, adjectives, and verbs have all entered the ‘XX Zi’ construction. When ‘XX Zi’ represents a person’s name, the variable ‘XX’ in almost all cases is a two-syllable word, which makes words entering this construction more likely to be presented in a two-syllable form. Although this is closely related to the high proportion of disyllabic words in modern Chinese, many originally monosyllabic words also use the overlapping form in order to enter the construction in disyllabic form, as in sentence (8). After entering the construction, ‘kun’ (sleepy) becomes ‘Kun Kun Zi’.

(8) 困困子, 好多天没有发微博了。
Kùn kùn zǐ, hǎoduō tiān méiyǒu fā wēibó le.
‘I’m so sleepy. I haven’t posted on microblog for many days.’

The two-syllable construction has the absolute right of first use in word formation [17]. The variable ‘XX’ that enters the construction is obviously suppressed by the Chinese ‘two-syllable construction’. Under the further influence of the redundancy mechanism,


the ‘XX Zi’ construction shows a construction coercion with a ‘three-syllable’ tendency. In the collected corpus, the three-syllable ‘XX Zi’ construction accounted for 93.02%. Due to the evolution of the syntactic system, some syntactic forms may become lexical forms, which is called ‘lexicalization’ [21]. From the perspective of grammatical structure, the new ‘XX Zi’ construction lacks internal pauses and can hardly have other words inserted into it. When ‘XX Zi’ is used, it is mostly expressed in the form of three-syllable or four-syllable words, and it is also close to idioms. To a certain extent, many ‘XX Zi’ constructions, such as ‘Wuyu Zi’ (wordless) and ‘Ke’ai Zi’ (lovely), are used very frequently and even show a tendency of lexicalization. This trend further strengthens the coercion of the ‘XX Zi’ construction on the variable ‘XX’.

3.3 Inertial Coercion of the New ‘XX Zi’ Construction

Inspired by construction coercion, cognitive inertia is considered to form an ‘inertial coercion’, which sometimes restricts language expression [19]. For example, after a normal expression, people follow the words used earlier and use them as a reference that leads to a subsequent abnormal expression, forming an expression similar to zeugma [19].

(9) 她举止很轻盈, 很文雅, 也很淑女。
Tā jǔzhǐ hěn qīngyíng, hěn wényǎ, yě hěn shūnǚ.
‘She is very light, elegant and lady.’

In sentence (9), ‘very lady’ tends to be accepted under the inertia of ‘very light and elegant’. Inertial coercion can also be used to explain the phenomenon in which, driven by the inertia of a normal usage, another abnormal expression is pushed towards acceptability, which is also reflected in the corpus of the new ‘XX Zi’ construction.

(10) 昕昕子是个温柔的女孩子, 只因为个人风格不同就被嘲讽。真的无语子。
Xīn Xīn zǐ shì gè wēnróu de nǚháizǐ, zhǐ yīnwèi gèrén fēnggé bútóng jiù bèi cháofěng. Zhēn de wúyǔ zǐ.
‘Xin Xin Zi is a very gentle girl, but she is ridiculed because of her different personal style. It’s speechless.’

In sentence (10), ‘Nvhai zi’ (girl) is a very common usage in the prototype of ‘XX Zi’. In the development of the new ‘XX Zi’ structure, the form ‘name overlap + Zi’, such as ‘Xin Xin Zi’, appeared at an earlier stage. Under the inertia of ‘Nvhai zi’ and ‘Xin Xin Zi’, an unconventional usage like ‘Wuyu Zi’ has gradually begun to be accepted.

In conclusion, the new ‘XX Zi’ construction not only presents the characteristics of grammaticalization, but also shows the result of multiple reconstructive operational mechanisms such as lexical coercion, construction coercion and inertial coercion. On this basis, this paper will analyze the formation, popularity mechanism and pragmatic function of the new ‘XX Zi’ construction from a cognitive perspective.


4 Formation Motivation and Popular Mechanism of the New ‘XX Zi’ Construction

To describe the construction meaning of a popular construction is to describe its relationship with society and to describe how the new construction becomes popular [1]. With the emergence of new things and new ideas, the new ‘XX Zi’ construction uses existing language knowledge to meet the needs of expression and thinking.

4.1 Transmission of Fan Culture in Europe, America, Japan and South Korea

According to the questionnaire results, some respondents mentioned that the new ‘XX Zi’ construction originated from the European and American fan circle. In the fan post bar of the American singer Taylor Swift, one member used to say words like ‘Hao Mei Zi’ (so beautiful), and he cheated many other members out of their money. Many fans then began to use words such as ‘Hao Mei Zi’ and ‘Zhichi Zi’ (support) as a ‘stem’ (a running joke) in the fan circle to mock him. Subsequently, this usage became popular in the fan circles of Japan and South Korea, and the application scope of the ‘XX Zi’ construction was further expanded to refer to people’s favorite and admired idols.

The popularity of ‘XX Zi’ in China is inseparable from the wide airing of several variety talent shows in the past year or two (especially in 2020). The questionnaire results show that 79.67% of the respondents have watched ‘Sister Riding the Wind and Waves’, and that 48.78%, 39.84% and 28.46% have watched ‘Youth With You 2’, ‘Creation 101’ and ‘Creation Camp 2020’, respectively. In addition, 63.7% of the respondents knew about ‘Fan Quan’ (Fan Circle), and 76.03% of the respondents have watched Chinese, European, American, Japanese or Korean talent shows. In this survey, 86.99% of the respondents have noticed that many people use the ‘XX Zi’ construction, and 49.32% of them said that they often see this kind of expression. The fan culture in China inherited the popularity mechanism of the fan circles in Europe, America, Japan and South Korea. Influenced by the transmission of foreign ‘fan culture’, Chinese audiences are gradually following the practice of foreign fans, using the construction ‘surname or first name + Zi’ to address their idols.

4.2 The Replication and Reinforcement of Strong Memes

A meme refers to a unit of information that can be copied through a process broadly called ‘imitation’ [22]. The formation of language memes requires the existence of a reproducible expression, such as words, phrases or sentences. Under specific external conditions (subjective and objective factors), the meme host (the language user) replicates and spreads it [23]. Network catchwords are strong memes, characterized by strong replication and easy dissemination [24]. The network catchword ‘XX Zi’ is concise in expression, easy to understand, and frequent and practical in communication. It is easy to imitate and apply, so it has the ‘internal conditions’ to become a language meme.

At the same time, language memes also need appropriate ‘external conditions’ that meet certain psychological needs of the host. They tend to emerge against a background of high social concern, and the spaces and channels through which they spread are open, tolerant and powerful. Users of the new ‘XX Zi’ construction often have a strong inclination to express their inner feelings and judgments. The social events associated with the use of ‘XX Zi’, such as ‘Youth With You 2’, ‘Sister Riding the Wind and Waves’, ‘Creation 101’ and many other talent shows, have attracted wide attention and provide social topics and other ‘external conditions’ for its popularity. New media platforms such as Microblog, Wechat and Post Bar remove the temporal and spatial limits on the spread of network catchwords and accelerate the popularity of the ‘XX Zi’ construction.

4.3 Influence of Social Culture and Boost of New Media

In fact, the popularity of the new ‘XX Zi’ construction in China is not influenced by ‘fan culture’ alone. The pattern ‘surname or first name + Zi’, which is the one most commonly used by netizens today, also has a precedent in the ancient Chinese naming pattern ‘surname (+ numeral) + Zi’, as in Kǒng Zǐ (孔子) ‘Confucius’ and Mèng Zǐ (孟子) ‘Mencius’. In addition, under the influence of Japanese culture, many people have borrowed Japanese naming patterns, such as ‘XX Hui Zi (惠子)’, ‘XX Jing Zi (静子)’ and ‘XX You Hua Zi (优花子)’. Some Chinese dialects also use ‘XX Zi’, such as ‘Jinnian Zi’ (this year) and ‘Mingnian Zi’ (next year) in Sichuan dialects, and Guangdong dialects use ‘XX Zai (仔)’, which is similar to ‘XX Zi’. These social, cultural and dialectal factors further promoted the emergence of the new ‘XX Zi’ construction in China.

With the rapid development of the Internet and under the influence of new media, social cultures and languages are shared ever more widely. The questionnaire results show that 80.14% of netizens usually learn how network catchwords are used through public platforms like Sina Microblog. Many netizens also learn about network catchwords through communication software (78.09%) and online video and audio media (68.49%). Moreover, 36.3% of the respondents said that they would start to use the ‘XX Zi’ construction in order to ‘follow the trend’, and 61.73% began to accept and use ‘XX Zi’ network catchwords unconsciously. New media have accelerated the spread of the ‘XX Zi’ construction, and Microblog, Wechat and various video and audio media have further boosted its dissemination. Each repost on a new media platform strengthens the fission-like replication of the catchword construction, increases the attention it receives by orders of magnitude, and improves the public’s familiarity with it, thereby forming a strong catchword meme.

5 Pragmatic Function of the New ‘XX Zi’ Construction from the Cognitive Perspective

With the development of the Internet and new media platforms, new language forms are emerging, and network catchwords are a typical manifestation. To meet the needs of discourse communication, people tend to use their own cognition to construe, categorize and symbolize these language forms [25], so as to create expressions that meet their own conversational needs. The formation of the ‘XX Zi’ construction is related to people’s pragmatic needs and cognitive mechanisms.

5.1 Needs of Verbal Communication in Specific Context

Using different constructions for different communicative needs is the norm in daily speech [26]. There are three common ways of addressing someone: 1) using the reduplicated form of one character of the given name (usually the last one), as in ‘Ting Ting’ (婷婷); this is very common, and the surname is occasionally reduplicated instead, as in ‘Cai Cai’ (蔡蔡); 2) using one character of the name (usually the last one) plus ‘Zi’, as in ‘Jing Zi’ (静子); 3) using ‘Xiao’ (小) or ‘Da’ (大) plus the surname or first name (or its reduplicated form), sometimes with ‘Zi’ added after the name to give a more intimate tone, as in ‘Xiao Yue Yue’ (小岳岳) and ‘Da Ying Zi’ (大英子).

The same language structure can express different meanings in different contexts. Unlike these common forms of address, the popular ‘XX Zi’ construction often uses ‘the reduplicated form of the surname or first name + Zi’ to address a person, as in ‘Kai Kai Zi’ (Wang Kai) and ‘Xin Xin Zi’ (Wu Xin). This usage can be regarded as a superposition of 1) and 2). Full names are usually used on formal occasions. On network platforms, addressing someone directly by full name creates a sense of distance and disrespect, while some traditional Chinese expressions lack creativity and novelty. Therefore, on Wechat, Microblog, Post Bar and other software or platforms, netizens have changed the way they address actors, actresses and famous entertainers, using ‘XX Zi’ to highlight their affection and admiration for them.

As for the occasions and modes of use, none of the respondents in this survey (0%) use ‘XX Zi’ on formal occasions such as academic or work-related settings, which indicates that respondents do not currently regard ‘XX Zi’ as suitable for formal language. Instead, they prefer to use ‘XX Zi’ in chat software such as Wechat and QQ (58.02%), in posts or comments on Microblog and Zhihu (55.56%), and in daily conversation (34.57%). ‘XX Zi’ is thus mostly used in informal settings, online and in daily life, to express users’ emotions and attitudes. Using the popular ‘XX Zi’ construction not only meets the individual’s expressive needs in a specific context but also attracts listeners’ attention and creates a sense of identity, which further promotes the popularity of ‘XX Zi’.

5.2 Expression Needs of ‘Defamiliarization’ and ‘Subjectivization’

Nowadays, young people are eager to be different and innovative. Bored by the frequent, repeated use of the old vocabulary, they hope to find a ‘defamiliarized’ mode of speech that differs from their everyday conversational habits. It is in this setting that many network catchwords come into being and quickly attract attention. In the questionnaire survey, 80.14% of the respondents are young people aged 19–24 and 16.44% are aged 25–30; 66.44% hold a master’s degree, 23.97% a bachelor’s degree and 6.85% a doctoral degree. Moreover, most of the respondents know and have used the new ‘XX Zi’ construction, so its user group is markedly young. We also found that although 86.99% of the respondents had noticed the use of ‘XX Zi’, 60.96% thought that ‘XX Zi’ catchwords change quickly and will not remain popular for long, whereas 19.17% believed that they will be used for a long time and may even become common usage. One reason for the rise of network catchwords is young people’s tendency to express emotion explicitly, to pursue novelty and to highlight personality. ‘Zi’ has long existed in Chinese, but the new ‘XX Zi’ construction offers a way of expression different from past usage. To a certain extent, it satisfies young people’s desire for linguistic ‘defamiliarization’ and freshness.

‘Subjectivity’ refers to the expression of the speaker’s ‘self’ in an utterance: in saying something, the speaker also expresses his or her position, attitude and feelings towards it, thereby leaving a personal mark on the utterance [27]. The structural form adopted to express this subjectivity, or the corresponding evolutionary process that the language undergoes, is called ‘subjectivization’ [28]. The ‘XX Zi’ structure is shaped by users’ personal wishes, which gives it a strong personal color and a tendency towards ‘subjectivization’. In ‘Adjective + Zi’, ‘Verb + Zi’ and their variants, the adjective or verb filling the variable ‘XX’ often carries a subjective color and expresses the speaker’s emotional attitude. Common expressions such as ‘Bucuo Zi’ (nice) and ‘Youqu Zi’ (interesting) convey the speaker’s judgment of the characteristics of something. ‘Verb + Zi’ expressions such as ‘Xihuan Zi’ (like) and ‘Xie Xie Zi’ (thank you), and their variants, go beyond value judgment and directly reveal the speaker’s behavioral inclination. We found that 70.55% of the respondents thought that ‘XX Zi’ expressions sound intimate and lovely, while 39.04% and 36.99% found them lively and warm, and humorous and funny, respectively. On the other hand, 17.81% thought the expression very pretentious, 6.85% thought it ironic and critical, and 2.74% considered it subtle and euphemistic. The new ‘XX Zi’ construction thus often meets users’ psychological needs for ‘defamiliarization’ and ‘subjectivization’, and driven by these needs, more people will consciously spread it.

5.3 Presentation and Cognitive Prominence of Thematic Meaning

Thematic meaning is conveyed by the speaker or writer by means of organizing information (word order, emphasis means, arrangement of information focus) [25]. The expression of ‘XX Zi’ often presents the feature of sentence formation. In most cases, it appears at the beginning or end of several clauses or a paragraph. This usage enables listeners or readers to focus on ‘XX Zi’ when they see the whole part of the content, and obtain the most important emotional attitude, talking object, content theme or other information from the construction of ‘XX Zi’. Prominence is the principle of language structure arrangement, which conforms to people’s cognitive law [29]. In the collected corpus of the new ‘XX Zi’ construction, ‘Zi’ always appears as a virtual afﬁx, and presents the characteristics of appearing at the end of a word or a sentence. The variable ‘XX’ is placed before ‘Zi’. Putting the variable ‘XX’ in front can highlight the main information and semantic content of the construction ‘XX Zi’. When the cognitive subject sees the new ‘XX Zi’ construction, he or she will pay more attention to the variable ‘XX’ in the front of the construction and focus on the main information content and thematic meaning.

6 Conclusion

This paper holds that the popularity of the new ‘XX Zi’ construction is affected by multiple factors, such as foreign fan culture, the popularity of domestic talent shows, and the strong-meme character of network catchwords. The semantic, phonetic and syntactic features of the construction, together with its redundancy, are manifestations of grammaticalization. The construction results from multiple reconstructive mechanisms, namely lexical, construction and inertial coercion, and it meets the communicative needs of young people in pursuit of ‘defamiliarization’ and ‘subjectivization’. This paper not only enriches the study of the usage of ‘Zi’ in Chinese and of the new ‘XX Zi’ construction, but also provides a new reference for the study of network catchwords. Taking the new ‘XX Zi’ construction as a window onto the derivation and development of such expressions gives a more comprehensive understanding of the influence of network catchwords on people’s daily communication.

References

1. Deng, X.Q.: A study on the construction of a series of catchwords of ‘Suan Ni Hen’–from the aspect of catchwords to social. Contemp. Rhet. 02, 61–70 (2012). (in Chinese)
2. Meillet, A.: L’évolution des formes grammaticales. Scientia rivista di scienza 12(26), 130–148 (1912)
3. Shen, J.X.: A survey of studies on grammaticalization. Foreign Lang. Teach. Res. 04, 17–23 (1994). (in Chinese)
4. Xu, S.: Shuo Wen Jie Zi. Jiu Zhou Press, Beijing (2006). (in Chinese)
5. Wang, L.: Hanyu Shi Gao. Zhonghua Book Company, Beijing (2005). (in Chinese)
6. Wen, X.: On the objectives principles and methodology of cognitive linguistics. Foreign Lang. Teach. Res. 34(2), 90–97 (2002). (in Chinese)
7. Goldberg, A.E.: Constructions: A Construction Grammar Approach to Argument Structure. University of Chicago Press, Chicago (1995)
8. Li, X.H., Wang, L.F.: SLA from the perspective of construction grammar: construction theory and its pedagogical implications. Foreign Lang. Res. 2, 107–111 (2010). (in Chinese)
9. Zhou, J.: Hanyu Cihui Jiegou Lun. Shanghai Lexicographical Publishing House, Shanghai (2004). (in Chinese)
10. Fischer, O.: On the role played by iconicity in grammaticalization processes. In: Nänny, M., Fischer, O. (eds.) Form Miming Meaning: Iconicity in Language and Literature. John Benjamins, Amsterdam (1999)
11. Li, W.L.: The evolution of P in the structure ‘V + P + N’ from the perspective of prototype. Chin. Lang. Learn. 2, 32–35 (2004). (in Chinese)
12. Hopper, P.J., Traugott, E.C.: Grammaticalization. CUP, Cambridge (1993)
13. Wang, Y., Yan, C.S.: Grammaticalization: characteristics, motivations and mechanisms. J. PLA Univ. Foreign Lang. 28(4), 1–5 (2005). (in Chinese)
14. Huang, B.R., Liao, X.D.: Xiandai Hanyu, 6th edn. Higher Education Press, Beijing (2017). (in Chinese)
15. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 (1948)
16. Wu, L.Q.: The essence and rhetorical function of ‘redundancy’ in Chinese. J. Jiangsu Normal Univ. (Philos. Soc. Sci. Edn.) 46(1), 43–55 (2020). (in Chinese)
17. Liu, Y.M.: The cognitive mechanisms of the emergence of BA constructions. J. PLA Univ. Foreign Lang. 33(1), 10–15 (2010). (in Chinese)
18. Pan, X.J.: The phenomenon of redundancy in Chinese and its difference from ‘surplus.’ Sinogram Cult. 1, 36–39 (2010). (in Chinese)
19. Wang, Y.: Revision on construction coercion: lexical coercion and inertia coercion. Foreign Lang. Their Teach. 12, 5–9 (2009). (in Chinese)
20. Huang, M., Xiao, S.: A new study of the coercion mechanism of catch word ‘Bei AB’ from cognitive construction perspective. J. Soc. Sci. Hunan Normal Univ. 42(05), 119–123 (2013). (in Chinese)
21. Dong, X.F.: Syntactic change and lexicalization in Chinese. Stud. Chin. Lang. 5, 399–409 (2009). (in Chinese)
22. He, Z.R.: Memetic understanding of language in fashion. Shandong Foreign Lang. Teach. 2, 8–13 (2014). (in Chinese)
23. Sun, Y., Han, G.L.: A pragmatic study of the network catchphrase ‘small target’ from the perspective of memetics. J. Zhejiang Int. Stud. Univ. 2, 62–67 (2018). (in Chinese)
24. Liu, Y.L.: On the construction of ‘Wo Keneng X1 le Jia X2.’ J. Fuyang Normal Univ. (Soc. Sci.) 2, 54–57 (2017). (in Chinese)
25. Leech, G.: Semantics. Shanghai Foreign Language Education Press, Shanghai (1987)
26. Wu, C.A.: The construction characteristics of ‘Ai za zadi’. Chin. Lang. Learn. 31–34 (2007). (in Chinese)
27. Lyons, J.: Semantics. Cambridge University Press, Cambridge (1977)
28. Shen, J.X.: Subjectivity of Chinese and the teaching of Chinese grammar. Chin. Lang. Learn. 1, 3–12 (2009). (in Chinese)
29. Pan, X.J.: A cognitive explanation of the causes of the redundancy. J. Inner Mongolia Univ. (Philos. Soc. Sci.) 42(6), 134–138 (2010). (in Chinese)

A Study on Lexical Knowledge and Semantic Features of Speech Act Verbs Based on Language Facts

Linlin Zhang and Hongbing Xing(✉)

Beijing Language and Culture University, Beijing, China
[email protected]

Abstract. Semantic feature is an important dimension of lexical semantics, which has long been a concern of linguistics, psychology and cognitive neuroscience. Based on the connectionism theory, this research combines statistical analysis of corpus with cognitive analysis, takes the speech act verb 谈tan2 ‘talk’ as an example, constructs the lexical knowledge system and analyzes the collocation knowledge system and syntactic framework information. On this basis, from the perspective of psychological cognition of language use, the semantic feature knowledge system of 谈tan2 ‘talk’ is constructed, suggesting new ideas for the study of the semantic features of speech act verbs.

Keywords: Speech act verb · Lexical knowledge · Semantic feature · Corpus

1 Introduction

With the development of corpus technology and cognitive science, the theory of language cognition represented by connectionism theory has had a wide influence on language acquisition and cognitive research. Connectionism theory holds that the process of language acquisition involves acquiring language knowledge through real language materials and storing it in a mental lexicon [1]. Corpus resources provide a basis for lexical knowledge research based on language materials and actual usages. [2] investigated the semantic collocation and acquisition of adverbial adjectives and verbs of the same semantic category, while [3] examined the syntactic and semantic relations of adjective-noun collocations in the mental lexicon of foreign students.

The statistical properties of language are closely related to the representation of lexical semantics in the brain. According to connectionism, semantic knowledge is stored in the mental lexicon as a set of semantic features [4–6]. Semantic knowledge is the core of lexical knowledge, and semantic feature knowledge is also an important part of the lexical knowledge system. Scholars have conducted a range of research on lexical semantics from multiple perspectives such as psycholinguistics, neurolinguistics and computational linguistics. Corpus-based knowledge acquisition and lexical usage analysis are of great significance to studies on lexical semantics.

Verbs are usually the semantic and syntactic core of sentences. In the mental lexicon of verbs, in addition to the general lexical information involving form, sound and meaning, much special information is also stored for describing events or behaviors. [7] explained that subject/object information in the verb lexicon was represented by semantic features through cognitive behavior and ERP analysis. [8] showed that various types of lexical information, including semantic information, participated in Chinese sentence processing immediately after being accessed. [9] investigated the representation of verb valence characteristics in the mental lexicon of Chinese native speakers and its role in lexical processing. However, in addition to subject and object information, there is much other information in the lexical information of Chinese verbs, such as the adverbial and the complement. In fact, adverbials and complements are also important windows for investigating the processing characteristics of verbs and obtaining semantic features.

Speech act verbs belong to an important category of verbs. Based on the syntactic framework system of a single verb, [10] compared the syntactic distribution of the synonymous verbs 谈tan2 ‘talk’ and 说shuo1 ‘talk’ and found that the richness of 谈tan2 ‘talk’ was much greater than that of 说shuo1 ‘talk’. [5] constructed the knowledge system of psychological semantic features of action verbs based on large-scale authentic language materials. [11] investigated the semantic features of the antonymous verbs 买mai3 ‘buy’ and 卖mai4 ‘sell’ on the basis of the lexical knowledge system. Under the guidance of connectionism theory, this study conducts a corpus-based knowledge acquisition and statistical analysis of the speech act verb 谈tan2 ‘talk’ and investigates its collocation knowledge system and syntactic framework information. Thus, from the perspective of psychological cognition of language use, this study comprehensively and thoroughly describes the semantic features of the speech act verb 谈tan2 ‘talk’ and constructs its lexical knowledge and semantic feature knowledge systems.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 187–197, 2022. https://doi.org/10.1007/978-3-031-06703-7_14

2 The Construction of Lexical Knowledge System of Speech Act Verbs

2.1 The Lexical Knowledge System of Verbs

There have been several studies on lexical knowledge in linguistics and psychology. In linguistics, [12] defined five attributes of lexical knowledge, including generalization, application, breadth, precision and availability. [13–16] constructed a framework of lexical knowledge that included morphology, meaning, function and usage. Research on lexical knowledge in psychology has focused on the representation of lexical knowledge in the mental lexicon, generally including the representation, organization, extraction and recognition process of lexical information such as sound, form and meaning in the brain. Connectionism theory adopts the distribution representation and holds that a concept is expressed by multiple units through joint action. Words with the same unit in the mental lexicon form a cluster according to the number of shared units, such that there is a unit-to-unit connection between different mental lexicons. [1] emphasized the important role of function distribution, collocation relationship, frequency knowledge and semantic clustering in the representation of lexical knowledge. The numbers of words used in practice, collocation knowledge and syntactic function distribution are important measures of lexical knowledge that affect the representation of lexical semantics. This research mainly constructs the lexical knowledge system of speech act verbs through the collocation knowledge system and syntactic framework information.

2.2 The Collocation Knowledge System of Speech Act Verbs

The corpus of this study comes from the Modern Chinese Corpus of the National Language Commission. A total of 1,709 corpus examples of 谈tan2 ‘talk’ were retrieved, of which 500 were selected for labeling. The annotated examples meet the following requirements: (1) the verb is the main predicate in the sentence; (2) the verb is in a complete syntactic framework; and (3) the sentence with the verb expresses a complete meaning. In the end, this research obtains 336 examples and annotates the subjects, objects, adverbials, complements, collocation types and syntactic frameworks of each. By annotating and analyzing the corpus, this study identifies the subject-verb, verb-object, adverbial-verb and verb-complement collocations formed by 谈tan2 ‘talk’, together with its collocational word types and their frequencies, allowing us to construct the collocation knowledge system (Fig. 1 and Table 1).

Fig. 1. The syntactic collocation relationships of 谈tan2 ‘talk’.

Table 1. The collocation types of 谈tan2 ‘talk’.

| Syntactic component | Collocation type | Number |
|---|---|---|
| Subject | Agent | 311 |
| | Experiencer | 25 |
| Object | Noun phrase | 113 |
| | Noun | 52 |
| | Verb phrase | 21 |
| | Pronoun | 11 |
| | Verb | 5 |
| | Clause | 3 |
| Adverbial | Object | 60 |
| | Status | 45 |
| | Scope | 23 |
| | Frequency | 23 |
| | Time | 22 |
| | Negativity | 22 |
| | Manner | 16 |
| | Place | 9 |
| | Result | 8 |
| | Degree | 4 |
| Complement | Result | 40 |
| | Quantity | 27 |
| | Status | 10 |
| | Possibility | 7 |
| | Tendency | 7 |
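The counts in Table 1 translate directly into totals and percentage shares per syntactic component. The following is a minimal sketch of that arithmetic, not the authors' procedure; the names `collocations` and `shares` are hypothetical and introduced only for illustration.

```python
# Counts transcribed from Table 1 (collocation types of 谈 tan2 'talk').
collocations = {
    "Subject": {"Agent": 311, "Experiencer": 25},
    "Object": {"Noun phrase": 113, "Noun": 52, "Verb phrase": 21,
               "Pronoun": 11, "Verb": 5, "Clause": 3},
    "Adverbial": {"Object": 60, "Status": 45, "Scope": 23, "Frequency": 23,
                  "Time": 22, "Negativity": 22, "Manner": 16, "Place": 9,
                  "Result": 8, "Degree": 4},
    "Complement": {"Result": 40, "Quantity": 27, "Status": 10,
                   "Possibility": 7, "Tendency": 7},
}


def shares(counts):
    """Return each collocation type's percentage share within one component."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


# Total number of collocations observed for each syntactic component.
totals = {component: sum(counts.values())
          for component, counts in collocations.items()}
subject_shares = shares(collocations["Subject"])
object_shares = shares(collocations["Object"])

for component, counts in collocations.items():
    rounded = {name: round(pct, 1) for name, pct in shares(counts).items()}
    print(component, totals[component], rounded)
```

Keeping the raw counts and deriving the percentages from them makes it easy to check that the component totals match the number of annotated examples.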

There are 311 agent subjects in the collocations of 谈tan2 ‘talk’, accounting for 92.6% of the examples, of which the personal pronoun 我wo3 ‘I’ is the most commonly used; there are also 25 experiencer subjects, accounting for 7.4%. There are 205 collocated objects, of which noun phrases are the most numerous (113, or 55%), while verbs and clauses make up the smallest proportions. A total of 232 adverbials are collocated with 谈tan2 ‘talk’, most of them expressing object and status. Among the adverbials expressing negativity, the most commonly used is 不bu4 ‘not’. There are 91 complements collocated with 谈tan2 ‘talk’, of which resultative complements are the most numerous, accounting for 43.9%, while potential complements and directional complements are the fewest.

2.3 The Syntactic Framework Information of Speech Act Verbs

The syntactic framework information of words is an important part of the lexical knowledge system [1]. The description and statistics concerning the syntactic framework system of verbs improve the knowledge of grammar and usage frequency in the lexical knowledge system. In addition, as there are selective restrictions between verbs and syntactic frameworks, statistics for the latter allow us to determine the dominant syntactic frameworks of verbs. The syntactic framework of this study is based on [17]. The calculated usages of the syntactic framework information of 谈tan2 ‘talk’ are shown in Table 2.

Table 2. The syntactic framework information of 谈tan2 ‘talk’.

| Syntactic frameworks | Number | Proportion (%) | Z-score |
|---|---|---|---|
| (Subject)-Verb | 26 | 7.738 | 0.325 |
| (Subject)-Verb-Complement | 44 | 13.095 | 1.028 |
| (Subject)-Verb-Object | 108 | 32.143 | 3.529 |
| (Subject)-Verb-Complement-Object | 45 | 13.393 | 1.067 |
| (Subject)-“不/没 (negative words)”-Verb | 5 | 1.488 | −0.495 |
| (Subject)-“不/没 (negative words)”-Verb-Object | 7 | 2.083 | −0.417 |
| (Subject)-“不/没 (negative words)”-Verb-Complement | 1 | 0.298 | −0.652 |
| (Subject)-Modal Verb-Verb | 4 | 1.190 | −0.535 |
| (Subject)-Modal Verb-Verb-Object | 12 | 3.571 | −0.222 |
| (Subject)-Modal Verb-Verb-Complement | 1 | 0.298 | −0.652 |
| (Subject)-Modal Verb-“一/了”-Repetitive Verb | 7 | 2.083 | −0.417 |
| (Subject)-Modal Verb-“一/了”-Repetitive Verb-Object | 6 | 1.786 | −0.457 |
| (Subject)-“一/了”-Repetitive Verb | 10 | 2.976 | −0.3 |
| (Subject)-“一/了”-Repetitive Verb-Object | 14 | 4.167 | −0.144 |
| (Subject)-Verb-“着” | 2 | 0.595 | −0.613 |
| (Subject)-Verb-“着”-Object | 4 | 1.190 | −0.535 |
| Serial Verb Constructions (verb before) | 1 | 0.298 | −0.652 |
| Serial Verb Constructions (verb after) | 38 | 11.310 | 0.794 |
| shide Sentence | 1 | 0.298 | −0.652 |

The syntactic framework information of 谈tan2 ‘talk’ is very rich, with a total of 19 syntactic frameworks being used, which is consistent with the findings of [10]. Among the 40 verbs annotated, 谈tan2 ‘talk’ uses the most syntactic frameworks, covering 76% of the syntactic framework types. Its most frequently used syntactic framework is “(Subject)-Verb-Object”, with a Z-score (footnote 1) of 3.529. In addition, 谈tan2 ‘talk’ is widely distributed in the “(Subject)-Verb-Complement” and “(Subject)-Verb-Complement-Object” frameworks, with almost the same proportion of usage. “(Subject)-Verb” and “Serial Verb Constructions (verb after)” are also commonly used syntactic frameworks of 谈tan2 ‘talk’.
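The Z-scores reported for the syntactic frameworks can be recomputed from the raw counts in Table 2. The sketch below is illustrative only: it assumes that the mean and standard deviation are taken over the 19 framework counts and that the population standard deviation is used, which the paper does not state explicitly; under that assumption the computed values agree with the tabulated ones to three decimals for the frameworks checked. The names `framework_counts` and `z_scores` are hypothetical.

```python
from statistics import mean, pstdev

# Usage counts of the 19 syntactic frameworks, transcribed from Table 2.
framework_counts = {
    '(Subject)-Verb': 26,
    '(Subject)-Verb-Complement': 44,
    '(Subject)-Verb-Object': 108,
    '(Subject)-Verb-Complement-Object': 45,
    '(Subject)-"不/没"-Verb': 5,
    '(Subject)-"不/没"-Verb-Object': 7,
    '(Subject)-"不/没"-Verb-Complement': 1,
    '(Subject)-Modal Verb-Verb': 4,
    '(Subject)-Modal Verb-Verb-Object': 12,
    '(Subject)-Modal Verb-Verb-Complement': 1,
    '(Subject)-Modal Verb-"一/了"-Repetitive Verb': 7,
    '(Subject)-Modal Verb-"一/了"-Repetitive Verb-Object': 6,
    '(Subject)-"一/了"-Repetitive Verb': 10,
    '(Subject)-"一/了"-Repetitive Verb-Object': 14,
    '(Subject)-Verb-"着"': 2,
    '(Subject)-Verb-"着"-Object': 4,
    'Serial Verb Constructions (verb before)': 1,
    'Serial Verb Constructions (verb after)': 38,
    'shide Sentence': 1,
}


def z_scores(counts):
    """Standardize each count: (X - mean) / standard deviation."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD, not sample SD
    return {name: (n - mu) / sigma for name, n in counts.items()}


z = z_scores(framework_counts)

# Frameworks with a positive Z-score are used more often than average.
dominant = sorted((name for name, score in z.items() if score > 0),
                  key=z.get, reverse=True)
for name in dominant:
    print(f"{name}: {z[name]:.3f}")
```

Standardizing the counts in this way puts all frameworks on a common scale, so that frameworks with positive scores stand out as the dominant ones for the verb.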

1 The Z-score is the difference between a value and the mean, divided by the standard deviation: $Z = (X - \bar{X})/s$, where $X$ is the observed value, $\bar{X}$ is the mean and $s$ is the standard deviation.

3 The Construction of Semantic Feature Knowledge System of Speech Act Verbs

3.1 The Semantic Feature Knowledge System of Verbs

Semantic feature is an important dimension of lexical semantics and provides knowledge support for language understanding and generation [5]. From the perspective of psycholinguistics, there is a large body of evidence showing that meaning in the mental lexicon is characterized by semantic features and semantic networks. The hierarchical network model of [18] holds that the mutual connections of each word with others form a network-like semantic organization in the mental lexicon. In this network, a node represents a concept, and each concept has particular characteristics. These nodes are connected through various networks. The spreading-activation theory proposed by [19] holds that lexical organization does not have a strict hierarchy, but that the units are linked to each other by semantic connections and similarity. [5] constructed the knowledge system of mental semantic features of verbs based on language facts, arguing that the semantic information of words in the mental lexicon is represented through spreading and activation. The knowledge system of mental semantic features of verbs forms a complex network with a hierarchical order and with mutual connections and interactions between concept and feature, feature and feature, feature and feature value, and feature value and feature value [5]. A semantic feature knowledge system based on ‘feature-feature value’ provides a reference system for a comprehensive and systematic description of verb semantics. Therefore, on the basis of previous studies, this research adopts the hierarchical description model of ‘feature-feature value’ to formally describe the semantics of 谈tan2 ‘talk’ and constructs its semantic feature knowledge system.

3.2 The Semantic Feature Extraction of Speech Act Verbs

[1] argued that the frequency factor was the most important manifestation of semantic features. This study adopts the semantic feature knowledge system of [5], with modifications, to analyze a real natural corpus and extract the semantic features of 谈tan2 ‘talk’.

Annotation Specification. The annotation specification involves the analysis of semantic components, the extraction of semantic feature values, the classification of semantic features and the construction of the semantic feature knowledge system.

(1) The analysis of semantic components. This research takes into consideration any component that may collocate with the verb in real usage and makes an exhaustive analysis of all semantic components associated with 谈tan2 ‘talk’ in a given sentence. Semantic components are not limited to individual parts of speech such as nouns, verbs and adjectives, but also include larger units like phrases and clauses.

(2) The extraction of semantic feature values. Based on the semantic components, the semantic feature values of verbs are extracted, including the emotion feature, evaluation feature, quantity feature and result feature of the action subject; the reference feature, statement feature, definite feature and quantity feature of the action object; and the timeliness, spatiality, directionality, result and frequency of the action. For example, in 她们都俯着窗口热情地谈着话ta1 men dou1 fu3 zhe chuang1 kou3 re4 qing2 de tan2 zhe hua4 ‘They are talking enthusiastically, leaning over the window’, 热情re4qing2 ‘enthusiastically’ shows the emotion feature of the subject. In 我愿意谈一点粗浅的意见wo3 yuan4 yi4 tan2 yi1 dian3 cu1 qian3 de yi4 jian4 ‘I would like to talk about some rough ideas’, 一点yi1 dian3 ‘some’ reflects the quantity feature of the object. In 实用主义者只谈真理的有用性 shi2 yong4 zhu3 yi4 zhe3 zhi3 tan2 zhen1 li3 de you3 yong4 xing4 ‘Pragmatists talk only about the usefulness of truth’, 只zhi3 ‘only’ reflects the degree feature of the action.

(3) The classification of semantic features. After the extraction, this study classifies and summarizes the semantic feature values obtained. Eventually, a standardized and representative semantic framework for speech act verbs is achieved.

(4) The construction of the semantic feature knowledge system. This system is hierarchical, with the ‘feature-feature value’ structure as its foundation and core.

Frequency of Semantic Features. Based on the vocabulary knowledge system and frequency information, the semantic features of 谈tan2 ‘talk’ are extracted in two ways.

(1) The distinctive properties that emerge from high-frequency semantic components. This is the most important way to obtain semantic features. A high frequency of combinations between words can highlight semantic features. For example, the high frequency of the combination of 谈tan2 ‘talk’ and 好好地hao3 hao3 de ‘well’ allows us to extract the major emotion feature of 谈tan2 ‘talk’.

(2) The fixed collocations of high-frequency co-occurrence. This approach is also significant for obtaining semantic features. There are many fixed collocations in Chinese. For example, 谈tan2 ‘talk’ is often combined with 上shang4 ‘upper’ to form fixed collocations such as 谈得上tan2 de shang4 ‘can go so far as to say’ and 谈不上tan2 bu shang4 ‘out of the question’. Thus, the difficulty feature of 谈tan2 ‘talk’ can be extracted.

Using the data obtained through these two methods, the frequency of the semantic features of 谈tan2 ‘talk’ is calculated and shown in Table 3.

Table 3. The frequency of the semantic features of 谈tan2 ‘talk’.

Semantic feature | Feature value | Feature value description | Frequency
Subject feature | Emotion feature | Psychological, emotional, and attitudinal tendencies towards action events | 67
Subject feature | Quantity feature | The number of participants in action events | 60
Subject feature | Evaluation feature | Feeling and evaluation of action events | 10
Subject feature | Result feature | The state change of subject caused by action events | 8
Object feature | Reference feature | The specific concepts of things pointed by action-related objects | 165
Object feature | Statement feature | The property, action or proposition of action-related objects | 29
Object feature | Definite feature | The definite articles | 11
Object feature | Quantity feature | The number of action objects | 10
Quantity feature | Frequency feature | The number and frequency of actions | 50
Quantity feature | Degree feature | The degree and scope of actions | 22
Quantity feature | Difficulty feature | The degree to living up to expectation produced by actions | 7
Time feature | Time feature of action events | The time of action events | 27
Time feature | Initial feature | The beginning and ongoing of actions | 3
Space feature | Place feature of action events | The place where an action event occurs | 9
Space feature | Direction feature | The direction of actions | 4
Result feature | Result feature of action events | The result of action events | 40
Manner feature | Manner feature of the subjects | The state mode of action subjects | 16

It is found that the subject and object features are the two most frequent features of 谈 tan2 ‘talk’. The frequencies of emotion feature and quantity feature are the highest for the subject feature, indicating that the speaker has a clear attitude toward the conversation event itself. Since a conversation involves two or more parties, quantity feature is also prominent. The frequency of reference feature in the object feature is the highest, indicating that the objects involved in conversation actions are mostly subjective components referring to the concept of action objects. In addition, the frequency of the quantity feature is also very high; the frequency of the frequency feature is 50, and speakers often use adverbs and momentum words such as 再zai4 ‘again’ and 也ye3 ‘also’ to indicate the frequency of action events. The result feature also has a high frequency, indicating that speakers pay more attention to the results of conversation behaviors.


In combination with the frequency factors, the distribution of the semantic features and feature values of 谈tan2 ‘talk’ is also investigated. There are differences in the number of feature values and the distributional tendency of features across semantic features. For example, in terms of frequency feature, 谈tan2 ‘talk’ and 再zai4 ‘again’ co-occur 12 times, while 谈tan2 ‘talk’ and 又you4 ‘again’ co-occur only once. In terms of difficulty feature, 谈得上tan2 de shang4 ‘can go so far as to say’ and 谈得到tan2 de dao4 ‘take into consideration’ appear 6 times, but 谈不上tan2 bu shang4 ‘out of the question’ appears only once, indicating that 谈tan2 ‘talk’ tends to express the degree to which action behaviors can achieve expectations.

3.3 The Semantic Feature Knowledge System of Speech Act Verbs

Based on the semantic feature framework, this research constructs the semantic feature knowledge system of 谈tan2 ‘talk’ (Fig. 2).

Fig. 2. The semantic feature knowledge system of 谈tan2 ‘talk’.

The semantic feature knowledge system of 谈tan2 ‘talk’ is a feature set that contains subject, object, time, space, manner, result, and quantity features. Each feature has one or more feature values. The distributed representation account of connectionism holds that words can be decomposed into smaller semantic units in the mental lexicon. Xu [5] pointed out that the number of semantic units of an action concept is related to the number of co-occurring semantic components and their connection strengths: the more semantic components co-occur around the verb, the more semantic features it has, and the more frequently a verb co-occurs with a semantic component, the stronger the connection between the two. This study shows that the mental lexicon representation of 谈tan2 ‘talk’ is distributed, and that semantic information is contained in the relationships between 谈tan2 ‘talk’ and various language components. Connectionism also holds that the storage of meaning in the mental lexicon is well organized. This research further shows that the different features of 谈tan2 ‘talk’ are hierarchical, differ in importance, and contribute differently to word meaning. Furthermore, a three-layer hierarchical network of semantic component, feature value and feature is formed within each feature set. The semantic components used with high frequency appear repeatedly and form fixed representations. These representations are first stored in the layer of feature values, and then clusters are formed between similar feature values, so as to shape more abstract and implicit feature representations.
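The three-layer network described above (semantic component, feature value, feature) can be pictured as a simple roll-up of co-occurrence counts. The following is only a minimal sketch: the tuple layout and the function name `roll_up` are hypothetical, the counts for 再, 又 and 谈不上 are the ones quoted earlier in this paper, and the grouping of the frequency and difficulty features under the quantity feature follows Table 3.

```python
from collections import defaultdict

# (semantic component, feature value, feature, co-occurrence count with 谈)
# The counts are those quoted in the text; the encoding is an illustrative
# assumption, not the authors' storage format.
COMPONENTS = [
    ("再", "frequency feature", "quantity feature", 12),
    ("又", "frequency feature", "quantity feature", 1),
    ("谈不上", "difficulty feature", "quantity feature", 1),
]


def roll_up(components):
    """Aggregate counts from components to feature values, then to features."""
    by_value = defaultdict(int)
    by_feature = defaultdict(int)
    for _component, value, feature, count in components:
        by_value[(feature, value)] += count
        by_feature[feature] += count
    return dict(by_value), dict(by_feature)


value_counts, feature_counts = roll_up(COMPONENTS)
```

Here the connection strength of a component is approximated by its raw count; any weighting scheme beyond that would be a further assumption.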

4 Conclusion

This research constructs the lexical knowledge system of the speech act verb 谈tan2 ‘talk’, discusses its collocation knowledge system and syntactic framework information in detail, and constructs its semantic feature knowledge system. Lexical knowledge is the basis for analyzing semantic features, and the semantic features of verbs can be determined on the basis of lexical knowledge connection and clustering. Studying the semantic representation of the mental lexicon from the perspective of psychological cognition of language use can provide a research paradigm for the study of semantic features. The meaning is instructive and explanatory, and the form is decisive and operational. The study of semantic features needs formal standards that can further semantic research. In future research we hope to examine the syntactic framework and semantic feature knowledge systems in combination, expand the scope of investigation of speech act verbs, expand corpus resources, and further improve the study of the lexical semantic features of speech act verbs.

Acknowledgments. We thank the anonymous reviewers for their constructive comments, and gratefully acknowledge the support of the Humanities and Social Sciences Research Planning Project by the Ministry of Education (No. 20YJAZH110), the Western and Frontier Region Project of the Humanities and Social Science Research Youth Fund of the Ministry of Education (No. 19XJC740002), and Wutong Innovation Platform Project of Beijing Language and Culture University (Special Fund for Basic Research Fees of Central Universities) (No. 19PT01).

References

1. Xing, H.: Vocabulary Acquisition of Chinese as a Second Language. Peking University Press, Beijing (2016). (in Chinese)
2. Zhang, J.: The semantic collocation and acquisition of adverbial adjectives/verbs in the same semantic category. Master Thesis of Beijing Language and Culture University (2009). (in Chinese)
3. Zhang, Y.: The acquisition of syntactic and semantic of adjective-noun collocation in international students. Master Thesis of Beijing Language and Culture University (2009). (in Chinese)
4. Xing, H.: The formation and development of the second language lexical knowledge. TCSOL Stud. 78(2), 39–46 (2020). (in Chinese)
5. Xu, T.: Construction of knowledge system of psychological semantic features of action verbs based on language facts. Doctoral Dissertation of Beijing Language and Culture University (2020). (in Chinese)
6. Li, P.: Connectionism model of language acquisition. Contemp. Linguis. 4(3), 164–175 (2002). (in Chinese)
7. Li, X.: Mental representation of verb meaning: behavioral and electrophysiological evidence from Chinese. Doctoral Dissertation of Beijing Normal University (2005). (in Chinese)
8. Shi, D., Zhang, H., Shu, H.: The immediate effect of verb information in Chinese sentence comprehension. Acta Psychol. Sin. 31(1), 28–35 (1999). (in Chinese)
9. Feng, L., Ding, G., Chen, Y.: A study on psychological reality of verbal valence characteristics. Appl. Linguis. 2, 61–68 (2006). (in Chinese)
10. Zhao, Y.: The research on the acquisition of sentence frame for single verb. Master Thesis of Beijing Language and Culture University (2013). (in Chinese)
11. Shi, G., Yang, C., Xing, H.: A study on semantic features of antonymous verbs based on lexical knowledge system. In: Liu, M., Kit, C., Su, Qi. (eds.) CLSW 2020. LNCS (LNAI), vol. 12278, pp. 421–431. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81197-6_35
12. Cronbach, L.J.: An analysis of techniques for diagnostic vocabulary testing. J. Educ. Res. 3, 206–217 (1942)
13. Richards, J.C.: The role of vocabulary teaching. TESOL Q. 1, 77–89 (1976)
14. Nation, P.: What Is Involved In Learning A Word in Teaching and Learning Vocabulary, pp. 29–50. University of Wellington. Victoria (1987)
15. Meara, P.: A note on passive vocabulary. Second Lang. Res. 150–154 (1990)
16. Laufer, B.: The development of passive and active vocabulary in a second language: same or different. Appl. Linguis. 19, 225–271 (1998)
17. Sentence Pattern Research Group of Beijing Language and Culture University: Basic sentence patterns of modern Chinese. Chinese Teach. World. 1, 26–35 (1989). (in Chinese)
18. Collins, A.M., Quillian, M.R.: Retrieval time for semantic memory. J. Verb. Learn. Verb. Behav. 8, 240–247 (1969)
19. Collins, A.M., Loftus, E.F.: A spreading activation theory of semantic process. Psychol. Rev. 82, 407–428 (1975)

Investigating Verbs of Confession Through a Syntactic and Semantic Annotation Tool

Shan Wang

Department of Chinese Language and Literature, Faculty of Arts and Humanities, University of Macau, Taipa, Macau, China
[email protected]

Abstract. With the development of big data and deep learning, Chinese information processing systems need more comprehensive quantitative research to carry out large-scale automatic analysis. Syntactic and semantic analysis has been a hot topic in the field in recent years. Existing research on Chinese verbs of confession lacks syntactic and semantic exploration based on large-scale corpora. This study extracted and selected sentences with verbs of confession from several Chinese corpora. It then used a self-developed syntactic and semantic annotation tool to check the automatic annotation results based on dependency grammar and further explored the features of these verbs. It was found that the five verbs all appear in four syntactic dependencies: HED (Head), VOB (Verb-Object), ATT (Attribute) and SBV (Subject-Verb); their most frequently collocated semantic roles are Agent, Content and Time. This study is of value for enriching the deep knowledge of Chinese verbs and providing references for lexicography, language teaching and natural language processing.

Keywords: Syntax · Semantics · Verbs of confession · A syntactic and semantic annotation tool · Dependency grammar

1 Introduction

Verbs are usually used as the core elements in sentences. Existing research has conducted diverse studies of Chinese verbs from different perspectives. However, there is still a lack of analysis of a group of words from the same semantic category based on dependency grammar. With the rapid development of big data and deep learning, Chinese information processing systems need more and more comprehensive quantitative research to carry out large-scale automatic analysis. Therefore, how to perform syntactic and semantic analysis of a language against the background of big data is of great significance for natural language processing [1]. Since Feng [2] introduced dependency grammar into the research field of modern Chinese, it has developed rapidly, especially in the field of Chinese information processing. Zhou and Huang [3] believed that dependency grammar is a syntactic system that meets the requirements of large-scale real text processing. At present, analysis based on dependency grammar mainly uses machine learning methods to extract features for classification [4–6]. This method can improve the accuracy of syntactic analysis, but many errors remain.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 198–211, 2022. https://doi.org/10.1007/978-3-031-06703-7_15


Based on the API of Language Technology Platform (LTP) of Harbin Institute of Technology, this project develops a syntactic and semantic annotation tool to conduct research on Chinese verbs. This article selects a group of Chinese verbs of confession and compares their similarities and differences in syntax and semantics. This study helps to enrich the quantitative study of the syntactic and semantic features of Chinese verbs, which is conducive to lexicography, Chinese language teaching and natural language processing.

2 Related Research

Verbs of confession are a sub-category of verbs of communication in the English VerbNet. Levin [7] briefly introduced some of the classes of English verbs relating to communication and the transfer of ideas. Chinese also has rich communicative verbs, including speaking verbs, suggestion verbs, information verbs, verbs of confession, etc. Existing research focuses on the following aspects.

First, research on speaking verbs has been conducted. Wang [8] analyzed three speaking words shuō ‘speak’, jiǎng ‘speak’, and huà ‘speak’, combining diachronic evolution and synchronic distribution based on diachronic and dialect materials. Zhang and Cui [9] analyzed the grammaticalization of three speaking words yuē ‘speak’, yán ‘speak’, and yún ‘speak’ using Shàngshū ‘The Book of History’ and Shījīng ‘The Book of Songs’ as the sources. They found that speakers usually pay more attention to the content than to the speech act. They also pointed out that the resulting semantic and pragmatic characteristics caused the grammaticalization of these verbs. However, this study does not analyze the syntactic and semantic features of modern speaking words.

Second, studies on suggestion verbs have been conducted. Yu [10] explored the semantic and pragmatic features of the three high-frequency suggestion verbs jiànyì ‘suggest’, tíchàng ‘promote’ and tíyì ‘proposal’. However, it only lists some sentences containing these verbs, without the support of large-scale corpus data.

Third, research on information verbs has been conducted. Zhang [11] investigated a series of information verbs including fùzhǔ ‘exhort’, zhǔfù ‘exhort’, fēnfù ‘exhort’ and jiāodài ‘exhort’. This study mainly analyzes their diachronic evolution, but it does not analyze the syntactic and semantic features of these words.

In sum, most of the existing research focuses on the evolution of verbs of communication, though some research has discussed their syntactic and semantic features using a few sentences. Recently, the syntax and semantics of Chinese vocabulary have attracted much attention [12–16]. However, there is still a lack of research on Chinese verbs of confession. Therefore, this article selects a group of such verbs, including zhāorèn, zhāogòng, gòngrèn, tǎnbái, and chéngrèn, and explores their syntactic and semantic features based on large-scale corpora from the perspective of dependency grammar.

3 Research Methods

3.1 Research Procedures

First, this study selected Chinese verbs of confession by consulting several sources. It referred to the sub-category “confess” of communication verbs in VerbNet (including confess, admit, acknowledge, fess_up, proclaim and reveal) and checked their Chinese translations. It also referred to many Chinese dictionaries, such as Research Group of Modern Chinese Common Word List [17], A Thesaurus of Modern Chinese [18], and Synonym Word Forest [19]. Five commonly used verbs of confession were then selected: zhāorèn, zhāogòng, gòngrèn, tǎnbái and chéngrèn.

Second, this study extracted data from the BCC, CCL and Sogou Lab corpora. The specific steps are as follows.
(1) Extracted all sentences in these corpora containing these five words.
(2) Deleted complex sentences (sentences that consist of two or more clauses) in order to keep only single sentences, and then deleted single sentences containing special symbols that are not in line with the rules of the Chinese language. Single sentences were chosen for three reasons. First, this study aims to explore the syntactic and semantic features of verbs of confession. These can be reflected in single sentences, while the clauses in a complex sentence that do not contain the five target verbs have no direct relation to the use of these verbs. Second, complex sentences are more complicated than single sentences, and thus the accuracy of automatic annotation is much lower for them than for single sentences. Third, in a complex sentence, the meaning of the clause containing the target verb is often incomplete without reference to the other clauses in the sentence. Therefore, this study does not include such clauses in the analysis.
(3) All these single sentences were automatically processed by word segmentation and part-of-speech tagging using pkuseg. Based on the result, this study selected single sentences in which the five target words are tagged as verbs and are the smallest unit of word segmentation.
(4) After these steps, the number of single sentences to be extracted was determined by the number of senses of each word in The Contemporary Chinese Dictionary (7th Edition) [20]: 200 sentences were randomly extracted for each sense. Since chéngrèn has two senses, 400 sentences were randomly extracted for it; each of the other four verbs has one sense, so 200 sentences were randomly extracted for each of them. In total, there are 1200 sentences.
(5) Manually excluded sentences with grammatical errors or those that are too colloquial. After this step, a total of 1048 sentences remained for the five verbs, as shown in Table 1.

Table 1. Senses of verbs of confession

Verbs | Senses in The Contemporary Chinese Dictionary (7th Edition) [20] and The Contemporary Chinese Dictionary (Chinese-English Edition) [21] | Number of Sentences in Step (4) | Number of Sentences in Step (5)
zhāorèn | confess one’s crime(s); plead guilty | 200 | 165
zhāogòng | make a confession of one’s crime(s); confess | 200 | 164
gòngrèn | defendant admitted what he had done | 200 | 195
tǎnbái | own up to (one’s mistakes or crime) | 200 | 200
chéngrèn | ① agree; consent; admit; acknowledge; recognize; allow | 200 | 139
chéngrèn | ② recognize the legal status of a newly established country and government | 200 | 185
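Steps (3) and (4) of the procedure can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the tagged sentence is assumed to be a list of (word, tag) pairs, and the verb tag name "v" is an assumption about the tagger's label set rather than something stated in this paper.

```python
import random

# Number of dictionary senses per verb, as listed in Table 1.
SENSES = {"zhāorèn": 1, "zhāogòng": 1, "gòngrèn": 1, "tǎnbái": 1, "chéngrèn": 2}
PER_SENSE = 200  # sentences drawn per sense in step (4)


def is_verb_token(tagged_sentence, target, verb_tag="v"):
    """Step (3): keep a sentence only if `target` is a whole token tagged as a verb.

    `tagged_sentence` is a list of (word, tag) pairs; "v" is an assumed tag name.
    """
    return any(word == target and tag == verb_tag for word, tag in tagged_sentence)


def sample_sizes(senses=SENSES, per_sense=PER_SENSE):
    """Step (4): sample size per verb is 200 times its number of senses."""
    return {verb: n * per_sense for verb, n in senses.items()}


def draw_sample(candidates, size, seed=0):
    """Randomly draw `size` sentences without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(candidates, size)


sizes = sample_sizes()
```

The seed is only there to make the sketch repeatable; the paper does not say how the random extraction was performed.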

3.2 The Syntactic and Semantic Annotation Tool

LTP is a language processing technology platform, which was developed by Harbin Institute of Technology. Based on the API of LTP [22], this project has developed a syntactic and semantic annotation tool. The interface is shown in Fig. 1.

Fig. 1. The interface of the syntax-semantic annotation tool

When the annotator logs into this tool, sentences can be imported in batches through the button 数据文件路径 shùjù wénjiàn lùjìng ‘data file path’. Clicking the 开始 kāishǐ ‘start’ button then automatically performs word segmentation, syntactic analysis and semantic analysis. The sentence 我承认什么啦! wǒ chéngrèn shénme la! ‘What do I confess!’ is taken as an example to introduce the functions of this tool. First, the word segmentation results are displayed in the 处理的句子 chǔlǐ de jùzi ‘processed sentences’ column, as shown in Fig. 2. Second, the syntactic dependency graph is shown in Fig. 3. Third, the semantic dependency graph is shown in Fig. 4.

Fig. 2. The word segmentation results of 我承认什么啦! wǒ chéngrèn shénme la! ‘What do I confess!’

Fig. 3. The syntactic dependency graph of 我承认什么啦! wǒ chéngrèn shénme la! ‘What do I confess!’


Fig. 4. The semantic dependency graph of 我承认什么啦! wǒ chéngrèn shénme la! ‘What do I confess!’

Fourth, if the word segmentation is incorrect, it can be modified directly in the view shown in Fig. 2. If the syntactic graph or the semantic graph is inaccurate, the results can also be modified: left-click the parent node, then right-click the child node, and finally select the 删除 shānchú ‘delete’ button and choose the correct label, as shown in Fig. 5 and Fig. 6 respectively.

Fig. 5. Syntactic dependency modiﬁcation

Fig. 6. Semantic dependency modiﬁcation
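The correction step (select a parent node and a child node, delete the arc, choose the correct label) can be mimicked on a plain list of arcs. The sketch below is not the tool's implementation: the `Arc` class, the `relabel` function, the token segmentation and the initial labels are all assumptions made for illustration, and the deliberately wrong ATT label is invented so that there is something to correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    head: int       # index of the parent token, -1 for the root
    dependent: int  # index of the child token
    label: str      # dependency label such as "SBV" or "VOB"


# Assumed segmentation of the example sentence 我承认什么啦!
TOKENS = ["我", "承认", "什么", "啦", "!"]

# Hypothetical automatic output with one wrong label (ATT instead of VOB).
AUTO_ARCS = [Arc(-1, 1, "HED"), Arc(1, 0, "SBV"), Arc(1, 2, "ATT")]


def relabel(arcs, head, dependent, new_label):
    """Delete the arc between `head` and `dependent`, then add it back with `new_label`."""
    kept = [a for a in arcs if not (a.head == head and a.dependent == dependent)]
    if len(kept) == len(arcs):
        raise ValueError("no arc between the selected parent and child")
    kept.append(Arc(head, dependent, new_label))
    return kept


corrected = relabel(AUTO_ARCS, head=1, dependent=2, new_label="VOB")
```

Returning a new list rather than editing in place keeps the automatic output available for comparison with the manually checked result.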

Fifth, sentences can be deleted. At the prompt 是否确定删除句子shìfǒu quèdìng shānchú jùzi ‘Are you sure you want to delete the sentence’, the annotator confirms with “Y” and then enters the reason for deletion in the box, as shown in Fig. 7.


Fig. 7. The prompt box and input box to delete a sentence

Using this tool, this study manually checked and corrected incorrect syntactic and semantic dependencies for all the 1048 sentences.

4 Syntactic and Semantic Analysis of Verbs of Confession

4.1 Syntactic Analysis of Verbs of Confession

Among the 1048 single sentences, the verbs of confession as a whole appear in seven syntactic dependencies when they are dependents: HED (Head), SBV (Subject-Verb), ATT (Attribute), VOB (Verb-Object), COO (Coordination), POB (Prepositional-object), and FOB (Fronting-object)¹, as shown in Tables 2, 3, 4, 5 and 6. That is, they can function as Predicate, Subject, Attribute, Object and as a component of a coordination structure based on dependency grammar.

The distribution shows the following characteristics in common: (1) The verbs have diverse types of syntactic dependencies. There are seven types for zhāogòng, six types for zhāorèn and tǎnbái, and five types for gòngrèn and chéngrèn. (2) Acting as Head accounts for the highest ratio for all five verbs, with 65.5% for zhāorèn, 42.7% for zhāogòng, 78.5% for gòngrèn, 46.0% for tǎnbái, and 47.8% for chéngrèn. (3) Their second most frequent syntactic dependency is Verb-Object, with 20.6% for zhāorèn, 40.2% for zhāogòng, 8.7% for gòngrèn, 26.0% for tǎnbái, and 37.7% for chéngrèn. (4) They can all occur in four positions: Head, Object, Attribute and Subject.

There are also differences among these verbs according to the sampled sentences. (1) Only zhāogòng acts as Fronting-object. (2) Only chéngrèn does not act as Prepositional-object. (3) Although Head is the most frequent dependency for all five verbs, the percentage for gòngrèn (78.5%) is much higher than for the other four verbs. (4) Although Verb-Object is the second most frequent dependency for all five verbs, the percentage for zhāogòng (40.2%) is the highest, followed by chéngrèn (37.7%).

¹ Translations of these terms are available at https://ltp.ai/docs/appendix.html#id5. On April 26, 2022, the developer confirmed that some of these translations contain errors, which confuse dependency relations with the role of the dependents themselves. Since no new official translations have been provided yet, this paper still uses the current translations on the website, except for changing the verb “coordinate” into the noun “coordination”.

Table 2. The distribution of the syntactic dependencies of zhāorèn

Type | Number | Percentage | Examples
HED | 108 | 65.5% | Nàme nǐ quán zhāorèn le? / then_you_all_confess_ASP / So you all confessed?
VOB | 34 | 20.6% | Zhè shì Dí Qīng zìjǐ zhāorèn de / this_be_Qīng Dí_himself_confess_DE / This was Qing Di’s own admission
COO | 11 | 6.7% | Lǐ shì yě huà gòng zhāorèn / Lǐ surname_also_sign_written confession_confess / Li also signed a written confession
SBV | 5 | 3.0% | Tā de zhāorèn shǐ tā fālèng / she_DE_confess_make_him_be in a daze / Her confession made him in a daze
ATT | 5 | 3.0% | Xiān zhāorèn de rén huì shì shéi? / First_confess_DE_person_can_be_whom / Who will be the first to confess?
POB | 2 | 1.2% | Fù yú zhāorèn zhīzhōng / again_in_confess_in the process / In the process of confessing again
Total | 165 | 100% | /
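The percentages in Table 2 follow directly from the counts. As a minimal check (the variable and function names are hypothetical), each share is the count divided by the total of 165 sentences, rounded to one decimal place:

```python
# Dependency counts for zhāorèn, copied from Table 2.
ZHAOREN_COUNTS = {"HED": 108, "VOB": 34, "COO": 11, "SBV": 5, "ATT": 5, "POB": 2}


def percentages(counts):
    """Return each label's share of the total, in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


zhaoren_pct = percentages(ZHAOREN_COUNTS)
```

The same function can be applied to the counts of the other four verbs to cross-check Tables 3 to 6.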

Table 3. The distribution of the syntactic dependencies of zhāogòng

Type | Number | Percentage | Examples
HED | 70 | 42.7% | Tā de yī ge tónghuǒ zhāogòng le / He_DE_one_CL_partner_confess_ASP / One of his partners confessed
VOB | 66 | 40.2% | Zhè jiù shì jiējí dírén de zìwǒ zhāogòng / this_just_be_social class_enemy_DE_oneself_confess / This is the exact self-confession of the class enemy themselves
SBV | 13 | 7.9% | Xuānyán zhōng de mǒuxiē zhāogòng shì zhídé zhùyì de / declaration_among_DE_some_confess_be_worth_note_DE / Certain confessions in the manifesto are noteworthy
ATT | 7 | 4.3% | Yóu nǐ lái qǐcǎo zhāogòng wénjiàn / by_you_come_draft_confess_document / It is up to you to draft the confession document
COO | 6 | 3.7% | Wèn shénme tā jiùdéi zhāogòng shénme? / ask_what_he_have to_confess_what / Should she confess whatever question she is asked?
POB | 1 | 0.6% | Zuò zhàngfū de duì qīzi zhè zhǒng tǎnbái de zhāogòng háobùzàiyì / act as_husband_DE_towards_wife_this_kind_frank_DE_confess_completely not concerned / The husband doesn't care about the frank confession of his wife
FOB | 1 | 0.6% | Tā bù zhīdào zhèxiē zhāogòng yòu bèi Yuè Wú lù le xiàlái / he_not_know_these_confess_again_passive Bei_Yue Wu_record_ASP_down / He didn’t know that what he confessed were recorded again by Yue Wu
Total | 164 | 100% | /


Table 4. The distribution of the syntactic dependencies of gòngrèn

Type | Number | Percentage | Examples
HED | 153 | 78.5% | Zhāng Wénfú gòngrèn tōuqiè shǔshí / Wenfu Zhang_confess_theft_be true / Wenfu Zhang confessed that the theft was true
VOB | 17 | 8.7% | Zhè shì yīge hěn yǒuyìsi de gòngrèn / this_be_one_CL_very_interesting_DE_confess / This is a very interesting confession
ATT | 10 | 5.1% | Tuīfān zìjǐ yǐqián suǒ gòngrèn de huà / overturn_oneself_before_Auxiliary_confess_DE_words / Overturn what you have confessed
SBV | 9 | 4.6% | Lǐ Shìhuáng de gòngrèn wèi bàn'àn rényuán jiēkāi le mídǐ / Shihuang Li_DE_confess_for_take charge of a case_person_open_ASP_mystery / Shihuang Li’s confession opened the mystery for the investigators
POB | 6 | 3.1% | Jù Cáo Jìnxǐ gòngrèn xì yī zì chū zhǔjiàn / According to_Jinxi Cao_confess_be_he_oneself_provide_opinion / According to what confessed by Jinxi Cao, it was his own opinion
Total | 195 | 100% | /

Table 5. The distribution of the syntactic dependencies of tǎnbái

Type | Number | Percentage | Examples
HED | 92 | 46.0% | Nǐ gēn bómǔ tǎnbái le? / you_to_aunt_confess_ASP / Have you confessed to your aunt?
VOB | 52 | 26.0% | Tā zuìhòu yě xuǎnzé le tǎnbái / he_finally_also_chose_ASP_confess / He finally chose to confess
ATT | 26 | 13.0% | Zìshǒu tǎnbái de shíjiān wéi liǎng gè bàn yuè shì shìyí de / surrender_confess_DE_time_be_two_CL_half_month_be_appropriate_DE / Two and a half months is appropriate for surrendering and confessing
COO | 15 | 7.5% | Nǐ zài bu tǎnbái jiù bù kě'ài le / you_still_not_confess_then_not_cute_ASP / You won’t be cute if you don’t confess
SBV | 13 | 6.5% | Bìngfēi suǒyǒu de yǒuzuì tǎnbái dōu shì xūjiǎ de / not_all_DE_guilty_confess_all_be_false_DE / Not all guilty confessions are false
POB | 2 | 1.0% | Tā yóuyú tǎnbái le zìjǐ de zuìxíng ér bèi shìfàng / he_because_confess_ASP_oneself_DE_crime_then_passive Bei_release / He was released for confessing his crimes
Total | 200 | 100% | /

Table 6. The distribution of the syntactic dependencies of chéngrèn

Type | Number | Percentage | Examples
HED | 155 | 47.8% | Nǐ bìxū chéngrèn zhè diǎn / You_have to_confess_this_point / You have to confess this point
VOB | 122 | 37.7% | Wǒmen xīwàng jiāfāng zǎorì chéngrèn zhōngguó shìchǎng jīngjì dìwèi / we_hope_the Canadian side_an early date_confess_China_market_economy_status / We hope that the Canadian side will recognize China’s market economy status at an early date
ATT | 28 | 8.6% | Tā zàidù chóngshēn le měiguó zhèngfǔ zhǐ chéngrèn yī gè zhōngguó de lìchǎng / he_again_reiterate_ASP_US_government_only_confess_one_CL_China_DE_position / He reiterated the US government's position of recognizing only one China
COO | 14 | 4.3% | Gǎn zuò yòu bù gǎn chéngrèn / dare_do_but_not_dare_confess / Dare to do but dare not confess
SBV | 5 | 1.5% | Chéngrèn yī ge yǒngyǒu liùyì rénkǒu de guójiā shì yīzhǒng gōngzhèng de hé guāngmíngzhèngdà de zérèn / confess_one_CL_possess_600_million_population_DE_country_be_one kind_just_DE_and_aboveboard_DE_responsibility / Recognizing a country of 600 million people is a just and aboveboard responsibility
Total | 324 | 100% | /

4.2 Semantic Roles of the Words Collocated with Verbs of Confession

This section analyzes the semantic roles of the words collocated with verbs of confession when these verbs are used as Head. The results are shown in Table 7. The overall distribution shows the following characteristics: (1) the five verbs can all collocate with six semantic roles in the sampled sentences: Agent, Content, Time, Manner, Dative and Location; (2) the top three most frequent semantic roles are Agent, Content and Time, and Agent is the most frequently collocated semantic role: 40.7% for zhāorèn, 41.7% for zhāogòng, 34.5% for gòngrèn, 34.4% for tǎnbái, and 39.0% for chéngrèn; (3) Experiencer, Patient, and Location rarely appear.

The individual verbs show many differences: (1) the top three semantic roles for zhāorèn are Agent, Time and Content; for zhāogòng they are Agent, Manner and Time; for gòngrèn, tǎnbái and chéngrèn they are Agent, Content and Time; (2) gòngrèn does not appear with Patient and Experiencer, while the other four verbs collocate with all eight semantic roles; (3) Manner is more frequently collocated with zhāorèn, zhāogòng and tǎnbái than with gòngrèn and chéngrèn; (4) Dative is more frequently collocated with gòngrèn and tǎnbái than with zhāorèn, zhāogòng and chéngrèn.

Table 7. The distribution of semantic roles collocated with verbs of confession (each verb cell gives Number and Percentage)

Semantic role | Role type | zhāorèn | zhāogòng | gòngrèn | tǎnbái | chéngrèn | Total number
Agent | Agent-like role | 61 (40.7%) | 45 (41.7%) | 68 (34.5%) | 54 (34.4%) | 92 (39.0%) | 320
Content | Patient-like role | 25 (16.7%) | 14 (13.0%) | 52 (26.4%) | 34 (21.7%) | 80 (33.9%) | 205
Time | Situational role | 27 (18.0%) | 17 (15.7%) | 37 (18.8%) | 24 (15.3%) | 38 (16.1%) | 143
Manner | Situational role | 19 (12.7%) | 18 (16.7%) | 11 (5.6%) | 18 (11.5%) | 13 (5.5%) | 79
Dative | Patient-like role | 4 (2.7%) | 5 (4.6%) | 20 (10.2%) | 21 (13.4%) | 6 (2.5%) | 56
Patient | Patient-like role | 9 (6.0%) | 6 (5.6%) | 0 (0.0%) | 2 (1.3%) | 1 (0.4%) | 18
Location | Situational role | 2 (1.3%) | 2 (1.9%) | 9 (4.6%) | 2 (1.3%) | 1 (0.4%) | 16
Experiencer | Agent-like role | 3 (2.0%) | 1 (0.9%) | 0 (0.0%) | 2 (1.3%) | 5 (2.1%) | 11
Total | | 150 (100.0%) | 108 (100.0%) | 197 (100.0%) | 157 (100.0%) | 236 (100.0%) | 848
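The per-verb rankings discussed in this section can be recomputed from the counts in Table 7. The snippet below is a small sketch with hypothetical names (`ROLE_COUNTS`, `top_roles`); the numbers are the per-verb counts from Table 7, with the Location and Experiencer rows assigned as the running text implies (gòngrèn has no Patient or Experiencer).

```python
ROLES = ["Agent", "Content", "Time", "Manner", "Dative", "Patient", "Location", "Experiencer"]

# Counts per verb in the order of ROLES, taken from Table 7.
ROLE_COUNTS = {
    "zhāorèn": [61, 25, 27, 19, 4, 9, 2, 3],
    "zhāogòng": [45, 14, 17, 18, 5, 6, 2, 1],
    "gòngrèn": [68, 52, 37, 11, 20, 0, 9, 0],
    "tǎnbái": [54, 34, 24, 18, 21, 2, 2, 2],
    "chéngrèn": [92, 80, 38, 13, 6, 1, 1, 5],
}


def top_roles(counts, k=3):
    """Return the k most frequent roles, most frequent first."""
    ranked = sorted(zip(ROLES, counts), key=lambda pair: pair[1], reverse=True)
    return [role for role, _ in ranked[:k]]


TOP3 = {verb: top_roles(counts) for verb, counts in ROLE_COUNTS.items()}
TOTALS = {verb: sum(counts) for verb, counts in ROLE_COUNTS.items()}
```

Ranking by raw count within a verb gives the same order as ranking by percentage, since each verb's percentages share one denominator.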

4.3 Comparison with Chinese Resources

This section compares our research results with Chinese resources. First, the majority of Chinese resources do not describe syntactic functions or semantic roles, nor do they provide frequency information. For example, The Contemporary Chinese Dictionary (7th Edition) [20] only contains information on pronunciation, part of speech, senses and example sentences. Second, although a few resources provide syntactic functions and semantic roles, their syntactic and semantic types are not as rich as those in this study, and they lack an overall comparison of synonymous verbs. For example, The Syntactic-Semantic Knowledge-Base of Chinese Verbs [23] (abbreviated as the Knowledge-Base below) of Peking University mainly describes the syntactic functions and semantic roles of common verbs in modern Chinese. The comparisons are summarized in Table 8 and Table 9. Regarding the syntactic functions, the Knowledge-Base describes two functions that these verbs can have: taking objects and functioning as predicates. Regarding the semantic roles, gòngrèn collocates with Agent, Patient and Relative², tǎnbái collocates with Agent, Dative and Relative, and chéngrèn collocates with Agent and Relative.

The Knowledge-Base and this study differ in the following ways. (1) Although the Knowledge-Base describes the syntactic functions and semantic roles, it has fewer types than this study. (2) The Knowledge-Base lacks quantitative data, whereas our study lists the number and frequency of each syntactic type (Tables 2, 3, 4, 5 and 6) and semantic role (Table 7), which provides quantitative support. (3) The Knowledge-Base does not provide an explicit comparison of these verbs, while this study shows their similarities and differences directly. For example, looking at the semantic roles, we can see that zhāorèn and zhāogòng have the same roles as tǎnbái and chéngrèn, while gòngrèn does not collocate with Patient or Experiencer in the sampled sentences. This indicates that semantically zhāorèn and zhāogòng are more similar to tǎnbái and chéngrèn than to gòngrèn.

Table 8. Comparison of the syntactic types in The Syntactic-Semantic Knowledge-Base of Chinese Verbs and this study

Verbs | The Syntactic-Semantic Knowledge-Base | This study
zhāorèn | The search results navigate to gòngrèn, which marks these three verbs as synonyms | ATT, COO, HED, POB, SBV, VOB
zhāogòng | The search results navigate to gòngrèn, which marks these three verbs as synonyms | ATT, COO, FOB, HED, POB, SBV, VOB
gòngrèn | take objects; function as predicates | ATT, HED, POB, SBV, VOB
tǎnbái | take objects; function as predicates | ATT, COO, HED, POB, SBV, VOB
chéngrèn | take objects; function as predicates | ATT, COO, HED, SBV, VOB

Table 9. Comparison of the collocated semantic roles in The Syntactic-Semantic Knowledge-Base of Chinese Verbs and this study

Verbs | The Syntactic-Semantic Knowledge-Base | This study
zhāorèn | The search results navigate to gòngrèn, which marks these three verbs as synonyms | Agent, Content, Time, Manner, Dative, Patient, Location, Experiencer
zhāogòng | The search results navigate to gòngrèn, which marks these three verbs as synonyms | Agent, Content, Time, Manner, Dative, Patient, Location, Experiencer
gòngrèn | Agent, Patient and Relative | Agent, Content, Time, Manner, Dative, Location
tǎnbái | Agent, Dative and Relative | Agent, Content, Time, Manner, Dative, Patient, Location, Experiencer
chéngrèn | Agent and Relative | Agent, Content, Time, Manner, Dative, Patient, Location, Experiencer

² Relative in the Knowledge-Base is the same as Content in this study.
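The comparisons drawn from Tables 8 and 9 can be checked mechanically. The following Python sketch is an illustration only and is not part of the study's annotation tool. It transcribes the "This study" columns as sets, assuming that the flattened rows list the verbs in the order zhāorèn, zhāogòng, gòngrèn, tǎnbái, chéngrèn. The names SYNTAX, ROLES, shared and missing are hypothetical.

```python
# Dependency relations per verb, transcribed from the "This study" column of
# Table 8. Assumption: the rows follow the verb order given in the table.
SYNTAX = {
    "zhāorèn": {"ATT", "COO", "HED", "POB", "SBV", "VOB"},
    "zhāogòng": {"ATT", "COO", "FOB", "HED", "POB", "SBV", "VOB"},
    "gòngrèn": {"ATT", "HED", "POB", "SBV", "VOB"},
    "tǎnbái": {"ATT", "COO", "HED", "POB", "SBV", "VOB"},
    "chéngrèn": {"ATT", "COO", "HED", "SBV", "VOB"},
}

# Semantic roles per verb, transcribed from the "This study" column of Table 9
# under the same row-order assumption.
FULL = {"Agent", "Content", "Time", "Manner", "Dative",
        "Patient", "Location", "Experiencer"}
ROLES = {
    "zhāorèn": set(FULL),
    "zhāogòng": set(FULL),
    "gòngrèn": {"Agent", "Content", "Time", "Manner", "Dative", "Location"},
    "tǎnbái": set(FULL),
    "chéngrèn": set(FULL),
}


def shared(table):
    """Return the labels that every verb in the table has."""
    return set.intersection(*table.values())


def missing(table, verb):
    """Return the labels found for some verb but not for the given verb."""
    return set.union(*table.values()) - table[verb]


if __name__ == "__main__":
    print(sorted(shared(SYNTAX)))
    print(sorted(missing(ROLES, "gòngrèn")))
```

Under that row assignment, the shared dependency relations are ATT, HED, SBV and VOB, and the roles absent for gòngrèn are Experiencer and Patient, which agrees with the observations in the text.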

Investigating Verbs of Confession Through a Syntactic and Semantic Annotation Tool


Mandarin VerbNet is a Chinese semantic knowledge base constructed with a frame-based constructional approach [24]. It describes the characteristics of verbs of confession under the third-level frame category “Communication-statement-admit”. However, it contains only the verb tǎnbái and not the other four verbs. In addition, Table 10 lists five frame elements of tǎnbái: Message, Speaker, Message_Description, Addressee and Topic. Our study finds eight semantic roles, with their frequencies shown in Table 7. Certain frame elements may be annotated inaccurately in Mandarin VerbNet. For example, in (a), gǎnqíng de biǎodá ‘the expression of feelings’ is marked as Speaker. According to the definition, however, Speaker is “The person who confesses a Message or Message_Description to the Addressee.” The phrase does not denote a speaker, and in our study it would be marked as Experiencer.

(a) Gǎnqíng de biǎodá bǐjiào zhíjiē, tǎnbái ér qiángliè, yǒurén shuō gèngjiā biāohàn le.
feeling_DE_expression_more_direct, frank_and_strong, some people_say_more_aggressive_ASP
‘The expression of feelings is more direct, frank and strong, and some people say that it is even more aggressive.’

Table 10. The analysis of tǎnbái in Mandarin VerbNet

Verbs | Path | Number of Syntactic Patterns | Frame Elements
tǎnbái | Communication → Statement → Admit → tǎnbái | 47 | Message (289), Speaker (286), Message_Description (79), Addressee (47), Topic (14)

5 Conclusion

With the rapid development of big data and deep learning, Chinese information processing systems need increasingly comprehensive quantitative research results to carry out large-scale automatic syntactic and semantic analysis of Chinese. This study selected sentences containing five verbs of confession from large-scale corpora and annotated them with a self-developed syntactic and semantic annotation tool. It has the following findings. (1) In terms of syntax, all five verbs can appear in four syntactic dependencies: HED (Head), VOB (Verb-Object), ATT (Attribute) and SBV (Subject-Verb). That is, in terms of dependency grammar they can function as predicates, objects, attributes and subjects. Among these, the most frequent use is as Head. (2) In terms of semantics, the three semantic roles that collocate most often with these verbs are Agent, Content and Time; the frequencies of Experiencer, Patient and Location are all very low. This study further compared the results with existing Chinese resources. The comparison shows that this study offers a more comprehensive analysis of the syntactic functions and semantic roles of these verbs by directly showing their similarities and differences with quantitative data.


With the development of artificial intelligence, linguistics has become a new field where scholars from different fields can collaborate and make progress. The annotation of semantic resources has therefore become a difficulty to be overcome [24]. This article explores the syntactic and semantic features of Chinese verbs of confession based on dependency grammar, which provides an important window for a deeper understanding of this type of verb. It also enriches the quantitative research of vocabulary and provides a new perspective for the study of the Chinese lexicon. This study is useful for lexicography, Chinese language teaching and natural language processing.

Acknowledgement. The study is supported by the University of Macau (MYRG2019-00013FAH). This work was performed in part at the high performance computing cluster (HPCC), which is supported by the information and communication technology office (ICTO) of the University of Macau.

References

1. Liu, H.: Dependency Grammar from Theory to Practice (Yīcún yǔfǎ de lǐlùn yǔ shíjiàn). Science Press (Kēxué chūbǎnshè), Beijing (2009)
2. Feng, Z.: The Subordination Grammar of Lucien Tesnière (Tèsīníyéěr de cóngshǔ guānxì yǔfǎ). Contemp. Linguist. (Guówài yǔyán xué) 63–65+57 (1983)
3. Zhou, M., Huang, C.: Approach to the Chinese dependency formalism for the tagging of corpus (Miànxiàng yǔliàokù biāozhù de hànyǔ yīcún tǐxì de tàntǎo). J. Chin. Inf. Process. (Zhōngwén xìnxī xuébào) 8, 35–52 (1994)
4. Liu, B., Niu, Y., Liu, H.: A comparative study on style-related differences in syntactic functions of part of speech (Hànyǔ cílèi jùfǎ gōngnéng de yǔtǐ chāyì yánjiū). Lang. Teach. Linguist. Stud. (Yǔyán jiàoxué yǔ yánjiū) 97–104 (2013)
5. Gao, S.: A quantitative study on syntactic functions of nouns in Mandarin Chinese: based on Chinese dependency treebank (Jīyú yīcún shùkù de xiàndài hànyǔ míngcí yǔfǎ gōngnéng de jìliàng yánjiū). TCSOL Stud. (Huáwén jiàoxué yǔ yánjiū) 54–60 (2010)
6. Gao, S., Yan, W., Liu, H.: A quantitative study on syntactic functions of Chinese verbs based on dependency treebank (Jīyú shùkù de xiàndài hànyǔ dòngcí jùfǎ gōngnéng de jìliàng yánjiū). Chin. Lang. Learn. (Hànyǔ xuéxí) 105–112 (2010)
7. Levin, B.: English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago (1993)
8. Wang, W.: Diachronic evolution and synchronic distribution of “shuo-type verbs” in Chinese (Hànyǔ “shuō lèi cí” de lìshí yǎnbiàn yǔ gòngshí fēnbù). Stud. Chin. Lang. (Zhōngguó yǔwén) 329–342+384 (2003)
9. Zhang, H., Cui, Y.: The grammaticalization of verbs commonly used in early ancient Chinese (Shànggǔ hànyǔ zǎoqí chángyòng yánshuō dòngcí de yǔfǎhuà). Linguist. Res. (Yǔwén yánjiū) 35–44 (2020)
10. Yu, C.: An investigation on the semantic and pragmatic features of suggestion verbs (“Jiànyì” lèi dòngcí de yǔyì yǔyòng tèzhēng kǎochá). J. Heilongjiang Vocat. Instit. Ecol. Eng. (Hēilóngjiāng shēngtài gōngchéng zhíyè xuéyuàn xuébào) 150–152 (2015)
11. Zhang, Y.: From physical acts to speech acts: the origin of the verbs with the meaning of “exhort” (Cóng wùlǐ xíngwéi dào yányǔ xíngwéi: Zhǔfu lèi dòngcí de chǎnshēng). Stud. Chin. Lang. (Zhōngguó yǔwén) 3–16+95 (2012)


12. Wang, S.: Chinese Multiword Expressions. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-8510-0
13. Wang, S., Wu, L., Gong, Q.: The collocations of Chinese tactile adjectives. In: Liu, M., Kit, C., Su, Qi. (eds.) Chinese Lexical Semantics. LNCS (LNAI), vol. 12278, pp. 711–733. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81197-6_59
14. Wang, S., Yin, J.: A comparative study of the collocations in legislative Chinese and general Chinese. In: Hong, J.-F., Zhang, Y., Liu, P. (eds.) CLSW 2019. LNCS (LNAI), vol. 11831, pp. 710–724. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38189-9_72
15. Wang, S., Tang, L.: Comparison of changes between mainland China and Taiwan. In: Liu, M., Kit, C., Su, Qi. (eds.) CLSW 2020. LNCS (LNAI), vol. 12278, pp. 686–710. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81197-6_58
16. Wang, S., Luo, H.: Exploring the meanings and grammatical functions of idioms in teaching Chinese as a second language. Int. J. Appl. Linguist. 31, 283–300 (2021)
17. Research Group of Modern Chinese Common Word List: Modern Chinese Common Word List (Xiàndài hànyǔ chángyòngcí biǎo). The Commercial Press (Shāngwù yìnshūguǎn), Beijing (2008)
18. Su, X.: A Thesaurus of Modern Chinese (Xiàndài hànyǔ fènlèi cídiǎn). The Commercial Press (Shāngwù yìnshūguǎn), Beijing (2013)
19. Mei, J., Zhu, Y., Gao, Y., Yin, H.: Synonym Word Forest (tóngyìcí cílín). Shanghai Lexicographical Publishing House (Shànghǎi císhū chūbǎnshè), Shanghai (1996)
20. Dictionary Editing Room of Institute of Linguistics of China Academy of Social Sciences: The Contemporary Chinese Dictionary, 7th edn (Xiàndài hànyǔ cídiǎn). The Commercial Press (Shāngwù yìnshūguǎn), Beijing (2016)
21. Dictionary Editing Room: The Contemporary Chinese Dictionary (Chinese-English Edition) [xiàndài hànyǔ cídiǎn (Hàn-Yīng shuāngyǔ)]. Foreign Language Teaching and Research Press (Wàiyǔ jiàoxué yǔ yánjiū chūbǎnshè), Beijing (2002)
22. Liu, T., Che, W., Li, Z.: Language Technology Platform (Yǔyán jìshù píngtái). J. Chin. Inf. Process. (Zhōngwén xìnxī xuébào) 25, 53–63 (2011)
23. Yuan, Y.: An Information Dictionary of Syntactic and Semantic Functions of Content Words in Modern Chinese (Xiàndài hànyǔ shící jùfǎ yǔyì gōngnéng xìnxī cídiǎn). Peking University, Beijing (2018)
24. Liu, M., Wan, M.: Chinese verbs and their categorization: construction and application of the semantic network of Chinese verbs (Zhōngwén dòngcí jí fēnlèi yánjiū: Zhōngwén dòngcí cíhuì yǔyì wǎng de gòujiàn jí yìngyòng). Lexicographical Studies (Císhū yánjiū) 42–60+110 (2019)

The Pragmatic Distribution and Semantic Explanation of Evidential Prepositions

Enxu Wang and Zheng Zhang

College of Chinese Language and Literature, University of Jinan, Jinan, China

Abstract. Compared with other prepositions, evidential prepositions such as “jü (据)”, “an (按)” and “ping (凭)” have received relatively little study. Many of their meanings and usages remain unclear, which leads to the misunderstanding that there are few differences among them and that they can be used to explain one another. At the beginning of this paper, we examine the definition given in 800 words in Modern Chinese, “据 (jü): 依据 (yijü)”, against the Corpus Online, and find that there are obvious differences between “据 (jü)” and “依据 (yijü)”, so it is not suitable to use the latter to explain the former. We then investigate more evidential prepositions and find that there are also significant differences among them. Based on this research, we divide the evidential prepositions into three categories, namely epistemic evidence, deontic evidence and dynamic evidence, and explain the commonness and individuality of evidential prepositions by means of conceptual space and semantic map.

Keywords: Evidential prepositions · Distribution · Semantic explanation

1 Introduction

Compared with other prepositions, evidential prepositions have received relatively little study. Many of their meanings and usages are not clear, which leads to the misunderstanding that there are few differences among them and that they can be defined in terms of one another. Take the preposition “jü (据)” as an example: 800 words in Modern Chinese [1] holds that it has the same meaning as “yijü (依据)” and can be explained by “yijü (依据)”, as shown in example (1).

(1) 【据】依据: ～理力争 | ～实报告 | ～同名小说改编 [1:323]
([jü (据)] yijü (依据): to defend some interests according to some reason | to report according to the actual situation | to adapt according to novels of the same name)

Is this explanation accurate? Judging only from the examples in (1), the explanation seems appropriate, since “jü (据)” can be replaced by “yijü (依据)” there. If we investigate more examples from the Corpus Online¹, however, we will find the

¹ The corpus for this paper is sourced from the Corpus Online (http://corpus.zhonghuayuwen.org/) unless otherwise stated.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 212–222, 2022. https://doi.org/10.1007/978-3-031-06703-7_16

The Pragmatic Distribution and Semantic Explanation


deficiency of this explanation. There are a large number of examples where “jü (据)” cannot be replaced by “yijü (依据)”, as shown in example (2).

(2) a. 据(*依据)光明日报载, 清华大学已准许一部分优秀学生考读双学士。(According to the Guangming Daily, Tsinghua University has allowed some of its best students to take a double bachelor’s degree.)
b. 据(*依据)法国报刊报道, 法国原子能总署已决定把威斯汀豪斯的部分股份买过来。(According to the French press, the French Atomic Energy Agency has decided to buy some shares of Westinghouse.)

Conversely, there are also a great number of cases where “yijü (依据)” cannot be replaced by “jü (据)” (example 3). The reason why “jü (据)” and “yijü (依据)” are not interchangeable in these examples is that their pragmatic functions are different. The sentences in example (2) are quotative: “jü (据)” is used not to introduce new information but to quote existing old information. The sentences in example (3), by contrast, are declarative, and the function of “yijü (依据)” there is to introduce new information. Therefore, the meanings and functions of “jü (据)” and “yijü (依据)” are not equivalent, and it is inaccurate to use the latter to explain the former.

(3) a. 很多人依据(*据)这个记载, 认为大熊猫就是貔貅。 (百度资讯) (Many people believe, on the basis of this account, that the giant panda is a mythical wild animal.)
b. 依据(*据)墓志记载解读唐朝安乐公主的野心与狠毒。 (百度资讯) (Interpreting the ambition and cruelty of Princess Anle of the Tang Dynasty based on the tomb records.)

What is the meaning of “jü (据)”? What relationships exist between evidential prepositions? And how should the meanings of evidential prepositions be analyzed and explained? This paper will answer these questions. The structure of the paper is as follows. First, we establish a method for the semantic analysis of prepositions. Then, we analyze the semantic structures of “jü (据)” and “yijü (依据)” in order to distinguish them. Next, we expand the scope of the research and distinguish more evidential prepositions. On this basis, we reclassify evidential prepositions and draw their semantic map.

2 Methods of Analyzing the Semantic Structure of Prepositions

The meaning of a function word is a semantic structure. Analyzing the semantic structure of a function word means finding out the semantic components it contains and the relationships that exist between them [2]. Prepositions are mainly derived from verbs and are closely related to verbs [3, 4]. Functionally, prepositions introduce non-essential arguments for predicates or clauses [5], such as time arguments, starting point arguments, tool arguments, source arguments, purpose arguments and so on [6, 7]. Thus,


E. Wang and Z. Zhang

analyzing the semantic structure of prepositions can adopt the same approach as that of verbs, that is, starting from the argument structure. Taking “jü (据)” as an example, we can obtain the following information by analyzing its semantic structure from the perspective of argument structure:

(4) Semantic structure of “jü (据)”
Semantic type: evidential type.
Semantic components: 2 components. One is a non-essential argument, and the other is a predicate or clause.
Semantic relations: 3 relations. If the non-essential argument is realized by a predicate structure, such as “Guangming Daily reports”, the relationship between the non-essential argument and its following clause is “source - evidence”. If the non-essential argument is an opinion, there is a “standpoint - opinion” relationship between the non-essential argument and its following clause. If the non-essential argument is some other component, such as “shishi (事实, fact)/daoli (道理, theory)/xingshi (形势, situation)”, there is a “reliance/base - action” relationship between the argument and its following predicate or clause.

How do we know that this information is contained in the semantic structure of “jü (据)”? The answer lies in the pragmatic context or syntactic distribution [8]. The meaning of a word depends on the context in which it appears, and words that appear in the same contexts often have the same meaning [9]. There is no better way to know the meaning of a word than to look at the contexts in which it appears and the words it collocates with. As the famous saying goes: You shall know a word by the company it keeps [10: 179]. In the next section, we will consider the contexts in which “jü (据)” and “yijü (依据)” appear and the words that accompany them.
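The semantic structure in (4) can be written down as a small data structure. The following Python sketch is only one possible encoding and is not the authors' formalism. The class and field names (ArgumentKind, SemanticStructure, relation_for) are hypothetical, and deciding which kind a given argument belongs to is left to the annotator.

```python
from dataclasses import dataclass
from enum import Enum


class ArgumentKind(Enum):
    """Kinds of non-essential argument distinguished in (4)."""
    PREDICATE_STRUCTURE = "predicate structure, e.g. 'Guangming Daily reports'"
    OPINION = "opinion"
    OTHER = "other component, e.g. fact / theory / situation"


@dataclass
class SemanticStructure:
    word: str
    semantic_type: str
    components: tuple   # the semantic components of the preposition
    relations: dict     # maps an ArgumentKind to a relation label


# Hypothetical encoding of the entry for "jü (据)" given in (4).
JU = SemanticStructure(
    word="jü (据)",
    semantic_type="evidential",
    components=("non-essential argument", "predicate or clause"),
    relations={
        ArgumentKind.PREDICATE_STRUCTURE: "source - evidence",
        ArgumentKind.OPINION: "standpoint - opinion",
        ArgumentKind.OTHER: "reliance/base - action",
    },
)


def relation_for(structure, kind):
    """Look up the relation between the argument and the following clause."""
    return structure.relations[kind]


if __name__ == "__main__":
    print(relation_for(JU, ArgumentKind.OPINION))
```

Keeping the relations in a lookup table makes it easy to add entries for other evidential prepositions with the same two components but different relation inventories.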

3 Semantic Structure of “jü (据)” and “yijü (依据)”

3.1 Semantic Structure of “jü (据)”

Based on corpus statistics, the preposition “jü (据)” appears in three contexts.

Context 1: jü (据) + non-essential argument (source of evidence) + clause (quotative evidence)

Characteristics of this context: 1) It is the typical context of “jü (据)” sentences, with more than 95% (1360 sentences) of “jü (据)” appearing in this context. 2) Most of the non-essential arguments are of the reporting type, such as “…… reports/records/rumors/memories/observations/estimations”, whose function is to provide the information source for the following clause. 3) The following clause does not provide new information but quotes existing old information. Within the discourse context, e.g. the underlined information in example (5), the following clause serves to provide evidence for the contextual discussion. There is a “source - evidence” relationship between the non-essential argument and its following clause.


(5) a. 山西这6次移民, 都与洪洞大槐树发生过关系。据记载, 元末明初从山西移民, 不管老百姓家在何州何府, 都要先集中到洪洞县去。(The six migrations from Shanxi were all related to the Hongdong Dahuaishu. According to records, migrants from Shanxi in the late Yuan and early Ming dynasties, no matter where they lived, had to gather in Hongdong County first.)
b. 徐悲鸿的画马, 成为一绝, 驰名中外, 据徐老的夫人廖静文说, 悲鸿幼年第一次画的并不是马, 而是虎。(Xu Beihong’s paintings of horses are famous throughout the world. According to Xu’s wife, Liao Jingwen, the first thing Xu Beihong painted as a child was not a horse but a tiger.)

Context 2: jü (据) + non-essential argument (standpoint) + clause (opinion)

Characteristics of this context: 1) It is an atypical context of “jü (据)” sentences, with only 2.6% (37 sentences) of “jü (据)” appearing in this context. 2) The non-essential argument is the standpoint for the information in the following clause, and the new information in the following clause is inferred from that standpoint. Hence, there is a “standpoint - opinion” relationship between them, as in example (6).

(6) a. 至于识字运动, 据我们的看法, 实在也不是什么艰难的问题。(As for the literacy movement, in our opinion, it is really not a difficult problem either.)
b. 据殷墟出土的甲骨文来推测, 汉字已经有四千年以上的历史了。(Judging from the oracle bone inscriptions unearthed at Yinxü, Chinese characters have a history of more than 4,000 years.)

Context 3: jü (据) + non-essential argument (reliance/base) + clause (action)

Characteristics of this context: 1) It is an atypical context of “jü (据)” sentences, with only 2.4% (34 sentences) of “jü (据)” appearing in this context. 2) The non-essential argument serves as the force relied on or the basis of the action. This is a residual usage of “jü (据)” from ancient Chinese and is rarely found in modern Chinese. 3) The action of the following predicate is carried out on the basis of the non-essential argument, so there is a “reliance/base - action” relationship between the non-essential argument and the predicate, as in example (7).

(7) a. 王国维又据之作《流沙坠简》。(Wang Guowei compiled the Falling bamboo slips in quicksand on this basis.)
b. 陶安便建议先取金陵, 据形势以临四方。(Tao An suggested that Jinling should be captured first, so that its strategic position could be used to confront the whole country.)
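The percentages reported for the three contexts can be reproduced from the sentence counts. The Python sketch below is a worked check rather than the authors' procedure; it assumes that the three contexts together cover every “jü (据)” sentence in the sample, and the variable names are illustrative.

```python
# Sentence counts for the three contexts of "jü (据)" as reported in Sect. 3.1.
counts = {"context 1": 1360, "context 2": 37, "context 3": 34}

# Assumption: the three contexts are exhaustive, so their sum is the sample size.
total = sum(counts.values())

# Share of each context, in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

if __name__ == "__main__":
    print(total)
    for name, share in shares.items():
        print(f"{name}: {share}%")
```

Under this assumption the sample contains 1431 sentences, and the shares come out at 95.0%, 2.6% and 2.4%, in line with the figures given above. The same total, 1431, also appears among the corpus frequencies listed in Table 2.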

3.2 Semantic Structure of “yijü (依据)”

Based on corpus statistics, the preposition “yijü (依据)” appears in four contexts.

Context 1: yijü (依据) + non-essential argument (reliance/base) + clause (action)

Characteristics of this context: 1) It is the typical context of “yijü (依据)” sentences, with more than half of “yijü (依据)” appearing in this context. 2) As in context 3 of “jü (据)” sentences, the non-essential argument of “yijü (依据)” serves as the force relied on or the basis of the action. The difference between the two usages is that the former is a residual usage from ancient Chinese and is relatively limited in modern Chinese, whereas the latter is a new usage that occurs frequently in modern Chinese. 3) The following predicate is also similar to that of “jü (据)” sentences in context 3: its action is carried out on the basis of the non-essential argument, and the relationship between them is “reliance/base - action”. For example:

(8) a. 单兵动作也可以依据千变万化的战场实际重新排列组合。(Individual actions can also be rearranged and recombined to match the ever-changing battlefield reality.)
b. 我们依据村外菜园土墙边的第一道工事, 杀伤猛扑而来的敌人。(Relying on the first line of fortifications along the earthen wall of the vegetable garden outside the village, we killed and wounded the attacking enemy.)

Context 2: yijü (依据) + non-essential argument (strict standard followed) + clause (action)

Characteristics of this context: 1) It is a relatively typical context of “yijü (依据)” sentences, with 22.8% of “yijü (依据)” appearing in this context, where “yijü (依据)” can usually be replaced by “zuncong (遵从, comply with)”. 2) The non-essential argument represents a strict standard of action to be followed, such as “laws/rules/requirements/regulations/procedures”, which has broad binding force and compulsory effect on members of society. Individuals cannot change such standards but must comply with them [11]. 3) The following predicate carries out the action in compliance with the strict standard. In example (9a), the bankruptcy of an enterprise must comply with certain economic laws.

(9) a. 企业破产……, 必须依据一定的经济法律规定, 通过一定的法律程序。 (The bankruptcy of an enterprise …… must comply with certain economic laws and follow certain legal procedures.)
b. 地主不应该毁灭农民, 而应该依据沙皇诏谕保护他们。(The landowners should not destroy the peasants but protect them in compliance with the Tsar’s edict.)

Context 3: yijü (依据) + non-essential argument (wide standard followed) + clause (action)


Characteristics of this context: 1) It is an atypical context of “yijü (依据)” sentences, with only 19% of “yijü (依据)” appearing in this context, where “yijü (依据)” can be replaced by “yi…wei biaozhun (以……为标准, depend on)”, as shown in example (10). 2) The non-essential argument represents a wide standard of action to be followed, such as “reality/achievement/contribution/characteristic”, which does not reflect collective volition and has no broad binding force or compulsion. Therefore, actors do not have to comply with it strictly, but can apply it creatively or adapt it. 3) The following predicate carries out the action in line with the wide standard. Since the wide standard does not specify what to do or how to do it, but only provides vague guidance for action, the process of the action is uncertain and its outcome is unpredictable.

(10) a. 任何在职人员的提升, 都必须依据实际工作成绩和贡献, 而不是……。 (Any promotion of an incumbent must depend on his actual work and contribution rather than ……)
b. 培育过程依据幼虫的生长大小及其水体中密度, 调节流速。(During incubation, the flow rate can be regulated depending on the growth size of the larvae and their density in the water column.)

Context 4: yijü (依据) + non-essential argument (source of evidence) + clause (quotative evidence)

The characteristics of this context are identical to those of context 1 of “jü (据)” (see Sect. 3.1). For example:

(11) a. 依据马林科夫同志的报告, 又净增了九百五十万人。(According to Comrade Malenkov’s report, there was a net increase of 9.5 million people.)
b. 依据该国法律: 一个人的心脏停止跳动, 即视为死亡。(According to the law of the country, a person whose heart stops beating is considered dead.)

The difference is that this context is typical for “jü (据)” sentences but atypical for “yijü (依据)” sentences: only 5% of “yijü (依据)” sentences appear in this context.

3.3 Distinguishing “jü (据)” from “yijü (依据)”

According to Sects. 3.1 and 3.2, “jü (据)” and “yijü (依据)” have some similarities, but there are also significant differences. The similarities are reflected mainly in the discourse context and semantic function. For example, both “jü (据)” and “yijü (依据)” can appear in context 1 and context 4 (see Sect. 3.2), and both can introduce the source of evidence, the force relied on in an action, etc. The differences between “jü (据)” and “yijü (依据)” are as follows: 1) Syntactically, the typical position of “jü (据)…” is at the beginning of the sentence (95%), while “yijü (依据)…” typically appears in mid-sentence position (90%). 2) Contextually, the usages of “jü (据)” and “yijü (依据)” seem to overlap partially, but in fact they are complementary to each other. 3) Statistically, the typical characteristics of “jü (据)” sentences, such as syntactic position, discourse context and pragmatic frequency, are not typical of “yijü (依据)” sentences, and the typical characteristics of “yijü (依据)” sentences are atypical of “jü (据)” sentences. For more details, see Table 1.

Table 1. The differences between “jü (据)” and “yijü (依据)”.

Pragmatic context | Jü (据) | Yijü (依据)
a. P + argument (source of evidence) + clause (quotative evidence) | 95% | 5%
b. P + argument (reliance/base) + clause (action) | 2.4% | 53.2%
c. P + argument (standpoint) + clause (opinion) | 2.6% | -
d. P + argument (strict standard followed) + clause (action) | - | 22.8%
e. P + argument (wide standard followed) + clause (action) | - | 18.9%

Notes: P stands for preposition and % indicates pragmatic frequency.
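The claim that the two distributions are largely complementary can be illustrated numerically from Table 1. The following Python sketch is an illustration only: the overlap measure (the sum, over contexts, of the smaller of the two percentages) is not used in the paper, empty table cells are assumed to mean 0%, and the names TABLE_1, typical_context and overlap are hypothetical.

```python
# Pragmatic frequencies (in percent) from Table 1.
# Assumption: an empty cell in the table is read as 0.0.
TABLE_1 = {
    "a": {"jü": 95.0, "yijü": 5.0},
    "b": {"jü": 2.4, "yijü": 53.2},
    "c": {"jü": 2.6, "yijü": 0.0},
    "d": {"jü": 0.0, "yijü": 22.8},
    "e": {"jü": 0.0, "yijü": 18.9},
}


def typical_context(word):
    """Return the context in which the word occurs most often."""
    return max(TABLE_1, key=lambda ctx: TABLE_1[ctx][word])


def overlap(first="jü", second="yijü"):
    """Sum the smaller of the two shares in each context.

    The result is 100 for identical distributions and 0 for fully
    complementary ones. This measure is an illustrative choice.
    """
    total = sum(min(row[first], row[second]) for row in TABLE_1.values())
    return round(total, 1)


if __name__ == "__main__":
    print(typical_context("jü"), typical_context("yijü"), overlap())
```

With the Table 1 figures, the typical context is a for “jü (据)” and b for “yijü (依据)”, and the overlap is 7.4 percentage points out of 100, which is consistent with describing the two usages as complementary.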

4 Semantic Structure and Modality Classification of Evidential Prepositions

4.1 Semantic Structure of Evidential Prepositions

By analyzing the semantic structures of “jü (据)” and “yijü (依据)”, this paper has solved the problem of distinguishing between the two words. Such problems, however, are widespread among evidential prepositions. Take 800 words in Modern Chinese [1] as an example. To explain evidential prepositions, this dictionary usually uses one synonym to explain another, such as using “anzhao (按照)” to explain “an (按)” [1:50], using “yizhao (依照)” to explain “yi (依)” [1:612], and using “an (按)” to explain “zhao (照)” [1:655]. This not only results in circular explanations but also leads readers to think that evidential prepositions have the same meaning and usage. Can the method in this paper solve these problems? The answer is “yes”. By analyzing the semantic structure of evidential prepositions, we distinguish them as follows.

Table 2. The differences among evidential prepositions.

Pragmatic context (see Table 1)
Yi (依), Zhao (照), Anzhao (按照), Yizhao (依照), An (按), Yijü (依据), Genjü (根据), Jü (据)
2661 200 285 1704 383 4571 1431 233
Context a 11.1% 95% 5% 4.4%
Context b 18% 2.4% 53.2% 0.5% 13.5%
Context c 15.2% 2.6% 30.8% 10.1% 12.6% 2.9% 21%
Context d 28% 22.8% 58% 58.6% 75.8% 32.6% 28%
Context e 27.7% 18.9% 6.8% 31.3% 11.6% 64% 37.5%

Notes: 1) Numbers in the first row indicate the occurrence frequency of words in the corpus, and numbers in the second to sixth rows (Context a - e) indicate their pragmatic probability, with bolded numbers highlighting the typical usage. 2) The prepositions “ping (凭)” and “kao (靠)” appear only in context b and are not shown here due to space constraints.


Table 2 shows that each evidential preposition has its own meaning and usage, by which it can be distinguished from the other prepositions. Previous dictionaries, by contrast, used one word to explain another, which ignored the differences in their meanings and usages; such entries should be revised.

4.2 Modality Classification of Evidential Prepositions

Scientific classification is not accomplished at one stroke; on the contrary, it usually goes through a long process. The same is true for evidential prepositions, for which there is no consistent classification yet. Some scholars classify them into two categories, “anzhao (按照)” and “ping (凭)” [5]; some classify them into three categories, “anzhao (按照)”, “ping (凭)” and “suizhe (随着)” [12]; and others divide them into four categories, “an (按)”, “ping (凭)”, “jü (据)” and “yi (以)” [13]. These classifications take note of the objective meaning of evidential prepositions while largely ignoring their subjective meaning. In fact, evidence is not an objective concept but a subjective one, which expresses a person’s beliefs, standpoints, preferences, standards, etc. If we ask “On what basis do you do this?”, we are not concerned with how you do it, but with what beliefs you hold, what positions you take, or what standards you follow. Seen in this way, evidence is not an isolated subjective category but is closely linked to another subjective category in linguistics, modality. According to references [14–18] and Sects. 2–3, this paper reclassifies Chinese evidential prepositions into three categories:

Epistemic Evidence: it introduces evidence for the speaker to judge the truth value of a proposition. Within epistemic evidence, there are two subcategories: one introduces the source of evidence, represented by “jü (据)”, as in example (5); the other introduces the standpoint of an opinion, represented by “zhao (照)”, as shown in the following examples.

(12) a. 照我看, ……他们的论文和发言都是比较有思想, 有见解的。(In my opinion, …their papers and speeches are relatively thoughtful and insightful.)
b. 照此说法, 地租也就成为不劳所得了。(On this view, land rent becomes unearned income.)

Deontic Evidence: it introduces the evidence of whether something is objectively allowed or should be done. Within deontic evidence, there are also two subcategories: one introduces strict standards that the actor should follow rather than violate, represented by “yizhao (依照)”, as in example (13a); the other introduces wide standards (such as requirements, volitions, properties, characteristics, etc.) that the actor should follow, represented by “an (按)”, as in example (13b).

(13) a. 姑姑依照船长的命令, 按了一下红色电钮。(The aunt pressed the red electric button in accordance with the captain’s order.)


b. 社员盖房时, 村委会……按低价处理一些树木给社员。(When community members build houses, the village council …… will sell some trees to them at a low price.)

Ability Evidence: it introduces the evidence of whether one is capable of doing something. Within ability evidence, there are two subcategories: one introduces the inner power on which the action depends, such as the actor’s own abilities, experience, interests, skills, etc., represented by “ping (凭)”, as in example (14a); the other introduces the outer power on which the action depends, such as parents, others, tools, policies, etc., represented by “kao (靠)”, as in example (14b).

(14) a. 凭本事挣钱, ……不存在对得起对不起的问题。(If you earn money by your own ability, …there is no question of whether you have let anyone down.)
b. 他在南洋的商务就靠女婿李光前掌管了。(He relied on his son-in-law Li Guangqian to take charge of his business in Southeast Asia.)

To sum up, the reclassification of evidential prepositions is as follows.

Table 3. Modality-based reclassification of evidential prepositions.

Category | Meanings and usages | Representative words | Other words
Epistemic evidence | To introduce the source of evidence | jü (据) | -
Epistemic evidence | To introduce the standpoint of opinion | zhao (照)…kan (看) | yi (依)…kan (看)
Deontic evidence | To introduce strict standards of action | yizhao (依照) | anzhao (按照), yi (依)
Deontic evidence | To introduce wide standards of action | an (按) | -
Ability evidence | To introduce the inner power on which the action depends | ping (凭) | -
Ability evidence | To introduce the outer power on which the action depends | kao (靠) | -

Notes: This table lists the dominant usage of prepositions. Those prepositions which have no dominant usage or whose dominant usage is not obvious, such as “genjü (根据)” and “yijü (依据)”, do not appear in this table.
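The reclassification in Table 3 amounts to a two-level lookup from modality category to usage to prepositions. The Python sketch below is an illustrative encoding, not a resource provided by the paper. The dictionary layout, the romanized keys and the function name classify are assumptions, and the assignment of the "other words" to rows follows the order in which they appear in the flattened table.

```python
# Two-level encoding of Table 3: category -> usage -> prepositions.
# The first word in each list is the representative word; the rest are
# the "other words" of the same row (row assignment is an assumption).
RECLASSIFICATION = {
    "epistemic evidence": {
        "source of evidence": ["jü"],
        "standpoint of opinion": ["zhao…kan", "yi…kan"],
    },
    "deontic evidence": {
        "strict standards of action": ["yizhao", "anzhao", "yi"],
        "wide standards of action": ["an"],
    },
    "ability evidence": {
        "inner power on which the action depends": ["ping"],
        "outer power on which the action depends": ["kao"],
    },
}


def classify(preposition):
    """Return (category, usage) for a preposition, or None if it is unlisted."""
    for category, usages in RECLASSIFICATION.items():
        for usage, words in usages.items():
            if preposition in words:
                return category, usage
    return None


if __name__ == "__main__":
    print(classify("ping"))
    print(classify("genjü"))
```

Prepositions without a dominant usage, such as “genjü (根据)” and “yijü (依据)”, are deliberately absent from the mapping, so classify returns None for them.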

5 Semantic Map of Evidential Prepositions According to the studies above, this paper describes the similarities and differences (see Tables 2 and 3) within evidential prepositions. Based on these knowledges, it is possible to know why different prepositions are used. However, there are no further explanations, and it is unclear why there are so many similarities within evidential prepositions and how they are connected. Typological research shows that there are semantic connections between members of the same category. Through these connections, the members of a category can relate


to each other, forming a conceptual space. In the conceptual space, the connections between meanings are drawn as lines, and the thickness of a line indicates the closeness of the connection: the thicker the line, the closer the relationship. The conceptual space of evidential prepositions is centered on the meaning “to introduce the standpoint of opinion”, which is connected to the other meanings. For example, the central meaning is closely connected to the meaning “to introduce standards of action” (6 prepositions), so this line is the thickest; its connection with the meaning “to introduce the reliance or basis of action” is slightly weaker (4 prepositions), so the line is thinner; and its connection with the meaning “to introduce the source of evidence” is weaker still (3 prepositions), so the line is the thinnest. Further details are as follows.

Fig. 1. Conceptual space and semantic map of evidential prepositions.
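The relation between shared prepositions and line thickness described above can be sketched as a small weighted graph. This is only an illustration: the three counts (6, 4, 3) come from the text, whereas the function name `line_widths`, the maximum width, and the linear scaling are assumptions made for the sketch and are not the procedure used to draw Fig. 1.

```python
# Edges from the central meaning ("to introduce the standpoint of opinion").
# The counts of shared prepositions are taken from the text above.
CENTER = "standpoint of opinion"
SHARED_PREPOSITIONS = {
    "standards of action": 6,
    "reliance or basis of action": 4,
    "source of evidence": 3,
}


def line_widths(weights, max_width=6.0):
    """Map each connection to a line width proportional to its weight.

    Linear scaling is an assumption of this sketch (hypothetical helper).
    """
    top = max(weights.values())
    return {meaning: max_width * count / top for meaning, count in weights.items()}


widths = line_widths(SHARED_PREPOSITIONS)
for meaning, width in sorted(widths.items(), key=lambda item: -item[1]):
    print(f"{CENTER} -- {meaning}: width {width:.1f}")
```

Any monotonic mapping from counts to widths would preserve the ordering of the three connections; the linear one is simply the easiest to read.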

Creating a conceptual space for evidential prepositions has three advantages: 1) It can explain why there are so many similarities between evidential prepositions: they share a conceptual space. 2) It can connect the different meanings of evidential prepositions, and the thickness of the connecting line reflects the closeness of the connection. 3) It enables different evidential prepositions to be visually distinguished from each other, because they have different distributions in the conceptual space. For example, epistemic evidence is distributed in the upper left of the conceptual space, forming the semantic map in Fig. 1b; deontic evidence is in the upper right, forming the semantic map in Fig. 1c; and ability evidence is at the bottom, forming the semantic map in Fig. 1d.

6 Conclusion

Lexicographers believe that there are few differences within evidential prepositions and that dictionaries can use one of them to explain the other, which is not true.


Based on corpus data, this paper analyses the semantic structure of 10 typical prepositions and finds that, in addition to similarities, there are significant differences among them. Using these differences, it is possible to divide them into three categories. Subsequently, this paper explains why there are so many differences within evidential prepositions by means of conceptual spaces and semantic maps.

Acknowledgments. This paper is supported by the National Social Science Fund (19BYY030). The anonymous reviewers of CLSW2021 put forward many valuable comments, for which we express our sincere thanks.


The Discourse Functions of Shell Nouns in Mandarin: A Genre-Based Study in Popular and Professional Science Articles

Xin Kou
School of Literature, Shandong University, Jinan, China

Abstract. This article investigates shell nouns in popular and professional science articles and reveals the preferred ways of evaluation in these two genres. The study shows that, despite the overall similarity in semantic distribution, there are distinct variations in shell-noun use between the two genres in word frequency, semantic types and syntactic constructions. Furthermore, these distinctions can be reduced to differences in structure, subjectivity and knowledge between the two genres. Popular science articles tend to be more explicit in making evaluations than professional papers, whereas research papers prefer nouns and constructions that express informativeness, objectivity and scientific rationality. This comparative study expands current understanding of the discourse functions of shell nouns and can also help cultivate writers’ genre awareness.

Keywords: Shell nouns · Discourse functions · Research papers · Popular science articles

1 Introduction

Recently, genre variation has received increasing attention in discourse analysis and grammar studies. As [1] indicates, genre constrains the choice of discourse structures, lexical items and constructions, which in turn reflects the formation of discourse communities. For example, professional articles and social articles vary considerably in the expressions they use. Comparative studies not only contribute to a better understanding of the practices and ideologies of the genres, but are also significant for exploring the communicative effects of particular lexical items and syntactic constructions. Within this line of research, one area of interest is the comparison between research papers and popular science articles. The two genres share the aim of spreading scientific knowledge, which Hyland describes as “a tale of two genres” [2, 3]. However, the difference between the two genres is remarkable. Although research articles are regarded as academic writing proper, popular science articles play an indispensable role in spreading knowledge. With different audiences and communicative purposes, the two genres choose distinct expressive options to achieve their objectives. Among the various differences, this study focuses on lexical and constructional choices, and we hope it can reveal the styles of the articles through grammatical analysis.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 223–233, 2022. https://doi.org/10.1007/978-3-031-06703-7_17


In modern Chinese studies of genre, previous research has paid some attention to grammatical analysis from the perspective of discourse comparison [4–6]. Representative studies include [7–11]. These studies compare spoken and written discourse and enrich the syntactic analysis of certain constructions in Mandarin, such as postpositive relative clauses and a series of discourse markers. Nevertheless, studies on Mandarin have not yet explored academic discourse in much depth. The few related papers are corpus-based quantitative analyses of micro-linguistic expressions in academic articles. Their conclusions can be summarized as follows: sentences in academic writing are longer than in popular writing, research papers prefer nominalization, abstract nouns appear more often in academic papers than in popular writing, and so on [12, 13]. Among the various lexical and grammatical resources, the abstract noun is a significant marker of linguistic style: the frequency of abstract nouns differs distinctly between spoken and written corpora. Previous studies have therefore focused on these special nouns and found that they have rich discourse functions. Depending on the semantic and pragmatic features in focus, nouns of this kind have been called shell nouns, abstract nouns, general nouns, signaling nouns and so on. For example, [14] points out that, in academic articles, shell nouns are used as grammatical devices of textual cohesion more in arts and humanities articles than in science articles. Building on this related research, this paper aims to uncover the discourse functions of Chinese shell nouns in academic writing, including professional and popular science articles. Moreover, we hope to provide more adequate statistical evidence for the differences between the two genres of academic writing.

2 Corpus and Procedures

This study is based on two sub-corpora. The first sub-corpus consists of professional research articles from CSCD journals, totaling 53,126 characters. The second sub-corpus consists of popular articles from three popular science websites, namely Scientific American, Science Squirrels Meeting and Guokr, totaling 52,668 characters. All the articles in the two sub-corpora belong to botany, astronomy, medicine and environmental science and were published in recent years (2015–2020).

The term “shell noun” was first introduced by [15] to refer to nouns that can encapsulate chunks of information. [15] identifies shell nouns using objective criteria and describes their semantic features and multiple functions in detail. According to that research, shell nouns perform three functions simultaneously, i.e. characterization, temporary concept formation and linking. In this paper, the notion of Chinese shell nouns is enriched on the basis of the identification of shell nouns in [15]. [16] and [17] divide abstract nouns into two types. The first is the content noun, which acts like a carrier whose content is a piece of information; news, question, fact and thing are members of this type. The other is the event attribute noun, which, like a content noun, has an argument describing its own content, but also has another argument linking it to an event. Nouns of this type include cause, result, purpose and condition. Take purpose as an example:


(1) 努力学习的目的是取得一个好成绩。
The purpose of studying hard is to get a good score.

In (1), “the purpose” is semantically a container whose content is “to get a good score”. In addition, another argument is related to this noun; it is coded as an adnominal clause describing the event that has the purpose. This study therefore holds that nouns like purpose have two arguments, i.e. an argument carrying content information and an argument carrying correlation information. The shell nouns were filtered from the list of abstract nouns in [18] on the basis of the definition given above, which yields 158 shell nouns for this research. Furthermore, the semantic information of shell nouns can be divided into two types, content information and correlation information. Thus, there are two types of shell nouns: content nouns, which have content information only, and event correlation nouns, which have both content information and correlation information. Shell nouns can also be divided into various sub-types according to their semantic features. Following [15] and [19], we arrive at the classification of shell nouns shown in Table 1.

Table 1. Types of shell nouns.

| Types | Sub-types | Definition | Tokens |
|---|---|---|---|
| Correlation nouns | Logic | Logic relations of events | cause (原因), purpose (目的) |
| | Space-time | Space-time relations | time (时间), circumstance (环境) |
| | Detail | Manners and methods of events | procedure (过程), stage (阶段) |
| Content nouns | Factive | The proposition of the noun is true | truth (真相), fact (事实) |
| | Anti-factive | The proposition of the noun is false | lie (谎言), illusion (错觉) |
| | Feature | The features of events | merit (优点), dilemma (困境) |
| | Mental | Mental feelings | decision (决定), hope (希望) |
| | Speak | Speaking and conversation | view (说法), answer (回答) |
| | Information | Messages and news | record (记录), report (报道) |

After the data were cleaned, the shell nouns in the concordances were identified manually and then counted using the AntConc 3.5.8 software. To compare shell-noun use between the two corpora, frequency counts were normalized to 10 million words with respect to corpus size. The chi-square test was then conducted to measure the extent of the difference. To highlight the shell nouns with the most obvious differences, the significance level was set at 0.1% (p < 0.001) in this study. The chi-square test was adopted because it has been widely used to determine significant differences when comparing frequencies between two corpora.
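The normalization step can be sketched as follows. This is a minimal illustration, not the procedure run in AntConc: the function name `normalized_frequency` is hypothetical, and using the character totals of the two sub-corpora as the corpus size is an assumption, since the paper states the base in words but reports corpus sizes in characters. The counts 420 and 367 are the overall shell-noun totals reported in the Results section.

```python
def normalized_frequency(count: int, corpus_size: int, base: int = 10_000_000) -> float:
    """Scale a raw count to occurrences per `base` units of text."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count / corpus_size * base


# Assumption: corpus size is measured in characters, as reported for the sub-corpora.
professional = normalized_frequency(420, 53_126)
popular = normalized_frequency(367, 52_668)

print(f"professional: {professional:.0f} per 10 million")
print(f"popular:      {popular:.0f} per 10 million")
```

Because the two sub-corpora are of nearly equal size, normalization changes the raw counts by almost the same factor, so the direction of each comparison is unaffected.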


3 Results

In this section, the shell nouns in research papers and popular science articles are compared in three respects. The first is their frequency and distribution across semantic categories. The second is the classification of the modifiers accompanying shell nouns. The last is the syntactic projection of the two kinds of information carried by shell nouns.

3.1 The Distinction of Semantic Types

As displayed in Table 2, the correlation nouns and content nouns are significantly different in occurrence between research papers and popular science articles.

Table 2. Frequency of shell nouns in the two corpora.

| Types | Sub-types | Professional | Popular | χ² | p |
|---|---|---|---|---|---|
| Correlation nouns | Logic | 124 (28.77%) | 116 (31.61%) | 0.76 | = 0.384 |
| | Space-time | 18 (4.18%) | 2 (0.54%) | 10.57 | = 0.001* |
| | Detail | 116 (26.91%) | 43 (11.72%) | 53.85 | < 0.001** |
| | Total | 258 (59.86%) | 161 (43.87%) | 24.26 | < 0.001** |
| Content nouns | Factive | 1 (0.23%) | 11 (3.00%) | 10.23 | = 0.001* |
| | Anti-factive | 0 (0%) | 5 (1.36%) | 5.909 | = 0.015* |
| | Feature | 91 (21.11%) | 55 (14.99%) | 4.978 | = 0.026* |
| | Mental | 2 (0.46%) | 13 (3.54%) | 15.21 | < 0.001** |
| | Speak | 10 (2.32%) | 31 (8.45%) | 15.27 | < 0.001** |
| | Information | 24 (5.57%) | 64 (17.44%) | 28.47 | < 0.001** |
| | Deverbative | 45 (10.44%) | 27 (7.36%) | 2.27 | = 0.132 |
| | Total | 162 (40.14%) | 206 (56.13%) | 24.26 | < 0.001** |
| Total (type) | | 420 | 367 | – | – |
| Total (token) | | 61 | 73 | – | – |
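As an illustration of the test statistic reported in Table 2, the sketch below computes a Pearson chi-square for a 2 × 2 contingency table (logic nouns versus all other shell nouns, in each genre). Several points are assumptions rather than details given in the paper: the helper name `chi_square_2x2` is hypothetical, no continuity correction is applied, and each genre's total is taken as the sum of its sub-category rows.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value for one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Sub-category counts from Table 2 (Logic row first).
professional_rows = [124, 18, 116, 1, 0, 91, 2, 10, 24, 45]
popular_rows = [116, 2, 43, 11, 5, 55, 13, 31, 64, 27]

# Assumption: genre totals are the sums of the sub-category rows.
total_professional = sum(professional_rows)
total_popular = sum(popular_rows)

logic_professional, logic_popular = professional_rows[0], popular_rows[0]
chi2, p = chi_square_2x2(
    logic_professional, total_professional - logic_professional,
    logic_popular, total_popular - logic_popular,
)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

The same function can be applied to any other row by swapping in that row's counts; with the study's threshold of p < 0.001, only rows whose p-value falls below that level would be treated as clearly different.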

Compared with popular science articles, research papers contain more shell nouns overall and more correlation nouns in particular. Among correlation nouns, research papers prefer detail nouns, such as procedure (过程), manner (方式) and method (方法), while popular science articles use these nouns less frequently. In addition, the numbers of logic nouns occurring in the two corpora are similar, which means that both genres choose logic nouns as an important lexical resource. However, differences emerge in the details. Research papers tend to use affect (影响) and influence (作用) more than other logic nouns, while popular science articles are inclined to use nouns expressing causal relationships, such as cause (原因), result (结果) and purpose (目的). Turning to content nouns, we find that they occur more frequently in popular science articles. A distinct difference between the two genres comes from mental, speak and information nouns, which are significantly more frequent in popular science articles. Another issue merits attention. Though


the distributions of factive and anti-factive nouns in the two styles are not significantly different, Table 2 shows that research papers avoid these kinds of nouns, whereas popular science articles accept them, as in example (2):

(2) 事实上, 他所感受到的任何改善都完全是心理作用。但整个科学界却被这种假象蒙蔽了。
In fact, any improvement he felt was entirely a psychological effect. However, the whole scientific community was deceived by this illusion.

Furthermore, turning to feature nouns, it can be noticed that, although they occur in similar numbers in the two corpora, the particular nouns used in the two styles do not coincide. Research papers have element (因素), feature (特点), problem (问题) and change (变化) among their top five shell nouns, while popular science articles use advantage (优势), defect (缺陷), difficulty (困难) and problem (问题) most; several of these are compound words with morphemes indicating subjective meaning. However, some words cannot be interpreted from their word formation, like problem (问题) [20].

3.2 The Distinction of Grammatical Collocation

A shell noun appears in an article as an element of a construction. According to [14, 15, 20], adjectives modifying shell nouns can exert a great influence on the construction of discourse and the expression of subjectivity. In the two corpora, the occurrences of “Adj + shell noun” and of shell nouns without an adjective were counted, and the results are shown in Table 3:

Table 3. The occurrence of adjectives in shell-noun phrases.

| | Research papers | Popular science articles | χ² | p |
|---|---|---|---|---|
| ADJ + N | 23 (5.48%) | 64 (17.43%) | 28.50 | < 0.001** |
| No ADJ | 397 (94.52%) | 303 (82.56%) | | |
| Total | 420 | 367 | – | – |

[21] divided the adjectives that modify shell nouns into five types, i.e. descriptive, evaluative, categorical, restrictive and cohesive, in order to analyze the pragmatic effect of shell nouns in more detail. To fit Mandarin Chinese, the adjectives are here classified into three types, namely subjective evaluation (successful [成功的], important [重要的]), objective evaluation (scientific [科学的], informative [详细的], specific [具体的]) and cohesive (next [下一个], other [其他]). Table 4 compares the adjectives used in the two genres.

Table 4. Types of adjectives modifying shell nouns in the two genres.

| | Research papers | Popular science articles | χ² | p |
|---|---|---|---|---|
| Subjective | 9 (39.13%) | 31 (48.44%) | 0.59 | = 0.442 |
| Objective | 12 (52.17%) | 24 (37.5%) | 1.50 | = 0.22 |
| Cohesive | 2 (8.70%) | 9 (14.06%) | 0.41 | = 0.52 |
| Total | 23 | 64 | – | – |

3.3 The Distribution of Content and Correlation Information

Content nouns and correlation nouns differ in reference. Content nouns have an argument carrying content information, while correlation nouns refer to a correlate event in addition to their content argument. Owing to this inherent conceptual structure, the two kinds of arguments need to be realized in syntactic constructions. According to our analysis, the syntactic projection of the information of shell nouns is displayed in Table 5 and Table 6.

Table 5. The syntactic patterns of content information of shell nouns.

| Pattern | Example |
|---|---|
| Content + de + N | 这样就可避免[因为积分步长太大而错过部分近密交会]content的情况。 Thus we can avoid the situation in which part of the near dense intersection is missed because the integral step is too large. |
| N + shi/zaiyu + content | 瘦素的用途主要是[告诉大脑, 你是否拥有足够的能量储备来应对相对苛刻的挑战]content。 The main function of leptin is to tell your brain whether you have enough energy reserves to cope with a relatively demanding challenge. |
| Content + shi + N | [科学家对这一时期二氧化碳指数的争议]content是该研究倍受关注的原因之一。 The scientists’ dispute over the CO2 index for this period is one of the causes of the great attention this research has drawn. |
| N refer to content | 尽管[注射胰岛素]content改变了数百万糖尿病患者的生活, 但患者也不是只能选择这样的解决方案。 Although insulin injections have changed the lives of millions of diabetics, this is not the only solution for the patients. |


Table 6. The syntactic patterns of correlation information of shell nouns.

| Pattern | Example |
|---|---|
| Correlate event + de + N | 对[近密交会或碰撞]correlation的效果也并未进行具体分析。 (This paper) has not specifically analyzed the effect of near dense intersection or collision either. |
| N refer to correlate event | [它们有不同的尺寸, 不同的化学成分, 去不同的地方]correlation, 到达目的地时效果也会有所不同。 They come in different sizes, have different chemical compositions and go to different places, so the effect when they arrive at the destination also differs. |
| Correlate event + V + N | [其偏心率的具体大小]correlation会对 CE 次数产生影响。 The exact size of its eccentricity could exert an influence on the frequency of CE. |

Based on the syntactic patterns in Table 5 and Table 6, we analyzed the frequency of each pattern in the two corpora. As Table 7 and Table 8 show, research papers and popular science articles prefer different patterns for expressing shell-noun arguments in syntax.

Table 7. The frequency of content information patterns.

| | Research papers | Popular science articles | χ² | p |
|---|---|---|---|---|
| Content + de + N | 103 (24.52%) | 74 (20.16%) | 46.40 | < 0.001** |
| N + shi/zaiyu + content | 87 (20.71%) | 72 (21.52%) | 0.15 | = 0.70 |
| Content + shi + N | 40 (9.52%) | 33 (8.99%) | 0.07 | = 0.78 |
| N refer to content | 101 (24.05%) | 61 (16.62%) | 6.61 | = 0.010* |
| No content | 89 (21.19%) | 127 (34.6%) | 17.70 | < 0.001** |
| Total | 420 | 367 | – | – |

Table 8. The frequency of correlation information patterns.

| | Research papers | Popular science articles | χ² | p |
|---|---|---|---|---|
| Correlate event + de + N | 134 (51.94%) | 86 (53.42%) | 0.09 | = 0.77 |
| N refer to correlate event | 81 (31.4%) | 55 (34.16%) | 0.35 | = 0.56 |
| Correlate event + V + N | 43 (16.67%) | 20 (12.42%) | 1.40 | = 0.24 |
| No correlate event | 0 | 0 | – | – |
| Total | 258 | 161 | – | – |

The data in Table 7 and Table 8 show remarkable consistency in the use of shell nouns in research papers and popular science articles, but distinctions still exist. The most significant difference between the two genres is whether the content information of shell nouns is specified in the discourse. As Table 7 shows, the content


information of shell nouns tends to be expressed in professional science papers, while a large amount of content information remains hidden in popular science articles.

4 Discussion

The comparison of shell-noun use in popular and professional science articles contributes to our understanding of the preferred syntactic forms in the two genres. The distinctions can be summarized in three points. First, professional research papers use shell nouns as cohesive devices more frequently, whereas popular science articles prefer the evaluative functions of shell nouns. Second, more neutral and objective shell nouns are used in professional papers, whereas popular science articles have more subjective shell nouns. Finally, with regard to knowledge construction, shell nouns reveal the difference in information sources between research papers and popular science articles.

4.1 Discourse Organization

In terms of the overall semantic distribution of shell nouns, the two genres differ significantly in that research papers contain more correlation nouns. According to the analysis above, correlation nouns connect two kinds of information in discourse, which naturally makes them serve as cohesive devices. However, the details reveal an interesting point: logic nouns appear more in popular science articles than in professional papers. This suggests that professional papers rely less on logic markers such as logic nouns to manifest logical relations and deduction. Moreover, professional papers use detail nouns more frequently. It can thus be concluded that professional papers not only use shell nouns as cohesive devices, but also use them to push the discourse toward greater detail and specificity. In (3), for instance, stages (阶段) helps the writer expand on the content of the evolution, which makes the discourse more informative.

(3) 根据地层侵蚀量, 将丹霞地貌演化阶段分为青年期, 壮年期和老年期。
According to the amount of stratum erosion, the evolution stages of the Danxia landform can be divided into the young, the mature and the old stage.

Another notable difference is the syntactic realization of content information. In popular science articles, much of the content information of shell nouns is hidden in the discourse, whereas the content of shell nouns in research articles tends to be made explicit. The comparison can be seen in (4a) and (4b).

(4) a. 首批结果显示, 当自愿者进入深度睡眠阶段, 脑脊液经特定脑区过滤后周期性地进入大脑, 并且按下图箭头所示的方向环绕整个大脑流动。 (Research paper)
The first results showed that when the volunteers were in deep sleep, cerebrospinal fluid was filtered by specific brain regions and periodically entered the brain, flowing around the entire brain in the direction shown by the arrows in the figure below.


b. 但酒精依赖症产生的有一些原因, 但尚不清晰, 因而现有措施一直没能有效对其进行治疗。 (Popular science article)
But alcohol dependence has several causes that are still unclear, and current treatments have therefore not been effective.

In (4a), the word results has specific content, expressed as an object clause. By contrast, causes in the popular science article does not refer to a proposition in the context, and its content is only described as “unclear”. This further shows that shell nouns in research papers play an important role in the construction of detailed information, and that research papers are more informative and specific than popular science articles.

4.2 Subjectivity

In terms of the semantic classification of shell nouns, feature nouns appear most often in professional academic articles. Shell nouns of this kind include problem (问题), change (改变), merit (优点), defect (缺陷) and other words describing characteristics of an event or a thing. Research papers prefer only some of these nouns, namely those that are objective and non-evaluative. For example, restriction (限制), feature (特点), problem (问题) and change (改变) usually appear in academic papers, yet advantage (优势), key (关键) and significance (重点) do not occur frequently. In contrast, these words are regularly used to express evaluation in popular science articles. Moreover, factive and anti-factive nouns are barely found in professional papers, whereas they can appear in popular science articles, which use these words to convey strong views, as in (5).

(5) 但去了势的男性跟女性寿命一样长, 却是个不折不扣的事实。
But it is a plain fact that castrated men live just as long as women.

Factive and anti-factive nouns are presupposition triggers that express the speaker’s evaluation of events. Words of this kind therefore transmit judgement and estimation and are subjective. Thus, from the perspective of lexical resources, it can be seen that popular science articles are more subjective and evaluative than professional academic writing. In addition, the “Adjective + shell noun” construction also shows the difference between research papers and popular articles. In the former, adjectives occur less frequently, and the adjectives used are more objective, such as important (重要) and main (主要). In contrast, popular science articles use more subjective adjectives. A typical example is given in (6):

(6) 最近, 两项研究分别提出了两个疯狂的假说: 第九行星可能是一个原初黑洞; 或者, 它的存在是因为太阳系中曾有第二个太阳。
Recently, two separate studies have come up with two crazy hypotheses: that Planet Nine could be a primordial black hole, or that it exists because there was once a second sun in the solar system.


Crazy (疯狂的) carries a strongly subjective stance, and the writer’s feelings are clearly visible in the article. However, this word is not welcome in professional academic writing, which favors more moderate and rigorous expressions.

4.3 Knowledge Construction

Knowledge construction in a text mainly comes from displaying knowledge sources. One noticeable feature of shell-noun use is that a large number of speak and information nouns are used in popular science articles. These lexical items constitute the writers’ knowledge sources: most information in this genre comes from a point, a view or a piece of news, as we can see in (7).

(7) 最近, 一位知名经纪人计划接受切胃治病的消息登上热搜, 让这种减重代谢外科治疗手段突然进入镁光灯下。
Metabolic surgery for weight loss was thrust into the spotlight recently when news of a prominent agent’s plan to undergo stomach-reduction surgery became a trending topic online.

By contrast, research papers have different knowledge sources. They use many nominal deverbatives to express the ways in which knowledge is acquired, so the shell nouns used here are usually study, research, analysis and investigation. These words manifest the prudent attitude of writers who judge their knowledge and views cautiously and work hard to avoid subjective assessment.

Finally, the variations observed in the three points above can be attributed to the different audiences and communicative purposes of the two genres. The main aim of popular science articles is to inform general readers of scientific information, whereas the goal of research articles is to validate claims shared with peer members of the discourse community. Given their different audiences and communicative purposes, writers of the two genres choose different shell nouns as lexical resources when constructing discourse.

5 Conclusion

This examination reveals that there are distinct variations in shell-noun use between the two genres in word frequency, semantic types and syntactic constructions. Furthermore, these distinctions can be reduced to differences in structure, subjectivity and knowledge between the two genres. Popular science articles tend to be more explicit in making evaluations than professional papers, whereas research papers prefer nouns and constructions that express informativeness, objectivity and scientific rationality. This comparative study indicates that the variations in shell-noun use between popular and professional science articles reflect the preferences of the different genres. These findings regarding the discourse functions of shell nouns can help writers and readers better understand the rationale behind the rhetorical choices in different genres.

Acknowledgements. I am grateful to the anonymous reviewers of CLSW2021 for helpful suggestions and comments. This research was supported by the Humanities and Social Sciences


Fund of Shandong Province, China under grant No. 20DYYJ05 (A Study on the Interaction between Attributive Clauses and Head Nouns in Mandarin Chinese). All errors remain my own.

References

1. Bhatia, V.K.: Worlds of Written Discourse. Continuum, London (2004)
2. Hyland, K.: Academic Discourse: English in a Global Context. Continuum, London (2009)
3. Hyland, K.: Constructing proximity: relating to readers in popular and professional science. J. Engl. Acad. Purp. 22, 119–131 (2010)
4. Lv, S.: On the study of grammar. Chin. Lang. [Zhongguo Yuwen] 1, 1–8 (1978). (in Chinese)
5. Zhu, D.: What is the object of modern Chinese grammar study? Chin. Lang. [Zhongguo Yuwen] 5 (1987). (in Chinese)
6. Hu, M.: Style and grammar. Chin. Lang. Learn. [Hanyu Xuexi] 2 (1993). (in Chinese)
7. Zhang, B., Fang, M.: Functional Studies on Chinese Grammar. Jiangxi Education Press, Nanchang (1996). (in Chinese)
8. Tao, H.: On the grammatical significance of stylistic classification. Contemp. Linguist. [Dangdai Yuyanxue] 3, 15–24 (1999). (in Chinese)
9. Tao, H.: The realization and pragmatic principles of verb argument structure in operational style. Chin. Lang. [Zhongguo Yuwen] 1, 3–13 (2007). (in Chinese)
10. Zhang, B.: Functional grammar and Chinese studies. Linguist. Sci. [Yuyan Kexue] 6, 42–53 (2005). (in Chinese)
11. Fang, M.: The shaping of syntax by stylistic motivation. Rhetoric Study [Xiuci Xuexi] 6, 1–7 (2007). (in Chinese)
12. Zeng, Y.: The stylistic distribution of intermediate reading materials in TCSL. Chin. Lang. Teach. Res. [Huawen Jiaoxue Yu Yanjiu] 2, 15–22 (2012). (in Chinese)
13. Zhai, W.: Analysis of language characteristics of Chinese humanities and social science academic writing. Master’s dissertation, Xiamen University, Xiamen (2018). (in Chinese)
14. Jiang, F., Hyland, K.: Metadiscursive nouns: interaction and cohesion in abstract moves. Engl. Specif. Purp. 46, 1–16 (2017)
15. Schmid, H.-J.: English Abstract Nouns as Conceptual Shells. Mouton de Gruyter, Berlin (2000)
16. Shen, J.: Transferred reference and metonymy. Contemp. Linguist. [Dangdai Yuyanxue] 1, 1–9 (1999). (in Chinese)
17. Kou, X., Yuan, Y.: The selectional restriction between event attribute nouns and the self-designation de-constructions. Contemp. Linguist. [Dangdai Yuyanxue] 3, 396–418 (2017). (in Chinese)
18. Fang, Q.: An analysis of abstract noun teaching strategies based on corpus. J. Jingdezhen Univ. [Jingdezhen Xueyuan Xuebao] 5, 26–27 (2013). (in Chinese)
19. Jiang, F.: Stance construction and interpretational interaction of shell nouns. Mod. Foreign Lang. Res. 1, 59–65 (2016)
20. Wang, E., Yuan, Y.: The qualia role distribution in word meaning and its influence on word interpretation – taking “color (yan) + noun (ming)” compound words as an example. J. Foreign Lang. 2, 31–41 (2018). (in Chinese)
21. Gao, Zhang: “There is a possibility that…”: shell nouns in academic writing by Chinese and Swedish. Linguist. Lit. Stud. 6(2), 52–59 (2018)

The Interpersonal and Attitudinal Function of the Modal Particle A in the Middle of the Sentence

Minfeng Wang
International College of Chinese Studies, Fujian Normal University, Fuzhou, China

Abstract. The mid-sentence modal particle A(啊) is mainly placed after the theme, where it marks a pause in the conversation and draws the listener’s attention to the following discourse. In specific speech scenes, however, A(啊) takes on subjective communicative functions through its interaction with the dialogue discourse. The use of A(啊) reflects a certain psychological appeal and pragmatic purpose of the speaker. Depending on the emotional tone of the discourse, A(啊) can further be used to shorten interpersonal distance, weaken illocutionary force, construct discourse style, mark intervention in the discourse, and perform other discourse functions.

Keywords: Modal particle A(啊) · Interpersonal function · Attitudinal function

As an adhesive discourse component, A(啊) has its essential function in discourse. More specifically, compared with sentence-final A(啊), mid-sentence A(啊) is mainly used in dialogue and does not contribute to the construction of propositional structure or meaning. Although scholars have described mid-sentence A(啊) in detail at the syntactic and semantic levels [1–9], they tend to observe it statically. From a discourse perspective, two views dominate studies of the dynamics of mid-sentence A(啊): that it is a topic marker, as claimed by Li and Thompson [10] and Xu and Liu [11], and that it is a thematic marker, as pointed out by Zhang and Fang [12]. Nevertheless, most studies focus only on the component preceding A(啊) and pay little attention to the discourse that follows it, namely the rheme. As a result, its attitudinal function in discourse and its interaction with the dynamic context remain under-researched. This paper therefore examines these gaps in depth, in the hope of contributing to academic knowledge.

© Springer Nature Switzerland AG 2022
M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 234–247, 2022. https://doi.org/10.1007/978-3-031-06703-7_18

1 Discourse Distribution of A(啊) in a Sentence

Fang divided theme into three types: topic theme, interpersonal theme, and textual theme [13]. Within a sentence, A(啊) can appear after each of the three thematic components:


i. Topic theme. Here we adopt the ‘feature bundle’ proposed by Shen [5], including components with a referring function, substantive components, eventualized predicate components, ‘pre-topic marker + substantive word’ combinations, and so on.

(1) 1S: guan-yu zhe ge wen-ti A[啊], ni shi jie-jue bu liao de.
    1S: As for this CL problem A you are solve Neg Mod
    1S: As for this problem, you can’t solve it by yourself.

(2) 1S: dui-yu zuo-zhe A[啊], dei shuo de wei-wan dian er.
    1S: As for the author A should talk Stru tactful a bit
    1S: When communicating with authors, it’s better for us to be more tactful.

(3) 1S: You-shi-fu, wo kan nin you dian bao-shou!
    1S: You Master, I think you are a little conservative
    1S: Master, your behavior makes me feel you are a little conservative.
    2S: jue-dui bu shi! mai rou A[啊], gen-ben bu shi fu-nv neng gan de.
    2S: Absolutely Neg be sell meat A at all Neg be women can do Mod.
    2S: Absolutely not! Selling meat is absolutely not a job that women can do.
    3S: na jiu-shi geng beng-ti mai yu le!
    3S: That be even not to mention sell fish Mod
    3S: Not to mention selling fish!

According to functional grammar, if the topic theme coincides with the sentence subject, it is an unmarked topic theme, for example, ‘Xiao Wang, go away’.¹ If the two do not coincide, or if the topic carries a pre-topic marker, the topic theme is marked. Examples (2) and (3) illustrate this case.

ii. Interpersonal theme, which covers the language elements that show the speaker’s modality, attitude, cognition and stance, more specifically evidential expressions, vocatives, modal adverbs, etc.

(4) 1S: wo jian-yi A[啊], cong xian-zai kai-shi zan-men shui dou bu-yao shi-yong zhe ge dian-hua le.
    1S: I suggest A from now on, we anyone all Neg use this CL phone Mod.
    1S: I suggest that none of us use this phone anymore from now on.

(5) 1S: bu-ru A[啊], jiu gai cheng zhi-yin lao-lao, rang zan Niu da-jie fu-ze.
    1S: would be better A just change to Soulmate Grandma, let our Sister Niu take charge of.
    1S: It would be better to just change it to Soulmate Grandma and let our Sister Niu take charge.

(6) 1S: jiao-shou A[啊], bu-shi wo-men hu-nao, hen-duo qing-kuang ni ye xiao-de de……
    1S: Professor A Neg we make trouble a lot of situation you also know Mod……
    1S: Professor! We didn’t mean to make trouble. You know something about our situation……

¹ Liu Danqing (2007:249) thinks that “A (啊)” is a topicalizing device, which makes the sentence subject “Xiao Wang” an explicit “topic”.


iii. Textual theme, which refers to connective words that introduce the following content, such as connective components and inserted components.

(7) 1S: ke-jian A[啊], ta zhe ge ren yi-dian-er dou kao bu zhu.
    1S: so A he this CL person at all rely on Neg
    1S: It can be seen that he is not reliable at all.

(8) 1S: suo-yi A[啊], ni yao hao-hao zhen-xi ta.
    1S: So A you should well well cherish her.
    1S: Therefore, you should really cherish her.

(9) 1S: ju ge li-zi A[啊], zuo-tian ni jiu bu ying-gai pi-ping ta.
    1S: Take an example A yesterday you Neg should criticize him.
    1S: For example, you shouldn’t have criticized him yesterday.

2 To Strengthen Attitudinal Function

Syntactically, A(啊) sits between the theme (first) and the rheme (last). From the perspective of psychological cognition, whether to add A(啊) depends on the needs of expression. Adding A(啊) reflects the speaker’s psychological demands, cognitive needs and pragmatic purposes, and therefore has obvious interpersonal pragmatic functions. For example:

(10) 1S: zhu-ren A[啊], wo fei-chang gan-xie ni. Sui-ran wo zhi-dao bu-shi ni kai de dao, dan gan-xie ni zai pang-bian de zhi-dao.
    1S: Director A I very much thank you although I know Neg you do the operation but I appreciate you beside Stru guidance.
    1S: Director, I thank you very much. Although I know you didn’t lead the operation, I appreciate your guidance at my side.

!(10’) 1S: zhu-ren, wo fei-chang gan-xie ni. ……
    1S: Director I very much appreciate you……
    1S: Director, I thank you so much. ……

Comparing examples (10) and (10’), it is obvious that example (10) has a kinder tone and, on closer analysis, that its gratitude to the director comes across as more genuine. If A(啊) is removed, the expression of gratitude still works, but the tone is more direct and blunt. Although adding or deleting mid-sentence A(啊) does not change the conceptual meaning of the proposition, the two versions are not equivalent with respect to the cognitive psychology of the communicating parties. They have different pragmatic effects, and the version with A(啊) carries a strong ‘intersubjectivity’ that expresses the speaker’s subjective attitude. Based on our analysis of the corpus, the subjective attitudinal function of mid-sentence A(啊) is reflected in at least four aspects: adding the speaker’s subjective expression component, causing a subjective deviation of objective quantity, strengthening subjective emotion, and enhancing the speaker’s subjective cognition.

2.1 Adding Subjective Expression Components of the Speaker

In conversational communication, adding A(啊) after a topic theme or a textual theme draws the listener’s attention to the subsequent discourse and also adds the speaker’s subjective feelings and attitude. For example:

(11) 1S Bailin: zhe ge gou hua de shu-e yi-ban shi duo-shao, Jiamu, yao ni dai de hua ne?
    1S Bailin: This CL enough spend Stru amount generally is how much, NAME, need you leave Stru message
    1S Bailin: What’s the usual amount, Jiamu, if you were the one bringing it?
    2R Jiamu: wo [YA(呀)], wo jue de zhu-yao hai-shi kan ni yao mai shen-me dong-xi, ru-guo ni yao-shi dao na-er da gou-wu de hua, ke-neng jiu dai de duo yi-dian.
    2R Jiamu: for me A, I think mainly still depend on you want buy what thing if you if go there big shop Mod may just bring Stru more a bit
    2R Jiamu: As for me, I think it mainly depends on what you are going to buy. If you want to buy a lot of things, you may bring a little more.

(12) 1S Wang Dongyue: wo-men de zhong-guo [A(啊)], shi-ji shang xian-zai yi shuo, shuo san-huang-wu-di.
    1S Wang Dongyue: Our Stru China A in fact now one talk talk three emperors and five sovereigns.
    1S Wang Dongyue: When we mention our China, we are actually talking about the three emperors and five sovereigns.
    2R Liang Dong: dui.
    2R Liang Dong: right
    2R Liang Dong: Yes.
    3S Wang Dongyue: dui bu dui? Ran-hou wo-men cong huang-di, yan-di shuo qi.
    3S Wang Dongyue: right Neg right? Then we from the Yellow Emperor Yan Emperor talk Asp.
    3S Wang Dongyue: Is that right? Let’s start with the Yellow Emperor and the Yan Emperor.
    4R Liang Dong: dui.
    4R Liang Dong: right
    4R Liang Dong: Yes.

In example (11), the speaker directly designates ‘Jiamu’ as the next speaker among many listeners, and the mid-sentence A(啊) shows that the listener ‘Jiamu’ is somewhat surprised by this. From the perspective of discourse analysis, the surprise arises because the listener was not well prepared to take the turn. If A(啊) is deleted, this modal feature and effect disappear. Hence, ‘unexpectedness’ is the emotional attitude contributed by A(啊). In example (12), the topic ‘our China’ is highlighted by the mid-sentence A(啊), which also conveys the speaker’s strong sense of pride in this topic. Without A(啊), the utterance is just an objective statement. In discourse, the specific emotional attitude conveyed by A(啊) is influenced by the emotional tone of the discourse.

2.2 Causing the Subjective Deviation of ‘Objective Quantity’

When the topic is a quantity expression, mid-sentence A(啊) indicates that the speaker’s subjective assessment of the ‘objective quantity’ deviates from it. This ‘deviation’ is reflected in the speaker’s further psychological expansion or reduction of the ‘objective quantity’. The process of ‘expanding’ or ‘shrinking’ is often accompanied by the speaker’s subjective evaluation. Unlike ‘adding the subjective component of the speaker’, here there is a significant evaluative component before or after the ‘quantity’ topic. For example:

(13) 1S: cai 20 kuai qian A[啊], tai pian-yi le.
    1S: only 20 yuan A too cheap Mod.
    1S: It’s only 20 yuan. That is too cheap.

(14) 1S: Man-chang de 16 nian [A(啊)], li da-jie bei qi de bu jin-jin shi dui zhang-fu de yi pian ai-xin, ye shi zhe ge xiao shan-cun de xi-wang.
    1S: long Stru 16 years A, Li Sister carry Stru Neg only only is to husband Stru one CL love also be this CL small village Stru hope
    1S: Over 16 long years, what Sister Li carried was not only her love for her husband, but also the hope of this small village.

(15) 1S: 36 nian [A(啊)], gu-rou-fen-li, che-xin che-gu de si-nian, tong duan gan-chang.
    1S: 36 years A the separation of flesh and blood, thorough thoughts, broken liver and intestines.
    1S: For 36 whole years, the deep longing caused by separation from relatives was extremely painful.

Examples (13)–(15) all contain evaluative components. In example (13), the modal adverb ‘cai’ before the topic ‘20 yuan’ marks ‘a small amount’, and the subsequent discourse ‘too cheap’ is also evaluative. In this sentence, ‘20 yuan’ already expresses ‘a small amount’, but once A(啊) is added, the ‘small amount’ is further ‘narrowed’ psychologically and finally becomes a ‘minimal’ amount, which results in the speaker’s ‘disdain’ for the amount. The speaker’s psychological ‘subjective minimum’ can be tested with ‘bu-guo (just)’, which marks a subjectively minimal quantity:

!(13’) 1S: [bu-guo] 20 kuai qian, [A(啊)], tai pian-yi le.
    1S: [Just] 20 yuan A too cheap Mod
    1S: [Just] 20 yuan, it’s too cheap.

In example (14), the topic ‘16 years’ is modified by ‘man-chang (long)’, which indicates that ‘16 years’ is a ‘large amount’. Once A(啊) is added, this ‘large amount’ immediately ‘expands’ in the speaker’s psychological space and becomes, from the speaker’s psychological point of view, an almost unbearable ‘extreme’ amount. The grammatical subject ‘Sister Li’ endured this ‘extreme’ amount, which naturally arouses the speaker’s admiration and praise for her. This can be tested by adding a marker of ‘subjective extreme quantity’ such as ‘chang-da (up to)’:


!(14’) 1S: [chang da] 16 nian [A(啊)], li da-jie bei qi de bu jin-jin shi dui zhang-fu de yi pian ai-xin, ye shi zhe ge xiao shan-cun de xi-wang.
    1S: up to 16 years A Li Sister carry Stru Neg only only is to husband Stru one CL love also be this CL small village Stru hope
    1S: For as long as 16 years, what Sister Li carried was not only her love for her husband, but also the hope of this little village.

In example (15), ‘36 years’ appears twice and is used in a parallel construction. Parallelism is a typical rhetorical device for expressing emotion and strengthening the effect of language. Through the parallelism and the subsequent discourse, ‘36 years’ can be identified as ‘a lot’ in the speaker’s psychological space. Mid-sentence A(啊) further enlarges this ‘large number’, so that ‘36 years’ almost reaches the speaker’s psychological limit. The grammatical subject ‘he’ is separated from his loved ones for this extreme length of time, which reinforces the speaker’s compassion. Li argued that using modal particles to express subjective quantity is a meta-linguistic method, that is, one used to describe language itself [14]. Li also points out that meta-subjective quantity can be divided into two types, namely ‘saying it big’ and ‘saying it small’. Mid-sentence A(啊) ‘expands’ and ‘reduces’ the ‘objective quantity’, which reflects its function of expressing ‘meta-subjective quantity’.

2.3 Strengthening the Speaker’s Subjective Modality

Strengthening the subjective modality of discourse implies that the original discourse already carries a certain subjectivity. In this sense, adding mid-sentence A(啊) strengthens the emotional tone of the discourse. This function is most obvious when A(啊) is used after a vocative. The interpersonal theme before A(啊) reflects the speaker’s identity, subjective attitude towards the proposition, discourse mode, etc. The core meaning of the discourse lies in the rheme, which carries the speaker’s communicative intention and often indicates the speaker’s illocutionary force or subjective cognition of the addressee. Among the mid-sentence modal particles BA(吧), A(啊) and NE(呢), the unique syntactic feature of A(啊) is that it can follow a vocative. On the surface, the speaker’s use of a vocative is polite behavior, but in essence it is a pragmatic strategy that carries out a certain speech act and expresses a certain communicative intention. A vocative does not participate in syntactic structure and can be used independently. It mainly appears in social patterns where status and power are imbalanced, that is, the Master-Slave mode. Vocatives often use respectful forms of address, such as an official title, a position, the affix ‘Lao’ or the affectionate ‘Xiao’. From the perspective of information structure, the part before A(啊) is the theme, and the component after A(啊) is the rheme. Embedding A(啊) in the sentence increases the speaker’s emotional input or appeal. It works much like a ‘lubricant’, giving the discourse flexibility and helping it progress smoothly. Specifically, the content expressed by the rheme includes the speaker’s subjective cognition of the listener; speech acts such as suggestion, request and commitment; and consultation, defense, reasoning, etc.


(16) 1S: chang-zhang [A(啊)], jin-nian neng bu neng huan dian dong-xi fa-fa, yue bing shen-me de?
    1S: Director A this year can Neg can change some gifts give-give mooncake and so on Stru
    1S: Director, can you give us different gifts this year? Moon cakes or something like that? (Speech acts such as suggestions, requests and promises.)

(17) 1S: Agui [A(啊)], lao-po shi yong lai teng de, bu-shi yong lai da de. Xin she-hui le, fu-nv dou jie-fang le! yi-hou ke bu-neng zhe-yang le.
    1S: NAME A wife is use for love Stru Neg be for beat Stru new society Asp women all liberated Asp you Mod Neg can this Mod.
    1S: Dear Gui, women have been liberated nowadays. A wife should be loved rather than hurt and beaten. You can never do this again. (Defense and reasoning.)

(18) 1S: Xiao Wang [A(啊)], zui-jin hen-shao jian-dao ni, ni shen-me-shi-hou neng he shang ni de xi-jiu a.
    1S: NAME A recently seldom see you you when can drink Asp your wedding banquet Mod
    1S: Xiao Wang, I have rarely seen you recently. When are we going to drink at your wedding banquet? (Consultation.)

(19) 1S: yao-bu ni dao wo jia lai? Ni yao duo-shao gong-qian?
    1S: How about you come my home you want how much money.
    1S: How about working at my house? How much do you want to be paid?
    2R: ni zhe jiu wai-dao le, da-jie [Ya(呀)], qian bu qian de mei-guan-xi.
    2R: you this so strange Mod, sister A money Neg money Stru no matter.
    2R: It’s very kind of you, sister. It doesn’t matter how much you give. (The speaker’s subjective understanding and evaluation.)

Peng introduced Sweetser’s speech-act modality and pointed out that ‘any actual sentence may not only react to a proposition, but also achieve a certain speech act by expressing this proposition’ [15]. In example (16), the employees are dissatisfied with receiving the pear blossom as the factory’s Mid-Autumn Festival gift every year.
The subsequent discourse of A(啊) is ostensibly an inquiry, but the speech act actually performed is a suggestion. In example (17), the abuse of ‘Anna’ by ‘Wang Gui’ was overheard by Wang’s mother-in-law, who already knew that ‘Anna’ had been scolded for no reason. After uttering the mid-sentence A(啊), she mounts a defense by laying out her reasons. Argumentation and reasoning achieve the contextual effect of reproachful instruction. When the discourse following A(啊) belongs to the speech-act domain, the interpersonal pattern is usually an unbalanced Master-Slave mode. In examples (16)–(19), the communicating parties all stand in an ‘elder-peer’ relationship. The speaker’s foregrounding of the listener is thus a marked expression with a clear purpose. In a specific speech scene, A(啊) used after a vocative does not participate in the propositional content. Instead, it interacts with the context, especially with the addressee, and ends up strengthening the speaker’s emotion. In terms of pragmatic effect, mid-sentence A(啊) increases the weight of the speaker’s emotional input and produces just the right amount of empathy to resonate with the listener, thereby promoting the listener’s propositional behavior or the receiver’s subjective understanding, position and attitude. In addition, mid-sentence A(啊) helps convey greater cognitive certainty on the part of the speaker. When the part before A(啊) is a textual theme and the rheme is reasoning in the epistemic domain, mid-sentence A(啊) means ‘pay attention: my cognitive reasoning is obvious and trustworthy’. If A(啊) is deleted, the cognitive reasoning becomes more objective and direct, but loses the modal force contributed by A(啊).

3 To Weaken the Speaker’s Illocutionary Force

In conversational discourse that carries a strong emotional tone, such as criticism and accusation, mid-sentence A(啊) can effectively reduce the negativity of the discourse without changing the truth of the propositional information. In this sense, mid-sentence A(啊) has the function of weakening the speaker’s illocutionary force. For example:

(20) 1S Moderator: dui, kuai-zi!
    1S Moderator: yes, chopsticks
    1S Moderator: Yes, chopsticks!
    2R Wei Shan: qi-shi you-shi-hou wo gan-jue jiu-shi-shuo, zhe ge cai tai hao-chi de shi-hou, wo ye jiu bu-yong shen-me wan-kuai le. Zhi-jie jiu zhe-yang, ni kan wo, Xiao lao-shi.
    2R Wei Shan: actually sometimes I feel for example this CL dish delicious Stru moment I also Neg use what chopstick Mod directly like this you look me Xiao teacher
    2R Wei Shan: Actually, sometimes I feel that when a dish is too delicious, I don’t need any chopsticks. Just like that. Look at me, Teacher Xiao.
    1S Moderator: ni [Ya(呀)] bu ying-gai zhe-yang, zhe-yang bu wen-ming.
    1S Moderator: you A Neg should this this Neg civilized.
    1S Moderator: You shouldn’t do that, it is kind of uncivilized.

(21) 1R Lixin: ni [Ya(呀)], bie zai ting-zhong peng-you mian-qian.
    1R Lixin: you A don’t be audience friend in front of
    1R Lixin: Don’t do that in front of the audience.
    2S Chu Yang: kua xia hai-kou.
    2S Chu Yang: exaggerate
    2S Chu Yang: Don’t exaggerate.

In example (20), the discourse following A(啊) is a critical speech act, which threatens the listener’s ‘social identity face’ and creates discord in the interpersonal relationship. However, the use of ‘ni A(啊)’ enhances the intimacy between the speaker and the listener. The direct accusation, criticism or complaint, of the kind exchanged between friends, acquaintances or relatives, is noticeably weakened, and the interpersonal distance is shortened accordingly.
In example (21), the discourse following A(啊) is a warning speech act that threatens the listener’s ‘face’. A(啊) not only slows down the whole utterance but also conveys the concern of the speaker ‘Lixin’ for the listener ‘Chu Yang’. This alleviates the threat of the face-threatening act (FTA) and transforms the warning into a kind reminder, which is intersubjective. It can also be inferred that mid-sentence A(啊) contributes to the construction of the speaker’s pragmatic identity, such as a friendly elder, a close friend or an intimate partner, especially in contexts of conflict. However, if the subsequent discourse has an anaphoric pronoun as its syntactic subject, the situation is quite different. For instance:

(22) 1S: dui, ta nv peng-you jiu gen ta kai-wan-xiao shuo, ni [Ya(呀)], ni mei-you nan-ren-wei-er.
    1S: Yes, his girlfriend with him joke say you A you Neg manly.
    1S: Yes, his girlfriend joked with him, ‘You are not manly enough.’ (Complain + Blame, Strengthen)

Here the discourse following A(啊) refers back to the topic ‘NI (你)’ with the pronoun ‘NI (你)’, and ‘NI (你) A(啊)’ is a discourse marker that expresses blame [16]. This lays the foundation for the speech acts that follow. Instead of softening the censure in the follow-up discourse, A(啊) strengthens the censure of the ‘boyfriend’ through the double move of ‘Complaining + Censuring’.

4 Discourse Function of A(啊)

4.1 Construction of Discourse Style

In research on the modal particle A(啊), the concepts of ‘SHUHUAN (舒缓)’, ‘HEHUAN (和缓)’ and ‘Mitigation’ are often mentioned. For example, Examples of Function Words in Modern Chinese, published by the Chinese Department of Peking University, uses both SHUHUAN and HEHUAN: 1) Used between sentence components to make the listener pay attention to what follows and to build an easygoing, SHUHUAN tone. For example: ta A(啊), jiushi na ge Lao Zhan de erzi, Xiao Zhan. 2) Used after a term of address to express a HEHUAN tone. For example: yuanzhang A, qi daoshi chu le, keshi ren …… meiyou le. (Examples of Function Words in Modern Chinese, pp. 3–4) [17]. Liu adopted ‘Mitigation’ and ‘SHUHUAN’ and pointed out that the main function of the modal particles A(啊), ‘NE (呢)’ and ‘BA (吧)’ is to soften the tone of sentences [18]. Some may assume that ‘Mitigation’, ‘HEHUAN’ and ‘SHUHUAN’ have similar meanings and functions, so that there is no need to distinguish them. However, we believe that there are obvious differences among these three concepts, and that they should be clearly distinguished when analyzing the function of modal particles [19]. ‘SHUHUAN’ and ‘HEHUAN’ have the same meaning: in linguistics both refer to a slow-paced, soft-spoken manner of speaking. In contrast to an ‘urgent’ or ‘stiff’ way of speaking, they are aspects of ‘discourse style’. ‘Mitigation’, however, refers in linguistics to strategies for easing tense relationships, so that distant interpersonal relations become harmonious and close. This concept belongs to the ‘interpersonal domain’. Therefore, ‘SHUHUAN’ and ‘HEHUAN’ on the one hand and ‘Mitigation’ on the other belong to different domains of language function and should not be conflated in definition or interpretation. In addition, it should be pointed out that mid-sentence A(啊) may present different features in different domains. When an utterance may cause interpersonal conflict, mid-sentence A(啊) not only helps to achieve a soothing and gentle tone, but also eases the tense interpersonal relationship. But if the utterance does not cause interpersonal conflict, A(啊) may simply indicate the speaker’s soothing mood.

(23) 1S: guo-qu 5 nian [A(啊)], zheng-xie zuo zhe ge diao-yan zuo le 509 ci, wei can-zheng yi-zheng zuo le hen duo gong-zuo.
    1S: the past five year A CPPCC do this CL research do Asp 509 times, for participate in politics do Asp very much work.
    1S: In the past five years, the CPPCC has conducted 509 investigations and has done a lot of work on participating in politics.

(24) 1S: 1997 nian na nian [Na(哪)], zhe you shi yi ge tong-ku de hui-yi.
    1S: 1997 year that year A this again be one CL painful Stru memory.
    1S: When it comes to 1997, it is another painful memory.

(25) 1S: fan-zheng wo-men hui-min [Na(哪)], ai, xue-tong ne, cong xue-tong lai shuo [A(啊)], gen han-min bu yi-yang.
    1S: Anyway we Hui people A Mod lineage Mod from lineage Prep Speak A Prep Han people Neg same.
    1S: Anyway, the lineage of the Hui people is different from that of the Han people.

(26) 1S: na hui-min dou ren-shi ma?
    1S: then Hui people all know Mod.
    1S: Do the Hui people all know each other?
    2R: yi-ban lai shuo [A(啊)], you ren-shi de, ye you, ye you bu ren-shi de.
    2R: general speak A have know Stru also have also have Neg know Stru.
    2R: Generally speaking, some of them know each other, but others do not.

(27) 1S: ni zuo-tian zen-me mei lai can-jia tong-xue hui A?
    1S: You yesterday why Neg come participate class reunion Mod.
    1S: Why didn’t you attend the class reunion yesterday?
    2R: bu-hao-yi-si A, yin-wei zuo-tian chu-chai le, suo-yi [A(啊)] mei gan-shang.
    2R: sorry Mod because yesterday on business Asp so A Neg catch.
    2R: Sorry, I was on a business trip yesterday, so I missed it.
Examples (23)–(27) are objective descriptions of events that have happened. Examples (23)–(25) show a subject–predicate relationship, example (26) is an explanation, and example (27) shows a causal relationship. The theme before A(啊) in these sentences is mainly a topic theme or a textual theme, while the rheme is an objective description or explanation of the theme. In this regard, the main function of mid-sentence A(啊) is to adjust the rhythm of the discourse, avoid a rushed and rigid tone, and construct the speaker’s discourse style through delays or pauses in the sentence.

4.2 Marking the Intervention of the Listener

In conversational communication, the function of drawing the listener’s attention is generally found in initiating utterances. In a responding utterance, however, A(啊) can also signal the listener’s attention by echoing and answering: ‘I heard it, it’s my turn to talk’. For example:

(28) 1S: zai yi-yuan zhu le yi ge duo yue, ni zen-me xiang de ne?
    1S: in hospital live Asp one CL more month, you how think Stru Mod
    1S: You have been in the hospital for over a month. Do you have any plans?
    2R: wo [Ya(呀)], wo xiang kuai-xie hao qi-lai, shang chang-li lai.
    2R: I A I want quickly recover Asp come factory Asp
    2R: I hope to recover quickly and return to work in the factory.

(29) 1S: wan-shang guo-lai yi-qi da-pai ba?
    1S: tonight come together play cards Mod
    1S: Come and play cards together tonight?
    2R: da-pai [A(啊)], wo bu-hui, ni-men zi-ji wan ba.
    2R: Playing cards A I Neg you own play Mod
    2R: I can’t play cards. You guys have fun.

‘Calling attention’ presupposes ‘not paying attention’. That is, while the speaker is talking, the listener is occupied with, or thinking about, something unrelated to the current topic. In order to maintain the discourse and attract the listener’s attention, the speaker takes the initiative to invite the listener into the dialogue and hands over the right to speak. When the listener passively takes over the right to speak, he or she repeats and responds to the relevant information in the form of an echo response with A(啊), expressing ‘I heard it, it is my turn’. Meanwhile, the ‘pause’ and ‘delay’ created by mid-sentence A(啊) give the listener more time to think and organize the discourse. Lv [20], Shi [8] and Liu [18] said that mid-sentence A(啊) indicates the speaker’s ‘hesitation’, which falls within this category. The so-called ‘hesitation’ is not hesitation in modality, but hesitation in the manner of speaking. If the A(啊) in the above examples is removed, the topic must be pronounced with a rising tone, or a direct response must be given.
In that case, not only can the listener’s attention not be correctly indicated, but the listener’s cognitive state of inattention during the dialogue also cannot be represented. Interestingly, the listener is not always passively handed the right to speak. Instead, he or she may actively intervene in the dialogue through A(啊) before the speaker finishes the turn. Taking up the topic and actively striving for the right to speak suggests that the listener is eager to express his or her own ideas or wants more relevant information. In such cases, mid-sentence A(啊) cannot be deleted or replaced by other modal particles, as in:

(30) 1S: zhe-ge di-fang lian ge xiu ji-suan-ji de di-fang dou mei-you……
    1S: this CL place Conj CL repair computer Stru place Conj Neg……
    1S: There is not even a place to repair computers here……
    2R: xiu ji-suan-ji [Ya(呀)] (*Ba吧 *Ne呢 *Ma嘛), na ke-shi wo de qiang-xiang.
    2R: Repair computer A (*Ba *Ne *Ma) that be my Stru strong point.
    2R: Repairing computers? I’m good at repairing computers.

(31) 1S: ta zai mei-guo du shuo-shi……
    1S: he is in America study master’s degree……
    1S: He is studying for a master’s degree in America……
    2R: zai mei-guo [A(啊)] (*Ba吧 *Ne呢 *Ma嘛), na suo da-xue, wo ye shi zai mei-guo du de.
    2R: In America A (*Ba *Ne *Ma) which CL university I also be in America study Mod.
    2R: In America? Which university in America? I also studied in America before.

In example (30), the speaker’s turn is not over yet, and the speaker gives no signal of voluntarily giving up the right to speak. At this point, however, the listener interrupts the speaker and repeats the information that he or she is interested in, indicating that he or she has something to say about ‘repairing computers’. If A(啊) is removed from the sentence, ‘repairing computers’ has to be pronounced with a rising tone. Example (31) shows the same situation: the speaker’s turn is not finished, the listener becomes interested in the information ‘in America’ and forcibly intervenes in the dialogue through A(啊), which shows that he or she is eager to know more relevant information, such as ‘which university’.

5 Conclusion

From the perspective of syntactic position, A (啊) is located after theme and before rheme; from the perspective of information structure, it is the dividing point between important information and secondary information, and is located before important information; from the perspective of psychological cognition, whether to add A (啊) is decided by the need of expression, and the addition of A (啊), reflecting the speaker’s certain psychological demands, cognitive needs and pragmatic purposes, has obvious interpersonal pragmatic functions. The main conclusions are as follows:

(1) A (啊) in the sentence has a common function in all discourses, that is, it indicates pause and draws attention to the following. However, when it comes to conversations, the function of A (啊) in the sentence presents some characteristics in the interaction with the dialogue discourse.

(2) A (啊) in the sentence can endow the ‘objective quantity’ with subjectivity and has the function of meta-subjective quantity.

(3) When A (啊) is located after the vocative, the rheme is often the speech-act implemented by the speaker or the speaker’s subjective views to the listener. Given the forcefulness of rheme, the main function of A (啊) in the sentence is to strengthen emotion, help the speaker to build intimate pragmatic identity, promote harmonious interpersonal relationship and express strong intersubjectivity. The choice of A (啊) is the embodiment of Politeness Principle.

(4) ‘The extension of quantity’, ‘the enhancement of emotion’ and ‘the construction of pragmatic identity’ are all inseparable from the (inter)subjectivity of A (啊). In terms of the emotional tone of discourse, A (啊) in the sentence can carry further emotions from the speaker, such as affection and concern, while weakening the speaker’s illocutionary force, thus easing interpersonal relationship. These interpersonal and attitudinal functions are the result of the interaction between A (啊) and the dialogue.


Abbreviations

Asp.    Aspectual marker
CL.     Classifier
Conj.   Conjunction
Mod.    Modal particle
Neg.    Negation
Prep.   Preposition
Stru.   Structural auxiliary

References

1. Zhu, D.X.: Grammar Lecture, pp. 207–219. The Commercial Press, Beijing (1982). (in Chinese)
2. Zhao, Y.R.: A Grammar of Spoken Chinese. The Commercial Press, Beijing (1979). (in Chinese)
3. Li, X.Y.: The position of the modal particle ‘A, NE, BA’ in the sentence. J. Henan Univ. 2, 112–115 (1986). (in Chinese)
4. Chu, C.Z.: The analysis of modal particle’s meaning: taking ‘A(啊)’ as an example. Lang. Teach and Ling. Stud. 4, 39–51 (1994). (in Chinese)
5. Shen, J.X.: Unexplained topics: look at ‘topic–explain’ from ‘answers.’ Stud. Chin. Lang. 5, 345–365 (1989). (in Chinese)
6. Sun, R.J.: Modal particles in the sentence restrict the choice of syntactic position. J. Coll. Arts Nanjing Normal Univ. 3, 151–156 (2006). (in Chinese)
7. Lv, S.X.: Collection of Essays by Lv Shuxiang. Northeast Normal University Press, Shanghai (2005). (in Chinese)
8. Shi, Y.W.: Pauses and topics after the subject. J. Chin. Lang. 5, 134–161 (1995). The Commercial Press, Beijing. (in Chinese)
9. Qi, H.Y.: Mood Components Usage Dictionary in Modern Chinese. The Commercial Press, Beijing (2011). (in Chinese)
10. Li, C.N., Thompson, S.A.: Mandarin Chinese: A Functional Reference Grammar. University of California Press (1981)
11. Xu, L.J., Liu, D.Q.: Topic: Structural and Functional Analysis. Shanghai Educational Publication, Shanghai (2007). (in Chinese)
12. Zhang, B.J., Fang, M.: Research on Chinese Functional Grammar. The Commercial Press, Beijing (2014). (in Chinese)
13. Fang, M.: The functions of modal particles in Beijing dialect. Chin. Lang. 4, 129–138 (1994). (in Chinese)
14. Li, S.X.: Research on the expression of subjective quantity in Chinese. Ph.D. dissertation, Chinese Academy of Social Sciences (2003). (in Chinese)
15. Peng, L.Z.: Modality research in modern Chinese. Ph.D. dissertation, Fudan University (2005)
16. Wang, M.F.: The expressions of ‘Ni’A(你呀)’ as discourse marker of blame. Chin. Linguist. 2, 87–94 (2018). (in Chinese)
17. CDPU 1955, 1957: Examples of Function Words in Modern Chinese. The Commercial Press, Beijing (1952). (in Chinese)
18. Liu, Y.H., et al.: Practical Modern Chinese Grammar (Updated Edition). The Commercial Press, Beijing (2001). (in Chinese)
19. Wang, M.F.: On pragmatic function and formal verification of ‘Ba(吧).’ Chin. Teach. World 2, 32–43 (2018)
20. Lv, S.X.: Eight Hundred Words in Modern Chinese. The Commercial Press, Beijing (1980). (in Chinese)

Verb Meaning Representation Based on Structured Semantic Components

Long Chen and Weidong Zhan

Peking University, Beijing, China
{chenlong,zwd}@pku.edu.cn

Abstract. Meaning representation is a key task in computational linguistics and natural language processing. Many current meaning representation models fail to represent the accurate meanings of some sentences because of the flexibility of natural language meanings. This paper proposes a meaning representation scheme based on semantic components. Through a case study of the Chinese strike verbs za (smash) and tou (hurl), the paper shows the method of annotating the semantic components for the event participants of each instance of a verb in sentences. The actual meaning of each verb is summarized according to the annotation of semantic features. The change of verb meaning or the generation of new meaning is also explained more reasonably according to the semantic feature annotation proposed in this paper.

Keywords: Meaning representation · Structured semantic component · Semantic role · Dynamic word meanings · Lexical meaning decomposition

1 Introduction

The representation and computation of meanings are fundamental in natural language processing. They are designed to provide linguistic information for downstream NLP tasks such as machine translation and reading comprehension [1,2]. Meaning representation can be viewed as correlating the forms of natural language with the forms of a metalanguage that can help computers learn the mechanism of understanding meanings. For a natural language sentence, the metalanguage representing its meaning can take the form of logical expressions, semantic role labels, abstract meaning representations, frame semantics, etc. Despite their different forms, most of these metalanguages share a common way of representing the propositional meanings of natural language sentences. The propositional concept of a sentence is represented by the verb in the sentence and its semantic relationships with its theta-roles/semantic roles/frame elements, which are also words and phrases in the sentence. The semantic relationships are represented by semantic labels. This type of metalanguage is based on the semantic relationships between the words in the sentences. It fails to represent sentences like (1).

This paper is supported by Major Project of the ‘New Generation of Artificial Intelligence’ funded by Ministry of Science and Technology of China (Project NO. 2020AAA0106701).
© Springer Nature Switzerland AG 2022. M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 248–262, 2022. https://doi.org/10.1007/978-3-031-06703-7_19

(1) fazhan xinpian, guang za qian buxing, hai yao za ren.
    develop chip only smash money unable also need smash person
    The development of the chip technology needs not only much money but also much human effort.

Fig. 1. A way to represent the meaning of (1) in AMR

Take AMR [3] as an example of the type of metalanguage introduced above. If we use AMR, the meanings of the phrases za qian¹ and za ren in sentence (1) can be represented with the nouns qian (money) and ren (person) as the Arg1 or the Arg2 of the verb za, as shown in Fig. 1. The definition of the arguments, recorded in the frameset of Chinese PropBank [4], is shown in Fig. 2. According to the definitions, money and persons are the things being put down or the target of the event. However, the phrases mean investing much money and human effort rather than putting a person down on something or putting something down on someone. The meaning of za in the sentence has changed from its original meaning in Fig. 2. The meaning change of verbs, which is common in natural language, is challenging in natural language processing. This paper points out the flexibility of natural language meanings, analyzes the insufficiency of current meaning representation models based on lexical semantic relationships, and proposes a meaning representation model based on lexical meaning decomposition.

¹ In this paper, Chinese words transcribed in pinyin are in italic, semantic role labels are in boldface, and semantic components are in small capitals.


Fig. 2. The frameset of za

2 The Flexibility of Natural Language Meanings

Natural language meanings are flexible rather than fixed semantic units. Synchronically, the meanings of the different instances of a verb may differ in some semantic components. For example,

(2a) jiefei buting za men, shitu ru shi qiangjie.
     robber constantly smash door try enter room rob
     The robbers kept smashing at the door in order to rob the house.

(2b) tangmu buting za men, dan limian de ren bu ying.
     Tom constantly smash door but inside POSS person NEG answer
     Tom kept knocking at the door heavily, but people inside did not answer him.

The Chinese verb za means hitting something heavily, usually, but not necessarily, with the intention or result of damaging the object. The za in (2a) is the typical usage of za, where the intention of the action is to damage the men (door). The meaning of za in (2b) is slightly different from that in (2a). In (2b), the intention of za men is to draw the attention of other people in the room. In Chinese linguistic studies and lexical knowledge databases, the two instances of za belong to the same sense entry because the difference in their meanings is slight and they do not have distinct syntactic distributions. However, from the perspective of natural language processing, the semantic discrepancy between the two instances needs to be revealed. For example, the za men in (2a) can be translated as “smash the door”, while the za men in (2b) should be translated as “knock at the door heavily”. Therefore, describing the semantic discrepancy is helpful in machine translation.

Diachronically, a verb may develop meanings different from the original ones. For example,

(3) Huawei za qian yanfa xinpian.
    Huawei smash money develop chip
    Huawei invests a lot of money in the development of chip technology.


In (3), za qian means investing a lot of money. This is a newly-developed meaning of za and has not been recorded in Chinese dictionaries and lexical knowledge bases such as Chinese Contemporary Dictionary (CCD) or HowNet [5]. The ﬂexibility of word meanings is challenging in natural language processing. As to the above examples, the two instances of za in (2) belong to the same sense entry and the meaning of za in (3) is not recorded. Thus, knowledge-driven approaches to natural language processing fail to distinguish these meanings. Data-driven approaches may also fail to deal with these meanings accurately because the instances of za in (2a) have similar syntactic distributions to the instances in (2b), and the usages of newly-developed meanings are rare. Therefore, the understanding of the ﬂexible word meanings is a challenge to both knowledge-driven and data-driven approaches.

3 The Granularity and Structure of Semantic Role Systems

According to the above discussion, most current meaning representation models are based on lexical semantic relationships, most of which are argument-verb relationships. Therefore, this part discusses the properties of argument-verb relationships in some widely-used meaning representation models and examines their ability to represent flexible verb meanings. The properties of argument-verb relationships include at least two aspects: the granularity and the hierarchical structure.

The granularity of argument-verb relationships refers to the level of generality of semantic roles. For example, Yuan [6] categorized semantic roles into three levels of granularity, i.e. the macro-level, the meso-level, and the micro-level. According to his categorization, macro-level semantic role systems contain a few semantic roles. For example, the semantic role system of Chinese PropBank is a macro-level one. The semantic roles of za in Chinese PropBank are shown in Fig. 2. Meso-level semantic role systems contain about twenty semantic roles. For example, the semantic role systems of the Guidelines for Modern Chinese Predicate Semantic Role Labeling of Peking University [7] and the PKU NetBank [6] are meso-level ones. The semantic roles of za at this level of granularity include agent, patient, and instrument. Micro-level semantic role systems contain hundreds of semantic roles. For example, the semantic role system of Chinese FrameNet [8] is a micro-level one. In Chinese FrameNet, za has sixteen frame elements, including Agent, Cause, Impactee, Impactor, Impactors, Force, Instrument, Manner, Means, Period of Iterations, Place, Purpose, Result, Speed, Subregion, and Time.

There are two types of hierarchical structure in semantic role systems, i.e. the list structure and the tree structure. In list-structured semantic role systems such as those of Chinese PropBank and Chinese FrameNet, a semantic role has no semantic relationship with any other semantic role. In tree-structured semantic role systems such as that of PKU NetBank, the semantic similarity of two semantic roles can be estimated from the hierarchy. For example, in PKU NetBank, patient and target are both object roles and are core semantic roles, and instrument belongs to the non-core semantic roles. The granularity and structure of the semantic role systems in some Chinese semantic knowledge bases are listed in Table 1.

Table 1. The granularity and structure of the semantic role systems in some Chinese semantic knowledge bases

                 Macro-level        Meso-level     Micro-level
List structure   Chinese PropBank                  Chinese FrameNet
Tree structure                      PKU NetBank
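The contrast between list and tree structures can be made concrete with a small sketch. It assumes a toy hierarchy in which only the placements stated above are taken from the text (patient and target under the object roles, which are core; instrument under non-core); the node names, the `PARENT` table, and the `shared_depth` measure are hypothetical and are not the actual PKU NetBank inventory or a similarity metric from the paper.

```python
# Toy role hierarchy: child -> parent. Only the placements of patient, target
# and instrument follow the text; the node names themselves are illustrative.
PARENT = {
    "core": "role",
    "non-core": "role",
    "object": "core",
    "patient": "object",
    "target": "object",
    "instrument": "non-core",
}

def ancestors(node):
    """Return the path from a node up to the root, node first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def shared_depth(a, b):
    """Depth (root = 0) of the lowest node that dominates both roles."""
    above_b = set(ancestors(b))
    lowest_common = next(n for n in ancestors(a) if n in above_b)
    return len(ancestors(lowest_common)) - 1

print(shared_depth("patient", "target"))      # 2
print(shared_depth("patient", "instrument"))  # 0
```

In a list-structured system there is no `PARENT` table at all, so every pair of roles would receive the same score and no similarity could be read off the inventory.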

Intuitively, the finer-grained a semantic role system is and the more elaborate its structure is, the more effective it will be in describing argument-verb relationships. However, the current semantic role systems are still not sufficient to describe the flexibility of word meanings. For example, Chinese FrameNet is the finest-grained semantic role system, but the frame elements in Chinese FrameNet cannot distinguish the different meanings of za men in (2a) and (2b) and cannot describe the meaning of za qian in (3). The PKU NetBank has the most elaborate structure, but in sentence (4) it is hard to determine whether yi ge jiuping (a wine bottle) is a patient or an instrument, which indicates that patient and instrument are sometimes similar. This similarity cannot be reflected by the tree structure of PKU NetBank. In short, to describe flexible verb meanings, the semantic representation model needs a finer granularity and structure.

(4) lvke ba yi ge jiuping henhen de za zai zhantai shang.
    traveller ba one CL wine.bottle violently ADVM² smash at platform on
    The passenger smashed the wine bottle down on the platform.

² CL and ADVM refer to classifier and adverbial marker.

4 The Meaning of za Represented in Structured Semantic Components

As discussed above, verb meanings must be fully decomposed in order to reflect the semantic differences among the instances of a verb. Therefore, this section proposes a meaning representation method based on structured semantic components and applies this method to the Chinese verb za to demonstrate its competence in representing flexible verb meanings.

4.1 The Semantic Components of za

The semantic components of za can be acquired either from previous work on verb meanings and semantic roles or from semantic knowledge bases. For example, in the lexical analysis of za, Jiang [9] decomposed the meaning of za into instrument: heavy object, intensity: heavy, target: human or object; in research on semantic roles, Dowty [10] characterized semantic roles by semantic features such as volition, movement, being affected, etc.; Yuan [11] distinguished semantic roles by semantic features such as volition, causation, change, etc. Among semantic knowledge bases, the Semantic Knowledge-Base of Contemporary Chinese of Peking University [12] includes three categories and thirteen subcategories of verbs, such as change, exhaustion, creation, etc. These subcategories are classified according to the semantic attributes of verbs, so they can also be regarded as semantic components. HowNet also contains many sememes that describe the meaning of nouns and verbs, which are also a type of semantic component. Based on the above research and the sememes in HowNet, sixteen semantic components are selected to describe the meanings of za. Their correlation with the coarser-grained arguments of za is listed in Table 2.

Table 2. The semantic components of za

Semantic roles   Semantic components
Agent            human, volitional, non-volitional, forceful, manual
Patient          concrete, abstract, existing, appearing, broken, worsened, developing
Instrument       moving, exhaustive, heavy, much

These semantic components can be used to describe the meanings of the events expressed by za. For example,

(2a) jiefei buting za men, shitu ru shi qiangjie.
     robber constantly smash door try enter room rob
     The robbers kept smashing at the door in order to rob the house.

The meaning of the event expressed by za can be represented as in Fig. 3. The semantic components of jiefei (robber) in (2a) include human, volitional, forceful, and manual, which are typical semantic components of the agent of za. The semantic components of men (door) in (2a) include concrete, existing, and broken, which are typical semantic components of the patient of za.

Fig. 3. The meaning of the event of za in (2a)

(2b) tangmu buting za men, dan limian de ren bu ying.
     Tom constantly smash door but inside POSS person NEG answer
     Tom kept knocking at the door heavily, but people inside did not answer him.

Fig. 4. The meaning of the event of za in (2b)

The meaning of the event expressed by za in (2b) can be represented as Fig. 4. The semantic components of tangmu(Tom) are the same as the ones of jiefei (robber) in (2a). The semantic components of men(door) do not include broken, which is diﬀerent from the men(door) in (2a).

(3) Huawei za qian yanfa xinpian.
    Huawei smash money develop chip
    Huawei invests a lot of money in the development of chip technology.

The meaning of the event expressed by za in (3) can be represented as Fig. 5. The semantic components of Huawei do not include forceful and manual, which is diﬀerent from the ones of the agents in (2a–b). The semantic components of yanfa xinpian(develop chip technology) include abstract and developing, which are signiﬁcantly diﬀerent from the ones of typical patients of za. The semantic components of qian(money) include abstract, exhaustive, and much, which are also signiﬁcantly diﬀerent from the ones of typical instruments of za.


Fig. 5. The meaning of the event of za in (3)
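The annotations of (2a), (2b) and (3) can be sketched as plain data to show how the component sets expose the differences. This is a minimal illustration, assuming each participant is stored as a set of component labels keyed by a coarse role; the `EVENTS` layout and the helper `diff` are hypothetical, not the paper's annotation format. Two entries are assumptions rather than statements from the text: the door in (2b) is taken to keep every component of (2a) except broken, and Huawei is taken to keep human and volitional.

```python
# Event participants of za, each described by a set of semantic components.
EVENTS = {
    "2a": {
        "agent": {"human", "volitional", "forceful", "manual"},
        "patient": {"concrete", "existing", "broken"},
    },
    "2b": {
        "agent": {"human", "volitional", "forceful", "manual"},
        # Assumption: same as (2a) except that "broken" is absent.
        "patient": {"concrete", "existing"},
    },
    "3": {
        # Assumption: the text only says "forceful" and "manual" are absent.
        "agent": {"human", "volitional"},
        "patient": {"abstract", "developing"},
        "instrument": {"abstract", "exhaustive", "much"},
    },
}

def diff(event_a, event_b, role):
    """Return (components only in event_a, components only in event_b)."""
    a = EVENTS[event_a].get(role, set())
    b = EVENTS[event_b].get(role, set())
    return a - b, b - a

print(diff("2a", "2b", "patient"))  # ({'broken'}, set())
```

Comparing `diff("2a", "3", "agent")` in the same way isolates forceful and manual, the two components the text reports as missing for Huawei.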

The above examples demonstrate the annotation of the meanings of the events expressed by za. This method can reflect the semantic discrepancy between the two instances of men (door) in (2a–b) and can describe the meanings of atypical event participants such as qian (money) in (3). The method is applied to a corpus of za: fifty sentences containing za, selected from the CCL corpus [13], are annotated in this way. Each sentence contains one or several participants. The annotation result is shown in Table 3. The semantic commonalities of different instances of za’s event participants can be derived from the annotation, and the semantic roles of za can then be summarized. For example, in Table 3, the semantic attribute patterns of the first row and the tenth row differ only in the attributes heavy and much, so the instances with these two patterns can be assigned to one semantic role. Likewise, the patterns of the third row and the eighth row can be merged into one semantic role. The semantic roles derived in this way are shown in Table 4. The agent, instrument, and hit object are similar to the Arg0-2 listed in Fig. 2. The worsened object appears in instances such as za le zhaopai (ruined the reputation) and ba haoshi ban za le (did a bad job). The causer co-occurs with the worsened object and is the subject that non-volitionally causes the worsened object to be worsened. The result is the new entity produced in the event of za, such as the bing kulong (ice hole) in za bing kulong (break the ice to form an ice hole). The investment and the invested object are the semantic roles used when za means investing. Besides the typical semantic roles, za also has some atypical semantic roles. For example,


Table 3. The semantic components of the event participants of za in the corpus and the number of their instances

[Table body: one row per attested component pattern, with + marks under the components a to p. Numbers of instances per row, in row order: 18, 15, 14, 7, 6, 5, 4, 4, 4, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1.]

Components: a. human; b. volitional; c. non-volitional; d. forceful; e. manual; f. concrete; g. abstract; h. existing; i. appearing; j. broken; k. worsened; l. developing; m. moving; n. exhaustive; o. heavy; p. much

Table 4. The semantic roles of za, their semantic components, and the number of their instances

[Table body: one row per semantic role (hit object, agent, instrument, result, destroyed object, causer, invested object, investment, hit object (atypical), instrument (atypical)), with +, − or +/− values under the components a to p and the number of instances of each role.]

(4a) renren dou pa shuye luo xialai za po tou, kandao fubai xianxiang buwenbuwen.
     everyone all fear leaf fall down smash broken head see corruption phenomenon indifferent
     Everyone fears that their heads will be smashed by fallen leaves and chooses to be indifferent to corruption.

(4b) zhe laotouzi gei ta za diji.
     DEM.DIST old.man give 3sg smash foundation
     This old man is smashing the foundation of the house for him.

In (4a), the shuye (leaf) is an atypical instrument because it is not heavy. The semantic attribute broken of tou (head) is profiled, so the meaning of the instrument can differ from typical ones. In (4b), the diji (foundation) is an atypical hit object because it is not broken. The semantic attribute forceful of zhe laotouzi (this old man) is profiled in this sentence. The annotation of the semantic attributes can reflect the accurate meaning of the atypical semantic roles.

The sense entries of za can be derived from the co-occurrence of its semantic roles. In the corpus of za, the hit object co-occurs with the instrument, the result, and the agent; the worsened object co-occurs with the causer; the investment co-occurs with the invested object and the causer. The three co-occurrence patterns correspond to three sense entries. The first can be summarized as “hitting an object heavily with a heavy object”, which is the original and most commonly used meaning of za; the second as “destroying abstract properties”; the third as “investing”. The second and the third sense entries are usually not recorded in lexical knowledge bases and dictionaries. The correspondence between za’s sense entries and semantic roles, and between semantic roles and semantic components, is shown in Fig. 6. Only the typical semantic roles and part of the semantic components are shown in Fig. 6.

Fig. 6. The sense entries, semantic roles, and semantic components of za
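The step from role co-occurrence to sense entries can be sketched as a lookup. This is a minimal illustration under stated assumptions: each sense is keyed by one anchor role plus the companion roles the text reports for it, and the `SENSES` table and the function `sense_of` are hypothetical names, not part of the authors' procedure. It does not cover usages in which the anchor role is absent, such as an instrument appearing alone.

```python
# (sense gloss, anchor role, roles reported to co-occur with the anchor)
SENSES = [
    ("hitting an object heavily with a heavy object",
     "hit object", {"agent", "instrument", "result"}),
    ("destroying abstract properties",
     "worsened object", {"causer"}),
    ("investing",
     "investment", {"invested object", "causer"}),
]

def sense_of(roles):
    """Return the sense whose anchor and companion roles cover `roles`."""
    roles = set(roles)
    for gloss, anchor, companions in SENSES:
        # The anchor must be present and nothing outside the pattern may occur.
        if anchor in roles and roles - {anchor} <= companions:
            return gloss
    return None

print(sense_of({"agent", "hit object"}))
print(sense_of({"investment", "invested object", "causer"}))
```

A role set that mixes patterns, for example an agent together with an investment, matches no entry and returns `None`, which is one way to flag candidates for a new or atypical usage.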


The sense entries can be divided further according to their syntactic distributions. For example, for the first sense entry of za, the usage where the agent appears can be defined as “hitting something with a heavy object”; the usage where the instrument appears in the sentence but the agent does not can be defined as “a heavy object falling down”; the usage where the hit object appears alone can be defined as “be broken”. The above usages correspond to the first and the second sense entries of za in CCD. The sense entries derived by the method proposed above are more accurate and conform better to the use of za in natural language than the ones in dictionaries. For example, the third sense entry in this paper is not included in CCD; the second sense entry in this paper is defined as “destroying an abstract object”, but in CCD it is defined as “failing”, which is a unary verb, indicating that only the worsened object appears in the sentence. However, in the corpus of za, there are only five usages where the worsened object appears alone but nine usages where the worsened object co-occurs with the causer.

4.2 The Hierarchical and Metaphoric Structure of the Semantic Component System

The semantic relationships between the semantic components can be further discovered. The relationships include the entailment relationship and the metaphoric relationship. The entailment relationship can be discovered either from common knowledge or from the annotation. For example, if a participant has the semantic component forceful, he must also have the semantic components human and volitional, which means forceful entails human and volitional. In the annotation of the event participants, if a semantic component a always co-occurs with another semantic component b, it can be inferred that a entails b. For example, the column d in Table 3 always co-occurs with the columns a and b, indicating that forceful entails human and volitional. The metaphoric relationship can be discovered from the development of word meanings. For example, in the development from the ﬁrst sense entry of za to the second and third sense entries, the semantic components concrete and broken of the hit object turned into the abstract and worsened of the worsened object; the heavy of the instrument turned into the much of the investment. In linguistic research, the metaphoric relationship between concrete and abstract is often mentioned [14,15], but the metaphoric relationships between more speciﬁc semantic components such as heavy and much is less noticed systematically. The entailment and metaphoric relationships between semantic components are demonstrated in Fig. 7.


Fig. 7. The hierarchical and metaphoric structure of the semantic component system
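The co-occurrence test for entailment described above (if a component a always co-occurs with b, then a entails b) can be sketched as follows. The function name `entailments` and the five toy patterns are hypothetical; the patterns are illustrative and are not rows of Table 3.

```python
def entailments(patterns):
    """Return pairs (a, b) such that every pattern containing a also contains b."""
    components = set().union(*patterns)
    found = set()
    for a in components:
        with_a = [p for p in patterns if a in p]
        for b in components - {a}:
            if all(b in p for p in with_a):
                found.add((a, b))
    return found

# Illustrative component patterns (assumed, not taken from the corpus).
patterns = [
    {"human", "volitional", "forceful", "manual"},
    {"human", "volitional"},
    {"human", "non-volitional"},
    {"concrete", "existing", "broken"},
    {"concrete", "existing"},
]

found = entailments(patterns)
print(("forceful", "human") in found)    # True
print(("human", "volitional") in found)  # False
```

On a sample this small the test also returns accidental pairs (here concrete and existing entail each other), so in practice the inferred pairs would presumably need to be checked against common knowledge, the other source of entailments named in the text.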

The structure of the semantic component system can reflect the significance of different semantic components. For example, concrete, abstract, and human are higher-level semantic components. They are more general in meaning. If their values change in the development of a verb’s meaning, the values of many other semantic components may also change, and the meaning of the verb may change significantly. For example, when the concrete of the hit object turns into the abstract of the worsened object, the values of many other semantic components, such as broken and human, also change. Forceful and manual are lower-level semantic components. They are more specific in meaning and can distinguish a language unit from the others. If two event participants differ in the lower-level semantic components, they should be categorized as different semantic roles. For example, the agent and the causer differ only in the values of forceful and manual, but their co-occurrence with other semantic roles also indicates that they should be treated as different semantic roles.

The structure of the semantic component system can also describe and predict the development of verb meanings. For example, the Chinese verb kang originally means carrying heavy objects on one’s shoulders. Recently, it has developed the meaning of undertaking many abstract objects such as debt, blame, or responsibilities. For example,

(5) yuehan wei qizi kang xia wubaiwan zhaiwu.
    John for wife shoulder under five.million debt
    John took on $5 million in debt for his wife.

This meaning of kang has also not been recorded in CCD, but it can be predicted by the structure of the semantic component system.
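How such a prediction might work can be sketched with the metaphoric links named earlier for za (concrete to abstract, broken to worsened, heavy to much). The `METAPHOR` table and the function `extend` are hypothetical, and the component set given for the object carried by kang is an assumption made for illustration, not an annotation from the paper.

```python
# Metaphoric links between components, as described for za.
METAPHOR = {
    "concrete": "abstract",
    "broken": "worsened",
    "heavy": "much",
}

def extend(components):
    """Map each component to its metaphoric counterpart where one exists."""
    return {METAPHOR.get(c, c) for c in components}

# Assumed components of the object carried in the original sense of kang.
carried_object = {"concrete", "heavy"}
print(sorted(extend(carried_object)))  # ['abstract', 'much']
```

The predicted pattern, abstract and much, matches the kind of object in (5), a large debt, which is the sense in which the extended meaning could be anticipated from the component structure.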

5 The Application of the Structured Semantic Components on Other Verbs

The above method can describe the meanings of other verbs and explain and predict their meaning development. For example, the Chinese verb tou has eight sense entries in CCD, two of which are close to za in meaning. One is throwing something at a target. It is similar to the first sense entry of za in that it also expresses hand motions. The other is putting something into a place. It is similar to the third sense entry of za when it co-occurs with qian (money). The meaning of tou cannot be accurately expressed by the sixteen semantic attributes listed above alone. Five more semantic components need to be added: q. aloft, r. from, s. arrive, t. give, u. get. The latter four are binary semantic components indicating the relationships between different event participants. The meanings of some semantic roles of tou are listed in Tables 5 and 6. The other semantic roles and semantic components are omitted from the tables because they are rarely used and not relevant to the meaning of za.

Table 5. The unary semantic components of the semantic roles of tou

[Table body: one row per semantic role (thrower, thrown object, thrown target, mover, moved object, moved target, giver, given object, receiver, starting point, finishing point), with +, − or +/− values under the components a, b, c, d, e, f, g, h, m, q.]

The unary semantic components can distinguish most semantic roles of tou, but they cannot distinguish the moved object and given object, and the starting point and finishing point. Binary semantic components are needed to help distinguish these semantic roles. The binary semantic components of some pairs of semantic roles are listed in Table 6. Three usages of tou can be summarized from the semantic components. The ﬁrst is throwing, which is its original meaning. The second is moving and the third is transferring. The structure of the semantic component system can be extended from the meaning of tou. For example, aloft entails moving, from and arrive entail concrete, and there are metaphoric relationships between arrive and get, from and give. A part of the extended structure of semantic component system is shown in Fig. 8.

Verb Meaning Representation Based on Structured Semantic Components


Table 6. The binary semantic components of some semantic roles of tou role1

role2

Semantic components r s t u

starting point thrown object + ﬁnishing point thrown object moved target

moved object

giver

given object

receiver

given object

+ + +

+ +

+

Fig. 8. The hierarchical and metaphoric structure of the extended semantic component system

6 Conclusive Remarks

Verb meanings are ﬂexible. Diachronically, the meaning of a verb may change; synchronically, the meaning of a verb may be skewed from its canonical meaning according to the context. However, current meaning representation models that are based on semantic relationships between words and their lexical knowledge bases are static and coarse-grained. Only a part of the canonical verb meanings are recorded in the knowledge bases. Therefore, to help the static meaning representation model capture the dynamic verb meanings, the argument-verb relationships in the model must be ﬁne-grained enough. In this paper, representing the meanings of the event participants in semantic components proves to be a feasible approach. This paper takes the Chinese verb za as an example, demonstrates the ﬂexibility of its meaning, and labels some instances of the verb in semantic components. The annotation indicates that the semantic representation based on semantic components can reveal the semantic discrepancy between diﬀerent semantic role instances and describe the meaning of atypical semantic roles. The semantic roles and sense entries of za can be summarized from the annotation. The method can also apply to other verbs such as tou.


L. Chen and W. Zhan

The method is still under theoretical discussion, and its eﬀect needs to be proved by more annotated data on a variety of verbs. The selection and deﬁnition of the semantic components are also to be explored.


Activation of Alternatives by Mandarin Sentence-Initial and Sentence-Internal Foci: A Semantic Priming Study

Tsun-Ming Ma, Yu-Yin Hsu(✉), Tianyi Han, and Daria Tack

The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR
[email protected], {yu-yin.hsu,tian1han}@polyu.edu.hk

Abstract. This paper reports the results of a semantic priming study on the activation of semantic alternatives by focus phrases in Mandarin introduced by the focus operators zhiyou ‘only’ and lian ‘even’ in two sentence positions (initial and internal). We used a lexical semantic priming paradigm to study how mature native readers of Mandarin Chinese respond to three types of semantic association (alternative, non-contrastive, and unrelated), measured through ratings on 7-point Likert scales. According to Rooth’s Alternative Semantics, focus units semantically activate a set of alternative units. Our results show that the activation of semantic alternatives was indeed substantially more prominent for words occurring in focus constructions than for words occurring in canonical, non-focus sentences. However, neither the focus type nor the focus position had specific effects on the activation of alternatives.

Keywords: Semantic priming · Focus alternative · zhiyou ‘only’ · lian ‘even’ · Sentence-initial focus · Sentence-internal focus · Mandarin

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 263–270, 2022. https://doi.org/10.1007/978-3-031-06703-7_20

1 Introduction

In recent decades we have seen rapid growth in the number of studies on sentence comprehension (for a review, see [1]). It has been reported that both prosodic and syntactic structures can play critical roles in the process of language comprehension, especially when they convey information about focus. During communication, the speaker and the listener construct a common ground, and focus – an essential part of information structure – highlights the meanings of specific constituents that update this common ground [2]. Focus plays an important role in information packaging, such as highlighting a discourse’s new information. According to Gundel [3], there are two main types of focus, i.e., semantic and contrastive (the former also being referred to as information focus in Kiss [4]); semantic focus provides new information in the proposition, whereas contrastive focus marks the constituent that is emphasized and/or contrasted in the utterance. Both types of focus benefit the processing of such highlighted words, as well as the understandability of the sentence [5].

According to Rooth’s [6] Alternative Semantics framework, focus units activate a set of alternative words, indicating additional semantic implications. That is, focus’s meaning functions as a variable, referring to a set of alternative words that are semantically related and syntactically plausible in a given syntactic proposition. For example, the sentence in (1) could activate a set of alternative things that ‘I ate’, one member of which is the focus-marked constituent (e.g., 苹果 ‘apple’), when the word 苹果 ‘apple’ is prosodically emphasized. For Rooth [6], a focus phrase in a sentence can activate the meaning not only of the focalized constituent, but also of its associated alternatives in a set.

(1) 我吃了[苹果]focus
‘I ate an APPLE.’

The sentence in example (2) contains a focus particle 只有 ‘only’ and its focus-marked constituent 苹果 ‘apple’. Given the exclusive function of 只有 ‘only’, the focus-marked constituent belongs to a set of alternative words, but from that set, ‘apple’ is the only valid one in this context.

(2) 只有[苹果]focus被我吃了。
‘Only the APPLE was eaten by me.’

In the study of syntactic focus in Chinese, considerable attention has been paid to the structural formation and semantic interpretations of sentences with focus particles (such as zhiyou ‘only’ and lian ‘even’). Few studies, however, have investigated how native speakers comprehend such focus sentences, and fewer still have looked at whether different types of focus sentences are processed differently [7]. Another interesting language-processing phenomenon that has yet to be studied is whether focus phrases located in the sentence-initial position and those in sentence-internal positions trigger the same focus effects, despite foci located in both these positions having been examined cross-linguistically [8–11]. Therefore, the present paper adopts a semantic-priming paradigm to study the activation of alternatives to Mandarin only-marked focus nouns (zhiyou-nouns) and even-marked focus nouns (lian-nouns) that are located either sentence-initially or sentence-internally.
Mandarin has another only-focus particle, 只 zhi, but it is adverbial and does not occur sentence-initially [12], so it will not be dealt with here. More specifically, the aim of the current study is to explore differences and similarities in the semantic functions of foci located in the two above-mentioned sentential positions; to advance our knowledge of information structure and the effects of sentence types; and to shed light on how native-speaking adults come to comprehend these focus sentences. As such, it will attempt to answer the following three research questions:
1) Are focus phrases more likely to activate alternative associations than non-focus phrases?
2) Do only- and even-foci perform differently in terms of their triggering of alternative semantic associations?
3) Do initial vs. internal positions of focus phrases in a sentence perform differently in terms of the activation of semantic alternatives?


2 Methods

2.1 Participants and Materials

In this study, we recruited 80 students from the Hong Kong Polytechnic University who were native speakers of Mandarin Chinese. Their mean age was 23.4 (±3.85 SD), and 76 were female. None reported having any reading difficulties. The experimental items consisted of prime words, their associated target words and the carrier sentences of the prime words (referred to as prime sentences). The target words comprised three types of semantic association with each prime word; all prime and target words were adopted from the experimental stimuli previously used by Yan and Calhoun [5]. As shown in Table 1, the three target-word types were 1) alternative (i.e., semantically related and replaceable words in the target sentence, such as captain and sailor); 2) non-contrastive associate (i.e., semantically related but unreplaceable words in the target sentence, such as captain and deck); and 3) unrelated (i.e., semantically unrelated words, such as captain and pumpkin). We also modified Yan and Calhoun’s [5] semantic-priming paradigm by adding a preceding-context sentence for each target sentence, to provide general background information for it.

Table 1. Example experimental sentences and their associated target words

Context sentence: 军队开支紧缩需要重新安排人力 ‘The budget of the army is tight, [and they] need to rearrange manpower.’

Sentence type | Prime sentence | Semantic type | Target word
Canonical | 军官解雇了船长 ‘The officer fired the captain.’ | Alternative | 水手 ‘sailor’
 | | Non-contrastive | 甲板 ‘deck’
 | | Unrelated | 南瓜 ‘pumpkin’
Initial-only | 只有船长,军官解雇了 ‘The officer fired only the captain.’ | Alternative | 水手 ‘sailor’
 | | Non-contrastive | 甲板 ‘deck’
 | | Unrelated | 南瓜 ‘pumpkin’
Internal-only | 军官只有船长解雇了 ‘The officer fired only the captain.’ | Alternative | 水手 ‘sailor’
 | | Non-contrastive | 甲板 ‘deck’
 | | Unrelated | 南瓜 ‘pumpkin’
Initial-even | 连船长,军官都解雇了 ‘The officer fired even the captain.’ | Alternative | 水手 ‘sailor’
 | | Non-contrastive | 甲板 ‘deck’
 | | Unrelated | 南瓜 ‘pumpkin’
Internal-even | 军官连船长都解雇了 ‘The officer fired even the captain.’ | Alternative | 水手 ‘sailor’
 | | Non-contrastive | 甲板 ‘deck’
 | | Unrelated | 南瓜 ‘pumpkin’

The square brackets indicate words that are silent in the Chinese examples, and underlined words indicate the intended prime words.
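Table 1 shows the 15 sentence-type and target-type combinations for a single prime word. As a rough check on the item counts given in this section (36 prime words, 540 items, 12 lists), the crossing can be enumerated as in the Python sketch below. The variable and function names are hypothetical, and the sketch only counts items; it does not reproduce the Latin-square assignment of items to blocks and lists, whose details are not given.

```python
from itertools import product

# Condition labels as they appear in Table 1.
SENTENCE_TYPES = ["Canonical", "Initial-only", "Internal-only",
                  "Initial-even", "Internal-even"]
TARGET_TYPES = ["Alternative", "Non-contrastive", "Unrelated"]

N_PRIMES = 36  # number of prime words reported in the study
N_LISTS = 12   # number of presentation lists reported in the study


def enumerate_items(n_primes=N_PRIMES):
    """Cross every prime word with every sentence type and target type."""
    primes = range(1, n_primes + 1)
    return list(product(primes, SENTENCE_TYPES, TARGET_TYPES))


items = enumerate_items()
items_per_prime = len(SENTENCE_TYPES) * len(TARGET_TYPES)
items_per_list = len(items) // N_LISTS
```

With these figures, each prime word yields 15 items, the full crossing yields 540 items, and an even split over 12 lists gives 45 experimental items per list, which agrees with the counts reported for the design.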

After double-checking the semantic association between, and potential interpretations of, each prime and target pair of words used in Yan and Calhoun [5], we selected 36 nouns as prime words, controlled for animacy: half were animate and half inanimate. The prime word was the object (with the focus particle) in all sentence types. Each of the 36 prime words was then used to form five types of prime sentence: 1) in canonical SVO word order (Canonical); 2) with zhiyou ‘only’ focus in the initial position (Initial-only); 3) with zhiyou ‘only’ focus in the internal position (Internal-only); 4) with lian ‘even’ focus in the initial position (Initial-even); and 5) with lian ‘even’ focus in the internal position (Internal-even). Thus, each prime word formed a set of 15 items in a 5 (sentence types) × 3 (target types) design, and there were 540 items in total. A Latin square design was then used to divide all experimental items into 24 blocks, pairs of which formed 12 lists, each containing 45 experimental items and 47 filler items with structures unrelated to the current study. Six likewise unrelated practice items were also included at the beginning of the experimental session. Potential semantic interference within each block was checked and reduced in the following two ways. First, the verb and nouns in each sentence were confirmed not to have any semantic relation with either the prime or target word in other experimental sentences; and second, each list was further divided into four segments, and if any two words in a list might form a semantically related pair, the corresponding items were distributed into separate segments.

2.2 Procedure

The experiment was hosted online using Qualtrics (www.qualtrics.com). After reading the instructions, consenting to participate, and completing a questionnaire about their personal information (student number, age, gender, etc.), the participants had a practice session with the six trial items mentioned above, in which they were instructed to carefully read and attempt to thoroughly understand the meaning of each sentence before moving to the next one. In the main experiment section, the participants first read a context sentence, followed by a prime sentence that described an event (see Fig. 1). Then, they clicked an arrow button (→) to move to the next page. On that page, they were shown a target word and asked to rate the semantic relation between it and the previous sentence, on a scale from 1 = “no semantic relation” to 7 = “a strong semantic relation”. Having provided their ratings, they clicked → again to start the next trial. At the end of the main section, the participants were asked an open-ended question about any questions or problems they had encountered during the experiment. All completed the experiment in 20 to 30 minutes.


Fig. 1. Example experimental item. The context sentence means ‘Lili wanted to play with a large computer-game machine’. The prime sentence means ‘Only coins can turn on this machine’. On the next page, the target word 纸币 means ‘paper money’

2.3 Measurements

Participants’ ratings of the semantic relation between the target words and the prime sentences were re-coded as High (4–7) or Low (1–3). Six participants’ responses were excluded prior to analysis because they had assigned a score of 4 or above to more than four of the 15 unrelated control items. Valid data were analyzed using the glmer function of the lme4 package in R ([13] version 4.0.2), with the predictors being SENTENCETYPE (Canonical control, only, and even), FOCUSPOSITION (initial and internal), and TARGETWORD (alternative, noncontrastive, and unrelated). The random intercepts in the analyses were PARTICIPANTS and ITEMS [14], and model evaluation was conducted using log-likelihood tests.
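The re-coding and exclusion rules described above can be sketched in a few lines. The Python sketch below is an illustration under stated assumptions, not the analysis script used in the study (which was run in R): the function names `recode` and `should_exclude` are hypothetical, and ratings are assumed to be integers from 1 to 7.

```python
def recode(rating):
    """Map a 7-point rating to the binary code: High (4-7) or Low (1-3)."""
    if not 1 <= rating <= 7:
        raise ValueError(f"rating out of range: {rating}")
    return "High" if rating >= 4 else "Low"


def should_exclude(unrelated_ratings, max_high=4):
    """Return True if a participant rated more than `max_high` of their
    unrelated control items at 4 or above."""
    n_high = sum(1 for r in unrelated_ratings if recode(r) == "High")
    return n_high > max_high


# Hypothetical participant: 15 unrelated items, five of them rated 4 or above.
example = [1, 2, 1, 4, 5, 1, 1, 6, 2, 4, 1, 1, 7, 2, 1]
excluded = should_exclude(example)
```

Here the hypothetical participant exceeds the threshold of four High ratings on unrelated items, so `excluded` is `True` and their responses would be dropped before model fitting.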

3 Results

We compared the participants’ ratings of semantic relations across the three levels of SENTENCETYPE, the three relation types of TARGETWORD, and the two levels of FOCUSPOSITION. Table 2, which summarizes the percentages of High and Low ratings in all conditions, indicates that the ratings of foci in different positions were quite similar. The percentages of Low ratings assigned to Unrelated words were roughly 80% or higher across all sentence types (79.73%–86.55%), suggesting that the sampled native speakers of Mandarin were highly aware of the semantic relations between the words they were shown. However, there was an interesting difference between the canonical sentences and the focus sentences. In all four types of focus sentences, the percentages of High ratings were the highest (i.e., all above 68%) for the Alternative type, and only slightly lower for only-sentences (around 68%) than for even-sentences (around 71%). In the canonical sentences, in contrast, the Non-contrastive type received the highest percentage of High ratings (66.22%), although the difference in the percentages of High ratings between canonical and focus sentences was small (see Table 2).

Table 2. Percentage of High/Low responses in each condition

Sentence type | Focus position | Rating | Unrelated | Non-contrastive | Alternative
Canonical | Canonical | Low | 81.45% | 33.78% | 41.26%
 | | High | 18.55% | 66.22% | 58.74%
zhiyou ‘Only’ | Initial | Low | 80.18% | 36.04% | 31.53%
 | | High | 19.82% | 63.96% | 68.47%
 | Internal | Low | 79.73% | 37.39% | 31.53%
 | | High | 20.27% | 62.61% | 68.47%
lian ‘Even’ | Initial | Low | 86.55% | 38.29% | 28.51%
 | | High | 13.45% | 61.71% | 71.49%
 | Internal | Low | 82.88% | 37.84% | 28.83%
 | | High | 17.12% | 62.16% | 71.17%
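The model comparisons reported in this section are log-likelihood tests, whose statistics (for example, 1.3136 on 2 degrees of freedom for sentence type) are conventionally referred to a chi-square distribution. As a sanity check, and assuming that each reported statistic follows a chi-square distribution with the stated degrees of freedom, the p-values can be recomputed with the closed-form survival function for even degrees of freedom. The function name below is hypothetical, and the sketch uses only the Python standard library.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-square variable with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Statistics and degrees of freedom as reported in the Results section.
p_sentence_type = chi2_sf_even_df(1.3136, 2)
p_focus_position = chi2_sf_even_df(1.0493, 2)
p_target_type = chi2_sf_even_df(676.48, 2)
p_interaction = chi2_sf_even_df(9.614, 4)
```

Under that assumption, the recomputed values agree with the reported p-values to rounding: about 0.5185 and 0.5918 for the two non-significant main effects, a value far below 0.001 for target type, and about 0.047 for the interaction.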

The statistical results confirmed our observations. The main effects of sentence type (χ²(2) = 1.3136, p = 0.5185) and focus position (χ²(2) = 1.0493, p = 0.5918) were both non-significant. However, there was a significant main effect of target type (χ²(2) = 676.48, p < 0.001), with unrelated words receiving significantly lower ratings than non-contrastive (p < 0.001) and alternative ones (p < 0.001). We also identified significant interaction effects of sentence type and focus position (χ²(4) = 9.614, p = 0.047). As shown in Table 3, in the Alternative word type, focus sentences received significantly higher ratings than canonical sentences did (ps < 0.05), but the differences related to focus positions were not significant.

Table 3. Interaction of focus type and focus position for the target type “Alternative”

 | Estimate | SE | z ratio | p value
FocusType “Even” | 0.8980 | 0.332 | 2.697 |
FocusType “Only” | 0.6935 | 0.329 | |
FocusPos “Initial” | 0.0104 | 0.331 | |
Interaction.Even.Initial | 0.0050 | 0.474 | |

bare noun > pronoun/demonstrative + noun > restrictive attribute + noun > modifier attributes + noun > relative clause”. Our statistical analysis has shown that if the fixed element is excluded, almost 70% of the remaining productive elements were composed of bare nouns (such as Example 22) coded in an apparently simpler way. In fact, the construction “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know) corroborates the referential principle pointed out by Fang [20] that a more complicated coding of relative clause is adopted to highlight the higher information value of the text.

3.2 Expressing the Author’s Stance

While the contexts of poetry and of internet language highlight emotiveness and anonymity respectively, the context of titles is notable for its indexing nature, which allows convenient searching for article themes or authors’ stances in brief wording. This study has found that the development of “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know) has benefited from its discourse function of expressing the author’s stance. Du Bois [21] defined stance as a public act initiated by a public actor via “overt communicative means”. Stance refers to evaluating the objects and positioning the subjects simultaneously, and building connection with other subjects in terms of any “salient dimension of the sociocultural field”. In the case of the present study, though the fixed element “你必须知道的” [ni bixu zhidao de] (you must know) could be excluded without sacrificing the truth-conditional meaning of the construction, different effects are produced in terms of expressing the author’s stance (see Examples 24a, 24b, 25a and 25b).

Example 24a 《你必须知道的社交潜规则》 (《鲁豫有约》2018/6/30)
Ni bixu zhidao de shejiao qian guize (luyu youyue 2018/6/30)
The unspoken social rules that you must know (A Date with Luyu 2018/6/30)

Example 24b 《社交潜规则》
Shejiao qian guize
Unspoken social rules

Example 25a 《你必须知道的3大创业死亡定律》 (《人民网》2014/9/10)
Ni bixu zhidao de 3 da chuangye siwang dinglv (renmin wang 2014/9/10)
The three laws of death on start-up business you must know (People’s Daily 2014/9/10)

Example 25b 《3大创业死亡定律》
San da chuangye siwang dinglv
The three laws of death on start-up business
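The relation between the (a) and (b) titles in these examples, in which the fixed element is removed and the productive element X remains, can be illustrated with a short pattern match. The Python sketch below is only an illustration of that decomposition and not a tool used in the study; the names `FIXED_ELEMENT`, `TITLE_PATTERN` and `productive_element` are hypothetical, and the pattern assumes the fixed element stands at the start of the title.

```python
import re

# The fixed element of the construction, as given in the text.
FIXED_ELEMENT = "你必须知道的"

# Optional title marks 《 》 around the fixed element plus the productive element X.
TITLE_PATTERN = re.compile(r"^《?" + FIXED_ELEMENT + r"(?P<x>.+?)》?$")


def productive_element(title):
    """Return the productive element X of a matching title, or None."""
    match = TITLE_PATTERN.match(title.strip())
    return match.group("x") if match else None


x_24a = productive_element("《你必须知道的社交潜规则》")
x_25a = productive_element("《你必须知道的3大创业死亡定律》")
x_24b = productive_element("《社交潜规则》")  # no fixed element present
```

For the titles in Examples 24a and 25a the function returns the strings that form the titles in Examples 24b and 25b, while a title without the fixed element yields `None`.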

On the Limitations of Constructional Innovation


A comparison of the abovementioned examples reveals that when the fixed element is excluded, in most cases the productive element X can still serve the titling function, as an index to the article theme [22], just like titles in the form of “X那些/点事” [X naxie/dian shi] (all the things about X). However, a closer look shows that the author is more involved in the case of “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know), which can both serve the indexing function and express the author’s stance on relevant themes. The author seems to be convinced of the importance of the relevant knowledge, as well as of its potential to satisfy the readers’ craving for knowledge. The current We-media era is characterized by a massive exchange of information with weakening credibility. On the one hand, authors aware of this risk employ the pronoun “你” [ni] (you) in the conversation to identify themselves as knowledge owners and the readers as knowledge seekers. On the other hand, authors employ the high-value modal verb “必须” [bixu] (must) to underline the objective demand of the readers and consolidate the authority of their judgment. Language is rich in devices to express emotions and stance, such as lexicon, phonetic devices, and grammatical and syntactic structures [15]. In this sense, the construction “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know) is also a kind of grammatical resource that helps the author to enhance linguistic power, increase involvement, and express the author’s stance in the title.

3.3 Recruiting Readers’ Empathy

Compared with the emotiveness of the poetic context and the anonymity of the internet language context, interaction is a significant feature of titles. For both online media and traditional media, a title’s appeal has a crucial and decisive influence on whether the reader will pay further attention [23]. In other words, in the shallow-reading era, where a title’s appeal is crucial to attracting readers, a quality article without an eye-catching title may go unnoticed. Therefore, this study holds that “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know) has demonstrated the power to recruit readers’ empathy, which has in turn prompted its thriving in titles. Authors have purposefully abandoned kind-referring NPs and strategically chosen the pronoun “你” [ni] (you) in titles (see the comparison in Examples 26, 27, 28 and 29).

Example 26 《留学党/留学家长/预科生/你必须知道的美国留学住宿全攻略》
Liuxue dang/liuxue jiazhang/yuke sheng/ni bixu zhidao de meiguo liuxue zhusu quan gonglve
All accommodation tips that overseas students/parents of overseas students/pre-college program students/you must know studying in America

Example 27 《驴友/护士/医护人员必须知道的九个急救误区》
Lvyou/hushi/yihu renyuan bixu zhidao de jiuge jijiu wuqu
Nine first-aid misunderstandings that travellers/nurses/medical professionals must know


L. Zhang et al.

Example 28 《投资者/商人/新创企业必须知道的这些世界级商业机遇》
Touzi zhe/shangren/xinchuang qiye bixu zhidao de zhexie shijieji shangye jiyu
Global business opportunities that investors/businessmen/start-up businesses must know

Example 29 《中国人/当代大学生/合格共产党员/你必须知道的“四大”》
Zhongguo ren/dangdai da xuesheng/gongchan dangyuan bixu zhidao de “sida”
The “Top Four” Courses that Chinese people/university students/CPC members must know

Stirling and Manderson [24] believed that a “potential pressure on the construction of personal narratives is the need to tie the content being presented to the interlocutor’s knowledge and experience and perhaps to recruit their engagement, assent or empathy”. In these examples, “你” [ni] (you) is used as a vocative intended to arouse the reader’s attention. The pronoun “你” [ni] (you) is used by the author to address the current reader. By activating the current reader’s shared knowledge, the author also implies that the message in the title is applicable to the whole readership, including the current reader. The core semantics of the kind-referring NP, by comparison, is in a relevant sense based on non-individuality, in that its reference holds only for a specific kind [25, 26], which further indicates its limitations in reference scope and in its capacity to arouse the reader’s attention.

Compared with direct and real-time spoken interaction, written communication is characterized by indirect and delayed interaction between the author and readers. However, authors, according to Thompson [27], are bound to keep in mind the readers’ needs and possible reactions when they construct the text. This is even more true in the current We-media era, when linguistic innovation and turning delayed interaction into real-time interaction are crucial to all authors. Titles, as the head and core of the text, are the first chance to introduce the author to the readers.
The “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know) features a “narrative-theme” structure composed of the fixed element conveying evaluation, and the productive element telling the theme. Titles in this construction can convincingly emphasize the importance of the text information by introducing a fictional real-time dialogue context, where a reader has become “a party of the dialogue” rather than merely “an outsider”.

4 Conclusion

This study addresses the limitations of constructional innovation by looking into the case of “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know), an emerging construction popular in Chinese titles. Returning to the research questions of this study, it is found that the special linguistic context, a major incubator of constructional innovation, also restricts constructional innovation to some degree:

1) Only new constructions sharing certain functional attributes of the special linguistic context have a better chance of emerging. The rise of “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know) in new media titles is believed to benefit from its functional attributes, namely that it can “highlight high-value information”, “express the author’s stance” and “recruit readers’ empathy”. Meanwhile, these functions are also in conformity with the attributes of the specific context of titles.

2) Newly emerging constructions are subject to the specific conditions of the innovation context, which in turn accounts for the unique features in the form and semantics of “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know). When this construction is applied in titles, the productive element X can take the form of bare nouns, “numeral + (quantifier) + noun”, “pronoun + (quantifier) + noun”, and proper nouns. The bare noun form outweighs all the other forms. Semantically speaking, the productive element X is composed of declarative knowledge and procedural knowledge, with declarative knowledge dominant. On the other hand, the weakening semantics of the constructional elements also leads to subjectification of the semantics of the fixed element “你必须知道的” [ni bixu zhidao de] (you must know).

Functional needs constantly inspire and shape the invention and development of “form” in languages [28]. Linguistic innovation, while accommodating functional needs, is also influenced by the specific context in which the innovation takes place. Previous studies have found that communicative needs in spoken language have incubated a series of fixed structures based on the high-frequency verb “知道” [zhidao] (know), such as “你知道/我不知道/不知道” [ni zhidao/wo bu zhidao/bu zhidao] (you know/I don’t know/don’t know) [29–31]. The three major motivations for language innovation and development, according to Xu and Qin [12], include special linguistic zones, language contact and language acquisition, where titles compose a major category of the special linguistic zones.
This study has found that the construction “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know) is an innovative form developed from specific functional needs. Compared with other types of titles, such as the suspense titles used to stimulate the readers’ curiosity [32], or the sticky structure titles catering to the gestalt psychology of readers [7], the construction “你必须知道的NP” [ni bixu zhidao de NP] (NP that you must know) is an in-depth exploration into the at-a-loss mental dilemma of readers in the big data era, and a positive solution to the inner thirst of readers for essential knowledge in this knowledge economy.

Acknowledgements. This work was supported by the Youth Project grant of the National Social Science Foundation of China (Project: Research on the interaction between generic references and stylistic varieties, Grant Number: 21CYY042), by the Humanities and Social Science Project of the Ministry of Education (Project: On the Syntactic and Semantic Interaction of Generic References from the Textual Perspective, Grant Number: 17YJC740115), and by the Fundamental Research Funds for the Central Universities (Grant Number: CCNU21XJ017).

References

1. Goldberg, A.E.: Constructions at Work: The Nature of Generalization in Language. Oxford University Press, Oxford (2006)
2. Goldberg, A.E.: Explain Me This: Creativity, Competition, and the Partial Productivity of Constructions. Princeton University Press (2019)
3. Shi, C.: hudong goushi yufa de jiben linian jiqi yanjiu lujing (Basic ideas and research paradigm of interactive construction grammar). dangdai xiuci xue (Contemp. Rhetor.) 2, 12–29 (2016)
4. Wu, W., Xia, F.: “A budao naliqu” de goushi jiexi huayu gongneng jiqi chengyin (Constructions and analysis of discourse functions and the formation of ‘Adj. + budaonaliqu A不到哪里去’). zhongguo yuwen (Stud. Chin. Lang.) 4, 326–333 (2011)
5. Shi, C.: goushi yufa de lilun lujing he yingyong kongjian (Theoretical approaches to and application of construction grammar). hanyu xuebao (Chin. Linguis.) 1, 2–13 (2017)
6. Shao, J.: hanyu kuangshi jiegou shuolve (On frame constructions in Chinese). zhongguo yuwen (Stud. Chin. Lang.) 3, 218–227 (2011)
7. Yin, S.: shuo juzhong nianzhuo jiegou zuo biaoti (Cases of sticky structures as headings). yuyan wenzi yingyong (Appl. Linguis.) 3, 93–100 (1992)
8. Liu, Y.: hanyu de qi yinjie pianming (On Chinese titles of seven syllables). yuyan wenzi yingyong (Appl. Linguis.) 2, 122–127 (2003)
9. Chen, P.: Shi hanyu zhong yu mingcixing chengfen xiangguan de sizu gainian (Four pairs of concepts related with nominal components in Chinese). zhongguo yuwen (Stud. Chin. Lang.) 2, 81–92 (1987)
10. Xiong, R.: zhishi shengchan: xin meiti shidai de meijie neirong shengchan (Knowledge production: media content production in the new media age). xinwen qianshao (Press Outpost) 3, 36–37 (2013)
11. Dong, S., Huang, C.: yuyan tequ zhong chuangxin xingshi de xiandu (The limits of language innovation in special linguistic zones). huawen jiaoxue yu yanjiu (TCSOL Stud.) 4, 11–20 (2019)
12. Xu, J., Qin, Y.: “yuyan tequ” de xingzhi yu leixing (Nature and category of the special linguistic zone). dangdai xiucixue (Contemp. Rhetor.) 4, 20–31 (2015)
13. Gagné, E.D.: The Cognitive Psychology of School Learning. Little, Brown and Company, Boston (1985)
14. Guo, F.: dangdai beijing kouyu dier rencheng daici de yongfa yu gongneng (Usages and functions of second personal pronouns in modern Beijing vernacular). yuyan jiaoxue yu yanjiu (Lang. Teach. Linguis. Stud.) 3, 50–56 (2008)
15. Zhang, L.: kouyu zhong “ni” de yizhi yongfa jiqi huayu gongneng de fuxian (Discourse functions of “Ni” in speech). shijie hanyu jiaoxue (Chin. Teach. World) 1, 44–54 (2014)
16. Yang, C., Ye, R.: “Bixu” de yuyi tezheng jiqi zhuguanhua-jianyu must bijiao (The semantic features of “Bixu” and its subjectification: comparing with must). waiyu xuekan (Foreign Lang. Res.) 2, 60–66 (2016)
17. Shen, J.: yuyan de “zhuguanxing” he “zhuguanhua” (A survey of studies on subjectivity and subjectivisation). waiyu jiaoxue yu yanjiu (Foreign Lang. Teach. Res.) 4, 268–275 (2001)
18. Lv, W., Yao, S.: cihui guizhi yu lifa yuyan de jianmingxing (Vocabulary regulation and the simplicity of legal language). yuyan wenzi yingyong (Appl. Linguis.) 4, 65–74 (2018)
19. Xu, J.: ziran yuyan jiaoji zhongde yuma jiedu he zhishi pipei (Code interpretation and knowledge matching in natural language communication). shijie hanyu jiaoxue (Chin. Teach. World) 4, 60–68 (2001)
20. Fang, M.: pianzhang yufa yu hanyu pianzhang yufa yanjiu (Text grammar and Chinese text grammar studies). zhongguo shehui kexue (Soc. Sci. China) 6, 165–172 (2005)
21. Du Bois, J.W.: The stance triangle. In: Englebretson, R. (ed.) Stancetaking in Discourse: Subjectivity, Evaluation, Interaction, pp. 139–182. John Benjamins, Amsterdam (2007)
22. Zhang, Y.: fei kuangjia liuxing goushi “X naxie/dian shi” yanjiu (Research on the popular non-framework construction “X naxie/dian shi”). dangdai xiucixue (Contemp. Rhetor.) 2, 29–40 (2015)
23. Zhang, Y.: cong xiangdui dao juedui: chengdu fuci “zui” de zhuguanhua qushi yu houguo (From relative to absolute: the subjectivization trend and consequences of degree adverb “zui”). yuwen yanjiu (Linguis. Res.) 1, 18–25 (2017)
24. Stirling, L., Manderson, L.: About you: empathy, objectivity and authority. J. Pragmat. 43, 1581–1602 (2011)
25. Liu, D.: hanyu leizhi chengfen de yuyi shuxing he jufa shuxing (Semantic and syntactic properties of kind-denoting elements in Chinese). zhongguo yuwen (Stud. Chin. Lang.) 5, 411–422 (2002)
26. Zhang, L., Yao, S.: cong yuti shijiao kaocha zhileiju de jufa tezheng he fenbu qingkuang (Syntactic features and distribution of generic sentences: a register perspective). yuyan jiaoxue yu yanjiu (Lang. Teach. Linguis. Stud.) 3, 74–81 (2013)
27. Thompson, G.: Interaction in academic writing: learning to argue with the reader. Appl. Linguis. 22, 58–78 (2001)
28. Fang, M.: you beijinghua chufa de liangzhong jufa jiegou-zhuyu lingxing fanzhi he miaoxiexing guanxi congju (Two emergent grammatical structures motivated by background information packaging: a case study of the cataphoric zero subject clause and the descriptive relative clause). zhongguo yuwen (Stud. Chin. Lang.) 4, 291–303 (2008)
29. Tao, H.: cong yuyin yufa he huayu tezheng kan “zhidao” geshi zai tanhua zhong de yanhua (Phonological, grammatical and discourse evidence for the emergence of zhidao constructions). zhongguo yuwen (Stud. Chin. Lang.) 4, 291–302 (2003)
30. Zhou, B., Li, Y.: “ni bu zhidao” xiang huayu biaoji de yanbian (Evolving into discourse marker: the case of “ni bu zhidao”). hanyu xuebao (Chin. Linguist.) 1, 78–84 (2014)
31. Hu, J.: qianjinghua yu “zhidaoma” de gongneng (Foregrounding and the function of “zhidaoma”). yuyan kexue (Linguis. Sci.) 2, 194–205 (2015)
32. Yang, H., Zhou, J.: shiyin xuanyi biaoti de yuyi yu yupian gongneng yanjiu (A study on the semantic and discourse function of suspenseful titles). dangdai xiucixue (Contemp. Rhetor.) 6, 52–59 (2014)

Formalized Chinese Sentence Pattern Structure and Its Hierarchical Analysis

Weiming Peng1,2(✉), Zuntian Wei1, Jihua Song1,2, Shiwen Yu3, and Zhifang Sui3

1 School of Artificial Intelligence, Beijing Normal University, Beijing, China
{pengweiming,songjh}@bnu.edu.cn, [email protected]
2 Laboratory for Chinese Character Research and Application, Beijing Normal University, Beijing 100875, China
3 Key Laboratory of Computational Linguistics, Ministry of Education, Institute of Computational Linguistics, Peking University, Beijing 100871, China
{yusw,szf}@pku.edu.cn

Abstract. Sentence Patterns (句式) have always been important in Chinese grammar teaching, but they lack a formal representation suited to information processing. Sentence pattern analysis in Chinese grammar teaching generally adopts Sentence Component Analysis (句子成分分析法), while the dichotomy of the Hierarchical Analysis of Phrase Structure (层次分析法) dominates the construction of mainstream Chinese treebanks. This situation impedes exchange and resource sharing between the two application areas. This paper reviews the formalized schema of Sentence Pattern Structure, which follows the Diagrammatic Syntactic Analysis (图解法) of Sentence-based Grammar (句本位语法), and expounds its hierarchical characteristics and its application value in teaching Chinese as a foreign language.

Keywords: Sentence pattern structure · Syntactic analysis · Chinese treebank · Sentence-based grammar · Sentence component analysis

© Springer Nature Switzerland AG 2022
M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 286–298, 2022. https://doi.org/10.1007/978-3-031-06703-7_22

1 Introduction

In Chinese grammar teaching, a sentence is usually analyzed as a combination of sentence components such as subject, predicate, object, attribute, adverbial and complement. This Sentence Component Analysis (SCA, 句子成分分析法) lays the components out side by side to form a variety of structural patterns, the so-called “sentence patterns”. In the field of information processing, however, the SCA method has encountered great challenges. The first problem is hierarchy: as the layers of sentence structure increase, it becomes difficult to describe sentence patterns formally. Therefore, in Chinese information processing, mainstream treebanks mostly adopt a dichotomous phrase-structure hierarchy rather than the layout form of sentence patterns. In phrase structure and dependency structure, for example, the concepts of subject, predicate, object, attribute, adverbial and complement are still used, but their nature has changed: from “sentence components” belonging to sentence patterns to “grammatical roles” in phrase structure. [1] showed that syntactic trees carry no more than four types of tagging information: head phrase, grammatical role, phrase function and structural hierarchy. As long as the annotation scheme is designed properly, a phrase structure tree and a dependency tree can be converted into each other by rule. That is to say, phrase structure and dependency structure differ only in form; their internal grammatical system is the same.

The sentence pattern structure discussed in this paper differs from both of these structures. It aims at a formalized syntactic structure based on SCA that connects effectively with the “sentence pattern” expressions used in Chinese grammar teaching; in this sense it is “sentence-based”. The Sentence-based Grammar (句本位语法), which originated in Li Jinxi’s A New Chinese Grammar [2], introduced the Diagrammatic Syntactic Analysis (图解法) method for describing the structure of any complex sentence. However, because Li’s grammatical terminology is outdated, the diagrammatic method must be formalized before it can be applied to information processing. This research has mainly been carried out at Beijing Normal University. The term “Sentence Pattern Structure” was first proposed as a kind of formalized structure in [3]. Building on previous work, this paper discusses the hierarchical characteristics of Sentence Pattern Structure and points out its advantages and disadvantages compared with phrase structure and dependency structure, as well as its potential application value.

2 An Overview of Sentence Pattern Structure System

The Sentence Pattern Structure (abbreviated as SPS) adopts the diagrammatic schema: a long horizontal line separates the subject, predicate and object (the trunk components) from the attribute, adverbial and complement (the additional components); the subject, predicate and object above the horizontal line are separated by a double vertical line and a single vertical line; the attribute, adverbial and complement are connected to their head words by different fold lines below the horizontal line. The formalization work includes normalizing the diagrammatic schema of all kinds of components and SPSs, and designing the corresponding data storage format. So far, there have been three versions of the Sentence Pattern Structure system:

SPS System Based on Li’s Grammar. This version [4, 5] retains Li’s “complements (补足语)” among the sentence components. It follows the principle of “distinguishing the word class according to the component”, so the part of speech (POS) is determined directly by the sentence component.

SPS System After Absorbing the Tentative Chinese Teaching Grammar System (《暂拟汉语教学语法系统》). This version [3] aligns with the prevailing Chinese teaching grammar. The updates include: adopting the prevailing definition of complement (补语) instead of Li’s “complements (补足语)”; setting up structural patterns for the serial-verb sentence and the pivotal sentence; and implementing a Web-based diagrammatic annotation system [6], which greatly improves the efficiency of tagging.


The Latest SPS System. The initial design and its supplement were published at CLSW 2015 [7] and CLSW 2016 [8]. This version introduces the “dynamic word” structure and makes the SPS more standardized. Accordingly, the diagrammatic annotation system [9] and the treebank [10] have been upgraded and optimized. The SPS discussed below follows the latest version.

2.1 The Syntactic Structure System of SPS

The SPS can be classified into three types, from simple to complex: basic sentence pattern, extended sentence pattern and complicated sentence pattern.

Basic sentence pattern: a sentence pattern containing only the subject, predicate and object as trunk components, with the predicate as the core.

Extended sentence pattern: a sentence pattern that maintains the trunk layout of the basic sentence pattern (i.e. a single predicate core) and has additional components such as attributive, adverbial, complement and independent component, or side-by-side structures such as coordinate and appositive.

Complicated sentence pattern: a sentence pattern that breaks the single-predicate-core layout of the trunk, i.e. one with two or more predicates, such as the pivotal sentence (兼语句), serial-verb sentence (连动句), union predicate sentence (联合谓语句), synthetic predicate sentence (合成谓语句), S-P predicate sentence (主谓谓语句) and complex sentence (复句).

The three types of sentence patterns are summarized as a “diagram formula” [7], as shown in Fig. 1.

Fig. 1. Diagram formula of sentence pattern structures
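The three-way classification above can be stated as a small decision rule. The following is a minimal sketch only, assuming a sentence has already been reduced to a flat list of component labels; the label names ("ind", "coo", "app" in particular) and the function name are hypothetical and are not taken from the SPS schema.

```python
# Hypothetical component labels; only the three-way rule itself comes from the text.
TRUNK = {"sbj", "prd", "obj"}
# Additional components and side-by-side structures that turn a basic
# pattern into an extended one.
EXTRA = {"att", "adv", "cmp", "ind", "coo", "app"}


def classify_pattern(components):
    """Classify a flat list of component labels as basic / extended / complicated."""
    predicate_count = components.count("prd")
    if predicate_count == 0:
        raise ValueError("a sentence pattern needs at least one predicate core")
    if predicate_count >= 2:
        # The single-predicate-core layout of the trunk is broken.
        return "complicated"
    if any(label in EXTRA for label in components):
        return "extended"
    return "basic"


if __name__ == "__main__":
    print(classify_pattern(["sbj", "prd", "obj"]))
    print(classify_pattern(["att", "sbj", "adv", "prd", "obj"]))
    print(classify_pattern(["sbj", "prd", "obj", "prd"]))
```

The rule deliberately checks the number of predicate cores first, because a complicated sentence pattern may also carry additional components.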

In contrast to content words, function words do not directly serve as sentence components, so they are marked beside other components as “function word positions” in the diagram. There are four main categories: preposition, conjunction, locative word (in a locative structure) and auxiliary word. The first three types of function word positions are marked as “^”, “…” and “□” respectively, while auxiliary positions are divided into three kinds:

• Auxiliary words connecting attributes, adverbials and complements, such as “的 [de]”, “地 [de]” and “得 [de]”, placed at the junction of the fold line;
• Auxiliary words combined with NP, such as “等 [deng]”, marked as “△”;
• Auxiliary words combined with VP, such as “者 [zhe]” and “所 [suo]”, and modal particles at the end of sentences, marked as “▽”.

2.2 The Lexical Structure System of SPS

The content word positions (corresponding to components) and the function word positions obtained from SCA are the minimal syntactic units of the diagram, and they occupy the horizontal line segments in it. Generally, a word position is occupied by one word, but there are many fixed or semi-fixed phrases that should be treated as a word and analyzed in terms of lexical structure. In the diagram, the text in a word position is marked as follows: for vocabulary words, the POS is marked directly; for “dynamic words” outside the vocabulary, the POS of the whole is tagged and the internal structure is analyzed as well. The method is to split the text into morphemes and to mark each morpheme’s POS together with the structural relations between them (as shown in Table 1).

Table 1. Symbol of the lexical structure

Lexical relation     Symbol   Samples                                   @mod
Coordination         …        桌 (table) 椅 (chair)                      n:n…n
Attributive-head     %        白 (white) 兔 (rabbit)                     n:a%n
Adverbial-head       !        极 (extremely) 具 (have)                   v:d!v
Verb-complement      ←        赶 (drive) 跑 (away)                       v:v←v
Verb-object          |        染 (dye) 发 (hair)                         v:v|n
Subject-predicate    ‖        身 (life) 亡 (death)                       v:n‖v
Reduplication        (none)   看 (look) 看 (look)                        v:vv
Other                -        两 (two) 个 (unit)                         m:m-q
                              桌 (table) 上 (above)                      n:n-f
                              看 (look) 了 (le)                          v:v-u
                              翩然 (trippingly) 而 (er) 至 (come)         n:a2-c-v

Other: quantifiers, localizers, affixes and other derivative structures.
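The @mod strings in Table 1 are compact enough to be decoded mechanically. The sketch below is an illustration only: it assumes that every morpheme POS is a single lowercase letter optionally followed by a character count, and that two morphemes written with no symbol between them (as in v:vv) stand for reduplication. The function names are hypothetical and are not part of the SPS tools.

```python
import re

# Relation symbols as listed in Table 1.
RELATIONS = {
    "…": "coordination",
    "%": "attributive-head",
    "!": "adverbial-head",
    "←": "verb-complement",
    "|": "verb-object",
    "‖": "subject-predicate",
    "-": "other",
}

# One POS letter plus an optional number of characters (omitted when it is 1).
_MORPHEME = re.compile(r"([a-z])(\d*)")


def parse_mod(mod):
    """Split '<whole POS>:<lexical structure>' into POS, morphemes and relations."""
    whole_pos, sep, body = mod.partition(":")
    if not sep or not body:
        raise ValueError(f"not a @mod value: {mod!r}")
    morphemes, relations = [], []
    pos = 0
    while pos < len(body):
        match = _MORPHEME.match(body, pos)
        if match:
            # Two adjacent morphemes without a symbol: treated as reduplication.
            if morphemes and len(relations) < len(morphemes):
                relations.append("reduplication")
            morphemes.append((match.group(1), int(match.group(2) or 1)))
            pos = match.end()
        elif body[pos] in RELATIONS:
            relations.append(RELATIONS[body[pos]])
            pos += 1
        else:
            raise ValueError(f"unexpected character {body[pos]!r} in {mod!r}")
    if len(relations) != len(morphemes) - 1:
        raise ValueError(f"malformed lexical structure: {mod!r}")
    return whole_pos, morphemes, relations


def segment(text, morphemes):
    """Cut the word text into morphemes using the recorded character counts."""
    if sum(size for _, size in morphemes) != len(text):
        raise ValueError("character counts do not add up to the text length")
    pieces, start = [], 0
    for _, size in morphemes:
        pieces.append(text[start:start + size])
        start += size
    return pieces


if __name__ == "__main__":
    pos, morphemes, relations = parse_mod("n:a2-c-v")
    print(pos, morphemes, relations, segment("翩然而至", morphemes))
```

Because the structure is stored flat, the parser returns a sequence of relations rather than a tree.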

The diagram of an example sentence and the corresponding XML format are shown in Fig. 2. “ju” and “xj” denote the nodes of the whole sentence and the clause respectively. The names of the three types of nodes (component, function word position and word) take 3, 2 and 1 letters respectively. A vocabulary word uses a POS node that stores the text directly, while a dynamic word uses a parent POS node with the morpheme POS nodes as its children. The attribute @mod of the parent POS node records the lexical structure information: the POS and the number of characters (omitted when it is 1) of each morpheme, as well as the structural relations between them.

Fig. 2. An example of diagram and XML format

3 The Hierarchical Characteristics of SPS

3.1 The Sentence Pattern Level Outweighs Phrasal Hierarchy

In mainstream syntactic analysis, structural hierarchy generally refers to phrases. In SPS, the sentence pattern level (句式层次) outweighs the phrase hierarchy (短语层次). As in the arrangement of the diagram formula, all components that are placed on or directly connected to a horizontal trunk line belong to the same sentence pattern level. According to Chinese syntax, the arrangement of components within a single sentence pattern level is constrained by the following rules:

• The order of the “subject-predicate-object” structure is fixed;
• Attributives only appear before the subject and the object;
• Adverbials only appear before the predicate or at the beginning of a sentence (i.e. before the subject and its attributes);
• Objects and complements only appear after the predicate, and a single predicate-core structure has no more than 2 objects and no more than 1 complement;
• The coordinate and appositive structures can divide the subject or object into two or more parts, each of which can carry its own attributives;
• Two or more predicate-core structures can be combined into a pivotal sentence, serial-verb sentence, union predicate sentence or synthetic predicate sentence (with the first predicate core taking no object or complement).

Within a single sentence pattern level, there is a centripetal relation between the predicate core and the other components, so the phrase hierarchy can be inferred from the order of the components. For example, the phrase hierarchy of “attributive-attributive-subject-adverbial-adverbial-predicate-complement-attributive-object” is naturally as follows:

• [[att-[att-sbj]]-[[adv-[adv-[prd-cmp]]]-[att-obj]]]

Therefore, the implied phrase hierarchy is not explicitly described in the SPS diagram.

Coordinate, appositive and multi-predicate-core structures are side-by-side, non-centripetal structures, whose phrase hierarchy in SPS is ambiguous. Take “好友张三, 李四 [haoyou zhangsan lisi] (friends Zhang San and Li Si)” as an example: only the coordinate and appositive relations are tagged between the three component positions, without distinguishing “[好友(friends)-[张三(Zhang San)-李四(Li Si)]]” from “[[好友(friends)-张三(Zhang San)]-李四(Li Si)]”. As another example, “张三的朋友和父亲 [zhangsan de pengyou he fuqin] (Zhang San’s friend and father)” is only marked with “张三的 [zhangsan de] (Zhang San’s)” as the attributive, and the relation between “朋友 [pengyou] (friend)” and “父亲 [fuqin] (father)” is coordinate. Whether the scope of the attributive extends to “父亲 [fuqin] (father)” is not shown in the diagram. These cases reflect a deficiency of the SPS system in describing phrase hierarchy. However, both expressions genuinely have two interpretations, which can be classified as “true ambiguity” from the perspective of natural language processing.
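Setting the side-by-side structures aside, the centripetal inference described in this section is mechanical. The sketch below is an illustration under simplifying assumptions (exactly one subject and one predicate, at most one complement and one object, adverbials placed directly before the predicate, no sentence-initial adverbials); the function names are hypothetical. It reproduces the bracketing given above from a flat component sequence.

```python
def _wrap(modifiers, head):
    """Attach modifiers to a head from the innermost (rightmost) outward."""
    for modifier in reversed(modifiers):
        head = f"[{modifier}-{head}]"
    return head


def infer_hierarchy(components):
    """Infer the bracketed phrase hierarchy of one sentence pattern level."""
    i_sbj = components.index("sbj")
    i_prd = components.index("prd")
    subject_atts = components[:i_sbj]
    adverbials = components[i_sbj + 1:i_prd]
    rest = components[i_prd + 1:]
    if any(c != "att" for c in subject_atts) or any(c != "adv" for c in adverbials):
        raise ValueError("unsupported component order")

    predicate = "prd"
    if rest and rest[0] == "cmp":
        # The complement is closest to the predicate core.
        predicate = "[prd-cmp]"
        rest = rest[1:]
    predicate = _wrap(adverbials, predicate)

    if rest:
        if rest[-1] != "obj" or any(c != "att" for c in rest[:-1]):
            raise ValueError("unsupported component order")
        predicate = f"[{predicate}-{_wrap(rest[:-1], 'obj')}]"

    return f"[{_wrap(subject_atts, 'sbj')}-{predicate}]"


if __name__ == "__main__":
    sequence = ["att", "att", "sbj", "adv", "adv", "prd", "cmp", "att", "obj"]
    print(infer_hierarchy(sequence))
```

Coordinate, appositive and multi-predicate-core structures are intentionally not handled, since their hierarchy is left open in SPS.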
For such truly ambiguous expressions, it is reasonable and acceptable for the formalization of SPS not to disambiguate these details.

3.2 The SPS Level Can Be Measured by the Vertical Distance Between Horizontal Lines and the Nesting Depth of Brackets in SPS Expressions

The Sentence-based Grammar regards the subject, predicate and object as the trunk components of a sentence, and the attributive, adverbial and complement as the additional components. In the diagram formula, the subject, predicate and object are placed directly on the horizontal trunk line, which can be regarded as the SPS level benchmark. The attributive, adverbial and complement are placed underneath the trunk line, at a level depth of 1. All the components directly connected to the horizontal trunk line constitute the first level of SPS. When additional components are nested, new horizontal lines are drawn downward, increasing the SPS level.

Suppose that the depth of the horizontal trunk line is 0, and that each vertical line segment (a fold line used for additional components, or a tripod used for nominalized verb phrases and S-P predicate structures) increases the depth by 1. The depth of any other horizontal line is then the number of vertical line segments passed on the way from the trunk line to it, which can be regarded as the measure of the SPS level. Figure 3 shows diagrams in which the SPS level increases downward and upward. Sentence ① is an example of nested additional components, which increase the SPS level downward. Sentence ② is an example of a nested clause structure, which increases the SPS level upward.

Fig. 3. The downward growth and upward growth of SPS level

The SPS level can also be measured by the nesting depth of brackets in SPS expressions. As shown in Table 2, an SPS expression uses parentheses, square brackets and angle brackets to indicate the attributive, adverbial and complement respectively, and curly brackets to indicate structures lifted by a tripod; the main components and the separators of function words are basically consistent with the diagram. Layer-by-layer expansion follows two rules: additional components beyond the level range are cut off; for a lifted component beyond the level range, only the trunk is retained. The maximum nesting depth of brackets therefore equals the SPS level.

The SPS level is relative. Taking any horizontal line as the benchmark, we can obtain the SPS expression of a single level by collecting the word positions within a depth of 1 from it. Take sentence ① in Fig. 3 as an example:

• 您(you)‖[对(of)^规定(regulations)][怎么(how)]看(think)?
• 对(of)^(国家(state)‖允许(allow)|名人(celebrity)∥出现(appear)▷的)规定(regulations)
• 国家(state)‖[不(not)]允许(allow)|名人(celebrity)∥[以(as)^身份(role)]出现(appear)<在(in)^广告(advertisements)□中(in)>
• 以(as)^(患者(patient)▷的)身份(role)


Table 2. Sentence pattern structure expression

Sentence ①
  Level 0: 您‖看?
           (you‖think?)
  Level 1: 您‖[对^规定][怎么]看?
           (you‖[of^regulations][how]think?)
  Level 2: 您‖[对^(国家‖允许|名人∥出现▷的)规定][怎么]看?
           (you‖[of^(state‖allow|celebrity∥appear▷de) regulations][how]think?)
  Level 3: 您‖[对^(国家‖[不]允许|名人∥[以^身份]出现<在^广告□中>▷的)规定][怎么]看?
           (you‖[of^(state‖[not] allow|celebrity∥[as^identity] appear<in^advertisement□in>▷de) regulations][how]think?)
  Level 4: 您‖[对^(国家‖[不]允许|名人∥[以^(患者▷的)身份]出现<在^广告□中>▷的)规定][怎么]看?
           (you‖[of^(state‖[not] allow|celebrity∥[as^(patient▷de) identity] appear<in^advertisement□in>▷de) regulations][how]think?)

Sentence ②
  Level 0: 大家‖谁想到他会记得今天是生日。
           (We‖Who would have thought he would remember it was his birthday.)
  Level 1: 大家‖{谁‖想到|他会记得今天是生日。}
           (We‖{Who‖would have thought|he would remember it was his birthday.})
  Level 2: 大家‖{谁‖[都][没]想到|{他‖会∶记得|今天是生日。}}
           (We‖{Who‖[would][not] have thought|{he‖would∶remember|it was his birthday.}})
  Level 3: 大家‖{谁‖[都][没]想到|{他‖会∶记得|{今天‖是|生日。}}}
           (We‖{Who‖[would][not] have thought|{he‖would∶remember|{it‖was|his birthday.}}})
  Level 4: 大家‖{谁‖[都][没]想到|{他‖会∶记得|{今天‖是|(自己▷的)生日。}}}
           (We‖{Who‖[would][not] have thought|{he‖would∶remember|{it‖was|his birthday.}}})
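The bracket-depth measure and the two expansion rules can be tried out directly on the expressions in Table 2. The following is a minimal sketch that treats an SPS expression as a plain string; the function names and the set of trunk separators dropped when a lifted clause is flattened are assumptions made for this illustration, not the authors' implementation.

```python
OPENERS = {"(": ")", "[": "]", "<": ">", "{": "}"}
CLOSERS = set(OPENERS.values())
ADDITIONAL = {"(", "[", "<"}          # attributive, adverbial, complement
SEPARATORS = {"‖", "|", "∥", "∶"}     # dropped when a lifted clause is flattened


def sps_level(expression):
    """SPS level = maximum nesting depth of brackets."""
    depth = deepest = 0
    for ch in expression:
        if ch in OPENERS:
            depth += 1
            deepest = max(deepest, depth)
        elif ch in CLOSERS:
            depth -= 1
    return deepest


def truncate(expression, level):
    """Expand an SPS expression only down to the given level.

    Additional components deeper than `level` are cut off; lifted (curly)
    structures deeper than `level` keep only their trunk text.
    """
    out, stack = [], []
    cut_depth = None                  # depth of the bracket currently being cut
    for ch in expression:
        if ch in OPENERS:
            stack.append(ch)
            depth = len(stack)
            if cut_depth is None:
                if depth <= level:
                    out.append(ch)
                elif ch in ADDITIONAL:
                    cut_depth = depth
            continue
        if ch in CLOSERS:
            depth = len(stack)
            stack.pop()
            if cut_depth is not None:
                if depth == cut_depth:
                    cut_depth = None
            elif depth <= level:
                out.append(ch)
            continue
        if cut_depth is not None:
            continue
        if len(stack) > level and ch in SEPARATORS:
            continue
        out.append(ch)
    return "".join(out)


FULL_1 = "您‖[对^(国家‖[不]允许|名人∥[以^(患者▷的)身份]出现<在^广告□中>▷的)规定][怎么]看?"
FULL_2 = "大家‖{谁‖[都][没]想到|{他‖会∶记得|{今天‖是|(自己▷的)生日。}}}"

if __name__ == "__main__":
    for full in (FULL_1, FULL_2):
        for level in range(sps_level(full) + 1):
            print(level, truncate(full, level))
```

Run on the level-4 rows of Table 2, the loop prints one expression per level for each sentence.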

Using the XML DOM, single-level SPS expressions can easily be extracted from the treebank. These expressions provide the simplest examples for teaching sentence patterns.

3.3 Lexical Structure Analysis Does Not Increase SPS Level

One of the characteristics of Chinese syntax is the high consistency of structural relations from morpheme to word and from word to phrase. The “phrase-based” view therefore holds that the lexical structure of derivative dynamic words should be included in the scope of syntactic analysis. In this way, the granularity of the leaf nodes (words) in syntactic analysis can be controlled, and the operating procedure of syntactic analysis can be formally unified.

Sentence-based Grammar holds that the structural analysis of dynamic words must not be confused with syntax, and that morphological derivation differs from the free combinability of syntax. One of the characteristics of SPS is that it separates the structural analysis of dynamic words from syntactic analysis.

Take n:v|n%n (碎 [sui] (crush)|石 [shi] (stone)%机 [ji] (machine)) as an example. Although the relations between the morphemes are marked as “verb-object” and “attributive-head”, native Chinese speakers can tell that the combination is lexical rather than syntactic. Neither the noun morpheme nor the verb morpheme can be expanded, nor can they be replaced by synonymous disyllables (*粉碎石机 [fensui shi ji], *碎石头机 [sui shitou ji], *粉碎石头机 [fensui shitou ji]). These differences show that lexical structure is much more restricted than syntactic combination. In addition to the above-mentioned structures built from component relations, the SPS system also assigns the quantitative structure, the overlapping structure, the auxiliary word structure and other types (see Sect. 2.2) to the scope of lexical structure.

Unlike syntactic combination, the lexical structure of a dynamic word is generally not extensible, or its extension types are enumerable. For example, the extension of the quantitative structure (m:m-q) is limited to a few @mod patterns such as m:m-a-q (“一整套 [yi zheng tao] (a whole set of)”, “两大箱 [liang da xiang] (two big boxes of)”) and m:m-u-q (“十来个 [shi lai ge] (ten or so)”, “千把斤 [qian ba jin] (a thousand or so jin of)”).

In mainstream treebanks, dynamic words with more than four syllables are generally analyzed as syntactic structures. If syntactic and lexical structures were not distinguished in the Diagrammatic Syntactic Analysis, the SPS level would increase greatly. A lexical structure independent of syntax is therefore indispensable for SPS analysis.

The lexical patterns of dynamic words can be collected into a knowledge base. On the one hand, handling dynamic words with a “lexical pattern first” strategy can improve the efficiency of annotation; on the other hand, it prevents the lexical structure of dynamic words from spilling over into the syntactic level and affecting the SPS analysis. According to [8], the knowledge base of dynamic word structure patterns currently contains about 400 records. Although it is still growing, it remains manageable compared with the range of syntactic combinations.

In the SPS schema, the description of lexical structure is not hierarchical. For example, “碎石机 [suishiji] (stone crusher)” is only marked “n:v|n%n”, with no explicit distinction between “[[v|n]%n]” and “[v|[n%n]]”. This is based on the following considerations. First, the POS of the whole word can resolve some structural ambiguity. In this example, if the POS of the whole is n, the structure can only be the former ([[v|n]%n]). Second, many dynamic words are perceived as a whole, so a hierarchical analysis is irrelevant. For example, the structural pattern of “政治影响力 [zhengzhi yingxiangli] (political influence)” is “n:n2%n2%n”. Both “[n%[n%n]]” and “[[n%n]%n]” are acceptable, because they are semantically equivalent. Another example is “长白山区 [changbai shan qu] (Changbai Mountain Area)”. Logically, “山 [shan] (mountain)” combines first with “长白 [changbai]”, but the prosody is just the opposite. Leaving the level unmarked therefore accommodates the flexibility of dynamic word structure.


4 The Teaching Application of SPS

The design of the SPS system originates from Sentence-based Grammar, the earliest Chinese teaching grammar. Besides its vivid diagrammatic presentation, the SPS data format makes it easy to exchange the “sentence pattern” information commonly used in Chinese grammar teaching. The following are examples of sentence patterns from Chinese textbooks for non-native Chinese speakers:

• 而 [er]……则 [ze]…… (and…then…)
• S + 要 [yao] (will)/快要 [kuai yao] (will)/就要 [jiu yao] (will) + V + (O) + 了 [le]
• 宁肯 [ningken]……, 也不 [ye bu]/也要 [ye yao]…… (would rather…than/to…)
• A + 有 [you] + B + 那么 [name]/这么 [zheme] + Adj (A is as Adj as B)
• Adj/V + 是 [shi] (be) + Adj/V, 可是 [keshi]/但是 [danshi] (but)……

In the field of teaching Chinese as a second language, expressions like those above are widespread in texts, handouts, exercise books, teaching materials and so on. They contain rich knowledge of the Chinese language and have high practical value both for teaching itself and for NLP. However, because they come from different sources and lack a unified format and structural information, sentence pattern expressions have long remained unorganized and have not been collected into a knowledge base. A sentence pattern expression is usually a sequence of word forms, POS symbols, components, ellipses and other items, with “+”, “/” and other symbols as separators. It is difficult for general NLP technology to support information processing that depends on the structure of these expressions, such as obtaining example sentences for teaching and learning Chinese as a second language, or analyzing the structure of the expressions themselves. This section briefly introduces how the SPS treebank is used in practice to solve these two problems, so as to illustrate the application value of this structure in information processing.

4.1 Retrieval and Acquisition of Examples of Sentence Patterns

In engineering practice, regular-expression matching is a rough but efficient way to obtain example sentences: the ellipses in the sentence pattern expression are replaced with wildcards, and the result is matched against the corpus. For sentence pattern expressions containing few Chinese characters, this method has a noticeable error rate, whereas application systems usually expect example sentences to be retrieved accurately. For instance, the search results for “而……则……” include “小的浴堂还是有的, 大而新的浴堂则都不复有此名目. (The small bathhouses still have them, but the large and new bathhouses no longer do.)”. This sentence is inconsistent with the structure of the sentence pattern expression and obviously does not meet teaching needs. The problem is that the structural levels of “而” and “则” do not coincide with those in the sentence pattern expression. To improve the accuracy of retrieval, it is necessary to constrain the structural level of specific keywords in the sentence. Based on the SPS treebank, the following two solutions are under consideration.

Solution 1: Accurate Retrieval Based on XPath. Taking “而……则……” as an example, the two conjunctions should be located in two conjunction nodes at the same SPS level. That is to say, the two conjunction nodes should be siblings in the XML. The XPath is expressed as follows (而 [er], 则 [ze]):

• //xj[cc[1]/c = "而" and cc[2]/c = "则"]

Corresponding to the 2nd and 3rd versions of the SPS schema respectively, [11, 12] compiled XPaths one by one for the grammar points in international Chinese textbooks, and realized accurate sentence retrieval for specific grammar points. The feasibility of this solution rests on the premise that the relative hierarchy of the XML nodes corresponding to the feature items (word forms, POS or components) is fixed in SPS. To verify this, we extracted 1668 sentence pattern expressions from the Chinese textbook corpus. Analysis of the retrieved example sentences shows that the relative hierarchy of the XML nodes corresponding to the feature items is not only fixed, but also basically falls within the scope of one SPS level.

Solution 2: Fast Retrieval Based on SPS Expression. First, single-level SPS expressions are extracted from the treebank according to the idea described in Sect. 3.2 and converted into an SPS instance bank. The general extraction method is to traverse the XML and locate the hierarchy containing “sbj” and “prd” nodes, then take the component nodes that do not exceed one SPS level and merge the head words of each component to form the SPS instance [13]. Taking sentence ① in Fig. 3 as an example, the following two SPS instances can be extracted:

• 您对规定怎么看? (What do you think of the regulations?)
• 国家不允许名人以身份出现在广告中 (The state does not allow celebrities to appear in advertisements in a certain capacity).

Second, by matching regular expressions against the SPS instance bank, the retrieval accuracy can be greatly improved to meet practical needs. It should be noted that some SPS instances extracted in this way result from pruning the original sentence, and some positions may be semantically incomplete (e.g. “以身份”).
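The contrast between plain regular matching and level-aware retrieval can be illustrated on a toy example. This is a sketch under stated assumptions: the XML fragment below is hypothetical (only the node names ju, xj, sbj, prd, cc and c are taken from the text; the layout, the att node, the id attributes and the schematic second sentence with placeholder words 甲/乙 are invented for illustration), and the sibling test is written in plain Python because the standard library's ElementTree supports only a subset of XPath. A full XPath engine could evaluate the expression given in Solution 1 directly.

```python
import re
import xml.etree.ElementTree as ET


def pattern_to_regex(expression):
    """Replace each run of ellipsis characters in a pattern expression with '.+'."""
    parts = [part for part in re.split(r"…+", expression) if part]
    return re.compile(".+".join(re.escape(part) for part in parts))


# Hypothetical treebank fragment, for illustration only.
TOY_TREEBANK = """
<root>
  <ju id="false-positive">
    <xj>
      <sbj>
        <att><a>大</a><cc><c>而</c></cc><a>新</a></att>
        <n>浴堂</n>
      </sbj>
      <cc><c>则</c></cc>
      <prd><v>有</v></prd>
    </xj>
  </ju>
  <ju id="schematic-hit">
    <xj>
      <cc><c>而</c></cc>
      <sbj><n>甲</n></sbj>
      <cc><c>则</c></cc>
      <prd><v>乙</v></prd>
    </xj>
  </ju>
</root>
"""


def same_level_hits(root, first, second):
    """Return ids of sentences with a clause whose first two sibling
    conjunction positions hold `first` and `second`."""
    hits = []
    for ju in root.iter("ju"):
        for xj in ju.iter("xj"):
            # findall("cc") only sees direct children, i.e. the same SPS level.
            words = [cc.findtext("c") for cc in xj.findall("cc")]
            if words[:2] == [first, second]:
                hits.append(ju.get("id"))
                break
    return hits


root = ET.fromstring(TOY_TREEBANK.strip())

if __name__ == "__main__":
    surface = "小的浴堂还是有的, 大而新的浴堂则都不复有此名目."
    # Plain regular matching accepts the sentence although 而 sits inside an attributive.
    print(bool(pattern_to_regex("而……则……").search(surface)))
    # The level-aware check keeps only the clause where both conjunctions are siblings.
    print(same_level_hits(root, "而", "则"))
```

In the toy fragment, the conjunction 而 of the first sentence is nested inside the attributive, so it is not a sibling of 则 and the sentence is rejected by the structural check even though the regular expression matches its surface form.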
Although pruned, these sentence examples are more concise than the original sentences and closer to the actual needs of teaching.

4.2 Analysis of the Structure of Sentence Pattern Expression

The task of automatic structural analysis is as follows:

Input: a preprocessed “sentence pattern expression”.
Output: a “sentence pattern structure expression” as shown in Table 2.

The preprocessing mainly includes: (1) splitting the items connected by “/” (the logical operator OR) and combining them separately; (2) marking the POS symbols uniformly according to the sentence structure; (3) replacing the non-feature items with “…”. For example, preprocessing “A + 有 + B + 那么/这么 + Adj.” yields two expressions: “…有…那么a” and “…有…这么a”.


The analysis proceeds in the following steps:

Step 1: Use regular expressions to match instances carrying single-level SPS information from the above-mentioned SPS instance bank;
Step 2: Divide each single-level SPS expression into a component sequence and compare it with the input feature items one by one. Matching items are retained, and mismatching items are replaced with “…”;
Step 3: Merge multiple consecutive “…” in the previous result;
Step 4: Different instances may yield inconsistent SPS expressions; the final result is the one that occurs most often.

After preprocessing the 1668 sentence pattern expressions, a total of 2656 inputs were obtained. Limited by the size of the current SPS treebank, 1613 of the 2656 inputs can be matched to an SPS instance. The mismatches are mostly due to overly concrete sentence pattern expressions, such as “在…方面做文章 (make an issue in …)” and “决不只是…, 更主要的是… (It’s not just…, more importantly…)”. Among the 1613 matched inputs, the analysis accuracy is above 99% according to manual proofreading.
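The preprocessing and Steps 2 to 4 can be sketched on the example expression used above. This is a simplified illustration rather than the authors' pipeline: the POS mapping, the rule that single capital letters and parenthesized items are non-feature placeholders, and all function names are assumptions, and pinyin or gloss annotations are assumed to have been stripped beforehand.

```python
import itertools
import re
from collections import Counter

# Hypothetical mapping from textbook POS labels to single-letter POS symbols.
POS_SYMBOLS = {"Adj": "a", "Adj.": "a", "V": "v"}
# Hypothetical rule: single capital letters and parenthesized items are non-feature items.
PLACEHOLDER = re.compile(r"^[A-Z]$|^\(.*\)$")


def merge_ellipses(text):
    """Step 3: collapse consecutive '…' into one."""
    return re.sub(r"…{2,}", "…", text)


def preprocess(expression):
    """Split '/' alternatives, normalize POS symbols, mask non-feature items."""
    slots = [[alt.strip() for alt in item.split("/")] for item in expression.split("+")]
    results = []
    for combination in itertools.product(*slots):
        tokens = []
        for item in combination:
            if item in POS_SYMBOLS:
                tokens.append(POS_SYMBOLS[item])
            elif PLACEHOLDER.match(item):
                tokens.append("…")
            else:
                tokens.append(item)
        results.append(merge_ellipses("".join(tokens)))
    return results


def mask(components, features):
    """Step 2: keep components that are feature items, replace the rest with '…'."""
    return merge_ellipses("".join(c if c in features else "…" for c in components))


def vote(candidates):
    """Step 4: keep the most frequent candidate."""
    return Counter(candidates).most_common(1)[0][0]


if __name__ == "__main__":
    print(preprocess("A + 有 + B + 那么/这么 + Adj."))
    print(mask(["他", "有", "你", "那么", "高"], {"有", "那么"}))
    print(vote(["x", "y", "x"]))
```

Enumerating the “/” alternatives with a Cartesian product is what makes the number of inputs larger than the number of original sentence pattern expressions.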

5 Conclusion

Except for some constructions (构式) [14], the description of Chinese syntactic structure in treebanks mostly adopts the dichotomy of the Hierarchical Analysis of Phrase Structure (层次分析法). In Chinese teaching and research, Sentence Component Analysis (SCA, 句子成分分析法) still plays an irreplaceable role, although it was once considered ill-suited to describing structural hierarchy. This misunderstanding arises because the two different concepts of SPS level and phrasal hierarchy were not distinguished. In addition, the lack of a formalized SPS system is a main reason why SCA is seldom adopted in the construction of Chinese treebanks. This paper reviews the formalized SPS system based on SCA, and discusses the hierarchical characteristics of SPS from the perspectives of the diagram, the XML storage structure and the SPS expression. The application in the field of teaching Chinese as a second language shows that SPS is a practical and effective formalized structure for information processing.

Acknowledgments. This work was supported by the National Natural Science Foundation of China (No: 61877004 and 62007004) and the Key Project of the National Social Science Foundation of China (No: 18ZDA295). The authors would like to express their gratitude for this support.

References

1. Qiu, L.-K., Jin, P., Wang, H.-F.: A multi-view Chinese treebank based on dependency grammar. J. Chin. Inf. Process. 29(03), 9–15 (2015). (in Chinese)
2. Li, J.-X.: A New Chinese Grammar. The Commercial Press, Beijing (2001). (in Chinese)
3. Peng, W.-M., Song, J.-H., Wang, N.: Design of diagrammatic parsing method of Chinese based on sentence pattern structure. Comput. Eng. Appl. 50(06), 11–18 (2014). (in Chinese)
4. Peng, W.-M.: Digital Platform Construction of Sentence-Based Grammar and Its Application Study. Beijing Normal University, Beijing (2012). (in Chinese)
5. He, J., Peng, W., Song, J., Liu, H.: Annotation schema for contemporary Chinese based on JinXi Li’s grammar system. In: Liu, P., Su, Q. (eds.) CLSW 2013. LNCS (LNAI), vol. 8229, pp. 668–681. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-45185-0_69
6. Yang, T.-X., Peng, W.-M., Song, J.-H.: High efficiency syntax tagging system based on the sentence pattern structure. J. Chin. Inf. Process. 28(04), 43–49 (2014). (in Chinese)
7. Peng, W., Song, J., Sui, Z., Guo, D.: Formal schema of diagrammatic Chinese syntactic analysis. In: Lu, Q., Gao, H. (eds.) Chinese Lexical Semantics. CLSW 2015. LNCS, vol. 9332, pp. 701–710. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-27194-1_68
8. Guo, D., Zhu, S., Peng, W., Song, J., Zhang, Y.: Construction of the dynamic word structural mode knowledge base for the International Chinese Teaching. In: Dong, M., Lin, J., Tang, X. (eds.) Chinese Lexical Semantics. CLSW 2016. LNCS, vol. 10085, pp. 251–260. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49508-8_24
9. Zhao, M., Peng, W.-M., Song, J.-H.: Development and optimization of syntax tagging tool on diagrammatic treebank. J. Chin. Inf. Process. 28(06), 26–33 (2014). (in Chinese)
10. Song, T., Peng, W., Song, J., Guo, D., He, J.: The construction of sentence-based diagrammatic treebank. In: Dong, M., Lin, J., Tang, X. (eds.) Chinese Lexical Semantics. CLSW 2016. LNCS, vol. 10085, pp. 306–314. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49508-8_29
11. Zhang, Y., Song, J.-H., Zhu, X., Peng, W.-M.: The identification of grammar points in international Chinese language teaching materials based on sentence-based annotation. In: International Conference of Educational Innovation through Technology (EITT), pp. 29–36. IEEE Press, New York (2014)
12. He, J., Peng, W.-M., Song, J.-H.: Digitalization of Chinese sentence structure patterns: improvement of sentence-based grammar and diagrammatic syntactic analysis. J. Beijing Normal Univ. (Nat. Sci.) 52(04), 413–419 (2016). (in Chinese)
13. Zhu, S.-Q., Peng, W.-M., Song, J.-H., Guo, D.-D.: The extraction of Chinese sentence pattern instance based on diagrammatic treebank. J. Chin. Inf. Process. 31(05), 32–39 (2017). (in Chinese)
14. Zhan, W.-D.: On theoretical issues in building a knowledge database of Chinese constructions. J. Chin. Inf. Process. 31(01), 230–238 (2017). (in Chinese)

On the Semantic Ambiguity of Chinese Causative Resultative V-Vs

Jiaojiao Yao
Center of Linguistics, University of Lisbon, Lisbon, Portugal

Abstract. In this study, we propose a syntactic structure for Chinese Causative Resultative V-Vs (CR V-Vs), also known as “resultative verb compounds”, in an attempt to account for the semantic ambiguity observed in some instances. We claim that a possible interpretation of a CR V-V should be compatible with the proposed structure and must be culturally recognized. We showcase our account’s explanatory and predictive power by presenting some CR V-Vs with and without semantic ambiguity.

Keywords: Causative · Resultative · Verb compound · Semantic ambiguity

1 Introduction

The Chinese Causative Resultative V-Vs (CR V-Vs), also known as “resultative verb compounds” in the literature (e.g., [1, 9, 11, 18]), express caused-result meanings and are composed of two verbal components, with V1 denoting the causing event and V2 the result.¹ It has been observed that some CR V-Vs exhibit semantic ambiguity. For example, the sentences in (1)–(2) each yield at least two possible interpretations. At the same time, the semantics of CR V-Vs is subject to constraints: as observed in (2), interpretation (c) is marginal and interpretation (d) is unacceptable.

(1) 他骑累马了。
    Ta qi lei ma le.
    he ride tired horse ASP
    a. ‘He rode a horse, and this made him tired.’
    b. ‘He rode a horse, and this made the horse tired.’

(2) 这孩子追累我了。
    Zhe haizi zhui lei wo le.
    this child chase tired I ASP
    a. ‘This child chased me, and this made me tired.’
    b. ‘I chased this child, and this made me tired.’
    c. ??‘This child chased me, and this made the child tired.’
    d. *‘I chased this child, and this made the child tired.’

Moreover, semantic ambiguity does not always occur. For example, only one interpretation is possible for the example in (3), in contrast to (1).

(3) 他骑累那匹马了。
    Ta qi lei na pi ma le.
    he ride tired that CL horse ASP
    a. *‘He rode that horse, and this made him tired.’
    b. ‘He rode that horse, and this made that horse tired.’

Argument linking rules have been proposed to account for the semantic ambiguity of Chinese CR V-Vs in previous studies (e.g., [12, 13]). However, leaving aside whether specific linking rules can be universally applied, there is no agreed-upon structure of CR V-Vs within which these rules could operate. If we assume that CR V-Vs are listed items in the lexicon (e.g., [1, 11, 12]), their productivity would be a mystery. On the other hand, with a syntactic and event decomposition approach (e.g., [3, 14, 15]), it is difficult to explain why some CR V-Vs can fit into multiple structures but others cannot.

In this study, within the theoretical framework of the Minimalist Program [2] and the general tenets of Distributed Morphology [5, 6], we will provide a structure of CR V-Vs, based on which we attempt to explain the phenomenon of semantic ambiguity. In particular, we will address the following questions:

1) Where does the semantic ambiguity originate?
2) For the CR V-Vs with semantic ambiguity, why are some interpretations acceptable but others not?
3) How can we explain the contrast between (3) and (1)?

¹ We assume that when adjective-like constituents occur in CR V-Vs, they function as verbs, leaving the distinction between verb and adjective categories a separate issue.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 299–311, 2022. https://doi.org/10.1007/978-3-031-06703-7_23

2 The Syntactic-Semantic Interface

2.1 Syntactic Structure

Distributed Morphology [5, 6] posits that syntax is the only generative system responsible for both word structure and phrase structure. Within this framework, the “Narrow Lexicon” [16] consists of two classes of terminal nodes that enter the syntactic derivation: “lexical roots” and “bundles of grammatical features” (functional elements). The lexical roots contain encyclopedic semantic content and are acategorial; they can only get categorized and interpreted by merging with a categorizing functional head little x, such as v, n, or a. In particular, the core structure of a verb phrase contains a little v head and a root (represented by “√”), as illustrated in (4): the little v semantically introduces an event, and the root modifies this event by contributing semantic content.

(4) [Tree diagram; node labels: v, √]

Following Folli & Harley’s different “flavors” of v heads [4] (e.g., vCAUSE, vBECOME, vDO), we propose that CR V-Vs involve vCAUSE, a v head with a CAUSE feature.


Inspired by Pylkkänen’s parametric proposal on causative constructions [17], we hypothesize that in CR V-Vs, vCAUSE selects roots, similar to how Japanese lexical causatives are formed in Pylkkänen’s proposal. As shown in (5), the result-denoting root, represented by “√2” (i.e., the component at V2 position in CR V-Vs), merges with vCAUSE to get causativized.

(5) [Tree diagram; node labels: √2, vCAUSE]

Note that in CR V-Vs, the causing eventuality is overtly expressed by the component at the V1 position. We posit that this root, namely √1, conflates to vCAUSE as an adjunct to specify the Manner of the causing eventuality. Here we follow Haugen’s Manner Conflation [8], which shares the spirit of Harley’s Manner Incorporation [7], initially proposed for English instrument verbs such as hammer and brush. Therefore, a CR V-V would involve the structure in (6), from which the V-V adjacency naturally follows.

(6) [Tree diagram; node labels: vCAUSEP, vCAUSE, √1, √2, vCAUSE]

Despite being syntactically formed, a CR V-V functions as a V0, which explains why CR V-Vs exhibit high integrity (e.g., neither V1 nor V2 is extractable via wh-movement) and general properties of words (e.g., each CR V-V tends to be disyllabic, a tendency observed in Chinese words).

2.2 Argument Realization

In syntax, each CR V-V functions as a single verbal unit, which expresses caused-result meanings (i.e., the result encoded in √2 is brought about via the eventuality denoted by √1). As a single unit, a CR V-V may be syntactically and semantically similar to unaccusative verbs such as melt or telic action verbs such as kill. We hypothesize that in either case, the Causee is generated at the position of an internal argument. When a CR V-V is semantically close to unaccusatives, the external argument can be null, and in this case, the Causee may move up and become the surface Subject. For example, in (7), the Causer is covert, and the Causee 杯子 beizi ‘cup’ surfaces as the sentence’s Subject. We call these apparently intransitive V-Vs Inchoative CR V-Vs.

(7) a. 杯子打碎了。
       Beizi da sui le.
       cup hit break ASP
       ‘The cup broke.’
    b. [Tree diagram; node labels: vP, VP, vCAUSEP (V0), vCAUSE, √da ‘hit’, √sui ‘break’, DP beizi ‘cup’]

When the Causer is present, we follow [10] and assume that the Causer is generated at Spec, VoiceP. In this case, the Causee stays in situ. For example, a transitive counterpart of (7) is presented in (8), which we call the Causative type of CR V-Vs. The alternation between Inchoative and Causative CR V-Vs is analogous to the causative alternation observed in verbs such as melt and break.

(8) a. 妈妈打碎杯子了。
       Mama da sui beizi le.
       mom hit break cup ASP
       ‘Mom broke the cup.’
    b. [Tree diagram; node labels: VoiceP, DP mama ‘mom’, vP, VP, vCAUSEP (V0), vCAUSE, √da ‘hit’, √sui ‘break’, DP beizi ‘cup’]

In contrast, when a CR V-V is semantically close to telic action verbs such as kill, which involve agentivity and intentionality, the Causer must be overt. We call this type of CR V-Vs the Accusative type. Accusative CR V-Vs only have transitive uses, disallowing intransitive counterparts. An example of Accusative CR V-Vs is presented in (9a, b), whose intransitive counterpart is ungrammatical, as shown in (9c).


(9) [Tree diagram; node labels: VoiceP, DP ta ‘he’, vP, VP, vCAUSEP (V0), vCAUSE, √pai ‘smack’, √si ‘die’, DP yi zhi chongzi ‘one insect’]

2.3 Semantics of CR V-Vs

So far, we have shown that CR V-Vs may echo alternating unaccusative verbs such as melt (i.e., the Inchoative and Causative CR V-Vs) or telic action verbs such as kill (i.e., the Accusative CR V-Vs). The structures of CR V-Vs are summarized in (10).

(10) a. Inchoative CR V-V
        [Tree diagram; node labels: vP, VP, vCAUSEP (V0), vCAUSE, √1, √2, DP Causee]
     b. Causative/Accusative CR V-V
        [Tree diagram; node labels: VoiceP, DP Causer, vP, VP, vCAUSEP (V0), vCAUSE, √1, √2, DP Causee]

The semantic meanings expressed by the structures in (10) are:

(11) Semantics of CR V-Vs:
     a. The Causee enters into the result denoted by √2 via the eventuality denoted by √1.
     b. When the Causer is overtly expressed, the whole caused-result event is brought about by it.
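As an informal illustration of how the two structures in (10) map onto surface word order, the following Python sketch may help. It is not part of the proposed analysis: the class name `CRVV`, the function `surface_order`, and the labels "unaccusative-like" and "accusative" are hypothetical names introduced only for this example. The sketch ignores the aspect marker le and the complex-root cases discussed in Sect. 3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CRVV:
    """A causative resultative V-V (names and labels are illustrative only)."""
    root1: str  # V1: names the causing eventuality (Manner of vCAUSE)
    root2: str  # V2: names the result
    kind: str   # "unaccusative-like" or "accusative" (hypothetical labels)


def surface_order(verb: CRVV, causee: str, causer: Optional[str] = None) -> List[str]:
    """Return the word order suggested by the structures in (10)."""
    if causer is None:
        if verb.kind == "accusative":
            # Accusative CR V-Vs only have transitive uses.
            raise ValueError("Accusative CR V-Vs require an overt Causer")
        # (10a): the Causee moves up and surfaces as the Subject.
        return [causee, verb.root1, verb.root2]
    # (10b): the Causer is at Spec, VoiceP; the Causee stays in situ.
    return [causer, verb.root1, verb.root2, causee]


da_sui = CRVV("da", "sui", "unaccusative-like")
print(surface_order(da_sui, "beizi"))          # order of (7a), without 'le'
print(surface_order(da_sui, "beizi", "mama"))  # order of (8a), without 'le'
```

Under these assumptions, the same verbal unit yields the Inchoative order when no Causer is supplied and the Causative order when one is, while an Accusative item such as pai-si ‘smack-die’ is rejected without a Causer.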


Note that the Causer is not necessarily an Agent of the causing eventuality denoted by √1. The only requirement is that it be a possible trigger of the whole caused-result event described in (11a). It may be a participant (e.g., Agent, Theme, Instrument) or a non-participant of the eventuality denoted by √1. As will be shown in Sect. 3, this is the main reason why semantic ambiguity occurs. Therefore, we claim that a possible interpretation of a CR V-V should meet the requirements in (12):

(12) Requirements for possible interpretations of CR V-Vs
     a. It should be compatible with the structure in (10). For instance, when a CR V-V takes two NPs, which we name N1 (the surface Subject) and N2 (the surface Object), the interpretation in which N1 is the Causee and N2 is the Causer is ruled out.
     b. The caused-result event should be culturally recognized. In other words, such an event should be possible according to people’s world knowledge and experience.

In the following section, we will present some specific cases to show the explanatory power of our account.

3 Cases of Semantic Ambiguity and Constraints

3.1 吃饱 Chi-bao ‘Eat-Full’

We will first present a case of semantic constraint. As shown in (13), the CR V-V 吃饱 chi-bao ‘eat-full’ imposes a constraint on the surface Object. While the contrast between (13a) and (13b) might seem to show a constraint on definiteness or “size”, the ungrammaticality of (13c) tells us that this is not exactly the case: the surface Object in (13c) is indefinite and bare, yet the sentence is ungrammatical.

Despite involving two NPs, the sentence in (13a) should belong to the Inchoative type, since the Causee 他 ta ‘he’, which controls the result of ‘being full’, surfaces as the Subject. We hypothesize that the surface Object, 饭 fan ‘rice’, is part of the cause-denoting root. As presented in (14), in this case, the root that conflates to vCAUSE to specify the Manner is a complex root. Being a root, this complex constituent is expected to denote a generic meaning. ‘Eating rice’ is considered basic in Chinese culture, symbolizing the generic activity of ‘eating’. In contrast, ‘that/a bowl of rice’ and ‘bread’ are overly specific and therefore are not allowed to be part of a complex root, causing the ungrammaticality of (13b, c).

(14) [Tree diagram; node labels: vP, VP, vCAUSEP (V0), vCAUSE, √chi-fan ‘eat rice’, √bao ‘full’, DP ta ‘he’]

This “basicness” constraint also applies to 喝醉 he-zui ‘drink-drunk’. As shown by the contrast between the examples in (15), 酒 jiu ‘alcohol’, but not 这瓶酒 zhe ping jiu ‘this bottle of alcohol’, is allowed to occur as the surface Object. That is because ‘drinking alcohol’ expresses the generic meaning of ‘drinking’, but ‘drinking this bottle of alcohol’ is too specific to fulfill the “basicness” requirement.

3.2 骑累 Qi-lei ‘Ride-Tired’

Regarding the CR V-V 骑累 qi-lei ‘ride-tired’, an interesting contrast is shown in (16–17): when the Object is a bare noun, the sentence is semantically ambiguous (16); when it is a DP, the ambiguity disappears (17).


Let us first analyze the sentence in (16). The difference between the two interpretations lies in whether N1 or N2 takes the role of Causee (i.e., whether 他 ta ‘he’ or 马 ma ‘horse’ became tired). According to the proposed structure in (10), when N1 takes the role of Causee, the sentence has an Inchoative structure; when N2 takes the role of Causee, the sentence has a Causative/Accusative structure. Therefore, the interpretation (16a), where N1 他 ta ‘he’ is the Causee, should have the Inchoative structure in (18a); in contrast, the interpretation (16b), with N2 马 ma ‘horse’ as Causee, corresponds to the structure in (18b).

(18) a. Structure of (16a)
        [Tree diagram; node labels: vP, VP, vCAUSEP (V0), vCAUSE, √qi-ma ‘ride a horse’, √lei ‘tired’, DP ta ‘he’]
     b. Structure of (16b)
        [Tree diagram; node labels: VoiceP, DP ta ‘he’, vP, VP, vCAUSEP (V0), vCAUSE, √qi ‘ride’, √lei ‘tired’, DP ma ‘horse’]

Note that the structure of (16a) in (18a) involves a complex root, similar to the cases with 吃饱 chi-bao ‘eat-full’ and 喝醉 he-zui ‘drink-drunk’ in Sect. 3.1. The meaning of ‘riding a horse’ fulfills the “basicness” requirement to serve as a complex root since it denotes a category of human activity. However, if we replace 马 ma ‘horse’ with an NP with a more specific meaning, such as 那匹马 na pi ma ‘that horse’ or 这匹马 zhe pi ma ‘this horse’, the “basicness” constraint is violated, and the sentence would be ungrammatical under this structure. That is why interpretation (a) of (17) is not allowed; its structure is presented in (19a). This structure would crash due to the violation of the “basicness” constraint. For the sentence in (17), only the Accusative structure in (19b) is possible, which corresponds to the interpretation (17b).

(19) a. Structure of *(17a)
        [Tree diagram; node labels: vP, VP, vCAUSEP (V0), vCAUSE, √qi-na-pi-ma ‘ride that horse’, √lei ‘tired’, DP ta ‘he’]
     b. Structure of (17b)
        [Tree diagram; node labels: VoiceP, DP ta ‘he’, vP, VP, vCAUSEP (V0), vCAUSE, √qi ‘ride’, √lei ‘tired’, DP na pi ma ‘that horse’]

3.3 追累 Zhui-lei ‘Chase-Tired’

Among the four interpretations of the sentence with 追累 zhui-lei ‘chase-tired’ in (20), two are possible (20a, b), one is marginal (20c), and the remaining one is unacceptable (20d). Now we will show how our proposal can predict this.

The interpretations (20a, b) would correspond to a transitive structure in (21) since N2 wo ‘I’ takes the role of Causee. This structure expresses the semantic meaning that ‘I became tired due to the activity of chasing, and it is this child that brings about the whole event’. As highlighted in Sect. 2.3, the Causer does not have to be an Agent of the causing activity but rather is required to be a possible trigger of the whole caused-result event. Therefore, the structure in (21) yields two possible interpretations: a) the child is interpreted as Agent of the activity ‘chasing’, corresponding to (20a); b) the child is not an Agent, but the Theme of the activity ‘chasing’, corresponding to (20b). The second interpretation is possible because the child, being the Theme of ‘chasing’, is a possible Causer, according to our world knowledge and experience: the scenario that I chase a child, and that it makes me tired, is possible in reality.

(21) Structure of (20a, b)
     [Tree diagram; node labels: VoiceP, DP zhe haizi ‘this child’, vP, VP, vCAUSEP (V0), vCAUSE, √zhui ‘chase’, √lei ‘tired’, DP wo ‘I’]

In contrast, the interpretations (20c, d) imply an Inchoative structure since N1 这孩子 zhe haizi ‘this child’ takes the role of Causee. In this case, since the surface Subject is in fact an internal argument, the surface Object can only occur in a complex root, as presented in (22), similar to the case with 吃饱 chi-bao ‘eat-full’ in Sect. 3.1. However, to form a complex root, the “basicness” requirement should be fulfilled, which poses a problem here. In contrast to 吃饭 chi-fan ‘eat rice’ and 喝酒 he-jiu ‘drink alcohol’ in Sect. 3.1, which express the generic meanings of ‘to eat’ and ‘to drink’, 追我 zhui-wo ‘chase me’ can hardly be considered to convey a basic meaning. For this reason, the interpretation (20c), where the causing eventuality is supposed to be ‘chasing me’, is marginal or even unacceptable to some native speakers of Chinese. In contrast, the interpretation (20d) is entirely unacceptable. That is because even if we allow 追我 zhui-wo to form a complex root, the interpretation (20d) would imply that this complex root conveys the meaning of ‘I chase’, which is impossible with the linear order of 追 zhui ‘chase’ preceding 我 wo ‘I, me’.

(22) Structure of (20c, d)
     [Tree diagram; node labels: vP, VP, vCAUSEP (V0), vCAUSE, √zhui-wo ‘chase me’ (20c) / *‘I chase’ (20d), √lei ‘tired’, DP zhe haizi ‘this child’]

3.4 想哭 Xiang-ku ‘Miss-Cry’

Another case of semantic ambiguity is with the CR V-V 想哭xiang-ku ‘miss-cry’, as presented in (23).

The interpretations (23a, b) imply a transitive structure since it is N2, i.e., 妈妈 mama ‘mom’, that takes the role of Causee (i.e., the result is ‘mom cried’). The corresponding structure is presented in (24).

(24) Structure of (23a, b)
     [Tree diagram; node labels: VoiceP, DP nü’er ‘daughter’, vP, VP, vCAUSEP (V0), vCAUSE, √xiang ‘miss’, √ku ‘cry’, DP mama ‘mom’]

This structure is similar to the transitive structure of 追累 zhui-lei ‘chase-tired’ in (21). However, while (21) yields two interpretations, only one interpretation is possible here: the interpretation (23a) is not acceptable. Recall that the semantic ambiguity of (21) is due to the fact that the Causer only has to be a possible trigger of the whole caused-result event. It may be an Agent, a Theme, or even a non-participant of the causing eventuality, as long as such a caused-result event is culturally recognized. Theoretically, in (24), with 女儿 nü’er ‘daughter’ being the Causer, the causing eventuality may be ‘the daughter is missed (by the mom)’ or ‘the daughter is missing (the mom)’. However, based on people’s experience, it is more likely that the person who misses others, rather than the one who is missed by others, ends up crying. Since the Causee in (24) is 妈妈 mama ‘mom’, a natural interpretation is that the mom’s crying resulted from her missing the daughter instead of being missed by her daughter. For this reason, although theoretically both (23a) and (23b) are compatible with the structure in (24), only (23b) is culturally recognized and acceptable. Therefore, the difference between (21) and (24) resides not in the syntactic structure but in people’s world knowledge and experience.


Based on the interpretations (23c, d), an Inchoative structure is expected since it is N1 女儿 nü’er ‘daughter’ that takes the role of Causee (i.e., the result is ‘the daughter cried’). The corresponding structure is presented in (25), where the cause-denoting root is supposed to be a complex one.

(25) Structure of (23c, d)
     [Tree diagram; node labels: vP, VP, vCAUSEP (V0), vCAUSE, √xiang-mama ‘miss mom’ (23c) / *‘mom misses’ (23d), √ku ‘cry’, DP nü’er ‘daughter’]

Similar to the inchoative 追累 zhui-lei ‘chase-tired’ in (22), the interpretation (23d) is ruled out because the linear order of 想 xiang ‘miss’ preceding 妈妈 mama ‘mom’ cannot yield the interpretation of ‘mom misses (the daughter)’, but only ‘(the daughter) misses the mom’. However, while the interpretation (23c) is possible here, the corresponding one for 追累 zhui-lei ‘chase-tired’, i.e., the interpretation (20c), is marginal. We attribute this difference to the varied degree of semantic “basicness”. While the semantic meaning of 追我 zhui wo ‘chase me’ in (22) is hardly considered basic, 想妈妈 xiang mama ‘miss mom’ denotes a more generic activity of human beings, especially at young ages. Therefore, it is more plausible for 想妈妈 xiang mama ‘miss mom’ to form a complex root, making (23c) acceptable.
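The case analyses in Sects. 3.2–3.4 can be summarized, very roughly, as a small filter over the four logically possible readings of a sentence of the form ‘N1 V1-V2 N2’. The Python sketch below is only an illustration under simplifying assumptions. The function names `judge` and `readings` are hypothetical. The “basicness” value of the complex root and the set of culturally recognized scenarios are supplied by hand from the judgments reported in the text rather than computed. Only participant Causers are modeled.

```python
from itertools import product


def judge(causee, v1_agent, root_basicness, recognized_agents):
    """Classify one reading of 'N1 V1-V2 N2'.

    causee, v1_agent: "N1" or "N2".
    root_basicness: "basic", "marginal", or "specific"; a hand-coded
        judgment about the complex root V1+N2 (an assumption of this sketch).
    recognized_agents: the V1 agents for which 'N2 ends up in the result
        state' counts as culturally recognized (also hand-coded).
    """
    if causee == "N2":
        # Transitive structure (10b): N1 is the Causer and may be either
        # Agent or Theme of V1, subject to world knowledge, cf. (12b).
        return "acceptable" if v1_agent in recognized_agents else "unacceptable"
    # Inchoative structure (10a): N2 sits inside the complex root as the
    # object of V1, so the order V1 + N2 cannot mean 'N2 does V1'.
    if v1_agent == "N2":
        return "unacceptable"
    # Otherwise the reading stands or falls with the "basicness" of the root.
    return {"basic": "acceptable",
            "marginal": "marginal",
            "specific": "unacceptable"}[root_basicness]


def readings(root_basicness, recognized_agents):
    """Judge all four (causee, v1_agent) combinations."""
    return {
        (causee, agent): judge(causee, agent, root_basicness, recognized_agents)
        for causee, agent in product(("N1", "N2"), repeat=2)
    }


# zhui-lei 'chase-tired': 'chase me' is hardly basic; both transitive
# scenarios are recognized.
zhui_lei = readings("marginal", {"N1", "N2"})
# xiang-ku 'miss-cry': 'miss mom' is basic; only 'mom misses' leads to
# mom crying.
xiang_ku = readings("basic", {"N2"})
```

With these inputs, the key `("N1", "N1")` corresponds to readings (20c) and (23c), and `("N1", "N2")` to (20d) and (23d). Replacing `"basic"` with `"specific"` for qi-lei models the contrast between a bare 马 ma ‘horse’ and 那匹马 na pi ma ‘that horse’.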

4 Conclusion

This study proposes that a CR V-V is formed by two roots conflating or merging with the head vCAUSE. Although syntactically formed, each CR V-V functions as a V0. Depending on the semantic features, some CR V-Vs exhibit unaccusative properties, taking an internal argument, with the external argument being optional; this is when causative alternation is observed. In contrast, other CR V-Vs may be semantically similar to telic action verbs, involve agentivity and intentionality, and disallow intransitive alternants.

We propose that the Inchoative structure in (10a) and the Transitive structure in (10b) are the only two possible structures for sentences involving CR V-Vs. When a CR V-V is transitive, the overt Causer is required to be a possible trigger of the whole caused-result event but does not have to be an Agent of the causing eventuality denoted by V1. That is the main reason why some CR V-Vs exhibit semantic ambiguity: the Causer may be interpreted as Agent, Theme, or even a non-participant of V1, as long as the intended semantic meaning of the sentence is compatible with the structure in (10b) and is culturally recognized.

On the other hand, semantic constraints have also been observed. The “basicness” constraint applies when the conflating root is a complex one: being a complex root, its semantic meaning should be somewhat generic, denoting a category of human activity. Through several specific cases, we have shown that compatibility with the proposed syntactic structure, cultural recognizability, and the “basicness” constraint together explain why semantic ambiguity occurs and why some interpretations are more plausible than others.

References

1. Cheng, L.L.-S., Huang, C.-T.J.: On the argument structure of resultative compounds. In: Chen, M.Y., Tzeng, O.J.-L. (eds.) In Honour of William S-Y. Wang: Interdisciplinary Studies on Language and Language Change, pp. 187–221. Pyramid Press, Taipei (1994)
2. Chomsky, N.: The Minimalist Program. MIT Press, Cambridge (1995)
3. Fan, S.-Y.: Argument structure in Mandarin Chinese: a lexical-syntactic perspective. Doctoral dissertation, Universidad Autónoma de Madrid (2013)
4. Folli, R., Harley, H.: Consuming results in Italian and English: flavors of v. In: Kempchinsky, P., Slabakova, R. (eds.) Aspectual Inquiries, pp. 95–120. Springer, Dordrecht (2005). https://doi.org/10.1007/1-4020-3033-9_5
5. Halle, M., Marantz, A.: Distributed morphology and the pieces of inflection. In: Hale, K., Keyser, S. (eds.) The View from Building 20, pp. 111–176. MIT Press, Cambridge (1993)
6. Halle, M., Marantz, A.: Some key features of distributed morphology. MIT Working Papers Linguist. 21, 275–288 (1994)
7. Harley, H.: How do verbs get their names? Denominal verbs, manner incorporation and the ontology of verb roots in English. In: Erteschik-Shir, N., Rapoport, T. (eds.) The Syntax of Aspect. Deriving Thematic and Aspectual Interpretation, pp. 42–64. Oxford University Press, Oxford (2005)
8. Haugen, J.D.: Hyponymous objects and late insertion. Lingua 119, 242–262 (2009)
9. Huang, C.-T.J.: Phrase structure, lexical integrity and Chinese compounds. J. Chin. Lang. Teach. Assoc. 19(2), 53–78 (1984)
10. Kratzer, A.: Severing the external argument from its verb. In: Rooryck, J., Zaring, L. (eds.) Phrase Structure and the Lexicon, pp. 109–137. Kluwer, Dordrecht (1996)
11. Li, C.: Mandarin resultative verb compounds: where syntax, semantics, and pragmatics meet. Doctoral dissertation, Yale University (2007)
12. Li, Y.: On V-V compounds in Chinese. Nat. Lang. Linguist. Theory 8, 177–207 (1990)
13. Li, Y.: The thematic hierarchy and causativity. Nat. Lang. Linguist. Theory 13, 255–282 (1995)
14. Lin, J.: Event structure and the encoding of arguments: the syntax of the Mandarin and English verb phrase. Doctoral dissertation, Massachusetts Institute of Technology (2004)
15. Liu, J.: The syntax of VV resultatives in Mandarin Chinese. Doctoral dissertation, University of Victoria (2019)
16. Marantz, A.: No escape from syntax: don’t try morphological analysis in the privacy of your own lexicon. Univ. Pennsylvania Working Papers Linguist. 4(2), 201–225 (1997)
17. Pylkkänen, L.: Introducing arguments. Doctoral dissertation, MIT, Cambridge (2002)
18. Thompson, S.A.: Resultative verb compounds in Mandarin Chinese: a case for lexical rules. Language 49(2), 361–379 (1973)

On the Dynamic Semantics of Fǒuzé and Bùrán in Mandarin Chinese

Jiun-Shiung Wu
Institute of Linguistics, National Chung Cheng University, Chiayi, Taiwan

Abstract. This paper argues that fǒuzé and bùrán share a unified dynamic semantics: they update an information state by adding to it a proposition which is evaluated given the negation of the information expressed by the discourse preceding that proposition. SDRT is used to model their dynamic semantics. The paper also briefly discusses what it means to evaluate an imperative or a question when fǒuzé/bùrán connects one of them to another clause. Moreover, it is shown that fǒuzé and bùrán can be distinguished in two respects. First, fǒuzé is not compatible with two propositions which have a parallel relationship, but is fine when two propositions have a causal relationship. Second, fǒuzé has an ‘anti-good consequence’ property.

Keywords: Fouze · Buran · Dynamic semantics · SDRT · Mandarin Chinese

1 Introduction

Fǒuzé and bùrán are two discourse markers in Mandarin Chinese (henceforth, Chinese) which connect two pieces of discourse. A piece of discourse may be as short as one clause or may be longer. Fǒuzé and bùrán are interchangeable in some cases, but not in others. In xiàndài hànyǔ bābǎi cí zēngdìngbǎn ‘Eight Hundred Words in Modern Chinese, Extended Version’, [1] provides semantic definitions for these two lexical items as follows: fǒuzé = rúguǒ bù zhèyàng ‘if it is not so’; bùrán = (i) rúguǒ bù zhèyàng ‘if it is not so’, and (ii) yǐnjìn yǔ shàngwén jiāotì de qíngkuàng ‘introducing an alternative to the preceding discourse’. Given these definitions, one might expect that fǒuzé, bùrán, and rúguǒ bù zhèyàng ‘if it is not so’ are interchangeable when the two discourse markers both express rúguǒ bù zhèyàng ‘if it is not so’, and that, since disjunctions such as huòzhě ‘or’ also introduce an alternative, bùrán and huòzhě ‘or’ are interchangeable. However, these two expectations are not always borne out. See the examples below.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 312–324, 2022. https://doi.org/10.1007/978-3-031-06703-7_24

(1) a. Gāi xiě xìn le, fǒuzé/bùrán/rúguǒ bù zhèyàng, jiālǐ huì bù fàngxīn de.
       should write letter Prc¹ FǑUZÉ/BÙRÁN/if not so family will not at.peace Prc
       ‘We should write home. Otherwise, our family will be worried.’
    b. Kěyǐ dǎ diànhuà qù zhǎo tā, bùrán/*fǒuzé/huòzhě jiù zìjǐ pǎo yì tàng.
       can make phone.call go find he BÙRÁN/*FǑUZÉ/or JIÙ self run one trip
       ‘You can call him. Otherwise, you can go see him yourself.’
    c. Wǒ zhǐ néng fàngqì, bùrán/*fǒuzé/*huòzhě gāi zěnme bàn?
       I only can quit BÙRÁN/*FǑUZÉ/*or should how do
       ‘I can only quit. Otherwise, what should I do?’
    d. Wǒ zhǐ néng fàngqì, rúguǒ bù zhèyàng, gāi zěnme bàn?
       I only can quit if not so should how do
       ‘I can only quit. If not so, what should I do?’

As shown in (1a), fǒuzé, bùrán, and rúguǒ bù zhèyàng ‘if it is not so’ are all interchangeable, which is predicted by [1]. In (1b), fǒuzé is not good but bùrán and huòzhě ‘or’ are, which follows from [1] as well. But (1c, d) do not conform to [1]. In (1c), fǒuzé is not good, which means that bùrán introduces an alternative to the preceding discourse. Nevertheless, huòzhě ‘or’, which serves the same function, is not good here. (1d) is just like (1c) except that the two clauses in (1c) are connected by bùrán whereas those in (1d) are connected by rúguǒ bù zhèyàng ‘if it is not so’. Since bùrán but not fǒuzé is good in this (mini-)discourse, bùrán introduces an alternative here, rather than expressing rúguǒ bù zhèyàng ‘if it is not so’. However, rúguǒ bù zhèyàng ‘if it is not so’ is good in this example!

Given the above complications, this paper examines fǒuzé and bùrán in more detail and tries to provide a satisfactory generalization for these two discourse markers, which are argued to have a modal-like semantics. Moreover, a dynamic semantics is proposed to model the discourse behavior of fǒuzé and bùrán. This paper is organized as follows. Section 2 is a critical review of the literature on fǒuzé and/or bùrán. In Sect. 3, these two discourse markers are examined in detail and are argued to have a modal-like semantics. Furthermore, a dynamic semantics is proposed to account for their discourse behavior, and it is modeled with Segmented Discourse Representation Theory (SDRT for short) as proposed in [2]. Section 4 concludes this paper.

¹ The abbreviations used in this paper include: ASSO for an associative marker, CL for a classifier, DUR for a durative aspect marker, PRG for a progressive aspect marker, Prc for a sentence-final particle, Q for a question particle.


2 Literature Review There have been quite some descriptive studies on fǒuzé and bùrán, e.g. in chronological order, [3–20]. Among them, [10, 16, 20] discuss the lexicalization of fǒuzé only and will not be reviewed here. [3] suggests that fǒuzé induces a zhèngfǎn duìzhào shì de bìngliè guānxī ‘contrastive, coordinating relationship’ between two propositions. While this statement is partially accurate, it overgeneralizes. Kěshì/dànshì ‘but’, etc. also present a contrastive, coordinating relationship, but they are not interchangeable with fǒuzé. [4] identiﬁes four possible forms of propositions which can be connected by fǒuzé. This work does not discuss the semantics of fǒuzé and hence cannot explain the differences between fǒuzé and bùrán. [5] essentially suggests that the proposition presented by fǒuzé is the result of the previous proposition and that the truth values of the previous proposition and of the latter one are opposite. Again, this statement of opposite truth values is overgeneralizing since it applies to other contrastive conjunctions such as kěshì, dànshì, etc. [6] utilizes propositional logic to model the semantic/discoursal behavior of fǒuzé. First, (¬q ! p)^(¬p ! q) represents the sentence pattern p (, ¬q), fǒuzé, q, zhǐyǒu/bìxū … cái…. Second, (p ! ¬q)^(p ! q) and (p ! r)^(¬p ! q) stand for the pattern p, ¬q, fǒuzé q, or the one p, r, fǒuzé q. Third, (¬p ! q)^(p_q) represents the pattern p, fǒuzé q. One major problem with [6] is that the formulae proposed are not exclusively for the patterns involving fǒuzé. For example, in the following discourse, tā jīntiān zǎoshàng qǐchuáng wǎn ‘he got up late this morning’, represented as p, xìnghǎo ‘fortunately’, shàngbān méiyǒu chídào ‘he was not late for work’, represented as q, (¬q ! p)^(¬p ! q) can also capture the meaning of this discourse. But, this discourse has nothing to do with fǒuzé. [7] proposes that in the sentence pattern S1, fǒuzé S2, fǒuzé = ¬S1. 
However, if we look at (1c), the second proposition is uttered based on the premise if I do not quit, that is, ¬S1 if S1 = I quit. Yet, fǒuzé is not good here. Hence, [7]’s generalization needs further examination. [8] examines three lexical items: fǎnzhī, xiāngfǎn and fǒuzé. He claims that fǒuzé presents a proposition inferred from a negated proposition which occurs previously in the same discourse. While this characterization is accurate, (1c) remains a problem, similar to the problem for [7]. [9] discusses the grammaticalization, syntactic and semantic behavior of peudoconnectives (lèi liáncí). It is stated that, for two propositions connected together by fǒuzé, the former proposition provides a condition or a reason, the latter is a reversed inference (nìxiàng tuīdǎo) based on the negation of the former proposition and that fǒuzé follows from the previous discourse and paves ways for the latter one (chéng qián qǐ hòu). [9] is very similar to [7, 8] and hence suffers from the same difﬁculty. [10] suggests two functions for bùrán. First, negation of the proposition preceding bùrán helps to infer the one presented by bùrán. Second, the proposition presented by bùrán is an alternative to the previous one. Since [10] is very similar to [1, 7–9], they suffer similar problems. [12] provides a dynamic semantics for fǒuzé. Based on [21]’s update semantics, [12] proposes the following. An information state is composed of two stacks, each of

On the Dynamic Semantics of Fǒuzé and Bùrán in Mandarin Chinese


which is a set of pairs of possible worlds. Fǒuzé updates one of the two stacks. While [12] is very enlightening, it fails to take discourse structure into consideration (cf. [2, 22–24], etc.) in addition to more or less standard dynamic semantics such as [21, 25–28]. [13] focuses on the pattern rúguǒ A, nàme B, fǒuzé C. The following points are suggested. First, A is a premise and B a conclusion. If B is an inference, then A is negated and C is the result of ¬A. If B is an imperative, a promissive or a necessity, B is negated and C is the result of ¬B. If B expresses ability or volition, then again B is negated and C results from ¬B. Second, if A is a hypothetical purpose and B the means to realize the purpose, then B is negated and C follows from ¬B. [13] looks at a particular pattern and examines which proposition is negated, but the semantic function of fǒuzé is not discussed. This is where the work should be continued. [14] discusses which constituent is focused in a discourse with fǒuzé. This paper suggests that the focused constituent is the proposition before fǒuzé. The notion of focus (jiāodiǎn) in [14] seems very different from that in [29–35], etc. More attention should be paid to whether fǒuzé has a focus function as discussed in the literature. [15] discusses the conditions under which fǒuzé is optional. While this work is interesting in that the optionality of fǒuzé is under examination, the semantics of fǒuzé is not addressed. [16] talks about degrees of trustability (xìnlàidù), which means the possibility of truth of propositions connected by fǒuzé. For chúfēi A, fǒuzé B, if A is true, the possibility of truth of B is not high; if A is not true, the possibility of truth of B is high. While [16] talks about possibility of truth, this explanation is essentially equivalent to a contrast in the truth values of propositions and hence suffers from the same problem.
[17] is very interesting in that it points out that fǒuzé does not express truth contrast (zhuǎnzhé jù): the propositions connected by contrastive conjunctions such as kěshì ‘but’ or dànshì ‘but’ are true, but the propositions connected by fǒuzé are not necessarily so. This is an enlightening observation and a good first step toward telling apart fǒuzé on the one hand and kěshì/dànshì ‘but’ on the other. [18] conducts a detailed examination of the pragmatic and interpersonal function of fǒuzé. This study approaches fǒuzé/bùrán from functional perspectives and is complementary to the dynamic semantic account argued for in this paper. Given the above critical review, it can be seen that the research question of this paper has not been satisfactorily addressed, and therefore a further examination of fǒuzé and bùrán is called for.
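
The objection raised above against [6] can be checked mechanically. The following is a minimal sketch, not part of [6] or of the analysis in this paper: it assumes that the first and third formulae are read with material implication and ordinary conjunction, and all function names are hypothetical. Under that reading both formulae agree with plain p ∨ q on every truth assignment, which is why they also fit discourses that contain no fǒuzé.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication a -> b."""
    return (not a) or b


def pattern_one(p: bool, q: bool) -> bool:
    """First formula reviewed above: (not q -> p) and (not p -> q)."""
    return implies(not q, p) and implies(not p, q)


def pattern_three(p: bool, q: bool) -> bool:
    """Third formula reviewed above: (not p -> q) and (p or q)."""
    return implies(not p, q) and (p or q)


def plain_disjunction(p: bool, q: bool) -> bool:
    """Ordinary disjunction, with no reference to any discourse marker."""
    return p or q


def equivalent(f, g) -> bool:
    """True if f and g agree on all four assignments to (p, q)."""
    return all(f(p, q) == g(p, q)
               for p, q in product([False, True], repeat=2))
```

In the xìnghǎo ‘fortunately’ discourse both p and q are true, so `pattern_one(True, True)` holds there as well, even though no fǒuzé is involved.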

3 Modality, Dynamic Semantics and Fǒuzé/bùrán

[1] provides semantic descriptions for fǒuzé and bùrán respectively. Fǒuzé means ‘if it is not so’, while bùrán either expresses ‘if it is not so’ or introduces an alternative to the preceding discourse. As pointed out in Sects. 1 and 2, when denoting ‘if it is not so’, fǒuzé and bùrán are not always interchangeable. In addition, while the disjunctions huò/huòzhě ‘or’ also introduce an alternative, bùrán cannot always be replaced by huò/huòzhě ‘or’. In terms of the semantic function of fǒuzé, as reviewed in Sect. 2, it is generally accepted that, given a discourse p, fǒuzé q, one of the following propositional formulae


J.-S. Wu

holds: either ¬p ∧ q or p ∧ ¬q. In plain words, fǒuzé introduces contrastive information. However, as shown below in (2), fǒuzé cannot be replaced with contrastive conjunctions such as dàn/dànshì ‘but’, and hence fǒuzé must express something more than simple contrast. What is more, the fact that fǒuzé/bùrán cannot substitute for the contrastive conjunctions dàn/dànshì ‘but’ suggests that a propositional logic account of fǒuzé is too simple to work, since fǒuzé/bùrán involve more than a contrast in truth values.

(2) Gāi xiě xìn huíjiā le, fǒuzé/*dànshì jiālǐ huì dānxīn de.
    should write letter home Prc FǑUZÉ/*but family will worry Prc
    ‘We should write home. Otherwise/*but our family will be worried.’

So, exactly what do fǒuzé and bùrán denote? Studies such as [7, 8] provide an enlightening hint on this issue. [7] suggests that, given S1, fǒuzé S2, fǒuzé = ¬S1. If the same pattern S1, fǒuzé S2 is used to paraphrase [8], [8] basically says that S2 is inferred from ¬S1. To put the idea of [7, 8] formally: given p, fǒuzé/bùrán q, where p and q are clauses, q is evaluated provided ¬p. Given the above discussion, the following is proposed. Suppose that K1 is an information state, which stands for the information expressed by the discourse up to this moment. If we have a mini-discourse, e.g. p, fǒuzé/bùrán q, then K1 contains only p. When fǒuzé/bùrán introduce a proposition q into the discourse, they specify that q is evaluated given ¬K1. This semantics is very similar to that of modality. Let us take must as an example. For must p, p is evaluated given an epistemic modal base and an appropriate ordering source, depending on whether must is epistemic or deontic. That is, ¬K1 functions for a proposition introduced into a discourse by fǒuzé/bùrán in a manner very similar to a modal base and an ordering source for a modal expression. Assume a discourse p, fǒuzé/bùrán q. The major difference between a modal base plus an ordering source and the information required by fǒuzé/bùrán is that the proposition introduced by fǒuzé/bùrán is evaluated under the circumstances of negating the information expressed by the discourse before q comes in. This account of evaluation under negated previous information can be modeled as a type of information state updating: given an information state K1, which contains p, fǒuzé/bùrán update the information state with the evaluation, under ¬K1, of the proposition q they introduce. With the idea of negating the information denoted by the discourse before the proposition introduced by fǒuzé/bùrán comes into the discourse, this information state updating account (cf. [21, 28] etc.)
for fǒuzé/bùrán has advantages that the previous accounts do not have. For works such as [5, 9], fǒuzé/bùrán indicate a contrast in the truth values of two propositions. As argued before, such an explanation cannot distinguish fǒuzé/bùrán from contrastive conjunctions such as dàn/dànshì ‘but’. The account proposed in this project has an advantage over the contrast-in-truth-values accounts in that the information state updating account can tell fǒuzé/bùrán and contrastive conjunctions apart. For contrastive conjunctions such as dànshì ‘but’, the two conjuncts are semantically (and syntactically) parallel. However, for fǒuzé/bùrán, ¬K1 and q are not semantically parallel. Instead, ¬K1 functions like a background assumption, under which q is evaluated. Given this semantic inequality between ¬K1 and q, contrastive conjunctions are not compatible.
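
The asymmetry between ¬K1 and q can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions that are not part of the proposal itself: an information state is reduced to a set of possible worlds, a world is the set of atoms true in it, and the atom names, the `State` class and both update functions are hypothetical. A dànshì-style update asserts q of K1 directly, whereas a fǒuzé/bùrán-style update leaves K1 untouched and records where q holds inside ¬K1.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical atomic propositions for example (2).
ATOMS = ("write_home", "family_worried")


def all_worlds() -> frozenset:
    """Every world is the frozenset of atoms that are true in it."""
    return frozenset(frozenset(c)
                     for r in range(len(ATOMS) + 1)
                     for c in combinations(ATOMS, r))


def prop(atom: str) -> frozenset:
    """The proposition expressed by an atom: the worlds where it is true."""
    return frozenset(w for w in all_worlds() if atom in w)


@dataclass(frozen=True)
class State:
    facts: frozenset                       # K1: worlds compatible with the discourse
    hypothetical: frozenset = frozenset()  # worlds of not-K1 in which q holds


def update_but(state: State, q: frozenset) -> State:
    """Contrastive conjunction: both conjuncts are asserted in parallel."""
    return State(facts=state.facts & q)


def update_fouze(state: State, q: frozenset) -> State:
    """fouze/buran: q is evaluated under not-K1; K1 itself is unchanged."""
    not_k1 = all_worlds() - state.facts
    return State(facts=state.facts, hypothetical=not_k1 & q)
```

With K1 built from `prop("write_home")` and q from `prop("family_worried")`, `update_fouze` keeps the worlds of K1 as they are and stores only the world in which nobody writes home and the family worries.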


We can model the above dynamic semantic proposal with SDRT. Suppose that a Segmented Discourse Representation Structure (SDRS) represents an information state. Given a (mini-)discourse p, fǒuzé/bùrán q, we get:

An SDRS is represented as a box, as in (3). The circled numbers stand for the order in which elements come into the discourse (i.e. the order in which elements are added to an information state). In the mini-discourse we assume, p comes into the discourse first and is labeled as π1. This is the first step of constructing a representation of the mini-discourse. Following SDRT’s conventions, a proposition that comes into the discourse is labeled with π plus a subscripted number. When fǒuzé/bùrán(q) comes into the discourse, a sub-SDRS is created. A sub-SDRS is needed because these two discourse markers specify that q, labeled as π2, is evaluated provided ¬π1, which is not an explicit part of the original discourse. Therefore, a sub-SDRS is created to indicate that π2 is evaluated in a different “universe” (represented as a sub-SDRS) where ¬π1 is true. π2′ is used to label this sub-SDRS. This is the second step. The third step is that the rhetorical relation Contrast connects π2′ to π1. This step captures the sense of contrast discussed in many of the previous studies on fǒuzé. However, as explicitly shown in the SDRS (3), Contrast does not connect π2 to π1, but instead π2′ to π1. This is where fǒuzé/bùrán differ from contrastive conjunctions: for a contrastive conjunction such as dànshì ‘but’, Contrast connects the two conjuncts directly. The SDRT proposal argued for here captures the semantic inequality between π1 and π2. On the other hand, [14] suggests that fǒuzé induces a focus reading. However, [14] appears to be inconsistent with previous studies on focus, e.g. [29–35], all of which suggest that a focus device has a right-association property, i.e. a constituent to the right of the device receives the focus, whereas [14] suggests that, given p, fǒuzé q, p receives focus. To maintain [14]’s proposal, a dramatic change to the well-known theories of focus would need to be made.
Although the account proposed in this paper does not explicitly invoke focus, it can still explain native speakers’ intuition that in K1, fǒuzé/bùrán q, the speaker’s attention is drawn to K1. This is due to the following reasoning. q is evaluated under the circumstances of ¬K1. However, K1 is the information explicitly represented by the discourse before q comes in. To put it differently, ¬K1 is only an assumption, not a fact, under which q is evaluated. Hence, in many cases, q is irrealis. It is reasonable that the addressee’s attention is drawn to a fact, not to a proposition evaluated under an assumption.
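
The three construction steps described for the SDRS (3) can be sketched as a small data structure. This is an illustrative assumption rather than an implementation of SDRT: the class `SDRS`, the function `build_sdrs`, the top-level label `pi0` and the ASCII labels `pi1`, `pi2` and `pi2_prime` (standing for π1, π2 and π2′) are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class SDRS:
    label: str
    # Each condition is either a labeled formula (as text) or a nested sub-SDRS.
    conditions: List[Union[str, "SDRS"]] = field(default_factory=list)
    # Rhetorical relations as (relation name, first label, second label).
    relations: List[Tuple[str, str, str]] = field(default_factory=list)


def build_sdrs(p: str, q: str) -> SDRS:
    """Build a representation of the mini-discourse 'p, fouze/buran q'."""
    top = SDRS(label="pi0")  # hypothetical label for the outer box

    # Step 1: p comes into the discourse and is labeled pi1.
    top.conditions.append(f"pi1: {p}")

    # Step 2: a sub-SDRS pi2_prime is created in which not-pi1 holds
    # and in which q, labeled pi2, is evaluated.
    sub = SDRS(label="pi2_prime", conditions=["not pi1", f"pi2: {q}"])
    top.conditions.append(sub)

    # Step 3: Contrast links pi1 to the sub-SDRS, not to pi2 itself.
    top.relations.append(("Contrast", "pi1", "pi2_prime"))
    return top
```

For a contrastive conjunction such as dànshì, the analogous structure would carry the relation `("Contrast", "pi1", "pi2")` with no intermediate sub-SDRS, which is the difference the third step is meant to capture.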


Moreover, an addressee’s attention is not always drawn to K1 in a discourse such as K1, fǒuzé/bùrán q. For example, bùrán can present a suggested alternative. In such cases, the addressee should pay attention to q, rather than K1. For example,

(4) Jìrán zhè-ge fāngfǎ bù xíng, bùrán wǒmen gǎiyòng dì-èr zhǒng fāngfǎ ba.
    since this-CL method not work BÙRÁN we change.use second kind method Prc
    ‘Since this method does not work, let’s use a second kind!’

In (4), although the English translation does not reveal bùrán, (4) is still syntactically, semantically and pragmatically good. This sentence is used to suggest an alternative. For examples of this kind, it is not reasonable for the addressee to pay attention to K1, that is, since this method does not work. The same reasoning applies to (1c). On the other hand, for examples such as (1a), where q is not a suggestion, the addressee pays attention to K1. The above account of the addressee’s attention can explain the seemingly contradictory behavior of fǒuzé in [14]’s proposal: a focus-inducing device, such as only, induces focus on a constituent to its right, but [14] claims that fǒuzé induces focus on a constituent to its left. The proposal above does not suffer from this theoretical problem, which [14] has difficulty explaining. One thing about the information state updating account argued for above that needs clarification is what evaluate means in this paper. It is argued that q is ‘evaluated’ provided ¬K1. What does this mean? In formal semantics, to evaluate a proposition is to determine whether it is true or false. However, fǒuzé/bùrán do not always introduce into a discourse a declarative sentence, which can be determined to be true or not. Rather, these two discourse markers can introduce an imperative or a question, neither of which can be identified as true or false. For example,

(5) a. Xiànzài zài xiàyǔ. Bùrán, nǐ kàn wàimiàn.
       now PRG rain BÙRÁN you look outside
       ‘It is raining now. (If you don’t believe me,) look outside.’
    b. Wǒ zhǐ néng fàngqì. Bùrán, gāi zěnme bàn?
       I only can quit BÙRÁN should how do
       ‘I can only quit. Otherwise, what should I do?’

In the two examples in (5), q is an imperative in (5a) and a question in (5b). An imperative and a question cannot be determined to be true or false. [36, 37] propose that an imperative denotes a To-Do List, which in turn functions as a deontic ordering source. The semantics of a question can be a set of propositions, a partition of possible worlds, etc., as in [38–40]. Hence, to evaluate q in the proposal of this paper covers how a declarative sentence, an imperative and a question each receive an interpretation. This extends the sense of evaluate beyond evaluating a declarative proposition.
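
One way to picture this extended sense of evaluate is as a conditional update whose effect depends on the clause type of q. The sketch below is a simplified illustration, not the formal systems of [36–39]: the `Context` class, the function `interpret_under_negation`, the string encoding of propositions and the sample answers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class Context:
    assumed: Set[str] = field(default_factory=set)        # what is taken to hold locally
    to_do_list: List[str] = field(default_factory=list)   # properties the addressee is to realize
    questions: List[FrozenSet[str]] = field(default_factory=list)  # each question as a set of answers


def interpret_under_negation(k1: str, kind: str, content) -> Context:
    """Interpret q in a local context in which the negation of K1 is assumed."""
    local = Context(assumed={f"not({k1})"})
    if kind == "declarative":
        local.assumed.add(content)                  # a proposition that can be true or false
    elif kind == "imperative":
        local.to_do_list.append(content)            # a property added to the To-Do List
    elif kind == "question":
        local.questions.append(frozenset(content))  # a set of possible answers
    else:
        raise ValueError(f"unknown clause type: {kind!r}")
    return local
```

For (5a), `interpret_under_negation("it is raining", "imperative", "look outside")` adds the property to the To-Do List only inside the local context where the negation of K1 is assumed, which mirrors the idea that there is a condition on when the imperative is added.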


Since [36, 37] suggest that an imperative adds a property, i.e. the semantics of a predicate (a VP in the traditional sense), to a To-Do List, which serves as a deontic ordering source, in examples such as (5a), given ¬K1, i.e. if you do not believe that it is raining outside, the imperative nǐ kàn wàimiàn ‘you look outside’ is added to a To-Do List. That is, there is a condition on when an imperative can be added to a To-Do List for examples of this type. As for a question that follows bùrán, let us assume that the semantics of a question is a set of propositions which serve as (possible) answers to the question, as proposed in [38, 39]. Then, given ¬K1, the set of propositions can be added to the information state. To put it another way, there is a condition on when a set of propositions denoted by a question can be added to an information state. Now, we have a unified semantics for fǒuzé and bùrán. But how are they different, given that they are not always interchangeable in a sentence? Two points are proposed. First, when the two pieces of information involved have a parallel relationship, rather than a causal one, bùrán is good, but fǒuzé is not. Second, fǒuzé has an ‘anti-good consequence’ preference. The first point explains the so-called ‘introducing an alternative’ function of bùrán suggested in the literature. Let us repeat (1a) and (1b) below for the purpose of illustration.

(6) a. Gāi xiě xìn le, fǒuzé/bùrán, jiālǐ huì bù fàngxīn de.
       should write home Prc FǑUZÉ/BÙRÁN family will not at.ease Prc
       ‘We should write home. Otherwise, our family will be worried.’
    b. Kěyǐ dǎ diànhuà qù zhǎo tā, bùrán/*fǒuzé/huòzhě jiù zìjǐ pǎo yì tàng.
       can make phone.call go find he BÙRÁN/*FǑUZÉ/or JIÙ self run one trip
       ‘You can call him. Otherwise, you can go see him yourself.’

In (6a), not writing a letter home causes the family to get worried. That is, these two situations have a causal relationship, and because of this both fǒuzé and bùrán are good. On the other hand, in (6b), call him and go find him yourself have a parallel relationship. These two situations are two suggestions of equal or parallel semantic status. Hence, in this example, bùrán introduces an alternative to the preceding proposition, and fǒuzé is not good. The first point discussed above shows that rhetorical relations need to be taken into consideration if we want a complete picture of the discourse behavior of fǒuzé and bùrán. As the critical review of [12] points out, in addition to dynamic semantics, rhetorical relations are also required for a satisfactory explanation of these two discourse markers. We can add a fourth step to the SDRS (3) to model the first point.


b. fǒuzé(q) ⇒ Result(¬π1, π2)

In SDRT’s terms, if two propositions have a causal relationship, they are connected by the rhetorical relation Result. Two propositions which have a parallel relationship are connected by the rhetorical relation Parallel. Therefore, for a mini-discourse p, fǒuzé/bùrán q, the fourth step in building a representation of the discourse is to add a disjunction, as marked by the circled 4 in (7a). (7b) is the constraint for fǒuzé, which states that if fǒuzé connects q (to another piece of discourse), then it must be the rhetorical relation Result that connects the two pieces of discourse labeled as ¬π1 and π2. Before the second difference between fǒuzé and bùrán is presented, the issue of the interchangeability of huòzhě ‘or’ and bùrán needs addressing. As shown in (1b), huòzhě ‘or’ and bùrán are interchangeable, whereas in (1c) and (5a), they are not. Why is there such a discrepancy? If we examine these three examples more carefully, we find that in (1b) the two propositions on either side of bùrán are actually two options. However, in (1c) and (5a), they are not. In (1c), the speaker, as a matter of fact, intends to express that there is only one option, the one described by the proposition to the left of bùrán. In (5a), the proposition to the right of bùrán is used to support the proposition to the left. Only when the two propositions on either side of bùrán are real options can huòzhě substitute for it. Finally, fǒuzé has an ‘anti-good consequence’ property. This means that fǒuzé does not introduce a good consequence into a discourse. This property is intuitive because fǒuzé can be used as a threat. Bùrán does not have the threatening sense that fǒuzé has. For example,

(8) Zuìhǎo zhào wǒ shuō-de zuò, fǒuzé…
    had.better as I say-ASSO do FǑUZÉ…
    ‘You’d better do what I said. Otherwise….’

There are examples where bùrán is good, but fǒuzé is not and these examples all involve introducing a good consequence (good for the speaker). See one example below.


(9) Wǒ shì zài liànxí shēngqì. Wǒ yǐhòu jiù zhème xiōng jiāo nǐ.
    I be PRG practice angry I later JIÙ so mean teach you
    Bùrán/*fǒuzé guāi yìdiǎn, hǎo ma?
    BÙRÁN/*FǑUZÉ nice a.little OK Q
    ‘I am practicing being angry. I will teach you in such a mean way, later. (If you do not want me to be mean,) behave a little bit, OK?’

In (9), fǒuzé is not good. This is because behave a little bit is a good outcome for the speaker. When the outcome is good, it is not appropriate to use fǒuzé, due to its strong threatening sense. Another interesting piece of support for the anti-good consequence analysis of fǒuzé comes from the following example, in which a neutral (at best) result is introduced and hence fǒuzé and bùrán are both acceptable.

(10) Tā zài xǐzǎo ba. Fǒuzé/bùrán, yùshì-de dēng bù huì liàng-zhe, chuānghù yě bù huì yǒu shuǐqì.
     he PRG take.bath Prc FǑUZÉ/BÙRÁN bathroom-ASSO light not will on-DUR window also not will have moisture
     ‘He is taking a bath. Otherwise, the light in the bathroom won’t be on and the windows will not be moist.’

What is interesting about this example is that, if someone wishes to continue this discourse, a continuation that describes a bad thing strongly prefers fǒuzé over bùrán. The possible continuation of this example supports the anti-good consequence property of fǒuzé as well. To sum up, given a mini-discourse p, fǒuzé/bùrán q, fǒuzé and bùrán both update the information state by adding a proposition q, which is evaluated under ¬p. SDRT is used to model the dynamic semantics of these two discourse markers, and an SDRS represents an information state. To build a representation for this mini-discourse, first, p comes into the discourse, labeled as π1. Second, when q comes into the discourse, a sub-SDRS is created to which ¬π1 is added, and q, labeled as π2, is evaluated in the same sub-SDRS. This sub-SDRS is labeled as π2′. Third, the rhetorical relation Contrast connects π1 and π2′. Fourth, Result(¬π1, π2) ∨ Parallel(π1, π2) is added, depending on whether it is fǒuzé or bùrán in the discourse and whether the two propositions have a causal relationship. These two discourse markers differ in two respects. First, when the two pieces of information involved have a parallel relationship, rather than a causal one, only bùrán is good; fǒuzé is not. Second, fǒuzé has an ‘anti-good consequence’ preference.
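
The fourth step and the two differences summarized above can be stated as a small decision procedure. The following sketch encodes only those two differences and ignores every other factor that may affect acceptability. The function names, the ASCII marker names `fouze` and `buran`, and the three-way classification of consequences as good, neutral or bad are assumptions made for illustration.

```python
def fourth_step(relationship: str) -> tuple:
    """Choose the disjunct added in step four from the relationship between p and q."""
    if relationship == "causal":
        return ("Result", "not pi1", "pi2")
    if relationship == "parallel":
        return ("Parallel", "pi1", "pi2")
    raise ValueError(f"unknown relationship: {relationship!r}")


def acceptable_markers(relationship: str, consequence: str) -> set:
    """Markers predicted to be acceptable under the two stated differences only."""
    if consequence not in {"good", "neutral", "bad"}:
        raise ValueError(f"unknown consequence: {consequence!r}")
    relation = fourth_step(relationship)[0]
    markers = {"buran"}  # buran is compatible with either Result or Parallel
    # Constraint (7b): fouze requires Result.
    # Anti-good consequence preference: fouze avoids good outcomes.
    if relation == "Result" and consequence != "good":
        markers.add("fouze")
    return markers
```

On this encoding, (6a) and (10) come out as allowing both markers, while (6b) and (9) allow only bùrán, in line with the judgments reported above.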

4 Conclusion

In this paper, a detailed examination is conducted of fǒuzé and bùrán in Chinese. It is argued that, given a mini-discourse p, fǒuzé/bùrán q, fǒuzé and bùrán both update the information state by adding a proposition q, which is evaluated under ¬p. SDRT is used to model the dynamic semantics of these two discourse markers, and an SDRS represents


an information state. To build a representation for this mini-discourse, first, p comes into the discourse, labeled as π1. Second, when q comes into the discourse, a sub-SDRS is created to which ¬π1 is added, and q, labeled as π2, is evaluated in the same sub-SDRS. This sub-SDRS is labeled as π2′. Third, the rhetorical relation Contrast connects π1 and π2′. Fourth, Result(¬π1, π2) ∨ Parallel(π1, π2) is added, depending on whether it is fǒuzé or bùrán in the discourse and whether the two propositions have a causal relationship. It is also briefly discussed what is added to an information state when q is an imperative or a question. If fǒuzé and bùrán connect an imperative to another proposition, a property, i.e. the semantics of a predicate, is added to a To-Do List, which serves as a deontic ordering source. If fǒuzé and bùrán connect a question, a set of propositions, which serve as (possible) answers to the question, is added to the information state. Two differences between these discourse markers are presented. First, when the two pieces of information involved have a parallel relationship, rather than a causal one, only bùrán is good; fǒuzé is not. Second, fǒuzé has an ‘anti-good consequence’ preference. It is also demonstrated how the proposal argued for in this project is advantageous over the explanations suggested in previous studies.

Acknowledgments. An earlier version of this paper was presented at the 25th Annual Meeting of the International Association of Chinese Linguistics in 2017. I thank the audience at IACL 25 for enlightening discussions. I am also grateful to the CLSW 2021 reviewers and online workshop participants for comments, questions and discussions. I acknowledge the financial support of the Ministry of Science and Technology, Taiwan, under grant number MOST 105-2410-H-194086.

References

1. Lü, S., et al.: Xiàndài Hànyǔ BābǎiCí Zēngdìngběn [Eight Hundred Words in Modern Chinese: Extended Version]. Commercial Publishing, Beijing (1999)
2. Asher, N., Lascarides, A.: Logics of Conversation. Cambridge University, Cambridge (2003)
3. Wang, Z.: Fǒuzé Zuòyòng de Quèdìng [Ascertaining the Function of fǒuzé]. Huáiyīn Shīfàn Xuéyuàn Xuébào Zhéxué Shèhuìkēxué Bǎn [J. Huaiyin Teacher’s College Philos. Soc. Sci. Edn.] 91 (1995)
4. Meng, J.: Fǒuzé jù zhōng fǒuzé hòu de chéngfèn [The constituent after fouze in the fouze construction]. Yǔwén Xuékān: Gāoděng Jiàoyùbǎn [J. Lang. Lit. Higher Educ. Vers.], 34–36 (1996)
5. Zheng, Y.: Fǒuzé hàn nìjiǎjù – jiānlùn Xíng Fúyì xiānshēng de jiǎyánnìzhuǎjù [Fouze and Hypothetical Inversion Construction: On Mr. Xing’s Hypothetical Inversion Construction]. Jīngmén Zhíyè Jìshù Xuéyuàn Xuébào [J. Jingmen College Voc. Technol.] 16, 27–33 (2001)
6. Liu, B.: Fǒuzé biǎodá de pànduàn hàn tuīlǐ [Judgement and reasoning denoted by fouze]. Huángshān xuébào [J. Huangshan] 10, 109–113 (2008)
7. Wang, C.: Fǒuzé de piānzhāng xiénjiǎ gōngnēng jí qí cíxìng wètí [On the Part of Speech and Discourse Function of Fouze]. Hànyǔ xuéxí [Chinese Learning] 4, 17–23 (2008)
8. Wang, Y.: Fǎnzhī, xiāngfǎn, fǒuzé [On fanzhi ‘on the contrary’, xiangfan ‘oppositely’ and fouze ‘otherwise’]. Zūnyì Shīfàn Xuéyuàn Xuébào [J. Zunyi Teacher’s College] 11, 91 (2009)


9. Cao, X., Zhang, L.: Fǒuzé lèi liáncí de yǔfǎhuà tīdù jí qī biǎxiàn [On the grammaticalization of connectives of fouze type]. Hànyǔ xuéxí [Chinese Learn.] 4, 11–21 (2009)
10. Jīn, Y.: Guānyú liáncí fǒuzé de chūjià niándài jí qí tèshū yòngfǎ [On the year of origin and the special function of fouze]. Císhū yánjiù [Dic. Study] 1, 45–47 (2009)
11. Lǚ, M.: Bùrán géshì de yǔyì fēnxī [Semantic analysis of buran]. Zhèngzhōu dàxué xuébào zhéxué shèhuì kéxué bǎn [J. Zhengzhou Univ. Philos. Soc. Sci. Edn.] 43, 112–114 (2010)
12. Ju, F.: The Formal Semantics of Fǒuzé. J. Jinan Univ. Nat. Sci. Edn. 31, 477–484 (2010)
13. Zhu, B.: Rúguǒ A, nàme B, fǒuzé C de yǔyì guānlián jí qí ‘fǒu’ de xiáyù [On the semantics of Ruguo A, name B, fouze C and the scope of negation]. Shìjiè hànyǔ jiāoxué [Global Chinese Teach.] 25, 479–489 (2011)
14. Zhu, B., Wu, Y.: Jùlián Cénggòu yǔ fǒuzé de jiāodiǎn tóushè [Sentence hierarchy and Focus of fouze]. Hànyǔ Xuébào [J. Chinese] 4, 81–87 (2012)
15. Zhu, B., Wu, Y.: Fǒuzé de yǐnshěng guīlǜ [On the optionality of fouze]. Yǔyán yánjiù [Lang. Res.] 32, 91–93 (2012)
16. Deng, M.: Bùrán de liáncíhuà [Grammaticalization toward a connective of buran]. Xiàndài Yǔwén [Mod. Lang. Lit.] 10, 62–64 (2012)
17. Wang, Y.: Chúfēi A, fǒuzé B Jùshǐ Kǎochá [On the construction of chufei A, fouze B]. Hēilóngjiāng Kēxué [Heilongjiang Sci.] 10, 216–217 (2013)
18. Ye, Y., Kao, S., Bing, H.: yě shuō fǒuzé lèi fùjù [Complex sentences of fouze revisited]. Yǔyán yánjiù [Lang. Res.] 34, 85–88 (2014)
19. Wang, Y., Tsai, B., Xu, M., Hu, K.: Huáyǔ fǒuzé lèi duǎnyǔ de piānzhāng yǔ rénjì gōngnéng tànjiù [On the discourse and inter-personal function of phrases such as fouze]. Taiwan J. Chinese Second Lang. 9, 31–65 (2014)
20. Xu, T.: Jiǎshè liáncí fǒuzé de cíhuìhuà tàxī [On the lexicalization of fouze]. Xīnyú xuéyuàn xuébào [J. Xinyu College] 19, 119–121 (2015)
21. Veltman, F.: Defaults in update semantics. J. Philos. Logic 25, 221–261 (1996)
22. Mann, W., Thompson, S.: Rhetorical structure theory: toward a functional theory of text organization. Text 8, 243–281 (1988)
23. Taboada, M., Mann, W.: Rhetorical structure theory: looking back and moving ahead. Discourse Stud. 8, 423–459 (2006)
24. Taboada, M., Mann, W.: Applications of rhetorical structure theory. Discourse Stud. 8, 567–587 (2006)
25. Chierchia, G.: Dynamics of Meaning: Anaphora, Presupposition and the Theory of Grammar. University of Chicago, Chicago (1995)
26. Groenendijk, J., Stokhof, M.: Dynamic predicate logic. Linguist. Philos. 14, 39–100 (1991)
27. Kamp, H., Reyle, U.: From Discourse to Logic. Kluwer, Berlin (1993)
28. Yalcin, S.: Epistemic modals. Mind 116, 983–1026 (2007)
29. Lee, P., Pan, H.: The Chinese negation bu and its association with focus. Linguistics 39, 703–731 (2001)
30. Rooth, M.: Association with Focus. Ph.D. Dissertation. University of Massachusetts at Amherst, Amherst (1985)
31. Rooth, M.: A theory of focus interpretation. Nat. Lang. Seman. 1, 75–116 (1992)
32. von Stechow, A.: Topic, Focus and Local Relevance. In: Klein, W., Levelt, W. (eds.) Crossing the Boundaries in Linguistics, pp. 95–130. Springer, Berlin (1981)
33. von Stechow, A.: Focusing and Backgrounding Operators. Technical report 6. Fachgruppe Sprachwissenschaft. Universität Konstanz (1989)
34. von Stechow, A.: Current issues in the theory of focus. In: von Stechow, A., Wunderlich, D. (eds.) Semantics: An International Handbook of Contemporary Research, pp. 804–825. Springer, Berlin (1991)
35. Kadmon, N.: Formal Pragmatics. Blackwell, Malden (2001)


36. Portner, P.: The Semantics of Imperatives within a Theory of Clause Types. In: Young, R. (ed.) SALT XIV, pp. 235–252. Cornell University, Ithaca (2004)
37. Portner, P.: Imperatives and modals. Nat. Lang. Seman. 15, 351–383 (2007)
38. Hamblin, C.L.: Questions in Montague English. Found. Lang. 10, 41–53 (1973)
39. Karttunen, L.: Syntax and semantics of questions. Linguist. Philos. 1, 3–44 (1977)
40. Dekker, P., Aloni, M., Groenendijk, J.: Questions. In: Aloni, M., Dekker, P. (eds.) The Cambridge Handbook of Formal Semantics, pp. 560–592. Cambridge University Press, Cambridge (2016)
41. Groenendijk, J., Stokhof, M.: Questions. In: van Benthem, J., ter Meulen, A. (eds.) Handbook of Logic and Language, pp. 1055–1124. Elsevier, Amsterdam (1997)

Functions of Non-subject Topics in Mandarin Conversations

Yanmei Gao1(✉) and Guoyan Lyu2

1 School of Foreign Languages, Peking University, Beijing 100871, China [email protected]
2 School of Foreign Studies, Beijing Information Science and Technology University, Beijing 100192, China

Abstract. Some Chinese sentences allow a subject to take a full subject-predicate structure as its predicate. The first subject is called “main subject” or “non-subject topic” (NS-topic), as distinct from the subject in the subject-predicate configuration. Previous studies mainly focus on the syntactic and semantic relations between the NS-topic and other components in the sentence. Few researchers have examined the functions of such elements in discourse. Based on a corpus of Mandarin conversations, we first calculate the occurrences of NS-topics and then examine their functions in sentences and in discourse. Our data show that four groups of NS-topics are prominent in conversations, namely, 1) modifiers of the subject, 2) temporal, spatial or conditional adverbials, 3) inverted objects, and 4) dangling topics. All four groups of NS-topics have special discoursal functions: as lexical cohesive devices, as responses to alternative questions, or as local episodic topics in extended utterances.

Keywords: Non-subject topic · Discoursal function · Cohesive devices · Episodic topic

1 Introduction

In Chinese, some sentences allow a subject to take a full subject-predicate structure as its predicate. The first subject is called “main subject” and the second “clause subject” [1–4]. The main subject is also known as the “non-subject topic” (NS-topic) [5], the “topic” [6, 7] or the “dangling topic” [3, 8, 9], as distinct from the subject in the subject-predicate configuration. In this paper, we use the term “non-subject topic” (“NS-topic” henceforth) to distinguish it from the clause subject. What follows are some examples, with the NS-topic underlined and in boldface. In each example, the first line displays the Chinese characters; the second line is the pinyin version with tone marks; the third line is the word-for-word literal translation; and the fourth line is the non-literal translation.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 325–338, 2022. https://doi.org/10.1007/978-3-031-06703-7_25


Y. Gao and G. Lyu

(1) 这个女孩眼睛很大。 [7: 92] (possessive)
    Zhège nǚhái yǎnjīng hěn dà
    This girl eyes very big
    This girl’s (NS-Topic) eyes (Subject) are very big/The girl has big eyes
(2) 物价纽约最贵。 [10] (aboutness)
    Wùjià Niǔyuē zuì guì
    Price New York most expensive
    Price (NS-Topic) New York (Subject) is the highest.
(3) 家具旧的好。 [7: 92] (range)
    Jiājù jiùde hǎo
    Furniture old good
    (As for) Furniture (NS-Topic) old (Subject) is good
(4) 那年他很紧张。 [7: 92] (time)
    Nèinián tā hěn jǐnzhāng
    That year 3SG¹ very anxious
    That year (NS-Topic) he (Subject) was very anxious
(5) 我们那边雪很厚。 [13: 54] (place)
    Wǒmen nèibiān xuě hěn hòu
    We place snow very thick
    (In) my hometown (NS-Topic), the snow (Subject) was very thick
(6) 衬衫我买了三件。 [14] (inverted Object)
    Chènshān wǒ mǎi le sānjiàn
    Shirts 1SG buy ASP² three
    Shirts (NS-Topic) I (Subject) bought three.
(7) 主治医师, 他几年前就是了。 [14] (inverted Object)
    Zhǔzhìyīshī, tā jǐniánqián jiù shì le
    Prime care physician 3SG years ago thus was ASP
    Prime care physician (NS-Topic), he (Subject) has taken on the position several years ago

In all the examples in (1–7), there are two nominal elements in each sentence: the first is the NS-topic or dangling topic and the second the clause subject. Li and Thompson [7: 93] called the first one “topic” and the second “subject” and stated that “the topic is the definite noun phrase that is what the sentence is about, and the subject is the noun phrase in a ‘doing’ or ‘being’ relationship with the verb.” Scholars have disagreed about the term used for the first nominal element, as some such elements are inverted objects, as in (6, 7), adverbials of time and place, as in (4) and (5), etc. Chafe called it “Chinese style topic” [15]. Pan and Hu’s “dangling topic” only refers to one type of such elements, the noun phrase, which has no semantic

1 Abbreviations of grammatical words are used following the conventions of Li [11] and Fang [12], such as MEAS for measure, ASP for aspect marker, TPART for topic particle, POSS for possessive marker, NEG for negation, etc. A key to the abbreviations is given at the end of the paper.

Functions of Non-subject Topics in Mandarin Conversations


or syntactic relation with any component in the clause structure [9], as in (2) and (3). Xu and Langendoen use the term “topic structure” to refer to the whole configuration of topic plus clause [16: 1]: “By ‘topic structure’, we mean any grammatical configuration consisting of two parts: the topic, which invariably occurs first, and the comment, a clause which follows the topic and says something about it.”

Previous studies have focused on three major aspects of NS-topics: their semantic and syntactic relations with other components in clause structures [1, 3]; the functions of these elements within sentences [3, 14]; and their contrastive functions in discourse [17]. Some NS-topics are found to have an owner-owned relationship with the subject of the clause, such as 这个女孩 ‘this girl’ in (1). NS-topics can be inverted objects of the verbs, such as those in (6) and (7) [3]. They can also be time and place adverbials of the whole sentence, such as 那年 ‘that year’ in (4) and 我们那边 ‘my hometown’ in (5) [3, 14]. NS-topics like 物价 ‘price’ in (2) and 家具 ‘furniture’ in (3), though in a loose aboutness relationship with the clause, have neither a semantic nor a syntactic relation with any component in the clause structure. They are referred to as dangling topics. The contrastive functions of these elements in discourse have also been observed [10, 17].

The functions of aboutness as in (2) and range as in (3) are not appropriately explained within sentence structures. Apart from the contrastive function, NS-topics have other discoursal functions, for instance as lexical cohesive devices that link the ongoing utterance with previous ones. However, research from a discourse perspective is quite limited, and corpus-based studies of the functions of NS-topics in discourse are scant. Besides contrast, do they have other functions in the organization of discourse? To what extent do they maintain the status of subject matter? Do they have the potential to become the main subject matter of the whole conversation?

In this paper, we adopt a corpus-based approach to observe how frequently such elements occur in natural conversations and what functions they serve in sentences and in discourse. We use NS-topic as a discourse notion and clearly distinguish it from the subject in clause structures. The research questions are formulated as follows:
(1) How often do NS-topics occur in Mandarin conversations?
(2) What are their functions in sentences?
(3) What are their functions in discourse?

The rest of the paper is organized as follows. Sect. 2 introduces the corpus used in the study and the research method adopted to answer the research questions. Sects. 3 and 4 discuss the functions of NS-topics in sentences and in discourse, respectively, with data from the corpus-based investigation. Finally, Sect. 5 draws some conclusions and discusses future research directions.


Y. Gao and G. Lyu

2 Method and Data

To answer the research questions, we used data from the Peking University Corpus of Spoken Chinese (PKUC henceforth). The part we used contains 31 audio- and video-taped conversations in Mandarin Chinese, with a total length of 134.8 min. Details of the corpus are presented in Table 1.

First, we examined all the turns and moves in the conversations and collected all instances of NS-topics. A “turn” in conversation refers to one speaker’s utterance before the floor passes to the next speaker. A turn may be very short, such as a simple response marker 对 ‘yeah’ or 嗯 ‘hum, okay’. It can also be quite long, running over a chain of sentences. We use “move” to refer to an intonation unit uttered in one breath, where an intonation unit is “a stretch of speech uttered under a single coherent intonation contour” [18: 47]. In most cases, one intonation unit corresponds to a Pause Clause (PClause) in Song [19] and to a clause in English.

Table 1. Data of the 31 conversations

Conversations   Length (minutes)   Turns   Moves   Chinese characters
31              134.8              1,993   3,893   55,760

Next, we examined the functions of NS-topics within sentences and in discourse. The criteria for judging their functions within sentences were adopted from Yuan’s [3] and Xu and Liu’s [14] discussions of the relations between NS-topics and other components in sentences. For their functions in discourse, we follow Martin and Rose’s [20] discourse semantics framework, with an emphasis on their functions as lexical cohesive devices, as responses to alternative questions, and as episodic topics. All instances of NS-topics in our data were collected and examined, 256 in total. Their percentages relative to the numbers of turns and moves are shown in Table 2.

Table 2. Occurrences of NS-topics

NS-topics   % in turns   % in moves
256         12.84        6.58
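The percentages in Table 2 follow directly from the counts in Table 1. As a minimal sketch of that arithmetic (the names `TURNS`, `MOVES`, `NS_TOPICS` and `rate` are illustrative and not part of the original study), the figures can be rechecked as follows:

```python
# Counts reported in Tables 1 and 2 of the paper.
TURNS = 1993       # total turns in the 31 conversations
MOVES = 3893       # total moves (intonation units)
NS_TOPICS = 256    # NS-topic instances collected

def rate(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)

pct_turns = rate(NS_TOPICS, TURNS)
pct_moves = rate(NS_TOPICS, MOVES)

print(f"NS-topics as % of turns: {pct_turns}")
print(f"NS-topics as % of moves: {pct_moves}")
```

Note that this treats each NS-topic as one event per turn or move, which is an assumption about how the ratio was computed; the source only states the resulting percentages.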

Following Yuan [3] and LaPolla [17], the 256 instances were further divided into eight subcategories, namely demonstrative, personal, epithet, time, place, contingency, inverted object, and dangling topic. Figure 1 shows the occurrences of the eight subcategories in the 31 conversations.

[Bar chart “NS-topics in Conversations”: number of occurrences per subcategory, as listed in Table 3.]

Fig. 1. Subcategories of NS-topics in PKUC

As can be seen, the occurrence of NS-topics in the 31 conversations was very low: 12.84% relative to turns and 6.58% relative to moves. This is in line with the findings of Chen and Gao’s corpus study on the occurrence of NS-topics in written Chinese [21]. Data from the two corpus-based studies show that NS-topics are marked cases in the Chinese language. In both written and spoken Chinese, the predominant proportion of sentences is still of the subject-predicate type. Unlike ordinary subjects in sentences, NS-topics have both sentential and discoursal functions. In the next two sections, we first explore their functions in sentences (Sect. 3) and then turn to their functions in discourse (Sect. 4).

3 Functions of NS-Topics in Sentences

The functions of NS-topics in sentences fall into four categories, as shown in Table 3 below: 1) as modification of the subject, i.e. as personal or demonstrative modifiers or as epithets; 2) as inverted objects; 3) as temporal, locational or conditional adverbials providing settings for the events expressed by the subject-predicate; 4) as pure dangling topics, not directly determined by the predicate, providing a general context for the event expressed by the verb.

NS-topics of the first category include personal pronouns, general nouns, determiners, and epithets realized by nouns or adjectives. Their function is to restrict the scope of the referents of the subject. In example (8) the two NS-topics are realized by two different elements: in T29b by a generic personal pronoun whose referents include the speaker herself, and in T30b by a common noun representing a class of books.

Table 3. Relations between NS-topics and other elements in sentences

Scope                                          Relations        Occurrences  Subtotal  Percentage
Modification of the subject                    Personal         50           97        37.89
                                               Demonstrative    35
                                               Epithet          12
Temporal, spatial, and conditional adverbials  Time             61           86        33.59
                                               Location         19
                                               Contingency      6
Inverted object                                Inverted object  36           36        14.06
Pure topics                                    Dangling topic   37           37        14.45
Total                                                           256          256       100
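The subtotals and percentages in Table 3 can be rederived from the eight subcategory counts. The following is only an illustrative sketch: the dictionary layout and the names `COUNTS`, `subtotals` and `shares` are our own, while the numbers are those reported in the table.

```python
# Subcategory counts as reported in Table 3, grouped by sentence-level scope.
COUNTS = {
    "Modification of the subject": {"Personal": 50, "Demonstrative": 35, "Epithet": 12},
    "Temporal, spatial, and conditional adverbials": {"Time": 61, "Location": 19, "Contingency": 6},
    "Inverted object": {"Inverted object": 36},
    "Pure topics": {"Dangling topic": 37},
}

# Sum each scope, then express every subtotal as a share of all NS-topics.
subtotals = {scope: sum(relations.values()) for scope, relations in COUNTS.items()}
total = sum(subtotals.values())
shares = {scope: round(100 * n / total, 2) for scope, n in subtotals.items()}

for scope, n in subtotals.items():
    print(f"{scope}: {n} ({shares[scope]}%)")
print(f"Total: {total}")
```

Grouping the counts by scope keeps the check aligned with the table rows, so a mismatch in any single subcategory would show up in exactly one subtotal.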

(8) PKUC005
T29. SOPHIE
b. 对啊,.......大家家里面都会有。
duì a dàjiā jiā lǐmiàn dōu huì yǒu
Yeah, ah all home inside all would have
Yeah, (in) all (NS-Topic) of our homes there is (a series of books, Subject omitted).
T30. SOPHIE
……
b. 对啊, 文学艺术类那本磨得最快。
duì a wénxuéyìshù lèi nèi běn mó de zuì kuài
Yeah, ah literature and arts category that MEAS torn ADV most quick
Yeah, the literature and arts (NS-Topic) book (Subject) was the first one to be torn (read most often).

In (8) T29b, the sentence is an existential one, with the main verb 有 ‘have’ occurring at the end of the sentence. All the elements before 都 ‘all’, including 大家 ‘all, everyone’, 家 ‘home’ and 里面 ‘inside’, have the potential to be the adverbial of place. The function of the NS-topic 大家 ‘all, everyone’ is to generalize the referents so that they include both interlocutors. In T30b, the relation between the NS-topic and the Subject is more complex. In ordinary word order, determiner + measure would come before the epithet (here, the NS-topic). Here, however, the epithet comes before the determiner + measure, and the real subject ‘book’ is omitted. Such word order is not grammatically acceptable in written Chinese, yet it is repeated by the

2 We followed the transcription conventions of Eggins and Slade [22]: T is for turn (here T29 refers to the 29th turn in the extract); labels in capital letters identify the speaker; letters a, b, c, etc. mark the moves in one turn.


two speakers in this conversation many times without causing any misinterpretation. Neither of the two speakers felt the need to repair it.

NS-topics as temporal and spatial adverbials have been thoroughly discussed in Li and Thompson [6, 7], Yuan [3], and Xu and Liu [14]. Their function at sentence level is to indicate the time, location or condition of the events expressed by the verbs. Here are two more examples.

(9) PKUC045
T16 XING
b. 小学阶段也是一直没有接触过音标。
Xiǎoxué jiēduàn yě shì yīzhí méiyǒu jiēchù guò yīnbiāo
Primary school stage also is always not learn ASP phonetics
(At) primary school (Time), (you) did not learn the International Phonetic Alphabet.

(10) PKUC044
T11 CHRIS
c. 中间的一个位置就是大家一起拿来放书包。
Zhōngjiān de yí gè wèizhì jiùshì dàjiā yīqǐ ná lái fàng shūbāo
Middle POSS one MEAS seat thus all together take come put bag
The seat in the middle (Place), thus all (Subject) can put their schoolbags there.

In (9) 小学阶段 ‘primary school’ functions as an adverbial of time, while in (10) 中间的一个位置 ‘the seat in the middle’ functions as an adverbial of place. Whether adverbials of time and place should be considered NS-topics remains an unsettled issue among Chinese linguists. Up until now, most scholars have followed Li and Thompson [7] in regarding such elements as NS-topics.

Inverted Objects are syntactically related to the Predicate of the sentence, indicating the patient, instrument, or beneficiary of the action expressed by the verb, as shown in examples (6) and (7).

NS-topics of the fourth category are the typical Chinese style topics [15], or pure topics. Such elements have no syntactic relation with any other component of the sentence structure; they are only loosely related to one element or provide a general context for the whole sentence. NS-topics of this category are sometimes suffixed by the topic marker 的话 ‘as for, about, concerning’, as in (11).

(11) PKUC010
T42. L.
c. 后勤的话大部分都是女的嘛。
hòuqín de huà dà bùfen dōu shì nǚ de ma
staff TPART large part ADV be woman POSS MPART
As for staff (NS-Topic), most of them (Subject) are women,


In (11), the topic marker 的话 ‘as for, about’ separates the NS-topic from the rest of the sentence, and the configuration of subject (大部分 ‘most’) ^ predicate (是 ‘be’) is not grammatically related to the nominal group preceding the topic marker. This type of NS-topic has a special function in the organization of larger spans of utterances.

4 Functions of NS-Topics in On-Going Conversations

Apart from their functions within sentences, the four groups of NS-topics all function as cohesive devices to build up continuity across turns in conversations. Modifiers of subjects simultaneously function as lexical cohesive devices to connect the current event with previous ones. Some inverted objects turn out to be responses to alternative questions. Temporal, locational and conditional adverbials often have a contrastive function across utterances. Dangling topics more often than not function as episodic topics in the local context. As the contrastive functions of NS-topics have been thoroughly discussed in previous studies, we focus on the other three discoursal functions here.

4.1 NS-Topics as Lexical Cohesions

Lexical cohesion refers to the “cohesive effect achieved by the selection of vocabulary” [22: 274]. In English, lexical cohesive devices include general nouns (words referring to people, creatures, stuff, affairs, places, ideas, etc.), reiteration (repetition), synonyms, and collocation. Such elements help connect the ongoing utterance with what has been said in previous utterances. In Chinese, nominal phrases in clause-initial position, be they subjects or NS-topics, can all be used to link the current utterance to what has gone before it. All four groups of NS-topics are found to be used as lexical cohesive devices. They can be simple repetitions, synonyms, or general nouns. In (12) T19, Vincent picks up one element from the previous utterance and makes it the NS-topic of his own utterance. The co-referential elements are underlined and in boldface.

(12) PKUC001
T16. CHRIS:
g. 还有一个就是台湾的。
háiyǒu yí gè jiù shì Táiwān de
another one MEAS ADV is Taiwan POSS
(still) another was from Taiwan
T17. VINCENT:
a. 哦, 这样啊。
eh, zhèiyàng a
eh, like this
Eh, I see
T18. CHRIS:
a. 我们寝室的也是台湾的。
Wǒmen qǐnshì de yě shì Táiwān de


We dormitory POSS also Taiwan POSS
One in my dormitory was also from Taiwan
T19. VINCENT
a. 台湾那个学缅甸语的是不是?
Táiwān nèi ge xué miǎndiànyǔ de shì bú shì
Taiwan that MEAS study Burmese POSS is NEG
The Taiwan (girl) (NS-Topic) the one who is studying Burmese, isn’t she?

“The Taiwan (girl)” in (12) T19 is the element just mentioned by the prior speaker. By repeating this element at the beginning of his own utterance, Vincent builds up a lexical connection with the previous talk and sets up a basis for the information to be introduced, “who is studying Burmese”. The function of this NS-topic can be interpreted in two ways: a) as inverted Object (Isn’t she the Taiwan girl who is studying Burmese?), or b) as an elliptical response to the statement in T18 (You mean the Taiwan girl who is studying Burmese, don’t you?). The lexical cohesive function of NS-topics like this one is one of the typical functions discussed in the literature. It is in this sense that Shi [8: 386] defines topic as “an entity that has been mentioned in the previous discourse and is being discussed again in the current sentence, namely, topic is what the current sentence is set up to add new information to.” Here Vincent is adding new information about the girl mentioned in Chris’s utterance.

4.2 Inverted Object as Response to Alternative Questions

An alternative question offers two or more options so that the respondent may choose one or more of them. Most Chinese alternative questions contain the construction “A 还是 B” (A or B) or “是 A 还是 B” (be A or B). Inverted objects are mostly used after two competing objects occur in previous utterances. Here their function is to pick out the referent to be focused on.

(13) PKUC033
T23 LEE
a. 你们是订的酒店还是民宿那种? (是A还是B)
Nǐmen shì dìngde jiǔdiàn háishì mínsù nèizhǒng
You is order hotel or homestay
What did you book, hotel or homestay?
T24 WANG
a. 青年旅舍好像, 有酒店, 都有。
Qīngnián lǚshè hǎoxiàng yǒu jiǔdiàn dōu yǒu
Hostel seems have hotel all have
Hostel, it seemed, and also hotel, both.

In example (13), the two speakers were talking about Wang’s travel plan. Lee asked Wang what kind of accommodation they had booked, using an alternative question with two options. Wang answered the question with an inverted object, with the verb 订 ‘book’ omitted. Fox and Thompson (2010) noted that answers to alternative questions in English fall into two categories, marked and unmarked. Unmarked responses include single words or phrases. Answers in the form of clause or clause complex are marked


ones, indicating the responders’ questioning of the previous speaker’s question design. In Chinese, answers consisting of single words or phrases are also unmarked, with the verbs omitted. As Chinese alternative questions are mostly in the form of “A or B” or “be A or B”, an answer consisting of A, B or another single word or phrase will in most cases be the inverted object, and also the preferred answer form [24].

4.3 NS-Topics as Episodic Topics in Conversations

Episodic topic refers to the aboutness of an extended utterance in one turn or across several turns. This kind of topic may be a particular person, place, or event that the interlocutors have talked about for a while. It is in this sense that topic is considered a “discourse notion” which can function in a special way in the ongoing discourse [7: 100]. A related notion is discourse topic, which was defined as the “proposition entailed by the joint set of sentences in the given discourse” [25]. A shift of topic can be clearly felt by the interlocutors or explicitly announced by one of them. Among the four groups of NS-topics, dangling topics mostly function as episodic topics to begin an extended utterance, as in (14).

(14) PKUC004
T51. VINCENT
a. 有件事,我刚想说,
Yǒu jiàn shì, wǒ gāng xiǎng shuō,
One thing, 1SG just want say,
One thing I want to add,
b. 你这十九号交了论文, 总算结了一门课了, 多开心。
nǐ zhè shíjiǔhào jiāo le lùnwén, zǒngsuàn jié le yì mén kè le, duō kāixīn
2SG this 19th submit ASP paper, just complete ASP one MEAS course ASP very relaxed
after submitting your paper by 19th, you will complete one course, how relaxing.

The element 有件事 ‘one thing’ in (14) may have two readings: as inverted Object of 说 ‘say’ and as the episodic topic. The scope of this topic goes beyond the sentence boundary and extends over the whole utterance, which consists of two sentences, the first being “one thing I want to add” and the second being a more complex sentence which contains three layers of sentence structure. The layers of the second part of the utterance may be analyzed as follows: When you submit your paper by 19th, you will complete one course, (this is something you are to be) happy with.

Episodic topics can also provide a whole context for the introduction of a new event, as in (15). Here the speaker is comparing one part of the moat in her hometown with that of the Forbidden City in Beijing.

(15) PKUC003
T22 AMELIA:
e. 我们家那护城河那个最宽的地方看得到那格局特别像
Wǒmen jiā nèi hùchénghé nèige zuì kuān de dìfāng kàn de dào nèi géjú tèbié xiàng
My hometown the moat that most wide place look ADV that pattern very like


The building pattern seen from the widest part of the moat in my hometown looks very much like
故宫的那个地方……
Gùgōng de nèige dìfāng
Forbidden City POSS that place
the part of the Forbidden City.
f. 就是故宫环着的那个河看后面那栋的样子。
Jiùshì gùgōng huán zhe de nèige hé kàn hòumiàn nèidòng de yàngzi
Just the Forbidden City round ASP POSS that river look back that POSS shape
Just like the building pattern seen from the other side of the moat of the Forbidden City.
…
T28. AMELIA.
a. 然后那个护城河也确实特别宽,
Ránhòu nèige hùchénghé yě quèshí tèbié kuān
Then that moat also actually very wide
Then, that moat is really very wide,
b. 那一带看上去挺漂亮的,
Nèiyídài kànshàngqù tǐng piàoliàng de
That part look very beautiful POSS
The whole area is very beautiful,
c. 很舒服。
Hěn shūfú
Very comfortable
Very comfortable

In (15) the function of 我们家那护城河那个最宽的地方 ‘the widest part of the moat in my hometown’ in T22 is to introduce a new topic, the moat, into the conversation. By narrowing down the scope of the referent from ‘my hometown’ to ‘the moat’ and finally to ‘the widest part of the moat’, the speaker singles out the moat in her hometown as the episodic topic of her own utterance. ‘The moat’ remains the topic until T28, where Amelia summarizes their discussion of the moat area with the evaluation ‘very comfortable’. After that, the topic shifts back to “the city”, which is the macro topic of the whole conversation.

Among the 37 dangling topics in the corpus, none is found to be the macro topic of the global conversation. All of them function as episodic topics in local utterances, which were later picked up by the speakers as subjects in the ongoing discourse. A similar notion in the literature is Song’s “generalized topic” [19]. A generalized topic refers to “a syntactic component of a PClause”, usually a nominal phrase “functioning as the subject, object, attributive in the clause in traditional grammar.” The similarity between Song’s generalized topic and the episodic topic is that their scope extends beyond clause boundaries. However, the generalized topic always has a syntactic role to play in the extended sentence, whereas episodic topics have no such role. All 37 dangling topics are episodic topics which have no syntactic roles in sentence structures.


5 Conclusions

The current study retested the typology of Chinese as a topic-prominent language by focusing on one particular type of topic, the non-subject topic. Based on data from a corpus of natural conversations, we draw some preliminary conclusions. First, when we strictly distinguish NS-topics from subjects, the occurrence of NS-topics is very low. In line with findings from previous studies, a large proportion of the NS-topics also have a function within sentences, i.e., as modifiers of subjects, as temporal, spatial, or conditional adverbials, or as inverted objects. Only a small percentage of them are pure dangling topics.

From a discourse semantics perspective, the roles of NS-topics in conversations were examined. Those which function within the nominal group also have a role as lexical cohesive devices and as responses to alternative questions. Dangling topics often function as episodic topics which cover a string of utterances. In such cases, their function is to introduce new information which has not been mentioned before. Their scope goes beyond simple sentences into more extended spans of conversation. However, they seldom develop into global conversation topics.

LaPolla [17] claimed that, although subject and topic are two different concepts in Chinese, Chinese is neither a topic-prominent nor an SVO language but a topic-comment language. Evidently, he is using topic as a semantic notion, indicating the global semantic structure of Chinese sentences. In this study, we follow Li and Thompson’s [6, 7] and Chao’s [1] criteria to strictly distinguish non-subject topics from subjects in subject-predicate structures in conversations, and we see that, in a strict sense, the prototypical topics (NS-topics) in Chinese daily conversations are not as prominent as ordinary subjects in subject-predicate configurations. NS-topics are marked cases in the Chinese language. When speakers use such structures in sentences, in extended utterances, or even across utterances, they have a strong reason to do so. The driving force behind the choice of NS-topics is the need to connect the current utterance with those in previous turns.

Findings from this study support the conclusions of Chen and Gao [21] and once again challenge the typology of Chinese as a topic-prominent language, as long as topic and Subject are kept as distinct notions. While most Chinese linguists still follow the consensus of considering topic as the semantic function of subject [26], our study shows that the Chinese style topic (the NS-topic) should be treated separately at both the sentence level and the discourse level. The position of Li and Thompson [6, 7] in regarding topic as a discourse notion is reaffirmed, and the confusion caused by mixing the semantic function within the sentence with the discourse function in extended utterances can be resolved. To solve the problem of treating “topic” as both a semantic and a syntactic notion, future researchers may develop some new concepts from the discourse semantic perspective.

Abbreviations

1SG     first person singular
2SG     second person singular
3SG     third person singular
ASP     aspect marker
MEAS    measure
NEG     negation
TPART   topic particle
POSS    possessive

References

1. Chao, Y.R.: A Grammar of Spoken Chinese. The Commercial Press, Beijing (2011)
2. Zhu, D.X.: Notes on Grammar. The Commercial Press, Beijing (1982)
3. Yuan, Y.L.: Topicalization and grammatical relations. Stud. Chin. Lang. 4, 241–254 (1996)
4. Shen, J.X.: On minor sentence and flowing sentences in Chinese: in commemoration of the 120th birthday of Yuen Ren Chao. Stud. Chin. Lang. 5, 403–415 (2012)
5. Chen, G.H., Wang, J.G.: Unmarked non-subject topics in Chinese. Chinese Teach. World 3, 310–323 (2010)
6. Li, C.N., Thompson, S.A.: Subject and topic: a typology of language. In: Li, C.N. (ed.) Subject and Topic, pp. 457–489. Academic Press, New York (1976)
7. Li, C.N., Thompson, S.A.: Mandarin Chinese: A Functional Reference Grammar. University of California Press, Berkeley (1981)
8. Shi, D.X.: Topic and topic-comment constructions in Mandarin Chinese. Language 76(2), 383–408 (2000)
9. Pan, H.H., Hu, J.H.: A semantic-pragmatic interface account of (dangling) topics in Mandarin Chinese. J. Pragmat. 40, 1966–1981 (2008)
10. Chen, P.: Pragmatic interpretations of structural topics and relativization in Chinese. J. Pragmat. 26, 389–406 (1996)
11. Li, E.-H.: A Systemic Functional Grammar of Chinese: A Text-Based Analysis. Continuum, London (2007)
12. Fang, Y.: A study of topical theme in Chinese: an SFL perspective. In: Webster, J.J. (ed.) Meaning in Context: Implementing Intelligent Application of Language Studies, pp. 84–114. Continuum, London (2008)
13. Gao, Y.M., Lyv, G.Y.: Marked theme in spoken Chinese: a discourse semantics perspective. J. World Lang. 6(1–2), 46–69 (2020)
14. Xu, L.J., Liu, D.Q.: Topic: Structural and Functional Analysis. Shanghai Education Press, Shanghai (2007)
15. Chafe, W.: Givenness, contrastiveness, definiteness, subject, topic, and point of view. In: Li, C.N. (ed.) Subject and Topic, pp. 25–55. Academic Press, New York (1976)
16. Xu, L.J., Langendoen, D.T.: Topic structures in Chinese. Language 61(1), 1–27 (1985)
17. LaPolla, R.: Chinese as a topic-comment (not topic-prominent and not SVO) language. In: Xing, J. (ed.) Studies of Chinese Linguistics: Functional Approaches, pp. 9–22. Hong Kong University Press, Hong Kong (2009)
18. Du Bois, J.W., Schuetze-Coburn, S., Cumming, S., Paolino, D.: Outline of discourse transcription. In: Edwards, J.A., Lampert, M.D. (eds.) Talking Data: Transcription and Coding in Discourse Research, pp. 45–89. Lawrence Erlbaum, Hillsdale (1993)
19. Song, R.: On generalized-topic-based Chinese discourse structure. In: Proceedings of CIPS-SIGHAN Joint Conference on Chinese Language Processing, pp. 23–33 (2010)
20. Martin, J.R., Rose, D.: Working with Discourse: Meaning Beyond the Clause, 2nd edn. Continuum, London (2007)
21. Chen, J., Gao, Y.: Is Chinese topic-prominent language? Foreign Lang. Teach. 5, 11–14 (2000)
22. Eggins, S., Slade, D.: Analysing Casual Conversation. Cassell, London (1997)
23. Halliday, M.A.K., Hasan, R.: Cohesion in English. Longman, London (1976)
24. Xie, X.: Unmarked responses to yes/no and wh-questions in Chinese natural conversations. In: Fang, M., Cao, H.L. (eds.) Interactional Linguistics and Chinese Language Studies, vol. 2, pp. 248–268. Social Sciences Academic Press, Beijing (2018)
25. van Dijk, T.A.: Episodes as units of discourse analysis. In: Tannen, D. (ed.) Analyzing Discourse: Text and Talk, pp. 177–195. Georgetown University Press, Georgetown (1981)
26. Her, O.S.: Topic as a grammatical function in Chinese. Lingua 84, 1–23 (1991)

A Comparative Study of Two Motion Verbs Lái and Guòlái

Ziyan Li
Graduate School of Language and Culture, Osaka University, Suita, Japan

Abstract. The two motion expressions lái 来 ‘come’ and guòlái 过来 ‘come over’ behave similarly in usage. Though they can be used interchangeably on many occasions, there are situations where guòlái cannot be replaced with lái. This paper summarizes the factors behind this phenomenon using the methods of corpus comparison and word replacement. We find that, when lái and guòlái are used as main verbs in motion event expressions, only guòlái can be used if the sentence denotes restraint, prohibition, or negation and the subject of the motion event is present. A possible explanation is that guò 过 ‘past’ represents the semantics of “motion process” and hence guòlái emphasizes the motion process. In the “verb + lái/guòlái” structure, polysyllabic verbs, non-motion verbs, and non-typical motion verbs are subject to certain restrictions in combining with lái.

Keywords: Motion event, Lái, Guòlái, Verb, Directional complement

1 Introduction

The words lái and guòlái in Chinese are both verbal components that express movement from far to near, and they have similar semantics. Both can serve as the main verb and as the directional complement of a verb structure in a sentence. In practical use, however, there are some differences, as in the following examples:

(1) a. Shì nǐ qǐng tā guòlái de, wèi shénme yīyánbùfā?
‘You invited her to come over, why didn’t you say a word?’
b. Shì nǐ qǐng tā lái de, wèi shénme yīyánbùfā?
‘You invited her to come, why didn’t you say a word?’

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 339–352, 2022. https://doi.org/10.1007/978-3-031-06703-7_26


Z. Li

(2) a. Xiǎohuǒzi guòlái dào: “lǎodà, bù néng dài tā zǒu ya!”
‘A guy came over and said, “Boss, we cannot take her away!”’
*b. Xiǎohuǒzi lái dào: “lǎodà, bù néng dài tā zǒu ya!”
‘A guy came and said, “Boss, we cannot take her away!”’

(3) a. Wǒmen bù néng ràng tǎnkè guòlái!
‘We can’t let the tank come over!’
b. Wǒmen bù néng ràng tǎnkè lái!
‘We can’t let the tank come!’

The (b) sentences in the three example groups are derived by replacing guòlái with lái. In example group (1), example (1b) is still grammatical, and its semantics are basically the same as those of example (1a), which means that lái and guòlái can be used interchangeably. In example group (2), however, sentence (2b) is not grammatical once guòlái is replaced with lái. In example group (3), example (3b) is grammatical, but the semantics of sentences (3a) and (3b) differ; in the context of (3a), only guòlái can be used. Although lái and guòlái are similar in semantics and usage, they are not completely equivalent: depending on the context, there are situations where they cannot be interchanged. Native speakers can freely choose the appropriate expression according to their language sense, but learners of Chinese may be unsure whether to use lái or guòlái. Example (4) shows some typical errors retrieved from the HSK Dynamic Composition Corpus (http://hsk.blcu.edu.cn/); the author’s corrections are shown in parentheses.

(4) a. Tā shì wèile xué yī dào Zhōngguó guòlái de. (lái)
‘He came over to China to study medicine.’ (came)
b. Wǒmen mùqián zhǐ děngdài zhè rìzi de guòlái. (dàolái)
‘We are only waiting for this day to come over.’ (come)

To investigate the differences in the usage of lái and guòlái, and to provide guidance for language learning and teaching, this paper focuses on lái and guòlái as main verbs and as directional complements in motion event expressions. It describes the semantics of lái and guòlái, using specific example sentences to explore the reasons for the usage differences. Section 2 analyzes the interchangeability and non-interchangeability of lái and guòlái when they are used as main verbs, and then explains the reasons for the difference in terms of conceptual meaning. Section 3 describes the difference between lái and guòlái as directional complements, summarizes the factors essential to the usage of lái and guòlái in the “verb + lái/guòlái” structure in specific contexts, and derives the reasons for each factor. Section 4 summarizes and concludes this work.

2 The Conceptual Meaning of Lái and Guòlái

“Deixis” refers to the functions of personal pronouns, demonstrative pronouns, tenses, and various grammatical and lexical features that connect speech and speech behavior in the time and space coordinates [1]. Without considering the speaker’s situation, it is impossible to determine the object indicated in the context of speaking [2]. By this definition, both lái and guòlái in Chinese are verbal elements with deictic properties that are used to express motion. Researchers disagree on whether lái and guòlái are language units at the same level. Chao [3] classified both lái and guòlái into the intransitive action verbs representing lái, and regarded guòlái as a verb-complement compound. Zhu [4] regards guòlái as a phrase with a verb-complement structure, and considers lái a directional complement of guò. To facilitate comparison and analysis, this paper treats lái and guòlái as language units at the same level; that is, both are regarded as verbs. As described in Sect. 1, in some situations guòlái as the main verb of a motion expression can be interchanged with lái. In other situations, however, only guòlái can be used. These situations are illustrated below:


Z. Li

(5) a. Nǐ kěyǐ lìkè qù qǐng jǐngchá guòlái.
‘You may ask the police to come over right away.’
b. Nǐ kuài guòlái chī.
‘Come over and eat.’
c. Nà wǒ chīwán fàn guòlái.
‘I'll come over after dinner.’

(6) Tiān bù zǎo le. Duìzhǎng, nǐ bié guòlái. ‘It’s getting late. Captain, don't come over.’

(7) Yǒu wǒ zài, tā gǎn guòlái? ‘With me here, does it dare to come over?’

The guòlái in example (5) can be replaced with lái, whereas the guòlái in example (6) cannot. In example (7), replaceability can only be determined from the context. From these examples we find that, in general, only guòlái can be used if both of the following factors are satisfied. Otherwise, both lái and guòlái can be used.

Factor 1: The sentence expresses the semantics of restraint, prohibition, or negation.
Factor 2: The motion subject has presence, that is, the motion subject and the speaker are in the same space-time environment.

In example (5), where guòlái can be replaced with lái, example (5a) states the motion process itself and therefore does not satisfy Factor 1. In addition, the motion subject was not on the scene when the motion started, so Factor 2 is not satisfied either.


Example (5b) satisfies Factor 2, but the sentence itself does not express the semantics of restraint, prohibition, or negation, so Factor 1 is not satisfied. Example (5c) satisfies Factor 1, but at the time the motion occurs the motion subject and the speaker are not in the same space-time environment, so Factor 2 is not satisfied. Example (5c) can be compared with (6), which has a similar structure but satisfies Factor 2. Example (7) reveals the effect of presence on the use of lái and guòlái. If the tā 它 ‘it’ in the example sentence is on the scene at the time the motion occurs, the speaker can only use guòlái. If tā is not on the scene, both lái and guòlái can be used.

Cognitive linguistics holds that syntactic structure corresponds to conceptual structure. From this perspective, Wang [5] pointed out that an event can contain multiple behavioral elements and objects. In daily communication, part of the information is often selected to represent the whole event as needed, that is, “actual scene information = verbal information + default information”. The cognitive basis is the metonymic relationship in which a part represents the whole. In other words, even when describing the same event, the speaker can choose different angles according to different expressive needs, which finally yields different expressions. We therefore believe that the difference in the use of lái and guòlái is determined by whether the semantic features that distinguish the two can be set as default information in a specific context. Furthermore, we believe that the semantic difference between lái and guòlái is that guò 过 ‘past’ adds a “motion process” component, so that guòlái emphasizes the motion process. Lái cannot express the “motion process”, and so cannot meet the requirement of emphasizing the motion process in specific contexts.

Lakoff [6] states that humans form image schemas based on physical experience, and in turn form syntactic structures based on those image schemas. “Source-Path-Destination” is one of the human kinesthetic image schemas. Liu [7] discussed the semantic composition of combined forms of positional motion words (verbal components), and pointed out that the semantic components of guò include “route (R)”, while lái includes “deictic”. The difference between lái and guòlái is that the semantic component of the former is only the “deictic” represented by lái, while that of the latter is “route + deictic”, a combination of guò and lái. The “route” in Liu’s work refers to the contour along which the motion proceeds, which is similar to the “path” in Lakoff’s kinesthetic image schema; both represent the process of motion.

Therefore, the reason why only guòlái can be used as the main verb in an event that expresses restraint, prohibition, or negation (Factor 1) and whose motion subject is present (Factor 2) is as follows: the motion has not occurred, or has occurred but has not finished. The speaker’s purpose is to prevent, stop, or negate the movement itself, which emphasizes the motion process rather than the motion result. The “process” cannot be set as default information, and this information needs to be expressed by the semantics of guò. Conversely, in a context where “process” can be set as default information, the speaker can choose whether to express this information in the sentence according to his or her own perspective and expressive needs, which makes lái and guòlái interchangeable. Based on this understanding, we drew the conceptual meaning diagrams of lái and guòlái shown in Fig. 1 and Fig. 2.


Fig. 1. The conceptual meaning of lái

Fig. 2. The conceptual meaning of guòlái

Figure 1 shows the conceptual meaning of lái, and Fig. 2 shows the conceptual meaning of guòlái. In both figures, the circle represents the motion subject, the triangle represents the speaker, the area inside the quadrilateral represents the range of the conceptual meaning, and the arrow represents the motion action. The conceptual meaning of guòlái covers the motion subject, the speaker, and the entire process of the motion action. The conceptual meaning of lái covers the motion subject and the speaker, but covers only the direction of the motion action rather than its entire process. So far, we have discussed the factors that affect whether lái and guòlái are replaceable, and the difference in their conceptual meaning, when they are used as the main verbs of a motion expression.
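The two-factor generalization of this section can be restated as a small decision function. The following Python sketch is only an illustration: the function name and the Boolean codings of the examples are our own assumptions, chosen to follow the descriptions of examples (5) and (6) given above, and the rule is a tendency stated by the paper rather than a tested grammar.

```python
def allowed_main_verbs(expresses_restraint, subject_present):
    """Return the main verbs predicted to be acceptable.

    expresses_restraint: Factor 1 (restraint, prohibition, or negation).
    subject_present:     Factor 2 (the motion subject shares the
                         speaker's space-time environment).
    Both argument names are hypothetical labels for the two factors.
    """
    if expresses_restraint and subject_present:
        # The motion process cannot be left as default information.
        return {"guòlái"}
    return {"lái", "guòlái"}


# Hypothetical codings that follow the discussion of (5a)-(5c) and (6).
codings = {
    "(5a)": (False, False),
    "(5b)": (False, True),
    "(5c)": (True, False),
    "(6)": (True, True),
}

for label, (factor1, factor2) in codings.items():
    print(label, sorted(allowed_main_verbs(factor1, factor2)))
```

Example (7) corresponds to calling the function twice, once with the motion subject present and once with it absent, which yields the two readings described above.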

3 Lái and Guòlái as Directional Complements

This section discusses the factors that affect the choice of directional complement. Lái and guòlái can be used alone as main verbs, and can also follow other verbs as directional complements. As complements, lái and guòlái are interchangeable in some cases and not in others.

3.1 Syllable Number of Verb

Combining language units with different syllable counts forms different prosodic structures. In Chinese, prosodic structure is closely related to the acceptability of a sentence. Lv [8] pointed out that whether a word is monosyllabic or disyllabic affects word structure in Chinese: “Dual-syllable words…often require that the following word is also dual-syllable”. Feng [9] emphasizes the impact of prosody on syntax, and holds that syntax is not independent of phonetics. Pronunciation habits (prosody) can disrupt, legitimize, or rule out the original constituent structure. We found that the number of syllables in the verb also affects the use of lái and guòlái, as shown in the following examples:


(8) a. Guó nèi yǒu jiāxìn jì guòlái.
‘I have a letter come from home.’
b. Míngcè gěi tā jìlái.
‘Sent him the roster.’

(9) a. Bái sè de huā bù kěn líkāi shì de piāoluò guòlái.
‘The white flowers fell and come over as if they refused to leave.’
*b. Bái sè de huā bù kěn líkāi shì de piāoluò lái.
‘The white flowers fell and come as if they refused to leave.’

In the above examples, sentences (a) and (b) of each group use the same verb in the verbal structure. The difference is that the complement in sentence (a) is guòlái and the complement in sentence (b) is lái. The number of syllables of a verb evidently affects its combination with lái or guòlái. The effect can be stated as follows: when the verb is monosyllabic, both lái and guòlái can serve as directional complements after the verb, as in (8). When the verb is polysyllabic, the complement is usually guòlái, as in (9). If we replace the verb in example (9) with a monosyllabic verb of similar meaning, as in example (10), lái can appear in the complement position.

(10) a. Yī dào shǒudiàn guāngliàng cháo tā sǎo guòlái.
‘A flashlight swept towards her.’
b. Lànghuā xiàng chuán shàng sǎolái.
‘Waves swept onto the boat.’
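The syllable-count tendency just described can be sketched as a lookup on the number of syllables in the verb. This Python fragment is a minimal illustration under our own assumptions: the function name, the labels “acceptable” and “dispreferred”, and the convention of passing the verb as a list of pinyin syllables are all hypothetical, and the output encodes a preference rather than a strict grammaticality judgment.

```python
def complement_tendency(verb_syllables):
    """Map a verb, given as a list of pinyin syllables, to the
    tendency stated above for each directional complement.

    The labels are hypothetical; they describe a preference,
    not a categorical rule.
    """
    count = len(verb_syllables)
    if count == 0:
        raise ValueError("verb must have at least one syllable")
    if count == 1:
        # Monosyllabic verbs: both complements can follow, as in (8).
        return {"lái": "acceptable", "guòlái": "acceptable"}
    # Polysyllabic verbs: the complement is usually guòlái, as in (9).
    return {"lái": "dispreferred", "guòlái": "acceptable"}


print(complement_tendency(["jì"]))
print(complement_tendency(["piāo", "luò"]))
```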


It is worth noting that although the number of syllables affects the use of lái and guòlái, the effect is not mandatory, as the following examples show:

(11) a. Tā lìkè fēibēn guòlái le.
‘He rushed over immediately.’
b. Nàbiān fēibēn láile liǎng zhī měngshòu.
‘Two beasts rushed over there.’

(12) a. Tā zìjǐ zǒu de guòlái de.
‘He can come over by himself.’
*b. Tā zìjǐ zǒu de lái de.
‘He can come by himself.’

(13) a. Nà gè gànbù zǒule guòlái.
‘That officer walks over.’
b. Cóng xiāngxià zǒule lái.
‘Walk over here from the countryside.’

As these examples show, the verb in (9) can take only guòlái, whereas the verb in (11) can take both lái and guòlái. Examples (12) and (13) show how disyllabic structures formed by a monosyllabic verb plus an auxiliary word combine with lái and guòlái. The combination zǒu de 走得 ‘come’ can only be used with guòlái. The combination zǒule 走了 ‘walk’ can be followed by both lái and guòlái. The usage zǒule lái 走了来 ‘walk over’ has a certain classical Chinese flavor, and its use in daily conversation is restricted.

Based on the previous discussion, we draw the following conclusion: the number of syllables of the verb affects the acceptability of “verb + lái/guòlái”, and polysyllabic verbs tend to be matched with the polysyllabic guòlái. However, this effect does not by itself determine acceptability. More generally, the number of syllables affects the “acceptance level” of native speakers in a specific context. In addition to the number of syllables, style, language habits, and, in a broad sense, prosody, antithesis, and symmetry also affect the acceptance and choice of native speakers. How and to what degree these factors operate requires more in-depth research.

3.2 Verb Meaning

In Sect. 3.1, we found that polysyllabic verbs tend to collocate with guòlái, while monosyllabic verbs show no such tendency. Moreover, we found that lái and guòlái collocate differently with different monosyllabic verbs, for example:

(14) a. Tā měng de pū guòlái.
‘He leaped over.’
b. Nàge háizi xiàng wǒ de chēlún pūlái.
‘That kid leaped at my wheel.’

(15) a. Yī wèi xiǎojiě kào guòlái xiào wèn wǒ.
‘A lady leaned over and asked me with a smile.’
*b. Yī wèi xiǎojiě kàolái xiào wèn wǒ.
‘A lady leaned over and asked me with a smile.’

In example (14), the directional complement after the verb pū can be either lái or guòlái, unlike in example (15), where kào can only be combined with guòlái. Therefore, in addition to the number of syllables, there are more factors that affect the acceptability of the “verb + lái/guòlái” structure. Liu [10] compared the directional complements “V lái” and “V guòlái”, and pointed out that the verbs that can be used with lái and those that can be used with guòlái are not exactly the same. However, that work only lists some verbs that cannot be directly combined with lái or are rarely used with guòlái. The characteristics of these verbs are not discussed. A word is a combination of sound and meaning. Therefore, in addition to phonetic factors, the meaning of a word is also a factor that affects the acceptability of a sentence. To explore the constraints that verb meaning places on the complement lái/guòlái, we randomly selected sentences from the BCC Corpus (http://bcc.blcu.edu.cn/) that conform to the structure “V lái/guòlái”, and matched different verbs with the directional complements lái and guòlái. The result is summarized in Table 1. (To eliminate interference from the number of syllables, we take only monosyllabic verbs as examples.)

Table 1. The collocation of verbs and directional complement lái/guòlái

| | Type 1: can be used with lái and guòlái | Type 2: restrained use with lái |
| --- | --- | --- |
| Group A | zǒu 走 ‘walk’/pǎo 跑 ‘run’/fēi 飞 ‘fly’/pū 扑 ‘flutter’/piāo 飘 ‘float’/pá 爬 ‘climb’/yǒng 涌 ‘surge’ | kào 靠 ‘lean against’/wéi 围 ‘surround’/zuò 坐 ‘sit’/jǐ 挤 ‘squeeze’/tǎng 躺 ‘lie’ |
| Group B | jì 寄 ‘send’/dǎ 打 ‘beat’/qiǎng 抢 ‘grab’/zhuā 抓 ‘catch’/yùn 运 ‘transport’/lā 拉 ‘pull’/tōu 偷 ‘steal’ | kàn 看 ‘look’/wàng 望 ‘gaze’/qiáo 瞧 ‘glance’/zhuǎn 转 ‘turn’/dào 倒 ‘reverse’/niǔ 扭 ‘twist’ |

Type 1 verbs (left column) in Table 1 can be used with either directional complement, lái or guòlái. Type 2 verbs (right column) are generally used with guòlái, and their use with lái is restricted. Comparing the two types, we find that the main difference lies in whether the verbs describe physical motion that actually occurs. Within Type 1, Group A describes autonomous motion and Group B describes induced motion. Group A of the Type 2 verbs generally describes a state of stillness, and is borrowed to describe motion events only under certain circumstances. In Group B of Type 2, kàn, wàng, and qiáo describe gaze direction, and zhuǎn, dào, and niǔ describe changes in orientation. These events often do not actually occur at the physical level, but are non-motion phenomena that human cognition treats as motion.

Talmy [11, 12] observed that language contains “fictive motion” phenomena, in which motion verbs describe the spatial relationship of stationary objects, and divided this virtual motion into six types: “emanation paths”, “pattern paths”, “frame-relative”, “advent paths”, “access paths”, and “coextension/coverage paths”. The direction and orientation of the line of sight belong to the “frame-relative”. Lamarre [13] pointed out that a virtual path expressing visual direction in Chinese generally uses a prepositional structure instead of a complement, for example, bāzhe yàoshi yǎnr wǎng lǐ kuīshì 扒着钥匙眼儿往里窥视 ‘peer into the keyhole’. If we replace wǎng lǐ kuīshì 往里窥视 ‘peer into’ with kàn jìnqù 看进去 ‘look into’, the expression is not natural. Only a few complements can express visual direction, such as guòqù 过去 ‘go over’ and shàngqù 上去 ‘go up’, which are deictic verbs. However, there is no such clear distinction between lái and guòlái, which are also deictic verbs. Liu [14] pointed out the difference between lái and guòlái in describing changes in orientation, and described the semantics of lái as “representing the movement of a person or object through an action”. When describing the semantics of guòlái, Liu [14] added, on top of the original meaning of lái, the meaning of “representing the change of direction of a person or object, which means facing a foothold”. It can be seen that the cognitive similarity between physical motion and virtual motion makes it possible to describe virtual motion with motion verbs, but the expressions of the two are not totally consistent.

Combining the above summary of the collocations with the related theories and research results, we find that the effects of verb meaning on the acceptability of “verb + lái/guòlái” can be stated as follows:

1) Typical motion verbs such as zǒu/pǎo/jì/dǎ, which are mainly used to describe actual motion at the physical level, can be used with both lái and guòlái.
2) Non-motion verbs such as kào/zuò, which generally describe a static state and are used as motion verbs only under certain circumstances, and non-typical motion verbs such as kàn/wàng/zhuǎn/dào, which describe the virtual motion of gaze direction and orientation, collocate relatively freely with guòlái, but their collocation with lái is limited.

It should be noted that the collocation of “non-motion verbs” and “non-typical motion verbs” with lái is limited rather than impossible, because they can also be used with lái in some cases. First, some verbs cannot be used with lái in assertion but can be used with lái in designation, as a comparison of examples (15b) and (16) shows.

(16) Tā tuīkāi tā kàolái de tóu.
‘She pushed away his leaning head.’

Second, some verbs cannot be used with lái on their own but can be used with lái once they are combined with a prepositional phrase indicating the direction of motion, as a comparison of example (15b) with example (17) shows.

(17) Zhòng nǚzǐ xiàng tā kàolái.
‘The women leaned towards him.’
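The collocation pattern in Table 1, together with the two exceptions just described, can be summarized as a lookup. The Python sketch below is illustrative only: the verb sets are copied from Table 1, but the function name, the parameter names, and the three status labels are our own hypothetical choices. Because the text says only that some Type 2 verbs accept lái in these contexts, the label “possible” should not be read as a guarantee for every verb.

```python
# Verb lists copied from Table 1 (monosyllabic verbs only).
TYPE1 = {"zǒu", "pǎo", "fēi", "pū", "piāo", "pá", "yǒng",
         "jì", "dǎ", "qiǎng", "zhuā", "yùn", "lā", "tōu"}
TYPE2 = {"kào", "wéi", "zuò", "jǐ", "tǎng",
         "kàn", "wàng", "qiáo", "zhuǎn", "dào", "niǔ"}


def lai_status(verb, designation=False, directional_pp=False):
    """Describe how freely `verb` combines with the complement lái.

    designation:    the "verb + lái" unit is used in designation
                    rather than assertion, as in example (16).
    directional_pp: a prepositional phrase gives the direction of
                    motion, as in example (17).
    Returns a hypothetical label: "free", "possible", or "restricted".
    Collocation with guòlái is relatively free for both types.
    """
    if verb in TYPE1:
        return "free"
    if verb in TYPE2:
        if designation or directional_pp:
            return "possible"
        return "restricted"
    raise KeyError(f"{verb!r} is not listed in Table 1")


print(lai_status("pū"))                        # (14b)
print(lai_status("kào"))                       # (15b)
print(lai_status("kào", designation=True))     # (16)
print(lai_status("kào", directional_pp=True))  # (17)
```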

We find that the collocation restriction between verb meaning and the complement lái/guòlái is, in essence, a restriction that the characteristics of the motion event place on the way the path is expressed. It also reflects, in semantics and language structure, how people conceive of different types of motion events, namely whether the motion is virtual or not. Talmy [15, 16] holds that the conceptual framework of a motion event is composed of four elements: figure, motion, path, and ground. The figure and the ground refer, respectively, to the motion subject and to the entity that provides a reference for its motion in a motion (or stationary) event. Path refers to the course followed or the site occupied by the motion subject relative to the reference object. Motion refers to the movement or positioning itself. From the perspective of lexicalization patterns and typology, Talmy [15, 16] pointed out that the path elements of English and Chinese are generally contained in the satellite components of verbs (including verb affixes, prepositions, and complements that collocate with main verbs). Both languages have a set of satellite components (IN and OUT in English, qù 去 ‘go’ and guò in Chinese, etc.) that are used specifically to express the path. In Spanish, by contrast, the path elements are included in the stem of the main verb. On this basis, Talmy [15, 16] divides languages into two categories: Verb-Framed Languages and Satellite-Framed Languages. The former use verbs to represent the core schema (mainly the two elements of path and ground, especially the path element), and include Romance, Semitic, Japanese, Tamil, etc. The latter use satellite components to represent the core schema, and include Indo-European languages other than Romance, Finnish, Chinese, etc. After Talmy’s theory was put forward, many researchers published their opinions on the rationality of the typological dichotomy and on the type of each language [17].

Regarding the classification of Chinese, researchers have different opinions. Some researchers basically agree with Talmy’s view and believe that Chinese is an atypical Satellite-Framed Language [18]. Some researchers point out that Chinese expresses motion events in two ways, with directional verbs used alone and with combinations of verbs and directional complements, and therefore conclude that Chinese is a mixed-type language [19]. Other researchers believe that Chinese emphasizes results, so complements are the main component [20]; on this proposition, Chinese can also be regarded as a Verb-Framed Language. Although researchers differ in their opinions, most agree that when a verb-complement structure is used to express motion, the core schema is represented by the complement.

Based on the above theories and propositions, we think that in the verb-complement structure “verb + lái/guòlái” that expresses motion, the complements lái and guòlái are the syntactic elements that express the path of the motion event. In this section, we analyzed the effect of verb meaning on the collocation of the verb with the complement lái/guòlái. The difference in verb meaning actually represents the characteristics of different motion events (“motion at the physical level” or “virtual motion”). We have also discussed in Sect. 2 that the scopes of the conceptual meanings of lái and guòlái differ. On this basis, we conclude that the combination of “verb + lái/guòlái” under the influence of verb meaning is essentially a choice, made by the characteristics of the motion event, of the way the path is expressed. It is also a reflection of people’s cognition of the objective world in semantic and linguistic structure.


4 Summary

Both lái and guòlái mean motion from far to near. The semantics and usage of the two are very similar, but they cannot be used interchangeably in some contexts. When lái and guòlái are used as the main verbs of a motion expression, whether they can be used interchangeably depends mainly on whether the sentence expresses restraint, prohibition, or negation and on whether the motion subject is present. The main reason for the difference lies in the conceptual meaning of the two: the trajectory meaning of guò adds to guòlái a semantic component that describes the motion process. Therefore, if the process information cannot be set as default information in the context, only guòlái can be used.

In the “verb + lái/guòlái” structure, where lái and guòlái serve as directional complements, the collocation of verbs with lái and guòlái is mainly affected by the number of syllables of the verb and by the verb meaning. In terms of syllables, monosyllabic verbs can be paired with both lái and guòlái, and polysyllabic verbs are more likely to collocate with guòlái. In terms of word meaning, “typical motion verbs” can be used with both lái and guòlái, while the collocation of “non-motion verbs” and “non-typical motion verbs” with lái is restricted. However, it should be noted that in many cases the number of syllables only changes the “acceptance level” of a sentence, and does not necessarily lead to the extreme results of “grammatical” or “ungrammatical”. The “non-motion verbs” and “non-typical motion verbs” can also be used with lái in some cases. We conclude that the effect of verb meaning on the acceptability of the “verb + lái/guòlái” structure is essentially a choice, made by the characteristics of the motion event, of the way the path is expressed, and that it is also a reflection of people’s cognition of the objective world in semantics and language structure.

References

1. Lyons, J.: Deixis, space and time. Semantics 2, 636–724 (1977)
2. Tsuji, Y.: An Encyclopedic Dictionary of Cognitive Linguistics (新編認知言語学キーワード事典). Kenkyusha, Tokyo (2013). (in Japanese)
3. Chao, Y.: A Grammar of Spoken Chinese (中國話的文法), 1st edn. The Chinese University Press, Hong Kong (1980). (in Chinese)
4. Zhu, D.: Chinese language (语法讲义), 1st edn. The Commercial Press, Beijing (1982). (in Chinese)
5. Wang, Y.: Cognitive Linguistics (认知语言学), 1st edn. Shanghai Foreign Language Education Press, Shanghai (2007). (in Chinese)
6. Lakoff, G.: Women, Fire, and Dangerous Things: What Categories Reveal about the Mind, 1st edn. The University of Chicago Press, Chicago (1987)
7. Liu, M., Tsai, H., Hu, C., Chou, S.: The proto-motion event schema: integrating lexical semantics and morphological sequencing. J. Chinese Linguist. 43, 503–547 (2015)
8. Lv, S.: Modern Chinese Eight Hundred Words (现代汉语八百词), 1st edn. The Commercial Press, Beijing (1980). (in Chinese)
9. Feng, S.: Chinese Prosodic Syntax (汉语韵律句法学), 1st edn. Shanghai Educational Publishing House, Shanghai (2000). (in Chinese)
10. Liu, Y.: Analysis of Directional Complement (趋向补语通释), 1st edn. Beijing Language and Culture University Press, Beijing (1998). (in Chinese)
11. Talmy, L.: Semantics and syntax of motion. In: Kimball, J.P. (ed.) Syntax and Semantics 4, pp. 181–238. Academic Press, New York, NY (1975)
12. Talmy, L.: Toward a Cognitive Semantics, Volume I: Concept Structuring Systems, 1st edn. MIT Press, Cambridge, MA (2000)
13. Lamarre, C.: Motion expressions in Chinese (中国語の移動表現). In: Matsumoto, Y. (ed.) A Typological Study of Motion Expressions (移動表現の類型論), pp. 95–128. Kurosio Shuppan, Tokyo (2017). (in Japanese)
14. Liu, Y., Pan, W., Gu, W.: Chinese Grammar (实用现代汉语语法), 1st edn. The Commercial Press, Beijing (2004). (in Chinese)
15. Talmy, L.: Lexicalization patterns: semantic structure in lexical forms. In: Shopen, T. (ed.) Language Typology and Syntactic Description 3: Grammatical Categories and the Lexicon, pp. 36–149. Cambridge University Press, Cambridge (1985)
16. Talmy, L.: Toward a Cognitive Semantics, Volume II: Typology and Process in Concept Structuring, 1st edn. MIT Press, Cambridge, MA (2000b)
17. Li, X.: A critical review of researches on motion events typologies (移动事件类型学研究述评). Foreign Langu. Res. 4, 1–9 (2012). (in Chinese)
18. Shen, J.: A typological investigation of “dynamic complement structure” in modern Chinese (现代汉语“动补结构”的类型学考察). Chinese Teach. World 3, 17–23 (2003). (in Chinese)
19. Lamarre, C.: The linguistic encoding of motion events in Chinese (汉语空间位移事件的语言表达——兼论述趋式的几个问题). Contemp. Res. Mod. Chinese 5, 1–18 (2003). (in Chinese)
20. Tai, J.H.-Y.: Cognitive relativism: resultative construction in Chinese. Lang. Linguist. 4(2), 301–316 (2003)

Morpheme Zú “Tribe” in Mandarin Chinese

Huahung Yuan1 and Yan Li2

1 Taipei, Taiwan
2 School of Foreign Languages and Culture, Nanjing Normal University, Nanjing, China

Abstract. The paper focuses on the morphological status of zú (族) in Mandarin Chinese. Originally expressing a multiplicity of humans, the morpheme can appear at the right-hand or at the left-hand side in the word form [X1X2]. Recently, the [X1 zú] form has come to be used for neologisms, for instance, cǎoméizú ‘strawberry generation’. Following Packard [1], we analyze zú in the conventional word form as a bound root and zú in the neologism as a word-forming suffix. The two word forms have an identical internal structure in which X1 modifies zú. This shows that, to indicate a group of people with an identical property, the neologism zú copies the word-formation pattern of the conventional [X1 zú] form.

Keywords: Morpheme · Affix · Word form · Bound · Free · zú · Mandarin Chinese · Neologism

1 Introduction

This paper investigates the morpheme zú (族) ‘tribe’ in Mandarin Chinese. Zú can be situated at the right-hand or the left-hand side in the word form [X1X2], for example, zúzhǎng (族长) ‘head of a clan’ and qīnzú (亲族) ‘relatives’. Words formed with zú are often nominals. Words containing zú in the [X1 zú] form are nouns designating human beings, for example, Xiānbēi-zú (1) and yuèguāng-zú (2).

(1) 鲜卑族在唐代已经消亡。
Xiānbēi-zú zài Táng dài yǐjīng xiāowáng.
Xianbei tribe at Tang time already disappear
‘The Xianbei already disappeared during the Tang dynasty.’

(2) 月光族在现代社会很常见。
Yuè-guāng-zú zài xiàndài shèhuì hěn chángjiàn.
Month finish tribe at modern society very common
‘People who spend their entire salary by the end of each month (who live paycheck to paycheck) are common in modern society.’

H. Yuan: Independent researcher.
© Springer Nature Switzerland AG 2022. M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 353–369, 2022. https://doi.org/10.1007/978-3-031-06703-7_27


H. Yuan and Y. Li

The NP Xiānbēi-zú in (1) is the name of an ethnic group, whereas yuèguāng-zú in (2) refers to a group of people who live in the same way, i.e., who spend all their salary before the end of the month. The latter is a neologism in Mandarin Chinese that has been widely used since 2000. The neologism yuèguāng-zú resembles Xiānbēi-zú, the name of the ethnic group, since both have the form [X1 zú] and indicate a group of people. Scholars of Chinese tend to assume that zú in the word Xiānbēi-zú is morphologically different from zú in neologisms such as yuèguāng-zú. This motivates us to analyze the morpheme, which has been studied less than other morphemes. In this paper, we discuss the morphological status of the morpheme zú in the word form [X1 zú] and its word formation. In what follows, we present the main arguments of some previous studies in the literature on Chinese, and the research questions.

1.1 Zú is a Nominal Suffix

Recent research on the word formation of neologisms, such as [2–7] among others, highlights the morpheme zú because of its productivity in forming new words. It is believed that the word formation of zú in (2) is a calque of X-zú, which originated in Japanese [8]. This morpheme is treated as a nominal suffix [8, 9] because it can attach to a predicate (3)a, a noun (3)b, an adjective (3)c, or an onomatopoeia (3)d and form a nominal unit [6].

(3) a. dǎ-gōng zú
hit work tribe
‘A group of people who do part-time jobs’
b. dānshēn zú
single tribe
‘A group of people who are single’
c. shíshàng zú
fashion tribe
‘A group of people who are fashionable’
d. NONO zú
NONO tribe
‘A group of people who adore ecology and delicacy’

Zú is analyzed as a semi-affix (lèi-cízhuì 类词缀) in [6, 10–12] among others, without any tests to justify its suffix status. The definition of semi-affixes differs from one author to another. For Zhang [11], semi-affixes are morphological items that are not completely grammaticalized, while for Song [6], semi-affixes have content or abstract meanings. Song [6] points out that zú is not yet a complete suffix, since it is undergoing grammaticalization (虚化) from its root form, a content word as in hàn-zú ‘the Han people’, to a suffix (see also [10]). Nevertheless, in the abovementioned works, the morpheme zú in hàn-zú is not attested as a root form, so they cannot prove that zú in the neologism yuèguāng-zú is morphologically different. In our view, it is necessary to demonstrate that zú in hàn-zú ‘the Han people’ is a content word or a root, and not a suffix, before assuming that zú in the neologism yuèguāng-zú is undergoing grammaticalization.

1.2 Research Questions

From the above, it is necessary to determine the morphological status of zú using clear definitions of the morpheme types in Mandarin Chinese, and to analyze the word structure [X1 zú]. Thus, in this paper, we aim to answer two questions:

(i) What is the morpheme zú? Is it a word, a root, or an affix?
(ii) What does zú do internally in a word?

In our analysis, word forms with zú, such as Xiānbēi-zú and yuèguāng-zú, are examined with the criteria suggested in Packard [1] for determining the morpheme type, i.e., word, root, or affix. The internal structure of these word forms is examined, and the regularity of word formation with zú is shown as well.

2 Four Morpheme Types (Packard [1])

In this section, we present the four morpheme types, together with the Headedness Principle discussed in Packard [1] and his morphological analysis based on X-bar principles.

2.1 Morpheme Classifications

In [1], a word in Chinese is defined as a syntactic word, i.e., a unit that can occupy a syntactic slot in a sentence. Morphemes can be identified within a word. Packard [1] states: “the identity of morphemes within words is largely word-driven – the form class identity of the word generally determines the form class identities of its constituents.” “A morpheme has MN form class identities with identity MX determined by word-internal context”, which means “in the case of a complex noun [X1X2]N, the N identity of the word ‘confers’ an identity upon X2, yielding [X1N]N. In [X1X2]N, X2 is equal to N, while X1 is left relatively ‘free to vary’, subject to constraints imposed by the requirements of its position within the word [1].” In a compound noun such as jīnglǐ 经理 manage-manage ‘manager’, both jīng and lǐ are verbs; according to the word-internal rule, the N identity is conferred on the verb lǐ 理 ‘to manage’ because it occupies the right-hand position in [X1X2]N. Based on the free-bound and content-function criteria, Packard [1] distinguishes four morpheme types, i.e., function words, root words, bound roots, and affixes, as Table 1 illustrates.

Table 1. Four morpheme types (slightly modified) (Packard [1])

Morpheme type    Free or bound    Content or function    Examples
Function word    Free             Function               Mod. de 的, sentence aspect le 了 and conj. hé 和
(Root) word      Free             Content                bīng 冰 ‘ice’
Bound root       Bound            Content                -fáng 房 ‘house’
Affix            Bound            Function               Verbal aspect markers etc.
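The two binary criteria behind Table 1 can be read as a small decision procedure. The following Python sketch is only an illustration of that reading; the names `MorphemeType` and `classify_morpheme` are hypothetical and do not come from Packard [1].

```python
from enum import Enum


class MorphemeType(Enum):
    # The four morpheme types of Table 1 (labels chosen for this sketch).
    FUNCTION_WORD = "function word"
    ROOT_WORD = "(root) word"
    BOUND_ROOT = "bound root"
    AFFIX = "affix"


def classify_morpheme(is_free: bool, is_content: bool) -> MorphemeType:
    """Map the free/bound and content/function criteria to a morpheme type."""
    if is_free:
        # Free morphemes are words: content -> root word, function -> function word.
        return MorphemeType.ROOT_WORD if is_content else MorphemeType.FUNCTION_WORD
    # Bound morphemes: content -> bound root, function -> affix.
    return MorphemeType.BOUND_ROOT if is_content else MorphemeType.AFFIX


# Examples taken from Table 1.
print(classify_morpheme(is_free=True, is_content=True).value)    # bīng 冰 'ice'
print(classify_morpheme(is_free=False, is_content=True).value)   # -fáng 房 'house'
print(classify_morpheme(is_free=True, is_content=False).value)   # de 的
```

The sketch treats each criterion as a yes/no property, which is a simplification: deciding whether a given morpheme is free or bound still requires the distributional tests discussed in the text.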


H. Yuan and Y. Li

Packard [1] suggests dividing the class of affixes into two subclasses, word-forming affixes and grammatical affixes, since they can be distinguished by clear criteria. Word-forming affixes “may change the form class of items to which they attach”, “apply selectively, to only certain members of a category”, “have a meaning across contexts that is relatively variable and unpredictable” and “may attach to free words or bound roots” [1]. Grammatical affixes “are completely general, that is, apply to all members, or at least large subclasses, of a given class”, “have a constant, predictable meaning across contexts”, “must attach to free words” and “never change the form class of the words to which they attach” [1]. Table 2 summarizes the distinguishing characteristics of the two types of affixes.

Table 2. Two types of affixes (Packard [1])

Type                    Change of class    Application to members of a category    Having a meaning    Attachment to
Word-forming affixes    Possible           Certain members                         Yes                 Free words, bound roots
Grammatical affixes     Never              All members                             Yes                 Free words

Examples of word-forming affixes: ‘nominalizing’ suffixes zǐ 子, tóu 头, xìng 性, dù 度; negative prefixes wú 无, wèi 未, fēi 非; agentive marker zhě 者.
Examples of grammatical affixes: verbal aspect markers le 了, zhe 着, guò 过; nominal suffix men 们.

The distinction between the four types of morphemes in Mandarin is relevant to the present study on zú.

2.2 Headedness Principle and X-bar Morphology

Packard [1] observes that “noun words have nominal constituents on the right and verb words have verbal constituents on the left”. This generalization is termed the Headedness Principle:

(4) Headedness Principle: (disyllabic) noun words have nominal constituents on the right and verb words have verbal constituents on the left [1]

An unambiguous noun such as zhǐ (纸) ‘paper’ can be the left- or right-hand member of a word, as in zhǐbǎn (纸板) ‘hardboard’, zhǐbì (纸币) ‘paper currency’, zhǐjiāng (纸浆) ‘paper pulp’, bàozhǐ (报纸) ‘newspaper’, cǎozhǐ (草纸) ‘straw paper’ and jiǎnzhǐ (剪纸) ‘paper cutting’. Words formed with the noun zhǐ keep its nominal form class identity, whichever position it occupies.

Packard [1] applies his morphological analysis based on X-bar principles to Mandarin Chinese. The four morphological primitives in his X-bar morphological system (X−0, X−1, XW and G) correspond to four morpheme types: root word (X−0), bound root (X−1), word-forming affix (XW) and grammatical affix (G). The possible word forms are captured by the following two rules.

(5) Rule 1: $X^{-0} \rightarrow X^{-0,-1,\{W\}},\ X^{-0,-1,\{W\}}$
    Rule 2: $X^{-0} \rightarrow X^{-0},\ G$

Rule 1 means that “a word may be composed of any combination of bound and free morphemes” and “XW may only attach to a root word or a bound root and may not attach to another XW” [1]. Rule 2 states that “a word may also be composed by attaching a grammatical affix to the right of a bona fide word.” A word form made up of two words, N−0N−0, is captured by the rule X−0 → X−0, X−0. A word composed of two bound roots, N−1N−1, is captured by the rule X−0 → X−1, X−1. The rules generate different types of X-bar structures for word forms in general. Word forms can be represented by simple, single-level binary branching structures or by embedding, i.e., multi-level binary branching structures. A single branching structure fits all the possible instantiations of the rules in (5). A word form consisting of two words, such as bīngshān (冰山) ‘ice mountain’, written N−0N−0, is captured by the single branching structure in (6)a. A word consisting of a free word and a bound root (N−0N−1), such as yóumín (游民) ‘vagrant’, has the structure in (6)b.

(6) a. [Tree diagram: bīngshān, N−0 → N−0 N−0]
    b. [Tree diagram: yóumín, N−0 → N−0 N−1]
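As a rough illustration of how the rules in (5) constrain two-constituent word forms, the Python sketch below checks whether an ordered pair of primitives is licensed. The string labels and function names are assumptions made for this sketch, and it covers only single binary combinations, not embedded structures.

```python
# Labels for the four morphological primitives (naming is an assumption of this sketch).
WORD = "X-0"        # root word
BOUND_ROOT = "X-1"  # bound root
WF_AFFIX = "XW"     # word-forming affix
GRAM_AFFIX = "G"    # grammatical affix


def licensed_by_rule1(left: str, right: str) -> bool:
    """Rule 1: any combination of X-0, X-1 and XW, except XW attached to XW."""
    allowed = {WORD, BOUND_ROOT, WF_AFFIX}
    if left not in allowed or right not in allowed:
        return False
    return not (left == WF_AFFIX and right == WF_AFFIX)


def licensed_by_rule2(left: str, right: str) -> bool:
    """Rule 2: a grammatical affix attaches to the right of a word."""
    return left == WORD and right == GRAM_AFFIX


def is_possible_word(left: str, right: str) -> bool:
    """A binary word form is possible if either rule licenses it."""
    return licensed_by_rule1(left, right) or licensed_by_rule2(left, right)


print(is_possible_word(WORD, WORD))              # bīngshān, N-0 N-0
print(is_possible_word(WORD, BOUND_ROOT))        # yóumín, N-0 N-1
print(is_possible_word(BOUND_ROOT, GRAM_AFFIX))  # G needs a free word to its left
```

Under this reading, the first two calls return `True` and the third returns `False`, matching the statement that a grammatical affix must attach to a bona fide word.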

Multiple-branching word structures are expanded from the single binary structure of the X−0 node in (6)a. They can be right-branching or left-branching. The word form tiěfànwǎn (铁饭碗) ‘secure job’ consists of N−0N−0N−0 and has a right-branching structure (7)a. The word form zìyuànzhě (自愿者) ‘volunteer’, whose head is a word-forming nominal suffix, has the left-branching structure in (7)b.

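(7) a. [Tree diagram: tiěfànwǎn, right-branching N−0 N−0 N−0]
    b. [Tree diagram: zìyuànzhě, left-branching, headed by a word-forming nominal suffix]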

This X-bar-based formalism will help clarify the structure of word forms containing zú in our analysis below.

3 Word Formation with Morpheme Zú

In this section, we list words containing zú and apply the content-function and bound-free criteria to identify which morpheme type zú belongs to: word, root, or affix.

3.1 Word Forms with Zú

We first need to establish the basic meaning of the morpheme zú. In Shuo wen jie zi, the first dictionary explaining characters, dating from about two thousand years ago, Xu [13] notes that the character zú means an arrow¹. In Notes on Shuo wen jie zi, published in the early nineteenth century, Duan [14] explains that zú describes the gathering of arrows, which marks a multiplicity of persons². This morpheme is therefore associated with a nominal plurality over humans. For the present research, word forms with zú were collected from the online Xinhua dictionary [15] and the Concise Chinese Reverse Order Dictionary [16], while commonly used neologisms with zú were taken from previous research [5, 6, 8]. Following Packard [1], we observe that the morpheme zú can be a constituent of a word form [X1X2]N, identified as a noun. Zú can occur on the left-hand side of the internal structure, as [zú X2]N (Table 3), or on the right-hand side, as [X1 zú]N (Table 4). Since the N identity is conferred by the element on the right-hand side of [X1X2]N, it is zú that confers the N identity in [X1 zú]N, whereas in [zú X2]N it is X2.

Table 3. Words containing zú ‘tribe’ in the form [zú X2]N

Word             English gloss
族产 zúchǎn      Properties of a clan
族规 zúguī       Regulations of a clan
族类 zúlèi       Of the same clan or race
族谱 zúpǔ        Genealogy
族亲 zúqīn       Relatives
族权 zúquán      Authority of a clan/a tribe
族人 zúrén       Clansman
族田 zútián      Fields owned by a clan/family
族长 zúzhǎng     Head of a tribe/clan
族尊 zúzūn       Seniors in a clan/tribe

Table 4. Words containing zú ‘tribe’ in the form [X1 zú]N

Type           Word                     English gloss
Adj-zú         大族 dàzú                Big tribe
Adj-zú         贵族 guìzú               Aristocracy
Adj-zú         遗族 yízú                Descendants (of the clan of the deceased)
Adj-zú         亲族 qīnzú               Relatives
Adj-zú         外族 wàizú               Other nationalities
Adj-zú         异族 yìzú                Outlander
Adj-zú         望族 wàngzú              Respected and influential clan
Common N-zú    世族 shìzú               An aristocratic family politically influential for generations
Common N-zú    民族 mínzú               Nationality
Common N-zú    氏族 shìzú               A large group of families that are related to each other
Common N-zú    种族 zhǒngzú             Ethnic group, race
Common N-zú    宗族 zōngzú              Lineage
Common N-zú    王族 wángzú              Imperial kinsmen
Common N-zú    家族 jiāzú               Family, clan
Proper N-zú    汉族 Hànzú               The Han Chinese
Proper N-zú    回族 Huízú               The Hui people
Proper N-zú    维吾尔族 Wéiwúěrzú       The Uyghurs
Proper N-zú    藏族 Zàngzú              The Tibetan people
Proper N-zú    哈萨克族 Hāsàkèzú        The Kazakhs
V/N-zú         灭族 mièzú³              Exterminate an entire tribe

¹ “族, 矢锋也。束之族族也。” [13].
² “所標衆。衆矢之所集。” [14].
³ Mièzú (灭族) ‘extermination of a tribe’ can be a verb (i) or a noun (ii). Since the word form can be turned into a passive form (i), it is a verb.
(i) 鲜卑在唐代被灭族了。
    Xiānbēi zài Táng dài bèi mièzú le.
    Xianbei at Tang dynasty passive exterminate FP
    ‘The Xianbei were exterminated during the Tang dynasty.’
(ii) 鲜卑的灭族原因
    Xiānbēi de mièzú yuányīn
    Xianbei de exterminate cause
    ‘The cause of the extermination of the Xianbei’

A series of neologisms of the form [X1 zú]N is listed in Table 5. To distinguish the two usages, the conventional zú in Table 3 and Table 4 is labelled zú1, and the zú of the neologisms in Table 5 is labelled zú2.

Table 5. Words containing zú ‘tribe’ in the form [X1 zú]N

Component           Word                            English gloss
VP                  打工族 dǎgōng-zú                Workers who do part-time jobs
VP                  追星族 zhuīxīng-zú              Groupies/fans
VP                  低头族 dītóu-zú                 Phubbers or smombies (smartphone zombies)
VP                  奔奔族 bēnbēn-zú                Young people who love to have fun but work hard
VP                  哈韩/日族 hāhán/rì-zú           People who are crazy about Korean/Japanese pop culture
VP                  上班族 shàngbān-zú              Office workers/salarymen
NP                  飞车族 fēichē-zú                Street racers
NP                  单身族 dānshēn-zú               People who are single
NP                  草莓族 cǎoméi-zú                Strawberry generation
NP                  单车族 dānchē-zú                Cyclists
NV                  北漂族 běipiāo-zú               Beijing drifters
NV                  网购族 wǎnggòu-zú               Online shoppers
AdvV                慢活族 mànhuó-zú                People who live a down-shifting lifestyle
AdvV                月光族 yuèguāng-zú              People who are living paycheck-to-paycheck
AdjP                新酷族 xīnkù-zú                 Teenagers who are dressed in trendy clothes and behave in an untraditional way
AdjP                穷忙族 qióngmáng-zú             The working poor
AdjP                国际自由族 guójì zìyóu-zú       People who are at liberty to choose places to work or have holidays
Onomatope/Letter    SOHO族 SOHO-zú                  People who work at home
Onomatope/Letter    BOBO族 BOBO-zú                  The socio-economic bourgeois-bohemian group
Onomatope/Letter    尼特族 Nítè-zú                  The NEETs: people who are unemployed and not receiving an education or vocational training

Before discussing the morpheme type that zú belongs to, it is necessary to identify the word forms in which zú can appear. The forms [X1 zú1]N, [zú1 X2]N and [X1 zú2]N are words with a clear boundary, since they can fill a syntactic slot in a sentence, as illustrated in (1)-(2), where they serve as the subject of the VP, which shows that they are nominals. Moreover, in such word forms the components X1 and X2 cannot be detached from each other: the relator de (的) cannot be inserted between them, as (8) illustrates.

(8) a. *mín-de-zú
    b. *zú-de-zhǎng
    c. *cǎoméi-de-zú

Zú thus occurs in forms that function as words. Its morpheme type can be identified with the content-function and bound-free criteria discussed in Packard [1]. First, we need to determine whether zú is bound or free. A free morpheme is a root word; a bound morpheme is either a bound root or an affix. Zú cannot be omitted from the word forms [X1 zú1]N, [zú1 X2]N and [X1 zú2]N, as (9)-(10) show, and it cannot be used alone as a word form, as in (10). This shows that zú is bound rather than free, and hence is not a word.

(9) [藏*(族)]/#[草莓*(族)]是一个[*(民)族]。
    [Zàng*(zú)] / #[cǎoméi*(zú)] shì yī-gè [*(mín)zú].
    Tibet-tribe / strawberry-tribe be one-Cl ethnic-group
    For Zàngzú: ‘The Tibetan people is an ethnic group.’
    For cǎoméi-zú, intended: ‘The strawberry generation is an ethnic group.’

(10) 王五的[*(亲)/*(种)族] /[族*(人/长)]会来帮他。
    Wáng Wǔ de [*(qīn)/*(zhǒng)zú] / [zú*(rén/zhǎng)] huì lái bāng tā.
    W.W de relative/race-tribe clan-man/head will come help he
    ‘The relatives/race of Wang Wu / the clansmen/the head of the clan of Wang Wu will come to help him.’

Second, zú has a meaning associated with a plurality of humans. Hànzú in (11) refers to the tribe tied to the Han lineage, which is collective, and mànhuó-zú indicates a group of people who live a down-shifting lifestyle, which is collective as well. Both Hànzú and mànhuó-zú can also indicate the identity of a person.

(11) 花木兰不是汉族/慢活族。
    Huāmùlán bù shì hànzú/mànhuó-zú.
    Huamulan neg be Han-tribe/down-shifting-tribe
    ‘Hua Mulan is not a Han / is not a person living a down-shifting lifestyle.’

Also, if zú is removed from the word forms in (9), what remains cannot express the same meaning as Zàngzú, cǎoméi-zú and mínzú, since the morphemes Zàng ‘Tibet’, cǎoméi ‘strawberry’ and mín ‘man’ have their own meanings: the name of a place, a strawberry and a person, respectively. Zú is thus a morpheme with content, and it bears a meaning across all the word forms in which it appears.

So far, we have shown that zú1 and zú2 are bound and have content. However, do zú1 and zú2 function in the same way? In the next section, we discuss how zú relates to the component X1 in the word forms [X1 zú1]N and [X1 zú2]N.

3.2 Relationship Between Components and Zú in Word Forms

Packard [1] observes that in ‘modifier-modified’ [N1 N2]N forms, “N1 always specifies a property or characteristic of N2” “since these words are defined as having a modifier–modified structure”. N1 and N2 stand in some kind of relationship [1]. N1 can be the habitat of N2, as in shuǐniǎo (水鸟) ‘aquatic bird’, bìhǔ (壁虎) ‘gecko’ and sōngshǔ (松鼠) ‘squirrel’. N1 can be a metaphorical description of N2, as in yínháng (银行) ‘bank’ and huǒchē (火车) ‘train’. N1 can depict the form of N2, as in shātáng (砂糖) ‘granulated sugar’ and bīngkuài (冰块) ‘ice cube’. The ‘modifier-modified’ principle also holds in [X1 zú1]N forms. The word forms listed in Table 4 show two patterns: X1 is either an adjective or a noun, and it modifies the morpheme zú1. In word forms such as dàzú ‘big tribe’ and guìzú ‘aristocracy’, the adjective dà ‘big’ modifies zú to indicate a tribe comprising numerous families, and guì modifies zú to mark the identity of a group of people having the quality of aristocracy. The adjective specifies zú1, and the resulting word form is lexicalized, since the relator de cannot be inserted between the two morphemes (12).

(12) a. *dà de zú
        big de tribe
     b. *guì de zú
        aristocratic de tribe

In the other type of word form, noun-zú1, a noun (common or proper) modifies the morpheme zú1. When X is a common noun, as in mínzú ‘people’, zhǒngzú ‘race’ and zōngzú ‘lineage’, X is a bound root, not a free word. The morphemes mín, zhǒng and zōng can be tested in a determiner phrase (DP), as (13)a illustrates: they cannot serve as a nominal constituent. In the word forms wángzú ‘imperial kinsmen’ and jiāzú ‘family’, wáng ‘king’ and jiā ‘home’ are identified as free words, since they can occupy an NP slot (13)b. These two word forms consist of a free word followed by a bound root.

(13) a. *Yī-gè mín/zhǒng/zōng
        one-Cl person/kind/lineage
     b. Yī-gè wáng/jiā
        one-Cl king/home
        ‘A king / a home/family’

These nominal morphemes, mín, zhǒng, zōng, wáng and jiā, modify the bound root zú1 and define the kind of group of people that the words [X1 zú1] refer to. These words are lexicalized, since the two components N and zú1 cannot be separated within the word form, as the impossibility of inserting the relator in (8)a shows. In [X1 zú1]N forms where X1 is a proper noun, such as Hànzú, Huízú, Wéiwúěrzú, Zàngzú and Hāsàkèzú, [X1 zú1] is the name of a group of people. When X1 is a place name, i.e., Hàn ‘Han’ or Zàng ‘Tibet’, X1 is the name or the habitat of the group of people. Likewise, Wéiwúěr ‘Uyghur’, Huí and Hāsàkè ‘Kazakh’, which are names of races, combine with zú1 to name a specific group of people. The name of the race specifies the morpheme zú1 without changing the syntactic category of the word form, which is a noun.

For [X1 zú2]N forms, which are neologisms, X1 can be a VP, an NP, an NV, an AdvV, an AdjP, an onomatope or English letters, as Table 5 indicates. As Cai [3], Zhao [17] and Zhang [18] indicate, X1 in [X1 zú2]N forms undergoes metaphorization to describe a certain property of a person. When it co-occurs with zú2, a VP with a V-N structure describes an action, such as dǎgōng (打工) ‘do a part-time job’, zhuīxīng (追星) ‘chase stars’, dītóu (低头) ‘lower one’s head’ or hāhán (哈韩) ‘be crazy about Korean pop culture’. The VP is then used metaphorically for the persons who perform that action. When X1 is an NP, such as cǎoméi (草莓) ‘strawberry’, dānshēn (单身) ‘being single’ or dānchē (单车) ‘bicycle’, it indicates a feature denoted by the NP. In the word form cǎoméi-zú ‘strawberry generation’, cǎoméi describes a person who is as fragile as a strawberry, which is easily bruised. This is a metaphorical use of ‘strawberry’.

When X1 is an NV structure, for example běipiāo (北漂) ‘drift in Beijing / in the North’ or wǎnggòu (网购) ‘shop online’, N indicates the location where the action takes place, and the NV in the zú2 word form refers to an action that a group of persons performs. When X1 is an AdvV structure, like mànhuó (慢活) ‘slowly-live, down shift’ or yuèguāng (月光) ‘monthly-finish’, it denotes a lifestyle, a person’s way of living. When X1 is an adjective phrase, such as xīnkù (新酷) ‘new-cool’ or qióngmáng (穷忙) ‘poor-busy’, it designates a person’s state, which is trendy and cool, or busy but poor. When X1 is an onomatope in English letters, it originates from English neologisms such as SOHO, BOBO and NEET. SOHO is an acronym for ‘Small Office Home Office’ and refers to a person who works at home, teleworks, or works from anywhere. This term is integrated into Mandarin in its original form, without phonetic borrowing or translation into Chinese characters. Nítè (尼特) is a phonetic rendering of NEET, an acronym for “Not in Education, Employment or Training”, which refers to a property of a person. These neologisms from English also indicate a property that a person has, and they are used in the zú2 word form as a metaphor.

When [X1 zú2]N forms are compared with [X1 zú1]N forms, X1 can be seen to specify zú2 by describing a certain characteristic of persons, just as a proper-noun X1 modifies zú1. X1 in [X1 zú2]N forms can be regarded as a label metaphorized from a VP, an NP, an NV, an AdvV or letters. X1 is therefore a metaphorical description of zú2 in [X1 zú2]N forms. The morpheme zú2 enables an X1 of a syntactic category such as verb, adjective or noun (including the acronyms BOBO and SOHO) to change into the category of nouns. This differs from the names of races denoted by [X1 zú1]N forms, whose syntactic category does not change.
Thus, although both are bound and have content, zú1 and zú2 differ in whether they can change the class of the item to which they attach into the category of nouns. We can now determine the morpheme types of zú1 and zú2: zú1 is a bound root, whereas zú2 is a word-forming affix, more specifically a word-forming suffix. The criteria for determining the morpheme types of zú1 and zú2 are summarized in Table 6. From the above observations, the component X1 and zú1 stand in a modifier-modified relationship in [X1 zú1]N forms, and so do X1 and zú2 in [X1 zú2]N forms. In this respect, the neologism [X1 zú2]N does not differ from the conventional word form [X1 zú1]N.

Table 6. Morpheme types for zú1 and zú2

       Change of class          Application to members of a category    Having a meaning    Attachment to              Morpheme type
zú1    No (N → N)               Adj, Noun                               Yes                 Free words, bound roots    Bound root
zú2    Yes (Category X → N)     VP, NP, NV, AdjP, AdvV, Onomatope       Yes                 Free words                 Word-forming suffix
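The reasoning summarized in Table 6 can be sketched as a short decision function in Python. This is a simplified reading offered for illustration only: the function name `zu_morpheme_type` and its three boolean parameters are hypothetical, and the sketch reduces the criteria to the ones the argument actually turns on (boundness, content, and change of class).

```python
def zu_morpheme_type(is_bound: bool, has_content: bool, changes_class_to_noun: bool) -> str:
    """Decide a morpheme type from three of the criteria used in the text.

    Assumption of this sketch: a bound, contentful morpheme that turns the
    item it attaches to into a noun is treated as a word-forming suffix,
    and one that does not is treated as a bound root.
    """
    if not is_bound:
        return "word"
    if not has_content:
        return "affix (function)"
    return "word-forming suffix" if changes_class_to_noun else "bound root"


# zú1 (e.g. Zàngzú): bound, contentful, no change of class (N -> N).
print(zu_morpheme_type(True, True, False))
# zú2 (e.g. dǎgōng-zú): bound, contentful, VP -> N.
print(zu_morpheme_type(True, True, True))
```

The two calls print `bound root` and `word-forming suffix`, which mirrors the last column of Table 6.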

3.3 Internal Structure of Word Formation Zú

In this section, following Packard’s [1] X-bar morphological proposal, we analyze the internal structure of the word forms [X1 zú1]N and [X1 zú2]N. By the Headedness Principle (4), since [X1 zú1]N and [X1 zú2]N are nouns, zú1 and zú2 are the heads of the word forms in which they appear. As demonstrated above, zú1 is a bound root, written N−1 in the X-bar morphological analysis, and zú2 is a word-forming suffix, written NW. The rule for [X1 zú1]N and [X1 zú2]N is therefore:

(14) Rule for [X1 zú1]N and [X1 zú2]N: $N^{-0} \rightarrow N^{-0,-1},\ N^{-1,\{W\}}$

[X1 zú1]N forms fit the model in (6)b. The word form jiāzú ‘family’ can be analyzed as N−0-zú1, since jiā is a free word, written N−0. This form is modeled in X-bar morphology in (15)a. Mínzú ‘nationality’ follows the pattern in (15)b. As explained in the previous section, mín cannot be used alone because it is bound.

(15) a. [Tree diagram: jiāzú, N−0 → N−0 N−1]
     b. [Tree diagram: mínzú, N−0 → N−1 N−1]
     c. [Tree diagram: Zàngzú, N−0 → N−0 N−1]

For the name of a race, such as Zàngzú ‘the Tibetan people’, the proper noun Zàng is a word, written N−0, and the internal structure of the word is displayed in (15)c. [X1 zú1]N forms are thus analyzed as a single binary combination of N−0 or N−1 with the N−1 zú1, as (15) illustrates.

In our opinion, the neologism [X1 zú2]N can be viewed as an extension of the lexical usage of zú1, since this word form imitates the pattern of [X1 zú1]N and indicates a group of people. [Adj zú1]/[N zú1] serves as the word formation model for the neologism zú2. In the word form [X1 zú2]N, once X1 denotes an action or a characteristic expressed by an NP, VP, AdjP, NV, AdvV or any other possible structure, X1 is metaphorized into a naming label, functioning like an adjective or a noun that specifies zú2. The application of the word formation model [X1 zú2]N is illustrated in Fig. 1. Zú1 thereby extends its usage to form the neologism zú2. The rule for the formation of [X1 zú2]N needs to be modified, as (16) shows.

(16) Rule for [X1 zú2]N: $N^{-0} \rightarrow N^{-0}/\mathrm{VP}/\mathrm{NP}/\mathrm{AdjP}/\mathrm{AdvV},\ N^{W}$

Fig. 1. Model for neologism [X1 zú2]N word formation

[X1 zú2]N forms are structured with multiple branches, because X1 can be a VP, NV, AdjP, AdvV or onomatope consisting of more than one morpheme. Since the head is zú2, it is always situated on the right-hand side of the word form in which it appears. The modifying component X1 is placed to the left of zú2. The structure of [X1 zú2]N is a left-branching binary combination of a word form X1 and an NW node zú2. Consider an [X1 zú2]N form such as cǎoméi-zú ‘strawberry generation’. X1 cǎoméi ‘strawberry’ is a noun consisting of one word, cǎo ‘herb’, and one bound root, méi ‘berry’. Méi ‘berry’ cannot be used alone as a word form, since it always combines with other morphemes, as in méiguǒ (莓果) ‘berry’, hóngméi (红莓) ‘cranberry’ and lánméi (蓝莓) ‘blueberry’, whereas cǎo can occupy a syntactic slot, as in sān-gēn cǎo (三根草). Cǎoméi-zú has the internal structure illustrated in (17)a. The modifying component on the left, which has a binary-branching N−0 node, specifies the head NW zú2. This structural model applies to other [X1 zú2]N forms, i.e., zhuīxīng-zú ‘fans’ (17)b, yuèguāng-zú ‘people who are living paycheck-to-paycheck’ (17)c, wǎnggòu-zú ‘online shoppers’ (18)a, xīnkù-zú ‘teens dressed in trendy clothes, with untraditional behavior’ (18)b and Soho-zú ‘people who work at home’ (18)c, in which X1 has different constituent identities: VP, NV, AdvV, AdjP and onomatope.
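The left-branching structure described for cǎoméi-zú can be written out as labelled bracketing. The Python sketch below builds the tree from the description in the text (cǎo as N−0, méi as N−1, zú2 as NW) and prints it; the `Node` class and its `bracket` method are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node in a binary word-structure tree (illustrative helper)."""
    label: str
    form: str = ""
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def bracket(self) -> str:
        """Return the labelled bracketing of the subtree rooted at this node."""
        if self.left is None and self.right is None:
            return f"[{self.label} {self.form}]"
        return f"[{self.label} {self.left.bracket()} {self.right.bracket()}]"


# X1 cǎoméi: a free word (N-0) plus a bound root (N-1), forming a noun (N-0).
caomei = Node("N-0", left=Node("N-0", "cǎo"), right=Node("N-1", "méi"))
# The whole word: X1 on the left, the head NW zú2 on the right.
caomei_zu = Node("N-0", left=caomei, right=Node("NW", "zú"))

print(caomei_zu.bracket())
# [N-0 [N-0 [N-0 cǎo] [N-1 méi]] [NW zú]]
```

The head sits on the right at the top level, and the branching happens inside the left-hand modifier, which is what makes the structure left-branching.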

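(17) a. [Tree diagram: cǎoméi-zú]
     b. [Tree diagram: zhuīxīng-zú]
     c. [Tree diagram: yuèguāng-zú]

(18) a. [Tree diagram: wǎnggòu-zú]
     b. [Tree diagram: xīnkù-zú]
     c. [Tree diagram: Soho-zú]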

The internal structure of [X1 zú2]N forms is complex. It displays a hierarchy between X1 and the word-forming suffix zú2: X1 has its own binary-branching node, while zú2 has a single branch. As shown earlier, a [X1 zú1]N form has a single binary-branching node and a simple relationship between X1 and the bound root zú1. [X1 zú2]N forms can nevertheless be regarded as similar to [X1 zú1]N forms. X1 in [X1 zú2]N forms can be labelled N, since it undergoes metaphorization and is tasked with naming the group denoted by zú2. The X-bar word formation tree can thus be modified: an X1 that is a VP, AdjP, NP, NV, AdvV and so on changes its word class identity to N, as (19)a illustrates. The AdjP xīnkù changes its class to N−1, since this term has to be bound in the word form xīnkù-zú. The resulting word structure of xīnkù-zú is shown in (19)b.

(19) a. [Tree diagram: X1 (VP, AdjP, NP, NV, AdvV, ...) relabelled as N in [X1 zú2]N]
     b. [Tree diagram: xīnkù-zú, with xīnkù relabelled as N−1]

In [X1 zú2]N forms, X1 is often a bound morpheme that cannot be detached from zú2, as the contrast between (20) and (21) shows. [X1 zú2]N forms can refer either to a group of people who perform the same action or share the same characteristics, or to an individual who belongs to the group described by the word form.

(20) 小美是个新酷族/月光族/草莓族/网购族。
    Xiǎoměi shì gè xīnkù-zú/yuèguāng-zú/cǎoméi-zú/wǎnggòu-zú.
    XM be Cl trendy-tribe/month-finish-tribe/strawberry-tribe/online-shop-tribe
    ‘Xiaomei is a trendy, untraditional person / a person living paycheck to paycheck / a person of the strawberry generation / an online shopper.’

(21) 小美是个*新酷/*月光/*草莓/*网购。
    Xiǎoměi shì gè *xīnkù/*yuèguāng/*cǎoméi/*wǎnggòu.
    XM be Cl trendy/month-finish/strawberry/online-shop

This means that the neologism zú2 forms a word on the same pattern as [X1 zú1]N. The component X1 in [X1 zú1]N forms cannot be separated from the word. X1 can be a free word or a bound root, but once it attaches to zú1, the resulting word form is lexicalized, as (22)-(23) show.

(22) 小朱是个哈萨克族/汉族/王族/贵族。
    Xiǎozhū shì gè hāsàkèzú/hànzú/wángzú/guìzú.
    X.Z be Cl Kazakh-tribe/Han-tribe/royal-tribe/aristocratic-tribe
    ‘Xiao Zhu is a Kazakh / a Han / an imperial kinsman / an aristocrat.’

(23) 小朱是个*哈萨克/*汉/*王/*贵。
    Xiǎozhū shì gè *hāsàkè/*hàn/*wáng/*guì.
    X.Z be Cl Kazakh/Han/king/aristocratic

Notice that in these word forms X1, even when it is a free word, cannot by itself refer to the entity that the whole form denotes. Cǎoméi ‘strawberry’ (21) and wáng ‘king’ (23) cannot indicate an individual member of a group, i.e., of cǎoméi-zú ‘strawberry generation’ and wángzú ‘imperial kinsmen’. Cǎoméi in cǎoméi-zú has been turned into a metaphorical property of a person, while wáng in wángzú has been lexicalized as a modifier indicating a certain class of people. As demonstrated previously, the morpheme zú in [X1 zú1]N and [X1 zú2]N forms always functions as a head and can be modified by the component X1 (see (15) and (19)).
From the morphological point of view, zú1 and zú2 are alike in that both have to combine with another morpheme, the modifying component, in order to operate as a word form. In sum, we have shown the internal structure of both types of zú word forms according to Packard’s [1] X-bar morphological analysis. Both [X1 zú1]N and [X1 zú2]N forms have the same modifier-modified relationship between the components X1 and zú.

3.4 Discussion

As shown earlier, zú1 and zú2 differ in whether they can change the class of the item to which they attach into the noun category: zú1 is a bound root, whereas zú2 is a word-forming affix, more specifically a word-forming suffix.

Moreover, owing to the meaning of zú, multiplicity, both [X1 zú1]N and [X1 zú2]N denote a group of humans, as (1)-(2) show. This indicates that the neologism zú2 follows the morphological pattern of [X1 zú1]N to create new terms describing a certain behavior of a group of people. It is not appropriate to treat zú2 as a case of grammaticalization of the bound root zú1, since zú2, like zú1, gathers humans with an identical property into a group. The neologism [X1 zú2] cannot refer to a true ethnic group but only to a group of people (9) or an individual (11), while the conventional form [X1 zú1] indicates a race or a class of persons. If the sentence in (9) were acceptable with cǎoméi-zú ‘strawberry generation’, cǎoméi-zú would be the real name of an ethnic group. This shows that zú2, like zú1, is used to indicate a group of people. Thus, in our view, [X1 zú1] and [X1 zú2] form words in the same way, i.e., with X modifying zú. The neologism [X1 zú2] is modeled on the word formation pattern [X1 zú1]. It is not appropriate to define the neologism zú2 as resulting from the grammaticalization of the bound root zú1; rather, the morpheme zú extends its way of forming words. The novelty of the neologism zú2 is that it adopts components of other syntactic categories, such as VP, NP, Adj and so forth, as its modifying element, instead of only an adjective or a noun on the left-hand side as in [X1 zú1]. In our opinion, the nominal plurality denoted by the two zú deserves to be studied in depth.

4 Conclusion

We have shown that zú is a bound morpheme whose meaning, multiplicity, holds across all its word forms. In the conventional word forms with zú, [X1 zú]N and [zú X2]N, zú is a nominal bound root, which cannot be used as a free word, while in the neologism [X1 zú]N, zú is a word-forming suffix, likewise bound and with content. The former does not change the class of the items to which it attaches, while the latter does. [X1 zú] forms have an internal structure in which X modifies zú. In the neologism, the component X1 can be a VP, an NP, an AdjP, an AdvV, an NV or an onomatope, which has to be metaphorized to refer to a property or an action of a person, and X1 is labelled as a noun that names the group of people indicated by zú. This explains why the neologism zú is an extended usage of the conventional [X1 zú]N forms.

References

1. Packard, J.L.: The Morphology of Chinese: A Linguistic and Cognitive Approach. Cambridge University Press, Cambridge (2000)
2. Zhao, Y.-M.: Lùn biǎo rén cíyǔmó “X zú” (A study on the person-denoting word formation model X-zu). Zhèjiāng lǐ gōng dàxué xuébào (shèhuì kēxué bǎn) (J. Zhejiang Sci. Tech. Univ. (Soc. Sci.)) 40(4), 353–360 (2018)
3. Cai, Y., Ding, S., Chen, P., Hu, Q.-X., Zeng, Y.: Xīnxìng cízú “X zú” de xíngshì yǔ yǔyì yánjiū (Research on form and semantics of Neologism X-zu). Hànzì yǔ lìshǐ wénhuà (Chinese Char. History Cult.) 2018(7), 24–25 (2018)
4. Liu, C.-Q., Gong, S.: Cíyǔ “zú” de gòuzào lǐjù jí guīfàn wèntí fēnxī – jīyú “X zú” “X nú” de duìbǐ fēnxī (A Contrastive Analysis of the Motivation for Word Formation and X-zu/X-nu Type of Words and Their Standardization). Yuyan wenzi yingyong (Appl. Linguist.) 2, 42–48 (2010)
5. Ren, F.-Q.: “X zú” cí yǔyán xiànxiàng fēnxī (Analysis on X zu word). Yuwen xue kan 2010(03), 42–43 (2010)
6. Song, B.-B.: “X zú” cíyǔ zhōng “zú” de xūhuà qíngkuàng kǎochá – jiān tán “X zú” cíyǔ yǎnshēng de dòngyīn (The Study on the Virtual Situation of “Zu” in the Word “X Zu” Concurrently Discussing Derivative Motivation of the Word “X Zu”). Chuánqí chuánjì xuǎn kān (J. Legends) 2010(04), 63–64 (2010)
7. Wang, X.-J.: “Zú” jí “zú” lèi cí qiǎnxī (“Family” and “Family” Category). Chuánqí chuánjì xuǎn kān (J. Legends) 2010(07), 36–37 (2010)
8. Su, X.-H.: Dāngdài Hànyǔ cíyǔmó yánjiū (Study on module of words in modern Chinese). Zhejiang Daxue chuban she (Zhejiang University Press), Hangzhou (2010)
9. Xie, X.-Y.: “Zú” lèicí xīn cíyǔcíqún tànwēi (Research on neologism zu). Yuwen xue kan 2009(20), 140–141 (2009)
10. Zhou, M.-L., Huang, W.-Q.: Jìn wǔ-nián xīn cíyǔ zhōng biǎo rénqún de lèi hòuzhuì fāzhǎn: yǐ 2006–2010 nián xīn cíyǔ wéi lì (On recent development of Semi-Affixes on human: case of neologisms from 2006 to 2010). Ningxia daxue xuebao (Renwen shehui kexue ban) (J. Ningxia Univ. (Human. Soc. Sci. Edn.)) 35(2), 88–92 (2013)
11. Zhang, H.: “Zú” de fāzhǎn yǔ “zú” lèi cíyǔ tànjiū (On development of zu and analysis on word formation of zu). Mudanjiang shifan xueyuan xuebao (Zhe she ban) (J. Mudanjiang Normal College (Philos. Soc. Sci. Edn.)) 2016(2), 99–103 (2013)
12. Zhao, G.: “zú” de yǔyánxué fēnxī (Linguistic analysis on “zu”). Yunnan shifan daxue xuebao (Duiwai Hanyu jiaoxue yu yanjiuban) (J. Yunnan Normal Univ. (Chinese as Second Language and Research edition)) 5(6), 86–89 (2007)
13. Xu, S. (Hàn) (auth), Xu, X. et al. (Song) (eds): Shuō Wén Jiě Zì. Vol. 8, Yǎn Bù (Radical). Zhōngguó zhéxué shū diànzǐhuà jìhuà xiànshàng túshūguǎn (Chinese Text Project online open-access digital library). https://ctext.org/shuo-wen-jie-zi/yan-bu2/zh#n30677. Accessed 13 Jan 2021. [(漢) 許慎撰 (宋) 徐鉉等奉敕校定《說文解字》卷八㫃部, 中國哲學書電子化計劃線上圖書館]
14. Duan, Y.-C. (Qing): Shuō Wén Jiě Zì Zhù (Notes on Shuō Wén Jiě Zì) online consultation. Vol. 7. Yǎn Bù (Radical). http://www.shuowen.org/view/4283. Accessed 13 Jan 2021. [(清) 段玉裁《說文解字注》《說文解字》線上檢索卷七㫃部]
15. Wei, J.-G. (ed): Xin Hua zidian zaixian (online Xin Hua Dictionary). https://www.cidianwang.com/zd/zu/zu24870.htm. Accessed 13 Jan 2021
16. Chen, C., He, G.-W., Xu, Y.-M. (eds): Jiǎnmíng hànyǔ nìxù cídiǎn (Concise Reverse Order Chinese Dictionary). Zhishi chubanshe (Zhishi Publisher), Shanghai (1986)
17. Zhao, Y.-M.: Lùn biǎo rén cí yǔ mó “X zú” (A study on the person-denoting word formation model X-zu). Zhejiang ligong daxue xuebao (Shehui kexue ban) (J. Zhejiang Sci. Tech. Univ. (Soc. Sci.)) 40(4), 353–360 (2018)
18. Zhang, J.-Z.: Qiǎn xī “X zú” cíyǔ zhōng de “X” (Analysis on X in word form X zu). Xiandai yuwen (Mod. Chinese) 2010(6), 143–145 (2010)

Interpretation of Complex Event and the Semantics Structure of General Verbs

Jie Fan

School of Literature and Journalism and Communication, Xihua University, Chengdu, China
[email protected]

Abstract. The semantic analysis of general verbs has always been difficult because their meaning is vague. Taking gao (搞) as an example, this paper analyzes the semantics and functions of general verbs. The main findings are as follows. First, the analysis of semantic structure shows that the meaning of gao (搞) has evolved along two paths, “process salience” and “result salience”, which yield two kinds of meaning, one with activity as its core and the other with result as its core. The latter further develops a causative meaning through the removal of the process component. Second, a common feature of general verbs is that they report and interpret complex events from a distant perspective, which is also the source of their vague meaning.

Keywords: General verbs · gao (搞) · Semantic structure · Event complexity · Scanned as a whole

1 Introduction

General verbs are a class of Chinese verbs with vague meaning, such as gao (搞), gan (干) and zuo (做). Because of their vague meaning and flexible usage, general verbs pose a difficult problem for the study of Chinese lexical semantics, and they are also of interest in lexicography, the teaching of Chinese as a second language and natural language processing. The earliest discussion of general verbs can be traced back to Ouyang Xiu’s Guitian Lu. In modern times the phenomenon has been discussed by earlier scholars such as [1] to [4]. In particular, many recent studies have investigated gao (搞), zheng (整) and zuo (做) in terms of characterization and classification, lexical comparison, semantic types, semantic prosody, subjectivity, rhetoric and function, such as [5] to [11].

However, compared with other subcategories of verbs, much work remains to be done on general verbs. For example, the general verbs gao (搞), gan (干) and zuo (做) are similar in meaning and usage and can often be substituted for one another, as in example (1a), while in other cases they cannot be freely substituted, as in examples (1b), (1c) and (1d). Wang [5] notes many similarities and differences in the collocations of gao (搞), gan (干) and zuo (做) and proposes some restrictions on substitution. Most of the cases discussed there concern status, such as occupation and position, but the semantic ranges of these three verbs are in fact much wider, and the comparison of their distribution and usage should be extended further.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 370–380, 2022. https://doi.org/10.1007/978-3-031-06703-7_28


Furthermore, dictionary definitions have not yet distinguished the differences in meaning between these verbs and often define them in terms of one another, as in the entries for some general verbs in the Modern Chinese Dictionary (7th edition) [15].

【gao(搞)】① to do (zuo, 做); to do (gan, 干); to do (congshi, 从事): * production (shengchan, 生产) | work (gongzuo, 工作) | construct (jianshe, 建设). ② try to obtain something (shefa huode, 设法获得……).
【zuo(做)】③ to do a job or activity: * job (gong, 工) | thing (shi, 事) | business (maimai, 买卖).
【nong(弄)】② to do (zuo, 做); to do (gan, 干); to do (gao, 搞): * meal (fan, 饭).
【gan(干)】① to do (zuo, 做): really (shi, 实) * | head down (maitou, 埋头) * work (huo, 活).

At the same time, there are many usages that are difficult to understand on the basis of these listed meanings. For example, gao (搞) in example (2) is neither “to do” in meaning ① nor “try to obtain something” in meaning ②.

(2) a. gao(搞) + ducai (dictatorship)/zhuanzhi (authoritarianism)/geren chongbai (cult of the individual)/xingshizhuyi (formalism)/Baochan-daohu (fixing of farm output to households)/xiao-quanzi (coterie)/paixi (factions).
b. gao(搞) + yuedui (orchestra)/jijinhui (foundations)/youjidui (guerrillas)/chanyeyuan (industrial park)/zhuanwa-chang (brick and tile factory)/huoshituan (canteen)/sangji-yutang (mulberry fish pond).

In short, many questions about general verbs remain to be answered. From a macro perspective, what are the differences and connections between general and non-general verbs? Is there lexical inheritance or analogy between the two types? From a meso perspective, how are typical general verbs such as gao (搞), gan (干), zuo (做), nong (弄) and zheng (整) distributed, and how is the work divided among them? What are the essential differences between them? From a micro perspective, the semantic features, syntactic expression, functional usage and evolutionary path of each general verb have not yet been clearly revealed. Solving this series of problems depends on careful observation and description of the semantics of each verb, and on a search for theoretical interpretations that can also address the meso and macro issues.

In this paper, we attempt to describe the semantic features and structure of the typical general verb gao (搞) by analyzing its co-occurring components and syntactic behavior. All examples in this paper are taken from the CCL corpus of Peking University and the BCC corpus of Beijing Language and Culture University, with the exception of a few constructed examples and examples cited from previous studies.

2 The Semantic Framework of gao (搞)

2.1 Semantic Types of Objects and Distribution of Meaning Items

There are three types of co-occurring components of gao (搞): adverbials, objects and complements. It is the object that reflects the semantic differences of gao (搞) most clearly. In terms of common collocations, the objects of gao (搞) are subject to some restrictions, and gao (搞) has different meanings depending on the semantic category of its object. Therefore, the distribution of gao (搞) needs to be analyzed in detail in terms of the semantic categories of the object. In this paper, first, both dictionary meaning items of gao (搞) are included. Second, since the semantic categories of noun objects also cover verbal objects, the two kinds of objects are listed together. An analysis of the 1339 “gao (搞) + object” items in the BCC corpus reveals roughly seven semantic categories of object.

① Substances: huacao (flowers and plants), shouji (mobile phone), diannao (computer), xianlu (wires), mofang (magic cube), etc.
② Activities:
a. jianshe (construct), juhui (gather), pohuai (sabotage), zhapian (defraud), touzi (invest), paimai (sell at auction), duanlian (exercise), diaocha (investigate), gaige (reform), hezuo (cooperate), etc.
b. bisai (contest), wuhui (dancing party), wanhui (evening party), shalong (salon), huodong (activity), yecan (picnic), jisi (sacrifice), yishi (ritual), etc.
③ Professions:
a. yishu (art), zhengzhi (politics), wenxue (literature), jingji (economy), shengwu (biology), lilun (theory), yongyong, etc.
b. geju (opera), baozhi (newspaper), caigou (purchases), xiaoshou (sale), jiaoxue (educate), yunshu (transport), dongman (animation), etc.
④ Ideology: xingshi-zhuyi (formalism), zongpai-zhuyi (sectarianism), gerenchongbai (cult of the individual), zerenzhi (responsibility system), chengbao (contract), shichang jingji (market economy), “DuZun-RuShu” (Confucianism Monopoly), etc.
⑤ Organization: jijinhui (foundations), youjidui (guerrillas), zazhishe (periodical office), shetuan (mass organizations), yuedui (orchestra), shitang (canteen), chanyeyuan (industrial park), etc.
⑥ Product: hangmu (aircraft carrier), dafeiji (large aircraft), kang’aiyao (anticancer drug), taikongzhan (space station), youxi (game), jiyinjishu (gene technology), etc.
⑦ Gain: qian (money), mi (rice), liangshi (grain), cailiao (material), ziliao (product), wuzi (goods and materials), ming’e (quota), etc.

Note that in category ② “Activities” the objects in group a are verbs. It is easy to see that the abstractness of the objects increases from ① to ⑦, with a corresponding difference in the meaning of gao (搞). Depending on the relationship between gao (搞) and its co-occurring components, the meaning of gao (搞) is divided into the following seven items.

① to handle, to fiddle with: * huacao (flowers and plants)/shouji (mobile phone);
② to carry out activity: * jianshe (construct)/juhui (gather)/pohuai (sabotage);
③ to engage in a profession: * yishu (art)/zhengzhi (politics)/wenxue (literature);
④ to implement a system: * xingshi-zhuyi (formalism)/shichang jingji (market economy);
⑤ to set up an organization: * jijinhui (foundations)/shitang (canteen);
⑥ to research and produce: * hangmu (aircraft carrier)/youxi (game);
⑦ to acquire: * qian (money)/liangshi (grain)/cailiao (material).

2.2 Semantic Level of gao (搞)

From Act to Activity. Ding [11] holds that gao (搞) corresponds to jiao (搅) in the Guangyun of the Song Dynasty, where it is glossed as “stir”, and that jiao (搅) was still in use in the middle of the Qing Dynasty. To this day, gao (搞) and jiao (搅) are synonymous in Cantonese. This is probably the earlier meaning of gao (搞). The meaning of gao (搞) then gradually changed to indicate hand movements other than “stir”. Of the seven senses listed above, item ① “handle, fiddle with” is the closest to this earlier usage. “Handle, fiddle with” is a relatively specific hand movement. Of all the actions that a person can perform, hand movements are clearly the most important and most prominent in cognition. Hand movements are also often the core of a variety of complex activities, so the meaning is further extended to “to perform a certain act or activity”, as shown in the examples below.

(3) a. gao(搞) + da-saochu/jiawu/xiuli/caijian/xiujian, etc.
gao(搞) + thorough cleanup/housework/mending/cutting/constructing, etc.
b. gao(搞) + xuexi/yanjiang/jiedai/juhui/pohuai, etc.
gao(搞) + study/speech/reception/party/destruction, etc.

The act in (3a), while still primarily a “hand action”, differs from the “handle, fiddle with” sense of gao(搞). It is no longer one or several concrete hand movements, but rather a composite event with a hand movement as its core. Which specific actions will be involved is highly variable and unpredictable. The activities in (3b) are even less predictable than those in (3a), and it is harder to find a suitable verb to replace gao(搞), which means that the meaning of gao(搞) is more abstract.

Process Salience and Outcome Salience. A complex event often includes both a “process” and an “outcome”. By highlighting different parts of the event denoted by gao (搞) in the sense of “carry out an activity”, two different directions of meaning develop, which will be referred to as “process salience” and “outcome salience”, as shown in the examples below.

(4) a. gao(搞) + xuanchuan/xiaoshou/sheji/caigou/baowei, etc.
gao(搞) + publicity/sale/design/procure/security, etc.
b. gao(搞) + chengbao/zhuanzhi/“DuZun-RuShu”/xingshi-zhuyi, etc.
gao(搞) + contract/authoritarianism/Confucianism Monopoly/formalism, etc.

In example (4a), “gao(搞) + object” can be understood as “to carry out some kind of activity” (meaning item ②), and when the event occurs repeatedly and steadily, it comes to be associated with “engage in a profession” (meaning item ③). When an activity has a specific formal requirement or value orientation, it solidifies into a system or concept, which is the sense denoted by meaning item ④, as shown in example (4b). Conversely, any system or concept is implemented through activity. The linkage of meaning from item ② to items ③ and ④ results from the prominence of the process. Focusing instead on the outcome of an activity gives rise to the meanings related to “salience of the outcome”. Meaning items ⑤, ⑥ and ⑦ are all outcome-oriented meanings, as shown in example (5).

(5) Tamen yige shengchandui gao(搞) le wu qi heibanbao. (To make)
They one production team gao(搞) le five times Blackboard newspaper.
Their production team made five blackboard newspapers.

Rang ta qu gao(搞) chaye, ta gao(搞) laile ban daizi shuye. (To acquire)
Let him go to gao(搞) tea, he gao(搞) come half bag leaves.
He was told to get tea and he got half a bag of leaves.

Na ge dizhu taoyan yinyue, keshi weile bai paitou hai gao(搞)guo yige yuedui. (To set up)
That landowner hate music, but for put on airs also made a band.
The landowner hated music, but he also set up a band to show off.

“Blackboard newspaper”, “tea”, “leaves” and “band” are all “results” of gao (搞), which here is more concerned with the expected outcome than with the process and manner of the action. In the three cases above, the objects do not exist before the event denoted by gao(搞), and the verb sorts them into three different types of result.


De-activation and Causativity. While the verb gao (搞) with an object still retains some traces of “action”, gao (搞) with a complement has completely lost its activity meaning. There are two forms of gao (搞) with a complement. The first is the unmarked complement, where a verb or adjective indicating change is added directly after gao (搞). The second is the complement with the marker de (得), where the complement can be a phrase, an idiom or a small clause. In either case, the degree of activity of gao (搞) is very low. This is shown in the following examples.

(6) a. Ta bei yizhen qiche sheng gao(搞) xing.
He by a burst car noise make awake.
He was awakened by a burst of car noise.

Ke bie ba zhe liang zhong yao gao(搞) hun le.
Not put these two kinds medicine make confusion.
Don’t confuse these two medicines.

Wo shi xiang cong lilun shang gao(搞) mingbai zhe liang jian shi.
I am want to in theory make clear these two things.
I was trying to figure out these two things theoretically.

b. Ta hui ba Lali de shenghuo gao(搞)-de yitahutu.
She will make Lali’s life gao(搞) mess.
She will make a mess of Lali’s life.

Wushui ba dishang gao(搞)-de zang-xixi de.
Sewage make floor gao(搞) dirty.
The sewage made the floor dirty.

Zhe jian shi gao(搞)-de qiaozhi hen maodun.
This matter make George very contradictory.
This matter makes George very contradictory.

In (6a), it is not any action of the “car noise” that causes “he woke up”; nor, in (6b), is it any specific action by a person that causes “the mess in her life”. In example (6), the phrase “gao(搞) + complement” can in every case be interpreted as “lead to a certain result”. In none of the examples above does gao(搞) indicate any action, behavior, activity or process. That is, there is no definite action, behavior or process that leads to the result indicated by the complement. The semantic core of “gao(搞) + complement” is the complement, and the meaning of gao(搞) is “to make, to cause”.

Combining the object and complement cases, the meaning of gao(搞) can be summarized in the following eight items.

① to handle, to fiddle with: * huacao (flowers and plants)/shouji (mobile phone);
② to carry out activity: * jianshe (construct)/juhui (gather)/pohuai (sabotage);
③ to engage in a profession: * yishu (art)/zhengzhi (politics)/wenxue (literature);
④ to implement a system: * xingshi-zhuyi (formalism)/shichang jingji (market economy);
⑤ to set up an organization: * jijinhui (foundations)/shitang (canteen);
⑥ to research and produce: * hangmu (aircraft carrier)/youxi (game);
⑦ to acquire: * qian (money)/liangshi (grain)/cailiao (material);
⑧ to cause, to make a result: * cuo (wrong)/cheng (successful)/qingchu (clear).

2.3 Summary of Section 2

The meaning items of gao(搞) are semantically connected as follows. Starting from “hand movement”, the meaning of “to carry out an activity” arises, on the basis of which the two paths of “process salience” and “result salience” develop. Along the result-salient path, the causative meaning of gao (搞) eventually develops, in which the meaning of action disappears. This process is accompanied by a continuous generalisation and weakening of the meaning of gao (搞). This can be represented in Fig. 1.

[Figure 1 labels recovered from the original: Initial meaning, Basic meaning, Extended meaning; movement, activity, result]

Fig. 1. The semantic framework of “gao (搞)”

3 Object Complexity and Eventual Whole Scanning

The semantic interpretation of gao (搞) is closely related to the complexity of the event involved, and the meaning of gao (搞) varies accordingly, especially with differences in the complexity of the object. This can be illustrated in terms of both the internal characteristics of the event and the relationship between events. The former reflects the complexity of the event in qualitative terms, while the latter reflects the sequence of events in quantitative terms.

3.1 Complex Events

From a qualitative point of view, gao(搞) seldom co-occurs with an object that denotes a single event. Of the eight meaning items in Sect. 2.3, only item ① “to handle, to fiddle with”, which retains the meaning of “hand action”, can denote a single event; the other items are unlikely to denote a simple action or behavior. In the other seven senses, “gao(搞) + object” denotes events that are internally constituted as complex activities. Expressions such as “gao (搞) + jianshe (construct)”, “gao (搞) + xingshizhuyi (formalism)” and “gao (搞) + kongjian zhan (space station)” all present macroscopic, extremely large-scale pictures of complex events, which contain numerous complex activities carried out by numerous participants. Even the much smaller-scale examples “gao(搞) + wuhui (dancing party)” and “gao(搞) + shalong (salon)” are complex events made up of different sub-events. The corpus also contains examples such as (7).

(7) Dawei gongsi zhengzai gao(搞) zhaobiao.
David company ongoing gao(搞) invite tenders.
David’s company is inviting tenders.

Riben-ren zai Gongding-qiao gao(搞) le hen duo ci “qinshan huodong”.
Japanese at Gongding Bridge gao(搞) many times “goodwill activities”.
The Japanese have carried out many “goodwill activities” at Gongding Bridge.

Tamen zai nar gao(搞) le yijia meiguo dianying zhipianchang.
They at there gao(搞) one American film studio.
They built an American film studio there.

Zhe jia yindu gongsi zhuan gao(搞) kang’aiyao.
This Indian company Specially produce anti-cancer drugs.
This Indian company specialises in anti-cancer drugs.

You xie ren man naozi doushi zenme gao(搞) qian.
Some people full mind all how to get money.
Some people are full of ideas on how to get money.

The verb gao(搞) in the above examples does not denote a simple act, but rather summarizes the entire process of the complex event that the “gao(搞) + object” construction represents. “Gao(搞) + object” includes several sub-events and no longer indicates a single action. The objects of gao(搞) are basically event nouns with a complex internal structure. Verbs with a simple internal structure that can only express a simple action cannot be objects of gao(搞), such as eat, drink, sleep, lie, walk, sit, run, jump, write, read, sing, cry, look, smell, etc. Only words that represent complex events can be the object of gao(搞), as shown below.

Sabotage, speculate, swindle, fraud, gambling, ingenuity, rectification, corruption, incubation, agrarian reform, investment, exercise, tender, propaganda, conciliation, procurement, transport, auction, blast, printing, assembly, adventure, trial, study, experiment, trial, production, concealment, infiltration, research, creation, investigation, corruption, detection, crime, gathering, infighting, demonstration, strike, union, association, apprenticeship, internship, assassination, assassination, oppression, exploitation, reception, supervision, riot, etc.

These words are often interpreted as a whole because of their complex internal composition, which makes it difficult to break them down into more specific verbal components. This means that the more complex the event, the easier it is to encode it as a word with noun features, or even as a dedicated event noun, as cross-linguistic comparisons in [12] similarly show. Han [13] divides event nouns into “complex event nouns”, which represent social phenomena, and “involuntary event nouns”, which represent natural phenomena. It is the former category that can become the object of gao (搞). Overall, “gao (搞) + object” expresses a holistic reading of the complex event framework.

3.2 Plural Events

The “gao(搞) + object” construction requires its object not only to be a complex event in qualitative terms but also to be a plural event in quantitative terms. This is mainly reflected in the development of the meaning from “to carry out an activity” to “to engage in a profession”, which reflects a difference in the number of events and hence in the span of time involved; these are two sides of the same coin. The verb-object case illustrates this point. In the sense “to carry out an activity”, gao(搞) with a verbal object seems to make little semantic contribution. For example, “to gao(搞) + sabotage” is equivalent to “to sabotage”, “to gao(搞) + speculate” to “to speculate”, and “to gao(搞) + defraud” to “to defraud”. But a closer comparison reveals differences between the two expressions. “Gao(搞) + verb” tends to be understood as a plural event more readily than the bare verb is. In general, “gao(搞) + verb” tends to express plural events or sequences of events that recur many times and are extended in time. If “gao(搞) + verb” is to express a single event, it has to be accompanied by a singular quantity marker indicating a single occurrence, such as “yi hui (once)”, “yi ci (once)”, “yi ge (one)” and so on. Bare verbs serving as the core of the predicate, by contrast, are single-event in character. The same phenomenon can be seen in real corpus data, as in example (8).

(8) a. Wo zai bang youtai ren gao(搞) paimai. / ? Wo zai bang youtai ren paimai.
I help Jews do sell at auction.
I’m helping these Jews with their auctions.

b. Ta changqi gao(搞) caigou. / ? Ta changqi caigou.
He long time do procure.
He has been involved in procurement for a long time.

c. Tamen zai xiangxia gao(搞) diaocha. / ? Tamen zai xiangxia diaocha.
They at countryside do research.
They are doing research in the countryside.

d. Wo bang youtai ren paimai wenwu/shebei/changfang.
I help Jews auction off cultural relic/equipment/workshop.
I help these Jews auction off cultural relic/equipment/workshop.

In example (8a), “gao(搞) + paimai (sell at auction)” tends to be understood as a plural event, and the sentence seems incomplete if gao(搞) is removed. When the complex events expressed by “gao(搞) + verb” solidify, they develop further into sequences of events. A sequence of events becomes an occupation when it extends far enough in time, in other words when someone is engaged in a stable activity over a long period. Regardless of whether the verb is followed by an object, verbs without gao (搞) still tend to be understood preferentially as single events. The object of “paimai (sell at auction)” may not be limited to one item, or it may not be convenient to specify all the items, so a general verb is used as the core of the predicate and “paimai (sell at auction)” is relegated to the object of gao(搞). This allows the narrative to move from the specific act or item to the activity itself and avoids the tedium of going into details. In short, “gao (搞) + verb” is unmarked for plural events and marked for single events, while the verb without gao(搞) is unmarked for single events and marked for plural events.

3.3 Summary of Section 3

With the exception of the initial meaning, which expresses a “hand movement”, there is a clear tendency towards complex events and plural events in the meaning of “gao (搞) + object”. Complex events that are difficult to describe in detail are scanned as a whole and relegated to the object of gao(搞), which makes the narrative more concise. The eight meaning items of gao(搞) were labelled according to the complexity of the event, and the following complexity distribution was obtained (Table 1).

Table 1. Complexity distribution of gao(搞)

| Meaning item | Example | Event complexity | Event quantity |
|---|---|---|---|
| ① to fiddle with | * diannao (computer)/xianlu (wires) | [−Complex events] | [−Plural events] |
| ② to carry out activity | * jianshe (construct)/pohuai (sabotage) | [+Complex events] | [+Plural events] |
| ③ to engage in a profession | * yishu (art)/zhengzhi (politics) | [+Complex events] | [+Plural events] |
| ④ to implement a system | * xingshi-zhuyi (formalism) | [+Complex events] | [+Plural events] |
| ⑤ to set up an organization | * jijinhui (foundations)/shitang (canteen) | [+Complex events] | [+Plural events] |
| ⑥ to research and produce | * hangmu (aircraft carrier)/youxi (game) | [+Complex events] | [+Plural events] |
| ⑦ to acquire | * qian (money)/liangshi (grain) | [+Complex events] | [+Plural events] |
| ⑧ to make a result | * cuo (wrong)/qingchu (clear) | [+Complex events] | [+Plural events] |
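
As a reading aid, the feature assignments in Table 1 can be restated as a small lookup structure. The following Python sketch is only an illustration under stated assumptions: the names `Sense`, `GAO_SENSES` and `senses_with` are hypothetical and do not come from the paper, and the feature values are simply copied from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """One meaning item of gao, with the two binary features of Table 1."""
    number: int           # meaning item number (1 to 8)
    gloss: str            # English gloss as given in Table 1
    complex_event: bool   # [+/-Complex events]
    plural_event: bool    # [+/-Plural events]


# Hypothetical encoding of Table 1; the values are copied from the table.
GAO_SENSES = [
    Sense(1, "to fiddle with", False, False),
    Sense(2, "to carry out activity", True, True),
    Sense(3, "to engage in a profession", True, True),
    Sense(4, "to implement a system", True, True),
    Sense(5, "to set up an organization", True, True),
    Sense(6, "to research and produce", True, True),
    Sense(7, "to acquire", True, True),
    Sense(8, "to make a result", True, True),
]


def senses_with(complex_event: bool, plural_event: bool) -> list[int]:
    """Return the numbers of the meaning items with the given feature values."""
    return [
        s.number
        for s in GAO_SENSES
        if s.complex_event == complex_event and s.plural_event == plural_event
    ]


if __name__ == "__main__":
    # Only the initial "hand movement" sense is [-Complex, -Plural].
    print(senses_with(False, False))
    # All other senses are [+Complex, +Plural].
    print(senses_with(True, True))
```

Querying the structure for the combination [−Complex events], [−Plural events] returns only item ①, which mirrors the observation that the initial “hand movement” sense is the sole exception to the tendency described above.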

4 Conclusion

This paper constructs the semantic framework of gao (搞), based on the usage of general verbs, by analyzing the semantic categories of its objects and the relations between its meanings. The semantic evolution of gao (搞) follows two paths, “process salience” and “outcome salience”, which develop two types of meaning, activity-centred and result-centred respectively. The latter, through further removal of the process component, develops the sense of “to cause”, which is expressed in the verb-complement structure, where gao (搞) loses all sense of action and activity. The expression of complex events is the basic semantic feature of gao (搞), and the expression of plural events is its preferred semantic feature, which explains the tendency of the semantics of gao (搞) to generalise. “Gao (搞) + object” is either a complex event in itself or a sequence of events extending over time and space, while gao (搞) itself provides a holistic understanding and report of a complex event from a distant perspective. Linguistically speaking, it is this generalisation and simplification of complex events that allows many details that are too complicated, difficult or inconvenient to state to be subsumed under a single general verb. The rhetorical function of the verb derives from this, and it is used to gloss over meanings that are sensitive or indecent.


References

1. Li, W.: Some problems in exegesis. Stud. Chinese Lang. 1, 1–17 (1962)
2. Chao, Y.: A Grammar of Spoken Chinese. The Commercial Press, Beijing (1979)
3. Gong, Q., Hu, Z.: A brief discussion on the verb “gao (搞).” Stud. Chinese Lang. 1, 48–53 (1979). (in Chinese)
4. Zhu, D.: Dummy verbs and nominal verbs in modern written Chinese. J. Peking Univ. 5, 2–6 (1985). (in Chinese)
5. Jian, W.: A comparative analysis of the collocation range and meaning of “gan(干)”, “gao(搞)”, “zuo(做).” Overseas Chinese Educ. 2, 48–53 (2002). (in Chinese)
6. Xu, S.: Discussion of the meanings of the verb Gao(搞). J. Shanghai Normal Univ. (Philos. Soc. Sci.) 4, 108–111 (2003). (in Chinese)
7. Diao, Y.: On Delexical Verb. Doctoral dissertation, Nankai University, Tianjin (2004). (in Chinese)
8. Chen, C.: Functions of the V-O phrase of the pro-verb lai (来) and aspects of the relevant quantifiers. J. Henan Univ. (Soc. Sci.) 1, 139–146 (2011). (in Chinese)
9. Yao, Y.: On the semantic prosody and syntactic function of “Gao(搞).” Lang. Teach. Linguist. Stud. 2, 61–68 (2011). (in Chinese)
10. Zhang, Z.: Rhetorical properties of Chinese syntactic semantics from the use of the pronoun “lai(来).” Contemp. Rhetoric 4, 1–7 (2014). (in Chinese)
11. Ding, S.: Comparison Manual of Ancient and Modern Pronunciation. Zhonghua Book Company, Beijing (1981). (in Chinese)
12. Lu, B.: The semantic characteristics of event nouns in Chinese and English. Contemp. Linguist. 1, 1–11 (2012). (in Chinese)
13. Han, L.: The definition and system construction of event nouns in Mandarin Chinese. J. East China Normal Univ. (Humanities and Social Sciences) 5, 161–175, 196 (2016). (in Chinese)
14. Yang, L.: A pragmatic case study of the verb 搞 in contemporary Chinese. Appl. Linguist. 2, 59–66 (2002). (in Chinese)
15. Dictionary Editorial Office, Institute of Linguistics, Chinese Academy of Social Sciences: Modern Chinese Dictionary, 7th edn. The Commercial Press, Beijing (2016)

Perfectivity via Locative Non-coincidence: Pre-verbal TAU in the Xiaolongmen Dialect

Xia Liu¹ and Vincent Jixin Wang²

¹ School of Foreign Languages, Xiangtan University, Xiangtan, China
² School of European Languages, Zhejiang International Studies University, Hangzhou, China
[email protected]

Abstract. It is observed that the pre-verbal morpheme TAU in the Xiaolongmen dialect indicates that the subject moved to a specific location to perform an action at some time before the utterance time. Furthermore, the semantics of TAU conveys two essential implications: (1) the location of the subject must be non-coincident with the location of the event; (2) by the utterance time, the subject has already left this location, and the action involved has likewise been completed or terminated. We argue that semantically TAU serves as a locative predicate which achieves event anchoring by utilizing the non-coincidence relationship between the event location and the location of the event initiator. Its past tense interpretation and the meaning of perfectivity thus result from the locative non-coincidence.

Keywords: Pre-verbal TAU · Perfectivity · Locative non-coincidence · Intentionality

1 Introduction

This paper attempts to provide a proper semantics for the special pre-verbal morpheme TAU (tau1213)¹ in the Xiaolongmen dialect.² Xiaolongmen is a Xiang dialect spoken in Chenxi county in Western Hunan. It belongs to the Chenxu dialect group of Xiang, a Sinitic language. Pre-verbal TAU in Xiaolongmen is rather unusual in the Xiang language family. In Xiang dialects, related markers with the initial consonant /t/, such as /ta, təw, tɔ, te, tau, tao/ (usually represented by the characters 哒, 咑, 倒 and 到), are used to express the completion of actions, the durativity of actions or the resulting state after an action (see [1], as well as [2]). However, these /t/-initial markers usually appear post-verbally. Therefore, the pre-verbal TAU marker in Xiaolongmen is rather unique. Furthermore, a pre-verbal tense/aspect marker signaling a past tense interpretation is uncommon in Chinese dialects; pre-verbal morphemes are generally used to convey imperfectivity, such as zai (goli) and (zheng)zai in the Changsha dialect of Xiang, or even in Mandarin dialects (e.g. zai in Mandarin). However, as shown in (1) below, TAU appearing before the verb (tau1213) gives rise to a past tense interpretation; sentences with tau1213 are thus incompatible with any temporal adverbs referring to a non-past time.

¹ 213 is the tone value of the morpheme TAU, which corresponds to the classical tone category qu.
² Note that there exists another TAU morpheme in Xiaolongmen, which appears clause-finally and indicates the emergence of a new situation rather than a past event or state; see [3] for more details. In this paper, we restrict the discussion to pre-verbal TAU.

Xia Liu and Vincent Jixin Wang are co-first authors of the article.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 381–390, 2022. https://doi.org/10.1007/978-3-031-06703-7_29

This paper aims to describe the linguistic phenomena observed in the Xiaolongmen dialect and to explain how sentences with TAU receive a past tense meaning. We propose that the past tense interpretation results from the non-coincidence relationship between the event location and the location of the event initiator. What is more, the meaning of perfectivity per se is derived from the past tense interpretation triggered by TAU in combination with specific event types. Hence, we treat TAU as a locative predicate. The VP in TAU-sentences is analyzed as an adjunct expressing the intention behind the subject's moving to a specific place. This paper is structured as follows: Sect. 2 discusses the semantic properties of pre-verbal TAU. In Sect. 3, we explain how pre-verbal TAU relates to tense/aspect via locative non-coincidence and provide a formal analysis. Section 4 is the conclusion.

2 Semantic Properties of Pre-verbal TAU

[3] shows in detail that pre-verbal TAU carries a directional/movement meaning (similar to Mandarin qù (go)/lái (come) + VP + le) and that it is not a preposition that can be omitted together with the location noun. Using tests with A-not-A questions, negation and VP ellipsis, [3] proposes that pre-verbal TAU is an auxiliary whose primary function is to mark perfective aspect and whose secondary function is to assign a past tense interpretation to the sentence. In this paper, we focus on both temporal functions of tau1213 rather than on which of the two is primary. In the following, we present the semantic interpretation of pre-verbal TAU and explain its semantic constraints with respect to different event types.

2.1 Directional/Movement Meaning

From the examples illustrated above, it seems that sentences with tau1213 are equivalent to sentences with ‘kʻəɯ13/lai13 + VP + liau42’ (similar to Mandarin qù/lái + VP + le). The following examples come from [3].

Sentence (2a) means that the subject went to play football at some time in the past and that the action had terminated before the speaking moment. Alternatively, if the speaking location coincides with the event location, (2a) expresses that the subject came to the speaking/event location to play football and that, at the speaking moment, the action has terminated. In either case, the sentence receives only the past reading. In contrast to tau1213, both a past and a present reading are available for (2b): the football playing has either terminated or is still in progress at the utterance time. Similarly, (2c), with the motion verb ‘come’ and the perfect marker liau42, has two readings. When speaker A asks speaker B why the subject was here (at the speaking location), speaker B can utter (2c) to indicate that the subject came here to play football and is no longer here at the speaking time. When speaker A asks speaker B what the subject is doing here, speaker B can also utter (2c) to indicate that the subject is here to play football. In summary, (2b) and (2c) each have two readings, present and past, whereas (2a) has only one reading, past. The interpretation of sentences with tau1213 is equivalent to the past tense reading of the “kʻəɯ13/lai13 + VP + liau42” structure.

2.2 Overt or Covert Locative Arguments

Since tau1213 asserts that the subject went/came to a place to take part in a specific action, it is unsurprising that locative arguments may be added after tau1213, as shown in (3).


With (3) it is asserted that the subject went/came to the bookstore to buy books and has now left it. The place in which the event is located can be overtly expressed. Interestingly, such a locative argument turns out not to be necessary; it can be omitted, and the sentence remains grammatical.

While sentence (3) makes the event location explicit, sentence (4) merely signifies that the subject went/came to some particular place to perform an action and is no longer there by the speaking time. The event location does not need to be overtly expressed in TAU-sentences. In a nutshell, tau1213 contains a directional/movement meaning and involves a change of location. Nevertheless, the locative argument can be realized either overtly or covertly.

2.3 Perfectivity

It has been shown that only a past tense reading is assigned to sentences marked by pre-verbal TAU. We now examine whether tau1213 patterns with (verbal) le1 in Mandarin, which indicates perfectivity. According to [4] and [5], grammatical aspects, but not tenses, exhibit restrictions with respect to the lexical aspect of the event denoted. For example, the imperfective aspect is incompatible with a state, an achievement, or an inchoative (e.g., *I am knowing him). If tau1213 were a past tense indicator, it would be compatible with all four Aktionsarten in [6]. Consider the following examples.


As the examples show, only events classified as accomplishments and activities are compatible with tau1213; combination with states and achievements is not licensed. Since perfective aspect views a situation as a whole, the situation is “bounded”. With activities and accomplishments, as in (5a–b), which are dynamic and durative events, tau1213 creates a temporal boundary and thus presents these events as indecomposable.3 This function of presenting an event as terminated or completed (perfectivity) is observed in all the examples involved. We conclude that tau1213 also gives rise to the meaning of perfectivity. In short, with dynamic and durative events, tau1213 serves as a temporal indicator conveying a past tense interpretation and the termination of the event by the speaking time.

2.4 Semantic Constraints for TAU

In the previous section we demonstrated that tau1213 aspectually selects only dynamic durative events; states and achievements are not permitted in a TAU-context. Furthermore, it seems undisputed that TAU-sentences must involve agentivity and intentionality, since pre-verbal TAU implies that the subject went/came somewhere to perform an action. Firstly, TAU-sentences semantically require an agentive subject which can control the event described by the VP. Non-agentive subjects cannot co-occur with tau1213.

Secondly, the event selected by tau1213 must be intentional. [7] argues that intuitively intentional activities happen as a gradual process which takes time to proceed; by contrast, instantaneous events (achievements), because they do not take time, cannot be intentional. In addition, states are non-intentional per se. This explains why these two event categories are in principle excluded by tau1213, as illustrated in (5c–d) and (7) below.

3 The boundary introduced by TAU is not necessarily the culmination point in the case of accomplishments: e.g., in (5b), TAU indicates that the washing event has already terminated, but not necessarily that it has been completed; in other words, the event of washing one piece of clothing may or may not have been finished.


Summing up, the two semantic constraints imposed on TAU-sentences, agentivity and intentionality, fundamentally regulate their (in)compatibility with the distinct Aktionsarten.

2.5 Summary

In Sect. 2 we examined the semantic properties of tau1213. First, tau1213 contains a directional/movement meaning and involves a change of location: it indicates that the subject went/came to a specific place to participate in an action. Second, tau1213 implies that by the utterance time the subject has left the place where the event occurred and the event has terminated: semantically, sentences with tau1213 have a past tense interpretation and also a perfective meaning. Finally, TAU-sentences adhere to the semantic constraints of agentivity and intentionality.
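The licensing conditions summarized above can be illustrated with a minimal Python sketch. It is not part of the authors' analysis: the names `Event`, `can_be_intentional` and `licenses_tau` are hypothetical, and the feature table uses standard Vendler-style values for the two features the text appeals to (dynamic, durative).

```python
from dataclasses import dataclass

# Standard Vendler-style feature values; only "dynamic" and "durative"
# are listed, since these are the two features the text appeals to.
AKTIONSART_FEATURES = {
    "state":          {"dynamic": False, "durative": True},
    "activity":       {"dynamic": True,  "durative": True},
    "accomplishment": {"dynamic": True,  "durative": True},
    "achievement":    {"dynamic": True,  "durative": False},
}


@dataclass(frozen=True)
class Event:
    """Hypothetical event description (names are illustrative only)."""
    aktionsart: str         # a key of AKTIONSART_FEATURES
    agentive_subject: bool  # True if the subject can control the event


def can_be_intentional(event: Event) -> bool:
    """Intentional events need an agent and must unfold over time."""
    features = AKTIONSART_FEATURES[event.aktionsart]
    return event.agentive_subject and features["dynamic"] and features["durative"]


def licenses_tau(event: Event) -> bool:
    """Assumed condition: TAU is licensed only for agentive, intentional events."""
    return can_be_intentional(event)
```

Under these assumptions, an agentive activity or accomplishment is accepted, while states, achievements, and any event with a non-agentive subject are rejected, which mirrors the pattern reported for (5) to (7).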

3 TAU as Locative Predicate: A Formal Analysis

In this section, we deal with how tau1213 achieves its past tense interpretation and how this interpretation correlates with the event anchoring caused by tau1213. Ultimately, we argue that tau1213 functions as a locative predicate which employs location displacement to influence the temporal reference of sentences. [8] assumes that tense is a temporal predicate of (non-)coincidence. In accordance with this theory, [9] argues further that event anchoring can be fulfilled by a location predicate or by event participants. In the case of tau1213, the following structure is assumed.

(8) [LocP UttLoc [Loc' Loc[+coincide] [VP EvLoc ]]]    here = present

(9) [LocP UttLoc [Loc' Loc[−coincide] [VP EvLoc ]]]    there = past

Diagrams (8) and (9) show that the function of Loc is equivalent to the role of T(ense) in English and other languages, namely to anchor the event through the relationship between the event location (EvLoc) and the utterance location (UttLoc). Loc bears the interpretable semantic feature [±coincidence]. When the utterance location coincides with the event location, the sentence receives a present reading (8); in contrast, when the event location does not overlap with the utterance location, the sentence has a past tense interpretation (9). As discussed in Sect. 2, in sentences with tau1213 the event initiator is not at the event location at the utterance time. Hence, we propose that tau1213 is a non-coincidence locative predicate which indicates the non-coincidence between the event location and the location of the event initiator. In other words, tau1213 is associated with the syntactic structure assumed in (9). Recall that tau1213 as a locative predicate is saturated by a locative argument which can be realized either overtly or covertly. Semantically, tau1213 relates the event location to the event initiator through the feature [−coincidence]. Hence, we suggest the following lexical entry.

(10)

According to (10), tau1213 requires two arguments x and r of type l and e (for place and human, respectively), and a third argument e of type s (for event), and relates the three variables through two relations: MOVE expresses that the event initiator r moved to the place denoted by x, while LOC conveys that the event e is located in x. The last part relates x to r via ¬COIN, meaning that x (the event location) and r (the event initiator) are non-coincident by the utterance time. Since the feature [−coincidence] is associated with PAST-anchoring, the following inference is guaranteed.
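Based only on the prose description above, the entry might be rendered along the following lines. This is a hedged sketch rather than the authors' own formula: the order of the lambda binders and the argument order inside MOVE, LOC and COIN are assumptions.

```latex
\[
  [\![\,\textit{tau}\,]\!] \;=\;
  \lambda x_{l}\,\lambda r_{e}\,\lambda e_{s}\,
  \bigl[\,\mathrm{MOVE}(r, x) \;\wedge\; \mathrm{LOC}(e, x) \;\wedge\; \neg\,\mathrm{COIN}(x, r)\,\bigr]
\]
```

Here the subscripts l, e and s give the types named in the text (place, human, event), and the third conjunct encodes the feature [−coincidence].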


(11)

The perfective reading of tau1213 is entailed by the non-coincidence between the event location and the event initiator in (11): if an event e is located at x, and x stands in a non-coincidence relation with the event initiator r, then the running time of e entirely precedes the utterance time s, and r has already left x by s. Next, we need to consider which syntactic and semantic role the verb phrase plays in TAU-sentences. Given that intentionality is imposed on TAU-sentences, we treat the VP syntactically as an adjunct which semantically conveys purpose (finality). We adopt the syntactic analysis of the German Absentiv proposed in [10], where the whole construction means that the subject is located at a specific place in order to do something, and neither the VP nor the PP involved needs to be present, as shown in (12).

(12) a. Er ist im Garten Holz hacken.
        He is in the garden wood hack
        ‘He is in the garden in order to hack wood.’
     b. Er ist im Garten.
        He is in the garden
        ‘He is in the garden.’
     c. Er ist Holz hacken.
        He is wood hack
        ‘He is somewhere in order to hack wood.’

According to [10], sentence (12a) has (12b) as its core, indicating the location of the subject at the speech time, since both (12b) and (12c) can serve as suitable answers to questions such as “Wo ist er?” (‘Where is he?’). Consequently, the VP is adjoined to LocP to establish the meaning of intention: the subject r is located at a place x which cannot be the location of speech, and the purpose of r’s being there is the performance of a particular action; otherwise he would not be there. However, in contrast to the German Absentiv, the locative meaning of TAU-constructions implies the non-coincidence of the event location and the event initiator. Thus, the meaning of intention introduced in TAU-sentences should be paraphrased as “The subject went/came to a place x (and he is no longer there) in order to perform a particular action.” The compositional process of TAU-sentences is exemplified in (13).


The VP is combined with LocP by exploiting the MOD-operator proposed in [11], which is semantically empty and merely maps an event type to another one and yields an event type of the same kind. The corresponding semantic interpretations are illustrated successively in (14).

Finally, we argue that the interpretation of intention is strengthened through pragmatic reasoning by applying Modus Tollens.

Per this inference pattern, TAU-sentences convey that the subject went/came to a specific place in order to perform an action; if he had not intended to do it, he would not have been there. Take (11a) as an example: He went to Beijing in order to visit his teacher; if he had not intended to visit his teacher, he would not have been there. Intuitively, this captures the intentional meaning of TAU-sentences properly.

4 Conclusion

We have investigated the semantics of pre-verbal TAU in the Xiaolongmen dialect. We claim that tau1213 is a locative predicate relating the event location and the event initiator through the feature [−coincidence]. The whole construction indicates that the subject (event initiator) went/came to a specific place to perform an action and has already left the place by the utterance time. In particular, this results in a past tense interpretation and the meaning of perfectivity: the event involved happened at some time in the past and has been completed or terminated by the utterance time. Furthermore, we propose that the VP in TAU-sentences is an adjunct which establishes the interpretation of the subject’s intention.

References

1. Wang, Z.-Y.: Description of aspect markers in Xiang dialects. Polyglossia: the Asia-Pacific’s Voice in Language and Language Teaching 33, 123–147 (2012). (In Japanese)
2. Wu, Y.-J.: Xiang fangyan dongtai zhuci de xitong ji qi yanbian (The system and evolution of auxiliaries in the Xiang dialects). Hunan shifan daxue chubanshe, Changsha (2006)
3. Liu, X., Huang, K.: Tau+VP and clause-final Tau in Xiaolongmen, a Xiang dialect. J. Chin. Linguist. 49(1), 226–252 (2021)
4. Tonhauser, J.: The temporal semantics of noun phrases: evidence from Guaraní. Ph.D. thesis, Stanford University (2006)
5. Smith, C.S., Erbaugh, M.S.: Temporal interpretation in Mandarin Chinese. Linguistics 43(4), 713–756 (2005)
6. Vendler, Z.: Linguistics in Philosophy. Cornell University Press, Ithaca, New York (1967)
7. Piñón, C.J.: Achievements in an event semantics. In: Lawson, A., Cho, E. (eds.) Proceedings from Semantics and Linguistic Theory 7, pp. 273–296. Cornell University (1997)
8. Demirdache, H., Uribe-Etxebarria, M.: The primitives of temporal relations. In: Martin, R., Michaels, D., Uriagereka, J. (eds.) Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik, pp. 157–186. MIT Press, Cambridge (2000)
9. Ritter, E., Wiltschko, M.: Anchoring events to utterances without tense. In: Proceedings of the 24th West Coast Conference on Formal Linguistics (WCCFL), pp. 343–351. Cascadilla Proceedings Project, Somerville (2005)
10. Wöllstein, A.: Aspekte des Absentivs: Wir sind Sue gratulieren – Zum Problem der Lokalisierung im Absentiv. In: Interfaces of Morphology, pp. 179–199. De Gruyter, Berlin (2013)
11. Maienborn, C., Schäfer, M.: Adverbs and adverbials. In: von Heusinger, K., Maienborn, C., Portner, P. (eds.) Semantics: An International Handbook of Natural Language Meaning, vol. 2, pp. 1390–1420. De Gruyter, Berlin & Boston (2011)

The Development Trend and Form-Meaning Features of Contemporary Chinese Lexical Patterns

Jiapan Li

Beijing Language and Culture University, Beijing, China
[email protected]

Abstract. Based on an investigation of the Chinese New Words, this article analyzes the development trend and form-meaning features of contemporary Chinese lexical patterns from both quantitative and qualitative perspectives. Quantitative research found that the number of words formed by contemporary Chinese lexical patterns first increased and then decreased from 2006 to 2018. Some lexical patterns had explosive productivity in a short period of time. The 2 + 1VN mode has the strongest productivity, and the preponderant word-formation mode differs from one lexical pattern to another. Qualitative research found that contemporary Chinese lexical patterns are prosodically represented by three-syllable patterns. In terms of grammatical function, the relations between components in NN are mostly thematic relations; only some of them are property relations. The prosodic modes of structures such as VN, AN, and AV are inconsistent with the mainstream rules of traditional word formation, which is due to the weak action features of V and the classifying function of A. The study of the form-meaning features of lexical patterns helps to reveal the new features of contemporary Chinese vocabulary and to deepen the study of the evolution of Chinese vocabulary.

Keywords: Lexical pattern · Prosodic structure · Grammatical structure

1 Introduction

Creating words by analogy based on lexical patterns is one of the most important ways to generate new words in contemporary Chinese. For example, following the lexical pattern [X + nu (奴) ‘slave’], word creators can derive words such as hunnu (婚奴) ‘marriage slave’, chenu (车奴) ‘car slave’, munu (墓奴) ‘tomb slave’ and hainu (孩奴) ‘child slave’ from fangnu (房奴) ‘house slave’. Among the nearly 40000 new words in contemporary Chinese, there are 393 lexical patterns in total, covering 6967 new words, which account for about 18% of the total number of new words [1]. The large number of emerging lexical patterns has attracted the attention of researchers due to their novelty and productivity. Previous research has mainly focused on case studies, describing the form and collocation characteristics of lexical patterns and explaining their formation and development from the perspective of rhetoric and social psychology [2–4]. Only a few studies have investigated and analyzed the overall characteristics of multiple lexical patterns [5–7]. However, no research has yet combined quantitative and qualitative analysis to study the development of lexical patterns and their features of form and meaning over a certain period of time.

In order to explore the development trend and word formation features of contemporary Chinese lexical patterns, this study first selected the highly productive lexical patterns on the basis of an investigation of the Chinese New Words from 2006 to 2018. Part two of this study analyzes the productivity of lexical patterns. Part three describes the form and meaning features of contemporary Chinese lexical patterns and discusses the similarities and differences of word formation between contemporary and modern Chinese. Part four is a conclusion.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 391–401, 2022. https://doi.org/10.1007/978-3-031-06703-7_30

2 Productivity of Contemporary Chinese Lexical Patterns

Based on a statistical analysis of the Chinese New Words series from 2006 to 2018, 20 productive lexical patterns have been identified. Of these, 14 have a fixed element in the right position of the pattern (hereinafter ‘right-fixed’): [X + zu (族) ‘nation’] [X + men (门) ‘gate’] [X + ti (体) ‘style’] [X + ke (客) ‘guest’] [X + nü (女) ‘woman’] [X + ge (哥) ‘brother’] [X + nan (男) ‘man’] [X + erdai (二代) ‘second generation’] [X + nu (奴) ‘slave’] [X + fen (粉) ‘fans’] [X + shang (商) ‘quotient’] [X + hua (化) ‘-ize’] [X + kong (控) ‘controller’] [X + ba (吧) ‘bar’]. Together these patterns form 1110 words. The other six lexical patterns have a fixed element in the left position (hereinafter ‘left-fixed’): [bei (被) ‘be’ + X] [yun (云) ‘cloud’ + X] [wei (微) ‘micro’ + X] [luo (裸) ‘bare’ + X] [ling (零) ‘zero’ + X] [ruan (软) ‘soft’ + X]. Together these patterns form 256 words.

2.1 Quantitative Analysis of Word Formation of Lexical Patterns

The words formed by the 20 lexical patterns from 2006 to 2018 are plotted as a stacked area chart in Fig. 1.

The Number of Words Increases First and Then Decreases. On the whole, the number of new words created by analogy based on lexical patterns showed an increasing trend from 2006 to 2010 and a decreasing trend from 2010 to 2018. The years 2009, 2010 and 2011 show high productivity, with more than 200 words per year. Since 2015, fewer than 50 new words have been created per year, and the productivity of Chinese lexical patterns shows an obvious decline, with only 16 words in 2017 and 12 words in 2018. Although the number of words is influenced by the principles of lexicography, it can still reflect the trend of new word formation in society to a certain extent.


Fig. 1. The number of words created through lexical patterns in Chinese New Words

Some Lexical Patterns Have Strong Productivity in a Short Time. A stacked area chart such as Fig. 1 shows not only quantities but also how the same item develops over time. It can be seen from Fig. 1 that from 2006 to 2018 most lexical patterns created new words every year, among which the three lexical patterns [X + zu] [X + men] [X + ke] were strongly productive in the period from 2006 to 2016. Some lexical patterns were strongly productive only in a certain short period, after which no new words were created. For example, [bei + X] only created new words from 2009 to 2012, [X + nu] [X + ba] only from 2006 to 2012, [wei + X] from 2009 to 2016, and [X + kong] from 2010 to 2015.

2.2 Prosodic and Grammatical Mode of Lexical Patterns

This study annotates the 1366 words constructed from the 20 lexical patterns according to their prosodic and grammatical structure. For example, longxiamen (龙虾门) ‘lobster gate’ is labeled ‘2 + 1NN’, in which ‘2 + 1’ means that its prosodic mode is ‘disyllable + monosyllable’ and ‘NN’ means that its grammatical mode is ‘nominal component + nominal component’; luoguan (裸官) ‘naked officials’ is labeled ‘1 + 1AN’, in which ‘1 + 1’ means that its prosodic mode is ‘monosyllable + monosyllable’ and ‘AN’ means that its grammatical mode is ‘adjective component + nominal component’.

The Proportion of Words in Different Prosodic and Grammatical Mode. On the basis of the annotation of prosodic and grammatical modes, this study presents the statistical results as a pie chart, shown in Fig. 2.
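The labelling scheme described above (‘2 + 1NN’, ‘1 + 1AN’) can be decoded mechanically. The following Python sketch is only an illustration of the notation: `parse_mode`, `Mode` and `CATEGORY_NAMES` are hypothetical names, and only the category letters glossed in the text (N, V, A) are expanded, with any other letter passed through unchanged.

```python
import re
from typing import NamedTuple

# Category letters glossed in the text; other letters are passed through.
CATEGORY_NAMES = {"N": "nominal", "V": "verbal", "A": "adjectival"}

# Matches labels such as "2 + 1NN", "1+1AN" or "2 + 1VN".
_LABEL = re.compile(r"^\s*(\d+)\s*\+\s*(\d+)\s*([A-Z])([A-Z])\s*$")


class Mode(NamedTuple):
    left_syllables: int
    right_syllables: int
    left_category: str
    right_category: str

    @property
    def syllables(self) -> int:
        """Total number of syllables of the word."""
        return self.left_syllables + self.right_syllables


def parse_mode(label: str) -> Mode:
    """Split a label into its prosodic part and its grammatical part."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a mode label: {label!r}")
    left, right, cat1, cat2 = match.groups()
    return Mode(int(left), int(right),
                CATEGORY_NAMES.get(cat1, cat1),
                CATEGORY_NAMES.get(cat2, cat2))
```

For instance, `parse_mode("2 + 1NN")` yields a trisyllabic nominal + nominal mode, matching the description of longxiamen, and `parse_mode("1 + 1AN")` yields a disyllabic adjectival + nominal mode, matching luoguan.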


[Pie chart data: 2+1VN, 32%; 2+1NN, 22%; 1+1NN, 7%; 1+1AN, 4%; 2+1AN, 4%; 1+2NN, 4%; 1+2AN, 4%; 1+1VN, 4%; 1+2NV, 3%; 1+2AV, 3%; others, 13%]

Fig. 2. The proportion of words in different prosodic and grammatical modes
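The shares reported in Fig. 2 can be tallied with a few lines of Python. This is a minimal sketch for checking the arithmetic: the percentages are copied from the figure, while the names `FIG2_SHARES` and `top_modes` are hypothetical.

```python
# Percentages as given in Fig. 2 (rounded values from the paper).
FIG2_SHARES = {
    "2+1VN": 32, "2+1NN": 22, "1+1NN": 7, "1+1AN": 4, "2+1AN": 4,
    "1+2NN": 4, "1+2AN": 4, "1+1VN": 4, "1+2NV": 3, "1+2AV": 3,
    "others": 13,
}


def top_modes(shares, n, skip=("others",)):
    """Return the n largest named modes, ignoring the residual category."""
    named = [(mode, share) for mode, share in shares.items() if mode not in skip]
    return sorted(named, key=lambda item: item[1], reverse=True)[:n]


top3 = top_modes(FIG2_SHARES, 3)
top3_share = sum(share for _, share in top3)
```

With these figures the three largest named modes are 2+1VN, 2+1NN and 1+1NN, and their shares add up to 61% of all words.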

It can be seen from Fig. 2 that: (1) The 2 + 1VN mode has the highest productivity. 2 + 1VN represents a combination mode with ‘disyllabic + monosyllabic’ as the prosodic mode and ‘verbal component + nominal component’ as the grammatical mode, such as kenlaozu (啃老族) ‘neet’, qishimen (歧视门) ‘discrimination gate’ and pangtingge (旁听哥) ‘audit brother’. The words constructed in this mode account for 32% of the total number of words. (2) The modes of words are relatively concentrated. The words in the three modes 2 + 1VN, 2 + 1NN and 1 + 1NN account for 61% of the total number of words, which shows that lexical patterns tend to adopt these three modes when constructing new words. Figure 2 shows the proportion of prosodic and grammatical modes of all words constructed by lexical patterns as a whole, but this proportion cannot reflect the choice of any specific lexical pattern.

The Proportion of Words in Typical Word Formation Modes. According to the position of the fixed component in the lexical pattern, this study classifies the lexical patterns into two types, right-fixed and left-fixed, and summarizes the proportion of words of each type in the different modes. A mode accounting for more than 60% of a pattern's words is regarded as its typical mode, and a mode accounting for more than 30% is regarded as a relatively typical mode. The statistical results are shown in Table 1 and Table 2.

Table 1. Data table of the mode of right-fixed lexical patterns

Pattern       2+1NN  2+1VN  1+1AN  1+1NN  1+1VN  1+2NN  1+2VN  2+1VV  3+1VV  Others
[X + zu]      21%    66%    -      -      -      -      -      -      -      13%
[X + men]     47%    47%    -      -      -      -      -      -      -      6%
[X + ti]      47%    29%    -      -      -      -      -      -      -      24%
[X + ke]      -      26%    11%    11%    39%    -      -      -      -      13%
[X + nü]      25%    25%    19%    -      7%     -      -      -      -      25%
[X + ge]      40%    41%    -      7%     -      -      -      -      -      12%
[X + nan]     35%    19%    11%    -      -      -      -      -      -      34%
[X + erdai]   -      -      -      -      -      55%    26%    -      -      19%
[X + nu]      -      -      -      78%    -      -      -      -      -      23%
[X + fen]     24%    -      24%    40%    -      -      -      -      -      12%
[X + shang]   -      -      -      68%    24%    -      -      -      -      8%
[X + hua]     -      -      -      -      -      -      -      25%    30%    45%
[X + kong]    61%    22%    -      -      -      -      -      -      -      17%
[X + ba]      18%    18%    12%    29%    18%    -      -      -      -      6%


Table 2. Data table of the mode of left-fixed lexical patterns

Pattern       1+1AN  1+2PN  1+2PV  1+2NN  1+2NV  1+2AN  1+2AV  1+1AV  Others
[bei + X]     -      35%    59%    -      -      -      -      -      6%
[yun + X]     -      -      -      48%    32%    -      -      -      21%
[wei + X]     10%    -      -      -      -      45%    34%    -      11%
[luo + X]     32%    -      -      -      -      -      -      57%    11%
[ling + X]    -      -      -      -      -      63%    25%    -      13%
[ruan + X]    -      -      -      -      -      40%    33%    -      27%

It can be seen from Table 1 and Table 2 that, although on the whole 61% of the words formed by lexical patterns in the Chinese New Words fall under the three modes 2 + 1VN, 2 + 1NN and 1 + 1NN, the lexical patterns clearly differ in their preferred word formation mode once the typical mode of each pattern is examined in detail. (1) Among the 20 lexical patterns, only five, namely [X + zu] [X + nu] [X + shang] [X + kong] [ling + X], have typical modes; 13 lexical patterns, namely [X + men] [X + ti] [X + ke] [X + ge] [X + nan] [X + erdai] [X + fen] [X + hua] [bei + X] [yun + X] [wei + X] [luo + X] [ruan + X], have relatively typical modes; and [X + nü] [X + ba] have neither typical nor relatively typical modes. (2) The typical modes of the 20 lexical patterns are quite different from the relatively typical modes. Contemporary Chinese lexical patterns do not show a consistent choice of modes. Although the number of words in 2 + 1VN, 2 + 1NN and 1 + 1NN is relatively high compared with the total number, this is due to the strong productivity of [X + zu] [X + men] [X + ti] [X + ge] [X + nu]. The purpose of the quantitative analysis of the corpus is to show the development trend of productivity and the preferred modes of Chinese new words within a limited time. In order to explore the motivation behind word formation and these preferences, further qualitative analysis is essential.
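The classification into typical and relatively typical modes can be reproduced with a short Python sketch. It is an illustration under stated assumptions: the thresholds are read strictly as ‘more than 60%’ and ‘more than 30%’, the ‘Others’ column is ignored, the sample rows are copied from Table 1, and the names `classify` and `TABLE1_SAMPLE` are hypothetical.

```python
TYPICAL_THRESHOLD = 60             # "more than 60%"
RELATIVELY_TYPICAL_THRESHOLD = 30  # "more than 30%"

# A few rows of Table 1 (percentages; the "Others" column is left out).
TABLE1_SAMPLE = {
    "[X + zu]":  {"2+1NN": 21, "2+1VN": 66},
    "[X + men]": {"2+1NN": 47, "2+1VN": 47},
    "[X + nu]":  {"1+1NN": 78},
    "[X + nü]":  {"2+1NN": 25, "2+1VN": 25, "1+1AN": 19, "1+1VN": 7},
    "[X + ba]":  {"2+1NN": 18, "2+1VN": 18, "1+1AN": 12, "1+1NN": 29, "1+1VN": 18},
}


def classify(shares):
    """Return (status, mode) for the largest named mode of one pattern."""
    mode, share = max(shares.items(), key=lambda item: item[1])
    if share > TYPICAL_THRESHOLD:
        return "typical", mode
    if share > RELATIVELY_TYPICAL_THRESHOLD:
        return "relatively typical", mode
    return "none", None


results = {pattern: classify(shares) for pattern, shares in TABLE1_SAMPLE.items()}
```

On these rows the sketch agrees with the grouping in the text: [X + zu] and [X + nu] come out as typical, [X + men] as relatively typical, and [X + nü] and [X + ba] as neither. Note that [X + hua] peaks at exactly 30% in Table 1 yet is counted as relatively typical in the text, so the lower threshold may be applied inclusively in practice.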

3 Analysis of the Combination of Forms and Meanings of Contemporary Chinese Lexical Patterns

The qualitative analysis of lexical patterns will focus on the typical and relatively typical modes of lexical patterns from the dimensions of prosodic and grammatical structures.

3.1 The Prosodic Structural Features of Contemporary Chinese Lexical Patterns

Quantitative analysis shows that: (1) the [2 + 1] prosodic structure is the preponderant mode of right-fixed lexical patterns: [X + zu] [X + men] [X + ti] [X + ge] [X + nan] [X + kong] are strongly productive in the [2 + 1] prosodic structure, and [X + ke] [X + nü] [X + fen] are also productive to a certain degree. (2) The [1 + 2] prosodic structure is the preponderant mode of left-fixed lexical patterns: [bei + X] [yun + X] [wei + X] [ling + X] [ruan + X] tend to construct new words in the form [1 + 2], while [luo + X] tends to construct new words in the form [1 + 1].

The features of the modes in contemporary Chinese are consistent with the three-syllable prosodic structure of new words. Previous studies, based on statistics of the Chinese New Words from 2006 to 2010, have shown that Chinese new words tend to be trisyllabic [8, 9]. Based on the statistics of the Chinese New Words from 2006 to 2018, this study found that from 2006 to 2011 the proportion of trisyllabic words among all words recorded increased from 56% to 84%, which is in line with the previous analysis of the development of trisyllabic words. However, from 2011 to 2018, the proportion of trisyllabic words decreased from 84% to 58% (with a small increase in 2017). Thus, the proportion of trisyllabic words first rises and then falls.

Further analysis of the trisyllabic words shows that the fixed component of a lexical pattern has obvious affix-like features, with the characteristics of ‘unidirectional high combinability, individualized structural type, and grammatical determinant’ [10]. In recent years, Chinese semi-affixes have tended to combine with disyllabic components. Studies show that, in the syllable matching of modern Chinese, even-number matching is freer than odd-number matching, that is, [1 + 1] is freer than [1 + 2] and [2 + 1] [11–13]. The statistical results for contemporary Chinese words nevertheless show that for most lexical patterns, such as [X + zu] [X + men] [ling + X], language users tend to choose the odd matching mode when creating new words, and only for a few lexical patterns, such as [X + nu] [X + shang] [luo + X], do they tend to choose the even matching mode.

It can be seen that most of the new words constructed by Chinese lexical patterns differ from mainstream Chinese word formation in prosodic structure.

3.2 Grammatical Structure Features of Lexical Patterns in Contemporary Chinese

Grammatical Structure Features of NN. The quantitative analysis shows that for most of the right-fixed lexical patterns the attributive-head NN mode is strongly productive: [X + men] [X + ti] [X + ge] [X + nan] [X + kong] form words in the 2 + 1NN mode, [X + nu] [X + fen] [X + shang] in the 1 + 1NN mode, and [X + erdai] in the 1 + 2NN mode. Among the left-fixed lexical patterns, only [yun + X] is relatively productive in the 1 + 2NN mode. Further analysis of the corpus shows that N1 (the left nominal component of NN) has no obvious distinctive semantic features. It can represent both concrete and abstract things, as well as natural and man-made things. However, the relationship between N1 and N2 (the right nominal component of NN) can basically be classified into two categories: thematic relation and property relation. The thematic relation describes how the modifier noun of a compound concept is related to the head noun [14]. The statistics show that most of the components of NN words stand in a thematic relation, for example:

The Development Trend and Form-Meaning Features


zhainu ‘debt slave’ refers to a person or unit that is under great pressure due to debt. (2008 Chinese New Words) zhentankong ‘detective fan’ refers to people who are obsessed with detective fiction and reasoning. (2012 Chinese New Words) ‘Due to’ in the interpretation of zhainu indicates causality, while ‘be obsessed with’ in the interpretation of zhentankong indicates a relationship of concern, both of which belong to the thematic relation. The attribute relation means that the modifier noun of the compound concept does not represent a thing in semantics, but a certain attribute of a thing, and the attribute is used to modify the head noun [15]. The statistics show that a small part of NN words in some lexical patterns have attribute relations, for example: liulianzu ‘durian person’ refers to people in the workplace who have certain working experience, strong ability, and a bad temper, and who are difficult to get along with. They are so called because they resemble a durian, which smells bad and has a strong flavor. (2009 Chinese New Words) liulian in liulianzu does not express a thing, but the attributes of smelling and eating, and then metaphorically refers to the attributes of ‘having certain working experience, strong ability, a bad temper, and being difficult to get along with’. Previous studies on the NN mode showed that the attributive nouns are functionally equivalent to or have been transformed into non-predicate adjectives [16, 17]. Nouns as attributives modify the heads with the overall attributes of the things the noun refers to, while adjectives (and/or non-predicate adjectives) as attributives only modify the heads with a single attribute [18]. The overall attribute is more stable than a single attribute, and the relation between the components of NN is also more stable than the relations between the components of AN and VN.
However, this study found that only a few components of morphological words have an attribute relation, that is, only a few NN words have attributive nouns that have the distinguishing function of adjectives or non-predicate adjectives. The attributive nouns and head nouns of most NN words have a thematic relation. This is inconsistent with the analysis of NN words in mainstream word formation research. Grammatical Structure Features of VN. From the quantitative analysis, it can be seen that 2 + 1VN is the preponderant mode of [X + zu], [X + men], and [X + ge], and 1 + 1VN is the preponderant mode of [X + ke]. In Chinese, ‘monosyllable verbs are typical members of the verb category which do not have the function of attributive, while disyllable verbs are non-typical members which have the potential attributive function’ [19]. Modern Chinese monosyllabic verbs have strong semantic features of action and grammatical features of verbs. One of the basic features of verbs is that they are closely related to time. Conceptually, actions are usually associated with time. The most typical verb is the monosyllabic action verb, but ‘its attributes tend to exclude temporality’ [20]. Disyllabic verbs, however, are weaker in action, and their verbal grammatical features are either weakened or partially lost. They tend to drift toward nouns. After analyzing the 2 + 1VN words of [X + zu], [X + men], and [X + ge], it is found that the V in these lexical patterns typically has a verb-object structure and is basically an intransitive verb, such as qiangpiao (抢票) ‘grab tickets’, shiyao (试药) ‘try medicine’, banxue (办学) ‘run schools’, kaobo (考博) ‘take Ph.D. qualifying exam’, gaifen


J. Li

(改分) ‘modify score’, weiyao (喂药) ‘feed medicine’, and baitan (摆摊) ‘set up a stall’ in the words qiangpiaozu, shiyaozu, banxuere, gaifenmen, weiyaomen, baitange. Grammatical Structure Features of AN. Both right-ﬁxed and left-ﬁxed lexical patterns have an attributive-head AN mode. Quantitative statistics show that the overall productivity of the right-ﬁxed patterns in the attributive-head AN mode is relatively weak, and only [X + ke] [X + nü] [X + nan] [X + fen] [X + ba] has a word formation ratio ranging from 11%–24% in 1 + 1AN mode. Within the overall word-forming ability of the left-ﬁxed patterns in the two prosodic modes of attributive-head, AN is relatively higher, [wei + X] [ling + X] [ruan + X] has a strong productivity in 1 + 2AN mode, [luo + X] [wei + X] also has a certain degree of productivity in 1 + 1AN mode. In the combination of adjectives and nouns in modern Chinese, words with the same number of syllables are free, that is, [1 + 1] [2 + 2] is freer than [1 + 2] [2 + 1]. The former is a conventional prosodic mode, and the latter is an unconventional prosodic mode. However, the words constructed by language users based on contemporary Chinese lexical patterns are contrary to the overall tendency. One of the most important categories of adjectives is qualitative adjectives and state adjectives. Qualitative adjectives express the attributes of things, and state adjectives express the state of things or actions. From the perspective of syllables, typical qualitative adjectives are in monosyllable form, while typical state adjectives are in two-syllable form [21]. Compared with two-syllabic adjectives, monosyllabic adjectives used as attributives are more distinct and are usually used as the basis for classiﬁcation [22]. Functionally, the ﬁxed components in the left-ﬁxed pattern are the same as the typical qualitative adjectives. 
When an adjective modiﬁes a noun alone, its main function is to distinguish, that is, to limit the scope of the object to classify things. The A in AN has the function of distinguishing and classifying. For example, shili (实力) ‘strength’ after entering [ruan + X] means ‘influence of spiritual culture and ideology other than economy and military in national competitive strategy’. The form of soft power is different from the strength embodied in economy, science and technology, national defense, etc. Yan (烟) ‘smoke’ means ‘high-end cigarettes without outer packaging’ after entering [luo + X]. The appearance of high-end cigarettes is different from common cigarettes with packaging. Grammatical Structure Features of AV. In modern Chinese, even matching is freer than odd matching in AV mode. ‘The match between qualitative adjectives and behavior verbs (where the typical word length is monosyllabic structure) is unmarked, while the match between qualitative adjectives and action verbs (where the typical word length is polysyllabic structure) shows different degrees of markedness’ [23], such as changzhu (长住): *changjuzhu(长居住) ‘long-term stay’, difei (低飞): *difeixiang (低飞翔) ‘low flight’. However, according to the statistical analysis of the corpus, it can be observed that most of the left-ﬁxed AV modes do not follow the conventional prosodic mode of modern Chinese, only [luo + X] is typical in the 1 + 1AV mode, such as luobao (裸报) ‘naked sign up’, luoben (裸奔) ‘naked running’, luoci (裸辞) ‘naked resign’, luogou (裸购) ‘naked shopping’, luogui (裸归) ‘naked return’, luohun (裸婚) ‘naked marriage’, luojia (裸驾) ‘naked driving’, in which V is a typical action verb and the addition of luo indicates a subordinate classiﬁcation of the


central component. Like the A in AN, the A in AV also mainly plays the role of distinguishing. For example, luotui (裸退) ‘naked retirement’, luohun (裸婚) ‘naked marriage’, and luoying (裸映) ‘naked screening’ are respectively different from the conventional forms of cadre retirement, newlyweds’ marriage, and movie release. [wei + X], [ling + X], and [ruan + X] have a certain degree of productivity in the 1 + 2AV mode. ‘When the verb in the adjective + verb structure is disyllable, it is generally only combined with disyllable adjectives, not monosyllable adjectives’ [23], such as jingxin biaoyan (精心表演): *jing biaoyan (精表演) ‘elaborate performance’, and canku daji (残酷打击): *zhong daji (重打击) ‘critical hit’. However, the collocation characteristics of contemporary Chinese lexical patterns are contrary to this rule. Why is this so? Studies have shown that disyllabic verbs have the semantic and grammatical functions of nouns to a certain extent [24]. The corpus shows that the disyllabic verbs of 1 + 2AV semantically highlight a behavior rather than a specific action, such as ruanbagong (软罢工) ‘soft strikes’, ruanbaoyuan (软抱怨) ‘soft complaints’, ruancaiyuan (软裁员) ‘soft layoffs’, ruanliantong (软联通) ‘soft connectivity’, and ruanzhuanxing (软转型) ‘soft transformation’. Bagong, baoyuan, caiyuan, liantong, and zhuanxing are all behaviors, not specific actions. After adding the fixed component ruan, the unconventional form of the behavior is highlighted, that is, a state of opposition between marked items and unmarked items is formed. The grammatical structure of contemporary Chinese lexical patterns mainly includes the four combinations of NN, AN, VN, and AV mentioned above. In addition to these, there is a special case, [bei + X]. The emergence of this lexical pattern is closely related to the contraction of the form and meaning of the bei sentence.
Previous studies have done a lot of detailed and in-depth analysis on this lexical pattern, so this paper does not intend to carry out the analysis of special cases.

4 Conclusion

This article summarizes and analyzes in detail the formal and meaning features of contemporary Chinese lexical patterns on the basis of corpus statistics. First, using the Chinese New Words (2006–2018) series of dictionaries as the source, 20 lexical patterns with strong productivity and 1,360 morphological words were selected. Second, quantitative analysis was used to investigate the word formation of the lexical patterns and the prosodic and grammatical features of the morphological words. It is found that the word formation of contemporary Chinese lexical patterns shows a development trend of first ascending and then descending, with the turning point in 2010. Some lexical patterns have strong productivity in the short term, the 2 + 1VN mode has the strongest word-forming productivity, and different lexical patterns have different typical modes. Third, on the basis of quantitative statistics, the prosodic and grammatical features of lexical patterns are qualitatively analyzed. It is found that contemporary Chinese lexical patterns tend to choose the odd matching mode of [2 + 1] or [1 + 2] when creating new words; most NN word formation components have thematic relations, and few have property relations. VN, AN, and AV words are inconsistent with the conventional pattern in the overall tendency of the prosodic mode, because the V has a weak verbal feature, and the A has the function


of distinguishing and classifying. The features of contemporary Chinese vocabulary found in this research are partly consistent and partly inconsistent with traditional word-forming rules, indicating that contemporary Chinese vocabulary has developed new word-forming rules based on the traditional ones.

Funding. Science Foundation of Beijing Language and Culture University (supported by ‘the Fundamental Research Funds for the Central Universities’) (21YJ030002); Humanities and Social Sciences Planning Fund of the Ministry of Education of China (20YJA740032); The Phoenix Tree Innovation Platform Project (supported by ‘the Fundamental Research Funds for the Central Universities’) (20PT01).

References

1. Kang, S.: The metrological research and application of modern Chinese new words. Chinese Social Sciences Press, Beijing (2008). (in Chinese)
2. Liu, Y.: Nomination by sensitive event projection in the popular constructions such as ‘Xdi’–also on the relational frame-filling constructions. Contemporary Rhetoric (2), 10–18 (2012). (in Chinese)
3. Qiu, X., Li, B.: The forming process and semantic evolution of the ‘micro-’word family. Appl. Linguist. (1), 37–45 (2015). (in Chinese)
4. Hu, C., Shao, Y.: Difference and analysis between the structures of “Shai(, ) + NP” and “Xiu (, ) + NP.” In: Hong, J.-F., Zhang, Y., Liu, P. (eds.) CLSW 2019. LNCS (LNAI), vol. 11831, pp. 14–23. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38189-9_2
5. Dong, X.: Chinese Lexicon and Morphology. Peking University Press, Beijing (2004). (in Chinese)
6. Zeng, L.: Research on the lexical patterns of three-character words. Wuhan Univ. J. (Hum. Sci.) 63(4), 471–476 (2010). (in Chinese)
7. Li, J.: Research on the semantic recognition procedures and mechanisms of [N+N] lexical words in contemporary Chinese. Lang. Teach. Res. (5), 92–102 (2019). (in Chinese)
8. Liu, C.: An analysis of the motivation of the newly-coined trisyllable words. Chin. Linguist. (3), 50–56 (2012). (in Chinese)
9. Hui, T.: Analysis of structural, semantic and pragmatic features of Chinese neologisms in the recent decade. Appl. Linguist. (4), 26–34 (2014). (in Chinese)
10. Wang, H., Fu, L.: A research on the semi-affix of Mandarin Chinese. Linguist. Sci. (5), 3–17 (2005). (in Chinese)
11. Zhang, G.: Selective differences between single and double syllable adjectives. Stud. Chin. Lang. (3), 3–9 (1996). (in Chinese)
12. Zhang, G.: The semantic and syntactic motivation of combinations of monosyllabic and disyllabic verbs and adjectives in “Adj.-V” and “V-Adj.” phrases. Chin. Teach. World (4), 3–17 (2004). (in Chinese)
13. Wang, H.: The relations between the number of syllable, the tonal range of pitch and the grammatical structure in Chinese. Contemp. Linguist. (4), 241–252 (2001). (in Chinese)
14. Gagné, C.L., Shoben, E.J.: Influence of thematic relations on the comprehension of modifier-noun combinations. J. Exp. Psychol. Learn. Mem. Cogn. 23(1), 71–87 (1997)
15. Wisniewski, E.J.: Construal and similarity in conceptual combination. J. Mem. Lang. 35(3), 434–453 (1996)
16. Zhang, B.: Functional explanation of flexible use of parts of speech. Stud. Chin. Lang. (5), 339–346 (1994). (in Chinese)
17. Tan, J.: The semantic foundation and related issues of the transformation of nouns, forms and parts of speech. Stud. Chin. Lang. (5), 368–377 (1998). (in Chinese)
18. Ke, H.: A Study on the Collocation of Single and Double Syllables in Modern Chinese. The Commercial Press, Beijing (2012). (in Chinese)
19. Zhang, G.: Modern Chinese Adjective Function and Cognition Research. The Commercial Press, Beijing (2006). (in Chinese)
20. Li, Y.: The part of speech status of non-predicate adjectives. Stud. Chin. Lang. (1), 1–9 (1996). (in Chinese)
21. Zhu, D.: Modern Chinese adjective research. Lang. Res. (1), 83–111 (1956). (in Chinese)
22. Suzuki, Q.: A preliminary study on the semantic rules of the combination of adjective and noun without ‘de(的)’–also on the distinction of adjectives. Chin. Lang. J. Gramm. Res. Explor. (9), 274–289. The Commercial Press, Beijing (2000). (in Chinese)
23. Zhang, G.: Cognition and Research on Modern Chinese Verbs. Xuelin Publishing House, Shanghai (2016). (in Chinese)
24. Zhang, G.: A preliminary study on the functional differences of single and double syllable action verbs in the structure of ‘verb + noun’. Stud. Chin. Lang. (3), 186–190 (1989). (in Chinese)

Natural Language Processing and Language Computing

Automatic Prediction and Linguistic Interpretation of Chinese Directional Complements Based on BERT Model

Young Hoon Jeong, Ming Yue Li, Su Min Kang, Yun Kyung Eum, and Byeong Kwu Kang (✉)

Sogang University, Seoul, Korea
{boychaboy,kbg43}@sogang.ac.kr, {sinabeurolmy,kksm9801}@naver.com

Abstract. Chinese directional complements are among the trickiest concepts for second language learners because of their derivative meanings. In particular, 出来, 起来, 下来, and 下去 are easily confused. This study aims to gain grammatical and educational insights into these complements with a neural network model. We fine-tuned the Chinese BERT model with Chinese directional complement data composed of sentences containing the above four complements as used in literature, media, and textbooks. By measuring these fine-tuned models’ accuracy, we show how accurately and efficiently the neural network model predicts Chinese directional complements. Furthermore, we interpret and analyze the model’s decisions using the Sampling and Occlusion algorithm and visually present which components of the sentence influence the choice of complement.

Keywords: Chinese directional complement · BERT · Transfer learning · Explainable AI · Sampling and occlusion

1 Introduction

In recent natural language processing, neural network models are among the most popular methodologies. Neural network models use contextual information of words acquired by learning from large amounts of text data, as we can see from the mechanisms of ELMo, GPT, and BERT. These algorithms are used in various fields because they are superior to other models in efficiency and accuracy. Taking BERT as an example, this language model far surpasses existing models on 11 tasks such as POS tagging, NER, dependency parsing, and QA [1]. The latest neural network models have reached a level that helps researchers in NLP and linguistic research. In this situation, there is an increasing need to use neural models more actively in Chinese grammar research and Teaching Chinese as a Second Language (TCSL). The purpose of this study is to explore a method of automatically predicting the Chinese Directional Complement (趋向补语: CDC) using the BERT model. Furthermore, we would like to explore how to use BERT in Chinese grammar research and TCSL. This study addresses the following two core questions.

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 405–416, 2022. https://doi.org/10.1007/978-3-031-06703-7_31


Y. H. Jeong et al.

First, how accurately and efficiently can a neural network language model predict and classify CDC? Second, what clues or weights are used in inferring CDC? The former checks the performance of neural models, and the latter attempts an interpretation of neural models. The object that this study examines with neural network models is the CDC (Chinese Directional Complement). The CDC was chosen as the subject of study because it is important in grammar research and language education, yet its meaning and function are hard to grasp. CDC is a multifunctional sentence component that is located after a verb and represents various meanings. CDC is not only used frequently but also represents meanings ranging from the lexical to the grammatical. For TCSL, CDC is therefore essential but difficult to understand completely. In phrases such as 坚持下来 (hold + come down), 坚持下去 (hold + go down), 想出来 (think + come out), and 想起来 (think + get up), CDC represents the resultative and stative status of an action rather than a specific direction of movement. Several previous studies revealed that the CDC is a challenging sentence component to learn. According to the results of [2–4], a large number of second language learners misuse the CDC. Among all incorrect answers, the proportion of CDC was the highest. In terms of TCSL, it is necessary to teach the grammatical functions of the CDC effectively.

2 Related Work and Our Framework

The core methodology used in this study is a transfer learning model based on BERT.

BERT. BERT (Bidirectional Encoder Representations from Transformers) is a deep learning algorithm that learns language models based on the transformer encoder. BERT learns not only words but whole sentences, so it has an excellent ability to capture meaning and grammatical features. The BERT model has a flexible structure and can sufficiently learn contextual information at the sentence level. BERT uses a masked language model to learn sentences. A masked language model is a learning method that predicts a masked word from the surrounding word sequence. As shown in the figure below, the model learns contextual information by repeatedly placing a mask in the middle of a sentence and predicting which word will appear at the mask position. BERT pursues a bi-directional approach: the whole sentence is observed from both directions to learn the contextual information. BERT models that use bi-directional encoders have better embedding quality than unidirectional encoders [5] (Fig. 1).

Fig. 1. Different learning methods of masked language models.
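
The masking step described above can be illustrated with a minimal sketch. It is not BERT's implementation: the function names `mask_span` and `visible_context`, the character-level tokenization, and the example sentence are assumptions made for illustration. The sketch only shows which context a left-to-right model and a bi-directional masked language model can use when predicting the masked position.

```python
MASK = "[MASK]"

def mask_span(sentence, target):
    """Replace the first occurrence of `target` with a single [MASK] token.

    Characters outside the target are treated as one token each
    (an assumption that mirrors character-based Chinese tokenization).
    Returns the token list and the index of the mask.
    """
    start = sentence.find(target)
    if start < 0:
        raise ValueError(f"{target!r} not found in sentence")
    left = list(sentence[:start])
    right = list(sentence[start + len(target):])
    return left + [MASK] + right, len(left)

def visible_context(tokens, mask_index, bidirectional):
    """Return the (left, right) context available for predicting the mask."""
    left = tokens[:mask_index]
    right = tokens[mask_index + 1:] if bidirectional else []
    return left, right

# Hypothetical example sentence built from the phrase 坚持下来.
tokens, idx = mask_span("我们坚持下来了。", "下来")
print(tokens)
print(visible_context(tokens, idx, bidirectional=False))  # left context only
print(visible_context(tokens, idx, bidirectional=True))   # both directions
```

In the bi-directional case the characters after the mask are also available, which is the property the text attributes to BERT.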


Transfer Learning. Transfer learning is a deep learning technique that increases efficiency by first pre-training a model on existing data and then training it on newly constructed data. Transfer learning aims to improve target learners’ performance on target domains by transferring the knowledge in different but related source domains. In this way, the dependency on a large amount of target-domain data for constructing target learners can be reduced. Due to its broad application prospects, transfer learning has become a popular and promising area in machine learning [6].

Our Framework. The architecture of this study is shown in the figure below. The transfer learning process for CDC prediction is conducted by combining the existing Chinese BERT pre-trained data and the CDC corpus. The learning process consists of learning basic information from large-scale data and then learning target data for CDC prediction and classification. Since the pre-trained BERT data is mainly composed of encyclopedias and newspapers, it contains relatively few varied CDC examples. Therefore, in this study, fine-tuning was conducted with CDC examples extracted from literary works, broadcast scripts, and Chinese textbooks, which contain many colloquial expressions. In this way, we improved performance through the transfer learning process (Fig. 2).

Fig. 2. Model architecture for CDC prediction.

3 Design of the Prediction Model for CDC

3.1 Data Processing

CDC Dataset. The CDC Dataset consists of sentences with CDC, which we collected from 5,475 literary works, 108 broadcast scripts, and 290 Chinese textbooks.¹ We chose these sources to obtain unbiased and sufficient data and because this project is intended to be used for educational purposes. Since the pre-trained data tends to have more literary expressions, we added colloquial expressions.

¹ The raw data that we used were collected from the following sites, respectively. Literary works: CCL Corpus of Peking University, http://ccl.pku.edu.cn:8080/ccl_corpus/. Broadcast scripts: Media Language Corpus (媒体语言语料库), http://ling.cuc.edu.cn/RawPub/. Chinese textbooks: Corpus of teaching Chinese as second language, http://www.aihanyu.org/.


While collecting, we focused on four directional complements: 出来 (come out), 起来 (get up), 下来 (come down), and 下去 (go down). These four complements are used frequently, but they are very tricky to use because of the similarities between them. They also have abundant derivative meanings, such as a resultative or stative state. Because of these semantic features, it is not easy for second-language learners to fully understand these complements’ functions. This becomes clear from the HSK Dynamic Composition Corpus² data, in which we find many misuses of the four directional complements by second language learners. After extracting sentences containing one of these four complements, we had to remove the sentences in which the tokens are not used as directional complements. After roughly sorting with the Corpus Word Parser³, we checked the remaining sentences one by one. For instance, in 她在他的陪伴下来到医院。 (He accompanied her to the hospital.), 下来 (come down) is not used as a directional complement, but Corpus Word Parser could not recognize this. Such sentences were removed through regular expressions and NLPIR-Parser⁴. The final CDC Dataset consists of 98,327 sentences, of which we used 94,327 sentences as training data and 4,000 sentences as test data. For more accurate measurement, each data source was thoroughly distinguished to minimize the similarity of context.

3.2 Model Analysis

In this study, we trained the following three pre-trained masked language models using the CDC Dataset above. We used the BERT Classification Model, a traditional BERT model with an additional single linear layer on top, to classify the four directional complements. The first pre-trained model we used is bert-base-chinese, which was presented in the same year Google presented BERT for the first time. The bert-base-chinese model was pre-trained on Chinese Wikipedia, which contains both Simplified and Traditional Chinese, resulting in 0.4B words. The model consists of 12 layers, a hidden size of 768, 12 attention heads, and 110M parameters. While the bert-base-chinese model relies on character-based tokens for tokenization, the updated version of the model, bert-wwm-ext, presented in [7], masks complete words, which makes the model recover the whole word in the Masked Language Model (MLM) task. This model also uses extended training data, which is ten times bigger than the Chinese Wikipedia, leading to higher performance than the original models.

² HSK Dynamic Composition Corpus covers the HSK composition papers of foreign exam takers from 1992 to 2005.
³ Corpus Word Parser is a parser providing word segmentation and Part-Of-Speech tagging, http://www.aihanyu.org/.
⁴ The NLPIR system is a multi-functional system that supports Chinese word segmentation, English tokenization, Part-Of-Speech (POS) tagging, named entity recognition, new word identification, keywords extraction, and user-defined lexicon. NLPIR-ICTCLAS home page: http://ictclas.nlpir.org/index_e.html.


MacBERT-large, presented in [8], is our third pre-trained language model. It is a modified version of the original masked language model, which gained high scores in experiments conducted on various NLP tasks. MacBERT-large uses whole word masking, similar word replacement, and N-gram masking, making it more competitive than other masked language models. With our CDC Dataset, we fine-tuned each of the three language models with its classification layer to classify the four labels in the test sentences (出来: 0, 起来: 1, 下来: 2, 下去: 3). Each model was trained with a learning rate of 5e-5, a training batch size of 64, and 10 epochs.
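
The labeling scheme and hyperparameters above can be written down as a short sketch. The label ids and the hyperparameter values come from the text; the helper `make_example`, the dictionary names, and the rule of rejecting sentences in which the complement does not occur exactly once are assumptions made for illustration, not the authors' preprocessing code.

```python
MASK = "[MASK]"

# Label ids as stated in the text.
LABELS = {"出来": 0, "起来": 1, "下来": 2, "下去": 3}

# Hyperparameters reported for the four-label classifiers.
TRAIN_CONFIG = {"learning_rate": 5e-5, "train_batch_size": 64, "epochs": 10}

def make_example(sentence, complement):
    """Turn a sentence into a (masked_sentence, label_id) pair.

    Assumption: the complement must occur exactly once, so that the
    masked position is unambiguous.
    """
    if complement not in LABELS:
        raise ValueError(f"unknown complement: {complement!r}")
    if sentence.count(complement) != 1:
        raise ValueError("complement must occur exactly once in the sentence")
    return sentence.replace(complement, MASK), LABELS[complement]

# Sentence reconstructed from example (1) of the paper and its gold label.
masked, label = make_example("你看他把碗拿下去了。", "下去")
print(masked, label)
```

A classifier then receives the masked sentence as input and the label id as its target; how the authors actually built their inputs is not specified beyond the `[MASK]` examples they show.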

4 Analysis of Accuracy Rate for Neural Network Models

Experiment Results. In this chapter, we will further analyze the test results of each fine-tuned model. The following table shows the accuracies (Table 1).

Table 1. The accuracy of each CDC classification model.

Model              Correct  Multiple  Wrong  Accuracy
bert-base-chinese  86.8%    10.2%     3.0%   97.0%
BERT-wwm-ext       86.9%    10.2%     2.9%   97.1%
MacBERT-Large      87.7%    10.0%     2.3%   97.7%

As we can see from the table, the proportion of strictly correct predictions for each model was 86.8%, 86.9%, and 87.7%, respectively. Despite the significant gap in the amount of pre-training data across the models, the overall accuracy differed little. This suggests that the quality of the training data largely determines the performance of the model and that our training data were of good quality.

Multiple Answers. We found that about 10% of the sentences in our data can have multiple answers. There are two main reasons why sentences can have multiple answers. First, some sentences have insufficient contextual information. For instance, example (1) lacks contextual information about the exact direction, making both complements applicable despite their semantic differences. Second, multiple directional complements have similar functions that make them interchangeable, as shown in example (2).

(1) 你看他把碗拿[MASK]了。(label:下去; Prediction:出来)
(2) 我认为蔡英文接[MASK]还要面对更多事情。(label:下去; Prediction:下来)

Therefore, for sentences that even native speakers think can have multiple answers, it cannot be said that the answer predicted by BERT is wrong. When multiple answers are considered correct, the models’ overall accuracy increases by about 10 percentage points, with MacBERT-large remaining the highest.

Wrong Answers. Unlike the previous samples of multiple answers, there are also wrong answers, where predictions are not applicable in given sentences. These errors


make up 2.3% to 3.0% of our test sentences, depending on the model, and most of the cases are due to the co-occurrence of verbs and complements. As seen in the following sentences, the frequency of “verb + label” is lower than that of “verb + prediction”. We also found another feature in these error sentences: they have implicative meanings.

(3) 这么亲热, 一天两天的还真混不[MASK]。(label:出来; Prediction:下去)
(4) 看见金字塔就有一种恒心, 一定要把二战熬[MASK], 人类有和平才有希望。 (label:下来; Prediction:下去)

In example (3), 混不出来 implies that the relationship between the two is very close, and in example (4), 熬下来 implies that they will certainly endure difficult times. Therefore, we can summarize that low frequencies and implicative meanings interfere with BERT’s prediction, which causes wrong answers.

Complement Accuracy. Table 2 below shows the individual scores for the four directional complements. The accuracy was calculated by counting multiple answers as correct. Among the four complements, 起来 (get up) had the highest prediction accuracy, followed by 出来 (come out), 下来 (come down), and 下去 (go down) in descending order. We only include the table for MacBERT-large here, but the ranking of the four directional complements was the same in all three models.

Table 2. The accuracy of each CDC predicted by MacBERT-large.

CDC    Correct  Multiple  Wrong  Accuracy
出来   91.7%    6.7%      1.6%   98.4%
起来   95.2%    4.1%      0.7%   99.3%
下来   87.5%    10.5%     2.0%   98.0%
下去   76.2%    18.9%     4.9%   95.1%
Total  87.7%    10.0%     2.3%   97.7%
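
As a worked check of how the Accuracy column relates to the other columns, the following sketch recomputes it from the Table 2 percentages, counting multiple answers as correct. The function name `lenient_accuracy` and the data layout are illustrative assumptions; the numbers are those reported in Table 2.

```python
# Percentages from Table 2 (MacBERT-large): (correct, multiple, wrong).
TABLE_2 = {
    "出来": (91.7, 6.7, 1.6),
    "起来": (95.2, 4.1, 0.7),
    "下来": (87.5, 10.5, 2.0),
    "下去": (76.2, 18.9, 4.9),
    "Total": (87.7, 10.0, 2.3),
}

def lenient_accuracy(correct, multiple):
    """Accuracy when acceptable alternative answers count as correct."""
    return round(correct + multiple, 1)

for cdc, (correct, multiple, wrong) in TABLE_2.items():
    print(cdc, lenient_accuracy(correct, multiple))

# Rank the four complements by lenient accuracy, highest first.
ranking = sorted(
    (cdc for cdc in TABLE_2 if cdc != "Total"),
    key=lambda cdc: lenient_accuracy(*TABLE_2[cdc][:2]),
    reverse=True,
)
print(ranking)
```

The recomputed values agree with the Accuracy column, and the ranking reproduces the order given in the text.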

5 How Does BERT Pay Attention to CDC Selection?

5.1 Model Analysis with Sampling and Occlusion

Interpreting neural network models is a challenging research area, and there are various approaches to explaining why a model has made certain decisions. To analyze and interpret our CDC classification models, we used the Sampling and Occlusion (SOC) algorithm proposed by [9] because it enables hierarchical analysis and visualization. The SOC algorithm is a formal and general way to quantify the importance of each word and phrase in a sentence. It outperforms existing hierarchical explanation algorithms such as agglomerative contextual decomposition (ACD) because it approximates the N-context independent importance by sampling multiple sentences. We further visualized the syntactic composition captured by the models for linguistic analysis.

Training Models. We trained four different binary classifiers, labeled true if the corresponding directional complement should replace the padding (e.g. [MASK]) token in a sentence and false if not. The model architecture was changed from the previous


four-label classifier to fit the SOC algorithm. We used the CDC corpus to train each model. The pre-trained model used was bert-base-chinese, considering its accuracy and model size. Every model was trained with a learning rate of 5e-5, a training batch size of 64, and 5 epochs. The test accuracy of each model is given in the following table (Table 3).

Table 3. The accuracy of each binary CDC classifier.

Directional complement  Accuracy
出来                    94.87%
起来                    95.47%
下来                    93.25%
下去                    92.87%
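
The change from one four-label classifier to four binary classifiers can be sketched as a relabeling of the same masked sentences. This is an assumption about the data layout: the source does not say how negative examples were chosen, so the sketch simply treats every sentence with a different gold complement as a negative. The function name `to_binary` is hypothetical; the sentences are examples (1) to (3) from the paper.

```python
COMPLEMENTS = ["出来", "起来", "下来", "下去"]

def to_binary(examples, target):
    """Relabel (masked_sentence, gold_complement) pairs for one complement.

    The label is True when `target` is the gold complement, else False.
    """
    return [(sentence, gold == target) for sentence, gold in examples]

# Masked sentences and gold labels taken from examples (1)-(3).
examples = [
    ("你看他把碗拿[MASK]了。", "下去"),
    ("我认为蔡英文接[MASK]还要面对更多事情。", "下去"),
    ("这么亲热, 一天两天的还真混不[MASK]。", "出来"),
]

# One training set per binary classifier.
binary_sets = {c: to_binary(examples, c) for c in COMPLEMENTS}
for complement, data in binary_sets.items():
    print(complement, [label for _, label in data])
```

Each of the four resulting datasets would then train its own true/false classifier, which gives the SOC algorithm a single prediction score per complement to explain.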

Algorithmic Details. The SOC algorithm is an extension of the input occlusion algorithm [10], which calculates the importance of a phrase p specific to an input sentence x. The importance score is measured as the difference in prediction score when the phrase p is replaced with padding tokens; the occluded sentence is denoted here as x_p. However, the score measured by the input occlusion algorithm depends on the context words in sentence x. The SOC algorithm overcomes this limitation by sampling the N-context surrounding the phrase p from a pre-trained language model. For the language model, we trained a BiGRU [11] language model on our training data. We set N to 5, so five contexts are replaced when generating a set of sentences S. The final score is averaged over the sampled sentences \hat{x} \in S with the following formulation, which is a simplified version of the formulation in [10]:

$$\phi(p, x) = \frac{1}{|S|} \sum_{\hat{x} \in S} \left[ \mathrm{score}(\hat{x}) - \mathrm{score}(\hat{x}_p) \right] \tag{1}$$
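As a rough illustration of the averaging in Eq. (1), the sketch below computes the score difference between each sampled sentence and its occluded counterpart and averages over the samples. It is a toy under stated assumptions, not the authors' implementation: the real method samples contexts from a trained language model, whereas here contexts are drawn uniformly from a small hypothetical vocabulary, and `toy_score` stands in for a classifier's prediction score.

```python
import random
from typing import Callable, List

PAD = "[PAD]"
Tokens = List[str]


def occlude(tokens: Tokens, start: int, end: int) -> Tokens:
    """Replace the phrase tokens[start:end] with padding tokens."""
    return tokens[:start] + [PAD] * (end - start) + tokens[end:]


def resample_context(tokens: Tokens, start: int, end: int, radius: int,
                     vocab: List[str], rng: random.Random) -> Tokens:
    """Resample every token within `radius` of the phrase, keeping the phrase.

    Assumption: uniform sampling from `vocab` replaces the language model.
    """
    out = list(tokens)
    lo = max(0, start - radius)
    hi = min(len(tokens), end + radius)
    for i in range(lo, hi):
        if not (start <= i < end):
            out[i] = rng.choice(vocab)
    return out


def soc_importance(tokens: Tokens, start: int, end: int,
                   score: Callable[[Tokens], float], vocab: List[str],
                   radius: int = 5, n_samples: int = 20, seed: int = 0) -> float:
    """Average score(x_hat) - score(occluded x_hat) over sampled sentences."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x_hat = resample_context(tokens, start, end, radius, vocab, rng)
        total += score(x_hat) - score(occlude(x_hat, start, end))
    return total / n_samples


def toy_score(tokens: Tokens) -> float:
    """Hypothetical stand-in for a classifier: fires only on the adverb 继续."""
    return 1.0 if "继续" in tokens else 0.0


tokens = ["他", "继续", "吃", "了", "[MASK]"]
vocab = ["我", "她", "慢慢", "已经"]
print(soc_importance(tokens, 1, 2, toy_score, vocab))  # importance of 继续
print(soc_importance(tokens, 2, 3, toy_score, vocab))  # importance of 吃
```

Because the context is resampled while the phrase itself is kept fixed, the resulting score reflects what the phrase contributes independently of the particular words around it, which is the motivation given above for preferring SOC over plain input occlusion.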

Using the SOC method for linguistic interpretation is our main contribution. After measuring the importance scores with the SOC algorithm, we used LTP [12] to parse the sentence into words. In-depth syntactic analysis was made possible by examining the word-level score differences in sentence x. After each phrase’s score was calculated, we visualized the result by marking the most important phrase in red and the least important phrase in blue. An example visualization of the hierarchical SOC explanation is shown in Fig. 3.

Fig. 3. The visualization of the Sampling and Occlusion algorithm result of the sentence 他拿起一个红苹果, 继续吃了[MASK]。

5.2 Linguistic Interpretation of CDC Selection Constraints

CDC is a subgroup of Chinese complements that follow a verb or adjective to indicate the direction of movement, the result of an action, or a change of state. As a CDC is a sentence component combined with the main predicate, the most important clues for CDC selection are verbs and adjectives. However, in addition to verbs and adjectives, adverbs, prepositional phrases, auxiliary verbs, and temporal expressions also play an essential role in CDC selection [13].

Verbs and Adjectives. Verbs and adjectives are the words that the BERT model pays the most attention to when predicting a CDC. However, there are certain restrictions on the choice, depending on the type of verb or adjective. For example, 吃, 写, and 看 can combine with various CDCs, but verbs such as 停, 站, and 贴 only combine with certain CDCs. In particular, Chinese adjectives are more restricted in their choice than verbs, so when certain adjectives are used as main predicates, it is easy to predict an appropriate CDC. For example, adjectives with dynamic situations or positive semantic prosody are often combined with 起来, as in 热闹起来 and 高兴起来. Conversely, adjectives with a static situation or negative semantic prosody (安静, 平静, 黑, 暗, etc.) are often combined with 下来. As shown in Fig. 4, in the BERT model, 热闹 or 安静 plays a positive role in CDC selection (the closer to red, the more positive the contribution). Given these words, the BERT model immediately predicts the appropriate CDC.

Adverbs. Various adverbs appear in CDC constructions. Some adverbs (已经, 刚刚, 终于, etc.) correspond to motion events that have already been completed, while others (一直, 渐渐, 慢慢地, etc.) correspond to the durative meaning of the action. In many cases, the predictability of the CDC increases depending on whether an adverb is used in the sentence. For example, 渐渐 (increasingly) means “greater in number or amount”, which is semantically very close to the CDC 起来, because 起来 indicates the beginning of an action or entry into a new state. The adverb 继续 (continuously) means that an “event continues for a while without stopping”. Therefore, it is appropriate to combine it with 下去, which expresses the continuation of the action. As shown in Fig. 4, the BERT model predicts the appropriate CDC based on a specific adverb.

Fig. 4. Predictive weight of verbs, adjectives, and adverbs


Prepositional Phrase. A prepositional phrase is a component representing the place or time of an action, and it can serve as an important clue in determining the direction of movement. In addition, 把, which can be regarded as a special preposition (sometimes called a disposal marker), also influences CDC selection. In Fig. 5, we can see that prepositional phrases play a positive role in CDC selection.

Auxiliary Verbs. Auxiliary verbs represent the speaker’s ability, will, and desire. They mainly refer to a future in which the action has not yet occurred. Therefore, when auxiliary verbs are used, they have a certain influence on CDC selection. If auxiliary verbs (能, 会, etc.) are used in a sentence, it is possible to determine which CDC is appropriate later in the sentence. As shown in Fig. 5, the BERT model used auxiliary verbs as a positive clue for CDC selection.

Temporal Expressions. Temporal expressions referring to the past or the future also play a role in CDC selection. A CDC indicates the beginning, continuation, or completion of an action, so its meaning is closely related to time. For example, 下来, which represents the completion of an action, usually implies that the action started at some point in the past and was completed in the present. Therefore, in sentences in which 下来 is used, temporal expressions indicating past situations are often observed. On the other hand, 下去, which indicates the continuation of an action, generally describes a situation in which an action continues to a point in the future. Therefore, 下去 and future markers have a semantic correspondence. This tendency is captured in the BERT prediction model (Fig. 5).

Fig. 5. Predictive weight of prepositions, auxiliary verbs, and temporal expressions


As shown above, when choosing a CDC, a neural network model such as BERT does not simply predict based on the frequency of verbs or adjectives. In addition to the verb, it considers other components of the sentence. When adverbs, auxiliary verbs, or temporal expressions are added to a sentence, the predicted probability of each CDC changes, and so does the CDC that is selected.

5.3 Using the BERT Model for Educational Purposes

Using the BERT model to study the grammatical functions of CDCs is meaningful in itself, but it can be more valuable through educational use. If the BERT model and the SOC algorithm are appropriately utilized, it will be possible to establish an application system for learning and tutoring CDCs. Our methodology is particularly worth applying in the field of Teaching Chinese as a Second Language (TCSL).

Korea is one of the countries with a very large number of Chinese learners. Many secondary schools and universities in Korea teach Chinese as a second language, and the number of Chinese literature departments established in universities is the second largest in the world, right after China. Korea is also the country where the largest number of students in the world take the HSK test. These facts suggest that a BERT-based tutor that has acquired advanced knowledge of Chinese grammar can be helpful.

Most students rely on Chinese instructors or textbooks. However, compared to students’ demand, there are too few Chinese tutors, and textbooks often do not have enough examples. In particular, Chinese directional complements are easily confused by Korean students due to their derivative meanings. Unfortunately, instructors’ explanations are teacher-dependent, subjective, and time-consuming, and paper textbooks also omit explanations of complex grammar points on CDCs.

Applying the SOC explanation and visualization when learning CDCs can help solve these problems in two ways. First, the BERT model can automatically give a clue for choosing the right CDC. Students can learn the relationship between specific clues and the CDC and deepen their understanding of its use. In addition, the visualization tool enables more intuitive learning by arousing students’ interest. Second, the ability to analyze various sentences beyond the scope of textbooks and manuals will also help with efficient learning. Compared to textbooks with limited content, BERT can draw on a practically unlimited range of contexts from big data and can better improve students’ knowledge and skills. The result of this research can serve as something like a CDC-related HSK question bank. It lessens tutors’ burden of giving good examples and explanations to every student.

6 Conclusion

In this study, we investigated how accurately the BERT model can predict the CDC. We also analyzed which words the BERT model uses as essential clues in the CDC inference process. The results show that the BERT model performs excellently in inferring distributional features and grammatical relationships based on transfer learning. Experiments with four types of CDC with different meanings and functions show that the prediction accuracy is relatively high (92.87% to 95.47%, Table 3). In addition, the analysis using the SOC algorithm showed that the BERT model appropriately uses important clues to determine the CDC in context. We believe that this study is meaningful for NLP and provides insight into Chinese grammar research and TCSL. If this methodology is utilized correctly, it will be possible to establish an application system for Chinese grammar research and education. In neural network models, learning from sufficient language data allows us to predict which language expressions are more natural to use. Proper use of these advantages will give us insight into Chinese grammatical functions. Such a Chinese grammar prediction system will also help Chinese learners improve their skills by showing them which expressions are grammatically correct.

Acknowledgments. This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2020S1A5A2A01045437).

References

1. Tenney, I., Das, D., Pavlick, E.: Bert rediscovers the classical NLP pipeline. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 4593–4601 (2019)
2. Che, H.: Error Analysis of Chinese Complements Acquisition by Korean students. School of Liberal Arts, Liaoning Normal University Doctoral dissertation (2006). (in Chinese). (车慧. 韩国留学生习得汉语补语的偏误分析. 辽宁师范大学文学院.)
3. Jung, E.: Difficulties and Strategies for Korean Students in Learning Chinese Grammar. East China Normal University Doctoral dissertation (2010). (in Chinese)
4. Yang, Q.: Study on the Learning Method of Chinese Complement: Focusing on the error analysis of Korean learners. Dong-A university Doctoral dissertation (2019). (in Korean)
5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
6. Zhuang, F., et al.: A comprehensive survey on transfer learning. Proc. IEEE 109(1), 43–76 (2020)
7. Cui, Y., et al.: Pre-training with whole word masking for Chinese bert. arXiv preprint arXiv:1906.08101 (2019)
8. Cui, Y., Che, W., Liu, T., Qin, B., Wang, S., Hu, G.: Revisiting pre-trained models for chinese natural language processing. arXiv preprint arXiv:2004.13922 (2020)
9. Jin, X., Wei, Z., Du, J., Xue, X., Ren, X.: Towards hierarchical importance attribution: explaining compositional semantics for neural sequence models. arXiv preprint arXiv:1911.06194 (2020)
10. Li, J., Monroe, W., Jurafsky, D.: Understanding neural networks through representation erasure. arXiv preprint arXiv:1612.08220 (2016)
11. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)
12. Che, W., Li, Z., Liu, T.: LTP: a Chinese language technology platform. In: Conference: COLING 2010, 23rd International Conference on Computational Linguistics, Demonstrations Volume, August 2010, Beijing, China, pp. 23–27 (2010)
13. Kang, B.: Deep learning language model and Chinese grammar. J. Chin. Lit. 106 (2021). The Society for Chinese Language and Literature. (in Korean) (강병규. 딥러닝 언어모델 과 중국어문법, 중국문학)
14. Han, Y., Zhong, M., Zhou, L., Zan, H.: Statistical analysis and automatic recognition of grammatical errors in teaching Chinese as a second language. In: Hong, J.-F., Zhang, Y., Liu, P. (eds.) CLSW 2019. LNCS (LNAI), vol. 11831, pp. 406–414. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38189-9_42

Chunk Extraction and Analysis Based on Frame-Verbs

Chengwen Wang1, Gaoqi Rao2, Endong Xun2, and Zhifang Sui1(✉)

1 The MOE Key Laboratory of Computational Linguistics, Peking University, Beijing, China
{wangcw,szf}@pku.edu.cn
2 Institute of Big Data, Beijing Language and Culture University, Beijing, China

Abstract. Verbs have always been a focus and a point of difficulty in linguistics and natural language processing, in China and abroad. Taking Chinese word construction features into account, we suggest that studies of verbs should be conducted in a systematic manner, with attention paid to the interaction between verbs and related sentence patterns along with frame units. In view of this, this paper focuses on frame-verbs and extracts 142,142 chunk instances from a 2 TB-scale corpus. In addition, based on the HowNet noun ontology, we analyze the distribution of semantic categories in the chunk instances. This work describes an empirical approach to the study of the interaction between verbs and chunks.

Keywords: Big data · Chunk · Frame-verb · Interaction

1 Introduction

The study of Chinese verbs has always been a popular topic in linguistics, with a focus on the core and central position of verbs in sentences. Many scholars subscribe to the “verb center theory” [1–3], according to which verbs form the syntactic and semantic core of sentences, and the syntactic and semantic relations between verbs and their surrounding constituents effectively capture the deep semantic relations of sentences. Research in theoretical linguistics and language teaching focuses on the definition and analysis of verbs [4, 5], verb valence [5, 6], verbs and sentence patterns [2, 7], verbs and constructions [8], as well as the syntactic and semantic functions of special subcategories of verbs. Language learners are accustomed to associating verbs with sentence structures, partial structures, frame structures, and fixed collocations for language learning and production. In view of this, a systematic view should be taken in the study of verbs, based on their interaction with other linguistic units.

At the same time, Chinese has constructional characteristics. In language use, a speaker retrieves and processes some chunk constructions as unanalyzed wholes. Many chunk elements with verbs as their core exist in language facts, such as “为…着想 (wei4…zhuo2xiang3, to think of somebody)”, “向…看齐 (xiang4…kan4qi2, to keep up with)”, “打擦边球 (da3ca1bian1qiu2, to take a chance)”, “拍马屁 (pai1ma3pi4, to lick somebody's boots)”.

© Springer Nature Switzerland AG 2022
M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 417–425, 2022. https://doi.org/10.1007/978-3-031-06703-7_32


Based on the above, it is necessary to build a knowledge representation system for the interaction between verbs and chunks in the study of language. Taking the frame-verb in Chinese as the starting point, this paper aims to extract frame-chunks from an empirical perspective to provide an exploratory approach to investigating the interaction between verbs and chunks.

2 Representation System of the Interaction Between Frame-Verb and Chunk

2.1 Definition of Frame-Verb and Its Quantity

Shen [10] defined frame-verbs as verbs that are presented in a frame structure syntactically. They can only be used as predicates together with prepositions and have the semantic features of [+directivity], [+one-way] and [+autonomy]. Shen puts the number of frame-verbs at 88 and divides them into typical and atypical frame-verbs, according to whether they fully conform to the aforementioned syntactic and semantic features. For example, “着想 (zhuo2xiang3, to consider)” and “争光 (zheng1guang1, to win honor)” are typical frame-verbs, whereas “打针 (da3zhen1, to take an injection)” and “挑战 (tiao3zhan4, to challenge)” are atypical frame-verbs.

In this paper, we consider the frame-verb system as a continuum, in which the syntactic and semantic differentiation of verbs is not a rigid distinction between “typical” and “atypical”. Shen [10] classified “鼓气 (gu3qi4, to cheer for)” and “看齐 (kan4qi2, to keep up with)” as typical frame-verbs, on the assumption that they must co-occur with a preposition when used as a predicate. However, in BCC (Beijing Language and Culture University Chinese Corpus) [9], there are cases where they appear in the predicate position but do not co-occur with a preposition. Examples are as follows:

(1) “陆家嘴的目标, 是早日‘看齐’纽约曼哈顿、伦敦金融城等21世纪著名国际金融中心区。” (Lujiazui’s goal is to “keep pace” with Manhattan, the City of London, and other famous international financial centers in the 21st century.)
(2) “这类工程的初衷原是让人们有一个比较清晰的‘参照系’, 有一个令上下满意的标准供人们‘看齐’, 用意不错。” (The original intention of such projects was to give people a clear “frame of reference” and a satisfactory standard to “keep up with”, which was a good idea.)
(3) “我们不害怕, 我们互相扶持互相鼓气！” (We are not afraid, we support each other and encourage each other!)

In view of this, we propose to describe the frame-verb subcategories as a continuum, as shown in Fig. 1. From left to right in Fig. 1, the extent to which frame-verbs satisfy the definition decreases. In this paper, we adopt a broad criterion for defining frame-verbs: they are identified as verbs that can be closely matched with prepositions to form a semantically self-sufficient framed structure. The preposition here is mainly limited to the case markers of the semantic role “邻体 (lin2ti3, the adjacent semantic role)”, such as “为 (wei4, for)”, “向 (xiang4, forward)”, “对 (dui4, toward)” and so on.

[Figure: prototype members → a decrease in typicality → edge members]

Fig. 1. Schematic diagram of continuum

According to the above-mentioned definition, verbs with adjacent semantic roles and their case marks are extracted from the Chinese verb semantic role knowledge base [11], which was developed from the perspective of ternary collocation, as the basis for subsequent frame-chunk extraction. The total number of verbs extracted is 2,134. Partial sample data are shown in Table 1.

Table 1. Examples of frame-verbs and their case marks

Verbs                              The case mark of adjacent semantic role
着想 (zhuo2xiang3, to consider)    替 (ti4, for), 为 (wei4, for)
授予 (shou4yu3, to award)          向 (xiang4, to)
咨询 (zi1xun2, to consult)         向 (xiang4, to)
赋能 (fu4neng2, to energize)       为 (wei4, for)
助力 (zhu4li4, to assist)          为 (wei4, for)

Through sampling observation of the 2,134 verbs, we found three types of frame-verbs that match our definition. The first is the typical frame-verb as defined by Shen [10], the second is the trivalent verb, and the third is the verb-object structure. Examples include “帮助 (bang1zhu4, to help)”, “赋能 (fu4neng2, to enable)”, “对话 (dui4hua4, to dialogue)”, etc.

2.2 Construction of Chunk System

Research in Chinese linguistics mainly classifies chunks from the perspectives of syntax, semantics, and pragmatics. Hence, there are often overlaps between different chunk types. To extract chunks automatically, we need to consider computational characteristics and pay attention to form recognition. The established chunk system should not only conform to language facts, but also be convenient for computers to extract chunks by using formal markers. The chunk classification system used in this study is shown in Fig. 2.

[Figure: classification tree of Chinese chunks. Branch labels: clauses without punctuation inside, clauses with punctuation inside; integrated, non-integrated; fixed form, non-fixed form; double morphemes, two or more words. Leaves: connected chunks, frames, separable words, collocations, sentence connections.]

Fig. 2. Chinese chunk system

In Fig. 2, there are five types of chunks in the rightmost boxes: connected chunks, sentence connections, frames, separable words, and collocations. The other information in Fig. 2 shows the distinguishing features of the classification:

a) Distribution Range: Whether the chunk spans any punctuation, that is, whether punctuation is included in the middle of a chunk.
b) Unit Integrity: Integrity here implies that a chunk is a concrete language expression, which does not have any constituents that need to be filled.
c) Formal Certainty: Whether the constituent units of a chunk have definite forms and whether the order of chunk constituents or the constituents to be filled is preserved.
d) Composition Type: Whether a constituent of the chunk is a morpheme or a word.

Definitions of the various chunks are as follows:

a) Connected Chunk
It refers to a language unit composed of two or more words, which can be a phrase in a sentence, or one or more punctuated sentences. Such chunks correspond to idioms, slang, and social terms.

b) Sentence Connections
Connectives between sentences include correlative words, ordinal words, etc. For example, correlative words: “因为 (yin1wei4, because)…所以 (suo3yi3, so)”; ordinal words: “第一 (di4yi1, first), 第二 (di4er4, second)”.

c) Frame
This kind of chunk is composed of frame words and slots. After the slots are filled, it can form phrases or punctuated sentences, such as “为…起见 (wei4…qi3jian4, for the sake of…)”, “在…期间 (zai4…qi1jian1, during the time of)”, “不…不… (bu4…bu4…, no…no…)”, “越…越… (yue4…yue4…, the more…the more…)”, “…怎么样? (…zen3me1yang4, What/How about…?/What if…?)”.

d) Separable Word
Such a chunk is composed of two morphemes which, when combined, correspond to a word, and which may appear discontinuously and in different orders in actual usage. For example, “洗 (xi3, to wash)/澡 (zao3, a shower)”, “打 (da3, to fight)/架 (jia4, a battle)”.

e) Collocation
It is composed of two or more words, which can appear in succession or in a discrete sequence, and whose relative order is not fixed. For example, “提升 (ti2sheng1, to improve)/能力 (neng2li4, abilities)”, “丰硕 (feng1shuo4, plentiful/rich)/成果 (cheng2guo3, achievement)”, “戴 (dai4, to wear)/高帽 (gao1mao4, a tall hat) (to flatter excessively)”, “唱 (chang4, to sing)/高调 (gao1diao4, a high tune) (to speak pretentiously)”. Note that if two words appear in succession with their relative order fixed, the constituent they form belongs to the category of Connected Chunk.

The frame structure formed by the frame-verb and the case marker corresponds to the frame chunk in the chunk system. In the subsequent chunk extraction, we need to extract the slots in the frame, i.e., the adjacent semantic role instances of specific verbs. For example, the verb “看齐 (kan4qi2, to keep up)” has the frame structure “向*看齐 (xiang4*kan4qi2, to keep up with)”, where the specific instances (in the “*” position) are extracted from the large-scale corpus to fill the frame and form a chunk.

3 Frame-Chunk Extraction

3.1 Chunk Extraction System

Based on the BCC system, we developed a retrieval and extraction system with stronger retrieval capabilities to conduct knowledge extraction of chunks. The corpus has a size of 2 TB, including data from science and technology fields, news, microblogs, and literature, which provides sufficient multi-domain support for the extraction of frame-chunks. Retrieval queries in BCC consist of basic queries and advanced queries. A basic query consists of strings, attribute symbols, and wildcards. An advanced query adds conditional statements or output statements to a basic query. Statements are separated by “;” and written within “{}” after the basic search query, in the following form:

    Query{cond-1; cond-2; …; cond-i; print($i)}    (1)

“Query” denotes the basic search query; the conditional statements within “{}” restrict the content of the basic query; the output statement restricts the output content, and only one output statement can be used in an advanced search query. Restricted parts of a query need to be separately enclosed within “()”, and, according to the order in which the “()” appear, they can be referenced by a “$” symbol followed by a serial number for conditional or output restriction. The component in the first “()” is denoted by “$1”, and so on. One example:

    向*(n)看齐w{print($1)}    (2)
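The advanced-query form described here is regular enough to be assembled mechanically. The sketch below builds such query strings in Python and reproduces the 看齐 query discussed in this section. The helper names are hypothetical, and only the surface syntax stated in the text (statements in “{}”, separated by “;”, a single output statement) is assumed; nothing is implied about the actual BCC client interface.

```python
from typing import Sequence


def build_advanced_query(basic: str, conditions: Sequence[str] = (),
                         output: str = "") -> str:
    """Append conditional statements and at most one output statement.

    Hypothetical helper: statements are joined with ';' inside '{}' after
    the basic query, following the form given in the text.
    """
    statements = [c.strip() for c in conditions if c.strip()]
    if output.strip():
        # Only one output statement is allowed, so it is a single argument.
        statements.append(output.strip())
    if not statements:
        return basic
    return basic + "{" + "; ".join(statements) + "}"


def frame_query(case_marker: str, verb: str) -> str:
    """Build the query for a frame: marker + * + noun slot + verb + punctuation."""
    basic = f"{case_marker}*(n){verb}w"
    # '$1' refers to the component enclosed in the first pair of parentheses.
    return build_advanced_query(basic, output="print($1)")


print(frame_query("向", "看齐"))
print(build_advanced_query("Query", ["cond-1", "cond-2"], "print($1)"))
```

Generating the queries from (case marker, verb) pairs in this way keeps the query file consistent with the frame-verb list, since each query is derived from the same source data rather than written by hand.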

The “向*(n)看齐w” before “{print($1)}” is the basic search query, which means that a string of the form “向 + * + 看齐 + w” (xiang4 + * + kan4qi2 + w) should be searched. Here, “w” is a punctuation mark, and “print($1)” within “{}” means that the output will simply be “n”, the part enclosed in the first “()”.

3.2 Chunk Extraction Process

The extraction process used in this research is as follows:

First, we formed chunk expressions according to the correspondence between frame-verbs and their case markers. For example, the case markers for the adjacent semantic role of “有关 (you3guan1, to have relations with)” are “与 (yu3, with)” and “跟 (gen1, with)”, so the frame expressions that they form are “与…有关 (yu3…you3guan1, to have relations with)” and “跟…有关 (gen1…you3guan1, to have relations with)”. The 2,154 frame expressions formed by verbs and case markers are kept in a file named FrameExpress.

Second, we made full use of natural annotation information (mainly punctuation and position information) to develop chunk retrieval queries. As complete linguistic units, frame-chunks have formal and semantic integrity and are semantically self-contained and relatively independent. They are complete in expressing meaning and can be used alone in certain contexts. Based on this understanding, natural annotation information is added when summarizing the extraction rules. In the BCC corpus, “w” is a position identifier, and “向*看齐w” means that a punctuation mark appears after “向 + * + 看齐”. This ensures the independence and self-sufficiency of the unit. With the accumulation effect of big data, such an extraction rule achieves a desirable matching effect. Therefore, the expressions in FrameExpress are processed into retrieval formulas of the form “p*n v w”, where “p” represents the case marker, “n” the noun, “v” the verb, and “w” the end-of-sentence punctuation symbol. All search formulas are stored in the query file.

Finally, we used the Web API provided by BCC to extract chunks in batches and sorted the results using a Perl program. The retrieval formulas stored in the query file are read line by line and submitted to the cloud service to extract chunk knowledge from BCC. Examples of partial extraction results are shown in Table 2.

Table 2. Partial chunk extraction instances

Chunk instances                                                              Frequency
和因素有关 (he2yin1su4you3guan1, to have relations with factors)             986
和深浅有关 (he2shen1qian3you3guan1, to have relations with depth)            571
和人有关 (he2ren2you3guan1, to have relations with human beings)             400
与因素有关 (yu3yin1su4you3guan1, to have relations with factors)             5,288
与程度有关 (yu3cheng2du4you3guan1, to have relations with extents)           1,317
与压力有关 (yu3ya1li4you3guan1, to have relations with pressure)             932
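The role of the punctuation boundary in this process can be illustrated offline with a small sketch. It uses a regular expression over a toy list of sentences instead of the BCC service, and it is a simplification under stated assumptions: there is no part-of-speech information, so the slot is any run of non-punctuation characters rather than a noun, and the punctuation inventory and sample sentences are invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed punctuation inventory for the toy example.
PUNCT = "，。！？；、,.!?;"
_PUNCT_CLASS = re.escape(PUNCT)


def frame_pattern(case_marker: str, verb: str) -> re.Pattern:
    """Regex analogue of 'p * v w': marker, slot, verb, then punctuation."""
    return re.compile(
        re.escape(case_marker)
        + "([^" + _PUNCT_CLASS + r"\s]+?)"   # slot: shortest non-punctuation run
        + re.escape(verb)
        + "(?=[" + _PUNCT_CLASS + "])"       # verb must be followed by punctuation
    )


def extract_chunks(sentences: Iterable[str],
                   frames: List[Tuple[str, str]]) -> Counter:
    """Count filled frame instances such as 和努力有关 in the sentences."""
    sentences = list(sentences)
    counts: Counter = Counter()
    for marker, verb in frames:
        pattern = frame_pattern(marker, verb)
        for sentence in sentences:
            for match in pattern.finditer(sentence):
                counts[marker + match.group(1) + verb] += 1
    return counts


corpus = [
    "这个结果与压力有关。",
    "成绩和努力有关，也和方法有关。",
    "我们向先进看齐！",
    "这与他有关系。",  # no punctuation right after 有关, so nothing is extracted
]
frames = [("与", "有关"), ("和", "有关"), ("向", "看齐")]
for chunk, freq in extract_chunks(corpus, frames).most_common():
    print(chunk, freq)
```

The last toy sentence shows why the boundary matters: without the requirement that punctuation follow the verb, “与他有关” would be extracted from “有关系”, where 有关 is not the frame-verb at all.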


4 Results Statistics and Analysis

The retrieval queries constructed with the help of natural annotation information are relatively strict and help ensure the correctness of the extracted chunk instances. Based on 2,154 frame-verbs, 3,628 retrieval queries were constructed and 142,142 chunk instances were extracted from the 2 TB-scale corpus. In this part, we statistically analyze the verbs further, from the perspective of the number and semantic categories of chunk instances. We also calculate the correspondence between the semantic categories of the instances and the adjacent semantic roles. Figure 3 shows the ten verbs with the largest number of extracted chunk instances.

[Figure: bar chart titled “Top 10 Verbs by the Number of Chunk Instances”; y-axis: Frequency, 0 to 6000]

Fig. 3. Verb distribution of Top10 chunk instances (in frequency)

From Fig. 3, we can see that several abstract verbs appear among the top 10 verbs by number of chunk instances, for example “有关 (you3guan1, relate to)”, “无关 (wu2guan1, be unrelated to)”, “合作 (he2zuo4, to cooperate)”, and “结合 (jie2he2, to combine/to unite)”. This shows that abstract verbs collocate strongly with nouns. In addition, with the help of HowNet's noun ontology system, we categorized and analyzed the noun part of the extracted chunks, and calculated the coverage of the ontology system over the nouns matched by the verbs. The specific statistical results are shown in Table 3 and Fig. 4. “Coverage rate of semantic categories to instances” in Table 3 refers to the proportion of the noun instances extracted for a specific verb that are covered by the semantic categories of nouns in HowNet. The verbs that match the largest number of semantic categories are again verbs with abstract meanings: “有关 (you3guan1, to have relations with)” matches 289 semantic categories, and “无关 (wu2guan1, be unrelated to)” matches 271. These data show that verbs with abstract meanings can combine with a wide range of semantic categories and have a stronger collocational capacity than verbs denoting specific actions.

Table 3. The number of semantic categories of chunk instances and coverage rate of semantic categories to instances

Verbs                                    Amount of semantic categories    Coverage rate of semantic categories to instances
有关 (you3guan1, relate to)              289                              0.63929
无关 (wu2guan1, be unrelated to)         271                              0.74036
结合 (jie2he2, to combine)               210                              0.62981
相关 (xiang1guan1, be related to)        197                              0.65498
相当 (xiang1dang1, to correspond)        182                              0.73333
致敬 (zhi4jing4, to salute)              176                              0.70789
类似 (lei4si4, be similar to)            171                              0.70075
不符 (bu4fu2, not match)                 168                              0.75788
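The two quantities reported in Table 3 can be computed as in the following minimal sketch. It assumes that coverage is measured over the distinct noun fillers extracted for one verb and that the ontology is available as a noun-to-category lookup. The toy lookup below is invented for illustration and is not taken from HowNet.

```python
from typing import Dict, Iterable, Tuple


def category_stats(nouns: Iterable[str],
                   ontology: Dict[str, str]) -> Tuple[int, float]:
    """Return (number of semantic categories, coverage rate) for one verb.

    Assumption: coverage = nouns found in the ontology / all extracted nouns.
    """
    nouns = list(nouns)
    if not nouns:
        return 0, 0.0
    covered = [n for n in nouns if n in ontology]
    categories = {ontology[n] for n in covered}
    return len(categories), len(covered) / len(nouns)


# Hypothetical noun-to-category lookup standing in for the noun ontology.
toy_ontology = {"人": "human", "压力": "strength", "程度": "amount"}

# Distinct slot fillers extracted for one verb (toy data).
fillers = ["人", "压力", "程度", "深浅"]
n_categories, coverage = category_stats(fillers, toy_ontology)
print(n_categories, coverage)
```

Whether the source counts distinct fillers or weights them by frequency is not stated, so a frequency-weighted variant would be an equally plausible reading of the coverage rate.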

Further statistics show that minor semantic classes account for 90% of the total number of semantic categories. Figure 4 shows the ten most frequent semantic categories.

[Figure: bar chart titled “The top 10 semantic categories”; y-axis: frequency, 0 to 2000. Legible legend entries: human, amount, component, body, behavior, place, strength, site.]

Fig. 4. The top 10 semantic categories (in frequency)

As shown in Fig. 4, “human” and “component” are the two most frequent semantic categories. The top categories also include “strength” and “body”, which are closely related to “human”. This indicates that adjacent semantic role instances are mainly filled by the semantic category “human” and its related categories. The data also provide some empirical support for the mapping between semantic categories and semantic roles.


5 Conclusion

This study extracts 142,142 chunk instances from a 2 TB-scale corpus, providing data support for research on frame-chunks. Furthermore, the study of the interaction between frame-verbs and chunks also provides an approach to the study of verbs in linguistics. At the same time, there is ample room for improvement. First, only the boundary information of punctuation marks is currently used to ensure the quality of chunk extraction. In future work, we may consider first conducting structural analysis of the retrieval corpus and then extracting chunks from the syntactic structure tree, to ensure a better quality of chunk extraction. Second, for the extracted instances, semantic selection restrictions on automatic acquisition can be considered in order to generalize and improve the instances. Finally, these 142,142 examples can be applied in downstream information extraction tasks, where the quality of the data can be effectively evaluated.

Acknowledgement. This paper is supported by the National Key Research and Development Program of China 2020AAA0106700, NSFC project U19A2065 and State Language Commission project ZDI135-114.

References

1. Lv, S., Zhu, D.: A Talk on Grammatical Rhetoric. The China Youth Publishing House (1952). (in Chinese)
2. Fan, X.: An Overview of Chinese Verbs. Shanghai Education Publishing House (1987). (in Chinese)
3. Wu, W.: Verb centered theory and its profound influence & notes on the study of Chinese grammar summary. Lang. Stud. 10–20 (1994)
4. Zhu, D.: The structure of “de” and sentences of judgment. Stud. Chin. Lang.1 1423–1427 (1978)
5. Li, J.: New Chinese Grammar. Commercial Press, Beijing (1992). (in Chinese)
6. Yuan, Y.: Study on the Valence of Chinese Verbs. Jiangxi Education Press, NanChang (1998). (in Chinese)
7. Li, L.: Sentence Patterns in Modern Chinese. Commercial Press, Beijing (1986). (in Chinese)
8. Su, J., Lu, J.: Syntactic analysis and teaching method of construction chunk. World Chinese Teach. 24(04), 557–567 (2010)
9. Xun, E., Rao, G., Xiao, X., Zang, J.: The construction of the BCC corpus in the age of big data. Corpus Linguis. 3(1), 93–109 (2016). (in Chinese)
10. Shen, P.: Framed verbs and teaching Chinese as a foreign language. In: Proceedings of the 6th East Asian Graduate Forum on Chinese language teaching in East Asia and the 9th graduate academic forum on Teaching Chinese as a foreign language in Beijing, pp. 730–738 (2016). (in Chinese)
11. Wang, C., Qian, Q., Xing, D., Li, M., Rao, G., Xun, E.: Construction of semantic role bank for Chinese verbs from the perspective of ternary collocation. J. Chin. Inf. Process. 34(09), 19–27 (2020). (in Chinese)

From ‘It’s Your Funeral’ to ‘Mouse Tail Juice’: A Quantitative Study of the Mishearings in Danmu Videos

Yihan Zhou

Department of East Asian Languages and Cultures, University of Illinois at Urbana-Champaign, Champaign, USA

Abstract. Mishearings are a homophonic phenomenon in online danmu videos. This paper applies a quantitative method to investigate the semantic features of mishearings in danmu videos. The study provides a definition of mishearings in danmu videos and distinguishes them from other homophonic phenomena on the Internet. In addition, it argues that deconstruction and carnivalization are the two motivations of mishearings in danmu videos. Finally, it demonstrates three semantic characteristics of danmu mishearings with empirical evidence: the semantic opposition between source symbols and mishearings, the semantic competition among mishearings, and the semantic emergence of mishearings. The study concludes that danmu mishearings are a new type of meaningful homophonic phenomenon that needs further investigation.

Keywords: Mishearings · Danmu videos · Internet language

1 Introduction

The Internet has a profound impact on our lives. It has also expanded the scope and changed the structure of language [16]. Internet language is now widespread and has become a new social dialect. It can be found on websites and in virtual communities, search engines, blogs, microblogs, email, instant messaging, and online comments [11]. In addition, internet language creates new linguistic elements and rules that innovate lexicon, semantics, syntax, rhetoric, and register [26]. These features make the study of internet language a necessary topic.

The focus of this paper is an emerging type of internet language: danmu language. Danmu language was rarely discussed in earlier studies and has only gained importance since 2012 [13]. Danmu originated from the Niconico video-sharing website, which was established in Japan in 2006 and introduced to China two years later [21,29]. Danmu originally referred to a barrage in artillery tactics. In the Internet era, however, danmu refers to the large number of synchronized comments that scroll across the screen while a video is being played. Such synchronized comments are instantaneous, interactive, entertaining, and diverse [9]. They enhance the pleasure of watching videos.

Homophony is an important characteristic of danmu language. It is also known as mishearing, soramimi, and mondegreen in the literature [1,21,22]. For example, 耗子尾汁 haoziweizhi (‘Mouse tail juice’) is a neologism that originated in danmu language and has swept the Chinese Internet since November 2020. It is a mishearing of the idiom 好自为之 haoziweizhi (‘It’s your funeral’). The present study aims to demonstrate the characteristics of danmu language. In particular, it focuses on the motivation and semantics of the homophonic phenomenon in danmu language. For convenience, the homophonic phenomenon in danmu will be termed “mishearings”.

This paper hopes to contribute to the existing research in two ways. First, the paper argues that danmu mishearings are a new type of homophonic phenomenon on the Internet. Many homophonic phenomena on the Internet have been examined, such as homophonic puns, eggcorns, malapropism, and unrelated homophones [15,17,32,36]. However, danmu mishearings differ from these homophonic phenomena. Second, previous studies have focused on the phonological features of mishearings, paying little attention to their semantic features [4,22]. Moreover, some authors have treated mishearings as meaningless noise and discouraged the study of their semantic features [9,31,32]. This paper, by contrast, emphasizes the semantics of danmu mishearings and argues that danmu mishearings are meaningful.

© Springer Nature Switzerland AG 2022. M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 426–439, 2022. https://doi.org/10.1007/978-3-031-06703-7_33

2 Definition, Motivation, and Semantics of Danmu Mishearings

2.1 Definition of Danmu Mishearings

The present study proposes that any homophonic phenomenon can be defined as the projection from one symbol to another symbol with a similar pronunciation. The former will be called the “source symbol” and the latter the “homophonic symbol”. The direction of phonetic projection is indicated by an arrow (→). Danmu mishearings are thus defined as a homophonic phenomenon in which a single source symbol is projected onto a large number of homophonic symbols. The homophonic symbols need not fit the context of the source symbols, nor do they depend on the meaning of the source symbols. They can have independent meanings and can even deconstruct the meaning of the source symbols. Danmu mishearings are argued to be a new type of homophonic phenomenon, and the remainder of this section compares them with other homophonic phenomena in the literature.

Homophonic Puns and Danmu Mishearings. A homophonic pun expresses two layers of meaning, one from the source symbol and the other from the homophonic symbol [7]. The source symbol is present and carries the primary meaning. In contrast, the homophonic symbol is hidden and expresses the secondary meaning [17].


Homophonic puns and danmu mishearings are different. In homophonic puns, the homophonic symbols must fit the context of the source symbols, whereas this is not the case for danmu mishearings. The following examples illustrate the difference. The asterisk indicates that the sentence does not make sense.

(1) 路由器坏了，我家就没法上网了 Luyouqi huaile, wojia jiu meifa shangwangle ‘The router is broken, so my house has no access to the Internet’
→ 陆游气坏了，我家就没法上网了 Luyou qihuaile, wojia jiu meifa shangwangle ‘Luyou was angry, so my house has no access to the Internet’ (homophonic pun)
(2) 我劝你好自为之 Wo quanni haoziweizhi ‘I advise you to behave yourself’
→ *我劝你耗子尾汁 Wo quanni haoziweizhi ‘I advise you to the mouse tail juice’ (danmu mishearing)

Eggcorns and Danmu Mishearings. Eggcorns make new interpretations of the source symbols [23]. They can be used in the same contexts as the source symbols, and they are usually hyponyms or hypernyms of the source symbols [15]. In danmu mishearings, by contrast, the homophonic symbols may neither fit the contexts of the source symbols nor have a direct semantic relation with them. The differences are shown in the following examples.

(3) 休息一下 Xiuxi yixia ‘Take a break’
→ 休息一夏 Xiuxi yixia ‘Take a summer break’ (eggcorn)
(4) 在健身房练死劲 Zai jianshenfang lian sijin ‘Practice muscle strength in the gym’
→ *在健身房练丝巾 Zai jianshenfang lian sijin ‘Practice the silk scarf in the gym’ (danmu mishearing)

Malapropism and Danmu Mishearings. Malapropism deliberately uses incorrect characters to create humor or express special emotions [36]. The homophonic symbols add extra meanings to the source symbols. This is not true of the homophonic symbols in danmu mishearings, whose meaning does not build on the meaning of the source symbols. Instead, the meaning of the homophonic symbols often deconstructs or undermines the meaning of the source symbols. In the following examples, the meaning of “profartssor” is built on the meaning of “professor”. By contrast, in “I said Tingting”, the meaning of “Tingting” does not depend on “stop”. “Tingting” becomes a fictional character, which deconstructs the meaning of “stop”.

(5) 教授 Jiaoshou ‘Professor’
→ 叫兽 Jiaoshou ‘Profartssor’ (malapropism)
(6) 我说停停 Wo shuo tingting ‘I said stop’
→ 我说婷婷 Wo shuo tingting ‘I said Tingting’ (danmu mishearing)


Unrelated Homophones and Danmu Mishearings. Unrelated homophones and their source symbols are not related in literal meaning. However, unrelated homophones are used as substitutes for their source symbols because they inherit the meaning of the source symbols [31]. For example, in context, the word 杯具 beiju ‘cups’ can be used interchangeably with the word 悲剧 beiju ‘tragedy’. Danmu mishearings, however, do not inherit the meaning of the source symbols. Rather, they undermine it. In example (8), the meaning of ‘British muscle man’ is usurped by the meaning of ‘British marble’.

(7) 悲剧 Beiju ‘Tragedy’
→ 杯具 Beiju ‘Cups’ (unrelated homophones)
(8) 英国大力士 Yingguo dalishi ‘British muscle man’
→ 英国大理石 Yingguo dalishi ‘British marble’ (danmu mishearing)

2.2 Motivations of Danmu Mishearings

It is commonly held in the literature that the essence of danmu culture is deconstructionism, which aims to deny absolute authority, challenge mainstream culture, and make its own interpretation of texts [9,30]. Meanwhile, danmu language takes a variety of forms and has become an online language carnival [3,25]. Based on previous studies, this paper argues that deconstruction and carnival are the two motivations of danmu mishearings.

Deconstruction. Deconstruction is an important postmodernist concept proposed by the French philosopher Jacques Derrida in the 1960s. This concept stresses diversity and decentralization [8]. Derrida argued that nothing can avoid being pointed to and that symbols are pointed to by other symbols in an endless cycle [6]. Therefore, meaning is always delayed and texts are nothing but transitional traces. Meaning is like a bouquet of flowers: its branches grow in all directions but are at the same time interconnected. In such a bouquet, it is impossible to find the source and origin [5]. From the perspective of linguistics, deconstructionism makes three claims: it rejects the unequal status of the signifier and the signified; it denies that speech is primary and writing is secondary; and it disrupts the static and closed nature of language [28].

Danmu mishearings are consistent with these three claims. First, in danmu mishearings, the boundary between signifier and signified is blurred. A homophonic symbol may turn into a source symbol that is pointed to by another homophonic symbol. For example, 耗子尾汁 haoziweizhi ‘mouse tail juice’ is a mishearing of 好自为之 haoziweizhi ‘it’s your funeral’. In turn, 耗子尾巴 haoziweiba ‘rat tail’ is a mishearing of 耗子尾汁 haoziweizhi ‘mouse tail juice’. Second, Derrida argues that both writing and speech are symbols and that there is no hierarchical subordination between them [6]. Derrida himself created a new concept, différance, which is a homophone of the French word différence. The meanings of the two words have to be distinguished by spelling. In danmu mishearings, too, the meaning has to be differentiated by orthography and spelling.


Thus, the homophonic symbols are not subordinate to the source symbols. Finally, the Internet has demolished the boundaries of traditional texts, allowing them to be linked to each other. It has also broken up communication monopolies and conferred the power of communication on individuals [37]. This is also true of danmu mishearings: they are linked to each other, and individuals can create their own danmu mishearings.

Language Carnival. Carnival theory provides another motivation for danmu mishearings. Carnival theory is a literary theory proposed by the Russian literary scholar Mikhail Bakhtin. Bakhtin studied various festivals, linguistic works, and square discourse as a counterreaction to the official and serious culture of the Middle Ages [20]. Such carnivals and square discourse have two characteristics. First, in the carnival square, hierarchical differences are temporarily abolished. Social norms and taboos also lose their binding force, allowing people to connect with each other in an unrestrained way. Second, in carnivals, everyone is both an actor and an audience member. These two characteristics are also present in danmu language. To some extent, the openness and anonymity of the Internet have abolished the differences in social status between danmu users. Moreover, each danmu user is both a creator and a viewer of danmu. Bakhtin used the example of “One hundred and seven cries which are cried every day in Paris” to vividly illustrate the clamor of carnival in the square [20]. Similarly, danmu is a virtual carnival on the Internet. A good parallel is the source symbol 以和为贵 yiheweigui ‘harmony is precious’, from which individual users produced more than one hundred types of danmu mishearings.

2.3 Semantics of Danmu Mishearings

This section argues that homophonic symbols in danmu mishearings are neither random nor meaningless. There is also competition among different mishearings of the same source symbol. When a mishearing is used frequently, new meanings emerge.

Semantic Opposition. The first semantic characteristic of danmu mishearings is the opposition between source symbols and homophonic symbols in semantic categories. Specifically, the semantic categories include abstract/concrete, inanimate/animate, non-human/human, and disembodied/embodied. When a source symbol belongs to one semantic category (e.g., embodied), its danmu mishearing tends to fall into the opposite category (e.g., disembodied). For example, the source symbol 面部 mianbu ‘face’ is embodied, but its mishearing is the disembodied word 棉布 mianbu ‘cotton cloth’. This opposition in semantic category accords with cognitive linguistics, deconstructionism, and carnival theory. More examples are given in Sect. 4.

The semantic opposition of danmu mishearings is very similar to the projection of metaphor in cognitive linguistics. Classical metaphor theory states that metaphors project from the concrete to the abstract [12]. An empirical study shows that the directions of metaphorical projection can be divided into the following six categories: concrete → abstract, embodied → non-embodied, external → internal, animate → inanimate, less valenced → more valenced, and more intersubjective → less intersubjective [33]. These six projection categories overlap substantially with the four categories proposed in this paper.

The opposition in semantic categories also has psychological reality. Different semantic categories have been found to trigger different mental processing mechanisms. For example, words that represent animate things are remembered better than words that represent inanimate things, and the processing of concrete words relies more on the sensorimotor system while the processing of abstract words depends heavily on the verbal system [19,24].

Deconstruction can also be achieved through the opposition in semantic categories. When the source symbol and the mishearing belong to opposite semantic categories, the mishearing can easily subvert the meaning of the source symbol.

The opposition in semantic categories is also in line with Bakhtin’s analysis of carnival language. Bakhtin points out that the humor of carnival language comes from degradation and materialization [20]. Both degradation and materialization require semantic opposition. In particular, degradation is achieved by a series of semantic oppositions including “inside-out”, “turnabout”, “from top to bottom”, and “from front to rear”. Similarly, materialization is the projection from the higher, the spiritual, and the abstract onto the earth and the body.

Semantic Competition. The second characteristic is that a source symbol usually generates a large number of mishearings, and semantic competition occurs among these homophonic symbols. Specifically, semantic competition is affected by the semantic association between the homophonic symbols and the source symbols, as well as by the semantic association between the components of the homophonic symbols. Based on meme theory, the present study proposes that the competition among homophonic symbols is essentially a competition for ease of memory. According to the theory of Internet memes, the homophonic symbols on the Internet can also be regarded as Internet memes [27]. Popular memes are often not the important or useful ones, but the memorable ones [2]. Lexicality, semantic associations, frequency, and phonotactic frequency are all positively correlated with ease of memory [10,18].

Semantic Emergence. The third characteristic is semantic emergence. Semantic emergence occurs when a structure can express meanings beyond its literal sense. If a linguistic expression is used frequently enough, its independence increases. The bond between the parts and the whole is loosened, allowing the linguistic expression to acquire new meanings and appear in new contexts [35]. This paper argues that frequently used mishearings may undergo semantic emergence. These structures acquire their own meaning and are no longer related to the meaning of the source symbols.

3 Data Collection

The danmu data in this study were collected from bilibili.com, currently the most influential danmu video website in China. The video from which the Internet buzzword 耗子尾汁 haoziweizhi ‘mouse tail juice’ originated was selected. The video was released on January 5, 2020 and had been watched more than 30.49 million times as of March 4, 2021. The official bilibili.com API was used from Python to crawl the danmu posted on the video between January 5, 2020 and December 31, 2020. A total of 299,454 danmu were collected. After duplicates were removed, 43,606 danmu remained. The danmu data were cleaned so that only Chinese characters, digits, and letters of the English alphabet were retained. All capital letters were converted to lowercase. Danmu with more than 10 characters were cut into segments of no more than 5 characters.

Next, the author manually selected the source symbols in the original video. A total of 662 characters were uttered in the original video, from which the author selected 100 words (293 characters) as the source symbols. The author then annotated mishearings in the danmu. A mishearing had to share at least one phonetic feature (vowel, consonant, or tone) with the source symbol in the first and last syllables. Infrequent source symbols with fewer than 10 mishearings were deleted. In the end, 70 source symbols, 2,612 types of mishearings, and 13,801 tokens of mishearings were selected.
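The cleaning steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's code: the function names (`clean`, `segment`, `preprocess`) are hypothetical, "Chinese characters" is approximated by the CJK Unified Ideographs range U+4E00–U+9FFF, and the segmentation rule is read as splitting any cleaned danmu longer than 10 characters into consecutive chunks of at most 5 characters.

```python
import re

# Assumption: "Chinese characters" are approximated by U+4E00-U+9FFF.
# Everything except those ideographs, ASCII letters, and ASCII digits is dropped.
_DROP = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]")


def clean(text):
    """Keep only Chinese characters, digits, and English letters; lowercase the rest."""
    return _DROP.sub("", text).lower()


def segment(text, max_len=10, seg_len=5):
    """Split text longer than max_len into consecutive chunks of at most seg_len characters."""
    if len(text) <= max_len:
        return [text]
    return [text[i:i + seg_len] for i in range(0, len(text), seg_len)]


def preprocess(raw_danmu):
    """Deduplicate raw danmu, then clean and segment each one (order as in the text)."""
    seen = set()
    result = []
    for danmu in raw_danmu:
        if danmu in seen:  # duplicate removal happens before cleaning
            continue
        seen.add(danmu)
        cleaned = clean(danmu)
        if cleaned:  # skip danmu that are empty after cleaning
            result.extend(segment(cleaned))
    return result


if __name__ == "__main__":
    sample = ["耗子尾汁！！", "耗子尾汁！！", "I advise you 好自为之 2020~"]
    print(preprocess(sample))
```

Deduplicating before cleaning follows the order given in the text; deduplicating again after cleaning would merge danmu that differ only in punctuation, which the paper does not mention.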

4 Results

4.1 Semantic Opposition

The 70 source symbols and their most frequent mishearings were categorized in terms of their semantic opposition. The semantic oppositions are grouped into eight categories: disembodied-embodied, embodied-disembodied, non-human-human, human-non-human, inanimate-animate, animate-inanimate, abstract-concrete, and concrete-abstract. A further category of “exceptions” is added to handle outliers. The categorization results are shown in Table 1. The source symbol is to the left of the arrow and the mishearing is to the right.

Table 1. Opposition in semantic categories

| Categories | Count | Example (Source symbol → danmu mishearing) |
|---|---|---|
| disembodied → embodied | 4 | 他不服气 tabufuqi ‘He is not convinced’ → 他不呼气 tabuhuqi ‘He does not exhale’ |
| embodied → disembodied | 15 | 没有闪 meiyoushan ‘Did not dodge’ → 没有伞 meiyousan ‘Has no umbrella’ |
| non-human → human | 6 | 停停 tingting ‘Stop’ → 婷婷 tingting ‘Tingting’ (a female name) |
| human → non-human | 4 | 他 ta ‘He’ → 塔 ta ‘Tower’ |
| inanimate → animate | 3 | 有备而来 youbeierlai ‘Well-prepared’ → 有bear来 youbearlai ‘A bear comes’ |
| animate → inanimate | 0 | N/A |
| abstract → concrete | 12 | 好自为之 haoziweizhi ‘it’s your funeral’ → 耗子尾汁 haoziweizhi ‘mouse tail juice’ |
| concrete → abstract | 2 | 收拳 shouquan ‘Withdraw the fist’ → 收钱 shouqian ‘Charge money’ |
| exceptions | 24 | 我笑一下 woxiaoyixia ‘I laughed’ → 我啸一下 woxiaoyixia ‘I whistled’ |

Of the 70 pairs of source symbols and mishearings, 46 can be assigned to the proposed categories, accounting for 65.7% of the data. This percentage indicates that the semantic characteristics of most danmu mishearings can be explained by semantic category oppositions and that there may be a semantic principle that governs danmu mishearings. In addition, the remaining 24 exceptions do not entirely contradict the semantic category opposition: oppositions in other semantic categories can still be identified. For example, 蹭了一下 cengleyixia ‘rubbed one time’ and 蹭了亿下 cengleyixia ‘rubbed a billion times’ show an opposition in the category of quantity. In other words, the proportion of danmu mishearings that fit the semantic category opposition may actually be higher. The observed percentage is not far below the 77.1% reported for the projection of semantic categories in metaphor [33]. Therefore, contrary to what has been suggested in the literature, this paper argues that danmu mishearings are not meaningless word play but a meaningful homophonic phenomenon.

It is also interesting to analyze the direction of projection. In danmu mishearings, abstract to concrete, inanimate to animate, and embodied to disembodied account for 30 of the 46 categorized pairs, that is, more than half. By contrast, the direction of metaphorical projection has been reported to be mostly from concrete to abstract, from animate to inanimate, and from embodied to disembodied.
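The proportions quoted in this section can be re-derived from the counts in Table 1. The short Python sketch below is only an arithmetic check; the variable names are illustrative and not part of the original study.

```python
# Counts per category as reported in Table 1.
table1_counts = {
    "disembodied -> embodied": 4,
    "embodied -> disembodied": 15,
    "non-human -> human": 6,
    "human -> non-human": 4,
    "inanimate -> animate": 3,
    "animate -> inanimate": 0,
    "abstract -> concrete": 12,
    "concrete -> abstract": 2,
}
exceptions = 24

# Pairs that fall into one of the eight proposed categories.
categorized = sum(table1_counts.values())
total = categorized + exceptions
categorized_share = categorized / total

# The three directions singled out in the text as dominant.
dominant = ["abstract -> concrete", "inanimate -> animate", "embodied -> disembodied"]
dominant_count = sum(table1_counts[name] for name in dominant)

print(f"categorized: {categorized}/{total} = {categorized_share:.1%}")
# categorized: 46/70 = 65.7%
print(f"dominant directions: {dominant_count}/{categorized} = {dominant_count / categorized:.1%}")
# dominant directions: 30/46 = 65.2%
```

Note that the three dominant directions make up more than half of the 46 categorized pairs, but 30 of all 70 pairs (about 42.9%), so the "more than half" statement holds with the categorized pairs as the denominator.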


This seems to imply that danmu mishearings and metaphor represent different ways of human cognition. Metaphor represents a process of abstraction and construction. In making sense of the world, human beings often start with familiar and tangible things. These things are then extended and generalized to understand more abstract concepts. For example, we derive the more abstract concepts of quality and status from the visual-spatial concept of 上 shang ‘above’. Danmu mishearings, in comparison, are concretization and deconstruction. They serve to challenge established authority and seriousness. Deconstruction degrades abstract and sublime concepts into concrete and familiar things. It is thus not surprising that the serious warning 好自为之 haoziweizhi ‘it’s your funeral’ is transformed by mishearing into a peculiar liquid, 耗子尾汁 haoziweizhi ‘mouse tail juice’.

But why do danmu mishearings show the same pattern as metaphor in the embodied/disembodied category? This paper argues that this is due to the content of the video. The video was recorded by a traditional martial arts practitioner and makes extensive reference to nouns and verbs related to the human body. Therefore, the proportion of embodied words among the source symbols is increased, and they outnumber the disembodied words.

4.2 Semantic Competition

A case study is used to illustrate the semantic competition in danmu mishearings. Among all the mishearings of 耗子尾汁 haoziweizhi ‘mouse tail juice’, the author selected eight mishearings with the same pronunciation to exclude the interference of phonetic factors. The mishearings differ only in the orthography of the last character. The correlation between the frequency of danmu mishearings and five variables is analyzed using statistical methods. The five variables are the frequency of the fourth character, the concreteness of the fourth character, the semantic association between the fourth character and 好自为之 ‘it’s your funeral’ (“Association with source word” hereafter), the semantic association between the fourth character and 耗子 ‘rat’ (“Association with rat” hereafter), and the semantic association between the fourth character and 尾 ‘tail’ (“Association with tail” hereafter).

The frequency of mishearings was taken from the danmu data collected in the present study. The frequency of the fourth character was obtained from the frequency of Chinese characters in the BCC corpus [34]. The concreteness of the fourth character was manually annotated by the author based on intuition. The three semantic association scores were obtained from a Chinese word vector model trained on the Baidu Encyclopedia corpus with words and Chinese characters as features [14]. The results are presented in Table 2.


Table 2. Parameters for statistical analysis

| Mishearings | Mishearing Freq | 4th Char Freq | 4th Char Concreteness | Association with source word | Association with rat | Association with tail |
|---|---|---|---|---|---|---|
| haoziweizhi 耗子尾汁 | 3061 | 2126 | Concrete | 0.10 | 0.19 | 0.26 |
| haoziweizhi 耗子尾之 | 28 | 139739 | Abstract | 0.32 | 0.15 | 0.24 |
| haoziweizhi 耗子尾吱 | 28 | 2035 | Abstract | 0.17 | 0.30 | 0.30 |
| haoziweizhi 耗子尾支 | 5 | 18849 | Abstract | 0.12 | 0.05 | 0.26 |
| haoziweizhi 耗子尾只 | 4 | 169911 | Abstract | 0.16 | 0.29 | 0.42 |
| haoziweizhi 耗子尾知 | 3 | 137107 | Abstract | 0.34 | 0.18 | 0.22 |
| haoziweizhi 耗子尾芝 | 2 | 1476 | Concrete | 0.17 | 0.13 | 0.23 |
| haoziweizhi 耗子尾肢 | 1 | 1850 | Concrete | 0.06 | 0.16 | 0.43 |
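The semantic association scores in Table 2 come from a word vector model [14]. The paper does not state which similarity measure was used; cosine similarity is a common choice for word vectors, and the sketch below assumes it. The three-dimensional vectors are hypothetical values made up for illustration and are not taken from the Baidu Encyclopedia model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (illustration only, not real embeddings).
toy_vectors = {
    "耗子": [0.9, 0.1, 0.3],
    "吱": [0.8, 0.2, 0.4],
    "知": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    for char in ("吱", "知"):
        score = cosine_similarity(toy_vectors["耗子"], toy_vectors[char])
        print(f"association(耗子, {char}) = {score:.2f}")
```

With real embeddings, each toy vector would be replaced by the vector the pretrained model assigns to that word or character.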

Before conducting the statistical analysis, we may qualitatively examine the reliability of the word vector model. In the sixth column of Table 2, the two characters most closely associated with 耗子 haozi ‘rat’ are 吱 zhi ‘squeak’ and 只 zhi ‘measure word’. This is consistent with the author’s intuition, because 吱 zhi can describe the sound made by rats, and 只 zhi can be used as a measure word for rats. In the seventh column of Table 2, the two characters most closely associated with 尾 wei ‘tail’ are 肢 zhi ‘limb’ and 只 zhi ‘measure word’. Both 尾 wei and 肢 zhi are body parts, so they are expected to have a close semantic association. The character 只 zhi, in turn, is the most commonly used measure word for animals and is often used for animals with tails, such as birds, dogs, sheep, cats, monkeys, tigers, lions, foxes, rabbits, and mice. 尾 wei can also be used as a measure word. Therefore, it is reasonable that 只 zhi and 尾 wei have a strong semantic association. In summary, it appears valid to use the word vector model to approximate semantic association.

Next, the author conducted a multivariate correlation analysis between the different variables and mishearing frequency in R. Before the correlation analysis was run, the normality of each variable was checked using the Shapiro-Wilk test. Mishearing frequency (p-value < 0.001), fourth character frequency (p-value = 0.005), concreteness (p-value < 0.001), and Association with tail (p-value = 0.024) are normally distributed, while Association with source word (p-value = 0.18) and Association with rat (p-value = 0.54) are not. Therefore, Pearson’s correlation is used for the first four variables and Spearman’s correlation for the latter two. The results are shown in Table 3.

Table 3. Correlation between mishearing frequency and five variables

| | 4th Char Freq | 4th Char Concreteness | Association with source word | Association with rat | Association with tail |
|---|---|---|---|---|---|
| Correlation coefficient | r = −0.30 | r = 0.48 | ρ = 0.05 | ρ = 0.32 | r = −0.17 |
| P-value | 0.46 | 0.22 | 0.89 | 0.43 | 0.68 |
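The coefficients in Table 3 can be approximately recomputed from the values in Table 2. The author ran the analysis in R; the pure-Python sketch below is an independent illustration, not the original script. It makes three assumptions: the row values follow one reading of the flattened Table 2, concreteness is coded as Concrete = 1 and Abstract = 0, and Spearman's ρ is computed as Pearson's r on average ranks. P-values are not computed here.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson's r applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Rows in Table 2 order: 汁, 之, 吱, 支, 只, 知, 芝, 肢 (one reading of the table).
mishearing_freq = [3061, 28, 28, 5, 4, 3, 2, 1]
char_freq = [2126, 139739, 2035, 18849, 169911, 137107, 1476, 1850]
concreteness = [1, 0, 0, 0, 0, 0, 1, 1]  # assumption: Concrete = 1, Abstract = 0
assoc_source = [0.10, 0.32, 0.17, 0.12, 0.16, 0.34, 0.17, 0.06]
assoc_rat = [0.19, 0.15, 0.30, 0.05, 0.29, 0.18, 0.13, 0.16]
assoc_tail = [0.26, 0.24, 0.30, 0.26, 0.42, 0.22, 0.23, 0.43]

results = {
    "4th Char Freq (r)": pearson(char_freq, mishearing_freq),
    "4th Char Concreteness (r)": pearson(concreteness, mishearing_freq),
    "Association with source word (rho)": spearman(assoc_source, mishearing_freq),
    "Association with rat (rho)": spearman(assoc_rat, mishearing_freq),
    "Association with tail (r)": pearson(assoc_tail, mishearing_freq),
}

if __name__ == "__main__":
    for name, value in results.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the recomputed coefficients land within about 0.01 of the values reported in Table 3. Because a single mishearing (耗子尾汁, 3,061 tokens) dominates the frequency variable, the Pearson coefficients are driven almost entirely by that one row.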


As seen from Table 3, none of the correlations reached statistical significance. Consequently, no reliable conclusions can be drawn. This may be due to the small sample size. However, it is still possible to offer two speculations for future studies. First, the highest correlation coefficient was found between the concreteness of the fourth character and the mishearing frequency. The coefficient is 0.48, indicating a moderate positive correlation. If this correlation were statistically significant, it would suggest that the more concrete the fourth character is, the more frequent the mishearing will be. It could then be used to support the semantic category opposition hypothesis, which requires the mishearings and the source symbols to fall into opposite semantic categories. For instance, if the source symbol is an abstract expression like 好自为之 haoziweizhi ‘it’s your funeral’, the danmu mishearing is expected to be a concrete word. Second, the frequency of the fourth character has a weak negative correlation of −0.30 with the frequency of mishearings. We can thus hypothesize that high-frequency danmu mishearings tend to contain low-frequency Chinese characters, since low-frequency characters are more likely to attract visual attention. Despite these two speculations, more data and statistically significant results are needed before any conclusions can be drawn.

4.3 Semantic Emergence

It was found that when a mishearing is used frequently enough, it becomes an emergent structure and subsequently acquires its own meaning. The five most frequent mishearings are listed in Table 4. These mishearings not only have a new meaning but also create a new context. The most frequent mishearing is 耗子尾汁 haoziweizhi ‘Mouse tail juice’, which appears 3,061 times in total. It started as a sarcastic rendering of 好自为之 haoziweizhi ‘It’s your funeral’. However, after frequent use, 耗子尾汁 haoziweizhi ‘Mouse tail juice’ began to have its own meaning. In the danmu language, there are uses such as “two bottles of rat tail juice”. 耗子尾汁 haoziweizhi ‘Mouse tail juice’ is even further interpreted as the technical term 鼠尾胶原蛋白 shuweijiaoyuandanbai ‘collagen from rat tail’. The second most frequent mishearing, 婷婷 Tingting, became a fictional character. These emergent structures do not necessarily rely on the context and the meaning of the source symbols. At the same time, they have acquired their own meaning and are not substitutes for the source symbols. Semantic emergence reflects the deconstructionist essence of danmu mishearings: there is no fixed and eternal meaning, and one meaning always points to the next. This is also the most fundamental difference between danmu mishearings and other homophonic phenomena.


Table 4. Semantic emergence

| Source symbols | Mishearings and frequency | Old context | New context | New meaning |
|---|---|---|---|---|
| 好自为之 haoziweizhi ‘It’s your funeral’ | 耗子尾汁 haoziweizhi ‘Mouse tail juice’ (3061) | I advise you | Two bottles of mouse tail juice | Collagen from rat tail |
| 停停 Tingting ‘Stop’ | 婷婷 Tingting ‘A female name’ (865) | I said | I love Tingting | A fictional figure |
| 大意 dayi ‘Careless’ | 大E dayi ‘Big E’ (655) | I was | Big re | Press “E” on the keyboard |
| 有备而来 youbeierlai ‘Well-prepared’ | 有bear来 youbearlai ‘A bear comes’ (344) | He seems | A bear has arrived | Bear |
| 英国大力士 yingguodalishi ‘British muscle man’ | 英国大理石 yingguodalishi ‘British marble’ (285) | ... that weighs more than 200 pounds | Is English marble hard | Calcium carbonate |

5 Conclusion

The current study investigates the semantic features of the homophonic phenomenon in danmu videos using a quantitative approach. It has three findings. First, there seems to be an opposition in semantic categories between danmu mishearings and source symbols. The opposing categories include abstract/concrete, inanimate/animate, non-human/human, and disembodied/embodied. The opposition in these categories is not only theoretically congruent with deconstructionism and carnival theory, but also accounts for more than 65% of the data. In addition, the directions of projection between source symbols and danmu mishearings largely reverse those found in metaphors, suggesting that there may be a cognitive mechanism that regulates danmu mishearings. Second, danmu mishearings of the same source symbol may engage in semantic competition. More reliable data are needed to show whether there is a correlation between the concreteness of danmu mishearings and their frequency. Third, frequently used danmu mishearings can become emergent structures and acquire entirely new meanings. Danmu mishearings need not be contextually or semantically dependent on the source symbols.

Taken together, the findings indicate that danmu mishearings should be treated as a new kind of homophonic phenomenon. Moreover, danmu mishearings are not meaningless word play, as has been argued in the literature. Instead, they are a meaningful homophonic phenomenon, which is manifested in three ways: the opposition in semantic categories between source symbols and danmu mishearings, the semantic competition among danmu mishearings, and the semantic emergence of frequent danmu mishearings. Compared with other homophonic phenomena, danmu mishearings offer a large amount of quantitative data that can be used to test theories of homophonic phenomena. Future research can combine corpus data with questionnaires to describe the semantic features of homophonic phenomena more precisely. Computer simulation can also be used to generate homophonic symbols and make predictions about their spread on the Internet.



A Textual Entailment Recognition Method Fused with Language Knowledge

Yalei Liu1(✉), Lingling Mu1, Wenyan Chu2, and Hongying Zan1

1 School of Information Engineering, Zhengzhou University, Zhengzhou 450001, Henan, China
[email protected], {iellmu,iehyzan}@zzu.edu.cn
2 School of International Studies, Zhengzhou University, Zhengzhou 450001, Henan, China
[email protected]

Abstract. Recognizing Textual Entailment (RTE) is a challenging task in natural language processing (NLP). Current mainstream RTE models mainly learn text features and inference knowledge from training data, which leads to weak generalization ability. In response to this problem, this paper proposes a Chinese RTE model fused with language knowledge that uses the semantic information of the Ci Lin and the word vector representation based on the sememe. The accuracy of our model achieved 80.60% on the CNLI textual entailment test set and 80.11% on the Chinese test set of XNLI. The experimental results show that the method fused with language knowledge can effectively improve Chinese textual entailment recognition accuracy, and using the larger training corpus will produce better results.

Keywords: Textual entailment · Attention mechanism · RoBERTa · The Ci Lin

1 Introduction

Recognizing Textual Entailment (RTE) serves to determine the inferential relationship between a premise (P) and a hypothesis (H), which falls into one of three categories: entailment, contradiction, and neutral (Table 1).

Table 1. Textual entailment samples.
Premise | Hypothesis | Relationship
男人和女人正在交谈。 (The man and the woman are talking.) | 两个人在交谈。 (Two people are talking.) | Entailment
一个男人蹲在人行道上。 (A man is squatting on the sidewalk.) | 这个男人正在找零钱。 (The man is looking for change.) | Neutral
我想到村子里去看看鲍恩斯坦。 (I want to see Bonestein in the village.) | 我不知道鲍尔斯坦是谁。 (I don't know who Ballstein is.) | Contradiction

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 440–451, 2022. https://doi.org/10.1007/978-3-031-06703-7_34

At present, mainstream RTE models mostly use methods based on Deep Neural Networks (DNN) [1–3] or integrate external language knowledge [4, 5]. Although the DNN-based methods have achieved remarkable recognition results, the inference knowledge they learn from training data is limited; therefore, their generalization ability is weak [6]. RTE models fused with language knowledge improve recognition accuracy and generalization ability, but the results depend on the knowledge base. The commonly used English knowledge base is WordNet [7], and the commonly used Chinese knowledge bases are HowNet [8] and the Ci Lin [9].

HowNet is a common-sense knowledge base built by experts, in which each Chinese concept is represented by one or more sememes; it has been applied in many fields of NLP. A sememe is the most basic and smallest unit of meaning, one that cannot easily be subdivided further. The WRL model proposed by Niu et al. [10] shows that word representation learning fused with sememe information is effective. However, the synonyms in HowNet are implicit and cannot be obtained directly. In the Ci Lin, synonyms are given explicitly, which makes them more convenient to use.

Introducing the Ci Lin into the RTE model can enhance the inferential ability of the model. For example, in the sentence “一对夫妇坐在下水道入口处的旁边。(A couple are sitting beside the sewer entrance.)”, the synonyms of “夫妇 (couple)” and “下水道 (sewer)” can be added to the sentence. The sentence after the fusion of synonyms is “一对夫妇夫妻两口子坐在下水道排水沟排污沟入口处的旁边。(A couple spouse consort are sitting beside the sewer cloaca kennel entrance.)”. The fused sentence expresses its meaning more comprehensively, which improves the recognition and generalization ability of the model.

Therefore, based on the KT-NET [11] model and the Decomp-Att [12] model, we propose a Deep Learning RTE Model fused with Language Knowledge (DLRK). More specifically, the model concatenates the outputs of the RoBERTa [13] model and the Knowledge Attention (K-Attention) model, and then uses the softmax function for classification.
The main contributions of the model are as follows:

1. Use the attention mechanism to learn the features of sentences and incorporate synonyms into the P and the H;
2. Use the sememe-based word embedding proposed by Niu et al. [10] as the initial vector.

The accuracy reached 80.60% on the CNLI textual entailment test set and 80.11% on the Chinese test set of XNLI.
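The synonym fusion illustrated by the sewer example above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: `TOY_CILIN` is a hypothetical three-entry dictionary standing in for the Ci Lin, the function name `fuse_synonyms` is ours, and the sentence is segmented by hand rather than by a word segmentation tool.

```python
from typing import Dict, List

# Hypothetical miniature synonym table standing in for the Ci Lin.
# The entries are copied from the example sentences in the paper.
TOY_CILIN: Dict[str, List[str]] = {
    "夫妇": ["夫妻", "两口子"],
    "下水道": ["排水沟", "排污沟"],
    "博物馆": ["博物院"],
}


def fuse_synonyms(tokens: List[str],
                  synonyms: Dict[str, List[str]],
                  max_synonyms: int = 2) -> List[str]:
    """Append at most `max_synonyms` synonyms after each token that has any."""
    fused: List[str] = []
    for token in tokens:
        fused.append(token)
        # Tokens without an entry are kept unchanged.
        fused.extend(synonyms.get(token, [])[:max_synonyms])
    return fused


# Hand-segmented premise (an assumption; a real pipeline would segment it).
premise = ["一对", "夫妇", "坐在", "下水道", "入口处", "的", "旁边", "。"]
fused_premise = "".join(fuse_synonyms(premise, TOY_CILIN))
print(fused_premise)
```

With this toy table, the fused premise reproduces the expanded sentence quoted in the example; a real system would look up each word in the full Ci Lin instead.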

2 Related Work

The commonly used English textual entailment dataset is SNLI [14], a dataset released by the NLP group of Stanford University. The commonly used Chinese datasets are the CNLI1 dataset and the XNLI2 dataset. The CNLI RTE dataset was proposed in the Seventeenth China National Conference on Computational Linguistics. The XNLI [15] dataset is a cross-lingual natural language inference corpus.

Early RTE methods mainly include similarity-based methods [16–18] and alignment-based methods [19–21], among others. These methods are easy to implement, but their accuracy is not high enough, and they are not flexible. Subsequently, more and more researchers have used DNN [22] models to identify textual entailment. Parikh et al. [12] proposed the Decomp-Att model, which combines alignment-based textual entailment recognition with an attention mechanism and achieves an accuracy of 86.8% on the SNLI dataset. The generalization ability of this model is relatively weak because the inference knowledge it learns from the training corpus is limited. Therefore, some scholars have proposed methods that fuse language knowledge to enhance the recognition ability of models. Yang et al. [11] proposed the KT-NET model, which uses the external language knowledge base WordNet to enhance the word representation and is effective in machine reading comprehension.

With the development of pre-trained models in recent years, more and more researchers have achieved good results. For example, Devlin et al. [23] proposed BERT, a word vector representation model based on a self-attention mechanism, which set new records on multiple NLP tasks. The RoBERTa model pre-trained on a large-scale Chinese corpus by Cui et al. [13] achieved an accuracy of 78.8% on the XNLI dataset. The MT-DNN model proposed by Liu et al. [24] combines multi-task learning and a pre-trained language model for language representation learning. This method achieved an accuracy of 91.6% on the SNLI dataset.

1 http://www.cips-cl.org/static/CCL2018/call-evaluation.html#task3.
2 https://cims.nyu.edu/~sbowman/xnli/.

3 Model

The DLRK model is mainly composed of the RoBERTa model and the K-Attention model. Its structure is shown in Fig. 1.

Fig. 1. The DLRK model’s structure.

3.1 K-Attention

The structure of the K-Attention model is shown on the right side of Fig. 1. It is mainly composed of the Fusion Ci Lin Layer, the Encoding Layer, the Interaction Layer, and the Aggregation Layer.

Fusion Ci Lin Layer. First, we use the word segmentation tool to segment the P and H to obtain the sequences $(a_1, a_2, \dots, a_n)$ and $(b_1, b_2, \dots, b_n)$. For each word, at most two synonyms are added, and the result is merged into a new sentence. For example, in Fig. 1, if the word $a_1$ has synonyms in the Ci Lin, its two synonyms $(a_{11}, a_{12})$ will be added; otherwise, the word $a_1$ will be retained as it is, and the next word $a_2$ will be processed.

Encoding Layer. We use the word embedding fused with sememe information proposed by Niu et al. [10]. The embedding matrix is $E \in \mathbb{R}^{|V| \times d}$, where $|V|$ represents the size of the vocabulary and $d$ represents the dimension of the word embedding. The purpose of the encoding layer is to convert each word into its word vector representation, recorded as $m = (m_1, \dots, m_i)$ and $n = (n_1, \dots, n_j)$. The lengths of $m$ and $n$ are $i$ and $j$, respectively; $m_i$ is the vector representation of the i-th word of $m$, and $n_j$ is the vector representation of the j-th word of $n$.

Interaction Layer. First, we use the fully connected neural network $F$ to extract features from $m$ and $n$. Then we obtain the attention alignment matrix $e_{ij}$ and calculate the attention weights to get $b_i$ and $c_j$: $b_i$ is the attention-weighted representation of $n$ aligned with $m_i$, and $c_j$ is the attention-weighted representation of $m$ aligned with $n_j$. The calculation is shown in formulas (1)–(3), where the sums over $j$ and $k$ in (2) run over the words of $n$ and the sums over $i$ and $k$ in (3) run over the words of $m$.

$$ e_{ij} = F(m_i)^{\top} F(n_j) \tag{1} $$

$$ b_i = \sum_{j} \frac{\exp(e_{ij})}{\sum_{k} \exp(e_{ik})} \, n_j \tag{2} $$

$$ c_j = \sum_{i} \frac{\exp(e_{ij})}{\sum_{k} \exp(e_{kj})} \, m_i \tag{3} $$
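The soft alignment of formulas (1)–(3) can be illustrated with a small pure-Python sketch. It assumes the decomposable-attention reading of the formulas (row-wise and column-wise softmax over the alignment scores) and replaces the network F with a hypothetical single ReLU layer `project`; the toy vectors and the identity weight matrix are illustrative only and are not values from the paper.

```python
import math


def project(W, x):
    """Toy stand-in for the network F: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def weighted_sum(weights, vectors):
    """Sum of `vectors`, each scaled by the matching entry of `weights`."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def soft_align(m, n, W):
    fm = [project(W, x) for x in m]
    fn = [project(W, x) for x in n]
    # Eq. (1): alignment score for every (premise word, hypothesis word) pair.
    e = [[sum(p * q for p, q in zip(fi, fj)) for fj in fn] for fi in fm]
    # Eq. (2): for each m_i, normalise row i over the words of n.
    b = [weighted_sum(softmax(row), n) for row in e]
    # Eq. (3): for each n_j, normalise column j over the words of m.
    columns = [[e[i][j] for i in range(len(m))] for j in range(len(n))]
    c = [weighted_sum(softmax(col), m) for col in columns]
    return e, b, c


# Toy 2-dimensional embeddings: two premise words, three hypothesis words.
m = [[1.0, 0.0], [0.0, 1.0]]
n = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, purely illustrative
e, b, c = soft_align(m, n, W)
print(e)
```

Each `b[i]` has the same dimension as a word vector of `n`, and each `c[j]` the same dimension as a word vector of `m`, which is what the aggregation layer expects when it pairs them with `m[i]` and `n[j]`.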

Aggregation Layer. We compare the two pairs of vectors, $m_i$ with $b_i$ and $n_j$ with $c_j$, through the feed-forward neural network $G$. We obtain the weight vectors $v_{1,i}$ and $v_{2,j}$ and aggregate them into the weight vectors $v_1$ and $v_2$. Finally, the weight vectors $v_1$ and $v_2$ are concatenated, and a feed-forward network $H$ is used for classification to obtain the vector $v \in \mathbb{R}^{3}$. The formal representation of the aggregation layer is shown in Eqs. (4) to (8).

$$ v_{1,i} = G([b_i; m_i]) \tag{4} $$

$$ v_{2,j} = G([c_j; n_j]) \tag{5} $$

$$ v_1 = \sum_{i} v_{1,i} \tag{6} $$

$$ v_2 = \sum_{j} v_{2,j} \tag{7} $$

$$ v = H([v_1; v_2]) \tag{8} $$
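The compare-and-aggregate steps of Eqs. (4) to (8) can be sketched as follows. This is a minimal illustration, assuming that G and H are each a single bias-free dense layer (hypothetical stand-ins for the feed-forward networks) and using hand-picked toy weights rather than trained parameters.

```python
def dense(W, x, relu=False):
    """One fully connected layer without bias; a toy stand-in for G or H."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return [max(0.0, o) for o in out] if relu else out


def column_sum(vectors):
    """Element-wise sum of a list of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]


def aggregate(m, b, n, c, W_g, W_h):
    # Eq. (4): compare each premise word with its aligned representation.
    v1_items = [dense(W_g, b_i + m_i, relu=True) for b_i, m_i in zip(b, m)]
    # Eq. (5): the same comparison on the hypothesis side.
    v2_items = [dense(W_g, c_j + n_j, relu=True) for c_j, n_j in zip(c, n)]
    v1 = column_sum(v1_items)   # Eq. (6)
    v2 = column_sum(v2_items)   # Eq. (7)
    return dense(W_h, v1 + v2)  # Eq. (8): three scores, one per category


# Toy inputs: two premise words, one hypothesis word, 2-dimensional vectors.
m = [[1.0, 0.0], [0.0, 1.0]]
b = [[0.0, 1.0], [1.0, 0.0]]
n = [[1.0, 1.0]]
c = [[1.0, 0.0]]
W_g = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # G maps 4 -> 2 dimensions
W_h = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # H maps 4 -> 3
v = aggregate(m, b, n, c, W_g, W_h)
print(v)
```

Summing over positions in Eqs. (6) and (7) makes `v1` and `v2` independent of sentence length, so the same network H can be applied to sentence pairs of any size.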

3.2 RoBERTa

The RoBERTa model is composed of several transformer modules, mainly using the bidirectional self-attention mechanism and the encoder module. The structure of the model is shown on the left side of Fig. 1. The input combines the premise and the hypothesis, namely t = (CLS, t1, …, tn, SEP), where n represents the total number of characters. The output of the RoBERTa model has two forms: one is the character-level vector l = (lc, l1, …, ln, lc), which corresponds to the vector representation of each character; the other is the sentence-level vector lc, which is the vector of the leftmost [CLS] special symbol. In this paper, we use lc as the final output, representing the semantic relationship of the entire sentence.

3.3 Output

The DLRK model combines the output $v$ of the K-Attention model with the output $l_c$ of the RoBERTa model, and then uses the fully connected neural network $H$ to extract further features and obtain the vector $s \in \mathbb{R}^{3}$. Finally, $s$ is converted to the final label $l$ by the softmax function, as shown in formulas (9)–(10).

$$ s = H([v; l_c]) \tag{9} $$

$$ l = \mathrm{softmax}(s) \tag{10} $$
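Formulas (9)–(10) amount to a concatenation, a dense layer, and a softmax. The sketch below assumes a single linear layer for H, an arbitrary label order in `LABELS`, and toy weights; none of these values come from the paper.

```python
import math

# The order of the three categories is an assumption made for illustration.
LABELS = ("entailment", "contradiction", "neutral")


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def classify(v, l_c, W, bias):
    features = v + l_c  # [v; l_c]: list concatenation
    # Eq. (9): one linear layer standing in for the network H.
    s = [sum(w * x for w, x in zip(row, features)) + b0
         for row, b0 in zip(W, bias)]
    probs = softmax(s)  # Eq. (10)
    return probs, LABELS[probs.index(max(probs))]


v = [0.2, 1.5, -0.3]  # toy K-Attention output
l_c = [1.0, 0.0]      # toy sentence-level vector (real ones are much larger)
W = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]
bias = [0.0, 0.0, 0.0]
probs, label = classify(v, l_c, W, bias)
print(label, [round(p, 3) for p in probs])
```

The predicted label is simply the category with the largest probability after the softmax.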

4 Experiment

4.1 Dataset

The CNLI dataset was proposed in the Seventeenth China National Conference on Computational Linguistics. The XNLI (Cross-Lingual Natural Language Inference) dataset is a cross-language evaluation dataset covering 15 languages, and the XNLI-ZH dataset is its Chinese part. The scale of CNLI and XNLI-ZH is shown in Table 2.

Table 2. Category statistics of the CNLI and XNLI-ZH.
Dataset | Split | Entailment | Contradiction | Neutral | Total
CNLI | Train | 29,738 | 28,937 | 31,325 | 90,000
CNLI | Dev | 3,485 | 3,417 | 3,098 | 10,000
CNLI | Test | 3,475 | 3,343 | 3,182 | 10,000
XNLI-ZH | Train | 130,899 | 130,903 | 130,900 | 392,702
XNLI-ZH | Dev | 830 | 830 | 830 | 2,490
XNLI-ZH | Test | 1,670 | 1,670 | 1,670 | 5,010

4.2 Experimental Setup

The experimental environment is Python 3.6 with TensorFlow-GPU 1.13.0. We use the open-source Chinese word segmentation tool jieba 0.39 to segment words. The experimental parameters of the models are the same on the two datasets, and the main hyper-parameters are shown in Table 3.

Table 3. Main hyper-parameter settings.
Parameter name | Value
learning_rate | 2E-5
hidden_size | 1024
max_seq_len | 64
dropout | 0.5
batch size | 32

The evaluation metric is accuracy, which is calculated as in Eq. (11):

$$ P = \frac{l_{correct}}{l} \tag{11} $$

where $l_{correct}$ represents the number of correctly classified labels and $l$ is the total number of labels in the dataset.

4.3 Experimental Results

Tables 4 and 5 show the experimental results of the different models on the two datasets. The DLR model uses the Chinese word vectors [25] pre-trained by the Word2Vec model to initialize the sentence encoding. The DLRK-synonym model (DLRK without synonym fusion) only uses sememe embedding to initialize the sentence encoding. The DLRK-sememe model (DLRK without sememe embedding) only uses the information of the Ci Lin. The DLRK model uses sememe vectors to initialize the sentence encoding and also integrates the information of the Ci Lin.


The CNLI Test Set Accuracy. The experimental results show that the DLRK model achieves the best recognition results. They also show that using sememe-based word vectors and fusing synonyms helps to improve the recognition effect and generalization ability of the model. Table 4 compares the accuracy of each model on the CNLI test set.

Table 4. Experimental results on the CNLI dataset.
Model | Test set (%)
MT-DNN | 78.57
BERT | 77.01
RoBERTa | 79.05
RoBERTa+synonym | 79.60
DLR | 80.10
DLRK-synonym | 80.32
DLRK-sememe | 80.50
DLRK | 80.60

The XNLI-ZH Test Set Accuracy. The accuracy obtained by each model is shown in Table 5. Due to the limited GPU resources in the experimental environment, the parameter size set in this experiment differs from the RoBERTa model setting [13]. Table 5 shows that, under the same parameters, the recognition effect of the DLRK model is better than that of the RoBERTa model. In addition, training on the larger XNLI-ZH corpus produces better results than training on CNLI (80.11% versus 76.62% for DLRK).

Table 5. Experimental results on the XNLI-ZH dataset (XNLI-ZH test set accuracy, %).
Model | Train set: XNLI-ZH | Train set: CNLI
MT-DNN | 77.76 | 74.04
BERT | 77.12 | 74.17
RoBERTa | 78.10 | 74.55
RoBERTa+synonym | 78.89 | 75.17
DLR | 78.92 | 75.08
DLRK-synonym | 79.12 | 75.44
DLRK-sememe | 79.56 | 76.01
DLRK | 80.11 | 76.62

5 Result Analysis

5.1 Category Analysis

The confusion matrix of the classification results of the DLRK model on the CNLI test set is shown in Table 6. The number of correct classifications of the DLRK and DLRK-synonym models in the three categories is shown in Table 7. In both tables, N, E, and C denote the neutral, entailment, and contradiction categories.

Table 6. Confusion matrix of DLRK model classification results on the CNLI dataset.
Real category | Forecast N | Forecast E | Forecast C | Total
N | 2,384 | 423 | 375 | 3,182
E | 460 | 2,873 | 142 | 3,475
C | 390 | 150 | 2,803 | 3,343

Table 7. The number of correctly classified samples of each model in each category on the CNLI dataset.
Model | N | E | C | Total
DLRK | 2,384 | 2,873 | 2,803 | 8,060
DLRK-synonym | 2,378 | 2,955 | 2,699 | 8,032
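As a cross-check on Table 6, the per-category accuracy (the share of each real category that is classified correctly) and the overall accuracy of Eq. (11) can be recomputed directly from the confusion matrix. The sketch below only re-uses the counts printed in Table 6; the variable names are ours.

```python
# Rows are real categories, columns are forecast categories (Table 6).
confusion = {
    "N": {"N": 2384, "E": 423, "C": 375},
    "E": {"N": 460, "E": 2873, "C": 142},
    "C": {"N": 390, "E": 150, "C": 2803},
}

# Per-category accuracy: correct predictions divided by the row total.
per_category = {
    label: row[label] / sum(row.values()) for label, row in confusion.items()
}

# Overall accuracy, Eq. (11): all correct labels divided by all labels.
correct = sum(confusion[label][label] for label in confusion)
total = sum(sum(row.values()) for row in confusion.values())
overall = correct / total

for label, value in per_category.items():
    print(label, f"{value:.2f}")   # N 0.75, E 0.83, C 0.84
print("overall", f"{overall:.4f}")  # overall 0.8060
```

The diagonal sums to 8,060 correct labels out of 10,000, which matches the DLRK row of Table 7 and the 80.60% reported for the CNLI test set.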

Table 6 shows that the accuracy of the DLRK model on the neutral, entailment, and contradiction categories is 0.75, 0.83, and 0.84, respectively, so the neutral category is clearly the hardest to recognize. Table 7 shows that, compared with the DLRK-synonym model, the DLRK model improves the overall accuracy by 0.28 percentage points and improves the recognition of contradiction. However, its recognition of entailment is not as good as that of the DLRK-synonym model.

5.2 Ablation Analysis

Tables 4 and 5 show that, compared with the MT-DNN model, the accuracy of the DLRK model increased by more than 2.01 percentage points on both the CNLI dataset and the XNLI dataset. The comparative experiment on the RoBERTa model shows that the RoBERTa+synonym model recognizes entailment better than the RoBERTa model, with accuracy improving by about 0.6 percentage points on the CNLI dataset (from 79.05% to 79.60%).


The three comparative experiments on the DLRK model show that the DLRK model has a better recognition effect than the DLR model: its accuracy is 0.5 percentage points higher on the CNLI test set (80.10% to 80.60%) and about 1.1 percentage points higher on the XNLI test set (78.92% to 80.11%). Compared with the DLRK-synonym model, the DLRK model improves the accuracy by 0.28 percentage points on the CNLI dataset. The experimental results also show that using the sememe-based word vector representation to initialize the sentence encoding improves the accuracy of the model, with an improvement of about 0.5 percentage points on the XNLI test set.

5.3 Sample Analysis

By analyzing the recognition results of the DLRK model on the CNLI test set, we find that fusing synonyms into the sentences not only enriches the knowledge representation of the text pair, but also makes the semantic features of the sentences more comprehensive; it therefore enhances the recognition effect and generalization ability of the model. In order to analyze the influence of fusing synonym information into the text, the attention weights between the premise and the hypothesis in the DLRK model and the DLRK-synonym model are given, respectively.

For example, the original text pair is: “P1: 一对夫妇坐在下水道入口处的旁边。(P1: A couple are sitting beside the sewer entrance.) H1: 一对夫妇坐在博物馆的入口处。(H1: A couple is sitting at the entrance of the museum.)”. The text pair that incorporates synonyms is: “P2: 一对夫妇夫妻两口子坐在下水道排水沟排污沟入口处的旁边。(P2: A couple spouse consort are sitting beside the sewer cloaca kennel entrance.) H2: 一对夫妇夫妻两口子坐在博物馆博物院的入口处。(H2: A couple spouse consort is sitting at the entrance of the museum repository.)”. The synonyms in the premise are “夫妇 (couple)” and “夫妻 (spouse)”, and “下水道 (sewer)” and “排污沟 (cloaca kennel)”. In the hypothesis, the synonyms are “夫妇 (couple)” and “夫妻 (spouse)”, and “博物馆 (museum)” and “博物院 (repository)”.

Figure 2 shows that the attention weights between similar words are higher. The recognition result of the DLRK model is correct (contradiction), but the result of the DLRK-synonym model is wrong (neutral). In this example, in the DLRK model the attention weight between “博物馆 (museum)” and “下水道 (sewer)” is lower, and the attention weight between “夫妇 (couple)” and “夫妻 (spouse)” is higher. Therefore, a sentence fused with synonyms from the Ci Lin helps the model attend to the local features of the sentence and improves its generalization ability.

Fig. 2. Attention weight matrix of sentence pairs.

5.4 Discussion

By analyzing the misrecognized examples in the CNLI dataset, we find that the recognition errors of the DLRK model can mainly be classified into the following categories.

Polysemous Words are Ignored in the Fusion of Synonyms. For example, the premise sentence is “滑板飞越斜坡外面。(The skateboard flew over the outside.)”, and the hypothesis sentence is “滑板跳过外面的斜坡。(The skateboard jumped over the slope outside.)”. The correct classification is neutral. Here the word “外面 (outside)” is polysemous: it means “superficial” in the premise and “outside” in the hypothesis. Because the model cannot distinguish the exact meaning of the word in the two sentences, the recognition result is wrong.

The Model Lacks Common Knowledge. For example, the premise sentence is “一只小黑狗穿过大雪。(A little black dog walked through the heavy snow.)”, and the hypothesis sentence is “一只黑狗在雪地里奔跑, 它的爪子都变凉了。(A black dog ran in the snow, and its paws became cold.)”. The correct classification is entailment. We can infer from the premise that the dog's paws become cold after passing through the heavy snow. The recognition result is wrong because the model lacks common-sense knowledge and cannot infer this deeper meaning.

The Model Lacks Computing Power. For example, the premise sentence is “两个女孩站在另一个女孩身后。(Two girls stand behind the other girl.)”, and the hypothesis sentence is “三个女孩站在彼此附近。(Three girls stand near each other.)”. The correct classification is entailment. Reaching it requires counting that two girls and one other girl make three; the model cannot infer the deep semantics of the sentences through such calculation, which leads to a recognition error.


The Model Requires Relevant Domain Knowledge. For example, the premise sentence is “人们在体育场的足球比赛中为足球队加油。(People cheer for the football team in a football match at the stadium.)”, and the hypothesis sentence is “他们在NFL比赛中。(They are in the NFL game.)”. The correct classification is contradiction. NFL refers to the National Football League, which plays American football rather than the football match described in the premise; without this domain knowledge, the model predicts the wrong result.

6 Conclusion

In this paper, we proposed DLRK, a Chinese RTE model fused with language knowledge, which is mainly composed of the K-Attention model and the RoBERTa model. In the K-Attention model, the sentence fused with synonyms is used as the input of the model, and the sememe vector representation is used to initialize the sentence encoding. Our model achieved an accuracy of 80.60% on the CNLI dataset and 80.11% on the XNLI dataset, enhancing the recognition effect and generalization ability of the model. However, this method ignores the characteristics of polysemous words when fusing synonyms. The next step is to add synonym information for polysemous words in order to improve the accuracy of RTE tasks.

Acknowledgments. We are very grateful to the anonymous reviewers for their constructive opinions, as well as the support of the National Key Research and Development Project under Grant No. 2017yfb1002101; the Major Program of National Social Science Foundation of China under Grant No. 17ZDA138; the National Natural Science Foundation of China under Grant No. 62006211; the Science and Technique Program of Henan Province under Grant No. 192102210260; and the Key Scientific Research Program of Higher Education of Henan Province under Grant Nos. 19A520003 and 20A520038.

References

1. Yin, W., Schutze, H., Xiang, B.: ABCNN: attention-based convolutional neural network for modeling sentence pairs. arXiv:1512.05193 (2015)
2. Mou, L., Men, R., Li, G.: Natural language inference by tree-based convolution and heuristic matching. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, pp. 130–136 (2016)
3. Chen, Q., Zhu, X., Ling, Z.: Enhanced LSTM for natural language inference. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1657–1668 (2017)
4. Wang, H., Zhao, T.: Text entailment recognition based on integration of language knowledge and deep learning and its application. Harbin Institute of Technology (2019)
5. Chen, Q., Zhu, X., Ling, Z.: Neural natural language inference models enhanced with external knowledge. In: The 56th Annual Meeting of the Association for Computational Linguistics, pp. 2406–2417 (2018)
6. Shi, L., He, L., Qing, Z.: Textual entailment recognition fused with external semantic knowledge. Computer Engineering, pp. 1–8 (2021)
7. Miller, G.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
8. Dong, Z., Qiang, D.: HowNet - a hybrid language and knowledge resource. In: Proceedings of NLP-KE, pp. 820–824. IEEE (2003)
9. Mei, J., Lin, C., Hai, S.: Shanghai Lexicographic Publishing House (1983)
10. Niu, Y., Xie, R., Liu, Z.: Improved word representation learning with sememes. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2017)
11. Yang, A., Wang, Q., Liu, J.: Enhancing pre-trained language representations with rich knowledge for machine reading comprehension. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019)
12. Parikh, A., Täckström, O., Das, D.: A decomposable attention model for natural language inference, pp. 2249–2255 (2016)
13. Cui, Y., Liu, W., Che, W., Qin, B., Yang, Z.: Pre-training with whole word masking for Chinese BERT. arXiv preprint arXiv:1906.08101 (2019)
14. Bowman, S., Angeli, G., Potts, C.: A large annotated corpus for learning natural language inference. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 632–642 (2015)
15. Alexis, C., Ruty, R.: XNLI: evaluating cross lingual sentence representations. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (2018)
16. Jijkoun, V., de Rijke, M.: Recognizing textual entailment using lexical similarity. In: Proceedings of the 1st PASCAL Challenge Workshop, Southampton, UK, pp. 73–76 (2005)
17. Heilman, M., Smith, N.: Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. In: Proceedings of the 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Los Angeles, USA, pp. 1011–1019 (2010)
18. Han, R., Yaqi, S., Wenhe, F.: Recognizing textual entailment based on knowledge topic models. J. Chin. Inf. Process. 29(6), 119–126 (2015)
19. Williams, A., Nangia, N., Bowman, S.: A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426 (2017)
20. Sultan, M., Bethard, S., Sumner, T.: Feature-rich two-stage logistic regression for monolingual alignment. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 949–959 (2015)
21. Noh, T.-G., Pado, S., Shwartz, V.: Multi-level alignments as an extensible representation basis for textual entailment algorithms. In: Proceedings of the Joint Conference on Lexical and Computational Semantics, Denver, USA, pp. 193–198 (2015)
22. Bowman, S., Potts, C., Manning, C.: Recursive neural networks can learn logical semantics. arXiv:1406.1827 (2014)
23. Devlin, J., Chang, M., Lee, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
24. Liu, X., He, P., Chen, W.: Multi-task deep neural networks for natural language understanding. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (2019)
25. Li, S., Zhao, Z., Hu, R.: Analogical reasoning on Chinese morphological and semantic relations. In: Meeting of the Association for Computational Linguistics (2018)

Semantic Similarity of Inverse Morpheme Words Based on Word Embedding

Jiaomei Zhou and Zhiying Liu(✉)

Institute of Chinese Information Processing, Beijing Normal University, Beijing, China
[email protected]

Abstract. Inverse morpheme words are compound words that have the same morphemes but are arranged in the opposite order. The majority of related works on the subject have focused on a narrow investigation of dictionary definitions, with few studies based on large-scale corpora. Based on the People’s Daily corpus (1946–2017), we added words to and deleted words from a base list and obtained a word list consisting of 668 pairs of inverse morpheme words. Furthermore, we calculated cosine similarity using word embedding based on the distributed representation and discovered that 76% of inverse morpheme words have a cosine similarity of 0.4 or higher, and that word formation, part-of-speech, and frequency all have an impact on semantic similarity.

Keywords: Inverse morpheme words · Semantic similarity · Word embedding

© Springer Nature Switzerland AG 2022 M. Dong et al. (Eds.): CLSW 2021, LNAI 13249, pp. 452–463, 2022. https://doi.org/10.1007/978-3-031-06703-7_35

1 Introduction

In the history of the Chinese language, one of the clearest developmental changes has been the shift from monosyllabic to bisyllabic words [1], and one of the main sources of early bisyllabic words was the ad hoc combination of synonymous monosyllabic words. Ancient Chinese was dominated by monosyllabic words, and it was common for these words to be used in synonymous or antonymous combinations. The order of the words was relatively free, so compound words arose that had the same morphemes in opposite orders. In modern Chinese, such compound words are referred to as inverse morpheme words because their morphemes can be reversed [2]. For the reasons above, the two words of a pair often have the same or similar meanings. Consider the following examples:

(1) 宋荣子之议, 设不斗争, 取不随, 仇不羞, 囹圄见侮不辱, 世主以为宽而礼之。 (《韩非子显学》)
(2) 处乡不节, 憎爱无度, 则争斗之爪角害之。嗜欲无限, 动静不节, 则痤疽之爪角害之。(《韩非子解老》)

Here the words “争斗 (Zheng Dou, fight)” and “斗争 (Dou Zheng, fight)” both come from Han Fei Zi, and their meanings are the same.

In the current paper, we calculated the semantic similarity of inverse morpheme words using word vectors obtained from distributed representations and analyzed the factors that affect it, such as word formation and frequency, in order to further explore the semantic and structural features of these words. Our review of the literature on inverse morpheme words suggests that a new approach is required to discuss the semantic relationship between them: many studies drew their conclusions from only a few examples, so the overall pattern is still unknown. Our hypotheses are as follows: (1) existing lists of inverse morpheme words may contain too many historical words and too few new words; (2) the semantic relations between inverse morpheme words are complex, and the cosine similarity obtained from word embeddings can be used to classify their semantic categories.

Many methods for calculating the semantic similarity between two words have been proposed; we briefly introduce one based on word embedding and explain why we chose it in Sect. 1.2 and Sect. 2. Section 3 reports an experiment that determines the cosine similarity of the word pairs. Section 4 investigates the factors that influence semantic similarity.

1.1 An Overview of Previous Work on Inverse Morpheme Words

The consensus on the definition of inverse morpheme words is that they are compound words made up of the same two morphemes with the same pronunciation. Their semantics, however, remain a point of disagreement. Cao defined inverse morpheme words more strictly: real inverse morpheme words should have the same phonological and written forms, opposite linear order, and the same meanings [4], as in “累积 (Lei Ji, accumulate)-积累 (Ji Lei, accumulate)”, “吞并 (Tun Bing, absorb)-并吞 (Bing Tun, absorb)”, “通畅 (Tong Chang, expedite)-畅通 (Chang Tong, expedite)”, “离别 (Li Bie, separate)-别离 (Bie Li, separate)”, and “斗争 (Dou Zheng, fight)-争斗 (Zheng Dou, fight)”. By contrast, pairs such as “工人 (Gong Ren, worker)-人工 (Ren Gong, artificial)” and “情敌 (Qing Di, rival in love)-敌情 (Di Qing, enemy status)”, which have the same morphemes but different meanings, and pairs such as “孙子 (Sun Zi, grandson)-子孙 (Zi Sun, descendant)” and “结巴 (Jie Ba, stammer)-巴结 (Ba Jie, curry favour with)”, which are pronounced differently, are not real inverse morpheme words. Based on this definition, he listed 51 pairs of completely synonymous inverse morpheme words, such as “逃窜 (Tao Cuan, run away)-窜逃 (Cuan Tao, run away)”, and 56 pairs of words with different meanings, such as “嘴快 (Zui Kuai, outspoken)-快嘴 (Kuai Zui, one who talks without thinking)”.

Others have argued that not all such word pairs are semantically identical. They pointed out that two inverse morpheme words can have exactly the same meaning only if their morphemes stand in a parallel structure and have exactly the same meaning [2]. After a long period of development, the semantic relationship between a pair of inverse morpheme words has become more complicated, and may be identical, similar, or different. The meaning relationship between the morphemes is also more complicated: in most cases the morphemes are not identical in meaning but share one of their senses.

Therefore, we defined inverse morpheme words as a pair of bisyllabic words with inverted morpheme orders and related morpheme meanings in modern Chinese.

In addition, some researchers have extracted lists of inverse morpheme words from dictionaries or corpora. Zhang extracted 85 pairs from the recent Chinese corpus [5]. Tang extracted 136 pairs from the Chinese used on both sides of the Taiwan Strait [6], including a large number of Taiwanese words that are hardly used in Mandarin, such as “熊猫 (Xiong Mao, panda)-猫熊 (Mao Xiong, panda)” and “日昨 (Ri Zuo, yesterday)-昨日 (Zuo Ri, yesterday)”. Huang extracted 738 pairs from the Modern Chinese Dictionary (2005 edition) and the Applied Dictionary of Inverse Morphemes Words [7], and classified the meaning relationships of these words according to the dictionary definitions. In conclusion, a word list based on a massive modern Chinese corpus is still lacking.

1.2 Word Embedding and Distributed Representation

For natural language processing, words must be represented numerically in a form that computers can process. One of the most popular approaches is word embedding (word vectors) [8], a term coined by Bengio et al. It can be understood as building a lookup table in which each word corresponds to a vector, so that a word is represented by looking up its vector in the table [9].

There are two main approaches to representing words as vectors: one-hot representation and distributed representation. One-hot representation encodes each word as a high-dimensional binary vector whose length is the size of the corpus vocabulary; the position corresponding to the word is set to 1 and all other positions to 0, so that each word is a string of 0s and 1s. It treats words as mutually independent and is a simple and effective encoding method. However, it has two drawbacks: it tends to cause the curse of dimensionality, and it loses context. Because the encoding dimension of each word is the size of the whole vocabulary, the dimension grows with the number of words, and the resulting vectors are huge and sparse, which makes computation more expensive. More importantly, one-hot representation assumes that words are independent of one another and cannot reflect the degree of relationship between words. For example, in the sentence “I am Chinese and I love China”, “Chinese” and “China” can be represented as [0,0,1,0,0,0] and [0,0,0,0,0,1], and their dot product is zero, while the dot product of “Chinese” and “and” is also zero. Under this representation, “Chinese” is therefore no closer to “China” than it is to “and”: every pair of distinct words is equally dissimilar. This shows that one-hot representation cannot capture the semantic relationship between words, so it is not suitable for representing inverse morpheme words in our research.

Distributed representation solves this problem. The dimensionality of the vectors is not constrained by the size of the vocabulary, and text is represented as low-dimensional, dense, continuous vectors. Each word in the vocabulary is represented by a real-valued vector, usually with 50 or 100 dimensions. Each word is a point in the vector space, and the distance between two points reflects the similarity between the words: the more similar the words, the smaller the distance. Word2Vec [10] is one of the most popular word embedding models, and a key benefit is that it can take additional context into account. It maps each word to an n-dimensional vector, and a long text is represented by multiple short n-dimensional vectors. The closer the semantics of two words are, the closer their vectors are in the vector space. The most significant advantage of word embedding is that it can capture the semantic information of words, allowing semantically related or similar words to be close in vector distance.
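The contrast between the two representations can be made concrete with a small sketch. The one-hot part follows the example sentence above; the three-dimensional dense vectors are hypothetical values invented purely for illustration and are not taken from any trained model.

```python
import math

# Vocabulary of the example sentence "I am Chinese and I love China",
# in order of first appearance.
vocab = ["I", "am", "Chinese", "and", "love", "China"]


def one_hot(word):
    """Return the one-hot vector of `word` over `vocab`."""
    return [1 if w == word else 0 for w in vocab]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


# One-hot: every pair of distinct words has dot product 0,
# so "China" is no closer to "Chinese" than "and" is.
onehot_chinese_china = dot(one_hot("Chinese"), one_hot("China"))
onehot_chinese_and = dot(one_hot("Chinese"), one_hot("and"))

# Hypothetical dense vectors (illustration only, not real embeddings).
dense = {
    "Chinese": [0.9, 0.8, 0.1],
    "China": [0.8, 0.9, 0.2],
    "and": [0.1, 0.0, 0.9],
}
dense_chinese_china = cosine(dense["Chinese"], dense["China"])
dense_chinese_and = cosine(dense["Chinese"], dense["and"])
```

With dense vectors the two pairs are no longer indistinguishable: the related pair can receive a clearly higher similarity than the unrelated one, which is the property the study relies on.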

2 Methods

2.1 Pre-trained Word Embedding Model

In the current research, we used distributed representations of words obtained from Word2Vec to compute the semantic similarity of inverse morpheme words. The pre-trained Chinese word vectors were trained on the People’s Daily corpus (1946–2017) [11]. The model contains 300-dimensional vectors for 356,053 words and phrases. We chose it over an Ancient Chinese corpus mainly because we analyzed the semantic features of modern Chinese inverse morpheme words from a synchronic perspective, rather than exploring the causes or historical changes of inverse morpheme words. The second reason is language style: the People’s Daily is China’s largest newspaper, and its language is accurate, current, and clear.

2.2 Cosine Similarity

After obtaining the word embeddings, the semantic similarity of inverse morpheme words can be obtained by calculating the embedding distance [12]. In this paper, cosine similarity was used to calculate the vector similarity of words. Cosine similarity is the cosine of the angle between two vectors, and its value lies in [−1, 1]. The larger the value, the smaller the angle between the two vectors and the higher the similarity. The cosine similarity of a pair of words was computed with the similarity function of the Gensim library (an open-source Python library), and its value was used to indicate the semantic similarity of the two words.
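As a minimal sketch of the measure itself, cosine similarity is the dot product of two vectors divided by the product of their lengths. The function below is a plain-Python illustration with hypothetical toy vectors, not the authors' code; in Gensim, the corresponding call on a loaded set of vectors is `KeyedVectors.similarity(w1, w2)`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v, a value in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors covering the three reference cases.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # angle of 0 degrees
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # angle of 90 degrees
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # angle of 180 degrees
```

Because the measure depends only on the angle, scaling a vector does not change the result, which is why `[1, 2]` and `[2, 4]` count as maximally similar.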

3 The Experiment

3.1 Word List Extraction

Several researchers have proposed methods for extracting inverse morpheme words as well as word lists. Tang’s word list [6] contains a large number of Taiwanese words, many of which are not applicable to Mandarin. With a total of 738 pairs of inverse morpheme words, Huang’s word list [7] covers most of the other researchers’ word lists. We therefore used Huang’s word list as a starting point and added and removed words to create a new word list that is more applicable to modern Chinese.

On inspection, we found several problems with Huang’s word list, which also verified one of our hypotheses. (1) It contains many historical words that are no longer used in modern Chinese. About 85 pairs never appeared in the People’s Daily corpus in the past 70 years. For example, in the pair “熬煎 (Ao Jian, suffering)” and “煎熬 (Jian Ao, suffering)”, “熬煎” is almost no longer used, whereas “煎熬” is very common. (2) Some of the words, such as “习见 (Xi Jian, be commonly seen)-见习 (Jian Xi, probation)” and “渊深 (Yuan Shen, profound)-深渊 (Shen Yuan, chasm)”, are rarely used in Mandarin, which results in insufficient semantic representation during word vector training and may make the final similarity calculation inaccurate. (3) Because the list was extracted from a dictionary and published in 2006, its coverage of new words is incomplete.

On the one hand, we removed from the list the 85 pairs that never appeared in the People’s Daily corpus in the past 70 years. On the other hand, we extracted all the inverse morpheme words among the 10,000 most frequent of the 356,053 words and phrases in the word embedding file, which yielded 46 pairs. Of these, 15 pairs (32.6%) did not appear in Huang’s word list: 上海 (Shanghai)-海上 (Hai Shang, at sea), 越南 (Vietnam)-南越 (South Vietnam), 来到 (Lai Dao, arrive)-到来 (Dao Lai, arrive), 故事 (Gu Shi, story)-事故 (Shi Gu, accident), 面前 (Mian Qian, in the face of)-前面 (Qian Mian, in front), 政党 (Zheng Dang, political party)-党政 (Dang Zheng, party politics), 意愿 (Yi Yuan, aspiration)-愿意 (Yuan Yi, be willing), 南海 (Nan Hai, the Nanhai Sea)-海南 (Hai Nan, Hainan (Province)), 上网 (Shang Wang, surf the Internet)-网上 (Wang Shang, on the internet), 新高 (Xin Gao, new peak)-高新 (Gao Xin, high-tech), 放开 (Fang Kai, let go)-开放 (Kai Fang, open), 前年 (Qian Nian, the year before last)-年前 (Nian Qian, before the New Year), 自来 (Zi Lai, unsolicited)-来自 (Lai Zi, from), 时有 (Shi You, from time to time)-有时 (You Shi, at times), 建党 (Jian Dang, found a party)-党建 (Dang Jian, party building). Clearly, “政党 (Zheng Dang, political party)-党政 (Dang Zheng, party politics)”, “新高 (Xin Gao, new peak)-高新 (Gao Xin, high-tech)”, and “上网 (Shang Wang, surf the Internet)-网上 (Wang Shang, on the internet)” are new words. Other pairs, such as “意愿 (Yi Yuan, aspiration)-愿意 (Yuan Yi, be willing)” and “故事 (Gu Shi, story)-事故 (Shi Gu, accident)”, are well known and have been in use for a long time.
Therefore, the hypothesis that Huang’s word list was incomplete is confirmed. We added these 15 pairs to the word list to make it more complete. Finally, we obtained a word list of 668 pairs of inverse morpheme words (738 − 85 + 15), some of which are shown in Table 1 as examples (the order of word1 and word2 within a pair is arbitrary).

Table 1. Inverse morpheme word list.

| word1 | word2 |
|---|---|
| 爱抚 (caressing) | 抚爱 (caressing) |
| 伴侣 (partner) | 侣伴 (companion) |
| 编选 (editing) | 选编 (editing) |
| 藏躲 (hide) | 躲藏 (hide) |
| 爱心 (love) | 心爱 (love) |
| 膀臂 (arm) | 臂膀 (arm) |
| 爱情 (love) | 情爱 (love) |
| 草莽 (wildness) | 莽草 (Illicium anisatum) |
| 鞍马 (side horse) | 马鞍 (saddle) |
| 包皮 (prepuce) | 皮包 (portfolio) |
| 谙熟 (proficient) | 熟谙 (proficient) |
| 侧翼 (flank) | 翼侧 (flank) |
| 拔海 (elevation) | 海拔 (elevation) |
| 保准 (assuredly) | 准保 (assuredly) |
| 白灰 (lime) | 灰白 (hoary) |
| 查抄 (confiscate) | 抄查 (confiscate) |
| 板鼓 (Ban Gu) | 鼓板 (clappers) |
| 报警 (warning) | 警报 (warning) |
| 摆钟 (pendule) | 钟摆 (pendulum) |
| 查检 (check) | 检查 (check) |
| 办公 (office) | 公办 (state-run) |
| 本原 (primitive) | 原本 (primitive) |
| 半夜 (midnight) | 夜半 (midnight) |
| 查询 (inquiry) | 询查 (inquiry) |
| 扮装 (makeup) | 装扮 (makeup) |
| 笔画 (strokes) | 画笔 (paintbrush) |
| 依傍 (rely on) | 傍依 (rely on) |
| 产物 (product) | 物产 (product) |
| 邦联 (confederation) | 联邦 (federation) |
| 闭关 (retreat) | 关闭 (close) |
| 珠宝 (jewelry) | 宝珠 (precious pearl) |
| 畅通 (unimpeded) | 通畅 (unimpeded) |
| 保管 (keep) | 管保 (guarantee) |
| 边沿 (edge) | 沿边 (edgewise) |
| 暴风 (squall) | 风暴 (storm) |
| 潮红 (flush) | 红潮 (red tide) |
| 报捷 (report a success) | 捷报 (news of victory) |
| 爱恋 (love) | 恋爱 (love) |
| 倍加 (double) | 加倍 (double) |
| 尘烟 (fog) | 烟尘 (smoke) |
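The corpus-side extraction step can be sketched as follows. The paper does not describe the exact procedure, so this is an assumption: the function name `find_inverse_pairs` is hypothetical, the vocabulary is taken to be a list ordered from most to least frequent, and the toy word list is for illustration only. The candidates it returns would still need manual screening, because the definition adopted in Sect. 1.1 also requires the morpheme meanings to be related.

```python
def find_inverse_pairs(words_by_frequency, top_n=10000):
    """Return candidate inverse morpheme pairs among the top_n words.

    A candidate pair is two bisyllabic (two-character) words that
    consist of the same two characters in opposite order.
    """
    top = words_by_frequency[:top_n]
    vocab = set(top)
    pairs = []
    seen = set()  # first members already paired, to avoid listing A-B and B-A
    for word in top:
        # Skip non-bisyllabic words and reduplications such as "看看".
        if len(word) != 2 or word[0] == word[1]:
            continue
        inverse = word[::-1]
        if inverse in vocab and inverse not in seen:
            pairs.append((word, inverse))
            seen.add(word)
    return pairs


# Toy frequency-ordered vocabulary (illustration only).
toy_vocab = ["上海", "我们", "海上", "故事", "中国", "事故", "看看"]
candidates = find_inverse_pairs(toy_vocab)
candidates_top4 = find_inverse_pairs(toy_vocab, top_n=4)
```

Restricting the search to the most frequent words, as the `top_n` parameter does here, keeps rare words with poorly trained vectors out of the list.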

3.2 Experimental Results

The semantic similarity of each pair of inverse morpheme words was calculated using Gensim’s similarity function. The resulting values ranged between 0 and 1, and the larger the value, the more similar the two words of a pair. To make the data easier to inspect, we divided the values into ten groups of width 0.1 (0 to 0.1, 0.1 to 0.2, and so on) and counted the pairs in each group. The findings are presented in Fig. 1.

Fig. 1. Semantic similarity of inverse morpheme words with different order of words.

As seen in Fig. 1, the data are dense in the middle and sparse at both ends. The majority of word pairs have a value of 0.5 to 0.8: 129 pairs have a value of 0.5 to 0.6, 131 pairs a value of 0.6 to 0.7, and 124 pairs a value of 0.7 to 0.8. No pairs in the current word list have values below 0.1 or above 0.9, indicating that there are almost no completely unrelated pairs and no completely equivalent inverse morpheme words. This means that the majority of inverse morpheme words share a high level of semantic similarity, i.e., most of them have similar meanings.
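The grouping into ten intervals of width 0.1 can be sketched as a simple histogram count. The function name `bin_similarities` is hypothetical, and the sample values are the fifteen cosine similarities printed in Table 2, not the full set of 668 pairs, so the counts below do not reproduce Fig. 1.

```python
def bin_similarities(values, n_bins=10):
    """Count values in n_bins equal-width intervals over [0, 1].

    Bin i covers [i / n_bins, (i + 1) / n_bins); a value of exactly 1.0
    is placed in the last bin.
    """
    counts = [0] * n_bins
    for value in values:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"similarity out of range: {value}")
        index = min(int(value * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# The fifteen similarities listed in Table 2 of the paper.
sample = [0.87, 0.87, 0.82, 0.87, 0.83,
          0.5, 0.44, 0.66, 0.46, 0.58,
          0.37, 0.35, 0.25, 0.28, 0.24]
counts = bin_similarities(sample)
```

Each entry of `counts` corresponds to one bar of a histogram such as Fig. 1, with index 0 covering 0 to 0.1 and index 9 covering 0.9 to 1.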

4 Discussion

4.1 Classification of Inverse Morpheme Words

According to the results presented above, we divided the inverse morpheme words into three categories: pairs with a value of 0.8 or higher are total synonyms with the same meaning, pairs with a value from 0.4 up to 0.8 are quasi-synonyms with similar meanings, and pairs with a value below 0.4 are pseudo-synonyms with different meanings. These three groups account for approximately 7%, 76%, and 17% of the pairs, respectively. Table 2 shows some randomly selected examples.

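Table 2. Inverse morpheme words with three different meaning relationships.

| Category | Semantic relationship | word1 | word2 | Cosine similarity |
|---|---|---|---|---|
| Total synonyms | Same meaning | 并吞 (absorb) | 吞并 (absorb) | 0.87 |
| Total synonyms | Same meaning | 寻找 (search) | 找寻 (search) | 0.87 |
| Total synonyms | Same meaning | 讲演 (speech) | 演讲 (speech) | 0.82 |
| Total synonyms | Same meaning | 爱怜 (love tenderly) | 怜爱 (love tenderly) | 0.87 |
| Total synonyms | Same meaning | 率直 (frank) | 直率 (frank) | 0.83 |
| Quasi-synonyms | Similar meaning | 笔画 (strokes) | 画笔 (paintbrush) | 0.5 |
| Quasi-synonyms | Similar meaning | 伴同 (accompany) | 同伴 (companion) | 0.44 |
| Quasi-synonyms | Similar meaning | 伴侣 (partner) | 侣伴 (companion) | 0.66 |
| Quasi-synonyms | Similar meaning | 白花 (white flower) | 花白 (gray) | 0.46 |
| Quasi-synonyms | Similar meaning | 变形 (distortion) | 形变 (deformation) | 0.58 |
| Pseudo-synonyms | Different meaning | 称号 (title) | 号称 (claim to be) | 0.37 |
| Pseudo-synonyms | Different meaning | 分工 (divide the work) | 工分 (workpoints) | 0.35 |
| Pseudo-synonyms | Different meaning | 办公 (office) | 公办 (state-run) | 0.25 |
| Pseudo-synonyms | Different meaning | 低压 (low voltage) | 压低 (lower) | 0.28 |
| Pseudo-synonyms | Different meaning | 传言 (rumour) | 言传 (explain in words) | 0.24 |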
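The three-way classification amounts to two thresholds on the cosine similarity. The sketch below uses the cut-offs stated in the text (0.8 and 0.4) and three pairs from Table 2; the function name `classify_pair` and the treatment of a value of exactly 0.4 as a quasi-synonym are assumptions made for illustration.

```python
def classify_pair(similarity):
    """Map a cosine similarity to one of the three categories of Sect. 4.1."""
    if similarity >= 0.8:
        return "total synonyms"     # same meaning
    if similarity >= 0.4:
        return "quasi-synonyms"     # similar meaning
    return "pseudo-synonyms"        # different meaning


# Three pairs and their similarities as reported in Table 2.
examples = [
    ("并吞", "吞并", 0.87),
    ("笔画", "画笔", 0.5),
    ("办公", "公办", 0.25),
]
labels = [classify_pair(sim) for _, _, sim in examples]
```

Applied to the whole word list, counting the labels would give the proportions of the three categories.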

4.2 Analysis of Factors Influencing the Semantic Similarity of Inverse Morpheme Words

Word Formation. Words are made up of one or more morphemes, and the way the morphemes are put together is called word formation [1]. The majority of inverse morpheme words are coordinative compound words [3]. We randomly selected 10 pairs of words from each of the three categories defined in Sect. 4.1, for a total of 60 inverse morpheme words, and annotated their word formation. Table 3 shows the results for the five types of Chinese compound word formation: coordinative type, attributive type (modifier and the word it modifies), complementing type, predicate-object type, and subject-predicate type.

Table 3. Word formation of inverse morpheme words.

| Category | Coordinative | Attributive | Subject-predicate | Complementing | Predicate-object |
|---|---|---|---|---|---|
| Total synonyms | 20 | 0 | 0 | 0 | 0 |
| Quasi-synonyms | 9 | 8 | 2 | 0 | 1 |
| Pseudo-synonyms | 5 | 12 | 1 | 1 | 1 |

The results show that word formation affects semantic relations. The coordinative and attributive types predominate, accounting for 90% of the words (54 of 60); the coordinative type consists primarily of total synonyms, while the attributive type occurs only among quasi-synonyms and pseudo-synonyms. The subject-predicate, complementing, and predicate-object types are very rare and are found among words with similar or different meanings. In the complementing type, for example, only the word “压低 (Ya Di, lower)” appears. All 10 pairs of total synonyms are coordinative, which suggests that reversing the morpheme order has no effect on the semantics of these words. We also counted the pairs whose formation changes when the morpheme order changes and found that 5 pairs of quasi-synonyms and 6 pairs of pseudo-synonyms have different formations. The difference between these two groups is that there are fewer coordinative words and more attributive words as the meaning shifts from similar to different.

Part of Speech. Part of speech (POS) is a grammatical classification of the words in a language that reflects their syntactic function. Table 4 lists the 10 pairs of inverse morpheme words with the lowest semantic similarity, labeled with their POS (using the POS labels of the Contemporary Chinese Dictionary (7th edition)). Although the meanings of the two words in each pair are still somewhat related, native speakers no longer perceive them as similar, and in most cases the POS is no longer the same after the morphemes are reversed: only 4 of the 10 pairs have identical POS.

Table 4. Ten pairs of words with the lowest semantic similarity in inverse morpheme words.

| word1 | POS | word2 | POS | Cosine similarity |
|---|---|---|---|---|
| 心中 (in the heart) | adj | 中心 (centre) | n | 0.155 |
| 地基 (subgrade) | n | 基地 (base) | n | 0.184 |
| 动机 (motivation) | n | 机动 (motor-driven) | adj | 0.205 |
| 出发 (leave) | v | 发出 (give out) | v | 0.215 |
| 来历 (origin) | n | 历来 (always) | d | 0.233 |
| 工人 (worker) | n | 人工 (artificial) | adj; n | 0.234 |
| 传言 (rumour) | n; v | 言传 (explain in words) | v | 0.235 |
| 加强 (strengthen) | v | 强加 (force) | v | 0.236 |
| 明文 (plaintext) | n | 文明 (civilization) | n; a | 0.240 |
| 产物 (outcome) | n | 物产 (local products) | n | 0.242 |

We used the CpsWParser [13] to tag the POS of all words in the list and divided the pairs into two groups, those whose two words have identical POS and those whose two words have different POS, to test whether the effect of POS on similarity is significant (if one word has multiple POS and the other word has only one of them, the pair is also counted as having different POS). There are 459 pairs of inverse morpheme words with identical POS, with a mean similarity of 0.593, and 209 pairs with different POS, with a mean similarity of 0.517. The ANOVA results reveal a significant difference between the two sets of data, with p = 1.0439E−08 (