
Dependency Structures from Syntax to Discourse
A Corpus Study of Journalistic English
Hongxin Zhang

Dependency Structures from Syntax to Discourse

Based on large corpora of journalistic English, this title examines dependency relations and related properties at both the syntactic and the discourse level, seeking to unravel the language patterns of real-life usage. With a focus on rank-frequency distributions, the author investigates the distribution of linguistic properties/units from the perspectives of properties, motifs and sequencings. At the syntactic level, the book analyses three dimensions: the various combinations of a complete dependency structure, valency and dependency distance. At the discourse level, it demonstrates that discourse elements can also form dependency relations by exploring (1) the rank-frequency distribution of Rhetorical Structure Theory relations, their motifs, discourse valency and discourse dependency distance; (2) whether there is top-down organisation or an inverted pyramid structure at all three discourse levels; and (3) whether discourse dependency distances and valencies are lawfully distributed, following the same distribution patterns as those at the syntactic level.

This book will be of great value to scholars and students of quantitative linguistics and computational linguistics, and its practical insights will also benefit professionals in language teaching and journalistic writing.

Hongxin (Maria) Zhang is a lecturer at the School of International Studies, Zhejiang University, China. Her main research interests include quantitative linguistics, dependency grammar, synergetic linguistics and discourse studies.

Dependency Structures from Syntax to Discourse
A Corpus Study of Journalistic English
Hongxin Zhang

This title is funded by the International Publishing Program of Zhejiang University.

First published in English 2024 by Routledge, 4 Park Square, Milton Park, Abingdon, Oxon OX14 4RN, and by Routledge, 605 Third Avenue, New York, NY 10158. Routledge is an imprint of the Taylor & Francis Group, an informa business.

© 2024 Hongxin ZHANG

The right of Hongxin ZHANG to be identified as author of this work has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

English Version by permission of China Renmin University Press.

British Library Cataloguing-in-Publication Data: A catalogue record for this book is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Names: Zhang, Hongxin, 1975- author.
Title: Dependency structures from syntax to discourse : a corpus study of journalistic English / Hongxin Zhang.
Description: Abingdon, Oxon ; New York, NY : Routledge, 2024. | Includes bibliographical references and index.
Identifiers: LCCN 2023018251 (print) | LCCN 2023018252 (ebook) | ISBN 9781032567105 (hardback) | ISBN 9781032567143 (paperback) | ISBN 9781003436874 (ebook)
Subjects: LCSH: English language--Dependency grammar. | English language--Discourse analysis. | Journalism--Language.
Classification: LCC PE1369 .Z53 2024 (print) | LCC PE1369 (ebook) | DDC 420.1/41--dc23/eng/20230706
LC record available at https://lccn.loc.gov/2023018251
LC ebook record available at https://lccn.loc.gov/2023018252

ISBN: 978-1-032-56710-5 (hbk)
ISBN: 978-1-032-56714-3 (pbk)
ISBN: 978-1-003-43687-4 (ebk)
DOI: 10.4324/9781003436874

Typeset in Times New Roman by KnowledgeWorks Global Ltd.

Contents

List of Figures
List of Tables
List of Appendices
1 Introduction
2 Literature Review and the Present Study
3 Diversification Processes and Least Effort Principle
4 Research Materials and Methods
5 Syntactic Dependency Relations and Related Properties
6 Discourse Dependency Relations
7 Concluding Remarks
Acknowledgements
Appendices
Author Index
Subject Index

Figures

1.1 Constituency relations of “The student has an interesting book”
1.2 Dependency structure of “The student has an interesting book”
1.3 A sample RST tree (WSJ-0642)
3.1 Zipf distribution. (a) A rank-frequency curve. (b) Logarithm of both axes
4.1 A typical RST tree representation (WSJ-0642)
4.2 An analogy between phrase-structure syntactic trees and RST trees. (a) An equivalent representation of Figure 4.1. (b) A constituent syntactic tree
4.3 Transforming a constituent structure into a dependency structure. (a) Constituency relation. (b) Dependency relation
4.4 Converting an RST tree (Figure 4.2a). Steps of conversion. The converted tree
4.5 Converting Figure 4.1 into a tree with only terminal clause nodes (E- = elaboration-)
4.6 Representing Figure 4.5 horizontally (E- = elaboration-, EGS = elaboration-general-specific)
4.7 Converting the sample RST tree into sentence nodes
4.8 Converting the sample RST tree into one with mere paragraph nodes
4.9 The dependency tree of the “Pierre” sample sentence
4.10 GVP (generalised valency pattern)
4.11 A re-presentation of Figure 4.5
4.12 A time series plot of discourse valency and dependency distance for WSJ-0642
5.1 Rank-frequency data of dependents
5.2 Rank-frequency of governors (first algorithm)
5.3 POS rank-frequency of governors (granularity level 2)
5.4 Rank-frequency distribution of syntactic functions
5.5 Rank-frequency curve of combination 4 (top 50)
5.6 Generalised valency patterns of verbs. (a) Functions dependents of verbs may play (active valency of verbs) (centrifugal force). (b) Functions verbs may play as dependents (passive valency) (centripetal force)
5.7 Rank-frequency curve of “dependent + [] = syntactic function” (top 10)
5.8 Rank-frequency curve of “[] + governor = syntactic function” (top 10)
5.9 Rank-frequency curve of combination 7 (top 10)
5.10 Rank-frequency curve of motifs of combination 4 (top 10)
5.11 Rank-frequency curve of motifs of combination 5 (top 10)
5.12 Rank-frequency curve of combination 6 (top 10)
5.13 Rank-frequency of motifs of combination 7. (a) Top 50. (b) Top 10
5.14 Examples of verb valencies. (a) A tri-valency verb. (b) A bi-valency verb
5.15 Dependency tree for the “Pierre” sample sentence
5.16 A time series plot of generalised valencies for the “Pierre” sample sentence
5.17 Fitting the ZA model to E1 valency motif rank-frequency data
5.18 A time series plot of valencies (the first 200 running words of E1)
5.19 A time series plot of the valency motif lengths (the first 200 valency motifs in E1)
5.20 Time series plots of DDs and DDs in the “Pierre” sample. (a) Dependency distances. (b) Absolute dependency distances
5.21 Rank-frequency curves of dependency distance (top 20)
6.1 Fitting the ZA model to data of RST relations between sentences within paragraphs
6.2 Graphic representation of Table 6.17
6.3 Graphic representation of Table 6.19. (a) Parameter k. (b) Parameter p
6.4 Graphic representation of Table 6.20. (a) Parameter k. (b) Parameter q
6.5 An RST analysis of text WSJ_0642 from the RST-DT
6.6 Converted tree of WSJ_0642 with elementary clause nodes only
6.7 Converted tree of WSJ_0642 with elementary sentence nodes only
6.8 Converted tree of WSJ_0642 with elementary paragraph nodes only
6.9 Visually presenting statistical structures of sentences (clauses as nodes)
6.10 Instances of perfect inverted pyramids (Type A)
6.11 Inverted pyramids with neighbouring parallel structures (Type B)
6.12 Cup-shape approximate-inverted pyramids (Type C)
6.13 Pseudo-inverted pyramids (Type D)
6.14 Non-inverted pyramid structures without initial paramount nodes (Type E)
6.15 Relations between elementary clause nodes only in the converted discourse tree
6.16 Valency-frequency curves for top 20 discourse valencies
6.17 Rank-frequency curves for top 20 discourse valencies

Tables

4.1 Sizes of sub-corpora of syntactic relations
4.2 Tags of POS used in the syntactic sub-corpora
4.3 Tags of dependency relations used in the syntactic sub-corpora
4.4 The seven combinations of dependency structures in the “Pierre” sample sentence (c = combination)
4.5 Valency patterns and valencies for the “Pierre” sample sentence
4.6 Dependency direction, DD and ADD of the “Pierre” sample sentence
4.7 Discourse valency and distance of Figure 4.11
4.8 Motifs of POS and relations of the “Pierre” sample sentence
4.9 All possible sequencings for the sample sentence
5.1 Summary of all word classes
5.2 POS of dependents (top 10) (R. = rank, Freq. = frequency)
5.3 POS of governors (the first algorithm) (top 10) (R. = rank)
5.4 POS of governors (coarse granularity level) (the second algorithm) (R. = rank, Freq. = frequency)
5.5 Fitting ZA model to POS rank-frequency data of dependents
5.6 Fitting ZA model to POS rank-frequency data of governors (the first algorithm)
5.7 Fitting ZA model to POS rank-frequency data of governors (the second algorithm)
5.8 Rank-frequency data of syntactic functions (R. = rank, Freq. = frequency, F. = function)
5.9 A summary of syntactic functions (aggregated data of six sub-corpora)
5.10 Fitting the ZA model to complete rank-frequency data of dependency relations
5.11 A summary of POS sequencing data (TTR = type-token ratio = type/token)
5.12 Truncated POS sequencing data
5.13 Top 10 POS sequencings (individual corpora) (R. = rank, Freq. = frequency)
5.14 Top 20 POS sequencing (all the corpora)
5.15 Fitting the ZA model to rank-frequency data of POS sequencings (probability > 0.01%)
5.16 Fitting the ZM model to rank-frequency data of POS sequencings (probability > 0.01%)
5.17 A summary of relation and POS sequencing data
5.18 A summary of extracted relation and POS sequencing data (probability > 0.01%)
5.19 Fitting the ZA model to rank-frequency data of syntactic function sequencings (probability > 0.01%)
5.20 Fitting the Zipf-Mandelbrot model to rank-frequency data of syntactic function sequencings (probability > 0.01%)
5.21 Top 10 relation sequencings in each sub-corpus
5.22 Top 20 relation sequencing in all the sub-corpora
5.23 A data summary of combination 4
5.24 Fitting the ZA model to rank-frequency data of “dependent + governor”
5.25 Rank-frequency data of combination 4 (top 10) (R. = rank, Freq. = frequency)
5.26 A data summary of combination 5
5.27 Rank-frequency data for combination 5 across the sub-corpora (top 10) (R. = rank, Freq. = frequency)
5.28 Fitting the ZA model to rank-frequency data of combination 5
5.29 A data summary of combination 6
5.30 Fitting ZA model to complete rank-frequency data of combination 6
5.31 Rank-frequency data of combination 6 (top 10) (individual corpora) (R. = rank, F. = frequency)
5.32 Rank-frequency data of combination 6 (top 10)
5.33 A summary of the distribution of combination 7
5.34 Fitting the ZA model to complete rank-frequency data of combination 7
5.35 Top 10 structures of combination 7 (R. = rank, Freq. = frequency)
5.36 Generating motifs for combination 4 in the “Pierre” sample sentence
5.37 A summary of motifs of combination 4
5.38 Motifs of combination 4 in the sub-corpora (top 5) (R. = rank, Freq. = frequency)
5.39 Fitting the ZA model to motifs of combination 4
5.40 Fitting the ZM model to motifs of combination 4
5.41 Fitting the negative hypergeometric model to motifs of combination 4
5.42 A summary of motifs of combination 5
5.43 Top 10 motifs of combination 5
5.44 Fitting the negative hypergeometric model to motifs of combination 5
5.45 Summary of motifs of combination 6
5.46 Top 5 motifs of combination 6
5.47 Fitting the negative hypergeometric model to motifs of combination 6
5.48 Summary of motifs of combination 7
5.49 Top 5 motifs of combination 7
5.50 Fitting the negative hypergeometric model to motifs of combination 7
5.51 Valencies for words in the “Pierre” sample sentence
5.52 A summary of rank-frequency distribution data for valency motifs
5.53 Rank-frequency distribution data for valency motifs (top 10)
5.54 Frequencies and percentages for top 10 valency motifs
5.55 Fitting the ZA model to valency motif rank-frequency data
5.56 Rank-frequency data of valency motif lengths
5.57 Fitting the ZA model to the valency motif length data
5.58 Lengths of valency motifs and length frequencies (Freq = frequency)
5.59 Fitting the hyper-Poisson function to the valency motif length-frequency interrelation
5.60 Rank-frequency data of DD (top 7, 87.2%–88.4%) (R. = rank, Freq. = frequency)
5.61 Fitting the ZA model to complete DD rank-frequency data
5.62 Key data relating to DD
5.63 Summary of DD sequencing data
5.64 Summary of extracted DD sequencing data (probability > 0.01%)
5.65 Top 6 DD sequencings in each sub-corpus (R. = rank, Freq. = frequency)
5.66 Top 10 DD sequencings in all the sub-corpora
5.67 Fitting the ZM model to rank-frequency DD sequencing data (DD = dependency distance, Sequencing = sequencing)
5.68 Fitting the ZA model to rank-frequency DD sequencing data (DD = dependency distance, Sequencing = sequencing)
5.69 Summary of ADD sequencing data
5.70 Summary of extracted ADD sequencing data (sequencing probability > 0.01%)
5.71 Top 8 ADD sequencings in each sub-corpus (R. = rank, Freq. = frequency)
5.72 Top 100 ADD sequencings in all the sub-corpora
5.73 Fitting the ZA model to rank-frequency ADD sequencing data
5.74 Fitting the ZM model to rank-frequency ADD sequencing data
6.1 Distribution of rhetorical relations in Li et al. (2014)
6.2 Percentages of rhetorical relations across levels (16 classes)
6.3 The 10 most frequent rhetorical relations at various levels (an elaborate classification) (R. = rank) (E- = elaboration-)
6.4 Rank-frequency data of taxonomies 1 and 2 (R. = rank, 1 = taxonomy 1, 2 = taxonomy 2, C = clause, S = sentence, P = paragraph)
6.5 Rhetorical relations across levels (taxonomy 3) (R. = rank, Freq. = frequency)
6.6 Parameter values of the ZA model fitted to data of RST relations between sentences within paragraphs
6.7 Fitting the ZA model to data of RST relations between sentences within paragraphs (f[i]: empirical frequency, NP[i]: theoretical frequency)
6.8 Rank-frequency data of chosen RST relations
6.9 Fitting the right truncated modified Zipf-Alekseev distribution to the rank-frequency data of chosen RST relations
6.10 A summary of rank-frequency distribution data for discourse relation motifs (T = taxonomy)
6.11 Rank-frequency distribution of discourse relation motifs (taxonomy 1, clauses as nodes, part)
6.12 Rank-frequency distribution of discourse relation motifs (taxonomy 3, paragraphs as nodes, part)
6.13 Fitting the negative binomial distribution pattern to rank-frequency data of discourse relation motifs (T = taxonomy)
6.14 Measuring discourse relation motif lengths
6.15 Length-frequency distribution of lengths of discourse relation motifs (T = taxonomy)
6.16 Fitting the positive negative binomial distribution to rank-frequency data of motif lengths (three levels) (T = taxonomy)
6.17 Fitting the positive negative binomial distribution to rank-frequency data of discourse motif lengths (taxonomy 3, clauses as nodes) (f[i]: empirical frequency, NP[i]: theoretical frequency)
6.18 Fitting the mixed Poisson distribution to rank-frequency data of motif lengths (T = taxonomy)
6.19 Parameter values from Table 6.16 (T = taxonomy)
6.20 Parameter values from Table 6.13 (fitting rank-frequency data of motifs, T = taxonomy)
6.21 Structures across levels
6.22 The five most frequent structures across levels
6.23 Sample sentence structures in text WSJ_0610
6.24 Sizes for the three sets of sub-corpora (C = clause, S = sentence, P = paragraph)
6.25 Summary of discourse valencies (C = clauses, S = sentences, P = paragraphs)
6.26 Discourse valency (V) for clause nodes
6.27 Discourse valency (V) for sentence nodes
6.28 Discourse valency (V) for paragraph nodes
6.29 Fitting the ZA model to rank-frequency data of discourse valency
6.30 Comparing syntactic valency and discourse valency (percentages > 1%)
6.31 Summary of discourse valency motifs at three levels (C = clauses, S = sentences, P = paragraphs)
6.32 Discourse valency motifs for clause nodes (top 10) (R. = rank, F = frequency)
6.33 Discourse valency motifs for sentence nodes (top 10) (R. = rank, F = frequency)
6.34 Discourse valency motifs for paragraph nodes (top 10) (R. = rank, F = frequency)
6.35 Fitting the ZM model to the complete rank-frequency data of the discourse valency motifs
6.36 Calculating discourse dependency distance (DD) for the converted discourse tree
6.37 Summary of discourse dependency distance (C = clauses as nodes, S = sentences as nodes, P = paragraphs as nodes)
6.38 Discourse dependency distance (clauses as nodes, top 20) (DD = dependency distance, F = frequency)
6.39 Discourse dependency distance (sentences as nodes, top 20) (R. = rank, DD = dependency distance, F = frequency, S = sentence)
6.40 Discourse dependency distance (paragraphs as nodes, complete data) (R. = rank, DD = dependency distance, F = frequency, P = paragraph, Ave. = average)
6.41 Comparing DD at syntactic and discourse levels
6.42 Fitting the ZA model to the rank-frequency distribution of discourse dependency distance

Appendices

Appendix 1 POS of dependents (R. = rank, Freq. = frequency)
Appendix 2 POS of governors (the first algorithm) (R. = rank)
Appendix 3 Top 100 POS sequencing (all the corpora)
Appendix 4 Top 100 relation sequencing in all the sub-corpora
Appendix 5 Rank-frequency data of DD (Top 20, 97.2%–97.7%) (R. = rank, Freq. = frequency)
Appendix 6 Top 20 DD sequencings in each sub-corpus (R. = rank, Freq. = frequency, DD = dependency distance)
Appendix 7 Top 100 DD sequencings in all the corpora
Appendix 8 Top 20 ADD sequencings in each sub-corpus (R. = rank, Freq. = frequency)
Appendix 9 Top 100 ADD sequencings in all the sub-corpora (Sequ. = sequencing, R. = rank, Freq. = frequency)
Appendix 10 Discourse dependency distance (paragraphs as nodes, complete data) (R. = rank, DD = dependency distance, F = frequency, P = paragraph, Ave. = average)

1 Introduction

1.1 Starting from the family and state structures in ancient China

The root of the world is in the state, the root of the state is in the family, the root of the family is in cultivating oneself.
—Mencius (2012, p. 150)

Similar structures abound. For Chinese people, one of the most frequently mentioned is the similar structure between the state and the family. In traditional Chinese culture, the state was the extension and expansion of the family. In this homogeneous pattern of family and state, the family was the small state and the state was the large family. The patriarch bore the greatest power and was supreme within the family; within the state, the emperor was supreme and had the greatest power. The emperor, by virtue of his hereditary patriarchal status, rightly ruled over his empire, and this patriarchal status did not cease with the termination of his life. It was passed on from generation to generation through the bloodline. Each emperor appointed himself “Son of Heaven”, a noble dragon. When he died, the imperial system continued and was naturally inherited by his firstborn son.

While the patriarch was the head of the family, the emperor was the head of the state and the strict “father” of the nation. Not only was the emperor like a father, but the heads of local authorities at all levels were also regarded as the people’s “parental officials”. This homogeneous pattern of the family and the state has been deeply rooted in Chinese culture. In short, the patriarch was the “ruler of the family” and the ruler was the “father of the state”; the ruler and the father were one and the same, and the patriarchal system thus permeated society as a whole, even overshadowing class and hierarchical relations. This model of similar structure between family and state has been much criticised, but we cannot deny that it helps shape the relationships among individuals, families and the whole nation, and remains a very important reason for Chinese collectivism. Mencius (around 372 BC–289 BC) (2012), a renowned Chinese philosopher who followed Confucianism, accentuated the role of the family as the root of the state and, further, that of individuals as the root of the family.

Similar structures are a research interest in many fields. Typically, the main purpose of studying isomorphism in mathematics is to apply mathematical theory to different domains (Mao, 2011). If two structures are isomorphic, then the objects on them will have similar properties and operations, and propositions that hold for one structure will also hold for the other. Thus, if an object structure is found to be isomorphic to a structure in some mathematical domain, and many theorems have been proved for that structure, then these theorems can be immediately applied to that domain. If certain mathematical methods can be used for that structure, then these methods can also be used for structures in the new domain. This makes it easy to understand and work with the structure of that object and often allows mathematicians to gain a deeper understanding of the domain (Greer & Harel, 1998).

Similar structures between higher and lower hierarchies particularly fascinate language researchers. If isomorphy or, to a lesser degree, some significant similarities can be found to hold between two or more linguistic hierarchies or sub-systems, we hypothesise that the regularities discovered at one hierarchy might apply to the other hierarchies. When the same mathematical model can capture the link across linguistic levels for one particular unit/property/relation, one similarity can be claimed to have been found. For instance, Menzerath’s law that “the larger the whole, the smaller the parts” (Menzerath, 1954, p. 100) establishes a very interesting link between neighbouring linguistic levels. Function 1.1 (with parameters a, b, c) can model this kind of connection, where the length of the upper-level unit is represented by x and the mean length of its immediate constituents by y. Through Menzerath’s law, Chen and Liu (2019) discover a hierarchical structure of stroke → component → word → clause → sentence in written Chinese.

y = ax^b e^{-cx}   (1.1)
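To make Function 1.1 concrete, the following minimal sketch fits it to a small, purely illustrative data set with SciPy; the numbers and starting values are invented for illustration and are not taken from Chen and Liu (2019) or from the corpora studied in this book.

```python
# A minimal sketch of fitting the Menzerath-Altmann function (1.1),
# y = a * x^b * exp(-c * x), to illustrative (invented) data.
import numpy as np
from scipy.optimize import curve_fit

def menzerath(x, a, b, c):
    return a * np.power(x, b) * np.exp(-c * x)

# x: size of the upper-level construct (e.g., sentence length in clauses)
# y: mean size of its immediate constituents (e.g., mean clause length in words)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([9.8, 8.9, 8.3, 7.9, 7.6, 7.4, 7.3, 7.2])

(a, b, c), _ = curve_fit(menzerath, x, y, p0=(10.0, -0.1, 0.01))
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}")
```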

Besides the size of constructs, we can also investigate other similarities or intrinsic links across linguistic levels. For instance, we might ask whether there is a similar structure, represented as tree structures, between the syntactic level and the discourse level above it. This is where the present study starts: it aims to examine similarities between syntactic and discourse structures from a dependency perspective.

The remainder of this chapter is structured as follows: Section 1.2 introduces the notion of dependency in general. Section 1.3 turns to the main differences between dependency grammars and constituency/phrase-structure grammars. Section 1.4 then proceeds to discourse relations and Rhetorical Structure Theory (RST).

1.2 The notion of dependency

In linguistics, dependency refers to a relation in which one element is governed by another (Percival, 1990). As early as the 13th century, Latin grammarians claimed there was “a dependency relation between all two-member constructions”

(Percival, 1990, p. 35). Osborne (2019, p. 110) defines dependency in terms of licensing, indicating that “A governor licenses the appearance of its governees”. These definitions suggest the possibility of dependency between linguistic units other than words. Researchers (e.g., de Marneffe & Nivre, 2019; Mel’čuk, 2011; Mel’čuk & Polguère, 2009; Mel’čuk et al., 2003; Torrego, 1998) identify three major types of dependency: semantic, morphological and syntactic dependency. Other types of dependency are generally ignored.1 More often than not, dependency refers to syntactic dependency. Dependency grammar arises when the basic elements in these asymmetrical relations are words.

Unlike phonetics or morphology, the study of syntax relies on a formal grammar. Dependency grammars and phrase-structure grammars (constituency grammars) are the two basic frameworks. Contemporary syntactic studies are based either on phrase-structure grammars or on dependency grammars, or, in some cases, on a combination of both (Liu, 2009a). Dependency grammar is favoured in the present study for its “five advantages” (Liu, 2022, p. 8), i.e., “better natural language processing applications, easier mapping from syntax to semantics, better handling of flexible sequential languages, better psychological realism and easier construction of application-oriented, high-precision syntactic analysis programs”. Section 1.3 elaborates on the differences and similarities between the two frameworks.

So, in this study, at the syntactic level, we employ dependency grammars. At the discourse level, we borrow some basic concepts from the syntactic level and apply them to linguistic units larger than words (e.g., clauses, sentences and paragraphs). This borrowing is legitimate because, similar to sentence structures, text structures can be analysed both hierarchically and relationally (Mann & Thompson, 1987, 1988; Sanders & van Wijk, 1996; Ziegler & Altmann, 2003). In the next section, we will first turn to syntactic dependency.

1.3 Dependency grammars vs. constituency grammars

The study of syntax covers the investigation into the principles/rules and processes which govern sentence structures (Carnie, 2021; Chomsky, 2002, 2014; Sakel, 2015). Historically, the word “syntax” derives from two Greek terms, where “syn” indicates “together” and “taxis” signifies “ordering”. Put together, “syntax” means “coordination”: coordinating and governing sentence structures. There are at least two dimensions of sentence structures (Ziegler & Altmann, 2003), namely (a) horizontal, or linear, order, represented by word order, and (b) vertical order. Hierarchy results from the vertical order, as grammatical constructs are not linearly ordered in the brain. According to Tesnière (1959, p. 12), syntactic study is, in its essence, the transfer between the two orders, namely between the linear structure of a sentence and its two-dimensional tree structure. To encode a sentence is to establish the connexion (“connection”) among words, and to decode it is to comprehend the connexion. Samuelsson and Wiren (2000, p. 59) claim that syntactic parsing is “the process of analysing a sentence to determine its syntactic structure

according to a formal grammar”. Such an output is a “hierarchical structure suitable for semantic interpretation”. In short, syntactic parsing converts a linear string into a two-dimensional structure with the help of a formal grammar (Grune & Jacobs, 1990; Liu, 2009a). In both types of formal grammars (dependency grammars and phrase-structure grammars), we establish hierarchical relations in which components are linked. As the names suggest, the constituency model of syntax describes constituency relations while the dependency model of syntax describes dependency relations.

Phrase-structure grammars, or constituency grammars, take phrase structures or immediate constituents as the objects of study for syntactic structure analysis. Proposed by Bloomfield (1933) in the 1930s, this method was vigorously advocated by the structuralist school in the following two decades. Later, Chomsky (1965) and his followers further extended and popularised it. Many well-known grammars are based on phrase structures, for example, Government and Binding, Generalised Phrase-Structure Grammar, Lexical Functional Grammar, Categorial Grammar, Cognitive Grammar and Construction Grammar (Osborne et al., 2011).

Constituency grammars view sentence structures in terms of constituency relations, which is the fundamental trait shared by all constituency grammar frameworks. Every element in a sentence corresponds to one or more nodes in the syntactic tree. This one-to-one-or-more correspondence is clearly visible in Figure 1.1. A six-word sentence such as “The student has an interesting book” implies more than six nodes in the syntactic structure: six for the individual words, three for phrase structures (a noun phrase or NP “the student”, a verb phrase or VP “has an interesting book”, and another noun phrase “an interesting book”), and one more for the whole sentence (S). The relation between a component and its upper node (its syntagm) is a part-whole one. Direct relations between individual words are not directly visible from the constituency relations.

Figure 1.1  Constituency relations of “The student has an interesting book” Source: https://en.wikipedia.org/wiki/Phrase_structure_grammar
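The node count described above can be checked programmatically. The sketch below assumes the NLTK library; the bracketing is hand-written to mirror Figure 1.1 rather than produced by a parser, and it confirms that the six-word sentence corresponds to ten nodes: six words, three phrases and the sentence node S.

```python
# A minimal sketch: the constituency tree of Figure 1.1, hand-encoded with NLTK.
from nltk import Tree

tree = Tree.fromstring("(S (NP The student) (VP has (NP an interesting book)))")

words = tree.leaves()               # the 6 word nodes
phrases = list(tree.subtrees())     # S, NP, VP, NP -> 4 phrasal nodes (incl. S)
print(len(words) + len(phrases))    # 10 nodes in total, as noted in the text
```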

Phrase-structure grammars have been widely researched, taking a primary position in the field of syntactic study. In contrast, grammars based on dependencies (e.g., Ágel et al., 2003; Engel, 1994; Eroms, 2000; Groß, 1999; Hays, 1964; Heringer, 1996; Hudson, 1984, 1990; Kahane, 2000; Lobin, 1993; Mel’čuk, 1988; Pickering & Barry, 1993; Schubert, 1988; Starosta, 1988; Tarvainen, 2000; Tesnière, 1959) come second. Among these dependency-based grammars are Hudson’s Word Grammar (Hudson, 1984, 1990), Mel’čuk’s Meaning-Text Theory (Mel’čuk, 1988), Starosta’s Lexicase Grammar (Starosta, 1988) and the German schools (e.g., Engel, 1994; Eroms, 2000; Groß, 1999; Heringer, 1996). For any of these dependency-based grammars, the core operation is always dependency (Ninio, 2006, p. 8).

Under the framework of dependency grammar, a sentence is a unit in which its basic elements, words, are connected. The interconnections between words are dependency relations, or in Tesnière’s term, connexions (“connections”). In a pair of words with a dependency relation, the one that governs is the governor or head, and it governs its dependent(s). In a sentence, a word can be a governor and a dependent simultaneously (Liu, 2009a). Take Figure 1.2 as an example: “has” is the only word without a head in the sentence, hence the “root” of the dependency tree. The word “book” is simultaneously the governor of two words (“an” and “interesting”) and the dependent of “has”. Positioning governors above their dependents gives rise to the hierarchies of dependency trees; in other words, it is the asymmetric relations between words that generate hierarchies.

It is acknowledged by researchers that dependency relations are the basis of dependency grammar. Despite different understandings of the definition of dependency relations, it is acknowledged that three basic properties are their core features (Mel’čuk et al., 2003; Hudson, 2007; Liu, 2009a; Nivre, 2006, p. 98). First, a dependency relation is a binary relation between two linguistic units. Second, this binary relation is usually asymmetrical, holding between a governor and a dependent. Third, such an asymmetrical relation is distinguished and marked with a label. These key features further indicate that dependency relations are not limited to relations between words. Theoretically speaking, dependency relations can occur between other linguistic units like clauses, sentences, paragraphs or even larger units.

Figure 1.2  Dependency structure of “The student has an interesting book” Source: Zhang & Liu, 2017b
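The dependency analysis in Figure 1.2 can be written down compactly as head-dependent pairs. The sketch below is a minimal illustration, not the annotation scheme used later in this book: words are numbered by their linear position, 0 stands for an artificial root, and all relation labels except “subj” are hypothetical.

```python
# A minimal sketch of the dependency tree in Figure 1.2 as
# (dependent position, governor position, relation) triples.
sentence = ["The", "student", "has", "an", "interesting", "book"]
arcs = [
    (1, 2, "det"),    # The         <- student   (hypothetical label)
    (2, 3, "subj"),   # student     <- has
    (3, 0, "root"),   # has has no head: the root of the tree
    (4, 6, "det"),    # an          <- book      (hypothetical label)
    (5, 6, "attr"),   # interesting <- book      (hypothetical label)
    (6, 3, "obj"),    # book        <- has       (hypothetical label)
]

# One node per word: exactly as many arcs as words (sentence length).
assert len(arcs) == len(sentence)

root = next(dep for dep, gov, _ in arcs if gov == 0)
print(sentence[root - 1])                               # has
print([sentence[d - 1] for d, g, _ in arcs if g == 6])  # ['an', 'interesting']
```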

Typically, dependency relations exist between words. Take as an example the sentence “The student has an interesting book”. Figure 1.2 displays the three key features of dependency relations. Dependency relations occur between two elements (e.g., between “student” and “has”). In this relationship, “has” acts as the governor and “student” as the dependent. It is the asymmetry between components that gives rise to the hierarchical character of syntactic dependency trees. In the example, the relationship is marked as “subj”, indicating that “student” is the subject of “has”.

Rather than the one-to-one-or-more correspondence of constituency grammars, the dependency relation bears a one-to-one correspondence. As a result, every single node in the syntactic tree corresponds to one word, restricting the number of nodes in the tree to exactly the number of words, or sentence length. For instance, in Figure 1.2, the sentence “The student has an interesting book” bears six words in the linear structure and six nodes in the two-dimensional tree, rather than the ten nodes in Figure 1.1. Such a one-to-one correspondence directly stresses the connexion between words. From the perspective of node function, in dependency grammars the root enjoys a prominent position due to its unique role, and each word is clearly bonded with at least one other word via a clear relation. In addition, the reduction of nodes in such a formal grammar greatly simplifies the formalisation of the structure and facilitates natural language processing (NLP). In short, dependency grammars make the syntactic structure clearer and more concise.

By comparison, we can see from the aforementioned features that dependency grammars stress the functions of individual language elements and their relations. This makes dependency grammars better suited to describing sentence structures (Liu, 2009a; Liu, 2022). In addition, dependency grammars bear more psychological reality than constituency grammars do (e.g., Hudson, 2003; Peng, 2005). Syntactic studies should serve the processing of authentic language materials. Along this line, with their better capability for syntactic parsing, dependency grammars are more widely applied. For instance, SyntaxNet from Google, one of the most precise natural language parsers, is based on dependency relations. Dependency grammars are practical grammatical theories for solving language parsing problems (Liu, 2009a). This mainstream application reflects the linguistic reality. As mentioned in this section, dependency relations can theoretically also exist between other linguistic units, for instance, discourse units like clauses, sentences and even paragraphs.

Having briefly addressed the differences between the two formal methods of syntax, we turn to the analysis of discourse relations along with RST in Section 1.4.

1.4 Discourse relations and Rhetorical Structure Theory

Sanders and van Wijk (1996) claim that, like sentence structures, textual structures can be analysed in two aspects, both hierarchically and relationally. The former aspect deals with how one segment dominates another, what the distance between connected text segments is and, more often than not, how informationally relevant the segments are. The latter aspect involves the meaning of connections.

To discover the similarities and differences between dependency relations at the syntactic and discourse levels in our study, we first need a framework or formalism for text structure analysis. From the very few methods which analyse both the hierarchical and the relational aspects, we choose Rhetorical Structure Theory (RST), proposed by Mann and Thompson (1988), as discourse trees in RST indicate the differing importance of segments in a text in a clear and coherent way (Marcu, 1990, 2000). This functional formalism describes how discourse segments or text spans jointly make up a complex hierarchical discourse tree structure through the connection of rhetorical relations, which are also known as discourse relations or coherence relations (Mann & Thompson, 1988; Taboada & Mann, 2006a). Taboada and Mann (2006a, 2006b) provide a detailed overview of RST. It is acknowledged as the most ambitious attempt in the field (Feng & Hirst, 2014). RST postulates a hierarchically connected structure of texts. Coherence is construed as all components of a text playing certain roles when connected to other components of the text (Mann & Thompson, 1988; Taboada, 2004; Taboada & Mann, 2006a).

The literature abounds in RST-based analyses of different genres, for instance, fund-raising texts (Abelen et al., 1993; Mann & Thompson, 1992), expository texts (Goutsos, 1996; Sanders & van Wijk, 1996), business request letters (Kong, 1998), argumentative essays (Godó, 2008), news (Lindley et al., 2001; Ramsay, 2000; Yue & Feng, 2005; Yue & Liu, 2011), dialogues (Daradoumis, 1994; Taboada, 2004), medical discourses (da Cunha et al., 2007), research article abstracts (Rimrott, 2007), online suicide identification (Liu & Liu, 2021) and literature (Anita & Subalalitha, 2019), to name just a few. RST corpora come in various languages, for instance, English (Carlson et al., 2002; Zeldes, 2017), German (Stede, 2004; Stede & Neumann, 2014), Portuguese (Cardoso et al., 2011; Pardo & Nunes, 2008), Chinese (Kong & Zhou, 2017; Li et al., 2014; Peng et al., 2022; Yue & Liu, 2011; Zhou & Xue, 2015), Basque (Iruskieta et al., 2013), Bangla (Das & Stede, 2018), Dutch (Redeker et al., 2012), Persian (Shahmohammadi et al., 2021) and Spanish (da Cunha et al., 2011), as well as a Spanish-Chinese parallel corpus (Cao et al., 2018). On the basis of these corpora, some comparative studies (e.g., da Cunha & Iruskieta, 2010; Godó, 2008; Iruskieta et al., 2014; Shih, 2016; Trnavac & Taboada, 2021) have been carried out.

RST relations, basically asymmetrical, hold between a nucleus and one or more satellites, where the former discourse segment carries the main information and the latter carries secondary information related to the nucleus. Relations of equal status between elements, like Contrast, List or Sequence, connect symmetrical elements and are thereby represented by multi-nuclear spans.

A typical RST tree, illustrating both the relational and the hierarchical aspects of RST, is exemplified by Figure 1.3, Text WSJ_0642 of the RST Discourse Treebank (RST-DT) (Carlson et al., 2002, 2003), the benchmark and most popular RST corpus. Segmented into clauses, this sample text is presented as a seven-layered tree with relations held among spans of various sizes. In the first paragraph (Span 1–6, or clauses 1 through 6), at the lowest layer, giving additional information about the situation presented in Node 4 as an Elaboration-additional-e (with -e indicating


Figure 1.3  A sample RST tree (WSJ-0642)

embeddedness), Node 3 (the satellite, here an elementary discourse unit [EDU]) is related to Node 4 (the nucleus), and together they form a bigger non-EDU Span 3–4. At this layer, Span 3–4 (the satellite), constituting an Elaboration-additional-e again, is further aggregated to Node 2 (again the nucleus), hence the composition of an even bigger Span 2–4. Further on, Span 1, attributing the source (Attribution), joins Span 2–4 and helps generate Span 1–4. Span 1–5 and Span 1–6 are similarly formed. In the second paragraph (Clauses/Span 7–10), Node 8 elaborates on Node 7 (Elaboration-additional-e); Node 9 attributes the source of Node 10 (Attribution); Span 7–8 and Span 9–10 form a relation of Antithesis, with the latter being the nucleus, indicating a contrast between the situation presented in Span

7–8 and that in Span 9–10. At the top layer, it is Span 7–10 (the satellite) that is attached to Span 1–6 (the nucleus) with a relation of Elaboration-additional. In this fashion, this kind of aggregation occurs until the top layer, where the biggest Span 1–10 encompasses all segments at its lower layers. From the recursive process we see how all units are promoted and incorporated into the tree by being constituents of various sizes (e.g., 3, 3–4, 2–4, 1–4, 1–5, 1–6, 1–10, with the spans getting increasingly bigger) with recursively applied RST relations. The arrows stand for relational relevance and importance, pointing from one or more satellites to the nucleus. It is noteworthy here that the clauses defined in the RST-DT are somewhat different from clauses in the traditional sense, with finite verbs. In most cases, each clause in the corpus bears a verb, finite or otherwise.

Similar to dependency syntactic trees, the RST relations between the parts of the discourse are distinctively labelled. Different from dependency syntactic trees but similar to phrase-structure syntactic trees, there are both EDUs and non-elementary units. The EDUs, in this case basically clauses, are the leaves of the discourse tree. Each EDU is combined with another discourse unit, not necessarily an EDU, and together they form bigger spans, which are non-terminal nodes. In the composition of the whole tree, the correspondence is thus basically one-to-more rather than one-to-one. This type of RST discourse tree can be called a dependency tree, as the relations are basically asymmetrical. But such trees bear more resemblance to phrase-structure trees in syntax (Li et al., 2014; Morey et al., 2018).

In syntax, phrase-structure trees can be converted into dependency trees. We can also convert traditional RST trees into discourse dependency trees whose nodes are all at the same granularity level as the terminal nodes. This conversion resembles that from constituent to dependency structures for sentences (de Marneffe & Manning, 2008; Hayashi et al., 2016; Hirao et al., 2013; Johansson & Nugues, 2007; Li et al., 2014; Morey et al., 2018; Yamada & Matsumoto, 2003). Kobayashi et al. (2020) propose that any algorithm employed in syntactic parsing can be used in RST parsing. Section 4.3.2 will elaborate on the conversion in this study; a toy illustration of the general idea is sketched after the chapter note below. After the conversion, analogous to syntactic dependency analysis in the syntactic field, discourse dependency analysis can be carried out (Hudson, 2007; Liu et al., 2017; Sun & Xiong, 2019).

Note
1 https://www.oxfordbibliographies.com/display/document/obo-9780199772810/obo9780199772810-0104.xml (Accessed February 9, 2023).
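As promised above, here is a toy sketch of converting an RST constituency-style tree into discourse dependency arcs, in the spirit of Li et al. (2014): the head of every span is taken to be the head of its (leftmost) nucleus, and the heads of the remaining children depend on it. The node format, function name and the tiny example tree (loosely modelled on the second paragraph of WSJ_0642) are all hypothetical; the actual conversion used in this study is described in Section 4.3.2.

```python
# A minimal sketch (hypothetical data format) of RST-to-dependency conversion.
# An EDU is an int; an internal span is {"children": [(role, relation, subtree), ...]},
# where role is "N" (nucleus) or "S" (satellite).

def to_dependencies(node, arcs):
    """Return the head EDU of `node`; collect (dependent, governor, relation) arcs."""
    if isinstance(node, int):                      # an elementary discourse unit
        return node
    heads = [(role, rel, to_dependencies(sub, arcs))
             for role, rel, sub in node["children"]]
    head = next(h for role, _, h in heads if role == "N")   # head of leftmost nucleus
    for _, rel, h in heads:
        if h != head:
            arcs.append((h, head, rel))
    return head

# Toy tree loosely modelled on clauses 7-10 of WSJ_0642: 8 elaborates 7,
# 9 attributes 10, and span 7-8 is the satellite of an Antithesis whose
# nucleus is span 9-10.
tree = {"children": [
    ("S", "antithesis",
     {"children": [("N", "span", 7), ("S", "elaboration-additional-e", 8)]}),
    ("N", "span",
     {"children": [("S", "attribution", 9), ("N", "span", 10)]}),
]}

arcs = []
root = to_dependencies(tree, arcs)
print(root)   # 10
print(arcs)   # [(8, 7, 'elaboration-additional-e'), (9, 10, 'attribution'), (7, 10, 'antithesis')]
```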

References

Abelen, E., Gisela, R., & Sandra, A. T. (1993). The rhetorical structure of US-American and Dutch fund-raising letters. Text, 13(3), 323–350.
Ágel, V., Eichinger, L., Eroms, H., Heringer, H. J., & Lobin, H. (Eds.). (2003). Dependency and valency: An international handbook of contemporary research (Vol. 1). Walter de Gruyter.

Anita, R., & Subalalitha, C. N. (2019). Building discourse parser for Thirukkural. In D. M. Sharma, P. Bhattacharyya, & R. Sangal (Eds.), Proceedings of the 16th international conference on natural language processing (pp. 18–25). International Institute of Information Technology, Hyderabad, India, December 2019. NLP Association of India.
Bloomfield, L. (1933). Language. Henry Holt.
Cao, S., & Gete, H. (2018). Using discourse information for education with a Spanish-Chinese parallel corpus. In Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018) (pp. 2254–2261), May 2018.
Cardoso, P. C. F., Maziero, E. G., Jorge, M. L. C., Seno, E. M., Di Felippo, A., Rino, L. H. M., & Pardo, T. A. (2011). CSTnews—A discourse-annotated corpus for single and multidocument summarization of news texts in Brazilian Portuguese. In Proceedings of the 3rd RST Brazilian meeting (pp. 88–105), October 2011.
Carlson, L., Marcu, D., & Okurowski, M. E. (2002). RST Discourse Treebank, LDC2002T07 [Corpus]. Linguistic Data Consortium.
Carlson, L., Marcu, D., & Okurowski, M. E. (2003). Building a discourse-tagged corpus in the framework of rhetorical structure theory. In J. van Kuppevelt & R. W. Smith (Eds.), Current directions in discourse and dialogue (pp. 85–112). Kluwer Academic Publishers.
Carnie, A. (2021). Syntax: A generative introduction. John Wiley and Sons.
Chen, H., & Liu, H. (2019). A quantitative probe into the hierarchical structure of written Chinese. In X. Chen & R. Ferrer-i-Cancho (Eds.), Proceedings of the first workshop on quantitative syntax (Quasy, SyntaxFest 2019) (pp. 25–32). Paris, France, August 26–30, 2019.
Chomsky, N. (1965). Aspects of the theory of syntax. The MIT Press.
Chomsky, N. (2002 [1957]). Syntactic structures (2nd ed.). De Gruyter Mouton.
Chomsky, N. (2014). Aspects of the theory of syntax (Vol. 11). The MIT Press.
da Cunha, I., & Iruskieta, M. (2010). Comparing rhetorical structures in different languages: The influence of translation strategies. Discourse Studies, 12(5), 563–598.
da Cunha, I., Torres-Moreno, J.-M., & Sierra, G. (2011). On the development of the RST Spanish Treebank. In Proceedings of the fifth law workshop (ACL 2011) (pp. 1–10). Portland, Oregon, USA, June 23–24, 2011.
da Cunha, I., Wanner, L., & Cabré, T. (2007). Summarization of specialized discourse: The case of medical articles in Spanish. Terminology, 13(2), 249–286.
Daradoumis, T. (1994). Building an RST-based multi-level dialogue context and structure. In Proceedings of the tenth conference on artificial intelligence for applications (pp. 465–466). IEEE, March 1994.
Das, D., & Stede, M. (2018). Developing the Bangla RST Discourse Treebank. In E. Grave, P. Bojanowski, P. Gupta, A. Joulin, & T. Mikolov (Eds.), Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018). Miyazaki, Japan, May 2018.
de Marneffe, M. C., & Manning, C. D. (2008). The Stanford typed dependencies representation. In Coling 2008: Proceedings of the workshop on cross-framework and cross-domain parser evaluation (pp. 1–8). Manchester, August 2008.
de Marneffe, M. C., & Nivre, J. (2019). Dependency grammar. Annual Review of Linguistics, 5, 197–218.
Engel, U. (1994). Syntax der deutschen Gegenwartssprache (3rd ed.). Erich Schmidt.
Eroms, H. (2000). Syntax der deutschen Sprache. Walter de Gruyter.
Feng, V. W., & Hirst, G. (2014). A linear-time bottom-up discourse parser with constraints and post-editing. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (Vol. 1: Long Papers) (pp. 511–521). Baltimore, Maryland, USA, June 23–25, 2014.

Godó, A. M. (2008). Cross-cultural aspects of academic writing: A study of Hungarian and North American college students L1 argumentative essays. International Journal of English Studies, 8(2), 65–111.
Goutsos, D. (1996). A model of sequential relations in expository text. Text, 16(4), 501–533.
Greer, B., & Harel, G. (1998). The role of isomorphisms in mathematical cognition. The Journal of Mathematical Behavior, 17(1), 5–24.
Groß, T. (1999). Theoretical foundations of dependency syntax. Iudicium.
Grune, D., & Jacobs, C. (1990). Parsing techniques. Ellis Horwood.
Hayashi, K., Hirao, T., & Nagata, M. (2016). Empirical comparison of dependency conversions for RST discourse trees. In Proceedings of the 17th annual meeting of the special interest group on discourse and dialogue (pp. 128–136). Los Angeles, Stroudsburg, PA, September 13–15, 2016. Association for Computational Linguistics.
Hays, D. (1964). Dependency theory: A formalism and some observations. Language, 40, 511–525.
Heringer, H. (1996). Deutsche syntax dependentiell. Stauffenberg.
Hirao, T., Yoshida, Y., Nishino, M., Yasuda, N., & Nagata, M. (2013). Single-document summarization as a tree knapsack problem. In Proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1515–1520). Seattle, Washington, USA, October 18–21, 2013.
Hudson, R. (1984). Word grammar. Basil Blackwell.
Hudson, R. (1990). An English word grammar. Basil Blackwell.
Hudson, R. (2003). The psychological reality of syntactic dependency relations. In Proceedings of first international conference on meaning-text theory (pp. 181–192). Paris, June 16–18, 2003.
Hudson, R. (2007). Language networks: The new word grammar. Oxford University Press.
Iruskieta, M., Aranzabe, M. J., de Ilarraza, A. D., Gonzalez, I., Lersundi, M., & de Lacalle, O. L. (2013). The RST Basque TreeBank: An online search interface to check rhetorical relations. In 4th Workshop RST and discourse studies (pp. 40–49). Fortaleza, CE, Brasil, October 21–23, 2013.
Iruskieta, M., da Cunha, I., & Taboada, M. (2014). Principles of a qualitative method for rhetorical analysis evaluation: A contrastive analysis English-Spanish-Basque. Language Resources and Evaluation, 49(2), 263–309.
Johansson, R., & Nugues, P. (2007). Extended constituent-to-dependency conversion for English. In J. Nivre, H. J. Kaalep, K. Muischnek, & M. Koit (Eds.), Proceedings of the 16th Nordic conference of computational linguistics (NODALIDA 2007) (pp. 105–112). University of Tartu, Tartu, May 25–26, 2007.
Kahane, S. (Ed.). (2000). Les grammaires de dépendance (Dependency Grammars) (Traitement automatique des langues 41). Hermes.
Kobayashi, N., Hirao, T., Kamigaito, H., Okumura, M., & Nagata, M. (2020). Top-down RST parsing utilizing granularity levels in documents. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 05, pp. 8099–8106). April 2020.
Kong, F., & Zhou, G. (2017). A CDT-styled end-to-end Chinese discourse parser. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 16(4), 1–17.
Kong, K. C. C. (1998). Are simple business request letters really simple? A comparison of Chinese and English business request letters. Text, 18(1), 103–141.
Lindley, C., Davis, J., Nack, F., & Rutledge, L. (2001). The application of rhetorical structure theory to interactive news program generation from digital archives (technical report No. INS-R0101). Centrum voor Wiskunde en Informatica.

Liu, H. (2009a). Dependency grammar: From theory to practice. Science Press. (In Chinese).
Liu, H. (2022). Dependency relations and language networks. Science Press. (In Chinese).
Liu, H., Xu, C., & Liang, J. (2017). Dependency distance: A new perspective on syntactic patterns in natural languages. Physics of Life Review, 21, 171–193.
Liu, X., & Liu, X. (2021). Online suicide identification in the framework of rhetorical structure theory (RST). Healthcare, 9, 847.
Li, S., Wang, L., Cao, Z., & Li, W. (2014). Text-level discourse dependency parsing. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (pp. 25–35). Baltimore, Maryland, USA, June 23–25, 2014.
Lobin, H. (1993). Koordinationssyntax als prozedurales Phänomen. Studien zur deutschen Grammatik, 46. Narr.
Mann, W. C., & Thompson, S. A. (1987). Rhetorical structure theory: A framework for the analysis of texts. In IPRA papers in pragmatics (Vol. 1, pp. 1–22). University of Southern California Marina Del Rey Information Sciences Inst.
Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243–281.
Mann, W. C., & Thompson, S. A. (Eds.). (1992). Discourse description: Diverse linguistic analyses of a fundraising text. John Benjamins.
Mao, L. (2011). Combinatorial geometry with applications to field theory, graduate textbook in mathematics. Infinite study. The Educational Publisher.
Marcu, D. (1990). Discourse trees are good indicators of importance in text. In I. Mani & M. Maybury (Eds.), Advances in automatic text summarization (pp. 123–136). The MIT Press.
Marcu, D. (2000). The theory and practice of discourse parsing and summarization. The MIT Press.
Mel’čuk, I. (1988). Dependency syntax: Theory and practice. State University of New York Press.
Mel’čuk, I. (2011). Dependency in language-2011. In K. Gerdes, E. Hajičová, & L. Wanner (Eds.), Proceedings of international conference on dependency linguistics 2011 (pp. 1–16). Barcelona, Spain, September 5–7, 2011.
Mel’čuk, I., Agel, V., Eichinger, L. M., & Heringer, H. J. (2003). Levels of dependency in linguistic description: Concepts and problems. In H. W. Eroms & P. Hellwig (Eds.), Dependenz und valenz/dependency and valency: Ein internationales handbuch der zeitgenossischen forschung/an international handbook of contemporary research (Vol. 1, pp. 188–229). Walter de Gruyter.
Mel’čuk, I., & Polguère, A. (2009). Dependency in linguistic description. John Benjamins.
Mencius. (2012). Mencius (translated and annotated by L. Wan & X. Lan). Zhonghua Book Company. (In Chinese).
Menzerath, P. (1954). Die Architektonik des deutschen Wortschatzes. Dümmler.
Morey, M., Muller, P., & Asher, N. (2018). A dependency perspective on RST discourse parsing and evaluation. Computational Linguistics, 44(2), 197–235.
Ninio, A. (2006). Language and the learning curve: A new theory of syntactic development. Oxford University Press.
Nivre, J. (2006). Inductive dependency parsing. Springer.
Osborne, T. J. (2019). A dependency grammar of English: An introduction and beyond. John Benjamins Publishing Company.
Osborne, T. J., Putman, M., & Gross, T. M. (2011). Bare phrase structure, label-less trees, and specifier-less syntax: Is minimalism becoming a dependency grammar? The Linguistic Review, 28, 315–384.

Pardo, T. A. S., & Nunes, M. G. V. (2008). On the development and evaluation of a Brazilian Portuguese discourse parser. Journal of Theoretical and Applied Computing, 15(2), 43–64.
Peng, Y. (2005). On psychological reality of phrase-structure grammar and dependency grammar. PhD dissertation, Shanghai International Studies University. (In Chinese).
Peng, S., Liu, Y. J., & Zeldes, A. (2022). GCDT: A Chinese RST Treebank for multigenre and multilingual discourse parsing. arXiv preprint arXiv:2210.10449.
Percival, K. (1990). Reflections on the history of dependency notions in linguistics. Historiographia Linguistica, 17, 29–47.
Pickering, M., & Barry, G. (1993). Dependency categorial grammar and coordination. Linguistics, 31, 855–902.
Ramsay, G. (2000). Linearity in rhetorical organisation: A comparative cross-cultural analysis of news text from the People’s Republic of China and Australia. International Journal of Applied Linguistics, 10(2), 241–258.
Redeker, G., Berzlánovich, I., van der Vliet, N., Bouma, G., & Egg, M. (2012). Multi-layer discourse annotation of a Dutch Text Corpus. Age, 1, 2.
Rimrott, A. (2007). The discourse structure of research articles abstracts—A rhetorical structure theory (RST) analysis. In Proceedings of the 22nd NorthWest Linguistics Conference (NWLC) at Simon Fraser University (pp. 207–220). Linguistics Graduate Student Association, Canada.
Sakel, J. (2015). Study skills for linguistics. Routledge.
Samuelsson, C., & Wiren, M. (2000). Parsing techniques. In Handbook of natural language processing. Marcel Dekker, Inc.
Sanders, T., & van Wijk, C. (1996). PISA: A procedure for analyzing the structure of explanatory texts. Text, 16(1), 91–132.
Schubert, K. (1988). Metataxis: Contrastive dependency syntax for machine translation. Foris.
Shahmohammadi, S., Veisi, H., & Darzi, A. (2021). Persian rhetorical structure theory. arXiv preprint arXiv:2106.13833.
Shih, S. S. P. (2016). English-Chinese comparison of rhetorical structure of research article. NCYU Inquiry in Applied Linguistics, 67.
Starosta, S. (1988). The case for Lexicase: An outline of Lexicase grammatical theory. Pinter Publishers.
Stede, M. (2004). The Potsdam commentary corpus. In Proceedings of the ACL 2004 workshop on “Discourse Annotation” (pp. 96–102). Barcelona, Spain, July 25–26, 2004.
Stede, M., & Neumann, A. (2014). Potsdam Commentary Corpus 2.0: Annotation for discourse research. In Proceedings of the 9th international conference on language resources and evaluation (LREC) (pp. 925–929). Reykjavik, Iceland, May 26–31, 2014.
Sun, K., & Xiong, W. (2019). A computational model for measuring discourse complexity. Discourse Studies, 21(6), 690–712. https://doi.org/10.1177/1461445619866985
Taboada, M. (2004). Building coherence and cohesion: Task-oriented dialogue in English and Spanish. John Benjamins.
Taboada, M., & Mann, W. (2006a). Rhetorical structure theory: Looking back and moving ahead. Discourse Studies, 8(3), 423–459.
Taboada, M., & Mann, W. (2006b). Applications of rhetorical structure theory. Discourse Studies, 8(4), 567–588.
Tarvainen, K. (2000). Einführung in die Dependenzgrammatik (Reihe germanistische Linguistik 35). Niemeyer.

14  Introduction Tesnière, L. (1959). Éléments de syntaxe structurale. Klincksieck. Torrego, E. (1998). The dependencies of objects (Vol. 34). The MIT Press. Trnavac, R., & Taboada, M. (2021). Engagement and constructiveness in online news comments in English and Russian. Text & Talk, 43(2). https://doi.org/10.1515/text-2020-0171 Yamada, H., & Matsumoto, Y. (2003). Statistical dependency analysis with support vector machines. In Proceedings of the eighth international conference on parsing technologies (pp. 195–206). Nancy, France, April 2003. Yue, M., & Feng, Z. (2005). Findings in a preliminary study on the rhetorical structure of Chinese TV news reports. In Paper presented at the first computational systemic functional grammar conference. Sydney, Australia, July 15–16, 2005. Yue, M., & Liu, H. (2011). Probability distribution of discourse relations based on a Chinese RST-annotated corpus. Journal of Quantitative Linguistics, 18(2), 107–121. Zeldes, A. (2017). The GUM corpus: Creating multilayer resources in the classroom. Language Resources and Evaluation, 51(3), 581–612. Zhang, H., & Liu, H. (2017b). Motifs of generalized valencies. In H. Liu & J. Liang (Eds.), Motifs in language and text (pp. 231–260). De Gruyter Mouton. Zhou, Y., & Xue, N. (2015). The Chinese discourse TreeBank: A Chinese corpus annotated with discourse relations. Language Resources and Evaluation, 49, 397–431. Ziegler, A., & Altmann, G. (2003). Text stratification. Journal of Quantitative Linguistics, 10(3), 275–292.

2

Literature Review and the Present Study

In this chapter, Section 2.1 reviews quantitative studies on syntactic and discourse dependency. Research gaps and detailed research objectives are proposed based on previous studies in Section 2.2. The last section gives an outline of the whole book.

2.1  Quantitative research on syntactic and discourse dependency

We start from prior quantitative research concerning syntactic dependency.

2.1.1  Quantitative research on syntactic dependency

Quite a lot of quantitative studies based on syntactic dependency and valency theories have been carried out, particularly within the last two decades. First, the distribution patterns of syntactic properties/units in the framework of dependency grammar have been examined, such as dependency distance (DD, the linear position of the dominant word minus the linear position of the subordinate word) (Liu, 2007a), dependency relations (Liu, 2007b), the distribution of the height of the dependency tree and the hierarchy in which the nodes are located (Liu, 2017; Zhang, 2023), various types of node positions (Zhang, 2023), the length of sub-trees (Zhang, 2023), and part-of-speech valency patterns (Zhang, 2023). Second, the relationships between syntactic properties have been explored, such as those between DD and dependency tree depth (Liu & Jing, 2016), sentence length and hierarchies of the dependency tree (Jing & Liu, 2015), sentence length and DD (Jiang & Liu, 2015), complexity and position (Wang & Liu, 2014), and those between sentence length, dependency tree height and tree width (Zhang & Liu, 2018). Third, researchers also delve into the correlation between DD and cognition (e.g., Chen et al., 2022; Fang & Liu, 2018; Ouyang et al., 2022; Zhao & Liu, 2014). As an important concept of dependency grammar, DD reflects both the distance and the order of the syntactic relations that hold between words, and this distance affects processing in the human brain. There is a tendency towards Dependency Distance Minimisation (DDM) in human languages, which is a universal feature of language subject to universal cognitive mechanisms (Ferrer-i-Cancho, 2004, 2006, 2016; Futrell et al., 2015; Jiang & Liu, 2015; Lei & Wen, 2020; Liang & Liu, 2016; Liu, 2007a, 2008, 2022; Liu et al., 2017; Lu & Liu, 2016a, 2016b, 2016c, 2020; Temperley, 2007, 2008; Wang & Liu, 2017); the average sentence DD can serve as a measure of sentence complexity (Hudson, 1995; Liu, 2008).

For head-initial dependencies, the dependency distance takes a negative value, while for head-final dependencies, the distance is positive. Using a corpus of 20 languages, Liu (2010) finds that word order is a continuum with linguistic typological implications, and the dependency direction indicator he proposes has been referred to as Liu-Directionalities (Fisch et al., 2019). The study of DD and dependency direction helps to analyse the difficulty of sentence comprehension, such as the comprehension of ambiguous sentences (Zhao & Liu, 2014), the comprehension of “ba”1 sentences (Fang & Liu, 2018), “zai” (a Chinese prepositional structure indicating position or time) and subject DD (Xu, 2018), and syntactic complexity (Chen et al., 2022; Ouyang et al., 2022); it also contributes to typological studies (e.g., Chen & Gerdes, 2022; Wang, 2015; Wang & Liu, 2017), child language acquisition (e.g., Ninio, 1998), language learning (Jiang & Ouyang, 2017; Jiang et al., 2019; Ouyang & Jiang, 2018), translation and interpretation (Jiang & Jiang, 2020; Liang, 2017; Zhou, 2021), ellipsis (e.g., Dai et al., 2022), language evolution (Liu & Chen, 2017) and the design of better algorithms for natural language syntactic analysis (e.g., Collins, 1996, 2003; Eisner & Smith, 2005).
There is a growing body of research on valency, that is, the number or patterns of dependents a node can carry or has actually carried (e.g., Beliankou & Köhler, 2018; Čech & Mačutek, 2010; Čech et al., 2010; Gao, 2013; Gao & Liu, 2019, 2020; Gao et al., 2014; Herbst, 1988; Jiang & Liu, 2018; Jin & Liu, 2018; Köhler, 2005; Liu, 2011; Liu & Liu, 2011; Lu et al., 2018; Yan & Liu, 2021). Liu and Feng (2007) propose the concept of the Generalised Valency Pattern (GVP) and subsequently Probabilistic Valency Pattern Theory (PVP). Čech et al. (2017) study the linear linguistic behaviour of generalised valency. The study of valency is beneficial for language learning as well as lexicography, among many other applications (e.g., Gao & Liu, 2019, 2020; Herbst et al., 2004). We will discuss the theory of valency further in Section 5.6, and examples of valency will be given in Section 4.4.2.
Basically, along these lines of research on syntactic dependency, studies mostly focus on units with one single element. To the best of our knowledge, units with multiple elements are rarely investigated. How the distribution patterns of multiple-element units relate to those of their individual elements remains an interesting issue.
The second dimension for improvement concerns the linear features of linguistic elements. Dynamic features of some elements have been investigated, but further features remain to be discovered. For instance, in examining the linear features of elements, motifs are most often used. One critical problem with motifs, however, is that repetitive elements are excluded when the elements are nominal. In authentic language, co-occurrence of repetitive elements can be quite frequent: two or more nouns, for instance, can come together to modify one noun, yet such co-occurrences are excluded from motif studies. Efforts to integrate such repetitive elements into the discussion should prove quite rewarding.
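To make the notions of DD, dependency direction and valency introduced above concrete, the following is a minimal sketch in Python. The toy sentence, its head annotation and the variable names are illustrative assumptions only; they are not drawn from the treebanks used later in this book.

# Minimal illustration of dependency distance (DD), dependency direction and valency.
# A sentence is represented as a list of (word, head_position) pairs, where positions
# are 1-based and head_position = 0 marks the root. The annotation is invented.
sentence = [
    ("She", 2),    # 1: depends on "reads"
    ("reads", 0),  # 2: root
    ("old", 4),    # 3: depends on "books"
    ("books", 2),  # 4: depends on "reads"
]

# DD = linear position of the dominant (governing) word minus that of the subordinate word.
for pos, (word, head) in enumerate(sentence, start=1):
    if head == 0:
        continue  # the root has no governor, hence no DD
    dd = head - pos
    direction = "head-final" if dd > 0 else "head-initial"
    print(f"{word:6s}-> {sentence[head - 1][0]:6s} DD = {dd:+d} ({direction})")

# Valency here: the number of dependents a word has actually taken in this sentence.
valency = {word: 0 for word, _ in sentence}
for word, head in sentence:
    if head != 0:
        valency[sentence[head - 1][0]] += 1
print(valency)  # {'She': 0, 'reads': 2, 'old': 0, 'books': 1}

The same computation carries over by analogy to discourse units once dependencies are defined between clauses, sentences or paragraphs, as discussed later in this book.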

Third, with large-scale corpora increasingly available, we tend to believe that the larger the size, the better. Does this claim always hold true? Are corpora of smaller sizes still desirable, and do the research findings gained from them retain their validity? Findings from large-scale corpora can be very robust, but this robustness is not always welcome (Liu & Jing, 2016): it is often too general, too tolerant of deviations and exceptions, and sometimes indicative of the loss of more specific and elaborate features. In addition, constructing large-scale corpora is very time-consuming and resource-intensive. Zipf (1932) introduces “relative frequency”, the idea that a certain proportion of the data can represent the probability structure of the entirety or the whole population. So, if a smaller part suffices to reveal the relevant features, the excessive robustness of the research findings will be reduced, excluding rather than tolerating exceptions, and the results will be more convincing, significant and credible. Measures should be taken to determine how large this relative frequency needs to be. If merely 1/n of the data proves to behave as the entirety does, (n − 1)/n of the effort can be saved, which is very encouraging: the bigger the n, the better. Seeking such a proportion can be very rewarding, particularly when large corpora are not readily available.
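As an illustration of how such a proportion might be checked, the sketch below splits a list of observations into n equal sub-corpora and reports, for each slice, how far its relative frequencies deviate from those of the whole corpus. The relation labels, the value of n and the simple deviation measure are assumptions made for illustration only; the analyses later in this book instead rely on fitting distribution models to the sub-corpora.

from collections import Counter

def rel_freq(tokens):
    """Relative frequency of each item in a list of observations."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}

def compare_slices(tokens, n=6):
    """Check whether each contiguous 1/n slice mirrors the whole corpus.

    For every slice, report the largest absolute difference between an item's
    relative frequency in the slice and in the whole corpus. Small values
    suggest the slice already behaves like the entirety.
    """
    whole = rel_freq(tokens)
    size = len(tokens) // n
    for i in range(n):
        part = rel_freq(tokens[i * size:(i + 1) * size])
        max_diff = max(abs(whole[item] - part.get(item, 0.0)) for item in whole)
        print(f"sub-corpus {i + 1}: max. deviation = {max_diff:.4f}")

# toy usage with an invented inventory of dependency relations
tokens = ["nsubj", "obj", "det", "nsubj", "amod", "det", "obj", "nsubj",
          "det", "advmod", "det", "nsubj"] * 50
compare_slices(tokens, n=6)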

2.1.2  Quantitative research on discourse dependency

Having briefly introduced the quantitative research findings on syntactic dependency, let us now turn to discourse dependency, starting from discourse parsing in general. Discourse parsing is important for many high-impact applications such as question answering (e.g., Chai & Jin, 2004; Ferrucci et al., 2010; Jansen et al., 2014), sentiment analysis (e.g., Bhatia et al., 2015; Huber & Carenini, 2020; Kraus & Feuerriegel, 2019; Markle-Hus et al., 2017; Somasundaran et al., 2009; Voll & Taboada, 2007), text summarisation (e.g., Cardoso et al., 2011; Goyal & Eisenstein, 2016; Hirao et al., 2013; Huang & Kurohashi, 2021; Louis et al., 2010; Marcu, 2000; Xiao et al., 2020; Xu et al., 2020; Yoshida et al., 2014), text categorisation or document classification (Ji & Smith, 2017), automatic evaluation of student writing (e.g., Burstein et al., 2013; Miltsakaki & Kukich, 2004), language modelling (e.g., Ji et al., 2016), machine translation (Meyer & Popescu-Belis, 2012) and machine reading comprehension (e.g., Narasimhan & Barzilay, 2015). Many of these natural language processing (NLP) tasks depend on discourse dependency trees (Kobayashi et al., 2020). Applied research abounds in the field, and our study will focus more on the theoretical side. Scientific research is usually conducted under a specific theoretical framework. Various theories of text structure analysis have been proposed, such as Systemic Functional Linguistics (Halliday & Hasan, 1976), the Rhetorical Structure Theory (RST) of Mann and Thompson (1988), Discourse Representation Theory (Kamp & Reyle, 1993), Segmented Discourse Representation Theory (Asher, 1993; Asher & Lascarides, 1994, 2003; Baldridge & Lascarides, 2005), Relevance Theory (Sperber & Wilson, 1995), Centering Theory (Grosz et al., 1995), the Geneva Pragmatics School (Roulet, 1995), Cross-document Structure Theory (CST) (Radev, 2000),

the lexicalised discourse framework (Webber et al., 2003), Discourse Graphbank (Wolf & Gibson, 2006; Wolf et al., 2004), Veins Theory (Cristea, 2005) and the Penn Discourse TreeBank (PDTB) (Prasad et al., 2008). Since Marcu (1997) brought discourse parsing to prominence, researchers (e.g., Baldridge & Lascarides, 2005; Feng & Hirst, 2012; Hernault et al., 2010; Hirao et al., 2013; Joty et al., 2012; Le Thanh et al., 2004; Li et al., 2014; Muller et al., 2012; Prasad et al., 2008; Reitter, 2003; Sagae, 2009; Soricut & Marcu, 2003; Subba & Di Eugenio, 2009) have proposed various algorithms and systems, extracted textual information and built discourse trees. For instance, Soricut and Marcu (2003) introduce probabilistic models to build discourse trees at the sentence level using both lexical and syntactic features.
There are different ways to classify discourse parsing methods. For instance, Li et al. (2022) put them into three broad categories: (1) RST-style discourse parsing identifying hierarchical tree structures (e.g., Anita & Subalalitha, 2019; Feng & Hirst, 2014; Hernault et al., 2010; Ji & Eisenstein, 2014; Joty et al., 2015), (2) PDTB-style discourse parsing identifying flat structures, and (3) dialogue discourse parsing. The first two categories parse passages and the last parses dialogues. Both the RST Discourse Treebank (RST-DT, Carlson et al., 2002, 2003) and the PDTB (Prasad et al., 2008) are large-scale discourse corpora derived from the WSJ (Wall Street Journal), and together they constitute the models which “have promoted most of the research on discourse parsing” (Li et al., 2022, p. 1). There have also been efforts to map PDTB and RST annotations onto each other (e.g., Demberg et al., 2019; Sun et al., 2022). In the new century, neural network methodology (Barabási & Pósfai, 2016; Newman, 2004, 2018) has seen a surge of application in discourse analysis (e.g., Guo et al., 2020; Ji et al., 2016; Kalchbrenner & Blunsom, 2013; Kraus & Feuerriegel, 2019; Li & Jurafsky, 2017; Li et al., 2016; Li et al., 2021; Liu et al., 2016; Qin et al., 2016; Roemmele, 2016; Sun & Xiong, 2019; Xu et al., 2020; Zhang et al., 2015; Zheng et al., 2016). More reviews of discourse parsing and of specific methodologies can be found in a number of papers (e.g., Brown & Yule, 1983; Coulthard, 1984; Coulthard & Condlin, 2014; Gee, 2004; Hovy & Maier, 1992; Jiang et al., 2022; Kang et al., 2019; Li et al., 2022; Schiffrin, 1994; Taboada, 2019; Taboada & Mann, 2006a, 2006b; Widdowson, 2007; Yan et al., 2016).
Now we narrow down our discussion to the two aspects of discourse structure analysis. Discourse structure can be analysed hierarchically and/or relationally (Sanders & van Wijk, 1996). Some scholars’ investigations cover both hierarchical and relational aspects of discourse structure (e.g., Mann & Thompson, 1988; Sun & Xiong, 2019). For instance, Sun and Xiong (2019) convert RST-DT trees into dependency ones and address how discourse units and relations are related. They investigate discourse complexity, calculate discourse distance in the same way as syntactic DD is calculated and identify a discourse distance range. Discourse distance is taken as a measure of human processing depth and discourse complexity. Power-law distribution patterns are observed both for DDs between clause nodes and for relations at the same level.

Sun et al. (2022) convert PDTB relations into dependency representations and find that the PDTB relations are quite valid: the DD obtained from them is highly correlated with that obtained from RST representations.
Some studies (e.g., Chiarcos & Krasavina, 2008; Feng et al., 2014; Pitkin, 1977) focus more on the hierarchical structure of discourse. For instance, Chiarcos and Krasavina (2005, 2008) study rhetorical distance in RST. Such a distance is measured by the shortest path in the tree, that is, the lowest number of edges between an anaphor-antecedent pair, and is thus different from the discourse distance in Sun and Xiong’s (2019) study. Discourse distance seems effective in telling different genres apart (Sun et al., 2021).
Other scholars investigate the meaning of the connections (e.g., d’Andrade, 1990; den Ouden et al., 2009; Goutsos, 1996; Guo et al., 2020; Gupta et al., 2019; Hovy & Maier, 1992; Liu et al., 2016; Louwerse, 2001; Narasimhan & Barzilay, 2015; Peng et al., 2017; Ratnayaka et al., 2018; Sahu et al., 2019; Somasundaran et al., 2009; Song et al., 2018; Zhang et al., 2006; Zhang et al., 2015). For instance, with CST (Radev, 2000) as their guiding theory, Ratnayaka et al. (2018) study court case transcripts and examine sentence relations quantitatively.
In many cases, researchers essentially examine the frequency of relations (e.g., Das & Taboada, 2018; Iruskieta et al., 2015; Sun & Zhang, 2018). Some distribution models have been used to fit the distribution of rhetorical relations, suggesting that these relations result from diversification processes (Altmann, 1991, 2005a). The ZA (modified right-truncated Zipf-Alekseev) model is found to fit the data in Yue and Liu (2011), who randomly select 20 texts from a self-made Chinese corpus annotated with RST relations. These relations occur in constituent-type discourse trees. We posit that discourse relations will also be lawfully distributed in dependency-type discourse trees with only terminal nodes, whether these are clauses, sentences or paragraphs. This hypothesis is partly validated by existing research: Li et al. (2014) employ graph-based parsing techniques to convert the RST-DT into dependency trees with clausal terminal nodes and examine the distribution of RST relations.
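For reference, one common parameterisation of the modified right-truncated Zipf-Alekseev (ZA) model, as used in quantitative linguistics, is sketched below; the notation here is illustrative, and the exact forms and fitting procedures used in this book are given in the chapters that follow.

\[
P(x) =
  \begin{cases}
    \alpha, & x = 1,\\
    \dfrac{(1-\alpha)\, x^{-(a + b\ln x)}}{T}, & x = 2, 3, \ldots, R,
  \end{cases}
\qquad
T = \sum_{j=2}^{R} j^{-(a + b\ln j)},
\]

where x is the rank of a relation, R is the highest (truncation) rank, and a, b and α are parameters estimated from the data; the probability of the first rank is modified to the free parameter α.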

Some of the aforementioned studies are carried out on constituent-type discourse trees. Discourse dependency graph representations have only been studied extensively in the new century (e.g., Hirao et al., 2013; Li et al., 2014; Muller et al., 2012; Prasad et al., 2008). Particularly in the past decade, more researchers in the field have attempted to convert existing phrase-structure-style discourse trees into dependency trees, employing various algorithms such as MIRA and the Eisner algorithm (Li et al., 2014), among others (e.g., Cheng & Li, 2019; Cheng et al., 2021; De Marneffe et al., 2006; Ferracane et al., 2019; Hayashi et al., 2016; Hirao et al., 2013; Huber & Carenini, 2020; Kobayashi et al., 2020; Li et al., 2014; Morey et al., 2018; Sun & Xiong, 2019; Xiao et al., 2021; Yuan et al., 2021; Zhang et al., 2021).
From the prior studies, we see some room for improvement. First, dependency structures at the discourse level are far from being adequately investigated compared with research at the syntactic level. Where the hierarchical aspect is concerned, discourse DD is very rarely researched (e.g., Sun & Xiong, 2019), and, to the best of our knowledge, discourse valency remains an undefined term. From the relational perspective, most existing quantitative investigations concern some specific relations rather than considering all relations as a whole. Of the few studies that do cover all relations (e.g., Sun & Xiong, 2019), to the best of our knowledge, none examines the distribution patterns of these relations or compares the relations across the syntactic and discourse levels. A comparison and contrast of the two levels in both the relational and the hierarchical dimension is yet to be carried out.
Second, a concern similar to the one put forward in Section 2.1.1 is that a single corpus might generate observations resulting from fluctuations. Evidence is needed to show that the results are solid and convincing. In the big-data era, it is claimed that the bigger the data, the better. But does this necessarily oblige theoretical researchers to use large-scale data? Building large-scale discourse dependency corpora is even more demanding than constructing large syntactic dependency ones. If smaller corpora prove valid, and the findings obtained from them can be shown to be free of random fluctuations, researchers can be encouraged to build smaller corpora; after all, discourse dependency trees are more difficult to build and are therefore less readily available. As at the syntactic level, we can divide an existing corpus into n sub-corpora to examine whether each 1/n of the original behaves likewise. If so, we can claim that they have reached Zipfian size (Zipf, 1935). In our study, we will set n at 6: if these 6 sub-corpora are found to behave homogeneously, we will know that the features they share are representative and not subject to random fluctuations.
The third aspect of suggested improvement concerns unit granularity. Prior studies mostly focus on constituent tree structures or on clause units. When the discourse units are set at different levels, we will be able to examine how lower-level units are organised into higher-level ones.
The fourth aspect is methodology. Quantitative linguistic studies have focused on units, attributes and their relations (Buk & Rovenchak, 2008; Chen & Liu, 2016; Liu & Huang, 2012; Tuzzi et al., 2009), but not many studies have focused on linear linguistic behaviour (Köhler, 1999, 2006; Pawłowski, 1999). When we examine the distribution patterns of the previously mentioned language entities/properties, we usually employ bag-of-words methods; in other words, we put all the entities/properties together in their inventories (as if in a bag) and collectively examine the distribution features of these unordered sets of elements without considering their linear behaviour. But syntagmatic/sequential behaviour is a very important aspect of these research objects and should not be left out (Köhler, 2015; Pawłowski, 1999). In fact, as early as the 1960s, Herdan (1966) distinguishes between linear language material (termed “language in the line”) and non-linear language material (termed “language in the mass”), and proposes two corresponding types of language research methods. In quantitative linguistics, the linear behaviours of discourse dependency are seldom researched.
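As a brief illustration of what a linear, as opposed to bag-of-words, analysis looks like, the sketch below segments a sequence of numerical property values into motifs, assuming the widely used Köhler-style definition of a motif as a maximal non-decreasing run of values. The valency sequence is invented, and the precise definitions adopted in this study are given in Chapter 3.

def motifs(values):
    """Segment a numeric sequence into maximal non-decreasing runs (motifs)."""
    runs, current = [], [values[0]]
    for v in values[1:]:
        if v >= current[-1]:
            current.append(v)            # still non-decreasing: extend the run
        else:
            runs.append(tuple(current))  # run broken: close the motif
            current = [v]
    runs.append(tuple(current))
    return runs

# e.g. a word-by-word sequence of valencies in running text (invented)
print(motifs([2, 1, 3, 1, 1, 2]))  # [(2,), (1, 3), (1, 1, 2)]

A bag-of-words analysis of the same sequence would only count how often each valency value occurs, whereas the motif segmentation preserves information about how the values follow one another in the text.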

What are the similarities and differences between the linear behaviours of dependency at the syntactic and the discourse level? This promises to be a very interesting research topic.
Fifth, a further field of investigation concerns superstructure or macrostructure. Once discourse dependency trees are constructed, we can investigate the superstructures of discourse from a dependency perspective. Out of the various types of superstructure (Kintsch & van Dijk, 1975; Paltridge et al., 2012; van Dijk, 1980, 1982, 1983, 1985, 2019a, 2019b; Williams, 1984; Williams et al., 1984; Ye, 2019), it is hypothesised that news-writing, organised by a conventional news schema (van Dijk, 1988a, 1988b), follows a relevance/importance ordering of news. In this type of structure, the initial Summary of Headline and Lead summarises the most significant information. The following parts are Context, History and Consequences, with descending relevance/importance but in a less strict order. Occurring towards the end are Verbal Reactions and, particularly, Comments. Also known as the summary news lead (e.g., Errico, 1997) or the lead-and-body principle (e.g., Pöttker, 2003), such a schema, commonly referred to as the inverted pyramid structure (e.g., Bell, 1991; Dunn, 2005; Mindich, 1998; Pöttker, 2003; Scollon, 2000; Thomson et al., 2008), puts the most newsworthy, most immediate and most relevant information at the very top, the secondary information in the middle in an order of descending significance/relevance, and, finally, the least important information at the bottom.
Initial Summaries convey the intended semantic macrostructure, thus serving top-down cognitive strategic functions (van Dijk & Kintsch, 1983). This function is fulfilled through the activation of knowledge and the establishment of faster comprehension of the local coherence of words and sentences (van Dijk, 1988a). Through experimentation, van Dijk (1988a) finds that only the main topics can be recalled days later, regardless of whether the readers have finished reading the news. The study corroborates that initial Summaries play a crucial role. Khalil (2006) reports that an interpretation or an emotive value is acquired through the lead, which manifests the affective stance of the reporter towards the news event.
Modern journalism is expected to be characterised by objectivity. For news reports to be objective, neutrality and the inverted pyramid are the two key components (Mindich, 1998). These two factors contribute to making news-reporting distinctive and unique as an individual genre (Thomson et al., 2008). Various studies show that the top-down instalment structure, or the inverted pyramid structure, is the most common structure of news reporting (Scollon & Scollon, 1997; Thomson et al., 2008), accounting for 69% of all news stories (Readership Institute, 2010). Shie (2012) compares 565 pairs of corresponding news stories from The New York Times International Weekly (IW) and The New York Times (NYT), analyses their discourse organisation and measures the distances between the focus and the discourse units and among the units themselves. Contrary to the inverted pyramid model, NYT closing paragraphs are found to be salient discourse spans, which Shie attributes to the defining superstructure of soft news stories. The inverted pyramid structure facilitates some high-profile NLP tasks such as information extraction and automatic summarisation (e.g., Koupaee & Wang, 2018; Norambuena et al., 2020) and genre classification (e.g., Dai & Huang, 2021), to name just a few.

In addition to the top-down structure at the discourse level, van Dijk (1986) proposes that such a structure also characterises the thematic structure within paragraphs, and he attributes this relevance ordering to a general news-writing strategy. Like the initial summaries of news discourse, internal summaries in paragraphs serve important functions. To the best of our knowledge, however, the inverted pyramid structure within paragraphs has not been visually presented. We can further pursue this structure within sentences and examine whether such a top-down strategy is employed in combining clauses into sentences.

2.2  The present study

This study examines dependency relations and related properties at both the syntactic and the discourse level. We choose relations, DD and valency, three of the key issues in dependency research, as the objects of comparison and contrast. At the syntactic level, the study mainly observes (1) the various combinations of a complete dependency structure, (2) valency and (3) DD, which together constitute the first research objective of the study.
As previously mentioned, theoretically speaking, dependency relations can go beyond words and occur between other linguistic units like phrases, sentences and even paragraphs. At the discourse level, after the conversion of trees, units and properties including discourse dependency relations, discourse valency and discourse DD will be discussed. To define discourse valency and discourse DD, we will borrow concepts from the related ones in dependency grammar; the definitions of these objects of study will be given in Section 4.4. The conversion of traditional discourse trees will enable us to examine discourse dependency trees at three distinct levels of elementary discourse units, namely clauses, sentences and paragraphs. So, the second research objective is to prove that at the discourse level, the elements (at the three granularity levels of clauses, sentences and paragraphs) can also form dependency relations. To prove this, we examine

1 the rank-frequency distribution of RST relations, their motifs, discourse valency and discourse DD;
2 whether there is a top-down organisation or an inverted pyramid structure at all the three discourse levels (constituting sentences from clauses, paragraphs from sentences, and finally discourses from paragraphs; see the illustrative sketch below); and
3 whether discourse DDs and valencies are lawfully distributed, following the same distribution patterns as those at the syntactic level.

To examine the dependency relations at the discourse level, we borrow the concepts of valency and DD by analogy with syntactic dependencies. If they are found to follow a certain linguistic model, we can claim that these concepts can be well adapted for relations between discourse elements like clauses, sentences and paragraphs.
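As a hypothetical illustration of how objective (2) might be operationalised on a discourse dependency tree, the sketch below checks what proportion of units are governed by an earlier unit and where the root lies; a strongly top-down, inverted-pyramid organisation would be expected to show mostly such links and an early root. The annotation is invented, and the actual procedures of this study are described in Chapters 4 and 6.

# Each discourse unit (clause, sentence or paragraph) is represented only by the
# 1-based index of its governing unit; 0 marks the root. Toy annotation, invented.
heads = [0, 1, 1, 2, 1, 5]  # e.g. paragraph 1 governs paragraphs 2, 3 and 5

head_initial = sum(1 for pos, head in enumerate(heads, start=1)
                   if head != 0 and head < pos)
non_root = sum(1 for head in heads if head != 0)
root_position = heads.index(0) + 1

print(f"links governed by an earlier unit: {head_initial}/{non_root}")  # 5/5 here
print(f"root is unit {root_position} of {len(heads)}")  # an early root suggests top-down organisation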

In this study, we basically employ three methods to examine the distribution of the linguistic properties/units: rank-frequency distributions (1) of the properties, (2) of their motifs and (3) of their sequencings. The first is examined in a bag-of-words mode, ignoring linear behaviour and treating the properties/units as if they were in a bag. The latter two both examine the linear behaviour of the properties, and their definitions will be given in Chapter 3. When any of these types of rank-frequency distribution is found to abide by a well-proven linguistic model, we can claim that the property/unit in question results from a diversification process and is therefore a normal language property/unit. The diversification process is the basic concept for modelling in this study and will be elaborated on in Section 3.2. Detailed research hypotheses and research questions will be given in the relevant sections.

2.3  An outline

The book is organised as follows. Following the introduction in Chapter 1 and the review of prior studies in this chapter, the third chapter introduces the theoretical background, elaborating on diversification processes and the least effort principle. Chapter 4 introduces the research materials and methods: we employ a syntactic dependency treebank and a discourse one as research materials, and we employ motifs and sequencings as the linguistic entities exhibiting language-in-the-line features. Chapter 5 focuses on syntactic dependency structures; we examine in turn the 7 combinations of a complete syntactic dependency structure, and then the valencies (numbers of dependents) and DDs. Chapter 6 concentrates on discourse dependency structures. Similarly, we examine the dependency relations, valencies and DDs; furthermore, we examine whether there is a top-down instalment at all three levels (forming sentences from clauses, paragraphs from sentences and, finally, the whole discourse from paragraphs). The final chapter concludes the study, addressing the basic findings, limitations and possible further endeavours.
So, we will start from the theoretical background in the next chapter. To model the linguistic phenomena in this study, we need to introduce two basic concepts: diversification processes and the least effort principle.

Note

1 A Chinese prepositional structure marking the following expression as the real object of the verb in the sentence.

References Altmann, G. (1991). Modeling diversification phenomena in language. In U. Rothe (Ed.), Diversification processes in language: Grammar (pp. 33–46). Rottmann. Altmann, G. (2005a). Diversification process. In R. Köhler, G. Altmann, & G. Piotrowski (Eds.), Quantitative Linguistik, Ein Internationales Handbuch (Quantitative linguistics, an international handbook) (pp. 646–658). Walter de Gruyter. Anita, R., & Subalalitha, C. N. (2019). Building Discourse Parser for Thirukkural. In D. M. Sharma, P. Bhattacharyya, & R. Sangal (Eds.), Proceedings of the 16th international

24  Literature Review and the Present Study conference on natural language processing (pp. 18–25). International Institute of Information Technology, Hyderabad, India, December 2019. NLP Association of India. Asher, N. (1993). Reference to abstract objects in discourse (Vol. 50). Springer Science & Business Media. Asher, N., & Lascarides, A. (1994). Intentions and information in discourse. In J. Pustejovsky (Ed.), Proceedings of 32nd meeting of the Association for Computational Linguistics (ACL’ 94) (pp. 34–41). Las Cruces, New Mexico, USA, June 27–30, 1994. Asher, N., & Lascarides, A. (2003). Logics of conversation. Cambridge University Press. Baldridge, J., & Lascarides, A. (2005). Probabilistic head-driven parsing for discourse structure. In Proceedings of the ninth conference on computational natural language learning (CoNLL) (pp. 96–103). Michigan, USA, June 29–30, 2005. Barabási, A., & Pósfai, M. (2016). Network science. Cambridge University Press. Beliankou, A., & Köhler, R. (2018). Empirical analyses of valency structures. In H. Liu & J. Jiang (Eds.), Quantitative analysis of dependency structure (pp. 93–100). De Gruyter Mouton. Bell, A. (1991). The language of news media. Blackwell. Bhatia, P., Ji, Y., & Eisenstein, J. (2015). Better document-level sentiment analysis from RST discourse parsing. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 2212–2218). arXiv preprint arXiv:1509.01599. Brown, B., & Yule, G. (1983). Discourse analysis. Cambridge University Press. Buk, S., & Rovenchak, A. (2008). Menzerath–Altmann Law for syntactic structures in Ukrainian. Glottotheory, 1 (1), 10–17. Burstein, J., Tetreault, J., & Chodorow, M. (2013). Holistic discourse coherence annotation for noisy essay writing. Dialogue & Discourse, 4(2), 34–52. Cardoso, P. C., Maziero, E. G., Jorge, M. L. C., Seno, E. M., Di Felippo, A., Rino, L. H. M., & Pardo, T. A. (2011). CSTnews—a discourse-annotated corpus for single and multidocument summarization of news texts in Brazilian Portuguese. In Proceedings of the 3rd RST Brazilian meeting (pp. 88–105). October 2011. Carlson, L., Marcu, D., & Okurowski, M. E. (2002). RST discourse treebank, LDC2002T07 [Corpus]. Linguistic Data Consortium. Carlson, L., Marcu, D., & Okurowski, M. E. (2003). Building a discourse-tagged corpus in the framework of rhetorical structure theory. In J. van Kuppevelt & R. W. Smith (Eds.), Current directions in discourse and dialogue (pp. 85–112). Kluwer Academic Publishers. Čech, R., & Mačutek, J. (2010). On the quantitative analysis of verb valency in Czech. In P. Grzybek, E. Kelih, & J. Mačutek (Eds.), Text and language. Structure, functions, interrelations (pp. 21–29). Pearson Verlag. Čech, R., Pajas, P., & Mačutek, J. (2010). Full valency. Verb valency without distinguishing complements and adjuncts. Journal of Quantitative Linguistics, 17(4), 291–302. Čech, R., Vincze, V., & Altmann, G. (2017). On motifs and verb valency. In H. Liu & J. Liang (Eds.), Motifs in language and text (pp. 231–260). De Gruyter Mouton. Chai, J., & Jin, R. (2004). Discourse structure for context question answering. In S. Harabagiu & F. Lacatusu (Eds.), LT-NAACL 2004: Workshop on pragmatics of question answering (pp. 23–30). Boston, Massachusetts, USA, May 2–7, 2004. Association for Computational Linguistics. Chen, H., & Liu, H. (2016). How to measure word length in spoken and written Chinese. Journal of Quantitative Linguistics, 23(1), 5–29. Chen, R., Deng, S., & Liu, H. (2022). 
Syntactic complexity of different text types: From the perspective of dependency distance both linearly and hierarchically. Journal of Quantitative Linguistics, 29(4), 510–540.

Literature Review and the Present Study 25 Chen, X., & Gerdes, K. (2022). Dependency distances and their frequencies in Indo-­ European language. Journal of Quantitative Linguistics, 29(1), 106–125. Cheng, Y., & Li, S. (2019). Zero-shot Chinese discourse dependency parsing via cross-­ lingual mapping. In Proceedings of the 1st workshop on discourse structure in neural NLG (pp. 24–29). Tokyo, Japan, November 2019. Association for Computational Linguistics. Cheng, Y., Li, S., & Li, Y. (2021). Unifying discourse resources with dependency framework. In Chinese computational linguistics: 20th China national conference, CCL 2021 (pp. 257–267). Hohhot, China, August 13–15, 2021. Springer-Verlag. Chiarcos, C., & Krasavina, O. (2005). Rhetorical distance revisited: A pilot study. In Proceedings of the corpus linguistics 2005 conference. Birmingham, July 14–17, 2005. Chiarcos, C., & Krasavina, O. (2008). Rhetorical distance revisited: A parametrized approach. In B. Anton & K. Peter (Eds.), Constraints in discourse (pp. 97–115). John Benjamins. Collins, M. (1996). A new statistical parser based on bigram lexical dependencies. In Proceedings of the associations for computational linguistics (pp. 184–191). arXiv preprint cmp-lg/9605012. Collins, M. (2003). Head-driven statistical models for natural language parsing. Computational Linguistics, 29, 589–638. Coulthard, M. (1984). An introduction to discourse analysis (2nd ed.). Longman. Coulthard, M., & Condlin, C. N. (2014). An introduction to discourse analysis. Routledge. Cristea, D. (2005). Motivations and implications of Veins Theory. In Proceedings of the 2nd  international workshop on natural language understanding and cognitive science (pp. 32–44). Miami, Florida, USA, May 24, 2005. Dai, Z., & Huang, R. (2021). A joint model for structure-based news genre classification with application to text summarization. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Dai, Z., Liu, H., & Yan, J. (2022). Revisiting English written VP-ellipsis and VP-­substitution: A dependency-based analysis. Linguistics Vanguard. dʼAndrade, R. G. (1990). Some propositions about the relations between culture and human cognition. In J. W. Stigler & R. Shweder (Eds.), Cultural psychology: Essays in comparative human development (pp. 65–129). Cambridge University Press. Das, D., & Taboada, M. (2018). Signalling of coherence relations in discourse, beyond discourse markers. Discourse Processes, 55(8), 743–770. De Marneffe, M. C., MacCartney, B., & Manning, C. D. (2006). Generating typed dependency parses from phrase structure parses. In Proceedings of the fifth international conference on language resources and evaluation (LREC’ 06) (pp. 449–454). Genoa, Italy. May 2006. Demberg, V., Scholman, M. C., & Asr, F. T. (2019). How compatible are our discourse annotation frameworks? Insights from mapping RST-DT and PDTB annotations. Dialogue & Discourse, 10(1), 87–135. den Ouden, H., Noordman, L., & Terken, J. (2009). Prosodic realizations of global and local structure and rhetorical relations in read aloud news reports. Speech Communication, 51(2), 116–129. Dunn, A. (2005). Television news as narrative. In H. Fulton (Ed.), Narrative and media (pp. 140–152). Cambridge University Press. Eisner, J., & Smith, N. (2005). Parsing with soft and hard constraints on dependency length. In Proceedings of the international workshop on parsing technologies (pp. 30–41). Errico, M. (1997). 
“The evolution of the summary news lead”, Media History Monographs 1(1), http://blogs.elon.edu/mhm/files/2017/03/Media-History-Monographs-Volume-1. pdf (Accessed 14 December 2022).

26  Literature Review and the Present Study Fang, Y., & Liu, H. (2018). What factors are associated with dependency distances to ensure easy comprehension? A case study of ba sentences in Mandarin Chinese. Language Sciences, 67, 33–45. Feng, V. W., & Hirst, G. (2012). Text-level discourse parsing with rich linguistic features. In Proceedings of the 50th annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 60–68). Jeju Island, Korea, July 8–14, 2012. Feng, V. W., & Hirst, G. (2014). A linear-time bottom-up discourse parser with constraints and post-editing. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 511–521). Baltimore, Maryland, USA, June 23–25, 2014. Feng, V. W., Lin, Z., & Hirst, G. (2014). The impact of deep hierarchical discourse structures in the evaluation of text coherence. In Proceedings of COLING 2014, the 25th international conference on computational linguistics: Technical papers (pp. 940–949). Dublin, Ireland, August 2014. Ferracane, E., Durrett, G., Li, J. J., & Erk, K. (2019). Evaluating discourse in structured text representations. arXiv preprint arXiv:1906.01472. Ferrer-i-Cancho, R. (2004). Euclidean distance between syntactically linked words. Physical Review E, 70(5), 056135. Ferrer-i-Cancho, R. (2006). Why do syntactic links not cross? EPL, 76(6), 1228. Ferrer-i-Cancho, R. (2016). Non-crossing dependencies: Least effort, not grammar. In A. Mehler, A. Lücking, S. Banisch, P. Blanchard, & B. Job (Eds.), Towards a theoretical framework for analyzing complex linguistic networks (pp. 203–234). Berlin/Heidelberg: Springer. Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., & Welty, C. (2010). Building Watson: An overview of the DeepQA project. AI Magazine, 31(3), 59–79. Fisch, A., Guo, J., & Barzilay, R. (2019). Working hard or hardly working: Challenges of integrating typology into neural dependency parsers. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 5714–5720). Hong Kong, China, November 3–7, 2019. Association for Computational Linguistics. Futrell, R., Mahowald, K., & Gibson, E. (2015). Large-scale evidence for dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences of the United States of America, 112(33), 10336–10341. Gao, J., & Liu, H. (2019). Valency and English learners’ thesauri. International Journal of Lexicography, 32(3), 326–361. Gao, J., & Liu, H. (2020). Valency  dictionaries and Chinese vocabulary acquisition for foreign learners. Leixkos, 30, 111–142. https://doi.org/10.5788/30-1-1548 Gao, S. (2013). The study of garden-path sentences based on probabilistic valency pattern theory. Applied Linguistics, 3, 126–132. (In Chinese). Gao, S., Zhang, H., & Liu, H. (2014). Synergetic properties of Chinese verb valency. Journal of Quantitative Linguistics, 1, 1–21. Gee, J. P. (2004). An introduction to discourse analysis: Theory and method. Routledge. Goutsos, D. (1996). A model of sequential relations in expository text. Text, 16(4), 501–533. Goyal, N., & Eisenstein, J. (2016). A joint model of rhetorical discourse structure and summarization. In K. W. Chang, M. W. Chang, A. M. Rush, & V. Srikumar (Eds.), Proceedings of the workshop on structured prediction for NLP (pp. 25–34). Austin, Texas, USA, November 2016. 
Association for Computational Linguistics. Grosz, B., Aravind, J., & Weinstein, S. (1995). Centering: A framework for modelling the local coherence of discourse. Computational Linguistics, 21(2), 203–226.

Literature Review and the Present Study 27 Guo, F., He, R., Dang, J., & Wang, J. (2020). Working memory-driven neural networks with a novel knowledge enhancement paradigm for implicit discourse relation recognition. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 05) (pp. 7822–7829). April 2020. Gupta, P., Rajaram, S., Schütze, H., & Runkler, T. (2019). Neural relation extraction within and across sentence boundaries. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, No. 01) (pp. 6513–6520). July, 2019. Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. Longman. Hayashi, K., Hirao, T., & Nagata, M. (2016). Empirical comparison of dependency conversions for RST discourse trees. In Proceedings of the 17th annual meeting of the special interest group on discourse and dialogue (pp. 128–136). Los Angeles, USA, September 13–15, 2016. Association for Computational Linguistics. Herbst, H. (1988). A valency model for nouns in English. Journal of Linguistics, 24(2), 265–301. Herbst, T., David, H., Ian, F. R., & Dieter, G. (2004). A valency dictionary of English: A corpus-based analysis of the complementation patterns of English verbs, nouns and adjectives. De Gruyter Mouton. Herdan, G. (1966). The advanced theory of language as choice and chance. Lingua, 18(3), 447–448. Hernault, H., Helmut, P., du Verle, D. A., & Ishizuka, M. (2010). HILDA: A discourse parser using support vector machine classification. Dialogue and Discourse, 1(3), 1–33. https:// doi.org/10.5087/dad.2010.003 Hirao, T., Yoshida, Y., Nishino, M., Yasuda, N., & Nagata, M. (2013). Single-document summarization as a tree knapsack problem. In Proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1515–1520). Seattle, Washington, USA, October 18–21, 2013. Hovy, E. H., & Maier, E. (1992). Parsimonious or profligate: How many and which discourse structure relations?. University of Southern California Marina Del Rey Information Sciences Inst. Huang, Y. J., & Kurohashi, S. (2021). Extractive summarization considering discourse and coreference relations based on heterogeneous graph. In P. Merlo, J. Tiedemann, & R. Tsarfaty (Eds.), Proceedings of the 16th conference of the European chapter of the Association for Computational Linguistics: Main volume (pp. 3046–3052). Online, April 2021. Association for Computational Linguistics. Huber, P., & Carenini, G. (2020). From sentiment annotations to sentiment prediction through discourse augmentation. In D. Scott, N. Bel, & C. Zong (Eds.), Proceedings of the 28th international conference on computational linguistics (pp. 185–197). Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. Hudson, R. (1995). Measuring syntactic difficulty. Unpublished paper. https://dickhudson. com/wp-content/uploads/2013/07/Difficulty.pdf (Accessed February 2, 2023). Iruskieta, M., Da Cunha, I., & Taboada, M. (2015). A qualitative comparison method for rhetorical structures: Identifying different discourse structures in multilingual corpora. Language Resources and Evaluation, 49(2), 263–309. Jansen, P., Surdeanu, M., & Clark, P. (2014). Discourse complements lexical semantics for non-factoid answer reranking. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 977–986). Baltimore, Maryland, USA, June 23–25, 2014. Ji, Y., & Eisenstein, J. (2014). Representation learning for text-level discourse parsing. 
In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (volume 1: Long papers) (pp. 13–24). Baltimore, Maryland, USA, June 23–25, 2014.

28  Literature Review and the Present Study Ji, Y., Haffari, G., & Eisenstein, J. (2016). A latent variable recurrent neural network for discourse-driven language models. In K. Knight, A. Nenkova, & O. Rambow (Eds.), Proceedings of the 2016 conference of the North American chapter of the Association for Computational Linguistics: Human language technologies (pp. 332–342). San Diego, California, USA, June 12–17, 2016. Ji, Y., & Smith, N. (2017). Neural discourse structure for text categorization. arXiv preprint arXiv:1702.01829. Jiang, F., Fan, Y., Chu, X., Li, P., & Zhu, Q. (2022). A survey of English and Chinese discourse structure analysis. Journal of Software, 1–28. https://doi.org/10.13328/j.cnki. jos.006650 (In Chinese). Jiang, J., & Liu, H. (2015). The effects of sentence length on dependency distance, dependency direction and the implications - based on a parallel English-Chinese dependency Treebank. Language Sciences, 50, 93–104. Jiang, J., & Liu, H. (Eds.). (2018). Quantitative analysis of dependency structures [QL series 72]. De Gruyter Mouton. Jiang, J., & Ouyang, J. (2017). Dependency distance: A new perspective on the syntactic development in second language acquisition. Comment on “Dependency distance: A new perspective on syntactic patterns in natural language” by Haitao Liu et al. Physics of Life Reviews, 21, 209–210. Jiang, J., Yu, W., & Liu, H. (2019). Does scale-free syntactic network emerge in second language learning? Frontiers in Psychology (Language Sciences), 10, 925. https://doi. org/10.3389/fpsyg.2019.00925 Jiang, X., & Jiang, Y. (2020). Effect of dependency distance of source text on disfluencies in interpreting. Lingua, 243, 102873. Jin, H., & Liu, H. (2018). Regular dynamic patterns of verbal valency ellipsis in modern spoken Chinese. In J. Jiang & H. Liu (Eds.), Quantitative analysis of dependency structures (pp. 101–118). De Gruyter Mouton. Jing, Y., & Liu, H. (2015). Mean hierarchical distance: Augmenting mean dependency distance. In Proceedings of the third international conference on dependency linguistics (Depling 2015) (pp. 161–170). Uppsala, Sweden, August 24–26, 2015. Joty, S., Carenini, G., & Ng, R. T. (2012). A novel discriminative framework for sentencelevel discourse analysis. In J. I. Tsujii, J. Henderson, & M. Pasca (Eds.), Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL’12) (pp. 904–915). Stroudsburg, PA, USA, July, 2012. Joty, S., Carenini, G., & Ng, R. T. (2015). Codra: A novel discriminative framework for rhetorical analysis. Computational Linguistics, 41(3), 385–435. Kalchbrenner, N., & Blunsom, P. (2013). Recurrent convolutional neural networks for discourse compositionality. In Proceedings of the workshop on continuous vector space models and their compositionality (pp. 119–126). Sofia, Bulgaria, August 9, 2013. arXiv preprint arXiv:1306.3584. Kamp, H., & Reyle, U. (1993). From discourse to logic: Introduction to model theoretic semantics of natural language, formal logic and discourse representation theory. Kluwer. Kang, X., Zong, C., & Xue, N. (2019). A survey of discourse representations for Chinese discourse annotation. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP, 18(3), 1–25. Khalil, E. N. (2006). Communicating affect in news stories: The case of the lead sentence. Text & Talk, 26(3), 329–349.

Literature Review and the Present Study 29 Kintsch, W., & van Dijk, T. A. (1975). Recalling and summarizing stories. Language, 40, 98–116. Kobayashi, N., Hirao, T., Kamigaito, H., Okumura, M., & Nagata, M. (2020). Top-down RST parsing utilizing granularity levels in documents. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 05, pp. 8099–8106). April 2020. Köhler, R. (1999). Syntactic structures: Properties and interrelations. Journal of Quantitative Linguistics, 6(1), 46–57. Köhler, R. (2005). Synergetic linguistics. In R. Köhler, G. Altmann, & R. G. Piotrowski (Eds.), Quantitative Linguistik. Ein Internationales Handbuch. Quantitative linguistics. An international handbook (pp. 760–774). De Gruyter. Köhler, R. (2006). The frequency distribution of the length of length sequences. In J. Genzor & M. Bucková (Eds.), Favete linguis. Studies in honour of Viktor Krupa (pp. 145–152). Slovak Academy Press. Köhler, R. (2015). Linguistic motifs. In G. K. Mikros & J. Mačutek (Eds.), Sequences in language and text (pp. 107–129). De Gruyter Mouton. Koupaee, M., & Wang, W. Y. (2018). Wikihow: A large scale text summarization dataset. arXiv preprint arXiv:1810.09305. Kraus, M., & Feuerriegel, S. (2019). Sentiment analysis based on rhetorical structure theory: Learning deep neural networks from discourse trees. Expert Systems with Applications, 118, 65–79. Le Thanh, H., Abeysinghe, G., & Huyck, C. (2004). Generating discourse structures for written text. In COLING 2004: Proceedings of the 20th international conference on computational linguistics (pp. 329–335). Lei, L., & Wen, J. (2020). Is dependency distance experiencing a process of minimization? A diachronic study based on the State of the Union addresses. Lingua, 239, 102762. Li, J., & Jurafsky, D. (2017). Neural net models for open-domain discourse coherence. In Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 198–209). Copenhagen, Denmark, September 7–11, 2017. arXiv preprint arXiv:1606.01545. Li, J., Liu, M., Qin, B., & Liu, T. (2022). A survey of discourse parsing. Frontiers of Computer Science, 16(5), 1–12. Li, J., Liu, M., Zheng, Z., Zhang, H., Qin, B., Kan, M. Y., & Liu, T. (2021). Dadgraph: A discourse-aware dialogue graph neural network for multiparty dialogue machine reading comprehension. In 2021 international joint conference on neural networks (IJCNN) (pp. 1–8). July 2021. IEEE. Li, Q., Li, T., & Chang, B. (2016). Discourse parsing with attention-based hierarchical neural networks. In Proceedings of the 2016 conference on empirical methods in natural language processing (EMNLP) (pp. 362–371), Austin, Texas, November 1–5, 2016. Li, S., Wang, L., Cao, Z., & Li, W. (2014). Text-level discourse dependency parsing. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics. Baltimore, Maryland, USA, June 23–25, 2014, pp. 25–35. Liang, J. (2017). Dependency distance differences across interpreting types: Implications for cognitive demand. Frontiers in Psychology, 8, 2132. Liang, J., & Liu, H. (2016). Interdisciplinary studies of linguistics: Language universals, human cognition and big-data analysis. Journal of Zhejiang University (Humanities and Social Sciences Edition), 46(1), 108–118. (In Chinese). Liu, H. (2007a). Probability distribution of dependency distance. Glottometrics, (15): 1–12. Liu, H. (2007b). Dependency relations and dependency distance: A statistical view based on Treebank. 
In Meaning-text theory 2007 [Recurso electrónico]: Proceedings of the 3rd

30  Literature Review and the Present Study international conference on meaning-text theory (pp. 269–278). Klagenfurt, May 20–24, 2007. Liu, H. (2008). Dependency distance as a metric of language comprehension difficulty. Journal of Cognitive Science, 9(2), 159–191. Liu, H. (2010). Dependency direction as a means of word-order typology: A method based on dependency treebanks. Lingua, 120(6), 1567–1578. Liu, H. (2011). Quantitative properties of English verb valency. Journal of Quantitative Linguistics, 18(3), 207–233. Liu, H. (2017). The distribution pattern of sentence structure hierarchy. Foreign Language Teaching and Research, 3, 345–352+479. (In Chinese). Liu, H. (2022). Dependency relations and language networks. Science Press. (In Chinese). Liu, B., & Chen, X. (2017). Dependency distance in language evolution: Comment on “Dependency distance: A new perspective on syntactic patterns in natural languages” by Haitao Liu et al. Physics of Life Reviews, 21, 194–196. Liu, H., & Feng, Z. (2007). Probabilistic valency pattern theory for natural language processing. Linguistic Sciences, (03): 32–41. (In Chinese). Liu, H., & Huang, W. (2012). Quantitative linguistics: State of the art, theories and methods. Journal of Zhejiang University (Humanities and Social Sciences), 43(2), 178–192. (In Chinese). Liu, H., & Jing, Y. (2016). A quantitative analysis of English hierarchical structure. Journal of Foreign Languages, 39(6), 2–11. (In Chinese). Liu, B., & Liu, H. (2011). A corpus-based diachronic study on the syntactic valence of Chinese verbs. Language Teaching and Linguistic Studies, 6, 83–89. (In Chinese). Liu, Y., Li, S., Zhang, X., & Sui, Z. (2016). Implicit discourse relation classification via multi-task neural networks. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1). March 2016. Liu, H., Xu, C., & Liang, J. (2017). Dependency distance: A new perspective on syntactic patterns in natural languages. Physics of Life Review, 21, 171–193. Louis, A., Joshi, A., & Nenkova, A. (2010). Discourse indicators for content selection in summarization. In Proceedings of the 11th annual meeting of the special interest group on discourse and dialogue (pp. 147–156). September 2010. Association for Computational Linguistics. https://doi.org/anthology/W10-4327 Louwerse, M. M. (2001). An analytic and cognitive parametrization of coherence relations. Cognitive Linguistics, 12(3), 291–315. Lu, J., & Liu, H. (2020). Do English noun phrases tend to minimize dependency distance? Australian Journal of Linguistics, 40(2), 246–262. https://doi.org/10.1080/07268602. 2020.1789552 Lu, Q., & Liu, H. (2016a). A quantitative analysis of the relationship between crossing and dependency distance in human language. Journal of Shanxi University (Philosophy, & Social Science), 39(4), 49–56. (In Chinese). Lu, Q., & Liu, H. (2016b). Does dependency distance distribute regularly? Journal of ­Zhejiang University (Humanities and Social Science), (4): 49–56. (In Chinese). Lu, Q., & Liu, H. (2016c). Is there a pattern in the distribution of dependence distance? Journal of Zhejiang University (Humanities and Social Science), (4): 63–76. (In Chinese). Lu, Q., Lin, Y., & Liu, H. (2018). Dynamic valency and dependency distance. In J. Jiang & H. Liu (Eds.), Quantitative analysis of dependency structures (pp. 145–166). De Gruyter Mouton. Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243–281.

Literature Review and the Present Study 31 Marcu, D. (1997). The rhetorical parsing of unristricted natural language texts. In Proceedings of the 35th annual meeting of the Association for Computational Linguistics (pp. 96–103). Madrid, Spain, July 7–12, 1997. Marcu, D. (2000). The theory and practice of discourse parsing and summarization. The MIT press. Markle-Hus, J., Feuerriegel, S., & Prendinger, H. (2017). Improving sentiment analysis with document-level semantic relationships from rhetoric discourse structures. http://hdl.­ handle.net/10125/41288 (Accessed February 10, 2023). Meyer, T., & Popescu-Belis, A. (2012). Using sense-labeled discourse connectives for statistical machine translation. In M. R. Costa-jussà, P. Lambert, R. E. Banchs, R. Rapp, & B. Babych (Eds.), Proceedings of the joint workshop on exploiting synergies between information retrieval and machine translation (ESIRMT) and hybrid approaches to machine translation (HyTra) (pp. 129–138). Avignon, France, April 2012. Miltsakaki, E., & Kukich, K. (2004). Evaluation of text coherence for electronic essay scoring systems. Natural Language Engineering, 10(1), 25–55. Mindich, D. T. Z. (1998). Just the Facts: how “objectivity” came to define American journalism. New York University Press. Morey, M., Muller, P., & Asher, N. (2018). A dependency perspective on RST discourse parsing and evaluation. Computational Linguistics, 44(2), 197–235. Muller, P., Afantenos, S., Denis, P., & Asher, N. (2012). Constrained decoding for text-level discourse parsing. In 24th international conference on computational linguistics (COLING 2012) (Vol. 12, pp. 1883–1900). ACL: Association for Computational Linguistics. Mumbai, India, December 2012. hal-00750611f. Narasimhan, K., & Barzilay, R. (2015). Machine comprehension with discourse relations. In Proceedings of the 53rd annual meeting of the Association for Computational Linguistics and the 7th international joint conference on natural language processing (Volume 1: Long Papers) (pp. 1253–1262). Beijing, China, July 26–31, 2015. Newman, M. (2004). Fast algorithm for detecting community structure in networks. Physical Review, E, 69(6), 066133. Newman, M. (2018). Networks. Oxford University Press. Ninio, A. (1998). Acquiring a dependency grammar: The first three stages in the acquisition of multiword combinations in Hebrew-speaking children. In G. Makiello-Jarza, J. Kaiser, & M. Smolczynska (Eds.), Language acquisition and developmental psychology. Universitas. Norambuena, B. K., Horning, M., & Mitra, T. (2020). Evaluating the inverted pyramid structure through automatic 5W1H extraction and summarization. Computational Journalism C+ J. Boston, MA, USA, March 20–21, 2020. Ouyang, J., & Jiang, J. (2018). Can the probability distribution of dependency distance measure language proficiency of second language learners? Journal of Quantitative Linguistics, 25(4), 295–313. Ouyang, J., Jiang, J., & Liu, H. (2022). Dependency distance measures in assessing syntactic complexity. Assessing Writing, 51, 100603. Paltridge, B., Starfield, S., Ravelli, L. J., & Tuckwell, K. (2012). Change and stability: Examining the macrostructures of doctoral theses in the visual and performing arts. Journal of English for Academic Purposes, 11(4), 332–344. Pawłowski, A. (1999). Language in the line vs. language in the mass: On the efficiency of sequential modelling in the analysis of rhythm. Journal of Quantitative Linguistics, 6, 70–77.

32  Literature Review and the Present Study Peng, N., Poon, H., Quirk, C., Toutanova, K., & Yih, W. T. (2017). Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics, 5, 101–115. Pitkin, W. L. (1977). Hierarchies and the discourse hierarchy. College English, 38(7), 648–659. Pöttker, H. (2003). News and its communicative quality: the inverted pyramid – when and why did it appear? Journalism Studies, 4(4), 501–511. Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A. K., & Webber, B. L. (2008). The Penn Discourse TreeBank 2.0. In H. Sloetjes & P. Wittenburg (Eds.), Proceedings of the 6th international conference on language resources and evaluation (pp. 2961–2968). Marrakech, Morocco, May 28–30, 2008. Qin, L., Zhang, Z., & Zhao, H. (2016). Shallow discourse parsing using convolutional neural network. In Proceedings of the CoNLL-16 shared task (pp. 70–77). Berlin, Germany, August 7–12, 2016. Radev, D. R. (2000). A common theory of information fusion from multiple text sources step one: cross-document structure. In L. Dybkjær, K. Hasida, & D. Tram (Eds.), Proceedings of the 1st SIGdial workshop on Discourse and dialogue (Vol. 10, pp. 74–83). Association for Computational Linguistics. Hong Kong, China, October 7–8, 2000. Ratnayaka, G., Rupasinghe, T., de Silva, N., Warushavithana, M., Gamage, V., & Perera, A. S. (2018). Identifying relationships among sentences in court case transcripts using discourse relations. In 2018 18th International Conference on Advances in ICT for Emerging Regions (ICTer) (pp. 13–20). September 26–29, 2018. IEEE. Readership Institute. (2010). The value of feature-style writing. http://www.readership.org/ content/feature.asp (Accessed 22 January 2014). Reitter, D. (2003). Simple signals for complex rhetorics: On rhetorical analysis with richfeature support vector models. LDV Forum, 18(1/2), 38–52. Roemmele, M. (2016). Writing stories with help from recurrent neural networks. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1, pp. 4311–4312). March 2016. Roulet, E. (1995). Geneva school. In J. Verschueren, J. Östman, & J. Blommaert (Eds.), Handbook of pragmatics (pp. 319–323). John Benjamins. Sagae, K. (2009). Analysis of discourse structure with syntactic dependencies and datadriven shift-reduce parsing. In Proceedings of the 11th international conference on parsing technologies (pp. 81–84). Paris, France, October 7–9, 2009. Sahu, S. K., Christopoulou, F., Miwa, M., & Ananiadou, S. (2019). Inter-sentence relation extraction with document-level graph convolutional neural network. arXiv preprint arXiv:1906.04684. Sanders, T., & van Wijk, C. (1996). PISA: A procedure for analyzing the structure of explanatory texts. Text, 16(1), 91–132. Schiffrin, D. (1994). Approaches to discourse. Blackwell. Scollon, R. (2000). Generic variability in news stories in Chinese and English: A contrastive discourse study of five days’ newspapers. Journal of Pragmatics, 32(6), 761–791. Scollon, R., & Scollon, S. (1997). Point of view and citation: Fourteen Chinese and English versions of the “same” news story. Text, 17(1), 83–125. Shie, J.-S. (2012). The alignment of generic discourse units in news stories. Text & Talk, 2(5), 661–679. https://doi.org/10.1515/text-2012-0031 Somasundaran, S., Namata, G., Wiebe, J., & Getoor, L. (2009). Supervised and unsupervised methods in employing discourse relations for improving opinion polarity classification. 
In Proceedings of the 2009 conference on empirical methods in natural language processing (pp. 170–179). Singapore, 6–7, August 2009.

Literature Review and the Present Study 33 Song, L., Zhang, Y., Wang, Z., & Gildea, D. (2018). N-ary relation extraction using graph state lstm. In Proceedings of conference on empirical methods in natural language processing (pp. 2226–2235). Association for Computational Linguistics. arXiv:1808.09101. Soricut, R., & Marcu, D. (2003). Sentence level discourse parsing using syntactic and lexical information. In Proceedings of the 2003 human language technology conference of the North American chapter of the Association for Computational Linguistics (pp. 149–156). Edmonton, May–June 2003. Sperber, D., & Wilson, D. (1995). Relevance: Communication and cognition (2nd ed.). Blackwell. Subba, R., & Di Eugenio, B. (2009). An effective discourse parser that uses rich linguistic information. In Proceedings of human language technologies: The 2009 annual conference of the North American chapter of the Association for Computational Linguistics (pp. 566–574). Boulder, Colorado, June 2009. Association for Computational Linguistics. Sun, K., Wang, R., & Xiong, W. (2021). Investigating genre distinctions through discourse distance and discourse network. Corpus Linguistics and Linguistic Theory, 17(3), 599–624. Sun, K., Wang, R., & Xiong, W. (2022). A dependency framework for unifying discourse corpora. https://doi.org/10.31234/osf.io/vjegb Sun, K., & Xiong, W. (2019). A computational model for measuring discourse complexity. Discourse Studies, 21(6), 690–712. https://doi.org/10.1177/1461445619866985 Sun, K., & Zhang, L. (2018). Quantitative aspects of PDTB-style discourse relations across languages. Journal of Quantitative Linguistics, 25(4), 342–371. Taboada, M. (2019). The space of coherence relations and their signalling in discourse. Language, Context and Text, 1(2), 205–233. Taboada, M., & Mann, W. (2006a). Rhetorical structure theory: Looking back and moving ahead. Discourse Studies, 8(3), 423–459. Taboada, M., & Mann, W. (2006b). Applications of rhetorical structure theory. Discourse Studies, 8(4), 567–588. Temperley, D. (2007). Minimization of dependency length in written English. Cognition, 105(2), 300–333. Temperley, D. (2008). Dependency-length minimization in natural and artificial languages. Journal of Quantitative Linguistics, 15(3), 256–282. Thomson, E. A., Peter, R. R. W., & Kitley, P. (2008). “Objectivity” and “hard news” reporting across cultures. Journalism Studies, 9(2), 212–228. Tuzzi, A., Popescu, I. I., & Altmann, G. (2009). Zipf's laws in Italian texts. Journal of Quantitative Linguistics, 16(4), 354–367. van Dijk, T. A. (1982). Episodes as units of discourse analysis. In Tannen, D. (Ed.), Analyzing discourse: Text and talk (pp. 177–195). Georgetown University Press. van Dijk, T. A. (1982). Episodes as units of discourse analysis. Analyzing Discourse: Text and Talk, 177–195. van Dijk, T. A. (1983). Discourse analysis: Its development and application to the structure of news. Journal of Communication, 33(2), 20–43. van Dijk, T. A. (1985). Structures of news in the press. In T. A. van Dijk (Ed.), discourse and communication: New approaches to the analysis of mass media, discourse and communication (pp. 69–93). De Gruyter. van Dijk, T. A. (1986). News schemata. In R. C. Charles & S. Greenbaum (Eds.), Studying writing: Linguistic approaches (pp. 155–185). Sage Publications. van Dijk, T. A. (1988a). News Analysis – Case Studies of International and National News in the Press. Lawrence Erlbaum. van Dijk, T. A. (1988b). News as discourse. Lawrence Erlbaum.

34  Literature Review and the Present Study van Dijk, T. A. (2019a). Macrostructures: An interdisciplinary study of global structures in discourse, interaction, and cognition. Routledge. van Dijk, T. A. (2019b). Some observations on the role of knowledge in discourse processing. Arbolesy Rizomas, 1(1), 10–23. van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. Academic Press. Voll, K., & Taboada, M. (2007). Not all words are created equal: Extracting semantic orientation as a function of adjective relevance. In AI 2007: Advances in artificial intelligence: 20th Australian joint conference on artificial intelligence (pp. 337–346). Gold Coast, Australia, December 2–6, 2007. Springer. Wang, H., & Liu, H. (2014). The effects of length and complexity on constituent ordering in written English. Poznań Studies in Contemporary Linguistics, 50(4), 477–494. Wang, Y. (2015). A study on the distribution of dependency distances in different domains of written English in the BNC. Master dissertation, Dalian Maritime University. Wang, Y., & Liu, H. (2017). The effects of genre on dependency distance and dependency direction. Language Sciences, 59, 135–147. Webber, B., Stone, M., Joshi, A., & Knott, A. (2003). Anaphora and discourse structure. Computational Linguistics, 29(4), 545–587. Widdowson, H. G. (2007). Discourse analysis (Vol. 133). Oxford University Press. Williams, J. P. (1984). Categorization, macrostructure, and finding the main idea. Journal of Educational Psychology, 76(5), 874. Williams, J. P., Taylor, M. B., & de Cani, J. S. (1984). Constructing macrostructure for expository text. Journal of Educational Psychology, 76(6), 1065. Wolf, F., & Gibson, E. (2006). Coherence in natural language: Data structures and applications. The MIT Press. Wolf, F., Gibson, E., Fisher, A., & Knight, M. (2004). Discourse Graphbank. Linguistic Data Consortium. Xiao, W., Huber, P., & Carenini, G. (2020). Do we really need that many parameters in transformer for extractive summarization? Discourse can help! In C. Braud, C. Hardmeier, J. J. Li, A. Louis, & M. Strube (Eds.), Proceedings of the first workshop on computational approaches to discourse (pp. 124–134), Online, November 2020. Association for Computational Linguistics. arXiv preprint arXiv:2012.02144. Xiao, W., Huber, P., & Carenini, G. (2021). Predicting discourse trees from transformerbased neural summarizers. arXiv preprint arXiv:2104.07058. Xu, C. (2018). Dependency distance between preposition zai and the clause subject. In H. Liu (Ed.), Advances in quantitative linguistics (pp. 231–243). Zhejiang University Press. (In Chinese). Xu, J., Gan, Z., Cheng, Y., & Liu, J. (2020). Discourse-aware neural extractive text summarization. In D. Jurafsky, J. Chai, N. Schluter, & J. Tetreault (Eds.), Proceedings of the 58th annual meeting of the Association for Computational Linguistics (pp. 5021–5031). July, 2020. Online. arXiv preprint arXiv:1910.14142. Association for Computational Linguistics. Yan, J., & Liu, H. (2021). Quantitative Analysis of Chinese and English Verb Valencies Based on the Probabilistic Valency Pattern Theory. In Workshop on Chinese Lexical Semantics (pp. 152–162). Nanjing, China, May 2021. Springer International Publishing. Yan, W., Xu, Y., Zhu, S., Hong, Y., Yao, J., & Zhu, Q. (2016). A survey to discourse relation analyzing. Journal of Chinese Information Processing, 30(4), 1–11. (In Chinese with English abstract). Ye, Y. (2019). 
Macrostructures and rhetorical moves in energy engineering research articles written by Chinese expert writers. Journal of English for Academic Purposes, 38, 48–61.

Literature Review and the Present Study 35 Yoshida, Y., Suzuki, J., Hirao, T., & Nagata, M. (2014). Dependency-based discourse parser for single-document summarization. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1834–1839). Doha, October, 2014. Yuan, J., Lin, Q., & Lee, J. S. (2021). Discourse tree structure and dependency distance in EFL writing. In Proceedings of the 20th international workshop on treebanks and ­linguistic theories (TLT, SyntaxFest 2021) (pp. 105–115). December 2021. Yue, M., & Liu, H. (2011). Probability distribution of discourse relations based on a Chinese RST-annotated corpus. Journal of Quantitative Linguistics, 18(2), 107–121. Zhang, B., Su, J., Xiong, D., Lu, Y., Duan, H., & Yao, J. (2015). Shallow convolutional neural network for implicit discourse relation recognition. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 2230–2235). ­Lisbon, Portugal, September 17–21, 2015. Zhang, H. (2023). Quantitative syntactic features of Chinese and English – A dependencybased comparative study. Zhejiang University Press. (In Chinese). Zhang, H., & Liu, H. (2018). Interrelations among dependency tree widths, heights and sentence lengths. In H. Liu & J. Jiang (Eds.), Quantitative analysis of dependency structure (pp. 31–52). De Gruyter Mouton. Zhang, L., Wang, G., Han, W., & Tu, K. (2021). Adapting unsupervised syntactic parsing methodology for discourse dependency parsing. In Proceedings of the 59th annual meeting of the Association for Computational Linguistics and the 11th international joint conference on natural language processing (Volume 1: Long Papers) (pp. 5782–5794). August 2021, online. Zhang, M., Zhang, J., Su, J., & Zhou, G. (2006). A composite kernel to extract relations between entities with both flat and structured features. In Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the Association for Computational Linguistics (pp. 825–832). Sydney, July 2006. Association for Computational Linguistics. Zhao, Y., & Liu, H. (2014). Tendency to minimize dependency distance in understanding ambiguous structure. Computer Engineering and Applications, 50(6), 7–10. (In Chinese). Zheng, H., Li, Z., Wang, S., Yan, Z., & Zhou, J. (2016). Aggregating inter-sentence information to enhance relation extraction. In Proceedings of the thirtieth AAAI conference on artificial intelligence (pp. 3108–3114). Phoenix Arizona, February 12–17, 2016. Zhou, C. (2021). On mean dependency distance as a metric of translation quality assessment. Indian Journal of Language and Linguistics, 2(4), 23–30. Zipf, G. K. (1932). Selected studies of the principle of relative frequency in language. MA: Harvard University Press. Zipf, G. K. (1935). The psycho-biology of language. An introduction to dynamic philology. Houghton Mifflin.

3

Diversification Processes and Least Effort Principle

This chapter introduces the theoretical background for this research—diversification process and the least effort principle. First Zipf’s law is introduced along with the least effort principle, which Zipf uses to explain why Zipf’s distributions occur. On the basis of these discussions, we introduce first diversification processes which are constantly occurring in language and then how we can model such kind of processes. It is the diversification process that lays the foundation of modelling in this study. 3.1  Zipf’s law and least effort principle In investigations into a linguistic (sub)system, an important step is to identify the elements of that (sub)system, that is, the units and properties. Units/properties are not automatically found in the object of investigation by observation, but rather, they are defined since it’s a conceptual model (Altmann, 1993, 1996; Köhler, 2012). In accordance with the research objectives, researchers define their concepts from the perspective of the relevant theoretical framework employed in their study. For instance, there are phrases in phrase-structure grammars, but dependency distance can hardly be defined from the framework of phrase-structure grammars. Of primary importance in the definition of concepts is a well-formulated scientific hypothesis (Bunge, 2007). Upon the definition of units/properties, we will validate their linguistic status checking whether their distribution follows a certain distribution pattern (in most cases, a Zipf-law-related one). If it does, the regular distribution can prove that the units/properties are results of diversification processes and they are basic linguistic units/properties. Linguistic units/properties of linguistic elements abide by universal laws, which, like the well-known laws in natural sciences, can be “formulated in a strict mathematical way” (Köhler, 2012; Ord, 1972, p. 10). Among all the universal distribution laws, Zipf’s law is a central linguistic law connected with a great number of properties and language processes (Köhler, 2012, p. 9). Zipf (1932) introduces the principle of “relative frequency” of language, which states that in linguistics, it is neither necessary nor possible to study the entire population, but after repeating a language experiment for several times, if a certain percentage of the population is reached, then the results can represent the probability DOI: 10.4324/9781003436874-3

Diversification Processes and Least Effort Principle 37 in the entire population. The size of the population that reaches the proportion is called the Zipfian size (Zipf, 1932). Thus, we can study language without looking into the whole population, but with smaller samples bearing a Zipfian size. Zipf (1935) found the Zipf’s law when examining the rank-frequency distribution of words in the Latin texts by Plautus. This is done by assigning rank 1 to the word occurring most frequently, and rank 2 to the word occurring less frequently than the word ranking No. 1. The rank assigning lasts until the least common words. In this way a rank-frequency sequence comes into being. When represented by a graph, many or most of the entities are found to be rarely used and the word frequency decays (actually as a power law of its rank). Subsequent researchers (e.g., Joos, 1936) found that in the formula frr = K, where K is not a constant but a parameter, so that some more complicated formulas were derived. Examples will be given in Section 3.4. First discovered in linguistics, Zipf’s law soon found its place in various other fields, both in humanities and in natural sciences. In different fields, the curves for such a distribution pattern have a similar shape with a long tail. Figure 3.1a visually presents such a curve. When both axes are logarithmic, a line with a slope of around −1 is obtained (Figure 3.1b). In quantitative linguistics, there are three universal types of linguistic laws, that is, distributional ones, functional ones and developmental ones. The Zipf distribution is a typical distributional law. Zipf was not the only one to discover the law, but it was named after him in honour of his great contribution. By studying language as a natural phenomenon, Zipf’s work became seminal, introducing methods from natural sciences to linguistic studies. His research laid the scientific foundation for modern quantitative linguistics. His explanation of Zipf’s law (see Section 3.2) was not comprehensive, but it provided a clear direction for the construction of linguistic theories. Since then, dynamic theories and ideas such as self-organisation, self-adaptation, and synergetic linguistics have gradually gained importance in the field of linguistic research. “Since the birth of Zipf’s law, researchers have tried to study various linguistic laws in the world using a quantitative approach” (Fang & Wang, 2017, p. 11). For

Figure 3.1  Zipf distribution. (a) A rank-frequency curve. (b) Logarithm of both axes

38  Diversification Processes and Least Effort Principle example, Zipf’s law can bear linguistic topological significance (e.g., Gerdes et al., 2021). Yu et al. (2018) analyse Zipf curves in 50 languages and find common patterns in the three segments of Zipfian curves and explain them. A study by Wang and Liu (2022) find that Zipf’s law can be used as an indicator of lexical diversity. Some scholars have even found that animal languages can also be explained by this law (Ferrer-i-Cancho et al., 2013; Gustison et al., 2016; Heesen et al., 2019; Semple et al., 2010). Yu et al. (Yu, 2018; Yu et al., 2016) propose the “Hierarchical Selection Model” through analysing natural language texts. They use this model to analyse the significance of the power law exponent of Zipf’s law, pointing out that there are hierarchies in many phenomena in human society, including language. The general pursuit of high hierarchies gives rise to the emergence of the hierarchical phenomenon (the higher the hierarchy, the higher its probability of being selected). In the statistical process, a power-law distribution comes into being. Zipf’s model has extended to a wide range of fields such as sociology, geography and psychology, and half a century on, Zipf’s model has been validated in almost every social field. In 2002, for example, a special issue of Glottometrics, a linguistic journal, was devoted to Zipf’s law to mark Zipf’s 100th anniversary, and many scholars from various fields, including mathematics, information, biology, music and physics, reported findings related to Zipf’s law. In short, the Zipf distribution is not limited to the distribution of linguistic phenomena but is a universal law applicable to all fields. According to Zipf, the principle of least effort is a fundamental principle that guides linguistic behaviour as well as human behaviour of all kinds (Zipf, 1949). In the next section we will introduce this principle. 3.2  Least effort principle and diversification process The Zipfian curves occur as a result of least effort principle in the diversification process. There is a history of over 100 years of investigating language economy principles. French phonetician Passy proposes in the 19th century “le principe d’économie” (the principle of economy) (Feng & Zhou, 2017). However, he argues that the emphasis principle would outweigh the economic principle. Frei proposes “le besoin d’économie” (the need of economy) in 1929 (cf. Feng, 2013). Jespersen proposes the “easy theory” in 1951 (cf. Feng & Zhou, 2017). These researchers argue that these economy principles only occur in linguistic evolution. French linguist Martinet (1955) proposes “l’économie du langage” (the economy of the language), which is not limited to the evolution of language. Instead, it’s a basic principle for language. It comes from the human needs for expression and communication, and is restricted by the biological limitations of human beings (for instance, limited working memory). These two opposing forces contribute to the movement and development of language as internal factors. Rationalising power consumption plays a role in the functioning of language at all times (Feng, 2013). The idea is consistent with Zipf’s principle of least effort.

Diversification Processes and Least Effort Principle 39 The language economy principle exerts significant impacts on language at various sub-levels, be they phonological, syntactic and lexical, to name a few. In the case of grammar, for example, subjects are marked in a special form in French and even disappear in Russian due to their high frequency, which reduces the burden of pronunciation and comprehension. Whether termed as the principle of linguistic economy or the least effort principle, they reflect the fact that the distribution pattern of our language is humandriven (Lin & Liu, 2018; Liu, 2014, 2018), and the aforementioned minimisation of dependency distance (Section 2.1.1) is also due to the limited working memory of humans, and this cognitive limitation leads to the need for language users to save their memory power. For example, Xu and Liu (2022) explore the role of working memory for syntactic dependency structure. We will leave more details about the minimisation of dependency distances to Section 5.7, which gives rise to the power-law distribution, hence the Zipfian curves. Another cause for the Zipfian curves and Zipfian distributions is diversification process, which is also a by-product of the least effort principle. Quite well-known in biology, the diversification processes also take place in linguistics, occurring at all levels of language, and covering a wide scope of language phenomena (Altmann, 1991). Words can have more than one class membership with formal changes like derivation (e.g., propose, proposal) or without formal changes (e.g., smile as a noun or as a verb). Or words can be polysemous (e.g., plot) or acquire connotation with other words. In the opposite direction, unification is also at work. These two contrasting processes are collectively called Zipfian processes. Zipf (1949) proposes “the least effort” principle as the theoretical explanation for Zipf’s law. This principle is a compromising consequence of two competitive economic principles. Take words as an example. To minimise his/her effort in encoding, the speaker tends to use less words. As a consequence, many words will bear more than one meaning, which, when going to an extreme, can be a word meaning everything. This least effort by the speaker gives rise to the unification force. However, for the listener, to minimise his/her effort in decoding, he/she might prefer the situation where one word carries only one exact meaning, thus giving rise to the diversification force. A balance reached between the unification and diversification forces can indicate that both parties have simultaneously minimised their efforts, as in this way an optimisation of communicative transactions cost between the speaker and the listener is reached. As a result, a few words will occur more frequently than many other words. More frequently occurring elements demand less emphasis or conspicuousness than those occurring more rarely (Zipf, 1933, p. 1), thus generating the special curve of rank-frequency distribution of words. The least effort principle explains why many language phenomena observe the Zipf’s law, which exhibits the dynamic features of language. As the internal driving forces, unification process and diversification process (or collectively called Zipfian forces) constitute the mechanism for various language phenomena to occur. 
Having introduced the basic concepts of diversification process and Zipf’s law resulting from such kind of process, we proceed with the factors leading to diversification processes in the next section.

40  Diversification Processes and Least Effort Principle 3.3  Factors leading to diversification processes Factors leading to diversification processes are various, among which are random fluctuation, environmentally conditioned variation, conscious change and system modification (Altmann, 2005). We’ll elaborate a little bit more on some of the factors. A quite important factor leading to diversification processes is self-regulative triggering (Altmann, 2005, p. 647). The causes for self-regulative triggering lie deep and such triggering is quite well-known in diachronic linguistics. For instance, for a sound gradually penetrating into the distinctive domain of another sound, split or allophone of that sound might occur (cf. Boretzky, 1977). System modification comes as language is a self-regulating system. Köhler (1986, 1987, 1989, 1990, 1991, 2012) puts forward some requirements, which are thus named as the Köhlerian requirements. These requirements mostly come in pairs with opposite forces, for instance the minimal coding effect and minimal decoding effort, context economy and context specificity, invariance and flexibility of the relation between expression and meaning. These forces also trigger self-regulation of language. Each diversification process evokes its reverse unification process on the same language entity, which will help combat the decay of the phenomenon. But the two forces might come at different levels. A collective inertia will resist all language innovations. “What predominates in all change is the persistence of the old substance; disregard for the past is only relative. That is why the principle of change is based on the principle of continuity” (de Saussure, 1973, p. 74). The unification processes help maintain the established order while the diversification process may change the order, remake the order or increase complexity, or even totally destroy the existent order (Guntern, 1982). For instance, a language might diversify into different dialects owing to lexical or syntactic changes. The unification of language might also be in play consciously to maintain a certain order. Typical examples include language planning and language policies from the government. To decide whether a process is a unification one or a diversification one relies both on the perspective of examination and on the extent of such kind of process. Also, for the factors leading to the very diversification process, sometimes it might be difficult to tell exactly which one out of the many factors is playing a role. Sometimes several factors might be in play. But as some of the factors are exerting a constant force at the same direction, these forces can be added. Usually, a differential equation might be able to model the result from such processes. The two forces of unification and diversification work together and contribute to stable distributions, which have been repeatedly corroborated in quantitative linguistics. 3.4  Modelling diversification processes To model diversification processes, we shall start from three general assumptions, which serve as the foundation of modelling (Altmann, 2005). The first assumption addresses the rank-frequency distribution of the diversified entities, which are rightfully the objects of investigation in this study.

Diversification Processes and Least Effort Principle 41 The first assumption states that a decreasing rank-frequency distribution is formed when the classes of the diversified entity are of a nominal variable, or that a discrete distribution is formed in case the classes come from a numerical variable. An entity diversifies in a certain direction, resulting in unequal consequences of the diversified entities, which can then be ranked according to decreasing frequencies. If going right, this hypothesis can be a criterion to distinguish the various taxonomies of the diversified entities, depending on how the rank-frequency distribution fits the empirical data (Altmann, 2005, p. 647). The second assumption goes that the classes of the diversified entity are somehow mutually dependent. For instance, one word class does not exist independent of other word classes. A typical example is articles: An article evokes an ensuing noun and avoids a possessive pronoun. Or some classes might complement one another. The process of diversification is the mechanism behind which many observed phenomena occur (Altmann, 2005). Based on theoretical considerations, we can propose some hypotheses, which, if corroborated empirically, can be linguistic laws. These laws can be used to explain and predict linguistic phenomena. A diversification process can meet some needs (e.g., Köhlerian requirements), affect many language entities and their properties, and trigger other diversification processes (Altmann, 2005). These processes interact with each other, hence selfregulation between the linguistic elements. Many examples of this can be found in various fields of linguistics. For example, in the field of grammar, the diversification process increases the number of affixes, grammatical categories, and polyfunctionality of syntactic constructions. If affixes do not become more functional, the total inventory of affixes will have to grow. Through the addition of more rules, inventories of sentence types and syntactic constructions will increase accordingly. In all these fields, when the saturation point is reached, a stable state will occur and maintain for some time during which only some insignificant oscillations happen. From the above analysis, we can see that many phenomena that we can observe in linguistics have a common feature—they are results of diversification processes, which lays the foundation for their modelling and interpretation. In the following modelling procedure, x stands for the rank (in case the variable is a nominal one), or a discrete/numerical variable. We start from the initial point where we assume the beginning class bears a probability of 1. The following class develops in proportion to the first class, which can be represented by the ratio in: P2 = aP(3.1) 1 The second class further develops into the third and fourth classes. The proportion might be still a constant like a in Function 3.1. But with various factors which are effective, the constant develops into f(x), namely, Px = f ( x ) Px −1(3.2)

42  Diversification Processes and Least Effort Principle where f ( x ) = g ( x ) / h( x ) (3.3) Here, g(x) stands for the speaker’s part. The innovation on his part diversifies the classes. And h(x) stands for the hearer/community’s part. Contrary to the speaker, the community is generally conservative, preferring a hindrance to too much creativity so as to ensure undisturbed communication (Köhler & Altmann, 1996, p. 649ff.). As a general statement, Function 3.2 Px=f(x)Px-1 can be universally employed for various results of diversification process. f(x) will differ in its concrete meaning accordingly for different initial or subsidiary conditions. For instance, when modelling the steady state in semantics, it is Beőthy’s law (Beőthy & Altmann, 1984a, 1984b); when modelling that in dialectology, it is Goebl’s law (Goebl, 1984); in modelling word frequency, it is Zipf-Mandelbrot’s law and in modelling diversification process in lexicology, it is Martin’s law (Köhler & Altmann, 1996, p. 649ff.). Altmann (1991) offers a comprehensive review of many possible modelling methods. We illustrate with the right truncated modified Zipf-Alekseev distribution, as it’s a widely recognised model in linguistics and is also one of the most important models in this study. As previously mentioned, all language entities diversify and generate variants or secondary forms, which might be grouped into a certain class. These diversified entities will follow a certain rank-frequency distribution (Altmann, 2005). Specifically, Hřebíček (1996) hypothesises that they follow the right truncated modified Zipf-Alekseev distribution pattern. He puts forward two hypotheses: a The logarithm of the class size and that of the ratio of the probabilities P1 and Px are proportional to each other, which can be represented by ln( P1 / Px ) ∝ ln x(3.4) b The logarithm of Menzerath’s law (y = Axb) gives the previous proportionality function, thus ln( P1 / Px ) = ln( Ax b )

(3.5)

where A=ea, yielding a solution Px = P1 x − ( a + b ln x ) , x = 1, 2,3,…(3.6) In case Function 3.6 is a probability distribution, P1 is the normalising constant; or the estimated size of the first class, x=1. Quite often, the first class will display

Diversification Processes and Least Effort Principle 43 a diverging frequency from other classes which are regularly distributed. Assign a special value to the first class and we obtain:  α  P =  (1 − α ) x − ( a + b ln x )  T  ′ x

x =1 x = 2,3,4,...,n

(3.7)

where n

T=

∑j

− ( a + b ln j )

, a, b ∈ℜ, 0 < α < 1(3.8)

j=2

Function 3.6 or Function 3.7 represent the Zipf-Alekseev distribution. In case n is finite, Function 3.7 is the right truncated modified Zipf-Alekseev distribution. Having addressed the fundamental assumptions based on which we model the language entities/properties, we will define the language entities/properties or our research objects in the next chapter along with the research materials. References Altmann, G. (1991). Modeling diversification phenomena in language. In U. Rothe (Ed.), Diversification processes in language: Grammar (pp. 33–46). Rottmann. Altmann, G. (1993). Science and linguistics. In R. Köhler & B. Rieger (Eds.), Contributions to quantitative linguistics (pp. 3–10). Kluwer. Altmann, G. (1996). The nature of linguistic units. Journal of Quantitative Linguistics, 3(1), 1–7. Altmann, G. (2005). Diversification process. In R. Köhler, G. Altmann, & G. Piotrowski (Eds.), Quantitative Linguistik, Ein Internationales Handbuch (Quantitative linguistics, an international handbook) (pp. 646–658). Walter de Gruyter. Beőthy, E., & Altmann, G. (1984a). The diversification of meaning of Hungarian verbal prefixes. II. ki-. Finnisch-Ugrische Mitteilungen, 8, 29–37. Beőthy, E., & Altmann, G. (1984b). Semantic diversification of Hungarian verbal prefixes. III. “föl-”, “el-”, “be-. In U. Rothe (Ed.), Glottometrika (Vol. 7, pp. 45–56). Brockmeyer. Boretzky, N. (1977). Einführung in die historische Linguistik. Rowohlt. Bunge, M. (2007). Philosophy of science (Vol. 1). From problem to theory (4th ed.). Transaction Publishers. de Saussure, F. (1973). Cours de linguistique générale. Payot. Fang, Y., & Wang, Y. (2017). Quantitative linguistic research of contemporary Chinese. Journal of Quantitative Linguistics, 25(2), 107–121. Feng, Z. (2013). Schools of modern linguistics (Updated edition). Commercial Press. (In Chinese). Feng, Z., & Zhou, J. (2017). Martinet and French functional linguistics. Modern Chinese (Language Research), 8, 4–6. (In Chinese). Ferrer-i-Cancho, R., Hernández-Fernández, A., Lusseau, D., Agoramoorthy, G., Hsu, M. J., & Semple, S. (2013). Compression as a universal principle of animal behavior. Cognitive Science, 37, 1565–1578.

44  Diversification Processes and Least Effort Principle Gerdes, K., Kahane, S., & Chen, X. (2021). Typometrics: From implicational to quantitative universals in word order typology. Glossa: a Journal of General Linguistics, 6(1), 1–31. https://doi.org/10.5334/gjgl.764 Goebl, H. (1984). Dialektometrische Studien I. Narr. Guntern, G. (1982). Auto-organization in human systems. Behavioral Science, 27, 323–337. Gustison, M. L., Semple, S., Ferrer-i-Cancho, R., & Bergman, T. J. (2016). Gelada vocal sequences follow Menzerath’s linguistic law. In L.C. Dorothy (Ed.), Proceedings of the National Academy of Sciences of the United States of America (E2750–E2758). University of Pennsylvania, Philadelphia, USA. https://doi.org/10.1073/pnas.1522072113 Heesen, R., Hobaiter, C., Ferrer-i-Cancho, R., & Semple, S. (2019). Linguistic laws in chimpanzee gestural communication. Proceedings of the Royal Society B: Biological Sciences, 286(1896), 20182900. Hřebíček, L. (1996). Word associations and text. Glottometrika, 15, 96–101. Joos, M. (1936). Review of G. K. Zipf. The psycho-biology of language. Language, 12, 196–210. Köhler, R. (1986). Zur linguistischen Synergetik. Struktur und dynamik der lexik. Brockmeyer. Köhler, R. (1987). Systems theoretical linguistics. Theoretical Linguistics, 14, 241 K 257. Köhler, R. (1989). Linguistische Analyseebenen, Hierarchisierung und Erklärung im Modell der sprachlichen Selbstregulation. In L. Hřebíček (Ed.), Glottometrika (Vol. 11, p. 18). Brockmeyer. Köhler, R. (1990). Elemente der synergetischen Linguistik. In R. Hammerl (Ed.), Glottometrika (Vol. 12, pp. 179–188). Brockmeyer. Köhler, R. (1991). Diversification of coding methods in grammar. In U. Rothe (Ed.), Diversification processes in language: Grammar (pp. 47–55). Rottman. Köhler, R. (2012). Quantitative syntax analysis. De Gruyter Mouton. Köhler, R., & Altmann, G. (1996). “Language Forces” and synergetic modelling of language phenomena. In P. Schmidt (Ed.), Glottometrika 15 (pp. 62–76). Wissenschaftlicher Verlag Trier. Lin, Y., & Liu, H. (2018). Methods and trends in language research in the big-data era. Journal of Xinjiang Normal University (Philosophy and Social Sciences), 1(39), 72–83. (In Chinese). Liu, H. (2014). Language is more a human-driven system than a semiotic system. Comment on modelling language evolution: Examples and predictions. Physics of Life Reviews, 11, 309–310. Liu, H. (2018). Language as a human-driven complex adaptive system. Comment on “Rethinking foundations of language from a multidisciplinary perspective”. Physics of Life Reviews, 26(27), 149–151. Martinet, A. (1955). Économie des Changements Phonetiques. Maisonneuve & Larose. Ord, J. K. (1972). Families of frequency distributions. Hafner Publishing Company. Semple, S., Hsu, M. J., & Agoramoorthy, G. (2010). Efficiency of coding in macaque vocal communication. Biological Letters, 6, 469–471. https://doi.org/10.1098/rsbl.2009.1062 Wang, Y., & Liu, H. (2022). Revisiting Zipf’s law: A new indicator of lexical diversity. In M. Yamazaki, H. Sanada, R. Köhler, S. Embleton, R. Vulanović, & E. S. Wheeler (Eds.), Quantitative approaches to universality and individuality in language (pp. 193–202). De Gruyter Mouton. https://doi.org/10.1515/9783110763560 Xu, C., & Liu, H. (2022). The role of working memory in shaping syntactic dependency structures. In J. Schwieter & E. Wen (Eds.), The Cambridge handbook of working memory and language (pp. 343–367). Cambridge University Press.

Diversification Processes and Least Effort Principle 45 Yu, S. (2018). Linguistic interpretation of Zipf’s law. In H. Liu (Ed.), Advances in quantitative linguistics (pp. 1–25). Zhejiang University Press. (In Chinese). Yu, S., Liang, J., & Liu, H.. (2016). Existence of hierarchies and human’s pursuit of top hierarchy lead to power law. http://arxiv.org/abs/1609.07680 Yu, S., Xu, C., & Liu, H. (2018). Zipf’s law in 50 languages: Its structural pattern, linguistic interpretation, and cognitive motivation. https://arxiv.org/abs/1807.01855 Zipf, G. K. (1932). Selected studies of the principle of relative frequency in language. Harvard University Press. Zipf, G. K. (1933). Selected studies of the principle of relative frequency in language. Language, 9(1), 89–92. Zipf, G. K. (1935). The psycho-biology of language. An introduction to dynamic philology. Houghton Mifflin. Zipf, G. K. (1949). Human behavior and the principle of least effort. Addison-Wesley.

4

Research Materials and Methods

This chapter covers the research materials and methods. It is structured as follows. The next two sections introduce the research materials: A syntactic dependency treebank (Section 4.2) and a discourse dependency treebank (Section 4.3); Section 4.4 defines research objects in the present study; the following section concerns two unique linguistic units: Motifs and sequencings. Section 4.6 winds up this chapter with a summary. First, we address the treebanks at the two levels. We will start from the syntactic dependency treebank. 4.1  From graph theory to treebanks As a branch of mathematics, graph theory (Bondy & Murty, 2008; Chartrand, 1985; Harary, 1969; Tutte, 2001) takes graphs as its object of study. Diagrams in graph theory use points to represent things and connecting lines to represent relationships between things. A tree is a type of graph. A connected acyclic graph is a tree. The vertices of a graph are the nodes, and the branches in a tree graph are the edges. To facilitate problem solving, a particular vertex of the tree can be treated as the root of the tree, and generally, any node of a tree can be treated as a root. It can be imagined that the tree starts branching from this point and the process continues, with nodes that can no longer branch on the way becoming leaves. In other words, the vertices in the tree with a “degree” of 1 (i.e., associated with only one other node) are called leaves. Vertices with a degree greater than 1 are called branch points. A tree with a fixed root is a rooted tree. The relationship between a rooted tree and a fixed root is called tree order, and this order can be characterised by “height”. A vertex with distance k to the root has height k and is at level k of the tree. If the graph is directed, the degrees can be distinguished as out-degrees, which represent the number of arcs starting at that node, and in-degrees, representing the number of arcs ending at that node. Following the above definition, a syntactic structure tree is a tree in graph theory. In a syntactic tree based on phrase structure, all words are leaves. In a syntactic tree based on dependency relations, the leaves are all subordinates and cannot DOI: 10.4324/9781003436874-4

Research Materials and Methods 47 govern other words. For example, in the dependency structure of the sentence “The student has an interesting book” (Figure 1.2), the in-degree of “book” is 1 (hasbook) and the out-degree is 2 (an-book, interesting-book). In fact, in the syntactic dependency trees, all nodes other than the root have an in-degree of 1. The root node bears an in-degree of 0. This means that each node has only one governor/ head; in other words, the dependency relationship is a one-to-one correspondence. The out-degree can be varied, that is, the number of subordinate words that these nodes can take is changeable, which is what we will define later as “valence”. Tree structures can bridge various levels of processing (Marcus, 2001; van der Velde & de Kamps, 2006). Tree diagrams are used in a wide variety of fields, and Bod (2005) suggests that graphs can model higher perception as well as cognition and can provide a unified cognitive model for the three modalities of language, music and problem-solving. Trees provide represent grouping structures for these fields, and thus tree structures can be seen ubiquitously in many fields. One of the fundamental properties of language is its hierarchical nature. The existence of hierarchies in language or other human phenomena is also highlighted in the “Hierarchical Selection Model” proposed by Yu, Liang and Liu (Yu, 2018; Yu et al., 2016). In the case of language, for example, a small number of preferred linguistic entities are used more frequently, forming the first segment of the Zipf correlation curve, which is the emergence of a hierarchy, while a large number of other linguistic entities are of a lower hierarchy, forming the long tail of the Zipf curve, which statistically results in a regular power-law distribution of related phenomena. This hierarchical nature is particularly evident in syntax (Feng, 2011; Liu, 2017; Liu & Jing, 2016; Yadav et al., 2020). The hierarchical nature of sentences is generally represented as a tree diagram. Lai (1924/2007, p. 4) suggests that diagramming helps language learning, “drawing a diagram to analyse a sentence, either a main clause or a subordinate one, makes the relationship clear with the position and function of nodes clearly represented”. Hudson (1998) also indicates how a person draws a tree can reflects how he understands the sentence. Liu and Jing (2016, p. 2) further point out that depicting hierarchical sentence structure is an important aspect of language studies. Such a method bears practical value in language teaching, and is related to the inquiry into the developmental laws of language itself and human cognition in general. Tesnière (1959, p. 16) argues that the search for transfer rules between a twodimensional hierarchical order and a one-dimensional linear order is the main task of syntactic research, with the former order outweighing the latter. Tesnière proposes an important concept—connexion (connection), recognising it as the basis of the entire structural syntax (Tesnière, 1959, p. 12). He uses the French sentence “Alfred parle” (Alfred says.) to exemplify that it is the connexion that unites the two words into a whole that is different from the original individual words. This connexion is similar to the chemical reaction between chlorine and sodium, which creates sodium chloride, a new substance different from both the original chlorine and sodium. 
Connexions are indispensable for the expression of ideas; “without connexions we wouldn’t be able to express any continuous thought

48  Research Materials and Methods and would only be able to utter a succession of unrelated images and ideas isolated from each other” (Tesnière, 1959, p. 12). Feng (2013, p. 430) argues that sentences have a rigorous organisation, with a breath of life, which is given by connexions and is therefore the “lifeblood” of sentences. Tesnière (1959) proposes that a connexion establishes in a sentence a relationship between two words with one subordinate to another, and connexions therefore obey the principle of hierarchy. Such connexions can be expressed in terms of a stemma where governors are placed above their subordinates. Tesnière’s concept of connexions is what we now termed as “dependency”. As mentioned above, syntactic or semantic structures are often represented as trees, hence the concept of treebanks as introduced by linguist Geoffrey Leech in the 1980s (e.g., Garside et al., 1987). In linguistics, a treebank is a corpus annotated with semantic or syntactic structures of sentences. Before the 1990s, system developers in computational linguistics used rulebased syntactic-semantic analysis with limited success, and were not yet able to deal with large-scale real texts. The construction of annotated corpora in the 1990s revolutionised computational linguistics: Treebanks could be used as a source of knowledge to obtain syntactic structures and to evaluate syntactic analysis results. The construction of treebanks allowed computational linguistics to begin a shift from introspection to corpus and a strategic shift from rules to statistics. Despite its inception in computational linguistics, such large-scale empirical data have since then benefited the study of linguistics in general and thus gained increasing attention from researchers (Abeillé, 2003), and corpora have become useful tools in linguistic research (Hinrichs & Kübler, 2005; Hudson, 2007; Nivre, 2005). Liu (2009a) maintains that the resources of treebanks contribute to theoretical linguistic research. For instance, treebanks constructed using real corpora contain a large amount of information on syntactic distribution and greatly facilitates syntactic analysis. Many of the previously mentioned studies were carried out on the basis of syntactic dependency treebanks. For example, among many others, Gao et al. (Gao, 2010; Gao et al., 2010) carry out studies on the syntactic functions of Chinese nouns and verbs on the basis of Chinese dependency treebanks; Liu (2007) conducts a study on the probability distribution of dependency relations in 20 languages; Liu and Cong (2014) investigate complex networks in Chinese. In recent years Chinese scholars have carried out an increasing number of quantitative studies based on dependency treebanks (e.g., Chen & Gerdes, 2022; Chen et al., 2022; Fang & Liu, 2018; Jiang & Jiang, 2020; Jiang & Liu, 2015, p. 2018; Jing & Liu, 2017; Jiang & Ouyang, 2017; Lei & Wen, 2020; Liang, 2017; Liu & Chen, 2017; Liu et al., 2017; Ouyang et al., 2022; Xu & Liu, 2022; Zhou, 2021). This study will utilise two large treebanks of English (one at the syntactic level and one at the discourse level), both based on the same source material. The aim is to understand the distribution of relevant elements at the two levels, to establish the linguistic significance of these linguistic units/attributes, and to carry out a comparative study between these two levels. We present these two corpora in the next two sections.

Research Materials and Methods 49 4.2  A syntactic dependency treebank To make research materials comparable, we need to choose a syntactic tree bank and a discourse one from the same source material, which quite narrows down our choice. We luckily find the entire WSJ section of Penn Treebank from the Linguistic Data Consortium (1999) a rare suitable option. From the Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) (Hajič et al., 2012), we choose the English part as our corpus for syntactic relations. This part contains the entire WSJ section of Penn Treebank from the Linguistic Data Consortium (1999) (LDC99T42). It contains 2499 stories, or 1.174 million words. LDC99T42 is converted into dependency trees from phrase-structure ones. The resulting corpus was manually rechecked and it follows the annotation format of PDT 2.0. The annotations meet the basic projection requirements (Uresova et al., 2016). Early in the 1930s, Zipf (1933) proposes the principle of “relative frequency” in language, arguing that it’s neither necessary nor possible to examine the population of all language entities. After repeated experiments, if a certain percentage is found to stand for the probability of the whole population, it is said to reach the Zipfian size. That indicates we only need to study the sample with a Zipfian size rather than the whole population. To minimise influences from various sizes and find the suitable Zipfian size of the corpus, we divide the PCEDT 2.0 into 23 sub-corpora, each with about 50 thousand words, and then randomly choose 6 sub-corpora. The exclusion of punctuation marks, whose roles are assigned to their head words, enables us to concentrate on word-word dependency. The resulting sub-corpora are E1–E6, each bearing a size of 43,200 plus words (Table 4.1). If we can prove them to be homogeneous, we can claim that the sub-corpora are of Zipfian size and that the regularities found in them can reflect regularities of journalistic English. Tables 4.2 and 4.3 present the annotations of POS (parts of speech) and syntactic functions/dependency relations. They basically follow an order of decreasing frequencies. Dealing with the syntactic corpora in the previous fashion bears the following advantages: 1 Both the redundancy of the big data and the robustness of the approach are reduced (Liu & Jing, 2016). 2 If the sub-corpora are found to be homogeneous, we can prove that the features found in them can represent the features of journalistic English in general. 3 Some features common to all the sub-corpora will automatically emerge. These features may not be conspicuous when there is only one corpus. Table 4.1  Sizes of sub-corpora of syntactic relations

words

E1

E2

E3

E4

E5

E6

43,264

43,276

43,264

43,253

43,257

43,260

Source: Zhang, H. (2018). A Chinese-English Synergetic Syntactic Model Based on Dependency Relations. Doctoral dissertation, Zhejiang Univiersity. (In Chinese).

50  Research Materials and Methods Table 4.2  Tags of POS used in the syntactic sub-corpora POS

Explanation

POS

Explanation

nn in nnp dt nns jj vbd cd cc vb rb vbn vbz to vbg prp vbp md pos

Noun, singular or mass Preposition or subordinating conjunction Proper noun, singular Determiner Noun, plural Adjective Verb, past tense Cardinal number Coordinating conjunction Verb, base form Adverb Verb, past participle Verb, 3rd person singular present to Verb, gerund or present participle Personal pronoun Verb, non-3rd person singular present Modal Possessive ending

prp$ wdt jjr nnps rp wp wrb jjs rbr ex pdt rbs fw wp$ uh ls sym others

Possessive pronoun Wh-determiner Adjective, comparative Proper noun, plural Particle Wh-pronoun Wh-adverb Adjective, superlative Adverb, comparative Existential there Predeterminer Adverb, superlative Foreign word Possessive wh-pronoun Interjection List item marker symbols others

Having introduced the syntactic dependency treebank, we resume with discourse dependency treebank. 4.3  A discourse dependency treebank In this section of the study, we first of all select an appropriate RST corpus and then resume with the conversion of RST trees into new dependency ones with all nodes Table 4.3  Tags of dependency relations used in the syntactic sub-corpora Functions1

Explanation

1 2 3 4 5 6 7 8

atr adv auxp sb auxa obj nr auxv

9 10 11 12 13 14

coord auxc pnom pred neg exd

Attribute Adverbial Preposition, parts of a secondary (multiword) preposition Subject Articles a, an, the Object Unidentified Auxiliary verbs be, have and do, the infinitive particle to, particles in phrasal verbs Coordination node Subordinating conjunction, subordinator Nominal part of a copular predicate Predicate. A node not depending on other nodes. Negation expressed by the negation particle not, n’t A technical value signalling a “missing” (elliptical) governing node; also for the main element of a sentence without predicate (stands for "externally dependent")

Rank

Research Materials and Methods  51 representing elementary discourse units (EDUs)of clauses, of sentences and then of paragraphs, respectively. 4.3.1  Selecting an RST corpus

Among many RST corpora (e.g., Stede, 2004; Taboada, 2004, 2008; Yue, 2006), the RST Discourse Treebank (RST-DT) (Carlson et al., 2002, 2003) is recognised vigorous and organised efforts to consistently to collect RST analyses. The RSTDT boasts a size of 176,383 words, containing 385 news stories, hard and soft alike, from the WSJ and covering subjects like news, finance and arts. Figure 1.3 (re-presented here in Figure 4.1) visually presents a typical RST tree, where the EDUs are clauses, but they are not necessarily independent ones.

Figure 4.1  A typical RST tree representation (WSJ-0642)

52  Research Materials and Methods Carlson et al. (2003) offer a complete overview of the RST-DT. With a fairly homogeneous annotation of rhetorical relations, this corpus has been the choice for many researches (e.g., Demberg et al., 2019; duVerle & Prendinger, 2009; Feng & Hirst, 2012; Hernault et al., 2010; Liu & Zeldes, 2019; Sporleder & Lascarides, 2004; Stede, 2012; Williams & Power, 2008, p. 2019), which have generated a wide range of publications concerning RST analysis. Hence it is our choice in this study. But RST trees like Figure 4.1 indicate more resemblance to phrase-structure syntactic trees than dependency syntactic trees. To make the dependency trees at the two linguistic levels more comparable, we need to transform this traditional type of RST trees to new ones with merely ultimate nodes. This will be addressed in the next part. 4.3.2  RST tree conversion

Traditional RST trees are very much like constituent syntactic structures, as both types of trees are basically made up of binary structures. Figure 4.2a presents an equivalent representation of Figure 4.1. In constituency relations like Figure 4.2b, syntactic functions join the nodes, which are either terminal nodes of words or bigger spans (e.g., S, NP2). In Figure 4.2a, terminal clause nodes and bigger spans are connected with discourse relations. To examine RST relations holding between same-level units, we need to reframe the trees into ones with only terminal nodes. The resemblance previously discussed sheds light on the re-composition of traditional RST trees: we can borrow the algorithm for transforming a syntactic constituent tree into a dependency one (Liu, 2009a, 2009b).

Figure 4.2 An analogy between phrase-structure syntactic trees and RST trees. (a) An equivalent representation of Figure 4.1. (b) A constituent syntactic tree Source: Zhang & Liu, 2016a


Figure 4.3 Transforming a constituent structure into a dependency structure. (a) Constituency relation. (b) Dependency relation Source: Zhang & Liu, 2016a

In Figure 4.3a, the more prominent words in phrases are promoted as the heads: for instance, nouns in noun phrases (NPs) (Steps 1 and 5) and verbs in verb phrases (VPs) (Steps 2 and 3) are promoted. Finally, in the sentence (S), which is composed of a noun phrase (NP1) and a verb phrase (VP1), the verb ("may") is promoted as the root of the whole sentence, as VP1 is more significant than NP1 (Step 4). These steps generate Figure 4.3b, a dependency tree. We can now perform the conversion of RST trees following the same algorithm. We illustrate with Figure 4.4 how to recursively promote each clausal nucleus to be the head text span along the original relational vein. Step 1: promote Node 4 as the head of 3–4. Steps 2–5: promote Node 2 (the highlighted one) to be the head of 2–4, then, along the vein, the head of 1–4, 1–5 and 1–6.

Figure 4.4 Converting an RST tree (Figure 4.2a). (left) Steps of conversion. (right) The converted tree

Step 6: Node 7 is promoted to be the head of 7–8. Steps 7 and 8: Node 10 (the red one) is raised to be the top of 7–10. Step 9: Node 2 is promoted as the root node. Finally, we get the converted tree (Figure 4.4, right panel), a four-layer discourse dependency tree, where the left box stands for Paragraph 1 and the right box for Paragraph 2. With the texts filled in, this tree becomes Figure 4.5, which contains only terminal clause nodes. Figure 4.5 can also be represented horizontally, as in Figure 4.6.

It is the compositionality criterion of RST (Marcu, 2000) that enables such a conversion. The criterion states that a rhetorical relation obtaining between two spans also holds between their most salient units. As Node 2 is the most significant part of Span 1–6, and Node 10 is the most salient one of Span 7–10, we deduce that Node 10 is the Elaboration-additional of Node 2. With this logic, we still get Figure 4.5 as the final result, which clearly presents relational, linear and hierarchical features. Having converted traditional RST trees into ones with only clause nodes, we can proceed to trees with only sentence nodes and then paragraph nodes. This concerns the granularity of terminal nodes, which will be addressed in the following sub-section.
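The recursive promotion just described is straightforward to operationalise. The following Python fragment is only a minimal sketch of the idea, not the conversion script actually used for the treebank: it assumes a binarised RST tree whose internal nodes record which child is the nucleus, handles mononuclear relations only (multi-nuclear structures, discussed later in this chapter, are left out), and the class, field and function names are invented for the illustration.

```python
from dataclasses import dataclass
from typing import Optional, Dict, Tuple

@dataclass
class Node:
    """A node of a (binarised) RST tree: leaves carry an EDU id,
    internal nodes carry two children, a relation label and the nucleus side."""
    edu: Optional[int] = None          # set for terminal EDUs only
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    relation: str = ""
    nucleus: str = "left"              # "left" or "right"

def to_dependencies(node: Node, deps: Dict[int, Tuple[int, str]]) -> int:
    """Return the head EDU of `node` and collect dependency arcs.

    Following the compositionality criterion, the head of a span is the head
    of its nucleus; the head of the satellite span is attached to it with the
    span's relation label.
    """
    if node.edu is not None:                      # a terminal EDU
        return node.edu
    left_head = to_dependencies(node.left, deps)
    right_head = to_dependencies(node.right, deps)
    if node.nucleus == "left":
        head, dependent = left_head, right_head
    else:
        head, dependent = right_head, left_head
    deps[dependent] = (head, node.relation)       # dependent -> (governor, relation)
    return head

# Tiny usage example (hypothetical spans, not the WSJ-0642 tree):
tree = Node(relation="elaboration-additional", nucleus="left",
            left=Node(edu=1),
            right=Node(relation="attribution", nucleus="right",
                       left=Node(edu=2), right=Node(edu=3)))
arcs: Dict[int, Tuple[int, str]] = {}
root = to_dependencies(tree, arcs)
print(root, arcs)
# 1 {2: (3, 'attribution'), 3: (1, 'elaboration-additional')}
```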

4.3.3  Granularity of RST terminal nodes

Structural units can be arranged along various dimensions, one of which is unit size (Hovy & Maier, 1992). The size can range from morphophonemic units to full discourse length. These units of various sizes are intrinsically and closely connected: prosodic units can form words, words can work together to form clauses, then sentences, paragraphs (or sections and chapters), discourse or even beyond. Along this vein, all the units can be closely examined and the interfaces between different sizes can be mapped. As a matter of fact, various unit sizes of discourse structure have been extensively examined; we illustrate with some as follows. Some researchers examine prosodic differences (the unit size being intonational patterning) (e.g., Ayers, 1994; Cho, 1996; den Ouden, 2004; den Ouden et al., 2002; Hirschberg & Litman, 1987; Kong, 2004; Kowtko, 1997; Nakatani et al., 1995; Noordman et al., 1999; Pierrehumbert & Hirschberg, 1987; Venditti, 2000; Venditti & Hirschberg, 2003; Venditti & Swerts, 1996; Wichmann, 2014). Some investigate subclausal or pronominalisation aspects (e.g., Ariel, 1988; Azar et al., 2019; Björklund & Virtanen, 1991; Carminati, 2002; Coopmans & Cohn, 2022; Gernsbacher, 2013; Gordon et al., 1993; Hwang, 2022; Karmiloff-Smith, 1985, 1994; Levy, 1984; Marslen-Wilson et al., 1982; Passoneau, 1991; Yoon et al., 2021). Some studies examine intra-sentence relations (e.g., Gupta et al., 2016; Kambhatla, 2004; Soricut & Marcu, 2003; Vu et al., 2016; Zhang et al., 2006, 2018)


Figure 4.5 Converting Figure 4.1 into a tree with only terminal clause nodes (E- = elaboration-)

or inter-sentence ones (e.g., Gupta et al., 2019; Miller & McKean, 1964; Noriega-Atala et al., 2021; Peng et al., 2017; Quirk & Poon, 2017; Sahu et al., 2019; Song et al., 2018; Verga et al., 2018; Zheng et al., 2016) or both (e.g., Swampillai & Stevenson, 2011).


Figure 4.6 Representing Figure 4.5 horizontally (E- = elaboration-, EGS = elaboration-general-specific)

Further upward, there are, for instance, macrostructures (Kintsch & van Dijk, 1975; Paltridge et al., 2012; van Dijk, 1980, 1982, 1983, 1985, 2019a, 2019b; Williams, 1984; Williams et al., 1984; Ye, 2019) or discourse schemata (e.g., Anderson et al., 1978; d'Andrade, 1984, 1990; Bruner & Bruner, 1990; Cook, 1990; Hidi & Hildyard, 1983; Machin & van Leeuwen, 2003; Malcolm & Sharifian, 2002; Nelson, 1981, 1986; Nwogu, 1989; Rumelhart, 1978; Spiro & Tirre, 1980; Strauss & Quinn, 1993; Winograd, 1977). We can go even further. Nearly half a century ago, Hobbs (1978) proposed that relations between linguistic units are the basic components of discourse structure. Such units can be clauses, sentences, groups of sentences and even the discourse per se. For instance, Radev (2000) proposes cross-document structure so that multi-document summarisation can be done. The recently popular language model ChatGPT (Chat Generative Pre-trained Transformer) benefits greatly from such cross-document structures. It is perhaps a pity that this book does not refer to any dialogue with ChatGPT, as I have remained on its waiting list.

After this broad discussion of unit sizes, we narrow down to RST unit sizes. Despite different and competing hypotheses concerning what can constitute EDUs, it is unanimously agreed in the field that EDUs are "non-overlapping spans of text" (Carlson & Marcu, 2001, p. 2). Taboada and Mann (2006) expect that larger units, including orthographic paragraphs or even sub-sections, can function well, but such attempts seem to fail to meet expectations (Marcu et al., 2000). In early studies of RST, to avoid circularity, researchers basically used independent clauses as the unit of analysis, which yielded the common misunderstanding that the use of clauses is a fixed feature (Taboada & Mann, 2006). Actually, since the very beginning of RST, the unit size for RST analysis has been somewhat arbitrary, whether clauses, sentences, paragraphs or chapters (Mann & Thompson, 1988). Also, innovation is encouraged (Taboada & Mann, 2006): the definition of units of RST analysis should depend on the research goals, and Taboada and Mann (2006, p. 437) maintain that if the taxonomy is observable, researchers can employ a unit division method suitable for their research.

Following the same compositionality criterion (Marcu, 2000), we can continue to convert the trees into ones with sentence nodes. Figure 4.7 presents the resulting graph.


Figure 4.7  Converting the sample RST tree into sentence nodes

Similarly, we can construct trees with only paragraph nodes (Figure 4.8). For the sample text, Paragraph 2 is the Elaboration-additional for Paragraph 1. Somewhat different from syntactic analysis, which generally goes without multi-nuclear structures, such multi-nuclear structures are quite abundant in RST trees. Where there are multiple nuclei, we separately examine every pair of nucleus-satellite relations.

4.4  Defining research objects

In this section, we introduce the concrete objects of investigation in this study. Research objects at the syntactic level will first be introduced, followed by those at the discourse level.

4.4.1  Defining linguistic units/properties

In the field of science, units or the properties of units are concepts, which do not automatically exist but, rather, are defined (Altmann, 1993, 1996). Linguistic units/properties exist by virtue of a certain theoretical framework.


Figure 4.8  Converting the sample RST tree into one with mere paragraph nodes

For instance, in phrase-structure grammars there are non-elementary phrases, which do not exist in dependency grammars. The definition of concepts is conventional; that is, researchers will choose the most suitable way to define concepts given a certain theoretical framework and their research objectives (Altmann, 1996). But there is a significant prerequisite: concepts must be defined clearly and then operationalised so that they can be used on observable facts (Bunge, 2007). We illustrate with syntax, where valency (the number of dependents), POS and syntactic functions (or dependency relations) have been defined as linguistic units/properties. Most prior studies concerning their distributions address only one of them, for instance POS (e.g., Best, 1994; Tuzzi et al., 2009; Ziegler, 1998, 2001) or syntactic functions (or dependency relations) (e.g., Köhler, 1999, 2012; Liu, 2009a). Some studies examine combinations of single elements, like word orders (e.g., SVO, OSV, VOS), which are related to the POS of the sentence structure (e.g., Ferrer-i-Cancho, 2015). Another instance is the sentence structure of German verbs (Köhler, 2012). In Köhler's study, verbs are not listed in the structures. For example, "Snsa", a structure where a verb requires a nominative-case subject and an accusative-case object, fails to mark the position of the verb governor. Basically, these studies find that such linguistic units/properties abide by certain Zipf's law-related distribution patterns and are thus results of diversification processes. We have explained in Chapter 3 what Zipf's law and diversification processes are.

To the best of our knowledge, studies of combinations of various units/properties are rare. Hajič et al. (2012) examine the combination of POS and syntactic function. For instance, "n:adv" stands for a nominal structure at the position of an adverbial, like "Last year" in "Last year we met in Prague". Then a question arises: What about more similar types of combinations with two or more linguistic units/properties which are combined in a meaningful way? Will they be a new type of linguistic entity? If so, will their distributions follow certain linguistic laws? If they do, are these distribution patterns related to the distribution patterns of the single elements?

In this research, we will try a combination of POS and syntactic function, but in a way different from Hajič et al.'s (2012) study. At the syntactic level, a complete dependency structure has three basic elements: a dependent, a governor and their relation. For instance, in the structure "interesting books", the adjective "interesting" serves as an attribute of the noun "books". If we use POS to abstract both the dependent and the governor, we get "Adj+N = Attribute". Here, "+" indicates modification or restriction, and "=" represents the relation between the dependent and the governor. So we can use "dependent (POS) + governor (POS) = syntactic function/dependency relation" to stand for general dependency structures. In this abstract structure, the POS of the dependent is put before that of the governor, ignoring their linear sequence or the dependency direction. This structure has three elements, from which we can take into consideration every single element or their combination, thereby generating seven different types of combination of elements, namely:

Combination 1: Dependent (POS);
Combination 2: Governor (POS);
Combination 3: Dependency relation;
Combination 4: The pair "dependent (POS)+governor (POS)";
Combination 5: "dependent (POS)+[ ] = dependency relation/syntactic function", where certain types of dependents can serve certain syntactic functions;
Combination 6: "[ ]+governor (POS) = dependency relation/syntactic function", where governors need to bear some dependents and make them fulfil certain functions; and finally
Combination 7: The complete dependency structure "dependent (POS)+governor (POS) = dependency relation/syntactic function".
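To make the seven combinations concrete, the short Python sketch below derives all of them for a single token from its POS, its governor's POS and the dependency relation, following the enumeration above; the function name is purely illustrative. Applied to the first row of Table 4.4 ("Pierre" depending on "Vinken" as an attribute), it yields the strings listed there.

```python
def combinations(dep_pos: str, gov_pos: str, relation: str) -> dict:
    """Derive the seven combinations for one complete dependency structure."""
    return {
        "c1": dep_pos,                            # Combination 1: dependent POS
        "c2": gov_pos,                            # Combination 2: governor POS
        "c3": relation,                           # Combination 3: dependency relation
        "c4": f"{dep_pos}+{gov_pos}",             # Combination 4: POS pair
        "c5": f"{dep_pos}+[ ]={relation}",        # Combination 5: dependent side fixed
        "c6": f"[ ]+{gov_pos}={relation}",        # Combination 6: governor side fixed
        "c7": f"{dep_pos}+{gov_pos}={relation}",  # Combination 7: complete structure
    }

# "Pierre" (nn) depends on "Vinken" (nn) as an attribute (atr):
print(combinations("nn", "nn", "atr"))
# {'c1': 'nn', 'c2': 'nn', 'c3': 'atr', 'c4': 'nn+nn',
#  'c5': 'nn+[ ]=atr', 'c6': '[ ]+nn=atr', 'c7': 'nn+nn=atr'}
```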

4.4.2  Research objects at the syntactic level

At the syntactic level, we focus on the dependency structures (and the various combinations of them as previously defined), dependency distances (DDs) and valencies. We illustrate the various combinations of dependency structures with Figure 4.9, the dependency tree of the "Pierre" sample sentence. For each node of the sample sentence, the seven different combinations of the structure are represented in Table 4.4.


Figure 4.9  The dependency tree of the “Pierre” sample sentence Source: Zhang & Liu, 2017b

Valency has been an integral part of dependency grammar. As one of the fundamental features of words, dependency is actually the realisation of valency in syntax. Liu and Feng (Liu, 2007, 2009a; Liu & Feng, 2007) propose the Generalised Valency Pattern (GVP), which captures the ability both for a word or word class to be a governor (governing other words) and for it to be a dependent (governed by another word or word class) (Figure 4.10). In this way, words can be combined with other words and become bigger linguistic units. Valency denotes such a potential. In Figure 4.10, W represents a word or a word class. The C-initial parts, or complements, are required for the completion or specification of W's meaning. The A-initial parts, or adjuncts, restrict or explain W. Above the word (class) is G, the governor of W. From the graph we can see that in the GVP a word can carry several dependents, or have multiple valencies, but it can be governed by only one governor. The former direction indicates the active valency of words and the latter, passive valency. The occurrence of W in authentic texts opens several slots for other words (or word classes) to fill in. That is to say, the valency potential turns into specific slots, and it might be possible to predict the types and numbers of arguments. Simultaneously, the occurrence of W in authentic texts can help reveal whether W can meet the needs of another word or word class as its dependent. Whether these two types of binding will take place depends on whether syntactic, semantic and pragmatic requirements are fulfilled.

Table 4.4  The seven combinations of dependency structures in the "Pierre" sample sentence (c = combination)

#   Dependent      c1   Governor   c2   c3     c4      c5             c6             c7
1   Pierre         nn   Vinken     nn   atr    nn+nn   [ ]+nn=atr     nn+[ ]=atr     nn+nn=atr
2   Vinken         nn   will       md   sb     nn+md   [ ]+md=sb      nn+[ ]=sb      nn+md=sb
3   61             cd   years      nn   atr    cd+nn   [ ]+nn=atr     cd+[ ]=atr     cd+nn=atr
4   years          nn   old        jj   nr     nn+jj   [ ]+jj=nr      nn+[ ]=nr      nn+jj=nr
5   old            jj   Vinken     nn   atr    jj+nn   [ ]+nn=atr     jj+[ ]=atr     jj+nn=atr
6   will           md   /          /    root   /       /              /              /
7   join           vb   will       md   pred   vb+md   [ ]+md=pred    vb+[ ]=pred    vb+md=pred
8   the            dt   board      nn   auxa   dt+nn   [ ]+nn=auxa    dt+[ ]=auxa    dt+nn=auxa
9   board          nn   join       vb   obj    nn+vb   [ ]+vb=obj     nn+[ ]=obj     nn+vb=obj
10  as             in   join       vb   auxp   in+vb   [ ]+vb=auxp    in+[ ]=auxp    in+vb=auxp
11  a              dt   director   nn   auxa   dt+nn   [ ]+nn=auxa    dt+[ ]=auxa    dt+nn=auxa
12  nonexecutive   jj   director   nn   atr    jj+nn   [ ]+nn=atr     jj+[ ]=atr     jj+nn=atr
13  director       nn   as         in   adv    nn+in   [ ]+in=adv     nn+[ ]=adv     nn+in=adv
14  Nov.           nn   join       vb   obj    nn+vb   [ ]+vb=obj     nn+[ ]=obj     nn+vb=obj
15  29             cd   Nov.       nn   atr    cd+nn   [ ]+nn=atr     cd+[ ]=atr     cd+nn=atr


Figure 4.10  GVP (generalised valency pattern) Source: Adapted from Liu, 2009a, p. 68.

Valency, or the number of dependents a node has, is illustrated with the sample sentence "Pierre Vinken, 61 years old will join the board as a nonexecutive director Nov. 29" (Table 4.5). For instance, both "Pierre" and "old" are dependent on "Vinken"; thus "Vinken" occurs in the pattern "Pierre+Vinken (dependent)+old", with a valency of 2. When a node (e.g., "61" and "the") is a leaf node without any other node directly below it, we define it as having a valency of 0. Such a pattern is arranged in a linear fashion, and the node before the bracket is the governor in it.

Table 4.5  Valency patterns and valencies for the "Pierre" sample sentence

Position   Node           Valency pattern             Valency
1          Pierre         Pierre                      0
2          Vinken         Pierre+Vinken◎+old          2
3          61             61                          0
4          years          61+years◎                   1
5          old            years+old◎                  1
6          will           Vinken+will◎+join           2
7          join           join◎+board+as+Nov.         3
8          the            the                         0
9          board          the+board◎                  1
10         as             as◎+director                1
11         a              a                           0
12         nonexecutive   nonexecutive                0
13         director       a+nonexecutive+director◎    2
14         Nov.           Nov.◎+29                    1
15         29             29                          0

Source: Zhang & Liu, 2017b

Table 4.6  Dependency direction, DD and ADD of the "Pierre" sample sentence

#    Dependent      Governor   Governor position (#)   Dependency direction   DD             |DD|
1    Pierre         Vinken     2                       +                      2 − 1 = 1      1
2    Vinken         will       6                       +                      6 − 2 = 4      4
3    61             years      4                       +                      4 − 3 = 1      1
4    years          old        5                       +                      5 − 4 = 1      1
5    old            Vinken     2                       −                      2 − 5 = −3     3
6    will           /          /                       /                      /              /
7    join           will       6                       −                      6 − 7 = −1     1
8    the            board      9                       +                      9 − 8 = 1      1
9    board          join       7                       −                      7 − 9 = −2     2
10   as             join       7                       −                      7 − 10 = −3    3
11   a              director   13                      +                      13 − 11 = 2    2
12   nonexecutive   director   13                      +                      13 − 12 = 1    1
13   director       as         10                      −                      10 − 13 = −3   3
14   Nov.           join       7                       −                      7 − 14 = −7    7
15   29             Nov.       14                      −                      14 − 15 = −1   1
Average                                                                       −0.6           2.07

Besides the valency patterns, Table 4.5 also summarises the relevant valencies, where "◎" (corresponding to "(dependent)" in the pattern cited above) indicates that the node before it is the governor of the pattern. Nodes with a valency of 0 carry no dependents, are not governors, and thus go without "◎". In this way we can take into consideration all running words, leaving out no nodes.

Having defined valencies, we resume with DD, which is calculated by subtracting the linear position of the dependent from that of the governor. For instance, the DD between "Pierre" and "Vinken" is 2 − 1 = 1, a plus value indicating the governor-final order. Similarly, the DD between "old" and "Vinken" is 2 − 5 = −3, a minus value indicating the governor-initial order. Table 4.6 summarises the dependency direction, DD and absolute dependency distance (ADD or |DD|) of each word.
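The figures in Tables 4.5 and 4.6 can be reproduced mechanically once every word's governor is known. The following Python sketch is only an illustration of the computation, not the extraction scripts used for the corpora; the governor positions are copied from Table 4.6 (0 marks the root), and, as in Table 4.6, the means are taken over all 15 running words, including the root.

```python
# Governor position of each word in the "Pierre" sentence (0 = root),
# copied from Table 4.6; positions are 1-based.
words = ["Pierre", "Vinken", "61", "years", "old", "will", "join",
         "the", "board", "as", "a", "nonexecutive", "director", "Nov.", "29"]
governors = [2, 6, 4, 5, 2, 0, 6, 9, 7, 7, 13, 13, 10, 7, 14]

n = len(words)
valency = [0] * n                      # number of dependents of each word
dds = []                               # signed dependency distances
for pos, gov in enumerate(governors, start=1):
    if gov == 0:                       # the root has no governor
        continue
    valency[gov - 1] += 1
    dds.append(gov - pos)              # DD = governor position - dependent position

print(valency)                         # "Vinken" (index 1) has valency 2, "join" has 3
print(sum(dds) / n)                    # mean DD  = -9/15 = -0.6  (cf. Table 4.6)
print(sum(abs(d) for d in dds) / n)    # mean ADD = 31/15 ≈ 2.07  (cf. Table 4.6)
```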

Having addressed the research objects at the syntactic level, we resume with those at the discourse level in the next sub-section.

4.4.3  Research objects at the discourse level

In this sub-section, we focus on research objects at the discourse level. First, as it is difficult to assign an individual role to a discourse span, we focus only on discourse relations.

We then borrow the ideas from syntactic dependency trees and obtain discourse valency and discourse DD. The definition of discourse DD follows the one in Sun and Xiong (2019). We illustrate with Figure 4.5 (converting Figure 4.1 into a tree with only terminal clause nodes); to make it easier to follow, it is re-presented here as Figure 4.11.

Figure 4.11  A re-presentation of Figure 4.5

Table 4.7  Discourse valency and distance of Figure 4.11

Node   Dependent on   Valency pattern   Valency   Dependency distance (DD)   Absolute dependency distance (ADD)
1      2              1                 0         2 − 1 = 1                  1
2      root           1-2◎-4-5-6-10     5         0                          0
3      4              3                 0         4 − 3 = 1                  1
4      2              3-4◎              1         2 − 4 = −2                 2
5      2              5                 0         2 − 5 = −3                 3
6      2              6                 0         2 − 6 = −4                 4
7      10             8-7◎              1         10 − 7 = 3                 3
8      7              8                 0         7 − 8 = −1                 1
9      10             9                 0         10 − 9 = 1                 1
10     2              7-9-10◎           2         2 − 10 = −8                8
Average                                 0.9       −1.2                       2.4

Table 4.7 presents the discourse valency and distance of Figure 4.5 (= Figure 4.11). In this excerpt, except for the root, which has a valency of 5, and Node 10, which bears a valency of 2, the rest have a valency of either 1 or 0. The mean valency for the text is 0.9, a little less than 1. Also from Table 4.7, we see more head-initial patterns (five instances) than head-final patterns (four instances). In addition, in the head-initial patterns the dependents are often quite far from their governors (mostly Node 2), with a mean DD of −3.6, while in the head-final patterns the dependents are adjacent or nearly adjacent to their heads, generating a DD of 1 in most instances. That explains why, in this case, the mean DD for the whole text is −1.2, a minus value. When dependency directions are ignored, the mean absolute dependency distance (ADD) is 2.4. Whether similar observations can be made in other trees in the RST-DT remains to be discovered.

Figure 4.12 is the time series plot of discourse valency and DD for the excerpt. Some wave-shaped rhythmic patterns can be seen for both features. For instance, for valency (the solid line), in most cases we see a small-0-small-0 pattern. As for DD (the thin dotted line), in most cases the distance bears a minus value, suggesting that most spans occur after their governors; it is thus understandable that the average value is −1.2. When ignoring the dependency direction and examining the ADD (the thick dotted line), we find that its pattern changes basically in synchronisation with valency.

Having defined the research objects at both the syntactic and discourse levels, we will go on with the definition of two linear linguistic units—motifs and sequencings, with the latter being a new concept in this study.


Figure 4.12  A time series plot of discourse valency and dependency distance for WSJ-0642
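The same computation carries over unchanged to the discourse level. As a quick check on Table 4.7, the Python sketch below recomputes the discourse valencies and distances of the converted WSJ-0642 excerpt from its head assignments (read off Figure 4.11; 0 again marks the root). It is only a verification of the figures quoted above, not the procedure used on the whole treebank.

```python
# Node -> node it depends on (0 = root), read off Table 4.7 / Figure 4.11.
heads = {1: 2, 2: 0, 3: 4, 4: 2, 5: 2, 6: 2, 7: 10, 8: 7, 9: 10, 10: 2}

valency = {node: 0 for node in heads}
dds = []
for node, head in heads.items():
    if head == 0:                          # skip the root
        continue
    valency[head] += 1
    dds.append(head - node)                # DD = governor node - dependent node

n = len(heads)
print(valency[2], valency[10])             # 5 and 2, as in Table 4.7
print(sum(valency.values()) / n)           # mean discourse valency = 0.9
print(sum(dds) / n)                        # mean discourse DD = -1.2
print(sum(abs(d) for d in dds) / n)        # mean discourse ADD = 2.4
```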

4.5  Defining two linear linguistic units

Quantitative linguistics examines units/properties and their interrelations (Buk & Rovenchak, 2008; Chen & Liu, 2016; Tuzzi et al., 2009), but usually bag-of-words models are adopted, as if putting these units and properties in a bag and ignoring their sequential behaviours (Köhler, 1999, 2006; Pawłowski, 1999). Sequential/syntactic behaviour is an important aspect and should be addressed rather than left out (Köhler, 2015; Pawłowski, 1999). Herdan (1966) defines "language in lines" and "language in the masses", the former being linear/sequential linguistic material and the latter, nonlinear material. These two categories of material should be studied with corresponding research methods. In this section, we introduce two special linguistic units, both of which are linear structures: motifs and sequencings.

4.5.1  Motif—a special linguistic unit

Boroda (1982) defined the motif for the study of musical texts, based on the length of the notes in a piece. Köhler (e.g., 2006, 2012, 2015) innovatively introduces this unit into linguistic research to study the rhythm of properties themselves. With the motivation to examine the syntagmatic/sequential dimension of linguistic entities/properties, Köhler (e.g., 2015) defines a new linguistic unit—the motif (originally called sequence or segment, cf. Köhler, 2006). Originating from the motif in music, this new unit is defined in two ways. For numerical properties, it is defined as the longest continuous sequence in which the value does not decrease.

For instance, with the discourse valencies in Table 4.7 (discourse valency and distance of Figure 4.11) being 0, 5, 0, 1, 0, 0, 1, 0, 0, 2, we have the following motifs: 0–5, 0–1, 0–0–1, 0–0–2. The first motif ends with "5" as the next value is "0", smaller than 5. The third motif bears two 0 values, as motifs are defined as series of non-descending values. We can find that all these motifs begin with an element of "0". In this sample, there are two motifs with two elements (0–5 and 0–1, both with a length of 2) and another two motifs with three elements (0–0–1 and 0–0–2, both with a length of 3).

When the research objects are nominal, we have to define motifs otherwise: such a motif is the longest sequence of unrepeated elements. We illustrate with the motifs of POS and of relations/syntactic functions of the "Pierre" sample sentence (Table 4.8). This 15-word sentence is abstracted into six POS motifs and five relation motifs. Where the POS motifs are concerned, all of them start with NN, suggesting the high frequency and importance of nouns.

Table 4.8  Motifs of POS and relations of the "Pierre" sample sentence

#    Dependent      POS   POS motif        Relation   Relation motif
1    Pierre         nn    nn               atr        atr-sb
2    Vinken         nn    nn-cd            sb
3    61             cd                     atr        atr-nr
4    years          nn    nn-jj-md-vb-dt   nr
5    old            jj                     atr        atr-root-pred-auxa-obj-auxp
6    will           md                     root
7    join           vb                     pred
8    the            dt                     auxa
9    board          nn    nn-in-dt-jj      obj
10   as             in                     auxp
11   a              dt                     auxa       auxa-atr-adv-obj
12   nonexecutive   jj                     atr
13   director       nn    nn               adv
14   Nov.           nn    nn-cd            obj
15   29             cd                     atr        atr
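Both motif definitions are easy to operationalise. The Python sketch below is a minimal illustration rather than the segmentation scripts actually used in this study: one function splits a numerical sequence into maximal non-decreasing runs, the other splits a nominal sequence into maximal runs of unrepeated elements. Applied to the discourse valencies of Table 4.7 and to the POS sequence of the "Pierre" sentence, it reproduces the motifs shown above and in Table 4.8.

```python
def numeric_motifs(values):
    """Split a numerical sequence into maximal non-decreasing runs."""
    motifs, current = [], []
    for v in values:
        if current and v < current[-1]:      # a drop closes the current motif
            motifs.append(current)
            current = []
        current.append(v)
    if current:
        motifs.append(current)
    return motifs

def nominal_motifs(labels):
    """Split a nominal sequence into maximal runs of unrepeated elements."""
    motifs, current = [], []
    for x in labels:
        if x in current:                     # a repetition closes the current motif
            motifs.append(current)
            current = []
        current.append(x)
    if current:
        motifs.append(current)
    return motifs

print(numeric_motifs([0, 5, 0, 1, 0, 0, 1, 0, 0, 2]))
# [[0, 5], [0, 1], [0, 0, 1], [0, 0, 2]]

pos = ["nn", "nn", "cd", "nn", "jj", "md", "vb", "dt",
       "nn", "in", "dt", "jj", "nn", "nn", "cd"]
print(["-".join(m) for m in nominal_motifs(pos)])
# ['nn', 'nn-cd', 'nn-jj-md-vb-dt', 'nn-in-dt-jj', 'nn', 'nn-cd']
```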

Where the relation motifs are concerned, most of them begin with "atr" (attribute), suggesting that it is a highly significant function. The only motif which does not start with "atr" (auxa-atr-adv-obj) still has an element of "atr" in it.

Köhler and Naumann (2008, p. 637) suggest that sequences in texts do not exist in a disorderly manner, but with a certain regularity. The motif approach is a method for studying the rhythmic patterns of sequences in texts. Motifs are applicable to any linguistic unit. Both word and non-word units can serve as their basic units, such as length, hierarchy, POS and dependency, all of which can be defined in a similar way. Motifs can be used not only for metric and sequential properties, but also for nominal properties. The frame unit within which motifs are defined bears various possibilities, for example, texts, sentences, paragraphs, sections, chapters and verses (Köhler, 2006; Köhler & Naumann, 2008). Different terms like L-motif, F-motif and P-motif are defined by taking the initials of the corresponding units, such as length, frequency and polysemy.

The definition of motifs embraces a particular advantage—motifs of linguistic units/properties can be flexibly upgraded or downgraded. For instance, LL-motifs (the length of length motifs) are generated by further abstracting L-motifs. Likewise, there can be LF-motifs, FL-motifs, even LLL-motifs (the length of LL-motifs) (Köhler & Naumann, 2010)! This reflects a higher level of abstraction in quantitative linguistics than in general linguistics. Length, hierarchy, POS and dependency relations mentioned above are in fact already abstractions, as concrete words no longer appear in the property. The move from length to L-motifs, then to LL-motifs, further to LLL-motifs and so on is the result of progressive abstraction.

For various causes, conscious or otherwise, just as with basic linguistic units that can be sensed directly (e.g., words, sentences), users employ these motifs with unequal probability. During the course of text formation, a balance will be struck between the recurrence of rhythmically based fragments on the one hand and the introduction of new fragments on the other. When modelling the distribution of motifs, the corresponding assumption is that "any appropriate segmentation of a text will display a monotonously decreasing frequency distribution" (Köhler, 2012, p. 118). If a motif is deemed a basic linguistic unit, the relationships between its ranks and frequencies, between lengths and frequency-ranks, and between length-frequencies and their corresponding ranks should each follow certain models. Such models are expected to bear some linguistic significance. A common function for modelling the rank-frequency distribution of motifs is the ZA (right truncated modified Zipf-Alekseev) model, and a better model is the ZM (Zipf-Mandelbrot) model (Köhler, 2015; Köhler & Naumann, 2010). Both models, related to Zipf's law, reflect a balance between linguistic forces. A common model for capturing the frequency-length relationship of motifs is the hyper-Poisson model (Köhler, 2006); other models include the positive negative binomial model (Zhang & Liu, 2017b) and the Lorentzian model (Čech et al., 2017), among others.
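For readers who wish to experiment with such fits, the Python sketch below fits a Zipf-Mandelbrot curve of the form f(r) = c / (r + b)^a to a rank-frequency list using scipy's general-purpose curve_fit. The frequencies are purely illustrative, not data from the corpora studied here, and this is not the fitting procedure used elsewhere in this book; it only shows the shape of the model and a simple goodness-of-fit indicator.

```python
import numpy as np
from scipy.optimize import curve_fit

def zipf_mandelbrot(rank, c, a, b):
    """Zipf-Mandelbrot curve: f(r) = c / (r + b) ** a."""
    return c / (rank + b) ** a

# Illustrative rank-frequency data (not from the corpora studied here).
ranks = np.arange(1, 11)
freqs = np.array([120, 70, 48, 36, 29, 24, 21, 18, 16, 15], dtype=float)

params, _ = curve_fit(zipf_mandelbrot, ranks, freqs,
                      p0=[100.0, 1.0, 1.0], bounds=(0, np.inf))
c, a, b = params
predicted = zipf_mandelbrot(ranks, c, a, b)

# Coefficient of determination R^2 as a simple goodness-of-fit indicator.
ss_res = np.sum((freqs - predicted) ** 2)
ss_tot = np.sum((freqs - freqs.mean()) ** 2)
print(c, a, b, 1 - ss_res / ss_tot)
```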

Defining motifs for both numerical and nominal types of linguistic units/properties, we can segment a text (or another frame unit) from left to right, starting from the first unit (e.g., the first word, POS, dependency or length) without omission; such a segmenting criterion is objective and unambiguous (Köhler, 2012, p. 117). In this way, the entire frame is abstracted into a sequence of unbroken motifs. In the present study, unless otherwise stated, the motifs we study use each sub-corpus as the frame; that is, the segmentation is done over the whole sub-corpus rather than within sentences, paragraphs, chapters, etc.

Motifs can be applied in linguistics for various ends. For example, Köhler and Naumann (2008) suggest that word length motifs can distinguish texts of different genres. In the study of linear linguistic behaviour in DD, Jing and Liu (2017) examine the DD motifs of 21 Indo-European languages and find that they can distinguish different languages and carry typological implications. Chen's (2017) study suggests that the type-token ratios of POS motifs can discriminate various genres. For more relevant research on motifs, see Köhler (2015) and Liu and Liang (2017).

Having defined motifs, we move on to the next type of unit, which also exhibits the sequential behaviour of language entities/properties.

4.5.2  Sequencing—a special linguistic unit

Different from Köhler's (2006) sequences, which were later termed motifs, sequencings in this study are all the possible linear strings truncated from a given sentence. The truncating can be done on words, on the syntactic functions they fulfil, on their POS, or on any other related properties/units. This is actually a combination of linear n-grams (n = 1 and beyond). For instance, for the sample sentence "The student has an interesting book" (Figure 1.2), there are altogether 21 possible ordered strings (1 + 2 + 3 + 4 + 5 + 6 = 21), as listed in Table 4.9. With data extracted from corpora, we can come up with the rank-frequency list of all the sequencings; in this way, some patterns will emerge, with some sequencings occurring more often than others.

Sequencings and motifs share the following common features: both employ language-in-the-line methods, and both can cover all the running words. Whether their rank-frequency data follow the same distribution pattern remains to be discovered. In the case of nominal properties, sequencings differ from motifs in the following aspects (a short code sketch of the sequencing extraction is given after Table 4.9 below):

1 The basic elements in motifs are not repeated. Restricted by the definition of motifs, when the examined units/properties are nominal, each single element can occur only once, and only in one motif. But the basic elements in sequencings are repeated n times (n = the length of the original string). This difference results in quite different numbers of sequencings and motifs generated from the same source material: the former will be much larger than the original material, while motifs make the source material more abstract. For instance, in the same sample sentence, there are only two motifs of POS, d-n-v and d-a-n, but there are 21 sequencings of POS.

2 In the same motif, the same basic nominal element cannot occur twice, since a motif is defined as composed of unrepeated elements. This leaves out many cases where one element occurs twice or more in an adjacent manner. This problem is tackled by sequencings. Take as an example the sequencing "atr-atr-obj" (representing the syntactic functions of each word in "an-interesting-book"): in such a sequencing, we can see a pattern of two attributes occurring before a noun.

3 Similarly, patterns like nn-in-nn (in representing a preposition) will not be recognised by motifs but are possible patterns in sequencings. Such patterns might be linguistically important. In sequencings, the same element may occur more than once in a pattern. But sequencings will also generate more rare strings which cannot be recognised as linguistically meaningful. We shall look for a way to balance these two aspects.

Table 4.9  All possible sequencings for the sample sentence

String length   Sequencing of words                     Sequencing of parts of speech   Sequencing of syntactic functions
1               the                                     d (determiner)                  atr
1               student                                 n (noun)                        subj
1               has                                     v (verb)                        root
1               an                                      d                               atr
1               interesting                             a (adjective)                   atr
1               book                                    n                               obj
2               the-student                             d-n                             atr-subj
2               student-has                             n-v                             subj-root
2               has-an                                  v-d                             root-atr
2               an-interesting                          d-a                             atr-atr
2               interesting-book                        a-n                             atr-obj
3               the-student-has                         d-n-v                           atr-subj-root
3               student-has-an                          n-v-d                           subj-root-atr
3               has-an-interesting                      v-d-a                           root-atr-atr
3               an-interesting-book                     d-a-n                           atr-atr-obj
4               the-student-has-an                      d-n-v-d                         atr-subj-root-atr
4               student-has-an-interesting              n-v-d-a                         subj-root-atr-atr
4               has-an-interesting-book                 v-d-a-n                         root-atr-atr-obj
5               the-student-has-an-interesting          d-n-v-d-a                       atr-subj-root-atr-atr
5               student-has-an-interesting-book         n-v-d-a-n                       subj-root-atr-atr-obj
6               the-student-has-an-interesting-book     d-n-v-d-a-n                     atr-subj-root-atr-atr-obj
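As noted above, extracting sequencings amounts to enumerating every contiguous substring of a tagged sentence. The Python sketch below does exactly that for the POS tags of the sample sentence in Table 4.9; the function name is illustrative only.

```python
def sequencings(elements):
    """Return all contiguous substrings (lengths 1 .. len) of a sequence."""
    n = len(elements)
    return ["-".join(elements[i:i + length])
            for length in range(1, n + 1)
            for i in range(n - length + 1)]

pos_tags = ["d", "n", "v", "d", "a", "n"]   # "The student has an interesting book"
strings = sequencings(pos_tags)
print(len(strings))      # 21 = 6 + 5 + 4 + 3 + 2 + 1
print(strings[:8])       # ['d', 'n', 'v', 'd', 'a', 'n', 'd-n', 'n-v']
```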

4.6  Chapter summary

This chapter elaborates on the research materials and methods. We first introduce two news treebanks as our research objects (Sections 4.2 and 4.3), one being a syntactic dependency treebank and the other a discourse treebank which is converted into a dependency treebank with only terminal nodes. Both treebanks are from the same source material, which makes it possible for us to carry out comparative studies at the syntactic and discourse levels. Then the research objects are separately defined for the two levels (Sections 4.4.2 and 4.4.3). At the syntactic level, we explore the seven combinations of the dependency structure, valency and DD. At the discourse level, we focus only on discourse relations, discourse valency and discourse DD. The previously defined research objects can be studied using two basic methods: one is the language-in-the-mass method and the other is the language-in-the-line method. For the former, all the units/properties are put together without considering their linear features; for the latter, the order of units/properties is taken into consideration. Section 4.5 introduces two such linear linguistic units—motifs and sequencings. Having addressed the research materials and methods, we are able to proceed to syntactic dependency relations and related properties in Chapter 5.

Note

1 https://ufal.mff.cuni.cz/pcedt2.0/en/a-layer.html. As punctuation marks are removed, relations like AuxG (non-terminal and non-coordinating graphic symbols) and AuxK (terminal punctuation of a sentence) are not included.

References Abeillé, A. (2003). Treebank: Building and using Parsed Corpora. Kluwer. Altmann, G. (1993). Science and linguistics. In R. Köhler & B. Rieger (Eds.), Contributions to quantitative linguistics (pp. 3–10). Kluwer. Altmann, G. (1996). The nature of linguistic units. Journal of Quantitative Linguistics, 3(1), 1–7. Anderson, R. C., Spiro, R. J., & Anderson, M. C. (1978). Schemata as scaffolding for the representation of information in connected discourse. American Educational Research Journal, 15(3), 433–440. Ariel, M. (1988). Referring and accessibility. Journal of Linguistics, 24(1), 65–87. Ayers, G. M. (1994). Discourse functions of pitch range in spontaneous and read speech. OSU Working, Papers in Linguistics, 44, 1–49. Azar, Z., Backus, A., & Özyürek, A. (2019). General-and language-specific factors influence reference tracking in speech and gesture in discourse. Discourse Processes, 56(7), 553–574. Best, K. -H. (1994). Word class frequencies in contemporary German short prose texts. Journal of Quantitative Linguistics, 1, 144–147. Björklund, M., & Virtanen, T. (1991). Variation in narrative structure: A simple text vs. an innovative work of art. Poetics, 20(4), 391–403. Bod, R. (2005). Towards Unifying Perception and Cognition: The Ubiquity of Trees. Prepublication. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.123.9114. (Accessed 29 April 2022). Bondy, J. A., & Murty, U. S. R. (2008). Graph theory. Springer.

72  Research Materials and Methods Boroda, M. (1982). Häufigkeitsstrukturen musikalischer Texte. In J. K. Orlov, M. G. Boroda, & I. S. Nadarejšvili (Eds.), Sprache, Text, Kunst. Quantitative Analysen (pp. 231–262). Brockmeyer. Bruner, J., & Bruner, J. S. (1990). Acts of meaning: Four lectures on mind and culture (Vol. 3). Harvard University Press. Buk, S., & Rovenchak, A. (2008). Menzerath–Altmann law for syntactic structures in Ukrainian. Glottotheory, 1(1), 10–17. Bunge, M. (2007). Philosophy of science (Vol. 1). From problem to theory (4th ed.). Transaction Publishers. Carlson, L., & Marcu, D. (2001). Discourse tagging reference manual. ISI Technical Report ISI-TR-545, 54, 56. Carlson, L., Marcu, D., & Okurowski, M. E. (2002). RST discourse treebank, LDC2002T07 [Corpus]. Linguistic Data Consortium. Carlson, L., Marcu, D., & Okurowski,M. E. (2003). Building a discourse-tagged corpus in the framework of rhetorical structure theory. In J. van Kuppevelt & R. W. Smith (Eds.), Current directions in discourse and dialogue (pp. 85–112). Kluwer Academic Publishers. Carminati, M. N. (2002). The processing of Italian subject pronouns. University of Massachusetts Amherst. Čech, R., Vincze, V., & Altmann, G. (2017). On motifs and verb valency. In H. Liu & J. Liang (Eds.), Motifs in language and text (pp. 231–260). De Gruyter Mouton. Chartrand, G. (1985). Introductory graph theory. Dover. Chen, H., & Liu, H. (2016). How to measure word length in spoken and written Chinese. Journal of Quantitative Linguistics, 23(1), 5–29. Chen, R. (2017). Quantitative text classification based on POS-motifs. In H. Liu & J. Liang (Eds.), Motifs in language and text (pp. 65–86). De Gruyter Mouton. Chen, R., Deng, S., & Liu, H. (2022). Syntactic complexity of different text types: From the perspective of dependency distance both linearly and hierarchically. Journal of Quantitative Linguistics, 29(4), 510–540. Chen, X., & Gerdes, K. (2022). Dependency distances and their frequencies in Indo-­ European language. Journal of Quantitative Linguistics, 29(1), 106–125. Cho, Y. H. (1996). Quantitative and prosodic representation of tone and intonation in the Kyungnam dialect of Korean. The University of Arizona. Cook, G. W. (1990). A theory of discourse deviation: The application of schema theory to the analysis of literary discourse. Doctoral dissertation, University of Leeds. Coopmans, C. W., & Cohn, N. (2022). An electrophysiological investigation of co-­referential processes in visual narrative comprehension. Neuropsychologia, 172, 108253. d’Andrade, R. G. (1984). Cultural meaning systems. In R. Shweder & R. LeVine (Eds.), Culture theory: Essays on mind, self, and emotion (pp. 88–119). Cambridge University Press. d’Andrade, R. G. (1990). Some propositions about the relations between culture and human cognition. In J. W. Stigler & R. Shweder (Eds.), Cultural psychology: Essays in comparative human development (pp. 65–129). Cambridge University Press. Demberg, V., Scholman, M. C., & Asr, F. T. (2019). How compatible are our discourse annotation frameworks? Insights from mapping RST-DT and PDTB annotations. Dialogue & Discourse, 10(1), 87–135. den Ouden, H. (2004). Prosodic Realizations of Text Structure, unpublished PhD dissertation, University of Tilburg. den Ouden, H., Noordman, L., & Terken, J. (2002). The prosodic realization of organisational features of text. In Proceedings of speech prosody 2002 (pp. 543–546). Aix-enProvence, France, April 11–13, 2002.

Research Materials and Methods 73 duVerle, D., & Prendinger, H. (2009). A novel discourse parser based on support vector machine classification. In K. Y. Su, J. Su, J. Wiebe, & H. Li. (Eds.), Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing (pp. 665–673). Suntec, Singapore, August 2–7, 2009. Fang, Y., & Liu, H. (2018). What factors are associated with dependency distances to ensure easy comprehension? A case study of ba sentences in Mandarin Chinese. Language Sciences, 67, 33–45. Feng, Z. (2011). Language and math. World Publishing Corporation. (In Chinese). Feng, Z. (2013). Schools of modern linguistics (updated edition). Commercial Press. (In Chinese). Feng, V. W., & Hirst, G. (2012). Text-level discourse parsing with rich linguistic features. In Proceedings of the 50th annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 60–68). Jeju Island, Korea, July 8–14, 2012. Ferrer-i-Cancho, R. (2015). The placement of the head that minimizes online memory. Language Dynamics and Change, 5(1), 114–137. Gao, S. (2010). A quantitative study of grammatical functions of modern Chinese nouns based on a dependency treebank. TCSOL Studies, 2(38), 54–60. (In Chinese). Gao, S., Yan, W., & Liu, H. (2010). A quantitative study on syntactic functions of Chinese verbs based on dependency treebank. Chinese Language Learning, 5, 105–112. (In Chinese). Garside, R., Leech, G., & Sampson, G. (1987). The computational analysis of English. A corpus-based approach. Longman. Gernsbacher, M. A. (2013). Language comprehension as structure building. Psychology Press. Gordon, P. C., Grosz, B. J., & Gilliom, L. A. (1993). Pronouns, names, and the centering of attention in discourse. Cognitive Science, 17(3), 311–347. Gupta, P., Rajaram, S., Schütze, H., & Runkler, T. (2019). Neural relation extraction within and across sentence boundaries. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, No. 01) (pp. 6513–6520). July, 2019. Gupta, P., Schutze, H., & Andrassy, B. (2016). Table filling multi-task recurrent neural network for joint entity and relation extraction. In Proceedings of COLING 2016, the 26th international conference on computational linguistics: Technical papers (pp. 2537–2547). Osaka, Japan, December 11–17, 2016. Hajič, J., Hajičová, E., Panevová, J., Sgall, P., Cinková, S., Fučíková, E., Mikulová, M., Pajas, P., Popelka, J., Semecký, J., Šindlerová, J., Stĕpánek, J., Toman, J., Urešová, Z., & Žabokrtský, Z. (2012). Prague Czech-English dependency treebank 2.0 LDC2012T08. Linguistic Data Consortium. Harary, F. (1969). Graph theory. Addison-Wesley. Herdan, G. (1966). The advanced theory of language as choice and chance. Lingua, 18(3), 447–448. Hernault, H., Helmut, P., du Verle, D. A., & Ishizuka, M. (2010). HILDA: A discourse parser using support vector machine classification. Dialogue and Discourse, 1(3), 1–33.  https:// doi.org/10.5087/dad.2010.003 Hidi, S. E., & Hildyard, A. (1983). The comparison of oral and written productions in two discourse types. Discourse Processes, 6(2), 91–105. Hinrichs, E., & Kübler, S. (2005). Treebank profiling of spoken and written German. In Proceedings of the fourth workshop on treebanks and linguistic theories. Barcelona, Spain, December 9–10, 2005.

74  Research Materials and Methods Hirschberg, J., & Litman, D. (1987). Now let’s talk about now; Identifying cue phrases intonationally. In 25th Annual meeting of the Association for Computational Linguistics (pp. 163–171). Chicago, USA, July 1987. Hobbs, J. R. (1978). Why is discourse coherent?. Technical Report, Artificial Intelligence Center. SRI International Menlo Park CA. Hovy, E. H., & Maier, E. (1992). Parsimonious or profligate: How many and which discourse structure relations?. University of Southern California Marina Del Rey Information Sciences Inst. Hudson, R. (1998). English Grammar. Routledge. Hudson, R. (2007). Language networks: The new word grammar. Oxford University Press. Hwang, H. (2022). The influence of discourse continuity on referential form choice. Journal of Experimental Psychology: Learning, Memory, and Cognition. Advance online publication. https://doi.org/10.1037/xlm0001166 Jiang, J., & Liu, H. (2015). The effects of sentence length on dependency distance, dependency direction and the implications - based on a parallel English-Chinese dependency treebank. Language Sciences, 50, 93–104. Jiang, J., & Ouyang, J. (2017). Dependency distance: A new perspective on the syntactic development in second language acquisition. Comment on “Dependency distance: A new perspective on syntactic patterns in natural language” by Haitao Liu et al. Physics of Life Reviews, 21, 209–210. Jiang, X., & Jiang, Y. (2020). Effect of dependency distance of source text on disfluencies in interpreting. Lingua, 243, 102873. Jing, Y., & Liu, H. (2017). Dependency distance motifs in 21 Indo-European languages. In H. Liu & J. Liang (Eds.), Motifs in language and text (pp. 133–150). De Gruyter Mouton. Kambhatla, N. (2004). Combining lexical, syntactic, and semantic features with maximum entropy models for information extraction. In Proceedings of the ACL interactive poster and demonstration sessions, 22. (pp. 178–181). Association for Computational Linguistics. July 2004. Karmiloff-Smith, B. A. (1985). Language and cognitive processes from a developmental perspective. Language and Cognitive Processes, 1(1), 61–85. Karmiloff-Smith, B. A. (1994). Beyond modularity: A developmental perspective on cognitive science. European Journal of Disorders of Communication, 29(1), 95–105. Kintsch, W., & van Dijk, T. A. (1975). Recalling and summarizing stories. Language, 40, 98–116. Köhler, R. (1999). Syntactic structures: Properties and interrelations. Journal of Quantitative Linguistics, 6(1), 46–57. Köhler, R. (2006). The frequency distribution of the length of length sequences. In J. Genzor & M. Bucková (Eds.), Favete linguis. Studies in honour of Viktor Krupa (pp. 145–152). Slovak Academy Press. Köhler, R. (2012). Quantitative syntax analysis. De Gruyter Mouton. Köhler, R. (2015). Linguistic motifs. In G. K. Mikros & J. Mačutek (Eds.), Sequences in language and text (pp. 107–129). De Gruyter Mouton. Köhler, R., & Naumann, S. (2008). Quantitative text analysis using L-, F- and T-segments. In B. Preisach & D. Schmidt-Thieme (Eds.), Data analysis, machine learning and applications (pp. 637–646). Springer. Köhler, R., & Naumann, S. (2010). A syntagmatic approach to automatic text classification. Statistical properties of F- and L-motifs as text characteristics. In P. Grzybek, E. Kelih, & J. Mačutek (Eds.), Text and language. Structures, functions, interrelations (pp. 81–90). Praesens Verlag.

Research Materials and Methods  75 Kong, E. (2004). The role of pitch range variation in the discourse structure and intonation structure of Korean. In ICSLP 8th International conference on spoken language processing ICC. Jeju, Jeju Island, Korea, October 4–8, 2004. Kowtko, J. C. (1997). The function of intonation in task-oriented dialogue. Doctoral dissertation, University of Edinburgh. Lai, K. H. (1924/2007). New writings on the law of national languages. Hunan Education Publishing House. (In Chinese). Lei, L., & Wen, J. (2020). Is dependency distance experiencing a process of minimization? A diachronic study based on the State of the Union addresses. Lingua, 239, 102762. Levy, E. (1984). Communicating thematic structures in narrative discourse: The use of referring terms and gestures. Unpublished doctoral dissertation, University of Chicago. Liang, J. (2017). Dependency distance differences across interpreting types: Implications for cognitive demand. Frontiers in Psychology, 8, 2132. Linguistic Data Consortium. (1999). Penn Treebank 3. LDC99T42. Liu, H. (2007). Probability distribution of dependency distance. Glottometrics, 15, 1–12. Liu, H. (2009a). Dependency grammar: from theory to practice. Science Press. (In Chinese). Liu, H. (2009b). Probability distribution of dependencies based on Chinese dependency treebank. Journal of Quantitative Linguistics, 16(3), 256–273. Liu, H. (2017). The distribution pattern of sentence structure hierarchy. Foreign Language Teaching and Research, 3, 345–352+479. (In Chinese). Liu, B., & Chen, X. (2017). Dependency distance in language evolution: Comment on “Dependency distance: A new perspective on syntactic patterns in natural languages” by Haitao Liu et al. Physics of Life Reviews, 21, 194–196. Liu, H., & Cong, J. (2014). Empirical characterization of modern Chinese as a multi-level system from the complex network approach. Journal of Chinese Linguistics, 1, 1–38. Liu, H., & Feng, Z. (2007). Probabilistic valency pattern theory for natural language processing. Linguistic Sciences, (03), 32–41. (In Chinese). Liu, H., & Jing, Y. (2016). A quantitative analysis of English hierarchical structure. Journal of Foreign Languages, 39(6), 2–11. (In Chinese). Liu, H., & Liang, J. (Eds.). ( 2017). Motifs in language and text. De Gruyter Mouton. Liu, H., Xu, C., & Liang, J. (2017). Dependency distance: A new perspective on syntactic patterns in natural languages. Physics of Life Review, 21, 171–193. Liu, Y., & Zeldes, A. (2019). Discourse relations and signaling information: Anchoring discourse signals in RST-DT. Proceedings of the Society for Computation in Linguistics, 2(1), 314–317. Machin, D., & van Leeuwen, T. (2003). Global schemas and local discourses in cosmopolitan. Journal of Sociolinguistics, 7(4), 493–512. Malcolm, I. G., & Sharifian, F. (2002). Aspects of Aboriginal English oral discourse: An application of cultural schema theory. Discourse Studies, 4(2), 169–181. Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243–281. Marcu, D. (2000). The theory and practice of discourse parsing and summarization. The MIT press. Marcu, D., Carlson, L., & Watanabe, M. (2000). The Automatic Translation of Discourse Structures. Presented at the 1st Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL’00) (pp. 9–17). Seattle, Washington, USA, April 29–May 4, 2000. Marcus, G. F. (2001). The algebraic mind. The MIT press.

76  Research Materials and Methods Marslen-Wilson, W., Levy, E., & Tyler, L. K. (1982). Producing interpretable discourse: The establishment and maintenance of reference. In R. J. Jarvella & W. Klein (Eds.), Speech, place and action (pp. 339–378). John Wiley and Sons. Miller, G. A., & McKean, K. O. (1964). A chronometric study of some relations between sentences. Quarterly Journal of Experimental Psychology, 16(4), 297–308. Nakatani, C., Hirschberg, J., & Grosz, B. (1995). Discourse structure in spoken language: Studies on speech corpora. In Working notes of the AAAI spring symposium on empirical methods in discourse interpretation and generation (pp. 106–112). Palo Alto, California USA, March 27–29, 1995. Nelson, K. (1981). Social cognition in a script framework. In J. H. Flavell & L. Ross (Eds.), Social cognitive development: Frontiers and possible futures (pp. 97–118). Cambridge University Press. Nelson, K. (1986). Event knowledge: Structure and function in development. Lawrence Earlbaum. Nivre, J. (2005). Dependency grammar and dependency parsing. (MSI report). Växjö University: School of Mathematics and Systems Engineering. Noordman, L., Dassen, I., Swerts, M., & Terken, J. (1999). Prosodic markers of text structure. In K. van Hoek, A. A. Kibrik, & L. Noordman (Eds.), Discourse studies in cognitive linguistics: Selected papers from the fifth international cognitive linguistics conference (pp. 131–48). Amsterdam and Philadelphia, PA: John Benjamins. Noriega-Atala, E., Lovett, P. M., Morrison, C. T., & Surdeanu, M. (2021). Neural architectures for biological inter-sentence relation extraction. arXiv preprint arXiv:2112.09288. Nwogu, K. N. (1989). Discourse variation in medical texts: Schema, theme and cohesion in professional and journalistic accounts. Doctoral dissertation, University of Aston in Birmingham. Ouyang, J., Jiang, J., & Liu, H. (2022). Dependency distance measures in assessing syntactic complexity. Assessing Writing, 51, 100603. Paltridge, B., Starfield, S., Ravelli, L. J., & Tuckwell, K. (2012). Change and stability: Examining the macrostructures of doctoral theses in the visual and performing arts. Journal of English for Academic Purposes, 11(4), 332–344. Passoneau, R. J. (1991). Getting and keeping the center of attention. In R. Weischedel & M. Bates (Eds.), Challenges in natural language processing (pp. 35–38). Cambridge University Press. Pawłowski, A. (1999). Language in the line vs. language in the mass: On the efficiency of sequential modelling in the analysis of rhythm. Journal of Quantitative Linguistics, 6, 70–77. Peng, N., Poon, H., Quirk, C., Toutanova, K., & Yih, W. T. (2017). Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics, 5, 101–115. Pierrehumbert, J., & Hirschberg, J. B. (1987). The meaning of intonational contours in the interpretation of discourse. In Intentions in communication (pp. 271–311). The MIT Press. Quirk, C., & Poon, H. (2017). Distant supervision for relation extraction beyond the sentence boundary. In Proceedings of the conference of the European chapter of the Association for Computational Linguistics (Vol. 1) (pp. 1171–1182). Association for Computational Linguistics. arXiv preprint arXiv:1609.04873. Radev, D. R. (2000). A common theory of information fusion from multiple text sources step one: Cross-document structure. In L. Dybkjær, K. Hasida, & D. Tram (Eds.), Proceedings of the 1st SIGdial workshop on discourse and dialogue (Vol. 10) (pp. 74–83). 
Association for Computational Linguistics. Hong Kong, China, October 7–8, 2000.

Research Materials and Methods 77 Rumelhart, D. E. (1978). Schemata: The building blocks of cognition. In R. Spiro, B. Bruce, & W. Brewer (Eds.), Theoretical issues in Reading comprehension (pp. 33–58). Lawrence Earlbaum. Sahu, S. K., Christopoulou, F., Miwa, M., & Ananiadou, S. (2019). Inter-sentence relation extraction with document-level graph convolutional neural network. arXiv preprint arXiv:1906.04684. Song, L., Zhang, Y., Wang, Z., & Gildea, D. (2018). N-ary relation extraction using graph state LSTM. In Proceedings of conference on empirical methods in natural language processing (pp. 2226–2235). Association for Computational Linguistics. arXiv:1808. 09101. Soricut, R., & Marcu, D. (2003). Sentence level discourse parsing using syntactic and lexical information. In Proceedings of the 2003 human language technology conference of the North American chapter of the Association for Computational Linguistics (pp. 149–156). Edmonton, May–June 2003. Spiro, R. J., & Tirre, W. C. (1980). Individual differences in schema utilization during discourse processing. Journal of Educational Psychology, 72(2), 204. Sporleder, C., & Lascarides, A. (2004). Combining hierarchical clustering and machine learning to predict high-level discourse structure. In Proceedings of computational linguistics 2004 (pp. 43–49). Geneva, Switzerland, August 23–27, 2004. Stede, M. (2004). The Potsdam commentary corpus. In Proceedings of the ACL 2004 workshop on “Discourse Annotation” (pp. 96–102). Barcelona, Spain, July 25–26, 2004. Stede, M. (2012). Discourse processing. Morgan and Claypool. Strauss, C., & Quinn, N. (1993). A cognitive/cultural anthropology. In R. Borofsky (Ed.), Assessing cultural anthropology (pp. 284–297). McGraw-Hill. Sun, K., & Xiong, W. (2019). A computational model for measuring discourse complexity. Discourse Studies, 21(6), 690–712. https://doi.org/10.1177/1461445619866985 Swampillai, K., & Stevenson, M. (2011). Extracting relations within and across sentences. In Proceedings of the international conference recent advances in natural language processing 2011 (pp. 25–32). Hissar, Bulgaria, September 2011. http://aclweb.org/anthology/ R11-1004 Taboada, M. (2004). Building coherence and cohesion: Task-oriented dialogue in English and Spanish. John Benjamins. Taboada, M. (2008). SFU Review Corpus [Corpus]. Vancouver: Simon Fraser University, http://www.sfu.ca/~mtaboada/research/SFU_Review_Corpus.html (Accessed 22 January 2014). Taboada, M., & Mann, W. (2006). Rhetorical structure theory: Looking back and moving ahead. Discourse Studies, 8(3), 423–459. Tesnière, L. (1959). Éléments de syntaxe structurale. Klincksieck. Tutte, W. T. (2001). Graph theory. Cambridge University Press. Tuzzi, A., Popescu, I. I., & Altmann, G. (2009). Zipf’s laws in Italian texts. Journal of Quantitative Linguistics, 16(4), 354–367. Uresova, Z., Fucikova, E., & Hajic, J. (2016). Non-projectivity and valency. In Proceedings of the workshop on discontinuous structures in natural language processing 2016 (pp.  12–21). San Diego, California, June 17, 2016. Association for Computational Linguistics. van der Velde, F., & de Kamps, M. (2006). Neural blackboard architectures of combinatorial structures in cognition. Behavioral and Brain Sciences, 29, 37–108. van Dijk, T. A. (1980). Macrostructures: An interdisciplinary study of global structures in discourse, interaction, and cognition. Lawrence Erlbaum.

78  Research Materials and Methods van Dijk, T. A. (1982). Episodes as units of discourse analysis. In Tannen, D. (Ed.), Analyzing discourse: Text and talk (pp. 177–195). Georgetown University Press. van Dijk, T. A. (1983). Discourse analysis: Its development and application to the structure of news. Journal of Communication, 33(2), 20–43. van Dijk, T. A. (1985). Structures of news in the press. In T. A. van Dijk (Ed.), discourse and communication: New approaches to the analysis of mass media, discourse and communication (pp. 69–93). De Gruyter. van Dijk, T. A. (2019a). Macrostructures: An interdisciplinary study of global structures in discourse, interaction, and cognition. Routledge. van Dijk, T. A. (2019b). Some observations on the role of knowledge in discourse processing. Arbolesy Rizomas, 1(1), 10–23. Venditti, J. J. (2000). Discourse structure and attentional salience effects on Japanese intonation. The Ohio State University. Venditti, J. J., & Hirschberg, J. (2003). Intonation and discourse processing. In Proceedings of the international congress of phonetic sciences (pp. 315–318). University of Saarbrücken. Venditti, J. J., & Swerts, M. (1996). Intonational cues to discourse structure in Japanese. In Proceeding of fourth international conference on spoken language processing. ICSLP’96 (Vol. 2), (pp. 725–728). IEEE. October 1996. Verga, P., Strubell, E., & McCallum, A. (2018). Simultaneously self-attending to all mentions for full-abstract biological relation extraction. In Proceedings of conference of the North American Chapter of the Association for Computational Linguistics: Human language technologies (pp. 872–884). Association for Computational Linguistics. arXiv preprint arXiv:1802.10569. Vu, N. T., Adel, H., Gupta, P., & Schütze, H. (2016). Combining recurrent and convolutional neural networks for relation classification. In Proceedings of the NAACL-HLT (pp. 534–539). San Diego, California USA, 2016. Association for Computational Linguistics. arXiv preprint arXiv:1605.07333. Wichmann, A. (2014). Discourse intonation. Covenant Journal of Language Studies, 2(1), 1–16. Williams, J. P. (1984). Categorization, macrostructure, and finding the main idea. Journal of Educational Psychology, 76(5), 874. Williams, J. P., Taylor, M. B., & de Cani, J. S. (1984). Constructing macrostructure for expository text. Journal of Educational Psychology, 76(6), 1065. Williams, S., & Power, R.. (2008). Deriving rhetorical complexity data from the RST-DT corpus. In Proceedings of 6th international conference on language resources and evaluation (LREC) (pp. 2720–2724). Marrakech, Morocco, 28–30 May, 2008. Winograd, T. (1977). A framework for understanding discourse. Cognitive Processes in Comprehension, 63, 88. Xu, C., & Liu, H. (2022). The role of working memory in shaping syntactic dependency structures. In J. Schwieter & E. Wen (Eds.), The Cambridge handbook of working memory and language. Cambridge University Press. (In Press). Yadav, H., Vaidya, A., Shukla, V., & Husain, S. (2020). Word order typology interacts with linguistic complexity: A cross-linguistic corpus study. Cognitive Science, 44(4), 1–44. Ye, Y. (2019). Macrostructures and rhetorical moves in energy engineering research articles written by Chinese expert writers. Journal of English for Academic Purposes, 38, 48–61. Yoon, S. O., Benjamin, A. S., & Brown-Schmidt, S. (2021). Referential form and memory for the discourse history. Cognitive Science, 45(4), e12964.

Research Materials and Methods 79 Yu, S. (2018). Linguistic interpretation of Zipf’s law. In H. Liu (Ed.), Advances in quantitative linguistics (pp. 1–25). Zhejiang University Press. (In Chinese). Yu, S., Liang, J., & Liu, H. (2016). Existence of hierarchies and human’s pursuit of top hierarchy lead to power law. http://arxiv.org/abs/1609.07680 Yue, M. (2006). Discursive usage of six Chinese punctuation marks. In Proceedings of the COLING/ACL-2006 student research workshop (pp. 43–48). Sydney, Australia, July 17–21, 2006. Zhang, H. (2018). A Chinese-English Synergetic Syntactic Model Based on Dependency Relations. Doctoral dissertation, Zhejiang Univiersity. (In Chinese). Zhang, H., & Liu, H. (2016a). Rhetorical relations revisited across distinct levels of discourse unit granularity. Discourse Studies, 18(4), 454–472. Zhang, H., & Liu, H. (2017a). Motifs in reconstructed RST discourse trees. Journal of Quantitative Linguistics, 24(2–3), 107–127. Zhang, H., & Liu, H. (2017b). Motifs of generalized valencies. In H. Liu & J. Liang (Eds.), Motifs in language and text (pp. 231–260). De Gruyter Mouton. Zhang, Y., Qi, P., & Manning, C. D. (2018). Graph convolution over pruned dependency trees improves relation extraction. In Proceedings of the 2018 conference on empirical methods in natural language processing (pp. 2205–2215). Association for Computational Linguistics. arXiv preprint arXiv:1809.10185. Zhang, M., Zhang, J., Su, J., & Zhou, G. (2006). A composite kernel to extract relations between entities with both flat and structured features. In Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the Association for Computational Linguistics (pp. 825–832). Sydney, July 2006. Association for Computational Linguistics. Zheng, H., Li, Z., Wang, S., Yan, Z., & Zhou, J. (2016). Aggregating inter-sentence information to enhance relation extraction. In Proceedings of the thirtieth AAAI conference on artificial intelligence (pp. 3108–3114). Phoenix Arizona, February 12–17, 2016. Zhou, C. (2021). On mean dependency distance as a metric of translation quality assessment. Indian Journal of Language and Linguistics, 2(4), 23–30. Ziegler, A. (1998). Word class frequencies in Brazilian-Portuguese press texts. Journal of Quantitative Linguistics, 5, 269–280. Ziegler, A. (2001). Word class frequencies in Portuguese press texts. In L. Uhlířová, G. Wimmer, G. Altmann, & R. Köhler (Eds.), Text as a linguistic paradigm: Levels, constituents, constructs. Festschrift in honour of Luděk Hřebíček (pp. 295–312). Wissenschaftlicher Verlag Trier. Zipf, G. K. (1933). Selected studies of the principle of relative frequency in language. Language, 9(1), 89–92.

5

Syntactic Dependency Relations and Related Properties

This chapter focuses on syntactic dependency relations. We will first validate homogeneity across the sub-corpora to prove that the chosen research materials are qualified candidates for this study (Sections 5.1 and 5.2). We will then examine sequencings of individual elements of dependency structure (Section 5.3), the distribution of combinatorial dependency structures (Section 5.4), motifs of dependency structures (Section 5.5), valency distribution (Section 5.6) and dependency distance sequencings (Section 5.7). The final section provides a summary. To prove that the sub-corpora behave homogeneously, we check the rank-frequency data of both word classes or parts of speech (POS) (Section 5.1) and dependency relations or syntactic functions (Section 5.2).

5.1  Validating homogeneity through POS distribution

5.1.1  Introduction

Research on POS or word classes abounds in the field (e.g., Best, 1994, 1998, 2000, 2001; Hammerl, 1990; Liang & Liu, 2013; Schweers & Zhu, 1991; Wang, 2018; Zhu & Best, 1992; Ziegler, 1998, 2001; Ziegler et al., 2001). Common methods for examining POS are frequency, percentage, rank-frequency function fitting, entropy and the Gini coefficient. Some studies focus on the POS system as a whole (e.g., Hengeveld, 2007; Hengeveld et al., 2004; Muryasov, 2019; Sun, 2020). As the first person to study POS ratios, Ohno assumes that such ratios are constant (cf. Köhler, 2012). Hammerl (1990) suggests modelling POS probabilities with the ZA (right-truncated modified Zipf-Alekseev) model in most cases. Section 3.4 details how this model is derived. Schweers and Zhu (1991) suggest that the negative hypergeometric model can capture POS distribution. Altmann and Burdinski (1982) give a theoretical derivation of this model and then test it empirically. Popescu et al. (2010) use a functional form of the “linguistic data stratification” hypothesis, with the effect of stratification appearing as a parameter in their formula. They validate the model on 60 Italian texts. Köhler (2012) uses a simplified version of their model.
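Because the ZA model recurs throughout this chapter, a minimal fitting sketch may help make the later tables concrete. The parameterisation below (a separate probability α for rank 1, with the remaining mass distributed over ranks 2 to n in proportion to r^-(a + b·ln r)) is one common form of the right-truncated modified Zipf-Alekseev distribution; the exact form used in this study is derived in Section 3.4, so this is only an illustrative approximation, not the software actually used for the fittings reported below. The function names, the use of scipy, and the toy counts are the writer's assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def za_probs(a, b, alpha, n):
    """Right-truncated modified Zipf-Alekseev probabilities for ranks 1..n
    (one common parameterisation; cf. Section 3.4 for the exact derivation)."""
    r = np.arange(2, n + 1, dtype=float)
    tail = r ** -(a + b * np.log(r))           # unnormalised mass for ranks 2..n
    p = np.empty(n)
    p[0] = alpha                               # rank 1 receives its own parameter
    p[1:] = (1.0 - alpha) * tail / tail.sum()  # remaining mass shared by ranks 2..n
    return p

def fit_za(freqs):
    """Fit (a, b, alpha) to an observed rank-frequency vector and report R^2."""
    f = np.asarray(sorted(freqs, reverse=True), dtype=float)
    n, total = len(f), f.sum()
    obs = f / total

    def sse(params):
        a, b, alpha = params
        if not (0.0 < alpha < 1.0):
            return np.inf
        return np.sum((obs - za_probs(a, b, alpha, n)) ** 2)

    a, b, alpha = minimize(sse, x0=[0.3, 0.3, obs[0]], method="Nelder-Mead").x
    expected = za_probs(a, b, alpha, n) * total
    r2 = 1.0 - np.sum((f - expected) ** 2) / np.sum((f - f.mean()) ** 2)
    return {"a": a, "b": b, "alpha": alpha, "n": n, "N": total, "R2": r2}

# Toy usage with made-up counts:
print(fit_za([900, 400, 210, 130, 90, 60, 40, 25, 15, 10]))
```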

Some scholars (e.g., Elts, 1992; Mikk & Elts, 1992, 1993) have proposed that the ratio of POS affects text difficulty. Wiio (1968, p. 43) suggests that the formula

$$\frac{\text{No. of adj.} + \text{No. of adv.}}{\text{No. of n.} + \text{No. of v.}} \qquad (5.1)$$

can be used as one of the indices of text difficulty. Tuldava and Villup (1976) propose the ratio of nouns to verbs as an index of substantivity. Scholars have examined POS differences between different genres or styles. Tuldava and Villup (1976) investigate the POS frequencies of fiction in six languages, Hoffmann (1976) compares the POS percentages in fiction and philosophy texts, and Mikk (1997) compares the POS frequencies in spoken and written texts. Hengeveld et al. (2004) carry out a cross-linguistic study on POS in 50 languages and discuss the classification of the POS system. Vulanović (2008a, 2008b, 2009) conducts a series of related studies from a typological perspective. He puts forward methods to classify POS systems and to measure the efficiency of such systems. The last two decades have witnessed a growing body of quantitative comparative and contrastive studies on styles, genres, authors, and even languages, among which some focus on POS distributions. Liu (2009a) builds a Chinese corpus of news and finds that word classes of both governors and dependents follow the ZA distribution model. Using two self-built dependency tree corpora of news, Huang and Liu (2009) find that ratios and TTRs of adverbs, nouns and pronouns can well cluster the texts into spoken and written styles. Using the same research materials, Liu et al. (2012) find that the percentage of word classes bearing the same syntactic function can shed some light on the differences between spoken and written styles. Using the Chinese part of Lancaster, Zhang’s (2012) research suggests POS distributions can discriminate different genres in written Chinese. Using Popescu et al.’s (2010) model, Pan and Liu (2018) verify the distribution of modern Chinese adnominals. Chen (2013) studies the classification of Chinese functional words using dependency networks. Hou and Jiang (2014) examine POS distribution employing text vector analysis and random forest. Li et al. (2015) examine the POS distribution in French. With COCA (Corpus of Contemporary American English) and COHA (Corpus of Historical American English) as research materials, Liu (2016) researches five genres both synchronically and diachronically. Pan et al. (2018) explore the differences in POS distributions between the original and translated versions of English poetry. These studies basically adopt the bag-of-words approach, which is a non-linear research method. Chen (2017) and Yan (2017) adopt a linear approach. Chen (2017) finds that features like POS motifs, TTR and entropy (to be explained in Section 5.1.2) can distinguish different genres. Yan (2017) studies the POS motifs of deaf students, which are also found to abide by the ZA distribution pattern. Despite this large body of related research, we can still study the POS distribution from several perspectives to cross-validate the linguistic status of POS.

We pose the following research hypotheses:

Hypothesis 5.1.1: The POS distribution of dependents conforms to the ZA model. The parameters of the model can reflect homogeneity across the sub-corpora.

Hypothesis 5.1.2: The POS distribution of governors conforms to the ZA model. The parameters of the model can reflect homogeneity across the sub-corpora.

Both hypotheses on POS distribution patterns are put forward to validate homogeneity across the sub-corpora. If the sub-corpora are found to be homogeneous, they can be legitimate research materials. We can also claim that the chosen size of the sub-corpora has reached the Zipfian size (cf. Section 3.1). The ZA model is chosen as the candidate model as it has been shown to be effective in modelling rank-frequency data in a large number of studies (e.g., Hřebíček, 1996; Köhler, 2015; Köhler & Naumann, 2010; Yue & Liu, 2011).

5.1.2  Results and discussion

Table 5.1 summarises the general distribution of all word classes across the sub-corpora.

Table 5.1  Summary of all word classes

                          E1        E2        E3        E4        E5        E6
Type                      33        30        30        30        31        28
Token                     43,266    43,276    43,264    43,254    43,257    43,260
TTR                       0.000786  0.000716  0.000717  0.000717  0.00074   0.00067
Entropy                   3.16      3.15      3.2       3.17      3.17      3.13
RR                        0.171     0.172     0.166     0.172     0.174     0.175
Verbs                     6,358     6,680     6,505     6,321     5,978     6,467
Adjectives                3,671     3,347     3,550     3,525     3,405     3,220
Activity indicator        0.63      0.67      0.65      0.64      0.64      0.67
Descriptivity indicator   0.37      0.33      0.35      0.36      0.36      0.33

We briefly introduce the indicators in Table 5.1.
1 TTR (type-token ratio) reflects the ratio of types to tokens. For example, the expression “dependency grammar” bears two tokens of POS (two nouns) but only one POS type (“dependency” and “grammar” are both nouns), suggesting a TTR of 0.5 (= 1/2). TTR can distinguish between different types of texts with the same text size. Huang and Liu’s (2009) investigation is such an example.

2 Entropy is one of the most basic concepts in information theory. Let the probability of occurrence of any linguistic entity or property (here POS) in a text be $p_r$ (the ratio of its frequency to the total number of tokens); entropy is calculated as

$$H = -\sum_{r=1}^{V} p_r \log_2 p_r \qquad (5.2)$$

Entropy measures the degree of disorder of things. In this case, entropy can reflect the richness of POS in a text—the higher the entropy, the higher the richness of POS.
3 Like entropy, the repetition rate (RR) is calculated from the probabilities of occurrence. Let the probability of occurrence of any linguistic entity or property (in this case, POS) in the text be $p_r$ (the ratio of its frequency to the total number of tokens); RR is

$$RR = \sum_{r=1}^{V} p_r^2 \qquad (5.3)$$

The higher the RR value, the lower the richness.
4 The formula of the activity indicator in the table is

$$Q' = \frac{v}{a + v} \qquad (5.4)$$

Here v stands for the tokens of verbs and a for the tokens of adjectives.
5 The formula for the descriptivity indicator in the table is

$$\frac{a}{a + v} \qquad (5.5)$$
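To make the indicators concrete, the following sketch computes them for a toy list of POS tags. Only formulas (5.2)–(5.5) are taken from the text; the tag names, the helper function and the sample input are illustrative assumptions.

```python
from collections import Counter
from math import log2

def pos_indicators(tags):
    """TTR, entropy (5.2), repetition rate (5.3) and the activity/descriptivity
    indicators (5.4)-(5.5) for a sequence of POS tags (illustrative sketch)."""
    counts = Counter(tags)
    tokens = len(tags)
    probs = [c / tokens for c in counts.values()]

    ttr = len(counts) / tokens                       # types / tokens
    entropy = -sum(p * log2(p) for p in probs)       # formula (5.2)
    rr = sum(p ** 2 for p in probs)                  # formula (5.3)
    v, a = counts.get("vb", 0), counts.get("jj", 0)  # verb and adjective tokens
    activity = v / (a + v) if a + v else None        # formula (5.4)
    descriptivity = a / (a + v) if a + v else None   # formula (5.5)
    return ttr, entropy, rr, activity, descriptivity

# Toy usage with made-up tags:
print(pos_indicators(["nn", "vb", "dt", "nn", "jj", "in", "nn", "vb"]))
```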

Statistical tests show that TTR (M = 0.0007, SD = 0.00004), entropy (M = 3.163, SD = 0.023), RR (M = 0.172, SD = 0.003), the activity indicator (M = 0.650, SD = 0.017) and the descriptivity indicator (M = 0.350, SD = 0.017) can all reflect the homogeneity of the corpus, at least as far as POS is concerned. After exploring these overall features, we explore the distributions of POS, addressing in turn the research hypotheses in Section 5.1.1.

5.1.2.1  POS distribution of dependents

Dependents are all the words except the roots, which are the only nodes bearing no governor. Appendix 1 summarises the data, where the highest rank goes to the most frequent POS, Rank 2 to the next most frequent, and so on until the lowest rank is assigned to the least frequent POS. The rank-frequency data are all derived in this manner in this book. Table 5.2 presents the top 10 word classes: general singular nouns (nn), prepositions (in), proper nouns (nnp), determiners (dt), adjectives (jj), plural nouns (nns), adverbs (rb), cardinal numbers (cd), verbs (vb) and “to” (to). From Table 5.2, we can see similar percentages across the sub-corpora.
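The ranking convention just described can be written down in a few lines; the snippet below only illustrates the convention, with invented input.

```python
from collections import Counter

def rank_frequency(items):
    """Return (rank, item, frequency) triples; rank 1 is the most frequent item."""
    counts = Counter(items).most_common()   # sorted by frequency, descending
    return [(rank, item, freq) for rank, (item, freq) in enumerate(counts, start=1)]

# Toy usage: POS tags of dependents in a tiny invented sample.
print(rank_frequency(["nn", "in", "nn", "dt", "nn", "jj", "in"]))
```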

Table 5.2  POS of dependents (top 10) (R. = rank, Freq. = frequency)

R.   %          E1           E2           E3           E4           E5           E6
                Freq.  POS   Freq.  POS   Freq.  POS   Freq.  POS   Freq.  POS   Freq.  POS
1    16.1–16.7  6,628  nn    6,601  nn    6,773  nn    6,664  nn    6,873  nn    6,746  nn
2    12.2–12.6  5,047  in    5,124  in    5,056  in    5,026  nnp   5,184  nnp   5,130  in
3    10.7–12.0  4,651  nnp   4,813  nnp   4,418  dt    4,884  in    4,935  in    4,785  nnp
4    9.7–10.5   4,313  dt    4,204  dt    4,341  nnp   4,018  dt    4,008  dt    4,132  dt
5    7.5–8.3    3,417  jj    3,093  jj    3,287  jj    3,255  jj    3,168  jj    3,121  nns
6    6.7–7.3    3,024  nns   2,938  nns   2,897  nns   2,965  nns   2,770  nns   2,939  jj
7    3.8–5.8    1,605  rb    1,575  cd    1,565  rb    1,924  cd    2,393  cd    1,920  cd
8    3.4–4.0    1,553  cd    1,556  rb    1,533  cd    1,667  rb    1,423  rb    1,420  rb
9    2.8–3.5    1,383  vb    1,435  vb    1,365  vb    1,167  vb    1,165  vb    1,252  vb
10   2.6–3.0    1,106  to    1,194  to    1,116  to    1,087  to    1,151  to    1,223  to
Sub-total       32,727       32,533       32,351       32,657       33,070       32,668
Total           41,198       41,200       41,198       41,181       41,198       41,192

Figure 5.1  Rank-frequency data of dependents

Figure 5.1 displays how the dependents are distributed. All sub-corpora seem to behave likewise with the curves nearly overlapping each other, visually presenting the homogeneity across the sub-corpora. If the granularity of the POS is changed, for example, all forms of verbs (vb, vbd, vbn, vbg, vbz, vbp) are grouped into one vb category, the proportion of verbs in each sub-corpus is 10.9%–12.5% and rank third. But the data here can’t reflect the distribution of verbs in all nodes, because in most cases, the root of the English dependency tree is a verb, which takes a significant proportion. Similarly, when all types of English nouns (nn, nnp, nns, nnps) are grouped together, their proportion rises to 34.3%–36%, which still tops the list. This range of percentages is consistent with the findings of Liang and Liu (2013). 5.1.2.2  POS distribution of governors

After analysing the distribution of dependents, we proceed to governors. There are two algorithms for calculating the frequency of governing words. We illustrate with the “student” example (Figure 1.2). We use the symbol ◎ to mark the word preceding it as a governor. In the valency structure of “student has◎ book”, there are two dependent-governor pairs (“student-has◎” and “has◎-book”), with “has” being the governor in both cases. The first algorithm is to count “has” twice, as it occurs in two dependent-governor pairs (in the two-dimensional structure); the second is to count “has” once, as it occurs only once in the linear structure. We first follow the first operation, where a governor is counted each time it occurs in a dependent-governor pair. In this case, the frequencies of governors will correspond to those of dependents, as dependency grammar examines one-on-one correspondence (cf. Table 5.2). The result is presented in Appendix 2. Table 5.3 presents the top 10.
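The difference between the two counting algorithms can be made explicit with a small sketch over a list of dependency arcs; the data structure (one (dependent index, governor index) pair per arc) and the simplified annotation of the example sentence are the writer's assumptions.

```python
from collections import Counter

# "The student has a book": token index -> POS (toy, simplified annotation).
pos = {1: "nn", 2: "vb", 3: "nn"}        # student, has, book
arcs = [(1, 2), (3, 2)]                  # (dependent, governor): student<-has, book<-has

# Algorithm 1: count a governor once per dependent-governor pair it enters.
alg1 = Counter(pos[gov] for _, gov in arcs)

# Algorithm 2: count each governing token only once, however many dependents it has.
alg2 = Counter(pos[gov] for gov in {gov for _, gov in arcs})

print(alg1)   # Counter({'vb': 2})  -> "has" counted twice
print(alg2)   # Counter({'vb': 1})  -> "has" counted once
```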

Table 5.3  POS of governors (the first algorithm) (top 10) (R. = rank, Freq. = frequency)

R.   %          E1           E2           E3           E4           E5           E6
                POS   Freq.  POS   Freq.  POS   Freq.  POS   Freq.  POS   Freq.  POS   Freq.
1    21.6–22.2  nn    9,316  nn    8,963  nn    9,392  nn    9,172  nn    9,403  nn    8,904
2    12.5–13    in    5,345  in    5,225  in    5,179  in    5,186  in    5,141  in    5,304
3    9.3–10     nns   4,140  nns   3,990  nns   3,819  nns   3,868  vbd   4,126  vbd   4,062
4    7.7–9.8    vbd   3,258  vbd   3,344  vbd   3,160  vbd   3,635  nns   3,845  nns   4,025
5    7.1–8.4    cc    2,949  nnp   2,961  cc    2,913  nnp   3,082  nnp   3,461  cc    3,085
6    6.6–7.1    nnp   2,851  cc    2,779  nnp   2,789  cc    2,794  cc    2,701  nnp   2,938
7    5.8–6.5    vbn   2,435  vbn   2,450  vbn   2,378  vbn   2,660  vbn   2,540  vbn   2,439
8    4.4–5.5    vb    2,133  vb    2,130  vbz   2,259  vbz   2,077  vb    1,811  vb    1,852
9    3.9–5.0    vbz   1,732  vbz   2,053  vb    2,015  vb    1,754  vbz   1,444  vbz   1,602
10   3.0–3.4    md    1,301  vbg   1,417  vbg   1,257  to    1,244  to    1,261  to    1,294
Sub-total             35,460       35,312       35,161       35,472       35,733       35,505
Total                 41,198       41,200       41,198       41,181       41,198       41,192

Figure 5.2  Rank-frequency of governors (first algorithm)

From Table 5.3, we can see that the top nine word classes are basically nn (noun, singular or mass, 21.6%–22.2%), in (preposition, 12.5%–13%), nns (noun, plural, 9.3%–10%), vbd (verb, past tense, 7.7%–9.8%), cc (conjunction, 7.1%–8.4%), nnp (proper noun, singular, 6.6%–7.1%), vbn (verb, past participle, 5.8%–6.5%), vb (verb, base form, 4.4%–5.5%) and vbz (verb, third person singular present, 3.9%–5.0%). The same word classes rank similarly with similar percentages across the sub-corpora. Tables 5.2 and 5.3 suggest the difference in the distributions of governors and dependents. We won’t go into detail here. Figure 5.2 is the visual presentation of the POS rank-frequency data of governors. Like Figure 5.1, the curves basically overlap each other, suggesting homogeneity among the sub-corpora. In order to simplify the issue, we unify the three most frequently occurring content word classes (verbs, nouns and adjectives) separately. For example, vbd (verb, past tense), vb (verb, base form), vbn (verb, past participle), vbz (verb, third person singular present), vbg (verb, gerund or present participle) and vbp (verb, non-third person singular present) are all categorised as verbs under the umbrella term vb. Similarly, all types of nouns, including nn (noun, singular or mass), nnp (proper noun, singular), nns (noun, plural) and nnps (proper noun, plural), are grouped together as nouns (nn). Likewise, jj (adjective, basic form), jjr (adjective, comparative) and jjs (adjective, superlative) are all annotated as jj (adjective). Hereafter in this study, the word classes are all of this coarser granularity unless otherwise specified. We now proceed to study the POS distribution of governors (the second algorithm, with each governor counted only once no matter in how many dependent-governor pairs it occurs) using the uniform coarse granularity of word classes. Table 5.4 presents this POS distribution of governors. Figure 5.3 presents the
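A sketch of the coarse-granularity mapping just described (all verb, noun and adjective subtags collapsed into vb, nn and jj) might look as follows; the function name is an assumption, and the tag inventory is the Penn-style set listed above.

```python
def coarse_pos(tag):
    """Collapse fine-grained verb/noun/adjective tags into vb / nn / jj;
    leave all other tags unchanged (illustrative sketch)."""
    tag = tag.lower()
    if tag in {"vb", "vbd", "vbn", "vbz", "vbg", "vbp"}:
        return "vb"
    if tag in {"nn", "nnp", "nns", "nnps"}:
        return "nn"
    if tag in {"jj", "jjr", "jjs"}:
        return "jj"
    return tag

print([coarse_pos(t) for t in ["vbd", "nns", "jjr", "in", "nnp"]])
# ['vb', 'nn', 'jj', 'in', 'nn']
```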

R.

%

E1

E2

E3

E4

E5

E6

R.

%

Freq.

POS

Freq.

POS

Freq.

POS

Freq.

POS

Freq.

POS

Freq.

POS

1

39–41

9,364

nn

9,256

nn

9,247

nn

9,433

nn

9,665

nn

9,426

nn

2

21–23

5,137

vb

5,327

vb

5,211

vb

5,124

vb

4,873

vb

5,097

vb

3

20–21

4,970

in

4,914

in

4,894

in

4,811

in

4,849

in

4,941

in

4

5

1,084

to

1,149

to

1,092

cc

1,084

to

1,137

to

1,190

to

5

4–5

1,079

cc

1,041

cc

1,079

to

1,047

cc

1,010

cc

1,144

cc

6

2–3

524

md

508

md

524

jj

514

cd

676

cd

461

cd

7

22

447

jj

486

jj

503

md

436

jj

431

jj

451

md

8

2

373

cd

365

cd

393

cd

397

md

430

md

431

jj

9

1

218

rb

207

'

222

rb

229

rb

203

rb

214

'

10

1

187

'

190

rb

213

'

207

'

181

'

164

rb

11

0

63

dt

67

dt

93

$

86

$

112

$

77

$

12

0

50

$

43

$

58

dt

54

dt

29

rbr

54

dt

13

0

26

prp

28

rbr

30

rbr

28

prp

28

dt

46

rbr

14

0

11

wp$

28

prp

24

prp

22

rbr

15

prp

14

prp

15

0

11

wrb

11

wp$

10

lrb

12

fw

8

lrb

4

prp$

16

0

10

rbr

8

prp$

10

wp$

11

wp$

5

prp$

4

wp$ (Continued)

88  Syntactic Dependency Relations and Related Properties

Table 5.4  POS of governors (coarse granularity level) (the second algorithm) (R.= rank, Freq. = frequency)

Table 5.4  (Continued) %

E1

E2

E3

R.

%

Freq.

POS

Freq.

17

0

5

prp$

7

fw

9

18

0

4

fw

6

wdt

19

0

3

rp

6

20

0

3

wdt

21

0

2

22

0

23

0

24

Freq.

POS

Freq.

POS

fw

7

wrb

4

wp$

3

wrb

8

wrb

5

prp$

5

wdt

3

pdt

wrb

7

wp

4

sym

3

wp

2

fw

2

pos

6

prp$

4

wdt

3

pos

2

wdt

pos

1

wp

5

rp

3

wp

2

rp

1

rbs

2

rbs

1

rbs

4

pdt

3

lrb

2

wrb

1

pos

1

wp

1

lrb

1

rbs

3

pos

1

fw

0

1

rp

1

eto

2

uh

1

rbs

25

0

1

pdt

1

wdt

2

rbs

1

pdt

26

0

1

pdt

1

sym

27

0

1

rp

23,654

23,645

POS

E6

POS

23,574

Freq.

E5

Freq.

Total

POS

E4

23,530

23,675

23,730

Syntactic Dependency Relations and Related Properties 89

R.

90  Syntactic Dependency Relations and Related Properties

Figure 5.3  POS rank-frequency of governors (granularity level 2)

results visually, where the overlapping curves suggest homogeneity among the sub-corpora. From Table 5.4, we can see similar proportions of the same word classes occurring across the sub-corpora, particularly for the top three word classes. Nouns (nn) take up percentages of 39%–41%, verbs (vb) 21%–23% and prepositions (in) 20%–21%. Following these top three are usually the infinitive “to” (to) and conjunctions (cc), taking up percentages of about 5%.

5.1.2.3  Fitting ZA model to POS data

Fitting the ZA model to the POS data of dependents, governors (the first algorithm) and governors (the second algorithm) yields the data in Tables 5.5, 5.6 and 5.7, respectively.

Table 5.5  Fitting ZA model to POS rank-frequency data of dependents

      R²       a     b     n    α     N
E1    0.9584   0.33  0.26  38   0.16  41,198
E2    0.9665   0.33  0.26  37   0.16  41,200
E3    0.9650   0.29  0.26  38   0.16  41,198
E4    0.9640   0.32  0.26  37   0.16  41,181
E5    0.9668   0.33  0.27  37   0.17  41,198
E6    0.9685   0.32  0.26  36   0.16  41,192

Table 5.6  Fitting ZA model to POS rank-frequency data of governors (the first algorithm)

      R²       a     b     n    α     N
E1    0.9713   0.18  0.31  32   0.23  41,198
E2    0.9655   0.18  0.31  34   0.22  41,200
E3    0.9684   0.14  0.31  33   0.23  41,198
E4    0.9618   0.25  0.29  36   0.22  41,181
E5    0.9655   0.18  0.33  35   0.23  41,198
E6    0.9646   0.27  0.29  31   0.22  41,192

The determination coefficients R² are all above 0.9584 and suggest excellent fitting results, from which we can claim that all sub-corpora are homogeneous. We can illustrate with data from Table 5.5, where Parameter a bears an M of 0.320 (SD = 0.016), Parameter b an M of 0.262 (SD = 0.004) and Parameter α an M of 0.162 (SD = 0.004). The fittings also validate the ZA model proposed by Hammerl (1990). The negative hypergeometric distribution suggested by Schweers and Zhu (1991) also fits the data excellently, with determination coefficients R² ranging from 0.9740 to 0.9887 for dependents and from 0.9399 to 0.9891 for governors. To save space, the detailed fitting results are not shown here. The fittings shown in Tables 5.5–5.7 once again prove that POS is a regularly distributed syntactic property, whether as dependents or as governors. They also suggest POS is a result of the diversification process, and they validate the sub-corpora as legitimate research materials bearing Zipfian sizes. In Liu’s (2009b) paper, one of the chosen passages does not fit well enough; one possible cause might be its small size (a sample passage of only 233 words), which fails to reach the Zipfian size.

5.1.3  Section summary

In this section, we validate the homogeneity of the six sub-corpora from the perspective of the rank-frequency distribution patterns of word classes (or POS).

Table 5.7  Fitting ZA model to POS rank-frequency data of governors (the second algorithm)

      R²       a     b     n    α     N
E1    0.9601   0.48  0.64  23   0.40  23,574
E2    0.9651   0.49  0.64  25   0.39  23,654
E3    0.9644   0.47  0.62  25   0.39  23,645
E4    0.9668   0.45  0.63  27   0.40  23,530
E5    0.9641   0.42  0.63  26   0.41  23,675
E6    0.9617   0.46  0.63  22   0.40  23,730

These word classes are found to obey the ZA distribution pattern, a reflection of how self-regulation functions in language. Long-term language development brings about a result of gaming between the force of speakers/writers and that of listeners/readers. Some word classes are thus more frequently used than others, simultaneously saving effort for both parties. The curves in Figures 5.2 and 5.3 suggest hierarchies in the selection of POS in the system, as a result of the pursuit of higher hierarchies in accordance with a “Hierarchical Selection Model” (Yu, 2018; Yu et al., 2016). In Section 5.2, we examine the homogeneity from another perspective—the distribution of dependency relations/syntactic functions.

5.2  Validating homogeneity through dependency relation distribution

5.2.1  Introduction

In the field of dependency grammar, there is a general agreement that dependency relations are the basis of dependency grammar, and such relations are binary, asymmetric, and marked (Liu, 2009a, p. 98) (cf. Section 1.3). The syntactic functions in dependency treebanks reflect the dependency relations. Quantitative research abounds on how syntactic functions/dependency relations are distributed. For example, based on a Chinese dependency treebank, Liu (2009b) examines the probability distribution of dependencies. Separately, he examines dependency relations, word classes as dependents, word classes as governors, verbs as governors and nouns as dependents. Most of them are found to be excellently fitted with the ZA distribution. In the study of the multifunctionality of word classes, Liu and Feng (Liu, 2006, 2007, 2009a; Liu & Feng, 2007) propose the concept of the generalised valency pattern (GVP) (cf. Section 4.4.2). In a further study, Liu (2009a) proposes a related concept—probabilistic valency pattern (PVP) theory. In his syntactic synergetic model, Köhler (2012, pp. 186–187) proposes the concepts of polyfunctionality and synfunctionality. The former concerns how many different functions a construction might perform, while synfunctionality indicates how many different functions share a syntactic representation. Most of the existing literature on syntactic functions focuses on the distribution of functions of one particular word class, such as verbs (Gao et al., 2010; Liu, 2011; Liu & Liu, 2011), nouns (Gao, 2010), and adverbs (Yin, 2014), or several major content word classes (e.g., Mo & Shan, 1985; Xu, 2006). There are also studies related to certain syntactic functions with the purpose of finding out which word classes can play such functions. For instance, Liu et al. (2012) find that percentages of word classes playing certain functions can tell styles apart. Some scholars have also studied auxiliaries specifically (e.g., Aziz et al., 2020). Yan and Liu (2017) compare the dependency relations from Great Expectations and Jane Eyre and find that their distributions are significantly different between the two works, suggesting different writing styles of the two authors. Li (2018) conducts a comparative study of two syntactic functions (subjects and objects) in Chinese and English by examining which word classes can function

as subjects and objects. Yan (2018) studies the development of preposition use in deaf students’ written language, exploring the syntactic functions and dependency distance development of preposition-related phrases. Besides language-in-the-mass research methods, linear behaviours of dependency relations have been examined. For instance, Yan (2017) examines the motifs of dependency relations in deaf students’ writings. In Section 5.1, we demonstrated the homogeneity of the research materials from a POS perspective, and in this section, we continue the verification from the perspective of syntactic functions. Liu (2009a) suggests that Chinese dependency relations/syntactic functions can be modelled by the ZA model. We propose a hypothesis to validate the feasibility of Liu’s (2009a) ZA model on English data.

Hypothesis 5.2.1: The rank-frequency data of syntactic functions of the six sub-corpora follow the ZA distribution pattern. The parameters of the model can reflect the homogeneity across the sub-corpora.

5.2.2  Results and discussion

Table 5.8, where the roots are excluded, presents the rank-frequency data of syntactic functions across the sub-corpora. The rankings of syntactic functions are almost identical. We illustrate with the six most frequent functions, all with percentages above 6%.
1 Occurring most frequently is atr (attribute, 33.2%–36.6%). Two factors can account for its high percentages: (1) the high percentages of nouns (cf. Section 5.1.2), and (2) the likely presence of more than one attribute for each noun.
2 Adv (adverbial, 14.3%–15.4%) comes second, as adverbials can be related to several word classes (e.g., verbs, adjectives, adverbs and even nouns).
3 Auxp (preposition, parts of a secondary (multiword) preposition, 13.0%–13.5%) ranks third. This is a result of the high frequency of prepositions in English.
4 Auxa (articles a, an, the, 8.8%–9.4%) ranks fourth. Again, the high frequency of English nouns is a very important contributing factor.
5 Every sentence goes with at least one subject. In compound and complex sentences, there are even more, hence the high ranking of sb (subject, taking up percentages above 8.7% and ranking fourth or fifth).
6 With frequent occurrences of verbs, many of which are transitive, verbal objects are quite frequent in English. Also, all prepositions take prepositional objects. Both factors contribute to the high frequency of obj (object, taking up percentages above 6%), which ranks 5th or 6th across the sub-corpora.
Figure 5.4 shows how syntactic functions are distributed in all the sub-corpora. Again, the overlapping curves constitute evidence of data homogeneity across the sub-corpora. As homogeneity across the sub-corpora is relatively strong, we claim the results do not come from random fluctuations. To save space, we directly present the summary data of all six sub-corpora (Table 5.9). This time, roots are also included.

94  Syntactic Dependency Relations and Related Properties Table 5.8  Rank-frequency data of syntactic functions (R. = rank, Freq. = frequency, F. = function) R.

E1

E2

E3

Freq.

F.

%

Freq.

F.

%

Freq.

F.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Total

13,989 6,202 5,492 3,759 3,718 2,744 1,210 1,168 1,027 663 558 467 92 50 33 26 41,198

atr adv auxp auxa sb obj nr auxv coord auxc pnom pred neg auxx exd auxg

34.0 15.1 13.3 9.1 9.0 6.7 2.9 2.8 2.5 1.6 1.4 1.1 0.2 0.1 0.1 0.1 100.0

13,691 6,301 5,430 3,828 3,629 2,846 1,263 1,208 1,006 863 557 406 74 56 41 1 41,200

atr adv auxp sb auxa obj nr auxv coord auxc pnom pred neg auxx exd auxg

33.2 15.3 13.2 9.3 8.8 6.9 3.1 2.9 2.4 2.1 1.4 1.0 0.2 0.1 0.1 0.0 100.0

13,685 6,207 5,415 3,874 3,837 2,751 1,234 1,168 1,055 751 597 474 74 48 28

atr adv auxp auxa sb obj nr auxv coord auxc pnom pred neg auxx exd

R.

E4

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Total

E5

Freq.

F.

14,241 6,331 5,346 3,880 3,569 2,646 1,193 1,155 1,002 644 565 491 45 39 34

atr adv auxp sb auxa obj nr auxv coord auxc pnom pred neg exd auxx

41,181

34.6 15.4 13.0 9.4 8.7 6.4 2.9 2.8 2.4 1.6 1.4 1.2 0.1 0.1 0.1 100.0

41,198

% 33.2 15.1 13.1 9.4 9.3 6.7 3.0 2.8 2.6 1.8 1.4 1.2 0.2 0.1 0.1 100

E6

Freq.

F.

15,087 5,880 5,441 3,630 3,524 2,516 1,031 973 954 659 502 491 381 70 35 24 41,198

atr adv auxp sb auxa obj auxv nr coord auxc pnom pred auxg neg exd auxx

36.6 14.3 13.2 8.8 8.6 6.1 2.5 2.4 2.3 1.6 1.2 1.2 0.9 0.2 0.1 0.1 100.0

Freq.

F.

14,012 6,105 5,569 3,813 3,637 2,791 1,133 1,111 1,046 760 499 437 154 45 44 36 41,192

atr adv auxp sb auxa obj coord auxv nr auxc pnom pred neg exd auxg auxx

34.0 14.8 13.5 9.3 8.8 6.8 2.8 2.7 2.5 1.8 1.2 1.1 0.4 0.1 0.1 0.1 100

Any dependency relation ranking above roots (ranking 7th, 4.7%) suggests more than one occurrence in an average sentence. These syntactic functions include, in sequence, atr (attribute, 32.7%), adv (adverbial, 14.3%), auxp (preposition, parts of a secondary (multiword) preposition, 12.6%), sb (subject, 8.8%), auxa (articles a, an, the, 8.5%) and obj (object, 6.3%).


Figure 5.4  Rank-frequency distribution of syntactic functions

Fitting the ZA model to the rank-frequency data of dependency relations in the six sub-corpora yields excellent results, with all determination coefficients R² above 0.958 (Table 5.10). Parameters a, b and α validate homogeneity, with Parameter a (M = 0.320, SD = 0.016), b (M = 0.262, SD = 0.004) and α (M = 0.162, SD = 0.004) behaving similarly across the sub-corpora. The result suggests dependency relations are a result of the diversification process and validates Liu’s (2009b) model for English data.

5.2.3  Section summary

In this section, we validate the homogeneity of the six sub-corpora from the perspective of the rank-frequency distribution patterns of dependency relations (or syntactic functions). Like the distribution of POS, dependency relations are found to obey the ZA distribution model, a result suggesting that dependency relations are a result of the diversification process. The homogeneity displayed between the sub-corpora with the less-than-5,000-word size seems to suggest that these sub-corpora have reached a Zipfian size (Zipf, 1932), which is around one twenty-third of the original complete corpus. If we continue this process, we might be able to find an even smaller Zipfian size for some specific purposes.

5.3  Sequencings of individual elements of dependency structure

Having verified the homogeneity of the sub-corpora and proved their eligibility as research material at the syntactic level with language-in-the-mass methods, in this section we can start examining the sequential behaviours of individual elements

Table 5.9  A summary of syntactic functions (aggregated data of six sub-corpora)

Rank   Function   Notes                                                                                        Frequency   %
1      atr        Attribute                                                                                    84,815      32.7
2      adv        Adverbial                                                                                    37,060      14.3
3      auxp       Preposition, parts of a secondary (multiword) preposition                                    32,721      12.6
4      sb         Subject                                                                                      22,729      8.8
5      auxa       Articles a, an, the                                                                          22,015      8.5
6      obj        Object                                                                                       16,310      6.3
7      root       Root of the dependency tree                                                                  12,138      4.7
8      nr         Unidentified                                                                                 6,932       2.7
9      auxv       Auxiliary verbs be, have and do, the infinitive particle to, particles in phrasal verbs      6,846       2.6
10     coord      Coordination node                                                                            6,183       2.4
11     auxc       Subordinating conjunction, subordinator                                                      4,346       1.7
12     pnom       Nominal part of a copular predicate                                                          3,281       1.3
13     pred       Predicate. A node not depending on other nodes.                                              2,767       1.1
14     neg        Negation expressed by the negation particle not, n't                                         509         0.2
15     auxg       Non-terminal and non-coordinating graphic symbols                                            452         0.2
16     others     Others                                                                                       249         0.1
17     exd        A technical value signalling a “missing” (elliptical) governing node; also for the main
                  element of a sentence without predicate (stands for “externally dependent”)                  221         0.1
Total                                                                                                          259,574     100

Table 5.10  Fitting the ZA model to complete rank-frequency data of dependency relations

      R²       a     b     n    α     N
E1    0.9584   0.33  0.26  38   0.16  41,198
E2    0.9665   0.33  0.26  37   0.16  41,200
E3    0.9650   0.29  0.26  38   0.16  41,198
E4    0.9640   0.32  0.26  37   0.16  41,181
E5    0.9668   0.33  0.27  37   0.17  41,198
E6    0.9685   0.32  0.26  36   0.16  41,192

of dependency structures (including, again, word classes and dependency relations). The definition and examples of sequencings are available in Section 4.5.2.

5.3.1  Introduction

There have been prior studies concerning the distribution patterns of POS and syntactic functions as well as those of their motifs (e.g., Best, 1994; Köhler, 1999, 2012; Liu, 2009a; Tuzzi et al., 2009; Ziegler, 1998, 2001). Here we are going to test whether the proposed method of ordered sequencing will work in a similar fashion. To simplify the discussion, we examine first the sequencings of the POS of all the running words, which are all dependents except for the root words. Then sequencings of dependency relations will be examined. We pose the following hypotheses:

Hypothesis 5.3.1: The rank-frequency data of POS sequencings are regularly distributed, following a certain distribution pattern.

Hypothesis 5.3.2: The rank-frequency data of dependency relation sequencings are regularly distributed, following a certain distribution pattern.

In the next part, we will present the results and discussion. In the final part of this section, we will summarise the whole section.

5.3.2  Results and discussion

In this part, we address the two hypotheses in turn. We will first examine POS sequencings.

5.3.2.1  POS sequencings

Table 5.11 presents a summary of the POS sequencing data, which shows that the sequencings of POS generally behave homogeneously across the sub-corpora, at least where TTR (M = 0.6483, SD = 0.00983) is concerned. Many types of sequencings have a very low probability. To make sure we can find patterns that are more likely to occur, we leave out all the sequencings with a probability lower than 0.01%, which we believe occur only at random. This truncation generates the data in Table 5.12. We deem that these remaining types (371–392) do not occur randomly, as each type occurs at least 58 times in the respective sub-corpora. The truncation again suggests homogeneity across the sub-corpora, with TTR (M = 0.0030, SD = 0.00894) and POS sequencing percentages (M = 21.4833, SD = 0.56710) behaving alike.

Table 5.11  A summary of POS sequencing data (TTR = type-token ratio = type/token)

      Type      Token     TTR
E1    400,760   605,370   0.66
E2    399,283   604,832   0.66
E3    384,135   586,971   0.65
E4    372,476   578,872   0.64
E5    379,812   596,644   0.64
E6    371,332   576,430   0.64

Table 5.12  Truncated POS sequencing data

      POS sequencing type   POS sequencing token   POS sequencing TTR   POS sequencing %
E1    371                   126,365                0.0029               20.9
E2    392                   127,558                0.0031               22.0
E3    383                   126,582                0.0030               21.6
E4    365                   126,036                0.0029               20.8
E5    387                   127,739                0.0030               21.4
E6    391                   128,030                0.0031               22.2

Table 5.13 presents the top 10 POS sequencings for each sub-corpus. The same sequencing enjoys the same or similar ranking with similar frequencies, and thus

Table 5.13  Top 10 POS sequencings (individual corpora) (R. = rank, Freq. = frequency) R.

E1

Freq.

%

E2

Freq.

%

E3

Freq.

%

1 2 3 4 5 6 7 8 9 10

nn vb in dt nn-nn jj nn-in dt-nn jj-nn nn-vb Sub-total

14,598 6,358 5,070 4,320 4,044 3,671 2,870 2,851 2,581 2,270

11.6 5.0 4.0 3.4 3.2 2.9 2.3 2.3 2.0 1.8 38.6

nn vb in nn-nn dt jj nn-in dt-nn jj-nn nn-vb

14,591 6,680 5,143 4,238 4,211 3,347 2,794 2,733 2,378 2,316

11.5 5.3 4.1 3.4 3.3 2.6 2.2 2.2 1.9 1.8 38.3

nn vb in dt nn-nn jj dt-nn nn-in jj-nn nn-vb

14,184 6,505 5,072 4,425 3,971 3,550 2,898 2,768 2,495 2,233

11.2 5.1 4.0 3.5 3.1 2.8 2.3 2.2 2.0 1.8 38.0

R.

E4

Freq.

%

E5

Freq.

%

E6

Freq.

%

1 2 3 4 5 6 7 8 9 10

nn vb in nn-nn dt jj nn-in dt-nn jj-nn nn-vb Sub-total

14,755 6,321 4,893 4,442 4,018 3,525 2,679 2,675 2,498 2,406

11.6 5.0 3.8 3.5 3.1 2.8 2.1 2.1 2.0 1.9 37.8

nn vb in nn-nn dt jj nn-in dt-nn jj-nn cd

15,031 5,978 4,952 4,571 4,013 3,405 2,772 2,751 2,474 2,418

11.8 4.7 3.9 3.6 3.1 2.7 2.2 2.2 1.9 1.9 37.9

nn vb in nn-nn dt jj nn-in dt-nn nn-vb jj-nn

14,898 6,467 5,135 4,403 4,132 3,220 2,855 2,816 2,444 2,318

11.6 5.1 4.0 3.4 3.2 2.5 2.2 2.2 1.9 1.8 38.0

Table 5.14  Top 20 POS sequencing (all the corpora)

Rank  Sequencing  %      Rank  Sequencing  %
1     nn          11.55  11    cd          1.44
2     vb          5.03   12    in-dt       1.32
3     in          3.97   13    in-nn       1.26
4     nn-nn       3.37   14    rb          1.21
5     dt          3.30   15    nn-nn-nn    0.99
6     jj          2.72   16    cc          0.95
7     nn-in       2.20   17    to          0.90
8     dt-nn       2.19   18    in-dt-nn    0.87
9     jj-nn       1.93   19    vb-dt       0.84
10    nn-vb       1.84   20    dt-jj       0.80

we can further claim the data are generally homogeneous across the sub-corpora. Therefore, we might be able to come up with the most frequent patterns through examining their occurrences in all the sub-corpora. Appendix 3 lists the top 100 POS sequencings (all the corpora). We briefly examine the top 20 rankings (Table 5.14). Half of the top 10 are single-element sequencings like nn, vb, in, dt and jj. The other half of the top 10 go with two elements, one of which is a noun. Three-element sequencings come 15th and 18th; both of them have at least one noun in the sequencing. The two-or-more-element sequencings bear a very close relationship with nouns, which is in accordance with the high percentages of nouns in English (Liang & Liu, 2013). Finally, we fit both the ZA (Table 5.15) and Zipf-Mandelbrot (ZM) (Table 5.16) models to the rank-frequency data of the POS sequencings with a probability over 0.01%.

Table 5.15  Fitting the ZA model to rank-frequency data of POS sequencings (probability > 0.01%)

Input data   X²        P(X²)  C       DF   R²      a     b     n    α     N
E1-pos       1,826.83  0      0.0145  360  0.9921  0.34  0.09  365  0.12  126,036
E2-pos       3,393.2   0      0.0269  366  0.9819  0.19  0.11  371  0.12  126,365
E3-pos       3,350.03  0      0.0265  378  0.9828  0.20  0.11  383  0.11  126,582
E4-pos       3,596.4   0      0.0282  387  0.9827  0.16  0.11  392  0.12  127,558
E5-pos       2,446.95  0      0.0192  382  0.9901  0.29  0.09  387  0.12  127,739
E6-pos       1,989.21  0      0.0155  386  0.9917  0.34  0.09  391  0.12  128,030

Table 5.16  Fitting the ZM model to rank-frequency data of POS sequencings (probability > 0.01%)

Input data   X²        P(X²)  C       DF   R²      a     b     n    N
E1-pos       1,109.58  0      0.0088  361  0.9783  1.00  0.72  365  126,036
E2-pos       885.96    0      0.007   367  0.9823  1.00  0.71  371  126,365
E3-pos       840.3     0      0.0066  379  0.9834  0.99  0.72  383  126,582
E4-pos       1,108.93  0      0.0087  388  0.9767  1.00  0.70  392  127,558
E5-pos       1,654.1   0      0.0129  383  0.9637  1.01  0.91  387  127,739
E6-pos       979.94    0      0.0077  387  0.9789  1.00  0.69  391  128,030

These two models are particularly chosen because they are the most well-established models for examining the distribution patterns of linguistic properties or elements. The former fitting (Table 5.15) generates R² values all above 0.982, with the parameter value of α approximating 0.12 (M = 0.1183, SD = 0.00408) in all the sub-corpora. The latter fitting results (Table 5.16) are still excellent, though not as good as the former. The values of Parameter a are quite similar, all approximating 1.00 (M = 1.0000, SD = 0.00632). The previous findings validate the linguistic status of such sequencings and corroborate Hypothesis 5.3.1.
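For comparison with the ZA sketch given earlier, the rank-frequency form of the Zipf-Mandelbrot model can be written as f(r) proportional to (r + b)^-a. The normalisation over the truncated range 1..n and the fitting routine below are the writer's assumptions, offered only as an illustration; they are not the software actually used for Tables 5.16 and 5.20.

```python
import numpy as np
from scipy.optimize import minimize

def zm_probs(a, b, n):
    """Right-truncated Zipf-Mandelbrot probabilities, p(r) proportional to (r + b)^-a."""
    r = np.arange(1, n + 1, dtype=float)
    mass = (r + b) ** -a
    return mass / mass.sum()

def fit_zm(freqs):
    """Fit (a, b) to an observed rank-frequency vector and report R^2."""
    f = np.asarray(sorted(freqs, reverse=True), dtype=float)
    n, total = len(f), f.sum()
    obs = f / total

    def sse(params):
        a, b = params
        if b <= -1.0:                      # keep (r + b) positive for all ranks
            return np.inf
        return np.sum((obs - zm_probs(a, b, n)) ** 2)

    a, b = minimize(sse, x0=[1.0, 0.7], method="Nelder-Mead").x
    expected = zm_probs(a, b, n) * total
    r2 = 1.0 - np.sum((f - expected) ** 2) / np.sum((f - f.mean()) ** 2)
    return {"a": a, "b": b, "n": n, "N": total, "R2": r2}

# Toy usage with made-up counts:
print(fit_zm([500, 230, 150, 110, 80, 60, 45, 30, 20, 12]))
```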

5.3.2.2  Dependency relation sequencings

Having examined the POS sequencings, we examine the sequencings of syntactic functions/dependency relations in a similar fashion. Table 5.17 presents the summary of the relation sequencing data along with that of the POS sequencings. Table 5.18 is the extraction of sequencings with a probability of over 0.01%. For both the complete and the extracted data, POS and relation sequencings present a lot of similarities, at least from the perspectives of types, tokens and TTR.

Table 5.17  A summary of relation and POS sequencing data

      Dependency relations            POS
      Types     Tokens    TTR         Types     Tokens    TTR
E1    393,991   605,370   0.65        400,760   605,370   0.66
E2    395,320   604,832   0.65        399,283   604,832   0.66
E3    378,854   586,971   0.65        384,135   586,971   0.65
E4    365,343   578,872   0.63        372,476   578,872   0.64
E5    372,341   596,644   0.62        379,812   596,644   0.64
E6    367,024   576,430   0.64        371,332   576,430   0.64

Table 5.18  A summary of extracted relation and POS sequencing data (probability > 0.01%)

      Dependency relations                   POS
      Types   Tokens    TTR      %           Types   Tokens    TTR      %
E1    375     129,988   0.0029   21.47       371     126,365   0.0029   20.9
E2    381     129,183   0.0029   21.36       392     127,558   0.0031   22.0
E3    387     129,502   0.0030   22.06       383     126,582   0.0030   21.6
E4    410     133,316   0.0031   23.03       365     126,036   0.0029   20.8
E5    394     135,221   0.0029   22.66       387     127,739   0.0030   21.4
E6    401     130,911   0.0031   22.71       391     128,030   0.0031   22.2

Table 5.19  Fitting the ZA model to rank-frequency data of syntactic function sequencings (probability > 0.01%)

Input data    R²      a     b     n    α     N
E1-function   0.9923  0.49  0.07  375  0.11  129,988
E2-function   0.9891  0.35  0.09  381  0.11  129,183
E3-function   0.9888  0.35  0.09  387  0.11  129,502
E4-function   0.9934  0.60  0.05  410  0.11  133,316
E5-function   0.9880  0.32  0.09  394  0.11  135,221
E6-function   0.9913  0.44  0.07  401  0.11  130,911

Fitting both the ZA (Table 5.19) and ZM (Table 5.20) models to the extracted relation data yields excellent results, with all R² values above 0.979. In the ZA model (Table 5.19), parameter α is 0.11 for every sub-corpus; in the ZM model (Table 5.20), the values of parameter a (M = 1.003333, SD = 0.00816) all approximate 1. The fittings validate the linguistic status of relation sequencings and corroborate Hypothesis 5.3.2.

Table 5.20  Fitting the Zipf-Mandelbrot model to rank-frequency data of syntactic function sequencings (probability > 0.01%)

Input data    X²        P(X²)  C       DF   R²      a     b     n    N
E1-function   1,015.8   0      0.0078  371  0.9792  1.01  0.96  375  129,988
E2-function   920.79    0      0.0071  377  0.9817  1.01  0.97  381  129,183
E3-function   967.47    0      0.0075  383  0.9827  1.00  0.76  387  129,502
E4-function   938.31    0      0.0070  406  0.9804  1.01  0.95  410  133,316
E5-function   780.9     0      0.0058  390  0.9840  1.00  0.70  394  135,221
E6-function   965.1     0      0.0074  397  0.9807  0.99  0.74  401  130,911

102  Syntactic Dependency Relations and Related Properties Table 5.21  Top 10 relation sequencings in each sub-corpus R.

E1

Freq.

%

E2

Freq.

%

E3

Freq.

%

1 2 3 4 5 6 7 8 9 10

atr adv atr-atr auxp auxa sb obj auxp-atr atr-atr-atr auxa-atr Sub-total

14,010 6,210 5,751 5,496 3,765 3,727 2,752 2,463 2,305 2,167

2.31 1.03 0.95 0.91 0.62 0.62 0.45 0.41 0.38 0.36 8.04

atr adv atr-atr auxp sb auxa obj auxp-atr atr-atr-atr auxa-atr

13,743 6,314 5,561 5,443 3,834 3,635 2,848 2,335 2,233 2,097

2.27 1.04 0.92 0.90 0.63 0.60 0.47 0.39 0.37 0.35 7.94

atr adv atr-atr auxp auxa sb obj auxp-atr auxa-atr atr-atr-atr

13,698 6,213 5,458 5,420 3,880 3,842 2,755 2,349 2,229 2,062

2.33 1.06 0.93 0.92 0.66 0.65 0.47 0.40 0.38 0.35 8.16

R.

E4

Freq.

%

E5

Freq.

%

E6

Freq.

%

1 2 3 4 5 6 7 8 9 10

atr adv atr-atr auxp sb auxa obj atr-atr-atr auxp-atr root Sub-total

14,241 6,331 6,042 5,346 3,880 3,569 2,646 2,493 2,392 2,072

2.46 1.09 1.04 0.92 0.67 0.62 0.46 0.43 0.41 0.36 8.47

atr atr-atr adv auxp sb auxa atr-atr-atr auxp-atr obj auxa-atr

15,111 6,789 5,887 5,447 3,633 3,529 3,069 2,589 2,518 2,054

2.53 1.14 0.99 0.91 0.61 0.59 0.51 0.43 0.42 0.34 8.49

atr adv auxp atr-atr sb auxa obj auxp-atr root auxa-atr

14,012 6,105 5,569 5,549 3,813 3,637 2,791 2,628 2,068 2,067

2.43 1.06 0.97 0.96 0.66 0.63 0.48 0.46 0.36 0.36 8.37

Now we turn to specific sequencings. Table 5.21 presents the top 10 relation sequencings in each sub-corpus. With the same sequencings ranking (nearly) identically with similar percentages, we claim that the six sub-corpora are basically homogeneous. Thus, we are able to examine the general trend in the total data (six sub-corpora combined). Appendix 4 summarises the top 100 relation sequencings in the collection of all six sub-corpora. We first briefly analyse the top 10 sequencings (Table 5.22), most of which are more frequent than roots, indicating at least one occurrence in each sentence on average. There are six sequencings with only one element (atr, adv, auxp, sb, auxa and obj), suggesting the great significance of these syntactic functions (attribute, adverbial, secondary preposition, subject, article and object, in sequence). Three sequencings have two elements, all with an atr (attribute). The individually occurring atr accounts for 10.76% of the whole extracted data, ranking top. So it is quite understandable that in the top 10 sequencings there is also one sequencing with three identical elements (atr-atr-atr). This suggests that it is quite commonplace for several attributes to occur before a noun. Ranking 11th is the root, taking an average percentage of 1.54. Where the following nine rankings are concerned, these sequencings occur less

Table 5.22  Top 20 relation sequencing in all the sub-corpora

Rank  Sequencing   %      Rank  Sequencing     %
1     atr          10.76  11    root           1.54
2     adv          4.70   12    atr-adv        1.33
3     atr-atr      4.46   13    atr-sb         1.28
4     auxp         4.15   14    atr-obj        1.14
5     sb           2.88   15    auxp-auxa      1.14
6     auxa         2.79   16    adv-auxp       1.13
7     obj          2.07   17    auxp-atr-atr   1.02
8     auxp-atr     1.87   18    obj-auxp       0.91
9     atr-atr-atr  1.80   19    auxp-adv       0.88
10    auxa-atr     1.61   20    nr             0.88

frequently than roots. Except for nr (unidentified relations), the rest occur with two or three components. These double- or triple-component sequencings either bear an element of atr (attribute) or one of auxp (secondary preposition), or both (e.g., auxp-atr-atr). The examination of sequencings enables the investigation of repeated elements in an ordered sequence, which is an advantage over the method of examining motifs.

5.3.3  Section summary

This section examines whether the proposed concept of sequencing is a valid linguistic unit through the examination of POS sequencings and dependency relation sequencings. It is found that the rank-frequency distributions of these two types of sequencings with a probability higher than 0.01% observe the ZA and ZM models. In fact, when we examine all the data without extracting the lower-probability sequencings, they are found to behave quite similarly. As it is more time-consuming and makes little difference, we decide it is more feasible to examine only the sequencings with a probability of at least 0.01%. In this section we examined the sequencings of individual elements of the dependency structures, namely the first and third “combinations”; in the next two sections, we examine Combinations 4 through 7, which are defined in Section 4.4.1. In Section 5.4, we focus on their rank-frequency distributions. In Section 5.5, we go on to examine these combinations through the lens of their motifs.

5.4  Distributions of combinatorial dependency structures

5.4.1  Introduction

In this section, we examine the rank-frequency distribution of the four combinations of elements from the complete dependency structure “dependent (POS) + governor (POS) = syntactic function/dependency relation”, namely:

Combination 4: Dependent (POS) + governor (POS);
Combination 5: Dependent (POS) + [] = syntactic function;
Combination 6: [] + governor (POS) = syntactic function; and
Combination 7: The complete dependency structure.

What needs to be pointed out is that both the dependents and governors here refer not to the concrete words but rather to their POS, an abstraction of one of the qualities of words, as illustrated in the sketch below. We posit:

Hypothesis 5.4.1: The four types of combinations are regularly distributed, following the same ZA distribution pattern, and are thus results of the diversification process.
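As an illustration of how these four combinations can be read off a parsed corpus, the sketch below assumes that each dependency is stored as a (dependent POS, governor POS, relation) triple; the variable names and the sample triples are invented for the example (cf. Section 4.4.1 for the definitions themselves).

```python
from collections import Counter

# Toy parsed material: one (dependent POS, governor POS, relation) triple per dependency.
triples = [("dt", "nn", "auxa"), ("nn", "vb", "sb"),
           ("jj", "nn", "atr"), ("nn", "in", "adv")]

combination4 = Counter((d, g) for d, g, _ in triples)        # dependent + governor
combination5 = Counter((d, rel) for d, _, rel in triples)    # dependent + [] = function
combination6 = Counter((g, rel) for _, g, rel in triples)    # [] + governor = function
combination7 = Counter(triples)                              # the complete structure

for name, counter in [("4", combination4), ("5", combination5),
                      ("6", combination6), ("7", combination7)]:
    print("Combination", name, dict(counter))
```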

5.4.2  Results and discussion

This part takes turns to investigate the distribution patterns of Combinations 4 through 7. To save space, in the result-presenting tables we only present the top 10. And to highlight the general tendency, we only present the curves of ranks 1 to 50.

5.4.2.1  Combination 4: “dependent + governor”

Table 5.23 is a data summary of combination 4, which pairs up dependents and governors. TTR (M = 0.0057, SD = 0.00040), entropy (M = 5.0317, SD = 0.02137) and repeat rates (M = 0.05167, SD = 0.00408) all show some homogeneity across the sub-corpora.

Table 5.23  A data summary of combination 4

Input data        Types  Tokens   TTR     Entropy  Repeat rate
E1-Combination4   233    41,262   0.0056  5.01     0.05
E2-Combination4   232    41,311   0.0056  5.04     0.05
E3-Combination4   254    41,240   0.0062  5.07     0.05
E4-Combination4   230    41,181   0.0056  5.02     0.05
E5-Combination4   253    41,250   0.0061  5.03     0.06
E6-Combination4   209    41,192   0.0051  5.02     0.05


Figure 5.5  Rank-frequency curve of combination 4 (top 50)

Figure 5.5 exhibits the top 50 rank-frequency data. The curves of all sub-corpora basically overlap each other, further indicating the homogeneity among the sub-corpora. Fitting the ZA model to the complete rank-frequency data of combination 4 yields an excellent result, with all determination coefficients R² above 0.967 (Table 5.24). Parameters b (M = 0.2450, SD = 0.02588) and α (M = 0.1083, SD = 0.00753) suggest homogeneity among the sub-corpora. The fitting suggests combination 4 is a normal linguistic unit and a result of a diversification process. Table 5.25 displays the top 10 “dependent + governor” structures, accounting for 65.2%–66.1% of the total. It shall be noted that here the “dependent + governor” structure doesn’t suggest a linear order. For instance, both the phrase “mission◎ impossible” and the phrase “important mission◎” are marked as a combination of “dependent + governor”. The most common nine combinations follow similar rankings among the sub-corpora. Successively, they are: “nn+nn” (nouns modifying nouns), “nn+vb” (nouns modifying verbs) or “dt+nn” (determiners modifying nouns), “nn+in” (nouns as prepositional objects), “jj+nn” (adjectives modifying nouns), “in+vb” (prepositional phrases modifying verbs), “in+nn” (prepositional phrases modifying nouns), “vb+vb” (verbs modifying verbs) and “nn+cc” (nouns modifying conjunctions). Of these structures, four have nouns as governors (“nn+nn”, “dt+nn”, “jj+nn” and “in+nn”), another four have verbs as governors (“nn+vb”, “in+vb”, “vb+vb” and “rb+vb”), and another two have nouns as dependents (“nn+in” and “nn+cc”). These regularities accord with the high frequencies of verbs and particularly nouns in English.

Table 5.24  Fitting the ZA model to rank-frequency data of "dependent + governor"

Input data | R² | a | b | n | α | N
E1 | 0.9686 | 0.36 | 0.21 | 233 | 0.10 | 41,262
E2 | 0.9926 | 0.01 | 0.28 | 232 | 0.11 | 41,311
E3 | 0.9884 | 0.07 | 0.26 | 254 | 0.10 | 41,240
E4 | 0.9878 | 0.15 | 0.25 | 230 | 0.11 | 41,181
E5 | 0.9782 | 0.29 | 0.22 | 253 | 0.12 | 41,250
E6 | 0.9868 | 0.13 | 0.25 | 209 | 0.11 | 41,192

5.4.2.2  Combination 5: “dependent + [] = syntactic function”

Table 5.25  Rank-frequency data of combination 4 (top 10) (R. = rank, Freq. = frequency)

R. | E1 | Freq. | % | E2 | Freq. | % | E3 | Freq. | %
1 | nn+nn | 4,259 | 10.3 | nn+nn | 4,438 | 10.7 | nn+nn | 4,123 | 10.0
2 | dt+nn | 4,059 | 9.8 | nn+vb | 4,208 | 10.2 | dt+nn | 4,111 | 10.0
3 | nn+vb | 4,001 | 9.7 | dt+nn | 3,905 | 9.5 | nn+vb | 4,072 | 9.9
4 | nn+in | 3,601 | 8.7 | nn+in | 3,444 | 8.3 | nn+in | 3,505 | 8.5
5 | jj+nn | 2,954 | 7.2 | jj+nn | 2,671 | 6.5 | jj+nn | 2,850 | 6.9
6 | in+vb | 2,280 | 5.5 | in+vb | 2,400 | 5.8 | in+vb | 2,218 | 5.4
7 | in+nn | 2,074 | 5.0 | in+nn | 1,977 | 4.8 | in+nn | 2,026 | 4.9
8 | vb+vb | 1,569 | 3.8 | vb+vb | 1,680 | 4.1 | vb+vb | 1,639 | 4.0
9 | nn+cc | 1,517 | 3.7 | nn+cc | 1,369 | 3.3 | nn+cc | 1,385 | 3.4
10 | rb+vb | 954 | 2.3 | rb+vb | 969 | 2.3 | rb+vb | 953 | 2.3
Sub-total | | | 66.1 | | | 65.5 | | | 65.2

R. | E4 | Freq. | % | E5 | Freq. | % | E6 | Freq. | %
1 | nn+nn | 4,654 | 11.3 | nn+nn | 4,992 | 12.1 | nn+nn | 4,505 | 10.9
2 | nn+vb | 4,120 | 10.0 | nn+vb | 3,935 | 9.5 | nn+vb | 4,120 | 10.0
3 | dt+nn | 3,801 | 9.2 | dt+nn | 3,800 | 9.2 | dt+nn | 3,856 | 9.4
4 | nn+in | 3,488 | 8.5 | nn+in | 3,506 | 8.5 | nn+in | 3,517 | 8.5
5 | jj+nn | 2,799 | 6.8 | jj+nn | 2,769 | 6.7 | jj+nn | 2,594 | 6.3
6 | in+vb | 2,332 | 5.7 | in+vb | 2,287 | 5.5 | in+vb | 2,315 | 5.6
7 | in+nn | 1,916 | 4.7 | in+nn | 2,020 | 4.9 | in+nn | 2,094 | 5.1
8 | vb+vb | 1,657 | 4.0 | vb+vb | 1,482 | 3.6 | nn+cc | 1,592 | 3.9
9 | nn+cc | 1,315 | 3.2 | nn+cc | 1,331 | 3.2 | vb+vb | 1,581 | 3.8
10 | rb+vb | 954 | 2.3 | cd+nn | 959 | 2.3 | rb+vb | 867 | 2.1
Sub-total | | | 65.7 | | | 65.7 | | | 65.6

Figure 5.6  Generalised valency patterns of verbs. (a) Functions dependents of verbs may play (active valency of verbs) (centrifugal force). (b) Functions verbs may play as dependents (passive valency) (centripetal force)
Source: Gao et al., 2010

In this part, we continue with combination 5, "dependent + [] = syntactic function", namely, what functions different word classes can play. In Section 4.4.2, we introduced the idea of the GVP. If we fix the POS of the dependents in combination 5, we obtain the centripetal force of the generalised valency (Figure 4.10), or the passive valency. For instance, Figure 5.6b (Gao et al., 2010) visually presents the passive part of the probabilistic valency patterns of verbs. In the framework of dependency grammar, we define multifunctionality as the number of functions a word (as a dependent) may play, and we define synfunctionality as the number of POS which may play the same syntactic function. If the aforementioned passive valency of the PVP represents the multiple functions of a particular POS (as a dependent), or its multifunctionality, then the study of synfunctionality examines the probability that various POS (as dependents) may have the same function. We give it a similar term, "probabilistic function patterns".
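The notions of multifunctionality and synfunctionality can be made concrete with a small sketch. The counts in the example are invented for illustration; only the procedure (fixing either the dependent POS or the function and normalising the counts) reflects the idea of probabilistic function patterns described above.

```python
# A minimal sketch of "probabilistic function patterns": fixing the dependent POS
# gives its distribution over functions (multifunctionality); fixing the function
# gives the distribution over the POS that fill it (synfunctionality).
from collections import Counter, defaultdict

def function_patterns(pairs):
    """pairs: iterable of (dependent_POS, syntactic_function) tokens."""
    by_pos, by_function = defaultdict(Counter), defaultdict(Counter)
    for pos, func in pairs:
        by_pos[pos][func] += 1
        by_function[func][pos] += 1
    normalise = lambda c: {k: v / sum(c.values()) for k, v in c.items()}
    return ({p: normalise(c) for p, c in by_pos.items()},       # multifunctionality
            {f: normalise(c) for f, c in by_function.items()})  # synfunctionality

pos_profile, func_profile = function_patterns(
    [("nn", "atr"), ("nn", "sb"), ("nn", "obj"), ("jj", "atr"), ("dt", "auxa")])
print(pos_profile["nn"])    # how often nouns act as attributes, subjects, objects
print(func_profile["atr"])  # which POS fill the attribute slot
```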

Table 5.26  A data summary of combination 5

Input data | Types | Tokens | TTR | Entropy | Repeat rate
E1-Combination5 | 116 | 43,264 | 0.0027 | 4.62 | 0.065
E2-Combination5 | 109 | 43,276 | 0.0025 | 4.64 | 0.064
E3-Combination5 | 108 | 43,264 | 0.0025 | 4.64 | 0.063
E4-Combination5 | 104 | 43,253 | 0.0024 | 4.61 | 0.065
E5-Combination5 | 109 | 43,257 | 0.0025 | 4.59 | 0.069
E6-Combination5 | 109 | 43,260 | 0.0025 | 4.60 | 0.066

Similarly, coming back to combination 5, if we fix the function part, we get the probabilistic POS pattern. For instance, the probability distribution of "dependent + [] = ATR" is thus the probabilistic POS pattern of attributes. In this part, we include all the possible combinations of "dependent + [] = syntactic function". Table 5.26 summarises the data for combination 5, with TTR (M = 0.0025, SD = 0.0001), entropy (M = 4.6167, SD = 0.02066) and repeat rates (M = 0.0653, SD = 0.00207) displaying some homogeneity across the sub-corpora. Table 5.27 presents the frequency data for the top 10 ranks, which account for 68.1%–70% of the total. Figure 5.7 presents the curves of the rank-frequency data of combination 5 (top 10). Fitting the ZA model to the complete rank-frequency data of combination 5 yields determination coefficients R² all above 0.975 (Table 5.28), hence a validation of the linguistic status of combination 5. The parameters behave similarly, with b (M = 0.2500, SD = 0.00632), n (M = 109.1667, SD = 3.86868) and α (M = 0.1600, SD = 0.00632) displaying some homogeneity across the sub-corpora. The overlapping curves of the six sub-corpora further validate their homogeneity. Some regularities can be seen from Table 5.27, as follows:

1 The first 7 ranks are identical among the sub-corpora, a strong indication of homogeneity. In sequence they are:
a "nn+[ ]=atr" (15.2%–17.4%), where a noun acts as an attribute;
b "in+[ ]=auxp" (9.8%–10.1%), where the preposition is an auxiliary word;
c "dt+[ ]=auxa" (8.2%–9.0%), where a determiner modifies a noun;
d "jj+[ ]=atr" (6.4%–7.1%), where an adjective plays the role of an attribute;
e "nn+[ ]=sb" (5.7%–6.2%), where a noun acts as the subject;
f "nn+[ ]=adv" (5.5%–5.9%), where a noun occurs after a preposition and together they play the role of an adverbial; and
g "nn+[ ]=obj" (4.5%–5.1%), where a noun is the object.

Table 5.27  Rank-frequency data for combination 5 across the sub-corpora (top 10) (R. = rank, Freq. = frequency)

R. | E1 | Freq. | % | E2 | Freq. | % | E3 | Freq. | %
1 | nn+[ ]=atr | 6,756 | 15.6 | nn+[ ]=atr | 6,754 | 15.6 | nn+[ ]=atr | 6,555 | 15.2
2 | in+[ ]=auxp | 4,380 | 10.1 | in+[ ]=auxp | 4,271 | 9.9 | in+[ ]=auxp | 4,309 | 10.0
3 | dt+[ ]=auxa | 3,759 | 8.7 | dt+[ ]=auxa | 3,628 | 8.4 | dt+[ ]=auxa | 3,878 | 9.0
4 | jj+[ ]=atr | 3,085 | 7.1 | jj+[ ]=atr | 2,781 | 6.4 | jj+[ ]=atr | 2,999 | 6.9
5 | nn+[ ]=adv | 2,573 | 5.9 | nn+[ ]=sb | 2,589 | 6.0 | nn+[ ]=sb | 2,453 | 5.7
6 | nn+[ ]=sb | 2,528 | 5.8 | nn+[ ]=adv | 2,437 | 5.6 | nn+[ ]=adv | 2,412 | 5.6
7 | nn+[ ]=obj | 2,156 | 5.0 | nn+[ ]=obj | 2,228 | 5.1 | nn+[ ]=obj | 2,170 | 5.0
8 | vb+[ ]=adv | 1,773 | 4.1 | vb+[ ]=adv | 1,941 | 4.5 | vb+[ ]=adv | 1,853 | 4.3
9 | vb+[ ]=root | 1,492 | 3.4 | vb+[ ]=root | 1,519 | 3.5 | vb+[ ]=root | 1,577 | 3.6
10 | rb+[ ]=adv | 1,332 | 3.1 | rb+[ ]=adv | 1,336 | 3.1 | rb+[ ]=adv | 1,339 | 3.1
Sub-total | | | 69.0 | | | 68.1 | | | 68.3

R. | E4 | Freq. | % | E5 | Freq. | % | E6 | Freq. | %
1 | nn+[ ]=atr | 7,010 | 16.2 | nn+[ ]=atr | 7,530 | 17.4 | nn+[ ]=atr | 7,086 | 16.4
2 | in+[ ]=auxp | 4,240 | 9.8 | in+[ ]=auxp | 4,282 | 9.9 | in+[ ]=auxp | 4,363 | 10.1
3 | dt+[ ]=auxa | 3,562 | 8.2 | dt+[ ]=auxa | 3,527 | 8.2 | dt+[ ]=auxa | 3,635 | 8.4
4 | jj+[ ]=atr | 2,926 | 6.8 | jj+[ ]=atr | 2,884 | 6.7 | jj+[ ]=atr | 2,748 | 6.4
5 | nn+[ ]=sb | 2,582 | 6.0 | nn+[ ]=sb | 2,631 | 6.1 | nn+[ ]=sb | 2,677 | 6.2
6 | nn+[ ]=adv | 2,492 | 5.8 | nn+[ ]=adv | 2,400 | 5.5 | nn+[ ]=adv | 2,407 | 5.6
7 | nn+[ ]=obj | 2,068 | 4.8 | nn+[ ]=obj | 1,946 | 4.5 | nn+[ ]=obj | 2,176 | 5.0
8 | vb+[ ]=adv | 1,865 | 4.3 | cd+[ ]=atr | 1,859 | 4.3 | vb+[ ]=adv | 1,819 | 4.2
9 | vb+[ ]=root | 1,607 | 3.7 | vb+[ ]=adv | 1,738 | 4.0 | vb+[ ]=root | 1,648 | 3.8
10 | cd+[ ]=atr | 1,454 | 3.4 | vb+[ ]=root | 1,480 | 3.4 | cd+[ ]=atr | 1,348 | 3.1
Sub-total | | | 68.9 | | | 70.0 | | | 69.1

Figure 5.7  Rank-frequency curve of "dependent + [] = syntactic function" (top 10)

Table 5.28  Fitting the ZA model to rank-frequency data of combination 5

Input data | R² | a | b | n | α | N
E1-Combination 5 | 0.9791 | 0.23 | 0.25 | 116 | 0.16 | 43,264
E2-Combination 5 | 0.9750 | 0.21 | 0.25 | 109 | 0.16 | 43,276
E3-Combination 5 | 0.9778 | 0.23 | 0.24 | 108 | 0.15 | 43,264
E4-Combination 5 | 0.9778 | 0.19 | 0.25 | 104 | 0.16 | 43,253
E5-Combination 5 | 0.9826 | 0.17 | 0.26 | 109 | 0.17 | 43,257
E6-Combination 5 | 0.9784 | 0.20 | 0.25 | 109 | 0.16 | 43,260

2 In most cases, ranking 8th to 10th are "vb+[ ]=adv" (4.1%–4.5%, a verb acting as an adverbial), "vb+[ ]=root" (3.4%–3.8%, a verb being the root of the dependency tree), "cd+[ ]=atr" (2.7%–4.3%, a numeral being an attribute) or "rb+[ ]=adv" (2.7%–3.2%, an adverb acting as an adverbial).
3 Among the top 10 structures, nouns as dependents take up the highest percentage, playing the roles of attributes, adverbials, subjects and objects. The next prominent POS is the verb, which acts as an adverbial or as the root of its dependency tree. Verbs as adverbials ("vb+[ ]=adv") occur more often than verbs as roots ("vb+[ ]=root"). Following the verb/noun-related structures are prepositions as auxiliary words ("in+[ ]=auxp"), determiners and adjectives as attributive elements ("dt+[ ]=auxa", "jj+[ ]=atr") and adverbs as adverbials ("rb+[ ]=adv").

The study of combination 5 further validates the feasibility of examining "probabilistic function patterns" as well as "probabilistic valency patterns" (the passive valency part, or the part of centripetal force).

5.4.2.3  Combination 6: "[] + governor = syntactic function"

After the analysis of dependents acting as certain syntactic functions, we proceed with governors which call for some dependents to fulfil certain functions. The relevant structure is combination 6 “[] + governor (POS) = syntactic function”. Fix the POS of the governor in combination 6 and we can obtain the centrifugal force of the generalised valency (Figure 4.10), or the active valency. For instance, Figure 5.6a (Gao et al., 2010) visually presents the active part of the probabilistic valency patterns of verbs. Active valency and passive valency as introduced in Section 4.4.2 come together to form the binding power of a word class. When represented with probabilities, they become probabilistic valency patterns. Fix the function of the governor in combination 6 and we can know what POS can play a certain function. For instance, examining “[]+governor =obj” is to check which word classes can play the role of objects. This is an essential component of probabilistic function patterns. Further studies will address this interesting research issue.

Table 5.29  A data summary of combination 6

Input data | Types | Tokens | TTR | Entropy | Repeat rate
E1-Combination6 | 130 | 41,262 | 0.0032 | 4.39 | 0.09
E2-Combination6 | 134 | 41,311 | 0.0032 | 4.42 | 0.09
E3-Combination6 | 139 | 41,240 | 0.0034 | 4.44 | 0.09
E4-Combination6 | 131 | 41,181 | 0.0032 | 4.34 | 0.10
E5-Combination6 | 143 | 41,250 | 0.0035 | 4.36 | 0.10
E6-Combination6 | 128 | 41,192 | 0.0031 | 4.42 | 0.09

In this sub-section, we take into consideration all structures of "[] + governor = syntactic function". In a similar fashion, we present the data summary of combination 6 (Table 5.29), the rank-frequency curve of the top 10 frequencies (Figure 5.8), and the fitting of the ZA model to the complete rank-frequency data (Table 5.30). The fitting yields determination coefficients R² all above 0.984, a validation that combination 6 is also a basic linguistic unit and a result of a diversification process. The parameters again display some homogeneity, for instance, TTR (M = 0.00327, SD = 0.00015), entropy (M = 4.3950, SD = 0.03886) and repeat rate (M = 0.0933, SD = 0.00516) in Table 5.29, and b (M = 0.2933, SD = 0.00816) and α (M = 0.2467, SD = 0.00816) in Table 5.30.

Figure 5.8  Rank-frequency curve of “[]+governor=syntactic function” (top 10)

Table 5.30  Fitting the ZA model to complete rank-frequency data of combination 6

Input data | R² | a | b | n | α | N
E1-Combination6 | 0.9849 | 0.04 | 0.29 | 130 | 0.25 | 41,262
E2-Combination6 | 0.9843 | 0.01 | 0.30 | 134 | 0.24 | 41,311
E3-Combination6 | 0.9844 | 0.11 | 0.28 | 139 | 0.24 | 41,240
E4-Combination6 | 0.9856 | 0.01 | 0.30 | 131 | 0.25 | 41,181
E5-Combination6 | 0.9854 | 0.04 | 0.29 | 143 | 0.26 | 41,250
E6-Combination6 | 0.9854 | 0.00 | 0.30 | 128 | 0.24 | 41,192

Table 5.31 presents the top 10 ranks of combination 6, accounting for 74.1%–75.4% of the total. Table 5.32 summarises the data of Table 5.31 in a clearer fashion. The structure of nouns opening slots for attributes ("[ ]+nn=atr", 23.9%–26.2%) accounts for about a quarter of the total, followed by "[ ]+nn=auxa" (8.4%–8.9%), where nouns open slots for determiners. Both of the top 2 have nouns as governors.

Table 5.31  Rank-frequency data of combination 6 (top 10) (individual corpora) (R. = rank, F. = frequency)

R. | E1 | F. | % | E2 | F. | % | E3 | F. | %
1 | [ ]+nn=atr | 10,266 | 24.9 | [ ]+nn=atr | 10,072 | 24.4 | [ ]+nn=atr | 9,836 | 23.9
2 | [ ]+nn=auxa | 3,689 | 8.9 | [ ]+nn=auxa | 3,532 | 8.6 | [ ]+nn=auxa | 3,754 | 9.1
3 | [ ]+vb=sb | 2,756 | 6.7 | [ ]+vb=sb | 2,875 | 7.0 | [ ]+vb=sb | 2,927 | 7.1
4 | [ ]+in=adv | 2,674 | 6.5 | [ ]+in=adv | 2,695 | 6.5 | [ ]+in=adv | 2,637 | 6.4
5 | [ ]+vb=auxp | 2,521 | 6.1 | [ ]+vb=auxp | 2,563 | 6.2 | [ ]+vb=auxp | 2,430 | 5.9
6 | [ ]+nn=auxp | 2,223 | 5.4 | [ ]+vb=obj | 2,213 | 5.4 | [ ]+nn=auxp | 2,215 | 5.4
7 | [ ]+vb=obj | 2,122 | 5.1 | [ ]+nn=auxp | 2,141 | 5.2 | [ ]+vb=obj | 2,094 | 5.1
8 | [ ]+in=atr | 1,954 | 4.7 | [ ]+in=atr | 1,856 | 4.5 | [ ]+in=atr | 1,902 | 4.6
9 | [ ]+vb=adv | 1,666 | 4.0 | [ ]+vb=adv | 1,840 | 4.5 | [ ]+vb=adv | 1,793 | 4.3
10 | [ ]+vb=auxv | 1,073 | 2.6 | [ ]+vb=auxv | 1,120 | 2.7 | [ ]+vb=auxv | 1,093 | 2.7
Sub-total | | | 75.0 | | | 74.8 | | | 74.4

R. | E4 | F. | % | E5 | F. | % | E6 | F. | %
1 | [ ]+nn=atr | 10,348 | 25.1 | [ ]+nn=atr | 10,797 | 26.2 | [ ]+nn=atr | 9,924 | 24.1
2 | [ ]+nn=auxa | 3,494 | 8.5 | [ ]+nn=auxa | 3,445 | 8.4 | [ ]+nn=auxa | 3,542 | 8.6
3 | [ ]+vb=sb | 3,038 | 7.4 | [ ]+vb=sb | 2,770 | 6.7 | [ ]+vb=sb | 2,901 | 7.0
4 | [ ]+in=adv | 2,731 | 6.6 | [ ]+vb=auxp | 2,598 | 6.3 | [ ]+in=adv | 2,672 | 6.5
5 | [ ]+vb=auxp | 2,648 | 6.4 | [ ]+in=adv | 2,579 | 6.3 | [ ]+vb=auxp | 2,625 | 6.4
6 | [ ]+vb=obj | 2,075 | 5.0 | [ ]+nn=auxp | 2,185 | 5.3 | [ ]+nn=auxp | 2,254 | 5.5
7 | [ ]+nn=auxp | 2,036 | 4.9 | [ ]+in=atr | 1,964 | 4.8 | [ ]+vb=obj | 2,100 | 5.1
8 | [ ]+in=atr | 1,810 | 4.4 | [ ]+vb=obj | 1,941 | 4.7 | [ ]+in=atr | 1,940 | 4.7
9 | [ ]+vb=adv | 1,770 | 4.3 | [ ]+vb=adv | 1,576 | 3.8 | [ ]+vb=adv | 1,558 | 3.8
10 | [ ]+vb=auxv | 1,080 | 2.6 | [ ]+cd=atr | 1,053 | 2.6 | [ ]+vb=auxv | 1,020 | 2.5
Sub-total | | | 75.4 | | | 74.9 | | | 74.1

Table 5.32  Rank-frequency data of combination 6 (top 10)

Rank | % | Combination 6 | Note
1 | 23.9–26.2 | [ ]+nn=atr | nouns open slots for attributes
2 | 8.4–8.9 | [ ]+nn=auxa | nouns open slots for determiners
3 | 6.7–7.4 | [ ]+vb=sb | verbs open slots for subjects
4 | 6.3–6.6 | [ ]+in=adv | verbs open slots for prepositional phrases as adverbials
5 | 5.9–6.4 | [ ]+vb=auxp | verbs open slots for prepositions as auxiliary words
6 | 4.9–5.5 | [ ]+nn=auxp | nouns open slots for prepositions as auxiliary words
7 | 4.7–5.4 | [ ]+vb=obj | verbs open slots for objects
8 | 4.4–4.8 | [ ]+in=atr | prepositions open slots so that they together can be attributes
9 | 3.8–4.3 | [ ]+vb=adv | verbs open slots for adverbials
10 | 2.5–2.7 | [ ]+vb=auxv | verbs open slots for auxiliary verbs

5.4.2.4  Combination 7: A complete dependency structure

Combination 7, or "dependent + governor = syntactic function/dependency relation", covers the three essential components of a complete dependency structure. Table 5.33 presents the general distribution of the complete dependency structures. TTR (M = 0.009333, SD = 0.0005164), entropy (M = 5.692333, SD = 0.0371358) and repeat rates (M = 0.040167, SD = 0.0011690) seem to suggest some homogeneity among the sub-corpora. Figure 5.9 presents the top 10 rank-frequency data. Fitting the ZA model to the complete rank-frequency data yields an excellent result, with all corresponding determination coefficients R² above 0.985 (Table 5.34). The data prove the homogeneity among the sub-corpora, with parameters b (M = 0.2167, SD = 0.01033) and α (M = 0.1083, SD = 0.00753). The regular distribution patterns validate the linguistic status of combination 7 as a normal linguistic unit and as a result of a diversification process. The top 10 structures of combination 7 are shown in Table 5.35, accounting for 54.6%–56.0% of the total. Some regularities of the distribution patterns can be seen from Table 5.35. All top 10 structures are related to nouns or verbs, or both.

Table 5.33  A summary of the distribution of combination 7

Input data | Types | Tokens | TTR | Entropy | Repeat rate
E1-Combination7 | 388 | 41,262 | 0.009 | 5.687 | 0.040
E2-Combination7 | 389 | 41,311 | 0.009 | 5.728 | 0.039
E3-Combination7 | 416 | 41,240 | 0.010 | 5.742 | 0.039
E4-Combination7 | 377 | 41,181 | 0.009 | 5.657 | 0.041
E5-Combination7 | 403 | 41,250 | 0.010 | 5.649 | 0.042
E6-Combination7 | 373 | 41,192 | 0.009 | 5.691 | 0.040

Figure 5.9  Rank-frequency curve of combination 7 (top 10)

The high percentage of these structures is partly a result of the high frequency of nouns and verbs (Liang & Liu, 2013).

1 The top 3 structures are all noun structures. The structure "nn+nn=atr" (a noun modifying another noun and serving as an attribute) ranks first in all sub-corpora, with a percentage of 10.3%–12.1%. Among all the attributes of nouns, there are more nouns than adjectives, which might be quite counter-intuitive to many. Adjectives as attributes to nouns ("jj+nn=atr") rank No. 3, accounting for 6.3%–7.2%. Ranking between "nn+nn=atr" and "jj+nn=atr" is a noun structure with determiners, "dt+nn=auxa". About 10% of the words in the corpora are determiners like "the/a", "this/that/those/these", "each", "both" and "another", thus leading to the high frequency of the structure "dt+nn=auxa". Ranking No. 4 or 5 are "in+nn=auxp" (4.5%–4.8%) and "nn+vb=obj" (4.5%–4.8%). The former stands for structures where prepositional phrases modify nouns, and the latter for structures where nouns are objects of verbs.

Table 5.34  Fitting the ZA model to complete rank-frequency data of combination 7

Input data | R² | a | b | n | α | N
E1-Combination7 | 0.9850 | 0.01 | 0.23 | 388 | 0.10 | 41,262
E2-Combination7 | 0.9872 | 0.03 | 0.22 | 389 | 0.11 | 41,311
E3-Combination7 | 0.9853 | 0.05 | 0.22 | 416 | 0.10 | 41,240
E4-Combination7 | 0.9882 | 0.10 | 0.21 | 377 | 0.11 | 41,181
E5-Combination7 | 0.9883 | 0.15 | 0.20 | 403 | 0.12 | 41,250
E6-Combination7 | 0.9855 | 0.02 | 0.22 | 373 | 0.11 | 41,192

Table 5.35  Top 10 structures of combination 7 (R. = rank, Freq. = frequency)

R. | E1 | Freq. | % | E2 | Freq. | % | E3 | Freq. | %
1 | nn+nn=atr | 4,247 | 10.3 | nn+nn=atr | 4,427 | 10.7 | nn+nn=atr | 4,111 | 10.0
2 | dt+nn=auxa | 3,685 | 8.9 | dt+nn=auxa | 3,526 | 8.5 | dt+nn=auxa | 3,752 | 9.1
3 | jj+nn=atr | 2,954 | 7.2 | jj+nn=atr | 2,671 | 6.5 | jj+nn=atr | 2,848 | 6.9
4 | in+nn=auxp | 1,976 | 4.8 | nn+vb=obj | 1,963 | 4.8 | in+nn=auxp | 1,945 | 4.7
5 | nn+vb=obj | 1,875 | 4.5 | in+vb=auxp | 1,878 | 4.5 | nn+vb=obj | 1,871 | 4.5
6 | nn+in=adv | 1,871 | 4.5 | nn+vb=sb | 1,863 | 4.5 | nn+vb=sb | 1,817 | 4.4
7 | in+vb=auxp | 1,847 | 4.5 | in+nn=auxp | 1,858 | 4.5 | nn+in=adv | 1,812 | 4.4
8 | nn+vb=sb | 1,792 | 4.3 | nn+in=adv | 1,846 | 4.5 | in+vb=auxp | 1,761 | 4.3
9 | nn+in=atr | 1,728 | 4.2 | nn+in=atr | 1,597 | 3.9 | nn+in=atr | 1,690 | 4.1
10 | rb+vb=adv | 888 | 2.2 | rb+vb=adv | 933 | 2.3 | rb+vb=adv | 910 | 2.2
Sub-total | | | 55.4 | | | 54.6 | | | 54.6

R. | E4 | Freq. | % | E5 | Freq. | % | E6 | Freq. | %
1 | nn+nn=atr | 4,650 | 11.3 | nn+nn=atr | 4,992 | 12.1 | nn+nn=atr | 4,501 | 10.9
2 | dt+nn=auxa | 3,490 | 8.5 | dt+nn=auxa | 3,445 | 8.4 | dt+nn=auxa | 3,540 | 8.6
3 | jj+nn=atr | 2,797 | 6.8 | jj+nn=atr | 2,769 | 6.7 | jj+nn=atr | 2,592 | 6.3
4 | nn+vb=sb | 1,947 | 4.7 | nn+vb=sb | 1,942 | 4.7 | in+nn=auxp | 1,988 | 4.8
5 | nn+in=adv | 1,887 | 4.6 | in+nn=auxp | 1,930 | 4.7 | nn+vb=sb | 1,985 | 4.8
6 | in+vb=auxp | 1,871 | 4.5 | in+vb=auxp | 1,859 | 4.5 | in+vb=auxp | 1,833 | 4.5
7 | in+nn=auxp | 1,844 | 4.5 | nn+in=adv | 1,824 | 4.4 | nn+vb=obj | 1,821 | 4.4
8 | nn+vb=obj | 1,794 | 4.4 | nn+vb=obj | 1,683 | 4.1 | nn+in=adv | 1,805 | 4.4
9 | nn+in=atr | 1,599 | 3.9 | nn+in=atr | 1,681 | 4.1 | nn+in=atr | 1,709 | 4.1
10 | rb+vb=adv | 914 | 2.2 | cd+nn=atr | 959 | 2.3 | vb+vb=auxv | 856 | 2.1
Sub-total | | | 55.3 | | | 56.0 | | | 54.9

2 Besides "nn+vb=obj" (4.5%–4.8%, nouns serving as objects of verbs), several other verbal structures rank among the top 10: "in+vb=auxp" (prepositional phrases modifying verbs), "nn+vb=sb" (nouns being the subjects of verbs) and "rb+vb=adv" (adverbs modifying verbs as adverbials).
3 Prepositions are also important in the sub-corpora. They can act as dependents, as in the noun structure "in+nn=auxp" and the verb structure "in+vb=auxp". They can also play the role of a governor, as in the structures "nn+in=adv" and "nn+in=atr": a prepositional phrase acts as an adverbial in "nn+in=adv" and as an attribute in "nn+in=atr".

5.4.3  Section summary

The previous section of this chapter focused on the distribution patterns of individual units/properties. This section has turned to the combinatorial distribution of units/properties, namely the various combinations of two to three components of the complete dependency structure "dependent (POS) + governor (POS) = syntactic function/dependency relation". To sum up the research findings in this chapter so far, the various combinations of the dependency structure all follow the same ZA distribution model, which indicates that language is a self-adjusting adaptive system. Some of the structures are used more frequently, while others occur less frequently. The Zipf-related distribution model emerges as a result of the competition and cooperation between speakers/writers and listeners/readers, enabling least effort from both parties. Research on combinatorial units/properties extends the definitions of linguistic units. But the present section only investigates the distribution patterns in a bag-of-words mode. More features of language in the line, of which motifs are a typical example, can be examined. This is the task of the next section.

5.5  Distributions of dependency structure motifs

5.5.1  Introduction

Having examined how the bag-of-words models of the various combinations are distributed, we continue with their linear behaviours by examining their motifs. The definitions of motifs are given in Section 4.5.1, and the next sub-section will illustrate how motifs are generated in this part. As the motifs are built from combinations, which themselves already encode some linear behaviour, we posit that their distribution models might differ from the traditional ones like the ZA (a, b; n = x-max, α fixed) and ZM (a, b; n = x-max) models. We also wonder whether the traditional way of forming motifs makes enough sense for the combinations. Besides, Combinations 4 through 6 each contain two elements, while combination 7, the complete dependency structure, contains three, which further complicates the issue. But we can still predict that the motifs of the four combinations follow the same distribution model (Hypothesis 5.5.1). We will check these combinations in the same order as previously.

5.5.2  Generating motifs

We illustrate with the motifs of combination 4 for the "Pierre" sample sentence (Figure 4.9). As the examined unit is nominal, a motif of such a unit is the longest sequence without repeated elements. Thus, the whole sentence generates only two motifs, as presented in Table 5.36. The second motif starts with "dt+nn", as that combination has already occurred in the previous motif. To avoid ambiguity, we use commas to separate the elements, since combination 4 itself contains a plus sign. In this way, we can get all the motifs for combinations 4 through 7. In this section, we only display distribution graphs for the top 50 frequencies so that the general trend can be seen more easily. When illustrating the cases with top frequencies, we choose those above a percentage of 0.5.
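The procedure can be sketched as follows; the function below simply implements the definition of a motif for nominal units (the longest sequence without repeated elements) and reproduces the two motifs of Table 5.36.

```python
# A sketch of motif extraction for a nominal unit: a repetition closes the
# current motif and opens a new one.  Elements are joined with commas because
# combination 4 already contains a plus sign.
def nominal_motifs(units):
    motifs, current, seen = [], [], set()
    for u in units:
        if u in seen:                 # repetition: close the motif, start a new one
            motifs.append(", ".join(current))
            current, seen = [], set()
        current.append(u)
        seen.add(u)
    if current:
        motifs.append(", ".join(current))
    return motifs

# Combination-4 sequence of the "Pierre" sample sentence (root "will" skipped)
seq = ["nn+nn", "nn+md", "cd+nn", "nn+jj", "jj+nn", "vb+md", "dt+nn", "nn+vb",
       "in+vb", "dt+nn", "jj+nn", "nn+in", "nn+vb", "cd+nn"]
print(nominal_motifs(seq))
# -> ['nn+nn, nn+md, cd+nn, nn+jj, jj+nn, vb+md, dt+nn, nn+vb, in+vb',
#     'dt+nn, jj+nn, nn+in, nn+vb, cd+nn']
```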

Table 5.36  Generating motifs for combination 4 in the "Pierre" sample sentence

 | Dependent | POS | Governor | POS | Combination 4
1 | Pierre | nn | Vinken | nn | nn+nn
2 | Vinken | nn | will | md | nn+md
3 | 61 | cd | years | nn | cd+nn
4 | years | nn | old | jj | nn+jj
5 | old | jj | Vinken | nn | jj+nn
6 | will | md | / | | /
7 | join | vb | will | md | vb+md
8 | the | dt | board | nn | dt+nn
9 | board | nn | join | vb | nn+vb
10 | as | in | join | vb | in+vb
11 | a | dt | director | nn | dt+nn
12 | non-executive | jj | director | nn | jj+nn
13 | director | nn | as | in | nn+in
14 | Nov. | nn | join | vb | nn+vb
15 | 29. | cd | Nov. | nn | cd+nn
Motif 1: nn+nn, nn+md, cd+nn, nn+jj, jj+nn, vb+md, dt+nn, nn+vb, in+vb
Motif 2: dt+nn, jj+nn, nn+in, nn+vb, cd+nn

5.5.3  Results and discussion

This sub-section investigates the motifs of the four combinations in turn.

5.5.3.1  Combination 4: "dependent + governor"

First, we get the general information for the motifs of combination 4, the dependent-governor pair (Table 5.37). The data suggest general homogeneity among the sub-corpora, with TTR (M = 0.669706, SD = 0.0222911) and entropy (M = 11.347098, SD = 0.161260356). As it is not necessary to display more than five thousand types, we only visually present the top 10 frequencies (Figure 5.10).

Table 5.37  A summary of motifs of combination 4

Input data | Types | Tokens | TTR | Entropy
Combination4-E1 | 5624 | 8279 | 0.679309 | 11.435537
Combination4-E2 | 5622 | 8252 | 0.681289 | 11.431673
Combination4-E3 | 5613 | 8108 | 0.692279 | 11.487903
Combination4-E4 | 5497 | 8356 | 0.657851 | 11.289096
Combination4-E5 | 5369 | 8518 | 0.630312 | 11.047232
Combination4-E6 | 5595 | 8262 | 0.677197 | 11.391146

Figure 5.10  Rank-frequency curve of motifs of combination 4 (top 10)

Figure 5.10 clearly indicates that, except for the highest frequency (all above 500), the slope is quite gentle. In fact, from rank 2 onwards the frequencies are quite small (less than 100). Table 5.38 illustrates the top five motifs for each sub-corpus, covering 9.2%–12.6% of the total. All the top 5 combinations are noun-related, which is closely related to the high frequency of nouns. Ranking No. 1 is the combination "nn+nn", where a noun modifies another noun. As the motifs of nominal properties are defined as composed of non-repeated elements, we can expect that the following motif is again "nn+nn". This suggests that it is quite common for several nouns to occur together, with one of them being the governor and the rest dependents. This motif of "nn+nn" alone takes up more than 6.6% of the total, followed by a motif accounting for less than 0.8%. The low frequencies of the other motifs suggest that it is not common for other dependent-governor pairs to habitually co-occur in a neighbouring fashion. As expected, fitting the complete data with the Altmann Fitter shows that neither the ZA nor the ZM model is suitable. For the former, the determination coefficients R² for two of the sub-corpora (E2 and E5) are simply 0, indicating a complete failure (Table 5.39). For the latter, the fitting results are quite unsatisfactory, since the determination coefficients R² are below 0.46 (Table 5.40). The only suitable model is the three-parameter negative hypergeometric model (K, M, n) (Table 5.41). The parameters K (M = 0.901667, SD = 0.0325064) and M (M = 0.308333, SD = 0.0213698) are around 0.90 and 0.31, respectively, suggesting that the six sub-corpora are in a certain way homogeneous. The fittings further validate the linguistic status of combination 4. The motifs of combination 4 observe neither the ZA nor the ZM model, suggesting somewhat different dynamics in the formation of these motifs.
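For reference, the negative hypergeometric distribution can be written down explicitly. The sketch below uses one textbook parameterisation with real-valued parameters K > M > 0 over ranks 1 to n+1; the Altmann Fitter's exact support and truncation conventions may differ, so the code is illustrative rather than a definition of the software's model.

```python
# A hedged sketch of the negative hypergeometric distribution (K, M, n).
import numpy as np
from scipy.special import gammaln

def log_binom(r, k):
    """Generalised binomial coefficient C(r, k) on the log scale."""
    return gammaln(r + 1) - gammaln(k + 1) - gammaln(r - k + 1)

def neg_hypergeom_pmf(K, M, n):
    x = np.arange(1, n + 2)                       # ranks 1..n+1
    logp = (log_binom(M + x - 2, x - 1)
            + log_binom(K - M + n - x, n - x + 1)
            - log_binom(K + n - 1, n))
    return np.exp(logp)

# Illustration with parameter values of the order reported for combination 4
p = neg_hypergeom_pmf(K=0.92, M=0.32, n=50)
print(p[:5], round(p.sum(), 6))                   # probabilities sum to 1
```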

Table 5.38  Motifs of combination 4 in the sub-corpora (top 5) (R. = rank, Freq. = frequency)

R. | E1 | Freq. | % | E2 | Freq. | % | E3 | Freq. | %
1 | nn+nn | 570 | 6.9 | nn+nn | 570 | 6.9 | nn+nn | 535 | 6.6
2 | jj+nn | 70 | 0.8 | nn+nn, pos+nn | 53 | 0.6 | jj+nn | 55 | 0.7
3 | nn+cc | 63 | 0.8 | nn+cc | 50 | 0.6 | nn+cc | 47 | 0.6
4 | dt+nn, nn+nn | 44 | 0.5 | nn+vb, dt+nn | 45 | 0.5 | dt+nn, nn+nn | 43 | 0.5
5 | nn+nn, pos+nn | 40 | 0.5 | jj+nn | 44 | 0.5 | nn+nn, pos+nn | 39 | 0.5
Sub-total | | | 9.5 | | | 9.2 | | | 8.9

R. | E4 | Freq. | % | E5 | Freq. | % | E6 | Freq. | %
1 | nn+nn | 658 | 7.9 | nn+nn | 879 | 10.3 | nn+nn | 616 | 7.5
2 | jj+nn | 58 | 0.7 | jj+nn | 63 | 0.7 | nn+cc | 52 | 0.6
3 | dt+nn, nn+nn | 46 | 0.6 | nn+nn, nn+in | 44 | 0.5 | jj+nn | 51 | 0.6
4 | nn+nn, nn+cc | 45 | 0.5 | cd+nn, nn+nn | 44 | 0.5 | nn+nn, nn+vb | 48 | 0.6
5 | nn+nn, nn+in | 42 | 0.5 | nn+nn, pos+nn | 43 | 0.5 | dt+nn, nn+nn | 42 | 0.5
Sub-total | | | 10.2 | | | 12.6 | | | 9.8

Table 5.39  Fitting the ZA model to motifs of combination 4

Input data | R² | a | b | n | α | N
E1-Combination4 | 0.9793 | 0.3038 | 0.0217 | 5,624 | 0.0688 | 8,279
E2-Combination4 | 0 | 0 | 0 | 0 | 0 | 0
E3-Combination4 | 0.9783 | 0.2508 | 0.025 | 5,613 | 0.066 | 8,108
E4-Combination4 | 0.9829 | 0.2165 | 0.0304 | 5,497 | 0.0787 | 8,356
E5-Combination4 | 0 | 0 | 0 | 0 | 0 | 0
E6-Combination4 | 0.9894 | 0.4903 | 0.0053 | 5,595 | 0.0746 | 8,262

Table 5.40  Fitting the ZM model to motifs of combination 4

Input data | R² | a | b | n | N | x-max
E1-Combination4 | 0.4333 | 0.73 | 1.38 | 5,624 | 8,279 | 5,624
E2-Combination4 | 0.4247 | 0.73 | 1.44 | 5,622 | 8,252 | 5,622
E3-Combination4 | 0.4140 | 0.72 | 1.56 | 5,613 | 8,108 | 5,613
E4-Combination4 | 0.4356 | 0.75 | 1.10 | 5,497 | 8,356 | 5,497
E5-Combination4 | 0.3832 | 0.81 | 1.45 | 5,369 | 8,518 | 5,369
E6-Combination4 | 0.4595 | 0.72 | 0.71 | 5,595 | 8,262 | 5,595

5.5.3.2  Combination 5: “dependent + [] = syntactic function”

Data of combination 5 seem to behave quite like combination 4. Table 5.42 presents the summary of the motifs of combination 5. The TTR (0.53–0.6) is smaller than that for combination 4 (0.63–0.68). TTR (M = 0.577133, SD = 0.0268153) and entropy (M = 10.459817, SD = 0.2138295) present some homogeneity among the sub-corpora.

Table 5.41  Fitting the negative hypergeometric model to motifs of combination 4

Input data | R² | K | M | n | N
E1-Combination4 | 0.8840 | 0.92 | 0.32 | 5,623 | 8,279
E2-Combination4 | 0.8767 | 0.92 | 0.32 | 5,621 | 8,252
E3-Combination4 | 0.8661 | 0.93 | 0.33 | 5,612 | 8,108
E4-Combination4 | 0.8850 | 0.90 | 0.30 | 5,496 | 8,356
E5-Combination4 | 0.8815 | 0.84 | 0.27 | 5,368 | 8,518
E6-Combination4 | 0.8694 | 0.90 | 0.31 | 5,594 | 8,262

Table 5.42  A data summary of motifs of combination 5

Input data | Types | Tokens | TTR | Entropy
E1-Combination5 | 5507 | 9411 | 0.585166 | 10.565581
E2-Combination5 | 5573 | 9305 | 0.598925 | 10.608978
E3-Combination5 | 5535 | 9257 | 0.597926 | 10.649334
E4-Combination5 | 5462 | 9550 | 0.571937 | 10.410639
E5-Combination5 | 5215 | 9906 | 0.526449 | 10.063781
E6-Combination5 | 5524 | 9485 | 0.582393 | 10.460586

Figure 5.11 presents the rank-frequency curve for the top 10. Quite like Figure 5.10 (the rank-frequency curve of motifs of combination 4), the dots for the highest frequency of each sub-corpus are quite separated from the next frequencies. Table 5.43 lists the top 10 motifs of combination 5, which account for 20.5%–25.8% of the total data. All the three top-ranking motifs start with "nn+[ ]=atr" (where nouns are attributes), except for E2 with a very small gap. The No. 1 rankings have frequencies about 10 times those of the No. 2 rankings. We fit the same negative hypergeometric model to the motifs of combination 5 and the fittings are even better, with all determination coefficients R² above 0.925 (Table 5.44). Again, parameters K (M = 0.751667, SD = 0.0147196) and M (M = 0.221667, SD = 0.0116905) both behave quite homogeneously among the sub-corpora. The fittings further validate the linguistic status of combination 5.

5.5.3.3  Combination 6: "[] + governor = syntactic function"

Likewise, we get the relevant data for the motifs of combination 6. A summary is available in Table 5.45, and Figure 5.12 presents the rank-frequency curve for the top 10 motifs.

Figure 5.11  Rank-frequency curve of motifs of combination 5 (top 10)

Table 5.43  Top 10 motifs of combination 5

R. | E1 | Freq. | % | E2 | Freq. | % | E3 | Freq. | %
1 | nn+[ ]=atr | 1,273 | 13.5 | nn+[ ]=atr | 1,276 | 13.7 | nn+[ ]=atr | 1,207 | 13.0
2 | nn+[ ]=atr, in+[ ]=auxp | 128 | 1.4 | in+[ ]=auxp, nn+[ ]=atr | 115 | 1.2 | nn+[ ]=atr, cc+[ ]=coord | 116 | 1.3
3 | nn+[ ]=atr, cc+[ ]=coord | 109 | 1.2 | nn+[ ]=atr, in+[ ]=auxp | 107 | 1.2 | nn+[ ]=atr, in+[ ]=auxp | 113 | 1.2
4 | in+[ ]=auxp, nn+[ ]=atr | 101 | 1.1 | nn+[ ]=atr, cc+[ ]=coord | 102 | 1.1 | in+[ ]=auxp, nn+[ ]=atr | 99 | 1.1
5 | jj+[ ]=atr | 71 | 0.8 | nn+[ ]=atr, pos+[ ]=atr | 101 | 1.1 | nn+[ ]=atr, pos+[ ]=atr | 94 | 1.0
6 | nn+[ ]=atr, pos+[ ]=atr | 70 | 0.7 | in+[ ]=auxp, dt+[ ]=auxa, nn+[ ]=atr | 67 | 0.7 | in+[ ]=auxp, dt+[ ]=auxa, nn+[ ]=atr | 66 | 0.7
7 | in+[ ]=auxp, dt+[ ]=auxa, nn+[ ]=atr | 62 | 0.7 | dt+[ ]=auxa, nn+[ ]=atr | 55 | 0.6 | dt+[ ]=auxa, nn+[ ]=atr | 66 | 0.7
8 | dt+[ ]=auxa, nn+[ ]=atr | 60 | 0.6 | nn+[ ]=atr, in+[ ]=auxp, dt+[ ]=auxa | 55 | 0.6 | jj+[ ]=atr | 63 | 0.7
9 | nn+[ ]=atr, in+[ ]=auxp, dt+[ ]=auxa | 57 | 0.6 | jj+[ ]=atr | 47 | 0.5 | nn+[ ]=atr, in+[ ]=auxp, dt+[ ]=auxa | 61 | 0.7
10 | cd+[ ]=atr | 55 | 0.6 | cd+[ ]=atr | 46 | 0.5 | jj+[ ]=atr, nn+[ ]=atr | 45 | 0.5
Sub-total | | | 21.1 | | | 21.2 | | | 20.8

R. | E4 | Freq. | % | E5 | Freq. | % | E6 | Freq. | %
1 | nn+[ ]=atr | 1,443 | 15.1 | nn+[ ]=atr | 1,674 | 16.9 | nn+[ ]=atr | 1,414 | 14.9
2 | nn+[ ]=atr, in+[ ]=auxp | 133 | 1.4 | nn+[ ]=atr, in+[ ]=auxp | 139 | 1.4 | nn+[ ]=atr, cc+[ ]=coord | 151 | 1.6
3 | nn+[ ]=atr, cc+[ ]=coord | 119 | 1.2 | nn+[ ]=atr, cc+[ ]=coord | 120 | 1.2 | nn+[ ]=atr, in+[ ]=auxp | 142 | 1.5
4 | in+[ ]=auxp, nn+[ ]=atr | 109 | 1.1 | in+[ ]=auxp, nn+[ ]=atr | 105 | 1.1 | in+[ ]=auxp, nn+[ ]=atr | 113 | 1.2
5 | nn+[ ]=atr, pos+[ ]=atr | 80 | 0.8 | nn+[ ]=atr, jj+[ ]=atr | 98 | 1.0 | nn+[ ]=atr, pos+[ ]=atr | 72 | 0.8
6 | in+[ ]=auxp, dt+[ ]=auxa, nn+[ ]=atr | 70 | 0.7 | cd+[ ]=atr, nn+[ ]=atr | 96 | 1.0 | in+[ ]=auxp, dt+[ ]=auxa, nn+[ ]=atr | 64 | 0.7
7 | dt+[ ]=auxa, nn+[ ]=atr | 69 | 0.7 | nn+[ ]=atr, pos+[ ]=atr | 83 | 0.8 | dt+[ ]=auxa, nn+[ ]=atr | 59 | 0.6
8 | jj+[ ]=atr | 60 | 0.6 | cd+[ ]=atr | 82 | 0.8 | jj+[ ]=atr | 54 | 0.6
9 | jj+[ ]=atr, nn+[ ]=atr | 59 | 0.6 | nn+[ ]=atr, cd+[ ]=atr | 81 | 0.8 | nn+[ ]=atr, in+[ ]=auxp, dt+[ ]=auxa | 47 | 0.5
10 | cd+[ ]=atr, nn+[ ]=atr | 52 | 0.5 | dt+[ ]=auxa, nn+[ ]=atr | 74 | 0.7 | jj+[ ]=atr, nn+[ ]=atr | 45 | 0.5
Sub-total | | | 23.0 | | | 25.8 | | | 22.8

Table 5.44  Fitting the negative hypergeometric model to motifs of combination 5

Input data | R² | K | M | n | N
E1-Combination5 | 0.9386 | 0.77 | 0.23 | 5,506 | 9,411
E2-Combination5 | 0.9257 | 0.75 | 0.23 | 5,572 | 9,305
E3-Combination5 | 0.9312 | 0.77 | 0.23 | 5,534 | 9,257
E4-Combination5 | 0.9389 | 0.74 | 0.22 | 5,461 | 9,550
E5-Combination5 | 0.9589 | 0.74 | 0.20 | 5,214 | 9,906
E6-Combination5 | 0.9358 | 0.74 | 0.22 | 5,523 | 9,485

Table 5.45  Summary of motifs of combination 6

Input data | Types | Tokens | TTR | Entropy
Combination6-E1 | 5024 | 11,605 | 0.432917 | 9.399387
Combination6-E2 | 5059 | 11,452 | 0.441757 | 9.486960
Combination6-E3 | 5129 | 11,260 | 0.455506 | 9.629189
Combination6-E4 | 4943 | 11,801 | 0.418863 | 9.296507
Combination6-E5 | 4849 | 12,176 | 0.398242 | 9.032495
Combination6-E6 | 5077 | 11,376 | 0.446290 | 9.565153

Table 5.46 lists the top 5 motifs of combination 6, with the first-ranking motif "[ ]+nn=atr" (other word classes modifying nouns and serving as attributes) taking up an average percentage of 22.5%. Fitting the negative hypergeometric model to the data of motifs of combination 6 yields excellent results, with determination coefficients R² above 0.967 (Table 5.47).

Figure 5.12  Rank-frequency curve of combination 6 (top 10)

Table 5.46  Top 5 motifs of combination 6

R. | E1 | Freq. | % | E2 | Freq. | % | E3 | Freq. | %
1 | [ ]+nn=atr | 2,613 | 22.5 | [ ]+nn=atr | 2,537 | 22.2 | [ ]+nn=atr | 2,334 | 20.7
2 | [ ]+nn=atr, [ ]+vb=sb | 158 | 1.4 | [ ]+nn=atr, [ ]+vb=sb | 135 | 1.2 | [ ]+nn=atr, [ ]+vb=sb | 136 | 1.2
3 | [ ]+nn=atr, [ ]+in=adv | 132 | 1.1 | [ ]+nn=atr, [ ]+in=atr | 132 | 1.2 | [ ]+nn=atr, [ ]+in=adv | 130 | 1.2
4 | [ ]+nn=atr, [ ]+vb=obj | 123 | 1.1 | [ ]+nn=atr, [ ]+in=adv | 124 | 1.1 | [ ]+nn=atr, [ ]+in=atr | 123 | 1.1
5 | [ ]+nn=atr, [ ]+in=atr | 117 | 1.0 | [ ]+nn=atr, [ ]+vb=obj | 103 | 0.9 | [ ]+nn=atr, [ ]+vb=obj | 115 | 1.0
Sub-total | | | 27.1 | | | 26.6 | | | 25.2

R. | E4 | Freq. | % | E5 | Freq. | % | E6 | Freq. | %
1 | [ ]+nn=atr | 2,735 | 23.2 | [ ]+nn=atr | 3,122 | 25.6 | [ ]+nn=atr | 2,391 | 21.0
2 | [ ]+nn=atr, [ ]+vb=sb | 170 | 1.4 | [ ]+nn=atr, [ ]+in=atr | 179 | 1.5 | [ ]+nn=atr, [ ]+vb=sb | 166 | 1.5
3 | [ ]+nn=atr, [ ]+in=adv | 150 | 1.3 | [ ]+nn=atr, [ ]+vb=sb | 146 | 1.2 | [ ]+nn=atr, [ ]+in=adv | 119 | 1.0
4 | [ ]+nn=atr, [ ]+in=atr | 141 | 1.2 | [ ]+nn=atr, [ ]+in=adv | 140 | 1.2 | [ ]+nn=atr, [ ]+in=atr | 115 | 1.0
5 | [ ]+nn=atr, [ ]+vb=obj | 104 | 0.9 | [ ]+nn=atr, [ ]+vb=obj | 102 | 0.8 | [ ]+nn=atr, [ ]+vb=obj | 103 | 0.9
Sub-total | | | 28.0 | | | 30.3 | | | 25.4

Table 5.47  Fitting the negative hypergeometric model to motifs of combination 6

Input data | R² | K | M | n | N
Combination6-E1 | 0.9730 | 0.7392 | 0.1660 | 5,023 | 11,605
Combination6-E2 | 0.9676 | 0.7384 | 0.1690 | 5,058 | 11,452
Combination6-E3 | 0.9678 | 0.7459 | 0.1757 | 5,128 | 11,260
Combination6-E4 | 0.9759 | 0.7464 | 0.1628 | 4,942 | 11,801
Combination6-E5 | 0.9778 | 0.7275 | 0.1515 | 4,848 | 12,176
Combination6-E6 | 0.9716 | 0.7516 | 0.1738 | 5,076 | 11,376

Parameters K (M = 0.741500, SD = 0.0084413) and M (M = 0.166467, SD = 0.0087534) further indicate the homogeneity of the sub-corpora. The fitting further corroborates the linguistic status of combination 6 and suggests it is a result of the diversification process.

5.5.3.4  Combination 7: a complete dependency structure

Combination 7 is the combination with the most components. As hypothesised, its data behave somewhat differently from the previous three types of combinations. Combination 7 data have the biggest TTR among the four groups of data (Table 5.48). On average, every four tokens call for three types, suggesting great diversity. The rank-frequency curve for the top 50 motifs is presented in Figure 5.13a. To make the trend more visible, we further select the top ten in Figure 5.13b, which looks quite similar except for the top rank. The top 5 motifs (Table 5.49) take up an average percentage of 12.4%, with the first motif ("nn+nn=atr") contributing an average percentage of 9.6%. Fitting the negative hypergeometric model to the complete data of combination 7 yields less satisfactory fitting results than for the previous combinations (Table 5.50). The determination coefficients R² range between 0.7567 and 0.8315, indicating a still acceptable result.

Table 5.48  Summary of motifs of combination 7

Input data | Types | Tokens | TTR | Entropy
Combination7-E1 | 5444 | 7243 | 0.751622 | 11.405496
Combination7-E2 | 5382 | 7168 | 0.750837 | 11.406756
Combination7-E3 | 5384 | 7052 | 0.763471 | 11.455841
Combination7-E4 | 5368 | 7301 | 0.735242 | 11.275591
Combination7-E5 | 5194 | 7573 | 0.685858 | 10.945104
Combination7-E6 | 5428 | 7169 | 0.757149 | 11.380026

Figure 5.13  Rank-frequency curves of motifs of combination 7. (a) Top 50. (b) Top 10

Parameters K (M = 0.795000, SD = 0.0301662) and M (M = 0.298333, SD = 0.0213698) seem to behave less

homogeneously than the other three groups of data. This is generally understandable, as it becomes more difficult for patterns to co-occur when combination 7 itself includes one more element than the other three two-element combinations. But the fitting results still indicate some regularity in the motifs of combination 7, the complete dependency structure.

Table 5.49  Top 5 motifs of combination 7

R. | E1 | Freq. | % | E2 | Freq. | % | E3 | Freq. | %
1 | nn+nn=atr | 634 | 8.8 | nn+nn=atr | 631 | 8.8 | nn+nn=atr | 594 | 8.4
2 | jj+nn=atr | 90 | 1.2 | nn+nn=atr, pos+nn=atr | 58 | 0.8 | jj+nn=atr | 65 | 0.9
3 | dt+nn=auxa, nn+nn=atr | 50 | 0.7 | jj+nn=atr | 54 | 0.8 | dt+nn=auxa, nn+nn=atr | 51 | 0.7
4 | nn+nn=atr, pos+nn=atr | 44 | 0.6 | dt+nn=auxa, nn+nn=atr | 43 | 0.6 | nn+nn=atr, pos+nn=atr | 44 | 0.6
5 | dt+nn=auxa, jj+nn=atr | 36 | 0.5 | dt+nn=auxa, jj+nn=atr | 29 | 0.4 | dt+nn=auxa, jj+nn=atr | 35 | 0.5
Sub-total | | | 11.8 | | | 11.4 | | | 11.2

R. | E4 | Freq. | % | E5 | Freq. | % | E6 | Freq. | %
1 | nn+nn=atr | 722 | 9.9 | nn+nn=atr | 942 | 12.4 | nn+nn=atr | 672 | 9.4
2 | jj+nn=atr | 69 | 0.9 | jj+nn=atr | 77 | 1.0 | jj+nn=atr | 62 | 0.9
3 | dt+nn=auxa, nn+nn=atr | 59 | 0.8 | nn+nn=atr, pos+nn=atr | 53 | 0.7 | dt+nn=auxa, nn+nn=atr | 55 | 0.8
4 | nn+nn=atr, pos+nn=atr | 52 | 0.7 | nn+nn=atr, cd+nn=atr | 46 | 0.6 | nn+nn=atr, pos+nn=atr | 40 | 0.6
5 | jj+nn=atr, nn+nn=atr | 33 | 0.5 | dt+nn=auxa, nn+nn=atr | 43 | 0.6 | nn+cc=atr | 27 | 0.4
Sub-total | | | 12.8 | | | 15.3 | | | 11.9

Table 5.50  Fitting the negative hypergeometric model to motifs of combination 7

Input data | R² | K | M | n | N
Combination7-E1 | 0.7787 | 0.81 | 0.31 | 5,443 | 7,243
Combination7-E2 | 0.7684 | 0.82 | 0.31 | 5,381 | 7,168
Combination7-E3 | 0.7567 | 0.82 | 0.32 | 5,383 | 7,052
Combination7-E4 | 0.7888 | 0.79 | 0.29 | 5,367 | 7,301
Combination7-E5 | 0.8315 | 0.74 | 0.26 | 5,193 | 7,573
Combination7-E6 | 0.7588 | 0.79 | 0.30 | 5,427 | 7,169

5.5.4  Section summary

Section 5.5.3 examined combinations 4, 5, 6 and 7 of the complete dependency structure "dependent + governor = syntactic function/dependency relation" and their motifs. All these combinations contain at least two elements. Combination 4 ("dependent (POS) + governor (POS)") examines the collocation between word classes. Combination 5 ("dependent (POS) + [] = syntactic function") examines the functions of the word classes. Combination 6 ("[] + governor (POS) = syntactic function") examines which word classes can serve certain functions when they modify given word classes. Combination 7 ("dependent (POS) + governor (POS) = syntactic function") examines the complete dependency structures of dependent-governor pairs. The excellent fittings of the right-truncated modified ZA model to combinations 4 through 7, and of the negative hypergeometric model to their motifs, suggest that the collocations are regularly distributed and that their linear orders follow some common linguistic mechanism.

To the best of our knowledge, this is a new experiment on the motifs of combinations. For most motifs of single elements (cf. Liu & Liang, 2017; Mikros & Mačutek, 2015), the best-fitting models are usually the ZA and ZM distribution patterns. But combinations seem to work with a somewhat different mechanism, resulting in a different distribution pattern, in this study the negative hypergeometric model. Whether this model will fit other combination-induced motifs requires more data to prove.

A common feature of all these groups of motif data is that the Rank 1 frequencies are always much higher than the Rank 2 frequencies. Such a feature indicates that, under the present definition of motifs, few patterns result from the co-occurrence of two or even more such combinations. If the definition of how motifs are formed is changed, the result will be very different. Under the present definition, there shall be no repetition in the sequence. But take nouns as an example: it is quite likely that several nouns are used to modify the same noun. If this kind of repetition is taken into account, the situation might be different. This shall be an interesting future endeavour.

For all these combinations, when one of the elements is fixed, the result can be more focused and more linguistically significant, with a better capacity to investigate a certain syntactic function or a certain word class. For instance, in combination 4, if the word class of the dependent is fixed (say, nouns), the combination examines which word classes nouns can modify; when the word class of the governor is fixed (e.g., verbs), it investigates which word classes can modify verbs. In combination 5, if the word class of the dependent is fixed (e.g., adjectives), the combination examines the functions of adjectives; if the syntactic function is fixed (e.g., attributes), it looks at which word classes can function as attributes. Following this procedure, many probabilistic patterns can be generated.

Having investigated the dependency structure with the seven combinations of its elements, and using three somewhat different quantitative ways of investigation (the rank-frequency distribution of the linguistic entities, of their sequencings and of their motifs), in the next two sections we turn to valency (Section 5.6) and dependency distance (Section 5.7), two other properties related to dependency structures.

5.6  Valency distribution

5.6.1  Introduction

Valency has its origin in chemistry. Introduced into linguistics by Tesnière (1959), the idea of valency has developed into modern dependency grammar (Liu, 2009a). In linguistics, the concept denotes the requirements of a word (particularly a verb) to take certain words as its complements. Since verbs are of primary importance in sentence structure, verb valency tends to attract the most attention. Each verb takes particular kinds of complements in particular forms, thus generating different valency patterns or complementation patterns (Köhler, 2012). For instance, in the sentence "He gave me a book" (Figure 5.14a), the verb "gave" is a tri-valency verb. In the structure "I read the book" (Figure 5.14b), the verb "read" is a bi-valency verb. In these two structures, both "gave" and "read" are governors, with the former governing three dependents ("he", "me" and "a book") and the latter two dependents ("I" and "the book"). Researchers have studied verb valency quantitatively in several languages. Köhler (2005) investigates the distribution pattern of German verb valency.

Figure 5.14  Examples of verb valencies. (a) A tri-valency verb. (b) A bi-valency verb Source: Zhang & Liu, 2017b

Čech et al. (2010) examine full verb valency, which does not discriminate between obligatory arguments (complements) and optional arguments (adjuncts), owing to the current lack of a clear criterion between the two. Using the same approach, Čech and Mačutek (2010) observe the connection between verb length and verb complementation patterns in Czech. Liu (2011) examines the interrelation between verb valency, verb length, frequencies and polysemy. Liu and Liu (2011) conduct a corpus-based diachronic study on the syntactic valence of Chinese verbs. Gao et al. (2014) build a synergetic lexical model of Chinese verbs with valency incorporated. Jin and Liu (2018) examine the regular dynamic patterns found in verbal valency ellipsis in modern spoken Chinese. Beliankou and Köhler (2018) investigate valency structures. Lu et al. (2018) check dependency distance and dynamic valency. Gao and Liu (2019) suggest that valency patterns should be included in English learners' thesauri like the Longman Language Activator and A Dictionary of Synonyms. Gao and Liu (2020) check how foreign learners can benefit from Chinese valency dictionaries in Chinese acquisition.

The previously mentioned studies approach valency addressing language in the mass. Where the sequential behaviour of valency is concerned, Čech et al. (2017) examine full verb valencies from the perspective of motifs in Czech and Hungarian and find that such motifs follow the ZM distribution pattern, validating valency motifs as normal language entities.

Most of the prior studies examine verb valency. However, valency is not exclusive to verbs. Tesnière (1959) examines the valency of other types of words. Herbst (1988) defines valency as the requirement of a word (not necessarily a verb) on its complements. The word is usually a verb, a noun or an adjective, and both obligatory arguments and optional adjuncts can function as the complements. He discovers a valency model for English nouns. Later, Herbst et al. (2004) incorporate three word classes (verbs, nouns and adjectives) in their English valency dictionary. Hao et al. (2022) examine noun valency in interlanguage.

Liu and Feng (Liu, 2007, 2009a; Liu & Feng, 2007) propose the Generalised Valency Pattern (GVP), which is explained in Section 4.4.2. The GVP includes both the active part of valency (indicating that a word or word class can govern other words or word classes) and the passive part (indicating that a word or word class can be governed by other words or word classes). Liu (2009a) further proposes Probabilistic Valency Pattern (PVP) theory, incorporating the idea of probabilities. He examines all word classes which can act as governors, all word classes which can act as dependents and, particularly, verb governors and noun dependents. Based on the PVP, Yan and Liu (2021) quantitatively analyse verb valency in Chinese and English. Like the full verb valency of Čech et al. (2010), the GVP does not distinguish between obligatory arguments and optional adjuncts. But the GVP differs from full verb valency in the following ways: (1) the GVP covers two types of binding, for a word as a governor and as a dependent; and (2) the GVP covers all running words rather than mere verbs.

These features of the PVP allow us to study word chains in terms of valencies. In this part of the study, we examine the generalised valencies of words to investigate how words in authentic texts play the role of valency carriers. Here valency is defined as the number of all arguments and adjuncts, that is, the number of all dependents of a word. We study valencies employing both language-in-the-mass and language-in-the-line methods. We pose three research questions:

Research question 5.6.1: Do valencies observe the ZA distribution model?
Research question 5.6.2: Do valency motifs observe the ZM distribution model?
Research question 5.6.3: What is the link between motif length frequencies and lengths?

The next part of this section addresses the methods of examining valencies and their motifs. The last part presents conclusions.

5.6.2  Materials and methods

First of all, we get the number of all dependents, or the generalised valency, of all the running words. Take Figure 5.15 (a re-presentation of Figure 4.9) for example, where terminal nodes (e.g., "Pierre" and "61") with no dependents are defined as having a valency of 0.

Figure 5.15  Dependency tree for the “Pierre” sample sentence Source: Zhang & Liu, 2017b

Table 5.51  Valencies for words in the "Pierre" sample sentence

Node | POS | Valency pattern | Valency
Pierre | nn | | 0
Vinken | nn | Pierre+Vinken◎+old | 2
61 | cd | | 0
years | nn | 61+years◎ | 1
will | md | Vinken+will◎+join | 2
join | vb | join◎+board+as+Nov. | 3
the | dt | | 0
board | nn | the+board◎ | 1
as | in | as◎+director | 1
a | dt | | 0
nonexecutive | jj | | 0
director | nn | a+nonexecutive+director◎ | 2
Nov. | nn | Nov.◎+29 | 1
29 | cd | | 0

The words "will" and "old", bearing 2 and 1 dependents respectively, are defined as having valencies of 2 and 1. The valencies of all the words of the sentence are listed in Table 5.51, where words with no dependents are defined as having a valency of 0 and the symbol ◎ marks the preceding word as a governor. From all the valency patterns extracted from the sub-corpora, we find that governors can be verbs, nouns, adjectives, adverbs, prepositions, conjunctions, the infinitive "to" and so on, so long as they have daughter nodes.

Having defined generalised valencies, we move on to their motifs. Since valencies are numerical, a valency motif is the longest sequence of non-descending valencies. Thus, the example sentence can be represented by the following motifs: 0-2, 0-1-2-3, 0-1-1, 0-0-2, 1, 0. The initial motif ends with valency 2 since the following word ("61") bears a valency of 0. Similarly, the second motif ends with 3 as the next valency is 0. Figure 5.16 visually represents the valency time series of the sample sentence with its motifs. We can see that in most cases the motifs begin with valency 0, indicating a running word without any dependent. Such a pattern might to some extent reveal a dynamic mechanism in the behaviour of words in sequence. Such a rhythm might also affect the dependency distances between dependents and their governors, and the dependency distances will in turn affect comprehension difficulty. We hypothesise that such a rhythm constitutes a synergetic result of two opposing forces: the speaker minimising production effort and the hearer minimising comprehension load.
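The extraction of generalised valencies and their motifs can be sketched in a few lines. The head indices below are a simplified encoding of the sample sentence that is consistent with the valencies in Table 5.51; the functions are illustrative, not the scripts actually used for the corpora.

```python
# A minimal sketch: compute each running word's generalised valency (its number
# of dependents), then cut the valency sequence into the longest non-descending
# subsequences (valency motifs) and record motif lengths.
from collections import Counter

def valencies(heads):
    """heads[i] = index of the governor of word i, or None for the root."""
    counts = [0] * len(heads)
    for head in heads:
        if head is not None:
            counts[head] += 1
    return counts

def numeric_motifs(values):
    motifs, current = [], []
    for v in values:
        if current and v < current[-1]:   # a drop closes the current motif
            motifs.append(current)
            current = []
        current.append(v)
    if current:
        motifs.append(current)
    return motifs

# Simplified head indices for the "Pierre" sample sentence (0-based; "will" is the root)
heads = [1, 4, 3, 1, None, 4, 7, 5, 5, 11, 11, 8, 5, 12]
vals = valencies(heads)        # [0, 2, 0, 1, 2, 3, 0, 1, 1, 0, 0, 2, 1, 0]
motifs = numeric_motifs(vals)  # [[0,2], [0,1,2,3], [0,1,1], [0,0,2], [1], [0]]
print(vals, motifs, Counter(len(m) for m in motifs))
```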


Figure 5.16  A time series plot of generalised valencies for the “Pierre” sample sentence Source: Zhang, 2018

Besides the valencies, the motif lengths of valencies can also capture the syntactic rhythm. Motif lengths are defined as the number of elements in a motif. As indicated by Figure 5.16, the sample sentence has such a series of valency motif lengths: 2, 4, 3, 3, 1 and 1.

5.6.3  Results and discussion

In this part, we examine in turn (1) the distribution of valency motifs, (2) the distribution of motif lengths and (3) the link between motif length frequencies and lengths, addressing the three research questions in order.

5.6.3.1  Distribution of valency motifs

Table 5.52 is a summary of the rank-frequency distribution data for valency motifs. Table 5.53 presents the top 10 motifs, with their frequencies and percentages available in Table 5.54. The top 10 account for quite similar percentages (73%–74%) and each rank weighs similarly across the sub-corpora.

Table 5.52  A summary of rank-frequency distribution data for valency motifs

Data | Motif type | Number of motifs | Valency type
motif-E1 | 173 | 17,642 | 10
motif-E2 | 173 | 17,547 | 11
motif-E3 | 173 | 17,723 | 9
motif-E4 | 174 | 17,503 | 10
motif-E5 | 180 | 17,645 | 11
motif-E6 | 180 | 17,513 | 10

Source: Zhang & Liu, 2017b

Table 5.53  Rank-frequency distribution data for valency motifs (top 10)

Rank | E1 | E2 | E3 | E4 | E5 | E6
1 | 1 | 1 | 1 | 1 | 1 | 1
2 | 0+2 | 0+2 | 0+2 | 0+2 | 0+2 | 0+2
3 | 0+1 | 0+1 | 0+1 | 0+1 | 0+1 | 0+1
4 | 0+0+2 | 0+0+2 | 0+0+2 | 0+0+2 | 0+0+2 | 0+0+2
5 | 0+0+3 | 0+0+3 | 0+0+3 | 0+1+1 | 0+1+1 | 0+1+1
6 | 0+1+1 | 0+1+1 | 0+1+1 | 0+0+3 | 0+3 | 0+0+3
7 | 0+1+2 | 0+3 | 0+3 | 0+3 | 0+0+3 | 0+3
8 | 0+3 | 0+1+2 | 1+1 | 0+1+2 | 0+1+2 | 0+1+2
9 | 1+1 | 1+1 | 0+1+2 | 1+1 | 1+1 | 1+1
10 | 0+0+0+3 | 0+0+0+3 | 0+0+0+3 | 0+0+0+3 | 0+0+1 | 0+1+3

Source: Zhang & Liu, 2017b

Particularly, the motif rankings for the top 4 are identical across the sub-corpora. All this suggests that the six sub-corpora are generally homogeneous, at least as far as valency is concerned. The motif "1" takes up quite a high percentage (16%–17%), followed by "0+2" (14%–15%) and then "0+1" (11%–12%). Interestingly, motif "0+2" occurs more frequently than "0+1". As valency motifs are defined as non-descending valency sequences, the motif "0+2" indicates that the relevant string occurs before a word with no dependent (a valency of 0) or before a word with only one dependent (a valency of 1). Similarly, the motif "0+1" indicates that the relevant string only comes before a word with no dependent. This means that in quite a few structures words are sequenced in a pattern where valency 1 (a word with one dependent) occurs between two words without dependents, a very special rhythmic pattern. Fitting the ZA model to the complete rank-frequency data of valency motifs yields excellent results, with all R² above 0.989 (Table 5.55). Figure 5.17 exemplifies the fit for the top 20 ranks of E1. The fitting yields very similar values of parameter α (M = 0.169150, SD = 0.003930776) across the sub-corpora. The ZM model is an excellent fitting model for motif data (Čech et al., 2017; Köhler, 2015). Like most motif studies, both the ZA and ZM models fit the data in this part well, with the latter yielding R² values between 0.9840 and 0.9962.
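For completeness, the ZM model referred to here can be sketched as follows; the parameterisation P(x) proportional to (x + b)^(-a) over ranks 1 to n is the usual one, but the Altmann Fitter's exact conventions may differ, so the parameter values in the example are arbitrary.

```python
# A hedged sketch of the right-truncated Zipf-Mandelbrot (ZM) distribution.
import numpy as np

def zm_probs(a, b, n):
    x = np.arange(1, n + 1)
    weights = (x + b) ** -a
    return weights / weights.sum()

def expected_frequencies(freqs, a, b):
    freqs = np.asarray(freqs, dtype=float)
    return freqs.sum() * zm_probs(a, b, len(freqs))

# Toy comparison of observed and ZM-expected frequencies for the first ten ranks
observed = [2995, 2505, 2090, 1194, 917, 811, 760, 747, 705, 384]
print(np.round(expected_frequencies(observed, a=1.2, b=0.5), 1))
```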

5.6.3.2  Distribution of motif lengths

Before examining the motif lengths, we take a closer look at their basic components, the valencies. The first 200 words of E1 are presented in Figure 5.18, which we believe can represent the whole sub-corpora to a certain degree.

Table 5.54  Frequencies and percentages for top 10 valency motifs (Freq. = frequency)

Rank | E1 Freq. | % | E2 Freq. | % | E3 Freq. | %
1 | 2,995 | 17.0 | 2,923 | 16.6 | 3,114 | 17.5
2 | 2,505 | 14.2 | 2,437 | 13.9 | 2,612 | 14.7
3 | 2,090 | 11.9 | 2,001 | 11.4 | 1,970 | 11.1
4 | 1,194 | 6.8 | 1,264 | 7.2 | 1,224 | 6.9
5 | 917 | 5.2 | 879 | 5.0 | 940 | 5.3
6 | 811 | 4.6 | 796 | 4.5 | 770 | 4.3
7 | 760 | 4.3 | 784 | 4.5 | 759 | 4.3
8 | 747 | 4.2 | 710 | 4.0 | 709 | 4.0
9 | 705 | 4.0 | 688 | 3.9 | 703 | 4.0
10 | 384 | 2.2 | 339 | 1.9 | 369 | 2.1
Sub-total | | 74.4 | | 72.9 | | 74.2

Rank | E4 Freq. | % | E5 Freq. | % | E6 Freq. | %
1 | 2,873 | 16.4 | 2,966 | 16.7 | 2,985 | 17.0
2 | 2,484 | 14.2 | 2,346 | 13.2 | 2,387 | 13.6
3 | 2,078 | 11.9 | 2,169 | 12.2 | 1,979 | 11.3
4 | 1,254 | 7.2 | 1,243 | 7.0 | 1,238 | 7.1
5 | 857 | 4.9 | 942 | 5.3 | 854 | 4.9
6 | 816 | 4.7 | 864 | 4.9 | 837 | 4.8
7 | 794 | 4.5 | 834 | 4.7 | 767 | 4.4
8 | 710 | 4.1 | 643 | 3.6 | 755 | 4.3
9 | 674 | 3.9 | 635 | 3.6 | 696 | 4.0
10 | 351 | 2.0 | 316 | 1.8 | 312 | 1.8
Sub-total | | 73.8 | | 73.0 | | 73.2

Source: Zhang & Liu, 2017b

As language users do not attach equal importance to all the running words, certain rhythms are generated, which can be reflected in the time series plot. As can be seen from Figure 5.18, except for valency 0, it is rare for two or more neighbouring words to carry the same number of dependents (to have the same valency).

Table 5.55  Fitting the ZA model to valency motif rank-frequency data

Input data | R² | a | b | n | α | N
Motif-E1 | 0.9891 | 0.1724 | 0.2865 | 172 | 0.1698 | 17641
Motif-E2 | 0.9918 | 0.2873 | 0.2558 | 172 | 0.1666 | 17546
Motif-E3 | 0.9929 | 0.2880 | 0.2618 | 172 | 0.1757 | 17722
Motif-E4 | 0.9905 | 0.3058 | 0.2549 | 173 | 0.1642 | 17502
Motif-E5 | 0.9891 | 0.3276 | 0.2491 | 179 | 0.1681 | 17644
Motif-E6 | 0.9910 | 0.2841 | 0.2554 | 179 | 0.1705 | 17512

Source: Zhang & Liu, 2017b

But it is not rare for a noun to have two or more modifiers, so it is more common for two or more valencies of 0 to co-occur adjacently. Another trend that can be witnessed in Figure 5.18 is that two or occasionally three words occur with growing valencies, usually starting from valency 0. The frequent occurrence of this 0-to-more valency pattern results in frequent motif lengths of 2 and 3, particularly length 2. All the growing valencies finally come down to valency 0, gradually or abruptly, resulting in motif lengths of 1. The time series plot of the motif lengths (Figure 5.19) can partly reflect the valencies, although the length rhythm somewhat differs from the valency rhythm. From Figure 5.19 we can foresee that in the whole corpus of E1, length 2 occurs quite frequently, ranking at the top, followed by length 3. Noticeably, neighbouring identical motif lengths of 2 occur quite frequently, taking up 78% of all neighbouring identical motif lengths.

Figure 5.17  Fitting the ZA model to E1 valency motif rank-frequency data Source: Zhang & Liu, 2017b

Syntactic Dependency Relations and Related Properties 137

Figure 5.18  A time series plot of valencies (the first 200 running words of E1) Source: Zhang & Liu, 2017b

all neighbouring identical motif lengths. However, length 1 is not as frequent as lengths 2 and 3. These observations agree with the predictions we have made when we examine the rhythmic pattern of valencies. This interesting observation needs to be further explored and interpreted. Table 5.56 summarises the data of valency motif lengths. The average valency motif length (M = 2.458333, SD = 0.0132916) seems to be very much alike across the sub-corpora. Throughout all the six sub-corpora, the lengths follow the same

Figure 5.19  A time series plot of the valency motif lengths (the first 200 valency motifs in E1) Source: Zhang & Liu, 2017b

138  Syntactic Dependency Relations and Related Properties Table 5.56  Rank-frequency data of valency motif lengths Rank

Length

E1

E2

E3

E4

E5

E6

Freq

Freq

Freq

Freq

Freq

Freq

1

2

6,659

6,561

6,651

6,636

6,672

6,467

2

3

5,039

5,133

5,043

5,021

5,159

5,128

3

1

3,240

3,146

3,344

3,086

3,197

3,185

4

4

2,069

2,066

2,086

2,117

1,987

2,058

5

5

516

516

474

519

515

547

6

6

97

108

101

100

90

101

7

7

17

13

17

17

17

24

8

8

4

2

5

4

6

2

9

9

1

1

2

1

Total

17,641

17,546

17,722

17,502

17,644

17,512

Average length

2.45

2.47

2.44

2.47

2.45

2.47

Source: Zhang & Liu, 2017b

ranking order: length 2 ranks No. 1, followed by lengths 3 and 1, then subsequently from 4 to 9. In other words, except for length 1 (taking up a percentage of 9–10%), all lengths are ordered with increasing ranks. Valency patterns differ for different POS, which might partly be a cause for the uneven distribution of the motif lengths. We fit the ZA model to the data (Table 5.56). The excellent fittings (with R2 all above 0.97) as presented by Table 5.57 yields an affirmative answer to research question 4.6.2, further testifying the motifs as normal linguistic units. The parameter α (M = 0.375550, SD = 0.0036187) behaves homogeneously across the sub-corpora. Table 5.57  Fitting the ZA model to the valency motif length data Input data



a

B

n

α

N

x-max

Motif-length-E1

0.9735

0.0214

0.9759

8

0.3775

17,641

8

Motif-length-E2

0.9763

0.1119

0.9625

9

0.3739

17,546

9

Motif-length-E3

0.9706

0.0090

1.0037

9

0.3753

17,722

9

Motif-length-E4

0.9750

0.1418

0.9332

9

0.3792

17,502

9

Motif-length-E5

0.9774

0.0890

0.9883

9

0.3781

17,644

9

Motif-length-E6

0.9751

0.1186

0.9367

8

0.3693

17,512

8

Source: Zhang & Liu, 2017b

Syntactic Dependency Relations and Related Properties 139 Table 5.58  Lengths of valency motifs and length frequencies (Freq = frequency) Length

1 2 3 4 5 6 7 8 9 Total

E1

E2

E3

E4

E5

E6

Freq

Freq

Freq

Freq

Freq

Freq

3,146 6,561 5,133 2,066 516 108 13 2 1 17,546

3,344 6,651 5,043 2,086 474 101 17 5 1 17,722

3,086 6,636 5,021 2,117 519 100 17 4 2 17,502

3,197 6,672 5,159 1,987 515 90 17 6 1 17,644

3,240 6,659 5,039 2,069 516 97 17 4 17,641

3,185 6,467 5,128 2,058 547 101 24 2 17,512

Source: Zhang & Liu, 2017b

5.6.3.3 Frequency and length

We continue to address the third research question and examine the link between motif lengths and length frequencies. Table 5.58 presents the relevant data. The hyper-Poisson (a, b) function has been used to model the link between language qualities (Köhler, 2015). Here, fitting this function to the data yields excellent results with all R2 values above 0.993 (Table 5.59), which further validates valency motifs as language entities resulted from the diversification process. The fitting yields similar a (M = 1.000117, SD = 0.0122022) and b (M = 0.435450, SD = 0.0136946) values across the sub-corpora, suggesting some homogeneity. 5.6.4  Section summary

This part examines generalised valency from three dimensions. It finds that both valency motifs and motif lengths abide by the same ZA distribution model. In addition, Table 5.59  Fitting the hyper-Poisson function to the valency motif length-frequency interrelation Input data



A

b

N

E1

0.9959

0.9966

0.4377

17,641

E2

0.9946

0.9974

0.4249

17,546

E3

0.9956

1.0029

0.4553

17,722

E4

0.9969

0.9987

0.4223

17,502

E5

0.9951

0.9838

0.4250

17,644

E6

0.9939

1.0213

0.4475

17,512

Source: Zhang & Liu, 2017b

140  Syntactic Dependency Relations and Related Properties the interrelation between motif lengths and length frequencies can be captured by the hyper-Poisson function. These findings validate the linguistic status of valency. As Ding (2009, pp. 140–141) puts it, “Rhythm is a universal law of human physiology, psychology and the movement of nature, and language is naturally no exception to it. All discourses that can be extended in a sequence of time have periodic rhythmic variations, which only exist in different forms of expression depending on the nature of their units and the rules of their combination.” “The size of the rhythmic cluster is determined by the speed of speech, the physiology of articulation and the psychological habits of the people ……” “The basic unit of rhythm is the beat, and the largest unit is the rhythmic cluster.” The basic unit of rhythm here is the valency, and the rhythm of valency influences the distance between words with dependent relations, the latter in turn influencing the difficulty of understanding syntactic structure (Liu, 2008). This rhythm may again reflect the dynamic balance between the principle of least effort for both the speaker and the listener. Having discussed the valency distribution, we move on to the next important aspect of dependency structures—dependency distance or the distance between a dependent and its governor. 5.7  Dependency distance sequencing 5.7.1 Introduction

Research in psycholinguistics provides a basis for the empirical study on language comprehension difficulty (Jay, 2004). From a psycholinguistic perspective, syntactic analysis models based on working memory are well founded (e.g., Jay, 2004; Levy et al., 2013), but to measure the difficulty of comprehension poses a challenge for formal and cognitive linguists. Related theories based on memory or distance are proposed, such as Yngve’s (1960) Depth Hypothesis, Hawkins’ (1994) Early Immediate Constituents (EIC) and Gibson’s (1998) Dependency Locality Theory (DLT). Yngve’s (1960) Depth Hypothesis is an attempt to measure the difficulty of sentence comprehension. Yngve defines the sentence depth as the maximum number of symbols that need to be stored when constructing a sentence. Yngve proposes that sentences in authentic language have a depth that does not exceed a certain value, which is generally consistent with the human working memory capacity (Cowan, 2001, 2005, 2010; Miller, 1956). However, Yngve’s Depth Hypothesis is not perfect (Frazier, 1985). Following Yngves’s approach to determining depth, scholars have conducted a number of empirical studies and found no support for this hypothesis (Sampson, 1997). Nonetheless, the Depth Hypothesis establishes a unifying metric for language comprehension difficulty; the hypothesis links cognitive structure, which is common to humans, to comprehension difficulty. Hawkins (1994, p. 13) argues that although it is not known at the moment what this threshold is, its discovery constitutes an important task in measuring how difficult language comprehension is. If such a constant could be found, the theory of human performance would be greatly simplified (Cowan, 2005, p. 6).

Syntactic Dependency Relations and Related Properties 141 Miller and Chomsky (1963) propose a measure of syntactic difficulty, which uses the ratio between non-ultimate and ultimate nodes in phrase-structure syntactic trees. Frazier (1985) suggested converting that global algorithm to a local one, so that the algorithm is more sensitive. Some researchers use other terms like dependency length (e.g., Futrell et al., 2015; Temperley, 2007, 2008). The dependency length in Temperley’s (2008) text refers to the path length of the root that eventually reaches the dependent via other nodes. Gibson and other scholars argue that sentence comprehension difficulty is related with the linear order of syntactically related words and with their distance (Gibson, 1998, 2000; Gibson & Pearlmutter, 1998; Grodner & Gibson, 2005; Temperley, 2007; Gildea & Temperley, 2010; Liu, 2008; Wagers & Phillips, 2014). Dependence distance (DD) connects the two-dimensional hierarchical structure and the linear sentence structure, hence a key property of syntactic system based on dependency grammar (Liu et al., 2017). There are three definitions of DD. Heringer et al. (1980, p. 187) define it as the number of words in the linear sentence between the dependent and its governor. Such a definition defines the DD between a dependent and its neighbouring dependent as 0, which causes inconvenience in calculation, and ignores the direction of dependency. Hudson (1995, p. 16) defines DD as the linear distance between the dependent and its governor, and the dependency distance between the dependent and its neighbouring governor is 1. Liu (2007) defines DD as the series number of the governor in the linear sentence minus that of its dependent. Such a definition is more advantageous in that it concurrently gives consideration to dependency ­direction—the DD is negative in case the governor is placed in the front, otherwise it is positive. The definition of Liu (2007) is adopted in the study herein. If sentence comprehension and analysis is about transforming a linear string of words into a dependency tree, that is, from one-dimensional structure to a twodimensional structure, then we can liberate it from working memory only when the dependent meets its governor and forms a syntactic relation (Ferrer-i-Cancho, 2004; Hudson, 2010; Jiang & Liu, 2015; Liu, 2008). In other words, when comprehending sentences, one uses an incremental parsing strategy, and words that have entered working memory stay there until a structure is formed. Because working memory has a limited capacity, if more words are stored in working memory exceeding the memory capacity, it can lead to comprehension failure (Covington, 2003). Hudson (1995) and Liu (2008) point out that the average DD of the sentence can be taken as the gauge for sentence complexity. Syntactic analysis is to process the sentence following the linear order of words and then to figure out their interrelations. Such a process runs in the working memory; therefore, the shorter the distance between the words involving a dependency relation, the easier the words will be understood and memorised. If the distance is too long, the words will be more readily forgotten. In his paper, Liu (2008) proposes three hypotheses: (1) Linear sequences that minimise the average DD will be preferred by the language analysis algorithms of the human brain. (2) The vast majority of sentences or texts of human utterances do not exceed a threshold of the average DD. (3) Grammar

142  Syntactic Dependency Relations and Related Properties alongside cognition works together to keep the DD within this threshold. The basic starting point for these assumptions is that human cognitive structure (here working memory capacity) and the limitations of grammar make human language tend to minimise the dependency distance. He tests these hypotheses using a corpus of 20 different languages with dependent annotations and finds that human languages have a tendency to minimise the average DD, which has a threshold no greater than three, and that grammar also plays an important role in limiting the distance, thus verifying the three hypotheses mentioned above. This is the first study of this type to be conducted with a large-scale multilingual treebank. Human language tends to minimise the DD, and it is a common linguistic feature governed by human’s common cognition system (Ferrer-I-Cancho, 2016; Futrell et al., 2015; Liu, 2008; Liu et al., 2017; Lu & Liu, 2016a, 2016b). By using real linguistic materials from large-scale corpora, some scholars have proved the common feature that dependency distance minimisation (DDM) may exist in natural languages. For example, Spanish scholar Ferrer-i-Cancho (2004) carries out relevant study with a Romanian dependency Treebank. Liu (2008) and the research team of MIT (Futrell et al., 2015) do so with 20 languages, and 37 languages, respectively. They prove that this common feature of DDM may exist in natural languages. Ferrer-i-Cancho (2004, 2013) has also proved such generality from the perspective of theoretical analysis. In addition, the research results of computational linguistics have also proved minimisation of DD can improve the accuracy of syntactic analysis to a great extent. In case DD can indicate how linguistic comprehension and output are constrained by human working memory, and in case the capacities of working memory are common to most of the people, the DD of human language shall abide by certain regularity (Jiang & Liu, 2015, p. 96). Researchers have recently carried out a great many studies on the regularity of DD. Liu (2007) analyses the probability distribution of DD (all in absolute value ignoring dependency direction) in a Chinese dependency treebank, which shows the data comply with the model of right truncated Zeta distribution. Liu (2008) discovers that the average DD of Chinese is 2.84 (dependency direction ignored), and the probability that the dependent neighbours its governor is around 50%. As far as dependency direction is concerned, Chinese is a mixed language in which the governor lies behind its dependent slightly more often, and is a language that features orders of S+V, V+O and Adj+N. Liu’s study is the first to find out that word order is a continuum, and that the order can be used for distinguishing different types of languages, serving as a new approach to the typological study. Fisch et al. (2019) refer to such directions as Liu-Directionalities. Investigations on DD and dependency directions help to analyse the difficulty of sentence comprehension (syntactic parsing), for instance the comprehension of ambiguous sentences (Zhao & Liu, 2014), the study of the comprehension of “ba” (a Chinese preposition suggesting its object as being acted on) sentences (Fang & Liu, 2018), the study of the dependency distance between “zai” (a Chinese preposition suggesting positions) and its subject (Xu, 2018), the study of syntactic complexity (Chen et al., 2022; Ouyang et al., 2022); it also contributes to the study

Syntactic Dependency Relations and Related Properties 143 of differentiating genres or even languages (e.g., Chen & Gerdes, 2022; Wang, 2015; Wang & Liu, 2017), children’s language acquisition (e.g., Ninio, 1998), language learning (Jiang & Ouyang, 2017; Ouyang & Jiang, 2018), translation (Jiang & Jiang, 2020; Liang, 2017; Zhou, 2021) and the design of better algorithms for natural language syntactic analysis (e.g., Collins, 1996, 2003; Eisner & Smith, 2005; Liu, 2009a). Besides synchronic studies on DD, there are also diachronic studies. For example, Liu and Xu (2018) conduct a study on Chinese and find DD increases steadily over the centuries. Jin (2018) finds that with the extension of school age, writings from Chinese deaf students witness a steady increase of DD. Zhu et al. (2022) study the diachronic changes of DD in four genres of modern English, and find the trends of DD changes. As for studies on DD’s sequential behaviour, Liu (2008) examines the time series of dependency direction. Jing and Liu (2017) study the DD motifs in 21 IndoEuropean languages, and discover their topological significance. In terms of the distribution model of dependency distances, Liu (2007) suggests the right-truncated Zeta model fits the ADD distribution; Li and Yan (2021) find that ADDs of the interlanguage of English for Japanese learners conform to the ZA model. The same model is also found to fit interlanguage (e.g., Hao et al., 2021). We will test these two models on our data to further validate the homogeneity across the sub-corpora. We therefore hypothesise: Research hypothesis 5.7.1: Dependency distances are distributed following the right truncated Zeta model and the parameters of the model can reveal the homogeneity of the six sub-corpora. Research hypothesis 5.7.2: Dependency distances are distributed following the ZA model and the parameters of the model and the parameters of the model can reveal the homogeneity of the six sub-corpora. As rank-frequency distributions of dependency distances (DD) and of their motifs have been examined by prior studies (e.g., Jing & Liu, 2017; Liu, 2008), when the linear behaviour of DD is concerned, we can choose to investigate sequencings. From time series plots, we can visualise linear behaviours of linguistic properties/entities, where the temporal dimension is not time in the sense of diachronic studies, but the dynamic development of the text, which is termed as syntagmatic time (Pawłowski, 1997). Time series plots are a simple application of time series analysis (cf. Pawłowski, 1997), from which the rhythmic pattern of units/attributes can be visible. For the time series analysis in this study, we mainly use time series plots in order to present relevant trends. Figure 5.20 shows the time series plot of DD and ADD for the “Pierre” sample sentence, whose dependency tree is available in Figure 4.9 and whose DDs and ADDs are calculated in Table 4.6. To leave out no node, we assign a DD value of 0 to the root (the biggest dot in the plot). When dependency directions are considered, Figure 5.20a shows that of the 5 nodes before the root, 4 of them bear

144  Syntactic Dependency Relations and Related Properties

Figure 5.20 Time series plots of DDs and DDs in the “Pierre” sample. (a) Dependency distances. (b) Absolute dependency distances Source: Zhang & Liu, 2017

positive values of DD. After the root, minus and plus values occur more alternatively. Towards the end of the sentences, there are more minus DD values. Temperley (2008) proposes the same-branching principle that dependencies shall be consistently right-branching and left-branching and the reverse/opposite branching. In this “Pierre” sample, it seems that left-branching occurs more often at the initial part of the sentence, right-branching more often at the end of the sentence, and alternating dependency direction more near the root. Would English sentences generally follow this trend? Will the mean dependency distance be of minus values as well (here in this sentence, –0.6)? When dependency directions are ignored, it can be discovered from Figure 5.20b that long and short DDs occur generally alternately, which facilitates the generation of a small mean dependency distance (2.07 for this sentence). In Figure 5.20a Nodes 3 and 4 both bear a valency of 1 when dependency direction is included, the only same-value neighbouring distance in the tree. Except for distances of 1 (suggesting adjacent dependency, either governor-initial or governor-final), other same-value DDs don’t occur in a neighbouring fashion. Will this long-short-longshort pattern occur as a general pattern in the sub-corpora? To address this question, we will have to study the syntagmatic/sequential behaviour of DD and ADD. Sequencings directly reflect how DD is sequentially distributed. The definition of sequencings is available in Section 4.5.2. We expect that such sequencings will behave like those for POS and dependency relations. Also, we can further validate the linguistic status of DD through the examination of their sequencings. We thereby hypothesise: Hypothesis 5.7.3: DD sequencings will observe both the ZA and AM distribution models. Hypothesis 5.7.4: ADD sequencings will observe both the ZA and AM distribution models.

Syntactic Dependency Relations and Related Properties 145 If the two hypotheses are corroborated, we can further claim that the methods of examining linear behaviours of linguistic units/properties through investigating their sequencings can have more general value. In the remainder of this section, the next sub-section will present the results and discussion. The final part will summarise this section. 5.7.2  Dependency distance minimisation

By comparing DDs of natural languages with those of random languages, scholars demonstrate that human languages have a common tendency to minimise dependency distances (Ferrer-i-Cancho, 2004; Gildea & Temperley, 2010; Liu, 2007, 2008). But the DDs in actual languages are larger than the minimum possible dependency distance, reflecting the constraints of grammar (Ferrer-i-Cancho & Liu, 2014). The reasons why the average dependency distance can be used as a measure of complexity are related to three main dimensions of research. The first dimension also has a relatively long history of research, which we will first address. Short dependency distance preferences have been examined since 1930, when the German linguist Behaghel (1930) suggests that closely related words tend to occur together. In the last two decades or so, several scholars have conducted studies related to short-dependent distance preferences. Yngve’s (1960) Depth Hypothesis, based on phrase structure grammar, is a hypothesis about the relationship between working memory capacity and the complexity of grammatical structure comprehension. It was later introduced to dependency grammar by Heringer et al. (1980) to explore the relationship between working memory capacity and the complexity of comprehension of linguistic structures. Frazier successively proposes the Late Closure principle (Frazier, 1978) and the Same Branching principle (Frazier, 1985) in the garden path sentence; Dryer (1992) proposes the branching direction theory (BTD). Hawkins (1994, 2004) finds that if the shorter of the two subordinate structures is closer to the governor, dependency distance can be shortened, which he proposes as a principle of EIC. This cognitivefunctional principle allows the (human) parser to observe the syntactic structure of a linguistic expression as early as possible, thus reducing the processing difficulty. Gibson incorporates DDM into his Syntactic Prediction Locality Theory framework, which is later termed as Dependency Locality Theory (DLT, Gibson, 1998). DLT states that longer dependency distances increase the difficulty of processing structures. Temperley (2008) summarises these principles as same branching (a preference for structures that always branch left or always branch right), ordered nesting (DD reduction generated through placing a shorter subordinate structure near the governor) and reverse branching (DD reduction generated through opposite branching of the single-word dependent to the main dependency direction). Lu (2018) investigates the effects of crossings, root node position and chunking on DD, with all these three factors reducing DD to some extent. Gildea and Temperley (2010) propose that DDM is a contributing factor in natural language variation. Over the decades, there have been other theories dealing with syntactic complexity that can essentially be accounted for in terms of DDM.

146  Syntactic Dependency Relations and Related Properties The second dimension is related to computational linguistic studies, which also indirectly support DDM. In several models of dependency-based syntactic analysis, DD is considered an important factor in syntactic analysis, with a preference for analysis with shorter DD (Collins, 2003; Eisner & Smith, 2005; Sleator & Temperley, 1991). This demonstrates that DDM per se can largely improve the accuracy of syntactic analysis. The third dimension concerns language development. For instance, although some languages have longer mean DD, others, shorter, the overall trend of word order change is towards a DD reduction (Gildea & Temperley, 2010). This not only occurs in English (Lei & Wen, 2020; Pyles, 1971; Zhu et al., 2022), but also in German (Ebert, 1978). This very perspective of DDM tells us to some extent about the past, present and possibly the future of natural languages (Feng, 20172). In summary, as an important concept of dependency grammar, DD embodies the distance as well as the order of syntactic relations that exist between words, and makes a difference to language processing in the human brain. There is a general tendency for human languages to minimise dependency distances, which is a universal feature of language bounded by universal cognitive mechanisms (Ferrer-i-Cancho, 2004, 2006, 2016; Futrell et al., 2015; Jiang & Liu, 2015; Lei & Wen, 2020; Liang & Liu, 2016; Liu, 2007, 2008; Liu et al., 2017; Lu & Liu, 2016a, 2016b; Lu & Liu, 2020; Wang & Liu, 2017). Ferrer-i-Cancho (2004) uses a dependency treebank of Romanian. Liu (2008) uses 20 languages and a team of researchers at MIT (Futrell et al., 2015) conducts related studies in 37 languages. These researchers successively demonstrate the possible existence of a general feature of DDM in natural languages by employing large-scale corpora of authentic languages. Ferrer-i-Cancho (2004, 2013) also demonstrates this commonality from a theoretical analysis perspective. Following the proposal of DDM, there has been a large body of research on the topic. In the 2017 issue of Physics of Life Reviews, the review of “Dependency distance: a new perspective on syntactic patterns in natural languages” by Liu et al. (2017) was published alongside a dozen related reviews. The scholars explored many linguistic issues related to the DDM from multiple perspectives. 5.7.3  Results and discussion

In this sub-part, we take turns to address the four research hypotheses. We will examine DD in a bag-of-words model first. Then we will investigate DD and ADD sequencing distributions. 5.7.3.1  Validating DD distributions

This part addresses research hypotheses 5.7.1 and 5.7.2, validating the two models (the right truncated Zeta model and the ZA model) in the data. The result, if confirmed, will help validate the homogeneity of the sub-corpora when DD is concerned and make the next part of the discussion on DD sequencings more feasible.

Syntactic Dependency Relations and Related Properties 147 Table 5.60  Rank-frequency data of DD (top 7, 87.2%–88.4%) (R. = rank, Freq. = frequency) R.

E1

E2

Freq.

DD

%

Freq.

1

13,098

1

31.7

12,972

2

9,191

−1

22.3

3

4,333

2

4

4,205

5

E3 %

Freq.

1

31.4

13,171

1

31.9

9,434

−1

22.8

9,173

−1

22.2

10.5

4,232

2

10.2

4,279

2

10.4

−2

10.2

4,151

−2

10.0

4,240

−2

10.3

2,616

−3

6.3

2,661

−3

6.4

2,630

−3

6.4

6

1,561

3

3.8

1,521

3

3.7

1,519

3

3.7

7

1,374

−4

3.3

1,391

−4

3.4

1,401

−4

3.4

Sub-total R.

DD

88.1

E4

87.9 E5

Freq.

DD

1

13,476

2

DD

%

Freq.

1

32.7

13,412

8,704

−1

21.1

3

4,281

2

4

4,196

5

%

88.3 E6

%

Freq.

1

32.5

13,097

1

31.8

8,835

−1

21.4

9,333

−1

22.7

10.4

4,267

2

10.3

4,246

2

10.3

−2

10.2

4,083

−2

9.9

4,157

−2

10.1

2,511

−3

6.1

2,575

−3

6.2

2,680

−3

6.5

6

1,602

3

3.9

1,482

3

3.6

1,509

3

3.7

7

14,26

−4

3.5

1,368

−4

3.3

1,350

−4

3.3

Sub-total

87.9

DD

87.2

DD

%

88.4

There are 69–86 types of DD. Appendix 5 show the top 20 DD rank-frequency data, accounting for 97.2%–97.7% of the total. The first seven rankings (Table 5.60, 87.2%–88.4%) are identical, all being 1, −1, 2, −2, −3, 3 and −4, and again their percentages are similar across the sub-corpora. Smaller distances (whether of positive or negative values) occur more frequently, which is an important reason for the minimalisation of DD (Buch-Kromann, 2006). With such a mechanism, the cognitive load is then lessened as the subordinate is more likely to meet its governor and is thus more easily freed from working memory. Figure 5.21 displays the top 20 rank-frequency data of DD, visually validating the homogeneity across the sub-corpora with the curves overlapping with each other. Fitting the right-truncated Zeta distribution in Liu (2007) to the complete DD rank-frequency data yields very poor results, with the determination coefficient R2 of one sub-corpus being 0 and that of the other five being just around 0.88.

148  Syntactic Dependency Relations and Related Properties

Figure 5.21  Rank-frequency curves of dependency distance (top 20)

The result suggests that the model is not the best model for the data in this study, thus nullifying Research Hypothesis 5.7.1. One possible reason is that the DD in Liu (2007) is actually absolute dependency distance (ADD), where directions are ignored. We again fit the ZA distribution to the complete data. This model is often employed to test whether a particular linguistic unit/property is regularly distributed and is thus the result of linguistic diversification processes. This is also the model for ADD distributions in interlanguage studies (e.g., Hao et al., 2021; Li & Yan, 2021). The fitting results, shown in Table 5.61, are excellent, with all determination coefficients R2 above 0.99. Parameter α (M = 0.322, SD = 0.008) seems to suggests some common mechanisms of dependency distance. The fitting constitutes further evidence that dependency distance is a normal linguistic Table 5.61  Fitting the ZA model to complete DD rank-frequency data R2

a

b

n

α

N

E1

0.9929

0.44

0.41

86

0.32

41,262

E2

0.9923

0.65

0.35

76

0.31

41,311

E3

0.9924

0.49

0.40

72

0.32

41,240

E4

0.9932

0.27

0.45

74

0.33

41,181

E5

0.9928

0.33

0.42

85

0.33

41,250

E6

0.9925

0.59

0.37

69

0.32

41,192

Syntactic Dependency Relations and Related Properties 149 Table 5.62  Key data relating to DD

E1 E2 E3 E4 E5 E6 M SD

Mean DD

Mean ADD

DD = 1 (%)

DD = −1 (%)

DD = ±1 (%)

Opposite neighbouring DD > 0 DD (%) (%)

0.03 0.00 0.00 0.05 0.09 0.00 0.03 0.0366

2.35 2.37 2.33 2.35 2.57 2.31 2.38 0.0953

31.7 31.4 31.9 32.7 32.5 31.8 32.00 0.4980

22.3 22.8 22.2 21.1 21.4 22.7 22.08 0.6911

54.0 54.2 54.2 53.9 53.9 54.5 54.12 0.2317

48.6 48.3 49.5 48.7 48.0 48.4 48.58 0.5115

52.6 51.8 52.4 53.7 53.6 52.2 52.72 0.7705

property/unit at the syntactic level and a result of diversification process. The data again demonstrates that there is a relative homogeneity of data across the sub-corpora, and that each sub-corpus can represent the basic features of the news genre. This allows us to use the data from one of the sub-corpora (for example E1) or from the collection of the six sub-corpora for detailed profiling in the analysis that follows. Table 5.62 summarises some key data relating to DD in English. All the parameters seem to suggest some homogeneity across the sub-corpora with mean (M) and standard deviation (SD) listed in the table except for Mean DD. The SD of mean DD seem to be larger than the mean, which doesn’t readily rule out the possibility for homogeneity as all the values approaching 0 are rather small. When compared with other languages (for instance, Chinese, which has an average mean DD of 1.42) (Zhang, 2023), the differences between languages and homogeneity within the same language can be more evident. Some mechanisms for the formation of DDM in English seem to manifest themselves in Table 5.62. 1 The English language seems to have a mechanism to achieve an equilibrium between the head-initial and head-final dependencies, thus generating a mean DD around 0. Follow-up studies need to discover for what dependency structures the dependency distances are negative and for what they are positive. The ADDs for English are between 2.31 and 2.57, a result consistent with the research findings by Liu (2007). The mean ADDs for English (M = 2.38, SD = 0.0953) and other parameter values seem to behave generally homogeneously across the sub-corpora, suggesting the data as reflective of journalistic English. 2 Adjacent dependencies (direct adjacency between the dependent and its governor) take more than half of the cases; the proportion in the study (53.9%–54.5%) is slightly lower than that in Jiang and Liu’s (2015) study, where adjacent dependencies in English account for 55%–67% of the total. The difference in data between the two studies may stem from differences in the corpora, but one thing is common: more than half of the dependencies arise in adjacent nodes, and this high proportion is an important contributor to minimalisation of DD.

150  Syntactic Dependency Relations and Related Properties 3 In adjacent dependent-governor pairs, head-initial cases accounts for 39%–42%, which is an important mechanism for the overall small mean DD in English, as a significant proportion of such cases can offset head-final dependencies and help generate a mean DD approaching 0 (0–0.09 across the sub-corpora). 4 The average proportion of reverse dependencies in the data is 48.6%, indicating positive and negative dependency distances alternating with each other. Temperley (2008) proposes the principle of reverse dependencies is an effective mechanism for reducing dependency distance. The aforementioned aspects makes both mean DD and mean ADD generally small in English. Having discussed the distribution of DD and ADD, we resume with DD sequencings in the next part. 5.7.3.2 DD sequencings

Table 5.63 presents the type-token data of DD sequencings. The TTR values are around 0.64 (M =0.638333, SD = 0.0098319), similar to those of POS sequencings (cf. Table 5.11 A summary of POS sequencing data) and dependency relation sequencings (Table 5.17 A summary of relation and POS sequencing data). Available in Table 5.64, the data for the part of DD sequencings with a probability over 0.01% take up 21.5–22.6% of the complete data (Table 5.63). Again TTR (M =0.003083, SD = 0.0001169) and percentages (M =21.900000, SD = 0.5692100) behave homogeneously across the sub-corpora. Later in this part, we only use the extracted data. The top 20 DD sequencings (47.7%–48.6% of the extracted data) in all the subcorpora are presented in Appendix 6. The top 6 rankings (Table 5.65, 29.1–29.6% of the extracted data) have the same sequencings, which are, successively, “1”, “−1”, “2”, “−2” and “2, 1” and “1, −2”. Here we use commas to indicate the order in the sequencing as some dependency values have minus signs. The first 4 rankings are in accordance with Table 5.60 (Rank-frequency data of DD, top 7, Table 5.63  Summary of DD sequencing data DD

Type

Token

TTR

E1

391,057

605,370

0.65

E2

391,658

604,832

0.65

E3

372,217

586,971

0.63

E4

366,971

578,872

0.63

E5

379,059

596,644

0.64

E6

362,796

576,430

0.63

Syntactic Dependency Relations and Related Properties 151 Table 5.64  Summary of extracted DD sequencing data (probability > 0.01%) Type

Token

TTR

%

E1

391

129,983

0.0030

21.5

E2

373

128,126

0.0029

21.2

E3

406

131,107

0.0031

22.3

E4

409

129,188

0.0032

22.3

E5

402

128,001

0.0031

21.5

E6

416

130,552

0.0032

22.6

Table 5.65  Top 6 DD sequencings in each sub-corpus (R. = rank, Freq. = frequency) R.

Freq.

E1-DD

%

Freq.

E2-DD

%

Freq.

E3-DD

%

1

13,098

1

10.1

12,972

1

10.1

13,171

1

10.0

2

9,191

−1

7.1

9,434

−1

7.4

9,173

−1

7.0

3

4,333

2

3.3

4,232

2

3.3

4,279

2

3.3

4

4,205

−2

3.2

4,151

−2

3.2

4,240

−2

3.2

5

3,851

2, 1

3.0

3,749

2, 1

2.9

3,797

2, 1

2.9

6

3,579

1, −2

2.8

3,456

1, −2

2.7

3,619

1, −2

2.8

Sub-total

29.5

Freq.

E4-DD

%

Freq.

E5-DD

%

Freq.

E6-DD

%

1

13,476

1

10.4

13,412

1

10.5

13,097

1

10.0

2

8,704

−1

6.7

8,835

−1

6.9

9,333

−1

7.1

3

4,281

2

3.3

4,267

2

3.3

4,246

2

3.3

4

4,196

−2

3.2

4,083

−2

3.2

4,157

−2

3.2

5

3,784

2, 1

2.9

3,719

2, 1

2.9

3,739

2, 1

2.9

6

3,527

1, −2

2.7

3,441

1, −2

2.7

3,451

1, −2

2.6

Sub-total

29.2

29.6

29.5

29.2

29.1

87.2%–88.4%), where small value DDs suggest the dependents and their governors are quite close. The fifth ranking goes to “2, 1” indicating its two corresponding nodes both occur before the third node and collectively modify it. A typical example is “an interesting book”, where “an” (with a DD of 2) and “interesting” (with a DD of 1) serve as attributes of their common governor “book”. The sixth ranking

152  Syntactic Dependency Relations and Related Properties Table 5.66  Top 10 DD sequencings in all the sub-corpora rank

DD sequencings

Frequency

%

Times in a sentence

1

1

81,686

10.51

5.0

2

−1

58,132

7.48

3.5

3

−2

32,230

4.15

2.0

4

2

27,175

3.50

1.7

5

2, 1

22,639

2.91

1.4

6

−3

21,305

2.74

1.3

7

1, −2

21,073

2.71

1.3

8

0

16,431

2.11

1.0

9

−1, −1

13,303

1.71

0.8

−1, 1

13,220

1.70

0.8

10

Sub-total

39.52

“join the board” in Table 4.6 (Dependency direction, DD and ADD of the “Pierre” sample sentence). Here in this example, “the” bears a DD of 1 from its governor “board” while “board”, a DD of −2 from its governor “join”. The top rankings suggest some dependency configurations that minimise dependency distance. As the six sub-corpora are generally homogeneous when DD sequencings are concerned, we are able to examine the top 100 DD sequencings in the data collection of all the sub-corpora (Appendix 7). The top 10 rankings are presented in ­Table  5.66. By dividing the frequency of sequencings with the number of sentences (= the number of roots), we can get the average number of sequencing in each sentence. On average, for each sentence there are 5 words (= 81686/16430) with a DD of 1; 3.5 words, a DD of −1; 2 words, a DD of −2; 1.7 words, a DD of 2; and 1.3 words, a DD of −3. When sequencings with more than one DD are concerned, on average, for each sentence there are 1.3 sequencings of “2, 1” and also 1.3 sequencings of “1, −2”. If we closely examine more data, we will find more such meaningful sequencings in English. To save time, we don’t continue the probe here. We now fit the ZA and ZM models to the rank-frequency DD sequencing data. It seems that the ZM model generates a better fit with both better R2 values and more homogeneous parameter values (a and b) (Table 5.67). The fitting doesn’t rule out the ZA model, which is still a good model with R2 values over 0.93 ­(Table 5.68). Parameters a (M = 1.01, SD = 0) and b (M = 0.936667, SD = 0.015055) in ­Table  5.67 and Parameters b (M = 0.121667, SD = 0.0098319) and α (M = 0.10, SD = 0) in Table 5.68 behave homogeneously across the six sub-corpora. The fittings validate Research Hypothesis 5.7.3.

Syntactic Dependency Relations and Related Properties 153 Table 5.67  Fitting the ZM model to rank-frequency DD sequencing data (DD = dependency distance, Sequencing = sequencing) Input data



a

b

n

N

DD-Sequencing-E1

0.9844

1.01

0.95

391

129,983

DD-Sequencing-E2

0.9818

1.01

0.95

373

128,126

DD-Sequencing-E3

0.9839

1.01

0.95

406

131,107

DD-Sequencing-E4

0.9835

1.01

0.92

409

129,188

DD-Sequencing-E5

0.9829

1.01

0.92

402

128,001

DD-Sequencing-E6

0.9823

1.01

0.93

416

130,552

Table 5.68  Fitting the ZA model to rank-frequency DD sequencing data (DD = dependency distance, Sequencing = sequencing) Input data



a

b

n

α

N

DD-Sequencing-E1

0.9332

0.06

0.13

391

0.10

129,983

DD-Sequencing-E2

0.9457

0.20

0.11

373

0.10

128,126

DD-Sequencing-E3

0.9441

0.14

0.12

406

0.10

131,107

DD-Sequencing-E4

0.9420

0.07

0.13

409

0.10

129,188

DD-Sequencing-E5

0.9340

0.04

0.13

402

0.10

128,001

DD-Sequencing-E6

0.9424

0.18

0.11

416

0.10

130,552

5.7.3.3 Sequencings of absolute DDs

Having analysed the DD sequencings, we continue with absolute DD (hereafter referred to as ADD), which is the dependency distance without considering the dependency direction. Table 5.69 is the summary of ADD sequencing data. This time, TTR (M = 0.571667, SD = 0.0075277) is generally smaller than that for DD sequencing data (Table 5.63, M = 0.64). This is quite understandable as ADD goes without directions, thus with less types and less possible relevant sequencings. Table 5.69  Summary of ADD sequencing data

E1 E2 E3 E4 E5 E6

Type

Token

TTR

351,905 351,277 333,257 328,075 340,537 323,758

605,370 604,832 586,971 578,872 596,644 576,430

0.58 0.58 0.57 0.57 0.57 0.56

154  Syntactic Dependency Relations and Related Properties Table 5.70  Summary of extracted ADD sequencing data (sequencing probability > 0.01%) Type

Token

TTR

%

E1

432

163,849

0.0026

27.1

E2

433

163,325

0.0026

27.0

E3

444

164,768

0.0027

28.1

E4

445

162,504

0.0027

28.1

E5

434

161,249

0.0027

27.0

E6

457

164,274

0.0028

28.5

Table 5.70 provides a summary of extracted ADD sequencing data (sequencing probability > 0.01%). TTR (M = 0.002683, SD = 0.0000753) and percentages (M = 27.63333, SD = 0.6743897) suggest homogeneity across the sub-corpora. The top 20 ADD sequencings (50.47%–50.83%) are available in Appendix 8, which further proves the homogeneity of the six sub-corpora. For the top 8 sequencings (Table 5.71, 37.69%–37.91%), the corpora share the same ranking order: 1, 2, Table 5.71  Top 8 ADD sequencings in each sub-corpus (R. = rank, Freq. = frequency) R.

E1

E2

E3

ADD

Freq.

%

ADD

Freq.

%

ADD

Freq.

%

1 2 3 4 5 6 7 8

1 2 1-1 2-1 1-2 3 1-2-1 1-3 sub-total

22,289 8,538 7,303 6,492 6,023 4,177 3,977 3,042

13.60 5.21 4.46 3.96 3.68 2.55 2.43 1.86 37.75

22,406 8,383 7,467 6,403 5,988 4,182 4,019 3,023

1 2 1-1 2-1 1-2 3 1-2-1 1-3

13.72 5.13 4.57 3.92 3.67 2.56 2.46 1.85 37.88

22,344 8,519 7,383 6,523 6,077 4,149 4,085 3,017

1 2 1-1 2-1 1-2 3 1-2-1 1-3

13.56 5.17 4.48 3.96 3.69 2.52 2.48 1.83 37.69

R.

E4

1 2 3 4 5 6 7 8

E5

E6

ADD

Freq.

%

ADD

Freq.

%

ADD

Freq.

%

1 2 1-1 2-1 1-2 3 1-2-1 1-3 sub-total

22,180 8,477 7,165 6,435 6,000 4,113 3,958 2,920

13.65 5.22 4.41 3.96 3.69 2.53 2.44 1.80 37.70

22,247 8,350 7,223 6,317 6,009 4,057 3,976 2,920

1 2 1-1 2-1 1-2 3 1-2-1 1-3

13.80 5.18 4.48 3.92 3.73 2.52 2.47 1.81 37.91

22,430 8,403 7,538 6,302 5,963 4,189 3,864 3,017

1 2 1-1 2-1 1-2 3 1-2-1 1-3

13.65 5.12 4.59 3.84 3.63 2.55 2.35 1.84 37.57

Syntactic Dependency Relations and Related Properties 155 Table 5.72  Top 100 ADD sequencings in all the sub-corpora Rank

ADD sequencing

Frequency

%

Times in a sentence

1 2 3 4 5 6 7 8 9 10 11 12 13 14

1 2 1-1 2-1 1-2 3 1-2-1 1-3 1-1-2 3-1 4 0 1-1-1 2-1-3

133,896 50,670 44,079 38,472 36,060 24,867 23,879 17,939 13,173 12,706 12,548 12,138 11,385 10,542 Sub-total

13.66 5.17 4.50 3.93 3.68 2.54 2.44 1.83 1.34 1.30 1.28 1.24 1.16 1.08 45.15

11.0 4.2 3.6 3.2 3.0 2.0 2.0 1.5 1.1 1.0 1.0 1.0 0.9 0.9

1-1, 2-1, 1-2, 3 and 1-2-1, suggesting some certain rhythms and patterns among the corpora. Here “−” indicates the order in the sequencing. The top 100 ADD sequencings in all the sub-corpora are presented in Appendix 9. Those sequencings taking a percentage over 1% are available in Table 5.72. Again we can examine how many times these sequencings occur in an average sentence. Of these 10, there are four one-element sequencings (“1” occurring 11 times, “2”, 4.2 times, “3”, twice and “4”, once in a sentence on average; another four two-element sequencing (“1-1” occurring 3.6 times, “2-1”, 3.2 times, “1-2”, 3 times and “1-3” 1.5 times in a sentence on average) ; and even two sequencings with three elements (“1-2-1” occurring twice and “1-1-2”, 1.1 times in a sentence on average). Finally, we fit the ZA and ZM models to the data (Tables 5.73 and 5.74). This time, different from the DD data, fitting the ZA model generates a better R2 values (all above 0.98). The parameter value of α is constant (= 0.14) across the subcorpora, suggesting some common mechanisms. The ZM model fitting enjoys a Table 5.73  Fitting the ZA model to rank-frequency ADD sequencing data Input data



a

b

n

α

N

ADD-Sequencing_E1

0.9853

0.35

0.09

432

0.14

163,849

ADD-Sequencing_E2

0.9888

0.44

0.08

433

0.14

163,325

ADD-Sequencing_E3

0.9802

0.27

0.10

444

0.14

164,768

ADD-Sequencing_E4

0.9839

0.34

0.09

445

0.14

162,504

ADD-Sequencing_E5

0.9860

0.38

0.09

434

0.14

161,249

ADD-Sequencing_E6

0.9892

0.45

0.07

457

0.14

164,274

156  Syntactic Dependency Relations and Related Properties Table 5.74  Fitting the ZM model to rank-frequency ADD sequencing data Input data



a

b

n

N

ADD-Sequencing_E1 ADD-Sequencing_E2 ADD-Sequencing_E3 ADD-Sequencing_E4 ADD-Sequencing_E5 ADD-Sequencing_E6

0.9605 0.9706 0.9608 0.8484 0.9575 0.8559

1.05 1.02 1.05 1.24 1.05 1.22

0.80 0.52 0.80 3.65 0.79 3.35

432 433 444 445 434 457

163,849 163,325 164,768 162,504 161,249 164,274

much weaker R2 values (two of them only about 0.85) but still, the ZM model fits the data acceptably well. The previous findings suggest the ADD sequencings as normal language entities and as consequences of diversification processes. They also validate Hypothesis 5.7.4 in this section. 5.7.4  Section summary

This section explores the distribution of dependency distances (DD) and ADDs. Dependency distance in this study is defined as the serial number of the governor in the linear sentence minus that of the dependent. A negative DD suggests a head-initial case, and a positive DD, a head-final case. The right-truncated Zeta distribution fails to model the DD in the study. Instead, the ZA function models DD distributions excellently. In all the sub-corpora, the top DDs are, in sequence, 1, -1, 2, -2, -3, 3, -4, with similar proportions, suggesting homogeneity at least when DD is concerned. The smaller-value DDs occur more frequently than biggervalue DDs, an important contributing factor for dependency distance minimisation (DDM) in English (Buch-Kromann, 2006). In this fashion, the cognitive load is lighter as the dependent is more likely to find its governor and is thus more easily freed from working memory. This part further examines the validity of the sequencing method on DDs and ADDs, whose rank-frequency distributions are both found to observe the ZA as well as ZM models. The fittings validate the linguistic status of the DD and ADD sequencings and suggest them as results of diversification processes. They also further indicate the feasibility of the sequencing method. 5.8  Chapter summary This chapter focuses on the syntactic dependency structures and some related ideas. It yields the following research findings: We first validated the homogeneity of the sub-corpora through an examination of POS and dependency relations/syntactic roles. We then proposed the concept of sequencing, which denotes all the possible ordered strings from a sentence. To simplify the discussion and avoid the data scarcity problem, we only examined the sequencings with a probability higher than 0.01%. This method is proved valid

Syntactic Dependency Relations and Related Properties 157 when we examine the word classes, dependency relations, DD and absolute DD sequencings individually. They were all found to observe the ZA and/or ZM models. We examined the seven different combinations of the complete dependency structure: “dependent + governor = syntactic function/dependency relation”, where both the dependent and governor refer to their POS property or word class. These combinations refer to: Combination 1: Dependent (POS), Combination 2: Governor (POS), Combination 3: Dependency relation, Combination 4: The pair of “dependent (POS)+governor (POS)”, Combination 5: “dependent (POS)+ [] = dependency relation/syntactic function”, where certain types of dependents can serve certain syntactic functions, and similarly, Combination 6: “[]+governor (POS) = dependency relation/syntactic function” where governors need to bear some dependents and make them fulfil certain functions, and finally Combination 7: The complete dependency structure “dependent (POS)+governor (POS) = dependency relation/syntactic function”. We first examined POS and syntactic function sequencings of all the running words, the rank-frequency distributions of which were found to observes both the ZA and ZM patterns. Then we separately examined the rank-frequency distribution of Combinations 4 through 7 and of their motifs, using both language-in-the-line and language-in-the-bag methods. The 4 were are found to observe the ZA model, and their motifs, a negative hypergeometric model. Such an attempt is, to our best knowledge, also a new one on the combinatorial distribution of units/properties, which extends the definition of linguistic entities/properties. This chapter also examined two dependency related properties: valency (defined as the number of dependents a governor possesses) and dependency distance. Both valency motifs and motif lengths abide by the ZA model. In addition, the interrelation between motif lengths and length frequencies can be captured by the hyperPoisson model. The sequencings of dependency distance and absolute dependency distance are both found to observe the ZA and ZM models. The fittings validate the linguistic status of syntactic dependency structures, valency and dependency distance and indicate them as results of diversification processes. This chapter focuses on syntactic dependency. In a similar fashion, we will move on to discourse dependency in the succeeding chapter. Notes 1 Research questions or hypotheses are numbered in accordance with the section. For instance, Hypothesis 5.1.1 suggests itself as the first hypothesis in Section 5.1. 2 http://blog.sina.com.cn/s/blog_72d083c70102y6pm.html (Accessed January 19, 2022)

158  Syntactic Dependency Relations and Related Properties References Altmann, G., & Burdinski, V. (1982). Towards a law of word repetitions in text-blocks. Glottometrika, (4), 147–167. Aziz, A., Saleem, T., Maqsood, B., & Ameen, A. (2020). Grammatical and syntactical functions of auxiliaries in English and Urdu. Revista Amazonia Investiga, 9(35), 34–50. Behaghel, O. (1930). Von deutscher Wortstellung. Zeitschrift Für Deutschkunde, 44, 81–89. Beliankou, A., & Köhler, R. (2018). Empirical analyses of valency structures. In H. Liu & J. Jiang (Eds.), Quantitative analysis of dependency structure (pp. 93–100). De Gruyter Mouton. Best, K.-H. (1994). Word class frequencies in contemporary German short prose texts. Journal of Quantitative Linguistics, 1, 144–147. Best, K.-H. (1998). Zur Interaktion der Wortarten in Texten. Papiere zur Linguistik, 58, 83–95. Best, K.-H. (2000). Verteilungen der Wortarten in Anzeigen. Göttinger Beiträge zur Sprachwissenschaft, 4, 37–51. Best, K.-H. (2001). Zur Gesetzmäßigkeit der Wortartenverteilungen in deutschen Pressetexten. Glottometrics, 1, 1–26. Buch-Kromann, M. (2006). Discontinuous grammar. A dependency-based model of human parsing and language acquisition. Dissertation, Copenhagen Business School. VDM Verlag Dr. Müller. Čech, R., & Mačutek, J. (2010). On the quantitative analysis of verb valency in Czech. In P. Grzybek, E. Kelih, & J. Mačutek (Eds.), Text and language. Structure, functions, interrelations (pp. 21–29). Preasen Verlag. Čech, R., Pajas, P., & Mačutek, J. (2010). Full valency. Verb valency without distinguishing complements and adjuncts. Journal of Quantitative Linguistics, 17(4), 291–302. Čech, R., Vincze, V., & Altmann, G. (2017). On motifs and verb valency. In H. Liu & J. Liang (Eds.), Motifs in language and text (pp. 231–260). De Gruyter Mouton. Chen, R. (2017). Quantitative text classification based on POS-motifs. In H. Liu & J. Liang (Eds.), Motifs in language and text (pp. 65–86). De Gruyter Mouton. Chen, R., Deng, S., & Liu, H. (2022). Syntactic complexity of different text types: From the perspective of dependency distance both linearly and hierarchically. Journal of Quantitative Linguistics, 29(4), 510–540. Chen, X. (2013). Dependency network syntax from dependency treebanks to a classification of chinese function words. In Proceedings of the second international conference on dependency linguistics (DepLing 2013) (pp. 41–50). Prague, Czech Republic, August 27–30, 2013. Chen, X., & Gerdes, K. (2022). Dependency distances and their frequencies in Indo-­ European language. Journal of Quantitative Linguistics, 29(1), 106–125. Collins, M. (1996). A new statistical parser based on bigram lexical dependencies. In Proceedings of the Associations for Computational Linguistics (pp. 184–191). arXiv preprint cmp-lg/9605012. Collins, M. (2003). Head-driven statistical models for natural language parsing. Computational Linguistics, 29, 589–638. Covington, M. (2003). A free-word-order dependency parser in prolog. Artificial Intelligence Center, The University of Georgia. Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–185. Cowan, N. (2005). Working memory capacity. Psychology Press. Cowan, N. (2010). The magical mystery four: How is working memory capacity limited, and why? Current Directions in Psychological Science, 19(1), 51–57. Ding, J. (2009). Outlines of stylistic analysis. Jinan University Press.

Syntactic Dependency Relations and Related Properties 159 Dryer, M. S. (1992). The Greenbergian word order correlations. Language, 68, 81–138. Ebert, R. P. (1978). Historische syntax des deutschen. Sammlung Metzler. Eisner, J., & Smith, N. (2005). Parsing with soft and hard constraints on dependency length. In: Proceedings of the international workshop on parsing technologies (pp. 30–41). Elts, J. (1992). A readability formula for text on biology. In Psychological Problems of Reading (pp. 42–44). Vilnius, May 6–7, 1992. Harper and Brothers. Fang, Y., & Liu, H. (2018). What factors are associated with dependency distances to ensure easy comprehension? A case study of ba sentences in Mandarin Chinese. Language Sciences, 67, 33–45. Ferrer-i-Cancho, R. (2004). Euclidean distance between syntactically linked words. Physical Review E, 70(5), 056135. Ferrer-i-Cancho, R. (2006). Why do syntactic links not cross? Epl, 76(6), 1228. Ferrer-i-Cancho, R. (2013). Hubiness, length, crossings and their relationships in dependency trees. Glottometrics, 25, 1–21. Ferrer-I-Cancho, R. (2016). Non-crossing dependencies: Least effort, not grammar. In A. Mehler, A. Lücking, S. Banisch, P. Blanchard, & B. Job (Eds.), Towards a theoretical framework for analyzing complex linguistic networks (pp. 203–234). Springer. Ferrer-i-Cancho, R., & Liu, H. (2014). The risks of mixing dependency lengths from sequences of different length. Glottotheory, 5(2), 143–155. Fisch, A., Guo, J., & Barzilay, R. (2019). Working hard or hardly working: Challenges of integrating typology into neural dependency parsers. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 5714–5720). Hong Kong, China, November 3–7, 2019. Association for Computational Linguistics. Frazier, L. (1978). On comprehending sentences: Syntactic parsing strategies. Unpublished doctoral dissertation, University of Massachusetts, Amherst. Frazier, L. (1985). Syntactic complexity. In R. David, L. Karttunen, & A. Zwicky (Eds.), Natural language parsing: Psychological, computational, and theoretical perspectives (pp. 129–189). Cambridge University Press. Futrell, R., Mahowald, K., & Gibson, E. (2015). Large-scale evidence for dependency length minimization in 37 languages. Proceedings of the National Academy of Sciences of the United States of America, 112(33), 10336–10341. Gao, J., & Liu, H. (2019). Valency and English learners’ thesauri. International Journal of Lexicography, 32(3), 326–361. Gao, J., & Liu, H. (2020). Valency  Dictionaries and Chinese vocabulary acquisition for foreign learners. Leixkos, 30, 111–142. http://doi.org/10.5788/30-1-1548 Gao, S. (2010). A quantitative study of grammatical functions of modern Chinese nouns based on a dependency treebank. TCSOL Studies, 2(38), 54–60. (In Chinese). Gao, S., Yan, W., & Liu, H. (2010). A quantitative study on syntactic functions of Chinese verbs based on dependency treebank. Chinese Language Learning, 5, 105–112. (In Chinese). Gao, S., Zhang, H., & Liu, H. (2014). Synergetic properties of Chinese verb valency. Journal of Quantitative Linguistics, 1, 1–21. Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68, 1–76. Gibson, E. (2000). The dependency locality theory: A distance-based theory of linguistic complexity. In A. P. Marantz, Y. Miyashita, & W. O’Neil (Eds.), Image language, brain (pp. 95–126). The MIT Press. Gibson, E., & Pearlmutter, N. 
J. (1998). Constraints on sentence comprehension. Trends in Cognitive Sciences, 2, 262–268.

160  Syntactic Dependency Relations and Related Properties Gildea, D., & Temperley, D. (2010). Do grammars minimize dependency length? Cogn. Sci, 34, 286–310. Grodner, D., & Gibson, E. (2005). Consequences of the serial nature of linguistic input for sentential complexity. Cognitive Science, 29(2), 261–290. Hammerl, R. (1990). Untersuchungen zur Verteilung der Wortarten im text. In L. Hřebíček (Ed.), Glottometrika (Vol. 11, pp. 142–156). Brockmeyer. Hao, Y., Wang, X., & Lin, Y. (2021). Dependency distance and its probability distribution: Are they the universals for measuring second language Learners’ language proficiency? Journal of Quantitative Linguistics, 29(4), 485–509. Hao, Y., Wang, Y., & Liu, H. (2022). A diachronic study on noun valency in Chinese interlanguage based on a dependency treebank. Journal of Yunnan Normal University (Teaching & Studying Chinese as a Foreign Language Edition, 20(04), 84–92. (In Chinese). Hawkins, J. A. (1994). A performance theory of order and constituency. Cambridge University Press. Hawkins, J. A. (2004). Efficiency and complexity in grammars. OUP Oxford. Hengeveld, K. (2007). Parts-of-speech systems and morphological types. ACLC Working Papers, 2, 31. Hengeveld, K., Rijkhoff, J., & Siewierska, A. (2004). Parts-of-speech systems and word order. Journal of Linguistics, 40(3), 527–570. Herbst, H. (1988). A valency model for nouns in English. Journal of Linguistics, 24(2), 265–301. Herbst, T., David, H., Ian, F. R., & Dieter, G. (2004). A valency dictionary of English: A corpus-based anaysis of the complementation patterns of English verbs, nouns and adjectives. De Gruyter Mouton. Heringer, H. J., Strecker, B., & Wimmer, R. (1980). Syntax. Fragen–Lösungen–­Alternativen. Wilhelm Fink Ver-lag. Hoffmann, L. (1976). Kommunikationsmittel Fachsprache. Eine Einführung. Akademie Verlag. Hou, R., & Jiang, M. (2014). Analysis on Chinese quantitative stylistic features based on text mining. Literary and Linguistic Computing: Digital Scholarship in the Humanities, 4, 1–11. Hřebíček, L. (1996). Word associations and text. Glottometrika, 15, 96–101. Huang, W., & Liu, H. (2009). Aplication of quantitative characteristic of Chinese genres in text clustering. Computer Engineering and Applications, 45(29), 25–27+33. (In Chinese). Hudson, R. (1995). Measuring Syntactic Difficulty. Unpublished paper. https://dickhudson. com/wp-content/uploads/2013/07/Difficulty.pdf (Accessed February 2, 2023). Hudson, R. (2010). An introduction to word grammar. Cambridge University Press. Jay, T. B. (2004). The psychology of language. Beijing University Press. Jiang, J., & Liu, H. (2015). The effects of sentence length on dependency distance, dependency direction and the implications - based on a parallel English-Chinese dependency treebank. Language Sciences, 50, 93–104. Jiang, J., & Ouyang, J. (2017). Dependency distance: A new perspective on the syntactic development in second language acquisition. Comment on “Dependency distance: A new perspective on syntactic patterns in natural language” by Haitao Liu et al. Physics of Life Reviews, 21, 209–210. Jiang, X., & Jiang, Y. (2020). Effect of dependency distance of source text on disfluencies in interpreting. Lingua, 243, 102873. Jin, H. (2018). A dependency-based study on the development of deaf students’ syntactic ability in Chinese written sentences. In H. Liu (Ed.), Advances in quantitative linguistics (pp. 315–328). Zhejiang University Press. (In Chinese).

Syntactic Dependency Relations and Related Properties 161 Jin, H., & Liu, H. (2018). Regular dynamic patterns of verbal valency ellipsis in modern spoken Chinese. In J. Jiang & H. Liu (Eds.), Quantitative analysis of dependency structures (pp. 101–118). De Gruyter Mouton. Jing, Y., & Liu, H. (2017). Dependency distance motifs in 21 Indo-European languages. In H. Liu & J. Liang (Eds.), Motifs in language and text (pp. 133–150). De Gruyter Mouton. Köhler, R. (1999). Syntactic structures: Properties and interrelations. Journal of Quantitative Linguistics, 6(1), 46–57. Köhler, R. (2005). Synergetic linguistics. In R. Köhler, G. Altmann, & R. G. Piotrowski (Eds.), Quantitative Linguistik. Ein Internationales Handbuch. Quantitative linguistics. An international handbook (pp. 760–774). De Gruyter Mouton. Köhler, R. (2012). Quantitative syntax analysis. De Gruyter Mouton. Köhler, R. (2015). Linguistic motifs. In G. K. Mikros & J. Mačutek (Eds.), Sequences in language and text (pp. 107–129). De Gruyter Mouton. Köhler, R., & Naumann, S. (2010). A syntagmatic approach to automatic text classification. Statistical properties of F- and L-motifs as text characteristics. In P. Grzybek, E. Kelih, & J. Mačutek (Eds.), Text and language. Structures, functions, interrelations (pp. 81–90). Praesens Verlag. Lei, L., & Wen, J. (2020). Is dependency distance experiencing a process of minimization? A diachronic study based on the State of the Union addresses. Lingua, 239, 102762. Levy, R., Fedorenko, E., & Gibson, E. (2013). The syntactic complexity of Russian relative clauses. J. Mem. Lang, 69(4), 461–495. Li, W. (2018). A Chinese-English comparative study on the quantitative syntactic features of subjects and objects. In H. Liu (Ed.), Advances in quantitative linguistics (pp. 244–267). Zhejiang University Press. (In Chinese). Li, W., & Yan, J. (2021). Probability distribution of dependency distance based on a treebank of Japanese EFL Learners’ interlanguage. Journal of Quantitative Linguistics, 28(2), 172–186. Li, X., Li, X., & He, K. (2015). A corpus-based study of French lexical features. Research in Teaching, 31, 116–117. (In Chinese). Liang, J. (2017). Dependency distance differences across interpreting types: Implications for cognitive demand. Frontiers in Psychology, 8, 2132. Liang, J., & Liu, H. (2013). Noun distribution in natural languages. Poznań Studies in Contemporary Linguistics, 49(4), 509–529. Liang, J., & Liu, H. (2016). Interdisciplinary studies of linguistics: Language universals, human cognition and big-data analysis. Journal of Zhejiang University (Humanities and Social Sciences Edition), 46(1), 108–118. (In Chinese). Liu, B., & Liu, H. (2011). A corpus-based diachronic study on the syntactic valence of Chinese verbs. Language Teaching and Linguistic Studies, 6, 83–89. (In Chinese). Liu, B., Niu, Y., & Liu, H. (2012). A study of Chinese stylistic differences based on an annotated dependency treebank. Applied Linguistics, (4), 134–142. (In Chinese). Liu, B., & Xu, C. (2018). A diachronic study on dependency distance and dependency ­direction of vernacular Chinese. In H. Liu (Ed.), Advances in quantitative linguistics (pp. 329–343). Zhejiang University Press. (In Chinese). Liu, G. (2016). A corpus-based survey of English nominal frequency and its corresponding stylistic distribution. Shandong Foreign Language Teaching, 4, 3–11+42. (In Chinese). Liu, H. (2006). Syntactic parsing based on dependency relations. Grkg/Humankybernetik, 47(3), 124–135. Liu, H. (2007). 
Probability distribution of dependency distance. Glottometrics, (15): 1–12. Liu, H. (2008). Dependency distance as a metric of language comprehension difficulty. Journal of Cognitive Science, 9(2), 159–191.

162  Syntactic Dependency Relations and Related Properties Liu, H. (2011). Quantitative properties of English verb valency. Journal of Quantitative Linguistics, 18(3), 207–233. Liu, H. (2009a). Dependency grammar: From theory to practice. Science Press. (In Chinese). Liu, H. (2009b). Probability distribution of dependencies based on Chinese dependency treebank. Journal of Quantitative Linguistics, 16(3), 256–273. Liu, H., & Feng, Z. (2007). Probabilistic valency pattern theory for natural language processing. Linguistic Sciences, (03): 32–41. (In Chinese). Liu, H., & Liang, J. (Eds.). (2017). Motifs in language and text. De Gruyter Mouton. Liu, H., Xu, C., & Liang, J. (2017). Dependency distance: A new perspective on syntactic patterns in natural languages. Physics of Life Review, 21, 171–193. Lu, J., & Liu, H. (2020). Do English noun phrases tend to minimize dependency distance? Australian Journal of Linguistics, 40(2), 246–262. http://doi.org/10.1080/07268602.2020. 1789552 Lu, Q. (2018). Effects of crossings, root position and chunking on depenedency distance. In H. Liu (Ed.), Advances in quantitative linguistics (pp. 198–210). Zhejiang University Press. (In Chinese). Lu, Q., Lin, Y., & Liu, H. (2018). Dynamic valency and dependency distance. In J. Jiang & H. Liu (Eds.), Quantitative analysis of dependency structures (pp. 145–166). De Gruyter Mouton. Lu, Q., & Liu, H. (2016a). A quantitative analysis of the relationship between crossing and dependency distance in human language. Journal of Shanxi University (Philosophy, & Social Science), 39(4), 49–56. (In Chinese) Lu, Q., & Liu, H. (2016b). Does dependency distance distribute regularly? Journal of Zhejiang University (Humanities and Social Science, 4, 49–56. (In Chinese). Mikk, J. (1997). Parts of speech in predicting reading comprehension. Journal of Quantitative Linguistics, 4(1–3), 156–163. Mikk, J., & Elts, J. (1992). Dependence of interest in reading on text characteristics. In Psychological Problems of Reading (pp. 44–46). Vilnius, May 6–7, 1992. Harper and Brothers. Mikk, J., & Elts, J. (1993). Comparison of texts on familiar or unfamiliar subject matter. In L. Hřebíček & G. Altmann (Eds.), Quantitative text analysis (pp. 228–238). WVT. Mikros, G. K., & Mačutek, J. (Eds.). (2015). Sequences in language and text. De Gruyter Mouton. Miller, G. A. (1956). The magical number seven plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97. Miller, G. A., & Chomsky, N. (1963). Finitary modeals of language users. In R. D. Luce, R. R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (Vol. II, pp. 419–491). Wiley. Mo, P., & Shan, Q. (1985). Statistical analysis of syntactic functions of three major content word classes. Journal of Nanjing Normal University, 3, 55–63. (In Chinese). Muryasov, R. Z. (2019). On the periphery of the parts of speech system. Xlinguae. European Scientific Language Journal, 12(4), 51. Ninio, A. (1998). Acquiring a dependency grammar: The first three stages in the acquisition of multiword combinations in Hebrew-speaking children. In G. Makiello-Jarza, J. Kaiser, & M. Smolczynska (Eds.), Language acquisition and developmental psychology. Universitas. Ouyang, J., & Jiang, J. (2018). Can the probability distribution of dependency distance measure language proficiency of second language learners? Journal of Quantitative Linguistics, 25(4), 295–313. Ouyang, J., Jiang, J., & Liu, H. (2022). Dependency distance measures in assessing syntactic complexity. 
Assessing Writing, 51, 100603.

Syntactic Dependency Relations and Related Properties 163 Pan, X., Chen, X., & Liu, H. (2018). Harmony in diversity: The language codes in English– Chinese poetry translation. Digital Scholarship in the Humanities, 33(1), 128–142. Pan, X., & Liu, H. (2018). Adnominals in modern Chinese and their distribution properties. Glottometrics, 29, 1–30. Pawłowski, A. (1997). Time-Series analysis in linguistics: Application of the ARIMA method to cases of spoken Polish. Journal of Quantitative Linguistics, 4(1–3), 203–221. Popescu, I.-I., Altmann, G., & Köhler, R. (2010). Zipf’s law – Another view. Quality & Quantity, 44(4), 713–731. Pyles, T. (1971). The origins and development of the English language. Harcourt Brace Jovanovich. Sampson, G. (1997). Depth in English grammar. Journal of Linguistics, 33, 131–151. Schweers, A., & Zhu, J. (1991). Wortartenklassifizierung im Lateinischen, Deutschen und Chinesischen. In U. Rothe (Ed.), Diversification processes in language: Grammar (pp. 157–165). Margit Rottmann Medienverlag. Sleator, D., & Temperley, D. (1991). Parsing English with a link grammar. Carnegie Mellon University Computer Science technical report CMU-CS-91-196. Sun, L. (2020). Flexibility in the parts-of-speech system of classical Chinese. De Gruyter Mouton. Temperley, D. (2007). Minimization of dependency length in written English. Cognition, 105(2), 300–333. Temperley, D. (2008). Dependency-length minimization in natural and artificial languages. Journal of Quantitative Linguistics, 15(3), 256–282. Tesnière, L. (1959). Éléments de syntaxe structurale. Klincksieck. Tuldava, J., & Villup, A. (1976). Sõnaliikide sagedusest ilukirjandusproosa autorikõnes [Statistical analysis of parts of speech in Estonian fiction]. Töid keelestatistika alalt I (pp. 61–102) [Papers on linguostatistics I]. Tartu. (Summary in English, pp. 105–106). Tuzzi, A., Popescu, I. I., & Altmann, G. (2009). Zipf’s laws in Italian texts. Journal of Quantitative Linguistics, 16(4), 354–367. Vulanović, R. (2008a). The combinatorics of word order in flexible parts-of-speech systems. Glottotheory, 1, 74–84. Vulanović, R. (2008b). A mathematical analysis of parts-of-speech systems. Glottometrics, 17, 51–65. Vulanović, R. (2009). Efficiency of flexible parts-of-speech systems. In R. Köhler (Ed.), Issues in quantitative linguistics (pp. 155–175). RAM-Verlag. Wagers, M. W., & Phillips, C. (2014). Going the distance: Memory and control processes in active dependency construction. Quarterly Journal of Experimental Psychology, 67(7), 1274–1304. Wang, H. (2018). A quantitative study on measuring length of Chinese and English noun phrases. In H. Liu (Ed.), Advances in quantitative linguistics (pp. 268–280). Zhejiang University Press. (In Chinese). Wang, Y. (2015). A study on the distribution of dependency distances in different domains of written English in the BNC. Master dissertation, Dalian Maritime University. Wang, Y., & Liu, H. (2017). The effects of genre on dependency distance and dependency direction. Language Sciences, 59, 135–147. Wiio, O. A. (1968). Readability, comprehension and readership. Acta Universitatis Tamperensis, ser. A, 22. Tampere. Xu, C. (2018). Dependency distance between preposition zai and the clause subject. In H. Liu (Ed.), Advances in quantitative linguistics (pp. 231–243). Zhejiang University Press. (In Chinese).

164  Syntactic Dependency Relations and Related Properties Xu, Y. (2006). Examining grammatical functions of content words and reconstructing word class system. Dissertation, Nanjing Normal University. (In Chinese). Yan, J. (2017). The rank-frequency distribution of part-of-speech motif and dependency motif in the deaf Larners’ compositions. In H. Liu & J. Liang (Eds.), Motifs in language and text (pp. 181–200). De Gruyter Mouton. Yan, J. (2018). A treebank-based study of syntactic development of prepositions in deaf students’ written language. In H. Liu (Ed.), Advances in quantitative linguistics (pp. 231–243). Zhejiang University Press. (In Chinese). Yan, J., & Liu, S. (2017). The distribution of dependency relations in great expectations and Jane Eyre. Glottometircs, 37, 13–33. Yan, J., & Liu, H. (2021). Quantitative analysis of Chinese and English verb valencies based on the probabilistic valency pattern theory. Chinese Lexical Semantic Workshop (CLSW 2021), Nanjing, China. Yin, B. (2014). An examination of the distribution and syntactic function of modern Chinese range adverbs. Journal of Jiaozuo University, 28(3), 20–21. (In Chinese). Yngve, V. (1960). A model and a hypothesis for language structure. Proceedings of the American Philosophical Society, 104: 444–466. Yu, S. (2018). Linguistic interpretation of Zipf’s law. In H. Liu (Ed.), Advances in quantitative linguistics (pp. 1–25). Zhejiang University Press. (In Chinese) Yu, S., Liang, J., & Liu, H. (2016). Existence of Hierarchies and Human’s Pursuit of Top Hierarchy Lead to Power Law. http://arxiv.org/abs/1609.07680 Yue, M., & Liu, H. (2011). Probability distribution of discourse relations based on a Chinese RST-annotated corpus. Journal of Quantitative Linguistics, 18(2), 107–121. Zhang, H. (2018). A Chinese-English Synergetic Syntactic Model Based on Dependency Relations. Doctoral dissertation, Zhejiang Univiersity. (In Chinese). Zhang, H. (2023). Quantitative syntactic features of Chinese and English—a dependencybased comparative study. Zhejiang University Press. (In Chinese). Zhang, H., & Liu, H. (2017b). Motifs of generalized valencies. In H. Liu & J. Liang (Eds.), Motifs in language and text (pp. 231–260). De Gruyter Mouton. Zhang, Z. (2012). A corpus study of variation in written Chinese. Corpus Linguistics and Linguistic Theory, 8(1), 209–240. Zhao, Y., & Liu, H. (2014). Tendency to minimize dependency distance in understanding ambiguous structure. Computer Engineering and Applications, 50(6), 7–10. (In Chinese). Zhou, C. (2021). On mean dependency distance as a metric of translation quality assessment. Indian Journal of Language and Linguistics, 2(4), 23–30. Zhu, H., Liu, X., & Pang, N. (2022). Investigating diachronic change in dependency distance of Modern English: A genre-specific perspective. Lingua, 103307. Zhu, J., & Best, K.-H. (1992). Zum Wort im modernen Chinesisch. Oriens Extremus, 35, 45–60. Ziegler, A. (1998). Word class frequencies in Brazilian-Portuguese press texts. Journal of Quantitative Linguistics, 5, 269–280. Ziegler, A. (2001). Word class frequencies in Portuguese press texts. In L. Uhlířová, G. Wimmer, G. Altmann, & R. Köhler (Eds.), Text as a linguistic paradigm: Levels, constituents, constructs. Festschrift in honour of Luděk Hřebíček (pp. 295–312). Wissenschaftlicher Verlag Trier. Ziegler, A., Best, K.-H., & Altmann, G. (2001). A contribution to text spectra. Glottometrics, 1, 97–108. Zipf, G. K. (1932). Selected studies of the principle of relative frequency in language. Harvard University Press.

6

Discourse Dependency Relations

Having discussed the syntactic dependency relations, we can move on to discourse dependency relations, examining whether these dependency relations and related properties/entities behave similarly. In the converted RST trees (cf. Section 4.3.2), all EDUs (elementary discourse units) are elementary units, and thus we are able to examine the relations between the units at the same level and examine the discourse process of forming bigger linguistic units through their combinations. In this chapter, to simplify the discussion, we will define three granularity levels of the discourse units as follows: 1 The lowest level, where the EDUs are basically clauses. 2 The intermediate level, where the EDUs are sentences. 3 The highest level, where the EDUs are paragraphs. This chapter is organised in the following way: Section 6.1 concentrates on the percentages of RST relations across distinct levels and discusses the relations per se along with the relevant discourse processes. Sections 6.2 and 6.3 examine the rank-frequency of RST relations and of their motifs, separately, with an aim of modelling their distributions. Section 6.4 visualises the structural “inverted pyramids” in English news discourse. The next two sections examine the rank-frequency distributions of discourse valency (including their motifs) and of DDs. Finally, Section 6.7 addresses why trees are reframed and Section 6.8 summarises the whole chapter. 6.1  RST relations per se 6.1.1 Introduction

We can analyse discourse structures from both hierarchical and relational dimensions (Sanders & van Wijk, 1996). The hierarchical dimension addresses issues like government of one text segment over another and the distance between connected segments. Many researchers postulate a hierarchical structure for discourse (e.g., Bille, 2005; Chiarcos & Krasavina, 2008; Mann & Thompson, 1988; Moser et al., DOI: 10.4324/9781003436874-6

166  Discourse Dependency Relations 1996; Pitkin, 1977; Polanyi et al., 2004; Sun & Xiong, 2019). For instance, Sun and Xiong (2019) convert the RST-DT corpus into dependency type and examine the dependency distance of each relation type. The second dimension of discourse structure analysis focuses on the meaning of connections (e.g., d’Andrade, 1990; den Ouden et al., 2009; Guo et al., 2020; Hovy & Maier, 1992; Liu et al., 2016; Louwerse, 2001; Narasimhan & Barzilay, 2015; Peng et al., 2017; Sahu et al., 2019). When the relational dimension of RST is concerned, several related issues are quantitatively examined. It’s found that certain relations occur more as leaf nodes (mainly clauses) and some others, more in bigger spans (Carlson & Marcu, 2001). Similarly, Williams and Reiter (2003) notice the uneven distribution of RST relations at different layers in the rhetorical structure trees–some relations are found to occur more at lower layers while some others behave otherwise. From the RST-DT (Carlson et al., 2002, 2003), rhetorical complexity data are derived by Williams and Power (2008). They define such complexity of a node as the number of EDUs directly under it. Specifically, each type of relation is examined from both the perspective of satellites and that of nuclei. This research also constitutes a validation of the observation by Williams and Reiter (2003). Some researchers (e.g., Beliankou et al., 2012; Yue & Liu, 2011) fit the RST rhetorical relations to certain distribution models. For instance, Yue and Liu (2011) find that the ZA (modified right-truncated Zipf-Alekseev) model fits well the data of relations in 20 randomly selected texts from a Chinese RST-annotated corpus (Yue & Feng, 2005). Their study suggests relations as result of a diversification process (Altmann, 1991, 2005a). For these studies, the nodes of the discourse tree come at different granularity. We therefore wonder whether it’s possible to discuss the relations between terminal nodes of the same granularity level, be they mere clauses, or sentence, or paragraphs or even beyond. Li et al. (2014) have partly addressed this issue. Using two algorithms and employing graph-based dependency parsing techniques, they convert RST-DT trees into ones with only terminal clausal EDUs. The distribution of such RST relations is examined and the relevant information is available in Table 6.1. Li et al. (2014) identify 19 types of coarse-grained relations. As suggested by Section 4.3.2, we have reframed each tree into ones with terminal nodes at one distinct level, with the nodes being clauses, sentences and paragraphs, respectively. This operation enables us (1) to investigate relations at three different levels separately and exclusively and (2) to examine at three levels whether each and every rhetorical relation can, in the discourse processes, combine nodes at its next immediate level into nodes at a higher level (coherent bigger units). We illustrate with two examples. Elaboration, as the name suggests, is a relation to elaborate something, or to offer more details. Such a relation is quite representative and can be observed at all the three levels. But one question arises, does it possess the same significance across levels? Or consider Topic-drift, which indicates the smooth drift of topic information from one span to the next one. We might ask whether it can be observed at all the three levels, between sub-sentential units, between sentences and between paragraphs. 
If the answer is affirmative, we can pursue: Is it applied with similar

Discourse Dependency Relations 167 Table 6.1  Distribution of rhetorical relations in Li et al. (2014)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

Relations

Train

%

Test

%

Elaboration Attribution Joint Same-unit Contrast Explanation Background Cause Evaluation Enablement Temporal ROOT Comparison Condition MannerMeans Summary Topic-Change Textual TopicComment

6879 2641 1711 1230 944 849 786 785 502 500 426 342 273 258 191 188 187 147 126

36.3 13.9 9.0 6.5 5.0 4.5 4.1 4.1 2.6 2.6 2.2 1.8 1.4 1.4 1.0 1.0 1.0 0.8 0.7

796 343 212 127 146 110 111 82 80 46 73 38 29 48 27 32 13 9 24

33.9 14.6 9.0 5.4 6.2 4.7 4.7 3.5 3.4 2.0 3.1 1.6 1.2 2.0 1.2 1.4 0.6 0.4 1.0

percentages across levels? If the answer is negative, we might wonder what leads to the absence at a certain level. We come up with a research question. Research Question 6.1.1: Do the distribution patterns of RST relations reveal differences and similarities of discourse processes at different levels (generating sentences from clauses, paragraphs from sentences, and then discourse from paragraphs)? We will examine RST relations along three taxonomy levels: Taxonomy 1, the most elaborate one, is marked in the RST-DT manual. For instance, under the umbrella relation Elaboration (E-), there are eight kinds (Elaboration-additional, Elaboration-general-specific, Elaboration-part-whole, Elaboration-process-step, Elaboration-object-attribute, Elaboration-set-member, Example and Definition). If these relations are embedded, they are marked with -e, for instance, E-object-attribute-e. This elaborate classification also distinguishes between multinuclear and mono-nuclear relations. There are, for instance, Consequence (multinuclear), Consequence-n (mono-nuclear, as a nucleus), Consequence-s (mono-nuclear, as a satellite). Then there are actually altogether 87 such relations in the RST-DT, which is different from the number of 78 in the manual (Carlson & Marcu, 2001). A reason for the disparity results from whether to indicate relations as being a satellite(-s), a nucleus(-n) or being embedded (-e). Taxonomy 2 groups relations with the same head as one type. For instance, the first 6 E-type relations are grouped as Elaboration. Thus the previous eight kinds reduce to three kinds: Elaboration, Example and Definition. In the RST-DT, there are 45 types of relations at this taxonomy level.

168  Discourse Dependency Relations Taxonomy 3 further reduces the 45 types into 16 general classes, with each class sharing some type of rhetorical meaning. At taxonomy 2, Elaboration, Example and Definition are three types of relations. Now sharing some rhetorical similarities, they are reduced to one only (still carrying the name of “Elaboration”). Likewise, Comparison, Preference, Analogy and Proportion can be grouped under one umbrella term “Comparison”. The next subsection will detail the research findings on these RST relations. 6.1.2  Research findings and discussion

This part focuses on the RST relations separated at three distinct levels. Initially, we examine the 16 general classes of the relations (taxonomy 3), the distribution of which is presented in Table 6.2. The relation Joint is not included in the table as both its members (List and Disjunction) are multinuclear and are missing as a result of the tree conversion. For instance, in Sample 1: [Call it a fad.] [Or call it the wave of the future.] wsj_0633 The alternative “[Call it a fad.]” and “[Or call it the wave of the future.]”, originally bear a relation of Disjunction in the RST-DT. After the conversion, they are both marked as dependents of their common head “[But,]”, bearing a relation of “Interpretation-s” with the head in the new tree. Similarly, at other levels, some relations might be missing. Between the two upper levels, an intimate interaction can be seen, with a Pearson correlation coefficient R2 of 0.988 (P-value = 0.000). There is a weaker but Table 6.2  Percentages of rhetorical relations across levels (16 classes) Relations

Percentages Between clauses within sentences

Elaboration Attribution Background Enablement Contrast Cause Explanation Condition Manner-Means Temporal Comparison Evaluation Summary Topic-Comment Topic-Change Total

38.8 31.6 5.4 4.9 3.6 3.4 2.8 2.5 2.1 1.9 1.1 1 0.9 0.1 100

Source: Zhang & Liu, 2016a

Between sentences within paragraphs 56.2 1.2 5.4 0.5 5.3 4.6 15.3 0.6 0.5 0.5 1 7.2 1.2 0.5 100

Between paragraphs 63.3 0.1 6.4 0.4 3.7 2.8 8.6 0.5 0.3 0.1 0.7 7.9 3.8 0.7 0.7 100

Discourse Dependency Relations 169 still acceptable correlation between the lowest sentence level (nodes being clauses within sentences) and the intermediate paragraph level (nodes being sentences with paragraphs) where Pearson correlation coefficient R2 is 0.705 (P-value = 0.005). Similarly, the Pearson correlation coefficient is 0.716 (P-value = 0.004) between the lowest level and the highest level (nodes being paragraphs). In terms of general classes of the relations, the two upper levels bear more resemblance. But still there is a common mechanism across the three related discourse processes in shaping units of one level into bigger spans at upper levels. We will examine some specific classes of relations to see whether some similar discourse processes obtain across levels. 1 As the most prevalent form to modify a nucleus (38.8%–63.3%), Elaboration is an umbrella term for eight specific types of relations. As is expected, to elaborate is a universal discourse process across levels. 2 With similar percentages (2.8%–6.4%), both Background and Cause occur across levels for somewhat similar discourse processes. A Background establishes the ground or a context for its nucleus and a Cause presents a cause for its nucleus. For both, their nuclei can be clauses, sentences and paragraphs. 3 Likewise, Comparison (0.7%–1.1%) and Contrast (3.7%–5.3%) can be found with similar percentages for each at all levels. Besides similarities, Table 6.2 also presents some differences between the lowest level and the other two levels. We illustrate some with their discourse functions. 1 Elaboration ranks No. 1 across levels, but with different percentages. At the lowest level, it occurs with a percentage of less than 40%, but at the two upper levels, they take up more than half of all relations (56.25% at the intermediate level and 63.3% at the highest level). The deficit for the lowest level can be made up by Attribution, which holds between nearly 1/3 of the relations (31.6%) at the lowest level, but rarely occurs at the intermediate level (1.2%) or at the upper level (0.1%). Attribution is used in reported speech, where somebody says something. Usually a reporting verb (e.g., “observed”) or a phrase like “according to” will be present. Sometimes cognitive predicates will be present when this relation is used to express thoughts, feelings and hopes (Carlson & Marcu, 2001). This can explain why Attribution is not distributed in a balanced fashion across levels. At the lowest level there is a high proportion of Attribution, which might be a feature unique to news-writing. 2 Mainly occurring between clauses are four relations like Enablement, Temporal, Condition, and Manner-Means. We illustrate with Manner-Means. This is an umbrella term for both Enablement and Purpose. Adverbials of purpose occur frequently, accounting for why Manner-Means majorly holds within sentences (between clauses). Likewise, time seldom goes beyond sentences (accounting for 0.5%), not to mention between paragraphs. So Temporal principally go between clauses within sentences (1.9%). The only exception occurring between paragraphs involves a topic shift, where an event happens “at the same time” with the preceding paragraph (Temporal-same-time).

170  Discourse Dependency Relations 3 Explanation and Evaluation behave otherwise. They majorly hold at the two upper levels (between sentences or between paragraphs). We take as an example Explanation (an umbrella term for Evidence, Explanation-argumentative and Reason), which accounts for only 1% at the lowest level. A clause is usually neither complex enough for situation assessment and evaluation (Evaluation), nor for providing a factual explanation (Explanation-argumentative), nor for convincing the reader of a point (Evidence). Besides what has been previously observed, two classes capture our special attention. 1 The only class that occurs exclusively at the highest level is Topic Change. Whether it’s a Topic-Drift or a Topic-Shift, the author starts the new topic in a new paragraph. This might be a feature unique to news-writing. 2 Topic-Comment is also an interesting class with its members behaving somewhat divergently. Question-answer, its representative member, occurs between nodes at all levels, but the other two members (Problem-solution and Statementresponse) only occur at the two upper levels. It’s impossible to accomplish problem raising and solution providing (Problem-solution), or to follow a response with a statement (Statement-response) within the same sentence. Another member is Rhetorical-question, which is always a self-contained sentence and therefore only holds between sentences. Despite the observation from the 16-type classification (taxonomy 3) that the upper two levels are more alike, we can still witness some core mechanism across the three levels for the discourse to proceed. 1 When RST relations with the most elaborated taxonomy (with a total of 86 RST relations) are examined, we find 45 relations common to all levels (accounting for 52.3% of the total). At the lowest level, there are the most relations (78); at a level upward, only 58 remain and further to the highest level, only 54 relations hold between paragraphs. 2 At the lowest level between clauses, some relations like Statement-response and Summary never occur. In addition, 24 embedded relations (–e) only obtain between clauses. These differences further tell the lowest level apart from the other two levels. 3 At the intermediate level, Rhetorical-question is the only relation that is unique to the sentence-sentence relation. At this intermediate level, the missing relations basically are embedded relations with their non-embedded relations present at this level. The only exceptions are Topic-drift and Topic-shift, which, as previously mentioned, occur only between paragraphs. 4 At the uppermost level, those missing relations are basically embedded ones. Like at the intermediate level, their non-embedded relations are present at this level. Temporal-before(-e) and Preference(-e) are two exceptions, which simply do not obtain between paragraphs. Assigning a clear preference (Preference)

Discourse Dependency Relations 171 for one of the two shall proceed within the same paragraph, thus occurring only between clauses or sentences. To save space, we don’t examine all of the 87 types. Rather, we just briefly touch upon the top 10 relations (Table 6.3). Among the top 10, Elaboration-additional (3.5% at the lowest level, 43.0% and 46.6% at the next two upper levels, respectively) and Circumstance (2.6%–4.8%, with similar percentages across levels) are two relations commonly observed across levels. When the mono-nuclear relation Circumstance is concerned, the satellite constitutes a context so that the nucleus-related situation can be interpreted. The discourse process of interpreting the situation occurs more between clauses (4.8%) than between sentences or between paragraphs (both with a percentage of 2.6%). But the mono-nuclear relation Elaboration-additional behaves quite differently. This relation, indicating a discourse process of satellites offering details about or additional information for the nucleus-related situations, takes up a big proportion of all discourse processes. At the two upper levels, it amounts to an overwhelming percentage (43%–46.6%), more than 10 times that at the lowest level (3.5%). The finding agrees with Carlson and Marcu, 2001) expectation of its particular popularity across large spans. Table 6.3  T he 10 most frequent rhetorical relations at various levels (an elaborate classification) (R. = rank) (E-=elaboration-) R.

Clauses as nodes

%

Sentences as nodes

%

Paragraphs as nodes

%

1

Attribution

30.0

E-additional

43

E-additional

46.6

2

E-objectattribute-e

22.9

Explanationargumentative

10.2

E-set-member

6.7

3

E-additional-e

7.3

Example

5.8

Explanationargumentative

6.2

4

Circumstance

4.8

E-generalspecific

4.1

Example

4.4

5

Purpose

4.4

Evidence

3.9

E-general-specific

3.9

6

E-additional

3.5

Antithesis

3.5

Background

3.9

7

E-generalspecific

2.0

Background

2.9

Evaluation-s

2.8

8

Condition

1.9

Circumstance

2.6

Summary

2.8

9

Antithesis

1.7

Interpretation-s

2.6

Circumstance

2.6

Concession

1.6

Consequence-s

2.2

Interpretation-s

2.5

10

sub-total Source: Zhang & Liu, 2016a

80.1

80.8

82.4

172  Discourse Dependency Relations Four relations, Circumstance, Condition, Concession and Purpose usually play the roles of adverbial modifiers and are thus more common at the lowest level between clauses. Example, Background, Interpretation-s and Elaboration-generalspecific behave differently as these relations occur more often between sentences or paragraphs. For instance, to provide more specific information to help define a very general concept (Elaboration-general-specific) can hardly be accomplished within a sentence. Or consider Background. Different from the stronger and somewhat cotemporal relation Circumstance, this relation goes with a context which occurs at distinctly different times and which is not always delimited sharply or specified clearly. The relation Antithesis, appearing more often at the two lower levels, particularly between sentences, necessitates some attention. This mono-nuclear relation, presenting a contrasting situation, boasts more frequencies than Contrast, which links multiple nuclei. We now turn to some rhetorical relations which are only more visible at one certain level. We still start from the lowest level, where Attribution (30%) and Elaborationobject-attribute-e (22.9%) together hold up more than half the sky. As it indicates a postmodifier giving meaning to an object, the latter occurs only 14 times at the intermediate level and merely four times at the highest level. At the intermediate level, providing evidence or justification (Evidence) is usually performed by sentences. Similarly, Consequence-s are more visible at this level between sentences. At the highest level, 6.7% of all the discourse processes go with introducing a finite set or a list of information where one or more members of the set are specifically elaborated by the satellite (Elaboration-set-member). This relation ranks second between paragraphs. This might be a unique feature of journalistic writings, where the elaboration occurs in separate paragraphs. Likewise, Evaluation-s and Summary are also relations more prominent at this level than at the other two levels. Writers usually summarise (Summary, 2.8%) or assess (Evaluation-s) in new paragraphs. 6.1.3  Section summary

This part of the observation sheds light on how clauses aggregate into sentences, sentences into paragraphs and paragraphs into passages through the connection of RST rhetorical relations. At these three levels, considerable similarities as well as consistent differences are observed. Most relations (54 out of a total of 87) can occur between EDUs of various granularity levels, but they come with different percentages. Some relations might be prominent only at a certain level, while some other relations might be absent from a certain level. These suggest that at various levels there are different discourse processes going on. But fundamentally, we see more resemblance between the two upper levels, suggesting some similar mechanisms operating between sentences and between paragraphs. What we have observed is in accordance with the predictions in the manual (Carlson & Marcu, 2001) and also with what William and Reiter (2003) have found

Discourse Dependency Relations 173 out, but we move a step further and quantitatively present the specific percentages of various relations at distinct levels. The incomplete list of observations is adequate for answering Research Question 6.1.1. Having discussed the percentages of various RST analysis, we can, in the next section, move on to the distribution patterns to fit certain models to the data. 6.2  Modelling distributions of RST relations 6.2.1 Introduction

The previous section focuses on the relations per se. In this section, we model their distributions. Using the right truncated modified ZA model, Yue and Liu (2011) model the distribution of RST relations in a Chinese RST-annotated corpus (Yue & Feng, 2005). This study justifies the rhetorical relations as a result of diversification process (Altmann, 1991, 2005a) and as normal language entities. Their corpus is a traditional one with spans coming at various levels. We posit that the RST relations at different levels in our converted corpus will follow the same ZA distribution model. The distributions will further incorporate RST relations at various taxonomy levels. We examine the three taxonomy levels of the relations, which was previously introduced at Section 6.1, with taxonomy 1 being the most elaborate (87 types), taxonomy 3, being the most general (16 types) and taxonomy 2 going in between (45 types). These taxonomies will be discussed so that we can further check the applicability and reliability to discuss RST rhetorical relations among the same level of units at all taxonomy levels. Also we might be able to justify these taxonomies if their relevant relations are regularly distributed, following a certain distribution pattern. It’s found in Section 5.2 that syntactic dependency relations/syntactic roles are distributed following the ZA distribution model. One of the purposes of the study is to examine whether dependencies at syntactic and discourse levels follow some common laws. Thus we come up with the hypothesis: Hypothesis 6.2.1: At all unit granularity levels, and relation taxonomy levels, RST rhetorical relations follow the same ZA distribution pattern. 6.2.2  Results and discussion

In this part, we first present the results. Then we discuss the taxonomies of relations along with the diversification process. 6.2.2.1  Distribution pattern of RST relations

Initially, Tables 6.4 and 6.5 present the rank-frequency data for RST relations at all unit levels. Table 6.4 stands for data at the more elaborate relation taxonomy

174  Discourse Dependency Relations Table 6.4  Rank-frequency data of taxonomies 1 and 2 (R. = rank, 1 = taxonomy 1, 2 = taxonomy 2, C = clause, S = sentence, P = paragraph) Rank

Taxonomy 1

Rank

C1

S1

P1

1

3455

1531

1427

2

2644

363

3

840

4

Taxonomy 2 C2

S2

P2

1

4352

1753

1802

204

2

3639

363

189

205

189

3

605

207

135

549

147

135

4

553

139

118

5

511

139

120

5

251

124

112

6

398

123

118

6

239

108

90

7

233

103

87

7

220

103

85

8

219

91

86

8

212

96

79

9

199

91

79

9

200

91

77

10

183

80

78

10

162

82

65

11

158

75

77

11

148

75

60

12

141

75

65

12

139

66

58

13

135

66

60

13

102

46

36

14

129

46

50

14

99

45

22

15

123

45

36

15

94

44

20

16

107

43

34

16

94

42

15

17

95

39

26

17

93

28

14

18

91

32

22

18

50

27

12

19

86

28

20

19

46

18

10

20

85

28

15

20

43

17

8

21

80

27

14

21

31

14

7

22

80

17

10

22

30

12

7

23

61

14

8

23

26

12

6

24

61

12

8

24

20

9

4

25

56

12

8

25

18

9

4

26

54

12

7

26

14

7

3

27

53

10

7

27

13

6

3

28

46

9

7

28

13

4

3 (Continued)

Discourse Dependency Relations 175 Table 6.4  (Continued) Rank

Taxonomy 1 C1

Rank S1

P1

Taxonomy 2 C2

S2

P2

29

45

9

7

29

10

4

3

30

42

9

6

30

7

3

3

31

41

9

5

31

4

3

3

32

40

7

4

32

1

2

3

33

38

6

4

33

1

2

2

34

34

6

3

34

1

2

35

32

5

3

35

1

1

36

30

5

3

36

1

1

37

25

4

3

37

38

20

4

3

39

20

4

3

40

18

4

3

41

18

3

2

42

17

3

2

43

16

3

2

44

14

2

2

45

14

2

2

46

13

2

1

47

13

2

1

48

13

2

1

49

13

1

1

50

12

1

1

51

11

1

1

52

9

1

1

53

9

1

1

54

8

1

1

55

7

1

56

7

1

57

7

1

1

(Continued)

176  Discourse Dependency Relations Table 6.4  (Continued) Rank

Taxonomy 1 C1

Rank S1

58

7

59

7

60

6

61

5

62

5

63

5

64

5

65

5

66

4

67

4

68

3

69

3

70

3

71

2

72

1

73

1

74

1

75

1

76

1

77

1

78

1 11529

P1

Taxonomy 2 C2

S2

P2

11529

3564

3063

1

3564

3063

total

Source: Zhang & Liu, 2016c

levels taxonomies 1 and 2, and Table 6.5, for the most general classifications with 16 classes (taxonomy 3). Table 6.6 presents the fitting results of the ZA model. All fittings yield quite good results, with determination coefficients all above 0.944, suggesting the model as a good fitting model and the relations at all level with all taxonomies as results of diversification processes. Table 6.7 typically presents the fitting of the model to data of RST relations in the intermediate level and Figure 6.1 is its graphic representation.

Table 6.5  R hetorical relations across levels (taxonomy 3) (R.= rank, Freq. = frequency) R.

Between clauses within sentences Relation Elaboration Attribution Background Enablement Contrast Cause Explanation Condition Manner-Means Temporal Comparison Evaluation Summary Topic-Comment total

Source: Zhang & Liu, 2016c

Freq. 4,476 3,639 618 567 412 391 321 283 242 220 130 120 99 11 11,529

% 38. 8 31. 6 5. 4 4. 9 3. 6 3. 4 2. 8 2. 5 2. 1 1. 9 1. 1 1. 0 0. 9 0. 1 100

Between paragraphs

Relation

Freq.

Elaboration Explanation Evaluation Background Contrast Cause Attribution Summary Comparison Condition Topic-Comment Temporal Enablement Manner-Means

2,004 547 255 194 190 163 42 41 37 22 19 18 16 16 3,564

% 56.2 15.3 7.2 5.4 5.3 4.6 1.2 1.2 1.0 0.6 0.5 0.5 0.4 0.4 100

Relation

Freq.

%

Elaboration Explanation Evaluation Background Summary Contrast Cause Comparison Topic Change Topic-Comment Condition Enablement Manner-Means Temporal Attribution

1,940 263 243 197 115 113 87 21 21 21 16 11 10 3 2 3,063

63. 3 8. 6 7. 9 6. 4 3. 8 3. 7 2. 8 0. 7 0. 7 0. 7 0. 5 0. 4 0. 3 0. 1 0. 1 100

Discourse Dependency Relations 177

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Between sentences within paragraphs

178  Discourse Dependency Relations Table 6.6  Parameter values of the ZA model fitted to data of RST relations between sentences within paragraphs Nodes

Relations

R2

a

b

n

α

Taxonomy 1: 87 types

clauses sentences paragraphs

all all all

0.9764 0.9964 0.9971

1.38 0.12 0.02

0.04 0.28 0.30

78 58 54

0.30 0.43 0.47

Taxonomy 2: 45 types

clauses sentences paragraphs

all all all

0.9443 0.9969 0.9982

1.50 0.13 0.20

0.13 0.30 0.27

33 36 37

0.38 0.49 0.59

Taxonomy 3: 16 classes

clauses sentences paragraphs

all all all

0.9510 0.9963 0.9971

1.69 0.35 0.05

0.08 0.46 0.50

14 14 15

0.39 0.56 0.63

Source: Zhang & Liu, 2016c

Table 6.7  Fitting the ZA model to data of RST relations between sentences within paragraphs (f[i]: empirical frequency, NP[i]: theoretical frequency) x[i]

f[i]

NP[i]

x[i]

f[i]

NP[i]

1 2 3 4 5 6 7

2004 547 255 194 190 163 42

2004.00 512.76 319.45 208.41 142.03 100.36 73.07

8 9 10 11 12 13 14

41 37 22 19 18 16 16

54.54 41.57 32.26 25.42 20.30 16.41 13.40

Source: Zhang & Liu, 2016c

Figure 6.1 Fitting the ZA model to data of RST relations between sentences within paragraphs Source: Zhang & Liu, 2016a

Discourse Dependency Relations 179 Table 6.8  Rank-frequency data of chosen RST relations Relations

Elaboration

Temporal

Topic-comment

Rank

Between clauses

Between sentences

Between paragraphs

Between clauses

Between paragraphs

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Total

2,644 840 398 233 80 54 53 45 40 25 20 18 14 11 1 4,476

1,531 205 147 43 32 14 10 9 4 3 2 2 1 1

1,427 204 135 120 34 8 4 3 2 1 1 1

86 80 41 5 5 3

7 5 3 2 1 1

2,004

1,940

220

19

Source: Zhang & Liu, 2016c

From among the rhetorical relation classes that have several representative members, we choose those with at least five members to examine how their members are distributed. We choose at least five to avoid data scarcity problem. ­Table  6.8 presents the rank-frequency data for the members of Elaboration, Temporal and Topic-comment. The excellent fittings as presented in Table 6.9 suggest these representative members are also results of diversification processes. We briefly introduce these three umbrella terms. Covering the most types of relations is Elaboration, which includes all Elaboration-initial relations (e.g., Elaboration-­general-specific, Elaboration-part-whole), Example and Definition. Taboada and Mann (2006a) identify the definition of rhetorical relations as a problem area as the sub-types are hard to tell apart. The fitting result, with all determination coefficients R2 above 0.99, seems to justify the definitions of Elaboration cluster of relations in the RST-DT. Temporal and Topic-Comment act likewise, sharing the same distribution pattern with their determination coefficients R2 above 0.98. Table 6.9  Fitting the right truncated modified Zipf-Alekseev distribution to the rankfrequency data of chosen RST relations

3 of the classes

Nodes

Relations

R2

a

b

n

α

clauses paragraphs sentences clauses paragraphs

elaboration elaboration elaboration temporal topic-comment

0.9993 0.9991 0.9973 0.9801 0.9968

1.16 0.44 0.52 0.26 0.00

0.36 0.74 0.66 1.18 0.66

15 14 12 6 6

0.59 0.76 0.74 0.39 0.37

Source: Zhang & Liu, 2016c

180  Discourse Dependency Relations The previous findings validate Hypothesis 6.2.1 and prove that rhetorical relations are all regularly distributed, following the same ZA pattern, whichever taxonomy they follow and whichever level of granularity they are at. Even sub-classes with at least five representative members share the same distribution model. The findings accord with Yue and Liu (2011). Fitting the ZA model to data in Table 6.1 (Distribution of rhetorical relations in Li et al., 2014) also yields excellent result with R2 being 0.9970 (for the train data) and 0.9897 (for the test data), which might constitute a validation for Li et al.’s (2014) algorithms. The common distribution model seems to suggest certain common mechanisms of how discourse proceeds through the connection of RST rhetorical relations at all levels. But what does the same distribution pattern mean? We will elaborate on the model and its significance in the next part. 6.2.2.2  Taxonomies in RST and the diversification process

This part focuses on taxonomies of relations employed in the RST-DT. How do discourse segments relate? Through relations? Then how many are needed? There is a fair amount of controversy. For instance, Grosz and Sidner (1986) propose only two basic structural relations carrying no semantic import: dominance and satisfaction-precedence. Hovy and Maier (1992, p. 4) call this position with only two intersegment relations “the Parsimonious Position”. At the other end of the spectrum is “the profligate position” (Hovy & Maier, 1992, p. 1), encouraging an open-ended set of relations. That’s why along the spectrum, some collections of rhetorical relations might come with a number of relations between 2 and 350 (Taboada & Mann, 2006a)! In many linguistic areas, a fixed inventory of relations is not necessarily required (Taboada & Mann, 2006a). But researchers have been debating over the taxonomies of rhetorical relations as they deem them intuitive and subjective. There is no uniform standard for the annotation, and particularly for some values, the agreement can be problematic (Scholman & Sanders, 2014). Albeit with all these factors, the taxonomy of RST rhetorical relations is neither fixed nor completely unstable; rather, it is a set of open relations but nonetheless relatively stable. In RST, initially, Mann and Thompson (1988) propose 24 relations; after that there have been some new proposals and discussions (e.g., Carlson & Marcu, 2001; Louwerse, 2001; Sanders et al., 1992; Taboada & Mann, 2006a). On the RST website (http://www.sfu.ca/rst/), the initial 24 member relations extend to 30. In the RST-DT corpus, there are 87 relations when the taxonomy is the most delicate. Besides diversification, there is also unification in play with suggestions to compartmentalise relations. For instance, Taboada and Mann (2006a) identify 12 classes. In the RST-DT, relations are grouped into 16 classes. The only two basic structural relations proposed by Grosz and Sidner (1986) are another example of unification. Validation or justification of rhetorical relations can take place in three dimensions. Initially, there have been studies empirically validating or justifying the rhetorical relations per se or their taxonomies. Some researchers carry out experiments and study how subjects experience with the relations. Some researchers compare

Discourse Dependency Relations 181 two ways of text representation. Before the proposal of RST, Meyer et al. (1980) find that coherence relations help ninth-grade students organise discourse, particularly when the coherence relations are explicitly marked. Likewise, Haberlandt (1982) proves that discourse segment processing can also be facilitated by marked coherence relations. Sanders et al. (1992) at the outset of RST propose descriptive adequacy and cognitive plausibility as two main features of RST. They carry out psycholinguistic experiments, proving that their subjects are rather sensitive to various relations. Such experiments prove that their taxonomy is psychologically salient, constituting evidence to the understanding of such relations. Whether explicitly marked or underspecified, coherent relations are important for comprehension (Cain, 2003; den Ouden et al., 2009; Graesser et al., 2001; Graesser et al., 2003; Ibáñez et al., 2019; Kintsch, 1994; Kleijn et al., 2019; McCarthy & McNamara, 2021; Rohde & Horton, 2014; Sanders & Noordman, 2000; Sanders et al., 2007; Spooren, 1997). For instance, Spooren (1997) finds such relations are used cooperatively by both speakers and hearers. Sanders and Noordman (2000) focus on relations explicitly marked, which are found to facilitate processing. But that doesn’t rule out coherent relations which are underspecified. An investigation on such prosodic realisation like articulation rate, pitch range and segments is carried out by den Ouden et al. (2009), who compare the RST-annotated version with the read-aloud version. The former is found to reflect prosodic characteristics of the texts and their organisational features. Rohde and Horton (2014) carry out an eyetracking investigation, which finds what readers expect about coherence relations between sentences can be revealed through their anticipatory looks. The second dimension of the proof, a more direct one is the practical applications, like in psycholinguistics (Das & Taboada, 2018; Kehler et al., 2008; Sanders et al., 1992; Spooren, 1997), computational linguistics (Jurafsky, 2004; Kibble & Power, 2004; Morris & Hirst, 1991; Sanders et al., 1993; Šnajder et al., 2019; Spooren & Degand, 2010; Wolf & Gibson, 2005) and discourse analysis ­(Al-khazraji, 2019; Kleijn et al., 2019; Miragoli et al., 2019; Siregar et al., 2021; van Dijk, 1982, 1983), which go beyond the original objective to generate texts (Taboada & Mann, 2006a). Finally, fitting certain distribution patterns to the RST data constitutes yet another dimension of the proof. Typical examples include studies from Beliankou et al. (2012) and Yue and Liu (2011). Along this vein of proof, we need to come back to the idea of the diversification process, which we have briefly introduced in Chapter 3. The ZA model is a well-known Zipf’s law-related pattern for modelling rankfrequency data of the diversified entities (Altmann, 1991; Köhler, 2012). Yue and Liu (2011) provide its linguistic interpretation. In this study, the good fitting justifies the linguistic status of RST rhetoric relations as results of diversification processes. It also validates the hypothesis in this sub-section (Hypothesis 6.2.1) and backs up research findings by Beliankou et al. (2012), Li et al. (2014) and Yue and Liu (2011). These studies collectively justify the taxonomies of the RST rhetorical relations and suggest the relations as results of diversification processes. This method works not just for relations in RST. Data from Ratnayaka et al. 
(2018) can also be fitted using the ZA model with R2 of 0.9736. But better models

182  Discourse Dependency Relations are negative hypergeometric (K, M, n) (R2 = 0.9915) and Polya (s, p, n) (R2 = 0.9892). This result also validates the relation classification in Cross-document Structure Theory (CST). So this validation method can be used for relation distribution regardless of the framework from which these relations are generated. 6.2.3  Section summary

This section examines the rank-frequency distributions of all the RST rhetorical relations across levels and finds that all of them, including three classes bearing at least five representative members, follow the same right truncated modified ZA distribution pattern. The fittings suggest possible common mechanisms of discourse processes in English news-writing. The fitting of the same distribution model to all sets of data justifies the three taxonomies of the rhetorical relations in the RST-DT, including the taxonomy of sub-types. With both diversification and unification forces in play, the resulting distribution of RST relations is found to obey a linguistically meaningful pattern, suggesting the relations as results of diversification processes. The common distribution patterns for syntactic and discourse relations seem to suggest some common mechanism at the two levels, which are both human-driven. These two sub-sections address the collective behaviours of RST relations, ignoring their sequential behaviours, which will be tackled in the next section. 6.3  Motifs in reconstructed RST discourse trees 6.3.1 Introduction

Most quantitative methods cover unordered inventories of elements or their relations. In other ways, language-in-the-mass methods, rather than language-in-the line ones are employed, generally ignoring sequential/syntagmatic behaviour of the objects under study (Köhler, 2015; Pawłowski, 1999). Using the data from the Postdam Commentary Corpus (Stede, 2004), Beliankou et al. (2012) investigate the motifs of RST relations and their lengths and find that the lengths of R-motifs (defined as uninterrupted sequences of unrepeated elements in the study) follow the hyper-Binomial distribution and justify R-motifs as a consequence of a diversification process. They also find that the lengths of D-motifs of elements following a depth-first path in a tree structure follow the mixed negative binomial distribution, which is interpreted as a result of a combination of two diversification processes. Most of the previous studies go with conventional RST representations with both elementary and non-elementary spans and thus render it impossible to examine RST relations at distinct levels of granularity exclusively and clearly. Another drawback is a reductionism, which is also long characterised by linguistic studies (Hřebíček, 1999): Previous RST studies mostly only cover sentences and/or sentential constituents.

Discourse Dependency Relations 183 This section is aimed at examining R-motifs of RST relations at all three granularity levels of constituents (namely clauses, sentences and paragraphs) and also the lengths of these motifs. For simplicity motifs thereafter mentioned in this section are all R-motifs unless otherwise indicated. We hypothesise: Hypothesis 6.3.1: At all separate levels, motifs of RST relations follow a common rank-frequency distribution pattern. Hypothesis 6.3.2: At all levels, lengths of the relation motifs are lawfully distributed, following the same rank-frequency distribution pattern. The remainder of this section is organised as follows. The next subsection elaborates on the research findings. And the final one presents a summary. 6.3.2  Research methods

Different from the previous two sections, where the three levels refer to (1) clauses aggregating into sentences, (2) sentences aggregating into paragraphs and (3) paragraphs aggregating into passages, in this section, we examine how three different granularity levels of EDUs constitute the whole discourse. Namely we investigate the dynamic discourse process of building discourses from (1) clauses—the lowest level of EDUs, (2) sentences—the intermediate level of EDUs and (3) paragraphs—the highest level of EDUs, respectively. As for examining motifs, the frame unit in this study is the whole sub-corpora. We will assign a role to all the nodes. Nodes serving a certain rhetoric relation will be assigned the role of that very relation. For the uppermost nodes, whether in mono-nuclear or multi-nuclear cases, they will be assigned the role of Root. This way, none of the EDUs will be left out and we can investigate the dynamic discourse processes with all the running EDUs. As the EDUs differ in each sub-corpus, we expect some diverse dynamic processes among them. And simultaneously, as all types of EDUs are closely related, we can also foresee some similarities. As the previous two sections have found the two upper two levels are more alike, we also expect this kind of resemblance when motifs and their lengths are concerned. We thus come up with a new hypothesis: Hypothesis 6.3.3: Both motifs and motif lengths of RST rhetorical relations exhibit both similarities and differences among the three different levels of EDUs in the converted corpora. Like in the previous section, we need to examine the relations along the same three taxonomies: Taxonomy 1: The most elaborate one with a total of 86 (rather than the claimed 78), where Elaboration includes 8 sub-kinds (6 Elaboration-initial relations, Example and Definition).

Taxonomy 2: The intermediate level, with a total of 37 types, where Elaboration has three members (Elaboration, Example and Definition).

Taxonomy 3: The most general one, with a total of 16 types, where Elaboration, Example and Definition are grouped under one umbrella term, Elaboration.

Discussions along all these taxonomies may further testify to the applicability and reliability of the tree conversion and the feasibility of the taxonomies. Thus we update the research hypothesis:

Updated Hypothesis 6.3.3: Both motifs and motif lengths of RST rhetorical relations along the three taxonomies exhibit both similarities and differences among the three different levels of EDUs in the converted corpora.

Before we move on, we illustrate how the motifs are obtained. As presented in Section 4.5.1, we define motifs as sequences of uninterrupted, unrepeated elements, since RST rhetorical relations are nominal rather than numeric. We illustrate with a sample sequence of clause EDUs, which has the following sequence of RST relations (taxonomy 3): Root, Explanation, Manner-Means, Root, Elaboration, Elaboration, Elaboration, Enablement, Background, Elaboration. From it, the following motifs are obtained: Root + Explanation + Manner-Means, Root + Elaboration, Elaboration, Elaboration + Enablement + Background, Elaboration. The first motif ends with Manner-Means because the next relation is Root, a repetition of one of its elements. Similarly, the second motif has to end with Elaboration, as there cannot be repeated elements within the same motif. Here, we use + to indicate sequential order.

6.3.3  Results and discussion

6.3.3.1  Motifs of RST relations
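The segmentation just illustrated can be sketched in a few lines of Python (a minimal illustration only, not the scripts actually used in this study; the function name is ours):

```python
def r_motifs(relations):
    """Split a sequence of nominal RST relations into R-motifs:
    maximal uninterrupted runs in which no relation is repeated."""
    motifs, current, seen = [], [], set()
    for rel in relations:
        if rel in seen:              # a repetition closes the current motif
            motifs.append(current)
            current, seen = [], set()
        current.append(rel)
        seen.add(rel)
    if current:
        motifs.append(current)
    return motifs

sample = ["Root", "Explanation", "Manner-Means", "Root", "Elaboration",
          "Elaboration", "Elaboration", "Enablement", "Background", "Elaboration"]
print([" + ".join(m) for m in r_motifs(sample)])
# ['Root + Explanation + Manner-Means', 'Root + Elaboration', 'Elaboration',
#  'Elaboration + Enablement + Background', 'Elaboration']
```

Run on the sample sequence above, the sketch reproduces the five motifs listed in the preceding paragraph.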

Table 6.10 presents a summary of the nine rank-frequency distribution groups for the motifs of rhetorical relations. We illustrate with the top 5 data for taxonomy 1 when clauses are EDUs (Table 6.11) and the top 5 data for taxonomy 3 when paragraphs are EDUs (Table 6.12). The motifs of all nine groups of complete data are found to follow the same distribution model, the negative binomial (k, p), with all determination coefficients R²

Table 6.10  A summary of rank-frequency distribution data for discourse relation motifs (T = taxonomy)

Nodes      Clauses                     Sentences                 Paragraphs
Taxonomy   T1       T2       T3        T1      T2      T3        T1      T2      T3
Types      3,559    2,181    1,404     1,102   881     510       527     429     285
Tokens     11,163   13,939   14,570    4,106   4,478   4,905     2,008   2,138   2,274

Source: Zhang & Liu, 2017a

Table 6.11  Rank-frequency distribution of discourse relation motifs (taxonomy 1, clauses as nodes, part)

Rank    Motifs                                                     Freq.    %
1       Elaboration-additional                                     2,086    18.687
2       Elaboration-additional + Attribution                       321      2.876
3       Elaboration-additional + Elaboration-object-attribute-e    304      2.723
4       Attribution                                                290      2.598
5       Attribution + Elaboration-additional                       218      1.953
        Sub-total                                                  3,219    28.837
...     ...                                                        ...      ...
3,559   Circumstance + Elaboration-additional + Background         1        0.009
        Total                                                      11,163   100%

Source: Zhang & Liu, 2017a

Table 6.12  Rank-frequency distribution of discourse relation motifs (taxonomy 3, paragraphs as nodes, part)

Rank   Motifs                                                                      Freq.   %
1      Elaboration                                                                 1,065   46.834
2      Root                                                                        182     8.004
3      Elaboration + Root                                                          171     7.520
4      Root + Elaboration                                                          67      2.946
5      Elaboration + Explanation                                                   55      2.419
       Sub-total                                                                   1,540   67.723
...    ...                                                                         ...     ...
285    Elaboration + Explanation + Evaluation + Root + Attribution + Background    1       0.044
       Total                                                                       2,274   100%

Source: Zhang & Liu, 2017a

Table 6.13  Fitting the negative binomial distribution pattern to rank-frequency data of discourse relation motifs (T = taxonomy)

Nodes        T    R²       k        p        N
Clauses      T1   0.9527   0.2137   0.0004   11,163
             T2   0.9973   0.1266   0.0007   13,939
             T3   0.9974   0.1265   0.0014   14,570
Sentences    T1   0.9830   0.1550   0.0010   4,106
             T2   0.9952   0.1213   0.0013   4,478
             T3   0.9975   0.1180   0.0034   4,905
Paragraphs   T1   0.9460   0.1857   0.0018   2,008
             T2   0.9785   0.1402   0.0021   2,138
             T3   0.9923   0.1443   0.0063   2,274

Source: Zhang & Liu, 2017a

above 0.94 (Table 6.13). The fitting results corroborate Hypothesis 6.3.1, showing that at all separate levels, motifs of RST relations are lawfully distributed. In this part, we address the three hypotheses in turn. First, we investigate how the motifs are distributed. Then we pursue the distribution of motif lengths. Finally, we compare the parameters of the distribution patterns to see whether they exhibit similarities, differences or connections.

6.3.3.2  Lengths of motifs

We resume with the motif lengths, that is, the number of relations each motif contains. Table 6.14 illustrates how motif lengths are calculated: the number of elements in a motif corresponds to its length. Table 6.15 presents the length-frequency data for all nine groups. In most cases, length and rank coincide, with only one exception, which occurs at the highest level with paragraphs as EDUs (Table 6.15). This is an outlier with negligible influence. We thus treat the length-frequency data as rank-frequency data. It is found that motif lengths observe the positive negative binomial distribution model with excellent fitting results (all R² above 0.99), corroborating Hypothesis 6.3.2 and showing that the lengths of the relation motifs are lawfully distributed. Table 6.17 and its graphic representation, Figure 6.2, illustrate the fitting for clause nodes (taxonomy 3).

Table 6.14  Measuring discourse relation motif lengths

Motifs                                                                       Length
Elaboration                                                                  1
Elaboration + Explanation                                                    2
Elaboration + Explanation + Evaluation + Root + Attribution + Background    6

Source: Zhang & Liu, 2017a

Table 6.15  Length-frequency distribution of lengths of discourse relation motifs (T = taxonomy)

         Clauses as nodes          Sentences as nodes        Paragraphs as nodes
Length   T1      T2      T3        T1      T2      T3        T1      T2      T3
1        3951    7013    7726      1954    2364    2779      1065    1185    1308
2        3501    4154    4269      1067    1166    1331      464     526     576
3        1992    1737    1706      590     547     536       244     234     254
4        917     664     602       285     247     182       128     116     97
5        432     256     204       117     91      54        71      52      28
6        229     78      46        61      42      20        30      22      11
7        89      25      13        21      14      3         5       3
8        25      10      4         7       5
9        17      2                 4       2
10       8
11       2                                                   1

Source: Zhang & Liu, 2017a

6.3.3.3  Comparison among granularity levels

For both rhetorical relation motifs and their lengths, each group of data follows a common distribution pattern, the negative binomial distribution and the positive negative binomial distribution, respectively. Therefore we can tentatively conclude that there is a common mechanism of discourse process through the connection of

Table 6.16  Fitting the positive negative binomial distribution to rank-frequency data of motif lengths (three levels) (T = taxonomy)

Nodes        T    R²       k        p        N
Clauses      T1   0.9958   5.7461   0.7630   11,163
             T2   0.9996   4.7864   0.8025   13,939
             T3   0.9999   6.3673   0.8519   14,570
Sentences    T1   0.9988   2.1767   0.6340   4106
             T2   0.9997   1.9543   0.6548   4478
             T3   0.9999   4.0537   0.8067   4905
Paragraphs   T1   0.9973   1.0234   0.5221   2008
             T2   0.9996   1.1516   0.5769   2138
             T3   0.9990   2.5643   0.7380   2274

Source: Zhang & Liu, 2017a

Table 6.17  Fitting the positive negative binomial distribution to rank-frequency data of discourse motif lengths (taxonomy 3, clauses as nodes) (f[i]: empirical frequency, NP[i]: theoretical frequency)

x[i]     1         2         3         4        5        6       7       8
f[i]     7726      4269      1706      602      204      46      13      4
NP[i]    7741.25   4223.25   1744.49   605.03   185.79   52.13   13.64   4.41

k = 6.3673, p = 0.8519, R² = 0.9999

Source: Zhang & Liu, 2017a

rhetorical relations (regardless of the granularity level of EDUs or the taxonomy of relations). Following the discovery of the common distribution models, we turn to the choice of model and then to the examination of the model parameters, in the hope of shedding some light on the relevant discourse processes. In applying the Altmann-Fitter, we usually find that quite a number of distribution patterns yield excellent fitting results. Usually, these models have two to four parameters. For example, in fitting the rank-frequency data of motif lengths in this study, the mixed Poisson (a, b, α), a three-parameter model, yields an even better result (Table 6.18) than the two-parameter positive negative binomial model.

Figure 6.2  Graphic representation of Table 6.17 Source: Zhang & Liu, 2016a

Table 6.18  Fitting the mixed Poisson distribution to rank-frequency data of motif lengths (T = taxonomy)

Nodes        T    R²       a      b      α      N
Clauses      T1   0.9993   2.73   0.84   0.22   11163
             T2   0.9999   1.74   0.49   0.26   13939
             T3   1.0000   1.39   0.40   0.33   14570
Sentences    T1   0.9991   2.01   0.39   0.38   4106
             T2   0.9997   1.83   0.36   0.32   4478
             T3   0.9999   1.28   0.30   0.38   4905
Paragraphs   T1   0.9998   1.92   0.26   0.39   2008
             T2   0.9964   1.75   0.29   0.34   2138
             T3   0.9971   1.24   0.20   0.46   2274

Source: Zhang & Liu, 2017a

We employ the positive negative binomial model as our distribution model from among a list of models with excellent fitting results (for instance the mixed Poisson distribution) for the following reasons:
a The goodness-of-fit only differs slightly between the two models, both of which yield excellent fitting results.
b A trade-off between the number of parameters and the improvement in goodness-of-fit must be reached in model selection. A good distribution is one with theoretical justification and linguistic interpretation (Altmann, 1997). Models with more parameters might yield better fittings, but they are also harder to interpret linguistically. We need to be cautious with the added parameter (Köhler, 2012).
c This two-parameter model is widely employed for its plausible linguistic interpretations (e.g., Altmann, 2005b), which also hold for the data in this research.
The deduction of the model is available in Altmann (2005b, p. 205). Here we go into some detail about the deduction. First of all, as previously shown, rhetorical relations and their motifs are the result of a diversification process (Altmann, 1991). We hypothesise that the motif lengths of rhetorical relations are likewise the result of a diversification process. Here is how we obtain the distribution pattern of motif lengths through the mathematical formulation of this hypothesis. We posit that motif length x depends on x − 1, since shorter motifs are the basis for forming longer ones through the addition of one or more relations. To put it differently, new lengths appear in dependence on existing ones. For simplicity, only the frequencies of neighbouring classes are taken into consideration (cf. Altmann, 1991). We express the probability of class x as proportional to that of class x − 1.

From the same material, we obtain the nine groups of data. They only differ in the granularity of the EDUs and the taxonomy of relations. Through examining Table 6.15, we find how these two factors influence the results.
a When the taxonomies go from the most general, to the intermediate, to the most elaborate, motif lengths grow longer. We use the quantity a to indicate this tendency.
b When EDU sizes become smaller, longer motifs are generated. We use the quantity b to represent this tendency.
As Table 6.15 also suggests, the probability decreases as motif length increases; that is, the class number x itself exerts a decreasing influence. We thus use Function 6.1 to express all the preceding considerations of how the probability of motif lengths is distributed:

P_x = \frac{a + bx}{x} P_{x-1}    (6.1)

We substitute a/b with k − 1 and b with q, and obtain the negative binomial distribution:

P_x = \binom{k + x - 1}{x} p^k q^x,    x = 0, 1, …    (6.2)

Here no span goes without a role (each either bears one rhetorical relation or is a root), so P_0 = 0 and x = 1, 2, …; we thus get the positive negative binomial distribution:

P_x = \frac{\binom{k + x - 1}{x} p^k q^x}{1 - p^k},    x = 1, 2, …    (6.3)
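As a quick numerical check of Function 6.3, the fitted values for clause nodes at taxonomy 3 can be plugged into the formula and compared with Table 6.17 (a minimal sketch in Python, using only the standard library; it is an illustration, not the fitting procedure actually used in this study):

```python
import math

def positive_neg_binomial(x, k, p):
    # P_x = C(k+x-1, x) * p**k * q**x / (1 - p**k), for x = 1, 2, ...
    q = 1.0 - p
    log_coef = math.lgamma(k + x) - math.lgamma(x + 1) - math.lgamma(k)
    nb = math.exp(log_coef + k * math.log(p) + x * math.log(q))
    return nb / (1.0 - p ** k)

k, p, N = 6.3673, 0.8519, 14570        # parameters and sample size from Table 6.17
for x in range(1, 9):
    print(x, round(N * positive_neg_binomial(x, k, p), 2))
# x = 1 gives roughly 7741 and x = 2 roughly 4223, matching the NP[i] row of Table 6.17
```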

Excellent fitting results as represented by Table 6.18 corroborate our hypothesis that motif lengths, like the motifs and the rhetorical relations themselves, are also the results of a diversification process. Having chosen the model, we turn to its parameters, with the aim of examining whether they reveal differences or similarities between the discourse processes at different granularity levels. Following our deduction, both parameters a and b in Function 6.1 exert an influence on the motif lengths. We can therefore expect both parameters of the chosen model, k and p, to be highly correlated with N. The Pearson correlation between k and N is found to be 0.907 (p-value = 0.001), indicating a high correlation. The correlation between p and N is weaker but still acceptable, with a Pearson correlation of 0.755 (p-value = 0.019). To show the effect of the granularity of EDUs clearly, we reorganise Table 6.16 into Table 6.19. Both k and p can be seen to decrease with the increase of the size of

Table 6.19  Parameter values from Table 6.16 (T = taxonomy)

      k                                              p
T     Clauses    Sentences   Paragraphs              Clauses    Sentences   Paragraphs
      as nodes   as nodes    as nodes                as nodes   as nodes    as nodes
1     5.7461     2.1767      1.0234                  0.7630     0.634       0.5221
2     4.7864     1.9543      1.1516                  0.8025     0.6548      0.5769
3     6.3673     4.0537      2.5643                  0.8519     0.8067      0.7380

Source: Zhang & Liu, 2017a

EDUs. Figure 6.3, the graphic representation of Table 6.19, presents this effect more visibly. The graph also indicates more likeness between the two upper EDU granularity levels. In Beliankou et al.'s (2012) research, RST motif lengths in the Potsdam Commentary Corpus follow the hyperbinomial distribution pattern. We fit it to our data and fail to find it a suitable model, since it fits only four groups of data (the three groups with clause nodes, and one group with sentence nodes at taxonomy 3). For the other five groups of data, the fitting yields a zero R². What might bring about such a failure? We come up with four tentative explanations. First, the motif sequences are examined on different bases: Beliankou et al. (2012) examine the relations following the order of the two-dimensional tree, whereas in our research we take into consideration the linear order only. Second, the hyperbinomial distribution basically only fits the data with clause nodes in our research, which might suggest somewhat different mechanisms at the higher levels. Third, the tree conversion in our study will undoubtedly bring about some consequences: we examine terminal nodes exclusively, while in Beliankou et al.'s (2012) study the RST relations exist between spans of different granularity levels.

Figure 6.3  Graphic representation of Table 6.19. (a) Parameter k. (b) Parameter p Source: Zhang & Liu, 2017a

Table 6.20  Parameter values from Table 6.13 (fitting rank-frequency data of motifs, T = taxonomy)

      k                                              p
T     Clauses    Sentences   Paragraphs              Clauses    Sentences   Paragraphs
      as nodes   as nodes    as nodes                as nodes   as nodes    as nodes
T1    0.2137     0.1550      0.1857                  0.0004     0.0010      0.0018
T2    0.1266     0.1213      0.1402                  0.0007     0.0013      0.0021
T3    0.1265     0.1180      0.1443                  0.0014     0.0034      0.0063

Source: Zhang & Liu, 2017a

Finally, we would like to suggest that language differences might be influential as well, since one corpus is in English and the other in German. Whether to include roots, however, is not necessarily a contributing factor. Beliankou et al. (2012) examine every nucleus-satellite pair, whereas we assign a role to every node. When we excluded the root and likewise examined each and every nucleus-satellite pair, the data still complied with our chosen model, but not with the hyperbinomial model. We would recommend our chosen model for Beliankou et al.'s (2012) data as well: the fitting result is excellent, with an R² value of 0.9852, only slightly different from the original value of 0.9974. For the aforementioned reasons, for the data in both studies, the two-parameter positive negative binomial model is preferable. Having examined the parameter values from the perspective of motif lengths, we use a similar method to examine the parameter values from the perspective of motifs. Table 6.20 is reorganised from Table 6.13, and Figure 6.4 is its graphic representation. We do not find significant differences among the three curves, which

Figure 6.4  Graphic representation of Table 6.20. (a) Parameter k. (b) Parameter q Source: Zhang & Liu, 2017a

might indicate some common mechanisms of how rhetorical relations are sequentially organised in discourses. As we have hypothesised, the dynamic discourse processes differ to a certain extent across the granularity levels of the EDUs, but the similarities and connections seem to be more evident. Where motif lengths are concerned, again as hypothesised, more resemblance can be seen between the two upper EDU levels, which differ more from the clause level.

6.3.4  Section summary

Conventionally, both elementary terminal nodes and non-elementary nodes are present in RST trees. The conversion reframes each tree into three new dependency trees with only terminal nodes, at three granularity levels, with nodes being clauses, sentences and paragraphs, respectively. This section yields the following research findings concerning the distribution of motifs of RST relations.
a At all levels, and with all taxonomies of RST relations, the motifs of the relations follow the same negative binomial distribution pattern.
b The lengths of these motifs all abide by a positive negative binomial distribution.
c The two distribution patterns bear more similarities than differences across the granularity levels of terminal nodes.
The first three sections of this chapter deal with discourse dependency relations. In the next section, we will examine whether the hierarchy established through the connections of dependency relations in the converted trees can capture the structural regularities.

6.4  Visualising structural "inverted pyramids" in English news discourse across levels

6.4.1  Introduction

This section examines the hierarchical structural features. More precisely, we look at the superstructure of news reporting. There are various types of superstructures or macrostructures (Kintsch & van Dijk, 1975; Paltridge et al., 2012; Williams et al., 1984). Van Dijk (1982, 1986, 1988a, 1988b) maintains that there is a schema in news-writing which follows an ordering of importance/relevance. In such a schema, the initial summary (Headline + Leads), summarising the most salient information, is followed by Context, History and Consequences of decreasing importance/relevance, though not necessarily in a strict order. Verbal Reactions and Comments tend to occur at the end of the news. To sum up, such a structure, very much resembling an inverted/inverse pyramid, places the most relevant, immediate and newsworthy information at the very top, the secondary information in the middle, and the least salient information at the bottom;

that is why such a schema is commonly termed the inverted pyramid structure (e.g., Bell, 1991; Dunn, 2005; Mindich, 1998; Pöttker, 2003; Scollon, 2000; Thomson et al., 2008). Some scholars term it the lead-and-body principle (e.g., Pöttker, 2003) or the summary news lead (e.g., Errico, 1997), among others. Different structures fulfil different functions, and the inverted pyramid structure serves some practical purposes (Telg & Lundy, 2015/2021). For instance, in printed news, some less important information can simply be cut to fit the available space, and readers can read the beginning part to decide whether to continue. In the era of Internet newscasting, readers do not have to click through to read the whole piece of news and can still get the gist. All this can be done without losing the most salient information. In addition, reading is a mental challenge: we have a limited capacity for information processing, particularly working memory (Cowan, 2001, 2005, 2010). The inverted pyramid structure helps readers grasp the most critical information and manage their cognitive load. With the intended semantic structure conveyed by the initial Summaries, the inverted pyramid is a top-down cognitive strategy (van Dijk & Kintsch, 1983). It activates knowledge and enables faster comprehension, with the initial Summaries playing a crucial role (van Dijk, 1988a). Experimentation shows that days after reading the news, readers can usually recall only the main topics (van Dijk, 1988a). The Lead exhibits how the reporter views the news affectively and transmits an emotive value, facilitating news interpretation for the readers (Khalil, 2006). Such a structure also helps automatic summarisation (e.g., Koupaee & Wang, 2018), information extraction (e.g., Norambuena et al., 2020), automatic corpus generation (e.g., Cruz et al., 2021) and genre classification (e.g., Dai & Huang, 2021), among many other NLP tasks. Objectivity is a defining characteristic of modern journalism. Employing the inverted pyramid to address the 5W1H questions, together with being neutral, plays a key role in allowing reporters to report their news objectively (Mindich, 1998). These two key components make news reporting unique and distinctive, so that it is recognised as an individual genre (Thomson et al., 2008) different from other genres such as chronology (Sternadori, 2008). This top-down instalment strategy is shown by various studies to be a common strategy employed in news reporting (Scollon & Scollon, 1997; Thomson et al., 2008). Readership Institute (2010) specifically measures its percentage and finds that 69% of news follows the inverted pyramid structure. Comparing 565 pairs of corresponding news stories from two sources (The New York Times and the New York Times International Weekly), Shie (2012) finds that in The New York Times closing paragraphs also convey salient information, which Shie takes to define the superstructure of soft news. Shie also measures how far the discourse units are from their focus and how far the units are from each other. The top-down superstructure occurs not only at the discourse level: such a structure is also characterised as the thematic structure at the paragraph level (van Dijk, 1986). This relevance ordering can also be attributed to a general strategy of news-writing. Topic sentences occurring initially in a paragraph, or internal summaries, serve functions similar to those of the initial summaries in news reporting.
Can we visually present these two similar types of structure at both the discourse and paragraph levels, if such structures exist? We pursue this question and investigate whether the same top-down strategy obtains in news reporting at both levels.

As in Sections 6.1 and 6.2, the three levels refer to: (1) the discourse level, where nodes are paragraphs, (2) the paragraph level, where nodes are sentences, and (3) the sentence level, where nodes are basically clauses. Only trees with multiple nodes are eligible, so at the discourse level only 359 passages are examined. The next part of this section addresses how we draw the inverted pyramids.

6.4.2  Drawing inverted pyramids

We illustrate with Figure 6.5 (a re-presentation of Figure 1.3) how we draw the structure at the three different levels. Following the analogy between the original

Figure 6.5  An RST analysis of text WSJ_0642 from the RST-DT

RST tree and the constituent structure of phrase-structure grammar, we convert Figure 6.5 into Figure 6.6 (a re-presentation of Figure 4.5) with only elementary clause nodes. Similarly, on the basis of Figure 6.6, we can obtain Figure 6.7 with sentence nodes only.

Figure 6.6  Converted tree of WSJ_0642 with elementary clause nodes only


Figure 6.7  Converted tree of WSJ_0642 with elementary sentence nodes only

Ultimately, on the basis of Figure 6.7 we can obtain Figure 6.8 with paragraph nodes only. With the three levels defined in Section 6.1.1, we need to go on to select trees eligible for our study. At the sentence level (with clauses being ultimate nodes), the two highlighted parts (Sentences 1 and 4) of Figure 6.6 are eligible, with multiple clauses as nodes. But Sentences 5 and 6 are both one-clause sentences and are therefore ineligible for the present study. At the paragraph level (with sentences being ultimate nodes), we see in Figure 6.7 the highlighted part of the rhetorical relation Elaboration-general-specific connecting S2 and S1, and Elaboration-additional connecting S3 and S1 for Paragraph 1. As for Paragraph 2, there is only one sentence, so it is excluded from our discussion. At the discourse level (with paragraphs being ultimate nodes), Figure 6.8 indicates that Paragraph 2 is an Elaboration-additional for Paragraph 1. So, for the same tree, we can examine the structure at three distinct levels. To present the structure of trees or sub-trees, we will assign a value to each node. The highest value will be assigned to the top node (root), as we believe it contains the most central information. For instance, in the first sentence, 1, 3, 1 and


Figure 6.8  Converted tree of WSJ_0642 with elementary paragraph nodes only

2 will be assigned to Nodes 1, 2, 3 and 4 in Figure 6.9, respectively. Similarly, in the fourth sentence, Nodes 7–10 will get values of 2, 1, 2 and 3 sequentially. The lowest value (1) will accordingly be assigned to nodes at the lowest hierarchy of the tree (like Nodes 3 and 8 in Figure 6.9). Other nodes will be assigned a value in between. Therefore, the two sentences carry structures of 1+3+1+2 and 2+1+2+3, respectively, following the linear order of nodes. Similarly, Paragraph 2 in Figure 6.7 bears a structure of 2+1+1, as the initial sentence carries more weight. The whole sample passage carries a structure of 2+1 as well, since the beginning paragraph is the root. With structures represented by a sequence of numbers, we can check whether they belong to inverted pyramid structures. We first define some types of visual structure, starting from the most obvious types of inverted pyramids.
Type A: If a structure goes, in a rigid sense, with successively descending information relevance, it is a "perfect inverted pyramid". Some typical examples of "perfect inverted pyramids" are shown in Figure 6.10.
Type B: When relevance in the sequence generally descends but a value is maintained somewhere, as presented in Figure 6.11, the structure is still an inverted pyramid. The maintained value indicates neighbouring parallel structures.
Type C: The third type bears a cup shape, with the highest level occurring initially and some parallel structures occurring non-initially in a non-neighbouring


Figure 6.9  Visually presenting statistical structures of sentences (clauses as nodes)

Figure 6.10  Instances of perfect inverted pyramids: (Type A) Source: Zhang & Liu, 2016b

manner (Figure 6.12). This is an approximate inverted pyramid, as the initial node carries the most relevant information and there is a generally relevance-descending order. For this shape, there must be at least four nodes (for instance 3+2+1+2 for the left "cup" and 4+3+2+3+2+1 for the right "cup" in Figure 6.12).
Type D: The fourth type is not an inverted pyramid. It goes with the initial node and some other non-neighbouring node(s) bearing the paramount information.


Figure 6.11  Inverted pyramids with neighbouring parallel structures: (Type B) Source: Zhang & Liu, 2016b

Figure 6.12  Cup-shape approximate-inverted pyramids: (Type C) Source: Zhang & Liu, 2016b

This is a pseudo-inverted pyramid, as the initial root alone fails to make the structure an inverted pyramid; as a matter of fact, as presented in Figure 6.13, the shape resembles a rectangle.
Type E: If the initial node does not carry the most important information of the chosen text span, the relevant structure is not an inverted pyramid at all (e.g., Figure 6.14).
Type F: In case there are multiple nuclei, the structure resembles a rope without any descending relevance. Without doubt, it is not an inverted pyramid at all.

Figure 6.13  Pseudo-inverted pyramids: (Type D) Source: Zhang & Liu, 2016b

Figure 6.14  Non-inverted pyramid structures without initial paramount nodes: (Type E) Source: Zhang & Liu, 2016b
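The six types can also be operationalised in code. The sketch below is one possible reading of the definitions above (in Python; the exact boundary conditions, such as how ties with the top value are treated, are our own assumptions rather than the criteria actually applied in this study):

```python
def classify_structure(values):
    """Assign one of the six types (A-F) to a numeric structure such as [2, 1, 1]."""
    top = max(values)
    if all(v == values[0] for v in values):
        return "F"                                # multi-nuclear "rope", e.g. 1+1
    if values[0] != top:
        return "E"                                # initial node is not the most relevant
    if any(v == top and values[i - 1] < top for i, v in enumerate(values) if i > 0):
        return "D"                                # the top value recurs after a dip
    if all(a > b for a, b in zip(values, values[1:])):
        return "A"                                # strictly descending
    if all(a >= b for a, b in zip(values, values[1:])):
        return "B"                                # descending with neighbouring plateaus
    if len(values) >= 4:
        return "C"                                # cup-shaped approximate inverted pyramid
    return "unclassified"                         # short rising structures are left open here

for s in ([2, 1], [2, 1, 1], [3, 2, 1, 2], [2, 1, 2], [1, 2], [1, 1]):
    print(s, classify_structure(s))               # A, B, C, D, E, F
```

Applied to the five most frequent structures at each level in Table 6.22 below, this reading reproduces the type labels listed there.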

Table 6.21  Structures across levels

        Discourse level:              Paragraph level:             Sentence level:
        Paragraphs as nodes           Sentences as nodes           Clauses as nodes
Type    Freq.   Average   %           Freq.   Average   %          Freq.   Average   %
                nodes                         nodes                        nodes
A       20      2.75      6           1,099   2.23      45         1,744   2.23      30
B       100     6.87      28          491     3.56      20         547     3.46      9
C       131     10.67     36          110     4.92      5          344     4.69      6
D       70      15.29     19          74      4.06      3          881     4.02      15
E       31      19.00     9           188     4.31      8          2,040   3.44      35
F       7       2.29      2           456     2.32      19         353     2.12      6
Total   359     10.60     100         2,418   2.81      100        5,909   3.17      100
Inverted pyramids         70%                           70%                          45%

Source: Zhang & Liu, 2016b

Following the definition of a general organisation of descending information relevance, Types A, B and C are regarded as inverted pyramids in this study. Having discussed how the structures can be represented graphically, we collect the data and analyse them in the next sub-section.

6.4.3  Results and discussion

In this part, we check whether the inverted pyramid model exists at all three levels. Table 6.21 summarises all types of visual structures for all the eligible trees. The top five structures are presented in Table 6.22.

6.4.3.1  Discourse level

It can be seen from Table 6.21 that when the nodes are paragraphs, 70% of the passages go with an inverted pyramid structuring, including Types A, B and C.

Table 6.22  The five most frequent structures across levels

Discourse level                   Paragraph level              Sentence level
Structure       Type   %          Structure   Type   %         Structure   Type   %
2+1+1+1         B      5          2+1         A      37        2+1         A      24
2+1             A      4          1+1         F      13        1+2         E      11
2+1+1           B      3          2+1+1       B      8         1+2+1       E      7
2+1+1+1+1+1     B      2          3+2+1       A      8         2+1+2       D      7
2+1+1+1+1       B      2          2+2+1       B      3         1+1         F      5
Sub-total              16%                            69%                          54%

Source: Zhang & Liu, 2016b

From Table 6.21, we can see that Type C (cup-shaped approximate inverted pyramids) takes up the biggest percentage (36%) among all shapes of structures. Type B structures (inverted pyramids with neighbouring parallel structures) come second, accounting for 28% of the total, but there are only 20 cases (6%) of perfect inverted pyramids (Type A). Examining the Type A instances, we find that most of them (13/20) have only 2 paragraphs. On average, these instances have 2.75 nodes, far fewer than the average of 10.6 at this level. It seems that shorter passages with fewer paragraphs are needed to form Type A structures. The percentage of inverted pyramids (70%) in this study agrees with Readership Institute (2010), which reports that in the United States 69% of hard and soft news features an inverted pyramid structure. It is also in line with van Dijk's prediction about the macrostructuring of news reporting, since Types A, B and C have an initial most relevant paragraph and an organisation in which relevance generally descends and the instalments may follow a less strict order (van Dijk, 1988a, 1988b). At the discourse level, the top five structures take up 16% of the total (Table 6.22). Four of these five structures are of Type B (12%). Compared with the percentage of the top five structures at the paragraph level (69%) and that at the sentence level (54%), we can see that structures at the discourse level are rather diversified. In fact, there are 205 structures for the total of 359 texts, with 183 structures occurring only once. Most of the 30% of passages with non-inverted pyramid structuring have a large number of paragraphs: for Type D passages, the average number of paragraphs reaches 15.29, and for Type E, 19. It seems that as the number of paragraphs increases, it becomes harder to maintain an inverted pyramid structure. As the definition of Type D states that both the initial node and some other non-neighbouring node(s) bear the most significant and relevant information, the high percentage (19%) of this structure further highlights the importance of the initial Summary Lead, despite the fact that this type is not grouped as an inverted pyramid structure in this study. As for Type E structures, 16 of them are soft news, taking up nearly 50% of the total (31). Careful examination of the data reveals that for most of the Type E passages (17 out of a total of 31) the paramount nodes occur at the second position. This at least shows that the most relevant and immediate information tends to occur at quite a front position, though not initially. Finally, we examine the Type F passages. These passages are all short hard news. On average, these 7 passages have 2.29 paragraphs, and six of them have only two paragraphs. These brief reports have few sentences. In a nutshell, this study further highlights the significance of summary news leads, even though some texts are not grouped as bearing inverted pyramid structures. But, somewhat differently from Shie (2012), we fail to differentiate superstructures between soft and hard news in our study; the sample of 359 passages is neither sufficient nor balanced enough for examining systematic differences. Having examined the discourse level, we can proceed to the paragraph level.

6.4.3.2  Paragraph level

As suggested by Table 6.21, we find a nearly identical percentage (70%) of inverted pyramid structuring at the paragraph level to that at the discourse level (69%), validating van Dijk's hypothesis that there is a top-down thematic structure in each schematic category of news (History, Consequences, etc.) similar to the top-down superstructure of the whole news discourse (van Dijk, 1986). We therefore claim that, at least for the corpus in our study, English news reporting features a top-down instalment structure at both the discourse and paragraph levels, with macro-propositions on top. Having addressed the general trend at the paragraph level, we turn to some details concerning the different structure types. First, as the number of nodes makes a great difference in forming different types, it is necessary to check the average number of nodes at the paragraph level with sentence nodes. The average is found to be only 2.81, not including the non-eligible single-sentence paragraphs. This small number greatly facilitates the formation of perfect inverted pyramids (Type A), which account for 45% of the total with an average of 2.3 sentences. Similarly, the formation of Type F (with multiple nuclei) is made easier; it takes up a percentage of 19%. Another side effect of this small average number of nodes is the lower frequencies of Types C, D and E: these structures at this level carry an average number of nodes greater than 4. Type C structuring requires a minimum of 4 nodes by definition, which is why it only takes up a percentage of 3%. The imbalanced distribution, in which Types A, B and F account for 84% of the total structures, is also reflected in Table 6.22, the top five structures at each level. Here at the paragraph level, the five top structures account for a total of 69%, indicating a distribution more unified than that at the discourse level. Actually, out of a total of 2,418 eligible paragraphs, there are only 159 patterns, with 96 of them appearing just once, making this level different from the discourse level. The difference is basically due to the average number of nodes at the two levels. We wonder whether the inverted pyramid structuring is unique to English news reporting. On the one hand, English language users are taught throughout their schooling and professional training that they should start a paragraph with a topic sentence, which usually occurs at the initial position. This placement facilitates the comprehension of the main idea and triggers better recall (Kieras, 1980). On the other hand, this placement does not prevail in some genres: Braddock (1974) finds that a mere 13% of expository writings begin with a topic sentence. This is an interesting topic worth further research.

6.4.3.3  Sentence level

Finally, we move on to the sentence level with clauses as nodes. Quite different from the upper two levels where about 70% of the structures are inverted pyramids, at this level, only 45% of the sentences go with an inverted pyramid structure.

At the sentence level, only 72% of the nuclei occur before their satellites, compared with 90% at the paragraph level and 93% at the discourse level. For inverted pyramid structuring to appear, it is necessary that nuclei occur before their satellites. For instance, at the sentence level, if adverbial clauses occur before the main clause, the whole sentence will not feature an inverted pyramid structure. Type E (35%) is the top sentence pattern at this level, starting with a rhetorically subordinate unit (a satellite). Accounting for 30%, the second most frequent sentence pattern is the perfect inverted pyramid (Type A), whose frequent occurrence basically comes from the small average number of nodes per sentence. Naturally, rhetorical structures at the sentence level are expected to correspond to the relevant grammatical structures. For instance, "2+1", a Type A structure, is expected to be composed of two parts: a grammatical main clause followed by a grammatically subordinate clause. Or consider "1+2", a Type E structure, where we see the reverse: a grammatically subordinate clause followed by a grammatical main clause. Such expectations quite often come true (e.g., S1 and S2 in Table 6.23). But not all sentences behave as expected. Take, for instance, S3 in Table 6.23. This compound sentence bears a Type A pattern rather than a Type F pattern, as the beginning clause is considered to be both rhetorically and semantically more

Table 6.23  Sample sentence structures in text WSJ_0610

#    Sentence                                                                        Structure   Type
S1   C1: And now Kellogg is indefinitely suspending work (Nucleus)                   2+1         A
     C2: On what was to be a $1 billion cereal plant. (Elaboration-object-attribute-e)
S2   C1: Led by its oat-based Cheerios line. (Circumstance)                          1+2         E
     C2: General Mills has gained an estimated 2% share so far this year, mostly at the expense of Kellogg. (Nucleus)
S3   C1: Kellogg's current share is believed to be slightly under 40% (Nucleus)      2+1         A
     C2: While General Mills' share is about 27%. (Comparison)
S4   C1: "Cheerios and Honey Nut Cheerios have eaten away sales (Nucleus)            2+1+1+1     B
     C2: Normally going to Kellogg's corn-based lines (Elaboration-object-attribute-e)
     C3: Simply because they are made of oats", (Explanation-argumentative)
     C4: Says Merrill Lynch food analyst William Maguire. (Attribution)
S5   C1: The company said (Attribution)                                              1+2+1       E
     C2: it was delaying construction (Nucleus)
     C3: because of current market conditions. (Reason)

important. This case might indicate why Type F structures enjoy only a low percentage (6%) at the sentence level. Rhetorical structures do not completely correspond with grammatical structures. As illustrated by C4 in S4 and C1 in S5 (Table 6.23), Attribution, used in reported speech, occurs quite frequently in news discourse and thus constitutes an important contributing factor for the mismatch. Both C4 and C1 occur in main clauses, but rhetorically and semantically they are less important, only providing the Attribution information. In such a relation, the satellite part (Attribution) usually contains phrases like "according to", and in the nucleus the reported message is present in a separate clause. Also, Attribution is used with cognitive predicates to indicate hopes, feelings or thoughts (Carlson & Marcu, 2001). In both cases, the relevant RST structures are the opposite of their grammatical structures. Out of a total of 11,529 relations between clause nodes within sentences, Attribution accounts for 32% with its 3,639 instances. Of these 3,600+ cases, 1,444 take the initial position and 679 the final position, resulting in high occurrences of both Type E and Type A structures, with percentages of 35 and 30, respectively. The more-than-one-third share of Type E structures at the sentence level is a much higher percentage than at the other two levels (8%–9%). The Type C structure requires at least four nodes and is thus rare at the sentence level, with a percentage of only 6%, which is quite similar to the intermediate level. Within sentences, there is only an average of 3.17 nodes. The frequent occurrence of Attribution contributes to the fact that some particular structural types occur more often than others. We posit that this relation is a feature of news-writing, but further studies need to be carried out to check whether it also occurs frequently in other genres.

6.4.4  Section summary

This section tests whether the inverted pyramid structure of news-writing, or the "summary + details" structuring, obtains at the three levels of the discourse process (forming sentences from clauses, paragraphs from sentences, and discourses from paragraphs). Having visually presented the structures, the study finds that both the intermediate and the highest levels bear a similar schematic top-down organisation. At the sentence level, the rhetorical structures, which sometimes diverge from the grammatical structures, are also visually and statistically presented. This endeavour of reframing the traditional RST trees into trees with only elementary EDUs at three distinct levels opens up new research prospects with its unique analytical advantages. In addition, the method of visually presenting the structuring can also be used for other genres and other purposes, for instance in cross-language comparison and language learning. We still have doubts, however, about whether the nuclearity of RST relations corresponds fully to information importance. For instance, when Condition, a satellite, is removed, it is problematic to make the remaining opaque contexts independent assertions.

6.5  Discourse valency

6.5.1  Introduction

As previously mentioned in Section 5.6, the rank-frequency distribution of syntactic valencies follows the ZA (right truncated modified Zipf-Alekseev) model, and that of their motifs abides by the ZM (Zipf-Mandelbrot) model. We borrow from dependency grammar the idea of discourse dependency trees. Similarly, we can have discourse valencies, that is, the numbers of satellites. In this study, discourse valency refers to the number of dependents a governor has. We posit that discourse valencies might have distribution features similar to those of syntactic valencies. Thus we put forward the following two hypotheses:

Hypothesis 6.5.1: The rank-frequency distributions of discourse valencies at the three levels (with all terminal units being clauses, sentences and paragraphs, respectively) all abide by the ZA model.

Hypothesis 6.5.2: The rank-frequency distributions of discourse valency motifs at the three levels all abide by the ZM model.

The next part explains how the discourse valencies and their motifs are obtained, along with the division of the corpora into six sub-corpora at the three distinct levels.

6.5.2  Methods

In this part, for the three corpora (with all terminal units being clauses, sentences and paragraphs, respectively), we divide each into six sub-corpora of equivalent sizes and examine whether they are generally homogeneous. If so, the regularities discovered in this study can be claimed to hold for English news reporting in general. The sizes of the three sets of sub-corpora are given in Table 6.24. Discourse valency is defined in the same way as syntactic valency. Take Figure 6.15 (a re-presentation of Figure 4.5) as an example. Node 2 has five satellites (Nodes 1, 4, 5, 6 and 10) and thus has a valency of 5. Similarly, Node 10 has a valency of 2. Both Nodes 4 and 7 have a valency of 1.

Table 6.24  Sizes for the three sets of sub-corpora (C = clause, S = sentence, P = paragraph)

Sub-corpora   Tokens      Sub-corpora   Tokens      Sub-corpora   Tokens
C-RST-1       4236        S-RST-1       1362        P-RST-1       628
C-RST-2       4064        S-RST-2       1363        P-RST-2       638
C-RST-3       4360        S-RST-3       1379        P-RST-3       636
C-RST-4       4123        S-RST-4       1360        P-RST-4       631
C-RST-5       4230        S-RST-5       1353        P-RST-5       648
C-RST-6       4190        S-RST-6       1371        P-RST-6       635


Figure 6.15  Relations between elementary clause nodes only in the converted discourse tree

The remaining nodes have no lower-level dependents, hence a valency of 0. So, in linear order, the clause nodes of the sample paragraph have the following valency sequence: 0, 5, 0, 1, 0, 0, 1, 0, 0, 2. As a motif is defined as the longest sequence of non-descending values when the quality is numerical (Köhler, 2015), the previous sequence can be represented by the following motifs: 0+5, 0+1, 0+0+1, 0+0+2. Following these definitions of discourse valency and valency motif, we can obtain the relevant data for all three sets of sub-corpora. In the next sub-section, we will discuss the results, and in the final part we will summarise this section. It is worth mentioning that, differently from syntactic trees with only one root, there may be multiple nuclei in RST trees. For instance, three satellites might share two nuclei; in such a case the valencies of the two nuclei are both defined as 2.

6.5.3  Results and discussion
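The valency counts and valency motifs analysed below can be obtained as in the following sketch (a minimal Python illustration, not the scripts actually used in this study; the input format, a list of governor indices, is an assumption):

```python
from collections import Counter

def valencies(governors):
    """governors[i] is the index of node i's governor (None for a root);
    the valency of a node is the number of nodes that depend on it.
    Multi-nuclear cases, where satellites share several nuclei, would need extra handling."""
    counts = Counter(g for g in governors if g is not None)
    return [counts.get(i, 0) for i in range(len(governors))]

def numeric_motifs(values):
    """Longest sequences of non-descending values (Köhler, 2015)."""
    motifs, current = [], []
    for v in values:
        if current and v < current[-1]:   # a drop closes the current motif
            motifs.append(current)
            current = []
        current.append(v)
    if current:
        motifs.append(current)
    return motifs

seq = [0, 5, 0, 1, 0, 0, 1, 0, 0, 2]      # the sample valency sequence above
print(numeric_motifs(seq))
# [[0, 5], [0, 1], [0, 0, 1], [0, 0, 2]]
```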

In this part, we address the two hypotheses in turn. We examine first the distribution of the discourse valencies, and then the distribution of their motifs.

6.5.3.1  Rank-frequency distribution of discourse valencies

Table 6.25 presents a summary of discourse valencies when the terminal nodes are clauses, sentences and paragraphs, respectively. From the mean (M) and standard deviation (SD) of the TTR (type-token ratio), entropy and RR (repetition rate), we can see that the six sub-corpora are generally very similar, exhibiting a certain homogeneity. As the nodes get bigger, TTR and RR increase and entropy decreases. Tables 6.26, 6.27 and 6.28 exhibit the discourse valencies when the nodes are clauses, sentences and paragraphs, respectively. The most frequent valencies are all 0, 1, 2 and so on, in the same order. In most cases the rank is 1 plus the valency, which is quite similar to syntactic valencies. Where percentages are concerned, valency 0 (for nodes which can only function as dependents) accounts for 50.3%–52% for clause nodes, 52.9%–56.8% for sentence nodes, and 58.8%–62.8% for paragraph nodes, all bigger than the 46.1%–46.5% for valency 0 in syntactic trees. The general tendency seems to be that the bigger the ultimate nodes, the bigger the percentage of valency 0. Fitting the ZA model to all three sets of complete data yields excellent results, with all determination coefficients R² above 0.99 (Table 6.29). The

Table 6.25  Summary of discourse valencies (C = clauses, S = sentences, P = paragraphs)

Text      Types   Tokens   TTR        Entropy     RR
C-RST-1   16      4236     0.0038     2.07        0.34
C-RST-2   17      4064     0.0042     2.05        0.34
C-RST-3   17      4360     0.0039     2.05        0.33
C-RST-4   19      4123     0.0046     2.09        0.34
C-RST-5   16      4230     0.0038     2.08        0.33
C-RST-6   17      4190     0.0041     2.05        0.34
M                          0.0041     2.0650      0.3367
SD                         0.003077   0.0176068   0.0051640

Text      Types   Tokens   TTR        Entropy     RR
S-RST-1   14      1362     0.010      1.84        0.39
S-RST-2   13      1363     0.010      1.80        0.39
S-RST-3   13      1379     0.009      1.84        0.37
S-RST-4   13      1360     0.010      1.81        0.40
S-RST-5   12      1353     0.009      1.85        0.37
S-RST-6   13      1371     0.009      1.86        0.37
M                          0.0095     1.8333      0.3817
SD                         0.0005477  0.0233809   0.0132916

Text      Types   Tokens   TTR        Entropy     RR
P-RST-1   11      628      0.018      1.76        0.42
P-RST-2   12      638      0.019      1.66        0.44
P-RST-3   11      636      0.017      1.75        0.41
P-RST-4   11      631      0.017      1.65        0.45
P-RST-5   10      648      0.015      1.76        0.41
P-RST-6   11      635      0.017      1.72        0.42
M                          0.0172     1.7167      0.4250
SD                         0.0013292  0.0500666   0.0067082

results validate the Hypothesis 6.5.1. It seems that the parameter value of α generally grows with the EDUs getting bigger. From Zhang (2023), we extract the syntactic valency data. Therefore we are able to compare syntactic valency and discourse valency. To save space, we only compare the first five rankings with percentages bigger than 1%. From Table 6.30 (p. 214), the summary of percentages for both syntactic valency and discourse valency (percentages >1%), we can find some trends across levels. 1 The rankings across levels suggest the same order: rankings grow with valencies. 2 Valency 0 ranks first across levels. When the unit size gets bigger, valency 0 takes bigger proportions. A little than half syntactic valency is of a value of 0. Discourse valency of 0 between clause nodes accounts for about half of the total. This trend continues at higher levels. One possible reason is the number of nodes gradually gets smaller with the growth of node size. 3 Ranking second is all valency 1. At the syntactic level about 29.4%–30% of the nodes have one dependent, taking up a bigger percentage than discourse valency

210  Discourse Dependency Relations Table 6.26  Discourse valency (V) for clause nodes C-RST-1 Rank

V

C-RST-2

Frequency

%

V

C-RST-3

Frequency

%

V

Frequency

%

1

0

2,203

52.0

0

2,109

51.9

0

2,214

50.8

2

1

882

20.8

1

938

23.1

1

1,045

24.0

3

2

540

12.7

2

475

11.7

2

507

11.6

4

3

295

7.0

3

249

6.1

3

286

6.6

5

4

141

3.3

4

111

2.7

4

141

3.2

6

5

64

1.5

5

57

1.4

5

64

1.5

7

6

40

0.9

6

46

1.1

6

37

0.8

8

7

20

0.5

7

32

0.8

7

23

0.5

9

8

19

0.4

8

23

0.6

8

20

0.5

10

10

13

0.3

9

7

0.2

10

6

0.1

11

11

8

0.2

10

5

0.1

11

6

0.1

12

9

5

0.1

11

4

0.1

12

4

0.1

13

15

2

0.0

14

3

0.1

9

2

0.0

14

17

2

0.0

15

2

0.0

15

2

0.0

15

20

1

0.0

12

1

0.0

14

1

0.0

16

24

1

0.0

17

1

0.0

17

1

0.0

21

1

0.0

22

1

0.0

17 18 19 C-RST-4 Rank

V

C-RST-5

Frequency

%

V

C-RST-6

Frequency

%

V

Frequency

%

1

0

2,106

51.1

0

2,126

50.3

0

2,179

52.0

2

1

999

24.2

1

966

22.8

1

963

23.0

3

2

415

10.1

2

534

12.6

2

450

10.7

4

3

256

6.2

3

296

7.0

3

257

6.1

5

4

130

3.2

4

125

3.0

4

181

4.3

6

5

83

2.0

5

76

1.8

5

62

1.5

7

6

50

1.2

6

41

1.0

6

30

0.7

8

7

41

1.0

7

23

0.5

8

18

0.4 (Continued)

Discourse Dependency Relations 211 Table 6.26  (Continued) C-RST-4 Rank

V

C-RST-5

Frequency

%

V

C-RST-6

Frequency

%

V

Frequency

%

9

8

14

0.3

8

14

0.3

7

16

0.4

10

10

6

0.1

9

10

0.2

9

12

0.3

11

9

5

0.1

11

6

0.1

10

6

0.1

12

13

4

0.1

15

5

0.1

11

5

0.1

13

11

4

0.1

16

3

0.1

12

5

0.1

14

15

4

0.1

10

3

0.1

13

3

0.1

15

12

2

0.0

12

1

0.0

14

1

0.0

16

16

1

0.0

23

1

0.0

16

1

0.0

17

28

1

0.0

27

1

0.0

18

30

1

0.0

19

31

1

0.0

of 1. Here for discourse valencies, no significant difference is found when the unit size grows from clauses to sentences and then paragraphs. 4 Ranking third and fourth are valencies of 2 and 3, respectively, both showing a similar tendency—with the nodes getting larger, the percentages become smaller. 6.5.3.2  Rank-frequency distribution of discourse valency motifs

Having addressed the rank-frequency distribution of discourse valency, we continue with that of discourse valency motifs. Table 6.31 (p. 215) summarises the discourse valency motifs at all three levels. Different from the valencies (where with the nodes getting bigger, TTR and RR are getting bigger and entropy, smaller), for the valency motifs, the TTR is getting bigger with higher-level nodes, but entropy and RR are quite similar across levels. This time, the TTR at the two lower levels seems to be more alike, setting them apart from the highest level. Tables 6.32 (p. 216), 6.33 (p. 217) and 6.34 (p. 218) are the top 10 valency motifs for clause nodes, sentence nodes and paragraph nodes, accounting for 50.7%– 57.8%, 61.2%–65.7%, and 54.3%–63.1% of the total, respectively. At all these three levels, motif “0+1” ranks No. 1. When nodes are clauses, ranking second and third are “1” and “0+2”; when nodes are sentences, generally “0+2” or “0+1+1” ranks second; and when nodes are paragraphs, generally “1” or “0+2” comes second. This is somewhat different from the syntactic valency motif where motifs “1” and “0+2” come first and second, respectively (Table 5.53 and Table 5.54).

212  Discourse Dependency Relations Table 6.27  Discourse valency (V) for sentence nodes S-RST-1 Rank

V

S-RST-2

Frequency

%

V

S-RST-3

Frequency

%

V

Frequency

%

1

0

774

56.8

0

773

56.7

0

734

53.2

2

1

324

23.8

1

328

24.1

1

372

27.0

3

2

130

9.5

2

134

9.8

2

155

11.2

4

3

56

4.1

3

71

5.2

3

56

4.1

5

4

30

2.2

4

23

1.7

4

26

1.9

6

5

23

1.7

5

10

0.7

5

17

1.2

7

6

10

0.7

6

9

0.7

6

7

0.5

8

7

6

0.4

7

6

0.4

9

4

0.3

9

8

3

0.2

8

5

0.4

7

3

0.2

10

9

2

0.1

9

1

0.1

8

2

0.1

11

14

1

0.1

11

1

0.1

13

1

0.1

12

15

1

0.1

14

1

0.1

14

1

0.1

13

16

1

0.1

15

1

0.1

21

1

0.1

14

23

1

0.1

S-RST-4 Rank

V

S-RST-5

Frequency

%

V

S-RST-6

Frequency

%

V

Frequency

%

1

0

799

58.8

0

716

52.9

0

739

53.9

2

1

302

22.2

1

363

26.8

1

366

26.7

3

2

115

8.5

2

153

11.3

2

137

10.0

4

3

60

4.4

3

53

3.9

3

57

4.2

5

4

43

3.2

4

32

2.4

4

29

2.1

6

5

19

1.4

5

20

1.5

5

14

1.0

7

6

12

0.9

6

9

0.7

6

14

1.0

8

7

4

0.3

7

2

0.1

7

8

0.6

9

10

2

0.1

8

2

0.1

8

3

0.2

10

8

1

0.1

9

1

0.1

9

1

0.1

11

13

1

0.1

15

1

0.1

10

1

0.1

12

15

1

0.1

17

1

0.1

25

1

0.1

13

16

1

0.1

26

1

0.1

14

Discourse Dependency Relations 213 Table 6.28  Discourse valency (V) for paragraph nodes P-RST-1 Rank

V

P-RST-2

Frequency

%

V

P-RST-3

Frequency

%

V

Frequency

%

1

0

382

60.8

0

388

60.8

0

377

59.3

2

1

126

20.1

1

158

24.8

1

144

22.6

3

2

54

8.6

2

48

7.5

2

57

9.0

4

3

31

4.9

3

14

2.2

3

32

5.0

5

4

21

3.3

4

13

2.0

4

7

1.1

6

5

6

1.0

5

5

0.8

5

7

1.1

7

6

3

0.5

7

4

0.6

6

6

0.9

8

7

2

0.3

6

4

0.6

7

3

0.5

9

8

1

0.2

14

1

0.2

9

1

0.2

10

10

1

0.2

15

1

0.2

11

1

0.2

11

15

1

0.2

22

1

0.2

21

1

0.2

P-RST-4 Rank

V

P-RST-5

Frequency

%

V

P-RST-6

Frequency

%

V

Frequency

%

1

0

396

62.8

0

381

58.8

0

380

59.8

2

1

133

21.1

1

150

23.1

1

153

24.1

3

2

47

7.4

2

58

9.0

2

38

6.0

4

3

33

5.2

3

22

3.4

3

31

4.9

5

4

10

1.6

4

19

2.9

4

17

2.7

6

5

4

0.6

5

10

1.5

5

6

0.9

7

6

3

0.5

6

5

0.8

6

4

0.6

8

12

2

0.3

8

1

0.2

7

3

0.5

9

7

1

0.2

9

1

0.2

8

1

0.2

10

8

1

0.2

15

1

0.2

15

1

0.2

11

15

1

0.2

26

1

0.2

Fitting the ZM model to the complete rank-frequency data of the discourse valency motifs yields quite good to excellent results with all determination coefficients R2 over 0.948 (Table 6.35, p. 219). Thus, Hypothesis 6.5.2 is validated. 6.5.4  Section summary

This section borrows the valency idea from syntax and regards the discourse valency of a discourse node as the number of its satellites in the converted trees with

214  Discourse Dependency Relations Table 6.29  Fitting the ZA model to rank-frequency data of discourse valency R²

a

b

n

α

N

C-RST-1

0.9978

0.1666

0.8458

16

0.5201

4236

C-RST-2

0.9998

0.2703

0.8399

16

0.5191

4063

C-RST-3

0.9997

0.2413

0.8815

17

0.5078

4360

C-RST-4

0.9992

0.5334

0.7206

19

0.5108

4123

C-RST-5

0.9987

0.2460

0.8478

16

0.5026

4230

C-RST-6

0.9993

0.2495

0.8337

17

0.5200

4190



a

b

n

α

N

S-RST-1

0.9999

1.0737

0.6401

14

0.5683

1362

S-RST-2

0.9997

0.7480

0.8121

12

0.5675

1362

S-RST-3

0.9997

0.8967

0.8163

13

0.5323

1379

S-RST-4

0.9996

1.2014

0.5490

13

0.5875

1360

S-RST-5

0.9996

0.5842

0.9242

12

0.5292

1353

S-RST-6

0.9997

1.0523

0.7019

13

0.5390

1371



a

b

n

α

N

P-RST-1

0.9995

0.3944

0.8382

11

0.6083

628

P-RST-2

0.9994

2.4833

0.1850

12

0.6082

638

P-RST-3

0.9992

1.9081

0.3125

11

0.5928

636

P-RST-4

0.9990

0.6429

0.8454

11

0.6276

631

P-RST-5

0.9994

1.1817

0.5919

10

0.5880

648

P-RST-6

0.9976

2.0078

0.2887

11

0.5984

635

Table 6.30  Comparing syntactic valency and discourse valency (percentages >1%; syntactic valency with words as nodes, discourse valency with clauses, sentences and paragraphs as nodes)

Rank  Valency  Words        Clauses      Sentences    Paragraphs
1     0        44.9–45.3    50.3–52      52.9–58.5    58.8–62.8
2     1        29.4–30.0    20.8–24.2    22.3–27      20.1–24.8
3     2        14.9–15.5    10.1–12.7    8.5–11.3     7.4–9.0
4     3        6.5–6.7      6.1–7.0      3.9–5.2      2.2–5.2
5     4        2.4–2.6      2.7–4.3      1.7–3.2      1.1–3.3

Table 6.31  Summary of discourse valency motifs at three levels (C = clauses, S = sentences, P = paragraphs)

          Types  Tokens  TTR   Entropy  RR
C-RST-1   195    1397    0.14  5.72     0.04
C-RST-2   202    1282    0.16  5.90     0.04
C-RST-3   190    1458    0.13  5.56     0.05
C-RST-4   207    1308    0.16  5.83     0.04
C-RST-5   204    1422    0.14  5.69     0.05
C-RST-6   194    1416    0.14  5.59     0.05
S-RST-1   79     481     0.16  4.85     0.07
S-RST-2   92     441     0.21  5.18     0.05
S-RST-3   76     484     0.16  4.91     0.06
S-RST-4   78     472     0.17  4.95     0.06
S-RST-5   84     472     0.18  5.05     0.06
S-RST-6   86     477     0.18  5.02     0.06
P-RST-1   59     188     0.31  5.02     0.05
P-RST-2   62     184     0.34  5.09     0.05
P-RST-3   59     196     0.30  5.02     0.05
P-RST-4   58     187     0.31  4.86     0.06
P-RST-5   59     205     0.29  4.93     0.06
P-RST-6   69     184     0.38  5.23     0.05

For the three converted corpora, where the terminal nodes are clauses, sentences and paragraphs respectively, each corpus is divided into six sub-corpora of the same size. Each set of sub-corpora is found to behave largely homogeneously; the regularities unveiled here can therefore be claimed as a commonality of English news reporting. This part of the study yields the following findings:

1 The rank-frequency distribution of the discourse valencies, like that of syntactic valencies, abides by the ZA model. Also like the syntactic valencies, valency 0 ranks first and the general ranking order is an increasing order of valencies.
2 The rank-frequency distribution of the discourse valency motifs, like that of syntactic valency motifs, abides by the ZM model. But unlike the syntactic valency motifs, the motif "0+1" rather than "1" ranks first.
3 The fittings validate that both the discourse valencies and their motifs are results of diversification processes.

Having discussed discourse valency, we can move on to discourse dependency distance in the next section.
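To make the valency and motif counts concrete, the following is a minimal Python sketch (an illustration, not the toolchain actually used for the study) that derives node valencies and valency motifs from a converted tree given as a governor map. A motif is taken here, as is usual in quantitative-linguistic practice, to be a maximal non-decreasing run of values; the clause-level tree of WSJ_0642 from Table 6.36 serves as the toy input.

```python
from collections import Counter

def valencies(governors):
    """Discourse valency of each node = number of nodes governed by it.

    `governors` maps a node's textual position (1-based) to the position of
    its governor, or to None if the node is a root.
    """
    counts = Counter(g for g in governors.values() if g is not None)
    return [counts.get(node, 0) for node in sorted(governors)]

def motifs(values):
    """Split a sequence of values into motifs: maximal non-decreasing runs."""
    runs, current = [], [values[0]]
    for v in values[1:]:
        if v >= current[-1]:
            current.append(v)
        else:
            runs.append(current)
            current = [v]
    runs.append(current)
    return ["+".join(str(v) for v in run) for run in runs]

# Clause-level tree of WSJ_0642 (cf. Table 6.36): node -> governor (None = root).
wsj_0642 = {1: 2, 2: None, 3: 4, 4: 2, 5: 2, 6: 2, 7: 10, 8: 7, 9: 10, 10: 2}
vals = valencies(wsj_0642)        # [0, 5, 0, 1, 0, 0, 1, 0, 0, 2]
print(motifs(vals))               # ['0+5', '0+1', '0+0+1', '0+0+2']
```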

Table 6.32  Discourse valency motifs for clause nodes (top 10)

[In every clause-level sub-corpus (C-RST-1 to C-RST-6), the motif "0+1" ranks first (11.7%–12.7% of motif tokens), followed by "1" and "0+2"; the remaining top-10 motifs are short combinations such as "0+0+1", "0+0+2", "0+3", "0+1+1", "0+0+3", "1+1", "0+2+2" and "0+4". The ten most frequent motifs jointly account for 50.7%–57.8% of the motif tokens in each sub-corpus.]

6.6  Discourse dependency distance distribution

6.6.1  Introduction

Just as we borrowed the idea of discourse valency from syntax, we also borrow the idea of discourse dependency distance from it. The methods for obtaining such data are described in the next sub-section. Some researchers analyse RST from the perspective of distance, for instance rhetorical distance, or the shortest path between an anaphor and its antecedent

Table 6.33  Discourse valency motifs for sentence nodes (top 10)

[In every sentence-level sub-corpus (S-RST-1 to S-RST-6), the motif "0+1" again ranks first (12.7%–17.5%), with "0+2", "1" and "0+0+1" taking the next ranks; the remaining top-10 motifs include "0+0+2", "0+3", "0+1+1", "0+0+3", "0+0+0+1", "0+0+0+2", "0+4", "0+5" and "0+1+2". The ten most frequent motifs jointly account for 61.2%–65.7% of the motif tokens in each sub-corpus.]

which goes along the rhetorical tree (Chiarcos & Krasavina, 2005, 2008). Those rhetorical distances are different from the dependency distance in this study. Sun and Xiong (2019) convert 345 texts from the RST-DT into dependency trees and measure discourse complexity, merging the distance and network approaches. They measure the mean rhetorical length of each rhetorical relation by discourse distance; such a length is measured in discourse trees with clause nodes. For instance, Topic-drift has the longest length, 35.45. They also measure mean real length (by the number of EDUs); for Topic-drift, for instance, the mean real length is 68.82. They find that both mean discourse dependency distances (DDs)

Table 6.34  Discourse valency motifs for paragraph nodes (top 10)

[In every paragraph-level sub-corpus (P-RST-1 to P-RST-6), the motif "0+1" once more ranks first (9.2%–15.1%), with "0+2", "1", "0+0+1" and "0+3" among the next ranks; longer combinations such as "0+0+0+1", "0+0+0+2", "0+0+3", "0+1+1" and "0+0+1+1" also reach the top 10. The ten most frequent motifs jointly account for 54.3%–63.1% of the motif tokens in each sub-corpus.]

of clause nodes and the rhetorical length of RST relation types between clause nodes abide by power-law distribution patterns very well. But it remains a puzzle whether this result is subject to fluctuation and whether such regularity can be observed at all node levels. In this section, we would like to further validate the linguistic status of such dependency distances and see whether they follow the same ZA model as the syntactic dependency distances. We pose the following research hypothesis:

Hypothesis 6.6.1: Like syntactic dependency distance, the rank-frequency data of discourse dependency distance at all node levels also abide by the ZA model.

Table 6.35  Fitting the ZM model to the complete rank-frequency data of the discourse valency motifs

          R²      a       b       n    N
C-RST-1   0.9924  1.5401  4.1425  195  1397
C-RST-2   0.9955  1.4118  3.4502  202  1282
C-RST-3   0.9628  1.8838  7.9776  190  1458
C-RST-4   0.9937  1.3485  2.3672  207  1308
C-RST-5   0.9193  1.1827  0.6065  204  1422
C-RST-6   0.9885  1.6299  4.6275  194  1416
S-RST-1   0.9799  1.6642  4.157   86   477
S-RST-2   0.9812  1.6628  4.4113  84   472
S-RST-3   0.9924  1.6709  3.9446  76   484
S-RST-4   0.9924  1.6709  3.9446  76   484
S-RST-5   0.9559  1.2005  1.0650  92   441
S-RST-6   0.9702  1.3147  1.1162  79   481
P-RST-1   0.9763  1.5797  5.6673  59   188
P-RST-2   0.9749  1.1162  1.4789  62   184
P-RST-3   0.9483  1.8924  9.3284  59   196
P-RST-4   0.9781  1.1843  1.3103  58   187
P-RST-5   0.9938  1.1677  1.3056  59   205
P-RST-6   0.9801  1.0534  1.2505  69   184

In the remainder of this section, the next sub-section addresses how we obtain discourse dependency distance, Section 6.6.3 discusses the results, and the final part gives a section summary.

6.6.2  Methods

We calculate the discourse dependency distance in the following way:

1 For each satellite-nucleus pair, the dependency distance is the position of the nucleus minus that of the satellite. This is the same as Sun and Xiong's (2019) definition.
2 Every root is assigned a distance of 0 so that roots are not left out; when there are multiple roots, each is assigned a distance value of 0.

A small code sketch of this calculation is given below; Table 6.36 then illustrates how we obtain the discourse DD from Figure 6.6 (the converted tree of WSJ_0642 with elementary clause nodes only), Figure 6.7 (the converted tree with elementary sentence nodes only) and Figure 6.8 (the converted tree with elementary paragraph nodes only).
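The following minimal Python sketch (an illustration, not the study's actual scripts) implements these two rules for a tree given as a mapping from each node's textual position to its governor's position; it reproduces the figures reported for the clause-level tree of WSJ_0642 in Table 6.36.

```python
def discourse_dd(governors):
    """DD of each node: governor position minus dependent position; roots get 0."""
    return {node: 0 if gov is None else gov - node
            for node, gov in governors.items()}

# Clause-level tree of WSJ_0642 (cf. Figure 6.6 and Table 6.36): node -> governor.
clause_tree = {1: 2, 2: None, 3: 4, 4: 2, 5: 2, 6: 2, 7: 10, 8: 7, 9: 10, 10: 2}
dds = discourse_dd(clause_tree)
mean_dd = sum(dds.values()) / len(dds)                        # -1.2, as in Table 6.36
negative_share = sum(d < 0 for d in dds.values()) / len(dds)  # 0.5 (5 out of 10)
print(dds, mean_dd, negative_share)
```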

Table 6.36  Calculating discourse dependency distance (DD) for the converted discourse tree

Clauses as nodes (Figure 6.6) (mean = −1.2)
Dependent   Governor   DD
Node 1      Node 2     2 − 1 = 1
Node 2      Root       0
Node 3      Node 4     4 − 3 = 1
Node 4      Node 2     2 − 4 = −2
Node 5      Node 2     2 − 5 = −3
Node 6      Node 2     2 − 6 = −4
Node 7      Node 10    10 − 7 = 3
Node 8      Node 7     7 − 8 = −1
Node 9      Node 10    10 − 9 = 1
Node 10     Node 2     2 − 10 = −8

Sentences as nodes (Figure 6.7) (mean = −1.5)
Sentence 1   Root         0
Sentence 2   Sentence 1   1 − 2 = −1
Sentence 3   Sentence 1   1 − 3 = −2
Sentence 4   Sentence 1   1 − 4 = −3

Paragraphs as nodes (Figure 6.8) (mean = −0.5)
Paragraph 1  Root          0
Paragraph 2  Paragraph 1   1 − 2 = −1

In this sample, when the nodes are clauses, half of the DDs are negative (5 out of a total of 10) and the mean DD is −1.2 (the root included), suggesting that in most cases the nucleus occurs before the satellite, or that in governor-final situations the dependent is far from the governor. In this sample, four dependent-governor pairs are adjacent. When the nodes are sentences, all dependencies occur in head-initial situations, suggesting that in this case the most important information is placed at the very beginning. Finally, when the nodes are paragraphs, the only dependency is again a head-initial one, which is in agreement with the inverted pyramid structuring discussed in Section 6.4. At all three levels, the mean DDs are negative, suggesting some common mechanisms. Whether these observations on the sample hold as regularities for the whole corpus remains to be seen. We use the same three sets of sub-corpora as in Section 6.5, with the aim of discovering systematic trends across the three levels.

6.6.3  Results and analysis

Table 6.37 presents a summary of discourse dependency distance at the three levels. Some general tendencies are clearly visible from the table:

1 Within each set of sub-corpora, the data suggest some homogeneity, at least as far as TTR, entropy, repeat rate, average DD and the percentage of negative DDs are concerned. M and SD values are given in Table 6.37.
2 As far as TTR, entropy and RR are concerned, the two upper levels (sentences and paragraphs as nodes) are more alike.

Table 6.37  Summary of discourse dependency distance (C = clauses as nodes, S = sentences as nodes, P = paragraphs as nodes)

          Types  Tokens  TTR     Entropy  RR      Average DD  % of negative DD
C-RST-1   98     4176    0.02    4.07     0.13    −4.2        76.13
C-RST-2   135    4083    0.03    4.17     0.12    −4.5        76.88
C-RST-3   130    3982    0.03    4.06     0.13    −4.7        75.89
C-RST-4   182    4329    0.04    4.19     0.13    −4.1        77.92
C-RST-5   126    4109    0.03    3.98     0.14    −4.6        79.09
C-RST-6   86     4100    0.02    3.98     0.13    −4.6        78.05
M                        0.03    4.08     0.13    −4.45       77.33
SD                       0.0075  0.0901   0.0063  0.2429      1.2394
S-RST-1   43     1354    0.03    3.29     0.20    −3.0        84.19
S-RST-2   53     1302    0.04    3.39     0.19    −2.6        78.42
S-RST-3   57     1308    0.04    3.38     0.20    −3.2        81.27
S-RST-4   51     1450    0.04    3.13     0.21    −2.8        81.86
S-RST-5   53     1344    0.04    3.17     0.24    −3.1        84.23
S-RST-6   38     1329    0.03    3.13     0.22    −3.0        85.33
M                        0.04    3.25     0.21    −2.95       82.55
SD                       0.0052  0.1211   0.0179  0.2168      2.5460
P-RST-1   22     628     0.04    2.93     0.20    −1.9        76.43
P-RST-2   28     620     0.05    3.17     0.19    −2.2        70.97
P-RST-3   31     636     0.05    3.16     0.19    −2.3        74.37
P-RST-4   28     627     0.04    2.84     0.22    −1.9        71.29
P-RST-5   28     630     0.04    2.96     0.22    −2.2        76.67
P-RST-6   29     635     0.05    3.04     0.20    −2.5        77.80
M                        0.05    3.02     0.20    −2.17       74.59
SD                       0.0055  0.1316   0.0137  0.2338      2.9003

3 With the nodes getting bigger, the average DD gets shorter. A possible reason is that for the same passage the number of clauses is much larger than the numbers of sentences and paragraphs. For Sun and Xiong's (2019) study, the long distances might result from a similar reason: they measure the distance between clause dependents and their roots within the frame of a whole text rather than within sentences.
4 All the average DDs are negative, indicating that most nuclei occur before their satellites. This is in agreement with the findings in Section 6.4, suggesting that more salient information is generally put in more initial positions.
5 Interestingly, as far as the percentage of negative dependency distances is concerned, the highest level (M = 74.59%) and the lowest level (M = 77.33%) are more alike, with smaller percentages than the intermediate level (M = 82.55%).

Tables 6.38 and 6.39 present the top 20 DD data for the lowest level with clause nodes and for the intermediate level with sentence nodes, and Appendix 10 gives the complete data with paragraph nodes. Table 6.40 presents the top 10 statistics of Appendix 10.
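For reference, the summary measures reported in Tables 6.31 and 6.37 can be computed as in the short sketch below. It assumes the usual definitions (TTR = types/tokens, Shannon entropy in bits, repeat rate = the sum of squared relative frequencies), which the chapter itself does not spell out, so the formulas and the toy sample are illustrative assumptions rather than the study's own code.

```python
import math
from collections import Counter

def summary_stats(sample):
    """TTR, Shannon entropy (bits) and repeat rate of a list of observed values."""
    counts = Counter(sample)
    n = len(sample)
    probs = [c / n for c in counts.values()]
    ttr = len(counts) / n                                  # type-token ratio
    entropy = -sum(p * math.log2(p) for p in probs)        # Shannon entropy
    rr = sum(p * p for p in probs)                         # repeat rate
    return ttr, entropy, rr

# E.g. applied to the DD values of a sub-corpus (here a tiny made-up sample):
print(summary_stats([-1, -1, -2, 0, -1, 1, -3, -1]))
```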

Table 6.38  Discourse dependency distance (clauses as nodes, top 20) (DD = dependency distance, F = frequency)

[In every clause-level sub-corpus (C-RST-1 to C-RST-6), the most frequent DD is −1 (29.8%–32.1% of DD tokens), followed by 1 and −2 (roughly 9%–12% each, in varying order), and then by −3, −4, −5 and −6; the remaining top-20 places are mostly taken by further negative distances, with the positive distances 2 to 6 occurring far less often. The twenty most frequent DDs account for 84.3%–89.4% of all DD tokens in each sub-corpus.]

Table 6.39  Discourse dependency distance (sentences as nodes, top 20) (DD = dependency distance, F = frequency)

[In every sentence-level sub-corpus (S-RST-1 to S-RST-6), the most frequent DD is −1 (38.2%–44.9%), followed by −2 (12.6%–14.6%), 0 (8.1%–13.7%) and −3 (6.6%–8.3%); the remaining top-20 places are largely taken by further negative distances, with the positive distances 1 to 5 occurring with much lower frequencies. The twenty most frequent DDs account for 94.7%–96.6% of all DD tokens in each sub-corpus.]

Table 6.40  Discourse dependency distance (paragraphs as nodes, top 10) (DD / F / %)

Rank  P-RST-1           P-RST-2           P-RST-3
1     −1 / 232 / 36.9   −1 / 215 / 34.7   −1 / 230 / 36.2
2     0 / 109 / 17.4    0 / 132 / 21.3    0 / 108 / 17.0
3     −2 / 96 / 15.3    −2 / 66 / 10.6    −2 / 78 / 12.3
4     −3 / 53 / 8.4     −3 / 38 / 6.1     −3 / 49 / 7.7
5     −4 / 28 / 4.5     1 / 28 / 4.5      1 / 40 / 6.3
6     1 / 26 / 4.1      −4 / 22 / 3.5     −4 / 27 / 4.2
7     −5 / 21 / 3.3     −5 / 17 / 2.7     −5 / 19 / 3.0
8     −6 / 10 / 1.6     2 / 14 / 2.3      −6 / 13 / 2.0
9     −8 / 9 / 1.4      −8 / 13 / 2.1     −7 / 10 / 1.6
10    2 / 8 / 1.3       −7 / 13 / 2.1     −9 / 7 / 1.1
Sub-total  94.2              89.9              91.4

Rank  P-RST-4           P-RST-5           P-RST-6
1     −1 / 227 / 36.2   −1 / 247 / 39.2   −1 / 234 / 36.9
2     0 / 162 / 25.8    0 / 119 / 18.9    0 / 119 / 18.7
3     −2 / 75 / 12.0    −2 / 83 / 13.2    −2 / 72 / 11.3
4     −3 / 48 / 7.7     −3 / 35 / 5.6     −3 / 51 / 8.0
5     −4 / 26 / 4.1     −4 / 27 / 4.3     −4 / 40 / 6.3
6     −7 / 13 / 2.1     1 / 21 / 3.3      −5 / 27 / 4.3
7     1 / 11 / 1.8      −6 / 21 / 3.3     1 / 20 / 3.2
8     −5 / 10 / 1.6     −5 / 14 / 2.2     −6 / 12 / 1.9
9     −6 / 9 / 1.4      −8 / 13 / 2.1     −7 / 10 / 1.6
10    −8 / 9 / 1.4      −7 / 9 / 1.4      −8 / 7 / 1.1
Sub-total  94.1              93.5              93.3

These top-10 entries account for 89.9%–94.2% of the total. Some regularities among the levels are evident:

1 The most frequent distance is −1 at all levels, which indicates that the nucleus stands directly before the satellite; in other words, the more salient information comes before the secondary information.

2 For the DDs with negative values, the most frequent ones follow the same order from −1 and −2 to −3 and so on. DDs from −1 to −10 usually occur on the top-15 list (see Table 6.38). Distances from 1 to 3 (sometimes 4) also occur in this range, but with fewer occurrences; for instance, occurrences of distance 1 with clause nodes are about one third of those of distance −1.

Figure 6.16 shows the frequency curves of the dependency distances in Tables 6.38, 6.39 and 6.40. The thin dotted lines stand for the lowest level with clause nodes; the thin solid lines, for the intermediate level with sentence nodes; and the thick dotted lines, for the highest level with paragraph nodes. The basic shapes of the lines are quite similar, except that distance 1 occurs more often than 0 when the nodes are clauses. The lines for sub-corpora at the same level nearly overlap with each other, suggesting homogeneity visually. Figure 6.17 presents the top 20 rank-frequency data of Figure 6.16. The six lines for each set of data coincide with each other in most cases, suggesting a certain homogeneity among the six sub-corpora at the same level.
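Rank-frequency curves of the kind shown in Figures 6.16 and 6.17 can be drawn with a few lines of matplotlib; the sketch below is illustrative only, and the six values given are the top entries of S-RST-1 in Table 6.39, not a full sub-corpus.

```python
import matplotlib.pyplot as plt

def plot_rank_frequency(freq_by_dd, label):
    """Plot the frequencies of the most frequent dependency distances, ranked."""
    freqs = sorted(freq_by_dd.values(), reverse=True)[:20]
    plt.plot(range(1, len(freqs) + 1), freqs, marker="o", label=label)

# Illustrative values only: the top entries of S-RST-1 (cf. Table 6.39).
plot_rank_frequency({-1: 539, -2: 191, 0: 125, -3: 109, -4: 59, 1: 52}, "S-RST-1")
plt.yscale("log")
plt.xlabel("Rank")
plt.ylabel("Frequency")
plt.legend()
plt.show()
```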

Figure 6.16  Frequency curves for the top 20 discourse dependency distances


Figure 6.17  Rank-frequency curves for the top 20 discourse dependency distances

To compare dependency distances at both the syntactic and the discourse level, we retrieve the syntactic DD data from Table 5.62 (key data relating to DD) and Appendix 6 (top 20 DD sequencings in each sub-corpus). Table 6.41 summarises the DDs at these two levels and reveals some interesting trends.

1 Interestingly, the top eight syntactic DD ranks follow the same order 1, −1, 2, −2, −3, 3, 4 and −4 across the sub-corpora, with the same absolute values (±1, ±2, ±3, ±4) neighbouring each other. For the discourse DDs, by contrast, the same absolute values are never "intimate" neighbours.

Table 6.41  Comparing DD at syntactic and discourse levels (DD / %)

      Syntactic level     Discourse level
Rank  Words as nodes      Clauses as nodes   Sentences as nodes   Paragraphs as nodes
1     1 / 31.4–32.7       −1 / 29.8–32.1     −1 / 38.2–44.9       −1 / 34.7–39.2
2     −1 / 21.1–22.8      −2 / 9.3–11.1      −2 / 12.6–14.6       −2 / 10.6–15.3
3     2 / 10.2–10.5       1 / 9.4–11.8       1 / 6.6–8.3          −3 / 5.6–8.4
4     −2 / 9.9–10.3       −3 / 6.8–8.1       −3 / 3.6–4.8         −4 / 3.5–6.3
5     −3 / 6.1–6.5        −4 / 5.0–5.9       −4 / 2.6–5.5         1 / 1.8–6.3
6     3 / 3.6–3.9         −5 / 3.9–4.1       −5 / 2.4–3.6         −5 / 1.6–3.0
7     −4 / 3.3–3.5        −6 / 2.6–3.3       −6 / 1.8–2.8         −6 / 1.4–3.3
8     4 / 1.6–1.8         −7 / 2.1–2.6       −7 / 1.1–2.0         −7 / 1.4–2.1

2 At the syntactic level, it seems that the plus values 1 and 2 come before their corresponding minus values; the trend then starts to reverse, with the minus values −3 and −4 coming before their corresponding plus values.
3 At the discourse level, when the nodes are clauses or sentences, the rankings go with identical DDs: −1, −2, 1, −3, −4, −5, −6 and −7, with only one plus value (1) ranking third (accounting for 9.4%–11.8% with clauses as nodes and 6.6%–8.3% with sentences as nodes). When the nodes are paragraphs, the rankings are similar except for the plus value 1, which ranks fifth with lower percentages (1.6%–3%).
4 At the syntactic level, head-final dependencies are more frequent, while at the discourse level head-initial dependencies are more frequent, suggesting initial parts as more central. This accords with the finding of large percentages of inverted pyramid structures for the news genre at the paragraph and discourse levels (Section 6.4).
5 With nodes getting bigger, the mean distance between a dependent and its governor grows as well, with mean DDs of 0–0.09 across the six sub-corpora between words within sentences (M = 0.003), −2.5 to −1.9 between clauses within sentences (M = −2.17), −2.6 to −3.2 between sentences within paragraphs (M = −2.95) and −4.6 to −4.1 between paragraphs within the discourse (M = −4.45).
6 Liu (2018) maintains that there is a threshold of DD within 3 at the syntactic level, and our research findings verify this hypothesis. If there is a threshold at the discourse level, we might claim that the same threshold of 3 obtains at the paragraph level (with sentences as nodes) and at the sentence level (with clauses as nodes), whereas at the discourse level, where the nodes are paragraphs, the threshold would be about 5. Whether it is 3 or 5, both are still within the range of 7 ± 2, or within "a central memory store limited to 3 to 5 meaningful items" as suggested by Cowan (2010, p. 51).
7 Adjacent pairing of dependents and their governors occurs frequently at all levels: about half of all dependencies come in a neighbouring fashion both at the syntactic level between words and at the paragraph level between sentences. For dependencies between paragraphs within the discourse, the percentage is about 44%, and between clauses within sentences, about 36%.

Finally, we fit the ZA model to all three sets of data and get excellent results, with all determination coefficients R² above 0.987 (Table 6.42). The fitting results verify the linguistic status of discourse dependency distance at the three levels, suggesting the distances as results of diversification processes. The parameter α values seem to reflect the size of the units, growing with them; they also reflect some homogeneity within the same set of sub-corpora, with M = 0.31, SD = 0.0082 for clause units, M = 0.37, SD = 0.0148 for sentence units, and M = 0.41, SD = 0.0253 for paragraph units.
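For completeness, the sketch below shows one way such a ZA fit can be set up in Python. It assumes the usual parameterisation of the right-truncated modified Zipf-Alekseev distribution, P(1) = α and P(x) ∝ x^(−(a + b ln x)) for 2 ≤ x ≤ n; the study's fits were presumably obtained with dedicated fitting software, so the data and results here are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit

def za_expected(ranks, a, b, alpha, n, N):
    """Expected frequencies under the right-truncated modified Zipf-Alekseev model.

    P(1) = alpha; for 2 <= x <= n, P(x) is proportional to x**-(a + b*ln x),
    normalised so that ranks 2..n share the remaining probability mass 1 - alpha.
    """
    ranks = np.asarray(ranks, dtype=float)
    tail = np.arange(2, n + 1, dtype=float)
    norm = np.sum(tail ** -(a + b * np.log(tail)))
    probs = np.where(ranks == 1, alpha,
                     (1 - alpha) * ranks ** -(a + b * np.log(ranks)) / norm)
    return N * probs

# Illustrative rank-frequency data (the top of one clause-level DD distribution).
freqs = np.array([1279, 495, 399, 315, 227, 163, 131, 103, 103, 89], dtype=float)
ranks = np.arange(1, len(freqs) + 1)
N, n = freqs.sum(), len(freqs)
alpha = freqs[0] / N                     # the modified model's extra weight on rank 1

(a, b), _ = curve_fit(lambda x, a, b: za_expected(x, a, b, alpha, n, N),
                      ranks, freqs, p0=[0.5, 0.2])
print(a, b, alpha)
```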

6.6.4  Section summary

This section examines discourse DDs and finds that in all three sets of converted RST corpora the DDs abide by the ZA distribution pattern.

Table 6.42  Fitting the ZA model to the rank-frequency distribution of discourse dependency distance

          R²      a       b       n    α       N
C-RST-1   0.9967  0.5628  0.2002  98   0.2977  4176
C-RST-2   0.9914  0.4529  0.2143  135  0.2998  4083
C-RST-3   0.9957  0.5481  0.2018  130  0.3157  3982
C-RST-4   0.9932  0.4539  0.2050  182  0.3015  4329
C-RST-5   0.9936  0.5157  0.2207  126  0.3164  4109
C-RST-6   0.9974  0.1446  0.2974  86   0.3098  4100
P-RST-1   0.9912  0.2138  0.4796  22   0.3694  628
P-RST-2   0.9817  0.1824  0.4200  28   0.3468  620
P-RST-3   0.9983  0.4416  0.3493  31   0.3616  636
P-RST-4   0.9873  0.5570  0.4219  28   0.3620  627
P-RST-5   0.9946  0.4554  0.3863  28   0.3921  630
P-RST-6   0.9991  0.3082  0.4268  29   0.3685  635
S-RST-1   0.9985  0.3968  0.3059  43   0.3981  1354
S-RST-2   0.9971  0.6487  0.2379  53   0.3825  1302
S-RST-3   0.9978  0.3128  0.3076  57   0.4029  1308
S-RST-4   0.9902  0.6569  0.2906  51   0.3917  1450
S-RST-5   0.9985  0.6924  0.2234  53   0.4487  1344
S-RST-6   0.9987  0.6342  0.2535  38   0.4304  1329

The finding validates the linguistic status of discourse dependency distance and suggests that discourse DDs are results of diversification processes. Parameter values reflect both the homogeneity within the same set of sub-corpora and the changes with unit size. Like syntactic DDs, discourse dependencies show a threshold of DD (about 3–5), which might result from human parsers' limited working-memory capacity.

6.7  Why reframe the trees?

Having addressed several issues concerning discourse relations, discourse dependency distance and valency in the converted RST discourse trees, we finally come to a key question: in what way is the conversion necessary and linguistically interpretable? First, within the framework of RST analysis there is a compositionality criterion (Marcu, 2000): if an RST relation R holds between two textual spans, this relation also holds between their two most salient units. The criterion also holds inversely. The conversion in our study thus operates in line with this compositionality criterion. Second, the conversion also operates in line with the hierarchy principle, an essential principle of RST (Taboada & Mann, 2006a).

Discourse Dependency Relations 229 trees, all nodes are terminal, thus avoiding using recursively lower-layer constituents, the conversion still goes with the four constrains of the hierarchy principle: 1 Completeness. The conversion covers all the spans, leaving out no constituent. 2 Connectedness. All the nodes are connected. 3 Uniqueness. All the units in the reframed trees are terminal and each is a unique text span. 4 Adjacency. Nodes are connected with adjacent nodes, either directly or indirectly. Taboada and Mann (2006a, p. 430) claim that unit sizes in RST analysis can vary; however, previous study on RST trees with large unit sizes are basically ­“uninformative” and “arbitrary”. The conversion of the trees justifies that RST can work with large unit sizes. The analysis is also informative. We deem that the conversion constitutes a promising avenue which can be further explored. We cite as examples some research possibilities in several dimensions. First, this very study helps discover both the similarities and differences of RST relations and related discourse processes clearly and systematically. By analogy, we also study discourse valencies and dependency distances. The study of the inverted pyramid structure is a happy marriage between RST structures and genres or macrostructures (van Dijk, 1980). Second, units larger than clauses (for instance sentences, paragraphs in this study) shed lights on more orthographic clues. The tree drawing is thus tremendously facilitated. The conversion provides alternative presentations to RST trees (e.g. Figure 4.6 Representing Figure 4.5 horizontally) with clear hierarchy. Such trees save both time and space. This yields an affirmative answer to Taboada and Mann’s (2006a) question about whether RST representation has to be restricted to trees. It measures up to Taboada and Mann’s (2006a, p.434) expectation and is “convenient, easy to represent, and easy to understand”. Third, the recomposed trees can render it easier to join summarisation and RST (Marcu, 2000; Torres-Moreno, 2014) since RST trees sufficiently indicate importance in discourses (Marcu, 1999, 2000). Also, the reframed trees make more prominent the central notion of RST—nuclearity. Fourth, the conversion lowers computational complexity and sheds light on some practical problems proposed by Alami et al. (2015). Fifth, if this conversion is applied to other genres or languages, we might be able to find suitable models whose parameters might be able to tell the genres or languages apart. Such a topological application seems rather promising. Finally, up till now, in our study, the length-frequency relationship, dependency distance and valencies have been studied. When future studies cover more properties (for instance, information, number of layers, complexity) and their correlations, we might be able to construct a discourse synergetic model based on RST. In addition, measuring these properties might be conducted with various operations (e.g., Kockelman, 2009; Williams & Power, 2008). The conversion can provide some alternative solutions.

We could continue the list, but the previous discussion already suffices to answer the question raised in this section. Linguistically interpretable and technically possible, the conversion is thus rendered desirable.

6.8  Chapter summary

This chapter draws on an analogy between syntactic dependency trees and RST (Rhetorical Structure Theory) discourse trees, chooses from the RST-DT (the Rhetorical Structure Theory Discourse Treebank) 359 Wall Street Journal articles with multiple paragraphs, and converts each original RST tree, with both non-elementary and elementary discourse units (EDUs), into three additional dependency trees at three levels, all with mere EDUs (clauses, sentences and paragraphs, respectively). Such conversion provides unique analytical advantages and enables us to conduct several related investigations, which yield the following results:

1 The examination of RST relations per se illuminates the discourse processes of organising clauses into sentences, sentences into paragraphs and then paragraphs into discourses. It also suggests some interactions and homogeneity across the three levels, which are more obvious at the two upper levels, where sentences are grouped into paragraphs and paragraphs into discourses.
2 The rank-frequency data of all RST relations (at the three levels of unit granularity and along three taxonomies of the relations) in the converted trees are found to observe the same ZA model, justifying RST relations as normal language entities and verifying the taxonomies of RST relations employed in the data.
3 In the same fashion, the motifs of RST relations are examined and found to observe the Zipf-Mandelbrot model, further validating the linguistic status of discourse dependency relations.
4 The study examines the inverted pyramid structuring ("summary + details") of news reporting. It visually presents and extends the inverted-pyramid idea from the discourse level to the paragraph level. In addition to the discourse level, the study finds that the body of a news report also goes with a macroproposition on top and bears a top-down schematic instalment organisation. Also visually and empirically presented, rhetorical structures at the sentence level are found to differ from grammatical structures to a certain degree.
5 From the reframed discourse dependency trees, we obtain the valency of nodes, defined as the number of immediate lower-level units under the very node. Such discourse valency is found to observe the same ZA model as the syntactic valencies.
6 Similarly, we obtain the dependency distance, defined as the distance from the lower-layer units to their immediate governing upper-layer units in the dependency tree. Such discourse distances are found to abide by the ZA distribution pattern.

The aforementioned research findings validate the linguistic status of discourse relations, discourse valency and distance, suggesting them as regular linguistic entities and results of diversification processes.

Discourse Dependency Relations 231 References Alami, N., Meknassi, M., & Rais, N. (2015). Automatic texts summarization: Current state of the art. Journal of Asian Scientific Research, 5(1), 1–15. Al-khazraji, A. (2019). Analysis of discourse markers in essays writing in ESL classroom. International Journal of Instruction, 12(2), 559–572. Altmann, G. (1991). Modeling diversification phenomena in language. In U. Rothe (Ed.), Diversification processes in language: Grammar (pp. 33–46). Rottmann. Altmann, G. (1997). The art of quantitative linguistics. Journal of Quantitative Linguistics, 4(1–3), 13–22. Altmann, G. (2005a). Diversification process. In R. Köhler, G. Altmann, & G. Piotrowski (Eds.), Quantitative Linguistik, Ein Internationales Handbuch (Quantitative linguistics, an international handbook) (pp. 646–658). Walter de Gruyter. Altmann, G. (2005b). Phonic word structure (Die Lautstruktur des Worts). In R. Köhler, G. Altmann, & G. Piotrowski (Eds.), Quantitative Linguistik, Ein Internationales Handbuch (Quantitative linguistics, an international handbook) (pp. 191–207). Walter de Gruyter. Beliankou, A., Köhler, R., & Naumann, S. (2012). Quantitative Properties of Argumentation Motifs. In I. Obradović, E. Kelih, & R. Köhler (Eds.), Methods and applications of quantitative linguistics, selected papers of the 8th international conference on quantitative linguistics (QUALICO) (pp. 35–43). Belgrade, Serbia, April 26–29, 2012. Bell, A. (1991). The language of news media. Blackwell. Bille, P. (2005). A survey on tree edit distance and related problems. Theoretical Computer Science, 337(137), 217–239. Braddock, R. (1974). The frequency and placement of topic sentences in expository prose. Research in the Teaching of English, 8(3), 287–302. Cain, K. (2003). Text comprehension and its relation to coherence and cohesion in children’s fictional narratives. British Journal of Developmental Psychology, 21(3), 335–351. Carlson, L., & Marcu, D. (2001). Discourse tagging reference manual. ISI Technical Report ISI-TR-545, 54, 2001: 56. Carlson, L., Marcu, D., & Okurowski, M. E. (2002). RST discourse treebank, LDC2002T07 [Corpus]. Linguistic Data Consortium. Carlson, L., Marcu, D., & Okurowski, M. E. (2003). Building a discourse-tagged corpus in the framework of rhetorical structure theory. In J. van Kuppevelt & R. W. Smith (Eds.), Current directions in discourse and dialogue (pp. 85–112). Kluwer Academic Publishers. Chiarcos, C., & Krasavina, O. (2005). Rhetorical distance revisited: A pilot study. In Proceedings of the corpus linguistics 2005 conference. Birmingham, July 14–17, 2005. Chiarcos, C., & Krasavina, O. (2008). Rhetorical distance revisited: A parametrized approach. In B. Anton & K. Peter (Eds.), Constraints in discourse (pp. 97–115). John Benjamins. Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–185. Cowan, N. (2005). Working memory capacity. Psychology Press. Cowan, N. (2010). The magical mystery four: How is working memory capacity limited, and why? Current Directions in Psychological Science, 19(1), 51–57. Cruz, J. C. B., Resabal, J. K., Lin, J., Velasco, D. J., & Cheng, C. (2021). Exploiting news article structure for automatic corpus generation of entailment datasets. In PRICAI 2021: Trends in artificial intelligence: 18th Pacific Rim international conference on artificial intelligence, PRICAI 2021, Part II 18 (pp. 86–99). November 8–12, 2021, Hanoi, Vietnam. Dai, Z., & Huang, R. (2021). 
A joint model for structure-based news genre classification with application to text summarization. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.

232  Discourse Dependency Relations d’Andrade, R. G. (1990). Some propositions about the relations between culture and human cognition. In J. W. Stigler & R. Shweder (Eds.), Cultural psychology: Essays in comparative human development (pp. 65–129). Cambridge University Press. Das, D., & Taboada, M. (2018). Signalling of coherence relations in discourse, beyond discourse markers. Discourse Processes, 55(8), 743–770. den Ouden, H., Noordman, L., & Terken, J. (2009). Prosodic realizations of global and ­local structure and rhetorical relations in read aloud news reports. Speech Communication, 51(2), 116–129. Dunn, A. (2005). Television news as narrative. In H. Fulton (Ed.), Narrative and media (pp. 140–152). Cambridge University Press. Errico, M. (1997). “The Evolution of the Summary News Lead”, Media History Monographs 1(1), http://blogs.elon.edu/mhm/files/2017/03/Media-History-Monographs-­Volume-1.pdf. (Accessed 14 December 2022). Graesser, A. C., McNamara, D. S., & Louwerse, M. M. (2003). What do readers need to learn in order to process coherence relations in narrative and expository text. Rethinking reading comprehension, 82, 98. Graesser, A. C., Wiemer-Hastings, P., & Wiemer-Hastings, K. (2001). Constructing inferences and relations during text comprehension. Text Representation: Linguistic and Psycholinguistic Aspects, 8, 249–271. Grosz, B. J., & Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Journal of Computational Linguistics, 12(3), 175–204. Guo, F., He, R., Dang, J., & Wang, J. (2020). Working memory-driven neural networks with a novel knowledge enhancement paradigm for implicit discourse relation recognition. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 05) (pp. 7822–7829). April 2020. Haberlandt, K. (1982). Reader expectations in text comprehension. Advances in Psychology, 9, 239–249. Hovy, E. H., & Maier, E. (1992). Parsimonious or profligate: how many and which discourse structure relations? University of Southern California Marina Del Rey Information Sciences Inst. Hřebíček, L. (1999). Principle of emergence and text in linguistics. Journal of Quantitative Linguistics, 6(1), 41–45. Ibáñez, R., Moncada, F., & Cárcamo, B. (2019). Coherence relations in primary school textbooks: Variation across school subjects. Discourse Processes, 56(8), 764–785. Jurafsky, D. (2004). Pragmatics and computational linguistics. In L. R. Horn & G. Ward (Eds.), Handbook of pragmatics (pp. 578–604). Blackwell. Kehler, A., Kertz, L., Rohde, H., & Elman, J. L. (2008). Coherence and coreference revisited. Journal of Semantics, 25(1), 1–44. Khalil, E. N. (2006). Communicating affect in news stories: The case of the lead sentence. Text & Talk, 26(3), 329–349. Kibble, R., & Power, R. (2004). Optimizing referential coherence in text generation. Computational Linguistics, 30(4), 401–416. Kieras, D. E. (1980). Initial mention as a signal to thematic content in technical passages. Memory & Cognition, 8(4), 345–353. Kintsch, W. (1994). Text comprehension, memory, and learning. American Psychologist, 49(4), 294. Kintsch, W., & van Dijk, T. A. (1975). Recalling and summarizing stories. Language, 40, 98–116.

Discourse Dependency Relations 233 Kleijn, S., Henk, L. W., Maat, P., & Sanders, T. J. M. (2019). Comprehension effects of connectives across texts, readers, and coherence relations. Discourse Processes, 56(56s), 447–464. Kockelman, P. (2009). The complexity of discourse. Journal of Quantitative Linguistics, 16(1), 1–39. Köhler, R. (2012). Quantitative syntax analysis. De Gruyter Mouton. Köhler, R. (2015). Linguistic motifs. In G. K. Mikros, & J. Mačutek (Eds.), Sequences in language and text (pp. 107–129). Berlin/Boston: De Gruyter Mouton. Koupaee, M., & Wang, W. Y. (2018). Wikihow: A large scale text summarization dataset. arXiv preprint arXiv:1810.09305. Li, S., Wang, L., Cao, Z., & Li, W. (2014). Text-level discourse dependency parsing. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (pp. 25–35). Baltimore, Maryland, USA, June 23–25, 2014. Liu, H. (2018). Language as a human-driven complex adaptive system. Comment on “Rethinking foundations of language from a multidisciplinary perspective”. Physics of Life Reviews, 26(27), 149–151. Liu, Y., Li, S., Zhang, X., & Sui, Z. (2016). Implicit discourse relation classification via multi-task neural networks. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1, pp. 2750–2756). March 2016. Louwerse, M. M. (2001). An analytic and cognitive parametrization of coherence relations. Cognitive Linguistics, 12(3), 291–315. Mann, W. C., & Thompson, S. A. (1988). Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3), 243–281. Marcu, D. (1999). Discourse trees are good indicators of importance in text. In I. Mani & M. Maybury (Eds.), Advances in automatic text summarization (pp. 123–136). The MIT Press. Marcu, D. (2000). The theory and practice of discourse parsing and summarization. The MIT press. McCarthy, K. S., & McNamara, D. S. (2021). The multidimensional knowledge in text comprehension framework. Educational Psychologist, 56(3), 196–214. Meyer, B. J. F., Brandt, D. M., & Bluth, G. J. (1980). Use of top-level structure in text: Key for reading comprehension in ninth-grade students. Reading Research Quarterly, 16(1), 72–103. Mindich, D. T. Z. (1998). Just the Facts - how “objectivity” came to define American journalism. New York University Press. Miragoli, S., Camisasca, E., & Di Blasio, P. (2019). Investigating linguistic coherence relations in child sexual abuse: A comparison of PTSD and non-PTSD children. Heliyon, 5(2), e01163. Morris, J., & Hirst, G. (1991). Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17(1), 21–48. Moser, M. G., Moore, J. D., & Glendening, E. (1996). Instructions for coding explanations: Identifying segments, relations and minimal units. Department of Computer Science, University of Pittsburgh. Narasimhan, K., & Barzilay, R. (2015). Machine comprehension with discourse relations. In Proceedings of the 53rd annual meeting of the Association for Computational Linguistics and the 7th international joint conference on natural language processing (Volume 1: Long Papers) (pp. 1253–1262). Beijing, China, July 26–31, 2015.

234  Discourse Dependency Relations Norambuena, B. K., Horning, M., & Mitra, T. (2020). Evaluating the inverted pyramid structure through automatic 5W1H extraction and summarization. Computational Journalism C+ J. Boston, MA, USA, March 20 – 21, 2020. Paltridge, B., Starfield, S., Ravelli, L. J., & Tuckwell, K. (2012). Change and stability: Examining the macrostructures of doctoral theses in the visual and performing arts. Journal of English for Academic Purposes, 11(4), 332–344. Pawłowski, A. (1999). Language in the line vs. language in the mass: On the efficiency of sequential modelling in the analysis of rhythm. Journal of Quantitative Linguistics, 6, 70–77. Peng, N., Poon, H., Quirk, C., Toutanova, K., & Yih, W. T. (2017). Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics, 5, 101–115. Pitkin, W. L. (1977). Hierarchies and the discourse hierarchy. College English, 38(7), 648–659. Polanyi, L., Culy, C., van den Berg, M. H., & Thione, G. L. (2004). A rule based approach to discourse parsing. In Proceedings of the 5th SIGdial workshop in discourse and dialogue (pp. 108–117). Cambridge, MA, USA, May 1, 2004. Pöttker, H. (2003). News and its communicative quality: the inverted pyramid – when and why did it appear? Journalism Studies, 4(4), 501–511. Ratnayaka, G., Rupasinghe, T., de Silva, N., Warushavithana, M., Gamage, V., & Perera, A. S. (2018). Identifying relationships among sentences in court case transcripts using discourse relations. In 2018 18th International conference on advances in ICT for emerging regions (ICTer) (pp. 13–20). September 26–29, 2018. IEEE. Readership Institute. (2010). The value of feature-style writing. http://www.readership.org/ content/feature.asp (Accessed 22 January 2014). Rohde, H., & Horton, W. S. (2014). Anticipatory looks reveal expectations about discourse relations. Cognition, 133(3), 667–691. Sahu, S. K., Christopoulou, F., Miwa, M., & Ananiadou, S. (2019). Inter-sentence relation extraction with document-level graph convolutional neural network. arXiv preprint arXiv:1906.04684. Sanders, T., & Noordman, L. (2000). The role of coherence relations and their linguistic markers in text processing. Discourse Processes, 29(1), 37–60. Sanders, T., Land, J., & Mulder, G. (2007). Linguistics markers of coherence improve text comprehension in functional contexts. Information Design Journal, 15(3), 219–235. Sanders, T., & van Wijk, C. (1996). PISA: A procedure for analyzing the structure of explanatory texts. Text, 16(1), 91–132. Sanders, T. J., Spooren, W. P., & Noordman, L. G. (1992). Toward a taxonomy of coherence relations. Discourse Processes, 15(1), 1–35. Sanders, T. J., Spooren, W. P., & Noordman, L. G. (1993). Coherence relations in a cognitive theory of discourse representation. Psychology. https://doi.org/10.1515/cogl.1993.4.2.93 Scholman, M., & Sanders, T. (2014). Annotating coherence relations in corpora of language use. In Proceedings of the CLARIN annual conference. Soesterberg, The Netherlands. Scollon, R. (2000). Generic variability in news stories in Chinese and English: A contrastive discourse study of five days’ newspapers. Journal of Pragmatics, 32(6), 761–791. Scollon, R., & Scollon, S. (1997). Point of view and citation: Fourteen Chinese and English versions of the “same” news story. Text, 17(1), 83–125. Shie, J.-S. (2012). The alignment of generic discourse units in news stories, https://doi. org/10.1515/text-2012-0031. Text & Talk, 2(5), 661–679. 
Siregar, I., Rahmadiyah, F., & Siregar, A. F. Q. (2021). Linguistic intervention in making fiscal and monetary policy. International Journal of Arts and Humanities Studies, 1(1), 50–56.

Discourse Dependency Relations 235 Šnajder, J., Sladoljev-Agejev, T., & Vehovec, S. K. (2019). Analysing rhetorical structure as a key feature of summary coherence. In Proceedings of the fourteenth workshop on innovative use of NLP for building educational applications (pp. 46–51). August 2019, Florence, Italy. Spooren, W. (1997). The processing of underspecified coherence relations. Discourse Processes, 24(1), 149–168. Spooren, W., & Degand, L. (2010). Coding coherence relations: Reliability and validity. De Gruyter Mouton. Stede, M. (2004). The Potsdam commentary corpus. In Proceedings of the ACL 2004 Workshop on “Discourse Annotation” (pp. 96–102). Barcelona, Spain, July 25–26, 2004. Sternadori, M. (2008). Cognitive processing of news as a function of structure: a comparison between inverted pyramid and chronology. Doctoral dissertation, University of Missourif news as. Sun, K., & Xiong, W. (2019). A computational model for measuring discourse complexity. Discourse Studies, 21(6), 690–712. https://doi.org/10.1177/1461445619866985 Taboada, M., & Mann, W. (2006a). Rhetorical structure theory: Looking back and moving ahead. Discourse Studies, 8(3), 423–459. Telg, R., & Lundy, L. (2015/2021). News writing for print: WC191/AEC529, rev. 6/2021. EDIS, 2021(3). Thomson, E. A., Peter, R. R. W., & Kitley, P. (2008). “Objectivity” and “Hard News” Reporting Across Cultures. Journalism Studies, 9(2): 212–228. Torres-Moreno, J. M. (2014). Automatic text summarization. Wiley. van Dijk, T. A. (1980). Macrostructures: An interdisciplinary study of global structures in discourse, interaction, and cognition. Lawrence Erlbaum. van Dijk, T. A. (1982). Episodes as units of discourse analysis. In Tannen, D. (Ed.), Analyzing discourse: Text and talk (pp. 177–195). Georgetown University Press. van Dijk, T. A. (1983). Discourse analysis: Its development and application to the structure of news. Journal of Communication, 33(2), 20–43. van Dijk, T. A. (1986). News schemata. In R. C. Charles, & S. Greenbaum (Eds.), Studying writing: Linguistic approaches (pp. 155–185). Sage Publications. van Dijk, T. A. (1988a). News Analysis – Case Studies of International and National News in the Press. Lawrence Erlbaum. van Dijk, T. A. (1988b). News as discourse. Lawrence Erlbaum. van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. Academic Press. Williams, J. P., Taylor, M. B., & de Cani, J. S. (1984). Constructing macrostructure for expository text. Journal of Educational Psychology, 76(6), 1065. Williams, S., & Power, R. (2008). Deriving rhetorical complexity data from the RST-DT corpus. In Proceedings of 6th international conference on language resources and evaluation (LREC) (pp. 2720–2724). Marrakech, Morocco, 28–30 May, 2008. Williams, S., & Reiter, E. (2003). A corpus analysis of discourse relations for Natural Language Generation. In Proceedings of the corpus linguistics 2003 conference (pp. 899–908). Lancaster University, UK, 28–31 March, 2003. Wolf, F., & Gibson, E. (2005). Representing discourse coherence: A corpus-based study. Computational Linguistics, 31(2), 249–287. Yue, M., & Liu, H. (2011). Probability distribution of discourse relations based on a Chinese RST-annotated corpus. Journal of Quantitative Linguistics, 18(2), 107–121. Yue, M., & Feng, Z. (2005). Findings in a preliminary study on the rhetorical structure of Chinese TV news reports. Paper presented at the first computational systemic functional Grammar conference, Sydney, Australia, 15–16 July.

236  Discourse Dependency Relations Zhang, H., & Liu, H. (2016a). Rhetorical relations revisited across distinct levels of discourse unit granularity. Discourse Studies, 18(4), 454–472. Zhang, H., & Liu, H. (2016b). Visualizing structural “inverted pyramids” in English news discourse across levels. Text & Talk, 36(1), 89–110. Zhang, H., & Liu, H. (2016c). Quantitative aspects of RST rhetorical relations across individual levels. Glottometrics, 33, 8–24. Zhang, H., & Liu, H. (2017a). Motifs in reconstructed RST discourse trees. Journal of Quantitative Linguistics, 24(24u), 107–127. Zhang, H. (2023). Quantitative syntactic features of Chinese and English—A dependencybased comparative study. Zhejiang University Press. (In Chinese).

7

Concluding Remarks

This final chapter comes to some concluding remarks. We first briefly address the basic findings, innovations and limitations (Section 7.1); Section 7.2 then looks into the future by addressing some possible further endeavours.

7.1  A brief summary

This study examines dependency relations and related properties (valency and dependency distance [DD]) at both the syntactic and the discourse level. At the syntactic level, we choose as our research material the English part of the Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) (Hajič et al., 2012), which contains the entire Wall Street Journal (WSJ) section of the Penn Treebank from the Linguistic Data Consortium (1999) (LDC99T42). The study yields the following research findings at the syntactic level.

1 The complete dependency structure is represented as "dependent + governor = syntactic function/dependency relation", where both the dependent and the governor refer to their POS property. Such a structure can yield seven types of combinations of one, two or three elements. All these combinations are found to observe the same right-truncated modified Zipf-Alekseev (ZA) distribution model, and their motifs a negative hypergeometric model.
2 Valency is defined as the number of dependents a governor has, and the DD as the distance between a dependent and its governor. Both valency motifs and motif lengths abide by the ZA model. In addition, the interrelation between motif lengths and length frequencies can be captured by the hyper-Poisson model.
3 A concept of sequencing is proposed, defined as all the possible ordered strings of the elements/properties trimmed from a sentence. The distribution patterns of POS, dependency relation, DD and absolute DD (ADD) sequencings are examined and found to follow the ZA and Zipf-Mandelbrot (ZM) models.

When using these motifs, users do not use all of them to the same extent, either consciously or, in most cases, subconsciously. If the observed entities are regular language entities, they are supposed to abide by a linguistically meaningful distribution model. Most existent motif studies find that motifs follow some particular

patterns of distribution, like the ZM and ZA distribution models, suggesting the examined entities as results of diversification processes and as normal language entities.

At the discourse level, we choose as our research material the 359 WSJ articles with multiple paragraphs from the RST-DT (the Rhetorical Structure Theory Discourse Treebank). The original RST trees go with both elementary and non-elementary discourse units. Drawing on an analogy between syntactic dependency trees and RST discourse trees, we borrow the algorithms for converting syntactic constituent trees into dependency trees and convert each traditional RST tree into three new dependency trees in which all units are elementary ones (clauses, sentences and paragraphs, respectively). The new trees are similar to syntactic dependency trees; we can therefore borrow the concepts of valency and DD from syntax and examine their distribution features. The study yields the following research findings at the discourse level:

1 Examining the RST relations illuminates the discourse processes of organising lower-level units into their immediate upper-level units (clauses into sentences, sentences into paragraphs and then paragraphs into discourses). The study finds both homogeneity and interactions across levels, particularly at the two upper levels.
2 We examine the rank-frequency distributions of all RST relations in the newly converted trees at three levels of unit granularity (clauses, sentences and paragraphs) along three taxonomies of the relations per se. All these groups of relations observe the same ZA model, and their motifs the ZM model.
3 The study examines whether the inverted pyramid structuring ("summary + details") of news reporting obtains at the three levels. The structuring is visually and empirically presented. Both at the highest level (combining paragraphs into discourses) and at the intermediate level (combining sentences into paragraphs), there is an inverted pyramid structure with a top-down schematic instalment organisation. At the lowest level (combining clauses into sentences), the rhetorical structures only partly overlap with grammatical structures in terms of significance.
4 Following the practice in syntax, we define the valency of a node as the number of its dependents, or immediate lower-level units under it. Like syntactic valency, discourse valency in this study is found to observe the same ZA model.
5 We define the discourse DD in a way similar to that in syntax. The distance from the lower-level units to their immediate upper-level units is found to abide by the ZA distribution pattern.

The previous research findings validate the linguistic status of dependency relations, valency and DD at both the syntactic and the discourse level and indicate them as results of diversification processes. The research is innovative in terms of research methods:

1 To our best knowledge, this is the first empirical research to compare the dependency relations and related properties of the two language sub-systems, syntax and discourse.

The research is innovative in terms of research methods:

1 To the best of our knowledge, this is the first empirical research to compare the dependency relations and related properties of the two language sub-systems—syntax and discourse.
2 Examining the combinatorial distribution of units/properties (e.g., the four combinations of two to three elements of the complete dependency structure) might shed light on new research prospects.
3 Sequencings as a new type of unit might shed light on the sequential behaviours of language entities/properties, particularly when there are repeated elements in some patterns.
4 Three major ways of examining the distribution patterns of linguistic units/properties are employed, covering both ordered and unordered sets of elements.
5 The conversion of discourse trees into dependency trees provides unique analytical advantages. Drawing on an analogy between syntactic dependency trees and discourse dependency trees yields a lot of new concepts/units/properties and research potentials.
6 Presenting the structuring from the converted trees enables an empirical discussion on whether the top-down news-reporting structuring obtains at various levels.

It should also be pointed out that in our study the research materials contain only English journalistic articles from the WSJ. Whether the findings in this study can be readily applied to other languages or genres remains an open question. For cross-genre and cross-language universality to be established, more similar studies need to be conducted. Even within the two news corpora in the study, we can also differentiate types of newspaper genres (short articles, letters to the editor, etc.) (Webber, 2009) and examine the differences and similarities among these types. This can be a topic for further research. Where the RST analysis is concerned, whether the nuclearity of RST relations can readily be equated with informational importance in all cases also remains doubtful. It must further be admitted that the segmentation in the RST Treebank can sometimes be problematic, for example when complement clauses (such as "I think that…") are separated off, since these are actually units of syntax rather than of discourse. A more consistent segmentation might generate a slightly different result but won't alter the general trend, so for the present we content ourselves with the practice adopted in the study.

7.2  Looking into the future—synergetic linguistic studies

There are quite a number of dimensions along which further studies could proceed. We cite some as examples. For instance, for the various types of combinations of the syntactic dependency structure, if a comparative study is carried out between two or more languages, more similarities and differences among languages will be discovered in the syntactic domain. In addition, more studies can be done following the method of sequencing, which captures the linear features of language entities/properties in a way different from motifs. Furthermore, one key factor that makes discourse trees different from syntactic trees is that there are multiple nuclei in discourse trees. Future research efforts should include examining nuclearity with such multiple nuclei taken into account.
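As a starting point for such follow-up work, the basic syntactic measures recapped in Section 7.1 can be computed in only a few lines of code. The sketch below is purely illustrative and rests on stated assumptions rather than on the exact operationalisations of this book: a token is represented as a (form, position, governor-position) triple with 0 marking the root, DD is taken here as the governor's position minus the dependent's position (so that positive values indicate a governor following its dependent), and motifs are segmented as the longest non-decreasing runs of a numeric series, the convention usual in quantitative linguistics.

# A minimal, purely illustrative sketch (not the book's own tooling):
# valency, signed dependency distance (DD) and DD motifs for one
# dependency-annotated toy sentence.
from collections import Counter

# "The small dog chased a cat": (form, position, governor position); 0 = root.
sentence = [
    ("The",    1, 3),
    ("small",  2, 3),
    ("dog",    3, 4),
    ("chased", 4, 0),   # root of the sentence
    ("a",      5, 6),
    ("cat",    6, 4),
]

# Valency: how many dependents each governor position has.
valency = Counter(gov for _, _, gov in sentence if gov != 0)

# Signed DD per non-root token (assumed convention: governor minus dependent,
# so positive DD means the governor follows its dependent).
dd = [gov - pos for _, pos, gov in sentence if gov != 0]

def motifs(series):
    """Segment a numeric series into its longest non-decreasing runs."""
    runs, current = [], []
    for value in series:
        if current and value < current[-1]:
            runs.append(current)
            current = []
        current.append(value)
    if current:
        runs.append(current)
    return runs

print(valency)       # Counter({3: 2, 4: 2, 6: 1})
print(dd)            # [2, 1, 1, 1, -2]
print(motifs(dd))    # [[2], [1, 1, 1], [-2]]

Run over a whole treebank, the same loop yields the rank-frequency lists to which models such as ZA and ZM can then be fitted; a comparative study between languages would simply repeat this procedure over comparable treebanks.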

A further research perspective concerns language as a system. The similar patterns of distribution across the two language sub-systems discovered in this study are reflections of language being a human-driven complex adaptive system (Liu, 2018). Zipf (1935, 1949) studies word frequency in depth, emphasises the use of precise statistical methods to explore the structure and development of language and begins the study of the correlation between language properties. He has opened the door to the study of language as a system. Von Bertalanffy (1968) proposes systems theory, which treats the object of study as a system, analyses the structure and function of the system and investigates the interrelationships between the system, its internal elements and its external environment, as well as the regularity of its development. The development of systems science has also influenced the empirical study of language as a system in linguistics. Language chaos theory, language catastrophe theory (Wildgen, 1982, 1990, 2005) and the theory of language holography (Qian, 2002) all recognise language as a dynamic system. Holland (1995) proposes the theory of complex adaptive systems, the core idea of which is that individual adaptability leads to system complexity (Lin & Liu, 2018). This idea has also been introduced into the study of linguistics. Briscoe (1998) maintains that both language and people are complex dynamic adaptive systems. Similarly, Wang (2006, 2011) proposes that language is a complex adaptive system, that language, language development and evolution, and the brain are directly and closely related, and that language is the result of an emergent process. In recent years more scholars have conducted studies related to language as a complex adaptive system (e.g., Deacon, 2003; Ellis, 2011, 2016; Ellis & Larsen-Freeman, 2009; Holland, 2006; Kretzschmar, 2015; Larsen-Freeman, 2018; Massip-Bonet, 2013; Steels, 2000; Beckner et al., 2009).

To sum up, as Liu (2014, 2018) argues, language is a human-driven, multilayered, adaptive complex system. Language systems are characterised by the following features.

First, language is a human-driven system that cannot exist independently of humans. The development of language systems is also influenced by natural and social factors as well as the physiological, psychological and cognitive factors of language users, etc. These influences from both external and internal human factors bring about the universality as well as the diversity of language (Lin & Liu, 2018). Further studies can probe into these factors to check how they interact with language.

Second, language is a multi-level system (Chen & Liu, 2019; Hudson, 2010; Köhler, 2012; Liu & Cong, 2014; Miyagawa et al., 2013). Köhler (2012) views language as a large system containing hierarchically organised subsystems such as lexicon, syntax and semantics. Changes experienced by a subsystem not only affect all parts of that subsystem, that is, sub-subsystems at a lower level; they also affect other parts of the system with which it is functionally connected. Liu and Cong (2014) use a complex network approach to study the linguistic networks of Chinese at four levels (syntax, semantics, character co-occurrence and word co-occurrence) and find that the relationships among the four subsystems are closely related to human cognition.

Third, systems are dynamic, and dynamic systems operate around a common functional goal; in this dynamic process, they are subject to various perturbations resulting from the external environment as well as from their own internal elements (Xu, 2000). Language systems should also have this systemic commonality. The main goal of a language system is interpersonal communication, and in order to achieve this goal optimally, the components of the language system need to work together at the lexical, syntactic, as well as semantic and pragmatic levels. This synergy is governed by the Least Effort Principle, which was elaborated in Section 3.2. The above-mentioned interactions between multiple levels of language subsystems also reflect the idea that language systems are dynamic.

Fourth, language is a complex system. "Complex" mainly means that the system is emergent, that "the overall behaviour of the system is not equal to the sum of the component behaviours" (Lin & Liu, 2018, p. 75). For example, the sentence "I like him" has a meaning that the three individual words "I", "like" and "him" do not have.

Fifth, language is a self-regulating, self-organising adaptive system. Self-organisation refers to the formation of an orderly structure within the system automatically by internal synergy according to its own rules (Haken, 1978). Wildgen (1990) explores some basic principles of linguistic self-organisation.

Since language is a system, it is a natural conclusion that we should and can follow a systems theory approach to the study of language. Köhler sees language as a large system with lexical, syntactic and semantic subsystems, and he suggests that "like an organism, language is a self-organising, self-regulating system, a special kind of dynamic system with its own characteristics" (Köhler, 2012, p. 169). He introduces the approach of synergetics (Haken, 1978), a branch of systems science, into the study of language. Synergetic linguistics (Köhler & Altmann, 1986), an interdisciplinary discipline that combines systems science and language science, thus came into being. The synergetic linguistic approach aims to provide a unified framework with which linguistic theories can be constructed.

So a further study that is particularly worth undertaking is the synergetics (competition and cooperation) among the discourse properties. With more links between these properties discovered, we might be able to build several loops of interrelations among the discourse units/properties, constructing a discourse synergetic model based on the RST-DT or other discourse corpora.

We expect more relevant research to address these interesting issues in order to reveal some scientific laws of language systems, especially the syntactic and discourse subsystems, so as to increase knowledge and understanding of the language we use every day, even if this kind of new knowledge and understanding might constitute only a tiny bit of improvement in science. Where science is concerned, it is "a system that explains the laws of objects" (Liu, 2017, p. 24). Science has three core elements (Altmann, 1993, p. 3). The first of these elements is the "object".
In future investigations, in order to better understand the syntactic subsystem as the highest object of study, we can go deeper into the properties of the linguistic entities in the system, establish their linguistic status and, on this basis, explore the interplay and competition between the properties (i.e., synergetic relationships), clarify the structure of each relationship and, ideally, encompass all these objects of study in a comprehensive SYSTEM. This is the task of synergetic linguistics. In short, the next step is to study the "system", the supreme object of scientific research, in order to investigate the dynamic relations between the above-mentioned objects of linguistic research.

"Approach", the second element of science, is a quadruple (Altmann, 1993, p. 4). In the follow-up studies, we can continue to explore the relationships between discourse properties from a dependency perspective, with the aim of establishing a subsystem/model of discourse synergetics.

The third element of science is "theory", which includes concepts, laws and hypotheses. The present study has addressed some research questions and verified or disproved some research hypotheses, and the verified hypotheses then count as language laws. In the next step, if we can verify more linguistic hypotheses, make them linguistic laws, and then connect these linguistic laws into a network of laws, we can build a subsystem of discourse synergy based on dependencies. Such a body of findings can in turn be elevated to the status of a theory.

Judging from the above definition of science and the elements of science as defined by Altmann (1993), the founder of modern quantitative linguistics, the study we have carried out in this book is also a scientific one. This study has not yet reached the highest object of scientific research (which, in the discourse subsystem, can be a "discourse synergetic system"), but we are already on the way to exploring this highest object. We believe our investigation has laid some of the necessary foundations for a discourse synergetic system.

Both quantitative linguistics and synergetic linguistics are interdisciplinary studies that extend existing models and that may also involve integration with other fields. For instance, psycholinguistics and sociolinguistics can be employed to extend linguistic inquiries. The combination of big data and interdisciplinary studies is a weapon for quantitative linguistics (Liang & Liu, 2016; Liu, 2015). As Liu (2015, p. 3) states, "Interdisciplinary studies are a powerful tool for discovering language laws". Along this path, I believe we can discover more interesting linguistic laws.

References

Altmann, G. (1993). Science and linguistics. In R. Köhler & B. Rieger (Eds.), Contributions to quantitative linguistics (pp. 3–10). Kluwer.
Beckner, C., Blythe, R., Bybee, J., Christiansen, M. H., Croft, T., Schoenemann, T., & Five Graces Group. (2009). Language is a complex adaptive system: Position paper. Language Learning, 59, 1–26.

Briscoe, T. (1998). Language as a complex adaptive system: Coevolution of language and of the language acquisition device. In Proceedings of computational linguistics in the Netherlands meeting (Vol. 4) (pp. 3–40).
Chen, H., & Liu, H. (2019). A quantitative probe into the hierarchical structure of written Chinese. In X. Chen & R. Ferrer-i-Cancho (Eds.), Proceedings of the first workshop on quantitative syntax (Quasy, SyntaxFest 2019) (pp. 25–32). Paris, France, August 26–30, 2019.
Deacon, T. W. (2003). Multilevel selection in a complex adaptive system: The problem of language origins. In B. H. Weber & D. J. Depew (Eds.), Evolution and learning: The Baldwin effect reconsidered (pp. 81–106). The MIT Press.
Ellis, N. C. (2011). The emergence of language as a complex adaptive system. In J. Simpson (Ed.), The Routledge handbook of applied linguistics (pp. 674–687). Routledge.
Ellis, N. C. (2016). Salience, cognition, language complexity, and complex adaptive systems. Studies in Second Language Acquisition, 38(2), 341–351.
Ellis, N. C., & Larsen-Freeman, D. (2009). Language as a complex adaptive system (Vol. 11). John Wiley and Sons.
Hajič, J., Hajičová, E., Panevová, J., Sgall, P., Cinková, S., Fučíková, E., Mikulová, M., Pajas, P., Popelka, J., Semecký, J., Šindlerová, J., Stĕpánek, J., Toman, J., Urešová, Z., & Žabokrtský, Z. (2012). Prague Czech-English dependency treebank 2.0 LDC2012T08. Linguistic Data Consortium.
Haken, H. (1978). Synergetics. Springer.
Holland, J. H. (1995). Hidden order: How adaptation builds complexity. Basic Books.
Holland, J. H. (2006). Studying complex adaptive systems. Journal of Systems Science and Complexity, 19, 1–8.
Hudson, R. (2010). An introduction to word grammar. Cambridge University Press.
Köhler, R. (2012). Quantitative syntax analysis. De Gruyter Mouton.
Köhler, R., & Altmann, G. (1986). Synergetische Aspekte der Linguistik. Zeitschrift für Sprachwissenschaft, 5, 253–265.
Kretzschmar, W. A. (2015). Language and complex systems. Cambridge University Press.
Larsen-Freeman, D. (2018). Second language acquisition, WE, and language as a complex adaptive system (CAS). World Englishes, 37(1), 80–92.
Liang, J., & Liu, H. (2016). Interdisciplinary studies of linguistics: Language universals, human cognition and big-data analysis. Journal of Zhejiang University (Humanities and Social Sciences Edition), 46(1), 108–118. (In Chinese).
Linguistic Data Consortium. (1999). Penn Treebank 3. LDC99T42.
Lin, Y., & Liu, H. (2018). Methods and trends in language research in the big-data era. Journal of Xinjiang Normal University (Philosophy and Social Sciences), 1(39), 72–83. (In Chinese).
Liu, H. (2014). Language is more a human-driven system than a semiotic system. Comment on modelling language evolution: Examples and predictions. Physics of Life Reviews, 11, 309–310.
Liu, H. (2015). Interdisciplinary studies are weapons to discover language laws. Journal of Zhejiang University, 13, 2015. (In Chinese).
Liu, H. (Ed.). (2017). An introduction to quantitative linguistics. The Commercial Press. (In Chinese).
Liu, H. (2018). Language as a human-driven complex adaptive system. Comment on "Rethinking foundations of language from a multidisciplinary perspective". Physics of Life Reviews, 26(27), 149–151.
Liu, H., & Cong, J. (2014). Empirical characterization of modern Chinese as a multi-level system from the complex network approach. Journal of Chinese Linguistics, 1, 1–38.

Massip-Bonet, À. (2013). Language as a complex adaptive system: Towards an integrative linguistics. Springer.
Miyagawa, S., Berwick, R. C., & Okanoya, K. (2013). The emergence of hierarchical structure in human language. Frontiers in Psychology, 4. http://dx.doi.org/10.3389/fpsyg.2013.00071
Qian, G. (2002). Language holography. Commercial Press. (In Chinese).
Steels, L. (2000). Language as a complex adaptive system. In Parallel problem solving from nature PPSN VI: 6th international conference, Paris, France, September 18–20, 2000, proceedings (pp. 17–26). Springer.
von Bertalanffy, L. (1968). General system theory. George Braziller.
Wang, S. (2006). Language is a complex adaptive system. Journal of Tsinghua University (Philosophy and Social Sciences), 21(06), 5–13. (In Chinese).
Wang, S. (2011). Language, evolution and the brain. The Commercial Press. (In Chinese).
Webber, B. (2009). Genre distinctions for discourse in the Penn TreeBank. In Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing (pp. 674–682). Suntec, Singapore, August 2009.
Wildgen, W. (1982). Catastrophe theoretic semantics. An application and elaboration of René Thom's theory. Benjamins.
Wildgen, W. (1990). Basic principles of self-organization in language. In H. Haken & M. Stadler (Eds.), Synergetics of cognition. Proceedings of the international symposium at Schloss Elmau (pp. 415–426). Springer.
Wildgen, W. (2005). Catastrophe theoretical models in semantics (Katastrophentheoretische Modelle in der Semantik). In R. Köhler, G. Altmann, & G. Piotrowski (Eds.), Quantitative Linguistik, Ein Internationales Handbuch (Quantitative linguistics, an international handbook) (pp. 410–422). Walter de Gruyter.
Xu, G. (2000). Systems science. Shanghai Scientific & Technological Education Publishing House. (In Chinese).
Zipf, G. K. (1935). The psycho-biology of language. An introduction to dynamic philology. Houghton Mifflin.
Zipf, G. K. (1949). Human behavior and the principle of least effort. Addison-Wesley.

Acknowledgements

This work is supported by the National Social Science Fund of China under Grant No. 19BYY109. Here I would also like to express my sincere gratitude to those who have helped render this academic work possible.

First and foremost, I would like to thank Prof. Haitao LIU for being such an exceptional role model and mentor. Thank you for proposing the research objective that formed the foundation of this study. I have benefited enormously from your guidance, support and understanding throughout the research process, and I could not have done it without your push from time to time. I am truly grateful to you for encouraging me to try to become a linguist. You have always been a great source of encouragement and inspiration to me throughout my linguistic journey, which has unfolded as beautiful and promising.

I would also like to extend my heartfelt appreciation to Prof. Reinhard Köhler. Your advice and guidance have been invaluable in shaping my understanding of the subject. I regret that we have not been able to connect for years, but I will always cherish the lessons and insights that you have shared with me. Your linguistic expertise and unwavering commitment to excellence have been a constant source of motivation for me. I have set your words as my desktop image, which has been a constant reminder for me to pursue excellence.

My sincere thanks go to the anonymous reviewers for offering insightful suggestions that helped me refine and improve the book. Thank you for putting so much time and effort into reviewing my work and offering extremely valuable feedback, on the basis of which I have made a lot of improvements.

I would also like to extend my appreciation to Haiqi WU for providing me with valuable data that were crucial to the completion of this book. You have been so helpful for many of my academic publications. I would like to express my deepest gratitude to Ziqiang FANG for the invaluable contribution of proofreading my book. Your meticulous attention to detail and insightful feedback have significantly improved the quality of the final product.

Thanks also go to Mr. Tianheng QI and Mrs. Yehua LIU of China Renmin University Press. Thank you for offering constant support and for urging me to finish the manuscript in time. Our cooperation has been smooth and pleasant. Your efficiency and support are a great indication of the excellent culture of the Press.

I want to offer my warmest thanks to Mrs. Riya Bhattacharya, Associate Project Manager of KnowledgeWorks Global Ltd. Our collaboration has always been smooth and delightful. You have demonstrated remarkable efficiency, patience and a strong sense of time management, and provided clear instructions, which allowed us to make swift progress from manuscript submission to formal publication. Thank you for showing your personal care and understanding towards me. Through working with you, I have also experienced the excellence of KnowledgeWorks Global's culture.

Finally, I want to express my gratitude to my love for the consistent encouragement and support throughout this journey. Your kind gesture of giving me a red envelope (with money, of course) on each day when I did research gave me the motivation to push through any obstacles I encountered. Thank you all for making me feel that the path of scientific research can be filled with joy, laughter, growth and discovery. Your warmth has made me feel that this world is truly beautiful.

Appendices

Appendix 1  POS of dependents (R. = rank, Freq. = frequency): for each rank, the percentage range and the frequency and POS tag of dependents in each of the six sub-corpora E1–E6 (approximately 41,200 dependents per sub-corpus).

Appendix 2  POS of governors (the first algorithm) (R. = rank): for each rank, the percentage range and the frequency and POS tag of governors in each sub-corpus E1–E6.

Appendix 3  Top 100 POS sequencing (all the corpora): rank, sequencing and percentage.

Appendix 4  Top 100 relation sequencing in all the sub-corpora: rank, sequencing and percentage.

Appendix 5  Rank–frequency data of DD (Top 20, 97.2%–97.7%) (R. = rank, Freq. = frequency): for each rank, the frequency, DD value and percentage in each sub-corpus E1–E6.

Appendix 6  Top 20 DD sequencings in each sub-corpus (R. = rank, Freq. = frequency, DD = dependency distance): rank, frequency, DD sequencing and percentage per sub-corpus E1–E6.

Appendix 7  Top 100 DD sequencings in all the corpora: rank, sequencing, frequency and percentage.

Appendix 8  Top 20 ADD sequencings in each sub-corpus (R. = rank, Freq. = frequency): rank, frequency, ADD sequencing and percentage per sub-corpus E1–E6.

Appendix 9  Top 100 ADD sequencings in all the sub-corpora (Sequ. = sequencing, R. = rank, Freq. = frequency): rank, sequencing, frequency and percentage.

Appendix 10  Discourse dependency distance (paragraphs as nodes, complete data) (R. = rank, DD = dependency distance, F = frequency, P = paragraph, Ave. = average): for each rank, the DD value, frequency and percentage in the six paragraph-level sub-corpora P-RST-1 to P-RST-6, with mean DDs between −1.9 and −2.5.

Author Index

Altmann, G. 3, 14, 19, 23, 24, 29, 33, 36, 39, 40–44, 57, 58, 71, 72, 77, 79, 80, 118, 158, 161–164, 166, 173, 181, 188, 189, 231, 241–244 Beliankou, A. 16, 24, 130, 158, 166, 181, 182, 191, 192, 231 Best, K-H. 58, 71, 80, 97, 158, 164 Carlson, L. 7, 10, 18, 24, 51, 52, 56, 72, 75, 166, 167, 169, 171, 172, 180, 205, 231 Chen, H. 10, 20, 24, 72, 240, 243 Chen, R. 15, 16, 24, 48, 72, 81, 142, 158 Chen, X. 15, 16, 25, 30, 38, 44, 48, 72, 75, 81, 143, 163 Chomsky, N. 3, 4, 10, 141, 162 Feng, Z. 7, 14, 16, 30, 38, 43, 47, 60, 73, 75, 92, 130, 162, 166, 173, 235 Ferrer-i-Cancho, R. 10, 15, 26, 38, 43, 44, 58, 73, 141, 142, 145, 146, 159, 243 Gibson, E. 18, 26, 34, 140, 141, 145, 159–161, 181, 235 Herdan, G. 20, 27, 66, 73 Hirao, T. 9, 11, 17–19, 27, 29, 35 Hudson, R. 5, 6, 9, 11, 16, 27, 47, 48, 74, 141, 160, 240, 243 Jiang, J. 15, 16, 28, 31, 48, 74, 76, 141–143, 146, 149, 160, 163 Jing, Y. 15, 17, 28, 30, 47–49, 69, 74, 75, 143, 161

Köhler, R. 16, 20, 23, 24, 29, 36, 40–44, 58, 66, 68, 69, 71, 74, 79, 80, 82, 92, 97, 129, 130, 134, 139, 158, 161, 163, 164, 181, 182, 189, 208, 231, 233, 240–245 Liang, J. 12, 14–16, 24, 29, 30, 45, 47, 48, 69, 72, 74, 75, 79, 80, 85, 99, 114, 128, 143, 146, 158, 161, 162, 164, 242, 243 Liu, H. 2–7, 9–20, 24–26, 28–31, 34, 35, 38, 39, 44, 45, 47–49, 52, 53, 58, 60, 62, 66, 68, 69, 72–76, 78–82, 85, 91–93, 95, 97, 99, 114, 128–131, 133–149, 158–164, 166, 168, 171, 173, 176–181, 185–189, 191, 192, 199–201, 227, 233, 235, 236, 240–243 Lu, Q. 16, 30, 35, 43, 79, 130, 142, 145, 146, 162, 164, 194, 235 Mann, W. C. 3, 7, 9, 10, 12, 13, 17, 18, 25, 31, 33, 56, 75, 77, 79, 165–169, 177, 179–181, 184, 228, 229, 233, 235 Marcu, D. 7, 10, 12, 17, 18, 24, 31, 33, 47, 54, 56, 72, 75, 77, 166, 167, 169, 171, 172, 180, 205, 228, 229, 231, 233 Mel’čuk, I. 3, 5, 12 Noordman, L. G. 25, 54, 72, 76, 181, 232, 234 Ouyang, J. 15, 16, 28, 31, 48, 74, 76, 142, 143, 160, 163

Pawłowski, A. 20, 31, 66, 76, 143, 163, 182, 234 Sanders, T. 3, 6, 7, 13, 18, 32, 165, 180, 181, 233, 234 Stede, M. 7, 10, 13, 51, 52, 77, 182, 235 Sun, K. 9, 13, 18, 19, 20, 33, 64, 73, 77, 166, 217, 219, 221, 235, 244 Taboada, M. 7, 11, 13, 14, 17–19, 25, 27, 33, 34, 51, 56, 77, 179, 180, 181, 228, 229, 232, 235 Tesnière, L. 3, 5, 14, 47, 48, 77, 129, 130, 163

van Dijk, T.A. 21, 22, 29, 33, 34, 56, 74, 77, 78, 181, 193, 194, 202, 203, 229, 233, 235 Williams, S. 21, 34, 52, 56, 78, 166, 193, 229, 235 Ziegler, A. 3, 14, 58, 79, 80, 97, 164 Zipf, G.K. 17, 19, 20, 33, 35–39, 42–45, 47, 49, 58, 68, 77, 79, 80, 82, 91, 95, 99, 101, 116, 163, 164, 166, 179, 181, 206, 230, 237, 240, 244

Subject Index

abstract dependency distance (ADD) 63, 65, 143, 144, 146, 148–150, 152–156, 237, 267–270 active valency 60, 107, 110 activity indicator 82, 83 adjacent dependency 144 adjunct 24, 60, 130, 131, 158 algorithm 9, 16, 18, 19, 31, 52, 85–88, 90, 91, 141, 143, 166, 180, 238, 251 Altmann-Fitter 188 Anaphora 19, 34, 216 Antecedent 19, 216 Annotation 13, 18, 24, 25, 27, 28, 49, 52, 72, 77, 142, 180, 235 Assumption 40, 41, 43, 68, 142 bag-of-words 20, 22, 66, 81, 116, 146 Beőthy’s law 42 catastrophe theory 240 Centering Theory 17 centrifugal force 107, 110 centripetal force 107, 110 chaos theory 240 Chat Generative Pre-trained Transformer (ChatGPT) 56 classification 17, 21, 25, 30, 32, 72–74, 78, 81, 158, 161, 167, 170, 171, 176, 182, 194, 231, 233 coefficients 90, 95, 108, 113, 118, 121, 123, 125, 148, 176, 179, 184 cognition 11, 15, 25, 29, 33–35, 47, 71, 72, 74, 76–78, 142, 159, 161, 163, 232, 234, 235, 241, 243, 244 coherence 7, 13, 21, 24–26, 29–31, 33, 34, 77, 181, 231–235 coherence relation 7, 25, 30, 33, 181, 232–235

combinatorial (dependency/structure/ distribution/unit/property) 77, 80, 103, 115, 116, 157, 239 Complement 27, 41, 60, 129, 130, 158, 160, 239 complementation 27, 129, 130, 160 complex adaptive system 44, 233, 240, 242–244 complexity 13, 15, 16, 18, 24, 31, 33, 34, 40, 72, 76–78, 141, 142, 145, 158–161, 163, 166, 217, 229, 233, 235, 240, 243 Connexion (connection) 3, 5, 6, 47, 48 Consequence 39, 41, 156, 167, 171, 172, 182 constituency grammar 3, 4, 6 constituent 2, 4, 6, 9, 11, 19, 20, 34, 52, 53, 140, 182, 183, 196, 229, 238 conversion 9, 11, 22, 50, 52–54, 168, 184, 191, 193, 228–230, 239 corpus 6, 9, 10, 13, 14, 16, 19, 20, 24, 25, 27, 30, 35, 48–52, 69, 72, 73, 77, 78, 81, 83, 85, 95, 98, 101, 102, 118, 121, 130, 136, 142, 147, 149, 151, 154, 160, 161, 164, 166, 173, 180, 183, 194, 203, 220, 226, 231, 235, 261, 267 Corpus of Contemporary American English (COCA) 81 Corpus of Historical American English (COHA) 81 Cross-document Structure Theory (CST) 10, 17, 19, 24, 182 Curve 12, 37, 38, 39, 47, 85, 87, 90, 92, 93, 104, 105, 108, 109, 111, 114, 118, 121, 123, 125, 134, 147, 148, 192, 225, 226

Subject Index  277 Degree 46, 47 Dependency 2–7, 9–36, 39, 44, 46–50, 52–54, 58, 59, 60, 61, 63, 64, 65, 66, 68, 70–76, 78–239, 242, 243, 261, 272 dependency direction 16, 30, 34, 59, 63, 65, 141–145, 152, 153, 161, 163 dependency distance (DD) 15, 16, 18, 19, 22–26, 28–31, 34–36, 39, 59, 63–66, 69, 71–75, 79, 129, 130, 132, 140–166, 215–230, 237, 238, 258–264, 267–270, 272, 273 dependency distance minimisation (DDM) 15, 142, 145, 146, 149, 156 dependency grammar 2–6, 10–13, 15, 22, 31, 58, 60, 75, 76, 82, 85, 92, 107, 129, 141, 145, 146, 162, 206 dependency length 25, 26, 33, 141, 159, 160, 163 Dependency Locality Theory (DLT) 140, 145 dependency length 25, 26, 33, 141, 159, 160, 163 dependency relation 92, 94, 100, 103, 113, 116, 128, 141, 150, 157, 237 dependency structure 5, 19, 22–24, 28, 30, 35, 39, 44, 47, 53, 59, 61, 78, 80, 95, 97, 103, 113, 116, 125, 126, 128, 129, 140, 149, 156–158, 162, 237, 239 dependency tree 5, 6, 9, 15, 17, 19–23, 28, 30, 35, 46, 47–50, 52–54, 59, 60, 64, 70, 71, 73, 74, 75, 79, 81, 85, 92, 96, 110, 131, 141–143, 146, 158–162, 193, 206, 217, 230, 237–239, 243 dependency treebank 23, 28, 30, 46, 48, 49, 50, 70, 71, 73–75, 92, 142, 146, 158–162, 237, 243 dependent 5, 6, 11, 16, 23, 41, 50, 51, 56, 58–63, 65, 67, 81–85, 87, 90–93, 96, 97, 103–110, 113, 115–118, 120, 128–132, 134, 135, 140–142, 145, 149–151, 156, 157, 168, 189, 205, 206, 208, 209, 220, 227, 237, 238, 240, 248 Depth Hypothesis 140, 145 descriptivity indicator 82, 83 diachronic study 29, 30, 75, 130, 160, 161 discourse 2, 3, 6–15, 17–35, 46, 48, 49–52, 54, 56, 57, 63–66, 70–79, 140, 157, 164–244, 272 discourse complexity 13, 18, 33, 77, 217, 235

discourse dependency 9, 12, 15, 17, 19, 20–23, 25, 29, 35, 46, 50, 54, 157, 165–236, 239, 272 Discourse Graphbank 18, 34 discourse parsing 12, 13, 17, 18, 24, 26, 27, 29, 31–33, 73, 75, 77, 233, 234 discourse process 20, 25, 34, 71, 73, 77, 78, 165–167, 169, 171, 172, 183, 187, 188, 190, 193, 205, 229, 230, 232–234, 238 discourse relation 2, 6, 7, 14, 19, 27, 28, 30–33, 35, 52, 63, 71, 75, 164, 182, 185–187, 228, 230, 232–235 Discourse Representation Theory 17, 28 synergetic system 242 discourse valency 20, 22, 64–66, 71, 165, 206, 208–219, 226, 227, 230, 231, 238 distribution 14–16, 18–20, 22, 29, 30, 31, 34–44, 47, 48, 58, 59, 68, 69, 74, 75, 80–83, 85, 87, 91–93, 95, 97, 100, 103, 104, 108, 113, 115, 116, 128–131, 133, 134, 138–140, 142–144, 146–148, 150, 156, 157, 160–168, 173, 179–191, 193, 203, 206, 208, 211, 216, 218, 227, 228, 235, 237–240 diversification process 19, 23, 36–45, 58, 91, 95, 104, 111, 113, 125, 139, 148, 149, 156, 157, 163, 166, 173, 176, 179–182, 189, 190, 215, 227, 228, 230, 231, 238 dynamics 73, 118 Early Immediate Constituents (EIC) 9, 12, 140, 145 Edge 5, 7, 11, 13, 16, 19–22, 25–27, 34, 46, 48, 59, 74, 76, 78, 128, 157, 194, 232, 233, 238, 241, 243, 245, 246 elementary discourse unit (EDU) 6, 8–10, 12, 13, 17, 22, 25, 32, 34, 39, 41, 49, 51, 54, 56, 71, 75, 77, 78, 129, 145, 146, 150, 163, 165, 166, 168, 172, 182–184, 186, 188–191, 193, 205, 209, 217, 230, 232–235, 238, 244 entropy 74, 80–83, 104, 108, 111, 113, 117, 120, 121, 123, 125, 208, 209, 211, 215, 220, 221 fitting 80, 90, 91, 95, 96, 99–101, 104, 106, 108, 110–114, 118, 120, 121, 123, 125, 126, 128, 134, 136, 138, 139,

278  Subject Index 147, 148, 152, 153, 155–157, 176, 178–182, 186–192, 208, 213–215, 219, 227, 228 function 2, 4, 6, 7, 12, 21, 22, 24, 31, 34, 37, 38, 41–43, 47–50, 52, 56, 58, 59, 67–71, 73–76, 80, 81, 92–97, 100–103, 107–111, 113, 116, 120, 121, 128–130, 139, 140, 145, 156–159, 161, 162, 164, 169, 190, 194, 208, 233–235, 237, 240–242 garden path sentence 145 generalised valency 16, 62, 92, 107, 110, 130, 131, 139 Generalised Valency Pattern (GVP) 16, 60, 62, 92, 107, 130 Geneva (Pragmatics) School 17, 32 Goebl’s law 42 Governor 3, 5, 6, 47, 48, 58–63, 65, 81–83, 85–88, 90, 91, 92, 103–106, 110, 111–113, 115–118, 121, 128–130, 132, 140–142, 144, 145, 147, 150–152, 156, 157, 206, 220, 227, 237, 251 governor/head-final (dependency) 16, 63, 65, 144, 149, 150, 156, 220, 227 governor/head-initial (dependency) 16, 63, 65, 144, 149,150, 156, 220, 227 granularity 9, 11, 20, 22, 29, 54, 79, 85, 87, 88, 90, 165, 166, 172, 173, 180, 182, 183, 187, 188, 190, 191, 193, 235, 238 graph theory 46, 71–73, 77 head 1, 5, 13, 16, 21, 24, 25, 33, 47, 49, 52–54, 65, 73, 77, 149, 150, 156, 158, 167, 168, 193, 220, 227, 235 Hierarchical Selection Model 38, 47, 92 hierarchical structure 2, 4, 10, 19, 30, 75, 141, 165, 243 hierarchy 2, 3, 15, 30, 32, 38, 45, 47, 48, 68, 75, 164, 193, 198, 228, 229, 234 homogeneity 80, 82, 85, 87, 90–93, 95, 98, 104, 108, 111, 117, 125, 139, 143, 146, 147, 149, 154, 156, 208, 220, 225, 228, 230, 238 hyper-Binomial (function/model/ distribution) 182 hypergeometric (function/model/ distribution) 80, 91, 118, 120, 121, 123, 125, 128, 157, 182, 237 hyper-Poisson(function/model/distribution) 68, 139, 140, 157, 237

hypothesis 2, 19, 21, 23, 36, 41, 42, 56, 80, 82, 83, 93, 97, 100, 101, 104, 116, 125, 132, 140–146, 148, 152, 157, 164, 173, 180, 181, 183, 184, 186, 189, 190, 193, 203, 206, 208, 209, 213, 218, 227, 242
immediate constituent 2, 4, 140
in-degree 46, 47
interdisciplinary study 33, 34, 77, 78, 235
interlanguage 130, 143, 148, 161
inverted pyramid (structure) 21, 22, 31, 165, 193–195, 198–205, 220, 227, 229, 230, 232, 234, 235, 238
Isomorphism/isomorphy 2, 11
language holography 240, 244
language in the line 20, 31, 76, 116, 234
language in the mass 20, 31, 66, 76, 130, 234
language modelling 17
law 2, 10, 18, 19, 22, 24, 33, 36, 37, 38, 39, 41, 42, 44, 45, 47, 58, 59, 68, 72, 75, 76, 77, 79, 140, 158, 163, 164, 173, 181, 183, 186, 218, 235, 241–243
least effort principle 23, 36–45, 241
left-branching 144
lexicalised discourse framework 18
linear language material 20
linear linguistic behavior 16, 20, 69
linear linguistic units 65, 66, 71
Linguistic Data Consortium 10, 24, 34, 49, 72, 73, 75, 231, 237, 243
linguistic data stratification 80
Liu-Directionalities 142
machine translation 13, 17, 31
macrostructure 21, 31, 33, 34, 56, 76, 77, 78, 193, 229, 234, 235
Martin's law 42
mean dependency distance (MDD) 35, 79, 144, 164
Menzerath's law 2
Model 1, 2, 4, 11, 13, 17–19, 21–23, 25–29, 31–33, 36, 38, 40–44, 47, 56, 66, 68, 74, 76, 77, 80–82, 90–93, 95, 96, 99–101, 103, 104, 106, 108, 110–114, 116, 118, 120, 121, 123, 125, 128, 130, 131, 134, 136, 138, 139, 140, 142–144, 146, 148, 152, 153, 155–158, 160, 164–166, 173, 176, 178, 180–182, 184, 186, 188–192, 201, 206, 208, 213–215, 218, 219, 227–231, 234, 235, 237, 238, 241–245
modified right-truncated Zipf-Alekseev (function/model/distribution) (ZA) 19, 68, 80–82, 90–93, 95, 96, 99–101, 103, 104, 106, 108, 110–114, 116, 118, 120, 128, 131, 134, 136, 138, 139, 143, 144, 146, 148, 152, 153, 155–157, 166, 173, 176, 178, 180–182, 206, 208, 214, 215, 218, 227, 228, 230, 237, 238
motif 14, 16, 22–24, 29, 46, 65–72, 74, 75, 79–81, 93, 97, 102, 103, 116–140, 143, 157, 158, 161, 162, 164, 165, 182–193, 206, 208, 211, 213, 215–219, 230, 231, 233, 236–239
motif length 131, 133, 134, 136–140, 157, 183, 184, 186–193, 237
multifunctionality 92, 107, 108
natural language processing (NLP) 3, 6, 10, 11, 13, 17, 21, 24–29, 31, 32, 35, 73, 76, 77, 79, 159, 194, 231, 233, 235, 244
negative hypergeometric (function/model/distribution) 80, 91, 118, 120, 121, 123, 125, 128, 157, 182, 237
neural network 18, 27–30, 32, 77, 78, 232–234
node 4, 6–9, 15, 16, 18, 19, 46, 47, 50, 52–59, 62–65, 71, 83, 85, 96, 131, 132, 141, 143–145, 149, 151, 166, 169–171, 178, 179, 183, 185–189, 191–193, 195–213, 215–227, 229, 230, 238, 272
non-linear language material 20, 81
nucleus 7, 8, 9, 53, 57, 166, 167, 169, 171, 186, 192, 200, 203–205, 208, 219–221, 224, 239
objectivity 21, 33, 194
obligatory argument 130
opposite branching 145
optional argument 130
ordered nesting 145
out-degree 46, 47
parameter 2, 34, 37, 80, 82, 91, 93, 95, 100, 104, 108, 111, 113, 118, 121, 125, 134, 143, 148, 149, 152, 155, 178, 186, 188–192, 209, 227–229
parts of speech (POS) 15, 48, 50, 58, 59, 67, 68–70, 72, 80–93, 95, 97, 98–101, 103, 104, 107, 108, 110, 116, 117, 128, 132, 138, 144, 150, 156–158, 162–164, 237, 248–254
passive valency 60, 107, 110
Penn Discourse Tree Bank 18
phrase structure 4, 12, 25, 46, 145
Polya (distribution/function/model) 182
Polyfunctionality 41, 92
Potsdam Commentary Corpus 13, 182
power law 18, 37, 38, 45, 164, 218
Prague Czech-English Dependency Treebank (PCEDT) 49, 71, 73, 237, 243
Probabilistic Valency Pattern Theory (PVP) 16, 26, 30, 34, 75, 92, 107, 130, 131, 162, 164
Probability 14, 17, 29, 31, 35, 36, 38, 41, 42, 48, 49, 68, 75, 80, 83, 92, 97–101, 103, 108, 110, 130, 142, 151, 154, 156, 160–164, 189, 190, 235
probability distribution 14, 29, 31, 35, 42, 48, 75, 92, 108, 160–164, 235
quantitative linguistics 14, 20, 23–26, 29–31, 33–35, 37, 40, 43, 66, 71, 72, 75, 76, 79, 158–164, 231–236, 242–244
rank-frequency (distribution/curve/data) 22, 23, 37, 39–42, 68, 69, 80, 82, 83, 85, 87, 90, 91, 93–97, 99–101, 103–105, 106, 108–114, 118, 121, 123, 125, 129, 133, 134, 136, 138, 143, 147, 148, 150, 152, 153, 155–157, 164, 165, 173, 174, 179, 182–189, 192, 206, 208, 211, 213–215, 218, 219, 225, 226, 228, 230, 238, 258
relation 1–7, 9, 11, 14–16, 18–20, 22–27, 29–33, 35, 46–50, 52, 54, 56, 58, 63, 66–68, 71, 72, 74–77, 79–242
relative frequency 17, 35, 36, 45, 49, 79, 164
Relevance Theory 17
repetition rate (RR) 82, 83, 208, 209, 211, 215, 220, 221
reverse branching 145
rhetorical distance 19, 25, 216, 231
rhetorical relation 7, 11, 19, 25, 52, 54, 79, 166–168, 171–173, 177, 179–184, 187–190, 193, 232, 235, 236
Rhetorical Structure Theory (RST) 2, 6–14, 17–19, 22, 24, 25, 27, 29, 31, 35, 50–54, 56–58, 65, 72, 75, 78, 79, 164–168, 170, 172, 173, 176, 178–184, 186, 191, 193, 195, 196, 205, 206, 208, 209–219, 221–224, 227–231, 235, 236, 238, 239, 241, 272, 273
Rhetorical Structure Theory Discourse Treebank (RST-DT) 7, 9, 18, 19, 25, 51, 52, 65, 72, 75, 78, 166–168, 179, 180, 182, 195, 217, 230, 235, 238, 241
Rhythm 31, 65, 66, 68, 76, 132–137, 140, 143, 155, 234
right truncated modified Zipf-Alekseev (function/model/distribution) (ZA) 19, 42, 43, 68, 80–82, 90–93, 95, 96, 99, 103, 104, 106, 108, 110–116, 118, 120, 128, 131, 134, 136, 138, 139, 143, 144, 146, 148, 152, 153, 155–157, 166, 173, 176, 178–182, 206, 214, 215, 218, 222, 227, 228, 230, 237, 238
right-branching 144
right-truncated Zeta (distribution/function/model) 142, 143, 146, 147, 156
R-motif 182, 183
Robustness 17, 49
Root 1, 5, 6, 46, 47, 53, 54, 61, 65, 67, 70, 83, 85, 93, 94, 96, 97, 102, 103, 109, 110, 141, 143–145, 152, 162, 167, 183, 184, 185, 186, 190, 192, 197, 198, 200, 208, 219–221, 256, 257
same-branching 144
Satellite 7, 8, 9, 57, 166, 167, 171, 172, 192, 204, 205, 206, 208, 213, 219–221, 224
Segmented Discourse Representation Theory 17
self-adaptation 37
self-organisation 37, 241
self-regulation 40
sentence complexity 16, 141
sentiment analysis 24, 29, 31
sequencing 22, 23, 46, 66, 69–71, 80, 95, 97–103, 129, 140, 143–146, 150–157, 226, 237, 239, 254–257, 261, 264–267, 270
sequential behavior 20, 66, 69, 95, 130, 143, 144, 182, 239
span 7–11, 13, 21, 52–54, 56, 63, 65, 77, 142, 166, 169, 171, 173, 182, 190, 191, 200, 228, 229

Stratification 14, 80
Substantivity 81
Subsystem 240–242
summarization 17, 21, 56, 194, 229
superstructure 21, 193, 194, 202, 203
synergetic linguistics 29, 161, 241, 242
synergetic model 44, 92, 229, 241
synergetic system 242
synergetics 241–244
synergy 241, 242
synfunctionality 92
syntactic analysis 3, 16, 48, 141–143, 146
syntactic complexity 16, 24, 31, 72, 76, 142, 145, 158, 159, 161, 163
syntactic dependency 3, 6, 9, 11, 15, 16, 20, 23, 39, 44, 46, 47, 49, 50, 64, 70, 71, 78, 80–165, 173, 218, 230, 238, 239
syntactic function 48, 52, 58, 59, 67, 69, 70, 73, 80, 81, 92–96, 100–103, 107–111, 113, 116, 120, 121, 128, 129, 157, 159, 162, 164, 237
syntactic valency 206, 209, 211, 214
syntagmatic behaviour (sequential behavior) 20, 66, 69, 95, 130, 143, 144, 182, 239
syntagmatic time 143
syntax 3, 4, 6, 9–14, 35, 44, 47, 58, 60, 74, 77, 158–161, 163, 213, 216, 233, 238–240, 243
system 1, 2, 14, 17, 18, 29, 31, 36, 40, 44, 48, 72, 75, 76, 80, 81, 92, 116, 141, 142, 160, 162–164, 202, 220, 229, 233, 235, 238, 240–244
system complexity 240
Systemic Functional Linguistics 17
systems theory 240, 241
taxonomy 56, 167, 168, 170, 173, 174–178, 180–192, 234
terminal node 9, 19, 52, 54, 71, 131, 166, 191, 193, 208, 215
text categorization 17
text difficulty 81
text summarization 17
time series (plot) 65, 66, 132, 133, 135–137, 143, 144, 163
tree conversion 52, 168, 184, 191
tree height 15
tree width 15, 35
treebank 7, 10, 11, 13, 14, 18, 23, 24, 28, 30, 32, 35, 46, 48–51, 70–75, 92, 142, 146, 158–162, 164, 230, 231, 237–239, 243, 244

type-token ratio (TTR) 69, 82, 97, 150, 151, 153, 154, 208, 209, 211, 215, 220, 221
typology 26, 30, 44, 78, 159
valency 9, 12, 15, 16, 20, 22, 24, 26–28, 30, 34, 58, 60, 62–67, 71, 72, 75, 77, 80, 85, 92, 107, 110, 116, 129–140, 144, 157–162, 164, 165, 206, 208–219, 225, 226, 227, 228, 230, 231, 237, 238
valency pattern 15, 16, 26, 30, 34, 62, 63, 65, 75, 92, 107, 110, 129, 130, 132, 136, 138, 162, 164
Veins Theory 18, 25
Vertice 46
Wall Street Journal (WSJ) 7, 8, 18, 49, 51, 66, 168, 195–198, 204, 219, 230, 237, 239
word class 41, 60, 71, 79–83, 87, 90–93, 97, 107, 110, 123, 128–130, 157, 158, 162, 164
word order 3, 44, 58, 78, 142, 146, 159, 160, 163
working memory 27, 38, 39, 44, 78, 140–142, 145, 147, 156, 158, 159, 194, 228, 231, 232

Zipfian size 20, 37, 49, 82, 91, 95
Zipf-Mandelbrot (function/model/distribution) (ZM), Zipf-Mandelbrot's law 68, 99, 100, 103, 116, 118, 120, 128, 130, 131, 134, 152, 153, 155–158, 206, 213, 215, 219, 237, 238
Zipf's law 33, 36–39, 44, 45, 58, 68, 77, 79, 163, 164, 181

Distributions/Functions/Models/Laws

Beőthy's law 42
Distribution 14–16, 18–20, 22, 29–31, 34–44, 47, 48, 58, 59, 68, 69, 74, 75, 80–85, 87, 91–93, 95, 97, 100, 103, 104, 108, 113, 115, 116, 128–131, 133, 134, 138–140, 142–144, 146–148, 150, 156, 157, 160–168, 173, 179–191, 193, 203, 206, 208, 211, 216, 218, 227, 228, 235, 237–240
function 2, 4, 6, 7, 12, 21, 22, 24, 31, 34, 37, 38, 41–43, 47–50, 52, 56, 58, 59, 67–71, 73–76, 80, 81, 92–97, 100–103, 107–111, 113, 116, 120, 121, 128–130, 139, 140, 145, 156–159, 161, 162, 164, 169, 190, 194, 208, 233–235, 237, 240–242
Goebl's law 42
hyper-Binomial (function/model/distribution) 182
hypergeometric (function/model/distribution) 80, 91, 118, 120, 121, 123, 125, 128, 157, 182, 237
hyper-Poisson (function/model/distribution) 68, 139, 140, 157, 237
law 2, 10, 18, 19, 22, 24, 33, 36–39, 41, 42, 44, 45, 47, 58, 59, 68, 72, 75–77, 79, 140, 158, 163, 164, 173, 181, 183, 186, 218, 235, 241–243
Menzerath's law 2
Model 1, 2, 4, 11, 13, 17–19, 21–23, 25–29, 31–33, 36, 38, 40–44, 47, 56, 66, 68, 74, 76, 77, 80–82, 90–93, 95, 96, 99–101, 103, 104, 106, 108, 110–114, 116, 118, 120, 121, 123, 125, 128, 130, 131, 134, 136, 138–140, 142–144, 146, 148, 152, 153, 155–158, 160, 164–166, 173, 176, 178, 180–182, 184, 186, 188–192, 201, 206, 208, 213–215, 218, 219, 227–231, 234, 235, 237, 238, 241–245
negative hypergeometric (function/model/distribution) 80, 91, 118, 120, 121, 123, 125, 128, 157, 182, 237
Polya (distribution/function/model) 182
power law 18, 37, 38, 45, 164, 218
right truncated modified Zipf-Alekseev (function/model/distribution) (ZA) 19, 42, 43, 68, 80–82, 90–93, 95, 96, 99, 103, 104, 106, 108, 110–116, 118, 120, 128, 131, 134, 136, 138, 139, 143, 144, 146, 148, 152, 153, 155–157, 166, 173, 176, 178–182, 206, 214, 215, 218, 222, 227, 228, 230, 237, 238
right-truncated Zeta (distribution/function/model) 142, 143, 146, 147, 156
synergetic model 44, 92, 229, 241
Zipf-Mandelbrot (function/model/distribution) (ZM), Zipf-Mandelbrot's law 68, 99, 100, 103, 116, 118, 120, 128, 130, 131, 134, 152, 153, 155–158, 206, 213, 215, 219, 237, 238
Zipf's law 33, 36–39, 44, 45, 58, 68, 77, 79, 163, 164, 181