English Pages 288 Year 2015
Ioan-Iovitz Popescu, Mihaiela Lupea, Doina Tatar and Gabriel Altmann Quantitative Analysis of Poetic Texts
Quantitative Linguistics
Editors: Reinhard Köhler, Gabriel Altmann, Peter Grzybek
Advisory Editor: Relja Vulanović
Volume 67
DE GRUYTER MOUTON
ISBN 978-3-11-033605-4
e-ISBN (PDF) 978-3-11-036379-1
e-ISBN (EPUB) 978-3-11-039479-5
ISSN 0179-3616

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2015 Walter de Gruyter GmbH, Berlin/Boston
Printing and binding: CPI books GmbH, Leck
♾ Printed on acid-free paper
Printed in Germany
www.degruyter.com
Foreword

The study of language is a journey without end, not only because there are many languages but also because there is an unlimited number of texts. Everyone produces several texts every day, and the only way to learn about languages (and not only about languages) is the Sisyphean analysis of this infinite number of texts.

Usually, a problem whose solution is sought presupposes some definitions, some conventions and some hypotheses. The definitions concern concepts which are created by the researcher and enable her/him to describe and classify. This is the mandatory starting point of any analysis. If we want to proceed to the next level, we must try to test some hypotheses about the properties and the behavior of the given classes. But even if we succeed in doing so and capture the results in the form of a model, we see that each of the classes can be further scrutinized and split into new classes according to another property. For example, having classified the words of a text into parts of speech (level 1), we may state that the nouns have different grammatical functions (level 2); in turn, we may state that every grammatical function contains elements differing in their polysemy (level 3), etc. This procedure does not differ from that in physics or astronomy. The only difference is that language is a cultural product; its analysis necessitates rather complex methods.

A further complication, shared with biology, is the variability of texts. Texts are written by persons of different age, education, gender, mother tongue and social status; they are written or uttered for different aims, in parts or as a whole (text sorts); they describe different (existing or imaginary) matters and sometimes obey quite different restrictions (e.g. meter, rhythm, rhyme). And each time we decide to analyze some of these aspects, find a mathematical model, test it and subsume the discovered regularity under a system, we discover a new aspect.
And somewhere on this endless wandering we shall meet psychologists, biologists and physicists and will be forced to take into account their view of things. In the present volume we went a very restricted path: we analyzed the poetic work of the Romanian author Mihai Eminescu and tried to show at least two aspects of texts, viz. the phonetic aspect (Ch. 2), and the vocabulary (Ch. 3). The control cycle was presented in Ch. 4. We hope that the methods will be used for analyzing many different texts. We want to express our gratitude to several colleagues who helped, advised, corrected and improved the book. In the first place, it was Reinhard Köhler who spent more time with corrections to this book than with writing his own one. Other colleagues: Relja Vulanović, Ján Mačutek, Gabriela Pană
Dindelegan, Claudiu Vasilescu, Sorin Vizireanu, and Dan Zotta helped us with the English language, mathematics, computing, graphics, and Romanian, and we are glad that they did not end up with shattered nerves.

Ioan-Iovitz Popescu, Mihaiela Lupea, Doina Tatar, Gabriel Altmann
Contents

Foreword
1 Introduction
2 Phonic phenomena
2.1 Occurrence without pattern
2.1.1 Phoneme frequencies
2.1.2 Euphony in general
2.2 Assonance
2.2.1 The diagonal
2.2.2 Symmetry
2.2.3 Poem length and significant sequences
2.3 Alliteration
2.4 Aggregation
2.5 Rhyme
2.5.1 Word length
2.5.2 Open and closed rhymes
2.5.3 Masculine and feminine rhyme
2.5.4 Parts of speech in rhyme words
3 The word
3.1 Introduction
3.2 Frequency distribution
3.2.1 Stratification
3.2.2 Ord's criterion
3.2.3 The lambda indicator
3.2.4 Entropy and repeat rate
3.2.5 Gini's coefficient
3.2.6 Geometric properties
3.2.6.1 The triangle
3.2.6.2 Writer's view and the golden section
3.3 Vocabulary richness
3.4 Word length
3.4.1 Ord's scheme
3.4.2 Word-length distribution
3.5 Word classes (parts of speech)
3.5.1 Frequencies
3.5.2 Descriptiveness vs. activity
3.5.3 Runs
3.5.3.1 Sequential dependence
3.5.3.2 Run length
3.5.3.3 Placing tendency
4 The control cycle
References
Index
1 Introduction Poetic texts can be analyzed from an infinite number of viewpoints, just as any text and the whole of the human behaviour. Every viewpoint is interesting for some scientific discipline, and the number of viewpoints increases with the advancement of science. Our aim is very restricted, but, nevertheless, it opens up an infinite domain of new problems. And every problem can be solved in different ways. Hence, there is a path without end, wherever one begins and in whatever direction one goes. In the present volume, we shall concentrate on a small number of methods used in the study of poetic texts and apply them to some already quantified textual properties. Our textual examples are poems; they are often short and each result can be checked even without the use of a computer. Besides, the study of the phonic structure of poems is reasonable, because according to R. Jakobson, in poetry the form stays in the foreground. In prose, the phonic structure is not as prominent as in poetry and the rhythmic structure of prose depends also on the character of the given language, it is seldom a conspicuous property of a single text. Nevertheless, there is a discipline engaged in the study of prose rhythm. The methods presented in this study are applied to a corpus of 150 Romanian poems (including also a few “outliers”) written by Mihai Eminescu as they can be found in many editions of his works, texts analysing his works, or on the Internet: http://ro.wikisource.org/wiki/Autor:Mihai_Eminescu. In the present investigation, quantitative methods proven and tested in studies of prose texts, including methods for text comparison, are applied to poetic texts. Inter-sort or inter-language comparisons are frequently somewhat futile because each genre and each language has its own characteristic ways of text creation, hence most of the properties are significantly different. A statistical test simply emphasizes this expectation. 
We shall study phonic features, word-form frequencies, word length, word classes, and the semantic structure of the poems, revealing some parts of the author's world of associations. Each of them has many facets, but we concentrate rather on methods and methodology.

An obvious question at the beginning of any book on text studies is: What is a text? However, in contemporary science, such essentialist questions are rather outdated. They require determinations of a kind of Kantian noumenon, the essence of a thing, which does not exist, or, expressed in a weaker form, would not explain anything, because explanations form an infinite hierarchy whereas the “essence” would be a final (and therefore not acceptable) station on this way. Hence the only rational question is: what do we consider as a text? For the purpose of the present study, a text is a linear sequence of meaningful entities, organized also hierarchically (e.g. in the hierarchy sentence, clause, phrase, word, morpheme, syllable, phoneme). In linguistics, we restrict ourselves to spoken or written material, but even within this restricted field we find exceptions. Hypertexts, e.g. on the Internet, which are full of pictures and links, or texts in comics, etc., belong to the domain of intertextuality. Of course, one can study them, too, from various points of view, but they are not standard texts of the kind we are interested in. The texts of our interest are written in some script and their entities do not have only a purpose (like the kitchen in a house) but also a meaning, i.e., they refer to objects outside of the text. Nevertheless, even under this restriction, they have many properties in common with other linear sequences, and consequently, many methods used in non-linguistic disciplines can be applied also in linguistics.

In quantitative linguistics, the explication of a text is not one of the aims or results of the research activities, nor is the description of the content or its evaluation (whether aesthetic or stylistic). Quantitative-linguistic research aims at finding regularities which arise due to the effect of (possibly still unknown) background laws. These regularities should not be confused with grammatical rules, which can be learnt or changed or even violated, and appear, in a manner of speaking, on the surface of the texts. We rather search for textual phenomena which are evoked by, and are evidence of, certain background mechanisms. We shall never know all of them, but approaching the matter stepwise allows us to penetrate deeper and deeper. There are five main approaches to text analysis (cf.
Altmann 2007, 2009):

(1) The static approach is concerned with the text as a whole, comprising the computation of all known properties, stylistic studies, evaluation of frequencies of different phenomena, lengths, polysemy values, word associations, measurement of grammatical structures, rankings, diversifications, classifications, denotative structures, measurement of differences, entropies, etc. This means that the text will be dissected into well-defined units whose properties are studied. For this approach, at least elementary statistical methods are indispensable. Among the obvious tools, mathematical graphs and their properties provide easy ways to describe and display phenomena and relations.

(2) The sequential approach considers text as a linear sequence of entities forming time series, runs, Markov chains, reference chains, etc. These entities comprise degrees of properties, frequencies, metrical feet, distances between elements of the series, etc., and the position of certain degrees of a property in a higher unit, e.g. word length positioning in the given sentence. This approach is more complex and frequently requires more complex methods. Corresponding mathematical models may be based on differential and difference equations.

(3) A systemic approach can be started when some of the problems in the first two domains have been solved. Relations between entities, properties and structures which form control cycles and display the self-regulation mechanisms of text are in the focus of this approach. Though we know that texts are produced by authors, who consciously obey only grammatical rules and maybe rules of text structure, there are also latent, subconscious forces which compel the speaker/writer to form the text in a special way, e.g. reducing the decoding effort, reducing the memory effort, reducing sentence difficulty, increasing originality, etc. The writer is free with respect to the content but not free with respect to the external form of the text: s/he must abide by some laws if s/he wants to be understood. The axiom concerning the non-existence of isolated entities in language and text is a sufficient motivation for the systemic approach. Investigations of this kind are known from so-called synergetic linguistics (cf. Köhler 2005) and comprise both language and text.

(4) The typological approach consists of comparing all the above-mentioned properties as they occur in texts of different languages, placing the languages and texts on different scales, building fuzzy classes and studying the variability of various phenomena. Though text analysis has played a secondary role in this research, its importance is receiving new impulses (cf. e.g. Kelih 2009; Popescu, Mačutek, Altmann 2009). However, the notorious classifications based on categorical concepts do not yield anything but new, more general, concepts. We need them, but they seldom lead to theoretical progress.

(5) The chaos theoretical approach.
All aspects mentioned above contain some elements of chaos which is placed in a deeper layer in all text phenomena. Some phenomena, e.g. fractals, dimensions, attractors are identifiable but because of their indirect relevance for the text sciences and also because of their computational effort they are not yet sufficiently discussed (cf. Hřebíček 1997, 2000; Andres 2010; Andres, Benešová 2011). Ideally, a quantitative text analysis engages three different specialists. This is because at the beginning of the research, it is always the task of a linguist/text
scientist to set up a hypothesis with linguistic relevance. No hypothesis, no quantitative text research! The linguist states what kind of data would be relevant for testing the hypothesis and the programmer tries to elicit them from texts. As opposed to facts and phenomena, data are not just given; they are the result of a scientific activity, they are constructed. To a text scientist, text is the matter from which data are conceptually constructed. In the meantime, the mathematician translates the verbal hypothesis into the language of mathematics, i.e. formulates it as a statistical hypothesis. At the same time s/he tries, together with the linguist, to find the mechanism that can lead to the rise of the given phenomenon. In other words, the mathematician tries to set up a model of the phenomenon and to subsume it under an existing theory, to embed it in a system of similar hypotheses. The programmer tests the hypothesis on her/his data and the mathematician interprets the outcome statistically. The results of the test are translated into the daily language of linguistics, and the linguist interprets the result linguistically. Hence, the succession of persons in text analysis is: linguist → mathematician → programmer → mathematician → linguist. The linguist is placed at the beginning and the end of this procedure and warrants the linguistic relevance of the problem at the beginning and the relevance of the results at the end. Needless to say, mathematicians and programmers frequently propose excellent ideas; a sound cooperation yields the most reasonable results. Texts are also sources for other disciplines such as psycholinguistics, sociolinguistics, dialectology, language teaching, etc., in which the respective experts determine the course of research.

Another obvious question is: What can be considered as poetry?
The first answer is: Poetry is a kind of literary art where evocative and aesthetic effects are based on form, in addition to (sometimes: instead of) meaning. This volume aims at investigating the universal laws and interrelations of aspects connected with consciously formed texts under consciously imposed form restrictions. There are many commonalities in these texts but none of the properties can be assumed to be a necessary condition. Rhyme, rhythm, meter, the existence of verse lines, strophes, a fixed number of lines (as in sonnets), meaning, etc., can be found in many but not in all poems. We must rely on the judgement of literary historians, making allowance for the existence of outliers which may even destroy our theories. Many times they can be rendered harmless by introducing boundary or subsidiary conditions.

A large part of quantitative characterisations is performed by means of indicators. Many of them tell the same story but their interpretation may be different. But if they tell the same story, then there is a clear link between them, even when their method of computation is different. The indicators should have at least the following properties (cf. Galtung 1967; Grotjahn, Altmann 1988; Wimmer et al. 2003: 25ff):

(1) Meaning. This seems to be quite natural, but many indicators arise in the form of a proportion which does not have a clear interpretation. The indicator must tell us what it describes.

(2) Simplicity, especially at the beginning of a research, because it facilitates computation and the mathematical treatment. It is advantageous to express different properties with different indicators.

(3) Variation interval. If an indicator varies in the interval [0, ∞), a given value of this indicator cannot be interpreted. Every number can be considered large (with respect to the lower limit 0) or small (with respect to the upper limit ∞). It is therefore reasonable to restrict the value to a finite interval by means of normalization.

(4) Sampling distribution. This property of an indicator is indispensable for a reliable evaluation of the measured values. It gives information about the frequency or probability of the individual values of the indicator, information which is fundamental for any statistical assessment. Unfortunately, this requirement is still ignored in the humanities in many cases. In order to apply an indicator, e.g. for comparisons, one should know at least its variance, which is needed for asymptotic tests. Exact probabilities can be computed only when the distribution of the indicator is known. The application of non-parametric statistics, a well-established technique, is an alternative.

(5) Reliability is the measure of exactness and stability. The indicator should be stable and express the same property in all cases.

(6) Validity means that the indicator truly expresses the studied property.
An illustrative example in this respect is the large number of available measures of vocabulary richness, whose validity is an open question. But all this cannot be achieved in an elementary, preliminary investigation. Research begins always with the first step and improves its argumentation step by step, sets up more complex hypotheses, extends the investigations to other languages and, based on the surface phenomena expressed by indicators, further steps towards a theory follow. A theory is a system of interrelated hypotheses, some of which can be considered laws, i.e. general statements derived from axioms or other laws, or in other words, anchored in antecedent knowledge, and empirically well corroborated (cf. Bunge 1967). In mainstream linguistics, the term theory is misused. It stands, as a rule, for concepts, isolated phenomena, descriptive approaches, sets of facts, classifications and sets of rules. All that, and even strict definitions – which are not more than conventions – and a preceding formalization do not have the status
of a theory. The mentioned definitions and formalisations are merely necessary but not sufficient conditions for the construction of a theory. A theory begins to arise when we derive hypotheses from antecedent knowledge, test them empirically and join them with a system of universal, corroborated statements. This is, of course, not a simple task because language is not a deterministic system with clear-cut units and relations. Though it is always in a steady state, it varies with every speaker, it changes incessantly, and communication is possible only because of its self-regulation. A speaker can, and does, change elements, but if s/he aims at communicative success, the change must not surpass a certain limit. With every change, the limit is shifted by a tiny quantity. Since this shift is always advantageous from the point of view of the speaker (s/he is the actor in this play), the phenomena in language are never distributed according to the normal (Gaussian) distribution. Every distribution in language is skewed. Nevertheless, values of whatever property taken from many texts may display normality in a statistical sense (a situation that can be tested), and a comparison with other text groups is possible by means of an asymptotic test based on normality.

The greatest advancements in every empirical science are achieved by introducing mathematical methods. Mathematics is a warrant of exactness, testability, deducibility, and systematisation, and it gives us the chance to predict phenomena which are not visible on the surface of texts. In spite of this, there are still objections against the application of mathematical instruments in literary studies. Such objections can be heard from hard-core poeticists relying only on intelligent verbal descriptions. The corresponding arguments have been analysed (cf. Altmann 1999; Wimmer et al. 2003: 14 ff.) and will be reproduced here for the sake of clarity. The objections are:

1. “Our objects cannot be quantified/mathematised.”
2. “Even if it were possible, we are not interested in numbers but in qualities and properties.”
3. “We are not interested in laws but in the uniqueness, idiosyncrasy of texts.”
4. “Our problems are so complex that no mathematics can capture and express them.”

Evidently, all these objections arise either from a misunderstanding, in which case they can easily be answered, or from a negative attitude towards mathematics, in which case they cannot be removed.

Objection 1 is rooted in false epistemology. We do not “mathematise” real objects, we quantify/mathematise only our concepts and ideas about them. Objects do not contain numbers which could be observed. Properties are first
conceptually constructed, then quantified and at last measured. These measures are ascribed to objects. Properties are always gradual (cf. Bunge 1983; 187f; 1995), hence quantification is the best way to perform exact research. If the concepts of objects or properties formed by a researcher cannot be quantified, then we have to conclude that these concepts are too poor or too unclear. In qualitative research only inexact expressions like “very warm”, “many”, “frequently” etc., occur, in extreme cases the property is dichotomised– relics from structuralism – and loses the major part of the information. Qualitative concepts of properties are the ontogenetic heritage of our language, in which numbers appear later on, as can be observed also in children's development. But if we admit that qualities and quantities do not exist in reality but only in our concepts, objection 1 becomes irrelevant. Objection 2, just as objection 1, confuses epistemology with ontology. No scientist is interested in numbers, but numbers are the best way to exactly capture our conceptual entities. Reality is neither qualitative nor quantitative. It simply exists. With the help of our concepts, we simply try to capture it in order to improve orientation and knowledge, and to survive. The information we obtain from reality are merely weak electrical impulses entering our brain via our senses, and the brain has to construct a (partial) map of the reality on this basis. Reality is (re-)constructed by means of concepts, which are primordially qualitative, and the natural language helps us by codifying them. Science, however, requires more exact concepts, viz. quantitative ones (cf. Bunge 1967, 1983; Stegmüller 1970; Essler 1971). Disciplines working with quantitative concepts develop more rapidly than other ones. Objection 3 is an evident error. Idiosyncrasy can be stated only as a contrast to a general background or as a difference from other texts. In any case, comparison is necessary. 
But if a text is said to represent an idiosyncrasy, the significance of the difference must be shown. This can be done only by means of a statistical test; the indication of the difference – even if it is given in an exact form – would not suffice. In literary studies, in corpus linguistics, and in computational linguistics, text and methods are frequently compared by indicating a certain numerical difference, sometimes in form of percentages. On this basis alone, conclusions are drawn such as "Method X is superior by 3 %" or "Text 1 possesses more of property X than text 2: Text 1 has 70 scores and text 2 only 68". These are proto-scientific statements, not more than opinions; they ignore that the difference may be a random result or due to an inexact measurement. Objection 4 is again a misunderstanding. Every statement, including scientific ones, is a simplification, whether it is given verbally or by means of mathematics. An object cannot be captured in its entirety – especially because
we do not even know what its entirety is. Only a small number of aspects of an object can be brought into focus. In some theories, the inevitable separation into 'relevant' and other aspects is made explicit in the ceteris paribus condition, which can be presented in the form of a disturbance constant or a special function which weakly contributes to the main independent variable. Furthermore, mathematical concepts and quantitative methods are obviously the only imaginable way to describe and analyse complex structures and processes, including fuzzy ones.

There are two ways to perform text analyses: either the comparison of texts and text sorts written by different authors in one or more languages, together with the study of the outcomes of text laws; or the study of an individual author, one text sort, and one language, with a description of the properties of the given set of texts and the theoretical search for the latent mechanisms which brought about the given phenomenon. In this volume, we focus on the poetic work of the famous Romanian poet Mihai Eminescu and try to characterise it, show some relations and realisations of text laws, and indicate perspectives for future research.
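The reply to Objection 3, that a bare difference such as "70 versus 68" says nothing until its significance is tested, can be made concrete with a small sketch. It assumes an asymptotic normal test of the kind mentioned under indicator property (4): the difference of two indicator values is divided by the square root of the sum of their variances. The function name and the variances used below are hypothetical and chosen only for illustration; they are not taken from the book.

```python
import math


def asymptotic_u_test(x1, var1, x2, var2):
    """Two-sided asymptotic normal test for the difference of two indicator values.

    x1, x2     : measured indicator values of text 1 and text 2
    var1, var2 : variances of the two indicators (assumed to be known)
    Returns the test statistic u and the two-sided p-value.
    """
    u = (x1 - x2) / math.sqrt(var1 + var2)
    # P(|Z| > |u|) for a standard normal Z, written with the complementary
    # error function: 2 * (1 - Phi(|u|)) = erfc(|u| / sqrt(2)).
    p = math.erfc(abs(u) / math.sqrt(2.0))
    return u, p


# Hypothetical example: scores 70 and 68, each with an assumed variance of 4.
u, p = asymptotic_u_test(70.0, 4.0, 68.0, 4.0)
print(f"u = {u:.3f}, p = {p:.3f}")
```

With these assumed variances the difference of two points is far from significant at the 0.05 level, which is the sense in which an unqualified "70 versus 68" remains an opinion rather than a result.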
2 Phonic phenomena

2.1 Occurrence without pattern

2.1.1 Phoneme frequencies

The usual way to capture phonic phenomena in texts is to consider sound/phoneme frequencies, either absolute, relative or associated with a position. While in prose mostly the first view is practised, in poetry patterns of sounds occur whose existence or positioning displays a kind of statistical trend. The most common one is rhyme, which is created consciously, whereas in other kinds of poetry and in various languages phenomena such as alliteration, assonance, spontaneous aggregation, etc. are also observed.

Before scrutinising these specific phenomena, we will focus on the study of phoneme frequencies. We suppose that even in poetry (if there is no special aim, as in Dadaistic poetry) the phonemes abide by the stratification law, a general hypothesis which was proposed as an alternative to Zipf's formula (cf. Popescu, Altmann, Köhler 2010). Nevertheless, Zipf's power law or the Zipf-Alekseev function can also be used where the data are less complex. The stratification approach aims at finding the number of strata formed by the given entities. In short texts, there is usually only one layer; in longer texts, stratification becomes more obvious. This regularity holds for any kind of entities.

In order to demonstrate this regularity on the phonic level, we first present the phonic analysis of Romanian and its phonemic interpretation as well as the transcription of letters into phonemes. In Romanian phonology, the phoneme inventory consists of seven vowels (strong vowels, syllabic vowels), one voiceless (non-syllabic) vowel, two or four semivowels (different views exist and we will work with the four-semivowel version) and twenty-two consonants. The vowel „i“ can occur at the end of a syllable which already contains a syllabic vowel. In this case, „i“ is a non-syllabic (voiceless) vowel.
A semivowel (weak vowel) is phonetically similar to a vowel (strong vowel) but functions as a syllable boundary rather than as the nucleus of a syllable and is shorter than the corresponding vowel. Out of the total number of seven vowels, only four can behave as semivowels, which are involved in some special groups of phonemes called diphthongs and triphthongs. A diphthong refers to two adjacent vowels occurring within the same syllable. It contains one vowel (strong vowel) and one semivowel (weak vowel). A triphthong is the uninterrupted combination of three vowels in the same syllable: a strong vowel and
two semivowels (the strong vowel is usually in between the semivowels). The list of phonemic transcriptions of graphemes is presented in Tables 2.1.1.1 to 2.1.1.4.

Table 2.1.1.1: The phoneme-grapheme relation for vowels and semivowels in Romanian

No.  Phoneme (IPA)  Grapheme  Type and context                                                                       Example in Romanian
1    /a/            a         strong vowel                                                                           apa
2    /ə/            ă         strong vowel                                                                           părinte
3    /ɨ/            â, î      strong vowel                                                                           cânta, coborî, înainte
4    /e/            e         strong vowel                                                                           erou
5    /e̯/            e         weak vowel, in diphthongs and triphthongs                                              stea, /e̯/a/ (diphthong); doreai, /e̯/a/j/ (triphthong)
6    /i/            i         strong vowel                                                                           inel
7    //             i         non-syllabic (voiceless) vowel, at the end of a syllable containing a syllabic vowel   flori, îţi, orice, galbeni
8    /j/            i         weak vowel, in diphthongs and triphthongs                                              mai, /a/j/ (diphthong); doreai, /e̯/a/j/ (triphthong)
9    /o/            o         strong vowel                                                                           oraş
10   /o̯/            o         weak vowel, in diphthongs and triphthongs                                              coasă, /o̯/a/ (diphthong); pleoape, /e̯/o̯/a/ (triphthong)
11   /u/            u         strong vowel                                                                           durere
12   /w/            u         weak vowel, in diphthongs and triphthongs                                              nou, /o/w/ (diphthong); vreau, /e̯/a/w/ (triphthong)
Table 2.1.1.2: The phoneme-grapheme relation for consonants in Romanian

No.  Phoneme (IPA)     Grapheme  Example in Romanian        Example in English
1    /b/               b         bine                       book
2    /k/               c, k, q   curaj, karate, quasar      close
3    /t∫/              c         cer, cireşe                chest
4    /k’/              ch, k     chemare, chipeş, kilogram  kept
5    /d/               d         dar                        day
6    /f/               f         foc                        face
7    /g/               g         greu                       gold
8    /dʒ/              g         ger, gingaş                gist
9    /g’/              gh        ghem, ghiocel              get
10   /h/               h         harnic                     hat
11   /ʒ/               j         joc                        pleasure
12   /l/               l         lac                        lake
13   /m/               m         mac                        moon
14   /n/               n         nor                        name
15   /p/               p         parc                       pan
16   /r/               r         rac                        rain
17   /s/               s         soare                      sun
18   /∫/               ş         şarpe                      shape
19   /t/               t         tare                       time
20   /ts/              ţ         ţară                       its
21   /v/               v, w      val, watt                  voice
22   /z/               z         zori                       zone
23   /k/+/s/, /g/+/z/  x         excursie, examen           exception
Table 2.1.1.3: Romanian diphthongs

No.  Grapheme  Phonemic transcription  Example in Romanian
1    ai        /a/j/                   mai
2    au        /a/w/                   dau
3    ea        /e̯/a/                   stea
4    ei        /e/j/                   trei
5    eo        /e̯/o/                   vreo
6    eu        /e/w/                   leu
7    ia        /j/a/                   biată
8    ie        /j/e/                   miere
9    ii        /i/j/                   fii
10   io        /j/o/                   iobag
11   iu        /i/w/                   auriu
     iu        /j/u/                   iubire
12   oa        /o̯/a/                   soare
13   oi        /o/j/                   foi
14   ou        /o/w/                   nou
15   ua        /w/a/                   ziua
16   ue        /w/e/                   înşeuez
17   ui        /u/j/                   pui
18   uu        /u/w/                   continuu
19   uă        /w/ə/                   două
20   uâ        /w/ɨ/                   plouând
21   ăi        /ə/j/                   răi
22   ău        /ə/w/                   rău
23   âi, îi    /ɨ/j/                   câine, îi dau
24   âu        /ɨ/w/                   râu
Table 2.1.1.4: Romanian triphthongs

No.  Grapheme  Phonemic transcription  Example in Romanian
1    eai       /e̯/a/j/                 doreai
2    eau       /e̯/a/w/                 mergeau
3    eoa       /e̯/o̯/a/                 pleoape
4    iai       /j/a/j/                 voiai
5    iau       /j/a/w/                 tăiau
6    iei       /j/e/j/                 piei
7    eu        /j/e/w/                 eu
8    ioa       /j/o̯/a/                 creioane
9    ioi       /j/o/j/                 i-oi da
10   iou       /j/o/w/                 maiou
11   oai       /o̯/a/j/                 orzoaică
12   uai       /w/a/j/                 înşeuai
13   uau       /w/a/w/                 înşeuau
14   uăi       /w/ə/j/                 rouăi
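The definitions given above (a diphthong joins one strong vowel and one semivowel, a triphthong one strong vowel and two semivowels) can be checked mechanically against the entries of Tables 2.1.1.3 and 2.1.1.4. The following is a minimal sketch, assuming the slash-delimited notation of the tables; the function names are hypothetical, and the non-syllabic vowels /e̯/ and /o̯/ are written with the combining character U+032F.

```python
# Strong (syllabic) vowels and semivowels as listed in Table 2.1.1.1.
STRONG_VOWELS = {"a", "ə", "ɨ", "e", "i", "o", "u"}
SEMIVOWELS = {"e\u032f", "j", "o\u032f", "w"}  # e̯, j, o̯, w


def split_phonemes(transcription):
    """Split a slash-delimited transcription such as '/a/j/' into phonemes."""
    return [p for p in transcription.replace(" ", "").split("/") if p]


def classify_cluster(transcription):
    """Classify a vowel cluster as 'diphthong', 'triphthong' or 'other'.

    Clusters containing anything that is neither a strong vowel nor a
    semivowel are reported as 'not a vowel cluster'.
    """
    phonemes = split_phonemes(transcription)
    strong = sum(p in STRONG_VOWELS for p in phonemes)
    semi = sum(p in SEMIVOWELS for p in phonemes)
    if strong + semi != len(phonemes):
        return "not a vowel cluster"
    if strong == 1 and semi == 1:
        return "diphthong"
    if strong == 1 and semi == 2:
        return "triphthong"
    return "other"


print(classify_cluster("/a/j/"))                # mai
print(classify_cluster("/e\u032f/a/w/"))        # mergeau
print(classify_cluster("/j/o\u032f/a/"))        # creioane
```

A check of this kind is only as reliable as the inventory it is given; it does not decide how a written vowel sequence should be syllabified in the first place.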
Syllabification is very important in the identification of diphthongs and triphthongs and, finally, in the phonemic transcription. Some special cases of phonemic transcriptions with syllabification are presented below.

1. The grapheme ‘e’ at the beginning of personal pronouns is transcribed as follows:

eu   (I)                  /j/e/w/
ea   (she)                /j/a/
el   (he)                 /j/e/l/
ele  (they - feminine)    /j/e/ - /l/e/
ei   (they - masculine)   /j/e/j/

2. The grapheme ‘e’ at the beginning of the forms (different tenses) of the verb “a fi” (“to be”) is transcribed as /j/e/:

e      /j/e/
este   /j/e/s/ - /t/e/
eram   /j/e/ - /r/a/m/

3. The graphemes ‘e’ and ‘a’ at the beginning of a syllable, following a syllable which ends with ‘i’, are transcribed as /j/e/ and /j/a/ respectively.

urgie      /u/r/ - /dʒ/i/ - /j/e/
prietenie  /p/r/i/ - /j/e/ - /t/e/ - /n/i/ - /j/e/
România    /r/o/ - /m/ɨ/ - /n/i/ - /j/a/
mantia     /m/a/n/ - /t/i/ - /j/a/

4. Exceptions: loan words (neologisms)

cordial  /k/o/r/ - /d/i/ - /a/l/
Eliade   /e/ - /l/i/ - /a/ - /d/e/
diamant  /d/i/ - /a/ - /m/a/n/t/
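Special cases 3 and 4 lend themselves to a small rule-based sketch. It assumes that the word is already split into orthographic syllables, and it treats the loan-word exceptions as a fixed list containing only the three words given above; a real exception list would be longer. The names `LOAN_WORD_EXCEPTIONS` and `glide_positions` are hypothetical, and the sketch does not transcribe the rest of the word.

```python
# Loan words listed above as exceptions (assumption: the list is closed).
LOAN_WORD_EXCEPTIONS = {"cordial", "eliade", "diamant"}


def glide_positions(syllables):
    """Find syllables whose initial 'e' or 'a' is transcribed with a /j/ glide.

    syllables: orthographic syllables of one word, e.g. ["ur", "gi", "e"].
    Returns a list of (syllable_index, transcription_of_initial_vowel) pairs,
    where the transcription is '/j/e/' or '/j/a/'.
    """
    word = "".join(syllables).lower()
    if word in LOAN_WORD_EXCEPTIONS:
        return []  # case 4: no glide in the listed loan words
    hits = []
    for k in range(1, len(syllables)):
        previous = syllables[k - 1].lower()
        current = syllables[k].lower()
        # case 3: the syllable starts with 'e' or 'a' and follows one ending in 'i'
        if previous.endswith("i") and current[:1] in ("e", "a"):
            hits.append((k, "/j/" + current[0] + "/"))
    return hits


print(glide_positions(["pri", "e", "te", "ni", "e"]))  # prietenie
print(glide_positions(["Ro", "mâ", "ni", "a"]))        # România
print(glide_positions(["di", "a", "mant"]))            # diamant (exception)
```

Cases 1 and 2 concern a handful of fixed word forms and would more naturally be handled by a lookup table than by a rule.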
For more details related to the rules for phonemic transcription in Romanian see Dindelegan (2013: 7–17). Examples of phonemic transcriptions with syllabification: chemare cheamă ochi ochii copii copiii veciniciei creioane şoarece fecioară valuri să-mi ghiozdan ghiocel geană gingaş examen excursie
/k’/e/ - /m/a/ - /r/e/ /k’/a/ - /m/ə/ /o/k’/ /o/ - /k’/i/ /k/o/ - /p/i/ /k/o/ - /p/i/- /j/i/ /v/e/t∫/ - /n/i/ - /t∫/i/ - /j/e/j/ /k/r/e/- /j/o̯ /a/ - /n/e/ /∫/o̯ /a/ - /r/e/ - /t∫/e/ /f/e/ - /t∫/o̯ /a/ - /r/ə/ /v/a/ - /l/u/r/i/ /s/ə/m/i/ /g’/o/z/ - /d/a/n/ /g’/i/ - /o/ - /t∫/e/l/ /dʒ/a/ - /n/ə/ /dʒ/i/n/ - /g/a/∫/ /e/ - /g/z/a/ - /m/e/n/ /e/k/s/ - /k/u/r/ - /s/i/ - /j/e/
The phonemic transcription of the poem Lacul is presented below. Each verse is followed by its phonemic transcription.

Lacul

Lacul codrilor albastru
/l/a/k/u/l/ /k/o/d/r/i/l/o/r/ /a/l/b/a/s/t/r/u/
Nuferi galbeni îl încarcă
/n/u/f/e/r/i/ /g/a/l/b/e/n/i/ /ɨ/l/ /ɨ/n/k/a/r/k/ə/
Tresărind în cercuri albe
/t/r/e/s/ə/r/i/n/d/ /ɨ/n/ /t∫/e/r/k/u/r/i/ /a/l/b/e/
El cutremură o barcă
/j/e/l/ /k/u/t/r/e/m/u/r/ə/ /o/ /b/a/r/k/ə/

Şi eu trec de-a lung de maluri
/∫/i/ /j/e/w/ /t/r/e/k/ /d/e̯/a/ /l/u/n/g/ /d/e/ /m/a/l/u/r/i/
Parc-ascult şi parc-aştept
/p/a/r/k/a/s/k/u/l/t/ /∫/i/ /p/a/r/k/a/∫/t/e/p/t/
Ea din trestii să răsară
/j/a/ /d/i/n/ /t/r/e/s/t/i/j/ /s/ə/ /r/ə/s/a/r/ə/
Şi să-mi cadă lin pe piept
/∫/i/ /s/ə/m/i/ /k/a/d/ə/ /l/i/n/ /p/e/ /p/j/e/p/t/

Să sărim în luntrea mică
/s/ə/ /s/ə/r/i/m/ /ɨ/n/ /l/u/n/t/r/e̯/a/ /m/i/k/ə/
Îngânaţi de glas de ape,
/ɨ/n/g/ɨ/n/a/ts/i/ /d/e/ /g/l/a/s/ /d/e/ /a/p/e/
Şi să scap din mână cârma,
/∫/i/ /s/ə/ /s/k/a/p/ /d/i/n/ /m/ɨ/n/ə/ /k/ɨ/r/m/a/
Şi lopeţile să-mi scape;
/∫/i/ /l/o/p/e/ts/i/l/e/ /s/ə/m/i/ /s/k/a/p/e/

Să plutim cuprinşi de farmec
/s/ə/ /p/l/u/t/i/m/ /k/u/p/r/i/n/∫/i/ /d/e/ /f/a/r/m/e/k/
Sub lumina blândei lune –
/s/u/b/ /l/u/m/i/n/a/ /b/l/ɨ/n/d/e/j/ /l/u/n/e/
Vântu-n trestii lin foşnească,
/v/ɨ/n/t/u/n/ /t/r/e/s/t/i/j/ /l/i/n/ /f/o/∫/n/e̯/a/s/k/ə/
Unduioasa apă sune!
/u/n/d/u/j/o̯/a/s/a/ /a/p/ə/ /s/u/n/e/

Dar nu vine... Singuratic
/d/a/r/ /n/u/ /v/i/n/e/ /s/i/n/g/u/r/a/t/i/k/
În zadar suspin şi sufăr
/ɨ/n/ /z/a/d/a/r/ /s/u/s/p/i/n/ /∫/i/ /s/u/f/ə/r/
Lângă lacul cel albastru
/l/ɨ/n/g/ə/ /l/a/k/u/l/ /t∫/e/l/ /a/l/b/a/s/t/r/u/
Încărcat cu flori de nufăr
/ɨ/n/k/ə/r/k/a/t/ /k/u/ /f/l/o/r/i/ /d/e/ /n/u/f/ə/r/
The frequencies of phonemes, ranked in the usual way, are presented in Table 2.1.1.5. The poem Lacul has 20 lines in total; the total number of phonemes is 414, composed of 180 vowels (strong + voiceless + weak) and 234 consonants. The size of the phoneme inventory is 29.
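The counting step behind such a rank-frequency table can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes the slash-delimited transcription format used above, and the function names `phonemes` and `rank_frequencies` are hypothetical.

```python
from collections import Counter

def phonemes(transcription: str) -> list[str]:
    """Split a slash-delimited transcription into phoneme symbols."""
    # Words are separated by spaces and phonemes by slashes; empty pieces
    # and syllable hyphens are dropped.
    return [p for word in transcription.split()
            for p in word.split("/") if p and p != "-"]

def rank_frequencies(transcription: str) -> list[tuple[str, int]]:
    """Return (phoneme, frequency) pairs ordered by decreasing frequency."""
    return Counter(phonemes(transcription)).most_common()

# First verse of Lacul, as transcribed above.
line = "/l/a/k/u/l/ /k/o/d/r/i/l/o/r/ /a/l/b/a/s/t/r/u/"
ranked = rank_frequencies(line)
print(ranked)
```

Applied to the concatenated verses of one strophe, the same function would give the frequencies from which one column pair of a rank-frequency table is built.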
Table 2.1.1.5: Rank-frequencies of phonemes in individual strophes in Lacul (freq = frequency; /i/* = voiceless /i/)

      Strophe 1     Strophe 2     Strophe 3     Strophe 4     Strophe 5
rank  freq phoneme  freq phoneme  freq phoneme  freq phoneme  freq phoneme
1     12   /r/      9    /a/      7    /a/      10   /n/      8    /a/
2     8    /l/      7    /e/      7    /s/      9    /u/      8    /u/
3     7    /a/      7    /i/      6    /ə/      6    /a/      8    /n/
4     7    /e/      7    /r/      6    /e/      6    /e/      8    /r/
5     7    /k/      7    /t/      6    /i/      6    /s/      6    /l/
6     6    /u/      6    /p/      6    /n/      5    /i/      5    /i/
7     5    /n/      5    /ə/      5    /ɨ/      5    /l/      5    /k/
8     4    /ə/      5    /k/      5    /m/      4    /t/      5    /s/
9     4    /b/      5    /s/      4    /k/      3    /ə/      4    /ə/
10    3    /ɨ/      4    /d/      4    /l/      3    /j/      3    /ɨ/
11    3    /i/*     4    /l/      4    /p/      3    /k/      3    /e/
12    3    /o/      4    /∫/      3    /d/      3    /d/      3    /d/
13    3    /t/      3    /j/      3    /r/      3    /m/      3    /f/
14    2    /i/      3    /u/      2    /i/*     3    /p/      3    /t/
15    2    /d/      3    /n/      2    /g/      3    /r/      2    /g/
16    2    /s/      2    /i/*     2    /∫/      2    /ɨ/      1    /i/*
17    1    /j/      2    /m/      2    /ts/     2    /b/      1    /o/
18    1    /t∫/     1    /e̯/      1    /e̯/      2    /f/      1    /b/
19    1    /f/      1    /w/      1    /o/      2    /∫/      1    /t∫/
20    1    /g/      1    /g/      1    /u/      1    /e̯/      1    /p/
21    1    /m/      0    /ɨ/      1    /t/      1    /i/*     1    /∫/
22    0    /e̯/      0    /o/      0    /j/      1    /o/      1    /v/
23    0    /o̯/      0    /o̯/      0    /o̯/      1    /o̯/      1    /z/
24    0    /w/      0    /b/      0    /w/      1    /v/      0    /e̯/
25    0    /p/      0    /t∫/     0    /b/      0    /w/      0    /j/
26    0    /∫/      0    /f/      0    /t∫/     0    /t∫/     0    /o̯/
27    0    /ts/     0    /ts/     0    /f/      0    /g/      0    /w/
28    0    /v/      0    /v/      0    /v/      0    /ts/     0    /m/
29    0    /z/      0    /z/      0    /z/      0    /z/      0    /ts/
The rank-frequency distribution of phonemes in all strophes can be captured by the one- or two-component stratification formula

$$ f_r = 1 + a\,e^{-r/b} + c\,e^{-r/d}, \tag{2.1.1.1} $$

testifying to the phonic stratification of individual strophes. The individual fitting parameters, computed iteratively, are presented in Table 2.1.1.6. The graphic picture of the first strophe is presented in Figure 2.1.1.1 and the fourth strophe in Figure 2.1.1.2 (R² is the usual determination coefficient).

Figure 2.1.1.1. Rank-frequencies of phonemes in the first strophe of Lacul

Table 2.1.1.6: Parameters of fitting (2.1.1.1) to individual strophes

Strophe  a        b       c       d       R²
1        11.5941  6.1295  -       -       0.95
2        8.8879   9.0145  -       -       0.93
3        7.6789   9.0035  -       -       0.91
4        4.1970   1.0692  8.5249  7.7596  0.96
5        9.5960   7.2343  -       -       0.93
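To make formula (2.1.1.1) concrete, the following sketch evaluates the fitted curve for two strophes using the parameters of Table 2.1.1.6. It only evaluates the model; the iterative fitting itself is not reproduced here, and the names `PARAMS` and `stratified` are illustrative assumptions.

```python
import math

# (a, b, c, d) per strophe, copied from Table 2.1.1.6.
# None marks an absent second component (one-component fit).
PARAMS = {
    1: (11.5941, 6.1295, None, None),
    4: (4.1970, 1.0692, 8.5249, 7.7596),
}

def stratified(r, a, b, c=None, d=None):
    """Evaluate f_r = 1 + a*exp(-r/b) + c*exp(-r/d) at rank r."""
    value = 1.0 + a * math.exp(-r / b)
    if c is not None and d is not None:
        value += c * math.exp(-r / d)
    return value

# Fitted frequencies for ranks 1..29 (the inventory size) in strophes 1 and 4.
fitted_1 = [stratified(r, *PARAMS[1]) for r in range(1, 30)]
fitted_4 = [stratified(r, *PARAMS[4]) for r in range(1, 30)]
print(round(fitted_1[0], 2), round(fitted_4[0], 2))
```

The additive constant 1 means the curve approaches 1 rather than 0 at high ranks, which is worth keeping in mind when comparing fitted values with the zero frequencies at the bottom of Table 2.1.1.5.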
As can be seen in Table 2.1.1.6, the parameters do not vary excessively; the phoneme representation is quite uniform. Nevertheless, different local phenomena may appear, and these will be analyzed in the subsequent sections. Of course, the differences between individual parameters could be tested, but they are too small to allow general hypothesis building. The homogeneity of the distributions in individual strophes cannot be tested by means of the chi-square test because the frequencies are too small; another test based on ranks could be used instead. To this end, we reorder Table 2.1.1.5 according to phonemes and ascribe them the respective rank in the given strophe. The result can be seen in Table 2.1.1.7. When two or more frequencies are identical, the corresponding phonemes are assigned the mean of the ranks, e.g. if ranks 5, 6, 7 have the same frequency, then all three items receive rank 6. If a frequency is unique, its rank remains as it is. Further, if a phoneme does not occur in a strophe, it obtains the highest rank (highest mean rank).
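The mean-rank rule and the tie function used with it can be sketched as follows. This is an illustrative implementation under the rule just described, with hypothetical function names; the input is the frequency column of the first strophe from Table 2.1.1.5.

```python
from collections import Counter

def mean_ranks(freqs):
    """Rank items by decreasing frequency; tied items share the mean rank."""
    order = sorted(range(len(freqs)), key=lambda i: -freqs[i])
    ranks = [0.0] * len(freqs)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next item has the same frequency.
        while end + 1 < len(order) and freqs[order[end + 1]] == freqs[order[pos]]:
            end += 1
        mean = (pos + 1 + end + 1) / 2  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean
        pos = end + 1
    return ranks

def tie_function(ranks):
    """Sum of t^3 - t over all groups of t identical ranks."""
    return sum(t**3 - t for t in Counter(ranks).values())

# Frequencies of the 29 phonemes in strophe 1 (Table 2.1.1.5), zeros included.
strophe1 = [12, 8, 7, 7, 7, 6, 5, 4, 4, 3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1] + [0] * 8
ranks1 = mean_ranks(strophe1)
print(ranks1, tie_function(ranks1))
```

Phonemes absent from the strophe form one large tie at the bottom, so they all receive the same highest mean rank, and every column of ranks sums to 29(30)/2 = 435.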
Figure 2.1.1.2. Rank-frequencies of phonemes in the fourth strophe of Lacul
The S column contains the sums of the given rows; $V_i$ is a function of ties (in the strophe i). A tie with $t_i$ occurrences corresponds to the function $t_i^3 - t_i$, e.g. in the first column, the rank 4 occurs three times (the phonemes /a/, /e/, /k/), hence $V_{/a/} = 3^3 - 3 = 24$. If there are several ties in the column, the sum of the above function has to be calculated. In column S², there are the squares of the values in the S column. The squared sum of empirical rank sums is given as

$$ SSR = \sum_{i=1}^{N_p} (S_i - \bar{S})^2 = \sum_{i=1}^{N_p} S_i^2 - \frac{1}{N_p}\Big(\sum_{i=1}^{N_p} S_i\Big)^2, \tag{2.1.1.2} $$

yielding in our case SSR = 198356 - 2175²/29 = 35231 (see Table 2.1.1.7). $N_p$ is the number of distinct phonemes in the studied poem, e.g. $N_p$ = 29 for Lacul (see Table 2.1.1.5).
Since we have ties whose sums are given in the last row, we compute Kendall's concordance coefficient in the form:

$$ W = \frac{12\,SSR}{m^2 (N_p^3 - N_p) - m \sum_{i=1}^{m} V_i}, \tag{2.1.1.3} $$

where m is the number of strophes of the studied poem. Lacul has m = 5 strophes of 4 verses each. The value of W need not be calculated because one can directly compute the test-criterion (see below). The computation of the function $V_i$ can be illustrated using the example of the first strophe. Rank 8.5 occurs twice, rank 4 three times, rank 15 three times, rank 11.5 four times, rank 19 five times, and rank 25.5 eight times, hence the function $V_i$ is computed for the first strophe as follows:

$$ V_1 = 1(2^3 - 2) + 2(3^3 - 3) + 1(4^3 - 4) + 1(5^3 - 5) + 1(8^3 - 8) = 6 + 48 + 60 + 120 + 504 = 738. $$
Now we want to find a criterion enabling us to decide whether the strophes are phonically independent. This can be done by means of the chi-square criterion as defined by Kendall (1962: 100):

$$ X^2 = \frac{12\,SSR}{m N_p (N_p + 1) - \dfrac{1}{N_p - 1} \sum_{j=1}^{m} V_j}, \tag{2.1.1.4} $$

yielding a chi-square statistic with $N_p - 1$ degrees of freedom. Inserting the computed values in this formula we obtain

X² = 12(35231)/[5(29)(30) - (1/28)(3930)] = 422772/(4350 - 140.357) = 100.43.

Since we have DF = 28, our result is significant (e.g. for α = 0.0005 we have X² = 50.5), i.e. the use of the phonemes is divergent among the individual strophes. Several consequences can be deduced from this fact. We can no longer, however, ask the author whether our interpretations are correct. Possible inferences are, e.g., that the poem was not written in one go, or that it was corrected subsequently, or that it was not written spontaneously in the form of an improvisation, etc.
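The arithmetic of (2.1.1.2) to (2.1.1.4) can be checked with a short sketch. It takes the column totals of Table 2.1.1.7 as inputs; the function name `kendall_statistics` is an illustrative assumption.

```python
def kendall_statistics(sum_s, sum_s2, n_p, m, sum_v):
    """Return (SSR, W, X2) from rank-sum totals, with tie correction."""
    ssr = sum_s2 - sum_s**2 / n_p                              # (2.1.1.2)
    w = 12 * ssr / (m**2 * (n_p**3 - n_p) - m * sum_v)         # (2.1.1.3)
    x2 = 12 * ssr / (m * n_p * (n_p + 1) - sum_v / (n_p - 1))  # (2.1.1.4)
    return ssr, w, x2

# Totals for Lacul: sum of S, sum of S^2, N_p phonemes, m strophes, sum of V_i.
ssr, w, x2 = kendall_statistics(2175, 198356, 29, 5, 3930)
print(ssr, round(w, 4), round(x2, 2))
```

The two statistics are linked by X² = m(N_p − 1)W, which is why the text can skip W and compute the test criterion directly.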
Table 2.1.1.7: Ranks of phonemes in individual strophes in Lacul (/i/* = voiceless /i/)

phoneme  strophe 1  strophe 2  strophe 3  strophe 4  strophe 5  S       S²
/a/      4          1          1.5        4          2.5        13      169
/ə/      8.5        8          4.5        12         9          42      1764
/ɨ/      11.5       25         7.5        17.5       12         73.5    5402.25
/e/      4          3.5        4.5        4          12         28      784
/e̯/      25.5       19         19.5       22         26.5       112.5   12656.25
/i/      15         3.5        4.5        6.5        7          36.5    1332.25
/i/*     11.5       16.5       15.5       22         19.5       85      7225
/j/      19         14         25.5       12         26.5       97      9409
/o/      11.5       25         19.5       22         19.5       97.5    9506.25
/o̯/      25.5       25         25.5       22         26.5       124.5   15500.25
/u/      6          14         19.5       2          2.5        44      1936
/w/      25.5       19         25.5       27         26.5       123.5   15252.25
/b/      8.5        25         25.5       17.5       19.5       96      9216
/k/      4          8          10         12         7          41      1681
/t∫/     19         25         25.5       27         19.5       116     13456
/d/      15         11         12.5       12         12         62.5    3906.25
/f/      19         25         25.5       17.5       12         99      9801
/g/      19         19         15.5       27         15         95.5    9120.25
/l/      2          11         10         6.5        5          34.5    1190.25
/m/      19         16.5       7.5        12         26.5       81.5    6642.25
/n/      7          14         4.5        1          2.5        29      841
/p/      25.5       6          10         12         19.5       73      5329
/r/      1          3.5        12.5       12         2.5        31.5    992.25
/s/      15         8          1.5        4          7          35.5    1260.25
/∫/      25.5       11         15.5       17.5       19.5       89      7921
/t/      11.5       3.5        19.5       8          12         54.5    2970.25
/ts/     25.5       25         15.5       27         26.5       119.5   14280.25
/v/      25.5       25         25.5       22         19.5       117.5   13806.25
/z/      25.5       25         25.5       27         19.5       122.5   15006.25
Sum S    435        435        435        435        435        2175    198356
Vi       738        882        726        666        918        3930
2.1.2 Euphony in general In literary studies, euphony is a fuzzy concept originating from an individual perception of a text and the intuitive aesthetic evaluation of this perception. Peculiar enough, in music, which is strongly based on euphony (but not always), the concept does not even exist. Instead, various kinds of aesthetics are discussed. In textology, beauty is associated rather with the choice of words or association of ideas, etc. Euphony is some background noise ascribed to the phonic composition of the poem expressed above all by the rhyme. Since rhyme is conscious, the concept of euphony becomes fuzzy if we add to the rhyme also some subconscious phenomena and perform dichotomic decisions about their presence or absence. Definitions that can be found en masse in dictionaries or on the Internet say that euphony is a pleasing or sweet sound or a harmonious succession of words with a pleasing sound – which is simply a tautology, not an operational definition. In order to avoid passionate discussions about the greater harmonious and pleasing succession of words in Sheffield English than in Italian, we try to bestow the concepts of euphony with a more objective correspondence with perceived reality and warrant it a computable existence. In the presented approach, euphony is understood as a regular or nonrandom occurrence of sounds/phonemes or their sequential patterns in a text. We prefer to apply the concept of phoneme because that of sound is rather fuzzy and sound realisation depends always on the automatisms acquired during childhood. In poetry, for which the concept of euphony is considered as relevant – as opposed to prose –, the best known euphonic phenomenon is the rhyme. If, say, we find an /a/ in each position in a verse where a vowel occurs then we are inclined to consider it a euphonic pattern. This event need not be a conscious act of the poet, it may simply be the outcome of the Skinner effect or a phenomenon of self-organisation. 
Nevertheless, there are cases, as e.g. in old Javanese, where a specific vowel sequence is required for each verse. Whatever the cause may be, if we want to consider a pattern euphonic, we must show that the given pattern is significant, i.e. not a random effect. There are several methods to determine this fact:

(i) To ask the author, who perhaps remembers her/his motives and writing method (as long as s/he lives) – however, even if so, a writer will not be able to state exactly the degree of realised euphony.

(ii) To ask informants for their subjective impression; but this method furnishes information only in the form of subjective statements and depends strongly on age, education, gender, social status, dialect, etc. Nevertheless, at least a kind of scaling can be obtained in this way.

(iii) The only objective method is a statistical test of the occurrence of individual phonemes or phonemic patterns for significance. This procedure has several advantages: (a) it is objective; every researcher will obtain the same results; (b) it involves quantification of a very fuzzy concept and can be used for comparisons, classifications, studying the evolution of a writer, etc.; (c) it allows us to determine the entities which evoked euphony; (d) last but not least, we can even compute the probability of a false evaluation.

Up to now, there are no general hypotheses about euphony. Nor are boundary conditions known under which it must, may or cannot occur. The number of statistical studies concerning euphony is very small (cf. Skinner 1939, 1941; Sebeok, Zeps 1959; Meyer-Eppler 1959; Altmann 1963, 1966a,b, 1968; Wimmer et al. 2003). The investigations are local, restricted to a text sort or a writer. Anyway, they show that objective quantification of different phonic phenomena in poetry is possible. In the present chapter, we shall study the general euphony of verses in the sense of non-random occurrence, i.e. a significantly frequent occurrence of some phonemes in the line, and set up an indicator of general euphony of a poem. Since we study only one writer, the starting point is the table of letters, their phonemic correspondences, and their occurrences in the works considered. The titles and sizes of the 46 analysed poems are listed in Table 2.1.2.1 and the corresponding phoneme occurrence is presented in Table 2.1.2.2. It is to be noted that we work with the concept of phoneme, i.e. with a higher abstraction, because of its uniformity as opposed to the variation observed in sounds. We want to present a realistic measurement of euphony; therefore we proceed as follows. Every line is considered a sample of its own. We distinguish the number of vowels V and the number of consonants C in the line.
Since in a vocalic position, a certain vowel can occur or not, its distribution is binomial. Now, if a vowel i occurs two or more times in the line, in general xi-times, we compute the probability of the given or more extreme number of occurrences by means of the formula (2.1.2.1).
Table 2.1.2.1: Titles and sizes of the 46 analysed poems by Eminescu

No.  Poem title                           No of words (text size)
1    Lebăda                               41
2    Peste vârfuri                        47
3    Dintre sute de catarge               50
4    Şi dacă...                           53
5    La mijloc de codru...                55
6    Somnoroase păsărele...               55
7    La steaua                            71
8    Adânca mare…                         75
9    Trecut-au anii                       88
10   Lacul                                90
11   Ce te legeni...                      102
12   Odă în metru antic                   103
13   De ce nu-mi vii                      123
14   Mai am un singur dor                 125
15   Criticilor mei                       130
16   O, mamă…                             140
17   Cu mane zilele-ţi adaogi...          141
18   Revedere                             141
19   Sara pe deal                         156
20   Atât de fragedă…                     176
21   Freamăt de codru                     179
22   Ce-ţi doresc eu ţie, dulce Românie   183
23   Pe lângă plopii fără soţ             199
24   Povestea codrului                    220
25   Floare-albastră                      247
26   Sonete                               265
27   Despărţire                           304
28   Ghazel                               331
29   La moartea lui Heliade               332
30   O, adevăr sublime...                 334
31   Iubită dulce, o, mă lasă             337
32   O călărire în zori                   346
33   Dacă treci râul Selenei              356
34   Rugăciunea unui dac                  357
35   Copii eram noi amandoi               375
36   Glossă                               380
37   Povestea teiului                     390
38   Venere şi Madona                     393
39   Făt-Frumos din tei                   415
40   Dumnezeu şi om                       443
41   Junii corupţi                        458
42   Mortua est!                          491
43   Epigonii                             921
44   Împărat şi proletar                  1510
45   Luceafărul                           1737
46   Scrisoarea III                       2278
$$ P(X \ge x_i) = \sum_{x=x_i}^{V} \binom{V}{x} p_i^{x} q_i^{V-x}, \tag{2.1.2.1} $$

where X is a vocalic phoneme, $p_i$ is its relative frequency with respect to the inventory of vowels and $q_i = 1 - p_i$; analogously for consonants, replacing V by C.

Table 2.1.2.2: Phonemes in Eminescu's 46 poems (/i/* = voiceless /i/)

No.  phoneme  No of occurrences  relative frequency  relative frequency vowels/consonants
Vowels (strong, weak, voiceless)
1    /a/      6172               0.086823            0.181583
2    /ə/      3005               0.042272            0.088408
3    /ɨ/      1733               0.024379            0.050986
4    /e/      7118               0.100131            0.209415
5    /e̯/      859                0.012084            0.025272
6    /i/      4252               0.059814            0.125096
7    /i/*     1142               0.016065            0.033598
8    /j/      2434               0.034240            0.071609
9    /o/      2159               0.030371            0.063519
10   /o̯/      546                0.007681            0.016064
11   /u/      4261               0.059941            0.125360
12   /w/      309                0.004347            0.009091
Consonants
13   /b/      767                0.010790            0.020676
14   /k/      2389               0.033607            0.064399
15   /t∫/     1145               0.016107            0.030865
16   /k’/     223                0.003137            0.006011
17   /d/      2627               0.036955            0.070814
18   /f/      795                0.011183            0.021430
19   /g/      505                0.007104            0.013613
20   /dʒ/     277                0.003897            0.007467
21   /g’/     19                 0.000267            0.000512
22   /h/      67                 0.000943            0.001806
23   /ʒ/      114                0.001604            0.003073
24   /l/      3173               0.044635            0.085533
25   /m/      2539               0.035717            0.068442
26   /n/      4867               0.068465            0.131197
27   /p/      2046               0.028782            0.055153
28   /r/      5220               0.073431            0.140712
29   /s/      2766               0.038910            0.074561
30   /∫/      1222               0.017190            0.032941
31   /t/      3944               0.055481            0.106316
32   /ts/     760                0.010691            0.020487
33   /v/      1116               0.015699            0.030083
34   /z/      514                0.007231            0.013856

The first line of the poem Adânca mare,

Adânca mare sub a lunei faţă;

is transcribed as

/a/d/ɨ/n/k/a/ /m/a/r/e/ /s/u/b/ /a/ /l/u/n/e/j/ /f/a/ts/ə/

In this verse, we have V = 12, C = 11. Considering the vocalic phoneme /a/ we see that it occurs 5 times and its relative frequency with respect to the inventory of vowels is 0.181583 (Table 2.1.2.2); we compute accordingly
$$ P(/a/ \ge 5) = \sum_{x=5}^{12} \binom{12}{x} 0.181583^{x} (1 - 0.181583)^{12-x} $$

= 0.038452 + 0.009953 + 0.001893 + 0.000262 + 0.0000259 + 0.00000172 + 0.0000000695 + 0.000000001285 = 0.05059.
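The individual terms of this sum can be reproduced with a few lines of standard-library Python. This is only a sketch of the binomial tail in (2.1.2.1); the function name `binomial_terms` is an assumption made for illustration.

```python
from math import comb

def binomial_terms(n, x_min, p):
    """Terms C(n, x) * p^x * (1 - p)^(n - x) for x = x_min, ..., n."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(x_min, n + 1)]

# /a/ in the first line of Adânca mare: V = 12 vocalic positions,
# 5 occurrences, relative frequency 0.181583 among vowels.
terms_a = binomial_terms(12, 5, 0.181583)
p_a = sum(terms_a)
print([round(t, 6) for t in terms_a[:4]], round(p_a, 5))
```

For consonants the same function is called with C in place of V and the relative frequency taken from the consonant part of Table 2.1.2.2.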
The phoneme /n/ occurs twice in the given line and its relative frequency with respect to the inventory of consonants is 0.131197 (Table 2.1.2.2), hence we obtain

$$ P(/n/ \ge 2) = \sum_{x=2}^{11} \binom{11}{x} 0.131197^{x} (1 - 0.131197)^{11-x} = 0.433506. $$

If the computed probability is smaller than α, which can be determined conventionally as e.g. 0.05, then we may speak of a euphonic tendency. The extent of euphony contributed by the given phoneme will be measured by the indicator (cf. Wimmer et al. 2003: 60)

$$ CE_{phoneme} = \begin{cases} 100\,[\alpha - P(\xi \ge x)], & \text{if } \alpha > P(\xi \ge x) \\ 0, & \text{otherwise,} \end{cases} \tag{2.1.2.2} $$
where ξ = occurrences of the phoneme. Here, α is the significance level (= 0.05), therefore the Coefficient of Euphony, $CE_{phoneme}$, expressed by (2.1.2.2) is never negative and may attain values in the interval [0; 5.00]. In our example, we obtained in the first case the sum of 0.05059, which is greater than 0.05, hence the five occurrences of /a/ do not display a euphonic effect. The same holds for the two occurrences of /n/, from which it follows that the first line is not constructed euphonically. Performing the above computation for all k phonemes occurring at least twice in the line we obtain the mean euphony indicator for the line as

$$ \overline{CE}_{line} = \frac{100}{k} \sum_{\xi_i \in E} [\alpha - P(\xi_i \ge x_i)], \tag{2.1.2.3} $$

where the $\xi_i$ are the phonemes belonging to the euphonic set E fulfilling the condition E = {phoneme | $CE_{phoneme}$ > 0}. Having performed this computation for all lines of a poem we may define the euphonic value of the whole poem with K lines as
$$ CE_{poem} = \frac{1}{K} \sum_{j=1}^{K} \overline{CE}_{line_j}. \tag{2.1.2.4} $$
For the given poem we may compute the variance of $CE_{poem}$ empirically as

$$ Var(CE_{poem}) = \frac{1}{K} \sum_{j=1}^{K} (\overline{CE}_{line_j} - CE_{poem})^2, \tag{2.1.2.5} $$

a simple expression that can be used for comparing two poems by means of a normal test. For the sake of illustration let us consider the poem Lebăda, given below with each verse followed by its phonemic transcription:

Lebăda
Phonemic transcription
Când pintre valuri ce saltă
/k/ɨ/n/d/ /p/i/n/t/r/e/ /v/a/l/u/r/ i/ /t∫/e/ /s/a/l/t/ə/
Pe baltă
/p/e/ /b/a/l/t/ə/
În ritm uşor,
/ɨ/n/ /r/i/t/m/ /u/∫/o/r/
Lebăda albă cu-aripele-n vânturi
/l/e/b/ə/d/a/ /a/l/b/ə/ /k/w/a/r/i/p/e/l/e/n/ /v/ɨ/n/t/u/r/ i/
În cânturi
/ɨ/n/ /k/ɨ/n/t/u/r/ i/
Se leagănă-n dor;
/s/e/ /l/e̯ /a/g/ə/n/ə/n/ /d/o/r/
Aripele-i albe în raza cea caldă
/a/r/i/p/e/l/e/j/ /a/l/b/e/ /ɨ/n/ /r/a/z/a/ /t∫/a/ /k/a/l/d/ə/
Le scaldă,
/l/e/ /s/k/a/l/d/ə/
Din ele bătând,
/d/i/n/ /j/e/l/e/ /b/ə/t/ɨ/n/d/
Şi-apoi pe luciu, pe unda d-oglinde
/∫/j/a/p/o/j/ /p/e/ /l/u/t∫/u/ /p/e/ /u/n/d/a/ /d/o/g/l/i/n/d/e/
Le-ntinde
/l/e/n/t/i/n/d/e/
O barcă de vânt.
/o/ /b/a/r/k/ə/ /d/e/ /v/ɨ/n/t/
We obtain the results presented in Table 2.1.2.3.

Table 2.1.2.3: Analysis of the poem Lebăda

line no.  phoneme  CEphoneme  C̅E̅line
4         /b/      1.701282   0.24304
5         /ɨ/      3.544263   1.772131
7         /a/      3.088126   0.772031
10        /p/      1.837048   0.204116
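The indicators (2.1.2.2) to (2.1.2.5) can be traced through with a small sketch. It is an illustration under stated assumptions rather than the authors' program: the function names are hypothetical, and the input for line 5 of Lebăda (În cânturi) assumes V = 4 vocalic and C = 5 consonantal positions, with /ɨ/ and /n/ each occurring twice.

```python
from math import comb

ALPHA = 0.05  # significance level used in the text

def tail(n, x, p):
    """P(X >= x) for a binomial variable with n positions and probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def ce_phoneme(n, x, p):
    """Coefficient of euphony of one phoneme, formula (2.1.2.2)."""
    return max(0.0, 100 * (ALPHA - tail(n, x, p)))

def ce_line(repeated):
    """Mean indicator (2.1.2.3); `repeated` holds (n, x, p) for the k
    phonemes that occur at least twice in the line."""
    if not repeated:
        return 0.0
    return sum(ce_phoneme(n, x, p) for n, x, p in repeated) / len(repeated)

def ce_poem(line_values):
    """Mean (2.1.2.4) and variance (2.1.2.5) over all K lines."""
    k = len(line_values)
    mean = sum(line_values) / k
    var = sum((v - mean) ** 2 for v in line_values) / k
    return mean, var

# Line 5 of Lebăda: /ɨ/ twice among 4 vowels, /n/ twice among 5 consonants.
line5 = ce_line([(4, 2, 0.050986), (5, 2, 0.131197)])

# Line values of Lebăda from Table 2.1.2.3; the other eight lines are zero.
lebada = [0, 0, 0, 0.24304, 1.772131, 0, 0.772031, 0, 0, 0.204116, 0, 0]
mean, var = ce_poem(lebada)
print(round(line5, 4), round(mean, 6), round(var, 6))
```

Note that k counts every phoneme occurring at least twice in the line, while only those with a significant tail probability contribute to the sum.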
Adding the numbers in the last column and dividing the sum by the number of lines in the poem (K = 12) we obtain CELebada = 2.991318/12 = 0.249277. Using this mean and the values in the last column, and taking into account the eight lines that have euphony zero, we obtain the variance Var(CELebada) = 0.257629. In this way, the euphonic tendency of all poems can be computed. Here we shall simply order the poems according to increasing euphony as presented in Table 2.1.2.4. The problem of the euphonic weight of individual phonemes is language-dependent – even if it can have an iconic background – but it cannot contribute to answering general questions. We can simply state that individual poems have different euphony values ranging from 0.110634 up to 0.453381. Comparing the first poem (smallest euphony) with the last (highest euphony) by means of a normal test we obtain

$$ u = \frac{|0.110634 - 0.453381|}{\sqrt{\dfrac{0.047272}{56} + \dfrac{0.442159}{80}}} = \frac{0.342747}{0.0798193} = 4.29, $$

which is a highly significant value. Hence euphony played a certain role in Eminescu's work. We may ask two questions: (1) Did Eminescu develop in this respect or did he maintain the same level from the first to the last analysed poem? (2) Does the extent of euphony depend on the length of the poem? The first question can be answered by scrutinising the relation of the euphony of the poem to the year of its origin. Looking at Table 2.1.2.4 and plotting the euphony values according to years in a diagram (cf. Figure 2.1.2.1) we see that in a certain epoch of his creativity, Eminescu began to develop euphony, repeatedly interrupted this evolution and fell back to a lower state, where he began anew. His “pathological” year 1883¹ was, regarding euphony, particularly expanded. This is reminiscent of renewal processes, but the scrutiny of this mechanism must be postponed to the happy time when the works of more writers are at our disposal and we know not only the year of origin but also the dates of first appearance. In any case, one sees a very characteristic historical movement of euphony. Simply taking the means of the years concerned leads to a slight linear increase.
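The normal test used above to compare two poems can be written as a one-line function. The sketch assumes, as in the worked example, that each variance is divided by the number of verses of its poem; the name `compare_poems` is hypothetical.

```python
from math import sqrt

def compare_poems(ce1, var1, k1, ce2, var2, k2):
    """Normal test statistic u for the difference of two CE_poem values."""
    return abs(ce1 - ce2) / sqrt(var1 / k1 + var2 / k2)

# Dumnezeu şi om (56 verses) against Glossă (80 verses), values from Table 2.1.2.4.
u = compare_poems(0.110634, 0.047272, 56, 0.453381, 0.442159, 80)
print(round(u, 2))
```

Any other pair of rows from Table 2.1.2.4 can be compared the same way, since the table lists the euphony value, its variance and the number of verses for every poem.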
1 In June 1883, Eminescu fell seriously ill and finally died in 1889.
Table 2.1.2.4: Euphony in Eminescu's poems (ordered according to increasing values)

Year  Poem title                           No of verses  Euphony poem  Variance euphony
1873  Dumnezeu şi om                       56            0.110634      0.047272
1867  Dacă treci râul Selenei              41            0.126545      0.042433
1867  La moartea lui Heliade               48            0.128128      0.047784
1870  Epigonii                             114           0.137836      0.037605
1883  Peste vârfuri                        12            0.14854       0.066133
1883  Somnoroase păsărele                  16            0.150188      0.116695
1879  Atât de fragedă...                   36            0.153134      0.102216
1883  Odă în metru antic                   20            0.153366      0.072062
1873  Ghazel                               40            0.165371      0.039335
1879  Despărţire                           38            0.165682      0.063814
1887  Venere şi Madona                     48            0.183075      0.069181
1874  Împărat şi proletar                  210           0.183341      0.076543
1879  Rugăciunea unui dac                  46            0.184598      0.067198
1881  Scrisoarea III                       285           0.190571      0.079298
1869  Junii corupţi                        78            0.195154      0.141009
1873  Adânca mare...                       14            0.195572      0.064063
1883  Cu mâine zilele-ţi adaogi...         32            0.208622      0.139903
1866  O călărire în zori                   86            0.2208        0.145174
1871  Iubită dulce, o, mă lasă             56            0.224681      0.096907
1871  Mortua est!                          70            0.225005      0.092254
1874  O, adevăr sublime...                 44            0.228707      0.090329
1867  Ce-ţi doresc eu ţie, dulce Românie   32            0.229817      0.146835
1887  De ce nu-mi vii                      24            0.233317      0.113969
1879  Freamăt de codru                     48            0.245238      0.117649
1869  Lebăda                               12            0.249277      0.257629
1883  Şi dacă...                           12            0.262959      0.34563
1883  Criticilor mei                       28            0.264219      0.147977
1879  Sonete                               42            0.264639      0.132969
1885  Sara pe deal                         24            0.265213      0.088441
1887  Povestea teiului                     88            0.26974       0.217131
1876  Lacul                                20            0.274255      0.152951
1883  Luceafărul                           392           0.27721       0.239471
1883  Mai am un singur dor                 36            0.28178       0.340581
1873  Floare albastră                      56            0.293544      0.24609
1883  Trecut-au anii                       14            0.296258      0.258758
1880  O, mamă...                           18            0.296358      0.12687
1875  Făt-Frumos din tei                   92            0.300824      0.201322
1883  La mijloc de codru                   13            0.329564      0.127905
1871  Copii eram noi amândoi               92            0.336609      0.325554
1883  Pe lângă plopii fără soţ             44            0.342626      0.228156
1879  Revedere                             36            0.353938      0.336106
1886  La steaua                            16            0.365743      0.625435
1880  Dintre sute de catarge               16            0.373583      0.163423
1878  Povestea codrului                    52            0.406063      0.316016
1883  Ce te legeni codrule                 25            0.425026      0.486791
1883  Glossă                               80            0.453381      0.442159
Figure 2.1.2.1. Plot of
Assonance 31
The second question can be answered if we scrutinise the relation as given in Figure 2.1.2.2.
Figure 2.1.2.2. Plot
As can easily be seen, there is no dependence of euphony on poem length. On the contrary, some poems of the same length seem to be written under quite different euphonic regimes. However, as soon as more authors have been analyzed, even this should be shown empirically by a statistical test. In our present study, euphony is a local phenomenon concerning a given poem but there is neither development nor length dependence.
2.2 Assonance

Assonance is reminiscent of an echo, a repetition of a sound sequence in another position of the poem. It must consist of at least two sounds (vowels) in the same linear order; the sequence may be “discontinuous”. While in prose assonance is not always relevant, it may play a certain euphonic role in poetry. Assonance may give rise to parallelism, i.e. repetition of the same sound-sequence in parallel positions in the strophe. This phenomenon can be observed e.g. in Malay folk-quatrains called pantuns (cf. Altmann 1963). In modern poetry, one cannot expect ordered vocalic structures outside of rhyme, and if rhyme does not exist (e.g. in hexameter), vocalic patterns are rather rare. One way of finding vocalic assonance patterns in modern poetry is to study the transitions from one vowel to the next one, so to say, to search for Markov dependencies of the first order. But the computation would be complex and not very lucid (cf. Brainerd 1976; Altmann 1988) since we have several different states (= number of different vowels) in a text. Instead, we simply observe the transitions between vowels, omitting the transition from one verse to the next, and register them in a contingency table. For Romanian, we use the vocalic phonemes: {/a/, /ə/, /ɨ/, /e/, /e̯/, /i/, /i/ (voiceless), /j/, /o/, /o̯/, /u/, /w/}. Registering all transitions we obtain a 12 × 12 contingency table, in which we can find the individual tendencies. We test each individual cell using the criterion
$$ u = \frac{n_{ij} - \dfrac{n_{i.}\, n_{.j}}{n}}{\sqrt{\dfrac{n_{i.}\, n_{.j}\, (n - n_{i.})(n - n_{.j})}{n^2 (n - 1)}}}, \tag{2.2.1} $$

where u is the quantile of the standard normal distribution N(0,1), $n_{ij}$ is the frequency in cell (i,j); $n_{i.}$ is the sum of row i; $n_{.j}$ is the sum of column j, and n is the total sum. The expression $n_{i.} n_{.j}/n$ is the expectation for the cell (i,j), and the expression in the denominator is the standard deviation in the cell (i,j). If u ≥ 1.96, we have a significant vowel pattern, otherwise the pattern can be considered random. Here, we are not interested in different strengths of the patterning, hence we decide dichotomically: if u ≥ 1.96, we have an existing, positive pattern (P), otherwise the sequence is not significant (N). This is why we use the normal distribution rather than the chi-square criterion, which yields only positive results (being the square of (2.2.1)). For the sake of illustration let us present the results from the poem Lacul in Table 2.2.1. Consider the sequence of two a-s /aa/ yielding n/aa/ = 7 and the sequence /au/, n/au/ = 10. Inserting the other numbers from Table 2.2.1 (row sum 36 for /a/, column sums 34 for /a/ and 24 for /u/, n = 160) we obtain:

$$ u_{/aa/} = \frac{7 - \dfrac{36(34)}{160}}{\sqrt{\dfrac{36(34)(160 - 36)(160 - 34)}{160^2 (160 - 1)}}} = -0.2999 $$

$$ u_{/au/} = \frac{10 - \dfrac{36(24)}{160}}{\sqrt{\dfrac{36(24)(160 - 36)(160 - 24)}{160^2 (160 - 1)}}} = 2.43 $$
Table 2.2.1: Frequency of vowel sequences in the poem Lacul (rows: first vowel; columns: second vowel; /i/* = voiceless /i/)

       /a/  /ə/  /ɨ/  /e/  /e̯/  /i/  /i/*  /j/  /o/  /o̯/  /u/  /w/  ni.
/a/    7    6    1    7    0    4    1     0    0    0    10   0    36
/ə/    4    2    1    0    0    3    2     0    1    0    2    0    15
/ɨ/    4    3    2    2    0    0    0     0    0    0    2    0    13
/e/    5    2    0    0    1    5    2     1    0    0    4    1    21
/e̯/    3    0    0    0    0    0    0     0    0    0    0    0    3
/i/    2    3    3    5    0    1    1     3    3    0    3    0    24
/i/*   4    0    1    3    0    0    0     0    0    0    0    0    8
/j/    1    1    0    2    0    1    0     0    0    1    1    0    7
/o/    2    0    0    1    1    1    1     0    0    0    0    0    6
/o̯/    1    0    0    0    0    0    0     0    0    0    0    0    1
/u/    1    3    0    7    1    6    2     1    2    0    2    0    25
/w/    0    0    0    1    0    0    0     0    0    0    0    0    1
n.j    34   20   8    28   3    21   9     5    6    1    24   1    160
Since -1.96 < u/aa/ = -0.2999 < 1.96, the sequence /aa/ does not represent a significant association. The sequence /au/ represents a significant association because u/au/ = 2.43 ≥ 1.96. Performing the same test for all cells of Table 2.2.1 we obtain the results presented in Table 2.2.2. Here we took the value of u = 1.96 as a boundary; however, one can use other quantiles. In a two-sided test, this corresponds to α = 0.05. One could, of course, assign the vowel sequences to different classes – as is usual in phonemics – e.g. those with u < -1.96 to the dissociative class (D) and those within [-1.96; 1.96] to the neutral class (N), but our aim here is to find only existing preferred associations (P).
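The counting of vowel transitions and the cell test (2.2.1) can be sketched as follows. The function names are hypothetical, and the example verse is the vowel sequence of the first line of Lacul as transcribed earlier; transitions are counted within a verse only, as described above.

```python
from math import sqrt

def count_transitions(verses):
    """Count vowel-to-vowel transitions inside each verse (never across verses)."""
    counts = {}
    for vowels in verses:
        for first, second in zip(vowels, vowels[1:]):
            counts[(first, second)] = counts.get((first, second), 0) + 1
    return counts

def cell_u(n_ij, n_i, n_j, n):
    """Test statistic (2.2.1) for one cell of the contingency table."""
    expected = n_i * n_j / n
    sd = sqrt(n_i * n_j * (n - n_i) * (n - n_j) / (n**2 * (n - 1)))
    return (n_ij - expected) / sd

# Vowels of "Lacul codrilor albastru": a u | o i o | a a u
verse1 = ["a", "u", "o", "i", "o", "a", "a", "u"]
print(count_transitions([verse1]))

# The two cells discussed in the text, with marginals from Table 2.2.1.
u_aa = cell_u(7, 36, 34, 160)
u_au = cell_u(10, 36, 24, 160)
print(round(u_aa, 4), round(u_au, 2), u_au >= 1.96)
```

Looping `cell_u` over all 144 cells and marking those with u ≥ 1.96 gives the dichotomous P pattern described in the text.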
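Table 2.2.2: The u-test for individual cells of Table 2.2.1 (P = preferred association; - = not significant; /i/* = voiceless /i/)

       /a/  /ə/  /ɨ/  /e/  /e̯/  /i/  /i/*  /j/  /o/  /o̯/  /u/  /w/
/a/    -    -    -    -    -    -    -     -    -    -    P    -
/ə/    -    -    -    -    -    -    -     -    -    -    -    -
/ɨ/    -    -    -    -    -    -    -     -    -    -    -    -
/e/    -    -    -    -    -    -    -     -    -    -    -    P
/e̯/    P    -    -    -    -    -    -     -    -    -    -    -
/i/    -    -    -    -    -    -    -     P    P    -    -    -
/i/*   P    -    -    -    -    -    -     -    -    -    -    -
/j/    -    -    -    -    -    -    -     -    -    P    -    -
/o/    -    -    -    -    P    -    -     -    -    -    -    -
/o̯/    -    -    -    -    -    -    -     -    -    -    -    -
/u/    -    -    -    -    -    -    -     -    -    -    -    -
/w/    -    -    -    P    -    -    -     -    -    -    -    -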
Evidently, there are associative tendencies in constructing vowel chains. Nine out of one hundred and forty-four sequences are preferred by the poet. In order to see whether the same tendencies exist in other works by Eminescu we analyzed 46 poems and presented the results in Table 2.2.3. Table 2.2.3: Associative two-member chains in Eminescu's poems Poem title
No of Significant chains of phonemes verses
Lebăda
12
/a,ə/, /ə,w/, /ɨ,i/, /i,e/, /i,e/, /o,j/, /u, i/,
Peste vârfuri
12
/a,j/, /ə,o/, /e,a/, /e,o/, / ,e/, /j,ɨ/, /o,u/,/o̯ ,a/, /u, / 9
Şi dacă...
12
/a,ə/, /a,ɨ/, /ɨ,u/, /e̯ ,a/, /i,j/, /j,e/,/o,e̯ /, /o,i/, /u,i/
9
La mijloc de codru...
13
/ə,i/, /ɨ,u/, /e̯ ,a/, /i,e/, /j,e/, /o̯ ,a/, /u,e̯ /
7
Adânca mare…
14
/a,ə/, /ɨ,e̯ /, /ɨ,i/, /e,j/, /e,o̯ /, /e̯ ,a/,/i,ɨ/, /i,o/, /j,e/, /o,i/, /o̯ ,a/
11
Trecut-au anii
14
/a,ə/, /a,e̯ /, /ɨ,a/, /e,u/, /e̯ ,a/, /i,j/, /i,o/, /i,ɨ/, /j,a/, /j,e/, /o,w /, /u, i/, /w,a/
13
Somnoroase păsărele..
16
/a,e/, /e̯ ,a/, /i,j/, /i,ɨ/, /o,i/, /o̯ ,a/, /u,ə/
7
La steaua
16
/a,e̯ /, /a,w/, /ə,i/, /ə,i/, /e̯ ,a/, /i,ɨ/, /i,j/, /i,o/, /j,e/, /o̯ ,a/, /u,i/
11
/a,u/, /ə,ɨ/, /ɨ,u/, /e̯ ,o/, /i,e/, /j,o/, /o,o/, /o̯ ,a/, /u,i/
9
Dintre sute de catarge 16
i
No. 7 i
O, mamă…
18
/a,ə/, /ə,u/, /e,w/, /e̯ ,a/, /i,e/, /i,ɨ/,/j,o/, /o,j/, /o̯ ,a/, /u,ə/
10
Lacul
20
/a,u/, /e,w/, /e̯ ,a/, /i,j/, /i,o/, /i,a/, /j,o̯ /, /o,e̯ /, /w,e/
9
Odă în metru antic
20
/a,e/, /ɨ,ɨ/, /e,i/, /e,w/, /e̯ ,a/, /i,j/, /i,ɨ/, /j,e/, /o,i/, /o̯ ,a/, /w,o/
11
De ce nu-mi vii
24
/ə,w/, /e,u/, /e,w/, /e̯ ,a/, /i,j/, /i,i/, /j,e/, /o,i/, /o,i/, /o̯ ,a/, /u,i/
11
Sara pe deal
24
/a,e̯ /, /ə,i/, /ɨ,ɨ/, /e,i/, /e,i/, /e,w/, /e̯ ,a/, /i,ə/, /i,j/, 12 /j,o̯ /, /o,i/, /o̯ ,a/
Ce te legeni...
25
/a,e̯ /, /a,j/, /ɨ,u/, /e,ɨ/, /e,e/, /e̯ ,a/, /i,o/, /o,u/, /o̯ ,a/, /u,i/, /u,i/
11
Criticilor mei
28
/ə, e̯ /, /ɨ,i/, /e,o̯ /, /e,u/, /e̯ ,a/, /i,j/, /j,e/, /o,i/, /o̯ ,a/, /u,e/, /u,i/, /w,ə/
12
Cu mâne zilele-ţi adaogi...
32
/a,o/, /ə,o̯ /, /e,i/, /e̯ ,a/, /i,e/, /i,w/, /j,e/, /o,ə/, /o̯ ,a/, /w,a/, /w,e̯ /
11
Ce-ţi doresc eu ţie, dulce Românie
32
/ə,i/, /ə,w/, /ɨ,i/, /e,o/, /e,w/, /e̯ ,a/, /i,e̯ /, /i,j/, /i,o/, /j,e/, /o,ɨ/, /o,o̯ /, /o̯ ,a/, /u,i/, /w,i/
15
Revedere
36
/ɨ,ə/, /ɨ,u/, /e,e̯ /, /e,i/, /e̯ ,a/, /i,e/, /j,e/, /j,o/, /o,j/, /o,u/, /o̯ ,a/, /u,i/
12
Atât de fragedă...
36
/ɨ,o/, /e,i/, /e,u/, /e̯ ,a/, /i,j/, /i,o/, /i,w/, /j,e/, /o,i/, 11 /o̯ ,a/, /w,a/
Mai am un singur dor
36
/a,o̯ /, /a,u/, /ə,i/, /e,e/, /e̯ ,a/, /i,e̯ /, /i,j/, /i,ɨ/, /j,e/, /o,j/, /o,o/, /o̯ ,a/, /u,i/, /w,o/
14
Despărţire
38
/ə,w/, /ɨ,o̯ /, /e̯ ,a/, /e̯ ,o̯ /, /i,j/, /j,e/, /o,j/, /o̯ ,a/, /w,ɨ/
9
Ghazel
40
/ə,u/, /ə,w/, /ɨ,o/, /e,i/, /e̯,a/, /i,ə/, /i,ɨ/, /i,o̯/, /j,e/, /o,i/, /o,j/, /o̯,a/, /u,e/, /u,i/
14
Dacă treci râul Selenei
41
/a,ə/, /a,e/, /ə,a/, /ə,i/, /e̯ ,a/, /i,j/, /i,o/, /i,w/, /i,ɨ/, /j,e/, /o,i/, /o,i/, /o,o̯ /, /o̯ ,a/, /u,e̯ /, /w,ɨ/
16
Sonete
42
/a,ə/, /ɨ,u/, /e,e/, /e,j/, /e̯ ,a/, /i,j/, /i,i/, /j,a/, /o,i/, /o̯ ,a/, /u,i/
11
Pe lângă plopii fără soţ
44
/ə,w/, /ɨ,i/, /e̯ ,a/, /i,j/, /o̯ ,a/, /w,e/
6
O, adevăr sublime...
44
/a,e̯ /, /ə,o/, /ə,w/, /e,i/, /e̯ ,a/, /i,ə/, /i,u/, /j,e/, /j,u/, /o,i/, /o̯ ,a/, /w,u/
12
Rugăciunea unui dac
46
/a,e̯ /, /a,w/, /ɨ,u/, /e,ɨ/, /e̯ ,a/, /e̯ ,o/, /i,o/, /j,e/, /o̯ ,a/, /u,j/, /w,o̯ /
11
Venere şi Madona
48
/a,ə/, /ə,u/, /ɨ,o̯/, /e,j/, /e,w/, /e̯,a/, /j,e/, /o,i/, /o,w/, /o̯,a/, /u,ɨ/
11
Freamăt de codru
48
/a,ə/, /ə,o̯ /, /e,w/, /e̯ ,a/, /i,i/, /i,o/, /j,e/, /o,j/, /o̯ ,a/, /u,i/
10
La moartea lui Heliade
48
/a,ə/, /ə,ɨ/, /ɨ,i/, /e̯,a/, /i,j/, /i,o/, /i,w/, /j,e/, /o,o/, /o̯,a/, /u,i/, /u,j/
12
Povestea codrului
52
/a,ə/, /ə,i/, /ɨ,o/, /e̯ ,a/, /i,e̯ /, /i,j/, /i,o/, /j,ɨ/, /j,e/, /o,w/, /o̯ ,a/, /u,i/, /u,o̯ /, /w,o/
14
Iubită dulce, o, mă lasă
56
/a,e̯ /, /a,o/, /ə,j/, /ə,u/, /ə,w/, /e̯ ,a/, /j,e/, /o,i/, /o,u/, /o̯ ,a/, /u,ə/, /w,o/
12
Floare-albastră
56
/a,ə/, /a,ɨ/, /ə,i/, /ə,u/, /ɨ,o/, /ɨ,u/, /e,i/, /e,w/, /e̯ ,a/, /i,e̯ /, /i,j/, /i,w/, /j,e/, /o,i/, /o,j/, /o̯ ,a/, /u,ə/, /u,i/, /w,a/
19
Dumnezeu şi om
56
/a,ə/, /a,e/, /a,u/, /a,w/, /ə,ɨ/, /ə,i/, /ə,u/, /ɨ,ə/, /e̯ ,a/, /i,j/, /i,o̯ /, /j,e/, /o̯ ,a/
13
Mortua est!
70
/a,ə/, /a,e̯ /, /a,j/, /ə,i/, /ɨ,o/, /e,u/, /e̯ ,a/, /i,j/, /i,o̯ /, /i,w/, /j,e/, /o,i/, /o̯ ,a/, /u,i/, /w,ɨ/
15
Junii corupţi
78
/a,ə/, /ə,o̯ /, /ɨ,e̯ /, /ɨ,i/ ,/e̯ ,a/, /i,ə/, /i,j/, /i,ɨ/, /j,e/, /o,w/, /o̯ ,a/, /u,i/
12
Glossă
80
/a,ə/, /ə,ɨ/, /ə,i/, /ə,w/, /e,e/, /e̯ ,a/, /i,j/, /i,o/, /i,ɨ/, /j,e/, /o,o̯ /, /o,w/, /o̯ ,a/, /u, i/, /w,ə/
15
O călărire în zori
86
/a,ə/, /a,j/, /ə,i/, /ə,w/, /ɨ,u/, /e,ɨ/, /e,o/, /e̯ ,a/, /i,j/, /i,ɨ/, /j,a/, /o,i/, /o,w/, /o̯ ,a/, /u,j/
15
Povestea teiului
88
/a,ə/, /a,j/, /ə,ɨ/, /ə,u/, /e,e̯ /, /e,w/, /e̯ ,a/, /e̯ ,o̯ /, /j,a/, /o,i/, /o,u/, /o̯ ,a/, /u, i/, /w,a/
14
Copii eram noi amândoi
92
/a,e/, /ɨ,ə/, /e,w/, /e̯ ,a/, /i,j/, /i,o/, /j,e/, /o,i/, /o̯ ,a/, /u, i/, /u,o̯ /, /w,a/
12
Făt-Frumos din tei
92
/a,ə/, /ɨ,ə/, /e,ɨ/, /e̯,a/, /j,a/, /j,e/, /o,i/, /o,i/, /o,u/, /o,w/, /o̯,a/, /u,i/, /w,a/
13
Epigonii
114
/a,ə/, /a,u/, /a,w/, /ɨ,ə/, /e,i/, /e̯,a/, /i,e̯/, /i,j/, /i,o̯/, /i,ɨ/, /j,e/, /o,j/, /o̯,a/, /u,e/
14
Împărat şi proletar
210
/a,ə/, /a,u/, /ə,ɨ/, /ə,i/, /e,i/, /e̯,a/, /i,e̯/, /i,j/, /i,o̯/, /j,a/, /j,e/, /j,o/, /o,i/, /o,o/, /o̯,a/, /u,i/, /w,ə/
17
Scrisoarea III
285
/a,ə/, /a,u/, /a,w/, /ə,ɨ/, /ɨ,ə/, /ɨ,u/, /e,o̯/, /e̯,a/, /i,e/, /i,j/, /i,o/, /i,ɨ/, /j,a/, /j,e/, /o,e̯/, /o,i/, /o,i/, /o̯,a/, /u,ə/, /u,i/, /u,i/, /w,a/, /w,ɨ/
23
Luceafărul
392
/a,ə/, /a,e̯/, /a,w/, /ə,ɨ/, /ə,i/, /e,o̯/, /e,w/, /e̯,a/, /i,e̯/, /i,i/, /i,j/, /i,o/, /i,ɨ/, /i,o̯/, /j,a/, /j,e/, /o,i/, /o,i/, /o,u/, /o̯,a/, /u,i/
21
2.2.1 The diagonal
Looking at Table 2.2.2 we see that there is no sequence with significant frequency on the diagonal. However, if we look at Table 2.2.3 we find a very small number of identical (= diagonal) pairs. This may indicate that Romanian avoids such pairs, or it may characterise a property of Eminescu's poems. Whatever the “cause”, we may test the behaviour of the diagonal by applying a simple statistical test for the existence of vowel harmony in languages (cf. Altmann 1987; Schulz, Altmann 1988; Altmann, Altmann 2008). In our case, we cross the boundaries of words and test the existence or nonexistence of a tendency. We set up a function of the diagonal cells in the form of their sum S = n11 + n22 + … + nkk, with k as the number of cells on the diagonal, and compare it with the expected sum given as

\[ E(S) = \frac{1}{n} \sum_{i=1}^{k} n_{i\cdot}\, n_{\cdot i} \]

where n_{i·} and n_{·i} are the row and column totals and n is the overall total, using the variance defined as

\[ \mathrm{Var}(S) = \frac{\sum_{i} n_{i\cdot} n_{\cdot i} (n - n_{i\cdot})(n - n_{\cdot i}) + 2 \sum_{i<i'} n_{i\cdot} n_{\cdot i}\, n_{i'\cdot} n_{\cdot i'}}{n^{2}(n-1)} . \tag{2.2.2} \]

The difference between the observed and the expected values, divided by the square root of the variance of S, is approximately distributed according to the standard normal distribution N(0,1). Performing the test

\[ u = \frac{S - E(S)}{\sqrt{\mathrm{Var}(S)}} , \tag{2.2.3} \]

we can state whether the diagonal as a whole exhibits a significantly positive (u > 1.96), a significantly negative (u < −1.96) or a neutral tendency (u ∈ [−1.96; 1.96]). The corresponding chi-square test becomes simpler when only the preference of the diagonal is of interest:
\[ X^{2} = \frac{n \left( nS - \sum_{i=1}^{k} n_{i\cdot} n_{\cdot i} \right)^{2}}{\sum_{i=1}^{k} n_{i\cdot} n_{\cdot i} \left( n^{2} - \sum_{i=1}^{k} n_{i\cdot} n_{\cdot i} \right)} , \tag{2.2.4} \]

which is approximately X2 ≈ u2. For the sake of illustration, we compute the tendency of the diagonal for the data in Table 2.2.1. We designate the two sums in (2.2.2) as A and B respectively and obtain

S = 7 + 2 + 2 + 0 + 0 + 1 + 0 + 0 + 0 + 0 + 2 + 0 = 14
E(S) = [36(34) + 15(20) + 13(8) + 21(28) + 3(3) + 24(21) + 8(9) + 7(5) + 6(6) + 1(1) + 25(24) + 1(1)]/160 = 3474/160 = 21.7125
A = [36(34)(160−36)(160−34) + 15(20)(160−15)(160−20) + … + 1(1)(160−1)(160−1)] / 160² / 159 = 15.34948
B = 2[36(34)15(20) + 36(34)13(8) + … + 36(34)1(1) + … + 25(24)1(1)] / 160² / 159 = 2.33445
Var(S) = A + B = 17.6839

Inserting the needed values in (2.2.3) we obtain

\[ u = \frac{14 - 21.7125}{\sqrt{17.6839}} = -1.834 \]

showing that the diagonal is neutral, although there is a (non-significant) negative tendency, i.e. avoidance of sequences of equal vowels. Using the chi-square criterion we obtain for (2.2.4)

\[ X^{2} = \frac{160\,[160(14) - 3474]^{2}}{3474\,(160^{2} - 3474)} = 3.1697 \]

which is approximately u2 = (−1.834)2 = 3.3635 and not significant; unlike u, it does not show the direction of the tendency.
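The diagonal test can be retraced numerically. The following is only a sketch: the function name `diagonal_test` and the list names are illustrative, and the marginal totals are those quoted in the worked example, with the second row total taken as 15, the value for which the row totals sum to n = 160. Table 2.2.1 itself is not reproduced here, so these inputs are an assumption.

```python
import math

# Marginal totals as quoted in the worked example (assumption: second row
# total is 15, so that the row totals sum to n = 160).
ROW_TOTALS = [36, 15, 13, 21, 3, 24, 8, 7, 6, 1, 25, 1]
COL_TOTALS = [34, 20, 8, 28, 3, 21, 9, 5, 6, 1, 24, 1]
DIAGONAL_SUM = 14  # S, the observed sum of the diagonal cells


def diagonal_test(row_totals, col_totals, diag_sum):
    """Return E(S), Var(S), u and X^2 for the diagonal of a square table."""
    n = sum(row_totals)
    prods = [r * c for r, c in zip(row_totals, col_totals)]
    total = sum(prods)                      # sum of n_i. * n_.i
    expected = total / n                    # E(S)
    # First sum of (2.2.2)
    a = sum(p * (n - r) * (n - c)
            for p, r, c in zip(prods, row_totals, col_totals))
    # Second sum of (2.2.2): 2 * sum_{i<i'} p_i p_i' = (sum p)^2 - sum p^2
    b = total ** 2 - sum(p * p for p in prods)
    variance = (a + b) / (n ** 2 * (n - 1))
    u = (diag_sum - expected) / math.sqrt(variance)         # (2.2.3)
    chi2 = n * (n * diag_sum - total) ** 2 / (total * (n ** 2 - total))  # (2.2.4)
    return expected, variance, u, chi2


expected, variance, u, chi2 = diagonal_test(ROW_TOTALS, COL_TOTALS, DIAGONAL_SUM)
print(f"E(S) = {expected:.4f}, Var(S) = {variance:.4f}, u = {u:.3f}, X2 = {chi2:.4f}")
```

With these inputs the function should return values close to those of the worked example; |u| < 1.96 means the diagonal is classified as neutral.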
Instead of analysing each of the 46 poems separately, we show the numbers of positive associations in all poems in Table 2.2.4, obtained by counting the significant sequences in Table 2.2.3. Table 2.2.4: Numbers of associations of subsequent vowels in 46 poems /a/
/ə/
/e/
/e̯ /
/i/
/i /
/j/
/o/
/a/
0
23
/ə/ /ɨ/
1 1
0 6
2
5
8 2
0 0
8
0
0
5
1 2
10 6
5 1
1 0
/e/
1
0
4
/e̯ /
43
0
0
4
2
6
5
0
0
0
0
/i/
0
4
1
7
7
2
0
/i /
1
0
14
2
0
2
/j/
8
0
2
34
0
/ɨ/
/o̯ /
/u/
/w/
2
1
7
6
2 5
3 2
7 10
10 0
3
3
4
5
12
0
2
2
0
0
28
12
4
0
4
0
1
4
2
1
2
0
0
0
4
2
1
0
/o/
0
1
1
1
3
19
9
9
4
3
7
7
/o̯ /
42
0
0
0
0
0
0
0
0
0
0
0
/u/
0
5
1
3
2
5
24
3
0
2
0
0
/w/
8
3
4
2
1
1
0
0
4
1
1
0
Since (on the diagonal) there are only 12 cases (marked in bold face) out of 563 which indicate association of identical phonemes (/ɨ,ɨ/, /e,e/, /i,i/, /o,o/), we may state that there is a strong tendency to avoid sequences consisting of identical vowels. The most frequent pairs of subsequent vowels are /e̯,a/ (43), /o̯,a/ (42), /j,e/ (34) and /i,j/ (28); they correspond to the diphthongs “ea”, “oa”, “ie” and “ii”.
2.2.2 Symmetry If assonance is symmetric then sequences are to be considered random; the given sequence and the same sounds in reverse order are to be expected with the same frequency. If /e̯ ,a/ is significantly frequent, then we expect also /a,e̯ / to have the same quality. If the situation is different, we may speak about significant asymmetry of assonance. In Table 2.2.4 we observe the asymmetry of assonance in the studied poems. This can be caused both by the given language and the style of the author. If the given language excludes specific sequences (e.g. in Indonesian there are sequences [ә,a] but no sequences [a,ә]), asymmetry is not necessarily given. In general, if vowel sequences are concerned, the more
vowels there are in the language, the smaller is the probability of the existence of all reverse sequences in a poem. Whatever the situation in the given language, asymmetry can be measured and expressed by an indicator. For this purpose, we use the well-known Bowker test, which is based on the comparison of all cells (i,j) with their symmetric cells (j,i) where i ≠ j, i.e. symmetric cells without the diagonal of the contingency table. We set up the criterion

\[ X^{2} = \sum_{i=1}^{k-1} \; \sum_{\substack{j=i+1 \\ n_{ij}+n_{ji} \neq 0}}^{k} \frac{(n_{ij} - n_{ji})^{2}}{n_{ij} + n_{ji}} \tag{2.2.5} \]

which is distributed like a chi-square with k(k−1)/2 degrees of freedom (DF), k being the number of classes, here vowels (k = 12). Using Table 2.2.1 we obtain

\[ X^{2} = \frac{(6-4)^{2}}{6+4} + \frac{(4-1)^{2}}{4+1} + \frac{(1-1)^{2}}{1+1} + \frac{(2-0)^{2}}{2+0} + \dots = 49.9151 \]
With DF = 12(12−1)/2 = 66 this yields P = 0.93, showing that the sequences are quite symmetric. In this way, all poems could be scrutinised. But if we consider the overall Table 2.2.4 and perform the same test, we obtain X2 = 297.802, which with 66 degrees of freedom is highly significant, i.e. it testifies to strong asymmetry. All in all, there is a strong tendency to avoid reverse/symmetric vocalic assonances either in Eminescu's work or in Romanian in general. In any case, the extent of asymmetry can be considered a property of his poems. This is not an automatic result but a matter of poem structure. However, only a comparison with other poets, also in other languages, would help to unveil the strength of this structure. Simple functions applied to the sequence do not yield a better result than R2 = 0.80, but this is surely due to the fact that each poem is an individual creation, and a better result would follow only if we had many poems of the same length. For the time being we consider this problem a future task.
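The criterion (2.2.5) is easy to compute for any square table of sequence frequencies. The sketch below is illustrative only: the function name `bowker_statistic` and the 3 × 3 toy table are hypothetical and are not the data of Table 2.2.1. The degrees of freedom follow the convention used in the text, k(k−1)/2.

```python
def bowker_statistic(table):
    """Return (X^2, DF) of the Bowker symmetry criterion for a square table.

    Cell pairs with n_ij + n_ji = 0 are skipped, as required by (2.2.5).
    DF is taken as k(k-1)/2, following the text.
    """
    k = len(table)
    x2 = 0.0
    for i in range(k - 1):
        for j in range(i + 1, k):
            pair_sum = table[i][j] + table[j][i]
            if pair_sum != 0:
                x2 += (table[i][j] - table[j][i]) ** 2 / pair_sum
    return x2, k * (k - 1) // 2


# Hypothetical toy table (rows: first vowel, columns: second vowel).
toy = [
    [0, 6, 4],
    [4, 0, 1],
    [1, 1, 0],
]
x2, df = bowker_statistic(toy)
print(f"X2 = {x2:.4f} with DF = {df}")
```

The resulting X2 would then be compared with the chi-square distribution with DF degrees of freedom; a large value speaks for asymmetry.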
2.2.3 Poem length and significant sequences
Out of 144 possible vowel sequences in the given poems we find only 95 significant ones. One can expect that the number of significant sequences increases with the number of lines. However, the increase is not linear; it can be captured more adequately by a power function. We apply such a function to the data in Table 2.2.5, which results from counting the significant sequences for each poem in Table 2.2.3. As can be seen in Figure 2.2.1, the number of significant sequences increases with increasing poem length.
Figure 2.2.1. Increase of significant vowel sequences with increasing poem length

Table 2.2.5: Dependence of significant sequences on verse numbers

| No. of verses | No. of significant vowel sequences | No. of verses | No. of significant vowel sequences |
|---|---|---|---|
| 12 | 7 | 41 | 16 |
| 12 | 9 | 42 | 11 |
| 12 | 9 | 44 | 6 |
| 13 | 7 | 44 | 12 |
| 14 | 11 | 46 | 11 |
| 14 | 13 | 48 | 11 |
| 16 | 7 | 48 | 10 |
| 16 | 9 | 48 | 12 |
| 16 | 11 | 52 | 14 |
| 18 | 10 | 56 | 12 |
| 20 | 9 | 56 | 19 |
| 20 | 11 | 56 | 13 |
| 24 | 11 | 70 | 15 |
| 24 | 12 | 78 | 12 |
| 25 | 11 | 80 | 15 |
| 28 | 12 | 86 | 15 |
| 32 | 11 | 88 | 14 |
| 32 | 15 | 92 | 12 |
| 36 | 12 | 92 | 13 |
| 36 | 11 | 114 | 14 |
| 36 | 14 | 210 | 17 |
| 38 | 9 | 285 | 23 |
| 40 | 14 | 392 | 21 |
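A power function y = a·x^b can be fitted to the data of Table 2.2.5 in several ways. The sketch below uses ordinary least squares on the log-transformed values, which is an assumption made here for simplicity and not necessarily the fitting procedure behind Figure 2.2.1. The name `fit_power` is illustrative, and the R² it returns refers to the log-log scale, so it is not directly comparable with determination coefficients quoted in the text.

```python
import math

# (number of verses, number of significant vowel sequences), Table 2.2.5
DATA = [
    (12, 7), (12, 9), (12, 9), (13, 7), (14, 11), (14, 13), (16, 7),
    (16, 9), (16, 11), (18, 10), (20, 9), (20, 11), (24, 11), (24, 12),
    (25, 11), (28, 12), (32, 11), (32, 15), (36, 12), (36, 11), (36, 14),
    (38, 9), (40, 14), (41, 16), (42, 11), (44, 6), (44, 12), (46, 11),
    (48, 11), (48, 10), (48, 12), (52, 14), (56, 12), (56, 19), (56, 13),
    (70, 15), (78, 12), (80, 15), (86, 15), (88, 14), (92, 12), (92, 13),
    (114, 14), (210, 17), (285, 23), (392, 21),
]


def fit_power(data):
    """Fit y = a * x**b by least squares on (ln x, ln y); return a, b, R^2."""
    xs = [math.log(x) for x, _ in data]
    ys = [math.log(y) for _, y in data]
    m = len(data)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    ss_res = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(log_a), b, 1.0 - ss_res / ss_tot


a, b, r2 = fit_power(DATA)
print(f"y = {a:.3f} * x^{b:.3f}, R2 (log-log scale) = {r2:.3f}")
```

An exponent b between 0 and 1 would correspond to the slowing increase described above.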
There is no significant trend; the sequences probably correspond to those usual in Romanian word structure. It will be reasonable to characterise Eminescu only after several Romanian authors have been investigated. An alternative measure of poem size is the number of words, given in Table 2.2.6. The result of the regression analysis is displayed in Figure 2.2.2. As can be seen, the regression is too weak here to be considered real. Table 2.2.6: Title and size of analysed 46 poems by Eminescu Poem title
N words (text size)
No. of significant chains of phonemes
1
Lebăda
41
7
2
Peste vârfuri
47
9
3
Dintre sute de catarge
50
9
4
Şi dacă...
53
9
5
La mijloc de codru...
55
7
6
Somnoroase păsărele...
55
7
7
La steaua
71
11
8
Adânca mare…
75
11
9
Trecut-au anii
88
13
10
Lacul
90
9
11
Ce te legeni...
102
11
12
Odă în metru antic
103
11
13
De ce nu-mi vii
123
11
14
Mai am un singur dor
125
14
15
Criticilor mei
130
12
16
O, mamă…
140
10
17
Cu mâne zilele-ţi adaogi...
141
11
18
Revedere
141
12
19
Sara pe deal
156
12
20
Atât de fragedă…
176
11
21
Freamăt de codru
179
10
22
Ce-ţi doresc eu ţie, dulce Românie
183
15
23
Pe lângă plopii fără soţ
199
6
24
Povestea codrului
220
14
25
Floare-albastră
247
19
26
Sonete
265
11
27
Despărţire
304
9
28
Ghazel
331
14
29
La moartea lui Heliade
332
12
30
O, adevăr sublime...
334
12
31
Iubită dulce, o, mă lasă
337
12
32
O călărire în zori
346
15
33
Dacă treci râul Selenei
356
16
34
Rugăciunea unui dac
357
11
35
Copii eram noi amândoi
375
12
36
Glossă
380
15
37
Povestea teiului
390
14
38
Venere şi Madona
393
11
39
Făt-Frumos din tei
415
13
40
Dumnezeu şi om
443
13
41
Junii corupţi
458
12
42
Mortua est!
491
15
43
Epigonii
921
14
44
Împărat şi proletar
1510
17
45
Luceafărul
1737
21
46
Scrisoarea III
2278
23
Figure 2.2.2. Number of significant phoneme chains vs. poem lengths in words
2.3 Alliteration In poetry, there are two kinds of alliteration (= repetitions of the same sound at the beginning of linguistic entities): (1) in the poem, at the beginning of verses, which we will call Skinner alliteration, (2) in the line, at the beginning of words, which can be called Beowulf alliteration. The evaluation of both kinds may be processed using the same method. However, there are different starting approaches. (a) Stating the relative frequencies of all phonemes (sounds) in the poetic work of the author or (b) stating the relative frequencies only at the beginning of words (or verses) in all his poems; or (c) considering only the given poem and taking all phoneme/sound frequencies or (d) only the initial phonemes/sounds and their frequencies into account, and finally (e) starting from the relative frequencies of phonemes/sounds in the given language. Needless to say, the outcomes of tests may turn out to be very different but none of these “universes of discourse” represents some kind of population in the sense of statistics. They are samples but there is no population which could be called “phoneme/sound frequency with Eminescu” or even “phoneme/sound frequency in Romanian”. Every language changes every day, has many dialects
and idiolects, and a “population” should also represent all spoken utterances. Hence, in language there is no “population” (cf. Orlov, Boroda, Nadarejšvili 1982). Nevertheless, one may always start from a conventionally stated background and obtain a restricted result. Furthermore, two “distance” approaches can be differentiated: (f) taking into account the mutual distance of the given alliterated lines or words, because the repetition of the same sound e.g. at the beginning of the 1st and of the 100th line surely does not have an alliterative effect, or (g) ignoring distance. Computation according to approach (f) is much more difficult and requires some intuitive, subjective decisions about the distance within which alliteration can still be perceived. Here we shall rely on the sound frequency in 146 poems by Eminescu. A study on the basis of the phoneme frequencies in modern Romanian would mean making judgements about samples on the basis of a “population” which arose circa 140 years later; considering the complete “language of Eminescu” would mean only his written and preserved texts, not his spoken ones. Every decision about the choice of a “population” in language is simply a condition under which the analysis will be performed. The situation resembles mathematical theorems beginning with “Let be given…”. Computing the Skinner alliteration does not differ from computing general euphony, but here we do not differentiate vowels and consonants because both can occur at the same position. All formulas have been presented in previous chapters. The results of the computation of Skinner alliteration are presented in Table 2.3.1.
Table 2.3.1: Skinner alliteration in 46 poems by M. Eminescu
Year
Poem
No. of verses
phoneme
alliterative euphonies
Mean Skinner alliteration poem
1869
Lebăda
12
/ɨ/ /l/
1.66509 3.55578
0.43507
1883
Peste vârfuri
12
/m/ /p/
4.93605 0.48573
0.45181
1883
Şi dacă...
12
/j/ /∫/
4.30004 4.99989
0.77499
1883
La mijloc de codru...
13
/l/ /∫/
4.79498 4.99983
0.75345
1873
Adânca mare...
14
/a/ /∫/
2.20067 2.65572
0.34688
1883
Trecut-au anii
14
/k/ /∫/
3.95403 2.65572
0.47213
1883
Somnoroase păsărele
16
/d/ /p/ /s/
3.03033 3.99200 4.71373
0.73350
1886
La steaua
16
/j/ /l/
3.39116 4.53123
0.49515
1880
Dintre sute de catarge
16
/k/ /t∫/ /d/ /v/
3.46930 2.31987 3.03033 4.99050
0.86312
1880
O, mamă...
18
/d/ /m/ /s/
2.28268 4.66702 4.54768
0.63874
1876
Lacul
20
/ɨ/ /s/ /∫/
3.78911 0.90644 4.96607
0.48308
1883
Odă în metru antic
20
/p/
4.99844
0.24992
1887
De ce nu-mi vii
24
/k/ /d/ /s/
4.20917 4.83748 4.79619
0.57679
1885
Sara pe deal
24
/s/ /v/
4.97454 4.38810
0.39011
1883
Ce te legeni codrule
25
/b/ /d/ /∫/ /z/
2.03667 4.80299 4.99998 3.59533
0.61740
1883
Criticilor mei
28
/k/ /t∫/
4.971341 3.98666
0.319929
1883
Cu mâne zilele-ţi adaogi...
32
/k/ /t∫/ /d/ /∫/
4.99917 3.53697 2.06036 4.78625
0.48071
1867
Ce-ţi doresc eu ţie, dulce Românie
32
/k/ /t∫/ /d/ /f/ /v/
2.83420 3.53697 4.39714 0.02892 4.84627
0.48886
1879
Revedere
36
/k/ /t∫/ /m/ /∫/ /v/
4.98289 2.99134 4.12971 4.66840 3.12156
0.55261
1879
Atât de fragedă...
36
/k/ /m/ /∫/
4.32218 1.13610 4.66840
0.28130
1883
Mai am un singur dor
36
/d/ /p/
4.00055 3.05850
0.19608
1879
Despărţire
38
/k/ /t∫/ /s/
4.99999 2.68212 4.67215
0.32511
1873
Ghazel
40
/d/ /p/ /∫/
3.45432 4.90593 4.99999
0.33401
1873
Dacă treci râul Selenei
41
/j/ /k/ /d/ /m/ /p/ /∫/
3.73529 3.82598 3.29193 4.99121 2.01644 4.93282
0.55594
1879
Sonete
42
/k/ /d/ /p/ /s/ /∫/
4.95362 3.11865 4.30731 2.70689 4.41912
0.46442
1883
Pe lăngă plopii fără soţ
44
/o/ /k/ /∫/
4.99518 4.93787 4.31429
0.32380
1874
O, adevăr sublime...
44
/o/ /k/ /∫/ /t/
4.96600 4.66043 4.31429 1.63301
0.35395
1879
Rugăciunea unui dac
46
/j/ /k/ /p/ /s/ /∫/
4.53414 4.91815 4.80177 4.99392 4.99989
0.52713
1887
Venere şi Madonă
48
/k/ /d/ /f/ /∫/
2.78525 1.83675 3.33612 4.99984
0.26996
1879
Freamăt de codru
48
/j/ /k/ /t∫/ /p/ /s/ /∫/
2.62252 4.89379 4.25465 0.09509 1.17964 4.06876
0.35655
1867
La moartea lui Heliade
48
/a/ /o/ /k/ /p/ /v/
2.90305 3.50287 4.89379 0.09509 4.31774
0.32734
1878
Povestea codrului
52
/k/ /t∫/ /∫/
1.98510 4.01405 4.80065
0.20769
1871
Iubită dulce, o, mă lasă
56
/k/ /t∫/ /s/ /∫/
3.88829 4.79065 4.97055 4.99948
0.33302
1873
Floarealbastră
56
/∫/ /v/
4.99999 3.83283
0.15773
1873
Dumnezeu şi om
56
/ɨ/ /d/ /f/ /∫/
3.82567 3.29543 2.50228 4.99508
0.26104
1871
Mortua est!
70
/k/ /d/ /p/ /s/ /∫/
4.99999 3.52076 3.44782 3.09290 4.97938
0.28630
1869
Junii corupţi
78
/ɨ/ /k/ /t∫/ /d/ /s/ /∫/ /v/
4.99795 4.99999 1.24682 4.18110 1.77316 4.99387 1.53272
0.30417
1883
Glossă
80
/t∫/ /d/ /∫/ /t/ /v/
4.99993 4.72079 3.75505 4.99999 4.13341
0.28261
1866
O călărire în zori
86
/ɨ/ /k/ /d/ /p/ /∫/
3.11600 4.93680 4.54513 4.98123 4.99817
0.26253
1887
Povestea teiului
88
/j/ /k/ /d/ /p/ /s/ /∫/
4.91265 4.03543 3.37119 3.63219 4.26721 4.98556
0.28641
1871
Copii eram noi amândoi
92
/ɨ/ /k/ /d/ /m/ /p/ /∫/
2.46870 4.99382 4.98496 3.26839 4.48592 4.99999
0.273932
1875
Făt-Frumos din tei
92
/ɨ/ /j/ /k/ /p/ /∫/
2.46870 4.56485 3.75689 4.48592 4.99999
0.22040
1870
Epigonii
114
/k/ /t∫/ /p/ /s/ /∫/ /v/
4.95225 4.99996 0.28475 4.49828 4.99698 4.99860
0.21694
1874
Împărat şi proletar
210
/ɨ/ /k/ /t∫/ /p/ /s/ /∫/ /v/ /z/
4.78623 4.99999 4.76277 4.99999 3.18147 4.99998 4.35967 3.09749
0.16756
1881
Scrisoarea III
285
/ɨ/ /k/ /t∫/ /d/ /p/ /∫/ /v/
4.80557 4.99999 3.16111 4.99732 4.58681 4.99999 4.99849
0.114208
1883
Luceafărul
392
/ɨ/ /j/ /k/ /d/ /h/ /p/ /s/ /∫/
4.59363 4.99538 4.99999 4.99999 4.94273 4.99950 3.81825 4.99999
0.09783
Figure 2.3.1. Skinner alliteration in 46 poems by Eminescu
Comparing the mean alliteration of poems (last column) with the number of lines, it can easily be seen that the longer the poem, the weaker the weight of alliteration. The result is presented in Figure 2.3.1. The trend can be expressed by a polynomial, but this is not a good solution: the oscillation is too strong to be captured by a simple function. We do not attempt to fit a function, as we currently lack an a priori hypothesis. We simply consider all poems in which the mean alliteration is greater than 0.5 as alliteratively emphasised, independently of the number of lines. In Table 2.3.1, ten poems fulfil this condition, i.e. 21.74%. Since they are mostly short, we can conclude that Skinner alliteration did not play an important role in Eminescu's poetry. But even if we omit the respective poems, the oscillation is too strong, and the best fitting result is yielded by a decay function (e.g. y = ab/(b+x) or y = a/(1 + bx)) with R2 = 0.72. Another question is the increase or decrease of Skinner alliteration in Eminescu's development. We reorder the poems according to the year of publication, take the means of individual years and obtain the results presented in Table 2.3.2.

Table 2.3.2: Historical development of Skinner alliteration in 46 poems by Eminescu (yearly averages are given in the rows labelled Average)

| Year | Poem title | No. of verses | Mean Skinner alliteration |
|---|---|---|---|
| 1866 | O călărire în zori | 86 | 0.26253 |
| 1867 | Ce-ţi doresc eu ţie, dulce Românie | 32 | 0.48886 |
| 1867 | La moartea lui Heliade | 48 | 0.32734 |
| 1867 | Average | | 0.40810 |
| 1869 | Lebăda | 12 | 0.43507 |
| 1869 | Junii corupţi | 78 | 0.30417 |
| 1869 | Average | | 0.36962 |
| 1870 | Epigonii | 114 | 0.21694 |
| 1871 | Iubită dulce, o, mă lasă | 56 | 0.33302 |
| 1871 | Mortua est! | 70 | 0.28630 |
| 1871 | Copii eram noi amândoi | 92 | 0.27393 |
| 1871 | Average | | 0.29775 |
| 1873 | Adânca mare... | 14 | 0.34688 |
| 1873 | Ghazel | 40 | 0.33401 |
| 1873 | Dacă treci râul Selenei | 41 | 0.55594 |
| 1873 | Floare-albastră | 56 | 0.15773 |
| 1873 | Dumnezeu şi om | 56 | 0.26104 |
| 1873 | Average | | 0.33112 |
| 1874 | O, adevăr sublime... | 44 | 0.35395 |
| 1874 | Împărat şi proletar | 210 | 0.16756 |
| 1874 | Average | | 0.26076 |
| 1875 | Făt-Frumos din tei | 92 | 0.22040 |
| 1876 | Lacul | 20 | 0.48308 |
| 1878 | Povestea codrului | 52 | 0.20769 |
| 1879 | Revedere | 36 | 0.55261 |
| 1879 | Atât de fragedă... | 36 | 0.28130 |
| 1879 | Despărţire | 38 | 0.32511 |
| 1879 | Sonete | 42 | 0.46442 |
| 1879 | Rugăciunea unui dac | 46 | 0.52713 |
| 1879 | Freamăt de codru | 48 | 0.35655 |
| 1879 | Average | | 0.41785 |
| 1880 | Dintre sute de catarge | 16 | 0.86312 |
| 1880 | O, mamă... | 18 | 0.63874 |
| 1880 | Average | | 0.75093 |
| 1881 | Scrisoarea III | 285 | 0.11421 |
| 1883 | Peste vârfuri | 12 | 0.45181 |
| 1883 | Şi dacă... | 12 | 0.77499 |
| 1883 | La mijloc de codru... | 13 | 0.75345 |
| 1883 | Trecut-au anii | 14 | 0.47213 |
| 1883 | Somnoroase păsărele | 16 | 0.73350 |
| 1883 | Odă în metru antic | 20 | 0.24992 |
| 1883 | Ce te legeni codrule | 25 | 0.61740 |
| 1883 | Criticilor mei | 28 | 0.31993 |
| 1883 | Cu mâne zilele-ţi adaogi... | 32 | 0.48071 |
| 1883 | Mai am un singur dor | 36 | 0.19608 |
| 1883 | Pe lângă plopii fără soţ | 44 | 0.32380 |
| 1883 | Glossă | 80 | 0.28261 |
| 1883 | Luceafărul | 392 | 0.09783 |
| 1883 | Average | | 0.44263 |
| 1885 | Sara pe deal | 24 | 0.39011 |
| 1886 | La steaua | 16 | 0.49515 |
| 1887 | De ce nu-mi vii | 24 | 0.57679 |
| 1887 | Venere şi Madonă | 48 | 0.26996 |
| 1887 | Povestea teiului | 88 | 0.28641 |
| 1887 | Average | | 0.37772 |
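The yearly averages in Table 2.3.2 are plain arithmetic means of the per-poem values of the same year. A minimal sketch, using only the rows for 1867, 1869 and 1871 as an excerpt (the names `ROWS` and `yearly_means` are illustrative):

```python
from collections import defaultdict

# (year, mean Skinner alliteration) for three years of Table 2.3.2
ROWS = [
    (1867, 0.48886), (1867, 0.32734),
    (1869, 0.43507), (1869, 0.30417),
    (1871, 0.33302), (1871, 0.28630), (1871, 0.27393),
]


def yearly_means(rows):
    """Group the values by year and return {year: arithmetic mean}."""
    groups = defaultdict(list)
    for year, value in rows:
        groups[year].append(value)
    return {year: sum(vals) / len(vals) for year, vals in sorted(groups.items())}


for year, mean in yearly_means(ROWS).items():
    print(year, f"{mean:.5f}")
```

Years represented by a single poem need no averaging; their value enters the historical sequence directly.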
We do not observe any trend in the data; the course of the values is horizontal. We can conclude that Skinner alliteration does not play any important role in Eminescu's poems. Consider now the individual phonemes and their alliterative weight. The results presented in Table 2.3.3 are obtained by counting the number of poems in which a given phoneme has significant alliterative weight. As can be seen, vowels play a secondary role. The euphonic effects are brought about mostly by consonants, above all the fricative /∫/ and the plosives /k/, /d/ and /p/; some phonemes are not used at all in this role. Let us now consider the Beowulf alliteration, i.e. the repetition of the same phonemes at the beginning of words in the verse. The computation is analogous; the formulas can be found in previous chapters. The results for individual poems are presented in Table 2.3.4.
Table 2.3.3: Number of poems in which the given phoneme is significantly Skinner alliterative phoneme
No. of poems
/∫/
32
/k/
29
/d/
21
/p/
19
/s/
16
/t∫/
14
/v/
11
/ɨ/
10
/j/
8
/m/
6
/f/
3
/l/
3
/o/
3
/a/
2
/t/
2
/z/
2
/b/
1
/h/
1
Figure 2.3.2. Beowulf-alliteration in 46 poems by Eminescu Table 2.3.4: Beowulf alliteration in poems by Eminescu
Year
Poem title
No. of verses
Mean Beowulf alliteration
1869
Lebăda
12
0.05347
1883
Peste vârfuri
12
0.46448
1883
Şi dacă...
12
0.14888
1883
La mijloc de codru
13
0.55811
1873
Adânca mare...
14
0.29776
1883
Trecut-au anii
14
0.24343
1883
Somnoroase păsărele
16
0.09431
1886
La steaua
16
0.13558
1880
Dintre sute de catarge
16
0.71598
1880
O, mamă...
18
0.39431
1876
Lacul
20
0.46970
1883
Odă în metru antic
20
0.30460
1887
De ce nu-mi vii
24
0.55745
1885
Sara pe deal
24
0.27876
1883
Ce te legeni codrule
25
0.14388
1883
Criticilor mei
28
0.31361
1883
Cu mâne zilele-ţi adaogi...
32
0.22782
1867
Ce-ţi doresc eu ţie, dulce Românie
32
0.54427
1879
Revedere
36
0.38388
1879
Atât de fragedă...
36
0.33618
1883
Mai am un singur dor
36
0.31839
1879
Despărţire
38
0.40529
1873
Ghazel
40
0.35619
1873
Dacă treci râul Selenei
41
0.30746
1879
Sonete
42
0.28151
1883
Pe lângă plopii fără soţ
44
0.37434
1874
O, adevăr sublime...
44
0.29885
1879
Rugăciunea unui dac
46
0.35243
1887
Venere şi Madonă
48
0.42784
1879
Freamăt de codru
48
0.28827
1867
La moartea lui Heliade
48
0.31488
1878
Povestea codrului
52
0.18022
1871
Iubită dulce, o, mă lasă
56
0.28434
1873
Floare-albastră
56
0.32952
1873
Dumnezeu şi om
56
0.29509
1871
Mortua est!
70
0.38091
1869
Junii corupţi
78
0.41006
1883
Glossă
80
0.49429
1866
O călărire în zori
86
0.24570
1887
Povestea teiului
88
0.37388
1871
Copii eram noi amândoi
92
0.32993
1875
Făt-Frumos din tei
92
0.40153
1870
Epigonii
114
0.38133
1874
Împărat şi proletar
210
0.34650
1881
Scrisoarea III
285
0.38960
1883
Luceafărul
392
0.31057
Even if we take means for individual poem lengths, this impression remains. However, it can depend on the insufficient representation of some poem lengths. In any case, there is no significant tendency according to Figure 2.3.2.
The same image can be obtained by ordering the poems according to the year of origin. There is no noteworthy trend. Even negative results are important: we may state that alliteration of whatever kind did not play any important role in Eminescu's poetry.
2.4 Aggregation Departing from the Skinner effect, according to which “the appearance of a sound in speech raises the probability of occurrence of that sound for some time thereafter” (Skinner 1939) it may be supposed that verses in mutually near neighbourhood are phonetically more similar than those in greater distance. This effect is not necessarily based on semantic effects; it is rather a kind of mechanical self-stimulation. It can be brought into connection with the generally accepted view of neuronal activity, which is maintained over a certain time span and may thus evoke repeatedly the same or a similar stimulus in the brain of the writer. In this way, sounds, words, or other units which were produced on a neural trigger have an increased probability to be repeated as long as the activity has not fully expired. We call this phenomenon aggregation. The distance d is measured simply in terms of the number of “steps” between two verses. Poetic texts form the best material for studying this phenomenon because of the almost equally long text sections (verses) which may be compared. Though we can measure the probability of appearance of a phonic element in a certain distance d after its previous appearance – although we must operate with conditional probabilities – we cannot measure either the degree of spontaneity or the extent of self-stimulation but we can, at least, assume the existence of a mechanism controlling this phenomenon. The rest must be left to neurologists. Now, even if the hypothesis is reasonable, the outcome may be disappointing because the author of a written text can exchange words later on in several places and disturb the tendency. As a matter of fact, only pieces of spoken language can be fully spontaneous but if we have luck, we find the traces of this trend also in poetry. Hence, the non-existence of the aggregation trend is no proof for the absence of spontaneity. 
A material in which this trend can be found with some guarantee are spontaneously narrated epopees of oriental narrators who compose them each time anew (cf. Altmann 1968). The situation is different in music where improvisation is a spontaneous sequence of harmonies and even those “stolen” occasionally from known compositions are inserted because the foregoing sequences simply lead to them.
The possibilities of research in this domain are merely touched and open an interdisciplinary field. One can compare sounds, phonemes, n-grams, syllables, morphemes, places of accent, feet, word lengths, clause lengths, etc. The research is at its beginnings. Here we shall adhere to the method proposed in Altmann (1968) (cf. also Wimmer et al. 2003: 72ff) and construct the data as follows: We stay on the level of phonemes and each verse of a poem is transcribed phonemically. For convenience we provide below the phonemic transcription of the poem Lacul. Lacul
phonemic transcription
Lacul codrilor albastru Nuferi galbeni îl încarcă Tresărind în cercuri albe El cutremură o barcă
/l/a/k/u/l/ /k/o/d/r/i/l/o/r/ /a/l/b/a/s/t/r/u/ /n/u/f/e/r/i/ /g/a/l/b/e/n/i/ /ɨ/l/ /ɨ/n/k/a/r/k/ə/ /t/r/e/s/ə/r/i/n/d/ /ɨ/n/ /t∫/e/r/k/u/r/i/ /a/l/b/e/ /j/e/l/ /k/u/t/r/e/m/u/r/ə/ /o/ /b/a/r/k/ə/
Şi eu trec de-a lung de maluri Parc-ascult şi parc-aştept Ea din trestii să răsară Şi să-mi cadă lin pe piept
/∫/i/ /j/e/w/ /t/r/e/k/ /d/e̯ /a/ /l/u/n/g/ /d/e/ /m/a/l/u/r/i/ /p/a/r/k/a/s/k/u/l/t/ /∫/i/ /p/a/r/k/a/∫/t/e/p/t/ /j/a/ /d/i/n/ /t/r/e/s/t/i/j/ /s/ə/ /r/ə/s/a/r/ə/ /∫/i/ /s/ə/m/i/ /k/a/d/ə/ /l/i/n/ /p/e/ /p/j/e/p/t/
Să sărim în luntrea mică Îngânaţi de glas de ape, Şi să scap din mână cârma, Şi lopeţile să-mi scape;
/s/ə/ /s/ə/r/i/m/ /ɨ/n/ /l/u/n/t/r/e̯ /a/ /m/i/k/ə/ /ɨ/n/g/ɨ/n/a/ts/i/ /d/e/ /g/l/a/s/ /d/e/ /a/p/e/ /∫/i/ /s/ə/ /s/k/a/p/ /d/i/n/ /m/ɨ/n/ə/ /k/ɨ/r/m/a/ /∫/i/ /l/o/p/e/ts/i/l/e/ /s/ə/m/i/ /s/k/a/p/e/
Să plutim cuprinşi de farmec Sub lumina blândei lune – Vântu-n trestii lin foşnească, Unduioasa apă sune!
/s/ə/ /p/l/u/t/i/m/ /k/u/p/r/i/n/∫/i/ /d/e/ /f/a/r/m/e/k/ /s/u/b/ /l/u/m/i/n/a/ /b/l/ɨ/n/d/e/j/ /l/u/n/e/ /v/ɨ/n/t/u/n/ /t/r/e/s/t/i/j/ /l/i/n/ /f/o/∫/n/e̯ /a/s/k/ə/ /u/n/d/u/j/o̯ /a/s/a/ /a/p/ə/ /s/u/n/e/
Dar nu vine... Singuratic În zadar suspin şi sufăr Lângă lacul cel albastru Încărcat cu flori de nufăr
/d/a/r/ /n/u/ /v/i/n/e/ /s/i/n/g/u/r/a/t/i/k/ /ɨ/n/ /z/a/d/a/r/ /s/u/s/p/i/n/ /∫/i/ /s/u/f/ə/r/ /l/ɨ/n/g/ə/ /l/a/k/u/l/ /t∫/e/l/ /a/l/b/a/s/t/r/u/ /ɨ/n/k/ə/r/k/a/t/ /k/u/ /f/l/o/r/i/ /d/e/ /n/u/f/ə/r/
We set up a vector whose elements are the 34 Romanian phonemes PV = {/a/, /ə/, /ɨ/, /e/, /e̯ /, /i/, /i/, /j/, /o/, /o̯ /, /u/, /w/, /b/, /k/, /t∫/, /k’/, /d/, /f/, /g/, /dʒ/, /g’/, /h/, /ʒ/, /l/, /m/, /n/, /p/, /r/, /s/, /∫/, /t/, /ts/, /v/, /z/}.
We then replace the phonemes by their frequencies in the given verse. Analyzing the whole poem consisting of 20 verses we obtain the following phoneme vectors:

PV1 = 〈3,0,0,0,0,1,0,0,2,0,2,0,1,2,0,0,1,0,0,0,0,0,0,4,0,0,0,3,1,0,1,0,0,0〉
PV2 = 〈2,1,2,2,0,0,2,0,0,0,1,0,1,2,0,0,0,1,1,0,0,0,0,2,0,3,0,2,0,0,0,0,0,0〉
PV3 = 〈1,1,1,3,0,1,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,0,0,1,0,2,0,4,1,0,1,0,0,0〉
PV4 = 〈1,2,0,2,0,0,0,1,1,0,2,0,1,2,0,0,0,0,0,0,0,0,0,1,1,0,0,3,0,0,1,0,0,0〉
PV5 = 〈2,0,0,3,1,1,1,1,0,0,2,1,0,1,0,0,2,0,1,0,0,0,0,2,1,1,0,2,0,1,1,0,0,0〉
PV6 = 〈4,0,0,1,0,1,0,0,0,0,1,0,0,3,0,0,0,0,0,0,0,0,0,1,0,0,3,2,1,2,3,0,0,0〉
PV7 = 〈2,3,0,1,0,2,0,2,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,3,3,0,2,0,0,0〉
PV8 = 〈1,2,0,2,0,3,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,1,1,3,0,1,1,1,0,0,0〉
PV9 = 〈1,3,1,0,1,2,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,2,2,0,2,2,0,1,0,0,0〉
PV10 = 〈3,0,2,3,0,0,1,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,1,0,2,1,0,1,0,0,1,0,0〉
PV11 = 〈2,2,2,0,0,2,0,0,0,0,0,0,0,2,0,0,1,0,0,0,0,0,0,0,2,2,1,1,2,1,0,0,0,0〉
PV12 = 〈1,1,0,3,0,2,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,2,1,0,2,0,2,1,0,1,0,0〉
PV13 = 〈1,1,0,2,0,2,1,0,0,0,2,0,0,2,0,0,1,1,0,0,0,0,0,1,2,1,2,2,1,1,1,0,0,0〉
PV14 = 〈1,0,1,2,0,1,0,1,0,0,3,0,2,0,0,0,1,0,0,0,0,0,0,3,1,3,0,0,1,0,0,0,0,0〉
PV15 = 〈1,1,1,1,1,2,0,1,1,0,1,0,0,1,0,0,0,1,0,0,0,0,0,1,0,4,0,1,2,1,3,0,1,0〉
PV16 = 〈3,1,0,1,0,0,0,1,0,1,3,0,0,0,0,0,1,0,0,0,0,0,0,0,0,2,1,0,2,0,0,0,0,0〉
PV17 = 〈2,0,0,1,0,3,0,0,0,0,2,0,0,1,0,0,1,0,1,0,0,0,0,0,0,3,0,2,1,0,1,0,1,0〉
PV18 = 〈2,1,1,0,0,2,0,0,0,0,2,0,0,0,0,0,1,1,0,0,0,0,0,0,0,2,1,2,3,1,0,0,0,1〉
PV19 = 〈3,1,1,1,0,0,0,0,0,0,2,0,1,1,1,0,0,0,1,0,0,0,0,5,0,1,0,1,1,0,1,0,0,0〉
PV20 = 〈1,2,1,1,0,0,1,0,1,0,2,0,0,3,0,0,1,2,0,0,0,0,0,1,0,2,0,3,0,0,1,0,0,0〉

We shall call the components of one vector xi, those of another yi. In order to state the phonemic similarity of two verses we use the cosine indicator without normalising the values, given as

(2.4.1) $\cos(PV_1, PV_2) = \dfrac{x_1 y_1 + x_2 y_2 + \dots + x_k y_k}{\sqrt{x_1^2 + x_2^2 + \dots + x_k^2}\,\sqrt{y_1^2 + y_2^2 + \dots + y_k^2}} = \dfrac{\sum_{i=1}^{k} x_i y_i}{\sqrt{\sum_{i=1}^{k} x_i^2 \, \sum_{i=1}^{k} y_i^2}}$

where k = 34 represents the number of Romanian phonemes, and from (2.4.1) we obtain the angle in radians using

(2.4.2) $\tau_{ij} = \arccos(\cos(PV_i, PV_j))$.
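The computation in (2.4.1) and (2.4.2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cosine` and `angle` are ours, not the authors', and the two vectors are copied from PV1 and PV2 above.

```python
import math

# Frequency vectors of the first two verses of Lacul, copied from the text.
PV1 = [3,0,0,0,0,1,0,0,2,0,2,0,1,2,0,0,1,0,0,0,0,0,0,4,0,0,0,3,1,0,1,0,0,0]
PV2 = [2,1,2,2,0,0,2,0,0,0,1,0,1,2,0,0,0,1,1,0,0,0,0,2,0,3,0,2,0,0,0,0,0,0]

def cosine(x, y):
    """Cosine of the angle between two frequency vectors, as in (2.4.1)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def angle(x, y):
    """Angle tau in radians, as in (2.4.2)."""
    # Clamp to [-1, 1] so that rounding noise cannot push acos out of its domain.
    c = max(-1.0, min(1.0, cosine(x, y)))
    return math.acos(c)

print(round(cosine(PV1, PV2), 6))  # about 0.583383
print(round(angle(PV1, PV2), 6))   # about 0.947908
```

Since all components are non-negative frequencies, the cosine lies between 0 and 1 and the angle between 0 and π/2.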
Aggregation 59
The greater τ is, the greater is the phonic dissimilarity of two verses. If τ = 0, perfect similarity is given. Different modifications are possible but we consider the result as it is. We compute the mean (dis)similarity of verses at a certain distance d and take d = 1, evaluating (2.4.2) for all immediately neighbouring verses, i.e. (1,2), (2,3),…,(19,20). For the first and the second verse we obtain (leaving out pairs in which one of the elements is 0):

$\cos(PV_1, PV_2) = \dfrac{3(2) + 2(1) + 1(1) + 2(2) + 4(2) + 3(2)}{\sqrt{4^2 + 2(3)^2 + 3(2)^2 + 5(1)^2}\,\sqrt{3^2 + 7(2)^2 + 5(1)^2}} = \dfrac{27}{7.1414(6.4807)} = 0.583383$
From this, τ12 = arccos(0.583383) = 0.947908. After computing all τ among neighbouring verses, we obtain their mean τ(1) as the sum of all respective τi,i+1 divided by 19, since there are 19 neighbouring pairs. Thus the mean τ for distance 1 is

τ(1) = (0.947908 + 0.666946 + 0.701674 + 0.797118 + 0.937744 + 1.032968 + 0.970128 + 0.913333 + 1.244796 + 1.039012 + 0.986940 + 0.746283 + 0.957312 + 0.917633 + 0.998581 + 0.877128 + 0.696339 + 1.101363 + 0.987134) / 19 = 0.922123.

In order to obtain directly an expression for similarity (there are dozens of different formulas), we simply take

(2.4.3) $S = 1 - \dfrac{2\tau(d)}{\pi}$

which attains values in the interval [0, 1], where 0 is the smallest similarity and 1 the greatest similarity, and present the results in Table 2.4.1.
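The averaging step and the transformation (2.4.3) can be checked with a short Python sketch. The names `mean_dissimilarity`, `similarity` and `pairs_at_distance` are illustrative assumptions; the 19 angles are those listed in the text for distance d = 1.

```python
import math

# The 19 angles tau(i, i+1) between neighbouring verses of Lacul, from the text.
TAU_D1 = [0.947908, 0.666946, 0.701674, 0.797118, 0.937744, 1.032968,
          0.970128, 0.913333, 1.244796, 1.039012, 0.986940, 0.746283,
          0.957312, 0.917633, 0.998581, 0.877128, 0.696339, 1.101363,
          0.987134]

def pairs_at_distance(n_verses, d):
    """0-based index pairs (i, i + d); there are n_verses - d of them."""
    return [(i, i + d) for i in range(n_verses - d)]

def mean_dissimilarity(taus):
    """Mean angle tau(d) over all verse pairs at one distance."""
    return sum(taus) / len(taus)

def similarity(tau_mean):
    """Similarity S = 1 - 2*tau(d)/pi, as in (2.4.3)."""
    return 1.0 - 2.0 * tau_mean / math.pi

tau_1 = mean_dissimilarity(TAU_D1)
print(round(tau_1, 6), round(similarity(tau_1), 6))  # about 0.922123 and 0.412958
print(len(pairs_at_distance(20, 18)))                # only 2 pairs at distance 18
```

The last line illustrates why means at large distances rest on very few observations in a 20-verse poem.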
The graphical representation of the course of similarity in dependence on distance is presented in Figure 2.4.1.

Figure 2.4.1. The development of mean phonemic similarity with increasing distance in Lacul

Table 2.4.1: Mean phonic similarities of verses of Lacul at distance d

Distance d | Mean dissimilarity τ(d) | Similarity S = 1 − 2τ(d)/π
1 | 0.922123 | 0.412958
2 | 0.880339 | 0.439559
3 | 0.980918 | 0.375528
4 | 0.953367 | 0.393068
5 | 0.951558 | 0.394220
6 | 1.020575 | 0.350282
7 | 0.983388 | 0.373956
8 | 1.012124 | 0.355662
9 | 0.956406 | 0.391133
10 | 1.014036 | 0.354445
11 | 0.978028 | 0.377368
12 | 0.991043 | 0.369082
13 | 1.012168 | 0.355634
14 | 0.987895 | 0.371086
15 | 0.953101 | 0.393237
16 | 0.908692 | 0.421509
17 | 0.841329 | 0.464393
18 | 0.618277 | 0.606393
19 | 0.865573 | 0.448959
In spontaneously created poetry – like folklore – we expect a decrease of mean similarity. However, folklore and artistic poetry may differ drastically. In folklore, we frequently have spontaneous creation of a known story – though learning very long works by heart was nothing exceptional (e.g. in India). The stories have been told in one go and shortened or prolonged according to the interest of the audience. In artistic poetry there may be strivings for different effects, e.g. echoes at the end of a long poem or memories of some previous states, repetitions of the given rhyme, phonetic allusions, etc. In Lacul one sees rather a convex curve which, at its end, makes a jump. Since there are only 20 verses, the largest distances are measured in very few cases (distance 18 in two pairs, distance 19 in a single pair), hence these values are not reliable means. However, one can see that the first (Lacul codrilor albastru = The blue lake of the woods) and the 19th (Lângă lacul cel albastru = Nearby the blue lake) verses are more similar because the latter plays the role of a refrain. If we accept only mean values based on at least ten observations, the decreasing trend can be observed but its significance cannot be shown. Evidently one must scrutinise longer texts and take care of refrains. In Table 2.4.2 we present the poem Glossă with its English translation by A. G. Sahlean (cf. http://users.rcn.com/luceafarul/translators_gl.html, 02.07.2012) indicating the “echoed” verses with the same number.

Table 2.4.2: Echoes in the poem Glossă
Glossă
Glossa
Vremea trece, vremea vine,
1
Time goes by, time comes along,
Toate-s vechi şi nouă toate;
2
All is old and all is new;
Ce e rău şi ce e bine
3
What is right and what is wrong,
Tu te-ntreabă şi socoate;
4
You must think and ask of you;
Nu spera şi nu ai teamă,
5
Have no hope and have no fear,
Ce e val ca valul trece;
6
Waves that rise can never hold;
De te-ndeamnă, de te cheamă,
7
If they urge or if they cheer,
Tu rămâi la toate rece.
8
You remain aloof and cold.
Multe trec pe dinainte,
To our sight a lot will glisten,
În auz ne sună multe,
Many sounds will reach our ear;
Cine ţine toate minte
Who could take the time to listen
Şi ar sta să le asculte ?...
And remember all we hear?
Tu aşează-te deoparte,
Keep aside from all that patter,
Regăsindu-te pe tine,
Seek yourself, far from the throng
Când cu zgomote deşarte
When with loud and idle clatter
Vremea trece, vremea vine.
1
Time goes by, time comes along.
Nici încline a ei limbă
Nor forget the tongue of reason
Recea cumpăn-a gândirii
Or its even scales depress
Înspre clipa ce se schimbă
When the moment, changing season,
Pentru masca fericirii,
Wears the mask of happiness -
Ce din moartea ei se naşte
It is born of reason's slumber
Şi o clipă ţine poate;
And may last a wink as true:
Pentru cine o cunoaşte
For the one who knows its number
Toate-s vechi şi nouă toate.
2
All is old and all is new.
Privitor ca la teatru
Be as to a play, spectator,
Tu în lume să te-nchipui;
As the world unfolds before:
Joace unul şi pe patru,
You will know the heart of matter
Totuşi tu ghici-vei chipu-i,
Should they act two parts or four;
Şi de plânge, de se ceartă,
When they cry or tear asunder
Tu în colţ petreci în tine
From your seat enjoy along
Şi-nţelegi din a lor artă
And you'll learn from art to wonder
Ce e rău şi ce e bine.
3
What is right and what is wrong.
Viitorul şi trecutul
Past and future, ever blending,
Sunt a filei două feţe,
Are the twin sides of same page:
Vede-n capăt începutul
New start will begin with ending
Cine ştie să le-nveţe;
When you know to learn from age;
Tot ce-a fost ori o să fie
All that was or be tomorrow
În prezent le-avem pe toate,
We have in the present, too;
Dar de-a lor zădărnicie
But what's vain and futile sorrow
Tu te-ntreabă şi socoate.
4
You must think and ask of you;
Căci aceloraşi mijloace
For the living cannot sever
Se supun câte există,
From the means we've always had:
Şi de mii de ani încoace
Now, as years ago, and ever,
Lumea-i veselă şi tristă;
Men are happy or are sad:
Alte măşti, aceeaşi piesă,
Other masks, same play repeated;
Alte guri, aceeaşi gamă,
Diff'rent tongues, same words to hear;
Amăgit atât de-adese
Of your dreams so often cheated,
Nu spera şi nu ai teamă.
5
Have no hope and have no fear.
Nu spera când vezi mişeii
Hope not when the villains cluster
La izbândă făcând punte,
By success and glory drawn:
Te-or întrece nătărăii,
Fools with perfect lack of luster
De ai fi cu stea în frunte;
Will outshine Hyperion!
Teamă n-ai, căta-vor iarăşi
Fear it not, they'll push each other
Între dânşii să se plece,
To reach higher in the fold,
Nu te prinde lor tovarăş;
Do not side with them as brother,
Ce e val ca valul trece.
6
Waves that rise can never hold.
Cu un cântec de sirenă,
Sounds of siren songs call steady
Lumea-ntinde lucii mreje;
Toward golden nets, astray;
Ca să schimbe-actorii-n scenă,
Life attracts you into eddies
Te momeşte în vârteje;
To change actors in the play;
Tu pe-alături te strecoară,
Steal aside from crowd and bustle,
Nu băga nici chiar de seamă,
Do not look, seem not to hear
Din cărarea ta afară
From your path, away from hustle,
De te-ndeamnă, de te cheamă.
7
If they urge or if they cheer;
De te-ating, să feri în laturi,
If they reach for you, go faster,
De hulesc, să taci din gură;
Hold your tongue when slanders yell;
Ce mai vrei cu-a tale sfaturi,
Your advice they cannot master,
Dacă ştii a lor măsură;
Don't you know their measure well?
Zică toţi ce vor să zică,
Let them talk and let them chatter,
Treacă-n lume cine-o trece;
Let all go past, young and old;
Ca să nu-ndrăgeşti nimică,
Unattached to man or matter,
Tu rămâi la toate rece.
8
You remain aloof and cold.
Tu rămâi la toate rece,
8
You remain aloof and cold
De te-ndeamnă, de te cheamă;
7
If they urge or if they cheer;
Ce e val ca valul trece,
6
Waves that rise can never hold,
Nu spera şi nu ai teamă;
5
Have no hope and have no fear;
Tu te-ntreabă şi socoate
4
You must think and ask of you
Ce e rău şi ce e bine;
3
What is right and what is wrong;
Toate-s vechi şi nouă toate;
2
All is old and all is new,
Vremea trece, vremea vine.
1
Time goes by, time comes along.
The development of mean phonemic similarity with increasing distance in the poem Glossă is shown in Figure 2.4.2. The echoes are systematically placed; they add similarity at different distances and thereby destroy any trace of spontaneity (in our view). A quite different course of similarity in dependence on distance is presented in Figure 2.4.3 for the poem Luceafărul. Thus, the poetic texts of each author must be scrutinised separately and the descriptions need not be general. Long poems are seldom written without pause and, in the end, they may have been corrected many times. In this way, some creative processes are glossed over or modified, words are replaced by other ones, etc., and the phonic elements of the creation become lost. However, frequently even all a posteriori modifications do not destroy everything, and poets may intuitively create some classes of poems with common features, in our case, common phonetic similarities depending on distance. Analyzing Eminescu's work, we found several types regarding the course of mean similarities with increasing distance. None of them could be captured by simple curves; in each case a polynomial of very high order was necessary. Thus characterisation in this way is not fruitful.
Figure 2.4.2. The development of mean phonic similarity with increasing distance in the poem Glossă
Figure 2.4.3. The development of mean phonic similarity with increasing distance in the poem Luceafărul
However, almost all the poems have a common feature: the variance of the similarity increases with increasing distance. In order to capture the state of the similarity, we have chosen a static and a dynamic characterisation of the poems. The static characterisation may be performed by a non-weighted indicator called non-smoothness (cf. Popescu et al. 2010: 95 ff.) consisting in the enumeration of local extremes. A point xi on the curve is a local maximum if xi−1 < xi > xi+1, and a local minimum if xi−1 > xi < xi+1. In the last column of Table 2.4.1 we find m = 13 extremes (including the first and the last values). Non-smoothness is defined as

(2.4.4) $NS = \dfrac{m - 2}{n_d - 2}$,

where m is the number of local extremes and nd is the number of distances in the poem. Since the first and the last value are automatically extremes, they are subtracted both from m and from nd. As can easily be seen, in Lacul we obtain NS(Lacul) = (13 – 2)/(19 – 2) = 0.6471. This indicator is a simple proportion and can easily be treated statistically. However, (2.4.4) does not express the magnitude of the extremes. In order to weight the oscillation, Popescu et al. (2010: 97) proposed the indicator of roughness defined as

(2.4.5) $R = \dfrac{(m - 2)\,L}{(n_d - 2)(n_d - 1)\sqrt{2}}$,
where L is the arc length computed as

(2.4.6) $L = \sum_{i=1}^{n_d - 1} \left[ (x_i - x_{i+1})^2 + 1^2 \right]^{1/2}$

normalised by its maximum, which is given by

(2.4.7) $L_{max} = \sum_{i=1}^{n_d - 1} \left[ (0 - 1)^2 + 1^2 \right]^{1/2} = (n_d - 1)\sqrt{2}$.
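Formulas (2.4.4) to (2.4.7) can be tried out on the similarity column of Table 2.4.1. The following Python sketch is an illustration only; the function names are our own assumptions, and the input values are the rounded similarities for Lacul.

```python
import math

# Similarities S for Lacul at distances d = 1..19 (last column of Table 2.4.1).
S_LACUL = [0.412958, 0.439559, 0.375528, 0.393068, 0.394220, 0.350282,
           0.373956, 0.355662, 0.391133, 0.354445, 0.377368, 0.369082,
           0.355634, 0.371086, 0.393237, 0.421509, 0.464393, 0.606393,
           0.448959]

def count_extremes(x):
    """Number m of local extremes; first and last values count automatically."""
    inner = sum(1 for i in range(1, len(x) - 1)
                if (x[i - 1] < x[i] > x[i + 1]) or (x[i - 1] > x[i] < x[i + 1]))
    return inner + 2

def non_smoothness(x):
    """NS = (m - 2) / (n_d - 2), as in (2.4.4)."""
    return (count_extremes(x) - 2) / (len(x) - 2)

def arc_length(x):
    """L as in (2.4.6), with unit steps on the distance axis."""
    return sum(math.sqrt((x[i] - x[i + 1]) ** 2 + 1.0) for i in range(len(x) - 1))

def roughness(x):
    """R = (m - 2) L / ((n_d - 2)(n_d - 1) sqrt(2)), as in (2.4.5)."""
    m, nd = count_extremes(x), len(x)
    return (m - 2) * arc_length(x) / ((nd - 2) * (nd - 1) * math.sqrt(2.0))

print(count_extremes(S_LACUL))            # 13
print(round(non_smoothness(S_LACUL), 4))  # 0.6471
print(round(roughness(S_LACUL), 4))       # about 0.4583
```

Ties between neighbouring values are not counted as extremes here, which follows the strict inequalities in the definition.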
Indicator R attains its minimum, 0, if the arc is a straight line, and its maximum, 1, if the values oscillate regularly between 0 and 1. To demonstrate the computation we can state that in Lacul there are nd = 19 points out of which m = 13 are extremes. The arc length can be obtained as

$L = [(0.439559 - 0.412958)^2 + 1]^{1/2} + [(0.375528 - 0.439559)^2 + 1]^{1/2} + \dots + [(0.448959 - 0.606393)^2 + 1]^{1/2} = 18.029689$.

Inserting these values in (2.4.5) we obtain

$R = \dfrac{(13 - 2)\,18.029689}{(19 - 2)(19 - 1)\sqrt{2}} = 0.458294$,

i.e., approximately a mean roughness. The values for other poems by Eminescu can be found in Table 2.4.3, and the graphics of roughness and poem length in Figure 2.4.4.

Table 2.4.3: Similarity roughness in 46 poems by Eminescu

Poem title | n verses | Roughness
Lebăda | 12 | 0.472067
Peste vârfuri | 12 | 0.629308
Şi dacă... | 12 | 0.709386
La mijloc de codru... | 13 | 0.495605
Adânca mare... | 14 | 0.579076
Trecut-au anii | 14 | 0.515271
Somnoroase păsărele... | 16 | 0.599956
La steaua | 16 | 0.381355
Dintre sute de catarge | 16 | 0.655667
O, mamă... | 18 | 0.424618
Lacul | 20 | 0.458294
Odă în metru antic | 20 | 0.499768
De ce nu-mi vii | 24 | 0.439849
Sara pe deal | 24 | 0.539198
Ce te legeni codrule | 25 | 0.482380
Criticilor mei | 28 | 0.425023
Cu mâne zilele-ţi adaogi… | 32 | 0.488748
Ce-ţi doresc eu ţie, dulce Românie | 32 | 0.488421
Revedere | 36 | 0.386118
Atât de fragedă | 36 | 0.493082
Mai am un singur dor | 36 | 0.471823
Despărţire | 38 | 0.383963
Ghazel | 40 | 0.458873
Dacă treci râul Selenei | 41 | 0.409476
Sonete | 42 | 0.526059
Pe lângă plopii fără soţ | 44 | 0.448790
O, adevăr sublime... | 44 | 0.414093
Rugăciunea unui dac | 46 | 0.411280
Venere şi Madonă | 48 | 0.424563
Freamăt de codru | 48 | 0.487451
La moartea lui Heliade | 48 | 0.471841
Povestea codrului | 52 | 0.447547
Iubită dulce, o, mă lasă | 56 | 0.413737
Floare-albastră | 56 | 0.493925
Dumnezeu şi om | 56 | 0.453710
Mortua est! | 70 | 0.432799
Junii corupţi | 78 | 0.462185
Glossă | 80 | 0.516858
O călărire în zori | 86 | 0.528599
Povestea teiului | 88 | 0.407723
Copii eram noi amândoi | 92 | 0.532498
Făt-Frumos din tei | 92 | 0.524711
Epigonii | 114 | 0.484215
Împărat şi proletar | 210 | 0.450940
Scrisoarea III | 285 | 0.421290
Luceafărul | 392 | 0.479955
Figure 2.4.4. Poem length and roughness
Now, in order to interpret roughness we consider the possibility that with spontaneous writing the Skinner effect is active, similarity decreases regularly and slowly and the arc is small. Hence the smaller roughness, the greater spontaneity: the poem was written “fluently”. But the more pauses there are during the writing process, the more echoes are placed (artificially) in the poem and the more additional corrections were made, the greater will be the roughness. Thus, roughness discloses something of the creative process but without the possibility of asking the author we cannot give a definitive statement. In any case, poems with smaller than mean roughness were produced with greater spontaneity – in general. As can be seen in Figure 2.4.4, roughness of the majority of the poems under study lies under 0.5 and testifies to the possibility that Eminescu wrote them “rather” spontaneously. The longer poems approximate the mean 0.5 and reveal that the author made a lot of supplementary changes, or made a number of pauses while writing, or sought a phonic effect, echo, etc., though his original way of writing might have been spontaneous. Only a small number of shorter poems display great roughness, i.e. “construction” difficulties or supplementary changes of the text.
The dynamic characterisation of the increasing oscillation can be performed using the stepwise variance (SV) of the computed similarities defined as

(2.4.8) $SV = \dfrac{1}{d - 1} \sum_{i=1}^{d} (S_i - \bar{S}_d)^2$
where Si are the individual similarities and S̄d is their mean up to distance d. For illustration, let us consider the poem Lebăda. Taking, e.g., the first three distances we obtain the mean similarity 0.364403 and the variance 0.001244 up to distance d = 3. Computing all stepwise variances we obtain the results in the last column of Table 2.4.4, displayed graphically in Figure 2.4.5. In Figure 2.4.5 the numbers of the last column of Table 2.4.4 are multiplied by 1000 for better readability.

Table 2.4.4: Stepwise variance of similarities in the poem Lebăda

Distance d | Similarity S | Stepwise mean | Stepwise variance (SV)
1 | 0.330972 | 0.330972 | -
2 | 0.401263 | 0.366118 | 0.002470
3 | 0.360975 | 0.364403 | 0.001244
4 | 0.325068 | 0.354570 | 0.001216
5 | 0.337641 | 0.351184 | 0.000969
6 | 0.377393 | 0.355552 | 0.000890
7 | 0.373982 | 0.358185 | 0.000790
8 | 0.379664 | 0.360870 | 0.000735
9 | 0.339634 | 0.358510 | 0.000693
10 | 0.423059 | 0.364965 | 0.001033
11 | 0.522756 | 0.379310 | 0.003193
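The stepwise mean and stepwise variance of (2.4.8) can be recomputed from the similarity column for Lebăda. The sketch below is illustrative; `stepwise_variance` is a hypothetical helper name, and the inputs are the rounded similarities from the table.

```python
# Similarities S for Lebăda at distances d = 1..11 (second column of the table).
S_LEBADA = [0.330972, 0.401263, 0.360975, 0.325068, 0.337641, 0.377393,
            0.373982, 0.379664, 0.339634, 0.423059, 0.522756]

def stepwise_variance(s):
    """Return (d, stepwise mean, stepwise variance) for d = 1..len(s).

    The variance uses the divisor d - 1, as in (2.4.8); it is undefined
    for d = 1 and reported as None there.
    """
    rows = []
    for d in range(1, len(s) + 1):
        head = s[:d]
        mean = sum(head) / d
        var = None if d == 1 else sum((v - mean) ** 2 for v in head) / (d - 1)
        rows.append((d, mean, var))
    return rows

for d, mean, var in stepwise_variance(S_LEBADA)[:3]:
    print(d, round(mean, 6), None if var is None else round(var, 6))
```

For d = 2 and d = 3 this reproduces the tabulated values 0.002470 and 0.001244 up to rounding.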
This is, of course, a somewhat smoother sequence than the similarities themselves and can be captured by the beta-function,

(2.4.9) $SV = a\,(d - 2)^{b}\,(M - d)^{c}$

where a, b, c, and M are fitting parameters and d is the independent variable (distance).
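To see how the beta-function behaves, one can evaluate it with the parameters reported for Lebăda in the fitting table further below (a = 2.3493, b = −0.4306, c = −0.2654, M = 11.0089; variances multiplied by 1000). The function name `beta_sv` and the domain check are our assumptions, not part of the original fitting procedure.

```python
def beta_sv(d, a, b, c, M):
    """Beta-function SV = a (d - 2)^b (M - d)^c, as in (2.4.9)."""
    # With negative exponents the curve diverges at d = 2 and d = M,
    # so only the open interval is accepted in this sketch.
    if not (2 < d < M):
        raise ValueError("d must lie strictly between 2 and M")
    return a * (d - 2) ** b * (M - d) ** c

# Parameters reported for Lebăda (stepwise variances multiplied by 1000).
LEBADA = dict(a=2.3493, b=-0.4306, c=-0.2654, M=11.0089)

print(round(beta_sv(11, **LEBADA), 2))  # about 3.19; observed 1000 * SV at d = 11 is 3.193
```

Because both exponents are negative for this poem, the curve rises towards both ends of the interval, which gives the U-shape mentioned in the text.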
Figure 2.4.5. Beta fitting of U-shaped variance of similarities in Lebăda
Figure 2.4.6. Beta fitting of U-shaped variance of similarities in Luceafărul
Generally, besides its easy linguistic interpretation (cf. Popescu, Čech, Altmann 2011: 103), the beta-function appears to be most versatile, as further illustrated for Luceafărul (392 verses) in Figure 2.4.6, Glossă (80 verses) in Figure 2.4.7, Freamăt de codru (48 verses) in Figure 2.4.8 and Şi dacă... (12 verses) in Figure 2.4.9. The SV curves have a U-shape, an inverted sigmoid or other monotonic shapes, either ascending or descending. The results of beta fitting to stepwise variances of similarities for other poems are presented in Table 2.4.5 below.
Figure 2.4.7. Beta fitting of the variance of similarities in Glossă
Figure 2.4.8. Beta fitting of the variance of similarities in Freamăt de codru
A small number of poems cannot be satisfactorily fitted by the beta-function. We suppose the effect of specific boundary conditions that are not yet known and should be incorporated in the theory later on. In some cases, the parameter a takes on enormous values, which have to be compensated by the other parameters. Also parameter M cannot always be explained empirically, though theoretically it is the maximum of x. Up to now, there is no satisfactory theory concerning the interplay of forces in texts. However, the results give an impetus for both literary and psycholinguistic investigations.
Figure 2.4.9. Beta fitting of the variance of similarities in Şi dacă…
Table 2.4.5: Fitting of beta-function (2.4.9) to stepwise variances of similarities (multiplied by 1000)

Poem title | n verses | a | b | c | M | R²
Lebăda | 12 | 2.3493 | -0.4306 | -0.2654 | 11.0089 | 0.98
Peste vârfuri | 12 | 0.1169 | 1.0894 | -0.0655 | 11.0003 | 0.93
Şi dacă... | 12 | 12.3976 | -0.5407 | -0.3606 | 16.1572 | 0.94
La mijloc de codru... | 13 | 1.2474 | 0.0789 | 0.0071 | 12.0000 | 0.86
Adânca mare... | 14 | 1.2745 | -7.1E-8 | -0.2891 | 14.2913 | 0.57
Trecut-au anii | 14 | 0.4839 | 0.2059 | -0.2675 | 13.0083 | 0.99
Somnoroase păsărele... | 16 | 5571.203 | -0.6197 | -2.7514 | 23.7679 | 0.63
La steaua | 16 | 1.3119 | -2E-9 | -0.4255 | 15.1716 | 0.83
Dintre sute de catarge | 16 | 1.6076 | 0.2446 | 0.2499 | 15.0000 | 0.44
O, mamă... | 18 | 1.0765 | -1E-13 | -0.2986 | 19.4406 | 0.52
Lacul | 20 | 83.3140 | -0.6644 | -1.4705 | 21.2663 | 0.78
Odă în metru antic | 20 | 1.7370 | -1.8E-9 | -0.6308 | 20.3687 | 0.87
De ce nu-mi vii | 24 | 11.6359 | 3.6506 | -2.6821 | 107.8785 | 0.85
Sara pe deal | 24 | 12.2982 | 0.2102 | -1.0728 | 47.8712 | 0.89
Ce te legeni codrule | 25 | 44.6701 | 1.5114 | -2.2456 | 60.7189 | 0.89
Criticilor mei | 28 | 132.7336 | 0.00017 | -1.8380 | 35.7149 | 0.60
Cu mâne zilele-ţi adaogi | 32 | 48.3140 | 2.7223 | -1.9622 | 444.8317 | 0.66
Ce-ţi doresc eu ţie, dulce Românie | 32 | 0.1020 | 0.7227 | -0.1598 | 31.0098 | 0.93
Revedere | 36 | 1.5975 | -0.2331 | -0.3290 | 35.0139 | 0.98
Atât de fragedă... | 36 | 3.2043 | 0.1718 | -0.6844 | 69.1686 | 0.75
Mai am un singur dor | 36 | 2.4047 | -0.2276 | -0.5714 | 36.1521 | 0.92
Despărţire | 38 | 6.0967 | -0.4217 | -0.7629 | 40.5750 | 0.82
Ghazel | 40 | 1.3139 | -0.0355 | -0.4976 | 42.7757 | 0.91
Dacă treci râul Selenei | 41 | 7.8334 | -0.3357 | -1.1493 | 45.7117 | 0.95
Sonete | 42 | 0.0855 | 0.5007 | -0.2268 | 41.0493 | 0.97
Pe lângă plopii fără soţ | 44 | 1.8410 | 0.0475 | -0.6434 | 47.2725 | 0.90
O, adevăr sublime... | 44 | 10.5413 | 7.48E-8 | -1.1991 | 52.7014 | 0.93
Rugăciunea unui dac | 46 | 0.7278 | -0.2065 | -0.2231 | 45.1293 | 0.77
Venere şi Madonă | 48 | 0.4995 | 0.1174 | -0.5464 | 48.4942 | 0.92
Freamăt de codru | 48 | 0.5710 | 0.5304 | -0.8757 | 52.7284 | 0.97
La moartea lui Heliade | 48 | 0.4784 | 0.0620 | -0.2752 | 47.0190 | 0.95
Povestea codrului | 52 | 230.9531 | -0.3735 | -1.3829 | 73.4993 | 0.88
Iubită dulce, o, mă lasă | 56 | 197.7745 | -0.2191 | -1.7486 | 76.1385 | 0.91
Floare-albastră | 56 | 0.9176 | -0.1208 | -0.4055 | 55.7411 | 0.91
Dumnezeu şi om | 56 | 0.1257 | 0.2524 | -0.3413 | 56.1415 | 0.90
Mortua est! | 70 | 10.1021 | -0.3649 | -0.8539 | 73.7914 | 0.90
Junii corupţi | 78 | 1.7058 | -0.1296 | -0.5118 | 81.6006 | 0.94
Glossă | 80 | 22.1707 | -0.4299 | -0.9037 | 79.4903 | 0.99
O călărire în zori | 86 | 2.4177 | -0.1933 | -0.4362 | 86.7347 | 0.90
Povestea teiului | 88 | 1.2982 | 0.1839 | -0.7542 | 102.6669 | 0.95
Copii eram noi amândoi | 92 | 243.4634 | -0.3241 | -1.3718 | 125.6920 | 0.81
Făt-Frumos din tei | 92 | 5.8398 | -0.2242 | -0.8559 | 95.7490 | 0.97
Epigonii | 114 | 2.1441 | -0.2022 | -0.7339 | 122.7160 | 0.92
Împărat şi proletar | 210 | 0.6965 | -0.2423 | -0.4111 | 211.0245 | 0.93
Scrisoarea III | 285 | 5.1373 | -0.3903 | -0.6547 | 289.804 | 0.96
Luceafărul | 392 | 0.9638 | -0.2562 | -0.3700 | 391.8325 | 0.94
2.5 Rhyme

The present section is devoted to properties of rhyme words, i.e. word-forms at the end of the verse/line with a phonic counterpart at the end of some further verse/line. For our purposes, we will operationalise the concept word-form in a simplistic, graphical way (cf. Chapter 3.2. Frequency distribution) as is common also in computational and corpus linguistics: We will consider a string of characters separated from other strings by white spaces, punctuation marks or the end of a verse/line as a word-form (apostrophes, hyphens etc. are not considered as characters and are removed from the strings).

If rhyme is present in a poem, it fulfils its main euphonic function. Since the number of possible rhymes in a language is finite – as can be shown by a combinatorial argument – the frequent use of two rhyme words together wears the given rhyme out. It does not bring much surprise; on the contrary, it fulfils a kind of expectation and in the course of time it loses its effectiveness. Besides, not all words with a rhyming counterpart can be used within the same strophe because the respective words do not possess a matching semantic association or at least a kind of semantic contiguity. Generally, one and the same poet cannot afford to repeat the same rhyme in his further poems. Poetic originality is not only a matter of ideas but also a matter of form, as distinct from scientific, journalistic or political texts in which the same matter can be steadily continued without any formal restrictions. Thus the rhyming technique of a poet must change in the course of time, and this change may concern any of the many properties both of rhyme (cf. e.g. http://en.wikipedia.org/wiki/Rhyme, 02.7.2012) and of the choice of rhyming words. Here we can take into consideration only a few of them. We shall show some quantitative properties and follow their development or at least their proportions and differences.

(1) Word length. Rhyming words may have different syllabic lengths and the frequencies form a distribution.
(2) Open and closed rhyme. A rhyming word ending with a vowel is open, one ending with a consonant is closed. This concerns, of course, the phonetic and not the written form. The rhyme can be mixed, too, i.e. one of the rhyming words is open, the second is closed; hence each word must be counted separately. In a sonnet there are 14 rhyming words.
(3) Masculine and feminine rhyme. The former contains a word with the stress on the last syllable, the latter has the stress on the penultimate syllable.
(4) Parts of speech. Each word belongs to a part of speech which may be marked morphologically or, in some more analytic languages, at least syntactically. A poet may emphasise things, properties or references and other “Aristotelian” categories in different ways. It depends, of course, on the given language, which part of speech may occur at the end of a verse.
2.5.1 Word length

The number of syllables (= word length) in rhyme words may be constant in some poetries but with Eminescu it is a free property controlled by meaning, which has priority. Nevertheless, we assume that there is some basic distribution which makes its way and controls the formation of rhyme words. It may follow the distribution of word length valid for the language as a whole but this is not very probable because in prose there are other regimes for manipulating length than in poetry. The existence of a distribution of rhyme-word lengths is, at the same time, a sign of some kind of inner rhythm. This rhythm may be disturbed in three extreme cases: (i) if the poem is too short, testing a distribution turns out to be a problem, e.g. because of zero degrees of freedom; (ii) in very long poems it may be disturbed by the fact that the poem was not written in one go; the author made pauses and after every pause a different mechanism could arise; (iii) if in most poems (of appropriate length) we discover some pattern in rhyme-word length and find a poem not obeying such a regularity, it may be a sign of many corrections made a posteriori, or a sign of non-spontaneity. In the majority of cases one cannot ask the poet and considers the given poem as an exception. On the other hand, spontaneity is a process whose quality/degree cannot be measured.

Analyzing 141 poems by Eminescu we arrived inductively at the hypothesis that rhyme-word length abides by a Poisson regime. Consider the rhyme positions as urns in which monosyllables, disyllables, etc. are placed. The urns may exert some influence, which may be neutral (random), attracting or repelling, thus yielding different distributions. But also the content may exert influence on the rhyme and this may lead to the rise of a whole family of distributions. Our task is to find this family. The simplest way is to apply the general theory as proposed by Wimmer and Altmann (2005). We suppose that if there is a mechanism controlling the rise of a distribution of rhyme-word lengths, then the probability Px of finding rhyme-word length x is proportional to the probability of the class x−1. This approach has been prolific in deriving many language laws. In discrete cases we simply suppose that (2.5.1)
$\dfrac{\Delta P_{x-1}}{P_{x-1}} = g(x)$,
where ΔPx−1 = Px – Px−1, i.e. the difference of two neighbouring classes, and g(x) is a proportionality function. Put simply, the relative difference of two neighbouring classes can be expressed by a function g(x). Taking, for example, g(x) = a/x – 1, one obtains Px = (a/x)Px−1 whose solution yields the Poisson distribution (see below). This approach enables us to avoid the application of stochastic processes, which are not popular among linguists. We begin instead directly with the resulting difference equations whose solution yields – in our case – a family of Poisson-type distributions. Rewriting (2.5.1) as

(2.5.2) $\dfrac{P_x - P_{x-1}}{P_{x-1}} = g(x)$

we obtain

(2.5.3) $P_x = (1 + g(x))\,P_{x-1}$

where g(x) represents a function which yields a preliminary, very general interpretation of the situation in language.
We write

(2.5.4) $g(x) = a_0 + \dfrac{a_1}{(x + b_1)^{c_1}} + \dfrac{a_2}{(x + b_2)^{c_2}}$

and interpret a0 as a proportionality constant which is present in any linguistic distribution. It is given as a constant of language, not as a property of the individual speaker or writer (poet). The second component expresses the simplest relationship of the neighbouring classes of the distribution, frequently controlled by two constants (b1 and c1), showing the mechanism of self-regulation. The numerator is a function of the speaker, the denominator is the control instrument of the hearer which cannot allow the property to attain infinite values. Here, it is the guarantee of convergence. Instead of introducing further functions exerting influence on Px, one collects them in the third component, considering it as the ceteris paribus condition. Here b2 and c2 are mostly large in order to curb the activity of the speaker, who is forced to take into account other intervening phenomena. In every case, the parameters and the components as a whole may obtain a different interpretation. Further research in this direction is necessary.

In Eminescu's poems, we found four types of Poisson-type distributions, all of them being special cases of (2.5.3). Since length x = 0 does not exist, all distributions must be displaced one step to the right, i.e. displacement means in the formulas that x is replaced by x−1, and the support changes to x = 1,2,…. If there are poems without words of length x = 1, then one must displace the distribution two steps to the right. The simplest special case is the Poisson distribution. Here we set a0 = −1, a1 = a, b1 = a2 = 0, c1 = 1 and obtain

(2.5.5) $P_x = \dfrac{a}{x}\,P_{x-1}$
Solving the equation stepwise yields
(2.5.6) $P_x = \dfrac{a^{x} e^{-a}}{x!}, \quad x = 0, 1, 2, \dots$
or in displaced form

(2.5.7) $P_x = \dfrac{a^{x-1} e^{-a}}{(x-1)!}, \quad x = 1, 2, \dots$
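The step from the recurrence (2.5.3) with the proportionality function (2.5.4) to the Poisson distribution can be checked numerically. The sketch below is an illustration under assumptions: the function names are ours, the support is truncated at a finite `x_max`, and the normalising constant is obtained by summation rather than analytically. The parameter value a = 0.7170 is the one reported for Adânca mare… in the table that follows.

```python
import math

def g(x, a0=-1.0, a1=1.0, b1=0.0, c1=1.0, a2=0.0, b2=0.0, c2=1.0):
    """Proportionality function (2.5.4): a0 + a1/(x+b1)^c1 + a2/(x+b2)^c2."""
    value = a0 + a1 / (x + b1) ** c1
    if a2 != 0.0:
        value += a2 / (x + b2) ** c2
    return value

def pmf_from_recurrence(g_func, x_max):
    """Iterate P_x = (1 + g(x)) P_{x-1} for x = 1..x_max and normalise.

    Returns probabilities for x = 0..x_max (truncated support).
    """
    weights = [1.0]  # unnormalised P_0
    for x in range(1, x_max + 1):
        weights.append((1.0 + g_func(x)) * weights[-1])
    total = sum(weights)
    return [w / total for w in weights]

def displaced(pmf, shift=1):
    """Move the support `shift` steps to the right, e.g. x = 1, 2, ..."""
    return {x + shift: p for x, p in enumerate(pmf)}

a = 0.7170  # Poisson parameter reported for Adânca mare… in the text
poisson = pmf_from_recurrence(lambda x: g(x, a0=-1.0, a1=a), x_max=60)
closed_form = [math.exp(-a) * a ** x / math.factorial(x) for x in range(6)]
print([round(p, 6) for p in poisson[:6]])
print([round(p, 6) for p in closed_form])
```

The two printed lists should agree to many decimals, which confirms that the special case a0 = −1, a1 = a, b1 = a2 = 0, c1 = 1 reproduces (2.5.6); `displaced(poisson)` then gives the form (2.5.7).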
Fitting this distribution to Eminescu's rhyme-word lengths yielded the results presented in Table 2.5.1. Here, the empirical frequencies are given in the second column; for example, in Adânca mare… there are 8 monosyllabic rhyme-words, 3 disyllabic, 1 trisyllabic, and 2 quadrisyllabic ones. The letter a symbolises the parameter of the Poisson distribution; X² is the result of the chi-square test for goodness-of-fit performed with DF degrees of freedom and yielding the probability P. If P is greater than 0.01 – which has been chosen because of the very short tails of the empirical data – we consider the fitting result as satisfactory and accept the hypothesis that the data are Poisson-distributed.

Table 2.5.1: Fitting the (displaced) Poisson distribution to rhyme-word lengths in Eminescu's poems (x = 1,2,…)
Poem title
Adânca mare…
Empirical distribution
a
X2
DF
P
8,3,1,2
0.7170
1.17
1
0.27
Atât de fragedă…
9,19,6,2
1.0441
4.12
2
0.13
Când amintirile...
4,14,3,2,0,0,1
1.2215
5.64
2
0.06
6,7,6,1
1.1021
0.23
1
0.63
Când priveşti oglinda mărei
7,14,9,2
1.2389
2.7
2
0.26
Ce e amorul?
6,11,6,4
1.3516
0.43
2
0.81
Ce te legeni...
Când marea...
6,6,8,3,1
1.4967
1.31
2
0.52
Cine-i?
9,14,6,1
0.9990
2.04
2
0.36
Crăiasa din poveşti
0,8,4,2
1.4366
5.67
2
0.06
1,9,3,1
1.3981
6.13
2
0.05
Cu mâne zilele-ţi adaogi...
6,6,3,0,1
0.9588
0.04
2
0.98
Cum oceanu-ntărâtat...
0,4,8,1,1
1.9464
6.54
1
0.01
De câte ori, iubito...
1,4,5,3,1
1.9992
0.53
1
0.47
De-aş avea
2,7,3,11,0,1
2.3374
2.39
1
0.12
De-aş muri ori de-ai muri
8,15,10,2,1
1.2764
1.97
3
0.58
Criticilor mei
De-oi adormi (variantă)
10,19,3,1,2,1
0.8544
4.56
1
0.03
De-or trece anii...
5,6,4,1
1.1447
0.008
1
0.93
Departe sunt de tine
6,10,2
0.8428
3.03
1
0.08
18,15,4,1
0.6918
0.43
2
0.81
3,4,5,2
1.5232
0.9
2
0.64
1,4,3,2,2
2.0717
0.47
3
0.93
4,8,4
1.1037
1.25
1
0.26
7,9,10,7,1,1,1
1.6705
1.1
3
0.78
4,2,8,1,1
1.6165
6.45
3
0.69
2,6,4
1.3172
1.26
1
0.26
7,3,0,1,1
0.6334
0.43
1
0.51
3,4,5,0,1,0,1
1.5181
0.9
2
0.64
8,5,3
0.7375
0.12
1
0.73
6,9,5,1,1
1.1695
0.35
2
0.84
6,11,2,0,0,1
0.9248
2.96
2
0.23
4,5,3
1.0198
0.12
1
0.73
0,6,6,1,0,1
1.9453
2.54
1
0.11
18,26,15,3,1,0,1
1.1625
1.64
3
0.65
2,5,5,4
1.8606
0.3
2
0.86
6,9,7,5,2
1.6227
0.33
3
0.95
La moartea principelui Ştirbey
2,7,7
1.5773
1.16
1
0.28
La mormântul lui Aron Pumnul
5,7,7,4,1,0,1
1.6754
0.2
3
0.98
La o artistă (Ca a nopţii poezie)
7,8,7,14,1,1
2.0621
11.8
4
0.02
9,14,4,1
0.9152
2.28
2
0.32
3,8,4,1
1.2155
2.06
2
0.36
5,10,10,3
1.4814
2.58
2
0.28
13,16,4,1,1,1
0.9527
1.54
2
0.46
Misterele nopţii
4,7,8,11,1,1
2.1709
6.19
4
0.19
Noaptea
2,11,2,4,0,1
1.5351
6.88
3
0.08
Nu e steluţă
2,4,3,1,2
1.6323
0.09
2
0.95
Nu mă-nţelegi
4,8,9,1,2
1.6305
3.02
3
0.39
12,17,3,1,2,1
0.8542
1.93
1
0.16
Despărţire Din Berlin la Potsdam Din lyra spartă... Din noaptea Din străinătate Dintre sute de catarge Doi aştri Dorinţa Foaia veştedă (după Lenau) Frumoasă şi jună Horia Iar când voi fi pământ (variantă) Îngere palid... Iubind în taină... Iubitei La mijloc de codru... La moartea lui Neamţu
La o artistă (Credeam ieri) La steaua Locul aripelor Mai am un singur dor
Nu voi mormânt bogat (variantă)
O arfă pe-un mormânt
0,11,4,3,5
2.0288
8.77
3
0.03
10,7,1
0.5343
0.72
1
0.39
3,5,8,2
1.6324
3.55
2
0.17
17,17,8,1,1
0.9053
0.38
2
0.82
Peste vârfuri
2,5,4,1
1.3988
1.4
2
0.50
Povestea codrului
10,13,3
0.7885
2.45
1
0.12
8,13,3
0.8551
3.41
1
0.05
4,13,6,11,1,0,1
1.9392
7.69
3
0.05
S-a dus amorul
9,27,5,6,1
1.2459
11.21
3
0.02
Şi dacă...
4,5,1,1,0,1
1.1903
1.34
2
0.51
Singurătate
10,6,3,1
0.7578
0.28
1
0.59
Somnoroase păsărele...
0,8,4,4
1.6931
4.79
2
0.09
4,6,1,0,1
0.8983
0.94
1
0.33
10,14,13,4,2,1
1.4882
0.81
3
0.85
0,5,7,2
1.946
4.11
1
0.04
4,9,2,0,1
1.0091
2.69
1
0.11
O, mamă… Pajul Cupidon... Pe lângă plopii fără soţ…
Replici Revedere
Steaua vieţii Te duci... Veneţia (de Gaetano Cerri) Viaţa mea fu ziuă
2-displaced Poisson Ghazel
(0),21,15,4
0.5994
0.45
1
0.50
Glossa
(0),52,23,4,1
0.4218
0.07
1
0.79
(0),7,8,1
0.6925
2.07
1
0.15
Unda spumă
As can be seen, the result is satisfactory, even if we have small chi-square probabilities in some cases. In the process of fitting, we always selected the “best” Poissonian model, i.e. many of the above data can be captured by the other models, too. A further specification of (2.5.4), viz. a0 = −1, a1 = a, c1 = b, b1 = a2 = 0 yields

$$P_x = \frac{a}{x^b}\,P_{x-1},$$

i.e., there is a slight modification of the original Poisson distribution. The solution yields

$$P_x = \frac{a^x}{(x!)^b}\,P_0, \quad x = 0,1,2,\ldots \tag{2.5.8}$$

and with displacement

$$P_x = \frac{a^{x-1}}{((x-1)!)^b}\,P_1, \quad x = 1,2,3,\ldots, \tag{2.5.9}$$

representing the Conway-Maxwell-Poisson distribution. P1 is, as a matter of fact, the normalising constant. The results of fitting are presented in Table 2.5.2. Many of them can be captured also by the simple Poisson distribution but the result is better with this one. Of course, in all fittings we were forced to pool some insufficiently represented length classes.

The third model, called Hyperpoisson distribution, can be set up using the specifications a0 = −1, a1 = a, a2 = 0, b1 = b−1, c1 = 1. We see that also here the last component is zero. We obtain the recurrence relation

$$P_x = \frac{a}{b+x-1}\,P_{x-1}$$

yielding

$$P_x = \frac{a^x}{b^{(x)}}\,P_0, \quad x = 0,1,2,\ldots \tag{2.5.10}$$

where $b^{(x)} = b(b+1)(b+2)\cdots(b+x-1)$ is the ascending factorial function. In the 1-displaced case we obtain

$$P_x = \frac{a^{x-1}}{b^{(x-1)}}\,C, \quad x = 1,2,3,\ldots \tag{2.5.11}$$
Table 2.5.2: Fitting the 1-displaced Conway-Maxwell-Poisson distribution to rhyme-word lengths in Eminescu's poems

| Poem title | Empirical distribution | a | b | X2 | DF | P |
|---|---|---|---|---|---|---|
| Adio | 5,23,10,1 | 4.6713 | 3.456 | 0.01 | 1 | 0.93 |
| Ah, mierea buzei tale | 8,17,10,1 | 2.8256 | 2.4639 | 0.95 | 1 | 0.33 |
| Amicului F.I. | 11,26,10,1 | 2.4892 | 2.7587 | 0.07 | 1 | 0.79 |
| Basmul ce i l-aş spune ei | 15,40,27,8 | 2.8747 | 2.1366 | 0.1 | 1 | 0.75 |
| Ce-ţi doresc eu ţie, dulce Românie | 2,11,15,4 | 0.1125 | 2.9677 | 0.7 | 1 | 0.40 |
| Ecò | 30,69,39,10,2 | 2.2079 | 1.9436 | 0.1 | 2 | 0.95 |
| Epigonii | 10,42,38,20,4 | 3.7233 | 1.9634 | 0.87 | 2 | 0.65 |
| Făt-Frumos din tei | 7,24,14,1 | 4.1189 | 3.0697 | 0.78 | 1 | 0.38 |
| Împărat şi proletar | 27,79,68,29,5,1,1 | 3.1494 | 1.9000 | 0.31 | 2 | 0.86 |
| Junii corupţi | 7,23,25,19,4 | 3.4911 | 1.6957 | 2.41 | 2 | 0.30 |
| Lasă-ţi lumea... | 3,30,16,2,1 | 9.1609 | 3.8740 | 0.41 | 1 | 0.52 |
| Luceafărul | 91,169,100,29,3 | 2.0020 | 1.7773 | 1.46 | 2 | 0.46 |
| Melancolie | 8,17,10,3 | 2.1914 | 1.9109 | 0.01 | 1 | 0.91 |
| Mortua est! | 7,36,22,5 | 4.9340 | 2.9483 | 0.04 | 1 | 0.83 |
| Privesc oraşul furnicar | 5,19,11,3 | 3.5377 | 2.5271 | 0.05 | 1 | 0.82 |
| Ondina (Fantazie) | 30,84,66,21,5 | 3.0179 | 1.9760 | 0.23 | 2 | 0.89 |
| Povestea teiului | 4,40,36,5,3 | 10.1300 | 3.5192 | 0.01 | 1 | 0.92 |
| Scrisoarea I | 16,68,50,18,4 | 3.6527 | 2.1753 | 0.52 | 2 | 0.77 |
| Scrisoarea II | 8,37,28,8,1 | 4.5287 | 2.5572 | 0.02 | 2 | 0.99 |
| Scrisoarea III | 38,122,87,31,4,2,0,1 | 2.8995 | 1.9603 | 0.42 | 2 | 0.81 |
| Scrisoarea IV | 17,71,47,10,2,0,0,1 | 3.8110 | 2.4249 | 1.57 | 2 | 0.46 |
| Scrisoarea V | 17,46,39,12,5,1 | 2.7577 | 1.7063 | 1.31 | 2 | 0.52 |
| Viaţa | 7,40,25,5,0,1 | 5.3863 | 3.0093 | 0.10 | 1 | 0.74 |
In (2.5.11), C is the normalisation constant. The results of fitting are presented in Table 2.5.3. There was only one distribution that required the last component containing a2.
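The two recurrence relations above translate directly into a small numerical procedure. The following Python sketch is only an illustration and not the fitting software behind the tables: the function names, the numerical truncation of the infinite support and the choice of the poem Adio as an example are assumptions made here. It builds the 1-displaced Conway-Maxwell-Poisson and Hyperpoisson probabilities from a relation of the form P_x = g(x) P_{x−1} and compares expected with observed frequencies.

```python
def pmf_from_recurrence(ratio, start=1, tol=1e-12, max_terms=500):
    """Return {x: P_x} for P_x = ratio(x) * P_{x-1}, x > start.

    The first class gets the unnormalised weight 1; the normalising
    constant (P_1 or C in the text) is obtained numerically by summing
    the weights until further terms become negligible.
    """
    weights = [1.0]
    x = start
    while len(weights) < max_terms:
        x += 1
        w = weights[-1] * ratio(x)
        if w < tol * sum(weights):
            break
        weights.append(w)
    total = sum(weights)
    return {start + i: w / total for i, w in enumerate(weights)}


def cmp_displaced(a, b):
    """1-displaced Conway-Maxwell-Poisson: P_x = a / (x-1)**b * P_{x-1}."""
    return pmf_from_recurrence(lambda x: a / (x - 1) ** b)


def hyperpoisson_displaced(a, b):
    """1-displaced Hyperpoisson: P_x = a / (b + x - 2) * P_{x-1}."""
    return pmf_from_recurrence(lambda x: a / (b + x - 2))


# Example: "Adio", frequencies and parameters as listed in Table 2.5.2.
observed = [5, 23, 10, 1]
n = sum(observed)
adio = cmp_displaced(a=4.6713, b=3.456)
expected = [n * adio[x] for x in range(1, len(observed) + 1)]
for x, (o, e) in enumerate(zip(observed, expected), start=1):
    print(f"length {x}: observed {o:3d}, expected {e:6.2f}")
```

With these parameters the expected frequencies lie close to the observed ones, which agrees with the small X2 value reported for this poem. The pooling of sparse classes used for the chi-square test is not reproduced in the sketch.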
The specification a0 = −1, a1 = a−1, a2 = 1, b1 = 0, b2 = a, c1 = c2 = 1 yields the recurrence formula

$$P_x = \left(\frac{a-1}{x} + \frac{1}{a+x}\right) P_{x-1}$$

leading to the Ferreri-Poisson distribution

$$P_x = \frac{a^x e^{-a}}{(a+x)\,x!}\,C, \quad x = 0,1,2,\ldots, \tag{2.5.12a}$$

where again the displacement causes the replacement of x by x−1 and the support is x = 1,2,…, i.e.

$$P_x = \frac{a^{x-1} e^{-a}}{(a+x-1)\,(x-1)!}\,C, \quad x = 1,2,3,\ldots \tag{2.5.12b}$$

Writing (2.5.12b) as

$$P_x = \frac{1}{a+x-1}\cdot\frac{a^{x-1} e^{-a}}{(x-1)!}\,C, \quad x = 1,2,3,\ldots$$

we see that the original Poisson distribution has been modified at each step by a simple function. The fitting yielded the results presented in Table 2.5.4. In some cases, the results of fitting are not very satisfactory. We applied some legal “tricks”, e.g. adding a zero frequency for a non-existing first or last class, different pooling of classes, etc. In this way, we obtained a Poissonian regime in 131 poems out of 141; the remaining 10 poems were either too short, too irregular or too long, evidently composed of several parts, so that no distribution at all could be fitted (Când; De ce nu-mi vii; Din valurile vremii...; Înger de pază; Lacul; Prin nopţi tăcute; Se bate miezul nopţii...; Maria Tudor; Feciorul de împărat fără de stea; Memento mori). We can now ask the question whether other Romanian poets abide by the same regime or whether it is only Eminescu's personal style. Further, is there a correlation between the year of origin and the model, or at least between the year and the parameters of a model? This question could be answered only if all poems by Eminescu were analyzed. We leave the problems to future research.
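That the Ferreri-Poisson recurrence indeed leads to (2.5.12a) can be checked in two steps. The short derivation below is a sketch assuming a > 0 and P_0 > 0; it uses no further assumptions.

```latex
% Step 1: put the bracket of the recurrence on a common denominator.
\frac{a-1}{x} + \frac{1}{a+x}
  = \frac{(a-1)(a+x) + x}{x\,(a+x)}
  = \frac{a\,(a+x-1)}{x\,(a+x)} .

% Step 2: iterate from P_0; the factors (a+k-1)/(a+k) telescope.
P_x = P_0 \prod_{k=1}^{x} \frac{a\,(a+k-1)}{k\,(a+k)}
    = P_0 \,\frac{a^{x}}{x!}\cdot\frac{a}{a+x}
    = \frac{a^{x}}{(a+x)\,x!}\; a P_0 .

% Comparison with (2.5.12a): the two forms agree if  e^{-a} C = a P_0 ,
% i.e. C absorbs both P_0 and the factor e^{-a}.
```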
Table 2.5.3: Fitting the 1-displaced Hyperpoisson distribution to rhyme-word lengths in Eminescu's poems

| Poem title | Empirical distribution | a | b | X2 | DF | P |
|---|---|---|---|---|---|---|
| Amorul unei marmure | 8,9,15,10,2 | 1.0272 | 0.0597 | 1.26 | 1 | 0.26 |
| Andrei Mureşanu | 4,37,30,12,2,1 | 0.8283 | 0.0895 | 0.27 | 2 | 0.87 |
| Aveam o muză | 11,27,27,6,0,1 | 0.8508 | 0.2949 | 3.83 | 2 | 0.15 |
| Călin (file de poveste) | 36,130,69,20,3 | 0.6044 | 0.1674 | 0.44 | 2 | 0.80 |
| Când crivăţul cu iarna... | 18,38,29,7,2 | 0.8030 | 0.3404 | 1.37 | 2 | 0.50 |
| Care-i amorul meu în astă lume | 0,20,12,3,1,1 | 0.6952 | 0.1084 | 3.07 | 1 | 0.08 |
| Copii eram noi amandoi | 8,14,21,6,2 | 0.7849 | 0.0099 | 0.76 | 1 | 0.38 |
| Cugetările sărmanului Dionis | 2,35,18,7 | 0.5702 | 0.0326 | 0.1 | 1 | 0.75 |
| De ce să mori tu? | 4,13,8,1,1,0,1 | 0.7149 | 0.2200 | 0.02 | 1 | 0.89 |
| Doina | 8,22,21,5,3,1,1 | 1.3878 | 0.6787 | 3.54 | 3 | 0.32 |
| Egipetul | 13,39,22,12,3,1 | 0.9712 | 0.3649 | 0.99 | 2 | 0.62 |
| Floare-albastră | 3,27,16,8,1,0,0,1 | 0.7707 | 0.0056 | 0.58 | 2 | 0.75 |
| Freamăt de codru | 0,23,15,9,1 | 0.8174 | 0.0355 | 1.67 | 1 | 0.20 |
| Icoană şi privaz | 42,96,40,9,1 | 0.4931 | 0.2157 | 0.2 | 2 | 0.90 |
| În căutarea Şeherezadei | 0,72,66,11,5,1,1 | 0.7269 | 0.0101 | 7.81 | 2 | 0.02 |
| Înger şi demon | 7,39,37,14,7,3 | 1.1232 | 0.2016 | 1.44 | 3 | 0.70 |
| Întunericul şi poetul | 5,14,10,5 | 0.915 | 0.3268 | 0.01 | 1 | 0.91 |
| Iubită dulce, o, mă lasă | 6,23,16,8,3 | 0.9753 | 0.2544 | 0.17 | 2 | 0.92 |
| La Bucovina | 8,15,9,3,1 | 0.8611 | 0.4593 | 0.003 | 1 | 0.96 |
| La moartea lui Heliade | 9,18,14,7 | 1.0627 | 0.5313 | 0.16 | 1 | 0.69 |
| Mureşanu | 58,103,46,12,3,1 | 0.6460 | 0.3637 | 0.6 | 2 | 0.74 |
| O călărire în zori | 13,40,22,11 | 0.7551 | 0.2454 | 0.21 | 1 | 0.65 |
| O, adevăr sublime... | 10,21,8,5 | 0.7154 | 0.3407 | 0.95 | 1 | 0.33 |
| Pustnicul | 9,33,17,3,2 | 0.5721 | 0.1560 | 0.04 | 1 | 0.85 |
| Rugăciunea unui dac | 6,20,13,7 | 0.8589 | 0.2577 | 0.03 | 1 | 0.87 |
| Sara pe deal | 0,13,10,0,1 | 0.5889 | 0.0546 | 3.18 | 1 | 0.07 |
| Sonete | 1,22,10,5,3,1 | 0.8633 | 0.0392 | 4.22 | 1 | 0.04 |
| Speranţa | 11,24,7,2,1 | 0.4759 | 0.2181 | 0.75 | 1 | 0.39 |
| Stelele-n cer | 12,5,10,3,4,2 | 7.1701 | 8.9626 | 5.17 | 3 | 0.16 |
| Venere şi Madonă | 7,17,13,8,2,1 | 1.2575 | 0.5178 | 0.26 | 2 | 0.88 |
Table 2.5.4: Fitting the 1-displaced Ferreri-Poisson distribution to rhyme-word lengths in Eminescu's poems

| Poem | Empirical distribution | a | X2 | DF | P |
|---|---|---|---|---|---|
| Cum negustorii din Constantinopol | 1,10,2,2,1 | 2.6771 | 7.15 | 2 | 0.03 |
| Kamadeva | 1,7,2 | 1.6327 | 6.73 | 2 | 0.03 |
| La Quadrat | 1,10,4,0,1 | 1.7581 | 7.93 | 2 | 0.02 |
| Lebăda | 2,8,2 | 1.5667 | 5.71 | 1 | 0.02 |
| Lida | 2,9,4,1 | 1.7491 | 5.01 | 2 | 0.08 |
| O stea prin ceruri | 2,10,1,3 | 1.8384 | 7.45 | 2 | 0.02 |
| Oricâte stele... | 0,0,7,4,3 | 4.0136 | 3.61 | 1 | 0.06 |
| Pe aceeaşi ulicioară... | 3,11,3,1 | 1.6173 | 6.23 | 2 | 0.04 |
| Sus în curtea cea domnească | 11,10,4,5 | 1.6469 | 1.18 | 2 | 0.55 |
| Trecut-au anii | 0,7,4,2,1 | 2.5384 | 1.17 | 2 | 0.56 |
| Vis | 13,6,5,3 | 1.4073 | 1.69 | 2 | 0.43 |
2.5.2 Open and closed rhymes

If a rhyme-word ends with a vowel, we call it open; those ending with a consonant are closed. Evidently, the proportion of closed rhymes is merely the complement of the open ones, hence one needs to consider only one of them. In the development of poetry there may be two trends: either it begins with a majority of open rhymes and after their abuse the number of closed rhymes increases, or vice versa; or rhyming approaches a steady state (fifty-fifty) which may attain the zero-zero limit with rhymeless poetry. The first case has been observed in Slovak poetry up to 1960 (cf. Štukovský, Altmann 1965, 1966); the subsequent development is not known. As we analyze only one writer, we cannot make statements about the general development in Romanian. Nevertheless, we can at least show that this property is not constant but changes from year to year. We analyzed, again, 138 poems by Eminescu, but not all years are representative. For example, for 1875, 1882, 1884, 1885 and 1886 we have only one poem for each case and two of them display extreme values. For the other years we could form more or less reliable averages. The results are presented in Table 2.5.5.
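The definition of open and closed rhymes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the decision is made on the last written letter, the vowel set is the orthographic one (a final written "i" is often not a full vowel in pronunciation, which a phonetic analysis would have to treat separately), and the word list is a hypothetical mini-sample, not one of the analysed poems.

```python
# Orthographic vowel letters of Romanian (assumption: spelling-based criterion).
VOWELS = set("aăâeiîou")


def is_open(rhyme_word):
    """True if the last letter of the rhyme word is a vowel letter."""
    letters = [ch for ch in rhyme_word.lower() if ch.isalpha()]
    if not letters:
        raise ValueError(f"no letters in {rhyme_word!r}")
    return letters[-1] in VOWELS


def open_proportion(rhyme_words):
    """Proportion of open rhyme words in a list of rhyme words."""
    flags = [is_open(word) for word in rhyme_words]
    return sum(flags) / len(flags)


# Hypothetical mini-sample of rhyme words.
sample = ["faţă", "rază", "dor", "amor"]
print(open_proportion(sample))
```

Averaging such proportions over the poems of one year gives values of the kind reported per year in the table that follows.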
Table 2.5.5: Mean proportion of open rhymes in Eminescu's poetry

| Year | Proportion of open rhymes | Year | Proportion of open rhymes |
|---|---|---|---|
| 1866 | 0.6863 | 1878 | 0.6923 |
| 1867 | 0.5263 | 1879 | 0.7419 |
| 1868 | 0.5412 | 1880 | 0.7647 |
| 1869 | 0.6623 | 1881 | 0.7573 |
| 1870 | 0.7313 | 1882 | 0.7083 |
| 1871 | 0.7374 | 1883 | 0.7009 |
| 1872 | 0.6959 | 1884 | 0.3750 |
| 1873 | 0.8079 | 1885 | 0.9167 |
| 1874 | 0.7367 | 1886 | 0.5000 |
| 1875 | 0.5870 | 1887 | 0.7000 |
| 1876 | 0.6339 | 1889 | 0.7456 |
It can easily be shown that there is no significant trend in the data; the proportion of open rhymes is rather constant. The investigation for Eminescu comprises only 23 years, so that a continuation of the study of Romanian poetry up to the present would be necessary. Although some years are poorly represented (esp. the year 1884, which yields a strong extreme value), there is a slightly increasing tendency to employ open rhyme words, as can be seen in Figure 2.5.1; it cannot be considered significant because of some outliers caused by insufficient representation. However, the drastic change in Figure 2.1.2.1 (euphony), occurring after the “pathological” year 1883, is confirmed also in Figure 2.5.1 for the open rhyme proportion. In further research, the data could be enriched by those from other writers because here we are interested rather in a hypothesis concerning the development of Romanian poetry as a whole.
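One simple way to examine the statement about the trend is an ordinary least-squares line through the yearly means, followed by a look at the t-value of the slope. The Python sketch below is an assumption about how such a check could be done, not the procedure actually used for the figure; the function name is hypothetical and the yearly means are treated as equally reliable, which ignores the different numbers of poems per year.

```python
from math import sqrt


def linear_trend(xs, ys):
    """Least-squares slope, intercept and t-value of the slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope (n - 2 DF).
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = sqrt(rss / (n - 2) / sxx)
    t_value = slope / std_err if std_err > 0 else float("inf")
    return slope, intercept, t_value


# Years and mean proportions of open rhymes as given in Table 2.5.5.
years = list(range(1866, 1877)) + list(range(1878, 1888)) + [1889]
open_rhymes = [
    0.6863, 0.5263, 0.5412, 0.6623, 0.7313, 0.7374, 0.6959, 0.8079,
    0.7367, 0.5870, 0.6339, 0.6923, 0.7419, 0.7647, 0.7573, 0.7083,
    0.7009, 0.3750, 0.9167, 0.5000, 0.7000, 0.7456,
]

slope, intercept, t_value = linear_trend(years, open_rhymes)
print(f"slope per year: {slope:.4f}, t = {t_value:.2f}")
```

With 22 yearly values the slope has 20 degrees of freedom, for which the two-sided 5% critical value of t is about 2.09; an absolute t-value below this bound is in line with the statement that the slight increase is not significant.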
Figure 2.5.1. The trend of using open rhyme words by Eminescu
2.5.3 Masculine and feminine rhyme

The placing of the last stress in the line distinguishes (at least) two kinds of rhyme words: masculine if the stress is on the last syllable, and feminine in all other cases. This is a slight simplification but in our case it is sufficient. Instead of presenting the whole table of data we show only the mean proportions of masculine rhymes in the respective years. Again, in some cases the individual outliers disturb the horizontal dispersion around a straight line. The results are presented in Table 2.5.6 and Figure 2.5.2.
Figure 2.5.2. Mean proportion of masculine rhymes in Eminescu's poems
Table 2.5.6: Mean proportions of masculine rhymes in Eminescu's poems ordered chronologically

| Year | Proportion of masculine rhymes | Year | Proportion of masculine rhymes |
|---|---|---|---|
| 1866 | 0.4235 | 1878 | 0.6154 |
| 1867 | 0.5079 | 1879 | 0.3871 |
| 1868 | 0.5116 | 1880 | 0.5294 |
| 1869 | 0.4401 | 1881 | 0.3439 |
| 1870 | 0.4179 | 1882 | 0.5000 |
| 1871 | 0.4226 | 1883 | 0.4384 |
| 1872 | 0.3832 | 1884 | 0.0000 |
| 1873 | 0.2199 | 1885 | 0.5000 |
| 1874 | 0.2878 | 1886 | 1.0000 |
| 1875 | 0.3478 | 1887 | 0.4000 |
| 1876 | 0.4605 | 1889 | 0.2018 |
No significant trend could be observed but the dispersion seems to increase. Nevertheless, as noticed above (Figures 2.5.2 and 2.5.1), a significant change occurs after the year 1883 also in the masculine rhyme proportion.
2.5.4 Parts of speech in rhyme words

Usually, not all parts of speech occur in the rhyme position unless the poet was forced to place e.g. a preposition or a conjunction or even a separable prefix at the end of the line in order to construct a rhyme. In some languages this is not possible; in others (e.g. German) it is sometimes necessary. Starting from the usual parts-of-speech system in Romanian following the classical Latin pattern we obtain

N – noun
V – verb
A – adjective
Av – adverb
P – pronoun
Pr – preposition
C – conjunction
Nu – numeral
I – interjection.

However, not all parts of speech occur in rhyme positions in Eminescu's poetry. We do not find prepositions and numerals at all, only once a conjunction and a very small number of interjections. Hence we can set up a set {N,V,A,Av,P,C,I} but in Table 2.5.7 we bring the complete set because in analyzing further poems one could find a numeral or a preposition. In order to study both the general and the individual trend in his works, we analyzed 141 poems and registered the part of speech to which the rhyme-word belongs. We obtained the results presented in Table 2.5.7, ordered alphabetically.

Table 2.5.7: Parts of speech in Eminescu's poems (poem titles in alphabetical order)
| No. | Poem title (alphabetically) | N, V, A, Av, P, Pr, C, Nu, I |
|---|---|---|
| 1 | Adânca mare… | 7, 5, 2, 0, 0, 0, 0, 0, 0 |
| 2 | Adio | 15, 13, 3, 4, 4, 0, 0, 0, 1 |
| 3 | Ah, mierea buzei tale | 16, 9, 3, 1, 7, 0, 0, 0, 0 |
| 4 | Amicului F.I. | 29, 4, 6, 2, 7, 0, 0, 0, 0 |
| 5 | Amorul unei marmure | 30, 5, 7, 0, 2, 0, 0, 0, 0 |
| 6 | Andrei Mureşanu | 34, 21, 24, 3, 4, 0, 0, 0, 0 |
| 7 | Atât de fragedă… | 23, 5, 4, 1, 3, 0, 0, 0, 0 |
| 8 | Aveam o muză | 27, 15, 25, 3, 2, 0, 0, 0, 0 |
| 9 | Basmul ce i l-aş spune ei | 37, 19, 23, 6, 4, 0, 0, 0, 1 |
| 10 | Călin - File de poveste | 136, 65, 49, 1, 7, 0, 0, 0, 0 |
| 11 | Când | 8, 8, 8, 0, 0, 0, 0, 0, 0 |
| 12 | Când amintirile... | 10, 7, 4, 2, 1, 0, 0, 0, 0 |
| 13 | Când crivăţul cu iarna... | 29, 30, 28, 1, 6, 0, 0, 0, 0 |
| 14 | Când marea... | 7, 5, 7, 1, 0, 0, 0, 0, 0 |
| 15 | Când priveşti oglinda mărei | 15, 3, 7, 4, 3, 0, 0, 0, 0 |
| 16 | Care-i amorul meu în astă lume | 22, 9, 5, 0, 1, 0, 0, 0, 0 |
| 17 | Ce e amorul? | 14, 7, 3, 2, 2, 0, 0, 0, 0 |
| 18 | Ce te legeni... | 10, 9, 1, 1, 4, 0, 0, 0, 0 |
| 19 | Ce-ţi doresc eu ţie, dulce Românie | 17, 11, 4, 0, 0, 0, 0, 0, 0 |
| 20 | Cine-i? | 16, 3, 11, 0, 0, 0, 0, 0, 0 |
| 21 | Copii eram noi amandoi | 28, 15, 6, 1, 1, 0, 0, 0, 0 |
| 22 | Crăiasa din poveşti | 6, 2, 6, 0, 0, 0, 0, 0, 0 |
| 23 | Criticilor mei | 6, 6, 2, 0, 0, 0, 0, 0, 0 |
| 24 | Cu mâne zilele-ţi adaogi... | 5, 7, 1, 3, 0, 0, 0, 0, 0 |
| 25 | Cugetările sărmanului Dionis | 47, 10, 4, 2, 1, 0, 0, 0, 0 |
| 26 | Cum negustorii din Constantinopol | 14, 1, 1, 0, 0, 0, 0, 0, 0 |
| 27 | Cum oceanu-ntărâtat... | 4, 7, 3, 0, 0, 0, 0, 0, 0 |
| 28 | De câte ori, iubito... | 4, 4, 3, 3, 0, 0, 0, 0, 0 |
| 29 | De ce nu-mi vii | 10, 4, 1, 5, 4, 0, 0, 0, 0 |
| 30 | De ce să mori tu? | 12, 3, 11, 2, 0, 0, 0, 0, 0 |
| 31 | De-aş avea | 14, 1, 6, 3, 0, 0, 0, 0, 0 |
| 32 | De-aş muri ori de-ai muri | 17, 8, 10, 0, 1, 0, 0, 0, 0 |
| 33 | De-oi adormi (variantă) | 18, 6, 7, 4, 1, 0, 0, 0, 0 |
| 34 | De-or trece anii... | 6, 4, 0, 2, 4, 0, 0, 0, 0 |
| 35 | Departe sunt de tine | 8, 7, 2, 1, 0, 0, 0, 0, 0 |
| 36 | Despărţire | 17, 9, 4, 4, 4, 0, 0, 0, 0 |
| 37 | Din Berlin la Potsdam | 8, 4, 2, 0, 0, 0, 0, 0, 0 |
| 38 | Din lyra spartă... | 5, 5, 1, 0, 1, 0, 0, 0, 0 |
| 39 | Din noaptea | 6, 5, 2, 2, 1, 0, 0, 0, 0 |
| 40 | Din străinătate | 22, 7, 6, 1, 0, 0, 0, 0, 0 |
| 41 | Din valurile vremii... | 4, 10, 4, 1, 1, 0, 0, 0, 0 |
| 42 | Dintre sute de catarge | 12, 2, 1, 1, 0, 0, 0, 0, 0 |
| 43 | Doi aştri | 9, 0, 1, 0, 2, 0, 0, 0, 0 |
| 44 | Doina | 39, 14, 1, 1, 6, 0, 0, 0, 0 |
| 45 | Dorinţa | 7, 3, 1, 1, 0, 0, 0, 0, 0 |
| 46 | Dumnezeu şi om | 25, 3, 27, 0, 1, 0, 0, 0, 0 |
| 47 | Ecò | 73, 30, 44, 0, 3, 0, 0, 0, 0 |
| 48 | Egipetul | 44, 18, 28, 0, 0, 0, 0, 0, 0 |
| 49 | Epigonii | 49, 14, 49, 0, 2, 0, 0, 0, 0 |
| 50 | Făt-Frumos din tei | 23, 6, 14, 2, 1, 0, 0, 0, 0 |
| 51 | Feciorul de împărat fără de stea | 338, 211, 234, 22, 38, 0, 0, 0, 0 |
| 52 | Floare-albastră | 37, 6, 10, 1, 2, 0, 0, 0, 0 |
| 53 | Foaia veştedă (după Lenau) | 8, 4, 0, 0, 2, 0, 0, 0, 0 |
| 54 | Freamăt de codru | 18, 14, 11, 2, 3, 0, 0, 0, 0 |
| 55 | Frumoasă şi jună | 1, 8, 4, 0, 1, 0, 0, 0, 2 |
| 56 | Ghazel | 22, 7, 10, 0, 1, 0, 0, 0, 0 |
| 57 | Glossă | 36, 25, 10, 7, 2, 0, 0, 0, 0 |
| 58 | Horia | 10, 4, 4, 0, 4, 0, 0, 0, 0 |
| 59 | Iar când voi fi pământ (variantă) | 24, 7, 5, 2, 2, 0, 0, 0, 0 |
| 60 | Icoană şi privaz | 96, 52, 32, 4, 4, 0, 0, 0, 0 |
| 61 | Împărat şi proletar | 81, 55, 66, 2, 5, 0, 1, 0, 0 |
| 62 | În căutarea Şeherezadei | 91, 23, 31, 0, 11, 0, 0, 0, 0 |
| 63 | Înger de pază | 7, 2, 3, 0, 2, 0, 0, 0, 0 |
| 64 | Înger şi demon | 34, 26, 44, 0, 4, 0, 0, 0, 0 |
| 65 | Îngere palid... | 7, 4, 1, 0, 0, 0, 0, 0, 0 |
| 66 | Întunericul şi poetul | 23, 6, 4, 0, 1, 0, 0, 0, 0 |
| 67 | Iubind în taină... | 11, 2, 0, 0, 1, 0, 0, 0, 0 |
| 68 | Iubită dulce, o, mă lasă | 17, 19, 14, 4, 2, 0, 0, 0, 0 |
| 69 | Iubitei | 20, 20, 13, 2, 7, 0, 0, 0, 2 |
| 70 | Junii corupţi | 36, 18, 22, 2, 0, 0, 0, 0, 0 |
| 71 | Kamadeva | 4, 2, 4, 0, 0, 0, 0, 0, 0 |
| 72 | La Bucovina | 19, 2, 6, 3, 6, 0, 0, 0, 0 |
| 73 | La mijloc de codru... | 7, 2, 3, 0, 1, 0, 0, 0, 0 |
| 74 | La moartea lui Heliade | 23, 13, 12, 0, 0, 0, 0, 0, 0 |
| 75 | La moartea lui Neamţu | 15, 8, 5, 1, 0, 0, 0, 0, 0 |
| 76 | La moartea principelui Ştirbey | 8, 5, 2, 0, 1, 0, 0, 0, 0 |
| 77 | La mormântul lui Aron Pumnul | 13, 1, 8, 0, 3, 0, 0, 0, 0 |
| 78 | La o artistă (Ca a nopţii poezie) | 17, 8, 12, 2, 1, 0, 0, 0, 0 |
| 79 | La o artistă (Credeam ieri) | 14, 3, 11, 0, 0, 0, 0, 0, 0 |
| 80 | La Quadrat | 8, 4, 4, 0, 0, 0, 0, 0, 0 |
| 81 | La steaua | 4, 7, 2, 2, 1, 0, 0, 0, 0 |
| 82 | Lacul | 5, 5, 0, 0, 0, 0, 0, 0, 0 |
| 83 | Lasă-ţi lumea... | 18, 17, 13, 0, 3, 0, 0, 0, 0 |
| 84 | Lebăda | 6, 4, 2, 0, 0, 0, 0, 0, 0 |
| 85 | Lida | 6, 6, 4, 0, 0, 0, 0, 0, 0 |
| 86 | Locul aripelor | 12, 6, 5, 1, 4, 0, 0, 0, 0 |
| 87 | Luceafărul | 186, 100, 55, 36, 15, 0, 0, 0, 0 |
| 88 | Mai am un singur dor | 21, 4, 7, 3, 1, 0, 0, 0, 0 |
| 89 | Maria Tudor | 7, 5, 2, 0, 0, 0, 0, 0, 0 |
| 90 | Melancolie | 20, 10, 6, 1, 1, 0, 0, 0, 0 |
| 91 | Memento mori | 668, 185, 303, 30, 24, 0, 0, 0, 1 |
| 92 | Misterele nopţii | 10, 11, 2, 9, 0, 0, 0, 0, 0 |
| 93 | Mortua est! | 33, 13, 23, 0, 1, 0, 0, 0, 0 |
| 94 | Mureşanu | 102, 57, 55, 2, 7, 0, 0, 0, 0 |
| 95 | Noaptea... | 8, 4, 8, 0, 0, 0, 0, 0, 0 |
| 96 | Nu e steluţă | 6, 3, 2, 0, 1, 0, 0, 0, 0 |
| 97 | Nu mă-nţelegi | 8, 4, 6, 3, 3, 0, 0, 0, 0 |
| 98 | Nu voi mormânt bogat (variantă) | 20, 6, 6, 3, 1, 0, 0, 0, 0 |
| 99 | O arfă pe-un mormânt | 7, 3, 6, 1, 2, 0, 0, 0, 0 |
| 100 | O călărire în zori | 60, 11, 14, 1, 0, 0, 0, 0, 0 |
| 101 | O stea prin ceruri | 4, 9, 3, 0, 0, 0, 0, 0, 0 |
| 102 | O, adevăr sublime... | 33, 4, 6, 0, 1, 0, 0, 0, 0 |
| 103 | O, mamă… | 7, 5, 0, 3, 3, 0, 0, 0, 0 |
| 104 | Ondina (Fantazie) | 101, 33, 55, 4, 13, 0, 0, 0, 0 |
| 105 | Oricâte stele... | 12, 1, 0, 1, 0, 0, 0, 0, 0 |
| 106 | Pajul Cupidon... | 7, 2, 6, 3, 0, 0, 0, 0, 0 |
| 107 | Pe aceeaşi ulicioară... | 9, 3, 0, 4, 2, 0, 0, 0, 0 |
| 108 | Pe lângă plopii fără soţ… | 12, 14, 11, 5, 2, 0, 0, 0, 0 |
| 109 | Peste vârfuri | 4, 2, 3, 3, 0, 0, 0, 0, 0 |
| 110 | Povestea codrului | 19, 1, 4, 0, 2, 0, 0, 0, 0 |
| 111 | Povestea teiului | 36, 28, 12, 10, 2, 0, 0, 0, 0 |
| 112 | Prin nopţi tăcute | 2, 4, 10, 0, 0, 0, 0, 0, 0 |
| 113 | Privesc oraşul furnicar | 16, 9, 6, 5, 0, 0, 0, 0, 2 |
| 114 | Pustnicul | 33, 8, 17, 3, 3, 0, 0, 0, 0 |
| 115 | Replici | 16, 3, 1, 0, 4, 0, 0, 0, 0 |
| 116 | Revedere | 13, 14, 6, 2, 1, 0, 0, 0, 0 |
| 117 | Rugăciunea unui dac | 22, 17, 3, 3, 1, 0, 0, 0, 0 |
| 118 | S-a dus amorul | 19, 10, 7, 6, 6, 0, 0, 0, 0 |
| 119 | Sara pe deal | 8, 4, 10, 0, 2, 0, 0, 0, 0 |
| 120 | Scrisoarea I | 84, 35, 29, 5, 2, 0, 0, 0, 1 |
| 121 | Scrisoarea II | 58, 15, 6, 3, 0, 0, 0, 0, 0 |
| 122 | Scrisoarea III | 185, 53, 27, 6, 14, 0, 0, 0, 0 |
| 123 | Scrisoarea IV | 77, 37, 26, 7, 1, 0, 0, 0, 0 |
| 124 | Scrisoarea V | 72, 20, 17, 5, 6, 0, 0, 0, 0 |
| 125 | Se bate miezul nopţii... | 4, 2, 0, 0, 0, 0, 0, 0, 0 |
| 126 | Şi dacă... | 5, 6, 0, 1, 0, 0, 0, 0, 0 |
| 127 | Singurătate | 12, 4, 2, 2, 0, 0, 0, 0, 0 |
| 128 | Somnoroase păsărele... | 8, 4, 3, 1, 0, 0, 0, 0, 0 |
| 129 | Sonete | 21, 16, 3, 2, 0, 0, 0, 0, 0 |
| 130 | Speranţa | 22, 11, 8, 4, 0, 0, 0, 0, 0 |
| 131 | Steaua vieţii | 7, 2, 2, 0, 1, 0, 0, 0, 0 |
| 132 | Stelele-n cer | 23, 6, 7, 0, 0, 0, 0, 0, 0 |
| 133 | Sus în curtea cea domnească | 8, 3, 19, 0, 0, 0, 0, 0, 0 |
| 134 | Te duci... | 14, 16, 9, 4, 1, 0, 0, 0, 0 |
| 135 | Trecut-au anii | 7, 6, 0, 1, 0, 0, 0, 0, 0 |
| 136 | Unda spumă | 5, 5, 5, 0, 1, 0, 0, 0, 0 |
| 137 | Venere şi Madonă | 34, 4, 9, 0, 1, 0, 0, 0, 0 |
| 138 | Veneţia (de Gaetano Cerri) | 12, 1, 1, 0, 0, 0, 0, 0, 0 |
| 139 | Viaţa | 33, 17, 24, 3, 1, 0, 0, 0, 0 |
| 140 | Viaţa mea fu ziuă | 6, 7, 1, 1, 1, 0, 0, 0, 0 |
| 141 | Vis | 16, 4, 4, 1, 2, 0, 0, 0, 0 |
The rank-order distribution is not very illuminating and, as a matter of fact, not very useful. Direct comparisons of individual poems would easily be possible using the cosine similarity (see above) but one could obtain only a classification. More interesting is the activity/descriptivity relationship measured by means of a modified Busemann coefficient (cf. Busemann 1925; Altmann 1978) given as

$$Q = \frac{V}{A+V}, \tag{2.5.13}$$

i.e. as the proportion of verbs in A + V. The ratio Q is a simple proportion in [0,1]. In the vicinity of 0.5, it expresses the active-descriptive equilibrium; if Q < 0.5, it is a sign of ornamentality/descriptivity; if Q > 0.5, it is a sign of activity. The number of verbs V is evidently binomially distributed and, in the absence of any trend, we have Q = p = 0.5. Now considering A + V = n, we easily obtain confidence intervals for any n. Thus we can decide as follows.

(i) If Q > 0.5 (or V > A), we compute

$$P(X \ge V) = \sum_{x=V}^{n} \binom{n}{x}\, 0.5^{n} \tag{2.5.14}$$

and if P(X ≥ V) ≤ 0.05, we consider the poem as rather active.

(ii) If Q < 0.5, we compute

$$P(X \le V) = \sum_{x=0}^{V} \binom{n}{x}\, 0.5^{n} \tag{2.5.15}$$

and if P(X ≤ V) ≤ 0.05, we consider the poem as ornamental or descriptive.

(iii) In all other cases the poem has the active-descriptive equilibrium.

Consider the poem Adio in which V + A = n, i.e. 13 + 3 = 16, and Q = 13/16 = 0.8125. Since the first condition is fulfilled, we compute (2.5.14) and obtain

$$P(X \ge 13) = P(X=13) + P(X=14) + P(X=15) + P(X=16)$$
$$= \binom{16}{13} 0.5^{16} + \binom{16}{14} 0.5^{16} + \binom{16}{15} 0.5^{16} + \binom{16}{16} 0.5^{16}$$
$$= 0.008545 + 0.001831 + 0.000244 + 0.000015 = 0.0106.$$

Since the result is smaller than 0.05 we consider the rhyming part of the poem as highly active. In order to alleviate computations, we present a table (cf. Table 2.5.8) showing n = 5,…,60 and the two boundaries of equilibrium. If V is smaller than or equal to the left number (VL), then the poem is ornamental (descriptive); if V is greater than or equal to the right number (VU), then the poem is active. In all other cases there is an active-descriptive equilibrium. In the above example, we had n = 16 and V = 13. Looking at Table 2.5.8 we easily find that V = 13 > 12, hence the poem is strongly “active”. In this way, we obtain at least a classification consisting of three classes. A finer computation of cumulative probabilities would yield a more continuous classification but it can be shown that Eminescu is rather in equilibrium. Not even the analysis of the history shows a trend-like motion. Though in the historically second half, some poems display greater “activity”, it is mostly because they are short and in the rhyme position there is no adjective. Thus the rhyme position is not enough for the judgement of the overall “activity” of the poem; it mirrors only a partial picture. Now, considering all parts of speech in the rhyme position we can ask whether the distribution of word classes is statistically equal. Any similarity measure could be used but a subsequent test for significance would be necessary in any case. Hence we rather perform a homogeneity test directly. We use
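for this purpose the information statistics 2Î, which is asymptotically distributed like a chi-square variable, and build groups of poems which have a homogeneous distribution of parts of speech in the rhyme position.

Table 2.5.8: Boundary values of active/descriptive equilibrium

| n | VL | VU | n | VL | VU | n | VL | VU |
|---|---|---|---|---|---|---|---|---|
| 5 | 0 | 5 | 24 | 7 | 17 | 43 | 15 | 28 |
| 6 | 0 | 6 | 25 | 7 | 18 | 44 | 16 | 28 |
| 7 | 0 | 7 | 26 | 8 | 18 | 45 | 16 | 29 |
| 8 | 1 | 7 | 27 | 8 | 19 | 46 | 16 | 30 |
| 9 | 1 | 8 | 28 | 9 | 19 | 47 | 17 | 30 |
| 10 | 1 | 9 | 29 | 9 | 20 | 48 | 17 | 31 |
| 11 | 2 | 9 | 30 | 10 | 20 | 49 | 18 | 31 |
| 12 | 2 | 10 | 31 | 10 | 21 | 50 | 18 | 32 |
| 13 | 3 | 10 | 32 | 10 | 22 | 51 | 19 | 32 |
| 14 | 3 | 11 | 33 | 11 | 22 | 52 | 19 | 33 |
| 15 | 3 | 12 | 34 | 11 | 23 | 53 | 20 | 33 |
| 16 | 4 | 12 | 35 | 12 | 23 | 54 | 20 | 34 |
| 17 | 4 | 13 | 36 | 12 | 24 | 55 | 20 | 35 |
| 18 | 5 | 13 | 37 | 13 | 24 | 56 | 21 | 35 |
| 19 | 5 | 14 | 38 | 13 | 25 | 57 | 21 | 36 |
| 20 | 5 | 15 | 39 | 13 | 26 | 58 | 22 | 36 |
| 21 | 6 | 15 | 40 | 14 | 26 | 59 | 22 | 37 |
| 22 | 6 | 16 | 41 | 14 | 27 | 60 | 23 | 37 |
| 23 | 7 | 16 | 42 | 15 | 27 |  |  |  |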
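The binomial decision rule for the coefficient Q and the boundary values VL and VU can be recomputed with a few lines of Python. The sketch below is an illustration only; the function names are hypothetical, and it assumes the one-sided level 0.05 and p = 0.5 stated in the text.

```python
from math import comb


def upper_tail(n, v):
    """P(X >= v) for X binomial with parameters n and p = 0.5."""
    return sum(comb(n, x) for x in range(v, n + 1)) / 2 ** n


def lower_tail(n, v):
    """P(X <= v) for X binomial with parameters n and p = 0.5."""
    return sum(comb(n, x) for x in range(0, v + 1)) / 2 ** n


def classify(verbs, adjectives, alpha=0.05):
    """Three-way classification of a poem from V and A in rhyme position."""
    n = verbs + adjectives
    if verbs > adjectives and upper_tail(n, verbs) <= alpha:
        return "active"
    if verbs < adjectives and lower_tail(n, verbs) <= alpha:
        return "descriptive"
    return "equilibrium"


def boundaries(n, alpha=0.05):
    """Largest VL and smallest VU whose tail probability is <= alpha.

    Returns None for a boundary that does not exist (very small n).
    """
    vl = max((v for v in range(n + 1) if lower_tail(n, v) <= alpha), default=None)
    vu = min((v for v in range(n + 1) if upper_tail(n, v) <= alpha), default=None)
    return vl, vu


# The example from the text: Adio with V = 13 and A = 3.
print(round(upper_tail(16, 13), 4), classify(13, 3), boundaries(16))
```

For n = 16 the function returns the pair (4, 12), and for Adio the upper tail probability rounds to 0.0106, in agreement with the worked example.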
We demonstrate the procedure with the first two poems Adânca mare… and Adio.

Adânca mare… - parts of speech in the rhyme position

Adânca mare sub a lunei faţă (N);
Înseninată de-a ei blondă rază (N),
O lume-ntreagă-n fundul ei visează (V)
Şi stele poartă pe oglinda-i creaţă (A).
Dar mâni - ea falnică, cumplit turbează (V)
Şi mişcă lumea ei negru-măreaţă (A),
Pe-ale ei mii şi mii de nalte braţe (N)
Ducând peire - ţări înmormântează (V).
Azi un diluviu, mâne-o murmuire (N),
O armonie, care capăt n-are (V)
Astfel e-a ei întunecată fire (N),
Astfel e sufletu-n antica mare (N).
Ce-i pasă - ce simţiri o să ni-nspire (V)
Indiferentă, solitară - mare! (N)
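Since none of the poems have numerals and prepositions in the rhyme position, we omit the two zeros and obtain

|  | N | V | A | Av | P | C | I | fi. |
|---|---|---|---|---|---|---|---|---|
| Adânca mare… | 7 | 5 | 2 | 0 | 0 | 0 | 0 | 14 |
| Adio | 15 | 13 | 3 | 4 | 4 | 0 | 1 | 40 |
| f.j | 22 | 18 | 5 | 4 | 4 | 0 | 1 | m = 54 |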
Let us consider the above matrix with 2 rows and 7 columns. The individual numbers in the cells are called fij, the sums of the rows are fi. and the sums of the columns f.j. Let the sum of all numbers be

$$m = \sum_i \sum_j f_{ij}.$$

Then the 2Î test can be performed by means of the formula

$$2\hat{I} = 2 \sum_i \sum_{j,\; f_{ij} \ne 0} f_{ij} \ln \frac{m f_{ij}}{f_{i\cdot}\, f_{\cdot j}} \tag{2.5.16}$$

and can be presented also as

$$2\hat{I} = 2 \sum_i \sum_{j,\; f_{ij} \ne 0} f_{ij} \ln f_{ij} + 2 m \ln m - 2 \sum_i f_{i\cdot} \ln f_{i\cdot} - 2 \sum_{j,\; f_{\cdot j} \ne 0} f_{\cdot j} \ln f_{\cdot j}. \tag{2.5.17}$$

For each cell containing zero, 1 is subtracted from the overall 2Î (cf. Ku 1963). In our example we obtain m = 54 and (according to (2.5.17))

2Î = 2(7 ln 7 + 5 ln 5 + 2 ln 2 + 15 ln 15 + 13 ln 13 + 3 ln 3 + 4 ln 4 + 4 ln 4 + 1 ln 1) + 2(54 ln 54) – 2(14 ln 14 + 40 ln 40) – 2(22 ln 22 + 18 ln 18 + 5 ln 5 + 4 ln 4 + 4 ln 4 + 1 ln 1)
= 2(13.6214 + 8.0472 + 1.3863 + 40.6208 + 33.3443 + 3.2958 + 5.5452 + 5.5452 + 0) + 2(215.4051) – 2(184.5020) – 2(68.0029 + 52.0267 + 8.0472 + 5.5452 + 5.5452 + 0)
= 222.8124 + 430.8102 – 369.0040 – 278.3344
= 6.2842

We subtract 5 for the zeroes from this number and obtain 2Î = 1.2842. Since we have (2-1)(7-1) = 6 degrees of freedom, we can say that the two poems are homogeneous from this point of view, even if the differences are striking. Unfortunately, the subtraction of zeroes leads many times to problematic results, hence in this problem the information statistics cannot be used. For all other cases we perform the usual chi-square test, i.e. we compute

$$X^2 = \sum_i \sum_{j,\; f_{\cdot j} \ne 0} \frac{\left(f_{ij} - f_{i\cdot} f_{\cdot j}/m\right)^2}{f_{i\cdot} f_{\cdot j}/m}. \tag{2.5.18}$$
For the above case we obtain X2 = 4.0953 which need not be modified. With 6 degrees of freedom it is not significant. Groupings can be performed by forming a group of only those poems which are not significantly different from all the other ones in the group. After performing all tests pair-wise, we can state that the poems have a very homogeneous distribution of parts of speech in the rhyme position. In this sense, Eminescu was very stable.
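Both homogeneity statistics can be recomputed for the pair Adânca mare… and Adio with the following Python sketch. The function names are hypothetical; the code follows formula (2.5.16), the subtraction of 1 per zero cell described in the text, and formula (2.5.18), where columns with a zero sum are skipped.

```python
from math import log


def margins(table):
    """Row sums fi., column sums f.j and grand total m of a frequency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    return row_sums, col_sums, sum(row_sums)


def information_statistic(table):
    """Return (2I, 2I minus the number of zero cells), cf. (2.5.16)."""
    row_sums, col_sums, m = margins(table)
    two_i = 2 * sum(
        f * log(m * f / (row_sums[i] * col_sums[j]))
        for i, row in enumerate(table)
        for j, f in enumerate(row)
        if f != 0
    )
    zero_cells = sum(f == 0 for row in table for f in row)
    return two_i, two_i - zero_cells


def chi_square(table):
    """Chi-square statistic (2.5.18); all-zero columns are omitted."""
    row_sums, col_sums, m = margins(table)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            if col_sums[j] == 0:
                continue
            expected = row_sums[i] * col_sums[j] / m
            x2 += (f - expected) ** 2 / expected
    return x2


# Columns N, V, A, Av, P, C, I for Adânca mare… and Adio.
poems = [
    [7, 5, 2, 0, 0, 0, 0],
    [15, 13, 3, 4, 4, 0, 1],
]
two_i, two_i_corrected = information_statistic(poems)
print(round(two_i, 4), round(two_i_corrected, 4), round(chi_square(poems), 4))
```

Up to rounding of the intermediate logarithms, the sketch reproduces the values 6.2842, 1.2842 and 4.0953 obtained above.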
3 The word

3.1 Introduction

The number of word properties studied so far in quantitative linguistics is considerable. As already said, these properties are nothing intrinsic to the word, they are scientific (or everyday) concepts, i.e. mental constructions. At the beginning of the 20th century very few quantitative word properties were discussed; today, every linguist can ex abrupto list at least twenty. Our conceptual knowledge increases and we aim at organising it in systems in which there are no longer isolated parts. This ideal state cannot be attained in one step, it is rather a way along which scientists collect and put together membra disiecta. In this section, we restrict ourselves to a few of the aspects of words which have been quantified. The most popular way to study quantitative linguistic properties is the investigation of word frequencies. This way yields various results, but also traps, pitfalls, stumbling blocks, etc. depending not only on the language, text sort, grammar under study, but also on the previous education of the researcher. The views on language and the methods applied by mathematicians, physicists, psychologists and linguists are quite different. A result that seems relevant to a researcher in information theory may seem quite irrelevant to a linguist and vice versa. The unification of all views into a general theory is not only necessary but also possible; however, complex research teams are needed to cover all the present-day opinions. Research expands both in depth and width; and a theory should be constructed in such a way that every new vista should be derivable from it. Presently, this is merely a dream. We shall restrict ourselves to the surveys of frequency and its different aspects, length, word classes, and sequences called motifs and apply the status quo to the poetic texts by M. Eminescu.
In this volume, the presented models and methods are used to characterise poetic texts but they can be applied also to properties of any texts in any language. The word is, in a certain sense, a central linguistic unit. It is the clothing of concepts, so to say, their incarnation. Needless to say, communication is possible also without words but human communication is most effective when words are involved. Words and their properties are involved in syntactic constructions, in paradigmatic classes, in many control cycles of synergetic linguistics, and most dictionaries describe words and their properties. The number of words in a language is always underestimated. There are no complete dictionaries, of course. The greatest German dictionary contains about 300,000 words but linguists estimate the real extent up to about 20 million (including terminology). The impossibility of capturing the complete stock is caused by the daily change of the vocabulary; words are born and die; only a part of them is codified in a dictionary. They are applied in texts, and their usage is controlled on two levels: on the surface by the grammar of the language, and in depth by laws dictated by the mechanisms of communication. These latent mechanisms give rise to phenomena a part of which will be described in this chapter. We shall touch word frequency, vocabulary richness of texts, word length, the representation of word classes evoking phenomena such as descriptiveness, nominal style, ornamentality, etc. but we shall not scrutinise all of them. It is in order to note here that capturing “all” properties of words in a work like that by Eminescu is absolutely impossible. On the other hand, many aspects we investigated yield neutral results, i.e. no observable tendencies or characteristic features. But with continuing research everything can turn out to be relevant for the development of language or literature. Thus we present many results but not all of them display remarkable features.
3.2 Frequency distribution Counting word frequencies in texts belongs to the earliest activities in quantitative linguistics. Perhaps the most famous case is that of Kaeding's (1897/98) frequency dictionary of German, which, just as the majority of frequency dictionaries, represents a kind of l’art pour l’art. It provides only material, not data, because no hypothesis is associated with it. Nevertheless, under favourable conditions one can construct different kinds of data from it. Today, the investigation of the frequency of occurrence of words and other units in texts has so many aspects that it must be considered a discipline on its own right. When units are counted three forms can and must be distinguished: (a) word forms, (b) lemmas, (c) hrebs. Regardless of which variant is chosen, a text has to be pre-processed before any counting of linguistic units becomes possible. Technically spoken, a (written) text consists of a stream of symbols, in which linguistically interesting units must be identified and segmented. In computational linguistics, this first step is called tokenisation. Fortunately, quite reliable software is available, which performs this task for many languages. Tokenisation has to cope with many detail problems, beginning with the decision which symbols will be considered characters and which are separators. The recognition of punctuation marks, numbers, hyphenation and many other questions belong to this step. Only then, the identified tokens can be processed to identify word forms. Compounding, multi-word units, proper
names, abbreviations, foreign words, etc. form complications for which decisions appropriate to the intended investigation have to be taken. Lemmatisation has become a standard task in computational linguistics, at least for many languages and common language processing purposes. An alternative is the segmentation of the text into hrebs (cf. Hřebíček 1997; Ziegler, Altmann 2001), which consist of words or morphemes or even phrases (Köhler, Naumann 2007) with the same meaning or function. In this case, the problem of homonyms is already solved and, since a morphemic analysis of the text is performed, so are the problems of apostrophes, hyphens and compounds. But up to now there is no program that could master this specific task. Thus a hreb-like analysis must be performed with paper and pencil, though some procedures have already been programmed.

The problem of the “correctness” of our analysis does not exist here, because every analysis of a text is based on some criteria, and the criteria are not given a priori but set up by the researcher: they are conventions. In this sense, “correct” means “in agreement with definitions, conventions, etc.” If we search for external, objective criteria, then there are at least two: (1) Those definitions and criteria which yield results conforming to a linguistic law should be preferred over others. At the same time, this criterion is very demanding because there is nothing more difficult in science than to establish a law. (2) That analysis is better whose resulting entities have the most associations with other computed entities, i.e. the results of the analysis can be embedded in a system of related statements. Needless to say, external criteria are a prerequisite on the way to a theory, whereas internal criteria contribute to description, classification, etc. In the sequel, we shall introduce some of the relations which have been thoroughly studied in quantitative linguistics.
Word frequencies can be presented in three forms:
(i) As ranked frequencies, where the rank is the independent variable and the frequency the dependent one. This form is mostly called “Zipf's law”, but it has many variants and different interpretations (cf. e.g. Zipf 1935; Mandelbrot 1953; Miller 1957; Popescu, Altmann, Köhler 2010). It has a very rich history. Its properties are among the most extensively studied ones, not only in linguistics (cf. http://www.nslij-genetics.org/wli/zipf). As a matter of fact, it is not a distribution but an ordered sequence.
(ii) As a frequency spectrum, where the independent variable is the frequency (x = 1, 2, …) and the dependent variable is the number of words with frequency x. Theoretically, it can be obtained by a transformation of the rank-frequency sequence. This relation does indeed form a frequency distribution.
(iii) As a cumulative frequency distribution, where the frequencies in (i) or (ii) are summed up step by step.
These presentations are already high abstractions and can be used to compute several indicators. In our present investigation, we shall operate with word forms. Apostrophes are eliminated and the word parts joined (not replaced by a blank); the same is done with hyphens; compounds represent one word form if they are written together. The boundary of a word form is a blank (white space) or another separator such as a punctuation mark, including the beginning and the end of a text. This criterion is simply practical and may be applied mechanically, at least in alphabetic languages. Its adequacy can be corroborated only by means of the two external criteria mentioned above.

For the sake of illustration, we present the computing procedure as applied to Eminescu's poem Prin nopţi tăcute in Table 3.2.1. The first column contains the ranks r; words with equal frequencies receive consecutive ranks. We shall adhere to this technique, though it would be possible to assign words with the same frequency their mean rank: in Table 3.2.1 the first and the second word would then both have rank 1.5. In our case this would complicate some computations. Within a given frequency class, the word forms in the last column are listed in reverse alphabetical order. This is a purely technical matter without any influence on the results. The frequency f(r) is the number of occurrences of the given word form in the given text. The cumulative frequency ∑f(r) is the stepwise addition of frequencies beginning with the first one. F(r) is the empirical distribution function obtained as

(3.2.1)  $$F(r) = \frac{1}{N} \sum_{i=1}^{r} f(i),$$

where N is the sum of all frequencies (i.e. the number of tokens = text length). Information of this kind is sufficient for any kind of statistical processing, but we shall use only a part of it for descriptive purposes.

The frequency spectrum is easily computed from the ranked frequencies simply by counting the numbers of ones, twos, threes, … in the second column. Then x (x = 1, 2, 3, …) is the occurrence frequency of words and f(x) is the number of words occurring x times. Using the data in Table 3.2.1 we set up the spectrum as presented in Table 3.2.2.
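The counting convention and formula (3.2.1) can be sketched in a few lines of Python. This is only a minimal illustration under assumptions of our own: the function names are hypothetical, lower-casing is added for convenience, the reverse alphabetical order follows Unicode code points rather than Romanian collation, and the sample line is an invented toy text, not a verse by Eminescu.

```python
import re
from collections import Counter
from itertools import accumulate


def word_forms(text):
    """Split a text into word forms under the convention described above."""
    # Apostrophes and hyphens are deleted so that the word parts are joined.
    # Lower-casing is an extra assumption of this sketch.
    joined = re.sub(r"['’\-]", "", text.lower())
    # Every maximal run of letters is one word form; blanks, punctuation
    # and the text boundaries act as separators.
    return re.findall(r"[^\W\d_]+", joined)


def rank_frequency_table(tokens):
    """Return rows (r, word form, f(r), cumulative frequency, F(r))."""
    counts = Counter(tokens)
    # Reverse (code-point) alphabetical order inside a frequency class ...
    by_word = sorted(counts.items(), key=lambda item: item[0], reverse=True)
    # ... followed by a stable sort by decreasing frequency.
    ranked = sorted(by_word, key=lambda item: item[1], reverse=True)
    n = sum(counts.values())  # N = number of tokens (text length)
    cumulative = accumulate(freq for _, freq in ranked)
    return [
        (rank, word, freq, cum, cum / n)
        for rank, ((word, freq), cum) in enumerate(zip(ranked, cumulative), start=1)
    ]


# Invented toy line, used only to exercise the functions.
sample = "De-aş avea un nor, de-aş avea un vis!"
rows = rank_frequency_table(word_forms(sample))
for row in rows:
    print(row)
```

Mean ranks for tied frequencies are deliberately not implemented, in line with the consecutive ranking used in Table 3.2.1.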
Table 3.2.1: Rank-frequency of word forms in Eminescu's poem Prin nopţi tăcute

Rank r | Frequency f(r) | Cumulative frequency ∑f(r) | F(r) | Word form
1 | 3 | 3 | 0.0625 | prin
2 | 3 | 6 | 0.1250 | din
3 | 2 | 8 | 0.1667 | un
4 | 2 | 10 | 0.2083 | şi
5 | 2 | 12 | 0.2500 | luna
6 | 2 | 14 | 0.2917 | lumea
7 | 1 | 15 | 0.3125 | visuri
8 | 1 | 16 | 0.3333 | vântul
9 | 1 | 17 | 0.3542 | văd
10 | 1 | 18 | 0.3750 | trece
11 | 1 | 19 | 0.3958 | tăcute
12 | 1 | 20 | 0.4167 | senină
13 | 1 | 21 | 0.4375 | sece
14 | 1 | 22 | 0.4583 | sunt
15 | 1 | 23 | 0.4792 | rece
16 | 1 | 24 | 0.5000 | plină
17 | 1 | 25 | 0.5208 | plâng
18 | 1 | 26 | 0.5417 | ochiumi
19 | 1 | 27 | 0.5625 | obraz
20 | 1 | 28 | 0.5833 | o
21 | 1 | 29 | 0.6042 | nor
22 | 1 | 30 | 0.6250 | nopţi
23 | 1 | 31 | 0.6458 | mute
24 | 1 | 32 | 0.6667 | mintea
25 | 1 | 33 | 0.6875 | marea
26 | 1 | 34 | 0.7083 | lunce
27 | 1 | 35 | 0.7292 | lină
28 | 1 | 36 | 0.7500 | lată
29 | 1 | 37 | 0.7708 | iute
30 | 1 | 38 | 0.7917 | în
31 | 1 | 39 | 0.8125 | icoanai
32 | 1 | 40 | 0.8333 | glas
33 | 1 | 41 | 0.8542 | eu
34 | 1 | 42 | 0.8750 | cu
35 | 1 | 43 | 0.8958 | cea
36 | 1 | 44 | 0.9167 | ce
37 | 1 | 45 | 0.9375 | cată
38 | 1 | 46 | 0.9583 | cânt
39 | 1 | 47 | 0.9792 | beată
40 | 1 | 48 | 1.0000 | aud
Table 3.2.2: Frequency spectrum of word forms in Eminescu's poem Prin nopţi tăcute

Number of occurrences x | Number of words occurring x times f(x)
1 | 34
2 | 4
3 | 2
In the same way as in Table 3.2.1, one can set up the cumulative frequencies and the empirical distribution function, too. Formally, the transformation of rank-frequencies into a spectrum is done by solving

(3.2.2)  x ≤ f(r) < x + 1

for r, where f(r) is a theoretical function (cf. Haight 1966, 1969; Baayen 1989; Zörnig, Boroda 1992; Wimmer et al. 2003). The presentation of lemma and hreb frequencies is performed analogously.
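Both routes to the spectrum can be illustrated with a short Python sketch. The function names are our own and purely illustrative. The first function counts equal frequencies in an observed rank-frequency sequence; the second applies condition (3.2.2) to an arbitrary theoretical function f(r), which here is simply a user-supplied callable evaluated at integer ranks.

```python
import math
from collections import Counter


def spectrum(ranked_frequencies):
    """Map each frequency x to the number of word forms occurring x times."""
    return dict(sorted(Counter(ranked_frequencies).items()))


def spectrum_from_function(f, max_rank):
    """For each x, count the ranks r = 1..max_rank with x <= f(r) < x + 1.

    The condition is equivalent to floor(f(r)) == x.
    """
    classes = Counter(math.floor(f(r)) for r in range(1, max_rank + 1))
    return dict(sorted(classes.items()))


# Frequency column of Table 3.2.1: 3, 3, four times 2, thirty-four times 1.
table_3_2_1 = [3, 3] + [2] * 4 + [1] * 34
print(spectrum(table_3_2_1))  # {1: 34, 2: 4, 3: 2}
```

Applied to the frequency column of Table 3.2.1, the first function reproduces Table 3.2.2.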
3.2.1 Stratification

Many sophisticated arguments have been presented together with tentative models of the rank-frequency distribution of word-like units. The observed regularity is known as “Zipf's law”, though several researchers before Zipf had discovered the utility of the power function for similar purposes, cf. http://www.nslij-genetics.org/wli/zipf (02-01.2010), where Wentian Li has collected much literature concerning this “law” in the sciences. Though Zipf himself did not have a concrete hypothesis and set up his formula simply by relying on the inspection of frequencies, other researchers brought in miscellaneous arguments (e.g. B. Mandelbrot 1953; G.A. Miller 1957; J.K. Orlov 1982; V.V. Arapov 1977; Arapov, Šrejder 1977, 1978; W. Li 1992; Baayen 2001; Naranan,
Balasubrahmanyan 2005; Manin 2009; Ferrer i Cancho 2005), and “Zipf's law” is perhaps the most widespread linguistic concept in all sciences. The development is driven, as is usual in science, by the emergence of counterarguments, different views, new interrelations, etc. As already mentioned above, the rank-frequency sequence is not really a distribution because ranks do not represent a random variable. This problem concerned G. Herdan (1956, etc.), who criticised it wherever he could. However, the power function made its victorious way into several scientific disciplines; in probability theory it is called the zeta distribution (sometimes also Joos' model, Riemann zeta distribution, Zipf distribution, Zipf's law, Zipf-Estoup law, etc., cf. Wimmer, Altmann 1999: 665 f.). Whether in the form of a function or of a distribution, it is a special case of the general theory of language laws (cf. Wimmer, Altmann 2005).

However, it has been shown that the degree of synthetism of a language plays an important role for the adequacy of Zipf's formula, especially in the domain of hapax legomena. In Popescu, Mačutek, Altmann (2009), Zipf's formula was fitted to three texts in languages with different degrees of synthetism. The fitting results, presented in Figures 3.2.1.1, 3.2.1.2 and 3.2.1.3, show that in a highly synthetic language such as Hungarian the power function lies below the hapax legomena; in a highly analytic language such as Hawaiian it lies above the hapax legomena; only in a moderately analytic-synthetic language such as Bulgarian does it cross the hapax legomena. This holds true, of course, only for unlemmatised texts.
Figure 3.2.1.1. Fitting the power function to a text in a highly synthetic language
Figure 3.2.1.2. Fitting the power function to a text in a highly analytic language
Figure 3.2.1.3. Fitting the power function to a text in a balanced synthetic language (from Popescu, Mačutek, Altmann 2009: 104 f.)
Though all fitting results in the figures are adequate, they lack some realism. In order to reconcile the observed facts with our intuition, we consider the ranked frequencies as a sequence and adhere to the proposal made by Popescu, Altmann, Köhler (2010) and Altmann, Popescu, Zotta (2013)¹, who consider the ranked frequencies as a superposition of different strata in the text. The simplest strata are formed by e.g. the parts-of-speech classes, autosemantics and synsemantics, specific words and general words, direct and indirect speech, persons in a stage play, etc. The study of strata is not finished; on the contrary, it is only just beginning.

¹ Altmann, G., Popescu, I.-I., Zotta, D. (2013). Stratification in Texts. Glottometrics 25, 85-93.

According to this approach, different strata have their own frequency sequences organised in decreasing order, in the form of a decay. Decays usually have the form y = a exp(-x/r). If we consider a superposition of these strata, i.e. if we feed the individual strata into the main stratum and re-rank the whole field, we obtain the function
$$y = \sum_{i} a_i \exp(-x / r_i)$$
with a variable number of exponential components. In addition, since we know that frequencies cannot be smaller than 1, we add 1 as the limit and finally obtain

(3.2.1.1)  y = 1 + a1 exp(−x/r1) + a2 exp(−x/r2) + a3 exp(−x/r3) + …

This function has several advantages:
(i) It is not a distribution, hence it need not be normalised, and it reconciles even those who reject ranks as a random variable.
(ii) It does not attain unrealistic values smaller than 1, a case that occurs with all distributions applied to this phenomenon if they are not truncated on the right-hand side (and often even then).
(iii) It automatically provides information about the number of strata: if two exponential components have the same (or very similar) parameter ri, then one of the components is redundant and may be omitted. Each parameter ai is only a part of the sum of all ai, which expresses the amplitude. When we eliminate a redundant component, a new fitting with the reduced number of exponential components automatically adds the lost part of the amplitude to the other ai.

We fitted function (3.2.1.1) to 146 poems by Eminescu and obtained the results presented in Table 3.2.1.2. The addition of new components was stopped as soon as a component had an exponent identical with a previous one. The poem Memento mori, which is very long, is given in the table with three strata, but it contains four. They are

y = 448.4420 exp(−x/1.6753) + 179.4063 exp(−x/9.7582) + 16.0070 exp(−x/79.0170) + 6.8450 exp(−x/412.8684)

with R2 = 0.9938. However, the fitting result does not improve considerably, because R2 is greater than 0.9 in both cases; hence we shall accept the three-component version. The table gives some answers and raises further questions. (1) We see that the maximal number of strata is 4, but it can be reduced to 3 because the poem Memento mori is captured sufficiently well with 3 components. Thus for Eminescu we have the results presented in Table 3.2.1.1.
Table 3.2.1.1: Number of strata in 146 poems by Eminescu

No. of strata | No. of poems
1 | 41
2 | 66
3 | 39
It can be supposed that there are no stratification preferences in Eminescu's poems, although the frequencies are not distributed uniformly (X2 = 9.39 with 2 DF).
(2) An increase or decrease in the number of strata in the course of time cannot be observed.
(3) Though the shortest poems have only one stratum and the long poems mostly have 3 strata, a clear tendency (correlation with N) cannot be detected. This is caused by the small variation in the number of strata. A tendency can at least be conjectured if we compare Eminescu's stratification with that of the Slovak poet E. Bachletová, who wrote only short poems (N < 171), all of which contain only one stratum.
(4) Is there an association between the number of strata and the average length of the verses? To answer this question, verse length would have to be counted line by line and poem by poem, but again, the small variation within the data would preclude reliable statements.
(5) Are there any semantic or content-dependent properties that cause the given stratification?
(6) Does stratification depend on the presence of communication or speech acts?
(7) Strata may arise even if a poem was not written in one go (this is always the case with long texts) or if the author or the editors made subsequent changes.
This list can easily be extended, but we shall postpone this or leave it to those researchers who are specialised in some of the given domains. We take it for granted that the number of strata in a poem is not a random event but is surely rooted in circumstances we do not yet know. Each of the above items requires both historical and literary knowledge, and the research will boil down to interpretative conjectures, because Eminescu himself cannot be asked.
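The uniformity check mentioned under Table 3.2.1.1 can be retraced with a minimal sketch, assuming a plain Pearson chi-square test against equal expected counts (the function names are ours). For 2 degrees of freedom the upper tail probability of the chi-square distribution has the closed form exp(−X²/2), so no statistical library is needed. With the counts 41, 66 and 39 the sketch yields a statistic of about 9.3, in the same range as the reported X2 = 9.39, and a probability below 0.01.

```python
import math


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def p_value_2df(x2):
    """Upper tail probability of the chi-square distribution with 2 DF."""
    return math.exp(-x2 / 2)


strata_counts = [41, 66, 39]  # poems with 1, 2 and 3 strata (Table 3.2.1.1)
x2 = chi_square_uniform(strata_counts)
print(round(x2, 2), round(p_value_2df(x2), 4))
```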
In order to illustrate the work with the stratification approach we present in Figures 3.2.1.4, 3.2.1.5, and 3.2.1.6 fitting results with (3.2.1.1) and one, two and three exponential components.
Figure 3.2.1.4. Fitting a mono-stratal poem Locul aripelor
Figure 3.2.1.5. Fitting a bi-stratal poem Speranţa
Figure 3.2.1.6. Fitting a tri-stratal poem Scrisoarea IV
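As can be seen in the last figures, the curves capture the frequencies adequately. The use of other formulas, however, yields worse results even for the mono-stratal case. In Figures 3.2.1.7, 3.2.1.8, and 3.2.1.9 we present the fitting of Zipf's, Zipf-Mandelbrot's and Zipf-Alekseev's distributions to the data from the poem Locul aripelor. The formulas are as follows.

Right truncated Zipf/zeta distribution:

(3.2.1.2)  $$P_x = \frac{x^{-a}}{F(n)}; \quad x = 1, 2, \ldots, n; \quad F(n) = \sum_{j=1}^{n} j^{-a}$$

Right truncated Zipf-Mandelbrot distribution:

(3.2.1.3)  $$P_x = \frac{(b + x)^{-a}}{F(n)}; \quad x = 1, 2, \ldots, n; \quad F(n) = \sum_{j=1}^{n} (b + j)^{-a}$$

Right truncated modified Zipf-Alekseev distribution:

(3.2.1.4)  $$P_x = \begin{cases} \alpha, & x = 1 \\ \dfrac{(1 - \alpha)\, x^{-(a + b \ln x)}}{T}, & x = 2, 3, \ldots, n \end{cases} \qquad T = \sum_{j=2}^{n} j^{-(a + b \ln j)}$$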
Figure 3.2.1.7. Locul aripelor: Fitting the Right truncated zeta distribution with: a = 0.5305, n = 173, X2 = 12.7007, DF = 134, P ≈ 1.00, f(r) < 1 for r > 101
Figure 3.2.1.8. Locul aripelor: Fitting the Right truncated Zipf-Mandelbrot distribution with a = 0.6601, b = 3.8588, n = 173, X2 = 14.22, DF = 130, P ≈ 1.00, f(r) < 1 for r > 97
The result is excellent in each case: the probability is always ≈ 1.00. The figures nevertheless show some deficiencies and differences, especially the convergence towards 0, which is present even with the truncated zeta/Zipf distribution. Both approaches have similar problems: with a distribution as a model, it is not always easy to find a plausible interpretation of the parameters; with the stratification approach, the problem is the determination and interpretation of the strata. In the optimal case one ought to derive the parameters from a theory, but this is a dream of the future.
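The convergence towards 0 can be retraced numerically. The following Python sketch rests on several assumptions of our own: it uses the usual textbook forms of the right truncated zeta, Zipf-Mandelbrot and modified Zipf-Alekseev distributions, takes the parameter values from the captions of Figures 3.2.1.7 to 3.2.1.9, and multiplies the probabilities by the text length N = 259 listed for Locul aripelor in Table 3.2.1.2. All function names are hypothetical.

```python
import math


def expected_frequencies(weights, total):
    """Scale non-negative weights to expected frequencies summing to total."""
    norm = sum(weights)
    return [total * w / norm for w in weights]


def zeta_weights(a, n):
    """Unnormalised right truncated zeta weights r^(-a), r = 1..n."""
    return [r ** -a for r in range(1, n + 1)]


def zipf_mandelbrot_weights(a, b, n):
    """Unnormalised right truncated Zipf-Mandelbrot weights (b + r)^(-a)."""
    return [(b + r) ** -a for r in range(1, n + 1)]


def zipf_alekseev_frequencies(a, b, alpha, n, total):
    """Modified form: P(1) = alpha, the ranks 2..n share 1 - alpha."""
    tail = [r ** -(a + b * math.log(r)) for r in range(2, n + 1)]
    return [total * alpha] + expected_frequencies(tail, total * (1 - alpha))


def first_rank_below_one(frequencies):
    """Smallest rank whose expected frequency is smaller than 1."""
    return next(r for r, f in enumerate(frequencies, start=1) if f < 1)


N, n = 259, 173  # text length and truncation point for Locul aripelor
fits = {
    "zeta": expected_frequencies(zeta_weights(0.5305, n), N),
    "Zipf-Mandelbrot": expected_frequencies(
        zipf_mandelbrot_weights(0.6601, 3.8588, n), N
    ),
    "Zipf-Alekseev": zipf_alekseev_frequencies(0.3351, 0.0337, 0.0309, n, N),
}
for name, freqs in fits.items():
    print(name, first_rank_below_one(freqs))
```

Under these assumptions the expected frequencies drop below 1 at ranks of roughly one hundred, in line with the ranks reported in the figure captions, while the observed frequencies can never be smaller than 1.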
Figure 3.2.1.9. Locul aripelor: Right truncated modified Zipf-Alekseev: a = 0.3351, b = 0.0337, n = 173, α = 0.0309, X2 = 13.98, DF = 130, P ≈ 1.00; f(r) < 1 for r > 97
In Table 3.2.1.2 the exponential parameters are arranged in increasing order. A preliminary examination yielded the following results:
– The parameter r1 of the mono-stratal data depends on text length N, as can be seen in Figure 3.2.1.10. The dependence could be captured by means of a linear function, but this finding cannot be generalised as yet.
– In bi-stratal data the relationship between the parameters r1 and r2 displays a considerable dispersion (Figure 3.2.1.11). Though an increasing trend is visible, many more data are necessary to establish a relationship.
– In tri-stratal data there are three ways of combining two parameters, as can be seen in Figures 3.2.1.12, 3.2.1.13, and 3.2.1.14. In the first case, the outlier tempts one to accept a linear trend, but the dispersion is too great. The relationship between r1 and r3 in Figure 3.2.1.13 is a cloud rather than a strict relationship. And that between r2 and r3, as presented in Figure 3.2.1.14, is again a very peculiar phenomenon.
Thus, until further investigations have been performed, we cannot set up even hypotheses about these relations. The result was obtained abductively; it is pre-theoretical and needs substantiation.

Let us briefly recall how the spectrum arises. It is created from the rank-frequency distribution by “turning it round”, without recourse to the ranking. The spectral value g(1) is the number of different hapax legomena, g(2) the number of different dis legomena, g(3) the number of different tris legomena, etc., i.e. the number of different words occurring 1, 2, 3, etc. times.
Table 3.2.1.2 (continued)

Poem title | Year | N | a1 | r1 | a2 | r2 | a3 | r3 | R2 | Strata
Crăiasa din poveşti | 1876 | 122 | 620.1621 | 0.1967 | 2.3550 | 11.5610 | X | X | 0.9424 | 2
Criticilor mei | 1883 | 130 | 2.9378 | 14.4864 | X | X | X | X | 0.8984 | 1
Cu mâne zilele-ţi adaogi… | 1883 | 141 | 5.9539 | 1.4186 | 2.1354 | 15.9017 | X | X | 0.9244 | 2
Cugetările sărmanului Dionis | 1872 | 571 | 53.0793 | 0.7128 | 11.5949 | 6.7859 | 3.0792 | 31.9774 | 0.9899 | 3
Cum negustorii din Constantinopol | 1874 | 101 | 5.0416 | 4.0443 | X | X | X | X | 0.9663 | 1
Cum oceanu-ntărâtat... | 1873 | 77 | 4.3323 | 0.5342 | 3.0751 | 3.6733 | X | X | 0.9429 | 3
Dacă treci râul Selenei | 1873 | 356 | 26.2532 | 2.7062 | 3.3878 | 20.7827 | X | X | 0.9759 | 2
De câte ori, iubito… | 1879 | 102 | 30.1546 | 0.4500 | 1.9529 | 8.5934 | X | X | 0.9426 | 2
De ce nu-mi vii | 1887 | 123 | 1009.5406 | 0.1649 | 4.8512 | 2.2171 | 2.7798 | 12.0362 | 0.9657 | 3
De ce să mori tu? | 1869 | 266 | 12.5348 | 1.0288 | 7.9849 | 11.7215 | X | X | 0.9849 | 2
De-aş avea | 1866 | 93 | 5.9555 | 6.0681 | X | X | X | X | 0.9592 | 1
De-aş muri ori de-ai muri | 1869 | 258 | 6.5861 | 4.1209 | 4.7201 | 15.1460 | X | X | 0.9755 | 2
Demonism | 1872 | 882 | 39.9979 | 3.1068 | 8.9490 | 30.5860 | X | X | 0.9692 | 2
De-oi adormi (variantă) | 1883 | 122 | 4.0551 | 5.0045 | X | X | X | X | 0.9526 | 1
De-or trece anii… | 1883 | 87 | 8.1834 | 3.4878 | X | X | X | X | 0.9808 | 1
Departe sunt de tine | 1878 | 135 | 3.5936 | 1.8192 | 4.4930 | 6.2784 | X | X | 0.9714 | 2
Despărţire | 1879 | 304 | 13.1881 | 1.1467 | 2.4289 | 4.8224 | 6.0690 | 14.3645 | 0.9844 | 3
Din Berlin la Potsdam | 1873 | 128 | 6.1389 | 1.7924 | 2.8418 | 8.2511 | X | X | 0.9580 | 2
Din lyra spartă... | 1867 | 51 | 2.4715 | 3.5407 | X | X | X | X | 0.8790 | 1
Din noaptea | 1884 | 68 | 2.9175 | 4.6150 | X | X | X | X | 0.9120 | 1
Din străinătate | 1866 | 244 | 15.5203 | 2.2650 | 3.0936 | 17.1414 | X | X | 0.9711 | 2
Din valurile vremii… | 1883 | 152 | 1048.0385 | 0.1447 | 5.5159 | 9.2768 | X | X | 0.9661 | 2
Dintre sute de catarge | 1880 | 50 | 5.1076 | 2.3633 | X | X | X | X | 0.9265 | 1
Doi aştri | 1872 | 40 | 1023.8836 | 0.1443 | X | X | X | X | 1.0000 | 1
Dorinţa | 1876 | 102 | 16.4283 | 0.7296 | 2.2071 | 6.1129 | X | X | 0.9621 | 2
Dumnezeu şi om | 1873 | 443 | 15.2903 | 4.2676 | 2.5673 | 27.6472 | X | X | 0.9781 | 2
Ecò | 1872 | 698 | 34.9965 | 4.5633 | 3.0133 | 40.4731 | X | X | 0.9836 | 2
Egipetul | 1872 | 688 | 26.4144 | 4.3648 | 3.4749 | 41.1845 | X | X | 0.9867 | 2
Epigonii | 1870 | 921 | 23.6970 | 2.4497 | 20.9277 | 6.3594 | 3.3973 | 58.2903 | 0.9887 | 3
Făt-Frumos din tei | 1875 | 415 | 11.4825 | 2.0479 | 4.4631 | 2.6915 | 6.4490 | 17.4201 | 0.9900 | 3
Feciorul de împărat fără de stea | 1872 | 6030 | 142.3625 | 2.1681 | 130.6617 | 11.4048 | 10.6212 | 194.6782 | 0.9847 | 3
Floare-albastră | 1873 | 247 | 14.9171 | 2.2033 | 2.5109 | 16.0441 | X | X | 0.9256 | 2
Foaia veştedă (după Lenau) | 1879 | 115 | 511.6083 | 0.1729 | 2.8761 | 5.8439 | X | X | 0.9461 | 2
Freamăt de codru | 1879 | 179 | 4.0893 | 1.9703 | 4.1896 | 7.9319 | X | X | 0.9752 | 2
Frumoasă şi jună | 1871 | 113 | 1811.9780 | 0.1551 | 4.7950 | 6.7276 | X | X | 0.9703 | 2
Ghazel | 1873 | 331 | 9.4178 | 2.5320 | 5.1449 | 16.7188 | X | X | 0.9815 | 2
Glossă | 1883 | 380 | 12.9813 | 1.2626 | 4.4059 | 5.1963 | 9.0697 | 18.4480 | 0.9872 | 3
Horia | 1867 | 143 | 19491.6295 | 0.1114 | 2.8539 | 8.6061 | X | X | 0.9508 | 2
Iar când voi fi pământ (variantă) | 1883 | 131 | 9.9354 | 0.6885 | 3.0310 | 8.2061 | X | X | 0.9559 | 2
Împarat şi proletar | 1874 | 1510 | 47.4060 | 2.7852 | 19.9569 | 13.2994 | 4.9170 | 61.2710 | 0.9923 | 3
În căutarea Şeherazadei | 1874 | 915 | 35.7173 | 3.5201 | 3.1089 | 10.8104 | 4.7263 | 39.9599 | 0.9931 | 3
Înger de pază | 1871 | 91 | 16.6935 | 0.5082 | 1.8382 | 10.9270 | X | X | 0.9064 | 2
Înger şi demon | 1873 | 876 | 23.0117 | 1.8070 | 16.2868 | 13.0474 | 2.4309 | 53.0710 | 0.9829 | 3
Îngere palid... | 1870 | 63 | 2.6544 | 4.5588 | X | X | X | X | 0.9004 | 1
Întunericul şi poetul | 1869 | 249 | 11.1358 | 3.6606 | 2.1619 | 19.2127 | X | X | 0.9669 | 2
Iubind în taină… | 1883 | 87 | 2.6544 | 4.5588 | X | X | X | X | 0.9050 | 1
Iubită dulce, o, mă lasă | 1871 | 337 | 12.0744 | 0.7234 | 7.3443 | 17.3343 | X | X | 0.9839 | 2
Iubitei | 1871 | 416 | 18.0428 | 1.5403 | 6.8214 | 24.1768 | X | X | 0.9888 | 2
Junii corupţi | 1869 | 458 | 20.7220 | 1.8898 | 11.7782 | 10.9045 | X | X | 0.9858 | 2
Kamadeva | 1887 | 81 | 5.1921 | 0.5548 | 2.6971 | 4.4592 | X | X | 0.9338 | 2
La Bucovina | 1866 | 184 | 6.6344 | 1.4426 | 2.8229 | 14.6002 | X | X | 0.9539 | 2
La mijloc de codru… | 1883 | 55 | 21.1647 | 1.3732 | X | X | X | X | 0.9699 | 1
La moartea lui Heliade | 1867 | 332 | 2010.6876 | 0.1311 | 11.0637 | 2.9660 | 4.3421 | 19.6092 | 0.9771 | 3
La moartea lui Neamţu | 1870 | 245 | 22.1321 | 0.3829 | 1.5359 | 3.4484 | 4.5061 | 15.6923 | 0.9690 | 3
La moartea principelui Ştirbey | 1869 | 132 | 6.0559 | 6.5669 | X | X | X | X | 0.9481 | 1
La mormântul lui Aron Pumnul | 1866 | 150 | 502.6801 | 0.1630 | 6.5360 | 1.6817 | 2.5251 | 11.0561 | 0.9614 | 3
La o artistă (Ca a nopţii poezie) | 1868 | 142 | 1009.8848 | 0.1583 | 3.4573 | 1.7790 | 2.3545 | 14.9838 | 0.9415 | 3
La o artistă (Credeam ieri...) | 1869 | 219 | 52.6423 | 0.5313 | 3.1343 | 20.0234 | X | X | 0.9694 | 2
La Quadrat | 1870 | 110 | 6.7652 | 4.8902 | X | X | X | X | 0.9466 | 1
La steaua | 1886 | 71 | 3.1825 | 3.6197 | X | X | X | X | 0.9073 | 1
Lacul | 1876 | 90 | 1.9511 | 2.0911 | 5.0539 | 3.8944 | X | X | 0.9741 | 2
Lasă-ţi lumea… | 1883 | 225 | 8.7220 | 1.5835 | 4.9613 | 10.6603 | X | X | 0.9837 | 2
Lebăda | 1869 | 41 | 4.5000 | 1.4427 | X | X | X | X | 0.8348 | 1
Lida | 1866 | 66 | 5198.9942 | 0.1216 | 1.9852 | 4.7291 | X | X | 0.9194 | 2
Locul aripelor | 1869 | 259 | 8.0232 | 11.4417 | X | X | X | X | 0.9833 | 1
Luceafărul | 1883 | 1737 | 109.0767 | 1.1309 | 35.8357 | 9.7030 | 6.3336 | 82.1682 | 0.9947 | 3
Mai am un singur dor | 1883 | 125 | 2.5801 | 9.4739 | X | X | X | X | 0.8653 | 1
Melancolie | 1876 | 274 | 2050.7692 | 0.1770 | 7.8142 | 4.2871 | 2.7489 | 17.8317 | 0.9854 | 3
Memento mori | 1872 | 9773 | 459.7574 | 1.7496 | 171.4638 | 11.2838 | 14.9325 | 234.1431 | 0.9930 | 3
Miradoniz | 1872 | 636 | 52.0807 | 1.8462 | 5.7625 | 8.0546 | 4.4058 | 34.3053 | 0.9941 | 3
Misterele nopţii | 1866 | 155 | 14.7594 | 0.5051 | 4.3472 | 10.9757 | X | X | 0.9639 | 2
Mitologicale | 1873 | 681 | 43.8444 | 1.9040 | 6.7523 | 8.9135 | 3.2527 | 38.8606 | 0.9627 | 3
Mortua est! | 1871 | 491 | 2413.1118 | 0.1537 | 16.0205 | 4.7484 | 4.5730 | 28.8266 | 0.9900 | 3
Mureşanu | 1876 | 2051 | 90.3787 | 1.9000 | 31.4086 | 14.4051 | 4.5796 | 120.0863 | 0.9931 | 3
Murmură glasul mării | 1873 | 119 | 20092.6605 | 0.1092 | 3.4122 | 5.8434 | X | X | 0.9599 | 2
Napoleon | 1874 | 240 | 23.3121 | 1.2236 | 3.0347 | 19.0081 | X | X | 0.9786 | 2
Noaptea… | 1871 | 177 | 8.4605 | 0.6900 | 5.5876 | 9.2722 | X | X | 0.9731 | 2
Nu e steluţă | 1867 | 54 | 2.6189 | 6.2878 | X | X | X | X | 0.8884 | 1
Nu mă-nţelegi | 1882 | 384 | 9.8973 | 5.7220 | 3.3607 | 24.0198 | X | X | 0.9739 | 2
Nu voi mormânt bogat (variantă) | 1883 | 113 | 5.6330 | 3.1103 | X | X | X | X | 0.9756 | 1
Numai poetul | 1868 | 48 | 2.9884 | 3.3896 | X | X | X | X | 0.9141 | 1
O arfă pe-un mormânt | 1873 | 157 | 6.7169 | 1.3403 | 4.3446 | 8.4609 | X | X | 0.9774 | 2
O călărire în zori | 1866 | 346 | 71101.8760 | 0.1114 | 8.1815 | 4.3886 | 2.6000 | 30.1466 | 0.9760 | 3
O stea pin ceruri | 1869 | 78 | 2.7075 | 5.7072 | X | X | X | X | 0.9065 | 1
O, adevăr sublime... | 1874 | 334 | 18.3782 | 3.7779 | 3.1713 | 16.1917 | X | X | 0.9859 | 2
O, mamă… | 1880 | 140 | 3.9817 | 2.1181 | 4.1507 | 9.4530 | X | X | 0.9595 | 2
Odă în metru antic | 1883 | 103 | 3.7195 | 6.1772 | X | X | X | X | 0.9453 | 1
Odin şi poetul | 1872 | 1429 | 54.6362 | 0.9785 | 35.4042 | 9.2958 | 4.3825 | 87.6763 | 0.9904 | 3
Ondina (Fantazie) | 1869 | 871 | 25.5026 | 1.2515 | 22.2535 | 6.7031 | 3.8466 | 48.8057 | 0.9914 | 3
Oricâte stele… | 1878 | 85 | 1.9520 | 7.2602 | X | X | X | X | 0.8349 | 1
Pajul Cupidon… | 1879 | 148 | 11.6078 | 3.3671 | X | X | X | X | 0.9697 | 1
Pe aceeaşi ulicioară… | 1879 | 138 | 3.3786 | 2.4637 | 4.4394 | 7.0803 | X | X | 0.9771 | 2
Pe lângă plopii fără soţ | 1883 | 199 | 5.1965 | 1.3915 | 6.2670 | 9.8897 | X | X | 0.9808 | 2
Peste vârfuri | 1883 | 47 | 7.7134 | 1.5149 | X | X | X | X | 0.9843 | 1
Povestea codrului | 1878 | 220 | 11.7077 | 1.4751 | 2.6900 | 16.5057 | X | X | 0.9442 | 2
Povestea teiului | 1887 | 390 | 2279.6645 | 0.1617 | 10.4876 | 3.6789 | 4.5035 | 21.3489 | 0.9851 | 3
Prin nopţi tăcute | 1869 | 48 | 2.9884 | 3.3896 | X | X | X | X | 0.9141 | 1
Privesc oraşul furnicar | 1873 | 173 | 4182.5534 | 0.1398 | 4.8366 | 2.2154 | 2.9397 | 9.5797 | 0.9747 | 3
Pustnicul | 1874 | 380 | 7.2483 | 4.1610 | 5.8695 | 15.2017 | X | X | 0.9881 | 2
Replici | 1871 | 147 | 17.6037 | 4.8747 | X | X | X | X | 0.9452 | 1
Revedere | 1879 | 141 | 5.6647 | 7.5513 | X | X | X | X | 0.9662 | 1
Rugăciunea unui dac | 1879 | 357 | 14.4738 | 3.8073 | 2.9110 | 20.8656 | X | X | 0.9776 | 2
S-a dus amorul | 1883 | 219 | 205.8417 | 0.1904 | 8.9709 | 8.0310 | X | X | 0.9853 | 2
Sara pe deal | 1885 | 156 | 2.6301 | 2.2015 | 4.0110 | 6.5669 | X | X | 0.9604 | 2
Scrisoarea I | 1881 | 1282 | 62.2749 | 2.9501 | 9.1858 | 44.9864 | X | X | 0.9833 | 2
Scrisoarea II | 1881 | 696 | 14062.8669 | 0.1374 | 19.3143 | 9.7624 | 1.8867 | 48.6273 | 0.9835 | 3
Scrisoarea III | 1881 | 2278 | 113.6286 | 2.2583 | 40.4430 | 9.8721 | 5.6378 | 99.8282 | 0.9802 | 3
Scrisoarea IV | 1881 | 1256 | 103.8187 | 1.0165 | 22.9525 | 9.5323 | 4.8122 | 62.6190 | 0.9961 | 3
Scrisoarea V | 1881 | 1027 | 45.3177 | 3.1912 | 11.1447 | 11.3547 | 3.9754 | 61.7916 | 0.9903 | 3
Se bate miezul nopţii… | 1883 | 45 | 3.1155 | 2.1840 | X | X | X | X | 0.9179 | 1
Şi dacă… | 1883 | 53 | 3.5954 | 5.3136 | X | X | X | X | 0.9257 | 1
Singurătate | 1878 | 172 | 5.7752 | 7.3088 | X | X | X | X | 0.9593 | 1
Somnoroase păsărele… | 1883 | 55 | 4.2714 | 2.6561 | X | X | X | X | 0.9502 | 1
Sonete | 1879 | 265 | 20.1325 | 0.4237 | 5.4946 | 13.5360 | X | X | 0.9768 | 2
Speranţa | 1868 | 245 | 28.0111 | 1.4522 | 4.6618 | 17.0697 | X | X | 0.9815 | 2
Steaua vieţii | 1871 | 70 | 3.2808 | 5.2560 | X | X | X | X | 0.9161 | 1
Stelele-n cer | 1879 | 91 | 5.1921 | 0.5548 | 2.6971 | 4.4592 | X | X | 0.9342 | 2
Sus în curtea cea domnească | 1870 | 128 | 5.4093 | 0.5535 | 3.5815 | 7.2919 | X | X | 0.9535 | 2
Te duci... | 1883 | 84 | 1005.2837 | 0.1888 | 3.9763 | 3.4067 | X | X | 0.9875 | 2
Trecut-au anii | 1883 | 88 | 6.3903 | 0.5808 | 2.1689 | 6.8424 | X | X | 0.9105 | 2
Unda spumă | 1869 | 59 | 2.8065 | 5.1492 | X | X | X | X | 0.9059 | 1
Venere şi Madona | 1887 | 393 | 18.1985 | 2.8804 | 3.8295 | 28.3750 | X | X | 0.9744 | 2
Veneţia (de Gaetano Cerri) | 1883 | 79 | 2.9884 | 3.3896 | X | X | X | X | 0.9195 | 1
Viaţa mea fu ziuă | 1869 | 105 | 4.7017 | 4.6823 | X | X | X | X | 0.9609 | 1
Vis | 1876 | 177 | 7.0752 | 1.6542 | 2.5533 | 13.3389 | X | X | 0.9537 | 2
Figure 3.2.1.10. Dependence of r1 on N in mono-stratal data
Figure 3.2.1.11. The relationship of r1 and r2 in bi-stratal data
Figure 3.2.1.12. Relationship between r1 and r2 in tri-stratal data
Figure 3.2.1.13. Relationship between r1 and r3 in tri-stratal data
Figure 3.2.1.14. Relationship between r2 and r3 in tri-stratal data
For an illustration, let us consider the rank-frequency sequence of the poem Lacul, namely 6, 5, 4, 3, six successive 2s, and sixty successive 1s; hence the corresponding spectrum is g(1) = 60, g(2) = 6, g(3) = 1, g(4) = 1, g(5) = 1, and g(6) = 1. Transforming the data according to (3.2.2) in order to obtain the frequency spectra of the individual poems, it is possible to show that spectra, too, are stratified entities. Since spectra contain a true random variable, one can expect that the exponential fitting (3.2.1.1) will be at least as good as with ranked frequencies. As shown in Table 3.2.1.3, this is true, and only one stratum is required. Indeed, with increasing frequency x, the decay of the spectrum g(x) closely follows the simple exponential law

g(x) = 1 + a*exp(-x/b)

with a and b as text parameters. The average value of the determination coefficient over the 146 poems considered in Table 3.2.1.3 is very high, R̄2 = 0.9987, with a very small standard deviation of 0.0020.

Table 3.2.1.3: Exponential fitting of the spectrum g(x) = 1 + a*exp(-x/b) in 146 poems by Eminescu (an asterisk marks the transformation g*(x) = g(x) – g(W) + 1, where W = the greatest non-zero class; see the comment on indicator B in Chapter 3.2.6)
Poem title | a | b | R2
Adânca mare... | 1354.4756 | 0.3103 | 0.9982
Adio | 568.3042 | 0.5328 | 0.9995
Ah, mierea buzei tale | 735.7630 | 0.5182 | 0.9911
Amicului F.I. | 1001.1384 | 0.5323 | 0.9993
Amorul unei marmure* | 869.0966 | 0.5497 | 0.9995
Andrei Mureşanu | 4425.6238 | 0.5681 | 0.9981
Atât de fragedă … | 859.3323 | 0.4842 | 0.9988
Aveam o muză | 1597.6866 | 0.5100 | 0.9991
Basmul ce i l-aş spune ei | 1240.1196 | 0.5523 | 0.9985
Călin (file de poveste) | 4102.0535 | 0.6249 | 0.9985
Când | 725.1972 | 0.4639 | 0.9997
Când amintirile… | 4617.2887 | 0.2395 | 0.9977
Când crivăţul cu iarna... | 2885.9093 | 0.4701 | 0.9995
Când marea... | 449.0164 | 0.5049 | 0.9986
Când priveşti oglinda mărei | 805.7183 | 0.4093 | 0.9999
Care-i amorul meu în astă lume | 1313.6504 | 0.4309 | 0.9979
Ce e amorul? | 737.9769 | 0.4425 | 1.0000
Ce te legeni… | 376.5405 | 0.5446 | 0.9989
Ce-ţi doresc eu ţie, dulce Românie | 649.0010 | 0.5318 | 0.9985
Cine-i? | 505.3207 | 0.5206 | 0.9997
Copii eram noi amândoi | 1604.0263 | 0.4849 | 0.9993
Crăiasa din poveşti | 364.6824 | 0.6219 | 0.9997
Criticilor mei* | 203.2197 | 0.8219 | 0.9976
Cu mâne zilele-ţi adaogi… | 331.8470 | 0.6918 | 0.9970
Cugetările sărmanului Dionis | 2704.4726 | 0.4705 | 0.9995
Cum negustorii din Constantinopol | 1733.1756 | 0.3171 | 0.9992
Cum oceanu-ntărâtat... | 878.1955 | 0.3703 | 1.0000
Dacă treci râul Selenei | 1262.1373 | 0.5237 | 0.9999
De câte ori, iubito… | 508.9190 | 0.5042 | 0.9994
De ce nu-mi vii | 238.0511 | 0.7091 | 0.9984
De ce să mori tu? | 2147.2913 | 0.3662 | 0.9950
De-aş avea* | 427.8289 | 0.4395 | 0.9954
De-aş muri ori de-ai muri | 1102.0203 | 0.4711 | 0.9961
Demonism | 2017.5358 | 0.5685 | 0.9970
De-oi adormi (variantă)* | 2827.1894 | 0.2929 | 0.9999
De-or trece anii… | 680.9978 | 0.3888 | 0.9995
Departe sunt de tine | 983.8634 | 0.4162 | 0.9998
Despărţire | 1326.8446 | 0.4755 | 0.9990
Din Berlin la Potsdam | 675.3902 | 0.4743 | 0.9998
Din lyra spartă... | 349.9895 | 0.4451 | 0.9998
Din noaptea* | 1061.9776 | 0.3186 | 1.0000
Din străinătate | 933.7838 | 0.5149 | 0.9986
Din valurile vremii… | 492.4733 | 0.5463 | 0.9973
Dintre sute de catarge* | 1156.0026 | 0.2836 | 1.0000
Doi aştri | 12102.1473 | 0.1727 | 1.0000
Dorinţa | 782.4757 | 0.4240 | 1.0000
Dumnezeu şi om | 2160.4607 | 0.4791 | 0.9999
Ecò | 2537.8588 | 0.5128 | 0.9999
Egipetul | 2531.8318 | 0.5162 | 0.9992
Epigonii | 2480.0678 | 0.5789 | 0.9996
Făt-Frumos din tei | 2437.8153 | 0.4259 | 0.9987
Feciorul de împărat fără de stea | 7251.3292 | 0.6513 | 0.9952
Floare-albastră | 1420.0060 | 0.4527 | 0.9984
Foaia veştedă (după Lenau) | 1070.6388 | 0.3984 | 1.0000
Freamăt de codru | 1645.0020 | 0.3868 | 0.9996
Frumoasă şi jună | 608.9169 | 0.4500 | 0.9973
Ghazel | 1552.7107 | 0.4736 | 0.9990
Glossă | 1542.3236 | 0.4166 | 0.9854
Horia | 1000.4145 | 0.4379 | 0.9995
Iar când voi fi pământ (variantă) | 777.4744 | 0.4614 | 0.9999
Împarat şi proletar | 6333.5097 | 0.4554 | 0.9991
În căutarea Şeherazadei | 4337.9540 | 0.4602 | 0.9994
Înger de pază | 247.0794 | 0.6586 | 0.9959
Înger şi demon | 3043.6635 | 0.5036 | 0.9994
Îngere palid...* | 470.0535 | 0.4181 | 0.9999
Întunericul şi poetul | 960.0351 | 0.5233 | 0.9999
Iubind în taină…* | 1130.1664 | 0.3539 | 1.0000
Iubită dulce, o, mă lasă | 1085.2744 | 0.5238 | 0.9971
Iubitei | 932.7079 | 0.5931 | 0.9962
Junii corupţi | 6592.8465 | 0.3144 | 0.9996
Kamadeva | 754.1010 | 0.3977 | 0.9999
La Bucovina | 650.0029 | 0.5658 | 1.0000
La mijloc de codru… | 261.3333 | 0.4477 | 1.0000
La moartea lui Heliade | 1547.0818 | 0.4659 | 0.9967
La moartea lui Neamţu | 884.7791 | 0.5316 | 0.9976
La moartea principelui Ştirbey | 1365.4426 | 0.3571 | 0.9968
La mormântul lui Aron Pumnul | 669.0598 | 0.5124 | 0.9994
La o artistă (Ca a nopţii poezie) | 330.7506 | 0.6868 | 0.9987
La o artistă (Credeam ieri...) | 468.6121 | 0.6937 | 0.9990
La Quadrat | 452.0503 | 0.5032 | 0.9988
La steaua* | 7881.7018 | 0.1999 | 0.9999
Lacul | 706.0953 | 0.4029 | 0.9999
Lasă-ţi lumea… | 1697.2405 | 0.4019 | 0.9988
Lebăda* | 672.4397 | 0.3317 | 1.0000
Lida | 480.3582 | 0.4381 | 1.0000
Locul aripelor* | 1302.7430 | 0.4440 | 0.9992
Luceafărul | 2575.7621 | 0.6715 | 0.9984
Mai am un singur dor | 519.8061 | 0.5488 | 0.9996
Melancolie | 1116.5236 | 0.5081 | 0.9998
Memento mori | 11401.9795 | 0.6507 | 0.9948
Miradoniz | 1927.2720 | 0.5326 | 0.9986
Misterele nopţii | 704.3565 | 0.4753 | 0.9933
Mitologicale | 2526.0404 | 0.5125 | 0.9997
Mortua est! | 1378.5337 | 0.5527 | 0.9979
Mureşanu | 2920.5188 | 0.6840 | 0.9982
Murmură glasul mării | 1454.2049 | 0.3565 | 0.9987
Napoleon | 714.8968 | 0.5891 | 0.9994
Noaptea… | 1636.4762 | 0.3654 | 0.9951
Nu e steluţă* | 144.3022 | 0.5839 | 0.9979
Nu mă-nţelegi* | 1464.2586 | 0.5072 | 0.9990
Nu voi mormânt bogat (variantă) | 4053.6331 | 0.2634 | 0.9999
Numai poetul* | 515.9543 | 0.3597 | 1.0000
O arfă pe-un mormânt | 1000.1551 | 0.4304 | 0.9988
O călărire în zori | 827.9697 | 0.6461 | 0.9981
O stea pin ceruri* | 683.8633 | 0.3881 | 0.9999
O, adevăr sublime... | 1844.0705 | 0.4410 | 0.9998
O, mamă… | 525.3332 | 0.5170 | 0.9970
Odă în metru antic* | 760.3918 | 0.4142 | 0.9999
Odin şi poetul | 2471.2823 | 0.6440 | 0.9978
Ondina (Fantazie) | 2840.3157 | 0.5269 | 0.9993
Oricâte stele… | 430.4143 | 0.5119 | 0.9993
Pajul Cupidon…* | 3541.9342 | 0.2819 | 0.9998
Pe aceeaşi ulicioară… | 799.2952 | 0.4462 | 0.9999
Pe lângă plopii fără soţ | 1463.1455 | 0.3904 | 0.9973
Peste vârfuri | 1157.9948 | 0.2834 | 1.0000
Povestea codrului | 943.3530 | 0.5182 | 0.9993
Povestea teiului | 1414.5720 | 0.5216 | 0.9996
Prin nopţi tăcute* | 515.9543 | 0.3597 | 1.0000
Privesc oraşul furnicar | 1125.8307 | 0.4400 | 0.9999
Pustnicul | 3102.1643 | 0.3843 | 0.9988
Replici | 415.5590 | 0.4944 | 0.9987
Revedere | 628.3138 | 0.4881 | 0.9987
Rugăciunea unui dac | 1774.9980 | 0.4695 | 0.9999
S-a dus amorul | 1578.8917 | 0.3968 | 0.9991
| Poem title | a | b | R² |
|---|---|---|---|
| Sara pe deal | 1528.3283 | 0.3826 | 0.9997 |
| Scrisoarea I | 3893.9812 | 0.5117 | 0.9982 |
| Scrisoarea II | 2429.4491 | 0.5092 | 0.9998 |
| Scrisoarea III | 4835.4891 | 0.5835 | 0.9988 |
| Scrisoarea IV | 3500.8155 | 0.5374 | 0.9988 |
| Scrisoarea V | 2045.7178 | 0.6208 | 0.9992 |
| Se bate miezul nopţii… | 616.4617 | 0.3486 | 1.0000 |
| Şi dacă… | 141.1241 | 0.5902 | 0.9880 |
| Singurătate* | 2386.1162 | 0.3288 | 0.9977 |
| Somnoroase păsărele… | 512.9476 | 0.3881 | 1.0000 |
| Sonete | 1485.5921 | 0.4474 | 0.9983 |
| Speranţa | 478.7944 | 0.6460 | 0.9979 |
| Steaua vieţii | 266.2396 | 0.5485 | 1.0000 |
| Stelele-n cer | 881.1158 | 0.3859 | 1.0000 |
| Sus în curtea cea domnească | 919.5371 | 0.4261 | 0.9992 |
| Te duci... | 1166.2636 | 0.3370 | 0.9997 |
| Trecut-au anii | 496.1178 | 0.4809 | 0.9996 |
| Unda spumă* | 414.2072 | 0.4047 | 0.9999 |
| Venere şi Madona | 836.2394 | 0.6558 | 0.9987 |
| Veneţia (de Gaetano Cerri)* | 1988.4880 | 0.2897 | 1.0000 |
| Viaţa mea fu ziuă | 1046.7900 | 0.3774 | 0.9994 |
| Vis | 789.2336 | 0.5145 | 0.9999 |
3.2.2 Ord's criterion

The study of a writer's word-frequencies is always performed with the intent to find some order in his writing. Though themes and style may change, we conjecture that there are some properties in the text that are mastered by the writer only partially, especially as long as the text is short. But as soon as the text begins to increase, the writer loses the overall control and some of the properties develop without his conscious will. In such a situation, self-organisation begins to work and “pulls” the text to an attractor which must be found analytically. This is the case e.g. with the addition of strata whose number may increase in the course of text writing (see Chapter 3.2.1). By self-organisation, new strata are laid on the old ones and our apparatus captures the probable form of the attractor.

Here we shall restrict ourselves to another attractor, viz. Ord's criterion (cf. J.K. Ord, 1972), used frequently in linguistics. To this end we need the moments of the ranked frequency sequence which may be a distribution or merely a sequence of numbers. We compute the usual first moment (average), the second
central moment (here variance) and the third central moment, an indicator of the skewness of the ranked sequence. The definitions are

$$ m_1' = \bar{r} = \frac{1}{N}\sum_{r=1}^{V} r\, f(r) \tag{3.2.2.1} $$

where r are the ranks, f(r) are the frequencies, N = text size in words and V = the maximum rank (the vocabulary). Further

$$ m_2 = s^2 = \frac{1}{N}\sum_{r=1}^{V} (r-\bar{r})^2 f(r) \tag{3.2.2.2} $$

and

$$ m_3 = \frac{1}{N}\sum_{r=1}^{V} (r-\bar{r})^3 f(r). \tag{3.2.2.3} $$
Considering the moments individually in Eminescu's poems we would find a great variation and no order. But if we use Ord's ratios defined as

$$ I = \frac{m_2}{m_1'} = \frac{s^2}{\bar{r}} \tag{3.2.2.4} $$

and

$$ S = \frac{m_3}{m_2}, \tag{3.2.2.5} $$

we can plot a representative of each text in a Cartesian coordinate system and obtain a kind of order. Up to now, all investigations in textology have shown that a writer in a certain language is positioned either in an elliptic cloud or, in stricter cases, on a straight line, independently of the property measured (cf. Ammermann 2001; Arlt 2006; Best 2001a, 2003; 2005a,b; Best, Kaspar 2001; Nemcová, Altmann 1994; Oakes 2007; Popescu et al. 2008; Popescu, Čech, Altmann 2011a; Wimmer et al. 2003: 100 ff.). Generally, two similar sequences of rank-frequencies f_i and f_j obey the rule f_i = constant · f_j, so that the constant is eliminated in the above Ord's ratios and the corresponding values coincide. Hence, the more similar two rank-frequency sequences are, the closer their representative points lie in the plot. In Table 3.2.2.1 we present the values for all poems by Eminescu and plot the results in Figure 3.2.2.1. In the plot, we omitted two texts of great size in order to render the picture more lucid. Nevertheless, even the omitted texts lie exactly on the computed curve.

Figure 3.2.2.1. Ord's criterion for Eminescu's 146 rank-frequency sequences

Needless to say, Ord's indicators can be computed also from the theoretical values of the stratification formula, e.g. with

$$ f(r) = 1 + a\exp(-r/b) \tag{3.2.2.6} $$

the average would be

$$ m_1' = \frac{1}{N}\sum_{r=1}^{V} r\,\bigl(1 + a\exp(-r/b)\bigr). \tag{3.2.2.7} $$
For example, in the poem Adânca mare... we obtain empirically m1′ = 26.57, and using the computed parameters (cf. Table 3.2.1.2 in Chapter 3.2.1) the expectation is

$$ m_1' = \frac{1}{75}\sum_{r=1}^{62} r\,\bigl(1 + 5.1159\exp(-r/3.1488)\bigr) = 26.71, $$

i.e. the difference is only in the decimal places.

As can be seen, S is a power function of I and can be expressed as S = 0.30517 I^{1.16736} with R² = 0.9972. The straight line S = 2I − 1 is the boundary between the great area of the beta-binomial (= negative hypergeometric) distribution and the hypergeometric and the beta-Pascal distributions, and touching the binomial, Poisson and negative binomial distributions (cf. Ord 1972; Popescu et al. 2009: 154). As can be seen, Eminescu is placed entirely in the beta-binomial domain. The dependence of I on N is quite natural but that of S on I is not. The third central moment, which is also a component of S, is a result of self-organization, a kind of adaptation of the skewness of the ranked sequence to the variance. Though all of these sequences have by definition a hyperbolic form, skewness and variance are not necessarily interdependent. It has been shown in other publications (cf. Popescu et al. 2009: 154-165; Wimmer et al. 2003: 99-102; Popescu, Čech, Altmann 2011a) that in most text collections in 20 languages, the relationship S = f(I) is usually an increasing straight line. Here, we would obtain a very good fit with S = −18.5433 + 0.9647 I with R² = 0.99. The kind of dependence and the parameters are features of the given text collection. A quasi-final theoretical decision cannot be made before analyses of different text sorts and individual writers have been performed.
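The computations of this section can be retraced with a few lines of Python. The following is only a minimal sketch: the function names `ord_ratios` and `theoretical_mean` are illustrative and not taken from the book, and the ranked sequence used for the rescaling check is the one quoted for Prin nopţi tăcute in Section 3.2.3.

```python
import math

def ord_ratios(freqs):
    """Return Ord's ratios (I, S) for a ranked frequency sequence f(1), ..., f(V)."""
    n = sum(freqs)                          # text size N
    ranks = range(1, len(freqs) + 1)        # ranks r = 1..V
    m1 = sum(r * f for r, f in zip(ranks, freqs)) / n              # (3.2.2.1)
    m2 = sum((r - m1) ** 2 * f for r, f in zip(ranks, freqs)) / n  # (3.2.2.2)
    m3 = sum((r - m1) ** 3 * f for r, f in zip(ranks, freqs)) / n  # (3.2.2.3)
    return m2 / m1, m3 / m2                 # (3.2.2.4), (3.2.2.5)

def theoretical_mean(a, b, v, n):
    """Mean rank (3.2.2.7) under the model f(r) = 1 + a*exp(-r/b)."""
    return sum(r * (1 + a * math.exp(-r / b)) for r in range(1, v + 1)) / n

# Adânca mare...: a = 5.1159, b = 3.1488, V = 62, N = 75 as quoted in the text
print(f"{theoretical_mean(5.1159, 3.1488, 62, 75):.2f}")   # 26.71

# Multiplying all frequencies by a constant leaves I and S unchanged
seq = [3, 3, 2, 2, 2, 2] + [1] * 34
print(ord_ratios(seq))
print(ord_ratios([10 * f for f in seq]))
```

The last two printed pairs agree up to floating-point rounding, which illustrates why proportional rank-frequency sequences are mapped onto the same point of the ⟨I, S⟩ plot.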
Pajul Cupidon... O arfă pe-un mormânt Sara pe deal Singurătate
14.3580 4.8041
14.4398 3.2689
14.6022 2.4098
4.4763
15.1658
15.4324
15.8942 11.0147
16.0946 7.4472
16.1075
16.2951
16.5634 5.3582
16.6383 8.6144
17.3485
17.4031
18.7352
19.0325 9.3166
19.1177
19.3205 7.3207
O stea prin ceruri
Cum oceanuntărâtat... Veneţia (de Gaetano Cerri) Kamadeva
Oricâte stele...
De-aş avea
Înger de pază
Iubind în taină...
Trecut-au anii
Te duci...
De-or trece anii...
Lacul
Stelele-n cer
Când amintirile...
Ce te legeni...
De câte ori, iubito...
Odă în metru antic
6.4573
5.8619
5.1325
7.1442
5.0704
3.2733
3.7007
La mormântul lui Aron Pumnul Adio
14.3434 4.5074
16.2218
16.8749
S
16.9221
14.0062
33.7436
33.7274
33.5602 17.6877
15.3924
14.0851
33.4095 13.6364
30.3494 9.6160
30.2561
29.8618 11.3191
29.6257
28.5780 12.1992
28.1931
27.2352
I
Care-i amorul meu în astă lume
La o artistă (Credeam ieri) Ah, mierea buzei tale
Pe lângă plopii fără soţ
Freamăt de codru
La Bucovina
15.9934 21.6960
28.0882 40.4923 20.3393
39.2957
38.4593 23.1001
37.7909
35.0398 12.5882
34.1670
Ce-ţi doresc eu ţie, dulce 33.7896 19.7728 Românie Privesc oraşul furnicar 33.8715 12.9692
Noaptea...
Atât de frageda…
Vis
Misterele nopţii
Din valurile vremii...
13.2642 2.9552
Adânca mare…
Poem title
La steaua
S
I
Poem title
Scrisoarea V
Epigonii
Inger si demon
Ondina (Fantazie)
Demonism
Ecò
Când crivăţul cu iarna...
Egipetul
Scrisoarea II
Mitologicale
Cugetările sărmanului Dionis Miradoniz
Junii corupţi
Mortua est!
Dumnezeu şi om
Făt-Frumos din tei
Aveam o muză
Pustnicul
Povestea teiului
Poem title 44.4743
S
46.6967
48.2603
181.2812 142.4597
174.8791 116.4110
166.9185 116.1771
166.8483 110.8842
161.6418 120.1606
137.7140 85.8646
136.5299 94.3774
133.9957 80.3625
133.7825 90.1734
133.6390 80.9262
119.1822 83.5678
113.2859 63.0100
93.9687 51.5615
89.4855 63.6093
88.0286 43.1048
81.1388
80.5931
75.4629 38.8569
74.4225
I
Locul aripelor Floare-albastră
20.8298 13.2573
4.1517
5.3182
10.2383
21.1285
21.2373
21.6335
21.7732
22.1753
Criticilor mei
De ce nu-mi vii
Frumoasă şi jună
Nu voi mormânt bogat (variantă) Foaia veştedă (dupa Lenau) Crăiasa din poveşti
Murmură glasul mării 22.8708 6.4199
11.1675
13.6821
20.8195 12.0935
Când marea...
Amorul unei marmure
Întunericul şi poetul
25.6754
47.5299
30.1077
25.6365
48.9285 28.7203
48.7903 21.8256
48.2531
47.9721
30.9101
23.1258
46.5548 26.4649
45.0691
44.7092 24.8054
De-aş muri ori de-ai muri 47.5829
Amicului F.I.
Din străinătate
La moartea lui Neamţu
Napoleon
20.3502 11.0975
La Quadrat
23.7420
18.6836
31.6412
S
44.0860 20.6707
42.5334
Lasă-ţi lumea...
41.9171
Povestea codrului S-a dus amorul
41.7168
I
Speranţa
Poem title
20.0956 6.6987
S
Viaţa mea fu ziuă
I
Când priveşti oglinda 19.3821 6.6813 mărei Cum negustorii din 19.4562 5.8108 Constantinopol Dorinţa 19.6653 5.7203
Poem title
Feciorul de împărat fără de stea Memento mori
Scrisoarea III
Călin (file de poveste)
Andrei Mureşanu
Mureşanu
Împărat şi proletar
Luceafărul
Odin şi poetul
Scrisoarea I
Scrisoarea IV
În căutarea Şeherezadei
Poem title
S
1350.4590 1354.7581
848.5269 842.0609
404.7692 327.5248
393.5057 329.6952
348.2138 287.8655
330.9081 290.9484
291.0834 208.6160
281.3298 246.8649
241.2197 200.7361
232.3401 176.0081
231.5885 173.3774
182.1950 109.1787
I
3.2.3 The lambda indicator

In texts of sufficient length, the frequencies which correspond to adjacent ranks are not equidistant. At the beginning of the sequence, the distances between f(r) and f(r+1) are great; with increasing ranks, the differences between the corresponding frequencies decrease. The tail of the sequence consists mostly of f(r) = 1, i.e. there is no difference. This way of structuring can be expressed by the length of the arc beginning at the distribution top (1, f(1)), see Figure 3.2.6.1, and ending at (V, f(V)) with V = maximum rank (the extent of vocabulary); usually f(V) = 1. Since we have to do with discrete quantities, the arc length can be expressed as

$$ L = \sum_{r=1}^{V-1} \bigl[(f(r) - f(r+1))^2 + 1\bigr]^{1/2}, \tag{3.2.3.1} $$

i.e. as the sum of Euclidean distances between neighbouring frequencies. Evidently, the greater the text size N (= area under the distribution), the greater is the arc length. In order to normalise L, i.e. to make it independent of N, Popescu, Čech and Altmann (2011) proposed the lambda indicator

$$ \Lambda = \frac{L \log_{10} N}{N}, \tag{3.2.3.2} $$

yielding values scattered around a horizontal straight line in the plot and enabling us to compare texts of different lengths. When we consider the poem Prin nopţi tăcute as shown in Chapter 3.2 (Table 3.2.1), we obtain the sequence 3,3,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 from which it follows that L = [(3 − 3)² + 1]^{1/2} + [(3 − 2)² + 1]^{1/2} + … + [(1 − 1)² + 1]^{1/2} = 39.8284. Since N = 48, we obtain Λ = 39.8284(log₁₀ 48)/48 = 1.3950.
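The worked example for Prin nopţi tăcute can be checked directly. The sketch below assumes only formulas (3.2.3.1) and (3.2.3.2); the function names `arc_length` and `lambda_indicator` are illustrative.

```python
import math

def arc_length(freqs):
    """Arc length L (3.2.3.1) of a ranked frequency sequence."""
    # zip pairs every frequency f(r) with its successor f(r+1)
    return sum(math.sqrt((f - g) ** 2 + 1) for f, g in zip(freqs, freqs[1:]))

def lambda_indicator(freqs):
    """Lambda (3.2.3.2): arc length normalised by text size N."""
    n = sum(freqs)
    return arc_length(freqs) * math.log10(n) / n

# Prin nopţi tăcute: two words with f = 3, four with f = 2, 34 hapax legomena
seq = [3, 3, 2, 2, 2, 2] + [1] * 34
print(sum(seq), len(seq))                                    # 48 40
print(f"{arc_length(seq):.4f} {lambda_indicator(seq):.4f}")  # 39.8284 1.3950
```

Only the two steps 3 → 2 and 2 → 1 contribute a distance of √2; the remaining 37 neighbouring pairs contribute 1 each, which gives L = 37 + 2√2.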
Table 3.2.3.1: Lambda in 146 poems by M. Eminescu

| Year | Poem title (alphabetically) | N | L | Λ | Var(Λ) |
|---|---|---|---|---|---|
| 1873 | Adânca mare… | 75 | 63.0645 | 1.5767 | 0.003986 |
| 1883 | Adio | 159 | 114.6410 | 1.5872 | 0.004079 |
| 1873 | Ah, mierea buzei tale | 228 | 147.5432 | 1.5259 | 0.001572 |
| 1869 | Amicului F.I. | 257 | 194.6569 | 1.8253 | 0.001052 |
| 1868 | Amorul unei marmure | 266 | 187.1290 | 1.7059 | 0.001915 |
| 1871 | Andrei Mureşanu | 2008 | 1057.0693 | 1.7387 | 0.000326 |
| 1879 | Atât de frageda… | 176 | 138.7396 | 1.7701 | 0.000954 |
| 1871 | Aveam o muză | 421 | 290.0402 | 1.8080 | 0.001917 |
| 1871 | Basmul ce i l-aş spune ei | 398 | 272.0975 | 1.7774 | 0.001117 |
| 1876 | Călin (file de poveste) | 2299 | 1199.1833 | 1.7534 | 0.000268 |
| 1869 | Când | 126 | 101.8929 | 1.6985 | 0.000737 |
| 1883 | Când amintirile... | 97 | 80.6569 | 1.6520 | 0.002305 |
| 1872 | Când crivăţul cu iarna... | 708 | 438.8507 | 1.7666 | 0.001252 |
| 1869 | Când marea... | 114 | 81.8929 | 1.4776 | 0.003552 |
| 1869 | Când priveşti oglinda mărei | 101 | 83.4787 | 1.6566 | 0.002213 |
| 1873 | Care-i amorul meu în astă lume | 213 | 158.8929 | 1.7369 | 0.001910 |
| 1883 | Ce e amorul? | 124 | 97.9274 | 1.6533 | 0.004057 |
| 1883 | Ce te legeni... | 102 | 79.3657 | 1.5629 | 0.003192 |
| 1867 | Ce-ţi doresc eu ţie, dulce Românie | 183 | 129.7148 | 1.6037 | 0.003101 |
| 1869 | Cine-i? | 129 | 95.7148 | 1.5660 | 0.003505 |
| 1871 | Copii eram noi amândoi | 375 | 265.9252 | 1.8253 | 0.000963 |
| 1876 | Crăiasa din poveşti | 122 | 96.9515 | 1.6580 | 0.001990 |
| 1883 | Criticilor mei | 130 | 91.2426 | 1.4837 | 0.001974 |
| 1883 | Cu mâne zilele-ţi adaogi... | 141 | 106.4787 | 1.6230 | 0.002482 |
| 1872 | Cugetările sarmanului Dionis | 571 | 406.6445 | 1.9632 | 0.000958 |
| 1874 | Cum negustorii din Constantinopol | 101 | 84.6569 | 1.6800 | 0.002164 |
| 1873 | Cum oceanu-ntărâtat... | 77 | 67.2426 | 1.6474 | 0.001497 |
| 1873 | Dacă treci râul Selenei | 356 | 244.4991 | 1.7523 | 0.001164 |
| 1879 | De câte ori, iubito... | 102 | 85.9907 | 1.6933 | 0.001029 |
| 1887 | De ce nu-mi vii | 123 | 85.7800 | 1.4575 | 0.003658 |
| 1869 | De ce să mori tu? | 266 | 177.8098 | 1.6209 | 0.002018 |
| 1866 | De-aş avea | 93 | 62.4787 | 1.3225 | 0.006866 |
| 1869 | De-aş muri ori de-ai muri | 258 | 171.1356 | 1.5997 | 0.002206 |
| 1872 | Demonism | 882 | 524.0894 | 1.7502 | 0.000931 |
| 1883 | De-oi adormi (variantă) | 122 | 105.2426 | 1.7998 | 0.002182 |
| 1883 | De-or trece anii... | 87 | 64.8929 | 1.4467 | 0.003931 |
| 1878 | Departe sunt de tine | 135 | 106.8929 | 1.6868 | 0.002983 |
| Year | Poem title (alphabetically) | N | L | Λ | Var(Λ) |
|---|---|---|---|---|---|
| 1879 | Despărţire | 304 | 208.7707 | 1.7051 | 0.001639 |
| 1873 | Din Berlin la Potsdam | 128 | 101.3006 | 1.6677 | 0.003899 |
| 1867 | Din lyra spartă... | 51 | 43.8284 | 1.4675 | 0.002791 |
| 1884 | Din noaptea | 68 | 56.8284 | 1.5314 | 0.002883 |
| 1866 | Din străinătate | 244 | 174.3565 | 1.7060 | 0.001276 |
| 1883 | Din valurile vremii... | 152 | 105.8929 | 1.5200 | 0.002469 |
| 1880 | Dintre sute de catarge | 50 | 41.6503 | 1.4153 | 0.007118 |
| 1872 | Doi aştri | 40 | 38.4142 | 1.5385 | 0.000781 |
| 1876 | Dorinţa | 102 | 87.8126 | 1.7292 | 0.001200 |
| 1873 | Dumnezeu şi om | 443 | 327.2346 | 1.9548 | 0.001641 |
| 1872 | Ecò | 698 | 461.6051 | 1.8807 | 0.000996 |
| 1872 | Egipetul | 688 | 465.7789 | 1.9211 | 0.000442 |
| 1870 | Epigonii | 921 | 590.0754 | 1.8992 | 0.000570 |
| 1875 | Făt-Frumos din tei | 415 | 289.4209 | 1.8258 | 0.001232 |
| 1872 | Feciorul de impărat fără de stea | 6030 | 2445.6464 | 1.5332 | 0.000081 |
| 1873 | Floare-albastră | 247 | 192.1356 | 1.8612 | 0.002321 |
| 1879 | Foaia veştedă (dupa Lenau) | 115 | 100.0645 | 1.7931 | 0.002050 |
| 1879 | Freamăt de codru | 179 | 144.4853 | 1.8185 | 0.000871 |
| 1871 | Frumoasă şi jună | 113 | 85.3657 | 1.5510 | 0.003709 |
| 1873 | Ghazel | 331 | 235.8836 | 1.7957 | 0.001088 |
| 1883 | Glossă | 380 | 200.2494 | 1.3595 | 0.001759 |
| 1867 | Horia | 143 | 120.9907 | 1.8236 | 0.001519 |
| 1883 | Iar când voi fi pământ (variantă) | 131 | 107.4787 | 1.7371 | 0.001469 |
| 1874 | Împărat şi proletar | 1510 | 896.5804 | 1.8876 | 0.000491 |
| 1874 | În căutarea Şeherezadei | 915 | 615.4903 | 1.9921 | 0.000412 |
| 1871 | Înger de pază | 91 | 72.0645 | 1.5514 | 0.001199 |
| 1873 | Inger şi demon | 876 | 537.7659 | 1.8064 | 0.000980 |
| 1870 | Îngere palid... | 63 | 52.8284 | 1.5088 | 0.003236 |
| 1869 | Întunericul şi poetul | 249 | 180.4694 | 1.7367 | 0.002033 |
| 1883 | Iubind în taină... | 87 | 76.8284 | 1.7128 | 0.001976 |
| 1871 | Iubită dulce, o, mă lasă | 337 | 216.0618 | 1.6205 | 0.001782 |
| 1871 | Iubitei | 416 | 248.4707 | 1.5643 | 0.001104 |
| 1869 | Junii corupţi | 458 | 322.7674 | 1.8752 | 0.001988 |
| 1887 | Kamadeva | 81 | 70.2426 | 1.6550 | 0.001384 |
| 1866 | La Bucovina | 184 | 141.8929 | 1.7465 | 0.001464 |
| 1883 | La mijloc de codru... | 55 | 41.6363 | 1.3175 | 0.003397 |
| 1867 | La moartea lui Heliade | 332 | 231.7707 | 1.7600 | 0.001249 |
| 1870 | La moartea lui Neamţu | 245 | 175.3071 | 1.7095 | 0.001398 |
| 1869 | La moartea principelui Ştirbey | 132 | 99.4787 | 1.5981 | 0.003432 |
| Year | Poem title (alphabetically) | N | L | Λ | Var(Λ) |
|---|---|---|---|---|---|
| 1866 | La mormântul lui Aron Pumnul | 150 | 118.8191 | 1.7237 | 0.001424 |
| 1868 | La o artistă (Ca a nopţii poezie) | 142 | 106.4049 | 1.6128 | 0.002545 |
| 1869 | La o artistă (Credeam ieri) | 219 | 158.7279 | 1.6963 | 0.001128 |
| 1870 | La Quadrat | 110 | 80.8929 | 1.5012 | 0.002015 |
| 1886 | La steaua | 71 | 61.8284 | 1.6121 | 0.002699 |
| 1876 | Lacul | 90 | 71.0711 | 1.5432 | 0.001176 |
| 1883 | Lasă-ţi lumea... | 225 | 170.5432 | 1.7829 | 0.001060 |
| 1869 | Lebăda | 41 | 37.2361 | 1.4647 | 0.004821 |
| 1866 | Lida | 66 | 57.6503 | 1.5894 | 0.002342 |
| 1869 | Locul aripelor | 259 | 174.8995 | 1.6297 | 0.002292 |
| 1883 | Luceafărul | 1737 | 885.3917 | 1.6514 | 0.000364 |
| 1883 | Mai am un singur dor | 125 | 103.2426 | 1.7319 | 0.001547 |
| 1876 | Melancolie | 274 | 201.9549 | 1.7968 | 0.001355 |
| 1872 | Memento mori | 9773 | 3961.9447 | 1.6175 | 0.000068 |
| 1872 | Miradoniz | 636 | 406.0453 | 1.7898 | 0.000703 |
| 1866 | Misterele nopţii | 155 | 111.8929 | 1.5812 | 0.001928 |
| 1873 | Mitologicale | 681 | 466.0482 | 1.9389 | 0.000647 |
| 1871 | Mortua est! | 491 | 307.1973 | 1.6837 | 0.001684 |
| 1876 | Mureşanu | 2051 | 1029.0071 | 1.6616 | 0.000296 |
| 1873 | Murmură glasul mării | 119 | 101.9907 | 1.7789 | 0.002033 |
| 1874 | Napoleon | 240 | 176.8790 | 1.7542 | 0.000967 |
| 1871 | Noaptea... | 177 | 130.3071 | 1.6550 | 0.002370 |
| 1867 | Nu e steluţă | 54 | 39.8284 | 1.2778 | 0.004077 |
| 1882 | Nu mă-nţelegi | 384 | 261.7793 | 1.7618 | 0.001325 |
| 1883 | Nu voi mormânt bogat (variantă) | 113 | 99.6569 | 1.8106 | 0.001814 |
| 1868 | Numai poetul | 48 | 39.8284 | 1.3950 | 0.004853 |
| 1873 | O arfă pe-un mormânt | 157 | 120.3071 | 1.6827 | 0.001113 |
| 1866 | O călărire în zori | 346 | 245.8645 | 1.8042 | 0.001767 |
| 1869 | O stea prin ceruri | 78 | 64.8284 | 1.5726 | 0.002338 |
| 1874 | O, adevăr sublime... | 334 | 235.3818 | 1.7786 | 0.001053 |
| 1880 | O, mamă… | 140 | 99.8929 | 1.5313 | 0.003158 |
| 1883 | Odă în metru antic | 103 | 83.2426 | 1.6267 | 0.002847 |
| 1872 | Odin şi poetul | 1429 | 763.2834 | 1.6852 | 0.000405 |
| 1869 | Ondina (Fantazie) | 871 | 557.6579 | 1.8823 | 0.000441 |
| 1878 | Oricâte stele... | 85 | 72.8284 | 1.6531 | 0.001285 |
| 1879 | Pajul Cupidon... | 148 | 118.7800 | 1.7418 | 0.003224 |
| 1879 | Pe aceeaşi ulicioară... | 138 | 104.4853 | 1.6202 | 0.002283 |
| 1883 | Pe lângă plopii fără soţ | 199 | 140.7213 | 1.6256 | 0.002001 |
| 1883 | Peste vârfuri | 47 | 40.0645 | 1.4254 | 0.003249 |
| Year | Poem title (alphabetically) | N | L | Λ | Var(Λ) |
|---|---|---|---|---|---|
| 1878 | Povestea codrului | 220 | 171.7800 | 1.8290 | 0.000718 |
| 1887 | Povestea teiului | 390 | 271.1328 | 1.8013 | 0.001072 |
| 1869 | Prin nopţi tăcute | 48 | 39.8284 | 1.3950 | 0.004853 |
| 1873 | Privesc oraşul furnicar | 173 | 140.7559 | 1.8209 | 0.002139 |
| 1874 | Pustnicul | 380 | 273.9640 | 1.8599 | 0.001115 |
| 1871 | Replici | 147 | 81.8627 | 1.2070 | 0.007270 |
| 1879 | Revedere | 141 | 103.0711 | 1.5711 | 0.002439 |
| 1879 | Rugăciunea unui dac | 357 | 259.4127 | 1.8549 | 0.001310 |
| 1883 | S-a dus amorul | 219 | 155.5432 | 1.6623 | 0.002897 |
| 1885 | Sara pe deal | 156 | 129.4787 | 1.8203 | 0.002549 |
| 1881 | Scrisoarea I | 1282 | 744.5881 | 1.8051 | 0.000558 |
| 1881 | Scrisoarea II | 696 | 442.3648 | 1.8067 | 0.001246 |
| 1881 | Scrisoarea III | 2278 | 1236.8800 | 1.8230 | 0.000224 |
| 1881 | Scrisoarea IV | 1256 | 749.9394 | 1.8504 | 0.000669 |
| 1881 | Scrisoarea V | 1027 | 581.7540 | 1.7059 | 0.000425 |
| 1883 | Se bate miezul nopţii... | 45 | 39.8284 | 1.4632 | 0.003358 |
| 1883 | Şi dacă... | 53 | 37.2426 | 1.2116 | 0.005811 |
| 1878 | Singurătate | 172 | 135.0711 | 1.7556 | 0.001423 |
| 1883 | Somnoroase păsărele... | 55 | 46.2426 | 1.4633 | 0.002493 |
| 1879 | Sonete | 265 | 196.3071 | 1.7951 | 0.001229 |
| 1868 | Speranţa | 245 | 154.6554 | 1.5082 | 0.001430 |
| 1871 | Steaua vieţii | 70 | 55.2426 | 1.4561 | 0.003816 |
| 1879 | Stelele-n cer | 91 | 76.6569 | 1.6503 | 0.001156 |
| 1870 | Sus în curtea cea domnească | 128 | 104.6569 | 1.7229 | 0.001490 |
| 1883 | Te duci... | 84 | 72.9112 | 1.6703 | 0.003614 |
| 1883 | Trecut-au anii | 88 | 74.2426 | 1.6405 | 0.001218 |
| 1869 | Unda spumă | 59 | 46.8284 | 1.4055 | 0.003571 |
| 1887 | Venere şi Madona | 393 | 256.5522 | 1.6936 | 0.001318 |
| 1883 | Veneţia (de Gaetano Cerri) | 79 | 70.8284 | 1.7013 | 0.002293 |
| 1869 | Viaţa mea fu ziuă | 105 | 86.6569 | 1.6681 | 0.002036 |
| 1876 | Vis | 177 | 139.8929 | 1.7767 | 0.000944 |
In Table 3.2.3.1, the lambdas of each poem are shown. The average over all 146 poems is Λ̄ = 1.6685 and the variance of the values is extremely small, viz. s² = 0.0235. Lambda is, in this form, independent of N. If we order the poems according to their years of origin, we obtain a straight line which is almost horizontal, viz. Λ = 2.6964 − 0.000548(year), and no test statistic (t or F) is significant. Thus, we obtained an indicator which gives a picture of the frequency structure of a text.
Using the empirical mean and variance, we may set up a 95% confidence interval for the true mean by computing

$$ \bar{\Lambda} - 1.96\sqrt{s^2/N} \le \mu_\Lambda \le \bar{\Lambda} + 1.96\sqrt{s^2/N}, $$

where N here denotes the number of poems. In our case this is 1.6685 − 1.96(0.0235/146)^{1/2} ≤ μ_Λ ≤ 1.6685 + 1.96(0.0235/146)^{1/2}, yielding approximately ⟨1.6436, 1.6934⟩. We obtain the 95% interval for the individual values of Λ as 1.6685 ± 1.96(0.0235)^{1/2}, i.e. approximately ⟨1.3680, 1.9690⟩. It is easy to see that this interval includes the golden section

$$ \varphi = \frac{1+\sqrt{5}}{2} = 1.6180, $$

which appears also elsewhere with different text phenomena.

This fact forces us to look at the text from a different point of view. The writer controls the text both consciously and subconsciously. S/He cares for style and theme and as long as the text is short, s/he can maintain a kind of equilibrium of word repetitions, a property captured by the rank-frequency sequence of words. But if the text becomes longer, this control is lost either suddenly after a break in writing or gradually during the writing process itself, depending on the capability of the writer. At this point, the equilibrium is broken and one should not expect any kind of conscious structuring. But in this chaotic situation, self-organisation takes place and steers the frequency structure of the text towards a fixed point which is, in our case, the golden section. Although we believe in the writer's freedom of text construction, this behaviour of texts forces us to believe either in the existence of self-organisation or in the existence of background mechanisms working during the whole time of writing. Needless to say, the golden section has been found now in the interval for Romanian, but Λ can take quite different values in other languages (cf. Popescu, Čech, Altmann 2011: 49ff), which simply have other attractors or obey some other latent mechanisms. In some subsequent chapters we shall see that the golden section is an attractor appearing in different geometric properties of the ranked frequency sequence.

Nevertheless, there is a certain dispersion of lambdas in texts, i.e. not all texts converge equally to or attain the golden section; there are differences between them. In order to test them at least asymptotically, we must know the variance of an individual lambda. As shown in Popescu, Mačutek, Altmann (2010) (cf. also Popescu, Čech, Altmann 2011) the variance can be computed as
$$ Var(L) = \frac{N-f_1}{1-\hat{p}_1}\sum_{r=2}^{V} \hat{a}_r^2\,\hat{p}_r\left(1-\frac{\hat{p}_r}{1-\hat{p}_1}\right) - \frac{2(N-f_1)}{(1-\hat{p}_1)^2}\sum_{r=2}^{V-1}\sum_{s=r+1}^{V} \hat{a}_r\hat{a}_s\hat{p}_r\hat{p}_s \tag{3.2.3.3} $$

where

$$ \hat{a}_r = -\frac{(N-f_1)\dfrac{\hat{p}_{r-1}-\hat{p}_r}{1-\hat{p}_1}}{\sqrt{(N-f_1)^2\left(\dfrac{\hat{p}_{r-1}-\hat{p}_r}{1-\hat{p}_1}\right)^2+1}} + \frac{(N-f_1)\dfrac{\hat{p}_r-\hat{p}_{r+1}}{1-\hat{p}_1}}{\sqrt{(N-f_1)^2\left(\dfrac{\hat{p}_r-\hat{p}_{r+1}}{1-\hat{p}_1}\right)^2+1}} $$

for r = 2,...,V−1, and

$$ \hat{a}_V = -\frac{(N-f_1)\dfrac{\hat{p}_{V-1}-\hat{p}_V}{1-\hat{p}_1}}{\sqrt{(N-f_1)^2\left(\dfrac{\hat{p}_{V-1}-\hat{p}_V}{1-\hat{p}_1}\right)^2+1}}, $$

where the symbols with a circumflex are the values estimated from the sample. For an asymptotic test for comparing two texts we need the variance of Λ, which is given by

$$ Var(\Lambda) = \left(\frac{\log_{10} N}{N}\right)^2 Var(L). \tag{3.2.3.4} $$

The variances of individual texts are presented in the last column of Table 3.2.3.1. The test can be performed by means of the normal approximation

$$ u = \frac{\Lambda_1 - \Lambda_2}{\sqrt{Var(\Lambda_1) + Var(\Lambda_2)}}. \tag{3.2.3.5} $$
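Formulas (3.2.3.3) and (3.2.3.4) can be evaluated mechanically. The following sketch is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the input is assumed to be a ranked frequency sequence with at least two ranks, and the double sum is evaluated literally, which is adequate for poem-sized vocabularies. As a plausibility check it uses the sequence of Prin nopţi tăcute quoted earlier in this section.

```python
import math

def var_arc_length(freqs):
    """Var(L) following (3.2.3.3) for a ranked sequence f(1) >= ... >= f(V)."""
    n, f1, v = sum(freqs), freqs[0], len(freqs)
    p = [f / n for f in freqs]               # estimated probabilities p_r
    scale = (n - f1) / (1 - p[0])            # (N - f1) / (1 - p_1)

    def term(diff):                          # one fraction of a_r
        x = scale * diff
        return x / math.sqrt(x * x + 1)

    a = [0.0] * v                            # a[r-1] holds a_r; a_1 is not used
    for i in range(1, v):                    # ranks r = 2..V
        a[i] = -term(p[i - 1] - p[i])
        if i < v - 1:                        # the second fraction is absent for r = V
            a[i] += term(p[i] - p[i + 1])

    q = [pr / (1 - p[0]) for pr in p]        # p_r / (1 - p_1)
    single = sum(a[i] ** 2 * q[i] * (1 - q[i]) for i in range(1, v))
    double = sum(a[i] * a[j] * q[i] * q[j]
                 for i in range(1, v - 1) for j in range(i + 1, v))
    return (n - f1) * (single - 2 * double)

def var_lambda(freqs):
    """Var(Lambda) following (3.2.3.4)."""
    n = sum(freqs)
    return (math.log10(n) / n) ** 2 * var_arc_length(freqs)

seq = [3, 3, 2, 2, 2, 2] + [1] * 34          # Prin nopţi tăcute, N = 48
print(f"{var_lambda(seq):.6f}")              # 0.004853
```

The printed value agrees with the entry for Prin nopţi tăcute in the last column of Table 3.2.3.1.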
For the sake of illustration, consider the last two poems Vis and Viaţa mea fu ziuă with Λ(Vis) = 1.7767 and Var(Λ_Vis) = 0.000944 and Λ(Viaţa…) = 1.6681 and Var(Λ_Viaţa…) = 0.002036. Inserting these values into formula (3.2.3.5) we obtain

$$ u = \frac{1.7767 - 1.6681}{\sqrt{0.000944 + 0.002036}} = 1.99, $$

which is significant at the α = 0.05 level. Though the differences in Λ may be very small (according to Table 3.2.3.1, Eminescu's lambda values vary in the interval ⟨1.2070, 1.9921⟩), they may nonetheless be significantly different.
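The interval estimates and the asymptotic test of this section reduce to simple arithmetic. The sketch below only re-computes the quantities from the summary values quoted in the text (mean 1.6685, variance 0.0235, 146 poems); the variable and function names are illustrative.

```python
import math

def u_statistic(lam1, var1, lam2, var2):
    """Normal approximation (3.2.3.5) for comparing two lambdas."""
    return (lam1 - lam2) / math.sqrt(var1 + var2)

# Vis versus Viaţa mea fu ziuă (values from Table 3.2.3.1)
u = u_statistic(1.7767, 0.000944, 1.6681, 0.002036)
print(f"{u:.2f}", abs(u) > 1.96)             # 1.99 True

# 95% intervals from the summary statistics quoted in the text
mean, s2, n_poems = 1.6685, 0.0235, 146
half_mean = 1.96 * math.sqrt(s2 / n_poems)   # half-width for the mean
half_single = 1.96 * math.sqrt(s2)           # half-width for individual values
golden = (1 + math.sqrt(5)) / 2

print(f"{mean - half_mean:.4f} {mean + half_mean:.4f}")
print(f"{mean - half_single:.4f} {mean + half_single:.4f}")
print(mean - half_single < golden < mean + half_single)   # True
```

With these rounded inputs the golden section lies inside the interval for individual values, while the narrower interval for the mean lies slightly above it.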
It can easily be shown that Λ does not depend on the development of the writer: there is no relation between the year of origin and Λ. The same holds for the relation to the elements of Ord's criterion. It is worthwhile to compare Λ with its maximum Λmax as allowed by the longest possible arc length Lmax. The latter can be slightly simplified and obtained without much computing in the form (cf. Popescu, Mačutek, Altmann 2009: 68) Lmax = f(1) + V − 2 = N − 1. Inserting this expression into Eq. (3.2.3.2), we obtain an approximate Λmax as

$$ \Lambda_{max} = \left(1 - \frac{1}{N}\right)\log_{10} N, \tag{3.2.3.6} $$

which approaches the common logarithm of N as N grows (rendering 1/N → 0). In order to show the behaviour of both Λ and Λmax, we present their course for different text lengths in Figure 3.2.3.1 for 1185 texts in 35 languages (cf. Popescu, Čech, Altmann 2011: 11) and in Figure 3.2.3.2 for 146 poems by Eminescu.

Figure 3.2.3.1. Observed and maximum lambda values of 1185 texts of different size in 35 languages

As can be seen, no text attains its lambda-maximum, and the greater the text size, the greater the difference between the observed lambda and maximum lambda. Considering the figures showing lambda, one can see that the observed lambda values are roughly constant and the deviations observed in individual texts may hint at language, genre, and some stylistic idiosyncrasy which should be further investigated. However, it is noteworthy that Λ shows a flat, bow-like dependence on text size N, between about 500 and 1000, as can be noticed in Figure 3.2.3.2, a course confirmed also for Czech and English (R. Čech, personal communication, 2011). This behaviour is expected because in very short texts no or only few words are repeated and Λ may approach Λmax quite closely, whereas in very long texts Λ decreases significantly because of word repetition. Generally, the shorter the text, the more chances it has to attain the maximum. This can be clearly seen if we plot relative lambda values, i.e. Λ/Λmax, against text size as shown in Figure 3.2.3.3. However, this dependence becomes blurred if N is measured in units of the area (h − 1)², i.e. in terms of the indicator b = N/(h − 1)² where h is the h-point (cf. Popescu, Mačutek, Altmann 2009: 144, Eq. 7.12), as illustrated in Figure 3.2.3.4.
Figure 3.2.3.2. Observed and maximum lambdas in Eminescu's poems of different size
Recent progress has been made in constructing a Λ indicator independent of text size (Popescu, Zörnig, Altmann, Glottometrics, 2013: 25, 43 ff.) by introducing a modified lambda as Λmod = Λ(log₁₀N)^{0.14282575} with the variance (3.2.3.8) Var(Λmod) = (log₁₀N)^{0.2858} Var(Λ).
Figure 3.2.3.3. Relative lambda against text size for 146 poems by Eminescu.
Figure 3.2.3.4. Indicators Λ and b
As a closing application of the above lambda concept we consider the ranking of a few well known poems by Eminescu in Tables 3.2.3.2, 3.2.3.3, and 3.2.3.4. The last table introduces the sampling by the first 100 words of the considered poem in the ratio with the corresponding Λmax(100) = 1.98. Obviously, the ranking is very sensitive to the considered variable. Thus, in Table 3.2.3.2, ordered by descending values of the absolute lambda, the top belongs to the poem În căutarea Şeherezadei; in Table 3.2.3.3, ordered by descending values of the relative lambda, the top belongs to the poem Floare-albastră; and, finally, in Table 3.2.3.4, ordered by descending values of the relative lambda of the first 100 word sampling, the top belongs to the poem Scrisoarea IV. In all cases the ranking emphasises the quality of the word rank-frequency distribution of the considered texts.

Table 3.2.3.2: Lambda in 7 poems by M. Eminescu

| Poem title | N | Λ |
|---|---|---|
| În căutarea Şeherezadei | 915 | 1.9921 |
| Floare-albastră | 247 | 1.8612 |
| Scrisoarea IV | 1256 | 1.8504 |
| Scrisoarea III | 2278 | 1.8230 |
| Călin | 2299 | 1.7534 |
| Memento mori | 9773 | 1.6175 |
| Feciorul de împărat fără de stea | 6030 | 1.5332 |
Table 3.2.3.3: Relative lambda in 7 poems by M. Eminescu

| Poem title | N | Λ | Λmax | Λ/Λmax |
|---|---|---|---|---|
| Floare-albastră | 247 | 1.8612 | 2.3830 | 0.7810 |
| În căutarea Şeherezadei | 915 | 1.9921 | 2.9582 | 0.6734 |
| Scrisoarea IV | 1256 | 1.8504 | 3.0965 | 0.5976 |
| Scrisoarea III | 2278 | 1.8230 | 3.3561 | 0.5432 |
| Călin | 2299 | 1.7534 | 3.3601 | 0.5218 |
| Feciorul de împărat fără de stea | 6030 | 1.5332 | 3.7797 | 0.4056 |
| Memento mori | 9773 | 1.6175 | 3.9896 | 0.4054 |
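The relative lambda of Table 3.2.3.3 follows from Eq. (3.2.3.6) alone. As a small sketch, with N and Λ copied from Table 3.2.3.2 and the names `lambda_max` and `poems` chosen only for illustration, the ranking can be reproduced as follows.

```python
import math

def lambda_max(n):
    """Approximate maximum lambda (3.2.3.6) for a text of size n."""
    return (1 - 1 / n) * math.log10(n)

# (N, observed lambda) as given in Table 3.2.3.2
poems = {
    "În căutarea Şeherezadei": (915, 1.9921),
    "Floare-albastră": (247, 1.8612),
    "Scrisoarea IV": (1256, 1.8504),
    "Scrisoarea III": (2278, 1.8230),
    "Călin": (2299, 1.7534),
    "Memento mori": (9773, 1.6175),
    "Feciorul de împărat fără de stea": (6030, 1.5332),
}

def relative_lambda(title):
    n, lam = poems[title]
    return lam / lambda_max(n)

# Sort by descending relative lambda, as in Table 3.2.3.3
ranking = sorted(poems, key=relative_lambda, reverse=True)
for title in ranking:
    n, lam = poems[title]
    print(f"{title:35s} {n:5d} {lambda_max(n):.4f} {relative_lambda(title):.4f}")

print(f"{lambda_max(100):.2f}")   # 1.98, the reference value for 100-word samples
```

Because Λmax grows with log₁₀N while the observed Λ stays roughly constant, the longest poems necessarily end up at the bottom of this ranking.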
Table 3.2.3.4: Lambda of first 100 words samples from 7 poems by M. Eminescu

| Poem title | N | Λ(100) (for the first 100 words) | Λ(100) / Λmax(100) (Λmax(100) = 1.98) |
|---|---|---|---|
| Scrisoarea IV | 1256 | 1.7731 | 0.8955 |
| Memento mori | 9773 | 1.7249 | 0.8712 |
| Floare-albastră | 247 | 1.7238 | 0.8706 |
| Scrisoarea III | 2278 | 1.7096 | 0.8634 |
| Călin | 2299 | 1.6896 | 0.8533 |
| În căutarea Şeherezadei | 915 | 1.6049 | 0.8106 |
| Feciorul de împărat fără de stea | 6030 | 1.5413 | 0.7784 |
As has been shown in Popescu, Čech, Altmann (2011), lambda is a kind of control of the entire production of an author. Beside different style, subject, text sort, and other influences, the author has a certain (subconscious) way of writing, rendering the latent structure (in this case the frequency structure) of the text similar to his other texts. It is a kind of perseveration. It depends on his feeling for rhythm, ease of writing, adopted customs, etc., leading to similarity, in extreme cases to monotony, in his work. On the other hand, he tries to avoid old structures and strives for some innovation. The clash of these forces boils down to the fact that not all texts have the same lambda structure and not all texts differ significantly from other ones. Since the lambda values of two texts can be tested for difference (using formula 3.2.3.5), we compare each poem with each other and count, for each poem, the texts in the analyzed work from which it does not differ significantly. For Eminescu we obtain the results presented in the last column of Table 3.2.3.5. We suppose that these differences are not haphazard but follow a certain regularity which can be derived from the interaction of the two mentioned forces. Considering the number of similarities as a continuous variable, we conjecture that the relative rate of change of similarities, dy/y, depends on the difference between the two forces, i.e. it is proportional to the difference of innovation minus perseveration:

$$ \frac{dy}{y} = \left(\frac{b}{x-m} - \frac{c}{M-x}\right)dx. \tag{3.2.3.7} $$

Here, y is the number of significant similarities, m is the minimum and M the maximum value of the variable x, and we let x = 1/Λ. Solving (3.2.3.7) we obtain the beta function

$$ y = a\,(x-m)^b\,(M-x)^c. \tag{3.2.3.8} $$

Inserting the values presented in Table 3.2.3.5 we obtain the function as shown in Figure 3.2.3.5. Since we consider x = 1/Λ, the minimum can be set at 0.5 and all the other values were found iteratively. The coefficient of determination is R² = 0.87, which can be considered a very good result. The fitted function is

$$ y = 0.0038\,(x - 0.5)^{2.0133}\,(2.1449 - x)^{33.493}. $$
Figure 3.2.3.5. Fitting the beta function to the number of similarities
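To get a feeling for the fitted curve, one can evaluate it and locate its maximum. The sketch below uses the parameter values reported above; the function names are illustrative, and the position of the maximum is derived here from (3.2.3.7) by setting dy/y = 0, a step that is not spelled out in the text.

```python
def similarities(x, a=0.0038, b=2.0133, c=33.493, m=0.5, M=2.1449):
    """Fitted beta function (3.2.3.8); defined for m < x < M, with x = 1/Lambda."""
    return a * (x - m) ** b * (M - x) ** c

def mode(b=2.0133, c=33.493, m=0.5, M=2.1449):
    """Maximum of (3.2.3.8): b/(x - m) = c/(M - x) gives x = (b*M + c*m)/(b + c)."""
    return (b * M + c * m) / (b + c)

x_star = mode()
print(f"x* = {x_star:.4f}, Lambda = {1 / x_star:.4f}, y(x*) = {similarities(x_star):.1f}")

# Expected number of similar poems for a few values of 1/Lambda from Table 3.2.3.5
for x in (0.5265, 0.5862, 0.6630, 0.7826):
    print(f"{x:.4f} {similarities(x):6.1f}")
```

Under these parameters the curve peaks near x ≈ 0.593, i.e. Λ ≈ 1.69, close to the mean lambda of the collection: poems with a lambda near the average are similar to the greatest number of other poems, those with extreme lambdas to only a few.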
Table 3.2.3.5: Number of significantly similar poems for each of Eminescu's poems ranked by first publishing year (from Popescu, Mačutek, Altmann 2011: 103 ff.)

| First published | Poem title | Λ | Var(Λ) | 1/Λ | Similarities |
|---|---|---|---|---|---|
| 1866 | De-aş avea | 1.3225 | 0.0069 | 0.7561 | 21 |
| 1866 | Din străinătate | 1.7060 | 0.0013 | 0.5862 | 80 |
| 1866 | La Bucovina | 1.7465 | 0.0015 | 0.5726 | 77 |
| 1866 | La mormântul lui Aron Pumnul | 1.7237 | 0.0014 | 0.5801 | 82 |
| 1866 | Lida | 1.5894 | 0.0023 | 0.6292 | 72 |
| 1866 | Misterele nopţii | 1.5812 | 0.0019 | 0.6324 | 64 |
| 1866 | O călărire în zori | 1.8042 | 0.0018 | 0.5543 | 62 |
| 1867 | Ce-ţi doresc eu ţie, dulce Românie | 1.6037 | 0.0031 | 0.6236 | 82 |
| 1867 | Din lyra spartă... | 1.4675 | 0.0028 | 0.6814 | 41 |
| 1867 | Horia | 1.8236 | 0.0015 | 0.5484 | 54 |
| 1867 | La moartea lui Heliade | 1.7600 | 0.0012 | 0.5682 | 72 |
| 1867 | Nu e steluţă | 1.2778 | 0.0041 | 0.7826 | 11 |
| 1868 | Amorul unei marmure | 1.7059 | 0.0019 | 0.5862 | 85 |
| 1868 | La o artistă (Ca a nopţii poezie) | 1.6128 | 0.0025 | 0.6200 | 76 |
| 1868 | Numai poetul | 1.3950 | 0.0049 | 0.7168 | 32 |
| 1868 | Speranţa | 1.5082 | 0.0014 | 0.6630 | 44 |
| 1869 | Amicului F.I. | 1.8253 | 0.0011 | 0.5479 | 49 |
| 1869 | Când | 1.6985 | 0.0007 | 0.5888 | 70 |
| 1869 | Când marea... | 1.4776 | 0.0036 | 0.6768 | 47 |
| 1869 | Când priveşti oglinda mărei | 1.6566 | 0.0022 | 0.6036 | 77 |
| 1869 | Cine-i? | 1.5660 | 0.0035 | 0.6386 | 74 |
| 1869 | De ce să mori tu? | 1.6209 | 0.0020 | 0.6169 | 74 |
| 1869 | De-aş muri ori de-ai muri | 1.5997 | 0.0022 | 0.6251 | 74 |
| 1869 | Întunericul şi poetul | 1.7367 | 0.0020 | 0.5758 | 90 |
| 1869 | Junii corupţi | 1.8752 | 0.0020 | 0.5333 | 42 |
| 1869 | La moartea principelui Ştirbey | 1.5981 | 0.0034 | 0.6257 | 82 |
| 1869 | La o artistă (Credeam ieri...) | 1.6963 | 0.0011 | 0.5895 | 73 |
| 1869 | Lebăda | 1.4647 | 0.0048 | 0.6827 | 49 |
| 1869 | Locul aripelor | 1.6297 | 0.0023 | 0.6136 | 73 |
| 1869 | O stea pin ceruri | 1.5726 | 0.0023 | 0.6359 | 64 |
| 1869 | Ondina (Fantazie) | 1.8823 | 0.0004 | 0.5313 | 25 |
| 1869 | Prin nopţi tăcute | 1.3950 | 0.0049 | 0.7168 | 32 |
| 1869 | Unda spumă | 1.4055 | 0.0036 | 0.7115 | 28 |
| 1869 | Viaţa mea fu ziuă | 1.6681 | 0.0020 | 0.5995 | 78 |
| 1870 | Epigonii | 1.8992 | 0.0006 | 0.5265 | 22 |
| 1870 | Îngere palid... | 1.5088 | 0.0032 | 0.6628 | 53 |
| 1870 | La moartea lui Neamţu | 1.7095 | 0.0014 | 0.5850 | 82 |
| 1870 | La Quadrat | 1.5012 | 0.0020 | 0.6661 | 47 |
| 1870 | Sus în curtea cea domnească | 1.7229 | 0.0015 | 0.5804 | 82 |
| 1871 | Andrei Mureşanu | 1.7387 | 0.0003 | 0.5751 | 61 |
| 1871 | Aveam o muză | 1.8080 | 0.0019 | 0.5531 | 63 |
142 The word First published Poem title 1871 1871 1871 1871 1871 1871 1871 1871 1871 1871 1872 1872 1872 1872 1872 1872 1872 1872 1872 1872 1873 1873 1873 1873 1873 1873 1873 1873 1873 1873 1873 1873 1873 1873 1874 1874 1874 1874 1874 1874 1875 1876 1876 1876
Basmul ce i l-aş spune ei Copii eram noi amandoi Frumoasă şi jună Înger de pază Iubită dulce, o, mă lasă Iubitei Mortua est! Noaptea... Replici Steaua vieţii Când crivăţul cu iarna... Cugetările sărmanului Dionis Demonism Doi aştri Ecò Egipetul Feciorul împărat fără de stea Memento mori Miradoniz Odin şi poetul Adânca mare... Ah, mierea buzei tale Care-i amorul meu în astă lume Cum oceanu-ntărâtat... Dacă treci râul Selenei Din Berlin la Potsdam Dumnezeu şi om Floare-albastră Ghazel Înger şi demon Mitologicale Murmură glasul mării O arfă pe-un mormânt Privesc oraşul furnicar Cum negustorii din Constantinopol Împarat şi proletar În căutarea Şeherezadei Napoleon O, adevăr sublime... Pustnicul Făt-Frumos din tei Călin (file de poveste) Crăiasa din poveşti Dorinţa
Λ 1.7774 1.8253 1.5510 1.5514 1.6205 1.5643 1.6837 1.6550 1.2070 1.4561 1.7666 1.9632 1.7502 1.5385 1.8807 1.9211 1.5332 1.6175 1.7898 1.6852 1.5767 1.5259 1.7369 1.6474 1.7523 1.6677 1.9548 1.8612 1.7957 1.8064 1.9389 1.7789 1.6827 1.8209 1.6800 1.8876 1.9921 1.7542 1.7786 1.8599 1.8258 1.7534 1.6580 1.7292
Var(Λ) 0.0011 0.0010 0.0037 0.0012 0.0018 0.0011 0.0017 0.0024 0.0073 0.0038 0.0013 0.0010 0.0009 0.0008 0.0010 0.0004 0.0001 0.0001 0.0007 0.0004 0.0040 0.0016 0.0019 0.0015 0.0012 0.0039 0.0016 0.0023 0.0011 0.0010 0.0006 0.0020 0.0011 0.0021 0.0022 0.0005 0.0004 0.0010 0.0011 0.0011 0.0012 0.0003 0.0020 0.0012
1/Λ 0.5626 0.5479 0.6447 0.6446 0.6171 0.6393 0.5939 0.6042 0.8285 0.6868 0.5661 0.5094 0.5714 0.6500 0.5317 0.5205 0.6522 0.6182 0.5587 0.5934 0.6342 0.6554 0.5757 0.6070 0.5707 0.5996 0.5116 0.5373 0.5569 0.5536 0.5158 0.5621 0.5943 0.5492 0.5952 0.5298 0.5020 0.5701 0.5622 0.5377 0.5477 0.5703 0.6031 0.5783
Similarities 69 49 68 54 71 58 79 79 9 41 74 8 70 47 34 12 41 47 59 63 79 48 90 66 72 98 12 51 60 55 12 76 73 60 79 25 3 70 68 40 51 62 75 81
Frequency distribution 143
First published Poem title 1876 1876 1876 1876 1878 1878 1878 1878 1879 1879 1879 1879 1879 1879 1879 1879 1879 1879 1879 1880 1880 1881 1881 1881 1881 1881 1882 1883 1883 1883 1883 1883 1883 1883 1883 1883 1883 1883 1883 1883 1883 1883 1883 1883
Lacul Melancolie Mureşanu Vis Departe sunt de tine Oricâte stele... Povestea codrului Singurătate Atât de fragedă… De câte ori, iubito... Despărţire Foaia veştedă (dupa Lenau) Freamăt de codru Pajul Cupidon... Pe aceeaşi ulicioară... Revedere Rugăciunea unui dac Sonete Stelele-n cer Dintre sute de catarge O, mamă... Scrisoarea I Scrisoarea II Scrisoarea III Scrisoarea IV Scrisoarea V Nu mă-nţelegi Adio Când amintirile... Ce e amorul? Ce te legeni... Criticilor mei Cu mâne zilele-ţi adaogi... De-oi adormi (variantă) De-or trece anii... Din valurile vremii... Glossă Iar când voi fi pământ (variantă) Iubind în taină... La mijloc de codru... Lasă-ţi lumea... Luceafărul Mai am un singur dor Nu voi mormânt bogat (variantă)
Λ 1.5432 1.7968 1.6616 1.7767 1.6868 1.6531 1.8290 1.7556 1.7701 1.6933 1.7051 1.7931 1.8185 1.7418 1.6202 1.5711 1.8549 1.7951 1.6503 1.4153 1.5313 1.8148 1.8067 1.8230 1.8504 1.7059 1.7618 1.5872 1.6520 1.6533 1.5629 1.4837 1.6230 1.7998 1.4467 1.5200 1.3595 1.7371 1.7128 1.3175 1.7829 1.6514 1.7319 1.8106
Var(Λ) 0.0012 0.0014 0.0003 0.0009 0.0030 0.0013 0.0007 0.0014 0.0010 0.0010 0.0016 0.0021 0.0009 0.0032 0.0023 0.0024 0.0013 0.0012 0.0012 0.0071 0.0032 0.0006 0.0012 0.0002 0.0007 0.0004 0.0013 0.0041 0.0023 0.0041 0.0032 0.0020 0.0025 0.0022 0.0039 0.0025 0.0018 0.0015 0.0020 0.0034 0.0011 0.0004 0.0015 0.0018
1/Λ 0.6480 0.5565 0.6018 0.5628 0.5928 0.6049 0.5467 0.5696 0.5649 0.5906 0.5865 0.5577 0.5499 0.5741 0.6172 0.6365 0.5391 0.5571 0.6060 0.7066 0.6530 0.5510 0.5535 0.5485 0.5404 0.5862 0.5676 0.6300 0.6053 0.6049 0.6398 0.6740 0.6161 0.5556 0.6912 0.6579 0.7356 0.5757 0.5838 0.7590 0.5609 0.6055 0.5774 0.5523
Similarities 52 61 60 68 92 65 47 73 68 72 84 72 49 96 75 65 44 61 65 43 61 46 61 42 37 69 72 83 78 93 69 40 75 72 41 53 18 80 85 18 65 59 87 63
144 The word First published Poem title 1883 1883 1883 1883 1883 1883 1883 1883 1883 1883 1884 1885 1886 1887 1887 1887 1887
Odă în metru antic Pe lângă plopii fără soţ Peste vârfuri S-a dus amorul Se bate miezul nopţii... Şi dacă... Somnoroase păsărele... Te duci... Trecut-au anii Veneţia (de Gaetano Cerri) Din noaptea Sara pe deal La steaua De ce nu-mi vii Kamadeva Povestea teiului Venere şi Madona
Var(Λ)
Λ 1.6267 1.6256 1.4254 1.6623 1.4632 1.2116 1.4633 1.6703 1.6405 1.7013 1.5314 1.8203 1.6121 1.4575 1.6550 1.8013 1.6936
0.0028 0.0020 0.0032 0.0029 0.0034 0.0058 0.0025 0.0036 0.0012 0.0023 0.0029 0.0025 0.0027 0.0037 0.0014 0.0011 0.0013
1/Λ 0.6147 0.6152 0.7016 0.6016 0.6834 0.8254 0.6834 0.5987 0.6096 0.5878 0.6530 0.5494 0.6203 0.6861 0.6042 0.5552 0.5905
Similarities 76 71 34 85 42 8 39 96 65 91 61 64 80 41 67 59 74
Similar results have also been found in works of other authors in several languages (cf. Popescu, Čech, Altmann 2010). Of course, the interaction of the two forces, innovation and perseveration, may take another form, but idiosyncrasies can be studied only after they have been discovered.
3.2.4 Entropy and repeat rate

In this section, we shall present two well-known indicators of frequency data, viz. Shannon's entropy and Herfindahl's (1950) indicator of concentration. Shannon's entropy is only one of a large number of formulas expressing the same property (cf. Esteban, Morales 1995), but in linguistics it has been used many times and is sufficient for our purposes. The concentration indicator was introduced into linguistics by G. Herdan (1962: 36-40; 1966: 271-273) under the name repeat rate. The formulas are very simple. Shannon's entropy is defined as

(3.2.4.1)  $H = -\sum_{i=1}^{V} p_i \log_2 p_i = \log_2 N - \dfrac{1}{N}\sum_{i=1}^{V} f_i \log_2 f_i$,

where $p_i$ is the relative frequency of the respective entity, i.e. $p_i = f_i/N$. The variance of H is

(3.2.4.2)  $Var(H) = \dfrac{1}{N}\left(\sum_{i=1}^{V} p_i \log_2^2 p_i - H^2\right)$.

The relative entropy is given as

(3.2.4.3)  $H_{rel} = \dfrac{H}{H_0}$,

where $H_0 = \log_2 V$, and V is the number of types (vocabulary) or form-types. The test for the difference of two texts with the entropies $H_1$ and $H_2$ can be performed by means of the t-test

(3.2.4.4)  $t = \dfrac{H_1 - H_2}{\sqrt{Var(H_1) + Var(H_2)}}$,

where the degrees of freedom can be computed from

(3.2.4.5)  $DF = \dfrac{[Var(H_1) + Var(H_2)]^2}{\dfrac{[Var(H_1)]^2}{N_1} + \dfrac{[Var(H_2)]^2}{N_2}}$,

but since our data are usually very extensive, one can consider (3.2.4.4) as a normal test with infinitely many degrees of freedom, and the computation of formula (3.2.4.5) can be omitted. The domain of the entropy is $H \in [0, \log_2 V]$. The repeat rate is defined as

(3.2.4.6)  $RR = \sum_{i=1}^{V} p_i^2$,

computed usually as

(3.2.4.7)  $RR = \dfrac{1}{N^2}\sum_{i=1}^{V} f_i^2$,

i.e. using directly the absolute frequencies. The variance of RR is asymptotically equal to

(3.2.4.8)  $Var(RR) = \dfrac{4}{N}\left(\sum_{i=1}^{V} p_i^3 - RR^2\right)$.

For the comparison of two texts, the asymptotic normal test can be applied. Since $RR \in [1/V; 1]$, normalisation can be performed as follows:

(3.2.4.9)  $RR_{rel} = \dfrac{1 - RR}{1 - 1/V}$.

Frequently, an alternative formula, the McIntosh version (1967), is employed:

(3.2.4.10)  $RR_{Mc} = \dfrac{1 - \sqrt{RR}}{1 - 1/\sqrt{V}}$.

The mutual asymptotic relationship of entropy and repeat rate is

(3.2.4.11)  $RR = \dfrac{2(H_0 - H) + 1}{V}$.
We illustrate the computation by applying it to the poem Prin nopţi tăcute, for which we obtain the sequence of frequencies 3, 3, 2, 2, 2, 2, 1, 1, ..., 1 (with 34 ones). Here N = 48 and V = 40, hence $H_0 = \log_2 40 = 5.3219$. Since $\log_2 2 = 1$ and $\log_2 1 = 0$, we obtain

$H = \log_2 48 - \dfrac{1}{48}\left[2 \cdot 3 \log_2 3 + 4 \cdot 2 \cdot 1\right] = 5.5850 - \dfrac{9.5098 + 8}{48} = 5.2202$

and $H_{rel} = 5.2202/5.3219 = 0.9809$. The variance is

$Var(H) = \dfrac{1}{48}\left\{2\,\tfrac{3}{48}\left[\log_2 \tfrac{3}{48}\right]^2 + 4\,\tfrac{2}{48}\left[\log_2 \tfrac{2}{48}\right]^2 + 34\,\tfrac{1}{48}\left[\log_2 \tfrac{1}{48}\right]^2 - 5.2202^2\right\} = 0.0072$.

The values of the repeat rate can be obtained as follows:

$RR = \dfrac{2(3^2) + 4(2^2) + 34(1^2)}{48^2} = 0.0295$,

$RR_{Mc} = \dfrac{1 - \sqrt{0.0295}}{1 - 1/\sqrt{40}} = 0.9838$,

$Var(RR) = \dfrac{4}{48}\left[2\left(\tfrac{3}{48}\right)^3 + 4\left(\tfrac{2}{48}\right)^3 + 34\left(\tfrac{1}{48}\right)^3 - 0.0295^2\right] = 0.0000179$.
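The worked example can be re-checked mechanically. The sketch below is only an illustration of formulas (3.2.4.1) to (3.2.4.10); the function name `entropy_and_repeat_rate` and the dictionary keys are assumptions made for this example, and the McIntosh variant is implemented with square roots, which is the form that reproduces the tabulated value of about 0.984 for this poem.

```python
import math


def entropy_and_repeat_rate(freqs):
    """Return the indicators of formulas (3.2.4.1)-(3.2.4.10) for absolute frequencies.

    Assumes at least two types (V > 1) and strictly positive frequencies.
    """
    n = sum(freqs)                      # text length N
    v = len(freqs)                      # vocabulary size V
    p = [f / n for f in freqs]          # relative frequencies p_i

    h = -sum(pi * math.log2(pi) for pi in p)                          # (3.2.4.1)
    h0 = math.log2(v)                                                 # maximal entropy
    var_h = (sum(pi * math.log2(pi) ** 2 for pi in p) - h ** 2) / n   # (3.2.4.2)

    rr = sum(pi ** 2 for pi in p)                                     # (3.2.4.6)
    var_rr = 4 / n * (sum(pi ** 3 for pi in p) - rr ** 2)             # (3.2.4.8)
    rr_mc = (1 - math.sqrt(rr)) / (1 - 1 / math.sqrt(v))              # (3.2.4.10)

    return {"N": n, "V": v, "H": h, "H0": h0, "Hrel": h / h0, "VarH": var_h,
            "RR": rr, "RRMc": rr_mc, "VarRR": var_rr}


# Frequencies of "Prin nopţi tăcute" as listed in the text: N = 48, V = 40.
PRIN_NOPTI_TACUTE = [3, 3, 2, 2, 2, 2] + [1] * 34

if __name__ == "__main__":
    for name, value in entropy_and_repeat_rate(PRIN_NOPTI_TACUTE).items():
        print(f"{name:6s} {value:.7f}")
```

Small differences in the last decimal place with respect to the hand computation can arise because the text inserts the rounded value RR = 0.0295 into the later formulas, whereas the sketch carries full precision throughout.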
Both indicators express an aspect of the unevenness of the distribution of frequencies and can be interpreted in many different ways. Entropy is usually considered an indicator of uncertainty, but at the same time it is a measure of equilibrium or uniformity of the distribution of frequencies. Since $-\log_2 p_i$ is the quantity of self-information or uncertainty with which an entity occurs, expression (3.2.4.1) is the mathematical expectation of this quantity, i.e. an indicator of location. If we consider the text as a point in a V-dimensional space, then the repeat rate is also the square of the distance of this point from the origin. Further, if $p_i$ is considered a random variable, then the repeat rate is its mathematical expectation and in turn an indicator of location. However, originally it was introduced as a measure of concentration. Both indicators can also be used as measures of dispersion. It can easily be shown that they can be transformed into one another using formula (3.2.4.11). For example, in the above case we obtain the repeat rate from the entropy as

$RR = \dfrac{2(5.3219 - 5.2202) + 1}{40} = 0.0301$,

while direct computation yielded RR = 0.0295, i.e. the difference is in the third decimal place. All values of entropy and repeat rate of word-forms in 146 poems by Eminescu are presented in Table 3.2.4.1.

Entropy is maximal if all entities have the same frequency of occurrence. But in that case the vocabulary richness of the text is also maximal. The entropy is minimal if a unique word is repeated in the text, but this cannot be found even in Dada texts. Nevertheless, a writer may repeat constructions, which may influence all indicators; cf. e.g. Eminescu's poem La mijloc de codru…, whose last five lines read:

Şi de lună şi de soare
Şi de păsări călătoare,
Şi de lună şi de stele
Şi de zbor de rândurele
Şi de chipul dragei mele.

Here the repetition of the words şi de (English “and of”) causes the poem to become an outlier with respect to many indicators and may impair otherwise clear relationships. On the other hand, if all entities have the same frequency, the repeat rate is minimal, i.e. great vocabulary richness is associated with a small repeat rate. From this point of view, both indicators can textologically be interpreted as measures of vocabulary richness. The mutual empirical relationship of these two indicators for the 146 poems by Eminescu considered here can be expressed by the function RR = 0.0072 + 10.3262 exp(−1.1436H) with R² = 0.9350. The individual values are presented in Figure 3.2.4.1 and Table 3.2.4.1.
Figure 3.2.4.1. The relation between entropy and repeat rate
There are two outliers clearly visible in Figure 3.2.4.1, signalling specific aims of the writer. One is the poem Replici (H = 5.4313, RR = 0.0399), a “dialogue” between the poet and his sweetheart, with many repetitions of the words eu sunt (“I am”) and tu eşti (“you are”). The other one is the poem La mijloc de codru… (H = 4.5867, RR = 0.0718), discussed above, where the words şi de (“and of”) are repeated seven times in the last five lines. The mean relative entropy computed from all 146 poems is Hrel = 0.9548; whether this value is high or low cannot be evaluated unless works by other authors in the same language have been analyzed. A preliminary analysis of 54 poems by the Slovak writer Eva Bachletová yielded Hrel = 0.9745, i.e. Eminescu's mean value is slightly lower, which under the above interpretation indicates a slightly lower vocabulary richness in Eminescu's poems.
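How far the two outliers lie from the empirical curve can be made explicit. The sketch below is a hedged illustration that uses only the fitted function and the (H, RR) pairs quoted in the text; the names `predicted_rr` and `OBSERVED` are hypothetical.

```python
import math


def predicted_rr(h: float) -> float:
    """Empirical relation reported in the text: RR = 0.0072 + 10.3262 exp(-1.1436 H)."""
    return 0.0072 + 10.3262 * math.exp(-1.1436 * h)


# (H, RR) pairs quoted in the text and in Table 3.2.4.1.
OBSERVED = {
    "Replici": (5.4313, 0.0399),
    "La mijloc de codru…": (4.5867, 0.0718),
    "Prin nopţi tăcute": (5.2202, 0.0295),
}

if __name__ == "__main__":
    for title, (h, rr) in OBSERVED.items():
        fit = predicted_rr(h)
        print(f"{title:22s} observed {rr:.4f}  fitted {fit:.4f}  residual {rr - fit:+.4f}")
```

For both outliers the observed repeat rate lies about 0.01 above the fitted value, whereas an unremarkable poem such as Prin nopţi tăcute deviates by less than half of that.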
Table 3.2.4.1: Entropies and repeat rates of word-forms in 146 poems by Eminescu

| Poem title | H | H0 | Hrel | Var(H) | RR | RRMc | Var(RR) |
|---|---|---|---|---|---|---|---|
| Adânca mare… | 5.8038 | 5.9542 | 0.9747 | 0.0075 | 0.0212 | 0.9789 | 0.000012 |
| Adio | 6.4966 | 6.7944 | 0.9562 | 0.0070 | 0.0151 | 0.9689 | 0.000005 |
| Ah, mierea buzei tale | 6.8335 | 7.1699 | 0.9531 | 0.0051 | 0.0119 | 0.9720 | 0.000002 |
| Amicului F.I. | 7.4229 | 7.5999 | 0.9767 | 0.0024 | 0.0069 | 0.9879 | 0.000003 |
| Amorul unei marmure | 7.2359 | 7.5236 | 0.9618 | 0.0040 | 0.0090 | 0.9772 | 0.000001 |
| Andrei Mureşanu | 9.0129 | 9.9816 | 0.9030 | 0.0021 | 0.0057 | 0.9548 | 0.0000002 |
| Atât de fragedă… | 6.8106 | 7.0553 | 0.9653 | 0.0056 | 0.0123 | 0.9734 | 0.000004 |
| Aveam o muză | 7.7134 | 8.1344 | 0.9482 | 0.0040 | 0.0078 | 0.9695 | 0.0000008 |
| Basmul ce i l-aş spune ei | 7.6351 | 8.0334 | 0.9504 | 0.0040 | 0.0082 | 0.9696 | 0.000001 |
| Călin (file de poveste) | 9.0441 | 10.1331 | 0.8925 | 0.0021 | 0.0066 | 0.9471 | 0.0000002 |
| Când | 6.4533 | 6.6439 | 0.9713 | 0.0059 | 0.0144 | 0.9780 | 0.000005 |
| Când amintirile... | 6.1604 | 6.3219 | 0.9744 | 0.0062 | 0.0167 | 0.9804 | 0.000006 |
| Când crivăţul cu iarna... | 7.9948 | 8.7142 | 0.9174 | 0.0043 | 0.0086 | 0.9539 | 0.0000007 |
| Când marea... | 6.0594 | 6.3219 | 0.9585 | 0.0082 | 0.0194 | 0.9691 | 0.000009 |
| Când priveşti oglinda mărei | 6.1927 | 6.3576 | 0.9741 | 0.0062 | 0.0166 | 0.9794 | 0.000007 |
| Care-i amorul meu în astă lume | 7.0575 | 7.2946 | 0.9675 | 0.0040 | 0.0095 | 0.9808 | 0.000001 |
| Ce e amorul? | 6.2908 | 6.5546 | 0.9598 | 0.0086 | 0.0178 | 0.9662 | 0.000011 |
| Ce te legeni... | 6.0180 | 6.2479 | 0.9632 | 0.0086 | 0.0202 | 0.9691 | 0.000014 |
| Ce-ţi doresc eu ţie, dulce Românie | 6.7026 | 6.9887 | 0.9591 | 0.0057 | 0.0128 | 0.9735 | 0.000003 |
| Cine-i? | 6.2652 | 6.5392 | 0.9581 | 0.0080 | 0.0175 | 0.9682 | 0.000008 |
| Copii eram noi amândoi | 7.4797 | 7.9658 | 0.9390 | 0.0056 | 0.0107 | 0.9573 | 0.000003 |
| Crăiasa din poveşti | 6.3904 | 6.5546 | 0.9750 | 0.0050 | 0.0144 | 0.9813 | 0.000005 |
| Criticilor mei | 6.3568 | 6.5078 | 0.9768 | 0.0036 | 0.0137 | 0.9862 | 0.000001 |
| Cu mâne zilele-ţi adaogi... | 6.5508 | 6.7142 | 0.9757 | 0.0040 | 0.0125 | 0.9841 | 0.000002 |
| Cugetările sărmanului Dionis | 8.1207 | 8.6036 | 0.9439 | 0.0037 | 0.0070 | 0.9652 | 0.0000008 |
| Cum negustorii din Constantinopol | 6.2436 | 6.3923 | 0.9767 | 0.0055 | 0.0156 | 0.9823 | 0.000005 |
| Cum oceanu-ntărâtat... | 5.9713 | 6.0661 | 0.9844 | 0.0045 | 0.0177 | 0.9876 | 0.000005 |
| Dacă treci râul Selenei | 7.3167 | 7.8580 | 0.9311 | 0.0066 | 0.0126 | 0.9503 | 0.000004 |
| De câte ori, iubito... | 6.2581 | 6.3923 | 0.9790 | 0.0050 | 0.0154 | 0.9833 | 0.000006 |
| De ce nu-mi vii | 6.0806 | 6.3576 | 0.9564 | 0.0082 | 0.0196 | 0.9666 | 0.00001 |
| De ce să mori tu? | 7.0020 | 7.4263 | 0.9429 | 0.0060 | 0.0119 | 0.9645 | 0.000002 |
| De-aş avea | 5.6489 | 5.9307 | 0.9525 | 0.0102 | 0.0253 | 0.9643 | 0.000015 |
| De-aş muri ori de-ai muri | 7.0085 | 7.3923 | 0.9481 | 0.0055 | 0.0114 | 0.9681 | 0.000002 |
| Demonism | 8.2353 | 8.9658 | 0.9185 | 0.0035 | 0.0078 | 0.9546 | 0.0000006 |
| De-oi adormi (variantă) | 6.6007 | 6.7142 | 0.9831 | 0.0034 | 0.0117 | 0.9883 | 0.000002 |
| De-or trece anii... | 5.6853 | 5.9773 | 0.9512 | 0.0125 | 0.0263 | 0.9586 | 0.000025 |
| Departe sunt de tine | 6.4962 | 6.7142 | 0.9675 | 0.0062 | 0.0142 | 0.9760 | 0.000005 |
| Despărţire | 7.2572 | 7.6582 | 0.9476 | 0.0051 | 0.0101 | 0.9674 | 0.000002 |
| Din Berlin la Potsdam | 6.4189 | 6.6294 | 0.9683 | 0.0063 | 0.0149 | 0.9761 | 0.000005 |
| Din lyra spartă... | 5.3831 | 5.4594 | 0.9860 | 0.0051 | 0.0258 | 0.9885 | 0.00001 |
| Din noaptea | 5.7306 | 5.8329 | 0.9825 | 0.0052 | 0.0208 | 0.9866 | 0.000006 |
| Din străinătate | 7.0193 | 7.3923 | 0.9495 | 0.0062 | 0.0122 | 0.9639 | 0.000003 |
| Din valurile vremii... | 6.4330 | 6.7004 | 0.9601 | 0.0061 | 0.0148 | 0.9738 | 0.000003 |
| Dintre sute de catarge | 5.2039 | 5.3576 | 0.9713 | 0.0113 | 0.0320 | 0.9731 | 0.000038 |
| Doi aştri | 5.2719 | 5.2854 | 0.9974 | 0.0012 | 0.0262 | 0.9977 | 0.000003 |
| Dorinţa | 6.2445 | 6.4094 | 0.9743 | 0.0066 | 0.0165 | 0.9774 | 0.000009 |
| Dumnezeu şi om | 7.9486 | 8.3219 | 0.9551 | 0.0036 | 0.0067 | 0.9726 | 0.0000007 |
| Ecò | 8.1211 | 8.7879 | 0.9241 | 0.0043 | 0.0086 | 0.9528 | 0.0000009 |
| Egipetul | 8.2963 | 8.8202 | 0.9406 | 0.0033 | 0.0064 | 0.9654 | 0.0000005 |
| Epigonii | 8.4986 | 9.8471 | 0.8631 | 0.0032 | 0.0066 | 0.9588 | 0.0000005 |
| Făt-Frumos din tei | 7.7025 | 8.1344 | 0.9469 | 0.0042 | 0.0080 | 0.9681 | 0.000001 |
| Feciorul de împărat fără de stea | 9.6210 | 11.1491 | 0.8629 | 0.0011 | 0.0058 | 0.9437 | 0.00000006 |
| Floare-albastră | 7.2136 | 7.9484 | 0.9076 | 0.0055 | 0.0105 | 0.9687 | 0.000003 |
| Foaia veştedă (dupa Lenau) | 6.5227 | 6.6294 | 0.9839 | 0.0035 | 0.0123 | 0.9883 | 0.000002 |
| Freamăt de codru | 6.9598 | 7.1599 | 0.9721 | 0.0043 | 0.0102 | 0.9810 | 0.000002 |
| Frumoasă şi jună | 6.0989 | 6.3576 | 0.9593 | 0.0085 | 0.0192 | 0.9684 | 0.00001 |
| Ghazel | 7.5083 | 7.8517 | 0.9563 | 0.0040 | 0.0081 | 0.9740 | 0.000001 |
| Glossă | 6.9536 | 7.5774 | 0.9177 | 0.0056 | 0.0133 | 0.9538 | 0.000002 |
| Horia | 6.7646 | 6.8948 | 0.9811 | 0.0034 | 0.0107 | 0.9870 | 0.000002 |
| Iar când voi fi pământ (var.) | 6.5771 | 6.7279 | 0.9776 | 0.0043 | 0.0124 | 0.9842 | 0.000003 |
| Împărat şi proletar | 8.8488 | 9.7432 | 0.9082 | 0.0026 | 0.0062 | 0.9539 | 0.0000003 |
| În căutarea Şeherezadei | 8.5975 | 9.2143 | 0.9331 | 0.0031 | 0.0060 | 0.9617 | 0.0000004 |
| Înger de pază | 6.0203 | 6.1497 | 0.9790 | 0.0049 | 0.0175 | 0.9845 | 0.000005 |
| Înger şi demon | 8.3352 | 9.0224 | 0.9238 | 0.0033 | 0.0066 | 0.9606 | 0.0000004 |
| Îngere palid... | 5.6359 | 5.7279 | 0.9839 | 0.0050 | 0.0219 | 0.9876 | 0.000007 |
| Întunericul şi poetul | 7.1177 | 7.4594 | 0.9542 | 0.0055 | 0.0108 | 0.9690 | 0.000002 |
| Iubind în taină... | 6.1957 | 6.2668 | 0.9887 | 0.0029 | 0.0147 | 0.9919 | 0.000002 |
| Iubită dulce, o, mă lasă | 7.3540 | 7.7279 | 0.9516 | 0.0039 | 0.0086 | 0.9743 | 0.0000007 |
| Iubitei | 7.4543 | 7.9069 | 0.9428 | 0.0039 | 0.0087 | 0.9691 | 0.0000008 |
| Junii corupţi | 7.6844 | 8.2715 | 0.9290 | 0.0056 | 0.0101 | 0.9539 | 0.000002 |
| Kamadeva | 6.0342 | 6.1293 | 0.9845 | 0.0043 | 0.0169 | 0.9880 | 0.000005 |
| La Bucovina | 6.9459 | 7.1293 | 0.9743 | 0.0036 | 0.0099 | 0.9838 | 0.000001 |
| La mijloc de codru... | 4.5867 | 5.1293 | 0.8942 | 0.0385 | 0.0718 | 0.8826 | 0.000391 |
| La moartea lui Heliade | 7.4361 | 7.8138 | 0.9517 | 0.0045 | 0.0089 | 0.9704 | 0.000001 |
| La moartea lui Neamţu | 7.1826 | 7.4346 | 0.9661 | 0.0037 | 0.0088 | 0.9807 | 0.0000008 |
| La moartea principelui Ştirbey | 6.3571 | 6.6147 | 0.9611 | 0.0071 | 0.0157 | 0.9729 | 0.000005 |
| La mormântul lui Aron Pumnul | 6.6531 | 6.8580 | 0.9701 | 0.0053 | 0.0127 | 0.9781 | 0.000004 |
| La o artistă (Ca a nopţii poezie) | 6.5165 | 6.7004 | 0.9725 | 0.0046 | 0.0132 | 0.9814 | 0.000003 |
| La o artistă (Credeam ieri) | 6.9977 | 7.2479 | 0.9655 | 0.0043 | 0.0106 | 0.9764 | 0.000002 |
| La Quadrat | 6.0524 | 6.3038 | 0.9601 | 0.0083 | 0.0195 | 0.9694 | 0.00001 |
| La steaua | 5.8643 | 5.9542 | 0.9849 | 0.0045 | 0.0188 | 0.9882 | 0.000005 |
| Lacul | 5.9155 | 6.1293 | 0.9651 | 0.0090 | 0.0210 | 0.9712 | 0.000013 |
| Lasă-ţi lumea... | 7.0988 | 7.3837 | 0.9614 | 0.0050 | 0.0102 | 0.9743 | 0.000002 |
| Lebăda | 5.1256 | 5.2095 | 0.9839 | 0.0077 | 0.0315 | 0.9842 | 0.000029 |
| Lida | 5.7414 | 5.8329 | 0.9843 | 0.0050 | 0.0207 | 0.9870 | 0.000008 |
| Locul aripelor | 7.0855 | 7.4346 | 0.9530 | 0.0049 | 0.0102 | 0.9728 | 0.000001 |
| Luceafărul | 8.6845 | 9.6795 | 0.8972 | 0.0024 | 0.0072 | 0.9482 | 0.0000003 |
| Mai am un singur dor | 6.5857 | 6.6865 | 0.9849 | 0.0028 | 0.0115 | 0.9906 | 0.000001 |
| Melancolie | 7.2069 | 7.5850 | 0.9502 | 0.0058 | 0.0111 | 0.9640 | 0.000003 |
| Memento mori | 10.1306 | 11.8041 | 0.8582 | 0.0008 | 0.0056 | 0.9409 | 0.00000005 |
| Miradoniz | 7.8641 | 8.5584 | 0.9189 | 0.0049 | 0.0106 | 0.9458 | 0.000002 |
| Misterele nopţii | 6.5452 | 6.7814 | 0.9652 | 0.0053 | 0.0134 | 0.9776 | 0.000003 |
| Mitologicale | 8.1924 | 8.7879 | 0.9322 | 0.0040 | 0.0082 | 0.9551 | 0.000001 |
| Mortua est! | 7.6630 | 8.2046 | 0.9340 | 0.0044 | 0.0089 | 0.9616 | 0.000001 |
| Mureşanu | 8.8851 | 9.9084 | 0.8967 | 0.0022 | 0.0065 | 0.9502 | 0.0000002 |
| Murmură glasul mării | 6.5038 | 6.6439 | 0.9789 | 0.0045 | 0.0131 | 0.9841 | 0.000003 |
| Napoleon | 7.0924 | 7.4009 | 0.9583 | 0.0052 | 0.0110 | 0.9697 | 0.000003 |
| Noaptea... | 6.7288 | 7.0000 | 0.9613 | 0.0056 | 0.0124 | 0.9747 | 0.000003 |
| Nu e steluţă | 5.1944 | 5.3219 | 0.9760 | 0.0074 | 0.0314 | 0.9815 | 0.000014 |
| Nu mă-nţelegi | 7.6089 | 8.0056 | 0.9504 | 0.0040 | 0.0079 | 0.9717 | 0.0000007 |
| Nu voi mormânt bogat (var.) | 6.5094 | 6.6294 | 0.9819 | 0.0042 | 0.0128 | 0.9861 | 0.000003 |
| Numai poetul | 5.2202 | 5.3219 | 0.9809 | 0.0072 | 0.0295 | 0.9837 | 0.000018 |
| O arfă pe-un mormânt | 6.6392 | 6.8826 | 0.9646 | 0.0059 | 0.0132 | 0.9749 | 0.000004 |
| O călărire în zori | 7.5010 | 7.8642 | 0.9538 | 0.0043 | 0.0090 | 0.9688 | 0.000002 |
| O stea prin ceruri | 5.9230 | 6.0224 | 0.9835 | 0.0043 | 0.0181 | 0.9981 | 0.000004 |
| O, adevăr sublime... | 7.3073 | 7.8202 | 0.9344 | 0.0065 | 0.0118 | 0.9549 | 0.000003 |
| O, mamă… | 6.3493 | 6.6147 | 0.9599 | 0.0068 | 0.0159 | 0.9720 | 0.000005 |
| Odă în metru antic | 6.2373 | 6.3750 | 0.9784 | 0.0047 | 0.0152 | 0.9849 | 0.000003 |
| Odin şi poetul | 8.6319 | 9.4998 | 0.9086 | 0.0026 | 0.0065 | 0.9548 | 0.0000003 |
| Ondina (Fantazie) | 8.4225 | 9.0634 | 0.9293 | 0.0032 | 0.0066 | 0.9605 | 0.0000005 |
| Oricâte stele... | 6.1182 | 6.1898 | 0.9884 | 0.0028 | 0.0154 | 0.9922 | 0.000002 |
| Pajul Cupidon... | 6.5377 | 6.8455 | 0.9550 | 0.0087 | 0.0161 | 0.9630 | 0.000009 |
| Pe aceeaşi ulicioară... | 6.4398 | 6.6865 | 0.9631 | 0.0067 | 0.0150 | 0.9734 | 0.000005 |
| Pe lângă plopii fără soţ | 6.7834 | 7.1085 | 0.9543 | 0.0060 | 0.0126 | 0.9703 | 0.000003 |
| Peste vârfuri | 5.1213 | 5.2854 | 0.9690 | 0.0134 | 0.0349 | 0.9684 | 0.000063 |
| Povestea codrului | 7.1731 | 7.3923 | 0.9703 | 0.0039 | 0.0091 | 0.9800 | 0.000002 |
| Povestea teiului | 7.6147 | 8.0279 | 0.9485 | 0.0043 | 0.0084 | 0.9683 | 0.000001 |
| Prin nopţi tăcute | 5.2202 | 5.3219 | 0.9809 | 0.0072 | 0.0295 | 0.9837 | 0.000018 |
| Privesc oraşul furnicar | 6.8569 | 7.0875 | 0.9675 | 0.0055 | 0.0118 | 0.9750 | 0.000004 |
| Pustnicul | 7.7049 | 8.0768 | 0.9540 | 0.0039 | 0.0074 | 0.9734 | 0.0000007 |
| Replici | 5.4313 | 6.1898 | 0.8775 | 0.0177 | 0.0399 | 0.9062 | 0.000033 |
| Revedere | 6.4349 | 6.6724 | 0.9644 | 0.0060 | 0.0145 | 0.9761 | 0.000003 |
| Rugăciunea unui dac | 7.5905 | 7.9830 | 0.9508 | 0.0046 | 0.0086 | 0.9682 | 0.000001 |
| S-a dus amorul | 6.8833 | 7.2479 | 0.9497 | 0.0063 | 0.0124 | 0.9673 | 0.000003 |
| Sara pe deal | 6.8303 | 7.0000 | 0.9758 | 0.0042 | 0.0108 | 0.9831 | 0.000002 |
| Scrisoarea I | 8.6764 | 9.4676 | 0.9164 | 0.0027 | 0.0065 | 0.9555 | 0.0000004 |
| Scrisoarea II | 8.0706 | 8.7245 | 0.9250 | 0.0039 | 0.0078 | 0.9585 | 0.0000006 |
| Scrisoarea III | 9.0155 | 10.1624 | 0.8871 | 0.0024 | 0.0079 | 0.9390 | 0.0000003 |
| Scrisoarea IV | 8.6215 | 9.4491 | 0.9124 | 0.0029 | 0.0072 | 0.9514 | 0.0000005 |
| Scrisoarea V | 8.2784 | 9.1033 | 0.9094 | 0.0035 | 0.0084 | 0.9490 | 0.0000006 |
| Se bate miezul nopţii... | 5.2529 | 5.3219 | 0.9870 | 0.0054 | 0.0281 | 0.9885 | 0.000014 |
| Şi dacă... | 5.0294 | 5.2095 | 0.9654 | 0.0108 | 0.0352 | 0.9721 | 0.000027 |
| Singurătate | 6.8570 | 7.0661 | 0.9704 | 0.0045 | 0.0108 | 0.9807 | 0.000002 |
| Somnoroase păsărele... | 5.4040 | 5.5236 | 0.9784 | 0.0078 | 0.0268 | 0.9810 | 0.000019 |
| Sonete | 7.3509 | 7.5999 | 0.9672 | 0.0034 | 0.0079 | 0.9816 | 0.000001 |
| Speranţa | 6.6599 | 7.1599 | 0.9302 | 0.0080 | 0.0172 | 0.9481 | 0.000008 |
| Steaua vieţii | 5.6506 | 5.7814 | 0.9774 | 0.0063 | 0.0224 | 0.9827 | 0.000009 |
| Stelele-n cer | 6.1082 | 6.2479 | 0.9776 | 0.0058 | 0.0170 | 0.9822 | 0.000007 |
| Sus în curtea cea domnească | 6.5576 | 6.7004 | 0.9787 | 0.0040 | 0.0123 | 0.9856 | 0.000002 |
| Te duci... | 5.8442 | 6.0875 | 0.9600 | 0.0125 | 0.0249 | 0.9582 | 0.000041 |
| Trecut-au anii | 6.1099 | 6.2095 | 0.9840 | 0.0040 | 0.0160 | 0.9884 | 0.000003 |
| Unda spumă | 5.4375 | 5.5546 | 0.9789 | 0.0066 | 0.0256 | 0.9836 | 0.000011 |
| Venere şi Madona | 7.5182 | 7.9484 | 0.9459 | 0.0043 | 0.0090 | 0.9667 | 0.000001 |
| Veneţia (de Gaetano Cerri) | 6.0821 | 6.1497 | 0.9890 | 0.0031 | 0.0159 | 0.9918 | 0.000003 |
| Viaţa mea fu ziuă | 6.2773 | 6.4263 | 0.9768 | 0.0052 | 0.0151 | 0.9829 | 0.000004 |
| Vis | 6.9242 | 7.1085 | 0.9741 | 0.0039 | 0.0102 | 0.9828 | 0.000002 |
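With the tabulated values, the test (3.2.4.4) for the difference between two entropies can be carried out directly. The sketch below treats the statistic as asymptotically normal, as the text suggests for extensive data; the function name `entropy_difference_test` is an assumption for illustration, and the two-sided p-value is obtained from the complementary error function.

```python
import math


def entropy_difference_test(h1, var1, h2, var2):
    """Statistic of formula (3.2.4.4) with a two-sided normal p-value."""
    u = (h1 - h2) / math.sqrt(var1 + var2)
    # For a standard normal variable, P(|Z| > |u|) = erfc(|u| / sqrt(2)).
    p_two_sided = math.erfc(abs(u) / math.sqrt(2))
    return u, p_two_sided


if __name__ == "__main__":
    # H and Var(H) of the two outliers as given in Table 3.2.4.1.
    replici = (5.4313, 0.0177)
    la_mijloc = (4.5867, 0.0385)
    u, p = entropy_difference_test(*replici, *la_mijloc)
    print(f"u = {u:.4f}, two-sided p = {p:.5f}")
```

The normal approximation is a simplification for poems as short as these two; for small N the degrees of freedom from (3.2.4.5) would have to be taken into account, which requires the text lengths that are not listed in the table.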
3.2.5 Gini's coefficient

Another geometric property of the rank-frequency sequence is Gini's coefficient, which is rarely applied in textology (cf. Popescu, Altmann 2006; Popescu et al. 2009: 54 ff.) but seems to reflect an aspect of vocabulary richness. The usual, original way of computation is lengthy, while a simplified variant yields the same results. Originally, the frequencies are considered in reverse order, beginning with the smallest frequency, and the ranking is automatically reversed, too. Then both the frequencies and the ranks are cumulated and relativised. That means that the cumulative relative frequencies form an arc running from (0,0) and touching the bisector in (1,1). The arc is usually called the Lorenz curve. The magnitude of the area between the bisector and the Lorenz curve yields Gini's coefficient, as can be seen in Figure 3.2.5.1. It could be computed by adding the areas of small trapezoids between the bisector and the Lorenz curve, but fortunately there are several equivalent expressions. We shall use the definition

(3.2.5.1)  $G = \dfrac{1}{V}\left(V + 1 - \dfrac{2}{N}\sum_{r=1}^{V} r f(r)\right)$,

which is based on the usual ranks and frequencies. V is the size of the vocabulary and N stands for the text length. Considering the last expression in (3.2.5.1) we see that it is nothing else but 2m1', i.e. twice the mean rank. The greater the area, the smaller is the vocabulary richness. This can easily be seen if one considers the maximal richness of a text in which all words occur exactly once. In that case the Lorenz curve is parallel to the diagonal and G (i.e. the area between the diagonal and the Lorenz curve) would be very small. In order to express the richness appropriately, Popescu et al. (2009) defined it as

(3.2.5.2)  $R_4 = 1 - G$.

The variances of G and R4 are identical because both 1 and V are constants. Hence

(3.2.5.3)  $Var(G) = Var(R_4) = \dfrac{4\sigma^2}{V^2 N}$,

where σ² is the variance of the ranks.
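Formulas (3.2.5.1) to (3.2.5.3) can be sketched in a few lines. The example below uses the frequencies of Prin nopţi tăcute given in Section 3.2.4; the function name `gini_richness` is illustrative, and σ² is taken here as the population variance of the ranks weighted by f(r)/N, which is an assumption that happens to reproduce the values tabulated for this poem (R4 = 0.8542, Var(R4) = 0.0080).

```python
def gini_richness(freqs):
    """Return (G, R4, Var(R4)) following formulas (3.2.5.1)-(3.2.5.3)."""
    f = sorted(freqs, reverse=True)     # rank 1 = most frequent word-form
    n = sum(f)                          # text length N
    v = len(f)                          # vocabulary size V

    # First and second raw moments of the rank distribution.
    mean_rank = sum(r * fr for r, fr in enumerate(f, start=1)) / n
    mean_sq_rank = sum(r * r * fr for r, fr in enumerate(f, start=1)) / n
    sigma2 = mean_sq_rank - mean_rank ** 2   # assumed: population variance of ranks

    g = (v + 1 - 2 * mean_rank) / v     # (3.2.5.1), using 2*m1' = (2/N) * sum r f(r)
    r4 = 1 - g                          # (3.2.5.2)
    var_r4 = 4 * sigma2 / (v * v * n)   # (3.2.5.3)
    return g, r4, var_r4


if __name__ == "__main__":
    prin_nopti_tacute = [3, 3, 2, 2, 2, 2] + [1] * 34
    g, r4, var_r4 = gini_richness(prin_nopti_tacute)
    print(f"G = {g:.4f}, R4 = {r4:.4f}, Var(R4) = {var_r4:.4f}")
```

For a text in which every word occurs exactly once the mean rank equals (V + 1)/2, so the sketch returns G = 0 and R4 = 1, in line with the interpretation of R4 as vocabulary richness.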
Figure 3.2.5.1. Lorenz curve of (reversed) ranked cumulative relative frequencies
The difference between two texts can be tested on the basis of the asymptotic normal criterion

(3.2.5.4)  $u = \dfrac{R_{4,1} - R_{4,2}}{\sqrt{Var(R_{4,1}) + Var(R_{4,2})}}$,
where the subscripts 1 and 2 indicate two different texts. All values of R4 and their variances for 146 poems by Eminescu are presented in Table 3.2.5.1.

Table 3.2.5.1: Vocabulary richness using Gini's coefficient for 146 poems by Eminescu

| Poem title | R4 | Var(R4) | Poem title | R4 | Var(R4) |
|---|---|---|---|---|---|
| Adânca mare… | 0.8411 | 0.0053 | La moartea lui Heliade | 0.7129 | 0.0012 |
| Adio | 0.7362 | 0.0025 | La moartea lui Neamţu | 0.7477 | 0.0016 |
| Ah, mierea buzei tale | 0.6912 | 0.0017 | La moartea principelui Ştirbey | 0.7682 | 0.0031 |
| Amicului F.I. | 0.7907 | 0.0015 | La mormântul lui Aron Pumnul | 0.7991 | 0.0027 |
| Amorul unei marmure | 0.7347 | 0.0015 | La o artistă (Ca a nopţii poezie) | 0.7814 | 0.0027 |
| Andrei Mureşanu | 0.5506 | 0.0002 | La o artistă (Credeam ieri) | 0.7488 | 0.0017 |
| Atât de fragedă… | 0.7826 | 0.0023 | La Quadrat | 0.7554 | 0.0036 |
| Aveam o muză | 0.7019 | 0.0010 | La steaua | 0.8835 | 0.0054 |
| Basmul ce i l-aş spune ei | 0.7005 | 0.0010 | Lacul | 0.7984 | 0.0045 |
| Călin (file de poveste) | 0.5365 | 0.0002 | Lasă-ţi lumea... | 0.7656 | 0.0018 |
| Când | 0.8140 | 0.0032 | Lebăda | 0.9077 | 0.0093 |
| Când amintirile... | 0.8383 | 0.0041 | Lida | 0.8772 | 0.0058 |
| Când crivăţul cu iarna... | 0.6238 | 0.0006 | Locul aripelor | 0.7077 | 0.0015 |
| Când marea... | 0.7432 | 0.0035 | Luceafărul | 0.5309 | 0.0002 |
| Când priveşti oglinda mărei | 0.8291 | 0.0039 | Mai am un singur dor | 0.8499 | 0.0031 |
| Care-i amorul meu în astă lume | 0.7677 | 0.0019 | Melancolie | 0.7300 | 0.0045 |
| Ce e amorul? | 0.7812 | 0.0033 | Memento mori | 0.4283 | 0.00003 |
| Ce te legeni... | 0.7779 | 0.0039 | Miradoniz | 0.6312 | 0.0006 |
| Ce-ţi doresc eu ţie, dulce Românie | 0.7348 | 0.0022 | Misterele nopţii | 0.7524 | 0.0025 |
| Cine-i? | 0.7541 | 0.0031 | Mitologicale | 0.6782 | 0.0006 |
| Copii eram noi amândoi | 0.6962 | 0.0011 | Mortua est! | 0.6466 | 0.0008 |
| Crăiasa din poveşti | 0.8071 | 0.0032 | Mureşanu | 0.5279 | 0.0002 |
| Criticilor mei | 0.7768 | 0.0028 | Murmură glasul mării | 0.8529 | 0.0033 |
| Cu mâne zilele-ţi adaogi... | 0.7928 | 0.0027 | Napoleon | 0.7434 | 0.0017 |
| Cugetările sărmanului Dionis | 0.7069 | 0.0007 | Noaptea... | 0.7538 | 0.0023 |
| Cum negustorii din Constantinopol | 0.8448 | 0.0039 | Nu e steluţă | 0.8009 | 0.0068 |
| Cum oceanu-ntărâtat... | 0.8806 | 0.0050 | Nu mă-nţelegi | 0.7051 | 0.0010 |
| Dacă treci râul Selenei | 0.6822 | 0.0012 | Nu voi mormânt bogat (variantă) | 0.8824 | 0.0035 |
| De câte ori, iubito... | 0.8441 | 0.0038 | Numai poetul | 0.8542 | 0.0080 |
| De ce nu-mi vii | 0.7268 | 0.0031 | O arfă pe-un mormânt | 0.7771 | 0.0026 |
| De ce să mori tu? | 0.6848 | 0.0015 | O călărire în zori | 0.7184 | 0.0011 |
| De-aş avea | 0.7141 | 0.0041 | O stea prin ceruri | 0.8548 | 0.0049 |
| De-aş muri ori de-ai muri | 0.6938 | 0.0015 | O, adevăr sublime... | 0.6993 | 0.0012 |
| Demonism | 0.6088 | 0.0004 | O, mamă… | 0.7420 | 0.0028 |
| De-oi adormi (variantă) | 0.8707 | 0.0032 | Odă în metru antic | 0.8290 | 0.0038 |
| De-or trece anii... | 0.7515 | 0.0047 | Odin şi poetul | 0.5622 | 0.0003 |
| Departe sunt de tine | 0.7981 | 0.0030 | Ondina (Fantazie) | 0.6486 | 0.0005 |
| Despărţire | 0.7005 | 0.0013 | Oricâte stele... | 0.8785 | 0.0044 |
| Din Berlin la Potsdam | 0.7972 | 0.0031 | Pajul Cupidon... | 0.7892 | 0.0028 |
| Din lyra spartă... | 0.8792 | 0.0075 | Pe aceeaşi ulicioară... | 0.7729 | 0.0029 |
| Din noaptea | 0.8571 | 0.0057 | Pe lângă plopii fără soţ | 0.7272 | 0.0020 |
| Din străinătate | 0.7224 | 0.0017 | Peste vârfuri | 0.8418 | 0.0085 |
| Din valurile vremii... | 0.7323 | 0.0026 | Povestea codrului | 0.7914 | 0.0018 |
| Dintre sute de catarge | 0.8361 | 0.0079 | Povestea teiului | 0.7044 | 0.0010 |
| Doi aştri | 0.9756 | 0.0087 | Prin nopţi tăcute | 0.8542 | 0.0080 |
| Dorinţa | 0.8461 | 0.0039 | Privesc oraşul furnicar | 0.8043 | 0.0023 |
| Dumnezeu şi om | 0.7453 | 0.0009 | Pustnicul | 0.7339 | 0.0011 |
| Ecò | 0.6607 | 0.0006 | Replici | 0.5547 | 0.0025 |
| Egipetul | 0.6880 | 0.0006 | Revedere | 0.7598 | 0.0028 |
| Epigonii | 0.6505 | 0.0004 | Rugăciunea unui dac | 0.7331 | 0.0012 |
| Făt-Frumos din tei | 0.7059 | 0.0010 | S-a dus amorul | 0.7225 | 0.0019 |
| Feciorul de împărat fără de stea | 0.4392 | 0.000054 | Sara pe deal | 0.8345 | 0.0026 |
| Floare-albastră | 0.7702 | 0.0017 | Scrisoarea I | 0.5974 | 0.0003 |
| Foaia veştedă (dupa Lenau) | 0.8725 | 0.0034 | Scrisoarea II | 0.6404 | 0.0006 |
| Freamăt de codru | 0.8152 | 0.0023 | Scrisoarea III | 0.5441 | 0.0002 |
| Frumoasă şi jună | 0.7584 | 0.0035 | Scrisoarea IV | 0.5960 | 0.0003 |
| Ghazel | 0.7294 | 0.0012 | Scrisoarea V | 0.5840 | 0.0004 |
| Glossă | 0.5752 | 0.0009 | Se bate miezul nopţii... | 0.8983 | 0.0084 |
| Horia | 0.8489 | 0.0027 | Şi dacă... | 0.7624 | 0.0070 |
| Iar când voi fi pământ (variantă) | 0.8298 | 0.0030 | Singurătate | 0.7996 | 0.0023 |
| Împărat şi proletar | 0.5965 | 0.0003 | Somnoroase păsărele... | 0.8526 | 0.0071 |
| În căutarea Şeherezadei | 0.6747 | 0.0005 | Sonete | 0.7629 | 0.0015 |
| Înger de pază | 0.8208 | 0.0042 | Speranţa | 0.6456 | 0.0016 |
| Înger şi demon | 0.6277 | 0.0005 | Steaua vieţii | 0.8197 | 0.0055 |
| Îngere palid... | 0.8616 | 0.0061 | Stelele-n cer | 0.8489 | 0.0043 |
| Întunericul şi poetul | 0.7375 | 0.0016 | Sus în curtea cea domnească | 0.8325 | 0.0031 |
| Iubind în taină... | 0.8952 | 0.0044 | Te duci... | 0.8207 | 0.0048 |
| Iubită dulce, o, mă lasă | 0.6820 | 0.0011 | Trecut-au anii | 0.8603 | 0.0044 |
| Iubitei | 0.6433 | 0.0009 | Unda spumă | 0.8291 | 0.0065 |
| Junii corupţi | 0.6918 | 0.0009 | Venere şi Madona | 0.6808 | 0.0010 |
| Kamadeva | 0.8764 | 0.0048 | Veneţia (de Gaetano Cerri) | 0.9059 | 0.0048 |
| La Bucovina | 0.7946 | 0.0021 | Viaţa mea fu ziuă | 0.8358 | 0.0038 |
| La mijloc de codru... | 0.6675 | 0.0074 | Vis | 0.8060 | 0.0022 |
The relationship to previous indicators may be considered linear. Thus, R4 = −2.6294 + 3.5363 Hrel with R² = 0.92, but concerning RRMc (cf. formula 3.2.4.10) we must consider the first three values as outliers and omit them. In that case we obtain R4 = −6.1206 + 7.0690 RRMc with R² = 0.80. If we consider all values, the relationship is not linear any more. It is again the poem La mijloc de codru… that represents the strongest outlier. Comparing the numbers in the R4 column we can easily state that they are located in the interval R4 ∈ [0.4283; 0.9756]. The empirical mean of these numbers is 0.75, the variance is s² = 0.009828, the third central moment (3.2.2.3) is m3 = −0.000664 and the coefficient of asymmetry (S, cf. formula 3.2.2.5) is $\gamma_3 = m_3/m_2^{3/2} = -0.6813$. This means that the distribution is slightly steeper on its right-hand side.

It can be shown that there is no historical change in Eminescu's vocabulary richness. The linear regression against years yields an almost horizontal straight line with very great dispersion, rendering it insignificant. However, it can also be shown that the longer the poem, the smaller is its vocabulary richness. Comparing N with R4 we obtain the power relationship $R_4 = 1.4023\,N^{-0.1185}$, with R² = 0.78; the p-values of the t-tests for the parameters and of the F-test for the regression are smaller than 0.00001. This is caused by the fact that with increasing text length many words are repeated and new words do not come in great number. But if an indicator of richness changes with increasing length, it is not really adequate. However, if it changes very regularly, we must suppose that there is a background mechanism controlling the increase of new words and the repetition of old ones. Unquestionably, it should be connected to the information flow in a text, but this concept is very vague and complex and nobody knows how to measure it. It is not identical with the stepwise measurement of entropy; it encompasses semantic, syntactic and lexical information, but their share in the information has never been measured or even defined.
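Two of the numerical statements above are easy to re-examine. The following sketch, under the assumption that only the reported coefficients and moments are available, evaluates the power relationship between text length and R4 and recomputes the coefficient of asymmetry from the quoted moments; the function names are hypothetical.

```python
def richness_from_length(n: float) -> float:
    """Power relationship reported in the text: R4 = 1.4023 * N^(-0.1185)."""
    return 1.4023 * n ** -0.1185


def asymmetry(m2: float, m3: float) -> float:
    """Coefficient of asymmetry: third central moment divided by m2^(3/2)."""
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    for n in (48, 100, 500, 1000, 5000):
        print(f"N = {n:5d}  ->  fitted R4 = {richness_from_length(n):.4f}")
    print(f"asymmetry = {asymmetry(0.009828, -0.000664):.4f}")
```

For N = 48 the fitted value is somewhat above the observed R4 = 0.8542 of Prin nopţi tăcute, which is compatible with the reported R² = 0.78. The recomputed asymmetry differs from −0.6813 only in the fourth decimal place, presumably because the moments quoted in the text are rounded.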
3.2.6 Geometric properties

3.2.6.1 The triangle

The rank-frequency distribution of word-forms, lemmas or hrebs of a text always has a hyperbolic form. The same holds for frequency spectra, which can be determined empirically by simple addition or theoretically by a simple transformation. The rank-frequency distribution has two conspicuous points, viz. P1(V,1), designating the vocabulary size (V) of the text or the highest rank (V = rmax), whose frequency f(V) is always 1,² and the point P2(1,f(1)), designating the first-ranked word with the highest frequency f(1). The third point, called the h-point, is determined as follows (cf. Popescu et al. 2009, Eq. (3.2)):³

\[
h = \begin{cases}
r, & \text{if there is an } r \text{ such that } r = f(r) \\[6pt]
\dfrac{f(i)\,r_j - f(j)\,r_i}{r_j - r_i + f(i) - f(j)}, & \text{if there is no } r \text{ such that } r = f(r)
\end{cases}
\qquad (3.2.6.1)
\]

i.e. if the rank r is equal to its frequency f(r) then h = r, otherwise we use the second part of formula (3.2.6.1). When there are several such ranks r, we will take the smallest one. Usually ri and rj are the smallest neighbouring values such that f(i) < f(j). Of course, we would obtain another h if we ascribe the same frequencies to the same mean rank, e.g. if two equal frequencies have ranks 3 and 4 we ascribe them the mean rank 3.5. Now, having the third point P3(h,h) we can set up the characteristic triangle of the text, presented in Figure 3.2.6.1.
Figure 3.2.6.1. The characteristic triangle of the rank-frequency sequence (from Popescu, Altmann 2006)
2 If the greatest rank is still smaller than the smallest frequency, i.e. if rmax < f(rmax), the parameter h cannot be computed. Hence one must transform the whole ranked sequence by subtraction into f*(r) = f(r) – f(V) + 1 (cf. Popescu, Kelih, Best, Altmann 2009).
3 The seminal paper introducing the h-point into linguistics was published in 2007 (cf. Popescu 2007).
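The determination of the h-point by formula (3.2.6.1) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' program: the function name `h_point` is hypothetical; equal frequencies receive consecutive ranks (not mean ranks); and in the interpolation case the two neighbouring ranks are taken as the last rank lying above the diagonal r = f(r) and the first rank lying below it, which is the choice that reproduces the worked example h = 1.5 for Doi aştri. The shift applied when rmax < f(rmax) follows footnote 2.

```python
def h_point(freqs):
    """h-point of a rank-frequency sequence, formula (3.2.6.1)."""
    f = sorted(freqs, reverse=True)          # rank 1 = highest frequency
    if not f or f[-1] < 1:
        raise ValueError("need a non-empty list of positive frequencies")
    v = len(f)
    # Footnote 2: if the greatest rank is smaller than the smallest
    # frequency, use f*(r) = f(r) - f(V) + 1 instead.
    if v < f[-1]:
        shift = f[-1] - 1
        f = [x - shift for x in f]
    for r, fr in enumerate(f, start=1):
        if fr == r:                          # a rank with r = f(r) exists
            return float(r)
        if fr < r:                           # diagonal crossed between r-1 and r
            r_i, r_j = r - 1, r
            f_i, f_j = f[r_i - 1], fr
            return (f_i * r_j - f_j * r_i) / (r_j - r_i + f_i - f_j)
    raise ValueError("no h-point found")     # not reachable after the shift


# Doi aştri: one word-form occurs twice, 38 word-forms occur once
print(h_point([2] + [1] * 38))  # 1.5
```

Because the frequencies never increase while the ranks always do, the first rank that satisfies either condition is also the smallest one, as required by the text.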
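Knowing the three points, we compute the area of the triangle as

\[
A_h = \frac{1}{2}
\begin{vmatrix}
V & 1 & 1 \\
1 & f(1) & 1 \\
h & h & 1
\end{vmatrix}
\qquad (3.2.6.2)
\]

from which follows

\[
A_h = \frac{1}{2}\,\bigl|\, V f(1) + 2h - h\,(V + f(1)) - 1 \,\bigr| . \qquad (3.2.6.3)
\]
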
Note that one should take the absolute value of the area Ah regardless of its orientation (P1P2P3 or P1P3P2). For the sake of illustration let us consider the poem Doi aştri, in which we find the following ranked word-form frequencies: 2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1. Here we have V = 39, f(1) = 2, h = [2(2) – 1(1)]/[2 – 1 + 2 – 1] = 3/2 = 1.5, hence Ah = |39(2) + 2(1.5) – 1.5(39 + 2) – 1|/2 = 9.25. In order to obtain a relative number we first compute the maximum of Ah, given as

(3.2.6.4) Amax = (1/2)(V – 1)(f(1) – 1),

yielding in our example Amax = (1/2)(39 – 1)(2 – 1) = 19.00. Then the ratio

(3.2.6.5) A = Ah/Amax

gives the relative empirical size of the characteristic triangle. In our example we obtain A = 9.25/19.00 = 0.4868. As can be observed, N (the number of words in the text under study) does not play any role in this computation.
We present the results in Table 3.2.6.1. The dispersion is too great to show a significant dependence on N. An increase of A with increasing N is, nevertheless, observable, as can be seen in Figure 3.2.6.2, but a function capturing this trend will not be proposed. The greater N, the smaller the dispersion. Evidently, A approaches its highest value 1. A similar result has been shown for 54 texts in 7 languages (Popescu, Altmann 2006a).
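The arithmetic of formulas (3.2.6.3) to (3.2.6.5) can be retraced with a short sketch. The function names are hypothetical and the h-point is taken as given; the code merely restates the three formulas and rechecks the worked example.

```python
def triangle_area(v, f1, h):
    """Area A_h of the characteristic triangle, formula (3.2.6.3)."""
    return abs(v * f1 + 2 * h - h * (v + f1) - 1) / 2


def max_triangle_area(v, f1):
    """Maximum area A_max, formula (3.2.6.4)."""
    return (v - 1) * (f1 - 1) / 2


def relative_triangle_size(v, f1, h):
    """Relative size A = A_h / A_max, formula (3.2.6.5)."""
    a_max = max_triangle_area(v, f1)
    if a_max == 0:
        raise ValueError("A_max is zero (V = 1 or f(1) = 1); A is undefined")
    return triangle_area(v, f1, h) / a_max


# Doi aştri: V = 39, f(1) = 2, h = 1.5
print(triangle_area(39, 2, 1.5))                     # 9.25
print(max_triangle_area(39, 2))                      # 19.0
print(round(relative_triangle_size(39, 2, 1.5), 4))  # 0.4868
```

The same functions reproduce, for instance, the row for Adânca mare… (V = 62, f(1) = 5, h = 3.00), which gives Ah = 57.00 and Amax = 122.00.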
Table 3.2.6.1: Relative size of the characteristic triangle of rank-frequency sequences of word-forms in 146 poems by Eminescu
Poem title
N
V
f(1)
h
Ah
Amax
A
Adânca mare…
75
62
5
3.00
57.00
122.00
0.47
Adio
159
111
9
5.00
204.00
440.00
0.46
Ah, mierea buzei tale
228
144
10
5.00
339.50
643.50
0.53
Amicului F.I.
257
194
5
4.00
90.50
386.00
0.23
Amorul unei marmure
266
184
9
5.00
350.00
732.00
0.48
Andrei Mureşanu
2008
1011
65
14.67
24980.98
32320.00
0.77
Atât de frageda…
176
133
11
3.50
482.50
660.00
0.73
Aveam o muză
421
281
17
7.00
1352.00
2240.00
0.60
Basmul ce i l-aş spune ei
398
262
18
6.00
1523.50
2218.50
0.69
Călin (file de poveste)
2299
1123
98
16.33
45071.35
54417.00
0.83
Când
126
100
7
3.50
165.75
297.00
0.56
Când amintirile...
97
80
5
3.00
75.00
158.00
0.47
Când crivăţul cu iarna...
708
420
31
10.00
4264.50
6285.00
0.68
Când marea...
114
80
7
4.00
109.50
237.00
0.46
Când priveşti oglinda mărei
101
82
6
3.00
116.50
202.50
0.58
Care-i amorul meu în astă lume
213
157
7
4.50
184.50
468.00
0.39
Ce e amorul?
124
94
8
3.00
225.50
325.50
0.69
Ce te legeni...
102
76
8
3.50
160.00
262.50
0.61
Ce-ţi doresc eu ţie, dulce Românie
183
127
8
4.67
197.16
441.00
0.45
Cine-i?
129
93
8
4.33
157.00
322.00
0.49
Copii eram noi amândoi
375
250
24
7.00
2047.50
2863.50
0.72
Crăiasa din poveşti
122
94
7
3.00
180.00
279.00
0.65
Criticilor mei
130
91
4
3.00
42.00
135.00
0.31
Cu mâne zilele-ţi adaogi...
141
105
6
3.50
123.75
260.00
0.48
Cugetările sărmanului Dionis
571
389
27
8.00
3595.00
5044.00
0.71
Cum negustorii din Constantinopol
101
84
5
3.00
79.00
166.00
0.48
Cum oceanu-ntărâtat...
77
67
4
2.50
47.25
99.00
0.48
Dacă treci râul Selenei
356
232
21
6.00
1682.50
2310.00
0.73
De câte ori, iubito...
102
84
6
2.50
141.50
207.50
0.68
De ce nu-mi vii
123
82
9
4.00
190.50
324.00
0.59
De ce să mori tu?
266
172
13
6.00
568.50
1026.00
0.55
De-aş avea
93
61
6
4.00
52.50
150.00
0.35
De-aş muri ori de-ai muri
258
168
10
5.67
340.83
751.50
0.45
Demonism
882
500
36
10.00
6329.50
8732.50
0.72
De-oi adormi (variantă)
122
105
4
3.00
49.00
156.00
0.31
De-or trece anii...
87
63
7
4.00
84.00
186.00
0.45
Departe sunt de tine
135
105
7
4.00
147.00
312.00
0.47
Despărţire
304
202
14
6.00
771.50
1306.50
0.59
Din Berlin la Potsdam
128
99
7
3.67
155.33
294.00
0.53
Din lyra spartă...
51
44
3
2.00
20.50
43.00
0.48
Din noaptea
68
57
3
3.00
2.00
56.00
0.04
Din străinătate
244
168
13
5.00
644.00
1002.00
0.64
Din valurile vremii...
152
104
7
5.00
91.00
309.00
0.29
Dintre sute de catarge
50
41
4
2.67
24.17
60.00
0.40
Doi aştri
40
39
2
1.50
9.25
19.00
0.49
Dorinţa
102
85
7
2.67
177.00
252.00
0.70
Dumnezeu şi om
443
320
15
6.50
1317.25
2233.00
0.59
Ecò
698
442
30
9.00
4514.50
6394.50
0.71
Egipetul
688
452
24
7.50
3646.00
5186.50
0.70
Epigonii
921
565
37
10.00
12258.00
16560.00
0.74
Făt-Frumos din tei
415
281
17
6.50
1426.00
2240.00
0.64
Feciorul de împărat fără de stea
6030
2271
207
23.67
205748.63
233810.00
0.88
Floare-albastră
247
185
12
3.88
983.56
1353.00
0.73
Foaia veştedă (dupa Lenau)
115
99
5
3.00
94.00
196.00
0.48
Freamăt de codru
179
143
7
4.00
204.00
426.00
0.48
Frumoasă şi jună
113
82
8
4.00
151.50
283.50
0.53
Ghazel
331
231
12
6.00
662.50
1265.00
0.52
Glossă
380
191
19
8.00
982.00
1710.00
0.57
Horia
143
119
6
3.00
172.00
295.00
0.58
Iar când voi fi pământ (variantă)
131
106
6
3.00
152.50
262.50
0.58
Împărat şi proletar
1510
857
55
13.50
17424.50
23112.00
0.75
În căutarea Şeherezadei
915
594
34
9.00
7280.50
9784.50
0.74
Înger de pază
91
71
5
2.50
84.50
140.00
0.60
Înger şi demon
876
520
30
11.00
4785.50
7525.50
0.64
Îngere palid...
63
53
3
2.50
11.50
52.00
0.22
Întunericul şi poetul
249
176
11
5.00
505.00
875.00
0.58
Iubind în taină...
87
77
3
2.50
17.50
76.00
0.23
Iubită dulce, o, mă lasă
337
212
11
6.00
502.50
1055.00
0.48
Iubitei
416
240
17
6.50
1210.75
1912.00
0.63
Junii corupţi
458
309
23
7.50
2315.50
3388.00
0.68
Kamadeva
81
70
4
2.50
49.50
103.50
0.48
La Bucovina
184
140
7
4.00
199.50
417.00
0.48
La mijloc de codru...
55
35
11
2.83
129.66
170.00
0.76
La moartea lui Heliade
332
225
14
5.75
893.13
1456.00
0.61
La moartea lui Neamţu
245
173
8
5.00
244.00
602.00
0.41
La moartea principelui Ştirbey
132
98
6
4.00
89.50
242.50
0.37
La mormântul lui Aron Pumnul
150
116
8
4.00
219.50
402.50
0.55
La o artistă (Ca a nopţii poezie)
142
104
7
3.50
172.75
309.00
0.56
La o artistă (Credeam ieri)
219
152
12
4.00
587.50
830.50
0.71
La Quadrat
110
79
7
3.50
129.00
234.00
0.55
La steaua
71
62
3
3.00
2.00
61.00
0.03
Lacul
90
70
6
3.50
80.00
172.50
0.46
Lasă-ţi lumea...
225
167
10
4.50
440.75
747.00
0.59
Lebăda
41
37
3
2.33
10.67
36.00
0.30
Lida
66
57
4
2.00
54.50
84.00
0.65
Locul aripelor
259
173
8
6.00
154.50
602.00
0.26
Luceafărul
1737
820
84
14.00
28125.50
33988.50
0.83
Mai am un singur dor
125
103
4
3.00
48.00
153.00
0.31
Melancolie
274
192
17
5.50
1062.25
1528.00
0.70
Memento mori
9773
3576
423
27.00
702364.00
754325.00
0.93
Miradoniz
636
377
40
8.00
5879.50
7332.00
0.80
Misterele nopţii
155
110
7
4.00
154.50
327.00
0.47
Mitologicale
681
442
34
8.00
5617.50
7276.50
0.77
Mortua est!
491
295
22
7.50
2063.25
3087.00
0.67
Mureşanu
2051
961
88
17.00
33384.00
41760.00
0.80
Murmură glasul mării
119
100
6
3.00
143.50
247.50
0.58
Napoleon
240
169
14
4.00
820.50
1092.00
0.75
Noaptea...
177
128
8
4.50
210.00
444.50
0.47
Nu e steluţă
54
40
3
3.00
2.00
39.00
0.05
Nu mă-nţelegi
384
257
12
7.00
607.00
1408.00
0.43
Nu voi mormânt bogat (variantă)
113
99
5
3.00
94.00
196.00
0.48
Numai poetul
48
40
3
2.50
8.25
39.00
0.21
O arfă pe-un mormânt
157
118
8
4.00
223.50
409.50
0.55
O călărire în zori
346
233
19
5.50
1525.50
2088.00
0.73
O stea prin ceruri
78
65
3
3.00
2.00
64.00
0.03
O, adevăr sublime...
334
226
18
6.00
1307.50
1912.50
0.68
O, mamă…
140
98
7
4.00
136.50
291.00
0.47
Odă în metru antic
103
83
4
3.00
38.00
123.00
0.31
Odin şi poetul
1429
724
56
13.00
15214.50
19882.50
0.77
Ondina (Fantazie)
871
535
35
9.75
6593.00
9078.00
0.73
Oricâte stele...
85
73
3
2.00
35.00
72.00
0.49
Pajul Cupidon...
148
115
9
4.00
273.00
456.00
0.60
Pe aceeaşi ulicioară...
138
103
7
4.00
144.00
306.00
0.47
Pe lângă plopii fără soţ
199
138
9
5.00
258.00
548.00
0.47
Peste vârfuri
47
39
5
2.50
44.50
76.00
0.59
Povestea codrului
220
168
9
3.50
449.25
668.00
0.67
Povestea teiului
390
261
18
6.00
1517.50
2210.00
0.69
Prin nopţi tăcute
48
40
3
2.50
8.25
39.00
0.21
Privesc oraşul furnicar
173
136
10
4.00
391.50
607.50
0.64
Pustnicul
380
270
12
6.50
709.50
1479.50
0.48
Replici
147
73
15
6.43
270.57
504.00
0.54
Revedere
141
102
6
4.50
67.00
252.50
0.27
Rugăciunea unui dac
357
253
14
6.00
975.50
1638.00
0.60
S-a dus amorul
219
152
10
5.00
359.50
679.50
0.53
Sara pe deal
156
128
6
3.67
141.50
317.50
0.45
Scrisoarea I
1282
708
50
10.50
13730.50
17321.50
0.79
Scrisoarea II
696
423
30
11.00
3864.00
6119.00
0.63
Scrisoarea III
2278
1146
110
15.40
53373.70
62402.50
0.86
Scrisoarea IV
1256
699
65
12.33
18018.01
22336.00
0.81
Scrisoarea V
1027
550
46
10.50
9531.00
12352.50
0.77
Se bate miezul nopţii...
45
40
3
2.00
18.50
39.00
0.47
Şi dacă...
53
37
4
3.00
15.00
54.00
0.28
Singurătate
172
134
6
4.00
125.50
332.50
0.38
Somnoroase păsărele...
55
46
4
2.50
31.50
67.50
0.47
Sonete
265
194
8
5.00
275.50
675.50
0.41
Speranţa
245
143
19
5.00
958.00
1278.00
0.75
Steaua vieţii
70
55
4
3.00
24.00
81.00
0.30
Stelele-n cer
91
76
5
3.00
71.00
150.00
0.47
Sus în curtea cea domnească
128
104
5
3.00
99.00
206.00
0.48
Te duci...
84
68
9
3.00
193.00
268.00
0.72
Trecut-au anii
88
74
4
2.50
52.50
109.50
0.48
Unda spumă
59
47
3
3.00
2.00
46.00
0.04
Venere şi Madona
393
247
17
6.33
1269.34
1968.00
0.65
Veneţia (de Gaetano Cerri)
79
71
3
2.50
16.00
70.00
0.23
Viaţa mea fu ziuă
105
86
5
3.00
81.00
170.00
0.48
Vis
177
138
7
3.50
232.25
411.00
0.57
We obtain an image similar to that obtained for A if we compute the corresponding triangle for the frequency spectrum (cf. Popescu et al. 2009: 81ff.). For the sake of differentiation we denote here by W the greatest non-zero class, by g(1) the number of words occurring once and by k the h-point of the spectrum; the vertices of the triangle are Q1(W,1), Q2(1,g(1)), Q3(k,k). The situation is presented in Figure 3.2.6.3. The formulas are identical with those in (3.2.6.1) to (3.2.6.5), mutatis mutandis,⁴ with Bk, Bmax and B taking the place of Ah, Amax and A.
Figure 3.2.6.2. The convergence of A-values with increasing N
We demonstrate the calculation of B for the poem Doi aştri, whose frequencies are shown above. Here we have after transformation g(1) = 38, g(2) = 1, W = 2, hence

\[ k = \frac{38(2) - 1(1)}{2 - 1 + 38 - 1} = \frac{75}{38} = 1.9737 , \]

\[ B_k = \frac{|W g(1) + 2k - k\,(g(1) + W) - 1|}{2} = \frac{|2(38) + 2(1.9737) - 1.9737\,(38 + 2) - 1|}{2} = 0.0003 , \]

\[ B_{max} = \frac{(W - 1)(g(1) - 1)}{2} = \frac{(2 - 1)(38 - 1)}{2} = \frac{37}{2} = 18.5 \]

and

\[ B = B_k / B_{max} = 0.0003 / 18.5 = 0.00001 . \]

(The value 0.0003 arises only from rounding k to four decimals: with the unrounded k = 75/38 the expression between the bars is 75 − 38k = 0, so that Bk and B are exactly 0.)

4 If the greatest rank is still smaller than the smallest frequency, i.e. if the maximum frequency W < g(W), the parameter k cannot be computed. Hence one must transform the whole ranked sequence by subtraction into g*(f) = g(f) – g(W) + 1.
Figure 3.2.6.3. The characteristic triangle of the frequency spectrum
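The spectrum triangle can be sketched in the same way as the rank-frequency triangle. The sketch below rests on assumptions that are not stated in the text: the function names are hypothetical, empty frequency classes between 1 and W are counted as g(x) = 0, the point k is located at the first class whose size no longer exceeds its class index, and the transformation g*(f) used for the starred poems is not implemented.

```python
from collections import Counter


def spectrum(freqs):
    """Spectrum g(1), ..., g(W): number of word-forms with frequency x."""
    counts = Counter(freqs)
    w = max(counts)
    return [counts.get(x, 0) for x in range(1, w + 1)]


def k_point(g):
    """h-point of the spectrum, formula (3.2.6.1) applied to g(x)."""
    for x, gx in enumerate(g, start=1):
        if gx == x:
            return float(x)
        if gx < x:
            if x == 1:
                raise ValueError("g(1) = 0: no word-form occurs once")
            x_i, x_j = x - 1, x
            g_i, g_j = g[x_i - 1], gx
            return (g_i * x_j - g_j * x_i) / (x_j - x_i + g_i - g_j)
    raise ValueError("W < g(W): the transformation of footnote 4 is needed")


def spectrum_triangle(freqs):
    """Return (k, B_k, B_max, B) for a list of word-form frequencies."""
    g = spectrum(freqs)
    w, g1 = len(g), g[0]
    k = k_point(g)
    b_k = abs(w * g1 + 2 * k - k * (g1 + w) - 1) / 2
    b_max = (w - 1) * (g1 - 1) / 2
    if b_max == 0:
        raise ValueError("B_max is zero; B is undefined")
    return k, b_k, b_max, b_k / b_max


# Doi aştri: g(1) = 38, g(2) = 1, W = 2
k, b_k, b_max, b = spectrum_triangle([2] + [1] * 38)
print(round(k, 4), b_max)  # 1.9737 18.5
```

For Doi aştri the triangle degenerates, because with only two classes the point Q3(k,k) lies on the straight line joining Q1 and Q2; computed with the unrounded k, Bk is zero up to floating-point error.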
Again, we compute the value of B for 146 poems and obtain the results in Table 3.2.6.2.
Table 3.2.6.2: Relative size of the characteristic triangle of the spectrum of word-forms in 146 poems by Eminescu (an asterisk indicates the use of the transformation g*(f) = g(f) – g(W) + 1)
Poem title
N
W
g(1)
k
Bk
Bmax
B
Adânca mare…
75
5
55
3.00
50.00
108.0
0.46
Adio
159
9
88
3.50
229.25
348.0
0.66
Ah, mierea buzei tale
228
10
108
4.33
288.17
481.5
0.60
Amicului F.I*
242
5
154
4.20
54.80
306.0
0.18
Amorul unei marmure*
235
9
142
3.80
355.40
564.0
0.63
Andrei Mureşanu
2008
65
763
6.50
22112.50
24384.0
0.91
Atât de frageda…
176
11
110
3.50
396.25
545.0
0.73
Aveam o muză
421
17
226
3.89
1451.89
1800.0
0.81
Basmul ce i l-aş spune ei
398
18
204
4.40
1351.50
1725.5
0.78
Călin (file de poveste)
2299
98
830
6.00
37891.50
40206.5
0.94
Când
126
7
85
2.82
170.18
252.0
0.68
Când amintirile...
97
5
72
2.00
104.50
142.0
0.74
Când crivăţul cu iarna...
708
31
345
3.70
4655.10
5160.0
0.90
Când marea...
114
7
63
3.33
106.67
186.0
0.57
Când priveşti oglinda mărei
101
6
71
2.83
106.25
175.0
0.61
Care-i amorul meu în astă lume
213
7
130
4.33
162.00
387.0
0.42
Ce e amorul?*
110
8
78
2.88
190.75
269.5
0.71
Ce te legeni...
102
8
61
2.82
149.09
210.0
0.71
Ce-ţi doresc eu ţie, dulce Românie
183
8
100
3.00
240.50
346.5
0.69
Cine-i?
129
8
75
2.91
181.68
259.0
0.70
Copii eram noi amândoi
375
24
205
3.63
2048.06
2346.0
0.87
Crăiasa din poveşti
122
7
74
3.00
140.00
219.0
0.64
Criticilor mei*
120
4
61
3.40
14.40
90.0
0.16
Cu mâne zilele-ţi adaogi...
141
6
79
2.95
114.08
195.0
0.59
Cugetările sărmanului Dionis
571
27
324
3.82
3707.23
4199.0
0.88
Cum negustorii din Constantinopol
101
5
75
3.00
70.00
148.0
0.47
Cum oceanu-ntărâtat...
77
4
60
2.60
38.90
88.5
0.44
Dacă treci râul Selenei
356
21
188
3.33
1628.50
1870.0
0.87
De câte ori, iubito...
102
6
71
2.82
106.82
175.0
0.61
De ce nu-mi vii
123
9
59
3.00
166.00
232.0
0.72
De ce să mori tu?
266
13
141
3.88
621.50
840.0
0.74
De-aş avea*
77
6
45
3.25
54.88
110.0
0.50
De-aş muri ori de-ai muri
258
10
133
3.88
391.31
594.0
0.66
Demonism
882
36
349
5.00
5324.00
6090.0
0.87
De-oi adormi (variantă)*
112
4
94
2.67
59.50
139.5
0.43
De-or trece anii...
87
7
53
2.60
109.60
156.0
0.70
Departe sunt de tine
135
7
90
2.88
177.94
267.0
0.67
Despărţire
304
14
163
4.00
790.50
1053.0
0.75
Din Berlin la Potsdam
128
7
83
2.90
162.40
246.0
0.66
Din lyra spartă...
51
3
38
2.60
5.80
37.0
0.16
Din noaptea*
56
3
47
2.33
14.00
46.0
0.30
Din străinătate
244
13
135
3.63
612.38
804.0
0.76
Din valurile vremii...
152
7
80
3.40
135.00
237.0
0.57
Dintre sute de catarge*
43
4
35
2.00
32.50
51.0
0.64
Doi aştri
40
2
38
1.97
0.00
18.5
0.00
Dorinţa
102
7
75
3.33
128.67
222.0
0.58
Dumnezeu şi om
443
15
269
3.60
1509.40
1876.0
0.80
Ecò
698
30
362
4.00
4649.50
5234.5
0.89
Egipetul
688
24
366
4.00
3615.50
4197.5
0.86
Epigonii
921
37
442
4.56
7090.00
7938.0
0.89
Făt-Frumos din tei
415
17
234
4.40
1440.70
1864.0
0.77
Feciorul de împărat fără de stea
6030
207
1567
9.50
153767.00
161298.0
0.95
Floare-albastră
247
12
157
5.69
466.19
858.0
0.54
Foaia veştedă (dupa Lenau)
115
5
88
2.86
89.50
174.0
0.51
Freamăt de codru
179
7
125
3.25
225.75
372.0
0.61
Frumoasă şi jună
113
8
67
4.00
121.50
231.0
0.53
Ghazel
331
12
189
4.33
702.33
1034.0
0.68
Glossă
380
19
141
4.57
977.86
1260.0
0.78
Horia
143
6
103
3.50
121.25
255.0
0.48
Iar când voi fi pământ (variantă)
131
6
90
3.00
128.50
222.5
0.58
Împărat şi proletar
1510
55
706
5.50
17327.25
19035.0
0.91
În căutarea Şeherezadei
915
34
495
5.00
7097.00
8151.0
0.87
Înger de pază
91
5
55
2.86
54.14
108.0
0.50
Înger şi demon
876
30
419
4.00
5390.50
6061.0
0.89
Îngere palid*
57
3
44
2.60
7.00
43.0
0.16
Întunericul şi poetul
249
11
143
3.33
532.67
710.0
0.75
Iubind în taină*
81
3
68
2.60
11.80
67.0
0.18
Iubită dulce, o, mă lasă
337
11
162
4.00
548.50
805.0
0.68
Iubitei
416
17
174
5.40
968.20
1384.0
0.70
Junii corupţi
458
23
275
3.75
2607.00
3014.0
0.87
Kamadeva
81
4
62
2.67
38.17
91.5
0.42
La Bucovina
184
7
112
3.33
196.50
333.0
0.59
La mijloc de codru...
55
11
29
3.25
97.25
140.0
0.69
La moartea lui Heliade
332
14
182
4.20
866.10
1176.5
0.74
La moartea lui Neamţu
245
8
136
3.75
277.25
472.5
0.59
La moartea principelui Ştirbey
132
6
84
4.25
64.50
207.5
0.31
La mormântul lui Aron Pumnul
150
8
96
2.87
237.30
332.5
0.71
La o artistă (Ca a nopţii poezie)
142
7
78
3.00
148.00
231.0
0.64
La o artistă (Credeam ieri)
219
12
112
3.78
441.06
610.5
0.72
La Quadrat
110
7
63
3.25
109.50
186.0
0.59
La steaua*
59
3
54
1.98
26.01
53.0
0.49
Lacul
90
6
60
2.67
94.17
147.5
0.64
Lasă-ţi lumea...
225
10
142
3.75
428.25
634.5
0.67
Lebăda*
37
3
34
2.89
0.00
33.0
0.00
Lida
66
4
50
3.14
17.79
73.5
0.24
Locul aripelor*
223
8
138
4.00
263.50
479.5
0.55
Luceafărul
1737
84
583
5.60
22623.50
24153.0
0.94
Mai am un singur dor
125
4
85
2.93
42.11
126.0
0.33
Melancolie
274
17
157
3.00
1076.00
1248.0
0.86
Memento mori
9773
423
2460
10.92
504564.04
518849.0
0.97
Miradoniz
636
40
296
4.40
5184.70
5752.5
0.90
Misterele nopţii
155
7
87
3.83
127.67
258.0
0.49
Mitologicale
681
34
360
4.50
5237.50
5923.5
0.88
Mortua est!
491
22
227
3.87
2018.97
2373.0
0.85
Mureşanu
2051
88
679
6.29
27471.21
29493.0
0.93
Murmură glasul mării
119
6
89
3.50
103.75
220.0
0.47
Napoleon
240
14
132
3.71
656.07
851.5
0.77
Noaptea...
177
8
107
3.71
217.64
371.0
0.59
Nu e steluţă*
42
3
27
2.67
2.67
26.0
0.10
Nu mă-nţelegi*
330
12
205
4.75
718.88
1122.0
0.64
Nu voi mormânt bogat (variantă)
113
5
92
2.50
110.75
182.0
0.61
Numai poetul*
42
3
33
2.33
9.33
32.0
0.29
O arfă pe-un mormânt
157
8
99
3.40
217.00
343.0
0.63
O călărire în zori
346
19
177
5.13
1183.87
1584.0
0.75
O stea prin ceruri*
66
3
53
2.60
8.80
52.0
0.17
O, adevăr sublime...
334
18
192
2.95
1420.70
1623.5
0.88
O, mamă…
140
7
77
3.50
125.50
228.0
0.55
Odă în metru antic*
93
4
69
2.83
36.92
102.0
0.36
Odin şi poetul
1429
56
525
5.89
12994.67
14410.0
0.90
Ondina (Fantazie)
871
35
427
5.00
6322.00
7242.0
0.87
Oricâte stele...
85
3
62
2.80
4.30
61.0
0.07
Pajul Cupidon...*
124
9
103
2.80
309.00
408.0
0.76
Pe aceeaşi ulicioară...
138
7
86
2.89
169.06
255.0
0.66
Pe lângă plopii fără soţ
199
9
114
4.00
270.50
452.0
0.60
Peste vârfuri
47
5
35
2.00
49.00
68.0
0.72
Povestea codrului
220
9
138
3.57
361.57
548.0
0.66
Povestea teiului
390
18
209
3.80
1453.00
1768.0
0.82
Prin nopţi tăcute*
42
3
33
2.33
9.33
32.0
0.29
Privesc oraşul furnicar
173
10
117
2.92
402.21
522.0
0.77
Pustnicul
380
12
231
4.00
903.50
1265.0
0.71
Replici
147
15
56
3.00
316.00
385.0
0.82
Revedere
141
6
82
3.25
105.75
202.5
0.52
Rugăciunea unui dac
357
14
212
3.50
1091.50
1371.5
0.80
S-a dus amorul
219
10
128
3.00
435.50
571.5
0.76
Sara pe deal
156
6
113
3.00
163.00
280.0
0.58
Scrisoarea I
1272
50
553
5.86
12064.43
13524.0
0.89
Scrisoarea II
696
30
342
4.33
4327.83
4944.5
0.88
Scrisoarea III
2278
110
873
6.00
45071.50
47524.0
0.95
Scrisoarea IV
1256
65
546
6.00
15917.50
17440.0
0.91
Scrisoarea V
1027
46
410
5.00
8294.50
9202.5
0.90
Se bate miezul nopţii...
45
3
36
2.33
10.33
35.0
0.30
Şi dacă...
53
4
27
3.25
6.38
39.0
0.16
Singurătate*
151
6
115
4.13
99.06
285.0
0.35
Somnoroase păsărele...
55
4
40
2.50
27.00
58.5
0.46
Sonete
265
8
160
4.00
307.50
556.5
0.55
Speranţa
245
19
103
3.86
746.57
918.0
0.81
Steaua vieţii
70
4
44
2.86
21.79
64.5
0.34
Stelele-n cer
91
5
67
2.67
73.67
132.0
0.56
Sus în curtea cea Domnească
128
5
89
3.25
72.50
176.0
0.41
Te duci..
84
9
61
2.67
183.33
240.0
0.76
Trecut-au anii
88
4
63
2.78
35.22
93.0
0.38
Unda spumă*
47
3
36
2.50
7.25
35.0
0.21
Venere şi Madona
393
17
183
4.57
1102.43
1456.0
0.76
Veneţia (de Gaetano Cerri)*
73
3
64
2.33
19.67
63.0
0.31
Viaţa mea fu ziuă
105
5
75
3.00
70.00
148.0
0.47
Vis
177
7
114
3.25
205.13
339.0
0.61
As can be seen, both A and B can be interpreted as measures of vocabulary richness. In Doi aştri, where only one word occurs twice, the relative size of the spectrum triangle is 0.00001, i.e. it practically attains its lower boundary 0. Again, the graph of B (see Figure 3.2.6.4) shows that B converges to 1 for large text sizes. However, the dispersion is still too great and the sequence cannot be captured by a power function.
Hoping to obtain a smoother curve, we computed the ratio A/B, but we must state that we obtain instead a funnel converging to 1, as shown in Figure 3.2.6.5. Here we took only 141 poems: two of them, Doi aştri and Lebăda, were omitted because B = 0, and three outliers, Oricâte stele... (A/B = 6.90), Din lyra spartă... (A/B = 3.04) and Lida (A/B = 2.68), were omitted in order to keep the picture more lucid.
Figure 3.2.6.4. The convergence of B-values with increasing N
Figure 3.2.6.5. The convergence of A/B ratio with increasing N
3.2.6.2 Writer's view and the golden section

On the basis of the frequency triangle we can imagine that the writer unconsciously controls the increase of the word frequencies with the help of the h-point. S/he also controls the proportions of synsemantics and autosemantics, which are separated from each other in a fuzzy way by the h-point, and, by means of the k-point characterising the spectrum, s/he controls the representation of the frequency classes. This idea has been proposed several times, cf. Popescu, Altmann (2007); Tuzzi, Popescu, Altmann (2010a,b). The two straight lines joining P3(h,h) with P1(V,1) and P2(1,f(1)), respectively, form an angle α, whose cosine can be computed as

\[
\cos\alpha = \frac{-[(h-1)(f(1)-h) + (h-1)(V-h)]}{[(h-1)^2 + (f(1)-h)^2]^{1/2}\,[(h-1)^2 + (V-h)^2]^{1/2}} . \qquad (3.2.6.6)
\]

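Formula (3.2.6.6) is the cosine of the angle between the two vectors leading from P3(h,h) to P1(V,1) and to P2(1,f(1)). The following sketch computes it in that vector form; the function name is hypothetical, and the clamping of the cosine to the interval [−1, 1] is added here only to guard against floating-point overshoot.

```python
import math


def triangle_angle(v, f1, h):
    """Return (cos alpha, alpha in radians) at the vertex P3(h, h)."""
    to_p1 = (v - h, 1 - h)      # vector from P3(h, h) to P1(V, 1)
    to_p2 = (1 - h, f1 - h)     # vector from P3(h, h) to P2(1, f(1))
    dot = to_p1[0] * to_p2[0] + to_p1[1] * to_p2[1]
    norm = math.hypot(*to_p1) * math.hypot(*to_p2)
    if norm == 0:
        raise ValueError("P3 coincides with P1 or P2; the angle is undefined")
    cos_alpha = max(-1.0, min(1.0, dot / norm))
    return cos_alpha, math.acos(cos_alpha)


# Doi aştri: V = 39, f(1) = 2, h = 1.5
cos_a, alpha = triangle_angle(39, 2, 1.5)
print(round(cos_a, 4), round(alpha, 4))  # -0.7165 2.3695
```

Expanding the dot product gives −(h − 1)[(f(1) − h) + (V − h)], i.e. exactly the numerator of (3.2.6.6), so the vector form and the printed formula agree term by term.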
Considering again the poem Doi aştri, we obtain

\[
\cos\alpha = \frac{-[(1.5-1)(2-1.5) + (1.5-1)(39-1.5)]}{(0.5^2 + 0.5^2)^{1/2}\,(0.5^2 + 37.5^2)^{1/2}} = \frac{-[0.5^2 + 0.5(37.5)]}{0.7071\,(37.5033)} = -0.7165 .
\]

The angle α in radians is obtained from this value as α = arccos(cos α). In our example it is arccos(−0.7165) = 2.3695 radians. The values of α in radians for the rank-frequency distributions of all poems are presented in Table 3.2.6.3.
Table 3.2.6.3: α radians of rank-frequency distribution in 146 poems (ordered according to increasing N)
Poem title
N
V
f(1)
h
cos α
α rad
Doi aştri
40
39
2
1.50 -0.7165
2.3695
Lebăda
41
37
3
2.33 -0.9109
2.7164
Se bate miezul nopţii...
45
40
3
2.00 -0.7255
2.3825
Peste vârfuri
47
39
5
2.50 -0.5493
2.1523
Numai poetul
48
40
3
2.50 -0.9606
2.8598
Prin nopţi tăcute
48
40
3
2.50 -0.9606
2.8598
Dintre sute de catarge
50
41
4
2.67 -0.8073
2.5103
Din lyra spartă...
51
44
3
2.00 -0.7237
2.3800
Şi dacă...
53
37
4
3.00 -0.9191
2.7367
Nu e steluţă
54
40
3
3.00 -0.9985
3.0876
La mijloc de codru...
55
35
11
2.83 -0.2742
1.8486
Somnoroase păsărele...
55
46
4
2.50 -0.7311
2.3907
Unda spumă
59
47
3
3.00 -0.9990
3.0962
Îngere palid...
63
53
3
2.50 -0.9577
2.8495
Lida
66
57
4
2.00 -0.4634
2.0526
Din noaptea
68
57
3
3.00 -0.9993
3.1046
Steaua vieţii
70
55
4
3.00 -0.9110
2.7164
La steaua
71
62
3
3.00 -0.9994
3.1077
Adânca mare…
75
62
5
3.00 -0.7307
2.3901
Cum oceanu-ntărâtat...
77
67
4
2.50 -0.7234
2.3794
O stea prin ceruri
78
65
3
3.00 -0.9995
3.1093
Veneţia (de Gaetano Cerri)
79
71
3
2.50 -0.9554
2.8417
Kamadeva
81
70
4
2.50 -0.7226
2.3784
Te duci...
84
68
9
3.00 -0.3453
1.9233
Oricâte stele...
85
73
3
2.00 -0.7170
2.3703
De-or trece anii...
87
63
7
4.00 -0.7421
2.4070
Iubind în taină...
87
77
3
2.50 -0.9549
2.8400
Trecut-au anii
88
74
4
2.50 -0.7218
2.3772
Lacul
90
70
6
3.50 -0.7332
2.3938
Înger de pază
91
71
5
2.50 -0.5331
2.1331
Stelele-n cer
91
76
5
3.00 -0.7262
2.3836
De-aş avea
93
61
6
4.00 -0.8601
2.6062
Când amintirile...
97
80
5
3.00 -0.7252
2.3822
Când priveşti oglinda mărei
101
82
6
3.00 -0.5756
2.1841
Cum negustorii din Constantinopol
101
84
5
3.00 -0.7243
2.3809
Ce te legeni...
102
76
8
3.50 -0.5155
2.1124
De câte ori, iubito...
102
84
6
2.50 -0.4108
1.9941
Dorinţa
102
85
7
2.67 -0.3778
1.9582
Odă în metru antic
103
83
4
3.00 -0.9053
2.7029
Viaţa mea fu ziuă
105
86
5
3.00 -0.7239
2.3803
La Quadrat
110
79
7
3.50 -0.6078
2.2241
Frumoasă şi jună
113
82
8
4.00 -0.6303
2.2527
Nu voi mormânt bogat (variantă)
113
99
5
3.00 -0.7217
2.3770
Când marea...
114
80
7
4.00 -0.7344
2.3956
Foaia veştedă (dupa Lenau)
115
99
5
3.00 -0.7217
2.3770
Murmură glasul mării
119
100
6
3.00 -0.5717
2.1794
Crăiasa din poveşti
122
94
7
3.00 -0.4668
2.0564
De-oi adormi (variantă)
122
105
4
3.00 -0.9030
2.6976
De ce nu-mi vii
123
82
9
4.00 -0.5471
2.1497
Ce e amorul?
124
94
8
3.00 -0.3917
1.9733
Mai am un singur dor
125
103
4
3.00 -0.9032
2.6979
Când
126
100
7
3.50 -0.6021
2.2169
Din Berlin la Potsdam
128
99
7
3.67 -0.6463
2.2735
Sus în curtea cea domnească
128
104
5
3.00 -0.7210
2.3760
Cine-i?
129
93
8
4.33 -0.7000
2.3462
Criticilor mei
130
91
4
3.00 -0.9044
2.7007
Iar când voi fi pământ (variantă)
131
106
6
3.00 -0.5707
2.1782
La moartea principelui Ştirbey
132
98
6
4.00 -0.8493
2.5855
Departe sunt de tine
135
105
7
4.00 -0.7278
2.3859
Pe aceeaşi ulicioară...
138
103
7
4.00 -0.7282
2.3865
O, mamă…
140
98
7
4.00 -0.7293
2.3881
Cu mâne zilele-ţi adaogi...
141
105
6
3.50 -0.7243
2.3808
Revedere
141
102
6
4.50 -0.9327
2.7726
La o artistă (Ca a nopţii poezie)
142
104
7
3.50 -0.6013
2.2159
Horia
143
119
6
3.00 -0.5690
2.1760
Replici
147
73
15
6.43 -0.6019
2.2167
Pajul Cupidon...
148
115
9
4.00 -0.5375
2.1382
La mormântul lui Aron Pumnul
150
116
8
4.00 -0.6212
2.2411
Din valurile vremii...
152
104
7
5.00 -0.9118
2.7183
Misterele nopţii
155
110
7
4.00 -0.7268
2.3845
Sara pe deal
156
128
6
3.67 -0.7665
2.4442
O arfă pe-un mormânt
157
118
8
4.00 -0.6208
2.2406
Adio
159
111
9
5.00 -0.7333
2.3939
Singurătate
172
134
6
4.00 -0.8446
2.5767
Privesc oraşul furnicar
173
136
10
4.00 -0.4674
2.0572
Atât de fragedă …
176
133
11
3.50 -0.3345
1.9118
Noaptea...
177
128
8
4.50 -0.7269
2.3845
Vis
177
138
7
3.50 -0.5963
2.2096
Freamăt de codru
179
143
7
4.00 -0.7222
2.3778
Ce-ţi doresc eu ţie, dulce Românie
183
127
8
4.67 -0.7598
2.4338
La Bucovina
184
140
7
4.00 -0.7225
2.3782
Pe lângă plopii fără soţ
199
138
9
5.00 -0.7280
2.3863
Care-i amorul meu în astă lume
213
157
7
4.50 -0.8269
2.5443
La o artistă (Credeam ieri)
219
152
12
4.00 -0.3700
1.9498
S-a dus amorul
219
152
10
5.00 -0.6457
2.2727
Povestea codrului
220
168
9
3.50 -0.4276
2.0126
Lasă-ţi lumea...
225
167
10
4.50 -0.5549
2.1591
Ah, mierea buzei tale
228
144
10
5.00 -0.6469
2.2743
Napoleon
240
169
14
4.00 -0.3047
1.8804
Din străinătate
244
168
13
5.00 -0.4690
2.0590
La moartea lui Neamţu
245
173
8
5.00 -0.8141
2.5219
Speranţa
245
143
19
5.00 -0.3025
1.8781
Floare-albastră
247
185
12
3.88 -0.3447
1.9227
Întunericul şi poetul
249
176
11
5.00 -0.5740
2.1822
Amicului F.I.
257
194
5
4.00 -0.9536
2.8356
De-aş muri ori de-ai muri
258
168
10
5.67 -0.7520
2.4220
Locul aripelor
259
173
8
6.00 -0.9392
2.7910
Sonete
265
194
8
5.00 -0.8125
2.5193
Amorul unei marmure
266
184
9
5.00 -0.7227
2.3785
De ce să mori tu?
266
172
13
6.00 -0.6055
2.2212
Melancolie
274
192
17
5.50 -0.3868
1.9679
Despărţire
304
202
14
6.00 -0.5515
2.1549
Ghazel
331
231
12
6.00 -0.6571
2.2878
La moartea lui Heliade
332
225
14
5.75 -0.5176
2.1149
O, adevăr sublime...
334
226
18
6.00 -0.4055
1.9883
Iubită dulce, o, mă lasă
337
212
11
6.00 -0.7241
2.3805
O călărire în zori
346
233
19
5.50 -0.3349
1.9123
Dacă treci râul Selenei
356
232
21
6.00 -0.3371
1.9147
Rugăciunea unui dac
357
253
14
6.00 -0.5471
2.1496
Copii eram noi amândoi
375
250
24
7.00 -0.3560
1.9348
Glossă
380
191
19
8.00 -0.5687
2.1758
Pustnicul
380
270
12
6.50 -0.7217
2.3771
Nu mă-nţelegi
384
257
12
7.00 -0.7834
2.4708
Povestea teiului
390
261
18
6.00 -0.4026
1.9852
Venere şi Madona
393
247
17
6.33 -0.4669
2.0566
Basmul ce i l-aş spune ei
398
262
18
6.00 -0.4026
1.9851
Făt-Frumos din tei
415
281
17
6.50 -0.4817
2.0733
Iubitei
416
240
17
6.50 -0.4847
2.0769
Aveam o muză
421
281
17
7.00 -0.5331
2.1331
Dumnezeu şi om
443
320
15
6.50 -0.5579
2.1626
Junii corupţi
458
309
23
7.50 -0.4065
1.9894
Mortua est!
491
295
22
7.50 -0.4296
2.0148
Cugetările sărmanului Dionis
571
389
27
8.00 -0.3629
1.9422
Miradoniz
636
377
40
8.00 -0.2322
1.8051
Mitologicale
681
442
34
8.00 -0.2755
1.8499
Egipetul
688
452
24
7.50 -0.3801
1.9607
Scrisoarea II
696
423
30
11.00 -0.4871
2.0795
Ecò
698
442
30
9.00 -0.3732
1.9532
Când crivăţul cu iarna...
708
420
31
10.00 -0.4140
1.9976
Ondina (Fantazie)
871
535
35
9.75 -0.3431
1.9210
Înger şi demon
876
520
30
11.00 -0.4830
2.0749
Demonism
882
500
36
10.00 -0.3444
1.9224
În căutarea Şeherezadei
915
594
34
9.00 -0.3178
1.8942
Epigonii
921
565
37
10.00 -0.3256
1.9024
Scrisoarea V
1027
550
46
10.50 -0.2755
1.8499
Scrisoarea IV
1256
699
65
12.33 -0.2265
1.7993
Scrisoarea I
1282
708
50
10.50 -0.2471
1.8204
Odin şi poetul
1429
724
56
13.00 -0.2850
1.8598
Împărat şi proletar
1510
857
55
13.50 -0.3026
1.8782
Luceafărul
1737
820
84
14.00 -0.1984
1.7705
Andrei Mureşanu
2008
1011
65
14.67 -0.2752
1.8496
Mureşanu
2051
961
88
17.00 -0.2363
1.8094
Scrisoarea III
2278
1146
110
15.40 -0.1631
1.7346
Călin (file de poveste)
2299
1123
98
16.33 -0.1981
1.7702
Feciorul de împărat fără de stea
6030
2271
207
23.67 -0.1327
1.7039
Memento mori
9773
3576
423
27.00 -0.0728
1.6437
cos α
α rad
As shown in Figure 3.2.6.6, α (in radians) converges to the golden section, given as

\[
\varphi = \frac{1 + \sqrt{5}}{2} = 1.6180 , \qquad (3.2.6.7)
\]

as was first shown for 176 texts in 20 languages (Popescu, Altmann 2007: 79, Figure 5), and not to π/2 = 1.5708 (see more in Tuzzi, Popescu, Altmann 2010b). Again, we may speak of mechanical self-regulation in cases where the text is too long and the word frequencies cannot be controlled consciously by the author. The fact that one obtains the golden section in different text sorts and different languages points to the existence of an attractor known since antiquity. We obtain a similar result for the frequency spectrum, whose indicator B is the counterpart of the indicator A (cf. Popescu et al. 2009: 81ff.), if we use the triangle Q1(W,1), Q2(1,g(1)), Q3(k,k). The situation is presented in Figure 3.2.6.3. The results for frequency spectra of the same poems are presented in Table 3.2.6.4 and shown graphically in Figure 3.2.6.7.
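How close a tabulated angle lies to the two candidate limits named above can be checked directly. The sketch below only compares distances for single values of α taken from Table 3.2.6.3; it makes no claim about the limit itself, and the helper names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2     # golden section, formula (3.2.6.7)
HALF_PI = math.pi / 2            # the right angle, 1.5708


def distances(alpha):
    """Absolute distances of alpha (in radians) from phi and from pi/2."""
    return abs(alpha - PHI), abs(alpha - HALF_PI)


def closer_to_phi(alpha):
    """True if alpha lies nearer to the golden section than to pi/2."""
    d_phi, d_half_pi = distances(alpha)
    return d_phi < d_half_pi


# Memento mori, the longest poem (N = 9773): alpha = 1.6437
print(closer_to_phi(1.6437))  # True
```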
Figure 3.2.6.6. Convergence of α radians in Eminescu's poems to the golden section for rank frequencies (presented in logarithmic scaling of N)
Table 3.2.6.4: α radians of frequency spectra in 146 poems by Eminescu (an asterisk (*) indicates the use of the transformation g*(f) = g(f) – g(W) + 1)
Poem title
N(N*)
W
g(1)
k
cos α spectra
α rad spectra
Adânca mare…
75
5
55
3.00 -0.7338
2.3946
Adio
159
9
88
3.50 -0.4405
2.0270
Ah, mierea buzei tale
228
10
108
4.33 -0.5345
2.1347
Amicului F.I.*
257(242)
5
154
4.20 -0.9751
2.9180
Amorul unei marmure*
266(235)
9
142
3.80 -0.4918
2.0850
Andrei Mureşanu
2008
65
763
6.50 -0.1008
1.6718
Atât de fragedă …
176
11
110
3.50 -0.3384
1.9160
Aveam o muză
421
17
226
3.89 -0.2279
1.8007
Basmul ce i l-aş spune ei
398
18
204
4.40 -0.2590
1.8328
Călin (file de poveste)
2299
98
830
6.00 -0.0603
1.6312
Când
126
7
85
2.82 -0.4189
2.0030
Când amintirile...
97
5
72
2.00 -0.3297
1.9068
Când crivăţul cu iarna...
708
31
345
3.70 -0.1063
1.6773
Când marea...
114
7
63
3.33 -0.5694
2.1766
Când priveşti oglinda mărei
101
6
71
2.83 -0.5241
2.1225
Care-i amorul meu în astă lume
213
7
130
4.33 -0.7972
2.4934
Ce e amorul?*
124(110)
8
78
2.88 -0.3669
1.9465
Ce te legeni...
102
81
61
2.82 -0.3604
1.9395
Ce-ţi doresc eu ţie, dulce Românie
183
8
100
3.00 -0.3905
1.9719
Cine-i?
129
8
75
2.91 -0.3758
1.9560
Frequency distribution 177
N(N*)
W
g(1)
Copii eram noi amândoi
375
24
205
3.63 -0.1407
1.7120
Crăiasa din poveşti
122
7
74
3.00 -0.4722
2.0626
130(120)
4
61
3.40 -0.9794
2.9383
Poem title
Criticilor mei*
k
cos α
α rad
spectra
spectra
Cu mâne zilele-ţi adaogi...
141
6
79
2.95 -0.5601
2.1653
Cugetările sărmanului Dionis
571
27
324
3.82 -0.1294
1.7006
Cum negustorii din Constantinopol
101
5
75
3.00 -0.7265
2.3840 2.4506
Cum oceanu-ntărâtat...
77
4
60
2.60 -0.7706
Dacă treci râul Selenei
356
21
188
3.33 -0.1435
1.7147
De câte ori, iubito...
102
6
71
2.82 -0.5191
2.1166
De ce nu-mi vii
123
9
59
3.00 -0.3499
1.9282
De ce să mori tu?
266
13
141
3.88 -0.3204
1.8970
93(77)
6
45
3.25 -0.6740
2.3104
De-aş muri ori de-ai muri
258
10
133
3.88 -0.4450
2.0319
Demonism
882
36
349
5.00 -0.1395
1.7107
122(112)
4
94
2.67 -0.7921
2.4851
De-or trece anii...
87
7
53
2.60 -0.3714
1.9513
Departe sunt de tine
135
7
90
2.88 -0.4333
2.0189
Despărţire
304
14
163
4.00 -0.3054
1.8811
Din Berlin la Potsdam
128
7
83
2.90 -0.4419
2.0285
51
3
38
2.60 -0.9801
2.9418
68(56)
3
47
2.33 -0.9074
2.7078
244
13
135
3.63 -0.2888
1.8638
De-aş avea*
De-oi adormi (variantă)*
Din lyra spartă... Din noaptea* Din străinătate Din valurile vremii...
152
7
80
3.40 -0.5805
2.1901
50(43)
4
35
2.00 -0.4741
2.0647
Doi aştri
40
2
38
1.97 -1.0000
3.1416
Dorinţa
102
7
75
3.33 -0.5640
2.1701
Dumnezeu şi om
443
15
269
3.60 -0.2319
1.8048
Ecò
698
30
362
4.00 -0.1229
1.6941
Egipetul
688
24
366
4.00 -0.1565
1.7280
Epigonii
921
37
442
4.56 -0.1170
1.6881 1.8492
Dintre sute de catarge*
Făt-Frumos din tei
415
17
234
4.40 -0.2748
6030
207
1567
9.50 -0.0484
1.6193
Floare-albastră
247
12
157
5.69 -0.6214
2.2414
Foaia veştedă (dupa Lenau)
115
5
88
2.86 -0.6713
2.3067
Freamăt de codru
179
7
125
3.25 -0.5303
2.1297
Frumoasă şi jună
113
8
67
4.00 -0.6374
2.2619
Ghazel
331
12
189
4.33 -0.4152
1.9990
Feciorul de împărat fără de stea
178 The word N(N*)
W
g(1)
Glossă
380
19
141
Horia
143
6
103
3.50 -0.7246
2.3813
Iar când voi fi pământ (variantă)
131
6
90
3.00 -0.5737
2.1818
Împărat şi proletar
1510
55
706
5.50 -0.0969
1.6679
În căutarea Şeherezadei
915
34
495
5.00 -0.1447
1.7160
Înger de pază
91
5
55
2.86 -0.6814
2.3205
Înger şi demon
876
30
419
4.00 -0.1218
1.6929
Îngere palid...*
63(57)
3
44
2.60 -0.9788
2.9352
249
11
143
3.33 -0.3071
1.8829
87(81)
3
68
2.60 -0.9758
2.9211
Poem title
Întunericul şi poetul Iubind în taină...*
k
cos α
α rad
spectra
spectra
4.57 -0.2656
1.8396
Iubită dulce, o, mă lasă
337
11
162
4.00 -0.4113
1.9947
Iubitei
416
17
174
5.40 -0.3789
1.9594
Junii corupţi
458
23
275
3.75 -0.1514
1.7228
Kamadeva
81
4
62
2.67 -0.7981
2.4949
La Bucovina
184
7
112
3.33 -0.5549
2.1590
La mijloc de codru...
55
11
29
3.25 -0.3613
1.9405
La moartea lui Heliade
332
14
182
4.20 -0.3275
1.9044
La moartea lui Neamţu
245
8
136
3.75 -0.5606
2.1659
La moartea principelui Ştirbey
132
6
84
4.25 -0.8990 2.6884
La mormântul lui Aron Pumnul
150
8
96
2.87 -0.3605
La o artistă (Ca a nopţii poezie)
142
7
78
3.00 -0.4709
2.0611
La o artistă (Credeam ieri)
219
12
112
3.78 -0.3443
1.9223 2.1489
1.9396
La Quadrat
110
7
63
3.25 -0.5464
La steaua*
71(59)
3
54
1.98 -0.7074
2.3565
90
6
60
2.67 -0.4730
2.0635
Lacul Lasă-ţi lumea... Lebăda* Lida Locul aripelor* Luceafărul
225
10
142
3.75 -0.4209
2.0052
41(37)
3
34
2.89 -1.0000
3.1416
66
4
50
3.14 -0.9445
2.8068
259(223)
8
138
4.00 -0.6178
2.2367
1737
84
583
5.60 -0.0665
1.6374 2.6580
Mai am un singur dor
125
4
85
2.93 -0.8853
Melancolie
274
17
157
3.00 -0.1543
1.7257
Memento mori
9773
423
2460 10.92 -0.0281
1.5989
Miradoniz
636
40
296
4.40 -0.1067
1.6777
Misterele nopţii
155
7
87
3.83 -0.6918
2.3348
Mitologicale
681
34
360
4.50 -0.1276
1.6987
Mortua est!
491
22
227
3.87 -0.1688
1.7404
Frequency distribution 179
Poem title
N(N*)
W
g(1)
Mureşanu
2051
88
679
6.29 -0.0724
1.6432
Murmură glasul mării
119
6
89
3.50 -0.7275
2.3854
Napoleon
240
14
132
3.71 -0.2756
1.8500
Noaptea... Nu e steluţă* Nu mă-nţelegi* Nu voi mormânt bogat (variantă)
k
cos α
α rad
spectra
spectra
177
8
107
3.71 -0.5571
2.1616
54(42)
3
27
2.67 -0.9917
3.0126
384(330)
12
205
4.75 -0.4760
2.0669 2.1280
113
5
92
2.50 -0.5288
48(42)
3
33
2.33 -0.9130
2.7214
O arfă pe-un mormânt
157
8
99
3.40 -0.4847
2.0768
O călărire în zori
346
19
177
5.13 -0.3079
1.8838 2.9283
Numai poetul*
O stea prin ceruri*
78(66)
3
53
2.60 -0.9773
O, adevăr sublime...
334
18
192
2.95 -0.1387
1.7100
O, mamă…
140
7
77
3.50 -0.6086
2.2250
Odă în metru antic*
103(93)
4
69
2.83 -0.8582
2.6026
Odin şi poetul
1429
56
525
5.89 -0.1065
1.6775
Ondina (Fantazie)
871
35
427
5.00 -0.1416
1.7128
Oricâte stele...
85
3
62
2.80 -0.9968
3.0613
148(124)
9
103
2.80 -0.2960
1.8713 2.0242
Pajul Cupidon...* Pe aceeaşi ulicioară...
138
7
86
2.89 -0.4380
Pe lângă plopii fără soţ
199
9
114
4.00 -0.5377
2.1385
Peste vârfuri
47
5
35
2.00 -0.3448
1.9228
Povestea codrului
220
9
138
3.57 -0.4453
2.0323
Povestea teiului
390
18
209
3.80 -0.2068
1.7791
48(42)
3
33
2.33 -0.9130
2.7214
Privesc oraşul furnicar
173
10
117
2.92 -0.2774
1.8519
Pustnicul
380
12
231
4.00 -0.3635
1.9428
Replici
147
15
56
3.00 -0.2015
1.7737
Prin nopţi tăcute*
Revedere
141
6
82
3.25 -0.6551
2.2851
Rugăciunea unui dac
357
14
212
3.50 -0.2433
1.8165
S-a dus amorul
219
10
128
3.00 -0.2901
1.8651
Sara pe deal
156
6
113
3.00 -0.5697
2.1770
Scrisoarea I
1272
50
553
5.86 -0.1182
1.6893
Scrisoarea II
696
30
342
4.33 -0.1386
1.7098
Scrisoarea III
2278
110
873
6.00 -0.0538
1.6246
Scrisoarea IV
1256
65
546
6.00 -0.0937
1.6646
Scrisoarea V
1027
46
410
5.00 -0.1069
1.6779
45
3
36
2.33 -0.9114
2.7175
Se bate miezul nopţii...
180 The word Poem title
N(N*)
W
g(1)
Şi dacă...
53
4
27
3.25 -0.9743
2.9143
172(151)
6
115
4.13 -0.8716
2.6294
Somnoroase păsărele...
55
4
40
2.50 -0.7348
2.3962
Sonete
265
8
160
4.00 -0.6153
2.2335
Speranţa
245
19
103
3.86 -0.2136
1.7861
Steaua vieţii
70
4
44
2.86 -0.8744
2.6350
Stelele-n cer
91
5
67
2.67 -0.6021
2.2169
Sus în curtea cea domnească
128
5
89
3.25 -0.8052
2.5068
Te duci...
84
9
61
2.67 -0.2820
1.8567
Trecut-au anii
88
4
63
2.78 -0.8404
2.5688
Unda spumă*
59(47)
3
36
2.50 -0.9619
2.8646
393
17
183
4.57 -0.2954
1.8706
79(73)
3
64
2.33 -0.9039
2.6996
Singurătate*
Venere şi Madona Veneţia (de Gaetano Cerri)* Viaţa mea fu ziuă
k
cos α
α rad
spectra
spectra
105
5
75
3.00 -0.7265
2.3840
177
7
114
3.25 -0.5318
2.1315
Vis
Figure 3.2.6.7. Convergence of α radians to the golden section for spectra in Eminescu's poems
The simultaneous convergence of α radians, both for ranks and spectra, to the golden section in Eminescu's poems is presented in Figure 3.2.6.8.
Figure 3.2.6.8. Convergence of α radians, both for ranks and spectra, to the golden section in Eminescu's poems
Finally, as expected on geometrical grounds, there should be a perfect linear relationship between the indicator A and the angle α (in radians) for rank frequencies. With the data in Tables 3.2.6.1 and 3.2.6.3 we have

A = 1.8035 - 0.5593*α (ranks) with R² = 0.9964.

Similarly for spectra, with the data in Tables 3.2.6.2 and 3.2.6.4, we have

B = 1.8561 - 0.5816*α (spectra) with R² = 0.9949.
3.3 Vocabulary richness

One of the most diffuse concepts in textology is that of vocabulary richness. The concept is not well defined; a large number of measures have been proposed, each of which expresses a different property and behaves differently. Even the basic idea behind the concept is not really clear: in all cases it is meant to be measured in terms of textual properties, yet it is often interpreted as a measure of the cognitive qualities of an author. However, there is no hypothesis which could (or would try to) build a bridge between a person's mental word inventory (be it its active or passive version) and the vocabulary of an individual text. There are too many intervening factors between these two aspects (style, genre, target readership, topic and many more) to allow one to be inferred from the other. Moreover, even the aim of the measurement is not always transparent. A direct comparison of texts with respect to vocabulary size is not reasonable if the texts are of different size, which is almost always the case. Short texts inevitably correspond to small vocabularies, a fact which is neglected by some of the proposed richness measures such as the ratio between the number of hapax legomena and the number of running words. Furthermore, some of the proposed measures display bizarre mathematical behaviour under certain conditions (cf. Wimmer, Altmann 1999), a sign of a poor correspondence between the textual reality and the idea of how it should be measured. We are confronted with a very fuzzy concept, and it is to the credit of science that great effort has been invested, and progress made, in quantifying and measuring this phenomenon.

The history of the problem is very rich; most contributions have come from French scholars (Bernet 1988; Brunet 1978; Cossette 1994; Dugast 1980; Guiraud 1954; Honoré 1979; Hubert, Labbé 1994; Ménard 1983; Muller 1964, 1970, 1977; Thoiron 1988; Thoiron, P., Labbé, D., Serant, D. 1988), but research is advancing all over the world. Unfortunately, even an attempt at writing a survey of all models and results would require a separate book.

If word-forms are considered, different results are obtained for the same text in different languages. This is caused by the degree of synthetism of a language. Hence two other views present themselves: the lemmatised text and the frequency spectrum of the text. The lemmatised text poses more problems than the unlemmatised one, since lemmatisation is very probably a procedure which every linguist performs in a different way. One should not forget that lemmatisation is based on a series of rules mostly inherited from national linguistic traditions; it involves the elimination of grammatical categories (but not everywhere!), unsolvable cases (e.g. suppletivism, gender), decisions about the status of prepositions-prefixes-proclitics (or postpositions-suffixes-enclitics), the status of compounds, derivatives, etc. Lemmatisation performed by software is only one of the possible ones, not the ultimate truth! Nevertheless, one can use it for predetermined purposes (cf. Chapter 3.2).

The other way out is to look at the frequency spectrum of word-forms or lemmas. In the spectrum, all hapax legomena are collected in x = 1, words occurring twice in x = 2, etc. Of course, words occurring twice or three times, etc. contribute to the richness, too, but up to which x should we consider the words as richness bearers? Here a very simple and well-definable point is available, viz. the h-point and its family (cf. Popescu et al. 2009; 2010), which will be used here, too. On the other hand, one could weight richness by the inverse of frequency, yielding values in the interval [0, 1]. Here we shall restrict ourselves to the application of some indicators introduced in works by Popescu et al. (2009, 2011). For a short survey of other indicators see Wimmer, Altmann (1999).

As has been shown in Chapter 3.2.6, the h-point separates synsemantics, which usually occur very frequently, from autosemantics, which make up the contents of the text and occur less frequently. Of course, some of the synsemantics occur also beyond the h-point, and some autosemantics forming the very theme of the text (cf. Popescu, Altmann 2011) occur more frequently, below the h-point, but do not contribute significantly to vocabulary richness. Thus the h-point or the k-point can be used for expressing an approximate measure of vocabulary richness. We define

$$F(h) = F(r \le h) = \frac{1}{N}\sum_{r=1}^{[h]} f(r) \tag{3.3.1}$$

as the distribution function of the rank-frequency sequence, i.e. the sum of relative frequencies from r = 1 up to [h] (since h need not be an integer, we take only values up to the integer part, [h], of h). Making a small correction by subtracting half of the square formed by h, divided by the whole area under the frequency sequence (N), we obtain F([h]) − h²/(2N). Finally, since richness is given by those words whose rank is greater than h, we define the richness indicator R1 (cf. Popescu et al. 2009: 33) as

$$R_1 = 1 - \left(F([h]) - \frac{h^2}{2N}\right). \tag{3.3.2}$$

For illustration consider the rank-frequency sequence of Doi aştri shown in Chapter 3.2.6: 2, 1, 1, …, 1, i.e. one word-form occurring twice followed by 38 hapax legomena. Here we have h = 1.5, N = 40, F([1.5]) = F(1) = 2/40 = 0.05, whence R1 = 1 − (0.05 − 1.5²/(2*40)) = 0.9781, a value that could be expected since almost all words occur only once. This is, of course, possible only in short texts. The values of R1 for other poems by Eminescu are presented in Table 3.3.1.

Table 3.3.1: The vocabulary richness R1 in 146 poems by Eminescu.
Poem title | N | h | F(h) | R1
Adânca mare… | 75 | 3.00 | 0.1467 | 0.9133
Adio | 159 | 5.00 | 0.1950 | 0.8836
Ah, mierea buzei tale | 228 | 5.00 | 0.1491 | 0.9057
Amicului F.I. | 257 | 4.00 | 0.0700 | 0.9611
Amorul unei marmure | 266 | 5.00 | 0.1316 | 0.9154
Andrei Mureşanu | 2008 | 14.67 | 0.2276 | 0.8260
Atât de fragedă… | 176 | 3.50 | 0.1136 | 0.9212
Aveam o muză | 421 | 7.00 | 0.1686 | 0.8895
Basmul ce i l-aş spune ei | 398 | 6.00 | 0.1508 | 0.8945
Călin (file de poveste) | 2299 | 16.33 | 0.2545 | 0.8036
Când | 126 | 3.50 | 0.1270 | 0.9216
Când amintirile... | 97 | 3.00 | 0.1237 | 0.9227
Când crivăţul cu iarna... | 708 | 10.00 | 0.2359 | 0.8347
Când marea... | 114 | 4.00 | 0.1842 | 0.8860
Când priveşti oglinda mărei | 101 | 3.00 | 0.1287 | 0.9158
Care-i amorul meu în astă lume | 213 | 4.50 | 0.1033 | 0.9442
Ce e amorul? | 124 | 3.00 | 0.1532 | 0.8831
Ce te legeni... | 102 | 3.50 | 0.1569 | 0.9032
Ce-ţi doresc eu ţie, dulce Românie | 183 | 4.67 | 0.1421 | 0.9174
Cine-i? | 129 | 4.33 | 0.1860 | 0.8867
Copii eram noi amândoi | 375 | 7.00 | 0.1973 | 0.8680
Crăiasa din poveşti | 122 | 3.00 | 0.1066 | 0.9303
Criticilor mei | 130 | 3.00 | 0.0846 | 0.9500
Cu mâne zilele-ţi adaogi... | 141 | 3.50 | 0.0993 | 0.9441
Cugetările sărmanului Dionis | 571 | 8.00 | 0.1716 | 0.8844
Cum negustorii din Constantinopol | 101 | 3.00 | 0.1188 | 0.9257
Cum oceanu-ntărâtat... | 77 | 2.50 | 0.0909 | 0.9497
Dacă treci râul Selenei | 356 | 6.00 | 0.2079 | 0.8427
De câte ori, iubito... | 102 | 2.50 | 0.0882 | 0.9424
De ce nu-mi vii | 123 | 4.00 | 0.1870 | 0.8780
De ce să mori tu? | 266 | 6.00 | 0.1880 | 0.8797
De-aş avea | 93 | 4.00 | 0.2151 | 0.8710
De-aş muri ori de-ai muri | 258 | 5.67 | 0.1667 | 0.8956
Demonism | 882 | 10.00 | 0.2075 | 0.8492
De-oi adormi (variantă) | 122 | 3.00 | 0.0902 | 0.9467
De-or trece anii... | 87 | 4.00 | 0.2414 | 0.8506
Departe sunt de tine | 135 | 4.00 | 0.1556 | 0.9037
Despărţire | 304 | 6.00 | 0.1711 | 0.8882
Din Berlin la Potsdam | 128 | 3.67 | 0.1328 | 0.9197
Din lyra spartă... | 51 | 2.00 | 0.0980 | 0.9412
Din noaptea | 68 | 3.00 | 0.1324 | 0.9338
Din străinătate | 244 | 5.00 | 0.1762 | 0.8750
Din valurile vremii... | 152 | 5.00 | 0.1776 | 0.9046
Dintre sute de catarge | 50 | 2.67 | 0.1600 | 0.9111
Doi aştri | 40 | 1.50 | 0.0500 | 0.9781
Dorinţa | 102 | 2.67 | 0.1078 | 0.9270
Dumnezeu şi om | 443 | 6.50 | 0.1422 | 0.9055
Ecò | 698 | 9.00 | 0.2235 | 0.8345
Egipetul | 688 | 7.50 | 0.1642 | 0.8766
Epigonii | 921 | 10.00 | 0.2020 | 0.8523
Făt-Frumos din tei | 415 | 6.50 | 0.1542 | 0.8967
Feciorul de împărat fără de stea | 6030 | 23.67 | 0.2879 | 0.7585
Floare-albastră | 247 | 3.88 | 0.1296 | 0.9008
Foaia veştedă (după Lenau) | 115 | 3.00 | 0.0957 | 0.9435
Freamăt de codru | 179 | 4.00 | 0.1229 | 0.9218
Frumoasă şi jună | 113 | 4.00 | 0.1770 | 0.8938
Ghazel | 331 | 6.00 | 0.1480 | 0.9063
Glossă | 380 | 8.00 | 0.2421 | 0.8421
Horia | 143 | 3.00 | 0.0839 | 0.9476
Iar când voi fi pământ (variantă) | 131 | 3.00 | 0.0992 | 0.9351
Împărat şi proletar | 1510 | 13.50 | 0.2238 | 0.8365
În căutarea Şeherezadei | 915 | 9.00 | 0.1792 | 0.8650
Înger de pază | 91 | 2.50 | 0.0879 | 0.9464
Înger şi demon | 876 | 11.00 | 0.2055 | 0.8636
Îngere palid... | 63 | 2.50 | 0.0952 | 0.9544
Întunericul şi poetul | 249 | 5.00 | 0.1687 | 0.8815
Iubind în taină... | 87 | 2.50 | 0.0690 | 0.9670
Iubită dulce, o, mă lasă | 337 | 6.00 | 0.1365 | 0.9169
Iubitei | 416 | 6.50 | 0.1466 | 0.9041
Junii corupţi | 458 | 7.50 | 0.2031 | 0.8584
Kamadeva | 81 | 2.50 | 0.0864 | 0.9522
La Bucovina | 184 | 4.00 | 0.1087 | 0.9348
La mijloc de codru... | 55 | 2.50 | 0.3273 | 0.7295
La moartea lui Heliade | 332 | 5.75 | 0.1476 | 0.9022
La moartea lui Neamţu | 245 | 5.00 | 0.1184 | 0.9327
La moartea principelui Ştirbey | 132 | 4.00 | 0.1515 | 0.9091
La mormântul lui Aron Pumnul | 150 | 4.00 | 0.1400 | 0.9133
La o artistă (Ca a nopţii poezie) | 142 | 3.50 | 0.1056 | 0.9375
La o artistă (Credeam ieri) | 219 | 4.00 | 0.1142 | 0.9224
La Quadrat | 110 | 3.50 | 0.1545 | 0.9011
La steaua | 71 | 3.00 | 0.1268 | 0.9366
Lacul | 90 | 3.50 | 0.1667 | 0.9014
Lasă-ţi lumea... | 225 | 4.50 | 0.1289 | 0.9161
Lebăda | 41 | 2.33 | 0.1463 | 0.9201
Lida | 66 | 2.00 | 0.0909 | 0.9394
Locul aripelor | 259 | 6.00 | 0.1622 | 0.9073
Luceafărul | 1737 | 14.00 | 0.2441 | 0.8123
Mai am un singur dor | 125 | 3.00 | 0.0800 | 0.9560
Melancolie | 274 | 5.50 | 0.1642 | 0.8910
Memento mori | 9773 | 27.00 | 0.2718 | 0.7655
Miradoniz | 636 | 8.00 | 0.2154 | 0.8349
Misterele nopţii | 155 | 4.00 | 0.1290 | 0.9226
Mitologicale | 681 | 8.00 | 0.1880 | 0.8590
Mortua est! | 491 | 7.50 | 0.1853 | 0.8719
Mureşanu | 2051 | 17.00 | 0.2574 | 0.8130
Murmură glasul mării | 119 | 3.00 | 0.1008 | 0.9370
Napoleon | 240 | 4.00 | 0.1333 | 0.9000
Noaptea... | 177 | 4.50 | 0.1356 | 0.9216
Nu e steluţă | 54 | 3.00 | 0.1667 | 0.9167
Nu mă-nţelegi | 384 | 7.00 | 0.1641 | 0.8997
Nu voi mormânt bogat (variantă) | 113 | 3.00 | 0.1062 | 0.9336
Numai poetul | 48 | 2.50 | 0.1250 | 0.9401
O arfă pe-un mormânt | 157 | 4.00 | 0.1465 | 0.9045
O călărire în zori | 346 | 5.50 | 0.1416 | 0.9021
O stea prin ceruri | 78 | 3.00 | 0.1154 | 0.9423
O, adevăr sublime... | 334 | 6.00 | 0.2066 | 0.8473
O, mamă… | 140 | 4.00 | 0.1643 | 0.8929
Odă în metru antic | 103 | 3.00 | 0.1068 | 0.9369
Odin şi poetul | 1429 | 13.00 | 0.2344 | 0.8247
Ondina (Fantazie) | 871 | 9.75 | 0.1906 | 0.8640
Oricâte stele... | 85 | 2.00 | 0.0588 | 0.9647
Pajul Cupidon... | 148 | 4.00 | 0.1824 | 0.8716
Pe aceeaşi ulicioară... | 138 | 4.00 | 0.1594 | 0.8986
Pe lângă plopii fără soţ | 199 | 5.00 | 0.1658 | 0.8970
Peste vârfuri | 47 | 2.50 | 0.1702 | 0.8963
Povestea codrului | 220 | 3.50 | 0.0955 | 0.9324
Povestea teiului | 390 | 6.00 | 0.1538 | 0.8923
Prin nopţi tăcute | 48 | 2.50 | 0.1250 | 0.9401
Privesc oraşul furnicar | 173 | 4.00 | 0.1387 | 0.9075
Pustnicul | 380 | 6.50 | 0.1447 | 0.9109
Replici | 147 | 6.43 | 0.4490 | 0.6916
Revedere | 141 | 4.50 | 0.1489 | 0.9229
Rugăciunea unui dac | 357 | 6.00 | 0.1653 | 0.8852
S-a dus amorul | 219 | 5.00 | 0.1689 | 0.8881
Sara pe deal | 156 | 3.67 | 0.1026 | 0.9405
Scrisoarea I | 1282 | 10.50 | 0.1871 | 0.8574
Scrisoarea II | 696 | 11.00 | 0.2399 | 0.8470
Scrisoarea III | 2278 | 15.40 | 0.2629 | 0.7891
Scrisoarea IV | 1256 | 12.33 | 0.2205 | 0.8400
Scrisoarea V | 1027 | 10.50 | 0.2269 | 0.8268
Se bate miezul nopţii... | 45 | 2.00 | 0.1111 | 0.9333
Şi dacă... | 53 | 3.00 | 0.1887 | 0.8962
Singurătate | 172 | 4.00 | 0.1221 | 0.9244
Somnoroase păsărele... | 55 | 2.50 | 0.1273 | 0.9295
Sonete | 265 | 5.00 | 0.1094 | 0.9377
Speranţa | 245 | 5.00 | 0.2082 | 0.8429
Steaua vieţii | 70 | 3.00 | 0.1429 | 0.9214
Stelele-n cer | 91 | 3.00 | 0.1319 | 0.9176
Sus în curtea cea domnească | 128 | 3.00 | 0.0938 | 0.9414
Te duci... | 84 | 3.00 | 0.1786 | 0.8750
Trecut-au anii | 88 | 2.50 | 0.0795 | 0.9560
Unda spumă | 59 | 3.00 | 0.1525 | 0.9237
Venere şi Madona | 393 | 6.33 | 0.1654 | 0.8856
Veneţia (de Gaetano Cerri) | 79 | 2.50 | 0.0759 | 0.9636
Viaţa mea fu ziuă | 105 | 3.00 | 0.1143 | 0.9286
Vis | 177 | 3.50 | 0.0960 | 0.9386
Figure 3.3.1. Decrease of vocabulary richness R1 with text size
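The computation of R1 according to (3.3.2) can be sketched as follows. The function name `richness_r1` is illustrative only, and the h-point is taken as a given input, since its determination is described elsewhere (Chapter 3.2.6) and is not reproduced here.

```python
import math

def richness_r1(freqs, h):
    """R1 = 1 - (F([h]) - h^2 / (2N)) for a rank-frequency sequence.

    freqs: frequencies ordered by rank (rank 1 first).
    h: the h-point of the sequence, taken here as a given input.
    """
    n = sum(freqs)                              # text size N
    f_h = sum(freqs[:math.floor(h)]) / n        # F([h]): relative frequency of ranks 1..[h]
    return 1 - (f_h - h**2 / (2 * n))

doi_astri = [2] + [1] * 38                      # rank-frequency sequence of "Doi aştri", N = 40
print(round(richness_r1(doi_astri, h=1.5), 4))  # 0.9781
```

For Doi aştri (h = 1.5) this reproduces the value R1 = 0.9781 given in Table 3.3.1.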
The short and the long poems display significant differences; most short poems have very high richness while long poems have significantly less (see Figure 3.3.1). No historical trend is observed. Since this indicator is based on F(h), one can obtain its variance as

$$\mathrm{Var}(F(h)) = \frac{F(h)\,(1 - F(h))}{N} \tag{3.3.3}$$

because the subtracted value h²/(2N) can be considered constant for the given text, so that its variance is zero. Thus a comparison of the richness values of two texts is possible. Consider the text with the greatest richness, Doi aştri (R1 = 0.9781), and that with the smallest richness, Replici (R1 = 0.6916). All necessary values are given in Table 3.3.1, hence the asymptotic test can be performed using the normal distribution,

$$z = \frac{|R_{1,1} - R_{1,2}|}{\sqrt{\dfrac{F(h_1)\,(1 - F(h_1))}{N_1} + \dfrac{F(h_2)\,(1 - F(h_2))}{N_2}}} \tag{3.3.4}$$

yielding in our case z = 0.57 which is not significant. However, comparing Doi aştri with Împărat şi proletar (R1 = 0.8365) we obtain z = 3.92 which is highly significant. Thus even smaller differences can be more significant than great differences; it depends on poem size.

The same procedure can be performed for the spectrum, but in a spectrum the hapax legomena and the words with few repetitions are placed at the beginning of the distribution; we define (cf. Popescu et al. 2009: 38)

$$R_2 = G([k]) - \frac{k^2}{2V}. \tag{3.3.5}$$

This indicator has the advantage of taking into account not only the hapax legomena but also other frequency classes. However, the number of classes to be considered becomes evident only after computing the indicator k. When we compare two texts, we do not compare identically defined proportions but textually determined ones. Calculating this measure again for the poem Doi aştri we obtain k = 1.9737, V = 39 and G([1.9737]) = G(1) = 38/39 = 0.9744, from which we obtain R2(Doi aştri) = 0.9744 − 1.9737²/(2*39) = 0.9244, which is, again, a very high richness value. The values of richness R2 for all 146 poems are presented in Table 3.3.2 and in Figure 3.3.2.
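A corresponding sketch for R2 according to (3.3.5) is given below. As before, the function name `richness_r2` and the dictionary representation of the spectrum are assumptions made for illustration, and the k-point is taken as given.

```python
import math

def richness_r2(spectrum, k):
    """R2 = G([k]) - k^2 / (2V) for a frequency spectrum.

    spectrum: dict mapping frequency class x to g(x), the number of
              word types occurring exactly x times.
    k: the k-point of the spectrum, taken here as a given input.
    """
    v = sum(spectrum.values())                  # vocabulary size V
    g_k = sum(g for x, g in spectrum.items() if x <= math.floor(k)) / v   # G([k])
    return g_k - k**2 / (2 * v)

doi_astri = {1: 38, 2: 1}                       # 38 hapax legomena, one type occurring twice
print(round(richness_r2(doi_astri, k=1.9737), 4))   # 0.9244
```

The result agrees with the value R2 = 0.9244 computed above for Doi aştri.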
Table 3.3.2: The vocabulary richness R2 in 146 poems by Eminescu (asterisk corresponds to the transformation g*(x) = g(x) – g(W) + 1, where W = the greatest non-zero class, see Chapter 3.2.6)

Poem title | N | V | g(1) | k | G(k) | R2
Adânca mare… | 75 | 62 | 55 | 3.0000 | 0.9839 | 0.9113
Adio | 159 | 111 | 88 | 3.5000 | 0.9550 | 0.8998
Ah, mierea buzei tale | 228 | 144 | 108 | 4.3333 | 0.9583 | 0.8931
Amicului F.I.* | 242 | 189 | 154 | 4.2000 | 0.9947 | 0.9480
Amorul unei marmure* | 235 | 177 | 142 | 3.8000 | 0.9718 | 0.9310
Andrei Mureşanu | 2008 | 1011 | 763 | 6.5000 | 0.9624 | 0.9415
Atât de fragedă… | 176 | 133 | 110 | 3.5000 | 0.9774 | 0.9314
Aveam o muză | 421 | 281 | 226 | 3.8889 | 0.9537 | 0.9268
Basmul ce i l-aş spune ei | 398 | 262 | 204 | 4.4000 | 0.9625 | 0.9325
Călin (file de poveste) | 2299 | 1123 | 830 | 6.0000 | 0.9697 | 0.9527
Când | 126 | 100 | 85 | 2.8182 | 0.9600 | 0.9203
Când amintirile... | 97 | 80 | 72 | 2.0000 | 0.9250 | 0.9000
Când crivăţul cu iarna... | 708 | 420 | 345 | 3.7000 | 0.9429 | 0.9266
Când marea... | 114 | 80 | 63 | 3.3333 | 0.9500 | 0.8806
Când priveşti oglinda mărei | 101 | 82 | 71 | 2.8333 | 0.9512 | 0.9023
Care-i amorul meu în astă lume | 213 | 157 | 130 | 4.3333 | 0.9745 | 0.9147
Ce e amorul?* | 110 | 90 | 78 | 2.8750 | 0.9667 | 0.9207
Ce te legeni... | 102 | 76 | 61 | 2.8182 | 0.9474 | 0.8951
Ce-ţi doresc eu ţie, dulce Românie | 183 | 127 | 100 | 3.0000 | 0.9370 | 0.9016
Cine-i? | 129 | 93 | 75 | 2.9091 | 0.9355 | 0.8900
Copii eram noi amândoi | 375 | 250 | 205 | 3.6250 | 0.9560 | 0.9297
Crăiasa din poveşti | 122 | 94 | 74 | 3.0000 | 0.9894 | 0.9415
Criticilor mei* | 120 | 87 | 61 | 3.4000 | 0.9885 | 0.9221
Cu mâne zilele-ţi adaogi... | 141 | 105 | 79 | 2.9500 | 0.9524 | 0.9109
Cugetările sărmanului Dionis | 571 | 389 | 324 | 3.8182 | 0.9614 | 0.9427
Cum negustorii din Constantinopol | 101 | 84 | 75 | 3.0000 | 0.9762 | 0.9226
Cum oceanu-ntărâtat... | 77 | 67 | 60 | 2.6000 | 0.9701 | 0.9197
Dacă treci râul Selenei | 356 | 232 | 188 | 3.3333 | 0.9526 | 0.9286
De câte ori, iubito... | 102 | 84 | 71 | 2.8182 | 0.9762 | 0.9289
De ce nu-mi vii | 123 | 82 | 59 | 3.0000 | 0.9512 | 0.8963
De ce să mori tu? | 266 | 172 | 141 | 3.8750 | 0.9302 | 0.8866
De-aş avea* | 77 | 56 | 45 | 3.2500 | 0.9643 | 0.8700
De-aş muri ori de-ai muri | 258 | 168 | 133 | 3.8750 | 0.9405 | 0.8958
Demonism | 882 | 460 | 349 | 5.0000 | 0.9565 | 0.9293
De-oi adormi (variantă)* | 112 | 101 | 94 | 2.6667 | 0.9703 | 0.9351
De-or trece anii... | 87 | 63 | 53 | 2.6000 | 0.9206 | 0.8670
Departe sunt de tine | 135 | 105 | 90 | 2.8750 | 0.9429 | 0.9035
Despărţire | 304 | 202 | 163 | 4.0000 | 0.9554 | 0.9158
Din Berlin la Potsdam | 128 | 99 | 83 | 2.9000 | 0.9495 | 0.9070
Din lyra spartă... | 51 | 44 | 38 | 2.6000 | 0.9773 | 0.9005
Din noaptea* | 56 | 51 | 47 | 2.3333 | 0.9804 | 0.9270
Din străinătate | 244 | 168 | 135 | 3.6250 | 0.9643 | 0.9252
Din valurile vremii... | 152 | 104 | 80 | 3.4000 | 0.9423 | 0.8867
Dintre sute de catarge* | 43 | 38 | 35 | 2.0000 | 0.9737 | 0.9211
Doi aştri | 40 | 39 | 38 | 1.9737 | 0.9744 | 0.9244
Dorinţa | 102 | 85 | 75 | 3.3333 | 0.9765 | 0.9111
Dumnezeu şi om | 443 | 320 | 269 | 3.6000 | 0.9656 | 0.9454
Ecò | 698 | 442 | 362 | 4.0000 | 0.9661 | 0.9480
Egipetul | 688 | 452 | 366 | 4.0000 | 0.9690 | 0.9513
Epigonii | 921 | 565 | 442 | 4.5556 | 0.9699 | 0.9515
Făt-Frumos din tei | 415 | 281 | 234 | 4.4000 | 0.9644 | 0.9300
Feciorul de împărat fără de stea | 6030 | 2271 | 1567 | 9.5000 | 0.9701 | 0.9502
Floare-albastră | 247 | 185 | 157 | 5.6923 | 0.9838 | 0.8962
Foaia veştedă (după Lenau) | 115 | 99 | 88 | 2.8571 | 0.9697 | 0.9285
Freamăt de codru | 179 | 143 | 125 | 3.2500 | 0.9720 | 0.9351
Frumoasă şi jună | 113 | 82 | 67 | 4.0000 | 0.9878 | 0.8902
Ghazel | 331 | 231 | 189 | 4.3333 | 0.9697 | 0.9291
Glossă | 380 | 191 | 141 | 4.5714 | 0.9215 | 0.8668
Horia | 143 | 119 | 103 | 3.5000 | 0.9916 | 0.9401
Iar când voi fi pământ (variantă) | 131 | 106 | 90 | 3.0000 | 0.9811 | 0.9387
Împărat şi proletar | 1510 | 857 | 706 | 5.5000 | 0.9627 | 0.9450
În căutarea Şeherezadei | 915 | 594 | 495 | 5.0000 | 0.9747 | 0.9537
Înger de pază | 91 | 71 | 55 | 2.8571 | 0.9718 | 0.9143
Înger şi demon | 876 | 520 | 419 | 4.0000 | 0.9538 | 0.9385
Îngere palid…* | 57 | 50 | 44 | 2.6000 | 0.9800 | 0.9124
Întunericul şi poetul | 249 | 176 | 143 | 3.3333 | 0.9602 | 0.9287
Iubind în taină…* | 81 | 74 | 68 | 2.6000 | 0.9865 | 0.9408
Iubită dulce, o, mă lasă | 227 | 212 | 162 | 4.0000 | 0.9387 | 0.9009
Iubitei | 416 | 240 | 174 | 5.4000 | 0.9625 | 0.9018
Junii corupţi | 458 | 309 | 275 | 3.7500 | 0.9482 | 0.9255
Kamadeva | 81 | 70 | 62 | 2.6667 | 0.9714 | 0.9206
La Bucovina | 184 | 140 | 112 | 3.3333 | 0.9714 | 0.9317
La mijloc de codru... | 55 | 35 | 29 | 3.2500 | 0.9429 | 0.7920
La moartea lui Heliade | 332 | 225 | 182 | 4.2000 | 0.9733 | 0.9341
La moartea lui Neamţu | 245 | 173 | 136 | 3.7500 | 0.9538 | 0.9131
La moartea principelui Ştirbey | 132 | 98 | 84 | 4.2500 | 0.9694 | 0.8772
La mormântul lui Aron Pumnul | 150 | 116 | 96 | 2.8667 | 0.9569 | 0.9215
La o artistă (Ca a nopţii poezie) | 142 | 104 | 78 | 3.0000 | 0.9712 | 0.9279
La o artistă (Credeam ieri) | 219 | 152 | 112 | 3.7778 | 0.9737 | 0.9267
La Quadrat | 110 | 79 | 63 | 3.2500 | 0.9620 | 0.8952
La steaua* | 59 | 56 | 54 | 1.9815 | 0.9643 | 0.9292
Lacul | 90 | 70 | 60 | 2.6667 | 0.9429 | 0.8921
Lasă-ţi lumea... | 225 | 167 | 142 | 3.7500 | 0.9581 | 0.9160
Lebăda* | 37 | 35 | 34 | 2.8857 | 0.9714 | 0.8525
Lida | 66 | 57 | 50 | 3.1429 | 0.9825 | 0.8958
Locul aripelor* | 223 | 165 | 138 | 4.0000 | 0.9758 | 0.9273
Luceafărul | 1737 | 820 | 583 | 5.6000 | 0.9500 | 0.9309
Mai am un singur dor | 125 | 103 | 85 | 2.9286 | 0.9709 | 0.9292
Melancolie | 274 | 192 | 157 | 3.0000 | 0.9531 | 0.9297
Memento mori | 9773 | 3576 | 2460 | 10.9167 | 0.9715 | 0.9548
Miradoniz | 636 | 377 | 296 | 4.4000 | 0.9602 | 0.9345
Misterele nopţii | 155 | 110 | 87 | 3.8333 | 0.9545 | 0.8878
Mitologicale | 681 | 442 | 360 | 4.5000 | 0.9661 | 0.9432
Mortua est! | 491 | 295 | 227 | 3.8667 | 0.9424 | 0.9170
Mureşanu | 2051 | 961 | 679 | 6.2857 | 0.9646 | 0.9441
Murmură glasul mării | 119 | 100 | 89 | 3.5000 | 0.9900 | 0.9287
Napoleon | 240 | 169 | 132 | 3.7143 | 0.9704 | 0.9296
Noaptea... | 177 | 128 | 107 | 3.7143 | 0.9531 | 0.8992
Nu e steluţă* | 42 | 34 | 27 | 2.6667 | 0.9706 | 0.8660
Nu mă-nţelegi* | 330 | 248 | 205 | 4.7500 | 0.9798 | 0.9343
Nu voi mormânt bogat (variantă) | 113 | 99 | 92 | 2.5000 | 0.9596 | 0.9280
Numai poetul* | 42 | 37 | 33 | 2.3333 | 0.9730 | 0.8994
O arfă pe-un mormânt | 157 | 118 | 99 | 3.4000 | 0.9661 | 0.9171
O călărire în zori | 346 | 233 | 177 | 5.1250 | 0.9785 | 0.9222
O stea prin ceruri* | 66 | 59 | 53 | 2.6000 | 0.9831 | 0.9258
O, adevăr sublime... | 334 | 226 | 192 | 2.9500 | 0.9425 | 0.9232
O, mamă… | 140 | 98 | 77 | 3.5000 | 0.9592 | 0.8967
Odă în metru antic* | 93 | 79 | 69 | 2.8333 | 0.9620 | 0.9112
Odin şi poetul | 1429 | 724 | 525 | 5.8889 | 0.9627 | 0.9388
Ondina (Fantazie) | 871 | 535 | 427 | 5.0000 | 0.9701 | 0.9467
Oricâte stele... | 85 | 73 | 62 | 2.8000 | 0.9863 | 0.9326
Pajul Cupidon…* | 124 | 109 | 103 | 2.8000 | 0.9817 | 0.9457
Pe aceeaşi ulicioară... | 138 | 103 | 86 | 2.8889 | 0.9320 | 0.8915
Pe lângă plopii fără soţ | 199 | 138 | 114 | 4.0000 | 0.9638 | 0.9058
Peste vârfuri | 47 | 39 | 35 | 2.0000 | 0.9487 | 0.8974
Povestea codrului | 220 | 168 | 138 | 3.5714 | 0.9821 | 0.9442
Povestea teiului | 390 | 261 | 209 | 3.8000 | 0.9464 | 0.9187
Prin nopţi tăcute* | 42 | 37 | 33 | 2.3333 | 0.9730 | 0.8994
Privesc oraşul furnicar | 173 | 136 | 117 | 2.9167 | 0.9559 | 0.9246
Pustnicul | 380 | 270 | 231 | 4.0000 | 0.9667 | 0.9370
Replici | 147 | 73 | 56 | 3.0000 | 0.9178 | 0.8562
Revedere | 141 | 102 | 82 | 3.2500 | 0.9510 | 0.8992
Rugăciunea unui dac | 357 | 253 | 212 | 3.5000 | 0.9565 | 0.9323
S-a dus amorul | 219 | 152 | 128 | 3.0000 | 0.9342 | 0.9046
Sara pe deal | 156 | 128 | 113 | 3.0000 | 0.9766 | 0.9414
Scrisoarea I | 1272 | 707 | 553 | 5.8571 | 0.9576 | 0.9333
Scrisoarea II | 696 | 423 | 342 | 4.3333 | 0.9574 | 0.9353
Scrisoarea III | 2278 | 1146 | 873 | 6.0000 | 0.9686 | 0.9529
Scrisoarea IV | 1256 | 699 | 546 | 6.0000 | 0.9700 | 0.9442
Scrisoarea V | 1027 | 550 | 410 | 5.0000 | 0.9618 | 0.9391
Se bate miezul nopţii... | 45 | 40 | 36 | 2.3333 | 0.9750 | 0.9069
Şi dacă... | 53 | 37 | 27 | 3.2500 | 0.9730 | 0.8302
Singurătate* | 151 | 128 | 115 | 4.1250 | 0.9922 | 0.9257
Somnoroase păsărele... | 55 | 46 | 40 | 2.5000 | 0.9565 | 0.8886
Sonete | 265 | 194 | 160 | 4.0000 | 0.9691 | 0.9278
Speranţa | 245 | 143 | 103 | 3.8571 | 0.9301 | 0.8781
Steaua vieţii | 70 | 55 | 44 | 2.8571 | 0.9455 | 0.8712
Stelele-n cer | 91 | 76 | 67 | 2.6667 | 0.9605 | 0.9137
Sus în curtea cea domnească | 128 | 104 | 89 | 3.2500 | 0.9808 | 0.9300
Te duci... | 84 | 68 | 61 | 2.6667 | 0.9559 | 0.9036
Trecut-au anii | 88 | 74 | 63 | 2.7778 | 0.9730 | 0.9208
Unda spumă* | 47 | 41 | 36 | 2.5000 | 0.9756 | 0.8994
Venere şi Madona | 393 | 247 | 183 | 4.5714 | 0.9676 | 0.9253
Veneţia (de Gaetano Cerri)* | 73 | 68 | 64 | 2.3333 | 0.9853 | 0.9453
Viaţa mea fu ziuă | 105 | 86 | 75 | 3.0000 | 0.9767 | 0.9244
Vis | 177 | 138 | 114 | 3.2500 | 0.9783 | 0.9400
As can be seen in Figure 3.3.2, R2 increases with increasing text size but the dispersion is too great to give a clear result. A possible remedy would be the scrutinising of individual poems which can be considered outliers. To this end, a confidence belt around the power curve – expressing the dependence – could show the positioning of individual poems. However, in this way we would obtain only a new classification.
Figure 3.3.2. Increase of vocabulary richness R2 with text size
Again, the variance of R2 is simply Var(R2) = G(k)(1 − G(k))/V. For the poem Veneţia, for example, we find Var(R2) = 0.9853(1 − 0.9853)/68 = 0.0002, and the comparison of its richness with that of Doi aştri yields

$$z = \frac{|0.9453 - 0.9244|}{\sqrt{0.0006 + 0.0002}} = 0.74,$$

i.e. a non-significant difference. Though R2 seems to increase with increasing poem size, there are short poems with great richness (e.g. Veneţia) and a long poem (Glossă) with small richness. Roughly speaking, there is some kind of trend, but it cannot be established for a single author because of the great dispersion.

If we correlate R1 with R2, we do not find any relation; the points rather form a cloud. Hence, we can conclude that though both views are mere transformations of one another, they are relatively independent richness indicators and can be used for text characterisation. The outliers must be studied individually.

Another possibility is to take the mean of R1 and R2, i.e. Rm = (R1 + R2)/2, which is independent of text size but depends on the language type. Computing the mean richness indicator for 176 texts in 20 languages (cf. Popescu et al. 2009), a horizontal cloud with a mean of ca. 0.86 is obtained. Differences within one and the same language are signs of stylistic or text-sort differences. For Eminescu's poems, we obtain the points presented in Figure 3.3.3. The mean is ca. 0.91. The outliers and the extreme values are research objects for literary studies. The variance of this indicator is Var(Rm) = (Var(R1) + Var(R2))/4, so that differences can easily be tested.
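The asymptotic comparison of two richness values, for R1 as well as for R2, can be sketched as follows. The function `richness_z` and its argument names are hypothetical; the numerical inputs are taken from Tables 3.3.1 and 3.3.2.

```python
import math

def richness_z(r_a, p_a, n_a, r_b, p_b, n_b):
    """Asymptotic z statistic for the difference of two richness values.

    For R1, p is F(h) and n is the text size N;
    for R2, p is G(k) and n is the vocabulary size V.
    """
    var_a = p_a * (1 - p_a) / n_a
    var_b = p_b * (1 - p_b) / n_b
    return abs(r_a - r_b) / math.sqrt(var_a + var_b)

# R1: Doi aştri vs. Împărat şi proletar (Table 3.3.1)
z_r1 = richness_z(0.9781, 0.0500, 40, 0.8365, 0.2238, 1510)
# R2: Veneţia vs. Doi aştri (Table 3.3.2)
z_r2 = richness_z(0.9453, 0.9853, 68, 0.9244, 0.9744, 39)
print(round(z_r1, 2), z_r2 < 1.96)
```

The first comparison yields z = 3.92. The second remains well below the 1.96 threshold; because the sketch uses unrounded variances, its value comes out slightly smaller than the 0.74 obtained above from variances rounded to four decimals.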
Figure 3.3.3. Mean vocabulary richness in Eminescu's poems
As a general remark, the outlier poems “La mijloc de codru…” and “Replici” appear in all indicators considered in the present book. The cause is the excessive repetition of some words: “şi de” (and of) in the former, “eu sunt” (I am) and “tu eşti” (you are) in the latter, as the excerpts below show.

La mijloc de codru...
...
Legănându-se din unde,
În adâncu-i se pătrunde
Şi de lună şi de soare
Şi de păsări călătoare,
Şi de lună şi de stele
Şi de zbor de rândurele
Şi de chipul dragei mele.

Replici

Poetul
Tu eşti o undă, eu sunt o zare,
Eu sunt un ţărmur, tu eşti o mare,
Tu eşti o noapte, eu sunt o stea
Iubita mea.

Iubita
Tu eşti o ziuă, eu sunt un soare,
Eu sunt un flutur, tu eşti o floare,
Eu sunt un templu, tu eşti un zeu
Iubitul meu.
Tu eşti un rege, eu sunt regină,
Eu sunt un caos, tu o lumină,
Eu sunt o arpă muiată-n vânt
Tu eşti un cânt.

Poetul
Tu eşti o frunte, eu sunt o stemă,
Eu sunt un geniu, tu o problemă,
Privesc în ochii-ţi să te ghicesc
Şi te iubesc!
…
3.4 Word length The first linguistic invstigations date back at least 150 years. Usually, Augustus de Morgan is quoted who in 1851 mentioned in a private letter word length as an indicator of style (cf. Lord 1958). Today, the problem of word length is almost a separate discipline represented in many states; there are projects and bibliographies (cf. http://wwwuser.gwdg.de/~kbest/), congresses, omnibus volumes with a great amount of literature (cf. Best 2001; Grzybek 2006) and chapters in general works (cf. Best 2005). After many years of trials there is no unification of the field and probably there will be none because languages display a variety of word length phenomena (caused by boundary conditions), the individual texts may display idiosyncrasies, there is a great number of definitions of the concept word and numerous ways of length measurements, and last but not least, every linguist knows only a limited number of languages, living and dead ones, for which s/he can perform tests. The time of a great unification did not come as yet. We restrict ourselves to the study of word length in Eminescu's poems, try to characterise it and try to find a unique model, as far as possible. The analysis will be performed with the following provisos: clitics are parts of the phonological word; apostrophes will be eliminated, and units joined with a hyphen will be considered one word. These “decisions” are no rules and cannot be followed in every language. In German, e.g. the term “Natur- und Kulturschutz” is, as a matter of fact, a special kind of compound and stating its word length is a pure convention. Cases of similar kinds exist in every language, they evoke initial difficulties and hinder comparison and unification. The frequencies of word lengths in 100 poems by Eminescu are presented in Table 3.4.1. We present them in order to enable other researchers to work with the data. Very short poems are left out because the frequencies are not reliable enough to test a model. 
The table contains the lengths x = 1, 2, …, and each row gives the frequencies of these lengths in an individual poem. Word length was measured in terms of the number of syllables, since syllables are the immediate phonetic components of words. Another measurement, in terms of the number of morphemes, is a rather risky enterprise and can lead to vehement discussions; besides, it would rather be a measurement of morphemic complexity. We do not need to determine the syllable boundaries when we count the number of syllables in a word. It is sufficient to identify the elements from which we can tell how many syllables a word has, i.e. the nuclei, which are in general the vowels or a syllabic consonant such as [l, r]. Therefore, this task can be performed by means of a programme and some rules for correction (diphthongs etc.).

Word length management in poems differs from that in prose. If the poem is constructed on rhythmic principles, e.g. isosyllabism, hexameter, etc., then the word-length distribution has a greater excess than in prose texts or in poetic texts which are not based on rhythm or rhyme, because a specific length dominates. However, the poet can develop a specific technique of his own. In general, one can suppose that the longer the poem, the more deviations from general models may occur, because the poet writes it with pauses and makes additional corrections; on the other hand, additional corrections may lead to unifications. For a recent review of word length see Popescu et al.⁵
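The nucleus-counting idea described above can be sketched in a few lines of Python. This is only a crude first pass under a stated assumption, not the programme used for Table 3.4.1: every maximal run of vowel letters is treated as one nucleus. The vowel inventory `VOWELS` and the function names are illustrative choices. Such a count merges vowels in hiatus and counts a non-syllabic final -i as a nucleus, so the correction rules mentioned in the text are still required.

```python
import re
import unicodedata

# Assumption: Romanian vowel letters; one maximal run of vowels = one nucleus.
VOWELS = "aeiouăâî"
_VOWEL_RUN = re.compile(f"[{VOWELS}]+")


def count_nuclei(word: str) -> int:
    """Crude syllable count: number of maximal vowel-letter runs in the word."""
    normalized = unicodedata.normalize("NFC", word).lower()
    return len(_VOWEL_RUN.findall(normalized))


def length_distribution(words):
    """Frequencies of lengths x = 1, 2, 3, ... (in nuclei) for a list of words."""
    counts = {}
    for word in words:
        n = count_nuclei(word)
        if n > 0:  # skip tokens without any vowel letter
            counts[n] = counts.get(n, 0) + 1
    if not counts:
        return []
    return [counts.get(x, 0) for x in range(1, max(counts) + 1)]
```

For instance, `length_distribution(["trece", "lună", "lin", "de"])` yields two words of length 1 and two of length 2. Words with diphthongs such as "moarte" are handled because the run "oa" is counted once, but this is exactly where language-specific corrections would have to be added.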
3.4.1 Ord's scheme

In Table 3.4.1, we present the individual word-length distributions, the size of the poems in word-tokens (N) and Ord's characterisation by means of I and S (cf. Chapter 3.2.2). The numbers in the second column represent the frequencies of the individual lengths x = 1, 2, 3, … Though the ⟨I, S⟩ points have a relatively great dispersion, as can be seen graphically in Figure 3.4.1, they are placed around a straight line which is characteristic of the text sort, the author, the language, etc. It is also unique with respect to the entity in question, that is, to the linguistic unit and its property. It has been shown that, e.g., the rank-frequency distributions of word-forms in many languages are placed on a straight line in the negative hypergeometric domain, i.e. below the line S = 2I − 1 (cf. Popescu et al. 2009: 154). For Eminescu, we obtained the straight line S = −0.3219 + 2.8858I. For a Slovak poet, E. Bachletová, who writes rhyme-free and rhythm-free poems, we obtained the word-length function S = −0.5843 + 3.0397I (cf. Čech et al. 2011). Of course, the differences between the functions can be tested, but a deeper hypothesis needs more data concerning different units and properties from various texts and languages. Nevertheless, the above results show that there is a mechanism controlling word length in poetry. With increasing size of the poem the poet approximates both the mean of I and the mean of S.
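As an illustration, Ord's indicators can be computed directly from a frequency list. The sketch below assumes the usual definitions I = m2/m1' and S = m3/m2, where m1' is the mean and m2, m3 are the second and third central moments taken with divisor N; these definitions belong to Chapter 3.2.2 and are not restated in this section. The function name `ord_indicators` is an illustrative choice.

```python
def ord_indicators(freqs):
    """Return Ord's (I, S) for freqs, where freqs[k] is the frequency of length x = k + 1.

    Assumed definitions: I = m2 / mean, S = m3 / m2, with central moments
    m2 and m3 computed with divisor N (not N - 1).
    """
    n = sum(freqs)
    lengths = range(1, len(freqs) + 1)
    mean = sum(x * f for x, f in zip(lengths, freqs)) / n
    m2 = sum(f * (x - mean) ** 2 for x, f in zip(lengths, freqs)) / n
    m3 = sum(f * (x - mean) ** 3 for x, f in zip(lengths, freqs)) / n
    return m2 / mean, m3 / m2


# Word-length frequencies of the poem "Când" from Table 3.4.1
i_value, s_value = ord_indicators([47, 47, 30, 2])
```

Under these assumptions the result agrees, up to rounding, with the values I = 0.3500 and S = 0.2992 listed for Când in Table 3.4.1.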
5 Popescu, I.-I., Naumann S., Kelih E., Rovenchak A., Sanada H., Overbeck A., Smith R., Čech R., Mohanty P., Wilson A., Altmann G., (2013). Word length: aspects and languages, "To honour Karl-Heinz Best", In Köhler, R., Altmann G. (eds.), Issues in Quantitative Linguistics 3, Lüdenscheid: RAM, 224-281
Table 3.4.1: Word-length distributions in 100 poems by Eminescu (N = number of words in the poem; I, S = Ord's indicators)

Poem title                          Word-length distribution                  N     I       S
Adânca mare                         29, 27, 10, 4, 5                          75    0.6480  1.3231
Adio                                84, 53, 20, 1, 0, 0, 1                    159   0.4313  1.7535
Amicului F.I.                       101, 113, 41, 2                           257   0.3009  0.3537
Amorul unei marmure                 119, 69, 57, 18, 2, 1                     266   0.5446  0.8923
Andrei Mureşanu                     817, 603, 323, 111, 15, 9, 3              1881  0.5358  1.1922
Atât de fragedă...                  85, 60, 24, 6, 1                          176   0.4248  0.9160
Aveam o muză                        198, 139, 64, 14, 3, 3                    421   0.4965  1.2828
Basmul ce i l-aş spune ei           198, 133, 53, 12, 1                       397   0.4037  0.8493
Călin (file din poveste)            1017, 754, 332, 108, 12, 3, 1             2227  0.4666  1.0033
Când                                47, 47, 30, 2                             126   0.3500  0.2992
Când amintirile…                    44, 35, 11, 4, 2, 0, 1                    97    0.6221  2.0252
Când crivăţul cu iarna...           330, 232, 111, 32, 4                      709   0.4519  0.8668
Ce e amorul?                        70, 31, 16, 8                             124   0.5021  1.0389
Ce te legeni?...                    48, 33, 14, 6, 1                          102   0.4944  1.0143
Ce-ţi doresc eu ţie, dulce Românie  56, 80, 36, 11                            183   0.3695  0.4673
Crăiasa din poveşti                 47, 54, 15, 6                             122   0.3693  0.6917
Criticilor mei                      62, 49, 15, 3, 1                          130   0.3914  0.9603
Cu mâine zilele-ţi adaogi...        58, 51, 22, 5, 5                          141   0.5319  1.1988
Cugetările sărmanului Dionis        282, 173, 81, 30, 4, 0, 1                 571   0.5079  1.1888
Dacă treci râul Selenei             150, 116, 71, 17, 3                       357   0.4607  0.7457
De câte ori, iubito...              51, 24, 19, 6, 2                          102   0.5794  1.0501
De ce nu-mi vii?                    77, 31, 11, 3, 1                          123   0.4370  1.3568
De-aş avea                          40, 31, 9, 12, 0, 1                       93    0.6170  1.2394
De-or trece anii...                 51, 25, 9, 2                              87    0.3780  0.9456
Departe sunt de tine...             69, 46, 11, 7, 0, 1, 1                    135   0.5954  2.1310
Despărţire                          179, 78, 30, 14, 3                        304   0.5055  1.3414
Diana                               65, 55, 18, 7, 2, 1                       148   0.5180  1.3681
Din străinătate                     113, 53, 55, 17, 4, 1, 1                  244   0.6453  1.2351
Din valurile vremii…                81, 43, 19, 9                             152   0.4741  0.9795
Dintre sute de catarge              10, 19, 7, 12, 1, 1                       50    0.5806  0.6776
Dorinţa                             51, 32, 13, 2, 4                          102   0.5673  1.4899
Dumnezeu şi om                      176, 131, 100, 27, 5, 1, 1                441   0.5317  0.9736
După ce atâta vreme                 22, 11, 10, 4                             47    0.5296  0.6756
Ecò                                 311, 224, 133, 23, 4, 3                   698   0.4684  0.9757
Egipetul                            255, 234, 126, 58, 12, 2, 0, 1            688   0.5570  1.1130
Epigonii                            378, 326, 163, 51, 11, 1                  930   0.4786  0.8946
Feciorul de împărat fără de stea    2853, 1866, 994, 251, 47, 17, 4           6032  0.4970  1.1213
Floare-albastră                     125, 73, 32, 14, 2, 0, 0, 1               247   0.5797  1.7838
Freamăt de codru                    68, 63, 34, 13, 1                         179   0.4614  0.6794
Ghazel                              161, 90, 60, 19, 1                        331   0.4902  0.8148
Glossă                              191, 132, 46, 9, 2                        380   0.3951  0.9286
Icoană si privaz                    755, 464, 190, 45, 13, 3, 2               1472  0.4836  1.3511
Împărat şi proletar                 669, 481, 250, 89, 15, 5, 1               1510  0.5250  1.1054
În căutarea Şeherezadei             412, 277, 167, 43, 13, 2, 1               915   0.5296  1.1228
Înger de pază                       42, 35, 11, 2, 1                          91    0.4030  1.0059
Înger şi demon                      371, 282, 148, 59, 10, 5, 2               877   0.5692  1.2736
Iubind în taină…                    44, 26, 13, 2, 1, 1                       87    0.5546  1.5919
Iubitei                             246, 114, 42, 10, 1, 2, 1                 416   0.4912  1.7738
Junii corupţi                       212, 119, 85, 33, 6, 2                    457   0.5828  1.0636
Kamadeva                            34, 28, 13, 5, 1                          81    0.4884  0.9090
La Bucovina                         71, 73, 30, 9, 1                          184   0.4133  0.7310
La mijloc de codru...               27, 16, 8, 3, 1                           55    0.5418  1.1422
La mormântul lui Aron Pumnul        63, 42, 28, 15, 1, 0, 1                   150   0.6104  1.2215
La steaua                           37, 21, 11, 2                             71    0.4099  0.7692
Lacul                               42, 32, 12, 4                             90    0.4090  0.7846
Lasă-ţi lumea...                    94, 90, 34, 5, 2                          225   0.3877  0.8134
Lida                                25, 30, 9, 2                              66    0.3318  0.5554
Locul aripelor                      139, 81, 30, 9, 1                         259   0.4264  1.0158
Luceafărul                          942, 487, 227, 72, 8, 1                   1737  0.4692  1.0789
Mai am un singur dor                66, 36, 17, 4, 1, 1                       125   0.5313  1.4828
Melancolie                          124, 86, 44, 18, 2                        274   0.4955  0.8858
Memento mori                        4134, 3152, 1668, 646, 117, 36, 16, 7, 1  9777  0.5623  1.2939
Miradoniz                           281, 205, 98, 43, 7, 2                    636   0.5305  1.0703
Mitologicale                        304, 199, 124, 41, 10, 4                  682   0.5656  1.1375
Mortua est!                         244, 178, 59, 8, 1                        491   0.3535  0.7559
Napoleon                            92, 86, 45, 12, 5                         240   0.4867  0.9010
Noaptea...                          84, 61, 24, 7, 0, 1                       177   0.4552  1.1509
Nu mă-nţelegi                       185, 104, 72, 14, 7, 2                    384   0.5615  1.2308
O, ramâi                            70, 40, 15, 3, 1                          129   0.4226  1.0904
O, mamă…                            72, 45, 19, 4                             140   0.3938  0.7876
Odă în metru antic                  46, 37, 12, 6, 2                          103   0.5132  1.1673
Oricâte stele...                    40, 26, 14, 5                             85    0.4610  0.7699
Pajul Cupidon...                    69, 41, 28, 8, 1, 1                       148   0.5461  1.1176
Pe aceeaşi ulicioară…               60, 58, 16, 3, 1                          138   0.3657  0.8659
Pe lângă plopii fără soţ            119, 56, 20, 3, 1                         199   0.3872  1.1087
Peste vârfuri                       18, 18, 8, 3                              47    0.4184  0.6278
Povestea codrului                   94, 79, 37, 9, 1                          220   0.4211  0.7435
Povestea teiului                    181, 132, 65, 8, 4                        390   0.4251  0.8825
Revedere                            65, 48, 15, 11, 1, 0, 1                   141   0.5922  1.6526
Rugăciunea unui dac                 171, 115, 53, 17, 0, 1                    357   0.4575  0.9688
S-a dus amorul                      121, 66, 21, 10, 1                        219   0.4541  1.1556
Sara pe deal                        63, 57, 34, 0, 1, 1                       156   0.4100  0.9482
Scrisoarea I                        560, 418, 187, 98, 18, 1                  1282  0.5241  1.0151
Scrisoarea II                       324, 214, 110, 40, 9                      697   0.5117  0.9943
Scrisoarea III                      1054, 703, 349, 133, 31, 6, 2, 2          2280  0.5616  1.3424
Scrisoarea IV                       578, 425, 177, 67, 9, 6, 1, 1             1264  0.5285  1.3893
Scrisoarea V                        495, 310, 158, 56, 12, 1                  1032  0.5151  1.0595
Se bate miezul nopţii…              21, 15, 5, 3, 1                           45    0.5531  1.2327
Şi dacă…                            29, 16, 5, 1, 1, 1                        53    0.6457  2.1460
Singurătate                         83, 56, 22, 9, 2                          172   0.4920  1.0870
Somnoroase păsărele…                18, 23, 8, 6                              55    0.4458  0.6520
Sonete                              127, 87, 36, 8, 3, 1                      262   0.4834  1.2538
Speranţa                            102, 87, 47, 8, 1                         245   0.4069  0.6399
Stelele-n cer                       44, 23, 15, 3, 4, 2                       91    0.7760  1.7620
Strigoii                            1012, 775, 310, 113, 19, 5, 4             2238  0.5030  1.2582
Sus în curtea cea domnească         57, 53, 13, 5                             128   0.3647  0.7771
Te duci…                            127, 57, 30, 5, 2, 2                      223   0.5445  1.6360
Trecut-au anii                      43, 31, 9, 3, 2                           88    0.4968  1.3431
Venere şi Madona                    173, 132, 59, 20, 5, 3, 1                 393   0.5719  1.4921
Viaţa                               218, 174, 82, 21, 4, 2                    501   0.4753  1.0544
This fact can be seen in Figure 3.4.2. As long as the poem is short, the control of word-lengths can be dictated by some conscious (e.g. poetic form) or unconscious mechanism, but with increasing poem size the control disappears. As can be seen, the longer a poem, the stronger it tends to the mean of I and S.
Figure 3.4.1. The I-S relationship in Eminescu's poems
Figure 3.4.2. The convergence of I and S with increasing sample size
3.4.2 Word-length distribution

As can be seen in Table 3.4.1, we would obtain quite different relative frequencies of lengths for individual poems. This hypothesis could be tested by means of a chi-square test for homogeneity or the equivalent information statistic. The differences may be signs of different background models, of different parameters of the same model or, last but not least, of idiosyncrasies. In the sequel we shall search for models realised in Eminescu's poetry. We start from the general theory proposed by Wimmer and Altmann (2005) and explained in Chapter 2.5.1 (Word length in rhyme), where we obtained the Poisson, the hyper-Poisson and the Ferreri-Poisson distributions. As can be seen in Table 3.4.2, at least one of these three special cases of the theory can be successfully fitted to 86 of the 100 texts. Of course, there are cases for which one of the three distributions is not adequate, and there are also texts to which none of the given distributions can be fitted. These are especially long poems. For some of them a special modification can be found, but some of them resist any fitting. This is caused by the fact that long texts are, as a matter of fact, mixed texts. The mixing arises from pauses in writing, from subdividing the text into chapters, etc. After a pause, the rhythms in the brain may change, the memory of the previous text may weaken, new impressions may arise, etc. But these deviations may also arise voluntarily, e.g. with Dadaist writers.

Table 3.4.2: Fitting the above distributions to word-length distributions in Eminescu's poems. The following abbreviations are used (cf. Popescu et al. 2009: 134): X2 = the empirical chi-square value for the goodness-of-fit; DF = degrees of freedom; P = the associated probability; a, b = parameters; Po = Poisson; FP = Ferreri-Poisson; HP = hyper-Poisson.

Poem title                          Distr. X2     DF  P     a       b
Adânca mare                         Po     7.05   3   0.07  1.1646  --
                                    FP     4.47   3   0.21  1.6019  --
                                    HP     3.19   2   0.20  2.2069  2.6898
Adio                                Po     1.73   2   0.42  0.6365  --
                                    FP     3.84   3   0.28  1.0897  --
                                    HP     1.71   1   0.19  0.5982  0.9480
Andrei Mureşanu                     FP     6.12   3   0.11  1.3560  --
                                    HP     5.40   2   0.07  1.2601  1.6277
Atât de fragedă...                  Po     0.19   3   0.98  0.7402  --
                                    FP     1.01   3   0.80  1.1097  --
                                    HP     0.15   2   0.93  0.7938  1.1019
Aveam o muză                        Po     3.14   3   0.37  0.7997  --
                                    FP     2.34   3   0.50  1.2001  --
                                    HP     1.39   2   0.50  1.0685  1.5220
Basmul ce i l-aş spune ei           Po     1.38   3   0.71  0.7074  --
                                    FP     3.61   3   0.31  1.1561  --
                                    HP     1.35   2   0.51  0.7384  1.0614
Călin (file din poveste)            FP     10.80  5   0.06  1.2680  --
                                    HP     6.29   3   0.10  0.9750  1.2907
Când amintirile…                    Po     1.72   2   0.42  0.9306  --
                                    FP     2.43   3   0.49  1.3134  --
                                    HP     2.26   2   0.32  1.5712  2.3239
Când crivăţul cu iarna...           Po     5.03   3   0.17  0.8034  --
                                    FP     4.87   3   0.18  1.2557  --
                                    HP     3.59   2   0.17  0.9875  1.3424
Ce e amorul?                        FP     3.20   2   0.20  1.1455  --
                                    HP     0.39   1   0.53  4.4611  9.4112
Ce te legeni?...                    Po     1.37   2   0.50  0.8184  --
                                    FP     0.56   3   0.91  1.2660  --
                                    HP     0.54   2   0.77  1.2673  1.8434
Ce-ţi doresc eu ţie, dulce Românie  Po     5.19   2   0.07  1.0354  --
                                    HP     0.01   1   0.90  0.6395  0.4476
Crăiasa din poveşti                 Po     3.47   2   0.17  0.8516  --
                                    HP     1.16   1   0.28  0.5404  0.4931
Criticilor mei                      Po     0.50   2   0.78  0.7075  --
                                    FP     2.62   3   0.45  1.1641  --
                                    HP     0.05   1   0.81  0.5550  0.7145
Cu mâine zilele-ţi adaogi...        Po     4.73   3   0.19  0.9706  --
                                    FP     2.88   3   0.41  1.4092  --
                                    HP     2.96   2   0.23  1.4334  1.8267
Cugetările sărmanului Dionis        FP     3.24   4   0.52  1.2251  --
                                    HP     2.62   3   0.46  1.3514  2.1028
Dacă treci râul Selenei             Po     5.02   3   0.17  0.9081  --
                                    FP     6.23   3   0.10  1.3761  --
                                    HP     4.91   2   0.09  1.0477  1.2461
De câte ori, iubito...              Po     7.61   3   0.05  0.8757  --
                                    FP     4.47   3   0.21  1.3103  --
                                    HP     2.96   2   0.23  2.8203  4.8019
De ce nu-mi vii?                    Po     9.49   2   0.17  0.5491  --
                                    FP     0.67   2   0.71  0.9452  --
                                    HP     0.02   2   0.99  2.0774  5.1168
De-aş avea                          Po     0.42   1   0.52  0.8867  --
                                    FP     0.002  1   0.96  1.3560  --
                                    HP     3.50   1   0.06  2.5882  3.5811
De-or trece anii...                 Po     0.56   2   0.76  0.5694  --
                                    FP     0.17   2   0.92  0.9876  --
                                    HP     0.14   1   0.70  0.8728  1.7354
Departe sunt de tine...             Po     4.25   2   0.12  0.7251  --
                                    FP     3.29   3   0.35  1.1499  --
                                    HP     3.13   2   0.19  1.0919  1.7676
Despărţire                          FP     6.32   3   0.10  1.0655  --
                                    HP     1.59   2   0.45  3.0181  6.6950
Diana                               Po     2.25   3   0.52  0.8568  --
                                    FP     1.70   3   0.64  1.2977  --
                                    HP     1.68   2   0.43  1.0714  1.3390
Din valurile vremii…                Po     4.70   2   0.10  0.7356  --
                                    FP     1.16   2   0.56  1.1583  --
                                    HP     0.03   1   0.87  2.1534  4.0564
Dintre sute de catarge              Po     8.07   4   0.09  1.6490  --
                                    FP     8.41   4   0.08  2.1452  --
Dorinţa                             Po     1.47   2   0.48  0.7644  --
                                    FP     0.56   2   0.76  1.1366  --
                                    HP     3.14   2   0.21  2.8135  4.9013
După ce atâta vreme                 Po     3.59   2   0.17  0.9417  --
                                    FP     2.43   2   0.30  1.3921  --
                                    HP     1.86   1   0.17  1.9251  2.6456
Egipetul                            Po     5.44   4   0.25  1.0546  --
                                    FP     3.08   5   0.69  1.5202  --
                                    HP     2.51   3   0.47  1.3055  1.3926
Epigonii                            Po     2.03   4   0.73  0.9207  --
                                    FP     5.66   4   0.23  1.3846  --
                                    HP     1.71   3   0.63  0.9803  1.0982
Floare-albastră                     Po     7.30   3   0.06  0.7883  --
                                    FP     2.18   3   0.54  1.2169  --
                                    HP     1.15   3   0.77  1.8414  3.1538
Freamăt de codru                    Po     2.28   3   0.52  0.9901  --
                                    FP     3.30   3   0.35  1.4630  --
                                    HP     2.28   2   0.33  0.9983  1.0067
Glossă                              Po     0.18   3   0.98  0.6831  --
                                    FP     3.71   3   0.30  1.1293  --
                                    HP     0.14   2   0.93  0.6472  0.9286
Icoană si privaz                    FP     3.21   4   0.52  1.1557  --
                                    HP     3.19   3   0.36  1.1619  1.9042
Împărat şi proletar                 FP     5.07   5   0.41  1.3382  --
                                    HP     4.59   4   0.34  1.2756  1.6807
În căutarea Şeherezadei             FP     7.94   4   0.09  1.3338  --
                                    HP     7.27   3   0.06  1.3282  1.7945
Înger de pază                       Po     0.45   2   0.80  0.7345  --
                                    FP     2.16   3   0.54  1.1982  --
                                    HP     0.04   1   0.84  0.5488  0.6586
Înger şi demon                      FP     4.79   4   0.31  1.4060  --
                                    HP     3.87   4   0.42  1.5353  2.0002
Iubind în taină…                    Po     3.36   2   0.19  0.8247  --
                                    FP     3.68   3   0.30  1.3016  --
                                    HP     3.55   2   0.17  1.6424  2.5116
Iubitei                             FP     2.38   3   0.50  0.9796  --
                                    HP     0.37   2   0.93  1.7957  4.1584
Kamadeva                            Po     1.62   2   0.45  0.8258  --
                                    FP     0.43   2   0.81  1.2623  --
                                    HP     0.07   2   0.97  1.8589  3.0212
La Bucovina                         Po     1.56   3   0.67  0.9005  --
                                    FP     4.89   3   0.18  1.3756  --
                                    HP     0.48   2   0.79  0.6957  0.6766
La mijloc de codru...               Po     1.62   2   0.45  0.8258  --
                                    FP     0.43   2   0.81  1.2623  --
                                    HP     0.07   2   0.97  1.8589  3.0212
La mormântul lui Aron Pumnul        FP     5.32   3   0.15  1.4849  --
                                    HP     4.49   3   0.21  2.0277  2.6576
La steaua                           Po     1.35   2   0.51  0.7026  --
                                    FP     1.15   2   0.56  1.1471  --
                                    HP     1.13   1   0.29  1.0531  1.7120
Lacul                               Po     0.02   2   0.99  0.7700  --
                                    FP     0.52   2   0.77  1.2160  --
                                    HP     0.01   1   0.92  0.7778  1.0208
Lasă-ţi lumea...                    Po     3.08   3   0.38  0.8151  --
                                    HP     0.42   1   0.52  0.5078  0.5237
Lida                                Po     2.73   2   0.26  0.8367  --
                                    FP     5.14   2   0.08  1.3239  --
                                    HP     0.02   1   0.90  0.4186  0.3488
Locul aripelor                      Po     1.88   3   0.60  0.6669  --
                                    FP     0.74   3   0.86  1.0979  --
                                    HP     0.63   2   0.73  0.9464  1.6000
Mai am un singur dor                Po     2.37   2   0.31  0.7187  --
                                    FP     0.95   3   0.81  1.1595  --
                                    HP     0.46   2   0.79  1.6873  3.0524
Melancolie                          Po     5.49   3   0.14  0.8742  --
                                    FP     3.11   3   0.38  1.3247  --
                                    HP     2.95   2   0.23  1.3227  1.8055
Miradoniz                           FP     3.25   4   0.52  1.3477  --
                                    HP     2.83   3   0.42  1.3705  1.8390
Mitologicale                        FP     6.82   4   0.15  1.3742  --
                                    HP     4.39   3   0.22  1.6287  2.2437
Mortua est!                         Po     3.40   3   0.33  0.6690  --
                                    HP     1.12   2   0.57  0.4932  0.6634
Napoleon                            Po     0.61   3   0.89  0.9740  --
                                    FP     1.61   3   0.66  1.4381  --
                                    HP     0.55   2   0.76  1.0411  1.1137
Noaptea...                          Po     0.31   3   0.96  0.7595  --
                                    FP     1.01   3   0.80  1.2088  --
                                    HP     0.25   2   0.88  0.8382  1.1542
O, mamă…                            Po     0.68   2   0.71  0.6859  --
                                    FP     1.07   2   0.58  1.1310  --
                                    HP     0.59   1   0.44  0.7906  1.2129
O, ramâi                            Po     0.49   2   0.78  0.6405  --
                                    FP     0.28   2   0.90  1.0726  --
                                    HP     0.08   1   0.77  0.8595  1.4725
Odă în metru antic                  Po     2.14   3   0.54  0.8651  --
                                    FP     1.21   3   0.75  1.3035  --
                                    HP     1.26   2   0.53  1.2166  1.6491
Oricâte stele...                    Po     0.39   2   0.82  1.2754  --
                                    FP     1.17   2   0.56  0.8278  --
                                    HP     0.34   1   0.56  1.2874  1.8439
Pajul Cupidon...                    Po     5.00   3   0.17  0.8778  --
                                    FP     2.94   3   0.40  1.3254  --
                                    HP     2.60   2   0.27  1.5879  2.2952
Pe aceeaşi ulicioară…               Po     2.86   2   0.24  0.7484  --
                                    FP     6.75   3   0.08  1.2243  --
                                    HP     0.17   1   0.68  0.4504  0.4801
Pe lângă plopii fără soţ            Po     1.39   2   0.50  0.5488  --
                                    FP     0.55   2   0.76  0.9642  --
                                    HP     0.47   1   0.49  0.8429  1.7273
Peste vârfuri                       Po     0.06   2   0.97  0.9336  --
                                    FP     0.59   2   0.74  1.4037  --
                                    HP     0.002  1   0.97  0.8253  0.8299
Povestea codrului                   Po     1.17   3   0.76  0.8450  --
                                    FP     3.54   3   0.32  1.3115  --
                                    HP     1.03   2   0.60  0.7692  0.8746
Povestea teiului                    Po     5.31   3   0.15  0.7872  --
                                    FP     7.58   3   0.06  1.2439  --
                                    HP     5.29   2   0.07  0.8389  1.1008
Revedere                            Po     6.43   3   0.09  0.8775  --
                                    FP     4.11   3   0.25  1.3140  --
                                    HP     3.81   2   0.15  1.5657  2.2967
Rugăciunea unui dac                 Po     4.37   3   0.22  0.7836  --
                                    FP     3.45   3   0.33  1.2325  --
                                    HP     3.13   2   0.21  1.0451  1.4988
S-a dus amorul                      Po     5.37   2   0.07  0.6549  --
                                    FP     2.53   3   0.47  1.0879  --
                                    HP     2.24   2   0.33  1.2948  2.3932
Scrisoarea II                       FP     3.18   3   0.36  1.2970  --
                                    HP     2.45   2   0.29  1.4089  2.0344
Scrisoarea III                      HP     4.08   4   0.40  1.6030  2.3501
Scrisoarea IV                       FP     7.56   4   0.11  1.2920  --
                                    HP     7.56   3   0.06  1.3141  1.8682
Scrisoarea V                        FP     6.35   4   0.17  1.2670  --
                                    HP     4.72   3   0.19  1.3871  2.0506
Se bate miezul nopţii…              Po     1.74   2   0.42  0.8706  --
                                    FP     0.81   2   0.67  1.3077  --
                                    HP     0.57   1   0.45  1.0624  2.4780
Şi dacă…                            Po     1.70   2   0.43  0.6945  --
                                    HP     0.27   1   0.60  1.7589  3.3780
Singurătate                         Po     2.27   3   0.52  0.7960  --
                                    FP     0.50   3   0.92  1.2315  --
                                    HP     0.49   2   0.78  1.2018  1.7814
Somnoroase păsărele…                Po     1.29   2   0.52  1.0824  --
                                    FP     1.72   2   0.42  1.5509  --
                                    HP     1.38   1   0.24  0.9048  0.7155
Sonete                              Po     2.61   3   0.46  0.7732  --
                                    FP     0.98   3   0.81  1.2074  --
                                    HP     0.97   3   0.61  1.0686  1.5828
Speranţa                            Po     4.01   3   0.26  0.8704  --
                                    FP     7.30   3   0.06  1.3452  --
                                    HP     3.57   2   0.17  0.7450  0.7966
Stelele-n cer                       HP     2.62   3   0.45  9.2961  16.8191
Strigoii                            FP     6.59   4   0.16  1.2797  --
                                    HP     3.37   2   0.19  0.9823  1.2826
Sus în curtea cea domnească         Po     2.44   2   0.30  0.7471  --
                                    FP     5.31   2   0.07  1.2104  --
                                    HP     1.22   1   0.27  0.5198  0.5908
Te duci…                            FP     6.12   3   0.11  1.1103  --
                                    HP     4.08   3   0.25  4.6704  10.4059
Trecut-au anii                      Po     1.25   2   0.54  0.7505  --
                                    FP     0.90   2   0.64  1.1791  --
                                    HP     0.88   1   0.35  1.1020  1.5199
Venere şi Madona                    Po     5.18   3   0.16  0.8962  --
                                    FP     3.77   4   0.44  1.3575  --
                                    HP     2.89   3   0.42  1.5683  2.1912
Viaţa                               Po     0.77   3   0.86  0.8507  --
                                    FP     2.92   4   0.57  1.3107  --
                                    HP     0.59   2   0.74  0.8999  1.1029
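To make the entries of Table 3.4.2 more concrete, the following sketch recomputes the chi-square value for one Poisson fit. It assumes the 1-displaced Poisson form P_x = a^(x−1) e^(−a)/(x−1)! for x = 1, 2, …, takes the parameter a from the table instead of estimating it, and lets the last observed class absorb the remaining tail probability. The pooling rule and the function names are assumptions of this sketch; the fitting software behind the table may pool classes differently.

```python
import math


def displaced_poisson_expected(a, n_classes, total):
    """Expected frequencies of a 1-displaced Poisson for x = 1..n_classes.

    Assumption: the last class absorbs the whole upper tail so that the
    expected frequencies sum to `total`.
    """
    probs = [math.exp(-a) * a ** (x - 1) / math.factorial(x - 1)
             for x in range(1, n_classes)]
    probs.append(1.0 - sum(probs))
    return [total * p for p in probs]


def chi_square(observed, expected):
    """Pearson chi-square statistic for matching observed/expected classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [85, 60, 24, 6, 1]   # "Atât de fragedă..." from Table 3.4.1
a = 0.7402                      # Poisson parameter listed in Table 3.4.2
expected = displaced_poisson_expected(a, len(observed), sum(observed))
x2 = chi_square(observed, expected)
df = len(observed) - 1 - 1      # classes - 1 - number of estimated parameters
```

With these assumptions the statistic comes out close to the tabulated X2 = 0.19 with DF = 3. The probability P would additionally require the chi-square distribution function, which is not part of the Python standard library.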
Out of 100 poems, 86 followed the above-mentioned models; however, the number of modifications is still greater. Whether the next poem follows the well-known scheme depends on the theme and the momentary mood of the writer. Again, we can argue with supplementary corrections or conscious deviations from the casting mould. A deviation in a single length class is sufficient to yield another model. Consider the modified Singh-Poisson distribution defined as

\[
P_x =
\begin{cases}
1 - \alpha + \alpha e^{-a}, & x = 1, \\[4pt]
\dfrac{\alpha\, a^{x-1} e^{-a}}{(x-1)!}, & x = 2, 3, 4, \ldots
\end{cases}
\tag{3.4.9}
\]
in its 1-displaced form, where the first class is modified and the other ones are weighted by α so that the probabilities sum to 1 (cf. Wimmer, Altmann 1999). The poems following this regime are presented in Table 3.4.3. In two cases it was possible to apply the 1-displaced binomial distribution (which converges to the 1-displaced Poisson distribution), and we obtained

Când: X2 = 3.82, DF = 1, P = 0.0505, n = 3, p = 0.3025
Amicului F.I.: X2 = 1.71, DF = 1, P = 0.20, n = 3, p = 0.2629

Now only 6 poems remain which could not be captured by these models, viz. Ecò (N = 698), Sara pe deal (N = 156), Ghazel (N = 331), Feciorul de împărat fără de stea (N = 6032), Memento mori (N = 9777), and Scrisoarea I (N = 1282). Three of them are too large (N > 1200) and can be considered mixtures. The remaining three poems represent deviations, and we can conjecture that some other mechanism was active at their production or that they are conscious stylistic deviations. Perhaps a purely qualitative analysis could unveil this “mystery”, i.e. show the boundary conditions under which the poems were written.

Table 3.4.3: Poems following the 1-displaced Singh-Poisson distribution

Poem title            X2     DF  P     a       α
Amorul unei marmure   5.91   3   0.12  1.2100  0.7946
Din străinătate       6.42   3   0.09  1.3836  0.7241
Dumnezeu şi om        7.47   3   0.06  1.1474  0.8852
Junii corupţi         2.94   3   0.40  1.2230  0.7616
Luceafărul            5.31   3   0.15  0.8907  0.7775
Nu mă-nţelegi         5.70   3   0.13  1.1223  0.7739
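The 1-displaced Singh-Poisson distribution (3.4.9) is easy to evaluate numerically. The sketch below is a direct transcription of the formula; the function name `singh_poisson_pmf` is an illustrative choice, and the parameters a and α are taken from Table 3.4.3 rather than estimated.

```python
import math


def singh_poisson_pmf(x, a, alpha):
    """Probability of length x under the 1-displaced Singh-Poisson distribution (3.4.9)."""
    if x < 1:
        return 0.0
    # Poisson term a^(x-1) e^(-a) / (x-1)!, weighted by alpha
    weighted = alpha * math.exp(-a) * a ** (x - 1) / math.factorial(x - 1)
    # The first class receives the extra mass 1 - alpha
    return 1.0 - alpha + weighted if x == 1 else weighted


# Luceafărul: parameters from Table 3.4.3, N and observed frequencies from Table 3.4.1
a, alpha, n_words = 0.8907, 0.7775, 1737
expected = [n_words * singh_poisson_pmf(x, a, alpha) for x in range(1, 7)]
```

For α = 1 the function reduces to the ordinary 1-displaced Poisson distribution. For Luceafărul the expected frequency of the first class lies close to the observed 942 one-syllable words, which illustrates how the extra parameter α adjusts the first class only.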
We can conclude that there are not only stress-conditioned rhythms in texts but also other ones. In this chapter we scrutinised word length, which obeys some patterns derivable from a common background mechanism. The responsible mechanism yields a pattern out of several ones contained in a reservoir, which can and should be captured by a theory. Needless to say, the application of a selected model always takes place under the ceteris paribus condition. Unfortunately, this condition is not always met, which may lead to exceptions or deviations. Our aim is to present the theory and not the local conditions, which are not always traceable. Finding them must be left to those who specialise in text sorts or individual writers.
3.5 Word classes (parts of speech)

3.5.1 Frequencies

The study of word classes or, in particular, parts of speech has been strongly influenced by the Greek-Latin tradition, which arranges words according to changing criteria: morphologically marked grammatical categories, ontological and semantic features, syntactic and discourse-pragmatic use. But even here, languages allowing conversion (e.g. English, German) and “exotic” languages destroyed the illusion that the world itself is “classified”. Of course, it is ordered in its own way, but every language constructs this order conceptually in a different way. Unfortunately, the difference is not only conceptual; it may also be formal: while in German most parts of speech can be used as nouns in an appropriate syntactic context (often with the determiner das), the reverse procedure is not always possible; derivation by means of affixation is common in inflecting languages. Nevertheless, it is always possible to partition the text into prefabricated classes, with or without intersections, but different researchers may do it in different ways. The partition does not depend on some order in reality but on the aim of our investigation. In such a situation one always seeks a criterion of “correctness” of the classification, but it cannot be found in the language or the text itself. Mostly, agreement with the classical grammar based on European languages is chosen as a criterion, which is too narrow an approach. Semantic, psychological, sociolinguistic, etc. criteria are also applied, all depending on the aim of research, and they are all correct if they corroborate the hypothesis we have set up in advance. But working with so-called Peircean abduction does not even provide clear hypotheses; with induction one hopes to have found a universal, but this is mostly a problem of definition; and deduction holding true for all languages is rare in linguistics and in many cases creates languages which do not exist at all.
According to Bunge (1983: 17), classification, if it has not been established deductively, is not a theoretical but a taxonomic account of reality. But a taxonomic account is always linked with some practical or epistemological aim. In grammatical research it is practical; in quantitative linguistics it is also epistemological. In grammatical research we classify entities according to form, meaning, function, place in the sentence, etc. In quantitative linguistics we always ask whether the given classification abides by a lawlike hypothesis. That is, we set up the hypothesis that a linguistic classification may be considered “correct” or “purposeful” if the rank-frequency distribution of classes abides by some reproducible distribution. The distribution is not known in advance in all cases and it is not the same in all cases, especially because texts are different, but in any case an acceptable rank-frequency distribution of a classification must be derivable from the general theory (cf. Wimmer, Altmann 2005).

Word classes are results of language development. The process leading to their rise is known as diversification (cf. Zipf 1949; Köhler 1991), triggered by various circumstances and requirements of the language users. Altmann (2005) mentions six causes of diversification: random fluctuation, environmentally conditioned variation, conscious change, self-regulation, system modification, and Köhler's requirements. Among the latter, those especially responsible for diversification are the trend towards minimal coding and decoding effort, sufficient redundancy and minimisation of production effort, the general coding requirement and its opposite force, the need for minimising the inventory, context economy and context specificity, and invariance vs. flexibility of the relation between expression and meaning. It is a very complex process leading to the rise of variants, classes, overt marking, conversions, etc. It is active in all domains of language, from phonetics up to sociolinguistics and psycholinguistics. Word classes can be marked phonetically, morphologically, syntactically or lexically; every language prefers its own methods. Our aim is to show that if there is a way to identify word classes in the given language, then the ranking of classes abides by a regular probability distribution or a stratification process. The study of this phenomenon is well developed, though the number of publications is not excessive (cf. Hammerl 1990; Best 1994, 1998; Köhler 1991; Rothe 1991; Wimmer, Altmann 2001; Ziegler 1998, 2001; Nemcová 2008; Overbeck, Best 2008).

Let us consider the word classes in Eminescu's poems. In Romanian, we shall distinguish the following word classes: adjective (A), adverb (Av), article (At), conjunction (C), interjection (I), noun (N), numeral (Nu), preposition (P), pronoun (Pn) and verb (V), which is rather usual in Latin-based grammars. One could increase the number of classes by adding details (e.g. different kinds of pronouns or numerals), but this will not be necessary, since we are interested in reliable representations of classes. The empirical distribution is given in the order A, Av, At, C, I, N, Nu, P, Pn, V, and the fitting is performed for the corresponding usual rank-frequency sequence, i.e. the frequencies are arranged in decreasing order and we try to test the following hypotheses:
The ordering of frequencies abides by the exponential decay regularity, defined as

\[
y = c + a_1 \exp(-x/r_1) + a_2 \exp(-x/r_2).
\tag{3.5.1}
\]

One can, of course, also test the classical Zipfian power function, but some publications have shown that it does not fit the data well. Some authors also apply the Zipf-Alekseev function and the negative hypergeometric distribution. The above formula is sufficient for short texts, where c = 0, but for longer texts in which each word class occurs at least once, it is usual to determine c on the right-hand side according to the smallest frequency. There are qualitatively different orderings, i.e. we do not find in all poems the sequence N, V, P, A, Av, Pn, C, At, I, Nu found in the poem Lacul. The equality of ordering can be tested by an appropriate statistical test. The frequencies of individual classes are presented in Table 3.5.1.

Table 3.5.1: Frequencies of word classes in Eminescu's poems (the poem Călin (file din poveste) is written Călin; Size = poem size N in words)

Poem title                Size  A    Av   At   C    I    N    Nu   P    Pn   V
Atât de fragedă...        176   15   16   9    13   1    51   0    29   23   27
Călin, Gazel              65    4    2    4    3    0    16   0    9    10   13
Călin, part I             351   44   19   25   23   0    104  0    57   47   50
Călin, part II            41    4    3    1    3    0    8    1    6    6    8
Călin, part III           249   25   21   6    27   0    54   0    31   56   39
Călin, part IV            304   31   23   15   23   4    72   0    50   62   48
Călin, part V             247   36   18   10   22   0    60   0    27   34   38
Călin, part VI            95    3    15   2    6    1    24   6    108  132  161
Călin, part VII           463   46   30   27   27   2    140  2    75   75   73
Călin, part VIII          504   55   22   19   37   3    178  4    92   44   74
Floare-albastră           247   20   17   7    22   2    74   0    41   40   40
Lacul                     90    11   8    1    7    0    23   0    14   8    17
Mai am un singur dor      125   11   17   5    4    0    39   0    16   15   21
Melancolie                274   33   17   15   31   2    80   1    36   28   43
Pe lângă plopii fără soţ  199   13   23   11   10   0    47   0    29   27   31
As can be seen in Table 3.5.2, ten texts can be fitted by (3.5.1) with one component; five texts need two components. The causes must be sought directly in the given texts. Nevertheless, the texts display a different stratification. The usual chi-square test can be applied for testing the homogeneity of class frequencies, for our purposes defined as

\[
X^2 = \sum_{i=1}^{15} \sum_{j=1}^{10} \frac{(n_{ij} - E_{ij})^2}{E_{ij}},
\tag{3.5.2}
\]

where there are 15 poems and 10 parts of speech, $n_{ij}$ are the observed frequencies in the cells (i, j) (that is, poem i and part of speech j) and $E_{ij}$ are the expected values $E_{ij} = n_{i\cdot}\, n_{\cdot j} / n$, where $n_{i\cdot}$ is the sum of the row, $n_{\cdot j}$ is the sum of the column, and n is the sum of all frequencies. The test is generally known. Applying (3.5.2) to the data in Table 3.5.1 we obtain X2 = 524.38 which, with (15 – 1)(10 – 1) = 126 degrees of freedom, displays a very strong non-homogeneity. Even if we consider only one poem, here Călin (file din poveste), composed of nine parts, we obtain a chi-square X2 = 434.65 which with 80 degrees of freedom testifies to very strong non-homogeneity. This fact supports our conjecture that texts partitioned by the author himself or very long texts which cannot be written in one go are mixed texts and cannot be considered a homogeneous whole.

If we ascribe ranks to the numbers in Table 3.5.1, we see that they are not associated equally with the given part-of-speech classes, though the differences are reduced by ranking. We ask whether the body of Eminescu's poems is homogeneous from this point of view. The problem can be tested using Kendall's concordance coefficient, Friedman's analysis of variance by ranks, etc. The classes with the same frequency obtain the mean rank, as shown in Table 3.5.3. We restrict ourselves to seven poems and compute the concordance coefficient

\[
W = \frac{12\,S}{k^2\, n\,(n^2 - 1)},
\tag{3.5.3}
\]

where k is the number of poems (here k = 7), n is the number of classes (here n = 10) and S is the sum of squared differences

\[
S = \sum_{i=1}^{n} \left( R_i - \frac{k(n+1)}{2} \right)^2,
\tag{3.5.4}
\]

where $R_i$ are the sums of ranks in individual columns (i = 1, 2, …, n). One can find them in the last row of Table 3.5.3.
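The homogeneity test (3.5.2) discussed above can be sketched as follows. The function name `chi_square_homogeneity` is an illustrative choice; cells whose expected value is zero (a class that occurs in none of the poems compared) are skipped, which is an assumption of this sketch and not a rule stated in the text. The degrees of freedom are computed as (rows − 1)(columns − 1), as in the text.

```python
def chi_square_homogeneity(table):
    """Chi-square statistic (3.5.2) and degrees of freedom for a frequency table.

    `table` is a list of rows (poems), each a list of class frequencies.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:  # assumption: skip cells of classes that never occur
                x2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df


# Three rows of Table 3.5.1 in the order A, Av, At, C, I, N, Nu, P, Pn, V
poems = [
    [20, 17, 7, 22, 2, 74, 0, 41, 40, 40],   # Floare-albastră
    [11, 8, 1, 7, 0, 23, 0, 14, 8, 17],      # Lacul
    [33, 17, 15, 31, 2, 80, 1, 36, 28, 43],  # Melancolie
]
x2, df = chi_square_homogeneity(poems)
```

The resulting statistic has to be compared with the chi-square distribution with `df` degrees of freedom; the values reported in the text refer to the full 15-poem table, not to this three-poem excerpt.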
Table 3.5.3: Ranking of word-classes (Size = poem size N in words)

Poem title                Size  A     Av    At    C     I     N     Nu    P     Pn    V
Atât de fragedă...        176   6     5     8     7     9     1     10    2     4     3
Călin, Gazel              65    5.5   7     5.5   8     9.5   1     9.5   4     3     2
Floare-albastră           247   6     7     8     5     9     1     10    2     3.5   3.5
Lacul                     90    4     5.5   8     7     9.5   1     9.5   3     5.5   2
Mai am un singur dor      125   6     3     7     8     9.5   1     9.5   4     5     2
Melancolie                274   4     7     8     5     9     1     10    3     6     2
Pe lângă plopii fără soţ  199   6     5     7     8     9.5   1     9.5   3     4     2
Sums of ranks Ri                37.5  39.5  51.5  48    65    7     68    19    31    16.5
Since k(n + 1)/2 = 7(11)/2 = 38.5. we obtain S = (37.5 – 38.5)2 + (39.5 – 38.5)2 + (51.5 – 38.5)2 +(48 – 38.5)2 + (65 – 38.5)2 + + (7 – 38.5)2 + (68 – 38.5)2 + (19 – 38.5)2 + (31 – 38.5)2 + (16.5 – 38.5)2 = 746.5 Inserting this result in (3.5.3) we obtain
= W
12(3746.5) 44958 = = 0.9267. 2 48510 7 (10)(100 − 1)
As W ∈ [0,1], 1 means perfect concordance, 0 no agreement (independence), we see that as to ranking the parts of speech are distributed concordantly but this concordance is rather language-conditioned, not text-dependent. (3) If one chooses the same word classes for different languages, one obtains different orderings. Thus languages and texts can be classified according to this order, and the same may be done with authors.
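The computation of S and W from the rank sums can be retraced with a few lines of code. The following is only a minimal sketch under our own assumptions: the function name kendall_w is hypothetical, the rank sums are those quoted from the last row of Table 3.5.3, and no correction for tied ranks is applied, in line with (3.5.3) and (3.5.4).

```python
def kendall_w(rank_sums, k):
    """Kendall's concordance coefficient from the column rank sums.

    rank_sums: sums of ranks R_i of the n classes
    k:         number of rankings (here: poems)
    Follows (3.5.3) and (3.5.4); ties are not corrected for.
    """
    n = len(rank_sums)
    mean_sum = k * (n + 1) / 2                       # k(n + 1)/2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)  # (3.5.4)
    w = 12 * s / (k ** 2 * n * (n ** 2 - 1))         # (3.5.3)
    return s, w


# Rank sums as quoted from the last row of Table 3.5.3
rank_sums = [37.5, 39.5, 51.5, 48, 65, 7, 68, 19, 31, 16.5]
S, W = kendall_w(rank_sums, k=7)
print(S, round(W, 3))
```

The function works on the rank sums only, so it can be applied to any other selection of poems once their ranks have been summed column by column.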
3.5.2 Descriptiveness vs. activity
The frequencies of word classes in text depend both on language and on style. The distribution of frequencies in translations of the same text can be helpful in showing the present state of the languages, and translations of older texts into the present state of a language can show the direction of development. As to style, two ways of scrutinising a conspicuous property of the given text sort and of the personal style of the author are known: the study of descriptiveness vs. activity and the study of nominality. The former view compares the number of descriptive or ornamental components with those which express some activity. The first group consists mostly of adjectives, optionally including descriptive adverbs; the second group consists of verbs, which can be defined in several ways, e.g. including only verbs expressing activity (go, sing, blow,…) but not other verbs (have, sleep, be,…), or including all verbs, gerunds, gerundives, verbal interjections, etc. The latter view measures the extent of nominality in expressions. In German, laws and other official texts frequently use nominal expressions instead of verbal ones. In English, the difference can be illustrated by the sentences “He runs quickly” and “His run is quick”.
Both views can further be studied from two perspectives: the first is static, taking into account the text as a whole; the second is dynamic, taking into account the development of the given property in the deployment of the text. We shall illustrate all these possibilities one after another. Consider the poem Peste vârfuri, in which we marked the components pointing to adjectives or adverbs responding to the question “how?” and to verbs:

Peste vârfuri trece lună,
Codru-şi bate frunza lin,
Dintre ramuri de arin
Melancolic cornul sună.

Mai departe, mai departe,
Mai încet, tot mai încet,
Sufletu-mi nemângâiet
Îndulcind cu dor de moarte.

De ce taci, când fermecată
Inima-mi spre tine-ntorn?
Mai suna-vei, dulce corn,
Pentru mine vre odată?
The sequence of verbs (V) and adjectives (A) as described above is given as follows:
(I) V V A A V A A A A A V V A V V A.
Several procedures are available to characterise the descriptiveness/activity of the poem. We simply determine the proportion of verbs in the given sample, i.e. we define an indicator known as a modification of Busemann's ratio
(3.5.5) $Q = \dfrac{n_V}{n_V + n_A}$
(cf. Altmann 1978) and obtain Q = 7/16 = 0.44 for our example, where nV = 7 and nA = 9. If Q < 0.5 the text can be considered descriptive; if Q > 0.5, the text can be considered active; if Q ≈ 0.5, it is in a descriptive-active equilibrium. There is, of course, a way to decide whether the state of the text is significantly descriptive or active. Since Q is a simple proportion whose expected value is a priori 0.5, the significance of its deviation from 0.5 can be computed by means of the binomial distribution. If nV > n/2, where n = nV + nA, then we compute
(3.5.6) $P(X \ge n_V) = 0.5^{n}\sum_{x=n_V}^{n}\binom{n}{x}$.
If P(X ≥ nV) < 0.05, we consider the text as significantly active. On the other hand, if nV < n/2, we compute
(3.5.7) $P(X \le n_V) = 0.5^{n}\sum_{x=0}^{n_V}\binom{n}{x}$.
If the probability is smaller than 0.05, the text is significantly descriptive. It would be possible to prepare tables for a number of n and state the nV at which the text takes on a significant degree of the property, but such tables would consume very much space and the computation with a programme is simple. In our example, we have n = nV + nA = 16, out of which nV = 7. Since 7 < 16/2, we compute (3.5.7) and obtain
$P(X \le 7) = 0.5^{16}\left[\binom{16}{0} + \binom{16}{1} + \dots + \binom{16}{7}\right] = 0.40$,
i.e. the text is in a descriptive-active equilibrium (the probability is greater than 0.05). For very long texts, the asymptotic test
(3.5.8) $X^{2} = \dfrac{(n_V - n_A)^{2}}{n_V + n_A}$
can be applied, where X² is distributed like a chi-square with one degree of freedom, or, identically,
(3.5.9) $u = (2Q - 1)\sqrt{n}$,
which is distributed normally. Evidently u² = X². For our example we obtain
$X^{2} = \dfrac{(7 - 9)^{2}}{16} = 0.25$,
which is not significant with one degree of freedom, and
$u = \left(2\cdot\dfrac{7}{16} - 1\right)\sqrt{16} = -0.5$,
which is identical with the previous test. Evidently (−0.5)² = 0.25 = X². The two last tests are very quick and can be used with relatively small n.
The first dynamic consideration concerns the sequence of Qs with increasing n. It has already been studied by Köhler and Galle (1993) on the basis of another definition. Using (3.5.5), we obtain for the first members of the sequence (I): 1/1 = 1; 2/2 = 1; 2/3 = 0.67; 2/4 = 0.5; 3/5 = 0.6, …. The complete sequence is then as follows:
(II) 1; 1; 0.67; 0.5; 0.6; 0.5; 0.43; 0.38; 0.33; 0.30; 0.36; 0.42; 0.38; 0.43; 0.47; 0.44.
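The static indicators and the running sequence (II) can be reproduced from sequence (I) with a short script. This is a sketch under our own assumptions, not a prescribed implementation: the function names are hypothetical, the sequence is coded as a string of the letters V and A, and the equilibrium case nV = n/2 is treated like the descriptive branch (3.5.7).

```python
from math import comb, sqrt


def binomial_tail(n_v, n):
    """One-sided binomial probability with p = 0.5.

    Uses (3.5.6) if n_v > n/2, otherwise (3.5.7).
    """
    if n_v > n / 2:
        xs = range(n_v, n + 1)      # P(X >= n_v)
    else:
        xs = range(0, n_v + 1)      # P(X <= n_v)
    return sum(comb(n, x) for x in xs) / 2 ** n


def running_q(seq):
    """Q computed step by step after each symbol of the sequence."""
    values, n_v = [], 0
    for position, symbol in enumerate(seq, start=1):
        if symbol == "V":
            n_v += 1
        values.append(n_v / position)
    return values


seq = "VVAAVAAAAAVVAVVA"            # sequence (I) of Peste vârfuri
n = len(seq)
n_v = seq.count("V")
n_a = seq.count("A")

q = n_v / (n_v + n_a)               # (3.5.5)
p = binomial_tail(n_v, n)           # (3.5.7), because 7 < 16/2
x2 = (n_v - n_a) ** 2 / (n_v + n_a)  # (3.5.8)
u = (2 * q - 1) * sqrt(n)           # (3.5.9)
q_sequence = running_q(seq)         # sequence (II)

print(round(q, 2), round(p, 2), x2, u)
print([round(v, 2) for v in q_sequence])
```

The exact binomial probability is cheap to compute even for long sequences, so the asymptotic variants serve here mainly as a check.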
If there are antagonistic requirements/forces operating in the rise of the text (here descriptiveness and activity), then it can be expected that sequences like the above one display either a regular oscillating course or a stepwise change. In our case a regular oscillation cannot be supposed because the writer does not care for this detail. We rather expect that one of the properties changes more or less regularly. In order to capture this motion, we conjecture that the rate of change of the given variable is a function of the two antagonistic forces. Up to now, two approaches have been proposed (Popescu, Čech, Altmann 2011b). The first is a product of the form
(3.5.10) $y' = K f(x)\,[1 - f(x)]$,
where x is the position in the sequence and y the value of Q. In the second one, the relative rate of change of y is an additive composition of the functions of descriptiveness and activity in the form
(3.5.11) $\dfrac{y'}{y} = \dfrac{a}{x} - \dfrac{b}{M - x}$.
Choosing an exponential function exp(−c(x − d)) for f(x) in (3.5.10) we obtain the Morse function⁶
(3.5.12) $y = a + b\,[1 - \exp(-c(x - d))]^{2}$.
The solution of (3.5.11) yields the beta-function in the form
(3.5.13) $y = C x^{a} (M - x)^{b}$.
Here M must be greater than the maximum number of steps (xmax). Applying the Morse function to the sequence (II) we obtain y = 0.3563 + 1.4156[1 – exp(−0.0581(x – 10.1257))]², yielding R² = 0.92. It is presented graphically in Figure 3.5.1.
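To see how the reported fit relates to the data, one can evaluate the Morse function (3.5.12) with the quoted parameters at the positions x = 1, …, 16 and compare it with sequence (II). The sketch below only evaluates the given curve and computes the determination coefficient; it does not re-estimate the parameters. The function names are ours, and the usual definition R² = 1 − SSres/SStot is assumed.

```python
from math import exp


def morse(x, a, b, c, d):
    """Morse function (3.5.12)."""
    return a + b * (1 - exp(-c * (x - d))) ** 2


def r_squared(observed, predicted):
    """Determination coefficient, assuming R2 = 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1 - ss_res / ss_tot


# Sequence (II) and the parameters quoted in the text for Peste vârfuri
q_seq = [1, 1, 0.67, 0.5, 0.6, 0.5, 0.43, 0.38,
         0.33, 0.30, 0.36, 0.42, 0.38, 0.43, 0.47, 0.44]
params = {"a": 0.3563, "b": 1.4156, "c": 0.0581, "d": 10.1257}

fitted = [morse(x, **params) for x in range(1, len(q_seq) + 1)]
r2 = r_squared(q_seq, fitted)
print(round(r2, 2))
```

At x = d the bracket vanishes, so the curve takes its minimum value a there; this is why d can be read as the position of the minimum.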
Figure 3.5.1. Fitting the Morse function to the sequence of Qs in the poem Peste vârfuri
The Q sequence of this poem begins with a verb (V), that is, with Q = 1. Long poems of this kind, such as Odă în metru antic, can be fitted as well, as shown in Figure 3.5.2. Note that the parameter d gives the position (nA + nV) of the minimum of the curve.
6 The Morse potential, named after the physicist Philip M. Morse, is a well-known model of the potential energy in a diatomic molecule, see http://en.wikipedia.org/wiki/Morse_potential.
Figure 3.5.2. Fitting the Morse function to the sequence of Qs in the poem Odă în metru antic
The other kind of Q sequences begins with an adjective (A), that is with Q = 0. In this case, the Morse fitting parameters a and d should vanish, as shown in Figure 3.5.3. The results for some other poems are presented in Table 3.5.4.
Figure 3.5.3. Fitting the Morse function to the sequence of Qs in the poem Lacul
Table 3.5.4: Fitting the Q-sequences in some poems by the Morse function (other function marked by name)
Poem title | Empirical sequence, Q-sequence, Morse function | R2
Adânca mare…
AAAAVVAAAVVAAAVVVVAVAVVAA 0; 0; 0; 0; 0.2; 0.33; 0.29; 0.25; 0.22; 0.3; 0.36; 0.33; 0.31; 0.29; 0.33; 0.38; 0.41; 0.44; 0.42; 0.45; 0.43; 0.45; 0.48; 0.46; 0.44 Chapman: y = 0,4652(1 - exp(-0,1570x))1,9041
Atât de fragedă...
0.87
AVAVVAVAAVVAAAVVAVVVAAVAVAVAVVAVAA VAVVVVAAVVV 0; 0.5; 0.33; 0.5; 0.6; 0.5; 0.57; 0.5; 0.44; 0.5; 0.55; 0.5; 0.46; 0.43; 0.47; 0.5; 0.47; 0.5; 0.53; 0.55; 0.52; 0.5; 0.52; 0.5; 0.52; 0.5; 0.52; 0.5; 0.52; 0.53; 0.52; 0.53; 0.52; 0.5; 0.51; 0.5; 0.51; 0.53; 0.54; 0.55; 0.54; 0.52; 0.53; 0.55; 0.56 y = 0.5104(1-exp(-0.9881(x – 0.8804)))2
Călin, Gazel
0.77
VVAVAVVVVVVVVAVVAA 1; 1; 0.67; 0.75; 0.6; 0.67; 0.71; 0.75; 0.78; 0.8; 0.82; 0.83; 0.85; 0.79; 0.8; 0.81; 0.76; 0.72 y = 0.7010 + 0.1180(1-exp(-0.2546(x-4.9521)))2
Călin, part I
0.62
VVAAVVVAVAAAVVVVAAVAVAAAAVVVAVVAAV AAVVVVAAVAVAVAAVAAVVAAVVAAAAVAVAVV AVAVVAAAVVVAAAVVVVVVAVAVAVA 1; 1; 0.67; 0.5; 0.6; 0.67; 0.71; 0.63; 0.67; 0.6; 0.55; 0.5; 0.54; 0.57; 0.6; 0.63; 0.59; 0.56; 0.58; 0.55; 0.57; 0.55; 0.52; 0.5; 0.48; 0.5; 0.52; 0.54; 0.52; 0.53; 0.55; 0.53; 0.52; 0.53; 0.51; 0.5; 0.51; 0.53; 0.54; 0.55; 0.54; 0.52; 0.53; 0.52; 0.53; 0.52; 0.53; 0.52; 0.51; 0.52; 0.51; 0.5; 0.51; 0.52; 0.51; 0.5; 0.51; 0.52; 0.51; 0.5; 0.49; 0.48; 0.49; 0.48; 0.49; 0.48; 0.49; 0.5; 0.49; 0.5; 0.49; 0.5; 0.51; 0.5; 0.49; 0.49; 0.49; 0.5; 0.51; 0.5; 0.49; 0.49; 0.49; 0.5; 0.51; 0.51; 0.52; 0.52; 0.52; 0.52; 0.52; 0.52; 0.52; 0.52; 0.52 y = 0.5177 + 9.4295E-6(1 - exp(-0.1350(x - 40.9756)))2
Călin, part II
0.73
VVAVAAVAVVAAVV 1; 1; 0.67; 0.75; 0.6; 0.5; 0.57; 0.5; 0.56; 0.6; 0.55; 0.5; 0.54; 0.57 Asymptotic: y = 0,5282 + 0,8015*0,6547x
Călin, part III
VVVAVAVAVVAAVVAVVAVVAVAAAAAVVVAAAV
0.84
VVAVVVAAVVAAAVAVVVVVVVAVVAVVVVAAA 1; 1; 1; 0.75; 0.8; 0.67; 0.71; 0.63; 0.67; 0.7; 0.64; 0.58; 0.62; 0.64; 0.6; 0.63; 0.65; 0.61; 0.63; 0.65; 0.62; 0.64; 0.61; 0.58; 0.56; 0.54; 0.52; 0.54; 0.55; 0.57; 0.55; 0.53; 0.52; 0.53; 0.54; 0.56; 0.54; 0.55; 0.56; 0.58; 0.56; 0.55; 0.56; 0.57; 0.56; 0.54; 0.53; 0.54; 0.53; 0.54; 0.55; 0.56; 0.57; 0.57; 0.58; 0.59; 0.58; 0.59; 0.59; 0.58; 0.59; 0.6; 0.6; 0.61; 0.6; 0.59; 0.58 y = 0.5685+0.00054(1-exp(-0.0894(x-39.1286)))2 Călin, part IV
0.87
VVAVVVAVVAAVAAVAAAAAAVVVVAVAAAAVVA VVVVAAVVAAVAAAVVVAVVVVVVVAVVVVVVVV VAVAVAV 1; 1; 0.67; 0.75; 0.8; 0.83; 0.71; 0.75; 0.78; 0.7; 0.64; 0.67; 0.62; 0.57; 0.6; 0.56; 0.53; 0.5; 0.47; 0.45; 0.43; 0.45; 0.48; 0.5; 0.52; 0.5; 0.52; 0.5; 0.48; 0.47; 0.45; 0.47; 0.48; 0.47; 0.49; 0.5; 0.51; 0.53; 0.51; 0.5; 0.51; 0.52; 0.51; 0.5; 0.51; 0.5; 0.49; 0.48; 0.49; 0.5; 0.51; 0.5; 0.51; 0.52; 0.53; 0.54; 0.54; 0.55; 0.56; 0.55; 0.56; 0.56; 0.57; 0.58; 0.58; 0.59; 0.6; 0.6; 0.61; 0.6; 0.61; 0.6; 0.6; 0.59; 0.6 y = 0.4759+0.29242(1-exp(-0.0266(x-31.9399)))2
Călin, part V
0.88
VAAAVAVVAVVVVVAVVVVVAAVAAVAAVVAAVA AAVVVAAVVVAAAAVVAVVAAAAAAVAAVAAVVA VVVVV 1; 0.5; 0.33; 0.25; 0.4; 0.33; 0.43; 0.5; 0.44; 0.5; 0.55; 0.58; 0.62; 0.64; 0.6; 0.63; 0.65; 0.67; 0.68; 0.7; 0.67; 0.64; 0.65; 0.63; 0.6; 0.62; 0.59; 0.57; 0.59; 0.6; 0.58; 0.56; 0.58; 0.56; 0.54; 0.53; 0.54; 0.55; 0.56; 0.55; 0.54; 0.55; 0.56; 0.57; 0.56; 0.54; 0.53; 0.52; 0.53; 0.54; 0.53; 0.54; 0.55; 0.54; 0.53; 0.52; 0.51; 0.5; 0.49; 0.5; 0.49; 0.48; 0.49; 0.48; 0.48; 0.48; 0.49; 0.49; 0.49; 0.5; 0.51; 0.51; 0.52 y = 0.3009 + 0.2562(1-exp(-0.3810(x-3.5721)))2
Călin, part VI
0.66
VVVVAVVVAAVVVAVVVVVA 1; 1; 1; 1; 0.8; 0.83; 0.86; 0.88; 0.78; 0.7; 0.73; 0.75; 0.77; 0.71; 0.73; 0.75; 0.76; 0.78; 0.79; 0.75 y = 0.7385+21.1310(1-exp(-0.0084(x-14.5036)))2
Călin, part VII
AVAVAAVAVVAVVVAVAVAVVAVVAVAAVVAVVA VAVVVVVVVVVVAAVAVAAVAVVAVAVAAAAVAV VAVAAAVAAVVVVVVAVVAAAVVVVVVVVVVAVV VVVAVAAVAAVVAVAVVAVV
0.83
0; 0.5; 0.33; 0.5; 0.4; 0.33; 0.43; 0.38; 0.44; 0.5; 0.45; 0.5; 0.54; 0.57; 0.53; 0.56; 0.53; 0.56; 0.53; 0.55; 0.57; 0.55; 0.57; 0.58; 0.56; 0.58; 0.56; 0.54; 0.55; 0.57; 0.55; 0.56; 0.58; 0.56; 0.57; 0.56; 0.57; 0.58; 0.59; 0.6; 0.61; 0.62; 0.63; 0.64; 0.64; 0.65; 0.64; 0.63; 0.63; 0.62; 0.63; 0.62; 0.6; 0.61; 0.6; 0.61; 0.61; 0.6; 0.61; 0.6; 0.61; 0.6; 0.59; 0.58; 0.57; 0.58; 0.57; 0.57; 0.58; 0.57; 0.58; 0.57; 0.56; 0.55; 0.56; 0.55; 0.55; 0.55; 0.56; 0.56; 0.57; 0.57; 0.58; 0.57; 0.58; 0.58; 0.57; 0.57; 0.56; 0.57; 0.57; 0.58; 0.58; 0.59; 0.59; 0.59; 0.6; 0.6; 0.61; 0.6; 0.6; 0.61; 0.61; 0.62; 0.62; 0.61; 0.62; 0.61; 0.61; 0.61; 0.6; 0.6; 0.6; 0.61; 0.6; 0.6; 0.6; 0.6; 0.61; 0.6; 0.6; 0.61 Asymptotic: y = 0,5917 - 0,4045*0,8783x Călin, part VIII
0.70
VVVAVAVAAVAVVAVAVVAVAVAAVAVVAAVAAV AAAAVVAAAVVAAAAAVAAAVVVVAVAAAAVAVA VAVAAVAVVVVVAAAVAVVVVVVVVVVAVVVAVV VVVAAVAVVAVVAAAVAVAVAAVVVVAVVA 1; 1; 1; 0.75; 0.8; 0.67; 0.71; 0.63; 0.56; 0.6; 0.55; 0.58; 0.62; 0.57; 0.6; 0.56; 0.59; 0.61; 0.58; 0.6; 0.57; 0.59; 0.57; 0.54; 0.56; 0.54; 0.56; 0.57; 0.55; 0.53; 0.55; 0.53; 0.52; 0.53; 0.51; 0.5; 0.49; 0.47; 0.49; 0.5; 0.49; 0.48; 0.47; 0.48; 0.49; 0.48; 0.47; 0.46; 0.45; 0.44; 0.45; 0.44; 0.43; 0.43; 0.44; 0.45; 0.46; 0.47; 0.46; 0.47; 0.46; 0.45; 0.44; 0.44; 0.45; 0.44; 0.45; 0.44; 0.45; 0.44; 0.45; 0.44; 0.44; 0.45; 0.44; 0.45; 0.45; 0.46; 0.47; 0.48; 0.47; 0.46; 0.46; 0.46; 0.46; 0.47; 0.47; 0.48; 0.48; 0.49; 0.49; 0.5; 0.51; 0.51; 0.52; 0.51; 0.52; 0.52; 0.53; 0.52; 0.52; 0.53; 0.53; 0.54; 0.54; 0.54; 0.53; 0.54; 0.53; 0.54; 0.54; 0.54; 0.54; 0.54; 0.54; 0.53; 0.53; 0.53; 0.53; 0.53; 0.53; 0.53; 0.53; 0.52; 0.53; 0.53; 0.54; 0.54; 0.53; 0.54; 0.54; 0.54 y = 0.4572 + 0.1311(1-exp(-0.0203(x-51.8483)))2
Când amintirile...
0.83
VVAAVAAVVAAVAVAVAVVVVVVVVAAVAA 1; 1; 0.67; 0.5; 0.6; 0.5; 0.43; 0.5; 0.56; 0.5; 0.45; 0.5; 0.46; 0.5; 0.47; 0.5; 0.47; 0.5; 0.53; 0.55; 0.57; 0.59; 0.61; 0.63; 0.64; 0.62; 0.59; 0.61; 0.59; 0.57 y = 0.4615 + 0.1781(1-exp(-0.1231 (x-9.4026)))2
Ce te legeni...
VVVVVVVAVVAVAVAVVVAVVAAVVAVVAAAAVA 1; 1; 1; 1; 1; 1; 1; 0.88; 0.89; 0.9; 0.82; 0.83; 0.77; 0.79; 0.73; 0.75; 0.76; 0.78; 0.74; 0.75; 0.76; 0.73; 0.7; 0.71; 0.72; 0.69; 0.7; 0.71; 0.69; 0.67; 0.65; 0.63; 0.64; 0.62
0.85
y = 0.6397 + 36.1336(1-exp(-0.0026(x - 39.5599)))²
Crăiasa din poveşti
0.94
AAVAVVVVVAAVAAVVAAAAVVVAAVVAAVVAVA V 0; 0; 0.33; 0.25; 0.4; 0.5; 0.57; 0.63; 0.67; 0.6; 0.55; 0.58; 0.54; 0.5; 0.53; 0.56; 0.53; 0.5; 0.47; 0.45; 0.48; 0.5; 0.52; 0.5; 0.48; 0.5; 0.52; 0.5; 0.48; 0.5; 0.52; 0.5; 0.52; 0.5; 0.51 y = 0.5259(1-exp(-0.4304x))2
Criticilor mei
0.79
AVAVAVVAAVAVVVVAVVAAVAVVAVVAAVAVVV VAVAVVV 0; 0.5; 0.33; 0.5; 0.4; 0.5; 0.57; 0.5; 0.44; 0.5; 0.45; 0.5; 0.54; 0.57; 0.6; 0.56; 0.59; 0.61; 0.58; 0.55; 0.57; 0.55; 0.57; 0.58; 0.56; 0.58; 0.59; 0.57; 0.55; 0.57; 0.55; 0.56; 0.58; 0.59; 0.6; 0.58; 0.59; 0.58; 0.59; 0.6; 0.61 y = 0.5607 (1-exp(-0.6302x))2
Cu mâine zilele-ţi
AVAVVAVVVVVVVAVVAVVAVAVVAVAVVVAAAV
adaogi…
AVAVVAAAVAA
0.69
0; 0.5; 0.33; 0.5; 0.6; 0.5; 0.57; 0.63; 0.67; 0.7; 0.73; 0.75; 0.77; 0.71; 0.73; 0.75; 0.71; 0.72; 0.74; 0.7; 0.71; 0.68; 0.7; 0.71; 0.68; 0.69; 0.67; 0.68; 0.69; 0.7; 0.68; 0.66; 0.64; 0.65; 0.63; 0.64; 0.62; 0.63; 0.64; 0.63; 0.61; 0.6; 0.6; 0.59; 0.58 y = 0.6720(1-exp(-0.5178x))2 De ce nu-mi vii?
0.75
VVVVVVVAVAVVVAAVVVAVVVAAAVAVVAVV 1; 1; 1; 1; 1; 1; 1; 0.88; 0.89; 0.8; 0.82; 0.83; 0.85; 0.79; 0.73; 0.75; 0.76; 0.78; 0.74; 0.75; 0.76; 0.77; 0.74; 0.71; 0.68; 0.69; 0.67; 0.68; 0.69; 0.67; 0.68; 0.69 y = 0.6283+ 0.0580(1-exp(-0.0227(x-58.6454)))2
De-aş avea
0.92
VAAAAVAVVVAVVAAAAAVAAVVAAVVVAV 1; 0.5; 0.33; 0.25; 0.2; 0.33; 0.29; 0.38; 0.44; 0.5; 0.45; 0.5; 0.54; 0.5; 0.47; 0.44; 0.41; 0.39; 0.42; 0.4; 0.38; 0.41; 0.43; 0.42; 0.4; 0.42; 0.44; 0.46; 0.45; 0.47 y = 0.2647 + 0.1811(1-exp(-0.3763(x-3.9474)))2
De-or trece anii...
0.88
VVAVVVVVVVVAVVVVVVVAVAVVV 1; 1; 0.67; 0.75; 0.8; 0.83; 0.86; 0.88; 0.89; 0.9; 0.91; 0.83; 0.85; 0.86; 0.87; 0.88; 0.88; 0.89; 0.89; 0.85; 0.86; 0.82; 0.83; 0.83; 0.84 y = 0.7919+ 0.0759(1-exp(-0.3850(x-3.6703))2
Departe sunt de tine... A V A V A V V V A V V V A V V A A A V A A A V A V V V V V A V A A V VVVVAAVA
0.44
0; 0.5; 0.33; 0.5; 0.4; 0.5; 0.57; 0.63; 0.56; 0.6; 0.64; 0.67; 0.62; 0.64; 0.67; 0.63; 0.59; 0.56; 0.58; 0.55; 0.52; 0.5; 0.52; 0.5; 0.52; 0.54; 0.56; 0.57; 0.59; 0.57; 0.58; 0.56; 0.55; 0.56; 0.57; 0.58; 0.59; 0.61; 0.59; 0.58; 0.59; 0.57 y = 0.5759 (1-exp(-0.6225x))2
0.69
Dintre sute de catarge V V A V V V V A V V V A V 1; 1; 0.67; 0.75; 0.8; 0.83; 0.86; 0.75; 0.78; 0.8; 0.82; 0.75; 0.77 y = 0.7786 + 0.0164(1-exp(-0.3737(x-5.2560))2 Dorinţa
0.52
VVAVAVVVVVVAAAVAAAVVAAVAVAAVVVAA 1; 1; 0.67; 0.75; 0.6; 0.67; 0.71; 0.75; 0.78; 0.8; 0.82; 0.75; 0.69; 0.64; 0.67; 0.63; 0.59; 0.56; 0.58; 0.6; 0.57; 0.55; 0.57; 0.54; 0.56; 0.54; 0.52; 0.54; 0.55; 0.57; 0.55; 0.53 y = 0.4799 + 0.0006(1-exp(-0.0316(x- 104.1823))2
După ce atâta vreme
0.68
AVVVAVAVAAAVVAAVVVAVV 0; 0.5; 0.67; 0.75; 0.6; 0.67; 0.57; 0.63; 0.56; 0.5; 0.45; 0.5; 0.54; 0.5; 0.47; 0.5; 0.53; 0.56; 0.53; 0.55; 0.57 y = 0.5614(1-exp(-1.0274)2
Floare-albastră
0.54
VAVAAVAAAVAVAAVAVVVVAVVVAAAAAVVVVA VVVVVVVVVAAVVVVVAVVAAVAAVAVAAVV 1; 0.5; 0.67; 0.5; 0.4; 0.5; 0.43; 0.38; 0.33; 0.4; 0.36; 0.42; 0.38; 0.36; 0.4; 0.38; 0.41; 0.44; 0.47; 0.5; 0.48; 0.5; 0.52; 0.54; 0.52; 0.5; 0.48; 0.46; 0.45; 0.47; 0.48; 0.5; 0.52; 0.5; 0.51; 0.53; 0.54; 0.55; 0.56; 0.58; 0.59; 0.6; 0.6; 0.59; 0.58; 0.59; 0.6; 0.6; 0.61; 0.62; 0.61; 0.62; 0.62; 0.61; 0.6; 0.61; 0.6; 0.59; 0.59; 0.58; 0.59; 0.58; 0.57; 0.58; 0.58 y = 0.3651 + 0.2283(1-exp(-0.0979(x-10.2388)))2
Înger de pază
0.79
VVAAVVAAVAVAAVAVAAVAVVAV 1; 1; 0.67; 0.5; 0.6; 0.67; 0.57; 0.5; 0.56; 0.5; 0.55; 0.5; 0.46; 0.5; 0.47; 0.5; 0.47; 0.44; 0.47; 0.45; 0.48; 0.5; 0.48; 0.5 y = 0.4850 + 7.5509E-6(1-exp(-0.1950(x- 29.7388)))2
Iubind în taină
0.86
VAVVVVAVVAVVAVVAVVAAVVVVV 1; 0.5; 0.67; 0.75; 0.8; 0.83; 0.71; 0.75; 0.78; 0.7; 0.73; 0.75; 0.69; 0.71; 0.73; 0.69; 0.71; 0.72; 0.68; 0.65; 0.67; 0.68; 0.7; 0.71; 0.72 y = 0.3073 + 0.4133(1-exp(-1.9721(x - 1.4212)))2
Kamadeva
VVVAVAVVAVVAAVVVAAVVAA 1; 1; 1; 0.75; 0.8; 0.67; 0.71; 0.75; 0.67; 0.7; 0.73; 0.67; 0.62; 0.64; 0.67; 0.69; 0.65; 0.61; 0.63; 0.65; 0.62; 0.59
0.77
y = 0.6208 + 263.1377(1-exp(-0.0022(x - 17.2038)))²
La mijloc de codru…
0.81
A (outlier) V A A A V V A 0 (outlier); 0.5; 0.33; 0.25; 0.2; 0.33; 0.43; 0.38 y = 0.2466 + 0.4466(1-exp(-0.2589(x - 4.1943)))2
Lacul
0.83
AAVVAVVAVVVVAVAAVVVAAAVAVVAAVVAA 0; 0; 0.33; 0.5; 0.4; 0.5; 0.57; 0.5; 0.56; 0.6; 0.64; 0.67; 0.62; 0.64; 0.6; 0.56; 0.59; 0.61; 0.63; 0.6; 0.57; 0.55 y = 0.5853(1-exp(-0.4129x))2
La steaua
0.87
VVAVVAVAVVAVVVVVVAAVA 1; 1; 0.67; 0.75; 0.8; 0.67; 0.71; 0.63; 0.67; 0.7; 0.64; 0.67; 0.69; 0.71; 0.73; 0.75; 0.76; 0.72; 0.68; 0.7; 0.67 y = 0.6785 + 0.0473(1-exp(-0.1754(x-8.4882)))2
Lida
0.72
VAVVVAVVAVVVAAVVAVA 1; 0.5; 0.67; 0.75; 0.8; 0.67; 0.71; 0.75; 0.67; 0.7; 0.73; 0.75; 0.69; 0.64; 0.67; 0.69; 0.65; 0.67; 0.63 y = 0.0908 + 0.6068(1-exp(-2.5182(x- 1.3174)))2
Luceafărul
0.80
VVAAAAVAVAVVVVVAAVVVAVVAVVVAVAVVAV AVVVAAVAVAAVVVAV 1; 1; 0.67; 0.5; 0.4; 0.33; 0.43; 0.38; 0.44; 0.4; 0.45; 0.5; 0.54; 0.57; 0.6; 0.56; 0.53; 0.56; 0.58; 0.6; 0.57; 0.59; 0.61; 0.58;
(up to A + V = 50)
0.6; 0.62; 0.63; 0.61; 0.62; 0.6; 0.61; 0.63; 0.61; 0.62; 0.6; 0.61; 0.62; 0.63; 0.62; 0.6; 0.61; 0.6; 0.6; 0.59; 0.58; 0.59; 0.6; 0.6; 0.59; 0.6 y = 0.4163 + 0.1984(1-exp(-0.1733(x-7.0923)))2
0.86
Mai am un singur dor V V V V A A A V A V A V A V V A V V A V A A V V A V V V V V A V 1; 1; 1; 1; 0.8; 0.67; 0.57; 0.63; 0.56; 0.6; 0.55; 0.58; 0.54; 0.57; 0.6; 0.56; 0.59; 0.61; 0.58; 0.6; 0.57; 0.55; 0.57; 0.58; 0.56; 0.58; 0.59; 0.61; 0.62; 0.63; 0.61; 0.63 y = 0.5553 +0.17824(1-exp(-0.0662(x-16.2706)))2 Melancolie
VVVAVVAAAAAAVAVVAVAAAVAVVVAVVVVAAA AAVVVVAVVAAVAVVAAVVAAAVAAAVVAVVVAA AVVVVVAV 1; 1; 1; 0.75; 0.8; 0.83; 0.71; 0.63; 0.56; 0.5; 0.45; 0.42; 0.46; 0.43; 0.47; 0.5; 0.47; 0.5; 0.47; 0.45; 0.43; 0.45; 0.43; 0.46; 0.48; 0.5; 0.48; 0.5; 0.52; 0.53; 0.55; 0.53; 0.52; 0.5; 0.49; 0.47; 0.49; 0.5; 0.51; 0.53; 0.51; 0.52; 0.53; 0.52; 0.51; 0.52; 0.51; 0.52; 0.53; 0.52; 0.51; 0.52; 0.53; 0.52; 0.51; 0.5; 0.51; 0.5; 0.49; 0.48; 0.49; 0.5; 0.49; 0.5; 0.51; 0.52; 0.51; 0.5; 0.49; 0.5;
0.88
0.51; 0.51; 0.52; 0.53; 0.52; 0.53 y = 0.4675 + 0.0545(1–exp(-0.0756(x–20.6697)))2 O, ramâi
0.90
VVVAVVVVAAAAVVAVAVAVAVAVAAVAVAVVVV VVVV 1; 1; 1; 0.75; 0.8; 0.83; 0.86; 0.88; 0.78; 0.7; 0.64; 0.58; 0.62; 0.64; 0.6; 0.63; 0.59; 0.61; 0.58; 0.6; 0.57; 0.59; 0.57; 0.58; 0.56; 0.54; 0.56; 0.54; 0.55; 0.53; 0.55; 0.56; 0.58; 0.59; 0.6; 0.61; 0.62; 0.63 y = 0.5533 + 2.6690(1–exp(-0.0141(x–25.5670)))2
Odă în metru antic
0.90
VVVAAAVAAVAAAVAAVAAAVVAAVAVVVAVAVA VVAV 1; 1; 1; 1; 0.75; 0.6; 0.5; 0.57; 0.5; 0.44; 0.5; 0.45; 0.42; 0.38; 0.43; 0.4; 0.38; 0.41; 0.39; 0.37; 0.35; 0.38; 0.41; 0.39; 0.38; 0.4; 0.38; 0.41; 0.43; 0.45; 0.43; 0.45; 0.44; 0.45; 0.44; 0.46; 0.47; 0.46; 0.47 y = 0.3796 + 0.1555(1-exp(-0.0692(x-17.4689)))2
Oricâte stele
0.94
VVVVVVVVAAAVAVVVVAVAVVAAA 1; 1; 1; 1; 1; 1; 1; 1; 0.89; 0.8; 0.73; 0.75; 0.69; 0.71; 0.73; 0.75; 0.76; 0.72; 0.74; 0.7; 0.71; 0.73; 0.7; 0.67; 0.64 y = 0.6703+ 140.8362(1-exp(-0.0020(x-27.2545)))2
0.84
Pe aceeaşi ulicioară... V A (outlier) V V A V V A V A V A A V A A A V A V V V V V V V V V V V VVVAV 1; 0.5 (outlier); 0.67; 0.75; 0.6; 0.67; 0.71; 0.63; 0.67; 0.6; 0.64; 0.58; 0.54; 0.57; 0.53; 0.5; 0.47; 0.5; 0.47; 0.5; 0.52; 0.55; 0.57; 0.58; 0.6; 0.62; 0.63; 0.64; 0.66; 0.67; 0.68; 0.69; 0.7; 0.68; 0.69 y = 0.5293+1.1133(1-exp(-0.0295(x-16.5278)))2 Pe lângă plopii fără
AVVVVVAVVVVVAVVVAVVAVVVAVAAVVAVAAV
soţ...
VAVAVAVVAAVVAV
0.78
0; 0.5; 0.67; 0.75; 0.8; 0.83; 0.71; 0.75; 0.78; 0.8; 0.82; 0.83; 0.77; 0.79; 0.8; 0.81; 0.76; 0.78; 0.79; 0.75; 0.76 y = 0.7283(1 – exp(-0.8021))2 Peste vârfuri
0.69
VVAAVAAAAVVAVVA 1; 1; 0.67; 0.5; 0.6; 0.5; 0.43; 0.38; 0.33; 0.4; 0.45; 0.42; 0.46; 0.5; 0.47 y = 0.3933+0.9668(1-exp(-0.0724(x-9.3162)))2
Revedere
VVAVVAVVVAVVVVVVVAVVVVAVVAVAVVVVAA VVVAAVVAAVVV
0.91
1; 1; 0.67; 0.75; 0.8; 0.67; 0.71; 0.75; 0.78; 0.7; 0.73; 0.75; 0.77; 0.79; 0.8; 0.81; 0.82; 0.78; 0.79; 0.8; 0.81; 0.82; 0.78; 0.79; 0.8; 0.77; 0.78; 0.75; 0.76; 0.77; 0.77; 0.78; 0.76; 0.74; 0.74; 0.75; 0.76; 0.74; 0.72; 0.73; 0.73; 0.71; 0.7; 0.7; 0.71; 0.72 y = 0.7310 + 0.0316(1-exp(-0.2815(x-6.0283)))2 Sara pe deal
0.52
V A (outlier) V V V A V V V A A A V A V A A V A V V A V V V V A V V A AVVAAVAVVVVAVVVAAAV 1; 0.5 (outlier); 0.67; 0.75; 0.8; 0.67; 0.71; 0.75; 0.78; 0.7; 0.64; 0.58; 0.62; 0.57; 0.6; 0.56; 0.53; 0.56; 0.53; 0.55; 0.57; 0.55; 0.57; 0.58; 0.6; 0.62; 0.59; 0.61; 0.62; 0.6; 0.58; 0.59; 0.61; 0.59; 0.57; 0.58; 0.57; 0.58; 0.59; 0.6; 0.61; 0.6; 0.6; 0.61; 0.62; 0.61; 0.6; 0.58; 0.59 y = 0.5761+0.0529(1-exp(-0.0568(x-23.3239)))2
0.75
Se bate miezul nopţii... V V V A A V V V A V V A 1; 1; 1; 0.75; 0.6; 0.67; 0.71; 0.75; 0.67; 0.7; 0.73; 0.67 y = 0.6762 + 1.0836(1-exp(-0.0619(x- 8.5554)))2 Şi dacă...
0.74
VVVVAVVVVVVAVVVVA 1; 1; 1; 1; 0.8; 0.83; 0.86; 0.88; 0.89; 0.9; 0.91; 0.83; 0.85; 0.86; 0.87; 0.88; 0.82 y = 0.8575 + 0.0150(1-exp(-0.1162(x-13.7304)))2
Singurătate
0.57
AVVVVAVAAAVAAVAVVVAAVVVVVVAAVVAAVV VVVAAVAAVVAVVVVAAAA 0; 0.5; 0.67; 0.75; 0.8; 0.67; 0.71; 0.63; 0.56; 0.5; 0.55; 0.5; 0.46; 0.5; 0.47; 0.5; 0.53; 0.56; 0.53; 0.5; 0.52; 0.55; 0.57; 0.58; 0.6; 0.62; 0.59; 0.57; 0.59; 0.6; 0.58; 0.56; 0.58; 0.59; 0.6; 0.61; 0.62; 0.61; 0.59; 0.6; 0.59; 0.57; 0.58; 0.59; 0.58; 0.59; 0.6; 0.6; 0.61; 0.6; 0.59; 0.58; 0.57 y = 0.5840(1-exp(-1.0126x))2
Somnoroase
AVVAVAVVVAVVVAAVAVA
păsărele…
0; 0.5; 0.67; 0.5; 0.6; 0.5; 0.57; 0.63; 0.67; 0.6; 0.64; 0.67;
0.48
0.69; 0.64; 0.6; 0.63; 0.59; 0.61; 0.58 y = 0.6209(1-exp(-0.8234x))2 Sonet I
0.74
VAVAVAVVAVVVVVVVVVAVVAVAAV 1; 0.5; 0.67; 0.5; 0.6; 0.5; 0.57; 0.63; 0.56; 0.6; 0.64; 0.67; 0.69; 0.71; 0.73; 0.75; 0.76; 0.78; 0.74; 0.75; 0.76; 0.73; 0.74; 0.71; 0.68; 0.69 y = 0.5101 + 0.2251(1– exp(-0.2701(x – 4.2365)))2
Sonet II
VVAVVVAAVAVVVVAVVAVVVAVVV
0.74
1; 1; 0.67; 0.75; 0.8; 0.83; 0.71; 0.63; 0.67; 0.6; 0.64; 0.67; 0.69; 0.71; 0.67; 0.69; 0.71; 0.67; 0.68; 0.7; 0.71; 0.68; 0.7; 0.71; 0.72 y = 0.6680 + 0.0648(1– exp(-0.1182(x – 11.0558)))2
0.71
VVAVVAVVAVAAVVVVVAAAVVVVVAAA
Sonet III
1; 1; 0.67; 0.75; 0.8; 0.67; 0.71; 0.75; 0.67; 0.7; 0.64; 0.58; 0.62; 0.64; 0.67; 0.69; 0.71; 0.67; 0.63; 0.6; 0.62; 0.64; 0.65; 0.67; 0.68; 0.65; 0.63; 0.61 y = 0.6494 + 0.0012(1– exp(-0.1739(x – 17.7071)))2
0.75
VAVVVVAAAVVVVAVVAVAVV
Trecut-au anii...
1; 0.5; 0.67; 0.75; 0.8; 0.83; 0.71; 0.63; 0.56; 0.6; 0.64; 0.67; 0.69; 0.64; 0.67; 0.69; 0.65; 0.67; 0.63; 0.65; 0.67 y = 0.6639+ 3.9492E-6(1– exp(-0.6199(x – 9.9793)))2
0.34
Some of the fitting results are not satisfactory and another reasonable function must be found. As shown in Chapter 3.3 (Figure 3.3.3) and Chapter 4 (in many figures), La mijloc de codru…, Replici and some other poems are outliers from different points of view. (3) The second dynamic view is the study of occurrences of A up to a given V (or the other way round). The counting is simple and can be performed directly from sequence (I). Writing the sequence again, we note the rank of V above and that of A below the sequence:

rank of V:  1 2 · · 3 · · · · · 4 5 · 6 7 ·
(I)         V V A A V A A A A A V V A V V A
rank of A:  · · 1 2 · 3 4 5 6 7 · · 8 · · 9
The result can be collected in Table 3.5.5.

Table 3.5.5: Number of A's up to the xth V in the poem Peste vârfuri
xth V           1   2   3   4   5   6   7
number of A's   0   0   2   7   7   8   8

The last A could be written before an imaginary last V, i.e. we could add V = 8 and A = 9, but we shall refrain from this method.
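The counting described above amounts to one pass through the sequence. A minimal sketch (the function name is ours; the sequence is again assumed to be coded as a string of V and A) that reproduces Table 3.5.5:

```python
def a_count_up_to_each_v(seq):
    """For every V in the sequence, the number of A's that precede it."""
    counts, n_a = [], 0
    for symbol in seq:
        if symbol == "A":
            n_a += 1
        elif symbol == "V":
            counts.append(n_a)
    return counts


seq = "VVAAVAAAAAVVAVVA"   # sequence (I) of Peste vârfuri
table_3_5_5 = a_count_up_to_each_v(seq)
print(table_3_5_5)
```

Because A's are only ever added, the resulting list can never decrease, whatever the poem. The final A of the sequence is not counted, since no V follows it.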
This sequence is always non-decreasing and it is simple to conjecture that most probably such a sequence follows the simple power function. For the poem Peste vârfuri we obtain y = 0.7732x^1.2778 with R² = 0.83. The results for some other poems are presented in Table 3.5.6.

Table 3.5.6: Cumulative A up to the xth V fitted by the power function y = ax^b

Poem title                        a        b        R2
Adânca mare…                      3.8579   0.4867   0.93
Atât de fragedă...                1.1766   0.8989   0.99
Călin, Gazel                      0.4846   0.6914   0.77
Călin, I                          0.6984   1.0866   0.99
Călin, II                         0.3144   1.4624   0.93
Călin, III                        0.7729   0.982    0.96
Călin, IV                         1.8906   0.745    0.91
Călin, V                          0.2002   1.4472   0.97
Călin, VI                         0.1293   1.3332   0.89
Călin, VII                        0.9704   0.908    0.97
Călin, VIII                       1.9928   0.809    0.95
Când amintirile...                1.5127   0.7027   0.87
Ce te legeni...                   0.0089   2.3251   0.96
Crăiasa din poveşti               0.6629   1.1329   0.96
Criticilor mei                    1.0783   0.8530   0.98
Cu mâine zilele-ţi adaogi…        0.0938   1.5603   0.94
De ce nu-mi vii?                  0.0146   2.1403   0.97
De-aş avea                        1.2942   0.9771   0.93
De-or trece anii...               0.0709   1.3042   0.87
Departe sunt de tine...           0.7638   0.9823   0.94
Dintre sute de catarge            0.1253   1.3383   0.90
Dorinţa                           0.0692   1.8925   0.95
După ce atâta vreme               0.6630   1.0942   0.92
Floare-albastră                   2.5040   0.6238   0.93
Înger de pază                     0.4745   1.3323   0.97
Iubind în taină                   0.2169   1.2432   0.96
Kamadeva                          0.1256   1.5874   0.96
La mijloc de codru…               1.6444   0.8979   0.76
Lacul                             0.4542   1.1813   0.94
La steaua                         0.4152   0.9691   0.89
Lida                              0.1738   1.4101   0.95
Luceafărul, up to the 30th verb   0.9248   0.8868   0.96
Mai am un singur dor              0.6197   1.0174   0.94
Melancolie                        0.9711   0.9923   0.98
O, ramâi                          0.4382   1.1477   0.92
Odă în metru antic                1.4400   0.9394   0.93
Oricâte stele                     0.0196   2.1031   0.89
Pe aceeaşi ulicioară...           1.5363   0.6496   0.80
Pe lângă plopii fără soţ...       0.0286   1.8667   0.99
Peste vârfuri                     0.7732   1.2778   0.83
Revedere                          0.0605   1.5292   0.96
Sara pe deal                      0.6783   0.9870   0.96
Se bate miezul nopţii...          0.1570   1.4671   0.85
Şi dacă...                        0.0639   1.3417   0.87
Singurătate                       1.0030   0.8764   0.96
Somnoroase păsărele…              0.6075   0.9674   0.91
Sonet I                           0.9727   0.6304   0.80
Sonet II                          0.4217   1.0003   0.95
Sonet III                         0.3329   1.1638   0.94
Trecut-au anii...                 0.3306   1.1698   0.93
The parameters cannot be generalised, they vary considerably. Not all of the fitting results are adequate but we see that there is a certain regularity expressed by the simple power relationship between the fitting parameters a and b, see Figure 3.5.4.
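For a single poem, the quality of the power function can be checked by evaluating y = ax^b with the tabulated parameters against the counts of Table 3.5.5. The sketch below does this for Peste vârfuri; it evaluates the reported curve and does not re-estimate a and b. The function names are ours and R² = 1 − SSres/SStot is assumed.

```python
def power_function(x, a, b):
    """Power function y = a * x**b used in Table 3.5.6."""
    return a * x ** b


def r_squared(observed, predicted):
    """Determination coefficient, assuming R2 = 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return 1 - ss_res / ss_tot


# Table 3.5.5: number of A's up to the 1st, ..., 7th V in Peste vârfuri
observed = [0, 0, 2, 7, 7, 8, 8]
a, b = 0.7732, 1.2778          # parameters reported for Peste vârfuri

predicted = [power_function(x, a, b) for x in range(1, len(observed) + 1)]
r2 = r_squared(observed, predicted)
print(round(r2, 2))
```

Since the observed counts start with zeros, a fit through logarithms of both variables is not directly possible for this poem; any re-estimation would have to work on the untransformed values.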
3.5.3 Runs
Regularities are phenomena that are frequently followed unconsciously or placed intentionally, for example meter, rhyme, etc. Some regularities are text-sort bound or language inherent but they may also arise ad hoc, spontaneously, as a consequence of style, of some mood or psychic state of the author, etc. In that case one does not follow an accepted scheme but creates tendencies. Tendencies may arise spontaneously in order to give the text a certain colouring. They are, so to say, counterparts of regularities. They may be present in text to a certain extent which may be considered significant or they may be missing (being non-significant). Out of the many possibilities to study tendencies, we shall examine here the existence of runs. Our data directly invite this point of view. If we consider a sequence of binary data
(I) V V A A V A A A A A V V A V V A
as shown above in (I), we may ask:
(1) Is there a significant alternation of the binary categories? The hypothetical example AVAVAVAVAVAVAVAA illustrates an extremely regular case.
(2) Is there a tendency to separate the active and the descriptive part? A quite extreme case with separation of the categories would be VVVVVVVAAAAAAAAA.
(3) Are there significantly long sequences of the same category?
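All three questions start from the same raw material, namely the runs of the sequence. A minimal sketch of how they can be extracted (the function name is ours; itertools.groupby from the Python standard library groups consecutive equal symbols):

```python
from itertools import groupby


def runs(seq):
    """Split a sequence into runs, returned as (symbol, length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(seq)]


seq = "VVAAVAAAAAVVAVVA"                  # sequence (I)
run_list = runs(seq)
number_of_runs = len(run_list)
longest_run = max(run_list, key=lambda run: run[1])

print(run_list)
print(number_of_runs, longest_run)

# The two hypothetical extreme cases mentioned above
alternating = len(runs("AVAVAVAVAVAVAVAA"))   # many short runs
separated = len(runs("VVVVVVVAAAAAAAAA"))     # only two long runs
print(alternating, separated)
```

A strongly alternating sequence yields many runs and a separated one very few, so the number of runs and the length of the longest run are the natural quantities to test.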
Figure 3.5.4. The relationship between the fitting parameters a and b
3.5.3.1 Sequential dependence
Since adjectives and verbs are predicates of nouns, we may ask whether there is any dependence in the transition from an A to a V and from a V to an A respectively. The quickest way is to count all transitions and place them in a contingency table, with the first element of each transition in the rows and the second in the columns. For the sequence (I) the contingency table is as follows:

      A   V
A     5   3
V     4   3

The cells can be called nAA = 5, nAV = 3, nVA = 4, nVV = 3. The number of transitions nt is equal to the number of symbols minus 1, here nt = 15, representing the sum of all numbers in the contingency table. In order to test whether there is any tendency we perform the chi-square test in the form
(3.5.14) $x^{2} = \dfrac{n_t\left(|n_{AA}n_{VV} - n_{AV}n_{VA}| - \dfrac{n_t}{2}\right)^{2}}{(n_{AA} + n_{AV})(n_{AA} + n_{VA})(n_{AV} + n_{VV})(n_{VA} + n_{VV})}$.
In the denominator we see the sums of rows and columns. In our case we obtain
$x^{2} = \dfrac{15\left(|5(3) - 3(4)| - \dfrac{15}{2}\right)^{2}}{8(9)6(7)} = 0.10$.
This chi-square statistic with 1 degree of freedom is not significant (the critical value is 3.84), hence there is no dependence in the transition between the categories. Another test based on runs yields the same result. Runs can be computed from the empirical sequence as uninterrupted sequences of the same symbol, here A or V. For example, in the above sequence (I) we find rA = 4, rV = 4, yielding r = 8, that is, four runs of A's and four runs of V's, together 8 runs. The number of individual letters is n = 16, out of which nA = 9, nV = 7. We test asymptotically whether the number of runs (r) differs significantly from its expectation using the normal test given as
(3.5.15) $u = \dfrac{r - E(r)}{\sigma_r}$,
where the expectation is
(3.5.16) $E(r) = 1 + \dfrac{2 n_A n_V}{n}$,
in our case E(r) = 1 + 2(9)7/16 = 8.8750. The standard deviation is given as
(3.5.17) $\sigma_r = \sqrt{\dfrac{2 n_A n_V (2 n_A n_V - n)}{n^{2}(n - 1)}}$,
in our example
$\sigma_r = \sqrt{\dfrac{2(9)7\,[2(9)7 - 16]}{16^{2}(16 - 1)}} = 1.8998$.
Inserting these values in (3.5.15) we obtain u = (8 – 8.8750)/1.8998 = −0.46, which is also non-significant (the critical value is u = ±1.96). Hence, the runs are distributed randomly. There is no tendency to form too many or too few sequences of the same symbol (A or V) in this poem. Using the exact test of Cox (cf. Cox 1958; Maxwell 1961: 137; Bortz, Lienert, Boehnke 1990: 563), one defines the numbers in the contingency table as follows:
nA = 9 = number of A's
nV = 7 = number of V's
n = nA + nV = 16 = number of A's and V's
rA = 4 = number of runs of A's
rV = 4 = number of runs of V's
r = rA + rV = 4 + 4 = 8 = number of total runs
rV = n – nA – rA + 1 = 16 – 9 – 4 + 1 = 4
rrA = nA – rA = 9 – 4 = 5
rrV = rA – 1 = 4 – 1 = 3
rrV = nV – rV = 7 – 4 = 3
rr = rrA + rrV = 5 + 3 = 8
Thus we obtain the contingency table

rA = 4    rrA = 5    nA = 9
rV = 4    rrV = 3    nV = 7
r = 8     rr = 8     n = 16

in which the marginal numbers are the sums. The probability of such an event can be computed according to
(3.5.18) $P(r_A) = \dfrac{\dbinom{r_A + r_V}{r_A}\dbinom{rr_A + rr_V}{rr_A}}{\dbinom{n}{r_A + rr_A}} = \dfrac{(r_A + r_V)!\,(rr_A + rr_V)!\,(r_A + rr_A)!\,(r_V + rr_V)!}{r_A!\,r_V!\,rr_A!\,rr_V!\,n!}$.
Inserting our values in the last formula we obtain
$P(r_A) = \dfrac{(4 + 4)!\,(5 + 3)!\,(4 + 5)!\,(4 + 3)!}{4!\,4!\,5!\,3!\,16!} = 0.3427$,
that means, the probability of this event is not significantly small, it is an event without a tendency. The results for some other poems are presented in Table 3.5.7. Here the critical values are $\chi^2 > 3.84$, $|u| > 1.96$, $P < 0.05$.

Table 3.5.7: Tests for significance of sequences

| Poem title | Dependence χ² (1 DF) | Runs u | Cox P |
|---|---|---|---|
| Atât de fragedă.... | 1.71 | 1.46 | 0.08 |
| Călin, Gazel | 0.16 | 0.14 | 0.47 |
| Călin, I | 0.39 | 0.73 | 0.12 |
| Călin, II | 0.09 | 0.65 | 0.42 |
| Călin, III | 0.0005 | 0.40 | 0.19 |
| Călin, IV | 0.42 | 0.97 | 0.10 |
| Călin, V | 0.49 | -1.05 | 0.08 |
| Călin, VI | 0.33 | 0.31 | 0.47 |
| Călin, VII | 2.50 | 1.67 | 0.04 |
| Călin, VIII | 0.74 | 0.95 | 0.09 |
| Ce te legeni?... | 0.03 | 0.35 | 0.27 |
| Crăiasa din poveşti | 0.12 | -0.17 | 0.26 |
| Criticilor mei | 2.78 | 1.83 | 0.05 |
| Cu mâine zilele-ţi adaogi… | 1.35 | 1.25 | 0.07 |
| De ce nu-mi vii? | 0.05 | 0.11 | 0.32 |
| De-aş avea | 0.06 | -0.35 | 0.22 |
| De-or trece anii... | 0.06 | 1.01 | 0.58 |
| Departe sunt de tine... | 0.12 | 0.46 | 0.18 |
| Dintre sute de catarge | 0.15 | 1.17 | 0.58 |
| Dorinţa | 0.04 | -0.34 | 0.27 |
| După ce atâta vreme | 0.008 | 0.33 | 0.33 |
| Floare-albastră | 0.003 | -0.40 | 0.17 |
| Înger de pază | 2.17 | 1.67 | 0.16 |
| Iubind în taină | 0.29 | 0.98 | 0.34 |
| Kamadeva | 0.004 | 0.16 | 0.34 |
| Lacul | 0.02 | 0.02 | 0.26 |
| La mijloc de codru… | 0.11 | 0.21 | 0.43 |
| La steaua | 0.38 | 0.85 | 0.26 |
| Lida | 0.73 | 1.10 | 0.20 |
| Luceafărul (up to A + V = 50) | 0.97 | 1.19 | 0.15 |
| Mai am un singur dor | 0.75 | 1.15 | 0.21 |
| Melancolie | 0.32 | -0.90 | 0.10 |
| O, ramâi | 0.31 | 0.82 | 0.24 |
| Odă în metru antic | 0.75 | 1.01 | 0.21 |
| Oricâte stele | 0.20 | -1.12 | 0.21 |
| Pe aceeaşi ulicioară... | 0.002 | 0.37 | 0.31 |
| Pe lângă plopii fără soţ... | 2.15 | 1.61 | 0.06 |
| Peste vârfuri | 0.10 | -0.46 | 0.34 |
| Revedere | 0.03 | 0.50 | 0.28 |
| Sara pe deal | 0.24 | 0.70 | 0.21 |
| Se bate miezul nopţii... | 0.33 | -0.23 | 0.51 |
| Singurătate | 0.46 | -1.14 | 0.16 |
| Somnoroase păsărele… | 1.47 | 1.33 | 0.07 |
| Sonet I | 0.95 | 1.38 | 0.23 |
| Sonet II | 0.29 | 0.98 | 0.34 |
| Sonet III | 0.12 | -0.95 | 0.22 |
| Şi dacă... | 0.06 | 0.05 | 0.67 |
| Trecut-au anii... | 0.002 | 0.34 | 0.39 |
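The runs test of (3.5.15) to (3.5.17) and Cox's exact probability (3.5.18) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' own procedure: the function names are hypothetical, and the string `demo` is a made-up sequence that merely shares the counts reported for sequence (I) (nine A's, seven V's, four runs of each); it is not the poem's actual sequence.

```python
from math import comb, sqrt


def count_runs(seq):
    """Return (r_A, r_V): the number of runs of 'A' and of 'V' in seq."""
    runs = {"A": 0, "V": 0}
    previous = None
    for symbol in seq:
        if symbol != previous:      # a new run starts whenever the symbol changes
            runs[symbol] += 1
        previous = symbol
    return runs["A"], runs["V"]


def runs_u(n_a, n_v, r):
    """Normal criterion u of (3.5.15), using E(r) and sigma_r of (3.5.16)-(3.5.17)."""
    n = n_a + n_v
    expected = 1 + 2 * n_a * n_v / n
    sigma = sqrt(2 * n_a * n_v * (2 * n_a * n_v - n) / (n ** 2 * (n - 1)))
    return (r - expected) / sigma


def cox_probability(n_a, n_v, r_a, r_v):
    """Exact probability (3.5.18), written with binomial coefficients."""
    rr_a = n_a - r_a                # A's that continue a run
    rr_v = n_v - r_v                # V's that continue a run
    return (comb(r_a + r_v, r_a) * comb(rr_a + rr_v, rr_a)
            / comb(n_a + n_v, r_a + rr_a))


# Hypothetical sequence with the same counts as sequence (I) in the text.
demo = "AAAAAVVAAVVAVVAV"
r_a, r_v = count_runs(demo)
n_a, n_v = demo.count("A"), demo.count("V")

print(round(runs_u(n_a, n_v, r_a + r_v), 2))           # -0.46, as in the text
print(round(cox_probability(n_a, n_v, r_a, r_v), 4))   # 0.3427, as in the text
```

Both values only depend on the counts, so any sequence with the same $n_A$, $n_V$, $r_A$ and $r_V$ gives the same result.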
3.5.3.2 Run length

The individual runs in our sequence (I) are not equally long. Some consist of one element, some of two, etc. But there is a run of A's consisting of five elements. Since this is the longest run, we may ask whether a run of such a length is random or has been produced non-randomly (e.g. intentionally, an assumption that cannot be answered qualitatively). In order to find the probability of the longest run, we perform Mood's test (1940). Again, we choose the following symbols:

$$\begin{aligned}
n &= \text{number of elements} = 16 \\
n_A &= \text{number of A's} = 9 \\
n_V &= \text{number of V's} = 7 \\
s &= \text{the longest run} = 5 \ \text{(this is the run of A's)}.
\end{aligned}$$

The exceedance probability, i.e. the probability of a certain length $s$, can be computed using Bradley's (1968: 256) formula

$$P(s) = \frac{\dbinom{n_V + 1}{1}\dbinom{n - s}{n_V} - \dbinom{n_V + 1}{2}\dbinom{n - 2s}{n_V} + \dbinom{n_V + 1}{3}\dbinom{n - 3s}{n_V} - \dots}{\dbinom{n}{n_V}}. \tag{3.5.19}$$

In our example we obtain

$$P(s) = \frac{\dbinom{7 + 1}{1}\dbinom{16 - 5}{7}}{\dbinom{16}{7}} = 0.2308,$$

which is not significant. Our expression in the numerator contains only one element because already the second one contains

$$\binom{n - 2s}{n_V} = \binom{16 - 2(5)}{7} = \binom{6}{7} = 0$$

by definition. Hence, in the given poem there is no tendency to place long descriptive sequences. If $n > 30$, one can perform the test asymptotically by means of the Poisson distribution computing

$$P(s) = 1 - e^{-\lambda}, \tag{3.5.20}$$

where $\lambda$ is the parameter of the Poisson distribution computed as

$$\lambda = n_V \left(\frac{n_A}{n}\right)^{s}. \tag{3.5.21}$$

Even if our $n$ is smaller, we obtain with our data $\lambda = 7(9/16)^5 = 0.3942$, hence $P(s) = 1 - 2.7183^{-0.3942} = 0.3258$, which is even greater than the result of the exact Mood test and does not indicate any significance.
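As a hedged illustration, the exact exceedance probability (3.5.19) and its Poisson approximation (3.5.20) to (3.5.21) can be computed as follows. The function names are assumptions introduced here; the alternating sum is stopped as soon as the binomial coefficient $\binom{n - js}{n_V}$ vanishes, which is the situation described in the text for the second term.

```python
from math import comb, exp


def longest_run_exact(n_a, n_v, s):
    """Alternating sum of (3.5.19) for a longest run of A's of length s."""
    n = n_a + n_v
    numerator = 0
    j = 1
    # Terms with n - j*s < n_v are zero by definition, so the loop stops there.
    while j <= n_v + 1 and n - j * s >= n_v:
        sign = 1 if j % 2 == 1 else -1
        numerator += sign * comb(n_v + 1, j) * comb(n - j * s, n_v)
        j += 1
    return numerator / comb(n, n_v)


def longest_run_poisson(n_a, n_v, s):
    """Asymptotic version: P(s) = 1 - exp(-lambda), lambda as in (3.5.21)."""
    lam = n_v * (n_a / (n_a + n_v)) ** s
    return 1 - exp(-lam)


# Values of the worked example: n_A = 9, n_V = 7, longest run s = 5.
print(round(longest_run_exact(9, 7, 5), 4))     # 0.2308
print(round(longest_run_poisson(9, 7, 5), 4))   # 0.3258
```

The exact variant uses integer arithmetic up to the final division, so it stays accurate for the short sequences considered here.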
3.5.3.3 Placing tendency

In the previous sections we studied the existence or non-existence of some structures, the greatest length of the run, dependence of the transitions, etc. However, the categories A and V may also display a tendency to increase from the beginning to the end of the poem. In that case we speak about a climax of a certain category. There may be climaxes also in individual verses, e.g. a length climax, as has been observed in Malay folk poetry (cf. Altmann, Štukovský 1965), but in our case the number of A's and V's in one verse is very small and we must take into account the whole poem. Let us illustrate the procedure of finding a tendency using the well-known Mann-Whitney U-test (Mann, Whitney 1947; Gibbons 1971). We ask whether there is a tendency to apply a category (A or V) more often at the beginning or toward the end of the poem. To this end, we write the sequence of the poem Călin, Gazel as presented in Table 3.5.4 and ascribe ranks to the positions. We obtain

| Category | V | V | A | V | A | V | V | V | V | V | V | V | V | A | V | V | A | A |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
Now, let the sums of ranks of A and V be

$$\begin{aligned}
S_A &= 3 + 5 + 14 + 17 + 18 = 57 \\
S_V &= 1 + 2 + 4 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 15 + 16 = 114.
\end{aligned}$$

The number of A's is $n_A = 5$ and the number of V's is $n_V = 13$, thus $n = n_A + n_V = 5 + 13 = 18$. Now we compute the criterion

$$U_A = n_A n_V + \frac{n_A (n_A + 1)}{2} - S_A, \qquad U_V = n_A n_V + \frac{n_V (n_V + 1)}{2} - S_V,$$

for which we obtain in our example

$$\begin{aligned}
U_A &= 5(13) + 5(5 + 1)/2 - 57 = 65 + 15 - 57 = 23 \\
U_V &= 5(13) + 13(13 + 1)/2 - 114 = 65 + 91 - 114 = 156 - 114 = 42.
\end{aligned}$$

Now, the smaller of the resulting numbers, here $U_A = 23$, will be used as criterion and we look up its critical value in the appropriate tables (cf. e.g. Owen 1962; Bortz, Lienert, Boehnke 1990: 669, Table 6). If the observed value is smaller than the critical value, one can accept the existence of a climax tendency. If it is greater, the hypothesis of no tendency can be accepted. For our case, $U_A = 23$ is greater than the critical value (16) at the 0.05 level, hence there is no tendency. Fortunately, for greater $n_A$, $n_V$ one can perform the test asymptotically using the normal criterion. To this end we test

$$u = \frac{|U - E(U)| - 0.5}{\sigma_U}, \tag{3.5.22}$$

where the expectation is

$$E(U) = \frac{n_A n_V}{2} \tag{3.5.23}$$

and the standard deviation is

$$\sigma_U = \sqrt{\frac{n_A n_V (n_A + n_V + 1)}{2}}. \tag{3.5.24}$$
For our example we obtain $U_A = 23$, $E(U) = 5(13)/2 = 32.5$, $\sigma_U = \sqrt{5(13)(5 + 13 + 1)/2} = 24.8495$, hence $u = (|23 - 32.5| - 0.5)/24.8495 = 0.3622$, which is not significant at the 0.05 level and testifies to the non-existence of a tendency. Performing this test for some other poems, we obtain the results presented in Table 3.5.8.

Table 3.5.8: Positional descriptiveness/activity tendencies in some poems

| Poem title | SA | SV | UA | UV | E(U) | σU | u |
|---|---|---|---|---|---|---|---|
| Adânca mare… | 162 | 163 | 97 | 57 | 77 | 44.74 | 0.44 |
| Atât de fragedă... | 421 | 614 | 289 | 211 | 250 | 107.24 | 0.36 |
| Când amintirile... | 198 | 267 | 114 | 107 | 110.5 | 58.53 | 0.05 |
| Călin, Gazel | 57 | 114 | 23 | 42 | 32.5 | 24.85 | 0.36 |
| Călin, I | 2194 | 2366 | 1141 | 1113 | 1127 | 328.93 | 0.04 |
| Călin, II | 45 | 60 | 24 | 24 | 24 | 18.97 | 0.03 |
| Călin, III | 938 | 1340 | 560 | 532 | 546 | 192.69 | 0.07 |
| Călin, IV | 982 | 1868 | 833 | 517 | 675 | 226.5 | 0.70 |
| Călin, V | 1331 | 1370 | 629 | 701 | 665 | 221.84 | 0.16 |
| Călin, VI | 58 | 152 | 32 | 43 | 37.5 | 28.06 | 0.18 |
| Călin, VII | 2820 | 4683 | 1908 | 1644 | 1776 | 467.38 | 0.28 |
| Călin, VIII | 3766 | 5012 | 2456 | 1875 | 2165.5 | 536.67 | 0.54 |
| Ce te legeni?... | 293 | 302 | 71 | 202 | 136.5 | 69.12 | 0.94 |
| Crăiasa din poveşti | 301 | 329 | 158 | 148 | 153 | 74.22 | 0.06 |
| Criticilor mei | 301 | 560 | 235 | 165 | 200 | 91.65 | 0.38 |
| Cu mâine zilele-ţi adaogi… | 515 | 520 | 169 | 325 | 247 | 106.59 | 0.73 |
| De ce nu-mi vii? | 195 | 333 | 80 | 140 | 110 | 60.25 | 0.49 |
| De-aş avea | 231 | 234 | 129 | 95 | 112 | 58.92 | 0.28 |
| De-or trece anii... | 57 | 268 | 37 | 47 | 42 | 33.05 | 0.14 |
| Departe sunt de tine... | 385 | 518 | 218 | 214 | 216 | 96.37 | 0.02 |
| Dintre sute de catarge | 23 | 68 | 13 | 17 | 15 | 14.49 | 0.10 |
| Dorinţa | 281 | 247 | 94 | 161 | 127.5 | 64.87 | 0.51 |
| După ce atâta vreme | 91 | 140 | 62 | 46 | 54 | 34.47 | 0.22 |
| Floare-albastră | 828 | 1317 | 576 | 450 | 513 | 184.0 | 0.34 |
| Înger de pază | 150 | 150 | 72 | 72 | 72 | 42.43 | -0.01 |
| Iubind în taină | 87 | 238 | 67 | 59 | 63 | 40.47 | 0.09 |
| Kamadeva | 122 | 131 | 40 | 77 | 58.5 | 36.68 | 0.49 |
| Lacul | 265 | 263 | 110 | 145 | 127.5 | 64.87 | 0.26 |
| La mijloc de codru… | 21 | 15 | 9 | 6 | 7.5 | 8.22 | 0.12 |
| La steaua | 86 | 145 | 40 | 58 | 49 | 32.83 | 0.26 |
| Lida | 80 | 110 | 32 | 52 | 42 | 28.98 | 0.33 |
| Luceafărul (up to nA + nV = 50) | 499 | 776 | 311 | 289 | 300 | 123.7 | 0.08 |
| Mai am un singur dor | 185 | 343 | 133 | 107 | 120 | 62.93 | 0.20 |
| Melancolie | 1334 | 1592 | 772 | 668 | 720 | 235.46 | 0.22 |
| O, ramâi | 250 | 491 | 191 | 145 | 168 | 80.94 | 0.28 |
| Odă în metru antic | 362 | 379 | 208 | 152 | 180 | 83.79 | 0.33 |
| Oricâte stele | 153 | 172 | 36 | 108 | 72 | 43.27 | 0.82 |
| Pe aceeaşi ulicioară... | 151 | 479 | 179 | 85 | 132 | 68.93 | 0.67 |
| Pe lângă plopii fără soţ... | 478 | 698 | 202 | 325 | 263.5 | 113.63 | 0.54 |
| Peste vârfuri | 76 | 60 | 32 | 31 | 31.5 | 23.14 | 0 |
| Revedere | 343 | 738 | 177 | 252 | 214.5 | 100.41 | 0.37 |
| Sara pe deal | 506 | 719 | 284 | 296 | 290 | 120.42 | 0.05 |
| Se bate miezul nopţii... | 30 | 48 | 12 | 20 | 16 | 14.42 | 0.24 |
| Singurătate | 642 | 789 | 324 | 366 | 345 | 136.49 | 0.15 |
| Somnoroase păsărele… | 86 | 104 | 38 | 50 | 44 | 29.66 | 0.19 |
| Sonet I | 111 | 240 | 69 | 75 | 72 | 44.09 | 0.06 |
| Sonet II | 83 | 242 | 71 | 55 | 63 | 40.47 | 0.19 |
| Sonet III | 179 | 227 | 74 | 113 | 93.5 | 52.07 | 0.36 |
| Şi dacă... | 34 | 119 | 14 | 28 | 21 | 19.44 | 0.33 |
| Trecut-au anii... | 76 | 155 | 50 | 48 | 49 | 32.83 | 0.02 |
No analyzed poem displays the given tendency; we can conjecture that the placing of A and V in texts follows rather the grammatical order than a latent mechanism.
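The placing test of this section can be retraced with a short Python sketch, given here only as an illustration. The function name, the returned dictionary and the parameter `denom` are assumptions introduced for the example. `denom = 2` follows the denominator printed in (3.5.24) and reproduces the numbers of the worked example; the usual textbook form of the Mann-Whitney variance uses 12 instead, which can be tried by passing `denom=12`.

```python
from math import sqrt


def placing_test(seq, denom=2):
    """Rank sums, U_A, U_V and the normal criterion u of (3.5.22)-(3.5.24).

    seq is a string of 'A' and 'V'; ranks are the 1-based positions.
    denom=2 mirrors (3.5.24) as printed; 12 is the usual textbook value.
    """
    n_a, n_v = seq.count("A"), seq.count("V")
    s_a = sum(rank for rank, sym in enumerate(seq, start=1) if sym == "A")
    s_v = sum(rank for rank, sym in enumerate(seq, start=1) if sym == "V")
    u_a = n_a * n_v + n_a * (n_a + 1) / 2 - s_a
    u_v = n_a * n_v + n_v * (n_v + 1) / 2 - s_v
    expected = n_a * n_v / 2
    sigma = sqrt(n_a * n_v * (n_a + n_v + 1) / denom)
    # The smaller of U_A and U_V serves as the criterion, with continuity correction.
    u = (abs(min(u_a, u_v) - expected) - 0.5) / sigma
    return {"S_A": s_a, "S_V": s_v, "U_A": u_a, "U_V": u_v,
            "E(U)": expected, "sigma_U": sigma, "u": u}


# Sequence of "Călin, Gazel" as written out with its ranks in the text.
gazel = "VVAVAVVVVVVVVAVVAA"
result = placing_test(gazel)
print(result["S_A"], result["S_V"])                     # 57 114
print(result["U_A"], result["U_V"])                     # 23.0 42.0
print(round(result["sigma_U"], 2), round(result["u"], 4))   # 24.85 0.3622
```

A quick consistency check for any row of Table 3.5.8 is that $U_A + U_V = n_A n_V = 2E(U)$ and $S_A + S_V = n(n + 1)/2$.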
4 The control cycle

We consider language as a complex network of interrelated units and properties, in which each element is connected to every other, directly or indirectly. Isolated units or properties without any bond to at least one other element do not exist. Many properties are, as in nature, involved in the processes of self-regulation and self-organisation. In analogy to Köhler's (1986, 2005) complex synergetic control cycle, which links levels, units, properties of language, and requirements of language users, we can try to set up the cycle of text properties studied up to now. The situation is advantageous because we have at our disposal a homogeneous collection of 146 texts written by the same author in the same form: rhymed poems. Of course, this homogeneity is disturbed by some outliers, but we can draw consequences, even if only local ones, better than from texts of different sorts written in different languages, where a number of boundary conditions would have to be taken into account. The resulting cycle is merely a small cutting from the infinite world of texts. We consider the following indicators:

1. the relative entropy of word forms, Hrel;
2. the relative repeat rate (McIntosh) of word forms, RRrel;
3. the Ord indicators, I and S;
4. the lambda indicator Λ;
5. Gini's indicator R4;
6. the geometric indicators A, B, α radians both for rank-frequencies and spectra (αS radians);
7. the richness indicators based on cumulative frequencies, R1 and R2.

All of them have been computed in the previous chapters. For convenience, we present all these data in Table 4.1. Needless to say, the number of individual indicators proposed by other researchers up to now is considerable; we restrict ourselves to those considered in our present investigation. In order to state whether an indicator is linked with another, we simply compute the well-known correlation coefficient

$$r = \frac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{\left[\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2 \sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2\right]^{1/2}}, \tag{4.1}$$

a procedure that can be performed by means of Excel, and test its significance using the t-test with $n - 2 = 146 - 2 = 144$ degrees of freedom. Since the number 144 is, in this respect, very large, it can be considered in testing as infinity; the critical value of $t$ is $\pm 1.96$ and the formula is

$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}. \tag{4.2}$$
The results are presented in Table 4.2 and the respective t-values in Table 4.3. In (4.1), $\bar{x}_1$ and $\bar{x}_2$ are the means of the scrutinised variables. If we link with an edge two properties with a significant correlation, we obtain a system of text properties. Further properties may be scrutinised and added.

However, correlation analysis is not sufficient. If we accept the existence of a relation between two properties, then it must be possible to display it in the form of a function. In the rest of the chapter we shall study just this possibility. In order not to reject too many possible relations and at the same time not to accept those which display rather a cloud of points, we decide to preliminarily accept only those relations which can be expressed by a function and for which both the correlation coefficient and the determination coefficient are greater than 0.50. We know that for a correlation coefficient this is too high a boundary, and for the determination coefficient too low a one. The dispersion of the points in the plot gives a further hint for our decision.

The possibility that some of the relations hold only for Eminescu's poems is not excluded; even in his poetic texts there are some outliers. Hence a number of different texts in different languages should be examined and the given relations should be strengthened or weakened, a general relation should be set up, and its individual cases should be supplied with boundary conditions. One will always find outliers whose existence is rather a problem for literary scientists.

As can be seen, no property is isolated. One can draw a graph linking each property with every other, but in that case we would obtain a highly redundant connectivity. One could also evaluate the properties of the given graph itself, which would display several aspects, but we dispense with this task. The links are not transitive, i.e. from x = f(y), y = g(z) one cannot conclude that x = h(z), because all these relations are stochastic, though in many cases such a relationship exists. It must be remarked that even if the correlation coefficient displays a significant value, the relationship of two properties may have the form of a cloud for which it will be difficult to find an appropriate function. This relationship may disappear or become stronger if one processes different authors of different text sorts in different languages. The present image is merely a state of the art in Eminescu's work.
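For readers who prefer code to a spreadsheet, formulas (4.1) and (4.2) and the preliminary acceptance rule (both the correlation coefficient and the determination coefficient above 0.50) might be sketched as follows. The function names and the small data vectors are hypothetical; only the values 0.87, 0.70, 0.8490, 0.45 and 0.63 are taken from the text and its tables.

```python
from math import sqrt


def pearson_r(x1, x2):
    """Correlation coefficient (4.1) for two equally long sequences."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cross = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    ss1 = sum((a - m1) ** 2 for a in x1)
    ss2 = sum((b - m2) ** 2 for b in x2)
    return cross / sqrt(ss1 * ss2)


def t_value(r, n):
    """t statistic (4.2) with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def preliminarily_accepted(r, r_squared, bound=0.50):
    """Acceptance rule of the text: |r| and R^2 must both exceed the bound."""
    return abs(r) > bound and r_squared > bound


# Hypothetical toy data, only to show the call.
print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]))

# r = 0.87 for Hrel and RRrel with n = 146 texts.
print(round(t_value(0.87, 146), 2))            # 21.17

print(preliminarily_accepted(-0.70, 0.8490))   # True  (Hrel = f(I))
print(preliminarily_accepted(-0.45, 0.63))     # False (I and RRrel)
```

With 144 degrees of freedom the t value is simply compared with ±1.96, as described above.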
0.9767 0.9879 47.5299
0.9618 0.9772 48.9285
0.903
0.9653 0.9734 33.7274
0.9482 0.9695 80.5931
0.9504 0.9696 73.9455
0.8925 0.9471 393.5057 329.6952
0.9713
0.9744 0.9804 18.7352
0.9174 0.9539 136.5299 94.3774
0.9585 0.9691 20.8195
0.9741 0.9794 19.3821
0.9675 0.9808 40.4923
0.9598 0.9662 24.0206
0.9632 0.9691 19.0325
0.9591
0.9581 0.9682 24.3134
Amorul unei marmure
Andrei Mureşanu
Atât de frageda…
Aveam o muză
Basmul ce i l-aş spune ei
Călin (file de poveste)
Când
Când amintirile...
Când crivăţul cu iarna...
Când marea...
Când priveşti oglinda mărei
Care-i amorul meu în astă lume
Ce e amorul?
Ce to legeni....
Ce-ţi doresc eu ţie. dulce Românie
Cine-i?
24.2304
0.9735 33.7896
0.978
12.8188
19.7728
9.3166
10.654
20.3393
6.6813
12.0935
5.8619
9.2264
46.3385
48.2603
15.3924
0.9548 348.2138 287.8655
28.7203
23.1258
28.0882
16.9221
Amicului F.I.
0.9720 39.2957
0.9531
S 4.5074
0.9562 0.9689 29.6257
I
Ah. mierea buzei tale
RRrel
Adio
Hrel
0.9747 0.9789 14.3434
Adânca mare…
Poem title
Table 4.1: Indicators from previous chapters Λ
R4
B
2.2743 2.1347
0.81 2.1331
1.6312
1.8328
1.8007
1.9160
0.7779 0.61 0.71 2.1124
0.7812 0.69 0.71 1.9733
1.9395
1.9465
0.7677 0.39 0.42 2.5443 2.4934
2.1225
0.7432 0.46 0.57 2.3956 2.1766
1.9976 1.6773
0.8383 0.47 0.74 2.3822 1.9068
0.56 0.68 2.2169 2.0030
0.5365 0.83 0.94 1.7702
0.7005 0.69 0.78 1.9851
0.7019 0.6
0.7826 0.73 0.73 1.9118
0.5506 0.77 0.91 1.8496 1.6718
0.7347 0.48 0.63 2.3785 2.0850
1.566
0.7541 0.49 0.7
2.3462 1.9560
1.6037 0.7348 0.45 0.69 2.4338 1.9719
1.5629
1.6533
1.7369
αS rad
0.7907 0.23 0.18 2.8356 2.9180
0.6912 0.53 0.6
1.6566 0.8291 0.58 0.61 2.1841
1.4776
α rad
0.7362 0.46 0.66 2.3939 2.0270
1.7666 0.6238 0.68 0.9
1.652
A
0.8411 0.47 0.46 2.3901 2.3946
1.6985 0.814
1.7534
1.7774
1.808
1.7701
1.7387
1.7059
1.8253
1.5259
1.5872
1.5767
R1
R2
0.9415
0.931
0.948
0.8806
0.8867 0.89
0.9174 0.9016
0.9032 0.8951
0.8831 0.9207
0.9442 0.9147
0.9158 0.9023
0.886
0.8347 0.9266
0.9227 0.9
0.9216 0.9203
0.8036 0.9527
0.8945 0.9325
0.8895 0.9268
0.9212 0.9314
0.826
0.9154
0.9611
0.9057 0.8931
0.8836 0.8998
0.9133 0.9113
0.9757 0.9841 24.8397
0.9439 0.9652 113.2859 63.01
0.9767 0.9823 19.4562
0.9844 0.9876 14.4398
0.9311
0.979
0.9564 0.9666 21.1285
0.9429 0.9645 49.8811
0.9525 0.9643 15.8942
0.9481 0.9681 47.5829
0.9185 0.9546 161.6418 120.1606
0.9831 0.9883 23.2885
0.9512
0.9675 0.976
0.9476 0.9674 57.6599
0.9683 0.9761 24.5417
0.986
0.9825 0.9866 12.5856
0.9495 0.9639 46.5548
Cugetările sărmanului Dionis
Cum negustorii din Constantinopol
Cum oceanu-ntărâtat...
Dacă treci râul Selenei
De câte ori iubito...
De ce nu-mi vii
De ce să mori tu?
De-aş avea
De-aş muri ori de-ai muri
Demonism
De-oi adormi (variantă)
De-or trece anii...
Departe sunt de tine
Despărţire
Din Berlin la Potsdam
Din lyra spartă...
Din noaptea
Din străinătate
0.9885 9.2695
26.2202
0.9586 16.6383
0.9833 19.1177
0.9503 69.2482
26.4649
4.0008
2.5197
10.3902
35.3227
10.6813
8.6144
5.5466
30.9101
11.0147
32.4173
13.6821
6.4573
42.1042
3.2689
5.8108
12.9797
13.2573
10.2383
Cu mâne zilele-ţi adaogi...
0.9813 22.1753
0.9768 0.9862 20.8298
S 43.0224
0.975
I
Criticilor mei
RRrel 0.9573 73.1912
Crăiasa din poveşti
Hrel
0.939
Copii eram noi amândoi
Poem title
Λ
R4
A
B
α rad
αS rad
0.7928 0.48 0.59 2.3808 2.1653
0.7768 0.31 0.16 2.7007 2.9383
0.8071 0.65 0.64 2.0564 2.0626
0.6962 0.72 0.87 1.9348 1.7120
0.8448 0.48 0.47 2.3809 2.3840
0.7268 0.59 0.72 2.1497
0.8441 0.68 0.61 1.9941
0.6822 0.73 0.87 1.9147
0.6088 0.72 0.87 1.9224 1.7107
2.0319
2.6062 2.3104
0.6938 0.45 0.66 2.422
0.7141 0.35 0.5
1.8970
1.9282
2.1166
1.7147
2.407
1.9513
1.706
1.5314
1.4675
1.6677
1.7051
2.9418
2.0285
1.8811
1.8638
3.1046 2.7078 0.7224 0.64 0.76 2.059
0.8571 0.04 0.3
0.8792 0.48 0.16 2.38
0.7972 0.53 0.66 2.2735
0.7005 0.59 0.75 2.1549
1.6868 0.7981 0.47 0.67 2.3859 2.0189
1.4467 0.7515 0.45 0.7
1.7998 0.8707 0.31 0.43 2.6976 2.4851
1.7502
1.5997
1.3225
1.6209 0.6848 0.55 0.74 2.2212
1.4575
1.6933
1.7523
1.6474 0.8806 0.48 0.44 2.3794 2.4506
1.68
1.9632 0.7069 0.71 0.88 1.9422 1.7006
1.623
1.4837
1.658
1.8253
R1
R2 0.9297 0.9221
0.8963 0.87
0.875
0.9252
0.9338 0.927
0.9412 0.9005
0.9197 0.907
0.8882 0.9158
0.9037 0.9035
0.8506 0.867
0.9467 0.9351
0.8492 0.9293
0.8956 0.8958
0.871
0.8797 0.8866
0.878
0.9424 0.9289
0.8427 0.9286
0.9497 0.9197
0.9257 0.9226
0.8844 0.9427
0.9441 0.9109
0.95
0.9303 0.9415
0.868
0.9743 0.9774 19.6653
0.9551
0.9241 0.9528 137.714
0.9406 0.9654 133.9957 80.3625
0.8631 0.9588 174.8791 116.411
0.9469 0.9681 81.1388
0.8629 0.9437 848.5269 842.0609
0.9076 0.9687 48.7903
0.9839 0.9883 21.7732
0.9721
0.9593 0.9684 21.2373
0.9563 0.974
0.9177
0.9811
0.9776 0.9842 24.8857
0.9082 0.9539 291.0834 208.616
0.9331 0.9617 182.195
0.979
0.9238 0.9606 166.9185 116.1771
Ecò
Egipetul
Epigonii
Făt-Frumos din tei
Feciorul de împărat fără de stea
Floare-albastră
Foaia veştedă (dupa Lenau)
Freamăt de codru
Frumoasă şi jună
Ghazel
Glossă
Horia
Iar când voi fi pământ (variantă)
Împărat şi proletar
În căutarea Şeherezadei
Înger de pază
Înger şi demon
63.7133
35.0398
27.1907
0.9845 16.0946
0.987
0.9538 59.0619
0.981
7.4472
109.1787
8.9622
8.4351
53.784
35.2446
11.1675
12.5882
5.3182
21.8256
46.6967
85.8646
0.9726 88.0286 43.1048
5.7203
0.1297
3.1726
Dumnezeu şi om
9.4416
Dorinţa
0.9731
0.9974 0.9977 6.7759
S 16.8749
0.9713
I
Doi aştri
RRrel
Dintre sute de catarge
Hrel
0.9601 0.9738 27.2352
Din valurile vremii...
Poem title
Λ
R4
A
B
α rad
0.8461 0.7
2.1701
0.86 1.9607 1.7280
1.6941
0.5752 0.57 0.78 2.1758 0.8298 0.58 0.58 2.1782
2.1818
2.3813
1.8396
0.7294 0.52 0.68 2.2878 1.9990
0.7584 0.53 0.53 2.2527 2.2619
0.8208 0.6
0.5
2.1331
2.3205
0.6747 0.74 0.87 1.8942 1.7160 1.8064 0.6277 0.64 0.89 2.0749 1.6929
1.5514
1.9921
1.8876 0.5965 0.75 0.91 1.8782 1.6679
1.7371
2.3067
2.2414
1.6193
0.8152 0.48 0.61 2.3778 2.1297
0.8725 0.48 0.51 2.377
0.7702 0.73 0.54 1.9227
0.4392 0.88 0.95 1.7039
1.8236 0.8489 0.58 0.48 2.176
1.3595
1.7957
1.551
1.8185
1.7931
1.8612
1.5332
1.8258 0.7059 0.64 0.77 2.0733 1.8492
1.8992 0.6505 0.74 0.89 1.9024 1.6881
0.688 0.7
2.0647
2.1626 1.8048
0.58 1.9582
1.8807 0.6607 0.71 0.89 1.9532 1.9211
αS rad 2.1901
2.3695 3.1416
0.64 2.5103
0.9756 0.49 0
0.8361 0.4
0.7323 0.29 0.57 2.7183
1.9548 0.7453 0.59 0.8
1.7292
1.5385
1.4153
1.52
R1
R2 0.9211 0.9111
0.9387 0.9537 0.8636 0.9385
0.9464 0.9143
0.865
0.8365 0.945
0.9351
0.9476 0.9401
0.8421 0.8668
0.9063 0.9291
0.8938 0.8902
0.9218 0.9351
0.9435 0.9285
0.9008 0.8962
0.7585 0.9502
0.8967 0.93
0.8523 0.9515
0.8766 0.9513
0.8345 0.948
0.9055 0.9454
0.927
0.9781 0.9244
0.9111
0.9046 0.8867
0.9887 0.9919 16.1075
0.9516 0.9743 59.7529
0.9428 0.9691 69.4926
0.9290 0.9539 93.9687
0.9845 0.988
0.9743 0.9838 34.167
0.8942 0.8826 10.251
0.9517
0.9661 0.9807 45.0691
0.9611
0.9701 0.9781 28.578
0.9725 0.9814 25.0687
0.9655 0.9764 38.4593
0.9601 0.9694 20.3502
0.9849 0.9882 13.2642
0.9651 0.9712 17.3485
0.9614 0.9743 44.086
0.9839 0.9842 7.5099
0.9843 0.987
0.953
Iubită dulce, o, mă lasă
Iubitei
Junii corupţi
Kamadeva
La Bucovina
La mijloc de codru...
La moartea lui Heliade
La moartea lui Neamţu
La moartea principelui Ştirbey
La mormântul lui Aron Pumnul
La o artistă (Ca a nopţii poezie)
La o artistă (Credeam ieri)
La Quadrat
La steaua
Lacul
Lasă-ţi lumea...
Lebăda
Lida
Locul aripelor
0.9728 48.2531
12.2344
0.9729 25.4086
0.9704 63.1788
15.1658
47.9721
0.9542 0.969
I
Iubind în taină...
RRrel
Întunericul şi poetul
Hrel
0.9839 0.9876 11.5293
Îngere palid...
Poem title
S
30.1077
3.0913
1.1242
20.6707
7.1442
2.9552
11.0975
23.1001
13.5436
12.1992
12.355
25.6754
37.2997
6.8378
15.9934
3.7007
51.5615
54.8294
41.9792
3.2733
25.6365
3.6802
Λ
R4
A
B
α rad
αS rad
2.9211
2.0769 1.9594
0.7814 0.56 0.64 2.2159
0.7991 0.55 0.71 2.2411
0.7682 0.37 0.31 2.5855
0.7477 0.41 0.59 2.5219
0.7129 0.61 0.74 2.1149
2.0611
1.9396
2.6884
2.1659
1.9044
0.6675 0.76 0.69 1.8486 1.9405
0.7946 0.48 0.59 2.3782 2.1590
0.8764 0.48 0.42 2.3784 2.4949
0.6918 0.68 0.87 1.9894 1.7228
0.6433 0.63 0.7
0.682 0.48 0.68 2.3805 1.9947
0.8952 0.23 0.18 2.84
0.7375 0.58 0.75 2.1822 1.8829
1.6297
1.5894
2.3565
2.1489
2.7164
3.1416
2.0052
0.7077 0.26 0.55 2.791
2.2367
0.8772 0.65 0.24 2.0526 2.8068
0
0.7656 0.59 0.67 2.1591
0.7984 0.46 0.64 2.3938 2.0635
0.8835 0.03 0.49 3.1077
0.7554 0.55 0.59 2.2241
1.4647 0.9077 0.3
1.7829
1.5432
1.6121
1.5012
1.6963 0.7488 0.71 0.72 1.9498 1.9223
1.6128
1.7237
1.5981
1.7095
1.76
1.3175
1.7465
1.655
1.8752
1.5643
1.6205
1.7128
1.7367
1.5088 0.8616 0.22 0.16 2.8495 2.9352
R1
R2
0.9408
0.8952
0.916
0.9073 0.9273
0.9394 0.8958
0.9201 0.8525
0.9161
0.9014 0.8921
0.9366 0.9292
0.9011
0.9224 0.9267
0.9375 0.9279
0.9133 0.9215
0.9091 0.8772
0.9327 0.9131
0.9022 0.9341
0.7295 0.792
0.9348 0.9317
0.9522 0.9206
0.8584 0.9255
0.9041 0.9018
0.9169 0.9009
0.967
0.8815 0.9287
0.9544 0.9124
0.9189 0.9458 119.1822 83.5678
0.9652 0.9776 28.1931
0.9322 0.9551
0.934
0.8967 0.9502 330.9081 290.9484
0.9789 0.9841 22.8708
0.9583 0.9697 44.7092
0.9613 0.9747 33.7436
0.976
0.9504 0.9717
0.9819 0.9861 21.6335
0.9809 0.9837 8.7626
0.9646 0.9749 30.2561
0.9538 0.9688 63.6083
0.9835 0.9981 14.358
0.9344 0.9549 66.904
0.9599 0.972
0.9784 0.9849 19.3205
Misterele nopţii
Mitologicale
Mortua est!
Mureşanu
Murmură glasul mării
Napoleon
Noaptea...
Nu e steluţă
Nu mă-nţelegi
Nu voi mormânt bogat (variantă)
Numai poetul
O arfă pe-un mormânt
O călărire în zori
O stea prin ceruri
O, adevăr sublime...
O, mamă…
Odă în metru antic
133.639
25.6049
73.0426
0.9815 8.8851
0.9616 89.4855
7.3207
14.9131
37.407
4.8041
38.371
14.0062
2.9458
4.1517
43.9308
5.1288
17.6877
24.8054
6.4199
63.6093
80.9262
16.2218
R4
A
α rad
αS rad 1.6374
0.86 1.9679 1.7257 1.8051
1.6777
3.0876 3.0126 2.1280
0.8542 0.21 0.29 2.8598 2.7214
0.8824 0.48 0.61 2.377
0.7051 0.43 0.64 2.4708 2.0669
0.8009 0.05 0.1
0.7538 0.47 0.59 2.3845 2.1616
0.7434 0.75 0.77 1.8804 1.8500
2.3854
0.93 1.8094 1.6432
0.8529 0.58 0.47 2.1794
0.5279 0.8
1.8838
0.742
0.31 0.36 2.7029 2.6026
0.47 0.55 2.3881 2.2250
0.6993 0.68 0.88 1.9883 1.7100
0.8548 0.03 0.17 3.1093 2.9283
1.6267 0.829
1.5313
1.7786
1.5726
1.8042 0.7184 0.73 0.75 1.9123
1.6827 0.7771 0.55 0.63 2.2406 2.0768
1.395
1.8106
1.7618
1.2778
1.655
1.7542
1.7789
1.6616
1.6837 0.6466 0.67 0.85 2.0148 1.7404
1.9389 0.6782 0.77 0.88 1.8499 1.6987
0.7524 0.47 0.49 2.3845 2.3348
0.9
0.4283 0.93 0.97 1.6437 1.5989
0.7
0.8499 0.31 0.33 2.6979 2.6580
1.7898 0.6312 0.8 1.5812
B
0.5309 0.83 0.94 1.7705
1.7968 0.73
1.7319
0.8582 0.9409 1350.459 1354.7581 1.6175
28.6786
8.2352
Miradoniz
53.2004
Memento mori
Λ 1.6514
0.9502 0.964
S
0.9849 0.9906 22.8924
I
Melancolie
RRrel
Mai am un singur dor
Hrel
0.8972 0.9482 281.3298 246.8649
Luceafărul
Poem title
R1
R2
0.9297
0.9292
0.9432
0.9296
0.9287
0.9441
0.9369 0.9112
0.8929 0.8967
0.8473 0.9232
0.9423 0.9258
0.9021 0.9222
0.9045 0.9171
0.9401 0.8994
0.9336 0.928
0.8997 0.9343
0.9167 0.866
0.9216 0.8992
0.9
0.937
0.813
0.8719 0.917
0.859
0.9226 0.8878
0.8349 0.9345
0.7655 0.9548
0.891
0.956
0.8123 0.9309
0.9543 0.9703 37.7909
0.969
0.9703 0.98
0.9485 0.9683 74.4225
0.9809 0.9837 8.7626
0.9675 0.975
0.954
0.8775 0.9062 23.5265
0.9644 0.9761 26.201
0.9508 0.9682 70.6941
0.9497 0.9673 42.5334
0.9758 0.9831 30.3494
0.9164 0.9555
0.925
0.8871 0.939
0.9124 0.9514
0.9094 0.949
Pe lângă plopii fără soţ
Peste vârfuri
Povestea codrului
Povestea teiului
Prin nopţi tăcute
Privesc oraşul furnicar
Pustnicul
Replici
Revedere
Rugăciunea unui dac
S-a dus amorul
Sara pe deal
Scrisoarea I
Scrisoarea II
Scrisoarea III
Scrisoarea IV
Scrisoarea V
9.616
23.742
36.4329
14.0915
21.5219
38.8569
12.9692
2.9458
44.4743
18.6836
2.7012
21.696
12.5824
11.3191
181.2812 142.4597
231.5885 173.3774
404.7692 327.5248
0.9585 133.7825 90.1734
232.3401 176.0081
0.9734 75.4629
33.8715
41.9171
0.9684 8.9681
29.8618
0.9631 0.9734 26.4997
Pe aceeaşi ulicioară...
0.963
0.955
Pajul Cupidon...
4.4763
0.9884 0.9922 15.4324
S
0.9293 0.9605 166.8483 110.8842
I
Oricâte stele...
RRrel
Ondina (Fantazie)
Hrel
0.9086 0.9548 241.2197 200.7361
Odin şi poetul
Poem title
Λ
R4
A
B
0.5622 0.77 0.9
α rad 1.7128
0.7892 0.6
0.76 2.1382 1.8713
0.8785 0.49 0.07 2.3703 3.0613
1.7791
0.8542 0.21 0.29 2.8598 2.7214
0.7044 0.69 0.82 1.9852
0.7914 0.67 0.66 2.0126 2.0323
1.9228
2.3863 2.1385
0.8418 0.59 0.72 2.1523
0.7272 0.47 0.6
1.7737
1.9428
0.7331 0.6
0.8
2.1496 1.8165
0.7598 0.27 0.52 2.7726 2.2851
0.5547 0.54 0.82 2.2167
0.7339 0.48 0.71 2.3771
0.5974 0.79 0.89 1.8204 1.6893
1.7059
0.584
0.77 0.9
1.6646
1.6246 1.8499 1.6779
0.81 0.91 1.7993
0.5441 0.86 0.95 1.7346 1.8504 0.596
1.823
1.8067 0.6404 0.63 0.88 2.0795 1.7098
1.8051
1.8203 0.8345 0.45 0.58 2.4442 2.1770
1.6623 0.7225 0.53 0.76 2.2727 1.8651
1.8549
1.5711
1.207
1.8599
1.8209 0.8043 0.64 0.77 2.0572 1.8519
1.395
1.8013
1.829
1.4254
1.6256
1.6202 0.7729 0.47 0.66 2.3865 2.0242
1.7418
1.6531
αS rad
1.8598 1.6775
1.8823 0.6486 0.73 0.87 1.921
1.6852
R1
R2 0.9467
0.9058
0.9353 0.9442 0.8268 0.9391
0.84
0.7891 0.9529
0.847
0.8574 0.9333
0.9405 0.9414
0.8881 0.9046
0.8852 0.9323
0.9229 0.8992
0.6916 0.8562
0.9109 0.937
0.9075 0.9246
0.9401 0.8994
0.8923 0.9187
0.9324 0.9442
0.8963 0.8974
0.897
0.8986 0.8915
0.8716 0.9457
0.9647 0.9326
0.864
0.8247 0.9388
0.9704 0.9807 33.4095
0.9784 0.981
0.9672 0.9816 50.5246
0.9302 0.9481 41.7168
0.9774 0.9827 12.6056
0.9776 0.9822 17.4031
0.9787 0.9856 24.3139
0.96
0.984
0.9789 0.9836 10.5674
0.9459 0.9667 70.3343
0.989
0.9768 0.9829 20.0956
0.9741 0.9828 33.5602
Somnoroase păsărele...
Sonete
Speranţa
Steaua vieţii
Stelele-n cer
Sus în curtea cea domnească
Te duci...
Trecut-au anii
Unda spumă
Venere şi Madona
Veneţia (de Gaetano Cerri)
Viaţa mea fu ziuă
Vis
0.9918 14.6022
0.9884 16.2951
0.9582 16.5634
10.2779
8.6727
0.9654 0.9721
I
Singurătate
RRrel 0.9885 8.2132
Şi dacă...
Hrel
0.987
Se bate miezul nopţii...
Poem title
S
14.0851
6.6987
2.4098
47.6845
4.554
5.0704
5.3582
8.6318
5.1325
5.6043
31.6412
25.6488
3.168
13.6364
5.7053
1.6115
Λ
R4
A
B
α rad
0.7629 0.41 0.55 2.5193 0.34 2.7164
2.6350
0.8207 0.72 0.76 1.9233
0.8325 0.48 0.41 2.376
1.8567
2.5068
0.8489 0.47 0.56 2.3836 2.2169
0.8197 0.3
1.7861
2.2335
0.8526 0.47 0.46 2.3907 2.3962
0.7996 0.38 0.35 2.5767 2.6294
0.7624 0.28 0.16 2.7367 2.9143
0.8291 0.04 0.21 3.0962 2.8646
1.7767
1.6681
1.7013
0.806 0.57 0.61 2.2096 2.1315
0.8358 0.48 0.47 2.3803 2.3840
0.9059 0.23 0.31 2.8417 2.6996
1.6936 0.6808 0.65 0.76 2.0566 1.8706
1.4055
1.6405 0.8603 0.48 0.38 2.3772 2.5688
1.6703
1.7229
1.6503
1.4561
αS rad
2.3825 2.7175
1.5082 0.6456 0.75 0.81 1.8781
1.7951
1.4633
1.7556
1.2116
1.4632 0.8983 0.47 0.3
R1
R2
0.9208
0.9036
0.9386 0.94
0.9286 0.9244
0.9636 0.9453
0.8856 0.9253
0.9237 0.8994
0.956
0.875
0.9414 0.93
0.9176 0.9137
0.9214 0.8712
0.8429 0.8781
0.9377 0.9278
0.9295 0.8886
0.9244 0.9257
0.8962 0.8302
0.9333 0.9069
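Table 4.2: Correlation coefficients between all indicators

| | RRrel | I | S | Λ | R4 | A | B | α rad | αS rad | R1 | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Hrel | 0.87 | -0.70 | -0.65 | -0.18 | 0.93 | -0.67 | -0.75 | 0.67 | 0.72 | 0.92 | -0.14 |
| RRrel | | -0.45 | -0.42 | 0.01 | 0.81 | -0.63 | -0.71 | 0.63 | 0.70 | 0.96 | 0.15 |
| I | | | 0.99 | 0.17 | -0.69 | 0.49 | 0.46 | -0.47 | -0.42 | -0.58 | 0.37 |
| S | | | | 0.09 | -0.65 | 0.45 | 0.40 | -0.42 | -0.37 | -0.56 | 0.32 |
| Λ | | | | | -0.17 | 0.46 | 0.47 | -0.46 | -0.45 | 0.01 | 0.78 |
| R4 | | | | | | -0.63 | -0.77 | 0.63 | 0.75 | 0.86 | -0.16 |
| A | | | | | | | 0.79 | -1.00 | -0.79 | -0.61 | 0.33 |
| B | | | | | | | | -0.80 | -1.00 | -0.72 | 0.33 |
| α rad | | | | | | | | | 0.80 | 0.62 | -0.32 |
| αS rad | | | | | | | | | | 0.70 | -0.30 |
| R1 | | | | | | | | | | | 0.06 |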
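Table 4.3: t-tests for the significance of the correlation coefficients

| | RRrel | I | S | Λ | R4 | A | B | α rad | αS rad | R1 | R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Hrel | 21.17 | -11.67 | -10.38 | -2.17 | 31.17 | -10.94 | -13.43 | 10.80 | 12.55 | 28.07 | -1.73 |
| RRrel | | -6.01 | -5.52 | 0.06 | 16.30 | -9.63 | -11.95 | 9.78 | 11.74 | 40.39 | 1.77 |
| I | | | 109.60 | 2.07 | -11.35 | 6.79 | 6.16 | -6.34 | -5.57 | -8.65 | 4.85 |
| S | | | | 1.10 | -10.14 | 5.97 | 5.26 | -5.55 | -4.74 | -8.02 | 4.04 |
| Λ | | | | | -2.04 | 6.25 | 6.32 | -6.15 | -6.07 | 0.14 | 14.94 |
| R4 | | | | | | 7.13 | 10.70 | 36.92 | 39.36 | 20.94 | 10.71 |
| A | | | | | | | 15.56 | -191.12 | -15.31 | -9.34 | 4.17 |
| B | | | | | | | | -16.02 | -167.86 | -12.32 | 4.25 |
| α rad | | | | | | | | | 15.82 | 9.43 | -3.99 |
| αS rad | | | | | | | | | | 11.80 | -3.79 |
| R1 | | | | | | | | | | | 0.66 |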
Let us consider the links one by one. Since our investigation is mostly based on rank-frequency sequences, the simplest relationship is that between the most elementary properties of the sequence, viz. Ord's criterion, which links two functions of the moments of the distribution. The fact that S = f(I) has already been analysed in Chapter 3.2.2, where the respective formulas of the power and the linear functions were presented. Usually, this relationship is an expression of style, and writers display at least different parameters. The almost linear dependence is shown in Figure 4.1a and, for clarity, also in logarithmic transformation in Figure 4.1b, but the formula concerns the plain values.

Figure 4.1a. The almost linear relationship between I and S. Plain presentation. $|r| = 0.99$; $S = 2.9493 + 0.3425 \cdot I^{1.1512}$; $R^2 = 0.9974$.

Figure 4.1b. The almost linear relationship between I and S. Logarithmic presentation.

For I and S, the logarithmic presentation is necessary due to the high disparity of the values (up to about $10^3$) in comparison with the other indicators considered, which mostly attain very small values, of the order of unity.
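To make the fitted relation concrete, the power function reported for Figure 4.1a and the determination coefficient used to judge such fits can be written as below. This is a sketch under assumptions: the exponent is read as 1.1512 from the figure caption, the function names are hypothetical, and $R^2$ is taken in its usual form $1 - SS_{res}/SS_{tot}$, which the text does not spell out.

```python
def s_from_i(i, a=2.9493, b=0.3425, c=1.1512):
    """Fitted curve of Figure 4.1a: S = a + b * I**c (parameters from the caption)."""
    return a + b * i ** c


def determination_coefficient(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot (assumed standard definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Predicted S for a few illustrative I values spanning the range up to about 10**3.
for i_value in (10.0, 100.0, 1000.0):
    print(i_value, round(s_from_i(i_value), 2))
```

Given the observed (I, S) pairs of Table 4.1, `determination_coefficient` applied to the observed S and to `s_from_i` of the observed I would be expected to come close to the reported 0.9974.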
Since I is, as a matter of fact, a measure of dispersion, it must automatically be associated with the entropy and the repeat rate. The greater the dispersion, the greater is the entropy, which also expresses a kind of uncertainty; however, in our rank-frequency data, the average increases with increasing dispersion, thus the greater I is, the smaller will be the entropy. Further, the greater the dispersion, the smaller is the repeat rate, which is a measure of concentration. The first of these two relations is presented in Figure 4.2, again in plain and in logarithmic presentation. The outliers are labelled in the plot. The relation of I to RRrel has a smaller correlation coefficient than required, hence it will be omitted, even if the power relation between the two indicators has $R^2 = 0.63$.

Figure 4.2. Plain (left) and logarithmic (right) presentation of Hrel = f(I). $|r| = 0.70$; $H_{rel} = 1.0493 - 0.0391 \cdot I^{0.2287}$; $R^2 = 0.8490$ (3 outliers omitted).

Figure 4.3. Relation of I to R1. $|r| = 0.58$; $R_1 = 0.9996 - 0.0323 \cdot I^{0.2928}$; $R^2 = 0.6743$ (2 outliers omitted). Plain (left) and logarithmic (right) presentation.
254 The control cycle Since the h-point is a fixed point, it is to be expected that all indicators based on h must have some relation to I. These are A, α rad, R1, R2, R4, and even the indicators of the spectrum, B and αS rad. The “best” ones are presented in Figures 4.3 and 4.4. The functions linking all these indicators with I are presented in Table 4.2. Only four properties fulfil the criterion, hence the preliminary control cycle has the form displayed in Figure 4.5.
Figure 4.4. Relation of I to R4. |r| = 0.69; R4 = 0.4145*exp(-I/201.7846) + 0.43; R² = 0.7325. Plain (left) and logarithmic (right) presentation.
Figure 4.5. The first step toward the self-regulation cycle.
Figure 4.6. Relation of S to R1. |r| = 0.56; R1 = 0.9963 - 0.0443*S^0.2449; R² = 0.7250 (2 outliers omitted). Plain (left) and logarithmic (right) presentation.
The next series of Figures (4.6 to 4.8) shows the relation of S to the other indicators, namely Hrel, R1 and R4. Thus, S plays the same role in the cycle as I.
Figure 4.7. The relation of S to Hrel. |r| = 0.65; Hrel = 1.0266 - 0.0335*S^0.2353; R² = 0.9140 (4 outliers omitted). Plain (left) and logarithmic (right) presentation.
In Figure 4.7, the logarithmic presentation shows that the relation is not as strong as shown in the plain presentation. The dispersion is too great.
Figure 4.8. The relation between S and R4. |r| = 0.65; R4 = 1.3591 - 0.4388*S^0.1080; R² = 0.9184 (2 outliers omitted). Plain (left) and logarithmic (right) presentation.
Completing the significant relations of S we obtain the new control cycle in Figure 4.9.
Figure 4.9. The second step toward the self-regulation cycle
The other indicator values are comparable and need no logarithmic presentation. Those of Hrel are presented in Figures 4.10 to 4.14.
Figure 4.10. Relation of Hrel to RRrel. |r| = 0.87; RRrel = 0.9396 + 0.0635*Hrel^16.6456; R² = 0.9038 (2 outliers omitted).
Figure 4.11. Relation of Hrel to B. |r| = 0.75; B = 0.9581 - 1.0259*Hrel^30.4349; R² = 0.7406 (3 outliers omitted).
Figure 4.12. Relation of Hrel to αS. |r| = 0.72; αS = 1.618 + 1.7805*Hrel^35.1141; R² = 0.7102 (3 outliers omitted).
Figure 4.13. Relation of Hrel to R1. |r| = 0.92; R1 = -0.5701 + 1.5374*Hrel; R² = 0.9133 (4 outliers omitted).
Figure 4.14. Relation of Hrel to R4. |r| = 0.93; R4 = 0.2264 + 0.7224*Hrel^7.5475; R² = 0.9566 (4 outliers omitted).
The cycle has now the following form (cf. Figure 4.15). The relations of RRrel are presented in Figures 4.16 to 4.21. The relation to A and to α radians lies at the acceptance boundary. The dispersion of the values is relatively great.
Figure 4.15. Inserting the significant relations of Hrel in the cycle
Figure 4.16. The relation of RRrel to A. |r| = 0.63; A = 2.8565 - 2.6203*RRrel^4.3843; R² = 0.5155 (2 outliers omitted).
Figure 4.17. Relation of RRrel to B. |r| = 0.71; B = 1.1264 - 1.1473*RRrel^31.8398; R² = 0.7256 (2 outliers omitted).
Figure 4.18. Relation of RRrel to α. |r| = 0.63; α = -2.2224 + 5.0227*RRrel^4.0487; R² = 0.5168 (2 outliers omitted).
Figure 4.19. Relation of RRrel to αS. |r| = 0.70; αS = 1.4031 + 1.9086*RRrel^38.5419; R² = 0.7170 (2 outliers omitted).
Figure 4.20. Relation of RRrel to R1. |r| = 0.96; R1 = -2.1780 + 3.1664*RRrel; R² = 0.9447 (1 outlier omitted).
Figure 4.21. Relation of RRrel to R4. |r| = 0.81; R4 = -6.0059 + 6.9509*RRrel; R² = 0.8076 (2 outliers omitted).
The graph has now the following form (cf. Figure 4.22).
Figure 4.22. The status quo of the cycle.
Figure 4.23. Relation of A to B. |r| = 0.79; B = 0.0921 + 0.9831*A; R² = 0.6272.
Though the relation of A to B and to αS radians displays a number of outliers, they still can be included in the cycle.
Figure 4.24. Relation of A to α. |r| = 1.00; α = 3.2209 - 1.7802*A; R² = 0.9961.
Figure 4.25. Relation of A to αS radians. |r| = 0.79; αS = 2.9170 - 1.6430*A^1.1992; R² = 0.6226.
Figure 4.26. Relation of A to R1. |r| = 0.61; R1 = 0.9302 - 0.2598*A^4.4935; R² = 0.6224 (2 outliers omitted).
The cycle has now the form shown in Figure 4.27.
Figure 4.27. The status quo of the cycle
Figure 4.28. Relation of B to α. |r| = 0.80; α = 2.7529 - 1.0649*B^1.8363; R² = 0.6755.
Figure 4.29. Relation of B to αS. |r| = 1.00; αS = 3.1863 - 1.7117*B; R² = 0.9949.
Figure 4.30. Relation of B to R1. |r| = 0.72; R1 = 0.9362 - 0.1666*B^4.7810; R² = 0.7984 (2 outliers omitted).
Figure 4.31. Relation of R4 to B. |r| = 0.77; R4 = 0.8408 - 0.3632*B^4.1031; R² = 0.7266.
Figure 4.32. Relation of α to αS. |r| = 0.80; αS = 1.0411 + 0.2209*α^1.9232; R² = 0.6277.
Figure 4.33. Relation of α to R1. |r| = 0.62; R1 = 0.93191 - 107.1708*exp(-α/0.2587); R² = 0.6276 (2 outliers omitted).
Figure 4.34. Relation of α to R4. |r| = 0.63; R4 = 0.8187 - 285.8296*exp(-α/0.2518); R² = 0.5474.
Figure 4.35. Relation of αS to R1. |r| = 0.70; R1 = 0.9367 - 124.8114*exp(-αS/0.2359); R² = 0.7971 (2 outliers omitted).
Figure 4.36. Relation of αS to R4. |r| = 0.75; R4 = 0.8424 - 107.2138*exp(-αS/0.2747); R² = 0.7265.
Figure 4.37. Relation of R1 to R4. |r| = 0.86; R4 = 0.9842*R1^2.6096; R² = 0.8060 (2 outliers omitted).
Figure 4.38. Relation of R2 to Λ. |r| = 0.78; Λ = -3.1375 + 5.2308*R2^0.9879; R² = 0.6377 (2 outliers omitted).
The complete cycle can be presented in Figure 4.39. This figure shows that two of the properties, viz. R2 and Λ, form an isolated cycle while the main cycle does not contain all links. Of course, this holds only if our criteria turn out to be acceptable.
Figure 4.39. The final form of the cycle.
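The observation that R2 and Λ form an isolated cycle can be reproduced by treating the accepted links as edges of an undirected graph and looking for its connected components. The sketch below is only an illustration: the |r| values are copied from a subset of the rows of Table 4.4, but the cut-off of 0.5 and the function name are assumptions made here, not the acceptance criterion used by the authors.

```python
# (variable 1, variable 2, |r|) for a subset of the rows of Table 4.4
LINKS = [
    ("I", "S", 0.99), ("I", "Hrel", 0.70), ("Hrel", "RRrel", 0.87),
    ("RRrel", "R1", 0.96), ("Hrel", "R4", 0.93), ("RRrel", "A", 0.63),
    ("A", "α", 1.00), ("A", "B", 0.79), ("B", "αS", 1.00),
    ("R2", "Λ", 0.78), ("I", "R2", 0.37), ("B", "Λ", 0.47),
]

def components(links, min_r):
    """Connected components of the graph whose edges have |r| >= min_r."""
    graph = {}
    for a, b, r in links:
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if r >= min_r:              # keep only sufficiently strong links
            graph[a].add(b)
            graph[b].add(a)
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:                # depth-first traversal from `start`
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result

parts = components(LINKS, min_r=0.5)   # 0.5 is an assumed, illustrative cut-off
```

With these rows and this cut-off the graph splits into one large component and the pair R2, Λ; a different threshold or the full table may of course change the picture.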
The results require some comments:
1. As conjectured above, no property is isolated. The indicator Λ, which concerns the normalised arc length, is relatively independent. Its link to R2 was not expected and may be strengthened by other texts.
2. The links are represented by a straight line (21), a power function (34) or an exponential function (11), that is, by very simple functions which can be incorporated in a system analogous to that of Köhler. All relations, including those that are not shown here graphically, are presented in Table 4.4. Some of them are merely preliminary proposals, not corroborated by a test.
3. Since the correlation coefficient is merely the first sign of a relationship, a function joining two variables and yielding R² > 0.80 is a sufficient reason to incorporate the link in a control cycle. Our present steps are merely preliminary hints at a possibility of theory building.
4. The basis of the given system is formed by the properties R1 and R4, which have the most strong links (8). But this is merely a very preliminary, explorative view restricted to only one writer.
5. Whenever many texts are analysed and a link between properties is discovered, outliers may occur that seemingly destroy the relationship. However, they are not necessarily reasons for rejecting a hypothesis. Usually, we set up hypotheses without caring for boundary conditions. Every text may contain elements that were inserted voluntarily, so to say against the automatically working mechanisms, in order to give the text a special view, introduce a special mood, etc. If we mix prose texts with poetic ones in which a special rhythm is necessary, then some indicators do not change but others may display a strongly deviating pattern. A simple example may illustrate this circumstance: if we find a law of word length distribution, then it holds only for languages in which word length is a variable; it does not hold for monosyllabic languages, which do not fulfil the basic requirement of variable word length. An outlier is rather a cause for further research with other means.
6. We presented the relations between variables in pairs. But later on one can discover that a variable may depend simultaneously on several other ones (see, for instance, the 3D plots in Figures 4.40, 4.41, and 4.42). This is, of course, a way without end because one would be forced to scrutinise all triple, quadruple, etc. relations, but sometimes it is also a possible explanation of outliers. To this end, Figure 4.39 could be interpreted systems-theoretically, i.e. as a directed graph with influences, parameters and loops, and the whole control cycle would consist of a system of equations.
7. If a variable represented by an indicator does not display any link to the other ones, it does not mean that it is isolated. We simply did not find those variables with which it is linked, and further investigations are necessary. It would be naïve to suppose that texts have only twelve properties because we know that properties are only our theoretical concepts.
8. As can be seen in Table 4.4, the linking functions are very simple. If we set up the differential equations whose solutions they are, we obtain the following results:
Function                Differential equation
y = a + bx              y' = b
y = ax^b                y'/y = b/x
y = a + bx^c            y'/(y-a) = c/x
y = a*exp(-x/b)         y'/y = -1/b
y = a + b*exp(-x/c)     y'/(y-a) = -1/c
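The correspondence between the five functions and their differential equations can be checked numerically. The following Python sketch approximates y' by a central difference and evaluates the residual of each equation at a few points; the parameter values a = 2.0, b = 0.5, c = 3.0 and the evaluation points are arbitrary assumptions chosen only for the check.

```python
import math

A, B, C = 2.0, 0.5, 3.0   # hypothetical parameter values a, b, c

def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# name -> (function y(x), residual of its differential equation)
CASES = {
    "y = a + b*x": (lambda x: A + B * x,
                    lambda x, y, dy: dy - B),
    "y = a*x^b": (lambda x: A * x ** B,
                  lambda x, y, dy: dy / y - B / x),
    "y = a + b*x^c": (lambda x: A + B * x ** C,
                      lambda x, y, dy: dy / (y - A) - C / x),
    "y = a*exp(-x/b)": (lambda x: A * math.exp(-x / B),
                        lambda x, y, dy: dy / y + 1 / B),
    "y = a + b*exp(-x/c)": (lambda x: A + B * math.exp(-x / C),
                            lambda x, y, dy: dy / (y - A) + 1 / C),
}

def max_residual(points=(0.5, 1.7, 4.0)):
    """Largest absolute residual over all functions and test points."""
    worst = 0.0
    for f, residual in CASES.values():
        for x in points:
            worst = max(worst, abs(residual(x, f(x), derivative(f, x))))
    return worst
```

All residuals vanish up to the error of the numerical derivative, which agrees with the pairs listed above.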
Table 4.4: Linking the variables by preliminary functions

Var 1  Var 2  |r|   Fitting function                        R²      Outliers omitted

I      Λ      0.17  Λ = -6.9463 + 8.2416*I^0.0123           0.4283  2
I      R2     0.37  R2 = 0.3771 + 0.4917*I^0.0255           0.3129
I      αS     0.42  αS = 1.6 + 1.2519*exp(-I/39.1304)       0.5861
I      RRrel  0.45  RRrel = 1.1018 - 0.0986*I^0.0739        0.6284  2
I      B      0.46  B = 1 - 0.7391*exp(-I/55.8425)          0.6026
I      α      0.47  α = 1.7 + 0.8906*exp(-I/94.8474)        0.4510
I      A      0.49  A = -5.9300 + 6.0336*I^0.0190           0.4767
I      R1     0.58  R1 = 0.9996 - 0.0323*I^0.2928           0.6743  2
I      R4     0.69  R4 = 0.43 + 0.4145*exp(-I/201.7846)     0.7325
I      Hrel   0.70  Hrel = 1.0493 - 0.0391*I^0.2287         0.8490  3
I      S      0.99  S = -2.9493 + 0.3425*I^1.1512           0.9974

S      Λ      0.09  Λ = -2.1657 + 3.6655*S^0.0159           0.2627  2
S      R2     0.32  R2 = 0.8948*S^0.0085                    0.1751
S      αS     0.37  αS = 1.6 + 1.0465*exp(-S/25.6391)       0.5849  1
S      B      0.40  B = -5.3964 + 5.6494*S^0.0217           0.6331
S      RRrel  0.42  RRrel = 1.0503 - 0.0586*S^0.0957        0.6941  2
S      α      0.42  α = 1.7 + 0.8111*exp(-S/68.2008)        0.4292
S      A      0.45  A = -0.0618 + 0.3910*S^0.1418           0.4451
S      R1     0.56  R1 = 0.9963 - 0.0443*S^0.2449           0.7250  2
S      Hrel   0.65  Hrel = 1.0266 - 0.0335*S^0.2353         0.9140  4
S      R4     0.65  R4 = 1.3591 - 0.4388*S^0.1080           0.9184  2
I      S      0.99  S = -2.9493 + 0.3425*I^1.1512           0.9974

Hrel   R2     0.14  R2 = 1.1650 - 0.2581*Hrel               0.0952  2
Hrel   Λ      0.18  Λ = 2.5936 - 0.9689*Hrel                0.0318  2
S      Hrel   0.65  Hrel = 1.0266 - 0.0335*S^0.2353         0.9140  4
Hrel   A      0.67  A = 1.1159 - 0.0001*exp(9.4401*Hrel)    0.4941  1
Hrel   α      0.67  α = 1.6180 + 1.2022*Hrel^15.3522        0.4863  2
I      Hrel   0.70  Hrel = 1.0493 - 0.0391*I^0.2287         0.8490  3
Hrel   αS     0.72  αS = 1.618 + 1.7805*Hrel^35.1141        0.7102  3
Hrel   B      0.75  B = 0.9581 - 1.0259*Hrel^30.4349        0.7406  3
Hrel   RRrel  0.87  RRrel = 0.9396 + 0.0635*Hrel^16.6456    0.9038  2
Hrel   R1     0.92  R1 = -0.5701 + 1.5374*Hrel              0.9133  4
Hrel   R4     0.93  R4 = 0.2264 + 0.7224*Hrel^7.5475        0.9566  4

RRrel  Λ      0.01  Λ = 4.2076 - 2.6059*RRrel               0.0510
RRrel  R2     0.15  R2 = 1.2065 - 0.2964*RRrel              0.0274  2
S      RRrel  0.42  RRrel = 1.0503 - 0.0586*S^0.0957        0.6941  2
I      RRrel  0.45  RRrel = 1.1018 - 0.0986*I^0.0739        0.6284  2
RRrel  A      0.63  A = 2.8565 - 2.6203*RRrel^4.3843        0.5155  2
RRrel  α      0.63  α = -2.2224 + 5.0227*RRrel^4.0487       0.5168  2
RRrel  αS     0.70  αS = 1.4031 + 1.9086*RRrel^38.5419      0.7170  2
RRrel  B      0.71  B = 1.1264 - 1.1473*RRrel^31.8398       0.7256  2
RRrel  R4     0.81  R4 = -6.0059 + 6.9509*RRrel             0.8076  2
Hrel   RRrel  0.87  RRrel = 0.9396 + 0.0635*Hrel^16.6456    0.9038  2
RRrel  R1     0.96  R1 = -2.1780 + 3.1664*RRrel             0.9447  1

A      R2     0.33  R2 = 0.8901 + 0.0517*A                  0.1656  1
S      A      0.45  A = -0.0618 + 0.3910*S^0.1418           0.4451
A      Λ      0.46  Λ = 1.4599 + 0.3866*A                   0.2131
I      A      0.49  A = -5.9300 + 6.0336*I^0.0190           0.4767
A      R1     0.61  R1 = 0.9302 - 0.2598*A^4.4935           0.6224  2
A      R4     0.63  R4 = 0.8149 - 0.6118*A^4.7963           0.5961  3
RRrel  A      0.63  A = 2.8565 - 2.6203*RRrel^4.3843        0.5155  2
Hrel   A      0.67  A = 1.1159 - 0.0001*exp(9.4401*Hrel)    0.4941  1
A      αS     0.79  αS = 2.9170 - 1.6430*A^1.1992           0.6226
A      B      0.79  B = 0.0921 + 0.9831*A                   0.6272
A      α      1.00  α = 3.2209 - 1.7802*A                   0.9961

B      R2     0.33  R2 = 0.8973 + 0.0339*B                  0.1197  2
S      B      0.40  B = -5.3964 + 5.6494*S^0.0217           0.6331
I      B      0.46  B = 1 - 0.7391*exp(-I/55.8425)          0.6026
B      Λ      0.47  Λ = 1.4678 + 0.3283*B                   0.2514  1
RRrel  B      0.71  B = 1.1264 - 1.1473*RRrel^31.8398       0.7256  2
B      R1     0.72  R1 = 0.9362 - 0.1666*B^4.7810           0.7984  2
Hrel   B      0.75  B = 0.9581 - 1.0259*Hrel^30.4349        0.7406  3
B      R4     0.77  R4 = 0.8408 - 0.3632*B^4.1031           0.7266
A      B      0.79  B = 0.0921 + 0.9831*A                   0.6272
B      α      0.80  α = 2.7529 - 1.0649*B^1.8363            0.6755
B      αS     1.00  αS = 3.1863 - 1.7117*B                  0.9949

α      R2     0.32  R2 = 0.9818 - 0.0282*α                  0.1566
S      α      0.42  α = 1.7 + 0.8111*exp(-S/68.2008)        0.4292
α      Λ      0.46  Λ = 2.1524 - 0.2141*α                   0.2079
I      α      0.47  α = 1.7 + 0.8906*exp(-I/94.8474)        0.4510
α      R1     0.62  R1 = 0.93191 - 107.1708*exp(-α/0.2587)  0.6276  2
α      R4     0.63  R4 = 0.8187 - 285.8296*exp(-α/0.2518)   0.5474
RRrel  α      0.63  α = -2.2224 + 5.0227*RRrel^4.0487       0.5168  2
Hrel   α      0.67  α = 1.6180 + 1.2022*Hrel^15.3522        0.4863  2
α      αS     0.80  αS = 1.0411 + 0.2209*α^1.9232           0.6277
B      α      0.80  α = 2.7529 - 1.0649*B^1.8363            0.6755
A      α      1.00  α = 3.2209 - 1.7802*A                   0.9961

αS     R2     0.30  R2 = 0.9618 - 0.0207*αS                 0.1211  1
S      αS     0.37  αS = 1.6 + 1.0465*exp(-S/25.6391)       0.5849  1
I      αS     0.42  αS = 1.6 + 1.2519*exp(-I/39.1304)       0.5861
αS     Λ      0.45  Λ = 2.0448 - 0.1774*αS                  0.2037
RRrel  αS     0.70  αS = 1.4031 + 1.9086*RRrel^38.5419      0.7170  2
αS     R1     0.70  R1 = 0.9367 - 124.8114*exp(-αS/0.2359)  0.7971  2
Hrel   αS     0.72  αS = 1.618 + 1.7805*Hrel^35.1141        0.7102  3
αS     R4     0.75  R4 = 0.8424 - 107.2138*exp(-αS/0.2747)  0.7265
A      αS     0.79  αS = 2.9170 - 1.6430*A^1.1992           0.6226
α      αS     0.80  αS = 1.0411 + 0.2209*α^1.9232           0.6277
B      αS     1.00  αS = 3.1863 - 1.7117*B                  0.9949

R1     Λ      0.01  Λ = 2.2006 - 0.5847*R1                  0.0275  2
R1     R2     0.06  R2 = 1.0077 - 0.0993*R1                 0.0330  2
S      R1     0.56  R1 = 0.9963 - 0.0443*S^0.2449           0.7250  2
I      R1     0.58  R1 = 0.9996 - 0.0323*I^0.2928           0.6743  2
A      R1     0.61  R1 = 0.9302 - 0.2598*A^4.4935           0.6224  2
α      R1     0.62  R1 = 0.93191 - 107.1708*exp(-α/0.2587)  0.6276  2
αS     R1     0.70  R1 = 0.9367 - 124.8114*exp(-αS/0.2359)  0.7971  2
B      R1     0.72  R1 = 0.9362 - 0.1666*B^4.7810           0.7984  2
R1     R4     0.86  R4 = 0.9842*R1^2.6096                   0.8060  2
Hrel   R1     0.92  R1 = -0.5701 + 1.5374*Hrel              0.9133  4
RRrel  R1     0.96  R1 = -2.1780 + 3.1664*RRrel             0.9447  1

R1     R2     0.06  R2 = 1.0077 - 0.0993*R1                 0.0330  2
Hrel   R2     0.14  R2 = 1.1650 - 0.2581*Hrel               0.0952  2
RRrel  R2     0.15  R2 = 1.2065 - 0.2964*RRrel              0.0274  2
R2     R4     0.16  R4 = 0.7733 - 61.9188*R2^118.6560       0.2111  1
αS     R2     0.30  R2 = 0.9618 - 0.0207*αS                 0.1211  1
S      R2     0.32  R2 = 0.8948*S^0.0085                    0.1751
α      R2     0.32  R2 = 0.9818 - 0.0282*α                  0.1566
A      R2     0.33  R2 = 0.8901 + 0.0517*A                  0.1656  1
B      R2     0.33  R2 = 0.8973 + 0.0339*B                  0.1197  2
I      R2     0.37  R2 = 0.3771 + 0.4917*I^0.0255           0.3129
R2     Λ      0.78  Λ = -3.1375 + 5.2308*R2^0.9879          0.6377  2

R2     R4     0.16  R4 = 0.7733 - 61.9188*R2^118.6560       0.2111  1
R4     Λ      0.17  Λ = 1.8626 - 0.2589*R4                  0.0280
α      R4     0.63  R4 = 0.8187 - 285.8296*exp(-α/0.2518)   0.5474
A      R4     0.63  R4 = 0.8149 - 0.6118*A^4.7963           0.5961  3
S      R4     0.65  R4 = 1.3591 - 0.4388*S^0.1080           0.9184  2
I      R4     0.69  R4 = 0.43 + 0.4145*exp(-I/201.7846)     0.7325
αS     R4     0.75  R4 = 0.8424 - 107.2138*exp(-αS/0.2747)  0.7265
B      R4     0.77  R4 = 0.8408 - 0.3632*B^4.1031           0.7266
RRrel  R4     0.81  R4 = -6.0059 + 6.9509*RRrel             0.8076  2
R1     R4     0.86  R4 = 0.9842*R1^2.6096                   0.8060  2
Hrel   R4     0.93  R4 = 0.2264 + 0.7224*Hrel^7.5475        0.9566  4

R1     Λ      0.01  Λ = 2.2006 - 0.5847*R1                  0.0275  2
RRrel  Λ      0.01  Λ = 4.2076 - 2.6059*RRrel               0.0510
S      Λ      0.09  Λ = -2.1657 + 3.6655*S^0.0159           0.2627  2
I      Λ      0.17  Λ = -6.9463 + 8.2416*I^0.0123           0.4283  2
R4     Λ      0.17  Λ = 1.8626 - 0.2589*R4                  0.0280
Hrel   Λ      0.18  Λ = 2.5936 - 0.9689*Hrel                0.0318  2
αS     Λ      0.45  Λ = 2.0448 - 0.1774*αS                  0.2037
A      Λ      0.46  Λ = 1.4599 + 0.3866*A                   0.2131
α      Λ      0.46  Λ = 2.1524 - 0.2141*α                   0.2079
B      Λ      0.47  Λ = 1.4678 + 0.3283*B                   0.2514  1
R2     Λ      0.78  Λ = -3.1375 + 5.2308*R2^0.9879          0.6377  2
The expression (y-a) in the denominator of two of the equations simply means that the beginning of the function is shifted by the constant a. All differential equations represent the simplest forms of Wimmer and Altmann's (2005) unified theory. Though some cases are not sufficiently corroborated here, we can suppose that if a relation exists, it can be expressed by one of these five formulas or their generalisations. It is to be emphasised that this is merely a beginning of theory development. Since a property is never completely isolated, it need not be linked directly with only one other variable. Sometimes the prediction may improve if we consider functions like y = f(x,z) or y = f(x,w,z), etc., i.e. a link to more than one independent variable. When we have n properties, we can have 2^n different control cycles. Consider for example the two functions of the indicator R2: for R2 = f(αS) we obtained a determination coefficient R² = 0.1211; for R2 = f(R1) we obtained R² = 0.0330, that is, in both cases a rather insignificant result. If we take R2 = f(αS) + f(R1) in the simple form R2 = c + a*αS^b + d*R1^k, we already obtain R² = 0.4340. This is still insignificant, but by adding further variables one could find a reliable net of interrelations. This research perspective seems encouraging when contemplating 3D plots of triplets of indicators, such as those illustrated in Figures 4.40, 4.41, and 4.42. The 3D plots (big spheres) show the connections that can be simultaneously established amongst any three of the indicators considered in Table 4.1, and one can easily recognise the 2D projections (small circles) presented in detail above. Obviously, these 3D plots become more lucid if the data are cleaned of outliers. Outliers represent, so to say, a foreign body in a low-dimensional view. But even if we consider a system of equations, the outliers can often be resolved only with some boundary conditions which hold only for the given poems. Unfortunately, beyond three dimensions we must give up graphical presentations and restrict ourselves to systems of equations.
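How a second independent variable can raise the determination coefficient may be sketched with an ordinary least-squares fit. The example below is a deliberate simplification: it fixes the exponents of the additive model at 1, so that y = c + a*x + d*z can be solved through the normal equations in plain Python, and it uses invented data in which y depends on both x and z. Neither the data nor the function names come from the analysis above.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):          # back substitution
        s = sum(aug[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (aug[r][n] - s) / aug[r][r]
    return sol

def fit_r2(predictors, y):
    """R² of the least-squares fit y = c + sum(coef_j * predictor_j)."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    m = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(m)] for a in range(m)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(m)]
    coef = solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: y = 2*x - z, so neither predictor alone explains y fully
x = [0, 1, 0, 1, 0, 1, 0, 1]
z = [0, 0, 1, 1, 2, 2, 3, 3]
y = [2 * xi - zi for xi, zi in zip(x, z)]

r2_x, r2_z, r2_xz = fit_r2([x], y), fit_r2([z], y), fit_r2([x, z], y)
```

In this constructed case each single predictor yields only a partial R², while the joint model reproduces y completely; with empirical indicators and free exponents the gain would be smaller and would require a nonlinear fitting routine.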
Figure 4.40. Hrel, R1, R4 relationship
Figure 4.41. I, S, R4 relationship
Figure 4.42. α, A, R1 relationship
References Agricola, E. (1969). Semantische Relationen im Text und im System. Halle: Niemeyer. Altmann, G. (1963). Phonic structure of Malay pantun. Archiv orientalní 32, 274–286. Altmann, G. (1966a). The measurement of euphony. Teorie verše I, 259–261. Brno: Universita J. Purkyně. Altmann, G. (1966b). Binomial index of euphony for Indonesian poetry. Asian and African Studies 2, 62–67. Altmann, G. (1968). Some phonic features of Malay shaer. Asian and African Studies 4, 9–16. Altmann, G. (1978). Zur Anwendung der Quotiente in der Textanalyse. Glottometrika 1, 91–106. Altmann, G. (1987). Tendenzielle Vokalharmonie. Glottometrika 8, 104–112. Altmann, G. (1999). Von der Fachsprache zum Modell. In: Wiegand, H.E. (ed.), Sprache und Sprachen in den Wissenschaften. Geschichte und Gegenwart: 294-312. Berlin: de Gruyter. Altmann, G. (2007). Poesie und Mathematik. Göttinger Beiträge zur Sprachwissenschaft 14, 7– 24. Altmann, G. (2009). Texte und Theorien. In: Delcourt, V., Hug, M. (eds.), Mélanges offerts à Charles Muller: 37–45. Paris: Conseil International de la Langue Française. Altmann, G., Popescu, I.-I., Zotta, D., (2013). Stratification in Texts, Glottometrics 25, 85–93; http://www.cs.auckland.ac.nz/research/groups/CDMTCS/researchreports/433APZ.pdf Altmann, G., Štukovský, R. (1965). The climax in Malay pantun. Asian and African Studies 1, 13–20. Altmann, V., Altmann, G. (2008) Anleitung zu quantitativen Textanalysen. Lüdenscheid: RAM. Ammermann, S. (2001). Zur Wortlängenverteilung in deutschen Briefen über einen Zeitraum von 500 Jahren. In: Best, K.-H. (ed.) (2001b), 59–91. Andres, J. (2010). On a conjecture about the fractal structure of language. Journal of Quantitative Linguistics 17(2), 101–122. Andres, J., Benešová, M. (2012). Fractal analysis of Poe’s Raven. Journal of Quantitative Linguistics 19(4), 301–324. Arapov, V.V. (1977). Dva modeli rangovogo raspredelenija. Vorposy informacionnoj teorii i praktiki 4, 3-42. Arapov, V.V., Šrejder, Ju.A. (1977). 
Klassifikacii i rangovye raspredelenija. Naučno-techničeskaja informacija 11(12), 15–21. Arapov, V.V., Šrejder, Ju.A. (1978). Zakon Cipfa i princip dissimetrii sistemy. Semiotika i informatika 10, 74–95 Arlt, I. (2006). Zur Wortlängenverteilung in SMS-Texten. Göttinger Beiträge zur Sprachwissenschaft 13, 9–21. Baayen, R.H. (1989). A corpus-based approach to morphological productivity. Statistical analysis and psycholinguistic interpretation. Amsterdam: Centrum voor Wiskunde en Informatica. Bernet, Ch. (1988). Faits lexicaux. Richesse du vocabulaire. In: Thoiron, Ph., Labbé, D., Serant, D. (eds.), Études sur la richess et la structure lexicale: 1-11. Paris-Genève: ChampionSlatkine. Best, K.-H. (2001a). Silbenlängen in Meldungen der Tagespresse. In: Best, K.H. (ed.) (2001b), 15–32. Best, K.-H. (ed.) (2001b). Häufigkeitsverteilungen in Texten. Göttingen: Peust & Gutschmidt.
References 271 Best, K.-H. (2003). Quantitative Linguistik. Eine Annäherung. Göttingen: Peust & Gutschmidt. Best, K.-H. (2005a). Morphlänge. In: Köhler, R., Altmann, G., Piotorowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 255-260. Berlin-New York: de Gruyter. Best, K.-H. (2005b). Wortlänge. In: Köhler, R., Altmann, G., Piotorowski, R.G. (eds.), Quantittative Linguistics. An International Handbook: 260-273. Berlin-New York: de Gruyter. Best, K.H., Kaspar, I. (2001). Wortlängen in Faröischen. In: Best, K.H. (ed.) (2001b), 92-100. Bortz, J., Lienert, G.A., Boehnke, K. (1990). Verteilungsfreie Methoden in der Biostatistik. Berlin: Springer. Bradley, J.V. (1968). Distribution-free statistical tests. Englewood Cliffs: Prentice Hall. Brainerd, B. (1976). On the Markov structure of the text. Linguistics 176, 5–30. Brown, C., Yule, G. (1983). Discourse analysis. Cabridge: Cabridge University Press. Brunet, É. (1978). Le vocabulaire de Jean Giraudoux. Structure et évolution. Genève: Slatkine. Bunge, M. (1967). Scientific research. Berlin-Heidelberg-New York: Springer. Bunge, M. (1983). Epistemology & Methodology I: Exploring the world. Dordrecht: Reidel. Bunge, M. (1995). Quality, quantity, pseudoquantityt and measurement in social science. Journal of Quantitative Linguistics 2, 1–10. Busemann, A. (1925). Die Sprache der Jugend als Ausdruck der Entwicklungsrhythmik. Jena: Fischer. Christmann, C. (2004). Denotative Textanalyse am Beispiel von Zeitungsartikeln. Seminararbeit, Trier. Cosette, A. (1994). La richesse lexicale et sa mesure. Paris: Champion. Cox, D.R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society B, 20, 215–232. Dindelegan, G.P. (ed.) (2013). The Grammar of Romanian. Oxford: Oxford University Press. Dugast, D. (1980). La statistique lexicale. Genève: Slatkine. Essler, W.K. (1971). Wissenschaftstheorie II. Freiburg: Alber. Esteban, M.D., Morales, D. (1995). A summary of entropy statistics. 
Kybernetica 31(4), 337-346. Fan, F., Altmann, G. (2007). Measuring the cohesion of compounds.In: Kaliuščenko, V., Köhler, R., Levickij, V. (eds.), Problems of typological and quantitative lexicology: 190-209. Černovcy: RUTA. Ferrer i Cancho, R. (2005). The variation of Zipf’s law in human language. European Physical Journal 44, 249–257. Galtung, J. (1967). Theory and methods of social research. Oslo: Universitetsforlaget. Gibbons, J.D. (1971). Nonparametric statistical inference. New York: McGraw-Hill. Gini, C. (1921). Maß der Verschiedenheit und der Einkommen“. Das ökonomische Journal 31: 124–126. Gini, C. (1936) On the Measure of Concentration with Special Reference to Income and Statistics. Colorado College Publication, General Series No. 208, 73-79. Grotjahn, R., Altmann, G. (1988). Linguistische Messverfahren. In: Ammon, U., Dittmar, N., Matheier, K.J. (eds.), Sociolinguistics. Soziolinguistik: 1026-1039. Berlin-New York: de Gruyter. Grzybek, P. (ed.) (2006). Contributions to the science of text and language. Word length studies and related issues. Dordrecht: Springer. Guiraud, P. (1954). Les charactères statistiques du vocabulaire. Essai de méthodologie. Paris: Université de France. Haight, F. A. (1966). Some statistical problems in connection word word association data. Journal of Matematical Psychology 3, 217–233.
272 References Haight, F.A. (1969). Two probability distributions connected with Zipf´s rank-size conjecture. Zastosowania matematyki 10, 225–228. Halliday, M.A.K., Hassan, R. (1976). Cohesion in English. London: Longman. Herdan, G. (1956). Language as choice and chance. Groningen: Nordhoff. Herdan, G. (1962). The calculus of linguistic observations. The Hague: Mouton. Herdan, G. (1966). The advanced theory of language aa choice and chance. Berlin: Springer. Herfindahl, O. (1950). Concentration in the steel industry. Diss., New York: Columbia University. Hoffmannová, J. (1996). Analýza diskurzu (ve světle nových publikací). Slovo a slovesnost 57(2), 109–115. Honore, T. (1979). Some simple measures of richness of vocabulary. ALLC Bulletin 7, 172–177. Hřebíček, L. (1985). Text as a unit and co-references. In: Ballmer, T.T. (ed.), Linguistic dynamics: 190-198. Berlin-New York: de Gruyter. Hřebíček, L. (1992). Text in communication: Supra-sentence structure. Bochum, Brockmeyer. Hřebíček, L. (1993). Text as a construct of aggregations. In: Köhler, R., Rieger, B. (eds.), Contributions to quantitative linguistics. Dordrecht: Kluwer: 33–39. Hřebíček, L. (1995). Text levels. Language constructs, constituents and Menzerath-Altmann law. Trier: WVT. Hřebíček, L. (1996). Word associations and text. Glottometrika 15, 12–17. Hřebíček, L. (1997). Lectures on text theory. Prague: Oriental Institute. Hřebíček, L. (2000). Variation in sequences. Prague: Oriental Institute. Hubert, P., Labbé, D. (1994). La richesse du vocabulaire. Communication au congrès de l’ ALLCACH. Paris: Sorbonne. Kaeding, F.W. (1897/98). Häufigkeitswörterbuch der deutschen Sprache. Steglitz: Selbstverlag. Kelih, E. (2009). Preliminary analysis of a Slavic parallel corpus. In: Levická, J., Garabík, R. (eds.), NLP, Corpus Linguistics, Corpus Based Grammar Research. Fifth International Conference Smolenice, Slovakia, 25-27 November 2009, Proceedings: 173–183. Bratislava: Tribun. Kendall, M.G. (1962). 
Rank correlation methods. London: Griffin. Köhler, R. (1986). Zur synergetischen Linguistik. Struktur und Dynamik der Lexik. Bochum: Brockmeyer. Köhler, R, (2005). Synergetic liguistics. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 760-774. Berlin-New York, de Gruyter. Köhler, R., Galle, M. (1993). Dynamic aspects of text characteristics. In Altmann, G., Hřebíček, L. (eds.), Quantitative Text Analysis: 46-53. Trier: WVT. Köhler, R., Naumann, S. (2007). Quantitative analysis of co-reference structures in texts. In: Grzybek, P., Köhler, R. (eds.), Exact methods in the study of language and text: 317–329. Berlin-New York: Mouton de Gruyter. Ku, H.H. (1963). A note on contingency tables involving zero frequencies and the 2I test. Technometrics 5, 398–400. Levinson, S.C. (1983). Pragmatics. Cambridge: Cambridge University Press. Li, W. (1992). Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Transactions on Information Theory 38(6), 1842–1845. Lienert,
References 273 Lord, R.D. (1958). Studies in the history of probability and statistics. VIII: De Morgan and the statistical study of literary style. Biometrika 45, 282. Mandelbrot, B. (1953). An informational theory of the statistical structure of language. In: W. Jackson. (ed.), Communication theory: 486-502. London: Butterworth. Manin, D.Yu. (2009). Mandelbrot’s model for Zipf’s law. Can Mandelbrot’s model explain Zif’s law for language? Journal of Quantitative Linguistics 16(3), 274-285. Mann, F.J., Whitney, D.R. (1947). On a test whether one of two random variables is stochastically larger than the other. Annals of Mathematica Statistics 18, 50–60. Maxwell, A.E. (1961). Analysing qualitative data. London: Methuen McIntosh, R.P. (1967). An index of diversity and the relation of certain concepts to diversity. Ecology 48, 392–404. Ménard, N. (1983). Mesure de la richesse lexicale. Genève: Slatkine. Meyer-Eppler, W. (1959). Grundlagen und Anwendungen der Informationstheorie. Berlin: Springer. Miller, G.A. (1957). Some effects of intermittent silence. The American Journal of Psychology 70, 311-314. Mood, A.M. (1940). Distribution theory of runs. The Annals of Mathematical Statistics 11, 367392. Muller, Ch. (1964). Calcul de probabilité et calcul d’ un vocabulaire. Travaux de linguistique et literature 2(1), 235–244. Muller, Ch. (1970). Sur la mesure de la richesse lexicale. Études de linguistique appliqué, Nouvelle serie 1, 20–46. Muller, Ch. (1977). Principes et méthodes de la statistique linguistique. Paris: Hachette. Naranan, S., Balasubrahmanyan, V.K. (2005). Power laws in statistical linguistics and related systems. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 716–738. Berlin: de Gruyter. Nemcová, E., Altmann, G. (1994). Zur Wortlänge in slowakischen Texten. Zeitschrift für empirische Textforschung 1, 40–43. Numan, D. (1993). Introducing discourse analysis. London: Penguin. Oakes, M.P. (2007). 
Ord’s criterion with word length spectra for the discrimination of texts, music and computer programs. In: Grzybek, P., Köhler, R. (eds.), Exact methods in the study of language and text: 508-519. Berlin-New York: Mouton de Gruyter. Orlov, Ju.K., Boroda, M.G., Nadarejšvili, I.Š. (1982) Sprache,Ttext, Kunst. Quantitative Analysen. Bochum: Brockmeyer. Owen, D.B. (1962). Handbook of statistical tables. Reading, MA: Addison-Wesley. Palek, B. (1988). Referenční výstavba textu. Praha: Univerzita Karlova. Popescu, I.-I. (2007). Text ranking by the weight of highly frequent words. In: Grzybek, P., Köhler, R. (eds.), Exact methods in the study of language and text: 555-565. Berlin-New York: Mouton de Gruyter. Popescu, I.-I., Altmann, G. (2006). Some aspects of word frequencies. Glottometrics 13, 24–46. Popescu, I.-I., Altmann, G. (2007). Writer’s view of text generation. Glottometrics 15, 71–81. Popescu, I.-I., Altmann, G. (2011). Thematic concentration in texts. In: Kelih, E., Levickij, V., Matskulyak, Y. (eds.), Issues in Quantitative Linguistics 2: 110-116. Lüdenscheid: RAMVerlag.. Popescu, I.-I., Altmann, G., Köhler, R. (2010). Zipf´s law – another view. Quality and Quantity 44(4), 713–731.
Popescu, I.-I., Čech, R., Altmann, G. (2010). Structural conservatism and innovation in texts. Glottotheory 3(2), 43–63.
Popescu, I.-I., Čech, R., Altmann, G. (2011). The lambda-structure of texts. Lüdenscheid: RAM-Verlag.
Popescu, I.-I., Čech, R., Altmann, G. (2012). Some characterizations of Slovak poetry. In: Naumann, S., Grzybek, P., Vulanović, R., Altmann, G. (eds.), Synergetic Linguistics. Text and Language as Dynamic Systems: 187–165. Wien: Praesens.
Popescu, I.-I., Čech, R., Altmann, G. (2013). The descriptivity in Slovak lyrics. Glottotheory 4(1), 92–104.
Popescu, I.-I., Kelih, E., Best, K.-H., Altmann, G. (2009). Diversification of the case. Glottometrics 18, 32–39.
Popescu, I.-I., Mačutek, J., Altmann, G. (2009). Aspects of word frequencies. Lüdenscheid: RAM-Verlag.
Popescu, I.-I., Mačutek, J., Altmann, G. (2010). Word forms, style and typology. Glottotheory 3(1), 89–96.
Popescu, I.-I., et al. (2009). Word frequency studies. Berlin-New York: Mouton de Gruyter.
Popescu, I.-I., Kelih, E., Mačutek, J., Čech, R., Best, K.-H., Altmann, G. (2010). Vectors and codes of text. Lüdenscheid: RAM-Verlag.
Schiffrin, D. (1987). Discourse markers. Cambridge: Cambridge University Press.
Schulz, K.-P., Altmann, G. (1988). Lautliche Strukturierung von Spracheinheiten. Glottometrika 9, 1–48.
Schwarz, C. (1995). The distribution of aggregates in text. ZET – Zeitschrift für empirische Textforschung 2, 62–66.
Sebeok, T.A., Zeps, V.J. (1959). On non-random distribution of initial phonemes in Cheremis verse. Lingua 8, 370–384.
Skinner, B.F. (1939). The alliteration in Shakespeare’s sonnets: A study in literary behaviour. The Psychological Record 3, 186–192.
Skinner, B.F. (1941). A quantitative estimate of certain types of sound patterning in poetry. The American Journal of Psychology 54, 64–79.
Stegmüller, W. (1970). Theorie und Erfahrung. Berlin-Heidelberg-New York: Springer.
Stubbs, M. (1983). Discourse analysis. Oxford: Blackwell.
Štukovský, R., Altmann, G. (1965). Vývoj otvoreného rýmu v slovenskej poézii. Litteraria VIII, 156–161.
Štukovský, R., Altmann, G. (1966). Die Entwicklung des slowakischen Reimes im XIX und XX Jahrhundert. In: Teorie verše I, 258–261. Brno: Universita J.E. Purkyně.
Thoiron, Ph. (1988). Richesse lexicale et classement des textes. In: Thoiron, Ph., Labbé, D., Serant, D. (eds.), Études sur la richesse et la structure lexicale: 141–163. Paris-Genève: Champion-Slatkine.
Thoiron, Ph., Labbé, D., Serant, D. (eds.) (1988). Études sur la richesse et la structure lexicale: 141–163. Paris-Genève: Champion-Slatkine.
Tuzzi, A., Popescu, I.-I., Altmann, G. (2010a). Quantitative Analysis of Italian Texts. Lüdenscheid: RAM-Verlag.
Tuzzi, A., Popescu, I.-I., Altmann, G. (2010b). The golden section in texts. ETC – Empirical Text and Culture Research 4, 30–41.
Vater, H. (1994). Einführung in die Textlinguistik. Struktur, Thema und Referenz in Texten. München: Fink.
Viehweger, D. (1978). Struktur und Funktion nominativer Ketten im Text. Studia Grammatica 17, 149–168.
Wimmer, G., Altmann, G. (1999). Thesaurus of univariate discrete probability distributions. Essen: Stamm.
Wimmer, G., Altmann, G. (2005). Unified derivation of some linguistic laws. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 791–807. Berlin: de Gruyter.
Wimmer, G., Altmann, G., Hřebíček, L., Ondrejovič, S., Wimmerová, S. (2003). Úvod do analýzy textov. Bratislava: Veda.
Ziegler, A. (2005). Denotative Textanalyse. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 423–447. Berlin-New York: de Gruyter.
Ziegler, A., Altmann, G. (2002). Denotative Textanalyse. Wien: Praesens.
Zipf, G.K. (1935/1968). The psycho-biology of language: an introduction to dynamic philology. Cambridge, Mass.: The M.I.T. Press.
Zipf, G.K. (1949). Human behaviour and the principle of least effort. Cambridge, MA: Addison-Wesley.
Zörnig, P., Boroda, M.G. (1992). The Zipf-Mandelbrot law and the interdependencies between frequency structure and frequency distribution in coherent text. Glottometrika 13, 205–218.
Index active-descriptive equilibrium 93 activity 216f., 239 aggregation 9, 56 alliteration 9, 44, 50, 56 alliterative weight 53 Altmann 9, 22, 31, 37, 56f., 70, 75, 84, 92, 99, 103f., 134, 136f., 139, 141, 144, 153, 158f., 171, 175, 182, 197, 202, 209, 211, 217f., 238, 268, 2f., 5f. analytic 103f. analytic language 74 Andres 3 Arapov 102 arc length 66, 129 association 33, 39, 73 assonance 9, 31, 39f. asymmetry 39f. attractor 134 autosemantics 104, 171, 182 Baayen 102 Balasubrahmanyan 103 Benešová 3 Beowulf alliteration 44, 53f. Beowulf-alliteration 54 Bernet 182 Best 158, 196, 211 beta 219 beta function 70, 140 Boehnke 234, 238 Boroda 45, 102 Bortz 234, 238 boundary condition 263 Bowker-test 40 Bradley 236 Brainerd 31 Brunet 182 Bulgarian 103 Bunge 210, 5, 7 Busemann 92 Busemann coefficient 92 Busemann's ratio 217 Čech 70, 134, 137, 139, 144, 197, 218
ceteris paribus condition 76 chaos 3 climax 238 coefficient of euphony 26 concentration 147, 253 concentration indicator 144 concordance 215 control cycle 242 correlation 243, 251, 253 correlation coefficient 242 cosine 58 Cossette 182 Cox 234 cumulative frequency distribution 99 Czech 137 de Morgan 196 deducibility 6 descriptive equilibrium 93 descriptiveness 98, 216f., 239 determination coefficient 243 diagonal 37f. difference equation 75 dis legomena 110 dispersion 147, 253 dissimilarity 59 distance 45, 56, 61, 64f. distances 64 distribution – binomial 22, 209, 217 – Conway-Maxwell-Poisson 80f. – displaced Poisson 209 – displaced Singh-Poisson 209 – Ferreri-Poisson 82, 84, 202 – frequency 98 – hyper-Poisson 80, 83, 202 – negative hypergeometric 197, 212 – normal 32, 6 – Poisson 202, 237 – rank-order 92 – Riemann zeta 103 – sampling 5 – word-length 202 – zeta 103
– Zipf 103, 109 – Zipf-Alekseev 108 – Zipf-Mandelbrot 108 diversification 211 Dugast 182 English 137, 210, 216 entropy 144, 242, 253 equilibrium 93, 134, 147, 217 Essler 7 Esteban 144 Euclidean distance 129 euphonic tendency 26, 28 euphony 21f., 26, 28, 31, 45 euphony indicator 26 exactness 6 explanation 1 exponential 219 Ferrer i Cancho 103 frequency distribution 98 frequency sequence 122 Friedman's analysis of variance 213 Galle 218 Galtung 5 German 87, 98, 196, 210, 216 Gibbons 238 Gini 242 Gini's coefficient 153 golden section 134, 171, 175f., 180f. Grotjahn 5 Grzybek 196 Guiraud 182 Haight 102 Hammerl 211 hapax legomena 103, 110 Hawaiian 103 Herdan 103, 144 Herfindahl 144 homogeneity 213 homogeneity test 93 Honore 182 h-point 182, 254 hreb 99, 157 Hřebíček 99, 3
Hrel 255 Hubert 182 Hungarian 103 hypertext 2 hypothesis 211, 4ff. – statistical 4 iconic 28 idiosyncrasy 7 indicator of concentration 144 Indonesian 39 information statistics 94 inner rhythm 74 Jakobson 1 Javanese 21 Joos' model 103 Kaeding's 98 Kelih 158, 197, 3 Kendall 19 Kendall's concordance coefficient 19, 213 Köhler 9, 99, 104, 197, 211, 218, 242, 262, 3 Köhler's requirements 211 k-point 183 Labbé 182 lambda 129, 242 Latin 87 law 98f., 263, 2, 5f., 8 lemma 157 lemmatisation 99 length 31, 41, 97 Li 102 Lienert 234, 238 link 5 Lord 196 Lorenz curve 153f. Mačutek 103, 134, 136f., 141, 3 Malay 31 Mandelbrot 99, 102 Manin 103 Mann 238 Mann-Whitney U-test 238 Markov 31 Maxwell 234
McIntosh 146, 242 Ménard 182 Meyer-Eppler 22 Miller 99, 102 mixing 202 Mohanty 197 Mood test 236f. Morales 144 Morse function 219ff. motif 97 Muller 182 Nadarejšvili 45 Naranan 102 Naumann 99, 197 Nemcová 211 nominal style 98 nominality 216 non-smoothness 65 normality 6 open rhyme 85 Ord 242 Ord's criterion 122 Ord's scheme 197 Orlov 45, 102 ornamental 93 ornamentality 92, 98 Overbeck 197, 211 Owen 238 parallelism 31 part of speech 74, 93f., 96 parts of speech 104, 210 Parts-of-speech 87f. Peircean abduction 210 phoneme frequencies 9 placing 238 poem 1 Poisson 75ff., 79f., 82 Poisson distribution 75 Popescu 9, 70, 99, 103f., 134, 136f., 139, 141, 144, 153, 158f., 164, 171, 175, 182f., 188, 194, 197, 202, 218, 3 power function 41, 124, 230 property 6 prose rhythm 1
R1 183, 187, 194, 253, 255 R2 188f., 193 R4 255 radian 58, 171, 176, 180f., 242 rank-frequency sequence 118 regularity 2 reliability 5 renewal process 28 repeat rate 144f., 242, 253 rhyme 9, 21, 31, 73f., 87, 93 – closed 74 – feminine 74, 86 – masculine 74, 86 – masculine 86 – masculine 87 – mixed 74 – open 74, 84 rhyme – closed 84 rhyme words 86f. rhyme-word 75, 88 rhythmic structure 1 richness 242, 5 Rothe 211 roughness 65f., 68 Rovenchak 197 run length 236 runs 231 Sahlean 61 Sanada 197 Schulz 37 Sebeok 22 self-information 147 self-organisation 21, 122, 134, 242 self-regulation 175, 242, 6 self-regulation cycle 256 self-regulation. 76 self-stimulation 56 sequence 41f. sequential dependence 232 Serant 182 Shannon 144 sigmoid 70 similarities 60, 64, 70f. similarity 58ff., 64f., 68, 92f. simplicity 5
simplification 7 Singh-Poisson 208 Skinner 22, 56 Skinner alliteration 44f., 51, 53 Skinner alliterative 53 Skinner effect 21, 56, 68 Slovak 84, 106, 148, 197 Smith 197 spectrum 99f., 102, 157, 165, 175, 188, 242, 254 speech act 106 spontaneity 56, 68, 75 spontaneous 56, 61, 68, 231 Šrejder 102 statistical test 7 steady state 84 Stegmüller 7 stochastic processes 75 stratification 9, 17, 102, 106, 211, 213 stratification law 9 Štukovský 84, 238 Symmetry 39 synsemantics 104, 171, 182 synthetic 103f. synthetism 103 system 99, 5 systematisation 6 tendency 37 testability 6 theory 5f. Thoiron 182 tokenisation 98 tris legomena 110 Tuzzi 171, 175
uncertainty 147 unevenness 147 unified theory 268 uniformity 147 validity 5 variation interval 5 vocabulary richness 98, 148, 153f., 169, 181, 194 vowel harmony 37 Wentian Li 102 Whitney 238 Wilson 197 Wimmer 22, 26, 57, 75, 102f., 182, 202, 209, 211, 268, 5f. word class 97 word frequency 97ff. word length 74f., 98, 196 Word length 74 word lengths 74 word-forms 157 writer's view 171 Zeps 22 Ziegler 99, 211 Zipf 9, 99, 102, 211 Zipf-Alekseev function 212 Zipf-Alekseev's function 9 Zipf-Estoup law 103 Zipfian power function 212 Zipf's law 99, 102f., 108 Zörnig 102, 137 Zotta 104