Quantitative Analysis of Poetic Texts 9783110363791, 9783110336054

The book presents methods for the objective analysis of poetic language. Common objects of literary studies such as rhyt


English Pages 288 Year 2015

Table of contents :
Foreword
Contents
1 Introduction
2 Phonic phenomena
2.1 Occurrence without pattern
2.1.1 Phoneme frequencies
2.1.2 Euphony in general
2.2 Assonance
2.2.1 The diagonal
2.2.2 Symmetry
2.2.3 Poem length and significant sequences
2.3 Alliteration
2.4 Aggregation
2.5 Rhyme
2.5.1 Word length
2.5.2 Open and closed rhymes
2.5.3 Masculine and feminine rhyme
2.5.4 Parts of speech in rhyme words
3 The word
3.1 Introduction
3.2 Frequency distribution
3.2.1 Stratification
3.2.2 Ord's criterion
3.2.3 The lambda indicator
3.2.4 Entropy and repeat rate
3.2.5 Gini's coefficient
3.2.6 Geometric properties
3.2.6.1 The triangle
3.2.6.2 Writer's view and the golden section
3.3 Vocabulary richness
3.4 Word length
3.4.1 Ord's scheme
3.4.2 Word-length distribution
3.5 Word classes (parts of speech)
3.5.1 Frequencies
3.5.2 Descriptiveness vs. activity
3.5.3 Runs
3.5.3.1 Sequential dependence
3.5.3.2 Run length
3.5.3.3 Placing tendency
4 The control cycle
References
Index

Ioan-Iovitz Popescu
Mihaiela Lupea
Doina Tatar
Gabriel Altmann


Ioan-Iovitz Popescu, Mihaiela Lupea, Doina Tatar and Gabriel Altmann Quantitative Analysis of Poetic Texts

Quantitative Linguistics

Editors Reinhard Köhler Gabriel Altmann Peter Grzybek Advisory Editor Relja Vulanović

Volume 67

DE GRUYTER MOUTON

ISBN 978-3-11-033605-4 e-ISBN (PDF) 978-3-11-036379-1 e-ISBN (EPUB) 978-3-11-039479-5 ISSN 0179-3616 Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2015 Walter de Gruyter GmbH, Berlin/Boston Printing and binding: CPI books GmbH, Leck ♾ Printed on acid-free paper Printed in Germany www.degruyter.com

Foreword

The study of language is a journey without end, not only because there are many languages but also because there is an unlimited number of texts. Everyone produces several on a daily basis, and the only way to learn about languages (and not only about languages) is the Sisyphean analysis of this infinite number of texts. Usually, a given problem whose solution is sought presupposes some definitions, some conventions and some hypotheses. The definitions concern concepts which are created by the researcher and enable her/him to describe and classify. This is the mandatory starting point of any analysis. If we want to proceed to the next level, we must try to test some hypotheses about the properties and the behavior of the given classes. But even if we succeed in doing so and capture the results in the form of a model, we see that each of the classes can be further scrutinized and split into new classes according to another property. For example, classifying the words of a text into parts of speech (level 1), we state that the nouns have different grammatical functions (level 2); in turn we may state that every grammatical function contains elements differing in their polysemy (level 3), etc.

This procedure does not differ from that in physics or astronomy. The only difference is the fact that language is a cultural product; its analysis necessitates rather complex methods. The next complication, shared also with biology, is the variability of texts. Texts are written by persons of different age, education, gender, mother tongue and social status; they are written or uttered for different aims, in parts or as a whole (text sorts); they describe different (existing or imaginary) matters and sometimes adhere to quite different restrictions (e.g. meter, rhythm, rhyme). And each time we decide to analyze some of the aspects, find a mathematical model, test it and subsume the discovered regularity under a system, we discover a new aspect. And somewhere on this endless wandering we shall meet psychologists, biologists and physicists and will be forced to take into account their view of things.

In the present volume we took a very restricted path: we analyzed the poetic work of the Romanian author Mihai Eminescu and tried to show at least two aspects of texts, viz. the phonetic aspect (Ch. 2) and the vocabulary (Ch. 3). The control cycle is presented in Ch. 4. We hope that the methods will be used for analyzing many different texts.

We want to express our gratitude to several colleagues who helped, advised, corrected and improved the book. In the first place, it was Reinhard Köhler who spent more time on corrections to this book than on writing his own. Other colleagues, Relja Vulanović, Ján Mačutek, Gabriela Pană Dindelegan, Claudiu Vasilescu, Sorin Vizireanu, and Dan Zotta, helped us with the English language, mathematics, computing, graphics, and Romanian, and we are glad that they did not end up with shattered nerves.

Ioan-Iovitz Popescu
Mihaiela Lupea
Doina Tatar
Gabriel Altmann


1 Introduction

Poetic texts can be analyzed from an infinite number of viewpoints, just like any text and the whole of human behaviour. Every viewpoint is interesting for some scientific discipline, and the number of viewpoints increases with the advancement of science. Our aim is very restricted, but, nevertheless, it opens up an infinite domain of new problems. And every problem can be solved in different ways. Hence, there is a path without end, wherever one begins and in whatever direction one goes.

In the present volume, we shall concentrate on a small number of methods used in the study of poetic texts and apply them to some already quantified textual properties. Our textual examples are poems; they are often short and each result can be checked even without the use of a computer. Besides, the study of the phonic structure of poems is reasonable because, according to R. Jakobson, in poetry the form stays in the foreground. In prose, the phonic structure is not as prominent as in poetry, and the rhythmic structure of prose depends also on the character of the given language; it is seldom a conspicuous property of a single text. Nevertheless, there is a discipline engaged in the study of prose rhythm.

The methods presented in this study are applied to a corpus of 150 Romanian poems (including also a few “outliers”) written by Mihai Eminescu, as they can be found in many editions of his works, in texts analysing his works, or on the Internet: http://ro.wikisource.org/wiki/Autor:Mihai_Eminescu. In the present investigation, quantitative methods proven and tested in studies of prose texts, including methods for text comparison, are applied to poetic texts. Inter-sort or inter-language comparisons are frequently somewhat futile because each genre and each language has its own characteristic ways of text creation; hence most of the properties are significantly different. A statistical test simply confirms this expectation.
We shall study phonic features, word-form frequencies, word length, word classes, and the semantic structure of the poems, revealing some parts of the author's world of associations. Each of them has many facets, but we concentrate rather on methods and methodology.

An obvious question at the beginning of any book on text studies is: What is a text? However, in contemporary science, such essentialist questions are rather outdated. They require determinations of a kind of Kantian noumenon, the essence of a thing, which does not exist or, expressed in a weaker form, would not explain anything, because explanations form an infinite hierarchy whereas the “essence” would be a final (and therefore not acceptable) station on this way. Hence the only rational question is: what do we consider as a text?

For the purpose of the present study, a text is a linear sequence of meaningful entities, organized also hierarchically (e.g. in the hierarchy sentence, clause, phrase, word, morpheme, syllable, phoneme). In linguistics, we restrict ourselves to spoken or written material, but even within this restricted field we find exceptions. Hypertexts, e.g. on the Internet, which are full of pictures and links, or texts in comics, etc., belong to the domain of intertextuality. Of course, one can study them, too, from various points of view, but they are not standard texts of the kind we are interested in. The texts of our interest are written in some script and their entities do not have only a purpose (like the kitchen in a house) but also a meaning, i.e., they refer to objects outside of the text. Nevertheless, even under this restriction, they have many properties in common with other linear sequences, and consequently, many methods used in non-linguistic disciplines can be applied also in linguistics.

In quantitative linguistics, neither the explication of a text nor the description of its content nor its evaluation (whether aesthetic or stylistic) is among the aims or results of research. Quantitative-linguistic research aims at finding regularities which arise due to the effect of (possibly still unknown) background laws. These regularities should not be confused with grammatical rules, which can be learnt or changed or even violated, and appear, in a manner of speaking, on the surface of the texts. We rather search for textual phenomena which are evoked by, and are evidence of, certain background mechanisms. We shall never know all of them, but approaching the matter stepwise allows us to penetrate deeper and deeper.

There are five main approaches to text analysis (cf.
Altmann 2007, 2009):

(1) The static approach is concerned with the text as a whole, comprising the computation of all known properties, stylistic studies, evaluation of frequencies of different phenomena, lengths, polysemy values, word associations, measurement of grammatical structures, rankings, diversifications, classifications, denotative structures, measurement of differences, entropies, etc. This means that the text will be dissected into well defined units whose properties are studied. For this approach, at least elementary statistical methods are indispensable. Among the obvious tools, mathematical graphs and their properties provide easy ways to describe and display phenomena and relations.

(2) The sequential approach considers text as a linear sequence of entities forming time series, runs, Markov chains, reference chains, etc. These entities comprise degrees of properties, frequencies, metrical feet, distances between elements of the series, etc., and the position of certain degrees of a property in a higher unit, e.g. word length positioning in the given sentence. This approach is more complex and frequently requires more complex methods. Corresponding mathematical models may be based on differential and difference equations.

(3) A systemic approach can be started when some of the problems in the first two domains have been solved. Relations between entities, properties and structures which form control cycles and display the self-regulation mechanisms of text are in the focus of this approach. Though we know that texts are produced by authors, who consciously obey only grammatical rules and maybe rules of text structure, there are also latent, subconscious forces which compel the speaker/writer to form the text in a special way, e.g. reducing the decoding effort, reducing the memory effort, reducing sentence difficulty, increasing originality, etc. The writer is free with respect to the content but not free with respect to the external form of the text: s/he must abide by some laws if s/he wants to be understood. The axiom concerning the non-existence of isolated entities in language and text is a sufficient motivation for the systemic approach. Investigations of this kind are known from so-called synergetic linguistics (cf. Köhler 2005) and comprise both language and text.

(4) The typological approach consists of comparing all the above mentioned properties as they occur in texts of different languages, placing the languages and texts on different scales, building fuzzy classes and studying the variability of various phenomena. Though text analysis has played a secondary role in this research, its importance is receiving new impulses (cf. e.g. Kelih 2009; Popescu, Mačutek, Altmann 2009). However, the notorious classifications based on categorical concepts do not yield anything else but new, more general, concepts. We need them, but they seldom lead to theoretical progress.

(5) The chaos theoretical approach.
All aspects mentioned above contain some elements of chaos, which is placed in a deeper layer in all text phenomena. Some phenomena, e.g. fractals, dimensions, attractors, are identifiable, but because of their indirect relevance for the text sciences and also because of the computational effort they require, they have not yet been sufficiently discussed (cf. Hřebíček 1997, 2000; Andres 2010; Andres, Benešová 2011).

Ideally, a quantitative text analysis engages three different specialists. This is because at the beginning of the research, it is always the task of a linguist/text scientist to set up a hypothesis with linguistic relevance. No hypothesis, no quantitative text research! The linguist states what kind of data would be relevant for testing the hypothesis and the programmer tries to elicit them from texts. As opposed to facts and phenomena, data are not just given; they are the result of a scientific activity, they are constructed. To a text scientist, text is the matter from which data are conceptually constructed. In the meantime, the mathematician translates the verbal hypothesis into the language of mathematics, i.e. formulates it as a statistical hypothesis. At the same time s/he tries, together with the linguist, to find the mechanism that can lead to the rise of the given phenomenon. In other words, the mathematician tries to set up a model of the phenomenon and to subsume it under an existing theory, to embed it in a system of similar hypotheses. The programmer tests the hypothesis on her/his data and the mathematician interprets the outcome statistically. The results of the test are translated into the daily language of linguistics, and the linguist interprets the result linguistically.

Hence, the succession of persons in text analysis is: linguist → mathematician → programmer → mathematician → linguist. The linguist is placed at the beginning and the end of this procedure and warrants the linguistic relevance of the problem at the beginning and the relevance of the results at the end. Needless to say, mathematicians and programmers frequently propose excellent ideas; a sound cooperation yields the most reasonable results. Texts are sources also for other disciplines such as psycholinguistics, sociolinguistics, dialectology, language teaching, etc., in which the respective experts determine the course of research.

Another obvious question is: What can be considered as poetry?
The first answer is: Poetry is a kind of literary art where evocative and aesthetic effects are based on form, in addition to (sometimes: instead of) meaning. This volume aims at investigating the universal laws and interrelations of aspects connected with consciously formed texts under consciously imposed form restrictions. There are many commonalities in these texts but none of the properties can be taken as a necessary condition. Rhyme, rhythm, meter, the existence of verse lines, strophes, a fixed number of lines (as in sonnets), meaning, etc., can be found in many but not in all poems. We must rely on the judgement of literary historians, making allowance for the existence of outliers which may even destroy our theories. In many cases they can be made harmless by introducing boundary or subsidiary conditions.

A large part of quantitative characterisations is performed by means of indicators. Many of them tell the same story but their interpretation may be different. But if they tell the same story, then there is a clear link between them, even when their method of computation is different. The indicators should have at least the following properties (cf. Galtung 1967; Grotjahn, Altmann 1988; Wimmer et al. 2003: 25ff):

(1) Meaning. This seems to be quite natural, but many indicators arise in the form of a proportion which does not have a clear interpretation. The indicator must tell us what it describes.

(2) Simplicity, especially at the beginning of a research project, because it facilitates computation and the mathematical treatment. It is advantageous to express different properties with different indicators.

(3) Variation interval. If an indicator varies in the interval [0, ∞), a given value of this indicator cannot be interpreted. Every number can be considered large (with respect to the lower limit 0) or small (with respect to the upper limit ∞). It is therefore reasonable to restrict the value to a finite interval by means of normalization.

(4) Sampling distribution. This property of an indicator is indispensable for a reliable evaluation of the measured values. It gives information about the frequency or probability of the individual values of the indicator, information which is fundamental for any statistical assessment. Unfortunately, this requirement is still ignored in the humanities in many cases. In order to apply an indicator, e.g. for comparisons, one should know at least its variance, which is needed for asymptotic tests. Exact probabilities can be computed only when the distribution of the indicator is known. The application of non-parametric statistics, a well-established technique, is an alternative.

(5) Reliability is the measure of exactness and stability. The indicator should be stable and express the same property in all cases.

(6) Validity means that the indicator truly expresses the studied property.
An illustrative example in this respect is the large number of available measures of vocabulary richness, whose validity is an open question.

But all this cannot be achieved in an elementary, preliminary investigation. Research always begins with the first step and improves its argumentation step by step, sets up more complex hypotheses, extends the investigations to other languages and, based on the surface phenomena expressed by indicators, takes further steps towards a theory. A theory is a system of interrelated hypotheses, some of which can be considered laws, i.e. general statements derived from axioms or other laws, or in other words, anchored in antecedent knowledge, and empirically well corroborated (cf. Bunge 1967). In mainstream linguistics, the term theory is misused. It stands, as a rule, for concepts, isolated phenomena, descriptive approaches, sets of facts, classifications and sets of rules. All that, and even strict definitions (which are not more than conventions) and a preceding formalization do not have the status of a theory. The mentioned definitions and formalisations are merely necessary but not sufficient conditions for the construction of a theory. A theory begins to arise when we derive hypotheses from antecedent knowledge, test them empirically and join them with a system of universal, corroborated statements.

This is, of course, not a simple task because language is not a deterministic system with clear-cut units and relations. Though it is always in a steady state, it varies with every speaker, it changes incessantly, and communication is possible only because of its self-regulation. A speaker can, and does, change elements, but if s/he aims at communicative success, the change must not surpass a certain limit. With every change, the limit is shifted by a tiny quantity. Since this shift is always advantageous from the point of view of the speaker (s/he is the actor in this play), the phenomena in language are never distributed according to the normal (Gaussian) distribution. Every distribution in language is skewed. Nevertheless, values of whatever property taken from many texts may display normality in a statistical sense (a situation that can be tested), and a comparison with other text groups is possible by means of an asymptotic test based on normality.

The greatest advancements in every empirical science are achieved by introducing mathematical methods. Mathematics is a warrant of exactness, testability, deducibility, and systematisation, and it gives us the chance to predict phenomena which are not visible on the surface of texts. In spite of this, there are still objections against the application of mathematical instruments in literary studies. Such objections can be heard from hard-core poeticists relying only on intelligent verbal descriptions. The corresponding arguments have been analysed (cf. Altmann 1999; Wimmer et al. 2003: 14ff.) and will be reproduced here for the sake of clarity. The objections are: 1.
“Our objects cannot be quantified/mathematised.”
2. “Even if it were possible, we are not interested in numbers but in qualities and properties.”
3. “We are not interested in laws but in the uniqueness, idiosyncrasy of texts.”
4. “Our problems are so complex that no mathematics can capture and express them.”

Evidently, all these objections arise either from misunderstanding, in which case they can easily be appeased, or from a negative attitude towards mathematics, in which case they cannot be removed.

Objection 1 is rooted in false epistemology. We do not “mathematise” real objects, we quantify/mathematise only our concepts and ideas about them. Objects do not contain numbers which could be observed. Properties are first conceptually constructed, then quantified and at last measured. These measures are ascribed to objects. Properties are always gradual (cf. Bunge 1983: 187f.; 1995), hence quantification is the best way to perform exact research. If the concepts of objects or properties formed by a researcher cannot be quantified, then we have to conclude that these concepts are too poor or too unclear. In qualitative research only inexact expressions like “very warm”, “many”, “frequently”, etc. occur; in extreme cases the property is dichotomised (a relic of structuralism) and loses the major part of the information. Qualitative concepts of properties are the ontogenetic heritage of our language, in which numbers appear later on, as can be observed also in children's development. But if we admit that qualities and quantities do not exist in reality but only in our concepts, objection 1 becomes irrelevant.

Objection 2, just as objection 1, confuses epistemology with ontology. No scientist is interested in numbers, but numbers are the best way to exactly capture our conceptual entities. Reality is neither qualitative nor quantitative. It simply exists. With the help of our concepts, we simply try to capture it in order to improve orientation and knowledge, and to survive. The information we obtain from reality consists merely of weak electrical impulses entering our brain via our senses, and the brain has to construct a (partial) map of reality on this basis. Reality is (re-)constructed by means of concepts, which are primordially qualitative, and natural language helps us by codifying them. Science, however, requires more exact concepts, viz. quantitative ones (cf. Bunge 1967, 1983; Stegmüller 1970; Essler 1971). Disciplines working with quantitative concepts develop more rapidly than others.

Objection 3 is an evident error. Idiosyncrasy can be stated only as a contrast to a general background or as a difference from other texts. In any case, comparison is necessary.
But if a text is said to represent an idiosyncrasy, the significance of the difference must be shown. This can be done only by means of a statistical test; the indication of the difference, even if it is given in an exact form, would not suffice. In literary studies, in corpus linguistics, and in computational linguistics, texts and methods are frequently compared by indicating a certain numerical difference, sometimes in the form of percentages. On this basis alone, conclusions are drawn such as "Method X is superior by 3 %" or "Text 1 possesses more of property X than text 2: Text 1 has 70 scores and text 2 only 68". These are proto-scientific statements, not more than opinions; they ignore that the difference may be a random result or due to an inexact measurement.

Objection 4 is again a misunderstanding. Every statement, including scientific ones, is a simplification, whether it is given verbally or by means of mathematics. An object cannot be captured in its entirety, especially because we do not even know what its entirety is. Only a small number of aspects of an object can be brought into focus. In some theories, the inevitable separation into 'relevant' and other aspects is made explicit in the ceteris paribus condition, which can be presented in the form of a disturbance constant or a special function which weakly contributes to the main independent variable. Furthermore, mathematical concepts and quantitative methods are obviously the only imaginable way to describe and analyse complex structures and processes, including fuzzy ones.

There are two ways to perform text analyses: (a) the comparison of texts and text sorts written by different authors in one or more languages, together with the study of the outcomes of text laws; or (b) the study of an individual author, one text sort and one language, with a description of the properties of the given set of texts and a theoretical search for the latent mechanisms which brought about the given phenomenon. In this volume, we focus on the poetic work of the famous Romanian poet Mihai Eminescu and try to characterise it, show some relations and realisations of text laws, and indicate perspectives for future research.
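The point made under Objection 3, that a bare numerical difference such as "70 versus 68" says nothing until its significance is tested, can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two scores are counts of some property observed among 500 units in each text (the sample size 500 and the function name are hypothetical and not taken from the source), and it uses a standard asymptotic two-proportion z-test rather than any specific test from this volume.

```python
import math

def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided asymptotic z-test for the equality of two proportions."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)              # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Hypothetical data: 70 hits in text 1 and 68 hits in text 2,
# each out of an assumed 500 units.
z, p = two_proportion_z_test(70, 500, 68, 500)
print(f"z = {z:.3f}, p = {p:.3f}")
```

Under these assumed sample sizes the p-value is far above any conventional significance level, so the difference of two scores would not justify the claim that text 1 "possesses more" of the property.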

2 Phonic phenomena

2.1 Occurrence without pattern

2.1.1 Phoneme frequencies

The usual way to capture phonic phenomena in texts is to consider sound/phoneme frequencies, either absolutely, relatively or associated with a position. While in prose mostly the first view is practised, in poetry patterns of sounds occur whose existence or positioning displays a kind of statistical trend. The most common one is rhyme, which is created consciously, whereas in other kinds of poetry and in various languages phenomena such as alliteration, assonance, spontaneous aggregation, etc. are also observed. Before scrutinising these specific phenomena, we will focus on the study of phoneme frequencies. We suppose that even in poetry (if there is no special aim, as in Dadaistic poetry) the phonemes abide by the stratification law, a general hypothesis which was proposed as an alternative to Zipf's formula (cf. Popescu, Altmann, Köhler 2010). Nevertheless, Zipf's power law or the Zipf-Alekseev function can also be used where the data are less complex. The stratification approach aims at finding the number of strata formed by the given entities. In short texts, there is usually only one layer; in longer texts, stratification becomes more obvious. This regularity holds for any kind of entities.

In order to demonstrate this regularity on the phonic level, we first present the phonic analysis of Romanian and its phonemic interpretation as well as the transcription of letters into phonemes. In Romanian phonology, the phoneme inventory consists of seven vowels (strong vowels, syllabic vowels), one voiceless (non-syllabic) vowel, two or four semivowels (different views exist and we will work with the four-semivowel version) and twenty-two consonants. The vowel “i” can occur at the end of a syllable which already contains a syllabic vowel. In this case, “i” is a non-syllabic (voiceless) vowel.
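The rank-frequency view of phonemes mentioned at the start of this section can be sketched as follows. This is only an illustration under stated assumptions: the input is a toy string whose letters stand in for phonemes, the function names are hypothetical, and Zipf's power law f(r) = c·r^(-a) is fitted by ordinary least squares on the log-log scale, which is a simple stand-in and not necessarily the fitting procedure used in this volume. The stratification law itself is not implemented here.

```python
import math
from collections import Counter

def rank_frequency(symbols):
    """Return symbol frequencies in descending order (rank 1 = most frequent)."""
    return sorted(Counter(symbols).values(), reverse=True)

def fit_power_law(freqs):
    """Least-squares fit of log f = log c - a * log r; returns (c, a)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(my - slope * mx), -slope

# Toy input: letters are used as a stand-in for a phonemic transcription.
line = "somnoroase pasarele"
freqs = rank_frequency(ch for ch in line if ch.isalpha())
c, a = fit_power_law(freqs)
print(freqs, round(c, 2), round(a, 2))
```

With real data, the list of symbols would come from a phonemic transcription of a whole poem, and the quality of the fit would have to be assessed before the model is accepted.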
A semivowel (weak vowel) is phonetically similar to a vowel (strong vowel) but functions as a syllable boundary rather than as the nucleus of a syllable and is shorter than the corresponding vowel. Out of the total number of seven vowels, only four can behave as semivowels, which are involved in special groups of phonemes called diphthongs and triphthongs. A diphthong consists of two adjacent vowels occurring within the same syllable: one vowel (strong vowel) and one semivowel (weak vowel). A triphthong is the uninterrupted combination of three vowels in the same syllable: a strong vowel and two semivowels (the strong vowel is usually in between the semivowels). The phonemic transcriptions of graphemes are listed in Tables 2.1.1.1 to 2.1.1.4.

Table 2.1.1.1: The phoneme-grapheme relation for vowels and semivowels in Romanian

| No. | Phoneme (IPA) | Grapheme | Description | Example in Romanian |
|---|---|---|---|---|
| 1 | /a/ | a | strong vowel | apa |
| 2 | /ə/ | ă | strong vowel | părinte |
| 3 | /ɨ/ | â, î | strong vowel | cânta, coborî, înainte |
| 4 | /e/ | e | strong vowel | erou |
| 5 | /e̯/ | e | weak vowel, in diphthongs and triphthongs | stea, /e̯/a/ (diphthong); doreai, /e̯/a/j/ (triphthong) |
| 6 | /i/ | i | strong vowel | inel |
| 7 | // | i | non-syllabic (voiceless) vowel, at the end of a syllable containing a syllabic vowel | flori, îţi, orice, galbeni |
| 8 | /j/ | i | weak vowel, in diphthongs and triphthongs | mai, /a/j/ (diphthong); doreai, /e̯/a/j/ (triphthong) |
| 9 | /o/ | o | strong vowel | oraş |
| 10 | /o̯/ | o | weak vowel, in diphthongs and triphthongs | coasă, /o̯/a/ (diphthong); pleoape, /e̯/o̯/a/ (triphthong) |
| 11 | /u/ | u | strong vowel | durere |
| 12 | /w/ | u | weak vowel, in diphthongs and triphthongs | nou, /o/w/ (diphthong); vreau, /e̯/a/w/ (triphthong) |

Table 2.1.1.2: The phoneme-grapheme relation for consonants in Romanian

| No. | Phoneme (IPA) | Example in Romanian | Example in English |
|---|---|---|---|
| 1 | /b/ | bine | book |
| 2 | /k/ | curaj, karate, quasar | close |
| 3 | /t∫/ | cer, cireşe | chest |
| 4 | /k’/ | chemare, chipeş, kilogram | kept |
| 5 | /d/ | dar | day |
| 6 | /f/ | foc | face |
| 7 | /g/ | greu | gold |
| 8 | /dʒ/ | ger, gingaş | gist |
| 9 | /g’/ | ghem, ghiocel | get |
| 10 | /h/ | harnic | hat |
| 11 | /ʒ/ | joc | pleasure |
| 12 | /l/ | lac | lake |
| 13 | /m/ | mac | moon |
| 14 | /n/ | nor | name |
| 15 | /p/ | parc | pan |
| 16 | /r/ | rac | rain |
| 17 | /s/ | soare | sun |
| 18 | /∫/ | şarpe | shape |
| 19 | /t/ | tare | time |
| 20 | /ts/ | ţară | its |
| 21 | /v/ | val, watt | voice |
| 22 | /z/ | zori | zone |
| 23 | /k/+/s/, /g/+/z/ | excursie, examen | exception |

Table 2.1.1.3: Romanian diphthongs

| No. | Phonemic transcription | Example in Romanian |
|---|---|---|
| 1 | /a/j/ | mai |
| 2 | /a/w/ | dau |
| 3 | /e̯/a/ | stea |
| 4 | /e/j/ | trei |
| 5 | /e̯/o/ | vreo |
| 6 | /e/w/ | leu |
| 7 | /j/a/ | biată |
| 8 | /j/e/ | miere |
| 9 | /i/j/ | fii |
| 10 | /j/o/ | iobag |
| 11 | /i/w/; /j/u/ | auriu; iubire |
| 12 | /o̯/a/ | soare |
| 13 | /o/j/ | foi |
| 14 | /o/w/ | nou |
| 15 | /w/a/ | ziua |
| 16 | /w/e/ | înşeuez |
| 17 | /u/j/ | pui |
| 18 | /u/w/ | continuu |
| 19 | /w/ə/ | două |
| 20 | /w/ɨ/ | plouând |
| 21 | /ə/j/ | răi |
| 22 | /ə/w/ | rău |
| 23 | /ɨ/j/ | câine, îi dau |
| 24 | /ɨ/w/ | râu |

Table 2.1.1.4: Romanian triphthongs

| No. | Phonemic transcription | Example in Romanian |
|---|---|---|
| 1 | /e̯/a/j/ | doreai |
| 2 | /e̯/a/w/ | mergeau |
| 3 | /e̯/o̯/a/ | pleoape |
| 4 | /j/a/j/ | voiai |
| 5 | /j/a/w/ | tăiau |
| 6 | /j/e/j/ | piei |
| 7 | /j/e/w/ | eu |
| 8 | /j/o̯/a/ | creioane |
| 9 | /j/o/j/ | i-oi da |
| 10 | /j/o/w/ | maiou |
| 11 | /o̯/a/j/ | orzoaică |
| 12 | /w/a/j/ | înşeuai |
| 13 | /w/a/w/ | înşeuau |
| 14 | /w/ə/j/ | rouăi |

Syllabification is very important in the identification of diphthongs, triphthongs and finally in the phonemic transcription. Some special cases of phonemic transcriptions with syllabification are presented below.

1. The grapheme ‘e’ at the beginning of personal pronouns is transcribed as follows:

eu (I): /j/e/w/
ea (she): /j/a/
el (he): /j/e/l/
ele (they, feminine): /j/e/ - /l/e/
ei (they, masculine): /j/e/j/

2. The grapheme ‘e’ at the beginning of the forms (different tenses) of the verb “a fi” (“to be”) is transcribed as /j/e/:

e: /j/e/
este: /j/e/s/ - /t/e/
eram: /j/e/ - /r/a/m/

3. The graphemes ‘e’ and ‘a’ at the beginning of a syllable, following a syllable which ends with ‘i’, are transcribed as /j/e/ and /j/a/ respectively.

urgie: /u/r/ - /dʒ/i/ - /j/e/
prietenie: /p/r/i/ - /j/e/ - /t/e/ - /n/i/ - /j/e/
România: /r/o/ - /m/ɨ/ - /n/i/ - /j/a/
mantia: /m/a/n/ - /t/i/ - /j/a/

4. Exceptions: loan words (neologisms)

cordial: /k/o/r/ - /d/i/ - /a/l/
Eliade: /e/ - /l/i/ - /a/ - /d/e/
diamant: /d/i/ - /a/ - /m/a/n/t/

For more details related to the rules for phonemic transcription in Romanian see Dindelegan (2013: 7–17).

Examples of phonemic transcriptions with syllabification:

chemare: /k’/e/ - /m/a/ - /r/e/
cheamă: /k’/a/ - /m/ə/
ochi: /o/k’/
ochii: /o/ - /k’/i/
copii: /k/o/ - /p/i/
copiii: /k/o/ - /p/i/ - /j/i/
veciniciei: /v/e/t∫/ - /n/i/ - /t∫/i/ - /j/e/j/
creioane: /k/r/e/ - /j/o̯/a/ - /n/e/
şoarece: /∫/o̯/a/ - /r/e/ - /t∫/e/
fecioară: /f/e/ - /t∫/o̯/a/ - /r/ə/
valuri: /v/a/ - /l/u/r/i/
să-mi: /s/ə/m/i/
ghiozdan: /g’/o/z/ - /d/a/n/
ghiocel: /g’/i/ - /o/ - /t∫/e/l/
geană: /dʒ/a/ - /n/ə/
gingaş: /dʒ/i/n/ - /g/a/∫/
examen: /e/ - /g/z/a/ - /m/e/n/
excursie: /e/k/s/ - /k/u/r/ - /s/i/ - /j/e/

The phonemic transcription of the poem Lacul is presented below; each verse is followed by its transcription.

Lacul

Lacul codrilor albastru
/l/a/k/u/l/ /k/o/d/r/i/l/o/r/ /a/l/b/a/s/t/r/u/
Nuferi galbeni îl încarcă
/n/u/f/e/r/i/ /g/a/l/b/e/n/i/ /ɨ/l/ /ɨ/n/k/a/r/k/ə/
Tresărind în cercuri albe
/t/r/e/s/ə/r/i/n/d/ /ɨ/n/ /t∫/e/r/k/u/r/i/ /a/l/b/e/
El cutremură o barcă
/j/e/l/ /k/u/t/r/e/m/u/r/ə/ /o/ /b/a/r/k/ə/

Şi eu trec de-a lung de maluri
/∫/i/ /j/e/w/ /t/r/e/k/ /d/e̯/a/ /l/u/n/g/ /d/e/ /m/a/l/u/r/i/
Parc-ascult şi parc-aştept
/p/a/r/k/a/s/k/u/l/t/ /∫/i/ /p/a/r/k/a/∫/t/e/p/t/
Ea din trestii să răsară
/j/a/ /d/i/n/ /t/r/e/s/t/i/j/ /s/ə/ /r/ə/s/a/r/ə/
Şi să-mi cadă lin pe piept
/∫/i/ /s/ə/m/i/ /k/a/d/ə/ /l/i/n/ /p/e/ /p/j/e/p/t/

Să sărim în luntrea mică
/s/ə/ /s/ə/r/i/m/ /ɨ/n/ /l/u/n/t/r/e̯/a/ /m/i/k/ə/
Îngânaţi de glas de ape,
/ɨ/n/g/ɨ/n/a/ts/i/ /d/e/ /g/l/a/s/ /d/e/ /a/p/e/
Şi să scap din mână cârma,
/∫/i/ /s/ə/ /s/k/a/p/ /d/i/n/ /m/ɨ/n/ə/ /k/ɨ/r/m/a/
Şi lopeţile să-mi scape;
/∫/i/ /l/o/p/e/ts/i/l/e/ /s/ə/m/i/ /s/k/a/p/e/

Să plutim cuprinşi de farmec
/s/ə/ /p/l/u/t/i/m/ /k/u/p/r/i/n/∫/i/ /d/e/ /f/a/r/m/e/k/
Sub lumina blândei lune –
/s/u/b/ /l/u/m/i/n/a/ /b/l/ɨ/n/d/e/j/ /l/u/n/e/
Vântu-n trestii lin foşnească,
/v/ɨ/n/t/u/n/ /t/r/e/s/t/i/j/ /l/i/n/ /f/o/∫/n/e̯/a/s/k/ə/
Unduioasa apă sune!
/u/n/d/u/j/o̯/a/s/a/ /a/p/ə/ /s/u/n/e/

Dar nu vine... Singuratic
/d/a/r/ /n/u/ /v/i/n/e/ /s/i/n/g/u/r/a/t/i/k/
În zadar suspin şi sufăr
/ɨ/n/ /z/a/d/a/r/ /s/u/s/p/i/n/ /∫/i/ /s/u/f/ə/r/
Lângă lacul cel albastru
/l/ɨ/n/g/ə/ /l/a/k/u/l/ /t∫/e/l/ /a/l/b/a/s/t/r/u/
Încărcat cu flori de nufăr
/ɨ/n/k/ə/r/k/a/t/ /k/u/ /f/l/o/r/i/ /d/e/ /n/u/f/ə/r/

The frequencies of phonemes, ranked in the usual way, are presented in Table 2.1.1.5. The poem Lacul has 20 lines altogether; the total number of phonemes is 414, composed of 180 vowels (strong + voiceless + weak) and 234 consonants. The size of the phoneme inventory is 29.
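The counting step can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function names `phonemes` and `rank_frequencies` are hypothetical, and the input is the first strophe of Lacul as transcribed in this section. Note that this transcription does not mark the non-syllabic i separately, so its count is merged with /i/ here.

```python
from collections import Counter

def phonemes(transcription: str) -> list[str]:
    """Split a slash-delimited transcription such as '/l/a/k/u/l/' into phonemes."""
    return [p.strip() for p in transcription.split("/") if p.strip()]

def rank_frequencies(lines: list[str]) -> list[tuple[str, int]]:
    """Return (phoneme, frequency) pairs ordered by decreasing frequency."""
    counts = Counter(p for line in lines for p in phonemes(line))
    return counts.most_common()

# First strophe of Lacul, copied from the transcription above.
strophe_1 = [
    "/l/a/k/u/l/ /k/o/d/r/i/l/o/r/ /a/l/b/a/s/t/r/u/",
    "/n/u/f/e/r/i/ /g/a/l/b/e/n/i/ /ɨ/l/ /ɨ/n/k/a/r/k/ə/",
    "/t/r/e/s/ə/r/i/n/d/ /ɨ/n/ /t∫/e/r/k/u/r/i/ /a/l/b/e/",
    "/j/e/l/ /k/u/t/r/e/m/u/r/ə/ /o/ /b/a/r/k/ə/",
]

ranked = rank_frequencies(strophe_1)
for rank, (phoneme, freq) in enumerate(ranked, start=1):
    print(rank, f"/{phoneme}/", freq)
```

The three most frequent entries agree with the first column of Table 2.1.1.5 (/r/ 12, /l/ 8, /a/ 7).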

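Table 2.1.1.5: Rank-frequencies of phonemes in individual strophes in Lacul

| Rank | Strophe 1 freq | Strophe 1 phoneme | Strophe 2 freq | Strophe 2 phoneme | Strophe 3 freq | Strophe 3 phoneme | Strophe 4 freq | Strophe 4 phoneme | Strophe 5 freq | Strophe 5 phoneme |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 12 | /r/ | 9 | /a/ | 7 | /a/ | 10 | /n/ | 8 | /a/ |
| 2 | 8 | /l/ | 7 | /e/ | 7 | /s/ | 9 | /u/ | 8 | /u/ |
| 3 | 7 | /a/ | 7 | /i/ | 6 | /ə/ | 6 | /a/ | 8 | /n/ |
| 4 | 7 | /e/ | 7 | /r/ | 6 | /e/ | 6 | /e/ | 8 | /r/ |
| 5 | 7 | /k/ | 7 | /t/ | 6 | /i/ | 6 | /s/ | 6 | /l/ |
| 6 | 6 | /u/ | 6 | /p/ | 6 | /n/ | 5 | /i/ | 5 | /i/ |
| 7 | 5 | /n/ | 5 | /ə/ | 5 | /ɨ/ | 5 | /l/ | 5 | /k/ |
| 8 | 4 | /ə/ | 5 | /k/ | 5 | /m/ | 4 | /t/ | 5 | /s/ |
| 9 | 4 | /b/ | 5 | /s/ | 4 | /k/ | 3 | /ə/ | 4 | /ə/ |
| 10 | 3 | /ɨ/ | 4 | /d/ | 4 | /l/ | 3 | /j/ | 3 | /ɨ/ |
| 11 | 3 | /ⁱ/ | 4 | /l/ | 4 | /p/ | 3 | /k/ | 3 | /e/ |
| 12 | 3 | /o/ | 4 | /∫/ | 3 | /d/ | 3 | /d/ | 3 | /d/ |
| 13 | 3 | /t/ | 3 | /j/ | 3 | /r/ | 3 | /m/ | 3 | /f/ |
| 14 | 2 | /i/ | 3 | /u/ | 2 | /ⁱ/ | 3 | /p/ | 3 | /t/ |
| 15 | 2 | /d/ | 3 | /n/ | 2 | /g/ | 3 | /r/ | 2 | /g/ |
| 16 | 2 | /s/ | 2 | /ⁱ/ | 2 | /∫/ | 2 | /ɨ/ | 1 | /ⁱ/ |
| 17 | 1 | /j/ | 2 | /m/ | 2 | /ts/ | 2 | /b/ | 1 | /o/ |
| 18 | 1 | /t∫/ | 1 | /e̯/ | 1 | /e̯/ | 2 | /f/ | 1 | /b/ |
| 19 | 1 | /f/ | 1 | /w/ | 1 | /o/ | 2 | /∫/ | 1 | /t∫/ |
| 20 | 1 | /g/ | 1 | /g/ | 1 | /u/ | 1 | /e̯/ | 1 | /p/ |
| 21 | 1 | /m/ | 0 | /ɨ/ | 1 | /t/ | 1 | /ⁱ/ | 1 | /∫/ |
| 22 | 0 | /e̯/ | 0 | /o/ | 0 | /j/ | 1 | /o/ | 1 | /v/ |
| 23 | 0 | /o̯/ | 0 | /o̯/ | 0 | /o̯/ | 1 | /o̯/ | 1 | /z/ |
| 24 | 0 | /w/ | 0 | /b/ | 0 | /w/ | 1 | /v/ | 0 | /e̯/ |
| 25 | 0 | /p/ | 0 | /t∫/ | 0 | /b/ | 0 | /w/ | 0 | /j/ |
| 26 | 0 | /∫/ | 0 | /f/ | 0 | /t∫/ | 0 | /t∫/ | 0 | /o̯/ |
| 27 | 0 | /ts/ | 0 | /ts/ | 0 | /f/ | 0 | /g/ | 0 | /w/ |
| 28 | 0 | /v/ | 0 | /v/ | 0 | /v/ | 0 | /ts/ | 0 | /m/ |
| 29 | 0 | /z/ | 0 | /z/ | 0 | /z/ | 0 | /z/ | 0 | /ts/ |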

The rank-frequency distribution of phonemes in all strophes can be captured by the one- or two-component stratification formula

(2.1.1.1)  $f_r = 1 + a\,e^{-r/b} + c\,e^{-r/d}$,

testifying to the phonic stratification of individual strophes. The individual fitting parameters, computed iteratively, are presented in Table 2.1.1.6. The graphic picture of the first strophe is presented in Figure 2.1.1.1 and the fourth strophe in Figure 2.1.1.2 ($R^2$ is the usual determination coefficient).

Figure 2.1.1.1. Rank-frequencies of phonemes in the first strophe of Lacul

Table 2.1.1.6: Parameters of fitting (2.1.1.1) to individual strophes

| Strophe | a | b | c | d | R² |
|---|---|---|---|---|---|
| 1 | 11.5941 | 6.1295 | - | - | 0.95 |
| 2 | 8.8879 | 9.0145 | - | - | 0.93 |
| 3 | 7.6789 | 9.0035 | - | - | 0.91 |
| 4 | 4.1970 | 1.0692 | 8.5249 | 7.7596 | 0.96 |
| 5 | 9.5960 | 7.2343 | - | - | 0.93 |
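As a check on how formula (2.1.1.1) is read, the sketch below evaluates the one-component version with the parameters reported for the first strophe and computes the determination coefficient. It rests on one assumption that the text does not state: only the 21 non-zero frequencies of strophe 1 (ranks 1 to 21 of Table 2.1.1.5) are used. The function names are illustrative; the iterative fitting itself is not reproduced.

```python
import math

def stratified(r: int, a: float, b: float, c: float = 0.0, d: float = 1.0) -> float:
    """Stratification formula (2.1.1.1); c = 0 gives the one-component form."""
    return 1.0 + a * math.exp(-r / b) + c * math.exp(-r / d)

def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Usual determination coefficient 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Non-zero frequencies of strophe 1 (assumption: zero frequencies are left out).
observed = [12, 8, 7, 7, 7, 6, 5, 4, 4, 3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1]
# Parameters a and b for strophe 1 from Table 2.1.1.6.
predicted = [stratified(r, a=11.5941, b=6.1295) for r in range(1, len(observed) + 1)]

r2 = r_squared(observed, predicted)
print(round(r2, 2))
```

Under this assumption the computed value is close to the R² reported for strophe 1, which suggests that the additive constant 1 in (2.1.1.1) acts as the floor for the rarest phonemes that do occur.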

As can be seen in Table 2.1.1.6, the parameters do not vary excessively; the phoneme representation is quite uniform. Nevertheless, different local phenomena may appear, and these will be analyzed in the subsequent sections. Of course, the differences between individual parameters could be tested, but they are too small to allow general hypothesis building. The homogeneity of the distributions in individual strophes cannot be tested by means of the chi-square test because the frequencies are too small; another test based on ranks could be used instead. To this end, we reorder Table 2.1.1.5 according to phonemes and ascribe them the respective rank in the given strophe. The result can be seen in Table 2.1.1.7. When two or more frequencies are identical, the corresponding phonemes are assigned the mean of the ranks, e.g. if ranks 5, 6, 7 have the same frequency, then all three items receive rank 6. If a frequency is unique, its rank remains as it is. Further, if a phoneme does not occur in a strophe, it obtains the highest rank (highest mean rank).
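The ranking rule just described (mean ranks for ties, with absent phonemes sharing the highest mean rank) can be sketched as follows. The helper names `mean_ranks` and `tie_term` are hypothetical; the data are the strophe 1 frequencies from Table 2.1.1.5, with "ⁱ" standing for the non-syllabic i. Absent phonemes need no special case here, since they simply form a tie at frequency 0.

```python
from collections import Counter

def mean_ranks(freqs: dict[str, int]) -> dict[str, float]:
    """Rank by decreasing frequency; tied items receive the mean of their ranks."""
    ordered = sorted(freqs.items(), key=lambda kv: -kv[1])
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j][1] == ordered[i][1]:
            j += 1
        mean_rank = (i + 1 + j) / 2  # mean of the ranks i+1, ..., j
        for phoneme, _ in ordered[i:j]:
            ranks[phoneme] = mean_rank
        i = j
    return ranks

def tie_term(freqs: dict[str, int]) -> int:
    """Sum of t^3 - t over all groups of t tied frequencies (zero for t = 1)."""
    return sum(t**3 - t for t in Counter(freqs.values()).values())

strophe_1 = {
    "r": 12, "l": 8, "a": 7, "e": 7, "k": 7, "u": 6, "n": 5, "ə": 4, "b": 4,
    "ɨ": 3, "ⁱ": 3, "o": 3, "t": 3, "i": 2, "d": 2, "s": 2,
    "j": 1, "t∫": 1, "f": 1, "g": 1, "m": 1,
    "e̯": 0, "o̯": 0, "w": 0, "p": 0, "∫": 0, "ts": 0, "v": 0, "z": 0,
}

ranks = mean_ranks(strophe_1)
print(ranks["a"], ranks["z"], tie_term(strophe_1))
```

The resulting ranks reproduce the first column of Table 2.1.1.7, and the tie term equals the value given there for the first strophe.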

Figure 2.1.1.2. Rank-frequencies of phonemes in the fourth strophe of Lacul

The S column contains the sums of the given rows; $V_i$ is a function of ties (in the strophe i). A tie with $t_i$ occurrences corresponds to the function $t_i^3 - t_i$, e.g. in the first column, the rank 4 occurs three times (the phonemes /a/, /e/, /k/), hence $V_{/a/} = 3^3 - 3 = 24$. If there are several ties in the column, the sum of the above function has to be calculated. In column $S^2$, there are the squares of the values in the S column. The sum of squared deviations of the empirical rank sums is given as

(2.1.1.2)  $SSR = \sum_{i=1}^{N_p} (S_i - \bar{S})^2 = \sum_{i=1}^{N_p} S_i^2 - \Big( \sum_{i=1}^{N_p} S_i \Big)^2 / N_p$,

yielding in our case $SSR = 198356 - 2175^2/29 = 35231$ (see Table 2.1.1.7). $N_p$ is the number of distinct phonemes in the studied poem, e.g. $N_p = 29$ for Lacul (see Table 2.1.1.5).

Since we have ties whose sums are given in the last row, we compute Kendall's concordance coefficient in the form:

(2.1.1.3)  $W = \dfrac{12\,SSR}{m^2 (N_p^3 - N_p) - m \sum_{i=1}^{m} V_i}$,

where m is the number of strophes of the studied poem. Lacul has m = 5 strophes of 4 verses each. The value of W need not be calculated because one can directly compute the test criterion (see below). The computation of the function $V_i$ can be illustrated using the example of the first strophe. Rank 8.5 occurs twice, rank 4 three times, rank 15 three times, rank 11.5 four times, rank 19 five times, and rank 25.5 eight times, hence the function $V_i$ is computed for the first strophe as follows:

$V_1 = 1(2^3 - 2) + 2(3^3 - 3) + 1(4^3 - 4) + 1(5^3 - 5) + 1(8^3 - 8) = 6 + 48 + 60 + 120 + 504 = 738$

Now we want to find a criterion enabling us to decide whether the strophes are phonically independent. This can be done by means of the chi-square criterion as defined by Kendall (1962: 100):

(2.1.1.4)  $X^2 = \dfrac{12\,SSR}{m N_p (N_p + 1) - \dfrac{1}{N_p - 1} \sum_{j=1}^{m} V_j}$,

yielding a chi-square statistic with $N_p - 1$ degrees of freedom. Inserting the computed values in this formula we obtain

$X^2 = 12(35231)/[5(29)30 - (1/28)3930] = 422772/(4350 - 140.357) = 100.43$

Since we have DF = 28, our result is significant (e.g. for α = 0.0005 we have $X^2$ = 50.5), i.e. the use of the phonemes is divergent among the individual strophes. This fact allows us to draw several inferences. We cannot, however, ask the author any more whether our interpretations are correct. Possible inferences are e.g. that the poem was not written in one go, or that it was corrected subsequently, or that it was not written spontaneously in the form of an improvisation, etc.
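The arithmetic of (2.1.1.2) to (2.1.1.4) can be retraced with a short Python sketch. The function name `kendall_statistics` is an illustrative choice; the rank sums S and the tie terms V are copied from Table 2.1.1.7.

```python
def kendall_statistics(rank_sums: list[float], tie_terms: list[int], m: int):
    """Return (SSR, W, X2) following formulas (2.1.1.2)-(2.1.1.4)."""
    n_p = len(rank_sums)
    ssr = sum(s * s for s in rank_sums) - sum(rank_sums) ** 2 / n_p
    total_v = sum(tie_terms)
    w = 12 * ssr / (m**2 * (n_p**3 - n_p) - m * total_v)
    chi2 = 12 * ssr / (m * n_p * (n_p + 1) - total_v / (n_p - 1))
    return ssr, w, chi2

# Column S of Table 2.1.1.7 (one rank sum per phoneme, 29 phonemes).
S = [13, 42, 73.5, 28, 112.5, 36.5, 85, 97, 97.5, 124.5, 44, 123.5, 96, 41, 116,
     62.5, 99, 95.5, 34.5, 81.5, 29, 73, 31.5, 35.5, 89, 54.5, 119.5, 117.5, 122.5]
# Row V_i of Table 2.1.1.7 (one tie term per strophe).
V = [738, 882, 726, 666, 918]

ssr, w, chi2 = kendall_statistics(S, V, m=5)
print(ssr, round(w, 4), round(chi2, 2))
```

The two forms are consistent with each other: multiplying W by m(N_p − 1) gives the same value as (2.1.1.4), which is why W itself need not be computed.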

Table 2.1.1.7: Ranks of phonemes in individual strophes in Lacul

| Phoneme | Strophe 1 | Strophe 2 | Strophe 3 | Strophe 4 | Strophe 5 | S | S² |
|---|---|---|---|---|---|---|---|
| /a/ | 4 | 1 | 1.5 | 4 | 2.5 | 13 | 169 |
| /ə/ | 8.5 | 8 | 4.5 | 12 | 9 | 42 | 1764 |
| /ɨ/ | 11.5 | 25 | 7.5 | 17.5 | 12 | 73.5 | 5402.25 |
| /e/ | 4 | 3.5 | 4.5 | 4 | 12 | 28 | 784 |
| /e̯/ | 25.5 | 19 | 19.5 | 22 | 26.5 | 112.5 | 12656.25 |
| /i/ | 15 | 3.5 | 4.5 | 6.5 | 7 | 36.5 | 1332.25 |
| /ⁱ/ | 11.5 | 16.5 | 15.5 | 22 | 19.5 | 85 | 7225 |
| /j/ | 19 | 14 | 25.5 | 12 | 26.5 | 97 | 9409 |
| /o/ | 11.5 | 25 | 19.5 | 22 | 19.5 | 97.5 | 9506.25 |
| /o̯/ | 25.5 | 25 | 25.5 | 22 | 26.5 | 124.5 | 15500.25 |
| /u/ | 6 | 14 | 19.5 | 2 | 2.5 | 44 | 1936 |
| /w/ | 25.5 | 19 | 25.5 | 27 | 26.5 | 123.5 | 15252.25 |
| /b/ | 8.5 | 25 | 25.5 | 17.5 | 19.5 | 96 | 9216 |
| /k/ | 4 | 8 | 10 | 12 | 7 | 41 | 1681 |
| /t∫/ | 19 | 25 | 25.5 | 27 | 19.5 | 116 | 13456 |
| /d/ | 15 | 11 | 12.5 | 12 | 12 | 62.5 | 3906.25 |
| /f/ | 19 | 25 | 25.5 | 17.5 | 12 | 99 | 9801 |
| /g/ | 19 | 19 | 15.5 | 27 | 15 | 95.5 | 9120.25 |
| /l/ | 2 | 11 | 10 | 6.5 | 5 | 34.5 | 1190.25 |
| /m/ | 19 | 16.5 | 7.5 | 12 | 26.5 | 81.5 | 6642.25 |
| /n/ | 7 | 14 | 4.5 | 1 | 2.5 | 29 | 841 |
| /p/ | 25.5 | 6 | 10 | 12 | 19.5 | 73 | 5329 |
| /r/ | 1 | 3.5 | 12.5 | 12 | 2.5 | 31.5 | 992.25 |
| /s/ | 15 | 8 | 1.5 | 4 | 7 | 35.5 | 1260.25 |
| /∫/ | 25.5 | 11 | 15.5 | 17.5 | 19.5 | 89 | 7921 |
| /t/ | 11.5 | 3.5 | 19.5 | 8 | 12 | 54.5 | 2970.25 |
| /ts/ | 25.5 | 25 | 15.5 | 27 | 26.5 | 119.5 | 14280.25 |
| /v/ | 25.5 | 25 | 25.5 | 22 | 19.5 | 117.5 | 13806.25 |
| /z/ | 25.5 | 25 | 25.5 | 27 | 19.5 | 122.5 | 15006.25 |
| Sum S | 435 | 435 | 435 | 435 | 435 | 2175 | 198356 |
| V_i | 738 | 882 | 726 | 666 | 918 | 3930 | |

2.1.2 Euphony in general

In literary studies, euphony is a fuzzy concept originating from an individual perception of a text and the intuitive aesthetic evaluation of this perception. Peculiarly enough, in music, which is strongly (but not always) based on euphony, the concept does not even exist. Instead, various kinds of aesthetics are discussed. In textology, beauty is associated rather with the choice of words or association of ideas, etc. Euphony is some background noise ascribed to the phonic composition of the poem, expressed above all by the rhyme. Since rhyme is conscious, the concept of euphony becomes fuzzy if we add to the rhyme some subconscious phenomena as well and make dichotomous decisions about their presence or absence. Definitions that can be found en masse in dictionaries or on the Internet say that euphony is a pleasing or sweet sound or a harmonious succession of words with a pleasing sound, which is simply a tautology, not an operational definition. In order to avoid passionate discussions about whether the succession of words is more harmonious and pleasing in Sheffield English than in Italian, we try to give the concept of euphony a more objective correspondence with perceived reality and to grant it a computable existence.

In the presented approach, euphony is understood as a regular or non-random occurrence of sounds/phonemes or their sequential patterns in a text. We prefer to apply the concept of phoneme because that of sound is rather fuzzy and sound realisation always depends on the automatisms acquired during childhood. In poetry, for which (as opposed to prose) the concept of euphony is considered relevant, the best known euphonic phenomenon is the rhyme. If, say, we find an /a/ in each position in a verse where a vowel occurs, then we are inclined to consider it a euphonic pattern. This event need not be a conscious act of the poet; it may simply be the outcome of the Skinner effect or a phenomenon of self-organisation.
Nevertheless, there are cases, e.g. in Old Javanese, where a specific vowel sequence is required for each verse. Whatever the cause may be, if we want to consider a pattern euphonic, we must show that the given pattern is significant, i.e. not a random effect. There are several methods to determine this fact: (i) To ask the author, who perhaps remembers her/his motives and writing method (as long as s/he lives); however, even so, a writer will not be able to state exactly the degree of realised euphony. (ii) To ask informants for their subjective impression; but this method furnishes information only in the form of subjective statements and depends strongly on age, education, gender, social status, dialect, etc. Nevertheless, at least a kind of scaling can be obtained in this way. (iii) The only objective method is a statistical test of the occurrence of individual phonemes or phonemic patterns for significance. This procedure has several advantages: (a) it is objective; every researcher will obtain the same results; (b) it involves quantification of a very fuzzy concept and can be used for comparisons, classifications, studying the evolution of a writer, etc.; (c) it allows us to determine the entities which evoked euphony; and (d) last but not least, we can even compute the probability of a false evaluation.

Up to now, there are no general hypotheses about euphony. Nor are the boundary conditions known under which it must, may or cannot occur. The number of statistical studies concerning euphony is very small (cf. Skinner 1939, 1941; Sebeok, Zeps 1959; Meyer-Eppler 1959; Altmann 1963, 1966a,b, 1968; Wimmer et al. 2003). The investigations are local, restricted to a text sort or a writer. Still, they show that objective quantification of different phonic phenomena in poetry is possible. In the present chapter, we shall study the general euphony of verses in the sense of non-random occurrence, i.e. a significantly frequent occurrence of some phonemes in the line, and set up an indicator of the general euphony of a poem. Since we study only one writer, the starting point is the table of letters, their phonemic correspondences, and their occurrences in the works considered. The titles and sizes of the 46 analysed poems are listed in Table 2.1.2.1 and the corresponding phoneme occurrences are presented in Table 2.1.2.2. It is to be noted that we work with the concept of phoneme, i.e. with a higher abstraction, because of its uniformity as opposed to the variation observed in sounds. We want to present a realistic measurement of euphony; therefore we proceed as follows. Every line is considered a sample of its own. We distinguish the number of vowels V and the number of consonants C in the line.
Since in a vocalic position, a certain vowel can occur or not, its distribution is binomial. Now, if a vowel i occurs two or more times in the line, in general xi-times, we compute the probability of the given or more extreme number of occurrences by means of the formula (2.1.2.1).

Table 2.1.2.1: Titles and sizes of the 46 analysed poems by Eminescu

| No. | Poem title | No of words (text size) |
|---|---|---|
| 1 | Lebăda | 41 |
| 2 | Peste vârfuri | 47 |
| 3 | Dintre sute de catarge | 50 |
| 4 | Şi dacă... | 53 |
| 5 | La mijloc de codru... | 55 |
| 6 | Somnoroase păsărele... | 55 |
| 7 | La steaua | 71 |
| 8 | Adânca mare… | 75 |
| 9 | Trecut-au anii | 88 |
| 10 | Lacul | 90 |
| 11 | Ce te legeni... | 102 |
| 12 | Odă în metru antic | 103 |
| 13 | De ce nu-mi vii | 123 |
| 14 | Mai am un singur dor | 125 |
| 15 | Criticilor mei | 130 |
| 16 | O, mamă… | 140 |
| 17 | Cu mâine zilele-ţi adaogi... | 141 |
| 18 | Revedere | 141 |
| 19 | Sara pe deal | 156 |
| 20 | Atât de fragedă… | 176 |
| 21 | Freamăt de codru | 179 |
| 22 | Ce-ţi doresc eu ţie, dulce Românie | 183 |
| 23 | Pe lângă plopii fără soţ | 199 |
| 24 | Povestea codrului | 220 |
| 25 | Floare-albastră | 247 |
| 26 | Sonete | 265 |
| 27 | Despărţire | 304 |
| 28 | Ghazel | 331 |
| 29 | La moartea lui Heliade | 332 |
| 30 | O, adevăr sublime... | 334 |
| 31 | Iubită dulce, o, mă lasă | 337 |
| 32 | O călărire în zori | 346 |
| 33 | Dacă treci râul Selenei | 356 |
| 34 | Rugăciunea unui dac | 357 |
| 35 | Copii eram noi amândoi | 375 |
| 36 | Glossă | 380 |
| 37 | Povestea teiului | 390 |
| 38 | Venere şi Madona | 393 |
| 39 | Făt-Frumos din tei | 415 |
| 40 | Dumnezeu şi om | 443 |
| 41 | Junii corupţi | 458 |
| 42 | Mortua est! | 491 |
| 43 | Epigonii | 921 |
| 44 | Împărat şi proletar | 1510 |
| 45 | Luceafărul | 1737 |
| 46 | Scrisoarea III | 2278 |

V V  x V − x (2.1.2.1) P ( X ≥ xi ) =   pi qi , ∑ x = xi  x 

where X is a vocalic phoneme, and analogically for consonants, replacing V by C. The first line of the poem Adânca mare Adânca mare sub a lunei faţă; is transcribed as /a/d/ɨ/n/k/a/ /m/a/r/e/ /s/u/b/ /a/ /l/u/n/e/j/ /f/a/ts/ə/ In this verse, we have V = 12, C = 11. Considering the vocalic phoneme /a/ we see that it occurs 5 times and its relative frequency with respect to the inventory of vowels is 0.181583 (Table 2.1.2.2); we compute accordingly Table 2.1.2.2: Phonemes in Eminescu's 46 poems phoneme

No of occurrences

relative frequency

relative frequency vowels/consonants

Vowels (strong, weak, voiceless) 1

/a/

6172

0.086823

0.181583

2

/ə/

3005

0.042272

0.088408

Occurrence without pattern  25

3

/ɨ/

1733

0.024379

0.050986

4

/e/

7118

0.100131

0.209415

5

/e̯ /

859

0.012084

0.025272

6

/i/

4252

0.059814

0.125096

7

/ i/

1142

0.016065

0.033598

8

/j/

2434

0.034240

0.071609

9

/o/

2159

0.030371

0.063519

10

/e̯ /

546

0.007681

0.016064

11

/u/

4261

0.059941

0.125360

12

/w/

309

0.004347

0.009091

Consonants 13

/b/

767

0.010790

0.020676

14

/k/

2389

0.033607

0.064399

15

/t∫/

1145

0.016107

0.030865

16

/k’/

223

0.003137

0.006011

17

/d/

2627

0.036955

0.070814

18

/f/

795

0.011183

0.021430

19

/g/

505

0.007104

0.013613

20

/dʒ/

277

0.003897

0.007467

21

/g’/

19

0.000267

0.000512

22

/h/

67

0.000943

0.001806

23

/ʒ/

114

0.001604

0.003073

24

/l/

3173

0.044635

0.085533

25

/m/

2539

0.035717

0.068442

26

/n/

4867

0.068465

0.131197

27

/p/

2046

0.028782

0.055153

28

/r/

5220

0.073431

0.140712

29

/s/

2766

0.038910

0.074561

30

/∫/

1222

0.017190

0.032941

31

/t/

3944

0.055481

0.106316

32

/ts/

760

0.010691

0.020487

33

/v/

1116

0.015699

0.030083

34

/z/

514

0.007231

0.013856

26  Phonic phenomena

12  12 − x x 0.181583 (1 − 0.181583) x =5   = 0.038452 + 0.009953 + 0.001893 + 0.000262 + 0.0000259 + + 0.00000172 + 0.0000000695 + 0.000000001285 = 0.05059.

P (/ a / ≥ 5) =

12

∑ 5

The phoneme /n/ occurs twice in the given line and its relative frequency with respect to the inventory of consonants is 0.131197 (Table 2.1.2.2), hence we obtain 11 11   P (/ n / ≥ 2) ∑  0.131197 x (1 = = − 0.131197)11− x 0.433506. 2 x=2   If the computed probability is smaller than α, which can be determined conventionally as e.g. 0.05, then we may speak of a euphonic tendency. The extent of euphony contributed by the given phoneme will be measured by the indicator (cf. Wimmer et al. 2003: 60) (2.1.2.2)

CE

phoneme

100[α − P(ξ ≥ x)], if α > P(ξ ≥ x) = otherwise 0,

where ξ = occurrences of the phoneme. Here, α is the significance level (= 0.05), therefore the Coefficient of Euphony, CEphoneme expressed by (2.1.2.2) is always positive and may attain values in the interval [0; 5.00]. In our example, we obtained in the first case the sum of 0.05059, which is greater than 0.05, hence the five occurrences of /a/ do not display a euphonic effect. The same holds for the two occurrences of /n/, from which follows that the first line is not constructed euphonically. Performing the above computation for all k phonemes occurring at least twice in the line we obtain the mean euphony indicator for the line as (2.1.2.3) CE= line

100 ∑ [α − P(ξi ≥ xi )] k ξ i ∈E

where the ξi are the phonemes belonging to the euphonic set E fulfilling condition E = {phoneme|CEphoneme > 0}. Having performed this computation for all lines of a poem we may define the euphonic value of the whole poem with K lines as

(2.1.2.4)  $CE_{poem} = \dfrac{1}{K} \sum_{j=1}^{K} \overline{CE}_{line\,j}$.

For the given poem we may compute the variance of $CE_{poem}$ empirically as

(2.1.2.5)  $Var(CE_{poem}) = \dfrac{1}{K} \sum_{j=1}^{K} (\overline{CE}_{line\,j} - CE_{poem})^2$,

a simple expression that can be used for comparing two poems by means of a normal test. For the sake of illustration let us consider the poem Lebăda; each verse is followed by its phonemic transcription.

Lebăda

Când pintre valuri ce saltă
/k/ɨ/n/d/ /p/i/n/t/r/e/ /v/a/l/u/r/ⁱ/ /t∫/e/ /s/a/l/t/ə/
Pe baltă
/p/e/ /b/a/l/t/ə/
În ritm uşor,
/ɨ/n/ /r/i/t/m/ /u/∫/o/r/
Lebăda albă cu-aripele-n vânturi
/l/e/b/ə/d/a/ /a/l/b/ə/ /k/w/a/r/i/p/e/l/e/n/ /v/ɨ/n/t/u/r/ⁱ/
În cânturi
/ɨ/n/ /k/ɨ/n/t/u/r/ⁱ/
Se leagănă-n dor;
/s/e/ /l/e̯/a/g/ə/n/ə/n/ /d/o/r/
Aripele-i albe în raza cea caldă
/a/r/i/p/e/l/e/j/ /a/l/b/e/ /ɨ/n/ /r/a/z/a/ /t∫/a/ /k/a/l/d/ə/
Le scaldă,
/l/e/ /s/k/a/l/d/ə/
Din ele bătând,
/d/i/n/ /j/e/l/e/ /b/ə/t/ɨ/n/d/
Şi-apoi pe luciu, pe unda d-oglinde
/∫/j/a/p/o/j/ /p/e/ /l/u/t∫/u/ /p/e/ /u/n/d/a/ /d/o/g/l/i/n/d/e/
Le-ntinde
/l/e/n/t/i/n/d/e/
O barcă de vânt.
/o/ /b/a/r/k/ə/ /d/e/ /v/ɨ/n/t/
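The line-level quantities (2.1.2.1) to (2.1.2.3) can be sketched in Python as follows. The function names are illustrative, and the two probabilities are those of the worked example for the first line of Adânca mare (/a/ with V = 12, /n/ with C = 11); the relative frequencies come from Table 2.1.2.2.

```python
from math import comb

ALPHA = 0.05  # significance level used in the text

def tail_probability(positions: int, occurrences: int, p: float) -> float:
    """P(X >= occurrences) for a binomial variable with `positions` trials, formula (2.1.2.1)."""
    q = 1.0 - p
    return sum(comb(positions, x) * p**x * q ** (positions - x)
               for x in range(occurrences, positions + 1))

def ce_phoneme(prob: float, alpha: float = ALPHA) -> float:
    """Coefficient of euphony of one phoneme, formula (2.1.2.2)."""
    return 100.0 * (alpha - prob) if alpha > prob else 0.0

def ce_line(probs: list[float], alpha: float = ALPHA) -> float:
    """Mean over all k phonemes occurring at least twice, formula (2.1.2.3).
    Phonemes outside the euphonic set contribute 0 to the sum."""
    if not probs:
        return 0.0
    return sum(ce_phoneme(p, alpha) for p in probs) / len(probs)

p_a = tail_probability(positions=12, occurrences=5, p=0.181583)  # /a/ among vowels
p_n = tail_probability(positions=11, occurrences=2, p=0.131197)  # /n/ among consonants
print(round(p_a, 5), round(p_n, 6), ce_phoneme(p_a), ce_phoneme(p_n))
```

Both probabilities exceed α, so both coefficients are zero, in agreement with the conclusion drawn for these two phonemes.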

We obtain the results presented in Table 2.1.2.3.

Table 2.1.2.3: Analysis of the poem Lebăda

| Line no. | Phoneme | CE_phoneme | C̅E̅_line |
|---|---|---|---|
| 4 | /b/ | 1.701282 | 0.24304 |
| 5 | /ɨ/ | 3.544263 | 1.772131 |
| 7 | /a/ | 3.088126 | 0.772031 |
| 10 | /p/ | 1.837048 | 0.204116 |

Adding the numbers in the last column and dividing the sum by the number of lines in the poem (K = 12) we obtain $CE_{Lebada}$ = 2.991318/12 = 0.249277. Using this mean and the values in the last column, and taking into account the eight lines that have euphony zero, we obtain the variance $Var(CE_{Lebada})$ = 0.257629. In this way, the euphonic tendency of all poems can be computed. Here we shall simply order the poems according to increasing euphony as presented in Table 2.1.2.4. The problem of the euphonic weight of individual phonemes is language-dependent, even if it can have an iconic background, and it cannot contribute to answering general questions. We can simply state that individual poems have different euphony values ranging from 0.110634 up to 0.453381. Comparing the first poem (smallest euphony) with the last (highest euphony) by means of a normal test we obtain

$u = \dfrac{|0.110634 - 0.453381|}{\sqrt{\dfrac{0.047272}{56} + \dfrac{0.442159}{80}}} = \dfrac{0.342747}{0.0798193} = 4.29$

which is a highly significant value. Hence euphony played a certain role in Eminescu's work. We may ask two questions: (1) Did Eminescu develop in this respect or did he maintain the same level from the first to the last analysed poem? (2) Does the extent of euphony depend on the length of the poem? The first question can be answered by scrutinising the relation of the euphony of the poem to the year of its origin. Looking at Table 2.1.2.4 and plotting the euphony values according to years in a diagram (cf. Figure 2.1.2.1) we see that in a certain epoch of his creativity, Eminescu began to develop euphony, repeatedly interrupted this evolution and fell back to a lower state, where he began anew. His “pathological” year 1883 [1] was, regarding euphony, particularly expanded. This is reminiscent of renewal processes, but the scrutiny of this mechanism must be postponed to the happy time when the works of more writers are at our disposal and we know not only the year of origin but also the dates of first appearance. In any case, one sees a very characteristic historical movement of euphony. Simply taking the means of the years concerned leads to a slight linear increase.

[1] In June 1883, Eminescu fell seriously ill and finally died in 1889.
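The poem-level computations (2.1.2.4), (2.1.2.5) and the normal test used above can be retraced with the following sketch. The function names are illustrative; the line values for Lebăda come from Table 2.1.2.3 (the eight remaining lines have euphony zero), and the values for the two compared poems are the first and last rows of Table 2.1.2.4.

```python
from math import sqrt

def ce_poem(line_values: list[float]) -> float:
    """Mean line euphony of a poem, formula (2.1.2.4)."""
    return sum(line_values) / len(line_values)

def ce_variance(line_values: list[float]) -> float:
    """Empirical variance with divisor K, formula (2.1.2.5)."""
    mean = ce_poem(line_values)
    return sum((x - mean) ** 2 for x in line_values) / len(line_values)

def normal_test(mean1, var1, k1, mean2, var2, k2) -> float:
    """u statistic for comparing the euphony of two poems with k1 and k2 lines."""
    return abs(mean1 - mean2) / sqrt(var1 / k1 + var2 / k2)

# Lebăda: 12 lines, non-zero line euphony only in lines 4, 5, 7 and 10.
lebada = [0.0, 0.0, 0.0, 0.24304, 1.772131, 0.0, 0.772031, 0.0, 0.0, 0.204116, 0.0, 0.0]
mean_lebada = ce_poem(lebada)
var_lebada = ce_variance(lebada)

# Dumnezeu şi om (56 verses) against Glossă (80 verses).
u = normal_test(0.110634, 0.047272, 56, 0.453381, 0.442159, 80)
print(round(mean_lebada, 6), round(var_lebada, 6), round(u, 2))
```

Because the variance uses the divisor K rather than K − 1, the zero lines count fully toward both the mean and the spread.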

Table 2.1.2.4: Euphony in Eminescu's poems (ordered according to increasing values)

| Year | Poem title | No of verses | Euphony poem | Variance euphony |
|---|---|---|---|---|
| 1873 | Dumnezeu şi om | 56 | 0.110634 | 0.047272 |
| 1867 | Dacă treci râul Selenei | 41 | 0.126545 | 0.042433 |
| 1867 | La moartea lui Heliade | 48 | 0.128128 | 0.047784 |
| 1870 | Epigonii | 114 | 0.137836 | 0.037605 |
| 1883 | Peste vârfuri | 12 | 0.14854 | 0.066133 |
| 1883 | Somnoroase păsărele | 16 | 0.150188 | 0.116695 |
| 1879 | Atât de fragedă... | 36 | 0.153134 | 0.102216 |
| 1883 | Odă în metru antic | 20 | 0.153366 | 0.072062 |
| 1873 | Ghazel | 40 | 0.165371 | 0.039335 |
| 1879 | Despărţire | 38 | 0.165682 | 0.063814 |
| 1887 | Venere şi Madona | 48 | 0.183075 | 0.069181 |
| 1874 | Împărat şi proletar | 210 | 0.183341 | 0.076543 |
| 1879 | Rugăciunea unui dac | 46 | 0.184598 | 0.067198 |
| 1881 | Scrisoarea III | 285 | 0.190571 | 0.079298 |
| 1869 | Junii corupţi | 78 | 0.195154 | 0.141009 |
| 1873 | Adânca mare... | 14 | 0.195572 | 0.064063 |
| 1883 | Cu mâine zilele-ţi adaogi... | 32 | 0.208622 | 0.139903 |
| 1866 | O călărire în zori | 86 | 0.2208 | 0.145174 |
| 1871 | Iubită dulce, o, mă lasă | 56 | 0.224681 | 0.096907 |
| 1871 | Mortua est! | 70 | 0.225005 | 0.092254 |
| 1874 | O, adevăr sublime... | 44 | 0.228707 | 0.090329 |
| 1867 | Ce-ţi doresc eu ţie, dulce Românie | 32 | 0.229817 | 0.146835 |
| 1887 | De ce nu-mi vii | 24 | 0.233317 | 0.113969 |
| 1879 | Freamăt de codru | 48 | 0.245238 | 0.117649 |
| 1869 | Lebăda | 12 | 0.249277 | 0.257629 |
| 1883 | Şi dacă... | 12 | 0.262959 | 0.34563 |
| 1883 | Criticilor mei | 28 | 0.264219 | 0.147977 |
| 1879 | Sonete | 42 | 0.264639 | 0.132969 |
| 1885 | Sara pe deal | 24 | 0.265213 | 0.088441 |
| 1887 | Povestea teiului | 88 | 0.26974 | 0.217131 |
| 1876 | Lacul | 20 | 0.274255 | 0.152951 |
| 1883 | Luceafărul | 392 | 0.27721 | 0.239471 |
| 1883 | Mai am un singur dor | 36 | 0.28178 | 0.340581 |
| 1873 | Floare albastră | 56 | 0.293544 | 0.24609 |
| 1883 | Trecut-au anii | 14 | 0.296258 | 0.258758 |
| 1880 | O, mamă... | 18 | 0.296358 | 0.12687 |
| 1875 | Făt-Frumos din tei | 92 | 0.300824 | 0.201322 |
| 1883 | La mijloc de codru | 13 | 0.329564 | 0.127905 |
| 1871 | Copii eram noi amândoi | 92 | 0.336609 | 0.325554 |
| 1883 | Pe lângă plopii fără soţ | 44 | 0.342626 | 0.228156 |
| 1879 | Revedere | 36 | 0.353938 | 0.336106 |
| 1886 | La steaua | 16 | 0.365743 | 0.625435 |
| 1880 | Dintre sute de catarge | 16 | 0.373583 | 0.163423 |
| 1878 | Povestea codrului | 52 | 0.406063 | 0.316016 |
| 1883 | Ce te legeni codrule | 25 | 0.425026 | 0.486791 |
| 1883 | Glossă | 80 | 0.453381 | 0.442159 |

Figure 2.1.2.1. Plot of


The second question can be answered if we scrutinise the relation as given in Figure 2.1.2.2.

Figure 2.1.2.2. Plot

As can easily be seen, there is no dependence of euphony on poem length. On the contrary, some poems of the same length seem to be written under quite different euphonic regimes. However, as soon as more authors have been analyzed, even this should be shown empirically by a statistical test. In our present study, euphony is a local phenomenon concerning a given poem but there is neither development nor length dependence.

2.2 Assonance

Assonance may be reminiscent of an echo, a repetition of a sound sequence in another position of the poem. It must consist of at least two sounds (vowels) in the same linear order; the sequence may be “discontinuous”. While in prose assonance is not always relevant, it may play a certain euphonic role in poetry. Assonance may give rise to parallelism, i.e. repetition of the same sound sequence in parallel positions in the strophe. This phenomenon can be observed e.g. in Malay folk quatrains called pantuns (cf. Altmann 1963). In modern poetry, one cannot expect ordered vocalic structures outside of rhyme, and if rhyme does not exist (e.g. in hexameter), vocalic patterns are rather rare. One way of finding vocalic assonance patterns in modern poetry is to study the transitions from one vowel to the next one, that is, to search for Markov dependencies of the first order. But the computation would be complex and not very lucid (cf. Brainerd 1976; Altmann 1988) since we have several different states (= number of different vowels) in a text. Instead, we simply observe the transitions between vowels, omitting the transition from one verse to the next, and register them in a contingency table. For Romanian, we use the vocalic phonemes: {/a/, /ə/, /ɨ/, /e/, /e̯/, /i/, /ⁱ/, /j/, /o/, /o̯/, /u/, /w/}. Registering all transitions we obtain a 12 × 12 contingency table, in which we can find the individual tendencies. We test each individual cell using the criterion

(2.2.1)  $u = \dfrac{n_{ij} - \dfrac{n_{i.}\, n_{.j}}{n}}{\sqrt{\dfrac{n_{i.}\, n_{.j}\, (n - n_{i.})(n - n_{.j})}{n^2 (n - 1)}}}$,

where u is the quantile of the standard normal distribution N(0,1), $n_{ij}$ is the frequency in cell (i,j), $n_{i.}$ is the sum of row i, $n_{.j}$ is the sum of column j, and n is the total sum. The expression $n_{i.} n_{.j}/n$ is the expectation for the cell (i,j), and the expression in the denominator is the standard deviation in the cell (i,j). Here, we are not interested in different strengths of the patterning, hence we decide dichotomically: if u ≥ 1.96, we have an existing, positive pattern (P); otherwise the sequence is not significant (N) and the pattern can be considered random. This is why we use the normal distribution rather than the chi-square criterion, which, being the square of (2.2.1), yields only positive values. For the sake of illustration let us present the results from the poem Lacul in Table 2.2.1. Consider the sequence of two a-s, /aa/, yielding $n_{/aa/}$ = 7, and the sequence /au/, $n_{/au/}$ = 10. Inserting the other numbers from Table 2.2.1 we obtain:

$u_{/aa/} = \dfrac{7 - \dfrac{36(34)}{160}}{\sqrt{\dfrac{36(34)(160 - 36)(160 - 34)}{160^2 (160 - 1)}}} = -0.2999$

$u_{/au/} = \dfrac{10 - \dfrac{36(24)}{160}}{\sqrt{\dfrac{36(24)(160 - 36)(160 - 24)}{160^2 (160 - 1)}}} = 2.43$

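Table 2.2.1: Frequency of vowel sequences in the poem Lacul (rows: first vowel; columns: second vowel)

      | /a/ | /ə/ | /ɨ/ | /e/ | /e̯/ | /i/ | /i / | /j/ | /o/ | /o̯/ | /u/ | /w/ | n_i.
/a/   | 7 | 6 | 1 | 7 | 0 | 4 | 1 | 0 | 0 | 0 | 10 | 0 | 36
/ə/   | 4 | 2 | 1 | 0 | 0 | 3 | 2 | 0 | 1 | 0 | 2 | 0 | 15
/ɨ/   | 4 | 3 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 13
/e/   | 5 | 2 | 0 | 0 | 1 | 5 | 2 | 1 | 0 | 0 | 4 | 1 | 21
/e̯/   | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3
/i/   | 2 | 3 | 3 | 5 | 0 | 1 | 1 | 3 | 3 | 0 | 3 | 0 | 24
/i /  | 4 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8
/j/   | 1 | 1 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 7
/o/   | 2 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 6
/o̯/   | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
/u/   | 1 | 3 | 0 | 7 | 1 | 6 | 2 | 1 | 2 | 0 | 2 | 0 | 25
/w/   | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
n_.j  | 34 | 20 | 8 | 28 | 3 | 21 | 9 | 5 | 6 | 1 | 24 | 1 | 160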

Since −1.96 < u/aa/ = −0.2999 < 1.96, the sequence /aa/ does not represent a significant association. The sequence /au/ represents a significant association because u/au/ = 2.43 ≥ 1.96. Performing the same test for all cells of Table 2.2.1, we obtain the results presented in Table 2.2.2. Here we took u = 1.96 as the boundary; in a two-sided test, this corresponds to α = 0.05. Other quantiles can, of course, be used. One could also assign the vowel sequences to different classes, as is usual in phonemics, e.g. those with u < −1.96 to the dissociative class (D) and those within [−1.96; 1.96] to the neutral class (N), but our aim here is to find only existing preferred associations (P).
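The cell-wise test can be sketched in a few lines of Python. This is only an illustration of criterion (2.2.1) applied to the counts of Table 2.2.1; the names `TABLE`, `VOWELS`, `cell_u` and `preferred_pairs` are our own, and `"i_"` is a placeholder label for the phoneme printed as "/i /" in the table.

```python
from math import sqrt

# Labels in the row/column order of Table 2.2.1.
# "i_" is a placeholder (assumption) for the phoneme printed as "/i /".
VOWELS = ["a", "ə", "ɨ", "e", "e̯", "i", "i_", "j", "o", "o̯", "u", "w"]

# Transition counts n_ij of Table 2.2.1 (row: first vowel, column: second vowel).
TABLE = [
    [7, 6, 1, 7, 0, 4, 1, 0, 0, 0, 10, 0],
    [4, 2, 1, 0, 0, 3, 2, 0, 1, 0, 2, 0],
    [4, 3, 2, 2, 0, 0, 0, 0, 0, 0, 2, 0],
    [5, 2, 0, 0, 1, 5, 2, 1, 0, 0, 4, 1],
    [3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 3, 3, 5, 0, 1, 1, 3, 3, 0, 3, 0],
    [4, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 2, 0, 1, 0, 0, 0, 1, 1, 0],
    [2, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 3, 0, 7, 1, 6, 2, 1, 2, 0, 2, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
]


def cell_u(table, i, j):
    """Return u of (2.2.1) for cell (i, j) of a square contingency table."""
    n = sum(map(sum, table))
    row = sum(table[i])                 # n_i.
    col = sum(r[j] for r in table)      # n_.j
    expected = row * col / n
    variance = row * col * (n - row) * (n - col) / (n ** 2 * (n - 1))
    return (table[i][j] - expected) / sqrt(variance)


def preferred_pairs(table, labels, threshold=1.96):
    """List all (first, second) vowel pairs whose u reaches the threshold."""
    k = len(table)
    return [(labels[i], labels[j])
            for i in range(k) for j in range(k)
            if cell_u(table, i, j) >= threshold]


if __name__ == "__main__":
    print(round(cell_u(TABLE, 0, 0), 4))    # /aa/
    print(round(cell_u(TABLE, 0, 10), 2))   # /au/
    print(preferred_pairs(TABLE, VOWELS))
```

The sketch assumes that no row or column sum equals 0 or n, which holds for Table 2.2.1; otherwise the variance would be zero and the cell would have to be skipped.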

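Table 2.2.2: The u-test for individual cells of Table 2.2.1 (P = preferred association, u ≥ 1.96; - = not significant)

      | /a/ | /ə/ | /ɨ/ | /e/ | /e̯/ | /i/ | /i / | /j/ | /o/ | /o̯/ | /u/ | /w/
/a/   | - | - | - | - | - | - | - | - | - | - | P | -
/ə/   | - | - | - | - | - | - | - | - | - | - | - | -
/ɨ/   | - | - | - | - | - | - | - | - | - | - | - | -
/e/   | - | - | - | - | - | - | - | - | - | - | - | P
/e̯/   | P | - | - | - | - | - | - | - | - | - | - | -
/i/   | - | - | - | - | - | - | - | P | P | - | - | -
/i /  | P | - | - | - | - | - | - | - | - | - | - | -
/j/   | - | - | - | - | - | - | - | - | - | P | - | -
/o/   | - | - | - | - | P | - | - | - | - | - | - | -
/o̯/   | - | - | - | - | - | - | - | - | - | - | - | -
/u/   | - | - | - | - | - | - | - | - | - | - | - | -
/w/   | - | - | - | P | - | - | - | - | - | - | - | -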

Evidently, there are associative tendencies in constructing vowel chains. Nine out of 144 possible sequences are preferred by the poet. In order to see whether the same tendencies exist in other works by Eminescu, we analyzed 46 poems and present the results in Table 2.2.3.

Table 2.2.3: Associative two-member chains in Eminescu's poems

Poem title | No. of verses | Significant chains of phonemes | No. of chains
Lebăda | 12 | /a,ə/, /ə,w/, /ɨ,i/, /i,e/, /i,e/, /o,j/, /u, i/ | 7
Peste vârfuri | 12 | /a,j/, /ə,o/, /e,a/, /e,o/, / ,e/, /j,ɨ/, /o,u/, /o̯,a/, /u, / | 9
Şi dacă... | 12 | /a,ə/, /a,ɨ/, /ɨ,u/, /e̯,a/, /i,j/, /j,e/, /o,e̯/, /o,i/, /u,i/ | 9
La mijloc de codru... | 13 | /ə,i/, /ɨ,u/, /e̯,a/, /i,e/, /j,e/, /o̯,a/, /u,e̯/ | 7
Adânca mare… | 14 | /a,ə/, /ɨ,e̯/, /ɨ,i/, /e,j/, /e,o̯/, /e̯,a/, /i,ɨ/, /i,o/, /j,e/, /o,i/, /o̯,a/ | 11
Trecut-au anii | 14 | /a,ə/, /a,e̯/, /ɨ,a/, /e,u/, /e̯,a/, /i,j/, /i,o/, /i,ɨ/, /j,a/, /j,e/, /o,w/, /u, i/, /w,a/ | 13
Somnoroase păsărele... | 16 | /a,e/, /e̯,a/, /i,j/, /i,ɨ/, /o,i/, /o̯,a/, /u,ə/ | 7
La steaua | 16 | /a,e̯/, /a,w/, /ə,i/, /ə,i/, /e̯,a/, /i,ɨ/, /i,j/, /i,o/, /j,e/, /o̯,a/, /u,i/ | 11
Dintre sute de catarge | 16 | /a,u/, /ə,ɨ/, /ɨ,u/, /e̯,o/, /i,e/, /j,o/, /o,o/, /o̯,a/, /u,i/ | 9
O, mamă… | 18 | /a,ə/, /ə,u/, /e,w/, /e̯,a/, /i,e/, /i,ɨ/, /j,o/, /o,j/, /o̯,a/, /u,ə/ | 10
Lacul | 20 | /a,u/, /e,w/, /e̯,a/, /i,j/, /i,o/, /i,a/, /j,o̯/, /o,e̯/, /w,e/ | 9
Odă în metru antic | 20 | /a,e/, /ɨ,ɨ/, /e,i/, /e,w/, /e̯,a/, /i,j/, /i,ɨ/, /j,e/, /o,i/, /o̯,a/, /w,o/ | 11
De ce nu-mi vii | 24 | /ə,w/, /e,u/, /e,w/, /e̯,a/, /i,j/, /i,i/, /j,e/, /o,i/, /o,i/, /o̯,a/, /u,i/ | 11
Sara pe deal | 24 | /a,e̯/, /ə,i/, /ɨ,ɨ/, /e,i/, /e,i/, /e,w/, /e̯,a/, /i,ə/, /i,j/, /j,o̯/, /o,i/, /o̯,a/ | 12
Ce te legeni... | 25 | /a,e̯/, /a,j/, /ɨ,u/, /e,ɨ/, /e,e/, /e̯,a/, /i,o/, /o,u/, /o̯,a/, /u,i/, /u,i/ | 11
Criticilor mei | 28 | /ə,e̯/, /ɨ,i/, /e,o̯/, /e,u/, /e̯,a/, /i,j/, /j,e/, /o,i/, /o̯,a/, /u,e/, /u,i/, /w,ə/ | 12
Cu mâne zilele-ţi adaogi... | 32 | /a,o/, /ə,o̯/, /e,i/, /e̯,a/, /i,e/, /i,w/, /j,e/, /o,ə/, /o̯,a/, /w,a/, /w,e̯/ | 11
Ce-ţi doresc eu ţie, dulce Românie | 32 | /ə,i/, /ə,w/, /ɨ,i/, /e,o/, /e,w/, /e̯,a/, /i,e̯/, /i,j/, /i,o/, /j,e/, /o,ɨ/, /o,o̯/, /o̯,a/, /u,i/, /w,i/ | 15
Revedere | 36 | /ɨ,ə/, /ɨ,u/, /e,e̯/, /e,i/, /e̯,a/, /i,e/, /j,e/, /j,o/, /o,j/, /o,u/, /o̯,a/, /u,i/ | 12
Atât de fragedă... | 36 | /ɨ,o/, /e,i/, /e,u/, /e̯,a/, /i,j/, /i,o/, /i,w/, /j,e/, /o,i/, /o̯,a/, /w,a/ | 11
Mai am un singur dor | 36 | /a,o̯/, /a,u/, /ə,i/, /e,e/, /e̯,a/, /i,e̯/, /i,j/, /i,ɨ/, /j,e/, /o,j/, /o,o/, /o̯,a/, /u,i/, /w,o/ | 14
Despărţire | 38 | /ə,w/, /ɨ,o̯/, /e̯,a/, /e̯,o̯/, /i,j/, /j,e/, /o,j/, /o̯,a/, /w,ɨ/ | 9
Ghazel | 40 | /ə,u/, /ə,w/, /ɨ,o/, /e,i/, /e̯,a/, /i,ə/, /i,ɨ/, /i,o̯/, /j,e/, /o,i/, /o,j/, /o̯,a/, /u,e/, /u,i/ | 14
Dacă treci râul Selenei | 41 | /a,ə/, /a,e/, /ə,a/, /ə,i/, /e̯,a/, /i,j/, /i,o/, /i,w/, /i,ɨ/, /j,e/, /o,i/, /o,i/, /o,o̯/, /o̯,a/, /u,e̯/, /w,ɨ/ | 16
Sonete | 42 | /a,ə/, /ɨ,u/, /e,e/, /e,j/, /e̯,a/, /i,j/, /i,i/, /j,a/, /o,i/, /o̯,a/, /u,i/ | 11
Pe lângă plopii fără soţ | 44 | /ə,w/, /ɨ,i/, /e̯,a/, /i,j/, /o̯,a/, /w,e/ | 6
O, adevăr sublime... | 44 | /a,e̯/, /ə,o/, /ə,w/, /e,i/, /e̯,a/, /i,ə/, /i,u/, /j,e/, /j,u/, /o,i/, /o̯,a/, /w,u/ | 12
Rugăciunea unui dac | 46 | /a,e̯/, /a,w/, /ɨ,u/, /e,ɨ/, /e̯,a/, /e̯,o/, /i,o/, /j,e/, /o̯,a/, /u,j/, /w,o̯/ | 11
Venere şi Madona | 48 | /a,ə/, /ə,u/, /ɨ,o̯/, /e,j/, /e,w/, /e̯,a/, /j,e/, /o,i/, /o,w/, /o̯,a/, /u,ɨ/ | 11
Freamăt de codru | 48 | /a,ə/, /ə,o̯/, /e,w/, /e̯,a/, /i,i/, /i,o/, /j,e/, /o,j/, /o̯,a/, /u,i/ | 10
La moartea lui Heliade | 48 | /a,ə/, /ə,ɨ/, /ɨ,i/, /e̯,a/, /i,j/, /i,o/, /i,w/, /j,e/, /o,o/, /o̯,a/, /u,i/, /u,j/ | 12
Povestea codrului | 52 | /a,ə/, /ə,i/, /ɨ,o/, /e̯,a/, /i,e̯/, /i,j/, /i,o/, /j,ɨ/, /j,e/, /o,w/, /o̯,a/, /u,i/, /u,o̯/, /w,o/ | 14
Iubită dulce, o, mă lasă | 56 | /a,e̯/, /a,o/, /ə,j/, /ə,u/, /ə,w/, /e̯,a/, /j,e/, /o,i/, /o,u/, /o̯,a/, /u,ə/, /w,o/ | 12
Floare-albastră | 56 | /a,ə/, /a,ɨ/, /ə,i/, /ə,u/, /ɨ,o/, /ɨ,u/, /e,i/, /e,w/, /e̯,a/, /i,e̯/, /i,j/, /i,w/, /j,e/, /o,i/, /o,j/, /o̯,a/, /u,ə/, /u,i/, /w,a/ | 19
Dumnezeu şi om | 56 | /a,ə/, /a,e/, /a,u/, /a,w/, /ə,ɨ/, /ə,i/, /ə,u/, /ɨ,ə/, /e̯,a/, /i,j/, /i,o̯/, /j,e/, /o̯,a/ | 13
Mortua est! | 70 | /a,ə/, /a,e̯/, /a,j/, /ə,i/, /ɨ,o/, /e,u/, /e̯,a/, /i,j/, /i,o̯/, /i,w/, /j,e/, /o,i/, /o̯,a/, /u,i/, /w,ɨ/ | 15
Junii corupţi | 78 | /a,ə/, /ə,o̯/, /ɨ,e̯/, /ɨ,i/, /e̯,a/, /i,ə/, /i,j/, /i,ɨ/, /j,e/, /o,w/, /o̯,a/, /u,i/ | 12
Glossă | 80 | /a,ə/, /ə,ɨ/, /ə,i/, /ə,w/, /e,e/, /e̯,a/, /i,j/, /i,o/, /i,ɨ/, /j,e/, /o,o̯/, /o,w/, /o̯,a/, /u, i/, /w,ə/ | 15
O călărire în zori | 86 | /a,ə/, /a,j/, /ə,i/, /ə,w/, /ɨ,u/, /e,ɨ/, /e,o/, /e̯,a/, /i,j/, /i,ɨ/, /j,a/, /o,i/, /o,w/, /o̯,a/, /u,j/ | 15
Povestea teiului | 88 | /a,ə/, /a,j/, /ə,ɨ/, /ə,u/, /e,e̯/, /e,w/, /e̯,a/, /e̯,o̯/, /j,a/, /o,i/, /o,u/, /o̯,a/, /u, i/, /w,a/ | 14
Copii eram noi amândoi | 92 | /a,e/, /ɨ,ə/, /e,w/, /e̯,a/, /i,j/, /i,o/, /j,e/, /o,i/, /o̯,a/, /u, i/, /u,o̯/, /w,a/ | 12
Făt-Frumos din tei | 92 | /a,ə/, /ɨ,ə/, /e,ɨ/, /e̯,a/, /j,a/, /j,e/, /o,i/, /o,i/, /o,u/, /o,w/, /o̯,a/, /u,i/, /w,a/ | 13
Epigonii | 114 | /a,ə/, /a,u/, /a,w/, /ɨ,ə/, /e,i/, /e̯,a/, /i,e̯/, /i,j/, /i,o̯/, /i,ɨ/, /j,e/, /o,j/, /o̯,a/, /u,e/ | 14
Împărat şi proletar | 210 | /a,ə/, /a,u/, /ə,ɨ/, /ə,i/, /e,i/, /e̯,a/, /i,e̯/, /i,j/, /i,o̯/, /j,a/, /j,e/, /j,o/, /o,i/, /o,o/, /o̯,a/, /u,i/, /w,ə/ | 17
Scrisoarea III | 285 | /a,ə/, /a,u/, /a,w/, /ə,ɨ/, /ɨ,ə/, /ɨ,u/, /e,o̯/, /e̯,a/, /i,e/, /i,j/, /i,o/, /i,ɨ/, /j,a/, /j,e/, /o,e̯/, /o,i/, /o,i/, /o̯,a/, /u,ə/, /u,i/, /u,i/, /w,a/, /w,ɨ/ | 23
Luceafărul | 392 | /a,ə/, /a,e̯/, /a,w/, /ə,ɨ/, /ə,i/, /e,o̯/, /e,w/, /e̯,a/, /i,e̯/, /i,i/, /i,j/, /i,o/, /i,ɨ/, /i,o̯/, /j,a/, /j,e/, /o,i/, /o,i/, /o,u/, /o̯,a/, /u,i/ | 21


2.2.1 The diagonal

Looking at Table 2.2.2, we see that no sequence on the diagonal has a significant frequency. Table 2.2.3, too, contains only a very small number of identical (= diagonal) pairs. This may indicate that Romanian avoids such pairs, or it may be a property of Eminescu's poems. Whatever the “cause”, we may test the behaviour of the diagonal by applying a simple statistical test for the existence of vowel harmony in languages (cf. Altmann 1987; Schulz, Altmann 1988; Altmann, Altmann 2008). In our case, we go beyond word boundaries and test the existence or nonexistence of a tendency. We set up a function of the diagonal cells in the form of their sum $S = n_{11} + n_{22} + \dots + n_{kk}$, with k as the number of cells on the diagonal, and compare it with the expected sum given as

$$E(S) = \sum_{i=1}^{k} \frac{n_{i.}\,n_{.i}}{n}$$

using the variance defined as

(2.2.2)  $$\operatorname{Var}(S) = \frac{\sum_{i} n_{i.}\,n_{.i}\,(n - n_{i.})(n - n_{.i}) + 2\sum_{i<i'} n_{i.}\,n_{.i}\,n_{i'.}\,n_{.i'}}{n^{2}(n-1)}.$$

The difference between the observed and the expected values, divided by the square root of the variance of S, follows the standard normal distribution N(0,1). Performing the test

(2.2.3)  $$u = \frac{S - E(S)}{\sqrt{\operatorname{Var}(S)}},$$

we can state whether the diagonal as a whole exhibits a significantly positive (u > 1.96), a significantly negative (u < −1.96) or a neutral tendency (u ∈ [−1.96; 1.96]). The corresponding chi-square test becomes simpler when only the preference of the diagonal is of interest:

(2.2.4)  $$X^{2} = \frac{n\left(nS - \sum_{i=1}^{k} n_{i.}\,n_{.i}\right)^{2}}{\sum_{i=1}^{k} n_{i.}\,n_{.i}\left(n^{2} - \sum_{i=1}^{k} n_{i.}\,n_{.i}\right)},$$

which is approximately X² ≈ u². For the sake of illustration, we compute the tendency of the diagonal for the data in Table 2.2.1. We designate the two sums in (2.2.2), each divided by n²(n − 1), as A and B respectively and obtain

S = 7 + 2 + 2 + 0 + 0 + 1 + 0 + 0 + 0 + 0 + 2 + 0 = 14

E(S) = [36(34) + 15(20) + 13(8) + 21(28) + 3(3) + 24(21) + 8(9) + 7(5) + 6(6) + 1(1) + 25(24) + 1(1)]/160 = 3474/160 = 21.7125

A = [36(34)(160−36)(160−34) + 15(20)(160−15)(160−20) + … + 1(1)(160−1)(160−1)] / (160² · 159) = 15.34948

B = 2[36(34)15(20) + 36(34)13(8) + … + 36(34)1(1) + … + 25(24)1(1)] / (160² · 159) = 2.33445

Var(S) = A + B = 17.6839

Inserting the needed values in (2.2.3) we obtain

$$u = \frac{14 - 21.7125}{\sqrt{17.6839}} = -1.834,$$

showing that the diagonal is formally neutral (|u| < 1.96), although the negative sign points to a tendency to avoid sequences of equal vowels. Using the chi-square criterion (2.2.4) we obtain

$$X^{2} = \frac{160\,[160(14) - 3474]^{2}}{3474\,(160^{2} - 3474)} = 3.1697,$$

which is close to u² = (−1.834)² = 3.3635 and likewise not significant; unlike u, it does not show the direction of the tendency.
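The diagonal test can be checked numerically with a short Python sketch. It recomputes S, E(S), Var(S), u and X² from the counts of Table 2.2.1 according to (2.2.2) to (2.2.4); the function name `diagonal_test` and the variable names are our own choices for illustration.

```python
from math import sqrt

# Transition counts n_ij of Table 2.2.1 (row: first vowel, column: second vowel).
TABLE = [
    [7, 6, 1, 7, 0, 4, 1, 0, 0, 0, 10, 0],
    [4, 2, 1, 0, 0, 3, 2, 0, 1, 0, 2, 0],
    [4, 3, 2, 2, 0, 0, 0, 0, 0, 0, 2, 0],
    [5, 2, 0, 0, 1, 5, 2, 1, 0, 0, 4, 1],
    [3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 3, 3, 5, 0, 1, 1, 3, 3, 0, 3, 0],
    [4, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 2, 0, 1, 0, 0, 0, 1, 1, 0],
    [2, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 3, 0, 7, 1, 6, 2, 1, 2, 0, 2, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
]


def diagonal_test(table):
    """Return (S, E(S), Var(S), u, X2) for the diagonal of a square table."""
    k = len(table)
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]                          # n_i.
    cols = [sum(r[j] for r in table) for j in range(k)]     # n_.i
    prod = [rows[i] * cols[i] for i in range(k)]            # n_i. * n_.i

    s = sum(table[i][i] for i in range(k))
    expected = sum(prod) / n

    # The two sums in the numerator of (2.2.2).
    first = sum(prod[i] * (n - rows[i]) * (n - cols[i]) for i in range(k))
    second = 2 * sum(prod[i] * prod[j]
                     for i in range(k) for j in range(i + 1, k))
    variance = (first + second) / (n ** 2 * (n - 1))

    u = (s - expected) / sqrt(variance)                     # (2.2.3)
    total = sum(prod)
    x2 = n * (n * s - total) ** 2 / (total * (n ** 2 - total))  # (2.2.4)
    return s, expected, variance, u, x2


if __name__ == "__main__":
    s, e, var, u, x2 = diagonal_test(TABLE)
    print(s, round(e, 4), round(var, 4), round(u, 3), round(x2, 4))
```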

Instead of analysing each of the 46 poems separately, we show the numbers of positive associations in all poems in Table 2.2.4, obtained by counting the significant sequences in Table 2.2.3.

Table 2.2.4: Numbers of associations of subsequent vowels in 46 poems

/a/

/ə/

/e/

/e̯ /

/i/

/i /

/j/

/o/

/a/

0

23

/ə/ /ɨ/

1 1

0 6

2

5

8 2

0 0

8

0

0

5

1 2

10 6

5 1

1 0

/e/

1

0

4

/e̯ /

43

0

0

4

2

6

5

0

0

0

0

/i/

0

4

1

7

7

2

0

/i /

1

0

14

2

0

2

/j/

8

0

2

34

0

/ɨ/

/o̯ /

/u/

/w/

2

1

7

6

2 5

3 2

7 10

10 0

3

3

4

5

12

0

2

2

0

0

28

12

4

0

4

0

1

4

2

1

2

0

0

0

4

2

1

0

/o/

0

1

1

1

3

19

9

9

4

3

7

7

/o̯ /

42

0

0

0

0

0

0

0

0

0

0

0

/u/

0

5

1

3

2

5

24

3

0

2

0

0

/w/

8

3

4

2

1

1

0

0

4

1

1

0

Since there are only 12 cases on the diagonal (marked in bold face) out of 563 which indicate an association of identical phonemes (/ɨ,ɨ/, /e,e/, /i,i/, /o,o/), we may state that there is a strong tendency to avoid sequences consisting of identical vowels. The most frequent pairs of subsequent vowels are /e̯,a/ (43), /o̯,a/ (42), /j,e/ (34) and /i,j/ (28); they correspond to the diphthongs “ea”, “oa”, “ie”, “ii”.

2.2.2 Symmetry

If assonance is symmetric then sequences are to be considered random; the given sequence and the same sounds in reverse order are to be expected with the same frequency. If /e̯,a/ is significantly frequent, then we expect /a,e̯/ to have the same quality. If the situation is different, we may speak about significant asymmetry of assonance. In Table 2.2.4 we observe the asymmetry of assonance in the studied poems. This can be caused both by the given language and by the style of the author. If the given language excludes specific sequences (e.g. in Indonesian there are sequences [ә,a] but no sequences [a,ә]), asymmetry is not necessarily given. In general, as far as vowel sequences are concerned, the more vowels there are in the language, the smaller the probability that all reverse sequences exist in a poem. Whatever the situation in the given language, asymmetry can be measured and expressed by an indicator. For this purpose, we use the well-known Bowker test, which is based on the comparison of all cells (i,j) with the symmetric cells (j,i), where i ≠ j, i.e. symmetric cells without the diagonal of the contingency table. We set up the criterion

(2.2.5)  $$X^{2} = \sum_{i=1}^{k-1}\ \sum_{\substack{j=i+1 \\ n_{ij}+n_{ji}\neq 0}}^{k} \frac{(n_{ij} - n_{ji})^{2}}{n_{ij} + n_{ji}},$$

which is distributed like a chi-square with k(k−1)/2 degrees of freedom (DF), k being the number of classes, here vowels (k = 12). Using Table 2.2.1 we obtain

$$X^{2} = \frac{(6-4)^{2}}{6+4} + \frac{(4-1)^{2}}{4+1} + \dots + \frac{(1-1)^{2}}{1+1} + \frac{(2-0)^{2}}{2+0} = 49.9151,$$

which, with DF = 12(12−1)/2 = 66, yields P = 0.93, showing that the sequences are quite symmetric. In this way, all poems could be scrutinised. But if we consider the overall Table 2.2.4 and perform the same test, we obtain X² = 297.802, which with 66 degrees of freedom is highly significant, i.e. it testifies to strong asymmetry. All in all, there is a strong tendency to avoid reverse/symmetric vocalic assonances either in Eminescu's work or in Romanian in general. In any case, the extent of asymmetry can be considered a property of his poems. This is not an automatic result but a matter of poem structure. However, only a comparison with other poets, also in other languages, would help to unveil the strength of this structure. Simple functions applied to the sequence do not yield a better result than R² = 0.80, which surely reflects the fact that each poem is an individual creation; a better result would follow only if we had many poems of the same length. For the time being, we consider this problem a future task.
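A minimal Python sketch of the Bowker criterion (2.2.5), applied to the counts of Table 2.2.1, may make the summation explicit. The function name `bowker_statistic` is our own; the degrees of freedom follow the k(k−1)/2 convention used in the text, and the P-value is left to a chi-square table or a statistics package.

```python
# Transition counts n_ij of Table 2.2.1 (row: first vowel, column: second vowel).
TABLE = [
    [7, 6, 1, 7, 0, 4, 1, 0, 0, 0, 10, 0],
    [4, 2, 1, 0, 0, 3, 2, 0, 1, 0, 2, 0],
    [4, 3, 2, 2, 0, 0, 0, 0, 0, 0, 2, 0],
    [5, 2, 0, 0, 1, 5, 2, 1, 0, 0, 4, 1],
    [3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 3, 3, 5, 0, 1, 1, 3, 3, 0, 3, 0],
    [4, 0, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 2, 0, 1, 0, 0, 0, 1, 1, 0],
    [2, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 3, 0, 7, 1, 6, 2, 1, 2, 0, 2, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
]


def bowker_statistic(table):
    """Return (X2, DF) of the Bowker symmetry test for a square table."""
    k = len(table)
    x2 = 0.0
    for i in range(k - 1):
        for j in range(i + 1, k):
            pair_total = table[i][j] + table[j][i]
            if pair_total:  # pairs with n_ij + n_ji = 0 are skipped
                x2 += (table[i][j] - table[j][i]) ** 2 / pair_total
    df = k * (k - 1) // 2
    return x2, df


if __name__ == "__main__":
    x2, df = bowker_statistic(TABLE)
    print(round(x2, 4), df)
```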

2.2.3 Poem length and significant sequences

Out of 144 possible vowel sequences in the given poems we find only 95 significant ones. One can expect that if the number of lines increases, the number of significant sequences will increase, too. However, the increase is not linear; it can be captured more adequately by a power function. We apply such a function to the data in Table 2.2.5, which results from counting the significant sequences for each poem in Table 2.2.3. As can be seen in Figure 2.2.1, the number of significant sequences increases with increasing poem length.

Figure 2.2.1. Increase of significant vowel sequences with increasing poem length

Table 2.2.5: Dependence of significant sequences on verse numbers

No. of verses | No. of significant vowel sequences
12 | 7
12 | 9
12 | 9
13 | 7
14 | 11
14 | 13
16 | 7
16 | 9
16 | 11
18 | 10
20 | 9
20 | 11
24 | 11
24 | 12
25 | 11
28 | 12
32 | 11
32 | 15
36 | 12
36 | 11
36 | 14
38 | 9
40 | 14
41 | 16
42 | 11
44 | 6
44 | 12
46 | 11
48 | 11
48 | 10
48 | 12
52 | 14
56 | 12
56 | 19
56 | 13
70 | 15
78 | 12
80 | 15
86 | 15
88 | 14
92 | 12
92 | 13
114 | 14
210 | 17
285 | 23
392 | 21
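How a power function y = a·x^b might be fitted to Table 2.2.5 can be sketched as follows. The text does not state the fitting procedure; the sketch below assumes ordinary least squares on the log-transformed data, which is only one of several possible choices, and the names `DATA` and `fit_power` are ours.

```python
from math import exp, log

# (number of verses, number of significant vowel sequences) from Table 2.2.5.
DATA = [
    (12, 7), (12, 9), (12, 9), (13, 7), (14, 11), (14, 13), (16, 7),
    (16, 9), (16, 11), (18, 10), (20, 9), (20, 11), (24, 11), (24, 12),
    (25, 11), (28, 12), (32, 11), (32, 15), (36, 12), (36, 11), (36, 14),
    (38, 9), (40, 14), (41, 16), (42, 11), (44, 6), (44, 12), (46, 11),
    (48, 11), (48, 10), (48, 12), (52, 14), (56, 12), (56, 19), (56, 13),
    (70, 15), (78, 12), (80, 15), (86, 15), (88, 14), (92, 12), (92, 13),
    (114, 14), (210, 17), (285, 23), (392, 21),
]


def fit_power(points):
    """Fit y = a * x**b by least squares on (ln x, ln y); return (a, b)."""
    xs = [log(x) for x, _ in points]
    ys = [log(y) for _, y in points]
    m = len(points)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = exp(mean_y - b * mean_x)
    return a, b


if __name__ == "__main__":
    a, b = fit_power(DATA)
    print(f"y = {a:.3f} * x^{b:.3f}")
```

Fitting in log-log space weights relative rather than absolute deviations; a nonlinear least-squares fit on the raw counts would give somewhat different parameters.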

There is no significant trend; the sequences are likely to correspond to those usual in Romanian word structure. It will be reasonable to characterise Eminescu only after several Romanian authors have been investigated. An alternative way of measuring poem size is in terms of the number of words. This is done in Table 2.2.6. The result of the regression analysis is displayed in Figure 2.2.2. As can be seen, the regression is too weak here to be considered real.

Table 2.2.6: Title and size of analysed 46 poems by Eminescu

No. | Poem title | N words (text size) | No. of significant chains of phonemes
1 | Lebăda | 41 | 7
2 | Peste vârfuri | 47 | 9
3 | Dintre sute de catarge | 50 | 9
4 | Şi dacă... | 53 | 9
5 | La mijloc de codru... | 55 | 7
6 | Somnoroase păsărele... | 55 | 7
7 | La steaua | 71 | 11
8 | Adânca mare… | 75 | 11
9 | Trecut-au anii | 88 | 13
10 | Lacul | 90 | 9
11 | Ce te legeni... | 102 | 11
12 | Odă în metru antic | 103 | 11
13 | De ce nu-mi vii | 123 | 11
14 | Mai am un singur dor | 125 | 14
15 | Criticilor mei | 130 | 12
16 | O, mamă… | 140 | 10
17 | Cu mâne zilele-ţi adaogi... | 141 | 11
18 | Revedere | 141 | 12
19 | Sara pe deal | 156 | 12
20 | Atât de fragedă… | 176 | 11
21 | Freamăt de codru | 179 | 10
22 | Ce-ţi doresc eu ţie, dulce Românie | 183 | 15
23 | Pe lângă plopii fără soţ | 199 | 6
24 | Povestea codrului | 220 | 14
25 | Floare-albastră | 247 | 19
26 | Sonete | 265 | 11
27 | Despărţire | 304 | 9
28 | Ghazel | 331 | 14
29 | La moartea lui Heliade | 332 | 12
30 | O, adevăr sublime... | 334 | 12
31 | Iubită dulce, o, mă lasă | 337 | 12
32 | O călărire în zori | 346 | 15
33 | Dacă treci râul Selenei | 356 | 16
34 | Rugăciunea unui dac | 357 | 11
35 | Copii eram noi amândoi | 375 | 12
36 | Glossă | 380 | 15
37 | Povestea teiului | 390 | 14
38 | Venere şi Madona | 393 | 11
39 | Făt-Frumos din tei | 415 | 13
40 | Dumnezeu şi om | 443 | 13
41 | Junii corupţi | 458 | 12
42 | Mortua est! | 491 | 15
43 | Epigonii | 921 | 14
44 | Împărat şi proletar | 1510 | 17
45 | Luceafărul | 1737 | 21
46 | Scrisoarea III | 2278 | 23

Figure 2.2.2. Number of significant phoneme chains vs. poem lengths in words

2.3 Alliteration

In poetry, there are two kinds of alliteration (= repetition of the same sound at the beginning of linguistic entities): (1) in the poem, at the beginning of verses, which we will call Skinner alliteration; (2) in the line, at the beginning of words, which can be called Beowulf alliteration. Both kinds may be evaluated using the same method. However, there are different starting approaches: (a) stating the relative frequencies of all phonemes (sounds) in the poetic work of the author; (b) stating the relative frequencies only at the beginning of words (or verses) in all his poems; (c) considering only the given poem and taking all phoneme/sound frequencies into account; (d) considering only the given poem and only the initial phonemes/sounds and their frequencies; and finally (e) starting from the relative frequencies of phonemes/sounds in the given language. Needless to say, the outcomes of tests may turn out to be very different, but none of these “universes of discourse” represents a population in the statistical sense. They are samples, but there is no population which could be called “phoneme/sound frequency with Eminescu” or even “phoneme/sound frequency in Romanian”. Every language changes every day and has many dialects and idiolects, and a “population” would also have to represent all spoken utterances. Hence, in language there is no “population” (cf. Orlov, Boroda, Nadarejšvili 1982). Nevertheless, one may always start from a conventionally stated background and obtain a restricted result.

Furthermore, two “distance” approaches can be differentiated: (f) taking into account the mutual distance of the given alliterated lines or words, because the repetition of the same sound e.g. at the beginning of the 1st and of the 100th line surely does not have an alliterative effect, or (g) ignoring distance. Computation according to approach (f) is much more difficult and requires some intuitive, subjective decisions about the distance within which alliteration can still be perceived.

Here we shall rely on the sound frequency in 146 poems by Eminescu. A study on the basis of the phoneme frequencies in modern Romanian would mean making judgements about samples on the basis of a “population” which arose circa 140 years later; considering the complete “language of Eminescu” would mean only his written and conserved texts, not his spoken ones. Every decision about the choice of a “population” in language is simply a condition under which the analysis will be performed. The situation resembles mathematical theorems beginning with “Let … be given”. Computing the Skinner alliteration does not differ from computing general euphony, but here we do not differentiate vowels and consonants because both can occur in the same position. All formulas have been presented in previous chapters. The results of the computation of Skinner alliteration are presented in Table 2.3.1.

Table 2.3.1: Skinner alliteration in 46 poems by M. Eminescu

Year | Poem | No. of verses | Phoneme | Alliterative euphonies | Mean Skinner alliteration (poem)
1869 | Lebăda | 12 | /ɨ/ /l/ | 1.66509 3.55578 | 0.43507
1883 | Peste vârfuri | 12 | /m/ /p/ | 4.93605 0.48573 | 0.45181
1883 | Şi dacă... | 12 | /j/ /ʃ/ | 4.30004 4.99989 | 0.77499
1883 | La mijloc de codru... | 13 | /l/ /ʃ/ | 4.79498 4.99983 | 0.75345
1873 | Adânca mare... | 14 | /a/ /ʃ/ | 2.20067 2.65572 | 0.34688
1883 | Trecut-au anii | 14 | /k/ /ʃ/ | 3.95403 2.65572 | 0.47213
1883 | Somnoroase păsărele | 16 | /d/ /p/ /s/ | 3.03033 3.99200 4.71373 | 0.73350
1886 | La steaua | 16 | /j/ /l/ | 3.39116 4.53123 | 0.49515
1880 | Dintre sute de catarge | 16 | /k/ /tʃ/ /d/ /v/ | 3.46930 2.31987 3.03033 4.99050 | 0.86312
1880 | O, mamă... | 18 | /d/ /m/ /s/ | 2.28268 4.66702 4.54768 | 0.63874
1876 | Lacul | 20 | /ɨ/ /s/ /ʃ/ | 3.78911 0.90644 4.96607 | 0.48308
1883 | Odă în metru antic | 20 | /p/ | 4.99844 | 0.24992
1887 | De ce nu-mi vii | 24 | /k/ /d/ /s/ | 4.20917 4.83748 4.79619 | 0.57679
1885 | Sara pe deal | 24 | /s/ /v/ | 4.97454 4.38810 | 0.39011
1883 | Ce te legeni codrule | 25 | /b/ /d/ /ʃ/ /z/ | 2.03667 4.80299 4.99998 3.59533 | 0.61740
1883 | Criticilor mei | 28 | /k/ /tʃ/ | 4.971341 3.98666 | 0.319929
1883 | Cu mâne zilele-ţi adaogi... | 32 | /k/ /tʃ/ /d/ /ʃ/ | 4.99917 3.53697 2.06036 4.78625 | 0.48071
1867 | Ce-ţi doresc eu ţie, dulce Românie | 32 | /k/ /tʃ/ /d/ /f/ /v/ | 2.83420 3.53697 4.39714 0.02892 4.84627 | 0.48886
1879 | Revedere | 36 | /k/ /tʃ/ /m/ /ʃ/ /v/ | 4.98289 2.99134 4.12971 4.66840 3.12156 | 0.55261
1879 | Atât de fragedă... | 36 | /k/ /m/ /ʃ/ | 4.32218 1.13610 4.66840 | 0.28130
1883 | Mai am un singur dor | 36 | /d/ /p/ | 4.00055 3.05850 | 0.19608
1879 | Despărţire | 38 | /k/ /tʃ/ /s/ | 4.99999 2.68212 4.67215 | 0.32511
1873 | Ghazel | 40 | /d/ /p/ /ʃ/ | 3.45432 4.90593 4.99999 | 0.33401
1873 | Dacă treci râul Selenei | 41 | /j/ /k/ /d/ /m/ /p/ /ʃ/ | 3.73529 3.82598 3.29193 4.99121 2.01644 4.93282 | 0.55594
1879 | Sonete | 42 | /k/ /d/ /p/ /s/ /ʃ/ | 4.95362 3.11865 4.30731 2.70689 4.41912 | 0.46442
1883 | Pe lângă plopii fără soţ | 44 | /o/ /k/ /ʃ/ | 4.99518 4.93787 4.31429 | 0.32380
1874 | O, adevăr sublime... | 44 | /o/ /k/ /ʃ/ /t/ | 4.96600 4.66043 4.31429 1.63301 | 0.35395
1879 | Rugăciunea unui dac | 46 | /j/ /k/ /p/ /s/ /ʃ/ | 4.53414 4.91815 4.80177 4.99392 4.99989 | 0.52713
1887 | Venere şi Madonă | 48 | /k/ /d/ /f/ /ʃ/ | 2.78525 1.83675 3.33612 4.99984 | 0.26996
1879 | Freamăt de codru | 48 | /j/ /k/ /tʃ/ /p/ /s/ /ʃ/ | 2.62252 4.89379 4.25465 0.09509 1.17964 4.06876 | 0.35655
1867 | La moartea lui Heliade | 48 | /a/ /o/ /k/ /p/ /v/ | 2.90305 3.50287 4.89379 0.09509 4.31774 | 0.32734
1878 | Povestea codrului | 52 | /k/ /tʃ/ /ʃ/ | 1.98510 4.01405 4.80065 | 0.20769
1871 | Iubită dulce, o, mă lasă | 56 | /k/ /tʃ/ /s/ /ʃ/ | 3.88829 4.79065 4.97055 4.99948 | 0.33302
1873 | Floare-albastră | 56 | /ʃ/ /v/ | 4.99999 3.83283 | 0.15773
1873 | Dumnezeu şi om | 56 | /ɨ/ /d/ /f/ /ʃ/ | 3.82567 3.29543 2.50228 4.99508 | 0.26104
1871 | Mortua est! | 70 | /k/ /d/ /p/ /s/ /ʃ/ | 4.99999 3.52076 3.44782 3.09290 4.97938 | 0.28630
1869 | Junii corupţi | 78 | /ɨ/ /k/ /tʃ/ /d/ /s/ /ʃ/ /v/ | 4.99795 4.99999 1.24682 4.18110 1.77316 4.99387 1.53272 | 0.30417
1883 | Glossă | 80 | /tʃ/ /d/ /ʃ/ /t/ /v/ | 4.99993 4.72079 3.75505 4.99999 4.13341 | 0.28261
1866 | O călărire în zori | 86 | /ɨ/ /k/ /d/ /p/ /ʃ/ | 3.11600 4.93680 4.54513 4.98123 4.99817 | 0.26253
1887 | Povestea teiului | 88 | /j/ /k/ /d/ /p/ /s/ /ʃ/ | 4.91265 4.03543 3.37119 3.63219 4.26721 4.98556 | 0.28641
1871 | Copii eram noi amândoi | 92 | /ɨ/ /k/ /d/ /m/ /p/ /ʃ/ | 2.46870 4.99382 4.98496 3.26839 4.48592 4.99999 | 0.273932
1875 | Făt-Frumos din tei | 92 | /ɨ/ /j/ /k/ /p/ /ʃ/ | 2.46870 4.56485 3.75689 4.48592 4.99999 | 0.22040
1870 | Epigonii | 114 | /k/ /tʃ/ /p/ /s/ /ʃ/ /v/ | 4.95225 4.99996 0.28475 4.49828 4.99698 4.99860 | 0.21694
1874 | Împărat şi proletar | 210 | /ɨ/ /k/ /tʃ/ /p/ /s/ /ʃ/ /v/ /z/ | 4.78623 4.99999 4.76277 4.99999 3.18147 4.99998 4.35967 3.09749 | 0.16756
1881 | Scrisoarea III | 285 | /ɨ/ /k/ /tʃ/ /d/ /p/ /ʃ/ /v/ | 4.80557 4.99999 3.16111 4.99732 4.58681 4.99999 4.99849 | 0.114208
1883 | Luceafărul | 392 | /ɨ/ /j/ /k/ /d/ /h/ /p/ /s/ /ʃ/ | 4.59363 4.99538 4.99999 4.99999 4.94273 4.99950 3.81825 4.99999 | 0.09783

Figure 2.3.1. Skinner alliteration in 46 poems by Eminescu

Comparing the mean alliteration of poems (last column) with the number of lines, it can easily be seen that the longer the poem, the weaker the weight of alliteration. The result is presented in Figure 2.3.1. The trend can be expressed by a polynomial, but this is not a good solution: the oscillation is too strong to be captured by a simple function. We do not attempt to fit a function, as we currently lack an a priori hypothesis. We simply consider all poems in which the mean alliteration is greater than 0.5 as alliteratively emphasised, independently of the number of lines. In Table 2.3.1, ten poems fulfil this condition, i.e. 21.74%. Since they are mostly short, we can conclude that Skinner alliteration did not play an important role in Eminescu's poetry. But even if we omit the respective poems, the oscillation is too strong, and the best fitting result is yielded by a decay function (e.g. y = ab/(b+x) or y = a/(1 + bx)) with R² = 0.72.

Another question is the increase or decrease of Skinner alliteration in Eminescu's development. We reorder the poems according to the year of publication, take the means of individual years and obtain the results presented in Table 2.3.2.

Table 2.3.2: Historical development of Skinner alliteration in 46 poems by Eminescu (yearly averages are given in the rows labelled “Average”)

Year | Poem title | No. of verses | Mean Skinner alliteration
1866 | O călărire în zori | 86 | 0.26253
1867 | Ce-ţi doresc eu ţie, dulce Românie | 32 | 0.48886
1867 | La moartea lui Heliade | 48 | 0.32734
1867 | Average | | 0.40810
1869 | Lebăda | 12 | 0.43507
1869 | Junii corupţi | 78 | 0.30417
1869 | Average | | 0.36962
1870 | Epigonii | 114 | 0.21694
1871 | Iubită dulce, o, mă lasă | 56 | 0.33302
1871 | Mortua est! | 70 | 0.28630
1871 | Copii eram noi amândoi | 92 | 0.27393
1871 | Average | | 0.29775
1873 | Adânca mare... | 14 | 0.34688
1873 | Ghazel | 40 | 0.33401
1873 | Dacă treci râul Selenei | 41 | 0.55594
1873 | Floare-albastră | 56 | 0.15773
1873 | Dumnezeu şi om | 56 | 0.26104
1873 | Average | | 0.33112
1874 | O, adevăr sublime... | 44 | 0.35395
1874 | Împărat şi proletar | 210 | 0.16756
1874 | Average | | 0.26076
1875 | Făt-Frumos din tei | 92 | 0.22040
1876 | Lacul | 20 | 0.48308
1878 | Povestea codrului | 52 | 0.20769
1879 | Revedere | 36 | 0.55261
1879 | Atât de fragedă... | 36 | 0.28130
1879 | Despărţire | 38 | 0.32511
1879 | Sonete | 42 | 0.46442
1879 | Rugăciunea unui dac | 46 | 0.52713
1879 | Freamăt de codru | 48 | 0.35655
1879 | Average | | 0.41785
1880 | Dintre sute de catarge | 16 | 0.86312
1880 | O, mamă... | 18 | 0.63874
1880 | Average | | 0.75093
1881 | Scrisoarea III | 285 | 0.11421
1883 | Peste vârfuri | 12 | 0.45181
1883 | Şi dacă... | 12 | 0.77499
1883 | La mijloc de codru... | 13 | 0.75345
1883 | Trecut-au anii | 14 | 0.47213
1883 | Somnoroase păsărele | 16 | 0.7335
1883 | Odă în metru antic | 20 | 0.24992
1883 | Ce te legeni codrule | 25 | 0.6174
1883 | Criticilor mei | 28 | 0.31993
1883 | Cu mâne zilele-ţi adaogi... | 32 | 0.48071
1883 | Mai am un singur dor | 36 | 0.19608
1883 | Pe lângă plopii fără soţ | 44 | 0.3238
1883 | Glossă | 80 | 0.28261
1883 | Luceafărul | 392 | 0.09783
1883 | Average | | 0.44263
1885 | Sara pe deal | 24 | 0.39011
1886 | La steaua | 16 | 0.49515
1887 | De ce nu-mi vii | 24 | 0.57679
1887 | Venere şi Madonă | 48 | 0.26996
1887 | Povestea teiului | 88 | 0.28641
1887 | Average | | 0.37772
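The yearly averages and the share of alliteratively emphasised poems can be recomputed from the mean values of Table 2.3.2 with a short Python sketch. The names `MEANS_BY_YEAR`, `yearly_averages` and `share_above` are our own; the threshold 0.5 is the one used in the text.

```python
from statistics import mean

# Mean Skinner alliteration per poem, grouped by year (values of Table 2.3.2).
MEANS_BY_YEAR = {
    1866: [0.26253],
    1867: [0.48886, 0.32734],
    1869: [0.43507, 0.30417],
    1870: [0.21694],
    1871: [0.33302, 0.28630, 0.27393],
    1873: [0.34688, 0.33401, 0.55594, 0.15773, 0.26104],
    1874: [0.35395, 0.16756],
    1875: [0.22040],
    1876: [0.48308],
    1878: [0.20769],
    1879: [0.55261, 0.28130, 0.32511, 0.46442, 0.52713, 0.35655],
    1880: [0.86312, 0.63874],
    1881: [0.11421],
    1883: [0.45181, 0.77499, 0.75345, 0.47213, 0.7335, 0.24992, 0.6174,
           0.31993, 0.48071, 0.19608, 0.3238, 0.28261, 0.09783],
    1885: [0.39011],
    1886: [0.49515],
    1887: [0.57679, 0.26996, 0.28641],
}


def yearly_averages(by_year):
    """Return {year: mean of the poem values of that year}."""
    return {year: mean(values) for year, values in sorted(by_year.items())}


def share_above(by_year, threshold=0.5):
    """Return (number of poems above threshold, number of poems, percentage)."""
    values = [v for year_values in by_year.values() for v in year_values]
    hits = sum(v > threshold for v in values)
    return hits, len(values), 100 * hits / len(values)


if __name__ == "__main__":
    for year, avg in yearly_averages(MEANS_BY_YEAR).items():
        print(year, round(avg, 5))
    print(share_above(MEANS_BY_YEAR))
```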

We do not observe any trend in the data; the course of the values is horizontal. We can conclude that Skinner alliteration does not play any important role in Eminescu's poems. Consider now the individual phonemes and their alliterative weight. The results presented in Table 2.3.3 are obtained by counting the number of poems in which a given phoneme has significant alliterative weight. As can be seen, vowels play a secondary role. The euphonic effects are brought about mostly by fricatives; some phonemes are not used at all in this role. Let us now consider the Beowulf alliteration, i.e. the repetition of the same phonemes at the beginning of words in the verse. The computation is analogous; the formulas can be found in previous chapters. The results for individual poems are presented in Table 2.3.4.

Table 2.3.3: Number of poems in which the given phoneme is significantly Skinner alliterative

Phoneme | No. of poems
/ʃ/ | 32
/k/ | 29
/d/ | 21
/p/ | 19
/s/ | 16
/tʃ/ | 14
/v/ | 11
/ɨ/ | 10
/j/ | 8
/m/ | 6
/f/ | 3
/l/ | 3
/o/ | 3
/a/ | 2
/t/ | 2
/z/ | 2
/b/ | 1
/h/ | 1

Figure 2.3.2. Beowulf alliteration in 46 poems by Eminescu

Table 2.3.4: Beowulf alliteration in poems by Eminescu

Year | Poem title | No. of verses | Mean Beowulf alliteration
1869 | Lebăda | 12 | 0.05347
1883 | Peste vârfuri | 12 | 0.46448
1883 | Şi dacă... | 12 | 0.14888
1883 | La mijloc de codru | 13 | 0.55811
1873 | Adânca mare... | 14 | 0.29776
1883 | Trecut-au anii | 14 | 0.24343
1883 | Somnoroase păsărele | 16 | 0.09431
1886 | La steaua | 16 | 0.13558
1880 | Dintre sute de catarge | 16 | 0.71598
1880 | O, mamă... | 18 | 0.39431
1876 | Lacul | 20 | 0.46970
1883 | Odă în metru antic | 20 | 0.30460
1887 | De ce nu-mi vii | 24 | 0.55745
1885 | Sara pe deal | 24 | 0.27876
1883 | Ce te legeni codrule | 25 | 0.14388
1883 | Criticilor mei | 28 | 0.31361
1883 | Cu mâne zilele-ţi adaogi... | 32 | 0.22782
1867 | Ce-ţi doresc eu ţie, dulce Românie | 32 | 0.54427
1879 | Revedere | 36 | 0.38388
1879 | Atât de fragedă... | 36 | 0.33618
1883 | Mai am un singur dor | 36 | 0.31839
1879 | Despărţire | 38 | 0.40529
1873 | Ghazel | 40 | 0.35619
1873 | Dacă treci râul Selenei | 41 | 0.30746
1879 | Sonete | 42 | 0.28151
1883 | Pe lângă plopii fără soţ | 44 | 0.37434
1874 | O, adevăr sublime... | 44 | 0.29885
1879 | Rugăciunea unui dac | 46 | 0.35243
1887 | Venere şi Madonă | 48 | 0.42784
1879 | Freamăt de codru | 48 | 0.28827
1867 | La moartea lui Heliade | 48 | 0.31488
1878 | Povestea codrului | 52 | 0.18022
1871 | Iubită dulce, o, mă lasă | 56 | 0.28434
1873 | Floare-albastră | 56 | 0.32952
1873 | Dumnezeu şi om | 56 | 0.29509
1871 | Mortua est! | 70 | 0.38091
1869 | Junii corupţi | 78 | 0.41006
1883 | Glossă | 80 | 0.49429
1866 | O călărire în zori | 86 | 0.24570
1887 | Povestea teiului | 88 | 0.37388
1871 | Copii eram noi amândoi | 92 | 0.32993
1875 | Făt-Frumos din tei | 92 | 0.40153
1870 | Epigonii | 114 | 0.38133
1874 | Împărat şi proletar | 210 | 0.34650
1881 | Scrisoarea III | 285 | 0.38960
1883 | Luceafărul | 392 | 0.31057

According to Figure 2.3.2, there is no significant tendency in the dependence of Beowulf alliteration on poem length. Even if we take means for individual poem lengths, this impression remains. However, it may be due to the insufficient representation of some poem lengths. The same picture is obtained by ordering the poems according to the year of origin: there is no noteworthy trend. Even negative results are important: we may state that alliteration of whatever kind did not play any important role in Eminescu's poetry.

2.4 Aggregation

The starting point is the Skinner effect, according to which “the appearance of a sound in speech raises the probability of occurrence of that sound for some time thereafter” (Skinner 1939). It may therefore be supposed that verses close to one another are phonetically more similar than verses further apart. This effect is not necessarily based on semantics; it is rather a kind of mechanical self-stimulation. It can be connected with the generally accepted view of neuronal activity, which is maintained over a certain time span and may thus repeatedly evoke the same or a similar stimulus in the brain of the writer. In this way, sounds, words, or other units produced on a neural trigger have an increased probability of being repeated as long as the activity has not fully expired. We call this phenomenon aggregation.

The distance d is measured simply as the number of “steps” between two verses. Poetic texts are the best material for studying this phenomenon because their sections (verses) are of almost equal length and can be compared directly. We can measure the probability that a phonic element appears at a certain distance d after its previous appearance, although we must operate with conditional probabilities. We cannot measure either the degree of spontaneity or the extent of self-stimulation, but we can at least assume the existence of a mechanism controlling this phenomenon. The rest must be left to neurologists.

Even if the hypothesis is reasonable, the outcome may be disappointing, because the author of a written text can later exchange words in several places and thereby disturb the tendency. As a matter of fact, only spoken language can be fully spontaneous, but with luck we find traces of this trend in poetry too. Hence the absence of the aggregation trend is no proof of the absence of spontaneity. Material in which this trend can be expected with some confidence is the epic spontaneously narrated by oriental narrators, who compose it anew each time (cf. Altmann 1968). The situation is different in music, where improvisation is a spontaneous sequence of harmonies, and even those occasionally “stolen” from known compositions are inserted because the foregoing sequences simply lead to them.

Research in this domain has barely begun and opens up an interdisciplinary field. One can compare sounds, phonemes, n-grams, syllables, morphemes, places of accent, feet, word lengths, clause lengths, etc. Here we shall adhere to the method proposed in Altmann (1968) (cf. also Wimmer et al. 2003: 72ff) and construct the data as follows. We stay on the level of phonemes, and each verse of a poem is transcribed phonemically. For convenience we provide below the phonemic transcription of the poem Lacul.

| Verse | Lacul | Phonemic transcription |
|---|---|---|
| 1 | Lacul codrilor albastru | /l/a/k/u/l/ /k/o/d/r/i/l/o/r/ /a/l/b/a/s/t/r/u/ |
| 2 | Nuferi galbeni îl încarcă | /n/u/f/e/r/i/ /g/a/l/b/e/n/i/ /ɨ/l/ /ɨ/n/k/a/r/k/ə/ |
| 3 | Tresărind în cercuri albe | /t/r/e/s/ə/r/i/n/d/ /ɨ/n/ /t∫/e/r/k/u/r/i/ /a/l/b/e/ |
| 4 | El cutremură o barcă | /j/e/l/ /k/u/t/r/e/m/u/r/ə/ /o/ /b/a/r/k/ə/ |
| 5 | Şi eu trec de-a lung de maluri | /∫/i/ /j/e/w/ /t/r/e/k/ /d/e̯/a/ /l/u/n/g/ /d/e/ /m/a/l/u/r/i/ |
| 6 | Parc-ascult şi parc-aştept | /p/a/r/k/a/s/k/u/l/t/ /∫/i/ /p/a/r/k/a/∫/t/e/p/t/ |
| 7 | Ea din trestii să răsară | /j/a/ /d/i/n/ /t/r/e/s/t/i/j/ /s/ə/ /r/ə/s/a/r/ə/ |
| 8 | Şi să-mi cadă lin pe piept | /∫/i/ /s/ə/m/i/ /k/a/d/ə/ /l/i/n/ /p/e/ /p/j/e/p/t/ |
| 9 | Să sărim în luntrea mică | /s/ə/ /s/ə/r/i/m/ /ɨ/n/ /l/u/n/t/r/e̯/a/ /m/i/k/ə/ |
| 10 | Îngânaţi de glas de ape, | /ɨ/n/g/ɨ/n/a/ts/i/ /d/e/ /g/l/a/s/ /d/e/ /a/p/e/ |
| 11 | Şi să scap din mână cârma, | /∫/i/ /s/ə/ /s/k/a/p/ /d/i/n/ /m/ɨ/n/ə/ /k/ɨ/r/m/a/ |
| 12 | Şi lopeţile să-mi scape; | /∫/i/ /l/o/p/e/ts/i/l/e/ /s/ə/m/i/ /s/k/a/p/e/ |
| 13 | Să plutim cuprinşi de farmec | /s/ə/ /p/l/u/t/i/m/ /k/u/p/r/i/n/∫/i/ /d/e/ /f/a/r/m/e/k/ |
| 14 | Sub lumina blândei lune – | /s/u/b/ /l/u/m/i/n/a/ /b/l/ɨ/n/d/e/j/ /l/u/n/e/ |
| 15 | Vântu-n trestii lin foşnească, | /v/ɨ/n/t/u/n/ /t/r/e/s/t/i/j/ /l/i/n/ /f/o/∫/n/e̯/a/s/k/ə/ |
| 16 | Unduioasa apă sune! | /u/n/d/u/j/o̯/a/s/a/ /a/p/ə/ /s/u/n/e/ |
| 17 | Dar nu vine... Singuratic | /d/a/r/ /n/u/ /v/i/n/e/ /s/i/n/g/u/r/a/t/i/k/ |
| 18 | În zadar suspin şi sufăr | /ɨ/n/ /z/a/d/a/r/ /s/u/s/p/i/n/ /∫/i/ /s/u/f/ə/r/ |
| 19 | Lângă lacul cel albastru | /l/ɨ/n/g/ə/ /l/a/k/u/l/ /t∫/e/l/ /a/l/b/a/s/t/r/u/ |
| 20 | Încărcat cu flori de nufăr | /ɨ/n/k/ə/r/k/a/t/ /k/u/ /f/l/o/r/i/ /d/e/ /n/u/f/ə/r/ |

We set up a vector whose elements are the 34 Romanian phonemes

PV = {/a/, /ə/, /ɨ/, /e/, /e̯/, /i/, /i/, /j/, /o/, /o̯/, /u/, /w/, /b/, /k/, /t∫/, /k’/, /d/, /f/, /g/, /dʒ/, /g’/, /h/, /ʒ/, /l/, /m/, /n/, /p/, /r/, /s/, /∫/, /t/, /ts/, /v/, /z/}

and replace the phonemes by their frequencies in the given verse. Analyzing the whole poem consisting of 20 verses we obtain the following phoneme vectors:

PV1 = 〈3,0,0,0,0,1,0,0,2,0,2,0,1,2,0,0,1,0,0,0,0,0,0,4,0,0,0,3,1,0,1,0,0,0〉
PV2 = 〈2,1,2,2,0,0,2,0,0,0,1,0,1,2,0,0,0,1,1,0,0,0,0,2,0,3,0,2,0,0,0,0,0,0〉
PV3 = 〈1,1,1,3,0,1,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,0,0,1,0,2,0,4,1,0,1,0,0,0〉
PV4 = 〈1,2,0,2,0,0,0,1,1,0,2,0,1,2,0,0,0,0,0,0,0,0,0,1,1,0,0,3,0,0,1,0,0,0〉
PV5 = 〈2,0,0,3,1,1,1,1,0,0,2,1,0,1,0,0,2,0,1,0,0,0,0,2,1,1,0,2,0,1,1,0,0,0〉
PV6 = 〈4,0,0,1,0,1,0,0,0,0,1,0,0,3,0,0,0,0,0,0,0,0,0,1,0,0,3,2,1,2,3,0,0,0〉
PV7 = 〈2,3,0,1,0,2,0,2,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,3,3,0,2,0,0,0〉
PV8 = 〈1,2,0,2,0,3,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,1,1,3,0,1,1,1,0,0,0〉
PV9 = 〈1,3,1,0,1,2,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,2,2,0,2,2,0,1,0,0,0〉
PV10 = 〈3,0,2,3,0,0,1,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,1,0,2,1,0,1,0,0,1,0,0〉
PV11 = 〈2,2,2,0,0,2,0,0,0,0,0,0,0,2,0,0,1,0,0,0,0,0,0,0,2,2,1,1,2,1,0,0,0,0〉
PV12 = 〈1,1,0,3,0,2,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,2,1,0,2,0,2,1,0,1,0,0〉
PV13 = 〈1,1,0,2,0,2,1,0,0,0,2,0,0,2,0,0,1,1,0,0,0,0,0,1,2,1,2,2,1,1,1,0,0,0〉
PV14 = 〈1,0,1,2,0,1,0,1,0,0,3,0,2,0,0,0,1,0,0,0,0,0,0,3,1,3,0,0,1,0,0,0,0,0〉
PV15 = 〈1,1,1,1,1,2,0,1,1,0,1,0,0,1,0,0,0,1,0,0,0,0,0,1,0,4,0,1,2,1,3,0,1,0〉
PV16 = 〈3,1,0,1,0,0,0,1,0,1,3,0,0,0,0,0,1,0,0,0,0,0,0,0,0,2,1,0,2,0,0,0,0,0〉
PV17 = 〈2,0,0,1,0,3,0,0,0,0,2,0,0,1,0,0,1,0,1,0,0,0,0,0,0,3,0,2,1,0,1,0,1,0〉
PV18 = 〈2,1,1,0,0,2,0,0,0,0,2,0,0,0,0,0,1,1,0,0,0,0,0,0,0,2,1,2,3,1,0,0,0,1〉
PV19 = 〈3,1,1,1,0,0,0,0,0,0,2,0,1,1,1,0,0,0,1,0,0,0,0,5,0,1,0,1,1,0,1,0,0,0〉
PV20 = 〈1,2,1,1,0,0,1,0,1,0,2,0,0,3,0,0,1,2,0,0,0,0,0,1,0,2,0,3,0,0,1,0,0,0〉

We shall call the components of one vector x_i, those of another y_i. In order to state the phonemic similarity of two verses we use the cosine indicator, without normalising the values, given as

$$\cos(PV_1, PV_2) = \frac{x_1 y_1 + x_2 y_2 + \dots + x_k y_k}{\sqrt{x_1^2 + x_2^2 + \dots + x_k^2}\,\sqrt{y_1^2 + y_2^2 + \dots + y_k^2}} = \frac{\sum_{i=1}^{k} x_i y_i}{\sqrt{\sum_{i=1}^{k} x_i^2}\,\sqrt{\sum_{i=1}^{k} y_i^2}} \tag{2.4.1}$$

where k = 34 represents the number of Romanian phonemes. From (2.4.1) we obtain the angle in radians using

$$\tau_{ij} = \arccos(\cos(PV_i, PV_j)). \tag{2.4.2}$$
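The construction of a phoneme vector described above can be sketched in Python as follows. This is only an illustration, not the authors' procedure: the function name `phoneme_vector` is hypothetical, and because the inventory in the text lists /i/ twice, the second occurrence is given the placeholder label `i_2` purely so that every position has a unique name.

```python
from collections import Counter

# Phoneme inventory in the order used for the vectors in the text.
# Assumption: the source prints /i/ twice; the second slot is labelled
# "i_2" here only to keep the 34 positions distinct.
INVENTORY = ["a", "ə", "ɨ", "e", "e̯", "i", "i_2", "j", "o", "o̯", "u", "w",
             "b", "k", "t∫", "k’", "d", "f", "g", "dʒ", "g’", "h", "ʒ", "l",
             "m", "n", "p", "r", "s", "∫", "t", "ts", "v", "z"]


def phoneme_vector(transcription):
    """Count each inventory phoneme in a slash-delimited transcription."""
    # Splitting on "/" yields the phonemes plus empty strings and blanks
    # (word boundaries); the latter are dropped.
    tokens = [t.strip() for t in transcription.split("/") if t.strip()]
    counts = Counter(tokens)
    return [counts.get(p, 0) for p in INVENTORY]


# Verse 1 of Lacul as transcribed in the text.
verse_1 = "/l/a/k/u/l/ /k/o/d/r/i/l/o/r/ /a/l/b/a/s/t/r/u/"
pv1 = phoneme_vector(verse_1)
print(pv1)
```

Applied to the first verse, the sketch reproduces the vector PV1 given in the text.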

The greater τ is, the greater is the phonic dissimilarity of two verses. If τ = 0, the similarity is perfect. Various modifications are possible, but we take the result as it is. We compute the mean (dis)similarity of verses at a certain distance d. Taking d = 1, we evaluate (2.4.2) for all immediately neighbouring verses, i.e. (1,2), (2,3), …, (19,20). For the first and the second verse we obtain (leaving out pairs in which one of the elements is 0):

$$\cos(PV_1, PV_2) = \frac{3(2) + 2(1) + 1(1) + 2(2) + 4(2) + 3(2)}{\sqrt{4^2 + 2(3)^2 + 3(2)^2 + 5(1)^2}\,\sqrt{3^2 + 7(2)^2 + 5(1)^2}} = \frac{27}{7.1414\,(6.4807)} = 0.583383$$

From this, τ12 = arccos(0.583383) = 0.947908. After computing τ for all neighbouring verses, we obtain the mean τ(1) as the sum of all τi,i+1 divided by 19, since there are 19 neighbouring pairs. Thus the mean τ for distance 1 is

τ(1) = (0.947908 + 0.666946 + 0.701674 + 0.797118 + 0.937744 + 1.032968 + 0.970128 + 0.913333 + 1.244796 + 1.039012 + 0.986940 + 0.746283 + 0.957312 + 0.917633 + 0.998581 + 0.877128 + 0.696339 + 1.101363 + 0.987134) / 19 = 0.922123.

In order to obtain directly an expression for similarity (there are dozens of different formulas), we simply take

$$S = 1 - \frac{2\tau(d)}{\pi} \tag{2.4.3}$$

which attains values in the interval [0, 1], where 0 is the smallest and 1 the greatest similarity, and present the results in Table 2.4.1.
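The computation for the first two verses and for distance 1 can be retraced with a short Python sketch of formulas (2.4.1) to (2.4.3). The function names are illustrative assumptions; the vectors and the 19 values of τ are copied from the text.

```python
import math


def cosine(x, y):
    """Cosine indicator (2.4.1) of two phoneme-frequency vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def angle(x, y):
    """Angle tau in radians (2.4.2); clamped against rounding above 1."""
    return math.acos(max(-1.0, min(1.0, cosine(x, y))))


def similarity(tau):
    """Similarity S = 1 - 2*tau/pi (2.4.3)."""
    return 1 - 2 * tau / math.pi


PV1 = [3,0,0,0,0,1,0,0,2,0,2,0,1,2,0,0,1,0,0,0,0,0,0,4,0,0,0,3,1,0,1,0,0,0]
PV2 = [2,1,2,2,0,0,2,0,0,0,1,0,1,2,0,0,0,1,1,0,0,0,0,2,0,3,0,2,0,0,0,0,0,0]

cos12 = cosine(PV1, PV2)
tau12 = angle(PV1, PV2)

# The 19 angles between neighbouring verses of Lacul, as listed in the text.
taus_d1 = [0.947908, 0.666946, 0.701674, 0.797118, 0.937744, 1.032968,
           0.970128, 0.913333, 1.244796, 1.039012, 0.986940, 0.746283,
           0.957312, 0.917633, 0.998581, 0.877128, 0.696339, 1.101363,
           0.987134]
tau_1 = sum(taus_d1) / len(taus_d1)
s_1 = similarity(tau_1)

print(round(cos12, 6), round(tau12, 6), round(tau_1, 6), round(s_1, 6))
```

Because phoneme frequencies are never negative, the cosine lies between 0 and 1, so τ lies between 0 and π/2 and S stays within [0, 1].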

The course of similarity in dependence on distance is presented graphically in Figure 2.4.1.

Figure 2.4.1. The development of mean phonemic similarity with increasing distance in Lacul

Table 2.4.1: Mean phonic similarities of verses of Lacul at distance d

| Distance d | Mean dissimilarity τ(d) | Similarity S = 1 − 2τ(d)/π |
|---|---|---|
| 1 | 0.922123 | 0.412958 |
| 2 | 0.880339 | 0.439559 |
| 3 | 0.980918 | 0.375528 |
| 4 | 0.953367 | 0.393068 |
| 5 | 0.951558 | 0.394220 |
| 6 | 1.020575 | 0.350282 |
| 7 | 0.983388 | 0.373956 |
| 8 | 1.012124 | 0.355662 |
| 9 | 0.956406 | 0.391133 |
| 10 | 1.014036 | 0.354445 |
| 11 | 0.978028 | 0.377368 |
| 12 | 0.991043 | 0.369082 |
| 13 | 1.012168 | 0.355634 |
| 14 | 0.987895 | 0.371086 |
| 15 | 0.953101 | 0.393237 |
| 16 | 0.908692 | 0.421509 |
| 17 | 0.841329 | 0.464393 |
| 18 | 0.618277 | 0.606393 |
| 19 | 0.865573 | 0.448959 |
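The way a table of this kind is produced for all distances can be sketched as below. This is a minimal illustration under assumptions: `mean_similarity_by_distance` is a hypothetical name, and only the first three phoneme vectors of Lacul are used to keep the example short. The number of verse pairs at distance d is n − d, which is why the means at large distances rest on very few observations.

```python
import math


def angle(x, y):
    """Angle in radians between two frequency vectors, as in (2.4.2)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def mean_similarity_by_distance(vectors):
    """Map d to (number of pairs, mean tau(d), S = 1 - 2*tau(d)/pi)."""
    n = len(vectors)
    result = {}
    for d in range(1, n):
        # All pairs of verses that are exactly d steps apart.
        taus = [angle(vectors[i], vectors[i + d]) for i in range(n - d)]
        mean_tau = sum(taus) / len(taus)
        result[d] = (len(taus), mean_tau, 1 - 2 * mean_tau / math.pi)
    return result


# First three verses of Lacul (PV1 to PV3 from the text).
vectors = [
    [3,0,0,0,0,1,0,0,2,0,2,0,1,2,0,0,1,0,0,0,0,0,0,4,0,0,0,3,1,0,1,0,0,0],
    [2,1,2,2,0,0,2,0,0,0,1,0,1,2,0,0,0,1,1,0,0,0,0,2,0,3,0,2,0,0,0,0,0,0],
    [1,1,1,3,0,1,1,0,0,0,1,0,1,1,1,0,1,0,0,0,0,0,0,1,0,2,0,4,1,0,1,0,0,0],
]
table = mean_similarity_by_distance(vectors)
for d, (pairs, tau, s) in table.items():
    print(d, pairs, round(tau, 6), round(s, 6))
```

With all 20 vectors of the poem as input, the same loop would yield 19 rows, one per distance.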

In spontaneously created poetry, such as folklore, we expect a decrease of mean similarity. However, folklore and artistic poetry may differ drastically. In folklore we frequently have the spontaneous re-creation of a known story, though learning very long works by heart was nothing exceptional (e.g. in India). The stories were told in one go and shortened or prolonged according to the interest of the audience. In artistic poetry there may be strivings for different effects, e.g. echoes at the end of a long poem or memories of some previous states, repetitions of a given rhyme, phonetic allusions, etc.

In Lacul one sees rather a convex curve which makes a jump at its end. Since there are only 20 verses, the largest distances are measured in only a few cases, hence the values there are not reliable means. However, one can see that the first verse (Lacul codrilor albastru = The blue lake of the woods) and the 19th (Lângă lacul cel albastru = Nearby the blue lake) are more similar because the latter plays the role of a refrain. If we accept only mean values based on at least ten observations, a decreasing trend can be observed, but its significance cannot be shown. Evidently one must scrutinise longer texts and take care of refrains. In Table 2.4.2 we present the poem Glossă with its English translation by A. G. Sahlean (cf. http://users.rcn.com/luceafarul/translators_gl.html, 02.07.2012), indicating the “echoed” verses with the same number.

Table 2.4.2: Echoes in the poem Glossă

| Glossă | | Glossa |
|---|---|---|
| Vremea trece, vremea vine, | 1 | Time goes by, time comes along, |
| Toate-s vechi şi nouă toate; | 2 | All is old and all is new; |
| Ce e rău şi ce e bine | 3 | What is right and what is wrong, |
| Tu te-ntreabă şi socoate; | 4 | You must think and ask of you; |
| Nu spera şi nu ai teamă, | 5 | Have no hope and have no fear, |
| Ce e val ca valul trece; | 6 | Waves that rise can never hold; |
| De te-ndeamnă, de te cheamă, | 7 | If they urge or if they cheer, |
| Tu rămâi la toate rece. | 8 | You remain aloof and cold. |
| Multe trec pe dinainte, | | To our sight a lot will glisten, |
| În auz ne sună multe, | | Many sounds will reach our ear; |
| Cine ţine toate minte | | Who could take the time to listen |
| Şi ar sta să le asculte ?... | | And remember all we hear? |
| Tu aşează-te deoparte, | | Keep aside from all that patter, |
| Regăsindu-te pe tine, | | Seek yourself, far from the throng |
| Când cu zgomote deşarte | | When with loud and idle clatter |
| Vremea trece, vremea vine. | 1 | Time goes by, time comes along. |
| Nici încline a ei limbă | | Nor forget the tongue of reason |
| Recea cumpăn-a gândirii | | Or its even scales depress |
| Înspre clipa ce se schimbă | | When the moment, changing season, |
| Pentru masca fericirii, | | Wears the mask of happiness - |
| Ce din moartea ei se naşte | | It is born of reason's slumber |
| Şi o clipă ţine poate; | | And may last a wink as true: |
| Pentru cine o cunoaşte | | For the one who knows its number |
| Toate-s vechi şi nouă toate. | 2 | All is old and all is new. |
| Privitor ca la teatru | | Be as to a play, spectator, |
| Tu în lume să te-nchipui; | | As the world unfolds before: |
| Joace unul şi pe patru, | | You will know the heart of matter |
| Totuşi tu ghici-vei chipu-i, | | Should they act two parts or four; |
| Şi de plânge, de se ceartă, | | When they cry or tear asunder |
| Tu în colţ petreci în tine | | From your seat enjoy along |
| Şi-nţelegi din a lor artă | | And you'll learn from art to wonder |
| Ce e rău şi ce e bine. | 3 | What is right and what is wrong. |
| Viitorul şi trecutul | | Past and future, ever blending, |
| Sunt a filei două feţe, | | Are the twin sides of same page: |
| Vede-n capăt începutul | | New start will begin with ending |
| Cine ştie să le-nveţe; | | When you know to learn from age; |
| Tot ce-a fost ori o să fie | | All that was or be tomorrow |
| În prezent le-avem pe toate, | | We have in the present, too; |
| Dar de-a lor zădărnicie | | But what's vain and futile sorrow |
| Tu te-ntreabă şi socoate. | 4 | You must think and ask of you; |
| Căci aceloraşi mijloace | | For the living cannot sever |
| Se supun câte există, | | From the means we've always had: |
| Şi de mii de ani încoace | | Now, as years ago, and ever, |
| Lumea-i veselă şi tristă; | | Men are happy or are sad: |
| Alte măşti, aceeaşi piesă, | | Other masks, same play repeated; |
| Alte guri, aceeaşi gamă, | | Diff'rent tongues, same words to hear; |
| Amăgit atât de-adese | | Of your dreams so often cheated, |
| Nu spera şi nu ai teamă. | 5 | Have no hope and have no fear. |
| Nu spera când vezi mişeii | | Hope not when the villains cluster |
| La izbândă făcând punte, | | By success and glory drawn: |
| Te-or întrece nătărăii, | | Fools with perfect lack of luster |
| De ai fi cu stea în frunte; | | Will outshine Hyperion! |
| Teamă n-ai, căta-vor iarăşi | | Fear it not, they'll push each other |
| Între dânşii să se plece, | | To reach higher in the fold, |
| Nu te prinde lor tovarăş; | | Do not side with them as brother, |
| Ce e val ca valul trece. | 6 | Waves that rise can never hold. |
| Cu un cântec de sirenă, | | Sounds of siren songs call steady |
| Lumea-ntinde lucii mreje; | | Toward golden nets, astray; |
| Ca să schimbe-actorii-n scenă, | | Life attracts you into eddies |
| Te momeşte în vârteje; | | To change actors in the play; |
| Tu pe-alături te strecoară, | | Steal aside from crowd and bustle, |
| Nu băga nici chiar de seamă, | | Do not look, seem not to hear |
| Din cărarea ta afară | | From your path, away from hustle, |
| De te-ndeamnă, de te cheamă. | 7 | If they urge or if they cheer; |
| De te-ating, să feri în laturi, | | If they reach for you, go faster, |
| De hulesc, să taci din gură; | | Hold your tongue when slanders yell; |
| Ce mai vrei cu-a tale sfaturi, | | Your advice they cannot master, |
| Dacă ştii a lor măsură; | | Don't you know their measure well? |
| Zică toţi ce vor să zică, | | Let them talk and let them chatter, |
| Treacă-n lume cine-o trece; | | Let all go past, young and old; |
| Ca să nu-ndrăgeşti nimică, | | Unattached to man or matter, |
| Tu rămâi la toate rece. | 8 | You remain aloof and cold. |
| Tu rămâi la toate rece, | 8 | You remain aloof and cold |
| De te-ndeamnă, de te cheamă; | 7 | If they urge or if they cheer; |
| Ce e val ca valul trece, | 6 | Waves that rise can never hold, |
| Nu spera şi nu ai teamă; | 5 | Have no hope and have no fear; |
| Tu te-ntreabă şi socoate | 4 | You must think and ask of you |
| Ce e rău şi ce e bine; | 3 | What is right and what is wrong; |
| Toate-s vechi şi nouă toate; | 2 | All is old and all is new, |
| Vremea trece, vremea vine. | 1 | Time goes by, time comes along. |

The development of mean phonemic similarity with increasing distance in the poem Glossă is shown in Figure 2.4.2. The echoes are systematically placed; they add similarity at different distances and thereby, in our view, destroy any trace of spontaneity. A quite different course of similarity in dependence on distance is presented in Figure 2.4.3 for the poem Luceafărul. Thus the poetic texts of each author must be scrutinised separately, and the descriptions need not be general. Long poems are seldom written without pauses and, in the end, they may have been corrected many times. In this way some creative processes are glossed over or modified, words are replaced by others, etc., and the phonic elements of the original creation are lost. However, frequently even a posteriori modifications do not destroy everything, and poets may intuitively create classes of poems with common features, in our case common phonetic similarities depending on distance. Analyzing Eminescu's work, we found several types regarding the course of mean similarities with increasing distance. In no case could they be captured by a simple curve; in each case a polynomial of very high order was necessary. Thus characterisation in this way is not fruitful.

Figure 2.4.2. The development of mean phonic similarity with increasing distance in the poem Glossă

Figure 2.4.3. The development of mean phonic similarity with increasing distance in the poem Luceafărul

However, almost all the poems have a common feature: the variance of the similarity increases with increasing distance. In order to capture the state of the similarity, we have chosen a static and a dynamic characterisation of the poems. The static characterisation may be performed by a non-weighted indicator called non-smoothness (cf. Popescu et al. 2010: 95 ff.), which consists in counting the local extremes. A point $x_i$ on the curve is a local maximum if $x_{i-1} < x_i > x_{i+1}$, and a local minimum if $x_{i-1} > x_i < x_{i+1}$. In the last column of Table 2.4.1 we find m = 13 extremes (including the first and the last values). Non-smoothness is defined as

$$NS = \frac{m - 2}{n_d - 2}, \tag{2.4.4}$$

where m is the number of local extremes and $n_d$ is the number of distances in the poem. Since the first and the last value are automatically extremes, they are subtracted both from m and from $n_d$. As can easily be seen, in Lacul we obtain NS(Lacul) = (13 – 2)/(19 – 2) = 0.6471. This indicator is a simple proportion and can easily be treated statistically. However, (2.4.4) does not express the magnitude of the extremes. In order to weight the oscillation, Popescu et al. (2010: 97) proposed the indicator of roughness defined as

$$R = \frac{(m - 2)\,L}{(n_d - 2)(n_d - 1)\sqrt{2}}, \tag{2.4.5}$$

where L is the arc length computed as

$$L = \sum_{i=1}^{n_d - 1} \left[(x_i - x_{i+1})^2 + 1^2\right]^{1/2} \tag{2.4.6}$$

normalised by its maximum, which is given by

$$L_{max} = \sum_{i=1}^{n_d - 1} \left[(0 - 1)^2 + 1^2\right]^{1/2} = (n_d - 1)\sqrt{2}. \tag{2.4.7}$$

Indicator R attains its minimum, 0, if the arc is a straight line, and its maximum, 1, if the values oscillate regularly between 0 and 1. To demonstrate the computation: in Lacul there are $n_d$ = 19 points, of which m = 13 are extremes. The arc length is obtained as

L = [(0.439559 − 0.412958)² + 1]^(1/2) + [(0.375528 − 0.439559)² + 1]^(1/2) + … + [(0.448959 − 0.606393)² + 1]^(1/2) = 18.029689.

Inserting these values in (2.4.5) we obtain

$$R = \frac{(13 - 2)\,18.029689}{(19 - 2)(19 - 1)\sqrt{2}} = 0.458294,$$

i.e., approximately a mean roughness. The values for other poems by Eminescu can be found in Table 2.4.3, and the plot of roughness against poem length in Figure 2.4.4.

Table 2.4.3: Similarity roughness in 46 poems by Eminescu

| Poem title | n verses | Roughness |
|---|---|---|
| Lebăda | 12 | 0.472067 |
| Peste vârfuri | 12 | 0.629308 |
| Şi dacă... | 12 | 0.709386 |
| La mijloc de codru... | 13 | 0.495605 |
| Adânca mare... | 14 | 0.579076 |
| Trecut-au anii | 14 | 0.515271 |
| Somnoroase păsărele... | 16 | 0.599956 |
| La steaua | 16 | 0.381355 |
| Dintre sute de catarge | 16 | 0.655667 |
| O, mamă... | 18 | 0.424618 |
| Lacul | 20 | 0.458294 |
| Odă în metru antic | 20 | 0.499768 |
| De ce nu-mi vii | 24 | 0.439849 |
| Sara pe deal | 24 | 0.539198 |
| Ce te legeni codrule | 25 | 0.482380 |
| Criticilor mei | 28 | 0.425023 |
| Cu mâne zilele-ţi adaogi… | 32 | 0.488748 |
| Ce-ţi doresc eu ţie, dulce Românie | 32 | 0.488421 |
| Revedere | 36 | 0.386118 |
| Atât de fragedă | 36 | 0.493082 |
| Mai am un singur dor | 36 | 0.471823 |
| Despărţire | 38 | 0.383963 |
| Ghazel | 40 | 0.458873 |
| Dacă treci râul Selenei | 41 | 0.409476 |
| Sonete | 42 | 0.526059 |
| Pe lângă plopii fără soţ | 44 | 0.448790 |
| O, adevăr sublime... | 44 | 0.414093 |
| Rugăciunea unui dac | 46 | 0.411280 |
| Venere şi Madonă | 48 | 0.424563 |
| Freamăt de codru | 48 | 0.487451 |
| La moartea lui Heliade | 48 | 0.471841 |
| Povestea codrului | 52 | 0.447547 |
| Iubită dulce, o, mă lasă | 56 | 0.413737 |
| Floare-albastră | 56 | 0.493925 |
| Dumnezeu şi om | 56 | 0.453710 |
| Mortua est! | 70 | 0.432799 |
| Junii corupţi | 78 | 0.462185 |
| Glossă | 80 | 0.516858 |
| O călărire în zori | 86 | 0.528599 |
| Povestea teiului | 88 | 0.407723 |
| Copii eram noi amândoi | 92 | 0.532498 |
| Făt-Frumos din tei | 92 | 0.524711 |
| Epigonii | 114 | 0.484215 |
| Împărat şi proletar | 210 | 0.450940 |
| Scrisoarea III | 285 | 0.421290 |
| Luceafărul | 392 | 0.479955 |
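The roughness value for Lacul can be retraced with the following Python sketch of formulas (2.4.4) to (2.4.6). The function names are illustrative assumptions; the similarities are those of Table 2.4.1.

```python
import math

# Similarities S of Lacul for distances 1 to 19 (Table 2.4.1).
S_LACUL = [0.412958, 0.439559, 0.375528, 0.393068, 0.394220, 0.350282,
           0.373956, 0.355662, 0.391133, 0.354445, 0.377368, 0.369082,
           0.355634, 0.371086, 0.393237, 0.421509, 0.464393, 0.606393,
           0.448959]


def count_extremes(values):
    """Number m of local extremes; first and last value always count."""
    interior = sum(
        1 for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
        or values[i - 1] > values[i] < values[i + 1]
    )
    return interior + 2


def arc_length(values):
    """Arc length L (2.4.6) with unit steps on the distance axis."""
    return sum(math.sqrt((values[i] - values[i + 1]) ** 2 + 1)
               for i in range(len(values) - 1))


def non_smoothness(values):
    """NS = (m - 2) / (n_d - 2), formula (2.4.4)."""
    return (count_extremes(values) - 2) / (len(values) - 2)


def roughness(values):
    """R = (m - 2) L / ((n_d - 2)(n_d - 1) sqrt(2)), formula (2.4.5)."""
    m, nd = count_extremes(values), len(values)
    return (m - 2) * arc_length(values) / ((nd - 2) * (nd - 1) * math.sqrt(2))


print(count_extremes(S_LACUL), round(non_smoothness(S_LACUL), 4),
      round(arc_length(S_LACUL), 6), round(roughness(S_LACUL), 6))
```

Ties between neighbouring values are not treated as extremes here, which follows the strict inequalities in the definition.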

Figure 2.4.4. Poem length and roughness

Now, in order to interpret roughness, we consider the possibility that in spontaneous writing the Skinner effect is active: similarity decreases regularly and slowly, and the arc is short. Hence the smaller the roughness, the greater the spontaneity: the poem was written “fluently”. But the more pauses there are during the writing process, the more echoes are placed (artificially) in the poem, and the more subsequent corrections are made, the greater the roughness will be. Thus roughness discloses something of the creative process, but without the possibility of asking the author we cannot make a definitive statement. In general, poems with smaller than mean roughness were produced with greater spontaneity. As can be seen in Figure 2.4.4, the roughness of the majority of the poems under study lies below 0.5 and testifies to the possibility that Eminescu wrote them “rather” spontaneously. The longer poems approach the mean of 0.5 and suggest that the author made many supplementary changes, or paused repeatedly while writing, or sought a phonic effect, echo, etc., though his original way of writing might have been spontaneous. Only a small number of shorter poems display great roughness, i.e. “construction” difficulties or supplementary changes of the text.

The dynamic characterisation of the increasing oscillation can be performed using the stepwise variance (SV) of the computed similarities, defined as

$$SV = \frac{1}{d - 1} \sum_{i=1}^{d} (S_i - \bar{S}_d)^2 \tag{2.4.8}$$

where $S_i$ are the individual similarities and $\bar{S}_d$ is their mean up to distance d. For illustration, let us consider the poem Lebăda. Taking, e.g., the first three distances, we obtain the mean similarity 0.364403 and the variance 0.001244 up to distance d = 3. Computing all stepwise variances, we obtain the results in the last column of Table 2.4.4, displayed graphically in Figure 2.4.5. In Figure 2.4.5 the numbers in the last column of Table 2.4.4 are multiplied by 1000 for a clearer view.

Table 2.4.4: Stepwise variance of similarities in the poem Lebăda

| Distance d | Similarity S | Stepwise mean | Stepwise variance (SV) |
|---|---|---|---|
| 1 | 0.330972 | 0.330972 | - |
| 2 | 0.401263 | 0.366118 | 0.002470 |
| 3 | 0.360975 | 0.364403 | 0.001244 |
| 4 | 0.325068 | 0.354570 | 0.001216 |
| 5 | 0.337641 | 0.351184 | 0.000969 |
| 6 | 0.377393 | 0.355552 | 0.000890 |
| 7 | 0.373982 | 0.358185 | 0.000790 |
| 8 | 0.379664 | 0.360870 | 0.000735 |
| 9 | 0.339634 | 0.358510 | 0.000693 |
| 10 | 0.423059 | 0.364965 | 0.001033 |
| 11 | 0.522756 | 0.379310 | 0.003193 |

This is, of course, a somewhat smoother sequence than the similarities themselves and can be captured by the beta-function

$$SV = a\,(d - 2)^b\,(M - d)^c \tag{2.4.9}$$

where a, b, c, and M are fitting parameters and d is the independent variable (distance).
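The stepwise means and variances of Lebăda can be recomputed with a few lines of Python. This is a sketch under the assumption that (2.4.8) is the ordinary sample variance of the first d similarities, which agrees with the values quoted in the text; the function name is hypothetical.

```python
import statistics

# Similarities S of Lebăda for distances 1 to 11, as given in the text.
S_LEBADA = [0.330972, 0.401263, 0.360975, 0.325068, 0.337641, 0.377393,
            0.373982, 0.379664, 0.339634, 0.423059, 0.522756]


def stepwise_variance(similarities):
    """Return rows (d, stepwise mean, SV) for d = 1 .. len(similarities)."""
    rows = []
    for d in range(1, len(similarities) + 1):
        prefix = similarities[:d]
        mean = statistics.mean(prefix)
        # The sample variance needs at least two values, so SV is
        # undefined (None) for d = 1.
        var = statistics.variance(prefix) if d > 1 else None
        rows.append((d, mean, var))
    return rows


rows = stepwise_variance(S_LEBADA)
for d, mean, var in rows:
    print(d, round(mean, 6), None if var is None else round(var, 6))
```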

Figure 2.4.5. Beta fitting of U-shaped variance of similarities in Lebăda

Figure 2.4.6. Beta fitting of U-shaped variance of similarities in Luceafărul

Generally, besides its easy linguistic interpretation (cf. Popescu, Čech, Altmann 2011: 103), the beta-function appears to be the most versatile, as further illustrated for Luceafărul (392 verses) in Figure 2.4.6, Glossă (80 verses) in Figure 2.4.7, Freamăt de codru (48 verses) in Figure 2.4.8 and Şi dacă... (12 verses) in Figure 2.4.9. The SV curves are U-shaped, have the form of an inverted sigmoid, or take other monotonic shapes, either ascending or descending. The results of fitting the beta-function to the stepwise variances of similarities for the other poems are presented in Table 2.4.5 below.

Figure 2.4.7. Beta fitting of the variance of similarities in Glossă

Figure 2.4.8. Beta fitting of the variance of similarities in Freamăt de codru

A small number of poems cannot be satisfactorily fitted by the beta-function. We suppose that specific boundary conditions are at work which are not yet known and should be incorporated in the theory later on. In some cases the parameter a takes on enormous values, which have to be compensated by the other parameters. Parameter M cannot always be explained empirically either, though theoretically it is the maximum of the independent variable d. Up to now there is no satisfactory theory concerning the interplay of forces in texts. However, the results give an impetus for both literary and psycholinguistic investigations.
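How the fitted beta-function behaves can be checked by evaluating formula (2.4.9) directly. The sketch below uses the parameters reported for Lebăda in Table 2.4.5 and compares the result with the observed stepwise variance (multiplied by 1000) at two distances. The function name `beta_sv` is an assumption, and no fitting is performed here. Since b and c are negative for this poem, the expression is only defined for 2 < d < M, so the evaluation starts at d = 3.

```python
def beta_sv(d, a, b, c, M):
    """Beta-function (2.4.9): SV = a * (d - 2)**b * (M - d)**c."""
    return a * (d - 2) ** b * (M - d) ** c


# Parameters for Lebăda as listed in Table 2.4.5.
LEBADA = {"a": 2.3493, "b": -0.4306, "c": -0.2654, "M": 11.0089}

# Observed stepwise variances of Lebăda, multiplied by 1000.
observed_x1000 = {3: 1.244, 11: 3.193}

for d, observed in observed_x1000.items():
    fitted = beta_sv(d, **LEBADA)
    print(d, observed, round(fitted, 3))
```

The factor (M − d)^c with negative c rises steeply as d approaches M, which is what produces the right-hand branch of the U-shape.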

Figure 2.4.9. Beta fitting of the variance of similarities in Şi dacă…

Table 2.4.5: Fitting of the beta-function (2.4.9) to stepwise variances of similarities (multiplied by 1000)

| Poem title | n verses | a | b | c | M | R² |
|---|---|---|---|---|---|---|
| Lebăda | 12 | 2.3493 | -0.4306 | -0.2654 | 11.0089 | 0.98 |
| Peste vârfuri | 12 | 0.1169 | 1.0894 | -0.0655 | 11.0003 | 0.93 |
| Şi dacă... | 12 | 12.3976 | -0.5407 | -0.3606 | 16.1572 | 0.94 |
| La mijloc de codru... | 13 | 1.2474 | 0.0789 | 0.0071 | 12.0000 | 0.86 |
| Adânca mare... | 14 | 1.2745 | -7.1E-8 | -0.2891 | 14.2913 | 0.57 |
| Trecut-au anii | 14 | 0.4839 | 0.2059 | -0.2675 | 13.0083 | 0.99 |
| Somnoroase păsărele... | 16 | 5571.203 | -0.6197 | -2.7514 | 23.7679 | 0.63 |
| La steaua | 16 | 1.3119 | -2E-9 | -0.4255 | 15.1716 | 0.83 |
| Dintre sute de catarge | 16 | 1.6076 | 0.2446 | 0.2499 | 15.0000 | 0.44 |
| O, mamă... | 18 | 1.0765 | -1E-13 | -0.2986 | 19.4406 | 0.52 |
| Lacul | 20 | 83.3140 | -0.6644 | -1.4705 | 21.2663 | 0.78 |
| Odă în metru antic | 20 | 1.7370 | -1.8E-9 | -0.6308 | 20.3687 | 0.87 |
| De ce nu-mi vii | 24 | 11.6359 | 3.6506 | -2.6821 | 107.8785 | 0.85 |
| Sara pe deal | 24 | 12.2982 | 0.2102 | -1.0728 | 47.8712 | 0.89 |
| Ce te legeni codrule | 25 | 44.6701 | 1.5114 | -2.2456 | 60.7189 | 0.89 |
| Criticilor mei | 28 | 132.7336 | 0.00017 | -1.8380 | 35.7149 | 0.60 |
| Cu mâne zilele-ţi adaogi | 32 | 48.3140 | 2.7223 | -1.9622 | 444.8317 | 0.66 |
| Ce-ţi doresc eu ţie, dulce Românie | 32 | 0.1020 | 0.7227 | -0.1598 | 31.0098 | 0.93 |
| Revedere | 36 | 1.5975 | -0.2331 | -0.3290 | 35.0139 | 0.98 |
| Atât de fragedă... | 36 | 3.2043 | 0.1718 | -0.6844 | 69.1686 | 0.75 |
| Mai am un singur dor | 36 | 2.4047 | -0.2276 | -0.5714 | 36.1521 | 0.92 |
| Despărţire | 38 | 6.0967 | -0.4217 | -0.7629 | 40.5750 | 0.82 |
| Ghazel | 40 | 1.3139 | -0.0355 | -0.4976 | 42.7757 | 0.91 |
| Dacă treci râul Selenei | 41 | 7.8334 | -0.3357 | -1.1493 | 45.7117 | 0.95 |
| Sonete | 42 | 0.0855 | 0.5007 | -0.2268 | 41.0493 | 0.97 |
| Pe lângă plopii fără soţ | 44 | 1.8410 | 0.0475 | -0.6434 | 47.2725 | 0.90 |
| O, adevăr sublime... | 44 | 10.5413 | 7.48E-8 | -1.1991 | 52.7014 | 0.93 |
| Rugăciunea unui dac | 46 | 0.7278 | -0.2065 | -0.2231 | 45.1293 | 0.77 |
| Venere şi Madonă | 48 | 0.4995 | 0.1174 | -0.5464 | 48.4942 | 0.92 |
| Freamăt de codru | 48 | 0.5710 | 0.5304 | -0.8757 | 52.7284 | 0.97 |
| La moartea lui Heliade | 48 | 0.4784 | 0.0620 | -0.2752 | 47.0190 | 0.95 |
| Povestea codrului | 52 | 230.9531 | -0.3735 | -1.3829 | 73.4993 | 0.88 |
| Iubită dulce, o, mă lasă | 56 | 197.7745 | -0.2191 | -1.7486 | 76.1385 | 0.91 |
| Floare-albastră | 56 | 0.9176 | -0.1208 | -0.4055 | 55.7411 | 0.91 |
| Dumnezeu şi om | 56 | 0.1257 | 0.2524 | -0.3413 | 56.1415 | 0.90 |
| Mortua est! | 70 | 10.1021 | -0.3649 | -0.8539 | 73.7914 | 0.90 |
| Junii corupţi | 78 | 1.7058 | -0.1296 | -0.5118 | 81.6006 | 0.94 |
| Glossă | 80 | 22.1707 | -0.4299 | -0.9037 | 79.4903 | 0.99 |
| O călărire în zori | 86 | 2.4177 | -0.1933 | -0.4362 | 86.7347 | 0.90 |
| Povestea teiului | 88 | 1.2982 | 0.1839 | -0.7542 | 102.6669 | 0.95 |
| Copii eram noi amândoi | 92 | 243.4634 | -0.3241 | -1.3718 | 125.6920 | 0.81 |
| Făt-Frumos din tei | 92 | 5.8398 | -0.2242 | -0.8559 | 95.7490 | 0.97 |
| Epigonii | 114 | 2.1441 | -0.2022 | -0.7339 | 122.7160 | 0.92 |
| Împărat şi proletar | 210 | 0.6965 | -0.2423 | -0.4111 | 211.0245 | 0.93 |
| Scrisoarea III | 285 | 5.1373 | -0.3903 | -0.6547 | 289.804 | 0.96 |
| Luceafărul | 392 | 0.9638 | -0.2562 | -0.3700 | 391.8325 | 0.94 |

2.5 Rhyme

The present section is devoted to properties of rhyme words, i.e. word-forms at the end of the verse/line with a phonic counterpart at the end of some further verse/line. For our purposes, we operationalise the concept of word-form in a simplistic, graphical way (cf. Chapter 3.2, Frequency distribution), as is common in computational and corpus linguistics: a word-form is a string of characters separated from other strings by white spaces, punctuation marks or the end of a verse/line (apostrophes, hyphens etc. are not considered as characters and are removed from the strings).

If rhyme is present in a poem, it fulfils its main euphonic function. Since the number of possible rhymes in a language is finite, as can be shown by a combinatorial argument, the frequent use of the same two rhyme words together wears the rhyme out. It brings little surprise; on the contrary, it fulfils a kind of expectation, and in the course of time it loses its effectiveness. Besides, not all words with a rhyming counterpart can be used within the same strophe, because the respective words do not possess a matching semantic association or at least a kind of semantic contiguity. Generally, one and the same poet cannot afford to repeat the same rhyme in his further poems. Poetic originality is not only a matter of ideas but also a matter of form, as distinct from scientific, journalistic or political texts, in which the same matter can be steadily continued without any formal restrictions. Thus the rhyming technique of a poet must change in the course of time, and this change may concern any of the many properties both of rhyme (cf. e.g. http://en.wikipedia.org/wiki/Rhyme, 02.7.2012) and of the choice of rhyming words. Here we can take into consideration only a few of them. We shall show some quantitative properties and follow their development or at least their proportions and differences.

(1) Word length. Rhyming words may have different syllabic lengths, and the frequencies form a distribution.
(2) Open and closed rhyme. A rhyming word ending with a vowel is open, one ending with a consonant is closed. This concerns, of course, the phonetic and not the written form. The rhyme can be mixed, too, i.e. one of the rhyming words is open and the other is closed; hence each word must be counted separately. In a sonnet there are 14 rhyming words.
(3) Masculine and feminine rhyme. The former contains a word with the stress on the last syllable, the latter has the stress on the penultimate syllable.
(4) Parts of speech. Each word belongs to a part of speech which may be marked morphologically or, in some more analytic languages, at least syntactically. A poet may emphasise things, properties or references and other “Aristotelian” categories in different ways. It depends, of course, on the given language which part of speech may occur at the end of a verse.

2.5.1 Word length

The number of syllables (= word length) in rhyme words may be constant in some poetries, but with Eminescu it is a free property controlled by meaning, which has priority. Nevertheless, we assume that there is some basic distribution which makes its way and controls the formation of rhyme words. It may follow the distribution of word length valid for the language as a whole, but this is not very probable because in prose there are other regimes for manipulating length than in poetry. The existence of a distribution of rhyme-word lengths is, at the same time, a sign of some kind of inner rhythm. This rhythm may be disturbed in three extreme cases: (i) if the poem is too short, testing a distribution turns out to be a problem, e.g. because of zero degrees of freedom; (ii) in very long poems it may be disturbed by the fact that the poem was not written in one go: the author made pauses, and after every pause a different mechanism could arise; (iii) if in most poems (of appropriate length) we discover some pattern in rhyme-word length and find a poem not obeying such a regularity, it may be a sign of many corrections made a posteriori, or a sign of non-spontaneity. In the majority of cases one cannot ask the poet and considers the given poem as an exception. On the other hand, spontaneity is a process whose quality or degree cannot be measured.

Analyzing 141 poems by Eminescu, we arrived inductively at the hypothesis that rhyme-word length abides by a Poisson regime. Consider the rhyme positions as urns in which monosyllables, disyllables, etc. are placed. The urns may exert some influence, which may be neutral (random), attracting or repelling, thus yielding different distributions. But the content, too, may exert influence on the rhyme, and this may lead to the rise of a whole family of distributions. Our task is to find this family. The simplest way is to apply the general theory as proposed by Wimmer and Altmann (2005). We suppose that if there is a mechanism controlling the rise of a distribution of rhyme-word lengths, then the probability $P_x$ of finding rhyme-word length $x$ is proportional to the probability of the class $x-1$. This approach has been prolific in deriving many language laws. In discrete cases we simply suppose that

$$\frac{\Delta P_{x-1}}{P_{x-1}} = g(x), \tag{2.5.1}$$

where $\Delta P_{x-1} = P_x - P_{x-1}$, i.e. the difference of two neighbouring classes, and $g(x)$ is a proportionality function. Put simply, the relative difference of two neighbouring classes can be expressed by a function $g(x)$. Taking, for example, $g(x) = a/x - 1$, one obtains $P_x = (a/x)P_{x-1}$, whose solution yields the Poisson distribution (see below). This approach enables us to avoid the application of stochastic processes, which are not popular among linguists. We begin instead directly with the resulting difference equations, whose solution yields, in our case, a family of Poisson-type distributions. Rewriting (2.5.1) as

$$\frac{P_x - P_{x-1}}{P_{x-1}} = g(x) \tag{2.5.2}$$

we obtain

$$P_x = (1 + g(x))\,P_{x-1}, \tag{2.5.3}$$

where $g(x)$ represents a function which yields a preliminary, very general interpretation of the situation in language.

We write

$$g(x) = a_0 + \frac{a_1}{(x + b_1)^{c_1}} + \frac{a_2}{(x + b_2)^{c_2}} \tag{2.5.4}$$

and interpret $a_0$ as a proportionality constant which is present in any linguistic distribution. It is given as a constant of language, not as a property of the individual speaker or writer (poet). The second component expresses the simplest relationship of the neighbouring classes of the distribution, frequently controlled by two constants ($b_1$ and $c_1$), showing the mechanism of self-regulation. The numerator is a function of the speaker; the denominator is the control instrument of the hearer, which cannot allow the property to attain infinite values. Here, it is the warrant of convergence. Instead of introducing further functions exerting influence on $P_x$, one collects them in the third component, considering it as the ceteris paribus condition. Here $b_2$ and $c_2$ are mostly large in order to curb the activity of the speaker, who is forced to take into account other intervening phenomena. In every case, the parameters and the components as a whole may obtain a different interpretation. Further research in this direction is necessary.

In Eminescu's poems, we found four types of Poisson-type distributions, all of them being special cases of (2.5.3). Since length $x = 0$ does not exist, all distributions must be displaced one step to the right, i.e. displacement means in the formulas that $x$ is replaced by $x-1$, and the support changes to $x = 1, 2, \ldots$. If there are poems without any word of length $x = 1$, then one must displace the distribution two steps to the right.

The simplest special case is the Poisson distribution. Here we set $a_0 = -1$, $a_1 = a$, $b_1 = a_2 = 0$, $c_1 = 1$ and obtain

$$P_x = \frac{a}{x}\,P_{x-1}. \tag{2.5.5}$$

Solving the equation stepwise yields

$$P_x = \frac{a^x e^{-a}}{x!}, \quad x = 0, 1, 2, \ldots \tag{2.5.6}$$

or in displaced form

$$P_x = \frac{a^{x-1} e^{-a}}{(x-1)!}, \quad x = 1, 2, \ldots \tag{2.5.7}$$
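The stepwise solution can be checked numerically. The following is a minimal sketch, not the authors' procedure: it iterates the difference equation (2.5.3) with $g(x) = a/x - 1$, normalises the result and compares it with the closed form (2.5.6). The function names, the parameter value `a = 1.2` and the truncation point `x_max = 60` are illustrative assumptions.

```python
import math

def solve_recurrence(g, x_max):
    """Iterate P_x = (1 + g(x)) * P_{x-1} for x = 1..x_max, then normalise.

    The support is truncated at x_max (an assumption of this sketch); for
    small parameter values the neglected tail is numerically irrelevant.
    """
    weights = [1.0]                          # unnormalised P_0
    for x in range(1, x_max + 1):
        weights.append(weights[-1] * (1.0 + g(x)))
    total = sum(weights)
    return [w / total for w in weights]

def poisson_pmf(a, x):
    """Closed form (2.5.6)."""
    return a ** x * math.exp(-a) / math.factorial(x)

a = 1.2                                      # illustrative value, not from the book
probs = solve_recurrence(lambda x: a / x - 1.0, x_max=60)

# Largest deviation between the iterated solution and the closed form.
max_diff = max(abs(p - poisson_pmf(a, x)) for x, p in enumerate(probs))

# Displacement (2.5.7): the same probabilities, re-indexed to x = 1, 2, ...
displaced = {x + 1: p for x, p in enumerate(probs)}
print(max_diff, displaced[1])
```

Any other choice of `g` built from (2.5.4) can be passed to `solve_recurrence` in the same way, provided the resulting weights stay positive and summable.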

Fitting this distribution to Eminescu's rhyme-word lengths yielded the results presented in Table 2.5.1. Here, the empirical frequencies are given in the second column; for example, in Adânca mare… there are 8 monosyllabic rhyme-words, 3 disyllabic, 1 trisyllabic, and 2 quadrisyllabic ones. The letter a symbolises the parameter of the Poisson distribution; X² is the result of the chi-square test for goodness-of-fit performed with DF degrees of freedom and yielding the probability P. If P is greater than 0.01, a threshold chosen because of the very short tails of the empirical data, we consider the fitting result as satisfactory and accept the hypothesis that the data are Poisson-distributed.

Table 2.5.1: Fitting the (displaced) Poisson distribution to rhyme-word lengths in Eminescu's poems (x = 1,2,…)

| Poem title | Empirical distribution | a | X² | DF | P |
|---|---|---|---|---|---|
| Adânca mare… | 8,3,1,2 | 0.7170 | 1.17 | 1 | 0.27 |
| Atât de fragedă… | 9,19,6,2 | 1.0441 | 4.12 | 2 | 0.13 |
| Când amintirile... | 4,14,3,2,0,0,1 | 1.2215 | 5.64 | 2 | 0.06 |
| Când marea... | 6,7,6,1 | 1.1021 | 0.23 | 1 | 0.63 |
| Când priveşti oglinda mărei | 7,14,9,2 | 1.2389 | 2.7 | 2 | 0.26 |
| Ce e amorul? | 6,11,6,4 | 1.3516 | 0.43 | 2 | 0.81 |
| Ce te legeni... | 6,6,8,3,1 | 1.4967 | 1.31 | 2 | 0.52 |
| Cine-i? | 9,14,6,1 | 0.9990 | 2.04 | 2 | 0.36 |
| Crăiasa din poveşti | 0,8,4,2 | 1.4366 | 5.67 | 2 | 0.06 |
| Criticilor mei | 1,9,3,1 | 1.3981 | 6.13 | 2 | 0.05 |
| Cu mâne zilele-ţi adaogi... | 6,6,3,0,1 | 0.9588 | 0.04 | 2 | 0.98 |
| Cum oceanu-ntărâtat... | 0,4,8,1,1 | 1.9464 | 6.54 | 1 | 0.01 |
| De câte ori, iubito... | 1,4,5,3,1 | 1.9992 | 0.53 | 1 | 0.47 |
| De-aş avea | 2,7,3,11,0,1 | 2.3374 | 2.39 | 1 | 0.12 |
| De-aş muri ori de-ai muri | 8,15,10,2,1 | 1.2764 | 1.97 | 3 | 0.58 |
| De-oi adormi (variantă) | 10,19,3,1,2,1 | 0.8544 | 4.56 | 1 | 0.03 |
| De-or trece anii... | 5,6,4,1 | 1.1447 | 0.008 | 1 | 0.93 |
| Departe sunt de tine | 6,10,2 | 0.8428 | 3.03 | 1 | 0.08 |
| Despărţire | 18,15,4,1 | 0.6918 | 0.43 | 2 | 0.81 |
| Din Berlin la Potsdam | 3,4,5,2 | 1.5232 | 0.9 | 2 | 0.64 |
| Din lyra spartă... | 1,4,3,2,2 | 2.0717 | 0.47 | 3 | 0.93 |
| Din noaptea | 4,8,4 | 1.1037 | 1.25 | 1 | 0.26 |
| Din străinătate | 7,9,10,7,1,1,1 | 1.6705 | 1.1 | 3 | 0.78 |
| Dintre sute de catarge | 4,2,8,1,1 | 1.6165 | 6.45 | 3 | 0.69 |
| Doi aştri | 2,6,4 | 1.3172 | 1.26 | 1 | 0.26 |
| Dorinţa | 7,3,0,1,1 | 0.6334 | 0.43 | 1 | 0.51 |
| Foaia veştedă (după Lenau) | 3,4,5,0,1,0,1 | 1.5181 | 0.9 | 2 | 0.64 |
| Frumoasă şi jună | 8,5,3 | 0.7375 | 0.12 | 1 | 0.73 |
| Horia | 6,9,5,1,1 | 1.1695 | 0.35 | 2 | 0.84 |
| Iar când voi fi pământ (variantă) | 6,11,2,0,0,1 | 0.9248 | 2.96 | 2 | 0.23 |
| Îngere palid... | 4,5,3 | 1.0198 | 0.12 | 1 | 0.73 |
| Iubind în taină... | 0,6,6,1,0,1 | 1.9453 | 2.54 | 1 | 0.11 |
| Iubitei | 18,26,15,3,1,0,1 | 1.1625 | 1.64 | 3 | 0.65 |
| La mijloc de codru... | 2,5,5,4 | 1.8606 | 0.3 | 2 | 0.86 |
| La moartea lui Neamţu | 6,9,7,5,2 | 1.6227 | 0.33 | 3 | 0.95 |
| La moartea principelui Ştirbey | 2,7,7 | 1.5773 | 1.16 | 1 | 0.28 |
| La mormântul lui Aron Pumnul | 5,7,7,4,1,0,1 | 1.6754 | 0.2 | 3 | 0.98 |
| La o artistă (Ca a nopţii poezie) | 7,8,7,14,1,1 | 2.0621 | 11.8 | 4 | 0.02 |
| La o artistă (Credeam ieri) | 9,14,4,1 | 0.9152 | 2.28 | 2 | 0.32 |
| La steaua | 3,8,4,1 | 1.2155 | 2.06 | 2 | 0.36 |
| Locul aripelor | 5,10,10,3 | 1.4814 | 2.58 | 2 | 0.28 |
| Mai am un singur dor | 13,16,4,1,1,1 | 0.9527 | 1.54 | 2 | 0.46 |
| Misterele nopţii | 4,7,8,11,1,1 | 2.1709 | 6.19 | 4 | 0.19 |
| Noaptea | 2,11,2,4,0,1 | 1.5351 | 6.88 | 3 | 0.08 |
| Nu e steluţă | 2,4,3,1,2 | 1.6323 | 0.09 | 2 | 0.95 |
| Nu mă-nţelegi | 4,8,9,1,2 | 1.6305 | 3.02 | 3 | 0.39 |
| Nu voi mormânt bogat (variantă) | 12,17,3,1,2,1 | 0.8542 | 1.93 | 1 | 0.16 |
| O arfă pe-un mormânt | 0,11,4,3,5 | 2.0288 | 8.77 | 3 | 0.03 |
| O, mamă… | 10,7,1 | 0.5343 | 0.72 | 1 | 0.39 |
| Pajul Cupidon... | 3,5,8,2 | 1.6324 | 3.55 | 2 | 0.17 |
| Pe lângă plopii fără soţ… | 17,17,8,1,1 | 0.9053 | 0.38 | 2 | 0.82 |
| Peste vârfuri | 2,5,4,1 | 1.3988 | 1.4 | 2 | 0.50 |
| Povestea codrului | 10,13,3 | 0.7885 | 2.45 | 1 | 0.12 |
| Replici | 8,13,3 | 0.8551 | 3.41 | 1 | 0.05 |
| Revedere | 4,13,6,11,1,0,1 | 1.9392 | 7.69 | 3 | 0.05 |
| S-a dus amorul | 9,27,5,6,1 | 1.2459 | 11.21 | 3 | 0.02 |
| Şi dacă... | 4,5,1,1,0,1 | 1.1903 | 1.34 | 2 | 0.51 |
| Singurătate | 10,6,3,1 | 0.7578 | 0.28 | 1 | 0.59 |
| Somnoroase păsărele... | 0,8,4,4 | 1.6931 | 4.79 | 2 | 0.09 |
| Steaua vieţii | 4,6,1,0,1 | 0.8983 | 0.94 | 1 | 0.33 |
| Te duci... | 10,14,13,4,2,1 | 1.4882 | 0.81 | 3 | 0.85 |
| Veneţia (de Gaetano Cerri) | 0,5,7,2 | 1.946 | 4.11 | 1 | 0.04 |
| Viaţa mea fu ziuă | 4,9,2,0,1 | 1.0091 | 2.69 | 1 | 0.11 |

2-displaced Poisson

| Poem title | Empirical distribution | a | X² | DF | P |
|---|---|---|---|---|---|
| Ghazel | (0),21,15,4 | 0.5994 | 0.45 | 1 | 0.50 |
| Glossa | (0),52,23,4,1 | 0.4218 | 0.07 | 1 | 0.79 |
| Unda spumă | (0),7,8,1 | 0.6925 | 2.07 | 1 | 0.15 |
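How expected frequencies and the X² statistic follow from (2.5.7) can be sketched as below. This is only an illustration under stated assumptions: the parameter is estimated here by the simple moment estimator (mean of x − 1), and the last class is treated as a pooled upper tail. The values of a in Table 2.5.1 were evidently obtained by a different (iterative) fitting and pooling procedure, so this sketch is not expected to reproduce the tabulated a, X², DF or P.

```python
import math

def displaced_poisson_pmf(a, x):
    """P_x = a^(x-1) e^(-a) / (x-1)!  for x = 1, 2, ...  (eq. 2.5.7)."""
    return a ** (x - 1) * math.exp(-a) / math.factorial(x - 1)

def chi_square(observed, a):
    """Pearson X^2 for frequencies observed[0] (x = 1), observed[1] (x = 2), ...

    Assumption of this sketch: the last class collects the whole upper
    tail, so that the expected frequencies sum to the sample size.
    """
    n_total = sum(observed)
    k = len(observed)
    probs = [displaced_poisson_pmf(a, x) for x in range(1, k)]
    probs.append(1.0 - sum(probs))            # pooled tail x >= k
    expected = [n_total * p for p in probs]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2, expected

observed = [8, 3, 1, 2]                       # Adânca mare…, Table 2.5.1
# Moment estimator (hypothetical choice): mean of the displaced variable x - 1.
a_moment = sum(i * f for i, f in enumerate(observed)) / sum(observed)
x2, expected = chi_square(observed, a_moment)
print(a_moment, x2, expected)
```

To inspect the fit with the book's own parameter, `chi_square(observed, 0.7170)` can be called instead; differences from the tabulated X² would then stem from the pooling of classes.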

As can be seen, the result is satisfactory, even if we have small chi-square probabilities in some cases. In the process of fitting, we always selected the “best” Poissonian model, i.e. many of the above data can be captured by the other models, too. A further specification of (2.5.4), viz. $a_0 = -1$, $a_1 = a$, $c_1 = b$, $b_1 = a_2 = 0$, yields

$$P_x = \frac{a}{x^b}\,P_{x-1},$$

i.e., there is a slight modification of the original Poisson distribution. The solution yields

$$P_x = \frac{a^x}{(x!)^b}\,P_0, \quad x = 0, 1, 2, \ldots \tag{2.5.8}$$

and with displacement

$$P_x = \frac{a^{x-1}}{((x-1)!)^b}\,P_1, \quad x = 1, 2, 3, \ldots, \tag{2.5.9}$$

representing the Conway-Maxwell-Poisson distribution. $P_1$ is, as a matter of fact, the normalising constant. The results of fitting are presented in Table 2.5.2. Many of them can be captured also by the simple Poisson distribution, but the result is better with this one. Of course, in all fittings we were forced to pool some insufficiently represented length classes.

The third model, called Hyperpoisson distribution, can be set up using the specifications $a_0 = -1$, $a_1 = a$, $a_2 = 0$, $b_1 = b-1$, $c_1 = 1$. We see that here, too, the last component is zero. We obtain the recurrence relation

$$P_x = \frac{a}{b + x - 1}\,P_{x-1}$$

yielding

$$P_x = \frac{a^x}{b^{(x)}}\,P_0, \quad x = 0, 1, 2, \ldots \tag{2.5.10}$$

where $b^{(x)} = b(b+1)(b+2)\cdots(b+x-1)$ is the ascending factorial function. In the 1-displaced case we obtain

$$P_x = \frac{a^{x-1}}{b^{(x-1)}}\,C, \quad x = 1, 2, 3, \ldots \tag{2.5.11}$$
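The two recurrences above can be turned into 1-displaced distributions by iterating the ratio of neighbouring classes and normalising numerically. The sketch below does this under stated assumptions: the helper names and the truncation point `x_max = 40` are illustrative, and the normalising constants ($P_1$ and $C$) are obtained by summation rather than in closed form. The example parameters for Adio are taken from Table 2.5.2.

```python
import math

def displaced_from_ratio(ratio, x_max=40):
    """Return [P_1, ..., P_x_max] for a 1-displaced distribution.

    ratio(y) is P_y / P_{y-1} of the undisplaced distribution (y = 1, 2, ...);
    after displacement it links the classes x = y + 1 and x = y.
    The support is truncated at x_max (an assumption of this sketch).
    """
    weights = [1.0]                           # unnormalised P_1
    for y in range(1, x_max):
        weights.append(weights[-1] * ratio(y))
    total = sum(weights)
    return [w / total for w in weights]

def cmp_ratio(a, b):
    """Conway-Maxwell-Poisson: P_y = (a / y**b) * P_{y-1}."""
    return lambda y: a / y ** b

def hyperpoisson_ratio(a, b):
    """Hyperpoisson: P_y = a / (b + y - 1) * P_{y-1}."""
    return lambda y: a / (b + y - 1)

# Adio in Table 2.5.2: empirical distribution 5,23,10,1 with a = 4.6713, b = 3.456.
observed = [5, 23, 10, 1]
probs = displaced_from_ratio(cmp_ratio(4.6713, 3.456))
expected = [sum(observed) * p for p in probs[:len(observed)]]
print([round(e, 2) for e in expected])
```

With b = 1 both ratios reduce to a/y, so both functions fall back to the displaced Poisson distribution (2.5.7), which gives a convenient sanity check.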

Table 2.5.2: Fitting the 1-displaced Conway-Maxwell-Poisson distribution to rhyme-word lengths in Eminescu's poems

| Poem title | Empirical distribution | a | b | X² | DF | P |
|---|---|---|---|---|---|---|
| Adio | 5,23,10,1 | 4.6713 | 3.456 | 0.01 | 1 | 0.93 |
| Ah, mierea buzei tale | 8,17,10,1 | 2.8256 | 2.4639 | 0.95 | 1 | 0.33 |
| Amicului F.I. | 11,26,10,1 | 2.4892 | 2.7587 | 0.07 | 1 | 0.79 |
| Basmul ce i l-aş spune ei | 15,40,27,8 | 2.8747 | 2.1366 | 0.1 | 1 | 0.75 |
| Ce-ţi doresc eu ţie, dulce Românie | 2,11,15,4 | 0.1125 | 2.9677 | 0.7 | 1 | 0.40 |
| Ecò | 30,69,39,10,2 | 2.2079 | 1.9436 | 0.1 | 2 | 0.95 |
| Epigonii | 10,42,38,20,4 | 3.7233 | 1.9634 | 0.87 | 2 | 0.65 |
| Făt-Frumos din tei | 7,24,14,1 | 4.1189 | 3.0697 | 0.78 | 1 | 0.38 |
| Împărat şi proletar | 27,79,68,29,5,1,1 | 3.1494 | 1.9000 | 0.31 | 2 | 0.86 |
| Junii corupţi | 7,23,25,19,4 | 3.4911 | 1.6957 | 2.41 | 2 | 0.30 |
| Lasă-ţi lumea... | 3,30,16,2,1 | 9.1609 | 3.8740 | 0.41 | 1 | 0.52 |
| Luceafărul | 91,169,100,29,3 | 2.0020 | 1.7773 | 1.46 | 2 | 0.46 |
| Melancolie | 8,17,10,3 | 2.1914 | 1.9109 | 0.01 | 1 | 0.91 |
| Mortua est! | 7,36,22,5 | 4.9340 | 2.9483 | 0.04 | 1 | 0.83 |
| Privesc oraşul furnicar | 5,19,11,3 | 3.5377 | 2.5271 | 0.05 | 1 | 0.82 |
| Ondina (Fantazie) | 30,84,66,21,5 | 3.0179 | 1.9760 | 0.23 | 2 | 0.89 |
| Povestea teiului | 4,40,36,5,3 | 10.1300 | 3.5192 | 0.01 | 1 | 0.92 |
| Scrisoarea I | 16,68,50,18,4 | 3.6527 | 2.1753 | 0.52 | 2 | 0.77 |
| Scrisoarea II | 8,37,28,8,1 | 4.5287 | 2.5572 | 0.02 | 2 | 0.99 |
| Scrisoarea III | 38,122,87,31,4,2,0,1 | 2.8995 | 1.9603 | 0.42 | 2 | 0.81 |
| Scrisoarea IV | 17,71,47,10,2,0,0,1 | 3.8110 | 2.4249 | 1.57 | 2 | 0.46 |
| Scrisoarea V | 17,46,39,12,5,1 | 2.7577 | 1.7063 | 1.31 | 2 | 0.52 |
| Viaţa | 7,40,25,5,0,1 | 5.3863 | 3.0093 | 0.10 | 1 | 0.74 |

In (2.5.11), $C$ is the normalisation constant. The results of fitting are presented in Table 2.5.3. Only one distribution required the last component of (2.5.4), i.e. the one containing $a_2$.

The specification $a_0 = -1$, $a_1 = a-1$, $a_2 = 1$, $b_1 = 0$, $b_2 = a$, $c_1 = c_2 = 1$ yields the recurrence formula

$$P_x = \left(\frac{a-1}{x} + \frac{1}{a+x}\right) P_{x-1}$$

leading to the Ferreri-Poisson distribution

$$P_x = \frac{a^x e^{-a}}{(a+x)\,x!}\,C, \quad x = 0, 1, 2, \ldots, \tag{2.5.12a}$$

where again the displacement causes the replacement of $x$ by $x-1$ and the support is $x = 1, 2, \ldots$, i.e.

$$P_x = \frac{a^{x-1} e^{-a}}{(a+x-1)\,(x-1)!}\,C, \quad x = 1, 2, 3, \ldots \tag{2.5.12b}$$

Writing (2.5.12b) as

$$P_x = \frac{1}{a+x-1}\cdot\frac{a^{x-1} e^{-a}}{(x-1)!}\,C, \quad x = 1, 2, 3, \ldots$$

we see that the original Poisson distribution has been modified at each step by a simple function. The fitting yielded the results presented in Table 2.5.4. In some cases, the results of fitting are not very satisfactory.

We applied some legitimate “tricks”, e.g. adding a zero frequency for a non-existing first or last class, different pooling of classes, etc. In this way, we obtained a Poissonian regime in 131 poems out of 141; the remaining 10 poems were either too short, too irregular or too long, evidently composed of several parts, so that no distribution at all could be fitted (Când; De ce nu-mi vii; Din valurile vremii...; Înger de pază; Lacul; Prin nopţi tăcute; Se bate miezul nopţii...; Maria Tudor; Feciorul de împărat fără de stea; Memento mori).

We can now ask whether other Romanian poets abide by the same regime or whether it is only Eminescu's personal style. Further, is there a correlation between the year of origin and the model, or at least between the year and the parameters of a model? This question could be answered only if all poems by Eminescu were analyzed. We leave these problems to future research.
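That the Ferreri-Poisson closed form is consistent with its recurrence can be verified in two lines. The following derivation is a sketch added for checking purposes only; it uses nothing beyond the recurrence and (2.5.12a) as stated in the text.

```latex
% Ratio of neighbouring classes computed from the closed form (2.5.12a):
\frac{P_x}{P_{x-1}}
  = \frac{a^{x} e^{-a}}{(a+x)\,x!}\cdot\frac{(a+x-1)\,(x-1)!}{a^{x-1} e^{-a}}
  = \frac{a\,(a+x-1)}{x\,(a+x)} .

% The same ratio computed from the recurrence, i.e. 1 + g(x):
\frac{a-1}{x} + \frac{1}{a+x}
  = \frac{(a-1)(a+x) + x}{x\,(a+x)}
  = \frac{a^{2} + ax - a}{x\,(a+x)}
  = \frac{a\,(a+x-1)}{x\,(a+x)} .
```

Both expressions coincide, so (2.5.12a) solves the recurrence for any value of the normalisation constant $C$.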

Table 2.5.3: Fitting the 1-displaced Hyperpoisson distribution to rhyme-word lengths in Eminescu's poems

| Poem title | Empirical distribution | a | b | X² | DF | P |
|---|---|---|---|---|---|---|
| Amorul unei marmure | 8,9,15,10,2 | 1.0272 | 0.0597 | 1.26 | 1 | 0.26 |
| Andrei Mureşanu | 4,37,30,12,2,1 | 0.8283 | 0.0895 | 0.27 | 2 | 0.87 |
| Aveam o muză | 11,27,27,6,0,1 | 0.8508 | 0.2949 | 3.83 | 2 | 0.15 |
| Călin (file de poveste) | 36,130,69,20,3 | 0.6044 | 0.1674 | 0.44 | 2 | 0.80 |
| Când crivăţul cu iarna... | 18,38,29,7,2 | 0.8030 | 0.3404 | 1.37 | 2 | 0.50 |
| Care-i amorul meu în astă lume | 0,20,12,3,1,1 | 0.6952 | 0.1084 | 3.07 | 1 | 0.08 |
| Copii eram noi amandoi | 8,14,21,6,2 | 0.7849 | 0.0099 | 0.76 | 1 | 0.38 |
| Cugetările sărmanului Dionis | 2,35,18,7 | 0.5702 | 0.0326 | 0.1 | 1 | 0.75 |
| De ce să mori tu? | 4,13,8,1,1,0,1 | 0.7149 | 0.2200 | 0.02 | 1 | 0.89 |
| Doina | 8,22,21,5,3,1,1 | 1.3878 | 0.6787 | 3.54 | 3 | 0.32 |
| Egipetul | 13,39,22,12,3,1 | 0.9712 | 0.3649 | 0.99 | 2 | 0.62 |
| Floare-albastră | 3,27,16,8,1,0,0,1 | 0.7707 | 0.0056 | 0.58 | 2 | 0.75 |
| Freamăt de codru | 0,23,15,9,1 | 0.8174 | 0.0355 | 1.67 | 1 | 0.20 |
| Icoană şi privaz | 42,96,40,9,1 | 0.4931 | 0.2157 | 0.2 | 2 | 0.90 |
| În căutarea Şeherezadei | 0,72,66,11,5,1,1 | 0.7269 | 0.0101 | 7.81 | 2 | 0.02 |
| Înger şi demon | 7,39,37,14,7,3 | 1.1232 | 0.2016 | 1.44 | 3 | 0.70 |
| Întunericul şi poetul | 5,14,10,5 | 0.915 | 0.3268 | 0.01 | 1 | 0.91 |
| Iubită dulce, o, mă lasă | 6,23,16,8,3 | 0.9753 | 0.2544 | 0.17 | 2 | 0.92 |
| La Bucovina | 8,15,9,3,1 | 0.8611 | 0.4593 | 0.003 | 1 | 0.96 |
| La moartea lui Heliade | 9,18,14,7 | 1.0627 | 0.5313 | 0.16 | 1 | 0.69 |
| Mureşanu | 58,103,46,12,3,1 | 0.6460 | 0.3637 | 0.6 | 2 | 0.74 |
| O călărire în zori | 13,40,22,11 | 0.7551 | 0.2454 | 0.21 | 1 | 0.65 |
| O, adevăr sublime... | 10,21,8,5 | 0.7154 | 0.3407 | 0.95 | 1 | 0.33 |
| Pustnicul | 9,33,17,3,2 | 0.5721 | 0.1560 | 0.04 | 1 | 0.85 |
| Rugăciunea unui dac | 6,20,13,7 | 0.8589 | 0.2577 | 0.03 | 1 | 0.87 |
| Sara pe deal | 0,13,10,0,1 | 0.5889 | 0.0546 | 3.18 | 1 | 0.07 |
| Sonete | 1,22,10,5,3,1 | 0.8633 | 0.0392 | 4.22 | 1 | 0.04 |
| Speranţa | 11,24,7,2,1 | 0.4759 | 0.2181 | 0.75 | 1 | 0.39 |
| Stelele-n cer | 12,5,10,3,4,2 | 7.1701 | 8.9626 | 5.17 | 3 | 0.16 |
| Venere şi Madonă | 7,17,13,8,2,1 | 1.2575 | 0.5178 | 0.26 | 2 | 0.88 |

Table 2.5.4: Fitting the 1-displaced Ferreri-Poisson distribution to rhyme-word lengths in Eminescu's poems

| Poem | Empirical distribution | a | X² | DF | P |
|---|---|---|---|---|---|
| Cum negustorii din Constantinopol | 1,10,2,2,1 | 2.6771 | 7.15 | 2 | 0.03 |
| Kamadeva | 1,7,2 | 1.6327 | 6.73 | 2 | 0.03 |
| La Quadrat | 1,10,4,0,1 | 1.7581 | 7.93 | 2 | 0.02 |
| Lebăda | 2,8,2 | 1.5667 | 5.71 | 1 | 0.02 |
| Lida | 2,9,4,1 | 1.7491 | 5.01 | 2 | 0.08 |
| O stea prin ceruri | 2,10,1,3 | 1.8384 | 7.45 | 2 | 0.02 |
| Oricâte stele... | 0,0,7,4,3 | 4.0136 | 3.61 | 1 | 0.06 |
| Pe aceeaşi ulicioară... | 3,11,3,1 | 1.6173 | 6.23 | 2 | 0.04 |
| Sus în curtea cea domnească | 11,10,4,5 | 1.6469 | 1.18 | 2 | 0.55 |
| Trecut-au anii | 0,7,4,2,1 | 2.5384 | 1.17 | 2 | 0.56 |
| Vis | 13,6,5,3 | 1.4073 | 1.69 | 2 | 0.43 |

2.5.2 Open and closed rhymes

If a rhyme-word ends with a vowel, we call it open; if it ends with a consonant, it is closed. Evidently, the proportion of closed rhymes is merely the complement of the open ones, hence one needs to consider only one of them. In the development of poetry there may be two trends: either it begins with a majority of open rhymes and, after their overuse, the number of closed rhymes increases, or vice versa; or rhyming approaches a steady state (fifty-fifty) which may attain the zero-zero limit with rhymeless poetry. The first case has been observed in Slovak poetry up to 1960 (cf. Štukovský, Altmann 1965, 1966); the subsequent development is not known. As we analyze only one writer, we cannot make statements about the general development in Romanian. Nevertheless, we can at least show that this property is not constant but changes from year to year. We analyzed, again, 138 poems by Eminescu, but not all years are representative. For example, for 1875, 1882, 1884, 1885 and 1886 we have only one poem each, and two of them display extreme values. For the other years we could form more or less reliable averages. The results are presented in Table 2.5.5.

Table 2.5.5: Mean proportion of open rhymes in Eminescu's poetry

| Year | Proportion of open rhymes | Year | Proportion of open rhymes |
|---|---|---|---|
| 1866 | 0.6863 | 1878 | 0.6923 |
| 1867 | 0.5263 | 1879 | 0.7419 |
| 1868 | 0.5412 | 1880 | 0.7647 |
| 1869 | 0.6623 | 1881 | 0.7573 |
| 1870 | 0.7313 | 1882 | 0.7083 |
| 1871 | 0.7374 | 1883 | 0.7009 |
| 1872 | 0.6959 | 1884 | 0.3750 |
| 1873 | 0.8079 | 1885 | 0.9167 |
| 1874 | 0.7367 | 1886 | 0.5000 |
| 1875 | 0.5870 | 1887 | 0.7000 |
| 1876 | 0.6339 | 1889 | 0.7456 |

It can easily be shown that there is no significant trend in the data; the proportion of open rhymes is rather constant. The investigation for Eminescu comprises only 23 years, so that a continuation of the study of Romanian poetry up to the present would be necessary. In spite of the fact that some years are poorly represented (esp. the year 1884, which yields a strong extreme value), there is a slightly increasing tendency to employ open rhyme words, as can be seen in Figure 2.5.1; it cannot be considered significant because of some outliers caused by insufficient representation. However, the drastic change in Figure 2.1.2.1 (euphony), occurring after the “pathological” year 1883, is confirmed also in Figure 2.5.1 for the open rhyme proportion. In further research, the data could be enriched by those from other writers, because here we are interested rather in a hypothesis concerning the development of Romanian poetry as a whole.
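One simple way to look at the trend question is an ordinary least-squares line through the yearly means of Table 2.5.5. The sketch below is an assumption about how such a check could be done, not the test used by the authors; the function name is hypothetical, and a significance test of the slope would still have to be added.

```python
def linear_trend(xs, ys):
    """Least-squares slope, intercept and Pearson correlation of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Yearly mean proportions of open rhymes, copied from Table 2.5.5.
open_rhymes = {
    1866: 0.6863, 1867: 0.5263, 1868: 0.5412, 1869: 0.6623, 1870: 0.7313,
    1871: 0.7374, 1872: 0.6959, 1873: 0.8079, 1874: 0.7367, 1875: 0.5870,
    1876: 0.6339, 1878: 0.6923, 1879: 0.7419, 1880: 0.7647, 1881: 0.7573,
    1882: 0.7083, 1883: 0.7009, 1884: 0.3750, 1885: 0.9167, 1886: 0.5000,
    1887: 0.7000, 1889: 0.7456,
}
years = sorted(open_rhymes)
slope, intercept, r = linear_trend(years, [open_rhymes[y] for y in years])
print(slope, r)
```

Because the years with a single poem (e.g. 1884 to 1886) enter with the same weight as well-represented years, weighting by the number of poems per year would be a natural refinement if those counts are available.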

Figure 2.5.1. The trend of using open rhyme words by Eminescu

2.5.3 Masculine and feminine rhyme

The placing of the last stress in the line distinguishes (at least) two kinds of rhyme words: masculine if the stress is on the last syllable, and feminine in all other cases. This is a slight simplification, but in our case it is sufficient. Instead of presenting the whole table of data, we show only the mean proportions of masculine rhymes in the respective years. Again, in some cases the individual outliers disturb the horizontal dispersion around a straight line. The results are presented in Table 2.5.6 and Figure 2.5.2.

Figure 2.5.2. Mean proportion of masculine rhymes in Eminescu's poems


Table 2.5.6: Mean proportions of masculine rhymes in Eminescu's poems ordered chronologically

| Year | Proportion of masculine rhymes | Year | Proportion of masculine rhymes |
|---|---|---|---|
| 1866 | 0.4235 | 1878 | 0.6154 |
| 1867 | 0.5079 | 1879 | 0.3871 |
| 1868 | 0.5116 | 1880 | 0.5294 |
| 1869 | 0.4401 | 1881 | 0.3439 |
| 1870 | 0.4179 | 1882 | 0.5000 |
| 1871 | 0.4226 | 1883 | 0.4384 |
| 1872 | 0.3832 | 1884 | 0.0000 |
| 1873 | 0.2199 | 1885 | 0.5000 |
| 1874 | 0.2878 | 1886 | 1.0000 |
| 1875 | 0.3478 | 1887 | 0.4000 |
| 1876 | 0.4605 | 1889 | 0.2018 |

No significant trend could be observed but the dispersion seems to increase. Nevertheless, as noticed above (Figures 2.5.2 and 2.5.1), a significant change occurs after the year 1883 also in the masculine rhyme proportion.

2.5.4 Parts of speech in rhyme words

Usually, not all parts of speech occur in the rhyme position unless the poet was forced to place e.g. a preposition or a conjunction or even a separable prefix at the end of the line in order to construct a rhyme. In some languages this is not possible; in others (e.g. German) it is sometimes necessary. Starting from the usual parts-of-speech system in Romanian, which follows the classical Latin pattern, we obtain

N – noun
V – verb
A – adjective
Av – adverb
P – pronoun
Pr – preposition
C – conjunction
Nu – numeral
I – interjection.

However, not all parts of speech occur in rhyme positions in Eminescu's poetry. We do not find prepositions and numerals at all, a conjunction only once, and a very small number of interjections. Hence we can set up a set {N,V,A,Av,P,C,I}, but in Table 2.5.7 we present the complete set because in analyzing further poems one could find a numeral or a preposition. In order to study both the general and the individual trend in his works, we analyzed 141 poems and registered the part of speech to which each rhyme-word belongs. We obtained the results presented in Table 2.5.7, ordered alphabetically.

Table 2.5.7: Parts of speech in Eminescu's poems

| No. | Poem title (alphabetically) | N,V,A,Av,P,Pr,C,Nu,I |
|---|---|---|
| 1 | Adânca mare… | 7, 5, 2, 0, 0, 0, 0, 0, 0 |
| 2 | Adio | 15, 13, 3, 4, 4, 0, 0, 0, 1 |
| 3 | Ah, mierea buzei tale | 16, 9, 3, 1, 7, 0, 0, 0, 0 |
| 4 | Amicului F.I. | 29, 4, 6, 2, 7, 0, 0, 0, 0 |
| 5 | Amorul unei marmure | 30, 5, 7, 0, 2, 0, 0, 0, 0 |
| 6 | Andrei Mureşanu | 34, 21, 24, 3, 4, 0, 0, 0, 0 |
| 7 | Atât de fragedă… | 23, 5, 4, 1, 3, 0, 0, 0, 0 |
| 8 | Aveam o muză | 27, 15, 25, 3, 2, 0, 0, 0, 0 |
| 9 | Basmul ce i l-aş spune ei | 37, 19, 23, 6, 4, 0, 0, 0, 1 |
| 10 | Călin - File de poveste | 136, 65, 49, 1, 7, 0, 0, 0, 0 |
| 11 | Când | 8, 8, 8, 0, 0, 0, 0, 0, 0 |
| 12 | Când amintirile... | 10, 7, 4, 2, 1, 0, 0, 0, 0 |
| 13 | Când crivăţul cu iarna... | 29, 30, 28, 1, 6, 0, 0, 0, 0 |
| 14 | Când marea... | 7, 5, 7, 1, 0, 0, 0, 0, 0 |
| 15 | Când priveşti oglinda mărei | 15, 3, 7, 4, 3, 0, 0, 0, 0 |
| 16 | Care-i amorul meu în astă lume | 22, 9, 5, 0, 1, 0, 0, 0, 0 |
| 17 | Ce e amorul? | 14, 7, 3, 2, 2, 0, 0, 0, 0 |
| 18 | Ce te legeni... | 10, 9, 1, 1, 4, 0, 0, 0, 0 |
| 19 | Ce-ţi doresc eu ţie, dulce Românie | 17, 11, 4, 0, 0, 0, 0, 0, 0 |
| 20 | Cine-i? | 16, 3, 11, 0, 0, 0, 0, 0, 0 |
| 21 | Copii eram noi amandoi | 28, 15, 6, 1, 1, 0, 0, 0, 0 |
| 22 | Crăiasa din poveşti | 6, 2, 6, 0, 0, 0, 0, 0, 0 |
| 23 | Criticilor mei | 6, 6, 2, 0, 0, 0, 0, 0, 0 |
| 24 | Cu mâne zilele-ţi adaogi... | 5, 7, 1, 3, 0, 0, 0, 0, 0 |
| 25 | Cugetările sărmanului Dionis | 47, 10, 4, 2, 1, 0, 0, 0, 0 |
| 26 | Cum negustorii din Constantinopol | 14, 1, 1, 0, 0, 0, 0, 0, 0 |
| 27 | Cum oceanu-ntărâtat... | 4, 7, 3, 0, 0, 0, 0, 0, 0 |
| 28 | De câte ori, iubito... | 4, 4, 3, 3, 0, 0, 0, 0, 0 |
| 29 | De ce nu-mi vii | 10, 4, 1, 5, 4, 0, 0, 0, 0 |
| 30 | De ce să mori tu? | 12, 3, 11, 2, 0, 0, 0, 0, 0 |
| 31 | De-aş avea | 14, 1, 6, 3, 0, 0, 0, 0, 0 |
| 32 | De-aş muri ori de-ai muri | 17, 8, 10, 0, 1, 0, 0, 0, 0 |
| 33 | De-oi adormi (variantă) | 18, 6, 7, 4, 1, 0, 0, 0, 0 |
| 34 | De-or trece anii... | 6, 4, 0, 2, 4, 0, 0, 0, 0 |
| 35 | Departe sunt de tine | 8, 7, 2, 1, 0, 0, 0, 0, 0 |
| 36 | Despărţire | 17, 9, 4, 4, 4, 0, 0, 0, 0 |
| 37 | Din Berlin la Potsdam | 8, 4, 2, 0, 0, 0, 0, 0, 0 |
| 38 | Din lyra spartă... | 5, 5, 1, 0, 1, 0, 0, 0, 0 |
| 39 | Din noaptea | 6, 5, 2, 2, 1, 0, 0, 0, 0 |
| 40 | Din străinătate | 22, 7, 6, 1, 0, 0, 0, 0, 0 |
| 41 | Din valurile vremii... | 4, 10, 4, 1, 1, 0, 0, 0, 0 |
| 42 | Dintre sute de catarge | 12, 2, 1, 1, 0, 0, 0, 0, 0 |
| 43 | Doi aştri | 9, 0, 1, 0, 2, 0, 0, 0, 0 |
| 44 | Doina | 39, 14, 1, 1, 6, 0, 0, 0, 0 |
| 45 | Dorinţa | 7, 3, 1, 1, 0, 0, 0, 0, 0 |
| 46 | Dumnezeu şi om | 25, 3, 27, 0, 1, 0, 0, 0, 0 |
| 47 | Ecò | 73, 30, 44, 0, 3, 0, 0, 0, 0 |
| 48 | Egipetul | 44, 18, 28, 0, 0, 0, 0, 0, 0 |
| 49 | Epigonii | 49, 14, 49, 0, 2, 0, 0, 0, 0 |
| 50 | Făt-Frumos din tei | 23, 6, 14, 2, 1, 0, 0, 0, 0 |
| 51 | Feciorul de împărat fără de stea | 338, 211, 234, 22, 38, 0, 0, 0, 0 |
| 52 | Floare-albastră | 37, 6, 10, 1, 2, 0, 0, 0, 0 |
| 53 | Foaia veştedă (după Lenau) | 8, 4, 0, 0, 2, 0, 0, 0, 0 |
| 54 | Freamăt de codru | 18, 14, 11, 2, 3, 0, 0, 0, 0 |
| 55 | Frumoasă şi jună | 1, 8, 4, 0, 1, 0, 0, 0, 2 |
| 56 | Ghazel | 22, 7, 10, 0, 1, 0, 0, 0, 0 |
| 57 | Glossă | 36, 25, 10, 7, 2, 0, 0, 0, 0 |
| 58 | Horia | 10, 4, 4, 0, 4, 0, 0, 0, 0 |
| 59 | Iar când voi fi pământ (variantă) | 24, 7, 5, 2, 2, 0, 0, 0, 0 |
| 60 | Icoană şi privaz | 96, 52, 32, 4, 4, 0, 0, 0, 0 |
| 61 | Împărat şi proletar | 81, 55, 66, 2, 5, 0, 1, 0, 0 |
| 62 | În căutarea Şeherezadei | 91, 23, 31, 0, 11, 0, 0, 0, 0 |
| 63 | Înger de pază | 7, 2, 3, 0, 2, 0, 0, 0, 0 |
| 64 | Înger şi demon | 34, 26, 44, 0, 4, 0, 0, 0, 0 |
| 65 | Îngere palid... | 7, 4, 1, 0, 0, 0, 0, 0, 0 |
| 66 | Întunericul şi poetul | 23, 6, 4, 0, 1, 0, 0, 0, 0 |
| 67 | Iubind în taină... | 11, 2, 0, 0, 1, 0, 0, 0, 0 |
| 68 | Iubită dulce, o, mă lasă | 17, 19, 14, 4, 2, 0, 0, 0, 0 |
| 69 | Iubitei | 20, 20, 13, 2, 7, 0, 0, 0, 2 |
| 70 | Junii corupţi | 36, 18, 22, 2, 0, 0, 0, 0, 0 |
| 71 | Kamadeva | 4, 2, 4, 0, 0, 0, 0, 0, 0 |
| 72 | La Bucovina | 19, 2, 6, 3, 6, 0, 0, 0, 0 |
| 73 | La mijloc de codru... | 7, 2, 3, 0, 1, 0, 0, 0, 0 |
| 74 | La moartea lui Heliade | 23, 13, 12, 0, 0, 0, 0, 0, 0 |
| 75 | La moartea lui Neamţu | 15, 8, 5, 1, 0, 0, 0, 0, 0 |
| 76 | La moartea principelui Ştirbey | 8, 5, 2, 0, 1, 0, 0, 0, 0 |
| 77 | La mormântul lui Aron Pumnul | 13, 1, 8, 0, 3, 0, 0, 0, 0 |
| 78 | La o artistă (Ca a nopţii poezie) | 17, 8, 12, 2, 1, 0, 0, 0, 0 |
| 79 | La o artistă (Credeam ieri) | 14, 3, 11, 0, 0, 0, 0, 0, 0 |
| 80 | La Quadrat | 8, 4, 4, 0, 0, 0, 0, 0, 0 |
| 81 | La steaua | 4, 7, 2, 2, 1, 0, 0, 0, 0 |
| 82 | Lacul | 5, 5, 0, 0, 0, 0, 0, 0, 0 |
| 83 | Lasă-ţi lumea... | 18, 17, 13, 0, 3, 0, 0, 0, 0 |
| 84 | Lebăda | 6, 4, 2, 0, 0, 0, 0, 0, 0 |
| 85 | Lida | 6, 6, 4, 0, 0, 0, 0, 0, 0 |
| 86 | Locul aripelor | 12, 6, 5, 1, 4, 0, 0, 0, 0 |
| 87 | Luceafărul | 186, 100, 55, 36, 15, 0, 0, 0, 0 |
| 88 | Mai am un singur dor | 21, 4, 7, 3, 1, 0, 0, 0, 0 |
| 89 | Maria Tudor | 7, 5, 2, 0, 0, 0, 0, 0, 0 |
| 90 | Melancolie | 20, 10, 6, 1, 1, 0, 0, 0, 0 |
| 91 | Memento mori | 668, 185, 303, 30, 24, 0, 0, 0, 1 |
| 92 | Misterele nopţii | 10, 11, 2, 9, 0, 0, 0, 0, 0 |
| 93 | Mortua est! | 33, 13, 23, 0, 1, 0, 0, 0, 0 |
| 94 | Mureşanu | 102, 57, 55, 2, 7, 0, 0, 0, 0 |
| 95 | Noaptea... | 8, 4, 8, 0, 0, 0, 0, 0, 0 |
| 96 | Nu e steluţă | 6, 3, 2, 0, 1, 0, 0, 0, 0 |
| 97 | Nu mă-nţelegi | 8, 4, 6, 3, 3, 0, 0, 0, 0 |
| 98 | Nu voi mormânt bogat (variantă) | 20, 6, 6, 3, 1, 0, 0, 0, 0 |
| 99 | O arfă pe-un mormânt | 7, 3, 6, 1, 2, 0, 0, 0, 0 |
| 100 | O călărire în zori | 60, 11, 14, 1, 0, 0, 0, 0, 0 |
| 101 | O stea prin ceruri | 4, 9, 3, 0, 0, 0, 0, 0, 0 |
| 102 | O, adevăr sublime... | 33, 4, 6, 0, 1, 0, 0, 0, 0 |
| 103 | O, mamă… | 7, 5, 0, 3, 3, 0, 0, 0, 0 |
| 104 | Ondina (Fantazie) | 101, 33, 55, 4, 13, 0, 0, 0, 0 |
| 105 | Oricâte stele... | 12, 1, 0, 1, 0, 0, 0, 0, 0 |
| 106 | Pajul Cupidon... | 7, 2, 6, 3, 0, 0, 0, 0, 0 |
| 107 | Pe aceeaşi ulicioară... | 9, 3, 0, 4, 2, 0, 0, 0, 0 |
| 108 | Pe lângă plopii fără soţ… | 12, 14, 11, 5, 2, 0, 0, 0, 0 |
| 109 | Peste vârfuri | 4, 2, 3, 3, 0, 0, 0, 0, 0 |
| 110 | Povestea codrului | 19, 1, 4, 0, 2, 0, 0, 0, 0 |
| 111 | Povestea teiului | 36, 28, 12, 10, 2, 0, 0, 0, 0 |
| 112 | Prin nopţi tăcute | 2, 4, 10, 0, 0, 0, 0, 0, 0 |
| 113 | Privesc oraşul furnicar | 16, 9, 6, 5, 0, 0, 0, 0, 2 |
| 114 | Pustnicul | 33, 8, 17, 3, 3, 0, 0, 0, 0 |
| 115 | Replici | 16, 3, 1, 0, 4, 0, 0, 0, 0 |
| 116 | Revedere | 13, 14, 6, 2, 1, 0, 0, 0, 0 |
| 117 | Rugăciunea unui dac | 22, 17, 3, 3, 1, 0, 0, 0, 0 |
| 118 | S-a dus amorul | 19, 10, 7, 6, 6, 0, 0, 0, 0 |
| 119 | Sara pe deal | 8, 4, 10, 0, 2, 0, 0, 0, 0 |
| 120 | Scrisoarea I | 84, 35, 29, 5, 2, 0, 0, 0, 1 |
| 121 | Scrisoarea II | 58, 15, 6, 3, 0, 0, 0, 0, 0 |
| 122 | Scrisoarea III | 185, 53, 27, 6, 14, 0, 0, 0, 0 |
| 123 | Scrisoarea IV | 77, 37, 26, 7, 1, 0, 0, 0, 0 |
| 124 | Scrisoarea V | 72, 20, 17, 5, 6, 0, 0, 0, 0 |
| 125 | Se bate miezul nopţii... | 4, 2, 0, 0, 0, 0, 0, 0, 0 |
| 126 | Şi dacă... | 5, 6, 0, 1, 0, 0, 0, 0, 0 |
| 127 | Singurătate | 12, 4, 2, 2, 0, 0, 0, 0, 0 |
| 128 | Somnoroase păsărele... | 8, 4, 3, 1, 0, 0, 0, 0, 0 |
| 129 | Sonete | 21, 16, 3, 2, 0, 0, 0, 0, 0 |
| 130 | Speranţa | 22, 11, 8, 4, 0, 0, 0, 0, 0 |
| 131 | Steaua vieţii | 7, 2, 2, 0, 1, 0, 0, 0, 0 |
| 132 | Stelele-n cer | 23, 6, 7, 0, 0, 0, 0, 0, 0 |
| 133 | Sus în curtea cea domnească | 8, 3, 19, 0, 0, 0, 0, 0, 0 |
| 134 | Te duci... | 14, 16, 9, 4, 1, 0, 0, 0, 0 |
| 135 | Trecut-au anii | 7, 6, 0, 1, 0, 0, 0, 0, 0 |
| 136 | Unda spumă | 5, 5, 5, 0, 1, 0, 0, 0, 0 |
| 137 | Venere şi Madonă | 34, 4, 9, 0, 1, 0, 0, 0, 0 |
| 138 | Veneţia (de Gaetano Cerri) | 12, 1, 1, 0, 0, 0, 0, 0, 0 |
| 139 | Viaţa | 33, 17, 24, 3, 1, 0, 0, 0, 0 |
| 140 | Viaţa mea fu ziuă | 6, 7, 1, 1, 1, 0, 0, 0, 0 |
| 141 | Vis | 16, 4, 4, 1, 2, 0, 0, 0, 0 |

The rank-order distribution is not very illuminating and, as a matter of fact, not very useful. Direct comparisons of individual poems would easily be possible using the cosine similarity (see above), but one could obtain only a classification. More interesting is the activity/descriptivity relationship measured by means of a modified Busemann coefficient (cf. Busemann 1925; Altmann 1978) given as

$$Q = \frac{V}{A + V}, \tag{2.5.13}$$

i.e. as the proportion of verbs in A+V. The ratio Q is a simple proportion in [0,1]. In the vicinity of 0.5, it expresses the active-descriptive equilibrium; if Q < 0.5, it is a sign of ornamentality/descriptivity; if Q > 0.5, it is a sign of activity. In the absence of any tendency, V is binomially distributed with p = 0.5, so that the expected value of Q is 0.5. Now considering A + V = n, we easily obtain confidence intervals for any n. Thus we can decide as follows.

(i) If Q > 0.5 (or V > A), we compute

$$P(X \ge V) = \sum_{x=V}^{n} \binom{n}{x}\, 0.5^{n} \tag{2.5.14}$$

and if P(X ≥ V) ≤ 0.05, we consider the poem as rather active.

(ii) If Q < 0.5, we compute

$$P(X \le V) = \sum_{x=0}^{V} \binom{n}{x}\, 0.5^{n} \tag{2.5.15}$$

and if P(X ≤ V) ≤ 0.05, we consider the poem as ornamental or descriptive.

(iii) In all other cases the poem displays the active-descriptive equilibrium.

Consider the poem Adio, in which V + A = n, i.e. 13 + 3 = 16, and Q = 13/16 = 0.8125. Since the first condition is fulfilled, we compute (2.5.14) and obtain

$$P(X \ge 13) = P(X = 13) + P(X = 14) + P(X = 15) + P(X = 16) = \left[\binom{16}{13} + \binom{16}{14} + \binom{16}{15} + \binom{16}{16}\right] 0.5^{16}$$

$$= 0.008545 + 0.001831 + 0.000244 + 0.000015 = 0.0106.$$

Since the result is smaller than 0.05, we consider the rhyming part of the poem as highly active. To simplify computations, we present a table (cf. Table 2.5.8) showing n = 5,…,60 and the two boundaries of equilibrium. If V is smaller than or equal to the left number (VL), then the poem is ornamental (descriptive); if V is greater than or equal to the right number (VU), then the poem is active. In all other cases there is an active-descriptive equilibrium. In the above example, we had n = 16 and V = 13. Looking at Table 2.5.8, we easily find that V = 13 > 12, hence the poem is strongly “active”.

In this way, we obtain at least a classification consisting of three classes. A finer computation of cumulative probabilities would yield a more continuous classification, but it can be shown that Eminescu is rather in equilibrium. Not even the analysis of the history shows a trend-like motion. Though in the historically second half some poems display greater “activity”, it is mostly because they are short and there is no adjective in the rhyme position. Thus the rhyme position is not enough for judging the overall “activity” of the poem; it mirrors only a partial picture.

Now, considering all parts of speech in the rhyme position, we can ask whether the distribution of word classes is statistically equal. Any similarity measure could be used, but a subsequent test for significance would be necessary in any case. Hence we rather perform a homogeneity test directly. We use for this purpose the information statistic 2Î, which is asymptotically distributed like a chi-square variable, and build groups of poems which have a homogeneous distribution of parts of speech in the rhyme position.

Table 2.5.8: Boundary values of active/descriptive equilibrium

VL

VU

n

VL

VU

n

VL

VU

5

0

5

24

7

17

43

15

28

6

0

6

25

7

18

44

16

28

7

0

7

26

8

18

45

16

29

8

1

7

27

8

19

46

16

30

9

1

8

28

9

19

47

17

30

10

1

9

29

9

20

48

17

31

11

2

9

30

10

20

49

18

31

12

2

10

31

10

21

50

18

32

13

3

10

32

10

22

51

19

32

14

3

11

33

11

22

52

19

33

15

3

12

34

11

23

53

20

33

16

4

12

35

12

23

54

20

34

17

4

13

36

12

24

55

20

35

18

5

13

37

13

24

56

21

35

19

5

14

38

13

25

57

21

36

20

5

15

39

13

26

58

22

36

21

6

15

40

14

26

59

22

37

22

6

16

41

14

27

60

23

37

23

7

16

42

15

27
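The binomial computation behind the Adio example and the boundaries of Table 2.5.8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the threshold 0.05 is the one used in the text, and VL and VU are taken to be the largest V with P(X ≤ V) ≤ 0.05 and the smallest V with P(X ≥ V) ≤ 0.05, which reproduces the tabulated pair for n = 16.

```python
from math import comb


def lower_tail(n, v):
    """P(X <= v) for X ~ Binomial(n, 0.5), as in (2.5.15)."""
    return sum(comb(n, x) for x in range(v + 1)) / 2 ** n


def upper_tail(n, v):
    """P(X >= v) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, x) for x in range(v, n + 1)) / 2 ** n


def boundaries(n, alpha=0.05):
    """Return (VL, VU) for n rhyme words (assumed reading of Table 2.5.8).

    VL is the largest V with P(X <= V) <= alpha,
    VU is the smallest V with P(X >= V) <= alpha;
    None is returned for a boundary that does not exist.
    """
    lows = [v for v in range(n + 1) if lower_tail(n, v) <= alpha]
    highs = [v for v in range(n + 1) if upper_tail(n, v) <= alpha]
    return (max(lows) if lows else None, min(highs) if highs else None)


def classify(n, v, alpha=0.05):
    """Three-class decision: active, descriptive or equilibrium."""
    if upper_tail(n, v) <= alpha:
        return "active"
    if lower_tail(n, v) <= alpha:
        return "descriptive"
    return "equilibrium"


# Adio: n = V + A = 16 and V = 13
print(round(upper_tail(16, 13), 4))
print(boundaries(16))
print(classify(16, 13))
```

For Adio the upper tail rounds to the 0.0106 obtained above, and V = 13 lies beyond the upper boundary for n = 16.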

We demonstrate the procedure with the first two poems Adânca mare… and Adio.

Adânca mare… - parts of speech in the rhyme position

Adânca mare sub a lunei faţă (N);
Înseninată de-a ei blondă rază (N),
O lume-ntreagă-n fundul ei visează (V)
Şi stele poartă pe oglinda-i creaţă (A).
Dar mâni - ea falnică, cumplit turbează (V)
Şi mişcă lumea ei negru-măreaţă (A),
Pe-ale ei mii şi mii de nalte braţe (N)
Ducând peire - ţări înmormântează (V).
Azi un diluviu, mâne-o murmuire (N),
O armonie, care capăt n-are (V)
Astfel e-a ei întunecată fire (N),
Astfel e sufletu-n antica mare (N).
Ce-i pasă - ce simţiri o să ni-nspire (V)
Indiferentă, solitară - mare! (N)

Since none of the poems have numerals and prepositions in the rhyme position, we omit the two zeros and obtain

                 N    V    A   Av    P    C    I    fi.
Adânca mare…     7    5    2    0    0    0    0     14
Adio            15   13    3    4    4    0    1     40
f.j             22   18    5    4    4    0    1   m=54

Let us consider the above matrix with 2 rows and 7 columns. The individual numbers in the cells are called $f_{ij}$, the sums of the rows are $f_{i\cdot}$ and the sums of the columns $f_{\cdot j}$. Let the sum of all numbers be

$m = \sum_{i} \sum_{j} f_{ij}$

Then the 2Î test can be performed by means of the formula

(2.5.16)  $2\hat{I} = 2 \sum_{i} \sum_{j,\; f_{ij} \ne 0} f_{ij} \ln \frac{m f_{ij}}{f_{i\cdot} f_{\cdot j}}$

and can be presented also as

(2.5.17)  $2\hat{I} = 2 \sum_{i} \sum_{j,\; f_{ij} \ne 0} f_{ij} \ln f_{ij} + 2 m \ln m - 2 \sum_{i} f_{i\cdot} \ln f_{i\cdot} - 2 \sum_{j,\; f_{\cdot j} \ne 0} f_{\cdot j} \ln f_{\cdot j}$

For each cell containing zero, 1 is subtracted from the overall 2Î (cf. Ku 1963). In our example we obtain m = 54 and, according to (2.5.17),

2Î = 2(7 ln 7 + 5 ln 5 + 2 ln 2 + 15 ln 15 + 13 ln 13 + 3 ln 3 + 4 ln 4 + 4 ln 4 + 1 ln 1) + 2(54 ln 54) - 2(14 ln 14 + 40 ln 40) - 2(22 ln 22 + 18 ln 18 + 5 ln 5 + 4 ln 4 + 4 ln 4 + 1 ln 1)
= 2(13.6214 + 8.0472 + 1.3863 + 40.6208 + 33.3443 + 3.2958 + 5.5452 + 5.5452 + 0) + 2(215.4051) - 2(184.5020) - 2(68.0029 + 52.0267 + 8.0472 + 5.5452 + 5.5452 + 0)
= 222.8124 + 430.8102 - 369.0040 - 278.3344
= 6.2842

We subtract 5 for the zeroes from this number and obtain 2Î = 1.2842. Since we have (2-1)(7-1) = 6 degrees of freedom, we can say that the two poems are homogeneous from this point of view, even if the differences are striking. Unfortunately, the subtraction for zero cells often leads to problematic results, hence the information statistic cannot be used for this problem. For all other cases we perform the usual chi-square test, i.e. we compute

(2.5.18)  $X^{2} = \sum_{i} \sum_{j,\; f_{\cdot j} \ne 0} \frac{(f_{ij} - f_{i\cdot} f_{\cdot j} / m)^{2}}{f_{i\cdot} f_{\cdot j} / m}$.

For the above case we obtain X² = 4.0953, which need not be modified. With 6 degrees of freedom it is not significant. Groupings can be performed by forming a group of only those poems which are not significantly different from all the other ones in the group. After performing all tests pair-wise, we can state that the poems have a very homogeneous distribution of parts of speech in the rhyme position. In this sense, Eminescu was very stable.
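As a check on the arithmetic, both statistics can be recomputed from the 2 × 7 matrix of Adânca mare… and Adio. The following Python sketch is an illustration only: the function names are hypothetical, `two_i_hat` follows the decomposition (2.5.17) before the subtraction for zero cells, and `chi_square` follows (2.5.18), skipping columns whose total is zero.

```python
from math import log


def xlogx(f):
    """f * ln f, with the convention 0 * ln 0 = 0."""
    return f * log(f) if f > 0 else 0.0


def two_i_hat(table):
    """Information statistic following (2.5.17), without the zero-cell correction."""
    m = sum(sum(row) for row in table)
    cells = sum(xlogx(f) for row in table for f in row)
    rows = sum(xlogx(sum(row)) for row in table)
    cols = sum(xlogx(sum(col)) for col in zip(*table))
    return 2 * cells + 2 * xlogx(m) - 2 * rows - 2 * cols


def chi_square(table):
    """Pearson chi-square following (2.5.18); zero-total columns are skipped."""
    m = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    x2 = 0.0
    for row in table:
        row_total = sum(row)
        for f_ij, col_total in zip(row, col_totals):
            if col_total == 0:
                continue
            expected = row_total * col_total / m
            x2 += (f_ij - expected) ** 2 / expected
    return x2


# Columns: N, V, A, Av, P, C, I
table = [
    [7, 5, 2, 0, 0, 0, 0],     # Adânca mare…
    [15, 13, 3, 4, 4, 0, 1],   # Adio
]
zero_cells = sum(f == 0 for row in table for f in row)
raw = two_i_hat(table)
print(round(raw, 4), zero_cells, round(raw - zero_cells, 4))
print(round(chi_square(table), 3))
```

Up to rounding in the fourth decimal place, the sketch gives the values reported in the text; small differences can arise because the hand computation rounds each logarithmic term before summing.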

3 The word

3.1 Introduction

The number of word properties studied so far in quantitative linguistics is considerable. As already said, these properties are nothing intrinsic to the word; they are scientific (or everyday) concepts, i.e. mental constructions. At the beginning of the 20th century very few quantitative word properties were discussed; today, every linguist can list at least twenty ex abrupto. Our conceptual knowledge increases and we aim at organising it in systems in which there are no longer isolated parts. This ideal state cannot be attained in one step; it is rather a path along which scientists collect and put together membra disiecta. In this section, we restrict ourselves to a few of the aspects of words which have been quantified.

The most popular way to study quantitative linguistic properties is the investigation of word frequencies. This way yields various results, but also traps, pitfalls, stumbling blocks, etc., depending not only on the language, text sort and grammar under study, but also on the previous education of the researcher. The views on language and the methods applied by mathematicians, physicists, psychologists and linguists are quite different. A result that seems relevant to a researcher in information theory may seem quite irrelevant to a linguist and vice versa. The unification of all views into a general theory is not only necessary but also possible; however, complex research teams are needed to cover all the present-day opinions. Research expands both in depth and width, and a theory should be constructed in such a way that every new vista is derivable from it. Presently, this is merely a dream. We shall restrict ourselves to surveys of frequency and its different aspects, length, word classes, and sequences called motifs, and apply the status quo to the poetic texts of M. Eminescu. In this volume, the presented models and methods are used to characterise poetic texts but they can also be applied to properties of any texts in any language.

The word is, in a certain sense, a central linguistic unit. It is the clothing of concepts, so to say, their incarnation. Needless to say, communication is possible also without words but human communication is most effective when words are involved. Words and their properties are involved in syntactic constructions, in paradigmatic classes, in many control cycles of synergetic linguistics, and most dictionaries describe words and their properties. The number of words in a language is always underestimated. There are no complete dictionaries, of course. The greatest German dictionary contains about 300,000 words but linguists estimate the real extent at up to about 20 million (including terminology). The impossibility of capturing the complete stock is caused by the daily change of the vocabulary: words are born and die, and only a part of them is codified in a dictionary. They are applied in texts, and their usage is controlled on two levels: on the surface by the grammar of the language, and in depth by laws dictated by the mechanisms of communication. These latent mechanisms give rise to phenomena, a part of which will be described in this chapter. We shall touch on word frequency, vocabulary richness of texts, word length, and the representation of word classes evoking phenomena such as descriptiveness, nominal style, ornamentality, etc., but we shall not scrutinise all of them. It is in order to note here that capturing “all” properties of words in a work like that of Eminescu is absolutely impossible. On the other hand, many aspects we investigated yield neutral results, i.e. no observable tendencies or characteristic features. But with continuing research everything can turn out to be relevant for the development of language or literature. Thus we present many results but not all of them display remarkable features.

3.2 Frequency distribution

Counting word frequencies in texts is one of the earliest activities in quantitative linguistics. Perhaps the most famous case is Kaeding's (1897/98) frequency dictionary of German, which, like the majority of frequency dictionaries, represents a kind of l’art pour l’art. It provides only material, not data, because no hypothesis is associated with it. Nevertheless, under favourable conditions one can construct different kinds of data from it. Today, the investigation of the frequency of occurrence of words and other units in texts has so many aspects that it must be considered a discipline in its own right.

When units are counted, three forms can and must be distinguished: (a) word forms, (b) lemmas, (c) hrebs. Regardless of which variant is chosen, a text has to be pre-processed before any counting of linguistic units becomes possible. Technically speaking, a (written) text consists of a stream of symbols in which linguistically interesting units must be identified and segmented. In computational linguistics, this first step is called tokenisation. Fortunately, quite reliable software is available which performs this task for many languages. Tokenisation has to cope with many problems of detail, beginning with the decision which symbols will be considered characters and which separators. The recognition of punctuation marks, numbers, hyphenation and many other questions belong to this step. Only then can the identified tokens be processed to identify word forms. Compounding, multi-word units, proper names, abbreviations, foreign words etc. form complications for which appropriate (with respect to the intended investigation) decisions have to be taken. Lemmatisation has become a standard task in computational linguistics, at least for many languages and common language processing purposes. An alternative is the segmentation of the text into hrebs (cf. Hřebíček 1997; Ziegler, Altmann 2001), which consist of words or morphemes or even phrases (Köhler, Naumann 2007) with the same meaning or function. In this case, the problem of homonyms is already solved, and so are the morphemic analysis of the text and the problems of apostrophes, hyphens and compounds. But there is no program that could master this specific task, at least up to now. Thus a hreb-like analysis must be performed with paper and pencil, though some procedures have already been programmed.

The problem of “correctness” of our analysis does not exist here, because every analysis of a text is based on some criteria, and the criteria are not given a priori but set up by the researcher: they are conventions. In this sense, “correct” means “in agreement with definitions, conventions, etc.” If we search for external, objective criteria, then there are at least two: (1) Those definitions and criteria which yield results conforming to a linguistic law should be preferred over other ones. At the same time, this criterion is very demanding because there is nothing more difficult in science than to establish a law. (2) That analysis is better whose resulting entities have the most associations with other computed entities, i.e. the results of the analysis can be embedded in a system of related statements. Needless to say, external criteria are a prerequisite on the way to a theory whereas internal criteria contribute to description, classifications, etc. In the sequel, we shall introduce some of the relations which have been thoroughly studied in quantitative linguistics.
Word frequencies can be presented in three forms:
(i) As ranked frequencies, where the rank is the independent variable and the frequency the dependent one. This form is mostly called “Zipf's law” but it has many variants and different interpretations (cf. e.g. Zipf 1935; Mandelbrot 1953; Miller 1957; Popescu, Altmann, Köhler 2010). It has a very rich history. Its properties belong to the most extensively studied ones, not only in linguistics (cf. http://www.nslij-genetics.org/wli/zipf). As a matter of fact, it is not a distribution but an ordered sequence.
(ii) As a frequency spectrum, where the independent variable is the frequency (x = 1,2,…) and the dependent variable is the number of words with frequency x. Theoretically, it can be obtained by a transformation of the rank-frequency sequence. This relation indeed forms a frequency distribution.
(iii) As a cumulative frequency distribution, where the frequencies in (i) or (ii) are summed up step by step.

These presentations are already high abstractions and can be used to compute several indicators. In our present investigation, we shall operate with word forms. The apostrophes will be eliminated and the word parts will be joined (not replaced by a blank); the same will be done with hyphens; compounds represent one word form if they are written together. The boundary of the word form is formed by a blank (white space) or another separator such as a punctuation mark, including the beginning and the end of a text. This criterion is simply practical and may be applied mechanically, at least in alphabetic languages. Its adequacy can be corroborated only by means of the two external criteria mentioned above.

For the sake of illustration, we present the computing procedure as applied to Eminescu's poem Prin nopţi tăcute in Table 3.2.1. The first column contains the ranks r, assigned consecutively. We shall adhere to this technique, though it would be possible to ascribe to words with the same frequency their mean rank; in Table 3.2.1 the first and the second word would then both have rank 1.5. In our case this would complicate some computations. Within a given frequency class, the word forms in the last column are listed in reverse alphabetical order. This is rather a technical matter without any influence on the results. The frequency f(r) is the number of occurrences of the given word form in the given text. The cumulative frequency ∑f(r) is the stepwise addition of frequencies beginning with the first one. F(r) is the empirical distribution function obtained as

(3.2.1)  $F(r) = \frac{1}{N} \sum_{i=1}^{r} f(i)$,

where N is the sum of all frequencies (i.e. number of tokens = text length). Information of this kind is sufficient for any kind of statistical processing, but we shall use only a part of this information for descriptive purposes. The frequency spectrum is easily computed from the ranked frequencies simply by counting the numbers of ones, twos, threes,… in the second column. Then x (x = 1,2,3,…) is the occurrence frequency of words and f(x) is the number of words occurring x times. Using the data in Table 3.2.1 we set up the spectrum as presented in Table 3.2.2.
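The word-form convention and the columns of a rank-frequency table can be sketched as follows. This is a minimal Python illustration, not the procedure actually used for the book: the function names are hypothetical, lower-casing is an assumption suggested by the lower-case forms in Table 3.2.1, and the sample string consists of two lines of Adânca mare… quoted earlier.

```python
import re
from collections import Counter


def word_forms(text):
    """Split a text into word forms.

    Apostrophes and hyphens are deleted so that the word parts are joined;
    every other non-word character (blank, punctuation) acts as a separator.
    Lower-casing is an assumption of this sketch.
    """
    joined = re.sub(r"['’\-]", "", text)
    return re.findall(r"\w+", joined.lower())


def rank_table(frequencies):
    """Rows (r, f(r), cumulative frequency, F(r)), with F(r) as in (3.2.1)."""
    ranked = sorted(frequencies, reverse=True)
    n_tokens = sum(ranked)
    rows, cumulative = [], 0
    for rank, freq in enumerate(ranked, start=1):
        cumulative += freq
        rows.append((rank, freq, cumulative, cumulative / n_tokens))
    return rows


sample = "Adânca mare sub a lunei faţă; Înseninată de-a ei blondă rază,"
tokens = word_forms(sample)
counts = Counter(tokens)
print(tokens)
print(rank_table(counts.values())[:3])

# Frequencies of Prin nopţi tăcute as listed in Table 3.2.1
prin = rank_table([3, 3, 2, 2, 2, 2] + [1] * 34)
print(prin[0], prin[-1])
```

Note that `de-a` becomes the single word form `dea`, in line with the convention of joining the parts around a hyphen. The order of word forms within one frequency class is left unspecified here, since it does not affect the numerical columns.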

Table 3.2.1: Rank-frequency of word forms in Eminescu's poem Prin nopţi tăcute

Rank r   Frequency f(r)   Cumulative frequency ∑f(r)   F(r)     Word form
   1           3                      3                0.0625   prin
   2           3                      6                0.1250   din
   3           2                      8                0.1667   un
   4           2                     10                0.2083   şi
   5           2                     12                0.2500   luna
   6           2                     14                0.2917   lumea
   7           1                     15                0.3125   visuri
   8           1                     16                0.3333   vântul
   9           1                     17                0.3542   văd
  10           1                     18                0.3750   trece
  11           1                     19                0.3958   tăcute
  12           1                     20                0.4167   senină
  13           1                     21                0.4375   sece
  14           1                     22                0.4583   sunt
  15           1                     23                0.4792   rece
  16           1                     24                0.5000   plină
  17           1                     25                0.5208   plâng
  18           1                     26                0.5417   ochiumi
  19           1                     27                0.5625   obraz
  20           1                     28                0.5833   o
  21           1                     29                0.6042   nor
  22           1                     30                0.6250   nopţi
  23           1                     31                0.6458   mute
  24           1                     32                0.6667   mintea
  25           1                     33                0.6875   marea
  26           1                     34                0.7083   lunce
  27           1                     35                0.7292   lină
  28           1                     36                0.7500   lată
  29           1                     37                0.7708   iute
  30           1                     38                0.7917   în
  31           1                     39                0.8125   icoanai
  32           1                     40                0.8333   glas
  33           1                     41                0.8542   eu
  34           1                     42                0.8750   cu
  35           1                     43                0.8958   cea
  36           1                     44                0.9167   ce
  37           1                     45                0.9375   cată
  38           1                     46                0.9583   cânt
  39           1                     47                0.9792   beată
  40           1                     48                1.0000   aud

Table 3.2.2: Frequency spectrum of word forms in Eminescu's poem Prin nopţi tăcute

Number of occurrences x   Number of words occurring x times f(x)
          1                              34
          2                               4
          3                               2

In the same way as in Table 3.2.1, one can set up the cumulative frequencies and the empirical distribution function, too. Formally, the transformation of rank frequencies into a spectrum is done by solving

(3.2.2)  x ≤ f(r) < x + 1

for r, where f(r) is a theoretical function (cf. Haight 1966, 1969; Baayen 1989; Zörnig, Boroda 1992; Wimmer et al. 2003). The presentation of lemma and hreb frequencies is performed analogously.
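For observed data, the transformation amounts to counting how often each frequency value occurs among the ranked frequencies. A short Python sketch (the function names are hypothetical; the frequencies are those of Table 3.2.1):

```python
from collections import Counter


def spectrum(ranked_frequencies):
    """Map each frequency x to f(x), the number of word forms occurring x times."""
    return dict(sorted(Counter(ranked_frequencies).items()))


def ranked_from_spectrum(spec):
    """Inverse step: rebuild the decreasing frequency sequence from a spectrum."""
    ranked = []
    for x in sorted(spec, reverse=True):
        ranked.extend([x] * spec[x])
    return ranked


# Ranked frequencies of Prin nopţi tăcute (Table 3.2.1)
ranked = [3, 3, 2, 2, 2, 2] + [1] * 34
spec = spectrum(ranked)
print(spec)

n_tokens = sum(x * fx for x, fx in spec.items())  # text length N
n_types = sum(spec.values())                      # number of different word forms
print(n_tokens, n_types)
```

The spectrum keeps the text length and the vocabulary size but discards the identity of the word forms, which is why the ranked sequence of frequencies can be rebuilt from it while the word list cannot.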

3.2.1 Stratification

Many sophisticated arguments have been presented together with tentative models of the rank-frequency distribution of word-like units. The observed regularity is known as “Zipf's law”, though researchers before Zipf had already discovered the utility of the power function for similar purposes, cf. http://www.nslij-genetics.org/wli/zipf (02-01.2010), where Wentian Li collected much literature concerning this “law” in the sciences. Though Zipf himself did not have a concrete hypothesis and set up his formula simply on the basis of an inspection of frequencies, other researchers brought in miscellaneous arguments (e.g. B. Mandelbrot 1953; G.A. Miller 1957; J.K. Orlov 1982; V.V. Arapov 1977; Arapov, Šrejder 1977, 1978; W. Li 1992; Baayen 2001; Naranan, Balasubrahmanyan 2005; Manin 2009; Ferrer i Cancho 2005), and “Zipf's law” is perhaps the most widely diffused linguistic concept in all sciences. The development is driven, as is usual in science, by the emergence of counter-arguments, different views, new interrelations etc.

As already mentioned above, the rank-frequency sequence is not really a distribution because ranks do not represent a random variable. This problem concerned G. Herdan (1956, etc.), who criticised it wherever he could. However, the power function found a victorious way into several scientific disciplines; in probability theory it is called zeta distribution (sometimes also Joos' model, Riemann zeta distribution, Zipf distribution, Zipf's law, Zipf-Estoup law, etc., cf. Wimmer, Altmann 1999: 665 f.). Whether in the form of a function or of a distribution, it is a special case of the general theory of language laws (cf. Wimmer, Altmann 2005). However, it has been shown that the degree of synthetism of a language plays an important role for the adequacy of Zipf's formula, especially in the domain of hapax legomena. Popescu, Mačutek, Altmann (2009) fitted Zipf's formula to three texts in languages with different degrees of synthetism. The fitting results, as presented in Figures 3.2.1.1, 3.2.1.2 and 3.2.1.3, show that in a highly synthetic language such as Hungarian, the power function lies below the hapax legomena; in a highly analytic language such as Hawaiian, it lies above the hapax legomena; only in a moderately analytic-synthetic language such as Bulgarian does it cross the hapax legomena. This holds true, of course, only for unlemmatised texts.

Figure 3.2.1.1. Fitting the power function to a text in a highly synthetic language


Figure 3.2.1.2. Fitting the power function to a text in a highly analytic language

Figure 3.2.1.3. Fitting the power function to a text in a balanced synthetic language (from Popescu, Mačutek, Altmann 2009: 104 f.)

Though all fitting results in the figures are adequate, they miss a bit of realism. In order to reconcile the observed facts with our intuition, we consider the ranked frequencies as a sequence and adhere to the proposal made by Popescu, Altmann, Köhler (2010) and Altmann, Popescu, Zotta (2013)¹, considering the ranked frequencies as a superposition of different strata in the text. The simplest strata are formed by e.g. the parts-of-speech classes, autosemantics and synsemantics, specific words and general words, direct and indirect speech, persons in a stage play, etc. The study of strata is not finished; on the contrary, it is only beginning. According to this approach, different strata have their own frequency sequences organised in decreasing order, in the form of a decay. Decays usually have the form y = a exp(-x/r). If we consider a superposition of these strata, i.e. if we feed the individual strata into the main stratum and re-rank the whole field, we obtain the function

$y = \sum_{i} a_i \exp(-x / r_i)$

with a variable number of exponential components. In addition, since we know that frequencies cannot be smaller than 1, we add 1 as the limit and finally obtain

(3.2.1.1)  y = 1 + a1 exp(−x/r1) + a2 exp(−x/r2) + a3 exp(−x/r3) + …

¹ Altmann, G., Popescu, I.-I., Zotta, D. (2013). Stratification in Texts. Glottometrics 25, 85-93.

This function has several advantages: (i) It is not a distribution, hence it need not be normalised, and it satisfies even those who reject ranks as a random variable. (ii) It does not attain unrealistic values smaller than 1, a case that is present with all distributions applied to this phenomenon if they are not truncated at the right hand side (but often even in that case). (iii) It automatically provides information about the number of strata: if two exponential components have the same (or very similar) parameter ri, then one of the components is redundant and may be omitted. Parameter ai is only a part of the sum of all ai, which expresses the amplitude. When we eliminate a redundant component, a new fitting with a reduced number of exponential components automatically adds the lost part of the amplitude to the other ai's.

We fitted function (3.2.1.1) to 146 poems by Eminescu and obtained the results presented in Table 3.2.1.2. Adding a new component was stopped as soon as it contained an exponent identical with a previous one. The poem Memento mori, which is very long, is given in the table with three strata but it contains four. They are

y = 448.4420 exp(−x/1.6753) + 179.4063 exp(−x/9.7582) + 16.0070 exp(−x/79.0170) + 6.8450 exp(−x/412.8684)

and R² = 0.9938. However, the fitting result does not improve considerably, because R² is in both cases greater than 0.9, hence we shall accept the three-component version.

The table gives some answers and raises further questions. (1) We see that the maximal number of strata is 4 but it can be reduced to 3 because the poem Memento mori may be captured sufficiently well with 3 components. Thus for Eminescu we have the results presented in Table 3.2.1.1.

Table 3.2.1.1: Number of strata in 146 poems by Eminescu

No. of strata   No. of poems
      1              41
      2              66
      3              39

It can be supposed that there are no stratification preferences in Eminescu's poems, although the frequencies are not distributed uniformly (X² = 9.39 with 2 DF). (2) An increasing or decreasing number of strata in the course of time cannot be observed. (3) Though the shortest poems have only one stratum and the long poems have mostly 3 strata, a clear tendency (correlation with N) cannot be detected. This is caused by the weak variation of the number of strata. A tendency can at least be conjectured if we compare Eminescu's stratification with that of the Slovak poet E. Bachletová, who wrote only short poems (N < 171), all of which contain only one stratum. (4) Is there an association between the number of strata and the average length of the verses? To answer this question, verse length would have to be counted line by line and poem by poem, but again, the small variation within the data would preclude reliable statements. (5) Are there any semantic or content-dependent properties that cause the given stratification? (6) Does stratification depend on the presence of communication or speech acts? (7) Strata may arise even if a poem was not written in one go (this is always the case with long texts) or if the author or the editors made subsequent changes.

This list can easily be prolonged but we shall postpone it or leave it to those researchers who are specialised in some of the given domains. We take it for granted that the number of strata in a poem is not a random event but is surely rooted in some circumstances we do not know as yet. Each of the above items requires both historical and literary knowledge, and the research will boil down to interpretative conjectures, because Eminescu himself cannot be asked.


In order to illustrate the work with the stratification approach we present in Figures 3.2.1.4, 3.2.1.5, and 3.2.1.6 fitting results with (3.2.1.1) and one, two and three exponential components.

Figure 3.2.1.4. Fitting a mono-stratal poem Locul aripelor

Figure 3.2.1.5. Fitting a bi-stratal poem Speranţa


Figure 3.2.1.6. Fitting a tri-stratal poem Scrisoarea IV

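As can be seen in the last figures, the curves adequately capture the frequencies. However, the use of other formulas even for the mono-stratal case yields worse results. In Figures 3.2.1.7, 3.2.1.8, and 3.2.1.9 we present the fitting of Zipf's, Zipf-Mandelbrot's and Zipf-Alekseev's distributions to the data from the poem Locul aripelor. The formulas are as follows.

Right truncated Zipf/zeta distribution:

(3.2.1.2)  $P_x = \frac{x^{-a}}{F(n)}, \quad x = 1, 2, \ldots, n; \quad F(n) = \sum_{j=1}^{n} j^{-a}$

Right truncated Zipf-Mandelbrot distribution:

(3.2.1.3)  $P_x = \frac{(b + x)^{-a}}{F(n)}, \quad x = 1, 2, \ldots, n; \quad F(n) = \sum_{i=1}^{n} (b + i)^{-a}$

Right truncated modified Zipf-Alekseev distribution:

(3.2.1.4)  $P_x = \begin{cases} \alpha, & x = 1 \\ \dfrac{(1 - \alpha)\, x^{-(a + b \ln x)}}{T}, & x = 2, 3, \ldots, n \end{cases} \qquad T = \sum_{j=2}^{n} j^{-(a + b \ln j)}$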


Figure 3.2.1.7. Locul aripelor: Fitting the Right truncated zeta distribution with: a = 0.5305, n = 173, X2 = 12.7007, DF = 134, P ≈ 1.00, f(r) < 1 for r > 101

Figure 3.2.1.8. Locul aripelor: Fitting the Right truncated Zipf-Mandelbrot distribution with a = 0.6601, b = 3.8588, n = 173, X2 = 14.22, DF = 130, P ≈ 1.00, f(r) < 1 for r > 97

The result is excellent in each case, the probability is always ≈ 1.00, but the figures show some deficiencies and differences, especially the convergence towards 0, which is present even with the truncated zeta/Zipf distribution. Both approaches have similar problems: with a distribution as a model, it is not always easy to find a plausible interpretation of the parameters; with the stratification approach, the problem is the determination and interpretation of the strata. In the optimal case one ought to derive the parameters from a theory, but this is a dream of the future.
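The different tail behaviour of the two approaches can be illustrated numerically for Locul aripelor. The Python sketch below is an illustration only, under stated assumptions: it evaluates function (3.2.1.1) with the mono-stratal parameters listed for this poem in Table 3.2.1.2 (a1 = 8.0232, r1 = 11.4417, N = 259), and it assumes the usual form of the right truncated zeta distribution, with expected frequencies N r^(−a) / Σ j^(−a), together with the parameters a = 0.5305 and n = 173 given in the caption of Figure 3.2.1.7. The function names are hypothetical.

```python
from math import exp


def strata(x, components):
    """Function (3.2.1.1): 1 plus a sum of exponential decays a_i * exp(-x / r_i)."""
    return 1.0 + sum(a * exp(-x / r) for a, r in components)


def truncated_zeta_expected(n_tokens, a, n):
    """Expected frequencies N * r**(-a) / sum_{j=1..n} j**(-a) for r = 1..n.

    Assumed form of the right truncated zeta distribution.
    """
    norm = sum(j ** -a for j in range(1, n + 1))
    return [n_tokens * r ** -a / norm for r in range(1, n + 1)]


ranks = range(1, 174)  # n = 173 ranks
y = [strata(r, [(8.0232, 11.4417)]) for r in ranks]
z = truncated_zeta_expected(259, 0.5305, 173)

# Ranks at which the zeta model predicts fewer than one occurrence
below_one = [r for r, f in zip(ranks, z) if f < 1]

print(round(y[0], 2), round(y[-1], 4))
print(round(z[0], 2), round(z[-1], 4))
print(min(y) > 1, below_one[0])
```

The stratification curve stays above 1 by construction, whereas the expected zeta frequencies fall below 1 from roughly the hundredth rank onwards, in line with the remark "f(r) < 1 for r > 101" in the caption of Figure 3.2.1.7.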


Figure 3.2.1.9. Locul aripelor: Right truncated modified Zipf-Alekseev: a = 0.3351, b = 0.0337, n = 173, α = 0.0309, X2 = 13.98, DF = 130, P ≈ 1.00; f(r) < 1 for r > 97

In Table 3.2.1.2 the exponential parameters are arranged in increasing order. A preliminary examination yielded the following results:
– The parameter r1 of the mono-stratal data depends on text length N, as can be seen in Figure 3.2.1.10. It could be captured by means of a linear function but this circumstance cannot be generalised as yet.
– In bi-stratal data the relationship of parameters r1 and r2 displays a considerable dispersion (Figure 3.2.1.11). Though an increasing trend is visible, much more data are necessary to establish a relationship.
– In tri-stratal data there are three possibilities of combining two parameters, as can be seen in Figures 3.2.1.12, 3.2.1.13, and 3.2.1.14. In the first case, the outlier tempts one to accept a linear trend but the dispersion is too great. The relationship between r1 and r3 in Figure 3.2.1.13 is rather a cloud than a strict relationship. And that of r2 and r3 as presented in Figure 3.2.1.14 is again a very peculiar phenomenon.
Thus, until further investigations have been performed, we cannot set up even hypotheses about these relations. The result has been obtained abductively; it is pre-theoretical and needs substantiation.

Let us briefly recollect how the spectrum arises. It is created from the rank-frequency distribution by “turning it round” and without taking recourse to the ranking. The spectral g(1) is the number of different hapax legomena, g(2) the number of different dis legomena, g(3) the number of different tris legomena, etc., i.e. the number of different words occurring 1, 2, 3, etc. times.

Poem title                          Year      N          a1       r1        a2       r2       a3        r3      R2  Strata
Crăiasa din poveşti                 1876    122    620.1621   0.1967    2.3550  11.5610        X         X  0.9424       2
Criticilor mei                      1883    130      2.9378  14.4864         X        X        X         X  0.8984       1
Cu mâne zilele-ţi adaogi…           1883    141      5.9539   1.4186    2.1354  15.9017        X         X  0.9244       2
Cugetările sărmanului Dionis        1872    571     53.0793   0.7128   11.5949   6.7859   3.0792   31.9774  0.9899       3
Cum negustorii din Constantinopol   1874    101      5.0416   4.0443         X        X        X         X  0.9663       1
Cum oceanu-ntărâtat...              1873     77      4.3323   0.5342    3.0751   3.6733        X         X  0.9429       3
Dacă treci râul Selenei             1873    356     26.2532   2.7062    3.3878  20.7827        X         X  0.9759       2
De câte ori, iubito…                1879    102     30.1546   0.4500    1.9529   8.5934        X         X  0.9426       2
De ce nu-mi vii                     1887    123   1009.5406   0.1649    4.8512   2.2171   2.7798   12.0362  0.9657       3
De ce să mori tu?                   1869    266     12.5348   1.0288    7.9849  11.7215        X         X  0.9849       2
De-aş avea                          1866     93      5.9555   6.0681         X        X        X         X  0.9592       1
De-aş muri ori de-ai muri           1869    258      6.5861   4.1209    4.7201  15.1460        X         X  0.9755       2
Demonism                            1872    882     39.9979   3.1068    8.9490  30.5860        X         X  0.9692       2
De-oi adormi (variantă)             1883    122      4.0551   5.0045         X        X        X         X  0.9526       1
De-or trece anii…                   1883     87      8.1834   3.4878         X        X        X         X  0.9808       1
Departe sunt de tine                1878    135      3.5936   1.8192    4.4930   6.2784        X         X  0.9714       2
Despărţire                          1879    304     13.1881   1.1467    2.4289   4.8224   6.0690   14.3645  0.9844       3
Din Berlin la Potsdam               1873    128      6.1389   1.7924    2.8418   8.2511        X         X  0.9580       2
Din lyra spartă...                  1867     51      2.4715   3.5407         X        X        X         X  0.8790       1
Din noaptea                         1884     68      2.9175   4.6150         X        X        X         X  0.9120       1
Din străinătate                     1866    244     15.5203   2.2650    3.0936  17.1414        X         X  0.9711       2
Din valurile vremii…                1883    152   1048.0385   0.1447    5.5159   9.2768        X         X  0.9661       2
Dintre sute de catarge              1880     50      5.1076   2.3633         X        X        X         X  0.9265       1
Doi aştri                           1872     40   1023.8836   0.1443         X        X        X         X  1.0000       1
Dorinţa                             1876    102     16.4283   0.7296    2.2071   6.1129        X         X  0.9621       2

Poem title                          Year      N          a1       r1        a2       r2       a3        r3      R2  Strata
Dumnezeu şi om                      1873    443     15.2903   4.2676    2.5673  27.6472        X         X  0.9781       2
Ecò                                 1872    698     34.9965   4.5633    3.0133  40.4731        X         X  0.9836       2
Egipetul                            1872    688     26.4144   4.3648    3.4749  41.1845        X         X  0.9867       2
Epigonii                            1870    921     23.6970   2.4497   20.9277   6.3594   3.3973   58.2903  0.9887       3
Făt-Frumos din tei                  1875    415     11.4825   2.0479    4.4631   2.6915   6.4490   17.4201  0.9900       3
Feciorul de împărat fără de stea    1872   6030    142.3625   2.1681  130.6617  11.4048  10.6212  194.6782  0.9847       3
Floare-albastră                     1873    247     14.9171   2.2033    2.5109  16.0441        X         X  0.9256       2
Foaia veştedă (după Lenau)          1879    115    511.6083   0.1729    2.8761   5.8439        X         X  0.9461       2
Freamăt de codru                    1879    179      4.0893   1.9703    4.1896   7.9319        X         X  0.9752       2
Frumoasă şi jună                    1871    113   1811.9780   0.1551    4.7950   6.7276        X         X  0.9703       2
Ghazel                              1873    331      9.4178   2.5320    5.1449  16.7188        X         X  0.9815       2
Glossă                              1883    380     12.9813   1.2626    4.4059   5.1963   9.0697   18.4480  0.9872       3
Horia                               1867    143  19491.6295   0.1114    2.8539   8.6061        X         X  0.9508       2
Iar când voi fi pământ (variantă)   1883    131      9.9354   0.6885    3.0310   8.2061        X         X  0.9559       2
Împărat şi proletar                 1874   1510     47.4060   2.7852   19.9569  13.2994   4.9170   61.2710  0.9923       3
În căutarea Şeherazadei             1874    915     35.7173   3.5201    3.1089  10.8104   4.7263   39.9599  0.9931       3
Înger de pază                       1871     91     16.6935   0.5082    1.8382  10.9270        X         X  0.9064       2
Înger şi demon                      1873    876     23.0117   1.8070   16.2868  13.0474   2.4309   53.0710  0.9829       3
Îngere palid...                     1870     63      2.6544   4.5588         X        X        X         X  0.9004       1
Întunericul şi poetul               1869    249     11.1358   3.6606    2.1619  19.2127        X         X  0.9669       2
Iubind în taină…                    1883     87      2.6544   4.5588         X        X        X         X  0.9050       1
Iubită dulce, o, mă lasă            1871    337     12.0744   0.7234    7.3443  17.3343        X         X  0.9839       2
Iubitei                             1871    416     18.0428   1.5403    6.8214  24.1768        X         X  0.9888       2
Junii corupţi                       1869    458     20.7220   1.8898   11.7782  10.9045        X         X  0.9858       2

Kamadeva La Bucovina La mijloc de codru… La moartea lui Heliade La moartea lui Neamţu La moartea principelui Ştirbey La mormântul lui Aron Pumnul La o artistă (Ca a nopţii poezie) La o artistă (Credeam ieri...) La Quadrat La steaua Lacul Lasă-ţi lumea… Lebăda Lida Locul aripelor Luceafărul Mai am un singur dor Melancolie Memento mori Miradoniz Misterele nopţii Mitologicale Mortua est! Mureşanu

1887 1866 1883 1867 1870 1869 1866 1868

1869 1870 1886 1876 1883 1869 1866 1869 1883 1883 1876 1872 1872 1866 1873 1871 1876

Poem title

Year

219 110 71 90 225 41 66 259 1737 125 274 9773 636 155 681 491 2051

81 184 55 332 245 132 150 142

N

52.6423 6.7652 3.1825 1.9511 8.7220 4.5000 5198.9942 8.0232 109.0767 2.5801 2050.7692 459.7574 52.0807 14.7594 43.8444 2413.1118 90.3787

5.1921 6.6344 21.1647 2010.6876 22.1321 6.0559 502.6801 1009.8848

a1

0.5313 4.8902 3.6197 2.0911 1.5835 1.4427 0.1216 11.4417 1.1309 9.4739 0.1770 1.7496 1.8462 0.5051 1.9040 0.1537 1.9000

0.5548 1.4426 1.3732 0.1311 0.3829 6.5669 0.1630 0.1583

r1

3.1343 X X 5.0539 4.9613 X 1.9852 X 35.8357 X 7.8142 171.4638 5.7625 4.3472 6.7523 16.0205 31.4086

2.6971 2.8229 X 11.0637 1.5359 X 6.5360 3.4573

a2

20.0234 X X 3.8944 10.6603 X 4.7291 X 9.7030 X 4.2871 11.2838 8.0546 10.9757 8.9135 4.7484 14.4051

4.4592 14.6002 X 2.9660 3.4484 X 1.6817 1.7790

r2

X X X X X X X X 6.3336 X 2.7489 14.9325 4.4058 X 3.2527 4.5730 4.5796

X X X 4.3421 4.5061 X 2.5251 2.3545

a3

X X X X X X X X 82.1682 X 17.8317 234.1431 34.3053 X 38.8606 28.8266 120.0863

X X X 19.6092 15.6923 X 11.0561 14.9838

r3

0.9694 0.9466 0.9073 0.9741 0.9837 0.8348 0.9194 0.9833 0.9947 0.8653 0.9854 0.9930 0.9941 0.9639 0.9627 0.9900 0.9931

0.9338 0.9539 0.9699 0.9771 0.9690 0.9481 0.9614 0.9415

R2

2 1 1 2 2 1 2 1 3 1 3 3 3 2 3 3 3

2 2 1 3 3 1 3 3


Strata

1868 1873 1866 1869 1874 1880 1883 1872 1869 1878 1879 1879 1883 1883 1878 1887 1869 1873 1874

1873 1874 1871 1867 1882 1883

Year

Murmură glasul mării Napoleon Noaptea… Nu e steluţă Nu mă-nţelegi Nu voi mormânt bogat (variantă) Numai poetul O arfă pe-un mormânt O călărire în zori O stea pin ceruri O, adevăr sublime... O, mamă… Odă în metru antic Odin şi poetul Ondina (Fantazie) Oricâte stele… Pajul Cupidon… Pe aceeaşi ulicioară… Pe lângă plopii fără soţ Peste vârfuri Povestea codrului Povestea teiului Prin nopţi tăcute Privesc oraşul furnicar Pustnicul

Poem title

48 157 346 78 334 140 103 1429 871 85 148 138 199 47 220 390 48 173 380

119 240 177 54 384 113

N

2.9884 6.7169 71101.8760 2.7075 18.3782 3.9817 3.7195 54.6362 25.5026 1.9520 11.6078 3.3786 5.1965 7.7134 11.7077 2279.6645 2.9884 4182.5534 7.2483

20092.6605 23.3121 8.4605 2.6189 9.8973 5.6330

a1 3.4122 3.0347 5.5876 X 3.3607 X X 4.3446 8.1815 X 3.1713 4.1507 X 35.4042 22.2535 X X 4.4394 6.2670 X 2.6900 10.4876 X 4.8366 5.8695

3.3896 1.3403 0.1114 5.7072 3.7779 2.1181 6.1772 0.9785 1.2515 7.2602 3.3671 2.4637 1.3915 1.5149 1.4751 0.1617 3.3896 0.1398 4.1610

a2

0.1092 1.2236 0.6900 6.2878 5.7220 3.1103

r1

X 8.4609 4.3886 X 16.1917 9.4530 X 9.2958 6.7031 X X 7.0803 9.8897 X 16.5057 3.6789 X 2.2154 15.2017

5.8434 19.0081 9.2722 X 24.0198 X

r2 X X X X X X X X 2.6000 X X X X 4.3825 3.8466 X X X X X X 4.5035 X 2.9397 X

a3 X X X X X X X X 30.1466 X X X X 87.6763 48.8057 X X X X X X 21.3489 X 9.5797 X

r3

0.9141 0.9774 0.9760 0.9065 0.9859 0.9595 0.9453 0.9904 0.9914 0.8349 0.9697 0.9771 0.9808 0.9843 0.9442 0.9851 0.9141 0.9747 0.9881

0.9599 0.9786 0.9731 0.8884 0.9739 0.9756

R2

1 2 3 1 2 2 1 3 3 1 1 2 2 1 2 3 1 3 2

2 2 2 1 2 1


Strata

Poem title

Replici Revedere Rugăciunea unui dac S-a dus amorul Sara pe deal Scrisoarea I Scrisoarea II Scrisoarea III Scrisoarea IV Scrisoarea V Se bate miezul nopţii… Şi dacă… Singurătate Somnoroase păsărele… Sonete Speranţa Steaua vieţii Stelele-n cer Sus în curtea cea domnească Te duci... Trecut-au anii Unda spumă Venere şi Madona Veneţia (de Gaetano Cerri) Viaţa mea fu ziuă Vis

Year

1871 1879 1879 1883 1885 1881 1881 1881 1881 1881 1883 1883 1878 1883 1879 1868 1871 1879 1870 1883 1883 1869 1887 1883 1869 1876

147 141 357 219 156 1282 696 2278 1256 1027 45 53 172 55 265 245 70 91 128 84 88 59 393 79 105 177

N 17.6037 5.6647 14.4738 205.8417 2.6301 62.2749 14062.8669 113.6286 103.8187 45.3177 3.1155 3.5954 5.7752 4.2714 20.1325 28.0111 3.2808 5.1921 5.4093 1005.2837 6.3903 2.8065 18.1985 2.9884 4.7017 7.0752

a1 4.8747 7.5513 3.8073 0.1904 2.2015 2.9501 0.1374 2.2583 1.0165 3.1912 2.1840 5.3136 7.3088 2.6561 0.4237 1.4522 5.2560 0.5548 0.5535 0.1888 0.5808 5.1492 2.8804 3.3896 4.6823 1.6542

r1 X X 2.9110 8.9709 4.0110 9.1858 19.3143 40.4430 22.9525 11.1447 X X X X 5.4946 4.6618 X 2.6971 3.5815 3.9763 2.1689 X 3.8295 X X 2.5533

a2 X X 20.8656 8.0310 6.5669 44.9864 9.7624 9.8721 9.5323 11.3547 X X X X 13.5360 17.0697 X 4.4592 7.2919 3.4067 6.8424 X 28.3750 X X 13.3389

r2 X X X X X X 1.8867 5.6378 4.8122 3.9754 X X X X X X X X X X X X X X X X

a3 X X X X X X 48.6273 99.8282 62.6190 61.7916 X X X X X X X X X X X X X X X X

r3 0.9452 0.9662 0.9776 0.9853 0.9604 0.9833 0.9835 0.9802 0.9961 0.9903 0.9179 0.9257 0.9593 0.9502 0.9768 0.9815 0.9161 0.9342 0.9535 0.9875 0.9105 0.9059 0.9744 0.9195 0.9609 0.9537

R2 1 1 2 2 2 2 3 3 3 3 1 1 1 1 2 2 1 2 2 2 2 1 2 1 1 2


Strata


Figure 3.2.1.10. Dependence of r1 on N in mono-stratal data

Figure 3.2.1.11. The relationship of r1 and r2 in bi-stratal data

Figure 3.2.1.12. Relationship between r1 and r2 in tri-stratal data


Figure 3.2.1.13. Relationship between r1 and r3 in tri-stratal data

Figure 3.2.1.14. Relationship between r2 and r3 in tri-stratal data

For an illustration, let us consider the rank-frequency sequence of the poem Lacul, namely 6, 5, 4, 3, six successive 2s, and sixty successive 1s; the corresponding spectrum is g(1) = 60, g(2) = 6, g(3) = 1, g(4) = 1, g(5) = 1, and g(6) = 1. Transforming the data according to (3.2.2) in order to obtain the frequency spectra of individual poems, it can be shown that spectra, too, are stratified entities. Since spectra contain a true random variable, one can expect that the exponential fitting (3.2.1.1) will be at least as good as with ranked frequencies. As shown in Table 3.2.1.3, this is true, and only one stratum is required. Indeed, with increasing frequency x, the decay of the spectrum g(x) closely follows the simple exponential law g(x) = 1 + a*exp(-x/b) with a and b as text parameters. The mean determination coefficient of the 146 poems considered in Table 3.2.1.3 is very high, R̄2 = 0.9987, with a very small standard deviation of 0.0020.

Table 3.2.1.3: Exponential fitting of the spectrum g(x) = 1 + a*exp(-x/b) in 146 poems by Eminescu (an asterisk marks the transformation g*(x) = g(x) – g(W) + 1, where W = the greatest non-zero class; see the comment on indicator B in Chapter 3.2.6)

Poem title

a

b

R2

Adânca mare... Adio Ah, mierea buzei tale Amicului F.I. Amorul unei marmure* Andrei Mureşanu Atât de fragedă … Aveam o muză Basmul ce i l-aş spune ei Călin (file de poveste) Când Când amintirile… Când crivăţul cu iarna... Când marea... Când priveşti oglinda mărei Care-i amorul meu în astă lume Ce e amorul? Ce te legeni… Ce-ţi doresc eu ţie, dulce Românie Cine-i? Copii eram noi amândoi Crăiasa din poveşti Criticilor mei* Cu mâne zilele-ţi adaogi… Cugetările sărmanului Dionis Cum negustorii din Constantinopol Cum oceanu-ntărâtat... Dacă treci râul Selenei De câte ori, iubito… De ce nu-mi vii De ce să mori tu? De-aş avea* De-aş muri ori de-ai muri Demonism De-oi adormi (variantă)* De-or trece anii…

1354.4756 568.3042 735.7630 1001.1384 869.0966 4425.6238 859.3323 1597.6866 1240.1196 4102.0535 725.1972 4617.2887 2885.9093 449.0164 805.7183 1313.6504 737.9769 376.5405 649.0010 505.3207 1604.0263 364.6824 203.2197 331.8470 2704.4726 1733.1756 878.1955 1262.1373 508.9190 238.0511 2147.2913 427.8289 1102.0203 2017.5358 2827.1894 680.9978

0.3103 0.5328 0.5182 0.5323 0.5497 0.5681 0.4842 0.5100 0.5523 0.6249 0.4639 0.2395 0.4701 0.5049 0.4093 0.4309 0.4425 0.5446 0.5318 0.5206 0.4849 0.6219 0.8219 0.6918 0.4705 0.3171 0.3703 0.5237 0.5042 0.7091 0.3662 0.4395 0.4711 0.5685 0.2929 0.3888

0.9982 0.9995 0.9911 0.9993 0.9995 0.9981 0.9988 0.9991 0.9985 0.9985 0.9997 0.9977 0.9995 0.9986 0.9999 0.9979 1.0000 0.9989 0.9985 0.9997 0.9993 0.9997 0.9976 0.9970 0.9995 0.9992 1.0000 0.9999 0.9994 0.9984 0.9950 0.9954 0.9961 0.9970 0.9999 0.9995


Departe sunt de tine Despărţire Din Berlin la Potsdam Din lyra spartă... Din noaptea* Din străinătate Din valurile vremii… Dintre sute de catarge* Doi aştri Dorinţa Dumnezeu şi om Ecò Egipetul Epigonii Făt-Frumos din tei Feciorul de împărat fără de stea Floare-albastră Foaia veştedă (după Lenau) Freamăt de codru Frumoasă şi jună Ghazel Glossă Horia Iar când voi fi pământ (variantă) Împarat şi proletar În căutarea Şeherazadei Înger de pază Înger şi demon Îngere palid...* Întunericul şi poetul Iubind în taină…* Iubită dulce, o, mă lasă Iubitei Junii corupţi Kamadeva La Bucovina La mijloc de codru… La moartea lui Heliade La moartea lui Neamţu La moartea principelui Ştirbey La mormântul lui Aron Pumnul La o artistă (Ca a nopţii poezie) La o artistă (Credeam ieri...) La Quadrat

983.8634 1326.8446 675.3902 349.9895 1061.9776 933.7838 492.4733 1156.0026 12102.1473 782.4757 2160.4607 2537.8588 2531.8318 2480.0678 2437.8153 7251.3292 1420.0060 1070.6388 1645.0020 608.9169 1552.7107 1542.3236 1000.4145 777.4744 6333.5097 4337.9540 247.0794 3043.6635 470.0535 960.0351 1130.1664 1085.2744 932.7079 6592.8465 754.1010 650.0029 261.3333 1547.0818 884.7791 1365.4426 669.0598 330.7506 468.6121 452.0503

0.4162 0.4755 0.4743 0.4451 0.3186 0.5149 0.5463 0.2836 0.1727 0.4240 0.4791 0.5128 0.5162 0.5789 0.4259 0.6513 0.4527 0.3984 0.3868 0.4500 0.4736 0.4166 0.4379 0.4614 0.4554 0.4602 0.6586 0.5036 0.4181 0.5233 0.3539 0.5238 0.5931 0.3144 0.3977 0.5658 0.4477 0.4659 0.5316 0.3571 0.5124 0.6868 0.6937 0.5032

0.9998 0.9990 0.9998 0.9998 1.0000 0.9986 0.9973 1.0000 1.0000 1.0000 0.9999 0.9999 0.9992 0.9996 0.9987 0.9952 0.9984 1.0000 0.9996 0.9973 0.9990 0.9854 0.9995 0.9999 0.9991 0.9994 0.9959 0.9994 0.9999 0.9999 1.0000 0.9971 0.9962 0.9996 0.9999 1.0000 1.0000 0.9967 0.9976 0.9968 0.9994 0.9987 0.9990 0.9988


La steaua* Lacul Lasă-ţi lumea… Lebăda* Lida Locul aripelor* Luceafărul Mai am un singur dor Melancolie Memento mori Miradoniz Misterele nopţii Mitologicale Mortua est! Mureşanu Murmură glasul mării Napoleon Noaptea… Nu e steluţă* Nu mă-nţelegi* Nu voi mormânt bogat (variantă) Numai poetul* O arfă pe-un mormânt O călărire în zori O stea pin ceruri* O, adevăr sublime... O, mamă… Odă în metru antic* Odin şi poetul Ondina (Fantazie) Oricâte stele… Pajul Cupidon…* Pe aceeaşi ulicioară… Pe lângă plopii fără soţ Peste vârfuri Povestea codrului Povestea teiului Prin nopţi tăcute* Privesc oraşul furnicar Pustnicul Replici Revedere Rugăciunea unui dac S-a dus amorul

7881.7018 706.0953 1697.2405 672.4397 480.3582 1302.7430 2575.7621 519.8061 1116.5236 11401.9795 1927.2720 704.3565 2526.0404 1378.5337 2920.5188 1454.2049 714.8968 1636.4762 144.3022 1464.2586 4053.6331 515.9543 1000.1551 827.9697 683.8633 1844.0705 525.3332 760.3918 2471.2823 2840.3157 430.4143 3541.9342 799.2952 1463.1455 1157.9948 943.3530 1414.5720 515.9543 1125.8307 3102.1643 415.5590 628.3138 1774.9980 1578.8917

0.1999 0.4029 0.4019 0.3317 0.4381 0.4440 0.6715 0.5488 0.5081 0.6507 0.5326 0.4753 0.5125 0.5527 0.6840 0.3565 0.5891 0.3654 0.5839 0.5072 0.2634 0.3597 0.4304 0.6461 0.3881 0.4410 0.5170 0.4142 0.6440 0.5269 0.5119 0.2819 0.4462 0.3904 0.2834 0.5182 0.5216 0.3597 0.4400 0.3843 0.4944 0.4881 0.4695 0.3968

0.9999 0.9999 0.9988 1.0000 1.0000 0.9992 0.9984 0.9996 0.9998 0.9948 0.9986 0.9933 0.9997 0.9979 0.9982 0.9987 0.9994 0.9951 0.9979 0.9990 0.9999 1.0000 0.9988 0.9981 0.9999 0.9998 0.9970 0.9999 0.9978 0.9993 0.9993 0.9998 0.9999 0.9973 1.0000 0.9993 0.9996 1.0000 0.9999 0.9988 0.9987 0.9987 0.9999 0.9991


Sara pe deal Scrisoarea I Scrisoarea II Scrisoarea III Scrisoarea IV Scrisoarea V Se bate miezul nopţii… Şi dacă… Singurătate* Somnoroase păsărele… Sonete Speranţa Steaua vieţii Stelele-n cer Sus în curtea cea domnească Te duci... Trecut-au anii Unda spumă* Venere şi Madona Veneţia (de Gaetano Cerri)* Viaţa mea fu ziuă Vis

1528.3283 3893.9812 2429.4491 4835.4891 3500.8155 2045.7178 616.4617 141.1241 2386.1162 512.9476 1485.5921 478.7944 266.2396 881.1158 919.5371 1166.2636 496.1178 414.2072 836.2394 1988.4880 1046.7900 789.2336

0.3826 0.5117 0.5092 0.5835 0.5374 0.6208 0.3486 0.5902 0.3288 0.3881 0.4474 0.6460 0.5485 0.3859 0.4261 0.3370 0.4809 0.4047 0.6558 0.2897 0.3774 0.5145

0.9997 0.9982 0.9998 0.9988 0.9988 0.9992 1.0000 0.9880 0.9977 1.0000 0.9983 0.9979 1.0000 1.0000 0.9992 0.9997 0.9996 0.9999 0.9987 1.0000 0.9994 0.9999
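The step from a rank-frequency sequence to its spectrum, illustrated above with the poem Lacul, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are ours, and the parameters a = 706.0953 and b = 0.4029 are taken to be the Lacul entry of Table 3.2.1.3, which is an assumption about how the table columns pair up.

```python
import math
from collections import Counter


def spectrum(rank_frequencies):
    """Return g(x): the number of word types that occur exactly x times."""
    counts = Counter(rank_frequencies)
    return dict(sorted(counts.items()))


def fitted_spectrum(x, a, b):
    """Evaluate the one-stratum exponential law g(x) = 1 + a*exp(-x/b)."""
    return 1.0 + a * math.exp(-x / b)


# Rank-frequency sequence of "Lacul" as described in the text:
# 6, 5, 4, 3, six successive 2s and sixty successive 1s.
lacul = [6, 5, 4, 3] + [2] * 6 + [1] * 60

g = spectrum(lacul)

# Assumed to be the "Lacul" parameters from Table 3.2.1.3.
a_lacul, b_lacul = 706.0953, 0.4029

for x in sorted(g):
    print(x, g[x], round(fitted_spectrum(x, a_lacul, b_lacul), 2))
```

With these parameters the fitted values for x = 1 and x = 2 lie close to the observed 60 and 6, and the curve flattens towards 1 for the higher classes, in line with the high determination coefficient reported for this poem.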

3.2.2 Ord's criterion

The study of a writer's word frequencies is always performed with the intent to find some order in his writing. Though themes and style may change, we conjecture that there are some properties in the text that are mastered by the writer only partially, especially as long as the text is short. But as soon as the text begins to grow, the writer loses overall control and some of the properties develop without his conscious will. In such a situation, self-organisation begins to work and "pulls" the text to an attractor which must be found analytically. This is the case e.g. with the addition of strata, whose number may increase in the course of text writing (see Chapter 3.2.1). By self-organisation, new strata are laid on the old ones and our apparatus captures the probable form of the attractor. Here we shall restrict ourselves to another attractor, viz. Ord's criterion (cf. J.K. Ord, 1972), used frequently in linguistics. To this end we need the moments of the ranked frequency sequence, which may be a distribution or merely a sequence of numbers. We compute the usual first moment (average), the second central moment (here variance) and the third central moment, an indicator of the skewness of the ranked sequence. The definitions are

\[ m_1' = \bar{r} = \frac{1}{N}\sum_{r=1}^{V} r\,f(r), \tag{3.2.2.1} \]

where r are the ranks, f(r) are the frequencies, N = text size in words and V = the maximum rank (the vocabulary). Further

\[ m_2 = s^2 = \frac{1}{N}\sum_{r=1}^{V} (r-\bar{r})^2 f(r) \tag{3.2.2.2} \]

and

\[ m_3 = \frac{1}{N}\sum_{r=1}^{V} (r-\bar{r})^3 f(r). \tag{3.2.2.3} \]

Considering the moments individually in Eminescu's poems we would find great variation and no order. But if we use Ord's ratios defined as

\[ I = \frac{m_2}{m_1'} = \frac{s^2}{\bar{r}} \tag{3.2.2.4} \]

and

\[ S = \frac{m_3}{m_2}, \tag{3.2.2.5} \]

we can plot a representative of each text in a Cartesian coordinate system and obtain a kind of order. Up to now, all investigations in textology have shown that a writer in a certain language is positioned either in an elliptic cloud or, in stricter cases, on a straight line, independently of the property measured (cf. Ammermann 2001; Arlt 2006; Best 2001a, 2003; 2005a,b; Best, Kaspar 2001; Nemcová, Altmann 1994; Oakes 2007; Popescu et al. 2008; Popescu, Čech, Altmann 2011a; Wimmer et al. 2003: 100 ff.). Generally, two similar sequences of rank-frequencies fi and fj obey the rule fi = constant • fj, so that the constant is eliminated in Ord's ratios above and the corresponding values coincide. Hence, the more similar two rank-frequency sequences are, the closer are their representative points in the plot. In Table 3.2.2.1 we present the values for all poems by Eminescu and plot the results in Figure 3.2.2.1. In the plot, we omitted two very long texts in order to render the picture more lucid. Nevertheless, even the omitted texts lie exactly on the computed curve.

Figure 3.2.2.1. Ord's criterion for Eminescu's 146 rank-frequency sequences

Needless to say, Ord's indicators can also be computed from the theoretical values of the stratification formula, e.g. with

\[ f(r) = 1 + a\exp(-r/b) \tag{3.2.2.6} \]

the average would be

\[ m_1' = \frac{1}{N}\sum_{r=1}^{V} r\,(1 + a\exp(-r/b)). \tag{3.2.2.7} \]

For example, in the poem Adânca mare... we obtain empirically m1' = 26.57, and using the computed parameters (cf. Table 3.2.1.2 in Chapter 3.2.1) the expectation is

\[ m_1' = \frac{1}{75}\sum_{r=1}^{62} r\,(1 + 5.1159\exp(-r/3.1488)) = 26.71, \]

i.e. the difference is only in the decimal places. As can be seen, S is a power function of I and can be expressed as S = 0.30517 I^1.16736 with R2 = 0.9972. The straight line S = 2I – 1 is the boundary between the large area of the beta-binomial (= negative hypergeometric) distribution and the hypergeometric and beta-Pascal distributions, and it touches the binomial, Poisson and negative binomial distributions (cf. Ord 1972; Popescu et al. 2009: 154). As can be seen, Eminescu is placed entirely in the beta-binomial domain. The dependence of I on N is quite natural but that of S on I is not. The third central moment, which is also a component of S, is a result of self-organisation, a kind of adaptation of the skewness of the ranked sequence to the variance. Though all of these sequences have by definition a hyperbolic form, skewness and variance are not necessarily interdependent. It has been shown in other publications (cf. Popescu et al. 2009: 154-165; Wimmer et al. 2003: 99-102; Popescu, Čech, Altmann 2011a) that in most text collections in 20 languages, the relationship S = f(I) is usually an increasing straight line. Here, we would obtain a very good fit with S = -18.5433 + 0.9647(I) with R2 = 0.99. The kind of dependence and the parameters are features of the given text collection. A quasi-final theoretical decision cannot be made before analyses of different text sorts and individual writers have been performed.
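Ord's ratios and the theoretical average (3.2.2.7) are easy to compute directly from their definitions. The following Python sketch is meant only as an illustration; the function names are ours, and the toy sequence [2, 1, 1] is a made-up example rather than data from the corpus.

```python
import math


def ord_ratios(freqs):
    """Return Ord's ratios (I, S) for a ranked frequency sequence f(1..V).

    I = m2 / m1' and S = m3 / m2, following (3.2.2.1)-(3.2.2.5).
    A sequence with a single rank has m2 = 0, so S is undefined there.
    """
    n = sum(freqs)                       # text size N
    ranks = range(1, len(freqs) + 1)     # ranks 1..V
    m1 = sum(r * f for r, f in zip(ranks, freqs)) / n
    m2 = sum((r - m1) ** 2 * f for r, f in zip(ranks, freqs)) / n
    m3 = sum((r - m1) ** 3 * f for r, f in zip(ranks, freqs)) / n
    return m2 / m1, m3 / m2


def theoretical_mean(n, v, a, b):
    """Average rank (3.2.2.7) implied by f(r) = 1 + a*exp(-r/b)."""
    return sum(r * (1 + a * math.exp(-r / b)) for r in range(1, v + 1)) / n


# Hypothetical toy sequence, used only to show the mechanics.
toy = [2, 1, 1]
print(ord_ratios(toy))

# Multiplying all frequencies by a constant leaves (I, S) unchanged.
print(ord_ratios([3 * f for f in toy]))

# "Adânca mare...": N = 75, V = 62, a = 5.1159, b = 3.1488 (values from the text).
print(round(theoretical_mean(75, 62, 5.1159, 3.1488), 2))
```

The second call illustrates why proportional rank-frequency sequences fall on the same point of the plot: the constant cancels in both ratios.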


Pajul Cupidon... O arfă pe-un mormânt Sara pe deal Singurătate

14.3580 4.8041

14.4398 3.2689

14.6022 2.4098

4.4763

15.1658

15.4324

15.8942 11.0147

16.0946 7.4472

16.1075

16.2951

16.5634 5.3582

16.6383 8.6144

17.3485

17.4031

18.7352

19.0325 9.3166

19.1177

19.3205 7.3207

O stea prin ceruri

Cum oceanuntărâtat... Veneţia (de Gaetano Cerri) Kamadeva

Oricâte stele...

De-aş avea

Înger de pază

Iubind în taină...

Trecut-au anii

Te duci...

De-or trece anii...

Lacul

Stelele-n cer

Când amintirile...

Ce te legeni...

De câte ori, iubito...

Odă în metru antic

6.4573

5.8619

5.1325

7.1442

5.0704

3.2733

3.7007

La mormântul lui Aron Pumnul Adio

14.3434 4.5074

16.2218

16.8749

S

16.9221

14.0062

33.7436

33.7274

33.5602 17.6877

15.3924

14.0851

33.4095 13.6364

30.3494 9.6160

30.2561

29.8618 11.3191

29.6257

28.5780 12.1992

28.1931

27.2352

I

Care-i amorul meu în astă lume

La o artistă (Credeam ieri) Ah, mierea buzei tale

Pe lângă plopii fără soţ

Freamăt de codru

La Bucovina

15.9934 21.6960

28.0882 40.4923 20.3393

39.2957

38.4593 23.1001

37.7909

35.0398 12.5882

34.1670

Ce-ţi doresc eu ţie, dulce 33.7896 19.7728 Românie Privesc oraşul furnicar 33.8715 12.9692

Noaptea...

Atât de frageda…

Vis

Misterele nopţii

Din valurile vremii...

13.2642 2.9552

Adânca mare…

Poem title

La steaua

S

I

Poem title

Scrisoarea V

Epigonii

Inger si demon

Ondina (Fantazie)

Demonism

Ecò

Când crivăţul cu iarna...

Egipetul

Scrisoarea II

Mitologicale

Cugetările sărmanului Dionis Miradoniz

Junii corupţi

Mortua est!

Dumnezeu şi om

Făt-Frumos din tei

Aveam o muză

Pustnicul

Povestea teiului

Poem title 44.4743

S

46.6967

48.2603

181.2812 142.4597

174.8791 116.4110

166.9185 116.1771

166.8483 110.8842

161.6418 120.1606

137.7140 85.8646

136.5299 94.3774

133.9957 80.3625

133.7825 90.1734

133.6390 80.9262

119.1822 83.5678

113.2859 63.0100

93.9687 51.5615

89.4855 63.6093

88.0286 43.1048

81.1388

80.5931

75.4629 38.8569

74.4225

I


Locul aripelor Floare-albastră

20.8298 13.2573

4.1517

5.3182

10.2383

21.1285

21.2373

21.6335

21.7732

22.1753

Criticilor mei

De ce nu-mi vii

Frumoasă şi jună

Nu voi mormânt bogat (variantă) Foaia veştedă (dupa Lenau) Crăiasa din poveşti

Murmură glasul mării 22.8708 6.4199

11.1675

13.6821

20.8195 12.0935

Când marea...

Amorul unei marmure

Întunericul şi poetul

25.6754

47.5299

30.1077

25.6365

48.9285 28.7203

48.7903 21.8256

48.2531

47.9721

30.9101

23.1258

46.5548 26.4649

45.0691

44.7092 24.8054

De-aş muri ori de-ai muri 47.5829

Amicului F.I.

Din străinătate

La moartea lui Neamţu

Napoleon

20.3502 11.0975

La Quadrat

23.7420

18.6836

31.6412

S

44.0860 20.6707

42.5334

Lasă-ţi lumea...

41.9171

Povestea codrului S-a dus amorul

41.7168

I

Speranţa

Poem title

20.0956 6.6987

S

Viaţa mea fu ziuă

I

Când priveşti oglinda 19.3821 6.6813 mărei Cum negustorii din 19.4562 5.8108 Constantinopol Dorinţa 19.6653 5.7203

Poem title

Feciorul de împărat fără de stea Memento mori

Scrisoarea III

Călin (file de poveste)

Andrei Mureşanu

Mureşanu

Împărat şi proletar

Luceafărul

Odin şi poetul

Scrisoarea I

Scrisoarea IV

În căutarea Şeherezadei

Poem title

S

1350.4590 1354.7581

848.5269 842.0609

404.7692 327.5248

393.5057 329.6952

348.2138 287.8655

330.9081 290.9484

291.0834 208.6160

281.3298 246.8649

241.2197 200.7361

232.3401 176.0081

231.5885 173.3774

182.1950 109.1787

I


3.2.3 The lambda indicator

In texts of sufficient length, the frequencies which correspond to adjacent ranks are not equidistant. At the beginning of the sequence, the distances between f(r) and f(r+1) are great; with increasing ranks, the differences between the corresponding frequencies decrease. The tail of the sequence consists mostly of f(r) = 1, i.e. there is no difference. This way of structuring can be expressed by the length of the arc beginning at the distribution top (1, f(1)), see Figure 3.2.6.1, and ending at (V, f(V)), with V = maximum rank (the extent of the vocabulary); usually f(V) = 1. Since we are dealing with discrete quantities, the arc length can be expressed as

\[ L = \sum_{r=1}^{V-1} \left[ (f(r) - f(r+1))^2 + 1 \right]^{1/2}, \tag{3.2.3.1} \]

i.e. as the sum of Euclidean distances between neighbouring frequencies. Evidently, the greater the text size N (= area under the distribution), the greater is the arc length. In order to normalise L, i.e. to make it independent of N, Popescu, Čech and Altmann (2011) proposed the lambda indicator

\[ \Lambda = \frac{L\,(\log_{10} N)}{N}, \tag{3.2.3.2} \]

yielding values scattered around a horizontal straight line in the plot and enabling us to compare texts of different lengths. When we consider the poem Prin nopţi tăcute as shown in Chapter 3.2 (Table 3.2.1), we obtain the sequence 3, 3, 2, 2, 2, 2, followed by thirty-four successive 1s, from which it follows that L = [(3 – 3)^2 + 1]^(1/2) + [(3 – 2)^2 + 1]^(1/2) + … + [(1 – 1)^2 + 1]^(1/2) = 39.8284. Since N = 48, we obtain Λ = 39.8284(log10 48)/48 = 1.3950.
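The computation of L and Λ for Prin nopţi tăcute can be retraced with a short Python sketch. The function names are ours and serve only as an illustration of formulas (3.2.3.1) and (3.2.3.2).

```python
import math


def arc_length(freqs):
    """Arc length L (3.2.3.1): sum of Euclidean distances between the
    neighbouring points (r, f(r)) and (r + 1, f(r + 1))."""
    return sum(math.sqrt((f - g) ** 2 + 1) for f, g in zip(freqs, freqs[1:]))


def lambda_indicator(freqs):
    """Lambda (3.2.3.2): L * log10(N) / N, with N the text size."""
    n = sum(freqs)
    return arc_length(freqs) * math.log10(n) / n


# Ranked frequencies of "Prin nopţi tăcute": 3, 3, 2, 2, 2, 2 and
# thirty-four 1s, so that N = 48 and V = 40.
prin_nopti = [3, 3, 2, 2, 2, 2] + [1] * 34

print(round(arc_length(prin_nopti), 4))
print(round(lambda_indicator(prin_nopti), 4))
```

Of the 39 segments, 37 join equal frequencies and contribute 1 each, while the two steps 3 to 2 and 2 to 1 contribute the square root of 2 each, which gives L = 37 + 2(1.4142) = 39.8284.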

Table 3.2.3.1: Lambda in 146 poems by M. Eminescu

Year

Poem title (alphabetically)

N

L

Λ

Var (Λ)

1873

Adânca mare…

75

63.0645

1.5767

0.003986

1883

Adio

159

114.6410

1.5872

0.004079

1873

Ah, mierea buzei tale

228

147.5432

1.5259

0.001572

1869

Amicului F.I.

257

194.6569

1.8253

0.001052

1868

Amorul unei marmure

266

187.1290

1.7059

0.001915

1871

Andrei Mureşanu

2008

1057.0693

1.7387

0.000326

1879

Atât de frageda…

176

138.7396

1.7701

0.000954

1871

Aveam o muză

421

290.0402

1.8080

0.001917

1871

Basmul ce i l-aş spune ei

398

272.0975

1.7774

0.001117

1876

Călin (file de poveste)

2299

1199.1833

1.7534

0.000268

1869

Când

126

101.8929

1.6985

0.000737

1883

Când amintirile...

97

80.6569

1.6520

0.002305

1872

Când crivăţul cu iarna...

708

438.8507

1.7666

0.001252

1869

Când marea...

114

81.8929

1.4776

0.003552

1869

Când priveşti oglinda mărei

101

83.4787

1.6566

0.002213

1873

Care-i amorul meu în astă lume

213

158.8929

1.7369

0.001910

1883

Ce e amorul?

124

97.9274

1.6533

0.004057

1883

Ce te legeni...

102

79.3657

1.5629

0.003192

1867

Ce-ţi doresc eu ţie, dulce Românie

183

129.7148

1.6037

0.003101

1869

Cine-i?

129

95.7148

1.5660

0.003505

1871

Copii eram noi amândoi

375

265.9252

1.8253

0.000963

1876

Crăiasa din poveşti

122

96.9515

1.6580

0.001990

1883

Criticilor mei

130

91.2426

1.4837

0.001974

1883

Cu mâne zilele-ţi adaogi...

141

106.4787

1.6230

0.002482

1872

Cugetările sarmanului Dionis

571

406.6445

1.9632

0.000958

1874

Cum negustorii din Constantinopol

101

84.6569

1.6800

0.002164

1873

Cum oceanu-ntărâtat...

77

67.2426

1.6474

0.001497

1873

Dacă treci râul Selenei

356

244.4991

1.7523

0.001164

1879

De câte ori, iubito...

102

85.9907

1.6933

0.001029

1887

De ce nu-mi vii

123

85.7800

1.4575

0.003658

1869

De ce să mori tu?

266

177.8098

1.6209

0.002018

1866

De-aş avea

93

62.4787

1.3225

0.006866

1869

De-aş muri ori de-ai muri

258

171.1356

1.5997

0.002206

1872

Demonism

882

524.0894

1.7502

0.000931

1883

De-oi adormi (variantă)

122

105.2426

1.7998

0.002182

1883

De-or trece anii...

87

64.8929

1.4467

0.003931

1878

Departe sunt de tine

135

106.8929

1.6868

0.002983


1879

Despărţire

304

208.7707

1.7051

0.001639

1873

Din Berlin la Potsdam

128

101.3006

1.6677

0.003899

1867

Din lyra spartă...

51

43.8284

1.4675

0.002791

1884

Din noaptea

68

56.8284

1.5314

0.002883

1866

Din străinătate

244

174.3565

1.7060

0.001276

1883

Din valurile vremii...

152

105.8929

1.5200

0.002469

1880

Dintre sute de catarge

50

41.6503

1.4153

0.007118

1872

Doi aştri

40

38.4142

1.5385

0.000781

1876

Dorinţa

102

87.8126

1.7292

0.001200

1873

Dumnezeu şi om

443

327.2346

1.9548

0.001641

1872

Ecò

698

461.6051

1.8807

0.000996

1872

Egipetul

688

465.7789

1.9211

0.000442

1870

Epigonii

921

590.0754

1.8992

0.000570

1875

Făt-Frumos din tei

415

289.4209

1.8258

0.001232

1872

Feciorul de impărat fără de stea

6030

2445.6464

1.5332

0.000081

1873

Floare-albastră

247

192.1356

1.8612

0.002321

1879

Foaia veştedă (dupa Lenau)

115

100.0645

1.7931

0.002050

1879

Freamăt de codru

179

144.4853

1.8185

0.000871

1871

Frumoasă şi jună

113

85.3657

1.5510

0.003709

1873

Ghazel

331

235.8836

1.7957

0.001088

1883

Glossă

380

200.2494

1.3595

0.001759

1867

Horia

143

120.9907

1.8236

0.001519

1883

Iar când voi fi pământ (variantă)

131

107.4787

1.7371

0.001469

1874

Împărat şi proletar

1510

896.5804

1.8876

0.000491

1874

În căutarea Şeherezadei

915

615.4903

1.9921

0.000412

1871

Înger de pază

91

72.0645

1.5514

0.001199

1873

Inger şi demon

876

537.7659

1.8064

0.000980

1870

Îngere palid...

63

52.8284

1.5088

0.003236

1869

Întunericul şi poetul

249

180.4694

1.7367

0.002033

1883

Iubind în taină...

87

76.8284

1.7128

0.001976

1871

Iubită dulce, o, mă lasă

337

216.0618

1.6205

0.001782

1871

Iubitei

416

248.4707

1.5643

0.001104

1869

Junii corupţi

458

322.7674

1.8752

0.001988

1887

Kamadeva

81

70.2426

1.6550

0.001384

1866

La Bucovina

184

141.8929

1.7465

0.001464

1883

La mijloc de codru...

55

41.6363

1.3175

0.003397

1867

La moartea lui Heliade

332

231.7707

1.7600

0.001249

1870

La moartea lui Neamţu

245

175.3071

1.7095

0.001398

1869

La moartea principelui Ştirbey

132

99.4787

1.5981

0.003432


1866

La mormântul lui Aron Pumnul

150

118.8191

1.7237

0.001424

1868

La o artistă (Ca a nopţii poezie)

142

106.4049

1.6128

0.002545

1869

La o artistă (Credeam ieri)

219

158.7279

1.6963

0.001128

1870

La Quadrat

110

80.8929

1.5012

0.002015

1886

La steaua

71

61.8284

1.6121

0.002699

1876

Lacul

90

71.0711

1.5432

0.001176

1883

Lasă-ţi lumea...

225

170.5432

1.7829

0.001060

1869

Lebăda

41

37.2361

1.4647

0.004821

1866

Lida

66

57.6503

1.5894

0.002342

1869

Locul aripelor

259

174.8995

1.6297

0.002292

1883

Luceafărul

1737

885.3917

1.6514

0.000364

1883

Mai am un singur dor

125

103.2426

1.7319

0.001547

1876

Melancolie

274

201.9549

1.7968

0.001355

1872

Memento mori

9773

3961.9447

1.6175

0.000068

1872

Miradoniz

636

406.0453

1.7898

0.000703

1866

Misterele nopţii

155

111.8929

1.5812

0.001928

1873

Mitologicale

681

466.0482

1.9389

0.000647

1871

Mortua est!

491

307.1973

1.6837

0.001684

1876

Mureşanu

2051

1029.0071

1.6616

0.000296

1873

Murmură glasul mării

119

101.9907

1.7789

0.002033

1874

Napoleon

240

176.8790

1.7542

0.000967

1871

Noaptea...

177

130.3071

1.6550

0.002370

1867

Nu e steluţă

54

39.8284

1.2778

0.004077

1882

Nu mă-nţelegi

384

261.7793

1.7618

0.001325

1883

Nu voi mormânt bogat (variantă)

113

99.6569

1.8106

0.001814

1868

Numai poetul

48

39.8284

1.3950

0.004853

1873

O arfă pe-un mormânt

157

120.3071

1.6827

0.001113

1866

O călărire în zori

346

245.8645

1.8042

0.001767

1869

O stea prin ceruri

78

64.8284

1.5726

0.002338

1874

O, adevăr sublime...

334

235.3818

1.7786

0.001053

1880

O, mamă…

140

99.8929

1.5313

0.003158

1883

Odă în metru antic

103

83.2426

1.6267

0.002847

1872

Odin şi poetul

1429

763.2834

1.6852

0.000405

1869

Ondina (Fantazie)

871

557.6579

1.8823

0.000441

1878

Oricâte stele...

85

72.8284

1.6531

0.001285

1879

Pajul Cupidon...

148

118.7800

1.7418

0.003224

1879

Pe aceeaşi ulicioară...

138

104.4853

1.6202

0.002283

1883

Pe lângă plopii fără soţ

199

140.7213

1.6256

0.002001

1883

Peste vârfuri

47

40.0645

1.4254

0.003249


1878

Povestea codrului

220

171.7800

1.8290

0.000718

1887

Povestea teiului

390

271.1328

1.8013

0.001072

1869

Prin nopţi tăcute

48

39.8284

1.3950

0.004853

1873

Privesc oraşul furnicar

173

140.7559

1.8209

0.002139

1874

Pustnicul

380

273.9640

1.8599

0.001115

1871

Replici

147

81.8627

1.2070

0.007270

1879

Revedere

141

103.0711

1.5711

0.002439

1879

Rugăciunea unui dac

357

259.4127

1.8549

0.001310

1883

S-a dus amorul

219

155.5432

1.6623

0.002897

1885

Sara pe deal

156

129.4787

1.8203

0.002549

1881

Scrisoarea I

1282

744.5881

1.8051

0.000558

1881

Scrisoarea II

696

442.3648

1.8067

0.001246

1881

Scrisoarea III

2278

1236.8800

1.8230

0.000224

1881

Scrisoarea IV

1256

749.9394

1.8504

0.000669

1881

Scrisoarea V

1027

581.7540

1.7059

0.000425

1883

Se bate miezul nopţii...

45

39.8284

1.4632

0.003358

1883

Şi dacă...

53

37.2426

1.2116

0.005811

1878

Singurătate

172

135.0711

1.7556

0.001423

1883

Somnoroase păsărele...

55

46.2426

1.4633

0.002493

1879

Sonete

265

196.3071

1.7951

0.001229

1868

Speranţa

245

154.6554

1.5082

0.001430

1871

Steaua vieţii

70

55.2426

1.4561

0.003816

1879

Stelele-n cer

91

76.6569

1.6503

0.001156

1870

Sus în curtea cea domnească

128

104.6569

1.7229

0.001490

1883

Te duci...

84

72.9112

1.6703

0.003614

1883

Trecut-au anii

88

74.2426

1.6405

0.001218

1869

Unda spumă

59

46.8284

1.4055

0.003571

1887

Venere şi Madona

393

256.5522

1.6936

0.001318

1883

Veneţia (de Gaetano Cerri)

79

70.8284

1.7013

0.002293

1869

Viaţa mea fu ziuă

105

86.6569

1.6681

0.002036

1876

Vis

177

139.8929

1.7767

0.000944

Table 3.2.3.1 shows the lambda of each poem. The average over all 146 poems is Λ̄ = 1.6685 and the variance of the values is extremely small, viz. s^2 = 0.0235. Lambda is, in this form, independent of N. If we order the poems according to their years of origin, we obtain an almost horizontal straight line, viz. Λ = 2.6964 – 0.000548(year), and no test statistic (t or F) is significant. Thus, we have obtained an indicator which gives a picture of the frequency structure of a text.

Using the empirical mean and variance, we may set up a 95% confidence interval for the true mean by computing

\[ \bar{\Lambda} - 1.96\sqrt{s^2/n} \le \mu_\Lambda \le \bar{\Lambda} + 1.96\sqrt{s^2/n}, \]

where n = 146 is the number of poems. In our case this is 1.6685 – 1.96(0.0235/146)^(1/2) ≤ µΛ ≤ 1.6685 + 1.96(0.0235/146)^(1/2), yielding approximately 1.6436 ≤ µΛ ≤ 1.6934. The 95% interval for the individual values of Λ, i.e. Λ̄ ± 1.96s, is approximately 1.3680 ≤ Λ ≤ 1.9690. It is easy to see that this interval includes the golden section

\[ \varphi = \frac{1+\sqrt{5}}{2} = 1.6180, \]

which appears also elsewhere with different text phenomena. This fact forces us to look at the text from a different point of view. The writer controls the text both consciously and subconsciously. S/He cares for style and theme and as long as the text is short, s/he can maintain a kind of equilibrium of word repetitions, a property captured by the rank-frequency sequence of words. But if the text becomes longer, this control is lost either suddenly after a break in writing or gradually during the writing process itself, depending on the capability of the writer. At this point, the equilibrium is broken and one should not expect any kind of conscious structuring. But in this chaotic situation, self-organisation takes place and steers the frequency structure of the text towards a fixed point which is, in our case, the golden section. Although we believe in the writer's freedom of text construction, this behaviour of texts forces us to believe either in the existence of self-organisation or in the existence of background mechanisms working during the whole time of writing. Needless to say, the golden section has been found now in the interval of Romanian, but Λ can take quite different values in other languages (cf. Popescu, Čech, Altmann 2011: 49ff), which simply have other attractors or obey some other latent mechanisms. In some subsequent chapters we shall see that the golden section is an attractor appearing in different geometric properties of the ranked frequency sequence. Nevertheless, there is a certain dispersion of lambdas in texts, i.e. not all texts converge equally to or attain the golden section – there are differences between them. In order to test them at least asymptotically, we must know the variance of an individual lambda. As shown in Popescu, Mačutek, Altmann (2010) (cf. also Popescu, Čech, Altmann 2011) the variance can be computed as


$$\mathrm{Var}(L) = \frac{N-f_1}{1-\hat p_1}\sum_{r=2}^{V}\hat a_r^{2}\,\hat p_r\left(1-\frac{\hat p_r}{1-\hat p_1}\right) - 2\,\frac{N-f_1}{(1-\hat p_1)^{2}}\sum_{r=2}^{V-1}\sum_{s=r+1}^{V}\hat a_r\,\hat a_s\,\hat p_r\,\hat p_s, \tag{3.2.3.3}$$

where

$$\hat a_r = -\frac{(N-f_1)\,\dfrac{\hat p_{r-1}-\hat p_r}{1-\hat p_1}}{\sqrt{(N-f_1)^{2}\left(\dfrac{\hat p_{r-1}-\hat p_r}{1-\hat p_1}\right)^{2}+1}} + \frac{(N-f_1)\,\dfrac{\hat p_r-\hat p_{r+1}}{1-\hat p_1}}{\sqrt{(N-f_1)^{2}\left(\dfrac{\hat p_r-\hat p_{r+1}}{1-\hat p_1}\right)^{2}+1}}$$

for r = 2, ..., V−1, and

$$\hat a_V = -\frac{(N-f_1)\,\dfrac{\hat p_{V-1}-\hat p_V}{1-\hat p_1}}{\sqrt{(N-f_1)^{2}\left(\dfrac{\hat p_{V-1}-\hat p_V}{1-\hat p_1}\right)^{2}+1}},$$

where the symbols with a circumflex are the values estimated from the sample. For an asymptotic test for comparing two texts we need the variance of Λ, which is given by

$$\mathrm{Var}(\Lambda) = \left(\frac{\log_{10} N}{N}\right)^{2}\mathrm{Var}(L). \tag{3.2.3.4}$$

The variances of individual texts are presented in the last column of Table 3.2.3.1. The test can be performed by means of the normal approximation

$$u = \frac{\Lambda_1-\Lambda_2}{\sqrt{\mathrm{Var}(\Lambda_1)+\mathrm{Var}(\Lambda_2)}}. \tag{3.2.3.5}$$
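As a minimal sketch, Eqs. (3.2.3.3) and (3.2.3.4) can be evaluated directly from a list of word frequencies. The sketch assumes that the arc length is L = Σ sqrt((f_r − f_{r+1})² + 1) over the ranked frequencies and that Λ = L·log10(N)/N, as in Eq. (3.2.3.2); all function names are illustrative and are not taken from the cited works. The input is the frequency sequence of the poem Prin nopţi tăcute quoted in Section 3.2.4.

```python
import math


def arc_length(freqs):
    """Arc length L of the ranked frequency sequence (assumed definition)."""
    f = sorted(freqs, reverse=True)
    return sum(math.hypot(f[r] - f[r + 1], 1.0) for r in range(len(f) - 1))


def lambda_indicator(freqs):
    """Lambda = L * log10(N) / N (assumed form of Eq. 3.2.3.2)."""
    n_tokens = sum(freqs)
    return arc_length(freqs) * math.log10(n_tokens) / n_tokens


def var_arc_length(freqs):
    """Var(L) following Eq. (3.2.3.3); needs at least two word types."""
    f = sorted(freqs, reverse=True)
    n_tokens, v = sum(f), len(f)
    if v < 2:
        raise ValueError("at least two word types are required")
    n_rest = n_tokens - f[0]                    # N - f_1
    p = [x / n_tokens for x in f]               # estimated p_r
    q = [pr / (1.0 - p[0]) for pr in p]         # p_r / (1 - p_1)

    def slope(diff):
        x = n_rest * diff
        return x / math.sqrt(x * x + 1.0)

    # a[i] is the coefficient for rank i + 1; a[0] (rank 1) is not used.
    a = [0.0] * v
    for i in range(1, v):
        a[i] = -slope(q[i - 1] - q[i])
        if i < v - 1:                           # the last rank has no right-hand term
            a[i] += slope(q[i] - q[i + 1])

    single = sum(a[i] ** 2 * q[i] * (1.0 - q[i]) for i in range(1, v))
    double = sum(a[i] * a[j] * q[i] * q[j]
                 for i in range(1, v - 1) for j in range(i + 1, v))
    return n_rest * single - 2.0 * n_rest * double


def var_lambda(freqs):
    """Var(Lambda) following Eq. (3.2.3.4)."""
    n_tokens = sum(freqs)
    return (math.log10(n_tokens) / n_tokens) ** 2 * var_arc_length(freqs)


freqs = [3, 3, 2, 2, 2, 2] + [1] * 34           # Prin nopţi tăcute, N = 48, V = 40
lam = lambda_indicator(freqs)
var_lam = var_lambda(freqs)
print(round(lam, 4), round(var_lam, 4))
```

Under these assumptions the result agrees to four decimals with the values Λ = 1.3950 and Var(Λ) = 0.0049 listed for this poem in Table 3.2.3.5. The double sum is written out literally for readability; for long texts it could be replaced by the identity 2ΣΣ x_r x_s = (Σx_r)² − Σx_r².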

For the sake of illustration, consider the last two poems Vis and Viaţa mea fu ziuă with Λ(Vis) = 1.7767 and Var(Λ_Vis) = 0.000944 and Λ(Viaţa…) = 1.6681 and Var(Λ_Viaţa…) = 0.002036. Inserting these values into formula (3.2.3.5) we obtain

$$u = \frac{1.7767-1.6681}{\sqrt{0.000944+0.002036}} = 1.99,$$

which is significant at the α = 0.05 level. Though the differences in Λ may be very small (Eminescu's lambda values vary in the interval [1.2070, 1.9921]), they may nonetheless be significantly different.
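The worked example can be checked with a few lines of code. This is only a sketch of formula (3.2.3.5); the function name and the two-sided p-value computed from the standard normal distribution are illustrative additions.

```python
import math


def lambda_u_test(lam1, var1, lam2, var2):
    """Asymptotic normal test statistic u of formula (3.2.3.5)."""
    return (lam1 - lam2) / math.sqrt(var1 + var2)


# Values for Vis and Viaţa mea fu ziuă as quoted in the text.
u = lambda_u_test(1.7767, 0.000944, 1.6681, 0.002036)

# Two-sided p-value under the standard normal distribution.
p_value = math.erfc(abs(u) / math.sqrt(2.0))
significant = abs(u) > 1.96                     # alpha = 0.05, two-sided

print(round(u, 2), round(p_value, 3), significant)
```

The statistic lies only just above the critical value 1.96, so the two poems differ significantly at the 5% level but not by a wide margin.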

It can easily be shown that Λ does not depend on the development of the writer: there is no relation between the year of origin and Λ. The same holds for the relation to the elements of Ord's criterion. It is worthwhile to compare Λ with its maximum Λmax as allowed by the longest possible arc length Lmax. The latter can be slightly simplified and obtained without much computing in the form (cf. Popescu, Mačutek, Altmann 2009: 68) Lmax = f(1) + V − 2 = N − 1. Inserting this expression into Eq. (3.2.3.2), we obtain an approximate Λmax as

$$\Lambda_{\max} = \left(1-\frac{1}{N}\right)\log_{10} N, \tag{3.2.3.6}$$

which approaches the common logarithm of N as N grows (rendering 1/N → 0). In order to show the behaviour of both Λ and Λmax, we present their course for different text lengths in Figure 3.2.3.1 for 1185 texts in 35 languages (cf. Popescu, Čech, Altmann 2011: 11) and in Figure 3.2.3.2 for 146 poems by Eminescu. As can be seen, no text attains its lambda-maximum, and the greater the text size, the greater the difference between the observed lambda and maximum lambda.

Figure 3.2.3.1. Observed and maximum lambda values of 1185 texts of different size in 35 languages

Considering the figures showing lambda, one can see that the observed lambda values are roughly constant and the deviations observed in individual texts may hint at language, genre, and some stylistic idiosyncrasy which should be further investigated. However, it is worth noting a flat bow-like dependence of Λ on text size N, between about 500 and 1000, as can be seen in Figure 3.2.3.2, a course confirmed also for Czech and English (R. Čech, personal communication, 2011). This behaviour is expected because in very short texts no or only few words are repeated and Λ may approach Λmax quite closely, whereas in very long texts Λ decreases significantly because of word repetition. Generally, the shorter the text, the more chances it has to attain the maximum. This can be clearly seen if we plot relative lambda values, i.e. Λ/Λmax, against text size as shown in Figure 3.2.3.3. However, this dependence becomes blurred if N is measured in units of the area (h − 1)², i.e. in terms of the indicator b = N/(h − 1)², where h is the h-point (cf. Popescu, Mačutek, Altmann 2009: 144, Eq. 7.12), as illustrated in Figure 3.2.3.4.

Figure 3.2.3.2. Observed and maximum lambdas in Eminescu's poems of different size

Recent progress has been made in constructing a Λ indicator independent of text size (Popescu, Zörnig, Altmann, Glottometrics, 2013: 25, 43 ff.) by introducing a modified lambda, $\Lambda_{mod} = \Lambda\,(\log_{10} N)^{0.14282575}$, with the variance

$$\mathrm{Var}(\Lambda_{mod}) = (\log_{10} N)^{0.2858}\,\mathrm{Var}(\Lambda). \tag{3.2.3.8}$$


Figure 3.2.3.3. Relative lambda against text size for 146 poems by Eminescu.

Figure 3.2.3.4. Indicators Λ and b

As a closing application of the above lambda concept we consider the ranking of a few well-known poems by Eminescu in Tables 3.2.3.2, 3.2.3.3, and 3.2.3.4. The last table uses a sample consisting of the first 100 words of each poem and relates its lambda to the corresponding Λmax(100) = 1.98. Obviously, the ranking is very sensitive to the variable considered. Thus, in Table 3.2.3.2, ordered by descending values of the absolute lambda, the top belongs to the poem În căutarea Şeherezadei; in Table 3.2.3.3, ordered by descending values of the relative lambda, the top belongs to the poem Floare-albastră; and, finally, in Table 3.2.3.4, ordered by descending values of the relative lambda of the first 100 words, the top belongs to the poem Scrisoarea IV. In all cases the ranking emphasises the quality of the word rank-frequency distribution of the considered texts.

Table 3.2.3.2: Lambda in 7 poems by M. Eminescu

Poem title                               N     Λ
În căutarea Şeherezadei                915   1.9921
Floare-albastră                        247   1.8612
Scrisoarea IV                         1256   1.8504
Scrisoarea III                        2278   1.8230
Călin                                 2299   1.7534
Memento mori                          9773   1.6175
Feciorul de împărat fără de stea      6030   1.5332

Table 3.2.3.3: Relative lambda in 7 poems by M. Eminescu

Poem title                               N     Λ        Λmax     Λ/Λmax
Floare-albastră                        247   1.8612   2.3830   0.7810
În căutarea Şeherezadei                915   1.9921   2.9582   0.6734
Scrisoarea IV                         1256   1.8504   3.0965   0.5976
Scrisoarea III                        2278   1.8230   3.3561   0.5432
Călin                                 2299   1.7534   3.3601   0.5218
Feciorul de împărat fără de stea      6030   1.5332   3.7797   0.4056
Memento mori                          9773   1.6175   3.9896   0.4054
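The last two columns of Table 3.2.3.3 follow directly from Eq. (3.2.3.6). The following sketch recomputes them from N and Λ; the function name and the dictionary layout are illustrative choices.

```python
import math


def lambda_max(n_tokens):
    """Approximate maximum lambda, Eq. (3.2.3.6): (1 - 1/N) * log10(N)."""
    return (1.0 - 1.0 / n_tokens) * math.log10(n_tokens)


# (N, observed lambda) as given in Table 3.2.3.3
poems = {
    "Floare-albastră": (247, 1.8612),
    "În căutarea Şeherezadei": (915, 1.9921),
    "Scrisoarea IV": (1256, 1.8504),
    "Scrisoarea III": (2278, 1.8230),
    "Călin": (2299, 1.7534),
    "Feciorul de împărat fără de stea": (6030, 1.5332),
    "Memento mori": (9773, 1.6175),
}

relative = {}
for title, (n_tokens, lam) in poems.items():
    lmax = lambda_max(n_tokens)
    relative[title] = lam / lmax
    print(f"{title:34s} {n_tokens:5d} {lam:.4f} {lmax:.4f} {relative[title]:.4f}")

# Ranking by relative lambda, largest first
ranking = sorted(relative, key=relative.get, reverse=True)
```

The same function gives Λmax(100) = 0.99 · 2 = 1.98, the reference value used for the 100-word samples.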

Table 3.2.3.4: Lambda of first 100 words samples from 7 poems by M. Eminescu (Λmax(100) = 1.98)

Poem title                               N     Λ(100)   Λ(100)/Λmax(100)
Scrisoarea IV                         1256   1.7731   0.8955
Memento mori                          9773   1.7249   0.8712
Floare-albastră                        247   1.7238   0.8706
Scrisoarea III                        2278   1.7096   0.8634
Călin                                 2299   1.6896   0.8533
În căutarea Şeherezadei                915   1.6049   0.8106
Feciorul de împărat fără de stea      6030   1.5413   0.7784

As has been shown in Popescu, Čech, Altmann (2011), lambda is a kind of control of the entire production of an author. Besides different style, subject, text sort, and other influences, the author has a certain (subconscious) way of writing, rendering the latent structure (in this case the frequency structure) of the text similar to his other texts. It is a kind of perseveration. It depends on his feeling for rhythm, ease of writing, adopted customs, etc., leading to similarity (in extreme cases to monotony) in his work. On the other hand, he tries to avoid old structures and strives for some innovation. The clash of these forces boils down to the fact that not all texts have the same lambda structure and not all texts differ significantly from other ones. Since the lambda values of two texts can be tested for difference (using formula 3.2.3.5), we compare each poem with every other one and state the number of different texts in the analyzed work. For Eminescu we obtain the results presented in the last columns of Table 3.2.3.5. We suppose that these differences are not haphazard but follow a certain regularity which can be derived from the interaction of the two mentioned forces. Considering the number of similarities as a continuous variable, we conjecture that the relative rate of change of similarities, dy/y, depends on the difference between the two forces, i.e. it is proportional to the difference of innovation minus perseveration:

$$\frac{dy}{y} = \left(\frac{b}{x-m}-\frac{c}{M-x}\right)dx. \tag{3.2.3.7}$$

Here, y is the number of significant similarities, m is the minimum and M the maximum value of the variable x, and we let x = 1/Λ. Solving (3.2.3.7) we obtain the beta function

$$y = a\,(x-m)^{b}\,(M-x)^{c}. \tag{3.2.3.8}$$

Inserting the values presented in Table 3.2.3.5 we obtain the function shown in Figure 3.2.3.5. Since we consider x = 1/Λ, the minimum can be set at 0.5 and all the other values were found iteratively. The result is

$$y = 0.0038\,(x-0.5)^{2.0133}\,(2.1449-x)^{33.493}.$$

The coefficient of determination is R² = 0.87, which can be considered a very good result.

Figure 3.2.3.5. Fitting the beta function to the number of similarities
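To get a feeling for the fitted curve, the following sketch evaluates the beta function with the parameter values quoted above. The location of the maximum is not stated in the text; it is derived here by setting the right-hand side of (3.2.3.7) to zero, which gives x* = (bM + cm)/(b + c). Function and constant names are illustrative.

```python
# Fitted parameters as quoted in the text
A, B, C = 0.0038, 2.0133, 33.493
X_MIN, X_MAX = 0.5, 2.1449          # m and M


def similarities(x):
    """Fitted number of similar poems for x = 1/Lambda (beta function 3.2.3.8)."""
    if not X_MIN < x < X_MAX:
        return 0.0
    return A * (x - X_MIN) ** B * (X_MAX - x) ** C


def curve_maximum():
    """x at which b/(x - m) = c/(M - x), i.e. where dy/dx = 0."""
    return (B * X_MAX + C * X_MIN) / (B + C)


x_star = curve_maximum()
print(round(x_star, 4), round(1.0 / x_star, 4), round(similarities(x_star), 1))

for x in (0.52, 0.55, 0.60, 0.65, 0.70, 0.80):
    print(x, round(similarities(x), 1))
```

Under these assumptions the curve peaks near 1/Λ ≈ 0.59, i.e. poems whose Λ is close to the middle of the observed range tend to have the largest number of similar poems, while poems with extreme Λ have few.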

Table 3.2.3.5: Number of significantly similar poems for each of Eminescu's poems ranked by first publishing year (from Popescu, Mačutek, Altmann 2011: 103 ff.)

Year  Poem title                          Λ       Var(Λ)  1/Λ     Similarities
1866  De-aş avea                          1.3225  0.0069  0.7561  21
1866  Din străinătate                     1.7060  0.0013  0.5862  80
1866  La Bucovina                         1.7465  0.0015  0.5726  77
1866  La mormântul lui Aron Pumnul        1.7237  0.0014  0.5801  82
1866  Lida                                1.5894  0.0023  0.6292  72
1866  Misterele nopţii                    1.5812  0.0019  0.6324  64
1866  O călărire în zori                  1.8042  0.0018  0.5543  62
1867  Ce-ţi doresc eu ţie, dulce Românie  1.6037  0.0031  0.6236  82
1867  Din lyra spartă...                  1.4675  0.0028  0.6814  41
1867  Horia                               1.8236  0.0015  0.5484  54
1867  La moartea lui Heliade              1.7600  0.0012  0.5682  72
1867  Nu e steluţă                        1.2778  0.0041  0.7826  11
1868  Amorul unei marmure                 1.7059  0.0019  0.5862  85
1868  La o artistă (Ca a nopţii poezie)   1.6128  0.0025  0.6200  76
1868  Numai poetul                        1.3950  0.0049  0.7168  32
1868  Speranţa                            1.5082  0.0014  0.6630  44
1869  Amicului F.I.                       1.8253  0.0011  0.5479  49
1869  Când                                1.6985  0.0007  0.5888  70
1869  Când marea...                       1.4776  0.0036  0.6768  47
1869  Când priveşti oglinda mărei         1.6566  0.0022  0.6036  77
1869  Cine-i?                             1.5660  0.0035  0.6386  74
1869  De ce să mori tu?                   1.6209  0.0020  0.6169  74
1869  De-aş muri ori de-ai muri           1.5997  0.0022  0.6251  74
1869  Întunericul şi poetul               1.7367  0.0020  0.5758  90
1869  Junii corupţi                       1.8752  0.0020  0.5333  42
1869  La moartea principelui Ştirbey      1.5981  0.0034  0.6257  82
1869  La o artistă (Credeam ieri...)      1.6963  0.0011  0.5895  73
1869  Lebăda                              1.4647  0.0048  0.6827  49
1869  Locul aripelor                      1.6297  0.0023  0.6136  73
1869  O stea prin ceruri                  1.5726  0.0023  0.6359  64
1869  Ondina (Fantazie)                   1.8823  0.0004  0.5313  25
1869  Prin nopţi tăcute                   1.3950  0.0049  0.7168  32
1869  Unda spumă                          1.4055  0.0036  0.7115  28
1869  Viaţa mea fu ziuă                   1.6681  0.0020  0.5995  78
1870  Epigonii                            1.8992  0.0006  0.5265  22
1870  Îngere palid...                     1.5088  0.0032  0.6628  53
1870  La moartea lui Neamţu               1.7095  0.0014  0.5850  82
1870  La Quadrat                          1.5012  0.0020  0.6661  47
1870  Sus în curtea cea domnească         1.7229  0.0015  0.5804  82
1871  Andrei Mureşanu                     1.7387  0.0003  0.5751  61
1871  Aveam o muză                        1.8080  0.0019  0.5531  63
1871  Basmul ce i l-aş spune ei           1.7774  0.0011  0.5626  69
1871  Copii eram noi amândoi              1.8253  0.0010  0.5479  49
1871  Frumoasă şi jună                    1.5510  0.0037  0.6447  68
1871  Înger de pază                       1.5514  0.0012  0.6446  54
1871  Iubită dulce, o, mă lasă            1.6205  0.0018  0.6171  71
1871  Iubitei                             1.5643  0.0011  0.6393  58
1871  Mortua est!                         1.6837  0.0017  0.5939  79
1871  Noaptea...                          1.6550  0.0024  0.6042  79
1871  Replici                             1.2070  0.0073  0.8285  9
1871  Steaua vieţii                       1.4561  0.0038  0.6868  41
1872  Când crivăţul cu iarna...           1.7666  0.0013  0.5661  74
1872  Cugetările sărmanului Dionis        1.9632  0.0010  0.5094  8
1872  Demonism                            1.7502  0.0009  0.5714  70
1872  Doi aştri                           1.5385  0.0008  0.6500  47
1872  Ecò                                 1.8807  0.0010  0.5317  34
1872  Egipetul                            1.9211  0.0004  0.5205  12
1872  Feciorul de împărat fără de stea    1.5332  0.0001  0.6522  41
1872  Memento mori                        1.6175  0.0001  0.6182  47
1872  Miradoniz                           1.7898  0.0007  0.5587  59
1872  Odin şi poetul                      1.6852  0.0004  0.5934  63
1873  Adânca mare...                      1.5767  0.0040  0.6342  79
1873  Ah, mierea buzei tale               1.5259  0.0016  0.6554  48
1873  Care-i amorul meu în astă lume      1.7369  0.0019  0.5757  90
1873  Cum oceanu-ntărâtat...              1.6474  0.0015  0.6070  66
1873  Dacă treci râul Selenei             1.7523  0.0012  0.5707  72
1873  Din Berlin la Potsdam               1.6677  0.0039  0.5996  98
1873  Dumnezeu şi om                      1.9548  0.0016  0.5116  12
1873  Floare-albastră                     1.8612  0.0023  0.5373  51
1873  Ghazel                              1.7957  0.0011  0.5569  60
1873  Înger şi demon                      1.8064  0.0010  0.5536  55
1873  Mitologicale                        1.9389  0.0006  0.5158  12
1873  Murmură glasul mării                1.7789  0.0020  0.5621  76
1873  O arfă pe-un mormânt                1.6827  0.0011  0.5943  73
1873  Privesc oraşul furnicar             1.8209  0.0021  0.5492  60
1874  Cum negustorii din Constantinopol   1.6800  0.0022  0.5952  79
1874  Împărat şi proletar                 1.8876  0.0005  0.5298  25
1874  În căutarea Şeherezadei             1.9921  0.0004  0.5020  3
1874  Napoleon                            1.7542  0.0010  0.5701  70
1874  O, adevăr sublime...                1.7786  0.0011  0.5622  68
1874  Pustnicul                           1.8599  0.0011  0.5377  40
1875  Făt-Frumos din tei                  1.8258  0.0012  0.5477  51
1876  Călin (file de poveste)             1.7534  0.0003  0.5703  62
1876  Crăiasa din poveşti                 1.6580  0.0020  0.6031  75
1876  Dorinţa                             1.7292  0.0012  0.5783  81
1876  Lacul                               1.5432  0.0012  0.6480  52
1876  Melancolie                          1.7968  0.0014  0.5565  61
1876  Mureşanu                            1.6616  0.0003  0.6018  60
1876  Vis                                 1.7767  0.0009  0.5628  68
1878  Departe sunt de tine                1.6868  0.0030  0.5928  92
1878  Oricâte stele...                    1.6531  0.0013  0.6049  65
1878  Povestea codrului                   1.8290  0.0007  0.5467  47
1878  Singurătate                         1.7556  0.0014  0.5696  73
1879  Atât de fragedă…                    1.7701  0.0010  0.5649  68
1879  De câte ori, iubito...              1.6933  0.0010  0.5906  72
1879  Despărţire                          1.7051  0.0016  0.5865  84
1879  Foaia veştedă (dupa Lenau)          1.7931  0.0021  0.5577  72
1879  Freamăt de codru                    1.8185  0.0009  0.5499  49
1879  Pajul Cupidon...                    1.7418  0.0032  0.5741  96
1879  Pe aceeaşi ulicioară...             1.6202  0.0023  0.6172  75
1879  Revedere                            1.5711  0.0024  0.6365  65
1879  Rugăciunea unui dac                 1.8549  0.0013  0.5391  44
1879  Sonete                              1.7951  0.0012  0.5571  61
1879  Stelele-n cer                       1.6503  0.0012  0.6060  65
1880  Dintre sute de catarge              1.4153  0.0071  0.7066  43
1880  O, mamă...                          1.5313  0.0032  0.6530  61
1881  Scrisoarea I                        1.8148  0.0006  0.5510  46
1881  Scrisoarea II                       1.8067  0.0012  0.5535  61
1881  Scrisoarea III                      1.8230  0.0002  0.5485  42
1881  Scrisoarea IV                       1.8504  0.0007  0.5404  37
1881  Scrisoarea V                        1.7059  0.0004  0.5862  69
1882  Nu mă-nţelegi                       1.7618  0.0013  0.5676  72
1883  Adio                                1.5872  0.0041  0.6300  83
1883  Când amintirile...                  1.6520  0.0023  0.6053  78
1883  Ce e amorul?                        1.6533  0.0041  0.6049  93
1883  Ce te legeni...                     1.5629  0.0032  0.6398  69
1883  Criticilor mei                      1.4837  0.0020  0.6740  40
1883  Cu mâne zilele-ţi adaogi...         1.6230  0.0025  0.6161  75
1883  De-oi adormi (variantă)             1.7998  0.0022  0.5556  72
1883  De-or trece anii...                 1.4467  0.0039  0.6912  41
1883  Din valurile vremii...              1.5200  0.0025  0.6579  53
1883  Glossă                              1.3595  0.0018  0.7356  18
1883  Iar când voi fi pământ (variantă)   1.7371  0.0015  0.5757  80
1883  Iubind în taină...                  1.7128  0.0020  0.5838  85
1883  La mijloc de codru...               1.3175  0.0034  0.7590  18
1883  Lasă-ţi lumea...                    1.7829  0.0011  0.5609  65
1883  Luceafărul                          1.6514  0.0004  0.6055  59
1883  Mai am un singur dor                1.7319  0.0015  0.5774  87
1883  Nu voi mormânt bogat (variantă)     1.8106  0.0018  0.5523  63
1883  Odă în metru antic                  1.6267  0.0028  0.6147  76
1883  Pe lângă plopii fără soţ            1.6256  0.0020  0.6152  71
1883  Peste vârfuri                       1.4254  0.0032  0.7016  34
1883  S-a dus amorul                      1.6623  0.0029  0.6016  85
1883  Se bate miezul nopţii...            1.4632  0.0034  0.6834  42
1883  Şi dacă...                          1.2116  0.0058  0.8254  8
1883  Somnoroase păsărele...              1.4633  0.0025  0.6834  39
1883  Te duci...                          1.6703  0.0036  0.5987  96
1883  Trecut-au anii                      1.6405  0.0012  0.6096  65
1883  Veneţia (de Gaetano Cerri)          1.7013  0.0023  0.5878  91
1884  Din noaptea                         1.5314  0.0029  0.6530  61
1885  Sara pe deal                        1.8203  0.0025  0.5494  64
1886  La steaua                           1.6121  0.0027  0.6203  80
1887  De ce nu-mi vii                     1.4575  0.0037  0.6861  41
1887  Kamadeva                            1.6550  0.0014  0.6042  67
1887  Povestea teiului                    1.8013  0.0011  0.5552  59
1887  Venere şi Madona                    1.6936  0.0013  0.5905  74

Similar results have also been found in works of other authors in several languages (cf. Popescu, Čech, Altmann 2010). Of course, the interaction of the two forces, innovation and perseveration, may take another form, but idiosyncrasies can be studied only after they have been discovered.
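The counting of similarities can be sketched as a loop over all pairs of poems. The sketch assumes that two poems count as "similar" when the u-test of formula (3.2.3.5) is not significant at the 5% level (|u| < 1.96); the function name is illustrative. Only four poems from Table 3.2.3.5 are used here, so the resulting counts illustrate the mechanism and are not comparable with the last column of the table, which is based on all 146 poems.

```python
import math
from itertools import combinations


def count_similar(poems, critical=1.96):
    """For each poem, count the other poems whose lambda does not differ significantly."""
    counts = {title: 0 for title in poems}
    for (t1, (lam1, var1)), (t2, (lam2, var2)) in combinations(poems.items(), 2):
        u = (lam1 - lam2) / math.sqrt(var1 + var2)
        if abs(u) < critical:                   # assumed criterion for "similar"
            counts[t1] += 1
            counts[t2] += 1
    return counts


# (lambda, Var(lambda)) taken from Table 3.2.3.5
sample = {
    "Lacul": (1.5432, 0.0012),
    "Melancolie": (1.7968, 0.0014),
    "Mureşanu": (1.6616, 0.0003),
    "Vis": (1.7767, 0.0009),
}

counts = count_similar(sample)
print(counts)
```

In this small sample only Melancolie and Vis are mutually similar; all other pairs differ significantly.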

3.2.4 Entropy and repeat rate

In this section, we shall present two well-known indicators of frequency data, viz. Shannon's entropy and Herfindahl's (1950) indicator of concentration. Shannon's entropy is only one of a large number of formulas expressing the same property (cf. Esteban, Morales 1995), but it has been used many times in linguistics and is sufficient for our purposes. The concentration indicator was introduced into linguistics by G. Herdan (1962: 36-40; 1966: 271-273) under the name repeat rate. The formulas are very simple. Shannon's entropy is defined as

$$H = -\sum_{i=1}^{V} p_i\log_2 p_i = \log_2 N - \frac{1}{N}\sum_{i=1}^{V} f_i\log_2 f_i, \tag{3.2.4.1}$$

where pi is the relative frequency of the respective entity, i.e., pi = fi/N. The variance of H is

$$\mathrm{Var}(H) = \frac{\sum_{i=1}^{V} p_i\log_2^{2} p_i - H^{2}}{N}. \tag{3.2.4.2}$$

The relative entropy is given as

$$H_{rel} = \frac{H}{H_0}, \tag{3.2.4.3}$$

where H0 = log2 V, and V is the number of types (vocabulary) or form-types. The test for the difference of two texts with the entropies H1 and H2 can be performed by means of the t-test

$$t = \frac{H_1-H_2}{\sqrt{\mathrm{Var}(H_1)+\mathrm{Var}(H_2)}}, \tag{3.2.4.4}$$

where the degrees of freedom can be computed from

$$DF = \frac{[\mathrm{Var}(H_1)+\mathrm{Var}(H_2)]^{2}}{\dfrac{[\mathrm{Var}(H_1)]^{2}}{N_1}+\dfrac{[\mathrm{Var}(H_2)]^{2}}{N_2}}, \tag{3.2.4.5}$$

but since our data are usually very extensive, one can consider (3.2.4.4) as a normal test with infinitely many degrees of freedom, and the computation of formula (3.2.4.5) can be omitted. The domain of the entropy is H ∈ [0, log2 V]. The repeat rate is defined as

$$RR = \sum_{i=1}^{V} p_i^{2}, \tag{3.2.4.6}$$

computed usually as

$$RR = \frac{1}{N^{2}}\sum_{i=1}^{V} f_i^{2}, \tag{3.2.4.7}$$

i.e. using directly the absolute frequencies. The variance of RR is asymptotically equal to

$$\mathrm{Var}(RR) = \frac{4}{N}\left(\sum_{i=1}^{V} p_i^{3} - RR^{2}\right). \tag{3.2.4.8}$$

For the comparison of two texts, the asymptotic normal test can be applied. Since RR ∈ [1/V; 1], normalisation can be performed as follows:

$$RR_{rel} = \frac{1-RR}{1-1/V}. \tag{3.2.4.9}$$

Frequently, an alternative formula, the McIntosh version (1967), is employed:

$$RR_{Mc} = \frac{1-\sqrt{RR}}{1-1/\sqrt{V}}. \tag{3.2.4.10}$$

The mutual asymptotic relationship of entropy and repeat rate is

$$RR = \frac{2(H_0-H)+1}{V}. \tag{3.2.4.11}$$

We illustrate the computation by applying it to the poem Prin nopţi tăcute, for which we obtain the sequence of frequencies 3, 3, 2, 2, 2, 2, 1, 1, ..., 1 (with 34 ones). Here N = 48, V = 40, hence H0 = log2 40 = 5.3219. Since log2 2 = 1 and log2 1 = 0, we obtain

$$H = \log_2 48 - \frac{1}{48}\left[2\,(3\log_2 3) + 4\,(2\cdot 1)\right] = 5.5850 - \frac{9.5098+8}{48} = 5.2202$$

and Hrel = 5.2202/5.3219 = 0.9809. The variance is

$$\mathrm{Var}(H) = \frac{1}{48}\left\{2\cdot\frac{3}{48}\left[\log_2\frac{3}{48}\right]^{2} + 4\cdot\frac{2}{48}\left[\log_2\frac{2}{48}\right]^{2} + 34\cdot\frac{1}{48}\left[\log_2\frac{1}{48}\right]^{2} - 5.2202^{2}\right\} = 0.0072.$$

The values of the repeat rate can be obtained as follows:

$$RR = \frac{2\,(3^{2}) + 4\,(2^{2}) + 34\,(1^{2})}{48^{2}} = 0.0295,$$

$$RR_{Mc} = \frac{1-\sqrt{0.0295}}{1-1/\sqrt{40}} = 0.9838,$$

$$\mathrm{Var}(RR) = \frac{4}{48}\left[2\left(\frac{3}{48}\right)^{3} + 4\left(\frac{2}{48}\right)^{3} + 34\left(\frac{1}{48}\right)^{3} - 0.0295^{2}\right] = 0.0000179.$$
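The worked example for Prin nopţi tăcute can be reproduced with the following sketch of formulas (3.2.4.1) to (3.2.4.10). The function names are illustrative. The McIntosh variant is taken here with square roots, (1 − √RR)/(1 − 1/√V), which is an assumption made because this form reproduces the value of about 0.9838 reported for the poem.

```python
import math


def entropy_stats(freqs):
    """Return H, H_rel and Var(H) for a list of absolute frequencies."""
    n_tokens, v = sum(freqs), len(freqs)
    p = [f / n_tokens for f in freqs]
    h = -sum(pi * math.log2(pi) for pi in p)
    h_rel = h / math.log2(v)                              # H0 = log2(V)
    var_h = (sum(pi * math.log2(pi) ** 2 for pi in p) - h * h) / n_tokens
    return h, h_rel, var_h


def repeat_rate_stats(freqs):
    """Return RR, the McIntosh variant RR_Mc (assumed form) and Var(RR)."""
    n_tokens, v = sum(freqs), len(freqs)
    p = [f / n_tokens for f in freqs]
    rr = sum(pi * pi for pi in p)
    rr_mc = (1.0 - math.sqrt(rr)) / (1.0 - 1.0 / math.sqrt(v))
    var_rr = 4.0 / n_tokens * (sum(pi ** 3 for pi in p) - rr * rr)
    return rr, rr_mc, var_rr


freqs = [3, 3, 2, 2, 2, 2] + [1] * 34      # N = 48, V = 40
h, h_rel, var_h = entropy_stats(freqs)
rr, rr_mc, var_rr = repeat_rate_stats(freqs)

print(round(h, 4), round(h_rel, 4), round(var_h, 4))
print(round(rr, 4), round(rr_mc, 4), var_rr)
```

Small deviations in the last digit of RR_Mc and Var(RR) relative to the hand computation arise because the text inserts the rounded value RR = 0.0295 instead of the exact 68/2304.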


Both indicators express an aspect of the unevenness of the distribution of frequencies and can be interpreted in many different ways. Entropy is usually considered as an indicator of uncertainty but at the same time it is a measure of equilibrium or uniformity of the distribution of frequencies. Since –log2 pi is the quantity of self-information or uncertainty with which an entity occurs, expression (3.2.4.1) is the mathematical expectation of this quantity, i.e. an indicator of location. If we consider the text as a point in a V-dimensional space, then the repeat rate is also the square of the distance of this point from the origin. Further, if pi is considered a random variable, then the repeat rate is its mathematical expectation and in turn an indicator of location. However, originally it was introduced as a measure of concentration. Both indicators can be also used as measures of dispersion. It can easily be shown that they can be transformed into one another using formula (3.2.4.11). For example in the above case we obtain the repeat rate from the entropy as

$$RR = \frac{2\,(5.3219-5.2202)+1}{40} = 0.0301,$$

while direct computation yielded RR = 0.0295, i.e. the difference is in the third decimal place. All values of entropy and repeat rate of word-forms in 146 poems by Eminescu are presented in Table 3.2.4.1. Entropy is maximal if all entities have the same frequency of occurrence. But in that case the vocabulary richness of the text is also maximal. The entropy is minimal if a unique word is repeated in the text, but this cannot be found even in Dada texts. Nevertheless, a writer may repeat constructions, which may influence all indicators; cf. e.g. the last five lines of Eminescu's poem La mijloc de codru…:

Şi de lună şi de soare
Şi de păsări călătoare,
Şi de lună şi de stele
Şi de zbor de rândurele
Şi de chipul dragei mele.

Here the repetition of the words şi de (English “and of”) causes the poem to become an outlier with respect to many indicators and may impair otherwise clear relationships. On the other hand, if all entities have the same frequency, the repeat rate is minimal, i.e. great vocabulary richness is associated with a small repeat rate.

From this point of view, both indicators can textologically be interpreted as measures of vocabulary richness. The mutual empirical relationship of these two indicators for the 146 poems by Eminescu can be expressed by the function RR = 0.0072 + 10.3262 exp(−1.1436 H) with R² = 0.9350. The individual values are presented in Figure 3.2.4.1 and Table 3.2.4.1.

Figure 3.2.4.1. The relation between entropy and repeat rate

There are two outliers clearly visible in Figure 3.2.4.1, signalling specific aims of the writer. One is the poem Replici (H = 5.4313, RR = 0.0399), a “dialogue” between the poet and his sweetheart, with many repetitions of the words eu sunt (“I am”) and tu eşti (“you are”). The other one is the poem La mijloc de codru… (H = 4.5867, RR = 0.0718), discussed above, where the words şi de (“and of”) are repeated seven times in the last five lines. The mean relative entropy computed from all 146 poems is Hrel = 0.9548; whether this value is high or low cannot be evaluated unless works by other authors in the same language have been analyzed. A preliminary analysis of 54 poems by the Slovak writer Eva Bachletová yielded Hrel = 0.9745, which is slightly higher than Eminescu's mean; by this measure, Eminescu's poems show a slightly lower vocabulary richness.
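The two ways of obtaining RR from H, the asymptotic relation (3.2.4.11) and the empirical fit quoted above, can be compared numerically. The sketch below uses only values given in the text; the function names are illustrative, and the residuals for the two outliers are computed here and are not reported in the source.

```python
import math


def rr_asymptotic(h, v):
    """RR from entropy via formula (3.2.4.11), with H0 = log2(V)."""
    return (2.0 * (math.log2(v) - h) + 1.0) / v


def rr_empirical(h):
    """Empirical fit for the 146 poems: RR = 0.0072 + 10.3262 * exp(-1.1436 * H)."""
    return 0.0072 + 10.3262 * math.exp(-1.1436 * h)


# Prin nopţi tăcute: H = 5.2202, V = 40, directly computed RR = 0.0295
print(round(rr_asymptotic(5.2202, 40), 4), round(rr_empirical(5.2202), 4))

# Observed (H, RR) of the two outliers mentioned in the text
outliers = {"Replici": (5.4313, 0.0399), "La mijloc de codru…": (4.5867, 0.0718)}
residuals = {title: rr - rr_empirical(h) for title, (h, rr) in outliers.items()}
for title, res in residuals.items():
    print(title, round(res, 4))
```

Both outliers lie above the fitted curve, i.e. their repeat rate is higher than the entropy alone would suggest, which is in line with the heavy repetition of a few word pairs described above.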

Table 3.2.4.1: Entropies and repeat rates of word-forms in 146 poems by Eminescu

Poem title                                H       H0     Hrel    Var(H)  RR      RRMc    Var(RR)
Adânca mare…                         5.8038   5.9542  0.9747  0.0075  0.0212  0.9789  0.000012
Adio                                 6.4966   6.7944  0.9562  0.0070  0.0151  0.9689  0.000005
Ah, mierea buzei tale                6.8335   7.1699  0.9531  0.0051  0.0119  0.9720  0.000002
Amicului F.I.                        7.4229   7.5999  0.9767  0.0024  0.0069  0.9879  0.000003
Amorul unei marmure                  7.2359   7.5236  0.9618  0.0040  0.0090  0.9772  0.000001
Andrei Mureşanu                      9.0129   9.9816  0.9030  0.0021  0.0057  0.9548  0.0000002
Atât de fragedă…                     6.8106   7.0553  0.9653  0.0056  0.0123  0.9734  0.000004
Aveam o muză                         7.7134   8.1344  0.9482  0.0040  0.0078  0.9695  0.0000008
Basmul ce i l-aş spune ei            7.6351   8.0334  0.9504  0.0040  0.0082  0.9696  0.000001
Călin (file de poveste)              9.0441  10.1331  0.8925  0.0021  0.0066  0.9471  0.0000002
Când                                 6.4533   6.6439  0.9713  0.0059  0.0144  0.9780  0.000005
Când amintirile...                   6.1604   6.3219  0.9744  0.0062  0.0167  0.9804  0.000006
Când crivăţul cu iarna...            7.9948   8.7142  0.9174  0.0043  0.0086  0.9539  0.0000007
Când marea...                        6.0594   6.3219  0.9585  0.0082  0.0194  0.9691  0.000009
Când priveşti oglinda mărei          6.1927   6.3576  0.9741  0.0062  0.0166  0.9794  0.000007
Care-i amorul meu în astă lume       7.0575   7.2946  0.9675  0.0040  0.0095  0.9808  0.000001
Ce e amorul?                         6.2908   6.5546  0.9598  0.0086  0.0178  0.9662  0.000011
Ce te legeni...                      6.0180   6.2479  0.9632  0.0086  0.0202  0.9691  0.000014
Ce-ţi doresc eu ţie, dulce Românie   6.7026   6.9887  0.9591  0.0057  0.0128  0.9735  0.000003
Cine-i?                              6.2652   6.5392  0.9581  0.0080  0.0175  0.9682  0.000008
Copii eram noi amândoi               7.4797   7.9658  0.9390  0.0056  0.0107  0.9573  0.000003
Crăiasa din poveşti                  6.3904   6.5546  0.9750  0.0050  0.0144  0.9813  0.000005
Criticilor mei                       6.3568   6.5078  0.9768  0.0036  0.0137  0.9862  0.000001
Cu mâne zilele-ţi adaogi...          6.5508   6.7142  0.9757  0.0040  0.0125  0.9841  0.000002
Cugetările sărmanului Dionis         8.1207   8.6036  0.9439  0.0037  0.0070  0.9652  0.0000008
Cum negustorii din Constantinopol    6.2436   6.3923  0.9767  0.0055  0.0156  0.9823  0.000005
Cum oceanu-ntărâtat...               5.9713   6.0661  0.9844  0.0045  0.0177  0.9876  0.000005
Dacă treci râul Selenei              7.3167   7.8580  0.9311  0.0066  0.0126  0.9503  0.000004
De câte ori, iubito...               6.2581   6.3923  0.9790  0.0050  0.0154  0.9833  0.000006
De ce nu-mi vii                      6.0806   6.3576  0.9564  0.0082  0.0196  0.9666  0.00001
De ce să mori tu?                    7.0020   7.4263  0.9429  0.0060  0.0119  0.9645  0.000002
De-aş avea                           5.6489   5.9307  0.9525  0.0102  0.0253  0.9643  0.000015
De-aş muri ori de-ai muri            7.0085   7.3923  0.9481  0.0055  0.0114  0.9681  0.000002
Demonism                             8.2353   8.9658  0.9185  0.0035  0.0078  0.9546  0.0000006
De-oi adormi (variantă)              6.6007   6.7142  0.9831  0.0034  0.0117  0.9883  0.000002
De-or trece anii...                  5.6853   5.9773  0.9512  0.0125  0.0263  0.9586  0.000025
Departe sunt de tine                 6.4962   6.7142  0.9675  0.0062  0.0142  0.9760  0.000005
Despărţire                           7.2572   7.6582  0.9476  0.0051  0.0101  0.9674  0.000002
Din Berlin la Potsdam                6.4189   6.6294  0.9683  0.0063  0.0149  0.9761  0.000005
Din lyra spartă...                   5.3831   5.4594  0.9860  0.0051  0.0258  0.9885  0.00001
Din noaptea                          5.7306   5.8329  0.9825  0.0052  0.0208  0.9866  0.000006
Din străinătate                      7.0193   7.3923  0.9495  0.0062  0.0122  0.9639  0.000003
Din valurile vremii...               6.4330   6.7004  0.9601  0.0061  0.0148  0.9738  0.000003
Dintre sute de catarge               5.2039   5.3576  0.9713  0.0113  0.0320  0.9731  0.000038
Doi aştri                            5.2719   5.2854  0.9974  0.0012  0.0262  0.9977  0.000003
Dorinţa                              6.2445   6.4094  0.9743  0.0066  0.0165  0.9774  0.000009
Dumnezeu şi om                       7.9486   8.3219  0.9551  0.0036  0.0067  0.9726  0.0000007
Ecò                                  8.1211   8.7879  0.9241  0.0043  0.0086  0.9528  0.0000009
Egipetul                             8.2963   8.8202  0.9406  0.0033  0.0064  0.9654  0.0000005
Epigonii                             8.4986   9.8471  0.8631  0.0032  0.0066  0.9588  0.0000005
Făt-Frumos din tei                   7.7025   8.1344  0.9469  0.0042  0.0080  0.9681  0.000001
Feciorul de împărat fără de stea     9.6210  11.1491  0.8629  0.0011  0.0058  0.9437  6.00E-08
Floare-albastră                      7.2136   7.9484  0.9076  0.0055  0.0105  0.9687  0.000003
Foaia veştedă (dupa Lenau)           6.5227   6.6294  0.9839  0.0035  0.0123  0.9883  0.000002
Freamăt de codru                     6.9598   7.1599  0.9721  0.0043  0.0102  0.9810  0.000002
Frumoasă şi jună                     6.0989   6.3576  0.9593  0.0085  0.0192  0.9684  0.00001
Ghazel                               7.5083   7.8517  0.9563  0.0040  0.0081  0.9740  0.000001
Glossă                               6.9536   7.5774  0.9177  0.0056  0.0133  0.9538  0.000002
Horia                                6.7646   6.8948  0.9811  0.0034  0.0107  0.9870  0.000002
Iar când voi fi pământ (var.)        6.5771   6.7279  0.9776  0.0043  0.0124  0.9842  0.000003
Împărat şi proletar                  8.8488   9.7432  0.9082  0.0026  0.0062  0.9539  0.0000003
În căutarea Şeherezadei              8.5975   9.2143  0.9331  0.0031  0.0060  0.9617  0.0000004
Înger de pază                        6.0203   6.1497  0.9790  0.0049  0.0175  0.9845  0.000005
Înger şi demon                       8.3352   9.0224  0.9238  0.0033  0.0066  0.9606  0.0000004
Îngere palid...                      5.6359   5.7279  0.9839  0.0050  0.0219  0.9876  0.000007
Întunericul şi poetul                7.1177   7.4594  0.9542  0.0055  0.0108  0.9690  0.000002
Iubind în taină...                   6.1957   6.2668  0.9887  0.0029  0.0147  0.9919  0.000002
Iubită dulce, o, mă lasă             7.3540   7.7279  0.9516  0.0039  0.0086  0.9743  0.0000007
Iubitei                              7.4543   7.9069  0.9428  0.0039  0.0087  0.9691  0.0000008
Junii corupţi                        7.6844   8.2715  0.9290  0.0056  0.0101  0.9539  0.000002
Kamadeva                             6.0342   6.1293  0.9845  0.0043  0.0169  0.9880  0.000005
La Bucovina                          6.9459   7.1293  0.9743  0.0036  0.0099  0.9838  0.000001
La mijloc de codru...                4.5867   5.1293  0.8942  0.0385  0.0718  0.8826  0.000391
La moartea lui Heliade               7.4361   7.8138  0.9517  0.0045  0.0089  0.9704  0.000001
La moartea lui Neamţu                7.1826   7.4346  0.9661  0.0037  0.0088  0.9807  0.0000008
La moartea principelui Ştirbey       6.3571   6.6147  0.9611  0.0071  0.0157  0.9729  0.000005
La mormântul lui Aron Pumnul         6.6531   6.8580  0.9701  0.0053  0.0127  0.9781  0.000004
La o artistă (Ca a nopţii poezie)    6.5165   6.7004  0.9725  0.0046  0.0132  0.9814  0.000003
La o artistă (Credeam ieri)          6.9977   7.2479  0.9655  0.0043  0.0106  0.9764  0.000002
La Quadrat                           6.0524   6.3038  0.9601  0.0083  0.0195  0.9694  0.00001
La steaua                            5.8643   5.9542  0.9849  0.0045  0.0188  0.9882  0.000005
Lacul                                5.9155   6.1293  0.9651  0.0090  0.0210  0.9712  0.000013
Lasă-ţi lumea...                     7.0988   7.3837  0.9614  0.0050  0.0102  0.9743  0.000002
Lebăda                               5.1256   5.2095  0.9839  0.0077  0.0315  0.9842  0.000029
Lida                                 5.7414   5.8329  0.9843  0.0050  0.0207  0.9870  0.000008
Locul aripelor                       7.0855   7.4346  0.9530  0.0049  0.0102  0.9728  0.000001
Luceafărul                           8.6845   9.6795  0.8972  0.0024  0.0072  0.9482  0.0000003
Mai am un singur dor                 6.5857   6.6865  0.9849  0.0028  0.0115  0.9906  0.000001
Melancolie                           7.2069   7.5850  0.9502  0.0058  0.0111  0.9640  0.000003
Memento mori                        10.1306  11.8041  0.8582  0.0008  0.0056  0.9409  5.00E-08
Miradoniz                            7.8641   8.5584  0.9189  0.0049  0.0106  0.9458  0.000002
Misterele nopţii                     6.5452   6.7814  0.9652  0.0053  0.0134  0.9776  0.000003
Mitologicale                         8.1924   8.7879  0.9322  0.0040  0.0082  0.9551  0.000001
Mortua est!                          7.6630   8.2046  0.9340  0.0044  0.0089  0.9616  0.000001
Mureşanu                             8.8851   9.9084  0.8967  0.0022  0.0065  0.9502  0.0000002
Murmură glasul mării                 6.5038   6.6439  0.9789  0.0045  0.0131  0.9841  0.000003
Napoleon                             7.0924   7.4009  0.9583  0.0052  0.0110  0.9697  0.000003
Noaptea...                           6.7288   7.0000  0.9613  0.0056  0.0124  0.9747  0.000003
Nu e steluţă                         5.1944   5.3219  0.9760  0.0074  0.0314  0.9815  0.000014
Nu mă-nţelegi                        7.6089   8.0056  0.9504  0.0040  0.0079  0.9717  0.0000007
Nu voi mormânt bogat (var.)          6.5094   6.6294  0.9819  0.0042  0.0128  0.9861  0.000003
Numai poetul                         5.2202   5.3219  0.9809  0.0072  0.0295  0.9837  0.000018
O arfă pe-un mormânt                 6.6392   6.8826  0.9646  0.0059  0.0132  0.9749  0.000004
O călărire în zori                   7.5010   7.8642  0.9538  0.0043  0.0090  0.9688  0.000002
O stea prin ceruri                   5.9230   6.0224  0.9835  0.0043  0.0181  0.9981  0.000004
O, adevăr sublime...                 7.3073   7.8202  0.9344  0.0065  0.0118  0.9549  0.000003
O, mamă…                             6.3493   6.6147  0.9599  0.0068  0.0159  0.9720  0.000005
Odă în metru antic                   6.2373   6.3750  0.9784  0.0047  0.0152  0.9849  0.000003
Odin şi poetul                       8.6319   9.4998  0.9086  0.0026  0.0065  0.9548  0.0000003
Ondina (Fantazie)                    8.4225   9.0634  0.9293  0.0032  0.0066  0.9605  0.0000005
152  The word Poem title

H

H0

Hrel

Var(H)

RR

RRMc

Var(RR)

Oricâte stele...

6.1182

6.1898 0.9884 0.0028 0.0154 0.9922 0.000002

Pajul Cupidon...

6.5377

6.8455 0.9550 0.0087 0.0161 0.9630 0.000009

Pe aceeaşi ulicioară...

6.4398 6.6865 0.9631 0.0067 0.0150 0.9734 0.000005

Pe lângă plopii fără soţ

6.7834

7.1085

Peste vârfuri

5.1213

5.2854 0.9690 0.0134 0.0349 0.9684 0.000063

Povestea codrului

7.1731

7.3923 0.9703 0.0039 0.0091 0.9800 0.000002

Povestea teiului

7.6147 8.0279 0.9485 0.0043 0.0084 0.9683 0.000001

0.9543 0.0060 0.0126 0.9703 0.000003

Prin nopţi tăcute

5.2202

Privesc oraşul furnicar

6.8569 7.0875

5.3219 0.9809 0.0072 0.0295 0.9837 0.000018

Pustnicul

7.7049 8.0768 0.9540 0.0039 0.0074 0.9734 0.0000007

Replici

5.4313

Revedere

6.4349 6.6724 0.9644 0.0060 0.0145 0.9761 0.000003

0.9675 0.0055 0.0118 0.9750 0.000004

6.1898 0.8775 0.0177 0.0399 0.9062 0.000033

Rugăciunea unui dac

7.5905 7.9830 0.9508 0.0046 0.0086 0.9682 0.000001

S-a dus amorul

6.8833 7.2479 0.9497 0.0063 0.0124 0.9673 0.000003

Sara pe deal

6.8303 7.0000 0.9758 0.0042 0.0108 0.9831 0.000002

Scrisoarea I

8.6764 9.4676 0.9164 0.0027 0.0065 0.9555 0.0000004

Scrisoarea II

8.0706 8.7245 0.9250 0.0039 0.0078 0.9585 0.0000006

Scrisoarea III

9.0155 10.1624 0.8871 0.0024 0.0079 0.9390 0.0000003

Scrisoarea IV

8.6215 9.4491 0.9124 0.0029 0.0072 0.9514 0.0000005

Scrisoarea V

8.2784 9.1033 0.9094 0.0035 0.0084 0.9490 0.0000006

Se bate miezul nopţii...

5.2529 5.3219 0.9870 0.0054 0.0281 0.9885 0.000014

Şi dacă...

5.0294 5.2095 0.9654 0.0108 0.0352 0.9721 0.000027

Singurătate

6.8570 7.0661 0.9704 0.0045 0.0108 0.9807 0.000002

Somnoroase păsărele...

5.4040 5.5236 0.9784 0.0078 0.0268 0.9810 0.000019

Sonete

7.3509 7.5999 0.9672 0.0034 0.0079 0.9816 0.000001

Speranţa

6.6599 7.1599 0.9302 0.0080 0.0172 0.9481 0.000008

Steaua vieţii

5.6506 5.7814 0.9774 0.0063 0.0224 0.9827 0.000009

Stelele-n cer

6.1082 6.2479 0.9776 0.0058 0.0170 0.9822 0.000007

Sus în curtea cea domnească

6.5576 6.7004 0.9787 0.0040 0.0123 0.9856 0.000002

Te duci...

5.8442 6.0875 0.9600 0.0125 0.0249 0.9582 0.000041

Trecut-au anii

6.1099 6.2095 0.9840 0.0040 0.0160 0.9884 0.000003

Unda spumă

5.4375 5.5546 0.9789 0.0066 0.0256 0.9836 0.000011

Venere şi Madona

7.5182 7.9484 0.9459 0.0043 0.0090 0.9667 0.000001

Veneţia (de Gaetano Cerri)

6.0821 6.1497 0.9890 0.0031 0.0159 0.9918 0.000003

Viaţa mea fu ziuă

6.2773 6.4263 0.9768 0.0052 0.0151 0.9829 0.000004

Vis

6.9242 7.1085 0.9741 0.0039 0.0102 0.9828 0.000002

Frequency distribution  153

3.2.5 Gini's coefficient

Another geometric property of the rank-frequency sequence is Gini's coefficient, which is rarely applied in textology (cf. Popescu, Altmann 2006; Popescu et al. 2009: 54 ff.) but seems to reflect an aspect of vocabulary richness. The usual, original way of computing it is lengthy, while a simplified variant yields the same results. Originally, the frequencies are considered in reverse order, beginning with the smallest frequency, so that the ranking is automatically reversed, too. Then both the frequencies and the ranks are cumulated and relativised. That means that the cumulative relative frequencies form an arc running from (0,0) to (1,1), where it meets the bisector. The arc is usually called the Lorenz curve. The magnitude of the area between the bisector and the Lorenz curve yields Gini's coefficient, as can be seen in Figure 3.2.5.1. It could be computed by adding the areas of small trapezoids between the bisector and the Lorenz curve, but fortunately there are several equivalent expressions. We shall use the definition

(3.2.5.1)  $G = \frac{1}{V}\left(V + 1 - \frac{2}{N}\sum_{r=1}^{V} r f(r)\right)$,

which is based on the usual ranks and frequencies. V is the size of the vocabulary and N stands for the text length. Considering the last expression in (3.2.5.1), we see that it is nothing else but 2m1', i.e. twice the mean rank. The greater the area, the smaller the vocabulary richness. This can easily be seen if one considers the maximal richness of a text in which all words occur exactly once. In that case V = N and f(r) = 1 for every rank, so that the Lorenz curve runs along the diagonal and G (i.e. the area between the diagonal and the Lorenz curve) is zero. In order to express the richness appropriately, Popescu et al. (2009) defined it as

(3.2.5.2)  $R_4 = 1 - G$.

The variances of G and R4 are identical because both 1 and V are constants. Hence

(3.2.5.3)  $\operatorname{Var}(G) = \operatorname{Var}(R_4) = \frac{4\sigma^2}{V^2 N}$.
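Formulas (3.2.5.1) to (3.2.5.3) can be sketched in a few lines of Python. The function name `gini_richness` is hypothetical, and the reading of σ² as the variance of the rank distribution (ranks weighted by their relative frequencies) is an assumption that the text does not state explicitly. Under this assumption the sketch yields R4 ≈ 0.9756 and Var(R4) ≈ 0.0087 for the frequencies of Doi aştri (one word-form occurring twice, 38 occurring once), which agrees with the entry in Table 3.2.5.1.

```python
def gini_richness(freqs):
    """Return (G, R4, Var(R4)) for a list of word-form frequencies."""
    f = sorted(freqs, reverse=True)      # rank 1 = most frequent word-form
    V = len(f)                           # vocabulary size
    N = sum(f)                           # text length
    # mean rank m1' = (1/N) * sum of r * f(r)
    mean_rank = sum(r * fr for r, fr in enumerate(f, start=1)) / N
    G = (V + 1 - 2 * mean_rank) / V      # formula (3.2.5.1)
    R4 = 1 - G                           # formula (3.2.5.2)
    # Assumption: sigma^2 is the variance of the rank distribution.
    sigma2 = sum((r - mean_rank) ** 2 * fr
                 for r, fr in enumerate(f, start=1)) / N
    var_R4 = 4 * sigma2 / (V ** 2 * N)   # formula (3.2.5.3)
    return G, R4, var_R4


# Doi aştri: ranked frequencies 2, 1, 1, ..., 1 (V = 39, N = 40)
G, R4, var_R4 = gini_richness([2] + [1] * 38)
print(round(R4, 4), round(var_R4, 4))
```

For a text in which every word occurs exactly once the function returns G = 0, the case of maximal richness discussed above.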


Figure 3.2.5.1. Lorenz curve of (reversed) ranked cumulative relative frequencies

The difference between two texts can be tested on the basis of the asymptotic normal criterion

(3.2.5.4)  $u = \frac{R_{4,1} - R_{4,2}}{\sqrt{\operatorname{Var}(R_{4,1}) + \operatorname{Var}(R_{4,2})}}$,

where the subscripts 1 and 2 indicate two different texts. All values of R4 and their variances for 146 poems by Eminescu are presented in Table 3.2.5.1.

Table 3.2.5.1: Vocabulary richness using Gini's coefficient for 146 poems by Eminescu

Poem title  R4  Var(R4)  Poem title  R4  Var(R4)

Adânca mare…

0.8411

0.0053 La moartea lui Heliade

0.7129

0.0012

Adio

0.7362

0.0025 La moartea lui Neamţu

0.7477

0.0016

Ah, mierea buzei tale

0.6912

0.0017 La moartea principelui

0.7682

0.0031

Amicului F.I.

0.7907

0.0015 La mormântul lui Aron

0.7991

0.0027

0.7814

0.0027

Ştirbey Pumnul Amorul unei marmure

0.7347

0.0015 La o artistă (Ca a nopţii

Andrei Mureşanu

0.5506

0.0002 La o artistă (Credeam ieri) 0.7488

0.0017

Atât de fragedă…

0.7826

0.0023 La Quadrat

0.7554

0.0036

Aveam o muză

0.7019

0.0010 La steaua

0.8835

0.0054

Basmul ce i l-aş spune ei

0.7005

0.0010 Lacul

0.7984

0.0045

Călin (file de poveste)

0.5365

0.0002 Lasă-ţi lumea...

0.7656

0.0018

poezie)


Când

0.8140

0.0032 Lebăda

0.9077

0.0093

Când amintirile...

0.8383

0.0041 Lida

0.8772

0.0058

Când crivăţul cu iarna...

0.6238

0.0006 Locul aripelor

0.7077

0.0015

Când marea...

0.7432

0.0035 Luceafărul

0.5309

0.0002

Când priveşti oglinda mărei 0.8291

0.0039 Mai am un singur dor

0.8499

0.0031

Care-i amorul meu în astă

0.0019 Melancolie

0.7300

0.0045

0.4283 0.00003

0.7677

lume Ce e amorul?

0.7812

0.0033 Memento mori

Ce te legeni...

0.7779

0.0039 Miradoniz

0.6312

0.0006

Ce-ţi doresc eu ţie, dulce

0.7348

0.0022 Misterele nopţii

0.7524

0.0025

Cine-i?

0.7541

0.0031 Mitologicale

0.6782

0.0006

Copii eram noi amândoi

0.6962

0.0011 Mortua est!

0.6466

0.0008

Românie

Crăiasa din poveşti

0.8071

0.0032 Mureşanu

0.5279

0.0002

Criticilor mei

0.7768

0.0028 Murmură glasul mării

0.8529

0.0033

Cu mâne zilele-ţi adaogi...

0.7928

0.0027 Napoleon

0.7434

0.0017

Cugetările sărmanului

0.7069

0.0007 Noaptea...

0.7538

0.0023

0.8448

0.0039 Nu e steluţă

0.8009

0.0068

Cum oceanu-ntărâtat...

0.8806

0.0050 Nu mă-nţelegi

0.7051

0.0010

Dacă treci râul Selenei

0.6822

0.0012 Nu voi mormânt bogat

0.8824

0.0035

Dionis Cum negustorii din Constantinopol

(variantă) De câte ori, iubito...

0.8441

0.0038 Numai poetul

0.8542

0.0080

De ce nu-mi vii

0.7268

0.0031 O arfă pe-un mormânt

0.7771

0.0026

De ce să mori tu?

0.6848

0.0015 O călărire în zori

0.7184

0.0011

0.7141

0.0041 O stea prin ceruri

0.8548

0.0049

0.6993

0.0012

De-aş avea De-aş muri ori de-ai muri

0.6938

0.0015 O, adevăr sublime...

Demonism

0.6088

0.0004 O, mamă…

0.7420

0.0028

De-oi adormi (variantă)

0.8707

0.0032 Odă în metru antic

0.8290

0.0038

De-or trece anii...

0.7515

0.0047 Odin şi poetul

0.5622

0.0003

Departe sunt de tine

0.7981

0.0030 Ondina (Fantazie)

0.6486

0.0005

Despărţire

0.7005

0.0013 Oricâte stele...

0.8785

0.0044

Din Berlin la Potsdam

0.7972

0.0031 Pajul Cupidon...

0.7892

0.0028

Din lyra spartă...

0.8792

0.0075 Pe aceeaşi ulicioară...

0.7729

0.0029

Din noaptea

0.8571

0.0057 Pe lângă plopii fără soţ

0.7272

0.0020

Din străinătate

0.7224

0.0017 Peste vârfuri

0.8418

0.0085

Din valurile vremii...

0.7323

0.0026 Povestea codrului

0.7914

0.0018


Dintre sute de catarge

0.8361

0.0079 Povestea teiului

0.7044

0.0010

Doi aştri

0.9756

0.0087 Prin nopţi tăcute

0.8542

0.0080

Dorinţa

0.8461

0.0039 Privesc oraşul furnicar

0.8043

0.0023

Dumnezeu şi om

0.7453

0.0009 Pustnicul

0.7339

0.0011

Ecò

0.6607

0.0006 Replici

0.5547

0.0025

Egipetul

0.6880

0.0006 Revedere

0.7598

0.0028

Epigonii

0.6505

0.0004 Rugăciunea unui dac

0.7331

0.0012

Făt-Frumos din tei

0.7059

0.0010 S-a dus amorul

0.7225

0.0019

Feciorul de împărat fără de

0.4392 0.000054 Sara pe deal

0.8345

0.0026

0.7702

0.0017 Scrisoarea I

0.5974

0.0003

Foaia veştedă (dupa Lenau) 0.8725

0.0034 Scrisoarea II

0.6404

0.0006

Freamăt de codru

0.8152

0.0023 Scrisoarea III

0.5441

0.0002

Frumoasă şi jună

0.7584

0.0035 Scrisoarea IV

0.5960

0.0003

Ghazel

0.7294

0.0012 Scrisoarea V

0.5840

0.0004

Glossă

0.5752

0.0009 Se bate miezul nopţii...

0.8983

0.0084

Horia

0.8489

0.0027 Şi dacă...

0.7624

0.0070

Iar când voi fi pământ

0.8298

0.0030 Singurătate

0.7996

0.0023

Împărat şi proletar

0.5965

0.0003 Somnoroase păsărele...

0.8526

0.0071

În căutarea Şeherezadei

0.6747

0.0005 Sonete

0.7629

0.0015

Înger de pază

0.8208

0.0042 Speranţa

0.6456

0.0016

Înger şi demon

0.6277

0.0005 Steaua vieţii

0.8197

0.0055

Îngere palid...

0.8616

0.0061 Stelele-n cer

0.8489

0.0043

Întunericul şi poetul

0.7375

0.0016 Sus în curtea cea dom-

0.8325

0.0031

stea Floare-albastră

(variantă)

nească Iubind în taină...

0.8952

0.0044 Te duci...

0.8207

0.0048

Iubită dulce, o, mă lasă

0.6820

0.0011 Trecut-au anii

0.8603

0.0044

Iubitei

0.6433

0.0009 Unda spumă

0.8291

0.0065

Junii corupţi

0.6918

0.0009 Venere şi Madona

0.6808

0.0010

Kamadeva

0.8764

0.0048 Veneţia (de Gaetano Cerri) 0.9059

0.0048

La Bucovina

0.7946

0.0021 Viaţa mea fu ziuă

0.8358

0.0038

La mijloc de codru...

0.6675

0.0074 Vis

0.8060

0.0022
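The criterion (3.2.5.4) can be applied to any two rows of Table 3.2.5.1. The following is a minimal sketch; the function name `u_test` is hypothetical, and the threshold 1.96 is the conventional two-sided 5% bound of the standard normal distribution, which is an assumption added here and not a value taken from the text.

```python
import math


def u_test(r4_a, var_a, r4_b, var_b):
    """Asymptotic normal criterion (3.2.5.4) for two texts."""
    return (r4_a - r4_b) / math.sqrt(var_a + var_b)


# Values from Table 3.2.5.1
u_long = u_test(0.9756, 0.0087, 0.5309, 0.0002)   # Doi aştri vs. Luceafărul
u_close = u_test(0.7362, 0.0025, 0.6912, 0.0017)  # Adio vs. Ah, mierea buzei tale

# Assumed decision rule: |u| > 1.96 indicates a significant difference.
for label, u in (("Doi aştri / Luceafărul", u_long),
                 ("Adio / Ah, mierea buzei tale", u_close)):
    print(label, round(u, 2), abs(u) > 1.96)
```

With these table values the first pair differs significantly under the assumed rule, while the second pair does not.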

The relationship to previous indicators may be considered linear. Thus, R4 = −2.6294 + 3.5363Hrel with R² = 0.92, but concerning RRMc (cf. formula 3.2.4.10) we must consider the first three values as outliers and omit them. In that case we obtain R4 = −6.1206 + 7.0690RRMc with R² = 0.80. If we consider all values, the relationship is no longer linear. It is again the poem La mijloc de codru… that represents the strongest outlier.

Comparing the numbers in the column of R4, we can easily state that they are located in the interval R4 ∈ [0.4283; 0.9756]. The empirical mean of these numbers is 0.75, the variance is s² = 0.009828, the third central moment (3.2.2.3) is m3 = −0.000664 and the coefficient of asymmetry (S, cf. formula 3.2.2.5) is γ3 = m3/m2^(3/2) = −0.6813. This means that the distribution is slightly steeper on its right-hand side.

It can be shown that there is no historical change in Eminescu's vocabulary richness. The linear regression against years yields an almost horizontal straight line with a very great dispersion, rendering it insignificant. However, it can also be shown that the longer the poem, the smaller is its vocabulary richness. Comparing N with R4, we obtain the power relationship R4 = 1.4023N^(−0.1185), displaying R² = 0.78; the probabilities associated with the t-tests for the parameters and with the F-test for the regression are smaller than 0.00001. This is caused by the fact that with increasing text length many words are repeated and new words do not come in great number. But if an indicator of richness changes with increasing length, it is not really adequate. However, if it changes very regularly, we must suppose that there is a background mechanism controlling the increase of new and the repetition of old words. Unquestionably, it should be connected to the information flow in a text, but this concept is very vague and complex and nobody knows how to measure it. It is not identical with the stepwise measurement of entropy; it encompasses semantic, syntactic and lexical information, but their share of the information has never been measured or even defined.
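A power relationship of the form R4 = aN^b can be estimated, for instance, by ordinary least squares on the logarithms of both variables. The text does not say which fitting procedure was used, so the sketch below is only one possible approach; the function name `fit_power_law` is hypothetical. It uses six (N, R4) pairs taken from Tables 3.2.6.1 and 3.2.5.1, and with so few points it cannot be expected to reproduce the parameters 1.4023 and −0.1185 obtained from all 146 poems. It merely illustrates the negative exponent.

```python
import math


def fit_power_law(ns, rs):
    """Least-squares fit of r = a * n**b on log-transformed data."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(r) for r in rs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx                 # exponent
    a = math.exp(my - b * mx)     # multiplicative constant
    return a, b


# Poems: Doi aştri, Adânca mare…, Adio, Luceafărul, Andrei Mureşanu, Călin
N_values = [40, 75, 159, 1737, 2008, 2299]
R4_values = [0.9756, 0.8411, 0.7362, 0.5309, 0.5506, 0.5365]

a, b = fit_power_law(N_values, R4_values)
print(round(a, 3), round(b, 3))
```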

3.2.6 Geometric properties

3.2.6.1 The triangle

The rank-frequency distribution of word-forms, lemmas or hrebs of a text always has a hyperbolic form. The same holds for frequency spectra, which can be determined empirically by simple addition or theoretically by a simple transformation. The rank-frequency distribution has two conspicuous points, viz. P1(V,1), designating the vocabulary size (V) of the text or the highest rank (V = rmax), whose frequency f(V) is always 1 (see footnote 2), and the point P2(1,f(1)), designating the first-ranked word with the highest frequency f(1). The third point, called the h-point, is determined as follows (cf. Popescu et al. 2009, Eq. (3.2); see footnote 3):

(3.2.6.1)  $h = \begin{cases} r, & \text{if there is an } r \text{ such that } r = f(r), \\[4pt] \dfrac{f(i)\,r_j - f(j)\,r_i}{r_j - r_i + f(i) - f(j)}, & \text{if there is no } r \text{ such that } r = f(r), \end{cases}$

i.e. if the rank r is equal to its frequency f(r), then h = r; otherwise we use the second part of formula (3.2.6.1). When there are several such ranks r, we take the smallest one. Usually ri and rj are the smallest neighbouring ranks between which the sequence crosses the bisector, i.e. ri < f(i) and rj > f(j). Of course, we would obtain another h if we ascribed the same mean rank to equal frequencies, e.g. if two equal frequencies have ranks 3 and 4, we ascribe them the mean rank 3.5. Now, having the third point P3(h,h), we can set up the characteristic triangle of the text, presented in Figure 3.2.6.1.
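The determination of the h-point in (3.2.6.1) can be sketched as follows. The function name `h_point` is hypothetical. The sketch assumes that the frequencies are ranked in non-increasing order, that the interpolation is carried out between the two neighbouring ranks on either side of the bisector, and that the subtraction f*(r) = f(r) – f(V) + 1 mentioned in footnote 2 is applied whenever the greatest rank is smaller than the smallest frequency. Mean ranks for equal frequencies are not used.

```python
def h_point(freqs):
    """h-point of a rank-frequency sequence according to (3.2.6.1)."""
    f = sorted(freqs, reverse=True)          # rank 1 = highest frequency
    if len(f) < f[-1]:
        # Footnote 2: r_max < f(r_max), so shift the sequence down
        # until the last frequency equals 1.
        shift = f[-1] - 1
        f = [x - shift for x in f]
    # First case: a rank equal to its frequency (smallest such rank).
    for r, fr in enumerate(f, start=1):
        if r == fr:
            return float(r)
    # Second case: interpolate between the neighbouring ranks
    # r_i < f(i) and r_j > f(j).
    for i in range(len(f) - 1):
        ri, rj = i + 1, i + 2
        fi, fj = f[i], f[i + 1]
        if ri < fi and rj > fj:
            return (fi * rj - fj * ri) / (rj - ri + fi - fj)
    raise ValueError("no h-point found")


print(h_point([2] + [1] * 38))   # frequencies of Doi aştri
print(h_point([5, 3, 3, 1]))     # rank 3 has frequency 3
```

For the frequencies of Doi aştri the sketch returns 1.5, the value of h listed for that poem in Table 3.2.6.1.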

Figure 3.2.6.1. The characteristic triangle of the rank-frequency sequence (from Popescu, Altmann 2006)

2 If the greatest rank is still smaller than the smallest frequency, i.e. if rmax < f(rmax), the parameter h cannot be computed. Hence one must transform the whole ranked sequence by subtraction into f*(r) = f(r) – f(V) + 1 (cf. Popescu, Kelih, Best, Altmann 2009).

3 The seminal paper introducing the h-point into linguistics was published in 2007 (cf. Popescu 2007).


Knowing the three points, we compute the area of the triangle as

(3.2.6.2)  $A_h = \frac{1}{2}\begin{vmatrix} V & 1 & 1 \\ 1 & f(1) & 1 \\ h & h & 1 \end{vmatrix}$,

from which follows

(3.2.6.3)  $A_h = \frac{1}{2}\,\bigl| V f(1) + 2h - h\,(V + f(1)) - 1 \bigr|$.

Note that one should take the absolute value of the area Ah regardless of its orientation (P1P2P3 or P1P3P2). For the sake of illustration let us consider the poem Doi aştri, in which we find the ranked word-form frequencies 2, 1, 1, ..., 1, i.e. one word-form occurring twice followed by 38 word-forms occurring once. Here we have V = 39, f(1) = 2, h = [2(2) – 1(1)]/[2 – 1 + 2 – 1] = 3/2 = 1.5, hence Ah = |39(2) + 2(1.5) – 1.5(39 + 2) – 1|/2 = 9.25. In order to obtain a relative number we first compute the maximum of Ah, given as

(3.2.6.4)  Amax = (1/2)(V – 1)(f(1) – 1),

yielding in our example Amax = (1/2)(39 – 1)(2 – 1) = 19.00. Then the ratio

(3.2.6.5)  A = Ah/Amax

gives the relative empirical size of the characteristic triangle. In our example we obtain A = 9.25/19.00 = 0.4868. As can be observed, N (the number of words in the text under study) does not play any role in this computation. We present the results in Table 3.2.6.1. The dispersion is too great to show a significant dependence on N. An increase of A with increasing N is nevertheless observable, as can be seen in Figure 3.2.6.2, but a function capturing this trend will not be proposed. The greater N, the smaller the dispersion. Evidently, A approaches its highest value 1. A similar result has been shown for 54 texts in 7 languages (Popescu, Altmann 2006a).
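Formulas (3.2.6.3) to (3.2.6.5) translate directly into code. The function names below are hypothetical; the sketch simply restates the three formulas and checks them against the worked example of Doi aştri and against the row of Adânca mare… in Table 3.2.6.1 (V = 62, f(1) = 5, h = 3.00).

```python
def triangle_area(V, f1, h):
    """Area A_h of the characteristic triangle, formula (3.2.6.3)."""
    return abs(V * f1 + 2 * h - h * (V + f1) - 1) / 2


def max_area(V, f1):
    """Maximal area A_max, formula (3.2.6.4)."""
    return (V - 1) * (f1 - 1) / 2


def relative_area(V, f1, h):
    """Relative size A = A_h / A_max, formula (3.2.6.5).

    Not defined when V = 1 or f(1) = 1, because A_max is then zero.
    """
    return triangle_area(V, f1, h) / max_area(V, f1)


# Doi aştri: V = 39, f(1) = 2, h = 1.5
print(triangle_area(39, 2, 1.5), max_area(39, 2),
      round(relative_area(39, 2, 1.5), 4))

# Adânca mare…: V = 62, f(1) = 5, h = 3.00
print(triangle_area(62, 5, 3.0), max_area(62, 5),
      round(relative_area(62, 5, 3.0), 2))
```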

Table 3.2.6.1: Relative size of the characteristic triangle of rank-frequency sequences of word-forms in 146 poems by Eminescu

Poem title  N  V  f(1)  h  Ah  Amax  A

Adânca mare…

75

62

5

3.00

57.00

122.00

0.47

Adio

159

111

9

5.00

204.00

440.00

0.46

Ah, mierea buzei tale

228

144

10

5.00

339.50

643.50

0.53

Amicului F.I.

257

194

5

4.00

90.50

386.00

0.23

Amorul unei marmure

266

184

9

5.00

350.00

732.00

0.48

Andrei Mureşanu

2008 1011

65

14.67

24980.98 32320.00

0.77

Atât de fragedă…

176

133

11

3.50

482.50

660.00

0.73

Aveam o muză

421

281

17

7.00

1352.00

2240.00

0.60

Basmul ce i l-aş spune ei

398

262

18

6.00

1523.50

2218.50

0.69

Călin (file de poveste)

2299 1123

98

16.33

45071.35

54417.00

0.83

Când

126

100

7

3.50

165.75

297.00

0.56

Când amintirile...

97

80

5

3.00

75.00

158.00

0.47

Când crivăţul cu iarna...

708

420

31

10.00

4264.50

6285.00

0.68

Când marea...

114

80

7

4.00

109.50

237.00

0.46

Când priveşti oglinda mărei 101

82

6

3.00

116.50

202.50

0.58

Care-i amorul meu în astă lume

213 157 7 4.50 184.50 468.00 0.39

Ce e amorul?

124 94 8 3.00 225.50 325.50 0.69

Ce te legeni...

102 76 8 3.50 160.00 262.50 0.61

Ce-ţi doresc eu ţie, dulce Românie

183 127 8 4.67 197.16 441.00 0.45

Cine-i?

129 93 8 4.33 157.00 322.00 0.49

Copii eram noi amândoi

375

250

24

7.00

2047.50

2863.50

0.72

Crăiasa din poveşti

122

94

7

3.00

180.00

279.00

0.65

Criticilor mei

130

91

4

3.00

42.00

135.00

0.31

Cu mâne zilele-ţi adaogi...

141

105

6

3.50

123.75

260.00

0.48

Cugetările sărmanului Dionis

571 389 27 8.00 3595.00 5044.00 0.71

Cum negustorii din Constantinopol

101 84 5 3.00 79.00 166.00 0.48

Cum oceanu-ntărâtat...

77

67

4

2.50

47.25

99.00

0.48

Dacă treci râul Selenei

356

232

21

6.00

1682.50

2310.00

0.73

De câte ori, iubito...

102

84

6

2.50

141.50

207.50

0.68

De ce nu-mi vii

123

82

9

4.00

190.50

324.00

0.59

De ce să mori tu?

266

172

13

6.00

568.50

1026.00

0.55

De-aş avea

93

61

6

4.00

52.50

150.00

0.35


De-aş muri ori de-ai muri

258

168

10

5.67

340.83

751.50

0.45

Demonism

882

500

36

10.00

6329.50

8732.50

0.72

De-oi adormi (variantă)

122

105

4

3.00

49.00

156.00

0.31

De-or trece anii...

87

63

7

4.00

84.00

186.00

0.45

Departe sunt de tine

135

105

7

4.00

147.00

312.00

0.47

Despărţire

304

202

14

6.00

771.50

1306.50

0.59

Din Berlin la Potsdam

128

99

7

3.67

155.33

294.00

0.53

Din lyra spartă...

51

44

3

2.00

20.50

43.00

0.48

Din noaptea

68

57

3

3.00

2.00

56.00

0.04

Din străinătate

244

168

13

5.00

644.00

1002.00

0.64

Din valurile vremii...

152

104

7

5.00

91.00

309.00

0.29

Dintre sute de catarge

50

41

4

2.67

24.17

60.00

0.40

Doi aştri

40

39

2

1.50

9.25

19.00

0.49

Dorinţa

102

85

7

2.67

177.00

252.00

0.70

Dumnezeu şi om

443

320

15

6.50

1317.25

2233.00

0.59

Ecò

698

442

30

9.00

4514.50

6394.50

0.71

Egipetul

688

452

24

7.50

3646.00

5186.50

0.70

Epigonii

921

565

37

10.00

12258.00 16560.00

0.74

Făt-Frumos din tei

415 281 17 6.50 1426.00 2240.00 0.64

Feciorul de împărat fără de stea

6030 2271 207 23.67 205748.63 233810.00 0.88

Floare-albastră

247

185

12

3.88

983.56

1353.00

0.73

Foaia veştedă (dupa Lenau)

115

99

5

3.00

94.00

196.00

0.48

Freamăt de codru

179

143

7

4.00

204.00

426.00

0.48

Frumoasă şi jună

113

82

8

4.00

151.50

283.50

0.53

Ghazel

331

231

12

6.00

662.50

1265.00

0.52

Glossă

380

191

19

8.00

982.00

1710.00

0.57

Horia

143

119

6

3.00

172.00

295.00

0.58

Iar când voi fi pământ (variantă)

131 106 6 3.00 152.50 262.50 0.58

Împărat şi proletar

1510 857 55 13.50 17424.50 23112.00 0.75

În căutarea Şeherezadei

915 594 34 9.00 7280.50 9784.50 0.74

Înger de pază

91 71 5 2.50 84.50 140.00 0.60

Înger şi demon

876 520 30 11.00 4785.50 7525.50 0.64

Îngere palid...

63 53 3 2.50 11.50 52.00 0.22

Întunericul şi poetul

249 176 11 5.00 505.00 875.00 0.58

Iubind în taină...

87 77 3 2.50 17.50 76.00 0.23

Iubită dulce, o, mă lasă

337 212 11 6.00 502.50 1055.00 0.48

Iubitei

416 240 17 6.50 1210.75 1912.00 0.63

Junii corupţi

458 309 23 7.50 2315.50 3388.00 0.68

Kamadeva

81

70

4

2.50

49.50

103.50

0.48

La Bucovina

184

140

7

4.00

199.50

417.00

0.48

La mijloc de codru...

55

35

11

2.83

129.66

170.00

0.76

La moartea lui Heliade

332

225

14

5.75

893.13

1456.00

0.61

La moartea lui Neamţu

245

173

8

5.00

244.00

602.00

0.41

La moartea principelui Ştirbey

132 98 6 4.00 89.50 242.50 0.37

La mormântul lui Aron Pumnul

150 116 8 4.00 219.50 402.50 0.55

La o artistă (Ca a nopţii poezie)

142 104 7 3.50 172.75 309.00 0.56

La o artistă (Credeam ieri)

219 152 12 4.00 587.50 830.50 0.71

La Quadrat

110 79 7 3.50 129.00 234.00 0.55

La steaua

71 62 3 3.00 2.00 61.00 0.03

Lacul

90 70 6 3.50 80.00 172.50 0.46

Lasă-ţi lumea...

225

167

10

4.50

440.75

747.00

0.59

Lebăda

41

37

3

2.33

10.67

36.00

0.30

Lida

66

57

4

2.00

54.50

84.00

0.65

Locul aripelor

259

173

8

6.00

154.50

602.00

0.26

Luceafărul

1737 820 84 14.00 28125.50 33988.50 0.83

Mai am un singur dor

125 103 4 3.00 48.00 153.00 0.31

Melancolie

274 192 17 5.50 1062.25 1528.00 0.70

Memento mori

9773 3576 423 27.00 702364.00 754325.00 0.93

Miradoniz

636 377 40 8.00 5879.50 7332.00 0.80

Misterele nopţii

155 110 7 4.00 154.50 327.00 0.47

Mitologicale

681 442 34 8.00 5617.50 7276.50 0.77

Mortua est!

491 295 22 7.50 2063.25 3087.00 0.67

Mureşanu

2051 961 88 17.00 33384.00 41760.00 0.80

Murmură glasul mării

119 100 6 3.00 143.50 247.50 0.58

Napoleon

240 169 14 4.00 820.50 1092.00 0.75

Noaptea...

177 128 8 4.50 210.00 444.50 0.47

Nu e steluţă

54 40 3 3.00 2.00 39.00 0.05

Nu mă-nţelegi

384 257 12 7.00 607.00 1408.00 0.43

Nu voi mormânt bogat (variantă)

113 99 5 3.00 94.00 196.00 0.48

Numai poetul

48 40 3 2.50 8.25 39.00 0.21


O arfă pe-un mormânt

157

118

8

4.00

223.50

409.50

0.55

O călărire în zori

346

233

19

5.50

1525.50

2088.00

0.73

O stea prin ceruri

78

65

3

3.00

2.00

64.00

0.03

O, adevăr sublime...

334

226

18

6.00

1307.50

1912.50

0.68

O, mamă…

140

98

7

4.00

136.50

291.00

0.47

Odă în metru antic

103

83

4

3.00

38.00

123.00

0.31

Odin şi poetul

1429

724

56

13.00

15214.50

19882.50

0.77

Ondina (Fantazie)

871

535

35

9.75

6593.00

9078.00

0.73

Oricâte stele...

85

73

3

2.00

35.00

72.00

0.49

Pajul Cupidon...

148

115

9

4.00

273.00

456.00

0.60

Pe aceeaşi ulicioară...

138

103

7

4.00

144.00

306.00

0.47

Pe lângă plopii fără soţ

199

138

9

5.00

258.00

548.00

0.47

Peste vârfuri

47

39

5

2.50

44.50

76.00

0.59

Povestea codrului

220

168

9

3.50

449.25

668.00

0.67

Povestea teiului

390

261

18

6.00

1517.50

2210.00

0.69

Prin nopţi tăcute

48

40

3

2.50

8.25

39.00

0.21

Privesc oraşul furnicar

173

136

10

4.00

391.50

607.50

0.64

Pustnicul

380

270

12

6.50

709.50

1479.50

0.48

Replici

147

73

15

6.43

270.57

504.00

0.54

Revedere

141

102

6

4.50

67.00

252.50

0.27

Rugăciunea unui dac

357

253

14

6.00

975.50

1638.00

0.60

S-a dus amorul

219

152

10

5.00

359.50

679.50

0.53

Sara pe deal

156

128

6

3.67

141.50

317.50

0.45

Scrisoarea I

1282

708

50

10.50

13730.50

17321.50

0.79

Scrisoarea II

696

423

30

11.00

3864.00

6119.00

0.63

Scrisoarea III

2278 1146

110

15.40

53373.70 62402.50

0.86

Scrisoarea IV

1256

699

65

12.33

18018.01 22336.00

0.81

Scrisoarea V

1027

550

46

10.50

9531.00

12352.50

0.77

Se bate miezul nopţii...

45

40

3

2.00

18.50

39.00

0.47

Şi dacă...

53

37

4

3.00

15.00

54.00

0.28

Singurătate

172

134

6

4.00

125.50

332.50

0.38

Somnoroase păsărele...

55

46

4

2.50

31.50

67.50

0.47

Sonete

265

194

8

5.00

275.50

675.50

0.41

Speranţa

245

143

19

5.00

958.00

1278.00

0.75

Steaua vieţii

70

55

4

3.00

24.00

81.00

0.30

Stelele-n cer

91

76

5

3.00

71.00

150.00

0.47

Sus în curtea cea domnească

128 104 5 3.00 99.00 206.00 0.48

Te duci...

84 68 9 3.00 193.00 268.00 0.72

Trecut-au anii

88 74 4 2.50 52.50 109.50 0.48

Unda spumă

59

47

3

3.00

2.00

46.00

0.04

Venere şi Madona

393

247

17

6.33

1269.34

1968.00

0.65

Veneţia (de Gaetano Cerri)

79

71

3

2.50

16.00

70.00

0.23

Viaţa mea fu ziuă

105

86

5

3.00

81.00

170.00

0.48

Vis

177

138

7

3.50

232.25

411.00

0.57

We obtain an image similar to that obtained for A if we compute the corresponding triangle for the frequency spectrum (cf. Popescu et al. 2009: 81ff.). For the sake of differentiation we here call W the greatest non-zero class, g(1) the number of words occurring once and k the h-point of the spectrum; the vertices of the triangle are Q1(W,1), Q2(1,g(1)) and Q3(k,k), its area is Bk, the maximal area is Bmax and their ratio is B. The situation is presented in Figure 3.2.6.3. The formulas are identical with those in (3.2.6.1) to (3.2.6.5), mutatis mutandis (see footnote 4).

Figure 3.2.6.2. The convergence of A-values with increasing N

4 If the greatest frequency class is still smaller than the number of words in it, i.e. if W < g(W), the parameter k cannot be computed. Hence one must transform the whole sequence by subtraction into g*(f) = g(f) – g(W) + 1.

We demonstrate the calculation of B for the poem Doi aştri, whose frequencies are shown above. After transformation into the spectrum we have g(1) = 38, g(2) = 1, W = 2, hence

k = (38(2) – 1(1))/(2 – 1 + 38 – 1) = 75/38 = 1.9737,

Bk = |Wg(1) + 2k – k(g(1) + W) – 1|/2 = |2(38) + 2(1.9737) – 1.9737(38 + 2) – 1|/2 = 0.0003,

Bmax = (W – 1)(g(1) – 1)/2 = (2 – 1)(38 – 1)/2 = 37/2 = 18.5

and

B = Bk/Bmax = 0.0003/18.5 = 0.00001.

The small non-zero value of Bk is a rounding effect of k = 1.9737: with the exact value k = 75/38 the three points are collinear and Bk = 0, which is the value listed in Table 3.2.6.2.

Figure 3.2.6.3. The characteristic triangle of the frequency spectrum
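The spectrum triangle can be sketched in the same way as the rank-frequency triangle. The function names are hypothetical. The sketch assumes that empty frequency classes between 1 and W enter the spectrum with g = 0, that the crossing with the bisector is sought between neighbouring classes as in (3.2.6.1), and that g(W) = 1; the transformation g*(f) of footnote 4, which is needed for the poems marked with an asterisk in Table 3.2.6.2, is not implemented.

```python
from collections import Counter


def spectrum(freqs):
    """Frequency spectrum g(1), ..., g(W) of a list of word-form frequencies."""
    counts = Counter(freqs)
    W = max(counts)                        # greatest non-zero class
    # Assumption: empty classes are kept with g = 0.
    return [counts.get(j, 0) for j in range(1, W + 1)]


def crossing_point(g):
    """Rule (3.2.6.1) applied to the points (j, g(j)), j = 1, 2, ..."""
    for j, gj in enumerate(g, start=1):
        if j == gj:
            return float(j)
    for i in range(len(g) - 1):
        xi, xj = i + 1, i + 2
        yi, yj = g[i], g[i + 1]
        if xi < yi and xj > yj:
            return (yi * xj - yj * xi) / (xj - xi + yi - yj)
    raise ValueError("no crossing with the bisector found")


def spectrum_triangle(freqs):
    """Return (k, Bk, Bmax, B) for the spectrum triangle."""
    g = spectrum(freqs)
    W, g1 = len(g), g[0]
    k = crossing_point(g)
    Bk = abs(W * g1 + 2 * k - k * (g1 + W) - 1) / 2
    Bmax = (W - 1) * (g1 - 1) / 2
    return k, Bk, Bmax, Bk / Bmax


k, Bk, Bmax, B = spectrum_triangle([2] + [1] * 38)   # Doi aştri
print(round(k, 4), Bmax, round(B, 5))
```

Computed with the unrounded k = 75/38, the area for Doi aştri is zero up to floating-point error, in line with the entries 0.00 for Bk and B in Table 3.2.6.2.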

Again, we compute the value of B for 146 poems and obtain the results in Table 3.2.6.2.

Table 3.2.6.2: Relative size of the characteristic triangle of the spectrum of word-forms in 146 poems by Eminescu (an asterisk marks the use of the transformation g*(f) = g(f) – g(W) + 1)

Poem title  N  W  g(1)  k  Bk  Bmax  B

Adânca mare…

75 5 55 3.00 50.00 108.0 0.46

Adio

159

9

88

3.50

229.25

348.0

0.66

Ah, mierea buzei tale

228

10

108

4.33

288.17

481.5

0.60

Amicului F.I.*

242 5 154 4.20 54.80 306.0 0.18

Amorul unei marmure*

235 9 142 3.80 355.40 564.0 0.63

Andrei Mureşanu

2008

65

763

6.50

22112.50

24384.0

0.91

Atât de fragedă…

176

11

110

3.50

396.25

545.0

0.73

Aveam o muză

421

17

226

3.89

1451.89

1800.0

0.81

Basmul ce i l-aş spune ei

398

18

204

4.40

1351.50

1725.5

0.78

Călin (file de poveste)

2299 98 830 6.00 37891.50 40206.5 0.94

Când

126 7 85 2.82 170.18 252.0 0.68

Când amintirile...

97

5

72

2.00

104.50

142.0

0.74

Când crivăţul cu iarna...

708

31

345

3.70

4655.10

5160.0

0.90

Când marea...

114

7

63

3.33

106.67

186.0

0.57

Când priveşti oglinda mărei

101

6

71

2.83

106.25

175.0

0.61

Care-i amorul meu în astă lume

213 7 130 4.33 162.00 387.0 0.42

Ce e amorul?*

110 8 78 2.88 190.75 269.5 0.71

Ce te legeni...

102 8 61 2.82 149.09 210.0 0.71

Ce-ţi doresc eu ţie, dulce Românie

183 8 100 3.00 240.50 346.5 0.69

Cine-i?

129 8 75 2.91 181.68 259.0 0.70

Copii eram noi amândoi

375

24

205

3.63

2048.06

2346.0

0.87

Crăiasa din poveşti

122

7

74

3.00

140.00

219.0

0.64

Criticilor mei*

120

4

61

3.40

14.40

90.0

0.16

Cu mâne zilele-ţi adaogi...

141

6

79

2.95

114.08

195.0

0.59

Cugetările sărmanului Dionis

571

27

324

3.82

3707.23

4199.0

0.88

Cum negustorii din Constantinopol

101 5 75 3.00 70.00 148.0 0.47

Cum oceanu-ntărâtat...

77

4

60

2.60

38.90

88.5

0.44

Dacă treci râul Selenei

356

21

188

3.33

1628.50

1870.0

0.87

De câte ori, iubito...

102

6

71

2.82

106.82

175.0

0.61

De ce nu-mi vii

123

9

59

3.00

166.00

232.0

0.72

De ce să mori tu?

266

13

141

3.88

621.50

840.0

0.74

De-aş avea*

77

6

45

3.25

54.88

110.0

0.50

De-aş muri ori de-ai muri

258

10

133

3.88

391.31

594.0

0.66

Demonism

882

36

349

5.00

5324.00

6090.0

0.87

De-oi adormi (variantă)*

112

4

94

2.67

59.50

139.5

0.43

De-or trece anii...

87

7

53

2.60

109.60

156.0

0.70

Departe sunt de tine

135

7

90

2.88

177.94

267.0

0.67

Despărţire

304

14

163

4.00

790.50

1053.0

0.75

Din Berlin la Potsdam

128

7

83

2.90

162.40

246.0

0.66

Din lyra spartă...

51

3

38

2.60

5.80

37.0

0.16

Din noaptea*

56

3

47

2.33

14.00

46.0

0.30

Din străinătate

244

13

135

3.63

612.38

804.0

0.76

Din valurile vremii...

152

7

80

3.40

135.00

237.0

0.57

Dintre sute de catarge*

43

4

35

2.00

32.50

51.0

0.64

Doi aştri

40 2 38 1.97 0.00 18.5 0.00

Dorinţa

102 7 75 3.33 128.67 222.0 0.58

Dumnezeu şi om

443 15 269 3.60 1509.40 1876.0 0.80

Ecò

698 30 362 4.00 4649.50 5234.5 0.89

Egipetul

688 24 366 4.00 3615.50 4197.5 0.86

Epigonii

921 37 442 4.56 7090.00 7938.0 0.89

Făt-Frumos din tei

415 17 234 4.40 1440.70 1864.0 0.77

Feciorul de împărat fără de stea

6030 207 1567 9.50 153767.00 161298.0 0.95

Floare-albastră

247 12 157 5.69 466.19 858.0 0.54

Foaia veştedă (dupa Lenau)

115 5 88 2.86 89.50 174.0 0.51

Freamăt de codru

179 7 125 3.25 225.75 372.0 0.61

Frumoasă şi jună

113 8 67 4.00 121.50 231.0 0.53

Ghazel

331 12 189 4.33 702.33 1034.0 0.68

Glossă

380

19

141

4.57

977.86

1260.0

0.78

Horia

143

6

103

3.50

121.25

255.0

0.48

Iar când voi fi pământ (variantă)

131 6 90 3.00 128.50 222.5 0.58

Împărat şi proletar

1510 55 706 5.50 17327.25 19035.0 0.91

În căutarea Şeherezadei

915 34 495 5.00 7097.00 8151.0 0.87

Înger de pază

91 5 55 2.86 54.14 108.0 0.50

Înger şi demon

876 30 419 4.00 5390.50 6061.0 0.89

Îngere palid*

57 3 44 2.60 7.00 43.0 0.16

Întunericul şi poetul

249

11

143

3.33

532.67

710.0

0.75

Iubind în taină*

81

3

68

2.60

11.80

67.0

0.18

Iubită dulce, o, mă lasă

337

11

162

4.00

548.50

805.0

0.68

Iubitei

416

17

174

5.40

968.20

1384.0

0.70

Junii corupţi

458

23

275

3.75

2607.00

3014.0

0.87

Kamadeva

81

4

62

2.67

38.17

91.5

0.42

La Bucovina

184

7

112

3.33

196.50

333.0

0.59

La mijloc de codru...

55

11

29

3.25

97.25

140.0

0.69

La moartea lui Heliade

332

14

182

4.20

866.10

1176.5

0.74

La moartea lui Neamţu

245

8

136

3.75

277.25

472.5

0.59

La moartea principelui Ştirbey

132

6

84

4.25

64.50

207.5

0.31

La mormântul lui Aron Pumnul

150

8

96

2.87

237.30

332.5

0.71

La o artistă (Ca a nopţii poezie)

142

7

78

3.00

148.00

231.0

0.64

La o artistă (Credeam ieri)

219

12

112

3.78

441.06

610.5

0.72

La Quadrat

110

7

63

3.25

109.50

186.0

0.59


La steaua*

59

3

54

1.98

26.01

53.0

0.49

Lacul

90

6

60

2.67

94.17

147.5

0.64

Lasă-ţi lumea...

225

10

142

3.75

428.25

634.5

0.67

Lebăda*

37

3

34

2.89

0.00

33.0

0.00

Lida

66

4

50

3.14

17.79

73.5

0.24

Locul aripelor*

223

8

138

4.00

263.50

479.5

0.55

Luceafărul

1737

84

583

5.60

22623.50

24153.0

0.94

Mai am un singur dor

125

4

85

2.93

42.11

126.0

0.33

Melancolie

274

17

157

3.00

1076.00

1248.0

0.86

Memento mori

9773 423 2460 10.92 504564.04 518849.0 0.97

Miradoniz

636 40 296 4.40 5184.70 5752.5 0.90

Misterele nopţii

155

7

87

3.83

127.67

258.0

0.49

Mitologicale

681

34

360

4.50

5237.50

5923.5

0.88

Mortua est!

491

22

227

3.87

2018.97

2373.0

0.85

Mureşanu

2051

88

679

6.29

27471.21

29493.0

0.93

Murmură glasul mării

119

6

89

3.50

103.75

220.0

0.47

Napoleon

240

14

132

3.71

656.07

851.5

0.77

Noaptea...

177

8

107

3.71

217.64

371.0

0.59

Nu e steluţă*

42

3

27

2.67

2.67

26.0

0.10

Nu mă-nţelegi*

330

12

205

4.75

718.88

1122.0

0.64

Nu voi mormânt bogat (variantă)

113 5 92 2.50 110.75 182.0 0.61

Numai poetul*

42 3 33 2.33 9.33 32.0 0.29

O arfă pe-un mormânt

157

8

99

3.40

217.00

343.0

0.63

O călărire în zori

346

19

177

5.13

1183.87

1584.0

0.75

O stea prin ceruri*

66

3

53

2.60

8.80

52.0

0.17

O, adevăr sublime...

334

18

192

2.95

1420.70

1623.5

0.88

O, mamă…

140

7

77

3.50

125.50

228.0

0.55

Odă în metru antic*

93

4

69

2.83

36.92

102.0

0.36

Odin şi poetul

1429

56

525

5.89

12994.67

14410.0

0.90

Ondina (Fantazie)

871 35 427 5.00 6322.00 7242.0 0.87

Oricâte stele...

85 3 62 2.80 4.30 61.0 0.07

Pajul Cupidon...*

124

9

103

2.80

309.00

408.0

0.76

Pe aceeaşi ulicioară...

138

7

86

2.89

169.06

255.0

0.66

Pe lângă plopii fără soţ

199

9

114

4.00

270.50

452.0

0.60

Peste vârfuri

47

5

35

2.00

49.00

68.0

0.72

Povestea codrului

220

9

138

3.57

361.57

548.0

0.66

Povestea teiului

390

18

209

3.80

1453.00

1768.0

0.82


Prin nopţi tăcute*

42

3

33

2.33

9.33

32.0

0.29

Privesc oraşul furnicar

173

10

117

2.92

402.21

522.0

0.77

Pustnicul

380

12

231

4.00

903.50

1265.0

0.71

Replici

147

15

56

3.00

316.00

385.0

0.82

Revedere

141

6

82

3.25

105.75

202.5

0.52

Rugăciunea unui dac

357

14

212

3.50

1091.50

1371.5

0.80

S-a dus amorul

219

10

128

3.00

435.50

571.5

0.76

Sara pe deal

156

6

113

3.00

163.00

280.0

0.58

Scrisoarea I

1272

50

553

5.86

12064.43

13524.0

0.89

Scrisoarea II

696

30

342

4.33

4327.83

4944.5

0.88

Scrisoarea III

2278

110

873

6.00

45071.50

47524.0

0.95

Scrisoarea IV

1256

65

546

6.00

15917.50

17440.0

0.91

Scrisoarea V  1027  46  410  5.00  8294.50  9202.5  0.90
Se bate miezul nopţii...  45  3  36  2.33  10.33  35.0  0.30

Şi dacă...

53

4

27

3.25

6.38

39.0

0.16

Singurătate*  151  6  115  4.13  99.06  285.0  0.35
Somnoroase păsărele...  55  4  40  2.50  27.00  58.5  0.46

Sonete

265

8

160

4.00

307.50

556.5

0.55

Speranţa

245

19

103

3.86

746.57

918.0

0.81

Steaua vieţii

70

4

44

2.86

21.79

64.5

0.34

Stelele-n cer

91

5

67

2.67

73.67

132.0

0.56

Sus în curtea cea Domnească

128

5

89

3.25

72.50

176.0

0.41

Te duci..

84

9

61

2.67

183.33

240.0

0.76

Trecut-au anii

88

4

63

2.78

35.22

93.0

0.38

Unda spumă*

47

3

36

2.50

7.25

35.0

0.21

Venere şi Madona

393

17

183

4.57

1102.43

1456.0

0.76

Veneţia (de Gaetano Cerri)*

73

3

64

2.33

19.67

63.0

0.31

Viaţa mea fu ziuă

105

5

75

3.00

70.00

148.0

0.47

Vis

177

7

114

3.25

205.13

339.0

0.61

As can be seen, both A and B can be interpreted as measures of vocabulary richness. In Doi aştri, where only one word occurs twice, the relative size of the spectrum triangle is 0.00001, almost attaining its lower boundary 0. Again, the graph of B against N (see Figure 3.2.6.4) shows that B converges to 1 for large text sizes. However, the dispersion is still too great, and the sequence cannot be captured by a power function.

We hoped to obtain a smoother curve using the ratio A/B, but we must state that we obtain instead a funnel converging to 1, as shown in Figure 3.2.6.5. Here we took only 141 poems: two of them, Doi aştri and Lebăda, were omitted because B = 0, and three outliers, Oricâte stele... (A/B = 6.90), Din lyra spartă... (A/B = 3.04), and Lida (A/B = 2.68), were omitted in order to keep the picture more lucid.

Figure 3.2.6.4. The convergence of B-values with increasing N

Figure 3.2.6.5. The convergence of A/B ratio with increasing N


3.2.6.2 Writer's view and the golden section

On the basis of the frequency triangle we can imagine that the writer unconsciously controls the increase of the word frequencies with the help of the h-point. S/he also controls the proportions of synsemantics and autosemantics, which are separated from each other in a fuzzy way by the h-point; and by means of the k-point, which characterises the spectrum, s/he controls the representation of the frequency classes. This idea has been proposed several times, cf. Popescu, Altmann (2007); Tuzzi, Popescu, Altmann (2010a,b). The two straight lines joining P3(h,h) with P1(V,1) and P2(1,f(1)) respectively form an angle α, whose cosine can be computed as

(3.2.6.6)  $$\cos\alpha = \frac{-\left[(h-1)(f(1)-h) + (h-1)(V-h)\right]}{\left[(h-1)^2 + (f(1)-h)^2\right]^{1/2}\,\left[(h-1)^2 + (V-h)^2\right]^{1/2}}.$$
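Formula (3.2.6.6) can be checked numerically with a minimal sketch such as the following. The function name `cos_alpha` and its argument names are illustrative assumptions, not notation from the book; the input values are those of Doi aştri in Table 3.2.6.3.

```python
import math

def cos_alpha(h, f1, V):
    """Cosine of the angle at P3(h, h) between the lines to P1(V, 1) and P2(1, f(1)).

    Implements formula (3.2.6.6). It is undefined if both legs collapse,
    e.g. for h = 1 together with f(1) = 1, where the denominator becomes zero.
    """
    numerator = -((h - 1) * (f1 - h) + (h - 1) * (V - h))
    denominator = math.hypot(h - 1, f1 - h) * math.hypot(h - 1, V - h)
    return numerator / denominator

# Doi aştri: h = 1.5, f(1) = 2, V = 39
c = cos_alpha(h=1.5, f1=2, V=39)
alpha = math.acos(c)  # angle in radians
print(f"cos alpha = {c:.4f}, alpha = {alpha:.4f} rad")
```

With rounding to four decimals this should agree with the first row of Table 3.2.6.3; the same function applies to spectra if (h, f(1), V) is replaced by (k, g(1), W).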

Considering again the poem Doi aştri (h = 1.5, f(1) = 2, V = 39), we obtain

$$\cos\alpha = \frac{-\left[(1.5-1)(2-1.5) + (1.5-1)(39-1.5)\right]}{(0.5^2+0.5^2)^{1/2}\,(0.5^2+37.5^2)^{1/2}} \approx \frac{-\left[0.5^2 + 0.5(37.5)\right]}{0.7071\,(37.5)} = -0.7165.$$

The angle α in radians can be obtained from this value by α = arccos(cos α). In our example it is arccos(−0.7165) = 2.3695 radians. The values of α in radians for the rank-frequency distributions of all poems are presented in Table 3.2.6.3.

Table 3.2.6.3: α radians of rank-frequency distribution in 146 poems (ordered according to increasing N)

Poem title  N  V  f(1)  h  cos α  α rad

Doi aştri

40

39

2

1.50 -0.7165

2.3695

Lebăda

41

37

3

2.33 -0.9109

2.7164

Se bate miezul nopţii...

45

40

3

2.00 -0.7255

2.3825

Peste vârfuri

47

39

5

2.50 -0.5493

2.1523

Numai poetul

48

40

3

2.50 -0.9606

2.8598

Prin nopţi tăcute

48

40

3

2.50 -0.9606

2.8598

Dintre sute de catarge

50

41

4

2.67 -0.8073

2.5103

Din lyra spartă...

51

44

3

2.00 -0.7237

2.3800

Şi dacă...

53

37

4

3.00 -0.9191

2.7367

Nu e steluţă

54

40

3

3.00 -0.9985

3.0876

La mijloc de codru...

55

35

11

2.83 -0.2742

1.8486


Somnoroase păsărele...

55

46

4

2.50 -0.7311

2.3907

Unda spumă

59

47

3

3.00 -0.9990

3.0962

Îngere palid...

63

53

3

2.50 -0.9577

2.8495

Lida

66

57

4

2.00 -0.4634

2.0526

Din noaptea

68

57

3

3.00 -0.9993

3.1046

Steaua vieţii

70

55

4

3.00 -0.9110

2.7164

La steaua

71

62

3

3.00 -0.9994

3.1077

Adânca mare…

75

62

5

3.00 -0.7307

2.3901

Cum oceanu-ntărâtat...

77

67

4

2.50 -0.7234

2.3794

O stea prin ceruri

78

65

3

3.00 -0.9995

3.1093

Veneţia (de Gaetano Cerri)

79

71

3

2.50 -0.9554

2.8417

Kamadeva

81

70

4

2.50 -0.7226

2.3784

Te duci...

84

68

9

3.00 -0.3453

1.9233

Oricâte stele...

85

73

3

2.00 -0.7170

2.3703

De-or trece anii...

87

63

7

4.00 -0.7421

2.4070

Iubind în taină...

87

77

3

2.50 -0.9549

2.8400

Trecut-au anii

88

74

4

2.50 -0.7218

2.3772

Lacul

90

70

6

3.50 -0.7332

2.3938

Înger de pază

91

71

5

2.50 -0.5331

2.1331

Stelele-n cer

91

76

5

3.00 -0.7262

2.3836

De-aş avea

93

61

6

4.00 -0.8601

2.6062

Când amintirile...

97

80

5

3.00 -0.7252

2.3822

Când priveşti oglinda mărei

101

82

6

3.00 -0.5756

2.1841

Cum negustorii din Constantinopol  101  84  5  3.00  -0.7243  2.3809
Ce te legeni...  102  76  8  3.50  -0.5155  2.1124

De câte ori, iubito...

102

84

6

2.50 -0.4108

1.9941

Dorinţa

102

85

7

2.67 -0.3778

1.9582

Odă în metru antic

103

83

4

3.00 -0.9053

2.7029

Viaţa mea fu ziuă

105

86

5

3.00 -0.7239

2.3803

La Quadrat

110

79

7

3.50 -0.6078

2.2241

Frumoasă şi jună

113

82

8

4.00 -0.6303

2.2527

Nu voi mormânt bogat (variantă)

113

99

5

3.00 -0.7217

2.3770

Când marea...

114

80

7

4.00 -0.7344

2.3956

Foaia veştedă (dupa Lenau)

115

99

5

3.00 -0.7217

2.3770

Murmură glasul mării

119

100

6

3.00 -0.5717

2.1794

Crăiasa din poveşti

122

94

7

3.00 -0.4668

2.0564

De-oi adormi (variantă)

122

105

4

3.00 -0.9030

2.6976

De ce nu-mi vii

123

82

9

4.00 -0.5471

2.1497


Ce e amorul?

124

94

8

3.00 -0.3917

1.9733

Mai am un singur dor

125

103

4

3.00 -0.9032

2.6979

Când

126

100

7

3.50 -0.6021

2.2169

Din Berlin la Potsdam

128

99

7

3.67 -0.6463

2.2735

Sus în curtea cea domnească

128

104

5

3.00 -0.7210

2.3760

Cine-i?

129

93

8

4.33 -0.7000

2.3462

Criticilor mei

130

91

4

3.00 -0.9044

2.7007

Iar când voi fi pământ (variantă)

131

106

6

3.00 -0.5707

2.1782

La moartea principelui Ştirbey

132

98

6

4.00 -0.8493

2.5855

Departe sunt de tine

135

105

7

4.00 -0.7278

2.3859

Pe aceeaşi ulicioară...

138

103

7

4.00 -0.7282

2.3865

O, mamă…

140

98

7

4.00 -0.7293

2.3881

Cu mâne zilele-ţi adaogi...

141

105

6

3.50 -0.7243

2.3808

Revedere

141

102

6

4.50 -0.9327

2.7726

La o artistă (Ca a nopţii poezie)

142

104

7

3.50 -0.6013

2.2159

Horia

143

119

6

3.00 -0.5690

2.1760

Replici

147

73

15

6.43 -0.6019

2.2167

Pajul Cupidon...

148

115

9

4.00 -0.5375

2.1382

La mormântul lui Aron Pumnul

150

116

8

4.00 -0.6212

2.2411

Din valurile vremii...

152

104

7

5.00 -0.9118

2.7183

Misterele nopţii

155

110

7

4.00 -0.7268

2.3845

Sara pe deal

156

128

6

3.67 -0.7665

2.4442

O arfă pe-un mormânt

157

118

8

4.00 -0.6208

2.2406

Adio

159

111

9

5.00 -0.7333

2.3939

Singurătate

172

134

6

4.00 -0.8446

2.5767

Privesc oraşul furnicar

173

136

10

4.00 -0.4674

2.0572

Atât de fragedă …

176

133

11

3.50 -0.3345

1.9118

Noaptea...

177

128

8

4.50 -0.7269

2.3845

Vis

177

138

7

3.50 -0.5963

2.2096

Freamăt de codru

179

143

7

4.00 -0.7222

2.3778

Ce-ţi doresc eu ţie, dulce Românie  183  127  8  4.67  -0.7598  2.4338
La Bucovina  184  140  7  4.00  -0.7225  2.3782

Pe lângă plopii fără soţ

199

138

9

5.00 -0.7280

2.3863

Care-i amorul meu în astă lume

213

157

7

4.50 -0.8269

2.5443

La o artistă (Credeam ieri)

219

152

12

4.00 -0.3700

1.9498

S-a dus amorul

219

152

10

5.00 -0.6457

2.2727

Povestea codrului

220

168

9

3.50 -0.4276

2.0126

Lasă-ţi lumea...

225

167

10

4.50 -0.5549

2.1591


Ah, mierea buzei tale

228

144

10

5.00 -0.6469

2.2743

Napoleon

240

169

14

4.00 -0.3047

1.8804

Din străinătate

244

168

13

5.00 -0.4690

2.0590

La moartea lui Neamţu

245

173

8

5.00 -0.8141

2.5219

Speranţa

245

143

19

5.00 -0.3025

1.8781

Floare-albastră

247

185

12

3.88 -0.3447

1.9227

Întunericul şi poetul

249

176

11

5.00 -0.5740

2.1822

Amicului F.I.

257

194

5

4.00 -0.9536

2.8356

De-aş muri ori de-ai muri

258

168

10

5.67 -0.7520

2.4220

Locul aripelor

259

173

8

6.00 -0.9392

2.7910

Sonete

265

194

8

5.00 -0.8125

2.5193

Amorul unei marmure

266

184

9

5.00 -0.7227

2.3785

De ce să mori tu?

266

172

13

6.00 -0.6055

2.2212

Melancolie

274

192

17

5.50 -0.3868

1.9679

Despărţire

304

202

14

6.00 -0.5515

2.1549

Ghazel

331

231

12

6.00 -0.6571

2.2878

La moartea lui Heliade

332

225

14

5.75 -0.5176

2.1149

O, adevăr sublime...

334

226

18

6.00 -0.4055

1.9883

Iubită dulce, o, mă lasă

337

212

11

6.00 -0.7241

2.3805

O călărire în zori

346

233

19

5.50 -0.3349

1.9123

Dacă treci râul Selenei

356

232

21

6.00 -0.3371

1.9147

Rugăciunea unui dac

357

253

14

6.00 -0.5471

2.1496

Copii eram noi amândoi

375

250

24

7.00 -0.3560

1.9348

Glossă

380

191

19

8.00 -0.5687

2.1758

Pustnicul

380

270

12

6.50 -0.7217

2.3771

Nu mă-nţelegi

384

257

12

7.00 -0.7834

2.4708

Povestea teiului

390

261

18

6.00 -0.4026

1.9852

Venere şi Madona

393

247

17

6.33 -0.4669

2.0566

Basmul ce i l-aş spune ei

398

262

18

6.00 -0.4026

1.9851

Făt-Frumos din tei

415

281

17

6.50 -0.4817

2.0733

Iubitei

416

240

17

6.50 -0.4847

2.0769

Aveam o muză

421

281

17

7.00 -0.5331

2.1331

Dumnezeu şi om

443

320

15

6.50 -0.5579

2.1626

Junii corupţi

458

309

23

7.50 -0.4065

1.9894

Mortua est!

491

295

22

7.50 -0.4296

2.0148

Cugetările sarmanului Dionis

571

389

27

8.00 -0.3629

1.9422

Miradoniz

636

377

40

8.00 -0.2322

1.8051

Mitologicale

681

442

34

8.00 -0.2755

1.8499

Egipetul

688

452

24

7.50 -0.3801

1.9607


Scrisoarea II

696

423

30

11.00 -0.4871

2.0795

Ecò

698

442

30

9.00 -0.3732

1.9532

Când crivăţul cu iarna...

708

420

31

10.00 -0.4140

1.9976

Ondina (Fantazie)

871

535

35

9.75 -0.3431

1.9210

Înger şi demon

876

520

30

11.00 -0.4830

2.0749

Demonism

882

500

36

10.00 -0.3444

1.9224

În căutarea Şeherezadei

915

594

34

9.00 -0.3178

1.8942

Epigonii

921

565

37

10.00 -0.3256

1.9024

Scrisoarea V

1027

550

46

10.50 -0.2755

1.8499

Scrisoarea IV

1256

699

65

12.33 -0.2265

1.7993

Scrisoarea I

1282

708

50

10.50 -0.2471

1.8204

Odin şi poetul

1429

724

56

13.00 -0.2850

1.8598

Împărat şi proletar

1510

857

55

13.50 -0.3026

1.8782

Luceafărul

1737

820

84

14.00 -0.1984

1.7705

Andrei Mureşanu

2008

1011

65

14.67 -0.2752

1.8496

Mureşanu

2051

961

88

17.00 -0.2363

1.8094

Scrisoarea III

2278

1146

110

15.40 -0.1631

1.7346

Călin (file de poveste)

2299

1123

98

16.33 -0.1981

1.7702

Feciorul de împărat fără de stea

6030

2271

207

23.67 -0.1327

1.7039

Memento mori

9773

3576

423

27.00 -0.0728

1.6437

cos α

α rad

As shown in Figure 3.2.6.6, α in radians converges to the golden section, given as

(3.2.6.7)  $$\varphi = \frac{1+\sqrt{5}}{2} = 1.6180,$$

as was first shown for 176 texts in 20 languages (Popescu, Altmann 2007: 79, Figure 5), and not to π/2 = 1.5708 (see more in Tuzzi, Popescu, Altmann 2010b). Again, we may speak of mechanical self-regulation in cases where the text is too long and the word frequencies cannot be controlled consciously by the author. The fact that one obtains the golden section in different text sorts and different languages points to the existence of an attractor known since antiquity.

We obtain a similar result if we compute indicator B, the counterpart of indicator A for the frequency spectrum (cf. Popescu et al. 2009: 81ff.), using the triangle Q1(W,1), Q2(1,g(1)), Q3(k,k). The situation is presented in Figure 3.2.6.3. The results for the frequency spectra of the same poems are presented in Table 3.2.6.4 and shown graphically in Figure 3.2.6.7.
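As a rough illustration of the claim that α approaches the golden section rather than π/2, the sketch below compares the tabulated α values of the two longest poems with both candidate limits. The helper name `nearer_limit` is an assumption made for this example, and four values cannot of course demonstrate convergence; they only show which limit the longest texts lie nearer to.

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # golden section, about 1.6180
HALF_PI = math.pi / 2          # right angle, about 1.5708

# alpha (radians) of the two longest poems, copied from Tables 3.2.6.3 and 3.2.6.4
alphas = {
    "Feciorul de împărat fără de stea (ranks)": 1.7039,
    "Feciorul de împărat fără de stea (spectra)": 1.6193,
    "Memento mori (ranks)": 1.6437,
    "Memento mori (spectra)": 1.5989,
}

def nearer_limit(alpha):
    """Return which of the two candidate limits lies closer to alpha."""
    return "phi" if abs(alpha - PHI) < abs(alpha - HALF_PI) else "pi/2"

for label, alpha in alphas.items():
    print(f"{label}: alpha = {alpha:.4f}, nearer to {nearer_limit(alpha)}")
```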

Figure 3.2.6.6. Convergence of α radians in Eminescu's poems to the golden section for rank frequencies (presented in logarithmic scaling of N)

Table 3.2.6.4: α radians of frequency spectra in 146 poems by Eminescu (asterisk (*) means the use of the transformation g*(f) = g(f) – g(W) + 1)

Poem title  N(N*)  W  g(1)  k  cos α (spectra)  α rad (spectra)

Adânca mare…  75  5  55  3.00  -0.7338  2.3946
Adio  159  9  88  3.50  -0.4405  2.0270
Ah, mierea buzei tale  228  10  108  4.33  -0.5345  2.1347

Amicului F.I.*

257(242)

5

154

4.20 -0.9751

2.9180

Amorul unei marmure*

266(235)

9

142

3.80 -0.4918

2.0850

Andrei Mureşanu  2008  65  763  6.50  -0.1008  1.6718
Atât de fragedă …  176  11  110  3.50  -0.3384  1.9160

Aveam o muză

421

17

226

3.89 -0.2279

1.8007

Basmul ce i l-aş spune ei

398

18

204

4.40 -0.2590

1.8328

Călin (file de poveste)

2299

98

830

6.00 -0.0603

1.6312

Când

126

7

85

2.82 -0.4189

2.0030

Când amintirile...

97

5

72

2.00 -0.3297

1.9068

Când crivăţul cu iarna...

708

31

345

3.70 -0.1063

1.6773

Când marea...

114

7

63

3.33 -0.5694

2.1766

Când priveşti oglinda mărei

101

6

71

2.83 -0.5241

2.1225

Care-i amorul meu în astă lume

213

7

130

4.33 -0.7972

2.4934

Ce e amorul?*

124(110)

8

78

2.88 -0.3669

1.9465

Ce te legeni...

102

8

61

2.82 -0.3604

1.9395

Ce-ţi doresc eu ţie, dulce Românie

183

8

100

3.00 -0.3905

1.9719

Cine-i?

129

8

75

2.91 -0.3758

1.9560

Copii eram noi amândoi  375  24  205  3.63  -0.1407  1.7120
Crăiasa din poveşti  122  7  74  3.00  -0.4722  2.0626
Criticilor mei*  130(120)  4  61  3.40  -0.9794  2.9383

Cu mâne zilele-ţi adaogi...

141

6

79

2.95 -0.5601

2.1653

Cugetările sărmanului Dionis

571

27

324

3.82 -0.1294

1.7006

Cum negustorii din Constantinopol  101  5  75  3.00  -0.7265  2.3840
Cum oceanu-ntărâtat...  77  4  60  2.60  -0.7706  2.4506

Dacă treci râul Selenei

356

21

188

3.33 -0.1435

1.7147

De câte ori, iubito...

102

6

71

2.82 -0.5191

2.1166

De ce nu-mi vii

123

9

59

3.00 -0.3499

1.9282

De ce să mori tu?

266

13

141

3.88 -0.3204

1.8970

De-aş avea*  93(77)  6  45  3.25  -0.6740  2.3104
De-aş muri ori de-ai muri  258  10  133  3.88  -0.4450  2.0319
Demonism  882  36  349  5.00  -0.1395  1.7107
De-oi adormi (variantă)*  122(112)  4  94  2.67  -0.7921  2.4851
De-or trece anii...  87  7  53  2.60  -0.3714  1.9513
Departe sunt de tine  135  7  90  2.88  -0.4333  2.0189
Despărţire  304  14  163  4.00  -0.3054  1.8811
Din Berlin la Potsdam  128  7  83  2.90  -0.4419  2.0285
Din lyra spartă...  51  3  38  2.60  -0.9801  2.9418
Din noaptea*  68(56)  3  47  2.33  -0.9074  2.7078
Din străinătate  244  13  135  3.63  -0.2888  1.8638
Din valurile vremii...  152  7  80  3.40  -0.5805  2.1901
Dintre sute de catarge*  50(43)  4  35  2.00  -0.4741  2.0647
Doi aştri  40  2  38  1.97  -1.0000  3.1416
Dorinţa  102  7  75  3.33  -0.5640  2.1701
Dumnezeu şi om  443  15  269  3.60  -0.2319  1.8048
Ecò  698  30  362  4.00  -0.1229  1.6941
Egipetul  688  24  366  4.00  -0.1565  1.7280
Epigonii  921  37  442  4.56  -0.1170  1.6881
Făt-Frumos din tei  415  17  234  4.40  -0.2748  1.8492
Feciorul de împărat fără de stea  6030  207  1567  9.50  -0.0484  1.6193
Floare-albastră  247  12  157  5.69  -0.6214  2.2414
Foaia veştedă (după Lenau)  115  5  88  2.86  -0.6713  2.3067
Freamăt de codru  179  7  125  3.25  -0.5303  2.1297
Frumoasă şi jună  113  8  67  4.00  -0.6374  2.2619
Ghazel  331  12  189  4.33  -0.4152  1.9990

Glossă  380  19  141  4.57  -0.2656  1.8396
Horia  143  6  103  3.50  -0.7246  2.3813
Iar când voi fi pământ (variantă)  131  6  90  3.00  -0.5737  2.1818
Împărat şi proletar  1510  55  706  5.50  -0.0969  1.6679
În căutarea Şeherezadei  915  34  495  5.00  -0.1447  1.7160
Înger de pază  91  5  55  2.86  -0.6814  2.3205
Înger şi demon  876  30  419  4.00  -0.1218  1.6929
Îngere palid...*  63(57)  3  44  2.60  -0.9788  2.9352
Întunericul şi poetul  249  11  143  3.33  -0.3071  1.8829
Iubind în taină...*  87(81)  3  68  2.60  -0.9758  2.9211

Iubită dulce, o, mă lasă

337

11

162

4.00 -0.4113

1.9947

Iubitei

416

17

174

5.40 -0.3789

1.9594

Junii corupţi

458

23

275

3.75 -0.1514

1.7228

Kamadeva

81

4

62

2.67 -0.7981

2.4949

La Bucovina

184

7

112

3.33 -0.5549

2.1590

La mijloc de codru...

55

11

29

3.25 -0.3613

1.9405

La moartea lui Heliade

332

14

182

4.20 -0.3275

1.9044

La moartea lui Neamţu

245

8

136

3.75 -0.5606

2.1659

La moartea principelui Ştirbey  132  6  84  4.25  -0.8990  2.6884
La mormântul lui Aron Pumnul  150  8  96  2.87  -0.3605  1.9396
La o artistă (Ca a nopţii poezie)  142  7  78  3.00  -0.4709  2.0611
La o artistă (Credeam ieri)  219  12  112  3.78  -0.3443  1.9223
La Quadrat  110  7  63  3.25  -0.5464  2.1489
La steaua*  71(59)  3  54  1.98  -0.7074  2.3565
Lacul  90  6  60  2.67  -0.4730  2.0635
Lasă-ţi lumea...  225  10  142  3.75  -0.4209  2.0052
Lebăda*  41(37)  3  34  2.89  -1.0000  3.1416
Lida  66  4  50  3.14  -0.9445  2.8068
Locul aripelor*  259(223)  8  138  4.00  -0.6178  2.2367
Luceafărul  1737  84  583  5.60  -0.0665  1.6374
Mai am un singur dor  125  4  85  2.93  -0.8853  2.6580

Melancolie

274

17

157

3.00 -0.1543

1.7257

Memento mori

9773

423

2460 10.92 -0.0281

1.5989

Miradoniz

636

40

296

4.40 -0.1067

1.6777

Misterele nopţii

155

7

87

3.83 -0.6918

2.3348

Mitologicale

681

34

360

4.50 -0.1276

1.6987

Mortua est!

491

22

227

3.87 -0.1688

1.7404

Mureşanu  2051  88  679  6.29  -0.0724  1.6432
Murmură glasul mării  119  6  89  3.50  -0.7275  2.3854
Napoleon  240  14  132  3.71  -0.2756  1.8500
Noaptea...  177  8  107  3.71  -0.5571  2.1616
Nu e steluţă*  54(42)  3  27  2.67  -0.9917  3.0126
Nu mă-nţelegi*  384(330)  12  205  4.75  -0.4760  2.0669
Nu voi mormânt bogat (variantă)  113  5  92  2.50  -0.5288  2.1280
Numai poetul*  48(42)  3  33  2.33  -0.9130  2.7214
O arfă pe-un mormânt  157  8  99  3.40  -0.4847  2.0768
O călărire în zori  346  19  177  5.13  -0.3079  1.8838
O stea prin ceruri*  78(66)  3  53  2.60  -0.9773  2.9283

O, adevăr sublime...

334

18

192

2.95 -0.1387

1.7100

O, mamă…

140

7

77

3.50 -0.6086

2.2250

Odă în metru antic*

103(93)

4

69

2.83 -0.8582

2.6026

Odin şi poetul

1429

56

525

5.89 -0.1065

1.6775

Ondina (Fantazie)

871

35

427

5.00 -0.1416

1.7128

Oricâte stele...

85

3

62

2.80 -0.9968

3.0613

Pajul Cupidon...*  148(124)  9  103  2.80  -0.2960  1.8713
Pe aceeaşi ulicioară...  138  7  86  2.89  -0.4380  2.0242

Pe lângă plopii fără soţ

199

9

114

4.00 -0.5377

2.1385

Peste vârfuri

47

5

35

2.00 -0.3448

1.9228

Povestea codrului

220

9

138

3.57 -0.4453

2.0323

Povestea teiului

390

18

209

3.80 -0.2068

1.7791

Prin nopţi tăcute*  48(42)  3  33  2.33  -0.9130  2.7214
Privesc oraşul furnicar  173  10  117  2.92  -0.2774  1.8519
Pustnicul  380  12  231  4.00  -0.3635  1.9428
Replici  147  15  56  3.00  -0.2015  1.7737

Revedere

141

6

82

3.25 -0.6551

2.2851

Rugăciunea unui dac

357

14

212

3.50 -0.2433

1.8165

S-a dus amorul

219

10

128

3.00 -0.2901

1.8651

Sara pe deal

156

6

113

3.00 -0.5697

2.1770

Scrisoarea I

1272

50

553

5.86 -0.1182

1.6893

Scrisoarea II

696

30

342

4.33 -0.1386

1.7098

Scrisoarea III

2278

110

873

6.00 -0.0538

1.6246

Scrisoarea IV

1256

65

546

6.00 -0.0937

1.6646

Scrisoarea V

1027

46

410

5.00 -0.1069

1.6779

Se bate miezul nopţii...  45  3  36  2.33  -0.9114  2.7175

Şi dacă...

53

4

27

3.25 -0.9743

2.9143

Singurătate*  172(151)  6  115  4.13  -0.8716  2.6294
Somnoroase păsărele...  55  4  40  2.50  -0.7348  2.3962
Sonete  265  8  160  4.00  -0.6153  2.2335
Speranţa  245  19  103  3.86  -0.2136  1.7861
Steaua vieţii  70  4  44  2.86  -0.8744  2.6350
Stelele-n cer  91  5  67  2.67  -0.6021  2.2169
Sus în curtea cea domnească  128  5  89  3.25  -0.8052  2.5068
Te duci...  84  9  61  2.67  -0.2820  1.8567
Trecut-au anii  88  4  63  2.78  -0.8404  2.5688
Unda spumă*  59(47)  3  36  2.50  -0.9619  2.8646
Venere şi Madona  393  17  183  4.57  -0.2954  1.8706
Veneţia (de Gaetano Cerri)*  79(73)  3  64  2.33  -0.9039  2.6996
Viaţa mea fu ziuă  105  5  75  3.00  -0.7265  2.3840
Vis  177  7  114  3.25  -0.5318  2.1315

Figure 3.2.6.7. Convergence of α radians to the golden section for spectra in Eminescu's poems

The simultaneous convergence of α radians, both for ranks and spectra, to the golden section in Eminescu's poems is presented in Figure 3.2.6.8.


Figure 3.2.6.8. Convergence of α radians, both for ranks and spectra, to the golden section in Eminescu's poems

Finally, as expected on geometrical grounds, there should be a perfect linear relationship between the indicator A and the angle α (in radians) for rank frequencies. With the data in Tables 3.2.6.1 and 3.2.6.3 we have

A = 1.8035 − 0.5593 α (ranks), with R² = 0.9964.

Similarly for spectra, with the data in Tables 3.2.6.2 and 3.2.6.4, we have

B = 1.8561 − 0.5816 α (spectra), with R² = 0.9949.
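The kind of fit reported above can be sketched with an ordinary least-squares line through a handful of (α, B) pairs. The six pairs below are copied from Tables 3.2.6.2 and 3.2.6.4; choosing only six poems, and the helper name `fit_line`, are assumptions made for illustration, so the resulting coefficients merely approximate those obtained from all poems.

```python
# Six (alpha for spectra, B) pairs read from Tables 3.2.6.4 and 3.2.6.2.
pairs = [
    (1.6374, 0.94),  # Luceafărul
    (1.7257, 0.86),  # Melancolie
    (1.9228, 0.72),  # Peste vârfuri
    (2.3348, 0.49),  # Misterele nopţii
    (2.6580, 0.33),  # Mai am un singur dor
    (3.0126, 0.10),  # Nu e steluţă
]

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_line(pairs)
print(f"B = {intercept:.4f} {slope:+.4f} * alpha")
```

The slope comes out negative and of the same order as the reported −0.5816: the wider the angle, the smaller the relative triangle area.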

3.3 Vocabulary richness

One of the most diffuse concepts in textology is that of vocabulary richness. The concept is not well defined; a large number of measures have been proposed, each of which expresses a different property and behaves differently. Even the basic idea behind the concept is not really clear: in all cases it is intended to be measured in terms of textual properties, but at the same time it is often interpreted as a measure of the cognitive qualities of an author. However, there is no hypothesis which could (or would try to) build a bridge between a person's mental word inventory (be it its active or passive version) and the vocabulary of an individual text. There are too many intervening factors between these two aspects – factors of style, genre, target readership, topic and many more – to allow one to be inferred from the other. Moreover, even the aim of the measurement is not always transparent. A direct comparison of texts with respect to vocabulary size is not reasonable if the texts are of different size, which is almost always the case. Short texts inevitably correspond to small vocabularies, a fact which is neglected by some of the proposed richness measures, such as the ratio between the number of hapax legomena and the number of running words. Furthermore, some of the proposed measures display a bizarre mathematical behaviour under certain conditions (cf. Wimmer, Altmann 1999) – a sign of a bad correspondence between the textual reality and the idea of how it should be measured. We are confronted with a very fuzzy concept, and the merit of science lies in the great effort towards, and the progress in, quantifying and measuring this phenomenon.

The history of the problem is very rich. Most contributions have been furnished by French scientists (Bernet 1988; Brunet 1978; Cossette 1994; Dugast 1980; Guiraud 1954; Honoré 1979; Hubert, Labbé 1994; Ménard 1983; Muller 1964, 1970, 1977; Thoiron 1988; Thoiron, Labbé, Serant 1988), but research is advancing all over the world. Unfortunately, even an attempt at writing a survey of all models and results would necessitate a separate book.

If word-forms are considered, different results are obtained for the same text in different languages. This is caused by the degree of synthetism of a language. Hence in turn two other views present themselves: the lemmatised text and the frequency spectrum of the text. The lemmatised text has more problems than the unlemmatised text, since lemmatisation is very probably a procedure which every linguist performs in a different way. One should not forget that lemmatisation is based on a series of rules mostly inherited from national linguistics; it encompasses the elimination of grammatical categories (but not everywhere!), unsolvable cases (e.g. suppletivism, gender), decisions about the status of prepositions-prefixes-proclitics (or postpositions-suffixes-enclitics), the status of compounds, derivatives, etc. Lemmatisation performed by software is only one of the possible ones – not the final truth! Nevertheless, one can use it for the predetermined purposes (cf. Chapter 3.2).

The other way out is to look at the frequency spectrum of word-forms or lemmas. In the spectrum, all hapax legomena are collected in x = 1, words occurring twice in x = 2, etc. Of course, words occurring twice or three times, etc., contribute to the richness, too, but up to which x should we consider the words as richness bearers? Here a very simple and well-definable point is given, viz. the h-point and its family (cf. Popescu et al. 2009; 2010), which will be used here, too. On the other hand, one could weight richness by the inverse value of frequency, yielding values in the interval [0, 1]. Here we shall restrict ourselves to the application of some indicators introduced in works by Popescu et al. (2009, 2011). For a short survey of other indicators see Wimmer, Altmann (1999).

As has been shown in Chapter 3.2.6, the h-point separates synsemantics – occurring usually very frequently – from autosemantics, which make up the contents of the text and occur less frequently. Of course, some of the synsemantics occur also beyond the h-point, and some autosemantics forming the very theme of the text (cf. Popescu, Altmann 2011) occur more frequently, below the h-point, but do not contribute significantly to vocabulary richness. Thus the h-point or the k-point can be used for expressing an approximate measure of vocabulary richness. Defining

(3.3.1)  $$F(h) = F(r \le h) = \frac{1}{N}\sum_{r=1}^{[h]} f(r)$$

as the distribution function of the rank-frequency sequence, consisting of the sum of relative frequencies from r = 1 up to [h] (since h need not be an integer, we take only values up to the integer part, [h], of h), and making a small correction by subtracting half of the square formed by h, divided by the whole area under the frequency sequence (N), we obtain F(h) − h²/(2N). Finally, since richness is given by those words whose rank is greater than h, we define the richness indicator R1 (cf. Popescu et al. 2009: 33) as

(3.3.2)  $$R_1 = 1 - \left( F([h]) - \frac{h^2}{2N} \right).$$
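Definitions (3.3.1) and (3.3.2) translate directly into a few lines of code. The following is a minimal sketch, assuming that the h-point has already been determined and is passed in; the function name `richness_r1` is hypothetical. The example sequence is that of Doi aştri (one word occurring twice, 38 words occurring once).

```python
import math

def richness_r1(frequencies, h):
    """R1 = 1 - (F([h]) - h^2 / (2N)) for a rank-frequency sequence.

    frequencies: word frequencies in any order (sorted here by decreasing frequency)
    h:           the h-point of the sequence, computed elsewhere
    """
    ranked = sorted(frequencies, reverse=True)
    n = sum(ranked)                                # text length N
    cumulative = sum(ranked[:math.floor(h)]) / n   # F([h]), formula (3.3.1)
    return 1 - (cumulative - h * h / (2 * n))      # formula (3.3.2)

doi_astri = [2] + [1] * 38   # V = 39 word forms, N = 40 tokens
r1 = richness_r1(doi_astri, h=1.5)
print(f"R1 = {r1:.4f}")
```

Taking `math.floor(h)` mirrors the bracket [h] in (3.3.1): only whole ranks up to the h-point enter the cumulative sum.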


For illustration consider the rank-frequency sequence of Doi aştri shown in Chapter 3.2.6: 2, 1, 1, ..., 1, i.e. one word occurring twice followed by 38 words occurring once. Here we have h = 1.5, N = 40, F([1.5]) = F(1) = 2/40 = 0.05, whence R1 = 1 − (0.05 − 1.5²/(2·40)) = 0.9781, a value that could be expected since almost all words occur only once. This is, of course, possible only in short texts. The values of R1 for other poems by Eminescu are presented in Table 3.3.1.

Table 3.3.1: The vocabulary richness R1 in 146 poems by Eminescu.

Poem title  N  h  F(h)  R1

Adânca mare…

75

3.00

0.1467

0.9133

Adio

159

5.00

0.1950

0.8836

Ah, mierea buzei tale

228

5.00

0.1491

0.9057

Amicului F.I.

257

4.00

0.0700

0.9611

Amorul unei marmure

266

5.00

0.1316

0.9154

Andrei Mureşanu

2008

14.67

0.2276

0.8260


Atât de fragedă…

176

3.50

0.1136

0.9212

Aveam o muză

421

7.00

0.1686

0.8895

Basmul ce i l-aş spune ei

398

6.00

0.1508

0.8945

Călin (file de poveste)

2299

16.33

0.2545

0.8036

Când

126

3.50

0.1270

0.9216

Când amintirile...

97

3.00

0.1237

0.9227

Când crivăţul cu iarna...

708

10.00

0.2359

0.8347

Când marea...

114

4.00

0.1842

0.8860

Când priveşti oglinda mărei

101

3.00

0.1287

0.9158

Care-i amorul meu în astă lume

213

4.50

0.1033

0.9442

Ce e amorul?

124

3.00

0.1532

0.8831

Ce te legeni...

102

3.50

0.1569

0.9032

Ce-ţi doresc eu ţie, dulce Românie

183

4.67

0.1421

0.9174

Cine-i?

129

4.33

0.1860

0.8867

Copii eram noi amândoi

375

7.00

0.1973

0.8680

Crăiasa din poveşti

122

3.00

0.1066

0.9303

Criticilor mei

130

3.00

0.0846

0.9500

Cu mâne zilele-ţi adaogi...

141

3.50

0.0993

0.9441

Cugetările sărmanului Dionis

571

8.00

0.1716

0.8844

Cum negustorii din Constantinopol

101

3.00

0.1188

0.9257

Cum oceanu-ntărâtat...

77

2.50

0.0909

0.9497

Dacă treci râul Selenei

356

6.00

0.2079

0.8427

De câte ori, iubito...

102

2.50

0.0882

0.9424

De ce nu-mi vii

123

4.00

0.1870

0.8780

De ce să mori tu?

266

6.00

0.1880

0.8797

De-aş avea

93

4.00

0.2151

0.8710

De-aş muri ori de-ai muri

258

5.67

0.1667

0.8956

Demonism

882

10.00

0.2075

0.8492

De-oi adormi (variantă)

122

3.00

0.0902

0.9467

De-or trece anii...

87

4.00

0.2414

0.8506

Departe sunt de tine

135

4.00

0.1556

0.9037

Despărţire

304

6.00

0.1711

0.8882

Din Berlin la Potsdam

128

3.67

0.1328

0.9197

Din lyra spartă...

51

2.00

0.0980

0.9412

Din noaptea

68

3.00

0.1324

0.9338

Din străinătate

244

5.00

0.1762

0.8750

Din valurile vremii...

152

5.00

0.1776

0.9046

Dintre sute de catarge

50

2.67

0.1600

0.9111


Doi aştri

40

1.50

0.0500

0.9781

Dorinţa

102

2.67

0.1078

0.9270

Dumnezeu şi om

443

6.50

0.1422

0.9055

Ecò

698

9.00

0.2235

0.8345

Egipetul

688

7.50

0.1642

0.8766

Epigonii

921

10.00

0.2020

0.8523

Făt-Frumos din tei

415

6.50

0.1542

0.8967

Feciorul de împărat fără de stea  6030  23.67  0.2879  0.7585
Floare-albastră  247  3.88  0.1296  0.9008
Foaia veştedă (după Lenau)  115  3.00  0.0957  0.9435

Freamăt de codru

179

4.00

0.1229

0.9218

Frumoasă şi jună

113

4.00

0.1770

0.8938

Ghazel

331

6.00

0.1480

0.9063

Glossă

380

8.00

0.2421

0.8421

Horia

143

3.00

0.0839

0.9476

Iar când voi fi pământ (variantă)

131

3.00

0.0992

0.9351

Împărat şi proletar

1510

13.50

0.2238

0.8365

În căutarea Şeherezadei

915

9.00

0.1792

0.8650

Înger de pază

91

2.50

0.0879

0.9464

Înger şi demon

876

11.00

0.2055

0.8636

Îngere palid...

63

2.50

0.0952

0.9544

Întunericul şi poetul

249

5.00

0.1687

0.8815

Iubind în taină...

87

2.50

0.0690

0.9670

Iubită dulce, o, mă lasă

337

6.00

0.1365

0.9169

Iubitei

416

6.50

0.1466

0.9041

Junii corupţi

458

7.50

0.2031

0.8584

Kamadeva

81

2.50

0.0864

0.9522

La Bucovina

184

4.00

0.1087

0.9348

La mijloc de codru...

55

2.50

0.3273

0.7295

La moartea lui Heliade

332

5.75

0.1476

0.9022

La moartea lui Neamţu

245

5.00

0.1184

0.9327

La moartea principelui Ştirbey

132

4.00

0.1515

0.9091

La mormântul lui Aron Pumnul

150

4.00

0.1400

0.9133

La o artistă (Ca a nopţii poezie)

142

3.50

0.1056

0.9375

La o artistă (Credeam ieri)

219

4.00

0.1142

0.9224

La Quadrat

110

3.50

0.1545

0.9011

La steaua

71

3.00

0.1268

0.9366

Lacul

90

3.50

0.1667

0.9014

Lasă-ţi lumea...  225  4.50  0.1289  0.9161
Lebăda  41  2.33  0.1463  0.9201

Lida

66

2.00

0.0909

0.9394

Locul aripelor

259

6.00

0.1622

0.9073

Luceafărul

1737

14.00

0.2441

0.8123

Mai am un singur dor

125

3.00

0.0800

0.9560

Melancolie

274

5.50

0.1642

0.8910

Memento mori

9773

27.00

0.2718

0.7655

Miradoniz

636

8.00

0.2154

0.8349

Misterele nopţii

155

4.00

0.1290

0.9226

Mitologicale

681

8.00

0.1880

0.8590

Mortua est!

491

7.50

0.1853

0.8719

Mureşanu

2051

17.00

0.2574

0.8130

Murmură glasul mării

119

3.00

0.1008

0.9370

Napoleon

240

4.00

0.1333

0.9000

Noaptea...

177

4.50

0.1356

0.9216

Nu e steluţă

54

3.00

0.1667

0.9167

Nu mă-nţelegi

384

7.00

0.1641

0.8997

Nu voi mormânt bogat (variantă)

113

3.00

0.1062

0.9336

Numai poetul

48

2.50

0.1250

0.9401

O arfă pe-un mormânt

157

4.00

0.1465

0.9045

O călărire în zori

346

5.50

0.1416

0.9021

O stea prin ceruri

78

3.00

0.1154

0.9423

O, adevăr sublime...

334

6.00

0.2066

0.8473

O, mamă…

140

4.00

0.1643

0.8929 0.9369

Odă în metru antic

103

3.00

0.1068

Odin şi poetul

1429

13.00

0.2344

0.8247

Ondina (Fantazie)

871

9.75

0.1906

0.8640 0.9647

Oricâte stele...

85

2.00

0.0588

Pajul Cupidon...

148

4.00

0.1824

0.8716

Pe aceeaşi ulicioară...

138

4.00

0.1594

0.8986

Pe lângă plopii fără soţ

199

5.00

0.1658

0.8970

Peste vârfuri

47

2.50

0.1702

0.8963

Povestea codrului

220

3.50

0.0955

0.9324

Povestea teiului

390

6.00

0.1538

0.8923

Prin nopţi tăcute

48

2.50

0.1250

0.9401

Privesc oraşul furnicar

173

4.00

0.1387

0.9075

Pustnicul

380

6.50

0.1447

0.9109

Vocabulary richness  187

N

h

F(h)

Replici

147

6.43

0.4490

0.6916

Revedere

141

4.50

0.1489

0.9229

Rugăciunea unui dac

357

6.00

0.1653

0.8852

S-a dus amorul

219

5.00

0.1689

0.8881 0.9405

Poem title

R1

Sara pe deal

156

3.67

0.1026

Scrisoarea I

1282

10.50

0.1871

0.8574

Scrisoarea II

696

11.00

0.2399

0.8470

Scrisoarea III

2278

15.40

0.2629

0.7891

Scrisoarea IV

1256

12.33

0.2205

0.8400

Scrisoarea V

1027

10.50

0.2269

0.8268

45

2.00

0.1111

0.9333

Se bate miezul nopţii... Şi dacă...

53

3.00

0.1887

0.8962

Singurătate

172

4.00

0.1221

0.9244

Somnoroase păsărele...

55

2.50

0.1273

0.9295

Sonete

265

5.00

0.1094

0.9377

Speranţa

245

5.00

0.2082

0.8429

Steaua vieţii

70

3.00

0.1429

0.9214

Stelele-n cer

91

3.00

0.1319

0.9176

Sus în curtea cea domnească

128

3.00

0.0938

0.9414

Te duci...

84

3.00

0.1786

0.8750

Trecut-au anii

88

2.50

0.0795

0.9560

Unda spumă

59

3.00

0.1525

0.9237

Venere şi Madona

393

6.33

0.1654

0.8856

Veneţia (de Gaetano Cerri)

79

2.50

0.0759

0.9636

Viaţa mea fu ziuă

105

3.00

0.1143

0.9286

Vis

177

3.50

0.0960

0.9386

Figure 3.3.1. Decrease of vocabulary richness R1 with text size

The short and the long poems display significant differences: most short poems have very high richness, while long poems have significantly lower values (see Figure 3.3.1). No historical trend is observed. Since this indicator is based on F(h), its variance can be obtained as

(3.3.3) Var(F(h)) = F(h)(1 - F(h))/N,

because the subtracted value h²/(2N) can be considered constant for the given text, so that its variance is zero. Thus the richness values of two texts can be compared. Consider the text with the greatest richness, Doi aştri (R1 = 0.9781), and that with the smallest richness, Replici (R1 = 0.6916). All necessary values are given in Table 3.3.1, hence the asymptotic test can be performed using the normal distribution

(3.3.4)
$$ z = \frac{|R_{1,1} - R_{1,2}|}{\sqrt{\dfrac{F(h_1)(1 - F(h_1))}{N_1} + \dfrac{F(h_2)(1 - F(h_2))}{N_2}}} $$

yielding in our case z = 0.57, which is not significant. However, comparing Doi aştri with Împărat şi proletar (R1 = 0.8365) we obtain z = 3.92, which is highly significant. Thus even smaller differences can be more significant than great differences; it depends on poem size. The same procedure can be performed for the spectrum, but in a spectrum the hapax legomena and words with a small repetition are placed at the beginning of the distribution; we define (cf. Popescu et al. 2009: 38)

(3.3.5)
$$ R_2 = G([k]) - \frac{k^2}{2V}. $$

This indicator has the advantage of taking into account not only the hapax legomena but also other frequency classes. However, the number of classes to be considered becomes evident only after computing the indicator k. Consequently, when we compare two texts, we do not compare identically defined proportions but textually determined ones. Calculating this measure for the poem Doi aştri, we obtain k = 1.9737, V = 39 and G([1.9737]) = G(1) = 38/39 = 0.9744, from which R2(Doi aştri) = 0.9744 – 1.9737²/(2·39) = 0.9244, which is, again, a very high richness value. The values of richness R2 for all 146 poems are presented in Table 3.3.2 and in Figure 3.3.2.
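The computations above can be retraced with a few lines of Python. This is only a sketch under stated assumptions: the form R1 = 1 - (F(h) - h²/(2N)) is not given in this section and is assumed here because it reproduces the tabulated value for Doi aştri; the function names are illustrative and not taken from the book.

```python
import math

def richness_r1(f_h: float, h: float, n: int) -> float:
    # Assumed form of R1 (defined earlier in the book, not in this section).
    return 1.0 - (f_h - h ** 2 / (2.0 * n))

def var_r1(f_h: float, n: int) -> float:
    # Formula (3.3.3): binomial variance of the proportion F(h).
    return f_h * (1.0 - f_h) / n

def z_r1(r_a: float, f_a: float, n_a: int,
         r_b: float, f_b: float, n_b: int) -> float:
    # Formula (3.3.4): asymptotic normal test for two R1 values.
    return abs(r_a - r_b) / math.sqrt(var_r1(f_a, n_a) + var_r1(f_b, n_b))

def richness_r2(g_k: float, k: float, v: int) -> float:
    # Formula (3.3.5): G([k]) is the cumulative relative frequency
    # of the spectrum up to the integer part of k.
    return g_k - k ** 2 / (2.0 * v)

# Doi aştri (Table 3.3.1): N = 40, h = 1.50, F(h) = 0.0500
r1_doi = richness_r1(0.0500, 1.50, 40)

# Doi aştri vs. Împărat şi proletar (N = 1510, F(h) = 0.2238, R1 = 0.8365)
z = z_r1(0.9781, 0.0500, 40, 0.8365, 0.2238, 1510)

# Doi aştri (Table 3.3.2): k = 1.9737, V = 39, G(1) = 38/39
r2_doi = richness_r2(38 / 39, 1.9737, 39)

print(round(r1_doi, 4), round(z, 2), round(r2_doi, 4))
```

With the table values, the sketch returns approximately R1 = 0.9781, z = 3.92 and R2 = 0.9244, in agreement with the figures quoted in the text for these two poems.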

Table 3.3.2: The vocabulary richness R2 in 146 poems by Eminescu (asterisk corresponds to the transformation g*(x) = g(x) – g(W) + 1, where W = the greatest non-zero class, see Chapter 3.2.6)

| Poem title | N | V | g(1) | k | G(k) | R2 |
|---|---|---|---|---|---|---|
| Adânca mare… | 75 | 62 | 55 | 3.0000 | 0.9839 | 0.9113 |
| Adio | 159 | 111 | 88 | 3.5000 | 0.9550 | 0.8998 |
| Ah, mierea buzei tale | 228 | 144 | 108 | 4.3333 | 0.9583 | 0.8931 |
| Amicului F.I.* | 242 | 189 | 154 | 4.2000 | 0.9947 | 0.9480 |
| Amorul unei marmure* | 235 | 177 | 142 | 3.8000 | 0.9718 | 0.9310 |
| Andrei Mureşanu | 2008 | 1011 | 763 | 6.5000 | 0.9624 | 0.9415 |
| Atât de fragedă… | 176 | 133 | 110 | 3.5000 | 0.9774 | 0.9314 |
| Aveam o muză | 421 | 281 | 226 | 3.8889 | 0.9537 | 0.9268 |
| Basmul ce i l-aş spune ei | 398 | 262 | 204 | 4.4000 | 0.9625 | 0.9325 |
| Călin (file de poveste) | 2299 | 1123 | 830 | 6.0000 | 0.9697 | 0.9527 |
| Când | 126 | 100 | 85 | 2.8182 | 0.9600 | 0.9203 |
| Când amintirile... | 97 | 80 | 72 | 2.0000 | 0.9250 | 0.9000 |
| Când crivăţul cu iarna... | 708 | 420 | 345 | 3.7000 | 0.9429 | 0.9266 |
| Când marea... | 114 | 80 | 63 | 3.3333 | 0.9500 | 0.8806 |
| Când priveşti oglinda mărei | 101 | 82 | 71 | 2.8333 | 0.9512 | 0.9023 |
| Care-i amorul meu în astă lume | 213 | 157 | 130 | 4.3333 | 0.9745 | 0.9147 |
| Ce e amorul?* | 110 | 90 | 78 | 2.8750 | 0.9667 | 0.9207 |
| Ce te legeni... | 102 | 76 | 61 | 2.8182 | 0.9474 | 0.8951 |
| Ce-ţi doresc eu ţie, dulce Românie | 183 | 127 | 100 | 3.0000 | 0.9370 | 0.9016 |
| Cine-i? | 129 | 93 | 75 | 2.9091 | 0.9355 | 0.8900 |
| Copii eram noi amândoi | 375 | 250 | 205 | 3.6250 | 0.9560 | 0.9297 |
| Crăiasa din poveşti | 122 | 94 | 74 | 3.0000 | 0.9894 | 0.9415 |
| Criticilor mei* | 120 | 87 | 61 | 3.4000 | 0.9885 | 0.9221 |
| Cu mâne zilele-ţi adaogi... | 141 | 105 | 79 | 2.9500 | 0.9524 | 0.9109 |
| Cugetările sărmanului Dionis | 571 | 389 | 324 | 3.8182 | 0.9614 | 0.9427 |
| Cum negustorii din Constantinopol | 101 | 84 | 75 | 3.0000 | 0.9762 | 0.9226 |
| Cum oceanu-ntărâtat... | 77 | 67 | 60 | 2.6000 | 0.9701 | 0.9197 |
| Dacă treci râul Selenei | 356 | 232 | 188 | 3.3333 | 0.9526 | 0.9286 |
| De câte ori, iubito... | 102 | 84 | 71 | 2.8182 | 0.9762 | 0.9289 |
| De ce nu-mi vii | 123 | 82 | 59 | 3.0000 | 0.9512 | 0.8963 |
| De ce să mori tu? | 266 | 172 | 141 | 3.8750 | 0.9302 | 0.8866 |
| De-aş avea* | 77 | 56 | 45 | 3.2500 | 0.9643 | 0.8700 |
| De-aş muri ori de-ai muri | 258 | 168 | 133 | 3.8750 | 0.9405 | 0.8958 |
| Demonism | 882 | 460 | 349 | 5.0000 | 0.9565 | 0.9293 |
| De-oi adormi (variantă)* | 112 | 101 | 94 | 2.6667 | 0.9703 | 0.9351 |
| De-or trece anii... | 87 | 63 | 53 | 2.6000 | 0.9206 | 0.8670 |
| Departe sunt de tine | 135 | 105 | 90 | 2.8750 | 0.9429 | 0.9035 |
| Despărţire | 304 | 202 | 163 | 4.0000 | 0.9554 | 0.9158 |
| Din Berlin la Potsdam | 128 | 99 | 83 | 2.9000 | 0.9495 | 0.9070 |
| Din lyra spartă... | 51 | 44 | 38 | 2.6000 | 0.9773 | 0.9005 |
| Din noaptea* | 56 | 51 | 47 | 2.3333 | 0.9804 | 0.9270 |
| Din străinătate | 244 | 168 | 135 | 3.6250 | 0.9643 | 0.9252 |
| Din valurile vremii... | 152 | 104 | 80 | 3.4000 | 0.9423 | 0.8867 |
| Dintre sute de catarge* | 43 | 38 | 35 | 2.0000 | 0.9737 | 0.9211 |
| Doi aştri | 40 | 39 | 38 | 1.9737 | 0.9744 | 0.9244 |
| Dorinţa | 102 | 85 | 75 | 3.3333 | 0.9765 | 0.9111 |
| Dumnezeu şi om | 443 | 320 | 269 | 3.6000 | 0.9656 | 0.9454 |
| Ecò | 698 | 442 | 362 | 4.0000 | 0.9661 | 0.9480 |
| Egipetul | 688 | 452 | 366 | 4.0000 | 0.9690 | 0.9513 |
| Epigonii | 921 | 565 | 442 | 4.5556 | 0.9699 | 0.9515 |
| Făt-Frumos din tei | 415 | 281 | 234 | 4.4000 | 0.9644 | 0.9300 |
| Feciorul de împărat fără de stea | 6030 | 2271 | 1567 | 9.5000 | 0.9701 | 0.9502 |
| Floare-albastră | 247 | 185 | 157 | 5.6923 | 0.9838 | 0.8962 |
| Foaia veştedă (după Lenau) | 115 | 99 | 88 | 2.8571 | 0.9697 | 0.9285 |
| Freamăt de codru | 179 | 143 | 125 | 3.2500 | 0.9720 | 0.9351 |
| Frumoasă şi jună | 113 | 82 | 67 | 4.0000 | 0.9878 | 0.8902 |
| Ghazel | 331 | 231 | 189 | 4.3333 | 0.9697 | 0.9291 |
| Glossă | 380 | 191 | 141 | 4.5714 | 0.9215 | 0.8668 |
| Horia | 143 | 119 | 103 | 3.5000 | 0.9916 | 0.9401 |
| Iar când voi fi pământ (variantă) | 131 | 106 | 90 | 3.0000 | 0.9811 | 0.9387 |
| Împărat şi proletar | 1510 | 857 | 706 | 5.5000 | 0.9627 | 0.9450 |
| În căutarea Şeherezadei | 915 | 594 | 495 | 5.0000 | 0.9747 | 0.9537 |
| Înger de pază | 91 | 71 | 55 | 2.8571 | 0.9718 | 0.9143 |
| Înger şi demon | 876 | 520 | 419 | 4.0000 | 0.9538 | 0.9385 |
| Îngere palid…* | 57 | 50 | 44 | 2.6000 | 0.9800 | 0.9124 |
| Întunericul şi poetul | 249 | 176 | 143 | 3.3333 | 0.9602 | 0.9287 |
| Iubind în taină…* | 81 | 74 | 68 | 2.6000 | 0.9865 | 0.9408 |
| Iubită dulce, o, mă lasă | 227 | 212 | 162 | 4.0000 | 0.9387 | 0.9009 |
| Iubitei | 416 | 240 | 174 | 5.4000 | 0.9625 | 0.9018 |
| Junii corupţi | 458 | 309 | 275 | 3.7500 | 0.9482 | 0.9255 |
| Kamadeva | 81 | 70 | 62 | 2.6667 | 0.9714 | 0.9206 |
| La Bucovina | 184 | 140 | 112 | 3.3333 | 0.9714 | 0.9317 |
| La mijloc de codru... | 55 | 35 | 29 | 3.2500 | 0.9429 | 0.7920 |
| La moartea lui Heliade | 332 | 225 | 182 | 4.2000 | 0.9733 | 0.9341 |
| La moartea lui Neamţu | 245 | 173 | 136 | 3.7500 | 0.9538 | 0.9131 |
| La moartea principelui Ştirbey | 132 | 98 | 84 | 4.2500 | 0.9694 | 0.8772 |
| La mormântul lui Aron Pumnul | 150 | 116 | 96 | 2.8667 | 0.9569 | 0.9215 |
| La o artistă (Ca a nopţii poezie) | 142 | 104 | 78 | 3.0000 | 0.9712 | 0.9279 |
| La o artistă (Credeam ieri) | 219 | 152 | 112 | 3.7778 | 0.9737 | 0.9267 |
| La Quadrat | 110 | 79 | 63 | 3.2500 | 0.9620 | 0.8952 |
| La steaua* | 59 | 56 | 54 | 1.9815 | 0.9643 | 0.9292 |
| Lacul | 90 | 70 | 60 | 2.6667 | 0.9429 | 0.8921 |
| Lasă-ţi lumea... | 225 | 167 | 142 | 3.7500 | 0.9581 | 0.9160 |
| Lebăda* | 37 | 35 | 34 | 2.8857 | 0.9714 | 0.8525 |
| Lida | 66 | 57 | 50 | 3.1429 | 0.9825 | 0.8958 |
| Locul aripelor* | 223 | 165 | 138 | 4.0000 | 0.9758 | 0.9273 |
| Luceafărul | 1737 | 820 | 583 | 5.6000 | 0.9500 | 0.9309 |
| Mai am un singur dor | 125 | 103 | 85 | 2.9286 | 0.9709 | 0.9292 |
| Melancolie | 274 | 192 | 157 | 3.0000 | 0.9531 | 0.9297 |
| Memento mori | 9773 | 3576 | 2460 | 10.9167 | 0.9715 | 0.9548 |
| Miradoniz | 636 | 377 | 296 | 4.4000 | 0.9602 | 0.9345 |
| Misterele nopţii | 155 | 110 | 87 | 3.8333 | 0.9545 | 0.8878 |
| Mitologicale | 681 | 442 | 360 | 4.5000 | 0.9661 | 0.9432 |
| Mortua est! | 491 | 295 | 227 | 3.8667 | 0.9424 | 0.9170 |
| Mureşanu | 2051 | 961 | 679 | 6.2857 | 0.9646 | 0.9441 |
| Murmură glasul mării | 119 | 100 | 89 | 3.5000 | 0.9900 | 0.9287 |
| Napoleon | 240 | 169 | 132 | 3.7143 | 0.9704 | 0.9296 |
| Noaptea... | 177 | 128 | 107 | 3.7143 | 0.9531 | 0.8992 |
| Nu e steluţă* | 42 | 34 | 27 | 2.6667 | 0.9706 | 0.8660 |
| Nu mă-nţelegi* | 330 | 248 | 205 | 4.7500 | 0.9798 | 0.9343 |
| Nu voi mormânt bogat (variantă) | 113 | 99 | 92 | 2.5000 | 0.9596 | 0.9280 |
| Numai poetul* | 42 | 37 | 33 | 2.3333 | 0.9730 | 0.8994 |
| O arfă pe-un mormânt | 157 | 118 | 99 | 3.4000 | 0.9661 | 0.9171 |
| O călărire în zori | 346 | 233 | 177 | 5.1250 | 0.9785 | 0.9222 |
| O stea prin ceruri* | 66 | 59 | 53 | 2.6000 | 0.9831 | 0.9258 |
| O, adevăr sublime... | 334 | 226 | 192 | 2.9500 | 0.9425 | 0.9232 |
| O, mamă… | 140 | 98 | 77 | 3.5000 | 0.9592 | 0.8967 |
| Odă în metru antic* | 93 | 79 | 69 | 2.8333 | 0.9620 | 0.9112 |
| Odin şi poetul | 1429 | 724 | 525 | 5.8889 | 0.9627 | 0.9388 |
| Ondina (Fantazie) | 871 | 535 | 427 | 5.0000 | 0.9701 | 0.9467 |
| Oricâte stele... | 85 | 73 | 62 | 2.8000 | 0.9863 | 0.9326 |
| Pajul Cupidon…* | 124 | 109 | 103 | 2.8000 | 0.9817 | 0.9457 |
| Pe aceeaşi ulicioară... | 138 | 103 | 86 | 2.8889 | 0.9320 | 0.8915 |
| Pe lângă plopii fără soţ | 199 | 138 | 114 | 4.0000 | 0.9638 | 0.9058 |
| Peste vârfuri | 47 | 39 | 35 | 2.0000 | 0.9487 | 0.8974 |
| Povestea codrului | 220 | 168 | 138 | 3.5714 | 0.9821 | 0.9442 |
| Povestea teiului | 390 | 261 | 209 | 3.8000 | 0.9464 | 0.9187 |
| Prin nopţi tăcute* | 42 | 37 | 33 | 2.3333 | 0.9730 | 0.8994 |
| Privesc oraşul furnicar | 173 | 136 | 117 | 2.9167 | 0.9559 | 0.9246 |
| Pustnicul | 380 | 270 | 231 | 4.0000 | 0.9667 | 0.9370 |
| Replici | 147 | 73 | 56 | 3.0000 | 0.9178 | 0.8562 |
| Revedere | 141 | 102 | 82 | 3.2500 | 0.9510 | 0.8992 |
| Rugăciunea unui dac | 357 | 253 | 212 | 3.5000 | 0.9565 | 0.9323 |
| S-a dus amorul | 219 | 152 | 128 | 3.0000 | 0.9342 | 0.9046 |
| Sara pe deal | 156 | 128 | 113 | 3.0000 | 0.9766 | 0.9414 |
| Scrisoarea I | 1272 | 707 | 553 | 5.8571 | 0.9576 | 0.9333 |
| Scrisoarea II | 696 | 423 | 342 | 4.3333 | 0.9574 | 0.9353 |
| Scrisoarea III | 2278 | 1146 | 873 | 6.0000 | 0.9686 | 0.9529 |
| Scrisoarea IV | 1256 | 699 | 546 | 6.0000 | 0.9700 | 0.9442 |
| Scrisoarea V | 1027 | 550 | 410 | 5.0000 | 0.9618 | 0.9391 |
| Se bate miezul nopţii... | 45 | 40 | 36 | 2.3333 | 0.9750 | 0.9069 |
| Şi dacă... | 53 | 37 | 27 | 3.2500 | 0.9730 | 0.8302 |
| Singurătate* | 151 | 128 | 115 | 4.1250 | 0.9922 | 0.9257 |
| Somnoroase păsărele... | 55 | 46 | 40 | 2.5000 | 0.9565 | 0.8886 |
| Sonete | 265 | 194 | 160 | 4.0000 | 0.9691 | 0.9278 |
| Speranţa | 245 | 143 | 103 | 3.8571 | 0.9301 | 0.8781 |
| Steaua vieţii | 70 | 55 | 44 | 2.8571 | 0.9455 | 0.8712 |
| Stelele-n cer | 91 | 76 | 67 | 2.6667 | 0.9605 | 0.9137 |
| Sus în curtea cea domnească | 128 | 104 | 89 | 3.2500 | 0.9808 | 0.9300 |
| Te duci... | 84 | 68 | 61 | 2.6667 | 0.9559 | 0.9036 |
| Trecut-au anii | 88 | 74 | 63 | 2.7778 | 0.9730 | 0.9208 |
| Unda spumă* | 47 | 41 | 36 | 2.5000 | 0.9756 | 0.8994 |
| Venere şi Madona | 393 | 247 | 183 | 4.5714 | 0.9676 | 0.9253 |
| Veneţia (de Gaetano Cerri)* | 73 | 68 | 64 | 2.3333 | 0.9853 | 0.9453 |
| Viaţa mea fu ziuă | 105 | 86 | 75 | 3.0000 | 0.9767 | 0.9244 |
| Vis | 177 | 138 | 114 | 3.2500 | 0.9783 | 0.9400 |

As can be seen in Figure 3.3.2, R2 increases with increasing text size, but the dispersion is too great to give a clear result. A possible remedy would be to scrutinise the individual poems that can be considered outliers. To this end, a confidence belt around the power curve expressing the dependence could show the positioning of individual poems. However, in this way we would obtain only a new classification.

Figure 3.3.2. Increase of vocabulary richness R2 with text size

Again, the variance of R2 is simply Var(R2) = G(k)(1 - G(k))/V. For example, for the poem Veneţia we find Var(R2) = 0.9853(1 - 0.9853)/68 = 0.0002, and the comparison of its richness with that of Doi aştri yields

$$ z = \frac{|0.9453 - 0.9244|}{\sqrt{0.0006 + 0.0002}} = 0.74, $$

i.e. a non-significant difference. Though R2 seems to increase with increasing poem size, there are short poems with great richness (e.g. Veneţia) and a long poem (Glossă) with small richness. Roughly, there is some kind of trend, but it cannot be established for a single author because of the great dispersion. If we correlate R1 with R2, we do not find any relation; the points rather form a cloud. Hence we can conclude that, though both views are mere transformations of one another, they are relatively independent richness indicators and can be used for text characterisation. The outliers must be studied individually. Another possibility is to take the mean of R1 and R2, i.e. Rm = (R1 + R2)/2, which is independent of text size but depends on the language type. Computing the mean richness indicator for 176 texts in 20 languages (cf. Popescu et al. 2009), one obtains a horizontal cloud with a mean of ca. 0.86. Differences within one and the same language are signs of stylistic or text-sort differences. For Eminescu's poems, we obtain the points presented in Figure 3.3.3; the mean is ca. 0.91. The outliers and the extreme values are research objects for literary studies. The variance of this indicator is Var(Rm) = (Var(R1) + Var(R2))/4, so that differences can easily be tested.
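The test for R2 and the mean indicator Rm can be sketched in the same way. The function names below are illustrative assumptions; the input values are those of Veneţia and Doi aştri from Tables 3.3.1 and 3.3.2. Note that the text rounds both variances to four decimals before computing z = 0.74; with unrounded variances the sketch gives a slightly smaller value (about 0.72), which leads to the same conclusion.

```python
import math

def var_r2(g_k: float, v: int) -> float:
    # Var(R2) = G(k)(1 - G(k))/V
    return g_k * (1.0 - g_k) / v

def z_test(r_a: float, var_a: float, r_b: float, var_b: float) -> float:
    # Asymptotic normal test for the difference of two richness values.
    return abs(r_a - r_b) / math.sqrt(var_a + var_b)

def mean_richness(r1: float, r2: float) -> float:
    # Rm = (R1 + R2)/2
    return (r1 + r2) / 2.0

def var_mean_richness(var_r1: float, var_r2_value: float) -> float:
    # Var(Rm) = (Var(R1) + Var(R2))/4
    return (var_r1 + var_r2_value) / 4.0

# Veneţia: G(k) = 0.9853, V = 68, R2 = 0.9453
var_ven = var_r2(0.9853, 68)
# Doi aştri: G(k) = 0.9744, V = 39, R2 = 0.9244
var_doi = var_r2(0.9744, 39)

z = z_test(0.9453, var_ven, 0.9244, var_doi)

# Mean richness of Doi aştri from R1 = 0.9781 and R2 = 0.9244;
# Var(R1) = F(h)(1 - F(h))/N with F(h) = 0.0500, N = 40.
rm_doi = mean_richness(0.9781, 0.9244)
var_rm_doi = var_mean_richness(0.0500 * (1 - 0.0500) / 40, var_doi)

print(round(var_ven, 4), round(var_doi, 4), round(z, 2), round(rm_doi, 4))
```

Since |z| stays well below 1.96, the difference between the two poems is not significant at the 0.05 level, as stated in the text.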

Figure 3.3.3. Mean vocabulary richness in Eminescu's poems

As a general remark, the outlier poems “La mijloc de codru…” and “Replici” appear in all indicators considered in the present book. The cause is the excessive repetition of some words: “şi de” (and of) in the former, and “eu sunt” (I am) and “tu eşti” (you are) in the latter, as the excerpts below show.


La mijloc de codru...

...
Legănându-se din unde,
În adâncu-i se pătrunde
Şi de lună şi de soare
Şi de păsări călătoare,
Şi de lună şi de stele
Şi de zbor de rândurele
Şi de chipul dragei mele.

Replici

Poetul
Tu eşti o undă, eu sunt o zare,
Eu sunt un ţărmur, tu eşti o mare,
Tu eşti o noapte, eu sunt o stea
Iubita mea.

Iubita
Tu eşti o ziuă, eu sunt un soare,
Eu sunt un flutur, tu eşti o floare,
Eu sunt un templu, tu eşti un zeu
Iubitul meu.
Tu eşti un rege, eu sunt regină,
Eu sunt un caos, tu o lumină,
Eu sunt o arpă muiată-n vânt
Tu eşti un cânt.

Poetul
Tu eşti o frunte, eu sunt o stemă,
Eu sunt un geniu, tu o problemă,
Privesc în ochii-ţi să te ghicesc
Şi te iubesc!
…


3.4 Word length

The first linguistic investigations of word length date back at least 150 years. Usually, Augustus de Morgan is quoted, who in 1851 mentioned word length as an indicator of style in a private letter (cf. Lord 1958). Today, the problem of word length is almost a separate discipline represented in many countries; there are projects and bibliographies (cf. http://wwwuser.gwdg.de/~kbest/), congresses, omnibus volumes with a great amount of literature (cf. Best 2001; Grzybek 2006) and chapters in general works (cf. Best 2005). After many years of trials there is no unification of the field, and probably there will be none, because languages display a variety of word length phenomena (caused by boundary conditions), individual texts may display idiosyncrasies, there is a great number of definitions of the concept of word and numerous ways of measuring length, and, last but not least, every linguist knows only a limited number of languages, living and dead, for which s/he can perform tests. The time of a great unification has not yet come.

We restrict ourselves to the study of word length in Eminescu's poems, try to characterise it and try to find a unique model, as far as possible. The analysis will be performed with the following provisos: clitics are parts of the phonological word; apostrophes will be eliminated; and units joined with a hyphen will be considered one word. These “decisions” are not rules and cannot be followed in every language. In German, for example, the term “Natur- und Kulturschutz” is, as a matter of fact, a special kind of compound, and stating its word length is a pure convention. Cases of a similar kind exist in every language; they cause initial difficulties and hinder comparison and unification.

The frequencies of word lengths in 100 poems by Eminescu are presented in Table 3.4.1. We present them in order to enable other researchers to work with the data. Very short poems are left out because the frequencies are not reliable enough to test a model. The table contains the lengths x = 1, 2, …, and the rows give the frequencies of these lengths in the individual poems. Word length was measured in terms of syllable numbers, since syllables are the immediate phonetic components of words. Another measurement, in terms of morpheme numbers, is a rather risky enterprise and can lead to vehement discussions; besides, it would rather be a measurement of morphemic complexity. We do not need to determine the syllable boundaries when we count the number of syllables in a word. It is sufficient to identify the elements from which we can tell how many syllables a word has, i.e. the nuclei, which are in general vowels or a syllabic consonant such as [l, r]. Therefore, this task can be performed by means of a programme and some rules for correction (diphthongs etc.).

Word length management in poems differs from that in prose. If the poem is constructed on rhythmic principles, e.g. isosyllabism, hexameter, etc., then the word length distribution has a greater excess than in prose texts or in poetic texts which are not based on rhythm or rhyme, because a specific length dominates. However, the poet can develop a specific technique of his own. In general, one can suppose that the longer the poem, the more deviations from general models may occur, because the poet writes it with pauses and makes additional corrections; on the other hand, additional corrections may lead to unification. For a recent review of word length see Popescu et al.⁵
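The nucleus-counting idea described above can be illustrated by a minimal Python sketch. It is not the programme used by the authors: it assumes that every maximal run of vowel letters forms one nucleus (so diphthongs count once, but a hiatus such as in "poezie" is undercounted and would need the correction rules mentioned in the text), and it treats apostrophes and hyphens by simply deleting them. The vowel inventory and function names are illustrative assumptions.

```python
from collections import Counter

# Assumed vowel letters of Romanian orthography (illustrative only).
VOWELS = set("aeiouăâî")

def normalise(token: str) -> str:
    # Keep letters only: drops punctuation, apostrophes and hyphens,
    # so that hyphenated units are treated as one word.
    return "".join(ch for ch in token.lower() if ch.isalpha())

def count_syllables(word: str) -> int:
    # Count maximal runs of vowel letters as syllable nuclei.
    count = 0
    in_nucleus = False
    for ch in normalise(word):
        if ch in VOWELS:
            if not in_nucleus:
                count += 1
            in_nucleus = True
        else:
            in_nucleus = False
    return count

def length_distribution(text: str) -> dict:
    # Frequencies of word lengths (in syllables) in a text.
    lengths = Counter()
    for token in text.split():
        if normalise(token):
            lengths[count_syllables(token)] += 1
    return dict(sorted(lengths.items()))

line = "Şi de lună şi de soare"
print(length_distribution(line))
```

For the quoted line the sketch counts four monosyllabic and two disyllabic words. Real texts require a list of exceptions for hiatus and for semivocalic final -i, which is why the text speaks of additional correction rules.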

3.4.1 Ord's scheme

In Table 3.4.1, we present the individual word-length distributions, the size of the poems in word-tokens (N) and Ord's characterisation by means of I and S (cf. Chapter 3.2.2). The numbers in the second column represent the frequencies of the individual lengths x = 1, 2, 3, … Though the ⟨I,S⟩-points have a relatively great dispersion, as can be seen graphically in Figure 3.4.1, they are placed along a straight line which is characteristic of the text sort, the author, the language, etc. It is also unique with respect to the entity in question, that is, to the linguistic unit and its property. It has been shown that, e.g., the rank-frequency distributions of word-forms in many languages are placed on a straight line in the negative hypergeometric domain, i.e. below the S = 2I - 1 line (cf. Popescu et al. 2009: 154). For Eminescu, we obtained the straight line S = -0.3219 + 2.8858I. For a Slovak poet, E. Bachletová, who writes rhyme-free and rhythm-free poems, we obtained the word length function S = -0.5843 + 3.0397I (cf. Čech et al. 2011). Of course, the differences between the functions can be tested, but a deeper hypothesis needs more data concerning different units and properties from various texts and languages. Nevertheless, the above results show that there is a mechanism controlling word length in poetry. With increasing size of the poem, the poet approaches both the mean of I and the mean of S.
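Ord's indicators can be recomputed from any row of Table 3.4.1. The following sketch assumes the usual definitions I = m2/m1' and S = m3/m2 (mean m1', second and third central moments m2 and m3, each divided by N); these definitions are given in Chapter 3.2.2 and not in this section, and are assumed here because they reproduce the tabulated values. The helper for the fitted line uses the coefficients quoted above; all function names are illustrative.

```python
def ord_indicators(freqs):
    """Return (I, S) for frequencies of lengths x = 1, 2, 3, ..."""
    n = sum(freqs)
    xs = range(1, len(freqs) + 1)
    mean = sum(x * f for x, f in zip(xs, freqs)) / n
    # Central moments with divisor N (not N - 1).
    m2 = sum((x - mean) ** 2 * f for x, f in zip(xs, freqs)) / n
    m3 = sum((x - mean) ** 3 * f for x, f in zip(xs, freqs)) / n
    return m2 / mean, m3 / m2

def eminescu_line(i_value):
    # Straight line reported in the text for Eminescu's poems.
    return -0.3219 + 2.8858 * i_value

# Adânca mare (Table 3.4.1): 29, 27, 10, 4, 5 words of length 1..5
i_val, s_val = ord_indicators([29, 27, 10, 4, 5])
residual = s_val - eminescu_line(i_val)

print(round(i_val, 3), round(s_val, 3), round(residual, 3))
```

For Adânca mare the sketch gives I ≈ 0.648 and S ≈ 1.323, matching the table; the residual shows how far this single poem lies from the fitted line, which illustrates the dispersion mentioned in the text.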

 5 Popescu, I.-I., Naumann S., Kelih E., Rovenchak A., Sanada H., Overbeck A., Smith R., Čech R., Mohanty P., Wilson A., Altmann G., (2013). Word length: aspects and languages, "To honour Karl-Heinz Best", In Köhler, R., Altmann G. (eds.), Issues in Quantitative Linguistics 3, Lüdenscheid: RAM, 224-281

Table 3.4.1: Word length distributions in 100 poems by Eminescu (N = number of words in the poem; I, S = Ord's indicators)

| Poem title | Word-length distributions | N | I | S |
|---|---|---|---|---|
| Adânca mare | 29, 27, 10, 4, 5 | 75 | 0.6480 | 1.3231 |
| Adio | 84, 53, 20, 1, 0, 0, 1 | 159 | 0.4313 | 1.7535 |
| Amicului F.I. | 101, 113, 41, 2 | 257 | 0.3009 | 0.3537 |
| Amorul unei marmure | 119, 69, 57, 18, 2, 1 | 266 | 0.5446 | 0.8923 |
| Andrei Mureşanu | 817, 603, 323, 111, 15, 9, 3 | 1881 | 0.5358 | 1.1922 |
| Atât de fragedă... | 85, 60, 24, 6, 1 | 176 | 0.4248 | 0.9160 |
| Aveam o muză | 198, 139, 64, 14, 3, 3 | 421 | 0.4965 | 1.2828 |
| Basmul ce i l-aş spune ei | 198, 133, 53, 12, 1 | 397 | 0.4037 | 0.8493 |
| Călin (file din poveste) | 1017, 754, 332, 108, 12, 3, 1 | 2227 | 0.4666 | 1.0033 |
| Când | 47, 47, 30, 2 | 126 | 0.3500 | 0.2992 |
| Când amintirile… | 44, 35, 11, 4, 2, 0, 1 | 97 | 0.6221 | 2.0252 |
| Când crivăţul cu iarna... | 330, 232, 111, 32, 4 | 709 | 0.4519 | 0.8668 |
| Ce e amorul? | 70, 31, 16, 8 | 124 | 0.5021 | 1.0389 |
| Ce te legeni?... | 48, 33, 14, 6, 1 | 102 | 0.4944 | 1.0143 |
| Ce-ţi doresc eu ţie, dulce Românie | 56, 80, 36, 11 | 183 | 0.3695 | 0.4673 |
| Crăiasa din poveşti | 47, 54, 15, 6 | 122 | 0.3693 | 0.6917 |
| Criticilor mei | 62, 49, 15, 3, 1 | 130 | 0.3914 | 0.9603 |
| Cu mâine zilele-ţi adaogi... | 58, 51, 22, 5, 5 | 141 | 0.5319 | 1.1988 |
| Cugetările sărmanului Dionis | 282, 173, 81, 30, 4, 0, 1 | 571 | 0.5079 | 1.1888 |
| Dacă treci râul Selenei | 150, 116, 71, 17, 3 | 357 | 0.4607 | 0.7457 |
| De câte ori, iubito... | 51, 24, 19, 6, 2 | 102 | 0.5794 | 1.0501 |
| De ce nu-mi vii? | 77, 31, 11, 3, 1 | 123 | 0.4370 | 1.3568 |
| De-aş avea | 40, 31, 9, 12, 0, 1 | 93 | 0.6170 | 1.2394 |
| De-or trece anii... | 51, 25, 9, 2 | 87 | 0.3780 | 0.9456 |
| Departe sunt de tine... | 69, 46, 11, 7, 0, 1, 1 | 135 | 0.5954 | 2.1310 |
| Despărţire | 179, 78, 30, 14, 3 | 304 | 0.5055 | 1.3414 |
| Diana | 65, 55, 18, 7, 2, 1 | 148 | 0.5180 | 1.3681 |
| Din străinătate | 113, 53, 55, 17, 4, 1, 1 | 244 | 0.6453 | 1.2351 |
| Din valurile vremii… | 81, 43, 19, 9 | 152 | 0.4741 | 0.9795 |
| Dintre sute de catarge | 10, 19, 7, 12, 1, 1 | 50 | 0.5806 | 0.6776 |
| Dorinţa | 51, 32, 13, 2, 4 | 102 | 0.5673 | 1.4899 |
| Dumnezeu şi om | 176, 131, 100, 27, 5, 1, 1 | 441 | 0.5317 | 0.9736 |
| După ce atâta vreme | 22, 11, 10, 4 | 47 | 0.5296 | 0.6756 |
| Ecò | 311, 224, 133, 23, 4, 3 | 698 | 0.4684 | 0.9757 |
| Egipetul | 255, 234, 126, 58, 12, 2, 0, 1 | 688 | 0.5570 | 1.1130 |
| Epigonii | 378, 326, 163, 51, 11, 1 | 930 | 0.4786 | 0.8946 |
| Feciorul de împărat fără de stea | 2853, 1866, 994, 251, 47, 17, 4 | 6032 | 0.4970 | 1.1213 |
| Floare-albastră | 125, 73, 32, 14, 2, 0, 0, 1 | 247 | 0.5797 | 1.7838 |
| Freamăt de codru | 68, 63, 34, 13, 1 | 179 | 0.4614 | 0.6794 |
| Ghazel | 161, 90, 60, 19, 1 | 331 | 0.4902 | 0.8148 |
| Glossă | 191, 132, 46, 9, 2 | 380 | 0.3951 | 0.9286 |
| Icoană si privaz | 755, 464, 190, 45, 13, 3, 2 | 1472 | 0.4836 | 1.3511 |
| Împărat şi proletar | 669, 481, 250, 89, 15, 5, 1 | 1510 | 0.5250 | 1.1054 |
| În căutarea Şeherezadei | 412, 277, 167, 43, 13, 2, 1 | 915 | 0.5296 | 1.1228 |
| Înger de pază | 42, 35, 11, 2, 1 | 91 | 0.4030 | 1.0059 |
| Înger şi demon | 371, 282, 148, 59, 10, 5, 2 | 877 | 0.5692 | 1.2736 |
| Iubind în taină… | 44, 26, 13, 2, 1, 1 | 87 | 0.5546 | 1.5919 |
| Iubitei | 246, 114, 42, 10, 1, 2, 1 | 416 | 0.4912 | 1.7738 |
| Junii corupţi | 212, 119, 85, 33, 6, 2 | 457 | 0.5828 | 1.0636 |
| Kamadeva | 34, 28, 13, 5, 1 | 81 | 0.4884 | 0.9090 |
| La Bucovina | 71, 73, 30, 9, 1 | 184 | 0.4133 | 0.7310 |
| La mijloc de codru... | 27, 16, 8, 3, 1 | 55 | 0.5418 | 1.1422 |
| La mormântul lui Aron Pumnul | 63, 42, 28, 15, 1, 0, 1 | 150 | 0.6104 | 1.2215 |
| La steaua | 37, 21, 11, 2 | 71 | 0.4099 | 0.7692 |
| Lacul | 42, 32, 12, 4 | 90 | 0.4090 | 0.7846 |
| Lasă-ţi lumea... | 94, 90, 34, 5, 2 | 225 | 0.3877 | 0.8134 |
| Lida | 25, 30, 9, 2 | 66 | 0.3318 | 0.5554 |
| Locul aripelor | 139, 81, 30, 9, 1 | 259 | 0.4264 | 1.0158 |
| Luceafărul | 942, 487, 227, 72, 8, 1 | 1737 | 0.4692 | 1.0789 |
| Mai am un singur dor | 66, 36, 17, 4, 1, 1 | 125 | 0.5313 | 1.4828 |
| Melancolie | 124, 86, 44, 18, 2 | 274 | 0.4955 | 0.8858 |
| Memento mori | 4134, 3152, 1668, 646, 117, 36, 16, 7, 1 | 9777 | 0.5623 | 1.2939 |
| Miradoniz | 281, 205, 98, 43, 7, 2 | 636 | 0.5305 | 1.0703 |
| Mitologicale | 304, 199, 124, 41, 10, 4 | 682 | 0.5656 | 1.1375 |
| Mortua est! | 244, 178, 59, 8, 1 | 491 | 0.3535 | 0.7559 |
| Napoleon | 92, 86, 45, 12, 5 | 240 | 0.4867 | 0.9010 |
| Noaptea... | 84, 61, 24, 7, 0, 1 | 177 | 0.4552 | 1.1509 |
| Nu mă-nţelegi | 185, 104, 72, 14, 7, 2 | 384 | 0.5615 | 1.2308 |
| O, ramâi | 70, 40, 15, 3, 1 | 129 | 0.4226 | 1.0904 |
| O, mamă… | 72, 45, 19, 4 | 140 | 0.3938 | 0.7876 |
| Odă în metru antic | 46, 37, 12, 6, 2 | 103 | 0.5132 | 1.1673 |
| Oricâte stele... | 40, 26, 14, 5 | 85 | 0.4610 | 0.7699 |
| Pajul Cupidon... | 69, 41, 28, 8, 1, 1 | 148 | 0.5461 | 1.1176 |
| Pe aceeaşi ulicioară… | 60, 58, 16, 3, 1 | 138 | 0.3657 | 0.8659 |
| Pe lângă plopii fără soţ | 119, 56, 20, 3, 1 | 199 | 0.3872 | 1.1087 |
| Peste vârfuri | 18, 18, 8, 3 | 47 | 0.4184 | 0.6278 |
| Povestea codrului | 94, 79, 37, 9, 1 | 220 | 0.4211 | 0.7435 |
| Povestea teiului | 181, 132, 65, 8, 4 | 390 | 0.4251 | 0.8825 |
| Revedere | 65, 48, 15, 11, 1, 0, 1 | 141 | 0.5922 | 1.6526 |
| Rugăciunea unui dac | 171, 115, 53, 17, 0, 1 | 357 | 0.4575 | 0.9688 |
| S-a dus amorul | 121, 66, 21, 10, 1 | 219 | 0.4541 | 1.1556 |
| Sara pe deal | 63, 57, 34, 0, 1, 1 | 156 | 0.4100 | 0.9482 |
| Scrisoarea I | 560, 418, 187, 98, 18, 1 | 1282 | 0.5241 | 1.0151 |
| Scrisoarea II | 324, 214, 110, 40, 9 | 697 | 0.5117 | 0.9943 |
| Scrisoarea III | 1054, 703, 349, 133, 31, 6, 2, 2 | 2280 | 0.5616 | 1.3424 |
| Scrisoarea IV | 578, 425, 177, 67, 9, 6, 1, 1 | 1264 | 0.5285 | 1.3893 |
| Scrisoarea V | 495, 310, 158, 56, 12, 1 | 1032 | 0.5151 | 1.0595 |
| Se bate miezul nopţii… | 21, 15, 5, 3, 1 | 45 | 0.5531 | 1.2327 |
| Şi dacă… | 29, 16, 5, 1, 1, 1 | 53 | 0.6457 | 2.1460 |
| Singurătate | 83, 56, 22, 9, 2 | 172 | 0.4920 | 1.0870 |
| Somnoroase păsărele… | 18, 23, 8, 6 | 55 | 0.4458 | 0.6520 |
| Sonete | 127, 87, 36, 8, 3, 1 | 262 | 0.4834 | 1.2538 |
| Speranţa | 102, 87, 47, 8, 1 | 245 | 0.4069 | 0.6399 |
| Stelele-n cer | 44, 23, 15, 3, 4, 2 | 91 | 0.7760 | 1.7620 |
| Strigoii | 1012, 775, 310, 113, 19, 5, 4 | 2238 | 0.5030 | 1.2582 |
| Sus în curtea cea domnească | 57, 53, 13, 5 | 128 | 0.3647 | 0.7771 |
| Te duci… | 127, 57, 30, 5, 2, 2 | 223 | 0.5445 | 1.6360 |
| Trecut-au anii | 43, 31, 9, 3, 2 | 88 | 0.4968 | 1.3431 |
| Venere şi Madona | 173, 132, 59, 20, 5, 3, 1 | 393 | 0.5719 | 1.4921 |
| Viaţa | 218, 174, 82, 21, 4, 2 | 501 | 0.4753 | 1.0544 |

This fact can be seen in Figure 3.4.2. As long as the poem is short, the control of word-lengths can be dictated by some conscious (e.g. poetic form) or unconscious mechanism, but with increasing poem size the control disappears. As can be seen, the longer a poem, the stronger it tends to the mean of I and S.

Figure 3.4.1. The I-S relationship in Eminescu's poems

Figure 3.4.2. The convergence of I and S with increasing sample size

202  The word 3.4.2 Word-length distribution As can be seen in Table 3.4.1, we would obtain quite different relative frequencies of lengths for individual poems. This hypothesis could be tested by means of a chi-square test for homogeneity or the equivalent information statistics. The differences may be signs of different background models, differences in parameters of the same model or, last but not least, idiosyncrasies. In the sequel we shall search for models realised in Eminescu's poetry. We start from the general theory proposed by Wimmer and Altmann (2005) and explained in Chapter 2.5.1 (Word length in rhyme) where we obtained the Poisson, the hyper-Poisson and the Ferreri-Poisson distributions. As can be seen in Table 3.4.2, at least one of these three special cases of the theory can be successfully fitted to 100 texts. Of course, there are cases for which one of the three distributions is not adequate and there are also texts to which none of the given distributions can be fitted. These are especially long poems. For some of them a special modification can be found but some of them resist any fitting. This is caused by the fact that long texts are, as a matter of fact, mixed texts. The mixing arises by making pauses in writing or subdividing the text in chapters, etc. After a pause, the rhythms in the brain may change, the memory of the previous text may weaken, new impressions may arise, etc. But these deviations may arise also voluntarily, e.g. with dada-writers. Table 3.4.2: Fitting the above distributions to word-length distributions in Eminescu's poems. The following abbreviations are used (cf. Popescu et al. 2009: 134): X2 = the empirical chisquare value for the goodness-of-fit; DF = degrees of freedom; P = the associated probability; a, b = parameters; Po = Poisson; FP = Ferreri-Poisson; HP = hyper-Poisson. Poem title Adânca mare

Adio

Andrei Mureşanu Atât de fragedă...

Aveam o muză

X2

DF

P

a

b

Po

7.05

3

0.07

1.1646

--

FP

4.47

3

0.21

1.6019

--

HP

3.19

2

0.20

2.2069

2.6898

Po

1.73

2

0.42

0.6365

--

FP

3.84

3

0.28

1.0897

--

HP

1.71

1

0.19

0.5982

0.9480

FP

6.12

3

0.11

1.3560

--

HP

5.40

2

0.07

1.2601

1.6277

Po

0.19

3

0.98

0.7402

--

FP

1.01

3

0.80

1.1097

--

HP

0.15

2

0.93

0.7938

1.1019

Po

3.14

3

0.37

0.7997

--

Distr.

Word length  203

Poem title

Basmul ce i l-aş spune ei

Călin (file de poveste) Când amintirile…

Când crivăţul cu iarna...

Ce e amorul? Ce te legeni?...

Distr.

X2

DF

P

a

b

FP

2.34

3

0.50

1.2001

--

HP

1.39

2

0.50

1.0685

1.5220

Po

1.38

3

0.71

0.7074

--

FP

3.61

3

0.31

1.1561

--

HP

1.35

2

0.51

0.7384

1.0614

FP

10.80

5

0.06

1.2680

--

HP

6.29

3

0.10

0.9750

1.2907

Po

1.72

2

0.42

0.9306

--

FP

2.43

3

0.49

1.3134

--

HP

2.26

2

0.32

1.5712

2.3239

Po

5.03

3

0.17

0.8034

--

FP

4.87

3

0.18

1.2557

--

HP

3.59

2

0.17

0.9875

1.3424

FP

3.20

2

0.20

1.1455

--

HP

0.39

1

0.53

4.4611

9.4112

Po

1.37

2

0.50

0.8184

--

FP

0.56

3

0.91

1.2660

--

HP

0.54

2

0.77

1.2673

1.8434

Ce-ţi doresc eu ţie, dulce

Po

5.19

2

0.07

1.0354

--

Românie

HP

0.01

1

0.90

0.6395

0.4476

Crăiasa din poveşti

Po

3.47

2

0.17

0.8516

--

HP

1.16

1

0.28

0.5404

0.4931

Po

0.50

2

0.78

0.7075

--

FP

2.62

3

0.45

1.1641

--

HP

0.05

1

0.81

0.5550

0.7145

Po

4.73

3

0.19

0.9706

--

FP

2.88

3

0.41

1.4092

--

HP

2.96

2

0.23

1.4334

1.8267

Criticilor mei

Cu mâine zilele-ţi adaogi...

Cugetările sărmanului Dionis Dacă treci râul Selenei

De câte ori, iubito...

De ce nu-mi vii?

FP

3.24

4

0.52

1.2251

--

HP

2.62

3

0.46

1.3514

2.1028

Po

5.02

3

0.17

0.9081

--

FP

6.23

3

0.10

1.3761

--

HP

4.91

2

0.09

1.0477

1.2461

Po

7.61

3

0.05

0.8757

-

FP

4.47

3

0.21

1.3103

--

HP

2.96

2

0.23

2.8203

4.8019

Po

9.49

2

0.17

0.5491

--

FP

0.67

2

0.71

0.9452

--

De-aş avea

De-or trece anii...

Departe sunt de tine...

Despărţire Diana

Din valurile vremii…


HP

0.02

2

Po

0.42

1

0.99

2.0774

5.1168

0.52

0.8867

FP

0.002

--

1

0.96

1.3560

HP

--

3.50

1

0.06

2.5882

3.5811

Po

0.56

2

0.76

0.5694

--

FP

0.17

2

0.92

0.9876

--

HP

0.14

1

0.70

0.8728

1.7354

Po

4.25

2

0.12

0.7251

--

FP

3.29

3

0.35

1.1499

--

HP

3.13

2

0.19

1.0919

1.7676

FP

6.32

3

0.10

1.0655

--

HP

1.59

2

0.45

3.0181

6.6950

Po

2.25

3

0.52

0.8568

--

FP

1.70

3

0.64

1.2977

--

HP

1.68

2

0.43

1.0714

1.3390

Po

4.70

2

0.10

0.7356

--

FP

1.16

2

0.56

1.1583

--

HP

0.03

1

0.87

2.1534

4.0564

Dintre sute de catarge

Po

8.07

4

0.09

1.6490

--

FP

8.41

4

0.08

2.1452

--

Dorinţa

Po

1.47

2

0.48

0.7644

--

FP

0.56

2

0.76

1.1366

--

HP

3.14

2

0.21

2.8135

4.9013

Po

3.59

2

0.17

0.9417

--

FP

2.43

2

0.30

1.3921

--

HP

1.86

1

0.17

1.9251

2.6456

Po

5.44

4

0.25

1.0546

--

FP

3.08

5

0.69

1.5202

--

După ce atâta vreme

Egipetul

Epigonii

Floare-albastră

Freamăt de codru

HP

2.51

3

0.47

1.3055

1.3926

Po

2.03

4

0.73

0.9207

--

FP

5.66

4

0.23

1.3846

--

HP

1.71

3

0.63

0.9803

1.0982

Po

7.30

3

0.06

0.7883

--

FP

2.18

3

0.54

1.2169

--

HP

1.15

3

0.77

1.8414

3.1538

Po

2.28

3

0.52

0.9901

--

FP

3.30

3

0.35

1.4630

--

HP

2.28

2

0.33

0.9983

1.0067

Glossă

Icoană si privaz Împărat şi proletar În căutarea Şeherezadei Înger de pază

Înger şi demon Iubind în taină…

Iubitei Kamadeva

La Bucovina

La mijloc de codru...


Po

0.18

3

0.98

0.6831

--

FP

3.71

3

0.30

1.1293

--

HP

0.14

2

0.93

0.6472

0.9286

FP

3.21

4

0.52

1.1557

--

HP

3.19

3

0.36

1.1619

1.9042

FP

5.07

5

0.41

1.3382

--

HP

4.59

4

0.34

1.2756

1.6807

FP

7.94

4

0.09

1.3338

--

HP

7.27

3

0.06

1.3282

1.7945

Po

0.45

2

0.80

0.7345

--

FP

2.16

3

0.54

1.1982

--

HP

0.04

1

0.84

0.5488

0.6586

FP

4.79

4

0.31

1.4060

--

HP

3.87

4

0.42

1.5353

2.0002

Po

3.36

2

0.19

0.8247

--

FP

3.68

3

0.30

1.3016

--

HP

3.55

2

0.17

1.6424

2.5116

FP

2.38

3

0.50

0.9796

--

HP

0.37

2

0.93

1.7957

4.1584

Po

1.62

2

0.45

0.8258

--

FP

0.43

2

0.81

1.2623

--

HP

0.07

2

0.97

1.8589

3.0212

Po

1.56

3

0.67

0.9005

--

FP

4.89

3

0.18

1.3756

--

HP

0.48

2

0.79

0.6957

0.6766

Po

1.62

2

0.45

0.8258

--

FP

0.43

2

0.81

1.2623

--

HP

0.07

2

0.97

1.8589

3.0212

La mormântul lui Aron Pum-

FP

5.32

3

0.15

1.4849

--

nul

HP

4.49

3

0.21

2.0277

2.6576

La steaua

Po

1.35

2

0.51

0.7026

--

Lacul

Lasă-ţi lumea...

FP

1.15

2

0.56

1.1471

--

HP

1.13

1

0.29

1.0531

1.7120

Po

0.02

2

0.99

0.7700

--

FP

0.52

2

0.77

1.2160

-1.0208

HP

0.01

1

0.92

0.7778

Po

3.08

3

0.38

0.8151

--

HP

0.42

1

0.52

0.5078

0.5237

Lida

Locul aripelor

Mai am un singur dor

Melancolie

Miradoniz Mitologicale Mortua est! Napoleon

Noaptea...

O, mamă…

O, ramâi

Odă în metru antic

Oricâte stele...

Pajul Cupidon...


Po

2.73

FP

5.14

2

0.26

0.8367

--

2

0.08

1.3239

HP

--

0.02

1

0.90

0.4186

0.3488

Po

1.88

3

0.60

0.6669

--

FP

0.74

3

0.86

1.0979

--

HP

0.63

2

0.73

0.9464

1.6000

Po

2.37

2

0.31

0.7187

--

FP

0.95

3

0.81

1.1595

--

HP

0.46

2

0.79

1.6873

3.0524

Po

5.49

3

0.14

0.8742

--

FP

3.11

3

0.38

1.3247

--

HP

2.95

2

0.23

1.3227

1.8055

FP

3.25

4

0.52

1.3477

--

HP

2.83

3

0.42

1.3705

1.8390

FP

6.82

4

0.15

1.3742

--

HP

4.39

3

0.22

1.6287

2.2437

Po

3.40

3

0.33

0.6690

--

HP

1.12

2

0.57

0.4932

0.6634

Po

0.61

3

0.89

0.9740

--

FP

1.61

3

0.66

1.4381

--

HP

0.55

2

0.76

1.0411

1.1137

Po

0.31

3

0.96

0.7595

--

FP

1.01

3

0.80

1.2088

--

HP

0.25

2

0.88

0.8382

1.1542

Po

0.68

2

0.71

0.6859

--

FP

1.07

2

0.58

1.1310

--

HP

0.59

1

0.44

0.7906

1.2129

Po

0.49

2

0.78

0.6405

--

FP

0.28

2

0.90

1.0726

--

HP

0.08

1

0.77

0.8595

1.4725

Po

2.14

3

0.54

0.8651

--

FP

1.21

3

0.75

1.3035

--

HP

1.26

2

0.53

1.2166

1.6491

Po

0.39

2

0.82

1.2754

--

FP

1.17

2

0.56

0.8278

--

HP

0.34

1

0.56

1.2874

1.8439

Po

5.00

3

0.17

0.8778

--

FP

2.94

3

0.40

1.3254

--

Pe aceeaşi ulicioară…

Pe lângă plopii fără soţ

Peste vârfuri

Povestea codrului

Povestea teiului

Revedere

Rugăciunea unui dac

S-a dus amorul

Scrisoarea II Scrisoarea III Scrisoarea IV Scrisoarea V Se bate miezul nopţii…

Şi dacă… Singurătate


HP

2.60

2

Po

2.86

2

0.27

1.5879

2.2952

0.24

0.7484

FP

6.75

3

0.08

1.2243

---

HP

0.17

1

0.68

0.4504

0.4801

Po

1.39

2

0.50

0.5488

--

FP

0.55

2

0.76

0.9642

--

HP

0.47

1

0.49

0.8429

1.7273

Po

0.06

2

0.97

0.9336

--

FP

0.59

2

0.74

1.4037

--

HP

0.002

1

0.97

0.8253

0.8299

Po

1.17

3

0.76

0.8450

--

FP

3.54

3

0.32

1.3115

--

HP

1.03

2

0.60

0.7692

0.8746

Po

5.31

3

0.15

0.7872

--

FP

7.58

3

0.06

1.2439

--

HP

5.29

2

0.07

0.8389

1.1008

Po

6.43

3

0.09

0.8775

--

FP

4.11

3

0.25

1.3140

--

HP

3.81

2

0.15

1.5657

2.2967

Po

4.37

3

0.22

0.7836

--

FP

3.45

3

0.33

1.2325

--

HP

3.13

2

0.21

1.0451

1.4988

Po

5.37

2

0.07

0.6549

--

FP

2.53

3

0.47

1.0879

--

HP

2.24

2

0.33

1.2948

2.3932

FP

3.18

3

0.36

1.2970

--

HP

2.45

2

0.29

1.4089

2.0344

HP

4.08

4

0.40

1.6030

2.3501

FP

7.56

4

0.11

1.2920

--

HP

7.56

3

0.06

1.3141

1.8682

FP

6.35

4

0.17

1.2670

--

HP

4.72

3

0.19

1.3871

2.0506

Po

1.74

2

0.42

0.8706

--

FP

0.81

2

0.67

1.3077

--

HP

0.57

1

0.45

1.0624

2.4780

Po

1.70

2

0.43

0.6945

--

HP

0.27

1

0.60

1.7589

3.3780

Po

2.27

3

0.52

0.7960

--


FP

0.50

3

0.92

1.2315

--

HP

0.49

2

0.78

1.2018

1.7814

Po

1.29

2

0.52

1.0824

--

FP

1.72

2

0.42

1.5509

--

HP

1.38

1

0.24

0.9048

0.7155

Po

2.61

3

0.46

0.7732

--

FP

0.98

3

0.81

1.2074

--

HP

0.97

3

0.61

1.0686

1.5828

Po

4.01

3

0.26

0.8704

--

FP

7.30

3

0.06

1.3452

--

HP

3.57

2

0.17

0.7450

0.7966

Stelele-n cer

HP

2.62

3

0.45

9.2961

16.8191

Strigoii

FP

6.59

4

0.16

1.2797

--

HP

3.37

2

0.19

0.9823

1.2826

Po

2.44

2

0.30

0.7471

--

FP

5.31

2

0.07

1.2104

-0.5908

Somnoroase păsărele…

Sonete

Speranţa

Sus în curtea cea domnească

Te duci… Trecut-au anii

Venere şi Madona

Viaţa

HP

1.22

1

0.27

0.5198

FP

6.12

3

0.11

1.1103

--

HP

4.08

3

0.25

4.6704

10.4059

Po

1.25

2

0.54

0.7505

--

FP

0.90

2

0.64

1.1791

--

HP

0.88

1

0.35

1.1020

1.5199

Po

5.18

3

0.16

0.8962

--

FP

3.77

4

0.44

1.3575

--

HP

2.89

3

0.42

1.5683

2.1912 --

Po

0.77

3

0.86

0.8507

FP

2.92

4

0.57

1.3107

--

HP

0.59

2

0.74

0.8999

1.1029

Out of 100 poems, 86 follow the above-mentioned models; the number of possible modifications is, however, still greater. Whether the next poem follows the well-known scheme depends on the theme and the momentary mood of the writer. Again, we can argue with supplementary corrections or conscious deviations from the casting mould. It is sufficient for a single length class to deviate in order to obtain another model. Consider the modified Singh-Poisson distribution defined as


(3.4.9)
$$P_x = \begin{cases} 1 - \alpha + \alpha e^{-a}, & x = 1 \\[4pt] \dfrac{\alpha\, a^{x-1} e^{-a}}{(x-1)!}, & x = 2, 3, 4, \ldots \end{cases}$$

in its 1-displaced form, where the first class is modified and the other ones are weighted by α so that the probabilities sum to 1 (cf. Wimmer, Altmann 1999). The poems following this regime are presented in Table 3.4.3. In two cases it was possible to apply the 1-displaced binomial distribution (which converges to the 1-displaced Poisson distribution), and we obtained for

Când: X2 = 3.82, DF = 1, P = 0.0505, n = 3, p = 0.3025
Amicului F.I.: X2 = 1.71, DF = 1, P = 0.20, n = 3, p = 0.2629

Now only 6 poems remain which could not be captured by these models, viz. Ecò (N = 689); Sara pe deal (N = 156); Ghazel (N = 331); Feciorul de împărat fără de stea (N = 6032); Memento mori (N = 9777), and Scrisoarea I (N = 1282). Three of them are too large (N > 1200) and can be considered mixtures. The remaining three poems represent deviations, and we can conjecture that some other mechanism was active at their production or that they are conscious stylistic deviations. Perhaps a purely qualitative analysis could unveil this “mystery”, i.e. show the boundary conditions under which the poems were written.

Table 3.4.3: Poems following the 1-displaced Singh-Poisson distribution

| Poem title | X2 | DF | P | a | α |
|---|---|---|---|---|---|
| Amorul unei marmure | 5.91 | 3 | 0.12 | 1.2100 | 0.7946 |
| Din străinătate | 6.42 | 3 | 0.09 | 1.3836 | 0.7241 |
| Dumnezeu şi om | 7.47 | 3 | 0.06 | 1.1474 | 0.8852 |
| Junii corupţi | 2.94 | 3 | 0.40 | 1.2230 | 0.7616 |
| Luceafărul | 5.31 | 3 | 0.15 | 0.8907 | 0.7775 |
| Nu mă-nţelegi | 5.70 | 3 | 0.13 | 1.1223 | 0.7739 |
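As a minimal sketch of how the 1-displaced Singh-Poisson probabilities in (3.4.9) can be evaluated, the following Python fragment computes the probability of each length class and checks numerically that the classes sum to 1. The function name `singh_poisson_1displaced` and the truncation at 30 length classes are assumptions made for illustration; the parameter values are those listed for Luceafărul in Table 3.4.3.

```python
from math import exp, factorial


def singh_poisson_1displaced(x: int, a: float, alpha: float) -> float:
    """Probability of length class x (x = 1, 2, 3, ...) under formula (3.4.9)."""
    if x < 1:
        return 0.0
    if x == 1:
        # The first class carries the extra mass 1 - alpha.
        return 1.0 - alpha + alpha * exp(-a)
    # All other classes are Poisson probabilities weighted by alpha.
    return alpha * a ** (x - 1) * exp(-a) / factorial(x - 1)


# Parameters a and alpha as listed for Luceafarul in Table 3.4.3.
a, alpha = 0.8907, 0.7775

# Truncating at 30 classes is an illustrative choice; the remaining tail is negligible.
probs = [singh_poisson_1displaced(x, a, alpha) for x in range(1, 31)]
total = sum(probs)

for x, p in enumerate(probs[:6], start=1):
    print(f"P_{x} = {p:.4f}")
print(f"sum over 30 classes = {total:.6f}")
```

Multiplying these probabilities by the number of words in a poem would give the expected frequencies that enter the chi-square goodness-of-fit statistic; the observed frequencies themselves are not reproduced here.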

We can conclude that there are not only stress-conditioned rhythms in texts but also other ones. In this chapter we scrutinised word length, which obeys patterns derivable from a common background mechanism. The responsible mechanism yields one pattern out of several contained in a reservoir, which can and should be captured by a theory. Needless to say, the application of a selected model always takes place under the ceteris paribus condition. Unfortunately, this condition is not always met, which may lead to exceptions or deviations. Our aim is to present the theory and not the local conditions, which are not always traceable. Finding them must be left to those who specialise in text sorts or individual writers.

3.5 Word classes (parts of speech)

3.5.1 Frequencies

The study of word classes or, in particular, parts of speech has been strongly influenced by the Greek-Latin tradition, which arranges words according to changing criteria: morphologically marked grammatical categories, ontological and semantic features, syntactic and discourse-pragmatic use. But even here, languages allowing conversion – e.g. English, German – and “exotic” languages destroyed the illusion that the world itself is “classified”. Of course, it is ordered in its own way, but every language constructs this order conceptually in a different way. Unfortunately, the difference is not only conceptual, it may also be formal: while in German most parts of speech can be used as nouns in an appropriate syntactic context (often with the determiner das), the reverse procedure is not always possible; derivation by means of affixation is common in inflecting languages. Nevertheless, it is always possible to partition the text into prefabricated classes – with or without intersections – but different researchers may do it in different ways. The partition does not depend on some order in reality but on the aim of our investigation. In such a situation one always seeks a criterion of “correctness” of the classification, but it cannot be found in the language or the text itself. Mostly, agreement with the classical grammar based on European languages is chosen as a criterion, which is too narrow an approach. Semantic, psychological, sociolinguistic, etc. criteria are also applied, all depending on the aim of research, and they are all correct if they corroborate the hypothesis we have set up in advance. But working with the so-called Peircean abduction does not even provide clear hypotheses; with induction one hopes to have found a universal, but this is mostly a problem of definition; and deduction holding true for all languages is rare in linguistics and in many cases creates languages which do not exist at all.

According to Bunge (1983: 17), classification – if it has not been established deductively – is not a theoretical but a taxonomic account of reality. But a taxonomic account is always linked with some practical or epistemological aim. In grammatical research it is practical; in quantitative linguistics it is also epistemological. In grammatical research we classify entities according to form, meaning, function, place in the sentence, etc. In quantitative linguistics we always ask whether the given classification abides by a lawlike hypothesis. That is, we set up the hypothesis that a linguistic classification may be considered “correct” or “purposeful” if the rank-frequency distribution of classes abides by some reproducible distribution. The distribution is not known in advance in all cases and it is not equal in all cases – especially because texts are different – but in any case, an acceptable rank-frequency distribution of a classification must be derivable from the general theory (cf. Wimmer, Altmann 2005).

Word classes are results of language development. The process leading to their rise is known as diversification (cf. Zipf 1949; Köhler 1991), triggered by various circumstances and requirements of the language users. Altmann (2005) mentions six causes of diversification: random fluctuation, environmentally conditioned variation, conscious change, self-regulation, system modification, and Köhler's requirements, out of which those responsible for diversification are especially the trend for minimal coding and decoding effort, sufficient redundancy and minimisation of production effort, the general coding requirement and its opposite force, the need for minimising the inventory, context economy and context specificity, invariance vs. flexibility of the relation between expression and meaning. It is a very complex process leading to the rise of variants, classes, overt marking, conversions, etc. It is active in all domains of language, from phonetics up to sociolinguistics and psycholinguistics. Word classes can be marked phonetically, morphologically, syntactically or lexically; every language prefers its own methods. Our aim is to show that if there is a way to identify word classes in the given language, then the ranking of classes abides by a regular probability distribution or a stratification process. The study of this phenomenon is well developed, though the number of publications is not large (cf. Hammerl 1990; Best 1994, 1998; Köhler 1991; Rothe 1991; Wimmer, Altmann 2001; Ziegler 1998, 2001; Nemcová 2008; Overbeck, Best 2008).

Let us consider the word classes in Eminescu's poems. In Romanian, we shall distinguish the following word classes: adjective (A), adverb (Av), article (At), conjunction (C), interjection (I), noun (N), numeral (Nu), preposition (P), pronoun (Pn) and verb (V), which is rather usual in Latin-based grammars. One could increase the number of classes by adding details (e.g. different kinds of pronouns or numerals), but this will not be necessary, since we are interested in reliable representations of classes. The empirical distribution is given in the order A, Av, At, C, I, N, Nu, P, Pn, V, and the fitting is performed for the corresponding usual rank-frequency sequence, i.e. the frequencies are arranged in decreasing order, and we test the following hypotheses:

The ordering of frequencies abides by the exponential decay regularity, defined as

(3.5.1) $y = c + a_1 \exp(-x/r_1) + a_2 \exp(-x/r_2)$.

One can, of course, also test the classical Zipfian power function, but some publications have shown that it does not fit the data well. Some authors also apply the Zipf-Alekseev function and the negative hypergeometric distribution. The above formula is sufficient for short texts, where c = 0, but for longer texts in which each word class occurs at least once, it is usual to set c on the right-hand side according to the smallest frequency. There are qualitatively different orderings, i.e. we do not find in all poems the sequence ordered in the form N, V, P, A, Av, Pn, C, At, I, Nu as it is in the poem Lacul. The equality of orderings can be tested by an appropriate statistical test. The frequencies of individual classes are presented in Table 3.5.1.

Table 3.5.1: Frequencies of word classes in Eminescu's poems (the poem Călin (file de poveste) is written Călin)

| Poem title | Poem size N (words) | A | Av | At | C | I | N | Nu | P | Pn | V |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Atât de fragedă... | 176 | 15 | 16 | 9 | 13 | 1 | 51 | 0 | 29 | 23 | 27 |
| Călin, Gazel | 65 | 4 | 2 | 4 | 3 | 0 | 16 | 0 | 9 | 10 | 13 |
| Călin, part I | 351 | 44 | 19 | 25 | 23 | 0 | 104 | 0 | 57 | 47 | 50 |
| Călin, part II | 41 | 4 | 3 | 1 | 3 | 0 | 8 | 1 | 6 | 6 | 8 |
| Călin, part III | 249 | 25 | 21 | 6 | 27 | 0 | 54 | 0 | 31 | 56 | 39 |
| Călin, part IV | 304 | 31 | 23 | 15 | 23 | 4 | 72 | 0 | 50 | 62 | 48 |
| Călin, part V | 247 | 36 | 18 | 10 | 22 | 0 | 60 | 0 | 27 | 34 | 38 |
| Călin, part VI | 95 | 3 | 15 | 2 | 6 | 1 | 24 | 6 | 108 | 132 | 161 |
| Călin, part VII | 463 | 46 | 30 | 27 | 27 | 2 | 140 | 2 | 75 | 75 | 73 |
| Călin, part VIII | 504 | 55 | 22 | 19 | 37 | 3 | 178 | 4 | 92 | 44 | 74 |
| Floare-albastră | 247 | 20 | 17 | 7 | 22 | 2 | 74 | 0 | 41 | 40 | 40 |
| Lacul | 90 | 11 | 8 | 1 | 7 | 0 | 23 | 0 | 14 | 8 | 17 |
| Mai am un singur dor | 125 | 11 | 17 | 5 | 4 | 0 | 39 | 0 | 16 | 15 | 21 |
| Melancolie | 274 | 33 | 17 | 15 | 31 | 2 | 80 | 1 | 36 | 28 | 43 |
| Pe lângă plopii fără soţ | 199 | 13 | 23 | 11 | 10 | 0 | 47 | 0 | 29 | 27 | 31 |


As can be seen in Table 3.5.2, ten texts can be fitted by (3.5.1) with one component, while five texts need two components. The causes must be sought directly in the given texts. Nevertheless, the texts display a different stratification. The usual chi-square test can be applied for testing the homogeneity of class frequencies, for our purposes defined as

(3.5.2) $X^2 = \sum_{i=1}^{15} \sum_{j=1}^{10} \dfrac{(n_{ij} - E_{ij})^2}{E_{ij}}$,

where there are 15 poems and 10 parts of speech, $n_{ij}$ are the observed frequencies in the cells (i, j) (that is, poem i and part of speech j) and the $E_{ij}$ are the expected values $E_{ij} = n_{i.} n_{.j} / n$, where $n_{i.}$ is the sum of the row, $n_{.j}$ is the sum of the column, and n is the sum of all frequencies. The test is generally known. Applying (3.5.2) to the data in Table 3.5.1 we obtain X2 = 524.38 which, with (15 – 1)(10 – 1) = 126 degrees of freedom, displays a very strong non-homogeneity. Even if we consider only one poem, here Călin (file de poveste), composed of nine parts, we obtain a chi-square X2 = 434.65, which with 80 degrees of freedom testifies to very strong non-homogeneity. This fact supports our conjecture that texts partitioned by the author himself, or very long texts which cannot be written in one go, are mixed texts and cannot be considered a homogeneous whole.

If we ascribe ranks to the numbers in Table 3.5.1, we see that they are not associated equally with the given part-of-speech classes, though the differences are reduced by ranking. We ask whether the body of Eminescu's poems is homogeneous from this point of view. The problem can be tested using Kendall's concordance coefficient, Friedman's analysis of variance by ranks, etc. Classes with the same frequency obtain the mean rank, as shown in Table 3.5.3. We restrict ourselves to seven poems and compute the concordance coefficient

(3.5.3) $W = \dfrac{12S}{k^2 n (n^2 - 1)}$,

where k is the number of poems (here k = 7), n is the number of classes (here n = 10) and S is the sum of squared differences

(3.5.4) $S = \sum_{i=1}^{n} \left( R_i - \dfrac{k(n+1)}{2} \right)^2$,

where $R_i$ are the sums of ranks in individual columns (i = 1, 2, …, n). One can find them in the last row of Table 3.5.3.
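The homogeneity statistic (3.5.2) can be sketched in a few lines of Python. The fragment below is only an illustration under stated assumptions: the function name `chi_square_homogeneity` is hypothetical, only three rows of Table 3.5.1 are used (so the result is not the value 524.38 reported for all 15 poems), and columns whose total is zero are dropped, with the degrees of freedom reduced accordingly, because their expected values would be zero.

```python
def chi_square_homogeneity(table):
    """Return (X2, df) for a rows-by-columns table of observed counts."""
    # Assumption: columns (and rows) with a zero total are dropped,
    # since their expected values E_ij would be zero.
    cols = [j for j in range(len(table[0])) if sum(row[j] for row in table) > 0]
    rows = [[row[j] for j in cols] for row in table if sum(row) > 0]

    row_sums = [sum(r) for r in rows]
    col_sums = [sum(r[j] for r in rows) for j in range(len(cols))]
    n = sum(row_sums)

    x2 = 0.0
    for i, r in enumerate(rows):
        for j, observed in enumerate(r):
            expected = row_sums[i] * col_sums[j] / n  # E_ij = n_i. * n_.j / n
            x2 += (observed - expected) ** 2 / expected

    df = (len(rows) - 1) * (len(cols) - 1)
    return x2, df


# Three rows of Table 3.5.1 in the order A, Av, At, C, I, N, Nu, P, Pn, V.
poems = {
    "Lacul": [11, 8, 1, 7, 0, 23, 0, 14, 8, 17],
    "Mai am un singur dor": [11, 17, 5, 4, 0, 39, 0, 16, 15, 21],
    "Pe lângă plopii fără soţ": [13, 23, 11, 10, 0, 47, 0, 29, 27, 31],
}

x2, df = chi_square_homogeneity(list(poems.values()))
print(f"X2 = {x2:.2f} with {df} degrees of freedom")
```

In this subset the classes I and Nu do not occur at all, which is why the sketch works with 8 instead of 10 columns. The resulting X2 would then be compared with the chi-square quantile for the given degrees of freedom.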


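Table 3.5.3: Ranking of word-classes

| Poem title | Poem size N (words) | A | Av | At | C | I | N | Nu | P | Pn | V |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Atât de fragedă... | 176 | 6 | 5 | 8 | 7 | 9 | 1 | 10 | 2 | 4 | 3 |
| Călin, Gazel | 65 | 5.5 | 7 | 5.5 | 8 | 9.5 | 1 | 9.5 | 4 | 3 | 2 |
| Floare-albastră | 247 | 6 | 7 | 8 | 5 | 9 | 1 | 10 | 2 | 3.5 | 3.5 |
| Lacul | 90 | 4 | 5.5 | 8 | 7 | 9.5 | 1 | 9.5 | 3 | 5.5 | 2 |
| Mai am un singur dor | 125 | 6 | 3 | 7 | 8 | 9.5 | 1 | 9.5 | 4 | 5 | 2 |
| Melancolie | 274 | 4 | 7 | 8 | 5 | 9 | 1 | 10 | 3 | 6 | 2 |
| Pe lângă plopii fără soţ | 199 | 6 | 5 | 7 | 8 | 9.5 | 1 | 9.5 | 3 | 4 | 2 |
| Sums of ranks Ri | | 37.5 | 39.5 | 51.5 | 48 | 65 | 7 | 68 | 19 | 31 | 16.5 |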

Since k(n + 1)/2 = 7(11)/2 = 38.5, we obtain

$S = (37.5 - 38.5)^2 + (39.5 - 38.5)^2 + (51.5 - 38.5)^2 + (48 - 38.5)^2 + (65 - 38.5)^2 + (7 - 38.5)^2 + (68 - 38.5)^2 + (19 - 38.5)^2 + (31 - 38.5)^2 + (16.5 - 38.5)^2 = 3746.5$.

Inserting this result in (3.5.3) we obtain

$W = \dfrac{12(3746.5)}{7^2 (10)(100 - 1)} = \dfrac{44958}{48510} = 0.9267$.

As W ∈ [0,1], where 1 means perfect concordance and 0 no agreement (independence), we see that, as to ranking, the parts of speech are distributed concordantly, but this concordance is rather language-conditioned, not text-dependent.

(3) If one chooses the same word classes for different languages, one obtains different orderings. Thus languages and texts can be classified according to this order, and the same may be done with authors.
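The ranking with mean ranks for ties and the concordance coefficient of (3.5.3) and (3.5.4) can be illustrated by the following Python sketch. The function names `mean_ranks` and `kendall_w` are hypothetical; the frequencies are those given for Lacul in Table 3.5.1, and the rank sums are taken as printed in the last row of Table 3.5.3. As in (3.5.3), no correction for ties is applied.

```python
def mean_ranks(freqs):
    """Rank frequencies in decreasing order; tied values receive the mean rank."""
    order = sorted(range(len(freqs)), key=lambda i: -freqs[i])
    ranks = [0.0] * len(freqs)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next frequency equals the current one.
        while end + 1 < len(order) and freqs[order[end + 1]] == freqs[order[pos]]:
            end += 1
        mean = (pos + 1 + end + 1) / 2  # mean of the ranks pos+1 ... end+1
        for i in order[pos:end + 1]:
            ranks[i] = mean
        pos = end + 1
    return ranks


def kendall_w(rank_sums, k):
    """Return (S, W) from the column rank sums R_i of k rankings."""
    n = len(rank_sums)
    centre = k * (n + 1) / 2
    s = sum((r - centre) ** 2 for r in rank_sums)
    w = 12 * s / (k ** 2 * n * (n ** 2 - 1))
    return s, w


# Lacul, classes in the order A, Av, At, C, I, N, Nu, P, Pn, V (Table 3.5.1).
lacul = [11, 8, 1, 7, 0, 23, 0, 14, 8, 17]
lacul_ranks = mean_ranks(lacul)
print(lacul_ranks)

# Column sums as printed in the last row of Table 3.5.3 (k = 7 poems).
rank_sums = [37.5, 39.5, 51.5, 48, 65, 7, 68, 19, 31, 16.5]
s, w = kendall_w(rank_sums, k=7)
print(f"S = {s}, W = {w:.4f}")
```

Ranking each poem with `mean_ranks`, summing the ranks column by column and passing the sums to `kendall_w` would give the complete procedure for any selection of poems.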

3.5.2 Descriptiveness vs. activity

The frequencies of word classes in a text depend both on language and on style. The distribution of frequencies in translations of the same text can be helpful in showing the present state of the languages, and translations of older texts into the present state of a language can show the direction of development. As to style, two ways of scrutinising a conspicuous property of the given text sort and of the personal style of the author are known: the study of descriptiveness vs. activity and the study of nominality. The former view compares the number of descriptive or ornamental components with those which express some activity. The first group consists mostly of adjectives, optionally including descriptive adverbs; the second group consists of verbs, which can be defined in several ways, e.g. including only verbs expressing activity (go, sing, blow, …) but not other verbs (have, sleep, be, …), or including all verbs, gerunds, gerundives, verbal interjections, etc. The latter view measures the extent of nominality in expressions. In German, laws and other official texts frequently use nominal expressions instead of verbal ones. In English, the difference can be illustrated by the sentences “He runs quickly” and “His run is quick”. Both views can further be studied from two perspectives: the first is static, taking into account the text as a whole; the second is dynamic, taking into account the development of the given property in the course of the text. We shall illustrate all these possibilities one after another. Consider the poem Peste vârfuri, in which we marked the adjectives or adverbs answering the question “how?” and the verbs:

Peste vârfuri trece lună,
Codru-şi bate frunza lin,
Dintre ramuri de arin
Melancolic cornul sună.

Mai departe, mai departe,
Mai încet, tot mai încet,
Sufletu-mi nemângâiet
Îndulcind cu dor de moarte.

De ce taci, când fermecată
Inima-mi spre tine-ntorn?
Mai suna-vei, dulce corn,
Pentru mine vre odată?


The sequence of verbs (V) and adjectives (A) as described above is as follows:

(I) V V A A V A A A A A V V A V V A.

Several procedures are available to characterise the descriptiveness/activity of the poem. We simply determine the proportion of verbs in the given sample, i.e. we define an indicator known as a modification of Busemann's ratio

(3.5.5) $Q = \dfrac{n_V}{n_V + n_A}$

(cf. Altmann 1978) and obtain Q = 0.44 for our example, where nV = 7 and nA = 9. If Q < 0.5, the text can be considered descriptive; if Q > 0.5, the text can be considered active; if Q ≈ 0.5, it is in a descriptive-active equilibrium. There is, of course, a way to decide whether the state of the text is significantly descriptive or active. Since Q is a simple proportion whose expected value is a priori 0.5, the significance of its activity can be computed by means of the binomial distribution. If nV > n/2, where n = nV + nA, then we compute

(3.5.6) $P(X \ge n_V) = \sum_{x = n_V}^{n} \binom{n}{x} 0.5^n$.

If P(X ≥ nV) < 0.05, we consider the text significantly active. On the other hand, if nV < n/2, we compute

(3.5.7) $P(X \le n_V) = \sum_{x = 0}^{n_V} \binom{n}{x} 0.5^n$.

If the probability is smaller than 0.05, the text is significantly descriptive. It would be possible to prepare tables for a number of n and state the nV at which the text takes on a significant degree of the property, but such tables would consume very much space, and the computation with a programme is simple. In our example, we have n = nV + nA = 16, out of which nV = 7. Since 7 < 16/2, we compute (3.5.7) and obtain

$P(X \le 7) = 0.5^{16} \left[ \binom{16}{0} + \binom{16}{1} + \ldots + \binom{16}{7} \right] = 0.40$,

i.e. the text is in a descriptive-active equilibrium (the probability is greater than 0.05). For very long texts, the asymptotic test

(3.5.8) $X^2 = \dfrac{(n_V - n_A)^2}{n_V + n_A}$

can be applied, where X2 is distributed like a chi-square with one degree of freedom, or identically

(3.5.9) $u = (2Q - 1)\sqrt{n}$,

which is distributed normally. Evidently u² = X2. For our example we obtain

$X^2 = \dfrac{(7 - 9)^2}{16} = 0.25$,

which is not significant with one degree of freedom, and

$u = \left( 2 \cdot \dfrac{7}{16} - 1 \right) \sqrt{16} = -0.5$,

which leads to the same result as the previous test. Evidently (-0.5)² = 0.25 = X2. The last two tests are very quick and can be used even with relatively small n.

The first dynamic consideration concerns the sequence of Qs with increasing n. It has already been studied by Köhler and Galle (1993) on the basis of another definition. Using (3.5.5), we obtain for the first members of the sequence (I): 1/1 = 1; 2/2 = 1; 2/3 = 0.67; 2/4 = 0.5; 3/5 = 0.6, …. The complete sequence is then as follows:

(II) 1; 1; 0.67; 0.5; 0.6; 0.5; 0.43; 0.38; 0.33; 0.30; 0.36; 0.42; 0.38; 0.43; 0.47; 0.44.
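The static tests and the running Q sequence described above can be reproduced with a short Python sketch. The function names are hypothetical, and the sequence is assumed to consist only of the symbols V and A; in the borderline case nV = n/2 the sketch simply evaluates the lower tail, which is an illustrative choice.

```python
from math import comb, sqrt


def busemann_q(seq):
    """Proportion of verbs among verbs and adjectives, formula (3.5.5)."""
    n_v = seq.count("V")
    n_a = seq.count("A")
    return n_v / (n_v + n_a)


def binomial_tail(n_v, n):
    """One-sided probability under p = 0.5: (3.5.6) if n_v > n/2, else (3.5.7)."""
    xs = range(n_v, n + 1) if n_v > n / 2 else range(0, n_v + 1)
    return sum(comb(n, x) for x in xs) * 0.5 ** n


def asymptotic_tests(n_v, n_a):
    """Return (X2, u) according to (3.5.8) and (3.5.9)."""
    n = n_v + n_a
    x2 = (n_v - n_a) ** 2 / n
    u = (2 * n_v / n - 1) * sqrt(n)
    return x2, u


def running_q(seq):
    """Value of Q after each successive element of the sequence."""
    values, n_v = [], 0
    for position, symbol in enumerate(seq, start=1):
        if symbol == "V":
            n_v += 1
        values.append(n_v / position)
    return values


seq = "VVAAVAAAAAVVAVVA"  # sequence (I) for Peste vârfuri
n_v, n_a = seq.count("V"), seq.count("A")

print(f"Q = {busemann_q(seq):.2f}")
print(f"P = {binomial_tail(n_v, n_v + n_a):.2f}")
print("X2 = {:.2f}, u = {:.2f}".format(*asymptotic_tests(n_v, n_a)))
print("; ".join(f"{q:.2f}" for q in running_q(seq)))
```

Keeping the running sequence as exact fractions rather than rounded values avoids accumulating rounding error when a curve is later fitted to it.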

If there are antagonistic requirements/forces operating in the rise of the text – here descriptiveness and activity – then it can be expected that sequences like the one above display either a regular oscillating course or a stepwise change. In our case a regular oscillation cannot be assumed because the writer does not care about this detail. We rather expect that one of the properties changes more or less regularly. In order to capture this motion, we conjecture that the rate of change of the given variable is a function of the two antagonistic forces. Up to now, two approaches have been proposed (Popescu, Čech, Altmann 2011b). The first is a product of the form

(3.5.10) $y' = K f(x)[1 - f(x)]$,

where x is the position in the sequence and y the value of Q. In the second one, the relative rate of change of y is an additive composition of the functions of descriptiveness and activity in the form

(3.5.11) $\dfrac{y'}{y} = \dfrac{a}{x} - \dfrac{b}{M - x}$.

Choosing an exponential function exp(-c(x - d)) for f(x) in (3.5.10) we obtain the Morse function⁶

(3.5.12) $y = a + b[1 - \exp(-c(x - d))]^2$.

The solution of (3.5.11) yields the beta-function in the form

(3.5.13) $y = C x^a (M - x)^b$.

Here M must be greater than the maximum number of steps ($x_{max}$). Applying the Morse function to the sequence (II) we obtain

$y = 0.3563 + 1.4156[1 - \exp(-0.0581(x - 10.1257))]^2$,

yielding R² = 0.92. It is presented graphically in Figure 3.5.1.
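To see how the reported Morse curve relates to the sequence (II), one can evaluate (3.5.12) with the fitted parameters and compute the determination coefficient. The following Python sketch does only that; it does not refit the parameters. The function names `morse` and `r_squared` are hypothetical, and the rounded values of sequence (II) are used, so the resulting R² need not agree with the reported 0.92 in every digit.

```python
from math import exp


def morse(x, a, b, c, d):
    """Morse function (3.5.12): y = a + b * (1 - exp(-c * (x - d)))**2."""
    return a + b * (1 - exp(-c * (x - d))) ** 2


def r_squared(observed, fitted):
    """Determination coefficient 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Sequence (II) for Peste vârfuri, rounded to two decimals as printed in the text.
q_seq = [1, 1, 0.67, 0.5, 0.6, 0.5, 0.43, 0.38,
         0.33, 0.30, 0.36, 0.42, 0.38, 0.43, 0.47, 0.44]

# Parameters of the fitted curve as reported in the text.
params = dict(a=0.3563, b=1.4156, c=0.0581, d=10.1257)

fitted = [morse(x, **params) for x in range(1, len(q_seq) + 1)]
r2 = r_squared(q_seq, fitted)

for x, (q, y) in enumerate(zip(q_seq, fitted), start=1):
    print(f"x = {x:2d}  Q = {q:.2f}  Morse = {y:.3f}")
print(f"R2 = {r2:.3f}")
```

At x = d the squared bracket vanishes, so for positive b the curve takes its minimum value a there; this is a quick consistency check on any set of fitted parameters.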

Figure 3.5.1. Fitting the Morse function to the sequence of Qs in the poem Peste vârfuri

The Q sequence of this poem begins with a verb (V), that is, with Q = 1. Long poems of this kind, such as Odă în metru antic, can be fitted as well, as shown in Figure 3.5.2. Note that parameter d gives the position x = nA + nV at which the curve attains its minimum.

⁶ The Morse potential, named after physicist Philip M. Morse, is a well known model of the potential energy in a diatomic molecule, see http://en.wikipedia.org/wiki/Morse_potential.

Figure 3.5.2. Fitting the Morse function to the sequence of Qs in the poem Odă în metru antic

The other kind of Q sequences begins with an adjective (A), that is with Q = 0. In this case, the Morse fitting parameters a and d should vanish, as shown in Figure 3.5.3. The results for some other poems are presented in Table 3.5.4.

Figure 3.5.3. Fitting the Morse function to the sequence of Qs in the poem Lacul

Table 3.5.4: Fitting the Q-sequences in some poems by the Morse function (other functions are marked by name)

Poem title

Empirical sequence

R2

Q-sequence Morse-function Adânca mare…

AAAAVVAAAVVAAAVVVVAVAVVAA 0; 0; 0; 0; 0.2; 0.33; 0.29; 0.25; 0.22; 0.3; 0.36; 0.33; 0.31; 0.29; 0.33; 0.38; 0.41; 0.44; 0.42; 0.45; 0.43; 0.45; 0.48; 0.46; 0.44 Chapman: y = 0,4652(1 - exp(-0,1570x))1,9041

Atât de fragedă...

0.87

AVAVVAVAAVVAAAVVAVVVAAVAVAVAVVAVAA VAVVVVAAVVV 0; 0.5; 0.33; 0.5; 0.6; 0.5; 0.57; 0.5; 0.44; 0.5; 0.55; 0.5; 0.46; 0.43; 0.47; 0.5; 0.47; 0.5; 0.53; 0.55; 0.52; 0.5; 0.52; 0.5; 0.52; 0.5; 0.52; 0.5; 0.52; 0.53; 0.52; 0.53; 0.52; 0.5; 0.51; 0.5; 0.51; 0.53; 0.54; 0.55; 0.54; 0.52; 0.53; 0.55; 0.56 y = 0.5104(1-exp(-0.9881(x – 0.8804)))2

Călin, Gazel

0.77

VVAVAVVVVVVVVAVVAA 1; 1; 0.67; 0.75; 0.6; 0.67; 0.71; 0.75; 0.78; 0.8; 0.82; 0.83; 0.85; 0.79; 0.8; 0.81; 0.76; 0.72 y = 0.7010 + 0.1180(1-exp(-0.2546(x-4.9521)))2

Călin, part I

0.62

VVAAVVVAVAAAVVVVAAVAVAAAAVVVAVVAAV AAVVVVAAVAVAVAAVAAVVAAVVAAAAVAVAVV AVAVVAAAVVVAAAVVVVVVAVAVAVA 1; 1; 0.67; 0.5; 0.6; 0.67; 0.71; 0.63; 0.67; 0.6; 0.55; 0.5; 0.54; 0.57; 0.6; 0.63; 0.59; 0.56; 0.58; 0.55; 0.57; 0.55; 0.52; 0.5; 0.48; 0.5; 0.52; 0.54; 0.52; 0.53; 0.55; 0.53; 0.52; 0.53; 0.51; 0.5; 0.51; 0.53; 0.54; 0.55; 0.54; 0.52; 0.53; 0.52; 0.53; 0.52; 0.53; 0.52; 0.51; 0.52; 0.51; 0.5; 0.51; 0.52; 0.51; 0.5; 0.51; 0.52; 0.51; 0.5; 0.49; 0.48; 0.49; 0.48; 0.49; 0.48; 0.49; 0.5; 0.49; 0.5; 0.49; 0.5; 0.51; 0.5; 0.49; 0.49; 0.49; 0.5; 0.51; 0.5; 0.49; 0.49; 0.49; 0.5; 0.51; 0.51; 0.52; 0.52; 0.52; 0.52; 0.52; 0.52; 0.52; 0.52; 0.52 y = 0.5177 + 9.4295E-6(1 - exp(-0.1350(x - 40.9756)))2

Călin, part II

0.73

VVAVAAVAVVAAVV 1; 1; 0.67; 0.75; 0.6; 0.5; 0.57; 0.5; 0.56; 0.6; 0.55; 0.5; 0.54; 0.57 Asymptotic: y = 0,5282 + 0,8015*0,6547x

Călin, part III

VVVAVAVAVVAAVVAVVAVVAVAAAAAVVVAAAV

0.84


VVAVVVAAVVAAAVAVVVVVVVAVVAVVVVAAA 1; 1; 1; 0.75; 0.8; 0.67; 0.71; 0.63; 0.67; 0.7; 0.64; 0.58; 0.62; 0.64; 0.6; 0.63; 0.65; 0.61; 0.63; 0.65; 0.62; 0.64; 0.61; 0.58; 0.56; 0.54; 0.52; 0.54; 0.55; 0.57; 0.55; 0.53; 0.52; 0.53; 0.54; 0.56; 0.54; 0.55; 0.56; 0.58; 0.56; 0.55; 0.56; 0.57; 0.56; 0.54; 0.53; 0.54; 0.53; 0.54; 0.55; 0.56; 0.57; 0.57; 0.58; 0.59; 0.58; 0.59; 0.59; 0.58; 0.59; 0.6; 0.6; 0.61; 0.6; 0.59; 0.58 y = 0.5685+0.00054(1-exp(-0.0894(x-39.1286)))2 Călin, part IV

0.87

VVAVVVAVVAAVAAVAAAAAAVVVVAVAAAAVVA VVVVAAVVAAVAAAVVVAVVVVVVVAVVVVVVVV VAVAVAV 1; 1; 0.67; 0.75; 0.8; 0.83; 0.71; 0.75; 0.78; 0.7; 0.64; 0.67; 0.62; 0.57; 0.6; 0.56; 0.53; 0.5; 0.47; 0.45; 0.43; 0.45; 0.48; 0.5; 0.52; 0.5; 0.52; 0.5; 0.48; 0.47; 0.45; 0.47; 0.48; 0.47; 0.49; 0.5; 0.51; 0.53; 0.51; 0.5; 0.51; 0.52; 0.51; 0.5; 0.51; 0.5; 0.49; 0.48; 0.49; 0.5; 0.51; 0.5; 0.51; 0.52; 0.53; 0.54; 0.54; 0.55; 0.56; 0.55; 0.56; 0.56; 0.57; 0.58; 0.58; 0.59; 0.6; 0.6; 0.61; 0.6; 0.61; 0.6; 0.6; 0.59; 0.6 y = 0.4759+0.29242(1-exp(-0.0266(x-31.9399)))2

Călin, part V

0.88

VAAAVAVVAVVVVVAVVVVVAAVAAVAAVVAAVA AAVVVAAVVVAAAAVVAVVAAAAAAVAAVAAVVA VVVVV 1; 0.5; 0.33; 0.25; 0.4; 0.33; 0.43; 0.5; 0.44; 0.5; 0.55; 0.58; 0.62; 0.64; 0.6; 0.63; 0.65; 0.67; 0.68; 0.7; 0.67; 0.64; 0.65; 0.63; 0.6; 0.62; 0.59; 0.57; 0.59; 0.6; 0.58; 0.56; 0.58; 0.56; 0.54; 0.53; 0.54; 0.55; 0.56; 0.55; 0.54; 0.55; 0.56; 0.57; 0.56; 0.54; 0.53; 0.52; 0.53; 0.54; 0.53; 0.54; 0.55; 0.54; 0.53; 0.52; 0.51; 0.5; 0.49; 0.5; 0.49; 0.48; 0.49; 0.48; 0.48; 0.48; 0.49; 0.49; 0.49; 0.5; 0.51; 0.51; 0.52 y = 0.3009 + 0.2562(1-exp(-0.3810(x-3.5721)))2

Călin, part VI

0.66

VVVVAVVVAAVVVAVVVVVA 1; 1; 1; 1; 0.8; 0.83; 0.86; 0.88; 0.78; 0.7; 0.73; 0.75; 0.77; 0.71; 0.73; 0.75; 0.76; 0.78; 0.79; 0.75 y = 0.7385+21.1310(1-exp(-0.0084(x-14.5036)))2

Călin, part VII

AVAVAAVAVVAVVVAVAVAVVAVVAVAAVVAVVA VAVVVVVVVVVVAAVAVAAVAVVAVAVAAAAVAV VAVAAAVAAVVVVVVAVVAAAVVVVVVVVVVAVV VVVAVAAVAAVVAVAVVAVV

0.83


0; 0.5; 0.33; 0.5; 0.4; 0.33; 0.43; 0.38; 0.44; 0.5; 0.45; 0.5; 0.54; 0.57; 0.53; 0.56; 0.53; 0.56; 0.53; 0.55; 0.57; 0.55; 0.57; 0.58; 0.56; 0.58; 0.56; 0.54; 0.55; 0.57; 0.55; 0.56; 0.58; 0.56; 0.57; 0.56; 0.57; 0.58; 0.59; 0.6; 0.61; 0.62; 0.63; 0.64; 0.64; 0.65; 0.64; 0.63; 0.63; 0.62; 0.63; 0.62; 0.6; 0.61; 0.6; 0.61; 0.61; 0.6; 0.61; 0.6; 0.61; 0.6; 0.59; 0.58; 0.57; 0.58; 0.57; 0.57; 0.58; 0.57; 0.58; 0.57; 0.56; 0.55; 0.56; 0.55; 0.55; 0.55; 0.56; 0.56; 0.57; 0.57; 0.58; 0.57; 0.58; 0.58; 0.57; 0.57; 0.56; 0.57; 0.57; 0.58; 0.58; 0.59; 0.59; 0.59; 0.6; 0.6; 0.61; 0.6; 0.6; 0.61; 0.61; 0.62; 0.62; 0.61; 0.62; 0.61; 0.61; 0.61; 0.6; 0.6; 0.6; 0.61; 0.6; 0.6; 0.6; 0.6; 0.61; 0.6; 0.6; 0.61 Asymptotic: y = 0,5917 - 0,4045*0,8783x Călin, part VIII

0.70

VVVAVAVAAVAVVAVAVVAVAVAAVAVVAAVAAV AAAAVVAAAVVAAAAAVAAAVVVVAVAAAAVAVA VAVAAVAVVVVVAAAVAVVVVVVVVVVAVVVAVV VVVAAVAVVAVVAAAVAVAVAAVVVVAVVA 1; 1; 1; 0.75; 0.8; 0.67; 0.71; 0.63; 0.56; 0.6; 0.55; 0.58; 0.62; 0.57; 0.6; 0.56; 0.59; 0.61; 0.58; 0.6; 0.57; 0.59; 0.57; 0.54; 0.56; 0.54; 0.56; 0.57; 0.55; 0.53; 0.55; 0.53; 0.52; 0.53; 0.51; 0.5; 0.49; 0.47; 0.49; 0.5; 0.49; 0.48; 0.47; 0.48; 0.49; 0.48; 0.47; 0.46; 0.45; 0.44; 0.45; 0.44; 0.43; 0.43; 0.44; 0.45; 0.46; 0.47; 0.46; 0.47; 0.46; 0.45; 0.44; 0.44; 0.45; 0.44; 0.45; 0.44; 0.45; 0.44; 0.45; 0.44; 0.44; 0.45; 0.44; 0.45; 0.45; 0.46; 0.47; 0.48; 0.47; 0.46; 0.46; 0.46; 0.46; 0.47; 0.47; 0.48; 0.48; 0.49; 0.49; 0.5; 0.51; 0.51; 0.52; 0.51; 0.52; 0.52; 0.53; 0.52; 0.52; 0.53; 0.53; 0.54; 0.54; 0.54; 0.53; 0.54; 0.53; 0.54; 0.54; 0.54; 0.54; 0.54; 0.54; 0.53; 0.53; 0.53; 0.53; 0.53; 0.53; 0.53; 0.53; 0.52; 0.53; 0.53; 0.54; 0.54; 0.53; 0.54; 0.54; 0.54 y = 0.4572 + 0.1311(1-exp(-0.0203(x-51.8483)))2

Când amintirile...

0.83

VVAAVAAVVAAVAVAVAVVVVVVVVAAVAA 1; 1; 0.67; 0.5; 0.6; 0.5; 0.43; 0.5; 0.56; 0.5; 0.45; 0.5; 0.46; 0.5; 0.47; 0.5; 0.47; 0.5; 0.53; 0.55; 0.57; 0.59; 0.61; 0.63; 0.64; 0.62; 0.59; 0.61; 0.59; 0.57 y = 0.4615 + 0.1781(1-exp(-0.1231 (x-9.4026)))2

Ce te legeni...

VVVVVVVAVVAVAVAVVVAVVAAVVAVVAAAAVA 1; 1; 1; 1; 1; 1; 1; 0.88; 0.89; 0.9; 0.82; 0.83; 0.77; 0.79; 0.73; 0.75; 0.76; 0.78; 0.74; 0.75; 0.76; 0.73; 0.7; 0.71; 0.72; 0.69; 0.7; 0.71; 0.69; 0.67; 0.65; 0.63; 0.64; 0.62

0.85

y = 0.6397 + 36.1336(1-exp(-0.0026(x - 39.5599)))^2

Crăiasa din poveşti

0.94

AAVAVVVVVAAVAAVVAAAAVVVAAVVAAVVAVA V 0; 0; 0.33; 0.25; 0.4; 0.5; 0.57; 0.63; 0.67; 0.6; 0.55; 0.58; 0.54; 0.5; 0.53; 0.56; 0.53; 0.5; 0.47; 0.45; 0.48; 0.5; 0.52; 0.5; 0.48; 0.5; 0.52; 0.5; 0.48; 0.5; 0.52; 0.5; 0.52; 0.5; 0.51 y = 0.5259(1-exp(-0.4304x))2

Criticilor mei

0.79

AVAVAVVAAVAVVVVAVVAAVAVVAVVAAVAVVV VAVAVVV 0; 0.5; 0.33; 0.5; 0.4; 0.5; 0.57; 0.5; 0.44; 0.5; 0.45; 0.5; 0.54; 0.57; 0.6; 0.56; 0.59; 0.61; 0.58; 0.55; 0.57; 0.55; 0.57; 0.58; 0.56; 0.58; 0.59; 0.57; 0.55; 0.57; 0.55; 0.56; 0.58; 0.59; 0.6; 0.58; 0.59; 0.58; 0.59; 0.6; 0.61 y = 0.5607 (1-exp(-0.6302x))2

Cu mâine zilele-ţi

AVAVVAVVVVVVVAVVAVVAVAVVAVAVVVAAAV

adaogi…

AVAVVAAAVAA

0.69

0; 0.5; 0.33; 0.5; 0.6; 0.5; 0.57; 0.63; 0.67; 0.7; 0.73; 0.75; 0.77; 0.71; 0.73; 0.75; 0.71; 0.72; 0.74; 0.7; 0.71; 0.68; 0.7; 0.71; 0.68; 0.69; 0.67; 0.68; 0.69; 0.7; 0.68; 0.66; 0.64; 0.65; 0.63; 0.64; 0.62; 0.63; 0.64; 0.63; 0.61; 0.6; 0.6; 0.59; 0.58 y = 0.6720(1-exp(-0.5178x))2 De ce nu-mi vii?

0.75

VVVVVVVAVAVVVAAVVVAVVVAAAVAVVAVV 1; 1; 1; 1; 1; 1; 1; 0.88; 0.89; 0.8; 0.82; 0.83; 0.85; 0.79; 0.73; 0.75; 0.76; 0.78; 0.74; 0.75; 0.76; 0.77; 0.74; 0.71; 0.68; 0.69; 0.67; 0.68; 0.69; 0.67; 0.68; 0.69 y = 0.6283+ 0.0580(1-exp(-0.0227(x-58.6454)))2

De-aş avea

0.92

VAAAAVAVVVAVVAAAAAVAAVVAAVVVAV 1; 0.5; 0.33; 0.25; 0.2; 0.33; 0.29; 0.38; 0.44; 0.5; 0.45; 0.5; 0.54; 0.5; 0.47; 0.44; 0.41; 0.39; 0.42; 0.4; 0.38; 0.41; 0.43; 0.42; 0.4; 0.42; 0.44; 0.46; 0.45; 0.47 y = 0.2647 + 0.1811(1-exp(-0.3763(x-3.9474)))2

De-or trece anii...

0.88

VVAVVVVVVVVAVVVVVVVAVAVVV 1; 1; 0.67; 0.75; 0.8; 0.83; 0.86; 0.88; 0.89; 0.9; 0.91; 0.83; 0.85; 0.86; 0.87; 0.88; 0.88; 0.89; 0.89; 0.85; 0.86; 0.82; 0.83; 0.83; 0.84 y = 0.7919+ 0.0759(1-exp(-0.3850(x-3.6703))2

Departe sunt de tine... A V A V A V V V A V V V A V V A A A V A A A V A V V V V V A V A A V VVVVAAVA

0.44


0; 0.5; 0.33; 0.5; 0.4; 0.5; 0.57; 0.63; 0.56; 0.6; 0.64; 0.67; 0.62; 0.64; 0.67; 0.63; 0.59; 0.56; 0.58; 0.55; 0.52; 0.5; 0.52; 0.5; 0.52; 0.54; 0.56; 0.57; 0.59; 0.57; 0.58; 0.56; 0.55; 0.56; 0.57; 0.58; 0.59; 0.61; 0.59; 0.58; 0.59; 0.57 y = 0.5759 (1-exp(-0.6225x))2

0.69

Dintre sute de catarge V V A V V V V A V V V A V 1; 1; 0.67; 0.75; 0.8; 0.83; 0.86; 0.75; 0.78; 0.8; 0.82; 0.75; 0.77 y = 0.7786 + 0.0164(1-exp(-0.3737(x-5.2560))2 Dorinţa

0.52

VVAVAVVVVVVAAAVAAAVVAAVAVAAVVVAA 1; 1; 0.67; 0.75; 0.6; 0.67; 0.71; 0.75; 0.78; 0.8; 0.82; 0.75; 0.69; 0.64; 0.67; 0.63; 0.59; 0.56; 0.58; 0.6; 0.57; 0.55; 0.57; 0.54; 0.56; 0.54; 0.52; 0.54; 0.55; 0.57; 0.55; 0.53 y = 0.4799 + 0.0006(1-exp(-0.0316(x- 104.1823))2

După ce atâta vreme

0.68

AVVVAVAVAAAVVAAVVVAVV 0; 0.5; 0.67; 0.75; 0.6; 0.67; 0.57; 0.63; 0.56; 0.5; 0.45; 0.5; 0.54; 0.5; 0.47; 0.5; 0.53; 0.56; 0.53; 0.55; 0.57 y = 0.5614(1-exp(-1.0274)2

Floare-albastră

0.54

VAVAAVAAAVAVAAVAVVVVAVVVAAAAAVVVVA VVVVVVVVVAAVVVVVAVVAAVAAVAVAAVV 1; 0.5; 0.67; 0.5; 0.4; 0.5; 0.43; 0.38; 0.33; 0.4; 0.36; 0.42; 0.38; 0.36; 0.4; 0.38; 0.41; 0.44; 0.47; 0.5; 0.48; 0.5; 0.52; 0.54; 0.52; 0.5; 0.48; 0.46; 0.45; 0.47; 0.48; 0.5; 0.52; 0.5; 0.51; 0.53; 0.54; 0.55; 0.56; 0.58; 0.59; 0.6; 0.6; 0.59; 0.58; 0.59; 0.6; 0.6; 0.61; 0.62; 0.61; 0.62; 0.62; 0.61; 0.6; 0.61; 0.6; 0.59; 0.59; 0.58; 0.59; 0.58; 0.57; 0.58; 0.58 y = 0.3651 + 0.2283(1-exp(-0.0979(x-10.2388)))2

Înger de pază

0.79

VVAAVVAAVAVAAVAVAAVAVVAV 1; 1; 0.67; 0.5; 0.6; 0.67; 0.57; 0.5; 0.56; 0.5; 0.55; 0.5; 0.46; 0.5; 0.47; 0.5; 0.47; 0.44; 0.47; 0.45; 0.48; 0.5; 0.48; 0.5 y = 0.4850 + 7.5509E-6(1-exp(-0.1950(x- 29.7388)))2

Iubind în taină

0.86

VAVVVVAVVAVVAVVAVVAAVVVVV 1; 0.5; 0.67; 0.75; 0.8; 0.83; 0.71; 0.75; 0.78; 0.7; 0.73; 0.75; 0.69; 0.71; 0.73; 0.69; 0.71; 0.72; 0.68; 0.65; 0.67; 0.68; 0.7; 0.71; 0.72 y = 0.3073 + 0.4133(1-exp(-1.9721(x - 1.4212)))2

Kamadeva

VVVAVAVVAVVAAVVVAAVVAA 1; 1; 1; 0.75; 0.8; 0.67; 0.71; 0.75; 0.67; 0.7; 0.73; 0.67; 0.62; 0.64; 0.67; 0.69; 0.65; 0.61; 0.63; 0.65; 0.62; 0.59

0.77

y = 0.6208 + 263.1377(1-exp(-0.0022(x - 17.2038)))^2

La mijloc de codru…

0.81

A (outlier) V A A A V V A 0 (outlier); 0.5; 0.33; 0.25; 0.2; 0.33; 0.43; 0.38 y = 0.2466 + 0.4466(1-exp(-0.2589(x - 4.1943)))2

Lacul

0.83

AAVVAVVAVVVVAVAAVVVAAAVAVVAAVVAA 0; 0; 0.33; 0.5; 0.4; 0.5; 0.57; 0.5; 0.56; 0.6; 0.64; 0.67; 0.62; 0.64; 0.6; 0.56; 0.59; 0.61; 0.63; 0.6; 0.57; 0.55 y = 0.5853(1-exp(-0.4129x))2

La steaua

0.87

VVAVVAVAVVAVVVVVVAAVA 1; 1; 0.67; 0.75; 0.8; 0.67; 0.71; 0.63; 0.67; 0.7; 0.64; 0.67; 0.69; 0.71; 0.73; 0.75; 0.76; 0.72; 0.68; 0.7; 0.67 y = 0.6785 + 0.0473(1-exp(-0.1754(x-8.4882)))2

Lida

0.72

VAVVVAVVAVVVAAVVAVA 1; 0.5; 0.67; 0.75; 0.8; 0.67; 0.71; 0.75; 0.67; 0.7; 0.73; 0.75; 0.69; 0.64; 0.67; 0.69; 0.65; 0.67; 0.63 y = 0.0908 + 0.6068(1-exp(-2.5182(x- 1.3174)))2

Luceafărul

0.80

VVAAAAVAVAVVVVVAAVVVAVVAVVVAVAVVAV AVVVAAVAVAAVVVAV 1; 1; 0.67; 0.5; 0.4; 0.33; 0.43; 0.38; 0.44; 0.4; 0.45; 0.5; 0.54; 0.57; 0.6; 0.56; 0.53; 0.56; 0.58; 0.6; 0.57; 0.59; 0.61; 0.58;

(up to A + V = 50)

0.6; 0.62; 0.63; 0.61; 0.62; 0.6; 0.61; 0.63; 0.61; 0.62; 0.6; 0.61; 0.62; 0.63; 0.62; 0.6; 0.61; 0.6; 0.6; 0.59; 0.58; 0.59; 0.6; 0.6; 0.59; 0.6 y = 0.4163 + 0.1984(1-exp(-0.1733(x-7.0923)))2

0.86

Mai am un singur dor V V V V A A A V A V A V A V V A V V A V A A V V A V V V V V A V 1; 1; 1; 1; 0.8; 0.67; 0.57; 0.63; 0.56; 0.6; 0.55; 0.58; 0.54; 0.57; 0.6; 0.56; 0.59; 0.61; 0.58; 0.6; 0.57; 0.55; 0.57; 0.58; 0.56; 0.58; 0.59; 0.61; 0.62; 0.63; 0.61; 0.63 y = 0.5553 +0.17824(1-exp(-0.0662(x-16.2706)))2 Melancolie

VVVAVVAAAAAAVAVVAVAAAVAVVVAVVVVAAA AAVVVVAVVAAVAVVAAVVAAAVAAAVVAVVVAA AVVVVVAV 1; 1; 1; 0.75; 0.8; 0.83; 0.71; 0.63; 0.56; 0.5; 0.45; 0.42; 0.46; 0.43; 0.47; 0.5; 0.47; 0.5; 0.47; 0.45; 0.43; 0.45; 0.43; 0.46; 0.48; 0.5; 0.48; 0.5; 0.52; 0.53; 0.55; 0.53; 0.52; 0.5; 0.49; 0.47; 0.49; 0.5; 0.51; 0.53; 0.51; 0.52; 0.53; 0.52; 0.51; 0.52; 0.51; 0.52; 0.53; 0.52; 0.51; 0.52; 0.53; 0.52; 0.51; 0.5; 0.51; 0.5; 0.49; 0.48; 0.49; 0.5; 0.49; 0.5; 0.51; 0.52; 0.51; 0.5; 0.49; 0.5;

0.88


0.51; 0.51; 0.52; 0.53; 0.52; 0.53 y = 0.4675 + 0.0545(1–exp(-0.0756(x–20.6697)))2 O, ramâi

0.90

VVVAVVVVAAAAVVAVAVAVAVAVAAVAVAVVVV VVVV 1; 1; 1; 0.75; 0.8; 0.83; 0.86; 0.88; 0.78; 0.7; 0.64; 0.58; 0.62; 0.64; 0.6; 0.63; 0.59; 0.61; 0.58; 0.6; 0.57; 0.59; 0.57; 0.58; 0.56; 0.54; 0.56; 0.54; 0.55; 0.53; 0.55; 0.56; 0.58; 0.59; 0.6; 0.61; 0.62; 0.63 y = 0.5533 + 2.6690(1–exp(-0.0141(x–25.5670)))2

Odă în metru antic

0.90

VVVAAAVAAVAAAVAAVAAAVVAAVAVVVAVAVA VVAV 1; 1; 1; 1; 0.75; 0.6; 0.5; 0.57; 0.5; 0.44; 0.5; 0.45; 0.42; 0.38; 0.43; 0.4; 0.38; 0.41; 0.39; 0.37; 0.35; 0.38; 0.41; 0.39; 0.38; 0.4; 0.38; 0.41; 0.43; 0.45; 0.43; 0.45; 0.44; 0.45; 0.44; 0.46; 0.47; 0.46; 0.47 y = 0.3796 + 0.1555(1-exp(-0.0692(x-17.4689)))2

Oricâte stele

0.94

VVVVVVVVAAAVAVVVVAVAVVAAA 1; 1; 1; 1; 1; 1; 1; 1; 0.89; 0.8; 0.73; 0.75; 0.69; 0.71; 0.73; 0.75; 0.76; 0.72; 0.74; 0.7; 0.71; 0.73; 0.7; 0.67; 0.64 y = 0.6703+ 140.8362(1-exp(-0.0020(x-27.2545)))2

0.84

Pe aceeaşi ulicioară... V A (outlier) V V A V V A V A V A A V A A A V A V V V V V V V V V V V VVVAV 1; 0.5 (outlier); 0.67; 0.75; 0.6; 0.67; 0.71; 0.63; 0.67; 0.6; 0.64; 0.58; 0.54; 0.57; 0.53; 0.5; 0.47; 0.5; 0.47; 0.5; 0.52; 0.55; 0.57; 0.58; 0.6; 0.62; 0.63; 0.64; 0.66; 0.67; 0.68; 0.69; 0.7; 0.68; 0.69 y = 0.5293+1.1133(1-exp(-0.0295(x-16.5278)))2 Pe lângă plopii fără

AVVVVVAVVVVVAVVVAVVAVVVAVAAVVAVAAV

soţ...

VAVAVAVVAAVVAV

0.78

0; 0.5; 0.67; 0.75; 0.8; 0.83; 0.71; 0.75; 0.78; 0.8; 0.82; 0.83; 0.77; 0.79; 0.8; 0.81; 0.76; 0.78; 0.79; 0.75; 0.76 y = 0.7283(1 – exp(-0.8021))2 Peste vârfuri

0.69

VVAAVAAAAVVAVVA 1; 1; 0.67; 0.5; 0.6; 0.5; 0.43; 0.38; 0.33; 0.4; 0.45; 0.42; 0.46; 0.5; 0.47 y = 0.3933+0.9668(1-exp(-0.0724(x-9.3162)))2

Revedere

VVAVVAVVVAVVVVVVVAVVVVAVVAVAVVVVAA VVVAAVVAAVVV

0.91


1; 1; 0.67; 0.75; 0.8; 0.67; 0.71; 0.75; 0.78; 0.7; 0.73; 0.75; 0.77; 0.79; 0.8; 0.81; 0.82; 0.78; 0.79; 0.8; 0.81; 0.82; 0.78; 0.79; 0.8; 0.77; 0.78; 0.75; 0.76; 0.77; 0.77; 0.78; 0.76; 0.74; 0.74; 0.75; 0.76; 0.74; 0.72; 0.73; 0.73; 0.71; 0.7; 0.7; 0.71; 0.72 y = 0.7310 + 0.0316(1-exp(-0.2815(x-6.0283)))2 Sara pe deal

0.52

V A (outlier) V V V A V V V A A A V A V A A V A V V A V V V V A V V A AVVAAVAVVVVAVVVAAAV 1; 0.5 (outlier); 0.67; 0.75; 0.8; 0.67; 0.71; 0.75; 0.78; 0.7; 0.64; 0.58; 0.62; 0.57; 0.6; 0.56; 0.53; 0.56; 0.53; 0.55; 0.57; 0.55; 0.57; 0.58; 0.6; 0.62; 0.59; 0.61; 0.62; 0.6; 0.58; 0.59; 0.61; 0.59; 0.57; 0.58; 0.57; 0.58; 0.59; 0.6; 0.61; 0.6; 0.6; 0.61; 0.62; 0.61; 0.6; 0.58; 0.59 y = 0.5761+0.0529(1-exp(-0.0568(x-23.3239)))2

0.75

Se bate miezul nopţii... V V V A A V V V A V V A 1; 1; 1; 0.75; 0.6; 0.67; 0.71; 0.75; 0.67; 0.7; 0.73; 0.67 y = 0.6762 + 1.0836(1-exp(-0.0619(x- 8.5554)))2 Şi dacă...

0.74

VVVVAVVVVVVAVVVVA 1; 1; 1; 1; 0.8; 0.83; 0.86; 0.88; 0.89; 0.9; 0.91; 0.83; 0.85; 0.86; 0.87; 0.88; 0.82 y = 0.8575 + 0.0150(1-exp(-0.1162(x-13.7304)))2

Singurătate

0.57

AVVVVAVAAAVAAVAVVVAAVVVVVVAAVVAAVV VVVAAVAAVVAVVVVAAAA 0; 0.5; 0.67; 0.75; 0.8; 0.67; 0.71; 0.63; 0.56; 0.5; 0.55; 0.5; 0.46; 0.5; 0.47; 0.5; 0.53; 0.56; 0.53; 0.5; 0.52; 0.55; 0.57; 0.58; 0.6; 0.62; 0.59; 0.57; 0.59; 0.6; 0.58; 0.56; 0.58; 0.59; 0.6; 0.61; 0.62; 0.61; 0.59; 0.6; 0.59; 0.57; 0.58; 0.59; 0.58; 0.59; 0.6; 0.6; 0.61; 0.6; 0.59; 0.58; 0.57 y = 0.5840(1-exp(-1.0126x))2

Somnoroase

AVVAVAVVVAVVVAAVAVA

păsărele…

0; 0.5; 0.67; 0.5; 0.6; 0.5; 0.57; 0.63; 0.67; 0.6; 0.64; 0.67;

0.48

0.69; 0.64; 0.6; 0.63; 0.59; 0.61; 0.58 y = 0.6209(1-exp(-0.8234x))2 Sonet I

0.74

VAVAVAVVAVVVVVVVVVAVVAVAAV 1; 0.5; 0.67; 0.5; 0.6; 0.5; 0.57; 0.63; 0.56; 0.6; 0.64; 0.67; 0.69; 0.71; 0.73; 0.75; 0.76; 0.78; 0.74; 0.75; 0.76; 0.73; 0.74; 0.71; 0.68; 0.69 y = 0.5101 + 0.2251(1– exp(-0.2701(x – 4.2365)))2

Sonet II

VVAVVVAAVAVVVVAVVAVVVAVVV

0.74


1; 1; 0.67; 0.75; 0.8; 0.83; 0.71; 0.63; 0.67; 0.6; 0.64; 0.67; 0.69; 0.71; 0.67; 0.69; 0.71; 0.67; 0.68; 0.7; 0.71; 0.68; 0.7; 0.71; 0.72 y = 0.6680 + 0.0648(1– exp(-0.1182(x – 11.0558)))2

0.71

VVAVVAVVAVAAVVVVVAAAVVVVVAAA

Sonet III

1; 1; 0.67; 0.75; 0.8; 0.67; 0.71; 0.75; 0.67; 0.7; 0.64; 0.58; 0.62; 0.64; 0.67; 0.69; 0.71; 0.67; 0.63; 0.6; 0.62; 0.64; 0.65; 0.67; 0.68; 0.65; 0.63; 0.61 y = 0.6494 + 0.0012(1– exp(-0.1739(x – 17.7071)))2

0.75

VAVVVVAAAVVVVAVVAVAVV

Trecut-au anii...

1; 0.5; 0.67; 0.75; 0.8; 0.83; 0.71; 0.63; 0.56; 0.6; 0.64; 0.67; 0.69; 0.64; 0.67; 0.69; 0.65; 0.67; 0.63; 0.65; 0.67 y = 0.6639+ 3.9492E-6(1– exp(-0.6199(x – 9.9793)))2

0.34

Some of the fitting results are not satisfactory, and another reasonable function must be found. As shown in Chapter 3.3 (Figure 3.3.3) and in many figures of Chapter 4, La mijloc de codru…, Replici and some other poems are outliers from different points of view.

(3) The second dynamic view is the study of the occurrences of A up to a given V (or the other way round). The counting is simple and can be performed directly from sequence (I). Writing the sequence again, we note the rank of each V above and that of each A below the sequence:

rank of V:  1  2        3                 4  5     6  7
(I)         V  V  A  A  V  A  A  A  A  A  V  V  A  V  V  A
rank of A:        1  2     3  4  5  6  7        8        9

The result can be collected in Table 3.5.5.

Table 3.5.5: Number of A's up to the xth V in the poem Peste vârfuri

xth V           1  2  3  4  5  6  7
number of A's   0  0  2  7  7  8  8

The last A could be written before an imaginary last V, i.e. we could add V = 8 and A = 9, but we shall refrain from doing so.
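The counting behind Table 3.5.5 can be sketched in a few lines of Python. This is only an illustration; the function name `a_counts_before_each_v` is an assumption of the sketch and does not come from the text.

```python
def a_counts_before_each_v(sequence):
    """For the 1st, 2nd, ... V in `sequence`, return the number of A's seen so far."""
    counts = []
    a_seen = 0
    for symbol in sequence:
        if symbol == "A":
            a_seen += 1
        elif symbol == "V":
            # Record the cumulative number of A's at the moment this V occurs.
            counts.append(a_seen)
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    return counts


# Sequence (I), the A/V sequence of Peste vârfuri as written in the text.
sequence_I = "VVAAVAAAAAVVAVVA"
print(a_counts_before_each_v(sequence_I))  # [0, 0, 2, 7, 7, 8, 8]
```

The final A of the sequence is not recorded because no V follows it, which corresponds to the decision not to add an imaginary last V.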

This sequence is always non-decreasing, and it is natural to conjecture that it follows a simple power function. For the poem Peste vârfuri we obtain y = 0.7732x^1.2778 with R² = 0.83. The results for some other poems are presented in Table 3.5.6.

Table 3.5.6: Cumulative A up to the xth V fitted by the power function y = ax^b

Poem title                         a        b        R²
Adânca mare…                       3.8579   0.4867   0.93
Atât de fragedă...                 1.1766   0.8989   0.99
Călin, Gazel                       0.4846   0.6914   0.77
Călin, I                           0.6984   1.0866   0.99
Călin, II                          0.3144   1.4624   0.93
Călin, III                         0.7729   0.982    0.96
Călin, IV                          1.8906   0.745    0.91
Călin, V                           0.2002   1.4472   0.97
Călin, VI                          0.1293   1.3332   0.89
Călin, VII                         0.9704   0.908    0.97
Călin, VIII                        1.9928   0.809    0.95
Când amintirile...                 1.5127   0.7027   0.87
Ce te legeni...                    0.0089   2.3251   0.96
Crăiasa din poveşti                0.6629   1.1329   0.96
Criticilor mei                     1.0783   0.8530   0.98
Cu mâine zilele-ţi adaogi…         0.0938   1.5603   0.94
De ce nu-mi vii?                   0.0146   2.1403   0.97
De-aş avea                         1.2942   0.9771   0.93
De-or trece anii...                0.0709   1.3042   0.87
Departe sunt de tine...            0.7638   0.9823   0.94
Dintre sute de catarge             0.1253   1.3383   0.90
Dorinţa                            0.0692   1.8925   0.95
După ce atâta vreme                0.6630   1.0942   0.92
Floare-albastră                    2.5040   0.6238   0.93
Înger de pază                      0.4745   1.3323   0.97
Iubind în taină                    0.2169   1.2432   0.96
Kamadeva                           0.1256   1.5874   0.96
La mijloc de codru…                1.6444   0.8979   0.76
Lacul                              0.4542   1.1813   0.94
La steaua                          0.4152   0.9691   0.89
Lida                               0.1738   1.4101   0.95
Luceafărul, up to the 30th verb    0.9248   0.8868   0.96
Mai am un singur dor               0.6197   1.0174   0.94
Melancolie                         0.9711   0.9923   0.98
O, ramâi                           0.4382   1.1477   0.92
Odă în metru antic                 1.4400   0.9394   0.93
Oricâte stele                      0.0196   2.1031   0.89
Pe aceeaşi ulicioară...            1.5363   0.6496   0.80
Pe lângă plopii fără soţ...        0.0286   1.8667   0.99
Peste vârfuri                      0.7732   1.2778   0.83
Revedere                           0.0605   1.5292   0.96
Sara pe deal                       0.6783   0.9870   0.96
Se bate miezul nopţii...           0.1570   1.4671   0.85
Şi dacă...                         0.0639   1.3417   0.87
Singurătate                        1.0030   0.8764   0.96
Somnoroase păsărele…               0.6075   0.9674   0.91
Sonet I                            0.9727   0.6304   0.80
Sonet II                           0.4217   1.0003   0.95
Sonet III                          0.3329   1.1638   0.94
Trecut-au anii...                  0.3306   1.1698   0.93

The parameters cannot be generalised; they vary considerably. Not all of the fits are adequate, but a certain regularity appears in the simple power relationship between the fitting parameters a and b (see Figure 3.5.4).
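The text does not state how the power function y = ax^b was fitted. As a hedged illustration, the following sketch performs an ordinary least-squares fit on the untransformed values (a log-log regression is not possible here because some y are 0). For a fixed b the best a has a closed form, so only b is searched on a successively refined grid. The function name, the search interval and the grid settings are assumptions of this sketch.

```python
def fit_power_function(xs, ys, b_low=0.01, b_high=5.0, steps=200, rounds=4):
    """Least-squares fit of y = a * x**b; returns (a, b, R^2)."""

    def best_a(b):
        # For fixed b, the least-squares a is sum(y * x^b) / sum(x^(2b)).
        return sum(y * x**b for x, y in zip(xs, ys)) / sum(x ** (2 * b) for x in xs)

    def sse(b):
        a = best_a(b)
        return sum((y - a * x**b) ** 2 for x, y in zip(xs, ys))

    b = b_low
    for _ in range(rounds):
        width = (b_high - b_low) / steps
        candidates = [b_low + i * width for i in range(steps + 1)]
        b = min(candidates, key=sse)
        # Zoom in around the best grid point found so far.
        b_low, b_high = max(b - width, 1e-9), b + width

    a = best_a(b)
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - sse(b) / ss_total


# Data of Table 3.5.5 (Peste vârfuri): number of A's up to the xth V.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 2, 7, 7, 8, 8]
a, b, r2 = fit_power_function(xs, ys)
print(f"a = {a:.4f}, b = {b:.4f}, R2 = {r2:.2f}")
```

Under these assumptions the result should come out close to the values reported for Peste vârfuri (a ≈ 0.77, b ≈ 1.28, R² ≈ 0.83); other fitting procedures may give slightly different parameters.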

3.5.3 Runs

Regularities are phenomena that are frequently followed unconsciously or placed intentionally, for example meter, rhyme, etc. Some regularities are bound to the text sort or inherent in the language, but they may also arise ad hoc, spontaneously, as a consequence of style or of some mood or psychic state of the author, etc. In that case one does not follow an accepted scheme but creates tendencies. Tendencies may arise spontaneously in order to give the text a certain colouring. They are, so to say, counterparts of regularities. They may be present in a text to an extent which may be considered significant, or they may be missing (being non-significant). Out of the many possibilities to study tendencies, we shall examine here the existence of runs. Our data directly invite scrutiny from this point of view. If we consider a sequence of binary data

(I)   V V A A V A A A A A V V A V V A

as shown above, we may ask:

(1) Is there a significant alternation of the binary categories? The hypothetical example AVAVAVAVAVAVAVAA illustrates an extremely regular case.
(2) Is there a tendency to separate the active and the descriptive part? A quite extreme case with separation of the categories would be VVVVVVVAAAAAAAAA.
(3) Are there significantly long sequences of the same category?

Figure 3.5.4. The relationship between the fitting parameters a and b

3.5.3.1 Sequential dependence

Since adjectives and verbs are predicates of nouns, we may ask whether there is any dependence in the transition from an A to a V and from a V to an A. The quickest way is to count all transitions and place them in a contingency table, with the first element of each transition in the rows and the second in the columns. For the sequence (I) the contingency table is as follows:

        A    V
   A    5    3
   V    4    3

The cells can be called nAA = 5, nAV = 3, nVA = 4, nVV = 3. The number of transitions nt is equal to the number of symbols minus 1, here nt = 15, which is the sum of all numbers in the contingency table. In order to test whether there is any tendency we perform the chi-square test in the form

(3.5.14)  $\chi^2 = \dfrac{n_t \left( |n_{AA} n_{VV} - n_{AV} n_{VA}| - \frac{n_t}{2} \right)^2}{(n_{AA} + n_{AV})(n_{AA} + n_{VA})(n_{AV} + n_{VV})(n_{VA} + n_{VV})}$.

In the denominator we see the sums of rows and columns. In our case we obtain

$\chi^2 = \dfrac{15 \left( |5(3) - 3(4)| - \frac{15}{2} \right)^2}{8(9)6(7)} = 0.10$.
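The transition counts and the statistic (3.5.14) can be reproduced with the following sketch. It follows the formula exactly as given above; the function names are assumptions of the illustration, and a sequence in which one row or column sum is zero would lead to a division by zero.

```python
from collections import Counter


def transition_counts(sequence):
    """Count the transitions between neighbouring symbols, e.g. ('A', 'V')."""
    return Counter(zip(sequence, sequence[1:]))


def chi_square_transitions(sequence):
    """Chi-square statistic of formula (3.5.14) for a binary A/V sequence."""
    c = transition_counts(sequence)
    n_aa, n_av = c[("A", "A")], c[("A", "V")]
    n_va, n_vv = c[("V", "A")], c[("V", "V")]
    n_t = n_aa + n_av + n_va + n_vv  # number of symbols minus 1
    numerator = n_t * (abs(n_aa * n_vv - n_av * n_va) - n_t / 2) ** 2
    denominator = (n_aa + n_av) * (n_aa + n_va) * (n_av + n_vv) * (n_va + n_vv)
    return numerator / denominator


sequence_I = "VVAAVAAAAAVVAVVA"
print(dict(transition_counts(sequence_I)))
print(f"{chi_square_transitions(sequence_I):.2f}")  # 0.10
```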

This chi-square statistic with 1 degree of freedom is not significant (the critical value is 3.84), hence there is no dependence in the transition between the categories.

Another test, based on runs, yields the same result. Runs are uninterrupted sequences of the same symbol, here A or V, and can be counted directly in the empirical sequence. For example, in the above sequence (I) we find rA = 4, rV = 4, yielding r = 8, that is, four runs of A's and four runs of V's, together 8 runs. The number of individual letters is n = 16, out of which nA = 9, nV = 7. We test asymptotically whether the number of runs (r) differs significantly from its expectation using the normal test given as

(3.5.15)  $u = \dfrac{r - E(r)}{\sigma_r}$,

where the expectation is

(3.5.16)  $E(r) = 1 + \dfrac{2 n_A n_V}{n}$,

in our case E(r) = 1 + 2(9)7/16 = 8.8750. The standard deviation is given as

(3.5.17)  $\sigma_r = \sqrt{\dfrac{2 n_A n_V (2 n_A n_V - n)}{n^2 (n - 1)}}$,

in our example

$\sigma_r = \sqrt{\dfrac{2(9)7\,[2(9)7 - 16]}{16^2 (16 - 1)}} = 1.8998$.

Inserting these values in (3.5.15) we obtain

u = (8 – 8.8750)/1.8998 = -0.46,

which is also non-significant. The critical value is u = ±1.96. Hence, the runs are distributed randomly. There is no tendency to form too many or too few sequences of the same symbol (A or V) in this poem.

Using the exact test of Cox (cf. Cox 1958; Maxwell 1961: 137; Bortz, Lienert, Boehnke 1990: 563), one defines the numbers in the contingency table as follows:

nA = 9 = number of A's
nV = 7 = number of V's
n = nA + nV = 16 = number of A's and V's
rA = 4 = number of runs of A's
rV = 4 = number of runs of V's
r = rA + rV = 4 + 4 = 8 = number of total runs
rV = n – nA – rA + 1 = 16 – 9 – 4 + 1 = 4
rrA = nA – rA = 9 – 4 = 5
rrV = rA – 1 = 4 – 1 = 3
rrV = nV – rV = 7 – 4 = 3
rr = rrA + rrV = 5 + 3 = 8

Thus we obtain the contingency table

rA = 4    rrA = 5    nA = 9
rV = 4    rrV = 3    nV = 7
r = 8     rr = 8     n = 16

in which the marginal numbers are the sums. The probability of such an event can be computed according to

(3.5.18)  $P(r_A) = \dfrac{\binom{r_A + r_V}{r_A} \binom{rr_A + rr_V}{rr_A}}{\binom{n}{r_A + rr_A}} = \dfrac{(r_A + r_V)!\,(rr_A + rr_V)!\,(r_A + rr_A)!\,(r_V + rr_V)!}{r_A!\, r_V!\, rr_A!\, rr_V!\, n!}$.

Inserting our values in the last formula we obtain

$P(r_A) = \dfrac{(4 + 4)!\,(5 + 3)!\,(4 + 5)!\,(4 + 3)!}{4!\,4!\,5!\,3!\,16!} = 0.3427$,

that means, the probability of this event is not significantly small; it is an event without a tendency. The results for some other poems are presented in Table 3.5.7. Here the critical values are χ² > 3.84, |u| > 1.96, P < 0.05.

Table 3.5.7: Tests for significance of sequences

Poem title                       Dependence    Runs     Cox
                                 χ² (1 DF)     u        P
Atât de fragedă...               1.71          1.46     0.08
Călin, Gazel                     0.16          0.14     0.47
Călin, I                         0.39          0.73     0.12
Călin, II                        0.09          0.65     0.42
Călin, III                       0.0005        0.40     0.19
Călin, IV                        0.42          0.97     0.10
Călin, V                         0.49         -1.05     0.08
Călin, VI                        0.33          0.31     0.47
Călin, VII                       2.50          1.67     0.04
Călin, VIII                      0.74          0.95     0.09
Ce te legeni?...                 0.03          0.35     0.27
Crăiasa din poveşti              0.12         -0.17     0.26
Criticilor mei                   2.78          1.83     0.05
Cu mâine zilele-ţi adaogi…       1.35          1.25     0.07
De ce nu-mi vii?                 0.05          0.11     0.32
De-aş avea                       0.06         -0.35     0.22
De-or trece anii...              0.06          1.01     0.58
Departe sunt de tine...          0.12          0.46     0.18
Dintre sute de catarge           0.15          1.17     0.58
Dorinţa                          0.04         -0.34     0.27
După ce atâta vreme              0.008         0.33     0.33
Floare-albastră                  0.003        -0.40     0.17
Înger de pază                    2.17          1.67     0.16
Iubind în taină                  0.29          0.98     0.34
Kamadeva                         0.004         0.16     0.34
Lacul                            0.02          0.02     0.26
La mijloc de codru…              0.11          0.21     0.43
La steaua                        0.38          0.85     0.26
Lida                             0.73          1.10     0.20
Luceafărul (up to A + V = 50)    0.97          1.19     0.15
Mai am un singur dor             0.75          1.15     0.21
Melancolie                       0.32         -0.90     0.10
O, ramâi                         0.31          0.82     0.24
Odă în metru antic               0.75          1.01     0.21
Oricâte stele                    0.20         -1.12     0.21
Pe aceeaşi ulicioară...          0.002         0.37     0.31
Pe lânga plopii fara soţ...      2.15          1.61     0.06
Peste vârfuri                    0.10         -0.46     0.34
Revedere                         0.03          0.50     0.28
Sara pe deal                     0.24          0.70     0.21
Se bate miezul nopţii...         0.33         -0.23     0.51
Singurătate                      0.46         -1.14     0.16
Somnoroase păsărele…             1.47          1.33     0.07
Sonet I                          0.95          1.38     0.23
Sonet II                         0.29          0.98     0.34
Sonet III                        0.12         -0.95     0.22
Şi dacă...                       0.06          0.05     0.67
Trecut-au anii...                0.002         0.34     0.39
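The two run-based computations of this section, the normal test (3.5.15) to (3.5.17) and the probability (3.5.18), can be sketched as follows. The function names are assumptions of the illustration; the formulas are those given above, and the sketch only reproduces the worked example for sequence (I).

```python
import math
from itertools import groupby


def run_statistics(sequence):
    """Return (n_A, n_V, r_A, r_V): symbol counts and numbers of runs."""
    n_a, n_v = sequence.count("A"), sequence.count("V")
    # groupby collapses each uninterrupted stretch of equal symbols into one run.
    run_symbols = [symbol for symbol, _ in groupby(sequence)]
    return n_a, n_v, run_symbols.count("A"), run_symbols.count("V")


def runs_u(sequence):
    """Normal criterion u of (3.5.15) for the total number of runs."""
    n_a, n_v, r_a, r_v = run_statistics(sequence)
    n = n_a + n_v
    expected = 1 + 2 * n_a * n_v / n                      # (3.5.16)
    sigma = math.sqrt(2 * n_a * n_v * (2 * n_a * n_v - n)
                      / (n**2 * (n - 1)))                 # (3.5.17)
    return (r_a + r_v - expected) / sigma


def cox_probability(sequence):
    """Probability P(r_A) of (3.5.18), with rr_A = n_A - r_A and rr_V = n_V - r_V."""
    n_a, n_v, r_a, r_v = run_statistics(sequence)
    rr_a, rr_v = n_a - r_a, n_v - r_v
    return (math.comb(r_a + r_v, r_a) * math.comb(rr_a + rr_v, rr_a)
            / math.comb(n_a + n_v, r_a + rr_a))


sequence_I = "VVAAVAAAAAVVAVVA"
print(run_statistics(sequence_I))            # (9, 7, 4, 4)
print(f"{runs_u(sequence_I):.2f}")           # -0.46
print(f"{cox_probability(sequence_I):.4f}")  # 0.3427
```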

3.5.3.2 Run length

The individual runs in our sequence (I) are not equally long. Some consist of one element, some of two, etc., and there is a run of A's consisting of five elements. Since this is the longest run, we may ask whether a run of such a length is random or has been produced non-randomly (e.g. intentionally, a question that cannot be answered qualitatively). In order to find the probability of the longest run, we perform Mood's test (1940). Again, we choose the following symbols:

n = number of elements = 16
nA = number of A's = 9
nV = number of V's = 7
s = the longest run = 5 (this is the run of A's).

The exceedance probability, i.e. the probability of a certain length s, can be computed using Bradley's (1968: 256) formula

(3.5.19)  $P(s) = \dfrac{\binom{n_V + 1}{1}\binom{n - s}{n_V} - \binom{n_V + 1}{2}\binom{n - 2s}{n_V} + \binom{n_V + 1}{3}\binom{n - 3s}{n_V} - \dots}{\binom{n}{n_V}}$.

In our example we obtain

$P(s) = \dfrac{\binom{7 + 1}{1}\binom{16 - 5}{7}}{\binom{16}{7}} = 0.2308$,

which is not significant. The numerator contains only one term because already in the second one

$\binom{n - 2s}{n_V} = \binom{16 - 2(5)}{7} = \binom{6}{7} = 0$

by definition. Hence, in the given poem there is no tendency to place long descriptive sequences. If n > 30, one can perform the test asymptotically by means of the Poisson distribution, computing

(3.5.20)  $P(s) = 1 - e^{-\lambda}$,

where λ is the parameter of the Poisson distribution computed as

(3.5.21)  $\lambda = n_V \left( \dfrac{n_A}{n} \right)^s$.

Even though our n is smaller, we obtain with our data λ = 7(9/16)^5 = 0.3942, hence P(s) = 1 – 2.7183^(-0.3942) = 0.3258, which is even greater than the result of the exact Mood test and does not indicate any significance.
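Both the exact probability (3.5.19) and its Poisson approximation (3.5.20), (3.5.21) are easy to compute. The sketch below follows the formulas as given, for a longest run of A's; the function names are assumptions of the illustration.

```python
import math


def longest_run_probability(n_a, n_v, s):
    """Exceedance probability (3.5.19) for a longest run of A's of length s."""
    n = n_a + n_v
    numerator = 0
    for k in range(1, n_v + 2):
        top = n - k * s
        if top < n_v:
            # The binomial coefficient (top over n_V) is 0 from here on.
            break
        sign = 1 if k % 2 == 1 else -1
        numerator += sign * math.comb(n_v + 1, k) * math.comb(top, n_v)
    return numerator / math.comb(n, n_v)


def longest_run_probability_poisson(n_a, n_v, s):
    """Asymptotic version (3.5.20) with lambda from (3.5.21)."""
    lam = n_v * (n_a / (n_a + n_v)) ** s
    return 1 - math.exp(-lam)


# Sequence (I): n_A = 9, n_V = 7, longest run s = 5.
print(f"{longest_run_probability(9, 7, 5):.4f}")          # 0.2308
print(f"{longest_run_probability_poisson(9, 7, 5):.4f}")  # 0.3258
```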

3.5.3.3 Placing tendency

In the previous sections we studied the existence or non-existence of some structures: the greatest length of a run, the dependence of the transitions, etc. However, the categories A and V may also display a tendency to increase from the beginning to the end of the poem. In that case we speak about a climax of a certain category. There may also be climaxes in individual verses, e.g. a length climax, as has been observed in Malay folk poetry (cf. Altmann, Štukovský 1965), but in our case the number of A's and V's in one verse is very small and we must take into account the whole poem. Let us illustrate the procedure of finding a tendency using the well-known Mann-Whitney U-test (Mann, Whitney 1947; Gibbons 1971). We ask whether there is a tendency to apply a category (A or V) more often at the beginning or toward the end of the poem. To this end, we write the sequence of the poem Călin, Gazel as presented in Table 3.5.4 and ascribe ranks to the positions. We obtain

V  V  A  V  A  V  V  V  V  V  V  V  V  A  V  V  A  A
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18

Now, let the sums of ranks of A and V be

SA = 3 + 5 + 14 + 17 + 18 = 57
SV = 1 + 2 + 4 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 + 15 + 16 = 114.

The number of A's is nA = 5 and the number of V's is nV = 13, thus n = nA + nV = 5 + 13 = 18. Now we compute the criterion

$U_A = n_A n_V + \dfrac{n_A (n_A + 1)}{2} - S_A$

$U_V = n_A n_V + \dfrac{n_V (n_V + 1)}{2} - S_V$

for which we obtain in our example

UA = 5(13) + 5(5 + 1)/2 – 57 = 65 + 15 – 57 = 23
UV = 5(13) + 13(13 + 1)/2 – 114 = 65 + 91 – 114 = 156 – 114 = 42.

Now, the smaller of the resulting numbers, here UA = 23, is used as the criterion and we look up its critical value in the appropriate tables (cf. e.g. Owen 1962; Bortz, Lienert, Boehnke 1990: 669, Table 6). If the observed value is smaller than the critical value, one can accept the existence of a climax tendency. If it is greater, the hypothesis of no tendency can be accepted. For our case, UA = 23 is greater than the critical value (16) at the 0.05 level, hence there is no tendency. Fortunately, for greater nA, nV one can perform the test asymptotically using the normal criterion. To this end we test

(3.5.22)  $u = \dfrac{|U - E(U)| - 0.5}{\sigma_U}$,

where the expectation is

(3.5.23)  $E(U) = \dfrac{n_A n_V}{2}$

and the standard deviation

(3.5.24)  $\sigma_U = \sqrt{\dfrac{n_A n_V (n_A + n_V + 1)}{2}}$.

For our example we obtain UA = 23, E(U) = 5(13)/2 = 32.5, σU = √[5(13)(5 + 13 + 1)/2] = 24.8495, hence

u = (|23 – 32.5| – 0.5)/24.8495 = 0.3622,

which is not significant at the 0.05 level and testifies to the non-existence of a tendency. Performing this test for some other poems, we obtain the results presented in Table 3.5.8.

Table 3.5.8: Positional descriptiveness/activity tendencies in some poems

Poem title                         SA     SV     UA     UV     E(U)     σU       u
Adânca mare…                       162    163    97     57     77       44.74    0.44
Atât de fragedă...                 421    614    289    211    250      107.24   0.36
Când amintirile...                 198    267    114    107    110.5    58.53    0.05
Călin, Gazel                       57     114    23     42     32.5     24.85    0.36
Călin, I                           2194   2366   1141   1113   1127     328.93   0.04
Călin, II                          45     60     24     24     24       18.97    0.03
Călin, III                         938    1340   560    532    546      192.69   0.07
Călin, IV                          982    1868   833    517    675      226.5    0.70
Călin, V                           1331   1370   629    701    665      221.84   0.16
Călin, VI                          58     152    32     43     37.5     28.06    0.18
Călin, VII                         2820   4683   1908   1644   1776     467.38   0.28
Călin, VIII                        3766   5012   2456   1875   2165.5   536.67   0.54
Ce te legeni?...                   293    302    71     202    136.5    69.12    0.94
Crăiasa din poveşti                301    329    158    148    153      74.22    0.06
Criticilor mei                     301    560    235    165    200      91.65    0.38
Cu mâine zilele-ţi adaogi…         515    520    169    325    247      106.59   0.73
De ce nu-mi vii?                   195    333    80     140    110      60.25    0.49
De-aş avea                         231    234    129    95     112      58.92    0.28
De-or trece anii...                57     268    37     47     42       33.05    0.14
Departe sunt de tine...            385    518    218    214    216      96.37    0.02
Dintre sute de catarge             23     68     13     17     15       14.49    0.10
Dorinţa                            281    247    94     161    127.5    64.87    0.51
După ce atâta vreme                91     140    62     46     54       34.47    0.22
Floare-albastră                    828    1317   576    450    513      184.0    0.34
Înger de pază                      150    150    72     72     72       42.43    -0.01
Iubind în taină                    87     238    67     59     63       40.47    0.09
Kamadeva                           122    131    40     77     58.5     36.68    0.49
Lacul                              265    263    110    145    127.5    64.87    0.26
La mijloc de codru…                21     15     9      6      7.5      8.22     0.12
La steaua                          86     145    40     58     49       32.83    0.26
Lida                               80     110    32     52     42       28.98    0.33
Luceafărul (up to nA + nV = 50)    499    776    311    289    300      123.7    0.08
Mai am un singur dor               185    343    133    107    120      62.93    0.20
Melancolie                         1334   1592   772    668    720      235.46   0.22
O, ramâi                           250    491    191    145    168      80.94    0.28
Odă în metru antic                 362    379    208    152    180      83.79    0.33
Oricâte stele                      153    172    36     108    72       43.27    0.82
Pe aceeaşi ulicioară...            151    479    179    85     132      68.93    0.67
Pe lânga plopii fără soţ...        478    698    202    325    263.5    113.63   0.54
Peste vârfuri                      76     60     32     31     31.5     23.14    0
Revedere                           343    738    177    252    214.5    100.41   0.37
Sara pe deal                       506    719    284    296    290      120.42   0.05
Se bate miezul nopţii...           30     48     12     20     16       14.42    0.24
Singurătate                        642    789    324    366    345      136.49   0.15
Somnoroase păsărele…               86     104    38     50     44       29.66    0.19
Sonet I                            111    240    69     75     72       44.09    0.06
Sonet II                           83     242    71     55     63       40.47    0.19
Sonet III                          179    227    74     113    93.5     52.07    0.36
Şi dacă...                         34     119    14     28     21       19.44    0.33
Trecut-au anii...                  76     155    50     48     49       32.83    0.02

No analyzed poem displays the given tendency; we can conjecture that the placing of A and V in texts follows the grammatical order rather than a latent mechanism.
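The rank sums, the criteria UA and UV and the normal criterion (3.5.22) can be computed as in the following sketch, which reproduces the worked example for Călin, Gazel. The function name and the dictionary keys are assumptions of the illustration. The standard deviation is taken exactly as printed in (3.5.24), so that the values of the text are reproduced; many statistics references give this quantity with 12 rather than 2 in the denominator, which would yield larger values of u.

```python
import math


def placing_tendency(sequence):
    """Mann-Whitney quantities for a binary A/V sequence, as used in the text."""
    n_a, n_v = sequence.count("A"), sequence.count("V")
    # Ranks are simply the positions 1, 2, ... of the symbols in the poem.
    s_a = sum(rank for rank, sym in enumerate(sequence, start=1) if sym == "A")
    s_v = sum(rank for rank, sym in enumerate(sequence, start=1) if sym == "V")
    u_a = n_a * n_v + n_a * (n_a + 1) // 2 - s_a
    u_v = n_a * n_v + n_v * (n_v + 1) // 2 - s_v
    expected = n_a * n_v / 2                              # (3.5.23)
    sigma = math.sqrt(n_a * n_v * (n_a + n_v + 1) / 2)    # (3.5.24) as printed
    u = (abs(min(u_a, u_v) - expected) - 0.5) / sigma     # (3.5.22)
    return {"SA": s_a, "SV": s_v, "UA": u_a, "UV": u_v,
            "E(U)": expected, "sigma": sigma, "u": u}


# Călin, Gazel, as written out with ranks in the text.
gazel = "VVAVAVVVVVVVVAVVAA"
result = placing_tendency(gazel)
print(result["SA"], result["SV"], result["UA"], result["UV"])  # 57 114 23 42
print(f"{result['u']:.4f}")                                    # 0.3622
```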

4 The control cycle

We consider language as a complex network of interrelated units and properties, in which each element is connected to every other, directly or indirectly. Isolated units or properties without any bond to at least one other element do not exist. Many properties are, as in nature, involved in processes of self-regulation and self-organisation. In analogy to Köhler's (1986, 2005) complex synergetic control cycle, which links levels, units, properties of language, and requirements of language users, we can try to set up the cycle of the text properties studied up to now. The situation is advantageous because we have at our disposal a homogeneous collection of 146 texts written by the same author in the same form: rhymed poems. Of course, this homogeneity is disturbed by some outliers, but we can draw conclusions from it, even if only local ones, more safely than from texts of different sorts written in different languages, where a number of boundary conditions would have to be taken into account. The resulting cycle is merely a small cutting from the infinite world of texts.

We consider the following indicators:
1. the relative entropy of word forms, Hrel;
2. the relative repeat rate (McIntosh) of word forms, RRrel;
3. the Ord indicators, I and S;
4. the lambda indicator, Λ;
5. Gini's indicator, R4;
6. the geometric indicators A, B, and α radians, both for rank-frequencies and for spectra (αS radians);
7. the richness indicators based on cumulative frequencies, R1 and R2.

All of them have been computed in the previous chapters. For convenience, we present all these data in Table 4.1. Needless to say, the number of individual indicators proposed by other researchers up to now is considerable; we restrict ourselves to those considered in our present investigation. In order to state whether an indicator is linked with another, we simply compute the well-known correlation coefficient

(4.1) $r = \dfrac{\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2)}{\left[\sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2 \, \sum_{i=1}^{n} (x_{2i} - \bar{x}_2)^2\right]^{1/2}}$,

a procedure that can be performed by means of Excel, and test its significance using the t-test with n - 2 = 146 - 2 = 144 degrees of freedom. Since 144 is, in this respect, very large, it can be treated as infinity in testing; the critical value of t is ±1.96, and the formula is

(4.2) $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$.
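The two formulas can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure (they mention Excel); the function names `pearson_r` and `t_statistic` are hypothetical, and the eight (Hrel, RRrel) pairs are simply the first ones that can be read off Table 4.1.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient as in formula (4.1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def t_statistic(r, n):
    """t value as in formula (4.2), with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        # A perfect correlation makes the denominator vanish.
        return math.copysign(math.inf, r)
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# A few (Hrel, RRrel) pairs read off Table 4.1; illustration only.
hrel = [0.9767, 0.9618, 0.9653, 0.9482, 0.9504, 0.8925, 0.9744, 0.9174]
rrrel = [0.9879, 0.9772, 0.9734, 0.9695, 0.9696, 0.9471, 0.9804, 0.9539]

r_sample = pearson_r(hrel, rrrel)
print(f"r on the eight sample rows: {r_sample:.4f}")

# With the rounded r = 0.87 reported for Hrel and RRrel and n = 146 poems:
print(f"t = {t_statistic(0.87, 146):.2f}")
```

With the rounded coefficient r = 0.87 and n = 146, the second call gives a t value of about 21.17, in line with the entry for Hrel and RRrel in Table 4.3.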

The results are presented in Table 4.2 and the respective t-values in Table 4.3. In (4.1), $\bar{x}_1$ and $\bar{x}_2$ are the means of the scrutinised variables. If we link with an edge every two properties that show a significant correlation, we obtain a system of text properties. Further properties may be scrutinised and added. However, correlation analysis is not sufficient. If we accept the existence of a relation between two properties, then it must be possible to express it in the form of a function. In the rest of the chapter we shall study just this possibility. In order not to reject too many possible relations and, at the same time, not to accept those which display merely a cloud of points, we decide to accept preliminarily only those relations which can be expressed by a function and for which both the absolute value of the correlation coefficient and the determination coefficient are greater than 0.50. We know that this boundary is too high for a correlation coefficient and too low for a determination coefficient. The dispersion of the points in the plot gives a further hint for our decision. The possibility that some of the relations hold only for Eminescu's poems is not excluded; even in his poetic texts there are some outliers. Hence a number of different texts in different languages should be examined, the given relations should be strengthened or weakened, a general relation should be set up, and its individual cases should be supplied with boundary conditions. One will always find outliers, whose existence is rather a problem for literary scholars. As can be seen, no property is isolated. One could draw a graph linking each property with every other, but in that case we would obtain a highly redundant connectivity. One could also evaluate the properties of the given graph itself, which would display several aspects, but we dispense with this task. The links are not transitive, i.e. from x = f(y) and y = g(z) one cannot conclude that x = h(z), because all these relations are stochastic, though in many cases such a relationship exists. It must be remarked that even if the correlation coefficient is significant, the relationship of two properties may have the form of a cloud for which it will be difficult to find an appropriate function. This relationship may disappear or become stronger if one processes different authors, different text sorts, and different languages. The present image is merely the state of the art for Eminescu's work.
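The preliminary acceptance rule described above can be written down as a small filter. The sketch below is an assumption about how one might encode it, not the authors' code; the names `CANDIDATES` and `accepted` are hypothetical, and the six value pairs are taken from the figure captions and from Table 4.4 later in this chapter.

```python
# (variable 1, variable 2, |r|, determination coefficient R^2)
CANDIDATES = [
    ("I", "S", 0.99, 0.9974),
    ("I", "Hrel", 0.70, 0.8490),
    ("I", "RRrel", 0.45, 0.6284),
    ("I", "α", 0.47, 0.4510),
    ("RRrel", "A", 0.63, 0.5155),
    ("R2", "Λ", 0.78, 0.6377),
]

def accepted(abs_r, r_squared, threshold=0.50):
    """A link is kept only if both |r| and R^2 exceed the threshold."""
    return abs_r > threshold and r_squared > threshold

links = [(a, b) for a, b, abs_r, r2 in CANDIDATES if accepted(abs_r, r2)]
print(links)
```

Under this rule the pair I and RRrel is dropped although its function has R² above 0.50, because its correlation coefficient is below the boundary.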

0.9767 0.9879 47.5299

0.9618 0.9772 48.9285

0.903

0.9653 0.9734 33.7274

0.9482 0.9695 80.5931

0.9504 0.9696 73.9455

0.8925 0.9471 393.5057 329.6952

0.9713

0.9744 0.9804 18.7352

0.9174 0.9539 136.5299 94.3774

0.9585 0.9691 20.8195

0.9741 0.9794 19.3821

0.9675 0.9808 40.4923

0.9598 0.9662 24.0206

0.9632 0.9691 19.0325

0.9591

0.9581 0.9682 24.3134

Amorul unei marmure

Andrei Mureşanu

Atât de frageda…

Aveam o muză

Basmul ce i l-aş spune ei

Călin (file de poveste)

Când

Când amintirile...

Când crivăţul cu iarna...

Când marea...

Când priveşti oglinda mărei

Care-i amorul meu în astă lume

Ce e amorul?

Ce to legeni....

Ce-ţi doresc eu ţie. dulce Românie

Cine-i?

24.2304

0.9735 33.7896

0.978

12.8188

19.7728

9.3166

10.654

20.3393

6.6813

12.0935

5.8619

9.2264

46.3385

48.2603

15.3924

0.9548 348.2138 287.8655

28.7203

23.1258

28.0882

16.9221

Amicului F.I.

0.9720 39.2957

0.9531

S 4.5074

0.9562 0.9689 29.6257

I

Ah. mierea buzei tale

RRrel

Adio

Hrel

0.9747 0.9789 14.3434

Adânca mare…

Poem title

Table 4.1: Indicators from previous chapters (columns: Poem title, Hrel, RRrel, I, S, Λ, R4, A, B, α rad, αS rad, R1, R2)

R4

B

2.2743 2.1347

0.81 2.1331

1.6312

1.8328

1.8007

1.9160

0.7779 0.61 0.71 2.1124

0.7812 0.69 0.71 1.9733

1.9395

1.9465

0.7677 0.39 0.42 2.5443 2.4934

2.1225

0.7432 0.46 0.57 2.3956 2.1766

1.9976 1.6773

0.8383 0.47 0.74 2.3822 1.9068

0.56 0.68 2.2169 2.0030

0.5365 0.83 0.94 1.7702

0.7005 0.69 0.78 1.9851

0.7019 0.6

0.7826 0.73 0.73 1.9118

0.5506 0.77 0.91 1.8496 1.6718

0.7347 0.48 0.63 2.3785 2.0850

1.566

0.7541 0.49 0.7

2.3462 1.9560

1.6037 0.7348 0.45 0.69 2.4338 1.9719

1.5629

1.6533

1.7369

αS rad

0.7907 0.23 0.18 2.8356 2.9180

0.6912 0.53 0.6

1.6566 0.8291 0.58 0.61 2.1841

1.4776

α rad

0.7362 0.46 0.66 2.3939 2.0270

1.7666 0.6238 0.68 0.9

1.652

A

0.8411 0.47 0.46 2.3901 2.3946

1.6985 0.814

1.7534

1.7774

1.808

1.7701

1.7387

1.7059

1.8253

1.5259

1.5872

1.5767

R1

R2

0.9415

0.931

0.948

0.8806

0.8867 0.89

0.9174 0.9016

0.9032 0.8951

0.8831 0.9207

0.9442 0.9147

0.9158 0.9023

0.886

0.8347 0.9266

0.9227 0.9

0.9216 0.9203

0.8036 0.9527

0.8945 0.9325

0.8895 0.9268

0.9212 0.9314

0.826

0.9154

0.9611

0.9057 0.8931

0.8836 0.8998

0.9133 0.9113


0.9757 0.9841 24.8397

0.9439 0.9652 113.2859 63.01

0.9767 0.9823 19.4562

0.9844 0.9876 14.4398

0.9311

0.979

0.9564 0.9666 21.1285

0.9429 0.9645 49.8811

0.9525 0.9643 15.8942

0.9481 0.9681 47.5829

0.9185 0.9546 161.6418 120.1606

0.9831 0.9883 23.2885

0.9512

0.9675 0.976

0.9476 0.9674 57.6599

0.9683 0.9761 24.5417

0.986

0.9825 0.9866 12.5856

0.9495 0.9639 46.5548

Cugetările sărmanului Dionis

Cum negustorii din Constantinopol

Cum oceanu-ntărâtat...

Dacă treci râul Selenei

De câte ori iubito...

De ce nu-mi vii

De ce să mori tu?

De-aş avea

De-aş muri ori de-ai muri

Demonism

De-oi adormi (variantă)

De-or trece anii...

Departe sunt de tine

Despărţire

Din Berlin la Potsdam

Din lyra spartă...

Din noaptea

Din străinătate

0.9885 9.2695

26.2202

0.9586 16.6383

0.9833 19.1177

0.9503 69.2482

26.4649

4.0008

2.5197

10.3902

35.3227

10.6813

8.6144

5.5466

30.9101

11.0147

32.4173

13.6821

6.4573

42.1042

3.2689

5.8108

12.9797

13.2573

10.2383

Cu mâne zilele-ţi adaogi...

0.9813 22.1753

0.9768 0.9862 20.8298

S 43.0224

0.975

I

Criticilor mei

RRrel 0.9573 73.1912

Crăiasa din poveşti

Hrel

0.939

Copii eram noi amândoi

Poem title

Λ

R4

A

B

α rad

αS rad

0.7928 0.48 0.59 2.3808 2.1653

0.7768 0.31 0.16 2.7007 2.9383

0.8071 0.65 0.64 2.0564 2.0626

0.6962 0.72 0.87 1.9348 1.7120

0.8448 0.48 0.47 2.3809 2.3840

0.7268 0.59 0.72 2.1497

0.8441 0.68 0.61 1.9941

0.6822 0.73 0.87 1.9147

0.6088 0.72 0.87 1.9224 1.7107

2.0319

2.6062 2.3104

0.6938 0.45 0.66 2.422

0.7141 0.35 0.5

1.8970

1.9282

2.1166

1.7147

2.407

1.9513

1.706

1.5314

1.4675

1.6677

1.7051

2.9418

2.0285

1.8811

1.8638

3.1046 2.7078 0.7224 0.64 0.76 2.059

0.8571 0.04 0.3

0.8792 0.48 0.16 2.38

0.7972 0.53 0.66 2.2735

0.7005 0.59 0.75 2.1549

1.6868 0.7981 0.47 0.67 2.3859 2.0189

1.4467 0.7515 0.45 0.7

1.7998 0.8707 0.31 0.43 2.6976 2.4851

1.7502

1.5997

1.3225

1.6209 0.6848 0.55 0.74 2.2212

1.4575

1.6933

1.7523

1.6474 0.8806 0.48 0.44 2.3794 2.4506

1.68

1.9632 0.7069 0.71 0.88 1.9422 1.7006

1.623

1.4837

1.658

1.8253

R1

R2 0.9297 0.9221

0.8963 0.87

0.875

0.9252

0.9338 0.927

0.9412 0.9005

0.9197 0.907

0.8882 0.9158

0.9037 0.9035

0.8506 0.867

0.9467 0.9351

0.8492 0.9293

0.8956 0.8958

0.871

0.8797 0.8866

0.878

0.9424 0.9289

0.8427 0.9286

0.9497 0.9197

0.9257 0.9226

0.8844 0.9427

0.9441 0.9109

0.95

0.9303 0.9415

0.868


0.9743 0.9774 19.6653

0.9551

0.9241 0.9528 137.714

0.9406 0.9654 133.9957 80.3625

0.8631 0.9588 174.8791 116.411

0.9469 0.9681 81.1388

0.8629 0.9437 848.5269 842.0609

0.9076 0.9687 48.7903

0.9839 0.9883 21.7732

0.9721

0.9593 0.9684 21.2373

0.9563 0.974

0.9177

0.9811

0.9776 0.9842 24.8857

0.9082 0.9539 291.0834 208.616

0.9331 0.9617 182.195

0.979

0.9238 0.9606 166.9185 116.1771

Ecò

Egipetul

Epigonii

Făt-Frumos din tei

Feciorul de împărat fără de stea

Floare-albastră

Foaia veştedă (dupa Lenau)

Freamăt de codru

Frumoasă şi jună

Ghazel

Glossă

Horia

Iar când voi fi pământ (variantă)

Împărat şi proletar

În căutarea Şeherezadei

Înger de pază

Înger şi demon

63.7133

35.0398

27.1907

0.9845 16.0946

0.987

0.9538 59.0619

0.981

7.4472

109.1787

8.9622

8.4351

53.784

35.2446

11.1675

12.5882

5.3182

21.8256

46.6967

85.8646

0.9726 88.0286 43.1048

5.7203

0.1297

3.1726

Dumnezeu şi om

9.4416

Dorinţa

0.9731

0.9974 0.9977 6.7759

S 16.8749

0.9713

I

Doi aştri

RRrel

Dintre sute de catarge

Hrel

0.9601 0.9738 27.2352

Din valurile vremii...

Poem title

Λ

R4

A

B

α rad

0.8461 0.7

2.1701

0.86 1.9607 1.7280

1.6941

0.5752 0.57 0.78 2.1758 0.8298 0.58 0.58 2.1782

2.1818

2.3813

1.8396

0.7294 0.52 0.68 2.2878 1.9990

0.7584 0.53 0.53 2.2527 2.2619

0.8208 0.6

0.5

2.1331

2.3205

0.6747 0.74 0.87 1.8942 1.7160 1.8064 0.6277 0.64 0.89 2.0749 1.6929

1.5514

1.9921

1.8876 0.5965 0.75 0.91 1.8782 1.6679

1.7371

2.3067

2.2414

1.6193

0.8152 0.48 0.61 2.3778 2.1297

0.8725 0.48 0.51 2.377

0.7702 0.73 0.54 1.9227

0.4392 0.88 0.95 1.7039

1.8236 0.8489 0.58 0.48 2.176

1.3595

1.7957

1.551

1.8185

1.7931

1.8612

1.5332

1.8258 0.7059 0.64 0.77 2.0733 1.8492

1.8992 0.6505 0.74 0.89 1.9024 1.6881

0.688 0.7

2.0647

2.1626 1.8048

0.58 1.9582

1.8807 0.6607 0.71 0.89 1.9532 1.9211

αS rad 2.1901

2.3695 3.1416

0.64 2.5103

0.9756 0.49 0

0.8361 0.4

0.7323 0.29 0.57 2.7183

1.9548 0.7453 0.59 0.8

1.7292

1.5385

1.4153

1.52

R1

R2 0.9211 0.9111

0.9387 0.9537 0.8636 0.9385

0.9464 0.9143

0.865

0.8365 0.945

0.9351

0.9476 0.9401

0.8421 0.8668

0.9063 0.9291

0.8938 0.8902

0.9218 0.9351

0.9435 0.9285

0.9008 0.8962

0.7585 0.9502

0.8967 0.93

0.8523 0.9515

0.8766 0.9513

0.8345 0.948

0.9055 0.9454

0.927

0.9781 0.9244

0.9111

0.9046 0.8867


0.9887 0.9919 16.1075

0.9516 0.9743 59.7529

0.9428 0.9691 69.4926

0.9290 0.9539 93.9687

0.9845 0.988

0.9743 0.9838 34.167

0.8942 0.8826 10.251

0.9517

0.9661 0.9807 45.0691

0.9611

0.9701 0.9781 28.578

0.9725 0.9814 25.0687

0.9655 0.9764 38.4593

0.9601 0.9694 20.3502

0.9849 0.9882 13.2642

0.9651 0.9712 17.3485

0.9614 0.9743 44.086

0.9839 0.9842 7.5099

0.9843 0.987

0.953

Iubită dulce, o, mă lasă

Iubitei

Junii corupţi

Kamadeva

La Bucovina

La mijloc de codru...

La moartea lui Heliade

La moartea lui Neamţu

La moartea principelui Ştirbey

La mormântul lui Aron Pumnul

La o artistă (Ca a nopţii poezie)

La o artistă (Credeam ieri)

La Quadrat

La steaua

Lacul

Lasă-ţi lumea...

Lebăda

Lida

Locul aripelor

0.9728 48.2531

12.2344

0.9729 25.4086

0.9704 63.1788

15.1658

47.9721

0.9542 0.969

I

Iubind în taină...

RRrel

Întunericul şi poetul

Hrel

0.9839 0.9876 11.5293

Îngere palid...

Poem title

S

30.1077

3.0913

1.1242

20.6707

7.1442

2.9552

11.0975

23.1001

13.5436

12.1992

12.355

25.6754

37.2997

6.8378

15.9934

3.7007

51.5615

54.8294

41.9792

3.2733

25.6365

3.6802

Λ

R4

A

B

α rad

αS rad

2.9211

2.0769 1.9594

0.7814 0.56 0.64 2.2159

0.7991 0.55 0.71 2.2411

0.7682 0.37 0.31 2.5855

0.7477 0.41 0.59 2.5219

0.7129 0.61 0.74 2.1149

2.0611

1.9396

2.6884

2.1659

1.9044

0.6675 0.76 0.69 1.8486 1.9405

0.7946 0.48 0.59 2.3782 2.1590

0.8764 0.48 0.42 2.3784 2.4949

0.6918 0.68 0.87 1.9894 1.7228

0.6433 0.63 0.7

0.682 0.48 0.68 2.3805 1.9947

0.8952 0.23 0.18 2.84

0.7375 0.58 0.75 2.1822 1.8829

1.6297

1.5894

2.3565

2.1489

2.7164

3.1416

2.0052

0.7077 0.26 0.55 2.791

2.2367

0.8772 0.65 0.24 2.0526 2.8068

0

0.7656 0.59 0.67 2.1591

0.7984 0.46 0.64 2.3938 2.0635

0.8835 0.03 0.49 3.1077

0.7554 0.55 0.59 2.2241

1.4647 0.9077 0.3

1.7829

1.5432

1.6121

1.5012

1.6963 0.7488 0.71 0.72 1.9498 1.9223

1.6128

1.7237

1.5981

1.7095

1.76

1.3175

1.7465

1.655

1.8752

1.5643

1.6205

1.7128

1.7367

1.5088 0.8616 0.22 0.16 2.8495 2.9352

R1

R2

0.9408

0.8952

0.916

0.9073 0.9273

0.9394 0.8958

0.9201 0.8525

0.9161

0.9014 0.8921

0.9366 0.9292

0.9011

0.9224 0.9267

0.9375 0.9279

0.9133 0.9215

0.9091 0.8772

0.9327 0.9131

0.9022 0.9341

0.7295 0.792

0.9348 0.9317

0.9522 0.9206

0.8584 0.9255

0.9041 0.9018

0.9169 0.9009

0.967

0.8815 0.9287

0.9544 0.9124


0.9189 0.9458 119.1822 83.5678

0.9652 0.9776 28.1931

0.9322 0.9551

0.934

0.8967 0.9502 330.9081 290.9484

0.9789 0.9841 22.8708

0.9583 0.9697 44.7092

0.9613 0.9747 33.7436

0.976

0.9504 0.9717

0.9819 0.9861 21.6335

0.9809 0.9837 8.7626

0.9646 0.9749 30.2561

0.9538 0.9688 63.6083

0.9835 0.9981 14.358

0.9344 0.9549 66.904

0.9599 0.972

0.9784 0.9849 19.3205

Misterele nopţii

Mitologicale

Mortua est!

Mureşanu

Murmură glasul mării

Napoleon

Noaptea...

Nu e steluţă

Nu mă-nţelegi

Nu voi mormânt bogat (variantă)

Numai poetul

O arfă pe-un mormânt

O călărire în zori

O stea prin ceruri

O, adevăr sublime...

O, mamă…

Odă în metru antic

133.639

25.6049

73.0426

0.9815 8.8851

0.9616 89.4855

7.3207

14.9131

37.407

4.8041

38.371

14.0062

2.9458

4.1517

43.9308

5.1288

17.6877

24.8054

6.4199

63.6093

80.9262

16.2218

R4

A

α rad

αS rad 1.6374

0.86 1.9679 1.7257 1.8051

1.6777

3.0876 3.0126 2.1280

0.8542 0.21 0.29 2.8598 2.7214

0.8824 0.48 0.61 2.377

0.7051 0.43 0.64 2.4708 2.0669

0.8009 0.05 0.1

0.7538 0.47 0.59 2.3845 2.1616

0.7434 0.75 0.77 1.8804 1.8500

2.3854

0.93 1.8094 1.6432

0.8529 0.58 0.47 2.1794

0.5279 0.8

1.8838

0.742

0.31 0.36 2.7029 2.6026

0.47 0.55 2.3881 2.2250

0.6993 0.68 0.88 1.9883 1.7100

0.8548 0.03 0.17 3.1093 2.9283

1.6267 0.829

1.5313

1.7786

1.5726

1.8042 0.7184 0.73 0.75 1.9123

1.6827 0.7771 0.55 0.63 2.2406 2.0768

1.395

1.8106

1.7618

1.2778

1.655

1.7542

1.7789

1.6616

1.6837 0.6466 0.67 0.85 2.0148 1.7404

1.9389 0.6782 0.77 0.88 1.8499 1.6987

0.7524 0.47 0.49 2.3845 2.3348

0.9

0.4283 0.93 0.97 1.6437 1.5989

0.7

0.8499 0.31 0.33 2.6979 2.6580

1.7898 0.6312 0.8 1.5812

B

0.5309 0.83 0.94 1.7705

1.7968 0.73

1.7319

0.8582 0.9409 1350.459 1354.7581 1.6175

28.6786

8.2352

Miradoniz

53.2004

Memento mori

Λ 1.6514

0.9502 0.964

S

0.9849 0.9906 22.8924

I

Melancolie

RRrel

Mai am un singur dor

Hrel

0.8972 0.9482 281.3298 246.8649

Luceafărul

Poem title

R1

R2

0.9297

0.9292

0.9432

0.9296

0.9287

0.9441

0.9369 0.9112

0.8929 0.8967

0.8473 0.9232

0.9423 0.9258

0.9021 0.9222

0.9045 0.9171

0.9401 0.8994

0.9336 0.928

0.8997 0.9343

0.9167 0.866

0.9216 0.8992

0.9

0.937

0.813

0.8719 0.917

0.859

0.9226 0.8878

0.8349 0.9345

0.7655 0.9548

0.891

0.956

0.8123 0.9309


0.9543 0.9703 37.7909

0.969

0.9703 0.98

0.9485 0.9683 74.4225

0.9809 0.9837 8.7626

0.9675 0.975

0.954

0.8775 0.9062 23.5265

0.9644 0.9761 26.201

0.9508 0.9682 70.6941

0.9497 0.9673 42.5334

0.9758 0.9831 30.3494

0.9164 0.9555

0.925

0.8871 0.939

0.9124 0.9514

0.9094 0.949

Pe lângă plopii fără soţ

Peste vârfuri

Povestea codrului

Povestea teiului

Prin nopţi tăcute

Privesc oraşul furnicar

Pustnicul

Replici

Revedere

Rugăciunea unui dac

S-a dus amorul

Sara pe deal

Scrisoarea I

Scrisoarea II

Scrisoarea III

Scrisoarea IV

Scrisoarea V

9.616

23.742

36.4329

14.0915

21.5219

38.8569

12.9692

2.9458

44.4743

18.6836

2.7012

21.696

12.5824

11.3191

181.2812 142.4597

231.5885 173.3774

404.7692 327.5248

0.9585 133.7825 90.1734

232.3401 176.0081

0.9734 75.4629

33.8715

41.9171

0.9684 8.9681

29.8618

0.9631 0.9734 26.4997

Pe aceeaşi ulicioară...

0.963

0.955

Pajul Cupidon...

4.4763

0.9884 0.9922 15.4324

S

0.9293 0.9605 166.8483 110.8842

I

Oricâte stele...

RRrel

Ondina (Fantazie)

Hrel

0.9086 0.9548 241.2197 200.7361

Odin şi poetul

Poem title

Λ

R4

A

B

0.5622 0.77 0.9

α rad 1.7128

0.7892 0.6

0.76 2.1382 1.8713

0.8785 0.49 0.07 2.3703 3.0613

1.7791

0.8542 0.21 0.29 2.8598 2.7214

0.7044 0.69 0.82 1.9852

0.7914 0.67 0.66 2.0126 2.0323

1.9228

2.3863 2.1385

0.8418 0.59 0.72 2.1523

0.7272 0.47 0.6

1.7737

1.9428

0.7331 0.6

0.8

2.1496 1.8165

0.7598 0.27 0.52 2.7726 2.2851

0.5547 0.54 0.82 2.2167

0.7339 0.48 0.71 2.3771

0.5974 0.79 0.89 1.8204 1.6893

1.7059

0.584

0.77 0.9

1.6646

1.6246 1.8499 1.6779

0.81 0.91 1.7993

0.5441 0.86 0.95 1.7346 1.8504 0.596

1.823

1.8067 0.6404 0.63 0.88 2.0795 1.7098

1.8051

1.8203 0.8345 0.45 0.58 2.4442 2.1770

1.6623 0.7225 0.53 0.76 2.2727 1.8651

1.8549

1.5711

1.207

1.8599

1.8209 0.8043 0.64 0.77 2.0572 1.8519

1.395

1.8013

1.829

1.4254

1.6256

1.6202 0.7729 0.47 0.66 2.3865 2.0242

1.7418

1.6531

αS rad

1.8598 1.6775

1.8823 0.6486 0.73 0.87 1.921

1.6852

R1

R2 0.9467

0.9058

0.9353 0.9442 0.8268 0.9391

0.84

0.7891 0.9529

0.847

0.8574 0.9333

0.9405 0.9414

0.8881 0.9046

0.8852 0.9323

0.9229 0.8992

0.6916 0.8562

0.9109 0.937

0.9075 0.9246

0.9401 0.8994

0.8923 0.9187

0.9324 0.9442

0.8963 0.8974

0.897

0.8986 0.8915

0.8716 0.9457

0.9647 0.9326

0.864

0.8247 0.9388


0.9704 0.9807 33.4095

0.9784 0.981

0.9672 0.9816 50.5246

0.9302 0.9481 41.7168

0.9774 0.9827 12.6056

0.9776 0.9822 17.4031

0.9787 0.9856 24.3139

0.96

0.984

0.9789 0.9836 10.5674

0.9459 0.9667 70.3343

0.989

0.9768 0.9829 20.0956

0.9741 0.9828 33.5602

Somnoroase păsărele...

Sonete

Speranţa

Steaua vieţii

Stelele-n cer

Sus în curtea cea domnească

Te duci...

Trecut-au anii

Unda spumă

Venere şi Madona

Veneţia (de Gaetano Cerri)

Viaţa mea fu ziuă

Vis

0.9918 14.6022

0.9884 16.2951

0.9582 16.5634

10.2779

8.6727

0.9654 0.9721

I

Singurătate

RRrel 0.9885 8.2132

Şi dacă...

Hrel

0.987

Se bate miezul nopţii...

Poem title

S

14.0851

6.6987

2.4098

47.6845

4.554

5.0704

5.3582

8.6318

5.1325

5.6043

31.6412

25.6488

3.168

13.6364

5.7053

1.6115

Λ

R4

A

B

α rad

0.7629 0.41 0.55 2.5193 0.34 2.7164

2.6350

0.8207 0.72 0.76 1.9233

0.8325 0.48 0.41 2.376

1.8567

2.5068

0.8489 0.47 0.56 2.3836 2.2169

0.8197 0.3

1.7861

2.2335

0.8526 0.47 0.46 2.3907 2.3962

0.7996 0.38 0.35 2.5767 2.6294

0.7624 0.28 0.16 2.7367 2.9143

0.8291 0.04 0.21 3.0962 2.8646

1.7767

1.6681

1.7013

0.806 0.57 0.61 2.2096 2.1315

0.8358 0.48 0.47 2.3803 2.3840

0.9059 0.23 0.31 2.8417 2.6996

1.6936 0.6808 0.65 0.76 2.0566 1.8706

1.4055

1.6405 0.8603 0.48 0.38 2.3772 2.5688

1.6703

1.7229

1.6503

1.4561

αS rad

2.3825 2.7175

1.5082 0.6456 0.75 0.81 1.8781

1.7951

1.4633

1.7556

1.2116

1.4632 0.8983 0.47 0.3

R1

R2

0.9208

0.9036

0.9386 0.94

0.9286 0.9244

0.9636 0.9453

0.8856 0.9253

0.9237 0.8994

0.956

0.875

0.9414 0.93

0.9176 0.9137

0.9214 0.8712

0.8429 0.8781

0.9377 0.9278

0.9295 0.8886

0.9244 0.9257

0.8962 0.8302

0.9333 0.9069


Table 4.2: Correlation coefficients between all indicators

           RRrel       I       S       Λ      R4       A       B   α rad  αS rad      R1      R2
Hrel        0.87   -0.70   -0.65   -0.18    0.93   -0.67   -0.75    0.67    0.72    0.92   -0.14
RRrel              -0.45   -0.42    0.01    0.81   -0.63   -0.71    0.63    0.70    0.96    0.15
I                           0.99    0.17   -0.69    0.49    0.46   -0.47   -0.42   -0.58    0.37
S                                   0.09   -0.65    0.45    0.40   -0.42   -0.37   -0.56    0.32
Λ                                          -0.17    0.46    0.47   -0.46   -0.45    0.01    0.78
R4                                                 -0.63   -0.77    0.63    0.75    0.86   -0.16
A                                                           0.79   -1.00   -0.79   -0.61    0.33
B                                                                  -0.80   -1.00   -0.72    0.33
α rad                                                                       0.80    0.62   -0.32
αS rad                                                                              0.70   -0.30
R1                                                                                          0.06

Table 4.3: t-tests for the significance of the correlation coefficients

            RRrel        I        S        Λ       R4        A        B    α rad   αS rad       R1       R2
Hrel        21.17   -11.67   -10.38    -2.17    31.17   -10.94   -13.43    10.80    12.55    28.07    -1.73
RRrel                -6.01    -5.52     0.06    16.30    -9.63   -11.95     9.78    11.74    40.39     1.77
I                             109.60     2.07   -11.35     6.79     6.16    -6.34    -5.57    -8.65     4.85
S                                       1.10   -10.14     5.97     5.26    -5.55    -4.74    -8.02     4.04
Λ                                               -2.04     6.25     6.32    -6.15    -6.07     0.14    14.94
R4                                                        7.13    10.70    36.92    39.36    20.94    10.71
A                                                                 15.56  -191.12   -15.31    -9.34     4.17
B                                                                         -16.02  -167.86   -12.32     4.25
α rad                                                                               15.82     9.43    -3.99
αS rad                                                                                       11.80    -3.79
R1                                                                                                     0.66

Let us consider the links one by one. Since our investigation is mostly based on rank-frequency sequences, the simplest relationship is that between the most elementary properties of the sequence, viz. Ord's criterion, which links two functions of the moments of the distribution. The relation S = f(I) has already been analysed in Chapter 3.2.2, where the respective formulas of the power and the linear functions were presented. Usually, this relationship is an expression of style, and writers differ at least in the parameters. The almost linear dependence is shown in Figure 4.1a and, for clarity, also in logarithmic transformation in Figure 4.1b, but the formula concerns the plain values.

Figure 4.1a. The almost linear relationship between I and S. Plain presentation. |r| = 0.99; S = -2.9493 + 0.3425*I^1.1512; R² = 0.9974.

Figure 4.1b. The almost linear relationship between I and S. Logarithmic presentation.

For I and S, the logarithmic presentation is necessary because of the wide range of their values (up to about 10^3) in comparison with the other indicators considered, which mostly attain very small values, of the order of unity.
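As a rough check of the power link between I and S, one can evaluate the function with the parameters listed in Table 4.4 (S = -2.9493 + 0.3425*I^1.1512) on a few (I, S) pairs read off Table 4.1. The sketch below is illustrative only: the helper names are hypothetical, the six pairs are a small hand-picked subset, and the determination coefficient it prints refers to that subset, not to all 146 poems.

```python
def ord_s_from_i(i_value, a=-2.9493, b=0.3425, c=1.1512):
    """Power link S = a + b * I**c with the parameters given in Table 4.4."""
    return a + b * i_value ** c

def r_squared(observed, predicted):
    """Determination coefficient: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# (I, S) pairs read off rows of Table 4.1; a small subset for illustration.
PAIRS = [
    (393.5057, 329.6952),
    (136.5299, 94.3774),
    (848.5269, 842.0609),
    (1350.459, 1354.7581),
    (113.2859, 63.01),
    (281.3298, 246.8649),
]

observed = [s for _, s in PAIRS]
predicted = [ord_s_from_i(i) for i, _ in PAIRS]

for (i, s), p in zip(PAIRS, predicted):
    print(f"I = {i:10.4f}  S observed = {s:10.4f}  S computed = {p:10.4f}")
print(f"R^2 on this subset: {r_squared(observed, predicted):.4f}")
```

The computed values follow the observed ones closely over three orders of magnitude, which is what the nearly straight line in the logarithmic presentation expresses.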


Since I is, as a matter of fact, a measure of dispersion, it must automatically be associated with the entropy and the repeat rate. In general, the greater the dispersion, the greater the entropy, which also expresses a kind of uncertainty; in our rank-frequency data, however, the average increases with increasing dispersion, so that the greater I is, the smaller the entropy will be. Further, the greater the dispersion, the smaller the repeat rate, which is a measure of concentration. The first of these two relations is presented in Figure 4.2, again in plain and in logarithmic presentation. The outliers are labelled in the plot. The relation of I to RRrel has a smaller correlation coefficient than required, hence it will be omitted, even though the power relation between the two indicators has R² = 0.63.
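How entropy and repeat rate react to concentration can be illustrated with a short sketch. The formulas used here are assumptions, since the definitions are given in earlier chapters and not repeated in this one: Hrel is taken as the entropy divided by its maximum log2(V), and the McIntosh form of the relative repeat rate as (1 - sqrt(RR)) / (1 - 1/sqrt(V)), where V is the number of different word forms. The two frequency lists are hypothetical.

```python
import math

def relative_entropy(freqs):
    """Entropy of the frequency list divided by its maximum log2(V).

    Assumed definition of Hrel; requires at least two different items.
    """
    n = sum(freqs)
    v = len(freqs)
    h = -sum((f / n) * math.log2(f / n) for f in freqs)
    return h / math.log2(v)

def relative_repeat_rate(freqs):
    """McIntosh-type relative repeat rate (assumed definition of RRrel)."""
    n = sum(freqs)
    v = len(freqs)
    rr = sum((f / n) ** 2 for f in freqs)
    return (1.0 - math.sqrt(rr)) / (1.0 - 1.0 / math.sqrt(v))

uniform = [5, 5, 5, 5]   # hypothetical text: all word forms equally frequent
skewed = [20, 3, 2, 1]   # hypothetical text: one dominant word form

for name, freqs in (("uniform", uniform), ("skewed", skewed)):
    print(name, round(relative_entropy(freqs), 4),
          round(relative_repeat_rate(freqs), 4))
```

Under these assumed definitions both indicators reach 1 for the uniform list and fall for the concentrated one, so they move in the same direction, which agrees with the positive correlation between Hrel and RRrel in Table 4.2.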

Figure 4.2. Plain (left) and logarithmic (right) presentation of Hrel = f(I). |r| = 0.70; Hrel = 1.0493 - 0.0391*I^0.2287; R² = 0.8490 (3 outliers omitted).

Figure 4.3. Relation of I to R1. |r| = 0.58; R1 = 0.9996 - 0.0323*I^0.2928; R² = 0.6743 (2 outliers omitted). Plain (left) and logarithmic (right) presentation.

Since the h-point is a fixed point, it is to be expected that all indicators based on h must have some relation to I. These are A, α rad, R1, R2, R4, and even the indicators of the spectrum, B and αS rad. The “best” ones are presented in Figures 4.3 and 4.4. The functions linking all these indicators with I are presented in Table 4.4. Only four properties fulfil the criterion, hence the preliminary control cycle has the form displayed in Figure 4.5.

Figure 4.4. Relation of I to R4. |r| = 0.69; R4 = 0.4145*exp(-I/201.7846) + 0.43; R² = 0.7325. Plain (left) and logarithmic (right) presentation.

Figure 4.5. The first step toward the self-regulation cycle.

Figure 4.6. Relation of S to R1. |r| = 0.56; R1 = 0.9963 - 0.0443*S^0.2449; R² = 0.7250 (2 outliers omitted). Plain (left) and logarithmic (right) presentation.


The next series of figures (4.6 to 4.8) shows the relation of S to the other indicators, namely R1, Hrel and R4. Thus, S plays the same role in the cycle as I.

Figure 4.7. The relation of S to Hrel. |r| = 0.65; Hrel = 1.0266 - 0.0335*S^0.2353; R² = 0.9140 (4 outliers omitted). Plain (left) and logarithmic (right) presentation.

In Figure 4.7, the logarithmic presentation shows that the relation is not as strong as it appears in the plain presentation. The dispersion is too great.

Figure 4.8. The relation between S and R4. |r| = 0.65; R4 = 1.3591 - 0.4388*S^0.1080; R² = 0.9184 (2 outliers omitted). Plain (left) and logarithmic (right) presentation.

Completing the significant relations of S we obtain the new control cycle in Figure 4.9.


Figure 4.9. The second step toward the self-regulation cycle

The other indicator values are comparable and need no logarithmic presentation. Those of Hrel are presented in Figures 4.10 to 4.14.

Figure 4.10. Relation of Hrel to RRrel. |r| = 0.87; RRrel = 0.9396 + 0.0635*Hrel^16.6456; R² = 0.9038 (2 outliers omitted).

Figure 4.11. Relation of Hrel to B. |r| = 0.75; B = 0.9581 - 1.0259*Hrel^30.4349; R² = 0.7406 (3 outliers omitted).

Figure 4.12. Relation of Hrel to αS. |r| = 0.72; αS = 1.618 + 1.7805*Hrel^35.1141; R² = 0.7102 (3 outliers omitted).

Figure 4.13. Relation of Hrel to R1. |r| = 0.92; R1 = -0.5701 + 1.5374*Hrel; R² = 0.9133 (4 outliers omitted).


Figure 4.14. Relation of Hrel to R4. |r| = 0.93; R4 = 0.2264 + 0.7224*Hrel^7.5475; R² = 0.9566 (4 outliers omitted).

The cycle has now the following form (cf. Figure 4.15). The relations of RRrel are presented in Figures 4.16 to 4.21. The relation to A and to α radians lies at the acceptance boundary. The dispersion of the values is relatively great.

Figure 4.15. Inserting the significant relations of Hrel in the cycle

Figure 4.16. The relation of RRrel to A. |r| = 0.63; A = 2.8565 - 2.6203*RRrel^4.3843; R² = 0.5155 (2 outliers omitted).

Figure 4.17. Relation of RRrel to B. |r| = 0.71; B = 1.1264 - 1.1473*RRrel^31.8398; R² = 0.7256 (2 outliers omitted).

Figure 4.18. Relation of RRrel to α. |r| = 0.63; α = -2.2224 + 5.0227*RRrel^4.0487; R² = 0.5168 (2 outliers omitted).

Figure 4.19. Relation of RRrel to αS. |r| = 0.70; αS = 1.4031 + 1.9086*RRrel^38.5419; R² = 0.7170 (2 outliers omitted).

Figure 4.20. Relation of RRrel to R1. |r| = 0.96; R1 = -2.1780 + 3.1664*RRrel; R² = 0.9447 (1 outlier omitted).

Figure 4.21. Relation of RRrel to R4. |r| = 0.81; R4 = -6.0059 + 6.9509*RRrel; R² = 0.8076 (2 outliers omitted).

The graph has now the following form (cf. Figure 4.22).

Figure 4.22. The status quo of the cycle.


Figure 4.23. Relation of A to B. |r| = 0.79; B = 0.0921 + 0.9831*A; R² = 0.6272.

Though the relations of A to B and to αS radians display a number of outliers, they can still be included in the cycle.

Figure 4.24. Relation of A to α. |r| = 1.00; α = 3.2209 - 1.7802*A; R² = 0.9961.

Figure 4.25. Relation of A to αS radians. |r| = 0.79; αS = 2.9170 - 1.6430*A^1.1992; R² = 0.6226.

Figure 4.26. Relation of A to R1. |r| = 0.61; R1 = 0.9302 - 0.2598*A^4.4935; R² = 0.6224 (2 outliers omitted).

The cycle now has the form shown in Figure 4.27.

Figure 4.27. The status quo of the cycle

Figure 4.28. Relation of B to α. |r| = 0.80; α = 2.7529 - 1.0649*B^1.8363; R² = 0.6755.

Figure 4.29. Relation of B to αS. |r| = 1.00; αS = 3.1863 - 1.7117*B; R² = 0.9949.

Figure 4.30. Relation of B to R1. |r| = 0.72; R1 = 0.9362 - 0.1666*B^4.7810; R² = 0.7984 (2 outliers omitted).

Figure 4.31. Relation of R4 to B. |r| = 0.77; R4 = 0.8408 - 0.3632*B^4.1031; R² = 0.7266.

Figure 4.32. Relation of α to αS. |r| = 0.80; αS = 1.0411 + 0.2209*α^1.9232; R² = 0.6277.

Figure 4.33. Relation of α to R1. |r| = 0.62; R1 = 0.93191 - 107.1708*exp(-α/0.2587); R² = 0.6276 (2 outliers omitted).

Figure 4.34. Relation of α to R4. |r| = 0.63; R4 = 0.8187 - 285.8296*exp(-α/0.2518); R² = 0.5474.

Figure 4.35. Relation of αS to R1. |r| = 0.70; R1 = 0.9367 - 124.8114*exp(-αS/0.2359); R² = 0.7971 (2 outliers omitted).

Figure 4.36. Relation of αS to R4. |r| = 0.75; R4 = 0.8424 - 107.2138*exp(-αS/0.2747); R² = 0.7265.

Figure 4.37. Relation of R1 to R4. |r| = 0.86; R4 = 0.9842*R1^2.6096; R² = 0.8060 (2 outliers omitted).

Figure 4.38. Relation of R2 to Λ. |r| = 0.78; Λ = -3.1375 + 5.2308*R2^0.9879; R² = 0.6377 (2 outliers omitted).

The complete cycle can be presented in Figure 4.39. This figure shows that two of the properties, viz. R2 and Λ, form an isolated cycle while the main cycle does not contain all links. Of course, this holds only if our criteria turn out to be acceptable.

Figure 4.39. The final form of the cycle.
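The observation that R2 and Λ stand apart from the main cycle can be checked mechanically by treating the accepted relations as edges of an undirected graph and computing its connected components. The sketch below assumes that every relation shown in Figures 4.1 to 4.38 counts as a link; the published Figure 4.39 may have been drawn differently, and the names `LINKS` and `components` are hypothetical.

```python
from collections import defaultdict

# One edge per relation shown in Figures 4.1 to 4.38 (assumed link list).
LINKS = [
    ("I", "S"), ("I", "Hrel"), ("I", "R1"), ("I", "R4"),
    ("S", "R1"), ("S", "Hrel"), ("S", "R4"),
    ("Hrel", "RRrel"), ("Hrel", "B"), ("Hrel", "αS"),
    ("Hrel", "R1"), ("Hrel", "R4"),
    ("RRrel", "A"), ("RRrel", "B"), ("RRrel", "α"),
    ("RRrel", "αS"), ("RRrel", "R1"), ("RRrel", "R4"),
    ("A", "B"), ("A", "α"), ("A", "αS"), ("A", "R1"),
    ("B", "α"), ("B", "αS"), ("B", "R1"), ("R4", "B"),
    ("α", "αS"), ("α", "R1"), ("α", "R4"),
    ("αS", "R1"), ("αS", "R4"), ("R1", "R4"),
    ("R2", "Λ"),
]

def components(links):
    """Connected components of the undirected graph given by the links."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    seen = set()
    result = []
    for start in graph:
        if start in seen:
            continue
        stack = [start]
        comp = set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result

comps = components(LINKS)
for comp in comps:
    print(sorted(comp))
```

With this link list the graph falls into two components: the ten indicators of the main cycle and the pair R2 and Λ.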

The results require some comments, which may be presented here.

1. As conjectured above, no property is isolated. Relatively independent is the indicator Λ, which concerns the normalised arc length. Its link to R2 was not expected and will possibly be strengthened by other texts.

2. The majority of links are represented by a straight line (21), a power function (34) or an exponential function (11), that is, by very simple links which can be incorporated in a system analogous to that of Köhler. All relations, including those that are not shown here graphically, are presented in Table 4.4. Some of them are merely preliminary proposals, without corroboration by means of a test.

3. Since the correlation coefficient is merely the first sign of a relationship, a function joining two variables and yielding R² > 0.80 is a sufficient reason to incorporate the link in a control cycle. Our present steps are merely preliminary hints at a possibility of theory building.

4. The basis of the given system is formed by the properties R1 and R4, which have the largest number (8) of strong links. But this is merely a very preliminary, explorative view restricted to only one writer.

5. Whenever many texts are analysed and a link between properties is discovered, outliers may occur that seemingly destroy the relationship. However, they are not necessarily reasons for rejecting a hypothesis. Usually, we set up hypotheses without caring for boundary conditions. Every text may contain elements that were inserted deliberately, so to say against the automatically working mechanisms, in order to give the text a special look, introduce a special mood, etc. If we mix prose texts with poetic ones in which a special rhythm is necessary, then some indicators do not change but others may display a strongly deviating pattern. A simple example may illustrate this circumstance: if we find a law of word length distribution, then it holds only for languages in which word length is a variable; it does not hold for monosyllabic languages, which do not fulfil the basic requirement of variable word length. An outlier is rather a cause for further research with other means.

6. We presented the relations between variables in pairs. But later on one may discover that a variable depends simultaneously on several others (see, for instance, the 3D plots in Figures 4.40, 4.41, and 4.42). This is, of course, a way without end, because one would be forced to scrutinise all triple, quadruple, etc. relations, but sometimes it is also a possible explanation of outliers. To this end, Figure 4.39 could be interpreted systems-theoretically, i.e. as a directed graph with influences, parameters, and loops, and the whole control cycle would consist of a system of equations.

7. If a variable represented by an indicator does not display any link to the other ones, it does not mean that it is isolated. We simply did not find those variables with which it is linked, and further investigations are necessary. It would be naïve to suppose that texts have only twelve properties, because we know that properties are only our theoretical concepts.

8. As can be seen in Table 4.4, the linking functions are very simple. If we set up the differential equations whose solutions they are, we obtain the following results:

Function                 Differential equation
y = a + bx               y' = b
y = ax^b                 y'/y = b/x
y = a + bx^c             y'/(y - a) = c/x
y = a*exp(-x/b)          y'/y = -1/b
y = a + b*exp(-x/c)      y'/(y - a) = -1/c
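The correspondence between the functions and their differential equations can be verified by direct differentiation. The short derivation below is added for illustration and covers the two three-parameter cases; the remaining ones follow in the same way, with a = 0 in the corresponding two-parameter forms.

```latex
% Power function with additive constant: y = a + b x^{c}
\begin{align*}
y' &= b\,c\,x^{c-1} = \frac{c}{x}\,b\,x^{c} = \frac{c}{x}\,(y - a)
   &&\Longrightarrow\quad \frac{y'}{y - a} = \frac{c}{x}, \\[4pt]
% Exponential function with additive constant: y = a + b e^{-x/c}
y' &= -\frac{b}{c}\,e^{-x/c} = -\frac{1}{c}\,(y - a)
   &&\Longrightarrow\quad \frac{y'}{y - a} = -\frac{1}{c}.
\end{align*}
```

In both cases the relative rate of change of y - a depends only on x or is constant, which is why these links count as very simple.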

Table 4.4: Linking the variables by preliminary functions

Var 1   Var 2   |r|    Fitting function                            R²      Omitted outliers

I       Λ       0.17   Λ = -6.9463 + 8.2416*I^0.0123               0.4283  2
I       R2      0.37   R2 = 0.3771 + 0.4917*I^0.0255               0.3129
I       αS      0.42   αS = 1.6 + 1.2519*exp(-I/39.1304)           0.5861
I       RRrel   0.45   RRrel = 1.1018 - 0.0986*I^0.0739            0.6284  2
I       B       0.46   B = 1 - 0.7391*exp(-I/55.8425)              0.6026
I       α       0.47   α = 1.7 + 0.8906*exp(-I/94.8474)            0.4510
I       A       0.49   A = -5.9300 + 6.0336*I^0.0190               0.4767
I       R1      0.58   R1 = 0.9996 - 0.0323*I^0.2928               0.6743  2
I       R4      0.69   R4 = 0.43 + 0.4145*exp(-I/201.7846)         0.7325
I       Hrel    0.70   Hrel = 1.0493 - 0.0391*I^0.2287             0.8490  3
I       S       0.99   S = -2.9493 + 0.3425*I^1.1512               0.9974

S       Λ       0.09   Λ = -2.1657 + 3.6655*S^0.0159               0.2627  2
S       R2      0.32   R2 = 0.8948*S^0.0085                        0.1751
S       αS      0.37   αS = 1.6 + 1.0465*exp(-S/25.6391)           0.5849
S       B       0.40   B = -5.3964 + 5.6494*S^0.0217               0.6331
S       RRrel   0.42   RRrel = 1.0503 - 0.0586*S^0.0957            0.6941  2
S       α       0.42   α = 1.7 + 0.8111*exp(-S/68.2008)            0.4292
S       A       0.45   A = -0.0618 + 0.3910*S^0.1418               0.4451
S       R1      0.56   R1 = 0.9963 - 0.0443*S^0.2449               0.7250  2
S       Hrel    0.65   Hrel = 1.0266 - 0.0335*S^0.2353             0.9140  4
S       R4      0.65   R4 = 1.3591 - 0.4388*S^0.1080               0.9184  2
I       S       0.99   S = -2.9493 + 0.3425*I^1.1512               0.9974

Hrel    R2      0.14   R2 = 1.1650 - 0.2581*Hrel                   0.0952  2
Hrel    Λ       0.18   Λ = 2.5936 - 0.9689*Hrel                    0.0318  2
S       Hrel    0.65   Hrel = 1.0266 - 0.0335*S^0.2353             0.9140  4
Hrel    A       0.67   A = 1.1159 - 0.0001*exp(9.4401*Hrel)        0.4941  1
Hrel    α       0.67   α = 1.6180 + 1.2022*Hrel^15.3522            0.4863  2
I       Hrel    0.70   Hrel = 1.0493 - 0.0391*I^0.2287             0.8490  3
Hrel    αS      0.72   αS = 1.618 + 1.7805*Hrel^35.1141            0.7102  3
Hrel    B       0.75   B = 0.9581 - 1.0259*Hrel^30.4349            0.7406  3
Hrel    RRrel   0.87   RRrel = 0.9396 + 0.0635*Hrel^16.6456        0.9038  2
Hrel    R1      0.92   R1 = -0.5701 + 1.5374*Hrel                  0.9133  4
Hrel    R4      0.93   R4 = 0.2264 + 0.7224*Hrel^7.5475            0.9566  4

RRrel   Λ       0.01   Λ = 4.2076 - 2.6059*RRrel                   0.0510
RRrel   R2      0.15   R2 = 1.2065 - 0.2964*RRrel                  0.0274  2
S       RRrel   0.42   RRrel = 1.0503 - 0.0586*S^0.0957            0.6941  2
I       RRrel   0.45   RRrel = 1.1018 - 0.0986*I^0.0739            0.6284  2
RRrel   A       0.63   A = 2.8565 - 2.6203*RRrel^4.3843            0.5155  2
RRrel   α       0.63   α = -2.2224 + 5.0227*RRrel^4.0487           0.5168  2
RRrel   αS      0.70   αS = 1.4031 + 1.9086*RRrel^38.5419          0.7170  2
RRrel   B       0.71   B = 1.1264 - 1.1473*RRrel^31.8398           0.7256  2
RRrel   R4      0.81   R4 = -6.0059 + 6.9509*RRrel                 0.8076  2
Hrel    RRrel   0.87   RRrel = 0.9396 + 0.0635*Hrel^16.6456        0.9038  2
RRrel   R1      0.96   R1 = -2.1780 + 3.1664*RRrel                 0.9447  1

A       R2      0.33   R2 = 0.8901 + 0.0517*A                      0.1656  1
S       A       0.45   A = -0.0618 + 0.3910*S^0.1418               0.4451
A       Λ       0.46   Λ = 1.4599 + 0.3866*A                       0.2131
I       A       0.49   A = -5.9300 + 6.0336*I^0.0190               0.4767
A       R1      0.61   R1 = 0.9302 - 0.2598*A^4.4935               0.6224  2
A       R4      0.63   R4 = 0.8149 - 0.6118*A^4.7963               0.5961  3
RRrel   A       0.63   A = 2.8565 - 2.6203*RRrel^4.3843            0.5155  2
Hrel    A       0.67   A = 1.1159 - 0.0001*exp(9.4401*Hrel)        0.4941  1
A       αS      0.79   αS = 2.9170 - 1.6430*A^1.1992               0.6226
A       B       0.79   B = 0.0921 + 0.9831*A                       0.6272
A       α       1      α = 3.2209 - 1.7802*A                       0.9961

B       R2      0.33   R2 = 0.8973 + 0.0339*B                      0.1197  2
S       B       0.40   B = -5.3964 + 5.6494*S^0.0217               0.6331
I       B       0.46   B = 1 - 0.7391*exp(-I/55.8425)              0.6026
B       Λ       0.47   Λ = 1.4678 + 0.3283*B                       0.2514  1
RRrel   B       0.71   B = 1.1264 - 1.1473*RRrel^31.8398           0.7256  2
B       R1      0.72   R1 = 0.9362 - 0.1666*B^4.7810               0.7984  2
Hrel    B       0.75   B = 0.9581 - 1.0259*Hrel^30.4349            0.7406  3
B       R4      0.77   R4 = 0.8408 - 0.3632*B^4.1031               0.7266
A       B       0.79   B = 0.0921 + 0.9831*A                       0.6272
B       α       0.80   α = 2.7529 - 1.0649*B^1.8363                0.6755
B       αS      1      αS = 3.1863 - 1.7117*B                      0.9949

α       R2      0.32   R2 = 0.9818 - 0.0282*α                      0.1566
S       α       0.42   α = 1.7 + 0.8111*exp(-S/68.2008)            0.4292
α       Λ       0.46   Λ = 2.1524 - 0.2141*α                       0.2079
I       α       0.47   α = 1.7 + 0.8906*exp(-I/94.8474)            0.4510
α       R1      0.62   R1 = 0.93191 - 107.1708*exp(-α/0.2587)      0.6276  2
α       R4      0.63   R4 = 0.8187 - 285.8296*exp(-α/0.2518)       0.5474
RRrel   α       0.63   α = -2.2224 + 5.0227*RRrel^4.0487           0.5168  2
Hrel    α       0.67   α = 1.6180 + 1.2022*Hrel^15.3522            0.4863  2
α       αS      0.80   αS = 1.0411 + 0.2209*α^1.9232               0.6277
B       α       0.80   α = 2.7529 - 1.0649*B^1.8363                0.6755
A       α       1      α = 3.2209 - 1.7802*A                       0.9961

αS      R2      0.30   R2 = 0.9618 - 0.0207*αS                     0.1211  1
S       αS      0.37   αS = 1.6 + 1.0465*exp(-S/25.6391)           0.5849
I       αS      0.42   αS = 1.6 + 1.2519*exp(-I/39.1304)           0.5861
αS      Λ       0.45   Λ = 2.0448 - 0.1774*αS                      0.2037
RRrel   αS      0.70   αS = 1.4031 + 1.9086*RRrel^38.5419          0.7170  2
αS      R1      0.70   R1 = 0.9367 - 124.8114*exp(-αS/0.2359)      0.7971  2
Hrel    αS      0.72   αS = 1.618 + 1.7805*Hrel^35.1141            0.7102  3
αS      R4      0.75   R4 = 0.8424 - 107.2138*exp(-αS/0.2747)      0.7265
A       αS      0.79   αS = 2.9170 - 1.6430*A^1.1992               0.6226
α       αS      0.80   αS = 1.0411 + 0.2209*α^1.9232               0.6277
B       αS      1      αS = 3.1863 - 1.7117*B                      0.9949

R1      Λ       0.01   Λ = 2.2006 - 0.5847*R1                      0.0275  2
R1      R2      0.06   R2 = 1.0077 - 0.0993*R1                     0.0330  2
S       R1      0.56   R1 = 0.9963 - 0.0443*S^0.2449               0.7250  2
I       R1      0.58   R1 = 0.9996 - 0.0323*I^0.2928               0.6743  2
A       R1      0.61   R1 = 0.9302 - 0.2598*A^4.4935               0.6224  2
α       R1      0.62   R1 = 0.93191 - 107.1708*exp(-α/0.2587)      0.6276  2
αS      R1      0.70   R1 = 0.9367 - 124.8114*exp(-αS/0.2359)      0.7971  2
B       R1      0.72   R1 = 0.9362 - 0.1666*B^4.7810               0.7984  2
R1      R4      0.86   R4 = 0.9842*R1^2.6096                       0.8060  2
Hrel    R1      0.92   R1 = -0.5701 + 1.5374*Hrel                  0.9133  4
RRrel   R1      0.96   R1 = -2.1780 + 3.1664*RRrel                 0.9447  1

R1      R2      0.06   R2 = 1.0077 - 0.0993*R1                     0.0330  2
Hrel    R2      0.14   R2 = 1.1650 - 0.2581*Hrel                   0.0952  2
RRrel   R2      0.15   R2 = 1.2065 - 0.2964*RRrel                  0.0274  2
R2      R4      0.16   R4 = 0.7733 - 61.9188*R2^118.6560           0.2111  1
αS      R2      0.30   R2 = 0.9618 - 0.0207*αS                     0.1211  1
S       R2      0.32   R2 = 0.8948*S^0.0085                        0.1751
α       R2      0.32   R2 = 0.9818 - 0.0282*α                      0.1566
A       R2      0.33   R2 = 0.8901 + 0.0517*A                      0.1656  1
B       R2      0.33   R2 = 0.8973 + 0.0339*B                      0.1197  2
I       R2      0.37   R2 = 0.3771 + 0.4917*I^0.0255               0.3129
R2      Λ       0.78   Λ = -3.1375 + 5.2308*R2^0.9879              0.6377  2

R2      R4      0.16   R4 = 0.7733 - 61.9188*R2^118.6560           0.2111  1
R4      Λ       0.17   Λ = 1.8626 - 0.2589*R4                      0.0280
α       R4      0.63   R4 = 0.8187 - 285.8296*exp(-α/0.2518)       0.5474
A       R4      0.63   R4 = 0.8149 - 0.6118*A^4.7963               0.5961  3
S       R4      0.65   R4 = 1.3591 - 0.4388*S^0.1080               0.9184  2
I       R4      0.69   R4 = 0.43 + 0.4145*exp(-I/201.7846)         0.7325
αS      R4      0.75   R4 = 0.8424 - 107.2138*exp(-αS/0.2747)      0.7265
B       R4      0.77   R4 = 0.8408 - 0.3632*B^4.1031               0.7266
RRrel   R4      0.81   R4 = -6.0059 + 6.9509*RRrel                 0.8076  2
R1      R4      0.86   R4 = 0.9842*R1^2.6096                       0.8060  2
Hrel    R4      0.93   R4 = 0.2264 + 0.7224*Hrel^7.5475            0.9566  4

R1      Λ       0.01   Λ = 2.2006 - 0.5847*R1                      0.0275  2
RRrel   Λ       0.01   Λ = 4.2076 - 2.6059*RRrel                   0.0510
S       Λ       0.09   Λ = -2.1657 + 3.6655*S^0.0159               0.2627  2
I       Λ       0.17   Λ = -6.9463 + 8.2416*I^0.0123               0.4283  2
R4      Λ       0.17   Λ = 1.8626 - 0.2589*R4                      0.0280
Hrel    Λ       0.18   Λ = 2.5936 - 0.9689*Hrel                    0.0318  2
αS      Λ       0.45   Λ = 2.0448 - 0.1774*αS                      0.2037
A       Λ       0.46   Λ = 1.4599 + 0.3866*A                       0.2131
α       Λ       0.46   Λ = 2.1524 - 0.2141*α                       0.2079
B       Λ       0.47   Λ = 1.4678 + 0.3283*B                       0.2514  1
R2      Λ       0.78   Λ = -3.1375 + 5.2308*R2^0.9879              0.6377  2
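The selection rule stated before the table (a link enters the control cycle when its fitting function yields a determination coefficient above 0.80) can be sketched as a small filter that turns table rows into a directed graph. This is only an illustrative sketch, not the authors' procedure: the names `LINKS`, `build_control_graph` and `degree` are hypothetical, `alpha` and `alphaS` stand in for α and αS, and drawing each edge from Var 1 to Var 2 is an assumption made here. The triples are a subset of the rows of Table 4.4.

```python
from collections import defaultdict

# (Var 1, Var 2, determination coefficient) transcribed from Table 4.4.
# Only a subset of the 66 pairs is listed; the last two rows are weak links
# that are kept to show that the filter drops them.
LINKS = [
    ("I", "S", 0.9974),
    ("I", "Hrel", 0.8490),
    ("S", "Hrel", 0.9140),
    ("S", "R4", 0.9184),
    ("Hrel", "RRrel", 0.9038),
    ("Hrel", "R1", 0.9133),
    ("Hrel", "R4", 0.9566),
    ("RRrel", "R1", 0.9447),
    ("RRrel", "R4", 0.8076),
    ("R1", "R4", 0.8060),
    ("A", "alpha", 0.9961),
    ("B", "alphaS", 0.9949),
    ("I", "R1", 0.6743),
    ("R1", "R2", 0.0330),
]


def build_control_graph(links, threshold=0.80):
    """Return {source: [targets]} for links whose coefficient exceeds threshold."""
    graph = defaultdict(list)
    for source, target, r2 in links:
        if r2 > threshold:
            graph[source].append(target)
    return dict(graph)


def degree(graph):
    """Count incoming plus outgoing edges for every node that has an edge."""
    counts = defaultdict(int)
    for source, targets in graph.items():
        for target in targets:
            counts[source] += 1
            counts[target] += 1
    return dict(counts)


graph = build_control_graph(LINKS)
deg = degree(graph)
for node in sorted(deg, key=deg.get, reverse=True):
    print(node, deg[node])
```

Lowering `threshold` admits more edges, so the set of "strong" links depends directly on the cut-off chosen.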

The expression (y - a) in the denominator of two of the equations simply means that the beginning of the function is shifted by the constant a. All differential equations represent the simplest forms of Wimmer and Altmann's (2005) unified theory. Though some cases here are not sufficiently corroborated, we can suppose that if a relation exists, it can be expressed by one of these five formulas or their generalisations. It must be emphasised that this is merely a beginning of theory development.

Since a property is never completely isolated, it need not be linked directly with only one other variable. Sometimes the prediction may improve if we consider functions like y = f(x, z) or y = f(x, w, z), etc., i.e. a link to more than one independent variable. When we have n properties, we can have 2^n different control cycles. Consider, for example, the two functions of R2: for R2 = f(αS) we obtained R² = 0.1211; for R2 = f(R1) we obtained R² = 0.0330, i.e. a rather insignificant result in both cases. If we take R2 = f(αS) + f(R1) in the simple form R2 = c + a*αS^b + d*R1^k, we already obtain R² = 0.4340. This is still insignificant, but by adding further variables one could find a reliable net of interrelations.

This research perspective seems encouraging when one contemplates 3D plots of triplets of indicators, such as those in Figures 4.40, 4.41, and 4.42. The 3D plots (big spheres) show the connections that can be established simultaneously among any three of the indicators considered in Table 4.1, and one can easily recognise the 2D projections (small circles) presented in detail above. Obviously, these 3D plots become clearer if the data are cleaned of outliers. Outliers represent, so to speak, a foreign body in a low-dimensional view. But even if we consider a system of equations, the outliers can often be resolved only with boundary conditions which hold only for the given poems. Unfortunately, beyond three dimensions we must give up graphical presentation and restrict ourselves to systems of equations.
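How strongly a single outlier can depress the determination coefficient, and why Table 4.4 reports omitted outliers, can be illustrated with a short sketch. Everything here apart from the fitted function S = -2.9493 + 0.3425*I^1.1512 is an assumption: the data points are invented for illustration, the coefficient is computed as 1 - SS_res/SS_tot (the usual definition, which the source does not spell out), and the names `fitted_s` and `r_squared` are hypothetical.

```python
def fitted_s(i):
    """Linking function for I and S as given in Table 4.4."""
    return -2.9493 + 0.3425 * i ** 1.1512


def r_squared(observed, predicted):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: points close to the curve, with one deliberate outlier.
i_values = [20, 40, 60, 80, 100, 120]
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1]
s_values = [fitted_s(i) + e for i, e in zip(i_values, noise)]
outlier_index = 3
s_values[outlier_index] += 25.0  # one text that deviates strongly

predicted = [fitted_s(i) for i in i_values]
r2_all = r_squared(s_values, predicted)

# Recompute after omitting the outlier, as in the last column of Table 4.4.
keep = [k for k in range(len(i_values)) if k != outlier_index]
r2_clean = r_squared([s_values[k] for k in keep], [predicted[k] for k in keep])

print(f"with outlier:    {r2_all:.4f}")
print(f"outlier omitted: {r2_clean:.4f}")
```

In this toy setting the parameters are held fixed; in an actual analysis the function would be refitted after the outliers are removed, which is not shown here.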


Figure 4.40. Hrel, R1, R4 relationship

Figure 4.41. I, S, R4 relationship

Figure 4.42. α, A, R1 relationship

References

Agricola, E. (1969). Semantische Relationen im Text und im System. Halle: Niemeyer.
Altmann, G. (1963). Phonic structure of Malay pantun. Archiv orientalní 32, 274–286.
Altmann, G. (1966a). The measurement of euphony. Teorie verše I, 259–261. Brno: Universita J. Purkyně.
Altmann, G. (1966b). Binomial index of euphony for Indonesian poetry. Asian and African Studies 2, 62–67.
Altmann, G. (1968). Some phonic features of Malay shaer. Asian and African Studies 4, 9–16.
Altmann, G. (1978). Zur Anwendung der Quotiente in der Textanalyse. Glottometrika 1, 91–106.
Altmann, G. (1987). Tendenzielle Vokalharmonie. Glottometrika 8, 104–112.
Altmann, G. (1999). Von der Fachsprache zum Modell. In: Wiegand, H.E. (ed.), Sprache und Sprachen in den Wissenschaften. Geschichte und Gegenwart: 294–312. Berlin: de Gruyter.
Altmann, G. (2007). Poesie und Mathematik. Göttinger Beiträge zur Sprachwissenschaft 14, 7–24.
Altmann, G. (2009). Texte und Theorien. In: Delcourt, V., Hug, M. (eds.), Mélanges offerts à Charles Muller: 37–45. Paris: Conseil International de la Langue Française.
Altmann, G., Popescu, I.-I., Zotta, D. (2013). Stratification in Texts. Glottometrics 25, 85–93; http://www.cs.auckland.ac.nz/research/groups/CDMTCS/researchreports/433APZ.pdf
Altmann, G., Štukovský, R. (1965). The climax in Malay pantun. Asian and African Studies 1, 13–20.
Altmann, V., Altmann, G. (2008). Anleitung zu quantitativen Textanalysen. Lüdenscheid: RAM.
Ammermann, S. (2001). Zur Wortlängenverteilung in deutschen Briefen über einen Zeitraum von 500 Jahren. In: Best, K.-H. (ed.) (2001b), 59–91.
Andres, J. (2010). On a conjecture about the fractal structure of language. Journal of Quantitative Linguistics 17(2), 101–122.
Andres, J., Benešová, M. (2012). Fractal analysis of Poe’s Raven. Journal of Quantitative Linguistics 19(4), 301–324.
Arapov, V.V. (1977). Dva modeli rangovogo raspredelenija. Voprosy informacionnoj teorii i praktiki 4, 3–42.
Arapov, V.V., Šrejder, Ju.A. (1977). Klassifikacii i rangovye raspredelenija. Naučno-techničeskaja informacija 11(12), 15–21.
Arapov, V.V., Šrejder, Ju.A. (1978). Zakon Cipfa i princip dissimetrii sistemy. Semiotika i informatika 10, 74–95.
Arlt, I. (2006). Zur Wortlängenverteilung in SMS-Texten. Göttinger Beiträge zur Sprachwissenschaft 13, 9–21.
Baayen, R.H. (1989). A corpus-based approach to morphological productivity. Statistical analysis and psycholinguistic interpretation. Amsterdam: Centrum voor Wiskunde en Informatica.
Bernet, Ch. (1988). Faits lexicaux. Richesse du vocabulaire. In: Thoiron, Ph., Labbé, D., Serant, D. (eds.), Études sur la richesse et la structure lexicale: 1–11. Paris-Genève: Champion-Slatkine.
Best, K.-H. (2001a). Silbenlängen in Meldungen der Tagespresse. In: Best, K.-H. (ed.) (2001b), 15–32.
Best, K.-H. (ed.) (2001b). Häufigkeitsverteilungen in Texten. Göttingen: Peust & Gutschmidt.

Best, K.-H. (2003). Quantitative Linguistik. Eine Annäherung. Göttingen: Peust & Gutschmidt.
Best, K.-H. (2005a). Morphlänge. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 255–260. Berlin-New York: de Gruyter.
Best, K.-H. (2005b). Wortlänge. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 260–273. Berlin-New York: de Gruyter.
Best, K.-H., Kaspar, I. (2001). Wortlängen in Faröischen. In: Best, K.-H. (ed.) (2001b), 92–100.
Bortz, J., Lienert, G.A., Boehnke, K. (1990). Verteilungsfreie Methoden in der Biostatistik. Berlin: Springer.
Bradley, J.V. (1968). Distribution-free statistical tests. Englewood Cliffs: Prentice Hall.
Brainerd, B. (1976). On the Markov structure of the text. Linguistics 176, 5–30.
Brown, C., Yule, G. (1983). Discourse analysis. Cambridge: Cambridge University Press.
Brunet, É. (1978). Le vocabulaire de Jean Giraudoux. Structure et évolution. Genève: Slatkine.
Bunge, M. (1967). Scientific research. Berlin-Heidelberg-New York: Springer.
Bunge, M. (1983). Epistemology & Methodology I: Exploring the world. Dordrecht: Reidel.
Bunge, M. (1995). Quality, quantity, pseudoquantity and measurement in social science. Journal of Quantitative Linguistics 2, 1–10.
Busemann, A. (1925). Die Sprache der Jugend als Ausdruck der Entwicklungsrhythmik. Jena: Fischer.
Christmann, C. (2004). Denotative Textanalyse am Beispiel von Zeitungsartikeln. Seminararbeit, Trier.
Cossette, A. (1994). La richesse lexicale et sa mesure. Paris: Champion.
Cox, D.R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society B, 20, 215–232.
Dindelegan, G.P. (ed.) (2013). The Grammar of Romanian. Oxford: Oxford University Press.
Dugast, D. (1980). La statistique lexicale. Genève: Slatkine.
Essler, W.K. (1971). Wissenschaftstheorie II. Freiburg: Alber.
Esteban, M.D., Morales, D. (1995). A summary of entropy statistics. Kybernetika 31(4), 337–346.
Fan, F., Altmann, G. (2007). Measuring the cohesion of compounds. In: Kaliuščenko, V., Köhler, R., Levickij, V. (eds.), Problems of typological and quantitative lexicology: 190–209. Černovcy: RUTA.
Ferrer i Cancho, R. (2005). The variation of Zipf’s law in human language. European Physical Journal 44, 249–257.
Galtung, J. (1967). Theory and methods of social research. Oslo: Universitetsforlaget.
Gibbons, J.D. (1971). Nonparametric statistical inference. New York: McGraw-Hill.
Gini, C. (1921). Maß der Verschiedenheit und der Einkommen. Das ökonomische Journal 31, 124–126.
Gini, C. (1936). On the Measure of Concentration with Special Reference to Income and Statistics. Colorado College Publication, General Series No. 208, 73–79.
Grotjahn, R., Altmann, G. (1988). Linguistische Messverfahren. In: Ammon, U., Dittmar, N., Matheier, K.J. (eds.), Sociolinguistics. Soziolinguistik: 1026–1039. Berlin-New York: de Gruyter.
Grzybek, P. (ed.) (2006). Contributions to the science of text and language. Word length studies and related issues. Dordrecht: Springer.
Guiraud, P. (1954). Les caractères statistiques du vocabulaire. Essai de méthodologie. Paris: Université de France.
Haight, F.A. (1966). Some statistical problems in connection with word association data. Journal of Mathematical Psychology 3, 217–233.

Haight, F.A. (1969). Two probability distributions connected with Zipf’s rank-size conjecture. Zastosowania matematyki 10, 225–228.
Halliday, M.A.K., Hasan, R. (1976). Cohesion in English. London: Longman.
Herdan, G. (1956). Language as choice and chance. Groningen: Noordhoff.
Herdan, G. (1962). The calculus of linguistic observations. The Hague: Mouton.
Herdan, G. (1966). The advanced theory of language as choice and chance. Berlin: Springer.
Herfindahl, O. (1950). Concentration in the steel industry. Diss., New York: Columbia University.
Hoffmannová, J. (1996). Analýza diskurzu (ve světle nových publikací). Slovo a slovesnost 57(2), 109–115.
Honore, T. (1979). Some simple measures of richness of vocabulary. ALLC Bulletin 7, 172–177.
Hřebíček, L. (1985). Text as a unit and co-references. In: Ballmer, T.T. (ed.), Linguistic dynamics: 190–198. Berlin-New York: de Gruyter.
Hřebíček, L. (1992). Text in communication: Supra-sentence structure. Bochum: Brockmeyer.
Hřebíček, L. (1993). Text as a construct of aggregations. In: Köhler, R., Rieger, B. (eds.), Contributions to quantitative linguistics: 33–39. Dordrecht: Kluwer.
Hřebíček, L. (1995). Text levels. Language constructs, constituents and Menzerath-Altmann law. Trier: WVT.
Hřebíček, L. (1996). Word associations and text. Glottometrika 15, 12–17.
Hřebíček, L. (1997). Lectures on text theory. Prague: Oriental Institute.
Hřebíček, L. (2000). Variation in sequences. Prague: Oriental Institute.
Hubert, P., Labbé, D. (1994). La richesse du vocabulaire. Communication au congrès de l’ALLC-ACH. Paris: Sorbonne.
Kaeding, F.W. (1897/98). Häufigkeitswörterbuch der deutschen Sprache. Steglitz: Selbstverlag.
Kelih, E. (2009). Preliminary analysis of a Slavic parallel corpus. In: Levická, J., Garabík, R. (eds.), NLP, Corpus Linguistics, Corpus Based Grammar Research. Fifth International Conference Smolenice, Slovakia, 25–27 November 2009, Proceedings: 173–183. Bratislava: Tribun.
Kendall, M.G. (1962). Rank correlation methods. London: Griffin.
Köhler, R. (1986). Zur synergetischen Linguistik. Struktur und Dynamik der Lexik. Bochum: Brockmeyer.
Köhler, R. (2005). Synergetic linguistics. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 760–774. Berlin-New York: de Gruyter.
Köhler, R., Galle, M. (1993). Dynamic aspects of text characteristics. In: Altmann, G., Hřebíček, L. (eds.), Quantitative Text Analysis: 46–53. Trier: WVT.
Köhler, R., Naumann, S. (2007). Quantitative analysis of co-reference structures in texts. In: Grzybek, P., Köhler, R. (eds.), Exact methods in the study of language and text: 317–329. Berlin-New York: Mouton de Gruyter.
Ku, H.H. (1963). A note on contingency tables involving zero frequencies and the 2I test. Technometrics 5, 398–400.
Levinson, S.C. (1983). Pragmatics. Cambridge: Cambridge University Press.
Li, W. (1992). Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Transactions on Information Theory 38(6), 1842–1845.
Lienert,

Lord, R.D. (1958). Studies in the history of probability and statistics. VIII: De Morgan and the statistical study of literary style. Biometrika 45, 282.
Mandelbrot, B. (1953). An informational theory of the statistical structure of language. In: Jackson, W. (ed.), Communication theory: 486–502. London: Butterworth.
Manin, D.Yu. (2009). Mandelbrot’s model for Zipf’s law. Can Mandelbrot’s model explain Zipf’s law for language? Journal of Quantitative Linguistics 16(3), 274–285.
Mann, H.B., Whitney, D.R. (1947). On a test whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics 18, 50–60.
Maxwell, A.E. (1961). Analysing qualitative data. London: Methuen.
McIntosh, R.P. (1967). An index of diversity and the relation of certain concepts to diversity. Ecology 48, 392–404.
Ménard, N. (1983). Mesure de la richesse lexicale. Genève: Slatkine.
Meyer-Eppler, W. (1959). Grundlagen und Anwendungen der Informationstheorie. Berlin: Springer.
Miller, G.A. (1957). Some effects of intermittent silence. The American Journal of Psychology 70, 311–314.
Mood, A.M. (1940). Distribution theory of runs. The Annals of Mathematical Statistics 11, 367–392.
Muller, Ch. (1964). Calcul de probabilité et calcul d’un vocabulaire. Travaux de linguistique et literature 2(1), 235–244.
Muller, Ch. (1970). Sur la mesure de la richesse lexicale. Études de linguistique appliqué, Nouvelle serie 1, 20–46.
Muller, Ch. (1977). Principes et méthodes de la statistique linguistique. Paris: Hachette.
Naranan, S., Balasubrahmanyan, V.K. (2005). Power laws in statistical linguistics and related systems. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 716–738. Berlin: de Gruyter.
Nemcová, E., Altmann, G. (1994). Zur Wortlänge in slowakischen Texten. Zeitschrift für empirische Textforschung 1, 40–43.
Nunan, D. (1993). Introducing discourse analysis. London: Penguin.
Oakes, M.P. (2007). Ord’s criterion with word length spectra for the discrimination of texts, music and computer programs. In: Grzybek, P., Köhler, R. (eds.), Exact methods in the study of language and text: 508–519. Berlin-New York: Mouton de Gruyter.
Orlov, Ju.K., Boroda, M.G., Nadarejšvili, I.Š. (1982). Sprache, Text, Kunst. Quantitative Analysen. Bochum: Brockmeyer.
Owen, D.B. (1962). Handbook of statistical tables. Reading, MA: Addison-Wesley.
Palek, B. (1988). Referenční výstavba textu. Praha: Univerzita Karlova.
Popescu, I.-I. (2007). Text ranking by the weight of highly frequent words. In: Grzybek, P., Köhler, R. (eds.), Exact methods in the study of language and text: 555–565. Berlin-New York: Mouton de Gruyter.
Popescu, I.-I., Altmann, G. (2006). Some aspects of word frequencies. Glottometrics 13, 24–46.
Popescu, I.-I., Altmann, G. (2007). Writer’s view of text generation. Glottometrics 15, 71–81.
Popescu, I.-I., Altmann, G. (2011). Thematic concentration in texts. In: Kelih, E., Levickij, V., Matskulyak, Y. (eds.), Issues in Quantitative Linguistics 2: 110–116. Lüdenscheid: RAM-Verlag.
Popescu, I.-I., Altmann, G., Köhler, R. (2010). Zipf’s law – another view. Quality and Quantity 44(4), 713–731.

Popescu, I.-I., Čech, R., Altmann, G. (2010). Structural conservatism and innovation in texts. Glottotheory 3(2), 43–63.
Popescu, I.-I., Čech, R., Altmann, G. (2011). The lambda-structure of texts. Lüdenscheid: RAM-Verlag.
Popescu, I.-I., Čech, R., Altmann, G. (2012). Some characterizations of Slovak poetry. In: Naumann, S., Grzybek, P., Vulanovic, R., Altmann, G. (eds.), Synergetic Linguistics. Text and Language as Dynamic Systems: 187–165. Wien: Praesens.
Popescu, I.-I., Čech, R., Altmann, G. (2013). The descriptivity in Slovak lyrics. Glottotheory 4(1), 92–104.
Popescu, I.-I., Kelih, E., Best, K.-H., Altmann, G. (2009). Diversification of the case. Glottometrics 18, 32–39.
Popescu, I.-I., Mačutek, J., Altmann, G. (2009). Aspects of word frequencies. Lüdenscheid: RAM.
Popescu, I.-I., Mačutek, J., Altmann, G. (2010). Word forms, style and typology. Glottotheory 3(1), 89–96.
Popescu, I.-I., et al. (2009). Word frequency studies. Berlin-New York: Mouton de Gruyter.
Popescu, I.-I., Kelih, E., Mačutek, J., Čech, R., Best, K.-H., Altmann, G. (2010). Vectors and codes of text. Lüdenscheid: RAM-Verlag.
Schiffrin, D. (1987). Discourse markers. Cambridge: Cambridge University Press.
Schulz, K.-P., Altmann, G. (1988). Lautliche Strukturierung von Spracheinheiten. Glottometrika 9, 1–48.
Schwarz, C. (1995). The distribution of aggregates in text. ZET – Zeitschrift für empirische Textforschung 2, 62–66.
Sebeok, T.A., Zeps, V.J. (1959). On non-random distribution of initial phonemes in Cheremis verse. Lingua 8, 370–384.
Skinner, B.F. (1939). The alliteration in Shakespeare’s sonnets: A study in literary behaviour. The Psychological Record 3, 186–192.
Skinner, B.F. (1941). A quantitative estimate of certain types of sound patterning in poetry. The American Journal of Psychology 54, 64–79.
Stegmüller, W. (1970). Theorie und Erfahrung. Berlin-Heidelberg-New York: Springer.
Stubbs, M. (1983). Discourse analysis. Oxford: Blackwell.
Štukovský, R., Altmann, G. (1965). Vývoj otvoreného rýmu v slovenskej poézii. Litteraria VIII, 156–161.
Štukovský, R., Altmann, G. (1966). Die Entwicklung des slowakischen Reimes im XIX und XX Jahrhundert. In: Teorie verše I, 258–261. Brno: Universita J.E. Purkyně.
Thoiron, Ph. (1988). Richesse lexicale et classement des textes. In: Thoiron, Ph., Labbé, D., Serant, D. (eds.), Études sur la richesse et la structure lexicale: 141–163. Paris-Genève: Champion-Slatkine.
Thoiron, Ph., Labbé, D., Serant, D. (eds.) (1988). Études sur la richesse et la structure lexicale: 141–163. Paris-Genève: Champion-Slatkine.
Tuzzi, A., Popescu, I.-I., Altmann, G. (2010a). Quantitative Analysis of Italian Texts. Lüdenscheid: RAM-Verlag.
Tuzzi, A., Popescu, I.-I., Altmann, G. (2010b). The golden section in texts. ETC - Empirical Text and Culture Research 4, 30–41.
Vater, H. (1994). Einführung in die Textlinguistik. Struktur, Thema und Referenz in Texten. München: Fink.

Viehweger, D. (1978). Struktur und Funktion nominativer Ketten im Text. Studia Grammatica 17, 149–168.
Wimmer, G., Altmann, G. (1999). Thesaurus of univariate discrete probability distributions. Essen: Stamm.
Wimmer, G., Altmann, G. (2005). Unified derivation of some linguistic laws. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 791–807. Berlin: de Gruyter.
Wimmer, G., Altmann, G., Hřebíček, L., Ondrejovič, S., Wimmerová, S. (2003). Úvod do analyzy textov. Bratislava: Veda.
Ziegler, A. (2005). Denotative Textanalyse. In: Köhler, R., Altmann, G., Piotrowski, R.G. (eds.), Quantitative Linguistics. An International Handbook: 423–447. Berlin-New York: de Gruyter.
Ziegler, A., Altmann, G. (2002). Denotative Textanalyse. Wien: Praesens.
Zipf, G.K. (1935/1968). The psycho-biology of language: an introduction to dynamic philology. Cambridge, Mass.: The M.I.T. Press.
Zipf, G.K. (1949). Human behaviour and the principle of least effort. Cambridge, MA: Addison-Wesley.
Zörnig, P., Boroda, M.G. (1992). The Zipf-Mandelbrot law and the interdependencies between frequency structure and frequency distribution in coherent text. Glottometrika 13, 205–218.

Index active-descriptive equilibrium 93 activity 216f., 239 aggregation 9, 56 alliteration 9, 44, 50, 56 alliterative weight 53 Altmann 9, 22, 31, 37, 56f., 70, 75, 84, 92, 99, 103f., 134, 136f., 139, 141, 144, 153, 158f., 171, 175, 182, 197, 202, 209, 211, 217f., 238, 268, 2f., 5f. analytic 103f. analytic language 74 Andres 3 Arapov 102 arc length 66, 129 association 33, 39, 73 assonance 9, 31, 39f. asymmetry 39f. attractor 134 autosemantics 104, 171, 182 Baayen 102 Balasubrahmanyan 103 Benešová 3 Beowulf alliteration 44, 53f. Beowulf-alliteration 54 Bernet 182 Best 158, 196, 211 beta 219 beta function 70, 140 Boehnke 234, 238 Boroda 45, 102 Bortz 234, 238 boundary condition 263 Bowker-test 40 Bradley 236 Brainerd 31 Brunet 182 Bulgarian 103 Bunge 210, 5, 7 Busemann 92 Busemann coefficient 92 Busemann's ratio 217 Čech 70, 134, 137, 139, 144, 197, 218

ceteris paribus condition 76 chaos 3 climax 238 coefficient of euphony 26 concentration 147, 253 concentration indicator 144 concordance 215 control cycle 242 correlation 243, 251, 253 correlation coefficient 242 cosine 58 Cossette 182 Cox 234 cumulative frequency distribution 99 Czech 137 de Morgan 196 deducibility 6 descriptive equilibrium 93 descriptiveness 98, 216f., 239 determination coefficient 243 diagonal 37f. difference equation 75 dis legomena 110 dispersion 147, 253 dissimilarity 59 distance 45, 56, 61, 64f. distances 64 distribution – binomial 22, 209, 217 – Conway-Maxwell-Poisson 80f. – displaced Poisson 209 – displaced Singh-Poisson 209 – Ferreri-Poisson 82, 84, 202 – frequency 98 – hyper-Poisson 80, 83, 202 – negative hypergeometric 197, 212 – normal 32, 6 – Poisson 202, 237 – rank-order 92 – Riemann zeta 103 – sampling 5 – word-length 202 – zeta 103

– Zipf 103, 109 – Zipf-Alekseev 108 – Zipf-Mandelbrot 108 diversification 211 Dugast 182 English 137, 210, 216 entropy 144, 242, 253 equilibrium 93, 134, 147, 217 Essler 7 Esteban 144 Euclidean distance 129 euphonic tendency 26, 28 euphony 21f., 26, 28, 31, 45 euphony indicator 26 exactness 6 explanation 1 exponential 219 Ferrer i Cancho 103 frequency distribution 98 frequency sequence 122 Friedman's analysis of variance 213 Galle 218 Galtung 5 German 87, 98, 196, 210, 216 Gibbons 238 Gini 242 Gini's coefficient 153 golden section 134, 171, 175f., 180f. Grotjahn 5 Grzybek 196 Guiraud 182 Haight 102 Hammerl 211 hapax legomena 103, 110 Hawaiian 103 Herdan 103, 144 Herfindahl 144 homogeneity 213 homogeneity test 93 Honore 182 h-point 182, 254 hreb 99, 157 Hřebíček 99, 3

Hrel 255 Hubert 182 Hungarian 103 hypertext 2 hypothesis 211, 4ff. – statistical 4 iconic 28 idiosyncrasy 7 indicator of concentration 144 Indonesian 39 information statistics 94 inner rhythm 74 Jakobson 1 Javanese 21 Joos' model 103 Kaeding's 98 Kelih 158, 197, 3 Kendall 19 Kendall's concordance coefficient 19, 213 Köhler 9, 99, 104, 197, 211, 218, 242, 262, 3 Köhler's requirements 211 k-point 183 Labbé 182 lambda 129, 242 Latin 87 law 98f., 263, 2, 5f., 8 lemma 157 lemmatisation 99 length 31, 41, 97 Li 102 Lienert 234, 238 link 5 Lord 196 Lorenz curve 153f. Mačutek 103, 134, 136f., 141, 3 Malay 31 Mandelbrot 99, 102 Manin 103 Mann 238 Mann-Whitney U-test 238 Markov 31 Maxwell 234


McIntosh 146, 242 Ménard 182 Meyer-Eppler 22 Miller 99, 102 mixing 202 Mohanty 197 Mood test 236f. Morales 144 Morse function 219ff. motif 97 Muller 182 Nadarejšvili 45 Naranan 102 Naumann 99, 197 Nemcová 211 nominal style 98 nominality 216 non-smoothness 65 normality 6 open rhyme 85 Ord 242 Ord's criterion 122 Ord's scheme 197 Orlov 45, 102 ornamental 93 ornamentality 92, 98 Overbeck 197, 211 Owen 238 parallelism 31 part of speech 74, 93f., 96 parts of speech 104, 210 Parts-of-speech 87f. Peircean abduction 210 phoneme frequencies 9 placing 238 poem 1 Poisson 75ff., 79f., 82 Poisson distribution 75 Popescu 9, 70, 99, 103f., 134, 136f., 139, 141, 144, 153, 158f., 164, 171, 175, 182f., 188, 194, 197, 202, 218, 3 power function 41, 124, 230 property 6 prose rhythm 1

R1 183, 187, 194, 253, 255 R2 188f., 193 R4 255 radian 58, 171, 176, 180f., 242 rank-frequency sequence 118 regularity 2 reliability 5 renewal process 28 repeat rate 144f., 242, 253 rhyme 9, 21, 31, 73f., 87, 93 – closed 74 – feminine 74, 86 – masculine 74, 86 – masculine 86 – masculine 87 – mixed 74 – open 74, 84 rhyme – closed 84 rhyme words 86f. rhyme-word 75, 88 rhythmic structure 1 richness 242, 5 Rothe 211 roughness 65f., 68 Rovenchak 197 run length 236 runs 231 Sahlean 61 Sanada 197 Schulz 37 Sebeok 22 self-information 147 self-organisation 21, 122, 134, 242 self-regulation 175, 242, 6 self-regulation cycle 256 self-regulation. 76 self-stimulation 56 sequence 41f. sequential dependence 232 Serant 182 Shannon 144 sigmoid 70 similarities 60, 64, 70f. similarity 58ff., 64f., 68, 92f. simplicity 5

simplification 7 Singh-Poisson 208 Skinner 22, 56 Skinner alliteration 44f., 51, 53 Skinner alliterative 53 Skinner effect 21, 56, 68 Slovak 84, 106, 148, 197 Smith 197 spectrum 99f., 102, 157, 165, 175, 188, 242, 254 speech act 106 spontaneity 56, 68, 75 spontaneous 56, 61, 68, 231 Šrejder 102 statistical test 7 steady state 84 Stegmüller 7 stochastic processes 75 stratification 9, 17, 102, 106, 211, 213 stratification law 9 Štukovský 84, 238 Symmetry 39 synsemantics 104, 171, 182 synthetic 103f. synthetism 103 system 99, 5 systematisation 6 tendency 37 testability 6 theory 5f. Thoiron 182 tokenisation 98 tris legomena 110 Tuzzi 171, 175

uncertainty 147 unevenness 147 unified theory 268 uniformity 147 validity 5 variation interval 5 vocabulary richness 98, 148, 153f., 169, 181, 194 vowel harmony 37 Wentian Li 102 Whitney 238 Wilson 197 Wimmer 22, 26, 57, 75, 102f., 182, 202, 209, 211, 268, 5f. word class 97 word frequency 97ff. word length 74f., 98, 196 Word length 74 word lengths 74 word-forms 157 writer's view 171 Zeps 22 Ziegler 99, 211 Zipf 9, 99, 102, 211 Zipf-Alekseev function 212 Zipf-Alekseev's function 9 Zipf-Estoup law 103 Zipfian power function 212 Zipf's law 99, 102f., 108 Zörnig 102, 137 Zotta 104