SPRINGER BRIEFS IN STATISTICS
Thorsten Dickhaus
Lectures on Dependency: Selected Topics in Multivariate Statistics
SpringerBriefs in Statistics
SpringerBriefs present concise summaries of cutting-edge research and practical applications across a wide spectrum of fields. Featuring compact volumes of 50 to 125 pages, the series covers a range of content from professional to academic. Typical topics might include:
• A timely report of state-of-the-art analytical techniques
• A bridge between new research results, as published in journal articles, and a contextual literature review
• A snapshot of a hot or emerging topic
• An in-depth case study or clinical example
• A presentation of core concepts that students must understand in order to make independent contributions
SpringerBriefs in Statistics showcase emerging theory, empirical research, and practical application in Statistics from a global author community. SpringerBriefs are characterized by fast, global electronic dissemination, standard publishing contracts, standardized manuscript preparation and formatting guidelines, and expedited production schedules.
More information about this series at https://link.springer.com/bookseries/8921
Thorsten Dickhaus
Institute for Statistics, University of Bremen, Bremen, Germany
ISSN 2191-544X ISSN 2191-5458 (electronic) SpringerBriefs in Statistics ISBN 978-3-030-96931-8 ISBN 978-3-030-96932-5 (eBook) https://doi.org/10.1007/978-3-030-96932-5 Mathematics Subject Classification: 62-01, 62H20, 62G10, 62G09, 62G30, 60E05, 60E15, 60G15 © The Author(s), under exclusive licence to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my family
Preface
This book originated from lectures which I have given within the "Reading Club" of the working group "Mathematical Statistics" at the University of Bremen from February 2016 until January 2019. I would like to express my gratitude to the working group members who have attended these lectures and have provided me with constructive feedback. In particular, I thank Rostyslav Bodnar, Magdalena Hernández, André Neumann, Jonathan von Schroeder, Natalia Sirotko-Sibirskaya, and Nico Steffen.

There exists a variety of excellent introductory textbooks on multivariate statistics, in particular those by Anderson (2003), Muirhead (1982), and Srivastava and Khatri (1979). While these latter books provide a systematic and comprehensive introduction to the field, the current book has a different goal: It presents specific, selected aspects of multivariate statistics in order to rouse the reader's interest in the field. Consequently, the topics for the lectures presented here have been chosen according to two principles: On the one hand, the material should be entertaining in the sense that the proofs are elegant and the solutions to the stated problems are appealing (for instance, closed-form expressions for seemingly complicated quantities). On the other hand, the goal was to provide Ph.D. students with some mathematical techniques and knowledge which they may find valuable when working on research topics from multivariate statistics, where concepts of dependency (in the stochastic-statistical sense) are important. The references are meant to serve as a starting point for a more systematic study of multivariate statistics.

The target audience of this book consists of (early career) researchers in mathematical statistics and of lecturers from this field. In particular, one may use this book as the basis for a one-semester course or seminar on stochastic-statistical dependencies. The length of each of Chaps. 2–8 is appropriate for being covered in one 90-minute teaching session. Actually, I have given a seminar entitled "Stochastic-statistical dependencies" on the basis of this book in the summer term 2021 at the University of Bremen, and I am grateful to the participants of that seminar for their positive and constructive feedback.
If the book is used for self-study, the reader should have knowledge of measure-theoretic probability theory at the level of an introductory course, including knowledge about conditional expectations and conditional distributions.

Bremen, Germany
December 2021
Thorsten Dickhaus
References

Anderson TW (2003) An introduction to multivariate statistical analysis. Wiley, Hoboken, NJ
Muirhead R (1982) Aspects of multivariate statistical theory. Wiley, New York
Srivastava MS, Khatri CG (1979) An introduction to multivariate statistics. North-Holland, New York, Oxford
Acknowledgements
The author is grateful to three anonymous referees for their constructive feedback regarding an earlier version of the book manuscript. Their comments have helped to improve the presentation and to streamline the focus of the book. Furthermore, the responsible Springer Editor Veronika Rosteck has also made a couple of helpful suggestions regarding the presentation of the material. Special thanks are due to Springer Nature Customer Service Centre GmbH for giving the permission to reprint Table 3.1 from an earlier Springer publication.
Contents
1 General Preliminaries
  1.1 Covariances and Correlations
  1.2 Copula Functions
  1.3 Normal Distributions
  1.4 Statistical Tests
  1.5 Overview of the Remaining Chapters
  References
2 Correlation Coefficients of Bivariate Normal Distributions
  2.1 Introduction and Motivation
  2.2 Gaussian Integrals of Quadratic Forms
  2.3 Elementary Proof of Part (ii) of Theorem 2.2
  2.4 More Elegant Proof of Part (ii) of Theorem 2.2
  2.5 Extension to Kendall's τ
  References
3 Empirical Likelihood Ratio Tests for the Population Correlation Coefficient
  3.1 Introduction and Motivation
  3.2 Empirical Likelihood Ratio Tests for Multivariate Means
  3.3 Profile ELR Test for ρ(X, Y)
  References
4 The Rearrangement Algorithm
  4.1 Introduction
  4.2 Numerical Solution of the Problem
  4.3 Application to the Product of Marginally Uniform Random Variables
  References
5 On the Covariances of Order Statistics
  5.1 Introduction and Motivation
  5.2 Verification of Non-negative Correlations Among the Order Statistics
  References
6 On Equi-Correlation Matrices
  6.1 Introduction and Motivation
  6.2 Computing the Inverse of an Equi-Correlation Matrix
  6.3 Lower Bound on ρ via the Determinant of R
  6.4 Matrix Theory for Statisticians
  References
7 Skew-Normal Distributions
  7.1 Introduction and Motivation
  7.2 Stochastic Representation of the Skew-Normal Distribution
  7.3 Generalizations
  References
8 The Weighted Bootstrap
  8.1 Introduction: Testing Statistical Functionals
  8.2 Efron's Bootstrap
  8.3 Weighted Bootstrap Procedures
  References
Index
Acronyms
(Ω, F, P)      Probability space
B(Y)           Some σ-field over the set Y
B(R^d)         System of Borel sets of R^d, for d ∈ N
(Y, B(Y), P)   Statistical model
ANOVA          Analysis of variance
Ȳ_n            Arithmetic mean of the n random variables Y1, ..., Yn
Bin(n, p)      Binomial distribution with parameters n and p
χ²_ν           Chi-square distribution with ν degrees of freedom
→_D            Convergence in distribution
cdf            Cumulative distribution function
δ_x            One-point distribution with point mass one in x
det(A)         Determinant of the matrix A
diag(...)      Diagonal matrix the diagonal elements of which are given by ...
ELR            Empirical likelihood ratio
=_def          Equality by definition
=_D            Equality in distribution
||x||_2        Euclidean norm of the vector x
Exp(λ)         Exponential distribution with intensity parameter λ > 0
⌊x⌋            Largest integer smaller than or equal to x
I_n            Identity matrix in R^{n×n}
i.i.d.         Independent and identically distributed
1_A            Indicator function of the set A
L(Y)           Law (or distribution) of the random variate Y
NPMLE          Nonparametric maximum likelihood estimator
N(μ, σ²)       Normal distribution on R with mean μ and variance σ²
N_d(µ, Σ)      Normal distribution on R^d with mean vector µ and covariance matrix Σ
pdf            Probability density function
Φ              Cumulative distribution function of the standard normal law on R
ϕ              Lebesgue density of the standard normal law on R
2^M            Power set of the set M, 2^M = {A : A ⊆ M}
R̄              R ∪ {−∞, +∞}
sgn            Sign function
X ⊥ Y          X is stochastically independent of Y
UNI[a, b]      Uniform distribution on the interval [a, b]
→_w            Weak convergence
w.l.o.g.       Without loss of generality
Chapter 1
General Preliminaries
Abstract In this introductory chapter, some general notions and results are introduced and discussed. In particular, we discuss properties of covariances, correlations, copula functions, and normal distributions. Also, some remarks regarding statistical tests are made. Finally, a conceptual overview of the other book chapters is given.

Keywords Copula · Correlation · Covariance · Normal distribution · Statistical test

Stochastic-statistical dependencies play a crucial role in many modern data analysis applications, for at least two reasons (cf. Stange et al. Stat Probab Lett 111:32–40, 2016): On the one hand, data generated with today's high-throughput measurement techniques typically exhibit strong temporal, spatial, or spatio-temporal dependencies due to the underlying biological or technological mechanisms. Hence, it is important to describe and take into account these dependencies in any realistic stochastic/statistical model for such data. On the other hand, dependencies among data points can often be exploited in statistical inference methods in order to enhance or optimize their performance; see, e.g., Dickhaus (Simultaneous statistical inference with applications in the life sciences. Springer, Berlin, Heidelberg, 2014) and Dickhaus et al. (Handbook of multiple comparisons. Chapman & Hall/CRC Handbooks of Modern Statistical Methods, forthcoming, 2021).

In this introductory chapter, some general notions and results about dependence modeling are introduced and discussed. Furthermore, a conceptual overview of the other chapters is given.
1.1 Covariances and Correlations

One way to express (linear) stochastic-statistical dependencies is by means of covariances and correlations.

Definition 1.1 Let X = (X1, X2)^⊤ : (Ω, F, P) → (R², B(R²)) denote a bivariate random vector with 0 < Var(Xi) < ∞ for i = 1, 2.
(a) The covariance of X1 and X2 is given by

Cov(X1, X2) = E[(X1 − E[X1])(X2 − E[X2])] = E[X1 X2] − E[X1] E[X2].

For i ∈ {1, 2}, Var(Xi) = Cov(Xi, Xi).

(b) Pearson's product-moment correlation coefficient of X1 and X2 is given by

ρ(X1, X2) = Cov(X1, X2) / √(Var(X1) Var(X2)) ∈ [−1, +1].

The following properties of covariances follow immediately from Definition 1.1 and from basic properties of expectation operators.

Theorem 1.2 Under the assumptions of Definition 1.1, covariances possess the following properties.
(a) The covariance is symmetric, meaning that Cov(X1, X2) = Cov(X2, X1).
(b) The covariance is translation-invariant, meaning that Cov(a + X1, b + X2) = Cov(X1, X2) for all real constants a and b.
(c) The covariance is bilinear, meaning that
  (i) Cov(aX1, bX2) = ab Cov(X1, X2) for all real constants a and b,
  (ii) Cov(X1, X2 + X3) = Cov(X1, X2) + Cov(X1, X3) and Cov(X1 + X3, X2) = Cov(X1, X2) + Cov(X3, X2), where X3 : (Ω, F, P) → R denotes a further random variable with finite variance.
(d) We have the identities E[X1 X2] = E[X1] E[X2] + Cov(X1, X2) and Var(X1 ± X2) = Var(X1) + Var(X2) ± 2 Cov(X1, X2).
(e) Stochastic independence implies uncorrelatedness, meaning that Cov(X1, X2) = 0 whenever X1 and X2 are stochastically independent. However, uncorrelatedness does in general not imply stochastic independence.

Definition 1.3 Let X = (X1, ..., Xd)^⊤ : (Ω, F, P) → R^d denote a random vector. Assume that E[Xi²] < ∞ holds true for all 1 ≤ i ≤ d. Then, we call

Σ := (σ_{i,j})_{1≤i,j≤d} := E[(X − E[X])(X − E[X])^⊤] ∈ R^{d×d}   (1.1)

the covariance matrix of X. Obviously, it holds that σ_{i,j} = Cov(Xi, Xj) for all 1 ≤ i, j ≤ d.
Theorem 1.4 Under the assumptions of Definition 1.3, the following assertions hold true.
(a) The matrix Σ is positive semi-definite, i.e., Σ is symmetric and for all vectors a = (a1, ..., ad)^⊤ ∈ R^d,

∑_{i=1}^d ∑_{j=1}^d ai aj σ_{i,j} = a^⊤ Σ a ≥ 0.

(b) Let A ∈ R^{m×d} denote a deterministic matrix, and let Y := AX. Then, Y is an R^m-valued random vector with covariance matrix A Σ A^⊤ ∈ R^{m×m}.

Proof For proving part (a), notice that ∑_{i=1}^d ∑_{j=1}^d ai aj σ_{i,j} = Var(∑_{i=1}^d ai Xi) due to bilinearity of covariances. Since variances are always non-negative, the assertion follows.

For proving part (b), notice first that E[Y] = A E[X], due to linearity of expectations. Now, we calculate according to (1.1), and by exploiting again linearity of expectations, that

Cov(Y) = E[(AX − A E[X])(AX − A E[X])^⊤] = E[A (X − E[X])(X − E[X])^⊤ A^⊤] = A E[(X − E[X])(X − E[X])^⊤] A^⊤ = A Σ A^⊤,

completing the proof.
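As a quick numerical illustration of part (b) of Theorem 1.4, the following R sketch simulates a three-dimensional random vector with dependent components and compares the empirical covariance matrix of Y = AX with A Σ A^⊤. The particular construction of X, the matrix A, and the sample size are arbitrary illustrative choices, not part of the original text.

```r
set.seed(1)
n  <- 1e5
## build X with dependent components: X = (Z1 + Z3, Z2 + Z3, Z3) for i.i.d. Exp(1) variables
Z1 <- rexp(n); Z2 <- rexp(n); Z3 <- rexp(n)
X  <- cbind(Z1 + Z3, Z2 + Z3, Z3)          # n x 3 matrix of realizations of X
Sigma <- matrix(c(2, 1, 1,
                  1, 2, 1,
                  1, 1, 1), nrow = 3)      # theoretical covariance matrix of X
A  <- matrix(c(1, 0, -1,
               2, 1,  0), nrow = 2, byrow = TRUE)
Y  <- X %*% t(A)                           # realizations of Y = A X
cov(Y)                                     # empirical covariance matrix of Y
A %*% Sigma %*% t(A)                       # theoretical value from Theorem 1.4 (b)
```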
1.2 Copula Functions

A complete description of the dependency structure among the components X1, ..., Xd of a d-dimensional random vector X is given by the copula (function) of X.

Definition 1.5 Let d ∈ N. A copula C : [0, 1]^d → [0, 1] is a d-dimensional cdf such that all d corresponding univariate marginal distributions are UNI[0, 1], i.e.,

∀ 1 ≤ j ≤ d : ∀ u ∈ [0, 1] : Cj(u) = u,

where Cj is the j-th univariate marginal cdf pertaining to C, 1 ≤ j ≤ d.

Copulae are also referred to as "dependence functions" in the literature; see, e.g., Deheuvels (1981). This terminology is justified, because the following theorem (which is due to Sklar 1959) asserts that the dependency structure among the components X1, ..., Xd of any R^d-valued random vector X can be described by a copula in dimension d.
Theorem 1.6 (Sklar's Theorem) Let F be any d-dimensional cdf with corresponding univariate marginal cdfs F1, ..., Fd. Then there exists a copula C such that

∀ x = (x1, ..., xd)^⊤ ∈ R̄^d : F(x) = C(F1(x1), ..., Fd(xd)).   (1.2)

If all Fj are continuous functions (1 ≤ j ≤ d), then C is unique. Conversely, let C : [0, 1]^d → [0, 1] be (any) copula and (Fj : 1 ≤ j ≤ d) cdfs on R. Then, the function F constructed by (1.2) is a proper d-dimensional cdf.

Copula theory and corresponding dependence measures will play an important role in Chaps. 2 and 4. For more details on copulae, the books of Schweizer and Sklar (1983), Nelsen (2006), Joe (2015), and Durante and Sempi (2016) as well as the overview article by Embrechts et al. (2003) are valuable resources.
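To make Sklar's Theorem concrete, the following R sketch (an illustration, not part of the original text; the correlation value and the exponential margins are arbitrary choices) constructs a bivariate distribution by plugging Exp(1) margins into a Gaussian copula via (1.2):

```r
set.seed(1)
n  <- 1e5
r  <- 0.7
## sample from a bivariate normal distribution with correlation r
x1 <- rnorm(n)
x2 <- r * x1 + sqrt(1 - r^2) * rnorm(n)
## probability integral transformation: (U1, U2) has uniform margins,
## and its joint cdf is the (Gaussian) copula C of (X1, X2)
u1 <- pnorm(x1); u2 <- pnorm(x2)
## plug arbitrary margins into (1.2), here Exp(1), via their quantile functions
y1 <- qexp(u1); y2 <- qexp(u2)
c(mean(y1), mean(y2))               # both close to 1, the Exp(1) mean
## the copula, and hence rank-based dependence, is unchanged by the marginal transformations
cor(x1, x2, method = "spearman")
cor(y1, y2, method = "spearman")
```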
1.3 Normal Distributions

Throughout the remaining chapters, normal distributions will play a crucial role.

Definition 1.7 The normal distribution on (R, B(R)) with parameters μ ∈ R and σ² > 0 is the probability distribution which possesses the Lebesgue density

R ∋ x ↦ (2πσ²)^{-1/2} exp(−(x − μ)² / (2σ²)).   (1.3)

We write Z ∼ N(μ, σ²) to indicate that the real-valued random variable Z possesses the normal distribution on R with parameters μ and σ². We call N(0, 1) the standard normal distribution on (R, B(R)). The pdf of N(0, 1) will be denoted by ϕ, and the cdf of N(0, 1) will be denoted by Φ.

Remark 1.8 Let Z be a real-valued random variable which is defined on some probability space (Ω, F, P). Assume that Z ∼ N(μ, σ²). Elementary calculations then yield that

E[Z] = μ,  Var(Z) = σ²,  (Z − μ)/σ ∼ N(0, 1),

where σ = √(σ²).

Definition 1.9 Let d ∈ N, and let X = (X1, ..., Xd)^⊤ denote an R^d-valued random vector which is defined on some probability space (Ω, F, P). Then, we say that X follows the standard normal distribution on (R^d, B(R^d)), if all components of X are i.i.d. and X1 ∼ N(0, 1). Hence, a Lebesgue density of X is given by

R^d ∋ x = (x1, ..., xd)^⊤ ↦ ∏_{k=1}^d ϕ(xk) = (2π)^{-d/2} exp(−||x||²_2 / 2) = (2π)^{-d/2} exp(−x^⊤ x / 2).

Now, let µ ∈ R^d be a given vector and Σ ∈ R^{d×d} a given symmetric and positive definite matrix. We can write Σ = Q Q^⊤, where Q ∈ R^{d×d} is a lower triangular matrix with positive diagonal elements. The matrix Q is unique. Define the R^d-valued random vector Y by

Y := Q X + µ,   (1.4)

where X follows the standard normal distribution on (R^d, B(R^d)). Then, we call the distribution of Y the normal distribution on (R^d, B(R^d)) with parameters µ and Σ, and we write Y ∼ N_d(µ, Σ).

Remark 1.10 Let Y be as in (1.4). Then, linearity of expectations and part (b) of Theorem 1.4 yield that E[Y] = µ and Cov(Y) = Σ. Furthermore, the transformation formula for Lebesgue densities yields that Y possesses the Lebesgue density

R^d ∋ y ↦ (2π)^{-d/2} |det(Σ)|^{-1/2} exp(−(y − µ)^⊤ Σ^{-1} (y − µ) / 2).
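Definition 1.9 directly yields a simulation recipe for N_d(µ, Σ): compute the lower triangular Cholesky factor Q of Σ and set Y = QX + µ. A minimal R sketch (the values of µ and Σ are arbitrary illustrative choices, not part of the original text):

```r
set.seed(1)
d <- 3; n <- 1e5
mu    <- c(1, -2, 0)
Sigma <- matrix(c(4, 2, 1,
                  2, 3, 1,
                  1, 1, 2), nrow = d)   # symmetric and positive definite
Q <- t(chol(Sigma))                     # lower triangular with positive diagonal, Sigma = Q %*% t(Q)
X <- matrix(rnorm(d * n), nrow = d)     # columns are i.i.d. standard normal vectors in R^d
Y <- Q %*% X + mu                       # columns are realizations of N_d(mu, Sigma), cf. (1.4)
rowMeans(Y)                             # approximately mu
cov(t(Y))                               # approximately Sigma, in line with Remark 1.10
```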
1.4 Statistical Tests

Chapters 3 and 8 will deal with statistical test problems and statistical test procedures. Therefore, let us briefly summarize some basic notions of statistical test theory in this section. We follow the presentation of Sect. 1.1 in Dickhaus (2018).

Definition 1.11 (Statistical model) A triple (Y, B(Y), P) consisting of a non-empty set Y, a σ-field B(Y) ⊆ 2^Y over Y, and a family P = {P_ϑ : ϑ ∈ Θ} of probability measures on (Y, B(Y)) is called a statistical model. For each fixed ϑ ∈ Θ, (Y, B(Y), P_ϑ) is a probability space. The set Y is called the sample space of the statistical model. It is the set of all possible outcomes of an experiment of interest. If Θ ⊆ R^p, p ∈ N, then we call (Y, B(Y), (P_ϑ)_{ϑ∈Θ}) a parametric statistical model, where ϑ ∈ Θ is called the parameter, and Θ is called the parameter space. The value of ϑ is unknown in practice.

Definition 1.12 (Statistical test problem, statistical test) Let (Y, B(Y), (P_ϑ)_{ϑ∈Θ}) be a statistical model. Assume that two non-empty and disjoint subsets Θ_0 and Θ_1 of Θ are given, such that Θ_0 ∪ Θ_1 = Θ.
Then, we define the so-called null hypothesis H0 by H0 : ϑ ∈ Θ_0 and the corresponding alternative hypothesis by H1 : ϑ ∈ Θ_1. In this, H0 and H1 should be chosen such that the scientific claim that one is interested to gain evidence for is represented by the alternative hypothesis H1. Often, one directly interprets H0 and H1 themselves as subsets of Θ, i.e., one considers sets H0 and H1 such that H0 ∪ H1 = Θ and H0 ∩ H1 = ∅.

A (non-randomized) statistical test φ is a measurable mapping φ : (Y, B(Y)) → ({0, 1}, 2^{{0,1}}) with the convention that

φ(y) = 1 ⟺ rejection of the null hypothesis H0, decision in favor of H1,
φ(y) = 0 ⟺ non-rejection of H0.

The subset {y ∈ Y : φ(y) = 1} of the sample space Y is called the rejection region or, synonymously, the critical region of φ, {φ = 1} for short. Its complement {y ∈ Y : φ(y) = 0} is called the acceptance region of φ, {φ = 0} = {φ = 1}^c for short.

Definition 1.13 (Properties of statistical tests) Consider the framework of Definition 1.12.
(i) The quantity β_φ(ϑ) = E_ϑ[φ] = P_ϑ(φ = 1) = ∫_Y φ dP_ϑ denotes the rejection probability of a given test φ as a function of ϑ ∈ Θ. For ϑ ∈ Θ_1, we call β_φ(ϑ) the power of φ in the point ϑ. For ϑ ∈ Θ_0, we call β_φ(ϑ) the type I error probability of φ under ϑ ∈ Θ_0. A type I error occurs if H0 is true, but gets rejected by the test φ. A type II error occurs if H0 is false, but does not get rejected by the test φ.
(ii) For fixed α ∈ (0, 1), we call a test φ with β_φ(ϑ) ≤ α for all ϑ ∈ H0 a level α test. The constant α is called the significance level.
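As a small worked example of Definitions 1.12 and 1.13 (an illustration with arbitrary numbers, not part of the original text), consider i.i.d. observations Y1, ..., Yn with Yi ∼ N(ϑ, 1), the test problem H0 : ϑ ≤ 0 versus H1 : ϑ > 0, and the one-sided Gauss test φ(y) = 1 if and only if √n ȳ_n > Φ^{-1}(1 − α). Its rejection probability is β_φ(ϑ) = 1 − Φ(Φ^{-1}(1 − α) − √n ϑ), which can be evaluated and checked by simulation in R:

```r
alpha <- 0.05; n <- 25
beta_phi <- function(theta) 1 - pnorm(qnorm(1 - alpha) - sqrt(n) * theta)
beta_phi(0)      # type I error probability at the boundary of H0: equals alpha
beta_phi(0.5)    # power of the test in the point theta = 0.5

## Monte Carlo check of the power at theta = 0.5
set.seed(1)
mean(replicate(1e4, sqrt(n) * mean(rnorm(n, mean = 0.5)) > qnorm(1 - alpha)))
```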
1.5 Overview of the Remaining Chapters

The remaining chapters of this book are ordered chronologically by the dates at which the corresponding lectures have been given within the "Reading Club" of the working group "Mathematical Statistics" at the University of Bremen.

Chapter 2 (with corresponding lecture given on February 19th, 2016) deals with the relationship between Pearson's product-moment correlation coefficient and
Spearman’s rank correlation coefficient under normal distributions (and more general elliptical distributions). The chapter also includes a section on Gaussian integrals of quadratic forms, which may be of independent interest. Finally, the developed methodology will also be applied to Kendall’s τ . In Chap. 3 (with corresponding lecture given on March 30th, 2016), empirical likelihood ratio tests for Pearson’s product-moment correlation coefficient are discussed. As a side result, we also prove a nonparametric maximum likelihood estimator property of the empirical measure. Chapter 4 (with corresponding lecture given on June 6th, 2016) discusses certain optimization problems related to copula theory. A numerical solution to these problems is presented, namely, the so-called rearrangement algorithm. Chapter 5 (with corresponding lecture given on August 11th, 2017) deals with properties of order statistics of i.i.d. real-valued observables. In particular, it is shown that such order statistics are non-negatively correlated, and some corollaries are derived from this property. The chapter also includes a useful lemma from measure theory regarding a sufficient condition for non-negative covariance. In Chap. 6 (with corresponding lecture given on August 16th, 2017), we analyze properties of equi-correlation matrices. In particular, we apply the Sherman-Morrison formula to the problem of inverting an equi-correlation matrix, and we provide a lower bound on the equi-correlation coefficient. Some more general remarks about matrix theory for statisticians are also made in this chapter. Chapter 7 (with corresponding lecture given on May 7th, 2018) deals with skewnormal distributions on the real line. Utilizing results from stationary time series analysis, we derive a stochastic representation of a skew-normally distributed random variable. Furthermore, possible generalizations of skew-normal distributions are mentioned. In the final Chap. 8 (with corresponding lecture given on January 18th, 2019), (weighted) bootstrap methods for testing statistical functionals are presented. The relationship to dependency is that Efron’s bootstrap involves multinominal weights which are correlated. Some modifications of these weights are proposed to address this issue.
References

Deheuvels P (1981) Multivariate tests of independence. In: Analytical methods in probability theory (Oberwolfach, 1980). Lecture notes in mathematics, vol 861. Springer, Berlin-New York, pp 42–50
Dickhaus T (2014) Simultaneous statistical inference with applications in the life sciences. Springer, Berlin, Heidelberg
Dickhaus T (2018) Theory of nonparametric tests. Springer, Cham. https://doi.org/10.1007/978-3-319-76315-6
Dickhaus T, Neumann A, Bodnar T (2021) Multivariate multiple test procedures. In: Handbook of multiple comparisons. Chapman & Hall/CRC Handbooks of Modern Statistical Methods, Chapter 3
Durante F, Sempi C (2016) Principles of copula theory. CRC Press, Boca Raton, FL
Embrechts P, Lindskog F, McNeil A (2003) Modelling dependence with copulas and applications to risk management. In: Rachev S (ed) Handbook of heavy tailed distributions in finance. Elsevier Science BV, pp 329–384
Joe H (2015) Dependence modeling with copulas. Monographs on statistics and applied probability, vol 134. CRC Press, Boca Raton, FL
Nelsen RB (2006) An introduction to copulas, 2nd edn. Springer Series in Statistics. Springer, New York, NY
Schweizer B, Sklar A (1983) Probabilistic metric spaces. North-Holland series in probability and applied mathematics. North-Holland Publishing Co, New York
Sklar M (1959) Fonctions de répartition à n dimensions et leurs marges. Publ Inst Statist Univ Paris 8:229–231
Stange J, Dickhaus T, Navarro A, Schunk D (2016) Multiplicity- and dependency-adjusted p-values for control of the family-wise error rate. Stat Probab Lett 111:32–40
Chapter 2
Correlation Coefficients of Bivariate Normal Distributions
Abstract This lecture deals with the relationship between Pearson's product-moment correlation coefficient and Spearman's rank correlation coefficient under normal distributions (and more general elliptical distributions). This chapter also includes a section on Gaussian integrals of quadratic forms, which may be of independent interest. Finally, the developed methodology will also be applied to Kendall's τ.

Keywords Elliptical distribution · Gaussian integrals of quadratic forms · Kendall's concordance coefficient · Probability integral transformation · Spearman's rank correlation coefficient
2.1 Introduction and Motivation

In this lecture, we mainly follow the argumentation of Hotelling and Pabst (1936). Pearson's product-moment correlation coefficient (see Definition 1.1) is a measure for the degree of linear dependency among two real-valued random variables X1 and X2. An alternative dependence measure, which is also capable of expressing certain forms of nonlinear dependency among X1 and X2, is Spearman's rank correlation coefficient.

Definition 2.1 (Spearman's rank correlation coefficient) Let X = (X1, X2)^⊤ : (Ω, F, P) → (R², B(R²)) denote a bivariate random vector, and denote the (marginal) cdf of Xi by Fi, i = 1, 2. Then, Spearman's rank correlation coefficient of X1 and X2 is given by

ρ_S(X1, X2) := ρ(F1(X1), F2(X2)) ∈ [−1, +1],

where ρ(·, ·) denotes Pearson's product-moment correlation coefficient; cf. Definition 1.1.

In general, the relationship between ρ(X1, X2) and ρ_S(X1, X2) is highly nontrivial and cannot be expressed in closed form. However, in the case of bivariate Gaussianity of X the following neat formulas hold true.
Theorem 2.2 Under the assumptions of Definitions 1.1 and 2.1, assume that X ∼ N2((0, 0)^⊤, [[1, r], [r, 1]]) for r ∈ (−1, +1), meaning that the (joint) Lebesgue density of X is given by

ϕ(x1, x2; r) := (2π √(1 − r²))^{-1} exp(−(x1² − 2r x1 x2 + x2²) / (2(1 − r²))),  (x1, x2)^⊤ ∈ R².   (2.1)

Then it holds that
(i) ρ(X1, X2) = r,
(ii) ρ_S(X1, X2) = 6 arcsin(r/2)/π.

Part (i) of Theorem 2.2 can be proved straightforwardly. Let us first show that Xi possesses the standard normal distribution on R for i = 1, 2. To this end, we calculate

∫_R ϕ(x, y; r) dx = (2π √(1 − r²))^{-1} ∫_R exp(−[(x − ry)² + (1 − r²) y²] / (2(1 − r²))) dx
= exp(−y²/2) · (2π √(1 − r²))^{-1} ∫_R exp(−(x − ry)² / (2(1 − r²))) dx.   (2.2)

Due to the normalization of the normal distribution on R with mean ry and variance 1 − r², we get that the integral on the right-hand side of (2.2) equals √(2π(1 − r²)). Hence, we conclude that ∫_R ϕ(x, y; r) dx = exp(−y²/2)/√(2π). Due to symmetry, we can let x1 take the role of x and x2 take the role of y in the above calculation or vice versa, completing the argumentation. In particular, we have shown that E[Xi] = 0 and Var(Xi) = 1 for i = 1, 2. Thus,

ρ(X1, X2) = E[X1 X2] = ∫_R ∫_R x1 x2 ϕ(x1, x2; r) dx1 dx2 = ∫_R x2 (∫_R x1 ϕ(x1, x2; r) dx1) dx2

by Fubini's Theorem. Let y ↦ ϕ(y) denote the Lebesgue density of the standard normal distribution on R, and notice that

ϕ(x, y; r)/ϕ(y) = (2π(1 − r²))^{-1/2} exp(−[x² − 2rxy + y²] / (2(1 − r²)) + y²/2)
= (2π(1 − r²))^{-1/2} exp(−[x² − 2rxy + r²y²] / (2(1 − r²)))
= (2π(1 − r²))^{-1/2} exp(−(x − ry)² / (2(1 − r²))).

Hence, x ↦ ϕ(x, y; r)/ϕ(y) is the Lebesgue density of the normal distribution on R with mean ry and variance 1 − r² for every fixed y ∈ R and r ∈ (−1, +1). We note in passing that this implies that L(Xi | Xj = xj) = N(r xj, 1 − r²) for i = 1 and j = 2 or i = 2 and j = 1, respectively. Now, we get that

∫_R x1 ϕ(x1, x2; r) dx1 = ϕ(x2) ∫_R x1 [ϕ(x1, x2; r)/ϕ(x2)] dx1 = ϕ(x2) E[N(r x2, 1 − r²)] = r x2 ϕ(x2),

and this finally leads to

ρ(X1, X2) = ∫_R r x2² ϕ(x2) dx2 = r Var(N(0, 1)) = r,

completing the proof of part (i) of Theorem 2.2.

Remark 2.3 Under the assumptions of Theorem 2.2, assume that (Y1, Y2)^⊤ =_D (αX1 + β, γX2 + δ)^⊤ for real constants α, β, γ, δ such that sgn(αγ) ≠ 0. Then ρ(Y1, Y2) = sgn(αγ) ρ(X1, X2).

The proof of part (ii) of Theorem 2.2 is more involved. As a preparation, we consider Gaussian integration of quadratic forms in the following section.
2.2 Gaussian Integrals of Quadratic Forms

Lemma 2.4 (Gaussian integrals of quadratic forms) Let M ∈ R^{d×d} be a symmetric and positive definite matrix. Then it holds that

∫_{R^d} exp(−q^⊤ M q / 2) dq = (2π)^{d/2} / √(det(M)).

Proof It is well-known that

∫_{−∞}^{∞} exp(−x²) dx = √π.   (2.3)

Now, let a > 0 be a real constant. We have that

∫_{−∞}^{∞} exp(−(a/2) x²) dx = ∫_{−∞}^{∞} exp(−(√(a/2) x)²) dx.   (2.4)

We substitute y := √(a/2) x such that dy/dx = √(a/2), hence dx = √(2/a) dy. With this, the right-hand side of (2.4) can be expressed as

√(2/a) ∫_{−∞}^{∞} exp(−y²) dy = √(2π/a) = (2π)^{1/2} / √a.
Applying Fubini's Theorem, we conclude that

∫_{R^d} exp(−(1/2) ∑_{i=1}^d ai xi²) dx = (2π)^{d/2} / √(∏_{i=1}^d ai)   (2.5)

for all a = (a1, ..., ad)^⊤ with ai > 0 for all 1 ≤ i ≤ d, where x = (x1, ..., xd)^⊤. Now, we consider the linear transformation x = Bq for a (d × d)-matrix B with det(B) ≠ 0. We get that

∑_{i=1}^d ai xi² = x^⊤ A x = q^⊤ B^⊤ A B q =: q^⊤ M q,

where A = diag(a1, ..., ad) and M = B^⊤ A B. By the change-of-variables theorem for the Lebesgue integral (see, e.g., Theorem 3.7.1 in Bogachev 2007), we conclude from (2.5) that

∫_{R^d} exp(−q^⊤ M q / 2) |det(B)| dq = (2π)^{d/2} / √(det(A)).   (2.6)

Finally, the assertion follows by noticing that det(M) = det(B^⊤ A B) = [det(B)]² det(A), or equivalently

|det(B)| = √(det(M)) / √(det(A)),   (2.7)

plugging (2.7) into (2.6), and rearranging the terms.

Remark 2.5 With the help of Lemma 2.4, it can easily be verified that ϕ(·, ·; r) from (2.1) is indeed normalized for any r ∈ (−1, +1). To this end, let d = 2 and consider the matrix

M = (1/(1 − r²)) [[1, −r], [−r, 1]]

with det(M) = (1 − r²)^{-1} and, consequently, det(M)^{-1/2} = √(1 − r²). We immediately conclude that

∫_{R²} ϕ(x, y; r) dx dy = (2π √(1 − r²))^{-1} · 2π √(1 − r²) = 1.
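Lemma 2.4 and Remark 2.5 can also be checked numerically. The following R sketch (an illustration, not part of the original text; r = 0.6 and the integration grid are arbitrary choices) approximates the Gaussian integral for d = 2 by a crude Riemann sum and compares it with the closed form (2π)^{d/2}/√(det(M)):

```r
r <- 0.6
M <- matrix(c(1, -r, -r, 1), nrow = 2) / (1 - r^2)   # the matrix from Remark 2.5
f <- function(q1, q2) exp(-0.5 * (M[1, 1] * q1^2 + 2 * M[1, 2] * q1 * q2 + M[2, 2] * q2^2))
grid <- seq(-10, 10, by = 0.01)                      # the integrand is negligible outside this range
h    <- 0.01
sum(outer(grid, grid, f)) * h^2                      # numerical approximation of the integral
(2 * pi)^(2 / 2) / sqrt(det(M))                      # closed form from Lemma 2.4
```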
2.3 Elementary Proof of Part (ii) of Theorem 2.2

Let Φ denote the cdf of the standard normal distribution on R. As we have seen before, under the assumptions of Theorem 2.2, Φ is the marginal cdf of Xi for i = 1, 2. Hence, by the principle of probability integral transformation (see, e.g., Theorem 2.10 in Dickhaus 2018), the random variable Ui := Φ(Xi) is uniformly distributed on [0, 1] with E[Ui] = 1/2 and Var(Ui) = 1/12 for i = 1, 2. We compute

ρ_S(X1, X2) = ρ(U1, U2) = (E[U1 U2] − 1/4) / (1/12) = 12 E[Φ(X1) Φ(X2)] − 3
= 12 ∫_{R²} Φ(x) Φ(y) dP_X(x, y) − 3 = 12 ∫_{R²} Φ(x) Φ(y) ϕ(x, y; r) dx dy − 3.

Lemma 2.6 We have that

(∂/∂r) ϕ(x, y; r) = (∂²/(∂x ∂y)) ϕ(x, y; r).

Proof The assertion follows by applying the basic rules of (partial) differentiation.

Making use of Lemma 2.6, we get that

(∂/∂r) ρ_S(X1, X2) = 12 ∫_{R²} Φ(x) Φ(y) (∂²/(∂x ∂y)) ϕ(x, y; r) dx dy   (2.8)
= 12 ∫_{R²} ϕ(x) ϕ(y) ϕ(x, y; r) dx dy,   (2.9)

where interchanging integration and partial differentiation on the right-hand side of (2.8) is justified due to the regularity of the family of bivariate normal distributions, and (2.9) follows from integration by parts, where ϕ(z) = (d/dz) Φ(z). Plugging the explicit expressions for ϕ(x), ϕ(y), and ϕ(x, y; r) into (2.9) yields that

(∂/∂r) ρ_S(X1, X2) = (12 / (4π² √(1 − r²))) ∫_{R²} exp(−(1/2) [(2 − r²)x² − 2rxy + (2 − r²)y²] / (1 − r²)) dx dy.

Now, consider the quadratic form

Q(x, y) = ((2 − r²)/(1 − r²)) x² − (2r/(1 − r²)) xy + ((2 − r²)/(1 − r²)) y².
Obviously, Q(x, y) = (x, y) M (x, y)^⊤, where

M = [[(2 − r²)/(1 − r²), −r/(1 − r²)], [−r/(1 − r²), (2 − r²)/(1 − r²)]]

is symmetric and positive definite with

det(M) = (2 − r²)²/(1 − r²)² − r²/(1 − r²)² = (4 − r²)/(1 − r²).

Thus, Lemma 2.4 yields that

(∂/∂r) ρ_S(X1, X2) = (12 / (4π² √(1 − r²))) · 2π √(1 − r²)/√(4 − r²) = 6 / (π √(4 − r²)),

leading to ρ_S(X1, X2) = 6 arcsin(r/2)/π, as desired.
2.4 More Elegant Proof of Part (ii) of Theorem 2.2

Here, we provide a more elegant proof of part (ii) of Theorem 2.2 by exploiting linearity of Gaussian distributions. We first collect some preparatory results in Lemmas 2.7–2.9. To this end, we define an independent copy of X as a bivariate random vector Y which is defined on the same probability space as X and is such that X and Y are stochastically independent and that Y has the same (joint) distribution as X.

Lemma 2.7 Under the assumptions of Theorem 2.2, let U1 = Φ(X1) and U2 = Φ(X2). Furthermore, let (X̃1, X̃2)^⊤ and (X'1, X'2)^⊤ be two independent copies of X. Then it holds that

2 E[U1 U2] = P((X1 − X̃1)(X2 − X'2) > 0).

Consequently, ρ_S(X1, X2) = 6 P((X1 − X̃1)(X2 − X'2) > 0) − 3.

Proof Let C denote the joint cdf of (U1, U2)^⊤, which is commonly referred to as the copula of (X1, X2)^⊤; cf. Sect. 1.2. Then we have that

E[U1 U2] = ∫_{[0,1]²} u v dC(u, v).

On the other hand, we have that
P((X1 − X̃1)(X2 − X'2) > 0) = P(X1 > X̃1, X2 > X'2) + P(X1 < X̃1, X2 < X'2) = 2 P(X1 > X̃1, X2 > X'2)

due to symmetry. Let Ũ1 = Φ(X̃1) and U'2 = Φ(X'2). Since Φ is strictly isotone on R, we get that

2 P(X1 > X̃1, X2 > X'2) = 2 P(Ũ1 < U1, U'2 < U2) = 2 ∫_{[0,1]²} P(Ũ1 < u, U'2 < v) dC(u, v) = 2 ∫_{[0,1]²} u v dC(u, v),

because Ũ1 and U'2 are stochastically independent with Ũ1 ∼ UNI[0, 1] and U'2 ∼ UNI[0, 1] by the principle of probability integral transformation.

Lemma 2.8 Under the assumptions of Lemma 2.7, let Z1 := (X1 − X̃1)/√2 and Z2 := (X2 − X'2)/√2. Then (Z1, Z2)^⊤ ∼ N2((0, 0)^⊤, [[1, r/2], [r/2, 1]]). Furthermore, it holds that

P((X1 − X̃1)(X2 − X'2) > 0) = P(Z1 Z2 > 0) = 2 P(Z1 > 0, Z2 > 0).

Consequently, ρ_S(X1, X2) = 12 P(Z1 > 0, Z2 > 0) − 3. A stochastic representation of (Z1, Z2)^⊤ is given by

(Z1, Z2)^⊤ =_D (V cos(θ) + W sin(θ), W)^⊤,

where (V, W)^⊤ ∼ N2(0, I2) and θ = arcsin(r/2) ∈ (−π/6, π/6).

Proof All assertions follow by elementary calculations.

Lemma 2.9 Let (V, W)^⊤ be as in Lemma 2.8. Let R denote an R_{>0}-valued random variable which possesses the chi distribution with two degrees of freedom, meaning that its Lebesgue density is given by r ↦ r exp(−r²/2), r ∈ R_{>0}. Furthermore, let Λ denote a random variable which is stochastically independent of R and uniformly distributed on the interval [−π, π]. Then (V, W)^⊤ =_D (R cos(Λ), R sin(Λ))^⊤.

Proof The proof is a simple application of the transformation formula for (Lebesgue) densities.

After these preparations, we are ready to compute

ρ_S(X1, X2) = 12 P(Z1 > 0, Z2 > 0) − 3.   (2.10)
Since R is almost surely larger than zero, we have that

P(Z1 > 0, Z2 > 0) = P(cos(Λ) cos(θ) + sin(Λ) sin(θ) > 0, sin(Λ) > 0) = P(cos(Λ − θ) > 0, sin(Λ) > 0),   (2.11)

where (2.11) follows from the well-known compound angle formula for the cosine function as given in, e.g., Eq. 4 in Sect. 9.3 of Neill (2018). We note that Λ − θ takes its values in (−7π/6, 7π/6) and that cos(x) > 0 for x ∈ (−π/2, π/2) ⊂ (−7π/6, 7π/6). From this, we conclude that {cos(Λ − θ) > 0} = {Λ ∈ (θ − π/2, θ + π/2)}. Furthermore, {sin(Λ) > 0} = {Λ ∈ (0, π)}. Altogether, this entails that

P(cos(Λ − θ) > 0, sin(Λ) > 0) = P(Λ ∈ (θ − π/2, θ + π/2) ∩ (0, π)) = P(Λ ∈ (0, θ + π/2)) = (θ + π/2)/(2π) = (arcsin(r/2) + π/2)/(2π).

Substituting this into (2.10) finally yields that

ρ_S(X1, X2) = 12 (arcsin(r/2) + π/2)/(2π) − 3 = (6/π) arcsin(r/2) ∈ [−1, +1],

as desired.
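Part (ii) of Theorem 2.2 is easy to check by simulation. The following R sketch (illustrative; the value of r and the sample size are arbitrary choices, not part of the original text) compares the sample Spearman correlation of a large bivariate normal sample with the population value 6 arcsin(r/2)/π:

```r
set.seed(1)
r <- 0.6; n <- 1e6
x1 <- rnorm(n)
x2 <- r * x1 + sqrt(1 - r^2) * rnorm(n)   # (x1, x2) has correlation r, cf. Theorem 2.2
cor(x1, x2, method = "spearman")          # sample Spearman correlation
6 * asin(r / 2) / pi                      # population value, approximately 0.582 for r = 0.6
```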
2.5 Extension to Kendall's τ

Let X̃ = (X̃1, X̃2)^⊤ denote an independent copy of X = (X1, X2)^⊤. Then, Kendall's concordance coefficient τ of X1 and X2 is defined as

τ(X1, X2) = P((X1 − X̃1)(X2 − X̃2) > 0) − P((X1 − X̃1)(X2 − X̃2) < 0).

With the methods developed in Sects. 2.3 and 2.4, it is also possible to calculate τ(X1, X2) under the assumptions of Theorem 2.2. On the one hand, it can be shown that

τ(X1, X2) = 4 E[Φ(X1, X2; r)] − 1 = 4 ∫_{R²} Φ(x, y; r) ϕ(x, y; r) dx dy − 1,

where Φ(·, ·; r) denotes the (joint) cdf of X. Analogously to the technique in Sect. 2.3 and making use of the product rule of differentiation, we obtain
(∂/∂r) τ(X1, X2) = 4 ∫_{R²} [(∂/∂r) Φ(x, y; r) · ϕ(x, y; r) + Φ(x, y; r) · (∂/∂r) ϕ(x, y; r)] dx dy
= 4 [∫_{R²} [ϕ(x, y; r)]² dx dy + ∫_{R²} Φ(x, y; r) (∂²/(∂x ∂y)) ϕ(x, y; r) dx dy]   (2.12)
= 8 ∫_{R²} [ϕ(x, y; r)]² dx dy,   (2.13)

where (2.12) follows from the fact that (∂/∂r) Φ(x, y; r) = ϕ(x, y; r) (see, e.g., the appendix of Sibuya 1960), and (2.13) follows from integration by parts applied to the second summand in (2.12). Solving the integral in (2.13) by making use of Lemma 2.4 leads to

(∂/∂r) τ(X1, X2) = 2 / (π √(1 − r²)),

hence τ(X1, X2) = 2 arcsin(r)/π, where arcsin(r) ∈ (−π/2, π/2).

On the other hand, we may also express Kendall's τ as τ(X1, X2) = 4 P(X1 > 0, X2 > 0) − 1. Proceeding as in Sect. 2.4, but with θ = arcsin(r/2) ∈ (−π/6, π/6) replaced by θ = arcsin(r) ∈ (−π/2, π/2), we get again that

τ(X1, X2) = 4 (arcsin(r) + π/2)/(2π) − 1 = (2/π) arcsin(r).   (2.14)

It is remarkable that the computation of P(X1 > 0, X2 > 0) can be traced back to the nineteenth century; see the bibliographic remarks on page 290 of Cramér (1946).

Remark 2.10 Lindskog et al. (2003) have shown that the formula (2.14) for Kendall's τ is valid in a much broader class of elliptical distributions.
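Formula (2.14) can likewise be verified numerically (again an illustration with arbitrary values, not part of the original text; the sample size is kept moderate because the sample version of Kendall's τ requires O(n²) pairwise comparisons):

```r
set.seed(1)
r <- 0.6; n <- 5000
x1 <- rnorm(n)
x2 <- r * x1 + sqrt(1 - r^2) * rnorm(n)
cor(x1, x2, method = "kendall")   # sample Kendall correlation
2 * asin(r) / pi                  # population value 2 arcsin(r)/pi, approximately 0.410 for r = 0.6
```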
References

Bogachev VI (2007) Measure theory, vol I, II. Springer, Berlin. https://doi.org/10.1007/978-3-540-34514-5
Cramér H (1946) Mathematical methods of statistics. Princeton mathematical series, vol 9. Princeton University Press, Princeton, NJ
Dickhaus T (2018) Theory of nonparametric tests. Springer, Cham. https://doi.org/10.1007/978-3-319-76315-6
Hotelling H, Pabst MR (1936) Rank correlation and tests of significance involving no assumption of normality. Ann Math Stat 7:29–43
Lindskog F, McNeil A, Schmock U (2003) Kendall's tau for elliptical distributions. In: Credit risk: measurement, evaluation and management. Contributions to Economics. Physica-Verlag, pp 149–156
Neill H (2018) Trigonometry: a complete introduction. John Murray Learning, London, UK
Sibuya M (1960) Bivariate extreme statistics. I. Ann Inst Statist Math Tokyo 11:195–210. https://doi.org/10.1007/bf01682329
Chapter 3
Empirical Likelihood Ratio Tests for the Population Correlation Coefficient
Abstract In this chapter, empirical likelihood ratio tests for Pearson's product-moment correlation coefficient are discussed. As a side result, we also prove a nonparametric maximum likelihood estimator property of the empirical measure.

Keywords Nonparametric test · Profile maximum likelihood · Statistical functional · Wilks phenomenon
3.1 Introduction and Motivation

This lecture contains results taken from Dickhaus (2015). In Chap. 2, we have studied (properties of) Pearson's product-moment correlation coefficient ρ = ρ(X, Y) of a bivariate random vector (X, Y)^⊤ taking its values in R². All results of Chap. 2 refer to the population level, meaning that the (joint) distribution of (X, Y)^⊤ is exactly known. In the present chapter, we consider methods and results on the sample level, or, in other words, statistical approaches. To this end, we assume that a sample (X1, Y1)^⊤, ..., (Xn, Yn)^⊤ of observable bivariate i.i.d. random vectors is at hand, where n ∈ N denotes the sample size and (X1, Y1)^⊤ =_D (X, Y)^⊤. In particular, we consider nonparametric methods for testing

H0 : ρ = ρ* versus H1 : ρ ≠ ρ*,

where ρ* ∈ [−1, +1] is a given constant.

Remark 3.1 We implicitly assume that the necessary (second) moments of the joint distribution of (X, Y)^⊤ exist in R.

One nonparametric testing approach for statistical functionals is the empirical likelihood ratio (ELR) test approach, which we will introduce in the next section.
3.2 Empirical Likelihood Ratio Tests for Multivariate Means

Let Z1, ..., Zn denote observable i.i.d. random vectors taking their values in R^d for d ∈ N. Assume that Z1 =_D Z, denote the distribution of Z by P, and assume that E[Z] exists in R^d. Define the empirical measure P̂n := n^{-1} ∑_{i=1}^n δ_{Zi}. For a given probability measure ν on R^d, define the empirical (nonparametric) likelihood based on the (observed) sample Z1 = z1, ..., Zn = zn by

L(ν) = ∏_{i=1}^n ν({zi}).
Theorem 3.2 Under the aforementioned assumptions, P̂n (evaluated at the realized values z1, ..., zn) maximizes L over all probability measures ν on R^d. Hence, P̂n is the nonparametric maximum likelihood estimator (NPMLE) of P.

Proof Let u1, ..., um be the distinct observed values of Z1, ..., Zn, where m ≤ n. Let nj ≥ 1 be the number of those Zi's which have taken the value uj, for 1 ≤ j ≤ m. For a given probability measure ν on R^d, define pj := ν({uj}) for 1 ≤ j ≤ m. Furthermore, let p̂j := nj/n for 1 ≤ j ≤ m. If pj = 0 for any 1 ≤ j ≤ m, then L(ν) = 0 < n^{-n} ≤ L(P̂n). Thus, we can w.l.o.g. assume that pj > 0 for all 1 ≤ j ≤ m, and that pj ≠ p̂j for at least one j ∈ {1, ..., m}. Then it holds that

log(L(ν)/L(P̂n)) = ∑_{j=1}^m nj log(pj/p̂j) = n ∑_{j=1}^m p̂j log(pj/p̂j).   (3.1)

Now, we exploit that log(x) ≤ x − 1 for all x > 0, with equality only for x = 1. Hence, we get that the right-hand side of (3.1) is strictly smaller than

n ∑_{j=1}^m p̂j (pj/p̂j − 1) = n [∑_{j=1}^m pj − 1] ≤ 0,

which completes the proof.

Now, suppose that we want to test the null hypothesis H0 : {E[Z] = μ*}, where μ* is a given point in R^d. One plausible idea is to consider a likelihood ratio-type test statistic for this purpose.
Definition 3.3 The empirical likelihood ratio (ELR) is a (random) function R : R^d → R, given by

R(μ) = max_{p1,...,pn} { ∏_{i=1}^n (n pi) : 0 ≤ pi ≤ 1, ∑_{i=1}^n pi = 1, ∑_{i=1}^n pi Zi = μ }.

Remark 3.4 Due to Theorem 3.2, R(μ) ≤ 1 for any μ ∈ R^d.

The following theorem is one of the major results of ELR test theory.
Theorem 3.5 (see Owen (1990)) Assume that E[Z] = μ0 ∈ R^d and that V0 := Cov(Z) is finite and of rank q > 0. Then, −2 log(R(μ0)) →_D χ²_q as n → ∞.
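The convergence in Theorem 3.5 can be illustrated numerically. In the univariate case (d = 1), the maximizing weights have the well-known Lagrange-multiplier form pi = 1/(n(1 + λ(Zi − μ0))), where λ solves ∑_{i=1}^n (Zi − μ0)/(1 + λ(Zi − μ0)) = 0, so that −2 log R(μ0) = 2 ∑_{i=1}^n log(1 + λ(Zi − μ0)). The following R sketch (an illustration, not part of the original text; the Exp(1) data and all numerical settings are arbitrary choices) simulates the statistic under H0 and compares its upper tail with the χ²_1 limit:

```r
set.seed(1)
neg2logR <- function(z, mu0) {                 # -2 log R(mu0) for a univariate mean
  d <- z - mu0                                 # requires min(z) < mu0 < max(z)
  g <- function(lam) sum(d / (1 + lam * d))    # stationarity condition in lambda
  eps <- 1e-8
  lam <- uniroot(g, lower = -1 / max(d) + eps, upper = -1 / min(d) - eps)$root
  2 * sum(log(1 + lam * d))
}
n <- 50
stat <- replicate(5000, neg2logR(rexp(n), mu0 = 1))   # Exp(1) data, so H0: mean = 1 is true
mean(stat > qchisq(0.95, df = 1))                     # roughly 0.05 (slightly above for moderate n)
```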
3.3 Profile ELR Test for ρ(X, Y)

Under the assumptions of Sect. 3.1, let

Z := (X, Y, X², Y², XY)^⊤ ∈ R⁵,

such that E[Z] = (μ_X, μ_Y, μ_X² + σ_X², μ_Y² + σ_Y², ρ σ_X σ_Y + μ_X μ_Y)^⊤, where μ_X and μ_Y denote the means of X and Y, respectively, and σ_X and σ_Y denote the standard deviations of X and Y, respectively.

Corollary 3.6 Denote by θ = (μ_X, μ_Y, σ_X², σ_Y², ρ)^⊤ ∈ Θ ⊂ R⁵ the five-dimensional vector of the population moments of interest. Furthermore, define h : Θ → R⁵ as the function which maps θ onto E[Z]. Obviously, h possesses (partial) derivatives of any order. Then,

R(θ*) = max_{p1,...,pn} { ∏_{i=1}^n (n pi) : 0 ≤ pi ≤ 1, ∑_{i=1}^n pi = 1, ∑_{i=1}^n pi Zi = h(θ*) }   (3.2)

is the ELR pertaining to H0 : {θ = θ*}, where Zi is calculated from (Xi, Yi) as Z from (X, Y), for 1 ≤ i ≤ n. Furthermore, the (asymptotic) ELR test for H0 rejects H0 at level α ∈ (0, 1), if −2 log R(θ*) exceeds the (1 − α)-quantile of the chi-square distribution with five degrees of freedom.
However, remember that we want to test H0 : {ρ = ρ*} for a given value ρ* ∈ [−1, +1]. Hence, we have to deal with the four nuisance parameters μ_X, μ_Y, σ_X², and σ_Y². To this end, we adapt the approach of profile maximum likelihood, meaning that we maximize the ELR over the nuisance parameters. The resulting method can be summarized as follows.

Algorithm 3.7
1. Maximize R from (3.2) over {θ* ∈ Θ : ρ = ρ*}. Denote the maximizer by θ(ρ*).
2. Reject H0, if −2 log(R(θ(ρ*))) exceeds the (1 − α)-quantile of the chi-square distribution with one degree of freedom, α ∈ (0, 1).

Theorem 3.8 (see Owen (1990)) The profile ELR test for H0 which is defined by Algorithm 3.7 is an asymptotic level α test as n → ∞.
Table 3.1 Relative rejection frequencies of the profile empirical likelihood ratio test for the population correlation coefficient in the case of bivariate Gaussian data. The nominal significance level was set to α = 5% in all simulations. Results are based on 10,000 Monte Carlo repetitions for each parameter configuration (ρ: true underlying correlation coefficient, n: sample size). Reprinted by permission from Springer Nature Customer Service Centre GmbH: Springer Nature, Stochastic Models, Statistics and Their Applications. Springer Proceedings in Mathematics & Statistics, vol. 122, by Ansgar Steland, Ewaryst Rafajłowicz, Krzysztof Szajowski (Eds.), © 2015

ρ      n    Rel. rejection freq.   |  ρ     n    Rel. rejection freq.
−0.9   10   0.1669                 |  0.25  10   0.1612
−0.9   20   0.1030                 |  0.25  20   0.1085
−0.9   50   0.0681                 |  0.25  50   0.0716
−0.9   100  0.0593                 |  0.25  100  0.0558
−0.75  10   0.1668                 |  0.5   10   0.1705
−0.75  20   0.1069                 |  0.5   20   0.1066
−0.75  50   0.0758                 |  0.5   50   0.0743
−0.75  100  0.0588                 |  0.5   100  0.0586
−0.5   10   0.1645                 |  0.75  10   0.1649
−0.5   20   0.1089                 |  0.75  20   0.1077
−0.5   50   0.0688                 |  0.75  50   0.0737
−0.5   100  0.0624                 |  0.75  100  0.0605
−0.25  10   0.1655                 |  0.9   10   0.1662
−0.25  20   0.1102                 |  0.9   20   0.1015
−0.25  50   0.0737                 |  0.9   50   0.0716
−0.25  100  0.0593                 |  0.9   100  0.0625
0      10   0.1669                 |
0      20   0.1106                 |
0      50   0.0697                 |
0      100  0.0623                 |
Remark 3.9 Algorithm 3.7 involves a nested double optimization. Namely, the inner optimization is given by the maximization over p1, ..., pn in (3.2) for given θ*, and the outer optimization is given by the maximization over the nuisance parameters μ_X, μ_Y, σ_X², σ_Y² as described in the first step of Algorithm 3.7. For the inner (constrained) optimization, the Lagrange multiplier method can be applied, while for the outer optimization any all-purpose optimization routine can be employed. For more detailed comments on possible implementations, see Dickhaus (2015) and the references therein.

Algorithm 3.7 defines a nonparametric test φ (say) for H0 : {ρ = ρ*} which is approximately of level α for large sample sizes n. Unfortunately, though, φ does typically not keep the significance level α for finite sample sizes. For an illustration, consider Table 3.1, which is taken from Dickhaus (2015) and reprinted by permission from Springer Nature Customer Service Centre GmbH: Springer Nature, Stochastic Models, Statistics and Their Applications. Springer Proceedings in Mathematics & Statistics, vol. 122, by Ansgar Steland, Ewaryst Rafajłowicz, Krzysztof Szajowski (Eds.), © 2015. This table displays relative rejection frequencies of φ obtained in a computer simulation under the assumption that L((X, Y)^⊤) = N2(0, Σ) and the diagonal elements of Σ are both equal to one. In this setup, the off-diagonal element of Σ is the parameter ρ of interest.
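For readers who want to experiment with Algorithm 3.7, the following R sketch shows one possible implementation of the nested optimization described in Remark 3.9. It is a hypothetical illustration, not the implementation used for Table 3.1: it assumes the add-on package emplik, whose function el.test() handles the inner (constrained) maximization, uses optim() for the outer maximization over the nuisance parameters, and ignores numerical robustness issues (e.g., constraint vectors falling outside the convex hull of the data).

```r
library(emplik)  # assumed to provide el.test(); not used in the original text

profile_elr_stat <- function(x, y, rho0) {
  Z <- cbind(x, y, x^2, y^2, x * y)                 # the vector Z from Sect. 3.3, one row per observation
  h <- function(nu) {                               # nu = (mu_X, mu_Y, log sigma_X^2, log sigma_Y^2)
    muX <- nu[1]; muY <- nu[2]
    s2X <- exp(nu[3]); s2Y <- exp(nu[4])            # log-parametrization keeps the variances positive
    c(muX, muY, muX^2 + s2X, muY^2 + s2Y, rho0 * sqrt(s2X * s2Y) + muX * muY)
  }
  obj   <- function(nu) el.test(Z, mu = h(nu))$"-2LLR"   # inner maximization of the ELR
  start <- c(mean(x), mean(y), log(var(x)), log(var(y)))
  optim(start, obj)$value                           # outer minimization over the nuisance parameters
}

## reject H0: rho = rho0 at level alpha if profile_elr_stat(x, y, rho0) > qchisq(1 - alpha, df = 1)
```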
References

Dickhaus T (2015) Self-concordant profile empirical likelihood ratio tests for the population correlation coefficient: a simulation study. In: Stochastic models, statistics and their applications. Collected papers based on the presentations at the 12th workshop, Wrocław, Poland, February 2015. Springer, Cham, pp 253–260. https://doi.org/10.1007/978-3-319-13881-7
Owen A (1990) Empirical likelihood ratio confidence regions. Ann Stat 18(1):90–120
Chapter 4
The Rearrangement Algorithm
Abstract In this lecture, we discuss certain optimization problems related to copula theory. A numerical solution to these problems is presented, namely, the so-called rearrangement algorithm.

Keywords Copula theory · Fréchet-Hoeffding bounds · Numerical optimization · Product of uniform random variables · Supermodular function
4.1 Introduction

The presentation in this chapter mainly follows the article by Puccetti and Rüschendorf (2015). Copula theory (cf. Sect. 1.2) is a very active field of modern probability theory. In particular, several interesting optimization problems arise in the context of copulae. One such optimization problem is the topic of this chapter. For its introduction, we need some more preparations.

Definition 4.1 A function ψ : R^d → R is called supermodular, if

ψ(x ∨ y) + ψ(x ∧ y) ≥ ψ(x) + ψ(y)   (4.1)

for all x, y ∈ R^d. In (4.1), ∨ denotes the component-wise maximum and ∧ denotes the component-wise minimum.

We are now ready to formulate the problem that we are concerned with in this chapter: Let ψ be a supermodular function from R^d to R for d ∈ N which is such that there exist supermodular functions ψ^{d−1} : R^{d−1} → R and ψ² : R² → R fulfilling that

ψ(x1, ..., xd) = ψ²(xj, ψ^{d−1}(x1, ..., x_{j−1}, x_{j+1}, ..., xd))

for all 1 ≤ j ≤ d and all x = (x1, ..., xd)^⊤ ∈ R^d. Furthermore, let X1, ..., Xd denote real-valued random variables which are all defined on the same probability space (Ω, F, P). Assume that the marginal cdf Fj of Xj is known for all 1 ≤ j ≤ d, but that the copula of the joint distribution of X = (X1, ..., Xd)^⊤ is unknown. Find

s_ψ = inf_{C ∈ C} { E[ψ(X1, ..., Xd)] : Xj ∼ Fj for all 1 ≤ j ≤ d },   (4.2)

where C denotes the set of all copulae in dimension d. Two relevant special cases are given by

ψ(x1, ..., xd) = ∏_{j=1}^d xj  and  ψ(x1, ..., xd) = min_{1≤j≤d} xj.

Remark 4.2
(a) The problem of finding

S_ψ = sup_{C ∈ C} { E[ψ(X1, ..., Xd)] : Xj ∼ Fj for all 1 ≤ j ≤ d }

is trivial: Just choose C(u1, ..., ud) = min_{1≤j≤d} uj, which is the so-called upper Fréchet-Hoeffding bound and corresponds to the case of comonotonicity of X1, ..., Xd; cf., e.g., Sect. 3.2 of Embrechts et al. (2003).
(b) In dimension d = 2, the problem of finding s_ψ from (4.2) is trivial, too: Just choose C(u1, u2) = max{0, u1 + u2 − 1}, which is the so-called lower Fréchet-Hoeffding bound and corresponds to the case of countermonotonicity of X1 and X2; see again Sect. 3.2 of Embrechts et al. (2003) for more details.
(c) In dimension d ≥ 3, the lower Fréchet-Hoeffding bound

(u1, ..., ud) ↦ max{0, ∑_{j=1}^d uj − d + 1}

is not a copula, and the problem of finding s_ψ from (4.2) for a given function ψ is highly non-trivial in general.
4.2 Numerical Solution of the Problem

As shown by Puccetti and Rüschendorf (2015), the problem of finding s_ψ from (4.2) can in arbitrary dimension d ∈ N numerically be solved by the so-called rearrangement algorithm.
Algorithm 4.3 (Rearrangement algorithm)
(0) Choose a discretization parameter value n ∈ N.
(1) Create an (n × d)-matrix X with entries

x_{i,j} = F_j^{-1}(i/n),  1 ≤ i ≤ n, 1 ≤ j ≤ d.

(2) Define X̃^(1) from X by iteratively rearranging its j-th column X̃_j^(1) for 1 ≤ j ≤ d such that it is oppositely ordered to the (n × 1)-vector

(ψ^{d−1}(x_{1,1}, ..., x_{1,j−1}, x_{1,j+1}, ..., x_{1,d}), ..., ψ^{d−1}(x_{n,1}, ..., x_{n,j−1}, x_{n,j+1}, ..., x_{n,d}))^⊤.

In this, columns that have already been rearranged are used with their updated values.
(3) Estimate s_ψ by

ŝ_ψ^(1) = (1/n) ∑_{i=1}^n ψ(x̃_{i,1}^(1), ..., x̃_{i,d}^(1)),

where (x̃_{i,j}^(1))_{1≤i≤n, 1≤j≤d} are the elements of X̃^(1).
(4) Iterate steps (2) and (3) with X̃^(k−1) as the initial matrix to obtain ŝ_ψ^(k) for k = 2, 3, ... until convergence.

Here, we will not prove that the rearrangement algorithm works. Instead, we will illustrate its usage by means of a concrete example.
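Before turning to the concrete example of the next section, the following R function is a minimal generic sketch of Algorithm 4.3 (an illustration, not the book's original implementation; the function and argument names are ad hoc choices). It expects the marginal quantile functions F_j^{-1} as a list and a function that evaluates ψ, respectively ψ^{d−1}, row-wise on a matrix:

```r
rearrangement <- function(qf_list, psi_rowwise, n = 1000, max_iter = 100, tol = 1e-9) {
  d <- length(qf_list)
  X <- sapply(seq_len(d), function(j) qf_list[[j]]((1:n) / n))   # step (1): x_{i,j} = F_j^{-1}(i/n)
  s_old <- -Inf
  for (k in seq_len(max_iter)) {
    for (j in seq_len(d)) {                                      # step (2)
      target <- psi_rowwise(X[, -j, drop = FALSE])               # psi^{d-1} of the other columns
      ## oppositely ordered: the largest value of column j is placed where 'target' is smallest
      X[order(target), j] <- sort(X[, j], decreasing = TRUE)
    }
    s_new <- mean(psi_rowwise(X))                                # step (3)
    if (abs(s_new - s_old) < tol) break                          # step (4): stop upon convergence
    s_old <- s_new
  }
  s_new
}

## the example of Sect. 4.3: d = 3 uniform margins and psi(x1, x2, x3) = x1 * x2 * x3
rearrangement(qf_list = list(qunif, qunif, qunif),
              psi_rowwise = function(M) apply(M, 1, prod))
```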
4.3 Application to the Product of Marginally Uniform Random Variables

Let d = 3, Xj ∼ UNI[0, 1] for all j = 1, 2, 3, and ψ(x1, x2, x3) = x1 x2 x3. Hence, we search for

s* = inf_{C ∈ C} E[U1 U2 U3],   (4.3)

where U1, U2, and U3 are three marginally standard uniformly distributed random variables which are defined on the same probability space, and C is the set of all copulae in dimension three. We have implemented the rearrangement algorithm for this example in R as follows.
discrete_rows