English Pages 292 Year 2021
Abdelhamid Hassairi Riesz Probability Distributions
De Gruyter Series in Probability and Stochastics
Edited by Itai Benjamini, Israel Jean Bertoin, Switzerland Michel Ledoux, France René L. Schilling, Germany
Volume 1
Mathematics Subject Classification 2020 Primary: 60-02, 60B11, 17A30; Secondary: 60B20, 62D10 Author Prof. Dr. Abdelhamid Hassairi Sfax University Faculty of Sciences Laboratory of Probability and Statistics 1171 B. P., 3000 Sfax Tunisia [email protected]
ISBN 978-3-11-071325-1 e-ISBN (PDF) 978-3-11-071337-4 e-ISBN (EPUB) 978-3-11-071345-9 ISSN 2512-9007 Library of Congress Control Number: 2021934847 Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2021 Walter de Gruyter GmbH, Berlin/Boston Typesetting: VTeX UAB, Lithuania Printing and binding: CPI books GmbH, Leck www.degruyter.com
Preface
The Wishart probability distribution was defined in 1928 by J. Wishart [113] as the natural extension of the chi-square distribution to positive semidefinite symmetric matrices. It is formally defined as the distribution of the sample covariance matrix from a multivariate Gaussian distribution (see [87]). This representation in terms of Gaussian random vectors has led to many important and useful properties of the Wishart distribution, as well as to the wide use of the Gaussian distribution in statistics. The precise form of the Wishart probability distribution defined in this way depends on the position of the size n of the random sample relative to the number r of variables. It is absolutely continuous with respect to the Lebesgue measure on the cone of positive definite symmetric matrices of rank r when n ≥ r; otherwise, it is singular, supported by the boundary of the cone. The Wishart distribution in its absolutely continuous form is extended to a distribution whose probability density function depends on a real parameter which belongs to an unbounded interval. This fact follows from a result in harmonic analysis due to Gindikin [36] without any interpretation in terms of multivariate Gaussian vectors. There is an extensive literature on the matrix Wishart probability distribution and on some related distributions such as the different kinds of matrix beta distributions, or the matrix Dirichlet distributions. The monographs of Muirhead [91] and Gupta and Nagar [39] are good references in this area. In this context, the impact of the matrix calculus and of the analysis on the algebra of symmetric matrices has been substantial. Conversely, the probabilistic problems and statistical applications have motivated many new directions of research in this algebra (see [98] and [69]).
https://doi.org/10.1515/9783110713374-201
In his survey of random matrices realized in 2002, Olkin [95] said that there are probably close to 20 derivations of the Wishart distribution or extensions and modifications, and that many among them were suggested prior to 1950. He cited Anderson [2] and Johnson and Kotz [62] to uncover relevant references. More recent generalizations have also been realized either by replacing the Gaussian multivariate distribution by other multivariate distributions or by some transformations on the Wishart distribution itself. To cite only a few of these generalizations, we mention that Letac and Massam [79] have defined a normal quasi-Wishart distribution in graphical models, and Bryc [17] has introduced a compound Wishart and a q-Wishart distribution. Wong and Wang [114] have proposed a Laplace–Wishart distribution, while Díaz-García et al. [26] have extended the Wishart distribution to a real normed division algebra. The number of works around the Wishart distribution is constantly increasing with the need for more complex and flexible statistical models to solve complicated data analysis problems. In a completely different approach, the Riesz probability distribution on symmetric matrices, or more generally on a symmetric cone, has been introduced in [41] as one of the most important generalizations of the Wishart distribution. Its definition relies on results from the theory of simple Euclidean Jordan algebras and their symmetric cones. It involves the notions of the generalized power and Riesz integrals leading to Riesz measures. The present book is a research monograph on the Riesz
probability distribution in a symmetric cone and on its derivatives. It is worth mentioning here that the use in multivariate statistical analysis of particular Jordan algebras such as the algebra of real symmetric matrices, the algebra of Hermitian matrices, and the Jordan algebra of Minkowski space is not new; it has arisen in many situations in a natural way. It happened in particular with the Wishart distribution, and it was also considered in the context of complex and quaternionic Hermitian matrices by Andersson [3] and in the context of the Jordan algebra of Minkowski space by Jensen [61]. Nowadays, the statistical applications of Jordan algebras in complete generality have become very common. As far as we are concerned, there are at least two reasons for writing the book in the general setting of a simple Euclidean Jordan algebra. The first reason is that the harmonic analysis on symmetric cones was our source of inspiration for introducing the Riesz probability distribution. In fact, the fundamental concepts such as the generalized power, spherical functions and Riesz integrals appear in the theory of Jordan algebras. The second reason is that we have in this theory many important and useful properties which are ready to be exploited for probabilistic and statistical purposes. Of course, the algebra of real symmetric matrices remains an important motivating and illustrative example. The probabilistic results in the book are accessible without recourse to the general concepts of the theory of Jordan algebras; a reader who is not familiar with this theory may read them as if they were written in the setting of the matrix algebras. To make clear the key idea on which the definition of a Riesz probability distribution is based, we just need to contemplate how to translate from the gamma distribution in the real case to the Wishart distribution on a symmetric cone.
The translation is done both in the probability density function and in the Laplace transform by changing a real power of the real variable into a real power of the determinant of an element of the cone. With this, all the properties extend with a remarkable analogy. Taking a real power of the determinant of an element of the cone means that we are taking the same real power of all its eigenvalues. Roughly speaking, in the definition of the Riesz probability distribution, instead of taking the same real power of all the eigenvalues, we can take different powers for the different eigenvalues. This has been realized using the notion of the generalized power in a symmetric cone. Accordingly, a more general version of Gindikin’s theorem, which appears in the framework of the so-called Riesz integral in a Jordan algebra, allowed us to introduce a class of probability distributions R(s, σ), where the parameter σ is in the symmetric cone associated to the algebra and s is in a given subset Ξ of ℝ^r, where r is the rank of the algebra. The probability distributions R(s, σ) are called the Riesz probability distributions. When s in Ξ is such that s_1 = ⋅⋅⋅ = s_r = λ, the Riesz distribution R(s, σ) is nothing but the classical Wishart distribution W(λ, σ). A remarkable fact is that the representation of some Wishart probability distributions by the Gaussian distribution extends to the Riesz probability distributions. In fact, under a condition on the parameter s = (s_1, . . . , s_r), the Riesz probability distribution is the distribution of the covariance matrix of a sample from a standard multivariate Gaussian distribution with missing values. Along the same line, exploiting extensively the rich geometric and
algebraic structure of a Jordan algebra, the definition of the Riesz probability distribution has led to the definition of other probability distributions generalizing those related to the Wishart distribution. We speak of beta–Riesz, beta–hypergeometric, Riesz–Dirichlet, and Riesz–Inverse Gaussian probability distributions on a symmetric cone. Also several extensions of the Riesz probability distribution to settings derived from graphical models and to the setting of homogeneous cones have been realized. In the context of graphical models, a Wishart distribution for a decomposable graph and an inverse Wishart for a nondecomposable graph have been introduced by Letac and Massam [80] and by Roverato [100]. A Riesz probability distribution associated to graphs with connection to homogeneous cones has been defined by Andersson and Wojnar [5] and by Andersson and Klein [4]. Accordingly, some characterizations of the Wishart distribution on homogeneous cones have been realized by Boutouria [14, 15] and by Boutouria et al. [16]. Also Díaz-García [27] has defined a Riesz probability distribution on a real division algebra. Of course, the extensions of the notion of a Riesz probability distribution from the setting of symmetric cones to other cones rely on a certain analogy, nevertheless, there are some conceptual differences. For instance, the algebraic structures, techniques used, and approaches are completely different. I take the point of view that the symmetric cones have special features which allow to go further in research of a new perspective in probability theory and statistical applications. However, the theory of symmetric cones and their Jordan algebras is not commonly a standard tool for researchers in probability and statistics, it is therefore necessary to begin by introducing all the material and the results related to this theory needed in the book. 
Chapter 1 introduces the basic concepts that lead to the definition of a Euclidean Jordan algebra and of a symmetric cone, and their essential properties. It also introduces the fundamental groups of automorphisms, usually related to a Jordan algebra, particularly the triangular group that plays a central role due to some analytical properties in connection with the generalized power. Chapter 2 focuses on the notions of the generalized power and spherical polynomials which play the role of the monomials in the univariate case. This leads to the concept of Mellin transform which is an efficient tool that characterizes a probability distribution on a symmetric cone under invariance with respect to the orthogonal group. Such a tool has almost never been used before in a multivariate probabilistic context. The first two chapters contain many results which are responses to some needs that have arisen when solving certain specific probabilistic questions. These results may appear technical, but, in fact, they have their geometric or analytical interpretation in the Jordan algebra and therefore they represent a significant addition in this area. The exposition of the content of these chapters is designed to be largely self-contained and to serve as an easily accessible reference where the necessary minimum is found by those wishing to use such tools. It is in Chapter 3 that we begin to approach the probabilistic aspects. We introduce the Riesz probability distributions and describe them using the boundary decomposition of a symmetric cone established in Chapter 2. The probability distributions of the Peirce components of a Riesz random variable are determined and some independence properties are established. Inspired by the probability distributions of the Peirce components, a Gaussian representation of the Riesz probability distribution is given, generalizing and including the representation of some Wishart probability distributions. In Chapter 4, we study the natural exponential family generated by a Riesz probability distribution. Two important results are established: the first is a characterization of a Riesz exponential family by its variance function, which is a rational fraction in the mean that reduces to a polynomial for the particular case of the Wishart, and the second is a characterization by invariance with respect to the triangular group. Chapter 5 studies the stable distributions on a simple Euclidean Jordan algebra within the framework of the mathematical theory of the stable distribution as a case study in exponential families. It also extends the definition of a Tweedie scale to the algebra and characterizes an important class of natural exponential families belonging to this scale involving the Riesz measure. Chapter 6 is devoted to the first moments of a Riesz probability distribution and to some properties of constancy regression on the sample mean. In Chapter 7, we introduce and study the beta probability distributions constructed from the Riesz probability distribution, called beta–Riesz probability distributions in [46]. The tool is a division algorithm relying on the parametrization of the symmetric cone by the triangular group. Besides the extension to these distributions of the classical results of the usual matrix beta distribution, we establish a characterization in the Olkin and Rubin way. This characterization is the heart of the chapter: it shows how the Riesz probability distribution arises in a very natural way.
Assuming the invariance with respect to the orthogonal group, the beta–Riesz probability distribution reduces to the beta–Wishart; it is then completely determined by its Mellin transform. This allows us to give in Chapter 8 some characterization results concerning the class of beta–Wishart distributions. We also give in this chapter a remarkable stability property which motivates the work in the next chapter. In Chapter 9, we use the hypergeometric functions to introduce a beta–hypergeometric probability distribution on a symmetric cone in its general form. Many connections between results on different subjects, particularly zonal polynomials, continued fractions, and Markov chains have led to a remarkable characterization of the beta–Gauss hypergeometric probability distribution by a stability property. A similar result concerning the beta–hypergeometric probability distribution corresponding to the hypergeometric function $_1F_0$ is given. Chapter 10 deals with the Dirichlet probability distribution constructed from the Riesz probability distribution introduced in [50]. Properties concerning the projections of this distribution on some subalgebras are established and, among other results, a Mauldon characterization of the Wishart–Dirichlet probability distribution is given. Chapter 11 is a short chapter on the Riesz–Inverse Gaussian probability distribution defined in [51]. It is proved that this distribution arises in a natural way as the distribution of one of the Peirce components of a Riesz random variable conditioned by another component.
I hope that the book will be useful as a source of statements and applications of results in multivariate probability distributions and multivariate statistical analysis, as well as a reference to some material of harmonic analysis on symmetric cones adapted to the needs of researchers in these fields.
Acknowledgment I wish to express my deepest gratitude to Sallouha Lajmi and Raoudha Zine for their generous help and advice. I also thank Afif Masmoudi, Mahdi Louati, Ons Regaieg, and Mohamed Ben Farah for their collaboration. I am also indebted to my former advisor Gérard Letac who has initiated me to research and who has never stopped encouraging me. I am grateful to an anonymous referee for giving me the time and attention to review the book and for his thoughtful and detailed comments that greatly improved the final version. My thanks are extended to Karin Sora, Daniel Tiemann, Nadja Schedensack, Achi Dosanjh, and Leonardo Milla for their patience in editing the book.
https://doi.org/10.1515/9783110713374-202
Contents
Preface
Acknowledgment
1 Jordan algebras and symmetric cones
1.1 Euclidean Jordan algebra
1.2 The Peirce decomposition
1.3 Symmetric cones
1.3.1 Symmetric cone associated with a Jordan algebra
1.3.2 Classification of the irreducible symmetric cones
1.4 Groups of linear automorphisms
1.4.1 The group G
1.4.2 The orthogonal group
1.4.3 Frobenius transformation
1.4.4 The triangular group
2 Generalized power
2.1 Definitions and derivatives
2.2 Properties
2.3 Division algorithm
2.4 Spherical polynomials
2.4.1 The gamma function and the generalized Pochhammer symbol
2.4.2 Spherical polynomials
2.5 Spherical Fourier transform and Mellin transform
2.5.1 Spherical Fourier transform
2.5.2 Mellin transform
2.5.3 Multiplicative convolution
2.6 Decomposition into G-orbits
2.6.1 Decomposition and boundary structure of a symmetric cone
2.6.2 Decomposition of the algebra into G-orbits
3 Riesz probability distributions
3.1 Gaussian probability distribution on a Euclidean space
3.2 Lévy measure
3.3 Generalized Riesz integrals
3.4 Riesz measures
3.4.1 Definition of a Riesz measure
3.4.2 Description of the Riesz measures
3.5 Riesz probability distributions
3.6 Effect of the orthogonal group on a Riesz probability distribution
3.7 Distributions of the Peirce components of a Riesz random variable
3.7.1 Distributions of the Peirce components
3.7.2 Distributions of the entries in the Cholesky decomposition of a Riesz random variable
3.8 Connection between the Riesz probability distribution and the Gaussian
4 Riesz natural exponential families
4.1 Generalities on exponential families
4.2 Natural exponential family generated by a Riesz measure
4.2.1 Condition on the parameter s
4.2.2 Variance function of a Riesz exponential family
4.3 Characterization of a Riesz exponential families by a property of invariance
4.4 Classification of the Riesz natural exponential families
4.4.1 Construction of groups of linear transformations on a Jordan algebra
4.4.2 Classification of the Riesz natural exponential families
5 Tweedie scale
5.1 Stable distributions
5.2 The Tweedie scale
5.3 The Riesz probability distribution as a Lévy measure
6 Moments and constancy of regression
6.1 Moments of a Riesz probability distribution
6.1.1 Mean and variance of a Riesz probability distribution
6.1.2 Mellin transform of a Riesz probability distribution
6.2 Constancy of regression on the mean
6.2.1 Definition and characterization
6.2.2 Constant regression property for the Riesz probability distribution
6.2.3 Constant regression of a scalar statistic
7 Beta Riesz probability distributions
7.1 Beta and hypergeometric functions
7.1.1 Beta function
7.1.2 Hypergeometric functions
7.2 Beta–Riesz probability distributions
7.3 Projection of a beta–Riesz probability distribution
7.4 The expectation of a beta–Riesz probability distribution
7.5 Characterization of the Riesz probability distribution without invariance
7.5.1 Functional equations
7.5.2 Characterization without invariance of the quotient
7.6 Beta–Riesz probability distribution of the second kind
8 Beta–Wishart distributions
8.1 A characterization of the beta–Wishart probability distribution
8.2 Product of beta–Wishart random variables
8.3 Stability property
8.4 Characterizations by constancy regression
9 Beta–hypergeometric distributions
9.1 Continued fractions in a symmetric cone
9.1.1 Definition and properties
9.1.2 Convergence
9.2 Beta–Gauss hypergeometric distributions
9.2.1 Definition
9.2.2 Characterizations of the beta–Gauss hypergeometric distributions
9.3 Generalized beta distribution
10 Riesz–Dirichlet distributions
10.1 Definition
10.2 Projections of a Riesz–Dirichlet distribution
10.3 Wishart–Dirichlet distribution
10.3.1 Some results involving determinants
10.3.2 Product and independence properties
10.3.3 Mauldon characterization of the Wishart–Dirichlet distribution
10.3.4 A distributional property
10.4 A generalization of the Dirichlet distribution
11 Riesz inverse Gaussian distribution
11.1 Modified generalized Bessel function
11.2 Connection with the Riesz probability distribution
Bibliography
Index
Index of notations
1 Jordan algebras and symmetric cones Jordan algebras were first introduced in 1934 by Jordan et al. [63] to provide an algebraic formalism for Quantum Mechanics. Since then the Jordan algebras have been intensively studied by mathematicians, and many modernizing contributions have been realized. The paper by N. Jacobson [59] is considered as the work which put the theory on the path of modernism. A description of the historical development of this theory may be found in the book by K. McCrimmon [86]. It shows that the development seems to be in part due to a substantial interaction with many other areas of mathematics such as analysis, Lie algebras, algebraic groups, and projective geometry, and to an impressive variety of applications in theoretical physics, mechanics, and more particularly in probability and statistics. The aim of this chapter is not to give a complete exposition of the theory of symmetric cones. However, to be relatively self-contained, we will bring the material on harmonic analysis associated to symmetric cones that we will use in the development of our probabilistic and statistical results. We have adopted the notations used in the monograph of Faraut and Korányi [31] which provides a complete and self-contained exposition of the geometry and analysis on symmetric cones and related domains. The chapter contains many new results on symmetric cones that are necessary for our purposes and that we have been led to establish.
1.1 Euclidean Jordan algebra
For the probabilistic purposes of the book, the field over which a Jordan algebra is defined is supposed to be $\mathbb{R}$. An algebra is a vector space $V$ with a bilinear map defining a product
\[ V \times V \to V, \qquad (x, y) \mapsto xy. \]
The algebra $V$ is said to be a Jordan algebra if for all $x, y, z$ in $V$,
(i) $xy = yx$,
(ii) $x(x^2 y) = x^2(xy)$,
where we used the abbreviation $x^2 = xx$. In this case the product is called the Jordan product. Suppose that the Jordan algebra $V$ is finite dimensional and that
(iii) there exists an identity element $e$ in $V$ such that $ex = x$, for any $x$ in $V$;
(iv) there exists a scalar product $\langle x, y\rangle$ such that $\langle x, yz\rangle = \langle xy, z\rangle$.
Then we say that $V$ is a Euclidean Jordan algebra. A Euclidean Jordan algebra is said to be simple if it does not contain a nontrivial ideal.
https://doi.org/10.1515/9783110713374-001
An element $x$ of the algebra $V$ is said to be invertible if there exists $y$ in $V$ such that $xy = e$. In this case $y$ is called the inverse of $x$ and denoted $x^{-1}$.
As a first example of a simple Euclidean Jordan algebra, we consider the space $\mathrm{Sym}(r, \mathbb{R})$ of symmetric real $(r, r)$-matrices equipped with the Jordan product
\[ xy = \tfrac{1}{2}(x.y + y.x), \]
where $x.y$ is the ordinary product of matrices, and with the inner product $\langle x, y\rangle = \operatorname{tr}(xy)$. Other examples will be given when we classify all kinds of symmetric cones.
Now, for $x$ in $V$, we define the endomorphisms of $V$
\[ L(x) : y \mapsto xy \quad\text{and}\quad P(x) = 2L(x)^2 - L(x^2). \]
The endomorphisms $L(x)$ and $P(x)$ are symmetric for the Euclidean structure of $V$; the map $x \mapsto P(x)$ is called the quadratic representation of $V$. In the case where $V$ is the algebra $\mathrm{Sym}(r, \mathbb{R})$ of real symmetric matrices of rank $r$, we have that $P(x)(y) = x.y.x$.
Given $x$, $y$ and $z$ in $V$, we denote
\[ P(x, y) = L(x)L(y) + L(y)L(x) - L(xy) \tag{1.1} \]
and
\[ \{xyz\} = 2\bigl(x(yz) + (xy)z - (xz)y\bigr), \tag{1.2} \]
which is equivalent to
\[ \{xyz\} = P(x + z)y - P(x)y - P(z)y. \tag{1.3} \]
Also for two endomorphisms $g$ and $h$ of $V$, we use the notation
\[ [g, h] = gh - hg. \tag{1.4} \]
Throughout what follows, the algebra V is supposed to be a simple Euclidean Jordan algebra.
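The definitions of this section can be checked numerically on the example Sym(r, ℝ). The following is only a minimal sketch, assuming NumPy is available; the helper names `jordan`, `quad_rep`, `triple` and `random_sym` are illustrative and not taken from the text. It verifies, on random symmetric matrices, the axioms (i), (ii), (iv), the formula P(x)(y) = x.y.x, and the agreement of (1.2) with (1.3).

```python
import numpy as np

def jordan(x, y):
    """Jordan product on Sym(r, R): (x.y + y.x) / 2."""
    return (x @ y + y @ x) / 2

def quad_rep(x, y):
    """P(x)y = 2 L(x)^2 y - L(x^2) y, using only the Jordan product."""
    return 2 * jordan(x, jordan(x, y)) - jordan(jordan(x, x), y)

def triple(x, y, z):
    """Triple product {xyz} as defined in (1.2)."""
    return 2 * (jordan(x, jordan(y, z)) + jordan(jordan(x, y), z)
                - jordan(jordan(x, z), y))

def random_sym(r, rng):
    """Random real symmetric (r, r)-matrix (illustrative test data)."""
    a = rng.standard_normal((r, r))
    return (a + a.T) / 2

rng = np.random.default_rng(0)
x, y, z = (random_sym(4, rng) for _ in range(3))
x2 = jordan(x, x)

# Axiom (i): commutativity.
commutative = np.allclose(jordan(x, y), jordan(y, x))
# Axiom (ii): the Jordan identity x(x^2 y) = x^2 (x y).
jordan_identity = np.allclose(jordan(x, jordan(x2, y)), jordan(x2, jordan(x, y)))
# Axiom (iv): <x, yz> = <xy, z> with <x, y> = tr(x.y).
associative_form = np.isclose(np.trace(x @ jordan(y, z)),
                              np.trace(jordan(x, y) @ z))
# Quadratic representation on Sym(r, R): P(x)y = x.y.x.
quadratic_is_xyx = np.allclose(quad_rep(x, y), x @ y @ x)
# (1.2) agrees with (1.3).
triple_matches = np.allclose(
    triple(x, y, z),
    quad_rep(x + z, y) - quad_rep(x, y) - quad_rep(z, y),
)
```

All five flags come out true for symmetric matrices; the check is an illustration on one sample, not a proof.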
1.2 The Peirce decomposition
An element $c$ of $V$ is said to be idempotent if $c^2 = c$; it is a primitive idempotent if, furthermore, $c \neq 0$ and $c$ is not the sum $t + u$ of two nonnull idempotents $t$ and $u$. A Jordan frame is a set $\{c_1, c_2, \ldots, c_r\}$ of primitive idempotents such that
\[ \sum_{i=1}^{r} c_i = e \quad\text{and}\quad c_i c_j = \delta_{i,j}\, c_i, \quad\text{for } 1 \le i, j \le r. \]
It is an important result that the size $r$ of such a set is a constant called the rank of $V$. For any element $x$ of a Euclidean simple Jordan algebra, there exist a Jordan frame $(c_i)_{1\le i\le r}$ and $(\lambda_1, \ldots, \lambda_r) \in \mathbb{R}^r$ such that $x = \sum_{i=1}^{r} \lambda_i c_i$. The real numbers $\lambda_1, \lambda_2, \ldots, \lambda_r$ depend only on $x$; they are called the eigenvalues of $x$, and this decomposition is called its spectral decomposition. The trace and the determinant of $x$ are then respectively defined by
\[ \operatorname{tr}(x) = \sum_{i=1}^{r} \lambda_i \quad\text{and}\quad \Delta(x) = \prod_{i=1}^{r} \lambda_i. \]
It is shown in [31, p. 200] that, if $x$ and $y$ are invertible, then
\[ \Delta(x^{-1} + y^{-1}) = \Delta(x^{-1})\,\Delta(y^{-1})\,\Delta(x + y). \tag{1.5} \]
The rank of an element $x$ of $V$ is the number of nonnull eigenvalues of $x$; it is denoted $\operatorname{rank}(x)$.
Definition 1.1. An element $x$ of $V$ is said to have a signature equal to $(p, q)$, and we write $\operatorname{sg} x = (p, q)$, if it has $p$ strictly positive eigenvalues and $q$ strictly negative eigenvalues.
If $c$ is a primitive idempotent of $V$, the only possible eigenvalues of $L(c)$ are $0$, $\tfrac12$, and $1$. The corresponding eigenspaces are respectively denoted by $V(c, 0)$, $V(c, \tfrac12)$, and $V(c, 1)$, and the decomposition
\[ V = V(c, 1) \oplus V(c, \tfrac12) \oplus V(c, 0) \]
is called the Peirce decomposition of $V$ with respect to $c$.
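In Sym(r, ℝ) the spectral decomposition is the usual eigendecomposition, with Jordan frame c_i = v_i v_iᵀ built from an orthonormal basis of eigenvectors, and tr and Δ are the usual trace and determinant. The sketch below, which assumes NumPy and uses illustrative variable names, checks these statements and the identity (1.5) on random data.

```python
import numpy as np

def jordan(u, v):
    """Jordan product on Sym(r, R)."""
    return (u @ v + v @ u) / 2

rng = np.random.default_rng(1)
r = 4
a = rng.standard_normal((r, r))
x = (a + a.T) / 2

# Eigenvalues and orthonormal eigenvectors of x.
lam, vecs = np.linalg.eigh(x)
# Jordan frame attached to x: c_i = v_i v_i^T (rank-one projections).
frame = [np.outer(vecs[:, i], vecs[:, i]) for i in range(r)]

frame_sums_to_e = np.allclose(sum(frame), np.eye(r))
frame_is_orthogonal = all(
    np.allclose(jordan(frame[i], frame[j]), frame[i] if i == j else 0.0)
    for i in range(r) for j in range(r)
)
# Spectral decomposition x = sum_i lambda_i c_i.
spectral_ok = np.allclose(sum(l * c for l, c in zip(lam, frame)), x)
# Trace and determinant as sum and product of the eigenvalues.
trace_ok = np.isclose(lam.sum(), np.trace(x))
det_ok = np.isclose(lam.prod(), np.linalg.det(x))

# Identity (1.5), tested on two positive definite (hence invertible) elements.
b = rng.standard_normal((r, r))
p = a @ a.T + np.eye(r)
q = b @ b.T + np.eye(r)
det, inv = np.linalg.det, np.linalg.inv
identity_1_5_ok = np.isclose(det(inv(p) + inv(q)),
                             det(inv(p)) * det(inv(q)) * det(p + q))
```

For matrices, (1.5) follows from the factorization x⁻¹ + y⁻¹ = x⁻¹(x + y)y⁻¹, which is why the last flag holds up to rounding.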
We have the following properties concerning the Jordan product:
\[
\begin{aligned}
& V(c, i)\,V(c, i) \subset V(c, i), \quad i \in \{0, 1\}, \\
& V(c, 1)\,V(c, 0) = 0, \\
& V(c, 1)\,V(c, \tfrac12) \subset V(c, \tfrac12), \\
& V(c, 0)\,V(c, \tfrac12) \subset V(c, \tfrac12), \\
& V(c, \tfrac12)\,V(c, \tfrac12) \subset V(c, 1) + V(c, 0).
\end{aligned}
\tag{1.6}
\]
From this, we deduce that for $i$ and $j$ in $\{0, \tfrac12, 1\}$,
\[
P(V(c, i))(V(c, j)) \subset
\begin{cases}
V(c, 2i - j) & \text{if } 2i - j \text{ is in } \{0, \tfrac12, 1\}, \\
0 & \text{if } 2i - j \text{ is not in } \{0, \tfrac12, 1\}.
\end{cases}
\tag{1.7}
\]
For an idempotent $c$ and $x$ in $V(c, 1)$, we denote by $\Delta^{(c)}(x)$ the determinant of $x$ in the subalgebra $V(c, 1)$.
Remark 1.1. $P(c)$ is the orthogonal projection on the space $V(c, 1)$; its kernel is $V(c, \tfrac12) \oplus V(c, 0)$.
An element $x$ of $V$ can then be written in a unique way as $x = x_1 + x_{12} + x_0$, with $x_1$ in $V(c, 1)$, $x_{12}$ in $V(c, \tfrac12)$, and $x_0$ in $V(c, 0)$, which is also called the Peirce decomposition of $x$ with respect to the idempotent $c$. For an example of such a decomposition, we take $V = \mathrm{Sym}(r, \mathbb{R})$ equipped with the usual basis. In this case, $c_i$ is the element of $V$ such that the diagonal entry of order $i$ is equal to $1$ and all the other entries are equal to $0$. If, for $1 \le k \le r - 1$, we consider the idempotent $c = c_1 + \cdots + c_k$, then the Peirce decomposition with respect to $c$ of an element $x = (x_{ij})_{1\le i,j\le r}$ in $V$ consists in decomposing $x$ into blocks in the form
\[
x = \begin{pmatrix} x_1 & 0 \\ 0 & 0 \end{pmatrix}
  + \begin{pmatrix} 0 & x_{12} \\ x_{21} & 0 \end{pmatrix}
  + \begin{pmatrix} 0 & 0 \\ 0 & x_0 \end{pmatrix},
\]
where $x_1$ is a $(k, k)$-matrix and $x_{21} = {}^t x_{12}$.
Throughout, we suppose that the Jordan frame $(c_i)_{1\le i\le r}$ is fixed in $V$ and, for $1 \le i, j \le r$, we set
\[
V_{ij} =
\begin{cases}
V(c_i, 1) = \mathbb{R} c_i & \text{if } i = j, \\
V(c_i, \tfrac12) \cap V(c_j, \tfrac12) & \text{if } i \neq j.
\end{cases}
\]
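It is shown (see [31, Theorem IV.2.1, p. 68]) that
\[ V = \bigoplus_{i \le j} V_{ij}. \]
It is the Peirce decomposition of $V$ with respect to the Jordan frame $(c_i)_{1\le i\le r}$. The dimension of $V_{ij}$ is, for $i \neq j$, a constant $d$ called the Jordan constant, which is related to the dimension $n$ and the rank $r$ of $V$ by
\[ n = r + \frac{d}{2}\, r(r - 1). \]
For $S \subset \{1, \ldots, r\}$, we set $c_S = \sum_{i \in S} c_i$; it is an idempotent of rank equal to the cardinality of $S$. Then the Peirce spaces associated to $c_S$ are expressed in terms of the $V_{ij}$ as follows:
\[
\begin{aligned}
V(c_S, 1) &= \bigoplus_{i, j \in S} V_{ij}, \\
V(c_S, \tfrac12) &= \bigoplus_{i \in S,\, j \notin S} V_{ij}, \\
V(c_S, 0) &= \bigoplus_{i \notin S,\, j \notin S} V_{ij}.
\end{aligned}
\tag{1.8}
\]
We also have the rules of multiplication
\[
\begin{aligned}
& V_{ij} V_{ij} \subset V_{ii} + V_{jj}, \\
& V_{ij} V_{jk} \subset V_{ik} \quad \text{if } i \neq k, \\
& V_{ij} V_{kl} = \{0\} \quad \text{if } \{i, j\} \cap \{k, l\} = \emptyset.
\end{aligned}
\tag{1.9}
\]
The first inclusion in (1.9) means that
\[ xy = \tfrac{1}{2}\langle x, y\rangle (c_i + c_j) \quad \text{when } x \text{ and } y \text{ are in } V_{ij}. \tag{1.10} \]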
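For the standard Jordan frame of Sym(r, ℝ), each space V_ij with i ≠ j is spanned by the single matrix E_ij + E_ji, so d = 1. The sketch below assumes NumPy; the helper `E`, the dictionary `basis` and the function `peirce_eigenvalue` are illustrative names. It checks the dimension formula, the Peirce eigenvalues of L(c_1), two of the rules (1.9), and the formula (1.10) for r = 4.

```python
import numpy as np

r = 4
d = 1  # Jordan constant of Sym(r, R)

def E(i, j):
    """Symmetric matrix with ones at (i, j) and (j, i); E(i, i) is c_i."""
    m = np.zeros((r, r))
    m[i, j] = m[j, i] = 1.0
    return m

def jordan(u, v):
    return (u @ v + v @ u) / 2

def inner(u, v):
    return np.trace(u @ v)

c = [E(i, i) for i in range(r)]  # standard Jordan frame

# One spanning vector per Peirce space V_ij, i <= j.
basis = {(i, j): E(i, j) for i in range(r) for j in range(i, r)}
n = len(basis)
dimension_ok = n == r + d * r * (r - 1) // 2

def peirce_eigenvalue(i, j, k):
    """Eigenvalue of L(c_k) on V_ij, read off from c_k b = t b."""
    b = basis[(i, j)]
    image = jordan(c[k], b)
    t = inner(image, b) / inner(b, b)
    assert np.allclose(image, t * b)  # b really is an eigenvector
    return t

# Distinct eigenvalues of L(c_1) (index 0 in the code).
eigs = sorted({peirce_eigenvalue(i, j, 0) for (i, j) in basis})

# Rules (1.9): V_12 V_23 lies in V_13, and V_12 V_34 = {0}.
rule_chain_ok = np.allclose(jordan(E(0, 1), E(1, 2)), 0.5 * E(0, 2))
rule_disjoint_ok = np.allclose(jordan(E(0, 1), E(2, 3)), 0.0)

# Formula (1.10) on two elements of V_12.
x, y = 2.0 * E(0, 1), -3.0 * E(0, 1)
rule_1_10_ok = np.allclose(jordan(x, y), 0.5 * inner(x, y) * (c[0] + c[1]))
```

The list `eigs` contains exactly the three values 0, 1/2 and 1, in agreement with the Peirce decomposition with respect to a primitive idempotent.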
Proposition 1.1. Let $c$ be an idempotent and let $x$ be in $V(c, 1)$. Then we have:
(i) $2L(x)|_{V(c, \frac12)}$ is an endomorphism of $V(c, \tfrac12)$ with determinant equal to $(\Delta^{(c)}(x))^{d(r-k)}$, where $k$ is the rank of $c$;
(ii) If $x$ is invertible in $V(c, 1)$, then $2L(x)|_{V(c, \frac12)}$ is an automorphism of $V(c, \tfrac12)$ with inverse equal to $2L(x^{-1})|_{V(c, \frac12)}$;
(iii) $L(x)^2|_{V(c, \frac12)} = \tfrac12 L(x^2)|_{V(c, \frac12)}$.
Proof. (i) Let $x \in V(c, 1)$. We know from (1.6) that $V(c, 1)V(c, \tfrac12) \subseteq V(c, \tfrac12)$, hence $2L(x)|_{V(c, \frac12)}$ is an endomorphism of $V(c, \tfrac12)$.
As $c$ is an idempotent with rank $k$, there exist orthogonal idempotents $c_1, c_2, \ldots, c_k$ and $(\lambda_1, \ldots, \lambda_k) \in \mathbb{R}^k$ such that $c = \sum_{i=1}^{k} c_i$ and $x = \sum_{i=1}^{k} \lambda_i c_i$. We also have $\Delta^{(c)}(x) = \prod_{i=1}^{k} \lambda_i$.
Similarly, since $e - c$ is an idempotent with rank $r - k$, there exist orthogonal idempotents $c_{k+1}, c_{k+2}, \ldots, c_r$ such that $e - c = \sum_{i=1}^{r-k} c_{k+i}$. The system $(c_i)_{1\le i\le r}$ is a Jordan frame of $V$.
If, for $1 \le i \le k$, we set $\tilde V_{i,k+1} = \bigoplus_{j=k+1}^{r} V_{ij}$, then
\[ V(c, \tfrac12) = \bigoplus_{i=1}^{k} \tilde V_{i,k+1}. \tag{1.11} \]
We can easily show that $2L(x)|_{\tilde V_{i,k+1}} = \lambda_i\, \mathrm{Id}_{i,k+1}$, for $1 \le i \le k$, where $\mathrm{Id}_{i,k+1}$ is the identity in the space $\tilde V_{i,k+1}$. As the dimension of $\tilde V_{i,k+1}$ is equal to $(r - k)d$, we have that the determinant of $2L(x)|_{V(c, \frac12)}$ is equal to
\[ \prod_{i=1}^{k} \lambda_i^{(r-k)d} = (\Delta^{(c)} x)^{(r-k)d}. \]
(ii) If $x$ is invertible in $V(c, 1)$, then $\lambda_1, \ldots, \lambda_k$ are different from zero and $x^{-1} = \sum_{p=1}^{k} \lambda_p^{-1} c_p$. Therefore, $2L(x)|_{\tilde V_{i,k+1}}$ is an automorphism of $\tilde V_{i,k+1}$ with inverse $\lambda_i^{-1}\, \mathrm{Id}_{i,k+1}$. It follows from (1.11) that $2L(x)|_{V(c, \frac12)}$ is an automorphism of $V(c, \tfrac12)$ with inverse $2L(x^{-1})|_{V(c, \frac12)}$.
(iii) We have that $x^2 = \sum_{p=1}^{k} \lambda_p^2 c_p$ and, for all $1 \le i \le k$,
\[ 2L(x)|_{\tilde V_{i,k+1}} = \lambda_i\, \mathrm{Id}_{i,k+1}. \]
Then
\[ L(x)^2|_{\tilde V_{i,k+1}} = \frac14 \lambda_i^2\, \mathrm{Id}_{i,k+1} = \frac12 \cdot \frac{\lambda_i^2}{2}\, \mathrm{Id}_{i,k+1} = \frac12 L(x^2)|_{\tilde V_{i,k+1}}. \]
Thus, we conclude that $L(x)^2|_{V(c, \frac12)} = \tfrac12 L(x^2)|_{V(c, \frac12)}$.
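Proposition 1.1 can be illustrated in Sym(r, ℝ), where d = 1, c = c_1 + ⋯ + c_k is the diagonal idempotent, V(c, 1) is the upper-left (k, k) block and V(c, 1/2) is parametrized by the off-diagonal (k, r − k) block. The following is a rough numerical sketch assuming NumPy; the helpers `embed_corner`, `embed_half` and `restricted_matrix` are hypothetical names introduced only for this check.

```python
import numpy as np

r, k = 5, 2
rng = np.random.default_rng(2)
a = rng.standard_normal((k, k))
x1 = a @ a.T + np.eye(k)  # invertible element of V(c, 1), viewed in Sym(k)

def jordan(u, v):
    return (u @ v + v @ u) / 2

def embed_corner(block):
    """Place a (k, k) block in V(c, 1), the upper-left corner."""
    full = np.zeros((r, r))
    full[:k, :k] = block
    return full

def embed_half(m):
    """Place a (k, r-k) block m in V(c, 1/2), the off-diagonal part."""
    full = np.zeros((r, r))
    full[:k, k:] = m
    full[k:, :k] = m.T
    return full

def restricted_matrix(elem):
    """Matrix of 2 L(elem) restricted to V(c, 1/2), in the coordinates of m."""
    dim = k * (r - k)
    cols = []
    for idx in range(dim):
        m = np.zeros(dim)
        m[idx] = 1.0
        image = 2 * jordan(elem, embed_half(m.reshape(k, r - k)))
        cols.append(image[:k, k:].reshape(-1))
    return np.column_stack(cols)

A = restricted_matrix(embed_corner(x1))
# (i): det of 2L(x) on V(c, 1/2) equals Delta^(c)(x)^(d (r - k)) with d = 1.
det_ok = np.isclose(np.linalg.det(A), np.linalg.det(x1) ** (r - k))
# (ii): the inverse is 2L(x^{-1}) restricted to V(c, 1/2).
B = restricted_matrix(embed_corner(np.linalg.inv(x1)))
inverse_ok = np.allclose(A @ B, np.eye(k * (r - k)))
# (iii): L(x)^2 = (1/2) L(x^2), i.e. (A/2)(A/2) = (1/2)(C/2) with C for x^2.
C = restricted_matrix(embed_corner(x1 @ x1))
square_ok = np.allclose((A / 2) @ (A / 2), 0.5 * (C / 2))
```

In these coordinates 2L(x) acts on the off-diagonal block by m ↦ x₁m, which explains the exponent r − k in the determinant.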
The following result is due to Massam and Neher [88].
Proposition 1.2. Let $c$ be an idempotent of $V$, $u_1$ in $V(c, 1)$, $v_{12}$ in $V(c, \tfrac12)$, and $z_0$ in $V(c, 0)$. Then we have:
(i) $\langle u_1, P(v_{12}) z_0\rangle = 2\langle v_{12}, L(z_0)L(u_1)v_{12}\rangle$;
(ii) $L(z_0)L(u_1) = L(u_1)L(z_0)$;
(iii) For $x$ in $V(c, 1)$, we have $P(x)P(c) = P(x)$.
Proof. (i) For all $x$, $y$, and $z$ in $V$, we have that
\[
\begin{aligned}
\langle x, P(y)(z)\rangle &= 2\langle x, y(yz)\rangle - \langle x, y^2 z\rangle \\
&= 2\langle xy, yz\rangle - \langle xz, y^2\rangle \\
&= \langle y, (2L(x)L(z) - L(xz))y\rangle.
\end{aligned}
\]
Applying this to $x = u_1$, $y = v_{12}$, $z = z_0$, and taking into account the fact that $u_1 z_0 = 0$, we get
\[
\langle u_1, P(v_{12})(z_0)\rangle = 2\langle v_{12}, L(u_1)L(z_0)v_{12}\rangle = 2\langle v_{12}, L(z_0)L(u_1)v_{12}\rangle.
\]
(ii) We have that (see [31, Proposition II.1.1])
\[ [L(x), L(yz)] + [L(y), L(xz)] + [L(z), L(xy)] = 0, \quad \forall x, y, z \in V. \]
Taking $x = u_1$, $y = c$, and $z = z_0$, we get
\[ [L(u_1), L(z_0)] = 0. \tag{1.12} \]
(iii) Let h = h1 +h12 +h0 be the Peirce decomposition with respect to c of an element h of V. Then, according to (1.7), we have that P(x)(h12 ) = P(x)(h0 ) = 0. Thus P(x)(h) = P(x)(h1 ) = P(x)P(c)(h). We deduce that P(x) = P(x)P(c). Proposition 1.3. Let x = ∑ri=1 xi ci + ∑ie
if and only if λi (x) > 1,
for i = 1, . . . , r.
If λi (x) > 1, for i = 1, . . . , r, then λi (x − e) ≥ λi (x) − λr (e) = λi (x) − 1 > 0,
for i = 1, . . . , r.
This implies that $x-e$ is in Ω, therefore $x>e$.
Now define $J=\{c : c \text{ is a primitive idempotent in } V\}$. Then J is a compact subset of V; this fact appears in [31, p. 78]. For c in J, we have $\langle c,e\rangle=\langle c^2,e\rangle=\langle c,c\rangle=\|c\|^2$. With the notations above, we have the following result, due to Hirzebruch [55] and known as the min–max theorem.
Theorem 1.5. For any x in V, we have
\[ \lambda_1(x)=\max_{c\in J}\frac{\langle x,c\rangle}{\langle e,c\rangle},\qquad \lambda_r(x)=\min_{c\in J}\frac{\langle x,c\rangle}{\langle e,c\rangle}. \]
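For V = Sym(r, ℝ), the min–max theorem above reduces to the classical Rayleigh quotient bounds: the primitive idempotents are the rank-one projections $vv^{T}$ with $\|v\|=1$, and $\langle e,c\rangle=\operatorname{tr}(c)=1$. The following is a minimal numerical sketch of that special case only; the helper name `ratio`, the matrix size, and the sample count are illustrative choices, not notation from the text.

```python
import numpy as np

def ratio(x, c):
    """<x, c> / <e, c> for <a, b> = tr(ab), with e the identity matrix."""
    return np.trace(x @ c) / np.trace(c)

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 4))
x = (a + a.T) / 2                          # an element of Sym(4, R)

eigenvalues, vectors = np.linalg.eigh(x)   # eigenvalues in ascending order
lambda_1, lambda_r = eigenvalues[-1], eigenvalues[0]

# Primitive idempotents built from the extreme eigenvectors attain the bounds.
c_max = np.outer(vectors[:, -1], vectors[:, -1])
c_min = np.outer(vectors[:, 0], vectors[:, 0])

# Random primitive idempotents c = v v^T; tr(x v v^T) = v^T x v.
v = rng.normal(size=(5000, 4))
v /= np.linalg.norm(v, axis=1, keepdims=True)
sampled = np.einsum("ni,ij,nj->n", v, x, v)
```

Every sampled ratio should fall between `lambda_r` and `lambda_1`, while `c_max` and `c_min` attain the two extremes.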
Now, given an idempotent c, we denote by $\Omega_c$ the symmetric cone associated to the subalgebra $V(c,1)$. For a fixed Jordan frame $(c_i)_{1\le i\le r}$, and for $1\le k\le r$, we define
\[ c_k=c_1+\cdots+c_k\quad\text{and}\quad V^{(k)}=V(c_k,1). \tag{1.14} \]
The symmetric cone $\Omega_{c_k}$ associated to $V(c_k,1)$ is simply denoted by $\Omega_k$. We also define
\[ c_{k,l}=c_{k+1}+\cdots+c_{k+l}=c_{k+l}-c_k. \tag{1.15} \]
Clearly, $c_{k,l}$ is an idempotent of rank l in $V(e-c_k,1)$. We consider $V(c_{k,l},1)$, $V(c_{k,l},\frac12)$, and $V(c_{k,l},0)$, respectively, as the subspaces of $V(e-c_k,1)$ corresponding to the eigenvalues 1, 1/2, and 0 of $L(c_{k,l})$. Also $\Omega_{c_{k,l}}$ is the symmetric cone associated to $V(c_{k,l},1)$.
Proposition 1.6. Let c be an idempotent of V. Then
(i) If $u_1\in\Omega_c$ and $z_0\in\Omega_{e-c}$, then $L(u_1)L(z_0)|_{V(c,\frac12)}$ is a positive definite endomorphism.
(ii) $\Omega_c=P(c)(\Omega)$.
Proof. (i) Let $x_{12}$ be in $V(c,\frac12)$ with $x_{12}\neq0$. Then, according to (1.3), we have that
\[ \{u_1x_{12}z_0\}=P(u_1+z_0)x_{12}-P(u_1)x_{12}-P(z_0)x_{12}=P(u_1+z_0)x_{12}, \]
because $P(u_1)x_{12}=P(z_0)x_{12}=0$. On the other hand, from (1.2), we have that $\{x_{12}z_0x_{12}\}=2P(x_{12})z_0$. It follows that
\begin{align*}
4\langle x_{12},L(u_1)L(z_0)x_{12}\rangle&=4\langle u_1(z_0x_{12}),x_{12}\rangle\\
&=2\langle u_1,P(x_{12})z_0+x_{12}^2z_0\rangle\\
&=2\langle u_1,P(x_{12})z_0\rangle+2\langle u_1,x_{12}^2z_0\rangle\\
&=2\langle u_1,P(x_{12})z_0\rangle,
\end{align*}
because $u_1\in V(c,1)$ and $x_{12}^2z_0\in V(c,0)$. Consequently,
\begin{align*}
4\langle x_{12},L(u_1)L(z_0)x_{12}\rangle&=\langle u_1,\{x_{12}z_0x_{12}\}\rangle=\langle\{u_1x_{12}z_0\},x_{12}\rangle\\
&=\langle P(u_1+z_0)x_{12},x_{12}\rangle.
\end{align*}
As $u_1+z_0\in\Omega_c+\Omega_{e-c}\subset\Omega$ and $P(u_1+z_0)$ is positive definite, we have that $\langle P(u_1+z_0)x_{12},x_{12}\rangle>0$. From this we deduce that $L(u_1)L(z_0)|_{V(c,\frac12)}$ is positive definite.
(ii) Recall that from Theorem III.2.1 in [31], we have that the symmetric cone of a Jordan algebra is the set of elements x in V such that L(x) is positive definite. Let x be in Ω. For $y\in V(c,1)$, $y\neq0$, we have
\begin{align*}
\langle L(P(c)x)(y),y\rangle&=\langle P(c)(x)y,y\rangle=\langle P(c)(x),y^2\rangle=\langle x,P(c)y^2\rangle\\
&=\langle y,xy\rangle>0.
\end{align*}
Thus $P(c)\Omega\subset\Omega_c$. Now, let $w\in\Omega_c$; then $w+(e-c)$ is an element of Ω. Since $P(c)(w+(e-c))=P(c)(w)=w$, we obtain that $\Omega_c\subset P(c)\Omega$.

1.3.2 Classification of the irreducible symmetric cones

The irreducible symmetric cones, or equivalently the simple Euclidean Jordan algebras, have been classified by Jordan, von Neumann, and Wigner [63] into four families of algebras together with a single exceptional algebra. It is shown (see [31]) that any simple Euclidean Jordan algebra is isomorphic to one of the following algebras:

1. The algebra of real symmetric matrices
We have already mentioned that, equipped with the product
\[ xy=\frac12(x.y+y.x), \]
where x.y is the ordinary product of matrices, and with the inner product $\langle x,y\rangle=\operatorname{tr}(xy)$, the space Sym(r, ℝ) of symmetric real (r, r)-matrices is a simple Euclidean Jordan algebra. The cone associated to this algebra is the cone of positive definite symmetric matrices. The rank of this algebra is r, the Jordan constant is d = 1, and the dimension is $n=\frac{r(r+1)}{2}$.

2. The algebra of complex Hermitian matrices
We define on the space Herm(r, ℂ) of (r, r)-Hermitian matrices the product
\[ xy=\frac12(x.y+y.x), \]
where x.y is the ordinary product of matrices, and the inner product by $\langle x,y\rangle=\operatorname{tr}(xy)$. Then Herm(r, ℂ) is a simple Euclidean Jordan algebra. The associated symmetric cone is the cone of positive definite matrices, the rank is r, the Jordan constant is d = 2, and the dimension is $r^2$.

3. The algebra of quaternionic Hermitian matrices
The linear space of quaternions is a four-dimensional space denoted by Q with basis (1, i, j, k). We define on Q a multiplication by
      |  1    i    j    k
  ----+--------------------
   1  |  1    i    j    k
   i  |  i   −1   −k    j
   j  |  j    k   −1   −i
   k  |  k   −j    i   −1
The conjugate of an element $x=x_11+x_2i+x_3j+x_4k$ of the space Q is $\bar x=x_11-x_2i-x_3j-x_4k$. As for the matrices with complex entries, a matrix A with entries in Q is said to be Hermitian if it is equal to the transpose of its conjugate. On the set of (p, p)-Hermitian matrices with entries in Q, we define a product by
\[ xy=\frac12(x.y+y.x), \]
and an inner product by $\langle x,y\rangle=\mathcal R(\operatorname{tr}(xy))$, where $\mathcal R(z)$ is the real part of z. We get an algebra called the algebra of quaternionic Hermitian matrices. The rank of this algebra is r = p, the Jordan constant is d = 4, and the dimension is p(2p − 1).

4. The Jordan algebra of Minkowski space
We consider the Minkowski linear space $\mathbb R\times\mathbb R^k$. With the product
\[ (x_0,x)(y_0,y)=(x_0y_0+\langle x,y\rangle,\ x_0y+y_0x) \]
and the usual inner product of $\mathbb R\times\mathbb R^k$, it is a simple Euclidean Jordan algebra. The associated symmetric cone is the Lorentz cone
\[ \Omega=\{(x_0,x) : x_0>0 \text{ and } x_0^2-\|x\|^2>0\}. \]
The rank is r = 2, the Jordan constant is d = k − 1, and the dimension is n = k + 1.

5. The algebra of 3 × 3 octonion Hermitian matrices
This is an exceptional algebra related to the eight-dimensional real linear space of octonions O with basis (1, ε1, . . . , ε7). A multiplication on O is defined by
       |  1    ε1   ε2   ε3   ε4   ε5   ε6   ε7
  -----+----------------------------------------
   1   |  1    ε1   ε2   ε3   ε4   ε5   ε6   ε7
   ε1  |  ε1  −1   −ε4  −ε7   ε2  −ε6   ε5   ε3
   ε2  |  ε2   ε4  −1   −ε5  −ε1   ε3  −ε7   ε6
   ε3  |  ε3   ε7   ε5  −1   −ε6  −ε2   ε4  −ε1
   ε4  |  ε4  −ε2   ε1   ε6  −1   −ε7  −ε3   ε5
   ε5  |  ε5   ε6  −ε3   ε2   ε7  −1   −ε1  −ε4
   ε6  |  ε6  −ε5   ε7  −ε4   ε3   ε1  −1   −ε2
   ε7  |  ε7  −ε3  −ε6   ε1  −ε5   ε4   ε2  −1
The conjugate of an element $x=x_01+x_1\varepsilon_1+x_2\varepsilon_2+x_3\varepsilon_3+x_4\varepsilon_4+x_5\varepsilon_5+x_6\varepsilon_6+x_7\varepsilon_7$ is $\bar x=x_01-x_1\varepsilon_1-x_2\varepsilon_2-x_3\varepsilon_3-x_4\varepsilon_4-x_5\varepsilon_5-x_6\varepsilon_6-x_7\varepsilon_7$. As for the matrices with complex entries, a square matrix A with entries in O is said to be Hermitian if it is equal to the transpose of its conjugate. Now on the space of 3 × 3 Hermitian matrices with entries in O, we define a product by
\[ xy=\frac12(x.y+y.x), \]
and an inner product by $\langle x,y\rangle=\mathcal R(\operatorname{tr}(xy))$. Then the space becomes a simple Euclidean Jordan algebra. The rank of this algebra is r = 3, the Jordan constant is d = 8, and the dimension is n = 27.
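As a quick sanity check on the classification above, the sketch below tests the Jordan identity $x(x^2y)=x^2(xy)$ numerically for two of the listed products (real symmetric matrices and the Minkowski space algebra) and evaluates the dimension formula $n=r+\frac d2r(r-1)$ against the dimensions quoted in the five items. The function names are hypothetical helpers introduced here for illustration, and only these two products are exercised.

```python
import numpy as np

def jordan_sym(x, y):
    """Jordan product (x.y + y.x)/2 on symmetric matrices."""
    return (x @ y + y @ x) / 2

def jordan_lorentz(x, y):
    """Product (x0*y0 + <x,y>, x0*y + y0*x) on R x R^k, vectors stored as (x0, x)."""
    x0, xv = x[0], x[1:]
    y0, yv = y[0], y[1:]
    return np.concatenate(([x0 * y0 + xv @ yv], x0 * yv + y0 * xv))

def jordan_identity_defect(prod, x, y):
    """Largest entry of x(x^2 y) - x^2(x y); zero for a Jordan algebra."""
    x2 = prod(x, x)
    return np.max(np.abs(prod(x, prod(x2, y)) - prod(x2, prod(x, y))))

def dimension(rank, d):
    """n = r + d r (r - 1) / 2 for rank r and Jordan constant d."""
    return rank + d * rank * (rank - 1) // 2

rng = np.random.default_rng(0)
a, b = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
x_sym, y_sym = (a + a.T) / 2, (b + b.T) / 2
x_lor, y_lor = rng.normal(size=5), rng.normal(size=5)   # k = 4

defect_sym = jordan_identity_defect(jordan_sym, x_sym, y_sym)
defect_lor = jordan_identity_defect(jordan_lorentz, x_lor, y_lor)
```

Both defects should vanish up to rounding, and `dimension(3, 8)` reproduces the value 27 quoted for the exceptional algebra.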
1.4 Groups of linear automorphisms

1.4.1 The group G

We have already introduced in the previous section the group G(Ω) of linear automorphisms of the algebra V which preserve Ω. Now, following Faraut and Korányi [31], we denote by G the connected component of the identity in the group G(Ω). When V = Sym(r, ℝ), the group G is isomorphic to the linear group $GL^+(\mathbb R^r)$ of nonsingular r × r real matrices with positive determinant; more precisely, G is the set of linear automorphisms g of V defined by $g(x)=axa^*$, where a is in $GL^+(\mathbb R^r)$ and $a^*$ is the adjoint of a.
Recall that the adjoint of an element a of $GL(\mathbb C^r)$ is the conjugate transpose of a, that is, $a^*={}^t\bar a$, and the adjoint of an element a of $GL(\mathbb R^r)$ is the transpose of a, that is, $a^*={}^ta$. The linear automorphisms of V belonging to the group G have the following useful properties proved in [31, pp. 53 and 570].
Proposition 1.7. Denoting by Det g the determinant of an element g in G, and by $g^*$ its adjoint, we have
(i) If x is invertible in V and g is in G, then $(g(x))^{-1}=(g^*)^{-1}(x^{-1})$;
(ii) For all x in V and g in G, $\Delta(g(x))=(\operatorname{Det}g)^{r/n}\Delta(x)$;
(iii) $P(g(x))=gP(x)g^*$.

1.4.2 The orthogonal group

The orthogonal group is the group K of the elements g of G such that g(e) = e. When V = Sym(r, ℝ), an element g is in the orthogonal group K if it is of the form $g(x)=k.x.k^*$, where k is an orthogonal matrix in $GL^+(\mathbb R^r)$. In the following proposition, we use the square root of an element of Ω. From the definition of the cone Ω, for x in Ω, there exists a unique element $x^{1/2}$ in Ω such that $x=(x^{1/2})^2$. In the case of real symmetric matrices, this means that any positive-definite symmetric matrix may be written in a unique manner as the square of a positive-definite symmetric matrix. We also introduce the group of automorphisms of the algebra V.
Definition 1.2. An automorphism g of the algebra V is a bijective linear transformation of V such that g(xy) = g(x)g(y).
Denoting by Aut(V) the set of automorphisms of the algebra V, we have that Aut(V) is a subgroup of the linear group GL(V). We have that K = G ∩ Aut(V). Therefore, if k is in K, then for all x and y in V, we have
\[ k(xy)=k(x)k(y). \tag{1.16} \]
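Claims (i) and (ii) of Proposition 1.7 can be checked numerically in the case V = Sym(r, ℝ), where $g(x)=axa^{T}$ and Δ is the ordinary determinant. The sketch below builds the matrix of g in an orthonormal basis of Sym(r, ℝ) for the inner product tr(xy) in order to evaluate Det g; the helper names `sym_basis` and `operator_det` are assumptions made for this illustration.

```python
import numpy as np

def sym_basis(r):
    """Orthonormal basis of Sym(r, R) for the inner product <x, y> = tr(xy)."""
    basis = []
    for i in range(r):
        for j in range(i, r):
            b = np.zeros((r, r))
            if i == j:
                b[i, i] = 1.0
            else:
                b[i, j] = b[j, i] = 1.0 / np.sqrt(2.0)
            basis.append(b)
    return basis

def operator_det(g, r):
    """Determinant of a linear map g on Sym(r, R), via its matrix in sym_basis."""
    basis = sym_basis(r)
    m = np.array([[np.trace(bi @ g(bj)) for bj in basis] for bi in basis])
    return np.linalg.det(m)

r = 3
n = r * (r + 1) // 2
rng = np.random.default_rng(0)
a = rng.normal(size=(r, r))
if np.linalg.det(a) < 0:          # force a into GL+(R^r)
    a[0] = -a[0]

g = lambda x: a @ x @ a.T
g_star_inv = lambda y: np.linalg.inv(a).T @ y @ np.linalg.inv(a)

b = rng.normal(size=(r, r))
x = b @ b.T + np.eye(r)           # an element of the cone

lhs = np.linalg.det(g(x))
rhs = operator_det(g, r) ** (r / n) * np.linalg.det(x)
```

Here `lhs` and `rhs` are the two sides of claim (ii), and `g_star_inv` realizes $(g^*)^{-1}$ for claim (i).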
Proposition 1.8.
(i) For any x in Ω and k in K, $(k(x))^{1/2}=k(x^{1/2})$.
(ii) For any x, y in Ω, there exists k in K such that $P(x^{1/2})y=kP(y^{1/2})x$.
Proof. (i) As k is an automorphism of V, we have
\[ (k(x^{1/2}))^2=k((x^{1/2})^2)=k(x). \]
(ii) We first verify that for g in G, there exist z in Ω and k in K such that g = kP(z). In fact, it suffices to take $z=(g(e))^{1/2}$ and $k=gP(z)^{-1}$. We apply this to $g=P(x^{1/2})P(y^{1/2})$. There exist z in Ω and k in K such that
\[ P(x^{1/2})P(y^{1/2})=kP(z). \tag{1.17} \]
Hence
\[ P(x^{1/2})y=kP(z)(e). \tag{1.18} \]
But (1.17) implies that $P(y^{1/2})P(x^{1/2})=P(z)k^*$, and it follows that $P(y^{1/2})x=P(z)(e)$. Replacing the latter into (1.18), we get the result.
We suppose that the Jordan frame $(c_i)_{1\le i\le r}$ is always fixed in the simple Euclidean Jordan algebra V, and define the subgroup $\tilde K$ of the orthogonal group K by
\[ \tilde K=\{k\in K : k(c_i)=c_i\ \ \forall\,1\le i\le r\}. \tag{1.19} \]
In the literature on harmonic analysis, the group $\tilde K$ is sometimes denoted by M. In some cases this group is small. For instance, when V = Sym(r, ℝ), denoting by ρ(m) the linear map
\[ \mathrm{Herm}(r,\mathbb C)\to\mathrm{Herm}(r,\mathbb C),\qquad X\mapsto mXm^*, \]
we have
\[ \tilde K=\Big\{\rho(m) : m=\operatorname{diag}(\varepsilon_1,\dots,\varepsilon_r),\ \varepsilon_k=\pm1,\ \prod_{k=1}^r\varepsilon_k=1\Big\}\simeq(\mathbb Z/2\mathbb Z)^{r-1}, \]
and, when V = Herm(r, ℂ), we obtain
\[ \tilde K=\Big\{\rho(m) : m=\operatorname{diag}(e^{i\theta_1},\dots,e^{i\theta_r}),\ \theta_k\in\mathbb R,\ \prod_{k=1}^re^{i\theta_k}=1\Big\}\simeq(S^1)^{r-1}. \]
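Proposition 1.8 (ii) has a concrete reading for V = Sym(r, ℝ): the matrices $x^{1/2}yx^{1/2}$ and $y^{1/2}xy^{1/2}$ are conjugate by an orthogonal matrix. The sketch below constructs one such matrix u as the orthogonal polar factor of $x^{1/2}y^{1/2}$, so that $k(w)=uwu^{T}$. This construction is one possible choice made here for illustration; it mirrors, but is not taken from, the decomposition g = kP(z) used in the proof.

```python
import numpy as np

def spd_sqrt(x):
    """Symmetric positive-definite square root via the spectral decomposition."""
    w, v = np.linalg.eigh(x)
    return (v * np.sqrt(w)) @ v.T

def random_spd(rng, r):
    """A random positive-definite symmetric matrix."""
    b = rng.normal(size=(r, r))
    return b @ b.T + np.eye(r)

rng = np.random.default_rng(0)
x, y = random_spd(rng, 4), random_spd(rng, 4)
rx, ry = spd_sqrt(x), spd_sqrt(y)

m = rx @ ry
u = m @ np.linalg.inv(spd_sqrt(m.T @ m))   # orthogonal polar factor of m

left = rx @ y @ rx                 # P(x^{1/2}) y
right = u @ (ry @ x @ ry) @ u.T    # k P(y^{1/2}) x with k(w) = u w u^T
```

Since `m.T @ m` equals $y^{1/2}xy^{1/2}$ and `m @ m.T` equals $x^{1/2}yx^{1/2}$, the equality of `left` and `right` follows directly from the polar decomposition.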
1.4.3 Frobenius transformation

For x and y in V, $x\,\square\,y$ is the endomorphism of V defined by
\[ x\,\square\,y=L(xy)+[L(x),L(y)]. \]
If c is an idempotent and z is an element of $V(c,\frac12)$, then
\[ \tau_c(z)=\exp(2z\,\square\,c) \]
is called the Frobenius transformation; it is an element of the group G.
Proposition 1.9. Let c be an idempotent of V, and let $x=x_1+x_{12}+x_0$ and $y=y_1+y_{12}+y_0$ be the Peirce decompositions with respect to c of two elements x and y of V. Then
(i) If $y=\tau_c(z)^*(x)$ with z in $V(c,\frac12)$, then
\[ \begin{cases} y_1=x_1+2c(zx_{12}+z(zx_0)),\\ y_{12}=x_{12}+2zx_0,\\ y_0=x_0; \end{cases} \tag{1.20} \]
(ii) If $y=\tau_c(z)(x)$, then
\[ \begin{cases} y_1=x_1,\\ y_{12}=2zx_1+x_{12},\\ y_0=2(e-c)(z(zx_1)+zx_{12})+x_0. \end{cases} \tag{1.21} \]
Proof. We show claim (i); claim (ii) is proved by the same method. As $z\,\square\,c=L(zc)+L(z)L(c)-L(c)L(z)$, we have that
\[ (z\,\square\,c)^*=c\,\square\,z\quad\text{and}\quad\tau_c(z)^*=\exp(2c\,\square\,z), \]
\[ 2(c\,\square\,z)=2(L(cz)+L(c)L(z)-L(z)L(c))=L(z)+2L(c)L(z)-2L(z)L(c). \]
Also using the multiplication rules in (1.6), we have
\begin{align*}
2(c\,\square\,z)(x_1)&=zx_1+2c(zx_1)-2zx_1=0,\\
2(c\,\square\,z)(x_{12})&=zx_{12}+2c(zx_{12})-zx_{12}=2c(zx_{12})\in V(c,1),\\
2(c\,\square\,z)(x_0)&=zx_0+2c(zx_0)-0=2zx_0\in V(c,\tfrac12).
\end{align*}
This implies that
\begin{align*}
(2c\,\square\,z)^n(x_1)&=0,\quad\forall n\ge1,\\
(2c\,\square\,z)^n(x_{12})&=0,\quad\forall n\ge2,\\
(2c\,\square\,z)^n(x_0)&=0,\quad\forall n\ge3.
\end{align*}
It follows that
\[ \exp(2c\,\square\,z)=\mathrm{Id}+2c\,\square\,z+\frac12(2c\,\square\,z)^2, \]
and that
\begin{align*}
\tau_c(z)^*(x_1)&=x_1,\\
\tau_c(z)^*(x_{12})&=x_{12}+2c(zx_{12}),\\
\tau_c(z)^*(x_0)&=x_0+2zx_0+2c(z(zx_0)).
\end{align*}
Let V be the algebra Sym(r, ℝ) of symmetric matrices equipped with the usual basis, and take $c_i$ the element of Sym(r, ℝ) such that the diagonal entry of order i is equal to 1 and all the other entries are equal to 0. If we consider, for $1\le k\le r-1$, the idempotent $c=c_1+\cdots+c_k$, then the Peirce decomposition with respect to c of an element $x=(x_{ij})_{1\le i,j\le r}$ in Sym(r, ℝ) consists in decomposing x in the form
\[ x=\begin{pmatrix}x_1&x_{12}\\ {}^tx_{12}&x_0\end{pmatrix}, \]
where $x_1$ is a (k, k)-matrix. In this case, we have $e=I_r$, where $I_r$ is the identity matrix of order r, and the idempotent c is
\[ c=\begin{pmatrix}I_k&0\\0&0\end{pmatrix}. \]
If
\[ z=\begin{pmatrix}0&z_{12}\\ {}^tz_{12}&0\end{pmatrix} \]
is in $V(c,\frac12)$, then the Frobenius transformation $\tau_c(z)$ corresponding to $c=c_1+\cdots+c_k$ is the linear transformation of symmetric matrices
\[ x=\begin{pmatrix}x_1&x_{12}\\ {}^tx_{12}&x_0\end{pmatrix}\mapsto\begin{pmatrix}I_k&0\\ {}^tz_{12}&I_{r-k}\end{pmatrix}\begin{pmatrix}x_1&x_{12}\\ {}^tx_{12}&x_0\end{pmatrix}\begin{pmatrix}I_k&z_{12}\\0&I_{r-k}\end{pmatrix}. \]
The Frobenius transformation on a Jordan algebra may be seen as a generalization of this transformation on symmetric matrices.
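The matrix description above can be compared with the abstract definition $\tau_c(z)=\exp(2z\,\square\,c)$. The sketch below evaluates the exponential as a finite power series, which is exact here because the operator $2z\,\square\,c$ is nilpotent of order three, and compares the result with the block congruence. The names `box` and `frobenius` and the sizes r = 4, k = 2 are illustrative assumptions.

```python
import numpy as np

def jordan(x, y):
    """Jordan product on symmetric matrices."""
    return (x @ y + y @ x) / 2

def box(a, b, w):
    """(a box b)(w) = L(ab)w + L(a)L(b)w - L(b)L(a)w."""
    return jordan(jordan(a, b), w) + jordan(a, jordan(b, w)) - jordan(b, jordan(a, w))

def frobenius(z, c, x, terms=4):
    """exp(2 z box c) applied to x, summed as a truncated power series."""
    out = x.copy()
    term = x.copy()
    for n in range(1, terms + 1):
        term = 2 * box(z, c, term) / n   # (2 z box c)^n x / n!
        out = out + term
    return out

r, k = 4, 2
rng = np.random.default_rng(0)
c = np.diag([1.0] * k + [0.0] * (r - k))

z12 = rng.normal(size=(k, r - k))
z = np.zeros((r, r))
z[:k, k:] = z12
z[k:, :k] = z12.T                        # z lies in V(c, 1/2)

b = rng.normal(size=(r, r))
x = (b + b.T) / 2

lower = np.eye(r)
lower[k:, :k] = z12.T                    # the block matrix (I 0; tz12 I)

via_series = frobenius(z, c, x)
via_blocks = lower @ x @ lower.T
```

The agreement of `via_series` and `via_blocks` reflects the identity $2(z\,\square\,c)(w)=(zc)w+w(cz)$ for matrices, with $(zc)^2=0$.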
Proposition 1.10. Let c be an idempotent of V, and let u and v be in $V(c,\frac12)$. Then
(i)
\[ (c\,\square\,u)(x)=\begin{cases}0&\text{if }x\in V(c,1),\\ c(ux)&\text{if }x\in V(c,\frac12),\\ ux&\text{if }x\in V(c,0);\end{cases} \]
(ii) $\forall x\in V(c,1)$, $u(xv)-v(xu)\in V(c,1)$;
(iii) $(c\,\square\,u)(c\,\square\,v)=(c\,\square\,v)(c\,\square\,u)$ and $(u\,\square\,c)(v\,\square\,c)=(v\,\square\,c)(u\,\square\,c)$.
Proof. (i) Follows from (1.6).
(ii) Let x be in $V(c,1)$. Then $u(xv)-v(xu)=\frac12\{uvx\}-x(uv)$. As $\{uvx\}$ and $x(uv)$ are in $V(c,1)$, we have that $u(xv)-v(xu)\in V(c,1)$.
(iii) With the use of (i), we show the equality on each of the spaces $V(c,1)$, $V(c,\frac12)$, and $V(c,0)$. If $x\in V(c,1)$, then $(c\,\square\,v)(c\,\square\,u)(x)=0=(c\,\square\,u)(c\,\square\,v)(x)$. If $x\in V(c,\frac12)$, then
\[ (c\,\square\,v)(c\,\square\,u)(x)=(c\,\square\,v)(c(ux))=0 \]
because $c(ux)\in V(c,1)$.
Similarly, (c◻u)(c◻v)(x) = (c◻u)(c(vx)) = 0. If x ∈ V(c, 0), then (c◻v)(c◻u)(x) = (c◻v)(ux) = c(v(ux)). Invoking (ii), we have that u(xv) − v(xu) ∈ V(c, 0), so c(u(xv)) = c(v(xu)). It follows that (c◻u)(c◻v) = (c◻v)(c◻u). The latter and the fact that (x◻y)∗ = y◻x lead to the second equality of (iii). Proposition 1.11. Let (ci )1≤i≤r be a Jordan frame. Consider a = ∑ri=1 ai ci and b = ∑ri=1 bi ci , with ai , bi in ℝ. Then (i) L(a)L(b) = L(b)L(a); (ii) If h = ∑ri=1 hi ci + ∑i 0 is the shape parameter.
3.6 Effect of the orthogonal group on a Riesz probability distribution

We recall that the Jordan frame $(c_i)_{1\le i\le r}$ is fixed and that for all k in the orthogonal group K, $(k(c_i))_{1\le i\le r}$ is a Jordan frame. Also for $s=(s_1,\dots,s_r)$ in $\mathbb R^r$ and $1\le p\le r$, we denote $s_p=(s_1,\dots,s_p)$ and $\hat s_p=(s_{p+1},\dots,s_r)$.
Theorem 3.10. Let X be a Riesz random variable, $X\sim R_r(s,\sigma,(c_i))$. Then
(i) For all k in the orthogonal group K, $k(X)\sim R_r(s,k(\sigma),(k(c_i)))$.
(ii) For p in {1, . . . , r}, we set $(\sigma^{-1})_1=P(c_p)(\sigma^{-1})$. Then the orthogonal projection $P(c_p)(X)$ of X on $V(c_p,1)$ is a Riesz random variable,
\[ P(c_p)(X)\sim R_p\big(s_p,((\sigma^{-1})_1)^{-1},(c_i)_{1\le i\le p}\big). \]
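Proof. (i) Let μ be the image of $R_r(s,\sigma,(c_i))$ by k. Then the Laplace transform of μ is
\begin{align*}
L_\mu(\theta)&=\int_\Omega e^{\langle\theta,k(x)\rangle}R_r(s,\sigma,(c_i))(dx)\\
&=\int_\Omega e^{\langle k^{-1}(\theta),x\rangle}R_r(s,\sigma,(c_i))(dx)\\
&=L_{R_r(s,\sigma)}(k^{-1}(\theta))\\
&=\frac{\Delta_s((\sigma-k^{-1}(\theta))^{-1})}{\Delta_s(\sigma^{-1})}\qquad(\text{for }\theta\in k(\sigma)-\Omega,\text{ by (3.11)})\\
&=\frac{\Delta_s(k^{-1}(k(\sigma)-\theta)^{-1})}{\Delta_s(k^{-1}(k(\sigma^{-1})))}\qquad(\text{for }\theta\in k(\sigma)-\Omega).
\end{align*}
On the other hand, if we set $\tilde c_i=k(c_i)$, $1\le i\le r$, then, according to Theorem 2.9, we can write
\begin{align*}
L_\mu(\theta)&=\frac{\tilde\Delta_s((k(\sigma)-\theta)^{-1})}{\tilde\Delta_s(k(\sigma^{-1}))}\qquad(\text{for }\theta\in k(\sigma)-\Omega)\\
&=L_{R_r(s,k(\sigma),(k(c_i)))}(\theta),
\end{align*}
where $\tilde\Delta_s$ denotes the generalized power associated to the Jordan frame $(k(c_i))_{1\le i\le r}$. The result follows from the injectivity of the Laplace transform.
(ii) Let ν be the image of the distribution $R_r(s,\sigma,(c_i))$ by the orthogonal projection $P(c_p)$ on $V(c_p,1)$. Then the Laplace transform of ν evaluated at some $\theta\in V(c_p,1)$ is
\begin{align*}
L_\nu(\theta)&=\frac{1}{\Gamma_\Omega(s)\Delta_s(\sigma^{-1})}\int_\Omega e^{\langle\theta,P(c_p)(x)\rangle-\langle\sigma,x\rangle}\Delta_{s-\frac nr}(x)\,dx\\
&=\frac{1}{\Gamma_\Omega(s)\Delta_s(\sigma^{-1})}\int_\Omega e^{\langle P(c_p)(\theta)-\sigma,x\rangle}\Delta_{s-\frac nr}(x)\,dx\\
&=\frac{\Delta_s((\sigma-P(c_p)(\theta))^{-1})}{\Delta_s(\sigma^{-1})}\qquad(\text{for }\theta\in(\sigma-\Omega)\cap V(c_p,1)).
\end{align*}
Using Corollary 2.7 (iii), we obtain
\[ L_\nu(\theta)=\frac{\Delta^{(c_p)}_{s_p}\big(((\sigma^{-1})_1^{-1}-\theta)^{-1}\big)}{\Delta^{(c_p)}_{s_p}\big((\sigma^{-1})_1\big)},\qquad\text{for }\theta\in(\sigma-\Omega)\cap V(c_p,1). \]
Since, by Proposition 1.14, $(\sigma-\Omega)\cap V(c_p,1)=(P(c_p)\sigma^{-1})^{-1}-\Omega_{c_p}$, we deduce that
\begin{align*}
L_\nu(\theta)&=\frac{\Delta^{(c_p)}_{s_p}\big(((\sigma^{-1})_1^{-1}-\theta)^{-1}\big)}{\Delta^{(c_p)}_{s_p}\big((\sigma^{-1})_1\big)}\qquad(\text{for }\theta\in(P(c_p)\sigma^{-1})^{-1}-\Omega_{c_p})\\
&=L_{R_p(s_p,(\sigma^{-1})_1^{-1},(c_i)_{1\le i\le p})}(\theta)\qquad(\text{by (3.11)}).
\end{align*}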
Corollary 3.11. Let X be a Riesz random variable, $X\sim R_r(s,\sigma,(c_i)_{1\le i\le r})$, and let c be an idempotent of rank p. Suppose that $(\tilde c_i)_{1\le i\le p}$ is a Jordan frame in $V(c,1)$ and k is an element of the orthogonal group K such that $k(c_i)=\tilde c_i$, for $1\le i\le p$. Then $P(c)k(X)$ is a Riesz random variable,
\[ P(c)k(X)\sim R_p\big(s_p,k(((\sigma^{-1})_1)^{-1}),(\tilde c_i)_{1\le i\le p}\big). \]
Proof. As k is in K, we have from Theorem 3.10 (i) that $k(X)\sim R_r(s,k(\sigma),(k(c_i))_{1\le i\le r})$. Since $c=\sum_{i=1}^p\tilde c_i$, Theorem 3.10 (ii) implies that
\[ P(c)k(X)\sim R_p\big(s_p,(((k(\sigma))^{-1})_1)^{-1},(\tilde c_i)_{1\le i\le p}\big). \]
On the other hand, we have that
\begin{align*}
((k(\sigma))^{-1})_1&=P(\tilde c_p)((k(\sigma))^{-1})\\
&=P(\tilde c_p)k(\sigma^{-1})\\
&=kP(k^{-1}(\tilde c_p))(\sigma^{-1})=kP(c_p)(\sigma^{-1})=k((\sigma^{-1})_1).
\end{align*}
To conclude, we easily verify that $(((k(\sigma))^{-1})_1)^{-1}=k(((\sigma^{-1})_1)^{-1})$.
3.7 Distributions of the Peirce components of a Riesz random variable

3.7.1 Distributions of the Peirce components

Claim (i) in the following theorem is a reformulation of claim (ii) in Theorem 3.10; it is quoted here to emphasize the distributional links between the Peirce components of a Riesz random variable.
Theorem 3.12. Let X be an absolutely continuous Riesz random variable, $X\sim R_r(s,\sigma,(c_i)_{1\le i\le r})$, and let k be an integer, $1\le k\le r-1$. Let $X=X_1+X_{12}+X_0$ and $\sigma=\sigma_1+\sigma_{12}+\sigma_0$ be respectively the Peirce decompositions of X and σ with respect to the idempotent $c_{r-k}$. Then
(i) $X_1\sim R_{r-k}(s_{r-k},\eta_1,(c_i)_{1\le i\le r-k})$, where $\eta_1=\sigma_1-P(\sigma_{12})\sigma_0^{-1}=((\sigma^{-1})_1)^{-1}$;
(ii) $X_0-P(X_{12})X_1^{-1}\sim R_k(\hat s_{r-k}-(r-k)\frac d2,\sigma_0,(c_i)_{r-k+1\le i\le r})$ and is independent of $X_1$ and $X_{12}$;
(iii) The conditional distribution of $X_{12}$ given $X_1$ is Gaussian with mean $-\{\sigma_0^{-1}\sigma_{12}X_1\}$ and covariance operator $(4L(\sigma_0)L(X_1^{-1}))^{-1}|_{V(c_{r-k},\frac12)}$.
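For V = Sym(r, ℝ), the identity $\eta_1=\sigma_1-P(\sigma_{12})\sigma_0^{-1}=((\sigma^{-1})_1)^{-1}$ in claim (i) is the familiar Schur complement formula: the inverse of the top-left block of $\sigma^{-1}$ equals $\sigma_1-\sigma_{12}\sigma_0^{-1}\,{}^t\sigma_{12}$. The sketch below checks this numerically; the block sizes and variable names are illustrative assumptions, and $\sigma_0^{-1}$ is read as the inverse inside the corresponding Peirce subalgebra, that is, the inverse of the lower-right block.

```python
import numpy as np

r, p = 5, 3                      # p plays the role of r - k
rng = np.random.default_rng(0)
b = rng.normal(size=(r, r))
sigma = b @ b.T + np.eye(r)      # an element of the cone of Sym(r, R)

# Peirce components of sigma with respect to c = diag(I_p, 0).
sigma_1 = sigma[:p, :p]
sigma_12 = sigma[:p, p:]
sigma_0 = sigma[p:, p:]

# eta_1 = sigma_1 - P(sigma_12) sigma_0^{-1}, written with matrix blocks.
eta_1 = sigma_1 - sigma_12 @ np.linalg.inv(sigma_0) @ sigma_12.T

# ((sigma^{-1})_1)^{-1}: invert sigma, project on the top-left block, invert again.
eta_1_alt = np.linalg.inv(np.linalg.inv(sigma)[:p, :p])
```

Both expressions describe the same positive-definite p × p matrix.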
Proof. Let X be a Riesz random variable, $X\sim R_r(s,\sigma,(c_i)_{1\le i\le r})$, and let $k\in\{1,\dots,r-1\}$. Denote
\[ n_{r-k}=\dim V(c_{r-k},1)\quad\text{and}\quad n_k=\dim V(c_{r-k},0). \]
Then since $\operatorname{rank}(c_{r-k})=r-k$, we have
\begin{align*}
n_{r-k}&=(r-k)+(r-k)(r-k-1)\frac d2,\\
n_k&=k+k(k-1)\frac d2,\\
n&=r+r(r-1)\frac d2=n_{r-k}+n_k+(r-k)kd. \tag{3.14}
\end{align*}
Hence, we have
\begin{align*}
\frac nr&=\frac{n_{r-k}}{r-k}+k\frac d2,\\
\frac nr&=\frac{n_k}{k}+(r-k)\frac d2. \tag{3.15}
\end{align*}
We easily verify that
\begin{align*}
\Gamma_\Omega(s)&=(2\pi)^{\frac{n_{r-k}-(r-k)}{2}}\prod_{j=1}^{r-k}\Gamma\Big(s_j-(j-1)\frac d2\Big)\\
&\quad\times(2\pi)^{\frac{n_k-k}{2}}\prod_{j=1}^{k}\Gamma\Big(s_{r-k+j}-(r-k)\frac d2-(j-1)\frac d2\Big)\\
&\quad\times(2\pi)^{k(r-k)\frac d2}.
\end{align*}
In other words, we have
\[ \Gamma_\Omega(s)=(2\pi)^{k(r-k)\frac d2}\,\Gamma_{\Omega_{c_{r-k}}}(s_{r-k})\,\Gamma_{\Omega_{e-c_{r-k}}}\Big(\hat s_{r-k}-(r-k)\frac d2\Big). \tag{3.16} \]
Now consider the map
\[ \Omega\to\Omega_{c_{r-k}}\times V(c_{r-k},\tfrac12)\times\Omega_{e-c_{r-k}},\qquad x=x_1+x_{12}+x_0\mapsto(x_1,x_{12},y_0), \]
where $y_0=x_0-P(x_{12})x_1^{-1}$. Its Jacobian is equal to 1. From Corollary 2.7, we have
\[ \Delta_s(x)=\Delta^{(c_{r-k})}_{s_{r-k}}(x_1)\,\Delta^{(e-c_{r-k})}_{\hat s_{r-k}}(y_0). \tag{3.17} \]
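The dimension identities (3.14) and (3.15) are elementary but easy to mistype, so a small exact-arithmetic check may be useful. The sketch below verifies them for a range of ranks r, splits k, and Jordan constants d; the function names are illustrative, and the parameter ranges are arbitrary test values rather than a list of admissible (r, d) pairs.

```python
from fractions import Fraction

def dim(rank, d):
    """Dimension rank + rank (rank - 1) d / 2 of an algebra of given rank and constant d."""
    return rank + Fraction(d * rank * (rank - 1), 2)

def check_identities(r, k, d):
    """Return the truth values of (3.14) and of the two lines of (3.15)."""
    n, n_rk, n_k = dim(r, d), dim(r - k, d), dim(k, d)
    return (
        n == n_rk + n_k + (r - k) * k * d,
        n / r == n_rk / (r - k) + Fraction(k * d, 2),
        n / r == n_k / k + Fraction((r - k) * d, 2),
    )

results = [
    check_identities(r, k, d)
    for r in range(2, 7)
    for k in range(1, r)
    for d in (1, 2, 4, 8)
]
```

Using `Fraction` avoids any rounding, so each entry of `results` is an exact comparison.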
On the other hand, since $\sigma\in\Omega$, by Proposition 1.12, there exists a unique $z\in V(c_{r-k},\frac12)$ such that
\[ \sigma=\tau_{e-c_{r-k}}(z)(\sigma_0+\eta_1), \]
so that
\begin{align*}
\sigma^{-1}&=\tau_{e-c_{r-k}}(-z)^*(\sigma_0^{-1}+\eta_1^{-1})\\
&=\tau_{c_{r-k}}(-z)(\sigma_0^{-1}+\eta_1^{-1})\qquad(\text{because }\tau_{e-c_{r-k}}(-z)^*=\tau_{c_{r-k}}(-z)).
\end{align*}
Using Corollary 2.7 (ii), we have
\[ \Delta_s(\sigma^{-1})=\Delta^{(c_{r-k})}_{s_{r-k}}(\eta_1^{-1})\,\Delta^{(e-c_{r-k})}_{\hat s_{r-k}}(\sigma_0^{-1}). \tag{3.18} \]
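Given that the probability density function of $X=X_1+X_{12}+X_0$ is
\[ \frac{1}{\Gamma_\Omega(s)\Delta_s(\sigma^{-1})}e^{-\langle\sigma,x\rangle}\Delta_{s-\frac nr}(x)\mathbf 1_\Omega(x), \]
using Proposition 1.16 and taking into account (3.15), (3.16), (3.17), and (3.18), the joint probability density function of $(X_1,X_{12},Y_0=X_0-P(X_{12})X_1^{-1})$ is
\begin{align*}
&\frac{1}{\Gamma_{\Omega_{c_{r-k}}}(s_{r-k})\,\Delta^{(c_{r-k})}_{s_{r-k}}(\eta_1^{-1})}\,e^{-\langle\eta_1,x_1\rangle}\,\Delta^{(c_{r-k})}_{s_{r-k}-\frac{n_{r-k}}{r-k}}(x_1)\,\mathbf 1_{\Omega_{c_{r-k}}}(x_1)\\
&\times\frac{\big(\Delta^{(e-c_{r-k})}_{\hat s_{r-k}-(r-k)\frac d2}(\sigma_0^{-1})\big)^{-1}}{\Gamma_{\Omega_{e-c_{r-k}}}\big(\hat s_{r-k}-(r-k)\frac d2\big)}\,e^{-\langle\sigma_0,y_0\rangle}\,\Delta^{(e-c_{r-k})}_{\hat s_{r-k}-(r-k)\frac d2-\frac{n_k}{k}}(y_0)\,\mathbf 1_{\Omega_{e-c_{r-k}}}(y_0)\\
&\times\frac{\big(\Delta^{(c_{r-k})}(x_1)\big)^{-k\frac d2}}{(2\pi)^{k(r-k)\frac d2}\big(\Delta^{(e-c_{r-k})}(\sigma_0^{-1})\big)^{(r-k)\frac d2}}\,e^{-\langle\sigma_0,P(x_{12}+\{\sigma_0^{-1}\sigma_{12}x_1\})x_1^{-1}\rangle}.
\end{align*}
From this we deduce that $Y_0=X_0-P(X_{12})X_1^{-1}$ is independent of $X_1$ and $X_{12}$,
\[ Y_0\sim R_k\Big(\hat s_{r-k}-(r-k)\frac d2,\sigma_0,(c_i)_{r-k+1\le i\le r}\Big),\qquad X_1\sim R_{r-k}(s_{r-k},\eta_1,(c_i)_{1\le i\le r-k}), \]
and that the conditional distribution of $X_{12}$ given $X_1$ has density
\[ \frac{\big(\Delta^{(c_{r-k})}(x_1)\big)^{-k\frac d2}}{(2\pi)^{k(r-k)\frac d2}\big(\Delta^{(e-c_{r-k})}(\sigma_0^{-1})\big)^{(r-k)\frac d2}}\,e^{-\langle\sigma_0,P(x_{12}+\{\sigma_0^{-1}\sigma_{12}x_1\})x_1^{-1}\rangle}. \]
According to Proposition 3.1, this is the Gaussian distribution with mean $-\{\sigma_0^{-1}\sigma_{12}X_1\}$ and covariance operator $(4L(\sigma_0)L(X_1^{-1}))^{-1}|_{V(c_{r-k},\frac12)}$.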
With the notations used in the previous theorem, we have that
Corollary 3.13. Let $c=c_{r-k}$, then
\[ (P(e-c)X^{-1})^{-1}\sim R_k\Big(P(e-c)\sigma,\ w_2-(r-k)\frac d2,\ (c_i)_{r-k+1\le i\le r}\Big). \]
Proof. It is a consequence of Theorem 2.13 since, from Proposition 1.13, we have that $(P(e-c)X^{-1})^{-1}=X_0-P(X_{12})X_1^{-1}$.
3.7.2 Distributions of the entries in the Cholesky decomposition of a Riesz random variable The following theorem appears in [44] and extends to the Riesz probability distribution a result which appears in [91, p. 99] for a particular Wishart probability distribution. Theorem 3.14. Let X be Rr (s, e), where s = (s1 , . . . , sr ) ∈ ℝr is such that si > (i − 1) d2 for all 1 ≤ i ≤ r, and let X = tU (e), where U = ∑ri=1 Uii ci + ∑i 0, ∀1 ≤ i ≤ r, then also ∑ri=1 si ci is in Ω. Hence kR s (θ) ∈ T(Ω) = Ω, and it follows that kR s (−Ω) ⊂ Ω. On the other hand, if y is in Ω, then there exists t1 in T such that y = t1 (e). From the definition of T, we can write r
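\[ y=\tau_{c_1}(z^{(1)})\cdots\tau_{c_{r-1}}(z^{(r-1)})P\Big(\sum_{i=1}^ra_ic_i\Big)(e), \]
or equivalently,
\[ y=\tau_{c_1}(z^{(1)})\cdots\tau_{c_{r-1}}(z^{(r-1)})P\Big(\sum_{i=1}^r\frac{a_i}{\sqrt{s_i}}c_i\Big)\Big(\sum_{i=1}^rs_ic_i\Big). \]
Defining $t=\tau_{c_1}(z^{(1)})\cdots\tau_{c_{r-1}}(z^{(r-1)})P\big(\sum_{i=1}^r\frac{a_i}{\sqrt{s_i}}c_i\big)$ and $\theta=-t^{*-1}(e)$, we have that $y=k'_{R_s}(\theta)$. From this, we deduce that
\[ \Omega\subset k'_{R_s}(-\Omega). \]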
Finally, we obtain that $k'_{R_s}(-\Omega)=\Omega$, that is,
\[ M_{F(R_s)}=\Omega. \]
(ii) Let m be in $M_{F(R_s)}$; then there exist θ in −Ω and a unique element t in T such that
\[ m=t(e)=k'_{R_s}(\theta). \]
On the other hand, there exists a unique element $t_1$ in the triangular group T such that $\theta=-t_1^{*-1}(e)$. Using (i), we can write
\[ t(e)=k'_{R_s}(\theta)=t_1\Big(\sum_{i=1}^rs_ic_i\Big)=t_1\circ P\Big(\sum_{i=1}^r\sqrt{s_i}\,c_i\Big)(e). \]
Thus $t_1\circ P(\sum_{i=1}^r\sqrt{s_i}\,c_i)$ is an element of T such that
\[ t(e)=t_1\circ P\Big(\sum_{i=1}^r\sqrt{s_i}\,c_i\Big)(e). \]
According to Proposition 1.18, this implies that
\[ t=t_1\circ P\Big(\sum_{i=1}^r\sqrt{s_i}\,c_i\Big), \]
which can be written as
\[ t_1=t\circ P\Big(\sum_{i=1}^r\frac{c_i}{\sqrt{s_i}}\Big). \tag{4.11} \]
Now using (4.10) and Proposition 2.1 (ii), we have that
\[ k''_{R_s}(\theta)=-\sum_{i=1}^r(s_{r-i}-s_{r-i+1})P\big((P_i^*(-\theta))^{-1}\big). \tag{4.12} \]
Given that the variance function of $F(R_s)$ evaluated at $m=k'_{R_s}(\theta)$ is $V_{F(R_s)}(m)=k''_{R_s}(\theta)$, we obtain
\[ V_{F(R_s)}(m)=-\sum_{i=1}^r(s_{r-i}-s_{r-i+1})P\big((P_i^*(t_1^{*-1}(e)))^{-1}\big). \tag{4.13} \]
Using (2.8) then (4.11), we have
\begin{align*}
(P_i^*(t_1^{*-1}(e)))^{-1}&=t_1(c_{r-i+1}+\cdots+c_r)\\
&=t\circ P\Big(\sum_{k=1}^r\frac{c_k}{\sqrt{s_k}}\Big)(c_{r-i+1}+\cdots+c_r)\qquad(\text{using (4.11)})\\
&=t\Big(\sum_{k=r-i+1}^r\frac{c_k}{s_k}\Big).
\end{align*}
Since
\[ \sum_{k=r-i+1}^r\frac{c_k}{s_k}=\frac{1}{s_{r-i+1}}(c_{r-i+1}+\cdots+c_r)+\sum_{k=1}^{i-1}\Big(\frac{1}{s_{r-k+1}}-\frac{1}{s_{r-k}}\Big)(c_{r-k+1}+\cdots+c_r), \]
we obtain
\begin{align*}
(P_i^*(t_1^{*-1}(e)))^{-1}&=\frac{1}{s_{r-i+1}}t(c_{r-i+1}+\cdots+c_r)+\sum_{k=1}^{i-1}\Big(\frac{1}{s_{r-k+1}}-\frac{1}{s_{r-k}}\Big)t(c_{r-k+1}+\cdots+c_r)\\
&=\frac{1}{s_{r-i+1}}(P_i^*(m^{-1}))^{-1}+\sum_{k=1}^{i-1}\Big(\frac{1}{s_{r-k+1}}-\frac{1}{s_{r-k}}\Big)(P_k^*(m^{-1}))^{-1}.
\end{align*}
Inserting this into (4.13), we get
\[ V_{F(R_s)}(m)=\sum_{i=1}^r(s_{r-i+1}-s_{r-i})P\Big[\frac{(P_i^*(m^{-1}))^{-1}}{s_{r-i+1}}+\sum_{k=1}^{i-1}\Big(\frac{1}{s_{r-k+1}}-\frac{1}{s_{r-k}}\Big)(P_k^*(m^{-1}))^{-1}\Big]. \]
Note that in the particular case where $s_1=\cdots=s_r=p$, we have that $R_s=W_p$ and the natural exponential family $F(W_p)$ is a family of Wishart distributions; its variance function reduces to
\[ V_{F(W_p)}(m)=\frac1pP(m). \]
When V is the algebra of symmetric matrices, this means that for all m in Ω,
\[ V_{F(W_p)}(m)(\theta)=\frac1pP(m)(\theta)=\frac1p\,m.\theta.m, \]
where m.θ.m is the ordinary product of three symmetric matrices. To understand the structure of the variance function of a Riesz exponential family, we will calculate its representative matrix in a suitable basis of the algebra V. For the sake of simplification, we will suppose that the algebra V is of rank 2 and we fix in V a Jordan frame (c1 , c2 ). Let B12 = (e1 , . . . , ed ) be an orthonormal basis of V12 , and suppose that V is equipped with the basis B = (c1 , e1 , . . . , ed , c2 ). If x = x1 + x12 + x2 is
102 | 4 Riesz natural exponential families the Peirce decomposition of an element x of V with respect to the Jordan frame (c1 , c2 ) i with x12 = ∑di=1 x12 ei , then the representative matrix of P(x) in the basis B is x12 + 21 ‖x12 ‖2 1 x1 x12 ( . ( ( . ( ( . d x1 x12 1 2 ( 2 ‖x12 ‖
1 x1 x12 1 2 x1 x2 + (x12 ) 0 0 0 0 1 x2 x12
. 0 . . . . .
. . . . . . .
. . . . . 0 .
d x1 x12 0 0 0 0 d 2 x1 x2 + (x12 ) d x2 x12
1 ‖x ‖2 2 12 1 x2 x12
) . ) ). . ) ) . d x2 x12 1 2 x2 + 2 ‖x12 ‖2 )
In fact, using the multiplication rules (1.9) and (1.10), we have that 1 1 P(x)c1 = 2L(x)2 c1 − L(x2 )c1 = (x12 + ‖x12 ‖2 )c1 + x1 x12 + ‖x12 ‖2 c2 2 2 and 1 1 P(x)c2 = ‖x12 ‖2 c1 + x2 x12 + (x22 + ‖x12 ‖2 )c2 . 2 2 If y12 is in V12 , then P(x)y12 = 2x(xy12 ) − x2 y12
1 1 1 = 2x( x1 y12 + ⟨x12 , y12 ⟩(c1 + c2 ) + x2 y12 ) 2 2 2 1 1 1 − x12 y12 − x22 y12 − (x1 + x2 )⟨x12 , y12 ⟩(c1 + c2 ) 2 2 2 = x1 ⟨x12 , y12 ⟩c1 + (x1 x2 y12 + ⟨x12 , y12 ⟩x12 ) + x2 ⟨x12 , y12 ⟩c2 .
Now given (s1 , s2 ) in Ξ such that s1 ≠ 0, we have that VF(Rs ) (m) = (s2 − s1 )P[
1 ∗ −1 −1 (P (m )) ] + s1 P(Z), s2 1
where Z=
1 ∗ −1 −1 1 1 −1 (P (m )) + ( − )(P1∗ (m−1 )) . s1 2 s2 s1
Let m = m1 c1 + m12 + m2 c2 be the Peirce decomposition of m with respect to the Jordan frame (c1 , c2 ). Then m is written in the basis B as d
m = m1 c1 + ∑ mi12 ei + m2 c2 , i=1
and if we denote by A = (aij (m))1≤i,j≤d+2 the matrix of VF(Rs ) (m) in the basis B, then using the calculation above, we have that A is symmetric and that a11 =
m21 1 + ‖m ‖2 , s1 2s1 12
m1 mi12 , 2 ≤ i ≤ d + 1, s1 1 ‖m ‖2 , = 2s1 12
a1i = a1d+2
aii =
(mi )2 m1 m2 1 1 1 + ( − )‖m12 ‖2 + 12 , s2 2 s1 s2 s1
aid+2 = ( ad+2d+2 =
m2 1 1 1 + ( − )‖m12 ‖2 )mi12 , s2 2 s1 s2
2 ≤ i ≤ d + 1,
2 ≤ i ≤ d + 1,
2
2
s2 − s1 m2 1 1 1 ‖m12 ‖2 1 ‖m12 ‖2 1 (m − ) + s ( + − ) ) + ‖m12 ‖2 , ( 2 1 2 2 m s 2 s s m 2s s2 1 2 1 2 1 1
aij = 0,
2 ≤ i, j ≤ d + 1, and i ≠ j.
To illustrate this, we suppose that the dimension of V is equal to 3, in this case r = 2 and d = 1. Writing the element m12 of V12 as m12 e1 , were the last m12 is a real number, the matrix of VF(Rs ) (m) is m21 s1
(
1 m2 2s1 12 m1 m12 s1 m212 2s1
+
m1 m2 s2
m2 m12 s2
+ +
m1 m12 s1 1 3 ( − s1 )m212 2 s1 2 1 1 ( 2 s1
−
1 )m312 s2
m22 s2
m212 2s1 1 1 ( 2 s1
+ (1 −
m2 m12 + s2 2 s1 m12 ) + ( 4s1 s2 4s2 m21 1
− +
1 )m312 s2 s1 4s22
−
). 4
1 m12 ) 2s2 m21
+
m212 2s1
The entries of the variance function of a Riesz exponential family are rational fractions in the mean, they reduce to polynomials in the case where the family is a Wishart family.
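In the Wishart case on symmetric matrices, the variance function above acts as $\theta\mapsto\frac1p\,m.\theta.m$, which rests on the identity $P(m)\theta=2L(m)^2\theta-L(m^2)\theta=m.\theta.m$. The following sketch checks that identity numerically and wraps it in a hypothetical helper `wishart_variance`; the value p = 2.5 and the matrix size are arbitrary illustrative choices.

```python
import numpy as np

def jordan(x, y):
    """Jordan product on symmetric matrices."""
    return (x @ y + y @ x) / 2

def quad_rep(x, y):
    """Quadratic representation P(x)y = 2 x(xy) - x^2 y in Jordan-product form."""
    return 2 * jordan(x, jordan(x, y)) - jordan(jordan(x, x), y)

def wishart_variance(m, theta, p):
    """V(m)(theta) = P(m)(theta) / p for the Wishart family F(W_p)."""
    return quad_rep(m, theta) / p

rng = np.random.default_rng(0)
b = rng.normal(size=(3, 3))
m = b @ b.T + np.eye(3)          # a mean in the cone
a = rng.normal(size=(3, 3))
theta = (a + a.T) / 2

p = 2.5
value = wishart_variance(m, theta, p)
```

The entries of `value` are quadratic polynomials in the entries of m, in line with the remark that the Wishart case gives polynomial entries.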
4.3 Characterization of a Riesz exponential families by a property of invariance

One of the most important characterization results concerning the Wishart natural exponential families on the symmetric cone Ω of a Jordan algebra V is the characterization by invariance under the group G of automorphisms of V preserving the cone Ω. More precisely, we have the following theorem due to Letac [77].
Theorem 4.4.
(i) Let $F(W_p)$ be a Wishart natural exponential family generated by the Wishart measure $W_p$. Then for all g in the group G, $g(F(W_p))=F(W_p)$.
(ii) If F is a natural exponential family in V invariant by G, then there exists p in Λ such that either $F=F(W_p)$ or F is the image of $F(W_p)$ by $x\mapsto-x$.
Next, we characterize a Riesz natural exponential family by invariance by the triangular group.
Theorem 4.5. Let $F(R_s)$ be a Riesz natural exponential family. Then for t in the triangular group T, $t(F(R_s))=F(R_s)$.
Proof. Given a Riesz natural exponential family $F(R_s)$ and t in T, we already know that $t(F(R_s))=F(t(R_s))$, where $t(R_s)$ is the image of the Riesz measure $R_s$ by t. On the other hand, for θ in −Ω, we have that
\begin{align*}
L_{t(R_s)}(\theta)&=\int_V\exp(\langle\theta,t(x)\rangle)R_s(dx)\\
&=\int_V\exp(\langle t^*(\theta),x\rangle)R_s(dx)\\
&=\Delta_s(-t^{-1}(\theta^{-1}))\\
&=\Delta_s(t^{-1}(e))\Delta_s(-\theta^{-1})=\Delta_s(t^{-1}(e))L_{R_s}(\theta).
\end{align*}
Thus $t(R_s)=\Delta_s(t^{-1}(e))R_s$; it follows that $t(F(R_s))=F(R_s)$.
The result may also be established using the fact that the variance function of a natural exponential family characterizes the family. In fact, we have that
\[ M_{t(F(R_s))}=t(M_{F(R_s)})=t(\Omega)=\Omega=M_{F(R_s)}. \]
It suffices then to verify that for all $m\in\Omega$, $V_{F(R_s)}(t(m))=tV_{F(R_s)}(m)t^*$. If m = e, then according to Theorem 4.3 and to Proposition 2.4, we have
\begin{align*}
V_{F(R_s)}(t(e))&=\sum_{i=1}^r(s_{r-i+1}-s_{r-i})P\Big[\frac{1}{s_{r-i+1}}t(c_{r-i+1}+\cdots+c_r)+\sum_{k=1}^{i-1}\Big(\frac{1}{s_{r-k+1}}-\frac{1}{s_{r-k}}\Big)t(c_{r-k+1}+\cdots+c_r)\Big]\\
&=\sum_{i=1}^r(s_{r-i+1}-s_{r-i})P\Big[t\Big(\frac{1}{s_{r-i+1}}(c_{r-i+1}+\cdots+c_r)+\sum_{k=1}^{i-1}\Big(\frac{1}{s_{r-k+1}}-\frac{1}{s_{r-k}}\Big)(c_{r-k+1}+\cdots+c_r)\Big)\Big]\\
&=\sum_{i=1}^r(s_{r-i+1}-s_{r-i})\,tP\Big[\frac{1}{s_{r-i+1}}(c_{r-i+1}+\cdots+c_r)+\sum_{k=1}^{i-1}\Big(\frac{1}{s_{r-k+1}}-\frac{1}{s_{r-k}}\Big)(c_{r-k+1}+\cdots+c_r)\Big]t^*\\
&=tV_{F(R_s)}(e)t^*.
\end{align*}
Now for any m in Ω, there exists $t_1$ in T such that $m=t_1(e)$, so that
\begin{align*}
V_{F(R_s)}(t(m))&=V_{F(R_s)}(tt_1(e))\\
&=tt_1V_{F(R_s)}(e)(tt_1)^*\\
&=t\big(t_1V_{F(R_s)}(e)t_1^*\big)t^*\\
&=tV_{F(R_s)}(t_1(e))t^*=tV_{F(R_s)}(m)t^*.
\end{align*}
In the following theorem, we make use of the set
\[ \Upsilon=\Big\{\varepsilon=\sum_{i=1}^r\varepsilon_ic_i : \varepsilon_i=1\text{ or }\varepsilon_i=-1\Big\}, \]
defined in (2.31).
Theorem 4.6. Let F = F(μ) be a natural exponential family on V invariant by the triangular group T. Then
(i) There exists ε in ϒ such that $T(\varepsilon)\subseteq M_F$;
(ii) If ε in ϒ is such that $T(\varepsilon)\subseteq M_F$, then either ε = e or ε = −e;
(iii) We have either $\Omega\subseteq M_F$ or $-\Omega\subseteq M_F$.
Proof. (i) As $M_F$ is an open subset of V, from Theorem 2.19, we have that
\[ M_F\not\subseteq V\setminus\bigcup_{i=1}^rS_i^r, \]
in other words,
\[ M_F\cap\Big(\bigcup_{i=1}^rS_i^r\Big)\neq\emptyset. \]
Therefore there exists $1\le j\le r$ such that $M_F\cap S_j^r\neq\emptyset$. Since $\bigcup_{\mathrm{sg}(\varepsilon)=(j,r-j)}V_\varepsilon$ is dense in $S_j^r$, there exists ε in ϒ such that
\[ \mathrm{sg}(\varepsilon)=(j,r-j)\quad\text{and}\quad M_F\cap V_\varepsilon\neq\emptyset. \]
As $M_F$ is invariant with respect to T, we deduce that $V_\varepsilon\subset M_F$, that is, $T(\varepsilon)\subset M_F$.
(ii) Suppose that ε in ϒ is such that T(ε) ⊂ MF , and consider the map φ : V+ → Θ(μ),
u → ψμ (t u (ε)), −1
where t u is defined by (1.33). We will show that φ is twice differentiable in V+ and calculate φ . Since the family F(μ) is supposed to be invariant with respect to T, by (4.7), for all u in V+ , we get VF(μ) (t u (ε)) = t u VF(μ) (ε)t u . ∗
As ψμ (t u (ε)) = (VF(μ) (t u (ε)))−1 , we obtain ψμ (t u (ε)) = t u Γt u , ∗−1
−1
where Γ = (VF(μ) (ε)) . −1
It follows that ψμ (t u (ε)) = t u Γt u , −1
for all u ∈ V+ .
∗
(4.14)
Using Corollary 1.22, we get for h ∈ V, φ (u)(h) = −(t u ΓH (u)(h)t u )(ε), ∗
−1
for all u ∈ V+ ,
where the function H is defined in (1.35). Differentiating again, we obtain for h and θ in V, φ (u)(h, θ) = −((H (u)(θ)) ΓH (u)(h)H1 (u) + H(u)∗ ΓH (u)(h, θ)H1 (u) ∗
+ H(u)∗ ΓH (u)(h)H1 (u)(θ))(ε),
where the function H1 is defined in (1.37). From Proposition 1.21, we have r−1
H (u)(h) = ∑ τ(u(0) ) ⋅ ⋅ ⋅ τ(u(j−1) )Kjh (u)τ(u(j+1) ) ⋅ ⋅ ⋅ τ(u(r−1) )P(u) j=1
+ 2τ(u(1) ) ⋅ ⋅ ⋅ τ(u(r−1) )(2L(u)L(h) − L(uh)). Then for all h and θ in V, we have r−1 j−1
H (u)(h, θ) = ∑ ∑ τ(u(0) ) ⋅ ⋅ ⋅ τ(u(l−1) )Klθ (u)τ(u(l+1) ) ⋅ ⋅ ⋅ τ(u(j−1) )Kjh (u) j=1 l=1
(4.15)
τ(u(j+1) ) ⋅ ⋅ ⋅ τ(u(r−1) )P(u) r−1
+ ∑ τ(u(1) ) ⋅ ⋅ ⋅ τ(u(j−1) )(Kjh ) (u)(θ)τ(u(j+1) ) ⋅ ⋅ ⋅ τ(u(r−1) )P(u)
j=1
r−1 r−1
+ ∑ ∑ τ(u(1) ) ⋅ ⋅ ⋅ τ(u(j−1) )Kjh (u)τ(u(j+1) ) ⋅ ⋅ ⋅ τ(u(l−1) )Klθ (u) j=1 l=j+1
τ(u(l+1) ) ⋅ ⋅ ⋅ τ(u(r−1) )P(u) r−1
+ 2 ∑ τ(u(1) ) ⋅ ⋅ ⋅ τ(u(j−1) )Kjh (u)τ(u(j+1) ) ⋅ ⋅ ⋅ τ(u(r−1) ) j=1
(2L(u)L(θ) − L(uθ)) r−1
+ 2 ∑ τ(u(1) ) ⋅ ⋅ ⋅ τ(u(j−1) )Kjθ (u)τ(u(j+1) ) ⋅ ⋅ ⋅ τ(u(r−1) ) j=1
(2L(u)L(h) − L(uh)) + 2τ(u(1) ) ⋅ ⋅ ⋅ τ(u(r−1) )(2L(θ)L(h) − L(θ h)). Now taking u = e, we have Kjh (e) = 2h(j) ◻cj and (Kjh ) (e)(θ) = 4(h(j) ◻cj )(θ(j) ◻cj ) (by (1.34)).
It follows that r−1
r−1
j=1
j=1
H (e)(h) = ∑ Kjh (e) + 2L(h) = 2(∑ h(j) ◻cj + L(h)). On the other hand, we have H1 (e)(θ) = −H (e)(θ)
(by Corollary 1.22)
and r−1 j−1
H (e)(h, θ) = ∑ ∑ Klθ (e)Kjh (e) j=1 l=1 r−1
+ ∑(Kjh ) (e)(θ)
j=1
r−1 r−1
+ ∑ ∑ Kjh (e)Klθ (e) j=1 l=j+1
(4.16)
108 | 4 Riesz natural exponential families r−1
+ 2 ∑ Kjh (e)L(θ) j=1
r−1
+ 2 ∑ Kjθ (e)L(h) j=1
+ 2(2L(θ)L(h) − L(θ h)). Therefore for h and θ in V, r−1
∗
r−1
φ (e)(h, θ) = −4(∑ θ ◻cj + L(θ)) Γ(∑ h(j) ◻cj + L(h))(ε)
(j)
j=1
j=1
r−1 j−1
r−1
− 4Γ(∑ ∑(θ(l) ◻cl )(h(j) ◻cj ) + ∑(h(j) ◻cj )(θ(j) ◻cj ) j=1 l=1
j=1
r−1 r−1
r−1
j=1 l=j+1
j=1
+ ∑ ∑ (h(j) ◻cj )(θ(l) ◻cl ) + ∑(θ(j) ◻cj )L(h) r−1 r−1 r−1 1 − L(h) ∑(θ(j) ◻cj ) − ∑ ∑ (h(j) ◻cj )(θ(k) ◻ck ) − L(θ h))(ε). 2 j=1 j=1 k=1
Using the symmetry of φ (e), that is, φ (e)(h, θ) = φ (e)(θ, h) ∀h, θ ∈ V, and setting θ = e, we obtain r−1
r−1
∗
Γ(∑ h ◻cj + L(h))(ε) = (∑ h ◻cj + L(h)) Γ(ε). (j)
j=1
(j)
j=1
(4.17)
As Γ is equal to (VF(μ) (ε))−1 , then also Γ(ε) = (VF(μ) (ε))−1 (ε) is in V. Denote a = Γ(ε) and write a = ∑ri=1 ai ci + ∑i 0
for all h ∈ V \ {0}.
By (4.18), we get ⟨a, α(h)2 (ε)⟩ > 0
for all h ∈ V \ {0}.
(4.18)
When $h = c_i$, $1 \le i \le r$, this gives $\langle a, \varepsilon_i a_i c_i\rangle = \varepsilon_i a_i > 0$. It follows that $\varepsilon_i$ and $a_i$ have the same sign, $1 \le i \le r$.

When $h = h_{jk}$, $j < k$, $1 \le j \le r-1$, with $h_{jk} \ne 0$, we get $\langle a, (h_{jk} \square c_j)^2(\varepsilon)\rangle > 0$. Therefore $\langle a, \varepsilon_j \|h_{jk}\|^2 c_k\rangle > 0$, that is, $\varepsilon_j a_k \|h_{jk}\|^2 > 0$. It follows that for $1 \le j \le r-1$ and $j < k$, $\varepsilon_j$ and $a_k$ have the same sign. Finally, we deduce that $\varepsilon_1, \dots, \varepsilon_r$ and $a_1, \dots, a_r$ have the same sign. Consequently, we have either $\varepsilon = e$ or $\varepsilon = -e$.

(iii) Suppose that $F$ is a natural exponential family on $V$ invariant with respect to the triangular group $T$. From (i), there exists $\varepsilon$ in ϒ such that $T(\varepsilon) \subseteq M_F$, and from (ii), we have either $\varepsilon = e$ or $\varepsilon = -e$. If $\varepsilon = e$, then $T(e) = \Omega \subseteq M_F$. If $\varepsilon = -e$, then $T(-e) = -\Omega \subseteq M_F$.

The following theorem and Theorem 4.5 together give a characterization of the Riesz natural exponential families on a simple Euclidean Jordan algebra by invariance with respect to the triangular group $T$.

Theorem 4.7. Let $V$ be a simple Euclidean Jordan algebra, $(c_i)_{1\le i\le r}$ a Jordan frame in $V$, and $T$ the triangular group. If $F$ is a natural exponential family on $V$ invariant with respect to $T$, then there exists $s$ in Ξ such that either $F = F(R_s)$ or $F$ is the image of $F(R_s)$ by $x \mapsto -x$.

Proof. We keep the notation used in the proof of Theorem 4.6; in particular, $T(\varepsilon) \subseteq M_F$ with $\varepsilon = e$ or $\varepsilon = -e$. We will first show that in the Peirce decomposition a = ∑ri=1 ai ci + ∑i 0, 1 ≤ i ≤ r, and Ω ⊂ MF . From (4.18), we get

$$\Gamma(\alpha(h)(e)) = \alpha(h)^{*}\Big(\sum_{i=1}^{r} a_i c_i\Big) \quad \forall h \in V. \tag{4.19}$$
On the one hand, we have

$$\alpha(h)(e) = \Big(\sum_{j=1}^{r-1} h^{(j)} \square c_j + L(h)\Big)(e) = \frac{1}{2}\sum_{j=1}^{r-1} h^{(j)} + \sum_{i=1}^{r} h_i c_i \quad \text{(by Proposition 1.11).}$$
Thus r
α(h)(e) = ∑ hi ci + i=1
1 ∑h . 2 j (j − 1) d2 + 1. For x ∈ Ω, denote g(x) =
$$\frac{1}{\Gamma_\Omega(s)}\,\Delta_{s-\frac{n}{r}}(x).$$

Then the density of $X$ with respect to the Lebesgue measure on $\Omega$ is

$$\frac{1}{\Delta_s(\sigma^{-1})}\, e^{-\langle\sigma,x\rangle} g(x).$$

It is known that for $x \in \Omega$, $x^{-1}\Delta(x)$ is a polynomial of degree $r-1$. This comes from the equality

$$x^r - x^{r-1}\operatorname{tr}(x) + \cdots + (-1)^r e\,\Delta(x) = 0$$

(see [31, pp. 28–29]).
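As an illustration of the last claim, the sketch below checks the rank-two case on 2×2 symmetric matrices, where the equality reads $x^2 - \operatorname{tr}(x)\,x + \Delta(x)\,e = 0$ and therefore $\Delta(x)\,x^{-1} = \operatorname{tr}(x)\,e - x$, a polynomial of degree $r - 1 = 1$. The matrix realization (Δ taken as the determinant, e as the identity matrix) and all function names are assumptions made for this example only.

```python
from fractions import Fraction


def det2(x):
    """Determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = x
    return a * d - b * c


def inv2(x):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = x
    dt = det2(x)
    return [[d / dt, -b / dt], [-c / dt, a / dt]]


def degree_one_polynomial(x):
    """tr(x) * e - x, the polynomial predicted by the rank-two identity."""
    (a, b), (c, d) = x
    tr = a + d
    return [[tr - a, -b], [-c, tr - d]]


# A hypothetical positive definite matrix with exact rational entries.
x = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(3)]]
lhs = [[det2(x) * entry for entry in row] for row in inv2(x)]  # Delta(x) * x^{-1}
rhs = degree_one_polynomial(x)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding.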
From this we deduce that $(P_i(x))^{-1}\Delta_i(x)$ is a polynomial of degree $i-1$ taking its values in $V(c_1 + \cdots + c_i, 1)$. Since $s_j > (j-1)\frac{d}{2} + 1$, we define $p = (p_1, \dots, p_r)$ by $p_j = s_j - 1$ if $j \le i$ and $p_j = s_j$ if $j \ge i+1$. Then $(P_i(x))^{-1}\Delta_i(x)\exp\langle -\sigma, x\rangle\,\Delta_{p-\frac{n}{r}}(x)$ is integrable on $\Omega$. This implies that

$$\sum_{i=1}^{r} (s_i - s_{i+1})\,\mathbb{E}\big((P_i(X))^{-1}\big) \ \text{exists.}$$
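Let us assume that $\theta$ belongs to $-\Omega$. By Proposition 2.1, the differential of the function $x \mapsto \exp\langle\theta, x\rangle g(x)$ is

$$\exp\langle\theta, x\rangle\, g(x)\Big(\theta + \sum_{i=1}^{r} (s_i - s_{i+1})(P_i(x))^{-1}\Big).$$

Furthermore, this function is equal to 0 on the boundary $\partial\Omega$ of the cone $\Omega$. Applying Stokes formula gives

$$\int_\Omega e^{\langle\theta,x\rangle} g(x)\Big(\theta + \sum_{i=1}^{r} (s_i - s_{i+1})(P_i(x))^{-1}\Big)dx = 0. \tag{6.7}$$

For $\theta = -\sigma$, we obtain

$$\int_\Omega e^{-\langle\sigma,x\rangle} g(x)\Big(\sum_{i=1}^{r} (s_i - s_{i+1})(P_i(x))^{-1}\Big)dx = \sigma \int_\Omega e^{-\langle\sigma,x\rangle} g(x)\,dx.$$

Then

$$\sum_{i=1}^{r} (s_i - s_{i+1})\,\mathbb{E}\big((P_i(X))^{-1}\big) = \sigma.$$

Taking the differential of (6.7) with respect to $\theta$ yields

$$\int_\Omega e^{\langle\theta,x\rangle} g(x)\Big(\sum_{i=1}^{r} (s_i - s_{i+1})(P_i(x))^{-1} \otimes x + \theta \otimes x + \mathrm{Id}_V\Big)dx = 0.$$

Again, for $\theta = -\sigma$, we get

$$\int_\Omega e^{-\langle\sigma,x\rangle} g(x)\Big(\sum_{i=1}^{r} (s_i - s_{i+1})(P_i(x))^{-1} \otimes x\Big)dx = \sigma \otimes \int_\Omega x\, e^{-\langle\sigma,x\rangle} g(x)\,dx - \mathrm{Id}_V \int_\Omega e^{-\langle\sigma,x\rangle} g(x)\,dx.$$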
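Hence

$$\sum_{i=1}^{r} (s_i - s_{i+1})\,\mathbb{E}\big((P_i(X))^{-1} \otimes X\big) = \sigma \otimes \mathbb{E}(X) - \mathrm{Id}_V.$$

Since

$$\mathbb{E}(X) = -\sum_{i=1}^{r} (s_{r-i} - s_{r-i+1})(P_i^*(\sigma))^{-1},$$

we obtain

$$\sum_{i=1}^{r} (s_i - s_{i+1})\,\mathbb{E}\big((P_i(X))^{-1} \otimes X\big) = -\sum_{i=1}^{r} (s_{r-i} - s_{r-i+1})\,\sigma \otimes (P_i^*(\sigma))^{-1} - \mathrm{Id}_V,$$

which is the desired result.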
6.2 Constancy of regression on the mean

One of the most important methods of statistical analysis is regression analysis. It is used to investigate the relationship between variables and to examine the impact of some variables on a dependent one. Many types of regression can be used, and for each type one needs to know under which conditions it is the most appropriate (see [29] and [112]). In many interesting situations, the probability distribution of a random variable X is characterized by a property of constancy of regression of a statistic S = S(X1 , . . . , Xk ) on a statistic M = M(X1 , . . . , Xk ), where X1 , . . . , Xk is a random sample of X. Some examples of such characterizations are given in [73] and [67]. For the most common real distributions, the statistic S is a polynomial and the statistic M is the sample mean. In some cases, the form and degree of the statistic S have been used to classify the distributions. For instance, Laha and Lukacs [73] have described the class of distributions for which S is a polynomial of degree less than or equal to two, and Fosam and Shanbhag [33] have identified the class for which S is a polynomial of degree less than or equal to three. In fact, the form of the statistic is related to the form of the variance function of the generated natural exponential family. In this section, we determine the regression on the mean of some statistics related to the Riesz probability distribution.

6.2.1 Definition and characterization

Let X1 , . . . , Xk be a random sample in V. If S = S(X1 , . . . , Xk ) and M = M(X1 , . . . , Xk ) are two statistics, we say that S has a constant regression on M if there exists a constant C such that

$$\mathbb{E}(S \mid M) = C$$

holds almost surely. Of course, this presupposes the existence of the conditional expectation, that is, $\mathbb{E}(|S|) < +\infty$. Replacing S by S − C, we may speak of zero regression rather than constancy of regression. The constancy of regression of a scalar statistic is characterized by the following property:

Proposition 6.6. Let X1 , . . . , Xk be a random sample in V, and let S = S(X1 , . . . , Xk ) and M = M(X1 , . . . , Xk ) be two statistics. Suppose that S is a scalar statistic and that M is in V such that $\mathbb{E}(e^{\langle\theta,M\rangle}|S|) < +\infty$ for θ in an open subset 𝒪 of V. Then $\mathbb{E}(S \mid M) = C$ almost surely, where C is a constant, if and only if

$$\mathbb{E}(e^{\langle\theta,M\rangle} S) = C\,\mathbb{E}(e^{\langle\theta,M\rangle}) \quad \text{for all } \theta \text{ in } \mathcal{O}.$$
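The direct implication of Proposition 6.6 can be checked by hand on a finite example. The sketch below uses a made-up joint law of a real pair (M, S), chosen so that the conditional mean of S given M equals C = 2 for every value of M, and compares both sides of the identity for a few values of θ. The joint law, the constant and the function names are hypothetical and serve only as an illustration.

```python
import math

# Hypothetical joint law of (M, S): keys are (m, s), values are probabilities.
# For every value m of M, the conditional mean of S given M = m equals 2.
joint = {
    (0.0, 1.0): 0.15, (0.0, 3.0): 0.15,
    (1.0, 0.0): 0.10, (1.0, 2.0): 0.20, (1.0, 4.0): 0.10,
    (2.5, 2.0): 0.30,
}
C = 2.0


def tilted_mean_of_s(theta):
    """E(exp(theta * M) * S) under the joint law above."""
    return sum(p * math.exp(theta * m) * s for (m, s), p in joint.items())


def laplace_of_m(theta):
    """E(exp(theta * M)) under the joint law above."""
    return sum(p * math.exp(theta * m) for (m, _), p in joint.items())


# Difference between the two sides of the identity, for several theta.
gaps = [tilted_mean_of_s(t) - C * laplace_of_m(t) for t in (-1.0, 0.0, 0.5, 2.0)]
```

Every entry of `gaps` should vanish up to rounding, since the identity only uses the conditional means of S.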
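Proof. (⇒) Suppose that $\mathbb{E}(S \mid M) = C$ almost surely. Then for all θ in 𝒪, we have

$$\mathbb{E}(e^{\langle\theta,M\rangle} S) = \mathbb{E}\big(\mathbb{E}(e^{\langle\theta,M\rangle} S \mid M)\big) = \mathbb{E}\big(e^{\langle\theta,M\rangle}\,\mathbb{E}(S \mid M)\big) = \mathbb{E}(C e^{\langle\theta,M\rangle}) = C\,\mathbb{E}(e^{\langle\theta,M\rangle}).$$

(⇐) Suppose that $\mathbb{E}(e^{\langle\theta,M\rangle} S) = C\,\mathbb{E}(e^{\langle\theta,M\rangle})$ for all θ in an open subset of V. Then $\mathbb{E}\big(e^{\langle\theta,M\rangle}(\mathbb{E}(S \mid M) - C)\big) = 0$. Denoting $\varphi(M) = \mathbb{E}(S \mid M) - C$, we obtain that $\int e^{\langle\theta,m\rangle}\varphi(m)\mathbf{1}_{\{\varphi(m)\ge 0\}}\,dP_M(m) = \int e^{\langle\theta,m\rangle}(-\varphi(m))$1{φ(m) (i − 1) d/2, for all 1 ≤ i ≤ r. If we set $\alpha_i = s_i + s_i'$ for all $1 \le i \le r$ and $\alpha_{r+1} = \frac{n}{r}$, then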
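$$\text{(i)}\quad \sum_{i=1}^{r} (s_i - s_{i+1})\,\mathbb{E}\big((P_i(X))^{-1} \mid X+Y\big) = \sum_{i=1}^{r} (\alpha_i - \alpha_{i+1})(P_i(X+Y))^{-1};$$

$$\text{(ii)}\quad \sum_{i=1}^{r} \mathbb{E}\big((s_i - s_{i+1})P_i(X)^{-1} \otimes X + (s_i' - s_{i+1}')P_i(Y)^{-1} \otimes Y \mid X+Y\big) = \sum_{i=1}^{r} (\alpha_i - \alpha_{i+1})P_i(X+Y)^{-1} \otimes (X+Y) - \mathrm{Id}_V,$$

where $s_{r+1} = 0$ and $s_{r+1}' = 0$.

Proof. We know that the Laplace transform of the Riesz probability distribution $R_r(s,\sigma)$ is defined for θ in σ − Ω by

$$L_{R_r(s,\sigma)}(\theta) = \frac{\Delta_s((\sigma-\theta)^{-1})}{\Delta_s(\sigma^{-1})}.$$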
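(i) For θ ∈ −Ω, we have

$$\begin{aligned}
\mathbb{E}\Big(\sum_{i=1}^{r} (s_i - s_{i+1})(P_i(X))^{-1} e^{\langle\theta,X\rangle}\Big)
&= \int_\Omega \sum_{i=1}^{r} (s_i - s_{i+1})(P_i(x))^{-1}\, \frac{1}{\Gamma_\Omega(s)\Delta_s(\sigma^{-1})}\, e^{-\langle\sigma-\theta,x\rangle}\Delta_{s-\frac{n}{r}}(x)\,dx \\
&= \sum_{i=1}^{r} (s_i - s_{i+1}) \int_\Omega (P_i(x))^{-1}\, \frac{1}{\Gamma_\Omega(s)\Delta_s(\sigma^{-1})}\, e^{-\langle\sigma-\theta,x\rangle}\Delta_{s-\frac{n}{r}}(x)\,dx \\
&= \sum_{i=1}^{r} (s_i - s_{i+1}) \Big(\int_\Omega (P_i(x))^{-1}\, \frac{1}{\Gamma_\Omega(s)\Delta_s((\sigma-\theta)^{-1})}\, e^{-\langle\sigma-\theta,x\rangle}\Delta_{s-\frac{n}{r}}(x)\,dx\Big) \times \frac{\Delta_s((\sigma-\theta)^{-1})}{\Delta_s(\sigma^{-1})} \\
&= (\sigma-\theta)\,L_X(\theta),
\end{aligned}$$

where in the last equality, we have used Theorem 6.5. Thus we have

$$\mathbb{E}\Big(\sum_{i=1}^{r} (s_i - s_{i+1})(P_i(X))^{-1} e^{\langle\theta,X\rangle}\Big) = (\sigma-\theta)\,L_X(\theta).$$

This implies that

$$\mathbb{E}\Big(\sum_{i=1}^{r} (\alpha_i - \alpha_{i+1})(P_i(X+Y))^{-1} e^{\langle\theta,X+Y\rangle}\Big) = (\sigma-\theta)\,L_{X+Y}(\theta).$$

It follows that

$$\sum_{i=1}^{r} (s_i - s_{i+1})\,\mathbb{E}\big((P_i(X))^{-1} \mid X+Y\big) = \sum_{i=1}^{r} (\alpha_i - \alpha_{i+1})(P_i(X+Y))^{-1}.$$

From this we deduce that

$$\sum_{i=1}^{r} (s_i - s_{i+1})\,\mathbb{E}\big((P_i(X))^{-1} \mid X+Y\big) = \sum_{i=1}^{r} (s_i' - s_{i+1}')\,\mathbb{E}\big((P_i(Y))^{-1} \mid X+Y\big).$$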
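(ii) By Theorem 6.5, we have

$$\sum_{i=1}^{r} (s_i - s_{i+1})\,\mathbb{E}\big((P_i(X))^{-1} \otimes X\big) = -\sum_{i=1}^{r} (s_{r-i} - s_{r-i+1})\,\sigma \otimes (P_i^*(\sigma))^{-1} - \mathrm{Id}_V.$$

Hence

$$\begin{aligned}
\sum_{i=1}^{r} (s_i - s_{i+1})\,\mathbb{E}\big((P_i(X))^{-1} \otimes X e^{\langle\theta,X\rangle}\big)
&= \sum_{i=1}^{r} (s_i - s_{i+1}) \int_\Omega (P_i(x))^{-1} \otimes x\, \frac{\Delta_{s-\frac{n}{r}}(x)}{\Gamma_\Omega(s)\Delta_s(\sigma^{-1})}\, e^{-\langle\sigma-\theta,x\rangle}\,dx \\
&= L_X(\theta)\Big(-\mathrm{Id}_V - \sum_{i=1}^{r} (s_{r-i} - s_{r-i+1})(\sigma-\theta) \otimes (P_i^*(\sigma-\theta))^{-1}\Big).
\end{aligned}$$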
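From this, we deduce that

$$\sum_{i=1}^{r} (\alpha_i - \alpha_{i+1})\,\mathbb{E}\big((P_i(X+Y))^{-1} \otimes (X+Y) e^{\langle\theta,X+Y\rangle}\big) = L_{X+Y}(\theta)\Big(-\sum_{i=1}^{r} (\alpha_{r-i} - \alpha_{r-i+1})(\sigma-\theta) \otimes (P_i^*(\sigma-\theta))^{-1} - \mathrm{Id}_V\Big).$$

Then

$$\begin{aligned}
\sum_{i=1}^{r} (\alpha_i - \alpha_{i+1})\,\mathbb{E}\big((P_i(X+Y))^{-1} \otimes (X+Y) e^{\langle\theta,X+Y\rangle}\big)
= L_{X+Y}(\theta)\Big(& -\sum_{i=1}^{r} (s_{r-i} - s_{r-i+1})(\sigma-\theta) \otimes (P_i^*(\sigma-\theta))^{-1} \\
& - \sum_{i=1}^{r} (s_{r-i}' - s_{r-i+1}')(\sigma-\theta) \otimes (P_i^*(\sigma-\theta))^{-1} - \mathrm{Id}_V\Big).
\end{aligned}$$

This can be rewritten as

$$\sum_{i=1}^{r} (s_i - s_{i+1})\,\mathbb{E}\big((P_i(X))^{-1} \otimes X e^{\langle\theta,X+Y\rangle}\big) + \sum_{i=1}^{r} (s_i' - s_{i+1}')\,\mathbb{E}\big((P_i(Y))^{-1} \otimes Y e^{\langle\theta,X+Y\rangle}\big) + L_{X+Y}(\theta)\,\mathrm{Id}_V.$$

We finally conclude that

$$\sum_{i=1}^{r} (s_i - s_{i+1})\,\mathbb{E}\big((P_i(X))^{-1} \otimes X \mid X+Y\big) + \sum_{i=1}^{r} (s_i' - s_{i+1}')\,\mathbb{E}\big((P_i(Y))^{-1} \otimes Y \mid X+Y\big) = \sum_{i=1}^{r} (\alpha_i - \alpha_{i+1})(P_i(X+Y))^{-1} \otimes (X+Y) - \mathrm{Id}_V.$$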
Note that when all the $s_i$ are equal and all the $s_i'$ are equal, we get as a particular case the properties of the Wishart distribution established by Letac and Massam.
6.2.3 Constant regression of a scalar statistic

In [53], Heller has shown that the statistic

$$S = \sum_{\substack{1\le i,j\le k \\ i\ne j}} a_{ij}\big(c_r(p-1)\Delta(X_i)\Delta(X_j) - c_r(p)\Delta(X_i)^2\big), \tag{6.15}$$

where the $a_{ij}$ are real numbers and $c_r(p) = \prod_{l=1}^{r}\big(p - \frac{l-1}{2}\big)$, has a zero regression on the sample mean of a Wishart distribution. The problem of extending Heller's result to the Riesz probability distribution has been considered in [75]. The next theorem gives a scalar statistic which has a constant regression on the empirical mean of a sample having a Riesz probability distribution. From this statistic, we derive a scalar statistic corresponding to the Wishart probability distribution which is more general than that given by Heller.

Theorem 6.8. Let $X_1, \dots, X_k$ be a sequence of independent and identically distributed random variables with Riesz probability distribution $R_r(s,\sigma)$. Then for all $m = (m_1, \dots, m_r) \in \mathbb{N}^r$ such that $m_1 \ge \cdots \ge m_r > 0$, and all real numbers $a_{ij}$, $1 \le i,j \le k$, $i \ne j$, the scalar statistic

$$\sum_{\substack{1\le i,j\le k \\ i\ne j}} a_{ij}\big((s+m)_m\Delta_m(X_i)\Delta_m(X_j) - (s)_m(\Delta_m(X_j))^2\big)$$

has a zero regression on the statistic $M = X_1 + \cdots + X_k$.
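Proof. It suffices to show that for $1 \le i,j \le k$, $i \ne j$, the statistic

$$E_{ij} = (s+m)_m\Delta_m(X_i)\Delta_m(X_j) - (s)_m(\Delta_m(X_j))^2$$

has a zero regression on M. From (3.11), we have that the Laplace transform of the Riesz distribution is defined for θ in σ − Ω by

$$L_{R_r(s,\sigma)}(\theta) = \frac{\Delta_s((\sigma-\theta)^{-1})}{\Delta_s(\sigma^{-1})}.$$

For the sake of simplicity, we set $\omega = \sigma - \theta$ and define

$$\varphi(\omega) = \frac{\Delta_s(\omega^{-1})}{\Delta_s(\sigma^{-1})}.$$

By (2.7), we have

$$\varphi(\omega) = \frac{\Delta^*_{-s^*}(\omega)}{\Delta_s(\sigma^{-1})}. \tag{6.16}$$

We also have

$$\varphi(\omega) = \frac{1}{\Gamma_\Omega(s)\Delta_s(\sigma^{-1})} \int_\Omega e^{-\langle\omega,x\rangle}\Delta_{s-\frac{n}{r}}(x)\,dx.$$

Now let $m = (m_1, \dots, m_r) \in \mathbb{N}^r$ such that $m_1 \ge \cdots \ge m_r > 0$. Given that for a polynomial $p$ in $V$, the associated differential operator $p(\frac{\partial}{\partial x})$ is such that

$$p\Big(\frac{\partial}{\partial x}\Big)e^{\langle x,y\rangle} = p(y)\,e^{\langle x,y\rangle},$$

we have that

$$\Delta_m\Big(\frac{\partial}{\partial\omega}\Big)\varphi(\omega) = \frac{(-1)^{|m|}}{\Gamma_\Omega(s)\Delta_s(\sigma^{-1})} \int_\Omega e^{-\langle\omega,x\rangle}\Delta_{m+s-\frac{n}{r}}(x)\,dx.$$

Therefore

$$\Delta_m\Big(\frac{\partial}{\partial\omega}\Big)\varphi(\omega) = (-1)^{|m|}\,\mathbb{E}\big(e^{\langle\theta,X\rangle}\Delta_m(X)\big). \tag{6.17}$$

Applying again $\Delta_m(\frac{\partial}{\partial\omega})$ and denoting

$$\Delta_m^2\Big(\frac{\partial}{\partial\omega}\Big) = \Delta_m\Big(\frac{\partial}{\partial\omega}\Big)\Delta_m\Big(\frac{\partial}{\partial\omega}\Big),$$

we get

$$\Delta_m^2\Big(\frac{\partial}{\partial\omega}\Big)\varphi(\omega) = \mathbb{E}\big(e^{\langle\theta,X\rangle}(\Delta_m(X))^2\big). \tag{6.18}$$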
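On the other hand, given that

$$\Delta_m\Big(\frac{\partial}{\partial x}\Big)\Delta^*_s(x) = (-1)^{|m|}(-s^*)_m\,\Delta^*_{s-m^*}(x),$$

applying $\Delta_m(\frac{\partial}{\partial\omega})$ to (6.16), we obtain

$$\Delta_m\Big(\frac{\partial}{\partial\omega}\Big)\varphi(\omega) = (-1)^{|m|}(s)_m\,\frac{\Delta^*_{-s^*-m^*}(\omega)}{\Delta_s(\sigma^{-1})}.$$

Therefore

$$\Delta_m^2\Big(\frac{\partial}{\partial\omega}\Big)\varphi(\omega) = (s)_m(s+m)_m\,\frac{\Delta^*_{-s^*-2m^*}(\omega)}{\Delta_s(\sigma^{-1})}.$$

It follows that

$$(s+m)_m\Big(\Delta_m\Big(\frac{\partial}{\partial\omega}\Big)\varphi(\omega)\Big)^2 - (s)_m\,\varphi(\omega)\,\Delta_m^2\Big(\frac{\partial}{\partial\omega}\Big)\varphi(\omega) = 0.$$

This, using (6.17) and (6.18), implies

$$(s+m)_m\big(\mathbb{E}(e^{\langle\theta,X\rangle}\Delta_m(X))\big)^2 = (s)_m\,\mathbb{E}(e^{\langle\theta,X\rangle})\,\mathbb{E}\big(e^{\langle\theta,X\rangle}(\Delta_m(X))^2\big).$$

Since $X_1, \dots, X_k$ are independent and identically distributed, this implies that, for all $1 \le i,j \le k$, $i \ne j$,

$$(s+m)_m\,\mathbb{E}\big(e^{\langle\theta,X_i\rangle}\Delta_m(X_i)\,e^{\langle\theta,X_j\rangle}\Delta_m(X_j)\big) - (s)_m\,\mathbb{E}\big(e^{\langle\theta,X_i\rangle}e^{\langle\theta,X_j\rangle}(\Delta_m(X_j))^2\big) = 0.$$

Multiplying by $\prod_{1\le l\le k,\ l\ne i,\ l\ne j} \mathbb{E}(e^{\langle\theta,X_l\rangle})$, we get

$$\mathbb{E}\Big(e^{\langle\theta,M\rangle}\big((s+m)_m\Delta_m(X_i)\Delta_m(X_j) - (s)_m(\Delta_m(X_j))^2\big)\Big) = 0.$$

According to Proposition 6.6, we deduce that the statistic

$$E_{ij} = (s+m)_m\Delta_m(X_i)\Delta_m(X_j) - (s)_m(\Delta_m(X_j))^2$$

has a zero regression on $M = X_1 + \cdots + X_k$, and the proof of Theorem 6.8 is complete.

If we set $s_1 = \cdots = s_r = p$ in Theorem 6.8, we get the following statement which gives a scalar statistic corresponding to the Wishart probability distribution.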
Corollary 6.9. Let $X_1, \dots, X_k$ be a sequence of independent and identically distributed random variables with Wishart probability distribution $W(p,\sigma)$. Then for all $m = (m_1, \dots, m_r) \in \mathbb{N}^r$ such that $m_1 \ge \cdots \ge m_r > 0$, and all real numbers $a_{ij}$, $1 \le i,j \le k$, $i \ne j$, the scalar statistic

$$\sum_{\substack{1\le i,j\le k \\ i\ne j}} a_{ij}\big((p+m)_m\Delta_m(X_i)\Delta_m(X_j) - (p)_m(\Delta_m(X_j))^2\big) \tag{6.19}$$

has a zero regression on $M = X_1 + \cdots + X_k$.

Note that the statistic (6.19) is more general than that in (6.15) given by Heller. In fact, Heller's statistic corresponds to the particular case where $m = (1, \dots, 1)$.
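The key identity in the proof of Theorem 6.8 can be verified in closed form when r = 1. The sketch below assumes that in rank one the Riesz distribution R(s, σ) is the gamma distribution with shape s and rate σ (which matches the Laplace transform $(\sigma/(\sigma-\theta))^s$ used above), that $\Delta_m(x) = x^m$, and that $(s)_m = \Gamma(s+m)/\Gamma(s)$. The function names are hypothetical.

```python
import math


def pochhammer(a, m):
    """(a)_m = Gamma(a + m) / Gamma(a), for a > 0."""
    return math.gamma(a + m) / math.gamma(a)


def tilted_moment(k, s, sigma, theta):
    """E(X^k exp(theta X)) for X ~ gamma(shape s, rate sigma), theta < sigma."""
    return pochhammer(s, k) * sigma ** s / (sigma - theta) ** (s + k)


def identity_sides(s, sigma, theta, m):
    """Both sides of (s+m)_m E(e^{tX} X^m)^2 = (s)_m E(e^{tX}) E(e^{tX} X^{2m})."""
    lhs = pochhammer(s + m, m) * tilted_moment(m, s, sigma, theta) ** 2
    rhs = (pochhammer(s, m) * tilted_moment(0, s, sigma, theta)
           * tilted_moment(2 * m, s, sigma, theta))
    return lhs, rhs


# Arbitrary illustrative parameter values.
lhs, rhs = identity_sides(s=2.5, sigma=3.0, theta=1.0, m=2)
```

The two sides agree because $(s)_{2m} = (s)_m (s+m)_m$, which is the rank-one form of the relation between $\Delta_m$ and $\Delta_m^2$ applied to φ.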
7 Beta Riesz probability distributions

https://doi.org/10.1515/9783110713374-007

In the first section of this chapter, we introduce the beta function on a symmetric cone which will serve in the definition of the beta–Riesz probability distribution. In the sequel, we also introduce the hypergeometric functions which will be used later.

7.1 Beta and hypergeometric functions

7.1.1 Beta function

Consider for $x$ in $\Omega$, $s$ and $s'$ in $\mathbb{R}^r$ the integral

$$I(x) = \int_{\Omega\cap(x-\Omega)} \Delta_{s-\frac{n}{r}}(y)\,\Delta_{s'-\frac{n}{r}}(x-y)\,dy.$$

Then using Fubini theorem, we have

$$\int_\Omega e^{-\operatorname{tr}(x)} I(x)\,dx = \int_{\{(x,y)\in\Omega\times\Omega\,;\ x-y\in\Omega\}} e^{-\operatorname{tr}(x)}\Delta_{s-\frac{n}{r}}(y)\,\Delta_{s'-\frac{n}{r}}(x-y)\,dx\,dy.$$

Setting $z = x - y$, we get

$$\int_\Omega e^{-\operatorname{tr}(x)} I(x)\,dx = \int_\Omega e^{-\operatorname{tr}(y)}\Delta_{s-\frac{n}{r}}(y)\,dy \int_\Omega e^{-\operatorname{tr}(z)}\Delta_{s'-\frac{n}{r}}(z)\,dz.$$

Using Theorem 2.13, we deduce that the integral is convergent for $s$ and $s'$ in $\prod_{i=1}^{r}\,](i-1)\frac{d}{2}, +\infty[$, and that

$$\int_\Omega e^{-\operatorname{tr}(x)} I(x)\,dx = \Gamma_\Omega(s)\Gamma_\Omega(s'). \tag{7.1}$$

This implies that the integral $I(x)$ is absolutely convergent for $s$ and $s'$ in $\prod_{i=1}^{r}\,](i-1)\frac{d}{2}, +\infty[$. Now, we can write

$$I(x) = \Delta_{s'-\frac{n}{r}}(x) \int_{\Omega\cap(x-\Omega)} \Delta_{s-\frac{n}{r}}(y)\,\Delta_{s'-\frac{n}{r}}(e - \pi^{-1}(x)y)\,dy.$$

Making the change of variable $z = \pi^{-1}(x)y$, we have that $dy = \Delta^{\frac{n}{r}}(x)\,dz$ and $\Delta_{s-\frac{n}{r}}(y) = \Delta_{s-\frac{n}{r}}(x)\,\Delta_{s-\frac{n}{r}}(z)$. Therefore we have

$$I(x) = \Delta_{s+s'-\frac{n}{r}}(x) \int_{\Omega\cap(e-\Omega)} \Delta_{s-\frac{n}{r}}(z)\,\Delta_{s'-\frac{n}{r}}(e-z)\,dz.$$

In other words,

$$I(x) = \Delta_{s+s'-\frac{n}{r}}(x)\,I(e),$$

so that

$$\int_\Omega e^{-\operatorname{tr}(x)} I(x)\,dx = I(e)\int_\Omega e^{-\operatorname{tr}(x)}\Delta_{s+s'-\frac{n}{r}}(x)\,dx = I(e)\,\Gamma_\Omega(s+s').$$
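Comparing with (7.1), we obtain

$$I(e) = \frac{\Gamma_\Omega(s)\Gamma_\Omega(s')}{\Gamma_\Omega(s+s')}. \tag{7.2}$$

Therefore

$$I(x) = \Delta_{s+s'-\frac{n}{r}}(x)\,\frac{\Gamma_\Omega(s)\Gamma_\Omega(s')}{\Gamma_\Omega(s+s')}. \tag{7.3}$$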
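In rank one, where $n = r = 1$ and $\Omega = \,]0,+\infty[$, formula (7.3) reduces to the classical convolution identity $\int_0^x y^{s-1}(x-y)^{s'-1}\,dy = x^{s+s'-1}\,\Gamma(s)\Gamma(s')/\Gamma(s+s')$. The sketch below compares a midpoint-rule approximation of the left-hand side with the closed form. The quadrature rule, the step count and the parameter values are illustrative choices, not part of the text.

```python
import math


def convolution_integral(x, s, t, steps=20000):
    """Midpoint rule for I(x) = int_0^x y^(s-1) * (x - y)^(t-1) dy (rank one)."""
    h = x / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y ** (s - 1) * (x - y) ** (t - 1)
    return h * total


def closed_form(x, s, t):
    """Right-hand side of (7.3) in rank one, with t playing the role of s'."""
    return x ** (s + t - 1) * math.gamma(s) * math.gamma(t) / math.gamma(s + t)


numeric = convolution_integral(1.7, 2.5, 2.0)
exact = closed_form(1.7, 2.5, 2.0)
```

Parameters with $s, s' \ge 2$ keep the integrand smooth at the endpoints, so the midpoint rule converges quickly; smaller parameters would call for a more careful quadrature.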
The integral $I(e)$ is called the beta function of the symmetric cone $\Omega$, denoted $B_\Omega(s,s')$ and given by

$$B_\Omega(s,s') = \int_{\Omega\cap(e-\Omega)} \Delta_{s-\frac{n}{r}}(x)\,\Delta_{s'-\frac{n}{r}}(e-x)\,dx. \tag{7.4}$$

The function $B_\Omega(s,s')$ is then defined for $s$ and $s'$ in $\prod_{i=1}^{r}\,](i-1)\frac{d}{2}, +\infty[$, and we have that

$$B_\Omega(s,s') = \frac{\Gamma_\Omega(s)\Gamma_\Omega(s')}{\Gamma_\Omega(s+s')}.$$

In the case where $r = 1$, we usually write $B(\cdot,\cdot)$ instead of $B_\Omega(\cdot,\cdot)$.

7.1.2 Hypergeometric functions

The matrix hypergeometric functions arise in a diverse range of applications in probability theory and multivariate statistics. We next recall, in the setting of a Jordan algebra, some facts concerning these functions. The hypergeometric function ${}_pF_q$ on the cone $\Omega$ is defined in a general way for $\alpha_i = (\alpha_{i1}, \dots, \alpha_{ir})$ in $\mathbb{C}^r$, $i = 1, \dots, p$, and for $\beta_j = (\beta_{j1}, \dots, \beta_{jr})$ in $\mathbb{C}^r$, $j = 1, \dots, q$, by

$${}_pF_q(\alpha_1, \dots, \alpha_p; \beta_1, \dots, \beta_q; x) = \sum_{m\ge 0} \frac{(\alpha_1)_m \cdots (\alpha_p)_m}{(\beta_1)_m \cdots (\beta_q)_m}\,\frac{1}{(\frac{n}{r})_m}\,d_m\,\phi_m(x), \tag{7.5}$$

where $d_m$ and $\phi_m$ are defined in (2.25) and (2.22), respectively.
It is also defined by the same expression (7.5) for $\alpha_i$ and $\beta_j$ in $\mathbb{C}$. In this case, assuming that none of the numbers $\beta_1, \dots, \beta_q$ belongs to the set

$$A = \{a \in \mathbb{C} : \exists m \ge 0,\ (a)_m = 0\}, \tag{7.6}$$

it is shown in [31, p. 318] that if the numbers $\alpha_1, \dots, \alpha_p$ do not all belong to $A$, then the domain 𝒟 of convergence of this series is:
– $V$, if $p \le q$,
– $D = \{w : |w| < 1\}$, if $p = q+1$, where $|\cdot|$ is the spectral norm. Writing $w = k(\sum_{i=1}^{r} \lambda_i c_i)$ with $k$ in $K$ and $\lambda_i \ge 0$, this norm is defined by $|w| = \sup \lambda_i$; it is invariant with respect to the orthogonal group.
– $\{0\}$, if $p > q+1$.

We have in particular that

$${}_0F_0(x) = e^{\operatorname{tr}(x)} \quad \text{for } x \text{ in } V, \qquad {}_1F_0(\alpha, x) = \Delta^{-\alpha}(e-x) \quad \text{for } x \text{ in } D.$$
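The two special cases above are easy to test in rank one. The sketch below assumes that for r = 1 one has $d_m = 1$, $\phi_m(x) = x^m$ and $(n/r)_m = (1)_m = m!$, so that (7.5) becomes the classical series; with these assumptions ${}_0F_0(x) = e^x$ and ${}_1F_0(\alpha, x) = (1-x)^{-\alpha}$ for $|x| < 1$. The function names and truncation orders are hypothetical.

```python
import math


def zero_f_zero(x, terms=60):
    """Partial sum of sum_m x^m / m!, the rank-one form of 0F0."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= x / (m + 1)
    return total


def one_f_zero(alpha, x, terms=200):
    """Partial sum of sum_m (alpha)_m x^m / m!, the rank-one form of 1F0, |x| < 1."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (alpha + m) * x / (m + 1)
    return total


exp_gap = zero_f_zero(1.3) - math.exp(1.3)
binomial_gap = one_f_zero(1.5, 0.3) - (1 - 0.3) ** (-1.5)
```

Each term is obtained from the previous one by a simple ratio, which avoids computing Pochhammer symbols and factorials separately.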
The following result serves in the definition of the beta–hypergeometric probability distributions.

Proposition 7.1. Let $p \le q+1$, $a > (r-1)\frac{d}{2}$, and $b > (r-1)\frac{d}{2}$. For $g$ in $G$ such that $\delta = g(e)$ is in $\Omega\cap(e-\Omega)$, we have

$$\int_{\Omega\cap(e-\Omega)} \Delta^{a-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\,{}_pF_q(\alpha_1, \dots, \alpha_p; \beta_1, \dots, \beta_q; g(x))\,dx = \frac{\Gamma_\Omega(a)\Gamma_\Omega(b)}{\Gamma_\Omega(a+b)}\,{}_{p+1}F_{q+1}(\alpha_1, \dots, \alpha_p, a; \beta_1, \dots, \beta_q, a+b; \delta). \tag{7.7}$$
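Proof. From (7.4), we have that

$$\int_{\Omega\cap(e-\Omega)} \Delta_{m+a-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\,dx = \frac{\Gamma_\Omega(m+a)\Gamma_\Omega(b)}{\Gamma_\Omega(m+a+b)}.$$

Changing $x$ by $k(x)$ with $k \in K$ and integrating over $K$, we get

$$\int_{\Omega\cap(e-\Omega)} \phi_m(x)\,\Delta^{a-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\,dx = \frac{\Gamma_\Omega(m+a)\Gamma_\Omega(b)}{\Gamma_\Omega(m+a+b)}.$$

Thus

$$\begin{aligned}
\int_{\Omega\cap(e-\Omega)} \phi_m(g(x))\,\Delta^{a-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\,dx
&= \int_{\Omega\cap(e-\Omega)} \Big(\int_K \phi_m(gkx)\,dk\Big)\Delta^{a-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\,dx \\
&= \int_{\Omega\cap(e-\Omega)} \phi_m(\delta)\,\phi_m(x)\,\Delta^{a-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\,dx \quad \text{(because of (2.23))} \\
&= \frac{\Gamma_\Omega(m+a)\Gamma_\Omega(b)}{\Gamma_\Omega(m+a+b)}\,\phi_m(\delta).
\end{aligned}$$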
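Integrating the hypergeometric series term by term, we get the desired result.

Note that taking $p = 1$, $q = 0$, and $g = \pi(\delta)$ in Proposition 7.1, we get the Euler integral representation of the Gauss hypergeometric function:

$${}_2F_1(\alpha_1, a; b; \delta) = \frac{\Gamma_\Omega(b)}{\Gamma_\Omega(a)\Gamma_\Omega(b-a)} \int_{\Omega\cap(e-\Omega)} \Delta^{a-\frac{n}{r}}(x)\,\Delta^{b-a-\frac{n}{r}}(e-x)\,\Delta^{-\alpha_1}(e-\pi(\delta)x)\,dx, \tag{7.8}$$

and taking $p = 2$ and $q = 1$, we get

$${}_3F_2(\alpha_1, \alpha_2, a; \beta_1, b; \delta) = \frac{\Gamma_\Omega(b)}{\Gamma_\Omega(a)\Gamma_\Omega(b-a)} \int_{\Omega\cap(e-\Omega)} \Delta^{a-\frac{n}{r}}(x)\,\Delta^{b-a-\frac{n}{r}}(e-x)\,{}_2F_1(\alpha_1, \alpha_2; \beta_1; \pi(\delta)x)\,dx. \tag{7.9}$$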
We first show that under the condition $p = q+1$, the identity (7.7) extends to $\delta = e$. In other words, we show that the series ${}_{q+1}F_q(\delta)$ converges when $\delta = e$. In the matrix setting, this has been given in a broader context (see [99] or [39]).

Proposition 7.2. Let $\alpha_j = (\alpha_{j1}, \dots, \alpha_{jr})$ and $\beta_j = (\beta_{j1}, \dots, \beta_{jr})$ be in $\mathbb{R}^r$ such that none of the $\alpha_j$ and $\beta_j$ is in the set $A$ defined in (7.6). Denote for $i = 1, \dots, r$,

$$c_i = \sum_{1\le j\le q} \beta_{ji} - \sum_{1\le j\le q+1} \alpha_{ji}.$$

Then the series

$$\sum_{m\ge 0} \frac{(\alpha_1)_m \cdots (\alpha_{q+1})_m}{(\beta_1)_m \cdots (\beta_q)_m}\,\frac{1}{(\frac{n}{r})_m}\,d_m \tag{7.10}$$

converges if and only if, for all $1 \le k \le r$,

$$\sum_{i=1}^{k} c_i > 1 + k(r-k) - k\,\frac{n}{r}.$$

Note that in particular $c_1 > (r-1)(1 - \frac{d}{2})$, and when $r = 1$, the condition reduces to $c_1 > 0$.
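For r = 1 and q = 1 the series (7.10) is the Gauss series evaluated at 1, and the condition $c_1 > 0$ is the classical requirement $\beta_1 - \alpha_1 - \alpha_2 > 0$, under which the sum equals $\Gamma(c)\Gamma(c-a-b)/(\Gamma(c-a)\Gamma(c-b))$ (Gauss's summation theorem, a standard fact not stated in the text). The sketch below, with hypothetical names and parameter values, contrasts a convergent case with the borderline case $c_1 = 0$, where the terms reduce to the harmonic series.

```python
import math


def gauss_partial_sum(a, b, c, terms):
    """Partial sum of sum_m (a)_m (b)_m / ((c)_m m!), i.e. (7.10) with r = 1, q = 1."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (a + m) * (b + m) / ((c + m) * (m + 1))
    return total


def gauss_value_at_one(a, b, c):
    """Gauss's summation formula, valid when c - a - b > 0."""
    return math.gamma(c) * math.gamma(c - a - b) / (math.gamma(c - a) * math.gamma(c - b))


# Convergent case: c_1 = 4 - 0.5 - 0.75 = 2.75 > 0.
convergent_gap = gauss_partial_sum(0.5, 0.75, 4.0, 20000) - gauss_value_at_one(0.5, 0.75, 4.0)

# Borderline case: c_1 = 2 - 1 - 1 = 0, the terms are 1/(m+1) and the sums keep growing.
harmonic_like = gauss_partial_sum(1.0, 1.0, 2.0, 10000)
```

In the borderline case the partial sum after 10000 terms is already above 9 and grows like a logarithm, which is consistent with divergence.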
Proof. Let αj = (αj1 , . . . , αjr ) and βj = (βj1 , . . . , βjr ) be in ℝr . We will consider separately two cases: – Case where αji > (i − 1) d2 and βji > (i − 1) d2 for all i = 1, . . . , r. Denote pi = mi − mi+1 , for i = 1, . . . , r, where mr+1 = 0. Then (p1 , . . . pr ) ∈ ℕr and mi = ∑rk=i pk . Using the approximation of dm , namely dm ≈ ∏ (1 + mi − mj )d ,
when mr → +∞,
1≤i 1 + k(r − k) − k j=1
n r
for all 1 ≤ k ≤ i − 1.
Remark 7.1. In the particular case where $\alpha_j$ and $\beta_j$ are real numbers such that $\beta_j - (i-1)\frac{d}{2}$ is a nonnegative integer for all $1 \le j \le q$ and $1 \le i \le r$, the series (7.10) converges if and only if

$$\sum_{1\le j\le q} \beta_j - \sum_{1\le j\le q+1} \alpha_j > (r-1)\frac{d}{2}.$$

Now from Proposition 7.2, we deduce that

$$\int_{\Omega\cap(e-\Omega)} \Delta^{a-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\,{}_pF_q(\alpha_1, \dots, \alpha_p; \beta_1, \dots, \beta_q; x)\,dx = \frac{\Gamma_\Omega(a)\Gamma_\Omega(b)}{\Gamma_\Omega(a+b)}\,{}_{p+1}F_{q+1}(\alpha_1, \dots, \alpha_p, a; \beta_1, \dots, \beta_q, a+b; e). \tag{7.11}$$

The Gauss hypergeometric function ${}_2F_1$ verifies the following useful formula (see [31, p. 330]):

$${}_2F_1(\beta_1 - \alpha_1, \beta_1 - \alpha_2; \beta_1; x) = \Delta^{\alpha_1+\alpha_2-\beta_1}(e-x)\,{}_2F_1(\alpha_1, \alpha_2; \beta_1; x). \tag{7.12}$$
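In rank one, (7.12) is Euler's transformation ${}_2F_1(c-a, c-b; c; x) = (1-x)^{a+b-c}\,{}_2F_1(a, b; c; x)$ for $|x| < 1$. The sketch below checks it numerically by summing the Gauss series; the truncation order and the parameter values are arbitrary illustrative choices, and the names are hypothetical.

```python
def gauss_2f1(a, b, c, x, terms=400):
    """Partial sum of the Gauss series sum_m (a)_m (b)_m / ((c)_m m!) x^m, |x| < 1."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= (a + m) * (b + m) * x / ((c + m) * (m + 1))
    return total


def euler_gap(alpha1, alpha2, beta1, x):
    """Left-hand side minus right-hand side of (7.12) in rank one."""
    lhs = gauss_2f1(beta1 - alpha1, beta1 - alpha2, beta1, x)
    rhs = (1 - x) ** (alpha1 + alpha2 - beta1) * gauss_2f1(alpha1, alpha2, beta1, x)
    return lhs - rhs


gap = euler_gap(0.3, 1.2, 2.5, 0.4)
```

For $x = 0.4$ the neglected tail is of order $0.4^{400}$, so the gap reflects rounding error only.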
7.2 Beta–Riesz probability distributions

The beta distribution on the real line ℝ with parameters $p > 0$ and $q > 0$ is defined by

$$\beta_{p,q}(dx) = \frac{1}{B(p,q)}\,x^{p-1}(1-x)^{q-1}\,\mathbf{1}_{]0,1[}(x)\,dx.$$

It is the distribution of the random variable $Z = \frac{U}{U+V}$, where $U$ and $V$ are two independent random variables with respective distributions $\gamma_{p,\sigma}$ and $\gamma_{q,\sigma}$. The extension of this distribution to the cone of positive definite symmetric matrices was introduced by Hsu [56] while studying the distribution of the roots of a certain equation. Since then, this distribution has arisen in various problems in multivariate statistical analysis. For instance, several test statistics in multivariate analysis of variance and covariance are functions of beta matrices. In Bayesian analysis, this distribution and some of its properties are used in preposterior analysis of parameters of normal multivariate regression models.

Using the general form of the beta function on the symmetric cone introduced in the preceding section, which involves the generalized power function, we define an important extension of the beta–Wishart distribution called the beta–Riesz distribution (see [46]). This distribution may also be defined in a way similar to that used in the construction of the real beta distribution from the gamma distribution. More precisely, the gamma distribution is replaced by the Riesz probability distribution, and for the quotient, we use the division algorithm based on the Cholesky decomposition of an element of the cone.

Some fundamental properties of the beta–Wishart distributions are extended to the beta–Riesz distributions. It is shown that the projection on a subalgebra of a beta–Riesz probability distribution is also a beta–Riesz probability distribution and that its expectation is a linear combination of the elements of the fixed Jordan frame in the algebra. As a corollary, we give the regression on the mean of a Riesz random variable, that is, the conditional expectation $\mathbb{E}(U \mid U+V)$ where $U$ and $V$ are two independent Riesz random variables.
Equation (7.4) leads to the following definition of a beta–Riesz probability distribution of the first kind; the beta–Riesz probability distribution of the second kind is deduced from its relation with that of the first kind.
Definition 7.1. The probability distribution

$$\beta^{(1)}_{s,s'}(dx) = \frac{1}{B_\Omega(s,s')}\,\Delta_{s-\frac{n}{r}}(x)\,\Delta_{s'-\frac{n}{r}}(e-x)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(x)\,dx,$$

defined for $s, s' \in \mathbb{R}^r$ such that $s_i > (i-1)\frac{d}{2}$, $s_i' > (i-1)\frac{d}{2}$ for all $1 \le i \le r$, is called the beta–Riesz probability distribution of the first kind on the symmetric cone $\Omega$ with parameters $s$ and $s'$.

This name is justified by the fact that this distribution is connected to the Riesz probability distribution. More precisely, we have

Theorem 7.3. Let $U$ and $W$ be two independent Riesz random variables, $U \sim R(s,\sigma)$ and $W \sim R(s',\sigma)$, with $s, s' \in \mathbb{R}^r$ such that $s_i > (i-1)\frac{d}{2}$, $s_i' > (i-1)\frac{d}{2}$ for all $1 \le i \le r$. If we set

$$Y = U + W \quad \text{and} \quad X = \pi^{-1}(U+W)(U),$$

then
(i) $X \sim \beta^{(1)}_{s,s'}$,
(ii) $Y \sim R(s+s',\sigma)$ and is independent of $X$.

Proof. Consider the transformation

$$\Omega\times\Omega \to (\Omega\cap(e-\Omega))\times\Omega, \qquad (u,w) \mapsto (x,y) = (\pi^{-1}(u+w)(u),\ u+w).$$

Its Jacobian is $\operatorname{Det}(\pi^{-1}(y)) = \Delta^{-\frac{n}{r}}(y)$. The probability density $f_{(X,Y)}(x,y)$ of $(X,Y)$ with respect to the Lebesgue measure is then given by

$$f_{(X,Y)}(x,y) = \frac{1}{\Gamma_\Omega(s)\Gamma_\Omega(s')\Delta_{s+s'}(\sigma^{-1})}\,\Delta_{s-\frac{n}{r}}(\pi(y)(x))\,\Delta_{s'-\frac{n}{r}}(\pi(y)(e-x)) \times \Delta^{\frac{n}{r}}(y)\,e^{-\langle\sigma,y\rangle}\,\mathbf{1}_{(\Omega\cap(e-\Omega))\times\Omega}(x,y).$$

This density may be written as

$$f_{(X,Y)}(x,y) = \frac{\Gamma_\Omega(s+s')}{\Gamma_\Omega(s)\Gamma_\Omega(s')}\,\Delta_{s-\frac{n}{r}}(x)\,\Delta_{s'-\frac{n}{r}}(e-x)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(x) \times \frac{1}{\Gamma_\Omega(s+s')\Delta_{s+s'}(\sigma^{-1})}\,e^{-\langle\sigma,y\rangle}\,\Delta_{s+s'-\frac{n}{r}}(y)\,\mathbf{1}_\Omega(y).$$

From this we deduce that $X$ and $Y$ are independent, that $Y \sim R(s+s',\sigma)$ and $X \sim \beta^{(1)}_{s,s'}$.

When $s_1 = \cdots = s_r = p$ and $s_1' = \cdots = s_r' = q$ with $p > (r-1)\frac{d}{2}$ and $q > (r-1)\frac{d}{2}$, the generalized power functions reduce to determinants and the random variables $U$ and $W$ are Wishart. In this case the results in Theorem 7.3 may be stated and proved independently of the choice of the division algorithm, and the corresponding beta probability distribution is written as

$$\beta^{(1)}_{p,q}(dx) = \frac{1}{B_\Omega(p,q)}\,\Delta^{p-\frac{n}{r}}(x)\,\Delta^{q-\frac{n}{r}}(e-x)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(x)\,dx.$$

It is the beta–Wishart probability distribution of the first kind on the symmetric cone $\Omega$ with parameters $p$ and $q$. The beta–Wishart distributions on the cone of positive symmetric matrices have been extensively studied in the literature. Some of their properties may be deduced as particular statements from the results which will be established in the present chapter for the general beta–Riesz distributions. We will, however, devote the next chapter to some results concerning the beta–Wishart probability distribution. In the rest of the chapter, when we say beta–Riesz, we mean beta–Riesz of the first kind.

The following theorem gives a property of independence related to the beta–Riesz distribution. The method of proof is the one used in [48].
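To make the division $X = \pi^{-1}(U+W)(U)$ of Theorem 7.3 concrete, the sketch below works with 2×2 symmetric matrices. It assumes, as an illustration only, that $\pi(y)$ acts by $x \mapsto T x T^{t}$, where $T$ is the lower-triangular Cholesky factor of $y$; the helper names and the sample matrices are hypothetical. The code checks the deterministic part of the statement: $X$ and $e - X$ are both positive definite, and $e - X$ is the same division applied to $W$.

```python
import math


def cholesky2(y):
    """Lower-triangular T with T T^t = y, for a 2x2 positive definite y."""
    t11 = math.sqrt(y[0][0])
    t21 = y[0][1] / t11
    t22 = math.sqrt(y[1][1] - t21 * t21)
    return [[t11, 0.0], [t21, t22]]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(p):
    return [[p[j][i] for j in range(2)] for i in range(2)]


def inverse_lower(t):
    """Inverse of a lower-triangular 2x2 matrix [[a, 0], [b, c]]."""
    a, b, c = t[0][0], t[1][0], t[1][1]
    return [[1 / a, 0.0], [-b / (a * c), 1 / c]]


def divide(u, w):
    """T^{-1} u T^{-t} with u + w = T T^t (assumed matrix form of pi^{-1}(u+w)(u))."""
    y = [[u[i][j] + w[i][j] for j in range(2)] for i in range(2)]
    t_inv = inverse_lower(cholesky2(y))
    return matmul(matmul(t_inv, u), transpose(t_inv))


def is_positive_definite(p):
    return p[0][0] > 0 and p[0][0] * p[1][1] - p[0][1] * p[1][0] > 0


u = [[2.0, 0.5], [0.5, 1.0]]     # sample value of U
w = [[1.0, -0.3], [-0.3, 2.0]]   # sample value of W
x = divide(u, w)
e_minus_x = [[(1.0 if i == j else 0.0) - x[i][j] for j in range(2)] for i in range(2)]
complement = divide(w, u)        # should coincide with e - x
```

Since $T^{-1}(u+w)T^{-t}$ is the identity, the two divisions add up to $e$, which is why $X$ always lands in $\Omega\cap(e-\Omega)$.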
Theorem 7.4. Let $X$ and $Y$ be two independent beta–Riesz random variables, $X \sim \beta^{(1)}_{s,s'}$ and $Y \sim \beta^{(1)}_{s+s',s''}$, where $s$, $s'$, $s''$ are such that $s_i, s_i', s_i'' > (i-1)\frac{d}{2}$ for $1 \le i \le r$. Then $\pi^{-1}(e - \pi(Y)X)(e - Y)$ and $e - \pi(Y)X$ are independent with distributions $\beta^{(1)}_{s'',s'}$ and $\beta^{(1)}_{s'+s'',s}$, respectively.

Proof. From the hypothesis on $X$ and $Y$, there exist independent Riesz random variables $U \sim R(s,e)$, $V \sim R(s',e)$, and $W \sim R(s'',e)$ such that $X = \pi^{-1}(U+V)(U)$ and $Y = \pi^{-1}(U+V+W)(U+V)$. The independence of $X$ and $Y$ is justified by the fact that $X = \pi^{-1}(U+V)(U)$ is independent of $U+V$ (see Theorem 7.3), and $(U,V)$ is independent of $W$. Using Lemma 2.11, we have that $\pi(Y)X = \pi^{-1}(U+V+W)(U)$. Therefore

$$e - \pi(Y)(X) = \pi^{-1}(U+V+W)(U+V+W) - \pi^{-1}(U+V+W)(U) = \pi^{-1}(U+V+W)(V+W).$$

Thus, we conclude that $e - \pi(Y)(X)$ is a $\beta^{(1)}_{s'+s'',s}$ random variable. Also we have that

$$\pi^{-1}(e - \pi(Y)X)(e - Y) = \pi^{-1}(V+W)(W).$$

This shows that $\pi^{-1}(e - \pi(Y)X)(e - Y)$ is a $\beta^{(1)}_{s'',s'}$ random variable and that it is independent of $V+W$. As it is also independent of $U$, and given that $e - \pi(Y)X$ is a function of $V+W$ and $U$, we conclude that $\pi^{-1}(e - \pi(Y)X)(e - Y)$ and $e - \pi(Y)X$ are independent.

The next theorem and Theorem 7.4 together give a characterization of the beta–Riesz distribution. For a proof, we refer the reader to [70].

Theorem 7.5. Let $X$ and $Y$ be two independent random variables with continuous positive density functions on $\Omega\cap(e-\Omega)$. Consider the random variables

$$U = \pi^{-1}(e - \pi(Y)X)(e - Y) \quad \text{and} \quad V = e - \pi(Y)X.$$

If $U$ and $V$ are independent, then there exist $s$, $s'$, and $s''$ such that $X \sim \beta^{(1)}_{s,s'}$ and $Y \sim \beta^{(1)}_{s+s',s''}$.
7.3 Projection of a beta–Riesz probability distribution

In this section, we confine our attention to some properties of the projections of a beta–Riesz random variable. The first property concerns the projection on the subalgebra $W^{(k)} = V(c_{r-k+1} + \cdots + c_r, 1)$. Recall that $P_k^*$ denotes the orthogonal projection on $W^{(k)}$ and that $\Omega_{e-c_{r-k}}$ is the symmetric cone of this subalgebra. We will denote by $n_k$ and $e_k^*$, respectively, the dimension and unit of $W^{(k)}$. We also denote by $\Delta^{*(k)}$ the determinant and by $\Delta_s^{*(k)}$, for $s$ in $\mathbb{R}^k$, the generalized power in the subalgebra $W^{(k)}$. We also recall that when $s = (s_1, \dots, s_r)$, we set $\hat{s}_j = (s_{j+1}, \dots, s_r)$. With these notations we have, in particular, that

$$n_{r-1} = (r-1) + (r-1)(r-2)\frac{d}{2} \quad \text{and} \quad \frac{n}{r} = \frac{n_{r-1}}{r-1} + \frac{d}{2},$$

and that

$$\Gamma_\Omega(s) = (2\pi)^{\frac{d}{2}(r-1)}\,\Gamma(s_1)\,\Gamma_{\Omega_{e-c_1}}\Big(\hat{s}_1 - \frac{d}{2}\Big).$$
For the proof of the theorem concerning the projection of a beta–Riesz random variable, we need to establish several intermediary technical results which involve the generalized Riesz integral defined on the Schwartz space on $V$, i.e., the space of $\mathcal{C}^\infty$ functions on $V$ with all derivatives rapidly decreasing. It is known (see [31, p. 139]) that any $x$ in $\Omega$ may be written in a unique manner as

$$x = t(uc_1 + v)(c_1 + y), \tag{7.13}$$

where $y \in \Omega_{e-c_1}$, $v \in V(c_1, \frac{1}{2})$ and $u > 0$, and that if $x = x_1 + x_{12} + x_0$ is the Peirce decomposition of $x$ with respect to $c_1$, we have

$$x_1 = u^2 c_1, \qquad x_{12} = uv, \qquad \text{and} \qquad x_0 = (v^2)_0 + y. \tag{7.14}$$
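For 2×2 symmetric matrices the decomposition (7.13) and the relations (7.14) can be written out explicitly. The sketch below assumes the matrix realization in which $c_1$ is the first diagonal idempotent and $t(uc_1+v)$ acts as $m \mapsto A\,m\,A^{t}$ with $A = \begin{pmatrix} u & 0 \\ v & 1 \end{pmatrix}$; this realization, the function names and the sample matrix are assumptions made for the illustration.

```python
import math


def peirce_parameters(x):
    """Recover (u, v, y) from a 2x2 positive definite x using (7.14)."""
    u = math.sqrt(x[0][0])   # x_1 = u^2
    v = x[0][1] / u          # x_12 = u v
    y = x[1][1] - v * v      # x_0 = v^2 + y
    return u, v, y


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def rebuild(u, v, y):
    """Compute A * diag(1, y) * A^t with A = [[u, 0], [v, 1]]."""
    a = [[u, 0.0], [v, 1.0]]
    a_t = [[u, v], [0.0, 1.0]]
    middle = [[1.0, 0.0], [0.0, y]]   # matrix form of c_1 + y
    return matmul(matmul(a, middle), a_t)


x = [[4.0, 1.0], [1.0, 3.0]]          # sample element of the cone
u, v, y = peirce_parameters(x)
x_again = rebuild(u, v, y)
```

Positivity of $y$ is equivalent here to positive definiteness of $x$, since $y = \det(x)/x_{11}$ in the 2×2 case.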
Moreover, when $x$ is in $\Omega\cap(e-\Omega)$, we have that $0 < u < 1$. In fact, if $x$ is in $\Omega\cap(e-\Omega)$, then $e - x$ is in $\Omega$, so that it may also be written in a unique manner as

$$e - x = t(u'c_1 + v')(c_1 + y'), \tag{7.15}$$

where $y' \in \Omega_{e-c_1}$, $v' \in V(c_1, \frac{1}{2})$ and $u' > 0$. Accordingly, the Peirce components of $e - x$ with respect to $c_1$ are

$$(e-x)_1 = u'^2 c_1, \qquad (e-x)_{12} = u'v', \qquad \text{and} \qquad (e-x)_0 = (v'^2)_0 + y'. \tag{7.16}$$

On the other hand, using (7.14), we have

$$(e-x)_1 = (1-u^2)c_1, \qquad (e-x)_{12} = -uv, \qquad \text{and} \qquad (e-x)_0 = (e-c_1) - (v^2)_0 - y. \tag{7.17}$$

Equating corresponding terms, we get $1 - u^2 = u'^2 > 0$. Therefore $u^2 < 1$, and as we already know that $u > 0$, we deduce that $0 < u < 1$.
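Thus, when $x$ is in $\Omega\cap(e-\Omega)$, we can replace in (7.13) $v$ by $\sqrt{1-u^2}\,z$, $z \in V(c_1, \frac{1}{2})$, and write $x$ as

$$x = t(uc_1 + \sqrt{1-u^2}\,z)(c_1 + y). \tag{7.18}$$

The Peirce components of $x$ with respect to $c_1$ are then

$$x_1 = u^2 c_1, \qquad x_{12} = u\sqrt{1-u^2}\,z, \qquad \text{and} \qquad x_0 = (1-u^2)(z^2)_0 + y. \tag{7.19}$$

For symmetric matrices this means that we write an element $x$ in $\Omega\cap(e-\Omega)$ as

$$x = \begin{pmatrix} u & 0 \\ \sqrt{1-u^2}\,z & \mathrm{Id}_{r-1} \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & y \end{pmatrix} \begin{pmatrix} u & \sqrt{1-u^2}\,z^{\mathsf{T}} \\ 0 & \mathrm{Id}_{r-1} \end{pmatrix},$$

where $y$ is an $(r-1)\times(r-1)$ matrix, $z \in \mathbb{R}^{r-1}$ is regarded as a column vector, $0 < u < 1$, and $\mathrm{Id}_{r-1}$ is the identity matrix of order $r-1$. Also, since $e - x$ is in $\Omega\cap(e-\Omega)$, it can be written in the form

$$e - x = P(u'c_1 + c_2 + \cdots + c_r)\,\tau_{c_1}(\sqrt{1-u'^2}\,z')(c_1 + y'),$$

and so, using (7.14), we easily obtain the following useful expression:

$$e - x = P(\sqrt{1-u^2}\,c_1 + c_2 + \cdots + c_r)\,\tau_{c_1}(-uz)\big(c_1 + e^*_{r-1} - y - (z^2)_0\big). \tag{7.20}$$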
Lemma 7.6. Let $u$ and $v$ be in $\Omega$ and let $x = \pi^{-1}(u+v)(u)$. Then for $1 \le j \le r$, we have

$$(P_j^*(x^{-1}))^{-1} = \pi^{-1}(u+v)\big((P_j^*(u^{-1}))^{-1}\big).$$

Proof. As $x$ is in $\Omega$, we can write $x = t_1(e)$ with $t_1$ in $T$. Then, according to Proposition 2.4, we have

$$(P_j^*(x^{-1}))^{-1} = t_1\Big(\sum_{i=r-j+1}^{r} c_i\Big).$$

As $x = \pi^{-1}(u+v)u$, we have that $u = \pi(u+v)\circ t_1(e)$, so it follows that

$$(P_j^*(u^{-1}))^{-1} = \pi(u+v)\circ t_1\Big(\sum_{i=r-j+1}^{r} c_i\Big).$$

Hence

$$\pi^{-1}(u+v)\big((P_j^*(u^{-1}))^{-1}\big) = t_1\Big(\sum_{i=r-j+1}^{r} c_i\Big),$$

which is the desired result.
The following result resembles Theorem VII.1.7 from [31, p. 130]. It is, however, quite different and needs a different argument.

Proposition 7.7. Let $s = (s_1, \dots, s_r)$ be in $\mathbb{R}^r$ such that $s_i > (i-1)\frac{d}{2}$, $1 \le i \le r$. Then

$$I_r(s) = \int_{\Omega\cap(e-\Omega)} \Delta^{\frac{d}{2}-\frac{n}{r}}(x)\,\Delta_{s-\frac{n}{r}}(e-x)\,dx = \frac{\Gamma_\Omega(\frac{d}{2})\,\Gamma_\Omega(s)}{\Gamma_\Omega(\frac{d}{2}+s)}.$$
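Proof. Using (7.14) and (7.20), we can write

$$\begin{aligned}
dx &= 2u^{1+(r-1)d}(1-u^2)^{(r-1)\frac{d}{2}}\,du\,dz\,dy, \\
\Delta_s(x) &= u^{2s_1}\,\Delta^{*(r-1)}_{\hat{s}_1}(y), \\
\Delta(x) &= u^2\,\Delta^{*(r-1)}(y), \\
\Delta_s(e-x) &= (1-u^2)^{s_1}\,\Delta^{*(r-1)}_{\hat{s}_1}\big(e^*_{r-1} - y - (z^2)_0\big), \\
\Delta(e-x) &= (1-u^2)\,\Delta^{*(r-1)}\big(e^*_{r-1} - y - (z^2)_0\big).
\end{aligned} \tag{7.21}$$

Denoting

$$H = \{(u,z,y) : 0 < u < 1,\ y \in \Omega_{e-c_1} \text{ and } e^*_{r-1} - y - (z^2)_0 \in \Omega_{e-c_1}\}, \tag{7.22}$$

we obtain that

$$I_r(s) = \int_H 2u^{d-1}(1-u^2)^{s_1-1}\,\Delta^{*(r-1)}_{\frac{d}{2}-\frac{n}{r}}(y)\,\Delta^{*(r-1)}_{\hat{s}_1-\frac{n}{r}}\big(e^*_{r-1} - y - (z^2)_0\big)\,du\,dz\,dy.$$

As we have that

$$2\int_0^1 u^{d-1}(1-u^2)^{s_1-1}\,du = \frac{\Gamma(\frac{d}{2})\,\Gamma(s_1)}{\Gamma(s_1+\frac{d}{2})},$$

we also get

$$I_r(s) = \frac{\Gamma(\frac{d}{2})\,\Gamma(s_1)}{\Gamma(\frac{d}{2}+s_1)} \int_{\mathcal{K}} \Delta^{*(r-1)}_{-\frac{n_{r-1}}{r-1}}(y)\,\Delta^{*(r-1)}_{(\hat{s}_1-\frac{d}{2})-\frac{n_{r-1}}{r-1}}\big(e^*_{r-1} - y - (z^2)_0\big)\,dz\,dy,$$

where

$$\mathcal{K} = \{(z,y) : y \in \Omega_{e-c_1} \text{ and } e^*_{r-1} - y - (z^2)_0 \in \Omega_{e-c_1}\}. \tag{7.23}$$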
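For the calculation of $I_r(s)$, we set

$$\mathcal{K} = \Big\{z \in V\big(c_1, \tfrac{1}{2}\big) : e^*_{r-1} - y - (z^2)_0 \in \Omega_{e-c_1}\Big\} \tag{7.24}$$

and

$$\Psi(x) = \Delta^{*(r-1)}_{(\hat{s}_1-\frac{d}{2})-\frac{n_{r-1}}{r-1}}\big(e^*_{r-1} - y - x\big).$$

Then

$$\int_{\mathcal{K}} \Delta^{*(r-1)}_{(\hat{s}_1-\frac{d}{2})-\frac{n_{r-1}}{r-1}}\big(e^*_{r-1} - y - (z^2)_0\big)\,dz = (2\pi)^{\frac{d}{2}(r-1)}\,R^0_{\frac{d}{2}}(\Psi), \tag{7.25}$$

where $R^0_\alpha$ is the tempered distribution on the subalgebra $V(c_1, 0)$ as defined in (3.3). Hence

$$I_r(s) = \frac{\Gamma(\frac{d}{2})\,\Gamma(s_1)}{\Gamma(\frac{d}{2}+s_1)}\,(2\pi)^{\frac{d}{2}(r-1)}\,\big\langle R^0_{\frac{d}{2}} \star R^0_0,\ f(0,\cdot)\big\rangle,$$

where

$$f(x,y) = \Delta^{*(r-1)}_{(\hat{s}_1-\frac{d}{2})-\frac{n_{r-1}}{r-1}}\big(e^*_{r-1} - y - x\big).$$

Now the fact that $R^0_{\frac{d}{2}} \star R^0_0 = R^0_{\frac{d}{2}}$ implies that

$$I_r(s) = \frac{\Gamma(\frac{d}{2})\,\Gamma(s_1)}{\Gamma(\frac{d}{2}+s_1)}\,(2\pi)^{\frac{d}{2}(r-1)}\,\big\langle R^0_{\frac{d}{2}},\ f(\cdot,0)\big\rangle = (2\pi)^{\frac{d}{2}}\,\frac{\Gamma(s_1)}{\Gamma(\frac{d}{2}+s_1)}\,I_{r-1}\Big(\hat{s}_1 - \frac{d}{2}\Big).$$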
By induction, we get

$$I_r(s) = (2\pi)^{\frac{d}{2}(r-1)}\,\frac{\Gamma(s_1) \cdots \Gamma(s_{r-1} - (r-2)\frac{d}{2})}{\Gamma(s_1+\frac{d}{2}) \cdots \Gamma(s_{r-1} + \frac{d}{2} - (r-2)\frac{d}{2})}\,I_1\Big(\hat{s}_{r-1} - (r-1)\frac{d}{2}\Big).$$

Since

$$I_1\Big(\hat{s}_{r-1} - (r-1)\frac{d}{2}\Big) = \int_0^1 x^{\frac{d}{2}-1}(1-x)^{s_r-(r-1)\frac{d}{2}-1}\,dx = \frac{\Gamma(\frac{d}{2})\,\Gamma(s_r - (r-1)\frac{d}{2})}{\Gamma(s_r + \frac{d}{2} - (r-1)\frac{d}{2})},$$

the proof is complete.

Theorem 7.8. Let $U$ and $W$ be two independent Riesz random variables, $U \sim R(s,\sigma)$ and $W \sim R(\tau,\sigma)$, and let $X = \pi^{-1}(U+W)(U)$. If $X_1$, $X_{12}$, $X_0$ are the Peirce components of $X$ with respect to $c_1$, then identifying $\mathbb{R}c_1$ with $\mathbb{R}$, we have
172 | 7 Beta Riesz probability distributions (i) For all 1 ≤ j ≤ r − 1, −1
∗ (Pr−j (X −1 ))
∗ = π −1 (U + W)((Pr−j (U −1 )) ); −1
(ii) X1 has a beta distribution on ℝ; ∗ (iii) Y = (Pr−1 (X −1 ))−1 is a beta–Riesz random variable on the algebra V(c2 + ⋅ ⋅ ⋅ + cr , 1) with a probability density function equal to 1
BΩ1 (̂s1 − d2 , τ̂1 )
Δ∗(r−1) d
(̂s1 − 2 )−
∗ (y)Δ∗(r−1) nr−1 (er−1 − y)1Ω e−c ̂
nr−1 r−1
τ1 −
1
r−1
∗ −Ω (y). ∩(er−1 e−c1 )
X12 . √X1 (1−X1 )
Furthermore X1 is independent of Y and of
Note that if $V$ is the algebra of $(r,r)$-symmetric matrices, the Peirce decomposition of an element $X$ of $V$ with respect to $c_1$ consists in writing
\[ X = \begin{pmatrix} X_1 & X_{12} \\ X_{21} & X_0 \end{pmatrix} \]
with $X_1$ a positive real number. In this case
\[ Y = \bigl(P^*_{r-1}(X^{-1})\bigr)^{-1} = X_0 - X_{21}\,X_1^{-1}\,X_{12}. \]
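The matrix identity in this note can be checked on a small example. The following is only an illustrative sketch: it assumes that, for symmetric matrices, $P^*_{r-1}$ extracts the lower-right $(r-1)\times(r-1)$ block, and the helper names `inverse` and `schur_complement` as well as the sample matrix are hypothetical. Exact rational arithmetic is used so that the comparison is free of rounding error.

```python
from fractions import Fraction

def inverse(m):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(m)
    # Augment m with the identity matrix.
    a = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def schur_complement(x):
    """Return X0 - X21 X1^{-1} X12 for the 1 + (r-1) block split of x."""
    x1 = x[0][0]
    n = len(x)
    return [[x[i][j] - x[i][0] * x[0][j] / x1 for j in range(1, n)]
            for i in range(1, n)]

# A positive definite symmetric 3x3 example (hypothetical data).
X = [[Fraction(v) for v in row] for row in [[4, 1, 2], [1, 3, 0], [2, 0, 5]]]
X_inv = inverse(X)
lower_block = [row[1:] for row in X_inv[1:]]   # block of X^{-1} on indices 2..r
Y = inverse(lower_block)
assert Y == schur_complement(X)
```

The check is the classical block-inverse formula: the inverse of the lower-right block of $X^{-1}$ is the Schur complement of $X_1$ in $X$.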
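Proof. (i) Follows from Lemma 7.6.
(ii) We know, from Theorem 7.3, that the density of $X = \pi^{-1}(U+W)(U)$ with respect to the Lebesgue measure is
\[ \frac{1}{B_\Omega(s,\tau)}\,\Delta_{s-\frac{n}{r}}(x)\,\Delta_{\tau-\frac{n}{r}}(e-x)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(x). \]
As $X$ is in $\Omega\cap(e-\Omega)$, then writing $X$ and $e-X$ in terms of $X_1$, $Z$, and $Y$ as in (7.19) and (7.20), and using (7.21) for $X$, we obtain that the joint probability density of $(X_1,Z,Y)$ is
\[ C\,x_1^{s_1-1}(1-x_1)^{\tau_1-1}\,\Delta^{*(r-1)}_{(\hat{s}_1-\frac{d}{2})-\frac{n_{r-1}}{r-1}}(y)\,\Delta^{*(r-1)}_{(\hat{\tau}_1-\frac{d}{2})-\frac{n_{r-1}}{r-1}}\bigl(e^*_{r-1}-y-(z^2)_0\bigr)\,\mathbf{1}_H(x_1,y,z), \]
where
\[ C = \frac{1}{B(s_1,\tau_1)}\,\frac{1}{(2\pi)^{(r-1)\frac{d}{2}}}\,\frac{\Gamma_{\Omega_{e-c_1}}(\hat{s}_1+\hat{\tau}_1-\frac{d}{2})}{\Gamma_{\Omega_{e-c_1}}(\hat{s}_1-\frac{d}{2})\,\Gamma_{\Omega_{e-c_1}}(\hat{\tau}_1-\frac{d}{2})} \]
and
\[ H = \{(x_1,z,y) : 0<x_1<1,\ y\in\Omega_{e-c_1},\ \text{and}\ e^*_{r-1}-y-(z^2)_0\in\Omega_{e-c_1}\}. \]
Thus $X_1$ is independent of $Y$ and $Z$ and has a beta distribution on $\mathbb{R}$ with density equal to
\[ f_{X_1}(x_1) = \frac{1}{B(s_1,\tau_1)}\,x_1^{s_1-1}(1-x_1)^{\tau_1-1}\,\mathbf{1}_{]0,1[}(x_1). \]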
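(iii) From (7.25), we have that
\[ \int_{\mathcal{K}} \Delta^{*(r-1)}_{(\hat{\tau}_1-\frac{d}{2})-\frac{n_{r-1}}{r-1}}\bigl(e^*_{r-1}-y-(z^2)_0\bigr)\,dz = (2\pi)^{\frac{d}{2}(r-1)}\,R^0_{\frac{d}{2}}(\phi), \tag{7.26} \]
where $\mathcal{K}$ is defined by (7.24) and
\[ \phi(x) = \Delta^{*(r-1)}_{(\hat{\tau}_1-\frac{d}{2})-\frac{n_{r-1}}{r-1}}\bigl(e^*_{r-1}-y-x\bigr), \]
which is equal to
\[ (2\pi)^{\frac{d}{2}(r-1)}\,\frac{\Gamma_{\Omega_{e-c_1}}(\hat{\tau}_1-\frac{d}{2})}{\Gamma_{\Omega_{e-c_1}}(\hat{\tau}_1)}\,\Delta^{*(r-1)}_{\hat{\tau}_1-\frac{n_{r-1}}{r-1}}(e^*_{r-1}-y). \]
It follows that the probability density function of $Y$ is
\[ f_Y(y) = \frac{1}{B_{\Omega_{e-c_1}}(\hat{s}_1-\frac{d}{2},\hat{\tau}_1)}\,\Delta^{*(r-1)}_{(\hat{s}_1-\frac{d}{2})-\frac{n_{r-1}}{r-1}}(y)\,\Delta^{*(r-1)}_{\hat{\tau}_1-\frac{n_{r-1}}{r-1}}(e^*_{r-1}-y)\,\mathbf{1}_{\Omega_{e-c_1}\cap(e^*_{r-1}-\Omega_{e-c_1})}(y). \]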
With the notations of Theorem 7.8, the next corollary is easily proved using the fact that, for $i \le j$, $P^*_i \circ P^*_j = P^*_i$.

Corollary 7.9. For all $1 \le j \le r-1$, the random variable $(P^*_{r-j}(X^{-1}))^{-1}$ has a beta–Riesz probability distribution on $V(c_{j+1}+\cdots+c_r,1)$ and its probability density is
\[ \frac{1}{B_{\Omega_j}(\hat{s}_j-j\frac{d}{2},\hat{\tau}_j)}\,\Delta^{*(r-j)}_{(\hat{s}_j-j\frac{d}{2})-\frac{n_{r-j}}{r-j}}(y)\,\Delta^{*(r-j)}_{\hat{\tau}_j-\frac{n_{r-j}}{r-j}}(e^*_{r-j}-y)\,\mathbf{1}_{\Omega_{e-c_j}\cap(e^*_{r-j}-\Omega_{e-c_j})}(y). \]
The next two theorems appear in [116]; they are established in the setting of the algebra $V = \mathrm{Sym}(r,\mathbb{R})$ of real symmetric matrices of rank $r$. We keep the notation used for any Jordan algebra and suppose that $\mathrm{Sym}(r,\mathbb{R})$ is equipped with its usual Jordan frame $(c_1,\ldots,c_r)$.

Theorem 7.10. Let $X$ be a beta–Riesz random variable of the first kind in $\mathrm{Sym}(r,\mathbb{R})$ with distribution $\beta^{(1)}_{s,\tau}$, and let $X_1$, $X_{12}$, $X_0$ be the Peirce components of $X$ with respect to $c_k$. Define
\[ Y = X_0 - P(X_{12})(X_1^{-1}) = X_0 - X_{21}\,(X_1^{-1})\,X_{12}, \tag{7.27} \]
and write $e^*_{r-k} - Y = t\,t^*$, the Cholesky decomposition of $e^*_{r-k} - Y$, where $t$ is a lower-triangular matrix with a positive diagonal. Let
\[ W = t^{-1}\,X_{21}\,\bigl(X_1^{-1} + (c_k - X_1)^{-1}\bigr)^{\frac{1}{2}}. \tag{7.28} \]
Then $X_1$, $Y$, and $W$ are independent,
\[ X_1 \sim \beta^{(1)}_{s_k,\tau_k}, \qquad Y \sim \beta^{(1)}_{\hat{s}_k-\frac{k}{2},\,\hat{\tau}_k}, \]
and the probability density function of $W$ is
\[ f_W(w) = C\,\Delta_{\hat{\tau}_k-\frac{k}{2}}(e^*_{r-k} - ww^*)\,\mathbf{1}_M(w), \]
where $C$ is a normalizing constant and $M$ is the set of $(r-k,k)$-matrices $w$ such that $e^*_{r-k} - ww^* \in \Omega_{e-c_k}$.

Proof. The probability distribution of $X$ is given by
\[ \frac{1}{B_\Omega(s,\tau)}\,\Delta_{s-\frac{n}{r}}(x)\,\Delta_{\tau-\frac{n}{r}}(e-x)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(x)\,dx. \tag{7.29} \]
Consider the map
\[ \varphi : \Omega\cap(e-\Omega) \to \bigl(\Omega_{c_k}\cap(c_k-\Omega_{c_k})\bigr)\times\bigl(\Omega_{e-c_k}\cap(e^*_{r-k}-\Omega_{e-c_k})\bigr)\times M, \qquad x = x_1 + x_{12} + x_0 \mapsto (x_1,y,w), \]
with $y$, $w$ defined in terms of $x_1$, $x_{12}$, and $x_0$ as in (7.27) and (7.28). A standard calculation shows that $\varphi$ is a bijection. It is also easy to see that it is a diffeomorphism which can be decomposed as $\varphi = \varphi_2\circ\varphi_1$ with
\[ \varphi_1 : x = x_1 + x_{12} + x_0 \mapsto (x_1,y,x_{12}) \qquad\text{and}\qquad \varphi_2 : (x_1,y,x_{12}) \mapsto (x_1,y,w). \]
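We already know that the Jacobian of $\varphi_1$ is equal to 1, and regarding $w = t^{-1}\,x_{21}\,(x_1^{-1} + (c_k - x_1)^{-1})^{\frac{1}{2}}$ as a function of $x_{12}$ leads to
\[ dw = \Delta^{-\frac{k}{2}}(e^*_{r-k}-y)\,\Delta^{-\frac{r-k}{2}}(x_1)\,\Delta^{-\frac{r-k}{2}}(c_k-x_1)\,dx_{12}. \]
Therefore
\[ dx = \Delta^{\frac{k}{2}}(e^*_{r-k}-y)\,\Delta^{\frac{r-k}{2}}(x_1)\,\Delta^{\frac{r-k}{2}}(c_k-x_1)\,dx_1\,dy\,dw. \]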
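On the other hand, we have
\[ \Delta_{s-\frac{n}{r}}(x) = \Delta_{s_k-\frac{n}{r}}(x_1)\,\Delta_{\hat{s}_k-\frac{n}{r}}\bigl(x_0 - P(x_{12})(x_1^{-1})\bigr) = \Delta_{s_k-\frac{n}{r}}(x_1)\,\Delta_{\hat{s}_k-\frac{n}{r}}(y) \]
and
\[ \Delta_{\tau-\frac{n}{r}}(e-x) = \Delta_{\tau_k-\frac{n}{r}}(c_k-x_1)\,\Delta_{\hat{\tau}_k-\frac{n}{r}}\bigl(e^*_{r-k} - x_0 - P(x_{12})(c_k-x_1)^{-1}\bigr). \tag{7.30} \]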
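As
\[ e^*_{r-k} - x_0 - P(x_{12})(c_k-x_1)^{-1} = e^*_{r-k} - y - P(x_{12})x_1^{-1} - P(x_{12})(c_k-x_1)^{-1} = \pi(e^*_{r-k}-y)(e^*_{r-k}-ww^*), \]
we obtain
\[
\begin{aligned}
\Delta_{\tau-\frac{n}{r}}(e-x) &= \Delta_{\tau_k-\frac{n}{r}}(c_k-x_1)\,\Delta_{\hat{\tau}_k-\frac{n}{r}}\bigl(\pi(e^*_{r-k}-y)(e^*_{r-k}-ww^*)\bigr)\\
&= \Delta_{\tau_k-\frac{n}{r}}(c_k-x_1)\,\Delta_{\hat{\tau}_k-\frac{n}{r}}(e^*_{r-k}-y)\,\Delta_{\hat{\tau}_k-\frac{n}{r}}(e^*_{r-k}-ww^*).
\end{aligned}
\]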
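Replacing into (7.29), we deduce that with the normalizing constant $C_1$, the joint probability density function of $(X_1,Y,W)$ evaluated at a point $(x_1,y,w)$ of $(\Omega_{c_k}\cap(c_k-\Omega_{c_k}))\times(\Omega_{e-c_k}\cap(e^*_{r-k}-\Omega_{e-c_k}))\times M$ is
\[
\begin{aligned}
f_{(X_1,Y,W)}(x_1,y,w) &= C_1\,\Delta_{s_k-\frac{n}{r}+\frac{r-k}{2}}(x_1)\,\Delta_{\tau_k-\frac{n}{r}+\frac{r-k}{2}}(c_k-x_1)\\
&\quad\times \Delta_{\hat{s}_k-\frac{n}{r}}(y)\,\Delta_{\hat{\tau}_k-\frac{n}{r}+\frac{k}{2}}(e^*_{r-k}-y)\\
&\quad\times \Delta_{\hat{\tau}_k-\frac{n}{r}}(e^*_{r-k}-ww^*)\\
&= C_1\,\Delta_{s_k-\frac{k+1}{2}}(x_1)\,\Delta_{\tau_k-\frac{k+1}{2}}(c_k-x_1)\\
&\quad\times \Delta_{\hat{s}_k-\frac{k}{2}-\frac{r-k+1}{2}}(y)\,\Delta_{\hat{\tau}_k-\frac{r-k+1}{2}}(e^*_{r-k}-y)\\
&\quad\times \Delta_{\hat{\tau}_k-\frac{k}{2}-\frac{r-k+1}{2}}(e^*_{r-k}-ww^*).
\end{aligned}
\]
This proves that $X_1$, $Y$, and $W$ are independent such that
\[ X_1 \sim \beta^{(1)}_{s_k,\tau_k}, \qquad Y \sim \beta^{(1)}_{\hat{s}_k-k\frac{d}{2},\,\hat{\tau}_k}, \]
and
\[ f_W(w) = C\,\Delta_{\hat{\tau}_k-\frac{k}{2}}(e^*_{r-k}-ww^*)\,\mathbf{1}_M(w). \]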
Theorem 7.11. Let X be a beta–Riesz random variable of the first kind in the Jordan al(1) gebra V with distribution βs,τ . Write X = tU (e), where U = ∑ri=1 Ui ci + ∑i (i − 1) d2 ∀1 ≤ i ≤ r so that Y ∼ R(τ, σ). Now by (7.50), we get
n r
is
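\[ g_3(v) = \ln\Delta_p(v) + c + \ln\Delta_{p'}(v) + \langle\delta,v\rangle + c_1 = \ln\Delta_{p+p'}(v) + \langle\delta,v\rangle + c + c_1. \]
From (7.46) it follows that
\[ f_X(x) = \Delta_{p+p'}(x)\,e^{\langle\delta,x\rangle}\,e^{c+c_1} = \Delta_{p+\tau-\frac{n}{r}}(x)\,e^{-\langle\sigma,x\rangle}\,e^{c+c_1}, \]
which implies that $p_i + \tau_i > (i-1)\frac{d}{2}$ for all $1 \le i \le r$, and consequently, $X \sim R(s,\sigma)$ where $s = p + \tau$.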
7.6 Beta–Riesz probability distribution of the second kind

Besides the real beta distributions of the first kind, the real beta distributions of the second kind, which are also called inverted beta distributions, represent a class of very flexible distributions with wide applications in statistical analysis. In this section, we derive a beta–Riesz probability distribution of the second kind from the beta–Riesz distribution of the first kind, and establish for this distribution a property of inversion.
Theorem 7.21. Suppose that $U$ has a beta–Riesz probability distribution of the first kind $\beta^{(1)}_{s,\tau}$. Then the distribution of $Z = U(e-U)^{-1}$ is
\[ \frac{1}{B_\Omega(s,\tau)}\,\Delta_{s-\frac{n}{r}}\bigl((e+z^{-1})^{-1}\bigr)\,\Delta_{\tau+\frac{n}{r}}\bigl((e+z)^{-1}\bigr)\,\mathbf{1}_\Omega(z)\,dz. \tag{7.54} \]

Proof. Define the function
\[ \varphi : \Omega \to \Omega, \qquad u \mapsto (e-u)^{-1} - e. \]
The derivative of $\varphi$ with respect to $u$ evaluated at a point $h$ of $V$ is $\varphi'(u)(h) = P((e-u)^{-1})(h)$, that is, $\varphi'(u) = P((e-u)^{-1})$. Thus the Jacobian of $\varphi$ is equal to
\[ \mathrm{Det}\bigl(P((e-u)^{-1})\bigr) = \Delta^{-\frac{2n}{r}}(e-u). \]
Now since the Jordan product is associative in the subalgebra of $V$ generated by $e$ and an element of $V$, writing $z = u(e-u)^{-1}$, we have that
\[ u = z(e+z)^{-1} \qquad\text{and}\qquad e-u = (e+z)^{-1}. \]
Inserting into the distribution of $U$,
\[ \beta^{(1)}_{s,\tau}(du) = \frac{1}{B_\Omega(s,\tau)}\,\Delta_{s-\frac{n}{r}}(u)\,\Delta_{\tau-\frac{n}{r}}(e-u)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(u)\,du, \]
we deduce that the distribution of $Z = U(e-U)^{-1}$ is
\[ \frac{1}{B_\Omega(s,\tau)}\,\Delta_{s-\frac{n}{r}}\bigl(z(e+z)^{-1}\bigr)\,\Delta_{\tau+\frac{n}{r}}\bigl((e+z)^{-1}\bigr)\,\mathbf{1}_\Omega(z)\,dz, \]
or equivalently,
\[ \frac{1}{B_\Omega(s,\tau)}\,\Delta_{s-\frac{n}{r}}\bigl((e+z^{-1})^{-1}\bigr)\,\Delta_{\tau+\frac{n}{r}}\bigl((e+z)^{-1}\bigr)\,\mathbf{1}_\Omega(z)\,dz. \]

Definition 7.2. The probability distribution
\[ \beta^{(2)}_{s,\tau}(dx) = \frac{1}{B_\Omega(s,\tau)}\,\Delta_{s-\frac{n}{r}}\bigl(x(e+x)^{-1}\bigr)\,\Delta_{\tau+\frac{n}{r}}\bigl((e+x)^{-1}\bigr)\,\mathbf{1}_\Omega(x)\,dx, \]
defined for $s,\tau\in\mathbb{R}^r$ such that $s_i > (i-1)\frac{d}{2}$, $\tau_i > (i-1)\frac{d}{2}$ for all $1 \le i \le r$, is called the beta–Riesz probability distribution of the second kind on the symmetric cone $\Omega$ with parameters $s$ and $\tau$.
When $s_1 = \cdots = s_r = p$ and $\tau_1 = \cdots = \tau_r = q$ with $p > (r-1)\frac{d}{2}$ and $q > (r-1)\frac{d}{2}$, the corresponding beta–Riesz probability distribution of the second kind is
\[ \beta^{(2)}_{p,q}(dx) = \frac{1}{B_\Omega(p,q)}\,\Delta^{p-\frac{n}{r}}(x)\,\Delta^{-(p+q)}(e+x)\,\mathbf{1}_\Omega(x)\,dx. \]
It is the beta–Wishart probability distribution of the second kind on the symmetric cone $\Omega$ with parameters $p$ and $q$. In the real case, it is the well-known beta probability distribution of the second kind,
\[ \beta^{(2)}_{p,q}(dx) = \frac{1}{B(p,q)}\,x^{p-1}(1+x)^{-(p+q)}\,\mathbf{1}_{]0,+\infty[}(x)\,dx. \]
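In the real case ($r = 1$, $n/r = 1$, $e = 1$), Theorem 7.21 says that $Z = U/(1-U)$ has the beta density of the second kind when $U$ has the beta density of the first kind. The sketch below checks this pointwise through the change of variables $u = z/(1+z)$; the function names and the parameter values are illustrative assumptions, not part of the text.

```python
import math

def beta_fn(p, q):
    """Euler beta function B(p, q), computed through log-gamma."""
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))

def beta1_pdf(u, p, q):
    """Density of the real beta distribution of the first kind on ]0, 1[."""
    return u ** (p - 1) * (1 - u) ** (q - 1) / beta_fn(p, q) if 0 < u < 1 else 0.0

def beta2_pdf(x, p, q):
    """Density of the real beta distribution of the second kind on ]0, +inf[."""
    return x ** (p - 1) * (1 + x) ** (-(p + q)) / beta_fn(p, q) if x > 0 else 0.0

def pushforward_pdf(z, p, q):
    """Density of Z = U / (1 - U): substitute u = z / (1 + z), du = dz / (1 + z)^2."""
    u = z / (1 + z)
    return beta1_pdf(u, p, q) / (1 + z) ** 2

p, q = 2.5, 3.0  # illustrative parameter values
for z in (0.1, 0.7, 1.0, 4.2):
    assert math.isclose(pushforward_pdf(z, p, q), beta2_pdf(z, p, q), rel_tol=1e-10)
```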
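Theorem 7.22. Let $X$ be a random variable in $\Omega$ with beta–Riesz probability distribution of the second kind $\beta^{(2)}_{s,\tau}$. Then the distribution of $Y = X^{-1}$ is $\beta^{(2)}_{\tau,s}$.

Proof. We have that the distribution of $X$ is
\[ \frac{1}{B_\Omega(s,\tau)}\,\Delta_{s-\frac{n}{r}}\bigl(x(e+x)^{-1}\bigr)\,\Delta_{\tau+\frac{n}{r}}\bigl((e+x)^{-1}\bigr)\,\mathbf{1}_\Omega(x)\,dx. \]
We make the change of variable $y = x^{-1}$. As $dx = \Delta^{-\frac{2n}{r}}(y)\,dy$, we obtain that the distribution of $Y$ is
\[ \frac{1}{B_\Omega(s,\tau)}\,\Delta_{s-\frac{n}{r}}\bigl((e+y)^{-1}\bigr)\,\Delta_{\tau+\frac{n}{r}}\bigl((e+y^{-1})^{-1}\bigr)\,\Delta^{-\frac{2n}{r}}(y)\,\mathbf{1}_\Omega(y)\,dy. \]
Given that
\[ \Delta(y) = \Delta(e+y)\,\Delta\bigl((e+y^{-1})^{-1}\bigr), \]
the distribution of $Y$ becomes
\[ \frac{1}{B_\Omega(s,\tau)}\,\Delta_{\tau-\frac{n}{r}}\bigl((e+y^{-1})^{-1}\bigr)\,\Delta_{s+\frac{n}{r}}\bigl((e+y)^{-1}\bigr)\,\mathbf{1}_\Omega(y)\,dy, \]
which is the $\beta^{(2)}_{\tau,s}$ distribution.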
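The inversion property of Theorem 7.22 is easy to verify in the real case, where the density of $Y = 1/X$ is $f_X(1/y)\,y^{-2}$. The following is a minimal sketch under that one-dimensional assumption; the names `beta2_pdf` and `inverse_pdf` are hypothetical.

```python
import math

def beta2_pdf(x, p, q):
    """Real beta density of the second kind, evaluated in log form for x > 0."""
    log_b = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return math.exp((p - 1) * math.log(x) - (p + q) * math.log1p(x) - log_b)

def inverse_pdf(y, p, q):
    """Density of Y = 1/X when X has density beta2_pdf(., p, q); |dx/dy| = y^-2."""
    return beta2_pdf(1.0 / y, p, q) / y ** 2

p, q = 1.7, 2.3  # illustrative parameter values
for y in (0.05, 0.5, 1.0, 3.0, 20.0):
    # The parameters are exchanged, as in the statement of the theorem.
    assert math.isclose(inverse_pdf(y, p, q), beta2_pdf(y, q, p), rel_tol=1e-10)
```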
8 Beta–Wishart distributions

This chapter is devoted to some properties of the beta–Wishart distributions. The generalized power function used in the definition of a beta–Riesz distribution reduces in this case to a determinant. Therefore most of the probabilistic results may be established independently of the choice of the division algorithm with the same techniques. Unless otherwise stated, this fact will not be repeated in the statements; we will continue to use the division algorithm defined by the triangular group. Clearly, a beta–Wishart probability distribution is invariant with respect to the orthogonal group $K$ and therefore it is characterized by its Mellin transform. If $X$ has the beta–Wishart distribution of the first kind
\[ \beta^{(1)}_{p,q}(dx) = \frac{1}{B_\Omega(p,q)}\,\Delta^{p-\frac{n}{r}}(x)\,\Delta^{q-\frac{n}{r}}(e-x)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(x)\,dx, \]
then its Mellin transform $s \mapsto \mathbb{E}(\Delta_s(X))$ is given in the following preliminary result.

Proposition 8.1. Let $X$ be a random variable in $\Omega\cap(e-\Omega)$ with distribution $\beta^{(1)}_{p,q}$, where $p, q > (r-1)\frac{d}{2}$. Then for all $s = (s_1,\ldots,s_r)$ in $\mathbb{R}^r$ such that $p + s_i > (i-1)\frac{d}{2}$, $i\in\{1,\ldots,r\}$, we have
\[ \mathbb{E}(\Delta_s(X)) = \frac{\Gamma_\Omega(p+q)\,\Gamma_\Omega(p+s)}{\Gamma_\Omega(p)\,\Gamma_\Omega(p+q+s)}. \tag{8.1} \]
In particular, for all $k = (k_1,\ldots,k_r)\in\mathbb{N}^r$ such that $p + k_i > (i-1)\frac{d}{2}$, $1 \le i \le r$,
\[ \mathbb{E}(\Delta_k(X)) = \prod_{i=1}^r\Biggl(\prod_{j=1}^{k_i}\frac{p + k_i - (i-1)\frac{d}{2} - j}{p + q + k_i - (i-1)\frac{d}{2} - j}\Biggr). \tag{8.2} \]
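Proof. As $X \sim \beta^{(1)}_{p,q}$, we have
\[
\begin{aligned}
\mathbb{E}(\Delta_s(X)) &= \frac{1}{B_\Omega(p,q)}\int_{\Omega\cap(e-\Omega)} \Delta_s(x)\,\Delta^{p-\frac{n}{r}}(x)\,\Delta^{q-\frac{n}{r}}(e-x)\,dx\\
&= \frac{1}{B_\Omega(p,q)}\int_{\Omega\cap(e-\Omega)} \Delta_{s+p-\frac{n}{r}}(x)\,\Delta_{q-\frac{n}{r}}(e-x)\,dx\\
&= \frac{\Gamma_\Omega(p+q)\,\Gamma_\Omega(p+s)}{\Gamma_\Omega(p)\,\Gamma_\Omega(p+q+s)}.
\end{aligned}
\]
This, using the expression of the function $\Gamma_\Omega$ given in Theorem 2.13 and the properties of the real gamma function, leads to (8.2).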
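The passage from (8.1) to (8.2) can be illustrated numerically. The sketch below assumes that $\Gamma_\Omega(s)$ is, up to a power of $2\pi$ that cancels in every ratio, the product $\prod_{i=1}^r \Gamma(s_i - (i-1)\frac{d}{2})$, which is how the text itself expands it later in this chapter; the function names and the chosen values of $p$, $q$, $d$, and $k$ are illustrative.

```python
import math

def log_gamma_omega(s, d):
    """log Gamma_Omega(s) up to an additive constant that cancels in ratios.

    The list index i starts at 0, so i * d / 2 plays the role of (i - 1) d / 2.
    """
    return sum(math.lgamma(si - i * d / 2) for i, si in enumerate(s))

def mellin_beta1(p, q, s, d):
    """Right-hand side of (8.1) for scalar p, q and a vector s of length r."""
    r = len(s)
    return math.exp(
        log_gamma_omega([p + q] * r, d)
        + log_gamma_omega([p + si for si in s], d)
        - log_gamma_omega([p] * r, d)
        - log_gamma_omega([p + q + si for si in s], d)
    )

def moment_product(p, q, k, d):
    """Right-hand side of (8.2) for a vector k of nonnegative integers."""
    out = 1.0
    for i, ki in enumerate(k):
        for j in range(1, ki + 1):
            out *= (p + ki - i * d / 2 - j) / (p + q + ki - i * d / 2 - j)
    return out

p, q, d = 4.0, 3.5, 1      # d = 1 corresponds to real symmetric matrices
k = [3, 0, 2]              # rank r = 3
assert math.isclose(mellin_beta1(p, q, k, d), moment_product(p, q, k, d), rel_tol=1e-10)
```

Both sides agree because $\Gamma(a+k)/\Gamma(a) = \prod_{j=1}^{k}(a+k-j)$ for every integer $k \ge 0$.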
Similarly, if $Z$ has the beta–Wishart probability distribution of the second kind $\beta^{(2)}_{p,q}$ on $\Omega$ given by
\[ \beta^{(2)}_{p,q}(dx) = (B_\Omega(p,q))^{-1}\,\Delta^{p-\frac{n}{r}}(x)\,\Delta^{-(p+q)}(e+x)\,\mathbf{1}_\Omega(x)\,dx, \]
then the Mellin transform of $Z$ is defined for $s = (s_1,\ldots,s_r)$ such that $-(p-(i-1)\frac{d}{2}) < s_i < q-(i-1)\frac{d}{2}$ by
\[ s \mapsto \mathbb{E}(\Delta_s(Z)) = \frac{\Gamma_\Omega(p+s)}{\Gamma_\Omega(p)}\,\frac{\Gamma_\Omega(q-s)}{\Gamma_\Omega(q)}. \tag{8.3} \]
Theorem 8.2. Let $Y$ be a random variable in $\Omega$. Then
\[ Y \sim \beta^{(2)}_{p,q} \quad\text{if and only if}\quad \pi^{-1}(e+Y)(Y) \sim \beta^{(1)}_{p,q}. \]
Proof. Let $Y \sim \beta^{(2)}_{p,q}$ and $Z = \pi^{-1}(e+Y)(Y)$. For a bounded measurable function $h$, we have
\[ \mathbb{E}(h(Z)) = (B_\Omega(p,q))^{-1}\int_\Omega h\bigl(\pi^{-1}(e+y)(y)\bigr)\,\Delta^{p-\frac{n}{r}}(y)\,\Delta^{-(p+q)}(e+y)\,dy. \]
Setting $z = \pi^{-1}(e+y)(y)$, we have $y = \pi^{-1}(e-z)(z)$. In fact,
\[ z = \pi^{-1}(e+y)(y) = \pi^{-1}(e+y)(e+y) - \pi^{-1}(e+y)(e) = e - \pi^{-1}(e+y)(e). \]
It follows that $\pi^{-1}(e+y)(e) = e - z$. Hence
\[ \pi^{-1}(e-z)z = \pi^{-1}\bigl(\pi^{-1}(e+y)(e)\bigr)z = \pi(e+y)z = y. \]
Now $y = \pi^{-1}(e-z)(z)$ implies that
\[ \Delta(y) = \Delta\bigl(\pi^{-1}(e-z)(z)\bigr) = \Delta\bigl(\pi^{-1}(e-z)(e)\bigr)\,\Delta(z) = \Delta^{-1}(e-z)\,\Delta(z). \]
Therefore $\ln\Delta(y) = -\ln\Delta(e-z) + \ln\Delta(z)$. Differentiating with respect to $z$, we get
\[ y'\,y^{-1} = (e-z)^{-1} + z^{-1} = (e-z)^{-1}z^{-1}. \]
Using Proposition 1.7, we can write
\[ \Delta(y'\,y^{-1}) = \mathrm{Det}^{\frac{r}{n}}(y')\,\Delta(y^{-1}). \]
As $\Delta(y) = \Delta^{-1}(e-z)\,\Delta(z)$ and
\[ \Delta(y'\,y^{-1}) = \Delta\bigl((e-z)^{-1}z^{-1}\bigr) = \Delta^{-1}(e-z)\,\Delta^{-1}(z), \]
we obtain $\mathrm{Det}(y') = \Delta^{-\frac{2n}{r}}(e-z)$. Thus
\[ dy = \mathrm{Det}(y')\,dz = \Delta^{-\frac{2n}{r}}(e-z)\,dz. \]
Using this, we have
\[
\begin{aligned}
\mathbb{E}(h(Z)) &= (B_\Omega(p,q))^{-1}\int_{\Omega\cap(e-\Omega)} h(z)\,\Delta^{p-\frac{n}{r}}\bigl(\pi^{-1}(e-z)(z)\bigr)\,\Delta^{-(p+q)}\bigl(\pi^{-1}(e-z)(e)\bigr)\,\Delta^{-\frac{2n}{r}}(e-z)\,dz\\
&= (B_\Omega(p,q))^{-1}\int_{\Omega\cap(e-\Omega)} h(z)\,\Delta^{p-\frac{n}{r}}(z)\,\Delta^{q-\frac{n}{r}}(e-z)\,dz.
\end{aligned}
\]
Thus $Z \sim \beta^{(1)}_{p,q}$.
In the same way, we verify that if $Z \sim \beta^{(1)}_{p,q}$ then $\pi^{-1}(e-Z)(Z) \sim \beta^{(2)}_{p,q}$.
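The algebraic step of this proof, namely that $z = \pi^{-1}(e+y)(y)$ is inverted by $y = \pi^{-1}(e-z)(z)$, can be checked on real symmetric matrices. The sketch below assumes that $\pi(x)$ acts as $y \mapsto t\,y\,t^{T}$, where $x = t\,t^{T}$ is the Cholesky decomposition with $t$ lower triangular; the helper names and the sample matrix are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def cholesky(x):
    """Lower-triangular t with positive diagonal such that x = t t^T."""
    n = len(x)
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = x[i][j] - sum(t[i][k] * t[j][k] for k in range(j))
            t[i][j] = math.sqrt(acc) if i == j else acc / t[j][j]
    return t

def inv_lower(t):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(t)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / t[j][j]
        for i in range(j + 1, n):
            inv[i][j] = -sum(t[i][k] * inv[k][j] for k in range(j, i)) / t[i][i]
    return inv

def pi_inv(x, y):
    """Division algorithm pi^{-1}(x)(y) = t^{-1} y t^{-T}, where x = t t^T."""
    s = inv_lower(cholesky(x))
    return matmul(matmul(s, y), transpose(s))

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

e = [[1.0, 0.0], [0.0, 1.0]]
y = [[2.0, 0.5], [0.5, 1.0]]                        # a positive definite example
e_plus_y = [[e[i][j] + y[i][j] for j in range(2)] for i in range(2)]
z = pi_inv(e_plus_y, y)                             # z = pi^{-1}(e + y)(y)
e_minus_z = [[e[i][j] - z[i][j] for j in range(2)] for i in range(2)]
y_back = pi_inv(e_minus_z, z)                       # should recover y
assert all(math.isclose(y_back[i][j], y[i][j], abs_tol=1e-9)
           for i in range(2) for j in range(2))
# Determinant relation used in the proof: Delta(y) = Delta(z) / Delta(e - z).
assert math.isclose(det2(y), det2(z) / det2(e_minus_z), rel_tol=1e-9)
```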
Of course, some properties of the beta–Wishart distribution may be deduced as particular cases of results established for the beta–Riesz distribution. For example, Theorem 7.4 leads to the following statement for the beta–Wishart distribution.

Theorem 8.3. Let $X$ and $Y$ be two independent beta–Wishart random variables, $X \sim \beta^{(1)}_{p,q}$ and $Y \sim \beta^{(1)}_{p+q,k}$, where $p, q, k > (r-1)\frac{d}{2}$. Then
\[ \pi^{-1}\bigl(e - \pi(Y)X\bigr)(e - Y) \qquad\text{and}\qquad e - \pi(Y)X \]
are independent with distributions $\beta^{(1)}_{k,q}$ and $\beta^{(1)}_{q+k,p}$, respectively.

Note that the distributions $\beta^{(1)}_{k,q}$ and $\beta^{(1)}_{q+k,p}$ in Theorem 8.3 are expressed in terms of determinant functions. We can then replace the division algorithm $\pi^{-1}$ in this theorem by any other division algorithm and get the same probabilistic results with the same method of proof. The following reciprocal statement is given in [70]; it uses the division algorithm $g(x) = P(x^{-\frac{1}{2}})$ defined by the quadratic representation.
Theorem 8.4. Let $X$ and $Y$ be two independent random variables with continuous positive density functions on $\Omega\cap(e-\Omega)$. Consider the random variables $U = P\bigl((e - P(Y^{\frac{1}{2}})X)^{-\frac{1}{2}}\bigr)(e - Y)$ and $V = e - P(Y^{\frac{1}{2}})X$. If $U$ and $V$ are independent, then there exist $p$, $q$, and $k$ such that $X \sim \beta^{(1)}_{p,q}$ and $Y \sim \beta^{(1)}_{p+q,k}$.

Proposition 8.5. Let $X$ be a random variable with distribution $\beta^{(1)}_{p,q_1}$ and let $Z$ be a $K$-invariant random variable taking values in $\Omega\cap(e-\Omega)$. Suppose that $X$ and $Z$ are independent. If $Y = \pi(Z)X$ is $\beta^{(1)}_{p,q}$, with $q > q_1 + \frac{n}{r}$, then the variable $Z$ is $\beta^{(1)}_{p+q_1,\,q-q_1}$.
Proof. Using the independence of $X$ and $Z$ and the fact that $\Delta_s(\pi(z)y) = \Delta_s(z)\Delta_s(y)$, we have that for all $s = (s_1,\ldots,s_r)\in\,]0,+\infty[^r$, $\mathbb{E}(\Delta_s(Y)) = \mathbb{E}(\Delta_s(Z))\,\mathbb{E}(\Delta_s(X))$. And using (8.1), we obtain
\[ \mathbb{E}(\Delta_s(Z)) = \frac{\Gamma_\Omega(p+q_1+s)}{\Gamma_\Omega(p+q_1)}\,\frac{\Gamma_\Omega(p+q)}{\Gamma_\Omega(p+q+s)}. \]
Thus $\mathbb{E}(\Delta_s(Z))$ coincides with $\mathbb{E}(\Delta_s(T))$ where $T$ is $\beta^{(1)}_{p+q_1,\,q-q_1}$.
As $Z$ is assumed to be $K$-invariant, we conclude that $Z$ is $\beta^{(1)}_{p+q_1,\,q-q_1}$.
Proposition 8.6. Let $p$ and $q$ be in $\,](r-1)\frac{d}{2},+\infty[$. Let $Z$ and $X$ be independent random variables on $\Omega$, absolutely continuous with respect to the Lebesgue measure. Suppose that
(i) $Z$ has the Wishart probability distribution $W_{p+q,e}$,
(ii) the probability density function of $X$ is $K$-invariant, and
(iii) $Y = P(Z^{1/2})X$ has the Wishart probability distribution $W_{p,e}$.
Then $X$ has the beta–Wishart probability distribution $\beta^{(1)}_{p,q}$.
Proof. We first show that if $X_1$ is a random variable in $\Omega$ with the beta–Wishart probability distribution $\beta^{(1)}_{p,q}$ and independent of $Z$, then the variable $Y_1 = P(Z^{1/2})X_1$ has the Wishart probability distribution $W_{p,e}$. In fact, according to Theorem 2.17, we have that
\[ f_{Y_1}(y) = \frac{\mathbf{1}_\Omega(y)}{\Gamma_\Omega(p)\,\Gamma_\Omega(q)}\int_{y+\Omega} e^{-\operatorname{tr}(z)}\,\Delta^{p+q-\frac{2n}{r}}(z)\,\Delta^{p-\frac{n}{r}}\bigl(P(z^{-1/2})y\bigr)\,\Delta^{q-\frac{n}{r}}\bigl(e - P(z^{-1/2})y\bigr)\,dz. \]
Using the fact that
\[ \Delta\bigl(e - P(z^{-1/2})y\bigr) = \Delta\bigl(P(z^{-1/2})(z-y)\bigr) = \Delta^{-1}(z)\,\Delta(z-y), \]
we obtain
\[ f_{Y_1}(y) = \frac{1}{\Gamma_\Omega(p)\,\Gamma_\Omega(q)}\,\Delta^{p-\frac{n}{r}}(y)\,\mathbf{1}_\Omega(y)\int_{y+\Omega} e^{-\operatorname{tr}(z)}\,\Delta^{q-\frac{n}{r}}(z-y)\,dz. \]
Making the change of variable $u = z - y$, we get
\[ f_{Y_1}(y) = \frac{1}{\Gamma_\Omega(p)}\,\Delta^{p-\frac{n}{r}}(y)\,e^{-\operatorname{tr}(y)}\,\mathbf{1}_\Omega(y). \]
Thus $Y_1 = P(Z^{1/2})X_1$ and $Y = P(Z^{1/2})X$ have the same distribution $W_{p,e}$. Since $Z$ is independent of $X$ and $X_1$, according to Proposition 2.17, we obtain
\[ f_Z * f_X = f_Z * f_{X_1}. \tag{8.4} \]
Taking the spherical Fourier transform, we get $\hat{f}_X = \hat{f}_{X_1}$. Finally, given the $K$-invariance and the fact that $f_{X_1}$ is $\mathcal{C}^\infty$ with compact support, Theorem 2.14 implies that $f_X = f_{X_1}$.
8.1 A characterization of the beta–Wishart probability distribution

The following theorem extends a result established in [48] for the cone of positive definite symmetric matrices to the beta–Wishart probability distribution on any symmetric cone. A particular version appears in Muirhead [91, p. 120], as an exercise on the beta matrix $\beta^{(1)}_{\frac{n}{2},\frac{m}{2}}$ with half-integer parameters defined by the so-called $\chi^2$ matrices constructed using Gaussian vectors. In the necessity part of the proof, we use the Mellin transform. Recall that $A$ is the set of probability distributions on $\Omega$ which are absolutely continuous with respect to the Lebesgue measure, invariant with respect to the orthogonal group $K$ and such that the Mellin transform is finite in an open set.
Theorem 8.7. Let $X$ be a random variable in $\Omega\cap(e-\Omega)$. Then $X$ is $\beta^{(1)}_{p,q}$ distributed, where $p, q > (r-1)\frac{d}{2}$, if and only if
(i) The distribution of $X$ is in $A$;
(ii) For $i\in\{1,\ldots,r\}$, the real random variable $\Delta_{e_i}(X)$, where $e_i$ is defined in (2.30), has the distribution $\beta^{(1)}_{p-(i-1)\frac{d}{2},\,q}$;
(iii) The $\Delta_{e_i}(X)$ are independent.

Proof. ($\Rightarrow$) Suppose that $X$ is $\beta^{(1)}_{p,q}$.
(i) It is easy to see that the distribution $\beta^{(1)}_{p,q}$ is in $A$.
(ii) For all $\theta = (\theta_1,\ldots,\theta_r)$ such that $\theta_i \ge 0$ for all $i\in\{1,\ldots,r\}$, we have
\[
\begin{aligned}
\mathbb{E}\Bigl(\prod_{i=1}^r(\Delta_{e_i}(X))^{\theta_i}\Bigr) &= \mathbb{E}\Bigl(\prod_{i=1}^r\Delta_{\theta_i e_i}(X)\Bigr) = \mathbb{E}(\Delta_\theta(X))\\
&= \frac{\Gamma_\Omega(p+q)\,\Gamma_\Omega(p+\theta)}{\Gamma_\Omega(p)\,\Gamma_\Omega(p+q+\theta)}\\
&= \prod_{i=1}^r\frac{\Gamma(p+q-(i-1)\frac{d}{2})\,\Gamma(p+\theta_i-(i-1)\frac{d}{2})}{\Gamma(p-(i-1)\frac{d}{2})\,\Gamma(p+q+\theta_i-(i-1)\frac{d}{2})}.
\end{aligned}
\]
For $\theta = (0,\ldots,0,\theta_i,0,\ldots,0)$, $\theta_i \ge 0$, we get that
\[ \mathbb{E}\bigl((\Delta_{e_i}(X))^{\theta_i}\bigr) = \frac{\Gamma(p+q-(i-1)\frac{d}{2})\,\Gamma(p+\theta_i-(i-1)\frac{d}{2})}{\Gamma(p-(i-1)\frac{d}{2})\,\Gamma(p+q+\theta_i-(i-1)\frac{d}{2})}, \]
which is the Mellin transform of a $\beta^{(1)}_{p-(i-1)\frac{d}{2},\,q}$ distribution in $\mathbb{R}$. Therefore, $\Delta_{e_i}(X)$ has the distribution $\beta^{(1)}_{p-(i-1)\frac{d}{2},\,q}$.
(iii) From the last equality, we deduce that
\[ \mathbb{E}\Bigl(\prod_{i=1}^r(\Delta_{e_i}(X))^{\theta_i}\Bigr) = \prod_{i=1}^r\mathbb{E}\bigl((\Delta_{e_i}(X))^{\theta_i}\bigr). \]
Consequently, the $\Delta_{e_i}(X)$ are independent.

($\Leftarrow$) Since the distribution of $X$ is in $A$, by Proposition 2.15, it suffices to verify that the distribution of $M(X)$ is equal to the distribution of $M(Y)$ where $Y$ has a $\beta^{(1)}_{p,q}$ distribution. The fact that $X$ is in $\Omega\cap(e-\Omega)$ implies that the random vector $M(X)$ is bounded in $\mathbb{R}^r$. It follows that all its moments are finite and uniquely determine the distribution. Invoking Proposition 8.1, we need only to verify that for $k = (k_1,\ldots,k_r)$ in $\mathbb{N}^r$,
\[ \mathbb{E}\bigl((M(X))^k\bigr) = \prod_{i=1}^r\Biggl(\prod_{\alpha=1}^{k_i}\frac{p+k_i-(i-1)\frac{d}{2}-\alpha}{p+q+k_i-(i-1)\frac{d}{2}-\alpha}\Biggr). \]
In fact, since for all $i\in\{1,\ldots,r\}$, $\Delta_{e_i}(X) \sim \beta^{(1)}_{p-(i-1)\frac{d}{2},\,q}$, we get
\[ \mathbb{E}(\Delta_{k_i e_i}(X)) = \prod_{\alpha=1}^{k_i}\frac{p+k_i-(i-1)\frac{d}{2}-\alpha}{p+q+k_i-(i-1)\frac{d}{2}-\alpha}. \]
This, together with the fact that the $\Delta_{e_i}(X)$ are independent, implies that
\[ \mathbb{E}\bigl((M(X))^k\bigr) = \mathbb{E}(\Delta_k(X)) = \prod_{i=1}^r\mathbb{E}(\Delta_{k_i e_i}(X)) = \prod_{i=1}^r\Biggl(\prod_{\alpha=1}^{k_i}\frac{p+k_i-(i-1)\frac{d}{2}-\alpha}{p+q+k_i-(i-1)\frac{d}{2}-\alpha}\Biggr). \]
As a corollary, we give an independence property.
For $a = (a_1,\ldots,a_r)$ and $b = (b_1,\ldots,b_r)$ in $\mathbb{R}^r$, we define
\[ a.b = (a_1b_1,\ldots,a_rb_r), \]
and when $a_1 \ne 0,\ldots,a_r \ne 0$,
\[ a^{-1} = \Bigl(\frac{1}{a_1},\ldots,\frac{1}{a_r}\Bigr). \]
We also set $\mathbf{1} = (1,\ldots,1)$.

Corollary 8.8. Let $X$ and $Y$ be two independent beta random variables, $X \sim \beta^{(1)}_{p,q}$ and $Y \sim \beta^{(1)}_{p+q,s}$, where $p, q, s > (r-1)\frac{d}{2}$. Then the random vectors
\[ (\mathbf{1} - M(X).M(Y))^{-1}.(\mathbf{1} - M(Y)) \qquad\text{and}\qquad \mathbf{1} - M(X).M(Y) \]
are independent.

Proof. From Theorem 8.7, we have that, for all $i \ne j$, $\Delta_{e_i}(X)$ and $\Delta_{e_j}(X)$ are independent, and also $\Delta_{e_i}(Y)$ and $\Delta_{e_j}(X)$ are independent. Then $\frac{1-\Delta_{e_i}(Y)}{1-\Delta_{e_i}(X)\Delta_{e_i}(Y)}$ is independent of $1 - \Delta_{e_j}(X)\Delta_{e_j}(Y)$. On the other hand, as $\Delta_{e_i}(X)$ and $\Delta_{e_i}(Y)$ are independent with distributions $\beta^{(1)}_{p-(i-1)\frac{d}{2},\,q}$ and $\beta^{(1)}_{p+q-(i-1)\frac{d}{2},\,s}$, respectively, using Theorem 8.3, we have that $\frac{1-\Delta_{e_i}(Y)}{1-\Delta_{e_i}(X)\Delta_{e_i}(Y)}$ is independent of $1 - \Delta_{e_i}(X)\Delta_{e_i}(Y)$. Thus $[\mathbf{1} - M(X).M(Y)]^{-1}.[\mathbf{1} - M(Y)]$ and $\mathbf{1} - M(X).M(Y)$ are independent.
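Each coordinate of the corollary is the real case of Theorem 8.3, which can be explored by simulation. The following Monte Carlo sketch is illustrative only: it uses the standard-library sampler `random.betavariate`, hypothetical parameter values, and a sample correlation as a crude proxy for independence (a vanishing correlation is necessary but not sufficient).

```python
import random

def simulate(p, q, s, n_samples=100_000, seed=0):
    """Draw (U, W) with U = (1 - Y) / (1 - X Y) and W = 1 - X Y."""
    rng = random.Random(seed)
    us, ws = [], []
    for _ in range(n_samples):
        x = rng.betavariate(p, q)        # X ~ beta(p, q)
        y = rng.betavariate(p + q, s)    # Y ~ beta(p + q, s)
        w = 1.0 - x * y
        us.append((1.0 - y) / w)
        ws.append(w)
    return us, ws

def mean(v):
    return sum(v) / len(v)

def correlation(a, b):
    ma, mb = mean(a), mean(b)
    cov = mean([(x - ma) * (y - mb) for x, y in zip(a, b)])
    va = mean([(x - ma) ** 2 for x in a])
    vb = mean([(y - mb) ** 2 for y in b])
    return cov / (va * vb) ** 0.5

p, q, s = 2.0, 1.5, 3.0
us, ws = simulate(p, q, s)
print(mean(us), s / (s + q))             # sample mean of U vs. mean of beta(s, q)
print(mean(ws), (s + q) / (s + q + p))   # sample mean of W vs. mean of beta(s + q, p)
print(correlation(us, ws))               # expected to be close to 0
```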
8.2 Product of beta–Wishart random variables

The product we use is defined in a recursive way using the multiplication algorithm corresponding to the Cholesky decomposition. More precisely, we set
\[ \bigodot_{i=1}^{1} X_i = X_1, \qquad \bigodot_{i=1}^{n} X_i = \pi\Bigl(\bigodot_{i=1}^{n-1} X_i\Bigr)X_n. \tag{8.5} \]
Under some conditions on the parameters, the product of independent $\beta^{(1)}$ random variables is $\beta^{(1)}$ distributed.

Theorem 8.9. Let $X_1,\ldots,X_n$ be independent random variables in $\Omega\cap(e-\Omega)$ such that $X_i$ is $\beta^{(1)}_{p_i,q_i}$, $1 \le i \le n$. Then
\[ \bigodot_{i=1}^{n} X_i \ \text{is}\ \beta^{(1)}_{p_n,\,\sum_{i=1}^n q_i} \quad\text{if and only if}\quad p_i = p_{i+1} + q_{i+1}\ \text{for all } i = 1,\ldots,n-1. \]
Proof. ($\Rightarrow$) Suppose that $Y = \bigodot_{i=1}^n X_i$ is $\beta^{(1)}_{p_n,\,\sum_{i=1}^n q_i}$-distributed. We first show that $(p_n, p_1+q_1,\ldots,p_n+q_n)$ is a permutation of $(p_n + \sum_{i=1}^n q_i, p_1,\ldots,p_n)$. In fact, as $\Delta_s(\pi(x)y) = \Delta_s(x)\Delta_s(y)$ and the variables $X_i$ are independent, we have that for all $s = (s_1,\ldots,s_r)\in\,]0,+\infty[^r$,
\[ \frac{\mathbb{E}(\Delta_{s+\mathbf{1}}(Y))}{\mathbb{E}(\Delta_s(Y))} = \prod_{i=1}^n\frac{\mathbb{E}(\Delta_{s+\mathbf{1}}(X_i))}{\mathbb{E}(\Delta_s(X_i))}. \]
The latter, using (8.1) and the fact that
\[ \Gamma_\Omega(s+\mathbf{1}) = \prod_{i=1}^r\Bigl(s_i - (i-1)\frac{d}{2}\Bigr)\Gamma_\Omega(s), \]
gives that
\[ \prod_{i=1}^r\frac{p_n + s_i - (i-1)\frac{d}{2}}{p_n + \sum_{l=1}^n q_l + s_i - (i-1)\frac{d}{2}} = \prod_{j=1}^n\prod_{i=1}^r\frac{p_j + s_i - (i-1)\frac{d}{2}}{p_j + q_j + s_i - (i-1)\frac{d}{2}}. \tag{8.6} \]
Now define, for $s = (s_1,\ldots,s_r)\in\,]0,+\infty[^r$,
\[ f(s) = \prod_{j=1}^n\prod_{i=1}^r\Bigl(p_n + s_i - (i-1)\frac{d}{2}\Bigr)\Bigl(p_j + q_j + s_i - (i-1)\frac{d}{2}\Bigr) \]
and
\[ g(s) = \prod_{j=1}^n\prod_{i=1}^r\Bigl(p_n + \sum_{l=1}^n q_l + s_i - (i-1)\frac{d}{2}\Bigr)\Bigl(p_j + s_i - (i-1)\frac{d}{2}\Bigr). \]
Then from (8.6), we have that $f(s) = g(s)$, for $s = (s_1,\ldots,s_r)\in\,]0,+\infty[^r$. Since $f$ and $g$ are two polynomial functions in $s$, we obtain that $f(s) = g(s)$ for $s\in\mathbb{R}^r$. In particular, they have the same roots. We then deduce that $(p_n, p_1+q_1,\ldots,p_n+q_n)$ is some permutation of $(p_n + \sum_{i=1}^n q_i, p_1,\ldots,p_n)$. For $n = 2$, this means that $p_1 = p_2 + q_2$, when $X_1$ and $X_2$ are two independent random variables with distributions $\beta^{(1)}_{p_1,q_1}$ and $\beta^{(1)}_{p_2,q_2}$, respectively, such that $Y = \pi(X_1)X_2$ is $\beta^{(1)}_{p_2,\,q_1+q_2}$-distributed. Therefore the result is true for $n = 2$. Now suppose that it is true for the product of $n-1$ random variables and write
\[ \bigodot_{i=1}^{n} X_i = \pi\Bigl(\bigodot_{i=1}^{n-1} X_i\Bigr)X_n. \tag{8.7} \]
We have that $X_n$ and $\bigodot_{i=1}^{n-1} X_i$ are independent such that $X_n$ is $\beta^{(1)}_{p_n,q_n}$-distributed. According to Proposition 8.5, the fact that $\bigodot_{i=1}^n X_i$ is $\beta^{(1)}_{p_n,\,\sum_{i=1}^n q_i}$-distributed implies that $\bigodot_{i=1}^{n-1} X_i$ is $\beta^{(1)}_{p_n+q_n,\,\sum_{i=1}^{n-1} q_i}$-distributed. On the other hand, by the induction hypothesis, we have that $p_i = p_{i+1} + q_{i+1}$, for all $i = 1,\ldots,n-2$. This, with the fact that $(p_n, p_1+q_1,\ldots,p_n+q_n)$ is some permutation of $(p_n + \sum_{i=1}^n q_i, p_1,\ldots,p_n)$, implies that we also have $p_{n-1} = p_n + q_n$.

($\Leftarrow$) The proof is also performed by induction on $n$. We first verify that the result is true for $n = 2$. In fact, if $X_1$ and $X_2$ are two independent random variables with beta distributions $\beta^{(1)}_{p_1,q_1}$ and $\beta^{(1)}_{p_2,q_2}$, respectively, such that $p_1 = p_2 + q_2$, and if $Y = \pi(X_1)X_2$, then, according to Lemma 2.17, we have that
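\[
\begin{aligned}
f_Y(y) &= \int_\Omega f_{X_1}(z)\,f_{X_2}(\pi^{-1}(z)y)\,\Delta^{-\frac{n}{r}}(z)\,dz\\
&= \frac{\Gamma_\Omega(p_2+q_2+q_1)}{\Gamma_\Omega(p_2)\,\Gamma_\Omega(q_2)\,\Gamma_\Omega(q_1)}\,\Delta(y)^{p_2-\frac{n}{r}}\,\mathbf{1}_{\Omega\cap(e-\Omega)}(y)\int_{y+\Omega}\Delta^{q_1-\frac{n}{r}}(e-z)\,\Delta^{q_2-\frac{n}{r}}(z-y)\,dz,
\end{aligned}
\]
where the constant is $1/(B_\Omega(p_1,q_1)B_\Omega(p_2,q_2))$ simplified by means of $p_1 = p_2 + q_2$. Making the change of variables $u = \pi^{-1}(e-y)(e-z)$, we obtain that
\[
\begin{aligned}
\int_{y+\Omega}\Delta(e-z)^{q_1-\frac{n}{r}}\,\Delta(z-y)^{q_2-\frac{n}{r}}\,dz &= \Delta(e-y)^{q_1+q_2-\frac{n}{r}}\int_{\Omega\cap(e-\Omega)}\Delta(u)^{q_1-\frac{n}{r}}\,\Delta(e-u)^{q_2-\frac{n}{r}}\,du\\
&= \frac{\Gamma_\Omega(q_2)\,\Gamma_\Omega(q_1)}{\Gamma_\Omega(q_2+q_1)}\,\Delta(e-y)^{q_1+q_2-\frac{n}{r}}.
\end{aligned}
\]
Hence
\[ f_Y(y) = \frac{\Gamma_\Omega(p_2+q_2+q_1)}{\Gamma_\Omega(p_2)\,\Gamma_\Omega(q_2+q_1)}\,\Delta(y)^{p_2-\frac{n}{r}}\,\Delta(e-y)^{q_1+q_2-\frac{n}{r}}\,\mathbf{1}_{\Omega\cap(e-\Omega)}(y). \]
Therefore $Y = \pi(X_1)X_2$ has the beta–Wishart probability distribution $\beta^{(1)}_{p_2,\,q_1+q_2}$ on $\Omega$.
Suppose now that the result is true for $n-1$ variables. Then $\bigodot_{i=1}^{n-1} X_i$ is $\beta^{(1)}_{p_{n-1},\,\sum_{i=1}^{n-1} q_i}$-distributed. Thus we have that the two random variables $\bigodot_{i=1}^{n-1} X_i$ and $X_n$ are independent with distributions $\beta^{(1)}_{p_{n-1},\,\sum_{i=1}^{n-1} q_i}$ and $\beta^{(1)}_{p_n,q_n}$, respectively, and also $p_{n-1} = p_n + q_n$. Using (8.7) and applying the first step, we deduce that $\bigodot_{i=1}^n X_i$ is $\beta^{(1)}_{p_n,\,\sum_{i=1}^n q_i}$-distributed.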
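In the real case the product $\bigodot$ is the ordinary product, and the sufficiency part of Theorem 8.9 reduces to a telescoping identity between Mellin transforms. The sketch below verifies that identity numerically; the helper names and parameter values are illustrative assumptions.

```python
import math

def log_beta1_moment(p, q, k):
    """log E[X^k] for X ~ beta(p, q) on ]0, 1[ (real case of (8.1))."""
    return (math.lgamma(p + q) + math.lgamma(p + k)
            - math.lgamma(p) - math.lgamma(p + q + k))

def chain_parameters(p_n, qs):
    """Build (p_1, ..., p_n) from p_n and (q_1, ..., q_n) via p_i = p_{i+1} + q_{i+1}."""
    ps = [p_n]
    for q_next in reversed(qs[1:]):
        ps.append(ps[-1] + q_next)
    return list(reversed(ps))

qs = [0.7, 1.2, 2.0]            # q_1, q_2, q_3
p_n = 1.5                       # p_3
ps = chain_parameters(p_n, qs)  # p_2 = p_3 + q_3, p_1 = p_2 + q_2
for k in (0.5, 1, 2, 3.7):
    # Independence gives E[(X_1 X_2 X_3)^k] as a product of individual moments.
    lhs = sum(log_beta1_moment(p, q, k) for p, q in zip(ps, qs))
    rhs = log_beta1_moment(p_n, sum(qs), k)
    assert math.isclose(lhs, rhs, abs_tol=1e-10)
```

Since a distribution on $]0,1[$ is determined by its moments, agreement of the two sides for all integers $k$ identifies the law of the product.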
8.3 Stability property

The purpose of this section is to give a property of stability which involves the beta–Wishart probability distributions of the first and second kind. This property will be extended in the next chapter to a characterization property of the beta–hypergeometric probability distribution.

Theorem 8.10. Let $Z$ be a $\beta^{(2)}_{p,q}$ random variable and $X$ an independent nondegenerate $K$-invariant random variable taking values in $\Omega\cap(e-\Omega)$. Then $X$ is $\beta^{(1)}_{p',q'}$-distributed, with $p = p' + q'$, if and only if the variable $Y = \pi(Z)X$ is $\beta^{(2)}_{p',q}$-distributed.

Proof. ($\Rightarrow$) Using Lemma 2.17, we have
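\[
\begin{aligned}
f_Y(y) &= \int_\Omega f_Z(z)\,f_X(\pi^{-1}(z)y)\,\Delta^{-\frac{n}{r}}(z)\,dz\\
&= \frac{\Gamma_\Omega(p+q)}{\Gamma_\Omega(q)\,\Gamma_\Omega(p')\,\Gamma_\Omega(q')}\int_{y+\Omega}\Delta^{p-\frac{n}{r}}(z)\,\Delta^{-p-q}(e+z)\,\Delta^{p'-\frac{n}{r}}(\pi^{-1}(z)y)\,\Delta^{q'-\frac{n}{r}}(e-\pi^{-1}(z)y)\,\Delta^{-\frac{n}{r}}(z)\,\mathbf{1}_\Omega(y)\,dz\\
&= \frac{\Gamma_\Omega(p+q)}{\Gamma_\Omega(q)\,\Gamma_\Omega(p')\,\Gamma_\Omega(q')}\,\Delta^{p'-\frac{n}{r}}(y)\,\mathbf{1}_\Omega(y)\int_{y+\Omega}\Delta^{-p-q}(e+z)\,\Delta^{q'-\frac{n}{r}}(z-y)\,dz.
\end{aligned}
\]
Making the change of variable $u = \pi^{-1}(e+y)(z-y)$, we get
\[
\begin{aligned}
\int_{y+\Omega}\Delta^{-p-q}(e+z)\,\Delta^{q'-\frac{n}{r}}(z-y)\,dz &= \int_\Omega\Delta^{q'-\frac{n}{r}}\bigl(\pi(e+y)(u)\bigr)\,\Delta^{-p-q}\bigl(\pi(e+y)(e+u)\bigr)\,\Delta^{\frac{n}{r}}(e+y)\,du\\
&= \Delta^{-p'-q}(e+y)\int_\Omega\Delta^{q'-\frac{n}{r}}(u)\,\Delta^{-p-q}(e+u)\,du\\
&= \frac{\Gamma_\Omega(q')\,\Gamma_\Omega(p'+q)}{\Gamma_\Omega(p'+q'+q)}\,\Delta^{-p'-q}(e+y).
\end{aligned}
\]
Thus
\[ f_Y(y) = \frac{\Gamma_\Omega(p'+q)}{\Gamma_\Omega(p')\,\Gamma_\Omega(q)}\,\Delta(y)^{p'-\frac{n}{r}}\,\Delta(e+y)^{-p'-q}\,\mathbf{1}_\Omega(y). \]
This shows that $Y$ is $\beta^{(2)}_{p',q}$-distributed.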
($\Leftarrow$) Suppose that $Y = \pi(Z)X$ is $\beta^{(2)}_{p_1,q_1}$-distributed. Then using (8.3) and the definition of $\Gamma_\Omega(\cdot)$, the fact that $\mathbb{E}(\Delta_s(Y)) = \mathbb{E}(\Delta_s(X))\,\mathbb{E}(\Delta_s(Z))$ implies
\[ \prod_{i=1}^r\frac{\Gamma(p_1+s_i-(i-1)\frac{d}{2})}{\Gamma(p_1-(i-1)\frac{d}{2})}\,\frac{\Gamma(q_1-s_i-(i-1)\frac{d}{2})}{\Gamma(q_1-(i-1)\frac{d}{2})} = \mathbb{E}(\Delta_s(X))\prod_{i=1}^r\frac{\Gamma(p+s_i-(i-1)\frac{d}{2})}{\Gamma(p-(i-1)\frac{d}{2})}\,\frac{\Gamma(q-s_i-(i-1)\frac{d}{2})}{\Gamma(q-(i-1)\frac{d}{2})}. \]
In particular, if $s = (u,0,\ldots,0)$, we get
\[ \frac{\Gamma(p_1+u)}{\Gamma(p_1)}\,\frac{\Gamma(q_1-u)}{\Gamma(q_1)} = \mathbb{E}(\Delta_1(X)^u)\,\frac{\Gamma(p+u)}{\Gamma(p)}\,\frac{\Gamma(q-u)}{\Gamma(q)}. \]
According to Theorem 2 in [60], we obtain $q_1 = q$ and $p_1 < p$. Thus
\[ \mathbb{E}(\Delta_s(X)) = \frac{\Gamma_\Omega(p_1+s)}{\Gamma_\Omega(p_1)}\,\frac{\Gamma_\Omega(p_1+(p-p_1))}{\Gamma_\Omega(p_1+(p-p_1)+s)}. \]
Using the fact that $X$ is $K$-invariant and the expression of the Mellin transform of a beta–Wishart distribution given in (8.1), we deduce that $X$ has the beta distribution $\beta^{(1)}_{p_1,\,p-p_1}$.

Note that the result in Theorem 8.10 holds if we replace $\pi(Z)X$ by $\pi(X)Z$. In this case, the proof of the sufficiency part is the same, but the proof of the necessity part changes: it involves the Gauss hypergeometric function ${}_2F_1$. In fact, if $W = \pi(X)Z$, then for $w$ in $\Omega$,
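\[
\begin{aligned}
f_W(w) &= \int_\Omega f_X(x)\,f_Z(\pi^{-1}(x)w)\,\Delta^{-\frac{n}{r}}(x)\,dx\\
&= \frac{\Gamma_\Omega(p+q)}{\Gamma_\Omega(p)\,\Gamma_\Omega(q)}\,\frac{\Gamma_\Omega(p'+q')}{\Gamma_\Omega(p')\,\Gamma_\Omega(q')}\int_{\Omega\cap(e-\Omega)}\Delta^{p'-\frac{n}{r}}(x)\,\Delta^{q'-\frac{n}{r}}(e-x)\,\Delta^{p-\frac{n}{r}}(\pi^{-1}(x)(w))\,\Delta^{-p-q}(e+\pi^{-1}(x)w)\,\Delta(x)^{-\frac{n}{r}}\,dx\\
&= \frac{\Gamma_\Omega(p+q)}{\Gamma_\Omega(p)\,\Gamma_\Omega(q)}\,\frac{\Gamma_\Omega(p'+q')}{\Gamma_\Omega(p')\,\Gamma_\Omega(q')}\,\Delta^{p-\frac{n}{r}}(w)\int_{\Omega\cap(e-\Omega)}\Delta^{p'+q-\frac{n}{r}}(x)\,\Delta^{q'-\frac{n}{r}}(e-x)\,\Delta^{-p-q}(w+x)\,dx\\
&= \frac{\Gamma_\Omega(p+q)}{\Gamma_\Omega(p)\,\Gamma_\Omega(q)}\,\frac{\Gamma_\Omega(p'+q')}{\Gamma_\Omega(p')\,\Gamma_\Omega(q')}\,\Delta^{-q-\frac{n}{r}}(w)\int_{\Omega\cap(e-\Omega)}\Delta^{p'+q-\frac{n}{r}}(x)\,\Delta^{q'-\frac{n}{r}}(e-x)\,\Delta^{-p-q}(e+\pi^{-1}(w)x)\,dx.
\end{aligned}
\]
According to [31, p. 335], we obtain
\[ f_W(w) = c\,\Delta^{-q-\frac{n}{r}}(w)\;{}_2F_1\bigl(p+q,\ p'+q;\ p'+q'+q;\ -\pi^{-1}(w)(e)\bigr)\,\mathbf{1}_\Omega(w), \]
where
\[ c = \frac{\Gamma_\Omega(p+q)}{\Gamma_\Omega(p)\,\Gamma_\Omega(q)}\,\frac{\Gamma_\Omega(p'+q')}{\Gamma_\Omega(p')}\,\frac{\Gamma_\Omega(p'+q)}{\Gamma_\Omega(p'+q'+q)}. \]
Under the condition $p = p' + q'$, and from the definition of ${}_2F_1$, we get
\[ f_W(w) = \frac{\Gamma_\Omega(p'+q)}{\Gamma_\Omega(p')\,\Gamma_\Omega(q)}\,\Delta^{p'-\frac{n}{r}}(w)\,\Delta^{-p'-q}(e+w)\,\mathbf{1}_\Omega(w). \]
Therefore $W$ is $\beta^{(2)}_{p',q}$-distributed.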
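In the real case the stability property of Theorem 8.10 is an identity between Mellin transforms: with $X \sim \beta^{(1)}_{p',q'}$, $Z \sim \beta^{(2)}_{p,q}$ and $p = p' + q'$ (primes denoting the parameters of $X$), the product of the transforms in (8.1) and (8.3) is the transform of $\beta^{(2)}_{p',q}$. The sketch below checks this numerically; function names and parameter values are illustrative.

```python
import math

def log_mellin_beta1(p, q, s):
    """log E[X^s] for X ~ beta^(1)(p, q); real case of (8.1), valid for s > -p."""
    return (math.lgamma(p + q) + math.lgamma(p + s)
            - math.lgamma(p) - math.lgamma(p + q + s))

def log_mellin_beta2(p, q, s):
    """log E[Z^s] for Z ~ beta^(2)(p, q); real case of (8.3), valid for -p < s < q."""
    return (math.lgamma(p + s) + math.lgamma(q - s)
            - math.lgamma(p) - math.lgamma(q))

p1, q1, q = 1.3, 0.9, 2.4       # hypothetical values of p', q', q
p = p1 + q1                     # the condition p = p' + q'
for s in (-1.0, -0.25, 0.5, 2.0):
    lhs = log_mellin_beta2(p, q, s) + log_mellin_beta1(p1, q1, s)   # transform of Z X
    rhs = log_mellin_beta2(p1, q, s)                                # beta^(2)(p', q)
    assert math.isclose(lhs, rhs, abs_tol=1e-10)
```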
Theorem 8.11. Let $W'$, $X$, and $W$ be three independent random variables valued in $\Omega$.
(i) If $W' \sim \beta^{(2)}_{a+a',a'}$ and $X \sim \beta^{(1)}_{a,a'}$, then $\pi^{-1}(e+\pi(X)(W'))(e) \sim \beta^{(1)}_{a',a}$.
(ii) If $W' \sim \beta^{(2)}_{a+a',a'}$, $X \sim \beta^{(1)}_{a,a'}$, and $W \sim \beta^{(2)}_{a+a',a}$, then
\[ \pi^{-1}\bigl(e + \pi^{-1}(e+\pi(X)(W'))(W)\bigr)(e) \sim X. \]

Proof. (i) Let $W'$ and $X$ be two independent random variables such that $W' \sim \beta^{(2)}_{a+a',a'}$ and $X \sim \beta^{(1)}_{a,a'}$. From Theorem 8.10, we have that $\pi(X)(W') \sim \beta^{(2)}_{a,a'}$, and, according to Theorem 8.2, we obtain
\[ \pi^{-1}(e+\pi(X)(W'))\bigl(\pi(X)(W')\bigr) \sim \beta^{(1)}_{a,a'}. \]
It follows that
\[ \pi^{-1}(e+\pi(X)(W'))(e) = e - \pi^{-1}(e+\pi(X)(W'))\bigl(\pi(X)(W')\bigr) \sim \beta^{(1)}_{a',a}. \]
(ii) As
\[ \pi^{-1}(e+\pi(X)(W'))(W) = \pi\bigl(\pi^{-1}(e+\pi(X)(W'))(e)\bigr)(W), \]
then, according to Theorem 8.10 and the previous point (i), we obtain that $\pi^{-1}(e+\pi(X)(W'))(W) \sim \beta^{(2)}_{a',a}$. Therefore
\[ \pi^{-1}\bigl(e + \pi^{-1}(e+\pi(X)(W'))(W)\bigr)(e) \sim X. \]
8.4 Characterizations by constancy regression

Theorem 8.12. Let $X$ and $Y$ be two independent random variables with distributions in $A$ such that the probability density functions are positive in $\Omega\cap(e-\Omega)$. Denote
\[ W = \mathbf{1} - M(X).M(Y) \qquad\text{and}\qquad U = W^{-1}.(\mathbf{1} - M(Y)). \tag{8.8} \]
Assume that
\[ \mathbb{E}(U \mid W) = c\,\mathbf{1}, \tag{8.9} \]
\[ \mathbb{E}(U\otimes U \mid W) = (b-c^2)\,e + c^2\,\mathbf{1}\otimes\mathbf{1}, \tag{8.10} \]
\[ \mathbb{E}(M(Y)) = ((a+s)\mathbf{1} - \eta)^{-1}.(a\mathbf{1} - \eta), \tag{8.11} \]
where $\eta = (0,\frac{d}{2},\ldots,(r-1)\frac{d}{2})$ and $a$, $b$, and $c$ are positive real constants such that $c\in\,]0,1[$, $s = \frac{c(b-c)}{c^2-b} > (r-1)\frac{d}{2}$, and $q = \frac{(1-c)(b-c)}{c^2-b} > (r-1)\frac{d}{2}$. Then $a > q + (r-1)\frac{d}{2}$, $X \sim \beta^{(1)}_{p,q}$, and $Y \sim \beta^{(1)}_{p+q,s}$, where $p = a - q$. Consequently, the probability density functions of $U$ and $W$ with respect to the Lebesgue measure are respectively
\[ f_U(u_1,\ldots,u_r) = \prod_{i=1}^r\frac{u_i^{s-1}(1-u_i)^{q-1}}{B(s,q)}\,\mathbf{1}_{]0,1[}(u_i) \]
and
\[ f_W(w_1,\ldots,w_r) = \prod_{i=1}^r\frac{w_i^{s+q-1}(1-w_i)^{p-(i-1)\frac{d}{2}-1}}{B(s+q,\,p-(i-1)\frac{d}{2})}\,\mathbf{1}_{]0,1[}(w_i). \]
The real version of this theorem is proved in [105]; it will serve in the proof of the theorem in its general form. In this case, assumption (8.11), required to get the fitting parameters, is no longer needed and the statement of the theorem becomes:

Theorem 8.13. Let $X$ and $Y$ be independent nondegenerate random variables valued in $\,]0,1[$. Denote $W = 1 - XY$ and $U = \frac{1-Y}{1-XY}$. Assume that
\[ \mathbb{E}(U \mid W) = c \tag{8.12} \]
and
\[ \mathbb{E}(U^2 \mid W) = b, \tag{8.13} \]
for some real constants $c$, $b$ such that $q = \frac{(1-c)(b-c)}{c^2-b} > 0$, $s = \frac{c(b-c)}{c^2-b} > 0$. Then there exists $p > 0$ such that $X \sim \beta^{(1)}_{p,q}$, $Y \sim \beta^{(1)}_{p+q,s}$ and, consequently, $U$ and $W$ are independent such that $U \sim \beta^{(1)}_{s,q}$ and $W \sim \beta^{(1)}_{s+q,p}$.
Proof of Theorem 8.13. Equations (8.12) and (8.13) may be written in terms of $X$ and $Y$ as
\[ \mathbb{E}(1-Y \mid XY) = c(1-XY), \tag{8.14} \]
\[ \mathbb{E}((1-Y)^2 \mid XY) = b(1-XY)^2. \tag{8.15} \]
As $X$ and $Y$ are bounded, their probability distributions are characterized by the sequences of their moments. From (8.14), we obtain that for all $k\in\mathbb{N}$,
\[ \mathbb{E}\bigl((1-Y)(XY)^k\bigr) = c\,\mathbb{E}\bigl((1-XY)(XY)^k\bigr). \]
Setting $g(k) = \frac{\mathbb{E}(X^{k+1})}{\mathbb{E}(X^k)}$ and $h(k) = \frac{\mathbb{E}(Y^{k+1})}{\mathbb{E}(Y^k)}$, the above equation can be written, for all $k\in\mathbb{N}$, as
\[ h(k)(1 - c\,g(k)) = 1 - c. \tag{8.16} \]
Similarly, from (8.15), we obtain that for all $k\in\mathbb{N}$,
\[ \mathbb{E}\bigl((1-Y)^2(XY)^k\bigr) = b\,\mathbb{E}\bigl((1-XY)^2(XY)^k\bigr), \]
which can be written as
\[ 1 - 2h(k) + h(k)h(k+1) = b - 2b\,h(k)g(k) + b\,h(k)h(k+1)g(k)g(k+1). \tag{8.17} \]
Upon substitution, (8.16) and (8.17) lead to
\[ h(k+1)\bigl[(c^2-b)h(k) + b(1-c)\bigr] = \bigl[b(1-c) + 2(c^2-b)\bigr]h(k) - (c^2-b). \]
Given that the random variables are in $\,]0,1[$, nondegenerate, and that $b = \mathbb{E}(U^2)$ and $c = \mathbb{E}(U)$, we have that $1 > c > b > c^2$. Thus $s = \frac{c(b-c)}{c^2-b} > 0$, and we have
\[ h(k+1)(s+1-h(k)) = 1 + (s-1)h(k). \]
This shows that for all $k\in\mathbb{N}$, $h(k) \ne s+1$, so that
\[ h(k+1) = \frac{1 + (s-1)h(k)}{s+1-h(k)}. \]
Now defining $a = \frac{s\,\mathbb{E}(Y)}{1-\mathbb{E}(Y)}$, we have $h(0) = \mathbb{E}(Y) = \frac{a}{a+s}$, and it follows that for all $k\in\mathbb{N}$,
\[ \mathbb{E}(Y^{k+1}) = \frac{k+a}{k+a+s}\,\mathbb{E}(Y^k). \]
From this we deduce that $Y \sim \beta^{(1)}_{a,s}$. On the other hand, from (8.16), we get
\[ g(k) = \frac{k + a - \frac{1-c}{c}s}{k+a}, \qquad\text{for all } k\in\mathbb{N}. \]
This implies that $p = a - \frac{1-c}{c}s > 0$ and that $X \sim \beta^{(1)}_{p,q}$ with $q = a - p = \frac{1-c}{c}s$.
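Two computational steps of this proof are easy to check numerically: the recovery of $(s,q)$ from the first two moments $c$ and $b$ of $U$, and the fact that the recursion for $h$ started at $h(0) = a/(a+s)$ has the closed form $h(k) = (k+a)/(k+a+s)$. The sketch below does both; the function names and the numerical values are illustrative assumptions.

```python
import math

def params_from_moments(c, b):
    """Return (s, q) from c = E(U) and b = E(U^2), as in Theorem 8.13."""
    s = c * (b - c) / (c * c - b)
    q = (1 - c) * (b - c) / (c * c - b)
    return s, q

def iterate_h(a, s, steps):
    """Iterate h(k+1) = (1 + (s-1) h(k)) / (s + 1 - h(k)) from h(0) = a / (a + s)."""
    h = [a / (a + s)]
    for _ in range(steps):
        h.append((1 + (s - 1) * h[-1]) / (s + 1 - h[-1]))
    return h

# First two moments of a beta(s0, q0) variable (illustrative values).
s0, q0 = 2.0, 3.0
c = s0 / (s0 + q0)
b = s0 * (s0 + 1) / ((s0 + q0) * (s0 + q0 + 1))
s, q = params_from_moments(c, b)
assert math.isclose(s, s0) and math.isclose(q, q0)

a = 4.5
for k, hk in enumerate(iterate_h(a, s, 10)):
    # Closed form corresponding to E(Y^{k+1}) = (k + a) / (k + a + s) E(Y^k).
    assert math.isclose(hk, (k + a) / (k + a + s))
```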
We now give the proof of Theorem 8.12.
Proof. As $X$ and $Y$ are two independent random variables with positive probability density functions in $\Omega \cap (e-\Omega)$, for every $i \in \{1,\dots,r\}$, $\Delta_{e_i}(Y)$ and $\Delta_{e_i}(X)$ are independent and nondegenerate random variables valued in $]0,1[$. Also, from (8.9) and (8.10), we have that
\[ \mathbb{E}\Big(\frac{1-\Delta_{e_i}(Y)}{1-\Delta_{e_i}(X)\Delta_{e_i}(Y)} \,\Big|\, W\Big) = c \tag{8.18} \]
and
\[ \mathbb{E}\Big(\Big(\frac{1-\Delta_{e_i}(Y)}{1-\Delta_{e_i}(X)\Delta_{e_i}(Y)}\Big)^2 \,\Big|\, W\Big) = b. \tag{8.19} \]
This, according to Theorem 8.13, implies that
\[ \Delta_{e_i}(Y) \sim \beta^{(1)}_{p_i+q,\,s} \quad\text{and}\quad \Delta_{e_i}(X) \sim \beta^{(1)}_{p_i,\,q}, \]
where $s = \frac{c(b-c)}{c^2-b} > 0$, $q = \frac{(1-c)(b-c)}{c^2-b} > 0$, and $p_i > 0$. Using (8.11), we deduce that for all $i \in \{1,\dots,r\}$,
\[ \mathbb{E}(\Delta_{e_i}(Y)) = \frac{p_i+q}{p_i+q+s} = \frac{a-(i-1)\frac{d}{2}}{a+s-(i-1)\frac{d}{2}}. \]
Thus $p_i = a - q - (i-1)\frac{d}{2}$. If we set $p = a - q$, then $p_i = p - (i-1)\frac{d}{2}$, and consequently,
\[ \Delta_{e_i}(Y) \sim \beta^{(1)}_{p+q-(i-1)\frac{d}{2},\,s} \quad\text{and}\quad \Delta_{e_i}(X) \sim \beta^{(1)}_{p-(i-1)\frac{d}{2},\,q}. \]
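Now as $1-\Delta_{e_i}(X)\Delta_{e_i}(Y)$ is $W$-measurable, we have from (8.18) that
\[ \mathbb{E}(1-\Delta_{e_i}(Y) \mid W) = c\,(1-\Delta_{e_i}(X)\Delta_{e_i}(Y)). \]
It follows that for any $k \in \mathbb{N}^r$,
\[ \mathbb{E}[(1-\Delta_{e_i}(Y))\Delta_k(X)\Delta_k(Y)] = c\,\mathbb{E}[(1-\Delta_{e_i}(X)\Delta_{e_i}(Y))\Delta_k(X)\Delta_k(Y)]. \tag{8.20} \]
If we set
\[ h_i(k) = \frac{\mathbb{E}(\Delta_{k+e_i}(Y))}{\mathbb{E}(\Delta_k(Y))} \quad\text{and}\quad g_i(k) = \frac{\mathbb{E}(\Delta_{k+e_i}(X))}{\mathbb{E}(\Delta_k(X))}, \]
then (8.20) can be written as
\[ h_i(k)\,[1 - c\,g_i(k)] = 1 - c. \tag{8.21} \]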
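Also from (8.10), we obtain for $i \neq j$ that
\[ \mathbb{E}\Big(\frac{1-\Delta_{e_i}(Y)}{1-\Delta_{e_i}(X)\Delta_{e_i}(Y)} \cdot \frac{1-\Delta_{e_j}(Y)}{1-\Delta_{e_j}(X)\Delta_{e_j}(Y)} \,\Big|\, W\Big) = c^2. \]
For any $k \in \mathbb{N}^r$, we have that
\[ \mathbb{E}((1-\Delta_{e_i}(Y))(1-\Delta_{e_j}(Y))\Delta_k(X)\Delta_k(Y)) = c^2\,\mathbb{E}((1-\Delta_{e_i}(X)\Delta_{e_i}(Y))(1-\Delta_{e_j}(X)\Delta_{e_j}(Y))\Delta_k(X)\Delta_k(Y)). \]
This, using (8.21) and some elementary calculations, leads to the relation
\[ h_j(k) = h_j(k+e_i). \tag{8.22} \]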
Thus
\[ \mathbb{E}(\Delta_{k+e_i+e_j}(Y)) = \frac{\mathbb{E}(\Delta_{k+e_i}(Y))\,\mathbb{E}(\Delta_{k+e_j}(Y))}{\mathbb{E}(\Delta_k(Y))}. \tag{8.23} \]
Now suppose that for some $k = (k_1,\dots,k_r)$ in $\mathbb{N}^r$,
\[ \mathbb{E}(\Delta_k(Y)) = \prod_{i=1}^{r}\mathbb{E}(\Delta_{k_ie_i}(Y)), \tag{8.24} \]
then, using (8.23), we verify that
\[ \mathbb{E}(\Delta_{\sum_{i=1}^{r}(k_i+\varepsilon_i)e_i}(Y)) = \prod_{i=1}^{r}\mathbb{E}(\Delta_{(k_i+\varepsilon_i)e_i}(Y)), \]
where, for each $i$, $\varepsilon_i$ is either 0 or 1. But (8.24) is trivially true for $k = (0,\dots,0)$. It follows that the same holds true for all $k = (k_1,\dots,k_r)$ in $\mathbb{N}^r$. Consequently, the $\Delta_{e_i}(Y)$ are independent.
Given that $q > 0$ and $p_i = p - (i-1)\frac{d}{2} > 0$ for any $i \in \{1,\dots,r\}$, we obtain $a > p = a - q > (r-1)\frac{d}{2}$. Also $s > (r-1)\frac{d}{2}$. Hence, using Theorem 8.7, we conclude that $Y \sim \beta^{(1)}_{p+q,s}$. Finally, we use (8.21) and (8.22) to obtain that $g_j(k) = g_j(k+e_i)$, and with a similar reasoning, we show that $X \sim \beta^{(1)}_{p,q}$. This, using Theorem 8.13, implies that
\[ V_i = 1 - \Delta_{e_i}(X)\Delta_{e_i}(Y) \sim \beta^{(1)}_{s+q,\,p-(i-1)\frac{d}{2}} \quad\text{and}\quad U_i = \frac{1-\Delta_{e_i}(Y)}{1-\Delta_{e_i}(X)\Delta_{e_i}(Y)} \sim \beta^{(1)}_{s,q}. \]
On the other hand, using Theorem 8.7, we have that $(U_i)$ are independent and that $(V_i)$ are independent. It follows that the probability density functions of $U$ and $W$ are
\[ f_U(u_1,\dots,u_r) = \prod_{i=1}^{r}\frac{u_i^{s-1}(1-u_i)^{q-1}}{\beta(s,q)}\,\mathbf{1}_{]0,1[}(u_i) \]
and
\[ f_W(w_1,\dots,w_r) = \prod_{i=1}^{r}\frac{w_i^{s+q-1}(1-w_i)^{p-(i-1)\frac{d}{2}-1}}{\beta(s+q,\,p-(i-1)\frac{d}{2})}\,\mathbf{1}_{]0,1[}(w_i). \]
Next, we give the second characterization result.
Theorem 8.14. Let $X$ and $Y$ be two independent random variables with distributions in $A$ such that the probability density functions are positive in $\Omega \cap (e-\Omega)$. Suppose that the $\Delta_{e_i}(X)$ are independent. Denote
\[ W = \mathbf{1} - \operatorname{diag}(M(X))M(Y) \quad\text{and}\quad U = (\operatorname{diag}(W))^{-1}(\mathbf{1} - M(Y)), \]
and assume that (8.9) is verified, that is, $\mathbb{E}(U \mid W) = c\mathbf{1}$, and that
\[ \mathbb{E}([1-\Delta_{e_i}(Y)]^{-1}) < \infty \quad \forall i, \tag{8.25} \]
\[ \mathbb{E}((\operatorname{diag} U)^{-1} \mid W) = be, \tag{8.26} \]
and
\[ \mathbb{E}(M(X)) = (\operatorname{diag}((p+q)\mathbf{1} - \eta))^{-1}(p\mathbf{1} - \eta), \tag{8.27} \]
where $\eta = (0, \frac{d}{2}, \dots, (r-1)\frac{d}{2})$ and $b$, $c$, and $p$ are positive real constants such that
\[ q = \frac{(b-1)(1-c)}{bc-1} > (r-1)\frac{d}{2}, \qquad s = \frac{(b-1)c}{bc-1} > (r-1)\frac{d}{2}. \]
Then $X \sim \beta^{(1)}_{p,q}$ and $Y \sim \beta^{(1)}_{p+q,s}$. Moreover, the probability density functions of $U$ and $W$ with respect to the Lebesgue measure are respectively
\[ f_U(u_1,\dots,u_r) = \prod_{i=1}^{r}\frac{u_i^{s-1}(1-u_i)^{q-1}}{\beta(s,q)}\,\mathbf{1}_{]0,1[}(u_i) \]
and
\[ f_W(w_1,\dots,w_r) = \prod_{i=1}^{r}\frac{w_i^{s+q-1}(1-w_i)^{p-(i-1)\frac{d}{2}-1}}{\beta(s+q,\,p-(i-1)\frac{d}{2})}\,\mathbf{1}_{]0,1[}(w_i). \]
Proof. From (8.26), we have that for all $i \in \{1,\dots,r\}$,
\[ \mathbb{E}((1-\Delta_{e_i}(Y))^{-1} \mid W) = b\,(1-\Delta_{e_i}(Y)\Delta_{e_i}(X))^{-1}, \]
and from (8.25), we have that for all $k = (k_1,\dots,k_r) \in \mathbb{N}^r$, $\mathbb{E}((1-\Delta_{e_i}(Y))^{-1}\Delta_k(Y)) < \infty$. It follows that
\[ \mathbb{E}(\Delta_k(X))\,\mathbb{E}\Big[\frac{\Delta_k(Y)}{1-\Delta_{e_i}(Y)}\Big] = b\,\mathbb{E}\Big[\frac{\Delta_k(Y)\Delta_k(X)}{1-\Delta_{e_i}(Y)\Delta_{e_i}(X)}\Big]. \]
Hence we obtain that
\[ \mathbb{E}(\Delta_k(X))\sum_{l \ge k_i}\mathbb{E}(\Delta_{(k_1,\dots,k_{i-1},l,k_{i+1},\dots,k_r)}(Y)) = b\sum_{l \ge k_i}\mathbb{E}(\Delta_{(k_1,\dots,k_{i-1},l,k_{i+1},\dots,k_r)}(Y))\,\mathbb{E}(\Delta_{(k_1,\dots,k_{i-1},l,k_{i+1},\dots,k_r)}(X)). \]
If we set
\[ H_i(k) = \sum_{l \ge k_i}\mathbb{E}(\Delta_{(k_1,\dots,k_{i-1},l,k_{i+1},\dots,k_r)}(Y)) \quad\text{and}\quad g_i(k) = \frac{\mathbb{E}(\Delta_{k+e_i}(X))}{\mathbb{E}(\Delta_k(X))}, \]
then, for all $k \in \mathbb{N}^r$, we have
\[ H_i(k) - g_i(k)H_i(k+e_i) = b\,[H_i(k) - H_i(k+e_i)]. \]
On the other hand, (8.9) leads to (8.21), which can be written in the form
\[ (1-c)[H_i(k) - H_i(k+e_i)] = (1 - c\,g_i(k))[H_i(k+e_i) - H_i(k+2e_i)], \quad k \in \mathbb{N}^r. \]
Denoting $P_i(k) = \frac{H_i(k+e_i)}{H_i(k)}$, the previous equalities respectively yield
\[ 1 - b = [g_i(k) - b]\,P_i(k) \]
and
\[ (1-c)(1-P_i(k)) = (1 - c\,g_i(k))\,P_i(k)\,(1 - P_i(k+e_i)). \]
It follows that, for all $k = (k_1,\dots,k_r) \in \mathbb{N}^r$ such that $k_i \ge 1$,
\[ g_i(k) = \frac{1 + (q-1)g_i(k-e_i)}{q + 1 - g_i(k-e_i)}, \]
where $q = \frac{(b-1)(1-c)}{bc-1}$. By induction on $\alpha \in \{0,\dots,k_i\}$, we get that
\[ g_i(k) = \frac{\alpha + (q-\alpha)g_i(k-\alpha e_i)}{q + \alpha - \alpha g_i(k-\alpha e_i)}. \tag{8.28} \]
From (8.27), we have that, for all $i \in \{1,\dots,r\}$,
\[ \mathbb{E}(\Delta_{e_i}(X)) = \frac{p-(i-1)\frac{d}{2}}{p+q-(i-1)\frac{d}{2}}. \]
As $\Delta_{e_i}(X)$ is independent of $\Delta_{e_j}(X)$ for all $i \neq j$, we obtain that $g_i(k - k_ie_i) = \mathbb{E}(\Delta_{e_i}(X))$. Using (8.28), we easily obtain, for $\alpha \in \{0,\dots,k_i\}$, that
\[ g_i(k-\alpha e_i) = \frac{p + k_i - \alpha - (i-1)\frac{d}{2}}{p + q + k_i - \alpha - (i-1)\frac{d}{2}}. \]
Recursively we get
\[ \mathbb{E}(\Delta_k(X)) = \prod_{\alpha=1}^{k_1} g_1(k-\alpha e_1)\prod_{\alpha=1}^{k_2} g_2(k - k_1e_1 - \alpha e_2)\cdots\prod_{\alpha=1}^{k_r} g_r(k - k_1e_1 - \cdots - k_{r-1}e_{r-1} - \alpha e_r) = \prod_{i=1}^{r}\Big(\prod_{\alpha=1}^{k_i}\frac{p + k_i - (i-1)\frac{d}{2} - \alpha}{p + q + k_i - (i-1)\frac{d}{2} - \alpha}\Big). \]
As $q > (r-1)\frac{d}{2}$, to deduce that $X \sim \beta^{(1)}_{p,q}$, we need to verify that $p > (r-1)\frac{d}{2}$. In fact, as $0 \le \mathbb{E}(\Delta_{e_i}(X)) \le 1$ and $q > (r-1)\frac{d}{2}$, then necessarily $p > (r-1)\frac{d}{2}$. Now using (8.21), we get for all $k = (k_1,\dots,k_r) \in \mathbb{N}^r$,
\[ \mathbb{E}(\Delta_{k+e_i}(Y)) = \frac{p + q + k_i - (i-1)\frac{d}{2}}{p + \frac{q}{1-c} + k_i - (i-1)\frac{d}{2}}\,\mathbb{E}(\Delta_k(Y)), \]
which leads to
\[ \mathbb{E}(\Delta_k(Y)) = \prod_{i=1}^{r}\Big(\prod_{j=1}^{k_i}\frac{p + q + k_i - (i-1)\frac{d}{2} - j}{p + q + s + k_i - (i-1)\frac{d}{2} - j}\Big), \]
where $s = \frac{qc}{1-c}$. Since $p + q > (r-1)\frac{d}{2}$, this implies that $Y \sim \beta^{(1)}_{p+q,s}$. Finally, as we have done in Theorem 8.12, we use Theorems 8.13 and 8.7 to deduce that $U$ and $W$ have respectively the following probability density functions with respect to the Lebesgue measure:
\[ f_U(u_1,\dots,u_r) = \prod_{i=1}^{r}\frac{u_i^{s-1}(1-u_i)^{q-1}}{\beta(s,q)}\,\mathbf{1}_{]0,1[}(u_i) \]
and
\[ f_W(w_1,\dots,w_r) = \prod_{i=1}^{r}\frac{w_i^{s+q-1}(1-w_i)^{p-(i-1)\frac{d}{2}-1}}{\beta(s+q,\,p-(i-1)\frac{d}{2})}\,\mathbf{1}_{]0,1[}(w_i). \]
9 Beta–hypergeometric distributions
In this chapter, we use the integral representation of a hypergeometric function on a symmetric cone to define a beta–hypergeometric probability distribution in its general form. We consider two interesting examples of such a distribution. The first example involves the Gauss hypergeometric function. A characterization result concerning this distribution, proved in the matrix setting in [49], is given. For the proof, different tools are used, in particular a notion of continued fraction in a symmetric cone. A criterion of convergence for a non-ordinary continued fraction is given in the first section of the chapter. The second example of a beta–hypergeometric distribution in the cone $\Omega$ involves the hypergeometric function ${}_1F_0$, which is known in the univariate case as a generalized beta distribution.
https://doi.org/10.1515/9783110713374-009
9.1 Continued fractions in a symmetric cone
Several versions of continued fractions defined on some matrix spaces have been introduced to answer needs which have arisen in different areas. In many situations, continued fractions provide an effective tool for establishing probabilistic results. Usually the construction process of a continued fraction relies on the notion of a ratio; however, some construction processes may be defined using only the inversion, without any use of multiplication or ratio. This was the case for the proof of a characterization result due to Evelyne Bernadac [10] concerning the Wishart probability distribution on a symmetric cone. As there is no single way to define a matrix inverse, there are different ways to define a continued fraction with matrix argument. The first natural way is to assume invertibility and use the classical matrix inverse to define a ratio of matrices. The second way is based on the partial matrix inverse. Yet another way uses the generalized inverse, considered to be more efficient in matrix continued fraction interpolation problems. For details about these different matrix continued fractions and their applications, we refer the reader to [115] and to the references therein. We will use the division algorithm based on the Cholesky decomposition to define a continued fraction in a symmetric cone.
9.1.1 Definition and properties
Let $(x_n)_{n \ge 1}$ and $(y_n)_{n \ge 0}$ be two sequences in $\Omega$. The continued fraction denoted $K(x_n/y_n)$ or
\[ y_0 + \Big[\frac{x_1}{y_1}, \frac{x_2}{y_2}, \dots\Big] \]
is an expression whose $n$th convergent
\[ R_n = y_0 + \Big[\frac{x_1}{y_1}, \dots, \frac{x_n}{y_n}\Big] \]
is defined in the following recursive way:
\[ \Big[\frac{x_1}{y_1}\Big] = (\pi^{*-1}(x_1)y_1)^{-1} = \pi(x_1)(y_1^{-1}), \]
\[ \Big[\frac{x_1}{y_1}, \dots, \frac{x_{n+1}}{y_{n+1}}\Big] = \Big(\pi^{*-1}(x_1)\Big(y_1 + \Big[\frac{x_2}{y_2}, \dots, \frac{x_{n+1}}{y_{n+1}}\Big]\Big)\Big)^{-1} = \pi(x_1)\Big(y_1 + \Big[\frac{x_2}{y_2}, \dots, \frac{x_{n+1}}{y_{n+1}}\Big]\Big)^{-1}. \]
Note that to get $R_{n+1}$ from $R_n$, we replace $y_n$ by $y_n + (\pi^{*-1}(x_{n+1})y_{n+1})^{-1}$. When $x_n = e$ for all $n$, we say that the continued fraction $K(e/y_n)$ is an ordinary continued fraction. Its $k$th convergent is given by
\[ R_k = y_0 + (y_1 + (y_2 + (y_3 + \cdots + (y_k)^{-1})^{-1} \cdots)^{-1})^{-1}. \tag{9.1} \]
The construction process of this continued fraction uses only the inversion without any use of the product or the division algorithm. Next, we show that as in the classical matrix case (see [66]), any continued fraction $K(x_n/y_n)$ is equivalent to an ordinary continued fraction $K(e/a_n)$. This result has a mathematical interest, but it has no implication on the convergence of the non-ordinary random continued fraction. In fact, when the sequences in the non-ordinary continued fraction are constituted of independent random variables, the corresponding ordinary continued fraction does not have this property anymore.
Proposition 9.1. The continued fraction denoted $K(x_n/y_n)$ is equivalent to the continued fraction $K(e/a_n)$ where $a_0 = y_0$,
\[ a_{2k} = \pi(x_1)\pi^{*-1}(x_2)\cdots\pi(x_{2k-1})\pi^{*-1}(x_{2k})\,y_{2k}, \tag{9.2} \]
\[ a_{2k+1} = \pi^{*-1}(x_1)\pi(x_2)\cdots\pi(x_{2k})\pi^{*-1}(x_{2k+1})\,y_{2k+1}. \tag{9.3} \]
Proof. The proof is performed by induction. We will use the fact that
\[ \pi^{-1}(u)v^{-1} = (\pi^{*}(u)v)^{-1}, \]
or equivalently,
\[ \pi^{*}(u)v^{-1} = (\pi^{-1}(u)v)^{-1}. \tag{9.4} \]
Since
\[ \Big[\frac{x_1}{y_1}\Big] = (\pi^{*-1}(x_1)y_1)^{-1} \]
and
\[ \begin{aligned} \Big[\frac{x_1}{y_1}, \frac{x_2}{y_2}\Big] &= \Big(\pi^{*-1}(x_1)\Big(y_1 + \Big[\frac{x_2}{y_2}\Big]\Big)\Big)^{-1} \\ &= (\pi^{*-1}(x_1)(y_1 + (\pi^{*-1}(x_2)y_2)^{-1}))^{-1} \\ &= (\pi^{*-1}(x_1)y_1 + \pi^{*-1}(x_1)(\pi^{*-1}(x_2)y_2)^{-1})^{-1} \\ &= (\pi^{*-1}(x_1)y_1 + (\pi(x_1)\pi^{*-1}(x_2)y_2)^{-1})^{-1}, \end{aligned} \]
we have that
\[ a_0 = y_0, \quad a_1 = \pi^{*-1}(x_1)y_1, \quad a_2 = \pi(x_1)\pi^{*-1}(x_2)y_2. \]
Therefore (9.3) is true for $k = 0$, and (9.2) is true for $k = 1$. Suppose that
\[ R_{2k} = a_0 + (a_1 + (a_2 + (a_3 + \cdots + (a_{2k})^{-1})^{-1} \cdots)^{-1})^{-1}, \]
with $a_i$, $0 \le i \le 2k$, defined by (9.3) and (9.2). In particular, $a_{2k} = \pi(x_1)\pi^{*-1}(x_2)\cdots\pi(x_{2k-1})\pi^{*-1}(x_{2k})y_{2k}$. To get $R_{2k+1}$ from $R_{2k}$, we replace in $a_{2k}$, $y_{2k}$ by $y_{2k} + (\pi^{*-1}(x_{2k+1})y_{2k+1})^{-1}$, that is, we replace $(a_{2k})^{-1}$ by
\[ (a_{2k} + \pi(x_1)\pi^{*-1}(x_2)\cdots\pi(x_{2k-1})\pi^{*-1}(x_{2k})(\pi^{*-1}(x_{2k+1})y_{2k+1})^{-1})^{-1}. \]
Using (9.4), the latter can be written as
\[ (a_{2k} + (\pi^{*-1}(x_1)\pi(x_2)\cdots\pi^{*-1}(x_{2k-1})\pi(x_{2k})\pi^{*-1}(x_{2k+1})y_{2k+1})^{-1})^{-1}. \]
Hence
\[ R_{2k+1} = a_0 + (a_1 + (a_2 + (a_3 + \cdots + (a_{2k} + (a_{2k+1})^{-1})^{-1} \cdots)^{-1})^{-1})^{-1}, \]
with $a_{2k+1} = \pi^{*-1}(x_1)\pi(x_2)\cdots\pi^{*-1}(x_{2k-1})\pi(x_{2k})\pi^{*-1}(x_{2k+1})y_{2k+1}$. A similar reasoning is used for $R_{2k+2}$.
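Proposition 9.1 can be illustrated in the rank-one case, where $\Omega = ]0,+\infty[$, $e = 1$, and both $\pi(x)$ and $\pi^{*}(x)$ reduce to multiplication by $x$. The sketch below rests on that scalar assumption; the function names are hypothetical and chosen for illustration only. It builds the coefficients $a_n$ of (9.2) and (9.3) and compares the two continued fractions exactly.

```python
from fractions import Fraction

def general_cf(y0, xs, ys):
    """Scalar convergent R_n = y0 + x1/(y1 + x2/(y2 + ... + xn/yn))."""
    tail = Fraction(0)
    for x, y in zip(reversed(xs), reversed(ys)):
        tail = x / (y + tail)
    return y0 + tail

def ordinary_cf(a):
    """Scalar ordinary convergent a0 + 1/(a1 + 1/(a2 + ... + 1/an))."""
    value = a[-1]
    for term in reversed(a[:-1]):
        value = term + 1 / value
    return value

def equivalent_coefficients(y0, xs, ys):
    """Scalar reading of (9.2)-(9.3): a1 = y1/x1, a2 = x1 y2/x2, a3 = x2 y3/(x1 x3), ..."""
    a = [y0]
    factor = Fraction(1)
    for x, y in zip(xs, ys):
        factor = 1 / (factor * x)   # alternates between the two product patterns
        a.append(factor * y)
    return a

# Illustrative positive inputs (assumptions).
y0 = Fraction(1, 2)
xs = [Fraction(3, 2), Fraction(2), Fraction(1, 3), Fraction(5), Fraction(7, 4)]
ys = [Fraction(2), Fraction(1, 4), Fraction(3), Fraction(5, 2), Fraction(1)]
```

For every truncation length `n`, `general_cf(y0, xs[:n], ys[:n])` and `ordinary_cf(equivalent_coefficients(y0, xs[:n], ys[:n]))` return the same rational number.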
9.1.2 Convergence
We suppose that $(x_n)_{n \ge 1}$ and $(y_n)_{n \ge 0}$ are two sequences of independent random variables defined on the same probability space and valued in the cone $\Omega$. A criterion of convergence for the ordinary random continued fraction $K(e/y_n)$ has already been given in [10]. We next give a criterion of convergence for the non-ordinary random continued fraction $K(x_n/e)$. For the sake of simplicity, $[\frac{x_1}{e}, \frac{x_2}{e}, \dots, \frac{x_k}{e}]$ will be written as $[x_1, x_2, \dots, x_k]$, which is the notation usually used for the ordinary continued fraction given in (9.1). The continued fraction $[x_1, x_2, \dots, x_k]$ which we will study in what follows is then defined by
\[ [x_1] = x_1 \]
and
\[ [x_1, x_2, \dots, x_k] = \pi(x_1)(e + [x_2, \dots, x_k])^{-1}. \]
We first prove the following result:
Proposition 9.2. Let $(x_n)_{n \ge 1}$ be a sequence in $\Omega$. Then for $k \ge 1$, $(-1)^{k+1}([x_1, x_2, \dots, x_k] - [x_1, x_2, \dots, x_{k+1}])$ is in $\Omega$.
Proof. The proof uses induction on $k$. The property is true for $k = 1$; in fact,
\[ \begin{aligned} [x_1] - [x_1, x_2] &= x_1 - \pi(x_1)(e + x_2)^{-1} \\ &= \pi(x_1)(e - (e + x_2)^{-1}) \\ &= \pi(x_1)((e + x_2 - e)(e + x_2)^{-1}) \\ &= \pi(x_1)(x_2(e + x_2)^{-1}), \end{aligned} \]
which is in $(-1)^2\Omega$. Suppose that the property is true for $k$. Then
\[ \begin{aligned} [x_1, \dots, x_{k+1}] - [x_1, \dots, x_{k+2}] &= \pi(x_1)(e + [x_2, \dots, x_{k+1}])^{-1} - \pi(x_1)(e + [x_2, \dots, x_{k+2}])^{-1} \\ &= \pi(x_1)((e + [x_2, \dots, x_{k+1}])^{-1} - (e + [x_2, \dots, x_{k+2}])^{-1}). \end{aligned} \]
Using the induction hypothesis, we have that $[x_2, \dots, x_{k+1}] - [x_2, \dots, x_{k+2}] \in (-1)^{k+1}\Omega$, so that
\[ (e + [x_2, \dots, x_{k+1}]) - (e + [x_2, \dots, x_{k+2}]) \in (-1)^{k+1}\Omega. \]
To finish the proof, it suffices to observe that for all $x$ and $y$ in $\Omega$,
\[ (x - y \in \Omega) \iff (x^{-1} - y^{-1} \in -\Omega). \tag{9.5} \]
Now, letting
\[ w_k = (-1)^{k+1}([x_1, x_2, \dots, x_k] - [x_1, x_2, \dots, x_{k+1}]), \tag{9.6} \]
we then have that
\[ [x_1, x_2, \dots, x_n] = x_1 + \sum_{k=1}^{n-1}(-1)^k w_k. \tag{9.7} \]
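In rank one the cone is $]0,+\infty[$ and the recursion reads $[x_1,\dots,x_k] = x_1/(1 + [x_2,\dots,x_k])$, so Proposition 9.2 and the telescoping identity (9.7) can be checked with exact fractions. The following sketch assumes this scalar setting; the helper names and sample values are illustrative only.

```python
from fractions import Fraction

def convergent(xs):
    """Scalar convergent [x1, ..., xk] = x1 / (1 + [x2, ..., xk]), with [x1] = x1."""
    value = Fraction(0)
    for x in reversed(xs):
        value = x / (1 + value)
    return value

def increments(xs):
    """w_k = (-1)^(k+1) ([x1..xk] - [x1..x_{k+1}]) for k = 1, ..., len(xs) - 1, as in (9.6)."""
    R = [convergent(xs[:k]) for k in range(1, len(xs) + 1)]
    return [(-1) ** (k + 1) * (R[k - 1] - R[k]) for k in range(1, len(xs))]

# Illustrative positive inputs (assumptions).
xs = [Fraction(3, 2), Fraction(2), Fraction(1, 3), Fraction(5), Fraction(7, 4), Fraction(1, 2)]
w = increments(xs)
# Right-hand side of (9.7) with n = len(xs).
telescoped = xs[0] + sum((-1) ** k * w[k - 1] for k in range(1, len(xs)))
```

Every entry of `w` is positive, which is the scalar form of Proposition 9.2, and `telescoped` equals `convergent(xs)`.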
It follows that the continued fraction $[x_1, x_2, \dots, x_n]$ converges if and only if the alternating series $\sum_{k=1}^{+\infty}(-1)^k w_k$ converges. The following result appears in [10] and provides sufficient conditions for the convergence of an alternating series in the algebra $V$.
Proposition 9.3. Let $(v_n)_{n \ge 1}$ be a sequence in $\Omega$. If $(v_n)_{n \ge 1}$ is decreasing and converges to 0, then the alternating series $\sum_{k=1}^{+\infty}(-1)^k v_k$ is convergent.
Next, we state and prove a theorem which is crucial for the study of the convergence of the alternating series $\sum_{k=1}^{+\infty}(-1)^k w_k$.
Theorem 9.4. Let $(x_n)_{n \ge 1}$ be a sequence in $\Omega$ and denote
\[ F_k(x_1, \dots, x_{k+2}) = ([x_1, \dots, x_k]^{-1} - [x_1, \dots, x_{k+1}]^{-1})^{-1} + ([x_1, \dots, x_{k+1}]^{-1} - [x_1, \dots, x_{k+2}]^{-1})^{-1}. \tag{9.8} \]
Then we have
\[ F_1(x_1, x_2, x_3) = \pi(x_1)\pi^{*-1}(x_2)\pi^{*-1}(x_3)(e), \tag{9.9} \]
\[ F_2(x_1, x_2, x_3, x_4) = -\pi(x_1)\pi^{*-1}(x_2)\pi^{*-1}(x_3)P(e + \pi^{*}(x_3)(e))\pi^{*-1}(x_4)(e), \tag{9.10} \]
and, for all $k \ge 3$,
\[ F_k(x_1, \dots, x_{k+2}) = (-1)^{k+1}\pi(x_1)\Big(\prod_{i=2}^{k-1}\pi^{*-1}(x_i)P(e + [x_{i+1}, \dots, x_k])\Big)\pi^{*-1}(x_k)\pi^{*-1}(x_{k+1})P(u_k)\pi^{*-1}(x_{k+2})(e), \tag{9.11} \]
where
\[ u_k = u_k(x_1, \dots, x_{k+1}) = e + \pi^{*}(x_{k+1})\Big(e + \sum_{i=0}^{k-3}(-1)^{i+1}\Big(\prod_{j=0}^{i}\big(\pi^{*}(x_{k-j})P((e + [x_{k-j}, \dots, x_k])^{-1})\big)\Big)(e + [x_{k-i}, \dots, x_k])\Big). \tag{9.12} \]
The product here is the composition of maps.
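The formulas of Theorem 9.4 can be sanity-checked in rank one, where $\pi(x)$ and $\pi^{*}(x)$ act as multiplication by $x$ and $P(x)$ as multiplication by $x^2$. The sketch below is written under that scalar assumption, with hypothetical function names. It evaluates $F_k$ directly from (9.8) and compares it with the closed forms (9.9), (9.10) and with (9.11) and (9.12) for small $k$.

```python
from fractions import Fraction

def cf(xs):
    """Scalar convergent [x1, ..., xk] = x1 / (1 + [x2, ..., xk])."""
    value = Fraction(0)
    for x in reversed(xs):
        value = x / (1 + value)
    return value

def F_def(xs, k):
    """F_k computed from its definition (9.8); xs[m-1] plays the role of x_m."""
    r0, r1, r2 = cf(xs[:k]), cf(xs[:k + 1]), cf(xs[:k + 2])
    return 1 / (1 / r0 - 1 / r1) + 1 / (1 / r1 - 1 / r2)

def u(xs, k):
    """Scalar reading of (9.12)."""
    total = Fraction(1)
    for i in range(0, k - 2):                    # i = 0, ..., k-3
        prod = Fraction(1)
        for j in range(0, i + 1):                # j = 0, ..., i
            tail = cf(xs[k - j - 1:k])           # [x_{k-j}, ..., x_k]
            prod *= xs[k - j - 1] / (1 + tail) ** 2
        total += (-1) ** (i + 1) * prod * (1 + cf(xs[k - i - 1:k]))
    return 1 + xs[k] * total                     # xs[k] is x_{k+1}

def F_formula(xs, k):
    """Scalar reading of (9.11), used here for k >= 2."""
    prod = Fraction(1)
    for i in range(2, k):                        # i = 2, ..., k-1
        prod *= (1 + cf(xs[i:k])) ** 2 / xs[i - 1]
    return (-1) ** (k + 1) * xs[0] * prod / (xs[k - 1] * xs[k]) * u(xs, k) ** 2 / xs[k + 1]

# Illustrative positive inputs (assumptions).
xs = [Fraction(3, 2), Fraction(2), Fraction(1, 3), Fraction(5), Fraction(7, 4), Fraction(1, 2)]
x1, x2, x3, x4 = xs[:4]
```

In this setting (9.9) reads $F_1 = x_1/(x_2x_3)$ and (9.10) reads $F_2 = -x_1(1+x_3)^2/(x_2x_3x_4)$, and both agree with `F_def`. For $k = 2, 3, 4$ the value of `F_formula(xs, k)` coincides with `F_def(xs, k)`.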
Proof. We can write
\[ \begin{aligned} F_1 = F_1(x_1, x_2, x_3) &= ([x_1]^{-1} - [x_1, x_2]^{-1})^{-1} + ([x_1, x_2]^{-1} - [x_1, x_2, x_3]^{-1})^{-1} \\ &= (\pi^{*-1}(x_1)(e) - \pi^{*-1}(x_1)(e + x_2))^{-1} + (\pi^{*-1}(x_1)(e + x_2) - \pi^{*-1}(x_1)(e + (\pi^{*-1}(x_2)(e + x_3))^{-1}))^{-1} \\ &= (-\pi^{*-1}(x_1)(x_2))^{-1} + (\pi^{*-1}(x_1)(x_2) - \pi^{*-1}(x_1)\pi(x_2)((e + x_3)^{-1}))^{-1} \\ &= -\pi(x_1)\pi^{*-1}(x_2)(e) + \pi(x_1)\pi^{*-1}(x_2)((e - (e + x_3)^{-1})^{-1}) \\ &= \pi(x_1)\pi^{*-1}(x_2)((e - (e + x_3)^{-1})^{-1} - e) \\ &= \pi(x_1)\pi^{*-1}(x_2)\pi^{*-1}(x_3)(e). \end{aligned} \]
Now to calculate
\[ F_2 = F_2(x_1, x_2, x_3, x_4) = ([x_1, x_2]^{-1} - [x_1, x_2, x_3]^{-1})^{-1} + ([x_1, x_2, x_3]^{-1} - [x_1, x_2, x_3, x_4]^{-1})^{-1}, \]
we write $[x_1, x_2] = [X_1]$ with $X_1 = \pi(x_1)(e + x_2)^{-1}$, then express $[x_1, x_2, x_3] = [X_1, X_2]$ with $X_2 = -\pi^{*}((e + x_2)^{-1})\pi(x_2)(x_3(e + x_3)^{-1})$, and, finally, we write $[x_1, x_2, x_3, x_4] = [X_1, X_2, X_3]$, with $X_3 = \pi^{-1}(e + \pi^{*}(x_3)(e))(x_4)$. Thus using (9.9), we have
\[ F_2(x_1, x_2, x_3, x_4) = F_1(X_1, X_2, X_3) = \pi(X_1)\pi^{*-1}(X_2)\pi^{*-1}(X_3)(e). \]
Upon substitution, we get
\[ F_2(x_1, x_2, x_3, x_4) = -\pi(x_1)\pi((e + x_2)^{-1})\pi^{-1}((e + x_2)^{-1})\pi^{*-1}(x_2)\pi^{*-1}(x_3)\pi(e + \pi^{*}(x_3)(e))\pi^{*}(e + \pi^{*}(x_3)(e))\pi^{*-1}(x_4)(e). \]
Given that $\pi(e + \pi^{*}(x_3)(e))\pi^{*}(e + \pi^{*}(x_3)(e)) = P(e + \pi^{*}(x_3)(e))$, we obtain
\[ F_2(x_1, x_2, x_3, x_4) = -\pi(x_1)\pi^{*-1}(x_2)\pi^{*-1}(x_3)P(e + \pi^{*}(x_3)(e))\pi^{*-1}(x_4)(e). \]
Therefore (9.11) is true for $k = 2$. Suppose that it is true for $k$. Then using the same reasoning, we have that
\[ F_{k+1} = F_{k+1}(x_1, \dots, x_{k-1}, x_k, x_{k+1}, x_{k+2}, x_{k+3}) = F_k(x_1, \dots, x_{k-1}, X_k, X_{k+1}, X_{k+2}), \]
with $X_k = \pi(x_k)((e + x_{k+1})^{-1})$, $X_{k+1} = -\pi^{*}((e + x_{k+1})^{-1})\pi(x_{k+1})(x_{k+2}(e + x_{k+2})^{-1})$, and $X_{k+2} = \pi^{-1}(e + \pi^{*}(x_{k+2})(e))(x_{k+3})$. Inserting this into $F_k(x_1, \dots, x_{k-1}, X_k, X_{k+1}, X_{k+2})$ as defined in (9.11), we have in particular that $[x_{i+1}, \dots, x_{k-1}, X_k]$ is nothing but $[x_{i+1}, \dots, x_k, x_{k+1}]$. We also have
\[ \begin{aligned} \pi^{*-1}(X_k)\pi^{*-1}(X_{k+1}) &= -\pi^{*-1}(x_k)\pi^{*-1}((e + x_{k+1})^{-1})\pi^{-1}((e + x_{k+1})^{-1})\pi^{*-1}(x_{k+1})\pi^{*-1}(x_{k+2})\pi(e + \pi^{*}(x_{k+2})(e)) \\ &= -\pi^{*-1}(x_k)P(e + x_{k+1})\pi^{*-1}(x_{k+1})\pi^{*-1}(x_{k+2})\pi(e + \pi^{*}(x_{k+2})(e)), \end{aligned} \]
and that
\[ \pi^{*-1}(X_{k+2})(e) = \pi^{*}(e + \pi^{*}(x_{k+2})(e))\pi^{*-1}(x_{k+3})(e). \]
Thus
\[ \begin{aligned} F_{k+1} &= F_k(x_1, \dots, x_{k-1}, X_k, X_{k+1}, X_{k+2}) \\ &= (-1)^{k+2}\pi(x_1)\Big(\prod_{i=2}^{k}\pi^{*-1}(x_i)P(e + [x_{i+1}, \dots, x_{k+1}])\Big)\pi^{*-1}(x_{k+1})\pi^{*-1}(x_{k+2}) \\ &\quad \pi(e + \pi^{*}(x_{k+2})(e))P(u_k(x_1, \dots, x_{k-1}, X_k, X_{k+1}))\pi^{*}(e + \pi^{*}(x_{k+2})(e))\pi^{*-1}(x_{k+3})(e). \end{aligned} \]
Denoting
\[ g = \pi(e + \pi^{*}(x_{k+2})(e))P(u_k(x_1, \dots, x_{k-1}, X_k, X_{k+1}))\pi^{*}(e + \pi^{*}(x_{k+2})(e)), \]
it remains to verify that $g = P(u_{k+1}(x_1, \dots, x_{k-1}, x_k, x_{k+1}, x_{k+2}))$. In fact,
\[ \begin{aligned} u_k(x_1, \dots, x_{k-1}, X_k, X_{k+1}) = e &- \pi^{-1}(e + \pi^{*}(x_{k+2})(e))\pi^{*}(x_{k+2})\pi^{*}(x_{k+1})\pi((e + x_{k+1})^{-1}) \\ &\Big(e - \pi^{*}((e + x_{k+1})^{-1})\pi^{*}(x_k)P((e + [x_k, x_{k+1}])^{-1})(e + [x_k, x_{k+1}]) \\ &+ \sum_{i=1}^{k-3}(-1)^{i+1}\pi^{*}((e + x_{k+1})^{-1})\pi^{*}(x_k)P((e + [x_k, x_{k+1}])^{-1}) \\ &\quad \Big(\prod_{j=1}^{i}\pi^{*}(x_{k-j})P((e + [x_{k-j}, \dots, x_{k+1}])^{-1})\Big)(e + [x_{k-i}, \dots, x_{k+1}])\Big). \end{aligned} \]
Using the fact that $\pi(y)P(x)\pi^{*}(y) = P(\pi(y)x)$, we obtain that
\[ \begin{aligned} g = P\Big(&e + \pi^{*}(x_{k+2})(e) - \pi^{*}(x_{k+2})\pi^{*}(x_{k+1})\pi((e + x_{k+1})^{-1}) \\ &\Big(e - \pi^{*}((e + x_{k+1})^{-1})\pi^{*}(x_k)P((e + [x_k, x_{k+1}])^{-1})(e + [x_k, x_{k+1}]) \\ &+ \sum_{i=1}^{k-3}(-1)^{i+1}\pi^{*}((e + x_{k+1})^{-1})\pi^{*}(x_k)P((e + [x_k, x_{k+1}])^{-1}) \\ &\quad \Big(\prod_{j=1}^{i}\pi^{*}(x_{k-j})P((e + [x_{k-j}, \dots, x_{k+1}])^{-1})\Big)(e + [x_{k-i}, \dots, x_{k+1}])\Big)\Big) \\ = P\Big(&e + \pi^{*}(x_{k+2})(e) - \pi^{*}(x_{k+2})\pi^{*}(x_{k+1})P((e + x_{k+1})^{-1})(e + x_{k+1}) \\ &+ \pi^{*}(x_{k+2})\pi^{*}(x_{k+1})P((e + x_{k+1})^{-1})\pi^{*}(x_k)P((e + [x_k, x_{k+1}])^{-1})(e + [x_k, x_{k+1}]) \\ &+ \sum_{i=1}^{k-3}(-1)^{i+2}\pi^{*}(x_{k+2})\pi^{*}(x_{k+1})P((e + x_{k+1})^{-1})\pi^{*}(x_k)P((e + [x_k, x_{k+1}])^{-1}) \\ &\quad \Big(\prod_{j=1}^{i}\pi^{*}(x_{k-j})P((e + [x_{k-j}, \dots, x_{k+1}])^{-1})\Big)(e + [x_{k-i}, \dots, x_{k+1}])\Big). \end{aligned} \]
Setting $i = i + 1$ and $j = j + 1$, after some standard calculations, the latter is nothing but $P(u_{k+1}(x_1, \dots, x_{k-1}, x_k, x_{k+1}, x_{k+2}))$.
We now state a useful lemma concerning the expression of $u_k$.
Lemma 9.5. For all $x_1, \dots, x_k$ in $\Omega$,
\[ \Gamma = e + \sum_{i=0}^{k-3}(-1)^{i+1}\Big(\prod_{j=0}^{i}\big(\pi^{*}(x_{k-j})P((e + [x_{k-j}, \dots, x_k])^{-1})\big)\Big)(e + [x_{k-i}, \dots, x_k]) \]
is in $\Omega$.
Proof. Denoting, for $1 \le i \le k-3$,
\[ A_i = \Big(\prod_{j=0}^{i}\big(\pi^{*}(x_{k-j})P((e + [x_{k-j}, \dots, x_k])^{-1})\big)\Big)(e + [x_{k-i}, \dots, x_k]), \]
we have that
\[ \Gamma = e - \pi^{*}(x_k)P((e + x_k)^{-1})(e + x_k) + \sum_{1 \le i \le k-3,\ i \text{ odd}}(A_i - A_{i+1}), \]
with $A_{k-2} = 0$. Now
\[ \begin{aligned} e - \pi^{*}(x_k)P((e + x_k)^{-1})(e + x_k) &= e - \pi^{*}(x_k)((e + x_k)^{-1}) \\ &= e - (\pi^{-1}(x_k)(e + x_k))^{-1} \\ &= e - (\pi^{-1}(x_k)(e) + e)^{-1}. \end{aligned} \]
This is in $\Omega$; in fact, it is easy to see that
\[ (x, y \in \Omega) \Rightarrow (x - (y + x^{-1})^{-1} \in \Omega). \tag{9.13} \]
On the other hand, we have, for $i$ odd,
\[ \begin{aligned} A_i - A_{i+1} &= \Big(\prod_{j=0}^{i}\big(\pi^{*}(x_{k-j})P((e + [x_{k-j}, \dots, x_k])^{-1})\big)\Big)\big((e + [x_{k-i}, \dots, x_k]) - \pi^{*}(x_{k-(i+1)})((e + [x_{k-(i+1)}, \dots, x_k])^{-1})\big) \\ &= \Big(\prod_{j=0}^{i}\big(\pi^{*}(x_{k-j})P((e + [x_{k-j}, \dots, x_k])^{-1})\big)\Big)\big((e + [x_{k-i}, \dots, x_k]) - \pi^{*}(x_{k-(i+1)})((e + \pi(x_{k-(i+1)})((e + [x_{k-i}, \dots, x_k])^{-1}))^{-1})\big) \\ &= \Big(\prod_{j=0}^{i}\big(\pi^{*}(x_{k-j})P((e + [x_{k-j}, \dots, x_k])^{-1})\big)\Big)\big((e + [x_{k-i}, \dots, x_k]) - (\pi^{-1}(x_{k-(i+1)})(e) + (e + [x_{k-i}, \dots, x_k])^{-1})^{-1}\big), \end{aligned} \]
which is in $\Omega$, according to (9.13). Thus $\Gamma$ is in $\Omega$.
Theorem 9.6. The sequence $(w_k)$ defined in (9.6) is decreasing.
Proof. From the definition, we have that
\[ [e, x_1, x_2, \dots, x_k] = \pi(e)((e + [x_1, x_2, \dots, x_k])^{-1}), \]
so $[e, x_1, x_2, \dots, x_k]^{-1} = e + [x_1, x_2, \dots, x_k]$. Hence
\[ w_k^{-1} = (-1)^{k+1}([x_1, x_2, \dots, x_k] - [x_1, x_2, \dots, x_{k+1}])^{-1} = (-1)^{k+1}([e, x_1, x_2, \dots, x_k]^{-1} - [e, x_1, x_2, \dots, x_{k+1}]^{-1})^{-1}, \]
and
\[ \begin{aligned} w_k^{-1} - w_{k+1}^{-1} &= (-1)^{k+1}([e, x_1, \dots, x_k]^{-1} - [e, x_1, \dots, x_{k+1}]^{-1})^{-1} - (-1)^{k+2}([e, x_1, \dots, x_{k+1}]^{-1} - [e, x_1, \dots, x_{k+2}]^{-1})^{-1} \\ &= (-1)^{k+1}\{([e, x_1, \dots, x_k]^{-1} - [e, x_1, \dots, x_{k+1}]^{-1})^{-1} + ([e, x_1, \dots, x_{k+1}]^{-1} - [e, x_1, \dots, x_{k+2}]^{-1})^{-1}\} \\ &= (-1)^{k+1}F_{k+1}(e, x_1, \dots, x_{k+2}) \\ &= (-1)^{k+1}(-1)^{k+2}\Big(\prod_{i=1}^{k-1}\pi^{*-1}(x_i)P(e + [x_{i+1}, \dots, x_k])\Big)\pi^{*-1}(x_k)\pi^{*-1}(x_{k+1})P(u_{k+1}(e, x_1, \dots, x_{k+1}))\pi^{*-1}(x_{k+2})(e). \end{aligned} \]
Therefore
\[ w_k^{-1} - w_{k+1}^{-1} = -\Big(\prod_{i=1}^{k-1}\pi^{*-1}(x_i)P(e + [x_{i+1}, \dots, x_k])\Big)\pi^{*-1}(x_k)\pi^{*-1}(x_{k+1})P(u_{k+1}(e, x_1, \dots, x_{k+1}))\pi^{*-1}(x_{k+2})(e). \tag{9.14} \]
As for all $v$ in the cone $\Omega$, the automorphisms $\pi(v)$ and $P(v)$ are in the group $G$, they preserve $\Omega$; thus we deduce
\[ w_k^{-1} - w_{k+1}^{-1} \in -\Omega, \]
which is equivalent to $w_k - w_{k+1} \in \Omega$. Hence the sequence $(w_k)$ is decreasing.
Next, we give another important intermediary result. Using the notation above, for a sequence $(x_n)$ in the cone $\Omega$, we define the following element of $G$:
\[ Q_k = \Big(\prod_{i=1}^{k-1}\pi^{*-1}(x_i)P(e + [x_{i+1}, \dots, x_k])\Big)\pi^{*-1}(x_k)\pi^{*-1}(x_{k+1})P(u_{k+1}(e, x_1, \dots, x_{k+1})). \tag{9.15} \]
Note that according to (9.14), we have
\[ w_{k+1}^{-1} - w_k^{-1} = Q_k(x_{k+2}^{-1}). \tag{9.16} \]
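Theorem 9.6 can be illustrated in rank one, where the order induced by the cone is the usual order on positive reals. The sketch below assumes this scalar setting, with illustrative names and inputs. It computes the sequence $(w_k)$ of (9.6) exactly and exposes the successive gaps $w_k - w_{k+1}$ and $w_{k+1}^{-1} - w_k^{-1}$.

```python
from fractions import Fraction

def convergent(xs):
    """Scalar convergent [x1, ..., xk] = x1 / (1 + [x2, ..., xk])."""
    value = Fraction(0)
    for x in reversed(xs):
        value = x / (1 + value)
    return value

def w_sequence(xs):
    """w_k = (-1)^(k+1) ([x1..xk] - [x1..x_{k+1}]) for k = 1, ..., len(xs) - 1."""
    R = [convergent(xs[:k]) for k in range(1, len(xs) + 1)]
    return [(-1) ** (k + 1) * (R[k - 1] - R[k]) for k in range(1, len(xs))]

# Illustrative positive inputs (assumptions): twelve rationals between 1/3 and 5.
xs = [Fraction(n % 5 + 1, n % 3 + 1) for n in range(1, 13)]
w = w_sequence(xs)
gaps = [w[k] - w[k + 1] for k in range(len(w) - 1)]                   # w_k - w_{k+1}
inverse_gaps = [1 / w[k + 1] - 1 / w[k] for k in range(len(w) - 1)]   # left-hand side of (9.16)
```

All entries of `gaps` and `inverse_gaps` are positive, the scalar counterparts of $w_k - w_{k+1} \in \Omega$ and of (9.14).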
Proposition 9.7. For all $y$ in $\Omega \setminus \{0\}$, there exists $C > 0$ such that for all $k \ge 2$,
\[ \|Q_k^{*}(y)\| \ge C. \]
The constant $C$ depends on $y$ and on $x_1$.
Proof. It suffices to show that for all $k \ge 2$ and $i = 1, \dots, r$,
\[ \lambda_i(Q_k^{*}(y)) \ge \lambda_i(\pi^{-1}(x_1)(y)), \]
where the $\lambda_i$ are the eigenvalues ordered as in (1.13). In fact, this implies that
\[ \|Q_k^{*}(y)\| = \Big(\sum_{i=1}^{r}(\lambda_i(Q_k^{*}(y)))^2\Big)^{\frac{1}{2}} \ge \Big(\sum_{i=1}^{r}(\lambda_i(\pi^{-1}(x_1)(y)))^2\Big)^{\frac{1}{2}} = \|\pi^{-1}(x_1)(y)\|, \]
and we take $C = \|\pi^{-1}(x_1)(y)\|$. We have that
\[ Q_k^{*} = P(u_{k+1}(e, x_1, \dots, x_{k+1}))\pi^{-1}(x_{k+1})\pi^{-1}(x_k)\Big(\prod_{i=1}^{k-1}P(e + [x_{k+1-i}, \dots, x_k])\pi^{-1}(x_{k-i})\Big). \]
Using the fact that $P(x) = \pi(x)\pi^{*}(x)$, we obtain that $Q_k^{*}$ is equal to the composition of the following elements of $G$:
\[ \pi^{*}(e + [x_2, \dots, x_k])\pi^{-1}(x_1), \tag{9.17} \]
then the automorphisms
\[ \pi^{*}(e + [x_{k+1-i}, \dots, x_k])\pi^{-1}(x_{k-i})\pi(e + [x_{k-i}, \dots, x_k]), \tag{9.18} \]
for $i = 1, \dots, k-2$, followed by the automorphisms
\[ \pi^{-1}(x_k)\pi(e + [x_k]), \tag{9.19} \]
and
\[ P(u_{k+1}(e, x_1, \dots, x_{k+1}))\pi^{-1}(x_{k+1}). \tag{9.20} \]
We will use the fact that if $g_1$ and $g_2$ are two elements of $G$ such that $\lambda_i(g_1(z_0)) \ge \lambda_i(z_0)$ for some $z_0 \in \Omega \setminus \{0\}$ and $\lambda_i(g_2(z)) \ge \lambda_i(z)$, $\forall z \in \Omega \setminus \{0\}$, then $\lambda_i(g_2g_1(z_0)) \ge \lambda_i(z_0)$.
By Theorem 2.12 (i), we have that for $i = 1, \dots, r$,
\[ \lambda_i(\pi^{*}(e + [x_2, \dots, x_k])\pi^{-1}(x_1)y) \ge \lambda_i(\pi^{-1}(x_1)y). \]
For the automorphisms in (9.18), we have
\[ \begin{aligned} \pi^{*}(e + [x_{k+1-i}, \dots, x_k])\pi^{-1}(x_{k-i})\pi(e + [x_{k-i}, \dots, x_k]) &= \pi^{*}(e + [x_{k+1-i}, \dots, x_k])\pi^{-1}(x_{k-i})\pi(e + \pi(x_{k-i})(e + [x_{k+1-i}, \dots, x_k])^{-1}) \\ &= \pi^{*}(e + [x_{k+1-i}, \dots, x_k])\pi(\pi^{-1}(x_{k-i})(e) + (e + [x_{k+1-i}, \dots, x_k])^{-1}). \end{aligned} \]
From Theorem 2.12 (iv), we have that for all $z$ in $\Omega \setminus \{0\}$ and $i = 1, \dots, r$,
\[ \lambda_i(\pi^{*}(e + [x_{k+1-i}, \dots, x_k])\pi(\pi^{-1}(x_{k-i})(e) + (e + [x_{k+1-i}, \dots, x_k])^{-1})(z)) \ge \lambda_i(z). \]
Concerning the automorphisms in (9.19), we have
\[ \pi^{-1}(x_k)\pi(e + [x_k]) = \pi(\pi^{-1}(x_k)(e + [x_k])) = \pi(e + \pi^{-1}(x_k)(e)). \]
We use Theorem 2.12 (ii) to get for all $z$ in $\Omega \setminus \{0\}$ and $i = 1, \dots, r$,
\[ \lambda_i(\pi(e + \pi^{-1}(x_k)(e))(z)) \ge \lambda_i(z). \]
Finally, for (9.20), according to the definition of $u_k$ (see (9.12)), and by Lemma 9.5, we have that $u_{k+1}(e, x_1, \dots, x_{k+1})$ can be written in the form
\[ u_{k+1}(e, x_1, \dots, x_{k+1}) = e + \pi^{*}(x_{k+1})(e + v), \]
where $v$ is an element of $\Omega$. Thus
\[ \begin{aligned} P(u_{k+1}(e, x_1, \dots, x_{k+1}))\pi^{-1}(x_{k+1}) &= P(e + \pi^{*}(x_{k+1})(e + v))\pi^{-1}(x_{k+1}) \\ &= \pi(e + \pi^{*}(x_{k+1})(e + v))\pi^{*}(e + \pi^{*}(x_{k+1})(e + v))\pi^{-1}(x_{k+1}) \\ &= \pi(e + \pi^{*}(x_{k+1})(e + v))\pi^{*}(\pi^{*}(x_{k+1})(e + \pi^{*-1}(x_{k+1})(e) + v))\pi^{-1}(x_{k+1}). \end{aligned} \]
We have that for all $u$, $w$ in $\Omega$ and $z$ in $\Omega \setminus \{0\}$, $\pi^{*}(\pi^{*-1}(w)u)(z)$ and $\pi^{*}(u)\pi^{-1}(w)(z)$ have the same spectrum. In fact, for $\lambda$ in $\mathbb{R}$,
\[ \begin{aligned} \Delta(\pi^{*}(\pi^{*-1}(w)(u))(z) - \lambda e) = 0 &\iff \Delta(z - \lambda\pi^{*-1}(\pi^{*-1}(w)(u))(e)) = 0 \\ &\iff \Delta(z - \lambda(\pi^{*-1}(w)(u))^{-1}) = 0 \\ &\iff \Delta(z - \lambda\pi(w)(u^{-1})) = 0 \\ &\iff \Delta(z - \lambda\pi(w)\pi^{*-1}(u)(e)) = 0 \\ &\iff \Delta(\pi^{*}(u)\pi^{-1}(w)(z) - \lambda e) = 0. \end{aligned} \]
This implies that for all $z$ in $\Omega \setminus \{0\}$, $\pi^{*}(\pi^{*}(x_{k+1})(e + \pi^{*-1}(x_{k+1})(e) + v))\pi^{-1}(x_{k+1})(z)$ and $\pi^{*}(e + \pi^{*-1}(x_{k+1})(e) + v)(z)$ have the same spectrum. And from (i) and (ii) of Theorem 2.12, we have that for all $z$ in $\Omega \setminus \{0\}$,
\[ \lambda_i(\pi(e + \pi^{*}(x_{k+1})(e + v))\pi^{*}(e + \pi^{*-1}(x_{k+1})(e) + v)(z)) \ge \lambda_i(z). \]
Besides the general facts that we have established above, for the proof of the main result concerning the convergence of a random non-ordinary continued fraction, we need the following probability result which appears in [10]. We give it with a slightly different proof.
Proposition 9.8. Let $(\mathcal{A}_k)_{k \ge 1}$ be an increasing sequence of $\sigma$-fields, and let $A_k$ be in $\mathcal{A}_k$, for $k \ge 1$. Suppose that there exist $l > 1$, $0 < k_0 < l$, and a random variable $\alpha$ which is $\mathcal{A}_{k_0}$-measurable and valued in $]0,1[$ such that for all $k \ge l$, $\mathbb{E}(\mathbf{1}_{A_{k+1}} \mid \mathcal{A}_k) \le \alpha$. Then
\[ P\Big(\bigcap_{k \ge l}A_k\Big) = 0. \]
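Proof. Let us show by induction on $p \ge 1$ that
\[ \mathbb{E}(\mathbf{1}_{A_l}\cdots\mathbf{1}_{A_{l+p}} \mid \mathcal{A}_{k_0}) \le \alpha^p \quad \text{a.s.} \]
For $p = 1$,
\[ \begin{aligned} \mathbb{E}(\mathbf{1}_{A_l}\mathbf{1}_{A_{l+1}} \mid \mathcal{A}_{k_0}) &= \mathbb{E}(\mathbb{E}(\mathbf{1}_{A_l}\mathbf{1}_{A_{l+1}} \mid \mathcal{A}_l) \mid \mathcal{A}_{k_0}) \\ &= \mathbb{E}(\mathbf{1}_{A_l}\mathbb{E}(\mathbf{1}_{A_{l+1}} \mid \mathcal{A}_l) \mid \mathcal{A}_{k_0}) \\ &\le \alpha\,\mathbb{E}(\mathbf{1}_{A_l} \mid \mathcal{A}_{k_0}) \quad \text{a.s.} \\ &\le \alpha \quad \text{a.s.} \end{aligned} \]
Now suppose that $\mathbb{E}(\mathbf{1}_{A_l}\cdots\mathbf{1}_{A_{l+p}} \mid \mathcal{A}_{k_0}) \le \alpha^p$ a.s. Then
\[ \begin{aligned} \mathbb{E}(\mathbf{1}_{A_l}\cdots\mathbf{1}_{A_{l+p+1}} \mid \mathcal{A}_{k_0}) &= \mathbb{E}(\mathbb{E}(\mathbf{1}_{A_l}\cdots\mathbf{1}_{A_{l+p+1}} \mid \mathcal{A}_{l+p}) \mid \mathcal{A}_{k_0}) \\ &= \mathbb{E}(\mathbf{1}_{A_l}\cdots\mathbf{1}_{A_{l+p}}\mathbb{E}(\mathbf{1}_{A_{l+p+1}} \mid \mathcal{A}_{l+p}) \mid \mathcal{A}_{k_0}) \\ &\le \alpha\,\mathbb{E}(\mathbf{1}_{A_l}\cdots\mathbf{1}_{A_{l+p}} \mid \mathcal{A}_{k_0}) \quad \text{a.s.} \\ &\le \alpha\alpha^p = \alpha^{p+1} \quad \text{a.s.} \end{aligned} \]
Having shown that
\[ \mathbb{E}(\mathbf{1}_{A_l}\cdots\mathbf{1}_{A_{l+p}} \mid \mathcal{A}_{k_0}) \le \alpha^p \quad \text{a.s.,} \]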
we just need to take the expectation and let $p \to \infty$ to get the result.
Theorem 9.9. Let $(x_k)_{k \ge 1}$ be a sequence of independent random variables in the cone $\Omega$ with distributions absolutely continuous with respect to the Lebesgue measure and such that for all $k$, the complement in $\Omega$ of the support of $x_k^{-1}$ is bounded. Suppose that there exists $p \ge 1$ such that for all $i \ge 1$, $\mathcal{L}(x_i) = \mathcal{L}(x_{i+p})$. Then the continued fraction $[x_1, \dots, x_k]$ is almost surely convergent.
Proof. We already know (see (9.7)) that the continued fraction $[x_1, \dots, x_k]$ converges almost surely if and only if the alternating series $\sum_{k=1}^{+\infty}(-1)^k w_k$, where $(w_k)$ is defined in (9.6), converges almost surely. On the other hand, from Theorem 9.6, we have that the sequence $(w_k)$ is decreasing. Thus invoking Proposition 9.3, we need only to show that under the assumptions of the theorem, $(w_k)$ converges almost surely to 0. For this, we will first show that if for all $y$ in $\Omega \setminus \{0\}$, the sequence $(\langle w_k^{-1}, y\rangle)_{k \ge 1}$ diverges, then the sequence $(w_k)$ converges almost surely to 0. Recall that $w_k^{-1}$ is in $\Omega$ and that the sequence $(w_k)_{k \ge 1}$ is decreasing, so that the sequence $(\langle w_k^{-1}, y\rangle)_{k \ge 1}$ is positive and increasing. We also have that the set $J$ of primitive idempotent elements of $V$ is compact. Then writing the eigenvalues of $w_k^{-1}$ as
\[ \lambda_1(w_k^{-1}) \ge \lambda_2(w_k^{-1}) \ge \cdots \ge \lambda_r(w_k^{-1}) > 0, \]
and using the min–max theorem of Hirzebruch, Theorem 1.5, we have
\[ \lim_{k \to +\infty}\lambda_r(w_k^{-1}) = \lim_{k \to +\infty}\min_{c \in J}\frac{\langle w_k^{-1}, c\rangle}{\langle e, c\rangle}. \]
Given that the sequence $(\langle w_k^{-1}, y\rangle)_{k \ge 1}$ is increasing and that the set $J$ is compact, from the particular statement of von Neumann minimax theorem given in the corollary which follows Theorem 1 in [106], we can write
\[ \lim_{k \to +\infty}\lambda_r(w_k^{-1}) = \min_{c \in J}\lim_{k \to +\infty}\frac{\langle w_k^{-1}, c\rangle}{\langle e, c\rangle}. \]
As $c \in \Omega \setminus \{0\}$, we deduce that $\lim_{k \to +\infty}\lambda_r(w_k^{-1}) = +\infty$, and it follows that $\lim_{k \to +\infty}\lambda_i(w_k^{-1}) = +\infty$, for $1 \le i \le r$.
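Since $\lambda_i(w_k) = \frac{1}{\lambda_{r+1-i}(w_k^{-1})}$, we get that for $1 \le i \le r$, $\lim_{k \to +\infty}\lambda_i(w_k) = 0$. The fact that $\|w_k\|^2 = \sum_{i=1}^{r}\lambda_i^2(w_k)$ implies that $(w_k)$ converges to 0.
Now, in order to show that $(\langle w_k^{-1}, y\rangle)_{k \ge 1}$ diverges almost surely, we will show that
\[ P(\{\langle w_{k+1}^{-1} - w_k^{-1}, y\rangle \xrightarrow[k \to \infty]{} 0\}) = 0. \]
Let $\mathcal{A}_k$ be the $\sigma$-field generated by the random variables $x_1, \dots, x_{k+1}$, and define for $\varepsilon > 0$,
\[ A_k = \{\langle w_k^{-1} - w_{k-1}^{-1}, y\rangle \le \varepsilon\}. \]
Then $A_k \in \mathcal{A}_k$, and we have
\[ \{\langle w_{k+1}^{-1} - w_k^{-1}, y\rangle \xrightarrow[k \to \infty]{} 0\} \subset \bigcup_{n > 0}\bigcap_{k \ge n}A_k. \]
Given that $w_{k+1}^{-1} - w_k^{-1} = Q_k(x_{k+2}^{-1})$ (see (9.16)), we have
\[ A_{k+1} = \{\langle Q_k(x_{k+2}^{-1}), y\rangle \le \varepsilon\} = \{\langle x_{k+2}^{-1}, Q_k^{*}(y)\rangle \le \varepsilon\} = \Big\{\Big\langle x_{k+2}^{-1}, \frac{Q_k^{*}(y)}{\|Q_k^{*}(y)\|}\Big\rangle \le \frac{\varepsilon}{\|Q_k^{*}(y)\|}\Big\}. \]
Let $S$ be the unit sphere in $V$, and define for a random variable $z$ in $\Omega$,
\[ h_z(a) = \sup_{\theta \in S \cap \Omega}P(\langle z, \theta\rangle \le a). \]
Then we have that
\[ \begin{aligned} \mathbb{E}(\mathbf{1}_{A_{k+1}} \mid \mathcal{A}_k) &= P\Big(\Big\{\Big\langle x_{k+2}^{-1}, \frac{Q_k^{*}(y)}{\|Q_k^{*}(y)\|}\Big\rangle \le \frac{\varepsilon}{\|Q_k^{*}(y)\|}\Big\} \,\Big|\, \mathcal{A}_k\Big) \\ &\le h_{x_{k+2}^{-1}}\Big(\frac{\varepsilon}{\|Q_k^{*}(y)\|}\Big) \quad (\text{because } Q_k^{*}(y) \text{ is } \mathcal{A}_k\text{-measurable}) \\ &\le h_{x_{k+2}^{-1}}\Big(\frac{\varepsilon}{C}\Big), \end{aligned} \]
where the constant $C$ is determined in Proposition 9.7; it depends on $x_1$ and on $y$. Denoting $\alpha = \max(h_{x_1^{-1}}(\frac{\varepsilon}{C}), \dots, h_{x_p^{-1}}(\frac{\varepsilon}{C}))$, we have that $\alpha < 1$. In fact, if $\alpha = 1$, then as $S \cap \Omega$ is compact, there exist $1 \le j \le p$, $\theta_0$ in $S \cap \Omega$ and $a > 0$ such that $P(\langle x_j^{-1}, \theta_0\rangle \le a) = 1$. This is in contradiction with the fact that the complement in $\Omega$ of the support of $x_j^{-1}$ is bounded. Thus we have $\mathbb{E}(\mathbf{1}_{A_{k+1}} \mid \mathcal{A}_k) \le \alpha < 1$. Therefore using Proposition 9.8 with $k_0 = 2$, we deduce that
\[ P\Big(\bigcap_{k \ge n}A_k\Big) = 0, \]
which is the desired result.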
9.2 Beta–Gauss hypergeometric distributions
9.2.1 Definition
Using Proposition 7.1, one can define for $a > (r-1)\frac{d}{2}$, $b > (r-1)\frac{d}{2}$, $p$ and $q$ such that $p \le q + 1$, and $g$ in $G$ such that $\delta = g(e)$ is in $\Omega \cap (e - \Omega)$, a beta–hypergeometric distribution by
\[ C\,\Delta^{a-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\;{}_pF_q(\alpha_1, \dots, \alpha_p; \beta_1, \dots, \beta_q; g(x))\,\mathbf{1}_{\Omega \cap (e-\Omega)}(x)\,dx, \tag{9.21} \]
where the normalization constant $C$ is
\[ C(\alpha_1, \dots, \alpha_p, \beta_1, \dots, \beta_q, a, b, \delta) = \frac{\Gamma_\Omega(a+b)}{\Gamma_\Omega(a)\Gamma_\Omega(b)\;{}_{p+1}F_{q+1}(\alpha_1, \dots, \alpha_p, a; \beta_1, \dots, \beta_q, a+b; \delta)}. \]
We will restrict our attention to the beta–hypergeometric distributions corresponding to $q = 1$ and $p = 2$, and to $q = 0$ and $p = 1$.
Definition 9.1. The beta–Gauss hypergeometric distribution, with parameters $a$, $a'$, $b$ all in $](r-1)\frac{d}{2}, +\infty[$, is defined on $\Omega \cap (e - \Omega)$ by
\[ \mu_{a,a',b}(dx) = C(a, a', b)\,\Delta^{a-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\;{}_2F_1(a, b; a+a'; x)\,\mathbf{1}_{\Omega \cap (e-\Omega)}(x)\,dx, \tag{9.22} \]
where
\[ C(a, a', b) = \frac{\Gamma_\Omega(a+b)}{\Gamma_\Omega(a)\Gamma_\Omega(b)\;{}_3F_2(a, a, b; a+b, a+a'; e)}. \]
Note that $\mu_{a,a',a+a'}$ is nothing but the distribution $\beta^{(1)}_{a,a'}$. In fact, according to (7.12), (9.22) becomes
\[ \mu_{a,a',b}(dx) = \frac{\Gamma_\Omega(a+b)\Gamma_\Omega(a')}{\Gamma_\Omega(a+a')\Gamma_\Omega(b)\;{}_3F_2(a, a, b; a+b, a+a'; e)} \times {}_2F_1(a', a+a'-b; a+a'; x)\,\beta^{(1)}_{a,a'}(dx). \]
When $a + a' - b = 0$, ${}_2F_1(a', 0; b; x) = 1$ for all $x$, and ${}_3F_2(a, a, b; a+b, a+a'; e)$ becomes ${}_2F_1(a, a; 2a+a'; e)$. Using the following Gauss formula:
\[ {}_2F_1(\alpha, \beta; \gamma; e) = \frac{\Gamma_\Omega(\gamma)\Gamma_\Omega(\gamma-\alpha-\beta)}{\Gamma_\Omega(\gamma-\beta)\Gamma_\Omega(\gamma-\alpha)}, \tag{9.23} \]
for $\alpha = \beta = a$ and $\gamma = 2a + a'$, we deduce that $\mu_{a,a',a+a'}$ coincides with $\beta^{(1)}_{a,a'}$. This justifies the fact that the beta–Gauss hypergeometric distribution is considered as an extension of the beta distribution.
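In the univariate case ($r = 1$, $\Gamma_\Omega = \Gamma$, $e = 1$), the Gauss formula (9.23) and the reduction of $\mu_{a,a',a+a'}$ to a beta distribution can be checked numerically. The following sketch assumes this scalar setting; the function names, the truncation length, and the parameter values are illustrative assumptions, and the series is only summed approximately in floating point.

```python
import math

def hyp2f1_at_one(alpha, beta, gamma, terms=200_000):
    """Truncated series for 2F1(alpha, beta; gamma; 1); converges when gamma > alpha + beta."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (alpha + n) * (beta + n) / ((gamma + n) * (n + 1))
    return total

def gauss_closed_form(alpha, beta, gamma):
    """Right-hand side of the Gauss formula (9.23) in rank one."""
    return (math.gamma(gamma) * math.gamma(gamma - alpha - beta)
            / (math.gamma(gamma - alpha) * math.gamma(gamma - beta)))

def beta_case_coefficient(a, a_prime):
    """Coefficient of beta_{a,a'}(dx) in mu_{a,a',b} when b = a + a' (rank one)."""
    b = a + a_prime
    series = hyp2f1_at_one(a, a, 2 * a + a_prime)   # 3F2 collapses to 2F1(a, a; 2a + a'; 1)
    return (math.gamma(a + b) * math.gamma(a_prime)
            / (math.gamma(a + a_prime) * math.gamma(b) * series))

# Illustrative parameter values (assumptions).
a, a_prime = 1.5, 2.0
series_value = hyp2f1_at_one(a, a, 2 * a + a_prime)
closed_value = gauss_closed_form(a, a, 2 * a + a_prime)
coefficient = beta_case_coefficient(a, a_prime)
```

Up to truncation and rounding error, `series_value` agrees with `closed_value`, and `coefficient` is equal to 1, which is the scalar form of the statement that $\mu_{a,a',a+a'}$ coincides with $\beta^{(1)}_{a,a'}$.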
9.2.2 Characterizations of the beta–Gauss hypergeometric distributions
In Theorem 8.11 (ii), the random variables $W$ and $W'$ are beta-distributed with the first parameter equal to the sum of the parameters of the distribution of the variable $X$. In the real case, Asci, Letac, and Piccioni [6] have shown that with the use of the beta–hypergeometric distribution, the result can be extended to $W \sim \beta^{(2)}_{b,a}$ and $W' \sim \beta^{(2)}_{b,a'}$ with $b > 0$ not necessarily equal to $a + a'$. This result is extended here to the setting of a symmetric cone.
Theorem 9.10. Let $X$ and $W$ be two independent random variables such that $W \sim \beta^{(2)}_{b,a'}$ and $X \sim \mu_{a,a',b}$. Then
\[ \pi^{-1}(e + \pi(X)(W))(e) \sim \mu_{a',a,b}. \tag{9.24} \]
Proof. Let $X'$ be a random variable with distribution $\mu_{a',a,b}$, and define $V = \pi^{-1}(X')(e - X')$. Then showing (9.24) is equivalent to showing that $V$ and $\pi(X)(W)$ have the same distribution. Let $h$ be a bounded measurable function. Then
\[ \begin{aligned} \mathbb{E}(h(V)) &= \int_{\Omega \cap (e-\Omega)} h(\pi^{-1}(x)(e-x))\,\mu_{a',a,b}(dx) \\ &= C(a', a, b)\int_{\Omega \cap (e-\Omega)} h(\pi^{-1}(x)(e-x))\,\Delta^{a'-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\;{}_2F_1(a', b; a+a'; x)\,dx. \end{aligned} \]
Setting $v = \pi^{-1}(x)(e-x)$, or equivalently $x = \pi^{-1}(e+v)(e)$, gives $dx = \Delta^{-\frac{2n}{r}}(e+v)\,dv$, and we have
\[ \begin{aligned} \mathbb{E}(h(V)) &= C(a', a, b)\int_{\Omega} h(v)\,\Delta^{a'-\frac{n}{r}}(\pi^{-1}(e+v)(e))\,\Delta^{b-\frac{n}{r}}(e - \pi^{-1}(e+v)(e)) \\ &\qquad \times {}_2F_1(a', b; a+a'; \pi^{-1}(e+v)(e))\,\Delta^{-\frac{2n}{r}}(e+v)\,dv \\ &= C(a', a, b)\int_{\Omega} h(v)\,\Delta^{a'+b}(\pi^{-1}(e+v)(e))\,\Delta^{b-\frac{n}{r}}(v)\;{}_2F_1(a', b; a+a'; \pi^{-1}(e+v)(e))\,dv. \end{aligned} \]
Hence the probability density function of $V$ is
\[ f_V(v) = C(a', a, b)\,\Delta^{a'+b}(\pi^{-1}(e+v)(e))\,\Delta^{b-\frac{n}{r}}(v)\;{}_2F_1(a', b; a+a'; \pi^{-1}(e+v)(e))\,\mathbf{1}_{\Omega}(v). \tag{9.25} \]
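On the other hand, using Lemma 2.17, the probability density function of $U=\pi(X)(W)$ is given by the multiplicative convolution
\[
\begin{aligned}
f_U(u) &= \int_{\Omega\cap(e-\Omega)} f_X(x)\,f_W(\pi^{-1}(x)(u))\,\Delta^{-\frac{n}{r}}(x)\,dx\\
&= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\int_{\Omega\cap(e-\Omega)} \Delta^{a-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\,{}_2F_1(a,b;a+a';x)\\
&\qquad\times \Delta^{b-\frac{n}{r}}(\pi^{-1}(x)(u))\,\Delta^{-b-a'}(e+\pi^{-1}(x)(u))\,\Delta^{-\frac{n}{r}}(x)\,dx\\
&= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\Delta^{b-\frac{n}{r}}(u)\int_{\Omega\cap(e-\Omega)} \Delta^{a+a'-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\\
&\qquad\times \Delta^{-b-a'}(x+u)\,{}_2F_1(a,b;a+a';x)\,dx.
\end{aligned}
\]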
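With the change of variable $t=e-x$, we get
\[
\begin{aligned}
f_U(u) &= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\Delta^{b-\frac{n}{r}}(u)\int_{\Omega\cap(e-\Omega)} \Delta^{a+a'-\frac{n}{r}}(e-t)\,\Delta^{b-\frac{n}{r}}(t)\\
&\qquad\times \Delta^{-b-a'}(e+u-t)\,{}_2F_1(a,b;a+a';e-t)\,dt\\
&= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\int_{\Omega\cap(e-\Omega)} \Delta^{a+a'-\frac{n}{r}}(e-t)\,\Delta^{b-\frac{n}{r}}(t)\\
&\qquad\times \Delta^{-b-a'}(e-\pi^{-1}(e+u)(t))\,{}_2F_1(a,b;a+a';e-t)\,dt\\
&= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\int_{\Omega\cap(e-\Omega)}\int_K \Delta^{a+a'-\frac{n}{r}}(e-t)\,\Delta^{b-\frac{n}{r}}(t)\\
&\qquad\times \Delta^{-b-a'}(e-\pi^{-1}(e+u)(kt))\,{}_2F_1(a,b;a+a';e-t)\,dk\,dt.
\end{aligned}
\]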
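Given that
\[
\Delta^{-b-a'}(e-\pi^{-1}(e+u)(kt)) = \sum_{m\ge 0}\frac{d_m\,(a'+b)_m}{(\frac{n}{r})_m}\,\varphi_m(\pi^{-1}(e+u)(kt))
\]
(see [31, p. 244]), we can write
\[
\begin{aligned}
f_U(u) &= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\int_{\Omega\cap(e-\Omega)}\int_K \Delta^{a+a'-\frac{n}{r}}(e-t)\,\Delta^{b-\frac{n}{r}}(t)\\
&\qquad\times \sum_{m\ge 0}\frac{d_m\,(a'+b)_m}{(\frac{n}{r})_m}\,\varphi_m(\pi^{-1}(e+u)(kt))\,{}_2F_1(a,b;a+a';e-t)\,dk\,dt.
\end{aligned}
\]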
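Using (2.23), we obtain
\[
\begin{aligned}
f_U(u) &= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\int_{\Omega\cap(e-\Omega)} \Delta^{a+a'-\frac{n}{r}}(e-t)\,\Delta^{b-\frac{n}{r}}(t)\\
&\qquad\times \sum_{m\ge 0}\frac{d_m\,(a'+b)_m}{(\frac{n}{r})_m}\,\varphi_m(\pi^{-1}(e+u)(e))\,\varphi_m(t)\,{}_2F_1(a,b;a+a';e-t)\,dt\\
&= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\int_{\Omega\cap(e-\Omega)} \Delta^{a+a'-\frac{n}{r}}(e-t)\,\Delta^{b-\frac{n}{r}}(t)\\
&\qquad\times \sum_{m\ge 0}\frac{d_m\,(a'+b)_m}{(\frac{n}{r})_m}\,\varphi_m(\pi^{-1}(e+u)(e))\,\varphi_m(t)\sum_{\lambda\ge 0}\frac{(a)_\lambda (b)_\lambda}{(a+a')_\lambda}\,\frac{d_\lambda}{(\frac{n}{r})_\lambda}\,\varphi_\lambda(e-t)\,dt\\
&= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\sum_{m\ge 0}\sum_{\lambda\ge 0}\varphi_m(\pi^{-1}(e+u)(e))\,\frac{(a)_\lambda (b)_\lambda (a'+b)_m}{(a+a')_\lambda}\\
&\qquad\times \int_{\Omega\cap(e-\Omega)} \Delta^{a+a'-\frac{n}{r}}(e-t)\,\Delta^{b-\frac{n}{r}}(t)\,\frac{d_m}{(\frac{n}{r})_m}\,\varphi_m(t)\,\frac{d_\lambda}{(\frac{n}{r})_\lambda}\,\varphi_\lambda(e-t)\,dt.
\end{aligned}
\]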
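Using (2.24), this becomes
\[
\begin{aligned}
f_U(u) &= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\sum_{m\ge 0}\sum_{\lambda\ge 0}\varphi_m(\pi^{-1}(e+u)(e))\,\frac{(a)_\lambda (b)_\lambda (a'+b)_m}{(a+a')_\lambda}\\
&\qquad\times \int_{\Omega\cap(e-\Omega)} \Delta^{a+a'-\frac{n}{r}}(e-t)\,\Delta^{b-\frac{n}{r}}(t)\,\frac{C_m(t)}{|m|!}\,\frac{C_\lambda(e-t)}{|\lambda|!}\,dt,
\end{aligned}
\]
and using (2.26), we obtain
\[
\begin{aligned}
f_U(u) &= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\sum_{m\ge 0}\sum_{\lambda\ge 0}\varphi_m(\pi^{-1}(e+u)(e))\,\frac{(a)_\lambda (b)_\lambda (a'+b)_m}{(a+a')_\lambda\,|m|!\,|\lambda|!}\\
&\qquad\times \int_{\Omega\cap(e-\Omega)} \Delta^{a+a'-\frac{n}{r}}(e-t)\,\Delta^{b-\frac{n}{r}}(t)\sum_{\phi\in m.\lambda}\theta^{m,\lambda}_{\phi}\,C^{m,\lambda}_{\phi}(t,e-t)\,dt\\
&= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\sum_{m\ge 0}\sum_{\lambda\ge 0}\varphi_m(\pi^{-1}(e+u)(e))\,\frac{(a)_\lambda (b)_\lambda (a'+b)_m}{(a+a')_\lambda\,|m|!\,|\lambda|!}\\
&\qquad\times \sum_{\phi\in m.\lambda}\theta^{m,\lambda}_{\phi}\int_{\Omega\cap(e-\Omega)} \Delta^{a+a'-\frac{n}{r}}(e-t)\,\Delta^{b-\frac{n}{r}}(t)\,C^{m,\lambda}_{\phi}(t,e-t)\,dt.
\end{aligned}
\]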
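Now according to (2.27), we get
\[
\begin{aligned}
f_U(u) &= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\sum_{m\ge 0}\sum_{\lambda\ge 0}\varphi_m(\pi^{-1}(e+u)(e))\\
&\qquad\times \frac{(a)_\lambda (b)_\lambda (a'+b)_m}{(a+a')_\lambda\,|m|!\,|\lambda|!}\sum_{\phi\in m.\lambda}(\theta^{m,\lambda}_{\phi})^2\,\frac{\Gamma_\Omega(b+m)\,\Gamma_\Omega(a+a'+\lambda)}{\Gamma_\Omega(b+a+a'+\phi)}\,C_\phi(e)\\
&= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')\Gamma_\Omega(b)}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\sum_{m\ge 0}\frac{(a'+b)_m\,\varphi_m(\pi^{-1}(e+u)(e))}{|m|!}\\
&\qquad\times \sum_{\lambda\ge 0}\sum_{\phi\in m.\lambda}\frac{(a)_\lambda (b)_\lambda}{(a+a')_\lambda}\,(\theta^{m,\lambda}_{\phi})^2\,\frac{\Gamma_\Omega(b+m)\,\Gamma_\Omega(a+a'+\lambda)}{\Gamma_\Omega(b+a+a'+\phi)\,|\lambda|!}\,C_\phi(e)\\
&= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\sum_{m\ge 0}\frac{(a'+b)_m\,(b)_m}{\Gamma_\Omega(b+a+a')\,|m|!}\\
&\qquad\times \varphi_m(\pi^{-1}(e+u)(e))\,\Gamma_\Omega(a+a')\sum_{\lambda\ge 0}\sum_{\phi\in m.\lambda}\frac{(a)_\lambda (b)_\lambda}{(a+a'+b)_\phi\,|\lambda|!}\,(\theta^{m,\lambda}_{\phi})^2\,C_\phi(e).
\end{aligned}
\]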
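Finally, invoking (2.28), we can write
\[
\begin{aligned}
f_U(u) &= C(a,a',b)\,\frac{\Gamma_\Omega(a'+b)}{\Gamma_\Omega(a')}\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\sum_{m\ge 0}\frac{(a'+b)_m\,(b)_m}{\Gamma_\Omega(b+a+a')\,|m|!}\\
&\qquad\times \varphi_m(\pi^{-1}(e+u)(e))\,\Gamma_\Omega(a+a')\,\frac{\Gamma_\Omega(a+a'+b)\,\Gamma_\Omega(a'+m)}{\Gamma_\Omega(a'+b+m)\,\Gamma_\Omega(a+a'+m)}\,C_m(e)\\
&= C(a,a',b)\,\frac{\Delta^{b-\frac{n}{r}}(u)}{\Delta^{a'+b}(e+u)}\sum_{m\ge 0}\frac{(a')_m\,(b)_m}{(a+a')_m}\,\frac{d_m}{(\frac{n}{r})_m}\,\varphi_m(\pi^{-1}(e+u)(e)).
\end{aligned}
\]
Thus the density of $U$ is equal to
\[
f_U(u) = C(a,a',b)\,\Delta^{a'+b}(\pi^{-1}(e+u)(e))\,\Delta^{b-\frac{n}{r}}(u)\,{}_2F_1(a',b;a+a';\pi^{-1}(e+u)(e))\,\mathbf{1}_{\Omega}(u). \tag{9.26}
\]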
Comparing (9.25) and (9.26), we conclude that the densities of $U$ and $V$ are equal.

Note that, as a consequence of the last theorem, we have the following identity concerning the hypergeometric function ${}_3F_2$.

Corollary 9.11. For $a$, $a'$, $b$ in $](r-1)\frac{d}{2}, +\infty[$, we have
\[
\frac{{}_3F_2(a,a,b;a+b,a+a';e)}{\Gamma_\Omega(a')\,\Gamma_\Omega(a+b)} = \frac{{}_3F_2(a',a',b;a'+b,a+a';e)}{\Gamma_\Omega(a)\,\Gamma_\Omega(a'+b)}. \tag{9.27}
\]
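Identity (9.27) can be sanity-checked numerically in the rank-one case. The sketch below assumes r = 1, so that Γ_Ω is the ordinary gamma function and e = 1, and it sums the ${}_3F_2$ series at 1 directly; the helper names and the parameter values are illustrative only.

```python
import math

def hyp3f2_at_one(p1, p2, p3, q1, q2, terms=20000):
    """Partial sum of the series 3F2(p1, p2, p3; q1, q2; 1)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # Ratio of consecutive terms of the generalized hypergeometric series.
        term *= ((p1 + k) * (p2 + k) * (p3 + k)
                 / ((q1 + k) * (q2 + k) * (k + 1)))
    return total

def side(a, a_prime, b):
    """3F2(a, a, b; a + b, a + a'; 1) / (Gamma(a') * Gamma(a + b))."""
    f = hyp3f2_at_one(a, a, b, a + b, a + a_prime)
    return f / (math.gamma(a_prime) * math.gamma(a + b))

# Hypothetical parameter values; both series converge since a, a' > 0.
a, a_prime, b = 3.0, 3.5, 2.0
lhs = side(a, a_prime, b)   # left-hand side of (9.27)
rhs = side(a_prime, a, b)   # right-hand side: roles of a and a' exchanged
print(lhs, rhs)
```

The two printed values agree up to the truncation error of the series, which decays polynomially in the number of terms.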
Proof. We have that the normalizing constants $C(a,a',b)$ and $C(a',a,b)$ are equal. The result then follows from the symmetry of $C(a,a',b)$ as a function of $(a,a')$.

Next, we give a characterization of the beta–Gauss hypergeometric distribution.

Theorem 9.12. Let $W \sim \beta^{(2)}_{b,a}$, $W' \sim \beta^{(2)}_{b,a'}$, and $X$ be three independent random variables, with $X$ valued in $\Omega$. Then
\[
X \sim \pi^{-1}\bigl(e+\pi^{-1}(e+\pi(X)(W'))(W)\bigr)(e)
\]
if and only if $X \sim \mu_{a,a',b}$.

Proof. (⇐) Let $X$ and $W'$ be two independent random variables such that $X \sim \mu_{a,a',b}$ and $W' \sim \beta^{(2)}_{b,a'}$. Then using Theorem 9.10, we obtain that
\[
\pi^{-1}(e+\pi(X)(W'))(e) \sim \mu_{a',a,b}.
\]
As $W \sim \beta^{(2)}_{b,a}$, we apply Theorem 9.10 again to obtain
\[
X \sim \pi^{-1}\bigl(e+\pi^{-1}(e+\pi(X)(W'))(W)\bigr)(e).
\]
(⇒) Suppose that $W$, $W'$, and $X$ are three independent random variables such that $W \sim \beta^{(2)}_{b,a}$, $W' \sim \beta^{(2)}_{b,a'}$, $X$ valued in $\Omega$, and
\[
X \sim \pi^{-1}\bigl(e+\pi^{-1}(e+\pi(X)(W'))(W)\bigr)(e). \tag{9.28}
\]
We will show that the distribution of $X$ is the beta–Gauss hypergeometric distribution $\mu_{a,a',b}$. Consider two independent sequences $(W_n)_{n\ge 1}$ and $(W'_n)_{n\ge 1}$ of independent random variables with respective distributions $\beta^{(2)}_{b,a}$ and $\beta^{(2)}_{b,a'}$, and define the random continued fraction
\[
\pi^{-1}\Bigl(e+\pi^{-1}\bigl(e+\pi^{-1}(e+\cdots)(W'_1)\bigr)(W_1)\Bigr)(e),
\]
which is a formal definition of a continued fraction of the form
\[
\cfrac{e}{e+\cfrac{W_1}{e+\cfrac{W'_1}{e+\cfrac{W_2}{e+\cfrac{W'_2}{e+\cdots}}}}}. \tag{9.29}
\]
In other words, it is the continued fraction $[x_1,\dots,x_k]$ as defined in (9.5) such that
\[
x_{2n+1}=W_{n+1}\ \text{ for } n\ge 0 \quad\text{and}\quad x_{2n}=W'_n\ \text{ for } n\ge 1.
\]
Then we have that
– $(x_n)_{n\ge 1}$ is a sequence of independent random variables in the cone $\Omega$ with distributions which are absolutely continuous with respect to the Lebesgue measure;
– For all $k$, the support of $x_k^{-1}$ is $\Omega$, its complement in $\Omega$ is then bounded;
– For all $i\ge 1$, $\mathcal{L}(x_i)=\mathcal{L}(x_{i+2})$.

According to Theorem 9.9, the continued fraction $[x_1,\dots,x_k]$ is almost surely convergent. As the convergence of a continued fraction does not depend on the initial vector, we deduce that the continued fraction $[e,x_1,\dots,x_k]$ is also almost surely convergent.

Now consider the sequence $(F_n)_{n=1}^{\infty}$ of random mappings from $\Omega\cap(e-\Omega)$ into itself defined by
\[
F_n(z)=\pi^{-1}\bigl(e+\pi^{-1}(e+\pi(z)(W'_n))(W_n)\bigr)(e).
\]
Since for a given $z$ in $\Omega\cap(e-\Omega)$, $F_1\circ\cdots\circ F_n(z)$ has a limit almost surely, the distribution of this limit is a stationary distribution of the Markov chain $\xi_n=F_n\circ\cdots\circ F_1(z)$, and this stationary distribution is unique. From the hypothesis, this distribution is nothing but the distribution of $X$. On the other hand, according to Theorem 9.10, we have that $\mu_{a,a',b}$ is a stationary distribution of the Markov chain $(\xi_n)_{n=0}^{\infty}$. It follows that $X \sim \mu_{a,a',b}$.
The following theorem characterizes in a similar way a particular class of the beta–Gauss hypergeometric distributions.

Theorem 9.13. If $W \sim \beta^{(2)}_{b,a}$ and $X \in \Omega$ are two independent random variables, then
\[
X \sim \pi^{-1}(e+\pi(X)(W))(e) \quad\text{if and only if}\quad X \sim \mu_{a,a,b}. \tag{9.30}
\]
Proof. (⇐) It is a particular case of Theorem 9.10.
(⇒) We use the same reasoning as in the proof of Theorem 9.12 with the random mappings $G_n(z)=\pi^{-1}(e+\pi(z)(W_n))(e)$.
9.3 Generalized beta distribution

This section is devoted to the beta–hypergeometric probability distribution on a symmetric cone expressed in terms of the hypergeometric function ${}_1F_0$.

Definition 9.2. We call generalized beta probability distribution the beta–hypergeometric probability distribution
\[
\beta(a,b,c,\delta)(dx) = C(a,b,c,\delta)\,\Delta^{a+c-\frac{n}{r}}(x)\,\Delta^{b-\frac{n}{r}}(e-x)\,\Delta^{c}(x^{-1}-e+\delta)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(x)\,dx,
\]
where $a$, $b$ are in $](r-1)\frac{d}{2}, +\infty[$, $c$ in $\mathbb{R}$, $\delta$ in $\Omega\cap(e-\Omega)$, and
\[
C(a,b,c,\delta) = \frac{\Gamma_\Omega(a+b)}{\Gamma_\Omega(a)\,\Gamma_\Omega(b)\,{}_2F_1(-c,a;a+b;e-\delta)}.
\]
The generalized beta probability distribution has an invariance property which involves the beta probability distribution of type 1.

Theorem 9.14. Let $X$ and $Y$ be two independent random variables such that
\[
X \sim \beta(a+b,c,-c-b,\delta) \quad\text{and}\quad Y \sim \beta^{(1)}(a,b).
\]
Define
\[
U = e - P(\delta^{\frac12})\bigl(P(X^{-\frac12})Y^{-1}-e+\delta\bigr)^{-1}
\]
and
\[
V = P(U^{-\frac12})\bigl(e - P(\delta^{\frac12})(X^{-1}-e+\delta)^{-1}\bigr).
\]
Then $U$ and $V$ are independent, with
\[
U \sim \beta(c+b,a,-a-b,\delta) \quad\text{and}\quad V \sim \beta^{(1)}(c,b).
\]
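Proof. We have that the joint probability distribution of $X$ and $Y$ is
\[
\begin{aligned}
&C(a+b,c,-c-b,\delta)\,\Delta^{a-c-\frac{n}{r}}(x)\,\Delta^{c-\frac{n}{r}}(e-x)\,\Delta^{-c-b}(x^{-1}-e+\delta)\\
&\qquad\times \frac{1}{B(a,b)}\,\Delta^{a-\frac{n}{r}}(y)\,\Delta^{b-\frac{n}{r}}(e-y)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(x)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(y)\,dx\,dy.
\end{aligned} \tag{9.31}
\]
Consider the map
\[
\varsigma : (\Omega\cap(e-\Omega))^2 \to (\Omega\cap(e-\Omega))^2, \quad (x,y)\mapsto(u,v),
\]
with
\[
u = e - P(\delta^{\frac12})\bigl(P(x^{-\frac12})y^{-1}-e+\delta\bigr)^{-1} \tag{9.32}
\]
and
\[
v = P(u^{-\frac12})\bigl(e - P(\delta^{\frac12})(x^{-1}-e+\delta)^{-1}\bigr). \tag{9.33}
\]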
To show that it realizes a bijection from $(\Omega\cap(e-\Omega))^2$ into $(\Omega\cap(e-\Omega))^2$, we first verify that $u$ and $v$ are in $\Omega\cap(e-\Omega)$, then we show that $x$ and $y$ are uniquely determined in terms of $u$ and $v$. In fact, as $x$ and $y$ are in $\Omega\cap(e-\Omega)$, we have that $P(x^{\frac12})y\in\Omega\cap(e-\Omega)$, so that $P(x^{-\frac12})y^{-1}=(P(x^{\frac12})y)^{-1}\in(e+\Omega)$. This implies that $P(x^{-\frac12})y^{-1}-e+\delta\in\delta+\Omega$, and it follows that
\[
P(\delta^{-\frac12})\bigl(P(x^{-\frac12})y^{-1}-e+\delta\bigr)\in e+\Omega.
\]
Thus
\[
P(\delta^{\frac12})\bigl(P(x^{-\frac12})y^{-1}-e+\delta\bigr)^{-1} = \Bigl(P(\delta^{-\frac12})\bigl(P(x^{-\frac12})y^{-1}-e+\delta\bigr)\Bigr)^{-1}\in\Omega\cap(e-\Omega).
\]
Therefore $u\in\Omega\cap(e-\Omega)$.

On the other hand, the fact that $x\in\Omega\cap(e-\Omega)$ implies $x^{-1}>e$, so that $(\delta+x^{-1}-e)^{-1}<\delta^{-1}$. Hence $P(\delta^{\frac12})(\delta+x^{-1}-e)^{-1}<P(\delta^{\frac12})(\delta^{-1})=e$. From this we deduce that $v\in\Omega$.

Now as $y\in\Omega\cap(e-\Omega)$, we have that $y^{-1}>e$. This implies that $P(x^{-\frac12})y^{-1}>P(x^{-\frac12})e=x^{-1}$ and hence
\[
P(\delta^{\frac12})\bigl(P(x^{-\frac12})y^{-1}-e+\delta\bigr)^{-1} < P(\delta^{\frac12})(x^{-1}-e+\delta)^{-1}.
\]
Therefore
\[
e-P(\delta^{\frac12})(x^{-1}-e+\delta)^{-1} < e-P(\delta^{\frac12})\bigl(P(x^{-\frac12})y^{-1}-e+\delta\bigr)^{-1} = u.
\]
Hence
\[
v = P(u^{-\frac12})\bigl(e-P(\delta^{\frac12})(x^{-1}-e+\delta)^{-1}\bigr) < P(u^{-\frac12})u = e,
\]
which means that $v\in e-\Omega$. Thus we have shown that $v\in\Omega\cap(e-\Omega)$.
Now from (9.33), we have that
\[
x^{-1}-e+\delta = P(\delta^{\frac12})\bigl(e-P(u^{\frac12})v\bigr)^{-1}, \tag{9.34}
\]
so
\[
x = \Bigl(e-\delta+P(\delta^{\frac12})\bigl(e-P(u^{\frac12})v\bigr)^{-1}\Bigr)^{-1}. \tag{9.35}
\]
And from (9.32), we have
\[
y = P(x^{-\frac12})P(\delta^{-\frac12})\bigl(\delta^{-1}-e+(e-u)^{-1}\bigr)^{-1}. \tag{9.36}
\]
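The inversion formulas (9.35) and (9.36) can be checked numerically in the scalar case. The following sketch assumes r = 1, where $P(z)w = z^2 w$ and the cone $\Omega\cap(e-\Omega)$ is the interval (0, 1); the function names and the grid of test points are illustrative only.

```python
def forward(x, y, delta):
    """Scalar (r = 1) version of (9.32)-(9.33): (x, y) -> (u, v)."""
    u = 1.0 - delta / (1.0 / (x * y) - 1.0 + delta)
    v = (1.0 - delta / (1.0 / x - 1.0 + delta)) / u
    return u, v

def inverse(u, v, delta):
    """Scalar version of (9.35)-(9.36): (u, v) -> (x, y)."""
    x = 1.0 / (1.0 - delta + delta / (1.0 - u * v))
    y = 1.0 / (x * delta * (1.0 / delta - 1.0 + 1.0 / (1.0 - u)))
    return x, y

delta = 0.4                      # any value in (0, 1)
grid = [0.1, 0.35, 0.6, 0.9]     # sample points of (0, 1)
worst = 0.0
for x in grid:
    for y in grid:
        u, v = forward(x, y, delta)
        # The image must stay in (0, 1) x (0, 1).
        assert 0.0 < u < 1.0 and 0.0 < v < 1.0
        x_back, y_back = inverse(u, v, delta)
        worst = max(worst, abs(x - x_back), abs(y - y_back))
print(worst)  # largest round-trip error over the grid
```

The round trip recovers (x, y) up to floating-point rounding, in line with the claim that (x, y) is uniquely determined by (u, v).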
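Thus $\varsigma$ is a bijection from $(\Omega\cap(e-\Omega))^2$ into itself. We now take the determinants in (9.34) and (9.35) to get
\[
\Delta(x^{-1}-e+\delta) = \Delta^{-1}\bigl(e-P(u^{\frac12})v\bigr)\,\Delta(\delta) \tag{9.37}
\]
and
\[
\Delta(x) = \Delta^{-1}\Bigl(e-\delta+P(\delta^{\frac12})\bigl(e-P(u^{\frac12})v\bigr)^{-1}\Bigr). \tag{9.38}
\]
We also have
\[
\Delta(x^{-1}-e) = \Delta(\delta)\,\Delta^{-1}\bigl(e-P(u^{\frac12})v\bigr)\,\Delta(u)\,\Delta(v).
\]
As $e-x=x(x^{-1}-e)$, we get
\[
\Delta(e-x) = \Delta(\delta)\,\Delta^{-1}\bigl(e-P(u^{\frac12})v\bigr)\,\Delta(u)\,\Delta(v)\,\Delta^{-1}\Bigl(e-\delta+P(\delta^{\frac12})\bigl(e-P(u^{\frac12})v\bigr)^{-1}\Bigr). \tag{9.39}
\]
Using (9.36), we have that
\[
\begin{aligned}
\Delta(y) &= \Delta^{-1}(x)\,\Delta(e-u)\,\Delta^{-1}(u)\,\Delta^{-1}(u^{-1}-e+\delta)\\
&= \Delta\Bigl(e-\delta+P(\delta^{\frac12})\bigl(e-P(u^{\frac12})v\bigr)^{-1}\Bigr)\,\Delta(e-u)\,\Delta^{-1}(u)\,\Delta^{-1}(u^{-1}-e+\delta).
\end{aligned} \tag{9.40}
\]
Writing
\[
\begin{aligned}
e-y &= P(x^{-\frac12})P(\delta^{-\frac12})\Bigl[P(\delta^{\frac12})x-\bigl(\delta^{-1}-e+(e-u)^{-1}\bigr)^{-1}\Bigr]\\
&= P(x^{-\frac12})P(\delta^{-\frac12})\Bigl[\bigl(\delta^{-1}-e+(e-P(u^{\frac12})v)^{-1}\bigr)^{-1}-\bigl(\delta^{-1}-e+(e-u)^{-1}\bigr)^{-1}\Bigr],
\end{aligned}
\]
we get
\[
\Delta(e-y) = \Delta(\delta)\,\Delta(e-v)\,\Delta^{-1}(u^{-1}-e+\delta)\,\Delta^{-1}\bigl(e-P(u^{\frac12})v\bigr). \tag{9.41}
\]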
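It remains to calculate the Jacobian of the map
\[
\varsigma : (\Omega\cap(e-\Omega))^2 \to (\Omega\cap(e-\Omega))^2, \quad (x,y)\mapsto(u,v).
\]
We write $\varsigma=\varsigma_2\circ\varsigma_1$ with
\[
\varsigma_1 : (\Omega\cap(e-\Omega))^2 \to (\Omega\cap(e-\Omega))^2, \quad (x,y)\mapsto(x,u),
\]
and
\[
\varsigma_2 : (\Omega\cap(e-\Omega))^2 \to (\Omega\cap(e-\Omega))^2, \quad (x,u)\mapsto(u,v).
\]
The Jacobian of $\varsigma_1$ is
\[
J_1 = \operatorname{Det}\begin{pmatrix} \mathrm{Id} & 0\\[2pt] \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y}\end{pmatrix} = \operatorname{Det}\Bigl(\frac{\partial u}{\partial y}\Bigr),
\]
where $\frac{\partial u}{\partial x}$ is the differential of the map $x\mapsto u(x,y)$ and $\frac{\partial u}{\partial y}$ is the differential of the map $y\mapsto u(x,y)$.

As $u=e-P(\delta^{\frac12})(P(x^{-\frac12})y^{-1}-e+\delta)^{-1}$, and the differential of the function $z\mapsto z^{-1}$ is equal to $-P(z^{-1})$, we have
\[
\frac{\partial u}{\partial y} = -P(\delta^{\frac12})\,P\bigl((P(x^{-\frac12})y^{-1}-e+\delta)^{-1}\bigr)\,P(x^{-\frac12})\,P(y^{-1}).
\]
It follows that
\[
\begin{aligned}
|J_1| &= \operatorname{Det}\Bigl(P(\delta^{\frac12})\,P\bigl((P(x^{-\frac12})y^{-1}-e+\delta)^{-1}\bigr)\,P(x^{-\frac12})\,P(y^{-1})\Bigr)\\
&= \Delta^{\frac{n}{r}}(\delta)\,\Delta^{-\frac{2n}{r}}\bigl(P(x^{-\frac12})y^{-1}-e+\delta\bigr)\,\Delta^{-\frac{n}{r}}(x)\,\Delta^{-\frac{2n}{r}}(y).
\end{aligned}
\]
Using (9.32), (9.38) and (9.40), we get
\[
|J_1| = \Delta^{-\frac{n}{r}}(\delta)\,\Delta^{-\frac{n}{r}}\Bigl(e-\delta+P(\delta^{\frac12})\bigl(e-P(u^{\frac12})v\bigr)^{-1}\Bigr)\,\Delta^{\frac{2n}{r}}(u)\,\Delta^{\frac{2n}{r}}(u^{-1}-e+\delta).
\]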
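On the other hand, the Jacobian of $\varsigma_2$ is
\[
J_2 = \operatorname{Det}\begin{pmatrix} 0 & \mathrm{Id}\\[2pt] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial u}\end{pmatrix} = \operatorname{Det}\Bigl(-\frac{\partial v}{\partial x}\Bigr).
\]
As $v=P(u^{-\frac12})(e-P(\delta^{\frac12})(x^{-1}-e+\delta)^{-1})$, we have
\[
\frac{\partial v}{\partial x} = -P(u^{-\frac12})\,P(\delta^{\frac12})\,P\bigl((x^{-1}-e+\delta)^{-1}\bigr)\,P(x^{-1}).
\]
Hence
\[
\begin{aligned}
J_2 &= \operatorname{Det}\Bigl(P(u^{-\frac12})\,P(\delta^{\frac12})\,P\bigl((x^{-1}-e+\delta)^{-1}\bigr)\,P(x^{-1})\Bigr)\\
&= \Delta^{\frac{n}{r}}(\delta)\,\Delta^{-\frac{n}{r}}(u)\,\Delta^{-\frac{2n}{r}}(x^{-1}-e+\delta)\,\Delta^{-\frac{2n}{r}}(x).
\end{aligned}
\]
Invoking (9.37) and (9.38), we obtain
\[
J_2 = \Delta^{-\frac{n}{r}}(\delta)\,\Delta^{-\frac{n}{r}}(u)\,\Delta^{\frac{2n}{r}}\bigl(e-P(u^{\frac12})v\bigr)\,\Delta^{\frac{2n}{r}}\Bigl(e-\delta+P(\delta^{\frac12})\bigl(e-P(u^{\frac12})v\bigr)^{-1}\Bigr).
\]
Therefore the Jacobian which serves to express $dx\,dy$ in terms of $du\,dv$ is the inverse of the absolute value of the Jacobian $J=J_1J_2$ of $\varsigma$, which is
\[
\begin{aligned}
J^{-1} &= \Delta^{\frac{2n}{r}}(\delta)\,\Delta^{-\frac{2n}{r}}\bigl(e-P(u^{\frac12})v\bigr)\,\Delta^{-\frac{n}{r}}(u)\\
&\quad\times \Delta^{-\frac{n}{r}}\Bigl(e-\delta+P(\delta^{\frac12})\bigl(e-P(u^{\frac12})v\bigr)^{-1}\Bigr)\,\Delta^{-\frac{2n}{r}}(u^{-1}-e+\delta).
\end{aligned} \tag{9.42}
\]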
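Inserting (9.37)–(9.42) into (9.31), the powers of $\Delta(\delta)$, of $\Delta(e-P(u^{\frac12})v)$ and of $\Delta(e-\delta+P(\delta^{\frac12})(e-P(u^{\frac12})v)^{-1})$ cancel, and the joint probability distribution of $(U,V)$ is
\[
\begin{aligned}
&\frac{C(a+b,c,-c-b,\delta)}{B(a,b)}\,\Delta^{c-a-\frac{n}{r}}(u)\,\Delta^{a-\frac{n}{r}}(e-u)\,\Delta^{-a-b}(u^{-1}-e+\delta)\\
&\qquad\times \Delta^{c-\frac{n}{r}}(v)\,\Delta^{b-\frac{n}{r}}(e-v)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(u)\,\mathbf{1}_{\Omega\cap(e-\Omega)}(v)\,du\,dv.
\end{aligned} \tag{9.43}
\]
It is easy to verify that
\[
\frac{C(a+b,c,-c-b,\delta)}{B(a,b)} = \frac{C(c+b,a,-a-b,\delta)}{B(c,b)}.
\]
We then deduce that $U$ and $V$ are independent, with
\[
U \sim \beta(c+b,a,-a-b,\delta) \quad\text{and}\quad V \sim \beta^{(1)}(c,b).
\]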
As a corollary of the previous theorem, we have the following distributional property.

Corollary 9.15. Let $Y \sim \beta^{(1)}_{a,b}$, $Y' \sim \beta^{(1)}_{c,b}$, and $X \sim \beta(a+b,c,-c-b,\delta)$ be three independent random variables. Then
\[
X \sim e - P(\delta^{\frac12})\Bigl(P\bigl((e-P(\delta^{\frac12})(P(X^{-\frac12})Y^{-1}-e+\delta)^{-1})^{-\frac12}\bigr)Y'^{-1}-e+\delta\Bigr)^{-1}.
\]
This is similar to the situation for the beta–Gauss hypergeometric probability distribution. The property in the last corollary implies that the generalized beta probability distribution $\beta(a+b,c,-c-b,\delta)$ is a stationary distribution of some Markov chain constructed with two independent sequences of random variables $(Y_n)$, $(Y'_n)$ with respective distributions $\beta^{(1)}_{a,b}$ and $\beta^{(1)}_{c,b}$. To show that this property characterizes the generalized beta probability distribution, we need to check that the corresponding continued fraction is convergent. This involves a lot of calculations because the process of construction of this continued fraction is relatively complicated.
10 Riesz–Dirichlet distributions

It is well known that the real Dirichlet probability distribution may be derived from the gamma probability distribution. In fact, if $Y_1,\dots,Y_q$ are independent random variables with respective gamma distributions $\gamma_{p_1,\sigma},\dots,\gamma_{p_q,\sigma}$, and if we define
\[
S = Y_1+\cdots+Y_q \quad\text{and}\quad X = \Bigl(\frac{Y_1}{S},\dots,\frac{Y_q}{S}\Bigr),
\]
then the distribution of $X$ is the Dirichlet distribution with parameters $(p_1,\dots,p_q)$ and is denoted $D_{(p_1,\dots,p_q)}$. This distribution, usually considered as an extension of the beta distribution corresponding to the case q = 2, has many nice probabilistic characterization properties in different directions such as independence properties, zero regression or conditional distributions. It is also one of the most important distributions in statistics because of its high flexibility for modeling data and because it is a conjugate prior of the parameters of the multinomial distribution in Bayesian statistics. In the multivariate setting, a matrix Dirichlet distribution has been defined using a division algorithm and replacing the gamma distribution by the Wishart probability distribution (see [39]). These distributions, which we call Wishart–Dirichlet distributions, are useful in several testing problems in multivariate statistical analysis (Troskie [107, 108]). For example, the likelihood ratio test statistic for testing homogeneity of several multivariate normal distributions is a function of Dirichlet matrices. It is also worth mentioning here that Connor and Mosimann [22] have introduced the so-called generalized Dirichlet distribution which is a more general version of the ordinary Dirichlet distribution with real margins. The extension of the properties of the real beta and real Dirichlet distributions to Wishart–Dirichlet involves in general some conceptual and technical difficulties; however, many interesting results in this direction have been established. In this chapter, we introduce and study a more general class of Dirichlet distributions on a symmetric cone whose definition appears in [50]; it involves the Riesz probability distribution. We also give some results concerning the Wishart–Dirichlet distribution, in particular a characterization of this distribution in the Mauldon way, extending those given in [11] and [12].
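The gamma construction of the real Dirichlet distribution recalled at the start of this chapter can be sketched with the standard library alone. This is only an illustration: the helper name, the seed and the shape values are hypothetical, and the common parameter σ is taken as a scale equal to 1, which does not affect the law of X.

```python
import random

def dirichlet_from_gammas(shapes, scale=1.0, rng=random):
    """Draw (Y_1/S, ..., Y_q/S) with independent Y_i ~ Gamma(p_i, common scale)."""
    ys = [rng.gammavariate(p, scale) for p in shapes]
    s = sum(ys)
    return [y / s for y in ys]

rng = random.Random(12345)          # fixed seed for reproducibility
shapes = [2.0, 3.0, 5.0]            # parameters (p_1, p_2, p_3)
samples = [dirichlet_from_gammas(shapes, rng=rng) for _ in range(20000)]

# Empirical means, to be compared with p_i / (p_1 + ... + p_q).
means = [sum(col) / len(samples) for col in zip(*samples)]
expected = [p / sum(shapes) for p in shapes]
print(means, expected)
```

Each draw lies on the simplex by construction, and the empirical means approach the Dirichlet means $p_i/(p_1+\cdots+p_q)$ as the sample size grows.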
10.1 Definition

The definition of the Riesz–Dirichlet distribution relies on the following fundamental theorem.

Theorem 10.1. Let $Y_1,\dots,Y_q$ be $q$ independent absolutely continuous Riesz random variables with the same $\sigma$, $Y_j \sim R(s_j,\sigma)$, where $s_j=(s^j_1,\dots,s^j_r)$, for $1\le j\le q$. If we set $S=Y_1+\cdots+Y_q$ and $X_j=\pi^{-1}(S)(Y_j)$, then
(i) $S$ is a Riesz random variable $S \sim R(\sum_{j=1}^q s_j,\sigma)$ which is independent of $(X_1,\dots,X_{q-1})$;
(ii) The density of the joint distribution of $(X_1,\dots,X_{q-1})$ with respect to the Lebesgue measure is
\[
\frac{\Gamma_\Omega(\sum_{j=1}^q s_j)}{\prod_{j=1}^q \Gamma_\Omega(s_j)}\prod_{j=1}^{q-1}\Delta_{s_j-\frac{n}{r}}(x_j)\,\Delta_{s_q-\frac{n}{r}}\bigl(e-(x_1+\cdots+x_{q-1})\bigr)\,\mathbf{1}_{T_{q-1}}(x_1,\dots,x_{q-1}),
\]
where
\[
T_{q-1} = \Bigl\{(x_1,\dots,x_{q-1})\in\Omega^{q-1} : e-\sum_{i=1}^{q-1}x_i\in\Omega\Bigr\}. \tag{10.1}
\]
https://doi.org/10.1515/9783110713374-010
Proof. As $Y_1,\dots,Y_q$ are independent, the probability density function of $(Y_1,\dots,Y_q)$ at a point $(y_1,\dots,y_q)$ of $\Omega^q$ is
\[
\frac{1}{\prod_{j=1}^q \Gamma_\Omega(s_j)\,\Delta_{\sum_{j=1}^q s_j}(\sigma^{-1})}\prod_{j=1}^{q}\Delta_{s_j-\frac{n}{r}}(y_j)\,\exp(-\langle\sigma,y_1+\cdots+y_q\rangle)\,\mathbf{1}_{\Omega^q}(y_1,\dots,y_q).
\]
Consider the transformation
\[
\Omega^q\to\Omega\times T_{q-1}, \quad (y_1,\dots,y_q)\mapsto(z,x_1,\dots,x_{q-1}),
\]
where $z=\sum_{j=1}^q y_j$ and $x_j=\pi^{-1}(z)(y_j)$, $j=1,\dots,q-1$. For the calculation of the Jacobian of this transformation, we first fix $z$ in $x_j=\pi^{-1}(z)(y_j)$, $j=1,\dots,q-1$, and differentiate with respect to $y_j$. We obtain
\[
dx_j = \operatorname{Det}(\pi^{-1}(z))\,dy_j = \Delta^{-\frac{n}{r}}(z)\,dy_j, \quad j=1,\dots,q-1,
\]
so that
\[
dx_1\cdots dx_{q-1} = \Delta^{-(q-1)\frac{n}{r}}(z)\,dy_1\cdots dy_{q-1}.
\]
We then fix $y_1,\dots,y_{q-1}$ in $z=\sum_{j=1}^q y_j=\sum_{j=1}^{q-1}y_j+y_q$, and get $dz=dy_q$. Thus we obtain
\[
dy_1\cdots dy_q = \Delta^{(q-1)\frac{n}{r}}(z)\,dx_1\cdots dx_{q-1}\,dz.
\]
As the image of $(Y_1,\dots,Y_q)$ by this transformation, the vector $(S,X_1,\dots,X_{q-1})$ has the distribution concentrated on the set
\[
\Bigl\{(z,x_1,\dots,x_{q-1})\in\Omega^q : e-\sum_{i=1}^{q-1}x_i\in\Omega\Bigr\} = \Omega\times T_{q-1}
\]
with a probability density function equal to
\[
\begin{aligned}
&\frac{1}{\prod_{j=1}^q \Gamma_\Omega(s_j)\,\Delta_{\sum_{j=1}^q s_j}(\sigma^{-1})}\prod_{j=1}^{q-1}\Delta_{s_j-\frac{n}{r}}(\pi(z)(x_j))\\
&\qquad\times \Delta_{s_q-\frac{n}{r}}\Bigl(\pi(z)\Bigl(e-\sum_{j=1}^{q-1}x_j\Bigr)\Bigr)\,e^{-\langle\sigma,z\rangle}\,\Delta^{(q-1)\frac{n}{r}}(z)\,\mathbf{1}_{\Omega\times T_{q-1}}(z,x_1,\dots,x_{q-1}).
\end{aligned}
\]
Using (2.1), this density may be written as the product of the following two terms:
\[
\frac{\Gamma_\Omega(\sum_{j=1}^q s_j)}{\prod_{j=1}^q \Gamma_\Omega(s_j)}\prod_{j=1}^{q-1}\Delta_{s_j-\frac{n}{r}}(x_j)\,\Delta_{s_q-\frac{n}{r}}\Bigl(e-\sum_{j=1}^{q-1}x_j\Bigr)\,\mathbf{1}_{T_{q-1}}(x_1,\dots,x_{q-1}) \tag{10.2}
\]
and
\[
\frac{1}{\Gamma_\Omega(\sum_{j=1}^q s_j)\,\Delta_{\sum_{j=1}^q s_j}(\sigma^{-1})}\,e^{-\langle\sigma,z\rangle}\,\Delta_{\sum_{j=1}^q s_j-\frac{n}{r}}(z)\,\mathbf{1}_{\Omega}(z).
\]
From this we deduce that $S$ and $(X_1,\dots,X_{q-1})$ are independent, that
\[
S \sim R\Bigl(\sum_{j=1}^q s_j,\sigma\Bigr),
\]
and that the joint probability density function of $(X_1,\dots,X_{q-1})$ is given by (10.2).

Definition 10.1. The distribution of $(X_1,\dots,X_{q-1})$ is called the Riesz–Dirichlet distribution on $V$ with parameters $(s_1,\dots,s_q)$ such that for all $1\le k\le q$ and $1\le i\le r$, $s^k_i>(i-1)\frac{d}{2}$; it is denoted by $D_{(s_1,\dots,s_q)}$.

We sometimes say that $(X_1,\dots,X_q)$ has the Riesz–Dirichlet distribution $D_{(s_1,\dots,s_q)}$. This means that $(X_1,\dots,X_q)$ is in the set
\[
T_{q-1,1} = \Bigl\{(x_1,\dots,x_q)\in\Omega^q : \sum_{i=1}^{q}x_i=e\Bigr\}, \tag{10.3}
\]
and the probability density function of $(X_1,\dots,X_{q-1})$ is given by (10.2). For convenience, the two notations are used interchangeably.

It is easy to see that for $\sigma\in\Omega$, $(\pi^{-1}(\sigma)X_1,\dots,\pi^{-1}(\sigma)X_{q-1})$ has the Riesz–Dirichlet distribution $D_{(s^{(1)},\dots,s^{(q)})}$ if and only if $(X_1,\dots,X_{q-1})$ has the probability density function
\[
\frac{\Gamma_\Omega(\sum_{i=1}^q s^{(i)})}{\prod_{i=1}^q \Gamma_\Omega(s^{(i)})}\;\frac{\prod_{i=1}^{q-1}\Delta_{s^{(i)}-\frac{n}{r}}(x_i)\,\Delta_{s^{(q)}-\frac{n}{r}}\bigl(\sigma-\sum_{i=1}^{q-1}x_i\bigr)}{\Delta_{\sum_{i=1}^q s^{(i)}-\frac{n}{r}}(\sigma)}\;\mathbf{1}_{T_{q-1}(\sigma)}(x_1,\dots,x_{q-1}),
\]
where $T_{q-1}(\sigma)=\{(x_1,\dots,x_{q-1})\in\Omega^{q-1} : \sum_{i=1}^{q-1}x_i\in\Omega\cap(\sigma-\Omega)\}$.
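The construction $X_j=\pi^{-1}(S)(Y_j)$ of Theorem 10.1 can be illustrated in the simplest matrix setting. The sketch below makes several assumptions that are not taken from the text: V is the algebra of 2×2 real symmetric matrices, the division algorithm is the Cholesky one ($\pi^{-1}(S)(Y)=L^{-1}YL^{-\top}$ with $S=LL^{\top}$, L lower triangular), and the $Y_j$ are Wishart matrices built as sums of Gaussian outer products, which is a special case of the Riesz distribution with all shape components equal. Symmetric matrices are stored as triples (a, b, c) standing for [[a, b], [b, c]].

```python
import math
import random

def wishart_2x2(dof, rng):
    """Sum of `dof` outer products g g^T with g standard Gaussian in R^2."""
    a = b = c = 0.0
    for _ in range(dof):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a += g1 * g1
        b += g1 * g2
        c += g2 * g2
    return (a, b, c)

def cholesky_divide(s, y):
    """Return L^{-1} y L^{-T}, where s = L L^T with L lower triangular."""
    sa, sb, sc = s
    l11 = math.sqrt(sa)
    l21 = sb / l11
    l22 = math.sqrt(sc - l21 * l21)
    # Entries of M = L^{-1}, which is again lower triangular.
    m11, m21, m22 = 1.0 / l11, -l21 / (l11 * l22), 1.0 / l22
    ya, yb, yc = y
    # Entries of the symmetric matrix M y M^T.
    xa = m11 * ya * m11
    xb = m11 * (ya * m21 + yb * m22)
    xc = m21 * (ya * m21 + yb * m22) + m22 * (yb * m21 + yc * m22)
    return (xa, xb, xc)

rng = random.Random(2021)
ys = [wishart_2x2(dof, rng) for dof in (3, 4, 6)]   # Y_1, Y_2, Y_3
s = tuple(sum(t) for t in zip(*ys))                 # S = Y_1 + Y_2 + Y_3
xs = [cholesky_divide(s, y) for y in ys]            # X_j = pi^{-1}(S)(Y_j)
total = tuple(sum(t) for t in zip(*xs))
print(total)  # entries (a, b, c) of X_1 + X_2 + X_3
```

By construction the $X_j$ are positive definite and add up to the identity, so the vector $(X_1,X_2,X_3)$ lies in the set $T_{q-1,1}$ of (10.3) with q = 3.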
10.2 Projections of a Riesz–Dirichlet distribution

In this section, we study the distributions of some random variables obtained from a Riesz–Dirichlet random variable in a symmetric cone by projections and by inversions of the margins.

For $1\le k\le r$, denote by $T^{(k)}$ the triangular group corresponding to the subalgebra $V^{(k)}$, and set $\pi_k^{-1}$ to be the division algorithm defined by the Cholesky decomposition in the cone $\Omega_k$. Similarly, for the subalgebra $W^{(j)}$, $1\le j\le r-1$, the symmetric cone $\Omega_{e-c_{r-j}}$ of $W^{(j)}$ is parameterized by the set $W^{(j)}_+$, and $T_j$, $K_j$, and $\pi^{-1}_j$ denote respectively the triangular group, the orthogonal group, and the division algorithm corresponding to $W^{(j)}$. Of course, the unit in $V^{(k)}$ is $c_k$, and the unit in $W^{(j)}$ is $e-c_{r-j}=c_{r-j+1}+\cdots+c_r$.

We have the following useful results concerning the projections $P_k$ and $P_j^*$.

Proposition 10.2. Let $u$ and $v$ be in $\Omega$ and let $x=\pi^{-1}(u+v)(u)$. Then, for $1\le k\le r$, we have
\[
P_k(x) = \pi_k^{-1}(P_k(u)+P_k(v))(P_k(u)).
\]
Proof. As $u+v\in\Omega$, there exists an $\alpha$ in $V_+$ such that $u+v=t_\alpha(e)$, so that $x=t_\alpha^{-1}(u)$. Using the fact that $P_k(t_\alpha(x))=t_{P_k(\alpha)}(P_k(x))$ (see Proposition 2.4), we get
\[
P_k(x) = t^{-1}_{P_k(\alpha)}(P_k(u)) \quad\text{and}\quad P_k(u+v) = t_{P_k(\alpha)}(c_k).
\]
Then $\pi_k^{-1}(P_k(u)+P_k(v))=t^{-1}_{P_k(\alpha)}$. Hence
\[
P_k(x) = \pi_k^{-1}(P_k(u)+P_k(v))(P_k(u)). \tag{10.4}
\]
Proposition 10.3. Let u and v be in Ω and let x = π −1 (u + v)(u). Then for 1 ≤ j ≤ r, we have −1
(Pj∗ (x −1 ))
∗ −1 ∗ −1 = π −1 j ((Pj ((u + v) )) )(Pj (u )) . −1
−1
Proof. As (u + v) ∈ Ω, there exists a unique α in V+ such that u + v = tα (e), and using (2.8) we have (Pj∗ (u + v)−1 )
−1
= tα (e − cr−j+1 ).
(10.5)
Since (Pj∗ (u + v)−1 )−1 ∈ Ωe−cr−j , there exists a unique βj in W+ such that (j)
(Pj∗ (u + v)−1 )
−1
= tβj (e − cr−j+1 ).
(10.6)
Comparing (10.5) and (10.6), we obtain that tβ−1j ∘ tα|W (j) ∈ Tj ∩ Kj , and since the only orthogonal transformation which is triangular with positive diagonal entries is the identity (see [31, p. 111]), we get tα|W (j) = tβj . On the other hand, from Lemma 7.6, we have tα ((Pj∗ (x−1 )) ) = (Pj∗ (u−1 )) . −1
−1
Since (Pj∗ (x−1 ))−1 ∈ W (j) , one obtains −1
(Pj∗ (x−1 ))
= tβ−1j ((Pj∗ (u)−1 ) ). −1
Using (10.6), we get −1
(Pj∗ (x−1 ))
∗ −1 ∗ −1 = π −1 j ((Pj (u + v) ) )(Pj (u )) . −1
−1
Next we show that the direct orthogonal projection of a Riesz–Dirichlet distribution on $V^{(k)}$ is still Riesz–Dirichlet.

Theorem 10.4. Let $X=(X_1,\dots,X_q)$ be a Riesz–Dirichlet random variable with distribution $D_{(s_1,\dots,s_q)}$. Then for all $1\le k\le r$, the random variable $(P_k(X_1),\dots,P_k(X_q))$ has the Riesz–Dirichlet distribution $D_{(s^1_k,\dots,s^q_k)}$ on $\Omega_{c_k}$, where $s^i_k=(s^i_1,\dots,s^i_k)$, $\forall\,1\le i\le q$.

Proof. Suppose that the distribution of $X$ is $D_{(s_1,\dots,s_q)}$. Then there exist independent Riesz random variables $Y_1,\dots,Y_q$ with the same scale parameter $\sigma$ and respective shape parameters $s_1,\dots,s_q$ such that, if $S=Y_1+\cdots+Y_q$, then
\[
X=(X_1,\dots,X_q)=(\pi^{-1}(S)Y_1,\dots,\pi^{-1}(S)Y_q).
\]
From Proposition 10.2, we obtain
\[
(P_k(X_1),\dots,P_k(X_q)) = \Bigl(\pi_k^{-1}\Bigl(\sum_{i=1}^q P_k(Y_i)\Bigr)P_k(Y_1),\dots,\pi_k^{-1}\Bigl(\sum_{i=1}^q P_k(Y_i)\Bigr)P_k(Y_q)\Bigr).
\]
We now use Theorem 3.12 to say that for all $1\le i\le q$, $P_k(Y_i)$ is a Riesz random variable with parameters $s^i_k$ and $\sigma_1-P(\sigma_{12})\sigma_0^{-1}$, where $\sigma_1$, $\sigma_{12}$, $\sigma_0$ are the Peirce components of $\sigma$ with respect to $c_k$. Since $P_k(Y_1),\dots,P_k(Y_q)$ are independent, one obtains that $(P_k(X_1),\dots,P_k(X_q))$ has the Dirichlet distribution $D_{(s^1_k,\dots,s^q_k)}$.
The following theorem concerns the distribution of the orthogonal projection of a Riesz–Dirichlet random variable on the subalgebra W (j) , 1 ≤ j ≤ r − 1. Theorem 10.5. Let X = (X1 , . . . , Xq ) be a Riesz–Dirichlet random variable with distribution D(s1 ,...,sq ) , and let 1 ≤ j ≤ r − 1. Setting Sj = ∑ql=1 (Pj∗ (Xl−1 ))−1 , we have that −1 ∗ −1 ∗ −1 (π −1 j (Sj )(Pj (X1 )) , . . . , π j (Sj )(Pj (Xq )) ) −1
−1
has a Riesz–Dirichlet distribution on the algebra V(cr−j+1 + ⋅ ⋅ ⋅ + cr , 1) with distribution D , where ŝi = (si , . . . , si ), ∀1 ≤ i ≤ q. r−j
(ŝ1 r−j −(r−j) d2 ,...,ŝq r−j −(r−j) d2 )
r−j+1
r
Proof. We again use the fact that a Riesz–Dirichlet random variable is derived from Riesz random variables. Suppose that the distribution of X is D(s1 ,...,sq ) . Then there exist Y1 , . . . , Yq independent Riesz random variables with the same scale parameter σ and respective shape parameters s1 , . . . , sq such that, if S = Y1 + ⋅ ⋅ ⋅ + Yq , then we have X = (X1 , . . . , Xq ) = (π −1 (S)Y1 , . . . , π −1 (S)Yq ). From Proposition 10.3, we have that, for all 1 ≤ i ≤ q, −1
(Pj∗ (Xi−1 ))
∗ −1 ∗ −1 = π −1 j ((Pj (Y1 + ⋅ ⋅ ⋅ + Yq ) ) )(Pj (Yi )) . −1
−1
Then q
∗ −1 ∗ −1 Sj = π −1 j ((Pj (Y1 + ⋅ ⋅ ⋅ + Yq ) ) )(∑(Pj (Yi )) ). −1
−1
i=1
(10.7)
As (Pj∗ (Y1 + ⋅ ⋅ ⋅ + Yq )−1 )−1 ∈ Ωe−cr−j , there exist γj in W+ such that (j)
(Pj∗ (Y1 + ⋅ ⋅ ⋅ + Yq )−1 )
−1
= tγj (e − cr−j+1 ).
(10.8)
On the other hand, Sj ∈ Ωe−cr−j , and there exist νj in W+ such that (j)
Sj = tνj (e − cr−j+1 ). This, together with (10.7) and (10.8), implies that q
= tγj (Sj ) = tγj ∘ tνj (e − cr−j+1 ).
−1
∑(Pj∗ (Yi−1 )) i=1
(10.9)
We also have ∗ −1 π −1 j (Sj )(Pj (Xi ))
= (tν−1j ∘ tγ−1j )((Pj∗ (Yi−1 )) ).
−1
−1
From (10.9), we immediately get ∗ −1 π −1 j (Sj )(Pj (Xi ))
−1
q
∗ −1 ∗ −1 = π −1 j (∑(Pj (Yi )) )(Pj (Yi )) . −1
−1
i=1
(10.10)
From (10.10), we obtain −1 ∗ −1 ∗ −1 (π −1 j (Sj )(Pj (X1 )) , . . . , π j (Sj )(Pj (Xq )) ) −1
−1
q
q
−1 ∗ −1 ∗ −1 ∗ −1 ∗ −1 = (π −1 j (∑(Pj (Yl )) )(Pj (Y1 )) , . . . , π j (∑(Pj (Yl )) )(Pj (Yq )) ). −1
−1
l=1
−1
−1
l=1
According to Corollary 3.13, we have that for all 1 ≤ i ≤ q, (Pj∗ (Yi−1 ))−1 is a Riesz random
variable with parameters si −(r−j) d2 and σ0 , where σ1 , σ12 , σ0 are the Peirce components of σ. Since (Pj∗ (Y1−1 ))−1 , . . . , (Pj∗ (Yq−1 ))−1 are independent, one obtains that −1 ∗ −1 ∗ −1 (π −1 j (Sj )(Pj (X1 )) , . . . , π j (Sj )(Pj (Xq )) ) −1
−1
has the Dirichlet distribution D(s1 −(r−j) d ,...,sq −(r−j) d ) . 2
2
Remark 10.1.
(i) Particular statements of Theorems 10.4 and 10.5 may be given replacing the Riesz–Dirichlet distributions by the ordinary Wishart–Dirichlet distribution.
(ii) From Theorem 10.4, we have in particular that $(P_1(X_1),\dots,P_1(X_q))$ is a real Dirichlet random variable with parameters $(s^1_1,\dots,s^q_1)$.
(iii) Theorem 10.5 implies that
\[
\Bigl(\frac{(P_1^*(X_1^{-1}))^{-1}}{\sum_{l=1}^q (P_1^*(X_l^{-1}))^{-1}},\dots,\frac{(P_1^*(X_q^{-1}))^{-1}}{\sum_{l=1}^q (P_1^*(X_l^{-1}))^{-1}}\Bigr)
\]
is a real Dirichlet random variable with parameters $(s^1_r-(r-1)\frac{d}{2},\dots,s^q_r-(r-1)\frac{d}{2})$.
10.3 Wishart–Dirichlet distribution

In this section, we give some results concerning a particular class of the Riesz–Dirichlet distributions. Suppose that in Theorem 10.1, we take $s^j_k=p_j$, $1\le k\le r$. Then $Y_j$ has the Wishart distribution $W(p_j,\sigma)$, where
\[
W(p,\sigma)(dx) = \frac{\Delta^{p}(\sigma)}{\Gamma_\Omega(p)}\,e^{-\langle\sigma,x\rangle}\,\Delta^{p-\frac{n}{r}}(x)\,\mathbf{1}_{\Omega}(x)\,dx.
\]
In this case the probability density function of $(X_1,\dots,X_{q-1})=(\pi^{-1}(S)Y_1,\dots,\pi^{-1}(S)Y_{q-1})$, where $S=Y_1+\cdots+Y_q$, is equal to
\[
\frac{\Gamma_\Omega(\sum_{j=1}^q p_j)}{\prod_{j=1}^q \Gamma_\Omega(p_j)}\prod_{j=1}^{q-1}\Delta^{p_j-\frac{n}{r}}(x_j)\,\Delta^{p_q-\frac{n}{r}}\Bigl(e-\sum_{j=1}^{q-1}x_j\Bigr)\,\mathbf{1}_{T_{q-1}}(x_1,\dots,x_{q-1}).
\]
It is easy to verify that this distribution does not depend on the choice of the division algorithm used to define $(X_1,\dots,X_q)$. We can then use any division algorithm, in particular the one defined by the quadratic representation, and define
\[
(X_1,\dots,X_{q-1}) = (P(S^{-\frac12})(Y_1),\dots,P(S^{-\frac12})(Y_{q-1})).
\]
This distribution will be called the Wishart–Dirichlet distribution on $V$ with parameters $(p_1,\dots,p_q)$ and denoted by $D_{(p_1,\dots,p_q)}$.

We will say that the distribution of the vector $(X_1,\dots,X_q)$ is $K$-invariant if for all $k$ in $K$, $(kX_1,\dots,kX_q)$ and $(X_1,\dots,X_q)$ have the same distribution. This implies that the distribution of each component $X_i$ is $K$-invariant. Clearly, a Wishart–Dirichlet distribution is $K$-invariant.

10.3.1 Some results involving determinants

We give some results which play a crucial role in the proof of some properties of the Wishart–Dirichlet distribution. These results concern some functions defined on the cone $\Omega$, and are of independent interest. The first is a multivariate version of a result which is stated without proof in Darroch and Ratcliff [23] and appears in a particular form in Feller [32, p. 275]. We state it only for continuous functions, but it may be extended to Lebesgue measurable functions.

Theorem 10.6. Let $u$ and $v$ be two continuous positive functions defined respectively on $\Omega$ and $\Omega\cap(\sigma-\Omega)$, where $\sigma\in\Omega$. Suppose that for all $x\in\Omega$,
\[
\frac{u(\pi(x)y)}{v(y)} \text{ converges to a finite number } \varphi(x) \text{ as } y\to 0, \tag{10.11}
\]
where the convergence in the algebra $V$ is with respect to its norm.
Then there exist $s$ in $\mathbb{R}^r$ and $c$ in $\mathbb{R}$ such that $\varphi(x)=c\,\Delta_s(x)$, for all $x\in\Omega$. If, moreover, we assume that the function $\varphi$ is $K$-invariant, then $\varphi(x)=c\,\Delta^{p}(x)$, for some $p$ in $\mathbb{R}$.

Proof. We first observe that given $x_1$ and $x_2$ in $\Omega$, for $y$ in $\Omega\cap(\sigma-\Omega)$ close to 0, we have the identity
\[
\frac{u(\pi(x_1)\pi(x_2)y)}{v(y)}\,\frac{u(\pi(x_2)y)}{v(\pi(x_2)y)} = \frac{u(\pi(x_2)y)}{v(y)}\,\frac{u(\pi(x_1)\pi(x_2)y)}{v(\pi(x_2)y)}.
\]
This yields
\[
\varphi(\pi(x_1)x_2)\,\varphi(e) = \varphi(x_1)\,\varphi(x_2) \quad \forall x_1,x_2\in\Omega. \tag{10.12}
\]
It follows that $t\mapsto\tilde{\varphi}(t)=\varphi(t(e))/\varphi(e)$ is a one-dimensional representation of the triangular group $T$. Since $\varphi$ is continuous, Lemma 7.16 implies that
\[
\tilde{\varphi}(t) = \Delta_s(t(e)), \quad\text{for some } s \text{ in } \mathbb{R}^r.
\]
Thus
\[
\varphi(x) = c\,\Delta_s(x), \quad\text{where } c=\varphi(e).
\]
The function $\varphi$ is $K$-invariant if and only if the components $s_1,\dots,s_r$ of $s$ are such that $s_1=\cdots=s_r=p$. In this case, we have $\varphi(x)=c\,\Delta^{p}(x)$.

The following lemma is useful.

Lemma 10.7. Let $a(x)$ be a finite positive continuous and $K$-invariant function on $\Omega\cap(e-\Omega)$ such that
\[
a(\pi^{-1}(e-y)(z))\,a(y) = c\,a(\pi^{-1}(y+z)(y))\,a(y+z), \tag{10.13}
\]
for all $y,z\in\Omega$ such that $y+z\in\Omega\cap(e-\Omega)$, where $c$ is a constant. Then
(i) $\lim_{x\to 0}a(x)$ exists,
(ii) $a(x)$ is constant for all $x\in\Omega\cap(e-\Omega)$,
(iii) $c=1$.
Proof. Let $b\in(e+\Omega)$. Then for all $y\in\Omega\cap(\pi^{-1}(b)(e)-\Omega)$, one can substitute $z=\pi(y)b-y$ into (10.13) to get
\[
a(\pi^{-1}(e-y)(\pi(y)b-y))\,a(y) = c\,a(\pi^{-1}(b)(e))\,a(\pi(y)b), \tag{10.14}
\]
and substitute $z=\pi^{-1}(b)(e)-y$ to get
\[
a(\pi^{-1}(e-y)(\pi^{-1}(b)(e)-y))\,a(y) = c\,a(\pi(b)y)\,a(\pi^{-1}(b)(e)). \tag{10.15}
\]
Define the set
\[
\mathcal{D} = \Bigl\{\sum_{i=1}^r \lambda_i c_i : \lambda_i>0,\ 1\le i\le r\Bigr\}, \tag{10.16}
\]
where $c_1,\dots,c_r$ is the fixed Jordan frame, and suppose that $b$ and $y$ are in $\mathcal{D}$. Then $\pi(b)y=\pi(y)b$, and we obtain from (10.14) and (10.15) that
\[
a(\pi^{-1}(e-y)\pi(y)(b-e)) = a(\pi^{-1}(e-y)(b^{-1}-y)). \tag{10.17}
\]
If we set $\pi^{-1}(e-y)\pi(y)(b-e)=x$, then the fact that $y\in\Omega\cap(\pi^{-1}(b)(e)-\Omega)$ implies that $x\in\Omega\cap(b-\Omega)$ and (10.17) becomes
\[
a(x) = a(\pi(e-x)b^{-1}). \tag{10.18}
\]
Since 0 is on the boundary of $\Omega\cap(b-\Omega)$, one can let $x\to 0$, and as $a(x)$ is a continuous function, we deduce that $\lim_{x\to 0}a(x)$ exists and is equal to $a(b^{-1})$, for all $b$ in $\mathcal{D}\cap(e+\Omega)$. But $b$ being in $e+\Omega$ is equivalent to $b^{-1}$ being in $\Omega\cap(e-\Omega)$. Thus the function $a(x)$ is constant on the set $\mathcal{D}\cap\Omega\cap(e-\Omega)$. To conclude that $a$ is constant on $\Omega\cap(e-\Omega)$, we use the fact that $a$ is $K$-invariant and that an element $x$ of $\Omega\cap(e-\Omega)$ may be written as $x=k(D)$ with $D$ in $\mathcal{D}\cap\Omega\cap(e-\Omega)$ and $k$ in $K$.

Finally, as the function $a$ is constant, (10.13) implies that $c=1$.

We also have the following lemma which is easy to prove.

Lemma 10.8. Let $\sigma$ be in $\Omega$. If for all $x$ in $\Omega\cap(\sigma-\Omega)$, the equation
\[
\Delta^{p}(x)\,\Delta^{q}(\sigma-x) = c
\]
is satisfied, where $c$ is a constant, then
(i) $p=q=0$,
(ii) $c=1$.
10.3.2 Product and independence properties
The following result, where the product $\bigodot$ is defined in (8.5), is a corollary of Theorem 8.9.
Proposition 10.9. Let $X_1, \dots, X_n$ be independent random variables with beta–Wishart distributions $\beta^{(1)}_{p_1,q_1}, \dots, \beta^{(1)}_{p_n,q_n}$ such that $\bigodot_{i=1}^{n} X_i$ is $\beta^{(1)}_{p_n, \sum_{i=1}^{n} q_i}$-distributed. Define
\[ Y_1 = e - X_1, \]
\[ Y_i = \pi\Big(\bigodot_{k=1}^{i-1} X_k\Big)(e - X_i), \quad \text{for } i = 2, \dots, n - 1, \]
\[ Y_n = \bigodot_{i=1}^{n} X_i, \]
and
\[ Y_{n+1} = e - \sum_{i=1}^{n} Y_i. \]
Then $(Y_1, \dots, Y_{n+1})$ has the Wishart–Dirichlet distribution $D_{(q_1, \dots, q_{n-1}, p_n, q_n)}$.
Proof. As $\bigodot_{i=1}^{n} X_i$ is $\beta^{(1)}_{p_n, \sum_{i=1}^{n} q_i}$-distributed, according to Theorem 8.9, we have that $p_i = p_{i+1} + q_{i+1}$. Let $V_1, \dots, V_n$ and $U_n$ be independent random variables such that $V_i$ is $W(q_i, e)$-distributed, $i = 1, \dots, n$, and $U_n$ is $W(p_n, e)$-distributed. Then we can suppose that
\[ X_k = \pi^{-1}\Big(U_n + \sum_{i=k}^{n} V_i\Big)\Big(U_n + \sum_{i=k+1}^{n} V_i\Big). \]
In fact, all $X_k$ defined in this way are independent and the distribution of $X_k$ is $\beta^{(1)}_{p_k,q_k}$. It is then easy to see that for $k = 1, \dots, n - 1$,
\[ Y_k = \pi^{-1}\Big(U_n + \sum_{i=1}^{n} V_i\Big)(V_k), \]
\[ Y_n = \pi^{-1}\Big(U_n + \sum_{i=1}^{n} V_i\Big)(U_n), \]
and
\[ Y_{n+1} = \pi^{-1}\Big(U_n + \sum_{i=1}^{n} V_i\Big)(V_n). \]
Thus $(Y_1, \dots, Y_{n+1})$ has the Wishart–Dirichlet distribution $D_{(q_1, \dots, q_{n-1}, p_n, q_n)}$.
Theorem 10.10. Let $\sigma$ be in $\Omega$ and let $X$, $Y$ be two random variables in $\Omega$ such that $X + Y \in \Omega \cap (\sigma - \Omega)$. Assume that the probability density functions of $\pi^{-1}(\sigma)X$, $\pi^{-1}(\sigma)Y$, $\pi^{-1}(\sigma - Y)X$, and $\pi^{-1}(\sigma - X)Y$ are all continuous and $K$-invariant. Then $\pi^{-1}(\sigma - Y)X$ and $Y$ are independent and $\pi^{-1}(\sigma - X)Y$ and $X$ are independent if and only if $(\pi^{-1}(\sigma)X, \pi^{-1}(\sigma)Y)$ is Wishart–Dirichlet-distributed.
Proof. Assume first that $\sigma = e$. Let $g_1$, $g_2$, $h_1$, and $h_2$ denote the probability density functions of $\pi^{-1}(e - Y)X$, $Y$, $\pi^{-1}(e - X)Y$, and $X$, respectively. Then the probability density function of $(X, Y)$ is given, for $(x, y)$ in the set $\zeta = \{(x, y) : x \in \Omega,\ y \in \Omega,\ x + y \in \Omega \cap (e - \Omega)\}$, by
\[ f(x, y) = g_1(\pi^{-1}(e - y)x)\,g_2(y)\,\Delta^{-\frac{n}{r}}(e - y) = h_1(\pi^{-1}(e - x)y)\,h_2(x)\,\Delta^{-\frac{n}{r}}(e - x). \tag{10.19} \]
Therefore, for $(x, y)$ in $\zeta$, we have
\[ \frac{g_1(\pi^{-1}(e - y)x)}{h_2(x)} = \frac{h_1(\pi^{-1}(e - x)y)}{g_2(y)}\,\frac{\Delta^{\frac{n}{r}}(e - y)}{\Delta^{\frac{n}{r}}(e - x)}. \tag{10.20} \]
Obviously, when $y$ is in $\Omega \cap (e - \Omega)$, $(0, y)$ is on the boundary of the set $\zeta$. Similarly, when $x$ is in $\Omega \cap (e - \Omega)$, $(x, 0)$ is on the boundary of the set $\zeta$. Also, as $x + y \in \Omega \cap (e - \Omega)$, one can write $x = e - y - \omega$ with $\omega \in \Omega \cap (e - \Omega)$. It follows that $x$ tends towards $e - y$ as $\omega \to 0$.
Accordingly, let $x \to 0$ in (10.20). Since $h_1$ is assumed to be continuous, the right-hand side of (10.20) converges to
\[ \frac{h_1(y)\,\Delta^{\frac{n}{r}}(e - y)}{g_2(y)} \]
as $x \to 0$. Thus, by Theorem 10.6, the left-hand side of (10.20) converges and its limit is of the form $c_1 \Delta^{-p_1 + \frac{n}{r}}(e - y)$. Therefore
\[ h_1(y) = c_1\,\Delta^{-p_1}(e - y)\,g_2(y). \tag{10.21} \]
Similarly, by letting $y \to 0$ on both sides of (10.20), we obtain
\[ g_1(x) = c_2\,\Delta^{-p_2}(e - x)\,h_2(x). \tag{10.22} \]
On the other hand, if we define
\[ \tilde{g}_1(x) = g_1(e - x) \quad \text{and} \quad \tilde{h}_1(x) = h_1(\pi^{-1}(e + x)(e)), \]
then from (10.19), we have
\[ \frac{\tilde{g}_1(\pi^{-1}(e - y)(e - x - y))}{\tilde{h}_1(\pi^{-1}(y)(e - x - y))} = \frac{h_2(x)}{g_2(y)}\,\frac{\Delta^{\frac{n}{r}}(e - y)}{\Delta^{\frac{n}{r}}(e - x)}. \tag{10.23} \]
Let $x \to e - y$; then the right-hand side of (10.23) converges to
\[ \frac{h_2(e - y)\,\Delta^{\frac{n}{r}}(e - y)}{g_2(y)\,\Delta^{\frac{n}{r}}(y)}. \]
Thus the left-hand side also converges to a finite limit when $x \to e - y$. Since it can be written as
\[ \frac{\tilde{g}_1(\pi(\alpha)\beta)}{\tilde{h}_1(\beta)}, \quad \text{with } \alpha = \pi^{-1}(e - y)(y) \text{ and } \beta = \pi^{-1}(y)(e - x - y), \]
by Theorem 10.6, it converges to a limit of the form
\[ c_3\,\frac{\Delta(y)}{\Delta^{p_3 - \frac{n}{r}}(e - y)}. \]
Consequently,
\[ h_2(e - y) = c_3\,g_2(y)\,\frac{\Delta(y)}{\Delta^{p_3}(e - y)}. \tag{10.24} \]
We now use (10.21), (10.22), and (10.24) to express (10.20) as a functional equation for $g_2$, namely
\[ \frac{g_2(\pi^{-1}(e - y)(e - x - y))\,g_2(y)}{g_2(\pi^{-1}(e - x)(y))\,g_2(e - x)} = \frac{c_1}{c_2}\,\frac{\Delta^{p_2 - p_1 - p_3}(e - x - y)}{\Delta^{p_2 - \frac{n}{r}}(e - y)\,\Delta^{-p_1 - p_2 + \frac{n}{r}}(e - x)}. \tag{10.25} \]
Write
\[ a(x) = \frac{g_2(x)}{b(x)}, \quad \text{where} \quad b(x) = \Delta^{p_2 - \frac{n}{r}}(x)\,\Delta^{p_1 + p_3 - \frac{n}{r}}(e - x). \]
Then $a$ is a positive continuous and $K$-invariant function over $\Omega \cap (e - \Omega)$ and satisfies
\[ a(\pi^{-1}(e - y)(z))\,a(y) = \frac{c_1}{c_2}\,a(\pi^{-1}(y + z)(y))\,a(y + z), \]
where $z = e - x - y$. Therefore, by Lemma 10.7, $a(x) = c$ and
\[ g_2(x) = c\,\Delta^{p_2 - \frac{n}{r}}(x)\,\Delta^{p_1 + p_3 - \frac{n}{r}}(e - x). \]
It is now easy to find $g_1$, $h_1$, and $h_2$ and to show that
\[ f(x, y) = k\,\Delta^{p_1 - \frac{n}{r}}(x)\,\Delta^{p_2 - \frac{n}{r}}(y)\,\Delta^{p_3 - \frac{n}{r}}(e - x - y)\,\mathbf{1}_{T_2}(x, y), \tag{10.26} \]
where $k$ is a normalizing constant. We have proved the sufficiency part of the theorem, while the necessity part is trivial.
For $\sigma \ne e$, we apply the above proof to $\pi^{-1}(\sigma)X$ and $\pi^{-1}(\sigma)Y$. In this case, using (10.26), we obtain that the probability density function of $(X, Y)$ is
\[ f(x, y) = k'\,\Delta^{p_1 - \frac{n}{r}}(x)\,\Delta^{p_2 - \frac{n}{r}}(y)\,\Delta^{p_3 - \frac{n}{r}}(\sigma - x - y)\,\mathbf{1}_{T_2}(x, y), \]
where $T_2 = \{(x, y) : x \in \Omega,\ y \in \Omega, \text{ and } x + y \in \sigma - \Omega\}$. Note that the parameters $p_1$, $p_2$, and $p_3$ do not depend on $\sigma$.
Theorem 10.11. Let $X_1, \dots, X_n$ be random variables in $\Omega$ such that
\[ \sum_{i=1}^{n} X_i \in \Omega \cap (e - \Omega). \]
Assume that, for all $i \ne j$ and all $l \ne i, j$, the conditional probability density functions of $Y_i = \pi^{-1}(e - \sum_{l \ne i} X_l)(X_i)$ and of $X_j$ given that $X_l = x_l$ are absolutely continuous and $K$-invariant. Then $Y_i$ is independent of the set $\{X_l,\ l \ne i\}$, for $i = 1, \dots, n$, if and only if $(X_1, \dots, X_n)$ has a Wishart–Dirichlet distribution.
Proof. As for Theorem 10.10, the necessity part of the proof is trivial and we concentrate on the sufficiency part. Given $X_3 = x_3, X_4 = x_4, \dots, X_n = x_n$, we have that
\[ \pi^{-1}\Big(e - \sum_{l \ne 1,2} x_l - X_2\Big)(X_1) \ \text{and}\ X_2 \ \text{are independent} \]
and
\[ \pi^{-1}\Big(e - \sum_{l \ne 1,2} x_l - X_1\Big)(X_2) \ \text{and}\ X_1 \ \text{are independent.} \]
Applying Theorem 10.10 with $\sigma = e - \sum_{l \ne 1,2} x_l$, we obtain that the conditional probability density function of $(X_1, X_2)$ given $X_3 = x_3, X_4 = x_4, \dots, X_n = x_n$ is given by
\[ f_{(X_1, X_2) \mid X_3 = x_3, \dots, X_n = x_n}(x_1, x_2) = \frac{\Gamma_\Omega(p_{12}^{[1]} + p_{12}^{[2]} + p_{12})}{\Gamma_\Omega(p_{12}^{[1]})\,\Gamma_\Omega(p_{12}^{[2]})\,\Gamma_\Omega(p_{12})}\; \frac{\Delta^{p_{12}^{[1]} - \frac{n}{r}}(x_1)\,\Delta^{p_{12}^{[2]} - \frac{n}{r}}(x_2)\,\Delta^{p_{12} - \frac{n}{r}}\big(e - \sum_{i=1}^{n} x_i\big)}{\Delta^{p_{12}^{[1]} + p_{12}^{[2]} + p_{12} - \frac{n}{r}}\big(e - \sum_{i \ne 1,2} x_i\big)}, \]
where the parameters $p_{12}^{[1]}$, $p_{12}^{[2]}$, and $p_{12}$ do not depend on $x_3, x_4, \dots, x_n$. It follows that the probability density function of $(X_1, \dots, X_n)$ is given by
\[ f(x_1, \dots, x_n) = d_{12}(x)\,\Delta^{p_{12}^{[1]} - \frac{n}{r}}(x_1)\,\Delta^{p_{12}^{[2]} - \frac{n}{r}}(x_2)\,\Delta^{p_{12} - \frac{n}{r}}\Big(e - \sum_{i=1}^{n} x_i\Big), \]
where $d_{12}(x)$ denotes a function of $x_3, \dots, x_n$. Similarly,
\[ f(x_1, \dots, x_n) = d_{13}(x)\,\Delta^{p_{13}^{[1]} - \frac{n}{r}}(x_1)\,\Delta^{p_{13}^{[3]} - \frac{n}{r}}(x_3)\,\Delta^{p_{13} - \frac{n}{r}}\Big(e - \sum_{i=1}^{n} x_i\Big), \]
where $d_{13}(x)$ denotes a function of $x_2, x_4, \dots, x_n$. Applying Lemma 10.8 gives
\[ p_{12}^{[1]} = p_{13}^{[1]}, \qquad p_{12} = p_{13}, \qquad d_{12}(x)\,\Delta^{p_{12}^{[2]} - \frac{n}{r}}(x_2) = d_{13}(x)\,\Delta^{p_{13}^{[3]} - \frac{n}{r}}(x_3). \]
The parameters $p_{ij}^{[i]}$ do not depend on $x_k$, and it is simple to show that
\[ p_{ij}^{[i]} = p_i, \qquad p_{ij} = p_{n+1}, \]
and
\[ d_{12}(x) = c\,\Delta^{p_3 - \frac{n}{r}}(x_3) \cdots \Delta^{p_n - \frac{n}{r}}(x_n). \]
The proof of Theorem 10.11 is complete.
10.3.3 Mauldon characterization of the Wishart–Dirichlet distribution
A remarkable characterization of the real Dirichlet distribution, due to Mauldon [89], says, in a slightly different form, that if $X_1, \dots, X_q$ are real random variables, then $(X_1, \dots, X_q)$ has a Dirichlet joint distribution with parameters $(p_1, \dots, p_q)$ if and only if, for all positive real numbers $f_1, \dots, f_q$,
\[ \mathbb{E}\Big[\Big(\sum_{i=1}^{q} f_i X_i\Big)^{-(p_1 + \dots + p_q)}\Big] = \prod_{i=1}^{q} f_i^{-p_i}. \tag{10.27} \]
A proof of this result is given in the paper by Chamayou and Letac [20]. The expectation formula (10.27) has been extended to the multivariate Wishart–Dirichlet distributions in different ways. For instance, Letac, Massam, and Richards [81] have shown that a similar formula holds when $X = (X_1, \dots, X_q)$ has a matrix Wishart–Dirichlet distribution $D_{(p_1, \dots, p_q)}$ and the $f_i$ are real numbers, and Gupta and Richards [40] have shown that the formula holds when $X = (X_1, \dots, X_q)$ has a matrix Wishart–Dirichlet distribution and the $f_i$ are symmetric matrices. In the next theorem, we give an extension of the Mauldon characterization to the Wishart–Dirichlet distribution on a symmetric cone $\Omega$. For $a \in \Omega$, we consider the map on $V$ given by
\[ H(a) = \mathrm{Id} + P(a^{1/2}), \]
where $\mathrm{Id}$ denotes the identity on $V$.
Theorem 10.12. Let $p_1, \dots, p_q$ be in $](r - 1)\frac{d}{2}, +\infty[$, $p = p_1 + \dots + p_q$, and let $X = (X_1, \dots, X_q)$ be a random variable in $T_{q-1,1}$ with $K$-invariant distribution. Then $X$ has the Wishart–Dirichlet distribution with parameters $(p_1, \dots, p_q)$ if and only if for all $a_1, \dots, a_q$ in $\Omega$, we have
\[ \mathbb{E}\Big[\Delta^{-p}\Big(\sum_{i=1}^{q} H(a_i)(X_i)\Big)\Big] = \prod_{i=1}^{q} \Delta^{-p_i}(H(a_i)(e)). \tag{10.28} \]
Of course, in the real case ($r = 1$), if we set $a_i = f_i - 1$ in (10.28), we get (10.27). Similarly, when we set $a_i = (f_i - 1)e$ with $f_i > 0$, (10.28) becomes
\[ \mathbb{E}\Big[\Delta^{-p}\Big(\sum_{i=1}^{q} f_i X_i\Big)\Big] = \prod_{i=1}^{q} f_i^{-r p_i}. \]
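The real-case identity (10.27) can be checked numerically. The sketch below is only an illustration, not part of the argument: it assumes $q = 2$, so that $X_1$ follows a Beta$(p_1, p_2)$ law and $X_2 = 1 - X_1$, and it evaluates the expectation with a composite Simpson rule. The helper names (`beta_pdf`, `simpson`, `mauldon_lhs`, `mauldon_rhs`) and the parameter values are hypothetical choices made for this example.

```python
import math

def beta_pdf(x, p1, p2):
    """Density of the Beta(p1, p2) law, i.e. the real Dirichlet law with q = 2."""
    log_norm = math.lgamma(p1 + p2) - math.lgamma(p1) - math.lgamma(p2)
    return math.exp(log_norm) * x ** (p1 - 1) * (1 - x) ** (p2 - 1)

def simpson(func, lo, hi, steps=2000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3

def mauldon_lhs(f1, f2, p1, p2):
    """E[(f1*X1 + f2*X2)^(-(p1+p2))] with X1 ~ Beta(p1, p2) and X2 = 1 - X1."""
    def integrand(x):
        return (f1 * x + f2 * (1 - x)) ** (-(p1 + p2)) * beta_pdf(x, p1, p2)
    return simpson(integrand, 0.0, 1.0)

def mauldon_rhs(f1, f2, p1, p2):
    """Right-hand side of (10.27) for q = 2."""
    return f1 ** (-p1) * f2 ** (-p2)

# Illustrative values; p1, p2 > 1 keeps the integrand smooth on [0, 1].
lhs = mauldon_lhs(2.0, 5.0, 2.0, 3.0)
rhs = mauldon_rhs(2.0, 5.0, 2.0, 3.0)
print(lhs, rhs)
```

The two printed numbers should agree up to quadrature error. The choice $p_1, p_2 > 1$ is made only so that Simpson's rule sees no endpoint singularity.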
Proof. (⇒) Suppose that the distribution of $X$ is $D_{(p_1, \dots, p_q)}$. Then there exist $Y_1, \dots, Y_q$ independent Wishart random variables with the same scale parameter $e$ and respective shape parameters $p_1, \dots, p_q$ such that, if $S = Y_1 + \dots + Y_q$, we have
\[ X = (X_1, \dots, X_q) = (P(S^{-1/2})Y_1, \dots, P(S^{-1/2})Y_q). \]
We follow the method used in the real case by Chamayou and Letac [20] and compute by two different methods the expectation
\[ M_Y(a_1, \dots, a_q) := \mathbb{E}\big(e^{-\operatorname{tr}(\sum_{i=1}^{q} a_i Y_i)}\big), \]
where $Y = (Y_1, \dots, Y_q)$. The first method uses the independence of the $Y_i$'s. In fact, we have
\[ M_Y(a_1, \dots, a_q) = \prod_{i=1}^{q} \mathbb{E}\big(e^{-\operatorname{tr}(a_i Y_i)}\big) = \prod_{i=1}^{q} \Delta^{-p_i}(e + P(a_i^{1/2})(e)) = \prod_{i=1}^{q} \Delta^{-p_i}(H(a_i)(e)). \]
In the second method, we replace $Y_i$ by $P(S^{1/2})X_i$ in the expression of $M_Y(a_1, \dots, a_q)$. According to Proposition 1.8 (ii), there exists $k_i$ in $K$, which depends on $S$, such that $P(S^{1/2})a_i = k_i P(a_i^{1/2})S$. Therefore
\[ \operatorname{tr}(a_i P(S^{1/2})X_i) = \langle P(S^{1/2})X_i, a_i \rangle = \langle X_i, P(S^{1/2})a_i \rangle = \langle X_i, k_i P(a_i^{1/2})S \rangle = \langle k_i^{*} X_i, P(a_i^{1/2})S \rangle = \langle P(a_i^{1/2})k_i^{*} X_i, S \rangle. \]
Using the fact that $X$ and $S$ are independent, we can write
\[
\begin{aligned}
M_Y(a_1, \dots, a_q) &= \mathbb{E}\big(e^{-\operatorname{tr}((\sum_{i=1}^{q} P(a_i^{1/2})k_i^{*}(X_i))S)}\big) \\
&= \mathbb{E}\big(\mathbb{E}\big(e^{-\operatorname{tr}((\sum_{i=1}^{q} P(a_i^{1/2})k_i^{*}(X_i))S)} \mid S\big)\big) \\
&= \mathbb{E}\big(\mathbb{E}\big(e^{-\operatorname{tr}((\sum_{i=1}^{q} P(a_i^{1/2})X_i)S)} \mid S\big)\big) \quad \text{(because the $X_i$'s are $K$-invariant and independent of $S$)} \\
&= \mathbb{E}\big(e^{-\operatorname{tr}((\sum_{i=1}^{q} P(a_i^{1/2})X_i)S)}\big) \\
&= \mathbb{E}\big(\mathbb{E}\big(e^{-\operatorname{tr}((\sum_{i=1}^{q} P(a_i^{1/2})X_i)S)} \mid X\big)\big) \\
&= \mathbb{E}\Big[\Delta^{-p}\Big(e + \sum_{i=1}^{q} P(a_i^{1/2})X_i\Big)\Big] \\
&= \mathbb{E}\Big[\Delta^{-p}\Big(\sum_{i=1}^{q} H(a_i)(X_i)\Big)\Big].
\end{aligned}
\]
Comparing the expressions of $M_Y(a_1, \dots, a_q)$ obtained in the two ways gives the identity (10.28).
(⇐) Let $Z_1, \dots, Z_q$ be independent random variables with Wishart distributions $W_{p_1,e}, \dots, W_{p_q,e}$, respectively. Then $Z = Z_1 + \dots + Z_q$ has the Wishart probability distribution $W_{p,e}$ and $(P(Z^{-1/2})Z_1, \dots, P(Z^{-1/2})Z_q)$ has the Wishart–Dirichlet distribution $D_{(p_1, \dots, p_q)}$. Suppose that $(Z_1, \dots, Z_q)$ and $X$ are independent. Then by (3.12), for $a_1, \dots, a_q$ in $\Omega$, we have
\[ \mathbb{E}\big(e^{-\sum_{i=1}^{q} \operatorname{tr}(a_i Z_i)}\big) = \prod_{i=1}^{q} \Delta^{-p_i}(e + a_i), \]
and
\[ \mathbb{E}\big(e^{-\sum_{i=1}^{q} \operatorname{tr}(a_i P(Z^{1/2})X_i)}\big) = \mathbb{E}\Big[\Delta^{-p}\Big(e + \sum_{i=1}^{q} P(a_i^{1/2})X_i\Big)\Big]. \]
Using (10.28), we obtain
\[ \mathbb{E}\big(e^{-\sum_{i=1}^{q} \operatorname{tr}(a_i P(Z^{1/2})X_i)}\big) = \mathbb{E}\big(e^{-\sum_{i=1}^{q} \operatorname{tr}(a_i Z_i)}\big). \]
This is true for all $a_1, \dots, a_q$ in $\Omega$, so
\[ \mathcal{L}(P(Z^{1/2})X_1, \dots, P(Z^{1/2})X_q) = \mathcal{L}(Z_1, \dots, Z_q). \]
Thus $P(Z^{1/2})X_1, \dots, P(Z^{1/2})X_q$ are independent random variables with Wishart distributions $W_{p_1,e}, \dots, W_{p_q,e}$, respectively.
As $\sum_{i=1}^{q} X_i = e$, we have that $\sum_{i=1}^{q} P(Z^{1/2})X_i = Z$. From this and the very definition of the Wishart–Dirichlet distribution as a particular case of the Riesz–Dirichlet probability distribution (see Theorem 10.1), we deduce that $(X_1, \dots, X_q)$ has the Wishart–Dirichlet distribution $D_{(p_1, \dots, p_q)}$.
10.3.4 A distributional property
The following theorem extends a property given in [20] for the real Dirichlet distribution. Its proof uses the characterization of the Wishart–Dirichlet distribution in $\Omega$ given in Theorem 10.12.
Theorem 10.13. Let, for $1 \le i \le k$, $X^{(i)} = (X_{i1}, \dots, X_{id})$ be a Wishart–Dirichlet random variable in $\Omega$, $\mathcal{L}(X^{(i)}) = D_{(p_{i1}, \dots, p_{id})}$ with $p_{ij} > (r - 1)\frac{d}{2}$ for all $j$, and let $Y = (Y_1, \dots, Y_k)$ be a Wishart–Dirichlet random variable in $\Omega$ with distribution $D_{(r_1, \dots, r_k)}$, where $r_i = \sum_{j=1}^{d} p_{ij}$. Suppose that $X^{(1)}, \dots, X^{(k)}$ and $Y$ are independent. Define, for $1 \le j \le d$,
\[ Z_j = \sum_{i=1}^{k} P(Y_i^{1/2})X_{ij}. \]
Then $Z = (Z_1, \dots, Z_d) \sim D_{(s_1, \dots, s_d)}$, with $s_j = \sum_{i=1}^{k} p_{ij}$.
We first prove the following technical result.
Proposition 10.14. Let $p_1, \dots, p_q$ be in $](r - 1)\frac{d}{2}, +\infty[$ and let $X = (X_1, \dots, X_q)$ be a Wishart–Dirichlet random variable in $\Omega$ with distribution $D_{(p_1, \dots, p_q)}$. Denote $p = p_1 + \dots + p_q$. Then
(i) For all $a_{ij} \in \Omega$,
\[ \mathbb{E}\Big[\Delta^{-p}\Big(e + \sum_{i=1}^{q} \sum_{j=1}^{n} P(a_{ij}^{1/2})X_i\Big)\Big] = \prod_{i=1}^{q} \Delta^{-p_i}\Big(e + \sum_{j=1}^{n} a_{ij}\Big). \]
(ii) For all $b_i \in \Omega$ and $k_i \in K$,
\[ \mathbb{E}\Big[\Delta^{-p}\Big(e + \sum_{i=1}^{q} k_i(P(b_i^{1/2})X_i)\Big)\Big] = \prod_{i=1}^{q} \Delta^{-p_i}(e + b_i). \]
Proof. (i) This fact is proved by a method similar to that used in the proof of Theorem 10.12.
(ii) Taking $a_i = k_i(b_i)$ in (10.28), we get
\[ \mathbb{E}\Big[\Delta^{-p}\Big(\sum_{i=1}^{q} \big(e + P((k_i(b_i))^{1/2})X_i\big)\Big)\Big] = \prod_{i=1}^{q} \Delta^{-p_i}\big(e + P((k_i(b_i))^{1/2})e\big). \]
As $\sum_{i=1}^{q} X_i = e$, by Propositions 1.8 (i) and 1.7 (iii), we have
\[ (k_i(b_i))^{1/2} = k_i(b_i^{1/2}) \quad \text{and} \quad P(k_i(b_i^{1/2})) = k_i P(b_i^{1/2}) k_i^{*}, \]
and thus obtain
\[ \mathbb{E}\Big[\Delta^{-p}\Big(e + \sum_{i=1}^{q} k_i P(b_i^{1/2}) k_i^{*} X_i\Big)\Big] = \prod_{i=1}^{q} \Delta^{-p_i}(e + b_i). \]
The fact that the distribution of $X_i$ is $K$-invariant gives the result.
Proof of Theorem 10.13. It is easy to see that $\sum_{j=1}^{d} Z_j = e$. For $a_1, \dots, a_d$ in $\Omega$ and $p = \sum_{i=1}^{k} \sum_{j=1}^{d} p_{ij}$, we have
\[
\begin{aligned}
M &= \mathbb{E}\Big[\Delta^{-p}\Big(\sum_{j=1}^{d} H(a_j)(Z_j)\Big)\Big] \\
&= \mathbb{E}\Big[\Delta^{-p}\Big(e + \sum_{i=1}^{k} \sum_{j=1}^{d} P(a_j^{1/2})P(Y_i^{1/2})X_{ij}\Big)\Big] \\
&= \mathbb{E}\Big[\mathbb{E}\Big[\Delta^{-p}\Big(e + \sum_{i=1}^{k} \sum_{j=1}^{d} P(a_j^{1/2})P(Y_i^{1/2})X_{ij}\Big) \,\Big|\, X\Big]\Big],
\end{aligned}
\]
where $X = (X^{(1)}, \dots, X^{(k)})$. Invoking Proposition 1.8, there exists $k_{ij}$ in $K$ such that $P(Y_i^{1/2})X_{ij} = k_{ij}(P(X_{ij}^{1/2})Y_i)$. We can then write
\[ M = \mathbb{E}\Big[\mathbb{E}\Big[\Delta^{-p}\Big(e + \sum_{i=1}^{k} \sum_{j=1}^{d} k_{ij} P(k_{ij}^{-1}(a_j^{1/2}))P(X_{ij}^{1/2})Y_i\Big) \,\Big|\, X\Big]\Big]. \]
As $X$ and $Y$ are independent, Proposition 10.14 implies that
\[ M = \mathbb{E}\Big[\prod_{i=1}^{k} \Delta^{-r_i}\Big(e + \sum_{j=1}^{d} k_{ij} P(k_{ij}^{-1}(a_j^{1/2}))X_{ij}\Big)\Big]. \]
Since $X^{(1)}, \dots, X^{(k)}$ are independent, this can be written as
\[ M = \prod_{i=1}^{k} \mathbb{E}\Big[\Delta^{-r_i}\Big(e + \sum_{j=1}^{d} k_{ij} P(k_{ij}^{-1}(a_j^{1/2}))X_{ij}\Big)\Big]. \]
Using Proposition 10.14, we obtain
\[ M = \prod_{i=1}^{k} \prod_{j=1}^{d} \Delta^{-p_{ij}}(e + a_j) = \prod_{j=1}^{d} \Delta^{-s_j}(H(a_j)(e)), \]
and conclude using Theorem 10.12.
The following result concerns the beta distribution and is an immediate consequence of Theorem 10.13.
Corollary 10.15. Let $p$, $p'$, $q$, and $q'$ be in $](r - 1)\frac{d}{2}, +\infty[$. Let $X$, $Y$, and $Z$ be independent random variables in $\Omega$ with respective distributions $\beta^{(1)}_{p,q}$, $\beta^{(1)}_{p',q'}$, and $\beta^{(1)}_{p+q,\,p'+q'}$. Then $P(Z^{1/2})X + P((e - Z)^{1/2})Y$ has the beta distribution $\beta^{(1)}_{p+p',\,q+q'}$.
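In the real case ($r = 1$), where $P(z^{1/2})x$ reduces to the product $zx$, Corollary 10.15 says that $ZX + (1 - Z)Y$ is Beta$(p + p', q + q')$-distributed. The following Monte Carlo sketch illustrates this under that rank-one assumption by comparing the empirical mean and variance with those of the target beta law. The function names, sample size, seed, and parameter values are hypothetical choices for the example, and agreement of two moments is of course only a plausibility check, not a proof.

```python
import random

def simulate_mixture(p, q, p2, q2, n_samples=200_000, seed=12345):
    """Draw samples of Z*X + (1 - Z)*Y with X ~ Beta(p, q), Y ~ Beta(p2, q2),
    Z ~ Beta(p + q, p2 + q2), all independent (real case of Corollary 10.15)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x = rng.betavariate(p, q)
        y = rng.betavariate(p2, q2)
        z = rng.betavariate(p + q, p2 + q2)
        samples.append(z * x + (1 - z) * y)
    return samples

def beta_mean_var(a, b):
    """Mean and variance of the Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Illustrative parameters; p2 and q2 play the role of p' and q'.
p, q, p2, q2 = 2.0, 3.0, 1.5, 2.5
samples = simulate_mixture(p, q, p2, q2)
emp_mean = sum(samples) / len(samples)
emp_var = sum((s - emp_mean) ** 2 for s in samples) / len(samples)
target_mean, target_var = beta_mean_var(p + p2, q + q2)
print(emp_mean, target_mean)
print(emp_var, target_var)
```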
10.4 A generalization of the Dirichlet distribution
We extend to a symmetric cone the definition of the generalized Dirichlet distribution given in the real case by Connor and Mosimann [22]. Let $X_1, \dots, X_m$ be continuous random variables in $\Omega \cap (e - \Omega)$. If for $i = 1, \dots, m$, we define
\[ Z_i = \pi^{-1}\Big(e - \sum_{j=1}^{i-1} X_j\Big)(X_i), \tag{10.29} \]
with the convention $\sum_{j=1}^{0} X_j = 0$, then we have that
\[ e - \sum_{j=1}^{i} X_j = \bigodot_{j=1}^{i} (e - Z_j). \tag{10.30} \]
From this, the variables $Z_1, \dots, Z_m$ can be transformed to $X_1, \dots, X_m$ by
\[ X_i = \pi\Big(\bigodot_{j=1}^{i-1} (e - Z_j)\Big)(Z_i). \]
Using (10.30), the Jacobian of $(X_1, \dots, X_m) \to (Z_1, \dots, Z_m)$ is seen to be
\[ \prod_{i=1}^{m-1} \Delta^{-\frac{n}{r}}\Big(e - \sum_{j=1}^{i-1} X_j\Big). \]
Now suppose that the $Z_i$, $i = 1, \dots, m$, are mutually independent and that $Z_i$ is $\beta^{(1)}_{p_i,q_i}$-distributed. Then the joint density function of $(X_1, \dots, X_m)$, concentrated on $T_m$, is given by
\[ \frac{1}{\prod_{i=1}^{m} B_\Omega(p_i, q_i)}\, \Delta^{q_m - \frac{n}{r}}\Big(e - \sum_{j=1}^{m} x_j\Big) \prod_{i=1}^{m} \Delta^{p_i - \frac{n}{r}}(x_i)\, \Delta^{q_{i-1} - (p_i + q_i)}\Big(e - \sum_{j=1}^{i-1} x_j\Big). \]
We call this distribution the generalized Dirichlet distribution on the cone $\Omega$ with parameters $(p_1, \dots, p_m, q_1, \dots, q_m)$ and denote it by $GD_{(p_1, \dots, p_m, q_1, \dots, q_m)}$. Note that if $q_{i-1} = p_i + q_i$, for $i = 2, \dots, m$, then the generalized Dirichlet becomes the ordinary Dirichlet distribution $D_{(p_1, \dots, p_m, q_m)}$.
Next, we give a condition under which the generalized Dirichlet distribution on $\Omega$ reduces to the ordinary Wishart–Dirichlet distribution.
Theorem 10.16. Let $(X_1, \dots, X_m)$ have the generalized Dirichlet distribution $GD_{(p_1, \dots, p_m, q_1, \dots, q_m)}$ on $\Omega$. Then $(X_1, \dots, X_m)$ has the ordinary Wishart–Dirichlet distribution $D_{(p_1, \dots, p_m, q_m)}$ if and only if for $i = 1, \dots, m - 1$, the distribution of the random variable $Y_i = \pi^{-1}(e - \sum_{j \ne i} X_j)(X_i)$ is $\beta^{(1)}_{p_i,q_m}$.
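The change of variables (10.29) and the product identity (10.30) are easiest to see in the real case. The sketch below assumes $r = 1$, so that $\pi^{-1}(u)(x)$ is the quotient $x/u$ and the product $\bigodot$ is ordinary multiplication; this is the Connor–Mosimann construction on the unit interval. The function names `x_to_z` and `z_to_x` and the sample point are hypothetical and serve only as an illustration.

```python
from math import isclose, prod

def x_to_z(xs):
    """Scalar version of (10.29): z_i = x_i / (1 - x_1 - ... - x_{i-1})."""
    zs, used = [], 0.0
    for x in xs:
        zs.append(x / (1.0 - used))
        used += x
    return zs

def z_to_x(zs):
    """Inverse map: x_i = z_i * prod_{j < i} (1 - z_j)."""
    xs, remaining = [], 1.0
    for z in zs:
        xs.append(remaining * z)
        remaining *= 1.0 - z
    return xs

# A point with positive coordinates whose sum is below 1.
xs = [0.2, 0.3, 0.1]
zs = x_to_z(xs)

# Scalar version of (10.30): 1 - sum_{j <= i} x_j = prod_{j <= i} (1 - z_j).
for i in range(1, len(xs) + 1):
    assert isclose(1.0 - sum(xs[:i]), prod(1.0 - z for z in zs[:i]))

print(zs)
print(z_to_x(zs))
```

Applying `z_to_x` to the output of `x_to_z` recovers the original point, which mirrors the inversion formula stated after (10.30).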
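Proof. (⇒) Suppose that $(X_1, \dots, X_m)$ is $D_{(p_1, \dots, p_m, q_m)}$-distributed. It is easy to verify, using the very definition of the Wishart–Dirichlet distribution on $\Omega$, that by removing a component, we still have a Wishart–Dirichlet distribution. More precisely, $(X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_m)$ is $D_{(p_1, \dots, p_{i-1}, p_{i+1}, \dots, p_m, p_i + q_m)}$-distributed. For a bounded measurable function $h$, we have for $i = 1, \dots, m - 1$,
\[
\begin{aligned}
\mathbb{E}(h(Y_i)) &= \mathbb{E}\Big[h\Big(\pi^{-1}\Big(e - \sum_{j \ne i} X_j\Big)(X_i)\Big)\Big] \\
&= c \int_{T_m} h\Big(\pi^{-1}\Big(e - \sum_{j \ne i} x_j\Big)(x_i)\Big) \prod_{j=1}^{m} \Delta^{p_j - \frac{n}{r}}(x_j) \times \Delta^{q_m - \frac{n}{r}}\Big(e - \sum_{j=1}^{m} x_j\Big)\,dx_1 \cdots dx_m,
\end{aligned}
\]
where
\[ c = \frac{\Gamma_\Omega(\sum_{j=1}^{m} p_j + q_m)}{\Gamma_\Omega(q_m) \prod_{j=1}^{m} \Gamma_\Omega(p_j)}. \]
Making the change of variable $y_i = \pi^{-1}(e - \sum_{j \ne i} x_j)(x_i)$, we get
\[ \mathbb{E}(h(Y_i)) = c' \int_{\Omega \cap (e - \Omega)} h(y_i)\,\Delta^{p_i - \frac{n}{r}}(y_i)\,\Delta^{q_m - \frac{n}{r}}(e - y_i)\,dy_i, \tag{10.31} \]
where
\[
\begin{aligned}
c' &= c \int_{\{\sum_{j \ne i} x_j \in \Omega \cap (e - \Omega)\}} \prod_{j \ne i} \Delta^{p_j - \frac{n}{r}}(x_j)\, \Delta^{p_i + q_m - \frac{n}{r}}\Big(e - \sum_{j \ne i} x_j\Big)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_m \\
&= \frac{\Gamma_\Omega(\sum_{j=1}^{m} p_j + q_m)}{\Gamma_\Omega(q_m) \prod_{j=1}^{m} \Gamma_\Omega(p_j)} \cdot \frac{\Gamma_\Omega(p_i + q_m) \prod_{j \ne i} \Gamma_\Omega(p_j)}{\Gamma_\Omega(\sum_{j=1}^{m} p_j + q_m)} \\
&= \frac{\Gamma_\Omega(p_i + q_m)}{\Gamma_\Omega(p_i)\,\Gamma_\Omega(q_m)}.
\end{aligned}
\]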
Substituting this into (10.31), we obtain that $Y_i$ is $\beta^{(1)}_{p_i,q_m}$-distributed, for $i = 1, \dots, m - 1$.
(⇐) From (10.29), we easily check that for $i = 1, \dots, m - 1$,
\[ \pi^{-1}(Y_i)(e - Y_i) = \pi^{-1}(Z_i)\,\pi(e - Z_i)\Big(\bigodot_{j=i+1}^{m} (e - Z_j)\Big). \]
In particular, for $i = m - 1$,
\[ \pi^{-1}(Y_{m-1})(e - Y_{m-1}) = \pi^{-1}(Z_{m-1})\,\pi(e - Z_{m-1})(e - Z_m). \]
Applying Theorem 8.10 gives
\[ q_{m-1} = p_m + q_m. \tag{10.32} \]
Suppose now that $q_i = p_{i+1} + q_{i+1}$, for $i = k, \dots, m - 1$, where $k = 2, \dots, m - 1$. Then from Theorem 8.9, we obtain that $\bigodot_{j=k}^{m} (e - Z_j)$ has the beta distribution $\beta^{(1)}_{q_m, \sum_{i=k}^{m} p_i}$, is independent of $Z_{k-1}$, and $q_{k-1} = q_m + \sum_{i=k}^{m} p_i = p_k + q_k$. The result follows by induction using the same reasoning.
When $m = 2$, Theorem 10.16 states the following:
Corollary 10.17. Let $X_1$ and $X_2$ be random variables in $\Omega \cap (e - \Omega)$. Suppose that $X_1$ and $\pi^{-1}(e - X_1)(X_2)$ are independent with distributions $\beta^{(1)}_{p_1,q_1}$ and $\beta^{(1)}_{p_2,q_2}$, respectively. Then $\pi^{-1}(e - X_2)(X_1)$ has the $\beta^{(1)}_{p_1,q_2}$ distribution if and only if $(X_1, X_2)$ has the Wishart–Dirichlet distribution $D_{(p_1, p_2, q_2)}$.
11 Riesz inverse Gaussian distribution
The generalized inverse Gaussian distribution on the real line $\mathbb{R}$ was introduced in Barndorff-Nielsen and Halgreen [7]. It is defined by
\[ \nu_{(\lambda,a,b)}(dx) = \frac{a^{\frac{\lambda}{2}}\, b^{-\frac{\lambda}{2}}}{2\mathcal{K}_\lambda(\sqrt{ab})}\, x^{\lambda - 1} \exp\Big\{-\frac{1}{2}(ax + bx^{-1})\Big\}\,\mathbf{1}_{]0,+\infty[}(x)\,dx, \tag{11.1} \]
where, for $\lambda \in \mathbb{R}$, $\mathcal{K}_\lambda$ is the modified Bessel function of the third kind (see Abramowitz–Stegun [1]). The domain of variation of the parameters $\lambda$, $a$, and $b$ is given by
\[
\begin{cases}
b \ge 0,\ a > 0 & \text{if } \lambda > 0, \\
b > 0,\ a > 0 & \text{if } \lambda = 0, \\
b > 0,\ a \ge 0 & \text{if } \lambda < 0.
\end{cases}
\]
Taking $\lambda = -\frac{1}{2}$ leads to the celebrated inverse Gaussian distribution
\[ \nu_{(-\frac{1}{2},a,b)}(dx) = \frac{a^{-\frac{1}{4}}\, b^{\frac{1}{4}}}{2\mathcal{K}_{-\frac{1}{2}}(\sqrt{ab})}\, x^{-\frac{3}{2}} \exp\Big\{-\frac{1}{2}(ax + bx^{-1})\Big\}\,\mathbf{1}_{]0,+\infty[}(x)\,dx \]
introduced by Tweedie [110]. The properties of the generalized inverse Gaussian distribution and its statistical applications are investigated in many books and papers; we refer to [21] and [64] and to the references therein. We also cite Seshadri [104], which reviews the development of this distribution and of some statistical methods based upon the generated exponential family, and gives an account of the different proposals for multivariate extensions together with the context in which each proposal was advanced. In this respect, we mention the extension introduced by Barndorff-Nielsen et al. [8], defined on the cone $\Omega$ of $(r, r)$-symmetric positive definite matrices by
\[ \nu_{(\lambda,a,b)}(dx) = \frac{\Delta^{\lambda - \frac{n}{r}}(x)}{\mathcal{K}(\lambda, a, b)} \exp\Big\{-\frac{1}{2}\big(\operatorname{tr}(ax) + \operatorname{tr}(bx^{-1})\big)\Big\}\,\mathbf{1}_\Omega(x)\,dx, \tag{11.2} \]
where $\lambda \in \mathbb{R}$, $a$ and $b$ are in the closed cone $\bar{\Omega}$ of nonnegative $(r, r)$-symmetric matrices, and the norming constant $\mathcal{K}(\lambda, a, b)$ is expressed in terms of a matrix Bessel function of the second kind (see [54]). The domain of variation of the parameters $a$, $b$, and $\lambda$ is
\[
\begin{cases}
\text{either } (a, b, \lambda) \in \Omega \times \Omega \times \mathbb{R}, \\
\text{or } b \in \bar{\Omega} \text{ with rank } s < r \text{ and } (a, \lambda) \in \Omega \times (\frac{r - s - 1}{2}, \infty), \\
\text{or } a \in \bar{\Omega} \text{ with rank } s < r \text{ and } (b, \lambda) \in \Omega \times (-\infty, -\frac{r - s - 1}{2}).
\end{cases}
\]
The definition of an extension of the class of generalized inverse Gaussian distributions to symmetric cones uses a suitable generalization of the Bessel function which involves the generalized power function.
https://doi.org/10.1515/9783110713374-011
11.1 Modified generalized Bessel function
Consider, for $a$ and $b$ in $\Omega$ and $s$ in $\mathbb{R}^r$, the integral
\[ \mathcal{K}(s, a, b) = \int_\Omega e^{-\frac{1}{2}(\langle a, x \rangle + \langle b, x^{-1} \rangle)}\, \Delta_{s - \frac{n}{r}}(x)\,dx. \tag{11.3} \]
In [31], $s$ is supposed to be in $\mathbb{C}^r$, while for our purposes we are only interested in the case where $s$ is in $\mathbb{R}^r$. For the convergence of the integral (11.3), we need the following property in which we use the element $m_0$ of $K$ defined in (2.5) such that $m_0 c_j = c_{r-j+1}$, $1 \le j \le r$.
Lemma 11.1. For $a$, $b$ in $\Omega$ and $s = (s_1, \dots, s_r) \in \mathbb{R}^r$,
\[ \mathcal{K}(s, a, b) = \mathcal{K}(-s^{*}, m_0 b, m_0 a), \tag{11.4} \]
where $s^{*} = (s_r, \dots, s_1)$.
Proof. Making in (11.3) the change of variable $x = y^{-1}$ and noting that $dx = \Delta^{-\frac{2n}{r}}(y)\,dy$, we get
\[ \mathcal{K}(s, a, b) = \int_\Omega e^{-\frac{1}{2}(\langle a, y^{-1} \rangle + \langle b, y \rangle)}\, \Delta_s(y^{-1})\, \Delta^{-\frac{n}{r}}(y)\,dy. \]
On the other hand, using (2.6) and (2.7), we have that $\Delta_s(y^{-1}) = \Delta_{-s^{*}}(m_0 y)$. It follows that
\[ \mathcal{K}(s, a, b) = \int_\Omega e^{-\frac{1}{2}(\langle a, y^{-1} \rangle + \langle b, y \rangle)}\, \Delta_{-s^{*}}(m_0 y)\, \Delta^{-\frac{n}{r}}(y)\,dy. \]
By setting $z = m_0 y$, this becomes
\[ \mathcal{K}(s, a, b) = \int_\Omega e^{-\frac{1}{2}(\langle m_0 a, z^{-1} \rangle + \langle m_0 b, z \rangle)}\, \Delta_{-s^{*} - \frac{n}{r}}(z)\,dz = \mathcal{K}(-s^{*}, m_0 b, m_0 a). \]
Proposition 11.2. Given $a$ and $b$ in $\Omega$, the integral $\mathcal{K}(s, a, b)$ converges for all $s$ in $\mathbb{R}^r$.
Proof. By Theorem 3.3, the integral converges for $s$ such that $s_i > (i - 1)\frac{d}{2}$, and, by (11.4), it follows that it also converges for $s$ such that $s_i < -(r - i)\frac{d}{2}$. To deduce that the integral converges for all $s$ in $\mathbb{R}^r$, we note that the function $s \mapsto \Delta_s(x)$ is convex. In fact, $\Delta_s(x)$ can be written in the form $\exp(\langle \beta, s \rangle)$, with $\beta$ in $\mathbb{R}^r$, and its second derivative with respect to $s$ is equal to $\beta \otimes \beta \exp(\langle \beta, s \rangle)$, which is positive semidefinite.
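The symmetry (11.4) and the convergence statement of Proposition 11.2 can be illustrated numerically in the rank-one case. The sketch below assumes $r = 1$ (so $n = 1$, $\Omega = ]0, +\infty[$, $m_0$ is the identity, $s^{*} = s$, and $\Delta_{s - n/r}(x) = x^{s-1}$) and evaluates the integral (11.3) with the substitution $x = e^{t}$ and the trapezoid rule. The function name `bessel_integral`, the truncation bound, and the step count are hypothetical choices for this example. For $s = \frac{1}{2}$ the integral has the classical closed form $\sqrt{2\pi/a}\,e^{-\sqrt{ab}}$, which gives an independent check.

```python
import math

def bessel_integral(s, a, b, t_max=40.0, steps=40_000):
    """Rank-one version of (11.3): integral over x > 0 of
    exp(-(a*x + b/x)/2) * x**(s - 1) dx.
    With x = exp(t) the integrand becomes exp(s*t - (a*e^t + b*e^-t)/2),
    which decays doubly exponentially, so a truncated trapezoid rule suffices."""
    h = 2.0 * t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = -t_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        exponent = s * t - 0.5 * (a * math.exp(t) + b * math.exp(-t))
        total += weight * math.exp(exponent)
    return total * h

def closed_form_half(a, b):
    """Closed form of the rank-one integral for s = 1/2."""
    return math.sqrt(2.0 * math.pi / a) * math.exp(-math.sqrt(a * b))

a, b = 2.0, 3.0
# Finite values for positive, zero, and negative s (Proposition 11.2 with r = 1),
# and the symmetry K(s, a, b) = K(-s, b, a) of Lemma 11.1.
for s in (-2.5, 0.0, 0.7, 3.0):
    print(s, bessel_integral(s, a, b), bessel_integral(-s, b, a))
print(bessel_integral(0.5, a, b), closed_form_half(a, b))
```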
Now we show that, with some restriction on the domain of $s$, the Bessel function may be extended to the case where $a$ or $b$ is in $\bar{\Omega}$.
Theorem 11.3. The integral $\mathcal{K}(s, a, b)$ converges for $(s, a, b)$ satisfying
\[
\begin{cases}
b \in \bar{\Omega},\ a \in \Omega & \text{if } s_i > (i - 1)\frac{d}{2},\ \forall\, 1 \le i \le r, \\
b \in \Omega,\ a \in \Omega & \text{if } -(r - i)\frac{d}{2} \le s_i \le (i - 1)\frac{d}{2},\ \forall\, 1 \le i \le r, \\
b \in \Omega,\ a \in \bar{\Omega} & \text{if } s_i < -(r - i)\frac{d}{2},\ \forall\, 1 \le i \le r.
\end{cases}
\tag{11.5}
\]
Proof. From Proposition 11.2, when $a$ and $b$ are in $\Omega$, the integral converges for any $s$ in $\mathbb{R}^r$. When $b \in \bar{\Omega}$ and $a \in \Omega$, we observe that if $\mathcal{K}(s, a, 0)$ converges, then $\mathcal{K}(s, a, b)$ converges. And it is known that
\[ \int_\Omega e^{-\frac{1}{2}\langle a, x \rangle}\, \Delta_{s - \frac{n}{r}}(x)\,dx \]
converges for $s$ such that $s_i > (i - 1)\frac{d}{2}$. For the setting in which $b \in \Omega$ and $a \in \bar{\Omega}$, we use (11.4) to deduce that $\mathcal{K}(s, a, b)$ converges for $s_i < -(r - i)\frac{d}{2}$.
Definition 11.1. For $(s, a, b)$ satisfying (11.5), the distribution
\[ RIG(s, a, b)(dx) = \frac{1}{\mathcal{K}(s, a, b)} \exp\Big\{-\frac{1}{2}\big(\langle a, x \rangle + \langle b, x^{-1} \rangle\big)\Big\}\, \Delta_{s - \frac{n}{r}}(x)\,\mathbf{1}_\Omega(x)\,dx \]
is called the Riesz inverse Gaussian distribution with parameters $(s, a, b)$.
Note that if $s_1 = s_2 = \dots = s_r = \lambda$, the Riesz inverse Gaussian distribution with parameters $(s, a, b)$ is nothing but the generalized inverse Gaussian distribution with parameters $(\lambda, a, b)$ defined by (11.2).
Next we calculate the Laplace transform of the Riesz inverse Gaussian distribution.
Proposition 11.4. The Laplace transform of the $RIG(s, a, b)$ distribution evaluated at $\theta \in -\Omega$ is equal to
\[ L_{RIG(s,a,b)}(\theta) = \frac{\mathcal{K}(s, a - 2\theta, b)}{\mathcal{K}(s, a, b)}. \]
Proof. Let $\theta$ be in $-\Omega$. Then
\[
\begin{aligned}
L_{RIG(s,a,b)}(\theta) &= \int_\Omega e^{\langle \theta, x \rangle}\, RIG(s, a, b)(dx) \\
&= \int_\Omega e^{\langle \theta, x \rangle}\, \frac{1}{\mathcal{K}(s, a, b)}\, e^{-\frac{1}{2}(\langle a, x \rangle + \langle b, x^{-1} \rangle)}\, \Delta_{s - \frac{n}{r}}(x)\,dx \\
&= \frac{1}{\mathcal{K}(s, a, b)} \int_\Omega e^{-\frac{1}{2}(\langle a - 2\theta, x \rangle + \langle b, x^{-1} \rangle)}\, \Delta_{s - \frac{n}{r}}(x)\,dx \\
&= \frac{\mathcal{K}(s, a - 2\theta, b)}{\mathcal{K}(s, a, b)}.
\end{aligned}
\]
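As an illustration of Proposition 11.4, the ratio of Bessel-type integrals can be compared with a known closed form in the rank-one case. The sketch assumes $r = 1$ and $s = -\frac{1}{2}$, where the distribution is the classical inverse Gaussian law and the Laplace transform equals $\exp\{\sqrt{ab} - \sqrt{(a - 2\theta)b}\}$ for $\theta < 0$. The function names and numerical settings below are hypothetical choices for this example, not notation from the text.

```python
import math

def bessel_integral(s, a, b, t_max=40.0, steps=40_000):
    """Rank-one version of (11.3), computed with x = exp(t) and the trapezoid rule."""
    h = 2.0 * t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = -t_max + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(s * t - 0.5 * (a * math.exp(t) + b * math.exp(-t)))
    return total * h

def rig_laplace(theta, s, a, b):
    """Scalar version of Proposition 11.4: K(s, a - 2*theta, b) / K(s, a, b).
    theta is taken negative so that a - 2*theta stays positive."""
    return bessel_integral(s, a - 2.0 * theta, b) / bessel_integral(s, a, b)

def inverse_gaussian_laplace(theta, a, b):
    """Closed form of the Laplace transform when s = -1/2 (inverse Gaussian case)."""
    return math.exp(math.sqrt(a * b) - math.sqrt((a - 2.0 * theta) * b))

a, b = 2.0, 3.0
for theta in (-0.25, -1.0, -4.0):
    print(theta, rig_laplace(theta, -0.5, a, b), inverse_gaussian_laplace(theta, a, b))
```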
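Now we show that, up to a linear transformation, the inverse of a Riesz inverse Gaussian random variable is also a Riesz inverse Gaussian random variable.
Proposition 11.5. If $s$ is in $\mathbb{R}^r$, and $a$ and $b$ are in $\Omega$, then
\[ X \sim RIG(s, a, b) \quad \text{if and only if} \quad m_0 X^{-1} \sim RIG(-s^{*}, m_0 b, m_0 a). \]
Proof. Let $f$ be a bounded continuous function. Then
\[ \mathbb{E}(f(m_0 X^{-1})) = \int_\Omega f(m_0 x^{-1})\, \frac{1}{\mathcal{K}(s, a, b)}\, e^{-\frac{1}{2}(\langle a, x \rangle + \langle b, x^{-1} \rangle)}\, \Delta_{s - \frac{n}{r}}(x)\,dx. \]
Setting $y = m_0 x^{-1}$, we have that $dx = \Delta^{-\frac{2n}{r}}(y)\,dy$ and, from Lemma 11.1, we deduce that
\[ \mathbb{E}(f(m_0 X^{-1})) = \int_\Omega f(y)\, \frac{1}{\mathcal{K}(-s^{*}, m_0 b, m_0 a)}\, e^{-\frac{1}{2}(\langle m_0 a, y^{-1} \rangle + \langle m_0 b, y \rangle)}\, \Delta_{-s^{*} - \frac{n}{r}}(y)\,dy. \]
Hence the distribution of $m_0 X^{-1}$ has the density
\[ \frac{1}{\mathcal{K}(-s^{*}, m_0 b, m_0 a)}\, e^{-\frac{1}{2}(\langle m_0 a, y^{-1} \rangle + \langle m_0 b, y \rangle)}\, \Delta_{-s^{*} - \frac{n}{r}}(y), \]
which is that of the $RIG(-s^{*}, m_0 b, m_0 a)$ distribution.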
11.2 Connection with the Riesz probability distribution
Butler [18] has shown that the generalized inverse Gaussian distribution defined by (11.2) is connected to the Wishart distribution on symmetric matrices. It appears, in fact, as a conditional distribution of components of a Wishart random matrix. We next show that the Riesz inverse Gaussian distribution also appears as a conditional distribution of components of a Riesz random variable. Butler's result may be obtained as a corollary of this more general result.
For a fixed $k$ in $\{1, \dots, r - 1\}$, we consider the Peirce decomposition $x = x_1 + x_{12} + x_0$, with respect to the idempotent $c = c_{r-k} = c_1 + \dots + c_{r-k}$, of an element $x$ of $V$.
Theorem 11.6. Let $X$ be a random variable in $V$ with distribution $R_r(s, \sigma)$. If $X_1, X_{12}, X_0$ and $\sigma_1, \sigma_{12}, \sigma_0$ are the Peirce components with respect to $c$ of $X$ and $\sigma$, respectively, then the conditional distribution of $X_1$ given $X_{12}$ is the Riesz inverse Gaussian distribution with parameters $s_{r-k} - k\frac{d}{2}$, $2\sigma_1$, and $2P(x_{12})\sigma_0$.
Proof. From Theorem 3.12, we can deduce the density of $X_{12}$, which is equal to
\[ \frac{1}{\Gamma_{\Omega_c}(s_{r-k})\,\Delta^{(c)}_{s_{r-k}}(\eta_1^{-1})}\; \frac{1}{(2\pi)^{k(r-k)\frac{d}{2}}}\; \frac{1}{(\Delta^{(e-c)}(\sigma_0^{-1}))^{(r-k)\frac{d}{2}}} \times \int_{\Omega_c} e^{-\langle \eta_1, x_1 \rangle}\, e^{-\langle \sigma_0, P(x_{12} + \{\sigma_0^{-1}\sigma_{12}x_1\})x_1^{-1} \rangle}\, \Delta^{(c)}_{s_{r-k} - \frac{n_1}{r-k}}(x_1)\, (\Delta^{(c)}(x_1))^{-k\frac{d}{2}}\,dx_1. \]
Since from Proposition 1.16 we have that
\[ \langle \sigma_0, P(x_{12} + \{\sigma_0^{-1}\sigma_{12}x_1\})x_1^{-1} \rangle = \langle \sigma_0, P(x_{12})x_1^{-1} \rangle + \langle x_{12}, \sigma_{12} \rangle + \langle x_1, P(\sigma_{12})\sigma_0^{-1} \rangle, \]
we obtain that the density of $X_{12}$ is given by
\[
\begin{aligned}
&\frac{1}{\Gamma_{\Omega_c}(s_{r-k})\,\Delta^{(c)}_{s_{r-k}}(\eta_1^{-1})}\; \frac{1}{(2\pi)^{k(r-k)\frac{d}{2}}}\; \frac{e^{-\langle \sigma_{12}, x_{12} \rangle}}{(\Delta^{(e-c)}(\sigma_0^{-1}))^{(r-k)\frac{d}{2}}} \times \int_{\Omega_c} e^{-\frac{1}{2}(\langle 2\sigma_1, x_1 \rangle + \langle 2P(x_{12})\sigma_0, x_1^{-1} \rangle)}\, \Delta^{(c)}_{(s_{r-k} - k\frac{d}{2}) - \frac{n_1}{r-k}}(x_1)\,dx_1 \\
&\quad = \mathcal{K}\Big(s_{r-k} - k\frac{d}{2},\, 2\sigma_1,\, 2P(x_{12})\sigma_0\Big)\, \frac{e^{-\langle \sigma_{12}, x_{12} \rangle}}{\Gamma_{\Omega_c}(s_{r-k})\,\Delta^{(c)}_{s_{r-k}}(\eta_1^{-1})\,(2\pi)^{k(r-k)\frac{d}{2}}\,(\Delta^{(e-c)}(\sigma_0^{-1}))^{(r-k)\frac{d}{2}}}.
\end{aligned}
\]
We finally conclude that the conditional distribution of $X_1$ given $X_{12}$ is a Riesz inverse Gaussian distribution with parameters $s_{r-k} - k\frac{d}{2}$, $2\sigma_1$, and $2P(x_{12})\sigma_0$.
Bibliography
[1] Abramowitz, M. and Stegun, I., Handbook of mathematical functions, Dover, New York, 1972. (p. 261)
[2] Anderson, T. W., An introduction to multivariate statistical analysis, 2nd ed., Wiley, New York, 1984. (p. V)
[3] Andersson, S., Invariant normal models, Ann. Stat. 3 (1975), 132–154. (p. VI)
[4] Andersson, S. A. and Klein, T., On Riesz and Wishart distributions associated with decomposable undirected graphs, J. Multivar. Anal. 101(4) (2010), 789–810. (p. VII)
[5] Andersson, S. A. and Wojnar, G., The Wishart distribution on homogeneous cones, J. Theor. Probab. 17 (2004), 781–818. (p. VII)
[6] Asci, C., Letac, G. and Piccioni, M., Beta-hypergeometric distributions and random continued fractions, Stat. Probab. Lett. 78 (2008), 1711–1721. (p. 229)
[7] Barndorff-Nielsen, O. E. and Halgreen, C., Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions, Z. Wahrscheinlichkeitstheor. Verw. Geb. 38 (1977), 309–311. (p. 261)
[8] Barndorff-Nielsen, O. E., Blæsild, P., Jensen, J. L. and Jørgensen, B., Exponential transformation models, Proc. R. Soc. Lond. A 379 (1982), 41–65. (p. 261)
[9] Bass, J., Cours de Mathématiques, Tome 2, Masson et cie, 1961. (p. 128, 135)
[10] Bernadac, E., Random continued fractions and inverse Gaussian distribution on a symmetric cone, J. Theor. Probab. 8 (1995), 221–256. (p. 213, 216, 217, 225)
[11] Ben Farah, M. and Hassairi, A., Characterization of the Dirichlet distribution on symmetric matrices, Stat. Probab. Lett. 77 (2007), 357–364. (p. 239)
[12] Ben Farah, M. and Hassairi, A., On the Dirichlet distributions on symmetric matrices, J. Stat. Plan. Inference 139(8) (2009), 2559–2570. (p. 239)
[13] Bobecka, K. and Wesolowski, J., The Lukacs–Olkin–Rubin theorem without invariance of the "quotient", Stud. Math. 152 (2002), 147–160. (p. 183)
[14] Boutouria, I., Characterization of the Wishart distributions on homogeneous cones, C. R. Math. Acad. Sci. Paris 341(1) (2005), 43–48. (p. VII)
[15] Boutouria, I., Characterization of the Wishart distribution on homogeneous cones in the Bobecka and Wesolowski way, Commun. Stat., Theory Methods 38(13–15) (2009), 2552–2566. (p. VII)
[16] Boutouria, I., Hassairi, A. and Massam, H., Extension of the Olkin and Rubin characterization to the Wishart distribution on homogeneous cones, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 14(4) (2011), 591–611. (p. VII)
[17] Bryc, W., Compound real Wishart and q-Wishart matrices, Int. Math. Res. Not. (2008), rnn079. (p. V)
[18] Butler, R. W., Generalized inverse Gaussian distributions and their Wishart connections, Scand. J. Stat. 25 (1998), 69–75. (p. 264)
[19] Casalis, M. and Letac, G., The Lukacs–Olkin–Rubin characterization of the Wishart distributions on symmetric cone, Ann. Stat. 24 (1996), 763–786. (p. 145, 176, 183)
[20] Chamayou, J. F. and Letac, G., Transient random walk on stochastic matrices with Dirichlet distribution, Ann. Probab. 22 (1994), 424–430. (p. 253, 254, 256)
[21] Chhikara, R., The inverse Gaussian distribution: theory, methodology, and applications, CRC Press, 1988. (p. 261)
[22] Connor, R. J. and Mosimann, J. E., Concepts of independence for proportions with a generalization of the Dirichlet distribution, J. Am. Stat. Assoc. 64 (1969), 194–206. (p. 239, 258)
[23] Darroch, J. N. and Ratcliff, D., A characterization of the Dirichlet distribution, J. Am. Stat. Assoc. 66 (1971), 641–643. (p. 246)
https://doi.org/10.1515/9783110713374-012
[24] Davis, A. W., Invariant polynomials with two matrix arguments extending the zonal polynomials: applications to multivariate distribution theory, Ann. Inst. Stat. Math. 31 (1979), 465–485. (p. 55)
[25] Díaz-García, J. A., Integral properties of zonal spherical functions, hypergeometric functions and invariant polynomials, JIRSS 13(1) (2014), 83–124. (p. 55)
[26] Díaz-García, J. A., Gutierrez-Jaimez, R., et al., On Wishart probability distribution: some extensions, Linear Algebra Appl. 435(6) (2011), 1296–1310. (p. V)
[27] Díaz-García, J. A., Matric variate Pearson type II-Riesz distribution, J. King Saud Univ., Sci. 28(4) (2016), 359–367. (p. VII)
[28] Ding, H., Gross, K. I. and Richards, D. St., Ramanujan's master theorem for symmetric cones, Pac. J. Math. 175(2) (1996), 447–490. (p. 56)
[29] Draper, N. R. and Smith, H., Applied regression analysis, 3rd ed., Wiley, New York, 1998. (p. 149)
[30] Fama, E., The behavior of stock market prices, J. Bus. 38 (1965), 34–105. (p. 125)
[31] Faraut, J. and Korányi, A., Analysis on symmetric cones, Oxford Univ. Press, 1994. (p. 1, 3, 5, 7, 8, 9, 11, 13, 14, 20, 25, 31, 32, 40, 55, 56, 57, 59, 62, 67, 69, 70, 71, 147, 161, 163, 164, 168, 170, 203, 230, 243, 262)
[32] Feller, W., An introduction to probability theory and its applications 2, J. Wiley and Sons, New York, 1971. (p. 246)
[33] Fosam, E. B. and Shanbhag, D. N., An extended Laha–Lukacs characterization result based on a regression property, J. Stat. Plan. Inference 63 (1997), 173–186. (p. 149)
[34] Franklin, J. N., Matrix theory, Dover Publications, 1993. (p. 9)
[35] Garrigós, G., Generalized Hardy spaces on tube domains over cones, Colloq. Math. 90(2) (2001), 213–251. (p. 183)
[36] Gindikin, S. G., Analysis on homogeneous domains, Russ. Math. Surv. 29 (1964), 1–89. (p. V, 72)
[37] Gowda, M. S. and Tao, J., Some inequalities involving determinants, eigenvalues, and Schur complements in Euclidean Jordan algebras, Positivity 15(3) (2011), 381–399. (p. 9)
[38] Graczyk, P. and Lœb, J. J., Bochner and Schoenberg theorems on symmetric spaces in the complex case, Bull. Soc. Math. Fr. 122(4) (1994), 571–590. (p. 56)
[39] Gupta, A. K. and Nagar, D. K., Matrix variate distributions, Chapman and Hall/CRC, Boca Raton, FL, 2000. (p. V, 162, 239)
[40] Gupta, R. D. and Richards, D. S. P., Moment properties of the multivariate Dirichlet distributions, J. Multivar. Anal. 82(1) (2002), 240–262. (p. 253)
[41] Hassairi, A. and Lajmi, S., Riesz exponential families on symmetric cones, J. Theor. Probab. 14 (2001), 927–948. (p. V, 67, 93)
[42] Hassairi, A. and Lajmi, S., Singular Riesz measures on symmetric cones, Adv. Oper. Theory 3(2) (2018), 12–25. (p. 72)
[43] Hassairi, A., Lajmi, S. and Zine, R., A characterization of the Riesz distribution, J. Theor. Probab. 21 (2008), 773–790. (p. 183)
[44] Hassairi, A., Lajmi, S. and Zine, R., Some new properties of the Riesz probability distribution, Math. Methods Appl. Sci. 40 (2017), 5946–5958. (p. 85)
[45] Hassairi, A. and Lajmi, S., Classification of Riesz exponential families on a symmetric cone by invariance properties, J. Theor. Probab. 17(3) (2004), 521–539. (p. 114)
[46] Hassairi, A., Lajmi, S. and Zine, R., Beta–Riesz distributions on symmetric cones, J. Stat. Plan. Inference 133 (2005), 387–404. (p. VIII, 165)
[47] Hassairi, A. and Louati, M., Multivariate stable exponential families and Tweedie scale, J. Stat. Plan. Inference 139 (2009), 143–158. (p. 125)
[48] Hassairi, A. and Regaig, O., Characterizations of the beta distribution on symmetric matrices, J. Multivar. Anal. 100(8) (2009), 1682–1690. (p. 166, 197)
[49] Hassairi, A., Masmoudi, M. A. and Regaig, O., Beta-hypergeometric probability distribution on symmetric matrices, Adv. Appl. Math. 89 (2017), 184–199. (p. 213)
[50] Hassairi, A., Masmoudi, M. and Zine, R., On the projections of the Riesz Dirichlet distribution, J. Math. Anal. Appl. 349 (2009), 367–373. (p. VIII, 239)
[51] Hassairi, A., Lajmi, S. and Zine, R., Riesz inverse Gaussian distribution, J. Stat. Plan. Inference 137 (2007), 2024–2033. (p. VIII)
[52] Helgason, S., Groups and geometric analysis, Academic Press, 1984. (p. 57)
[53] Heller, B., Special functions and characterizations of probability distributions by zero regression properties, J. Multivar. Anal. 13 (1984), 473–487. (p. 154)
[54] Herz, C. S., Bessel functions of matrix argument, Ann. Math. 61 (1955), 474–523. (p. 261)
[55] Hirzebruch, U., Der min-max-satz von E. Fischer für formal-reelle Jordan-algebren, Math. Ann. 186 (1970), 65–69. (p. 9)
[56] Hsu, P. L., On the distributions of the roots of certain determinantal equations, Ann. Eugen. 9 (1939), 256–258. (p. 165)
[57] Ishi, H., Positive Riesz distributions on homogeneous cones, J. Math. Soc. Jpn. 52(1) (2000), 161–186. (p. 72)
[58] Ishi, H., Homogeneous cones and their applications to statistics in modern methods of multivariate statistics, Travaux en Cours 82, Hermann Éditeurs, Paris, 2014. (p. 72)
[59] Jacobson, N., Structure theory for a class of Jordan algebras, Proc. Natl. Acad. Sci. USA 55(2) (1966), 243–251. (p. 1)
[60] James, I. R., Products of independent beta variables with applications to Connor and Mosimann’s generalized Dirichlet distribution, J. Am. Stat. Assoc. 67 (1972), 910–912. (p. 203)
[61] Jensen, S. T., Covariance hypotheses which are linear in both the covariance and the inverse covariance, Ann. Stat. 16 (1988), 302–322. (p. VI)
[62] Johnson, N. L. and Kotz, S., Distributions in statistics: continuous multivariate distributions, Wiley, New York, 1972. (p. V)
[63] Jordan, P., von Neumann, J. and Wigner, E. P., On an algebraic generalization of the quantum mechanical formalism, Ann. Math. (2) 35 (1934), 29–64. (p. 1, 11)
[64] Jørgensen, B., Statistical properties of the generalized inverse Gaussian distribution, Lecture notes in statistics 9, Springer-Verlag, New York, 1982. (p. 261)
[65] Jørgensen, B., Exponential dispersion models, J. R. Stat. Soc. 49(2) (1987), 127–162. (p. 125)
[66] Kacha, A. and Raissouli, M., Convergence of matrix continued fractions, Linear Algebra Appl. 320 (2000), 115–129. (p. 214)
[67] Kagan, A. M., Linnik, J. V. and Rao, C. R., Characterization problems of mathematical statistics, Wiley, New York, 1973. (p. 149)
[68] Kaneyuki, S., The Sylvester’s law of inertia for Jordan algebras, Proc. Jpn. Acad., Ser. A, Math. Sci. 64(8) (1988), 311–313. (p. 63)
[69] Kollo, T. and Von Rosen, D., Advanced multivariate statistics with matrices, Springer Science and Business Media, 2006. (p. V)
[70] Kołodziejek, B., Characterization of beta distribution on symmetric cones, J. Multivar. Anal. 143 (2016), 414–423. (p. 167, 195)
[71] Kołodziejek, B., The Lukacs–Olkin–Rubin theorem on symmetric cones without invariance of the “quotient”, J. Theor. Probab. 29(2) (2016), 550–568. (p. 183, 186)
[72] Lassalle, M., Algèbre de Jordan et ensemble de Wallach, Invent. Math. 89 (1987), 375–393. (p. 62)
[73] Laha, R. G. and Lukacs, E., On a problem connected with quadratic regression, Biometrika 47 (1960), 335–343. (p. 149)
[74] Lajmi, S., Les familles exponentielles de Riesz sur les cônes symétriques, PhD thesis, Faculty of Sciences, Sfax University, 1998. (p. 72)
[75] Lajmi, S., Scalar statistics which have constant regression on the mean of a Riesz distribution, Commun. Stat., Theory Methods 35(11) (2006), 2075–2082. (p. 154)
[76] Letac, G., Lectures on natural exponential families and their variance functions 50, IMPA, Rio de Janeiro, 1992. (p. 93)
[77] Letac, G., A characterization of the Wishart exponential families by an invariance property, J. Theor. Probab. 1 (1989), 71–86. (p. 103, 125)
[78] Letac, G. and Massam, H., Quadratic and inverse regressions for Wishart distributions, Ann. Stat. 26 (1998), 573–595. (p. 145, 151)
[79] Letac, G. and Massam, H., The normal quasi-Wishart distribution, Contemp. Math. 287 (2001), 231–240. (p. V)
[80] Letac, G. and Massam, H., Wishart distributions for decomposable graphs, Ann. Stat. 35(3) (2007), 1278–1323. (p. VII)
[81] Letac, G., Massam, H. and Richards, D., An expectation formula for the multivariate Dirichlet distribution, J. Multivar. Anal. 77 (2001), 117–137. (p. 253)
[82] Letac, G., La réciprocité des familles exponentielles naturelles sur ℝ, C. R. Math. Acad. Sci. Paris, Sér. I 303 (1986), 61–64. (p. 125)
[83] Loos, O., Jordan pairs, Lect. notes in math. 460, Springer-Verlag, 1975. (p. 23)
[84] Lukacs, E., A characterization of the gamma distribution, Ann. Math. Stat. 26 (1955), 319–324. (p. 182)
[85] Mandelbrot, B., The variation of certain speculative prices, J. Bus. 36 (1963), 394–419. (p. 125)
[86] McCrimmon, K., A taste of Jordan algebras, Springer-Verlag, New York Berlin Heidelberg, 2004. (p. 1)
[87] Mardia, K. V., Kent, J. T. and Bibby, J. M., Multivariate analysis, Academic Press, London, 1979. (p. V)
[88] Massam, H. and Neher, E., On transformation and determinants of Wishart variables on symmetric cones, J. Theor. Probab. 10 (1997), 867–902. (p. 6, 68)
[89] Mauldon, J. G., A generalization of the beta-distribution, Ann. Math. Stat. 30 (1959), 509–520. (p. 253)
[90] Merikoski, J. K. and Kumar, R., Inequalities for spreads of matrix sums and products, Appl. Math. E-Notes 4 (2004), 150–159. (p. 9)
[91] Muirhead, R. J., Aspects of multivariate statistical theory, Wiley, New York, 1982. (p. V, 85, 197)
[92] Nagar, D. K. and Gupta, A. K., Matrix variate Kummer–Beta distribution, J. Aust. Math. Soc. 73 (2002), 11–26. (p. 55, 56)
[93] Nagar, D. K. and Gupta, A. K., An identity involving invariant polynomials of matrix arguments, Appl. Math. Lett. 18 (2005), 239–243. (p. 56)
[94] Nolan, J. P., Stable distributions: models for heavy tailed data, Springer, New York, 2016. (p. 125, 126, 127)
[95] Olkin, I., The 70th anniversary of the distribution of random matrices: a survey, Linear Algebra Appl. 354 (2002), 231–243. (p. V)
[96] Olkin, I. and Rubin, H., A characterization of the Wishart distribution, Ann. Math. Stat. 33 (1962), 1272–1280. (p. 182)
[97] Rao, C. R., Linear statistical inference and its applications, Wiley, 1965. (p. 180)
[98] Rao, C. R. and Rao, M. B., Matrix algebra and its applications to statistics and econometrics, World Scientific Publishing Co. Pte. Ltd., 1998. (p. V)
[99] Roux, J. J. J., On generalized multivariate distributions, S. Afr. Stat. J. 5 (1971), 91–100. (p. 162)
[100] Roverato, A., Hyper inverse Wishart distribution for non-decomposable graphs and its application to Bayesian inference for Gaussian graphical models, Scand. J. Stat. 29(3) (2002), 391–411. (p. VII)
[101] Samorodnitsky, G. and Taqqu, M. S., Stable non-Gaussian random processes: stochastic models with infinite variance, Chapman and Hall, New York, London, 1994. (p. 126)
[102] Samuelson, P. A., Efficient portfolio selection for Pareto–Lévy investments, J. Financ. Quant. Anal. 2 (1967), 107–117. (p. 125)
[103] Satake, I., On zeta functions associated with self dual homogeneous cones, in Reports on symposium of geometry and automorphic functions, Tohoku Univ., Sendai, pp. 145–168, 1988. (p. 63)
[104] Seshadri, V., The inverse Gaussian distribution, Clarendon Press, Oxford, 1993. (p. 125, 141, 261)
[105] Seshadri, V. and Wesolowski, J., Constancy of regressions for beta distributions, Sankhya 65 (2003), 284–291. (p. 205)
[106] Terkelsen, F., Some minimax theorems, Math. Scand. 31 (1972), 405–413. (p. 226)
[107] Troskie, C. G., Noncentral multivariate Dirichlet distributions, S. Afr. Stat. J. 1 (1967), 21–32. (p. 239)
[108] Troskie, C. G., The distributions of some test criteria depending on multivariate Dirichlet distributions, S. Afr. Stat. J. 6 (1972), 151–163. (p. 239)
[109] Stuck, B. W. and Kleiner, B., A statistical analysis of telephone noise, Bell Syst. Tech. J. 53 (1974), 1263–1320. (p. 125)
[110] Tweedie, M. C. K., Inverse statistical variate, Nature 155 (1945), 453. (p. 261)
[111] Tweedie, M. C. K., An index which distinguishes between some important exponential families in statistics: applications and new directions, in Proceedings of the Indian statistical institute golden jubilee international conference, pp. 579–604, 1984. (p. 125)
[112] Von Rosen, D., Bilinear regression analysis, Lecture notes in statistics, Springer, 2018. (p. 149)
[113] Wishart, J., The generalised product moment distribution in samples from a normal multivariate population, Biometrika 20A(1–2) (1928), 32–52. (p. V)
[114] Wong, C. S. and Wang, T., Laplace–Wishart distributions and Cochran theorems, Sankhya A (1995), 342–359. (p. V)
[115] Zhaoa, H.-xi., Matrix-valued continued fractions, J. Approx. Theory 120 (2003), 136–152. (p. 213)
[116] Zine, R., On the matrix-variate beta distribution, Commun. Stat., Theory Methods 41 (2012), 1569–1582. (p. 173)
[117] Zolotarev, V. M., One-dimensional stable distributions, Translations of mathematical monographs 65, American Mathematical Society, 1986. (p. 125, 126)
Index

absolutely continuous 58, 79, 226
Bessel function 263
beta distribution 164
beta function 159
beta–hypergeometric 213
beta–Riesz 165
beta–Wishart 165, 193
character 183
Cholesky decomposition 25
constancy regression 150, 151, 204
continued fraction 213
covariance 67, 68, 94, 143
cumulant function 93
determinant 3
division algorithm 50
domain of the means 94
eigenspace 3
eigenvalue 3
Euclidean Jordan algebra 1
expectation 58, 143, 176
Frobenius transformation 16
functional equation 184
gamma distribution 80
gamma function 53
Gauss hypergeometric function 164, 203
Gaussian 67
generalized beta 234
generalized power 31
group of automorphisms 14
Harish-Chandra’s c-function 57
homogeneous 8
hypergeometric functions 159
idempotent 3
inverse Gaussian distribution 261
Jordan algebra 1
Jordan constant 5
Jordan frame 3
Jordan product 1
Laplace transform 67
Lévy measure 69, 141
Mellin transform 57, 193
natural exponential family 93
orthogonal group 14
Peirce components 37
Peirce decomposition 3, 5
Pochhammer symbol 54
primitive idempotent 3
principal minor 34
quadratic representation 2
rank 3
Riesz integral 69
Riesz inverse Gaussian 263
Riesz measure 69
Riesz probability distribution 79
Riesz–Dirichlet 241
salient 8
self-dual 8
spectral decomposition 3
spherical Fourier transform 56
spherical function 57
spherical polynomial 55
stable probability distribution 125
symmetric cone 8
trace 3
triangular group 24, 105
Tweedie scale 134
variance function 94
Wallach set 70
Weyl’s inequalities 9
Wishart probability distribution 80
Wishart–Dirichlet 246
zonal polynomials 55
Index of notations

https://doi.org/10.1515/9783110713374-013

$\simeq$  isomorphic to
$\sim$  is distributed as
$\langle\,,\rangle$  scalar product
$\times$  direct product
$\oplus$  direct sum
$\otimes$  tensor product
$\setminus$  set difference
$\approx$  equivalent to
a. s.  almost surely
$\mathrm{Aut}(V)$  group of automorphisms of the algebra $V$
${}^{t}a$  transpose of $a$
$a^{*}$  adjoint of $a$
$(a)_{m}$  Pochhammer symbol
$B_{\Omega}(\cdot,\cdot)$  beta function of the symmetric cone $\Omega$
$\mathbb{C}$  field of complex numbers
$\mathcal{C}^{\infty}$  space of infinitely differentiable functions
$\mathrm{Cone}(A)$  cone generated by $A$
$\mathrm{CCH}(A)$  the closed convex hull of $A$
$c(\alpha)$  Harish-Chandra’s c-function
$d$  Jordan constant
$\mathrm{Det}(\cdot)$  determinant of an endomorphism
$D_{(s_1,\ldots,s_q)}$  Riesz–Dirichlet distribution
$e$  identity element
$\mathbb{E}(\cdot)$  expectation
$\mathbb{E}(\cdot\,|\,\cdot)$  conditional expectation
$\hat{f}$  spherical Fourier transform
$f_{X}$  probability density function of $X$
$F(\mu)$  natural exponential family generated by $\mu$
$G(\Omega)$  group of linear automorphisms of the algebra $V$
$G$  connected component of the identity in the group $G(\Omega)$
$GL(\mathbb{R}^{r})$  linear group of non-singular $r \times r$ real matrices
$GL_{+}(\mathbb{R}^{r})$  linear group of $r \times r$ real matrices with positive determinant
$GL(V)$  linear group of linear automorphisms of the algebra $V$
$\mathrm{Herm}(r,\mathbb{C})$  space of $(r, r)$-Hermitian matrices
$\mathrm{Id}$  identity mapping
$J$  set of primitive idempotents in the algebra $V$
$K$  orthogonal group
$k_{\mu}$  cumulant function of the measure $\mu$
$L_{X}$  Laplace transform of a random variable $X$
$L_{\mu}$  Laplace transform of the measure $\mu$
$M_{F}$  domain of the means
$M_{X}(\cdot)$  Mellin transform of $X$
$n$  dimension of $V$
$N(m,\varrho)$  Gaussian distribution
${}_{p}F_{q}$  hypergeometric function
$P(x)$  quadratic representation
$\mathbb{R}$  field of real numbers
$r$  rank of the algebra
$R_{s}$  Riesz measure
$R(s,\sigma)$  Riesz probability distribution
$\mathrm{RIG}(s,a,b)$  Riesz inverse Gaussian distribution
$\mathrm{Sym}(r,\mathbb{R})$  space of symmetric real $(r, r)$-matrices
$\mathcal{S}(V)$  Schwartz space on the algebra $V$
$T$  triangular group
$\mathrm{tr}(\cdot)$  trace
$V$  Jordan algebra
$\mathrm{var}(X)$  covariance operator of $X$
$V_{F}(\cdot)$  variance function
$W_{r}(p,\sigma)$  Wishart probability distribution
$x^{-1}$  inverse of $x$ in $V$
$\bar{x}$  conjugate of $x$
$x_{1}, x_{12}, x_{0}$  Peirce components of $x$
$1_{A}$  indicator function: $1_{A}(x) = 1$ when $x$ is in $A$, and $1_{A}(x) = 0$ when $x$ is not in $A$
$\beta^{(1)}_{s,s'}$  beta–Riesz probability distribution of the first kind
$\beta^{(2)}_{s,s'}$  beta–Riesz probability distribution of the second kind
$\beta(a,b,c,\delta)$  generalized beta probability distribution
$\Gamma_{\Omega}$  gamma function
$\gamma_{p,\sigma}$  gamma distribution
$\Delta(\cdot)$  determinant
$\Delta_{k}(\cdot)$  principal minor of order $k$
$\Delta_{s}(\cdot)$  generalized power
$\Lambda$  Wallach set
$\mu_{a,a',b}$  beta–Gauss hypergeometric distribution
$\pi(\cdot)$  multiplication algorithm
$\pi^{-1}(\cdot)$  division algorithm
$\tau_{c}(z)$  Frobenius transformation
$\phi_{m}(x)$  spherical polynomial
$\Phi_{\lambda}(x)$  spherical function
$\Omega$  symmetric cone