de Gruyter Textbook Pestman / Alberink · Mathematical Statistics
Wiebe R. Pestman · Ivo B. Alberink
Mathematical Statistics Problems and Detailed Solutions
Walter de Gruyter Berlin · New York 1998
Authors: Wiebe R. Pestman, Ivo B. Alberink, University of Nijmegen, NL-6525 ED Nijmegen, The Netherlands
1991 Mathematics Subject Classification: 62-01 Keywords: Estimation theory, hypothesis testing, regression analysis, non-parametrics, stochastic analysis, vectorial (multivariate) statistics
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data

Pestman, Wiebe R., 1954-
Mathematical statistics : problems and detailed solutions / Wiebe R. Pestman, Ivo B. Alberink.
p. cm. - (De Gruyter textbook)
ISBN 3-11-015359-9 (hardcover : alk. paper). - ISBN 3-11-015358-0 (pbk. : alk. paper)
1. Mathematical statistics - Problems, exercises, etc. I. Alberink, Ivo B., 1971- . II. Title. III. Series.
QA276.2.P47 1998 519.5'076-dc21 98-20084 CIP

Die Deutsche Bibliothek - Cataloging-in-Publication Data

Pestman, Wiebe R.:
Mathematical statistics - problems and detailed solutions / Wiebe R. Pestman ; Ivo B. Alberink. - Berlin ; New York : de Gruyter, 1998
(De Gruyter textbook)
Supplement to: Pestman, Wiebe R.: Mathematical statistics - an introduction
ISBN 3-11-015358-0 (paperback)
ISBN 3-11-015359-9 (hardcover)

© Copyright 1998 by Walter de Gruyter GmbH & Co., D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Printed in Germany. Typesetting using the authors' TeX files: I. Zimmermann, Freiburg. Printing and binding: WB-Druck GmbH & Co., Rieden/Allgäu.
Preface
The book that is now in front of you presents detailed solutions to some 260 exercises in mathematical statistics. It is a companion volume and solutions manual for the textbook Mathematical Statistics - An Introduction (Walter de Gruyter, Berlin - New York, 1998) by W. R. Pestman. Though this is an exercise book, we have tried to keep it as self-contained as humanly possible. Each chapter starts with a summary of the corresponding chapter in the textbook, followed by formulations of and solutions to the corresponding exercises. In this way the book is well suited for self-study. If you find any mistakes or typing errors, please let us know. Of course, suggestions for nice, useful or interesting additional problems are welcome too. Thanks go to Alessandro di Bucchianico and Mark van de Wiel for computing most of the statistical tables.
Nijmegen, March 1998
Ivo B. Alberink · Wiebe R. Pestman
Table of contents
Chapter 1. Probability theory
1.1 Summary of Chapter I
1.1.1 Probability spaces
1.1.2 Stochastic variables
1.1.3 Product measures and statistical independence
1.1.4 Functions of stochastic vectors
1.1.5 Expectation, variance and covariance of stochastic variables
1.1.6 Statistical independence of normally distributed variables
1.1.7 Distribution functions and probability distributions
1.1.8 Moments, moment generating functions and characteristic functions
1.1.9 The central limit theorem
1.1.10 Transformation of probability densities
1.2 Exercises to Chapter I

Chapter 2. Statistics and their probability distributions, estimation theory
2.1 Summary of Chapter II
2.1.1 Introduction
2.1.2 The gamma distribution and the χ²-distribution
2.1.3 The t-distribution
2.1.4 Statistics to measure differences in mean
2.1.5 The F-distribution
2.1.6 The beta distribution
2.1.7 Populations which are not normally distributed
2.1.8 Bayesian estimation
2.1.9 Estimation theory in a more general framework
2.1.10 Maximum likelihood estimation, sufficiency
2.2 Exercises to Chapter II

Chapter 3. Hypothesis testing
3.1 Summary of Chapter III
3.1.1 The Neyman-Pearson theory
3.1.2 Hypothesis tests concerning normally distributed populations
3.1.3 The χ²-test on goodness of fit
3.1.4 The χ²-test on statistical independence
3.2 Exercises to Chapter III

Chapter 4. Simple regression analysis
4.1 Summary of Chapter IV
4.1.1 The method of least squares
4.1.2 Construction of an unbiased estimator of σ²
4.1.3 Normal regression analysis
4.1.4 Pearson's product-moment correlation coefficient
4.1.5 The sum of squares of errors as a measure of the amount of linear structure
4.2 Exercises to Chapter IV

Chapter 5. Normal analysis of variance
5.1 Summary of Chapter V
5.1.1 One-way analysis of variance
5.1.2 Two-way analysis of variance
5.2 Exercises to Chapter V

Chapter 6. Non-parametric methods
6.1 Summary of Chapter VI
6.1.1 The sign test, Wilcoxon's signed-rank test
6.1.2 Wilcoxon's rank-sum test
6.1.3 The runs test
6.1.4 Rank correlation tests
6.1.5 The Kruskal-Wallis test
6.1.6 Friedman's test
6.2 Exercises to Chapter VI

Chapter 7. Stochastic analysis and its applications in statistics
7.1 Summary of Chapter VII
7.1.1 The empirical distribution function associated with a sample
7.1.2 Convergence of stochastic variables
7.1.3 The Glivenko-Cantelli theorem
7.1.4 The Kolmogorov-Smirnov test statistic
7.1.5 Metrics on the set of distribution functions
7.1.6 Smoothing techniques
7.1.7 Robustness of statistics
7.1.8 Trimmed means, the median and their robustness
7.1.9 Statistical functionals
7.1.10 The von Mises derivative; influence functions
7.1.11 Bootstrap methods
7.1.12 Estimation of densities by means of kernel densities
7.1.13 Estimation of densities by means of histograms
7.2 Exercises to Chapter VII

Chapter 8. Vectorial statistics
8.1 Summary of Chapter VIII
8.1.1 Linear algebra
8.1.2 The expectation vector and the covariance operator of stochastic vectors
8.1.3 Vectorial samples
8.1.4 The vectorial normal distribution
8.1.5 Conditional probability distributions that emanate from Gaussian ones
8.1.6 Vectorial samples from Gaussian distributed populations
8.1.7 Vectorial versions of the fundamental limit theorems
8.1.8 Normal correlation analysis
8.1.9 Multiple regression analysis
8.1.10 The multiple correlation coefficient
8.2 Exercises to Chapter VIII

Statistical tables
Chapter 1 Probability Theory
1.1 Summary of Chapter I

1.1.1 Probability spaces (1.1)
We are dealing with experiments in which randomness plays a role. The set of all possible outcomes of the experiment is denoted by Ω, the sample space. By definition an event is a subset of the sample space. A σ-algebra of subsets of Ω is understood to be a collection 𝔄 of subsets enjoying the following properties:

(i) Ω ∈ 𝔄,

(ii) if A ∈ 𝔄 then Aᶜ ∈ 𝔄,

(iii) if A₁, A₂, ... ∈ 𝔄 then ⋃_{i=1}^∞ Aᵢ ∈ 𝔄.

A measure on such a σ-algebra is a map μ : 𝔄 → [0, +∞] that has the following properties:

(i) for every countable family A₁, A₂, ... of mutually disjoint elements of 𝔄 one has μ(⋃_{i=1}^∞ Aᵢ) = Σ_{i=1}^∞ μ(Aᵢ),

(ii) μ(∅) = 0.

The smallest σ-algebra on Rⁿ containing all open sets will be denoted by 𝔅ⁿ. The elements of 𝔅ⁿ are called Borel sets, and a measure on 𝔅ⁿ is said to be a Borel measure. A fundamental result in measure theory is the existence of a unique Borel measure λ on 𝔅ⁿ such that λ([0, 1]ⁿ) = 1 and λ(A + a) = λ(A) for all A ∈ 𝔅ⁿ and a ∈ Rⁿ: this measure is called the Lebesgue measure on Rⁿ. A function f : Rⁿ → Rᵐ is said to be a Borel function if for all open sets O ⊂ Rᵐ the set f⁻¹(O) is a Borel set in Rⁿ. Every continuous function is a Borel function. The rest of §1.1 is a summary of the elements of measure theory. Important concepts are that of an integral and that of a Borel measure having a density with respect to the Lebesgue measure.
A measure P on (Ω, 𝔄) is said to be a probability measure if P(Ω) = 1. A probability space is understood to be an ordered triplet (Ω, 𝔄, P), where Ω is a sample space, 𝔄 a σ-algebra of subsets of Ω and P a probability measure on Ω (we ought to say 'on 𝔄').
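To make the σ-algebra axioms above concrete, here is a small illustration of ours (not taken from the textbook): a brute-force check of properties (i)-(iii) for a candidate collection of subsets of a finite sample space Ω, where countable unions reduce to finite ones.

```python
from itertools import chain, combinations

def is_sigma_algebra(omega, collection):
    """Brute-force check of the sigma-algebra axioms on a finite sample space."""
    sets = set(collection)
    if omega not in sets:                      # (i)  Omega must belong to the collection
        return False
    for a in sets:                             # (ii) closed under complements
        if omega - a not in sets:
            return False
    for r in range(1, len(sets) + 1):          # (iii) closed under (finite) unions
        for family in combinations(sets, r):
            if frozenset(chain.from_iterable(family)) not in sets:
                return False
    return True

omega = frozenset({1, 2, 3, 4})
A = frozenset({1, 2})
print(is_sigma_algebra(omega, {frozenset(), omega}))                    # True (trivial sigma-algebra)
print(is_sigma_algebra(omega, {frozenset(), A, omega - A, omega}))      # True (generated by A)
print(is_sigma_algebra(omega, {frozenset(), A, omega}))                 # False: complement of A missing
```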
1.1.2 Stochastic variables (1.2)
Given a probability space (Ω, 𝔄, P), a function X : Ω → Rⁿ is said to be 𝔄-measurable if X⁻¹(A) ∈ 𝔄 for all A ∈ 𝔅ⁿ. In statistics and probability theory such a function is usually called a stochastic n-vector. In the case where n = 1 we shall be speaking about a stochastic variable. To every stochastic n-vector there is in a canonical way an associated Borel measure P_X on Rⁿ; it is defined by P_X(A) := P[X⁻¹(A)] for every A that is Borel in Rⁿ. P_X is called the probability distribution of X. If the range of X is countable, then X is said to be discretely distributed. Examples of discrete probability distributions are the binomial, the geometrical and the Poisson distribution. A stochastic n-vector X is said to enjoy an absolutely continuous distribution if there exists a Borel function f : Rⁿ → [0, +∞) such that P_X = fλ, that is to say P_X(A) = ∫_A f dλ for all A that are Borel in Rⁿ. Here the function f is called the probability density of X. In measure theory P_X is also called the image of P under X. If X₁, ..., Xₙ are real-valued stochastic variables, then we can form the stochastic n-vector X by setting X(ω) := (X₁(ω), ..., Xₙ(ω)) (ω ∈ Ω). We shall frequently write P_{X₁,...,Xₙ} instead of P_X.
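As a small numerical illustration of the image measure P_X (our own sketch, not from the book): on a finite probability space, P_X(A) is obtained by summing P over the outcomes ω with X(ω) ∈ A.

```python
from fractions import Fraction

# A fair die: Omega = {1,...,6} with P({omega}) = 1/6.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

X = lambda w: w % 3          # the stochastic variable X(omega) = omega mod 3

def P_X(A):
    """Probability distribution of X: P_X(A) = P[X^{-1}(A)]."""
    return sum(P[w] for w in omega if X(w) in A)

print(P_X({0}))       # 1/3  (outcomes 3 and 6)
print(P_X({1, 2}))    # 2/3
```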
1.1.3 Product measures and statistical independence (1.3)
A probability measure P on Rⁿ is called the product measure of the probability measures P₁, ..., Pₙ on R if for all A₁, ..., Aₙ ∈ 𝔅 one has:

P(A₁ × ··· × Aₙ) = P₁(A₁) ··· Pₙ(Aₙ).

We indicate this by writing P = P₁ ⊗ ··· ⊗ Pₙ. The stochastic variables X₁, X₂, ..., Xₙ are said to be statistically independent if P_{X₁,...,Xₙ} = P_{X₁} ⊗ ··· ⊗ P_{Xₙ}. Events B₁, ..., Bₙ are called independent if 1_{B₁}, ..., 1_{Bₙ} are independent variables. Statistical independence of stochastic vectors is defined in exactly the same way.
The tensor product of a couple of functions f₁, ..., fₙ : R → R is the function f : Rⁿ → R defined by f(x₁, ..., xₙ) := f₁(x₁) ··· fₙ(xₙ); we denote f = f₁ ⊗ ··· ⊗ fₙ. If X₁, ..., Xₙ are stochastic variables having densities f₁, ..., fₙ, then the system X₁, ..., Xₙ is statistically independent if and only if the stochastic n-vector (X₁, ..., Xₙ) has as its density the function f = f₁ ⊗ ··· ⊗ fₙ.
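The factorization that characterizes independence can be checked informally on simulated data. The following sketch is our own illustration: it compares the joint probability of a rectangle with the product of the two marginal probabilities for independent draws.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)           # X1 ~ N(0, 1)
x2 = rng.exponential(size=n)      # X2 ~ Exp(1), generated independently of X1

# Compare P(X1 <= a, X2 <= b) with P(X1 <= a) * P(X2 <= b).
a, b = 0.5, 1.0
joint = np.mean((x1 <= a) & (x2 <= b))
product = np.mean(x1 <= a) * np.mean(x2 <= b)
print(joint, product)             # the two numbers agree up to Monte Carlo error
```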
1.1.4 Functions of stochastic vectors (1.4)
If X is a stochastic m-vector and f : Rᵐ → Rᵖ a Borel function, then we can form the composition f(X), which is a stochastic p-vector. If X and Y are statistically independent stochastic vectors and f, g are Borel functions, then the stochastic vectors f(X) and g(Y) are automatically statistically independent. If φ : Rⁿ → R and f : Rᵐ → Rⁿ are Borel, then for every stochastic m-vector X we have (Theorem 1.4.3)

∫ φ dP_{f(X)} = ∫ φ(f(x)) dP_X(x).
1.1.5 Expectation value, variance and covariance of stochastic variables (1.5)
Given a stochastic m-vector X and a Borel function g : Rᵐ → R, and provided that ∫ |g| dP_X < +∞, the expectation value (or mean) of g(X) is defined by

E[g(X)] := ∫ g dP_X.

So for every stochastic variable X we have E(X) = ∫ x dP_X(x), provided ∫ |x| dP_X(x) < +∞. If X is discretely distributed with range W = {a₁, a₂, ...}, then we have

E[g(X)] = Σ_{a∈W} g(a) P(X = a),

provided that Σ_{a∈W} |g(a)| P(X = a) < +∞. If X enjoys an absolutely continuous distribution with density function f, then

E[g(X)] = ∫ g(x) f(x) dx,

provided that ∫ |g(x)| f(x) dx < +∞. The action of taking the expectation value of stochastic variables is a linear operation, that is to say:

E(aX) = a E(X)  and  E(X + Y) = E(X) + E(Y)
whenever a ∈ R and E(X) and E(Y) exist. If X and Y are statistically independent we have moreover that E(XY) = E(X) E(Y). If E(X²) exists, then automatically also E(X) exists; we then define the variance of X by var(X) := E(X²) − E(X)². We have var(X) = E[(X − E(X))²]; it follows that always var(X) ≥ 0. If var(X) = 0 then the variable X will show with probability 1 the constant value μ = E(X) as its outcome. The covariance of a pair of stochastic variables X and Y is defined by cov(X, Y) := E(XY) − E(X) E(Y), provided E(X²) and E(Y²) exist. It is a direct consequence of this definition that cov(X, X) = var(X) and cov(X, Y) = 0 if X and Y are independent. Furthermore, writing μ_X := E(X) and μ_Y := E(Y), we have cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]. The map (X, Y) ↦ cov(X, Y) defines a semi-inner product. If X₁, ..., Xₙ are stochastic variables satisfying cov(Xᵢ, Xⱼ) = 0 for all i ≠ j, then

var(X₁ + ··· + Xₙ) = Σ_{i=1}^n var(Xᵢ).

Covariance is invariant under translations, that is to say, for all a, b ∈ R we have cov(X + a, Y + b) = cov(X, Y). The standard deviation of a stochastic variable X is defined by σ_X := √var(X), provided E(X²) exists. The correlation coefficient of two variables X and Y is given by

ρ(X, Y) := cov(X, Y) / (σ_X σ_Y),

provided this expression exists. If so, then we have −1 ≤ ρ(X, Y) ≤ 1. The values ±1 are taken on if and only if the variables X and Y are linked by a relation of the type aX + bY = c, where a, b, c are constants.
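A quick numerical illustration of these definitions (ours, not the textbook's): estimating cov(X, Y) and ρ(X, Y) from simulated data, and checking the additivity of the variance for uncorrelated variables.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(loc=1.0, scale=2.0, size=n)   # X ~ N(1, 4)
z = rng.normal(size=n)                        # Z ~ N(0, 1), independent of X
y = 3.0 * x + z                               # Y is strongly correlated with X

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))   # estimate of E[(X - mu_X)(Y - mu_Y)]
rho_xy = cov_xy / (x.std() * y.std())                # estimate of the correlation coefficient
print(cov_xy, rho_xy)                                # cov close to 12, rho close to 0.99

# For uncorrelated variables the variance is additive:
print(np.var(x + z), np.var(x) + np.var(z))          # both close to 5
```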
1.1.6 Independent normally distributed stochastic variables (1.6)
A stochastic variable X is said to be normally distributed with parameters μ and σ² (where σ > 0) if it enjoys an absolutely continuous distribution with density

f(x) = 1/(σ√(2π)) · exp(−½ ((x − μ)/σ)²).
We often indicate this by saying that X is N(μ, σ²)-distributed. If so, then μ_X = E(X) = μ and σ²_X = var(X) = σ². If X is N(μ, σ²)-distributed, then pX + q (where p ≠ 0) is N(pμ + q, p²σ²)-distributed (Theorem 1.6.1). In particular: the variable (X − μ)/σ enjoys a N(0, 1)-distribution, also called the standard normal distribution. In general we say that (X − μ_X)/σ_X is the standardized version of a variable X. From now on we assume that X₁, ..., Xₙ are statistically independent variables, all of them with a common N(0, σ²)-distribution. Then P_{X₁,...,Xₙ} has a density given by

f(x₁, ..., xₙ) = (σ√(2π))⁻ⁿ exp(−(x₁² + ··· + xₙ²)/(2σ²)).

A probability distribution P_Y of a stochastic n-vector Y is called rotation invariant if for every orthogonal linear operator Q : Rⁿ → Rⁿ we have that P_Y = P_{QY}. (Equivalently: P_Y(A) = P_Y(QA) for all A ∈ 𝔅ⁿ and all orthogonal linear operators Q.) Under these conditions P_{X₁,...,Xₙ} is rotation invariant. Next, suppose that X₁, ..., Xₙ is a statistically independent system of variables and suppose that Xᵢ is N(μᵢ, σᵢ²)-distributed for i = 1, 2, ..., n. Then (Theorem 1.6.6) the sum S := X₁ + ··· + Xₙ is N(μ₁ + ··· + μₙ, σ₁² + ··· + σₙ²)-distributed. Let

𝔙 := {a₁X₁ + ··· + aₙXₙ : a₁, ..., aₙ ∈ R}

be the linear span of the variables X₁, ..., Xₙ. A very useful criterion for statistical independence is Theorem 1.6.7. It states: if M₁, ..., M_p, N₁, ..., N_q ∈ 𝔙 and if for all possible i and j we have cov(Mᵢ, Nⱼ) = 0, then the stochastic vectors (M₁, ..., M_p) and (N₁, ..., N_q) are statistically independent. In particular, if M, N ∈ 𝔙, then:

cov(M, N) = 0  ⟺  M and N are statistically independent.
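To see the criterion of Theorem 1.6.7 at work numerically, the following simulation (ours, not from the book) takes independent N(0, 1) variables X₁, X₂ and the uncorrelated combinations M = X₁ + X₂ and N = X₁ − X₂ in their linear span; events defined in terms of M and N then factorize up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
x1, x2 = rng.normal(size=(2, n))     # independent N(0, 1) variables

m = x1 + x2                          # M and N lie in the linear span of X1, X2
w = x1 - x2                          # cov(M, N) = var(X1) - var(X2) = 0

print(np.mean(m * w))                # sample covariance, close to 0

# Independence: P(M <= a, N <= b) should equal P(M <= a) * P(N <= b).
a, b = 0.3, -0.7
joint = np.mean((m <= a) & (w <= b))
product = np.mean(m <= a) * np.mean(w <= b)
print(joint, product)                # agree up to Monte Carlo error
```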
1.1.7 Distribution functions and probability distributions (1.7)
The distribution function F_X : R → [0, 1] belonging to a stochastic variable X is defined by

F_X(x) := P[X ≤ x] = P_X[(−∞, x]].

Such functions always have the following properties:

(i) F_X is increasing and right-continuous;

(ii) lim_{x→−∞} F_X(x) = 0 and lim_{x→+∞} F_X(x) = 1.

Conversely, every function F satisfying these conditions is of the form F = F_X, where X is some stochastic variable. If X enjoys an absolutely continuous distribution with probability density f, then for all x ∈ R we have

F_X(x) = ∫_{−∞}^{x} f(t) dt.

If f is continuous in x₀, then f(x₀) = F'_X(x₀). For a stochastic n-vector X = (X₁, ..., Xₙ) the distribution function F_X : Rⁿ → [0, 1] is defined by

F_X(x₁, ..., xₙ) := P(X₁ ≤ x₁, ..., Xₙ ≤ xₙ).
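The relation F_X(x) = ∫_{−∞}^x f(t) dt can be verified numerically. The sketch below is our own illustration: it integrates the standard normal density and compares the result with the distribution function supplied by SciPy.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

f = stats.norm.pdf            # density of the N(0, 1) distribution
F = stats.norm.cdf            # its distribution function

for x in (-1.5, 0.0, 2.0):
    integral, _ = quad(f, -np.inf, x)     # numerical version of the integral from -inf to x of f(t) dt
    print(x, integral, F(x))              # the two values agree to high precision
```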
1.1.8 Moments, moment generating functions and characteristic functions (1.8)

The moment generating function M_X belonging to a stochastic variable X is defined by

M_X(t) := E(e^{tX}),

provided this expectation exists. If M_X exists on a neighbourhood of 0, then the moments of X can be recovered from it by

E(Xⁿ) = [dⁿ/dtⁿ M_X(t)]_{t=0}.
Furthermore, for two variables X and Y the following four statements are under mild conditions equivalent:

(i) P_X = P_Y,

(ii) F_X = F_Y,

(iii) M_X = M_Y on some interval (−ε, +ε),

(iv) E(Xⁿ) = E(Yⁿ) for all n ∈ N.
We say that a sequence of stochastic variables X₁, X₂, ... converges in distribution (or: converges weakly) to X if

lim_{n→∞} F_{Xₙ}(x) = F_X(x)

for all points x in which F_X is continuous. By Theorem 1.8.4 this kind of convergence occurs if

lim_{n→∞} M_{Xₙ}(t) = M_X(t)

for all t in some interval (−∞, ξ]. The characteristic function χ (or χ_X) belonging to a stochastic variable X is defined by χ(t) := E(e^{itX}). This expression exists for all t in R and the function t ↦ χ(t) is always continuous on R. Lévy's theorem states that a sequence X₁, X₂, ... converges in distribution to X if and only if

lim_{n→∞} χ_{Xₙ}(t) = χ_X(t)

for all t in R.
A probability distribution P_X is completely characterized by χ_X. More precisely: χ_X = χ_Y is equivalent to P_X = P_Y. If the stochastic variables X and Y are statistically independent then

M_{X+Y}(t) = M_X(t) M_Y(t)  and  χ_{X+Y}(t) = χ_X(t) χ_Y(t).

As to the effect of scale transformations on characteristic and moment generating functions we have the following. If Y = pX + q, then

1. M_Y(t) = e^{qt} M_X(pt) for all t ∈ R where these expressions make sense,

2. χ_Y(t) = e^{iqt} χ_X(pt) for all t ∈ R.

If X is N(μ, σ²)-distributed, then:

1. M_X(t) = e^{μt + ½σ²t²} for all t ∈ R,

2. χ_X(t) = e^{iμt − ½σ²t²} for all t ∈ R.
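As an illustration of how the moments fall out of the moment generating function (our own sketch, not part of the book), one can differentiate M_X(t) = e^{μt + ½σ²t²} at t = 0 with SymPy and recover E(X) = μ and E(X²) = μ² + σ², hence var(X) = σ².

```python
import sympy as sp

t = sp.Symbol('t', real=True)
mu = sp.Symbol('mu', real=True)
sigma = sp.Symbol('sigma', positive=True)

M = sp.exp(mu * t + sp.Rational(1, 2) * sigma**2 * t**2)   # MGF of the N(mu, sigma^2) distribution

EX = sp.diff(M, t, 1).subs(t, 0)     # first moment:  E(X)   = mu
EX2 = sp.diff(M, t, 2).subs(t, 0)    # second moment: E(X^2) = mu^2 + sigma^2

print(sp.simplify(EX))               # mu
print(sp.expand(EX2))                # mu**2 + sigma**2
print(sp.simplify(EX2 - EX**2))      # sigma**2 = var(X)
```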
1.1.9 The central limit theorem (1.9)

In this section the central limit theorem is proved. Its content is the following. If X₁, X₂, ... is a statistically independent sequence of identically distributed stochastic variables with expectation value μ and variance σ², then the sequence

Vₙ := √n { (X₁ + ··· + Xₙ)/n − μ }

converges in distribution to the N(0, σ²)-distribution. In other words, the mean S̄ₙ := (X₁ + ··· + Xₙ)/n is for large n approximately N(μ, σ²/n)-distributed.
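The theorem can be watched at work in a simulation. The following sketch (ours, not from the textbook) draws sample means from the decidedly non-normal Exp(1) distribution, for which μ = σ² = 1, and compares quantiles of Vₙ with those of N(0, σ²).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 200, 50_000
mu, sigma2 = 1.0, 1.0                              # mean and variance of the Exp(1) distribution

samples = rng.exponential(scale=1.0, size=(reps, n))
v_n = np.sqrt(n) * (samples.mean(axis=1) - mu)     # realizations of V_n

# Compare a few quantiles of V_n with those of N(0, sigma^2).
for p in (0.05, 0.5, 0.95):
    print(p, np.quantile(v_n, p), stats.norm.ppf(p, scale=np.sqrt(sigma2)))
```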
1.1.10 Transformation of probability densities (1.10)
Given a variable X (or two statistically independent variables X and Y), how can we determine the probability densities of, for example, X/Y, √X or X²? In §1.10 techniques for solving such problems are discussed.
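As a numerical illustration of such a transformation (our own sketch), the density of X² for a standard normal X can be compared with the χ²-density with one degree of freedom, which is what the transformation techniques of §1.10 give analytically.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)
y = x**2                                                   # the transformed variable Y = X^2

# Empirical density of Y via a histogram, against the chi-square(1) density.
hist, edges = np.histogram(y, bins=200, range=(0, 4), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

for g in np.linspace(0.05, 4.0, 10):
    i = np.argmin(np.abs(centers - g))
    print(round(float(g), 2), round(float(hist[i]), 3), round(float(stats.chi2.pdf(g, df=1)), 3))
```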
1.2 Exercises to Chapter I
Exercise 1
Suppose 𝔄 is a σ-algebra. Prove that for arbitrary A, B ∈ 𝔄 also A ∪ B ∈ 𝔄, A ∩ B ∈ 𝔄 and A\B ∈ 𝔄.

Proof. If A, B ∈ 𝔄, then certainly Aᶜ, Bᶜ, A ∪ B = A ∪ B ∪ ∅ ∪ ∅ ∪ ··· ∈ 𝔄. Consequently 𝔄 is closed under the operation of taking unions. Because of this also Aᶜ ∪ Bᶜ ∈ 𝔄, and in turn it follows that A ∩ B = (Aᶜ ∪ Bᶜ)ᶜ ∈ 𝔄, so that 𝔄 is also closed under the operation of taking intersections. In this way we see that also A\B = A ∩ Bᶜ ∈ 𝔄. •
Exercise 2
Prove that for A₁, A₂, ... ∈ 𝔄, also ⋂_{i=1}^∞ Aᵢ ∈ 𝔄.

Proof. By the laws of De Morgan we have (see Exercise 1)

⋂_{i=1}^∞ Aᵢ = (⋃_{i=1}^∞ Aᵢᶜ)ᶜ ∈ 𝔄. •
Exercise 3
Let 𝔉 ⊂ P(Ω) and set

𝔄 := ⋂_{ℭ∈Z} ℭ,

where Z is the collection of all σ-algebras on Ω containing 𝔉.
Prove that St is a σ-algebra. Proof. (i)
For all £ e Ζ we have Ω G £. Consequently Ω e 21.
(ii)
If A G 21 then A G £ for all £ in Ζ. Hence also A c e £ for all £ in Z, which implies that Ac G 21.
(iii) If Ai, Ai,... G 2t then also Αχ, A2,... G £ for all £ in Z. It follows that (jj Ai G £ for all £ in Z, hence |J i Ai G 21. • (ii)
Prove that for each σ-algebra S which contains $ one has 21 C Proof. Suppose A G 21. Then (by construction of 21) A G 2t C » .
Therefore •
Exercise 4
Is the collection of all open sets in R a σ-algebra?

Solution. This collection is not a σ-algebra. To see this, note that (−∞, 0) is an element of this collection, whereas (−∞, 0)ᶜ = [0, +∞) is not. •
Exercise 5
Suppose f : Rᵐ → Rⁿ is a Borel function. Define 𝔄 := {A ⊂ Rⁿ : f⁻¹(A) ∈ 𝔅ᵐ}.

(i) Prove that 𝔄 is a σ-algebra containing all open sets in Rⁿ.

Proof.
(i) f⁻¹(Rⁿ) = Rᵐ ∈ 𝔅ᵐ, therefore Rⁿ ∈ 𝔄.
(ii) Suppose A ∈ 𝔄, that is f⁻¹(A) ∈ 𝔅ᵐ. Then f⁻¹(Aᶜ) = (f⁻¹(A))ᶜ ∈ 𝔅ᵐ and consequently Aᶜ ∈ 𝔄.
(iii) If A₁, A₂, ... ∈ 𝔄, then f⁻¹(Aᵢ) ∈ 𝔅ᵐ for all i. Therefore we have f⁻¹(⋃ᵢ Aᵢ) = ⋃ᵢ f⁻¹(Aᵢ) ∈ 𝔅ᵐ; hence ⋃ᵢ Aᵢ ∈ 𝔄.

This shows that 𝔄 is a σ-algebra. If O is open in Rⁿ, then (Definition 1.1.4) f⁻¹(O) ∈ 𝔅ᵐ, which is the same as saying that O ∈ 𝔄. It follows that 𝔄 contains all open sets in Rⁿ. •

(ii) Prove that f⁻¹(A) is Borel in Rᵐ for all Borel sets A in Rⁿ.

Proof. Take for 𝔉 the collection of open sets in Rⁿ and apply Exercise 3. •
Exercise 6
Prove that the composition of two Borel functions is a Borel function.

Proof. Suppose that f : Rᵖ → Rᵍ and g : Rᵍ → Rⁿ are Borel functions. For every open set O ⊂ Rⁿ we then have (applying Exercise 5(ii)):

(g ∘ f)⁻¹(O) = f⁻¹(g⁻¹(O)) ∈ 𝔅ᵖ. •
Exercise
Let X be a stochastic variable whose distribution function is given by

F_X(x) = 1 − e⁻ˣ if x ≥ 0, and F_X(x) = 0 elsewhere.

Determine the left-hand and right-hand derivative of F_X in the point x = 0.

Solution. We have

lim_{x↑0} (F_X(x) − F_X(0)) / (x − 0) = lim_{x↑0} (0 − 0)/x = 0.
That is to say, the left-hand derivative in x = 0 equals zero. From the right side we expect the derivative of 1 − e⁻ˣ to appear, that is e⁻⁰ = 1. To verify this we notice that for x > 0

|1 − e⁻ˣ − x| ≤ ½x²,

so that |(F_X(x) − F_X(0))/x − 1| ≤ ½x → 0 as x ↓ 0. Hence the right-hand derivative in x = 0 equals 1. •
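A quick numerical sanity check of these one-sided derivatives (our own sketch, not part of the solution): evaluate the difference quotient of F_X at 0 for small increments from either side.

```python
import numpy as np

def F(x):
    """Distribution function F_X(x) = 1 - exp(-x) for x >= 0, and 0 elsewhere."""
    return np.where(x >= 0, 1.0 - np.exp(-x), 0.0)

for h in (1e-2, 1e-4, 1e-6):
    right = (F(np.array(h)) - F(np.array(0.0))) / h        # tends to 1
    left = (F(np.array(-h)) - F(np.array(0.0))) / (-h)     # identically 0
    print(h, float(right), float(left))
```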
> 0, then F_Y(y) = P(Y ≤ y) = P(X
0. Prove that φ(X) also enjoys an absolutely continuous distribution and find an expression for its probability density.

Proof. Applying well-known results in basic mathematical analysis we conclude that φ is strictly increasing and that the following limits (finite or infinite) exist: