SOLUTIONS MANUAL FOR Mathematical Foundations for Signal Processing, Communications, and Networking
by Erchin Serpedin Thomas Chen Dinesh Rajan
CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2012 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Printed in the United States of America on acid-free paper Version Date: 2011915 International Standard Book Number: 978-1-4398-8403-4 (Paperback) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
Acknowledgements This work would not have been possible without the support and encouragement of our families and the help of our contributing authors and students. Our heartfelt thanks go to our families, to PhD student Mr. Ali Riza Ekti, and to research associate Dr. Serhan Yarkan. Mr. Ali Riza Ekti and Dr. Serhan Yarkan devoted a huge amount of time and effort to compiling, correcting, and merging together all the files and contributions. Their help is greatly appreciated. We would also like to thank our students Amina Noor, Aitzaz Ahmad, and Sabit Ekin for their help. Despite our efforts to eliminate all sources of errors and misunderstandings, inconsistencies and errors might still show up. Therefore, we kindly ask our readers to email their feedback to: [email protected]. Any feedback to improve this work is welcome. The webpage for this book will be maintained at Dr. Serpedin's webpage at Texas A&M University: http://www.ece.tamu.edu/~serpedin/
List of Contributing Authors:
Aitzaz Ahmad, Texas A&M University, College Station
Helmut Bölcskei, ETH Zurich, Switzerland
Thomas Chen, Swansea University, Wales, UK
Romain Couillet, L'Ecole Superieure D'Electricite (SUPELEC), France
Shuguang Cui, Texas A&M University, College Station, USA
Merouane Debbah, L'Ecole Superieure D'Electricite (SUPELEC), France
Tolga Duman, Arizona State University, Tempe, USA
Bogdan Dumitrescu, Tampere University of Technology, Finland
Eduard Jorswieck, Technical University of Dresden, Germany
Erik G. Larsson, Linkoping University, Sweden
Hongbin Li, Stevens Institute of Technology, USA
Veniamin I. Morgenshtern, ETH Zurich, Switzerland
Daniel Palomar, Hong Kong University of Science and Technology
Michal Pioro, Lund University, Sweden, and Warsaw University of Technology, Poland
Khalid Qaraqe, Texas A&M University at Qatar
Dinesh Rajan, Southern Methodist University, Dallas, USA
Erchin Serpedin, Texas A&M University, College Station, USA
Man-Cho Anthony So, Chinese University of Hong Kong
Vivek Sarin, Texas A&M University, College Station, USA
Fatemeh Hamidi Sepehr, Texas A&M University, College Station, USA
Venugopal V. Veeravalli, University of Illinois at Urbana-Champaign, USA
Serhan Yarkan, Texas A&M University, College Station, USA
Walter D. Wallis, Southern Illinois University, Carbondale, USA
Jiaheng Wang, Hong Kong University of Science and Technology
Xiaodong Wang, Columbia University, New York
Yik-Chung Wu, University of Hong Kong
Rui Zhang, National University of Singapore
Contents

1 Signal Processing Transforms
2 Linear Algebra
3 Elements of Galois Fields
4 Numerical Analysis
5 Combinatorics
6 Probability, Random Variables and Stochastic Processes
7 Random Matrix Theory
8 Large Deviations
9 Fundamentals of Estimation Theory
10 Fundamentals of Detection Theory
11 Monte Carlo Methods for Statistical Signal Processing
12 Factor Graphs and Message Passing Algorithms
13 Unconstrained and Constrained Optimization Problems
14 Linear Programming and Mixed Integer Programming
15 Majorization Theory and Applications
16 Queueing Theory
17 Network Optimization Techniques
18 Game Theory
19 A Short Course on Frame Theory
Chapter 1
Signal Processing Transforms

Solution 1.1 By the definition of an even/odd function, it is sufficient to check the output $h(-t)$:
$$h(-t) = f(-t)g(-t) = -f(t)g(t) = -h(t),$$
since $f(t)$ and $g(t)$ are odd and even signals, respectively, and $h(t)$ is defined as $f(t)g(t)$. Hence $h(t)$ is odd.

Solution 1.2 The problem can be solved in many ways. Intuitively, $f(t)$ is either a slowed-down or sped-up version of $g(t)$ for $k > 0$, whereas for $k < 0$, $f(t)$ is the same up to an additional time-reversal operation. Therefore, if the slowed-down or sped-up version of a signal is periodic, the original signal must be periodic as well. This reasoning can be stated formally as follows. Since $f(t)$ is periodic, $f(t + nT) = f(t)$ for $n \in \mathbb{Z}$. This implies that $f(t + nT) = g(kt + knT) = f(t) = g(kt)$. Now substitute $x = kt$ and reorganize inside the brackets to obtain $g(x + n(kT)) = g(x)$, which can be rewritten as $g(x + nT') = g(x)$ where $T' = kT$. Recall that $k \in \mathbb{R} \setminus \{0\}$. Therefore, yes, $g(\cdot)$ is periodic with period $T'$.

Solution 1.3 For a linear time-invariant (LTI) system to be invertible, the input of the system must be recoverable from its output. Consider a system which yields the same output for different inputs. Since the output then does not determine which input produced it, such a system cannot be invertible. For instance, assume that $x(t) = at + b$ for $a, b \in \mathbb{R}$. Then the output of the differentiator is $y(t) = K(x(t)) = \frac{d}{dt}(at + b) = a$. However, the output $y(t) = a$ is the same for the different input $x_c(t) = at + c$, where $c \in \mathbb{R}$ and $b \neq c$. Hence the differentiator is not invertible.

Solution 1.4 Since $f(t)$ is periodic, it is clear that $g(t)$ is periodic as well, because $g(t)$ is a scaled and shifted version of $f(t)$. Therefore, to answer Exercise 1.4, one should first find the Fourier series coefficients of the scaled version $f(2t)$, and then the coefficients of the shifted version of that scaled signal.
For $f(2t)$, the period is $T/2$ due to the compression (time-scaling). The Fourier series representation of $f(2t)$ is given by
$$f(2t) = \sum_{k=-\infty}^{\infty} a_k\, e^{jk\frac{2\pi}{T/2}t} = \sum_{k=-\infty}^{\infty} a_k\, e^{j2k\frac{2\pi}{T}t}.$$
Since $\omega_0 = 2\pi/T_0$, where $T_0$ is the fundamental period, then
$$f(2t) = \sum_{k=-\infty}^{\infty} a_k\, e^{jk(2\omega_0)t} \qquad (1)$$
is obtained. Note that in (1) the frequency of each complex exponential changes (it doubles); however, there is no change in the coefficients. This shows that time-scaling does not change the Fourier series coefficients (but does change the representation!). Next, time shifting takes place. Since time shifting causes a phase shift in each phasor, the Fourier series coefficients of $g(t)$ are $b_k = a_k\, e^{jk(4\pi/T)}$.

Solution 1.5 Since $f(t)$ is a continuous-time periodic signal, its Fourier series representation exists. Therefore,
$$g(t) = \frac{d^2}{dt^2}\left(\sum_{k=-\infty}^{\infty} a_k\, e^{jk\frac{2\pi}{T}t}\right) = \frac{d}{dt}\left(\frac{d}{dt}\left(\sum_{k=-\infty}^{\infty} a_k\, e^{jk\frac{2\pi}{T}t}\right)\right).$$
Since differentiation is a linear operator,
$$g(t) = \sum_{k=-\infty}^{\infty} a_k\, \frac{d^2}{dt^2}\, e^{jk\frac{2\pi}{T}t} = \sum_{k=-\infty}^{\infty} a_k\, \frac{jk2\pi}{T}\, \frac{d}{dt}\, e^{jk\frac{2\pi}{T}t} = \sum_{k=-\infty}^{\infty} \underbrace{\left(-a_k\, \frac{4k^2\pi^2}{T^2}\right)}_{b_k} e^{jk\frac{2\pi}{T}t}$$
is obtained, and the coefficients are $b_k = -a_k\, \dfrac{4k^2\pi^2}{T^2}$.
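As a quick numerical cross-check of Solution 1.5 (an added sketch, not part of the original solution), the FFT-based Fourier coefficients of a sampled periodic test signal's second derivative can be compared against $-(2\pi k/T)^2 a_k$. The test signal, period, and grid size below are arbitrary choices:

```python
import numpy as np

# Sketch: check b_k = -(2*pi*k/T)^2 * a_k for a smooth T-periodic test signal,
# using a periodic central difference as an independent estimate of f''.
T, N = 2.0, 1024                                        # arbitrary period and grid size
dt = T / N
t = np.arange(N) * dt
f = np.exp(np.cos(2 * np.pi * t / T))                   # arbitrary smooth periodic signal
g = (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / dt**2    # second derivative, periodic stencil

a = np.fft.fft(f) / N                                   # Fourier series coefficients a_k
b = np.fft.fft(g) / N                                   # coefficients b_k of g = f''
k = np.fft.fftfreq(N, d=1.0 / N)                        # signed harmonic index k
print(np.max(np.abs(b - (-(2 * np.pi * k / T) ** 2) * a)))   # small: O(dt^2) stencil error
```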
Solution 1.6 Because $f(t)$ contains an absolute value, the Fourier transform integral needs to be split into parts:
$$F(j\omega) = \int_{-\infty}^{\infty} e^{-3|t-1|}\, e^{-j\omega t}\, dt.$$
Since the breaking point of the absolute value is at $t = 1$, watching the signs,
$$F(j\omega) = \int_{-\infty}^{1} e^{3(t-1)}\, e^{-j\omega t}\, dt + \int_{1}^{\infty} e^{-3(t-1)}\, e^{-j\omega t}\, dt = \frac{e^{-j\omega}}{3 - j\omega} + \frac{e^{-j\omega}}{3 + j\omega} = \frac{6\,e^{-j\omega}}{9 + \omega^2}.$$

Solution 1.7 Since what is asked for is a rational function in the frequency domain, by taking advantage of elementary identities and properties,
$$\mathcal{F}^{-1}\left\{\frac{1}{(\omega - x)^k}\right\} = \frac{(jt)^{k-1}}{(k-1)!}\, \frac{j}{2}\, e^{jxt}\, \mathrm{sgn}(t), \qquad (2)$$
where $k = 1, 2, \ldots$. In (2), if $k = 1$ and $x = 0$, the answer is obtained as $\frac{j}{2}\,\mathrm{sgn}(t)$.
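As an added sanity check on the transform pair of Solution 1.6 (not part of the original solution), the closed form can be compared against brute-force numerical integration; the grid limits and test frequencies below are arbitrary:

```python
import numpy as np

# Sketch: numerically integrate the Fourier transform of f(t) = exp(-3|t-1|)
# and compare with the closed form 6*exp(-j*w) / (9 + w**2).
t = np.linspace(-40.0, 40.0, 400_001)        # wide grid; the integrand decays fast
f = np.exp(-3.0 * np.abs(t - 1.0))

for w in (0.0, 0.7, 2.5):
    numeric = np.trapz(f * np.exp(-1j * w * t), t)
    closed = 6.0 * np.exp(-1j * w) / (9.0 + w ** 2)
    print(w, abs(numeric - closed))           # differences should be very small
```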
Solution 1.8 The cross-correlation of two functions is given by
$$\rho_{f,g}(\tau) = \int_{-\infty}^{\infty} f(t)\, g^{*}(t + \tau)\, dt. \qquad (3)$$
Note that (3) implicitly contains a convolution: $\rho_{f,g}$ is the convolution of $f$ with a conjugated, time-reversed copy of $g$. Since convolution in the time domain is multiplication in the transform domain,
$$\mathcal{F}\{\rho_{f,g}(\tau)\} = F(j\omega)\, G^{*}(j\omega).$$

Solution 1.9 The claim is
$$\Delta t \sum_{n=-\infty}^{\infty} f(t - n\Delta t) = \sum_{n=-\infty}^{\infty} F(jn\Delta\omega)\, e^{jn\Delta\omega t}. \qquad (4)$$
First assume that $t = 0$ and recall that $\Delta t\,\Delta\omega = 2\pi$. Then (4) degenerates to the Poisson formula
$$\Delta t \sum_{n=-\infty}^{\infty} f(n\Delta t) = \sum_{n=-\infty}^{\infty} F(jn\Delta\omega).$$
Now assume that $f(t)$ is convolved with the impulse train:
$$x(t) = f(t) \star \sum_{n=-\infty}^{\infty} \delta(t - n\Delta t) = \sum_{n=-\infty}^{\infty} f(t) \star \delta(t - n\Delta t) = \sum_{n=-\infty}^{\infty} f(t - n\Delta t).$$
Next, applying the Fourier transform to $x(t)$, keeping in mind the convolution-multiplication duality,
$$X(j\omega) = \Delta\omega \sum_{n=-\infty}^{\infty} F(j\omega)\, \delta(\omega - n\Delta\omega) = \Delta\omega \sum_{n=-\infty}^{\infty} F(jn\Delta\omega)\, \delta(\omega - n\Delta\omega)$$
is obtained, since $F(\cdot)$ takes its value where the Dirac delta function is supported. Finally, using $x(t) = \mathcal{F}^{-1}\{X(j\omega)\}$,
$$\Delta t \sum_{n=-\infty}^{\infty} f(t - n\Delta t) = \Delta t\, \mathcal{F}^{-1}\{X(j\omega)\} = \mathcal{F}^{-1}\left\{\Delta t\, \Delta\omega \sum_{n=-\infty}^{\infty} F(jn\Delta\omega)\, \delta(\omega - n\Delta\omega)\right\}$$
$$= 2\pi \sum_{n=-\infty}^{\infty} F(jn\Delta\omega)\, \mathcal{F}^{-1}\{\delta(\omega - n\Delta\omega)\} = \sum_{n=-\infty}^{\infty} F(jn\Delta\omega)\, e^{jn\Delta\omega t}.$$
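A small numerical illustration of the Poisson summation formula from Solution 1.9 (an added sketch, not part of the original manual) uses a Gaussian pulse, whose transform is known in closed form; the pulse width, sample spacing, and truncation limits are arbitrary choices:

```python
import numpy as np

# Sketch: check  dt * sum_n f(n*dt) = sum_n F(n*dw)  with dt*dw = 2*pi,
# for f(t) = exp(-t^2/2), whose transform is F(w) = sqrt(2*pi)*exp(-w^2/2).
dt = 0.7
dw = 2 * np.pi / dt
n = np.arange(-200, 201)                      # truncation; the terms decay very fast

lhs = dt * np.sum(np.exp(-(n * dt) ** 2 / 2))
rhs = np.sum(np.sqrt(2 * np.pi) * np.exp(-((n * dw) ** 2) / 2))
print(lhs, rhs, abs(lhs - rhs))               # agreement to machine precision
```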
Solution 1.10
By definition, one can write
$$\frac{d^2}{d\omega^2}\bigl(\mathcal{F}_c\{f(t)\}\bigr) = \frac{d^2}{d\omega^2}\int_{0}^{\infty} f(t)\cos(\omega t)\, dt = \int_{0}^{\infty} f(t)\, \frac{d^2}{d\omega^2}\cos(\omega t)\, dt = \int_{0}^{\infty} f(t)\,(-1)t^{2}\cos(\omega t)\, dt = \mathcal{F}_c\{(-1)t^{2}f(t)\}.$$
Solution 1.11 By definition,
$$\mathcal{L}\left\{\frac{d}{dt}f(t)\right\} = \lim_{T\to\infty}\int_{0}^{T} \frac{d}{dt}f(t)\, e^{-st}\, dt.$$
The integral can be divided into parts so that integration by parts can be employed for the $n$-piecewise function:
$$= e^{-st}f(t)\Big|_{0}^{T_1} + e^{-st}f(t)\Big|_{T_1}^{T_2} + \cdots + s\int_{0}^{T} e^{-st}f(t)\, dt,$$
where $u = e^{-st}$ and $dv = \frac{df}{dt}\,dt$. Hence,
$$\mathcal{L}\left\{\frac{d}{dt}f(t)\right\} = \lim_{T\to\infty}\left[-f(0^{+}) + e^{-sT}f(T) + s\int_{0}^{T} e^{-st}f(t)\, dt\right],$$
and because the transform is assumed to exist in the limiting case $T \to \infty$,
$$\mathcal{L}\left\{\frac{d}{dt}f(t)\right\} = sF(s) - f(0^{+}).$$

Solution 1.12
To show this orthonormality, it suffices to evaluate
$$\left\langle \frac{\mathrm{cas}(i\nu_0 t)}{\sqrt{2\pi}} \,\Big|\, \frac{\mathrm{cas}(k\nu_0 t)}{\sqrt{2\pi}} \right\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{cas}(i\nu_0 t)\, \mathrm{cas}(k\nu_0 t)\, dt.$$
Since the cross-terms obtained after expanding $\mathrm{cas}(\cdot)$ vanish,
$$\left\langle \frac{\mathrm{cas}(i\nu_0 t)}{\sqrt{2\pi}} \,\Big|\, \frac{\mathrm{cas}(k\nu_0 t)}{\sqrt{2\pi}} \right\rangle = \begin{cases} 1, & \text{if } i = k,\\ 0, & \text{otherwise.} \end{cases}$$

Solution 1.13 By the definition of autoconvolution, $\rho_{ff}(t) = f \star f$, so applying the Fourier transform gives
$$\mathcal{F}\{\rho_{ff}(t)\} = F^{2}(j\omega).$$
Similarly, for $\rho_{\hat f \hat f}$ (the autoconvolution of the Hilbert-transformed signal),
$$\mathcal{F}\{\rho_{\hat f \hat f}\} = \bigl(-j\,\mathrm{sgn}(\omega)\, F(j\omega)\bigr)^{2},$$
which equals $-F^{2}(j\omega)$.
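As an aside (not in the original manual), the orthonormality used in Solution 1.12 is easy to confirm numerically for the Hartley kernel $\mathrm{cas}(x) = \cos x + \sin x$, assuming $\nu_0 = 1$ and a few integer indices:

```python
import numpy as np

cas = lambda x: np.cos(x) + np.sin(x)
t = np.linspace(-np.pi, np.pi, 200_001)

def inner(i, k):
    # (1 / 2*pi) * integral over [-pi, pi] of cas(i t) * cas(k t) dt
    return np.trapz(cas(i * t) * cas(k * t), t) / (2 * np.pi)

for i, k in [(1, 1), (2, 2), (1, 2), (3, 5)]:
    print(i, k, round(inner(i, k), 6))   # ~1 when i == k, ~0 otherwise
```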
Chapter 2
Linear Algebra

Solution 2.1 Assume first that $\forall s_1, s_2 \in S$ and $\forall r_1, r_2 \in F$: $r_1 s_1 + r_2 s_2 \in S$.
– Taking $s_2 = 0$ and $s_1 \in S$, for any $r \in F$ we get $r s_1 \in S$ (closure under scalar multiplication).
– Taking $r_1 = r_2 = 1$ and $s_1, s_2 \in S$, we get $s_1 + s_2 \in S$ (closure under addition).
Hence $S$ is a subspace. Conversely, suppose $S$ is a subspace. If $s_1, s_2 \in S$, then for any $r_1, r_2 \in F$, $r_1 s_1 \in S$ and $r_2 s_2 \in S$, and then $r_1 s_1 + r_2 s_2 \in S$. Therefore $S$ is a subspace if and only if $\forall s_1, s_2 \in S$, $\forall r_1, r_2 \in F$: $r_1 s_1 + r_2 s_2 \in S$.

Solution 2.2 Following the hint, let $x, y \in V$. Since $B$ is an ordered basis for $V$, $x$ and $y$ can be expressed as
$$x = \sum_{i=1}^{n} s_i b_i, \qquad y = \sum_{j=1}^{n} r_j b_j.$$
Taking the inner product,
$$\langle x \mid y \rangle = \left\langle \sum_{i=1}^{n} s_i b_i \,\Big|\, \sum_{j=1}^{n} r_j b_j \right\rangle = \sum_{i=1}^{n}\sum_{j=1}^{n} s_i r_j^{*} \langle b_i \mid b_j \rangle = \sum_{i=1}^{n}\sum_{j=1}^{n} s_i r_j^{*} a_{ji} = [y]_B^{\dagger} A\, [x]_B.$$
Hence proved. Notice furthermore that for any arbitrary vector $x$ we will have $\langle x \mid x\rangle = [x]_B^{\dagger} A\, [x]_B$, and because $\langle x \mid x\rangle = \langle x \mid x\rangle^{*} > 0$ for all $x \neq 0$, it follows that $A = A^{H} \geq 0$.

Solution 2.3 We first show that $\langle v' \mid v \rangle = 0$, where $v' = w - \frac{\langle w \mid v\rangle}{\|v\|^{2}}v$:
$$\langle v' \mid v \rangle = \left\langle w - \frac{\langle w \mid v\rangle v}{\|v\|^{2}} \,\Big|\, v \right\rangle = \langle w \mid v\rangle - \frac{\langle w \mid v\rangle \langle v \mid v\rangle}{\|v\|^{2}} = \langle w \mid v\rangle - \frac{\langle w \mid v\rangle\, \|v\|^{2}}{\|v\|^{2}} = 0.$$
Since $0 \leq \|v'\|^{2}$, we have
$$0 \leq \left\| w - \frac{\langle w \mid v\rangle v}{\|v\|^{2}} \right\|^{2} = \langle w \mid w\rangle - \frac{\langle w \mid v\rangle\langle v \mid w\rangle}{\|v\|^{2}} - \frac{|\langle w \mid v\rangle|^{2}}{\|v\|^{2}} + \frac{|\langle w \mid v\rangle|^{2}\,\langle v \mid v\rangle}{\|v\|^{4}} = \|w\|^{2} - \frac{|\langle w \mid v\rangle|^{2}}{\|v\|^{2}}.$$
Hence we have the Cauchy-Schwarz inequality,
$$|\langle v \mid w\rangle|^{2} \leq \|v\|^{2}\, \|w\|^{2}.$$
We now prove the triangle inequality using the above result:
$$\|v + w\|^{2} = \|v\|^{2} + \|w\|^{2} + 2\Re\langle v \mid w\rangle \leq \|v\|^{2} + \|w\|^{2} + 2\|v\|\,\|w\| = (\|v\| + \|w\|)^{2},$$
where the second step follows from the Cauchy-Schwarz inequality. Therefore $\|v + w\| \leq \|v\| + \|w\|$.

Solution 2.4 We first suppose that there are $n$ elements in $B$. Consider a vector $v$ in the vector space $V$ given by
$$v = \sum_{i=1}^{n} c_i b_i.$$
Taking the inner product with $b_j$,
$$\langle v \mid b_j\rangle = \left\langle \sum_{i=1}^{n} c_i b_i \,\Big|\, b_j \right\rangle = \sum_{i=1}^{n} c_i \langle b_i \mid b_j\rangle = c_j\, \|b_j\|^{2},$$
where the last step follows from the orthogonality of $b_i$ and $b_j$. Since $b_j \neq 0$, we have
$$c_j = \frac{\langle v \mid b_j\rangle}{\|b_j\|^{2}}.$$
Since $v = 0$ implies $c_j = 0$ for all $j = 1, \ldots, n$, we conclude that the vectors $b_1, \ldots, b_n$ in $B$ are linearly independent.

Solution 2.5 Suppose $\{v_1, \ldots, v_n\}$ is a set of linearly independent vectors in a vector space $V$ of dimension $n$. Then we can form an orthogonal set $\{w_1, \ldots, w_n\} \subset V$ by the Gram-Schmidt process. We first calculate
$$w_1 = v_1, \quad w_2 = v_2 - \frac{\langle v_2 \mid w_1\rangle}{\|w_1\|^{2}}\, w_1, \quad w_3 = v_3 - \frac{\langle v_3 \mid w_1\rangle}{\|w_1\|^{2}}\, w_1 - \frac{\langle v_3 \mid w_2\rangle}{\|w_2\|^{2}}\, w_2, \quad \ldots, \quad w_n = v_n - \sum_{i=1}^{n-1} \frac{\langle v_n \mid w_i\rangle}{\|w_i\|^{2}}\, w_i.$$
Calculating the inner products, we obtain
$$\langle w_2 \mid w_1\rangle = \langle v_2 \mid w_1\rangle - \frac{\langle v_2 \mid w_1\rangle}{\|w_1\|^{2}}\langle w_1 \mid w_1\rangle = 0,$$
$$\langle w_3 \mid w_1\rangle = \langle v_3 \mid w_1\rangle - \frac{\langle v_3 \mid w_1\rangle}{\|w_1\|^{2}}\langle w_1 \mid w_1\rangle - \frac{\langle v_3 \mid w_2\rangle}{\|w_2\|^{2}}\langle w_2 \mid w_1\rangle = 0.$$
Proceeding in a similar manner, we can conclude that $\langle w_k \mid w_i\rangle = 0$ for all $i = 1, \ldots, k-1$, and hence $\{w_1, \ldots, w_k\}$ is an orthogonal set. This set of vectors is a basis because it is a linearly independent set with the same number of vectors as the spanning set $\{v_1, v_2, \ldots, v_k\}$.

Solution 2.6 The idea is to first reduce the original matrix to a row echelon form. Observe that the row rank of the matrix is equal to the number of non-zero rows in the row echelon form. Then, using only elementary column operations, the row echelon form can be transformed to a diagonal matrix. One can show that the row rank and column rank are both equal to the number of non-zero diagonal entries. An alternative solution relies on the following facts. Notice that $\mathrm{column\ rank}(A) \triangleq \dim(R(A))$ and $\mathrm{row\ rank}(A) \triangleq \dim(R(A^{\dagger}))$. Now, the nullspace $\mathrm{Null}(A)$ is the orthogonal complement of the rowspace $R(A^{\dagger})$ of $A$. Therefore
$$\mathrm{row\ rank}(A) = \dim\bigl(R(A^{\dagger})\bigr) = n - \dim\bigl(R(A^{\dagger})^{\perp}\bigr) = n - \dim(\mathrm{Null}(A)) = \dim(R(A)) = \mathrm{column\ rank}(A).$$

Solution 2.7
– a. We first show that the linear transformation $T_A$ is surjective if and only if $\mathrm{rank}(A) = m$. Assume that $T_A$ is surjective. This implies that $R(T_A) = F^{m}$, so $\dim(R(T_A)) = \dim(F^{m})$, i.e., $\mathrm{rank}(A) = m$. Now suppose that $\mathrm{rank}(A) = m$. This shows that $m \leq n$ and there are $m$ linearly independent rows. Therefore, for every vector $w \in F^{m}$ there exists a vector $v \in F^{n}$ such that $T_A v = w$, i.e., $R(T_A) = F^{m}$. Hence $T_A$ is surjective.
– b. Next, we show that $T_A$ is injective if and only if $\mathrm{rank}(A) = n$. First, if $T_A$ is injective, then $\mathrm{Null}(T_A) = \{0\}$, and Eq. (2.46) further implies $R(T_A^{*}) = F^{n}$, i.e., $\mathrm{rank}(A) = n$. The converse follows by a reverse argument or simply by contradiction: if $T_A$ is not injective, then there exist $v_1 \neq v_2$ such that $T_A v_1 = T_A v_2$, i.e., $T_A(v_1 - v_2) = 0$, which means $v_1 - v_2 \in \mathrm{Null}(A)$. Thus $\dim(\mathrm{Null}(A)) \geq 1$, which contradicts Eq. (2.46).

Solution 2.8 To show that $T(v) = v$ for all $v \in R(T)$, consider vectors $w \in V$ and $v \in R(T)$ such that $T(w) = v$. Then $T(T(w)) = T(v)$, i.e., $T^{2}(w) = T(v)$, so $T(w) = T(v)$ and hence $T(v) = v$. Hence proved. In order to show that $R(T)$ and $\mathrm{Null}(T)$ are disjoint, we need to show that a vector $v$ cannot simultaneously belong to $R(T)$ and $\mathrm{Null}(T)$ except for $v = 0$. Suppose $v \in R(T)$ and $v \in \mathrm{Null}(T)$. Then $T(v) = 0$. But $T(v) = v$ from the previous proof, so $v = 0$. Hence $R(T)$ and $\mathrm{Null}(T)$ are disjoint.
if f v = 0.
12
CHAPTER 2. LINEAR ALGEBRA
Solution 2.9 For a projection to be orthogonal, its range and null space must be orthogonal. We need to show that (I − P) v ∈ N ull (P) and hPu | (I − P) vi = 0. Let (I − P) v = w
⇒ w ∈ R ({I − P})
2
(I − P) v = (I − P) w (I − P) v = (I − P) w ⇒ (I − P) w = w Pw = 0 ⇒ w ∈ N ull (P) . Now consider the inner product H
hPu | (I − P) vi = vH (I − P) Pu = vH I − PH Pu = vH P − P2 u. Since P = P2 , hPu | (I − P) vi = 0. Hence a Hermitian idempotent matrix P is an orthogonal projection. Solution 2.10 Let B = {v1 , v2 , . . . , vn } be a basis for the vector space V. Let M be a mapping from V to F such that vi∗ (vj ) = δij , then we can define a linear transformation T : V → F using the mapping M : T (vk ) = M (vk ) We now prove its uniqueness. Any vector v ∈ V can be uniquely represented as v=
n X
ak vk
k=1
Now, T (v) = vj∗ (v) = =
n X k=1 n X
ak vj∗ (vk ) ak δij
k=1
If U (vk ) = M (vk ) then U (v) = vj∗ (v) = =
n X k=1 n X k=1
∀v ∈ V then U = T . We obtain a set of linear functional from B i.e. v1∗ , . . . , vn∗ .
ak vj∗ (vk ) ak δij = T (v)
13 Suppose v∗ =
n X
si v∗
i=1
⇒ v∗ (vj ) =
n X
si v∗ (vj )
i=1
=
n X
si δij = sj
i=1
We notice that if it is the zero functional i.e., v∗ (vj ) = 0, j = 1, . . . , n, then sj = 0, j = 1, . . . , n. It follows that v1∗ , . . . , vn∗ are linearly independent. Since the dimension of V ∗ = n, we conclude that v1∗ , . . . , vn∗ is the dual basis of B. We have v= ⇒ vj∗ (v) = =
n X k=1 n X k=1 n X
ak vk ak vj∗ (vk ) ak δij = aj
k=1
Hence, the unique expression for v as a linear combination of vi∗ for i = 1, . . . , n is v=
n X
vi∗ (v)vk
k=1
Solution 2.11 – a. Let w ∈ R (T ) and n ∈ N ull (T ∗ ). hw | ni = hT v | ni = hv | T ∗ ni = 0 ⇒ n ∈ R (T )
⊥ ⊥
⇒ N ull (T ∗ ) ⊂ R (T ) . ⊥
– b. Let w ∈ R (T )
hT v | wi = 0 ⇒ hv | T ∗ wi = 0. Since this is true ∀v ∈ V, we have T ∗ w = 0. ⇒ w ∈ N ull (T ∗ ) . – c. From parts a and b, it follows that
⊥
R (T ) = N ull (T ∗ ) . – d. This can be obtained similarly by replacing T by T ∗ and using the fact that T ∗∗ = T .
14
CHAPTER 2. LINEAR ALGEBRA
Solution 2.12 We only prove the first assertion. The second follows in a similar manner. Let T be the linear transformation from V to W. Now assume that T is surjective. This implies for every w ∈ W, there exists a v ∈ V such that Tv = w ⇒ w ∈ R (T ) , ∀w ∈ W ⇒ w ∈ N ull (T ∗ )
⊥
Now suppose that T ∗ is not injective. This implies that there exists a vector w 6= 0 such that T ∗w = 0 ⇒ w ∈ N ull (T ∗ ) ⊥
which is a contradiction because a non zero vector w cannot simultaneously belong to N ull (T ∗ ) and N ull (T ∗ ). Hence, T ∗ must be injective. Now assume that T ∗ is injective and T is not surjective. This implies that there exists a vector w 6= 0 such that w ∈ / R (T ) ⊥
w ∈ R (T )
⇒ w ∈ N ull (T ∗ ) which contradicts the assumption that T ∗ is injective. Hence, T is surjective and the proof follows. Solution 2.13 – a. Let u ∈ N ull (T ). ⇒ Tu = 0 T ∗T u = 0 ⇒ u ∈ N ull (T ∗ T ) ⇒ N ull (T ) ⊂ N ull (T ∗ T ) Now suppose T ∗ T v = 0 but T v 6= 0. Let T v = b. This implies b ∈ R (T ). From assumption, b ∈ N ull (T ∗ ). ⊥ Hence, b ∈ R (T ) which is a contradiction. Therefore, T ∗ T v = 0 implies T v = 0. N ull (T ∗ T ) = N ull (T ) – b. Replacing T by T ∗ and noting that T ∗∗ = T , the proof follows. – c. Let v ∈ R (T ∗ T ). ⇒ T ∗T u = v Taking T u = b, we have v ∈ R (T ∗ ). Hence, R (T ∗ T ) ⊂ R (T ∗ ). Now let v ∈ R (T ∗ ). Hence, we can write T ∗x = v ∀x ∈ W. Let v ∈ / R (T ∗ T ) ⇒ T ∗ T y 6= v ∀T y ∈ W. Selecting x = T y, we have a contradiction. Therefore, v ∈ R (T ∗ T ) and R (T ∗ ) ⊂ R (T ∗ T ) R (T ∗ T ) = R (T ∗ ) – d. Similar to (b).
15 Solution 2.14 We have kT k =
kT vk . v∈V−{0} kvk
kT k =
Tv
.
v∈V−{0} kvk
sup
This can be written as
Now let w =
v kvk .
sup
This implies that kT k =
sup
kT wk .
w∈V,kwk=1
Solution 2.15 – a. Recall that the p-norm of a matrix A is given by kAkp = and kxk1 =
Pn
i=1
max
x∈X,kxkp =1
kAxkp
|xi |. Let the m × n matrix A be represented in column format as . . . A = a1 ..a2 .., · · · , ..an
⇒ Ax =
n X
ai xi
i=1
n
X
kAxk1 =
≤
i=1
n X
ai xi
1
|xi | kai k1
i=1 n X
≤ max kaj k1 j
! |xi |
i=1
= max kaj k1 kxk1 j
For kxk1 = 1, we have kAxk1 ≤ max kAj k1 , j
⇒ max kAxk1 = max kaj k1 j
kxk1 =1
⇒ kAk1 = max kaj k1 j
Note that we can also write kaj k1 =
X
|aij |
i
⇒ kAk1 = max j
Hence proved.
X i
|aij |
16
CHAPTER 2. LINEAR ALGEBRA – b. We have kxk∞ = max |xi |. i=1,...,n
kAxk∞
P
j a1j xj
. . =
P .
j amj xj ∞ X aij xj = max i j X ≤ max |aij xj | i
j
X ≤ max |aij | max |xj | i
j
j
X = max |aij | kxk i
∞
j
For kxk∞ = 1 and using an argument similar to (a), we have the desired result. kAk∞ = max i
X
|aij |
j
– c. Consider I = xH AH Ax − λxH x Taking gradient with respect to x and equating to zero, we have AH Ax = λx ⇒ x is an eigenvector of AH A. Also xH AH Ax = λxH x = λ
∵ kxk2 = xH x = 1
Since we want to maximize the quantity on the left, λ must be the largest eigenvalue of AH A. Let us define the spectral radius of a matrix as ρ (A) = max |λi | i
Hence, we have 2 kAk2 = ρ AH A q kAk2 = ρ (AH A) As an alternative proof, use the concept of Rayleigh quotient. – d. This part can be proved easily by considering an m × n matrix A, writing the product AH A, and noting that the trace of this product matrix can be written as m X n X 2 tr AH A = |aij | i=1 j=1
17 Solution 2.16 It follows from the previous exercise that 2 H kUAkF = tr (UA) (UA) = tr AH UH UA = tr AH A ∵ UH U = I 2
= kAkF ⇒ kUAkF = kAkF . Solution 2.17
I 0 det(In + BA) = det B I + BA I −A I = det B I O
A I
= det(Im + AB) Solution 2.18 – a. Suppose s ∈ S and hv − s | s∗ i = 0 ∀s∗ ∈ S. Let s0 ∈ S, where s0 6= s. Rewriting the difference as v − s0 = v − s + s − s0 2
kv − s0 k = kv − s + s − s0 k
2
2
2
2
2
= kv − sk + ks − s0 k + 2< [hv − s | s − s0 i] = kv − sk + ks − s0 k . where the last step follows since v − s is orthogonal every vector in S. This implies that 2
2
kv − s0 k ≥ kv − sk . 2
2
– b. Conversely, let kv − s0 k ≥ kv − sk ,∀s0 ∈ S. It follows from the above proof that 2
2< [hv − s | s − s0 i] + ks − s0 k ≥ 0 Substituting s − s0 with
0
(1)
0
hv−s|s−s i(s−s )
in the equality (1), it follows that, # 2 hv − s | hv − s | s − s0 i (s − s0 )i |hv − s | s − s0 i| 2< − + ≥0 2 2 ks − s0 k ks − s0 k " # 2 2 |hv − s | s − s0 i| |hv − s | s − s0 i| ⇒ 2< − + ≥0 2 2 ks − s0 k ks − s0 k ks−s0 k2
"
2
⇒−
|hv − s | s − s0 i| 2
ks − s0 k
≥0
which is true iff hv − s | s − s0 i = 0. Hence, v − s is orthogonal to every vector in S. We show that the best approximation is unique using proof by contradiction. Suppose s and s0 are best approximations. Then kv − sk = kv − s0 k. From the proof in part (a) 2
2
2
kv − s0 k = kv − sk + ks − s0 k 2
⇒ ks − s0 k = 0 Therefore, the best approximation, if it exists is unique.
18
CHAPTER 2. LINEAR ALGEBRA – c. * hv − s | sj i =
Pm
v−
k=1
hv | sk i sk 2
ksk k
= hv | sj i −
+ | sj
hv | sj i hsj | sj i ksj k
2
=0 ⇒ v − s is orthogonal to every vector in S and it is the best approximation. Solution 2.19 – a. P is an orthogonal projection of V onto S. ⇒ Pv = s and s ∈ R (P). Now suppose (I − P) v = w. It has been shown in exercise (2.12) that w ∈ N ull (P) ⇒ hs | wi = 0
∵ P is an orthogonal projection
⊥
⇒w∈S
2
Hence, (I − P) projects onto S⊥ . Notice also (I − P) = I − P. To show that (I − P) is orthogonal, let x ∈ N ull (P) (I − P) x = 0 ⇒ x = Px Consider the inner product hw | xi = h(I − P) v | Pxi = xT P − P2 v =0 since P is an orthogonal projection. – b. P is idempotent as it is an orthogonal projection and every projection is idempotent. – c. Suppose Ps0 = 0. This implies s0 ∈ N ull (P). From (a), it follows that s0 ∈ S⊥ . Now assume that s0 ∈ S⊥ . For any s ∈ S, we have hs | s0 i = 0. Since P is an orthogonal projection, its null space and range space are orthogonal. ⇒ s0 ∈ N ull (P) and the proof follows. – d. Let v ∈ V. By the projection theorem, there is a unique s0 ∈ S such that kv − s0 k ≤ kv − sk ∀s ∈ S and w0 = v − s0 ∈ S⊥ . Hence, any vector in V can be decomposed into v = s0 + w0 where s0 ∈ S and w0 ∈ S⊥ . Therefore, V = S + S⊥ . Solution 2.20 – a. Using projection theorem, error is orthogonal to data, i.e., (T v − w) ⊥ R (T ) ⊥
(T v − w) ∈ R (T )
⇒ (T v − w) ∈ N ull (T ∗ ) .
19 – b. T ∗ (T v − w) = 0 ⇒ T ∗ T v = T ∗ w.
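Solution 2.20 is the abstract statement of the least-squares normal equations. A concrete NumPy sketch (added here, not from the manual), with $T$ taken as an arbitrary tall real matrix, illustrates that solving $T^{*}Tv = T^{*}w$ matches numpy.linalg.lstsq and that the residual is orthogonal to $R(T)$:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((8, 3))          # tall full-rank "operator"
w = rng.standard_normal(8)               # data vector

v_normal = np.linalg.solve(T.T @ T, T.T @ w)        # normal equations T* T v = T* w
v_lstsq = np.linalg.lstsq(T, w, rcond=None)[0]      # reference least-squares solution

print(np.allclose(v_normal, v_lstsq))               # True
print(np.allclose(T.T @ (T @ v_normal - w), 0))     # residual orthogonal to R(T)
```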
Solution 2.21 In this exercise, we prove that if all eigenvalues of an $n \times n$ matrix $A$ are distinct, the corresponding eigenvectors are linearly independent. We show the process for $n = 2$; the generalization to $n > 2$ is straightforward. Assume that the eigenvectors are linearly dependent, i.e., there exist constants $c_1$ and $c_2$, not both zero, such that
$$c_1 x_1 + c_2 x_2 = 0.$$
Multiplying by $A$ yields
$$c_1 A x_1 + c_2 A x_2 = c_1 \lambda_1 x_1 + c_2 \lambda_2 x_2 = 0.$$
Now subtract $\lambda_1$ times the first equation from the second to obtain
$$c_2(\lambda_2 - \lambda_1)\, x_2 = 0.$$
Since the eigenvalues are distinct and the eigenvector $x_2 \neq 0$, we have $c_2 = 0$. Similarly, it can be shown that $c_1 = 0$, and the proof follows by contradiction.

Solution 2.22 Let us first prove that the eigenvalues of a Hermitian matrix are real. Clearly,
$$\bigl(v_i^{H} A v_i\bigr)^{H} = v_i^{H} A^{H} v_i = v_i^{H} A v_i,$$
so the scalar $v_i^{H} A v_i$ is real. Now consider
$$A v_i = \lambda_i v_i \;\Rightarrow\; v_i^{H} A v_i = \lambda_i v_i^{H} v_i = \lambda_i \|v_i\|^{2}.$$
Since the left-hand side (LHS) of the above equation is real, the eigenvalues $\lambda_i$ of a Hermitian matrix $A$ are real. Now we show that eigenvectors of a Hermitian matrix $A$ corresponding to distinct eigenvalues are orthogonal. For eigenvalues $\lambda_i, \lambda_j$, we can write
$$A v_i = \lambda_i v_i, \qquad (1)$$
$$A v_j = \lambda_j v_j. \qquad (2)$$
Assume $\lambda_i \neq \lambda_j$. Multiplying (1) on the left by $v_j^{H}$, we obtain
$$v_j^{H} A v_i = \lambda_i v_j^{H} v_i. \qquad (3)$$
Multiplying (2) on the left by $v_i^{H}$, we obtain
$$v_i^{H} A v_j = \lambda_j v_i^{H} v_j. \qquad (4)$$
The left-hand sides of (3) and (4) are conjugates of each other because $A = A^{H}$ and both are scalars: $(v_j^{H} A v_i)^{H} = v_i^{H} A^{H} v_j = v_i^{H} A v_j$. Therefore, since the eigenvalues are real, the right-hand sides of (3) and (4) must satisfy $\lambda_i v_i^{H} v_j = \lambda_j v_i^{H} v_j$. Since $\lambda_i \neq \lambda_j$, it follows that $v_j^{H} v_i = v_i^{H} v_j = 0$.
Solution 2.23
– a. We have $A v_i = \lambda_i v_i$, $i = 1, \ldots, n$. In matrix form,
$$A[v_1, \ldots, v_n] = [A v_1, \ldots, A v_n] = [\lambda_1 v_1, \ldots, \lambda_n v_n] = [v_1, \ldots, v_n]\begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix}.$$
Let $S = [v_1, \ldots, v_n]$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. This implies that $AS = S\Lambda$. Since $S$ is composed of linearly independent vectors, it is full rank and invertible, so $\Lambda = S^{-1}AS$, i.e., $A$ is diagonalizable.
– b. Given $S^{-1}AS = \Lambda$, we have $AS = S\Lambda$. Let $\{v_i\}_{i=1}^{n}$ denote the columns of $S$. Then
$$A[v_1, \ldots, v_n] = [v_1, \ldots, v_n]\begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix} = [\lambda_1 v_1, \ldots, \lambda_n v_n] \;\Rightarrow\; A v_i = \lambda_i v_i \quad\text{for } i = 1, \ldots, n.$$
Hence the $i$th column of $S$ is an eigenvector of $A$ corresponding to the $i$th diagonal entry of $\Lambda$.

Solution 2.24 We know from Exercise 2.19 that $\|UA\|_F = \|A\|_F$ for unitary $U$. Therefore
$$\|A\|_F^{2} = \|U\Sigma V^{H}\|_F^{2} = \|\Sigma V^{H}\|_F^{2} = \|\Sigma\|_F^{2} = \mathrm{tr}\bigl(\Sigma^{H}\Sigma\bigr) = \sigma_1^{2} + \sigma_2^{2} + \cdots + \sigma_n^{2} \;\Rightarrow\; \|A\|_F = \bigl(\sigma_1^{2} + \sigma_2^{2} + \cdots + \sigma_n^{2}\bigr)^{1/2}.$$
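A one-line numerical check of Solution 2.24 (added, not from the manual): the Frobenius norm equals the root sum of squared singular values.

```python
import numpy as np

A = np.random.default_rng(2).standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s ** 2))))   # True
```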
Solution 2.25 We assume the partitions
$$A = \begin{bmatrix} A_1 & A_2 \end{bmatrix}, \quad B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, \qquad A_1 = \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix},\; A_2 = \begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix},\; B_1 = \begin{bmatrix} B_{11} & B_{12} \end{bmatrix},\; B_2 = \begin{bmatrix} B_{21} & B_{22} \end{bmatrix}.$$
Using properties of block matrix multiplication,
$$AB = \begin{bmatrix} A_1 & A_2 \end{bmatrix}\begin{bmatrix} B_1 \\ B_2 \end{bmatrix} = A_1 B_1 + A_2 B_2, \qquad A_1 B_1 = \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix}\begin{bmatrix} B_{11} & B_{12} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} & A_{11}B_{12} \\ A_{21}B_{11} & A_{21}B_{12} \end{bmatrix}.$$
$A_2 B_2$ can be determined similarly, and the desired result is obtained by adding the two block products.

Solution 2.26 Let $A = \mathrm{circ}(a_1, a_2, \ldots, a_n)$ be a circulant matrix and let $\Pi_m$ be the cyclic-shift permutation matrix. Multiplying by $\Pi_m$ on the left cyclically shifts the rows of $A$, and multiplying by $\Pi_m^{T}$ on the right cyclically shifts the columns. Since each row (and column) of a circulant matrix is the cyclic shift of the previous one, the two shifts cancel and
$$\Pi_m A \Pi_m^{T} = A.$$
Equivalently, a circulant matrix $A$ can be written as
$$A = \mathrm{circ}(a_1, \ldots, a_n) = \mathrm{circ}(a_n, a_1, \ldots, a_{n-1})\,\mathrm{circ}(0, 0, \ldots, 0, 1) = \mathrm{circ}(0, 1, 0, \ldots, 0)\,\mathrm{circ}(a_1, \ldots, a_n)\,\mathrm{circ}(0, 0, \ldots, 0, 1) = \Pi_m A \Pi_m^{T}.$$
Hence a square matrix $A$ is circulant if and only if $\Pi_m A \Pi_m^{T} = A$.
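A short numerical illustration of the circulant-matrix properties discussed in Solutions 2.26-2.28 (an added sketch, not from the manual): a circulant matrix is invariant under the simultaneous cyclic row/column shift, and its eigenvalues are the DFT of its first column. The first column used below is an arbitrary example.

```python
import numpy as np

c = np.array([4.0, 1.0, 2.0, 3.0])                        # arbitrary first column
n = len(c)
A = np.column_stack([np.roll(c, j) for j in range(n)])    # circulant: A[i, j] = c[(i - j) % n]
P = np.roll(np.eye(n), 1, axis=0)                         # cyclic-shift permutation matrix

print(np.allclose(P @ A @ P.T, A))                        # Solution 2.26: invariance holds
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.fft.fft(c))))        # eigenvalues are the DFT of c
```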
Solution 2.27
– a. Let us denote the product $AB$ by $C$. If $A$ and $B$ are circulant matrices, then by the theorem proved in the previous exercise, $A = \Pi_m A \Pi_m^{T}$ and $B = \Pi_m B \Pi_m^{T}$, so
$$AB = \Pi_m A \Pi_m^{T}\,\Pi_m B \Pi_m^{T} = \Pi_m AB\, \Pi_m^{T},$$
using the fact that $\Pi_m^{T}\Pi_m = I$. Replacing $AB$ by $C$ gives $C = \Pi_m C \Pi_m^{T}$, so $C$ is a circulant matrix.
– b. Denote $A^{-1}$ by $D$. We can write $A = \Pi_m A \Pi_m^{T}$, hence $A^{-1} = \Pi_m^{-T} A^{-1} \Pi_m^{-1}$, i.e., $D = \Pi_m^{-T} D \Pi_m^{-1}$. Using the fact that $\Pi_m^{-T} = \Pi_m$ and $\Pi_m^{-1} = \Pi_m^{T}$, we have the desired result.
Solution 2.28 We proceed by proving that the circulant matrix C can be diagonalized by the DFT matrix F. 1 1 1 .. . 1 First we show that the matrix
√F N
1
1
e−j2π/N e−j4π/N .. .
... ... ... .. .
e−j2π(N −1)/N e−j4π(N −1)/N .. .
e−j2π(N −1)/N
...
e−j2π(N −1)
2
/N
is a unitary matrix. N −1 X † F F l,m = e+j2πlk/N e−j2πmk/N k=0
=
N −1 X
e+j2π(m−l)k/N
k=0
When m = l, N −1 X
e+j2π(m−l)k/N =
k=0
N −1 X
1=N
k=0
When m 6= l, N −1 X
e+j2π(m−l)k/N =
k=0
N −1 X
(e+j2π(m−l)/N )k
k=0
=
1 − (e+j2π(m−l)/N )N 1−1 = =0 1 − (e+j2π(m−l)/N ) 1 − (e+j2π(m−l)/N )
23
⇒ F† F l,m = ⇒
√F N
N 0
l=m l 6= m
is unitary.
Now we show that C is diagonalized by F i.e., CF = F∧ where ∧ is diagonal. Consider the l, mth element of CF [CF]l,m =
N −1 X
e−j2π(m−1)k/N ck mod(N )
k=0
= e−j2π(l−1)k/N
N −1 X
e−j2π(m−1)k/N ck mod(N ) = e−j2π(l−1)k/N = [CF]1,m
k=0
⇒ CF = Fdiag([CF]1,1 , [CF]1,2 , . . . , [CF]1,m ) It proves that the eigenvalues of matrix are λi =
N −1 X
e−j2π(ik)/N ci
k=0
and the corresponding eigenvectors are 1 xk = √ 1 n
e−j2πk/N
e−j2π(2k)/N
...
e−j2π(N −1)k/N
Solution 2.29 The determinant of a Vandermonde matrix A can be determined by the following equation |A| =
Y
(aj − ai )
1≤i 0 2
⇒ kxi k λH i >0 2
Now since, kxi k > 0, we have λi > 0.
24
CHAPTER 2. LINEAR ALGEBRA – b. Using the hint that any non-zero vector x in Rn can be written as x = α1 x1 + . . . + αn xn , where αi = xT xi . We can write Ax = α1 Ax1 + . . . + αn Axn = α1 λ1 x1 + . . . + αn λn xn T
x Ax = α1 λ1 xT x1 + . . . + αn λn xT xn = α12 λ1 + . . . + αn2 λn Each of the terms in the above sum is positive. Therefore, xT Ax > 0 and A is positive definite. Solution 2.31 Using properties of block matrix multiplication, we have A 0 I A−1 B A B = C E 0 I C CA−1 B + E Putting E = D − CA−1 B yields the desired result. Solution 2.32 Using results from the previous exercise, M=
A B
0 I A−1 B E 0 I
and the given hint, we have A |M| = B
0 I A−1 B E 0 I
= |A| |E| where E is the Schur complement. Solution 2.33 A generalized inverse of a matrix M−1 satisfies MM− M = M Using properties of block matrix multiplication A B A− + A− BE− CA− −A− BE− A B − MM M = C D −E− CA− E− C D − − − − − − AA + AA BE CA − BE CA −AA− BE− + BE− A B = CA− + CA− BE− CA− − DE− CA− −CA− BE− + DE− C D Using E = D − CA− B and after some simplification,
A B MM M = C D −
25 Solution 2.34 – a. Define D = Am×n + Bm×n .
K = Dm×n ⊗ Cp×q
d11 C d12 C . . . d21 C = .. . dm1 C dmn C (a11 + b11 ) C (a12 + b12 ) C . . . (a21 + b21 ) C = .. .. . . (am1 + bm1 ) C
(amn + bmn ) C
= Am×n ⊗ Cp×q + Bm×n ⊗ Cp×q – c.
βb12 βb21 .. .
... ... .. .
βb1q βb21 .. .
αam1 αam2 . . . αamn βbp1 βbp2 αa11 βB αa12 βB . . . αa1n B αa21 βB αa21 βB . . . αa21 B = .. .. .. .. . . . . αam1 βB αam2 βB . . . αamn B αa11 βB αa12 βB . . . αa1n B αa21 βB αa21 βB . . . αa21 B = αβ .. .. .. .. . . . .
...
βbpq
αa11 αa21 (αAm×n ) ⊗ (βBp×q ) = . ..
αa12 αa21 .. .
αa1n βb11 βb21 αa21 .. ⊗ .. . .
... ... .. .
αam1 βB αam2 βB . . .
αamn B
= αβ(Am×n ) ⊗ Bp×q ) Parts (b) and (d) can be solved in a similar way. Solution 2.35 – We want to compute the gradient of xT a
∂ ∂x1 ∂ ∂x2
a1 m X a2 ∂xT a ∂xT a = . =a xi ai = . ⇒ . ∂x ∂x .. i=1 . ∂ am ∂x m
Similarly, ∂aT x =a ∂x – Now, we compute the gradient of xT Ax where A is symmetric.
∂ ∂x1 ∂ ∂x2
∂xT Ax = . ∂x ..
∂ ∂xm
m X m X xi xj Aij i=1 j=1
26
CHAPTER 2. LINEAR ALGEBRA Let us consider the first component, m m ∂ XX xi xj Aij ∂x1 i=1 j=1
For i 6= 1 and j 6= 1, there is no contribution to the gradient. For i = 1 and j 6= 1, we have m X ∂ X xi xj Aij = xj A1j when computed for i = 1. ∂x1 j6=1
j6=1
Similarly, for i 6= 1 and j = 1, we get m X X ∂ X xi xj Aij = xi Ai1 = xj A1j ∂x1 i6=1
i6=1
j6=1
When i = j = 1, we get X ∂ xi xj Aij = x21 A11 = 2x1 A11 ∂x1 i6=1
Combining the above three equations m m m X ∂ XX xj A1j xi xj Aij = 2 ⇒ ∂x1 i=1 j=1 i=1
we get the first element of the product 2Ax. Combining the results we get the required answer. An alternative way to prove ∂xT Ax = Ax + AT x = 2Ax ∂x is to recall the differentiation formula: ∂(f (x) · g(x)) ∂f (x) ∂g(x) = · g(x) + f (x) · ∂x ∂x ∂x Now interpret ∂xT Ax ∂(bT x) ∂(xT c) = + ∂x ∂x ∂x where bT = xT A and c = Ax are treated at the time of computing the above gradients wrt to x as constant vectors that do not dependent on x. since ∂(bT x) ∂(xT c) = b, = c. ∂x ∂x It follows that ∂xT Ax = b + c = AT x + Ax = 2Ax. ∂x The remaining properties can be proved in a similar way.
Chapter 3
Elements of Galois Fields Solution 3.1 Let a ∈ G and assume that b1 , b2 ∈ G are two distinct inverses of this element. Then a · b1 = a · b2 = e (e: identity element). Applying the inverse of “a” to both sides, we get b1 = b2 , which is a contradiction. Hence the inverse of “a” is unique. Solution 3.2 Let a ∈ G, since a & p are relatively prime (p is prime and a < p). Using Euclid’s algorithm, we can write a · n = p · r + 1 with 1 ≤ n < p. Therefore a · n mod-p is 1, i.e., “n” which is also an element of G, is the inverse of “a”. Solution 3.3 . 1 1 1 2 2 3 3 4 4 5 5 6 6
2 2 4 6 1 3 5
3 3 6 2 5 1 4
4 4 1 5 2 6 3
5 5 3 1 6 4 2
6 6 5 4 3 2 1
element 1 2 3 4 5 6
inverse 1 4 5 2 3 6
order 1 3 6 3 6 2
Solution 3.4 [Solution is adopted from [1, p.31].] Let aH & bH be two distinct cosets of H. Assume that they have a common element. Then a · h1 = b · h2 for some h1 , h2 ∈ H, −1 0 0 which implies that a = b · h2 · h−1 1 . Since h2 h1 ∈ H, we can also write a = b · h with h ∈ H, which means that 0 0 0 aH = bh H = {b(h h) : h ∈ H} as {h h : b · H} lists all the elements of H. In other words, a · H = b · H, i.e., the two cosets are identical, which is a contradiction. Therefore, two different cosets of H cannot have common elements.
Solution 3.5 0 = 0 · b = (a + (−a)) · b = a · b + (−a) · b, hence “ (−a) · b” is the additive inverse of “a · b”, i.e., (−a) · b = − (a · b) . The other equality is proved in a similar manner. Solution 3.6 −1 a·b= a · c & a = 6 0. Since “a” has an inverse denote it by a , we can write −1 −1 a ·a·b= a · a · c ⇒ b = c. 27
28
CHAPTER 3. ELEMENTS OF GALOIS FIELDS
Solution 3.7 Let {b1 , b2 , ..., bq−1 } denote all the non-zero field elements. Construct the set of elements: {a · b1 , a · b2 , ..., a · bq−1 } . Clearly, all these elements of the field must be distinct (otherwise abi = abj (f or i 6= j) ⇒ a·(bi − bj ) = 0 ⇒ bi −bj = 0 ⇒ bi = bj ). The two products b1 b2 ...bq−1 & ab1 ab2 ...abq−1 must be identical (both are the products of all non-zero field elements). Therefore,
b1 b2 ...bq−1 = aq−1 b1 b2 ...bq−1 | {z } | {z } non−zero
and multiplying both sides with (b1 b2 ...bq−1 )
−1
non−zero
, we obtain aq−1 = 1.
Solution 3.8 αX + Y + α3 Z = α5
(1)
α3 Y + αZ = α
(2)
α5 X + α2 Z = α3
(3)
Multiplying (2) by α4 , we get Y = α5 + α5 Z, and (3) by α2 we obtain X = α4 Z + α5 . Substituting these into (1), we obtain α6 + α5 + α5 + α5 + α3 Z = α5 ⇒ α3 Z = α6 ⇒ Z = α3 . Substituting this into (2) and (3), we can solve for X and Y as X = α5 + α4 · α3 = 1 + α + α2 + 1 = α4 , Y = α5 + α8 = 1 + α + α2 + α = α6 . To summarize, the solution of the set of equations is X = α4 ; Y = α6 ; Z = α3 Solution 3.9 Assume that φ(x) does not divide P (x) and write: P (x) = φ(x)q(x) + r(x) with r(x) 6= 0. Substituting X = b, we can write P (b) = φ(b)q(b) + r(b). Since P (b) = φ(b) = 0 ⇒ r(b) = 0. But, this is a contradiction as r(x) has a degree smaller than that of the minimal polynomial φ(x). Hence φ(x) must divide P (x).
29 Solution 3.10 Consider the conjugacy class of a non-zero element b in GF (2m ) b, b2 , b4 , . . . . m
m
Clearly b2 = b (since b2 −1 = 1 for any non-zero field element). Then, clearly the number of distinct elements in the above sequence is at most m. Hence φ(x) must have a degree less than or equal to m. Solution 3.11 We need to evaluate (X + α6 )(X + α12 )(X + α9 )(X + α3 ) where α is primitive and α4 = α + 1, (α ∈ GF (24 )). Using the table generated in (Ex. 12) we get: φ(x) = (X + (α6 + α12 )X + α18 )(X 2 + (α9 + α3 )X + α12 ) = (X 2 + (α + 1)X + α3 )(X 2 + αX + α12 ) = (X 2 + α4 X + α3 )(X 2 + αX + α12 ) = X 4 + (α + α4 )X 3 + (α12 + α5 + α3 )X 2 + (α16 + α16 )X + α15 = X 4 + X 3 + X 2 + X + 1. Solution 3.12 b1 with order 15 is contained in GF (24 ). Its distinct conjugates are: {b1 , b21 , b41 , b81 } since b16 1 = b1 . Hence its minimal polynomial has degree 4. b2 with order 1023 is contained in GF (210 ). Its distinct conjugates are: 32 64 128 256 512 {b2 , b22 , b42 , b82 , b16 2 , b2 , b2 , b2 , b2 , b2 },
hence its minimal polynomial has degree 10. Solution 3.13 Let b ∈ GF (2m ) have order 25. Its conjugacy classes are (using b25 = 1): {1} {b, b2 , b4 , b8 , b16 , b7 , b14 , b3 , b6 , b12 , b24 , b23 , b21 , b17 , b9 , b18 , b11 , b22 , b19 , b13 } {b5 , b10 , b20 , b15 } Hence there are three irreducible factors of degrees 1, 20 and 4.
Chapter 4
Numerical Analysis Solution 4.1 Absolute Error
Relative Error
– a.
−1.1116e + 003
−9.00369e − 001
– b.
−1.0678e − 004
−1.51011e − 004
– c.
1.2645e − 003
4.02499e − 004
– d.
1.0191e + 000
1.12100e + 001
– e.
9.9900e − 004
1.00000e − 003
– f.
1.7182e − 003
6.32080e − 004
Solution 4.2 Using 3-digit arithmetic with rounding, – a. a = 1/3, a ˆ = 0.333, relative error = −1.0e − 3 – b. x = 0.8. Rounding the value of xi for i = 0 : n in each iteration, and using it to compute the sum which is also rounded to 3 digits gives a final value of 4.57 which is different from the actual 15-digit value 4.57050327040000. – c. 5 – d. x = −0.8. Rounding the value of xi for i = 0 : n in each iteration, and using it to compute the sum which is also rounded to 3 digits gives a final value of 0.603 which is different from the actual 15-digit value 0.603277414400000. – e. 0.556 Solution 4.3 The table summarizes the relative error in computing ex for x = 10 k = 1, approx ex = 1.10000000e + 1, exact ex = 2.20264658e + 4, rel error = −9.99500601e − 1 k = 4, approx ex = 6.44333333e + 2, exact ex = 2.20264658e + 4, rel error = −9.70747312e − 1 k = 8, approx ex = 7.33084127e + 3, exact ex = 2.20264658e + 4, rel error = −6.67180321e − 1 k = 16, approx ex = 2.14308347e + 4, exact ex = 2.20264658e + 4, rel error = −2.70416098e − 2 Solution 4.4 The table summarizes the relative error in computing ex for x = −10 k = 1, approx ex = −9.00000000e + 0, exact ex = 4.53999298e − 5, rel error = −1.98239192e + 5 k = 4, approx ex = 2.91000000e + 2, exact ex = 4.53999298e − 5, rel error = 6.40970055e + 6 k = 8, approx ex = 1.34258730e + 3, exact ex = 4.53999298e − 5, rel error = 2.95724523e + 7 k = 16, approx ex = 1.79453698e + 2, exact ex = 4.53999298e − 5, rel error = 3.95272974e + 6
30
31 Solution 4.5 The table summarizes the results x f 0 (x) 0.000000e + 000 1.003225e + 000 7.853982e − 001 2.026105e + 000 1.374447e + 000 3.492044e + 001
f 0 (θ) = 1/cos2 (θ) 1.000000e + 000 2.000000e + 000 2.627414e + 001
relativeerror 3.225197e − 003 1.305237e − 002 3.290801e − 001
Solution 4.6 The general solution for the recurrence is given by yk = α1 (1/3)k + α2 (2)2 The specific solution is obtained for α1 = 1 and α2 = 0. However, the floating-point representation of the initial value y1 is not exact, which causes α2 to be a small nonzero value. The error introduced in α2 allows the second term to be part of the computed solution. Even though α1 and α2 are not computed, the solution implicitly includes the two terms. The second term grows rapidly and dominates the solution. Solution 4.7 – a. Large relative error occurs when x is very small. For small values of x, one should use the following formula: √ f (x) = x/(10 + 100 − x) – b. Large relative error occurs when x is very small. For small values of x, one should use the following formula: f (x) = 2x/(1002 − x2 ) Solution 4.8 Condition number of the problem is given in (2.9): |f 0 (x)x/f (x)|. For f (θ) = tan(θ), it will be equal to κ(θ) = |θ/(sin(θ) cos(θ)|. – a. κ(π/4) = 1.57 – b. κ(π/2) = ∞ Solution 4.9 mach = 0.5 × β 1−p = 0.5 × 21−31 = 2−31 . The decimal digits of accuracy are given by − log10 |mach | which equals about 9.33 digits. Solution 4.10 – a. 104 – b. 0.9999 – c. 10 – d. Underflow – e. 10−1 – f. Overflow Solution 4.11 If a and b are very large numbers of the same sign, we can have overflow in the first formula but the second one computes the answer correctly. The opposite is true when a and b are large numbers with opposite sign.
32
CHAPTER 4. NUMERICAL ANALYSIS
Solution 4.12 The second formula incurs cancelation error that is more harmful because the two quantities being subtracted are large and nearly equal. The cancelations in the first formula tend to be less problematic. As an example, consider √ x1 = 1, and xi = mach/2 , i = 2, . . . , n, and check that the second formula can give a negative value for standard deviation. Solution 4.13 – a. Vandermonde approach requires solving the system 1 −1 1 a1 2 1 0 0 a2 = −1 1 1 1 a3 1 which provides the values of the coefficients: a1 = −1, a2 = −1/2, a3 = 5/2. This gives the polynomial p2 (x) = (5/2)x2 − x/2 − 1. – b. Lagrange method gives the polynomial: p2 (x) = 2x(x − 1)/((−1)(−2)) + (−1)(x + 1)x − 1)/((1)(−1)) + (1)(x(x + 1)/((2)(1)) – c. Newton’s method gives the polynomial: p2 (x) = 2 + (x + 1)(−3) + (5/2)(x)(x + 1) It is easy to see that these polynomials are identical. Solution 4.14 Use the divided differences to compute the interpolating polynomial in Newton’s form. The common way to represent these differences are in a tabular form xi yi = f (xi ) f (xi , xi+1 ) f (xi , . . . , xi+2 ) f (xi , . . . , xi+3 ) f (xi , . . . , xi+2 ) 1 36 100−36 = 64 2 100 1 144−100 44−64 3 144 = 44 1 3−1 = −10 144−144 0−44 −22+10 4 144 = 0 = −4 1 4−2 = −22 4−1 100−144 44−0 −22+22 0+4 5 100 = −44 = −22 =0 1 5−3 5−2 5−1 = 1 The values on the diagonal are the coefficients of the Newton’s form, i.e., c0 = 36, c1 = 64, c2 = −10, c3 = −4, c4 = 1. Note that the table can be computed one row at a time starting from the first row. This is equivalent to adding data points one at a time. Solution 4.15 There are four intervals of equal width h = 1. We need to compute four functions si (x), i = 1, 2, 3, 4. We first determine σi , i = 2, 3, 4 by solving the tri-diagonal system shown in Example 2.4.5 with the right hand side made up of gi , i = 2, 3, 4 given by equation 2.30: g2 = (144 − 100) − (100 − 36)) = −20, g3 = ((144 − 144) − (144 − 100)) = −44 g4 = ((100 − 144) − (144 − 144)) = −44 The solution is σ2 = −3, σ3 = −8, σ4 = −9. These values need to be used in equation 2.29 to determine the spline interpolant. Solution 4.16 Using Newton’s interpolation technique, one gets erf (x) = 0.2763260 + 0.9766960(x − 0.25) + 0.5205000(x − 0.25)(x − 0.5) – a. erf(5/8) via interpolation: 0.6225180, relative error: 1.16e-3 – b. erf(7/8) via interpolation: 0.786415, relative error: 2.98e-3
33 Solution 4.17 Since f (x) = ax + b and f 0 (x) = a, Newton’s iteration will be xk+1 = xk − (axk + b)/a which simplifies to xk+1 = −b/a Solution 4.18 Since f (x) = x2 − a and f 0 (x) = 2x, Newton’s iteration will be xk+1 = xk − (x2k − a)/(2xk ) which simplifies to xk+1 = (1/2) × (xk + a/xk ) Solution 4.(19-25) Note: No solution is provided for Problems 19-25 since these are implementations of the algorithms provided in the text.
Chapter 5
Combinatorics Solution 5.1 4 + 3 + 7 = 14 (addition principle). Solution 5.2 2n+1 n+1 or
(2n+1)! n!(n+1)! (multiplication
principle).
Solution 5.3 4 × 6 × 2 = 48. Solution 5.4 263 × 103 . Solution 5.5 Minimum is 24; each vertex is on just one edge. Maximum is
48 2
= 1, 128, for the complete graph.
Solution 5.6 262 × 102 × 524 = 494, 265, 241, 600. Solution 5.7 24 × 23 × 22 × 21 = 255, 024. Solution 5.8 There are 9 possibilities for the first place, 2 for the second, 10 for each other, so the first answer is 18 × 105 = 1, 800, 000. In the second case there are 9 possible first integers and 19 possibilities for the secomd and third together, so 9 × 19 × 104 = 1, 620, 000. Solution 5.9 First we’ll coint the possibilities up to one million. Consider the strings of 6 digits, from 000000 to 999999. (000000 represents 1,000,000; otherwise abcdef represents abc, def with leading zeroes deleted.) There are 6 places to put 3; for each choice there are 5 places to put 5; for each of these there are 4 places to put 7; there are 7 choices for each other number. The total is 6 × 5 × 4 × 73 = 41, 160. There are the same number from 1,000,001 to 2,000,000. So the answer is 2 × 41, 160 = 82, 320.
34
35 Solution 5.10 This is equivalent to finding – a. the total number of 4-digit numbers – b. the number of 4-digit numbers with no repeated digits. The answer to – a. is 9000 (all the numbers from 1000 to 9999). – b. there are 9 choices (any of 1 to 9) for the first digit (x say), 9 for the second (0 is now available but x is not), 8 for the third and 7 for the fourth, so the answer is 9 × 9 × 8 × 7 = 4536. Solution 5.11 Since the sites are mutually exclusive (the student will join only one site), the addition principle applies; we add the numbers for each site. For a given site, befriending one acquaintance does not affect whether or not the student befriends another, so that the multiplication principle applies. Thus, the first site provides the student with 23 or 8 possibilities; the second, 25 = 32; the third, 64; and the fourth, 16. We add these to find 120 possibilities. Solution 5.12 – a. There are 73 = 35 sets that include a and 35 that include b. If we count both sets, we have counted the 15 sets that contain both a and b twice. 35 + 35 − 15 = 55. – b. 35 + 35 − 2 × 15 = 40 Solution 5.13 199 − 66 − 28 + 9 = 114 Solution 5.14 5000 − 1250 − 1000 − 833 + 250 + 416 + 166 − 83 = 2666 Solution 5.15 – a. 0; – b. 28 + 34 − 47 = 15; – c. 28 Solution 5.16 – a. (x3 + 4x2 + x)/(1 − x)4 . – b. (4x2 + 3x − 1)/(1 − x)3 . – c. x/(1 − x)4 . – d. 1/(1 − 3x) − 3/(2 − 4x). Solution 5.17 – a. We expand (1 + x + x2 + x3 )3 and the coefficient of xn is the result. This is 1 + 3x + 6x2 + 10x3 + 12x4 + 12x5 + 10x6 + 6x7 + 3x8 + x9 . – b. Similarly, (x + x2 + x3 )3 = x3 + 3x4 + 6x5 + 7x6 + 6x7 + 3x8 + x9 . – c. For at least 2 banana-walnut muffins, we expand (x2 + x3 )(1 + x + x2 + x3 )2 , to get: x2 + 3x3 + 5x4 + 7x5 + 7x6 + 5x7 + 3x8 + x9
36
CHAPTER 5. COMBINATORICS – d. For an unlimited supply, (1 + x + x2 + . . . )3 = (1 − x)−3 . to see that we can do this in C(n + 3 − 1, 3 − 1) = (n2 + 3n + 2)/2 ways. To find the recurrence, expand the denominator; this is 1 − 3x + 3x2 − x3 . This gives the recurrence an = 3an−1 − 3an−2 + an−3 .
Solution 5.18 – a. an = 4n . – b. an = (n − 2)2n . – c. an = (−3)n . – d. an = an = 2n . – e. an = 4n − (−3)n . – f. an = n2n−1 . Solution 5.19 Call x0 the center, edges from x0 spokes and the other edges outer edges. There are n spokes and n outer edges, total 2n. x0 has degree n, and all others have degree 3. Solution 5.20 Minimum 38; maximum 171. Solution 5.21 Say there are v vertices. Minimum degree is 0, maximum is n − 1, so if all are different, each of 0, 1, . . . , n − 1 must be a degree. One vertex (degree 0) is adjacent to no other vertex; call it x. One vertex (degree n − 1) is adjacent to every other vertex; call it y. So x and y are both adjacent and non-adjacent, a contradictionl. Solution 5.22 Case n = 1 is easy. Assume r+1 X
Pr
i=1
i(i + 1) = 31 r(r + 1)(r + 2). Then
i(i + 1)
=
1 r(r + 1)(r + 2) + (r + 1)(r + 2) 3
=
r
i=1
3
1 + 1 (r + 1)(r + 2) = (r + 1)(r + 2)(r + 3). 3
Solution 5.23 – a. 4 – b. 4 Solution 5.24 xabcf , xabef , xadef , xdabcf , xdabef , xdef . Solution 5.25 Go along the path x, a, . . . until the first time you meet a vertex (other than x) that appears in both paths. Say that vertex is c. Then follow the second path back from c to y. The result is a closed walk containing x. The walk will be a cycle unless it contains some repetition, and this will only happen if a = b. In the latter case, the walk generated is not a cycle, it goes xax, a repeated edge. Solution 5.26 – a. No; 1 – b. Yes – c. No; 3
37 – d. Yes – e. No; 3 – f. 3 Solutions to (b) and (d) are found by the algorithm. Here are sample solutions for the eulerizations. (a)
(c)
(e)
(f)
Solution 5.27 One well-known example is:
Solution 5.28 – a. From a: 191 (adceba); from c: 191 (cdabec); sorted edges: 191 (adceba). – b. From a: 154 (abdcea); from c: 161 (cabdec); sorted edges: 153 (abecda). – c. From a: 43 (acbdea); from c: 36 (cadbec); sorted edges: 36 (dabecd). – d. a: 183 (adbcea); from c: 166 (cbedac); sorted edges: 166 (cbedac). Solution 5.29 Costs: – a. 37 – b. 24 – c. 32 – d. 39 (a) a
b
6 d
e
6
9 g
5 3
8 h
5
(c) a
3
d f 4 4
(b) a
f
6 4
i
f
(d)
a c
3
3 g
6 i 4
j
2 8 6 d 2
g
3
e 4
b 3
2
6
3 5
9
c
c
b 6
h
4 4
3 7
4 g
2 e 3 h b
5 d
c
4 5
8
e 3 6 5 8 h i 2 2 3 4 k l 6
f 4 j
38
CHAPTER 5. COMBINATORICS
Solution 5.30 – a. 3 – b. 7 – c. 1 – d. 6 Solution 5.31 – a. 1011010 – b. 1101000 – c. 0011001 – d. 1010001 Solution 5.32 – a. 1100 – b. 0001 – c. 1001 – d. 0101 – e. 0101 – f. 1010 Solution 5.33 – a. Suppose a Latin square L has first row (a1 , a2 , . . . , an ). Carry out the permutation 1 7→ a1 , 2 7→ a2 , . . . , n 7→ an on the columns of L. Permute the rows in the corresponding fashion. – b. Each reduced square gives rise to n! different squares by column permutation. Once a column permutation is chosen, there are (n − 1)! different row permutations available (leaving row 1 as the first row). The n!(n − 1)! resulting squares are all different (any two will differ somewhere in the first row or column). – c. r3 = 1, r4 = 4. – d. The reduced squares are: 1 2 3 4
2 1 4 3
3 4 1 2
4 3 2 1
1 2 3 4
2 1 4 3
3 4 2 1
4 3 1 2
1 2 3 4
2 3 4 1
3 4 1 2
4 1 2 3
1 2 3 4
2 4 1 3
3 1 4 2
4 3 2 1
Solution 5.34 – a. Where will the entry (1,1) occur in (A, AT )? If it is in position ij, it also appears in position ji. To avoid two occurrences, we need i = j, a diagonal position. So 1 appears on the diagonal somewhere. The same argument applies to every symbol. So each of 1, 2, . . . , n is on the diagonal. There is just enough room for each symbol to appear exactly once. – b. Say A is a self-orthogonal Latin square of side 3 whose diagonal is (1, 2, 3). The only possible (1, 2) entry is 3, and it is also the only possible (2, 1) entry. Therefore (3, 3) appears too many times in (A, AT ).
39 – c. 1 4 2 3
3 2 4 1
4 1 3 2
1 4 5 3 2
2 3 1 4
3 2 4 5 1
2 5 3 1 4
5 1 2 4 3
4 3 1 2 5
– d. 0101 – e. 0101 – f. 1010 Solution 5.35 Three blocks are 12, 23 and 13. The other two must be either two copies of the same block (e.g. {12, 23, 13, 12, 12}) or different blocks (e.g. {12, 23, 13, 12, 13}). Any example can be derived from one of these two by permuting the names of varieties. Solution 5.36 Write S for the set comprising the first v positive integers, and write Si for the set formed by deleting i from S. Then blocks S1 , S2 , . . . , Sv form the required design. Solution 5.37 The fourth and sixth lines cannot be completed. Others: v
b
r
k
λ
7
14
6
3
2
10
15
6
4
2
66
143
13
6
1
21
30
10
7
3
Solution 5.38 Clearly λ ≤ r. If λ = r then every block must contain every element, and the design is complete. Solution 5.39 In a symmetric design, λ(v − 1) = k(k − 1). The right-hand side is always even, so v even implies v − 1 is odd, and λ must be even. Solution 5.40 In each case v is even but k − λ is not a perfect square.
Chapter 6
Probability, Random Variables and Stochastic Processes Solution 6.1 – a. If the maximum number that is selected equals 20, then the numbers of the remaining 9 balls must be in (19 9) the set {1, 2, . . . 19}. The probability of this event is given by 100 . ( 10 ) – b. The probability that the smallest numbered ball equals 20 is given by
(79 9) (100 10 )
Solution 6.2 The joint PMF PX ,Y (x, y) is given in Table 6.1. The marginal PMFs are given in Table 6.2. It can be clearly seen that the joint PMF does not equal the product of the marginal PMFs and hence the random variables X and Y are not independent.
X X X X
= = = =
0 1 2 3
Table 6.1: The joint PMF of random variables X and Y =0 Y =1 Y =2 0 p2 (1 − p)3 2p(1 − p)4 p3 (1 − p)2 2p2 (1 − p)2 p(1 − p)3 (1 + 3p) 2p4 (1 − p) p3 (1 − p)(4 − 3p) 2p2 (1 − p)2 5 4 p 2p (1 − p) p3 (1 − p)2
Y. Y=3 (1 − p)5 2p(1 − p)4 p2 (1 − p)3 0
Table 6.2: The marginal PMFs of random variables X and Y. 0 1 2 3 PX (1 − p)3 3p(1 − p)2 3p2 (1 − p) p3 PY p3 3p2 (1 − p) 3p(1 − p)2 (1 − p)3
Solution 6.3 Let the number of particles observed equal k. The conditional probability of observing k particles if element 1 −x k was selected equals e k!x . Similarly if element 2 was selected the conditional probability of observing k particles equals
e−y y k k! .
The optimal decision is to select the particle for which the probability is larger. Thus element 1 is 40
41 selected if e−x xk k! x−y =⇒ k > =δ log(x/y)
>
e−y y k k!
Similarly, element 2 is selected if k ≤ δ. The probability of error is given by 0.5
Pbδc
k=0
e−x xk k!
+ 0.5
P∞
k=bδc+1
e−y y k k!
Solution 6.4 Suppose there are m Heads observed in the N observations. Then the conditional probability of obtaining this 1 m N −m N sequence of observations given that coin k was selected can be calculated as Pm|k = m . The 1 − k1 k optimal decision rule is to select k for which Pm|k is largest. Solution 6.5 First, it can be recognized from the characteristic function that N is a Poisson random variable. The mean of X (t) can be calculated as 2 X E {X (t)} = E {Ni } E {sin(2πf0 t + θi )} = 0 i=1
The autocorrelation can be calculated as ! 2 2 X X RX (t, τ ) = E Ni sin(2πf0 t + θi ) Nj sin(2πf0 (t + τ ) + θj ) i=1
=
2 X 2 X
j=1
E {Ni Nj } E {sin(2πf0 t + θi ) sin(2πf0 (t + τ ) + θj )}
i=1 j=1
=
2 X
E Ni2 cos(2πf0 τ )
i=1
=
2λ cos(2πf0 τ )
Solution 6.6 The one-step transition probability diagram is given in Figure 6.1. The transition probability matrix for this Markov chain is given in Table 6.3. The steady state probabilities can now be computed as 1/9 for all states. 0 1/6 1/6 1/6 1/6 0 1/6 0 1/6 1/6 0 1/6 1/6 1/6 1/6 0 1/6 0 1/6 1/6 0 0 1/6 1/6 1/6 0 1/6 1/6 1/6 0 0 1/6 1/6 1/6 1/6 0 1/8 1/8 1/8 1/8 0 1/8 1/8 1/8 1/8 0 1/6 1/6 1/6 1/6 0 0 1/6 1/6 1/6 0 1/6 1/6 1/6 0 0 1/6 1/6 0 1/6 0 1/6 1/6 1/6 1/6 0 1/6 1/6 0 1/6 0 1/6 1/6 1/6 1/6 0 Table 6.3: The transition probability matrix for the motion of the queen on the 3 × 3 chessboard.
Solution 6.7 The value of c can be computed as 4 by setting the integral of the joint PDF equal to 1. It can be clearly recognized that the joint PDF of A, B has non-zero values only over the region {(a, b) : 0 < a < b < 1}. Hence, ( fX ,Y (a, b) + fY,X (b, a) = 8ab 0 < a < b < 1 fA,B (a, b) = 0 otherwise
42
CHAPTER 6. PROBABILITY, RANDOM VARIABLES AND STOCHASTIC PROCESSES
Figure 6.1: The transition probability diagram for the Markov chain representing the motion of the queen on a 3 × 3 chessboard.
The marginal PDFs can be calculated as follows Z fA (a) = 8a
1
b=a Z b
fB (b)
=
8b
b db = 4a(1 − a2 ), 0 < a < 1 a da = 4b3 , 0 < b < 1
a=0
Solution 6.8 Let gi represent the growth in value of the stock on the ith day of trading of the stock. Then Pr (gi = 1.02) = Q365 0.1, Pr (gi = 0.99) = 0.2 and Pr (gi = 1) = 0.7. The value of the stock at the end of the year X = 100 i=1 gi . A simple trick of calculating log(X ) enables us to readily calculate the value of E {X } using the Central Limit Theorem. Thus,
E {log(X )} = log(100) +
365 X
E {log(gi )}
i=1
= 1.995
(1)
Similarly the variance of log(X ) can be calculated as σ 2 = 0.1(log(1.02))2 + 0.2(log(0.99))2 − E {log(X )} = 4.06 × 10−3
2
(2)
We can now approximate Y = log(X ) as a Gaussian random variable with mean and variance given by (1) and (2) respectively. The required tail probability of X can be calculated as Pr (X ≥ 105)
=
Pr (log(X ) ≥ log(105)) log(105) − 1.995 √ = Φ = 0.65 4.06 × 10−3
43 Solution 6.9 The PDF of random variable M can be computed as fM (m)
=
1/2fX1 (m) + 1/2fX1 (m) ∗ fX2 (m) ( 1/4 + 1/8m 0 < m ≤ 2 = (4 − m)/8 2 Cα | H0 } = α. Recall that the joint density of all p eigenvalues of a sample covariance matrix of n > p multivariate real-valued Gaussian observations with a population covariance Σ is given by Q −n/2 Q (n−p−1)/2 Q p (l1 , l2 , ..., lp | Σ) = Cn ,p i µi × i1 n l1 −lj 1− 2 2 λ+1 λ+1
Combining and taking logarithms gives p(l ,l ,...,l |H )
log p(l11 ,l22 ,...,lpp |H10 ) =
n 2
h
i λ l1 λ+1 − log(1 + λ) (1 +0 (1))
Hence, asymptotically in sample size, we accept H1 if l1 > 1 +
1 λ
log(1 + λ) + n2 logCα .
That as n → ∞ with p fixed, distinguishing between H0 and H1 should be based only on the largest sample eigenvalue l1 . The same conclusion holds also for complex-valued observations, where the density of sample eigenvalues has a slightly different formula.
48
CHAPTER 7. RANDOM MATRIX THEORY
Solution 7.5 Definition. Almost sure convergence . We say that xn converges almost surely to x if P r[ lim xn = x] = 1 n→∞
∞
Borel Cantelli Lemma. Let {An }n=0 be a sequence of events and let A be the event consisting of an infinite number of event An , n = 1, . . . then if the sum ∞ X
P (An ) < ∞
n=1
then: P
lim sup An = 0
n→∞
that it, there is 0 probability that infinitley many of them will occur. If P (An ) ≤
K n2
then An converges a.e. to A.
(We will use it later as will be implied, we will define An to be that the absolute value of the difference is larger then ε, ans since the sum P (An ) is less then ∞, from Borel Cantelli the limit of the events is zero, the limit of the event is that they are different so they are equal in probability 1.) Another lemma with proof: Θ = [θ1 , ..., θK ] extracted from a Haar matrix. Let X = [x1 , ..., xN ] be an i.i.d. Gaussian matrix. θ1 = where Π1 = I −
x1 kx1 k ,
θ2 =
Π1 x2 kΠ1 x2 k
x1 xH 1 kx1 k2
θK =
ΠK −1 xK kΠK −1 xK k
Note that Πk−1 and xk are stochastically independent. Suppose BN is a function of θ2 , ..., θK . θ1H BN θ1 −
1 N −K T r(Π0 BN )
= f1 (θ1 , ..., θK )
with Π0 = I − ΘΘH + θ1 θ1H ˜ = ΘP . It is clear that fK θ˜1 , ..., θ˜K = Let P be a permutation matrix exchanging column 1 with K. Define Θ f1 (θ1 , ..., θK ) ˜ and Θ have the same distribution, random variables f1 (θ1 , ..., θK ) and fK (θ1 , ..., θK ) are identically disSince Θ tributed which implies: 4
E kf1 (θ1 , ..., θK )k = E kfK (θ1 , ..., θK )k
4
In the following, we focus therefore on column θK of Θ. 4 E kf1 (θ1 , ..., θK )k does not depend on the particular way Θ is generated provided it is Haar distributed. 1 T r (ΠK−1 BN ) N −K H xH Πk−1 xK 1 k ΠK−1
=
xH ΠH BN kΠk−1 xK k − N − K T r (ΠK−1 BN ) K−1 k
H eN = θ K BN θ K −
= e1 ,N +e2 ,N .
49 e1 ,N =
H xH k ΠK−1 BN Πk−1 xK kxk ΠK−1 k2 H xH k ΠK−1 BN Πk−1 xK N −K
e2 ,N =
− −
H xH k ΠK−1 BN Πk−1 xK N −K
1 N −K T r (ΠK−1 BN )
H xH 1 k ΠK−1 BN Πk−1 xK ' T race (ΠK−1 BN ΠK−1 ) N −K N −K 1 = T race Π2K−1 BN N −K 1 = T race (ΠK−1 BN ) . N −K
e1 ,N =
H xH k ΠK−1 BN Πk−1 xK kxk ΠK−1 k2
−
H xH k ΠK−1 BN Πk−1 xK N −K
We rewrite the form: e1 ,N = Let us first show that
H xH k ΠK−1 BN Πk−1 xK N −K
H xH k ΠK−1 BN Πk−1 xK N −K
N −K kxK ΠK−1 k2
−1
is bounded. As e2 ,N converges to 0 almost everywhere,
H xH k ΠK−1 BN Πk−1 xK N −K
K−1 BN ) < 2 T race(Π ≤ 2 kΠK−1 BN k N −K
for N large enough (for any matrix X, T race(X) ≤ kXk Rank(X)) and the rank of ΠK−1 BN does not exceed N-K in our case). H
xH k ΠK−1 BN Πk−1 xK
As supN ∈N ΠH is bounded almost everywhere. K−1 BN < ∞, N −K
e1 ,N =
H xH k ΠK−1 BN Πk−1 xK N −K
N −K kxK ΠK−1 k2
−1
Note that ΠK−1 is independent of xK . N −K−1
2
kΠK−1 xK k is χ2 distributed with 2(N-K) degrees of freedom. Its probability density is the function: (Nt −K−1)! e−t . A direct computation show that: E kx which coincides with O (N )−2 if
K N
→ α.
N −K 2 K ΠK−1 k
4 − 1 = O (N − K)−2
Chapter 8
Large Deviations Solution 8.1 1 Q (x) = √ 2π
Z
∞
e
−
(t+x)2 2
0
x2
e− 2 dt = √ 2π
Z
∞
e−
(t2 +2tx) 2
dt
0
Using the relation:
e
−
(t2 +2tx)
t2
≤ e− 2
2
We obtain:
x2
e− 2 Q (x) ≤ √ 2π 1 As, √ 2π
∞
Z
t2
e− 2 dt = e−
x2 2
0
Z
∞
t2
e− 2 dt =
0
1 2
Solution 8.2 Λ (θ) = log E eθx
= log eλ eθ − 1 = λ eθ − 1
I (x) = sup xθ − λ eθ − 1 Now, xθ − λ eθ − 1 attains supremum if ; d xθ − λ eθ − 1 = 0 dθ ∴ I (x) = xlog 50
x λ
−x+λ
51 Solution 8.3
- Please See Sanov Theorem [8.6 from Book] ∗
Q (x) = P
P (x)e a∈A
P
j
λj gj (x) P , j λj gj (x)
x∈A
P (a)e
−a2
2
2 √1 e 2 eλx N (0, 1)eλx 2π Q (x) = R ∞ = R ∞ 1 −a2 λx2 2 N (0, 1)e √ e 2 eλx da a=−∞ a
∗
−∞
We should choose λ in (1) such that
R
(1)
2π
x2 Q∗ (x) = a2 and Q∗ (x) ∼ N (0, a2 ).
1 1 (µ1 − µ0 )2 σ12 σ2 −D(N (µ1 , σ12 )kN (µ0 , σ02 )) = − log( 02 ) + ( + − 1)loge. 2 σ1 2 σ02 σ02
(2)
1 1 1 By using, (1)&(2)Exponential Rate: − D(Q∗ kP ) = −D(N (0, a2 )kN (0, 1)) = − log(a2 ) + ( a2 − )loge. 2 2 2 Solution 8.4 Please see Theorem 8.5.8 in the book. – a. k ≥ l k! = l! · (l + 1) · · · k k! = (l + 1) · (l + 2) · · · k l! which is always greater or equal than lk−l = l · l · · · l, total number of “l” will be equal to “k − l” Let’s say k = 10 and l = 9 then 10 ≥ 9 – b. k < l can be showed the same way as Part a Solution 8.5 – a. From Sanov’s theorem, it follows that
Qn (E) = 2−nD(P
∗
||Q)
where P ∗ minimizes D(P ||Q) over all P that satisfies P ∗ (x) = P
eλx
u∈{1...6}
eλu
P6
iP (i) ≥ 5. Using Lagrange multipliers, we obtain λ = 0.2519 P ∗ = (0.0205, 0.0385, 0.0723, 0.1357, 0.2548, 0.4781) and therefore D(P ∗ ||Q) = 0.6122 Prob. of average 10000 throws is ≥ 5 is ≈ 2−6122 1
– b. ∗ Probability of getting face 1 would be: ∗ Probability of getting face 2 would be: ∗ Probability of getting face 3 would be: ∗ Probability of getting face 4 would be: ∗ Probability of getting face 5 would be:
1 n 6 1 n 6 1 n 6 1 n 6 1 n 6
in n toses of the dice in n toses of the dice in n toses of the dice in n toses of the dice in n toses of the dice
52
CHAPTER 8. LARGE DEVIATIONS ∗ Probability of getting face 6 would be:
1 n 6
in n toses of the dice
Hence the probability of the getting the faces 1 to 5 with 1 percent probability and face 6 with 95 percent probability is:
6n n n n n n n 1 1 1 1 1 1 1 1 1 1 1 1 95 95 × × × × × = 12 6 100 6 100 6 100 6 100 6 100 6 100 10 6 This is the required probability. Solution 8.6
ω1 K0 +ω2 K1 +... = 1 −
1 log3
3 1 1 1 1 1 1 1 3 1 1 + − + − +... = 1 − + ≈ 0.296 8 log3 log4 4 log4 log5 4 log3 8 log3 4
Chapter 9
Fundamentals of Estimation Theory Solution 9.1 Define x = [x[0], x[1], · · · , x[N − 1]]T , then the likelihood function of φ given x is
f (x|φ) =
1 N/2 (2πσ 2 )
exp
NP −1
− n=0
2 (x[n] − A cos(2πf0 n + φ)) 2σ 2
.
The first and second order derivatives of ln f (x|φ) can be derived as follows, ∂ ln f (x|φ) ∂φ
=
−
N −1 1 X {[x[n] − A cos(2πf0 n + φ)] · A sin(2πf0 n + φ)} σ 2 n=0
=
−
N −1 1 X {Ax[n] sin(2πf0 n + φ) − A2 sin(2πf0 n + φ) cos(2πf0 n + φ)} σ 2 n=0
=
−
N −1 1 X A2 {Ax[n] sin(2πf n + φ) − sin(4πf0 n + 2φ)}, 0 σ 2 n=0 2
∂ 2 ln f (x|φ) ∂2φ
= −
N −1 1 X {Ax[n] cos(2πf0 n + φ) − A2 cos(4πf0 n + 2φ)} σ 2 n=0
≈
N −1 1 X Ax[n] cos(2πf0 n + φ). σ 2 n=0
−
The Fisher information about φ can be written as
=
∂ 2 ln f (x|φ) −E ∂2φ
=
N −1 A2 X cos2 (2πf0 n + φ) σ 2 n=0
=
N −1 A2 X (1 + cos(4πf0 n + 2φ)) 2σ 2 n=0
≈
N A2 . 2σ 2
I(φ)
53
54
CHAPTER 9. FUNDAMENTALS OF ESTIMATION THEORY
Therefore, the CRLB for φ is C(φ) = I −1 (φ) =
2σ 2 . N A2
Solution 9.2 Define x = [x[0], x[1], · · · , x[N − 1]]T and θ = [A, B]T , the likelihood function of θ given x is
f (x|θ) =
1 N/2
(2πσ 2 )
N −1 P 2 (x[n] − A − Bn) n=0 exp − . 2σ 2
The first and second order partial derivatives of f (x|θ) with respect to A and B can be derived as NP −1
∂ ln f (x|θ) ∂A
=
σ2 NP −1
∂ ln f (x|θ) ∂B ∂ 2 ln f (x|θ) ∂2A 2 ∂ ln f (x|θ) ∂A∂B ∂ 2 ln f (x|θ) ∂2B
=
(x[n] − A − Bn)
n=0
n(x[n] − A − Bn)
n=0
σ2 N = − 2 σ N (N − 1) = − 2σ 2 N (N − 1)(2N − 1) = − . 6σ 2
Thus, the Fisher Information Matrix 1 I(θ) = 2 σ
N N (N − 1)/2
N (N − 1)/2 N (N − 1)(2N − 1)/6
"
− N (N6+1)
Inverting the matrix yields I−1 (θ) = σ 2
2(2N −1) N (N +1) − N (N6+1)
12 N (N 2 −1)
It follows immediately that 2(2N − 1) Aˆ ≥ , N (N + 1)
ˆ≥ B On the other hand,
∂ ln f (x|θ) ∂θ
12 . N (N 2 − 1)
can be transformed into the following form,
# .
.
55
∂ ln f (x|θ) ∂θ
NP −1
=
NP −1
(x[n]−A−Bn)
n=0
σ2 n(x[n]−A−Bn)
n=0
σ2
" =
N σ2 N (N −1) 2σ 2
N (N −1) 2σ 2 N (N −1)(2N −1) 6σ 2
#
Aˆ − A ˆ −B B
,
where N −1 N −1 X 2(2N − 1) X 6 Aˆ = x[n] − nx[n], N (N + 1) n=0 N (N + 1) n=0
ˆ=− B
N −1 N −1 X X 6 12 x[n] + nx[n]. N (N + 1) n=0 N (N 2 − 1) n=0
ˆ are the MVUEs. Therefore, Aˆ and B
Solution 9.3 – a. Define x = [x[0], x[1], · · · , x[N − 1]]T , then the PDF of x conditioned on θ can be expressed as N 1 f (x|θ) = U (θ − xmax ) U (xmin ), | {z } θ {z } h(x) | g(T (x),θ)
where T (x) = xmax = max x[n], xmin = min x[n]. Thus, T (x) is a sufficient statistics. – b. First, the cumulative distribution function of T can be derived as Pr{T ≤ ξ}
=
Pr{x[0] ≤ ξ, x[1] ≤ ξ, ..., x[N − 1] ≤ ξ}
=
Pr {x[n] ≤ ξ}N .
Then, the PDF of T can be calculated as follows fT (ξ)
=
d Pr{T ≤ ξ} dξ
= N Pr {x[n] ≤ ξ}N −1
d Pr{x[n] ≤ ξ} . dξ
Thus, 0 N −1 1 fT (ξ) = N θξ θ 0
ξ−
σ2 ln 2
1 −1 c
= −t
Thus Ye = (−t, t). We choose 0 if π0 (y) ≤ min{c, π(1|y)}. From above it is clear that Y0 = (−∞, −t]. Finally, we choose 1 if π1 (y) ≤ min{c, π(0|y)}. Thus Y1 = [t, ∞). To summarize, 0 e δB (y) = 1
y ≤ −t −t < y < t , y≥t
σ2 where t = ln 2
1 −1 , c
if c < 0.5,
– b. Recall that we erase if c < min{π(0|y), π(1|y)}. Since min{π(0|y), π(1|y)} is always less that 0.5, we never erase if c ≥ 0.5. 0 y≤0 δB (y) = , if c ≥ 0.5. 1 y≥0 Solution 10.4 – a. With uniform costs and equal priors, the critical region Y1 for minimum Bayes error is given by Y1 = {y ∈ [−3, 4] s.t. p1 (y) ≥ p0 (y)} = [0, 3/2] ∪ [3, 4]. Thus the Bayes rule, as shown in figure 10.1, is given by 1 if 0 ≤ y ≤ 3/2 or 3 ≤ y ≤ 4 δB (y) = . 0 otherwise
64
CHAPTER 10. FUNDAMENTALS OF DETECTION THEORY
p1 p0
H0
H1
H0
H1
0.25
0.1667
0
−3
0
1.5
3
4
Figure 10.1: Decision regions for Problem 4. The corresponding minimum Bayes risk is given by 1 1 P0 (Y1 ) + P1 (Y0 ) 2 2 which is equal to half of the area of the shaded region in Figure 10.1 1 1 1 1 3 13 11 + + = . 2 2 4 6 2 42 32 r(δB ) =
– b. Since p1 (y) and p0 (y) are such that the likelihood ratio L(y) does not have any point mass, the minmax rule has the form 1, L(y) ≥ τ δm (y) = . 0, L(y) < τ For the given problem, this will correspond to the following rule 1, if 0 ≤ y ≤ x or 3 ≤ y ≤ 4 δm (y) = 0, otherwise
(1)
for some x ∈ [0, 3/2]. The corresponding risks, as a function of x, are given by x 1 1 1 x2 + 6x + + R0 (x) = P0 (Y1 ) = x= 2 6 6 18 36 and R1 (x) = P1 (Y0 ) =
3−x . 4
Therefore, the minmax risk is obtained by minimizing max {R0 (x), R1 (x)} = max
x2 + 6x 3 − x , 36 4
over x ∈ [0, 3/2]. Since R0 (x) is an increasing function of x and R1 (x) is a decreasing function of x, the optimal x∗ can be obtained by equating R0 (x) and R1 (x): R0 (x∗ ) = R1 (x∗ ) x2 + 6x 3−x = 36 4 x2 + 15x − 27 = 0 √ −15 + 333 ∗ x = ≈1.6241. 2
65 Therefore, the minmax rule is given by (1) with x∗ as the threshold, and the corresponding minmax risk is given by √ 3 − x∗ 21 − 333 Rmax (δm ) = = ≈ 0.344. 4 8 Solution 10.5 The likelihood ratio for this problem takes the values L(a) = 0.5,
L(b) = 0.5,
L(c) = ∞.
The threshold as a function of π is given by π0 . 10(1 − π0 )
τ (π0 ) =
The critical values of L(y) are 0.5 and ∞, and thus the critical values of π0 are 5/6 and 1, which correspond to thresholds of 0.5 and ∞. But V (0) = V (1) = 0 since correct decisions are not penalized. Thus, only π0 = 5/6 is interesting and V must have a maximum there. Now, 1 if L(y) ≥ 0.5 − δ5/6 (y) = = 1 for all y. 0 if L(y) < 0.5 and + δ5/6 (y) =
It is easy to see that
1 if L(y) > 0.5 = 0 if L(y) ≤ 0.5
− − R0 (δ5/6 ) = 1, R1 (δ5/6 ) = 0, and
and hence q= Thus
( δm (y) =
1 if y = c . 0 if y = a, b
+ + R0 (δ5/6 ) = 0, R1 (δ5/6 )=5
5 (5 − 0) = . (1 − 0) + (5 − 0) 6
− δ5/6 (y) w.p. 5/6 + δ5/6 (y) w.p. 1/6
=
1 if y = c . 1 w.p. 5/6 if y = a, b
Solution 10.6 There are M states 0, . . . , M − 1, corresponding to the M hypotheses H0 , . . . , HM −1 . Bayes rule chooses decision i (or hypothesis Hi ) that has the smallest a posteriori cost, given Y = y. Now, C(i|y) =
M −1 X
Cij π(j|y),
j=0
where π(j|y) =
pj (y)πj p(y)
Thus the Bayes decision regions are given by Yi = y ∈ Y : C(i|y) = min C(k|y) 0≤k≤M −1 M −1 M −1 X X = y∈Y : Ci,j πj pj (y) = min Ck,j πj pj (y) . 0≤k≤M −1 j=0
j=0
66
CHAPTER 10. FUNDAMENTALS OF DETECTION THEORY
Solution 10.7 – a. For uniform costs Ci,i = 0 for all i, and Ci,j = 1 for i 6= j. Furthermore we have equal priors. Thus based, on the solution to Problem 3, we get that the decision regions for Bayes rule are given by: Yj = y pj (y) = max pk (y) . k
If we draw a figure with the five densities, it is obvious that: Y0 = (−∞, −1.5], Y1 = (−1.5, −0.5], Y2 = (−0.5, 0.5], Y3 = (0.5, 1.5], Y4 = (1.5, ∞). – b. It is easier in this case to calculate the probability of correct decision under each hypothesis. These are given by: 2Φ(0.5) − 1 if j = 1, 2, 3 Pj (Yj ) = , Φ(0.5) if j = 0, 4 from which we get that the average probability of correct decision is: P (correct decision) =
1 [8Φ(0.5) − 3] . 5
The minimum Bayes risk or the probability of error is: Pe = 1 −
1 8 [8Φ(0.5) − 3] = [1 − Φ(0.5)] . 5 5
Solution 10.8 – a. The likelihood ratio L(y) is given by L(y) = 2e−|y| . With equal priors (π0 = π1 = 0.5), the threshold for Bayes rule is given by τ=
C1,0 − C0,0 1 = . C0,1 − C1,1 2
Therefore, the Bayes rule is δB (y) =
1, 0,
if |y| ≤ ln 4 . otherwise
– b. The corresponding Bays risk is equal to 1 1 1 C1,0 P0 (Y1 ) + C0,1 P1 (Y0 ) = 2 2 2
3 1 7 + = . 4 16 16
– c. Since L(y) does not have any point mass, the Neyman-Pearson rule for any η corresponds to a likelihood ratio test (LRT). For this problem, LRT has the form 1, if |y| ≤ ln x δη (y) = 0, otherwise for some x ≥ 1. The probability of false alarm associated with the above rule, as a function of x, is given by PF (x) = P0 (Y1 ) = 1 −
1 . x
The probability of miss detection is given by PM (x) = P1 (Y0 ) = The Neyman-Pearson rule corresponding to η=
1 4
x=
4 . 3
is obtained by choosing
1 . x2
67 – d. The corresponding probability of detection is given by PD (x) = 1 − PM (x) =
7 . 16
Solution 10.9 L(y) =
1 − β1 1 − β0
– a.
δB (y) =
β1 β0
y
1 if y ≥ τ 0 , 0 if y < τ 0
where τ0 =
ln
ln
1−β0 1−β1
β1 β0
– b. Neyman-Pearson tests must have the form: 1 1 δ˜η (y) = 0
if y > η w. p. Y if y = η if y < η
for some η > 0 and some Y ∈ [0, 1]. If we plot P0 (Y > η) as a function of η, we see that it is a staircase function of the form: bηc P0 (Y > η) = β0 β0 where b·c is the floor function. From this it is clear that: ηα = k and Yk =
if β0k+1 ≤ α < β0k ,
α − β0k+1 β0k − β0k+1
k = 0, 1, 2, . . .
if β0k+1 ≤ α < β0k ,
k = 0, 1, 2, . . .
The power of the test for β0k+1 ≤ α < β0k , k = 0, 1, 2, . . ., is PD
=
P1 (Y > ηα ) + Yα P1 (Y = ηα ) ∞ X = (1 − β1 ) β1i + Yα P1 (Y = ηα ) i=ηα +1
= β1k+1 +
α − β0k+1 (1 − β1 )β1k β0k − β0k+1
Solution 10.10 – a. Observe that ˜ 2 + PM (δ) ˜ = min min [PF (δ)] δ˜
α
min
˜ F (δ)≤α ˜ δ:P
˜ = min α2 + α + PM (δ) 2
α
min
˜ F (δ)≤α ˜ δ:P
! ˜ PM (δ) .
Therefore, the optimal rule is a N-P rule for some α, and hence is a (possibly randomized) LRT. – b. It can be shown that the N-P rule δ˜NP with PF (δ˜NP ) = α is given by 1 if y ≥ 1 − α δ(y) = 0 otherwise with PM (δ˜NP ) = (1 − α)2 . Therefore, the optimal risk is min α2 + (1 − α)2 = 0.5 α
corresponding to α = 0.5.
68
CHAPTER 10. FUNDAMENTALS OF DETECTION THEORY
Solution 10.11 – a. Consider η > 1. Then Z PD (δη ) = P1 (L(Y ) ≥ η) =
L(y)p0 (y)dy ≥ η P0 (L(Y ) ≥ η) ≥ PF (δη ) {L(y)≥η}
For η ≤ 1, an analogous argument works if we start with PD (δη ) = 1 − P1 (L(Y ) < η). – b. First note that
dPD dPD dη . = dPF dPF dη
Now
Z
L(y)p0 (y)dy = E0 L(y)I{L(y)≥η} ,
PD (δη ) = {L(y)≥η}
where I{·} is the indicator function. Let pL 0 (`) denote the density of the random variable L(Y ) under H0 . Clearly d d P0 (L(Y ) ≤ `) = − P0 (L(Y ) ≥ `) . pL 0 (`) = d` d` Now, from above Z Z ∞
∞
` I{`≥η} pL 0 (`)d` =
PD (δη ) = 0
` pL 0 (`)d`.
η
Taking derivatives with respect to η, we get d dPF dPD = −ηpL P0 (L(Y ) ≥ η) = η . 0 (η) = η dη dη dη Solution 10.12 ˜ the N-P test for testing between θ0 and θ1 has the form For any fixed θ0 ≤ θ˜ and θ1 > θ, if gθ0 ,θ1 (y) > η 1 ˜ 1 w.p. γ if gθ0 ,θ1 (y) = η δ(y) = for some η ≥ 0. 0 if gθ0 ,θ1 (y) < η
(1)
Randomization is required in general because T (y) and hence L(y) can have point masses. Since gθ0 ,θ1 (y) is strictly increasing, the above test will is equivalent to if T (y) > τ 1 ˜ 1 w.p. γ if T (y) = τ δ(y) = for some τ . (2) 0 if T (y) < τ Since T (y) is independent of θ0 and θ1 , the structure of the above test is independent of θ0 and θ1 , hence a UMP test exists. The corresponding τ and γ can be determined using the false alarm constraint: ˜ θ0 ) = sup (Pθ (T (y) > τ ) + γ Pθ (T (y) = τ )) = α. sup PF (δ; 0 0
θ0 0. 0 otherwise Since the structure of the test is independent of θ0 and θ1 , an UMP test exists. The corresponding threshold τ is chosen to satisfy the following constraint on the worst-case false alarm probability: sup PF (δ; θ0 ) = α. θ0 ∈∆0
Observe that PF (δ; θ0 ) = 1 − e−θ0 τ which is maximized when θ0 ↑ 2. Therefore, to meet the false alarm constraint, the threshold τ is chosen such that 1 − e−2τ = α ⇒ τ = −
ln(1 − α) . 2
Therefore, the UMP test of level α is δUMP (y) =
1 0
if y ≤ − ln(1−α) 2 . otherwise
– b. The generalized LRT for this problem has statistic given by max θe−θy
TGLRT (y) =
2≤θ η if ln Lθ (y) = η if ln Lθ (y) < η
for some η ∈ (−∞, ∞) and some γ ∈ (0, 1).
Note that η ∈ (−∞, ∞) since we took logarithms. Since y has no point mass under H0 and ln Lθ (y) is nondecreasing in y, the above test is equivalent to: 1 if y ≥ τ δτ (y) = for some τ ∈ (−∞, ∞). 0 if y < τ Since the above test is independent of θ, an UMP exists. The threshold τ can be determined using the false alarm constraint PF (δτ ) = P0 (Y > τ ) = α − ln(2α) if α ≤ 0.5 ⇒τ = . ln(2(1 − α)) if α > 0.5 The power of the test is given by: −θ 1 − (1 − α)e −θ e PD (δUMP ; θ) = 1 − 4α αeθ
if α > 0.5 if 0.5e−θ ≤ α ≤ 0.5 if α < 0.5e−θ
– b. The test statistic for the LMP test is given by Tlo (y) =
d |y|−|y−θ| e = sgn(y). dθ θ=0
where sgn(y) takes the value 1 if y ≥ 0 and −1 if y < 0. Hence δ˜lo has the form: 1 1 w.p. γ δ˜η,γ = 0
if sgn(y) > η if sgn(y) = η . if sgn(y) < η
To find the threshold for an α-level test, consider 1 0.5 P0 (sgn(Y ) > η) = 0
if η < −1 if −1 ≤ η < 1 . if η ≥ 1
71 Thus, for α < 0.5, we get δ˜η,γ =
1 w.p. 2α 0
if y ≥ 0 if y < 0
and PD (δ˜lo ) = 2α Pθ (Y > 0) = α(2 − e−θ ). For α ≥ 0.5, we get δ˜η,γ =
1 0 w.p. 2α − 1
if y ≥ 0 if y < 0
and PD (δ˜lo ) = Pθ (Y > 0) + (2α − 1) Pθ (Y ≤ 0) = 1 − (1 − α)e−θ . Solution 10.15 – a. The minimum-probability-of-error detector is the ML detector, and is given by ( 1 if ln L(y) ≥ 0 δB (y) = . 0 if ln L(y) < 0 For the given problem, s0 + s0 1 + ρ2 = [2a 0]> ln L(y) = (s1 − s0 )> Σ−1 y − −ρ 2 2 = 2a (1 + ρ )y1 − ρy2
−ρ 1
y
Since a > 0, the test is equivalent to ( δB (y) =
1 0
if y1 − by2 ≥ 0 if y1 − by2 < 0
where b = ρ/(1 + ρ2 ). – b. It can be seen that y1 − by2 ∼ N (−a, σ 2 ) under H0 , under H1 , y1 − by2 ∼ N (a, σ 2 ) under H1 , where 1 σ 2 = 1+ρ 2 . Therefore, the probability of error is given by p 1 1 P0 (Y1 − bY2 > 0) + P1 (Y1 − bY2 < 0) = Q(a/σ) = Q(a 1 + ρ2 ). 2 2 – c. In the limit ρ → 0, b → 0, and hence the detector is ( 1 δB (y) = 0
if y1 ≥ 0 . if y1 < 0
Note that in the limit, the noise terms in y1 and y2 are independent and since the signal is present only in y1 , the detector is independent of y2 . – d. For any fixed 0 < a0 < 1 and a1 > 1, the log-likelihood ratio (ln La0 ,a1 (y)) has the form ln La0 ,a1 (y) = c1 (2y1 − y2 ) + c2 where c1 > 0 and c2 are constants independent of y. Therefore, the N-P test for testing between a0 and a1 has the form ( 1 if 2y1 − y2 ≥ τ δB (y) = . 0 if 2y1 − y2 < τ
72
CHAPTER 10. FUNDAMENTALS OF DETECTION THEORY Since the above test is independent of a0 and a1 , a UMP test exists. The corresponding τ can be determined using the false alarm constraint sup Pa0 (2Y1 − Y2 ≥ τ ) = α. 0 > > > s> Σ−1 N y = s G Gy = (Gs) (Gy) = b x
where b = Gs and x = Gy are as defined in the problem. – b. Observe that
> −1 PF (δη ) = P0 (b> x ≥ η) = P0 (s> Σ−1 N y ≥ η) = P(s ΣN z ≥ η).
Since s> Σ−1 N z is zero-mean Gaussian random variable with a variance of 2 > −1 > > 2 E[s> Σ−1 N z] = s ΣN s = s G Gs = ||b||
we get PF (δη ) = Q(η/||b||) Equating PF to α, we get ηα = ||b||Q−1 (α). – c. The corresponding probability of detection is given by PD (δη ) = P1 (b> x ≥ η) = P1 (s> Σ−1 N y ≥ η) η − ||b||2 2 > −1 = P(||b|| + s ΣN z ≥ η) = Q ||b|| = Q(Q−1 (α) − ||b||).
73 Solution 10.17 – a. For a fixed θ ∈ {−1, +1}, the log likelihood ratio is given by ln Lθ (y) = ln
pθ (y) = θs> Σ−1 (y − θs/2) = θ(y1 − y2 ) − 1. p0 (y)
Therefore, log likelihood ratio test is given by ( δη (y) =
1 0
if θ(y1 − y2 ) ≥ η if θ(y1 − y2 ) < η
For a given α, the optimal N-P test for θ = −1 and θ = +1 are different and it is not possible to obtain a common test that is uniformly optimal for both θ = −1 and θ = +1. Hence an UMP test does not exist. – b. The log GLRT statistic is given by ln TGLR (y) =
max
θ∈{−1,+1}
ln Lθ (y) = |y1 − y2 | − 1.
Therefore, the GLRT test is equivalent to ( 1 δGLRT (y) = 0
if |y1 − y2 | ≥ η if |y1 − y2 | < η
The probability of false alarm is given by PF (δGLRT ) = P0 (|Y1 − Y2 | ≥ η) = P(|Z1 − Z2 | ≥ η) = 2Q Equating PF (δGLRT ) to α, we get η equal to
√
η √ 2
2Q−1 (α/2).
– c. Since the distribution of |Y1 − Y2 | = |2θ + Z1 − Z2 | is identical under both θ = −1 and θ = +1, the probability of detection PD (δGLRT ) = Pθ (|Y1 − Y2 | ≥ η) is independent of θ and is given by
η−2 η+2 √ +Q √ 2 2 −1 = Q Q (α/2) − 2 + Q Q−1 (α/2) + 2 .
P(|2 + Z1 − Z2 | ≥ η) = Q
Chapter 11
Monte Carlo Methods for Statistical Signal Processing Solution 11.1
N var
(N ) X φ(x(i) ) i=1
N
(N ) X 1 (i) = var φ(x ) N i=1 N X X 1 var{φ(x(i) )} + COV[φ(x(i) , φ(x(j) ] = N i=1 i6=j ( ) N −1 X 1 = N σ 2 + 2σ 2 (N − i)ρj N i=1
Solution 11.2 Let us assume that Eπ h(x1 ) = 0. n o n o (0) (1) (0) (1) cov φ(x1 ), φ(x1 ) = E φ(x1 )φ(x2 ) o n (0) (0) (1) = E[E φ(x1 )φ(x2 )|x2 ] o n o n (0) (0) (0) (1) = E[E φ(x1 )|x2 .E φ(x2 )|x2 ] n o (0) (0) = E[E φ(x1 )|x2 ]2 = var[E {φ(x1 )|x2 }]
Solution 11.3 We know that the variance of a random variable Y defined as Y = N X j=1
varXj + 2
X i