135 37 12MB
English Pages 229 Year 2010
Mathematical Summary for Digital Signal Processing Applications with Matlab
E.S. Gopi
Mathematical Summary for Digital Signal Processing Applications with Matlab
123
E.S. Gopi National Institute of Technology, Trichy Dept. Electronics & Communication Engineering 67 Tanjore Main Road Tiruchirappalli-620015 National Highway India [email protected]
ISBN 978-90-481-3746-6 e-ISBN 978-90-481-3747-3 DOI 10.1007/978-90-481-3747-3 Springer Dordrecht Heidelberg London New York Library of Congress Control Number: 2009944069 c Springer Science+Business Media B.V. 2010 No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Cover design: eStudio Calamar S.L. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Dedicated to my son G.V. Vasig and my wife G. Viji
Preface
The book titled “Mathematical summary for Digital Signal Processing Applications with Matlab” consists of Mathematics which is not usually dealt in the DSP core subject, but used in the DSP applications. Matlab Illustrations for the selective topics such as generation of Multivariate Gaussian distributed sample outcomes, Optimization using Bacterial Foraging etc. are given exclusively as the separate chapter for better understanding. The book is written in such a way that it is suitable for Nonmathematical readers and is very much suitable for the beginners who are doing research in Digital Signal Processing. E.S. Gopi
vii
Acknowledgements
I am extremely happy to express my thanks to Prof. K.M.M. Prabhu, Indian Institute of Technology Madras for his constant encouragement. I also thank the Director Prof. M. Chidambaram, Prof. B. Venkataramani and Prof. S. Raghavan National Institute of Technology Trichy for their support. I thank those who directly or indirectly involved in bringing up this Book. Special thanks to my parents Mr. E. Sankarasubbu and Mrs. E.S. Meena.
ix
Contents
1
Matrices .. . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.1 Properties of Vectors.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.2 Properties of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.3 LDU Decomposition of the Matrix . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.4 PLDU Decomposition of an Arbitrary Matrix . . . . . . . . . . . . . .. . . . . . . . . . . 1.5 Vector Space and Its Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.6 Linear Independence, Span, Basis and the Dimension of the Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.6.1 Linear Independence .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.6.2 Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.6.3 Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.6.4 Dimension .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.7 Four Fundamental Vector Spaces of the Matrix . . . . . . . . . . . . .. . . . . . . . . . . 1.7.1 Column Space .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.7.2 Null Space .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.7.3 Row Space.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.7.4 Left Null Space.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.8 Basis of the Four Fundamental Vector Spaces of the Matrix . . . . . . . . . . 1.8.1 Column Space .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.9 Observations on Results of the Example 1.12 .. . . . . . . . . . . . . .. . . . . . . . . . . 1.9.1 Column Space .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.9.2 Null Space .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.9.3 Left Column Space (Row Space) . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.9.4 Left Null Space.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.9.5 Observation.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.10 Vector Representation with Different Basis . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.11 Linear Transformation of the Vector .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.11.1 Trick to Compute the Transformation Matrix . . . . .. . . . . . . . . . . 1.12 Transformation Matrix with Different Basis . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.13 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.13.1 Basic Definitions and Results . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.13.2 Orthogonal Complement . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.14 System of Linear Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .
1 2 3 7 10 12 12 12 13 13 13 13 14 14 14 14 14 15 20 21 21 21 21 22 22 24 25 25 26 26 27 27 xi
xii
Contents
1.15 Solutions for the System of Linear Equation [A] x D b . . . .. . . . . . . . . . . 1.15.1 Trick to Obtain the Solution .. . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.16 Gram Schmidt Orthonormalization Procedure for Obtaining Orthonormal Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.17 QR Factorization .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.18 Eigen Values and Eigen Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.19 Geometric Multiplicity (Versus) Algebraic Multiplicity .. . .. . . . . . . . . . . 1.20 Diagonalization of the Matrix .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.21 Schur’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.22 Hermitian Matrices and Skew Hermitian Matrices . . . . . . . . .. . . . . . . . . . . 1.23 Unitary Matrices .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.24 Normal Matrices .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.25 Applications of Diagonalization of the Non-deficient Matrix . . . . . . . . . 1.26 Singular Value Decomposition .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 1.27 Applications of Singular Value Decomposition .. . . . . . . . . . . .. . . . . . . . . . . 2
28 29 36 40 42 44 47 49 50 52 56 58 60 62
Probability . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 67 2.1 Introduction .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 67 2.2 Axioms of Probability .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 68 2.3 Class of Events or Field (F) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 68 2.4 Probability Space (S, F, P) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 68 2.5 Probability Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 68 2.6 Conditional Probability .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 69 2.7 Total Probability Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 70 2.8 Bayes Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 70 2.9 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 70 2.10 Multiple Experiments (Combined Experiments) .. . . . . . . . . . .. . . . . . . . . . . 71 2.11 Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 74 2.12 Cumulative Distribution Function (cdf) of the Random Variable ‘x’. .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 75 2.13 Continuous Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 76 2.14 Discrete Random Variable.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 76 2.15 Probability Mass Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 76 2.16 Probability Density Function.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 76 2.17 Two Random Variables .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 77 2.18 Conditional Distributions and Densities . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 79 2.19 Independent Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 79 2.20 Some Important Results on Conditional Density Function .. . . . . . . . . . . 80 2.21 Transformation of Random Variables of the Type Y D g.X/ . . . . . . . . . 84 2.22 Transformation of Random Variables of the Type Y1 D g1.X1; X2/; Y2 D g2.X1; X2) . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 85 2.23 Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 99 2.24 Indicator .. . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . 99 2.25 Moment Generating Function .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .101 2.26 Characteristic Function .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .102
Contents
xiii
2.27 Multiple Random Variable (Random Vectors) . . . . . . . . . . . . . .. . . . . . . . . . .102 2.28 Gaussian Random Vector with Mean Vector X and Covariance Matrix CX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .107 2.29 Complex Random Variables.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .118 2.30 Sequence of the Number and Its Convergence . . . . . . . . . . . . . .. . . . . . . . . . .119 2.31 Sequence of Functions and Its Convergence . . . . . . . . . . . . . . . .. . . . . . . . . . .120 2.32 Sequence of Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .120 2.33 Example for the Sequence of Random Variable.. . . . . . . . . . . .. . . . . . . . . . .122 2.34 Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .122 3
Random Process .. .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .123 3.1 Introduction .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .123 3.2 Random Variable Xt1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .124 3.3 Strictly Stationary Random Process with Order 1 . . . . . . . . . .. . . . . . . . . . .124 3.4 Strictly Stationary Random Process with Order 2 . . . . . . . . . .. . . . . . . . . . .124 3.5 Wide Sense Stationary Random Process . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .125 3.6 Complex Random Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .127 3.7 Properties of Real and Complex Random Process . . . . . . . . . .. . . . . . . . . . .127 3.8 Joint Strictly Stationary of Two Random Process . . . . . . . . . . .. . . . . . . . . . .127 3.9 Jointly Wide Sense Stationary of Two Random Process . . . .. . . . . . . . . . .128 t 3.10 Correlation Matrix of the Random Column Vector X Ys for the Specific ‘t’ ‘s’ .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .128 3.11 Ergodic Process .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .128 3.12 Independent Random Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .132 3.13 Uncorrelated Random Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .132 3.14 Random Process as the Input and Output of the System. . . .. . . . . . . . . . .132 3.15 Power Spectral Density (PSD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .134 3.16 White Random Process (Noise) .. . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .137 3.17 Gaussian Random Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .138 3.18 Cyclo Stationary Random Process . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .139 3.19 Wide Sense Cyclo Stationary Random Process . . . . . . . . . . . . .. . . . . . . . . . .139 3.20 Sampling and Reconstruction of Random Process . . . . . . . . . .. . . . . . . . . . .142 3.21 Band Pass Random Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .144 3.22 Random Process as the Input to the Hilbert Transformation as the System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .146 3.23 Two Jointly W.S.S Low Pass Random Process Obtained Using W.S.S. Band Pass Random Process and Its Hilbert Transformation .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .148
4
Linear Algebra . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .153 4.1 Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .153 4.2 Linear Transformation .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .154 4.3 Direct Sum . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .160 4.4 Transformation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .162 4.5 Similar Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .164
xiv
Contents
4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13
Structure Theorem .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .166 Properties of Eigen Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .171 Properties of Generalized Eigen Space . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .172 Nilpotent Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .173 Polynomial ... . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .175 Inner Product Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .176 Orthogonal Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .177 Riegtz Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .179
5
Optimization .. . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .181 5.1 Constrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .181 5.2 Extension to Constrained Optimization Technique to Higher Dimensional Space with Multiple Constraints .. .. . . . . . . . . . .186 5.3 Positive Definite Test of the Modified Hessian Matrix Using Eigen Value Computation .. . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .189 5.4 Constrained Optimization with Complex Numbers .. . . . . . . .. . . . . . . . . . .193 5.5 Dual Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .194 5.6 Kuhn-Tucker Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .195
6
Matlab Illustrations.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .197 6.1 Generation of Multivariate Gaussian Distributed Sample Outcomes with the Required Mean Vector ‘MY ’ and Covariance Matrix ‘CY ’ . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .197 6.2 Bacterial Foraging Optimization Technique.. . . . . . . . . . . . . . . .. . . . . . . . . . .202 6.3 Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .208 6.4 Newton’s Iterative Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .210 6.5 Steepest Descent Algorithm .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .214
Index . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . .217
Chapter 1
Matrices
One-dimensional array representation of scalars are called vector. If the elements are arranged in row wise, it is called Row vector. In the same fashion, if the elements of the vector are arranged in column wise, it is called column vector. Two-dimensional array representations of scalars are called matrix. Size of the matrix is represented as R C, where R is the number of rows and C is the number of columns of the matrix. Scalar elements in the array can be either complex numbers .C/ or the real numbers. .R/. The column vector is represented as X. The Row vector is represented as X T . Example 1.1. Row Vector with the elements filled up with real numbers Œ2:89
21:87 100
Column Vector with the elements filled up with Complex numbers. 2
3 1Cj 6 j 7 6 7 49 C 7j 5 0 Matrix of size 2 3 with the elements filled up with real numbers "
2
3
# 6
4
1
2
Matrix of size 3 2 with the elements filled up with complex numbers 2
j
6 62j 4 0
1Cj
3
7 5j 7 5 j
E.S. Gopi, Mathematical Summary for Digital Signal Processing Applications with Matlab, DOI 10.1007/978-90-481-3747-3 1, c Springer Science+Business Media B.V. 2010
1
2
1 Matrices
1.1 Properties of Vectors Scalar multiplication of the vector X is given as c X , where c R. Example 1.2.
2
1Cj
3
2
2 C 2j
3
7 7 6 6 6 j 7 6 2j 7 7 6 7 6 26 7 7D6 49 C 7j 5 418 C 14j 5 0
0
Linear combinations of two vectors X1 and X2 are obtained as c1 X1 C c2 X2 , where c1, c2 R Example 1.3. 3 3 2 2 5j 1j 7 7 6 6 7 6 6 j 7 6 j 7 6 j 7 7 7D6 7C36 26 7 7 6 6 7 6 4 7j 5 418 7j 5 49 C 7j 5 2
1Cj
3
0
0
0
Example 1.4. Graphical illustration of summation of two vectors [3 1] and [1 2] to obtain the vector [4, 3] is given below (Fig. 1.1). (Recall the Parallelogram Rule of addition). Note that the first and second elements of the vector are represented as the variable X1 and X2, respectively. The variables X1 and X2 can be viewed as the random variables.
X2
(4,3)
(1,2)
(3,1)
X1
Fig. 1.1 Graphical illustration of the summation of two vectors
1.2 Properties of Matrices
3
1.2 Properties of Matrices (a) Matrix addition Let the Matrix A be represented as 2
a11 6a 6 21 6 6a31 6 4 an1
a12 a22 a32 an2
a13 a33 a33 an3
3 a1m a2m 7 7 7 a3m 7 7 5 anm
::: ::: ::: :::
Also let the Matrix B be represented as 2 3 b11 b12 b13 : : : b1m 6b 7 6 21 b22 b23 : : : b2m 7 6 7 6b31 b32 b33 : : : b3m 7 6 7 4 5 bn1 bn2 bn3 bnm 2 a11 C b11 a12 C b12 a13 C b13 6 a Cb a22 C b22 a23 C b23 6 21 21 6 ) A C B D 6 a31 C b31 a32 C b32 a33 C b33 6 4 anm C bnm anm C bnm anm C bnm
::: ::: ::: :::
3 a1m C b1m a2m C b2m 7 7 7 a3m C b3m 7 7 5 bnm C bnm
Note that (i,j)th element of the matrix ‘A’ is represented as aij . Matrix ‘A’ in general is represented as A D Œaij Let C D A C B ) Œcij D Œaij C Œbij (b) Scalar multiplication Let c 2 C or c 2 R ) cA D cŒaij D Œcaij (c) Matrix multiplication The product of the matrix ‘A’ with size n p and the matrix ‘B’ with size p m is the matrix C of size n m. The elements of the matrix C are obtained as follows. c11 c12 c13 : : : c1m c21 c22 c23 : : : c2m C D c31 c32 c33 : : : c3m
cn1 cn2 cn3 : : : cnm Where cij D ai1 b1j C ai 2 b2j C ai 3 b3j C : : : C aip bpj
4
1 Matrices
Example 1.5. Consider the matrix A and B of size 2 3 and 3 3, respectively # " a11 a12 a13 AD a21 a22 a23 2 3 b11 b12 b13 6 7 B D 4b21 b22 b23 5 b31 b32 b33 Let the matrix C D AB "
a11 b11 C a12 b21 C a13 b31 D a21 b11 C a22 b21 C a23 b31
a11 b12 C a12 b22 C a13 b32 a11 b13 C a12 b23 C a13 b33 a21 b12 C a22 b22 C a23 b32 a21 b13 C a22 b23 C a23 b33
#
The matrix A can be viewed as A D Œa1 a2 a3 , where a11 a12 a a2 D a3 D 13 a1 D a21 a22 a23 2 3 b1 Similarly the matrix B can be viewed as B D 4b2 5, where b3 b1 D Œb11
b12
b13
b2 D Œb21 b3 D Œb31
b22 b32
b23 b33
So the matrix C D AB can also be represented as a1 b1 C a2 b2 C a3 b3 Also the matrix AB can be obtained as
i h b11 .a1 / C b21 .a2 / C b31 .a3 / b12 .a1 / C b22 .a2 / C b32 .a3 / b13 .a1 / C b23 .a2 / C b33 .a3 /
(d) Matrix multiplication is associative Let A and B be the two matrices, then A .BC/ D .AB/ C. (e) Matrix multiplication is non-commutative Let A and B be the two matrices, then AB ¤ BA (f) Block multiplication Consider the matrix P as shown below 3 2 a11 a12 a13 b11 b12 7 6 6a21 a22 a23 b21 b22 7 7 6 7 6 6 c11 c12 c13 d11 d12 7 7 6 7 6c 4 21 c22 c23 d21 d22 5 c31 c32 c33 d31 d32
1.2 Properties of Matrices
5
The above matrix ‘P’ can be viewed as the matrix with each element filed up with matrix as mentioned below. This way of representing the matrix is called as Block matrix. 2 3 b11 b12 a11 a12 a13 6 a 7 62 21 a22 a233 2 b21 b22 37 6 7 d11 d12 7 6 c11 c12 c13 64 7 4 c21 c22 c33 5 4d21 d22 55 c31 c32 c33 d31 d32 In short the matrix P is represented as
ŒA ŒB ŒC ŒD
Similarly consider the matrix Q represented as
ŒE ŒF
Then the matrix PQ is represented as
ŒAE C BF ŒCE C DF
This way of multiplying two Block matrices is called as Block multiplication. (g) Transpose of the matrix The transpose of the matrix A is the matrix B whose elements are related as follows. aij D bji Note that transpose of the Block matrix
ŒA ŒB ŒC ŒD
is given as
T ŒA ŒCT ŒBT ŒDT
(h) Square matrix The matrix with number of Rows is equal to the number of Columns. (i) Identity matrix The square matrix with all the elements is filled up with zeros except the diagonal elements which are filled up with all ones.
6
1 Matrices
(j) Lower triangular matrix The square matrix with all the elements is filled up with zeros except the elements in the diagonal and below the diagonal which are filled up at least one non-zero elements. In other words, Lower triangular matrix is the matrix with zeros in the upper triangular portion of the matrix with at least one non-zero element in the remaining portion. (k) Upper triangular matrix The square matrix with all the elements is filled up with zeros except the elements in the diagonal and above the diagonal which are filled up at least one non-zero elements. In other words, Upper triangular matrix is the matrix with zeros in the Lower triangular portion of the matrix with at least one non-zero element in the remaining portion. (l) Diagonal matrix The square matrix with all the elements is filled up with zeros except the diagonal elements which are filled up with at least one non-zero element in the diagonal (m) Permutation matrix Permutation matrix is one when multiplied with the matrix interchanges the elements of the matrix column wise or row wise. If the matrix is multiplied by the permutation matrix, columnwise interchange of the elements of the matrix occurs. Similarly if the permutation matrix is multiplied by some matrix, row wise interchange of the elements of the matrix occurs. 2 3 1 2 3 Example 1.6. Arbitrary matrix A D 44 5 65 7 8 9 2 3 1 0 0 Arbitrary Permutation matrix P D 40 0 15 0 1 0 3 2 1 3 2 A P D 44 6 55 7 9 8 Note that the elements of the second column using the operation A P. 2 1 2 4 PAD 7 8 4 5
and third column is interchanged 3 3 95 6
Note that the elements of the second row and third row is interchanged using the operation P A. P P : : : P is always some permutation matrix. Also note that the identity matrix is the trivial permutation matrix, which when multiplied with any
1.3 LDU Decomposition of the Matrix
7
arbitrary matrix will end up with the same matrix. Also note that the inverse of any arbitrary permutation matrix is always the permutation matrix. (n) Inverse of the matrix Inverse of the matrix is defined only for the square matrix. The matrix A is defined as the inverse of the matrix B if AB D BA D I, where ‘I’ is the identity matrix. If there exists the inverse matrix for the particular square matrix A, then that matrix ‘A’ is known as the Invertible matrix. Otherwise it is called as the non-invertible matrix.
1.3 LDU Decomposition of the Matrix The matrix A as shown below can be represented as the product of three matrices Lower triangular matrix (L) with all ones in the diagonal elements, Diagonal matrix (D) and the upper triangular matrix (U) with all ones in the diagonal elements. 2 3 123 Example 1.7. LDU Decomposition of the matrix A D 42 3 45 346 2 32 3 2 3 1 0 0 1 2 3 1 2 3 ) 40 1 05 42 3 45 D 42 3 45 0 0 1 3 4 6 3 4 6 Note: Row operation on the Identity matrix in the LHS and the same operation done on the RHS will not affect the equality. R2- > R2-2 R1 R3- > R3-3 R1 2 32 3 2 3 1 0 0 1 2 3 1 2 3 ) 42 1 05 42 3 45 D 40 1 25 3 0 1 3 4 6 0 2 3 2 32 32 3 2 3 1 0 0 1 0 0 1 2 3 1 2 3 ) 40 1 05 42 1 05 42 3 45 D 40 1 25 0 0 1 3 0 1 3 4 6 0 2 3 Again applying Row operation we get R3- > R3-2 R2 2 32 32 3 2 3 1 0 0 1 0 0 123 1 2 3 ) 40 1 05 42 1 05 42 3 4 5 D 40 1 25 0 2 1 3 0 1 346 0 0 1
8
1 Matrices
In general, after applying the Row operation the matrix equation will in the form as given below ŒLn ŒLn1 : : : ŒL3 ŒL2 ŒL1 ŒA D ŒU Where L1 ; L2 ; L3 ; L4 ; : : : Ln are the lower triangular matrices with diagonal elements filled up with ones. ‘A’ is the actual matrix ‘U’ is the upper triangular matrix So the matrix A can be represented as the product of A D L1 1 : : : Ln2 1 Ln1 1 : : : Ln 1 U In our example 2
3 1 0 0 L1 D 42 1 05 3 0 1 2 1 0 ) L1 1 D 42 1 3 0
2 3 1 0 0 L2 D 40 1 05 0 2 1 2 3 3 0 1 0 0 05 L2 1 D 40 1 05 1 0 2 1
Note: In General the inverse of the matrix with the characteristics given below can be obtained by direct observation. Characteristics: (a) Diagonal elements filled up ones (b) One column below diagonal filled up atleast one non-zero elements (c) All other elements filled up with zeros Example 1.8. Consider the matrix Ln given below that satisfies all the characteristics mentioned above 1 0 0 0 0 a1 1 0 0 0 Ln D a2 0 1 0 0 a3 0 0 1 0 a4 0 0 0 1 The inverse of the matrix Ln is obtained by direct observation as
Ln 1
1 a1 D a2 a3 a4
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
1.3 LDU Decomposition of the Matrix
2
9
32
32
3
2
3
1 0 0 1 0 0 1 2 3 1 2 3 5 4 5 4 5 4 Consider 4 0 1 D 0 2 1 0 2 3 4 0 1 2 5 0 2 1 3 0 1 3 4 6 0 0 1 2
3 2 1 2 3 )4 2 3 4 5D4 3 4 6 2 3 2 1 2 3 )4 2 3 4 5D4 3 4 5
32 1 0 0 2 1 0 54 3 0 1 32 1 0 0 2 1 0 54
32 3 1 0 0 1 2 3 0 1 0 5 4 0 1 2 5 0 2 1 0 0 1 3 1 2 3 0 1 2 5
3 2 1
0 0
1
3 1 2 3 The matrix 4 0 1 2 5 can be represented as the product of the diagonal ma0 0 1 trix and the upper triangular matrix with all the elements in the diagonal are one as given below. 2
2
3 2 32 3 1 2 3 1 0 0 1 2 3 4 0 1 2 5 D 4 0 1 0 5 4 0 1 2 5 0 0 1 0 0 1 0 0 1 Note: In General the Upper triangular matrix with non-unity diagonal elements can be represented as the product of the diagonal matrix and the Upper triangular matrix with ones in the diagonal as mentioned below Example 1.9. 2 a b ments 4 0 d 0 0
Consider the Upper triangular matrix with non-unity diagonal ele3 c e 5 which can be represented as the product of f 2
3
2
a b c a 0 0 4 0 d e 5D4 0 d 0 0 0 f 0 0 f 2
3 2 a b c a 0 4 5 4 D 0 d e D 0 d 0 0 f 0 0 2
3
2
a 6a 56 6 40 0 32 0 0 54 f
b a d d
0
3 c a 7 7 e 7 d 5 f f
3 1 ba ac 0 1 de 5 0 0 1
3 1 2 3 Thus the invertible matrix A D 4 2 3 4 5 is represented as the product of Lower 3 4 6 triangular matrix with ones in the diagonal, diagonal matrix and the upper triangular
10
1 Matrices
matrix with ones in the diagonal as shown below 2
3 2 32 32 3 1 2 3 1 0 0 1 0 0 1 2 3 4 2 3 4 5 D 4 2 1 0 5 4 0 1 0 5 4 0 1 2 5 3 4 6 3 2 1 0 0 1 0 0 1
1.4 PLDU Decomposition of an Arbitrary Matrix In general an arbitrary matrix A can be represented as the product of the permutation matrix (P), Lower triangular matrix with ones in the diagonal, diagonal matrix (D) with non-zero diagonal elements and the Upper triangular matrix with all ones in the diagonal. If the permutation matrix is the identity matrix, then the matrix A is represented as the product of L, D, U (see Section 1.3) 2 3 1 2 3 Example 1.10. PLDU Decomposition of the matrix A D 4 2 4 4 5 3 4 6 2
32 3 2 3 1 0 0 1 2 3 1 2 3 ) 4 0 1 0 54 2 4 4 5 D 4 2 4 4 5 0 0 1 3 4 6 3 4 6 Note: Row operation on the Identity matrix in the LHS and the same operation done on the RHS will not affect the equality. R2- > R2-2 R1 R3- > R3-3 R1 3 2 3 2 3 1 0 0 1 2 3 1 2 3 ) 4 2 1 0 5 4 2 4 4 5 D 4 0 0 2 5 3 0 1 3 4 6 0 2 3 2
2
3 1 2 3 Note that the matrix 4 0 0 2 5 cannot be further subjected to mere row op0 2 3 eration to obtain the upper triangular format. Hence the following technique using permutation matrix is used. 2
32 3 2 3 1 0 0 1 2 3 1 2 3 4 0 0 1 54 2 4 4 5 D 4 3 4 6 5 0 1 0 3 4 6 2 4 4
1.4 PLDU Decomposition of an Arbitrary Matrix
2
32
32
11
3
2
3
1 0 0 1 0 0 1 2 3 1 2 3 ) 4 0 1 0 54 0 0 1 54 2 4 4 5 D 4 3 4 6 5 0 0 1 0 1 0 3 4 6 2 4 4 Now applying the Row operation on the identity matrix, we get R2- > R2-3 R1 R3- > R3-2 R1 2 32 32 3 2 3 1 0 0 1 0 0 1 2 3 1 2 3 4 3 1 0 5 4 0 0 1 5 4 2 4 4 5 D 4 0 2 3 5 2 0 1 0 1 0 3 4 6 0 0 2 2 32 3 2 32 3 1 0 0 1 2 3 1 0 0 1 2 3 ) 4 0 0 1 5 4 2 4 4 5 D 4 3 1 0 5 4 0 2 3 5 0 1 0 3 4 6 2 0 1 0 0 2 2
3 2 3 1 0 0 1 0 0 Note that the inverse of the matrix 4 3 1 0 5 is 4 3 1 0 5 2 0 1 2 0 1 32 3 3 2 32 1 0 0 1 2 3 1 2 3 1 0 0 ) 4 2 4 4 5 D 4 0 0 1 5 4 3 1 0 5 4 0 2 3 5 2 0 1 0 0 2 3 4 6 0 1 0 2
2
3 2 3 1 0 0 1 0 0 Note that the inverse of the matrix 4 0 0 1 5 is 4 0 0 1 5 0 1 0 0 1 1 2
3 2 32 32 32 3 1 2 3 1 0 0 1 0 0 1 0 0 1 2 3 ) 4 2 4 4 5 D 4 0 0 1 5 4 3 1 0 5 4 0 2 0 5 4 0 1 3=2 5 3 4 6 0 1 0 2 0 1 0 0 2 0 0 1 2 Thus the matrix 4 2
1 2 3 0 1 0
3 2 3 4 4 5 is represented as the product of the permuta4 6 3 3 2 1 0 0 5, Lower triangular matrix 4 3 1 0 5 diagonal matrix
1 0 tion matrix 4 0 0 0 1 2 3 2 2 1 2 1 0 0 4 0 2 0 5 and the Upper triangular matrix 4 0 1 0 0 2 0 0
0 1 3 3 3=2 5. 1
12
1 Matrices
1.5 Vector Space and Its Properties 1. Vector space V over a field F is a set with two operation ‘C’ (addition) and ‘.’ (scalar multiplication) such that the following condition holds x; y 2 V; then x C y 2 V x 2 V; c 2 F; thenc:x 2 F 2. Properties of the vector space: (a) Commutative addition For all x; y 2 V; x C y D y C x (b) Associatively .x C y/ C z D x C .y C z/ (c) Additive identity There exists an element z 2 V such that z C x D x for all x 2 V . z is called zero vector (d) Additive inverse For each x 2 V , there exists y 2 V such that x C y D z (e) There exists 1 2 F , such that 1:x D x for all x 2 V (f) For all a; b 2 F and x 2 V a:.b:x/ D .ab/:x (g) For all a 2 F and x; y 2 V a:.x C y/ D a:x C a:y (h) For all a; b 2 F and x 2 V .a C b/:x D a:x C b:y 3. Subspace S of the vector space V is a subset of the V such that x; y 2 S; then x C y 2 S x 2 S; c 2 F; then c:x 2 S Example 1.11. 1. Set of all real numbers R over the field R is the vector space. 2. Set of all the vectors of the form Œx; y, where x 2 R; y 2 R, over the field R is the vector space which is represented in short as R2 . In general Rn is the vector space over the field R which is the set of all the vectors of the form Œx1 x2 x3 x4 : : : xn , where x1; x2; x3; : : : xn 2 R.
1.6 Linear Independence, Span, Basis and the Dimension of the Vector Space 1.6.1 Linear Independence Consider the vector space ‘V’ over the field F. v1 ; v2 ; v3 ; v4 : : : ; vn 2 V are said to be independent if the linear combinations of the above vectors [(i.e.) ˛1 v1 C
1.7 Four Fundamental Vector Spaces of the Matrix
13
˛2 v2 C ˛3 v3 C ˛4 v4 : : : C ˛n vn , where ˛1 ; ˛1 ; ˛1 ; : : : ˛1 2 R] is the zero vector only when ˛1 D ˛2 D ˛3 D ˛4 D ˛n D 0. Suppose there exists at least one non-zero scalar ˛1 ; ˛2 ; ˛3 ; : : : ˛n 2 R such that ˛1 v1 C ˛2 v2 C ˛3 v3 C ˛4 v4 : : : C ˛n vn D 0, then any one arbitrary vector among the list v1 ; v2 ; v3 ; v4 : : : ; vn can be represented as the linear combinations of the remaining vectors. For instance if all the scalars .˛1 ; ˛2 ; ˛3 ; : : : ˛n 2 R/ are non-zero, then the vector v1 is represented as the linear combinations of other vectors as shown below. ˛2 ˛3 ˛4 ˛n v1 D v2 C v3 C v4 C vn ˛1 ˛1 ˛1 ˛1 This implies that the vector v1 depends upon the other vectors v2 ; v3 ; v4 : : : ; vn . Similarly any vector in the lists can be represented as the linear combinations of other vectors. This implies that the vectors v1 ; v2 ; v3 ; v4 : : : ; vn are dependent vectors.
1.6.2 Span Consider the vector space ‘V’ over the field F. Let v1 ; v2 ; v3 ; v4 : : : ; vn 2 V0 vectors. If all the vectors in the vector space V can be represented as the linear combinations of the above listed vectors, then the listed vectors are called the spanning set of the vector space ‘V’.
1.6.3 Basis Spanning set of the vector space ‘V’ which consists of the minimum number of independent vectors are called the basis of the vector space ‘V’.
1.6.4 Dimension Number of vectors in the Basis is called the dimension of the vector space ‘V’.
1.7 Four Fundamental Vector Spaces of the Matrix Columns of the matrix can be viewed as the set of column vectors. Similarly Rows of the matrix can be viewed as the set of Row vectors.
14
1 Matrices
1.7.1 Column Space Column space of the matrix A of size m n is the vector space over the field ‘R’ which is the subspace of the vector space Rm . Any vector in the column space of the matrix A can be obtained as the linear combinations of the columns of the matrix. Columns of the matrix A forms the spanning set of the Column space.
1.7.2 Null Space Null space of the matrix A of size m n, represented as N (A), is the vector space over the field ‘R’ which is the subspace of the vector space Rn such that the A v D 0 for all v 2 N.A/.
1.7.3 Row Space Row space of the matrix A of size m n, represented as C.AT /, is the vector space over the field ‘R’ which is the subspace of the vector space Rn , which is basically the Column space of the matrix AT . Therefore any vector in the Row space of the matrix A can be obtained as linear combinations of the row vectors. Rows of the matrix A forms the spanning set of the Row space.
1.7.4 Left Null Space Left Null space of the matrix A of size m n, represented as N.AT /, is the vector space over the field ‘R’ which is the subspace of the vector space Rm such that the AT v D 0 for all v 2 N.AT /.
1.8 Basis of the Four Fundamental Vector Spaces of the Matrix Example 1.12.
2
3 1 2 3 4 A D 4 5 5 7 12 5 6 7 10 16
1.8 Basis of the Four Fundamental Vector Spaces of the Matrix
15
1.8.1 Column Space To compute the column space, we need to find the maximum number of independent columns of the matrix A which is the minimum spanning set (i.e.) Basis of the column space. Trick to find out maximum number of Independent columns of the matrix A. Applying the following Row Operation R2- > R2-5 R1 R3- > R3-6 R1 we get 2
3 1 2 3 4 ) 4 0 5 8 8 5 0 5 8 8 2
R3- > R3-R2
3 1 2 3 4 )D 4 0 5 8 8 5 0 0 0 0 2 3 1 2 3 4 ) 4 0 5 8 8 5 0 0 0 0 The above is the Row Reduced Echelon Form (RREF). The Bold numbers in the above matrix are called pivot elements and the corresponding columns are called pivot columns. 3 2 1 2 3 4 4 0 5 8 8 5 0 0 0 0 The bold numbers mentioned in the above matrix can also be treated as the pivot numbers and the corresponding columns are called pivot columns. 2
3 1 2 3 4 A D 4 0 5 8 8 5 0 0 0 0 Similarly the bold numbers mentioned in the above matrix can be treated as the pivot numbers and the corresponding columns are called pivot columns. Pivot columns are independent to each other and it is the maximum number of independent columns of the RREF matrix. It can be shown that the corresponding columns of the original matrix A are independent vectors.
16
1 Matrices
Check for independent vectors The Linear combinations of the independent vectors is equal to zero vector only when the scaling factors are identically zeros. (i.e.) 2 3 2 3 2 3 1 2 0 4 5 4 5 4 ˛1 5 C ˛2 5 D 05 ) ˛1 D ˛2 D 0 6 7 0 Rewriting the above equations in the standard Linear equation form is as shown below. ˛1 C 2˛2 D 0- - - - - - - - - - - - - - - - - - - - - - -.1/ 5˛1 C 5˛2 D 0- - - - - - - - - - - - - - - - - - - - -.2/ 6˛1 C 7˛2 D 0- - - - - - - - - - - - - - - - - - - - -.3/ .2/–5 .1/ and .3/–6 .1/ gives the following equations ˛1 C 2˛2 D 0- - - - - - - - - - - - - - - - - - - - - - - - -.1/ 5˛2 D 0- - - - - - - - - - - - - - - - - - - - - - -.2/ 5˛2 D 0- - - - - - - - - - - - - - - - - - - - - - -.3/ ) ˛1 D ˛2 D 0 The operation performed above is equivalent to the Row operation of the original matrix A. Thus columns of the original matrix corresponding to the pivot columns are independent to each other. Hence the corresponding columns in the original matrix form the column space of the matrix A. Thus column space of the matrix A is represented as set of vectors 2 3 2 3 1 2 ˛1 455 C ˛2 455 ; where ˛1 ; ˛1 2 R 6 7 Also note that the dimension of the column space is 2. Null space of the matrix A is obtained as follows. 2
1 2 3
4
3
6 7 A D 4 5 5 7 12 5 6 7 10 16 Row and the column operation of the matrix can be viewed as the matrix multiplication of the particular matrix with the matrix itself as described below.
1.8 Basis of the Four Fundamental Vector Spaces of the Matrix
32
2
1 2 3 1 0 0 6 76 ) 4 0 1 0 54 5 5 7
3
1 2 3 7 6 12 5 D 4 5 5 7 4
6 7 10 16
0 0 1
17
2
4
3
7 12 5
6 7 10 16
Applying the Row operation on the Identity matrix in the LHS and the matrix in the RHS, we get R2- > R2-5 R1 R3- > R3-6 R1 3 2 1 2 3 4 1 2 3 4 1 0 0 6 5 1 0 7 6 7 6 4 5 4 5 5 7 12 5 D 4 0 5 8 8 6 0 1 6 7 10 16 0 5 8 8 3 2 32 32 2 1 2 3 4 1 0 0 1 0 0 7 6 76 76 6 ) 4 0 1 0 5 4 5 1 0 5 4 5 5 7 12 5 D 4 2
32
0 0 1
6 0 1
6 7 10 16
3 7 5 1 2
3
4
3
7 0 5 8 8 5 0 5 8 8
Again applying the Row operation on the Identity matrix in the LHS and the matrix in the RHS, we get 2 32 32 3 2 3 1 0 0 1 0 0 1 2 3 4 1 2 3 4 6 76 76 7 6 7 0 5 4 5 1 0 5 4 5 5 7 12 5 D 4 0 5 8 8 5 4 0 1 0 1 1
6 0 1
6 7 10 16
0 0
0
0
2
3 2 3 x1 1 2 3 4 6x27 7 4 5 5 7 12 5 Thus an arbitrary vector 6 4x35 if multiplied with the matrix 6 7 10 16 x4 2 3 x1 6x27 7 gives the zero vector, then the same vector 6 4x35 when multiplied with the matrix x4 3 1 2 3 4 4 0 5 8 8 5 gives the zero vector. In other words the Null space of the 0 0 0 0 2 3 1 2 3 4 matrix 4 5 5 7 12 5 is same as that of the Null space of the matrix after per6 7 10 16 forming Row operation. (i.e.) 2 3 1 2 3 4 Null space of the matrix 4 0 5 8 8 5. 0 0 0 0 2
18
1 Matrices
2
3
1 2 3 4 Null space of the matrix 4 0 5 8 8 5 is determined as follows. 0 0 0 0 2 3 3 x1 2 3 1 2 3 4 0 6x27 4 0 5 8 8 5 6 7 D 405 4x35 0 0 0 0 0 x4 2
Bold numbers of the above matrix is treated as pivot columns. Remaining columns are called free variable columns. The corresponding variables x1 and x2 are called pivot variables and the variables x3 and x4 are called free variables. Representing the pivot variables in terms of free variables, we get x1 D 2
8 8 x3 x4 3 x3 4 x4 5 5
16 16 x3 C x4 3 x3 4 x4 5 5 1 4 x1 D x3 x4 5 5 8 8 x2 D x3 x4 5 5 x3 D x3 x1 D
x4 D x4 Thus the set of vectors as shown below form the Null space of the matrix A. 2
2 1 3 2 43 3 x1 5 5 6x27 6 8 7 6 8 7 6 7 D x3 6 5 7 C x4 6 5 7 ; x3; x4 2 R 4x35 4 1 5 4 0 5 x4 0 1 As shown above, the number of independent columns to represent the Null space is two. Hence dimension of the Null space of the matrix A is given as two. Row space and the Left Null space of the matrix A is obtained as the Column space and the Null space of the matrix AT as shown below. 2
3 12 3 4 A D 45 5 7 125 6 7 10 16
2 1 62 T A D6 43 4
3 5 6 5 77 7 7 105 12 16
1.8 Basis of the Four Fundamental Vector Spaces of the Matrix
19
Applying Row operation, R2- > R2-2 R1 R3- > R3-3 R1 R4- > R4-4 R1 We get,
2
1 6 0 6 4 0 0
5 5 8 8
3 6 5 7 7 8 5 8
Further applying the Row operation R3- > R3-(5/8) R2 R4- > R4-(5/8) R2 We get,
2
1 6 0 6 4 0 0
5 5 0 0
3 6 5 7 7 0 5 0
The Bold numbers in the above matrix are called pivots and the corresponding columns are called pivot columns. The columns of the original matrix corresponding to the pivot columns are the maximum number of independent columns of the matrix A. Hence they are the basis of the column space of the matrix AT or the Row space of the matrix A. Thus the Row space of the matrix A is represented as the set as given below. 2 3 2 3 1 5 627 657 6 7 7 ˛1 6 435 C ˛2 4 7 5 ; ˛1 ; ˛2 2 R 4
12
The dimension of the Row space of the matrix A is 2. Similarly the Left Null space of the matrix A is obtained as the Null space of the matrix AT The matrix AT after subjected to Row operation is as shown below. 2
1 6 0 6 4 0 0
5 5 0 0
3 6 5 7 7 0 5 0
20
1 Matrices
Bold letters mentioned above in the matrix are the pivot elements and the corresponding columns are the pivot columns. Other column is called free variable column. Null space of the matrix AT is same as that of the Null space of the matrix AT after subjected to Row operation. Thus the vector space of the form 2 2 3 3 2 3 2 3 1 5 6 0 x1 x1 6 0 5 5 7 607 7 4x25 D 6 7 to obtain the Null 4x25 is required such that 6 4 0 0 405 0 5 x3 x3 0 0 0 0 T space of the matrix A . The variables x1 and x2 are called pivot variables and the variable x3 is called free variable. Representing the pivot variables in terms of free variables, we get the following. x1 C 5 x2 C 6 x3 D 0- - - - - - - - - - - - - - - - - - - - - -.1/ 5 x2 5 x3 D 0- - - - - - - – - - - - - - - - - - - -.2/ Equation .2/ ) x2 D x3 Equation .1/ ) x1 D 5 x2 6 x3 ) x1 D 5 x3 6 x3 D x3 Thus the Basis of the Left Null space of the matrix A, which is the Basis of the Null space of the matrix AT is given below. 3 2 x3 7 6 6x37 8x3 2 R 5 4 x3 2 3 1 6 7 6 D 417 5˛ 8 ˛ 2 R 1 Note that the dimension of the Left Null space of the matrix A is one.
1.9 Observations on Results of the Example 1.12 In the Example 1.12, Column space, Null space, 2 1 and the Left Null space of the matrix A D 4 5 6 following
Left Column space (Row space) 3 2 3 4 5 7 12 5 is obtained as the 7 10 16
1.9 Observations on Results of the Example 1.12
21
1.9.1 Column Space 2 3 2 3 1 2 6 7 6 7 7 6 7 ˛1 6 455 C ˛2 455 ; where ˛1 ; ˛1 2 R 6 7 The dimension of the column space is two. Also note that the Column space of the matrix A sized 3 4 is the subspace of the vector space R3 .
1.9.2 Null Space 2
x1
2
3
1 3 5 6 87 6 5 7 6 7
2
45
3
6 87 6 7 6 5 7 6x27 6 7 6 7 D x3 6 7 C x4 6 7 8x3; x4 2 R 6 7 4 1 5 4 0 5 4x35 x4
0
1
The dimension of the Null space is two. Also note that the Null pace of the matrix A sized 3 4 is subspace of the vector space R4 .
1.9.3 Left Column Space (Row Space) 2 3 2 3 1 5 627 657 6 7 7 ˛1 6 435 C ˛2 4 7 5 8 ˛1 ; ˛2 2 R 4 12 The dimension of the Row space is two. Also note that the Row space of the matrix A sized 3 4 is subspace of the vector space R4 .
1.9.4 Left Null Space 2 3 1 415 ˛ 8 ˛ 2 R 1 The dimension of the Left Null space is one. Also note that the Left Null space of the matrix A sized 3 4 is subspace of the vector space R3 .
22
1 Matrices
1.9.5 Observation 1. The column space and the Left Null space of the matrix A are the subspaces of the R3 , where 3 is the number of rows of the matrix A. 2. The Null space and the Left column space of the matrix A are the subspaces of the R4 , where 4 is the number of columns of the matrix A. 3. Dimension of the column space of the matrix A C Dimension of the Null space of the matrix A D 2 C 2 D 4 D Number of Columns of the matrix A. 4. Dimension of the Left Null space of the matrix AC Dimension of the Left Column space (Row space) of the matrix A D 1 C 2 D 4 D Number of Rows of the matrix A. 5. The Column space of the matrix A and the Left Null space of the matrix A are orthogonal to each other (i.e.) any vector a 2 C.A/ and b 2 N.AT / satisfies the condition aT b D b T a D 0. Proof. Let y 2 N.AT / ) AT y D 0. Taking transpose on both sides, we get y T A D 0 ) y T Ax D 0 (Multiplying ¯arbitrary vector on both sides). ¯ Note that¯ Ax 2 C.A/. Hence proved. 6. The Row of the matrix A and the Null space of the matrix A are orthogonal to each other. Proof. Let y 2 N.A/ ) Ay D 0. Taking transpose on both sides, we get ¯ arbitrary vector x on both sides). Note y T AT D 0 ) y T AT x D 0 (Multiplying ¯ that AT x 2 C.AT /. Hence proved. In general for the matrix A of size m n dim.C.A// C dim .N.A// D n. This is known as Rank-Nullity Theorem. Dimension of the Column space of the matrix A is the maximum number of independent columns (pivot columns) of the matrix A. Let it be ‘r’. From the procedure of the determining the Null space of the matrix A (Representing all the variables in terms of free variables), it can be shown that dimension of the Null space of the matrix A is equal to the number of free variable columns of the matrix A. This is equal to n-r, where ‘n’ is the total number of columns of the matrix A and the ‘r’ is the number of pivot columns. Thus dim.C.A// C dim.R.A// D n.
1.10 Vector Representation with Different Basis 3 2 32 3 2 R2 is the vector represented with respect to the basis 4 415405 0 1 1 0 3 2 32 3 1 2 32 3 0 2 32 3 D3 C4 . (i.e.) The vector 0 415405 1 415405 0 1 4 415405 0 1 0 1 0 1
Example 1.13.
1.10 Vector Representation with Different Basis
23
5 2 32 3 Similarly 2 R2 is the vector represented with respect to the basis 6 4154 1 5 1 1 1 1 5 2 32 3 1 2 32 3 1 2 32 3 . (i.e.) The vector D5 C6 1 1 6 4154 1 5 1 4154 1 5 1 4154 1 5 1 1 1 1 1 1 1 2 32 3 1 2 32 3 5 2 32 3 The vector D 5 C 6 1 4154 1 5 1 4154 1 5 6 4154 1 5 1 1 1 1 1 1 1 2 32 3 1 Also the vector is represented as the linear combinations of 1 1 1 4 54 5 0 1 1 0 1 0 and as 1 C1 1 0 1 1 2 32 3 is represented as the linear combinations of Similarly the vector 1 4154 1 5 1 1 1 0 1 0 and as 1 C .1/ 0 1 0 1 5 2 32 3 is represented as the following Thus the vector 6 4154 1 5 1 1 1 0 1 0 C1 C 6 1 C .1/ 5 1 0 1 0 1 D .5 1 C 6 1/ D
1 0 C .5 1 C 6 1/ 0 1
1 1 5 2 32 3 1 1 6 415;405 0 1
5 2 32 1 1 5 2 32 3 3 D 6 41541 15 1 1 6 415;405 1 1 1 0 1 )
5 2 32 3 11 2 3 2 3 D 6 4154 1 5 1 415;405 1 1 0 1
24
1 Matrices
( 1 ) 5 2 32 3 5 2 32 3 1 1 Similarly D 6 415 4 1 5 6 415405 1 1 ; 0 1 1 1
1.11 Linear Transformation of the Vector T: V->U is the Linear transformation such that any vector in the vector space ‘V’ is mapped to another vector that lies in the vector space ‘U’. There exists the transformation matrix to perform this operation. The vector space V can also be equal to the vector space U. Example 1.14. T: R2 ->R2 such that x 2 R2 is mapped to another vector any vector 1 0 y 2 R2 using the relation y D x. This is the Linear operation of the 0 1 image reflection about the origin. Note that the vectors x and y are represented with respect to the standard basis. Also note that the transformation matrix is with respect to the standard basis. R2 Vector points plotted in the 2D plane before and after transformation are given below (Fig. 1.2).
vector [x y]
2D Map before Transformation vector [−x −y]
2D Map after Transformation
Fig. 1.2 Illustration of the linear transformation of the vector
1.12 Transformation Matrix with Different Basis
25
1.11.1 Trick to Compute the Transformation Matrix 1 1 Identify and note down the transformed vectors for the standard basis and . 0 0 1 0 For the example mentioned above and are the transformed vectors 0 1 1 1 corresponding to the standard basis and respectively. The transformation 0 0 matrix corresponding to the above transformation is obtained by representing the transformed vectors column wise. For the above mentioned example, the transformation matrix is given as 1 0 0 1 This is same as the one given in the Example 1.14.
1.12 Transformation Matrix with Different Basis Consider the transformation matrix with respect to the standard basis as described below 1 0 0 1 1 1 5 2 32 3 5 2 32 3 D represented with reConsider the vector 1 1 6 415;405 6 4154 1 5 1 1 0 1 1 1 1 0 spect to the basis and ; respectively. 1 1 0 1 1 1 5 2 32 3 The vector in the standard basis (i.e.) is transformed into 1 1 6 415;405 0 1 1 0 another vector in the standard basis using the transformation matrix as 0 1 1 0 1 1 5 2 32 3 0 1 1 1 6 415;405 0 1 The above vector in standard basis is equivalent to the vector given below with 1 1 respect to the basis 1 1
26
1 Matrices
(
) 1 1 0 1 1 5 2 32 3 1 1 0 1 1 1 6 415 4 1 5 1 1 ; 1 1
5 Note that the vector and the transformed vector are represented with respect to 6 1 1 the basis ; 1 1 1 0 The matrix is the transformation matrix with respect to the standard 0 1 ( ) 1 1 0 1 1 1 1 3 is the correspond2 32 basis and the matrix 0 1 1 1 415 4 1 5 1 1 ; 1 1 1 1 ing transformation matrix with respect to the basis ; These matrices are 1 1 called similar matrices. In general similar matrix of the matrix A is obtained using invertible matrix M as M 1 AM.
1.13 Orthogonality 1.13.1 Basic Definitions and Results Inner product: Inner product of the vectors x; y 2 Rn is defined as x T y D 2 3 x1 6x27 6 7 6 7 6 : 7 6 7 y1y2 : : : yn D x1 y1 C x2 y2 C x3 y3 : : : C xn yn 6 : 7 6 7 4 : 5 xn Orthogonal vectors: Two vectors x and y are said to be orthogonal if x T y D 0. Orthogonal basis: Basis B is said to be orthogonal if x T y D 0 8 x; y 2 B; x ¤ y Orthonormal basis: Basis B is said to be orthonormal basis if x T y D 0 and x T x D 1 8 x; y 2 B; x ¤ y Example. Standard basis for the vector space Rn A set of mutually orthogonal non-zero vectors are linearly independent. But set of independent vectors need not be orthogonal vectors.
1.14 System of Linear Equation
27
Orthogonality of subspaces: Consider two subspaces S Rn and T Rn . The two subspace S and T are said to be orthogonal if x T y D 0 8 x; 2 S and y 2 T; x ¤ y Example. 1. Column space of the arbitrary matrix A and the Left Null space of the matrix A are the orthogonal spaces. 2. Row space of the arbitrary matrix A and the Null space of the matrix A are orthogonal spaces.
1.13.2 Orthogonal Complement The Vector space ‘V’ that consists of the all the vectors that are orthogonal to the vector space ‘S’ is called the orthogonal complement of the vector space ‘S’. It is denoted as V D S ? . Example 1.15. 1. Column space of the arbitrary matrix A and the Left Null space of the matrix A are the orthogonal complement. Row space of the arbitrary matrix A and the Null space of the matrix A are the orthogonal complement.
1.14 System of Linear Equation System of Linear Equation can be viewed as the problem of estimating the value of ci 0 S , (i.e.) c1 ; c2 ; c3 ; : : : cm 2 R such that c1 X 1 C c2 X2 C : : : cm X m D b. 3 2 3 x1i b1 6x 7 6b 7 6 2i 7 6 27 6 7 6 7 6x3i 7 6b 7 where Xi D 6 7 and b D 6 3 7 and xij ; bj 2 R 6 : 7 6:7 6 7 6 7 4 : 5 4:5 xni bn 2
The Expanded representation of the above mentioned equation is represented as follows. c1 x11 C c2 x12 C c3 x13 C cm x1m D b1 c1 x21 C c2 x22 C c3 x23 C cm x2m D b2 c1 x31 C c2 x32 C c3 x33 C cm x3m D b3
28
1 Matrices
c1 xn1 C c2 xn2 C c3 xn3 C cm xnm D bn Thus the above equation can also be represented as 32 3 2 3 2 c1 b1 x11 x12 x13 x1m 76 7 6 7 6 x 6 21 x22 x23 x2m 7 6 c2 7 6 b2 7 76 7 6 7 6 6 x31 x32 x33 x3m 7 6 c3 7 D 6 b3 7 76 7 6 7 6 4 5 4 5 4 5 xn1 xn2 xn3 xnm cm bm D ŒXc D b
1.15 Solutions for the System of Linear Equation [A] x D b Note: Let the size of the matrix ‘A’ be m n. x 2 Rn ; b 2 Rm . Basis of the vector space Rn can be obtained as the concatenation of the basis vectors in the Row space of the matrix A with the Basis of the Null space of the matrix ‘A’. Let fb1 ; b2 ; : : : br g be the basis of the Row space of the matrix A. Also fbrC1 ; brC2 ; : : : bn g be the basis of the Null space of the matrix A. Then the basis of the vector space Rn is given a fb1 ; b2 ; : : : bn g. Consider any arbitrary vector ‘v’ in the vector space Rn , which can be represented as the linear combinations of the above mentioned basis vectors as given below. v D ˛1 b1 C ˛2 b2 C ˛3 b3 C ˛r br C ˛rC1 brC1 C ˛rC2 brC2 C C ˛n bn D ˛1 b1 C ˛2 b2 C ˛3 b3 C ˛r br / C .˛rC1 brC1 C ˛rC2 brC2 C C an bn / D vr C vn Thus any vector ‘v’ in the vector space Rn can be represented as the direct summation of the vector from the Row space ‘vr’ and the vector from the Null space ‘vn’. Similarly any vector in the vector space Rm can be represented as the summation of the vector from the column space and the vector from the Left Null space. Case 1: If the vector b lies in the column space of the matrix A The solution that exists in the vector space Rn can be represented as the direct sum of the vector from the row space and the vector from the null space. The vector which is obtained from the row space is unique. But any vector from the null space can be chosen to add with the one chosen from the row space to get the solution.
1.15 Solutions for the System of Linear Equation [A] x D b
29
Thus in general, there exists infinite number of solutions for the system of Linear equation. But, if the Null space of the matrix ‘A’ is 0, then the unique solution occurs for the equation of Linear Equation [A] x D b. In this case the unique solution is the vector obtained from the row space Example 1.16. Consider the shown below 2 1 45 6
Linear Equation represented in the matrix form as
2 3 2 3 3 x1 1 2 3 4 6x2 7 7 D 425 5 7 12 5 6 4x3 5 3 7 10 16 x4 2 3 2 3 2 3 x1 1 2 3 4 1 6x27 7 and b D 425 Note that A D 4 5 5 7 12 5 The x D 6 4x35 6 7 10 16 3 x4 The above equation can be viewed as 2 3 2 3 2 3 2 3 2 3 1 2 3 4 1 455 x1 C 455 x2 C 4 7 5 x3 C 4125 x4 D 425 6 7 10 16 3 From the above, we can interpret that the solution for the above linear equation occurs only if the vector b lies in the column space of the matrix A.
1.15.1 Trick to Obtain the Solution 2 3 3 x1 2 3 1 2 3 4 1 6x27 4 5 5 7 12 5 6 7 D 425 4x35 6 7 10 16 3 x4 2
2 3 3 x1 2 32 3 1 0 0 1 2 3 4 1 0 0 1 6x27 7 D 4 0 1 0 5 425 ) 4 0 1 0 5 4 5 5 7 12 5 6 4x35 0 0 1 6 7 10 16 0 0 1 3 x4 2
32
Applying the Row operation on the identity matrix on both sides will not affect the equality
30
1 Matrices
R2- > R2-5 R1 R3- > R3-6 R1 2 3 2 3 x1 1 1 0 0 1 2 3 4 6x27 6 6 5 1 0 7 4 6 7 5 4 5 5 5 7 12 4 5 D 4 5 x3 6 7 10 16 6 0 1 6 x4 2 3 2 2 32 3 x1 1 0 0 6 7 1 2 3 4 6 76 7 6x27 7 6 ) 4 0 1 0 5 4 0 5 8 8 5 6 6x37 D 4 0 5 8 8 4 5 0 0 1 x4 2
32
32 3 1 76 7 1 0 5 425 0 0 0 1 1 0 0
3 32
1
3
76 7 0 1 0 5 435 0 0 1
3
Applying again the Row operation R3- > R3-R2
2 3 2 32 3 3 x1 1 0 0 1 1 0 0 1 2 3 4 6x27 7D4 0 1 5 435 )4 0 1 0 0 5 4 0 5 8 8 5 6 4x35 0 1 1 3 0 1 1 0 5 8 8 x4 2
We get
32
2 3 3 x1 2 3 1 2 3 4 1 6x27 7 D 435 ) 4 0 5 8 8 5 6 4x35 0 0 0 0 0 x4 2
The modified equation mentioned above can be used to compute the values for the unknown variables x1; x2; x3 and 2 x4.3 1 To get the solution, the vector 435 should lie in the column space of the matrix 0 2 3 1 2 3 4 40 5 8 85. The Bold letters mentioned in the matrix are called pivot elements 0 0 0 0 and the corresponding columns are called pivot columns and the remaining columns are called free variable columns. Hence the column space of the above matrix is obtained as 2 3 2 3 1 2 4 4 5 x1 0 C x2 55 0 0
1.15 Solutions for the System of Linear Equation [A] x D b
31
Thus for some values x1 and x2 2 3 2 3 2 3 1 2 1 6 7 6 7 6 7 x1 405 C x2 455 D 435 0 ) x2 D
0
0
3 1 x1 D 5 5
Thus the particular solution of the equation 3 2 3 2 2 3 1=5 x1 2 3 3 x1 1 1 2 3 4 7 6 7 6 6 7 x27 6 7 6x27 6 3=5 7 6 76 7 6 7 6 7 6 4 5 5 7 12 5 6 7 D 425 is 6 7 D 6 7 4x35 4 0 5 4x35 3 6 7 10 16 0 x4 x4 2
2
3 2 3 n1 1 2 3 4 6n27 7 4 5 Consider the vector 6 4n35 in the null space of the matrix A D 5 5 7 12 6 7 10 16 n4 (i.e.) 2 3 2 3 n1 2 3 1 2 3 4 0 6 7 n27 6 7 6 76 6 7 4 5 5 7 12 5 6 7 D 405 4n35 6 7 10 16 0 n4 Combining the two equation, we get 2
1 2 3 4 4 5 5 7 12 6 7 10 16 2
1 2 3 )4 5 5 7 6 7 10
2 3 3 x1 C n1 2 3 1C0 6x2 C n27 7 4 56 5 4x3 C n35 D 2 C 0 3C0 x4 C n4 2 3 3 x1 C n1 2 3 4 1 6x2 C n27 7 D 425 12 5 6 4x3 C n35 16 3 x4 C n4
3 x1 C n1 6x2 C n27 7 )6 4x3 C n35 is the solution of the Linear equation. x4 C n4 2
32
1 Matrices
2
2 3 3 x1 n1 6x27 6n27 6 7 7 where 6 4x35 is the particular solution and 4n35 is the particular vector in the Null x4 n4 space of the matrix A So the complete solution of Linear Equation of the form Ax D b consists of particular solution C any vector in the null space of the matrix A. Computation of Null space of the matrix A The matrix A in RREF is given below 2
3 1 2 3 4 4 0 5 8 8 5 0 0 0 0 Representing pivot variables in terms of free variables, we get 8 8 x4 x2 D x3 C 5 5 x1 D 2x2 3x3 4x4 8 8 x4 3x3 4x4 ) x1 D 2 x3 C 5 5 16 16 x3 C x4 3x3 4x4 ) x1 D 5 5 1 4 x4 ) x1 D x3 C 5 5 The Null space of the matrix A is given as 2 43 2 1 3 2 1 3 4 5 5 x3 5 x4 5 6 87 6 87 6 8 7 8 6 7 6 7 6 5 x3 5 x47 6 7 ) x3 6 5 7 C x4 6 5 7 4 0 5 4 1 5 4 5 x3 0 1 x4 Thus the Complete solution of the Linear Equation is given as 2 1 3 2 43 2 13 5 5 5 6 87 6 87 6 3 7 6 5 7 6 5 7 6 5 7 6 7 C x3 6 7 C x4 6 7 ; where x3; x4 2 R 6 1 7 6 0 7 6 0 7 4 5 4 5 4 5 0 0 1 Case 2: If the vector b does not lie in the column space of the matrix A For any x 2 Rm ; Ax always lies in the column space of the matrix A and hence If the vector b does not lie in the column space of the matrix A, then solution doesn’t occur.
1.15 Solutions for the System of Linear Equation [A] x D b
33
In this case the vector x is estimated such that kAx bk is minimized. It is also known that the solution occurs in the vector space Rm (i.e.) x 2 Rm Any vector in the vector space Rm can be represented as the direct sum of the vector from the column space and the vector from the Left Null space of the matrix A. The vector bc that lies in the column space of the matrix A are to be found such that kbc bk is minimized. Note: The vector b D bc C bln : Multiplying the matrix AT on both sides; we get AT b D AT .bc C bln / D AT bc C AT bln D AT bc C 0 ŒBecause bln lies in the left Null Space of the matrix A ) AT b D AT bc Consider solving the equation Ax D b when b does not lie in the column space of the matrix A. The vector x cannot be found, because b does not lie in the column space of A. Hence the best value for the vector x is estimated as b x such that Ab x D bc Multiplying the matrix AT on both sides, we get x D AT bc AT Ab )b x D .AT A/1 AT bc )b x D .AT A/1 AT b
Œ* AT bc D AT b
Note that .AT A/1 .AT A/ is the identity matrix and hence .AT A/1 AT is the left inverse of the matrix A. Also note that the best estimate b x D .AT A/1 AT b exists only when AT A is T invertible. Also note that A A is invertible if and only if the columns of A are independent (i.e.) A is invertible. This is true because Null space of A is exactly equal to the Null space of AT A. So if N.A/ is 0, then N.AT A/ is 0 and hence if A is invertible, AT A is invertible and vice versa. x is the estimate It can also be shown that the above estimated value for x (i.e.) b of the Multi variable x such that kAx bk is minimized. Example 1.17. Solving the Linear Equation of the form Ax D b 2 1 Where A D 42 3
2 3 3 4 5 55 b D 4 7 5 6 10
34
1 Matrices
The best estimate for the variable x lies in the vector space R2 . It is known that dim.N.A// C dim.R.A// D 2 dim.C.A// C dim.N.AT // D 3 dim.N.A// C dim.C.A// D 2 Note that the dimension of the column space of the matrix A is 2. From the above mentioned fact, dimension of the Null space of the matrix A is 0, the dimension of the row space of the matrix A is 2 and the dimension of the Left Null space of the matrix A is 1. The vector b does not lie in the column space of A. The best estimate for the variable x is found as b x such that Ab x D bc and kbc bk is minimized. The vector bc lies in the column space of the matrix A, which can be represented as 2
3 1 4 c x1 D Ab x bc D 4 2 5 5 b x2 36 Consider the vectors bc and b as shown below (Fig. 1.3). From the figure, it can be realized that the distance between the vector bc and b occur only when the vector bc is orthogonal to the vector .b bc /. As bc lies in the column space of A, the vector .b bc / orthogonal to the individual basis of the column space of the matrix A, 2 2 3T 0 1 1 ) 425 @b 42 3 3
1 3 4 c x1 A D 0- - - - - - - - - - - - - - - - - - - - - - - - - - - - -.1/ 55 b x2 6
b
(b-bc)
Fig. 1.3 Pictorial representation of the vectors bc and b
0
bc
1.15 Solutions for the System of Linear Equation [A] x D b
2 3T 0 2 4 1 455 @b 42 6 3
35
1 3 4 c x1 A D 0- - - - - - - - - - - - - - - - - - - - - - - - - - - - -.2/ 55 b x2 6
Combining both the equations we get 1 3T 0 2 3 1 4 1 4 c 4 2 5 5 @b 4 2 5 5 x1 A D 0 b x2 3 6 3 6 3 2 3T 2 3T 2 1 4 c 1 4 1 4 x1 )4 2 5 5 bD4 2 5 5 4 2 5 5 b x2 3 6 3 6 3 6 2 3T 2 2 3T 3 1 4 1 4 1 4 c x1 )4 2 5 5 4 2 5 5 D4 2 5 5 b b x2 3 6 3 6 3 6 02 311 2 3T 3T 2 1 4 1 4 1 4 c x1 B C ) D @4 2 5 5 4 2 5 5 A 4 2 5 5 b b x2 3 6 3 6 3 6 02 1 1 2 3T 2 3 3 3T 2 1 4 14 5 1 4 c x1 B4 C 4 4 4 5 5 5 D@ 2 5 ) x D .AT A/1 AT b 2 5 A 25 7 5 D Œb b x2 3 6 36 10 3 6 2
which is same as the one used in the previous section. 1:7222 Œb x D 0:7778 Œb x D .AT A/1 AT b Multiplying both sides by A AŒb x D .AT A/1 AT b AŒb x is the vector bc that lies in the column space of the matrix A. ) A.AT A/1 AT b D bc A.AT A/1 AT is called as the projection matrix, that projects the vector in the vector space R3 into the column space of the matrix A such that kbc bk is minimized.
36
1 Matrices
In this example Projection matrix is 2
3 0:8333 0:3333 0:1667 4 0:3333 0:3333 0:3333 5 0:1667 0:3333 0:8333 .AT A/1 .AT A/ D Identity matrix. Hence it is also clear that .AT A/1 AT is the left inverse of the matrix A, which is usually represented as AC . In this example Left inverse matrix of the matrix A is given as 0:9444 0:1111 0:7222 0:4444 0:1111 0:2222 2 3 1 4 0:9444 0:1111 0:7222 4 1 0 5 2 5 D 0:4444 0:1111 0:2222 0 1 3 6 Left inverse of the matrix exists if .AT A/1 exists. Similarly if .AAT / .AAT /1 D Identity matrix, AT .AAT /1 is the right inverse of the matrix A. Right inverse of the matrix exists only when .AAT /1 exists. As already mentioned, if the matrix A consists of all columns independent, .AT A/1 exists and hence Left inverse of the matrix exists. Similarly if all the rows of the matrix A are independent (i.e.) all the columns of the matrix AT is independent, Right inverse of the matrix exists. Thus solving the equation Ax D b when b is not in the column space of the matrix A (the matrix with all the columns are independent to each other), can be obtained by multiplying the left inverse matrix of the matrix A represented as AC with the vector b Œb x D .AT A/1 AT b D AC b Note: Solving the equation Ax D b, when b is not in the column space of the matrix A and .AT A/1 doesn’t exist can be obtained using Singular Value Decomposition (SVD) which is referred in the Section 4.25.
1.16 Gram Schmidt Orthonormalization Procedure for Obtaining Orthonormal Basis 1. Given the set of independent vectors fv1; v2; v3; : : : vng which forms the basis of the vector space V, an alternative orthonormal basis fa1; a2; : : : ang for the vector space ‘V’ can be obtained using Gram-Schmidt Orthonormalization procedure as described below (Fig. 1.4). Steps v1 1. a1 D kv1k
1.16 Gram Schmidt Orthonormalization Procedure for Obtaining Orthonormal Basis Fig. 1.4 Illustration of Gram-Schmidt orthogonalization
37
v2
v2-p*a1
0
p*a1
a1
Find out the vector a2 corresponding to the vector v1 that is orthogonal to the vector a1 2. The vector that is orthogonal to the vector a1 is v2p a1 as shown in the figure, where p a1 is the perpendicular projection of the vector v2 on the vector a1. The projection point p is computed using the following condition. .v2 pa1/T a1 D ).v2/T a1 D .pa1/T a1 )p D
v2T a1 D v2T a1Œ* a1T a1 D 1 D a1T v2 a1T a1
Thus the orthogonal vector q2 to the vector a1 is obtained as q2 D v2 q2 .a1T v2/ a1 and the corresponding orthonormal vector is given as a2 D kq2k 3. Now we are in need of the vector which is orthogonal to both a1 and a2 corresponding to the vector v3 The perpendicular projection vector of the vector v3 on the column space of the matrix consists of a1 and a2 as their columns is obtained as p1 a1 C p2 a2 (as shown in Fig. 1.5). The vector v3 p1 a1 C p2 a2 is perpendicular to both the vectors a1 and a2. The projection points p1; p2 are obtained as the perpendicular projection of the vector v3 on the vector a1 and a2 respectively which are computed as p1 D
v3T a1 D v3T a1Œ* a1T a1 D 1 D a1T v3 a1T a1
p2 D
v3T a2 D v3T a2Œ* a2T a2 D 1 D a2T v3 a2T a2
38
1 Matrices
Fig. 1.5 Illustration of Gram-Schmidt orthogonalization
v3
v3-p1*a1-p2*a2 a2
0 p1*a1+p2*a2
a1
Thus the vector perpendicular to both the orthonormal vectors a1 and a2 is given as q3 D v3 a1T v3 a1 a2T v3 a2 - - - - - - - - - - - - - - - - - - - - - - - - -(3) and the corresponding orthonormal vector which is perpendicular to both a1 and a2 is given as q3 a3 D kq3k This can also be computed using the Projection matrix as follows. The perpendicular projection vector q30 of the vector of the vector v3 on the column space of the matrix consists of a1 and a2 as their columns is obtained using projection matrix as follows Let A D Œa1 a2 (Note that a1 and a2 are the column vectors). Projection vector q30 D A.AT A/1 AT q3 ! 1 a1T a1T q3 D a1 a2 a1 a2 a2T a2T T 1 a1 q3 D a1 a2 .I / a2T a1T D a1 a2 q3 a2T D .a1a1T C a2T a2/q3 Therefore perpendicular vector q3 D v3 q30 D v3 .a1a1T C a2T a2/q3 that is same as Eq. 3 as mentioned above.
1.16 Gram Schmidt Orthonormalization Procedure for Obtaining Orthonormal Basis
39
4. Similarly the vector perpendicular to both the orthonormal vectors a1 and a2; a3; a4; : : : an 1 corresponding to the independent vector vn is given as q n D vn a1T vn a1 a2T vn a2 a3T vn a3 a4T vn a4 an1 T vn vector is
an1 and the corresponding orthonormal
an D
qn kq nk
Thus the set of orthonormal basis fa1; a2; : : : ang, of the vector space ‘V’ corresponding to the basis vectors fv1; v2; : : : ; vng is obtained using Gram-Schmidt orthogonalization procedure. 2 3 1 5 10 62 6 107 7 Example 1.18. Consider the matrix A D 6 43 7 115 whose column vectors are 4 8 12 independent to each other. Using Gram-Schmidt, the corresponding orthonormal vectors are obtained as follows 2 3 2 3 2 3 1 5 10 627 667 6107 7 6 7 6 7 Let v1 D 6 435 ; v2 D 475 and v3 D 4115 4
8
2 3 0:1826 60:36517 v1 7 D6 a1 D 40:54775 kv1k 0:7303
12
2
3 2:6667 6 1:3333 7 7 q2 D v2 .a1T v2/ a1 D 6 4 0:0000 5 1:3333 2 3 0:1865 6 0:4082 7 q2 7 D6 a2 D 4 0:0000 5 kq2k 0:4082
2
3 0:3000 60:40007 7 q3 D v3 a1T v3 a1 a2T v3 a2 D 6 40:10005 0:2000
40
1 Matrices
2
3 0:5477 60:73037 q3 7 D6 a3 D 40:18265 kq3k 0:3651 Thus the set of orthonormal vectors are 2 3 0:1826 0:8165 0:5477 60:3651 0:4082 0:73037 6 7 40:5477 0:0000 0:18265 0:7303 0:4082 0:3651
1.17 QR Factorization The matrix with independent column vectors can be represented as the product of the matrix with orthonormal column vectors and the upper triangular matrix. Consider the matrix A as shown below. A D Œv1 v2 : : : vn, where v1 v2 : : : vn are the independent vectors. From Gram Schmidt orthogonalization procedure discussed in the previous section, v1 kv1k ) v1 D a1kv1k a1 D
Multiplying a1T on both sides, we get ) a1T v1 D a1T a1kv1k ) kv1k D a1T v1 From the above, we get v1 D a1.a1T v1/ Similarly q2 kq2k ) kq2ka2 D q2
a2 D
Multiplying a2T on both sides, we get ) a2T q2 D a2T a2kq2k ) kq2k D a2T q2
1.17 QR Factorization
41
Also q2 D v2 .a1T v2/ a1 .see previous section/ Multiplying a2T on both sides, we get a2T q2 D a2T v2 a2T .a1T v2/ a1 ) a2T q2 D a2T v2 Also we know kq2k D a2T q2 ) a2T v2 D kq2k Rewriting the equation for v2 we get, v2 D q2 C .a1T v2/ a1 )v2 D kq2ka2 C .a1T v2/ a1 )v2 D .a2T v2/ a2 C .a1T v2/ a1 Similarly it can be shown that )v3 D a3 .a3T v3/ C a2 .a2T v3/ C a1 .a1T v3/ Similarly it can be shown that ) vn D an .anT vn/ C an1 .an1 T vn/ C an2 .an2 T vn/ Can3 .an3 T vn/ C an4 .an4 T vn/ C a1 .a1 T vn/ Representing the above list of equations for v1; v2; v3; : : : vn in matrix form A D v1 v2 v3 : : : vn 2 T .a1 v1/ .a1T v2/ .a1T v3/ 6 0 .a2T v2/ .a2T v3/ 6 6 0 0 .a3T v3/ 6 6 0 0 6 0 D a1 a2 a3 : : : an 6 6 0 0 0 6 6 0 0 0 6 4 ::: ::: ::: 0 0 0
3 .a1T vn/ .a2T vn/ 7 7 .a3T vn/ 7 7 7 .a4T vn/ 7 7 T .a5 vn/ 7 7 .a6T vn/ 7 7 ::: 5 .anT vn/
Thus the matrix A is represented as the product of the matrix consists of orthonormal vectors and the upper triangular matrix as shown above.
42
1 Matrices
Example 1.19. From the Example 1.17, QR factorization of the matrix A is given below 2 3 1 5 10 62 6 107 6 7 43 7 115 D 4 8 12 2 3 3 0:1826 0:8165 0:5477 2 60:3651 0:4082 0:73037 5:4772 12:7802 20:2657 6 74 0 3:2660 7:3485 5 40:5477 0:0000 0:18265 0 0 i 0:5477 0:7303 0:4082 0:3651
1.18 Eigen Values and Eigen Vectors For the given matrix A, if there exists the non zero vector x such that Ax D x, where is the scaling factor, then the vector x is the Eigen vector corresponding to the matrix A. x is in the column space of the matrix A. As is the scalar, the Eigen vector x lies in the column space of the matrix A. )ŒA I x D 0 where I is the Identity matrix ) The Null space of the matrix ŒA I ¤ 0 ) If the matrix A is the square matrix, the matrix ŒA I is called as singular matrix. ) Determinant .A I / D 0, which is represented as jA I j D 0 is the polynomial equation with variable . The equation thus obtained is called characteristic equation and the solutions to the characteristic equation are called Eigen values. Let the degree of the characteristic polynomial be ‘n’ and the corresponding Eigen values are 1; 2; 3; : : : n (say). Eigen values thus obtained need not be distinct. For every distinct Eigen values, there exist one or more Eigen vectors that are obtained as follows. The Eigen vectors corresponding to the Eigen value 1 are obtained as the basis of the Null space of the matrix ŒA 1 I . 2 3 1 0 0 Example 1.20. Consider the matrix A D 40 2 35 0 4 5 The Characteristic polynomial of the matrix A is given as 3 C 82 5 2 D 0
1.18 Eigen Values and Eigen Vectors
43
The Eigen values of the matrix A are given as 1 D 1 p 7 C 57 D 7:2749 2 D 2 p 7 57 D 0:2749 3 D 2 The Eigen vectors corresponding to the distinct Eigen values are obtained as follows. Eigen vector corresponding to the Eigen value 1 D 1 is the null space of the matrix 2 3 2 3 2 3 1 0 0 1 0 0 0 0 0 ŒA 1 I D A D 40 2 35 1 40 1 05 D 40 1 35 0 4 5 0 0 1 0 4 4 2
3 0 0 0 The Null space of the matrix 40 1 35 is obtained as follows. 0 4 4 2 3 1 405 x; x; 2 R 0 Similarly the Null space of the matrix ŒA 2 I is obtained as 2
3 0 40:49445 x; x 2 R 0:8693 and the Null space of the matrix ŒA 3 I is obtained as 3 0 40:79685 x; x 2 R 0:6042 2
2 3 2 3 2 3 1 0 0 Thus the Eigen vectors are 405 ; 40:49445 ; 40:79685 0 0:8693 0:6042
44
1 Matrices
1.19 Geometric Multiplicity (Versus) Algebraic Multiplicity Let the distinct Eigen values of the arbitrary matrix ‘A’ are 1 ; 2 ; 3 ; : : : k such that 1 is repeated n1 times, 2 is repeated n2 times and similarly k is repeated nk times. The sequence of numbers n1 ; n2 ; n3 ; : : : nk are called Algebraic multiplicity of the corresponding Eigen values 1 ; 2 ; 3 ; : : : k . The dimension of the Null space of the following matrices ŒA 1 I ; ŒA 2 I ; ŒA 3 I ; : : : ŒA k I are m1 ; m2 ; m3 ; : : : mk respectively. The sequence of numbers m1 ; m2 ; m3 ; : : : ; mk are called Geometric multiplicity corresponding to the Eigen values 1 ; 2 ; 3 ; : : : k . For any Eigen value ‘k ’, mk nk . If mk < nk , the concern matrix ‘A’ is called deficient matrix. Note: 1. Eigen vectors corresponding to distinct Eigen values are independent. Proof. Let 1 ; 2 ; 3 forms the Eigen values of the matrix A and the corresponding Eigen vectors are e1 ; e2 ; e3 respectively. Let suppose c1 e1 C c2 e2 C c3 e3 D 0 .a/: Multiply the matrix A on both sides, we get c1 Ae1 C c2 Ae2 C c3 Ae3 0: ) c1 Ae1 C c2 Ae2 C c3 Ae3 D 0: ) c1 1 e1 C c2 2 e2 C c3 3 e3 D 0 .b/: .b/ 1 .a/ we get cl .1 3 /e1 C c2 .2 3 /e2 D 0 .c/ Multiplying the matrix A on both sides, we get c1 .1 3 /A e1 C c2 .2 3 /Ae2 D 0 )c1 .1 3 /1 e1 C c2 .2 3 /2 e2 D 0 .d / .d / 2 .c/ we get c1 .1 3 /.1 2 D 0 ) c1 D 0 Œ* 1 ; 2 ; 3 are distinct Eigen values
Similarly it can be shown that c2 D 0 and c3 D 0 Hence proved (See Section 1.6)
1.19 Geometric Multiplicity (Versus) Algebraic Multiplicity
45
In the Example 1.19, the Eigen vectors corresponding to distinct Eigen values 2 3 2 3 2 3 1 0 0 1, 7.2749 and 0:2749 are 405 ; 40:49445 ; 40:79685. They are independent 0 0:8693 0:6042 vectors. 2. Note that Eigen vectors corresponding to the particular Eigen value is the basis of the null space matrix of the form ŒA 1 I , where 1 is the Eigen value of the matrix A and hence they are independent vectors. Also Eigen vectors corresponding to distinct Eigen values are also independent. Number of Eigen values with or without repetition is equal to the degree of the characteristic function. Thus if the matrix A of size m n is not deficient, Eigen vectors of the matrix A forms the basis for the Rm . In the Example 1.19, the matrix A of size 3 3 is not the deficient matrix because of the following reasons. Geometric multiplicity of the Eigen value 1 D Algebraic multiplicity of the
particular Eigen value 1 D 1
Geometric multiplicity of the Eigen value 7.2749 D Algebraic multiplicity of
the particular Eigen value 7:2749 D 1
Geometric multiplicity of the Eigen value 0:2749 D Algebraic multiplicity
of the particular Eigen value 0:2749 D 1
Thus the Eigen vectors mentioned above forms the basis of the vector space R3 3. Let Determinant .ŒA I D . 1 /. 2 /. 3 / : : : . n / Substituting the value for D 0 on both sides, we get Determinant .ŒA/ D .0 1 /.0 2 /.0 3 / : : : .0 n / D D .1/n 1 2 3 : : : n It can also be shown that trace of the matrix D a1 C a2 C a3 C an D 1 C 2 C 3 C C n by comparing the co-efficient of ./n1 4. Let one of the similar matrix for the arbitrary matrix A of size n n be V and is obtained using the arbitrary invertible matrix M of size n n as B D M 1 AM. It can be shown that the Eigen values of the matrices A and B are equal. Also it can also be shown that if x is the Eigen vector of the matrix A, then M 1 x is the Eigen vector of the matrix B. Proof. Consider the characteristic of the matrix B as jB I j D 0j. Substitute B D M 1 AM in the equation we get jM 1 AM I j D 0j )jM 1 AM M 1 M j D 0j )jM 1 .A I /M j D 0j
46
1 Matrices
)j.A I /j D 0j ) The Characteristic equation of the matrix A and the Characteristic equation of the matrix B are the same and hence their Eigen values are equal. If ‘x’ is the Eigen vector of the matrix A corresponding to the Eigen value ‘’. Ax D x. )MBM 1 x D x)B.M 1 x/ D .M 1 x/ Thus M 1 x is the Eigen vector of the matrix B. 5. Computing Eigen vector for the block diagonal matrix. Consider the matrix of the form 2 3 1 2 3 0 0 0 64 5 6 0 0 0 7 6 7 6 7 A 0 67 8 9 0 0 0 7 C D6 ; where 7D 60 0 0 10 11 127 0 B 6 7 40 0 0 13 14 155 0 0 0 16 17 18 2 3 2 3 1 2 3 10 11 12 A D 44 5 65 B D 413 14 155 7 8 9
16 17 18
Eigen vectors of the matrix A (Arranged column wise) are given as 2 3 0:2320 0:7858 0:4082 40:5253 0:0868 0:81655 0:8187 0:6123 0:4082 Similarly Eigen vectors of the matrix B (Arranged column wise) are given as 2 3 0:4482 0:7392 0:4082 40:5689 0:0333 0:81655 0:6896 0:6727 0:4082 The Eigen vectors of the matrix C are obtained as follows. (See Eigen vectors of A and B.) 3 2 0:2320 0:7858 0:4082 0 0 0 60:5253 0:0868 0:8165 0 0 0 7 7 6 7 6 0 0 0 7 60:8187 0:6123 0:4082 7 6 6 0 0 0 0:4482 0:7392 0:4082 7 7 6 4 0 0 0 0:5689 0:0333 0:81655 0 0 0 0:6896 0:6727 0:4082
1.20 Diagonalization of the Matrix
47
1.20 Diagonalization of the Matrix If the matrix A is not the deficient matrix, it can be diagonalizable as described below. If the matrix A of size m n is not the deficient matrix, then there exists ‘n’ Eigen vectors e1 ; e2 ; e3 ; e4 ; e5 ; e6 ; : : : en , corresponding to ‘n’ Eigen values 1 2 3 4 5 : : : n with or without repetition, that satisfies the following conditions. A e 1 D 1 e 1 A e 2 D 2 e 2 A e 3 D 3 e 3 ::: A e n D 1 e n
The above set of equations is written in the matrix form as shown below. 2
32 a11 a12 a13 a14 : : : a1n e11 e21 6a 7 6e a a a : : : a 2n 7 6 12 e22 6 21 22 23 24 6 76 6a31 a32 a33 a34 : : : a3n 7 6e13 e23 6 76 6a41 a42 a43 a44 : : : a4n 7 6e14 e24 6 76 4: : : : : : : : : : : : : : : : : : 5 4: : : : : : an1 an2 an3 an4 : : : ann e1n e2n 32 2 1 0 e11 e21 e31 e41 : : : en1 76 6e 6 12 e22 e23 e42 : : : en2 7 6 0 2 76 6 e e e : : : en3 7 6 0 0 6e D 6 13 23 33 43 76 6e14 e24 e34 e44 : : : en4 7 6 0 0 76 6 4 : : : : : : : : : : : : : : : : : : 5 4: : : : : : e1n e2n e3n e4n : : : enn 0 0 2 3 a11 a12 a13 a14 : : : a1n 6 7 6a21 a22 a23 a24 : : : a2n 7 6 7 6 7 6a31 a32 a33 a34 : : : a3n 7 6 7 7 )6 6a41 a42 a43 a44 : : : a4n 7 6 7 6 7 6::: ::: ::: ::: ::: ::: 7 6 7 4 5 an1 an2 an3 an4 : : : ann
e31 e23 e33 e34 ::: e3n
e41 e42 e43 e44 ::: e4n
::: ::: ::: ::: ::: :::
0 0 ::: 0 0 ::: 3 0 : : : 0 4 : : : ::: ::: ::: 0 0 :::
3 en1 en2 7 7 7 en3 7 7 en4 7 7 :::5 enn 3 0 07 7 7 07 7 07 7 : : :5 n
48
1 Matrices
2
e11 6 6e12 6 6 6e13 6 D6 6e14 6 6 6: : : 6 4 e1n 2
e21 e31 e41 : : : en1
3
3 72 0 0 0 ::: 0 e22 e23 e42 : : : en2 7 76 1 7 6 0 2 0 0 : : : 0 7 7 7 e23 e33 e43 : : : en3 7 76 0 0 3 0 : : : 0 7 76 6 7 e24 e34 e44 : : : en4 7 0 0 0 4 : : : 0 7 76 6 7 4: : : : : : : : : : : : : : : : : :7 5 ::: ::: ::: ::: :::7 7 0 0 0 0 : : : 5 n e2n e3n e4n : : : enn
e11 6e 6 12 6 6e13 6 6e14 6 4: : : e1n
e21 e22 e23 e24 ::: e2n
e31 e23 e33 e34 ::: e3n
e41 e42 e43 e44 ::: e4n
::: ::: ::: ::: ::: :::
3T en1 en2 7 7 7 en3 7 7 en4 7 7 :::5 enn
a11 a21 a Where A D 31 a41 ::: an1
a12 a22 a32 a42 ::: an2
a13 a23 a33 a43 ::: an3
a14 a24 a34 a44 ::: an4
::: ::: ::: ::: ::: :::
a1n a2n a3n a4n ::: ann
3 2 3 2 3 2 3 e11 e21 e31 en1 6e 7 6e 7 6e 7 6e 7 12 32 32 6 7 6 7 6 7 6 n2 7 6e 7 6e 7 6e 7 6e 7 6 13 7 6 23 7 6 33 7 6 n3 7 6 7 6 7 6 7 6 7 Eigen vectors is given as e1 D 6e14 7 e2 D 6e24 7 e3 D 6e34 7 : : : en D 6en4 7 6 7 6 7 6 7 6 7 6e15 7 6e25 7 6e35 7 6en5 7 6 7 6 7 6 7 6 7 4: : :5 4: : :5 4: : :5 4:::5 e1n e2n e3n enn 2
e11 e12 e The Eigen vector matrix E D 13 e14 ::: e1n
e21 e22 e23 e24 ::: e2n
e31 e23 e33 e34 ::: e3n
e41 e42 e43 e44 ::: e4n
::: ::: ::: ::: ::: :::
en1 en2 en3 en4 ::: enn
1.21 Schur’s Lemma
49
3
2
1 0 0 0 : : : 0 60 0 0 ::: 0 7 2 7 6 7 6 6 0 0 3 0 : : : 0 7 Diagonal matrix D D 6 7 6 0 0 0 4 : : : 0 7 7 6 4: : : : : : : : : : : : : : : : : :5 0 0 0 0 : : : n Thus the non-deficient matrix A is represented as the product of the transpose of the Eigen vector matrix, Diagonal matrix and the Eigen vector matrix (i.e.) A D EDET .
1.21 Schur’s Lemma For any square matrix A, there exists the Unitary matrix U such that U H AU D T , where T is the triangular matrix. Example 1.21. Consider the matrix A as shown below. 2 3 1 2 3 A D 44 5 7 5 7 8 10 Eigen values and the corresponding Eigen vectors are as shown below. 17.1747, 1; 0:1747. The Eigen vector corresponding to the Eigen value 2 3 0:2176 17.1747 is E1 D 40:53925. 0:8136 Construct the Unitary matrix U1 with the Eigen vector E1 as the first column and other two columns are arbitrarily chosen. 2
0:2176 1 4 U1 D 0:5392 1 0:8136 0:9302 2 0:2176 1 .U1 /H A U1 D 40:5392 1 0:8136 0:9302 2 32 1 2 3 0:2176 44 5 7 5 40:5392 7 8 10 0:8136 2 17:1753 6:0233 D4 0 2:6023 0 0:3201
3 1 0:7726 5 0:2445 3H 1 0:7726 5 0:2445
3 1 1 1 0:7726 5 0:9302 0:2445 3 3:6934 0:9996 5 0:4418
50
1 Matrices
2:6023 0:9996 0:3201 0:4418 The Eigen values of the matrix B are 2:7414 and 0:3027. The corresponding 0:9905 Eigen vector 2:7414 is 0:1379 2 3 1 0 0 Form the matrix U2 D 40 0:9905 e15 Now consider the matrix B D
0 0:1379 e2 e1 The vector are chosen such that the columns of the matrix e2 0:9905 e1 are orthogonal to each other. 0:1379 e2 2 3 1 0 0 Thus the matrix U2 is chosen as 40 0:9905 1 5 0 0:1379 7:1827 2 3 1 0 0 U2 0 U2 D 40 1 0 5 0 0 52:5912 2 3 17:1753 6:4754 20:5055 .U2 /H .U1 /H A U1 U2 D 4 0 2:7417 4:9272 5 0 0 15:9140 Let U H D .U2 /H .U1 /H , which is also the Unitary Matrix 2 3 17:1753 6:4754 20:5055 ) UHAU D 4 0 2:7417 4:9272 5 0 0 15:9140 By using the procedure described above, any arbitrary matrix A, there exists the Unitary matrix U such that U H AU D T , where T is the triangular matrix.
1.22 Hermitian Matrices and Skew Hermitian Matrices Let AH is the conjugate transpose of the matrix A. The matrix is said to be Hermitian matrix if AH D A and the matrix A is said to be Skew Hermitian if AH D A. Properties: 1. x H Ax is real. Proof. Taking the conjugate transpose of the matrix x H Ax, we get .x H Ax/H D x H Ax. Hence proved.
1.22 Hermitian Matrices and Skew Hermitian Matrices
51
2. Eigen values of the Hermitian matrix are real. Proof. Vector such that Ax D x. Multiplying x H on both sides, we get x H Ax D x H x: We know x H Ax and x H x are the real numbers. ) x H x D real number ) is the real number: 3. Eigen vectors of the Hermitian matrix A corresponding to distinct Eigen values are orthogonal. Proof. Let A be the Hermitian matrix and let 1 ; 2 be the distinct Eigen values of the matrix A and the corresponding Eigen vectors are e1 and e2 . Ae1 D 1 e1 Ae2 D 2 e2 Consider .1 e1 /H e2 D .A e1 /H e2 D e1 H AH e2 D e1 H Ae2 D e1 H 2 e2 (Note that A D AH ) ) .1 e1 /H e2 D e1 H 2 e2 ) e1 H e2 .1 2 / D 0 ) e1 H e2 D 0 ŒBecause 1 2 ¤ 0 Hence proved. 2
3 1 i 3i Example 1.22. Let A D 4 i 2 4 5 be the Hermitian matrix (i.e.) A D AH 3i 4 5 Eigen values are 1:3560, 0.4123 and 8.9438 (They are real) and the corresponding Column wise Eigen vectors are 2
3 0:5410i 0:7601i 0:3599i 4 0:5728 0:6464 0:5041 5 0:6158 0:0666 0:7851
52
1 Matrices
Note that Eigen vectors are orthogonal to each other. (i.e.)
2 3 0:5410i 0:5410i 0:5728 0:6158 4 0:5728 5 D 1:110216 l 0 0:6158
Similarly it can be shown that all the Eigen vectors are orthogonal to each other. 4. Hermiitian matrix is always diagonalizable using the unitary matrix. For all arbitrary Hermitian matrixes ‘A’, there exists the Unitary matrix U such U H AU D D, where D is the Diagonal matrix. Proof. From Schur’s lemma, for any arbitrary Hermitian matrix ‘A’, there exists the Unitary matrix U such that U H AU D T . Taking Hermitian transpose on both sides, we get U H AH U D T H ) U H AU D T H ŒBecause A D AH ) T D TH ) ‘T ’, in this case is the Diagonal matrix ‘D’ with all the elements in the matrix are filled up with real numbers. Hence proved.
1.23 Unitary Matrices Columns of the unitary matrices are orthonormal to each other. Let U be the Unitary matrix, then U H U D I (Identity matrix) Properties: 1. kUxk D kxk. Proof. Sqrt..Ux/H .Ux// D Sqrt..x/H .U /H .U /.x// D Sqrt..x/H .x// D kxk Hence proved. 2. .Ux/H ..Uy/ D .x/H .y/. 3. If U1 ; U2 are Unitary matrices, then U1 U2 is also the Unitary matrix. 4. U is always invertible. In particular the inverse of the unitary matrix U is given as U H . 5. Magnitude of the Eigen values of the unitary matrix is always one. Proof. The Eigen vector x of the unitary matrix ‘U’ satisfies the condition Ux D x, where is the Eigen value of the unitary matrix
1.23 Unitary Matrices
53
Taking norm on both sides, we get kUxk D kxk .see property 1/ kUxk D kxk D sqrt..x/H .x// D sqrt.x H H x/ D sqrt..H /sqrt..x H x// D kkkxk D jjkxk ) kxk D jjkxk ) jj D 1 Hence proved. 6. Eigen vectors corresponding to distinct Eigen values are orthogonal. Proof. Consider the Unitary matrix U such that Ux1 D 1 x1 and x2 D 2 x2 , where x1 ; x2 are Eigen vectors corresponding to the distinct Eigen values 1 and 2 . Consider .1 x1 /H .2 x1 / D 1 2 x1 H x1 D .Ux1 /H .Ux1 / D x1 H x1 ) 1 2 x1 H x1 D x1 H x1 ) .1 2 1/x1 H x1 D 0 ) x1 H x1 D 0 ŒBecause .1 2 1/ ¤ 0 Hence proved Example 1.23.
0:7071 0:7071 AD 0:7071i 0:7071i
1. Note that the columns of the matrix are orthonormal to each other. 1 0 .i:e:/ AH A D 0 1 2. The Eigen values of the matrix A are 0:9659 C 0:2588i and 0:2588 0:9659i. Note that the magnitude of the Eigen values are 1. 3. The Eigen vectors corresponding to the distinct Eigen values are listed below. ˇ ˇ ˇ ˇ 0:8881 0:3251 C 0:3251i ˇ ˇ E1 D ˇ ; E2 D 0:3251 C 0:3251i ˇ 0:8881 Note that they are orthonormal to each other. .i:e/ E1
H
1 0 E2 D 0 1
54
1 Matrices
Example 1.24. The DFT matrix is the unitary matrix. Four-point DFT matrix is as shown below. 3 2 1 1 1 1 2 37 1 6 61 w w w 7 AD 2 4 4 1 w w w6 5 2 1 w3 w6 w9
j 2
where w D e 4 Note that the columns of the matrix A are orthonormal to each other. 2 1 6 0 .i:e/A0 A D 6 40 0
0 1 0 0
0 0 1 0
3 0 07 7 05 1
1. The Eigen values of the matrix A are 1, 1, i 2. The Eigen vectors corresponding to the above mentioned Eigen values are given as 2
3 2 3 2 3 2 3 0:8660 0:5 0 0:0231 0:0070i 60:28877 6 0:5 7 6 0:7071 7 60:4158 0:0035i 7 6 7 6 7 6 7 6 7 40:28875 ; 4 0:5 5 ; 4 0 5 ; 4 5 0:8085 0:2887 0:5 0:7071 0:4158 0:0035i Characteristics of the DFT Matrices 1. The Eigen values of the DFT matrices are one among the following values 1, 1, i, i. The magnitude of the Eigen values are always one. 2. Consider the circular convolution of the following two sequences as shown below. 2 3 2 3 5 1 667 627 7 6 7 Let X D 6 435 and H D 475 8 4 The circular convolution performed using matrix method is as shown below. 2 1 62 6 43 4
4 1 2 3
3 4 1 2
32 3 2 3 2 5 66 667 6687 37 76 7 D 6 7 45 475 4665 1
8
60
1.23 Unitary Matrices
55
2
1 62 The circular matrix C D 6 43 4 U as shown below. C D UDU 1 2 1 4 3 62 1 4 )6 43 2 1 4 3 2 2 10 60 6 40 0
4 1 2 3
3 4 1 2
3 2 37 7 is diagonalized using DFT Unitary matrix 45 1
3 2 3 2 0:5 0:5 0:5 0:5 6 7 37 7 D 60:5 0:5i 0:5 0:5i 7 5 4 4 0:5 0:5 0:5 0:5 5 1 05 0:5i 0:5 0:5i 32 3 0 0 0 0:5 0:5 0:5 0:5 6 7 2 C 2i 0 07 7 60:5 0:5i 0:5 0:5i 7 5 4 0 2 2i 0 0:5 0:5 0:5 0:5 5 0 0 2 0:5 0:5i 0:5 0:5i
3 2 0:5 0:5 0:5 0:5 60:5 0:5i 0:5 0:5i 7 7 Where U D 6 40:5 0:5 0:5 0:5 5 05 0:5i 0:5 0:5i 2 10 0 0 6 0 2 C 2i 0 D D 6 40 0 2 2i 0 0 0
3 0 07 7 05 2
2 3 5 667 7 Let B D 6 475 8 Multiplying Matrix B on both sides in the above equation we get, CB D UDU 1 B 2 32 3 2 1 4 3 2 5 0:5 0:5 62 1 4 37 667 60:5 0:5i 76 7 6 )6 43 2 1 45 475 D 40:5 0:5 4 3 2 1 8 05 0:5i 2 32 10 0 0 0 0:5 6 0 2 C 2i 7 60:5 0 0 6 76 40 0 2 2i 0 5 40:5 0 0 0 2 0:5
3 0:5 0:5 0:5 0:5i 7 7 0:5 0:5 5 0:5 0:5i
32 3 5 0:5 0:5 0:5 667 0:5i 0:5 0:5i 7 76 7 0:5 0:5 0:5 5 475 8 0:5i 0:5 0:5i
56
1 Matrices
The product U 1 B in the RHS as shown below is given as 2 32 3 2 3 0:5 0:5 0:5 0:5 5 13 60:5 0:5i 0:5 0:5i 7 667 61 C i 7 6 76 7 6 7 40:5 0:5 0:5 0:5 5 475 D 4 1 5 : This can be viewed as 0:5 0:5i 0:5 0:5i 8 1 i the DFT of the vector ‘B’ (Except the scaling factor). The vector thus obtained is in frequency domain. The vector thus obtained in frequency domain is scaled using the diagonal matrix D as shown below. DU 1 B 2
10 0 0 6 0 2 C 2i 0 )6 40 0 2 2i 0 0 0
3 32 3 2 0 13 1:3X102 27 6 7 6 07 7 61 C i 7 D 60:040iX10 7 2 5 5 4 5 4 0:02X10 0 1 0:04iX102 2 1 i
The obtained vector is then multiplied with the Unitary matrix U can be viewed as the inverse DFT as shown below to obtain the circular convolved output. 2 3 2 3 32 0:5 0:5 0:5 0:5 66 1:3X102 60:5 0:5i 0:5 0:5i 7 60:040iX1027 6687 6 7 6 7 76 40:5 0:5 0:5 0:5 5 4 0:02X102 5 D 4665 0:04iX102 05 0:5i 0:5 0:5i 60
1.24 Normal Matrices The matrix A that satisfies the condition AH A D AAH is called Normal Matrix. Hermitian matrix, Unitary matrix, Permutation matrix and circular matrices are called Normal matrices. All the Normal matrices are diagonalizable. Summary (see Fig. 1.6 below) 1. For any matrix A of size mm, the Eigen vectors corresponding to distinct Eigen values are independent. The Eigen vectors corresponding to the particular Eigen value are independent, because it is the basis for the Null space of the matrix of the form jŒA I . So in case of non-deficient matrix A, the Eigen vectors forms the basis for the vector space Rm . 2. All the Normal matrices are non-defective. (i.e.) Any Normal matrix A of size m m can be represented as the product of UDU H , where U is the unitary matrix and D is the Diagonal matrix. The Eigen vectors corresponding to various Eigen values are orthogonal to each other. Hermitian Matrix, Unitary Matrix (Example DFT Matrix), Permutation matrix and the Circular matrix are the examples for the Normal matrices. All Normal matrices need not be invertible.
1.24 Normal Matrices
57
Fig. 1.6 Set of deficient and non-deficient matrices 1. Deficient Matrices 2. Non-Deficient Matrices 3. Invertible Matrices 4. Unitary Matrices 5. Permutation Matrices 6. Hermitian Matrices 7. Normal Matrices
3. The Eigen values of the Hermitian matrix A of size m m are real. Any arbitrary Hermitian matrix can be represented as the product of UDU H , where U is the unitary matrix and D is the Diagonal matrix. The Eigen vectors of the Hermitian matrix ‘A’ forms the orthogonal basis of the vector space Rm . Hermitian matrix need not be invertible. 4. The magnitude of the Eigen values of the Unitary matrix A of size m m is 1. The value can be complex. This matrix can also be represented always as UDU H . where U is the unitary matrix and D is the Diagonal matrix. The Eigen vectors of the Unitary matrix ‘A’ forms the orthogonal basis of the vector space Rm . Unitary matrix is always invertible. 5. DFT matrix is the example for the Unitary matrix which is the type Normal matrix. DFT matrix is always diagonalizable (i.e.) non-deficient. The DFT matrix A of size m m can be represented as UDU H , where U is the unitary matrix and D is the Diagonal matrix. The Eigen values of the matrix A hold any of the following values i, I, 1, 1. DFT matrix is the Unitary matrix and is always invertible. The Eigen vectors of the DFT matrix ‘A’ forms the orthogonal basis of the vector space Rm . 6. Permutation matrix is the example for the unitary matrix, which is the type of the Normal matrix and hence diagonalizable (i.e.) non-deficient. The permutation matrix A of size m m can be represented as UDU H . The magnitude of the Eigen values of the permutation matrix is 1. Permutation matrix, which is the unitary matrix, is always invertible. The Eigen vectors of the Permutation matrix ‘A’ of size m m forms the orthogonal basis of the vector space Rm . 7. Circular matrix is the type of Normal matrix and hence diagonalizable (i.e.) nondeficient. The circular matrix A of size m m can be represented as U H , where
58
1 Matrices
U is the unitary matrix and D is the Diagonal matrix. Circular matrix need not be invertible matrix. The Eigen vectors of the Circular matrix ‘A’ forms the orthogonal basis of the vector space Rm .
1.25 Applications of Diagonalization of the Non-deficient Matrix (a) Solving the Difference Equation of the form UkC1 D AU k , where 2
UkC1
2 3 3 x1.k C 1/ x1.k/ 6 x2.k C 2/ 7 6 x2.k/ 7 6 7 6 7 6 7 6 7 D 6 x3.k C 3/ 7 Uk D 6 x3.k/ 7 6 7 6 7 4 5 4 ::: 5 ::: x n.k C 1/ x n.k/
Where ‘A’ is the non-deficient n n matrix. Example 1.25. Let the non-deficient 2 2 matrix be A D
11 . 10
UkC1 D
x1.k C 1/ x1.k/ and Uk D x2.k C 2/ x2.k/
Solving UkC1 D AU k is equivalent to solving the equation Uk D Ak U0 . Note that the matrix A is the unitary matrix. Therefore the matrix A can be represented as A D ED EH (as given below), where E is the Eigen matrix, in which the columns are the Eigen vectors. 11 10 0:5257 0:8507 0:6180 0 0:5257 0:8507 D 0:8507 0:5257 0 1:6180 0:8507 0:5257 ) ŒAk D EDEH EDEH EDEH EDEH : : : .‘k 0 times/ Note that EEH is the Identity matrix and hence ŒAk D EDDD : : : .k times/E H ) ŒAk D EDk E H .0:6180/k 0 Also D k D 0 .1:6180/k
1.25 Applications of Diagonalization of the Non-deficient Matrix
59
Thus solution to the above difference equation is given as Uk D Ak U0 D ED k E H U0 0:5257 0:8507 0:5257 0:8507 .0:6180/k 0 U0 D 0:8507 0:5257 0 .1:6180/k 0:8507 0:5257 If U0 D
1 0
0 0:5257 0:8507 1 0:5257 0:8507 .0:6180/k 0 .1:6180/k 0:8507 0:5257 0 0:8507 0:5257 0 0:5257 0:8507 .0:6180/k 0:5257 0 .1:6180/k 0:8507 0:8507 0:5257 0:5257 0:8507 .0:5257/.0:6180/k 0:8507 0:5257 .0:8507/.1:6180/k 0:5257 0:8507 .0:5257/.0:6180/k 0:8507 0:5257 .0:8507/.1:6180/k Œ.0:5257/2 .0:6180/k C .0:8507/2.1:6180/k .0:5257/.0:8507/.0:6180/k C .0:5257/.0:8507/.1:6180/k Œ.0:2764/.0:6180/k C .0:7237/.1:6180/k Œ.0:4472/.0:6180/k C .0:4472/.1:6180/k
Uk D D D D D D
(b) Solving Differential Equation of the form 2 3 u1 .t/ 6u2 .t/7 dU.t / 7 D A U.t/, where U.t/ D 6 dt 4 : : : 5 and A is the Non-deficient matrix. un .t/ Solution for the above differential equation is of the following form U.t/ D e At U.0/ 11 1 and U.0/ D 10 0 To compute e At , Diagonalization of the matrix ‘A’ is used. Let us use the non-deficient matrix A D
.A/2 .A/3 A C C C 1Š 2Š 3Š EDEH EDEH EDEH EDEH EDEH A C C ) eA D I C C 1Š 2Š 3Š EDEH EDDEH EDDDEH ) e A D EIEH C C C C 1Š 2Š 3Š eA D I C
60
1 Matrices
DD DDD D A C C C EH )e DE IC 1Š 2Š 3Š ) e A D EeD E H ) e At D EeDt E H
For the given problem the solution of the differential equation is given as follows. U.t/ D e At U.0/ ) U.t/ D Ee Dt E H U.0/ H 1 0:5257 0:8507 Dt 0:5257 0:8507 ) U.t/ D e 0:8507 0:5257 0 0:8507 0:5257
0:6180 0 where D D 0 1:6180 0:6180t 0:6180 0 e 0 De tD 0 1:6180 0 e 1:6180t
)e
Dt
) U.t/
0:5257 0:8507 1 0:5257 0:8507 e 0:6180t 0 0:8507 0:5257 0 e 1:6180t 0:8507 0:5257 0 0:5257 0:8507 0:5257e 0:6180t 0:8507 0:5257 .0:8507/e 1:6180t 0:5257 0:8507 0:5257e 0:6180t 0:8507 0:5257 .0:8507/e 1:6180t 0:5257 0:8507 0:5257e 0:6180t 0:8507 0:5257 .0:8507/e 1:6180t 0:2764e 0:6180t C 0:7234e 1:6180t 0:4472e 0:6180t C 0:4472e 1:6180t
D D D D D
1.26 Singular Value Decomposition Consider the matrix A of size m n. The matrix A can be represented as the product of Hermitian transpose of the unitary matrix U1 , Diagonal matrix ‘D’ and the unitary matrix ‘U2 ’. A D U1 DU2 H . Let the unit magnitude Eigen vector ‘vi ’ corresponding to the matrix AT A satisfies the condition AT A vi D i vi . Multiplying A on both sides, we get AAT .A vi / D i .A vi /. ) .A vi / is the Eigen vector of the matrix AAT . The
1.26 Singular Value Decomposition
61
corresponding Eigen value is i . The magnitude of the vector Avi is obtained as .Avi /T .Avi / D vi T AT Avi D i vi T vi D p i . The Unit magnitude Eigen vector of the matrix AAT can be represented as Avi =p i . Let it be ui Rewriting the above equation as Avi D i ui 123 Example 1.26. Consider the matrix A D . 456 AAT D
14 32 32 77
The Eigen values of the matrix AAT are 0.5973, 90.4027 and the corresponding 0:9224 0:3863 Eigen vectors are given as ; 0:3863 0:9224 Similarly the Eigen values of the matrix AT A are 0, 0.5963, 90.4027 and the corresponding Eigen vectors are given as 2
3 2 3 2 3 0:4082 0:8060 0:4287 4 0:8165 5 ; 40:11245 and 40:56635 : 0:4082 0:5812 0:7039 The Eigen vectors of the matrix AAT are obtained as following for the corresponding non-zero Eigen values. 2 3 0:8060 123 4 0:11245 456 0:5812 0:9224 D p 0:3863 0:5963 2 3 0:4287 123 4 0:5663 5 456 0:7039 0:3863 p D 0:9224 90:4027 Note that it is same as the one computed directly from the basic definition. Also note that the Eigen vectors thus obtained are orthonormal to each other as the matrix AAT and AT A are Hermitian matrix. Thus the matrix
2 3 0:8060 0:4287 123 4 0:1124 0:5663 5 456 0:5812 0:7039 p 0:5963 p 0 0:9224 0:3863 D 90:4027 0 0:3863 0:9224
62
1 Matrices
As the Eigen vectors are orthonormal, the matrix A can be represented as follows as follows 123 ) 456 2 3T 0:8060 0:4287 p 0:5963 p 0 0:9224 0:3863 40:1124 0:5663 5 D 90:4027 0 0:3863 0:9224 0:5812 0:7039 2 3T 0:8060 0:4287 p 0:9224 0:3863 0:5963 p 0 40:1124 0:5663 5 D 90:4027 0:3863 0:9224 0 0:5812 0:7039 We can even include the Eigen vector corresponding to the zero Eigen value of the matrix AT A as shown below. 123 ) 456 32 2p 3T 0:5963 p 0 0 0:8060 0:4287 0:4082 0:9224 0:3863 4 D 90:4027 05 40:1124 0:5663 0:8165 5 0 0:3863 0:9224 0 0 0 0:5812 0:7039 0:4082 The above method of representing the matrix A as the product of the Unitary matrix ‘U1 ’, diagonal matrix ‘D’ and the Unitary matrix ‘U2 0T are called Singular Value Decomposition.
1.27 Applications of Singular Value Decomposition 1. Spectral factorization representation of the matrix is obtained using Singular Value Decomposition as given below. Example 1.27. From the Example 1.23, we get, 1 2 3 : 4 5 6 2 3T p 0:8060 0:4287 0:5963 p 0 0:9224 0:3863 40:1124 0:5663 5 D 0 0:3863 0:9224 90:4027 0:5812 0:7039 p 0:5963 p 0 0:9224 0:3863 0:8060 0:1124 0:5812 D 90:4027 0:4287 0:5663 0:7039 0 0:3863 0:9224
1.27 Applications of Singular Value Decomposition
D
p 0:5963
63
0:9224 0:8060 0:1124 0:5812 0:3863 p 0:3863 C 90:4027 0:4287 0:5663 0:7039 0:9224
The above mentioned way of representing the matrix is called spectral factorization. If the Eigen values are so small, we can neglect them in the above representation and hence data compression is achieved. 2. Computation of Pseudo inverse of the non-invertible matrix Case 1: When the matrix AT A is invertible Consider the case of solving the equation of the form Ax D b when AT A is invertible. 2 3 2 3 1 0 0 2 3 5 6 7 x1 6 7 60 2 07 6 7 667 76 7 6 7 Let the equation be 6 60 0 37 4x25 D 647 : 4 5 x3 4 5 0 0 0 4 Multiplying AT on both sides, we get 32 1 1 0 0 0 76 6 60 2 0 07 60 76 6 60 0 3 07 60 54 4 0 0 0 0 0 2 1 6 0 )6 4 0 2
2 3 1 0 0 0 2 3 6 7 x1 2 07 6 7 60 2 7 6x27 D 6 4 5 60 0 0 37 4 5 x3 0 0 0 0 32 3 2 3 0 0 x1 5 76 7 6 7 4 07 6x27 D 6127 54 5 4 5 0 9 x3 12
In this case x1 D 5; x2 D 3; x3 D
4 3
32 3 5 0 0 76 7 0 07 667 76 7 6 7 3 07 5 445 4 0 0
2
1 60 Note that the solution corresponds to the projected column vector 6 40 0 2 3 2 3 5 x1 6 7 4x25 D 667 445 x3 0
0 2 0 0
3 0 07 7 35 0
64
1 Matrices
Also note that the matrix AT A is invertible and we have already shown that the 2 3 5 667 7 solution vector thus obtained is such that magnitude of the error vector 6 445 4 2 3 2 3 5 0 667 607 6 7 D 6 7 is minimized. 445 405 0
4
Case 2: When the matrix AT A is non-invertible Consider the case of solving the equation of the form Ax D b, where AT A is non-invertible. 2 3 2 3 3 x1 2 1 0 0 0 6 7 5 7 6x27 6 7 6 7 6 7 6 7 6 0 2 0 0 Let the equation be 4 5 6x37 D 465 : 4 5 0 0 3 0 4 x4 2
3 1 0 0 0 60 4 0 07 7 In this case A0 A D 6 40 0 9 05 is not invertible and hence projection method 0 0 0 0 as used above cannot be used in this case. By direct observation of the set of equations, the solution is obtained as follows x1 D 5 x2 D 3 x2 D 4=3 2
3 x1 6x27 7 The variable x4 is arbitrarily chosen as 0 to reduce the length of the vector 6 4x35. x4 Thus the solution obtained above is of shortest length. In both the cases the pseudo inverse matrix of the diagonal matrix A which is represented as AC is obtained as follows. 1 0 0 0 For the Case 1 AC D 0 12 0 0 0 0 13 0
1.27 Applications of Singular Value Decomposition
65
The solution for the equation Ax D b, is obtained as x D AC b 2
32 3 2 3 1 0 0 0 5 5 ) x D 40 12 0 05 465 D 4 3 5 4 4=3 0 0 13 0 Similarly for the case 2, 1 0 0 0 12 0 AC D 0 0 13 0 0 0 The solution for the equation Ax D b, is obtained as x D AC b 3 2 3 1 0 0 2 3 5 5 60 1 0 7 6 7 2 74 5 6 3 7 )xD6 40 0 1 5 6 D 44=35 3 4 0 0 0 0 2
Thus if the matrix A is the diagonal matrix the inverse of the matrix can be obtained by inverting the non-zero diagonal elements of the matrix A with zero elements unchanged. If the matrix A is not the diagonal matrix, then SVD is used to represent the matrix as A D U1 DU2 H and obtain the vector x such that kAxbk is minimized. ) we have to obtain the value for the vector x such that kU1 DU2 H x bk is minimized. ) kDU2 H x U1 H bk is minimized. [Multiplying Unitary matrix on both sides of the linear equation] Note: Multiplying with the unitary matrix on both sides of the linear equation will not affect the distance as described below In general kAx bk2 D .Ax b/H .Ax b/ Multiplying the Unitary matrix U on both sides we get, kUAx Ubk2 D .UAx Ub/H .UAx Ub/ D U D xH AH UH UAx xH AH UH Ub bH UH Ub bH UH Ub D xH AH Ax xH AH b bH b bH b D .Ax b/H .Ax b/ D kAx bk2
66
1 Matrices
Let y D U2 H x and the modified task is to minimize the norm kDy U1 H bk, where ‘D’ is the diagonal matrix and hence solution for y is obtained as y D D C U1 H b ) U2 H x D D C U1 H b ) x D U2 D C U1 H b Thus for any arbitrary matrix A, we can obtain inverse (If the matrix is invertible) or pseudo inverse (If the matrix is not invertible) using the technique as mentioned above. Example 1.28.
1 2 3 AD 4 5 6
The inverse of the matrix A is obtained as follows. Using SVD we get, "
# 1 2 3 4 5 6
2 32 3T # p0:5963 0:8060 0:4287 0:4082 0 p 0 0:9224 0:3863 6 76 7 D 90:4027 05 40:1124 0:5663 0:8165 5 0 4 0:3863 0:9224 0 0 0 0:5812 0:7039 0:4082 "
AC "
#
2
0:9224 0:3863 6 4 0:3863 0:9224
D
p 1 0:5963
0 p
0
1 90:4027
0
0
p 1 0:5963
0
32 3T 0:8060 0:4287 0:4082 76 7 05 40:1124 0:5663 0:8165 5 0:5812 0:7039 0:4082 0 0
C
A
" D
#
2
0:9224 0:3863 6 4 0:3863 0:9224
0 0
p
1 90:4027
0
3T 32 0:8060 0:4287 0:4082 7 76 0:8165 5 05 40:1124 0:5663 0:5812 0:7039 0:4082 0 0
3. Representing the matrix A as the product of Unitary matrix and the symmetric matrix as shown below. Using SVD, A D U1 DU2 H Inserting U2 H U2 in the middle we get A D U1 U2 H U2 DU2 H Note that the matrix U1 U2 H is the unitary matrix and the matrix U2 DU2 H is the symmetric matrix.
Chapter 2
Probability
2.1 Introduction 1. Set: It is collection of well defined objects. Each object is referred as an element 2. The set B which is the subset of A (Represented as B A), is a set whose element are also the elements of A 3. Set of no elements are called Empty set 4. Set operations: A union B (Represented as A [ B or A C B) is the set that consists of the
elements which are either in A or B or in both. A intersection B (Represented as A \ B or AB) is the set of elements which
are in both A and B.
A complement (Represented as AN or Ac ) is the set that consists of the elements
that are not present in the set A. 5. Mutually exclusive sets (Disjoint sets): Two sets A and B are disjoint, if they have no elements in common i.e. AB D . In general The sets A1,A2,A3, : : : An are disjoint if Ai Aj D for i ¤ j 6. Sample space: Set of all experimental outcomes 7. Partition: A partition of a set is the collection of mutually exclusive sets A1, A2 : : :whose union is the sample space ‘S’ (i.e.) If A1 C A2 C A3 C : : : An D S and Ai Aj D for i ¤ j, then Ai’s form a partition of the set 8. Event: Subset of the sample space Certain event: S (Sample space) Impossible event: (Null space) Elementary event: One outcome of the experiment. 9. Countable infinite set: The set C is countably infinite when there is one-to-one correspondence between the elements of the set C and a set of all non-negative numbers
E.S. Gopi, Mathematical Summary for Digital Signal Processing Applications with Matlab, DOI 10.1007/978-90-481-3747-3 2, c Springer Science+Business Media B.V. 2010
67
68
2 Probability
2.2 Axioms of Probability 1. Probability of the event Ÿ represented as P.Ÿ/ 0. 2. P.S/ D 1, where ‘S’ is the sample space. 3. If A1,A2, : : : An are such that Ai Aj D for i ¤ j then P
X1 i D1
X1 Ai D P .Ai /: i D1
Note: If sample space is having finite number or infinitely countable number of subsets, probabilities can be assigned to all the subsets of the sample space that satisfies all the axioms mentioned in 2.2. If sample space is having uncountable infinite number of subsets, it is not possible to assign the probability to all the subsets of the sample space that satisfies all the axioms mentioned in 2.2.
2.3 Class of Events or Field (F) Class of Events is the subset of the sample space satisfying the following properties. 1. If A F, then AN F. 2. If A; B 2 F, then A C B 2 F. P 3. Sigma Field: If A1; A2; : : : 2 F; 1 i D1 Ai 2 F.
2.4 Probability Space (S, F, P) The probability space (S, F, P) consists of Sample space S, Field F and the probability measure P. The probability measure maps every element of F to a number less than 1 and greater than 0. The measured value is the probability.
2.5 Probability Measure The probability of the event A can be measured as follows. Method 1: Probabilities of elementary events are assumed as equal. Let the number of outcomes belonging to the event A is NA and the total number of outcomes is N, then P.A/ D NA =N . The probability measured in this technique depends upon how the set of possible outcomes are defined.
2.6 Conditional Probability
69
Method 2: Repeat the experiment N times. Let nA is the number of times event A occurs, then P.A/ D limN !1 nNA
2.6 Conditional Probability Case 1 (Fig. 2.1): Let ‘nA’ and ‘nB’ be the number of outcomes belonging to the event A and B respectively. Also let ‘n’ be the total number of outcomes. Probability of the event A is nA/n Probability of the event B is nB/n Probability of the event A given B has occurred is given as follows nA P.A=B/ D nA=nB D n
nB D P.A/=P.B/ D P.AB/=P.B/ n
Case 2 (Fig. 2.2): In this case P(A/B) D Probability of A given B has occurred is 1. This can also be obtained using the formula obtained in the case 1. P .A=B/ D P .AB/=P.B/ D P.B/=P.B/ D 1 Therefore P.A=B/ D P.AB/=P.B/ is considered as the common formula that can be used in both the cases. nA
nB
n
Fig. 2.1 Conditional probability – case 1
nB
nA
Fig. 2.2 Conditional Probability – case 2
n
70
2 Probability
2.7 Total Probability Theorem Let A1, A2 : : : An be the partition of the sample space S. B D BS D B.A1 C A2 C A3 C : : : An/ D A1B C A2B C A3B C : : : AnB P.B/ D D
Xn i D1 n X i D1
P .AiB/
P
B Ai
P .Ai /
Total Probability Theorm W P .B/ D
n X
P
i D1
B Ai
P .Ai /
Note that Ai Aj D for i ¤ j
2.8 Bayes Theorem P.A=B/ D P.AB/=P.B/ P.B=A/ D P.AB/=P.A/ P.B/P.A=B/ D P.B=A/P.A/ ) P.B/ D P.A/P.B=A/=P.A=B/
Bayes Theorem W P .B/ D P .A/ P
B A
P
A B
Let A1, A2, : : : An forms the partition of the sample space S. B P .Ai /P . Ai / In general P.Ai=B/ D Pn B i D1 P . Ai /P .Ai / Note that Ai Aj D for i ¤ j
2.9 Independence 1. A and B are independent if P.AB/ D P.A/P.B/. Also P.A=B/ D P.A/ 2. Three events A, B and C are independent if it satisfies the following conditions P .AB/ D P .A/P .B/ P .AC / D P .A/P .C / P .BC / D P .B/P .C / P .ABC / D P .A/P .B/P .C /
2.10 Multiple Experiments (Combined Experiments)
71
P .AB=C / D P .ABC /=P .C / P .ABC / P .A=BC / D D P .A/ P .BC /
2.10 Multiple Experiments (Combined Experiments) 1. Consider the sample spaces S1 and S2 corresponding to the two independent experiments. fS1g D f1; 2; 3; 4; 5; 6g fS2g D fh; tg Sample space of the combined experiments is represented as S D S1XS2 D f.a; b/; a 2 S1; b 2 S2g 2. Consider the event A D f2; 3g 2 S1 and the event B D fhg 2 S2. The eventfAXS2g D f.2; h/.3; h/.2; t/.3; t/g The eventfS1XBg D f.1; h/.2; h/.3; h/.4; h/.5; h/.6; h/g fAXBg D f.2; h/.3; h/gwhich can be obtained using the following: fAXBg D fAXS2g \ fS1XBg P.AXB/ D P.AXS2/P.S1XB/ Some properties of the probability derived from the Axioms of the probability N D 1 P.A/ 1. P .A/ Proof. A C AN D S (A and AN are Mutually Exclusive Events) N D 1 ŒSecond Axiom P .S / D P .A C A/ N ŒThird Axiom D P .A/ C P .A/ N ) P .A/ D 1 P .A/ 2. P .A/ 1 N [From proof 1] Proof. P .A/ D 1 P .A/ N 0 and P .A/ 0 ŒFirst Axiom P .A/ ) P .A/ 1 3. P .˚/ D 0 Proof. The Event A D A C ˚ [A and ˚ are Mutually Exclusive Events] P .A/ D P .A/ C P .˚/ ŒThird Axiom ) P .˚/ D 0
72
2 Probability
4. fAi gniD1 is P a partition of the sample space S (Fig. 2.3). For any event B; P .B/ D niD1 P .B Ai / Proof. From the diagram, it can S be shown that any event ‘B’ can be represented as the union of disjoint events niD1 B Ai .i:e:/ B D [niD1 BAi P .B/ D P .[niD1 B Ai / D P .BA1 C BA2 C BA3 C BA4 C BAn / Xn D P .BA1 / C P .BA2 / C : : : P .BAn / D P .BAi / i D1
5. If A B; P .A/ P .B/ (Fig. 2.4) N (Note that B and BA N are disjoint events) Proof. A D B C BA N )P .A/ D P .B/ C P .BA/ N 0 and he nce P .A/ P .BA/ N Also 1 P .BA/ 6. P .ACB CC / D P .A/CP .B/CP .C /P .AB/P .AC/ D P .BC/CP .ABC/ (Fig. 2.5)
Fig. 2.3 Partition of the sample space S
A B
Fig. 2.4 Venn diagram illustrating A B
2.10 Multiple Experiments (Combined Experiments) Fig. 2.5 Venn diagram illustrating A C B C C
73
B
A
C
Proof. A C B C C can be represented as the summation of three disjoint events as given below. N C .A C B/C (Disjoint Events) A C B C C D A C AB N ) A C B C C D A C .1 A/B C ANBC N / ) P .A C B C C / D P .A/ C P ..1 A/B/ C P .ANBC On simplification A C B C C D A C .1 A/B C .1 A/.1 B/C .Disjoint Events/ ) A C B C C D A C B AB C .1 B A C AB/C ) A C B C C D A C B AB C C BC AC C ABC ) P .A C B C C / D P .A/ C P .B/ P .AB/ C P .C / P .BC / P .AC / C P .ABC / ) P .A C B C C / P .A/ C P .B/ C P .C / P
7. In general P fAi gniD1 niD1 P .Ai / (It can be proved using Mathematical Induction) 8. If two events A and B are independent, AN and B are independent Proof. Given: P .AB/ D P .A/P .B/ N D B AB AB N D P .B/ P .AB/ P .AB/ ) P .B/ P .A/P .B/ ) P .B/.1 P .A// N ) P .B/P .A/
74
2 Probability
Thus AN and B are also independent. ) AN and BN are independent (property 8) ) A and BN are independent (property 8) 9. If three events A, B and C are independent, the events A and B C C are independent. Proof. Given: P .ABC/ D P .A/P .B/P .C / P .A.B C C // D P .AB C AC / D P .AB/ C P .AC / P .ABC / D P .A/P .B/CP .A/P .C /P .A/P .B/P .C / D P .A/.P .B/ C P .C / P .BC // D P .A/P .B C C / ) A and B C C are independent
10. If A, B and C are three events, P .AB=C / D P .A=BC/P .B=C /. Proof. P .AB=C / D P .ABC /=P .C / D P .A=BC /P .BC /=P .C / D P .A=BC /P .B=C / 11. If A, B and C are three events, P(ABC) D P(A/BC)P(B/C)P(C). Proof. P.ABC/ D P.A=BC/P.BC/ D P.A=BC/P.B=C/P.C/
2.11 Random Variable Consider the probability space (S,F,P), then the mapping of the outcomes s 2 F to the real line is called random variable. (i.e.) X: F > R (Fig. 2.6). Mapping must be chosen such that every subset of the real line of the form (1, a] should be an event in F. The subset of the real line of the form (a, b] is called Borel set represented as B. Let A 2 B: PX .A/ D P.fs 2 S W X.s/ 2 Ag/, where fs 2 S W X.s/ 2 Ag is called inverse image of A. Thus the random variable maps the probability space (S,F,P) to the Borel space (R, B, Px)
2.12 Cumulative Distribution Function (cdf) of the Random Variable ‘x’ Fig. 2.6 Illustration of the random variable
75
(S,F,P)
S
x
−⬁
a
R
(R,B,Px)
2.12 Cumulative Distribution Function (cdf) of the Random Variable ‘x’ FX .˛/ D PX .X ˛/ 1. 2. 3. 4. 5. 6. 7.
0 FX .˛/ 1 for all ˛ lim˛!1 FX .˛/ D 1 lim˛!1 FX .˛/ D 0 FX .˛/ is the non decreasing function. (i.e.) If ˛1 < ˛2, then FX .˛1/ FX .˛2/ If FX .˛0/ D 0, then FX .˛/ D 0 for all ˛ < ˛0 PX .a X b/ D FX .b/ FX .a/ If X and Y are two random variable such that X.s/ Y .s/ for all s 2 S, then FX .a/ FY .a/ for all ˛ (Fig. 2.7)
Proof. FX .a/ D P .X a/ FY .b/ D P .Y b/ By definition FX .a/ D FY .b/ D P .Y a/ C P .a Y b/ D FY .a/ C K The constant K ranges from 0 to 1 and it is the positive quantity and hence FX .a/ FY .a/ for all values of ‘a’
76
2 Probability S s F (x) X F (y) Y
X
a
b
Y
Fig. 2.7 Illustration of the property 7 of the CDF
2.13 Continuous Random Variable A random variable X is continuous if FX .˛/ is the continuous function of ˛.
2.14 Discrete Random Variable A random variable X is discrete if and only if it maps S to a countable subset of R (Real numbers).
2.15 Probability Mass Function If the random variable is discrete, probability of the particular value of the random variable can be computed using the function known as probability mass function. Probability of the random variable X D ˛ is represented as P.X D ˛/.
2.16 Probability Density Function If the random variable is continuous, probability of the random variable X over the smallest range ˛ to ˛ C ˛ is computed as follows. lim ŒFX .˛ C ˛/ FX .˛/
˛!0
2.17 Two Random Variables
77
Define fx .˛/˛ be the probability of the random variable x over the smallest range ˛ to ˛ C ˛ ) fx .˛/˛ D lim˛!0 ŒFX .˛ C ˛/ FX .˛/ ) fx .˛/ D lim˛!0 ŒFX .˛ C ˛/ FX .˛/ =˛ The function fx .˛/ is called probability density function. lim ŒFX .˛ C ˛/ FX .˛/
fx .˛/
˛!0
) fx .˛/ D
˛ dFX .˛/ dx
Properties: 1. 2. 3. 4.
fx .˛/ 0Rfor all ˛. F [Because F is the non-decreasing function] ˛ .˛/ D 1 fx .˛/ d˛ F RX 1 fx .x/dx D 1 R1 x2 x1 fx .x/dx D FX .x2/ FX .x1/
2.17 Two Random Variables Consider the probability space (S, F, P), then the mapping of the outcome s 2 F to the real line is called random variable. This mapping can be done in multiple forms and are called as Multiple random variables. In particular if the mapping is done in two different forms, then the corresponding mapping is called two random variables as shown in the Fig. 2.8. Two random variables are completely described by the Joint distribution function PŒX x; Y y and it is usually represented as FXY .x; y/. Properties of joint distribution function and joint density function with two random variables. limx!1 FXY .x; y/ D P.X 1; Y y/ D FY .y/. limy!1 FXY .x; y/ D P.X x; Y 1/ D FX .x/. limY !1 FXY .x; y/ D 0 limX!1 FXY .x; y/ D 0 P.x1 X x2; Y 1/ D FXY .x2; y/ FXY .x1; y/. P.x1 X x2; y1 Y y2/ D FXY .x2; y2/ FXY .x2; y1/ FXY .x1; y2/ C FXY .x1; y1/ 7. 0 FXY .x; y/ 1 8. It is the non-decreasing function with both ‘x’ and ‘y’
1. 2. 3. 4. 5. 6.
78
2 Probability
Fig. 2.8 Illustration of the two random variables
(S,F,P) S
y
x
a (R,B,Px)
−⬁
b
R
(R,B,Py)
9. The relationship between the Joint probability density function of the two random variables ‘X’ and ‘Y’ and the corresponding joint distribution function is given below fXY .x; y/ D Z FXY .a; b/ D
Z
b 1
@2 FXY .x; y/ @x@y a
1
fXY .x; y/dx dy
10. Probability of the event A corresponding to the random variable X which ranges from xmin to xmax and the random variable Y which ranges from ymin to ymax is computed using the joint density function as given below xmax Z ymax Z
P .A/
fXY .x; y/dx dy xmin ymin
11. The marginal probability density function of the random variable ‘X’ is given as (Fig. 2.9) Z1 fX .x/ D
fXY .x; y/ dy 1
Z FY .y/ D
1 1
fXY .x; y/ dx
2.19 Independent Random Variables
79
Y
Y y
x
Fy(y)
x
x
x
x
Fx(x)
y y1
x1
x2
x y2
P(x1≤ x ≤x2,Y ≤ ⬁)
P(x≤⬁, y1≤ Y≤ y2)
Fig. 2.9 Illustration of the Marginal probability density function
2.18 Conditional Distributions and Densities The conditional distribution of ‘x’ over the event ‘B’ represented as FX=B .x/ D P.X x=B/ D
P.X x; B/ P .B/
2.19 Independent Random Variables Two events A and B are called independent if; P .AB/ D P .A/P .B/ Then we can write, FXY .x; y/ D P .X x; Y y/ D P .X x/.Y y/ D FX .x/FY .y/ So, fXY .x; y/ D fX .x/fY .y/
80
2 Probability
2.20 Some Important Results on Conditional Density Function 1. ‘X’ is continuous, ‘Y’ is discrete, then FX=Y Dy .x/ is computed as described below FX=Y Dy .x/ D P .X x=Y D y/ P .X x; Y D y/ D P .Y D y/ X P .X x/ D P .X x; Y D y/ Y X ) P .X x/ D FX=Y Dy .x/P .Y D y/ Y X ) FX .x/ D FX=Y Dy .x/P .Y D y/ Y X ) fX .x/ D fX=Y Dy .x/P .Y D y/ Y
2. ‘X’ is continuous and ‘Y’ is continuous, then FX=Y Dy .x/ is computed as follows FX=Y Dy .x/ D P .X x=Y D y/ D li m P .X x=y Y y C y/ y!0
D li m
P .X x; y Y y C y/ P .y Y y C y/
D li m
P .y Y y C y=X x/P .X x/ P .y Y y C y/
y!0
y!0
P .y Y y C y=X x/P .X x/ y!0 P .y Y y C y/
D li m
.FY =Xx .y C y/ FY =Xx .y//P .X x/ FY .y C y/ FY .y C y/ .fY =Xx .y//P .X x/ D fY .y/ .fY =Xx .y/P .X x// ) P .X x=Y D y/ D fY .y/ D li m
y!0
Also P .X x=Y D y/fY .y/ D fY =Xx .y/P .X x/ Z )
P .X x=Y D y/fY .y/dy Z
D
fY =Xx .y/P .X x/dy
D P .X x/
2.20 Some Important Results on Conditional Density Function
81
Also consider P .X x; y Y y C y/ P .y Y y C y/ FXY .x; y C y/ FXY .x; y/ ) FX=Y Dy .x/ D li m y!0 FY .y/ FY .y/ ŒFXY .x; y C y/ FXY .x; y/ =y ) FX=Y Dy .x/ D li m y!0 ŒFY .y C y/ FY .y/=y
P .X x=Y D y/ D li m
y!0
) FX=Y Dy .x/ D ) fX=Y Dy .x/ D
dFX Y .x;y/ dy
fY .y/ @2 FX Y .x;y/ @x@y
fY .y/ fXY .x; y/ ) fX=Y Dy .x/ D fY .y/ Z ) fX=Y Dy .x/fY .y/dy D fX .x/ Z ) fX .x/ D fX=Y Dy .x/fY .y/dy Example 2.1. Let FXY .x; y/ D 1 1 =2 1 =2 1 =4 0
x 2 Œ2; 1 \ y 2 Œ3; 1 x 2 Œ0; 2 \ y 2 Œ3; 1 x 2 Œ2; 1 \ y 2 Œ0; 3 x 2 Œ0; 2 \ y 2 Œ0; 3 Else where
.11/
Y 1/2 1 (2,3) 1/4 1/2 (0,0)
x
For the above specification FX .x/ and FY .y/ are computed as follows (Fig. 2.10) FX .x/ D P .X x; Y 1/ and FY .y/ D P .X 1; Y y/ Example 2.2. Let the event ‘B’ be X b Then, FX=Xb .x/ D P.X x=X b/ D ) FX=Xb .x/ D 1 FX .x/ for X < b D FX .b/
P.X x; X b/ P .X b/
for X b
Conditional Probability density function can be obtained by differentiating conditional distribution function.
82
2 Probability FX(x)
FY(y)
1/2
1/2
2
X
3
X
Fig. 2.10 FX .x/ and FY .y/ of the Example 2.1
Y
g( . )
X
P .X D 1=Y D y/
2.20 Some Important Results on Conditional Density Function
f Y D1 .y/ X
fY .y/
P.X D 1/ > )
f Y D1 .y/ X
fY .y/
P.X D 1/
fY =XD1 .y/ P .X D 1/ > fY =XD1 .y/ P .X D 1/
We have, FY =XD1 .y/ D P .Y y=X D 1/ D P .X C N y=X D 1/ D P .1 C N y/ D P .N y 1/ D FN .y 1/ Differentiating both with respect to y, @.FY =XD1 .y/ @FN .y 1/ D @y @y fY =XD1 .y/ D fN .y 1/ So, fY =XD1 .y/ D fN .y 1/ and; fY =XD1 .y/ D fN .y C 1/ Then,
P .X D 1/ fN .y 1/ > fN .y C 1/ P .X D 1/
If fN .n/ is a Gaussian density function with D 0 1 2 fN .n/ D p e .n/=2 2 2
Then, 2 1 .y1/ fN .y 1/ D p e 2 2 2 2 2 1 .yC1/ fN .y C 1/ D p e 2 2 : 2 2
Now, P .X D 1/ FN .y 1/ > fN .y C 1/ P .X D 1/
83
84
2 Probability
e
.y1/ 2 2
2
e
P .X D 1/ P .X D 1/ 4y P .X D 1/ e 2 2 > P .X D 1/ P .X D 1/ 4y > ln 2 2 P .X D 1/ 2 P .X D 1/
ln y> 2 P .X D 1/
.yC1/2 2 2
>
Thus the transformation function g(.) is obtained as follows. If y >
2 P .X D 1/ ln Decide XO D 1; otherside decide XO D 1 2 P .X D 1/
2.21 Transformation of Random Variables of the Type Y D g.X/ Consider the random variable ‘X’ which is transformed into another random variable ‘Y’ using the transformation function defined as Y D g.X/. The graph relating the Y and X is as shown in the figure given below (Fig. 2.12). Solving Y D g.X/ for the general variable ‘y1’ gives x1, x4, x5. (i.e.) g.x1/ D g.x4/ D g.x5/ D y1. Also note that they belongs to the region where there are piece-wise monotonically increasing or decreasing function.
y
y2
Δy
y1
Δx1
Δx4
Δx5
x1 x2
x3 x4
x5 x6
Fig. 2.12 Transformation of random variables-case 1
x
2.22 Transformation of Random Variables of the Type Y1 D g1.X1; X2/; Y2 D g2.X1; X2) 85
From the basics of probability theory P.y1 y y2/ D P.x1 x x2/ C P.x3 x x4/ C P.x5 x x6/ y fY .y/ D x1 fX .x/ at x D x1 C x4 fX .x/ D x4 C x5 fX .x/ at x D x5 Note that magnitude of x1; x4 and x5 are considered to compute fY .y/ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ x1 ˇ ˇ ˇ ˇ C fX .x4/ ˇ x4 ˇ C fX .x5/ ˇ x5 ˇ Therefore; fY .y/ D fX .x1/ ˇˇ ˇ ˇ ˇ ˇ y y y ˇ fX .x4/ fX .x5/ fX .x1/ ) fY .y/ D ˇ ˇ Cˇ ˇ Cˇ ˇ ˇ dy ˇ ˇ dy ˇ ˇ dy ˇ ˇ dx ˇ at x D x1 ˇ dx ˇ at x D x4 ˇ dx ˇ at x D x5 In general, probability density function of y (i.e.) fY .y/ is obtained as follows. Given fX .x/ and the transformation Y D g.X/. 1. Obtain the solutions for the equation Y D g.x/ for the general variable ‘y’, so that x1; x2; : : : xn are obtained. Note that x1; x2; : : : xn are represented in terms of ‘y’. P 2. fY .y/ D niD1 ˇˇ dyfˇˇX .xi/ ˇ dx ˇat xDxi
3. The range of ‘Y’ can be obtained from the range of ‘X’ and the transformation equation Y D g.X/.
2.22 Transformation of Random Variables of the Type Y1 D g1.X1; X2/; Y2 D g2 .X1; X2) Given fX1X2 .x1; x2/; Y 1 D g1.X1; X 2/; Y 2 D g2.X1; X 2/; fY 1Y 2 .y1; y2/ is obtained as follows Also note that the functions g1(.) and g2(.) are invertible. (i.e.) There exists the function h1(.) and h2(.) such that X1 D h1(Y1,Y2) and X2 D h2(Y1,Y2). Consider the rectangle obtained from the intersection of lines Y1 D y1; Y1 D y1 C y1 and Y2 D y2; Y2 D y2 C y2 in the Y1–Y2 plane. The obtained rectangular region is represented as black shade in the Y1–Y2 plane. Also, consider the curve ‘a’ in the X1–X2 plane which is obtained by joining the set of points .X1 D x1; X2 D x2/ which satisfies the equation g1.x1; x2/ D y1. Similarly curve ‘b’, curve ‘c’ and curve ‘d’ are obtained by joining the set of points .X1 D x1; Y1 D y1/ which satisfies the equation g1.x1; x2/ D y1 C y1; g2.x1; x2/ D y2; g2.x1; x2/ D y2 C y2 respectively. The shaded region represented in the X1–X2 plane (see Fig. 2.13)) is obtained as the intersection of the curves ‘a’, ‘b’, ‘c’ and ‘d’. This shaded region can be approximated as the parallelogram.
86
2 Probability
Y2
X2 y2 + Δy2
a b
y2
d c Y1 y1 y1 + Δy1
X1 a: g1(x1,x2)=y1 b: g1(x1,x2)=y1+Δy1 c: g2(x1,x2)=y2 d: g2(x1,x2)=y2+Δy2
Fig. 2.13 Transformation of random variable-Case 2
From the basics of probability theory, the probability belonging to the shaded region in the Y1–Y2 plane is equal to the probability belonging to the shaded region in the X1–X2 plane. Therefore, fY 1Y 2 .y1; y2/ [Area of the rectangle] D fX1X2 .x1; x2/ [Area of the area of the parallelogram] Consider the curve a: g1.x1; x2/ D y1. The set of points on this curve satisfies the equation x1 D h1.y1; y2/ and x2 D h2.y1; y2/. Note that ‘y1’ is constant. Hence the points on this curve ‘a’ can be rewritten as x1 D h1.y2/ and x2 D h2.y2/ for the fixed Y1 D y1. Similarly for the curve b: The invertible equations for the equation g1.x1; x2/ D y1 C y1 can be written as x1 D h1.y2/ and x2 D h2.y2/ for the fixed Y1 D y1 C y1. For the curve ‘c’: The invertible equations for the equation g2.x1; x2/ D y2 can be written as x1 D h1.y1/ and x2 D h2.y1/ for the fixed Y2 D y2. For the curve ‘d’: The invertible equations for the equation g2.x1; x2/ D y2 C y2 can be written as x1 D h1.y1/ and x2 D h1.y1/ for the fixed Y2 D y2 C y2. Thus the rectangle with the co-ordinates mentioned and the corresponding parallelogram with the co-ordinates marked is as shown in the Fig. 2.14. Point 1 (p1) on the curve ‘b’ can be obtained as x1C (rate of change of h1 with respect to y2)(dy2) for the X1 co-ordinate and x2C (rate of change of h2 with respect to y2) dy2. Point 2 (p2) on the curve ‘c’ can be obtained as x1C (rate of change of h1 with respect to y1)(dy1) for the X1 co-ordinate and x2C (rate of change of h2 with respect to y1) dy1.
2.22 Transformation of Random Variables of the Type Y1 D g1.X1; X2/; Y2 D g2.X1; X2) 87 (y1+dy1,y2)
(y1+dy1,y2+dy2)
curve a p2
curve d
curve c (x1,x2) (y1,y2)
curve b
p1
(y1,y2+dy2)
p1:
x1+ ⭸h1 dy2 x2+ ⭸h2 dy2 ⭸y2 ⭸y2
p2:
x1+ ⭸h1 dy1 x2+ ⭸h2 dy1 ⭸y2 ⭸y1
Fig. 2.14 Rectangle to Parallelogram mapping for the transformation of random variable
Thus the co-ordinates for the points ‘p1’ and ‘p2’ are obtained as follows. dh2 dh1 dy2; x2 C dy2 x1 C dy2 dy2 dh1 dh2 P2 W x1 C dy1; x2 C dy1 dy1 dy1
P1 W
Thus the equation becomes, fY 1Y 2 .y1; y2/ ŒArea of the rectangle D fX1X2 .x1; x2/ ŒArea of the Area of the parallelogram Area of the Rectangle D dy1 d y2 ˇ ˇ ˇ dh2 ˇ ˇ dh1 dh2 ˇ ˇ dy2ˇ ˇ ˇ ˇ dy2 dy2 ˇ dy2 ˇDˇ ˇ ˇ ˇ dh1 dh2 ˇ dy1 dy2 dh2 ˇ ˇ ˇ dy1ˇ ˇ ˇ dy1 dy1 dy1 ˇ ˇ ˇ dh1 dh2 ˇ ˇ ˇ ˇ dy2 dy2 ˇ ˇ dy1 dy2 ˇ fY 1Y 2 .Y 1; y2/dy1 dy2 fX1X2 .x1; x2/ ˇ ˇ ˇ dh1 dh2 ˇ ˇ ˇ dy1 dy1 ˇ ˇ ˇ dh1 dh2 ˇ ˇ ˇ ˇ dy2 dy2 ˇ ˇ ˇ ) f1 1Y 2 .y1; y2/ D fX1X2 .x1; x2/ ˇ ˇ ˇ dh1 dh2 ˇ ˇ ˇ dy1 dy1
ˇ ˇ dh1 ˇ dy2 ˇ dy2 Area of the parallelogram D ˇˇ ˇ d k1 dy1 ˇ dy1
88
2 Probability
ˇ ˇ ˇ dh1 dh2 ˇ ˇ ˇ ˇ dy2 dy2 ˇ ˇ ˇ is called as Jacobian matrix. The matrix ˇ ˇ ˇ dh1 dh2 ˇ ˇ ˇ dy1 dy1 Also it can be shown using the same procedure ˇ ˇ dg1 ˇ ˇ dx2 fX1X2 .x1; x2/ D fY 1Y 2 .y1; y2/ ˇˇ ˇ dg1 ˇ dx1
ˇ dg2 ˇ ˇ dx2 ˇˇ dg2 ˇˇ ˇ dx1
Example 2.4. The random variables X and Y are related via Y D g.X/, where g(.) is monotonically increasing function. FY .y/ D P .Y y/ D P .g.X/ Y/ D P.X g1 .y// D FX .g1 .y// Let a D g1 .y/ ) y D g.a/ ) FY .g.a// D FX .a/ Example 2.5. Let X be a random variable with uniform distribution over the interval [4 to 4] (Fig. 2.15). Let the random variable Y D g.X /. where g.x/ D x; D 2; D 2;
jxj 2 X < 2 X >2
The density function of the random variable Y is computed as follows. P .Y y/ D P .g.X /y/ D P .X g1 .y//
y = g(x) 2 -2 2 -2
Fig. 2.15 Y D g(X) of the Example 2.5
X
2.22 Transformation of Random Variables of the Type Y1 D g1.X1; X2/; Y2 D g2.X1; X2) 89
From the graph (Fig. 2.16) FY .1/ D P .Y 1/ D P .X 1/ D 1 FY .2/ D P .Y 2/ D P .X 1/ D 1 FY .y/ D P .Y y/ D P .X y/ for 2 y 2 ) D P .Y 2/ D 0 fY .y/ is obtained by differentiating FY .y/ with respect to y as shown in the graph below. Note that there are two impulses in fY .y/. One at y D 2 and another at y D 2 ) P .Y D 2/ D 14 and P .Y D 2/ D 14 Example 2.6. Let X and Y are the two random variables such that Y D X 2 . Given probability density function of the random variable X, the probability density function of the random variable Y is obtained as follows (Fig. 2.17). p p p From the graph, FY .y/ D P .Y y/ D P . y X y/ D FX . y/ p FX . y/ F X (x)
f X (x) 1/8 −4
1 4
−4
x
1/4
1 /4 1 /8
3/4 2
x
f Y (y)
F Y 1 (y) 1/4 −2
4
−2
y
2
y
Fig. 2.16 PDF and the corresponding CDF of the random variables X and Y of the Example 2.5 Y = X2
Y=y
Fig. 2.17 Y D g(X) of the Example 2.6
−√ ⎯y
+√ ⎯y
X
90
2 Probability
Differentiating on both sides gives 1 p p fY .y/ D p fX y C fX . y/ for y 0 2 y (a) If X is having the following probability density function x2
fX .x/ D x’ e 2’ x 0 D 0; elsewhere p y y 1 ) fY .y/ D p e 2˛ 0 2 y a ŒNote that fx .x/ D 0 for x < 0 1 y e 2˛ for y 0 ) fY .y/ D 2˛ (b) If X is having the following probability density function
2 1 x fX .x/ D p e 2 2 for 1 x 1 2˘ 2
1 1 1 y2 y2 2 2 p e Cp e ) fY .y/ D p 2 y 2˘ 2 2˘ 2 1
) fY .y/ D p e 2˘y 2
y 2 2
for y0
(c) fY .y=X > 0/ is computed as follows FY .y=X > 0/ D P .y Y =X > 0/ P .y Y; X > 0/ D P .X > 0/
p p P y X y; X > 0 D P .X > 0/
p P 0 0/
p y FX .0/ FX D P .X > 0/
p y FX .0/ FX D Œ1 P .X 0/
p y FX .0/ FX D Œ1 FX .0/
2.22 Transformation of Random Variables of the Type Y1 D g1.X1; X2/; Y2 D g2.X1; X2) 91
Differentiating on both sides gives fY .y=X > 0/ D
1 p 2 y
"
p # FX y f or y 0 Œ1 FX .0/
Example 2.7. Let X and Y are the two independent random variables, the probability density function of the random variable Z D X C Y is computed as follows (Fig. 2.18). From the Graph Z 1 Z zy fXY .x; y/dx dy FZ .z/ D P .Z z/ D P ..X C Y / z/ D 1 1 R 1 R zy @ 1 1 fXY .x; y/dx dy ) fZ .z/ D @z Using Leibnitz integration Formula Z 1 @.z y/ @.1/ fXY .z y; y/ fXY .1; y/ ) fZ .z/ D @z @z 1 Z zy @fXY .x; y/ dx dy C @z 1 Z 1 ) fZ .z/ D fXY .z y; y/ dy 1
Y
z X+Y=z
z
Fig. 2.18 Graph depicting X C Y D z
X
92
2 Probability f X (x)
f Y (y) 1/2
1/4
* 2
x
4
y
f Z (z) 1/4
2
4
6
z
Fig. 2.19 Probability density function of the random variables X, Y and Z, where Z D X C Y
Note: Leibnitz integration Differentiation of the integration with limits (Leibnitz integration) Z
a.x/
g(x) D
f .x,y/dy b.x/
dg(x) da.x/ db.x/ D f .x,a.x// f .x,b.x/ C dx dx dx X and Y are independent ) fZ .z/ D
R1
1
Z
a.x/ b.x/
@f .x,y/ dy @x
fX .z y/fY .y/ dy
) fZ .z/ D fX .z/ fY .z/ Suppose if X is uniformly distributed between 0 and 2 and Y is uniformly distributed between 0 and 4, then the probability density function of Z is computed as shown below (Fig. 2.19). Example 2.8. Let X and Y are the two independent random variables, the probability density function of the random variable Z D X=Y is computed as follows. FZ .z/ D P .Z z/ D P
X Y
z
2.22 Transformation of Random Variables of the Type Y1 D g1.X1; X2/; Y2 D g2.X1; X2) 93
To obtain the solution the Trick is to Compute
z F Z Dy .z/ D P Z Dy Y Y D P .Z z; Y D y/=P .Y D y/ ) FZ=Dy .z/ D P ..X= .Y D y// z; Y D y/ =P .Y D y/ ) FZ=Y Dy .z/ D P .X zy/P .Y D y/=P .Y D y/ ŒBecause X and Y are independent ) FZ=Y Dy .z=Y D y/ D P .X zy/ D FX .zy/ Differentiating on both sides gives, ) fZ=Y Dy .z/ D y fX .zy/ Also P .Z z; y Y y C y/ y!0 P .y Y y C y/ FZY .z; y C y/ FZY .z; y/ ) FZ=Y Dy .x/ D lim y!0 FY .y C y/ FY .y/ ŒFZY .z; y C y/ FZY .z; y/ =y ) FZ=Y Dy .x/ D lim y!0 ŒFY .y C y/ FY .y/ =y
P .Z z=Y D y/ D li m
) FZ=Y Dy .x/ D
dFZY .z; y/ dy
fY .y/ @2 FZY
) fZ=Y Dy .x/ D
.z; y/ @z@y
fY .y/ fZY .z; y/ ) fZ=Y Dy .x/ D fY .y/ Z ) fZ=Y Dy .x/fY .y/dy D fZ .z/ Z ) fZ .z/ D fZ=Y Dy .z/fY .y/dy fZ=Y Dy .z/ D y fX .zy/ Z ) fZ .z/ D y fX .zy/fY .y/dy Example 2.9. The random variable Xpand Y are independent and identically dis Y . The joint tributed random variables. Let Z D X 2 C Y 2 and ˆ D tan1 X density function fZˆ .Z; ˆ/ is computed as follows.
94
2 Probability
We know, fXY .x; y/ fXY .x; y/ fZˆ .z; ˆ/ D ˇ ˇ at .x1; y1/ C ˇ ˇ at .x2; y2/ ˇ dg1 dg2 ˇ ˇ dg1 dg2 ˇ ˇ ˇ ˇ ˇ ˇ dx ˇ dx dx ˇˇ dx ˇˇ ˇ ˇ ˇ dg1 dg2 ˇ ˇ dg1 dg2 ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ dy dy dy dy fXY .x; y/ C ˇ ˇ at .x n; yn/ ˇ dg1 dg2 ˇ ˇ ˇ ˇ dx dx ˇˇ ˇ ˇ dg1 dg2 ˇ ˇ ˇ ˇ ˇ dy dy Where Z D g1.X; Y/ and ˆ D g2.X; Y/ and (x1,y1), (x2,y2), : : : (xn,yn) are the solutions obtained by solving the equations g1(x,y) and g2(x,y) for the constant ‘Z’ and ‘ˆ’. p
Y In our case, g1.X; Y/ D Z D X 2 C Y 2 and g2.X; Y/ D ˆ D tan1 X . Solving for X and Y for the fixed Z and ˚ gives X1 D Z cos ˚ and Y1 D Z sin ˚.x1; y1/ D .Zcos˚; Zsin˚/ X Zcos˚ dg1 Dp D cos ˚ at .x1; y1/ D .Zcos˚; Zsin˚/ D 2 2 dX Z X CY dg1 Y Zsin˚ Dp D sin ˚ at .x1; y1/ D .Zcos˚; Zsin˚/ D 2 2 dY Z X CY
XY2 dg2 Y D 2 at .x1; y1/ D .Zcos˚; Zsin˚/ D 2 dX 1 C .Y =X / X CY2 Sin˚ ZSin˚ D D Z 2 Z 1 dg2 X cos˚ X D D 2 at.x1; y1/ D .Zcos˚; Zsin˚/ D dY 1 C .Y =X /2 X CY2 Z ˇ ˇ ˇ ˇ ˇ dg1 dg2 ˇ ˇ ˇ ˇ ˇcos˚ Sin˚ ˇˇ ˇ dx ˇ ˇ dx Z ˇˇ D 1=jZj ˇDˇ ) ˇˇ ˇ ˇ cos˚ ˇ dg1 dg2 ˇ ˇ ˇ sin˚ ˇ ˇ ˇ Z dy dy Thus fZˆ .z; ˆ/ D
f .x;y/ XY dg1 dg2 dx dx dg1 dg2 dy dy
at .x1; y1/ D .Zcos˚; Zsin˚/
) fZˆ .z; ˆ/ D jZjfXY .Zcos˚; Zsi n˚/
2.22 Transformation of Random Variables of the Type Y1 D g1.X1; X2/; Y2 D g2.X1; X2) 95
Suppose if fXY .x; y/ D
1 2˘ 2
) fZˆ .z; ˆ/ D D 0; Otherwise
e
.x 2 Cy 2 / 2 2
, where 1 x; y 1.
.z2 / jzj 2 2 ; where Z 0 and 0 ˆ 2˘ e 2˘ 2
Example 2.10. The random variablep X and Y are independent and identically distributed random variables. Let Z D X 2 C Y 2 and W D X=Y. The joint density function fZW .Z; W / is computed as follows. We know, fXY .x; y/ fXY .x; y/ fZW .z; w/ D ˇ ˇ at .x1; y1/ C ˇ ˇ at .x2; y2/ ˇ dg1 dg2 ˇ ˇ dg1 dg2 ˇ ˇ ˇ ˇ ˇ ˇ dx ˇ dx dx ˇˇ dx ˇˇ ˇ ˇ ˇ dg1 dg2 ˇ ˇ dg1 dg2 ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ dy dy dy dy fXY .x; y/ C ˇ ˇ at .x n; yn/ ˇ dg1 dg2 ˇ ˇ ˇ ˇ dx dx ˇˇ ˇ ˇ dg1 dg2 ˇ ˇ ˇ ˇ ˇ dy dy Where Z D g1.X; Y/ and W D g2.X; Y/ and (x1,y1), (x2,y2),: : :(xn,yn) are the solutions obtained by solving the equations g1(x,y) and g2(x,y) for the constant ‘Z’ and ‘W’. p In our case, g1.X; Y/ D Z D X 2 C Y 2 and g2.X; Y/ D W D Y=X. Solving for X and Y for the fixed Z and W gives the following set of solutions p Z D X 2 C Y 2 ) Z 2 D X 2 C Y 2 D X 2 C .X W /2 D X 2 .1 C W 2 / ) X 2 D Z 2 =.1 C W 2 / p p ) X D CZ= 1 C W 2 or X D Z= 1 C W 2 and the corresponding values for Y are p p Y D CZW= 1 C W 2 or Y D ZW= 1 C W 2 respectively:
p p p Therefore the solutions are Z= 1 C W 2 ; CZW= 1 C W 2 and .Z= 1 C W 2 ; p ZW= 1 C W 2 /
p p X dg1 Dp at .x1; y1/ D CZ= 1 C W 2 ; C ZW= 1 C W 2 dX X2 C Y 2 p D 1= 1 C W 2
96
2 Probability
p Z 2 at .x2; y2/ D p ; ZW= 1 C W 1CW2
X dg1 Dp 2 dX X CY2 1 D p 1CW2
p p Y dg1 Dp at .x1; y1/ D CZ= 1 C W 2 ; C ZW= 1 C W 2 dY X2 C Y 2 p D W= 1 C W 2 p Y dg1 Z 2 Dp at .x2; y2/ D p ; ZW= 1 C W dY X2 C Y 2 1CW2 p D W= 1 C W 2
p p Y dg2 D 2 at .x1; y1/ D CZ= 1 C W 2 ; C ZW= 1 C W 2 dX X p W 1CW2 D Z p p Y W dg2 1CW2 Z D 2 at .x2; y2/ D p ; ZW= 1 C W 2 D C dX X Z 1CW2 p p 2 1 Z dg2 1CW D at .x1; y1/ D p ; ZW= 1 C W 2 D C 2 dY X Z 1CW p p dg2 1 Z 1CW2 D at .x2; y2/ D p ; ZW= 1 C W 2 D dY X Z 1CW2 ˇ ˇ p p 2ˇ ˇ 2 W 1CW j.1 C W /j ˇ ˇ 1= 1 C W 2 p Z Jacobian at .x1; y1/ D ˇ p 2 ˇ D 1CW 2 ˇ ˇW= 1 C W jZj Z ˇ ˇ ˇ1=p1 C W 2 W=p1 C W 2 ˇ j.1 C W 2 /j ˇ ˇ p Jacobian at .x2; y2/ D ˇ W p1CW 2 ˇD 2 1CW ˇ ˇ jZj Z Z jZj fZW .z; w/ D j.1 C W 2 /j p Z fXY p ; ZW= 1 C W 2 1CW2 p Z jzj 2 f ; ZW= 1 C W C p XY j.1 C W 2 /j 1CW2 p Z jZj 2 fXY p ) fZW .z; w/ D ; ZW= 1 C W j.1 C W 2 /j 1CW2 p Z 2 ; ZW= 1 C W CfXY p 1CW2
p p Z= 1 C W 2 ; CZW= 1 C W 2
2.22 Transformation of Random Variables of the Type Y1 D g1.X1; X2/; Y2 D g2.X1; X2) 97
Suppose if fXY .x; y/ D
1 .x 2 Cy 2 /=2 2˘ e
1 ..z/2 C.zw/2 /=2.1CW 2 / jZj e 2 j.1 C W /j ˘ 1 z2 jZj e 2 f or z 0 and 1 W 1 ) fZW .z; w/ D 2 j.1 C W /j ˘ ) fZW .z; w/ D
Given W D Y=X 1 X 1 and 1 Y 1 and hence 1 W 1 Example 2.11. Let X is a uniformly distributed random variable over the interval [0 1]. Y is the random variable related with the random variable X as Y D g.X/. The invertible function ‘g(.)’ is related with the distribution function FY .y/ is as given below.
P .Y y/ D P .g.X / y/ D P .X g1 .y// D FX g 1 .y/ Note that g(.) must be the invertible function. Also we know FX .x/ D x as X is uniformly distributed over the interval Œ0 1 ) P .Y y/ D FY .y/ D g 1 .y/ ) FY .:/ D g 1 .:/ p
For instance if fYp.y/ D .e
2jyj
p /= 2, then g (.) is obtained as follows.
2jyj
For fY .y/ D e p2 , FY .y/ is computed as follows. Z FY .y/ D
p
e 2jyj p dy 2
y 1
Case 1: If y 0 Z
y 1
D
p
e 2y p dy 2 p
e
2y
2
Case 2: If y 0 Z FY .y/ D
0 1
p
e 2y p dy C 2
Z
y 0
p
e 2y p dy 2
98
2 Probability
D
p 2y
1 e 2 2 p
e
D 1 p
g
1
.y/ D
e
2
C
1 2
y
2 2y
2
)yD
Dx)
p 2y=2 D ln.x/
p 2 ln.x/ f or y 0 p
Similarly f or y 0; g
1
.y/ D 1
e
2y
2
Dx
p ) y D ln.2.1 x//= 2 f or y 0 p Thus the function y D g.x/ D 2 ln.x/ f or 0:5 x 1 p D ln.2.1 x//= 2 f or 0 x 0:5 Example 2.12. Let Z D max.X; Y/ and W D min.X; Y/, where, X and Y are arbitrary random variables. The joint density function fZW .z; w/ is computed in terms of fXY .x; y/ is as follows. When z w; P .Z z; W w/ D P .X z; Y w/ C P .X w; Y z/ P .X w; Y w/ (See Fig. 2.20 given below) (W, Z)
(Z, Z)
(Z, W) (W, W)
Fig. 2.20 Computation of the joint density function of the 2.12
2.24 Indicator
99
) FZW .z; w/ D FXY .z; w/ C FXY .w; z/ FXY .w; w/ f or z w D 0; Otherwise ) fZW .z; w/ D fXY .z; w/ C fXY .w; z/ fXY .w; w/ f or z w D 0; Otherwise For instance if X and Y are independent and identically distributed and is uniformly distributed between 0 and 4, then 1 for z w; z; w D 0 to 4 25 D 0; Otherwise
fZW .z; w/ D
2.23 Expectations Expectation of the random variable ‘X’ represented as E(X) is defined as follows: Z E.X/ D
1 1
xfX .x/
Properties: 1. 2. 3. 4. 5.
R1 E.g.X// D 1 g.x/fX .x/dx R1 EX=Y Dy D E.X=Y D y/ D 1 xf X=Y Dy .x/ E.X/ D EY .EX=Y Dy / If X > 0 then E.X/ > 0 E.X 2 D EŒŒX E.X /2 C EŒX 2
2.24 Indicator (a) Markov inequality (Fig. 2.21) Consider X=b Ib (X) X E .Ib .X // )E b X P .X b/ )E b E.X / ) P .X b/ b
100
2 Probability
Fig. 2.21 Indicator x/b
Ib(x)
b
x
(b) Chebyshev inequality Consider the random variable X D Y 2 Using Markov inequality P.Y 2 b/ E.Y 2 /=b 2 ) P .jY j b/ E.Y 2 /=b 2 ) P .jY E.Y /j b/ E.jY E.Y /j2 /=b 2 ) P .jY E.Y /j b/ y2 =b 2 (c) Schwarz inequality
Consider E .aX Y/2 0 ) a2 E.X 2 / C E.Y 2 / 2aE.X Y / 0 ) a2 E.X 2 / 2aE.X Y / C E.Y 2 / 0 The above equation can be viewed as the quadratic equation with variable ‘a’ The equation is valid only when 4.E.XY//2 4E.X 2 /E.Y 2 / 0 ) .E.X Y //2 E.X 2 /E.Y 2 / (d) Chernoff bound (Fig. 2.22) Consider eSXaS Ia (x) ) E.eSXaS / E.Ia .x// ) E.eSXaS / P .X b/ ) P .X a/ eaS E.eSX / (e) Also it can be shown EŒXY 0:5.E.X 2 / C E.Y 2 // as follows E..X Y /2 / 0
2.25 Moment Generating Function
101
esx-as
Ia(x)
a
x
Fig. 2.22 Chernoff bound
) E.X 2 / C E.Y 2 / 2E.X Y /
) EŒXY 0:5 E.X 2 / C E.Y 2 / (f) Correlation co-efficient From Cauchy-Schwarz inequality .E.XY//2 E.X 2 /E.Y 2 / The ratio
.E.X Y //2 1 E.X 2 /E.Y 2 /
E ..X mX /.Y mY // Define the ratio D p E.X mX /2 /E..Y mY /2 / The ratio is called correlation co-efficient. The range of is given as 0 j j 1
2.25 Moment Generating Function The moment generating function is defined as ˚X .s/ D E.e sX /
2 .sX/2 sX Also it is known that E.e / D E 1 C 2Š C .sX/ 3Š C Therefore differentiating ‘n-times’ the moment generating function ˚X .s/ and equating s D 0 gives the E.X n /. n ˚ .s/ X Thus E.X n / D d dS js D 0 n
102
2 Probability
2.26 Characteristic Function The characteristic function of the random variable X is given as E.e jwX /. E.X n / can also be obtained using characteristic function.
2.27 Multiple Random Variable (Random Vectors) Collections of random variables are called random vectors. The random vectors X1 X2 : represented as X D : : : Xn Joint Cumulative distribution function FX .x1; x2; x3; : : : x n/ D pr.X1 x1; X 2; x2; X 3 x3; : : : Xn x n/ Probability mass function of the random vector is represented as follows. pr.X1 D x1; X 2 D x2; X 3 D x3; : : : Xn D x n/ Probability density function of teh fX .x1; x2; x3; : : : x n/ D
@n FX .x1; x2; x3; : : : x n/ @x1@x2@x3 : : : @x n
Similarly FX .x1; x2; x3; : : : x n/ D Z Z Z Z : : : fX .x1; x2; x3; : : : x n/ @x1@x2@x3 : : : @x n Marginal density function Z Z Z fXi .xi / D
Z
fX .x1;x2;:::x.i 1/;x.i C1/x.i /:::x n/ @x1@x2:::@x.i 1/@x.i C1/;:::@x n
2.27 Multiple Random Variable (Random Vectors)
103
Joint density function Z Z Z fX1X2 .x1; x2/ D
Z :::
fX
x1; x2; : : : x.i 1/ @x3@x4 : : : @x n x.i /; x.i C 1/ : : : x n
Conditional probability density function Consider the random vector X is divided into two random vectors X1 and X 2 as shown below. X1 XD X2 Then the conditional probability of the random vector X1 over the random vector X 2 is computed as follows fX1=X2 .x1/ D fX .x/=fX2 .x2/ Also note that fX .x/ D fX1=X2 .x1/fX2 .x2/ In General, fX .x/ D fX1 .x1/fX2=X1 .x2/fX3=X1;X2 .x3/ : : : fXn=Xn1:::X1 .x n/ Independence The random variables of the random vectors X1,X2,X3,: : :Xn are independent if FX .x1; x2; x3; : : : x n/ D FX1 .x1/FX2 .x2/FX3 .x3/ : : : FXn .x n/ fX .x1; x2; x3; : : : x n/ D fX1 .x1/fX2 .x2/fX3 .x3/ : : : fXn .x n/ Expectation of the random vector E.X1/ E.X 2/ : E.X / D : : : E.Xn/ Moment generating function of the random vector E.e S
T
X
/
104
2 Probability
Where X1 X2 : XD : : : Xn
s1 s2 : SD : : : sn
Characteristic function of the random vector T
E.e j w
X
/
where X1 X2 : XD : : : Xn
w1 w2 : W D : : : wn
Correlation matrix of the random vector E.X X T / D E.X12 / E.X1X 2/ E.X1X 3/ : : : E.X1Xn/ E.X 2X1/ E.X 22 / E.X 2X 3/ : : : E.X 2Xn/ : : : ::: : : : : ::: : : : : ::: : E.XnX1/ E.XnX 2/ E.XnX 3/ : : : E.Xn2 / Covariance matrix of the random vector
E .X E.X/.X E.X /T D E.X X T / E.X /E.X /T Note: 1. The events Xi and Xj are statistically uncorrelated if E.Xi Xj/E.Xi/E.Xj/ D 0; i ¤ j. Also note that the co-variance matrix becomes diagonal matrix if the elements of the random vector are uncorrelated to each other. 2. The events Xi and Xj are independent then E.XiXj/ D E.Xi/E.Xj/. 3. If the two events are statistically independent, they are uncorrelated. But the viceversa is not true.
2.27 Multiple Random Variable (Random Vectors)
105
Gaussian probability density function with mean ‘’ and variance ‘ 2 ’ 2 1 Œx fX .x/ D p e 2¢ 2 2…¢ 2 FX .x/ D P .X x/
Suppose y 2 D
ŒX2 ¢2
Z FX .x/ D
x
p Z
D Z D Z
1 Œx ¢
1 Œx ¢
1
2
1 2…¢ 2
e
Œx 2 2¢
dx
y2 ¢ p e 2 dy 2…¢ 2 y2 ¢ p e 2 dy 2…¢ 2
Œx ¢
1 y2 e 2 dy p 2… 1 ŒX D FY ¢
D
Thus Gaussian distribution value at ‘x’ with mean D ; variance D 2 can be whose mean D 0 and computed using Gaussian distribution function of y at ŒX variance D 2 Moment generating function of the Gaussian density function Z ˆX .s/ D E.e D sx
1 1
2 e sx Œx p e 2¢ 2 dx 2…¢ 2
Consider the powers of e. sx2¢ 2 x 2 2 C 2x 2¢ 2 2 2 x C .2 C 2s¢ 2 /x D 2¢ 2 Adding . C s¢ 2 /2 and subtracting . C s¢ 2 /2 on the numerator, we get the following 2 x C 2 .2 C 2s¢ 2 /x C . C s¢ 2 /2 . C s¢ 2 /2 D 2¢ 2 .x .2 C 2s¢ 2 //2 C 2 . C s¢ 2 /2 D 2¢ 2
106
2 Probability
Z
1
)
e
2 .Cs¢ 2 /2 2¢ 2
p 2…¢ 2
1
D e s e Z
1
e
2
.x.2C2s¢ 2 // 2¢ 2
e
dx
¢ 2 s2 2
.x.2C2s¢ 2 //
2
2¢ 2
p dx D 1 2…¢ 2 @ˆX .s/ EŒX D js D 0 @s @n ˆX .s/ EŒX n D js D 0 @S n
Because;
1
ˆX .s/ D e s e
¢ 2 s2 2
Consider the case when D 0. ˆX .s/ D e
2 2 2
¢ 2 s2 2
¢ s 2
¢ 2s2 2
¢ 2s2 2
2m
C C :::: C C 1Š 2Š 2mŠ @2m ˆX .s/ ¢ 2m .2m/.2m 1/ : : : 1 ) js D 0 D @S 2m 2m mŠ Therefore EŒX n D ¢ n Œ1:3:5:7 : : : :.n 1/ when n is even D 1C
D 0 when n is odd Chernoff bound for Gaussian density function with zero mean and unity variance The Chernoff bound is given as P .X a/ eaS E.eSX / (i.e) P .X a/ eaS ˆX .s/ For Gaussian density function ˆX .s/ D e s e ) P .X a/ eaS e s e
¢ 2 s2 2
¢ 2 s2 2
f or al l s
To get the tight bound we have to find the value of ‘s’ so that eaS e s e minimized. Assume D 0 and variance ¢ 2 D 1 for simplicity. Differentiating eaS e
¢ 2 s2 2
eaS e
with respect to s and equate to zero
¢ 2 s2 2
¢ 2 s 2 aS ¢2 .2s/ C e e .a/ D 0 2 2
s D .a=¢ 2 / D a
¢ 2 s2 2
is
2.28 Gaussian Random Vector
Substituting s D a in eaS e
107 s2 2
we get, e
a 2 2
) P .X a/ e
a 2 2
2.28 Gaussian Random Vector with Mean Vector X and Covariance Matrix CX h iT h i 1 1 X X X X fX .x/ D e C n 1 2 .2˘ / 2 jC j 2 1
Moment Generating function for the Gaussian random variable is given as ˆX .S/ D e ŒX
TS
1
e 2 ŒS
T CS
Properties of Gaussian random vector
1.
2. 3. 4. 5. 6.
E.X1/ E.X 2/ : E.X / D D X : : : E.Xn/ ˇ d˚X .s/ ˇˇ E.Xi / D sD0 dSi ˇ ˇ
d 2 ˚X .s/ ˇˇ E Xi 2 D sD0 dSi2 ˇ ˇ d 2 ˚X .s/ ˇˇ sD0 E.Xi Xj / D dSi Sj ˇ ˇ
m n d mCn ˚X .s/ ˇˇ sD0 In general E Xi Xj D dSi m sj n ˇ The correlation matrix of the Gaussian random vector is given as R D E.X X T / D E.X12 / E.X1X 2/ E.X1X 3/ : : : E.X1Xn/ E.X 2X1/ E.X 22 / E.X 2X 3/ : : : E.X 2Xn/ : : : ::: : : : : ::: : : : : ::: : E.XnX1/ E.XnX 2/ E.XnX 3/ : : : E.Xn2 /
108
2 Probability
7. The co-variance matrix is obtained as
C D E .X E.X / .X E.X /T D E.X X T / E.X /E.X /T 8. Marginal density of any `-dimensional sub vector of X .` n/ is also Gaussian with proper mean and co-variance matrix as described below X1 X2 : : : Xl Consider the Gaussian random vector X D XlC1 XlC2 XlC3 : : : Xn m1 m2 : : : ml With mean vector m D mlC1 mlC2 mlC3 : : : mn And the covariance matrix given as c11 c21 : : : cl1
c12 c22 : : : cl2
: : : : : :
:: :: :: :: :: ::
c1l c2l : : : cl l
:: :: :: :: :: ::
: c1n : c2n : : : : cln
2.28 Gaussian Random Vector
109
clC1;1 clC1;2 : : : : : : cn2 cn1
: : : clC1;l ::: : ::: : ::: : : : : cn;l
:: :: :: :: ::
: clC1;n : : : : : : : cnn
Let the sub vector of the above mentioned Gaussian vector be X1 X2 : Y D : : Xl The random vector Y is also Gaussian distributed with mean vector m1 m2 : my D : : ml and co-variance matrix c11 c12 c21 c22 : : : : : : c1l cl2
: : : : : :
: : : : : :
: c1l : c2l : : : : : : : cl l
9. If the Gaussian random vector X with mean vector M and co-variance matrix ‘CX ’ is linearly transformed into the random vector Y using the transformation matrix ‘A’ and the column vector b as Y D AX C b Then the random vector Y is also Gaussian distributed with mean vector A M C b and the co-variance matrix A CX AT Note that the co-variance matrix of the random variable Y is A CX AT irrespective of the distribution function. Similarly the mean vector of the random variable Y is A M C b irrespective of the type of the distribution function.
110
2 Probability
10. Consider the random vector X is represented as row concatenation of two random vectors X1 and X 2 as shown below X1 X2
XD
Similarly the mean vector the random vector X is represented as row concatenation of the mean vectors of the random vectors X1 and X 2 as shown below. M1 M2
M D
Also let the co-variance matrix of the random vector X1 and X 2 are represented as C11 and C22 respectively. The co-variance matrix of the random vector X is represented as # " ŒC11 ŒC12 ŒC21 ŒC22 (a) The conditional density function fX2=X1Dx1 .x2/ is also Gaussian vector M 2 C ŒC21 ŒC11 1 ŒX1 M1 and co-variance matrix C ŒC21 ŒC11 1 ŒC12 (b) The conditional density function fX1=X2Dx2 .x1/ is also Gaussian vector M1 C ŒC12 ŒC22 1 ŒX 2 M 2 and co-variance matrix C ŒC12 ŒC22 1 ŒC21
with mean D ŒC22 with mean D ŒC11
11. If the co-variance matrix of the Gaussian random vector is diagonal, the elements in the random vector X are uncorrelated and independent 12. Contours of 2D-Gaussian probability density function (Fig. 2.23)
fXY .x; y/ D
1 p 2˘ 1 2 1 2
y2 2 xy x2 C
1 2
2 2
1 2 2 2.1 / e
Consider the 2D contour obtained from the equation y2 2 xy x2 C D constant D c1 2 2
1
2
1 2 This is the contour having the same probability density value.
2.28 Gaussian Random Vector
111
rho=0 sigma1=1 sigma2=1
rho=0 sigma1=1 sigma2=2
10
rho=0 sigma1=2 sigma2=1
10
10
8
8
8
6
6
6
4
4
4
2
2
2
0
0
0
−2
−2
−2
−4
−4
−4
−6
−6
−6
−8
−8
−10 −10
−5
0
5
10
−8
−10 −10
rho=1/2 sigma1=1 sigma2=1
−5
0
5
10
−10 −10
10
10
8
8
8
6
6
6
4
4
4
2
2
2
0
0
0
−2
−2
−2
−4
−4
−4
−6
−6
−6
−8
−8
−10 −10
−5
0
5
10 10
−5
0
5
10
rho= −1/2 sigma1=1 sigma2=1
rho=1/2 sigma1=1 sigma2=2
10
−8
−10 −10
−5
0
5
10
−10 −10
rho= −1/2 sigma1=1 sigma2=2
8 6 4 2 0 −2 −4 −6 −8 10 −10 −8 −6 −4 −2
0
2
4
6
8
10
Case 1: D 0; 1 D 2
y c1=2c c1 c X
Fig. 2.23 Contours of the 2D Gaussian probability density function
−5
0
5
10
112
2 Probability
Note that when the contour radius increases, actual magnitude of the pdf decreases. Thus contour c1 is having higher magnitude compared with the contour c2. Case 2: D 0; 1 D 2 y
1 > 2 x
Case 3: D 0; 2 D 1 y
σ1 < σ2
x
Case 4: < 0; 2 D 1 . Note that the value of ‚ D 45ı
y
x
Fig. 2.23 (continued)
2.28 Gaussian Random Vector
113
Case 5: > 0; 2 D 1 . Note that the value of ‚ D 45ı y
x
Case 6: ¤ 0; 2 ¤ 1 , Note that the value of ‚ ¤ 45ı
Fig. 2.23 (continued)
Example 2.13. Let X be a uniform random variable in [0,100]. E.X=X 65/ is computed as follows. Z
100
E.X=X 65/ D
x fX=X65 .x/dx 0
fX=X65 is obtained as follows FX=X65 .x/ D P .X < x; X 65/=P .X 65/ D P .65 X x/=P .X 65/ D .FX .x/ FX .65//=P .X 65/ D .FX .x/ FX .65//=P .X 65/
114
2 Probability
Differentiating on both sides, ) fX=X65 .x/ D fX .x/=P .X 65/ f or X 65 Z 100 fX .x/dx ) fX=X65 .x/ D fX .x/ Z
65 100
) fX=X65 .x/ D fX .x/
x fX .d /dx 65
1 f or 0 x 100 .Uniformly distributed/ 100 Z 100 ) fX=X65 .x/ D fX .x/ .1=100/dx 65 1 D .1=100/ .35/ 100 1 f or X 65 and X 100 D 35 Z 100 ) E.X=X 65/ D x fX=X65 .x/dx 65 Z 100 1 dx x D 35 65 Z 100 1 D x dx 35 65
1 .1002 652 / 165 35 D D 35 2 35 2 D 82:5
fX .d / D
Example 2.14. Let X be a poison random variable with probability mass function PX .k/ D
e k f or k D 0; 1; 2; : : : kŠ
E(X) and variance(X) are computed as follows The moment generating function ˚X .s/ D E.e SX / D
X1 xD0
e Sx
e x xŠ
1 X .e S /x e D xŠ xD0 .e S /2 .e S /3 .e S / C C C D e 1 C 1Š 2Š 3Š
D e e .e
S /
2.28 Gaussian Random Vector
115
Differentiating on both sides with respect to ‘s’ and substitute s D 0 gives S
E.X/ D e
de .e / S .at s D 0/ D e e .e / e S .at s D 0/ D ds
Differentiating e e .e
S /
E.X 2 / D e
h
e S with respect to ‘s’ and substitute s D 0 gives i S S e S e .e / e S C e .e / e S .at s D 0/ D .2 C /
) Variance D ŒE.X 2 / E.X/2 D Mean D Variance D Example 2.15. Consider the random variable X with probability density function as given below. fX .x/ D e x f or x 0 D 0; otherwise E.X/; fX .x=X 2/ and E.X=X 2/ are computed as shown below. Z 1 Z 1 Z 1 x E.X/ D x fX .x/dx D x e dx D x e x dx 1
Z
1
D
"
xd 0
" "
x
e #1
e x DD x
0
#
Z
1
0
0
0
# e x dx D Œ1=2 D 1=
FX .x=X 2/ D P .X x=X 2/ D P .X x; X 2/=P .X 2/ D P .X x; X 2/=P .X 2/ D
P .2 X x/ P .X 2/
D
FX .x/ FX .2/ P .X 2/
) fX .x=X 2/ D
fX .x/ f or x 2 P .X 2/
D 0; otherwise Z 1 1 e x 1 D e x D e 2 P .X 2/ D fX .x/dx D e x dx D 2 2 2 Z
) fX .x=X 2/ D e x e 2 f or x 2 D 0; otherwise
116
2 Probability
Z E.X=X 2/ D
1
1
Z
1
xe
xfX .x=X 2/dx D Z
x 2
e dx D e
1
2
2
2
" " D e 2 x
e x
#1
Z
1
2
2
"
e x xd #
#
h i e x dx D e 2 .2e 2 =/ C e 2 =2 /
D 2 C .1=/ Example 2.16. Let X D ŒX1 X2 X3T is a three-dimensional zero-mean Gaussian random vector with covariance matrix C given by 1 3 0 C D3 2 3 0 3 1 The joint density function fX1X2X3 .x1; x2; x3/ is given as follows fX1X2X3 .x1; x2; x3/ D fX .x/ h iT h i 1 1 1 X X e C X X D n 1 2 .2˘ / 2 jC j 2 where X D Œx1 x2 x3T ; jcj D 16; n D 3; X D Œ0 0 0T and 2
3 7 3 9 1 1 4 5 C D 3 1 3 16 9 3 7 ) fX1X2X3 .x1; x2; x3/ 0 2 3 7 3 9 1 1 @Œx1 x2 x3 4 3 1 3 5 D e 3 1 2 .2˘ / 2 j16j 2 9 3 7 2 31 x1 1 4x25A 16 x3 ) fX1X2X3 .x1; x2; x3/ 1 1 .7 x12 x22 C 7 x32 C 6 x1 x2 D e 3 1 32 .2˘ / 2 j16j 2 C6 x2 x3 18x1 x3 If Y D X1CX2CX3, then fY .y/ is Gaussian with mean mY and covariance matrix CY as shown below
2.28 Gaussian Random Vector
117
2
3
x1 Y D Œ1 1 1 4x25 x3
Y D AX ; where A D Œ1 1 1
2 32 3 1 3 0 1 The covariance matrix CY D A CX AT D Œ1 1 1 43 2 35 415 D 16 0 3 1 1 Mean vector mY D AmX D AX D Œ1 1 1Œ0 0 0T D 0 Thus fY .y/ D
2
Œy p 1 e 216 2˘13
Example 2.17. Let ŒX1 X 2T be the two-dimensional zero-mean Gaussian random 42 vector with covariance matrix C given by C D . The conditional density func24 tions fX1=X2 .x1/ and fX2=X1 .x2/ are also Gaussian distributed with the following specifications. (i) fX2=X1 .x2/ is Gaussian with mean D m2 c21 c111 .x1 m1/ and variance D c22 c21 c111 c12, where Œm1 m2T is the mean vector of the c11 c12 two-dimensional Gaussian random vector in general. Also C D c21 c22 be the corresponding generalized co-variance matrix. In our case m1 D 0 and m2 D 0: c11 D 4 c12 D 2 c21 D 2 c22 D
4 There fore fX2=X1 .x2/ is Gaussian distributed with mean D 24 x1 and
variance D 4 24 2 D 3 1
fX2=X1 .x2/ D p e …6
Œx2C 21 x1
2
6
(ii) fX1=X2 .x1/ is Gaussian with mean D m1 c12 c221 .x2 m1/ and variof the ance D c11 c12 c221 c21, where Œm1 m2T is the mean vector c11 c12 two-dimensional Gaussian random vector in general. Also C D c21 c22 be the corresponding generalized co-variance matrix. In our case m1 D 0 and m2 D 0: c11 D 4 c12 D 2 c21 D 2 c22
D 4 There fore fX2=X1 .x2/ is Gaussian distributed with mean D 24 x1 and vari ance D 4 24 2 D 3 1
fX2=X1 .x2/ D p e …6
Œx2C 21 x1 6
2
118
2 Probability
Example 2.18. Let X1 and X2 are jointly Gaussian with zero-mean. Let the 1 co-variance matrix of the random vector [X1 X2] is given as C D . Let 1 Y1 D X1 C X2; Y2 D X1 X2, then the co-variance matrix of the random vector [Y1 Y2] is given as CY D ACX AT
1 1 1 1 1 2.1 C / 0 D 1 1 1 1 1 0 2.1 /
This implies the random variable Y1 and Y2 are uncorrelated. As the distribution is Gaussian, this also implies that the two random variables Y1 and Y2 are independent.
12 1 2 and Similarly, for the case of co-variance matrix C D 1 21 22 Y1 D X1 C X2 ; Y2 D X1 X2 , the co-variance matrix CY D ACX AT is computed 1 2 1 2 as following 1= 1 1= 2
12 1 2 1= 1 1= 1 1= 2 1= 2 1= 1 1= 2 1 2 22
1 C 1 2 C 2 1= 1 1= 1 2.1 C / 0 D D
1 1 2 2 1= 2 1= 2 0 2.1 /
This implies the random variable Y1 and Y2 are uncorrelated. As the distribution is Gaussian, this also implies that the two random variables Y1 and Y2 are independent.
2.29 Complex Random Variables Consider the probability space (S,F,P), then the mapping of the outcome s 2 sample space S to the complex line is called complex random variable (Fig. 2.24). The complex random variable ‘X’ is represented as Xr C j Xi E.X/ D E.Xr / C j E.Xi / The joint pdf of ‘Xr ’ and ‘Xi ’ is represented as fXr Xi .xr ; xi /, which is the joint density function of the random variable ‘Xr ’ and ‘Xi ’. E.g.X// D E.g.Xr // C j E.g.Xi // The complex random variable X D Xr C j Xi is defined as proper complex random variable if it satisfies the following condition.
2.30 Sequence of the Number and Its Convergence
119 (S,F,P)
Fig. 2.24 Illustration of the Complex random variables S
x
-`
a
C
(C,B,Px)
E.X 2 / D 0 E.X 2 / D E ..Xr C j Xi /.Xr C j Xi // D E.X r 2 X i 2 / C 2jE.X iX r/ ) E.X r 2 / E.X i 2 / D 0 .i:e/ E.X r 2 / D E.X i 2 / Also; E.X i X r/ D 0 (i.e.) The random variable Xr and Xi are uncorrelated The second moment of the random variables ‘Xr ’ and ‘Xi ’ are zero. It can also be shown that for the proper complex random variable, variance of the random variables ‘Xr ’ and ‘Xi ’ are equal and they are uncorrelated. E ..X E.X //2 / is called pseudo covariance. The co-variance for the complex random variable ‘X’ is defined as E((X-E(X)) .X E.X//H )
2.30 Sequence of the Number and Its Convergence Let the sequence of random variables be represented as x1 x2 x3 : : :, then the sequence converges to the constant x is represented as follows lim xn D x
n!1
For example 1 C 11 ; 1 C 12 ; 1 C 13 ; : : : 1 C n1 is the sequence of random variables.
120
2 Probability
lim
n!1
1C
1 n
D1
Definition for convergence: lim xn D x
n!1
) For any 2> 0, there exists N, such that jxn xj 2/ converges to the value 0 (i.e.) limn!1 P.jXn X j >2/ D 0 for any 2> 0 Convergence in distribution limn!1 Xn .s/ D X.s/ in distribution sense if distribution function FXn .˛/ D FX .˛/ for all values of ‘˛’ where FX .˛/ is continuous. Properties of the different types of convergence 1. If the sequence of random variable converges in mean square sense then they will converge in probability sense (Fig. 2.25) P ..jXn X j 2/
E..Xn X /2 / 22
Proof. If the sequence of random variable converges in the mean square sense, E..Xn X /2 / D 0 ) P..jXn X j 2/ 0. Probability cannot be less than zero and hence P..jXn X j 2/ D 0 and hence proved 2. If the sequence of random variable converges in almost sure sense then they will converge in probability sense. (Obvious from the definition) 3. If the sequence of random variable converges in probability sense then they will converge in distribution sense and hence
Fig. 2.25 Convergence of the Sequence of random variables 1. Strictly convergence 2. Convergence in distribution 3. Convergence in probability 4. Convergence in mean square sense 5. Convergence in almost sure sense
122
2 Probability
If the sequence of random variable converges in mean square sense then they
will converge in distribution sense. If the sequence of random variable converges in almost sure sense then they
will converge in distribution sense.
2.33 Example for the Sequence of Random Variable Suppose X1 X2 : : : is the sequence of random variable such that E.Xi / D k for all i, E..Xi k/2 / is finite for all i and the covariance E..Xi k/.Xj k// D 0 for i ¤ j , then the sequence of random variable defined as Y1 Y2 Y3 : : : Yn converges to the constant ‘k’ mean square sense.
in P n Where Yn D n1 kD1 Xi .
2.34 Central Limit Theorem If we have independent and identically distributed random variables X1 X2 : : : with mean and variance less than infinity, then the sequence of random variable
‘m’ Pn 1 p i D1 .Xi m/ converges to the random variable X in distribution sense n such that the random variable ‘X’ is having Gaussian probability density function with mean zero and constant variance.
Chapter 3
Random Process
3.1 Introduction The mapping of the experimental outcomes s 2 sample space S to the set of Random vectors is called as random process (Fig. 3.1). Individual random vector can be treated as the signal which varies as the function of time. (i.e.) Thus the random process can also be viewed as the mapping of the outcomes s 2 sample space S to the set of signals as the function of time. Example 3.1. An experiment has four equally likely outcomes 0,1,2,3 (i.e.) S D f0; 1; 2; 3g. The random process Xt is defined as Xt D cos.2 PI s t/ for all s 2 S. In the above example, the outcome of the experiment ‘0’ is mapped to the random vector [1 1 1 1 1 1 1 1 1 1 : : : 1] Similarly the outcome of the experiment ‘1’, ‘2’, ‘3’ are mapped to the set of random vectors as shown below. [Note that resolution of the variable ‘t’ is 1/1,000] ‘1’-> [1.0000 1.0000 0.9999 0.9998 0.9997 0.9995 0.9993 0.9990 0.9987 0.9984: : : ‘2’-> [1.0000 0.9999 0.9997 0.9993 0.9987 0.9980 0.9972 0.9961 0.9950 0.9936 : : :] 3-> [1.0000 0.9998 0.9993 0.9984 0.9972 0.9956 0.9936 0.9913 0.9887 0.9856 : : :] The mapped vectors are plotted as the function of time which are shown in the Fig. 3.1. Thus the random process can be viewed as the mapping of the outcomes of the experiment to the set of signals as the function of time.
E.S. Gopi, Mathematical Summary for Digital Signal Processing Applications with Matlab, DOI 10.1007/978-90-481-3747-3 3, c Springer Science+Business Media B.V. 2010
123
124
3 Random Process 2 s1=ⱊ0⬘
s2=ⱊ1⬘
1 0 0 1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 −1 0 1 s3=ⱊ2⬘
0 −1 0 1 0
s4=ⱊ3⬘
−1 0
Fig. 3.1 Random process
3.2 Random Variable Xt1 The Random variable Xt1 is obtained by sampling across the random process Xt at particular time instant ‘t1’. In the previous example the random variable X0 X0:25 X0:5 are obtained by sampling the random process across the time instant ‘0’, ‘0.25’, ‘0.5’ respectively as shown in the Fig. 3.2. The random variable X0 holds the values 0 with probability D 1 The random variable X00:25 holds the values 1 with probability D 1/4 0 with probability D 1/2 1 with probability D 1/4 The random variable X0:5 holds the values 1 with probability D 1/2 1 with probability D 1/2
3.3 Strictly Stationary Random Process with Order 1 Cumulative distribution function of the random variable Xt1C D Cumulative distribution function of the random variable Xt1 for all values of . (i.e.) FXt1 .˛/ D FXt1C .˛/ for all .
3.4 Strictly Stationary Random Process with Order 2 FXt1;Xt 2 .˛; ˇ/ D FXt1C; Xt 2C .˛; ˇ/ for all ; t1; t2; .˛; ˇ/ tends to
If ˇ ! 1, order 1 stationarity holds
3.5 Wide Sense Stationary Random Process
125
2 1 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 0 −1 1 0 −1 1 0 −1 X0
X0.25
X0.5
Fig. 3.2 Random variable Xt1
In general if the random process is strictly stationary at order n, it is strictly stationary for order n 1. Example 3.2.(a) Xn : Functions of outcomes of tossing the coin. If the Probability of Head is same for every toss, then the random process Xn is strictly stationary of order 1. (b) Example 2.1 is the non-stationary random process. (c) Xt D A cos.2 … f t C ‚/, ‘‚’ is the random variable independent of ‘t’ and is uniformly distributed between 0 to 2 …. This is strictly stationary process.
3.5 Wide Sense Stationary Random Process The Random process Xt is said to be wide-sense stationary random process if it satisfies the following conditions mX .t/ D E.Xt / D constant for all time instant ‘t’. EŒXt Xs / D RX .t; s/ is the function of t s.
(i.e.) EŒXt Xs / D RX .t; s/ D RX .t s/ D RX ./ for all values of ‘t’ and ‘s’.
126
3 Random Process
Example 3.3. Xt D A cos.2 … f t C ‚/, ‘‚’ is uniformly distributed between 0 to 2 …. mX .t/ D E.Xt / D E.A cos.2 … f t C ‚/ D A E.cos.2 … f t C ‚/ (Note Expectation is computed at one particular time instant ‘t’ (i.e.) ‘t’ is fixed. so the function cos.2 … f t C ‚/ is the function of the random variable ‘‚’ only. (Represented as g(‚)). Let the probability density function of the random variable ‘‚’ be represented as f(‚). Therefore A E.cos.2 … f t C ‚// Z D A g.‚/f .‚/ d‚ Z D A cos.2…ft C ‚/ f.‚/ d‚ Z 2… 1 d‚ cos.2…ft C ‚/ DA 2 … 0 Z 2… A D cos.2…ft C ‚/ d‚ 2… 0 D0 ) mX .t/ is Constant Also Autocorrelation is computed as follows RX .s; t/ D RX .t C ; t/ D E.Xt C Xt / D E.A E.cos.2 … f .t C / C ‚/ A E.cos.2 … f t C ‚// D EŒA2 cos.2 … f .t C / C ‚/ cos.2 … f t C ‚/ D A2 E Œcos.2 … f .t C / C ‚/ cos.2 … f t C ‚/ A2 EŒcos.2 … f .2t C / C 2‚/ C cos.2 … f / D 2 A2 A2 EŒcos.2 … f .2t C / C 2‚/ C EŒcos.2 … f / D 2 2 Note that cos.2 … f / is constant as and f are constant So
A2 A2 EŒcos.2 … f / D cos.2 … f / 2 2
3.8 Joint Strictly Stationary of Two Random Process
127
I term
A2 2 EŒcos.2 … f .2t mX .t/]
C / C 2‚/ D 0 [Refer the steps involved in calculating 2
) RX .s; t/ D RX .t C ; s/ D A2 cos.2 … f / D RX .s t/ D RX ./ is the function of the difference u s D . Thus Xt D A cos.2 … f t C ‚/, where ‘‚’ is the random variable which is uniformly distributed between 0 to 2 … is the wide sense stationary process.
3.6 Complex Random Process The autocorrelation of the complex random process is given as RX .t; s/ D E.Xt XS /: If the Complex random process is Wide Sense Stationary process, then RX .t; s/ D E Xt XS D RX .t s/ D RX ./ which is the function of difference t s D
3.7 Properties of Real and Complex Random Process
1. RX .0/ D E Xt2 for real random process RX .0/ D E.jXt j2 / for complex random process 2. RX ./ D RX ./ for real random process RX ./ D RX ./ for complex random process Also Real ŒRX ./ D Real ŒRX ./ (Even symmetry) Imaginary ŒRX ./ D Imaginary ŒRX ./ (Odd symmetry) 3. jRX ./j ˇ RX .0/ˇfor both real2 and complex random process (i.e.) ˇE Xt C X ˇ E.jX t j / for complex random process jE.Xt C Xt /j E Xt2 for real random process
3.8 Joint Strictly Stationary of Two Random Process Consider two random process ‘Xt ’ and ‘Yt ’. They are said to be jointly strictly stationary if it satisfies the following condition FXt1;Y t 2 .˛; ˇ/ D FXt1C; Y t 2C .˛; ˇ/ for all ; t1; t2; .˛; ˇ/: Note that the two random process ‘Xt ’ and ‘Yt ’ are individually stationary.
128
3 Random Process
3.9 Jointly Wide Sense Stationary of Two Random Process Consider two random process ‘Xt ’ and ‘Yt ’. They are said to be jointly Wide Sense Stationary (W.S.S) if it satisfies the following condition 1. mX .t/ D E.Xt / D constant for all time instant ‘t’ 2. EŒXt Xs / D RX .t; s/ is the function of t s. (i.e.) EŒXt Xs / D RX .t; s/ D RX .t s/ D RX ./ for all values of ‘t’ and ‘s’ 3. mY .t/ D E.Yt / D constant for all time instant ‘t’ 4. EŒYt Ys / D RY .t; s/ is the function of t s. (i.e.) EŒYt Ys / D RY .t; s/ D RY .t s/ D RY ./ for all values of ‘t’ and ‘s’ 5. EŒXt Ys D RXY .t; s/ D RXY .t s/ D RXY ./ for all values of ‘t’ and ‘s’ Note (a) EŒXt Ys D RXY .t; s/ is called as cross correlation function. (b) RXY .t; s/ D RYX .s; t/ for real random process RXY ./ D RYX ./ for W.S.S. real random process .s; t/ for Complex random process RXY ./ D RYX ./ for (c) RXY .t; s/ D RYX W.S.S. Complex random process
3.10 Correlation Matrix of the Random Column Vector for the Specific ‘t’ ‘s’ E
Xt jXt Ys
E Xt2 Ys j D EŒYt Xs
Xt Ys
Y EŒX t s
E Yt2
3.11 Ergodic Process Let ‘s1’, ‘s2’ : : :‘sn’ be the outcomes of the experiment. Let Xt .s1/ be the signal as the function of time which is the map of the experiment S1. Similarly Xt .s2/ be the signal corresponding to the experiment S2. The set of functions as the outcomes of all the experiments forms the random process which is represented as Xt (see Fig. 3.3). Also let Xt1 be the random variable which holds the values obtained by collecting the values across the process at some arbitrary time instant ‘t1’. Ensemble average of the random variable Xt1 is computed across the process and is given by Z 1 xfXt1 .x/dx 1
where fXt1 is the probability density function of the random variable ‘Xt1 ’.
3.11 Ergodic Process
129
Fig. 3.3 Illustrations of Ergodic process
Time average of the arbitrary mapped signal Xt .s1/ is computed as follows lim
T !1
1 2T
Z
T T
Xt .s1/ dt
In case of Wide Sense Stationary process, the ensemble average E.Xt1 / is constant.. If the above mentioned Ensemble average (constant) is equal to the Time average computed for any arbitrary mapped signal Xt (s1) (say),the random process is called Ergodic in mean. (i.e.) The Random process Xt is said to be Ergodic in mean if limT !1
1 2T
Z
Z
T T
Xt .si/ dt D
1
1
xfXt1 .x/dx D constant for all ‘i’:
Note that Random process Xt must be the W.S.S. process if it is Ergodic process. Similarly the random process is said be Ergodic in Auto correlation if the random process satisfies for the following condition. Let Xt be the W.S.S process. Ensemble average in auto correlation computation is given as “ RX ./ D E.Xt C Xt / D
xy fXt CXt .x; y/dx dy
Time average in auto correlation computation is given as lim
T !1
1 2T
Z
T T
Xt .s1/Xt C .s1/ dt
130
3 Random Process
If Ensemble average is equal to the time average in auto correction computation, the random process is said to be Ergodic in autocorrelation. (i.e.) The random process Xt is said to be Ergodic in autocorrelation if limT !1 “ D
1 2T
Z
T T
Xt .si/Xt C .si/ dt
xy fXt CXt .x; y/dx dy D RX ./ D function of ‘’ for all ‘i’
Example 3.4. Xt D A cos.2 … f t C ‚/, ‘‚’ is uniformly distributed between 0 to 2 …. One particular map corresponding to the experimental outcome S1 is given as Xt .s1/ D A cos.2 … f t C ‚.s1// Ensemble average is given as E.Xt / D 0 (see Example 3.3) Time average lim
T !1
1 2T
Z
T T
Xt .s1/ dt
Z T 1 A cos.2 … f t C ‚.s1// dt D limT !1 2T T 1 A sin.2 … f T C ‚.s1//=.2 … f/ D limT !1 2T
A sin.2 … f T C ‚.s1//=.2 … f// is bounded between two constants. Say between ‘M1’ and ‘M2’. .i:e:/
limT !1
1 2T
M1 limT !1
1 2T
A sin.2 … f T
C‚.s1//= .2 … f/ limT !1 limT !1
1 2T 1 2T
M2 ) 0 A sin.2 … f T
C ‚.s1//=.2 … f/ 0
3.11 Ergodic Process
131
) limT !1
1 2T
A sin.2 … f T
C‚.s1//=.2 … f / D 0 ) limT !1
1 2T
Z
T T
Xt .s1/ dt D 0
Thus Ensemble average D Time average D constant D 0 and hence the random process Xt D A cos.2 … f t C ‚/ (where ‘‚’ is uniformly distributed between 0 to 2 …) is Ergodic in mean. In the same manner, it can be shown that Xt D A cos.2 … f t C ‚/ (where ‘‚’ is uniformly distributed between 0 to 2 …) is Ergodic in autocorrelation as shown below. Ensemble average in auto correlation is given as RX ./ D E.Xt C Xt / D A2 2 cos.2 … f / (see Example 2.3) Time average in auto correlation is computed as lim
T !1
1 2T
Z
T T
Xt .s1/Xt C .s1/ dt
Z T 1 A cos.2 … f .t C / C ‚.s1// D limT !1 2T T A cos.2 … f t C ‚.s1// dt Z T 1 D limT !1 .A2 =2/Œcos.2 … f .2t C / C 2 ‚.s1// 2T T C cos.2 … f / dt
First term limT !1
1 2T
Z
T
.A2 =2/Œcos.2 … f .2t C / C 2 ‚.s1// dt D 0 T
(As described in Time average in mean) ) limT !1
1 2T
Z
T
T T
Z
Xt .s1/Xt C .s1/ dt
1 .A2 =2/Œcos.2 … f .2t C / C 2 ‚.s1// 2T T C cos.2 … f / dt Z T
2 1 D limT !1 A =2 Œcos.2 … f / dt 2T T Z T 1 2 .A =2/Œcos.2 … f / dt D limT !1 2T T
D limT !1
132
3 Random Process
D limT !1
1 2T
.A2 =2/Œcos.2 … f /.2T/
D limT !1 .A2 =2/Œcos.2 … f / D .A2 =2/Œcos.2 … f / Thus Ensemble average in autocorrelation D Time average in auto correlation D .A2 =2/Œcos.2 … f / is the function of ‘ 0 : Hence the random process Xt D A cos.2 … f t C ‚/ (where ‘‚’ is uniformly distributed between (0 to 2 …) is Ergodic in autocorrelation
3.12 Independent Random Process Let Xt and Yt be two random processes. Let X D ŒXt1 Xt 2 Xt 3 Xt 4 : : : Xtn be the random vector obtained by sampling across the random process Xt at time instants t1, t2, : : : tn. Similarly the random vector Y D ŒYt1 Yt 2 Yt 3 Yt 4 : : : Ytn is obtained by sampling across the random process Yt . The random process Xt and Yt are independent if the random vectors X and Y are independent random vectors. .i:e:/ FXY ./ D FX ./FY ./
3.13 Uncorrelated Random Process Let Xt and Yt be two random processes. Let X D ŒXt1 Xt 2 Xt 3 Xt 4 : : : Xtn be the random vector obtained by sampling across the random process Xt at time instants t1, t2,: : : tn. Similarly the random vector Y D ŒYt1 Yt 2 Yt 3 Yt 4 : : : Ytn is obtained by sampling across the random process Yt . The random process Xt and Yt are uncorrelated if the cross-covariance matrix computed as E ..X mX /T .Y mY // is the diagonal matrix.
3.14 Random Process as the Input and Output of the System Consider the Linear time invariant system described by its impulse response h(t) (Fig. 3.4). Let Xt be the W.S.S. random process which is given as the input to the systemR h(t) and Yt be the corresponding output random process which is also W.S.S. 1 Then 1 h./Xt d converges to the output random process in mean square sense. (i.e.)
3.14 Random Process as the Input and Output of the System
Linear Time invariant System
Xt
133
Yt
Fig. 3.4 Random process as the input and output of the LTI system
Z Yt m:s:s D
1
h./Xt d Z 1 h./Xt d 2 / D 0 ) E.ŒYt 1
1
Properties 1. Mean of the output random process E.Yt / is constant Proof. Z
1
E.Yt / D E h./Xt d 1 Z 1 D h./E .Xt / d 1 Z 1 h./mX d D 1 Z 1 h./ d D mX 1
D constant: 2. RYX .t; s/ D E.Yt Xs D RYX ./ D h./ RX ./ Proof. RYX .t; s/ D E.Yt Xs D RYX ./ Z 1 D E h./Xt d Xs 1 Z 1 D h./E.Xt Xs / d Z1 1 h./RX .t s/ d D Z1 1 h./RX .t s / d D Z1 1 D h./RX . / d 1
134
3 Random Process
D t s.say/ ) RYX ./ D h./ RX ./ 3. RY ./ D RX ./ h./ h./ RY .t; s/ D E.Yt Ys D RY ./ Z 1 D E Yt h./Xs d 1 Z 1 h./E.Yt Xs / d D 1 Z 1 D h./RYX .t s C / d Z1 1 h./RYX . C / d D 1
D t s.say/ Let • D Z
1
) RY ./ D h.•/RYX .• C / d 1 Z 1 h0 .•/RYX .ı C / d D 1
Let h0 .•/ D h.•/.say/ ) RY ./ D h0 ./ RYX ./ ) RY ./ D h./ RYX ./ We know, RYX ./ D h./ RX ./ ) RY ./ D h./ h./ RX ./ Note that the above mentioned properties are true for the complex random process also.
3.15 Power Spectral Density (PSD) The power spectral density of the W.S.S. random process Xt is defined as the Fourier transformation of its autocorrelation function RX ./. (i.e.)
3.15 Power Spectral Density (PSD)
135
Z SX .f / D
1 1
RX ./ e j 2˘f d
Consider the Linear time invariant system described by its impulse response h(t). Let Xt be the W.S.S. random process which is given as the input to the system h(t) and Yt be the corresponding output random process which is also W.S.S. We have shown that RY ./ D h./ h./ RX ./ Taking Fourier transformation on both sides ) SY .f / D H.f /H.f /SX .f / ) SY .f / D H.f /H .f /SX .f /.Assuming h./ is the real function/ ) SY .f / D jH.f /j2 SX .f / We have also shown that RYX ./ D h./ RX ./ Taking Fourier transformation on both sides ) SYX .f / D H.f /SX .f / The power spectral density SYX .f / is called as Cross power spectral density. Properties of power spectral density R1 1. SX .0/ D 1 RX ./ d R1
2. RX .0/ D 1 SX .f / d D E Xt 2 Note that mean square value is obtained from all frequencies of the spectral density 3. SX .f / is always real for all values of ‘f’ and 0 Z SX .f / D
1
1 Z 0
D
RX ./ e j 2˘ d Z
j 2˘f
d C 1 Z Z 0 RX ./ e j 2˘ d C D RX ./ e
1
1
0 1
RX ./ e j 2˘f d
RX ./ e j 2˘ d
0
Let u D Z
1
D 0
Z
1
RX .u/ e j 2˘f u d u C 0
RX ./ e j 2˘f d
136
3 Random Process
Z
Z
1
RX .u/ e j 2˘f u d u C Z0 1 Z D RX .u/ e j 2˘f u d u C
D
0
1
0 1
RX ./ e j 2˘f d
RX ./ e j 2˘f d
0
.u/] [Using the property of complex auto correlation .i:e/ RX .u/ D RX
Z
1
D 0
RX .u/
Z
1
RX ./ e j 2˘f d Z 1 Z 1 ŒRX ./e j 2˘f d RX ./ e j 2˘f d e
j 2˘f u
du C
0
0
0
which is the real value and hence power spectral density SX .f / is always positive. (i.e.) SX .f / 0 4. If the W.S.S. random process Xt is real, then SX .f / D SX .f / 5. Consider the W.S.S. random process Xt and the corresponding autocorrelation function and spectral density function are given as RX ./ and SX .f / respectively and SX .f / D 0 for jfj > W, then the random process Xt is said to be band limited with Bandwidth ‘W’ 6. Consider the system which is Band limited with bandwidth ‘W’ (Fig. 3.5). Consider the band limited W.S.S. random process Xt which is given as the input to the system. The output of the system is the random process Yt , then E.ŒXt Yt /2 / D 0 Suppose Wt and Vt are the responses of the system H1 .f / and H2 .f / to the band limited process ‘Xt ’ (Fig. 3.6) If H1 .f / D H2 .f / for all jf j W , then Wt and Vt are equal in Mean Square Sense. (i.e.) EŒ.Wt Yt /2 D 0
|H(f)|
Xt
−w
Fig. 3.5 Transfer function of the band limited system
w
f
Yt
3.16 White Random Process (Noise)
137
H1(f) Wt +
Xt
− Vt H2(f)
Fig. 3.6 Response of the system to the band limited process
Proof. Let Zt D .Wt Yt / ) E .Wt Yt /2 D E Zt 2 We know that, SZ .f / D SX .f /jH.f /j2 where H.f / D H1 .f / H2 .f / Also E Zt 2 D Rz .0/ Z D D
1
Z1 1 1
SZ .f / df SX .f /jH.f /j2 df
Because H .f/ D 0 for all jf j W and SX .f / D 0 for all jf j > W Thus EŒ.Wt Yt /2 D 0
3.16 White Random Process (Noise) The random process Xt is said to be White Gaussian Random process, if Mean D mX .t/ D E.Xt / D 0 for all time instant ‘t’ RX .t; s/ D RX .£/•.£/ SX .f / D constant:
No 2
138
3 Random Process
White noise has zero mean and infinite variance, which cannot be realized in practical situation. The white noise obtained in real time can be viewed as the filtered white noises through the Band limited filter whose frequency response flat over the bandwidth of interest.
3.17 Gaussian Random Process The Random vector Xt is said to be Gaussian Random Process if the random vector obtained by sampling across the process Xt at time instants t1, t2, t3, : : : tn (Represented as ŒXt1 Xt 2 : : : Xtn ) is a jointly Gaussian random vector. Properties 1. If the input to the Linear, stable system is Gaussian random process, then output is also Gaussian random process. Note that the system can also be Time variant 2. If Xt is Gaussian and W.S.S. it is also Strictly Stationary 3. The random process Xt is said to be White Gaussian Random process, if Xt W.S.S. Gaussian Random process Mean D mX .t/ D E.Xt / D 0 for all time instant ‘t’ RX .t; s/ D RX ./ı./ N2o SX .f / D constant Note that there can be White Non-Gaussian Random process. Example 3.5. Gaussian Random process 1. Wiener process [Non-stationary random process] Mean D mX .t/ D E.Xt / D 0 for all time instant ‘t’ RX .t; s/ D 2 min.t; s/ C m2 ts for all t, s 0 2. Gauss-Markov process[Stationary random process] mX .t/ D 0 RX .t; s/ D 2 e ˇ jt sj ; 2 ; ˇ > 0 for all t, s 0 Let t1,t2,t3 be the three samples instance of the Gauss Markov random process with t3 > t2 > t1, then fXt3=Xt2 Xt1 D fXt3=Xt2 In general conditional density of the random variable obtained at particular time instant tn given the random variables obtained at set of time instants t1,t2,t3,..tn 1 with t1 < t2 < : : : tn 1, is equal to the conditional density function of the random variable obtained at time instant tn given the random variable obtained at time instant tn 1. (i.e.) Let t1,t2,t3,..tn be the three samples instance of the Gauss Markov random process with tn > tn 1 > tn 2 : : : t2 > t1, then fXtn=Xtn1 Xtn2 ::: Xt1 D fXtn=Xtn1
3.19 Wide Sense Cyclo Stationary Random Process
139
3.18 Cyclo Stationary Random Process The random process Xt is said to be strictly stationary random process FXt1;Xt2;Xt3;Xt4:::;Xtn (˛1; ˛2; ˛3; : : : ˛n) D FXt1C;Xt2C;Xt3C;Xt4C:::CXtnC (˛1; ˛2; ˛3; : : : ˛n) for all £; ’1; ’2; ’3; : : : ’n If D n T, where n D : : : 2; 1; 0; 1; 2 : : : and T is constant, the random process is said to be cyclo stationary random process with period ‘T’. P Example 3.6. Xt D 1 nD1 An p.t nT/ is the cyclo stationary random process, where An is a discrete time strictly stationary process. p(t) is the function of ‘t’.
3.19 Wide Sense Cyclo Stationary Random Process The random process Xt is W.S. Cyclo stationary random process if mX .t/ D 0 RX .t; s/ D RX .t C nT; s C nT / Example 3.7. Consider the Discrete wide sense stationary random process An that takes the values C1 and 1 with equal probability at all time instants (Stream of Binary data). Let the pulse used to modulate the above mentioned binary stream is p(t) P1having nonzero values for 0 t T . The random process defined as Xt D nD1 An p.t nT/ is cyclo stationary. Let this random process be the input to the channel input and the random process Yt is the random process of the output of the channel (i.e.) P in the receiver section. The random process Yt is represented as follows. Yt D 1 nD1 An p.t nT /, where be the time delay of the pulse p (t) which can be viewed as the random variable which is uniformly distributed between 0 to T. The random process Yt is the Wide Sense Stationary random process. Proof. Xt D
X1
An p.t nT / X1 mX .t/ D E.Xt / D E.An /p.t nT / nD1 X1 D kp.t nT / which is periodic with time nD1
nD1
period ‘T0
) mX .t C T / D mX .t/ X1 Xt D An p.t nT / nD1
140
3 Random Process
RX .t; s/ D E.Xt Xs / D D
X1
X1
nD1 1 1 X X
kD1
E.An Ak /p.t kT /p.s nT /
RA .n k/p.t kT / p.s nT /
nD1 kD1
Changing the limit m D n k D
1 X
1 X
RA .n k/ p .t kT / p .s nT /
nD1 kD1
D
1 X
1 X
RA .m/ p.t nT C mT / p .s nT /
nD1 mD1
) RX .t C T; s C T / D
D
1 X
X1
X1
nD1
mD1
RA .m/ p .t C T nT C mT /
p .s C T nT / 1 X RA .m/ p .t nT C mT / p .s nT / D RX .t; s/
nD1 mD1
P Hence Xt D 1 nT/ is cyclo stationary process. nD1 An p.tP The random process Yt D 1 nD1 An p.t nT / [ is uniformly distributed between 0 to T] is wide sense stationary random process Proof. Computation of mean Yt D
X1 nD1
An p.t nT / i
E.Yt / D E ŒE.Yt = D a/ D E ŒE.Yt = D a/ ŒE.Yt = D a/ X1 E.An /p.t nT a/ D nD1 X1 D kp.t nT a/ nD1 X1 D mX .t a/ŒBecause mX .t/ D
nD1
kp.t nT /
which is periodic with time period ‘T’ ) E ŒE.Yt = D a/ D E ŒmX .t a/ Z T 1 mX .t a/ da D T 0 Note that mX .t/ is periodic with time period ‘T’ .mX .t a/ is the shifted version
RT of mX .t/ and hence T1 0 mX .t a/da is constant.
3.19 Wide Sense Cyclo Stationary Random Process
141
E.Yt C Yt / D E ŒE.Yt C Yt = D a/ X1 X1 An p.t C mT a/ E.Yt C Yt = D a/ D nD1
mD1
Am p.t nT a/ X1 X1 D E.An Am /p.t C mT a/ nD1
D
X1
mD1
p.t nT a/ X1 RA .n m/p.t C mT a/
nD1
mD1
p.t nT a/ Let k D n m D
1 X
1 X
RA .k/p.t C nT C kT a/
nD1 kD1
D
1 X
p.t nT a/ 1 X RA .k/p.t C nT C kT a/
nD1 kD1
p.t nT a/ ) E ŒE.Yt C Yt = D a/ Z T X X1 1 1 RA .k/p.t C nT D nD1 kD1 T 0 CkT a/p.t nT a/da X 1 1 Z T X 1 D RA .k/ p.t C nT T nD1 0 KD1
CkT a/p.t nT a/da Let u D t nT a X 1 1 Z t nT X 1 D RA .k/ p.u C C kT /p.u/d u T nD1 t nT T KD1
Consider
R t nT nD1 t nTT
P1
p.u C C kT/p.u/du
Let f.u/ D p.u C C kT/p.u/ Z Z t C3T f .u/d u C D C C
t C2T Z t T t 2T
f .u/d u C
t C4T
t C3T Z t 2T t T
Z
t
f .u/d u C : : :
f .u/d u t T
f .u/d u C
142
3 Random Process
Z D D D
1
f .u/d u Z1 1 Z1 1
p.u C C kT /p.u/ d u p.u C C kT /p.u/ d u
1
D RP . C kT/ which is the autocorrelation of the deterministic signal pulse p(t). X 1 1 Z T X 1 RA .k/ p.t C nT C kT a/p.t nT a/da T nD1 0 KD1 X 1 1 ) RY ./ D RA .k/RP . C kT / KD1 T
)
P Which is the function of t s D and hence Yt D 1 nD1 An p.t nT / [‘’ is uniformly distributed between 0 to T.] is Wide Sense Stationary process. Taking Fourier transform on both sides of the equation RY ./ D we get SY .f / D
X 1 1 RA .k/RP . C kT / KD1 T
X 1 1 RA .k/e j 2˘f kT jP .f /j2 T KD1
Where P(f) is the Fourier transformation of the autocorrelation function p(t). Also note that RP .t/ D p.t/ p.t/ and hence Fourier transform of RP ./ is P .f /j2
3.20 Sampling and Reconstruction of Random Process Consider the Band limited continuous random process Xt with
bandwidth ‘W’ (i.e.) SX .f / D 08jfj > W, which is sampled with sampling rate T1 2W to obtain the discrete random process XnT Then the sequence of random process Xt .N / converges to random process Xt in mean square sense as N> 1, where Xt
.N /
D
XnDN nDN
XnT
D 0 as N-> 1:
ˇ ˇ2 t ˇ .N / ˇ n :.i:e:/ E ˇXt Xt ˇ sinc T
3.20 Sampling and Reconstruction of Random Process
143
Proof. The requirement is to show that E jXt Xt .N / j2 D 0 as N > 1. ˇ ˇ2 ! XnDN ˇ ˇ t ˇX t n ˇˇ XnT sinc ˇ nDN T XN
t 2 n E Xt XnT sinc D E jXt j nDN T XN
t n E Xnt Xt sinc nDN T XN XN
t C n E XnT XmT sinc nDN mDN T t sinc m T
ˇ ˇ2 ˇ ˇ E ˇXt Xt .N / ˇ D E
Note Note that if Xt is bandlimited then the corresponding autocorrelation function (i.e.) RX .t/ bandlimited and RX .t/ can be viewed as the band limited signal and hence using sampling theorem, RX .t/ can be reconstructed using the following formula X1 t n RX .t/ D RX .nT //sinc nD1 T The shifted version of the signal (i.e.) RX .t / can be reconstructed using the formula as shown below. X1 t n RX .t / D RX .nT //sinc nD1 T At t D RX .0/ D
X1 nD1
RX .nT t//sinc
t n .1/ T
Similarly we can show that RX .0/ D
X1 nD1
RX .t nT //sinc
t n .2/ T
Consider the four terms in the expanded form of the equation E jXt Xt .N / j2 (as shown above). Applying the limit N ! 1 individually on the four terms we get the following. First term E.jXt j2 / D RX .0/ Second term W
XN
E.Xt XnT /sinc nDN
t T
n
144
3 Random Process N X
lim
N !1
nDN
t n D RX .0/ RX .t nT //sinc T
(From Eq. (2)) t Third term W n T N X t lim n D RX .0/ RX .nT t//sinc N !1 T XN
E.XnT Xt /sinc nDN
nDN
(From Eq. (1)) Fourth term XN
XN
nDN
D
t t n sinc m T T t t n sinc m RX .nT mT /sinc T T
E.XnT XmT /sinc mDN
XN
XN
nDN
mDN
Consider the term lim
N X
N !1
RX .nT mT /sinc
mDN
t n D RX .t mT / T
(From the Reconstruction formula of sampling theorem) lim
N !1
N X
RX .t mT /sinc
mDN
t m D RX .0/ T
(From Eq. (2)) Thus the ˇ ˇ2 ˇ ˇ lim E ˇXt Xt.N / ˇ D E
N !1
ˇ ˇ2 ! XnDN ˇ ˇ t ˇX t n ˇˇ XnT sinc ˇ nDN T
RX .0/ RX .0/ RX .0/ C RX .0/ D 0 Hence proved
3.21 Band Pass Random Process The Random process is said to be Band pass Random process if its spectral density have band pass frequency response.
3.21 Band Pass Random Process
145
Example 3.8. The Random process Xt D Xt I cos.2˘fc t/ Xt Q sin.2˘fc t/ forms the Band pass W.S.S. Random process if the following conditions are satisfied.
Xt I and Xt Q are the Low pass random process They are Jointly W.S.S. Process RX I ./ D RX Q./ RX I X Q ./ D RX Q X I ./
Proof. To make the random process Xt as the Wide sense stationary process, RX .t; u/ should be the function of D t u RX .t C ; / D E.Xt C X / Let A D Xt C I cos.2˘fc .t C // Xt C Q sin.2˘fc .t C // B D Xt I cos.2˘fc t/ Xt Q sin.2˘fc t/ DE(AB) consists of four terms. I term: E.Xt C I cos.2˘fc .t C //Xt I cos.2˘fc t// I I1 Œcos.2˘fc .2t C // C cos.2˘fc / D E Xt C Xt 2 1 D RX I ./ Œcos .2˘fc .2t C // C cos.2˘fc / 2
Second term: E Xt C I cos.2˘fc .t C //Xt Q sin.2˘fc t/ I Q1 Œsin.2˘fc .2t C // sin.2˘fc / D E Xt C Xt 2 1 D RX I X Q ./ Œsin.2˘fc .2t C // sin.2˘fc / 2
Third term: E Xt C Q sin.2˘fc .t C //Xt I cos.2˘fc t/ 1 D E Xt C Q Xt I Œsin.2˘fc .2t C // C sin.2˘fc / 2 1 D RX I X Q ./ Œsin.2˘fc .2t C // C sin.2˘fc / 2
Fourth term: E Xt C Q sin.2˘fc .t C //Xt Q sin.2˘fc t/ 1 D E Xt C Q Xt Q Œ cos.2˘fc .2t C // C cos.2˘fc / 2 1 D RX Q ./ Œ cos.2˘fc .2t C // C cos.2˘fc / 2 RX I X Q ./ RX Q X I ./ RX I ./ C RX Q ./ D cos.2˘fc / C sin.2˘fc / 2 2
146
3 Random Process
RX I ./ C RX Q ./ cos.2˘fc .2t C // 2 RX I X Q ./ RX Q X I ./ sin.2˘fc .2t C // C 2
C
To make the above expression independent of ‘t’ First and second term is already independent of ‘t’. To make the third and fourth terms independent of ‘t’, the following conditions have to be satisfied. RX I ./ D RX Q ./ RX I X Q ./ D RX Q X I ./ Hence proved.
3.22 Random Process as the Input to the Hilbert Transformation as the System Let W.S.S random process Xt be the input to the system (Hilbert transform) whose frequency response is as shown below (Figs. 3.7 and 3.8). ct . The Hilbert transform of the random process Xt is represented as X Properties 1. SXO .f / D SX .f / SXO .f / D SX .f /jH.f /j2
H(f) j
t −j
Fig. 3.7 Transfer function of the Hilbert transformation
^ Xt
Xt h(t)
Fig. 3.8 Hilbert transformation system
3.22 Random Process as the Input to the Hilbert Transformation as the System
jH.f /j2 D 1 for all f .see Graph/ Hence SXO .f / D SX .f / 2. SX XO .f / D SXX O .f / We know, RXX O ./ D RX ./ h./ ) RXX O ./ D RX ./ h./ The function RX ./ is the even function. ) RX ./ D RX ./ The impulse response of the Hilbert transformation system is given as 1 for all t h.t/ D ˘t ) RXX O ./ D RX ./ h./ ) RXX O ./
1
1
Also we know RXX O ./ D E.Xt C Xt / D E.Xt Xt C / D RX XO ./ ) RXX O ./ D RX
./ XO
From the above ) RX
./ XO
D RXX O ./
Taking Fourier transformation on both sides, we get SX XO .f / D SXX O .f / ct / D 0 3. E.Xt X Proof. From property 2 we get, RX XO ./ D RXO X ./ ) RX XO .0/ D RXO X .0/ .1/ Also we know ct / D E.X ct Xt / E.Xt X ) RX XO .0/ D RXO X .0/ .2/ From (1) and (2) we conclude RX XO .0/ D 0 ct / D 0 ) E.Xt X
147
148
3 Random Process
ct , then 4. Let Zt D Xt C j X
1
b
E.Zt C Zt / D E..Xt C C j Xt C /.Xt C j Xt C / ct / C jE.Xt C Xt /E.Xt C X ct / D E.Xt C Xt / j E.Xt C X
1
1
) RZ ./ D RX ./ jRX XO ./ C jRXX O ./ C RXO ./ From the property 1 and property 2, RX ./ D RXO ./ RX XO ./ D RXX O ./ D RX ./ C 2jRXX O ./ ) RZ ./ D 2RX ./ C 2jRX ./ h./ In frequency domain (By taking Fourier Transform), we get, SZ .f / D 2SX .f / C 2jSX .f /H.f / ) SZ .f / D 2SX .f /.1 C jH.f// .See Figure 3-7/ ) SZ .f / D 4SX .f / f or f > 0 D 0 f or f < 0
3.23 Two Jointly W.S.S Low Pass Random Process Obtained Using W.S.S. Band Pass Random Process and Its Hilbert Transformation Let Xt be the W.S.S. Band pass Random process ct Define Xt C , Xt C j X From the property 4 of Hilbert transformation, we get SX C .f / D 4SX .f / for f > 0 D 0 for f < 0 Define XQt D Xt C e j 2…fc t
E.XQ t C£ XQt / D RX .£/ D E Xt C£ C e j 2…fc .t C£/ Xt C e j 2…fc .t /
D E Xt C Xt C e j 2˘fc .£/ ) RX ./ D RX C ./e j 2˘fc ./ ) SX .f / D SX C .f C fc /
3.23 Two Jointly W.S.S Low Pass Random Process
149
Let Xt D Xt I C j Xt Q
ct e j 2˘fc .t / ) Xt I D Real Xt C e j 2˘fc .t / D Real Xt C j X ct si n .2 ˘ fc t/ D Xt cos .2 ˘ fc t/ C X Similarly Xt Q D Imaginary.Xt C e j 2˘fc t / ct cos.2 ˘ fc t/ Xt si n.2 ˘ fc t/ DX In the same fashion it can be shown that Xt D Xt I cos.2 ˘ fc t/ Xt Q sin.2 ˘ fc t/ (Which is of the same form as mentioned in Example 2.11) The random process Xt I and Xt Q satisfies the following conditions. RX I ./ D RX Q ./ RX I X Q ./ D RX Q X I ./ SX I .f / D SX Q .f / D 14 ŒSX .f / C SX .f /
(Low pass frequency
response) j SX I X Q .f / D 4 ŒSX .f / SX .f / (Low pass frequency response) Thus jointly wide sense stationary low pass random process Xt I and Xt Q are genct as erated using the Bandpass random process Xt and its Hilbert transformation X mentioned below ct sin.2 … fc t/ Xt I D Xt cos.2 … fc t/ C X ct cos.2 ˘ fc t/ Xt sin.2 ˘ fc t/ Xt Q D X
Proof. RX I .£/ D E Xt C£ I Xt I We know Xt D Xt I C j Xt Q .Xt C .Xt / / ) Xt I D 2
.Xt C£ C .Xt C£ / / .Xt C .Xt / / 2 2 1 1 1 D E.Xt C£ Xt / C E.Xt C£ .Xt / / C E..Xt C£ / Xt / 4 4 4 1 C E..Xt C£ / .Xt / / 4
) E.Xt C I .Xt I / / D E
I term: 14 E .Xt C£ Xt / D 0
150
3 Random Process
1
E Xt C£ C e j 2pi fc .t C£/ Xt C e j 2pi fc £ 4
1
ct e j 2pi fc £ D E Xt C£ C j Xt C£ e j 2pi fc .tC£/ Xt C j X 4
1
ct D e j 2pi fc .2tC£/ E .Xt C£ C j Xt C£ / Xt C j X 4
1 ct D e j 2pi fc .2tC£/ .E.Xt C£ Xt / C jE Xt C£ Xt C jE Xt C£ X 4 ct / E.Xt C£ X
D
1
1
1
1
D e j 2pi fc .2tC£/
1 RX .£/ C jRX XO .£/ C jRXX O .£/ RXO .£/ 4
From the property of Hilbert transformation mentioned in 3.22, we get the following RX .£/ D RXO .£/ RX XO .£/ D RXO X .£/ Hence E.Xt C£ Xt / D 0 II term: 14 E.Xt C£ .Xt / / D 14 RX .£/ III term: 14 E..Xt C£ / Xt / D 14 E.Xt .Xt C£ / / D 14 RX .£/ IV term: 14 E..Xt C£ / .Xt / / D 0 This can be obtained in the same fashion as that of the I term. Thus 1 RX I .£/ D ŒRX .£/ C RX .£/ 4 Taking Fourier transformation on both sides we get SX I .f/ D
1 ŒSX .f/ C SX .f / 4
In the same fashion we can show 1 ŒSX .f/ C SX .f / 4 j ŒSX .f / SX .f / and SX I X Q .f / D 4 SX Q .f/ D
Example 3.9. Consider the Band pass random process Xt whose spectral density is as shown below (Figs. 3.9 and 3.10).
RX .£/ D E.Xt C£ .Xt / / D E Xt C£ C .Xt C / e j 2pi fc .tC£/ e j 2pi fc .t / D RX C .£/e j 2pi fc .£/ ) SX .f/ D SX C .f C fc /
3.23 Two Jointly W.S.S Low Pass Random Process
151
Sx (f )
1
−40
−10
10
40
f
MHz
Fig. 3.9 Spectral density of the band pass random process
Fig. 3.10 Spectral density of the XtC
SX + (f )
4
10
Fig. 3.11 Spectral density of the XtC
40
fMHZ
SX ~ (f) 4
−5
35
fMHz
We know that SX C .f/ D 4SX .f / for f > 0 D 0 for f < 0 Let fc be chosen as 15 MHz so that the spectral density SX .f/ looks like the following (Fig. 3.11). Q Thus the spectral density of the XtI and the Xt are as shown below (Figs. 3.12 and 3.13). Note that SX I .f/; SX Q .f/ and j SX I X Q .f / are having low pass characteristics.
152
3 Random Process
2
Sx |(f)
1
−35 −30
−5
5
30 35
fMHZ
Q
Fig. 3.12 Spectral density of the XtI which is same as that of the spectral density Xt
-j SX | X Q(f)
−35 −30
−5 5
Fig. 3.13 Spectral density jSX I X Q .f /
30 35
fMHZ
Chapter 4
Linear Algebra
4.1 Vector Space 1. The set of vectors forms the vector space ‘V’ over the Field ‘F’, if the vectors belonging to that set satisfy the following properties (a) If v1, v2 are the elements of the vector space ‘V’, then the vector defined by v1 C v2 must be the element of the vector space ‘V’. (b) For some scalars f‘’’ ‘“’g 2 F. .’ C “/v D ’v C “v: .’ “/v D ’.“v/: (c) There exists the identity scalar represented as ‘1’ such that 1:v D v:1 D v. (d) There exists the additive identity vector represented as ‘0’ such that for any vector ‘v’ 2 vector space, v C 0 D 0 C v D v. (e) There exists the additive inverse vector for every vector ‘v’ 2 vector space which is represented as ‘v’ such that vC.v/ D 0, where 0 is the additive identity vector which is the element of the vector space. (f) .v1 C v2/’ D ’ v1 C ’ v2. (g) v1 C .v2 C v3/ D .v1 C v2/ C v3. (h) ’ v 2 V where ’ 2 Fv 2 V. 2. ‘W’ is the subspace of the vector space ‘V’ .W V/ if the elements of the set ‘W’ is the subset of the vector space ‘V’ and satisfies all the properties mentioned in the fact 1. 3. If W1 V and W2 V, then W1 \ W2 V. But W1 [ W2 V only when W1 W2 or W2 W1. 4. The set of vectors fv1, v2, v3, v4: : : vng are said to be linearly independent if ’1 v1 C ’2 v2 C ’3 v3 C ’4 v4 C : : : ’n vn D 0 for some scalars f’1; ’2; ’3 : : : ’ng, then ’1 D ’2 D ’3 D : : : ’n D 0. 5. If the set of vectors fv1,v2,v3,: : :vng are linearly dependent if any one of the vector in the set is represented as the linear combinations of other vectors. 6. The set of vectors fv1,v2,v3,: : :vng forms the Generating set of the vector space ‘V’ if any vector in the vector space ‘V’ can be represented as the linear E.S. Gopi, Mathematical Summary for Digital Signal Processing Applications with Matlab, DOI 10.1007/978-90-481-3747-3 4, c Springer Science+Business Media B.V. 2010
153
154
7. 8. 9. 10. 11. 12. 13.
14.
4 Linear Algebra
combinations of the vectors in the Generating set. Also note that the Generating set is the subset of the Vector space ‘V’. Generating set which are linearly independent is called the basis of the vector space ‘V’. If fv1,v2,v3,: : :vng be the basis of the vector space ‘V’, then the set of .n C 1/ vectors in the vector space is always dependent. The set of minimum number of the vectors which forms the Generating set of the vector space ‘V’ is called as Minimal Generating Set. The maximum number of independent vectors collected from the vector space ‘’V is called maximal independent set. If fu1,u2,u3,: : :ung is the minimal generating set, then the maximal independent set is fu1,u2,u3,: : :ung. If fu1,u2,: : :ung is the maximal linear independent set then fu1,u2,u3,: : :ung is the basis of the vector space ‘V’. The number of elements of the set is called as the cardinal number of the set. The cardinal number of the Basis set is called dimension (dim) of the vector space ‘V’. If W is the subspace of V.W V/ then (a) the Basis of W Basis of V. (b) Any basis of ‘W’ is extended to the basis of ‘V’. (c) dim.W/ dim.V/.
15. If W1 V and W2 V then (a) dim.W1 [ W2/ D dim.W1/ C dim.W2/ dim.W1 \ W2/. (b) Let the basis for the subspace W1 \ W2 is fu1,u2,u3,: : :ukg. The basis for the subspace W1 is obtained by extending the basis of the subspace W1 \ W2 as fu1,u2,: : :uk,w1,w2,w3,: : :wmg. Similarly the basis for the subspace W2 is obtained by extending the basis of the subspace W1 \ W2 as fu1,u2,u3,: : :uk,v1,v2,v3,: : :vng. Note that the dimension of the vector space W1 is k C m and the dimension of the vector space W2 is k C n. (c) The basis of the vector space ‘V’ is obtained as fu1,u2,: : :uk,w1,w2, w3, : : :wm,v1,v2,v3,: : :vmg.
4.2 Linear Transformation 1. Let ‘U’ and ‘V’ be the vector spaces over the field F.A Map T: V ! V is called linear transformation if (a) T.u C v/ D T.u/ C T.v/ (b) T.’ U/ D ’ T.u/ where u 2 U and v 2 V The Linear map is graphically represented as follows (Fig. 4.1).
4.2 Linear Transformation
155
Fig. 4.1 Linear transformation
T :U
V
u1
v1
u2
v2
u3
v3
u4
v4
un
vm
Fig. 4.2 Injective transformation u1
v1
u2
v2
un
vn vn+1 vn+2 vn+3
vm
2. The linear transformations are broadly classified as (a) ONE-ONE (Injective) transformation (b) ONTO (Surjective) transformation (c) Isomorphic (Bijective) transformation. (a) Injective transformation (Fig. 4.2) A Linear map T: U ! V is said to be injective if T.u1/ D T.u2/ ) u1 D u2 In this case ker.U/ D f0g. Also note that dim.U/ . Note that VXV is the vector space with first element 1. < v; v >¤ 0 where v 2 V If < v; v >D 0 then v D 0 2. < v; w C u >D< v; w > C < v; u > where u; v 2 V 3. < cv; w > c < v; w > where c is the scalar constant. 4. < v; w >D < w; v >; where < w; v >, is called as conjugate )< v; cw >D cN < w; v > The vector space V with the with the defined inner product forms the Inner product space. Examples for the inner product space 1. Consider the vector space R2 . Consider two arbitrary vectors v1 D .a1; a2/ 2 R2 , and v2 D .b1; b2/ 2 R2 , then the inner product defined in the vector space V as < v1; v2 >D a1b1 C a2b2. Thus the vector space R2 with the above defined inner product forms the inner product space. 2. Consider the vector space Rnxn . Consider the arbitrary vector B 2 Rnxn , the inner product is defined as < A; B > trace.AB /; B , is the conjugate of the matrix B. Thus the vector space Rnxn for the above defined inner product forms the inner product space. 3. Consider the vector space consists of the set of all complex valued functions defined for the interval (0,1] with the inner product defined as follows: Z
1
f .t/g.t/dt forms the inner product space:
< f; g >D 0
4.12 Orthogonal Basis
177
Formation of Inner product space for the vector space V Consider the n-dimensional vectors space V. Let the basis of the vector basis be represented as fv1, v2, v3: : :vng. There exists the transformation A: V ! V such that Av1 D e1; Av2 D e2; : : : Avn D en, where ‘ei’ is the vector with n elements completely filled up with zeros except ith element which is filled up with 1:e1; e2; e3; : : : en are called as standard basis of the vector space Rn . The Inner product of the vectors u, v 2 V represented as < v; u > is defined as follows. Let v D ’1 v1 C ’2 v2 C ’3 v3 C : : : ’n vn u D “1 v1 C “2 v2 C “3 v3 C : : : “n vn < v; u >D ’1“1 C ’2“2 C ’3“3 C ’4“4 : : : C ’n“n Norm of the vector v is denoted as follows kvk D
p < v; v >
Properties of the norm 1. 2. 3. 4. 5.
kc; uk D jcjkuk kuk > 0 Cauchy-Schwarz inequality j < u; v > j kuk kvk ku C vk kuk C kvk j < v; u > j Re.< v; u >/
4.12 Orthogonal Basis 1. Consider the inner product vector space V. Let u, v 2 V. The vector ‘u’ is orthogonal to the vector ‘v’ if the inner product < u; v >D 0. 2. The set of vectors fv1; v2; : : : vkg are orthogonal set if < vi; vj >D 0, where i ¤ j and i; j D 1; 2; ::k. 3. The set of orthogonal vectors are always linearly independent. 4. If the set of Orthogonal vectors fv1; v2; : : : vkg with kvik D 1; i D 1::k are called as orthonormal vectors. 5. If the set of orthogonal vectors are fv1; v2; : : : vkg, then the set of orthonormal v1 v2 vk ; kv2k ; : : : kvkk g. vectors are obtained as f kv1k 6. If the set of orthogonal vectors forms the basis of the vector space V, they are called orthogonal basis. 7. If fv1; v2; v3; : : : vng forms the basis of the vector space V, then there exists the set of vectors fu1; u2; u3; : : : ung which is the orthonormal set which forms the basis of the vector space ‘V’, which are obtained using Gram-Schmidt orthogonalization procedure as given below. v1’ D
v1 D u1 kv1k
178
4 Linear Algebra
v2’ D v2 < v2; u1 > u1 v20 u2 D kv20 k v3’ D v3 < v3; u1 > u1 < v3; u2 > u2 v30 u3 D kv30 k and so on. 8. Set of all vectors in the vector space V which are orthogonal to the set of vectors of the subspace S V is the vector space represented as S? . 9. If the dimension of the vector space S is k, then the dimension of the vector space S? is n k, where n is the dimension of the vector space ‘V’ (Fig. 4.16). 10. If the basis of the space S is fw1 ; w2 ; : : : wk1 ; wk g and the basis of the vector space S? is fwkC1 ; wkC2 ; : : : wn1 ; wn g, then the basis of the vector space V is given as fw1 ; w2 ; : : : wk1 ; wk ; wkC1 ; wkC2 ; : : : wn1 ; wn g. .i:e:/ V D S ˚ S? 11. The basis are orthogonal to each other between the space S and S? . 12. Any vector v 2 V can be uniquely written as v1 C v2, where v1 2 S and v2 2 S? 13. Any vector v 2 V can be written as the linear combinations of the 14. Orthonormal basis vectors fv1,v2,v3,: : : vng as ˛1 v1 C ˛2 v2 C ˛3 v3 C : : : ˛n vn, where ˛1 is computed as < v; v1 > : Similarly ˛2 D< v; v2 > ˛3 D< v; v3 > ˛4 D< v; v4 > In general ˛k D< v; vk >
V s⊥
S
Fig. 4.16 Illustration of the vector space S and its orthogonal complement S?
4.13 Riegtz Representation
179
4.13 Riegtz Representation Consider the Linear transformation T: V ! F, where the vector space V with dimension ‘n’. There exists unique vector y 2 V such that T.v/ D< v; y > Technique to obtain the vector y 2 V 1. 2. 3. 4. 5.
Find the kernel(T) D W Find the W? . Find any orthonormal vector u 2 W? Compute T.u/ The vector y D T.u/ u
Example 4.1. Consider the transformation A W Rn ! R A.Rn / D A.Œx1 x2 x3 : : : xn/ D ˛1 x1 C ˛2 x2 C ˛3 x3 C : : : C ˛n xn )< .x1 x2 x3 : : : xn /; .˛1 ˛2 : : : ˛n / > D ˛1 x1 C ˛2 x2 C ˛3 x3 C : : : C ˛n xn Where y D .˛1 ˛2 : : : ˛n / is the unique vector 2 Rn .
Chapter 5
Optimization
5.1 Constrained Optimization Consider the function f .x; y/ D 2x 2 C 4y C 3. The requirement is to find out the optimal values for ‘x’ and ‘y’ such that f(x, y) is minimized. Also it has to satisfy the constraint that g.x; y/ D x C y C 3 D 0. Let the local extremum (maximum or minimum) point be .x0 ; yo /. The curve x C y C 3 D 0 can be viewed as the set of points .˛; 3 ˛/; 8 ˛ 2 R. This is known as parametric representation of the curve.The tangent vector at the d.3˛/ D Œ1 1. Also the grapoint .x0 ; yo / points towards the direction d˛ d˛ ; d˛ " # @.xCyC3/ 1 @x dient vector of the equation g.x; y/ is obtained as @.xCyC3/ D Note that 1 @y the gradient vector and the tangent vector is always orthogonal to each other (see Fig. 5.1). The solution of the above optimization problem lies on the intersection of the curve g.x; y/ and f .x; y/ So consider the points on the curve g.x; y/ represented in the parametric form as g.x.˛/; y.˛// D g1.˛/. Let the optimal point .x0 ; yo / be represented in terms of the variable ‘˛’ as ˛0 . Note that the point lies on the curve f .x; y/. The function f .x; y/ can be represented as the function of ‘˛’ for the points of intersection of the curves g.x; y/ and f .x; y/ as f 1.˛/ Using Taylor series, ˛ 2 d 2 f 1.˛0 / df 1.˛0 / C C ::: d˛ 2Š d˛ 2 ˛ 2 d 2 f 1.˛0 / df 1.˛0 / ) f 1.˛ C ˛0 / f 1.˛0 / D ˛ C C ::: d˛ 2Š d˛ 2 f 1.˛ C ˛0 / D f 1.˛0 / C ˛
f 1.˛0 / corresponds to local extremum and hence f 1.˛ C ˛0 / f 1.˛0 / must be greater than 0 if the point ˛0 is local minima, or it must be lesser than 0 if the point ˛0 is local maxima. Also ˛0 can be either positive or negative value. Thus the 0/ condition that ˛0 belongs to the local extremum is df 1.˛ D 0. Also note that if d˛ E.S. Gopi, Mathematical Summary for Digital Signal Processing Applications with Matlab, DOI 10.1007/978-90-481-3747-3 5, c Springer Science+Business Media B.V. 2010
181
182
5 Optimization
x2 x+y+3=0
x1
[1-1]
Fig. 5.1 Illustration of the property that the gradient vector is orthogonal to the tangent vector x
Fig. 5.2 Illustration of the functional dependencies
f
α
y
the extremum is maxima, then minima, then
d 2 f 1.’0 / d’2
d 2 f 1.˛0 / d˛ 2
must be negative and if the extremum is
is positive.
Thus the condition that the point ˛0 belongs to extremum is
df 1.˛0 / d˛
D 0 (Fig. 5.2)
f 1.’0 / D f .x.’0 /; y.’0 // )
df .x.’0 /; y.’0 // df .x.’/; y.’// df 1.’0 / D D at ˛ D ˛0 d˛ d’ d’
Using the illustration of functional dependencies as shown above, we get the following df .x.’/; y.’// dx df dy df .at ˛ D ˛0 / D C D0 d˛ dx d˛ dy d˛
5.1 Constrained Optimization
183
2
3
dx df df 6 d˛ 7 ) 4 dy 5 .at ˛ D ˛0 / D 0 dx dy d˛
" df # The vector " dx # Also
d˛ dy d˛
dx df dy
is the gradient vector of the curve f .x; y/ at the point ˛ D ˛0 .
at ˛ D ˛0 is the direction of the tangent vector on the point ˛ D ˛0 of
the curve g.x; y/. [This is because of the fact that the set of points ..x.˛/; y.˛// are the set of parametric representation of the points on the curve g.x; y/. Thus from the above statement, the gradient vector of the curve f .x; y/ at the point .x.˛0 /; y.˛0 // is orthogonal to the direction of the tangent vector drawn at the point .x.˛0 /; y.˛0 // on the curve g.x; y/. We have already shown that the gradient vector of the curve g.x; y/ at the point .x.˛0 /; y.˛0 // and the direction of the tangent vector of the curve g.x; y/ at the point .x.˛0 /; y.˛0 // are orthogonal to each other. This implies that the gradient vector of the curve f .x; y/ and the gradient vector of the curve g.x; y/ are parallel to each other and hence for some scalar ‘œ’, which is known as Lagrange multiplier, Gradient vector of the curve f .x; y/ C œ Gradient vector of the curve g.x; y/ D 0. " # Gradient vector of the curve f .x; y/ is represented as rf D
df dx df dy
. Similarly
" dg # the gradient vector of the curve g.x; y/ is represented as rg D
dx dg dy
.
) rf C œrg D 0
4x 4
1 and rg D . There fore from above 1
In our problem rf D 4x 1 Cœ D 0. Solving gives œ D 4 and x D 4=4 D 1. Also we know 4 1 x C y C 3 D 0. ) y D x 3 D 1 3 D 4:
Thus the optimal solution for computing the extremum of the function f .x; y/ D 2x 2 C4y C3 is obtained as .1; 4/ and the corresponding value is 2–16 C3 D 11. But to test whether the obtained extremum is maximum or minimum is decided using the second derivative as shown below. 2 We have already shown that d fd1.˛/ ˛ 2 .at ˛ D ˛0 / > 0, where f 1.˛/ D f .x.˛/; y.˛// if the extremum point ˛ D ˛0 is minima point.
184
5 Optimization
Using the functional dependencies as shown in the Fig. 5.2, computed as shown below.
d 2 f 1.˛/ 2 ˛ d
is
df 1 @f @x @f @y D C D p.x.˛/; y.˛// d˛ @x @˛ @y @˛
d dfd ’1 d.p.x.˛/; y.˛// @p @x @p @y d 2 f 1.˛/ D D C D 2 d’ d’ d’ @x @˛ @y @˛ @f @y @f @y @f @x @f @x C C @ @ @x @y @x @˛ @y @˛ @x @˛ @y @˛ D C @˛ @x @˛ @y 2 @x @f @ x @x @2 f @y @x @2 f @x C C D @˛ @x2 @˛ @˛ @x @x@˛ @˛ @x@y @˛ 2 2 2 @ y @y @ f @y @y @f @ y @x @f C C C @˛ @y @x@˛ @˛ @y2 @˛ @˛ @y @˛@y 2 @y @2 f @x @y @f @ x C C @˛ @y@x @y @˛ @x @y@˛ 2 2 2 2 3 3 @2 f 2 @x 3 @ f @ x 2 7 @x @y 6 @f @f 6 @y@x 7 6 @x 6 @˛ 2 7 7 6 @˛ 7 D 2 2 5 4 @y 5 C 2 5 4 4 @ f @ f @˛ @˛ @x @y @ y @˛ @y@x @y2 @˛ 2 Thus to satisfy the condition 2
@x @˛
@y 6 6 6 @˛ 4
@2 f @x2 @2 f @y@x
d 2 f 1.˛/ .at d’2
˛ D ˛0 / > 0
3 @2 f 2 @x 3 @f @y@x 7 7 6 @˛ 7 C 74 5 @x @2 f 5 @y @˛ 2 @y
2
3 @2 x 27 @f 6 6 @˛ 7 .at ˛ D ˛0 / > 0 4 @y @2 y 5 @˛ 2
i h @x @y Note that @˛ @˛ .at ˛ D ˛0 / is the direction of tangent vector drawn at the point ˛ D ˛0 on the curve g.x; y/ and hence the point lies on the tangent vector drawn on the curve g.x; y/ at the point ˛ D ˛0 . 2 D 0 at.at ˛ D ˛0 /. Also œ d d’g.˛/ 2 Expanding in the similar fashion as described above, it can be shown that œ
@x @˛
@2 g 2 @y 6 6 @x 2 @˛ 4 @ g @y@x
2
32 3 @2 g @x 6 @˛ 7 @y@x 7 7 6 7 C œ @g @2 g 5 4 @y 5 @x @y2
@˛
3 @2 x 7 @g 6 6 @˛ 2 7 .at ˛ D ˛0 / D 0 2 @y 4 @ y 5 @˛ 2
2
5.1 Constrained Optimization
185
Adding both the equation, we get
2 32 3 2 2 2 3 2 3 @2 f @ g @ f @ g @x @x 2 2 6 6 6 7 7 @x @y 6 @x @x @y 6 @x @y@x 7 6 @˛ 7 @y@x 7 6 @˛ 74 7 Cœ 5 @2 f 5 4 @y 5 @2 g 5 @y @˛ @˛ 4 @2 f @˛ @˛ 4 @2 g @˛ @˛ @y@x @y2 @y@x @y2 2 2 3 2 2 3 @ x @ x 6 @˛ 2 7 7 2 @g @g @f @f 6 6 @˛ 7 C œ 6 7 .at ˛ D ˛0 / > 0 C 4 5 2 @x @y @x @y 4 @2 y 5 @ y
2
@˛ 2
@˛ 2
We have already shown that
@f @x
@f @g Cœ @y @x
@g D0 @y
and hence we get the following condition. 2 2 3 2 2 @2 g 2 @x 3 @ g @ f @ f Cœ Cœ @x @y 6 @x 2 @y@x @y@x 7 6 @x 2 7 6 @˛ 7 6 7 4 5 .at ˛ D ˛0 / > 0 2 2 2 @˛ @˛ 4 @2 f @ g @ f @ g 5 @y Cœ C œ @˛ @y@x @y@x @y 2 @y 2 In practice representing the function g.x; y/ form is not easy like i h in parametric @y @x the one used in the example. But the point @˛ @˛ always lies on the tangent of the curve g.x; y/ drawn at the point .x0 ; y0 / and hence to confirm whether the obtained we have 3to test whether the modified Hessian matrix 2
point is minima, 4
@2 f @x 2 @2 f @y@x
@2 g @x 2 @2 g œ @y@x
Cœ
@2 f @y@x @2 f @y 2
2
@ g C œ @y@x
5 is positive definite restricted to the points on 2 C C œ @@yg2 the tangent of the curve g.x; y/ drawn at the point .x0 ; y0 /. The modified Hessian matrix to be tested is given as 3 2 2 d 2g d f d 2g d 2f C C 2 6 dx dy dx dy 7 dx 7 6 dx2 7 with œ D 4; as found earlier 6 2 2 2 4d f d g d f d 2g 5 C C 2 dydx dx dy dy2 dx 4 C .4/ 0 0 C 0 4 0 D D D A .say/ 0 C .4/ 0 0 C 0 0 0 The obtained optimal solution x D 1; y D 4 (described earlier) corresponds to the minimal point if the modified Hessian matrix (A) as shown above is the positive definite restricted to the points on the tangent drawn on the curve g.x; y/ at the point .1; 4/.
186
5 Optimization
The equation of the tangent drawn on the curve g.x; y/ at the point .1; 4/ is obtained as the set of points .z1; z2/ satisfying the condition rgT Œz11 z2C4 D 0. Note that rgT is computed at .1; 4/. For the function g.x; y/ D x Cy C3 D 0, the gradient vector at the point .1; 4/ is found as follows. 2 3T @g 6 @x 7 6 7 D 1 4 @g 5 1 @y Therefore the equation of the tangent drawn at the point .1; 4/ on the curve T 1 g.x; y/ D x C y C 3 D 0 is obtained as Œz1 1 z2 C 4 D z1 C z2 C 3 D 0. 1 The set of points .v; 3 v/8 v 2 R is the parametric representation of the above equation that lies on the tangent drawn on the curve g.x; y/ at .1; 4/. Also, the matrix A is said to be positive definite restricted to the points that lies on the tangent drawn on the curve g.x; y/ at the point .1; 4/. if uT Au > 0; 8u 2 points as described above. uT Au D Œv
3 v
4 0 v D Œ4v 0 0 3 v
0
v D 4v2 3 v
which is greater than or equal to zero and hence the point obtained is the saddle point.
5.2 Extension to Constrained Optimization Technique to Higher Dimensional Space with Multiple Constraints In general the problem discussed above can be extended to more than two variables and more than one constraints as shown below. Minimize f .Œx1 x2 x3 : : : xn/, subject to the constraints f .Œx1 x2 x3 : : : xn/
g2.Œx1 x2 x3 : : : xn/ D 0 : : : gm.Œx1 x2 x3 : : : xn/ D 0:
Create the Lagrangean function L.f; g1; g2; g3/ D f .Œx1 x2 x3 : : : xn/ C 1 g1 .Œx1 x2 x3 : : : xn/ C2 g2 .Œx1 x2 x3 : : : xn/ C : : : m gm .Œx1 x2 x3 : : : xn/ Differentiating the above equation with respect to x1 x2 x3 : : : xn; 1 ; 2 ; : : : m and equate to zero gives the extremum point .x01 x02 x03 : : : x0n / (say).
5.2 Extension to Constrained Optimization Technique to Higher Dimensional Space
187
The above set of equations can also be obtained using the following equation 2 3 1 6 2 7 7 rf C Œrg1 rg2 : : : rg1m 6 4 : : : 5 D 0. m Modified Hessian matrix is used to test whether the obtained extremum point thus obtained is minimum or not as shown below. The Modified Hessian matrix is as shown in below 2
d 2 g1 d 2f 6 dx 2 C 1 dx 2 1 1 6 6 2 2 6 d f d g 1 6 6 dx x C 1 dx x 6 2 1 2 1 6 6 6 4 d 2f d 2 g1 C 1 dxn x1 dxn x1
C 2
d 2 g2 d 2 g3 C 3 dx1 2 dx1 2
C 2
d 2 g2 d 2 g3 C 3 dx1 x1 dx1 x1
d 2f d 2 g1 d 2 g2 d 2 g3 2 C 1 2 C 2 2 C 3 dx2 dx2 dx2 dx2 2 :::
d 2 g2 d 2 g3 C 3 dxn x1 dxn x1
d 2f d 2 g1 d 2 g2 d 2 g3 C 1 C 2 C 3 dxn x2 dxn x2 dxn x2 dxn x2
::: C 2
::: ::: ::: :::
d 2f d 2 g1 d 2 g2 d 2 g3 C 1 C 2 C 3 dx1 x2 dx1 x2 dx1 x2 dx1 x2
3 d 2f d 2 g1 d 2 g2 d 2 g3 C 1 C 2 C 3 dx1 xn dx1 xn dx1 xn dx1 xn 7 7 7 d 2 g1 d 2 g2 d 2 g3 7 d 2f 7 C 1 C 2 C 3 7 dx2 xn dx2 xn dx3 xn dxn xn 7 7 ::: 7 7 d 2f d 2 g1 d 2 g2 d 2 g3 5 C C C 1 2 3 xn 2 dxn 2 dxn 2 dxn 2
If the Modified Hessian matrix as mentioned above is positive definite restricted to the points on the tangent plane P as defined below. Tangent plane ‘P’ drawn at the optimal point .x01 x02 x03 : : : x0n / on the surface defined by the equations g1 .x1; x2; : : xn/; g2 .x1; x2; : : xn/ and g3 .x1; x2; : : xn/ is defined as the set of points ‘y’ satisfying the equation. 0 2 31 x01 B 6x 7C B 6 02 7C B 6 7C B 6x 7C Œrg1 rg2 rg3 T .computed at Œx01 x02 x03 : : : x0n / By 6 03 7C D 0 B 6x04 7C B 6 7C @ 4 : : : 5A x0n Example 5.1. Minimize the function f .x1; x2; x3/ D 2x12 C 4 x2 C 2 x3 C 2. Subject to the constraint g1.x1; x2; x3/ D x1 C x2 4 D 0; g2.x1; x2; x3/ D x1 C x3 5 D 0
188
5 Optimization
3 1 6 2 7 7 rf C Œrg1 rg2 : : : rg1m 6 4: : : 5 D 0 m 2 3 2 3 4x1 1 1 ) 4 4 5 C 41 05 1 D 0 2 2 0 1 2 3 4x1 C 1 C 2 D 0 ) 4 4 C 1 C 2 D 0 5 2
2 C 2 D 0 Solving the above equation gives 2 D 2; 1 D 2; x1 D 1; x2 D 3; x3 D 4 The optimum value is (1,3,4) and the corresponding function value is 24. To test whether the obtained solution is minima or not is done using modified Hessian matrix as given below. 2 3 4 0 0 40 0 05 0 0 0 The optimal point thus obtained is minima if the modified Hessian matrix obtained is positive definite restricted to the points on the tangent plane of the surface defined by the equation g1.x1; x2; x3/ D x1Cx24 D 0; g2.x1; x2; x3/ D x1Cx35 D 0 The equation of the tangent as described above is obtained as follows. 2 3 y1 1 Œrg1 rg2T 4y2 35 D 0 y3 4 2 3 y1 1 1 1 0 4 ) y2 35 D 0 1 0 1 y3 4 2 3 y1 1 1 0 4 5 4 ) y2 D 1 0 1 5 y3 y1 C y2 D 4 y1 C y3 D 5 Let y1 D ˛
y2 D 4 ˛ and y3 D 5 ˛
5.3 Positive Definite Test of the Modified Hessian Matrix Using Eigen Value Computation
189
Thus the tangent plane is defined as the vector space that is spanned by the column vectors as described below. 2
3 ˛ 44 ˛ 5 ; 8˛ 2 R 5˛ 2 3 4 0 0 To check whether the matrix 40 0 05 is negative definite restricted to the vector 0 0 0 2 3 ˛ of the form 44 ˛ 5 is tested as described below. 5˛ 32 3 4 0 0 ˛ 5 ˛ 40 0 05 44 ˛ 5 D 4˛ 2 08˛ 2 R 0 0 0 5˛ 2
Œ˛
4˛
Hence the obtained extremal point is the saddle point.
5.3 Positive Definite Test of the Modified Hessian Matrix Using Eigen Value Computation If all the Eigen values of the matrix modified Hessian matrix computed at the extremum point are positive, the matrix is said to be positive definite matrix and the corresponding extremum point is minima. If all the Eigen values of the matrix modified Hessian matrix computed at the extremum point are negative, the matrix is said to be Negative definite and the corresponding extremum point belongs to maxima. 2 3 a1 a2 a3 Proof. As Hessian matrix A D 4a2 a4 a55 (say) is the symmetric matrix, it is a3 a5 a6 diagonalizable (i.e.) the matrix can be represented as A D EDEH . Also the Eigen values of the matrix A are real numbers and the Eigen vectors are orthonormal to each other. (Refer Chapter 4)
190
5 Optimization
2
a1 a2 Œx1 x2 x3 4a2 a4 a3 a5 2 e11 D Œx1 x2 x3 4e12 e13
32 3 a3 x1 5 4 a5 x25 a6 x3 32 32 32 3 e21 e31 œ1 0 0 e11 e12 e13 x1 e22 e325 4 0 œ2 0 5 4e21 e22 e235 4x25 e23 e33 e31 e32 e33 x3 0 0 œ3
D œ1 .x1 e11 C x2 e12 C x3 e13/2 Cœ2 .x1 e21 C x2 e22 C x3 e23/2 Cœ3 .x1 e31 C x2 e32 C x3 e33/2 Thus to test whether the matrix A is positive definite, the requirement is œ1 .x1 e11 C x2 e12 C x3 e13/2 C œ2 .x1 e21 C x2 e22 C x3 e23/2 Cœ3 .x1 e31 C x2 e32 C x3 e33/2 > 0 This implies all the Eigen values should be greater than zero. Example 5.2. Consider the problem of minimizing the function f .x; y/ D 2x 2 C 4y C 3 subject to the constraint g.x; y/ D x C y C 3 D 0. We have already shown that the extremal point is .1; 4/ with respect to the xy co-ordinate system. Now shifted co-ordinate system PQ co-ordinate system is framed such that the extremal point is (0,0) with respect to the new co-ordinate system. ) P D X 1; Q D Y C 4 Thus the function g.x; y/ D x C y C 3 D 0 is rewritten with respect to the new co-ordinate system as shown below. u.p; q/ D p C 1 C q 4 C 3 D 0 ) u.p; q/ D p C q D 0 The parametric representation of the above equation is represented as the set of points of the form .ˇ; ˇ/. The equation of the tangent plane with respect to the new co-ordinate system is the set of points y D Œy1 y2T such that ruT y D 0 ) Œ1
1
y1 D 0: y2
In other words the set of points describing thetangent plane is in the Null space of ˛ the matrix [1 1], which is the represented as , where ˛ 2 R. ˛
5.3 Positive Definite Test of the Modified Hessian Matrix Using Eigen Value Computation
191
The Modified Hessian matrix is found at the point (0,0) with the new co-ordinate system and is displayed below. The equation f .x; y/ D 2x 2 C4yC3 in the modified co-ordinate system is given as v.p; q/ D 2 .p C 1/2 C 4 .q C 5/ C 3or D 2.p 2 C 1 C 2p/ C 4q C 20 C 3a D 2p 2 C 4p C 4p C 25 4 0 Modified Hessian matrix is (Note that the value of œ D 4 which is same as 0 0 the one calculated with the previous co-ordinate system). To test whether the above Hessian matrix is positive definite or not restricted to the tangent plane as described above is as shown below. 4 0 ˛ Œ˛ ˛ D 4˛ 2 0 and hence the extremal point obtained is the 0 0 ˛ indeterminate point. Trick to test whether the Hessian matrix ‘H’ is positive definite restricted to the points on the tangent plane as described above. Any vector in the tangent plane can be represented as the linear combinations of the basis of the Null space of the matrix ruT (see above). The Eigen basis vector E1,E2,E3 (say) are arranged in column form to obtain Eigen matrix E. Any vector in the tangent plane space can be representedas the linear combinations of the Eigen p1 vectors as described below. Thus ŒE1 E2 is the arbitrary point in the tangent p2 plane as described above. Thus the Hessian matrix H is positive definite restricted to the points on the p1 tangent plane (as described above) if Œp1 p2E T HE > 08p1; p2 2 R. p2 Thus if the Eigen values of the matrix E T HE is greater than 0, the matrix is positive definite matrix. " # " h i 4 0 p1 1 1 2 D In the above example, Eigen values of the matrix p2 p2 0 0 p12 Œ2 shave to be found. The Eigen value is positive and hence the Extremum point obtained is minimum. Example 5.3. Consider the problem described in Example 5.1. Minimize the function f .x1; x2; x3/ D 2x12 C4x2C2x3C2. Subject to the constraint g1.x1; x2; x3/ D x1 C x2 4 D 0; g2.x1; x2; x3/ D x1 C x3 5 D 0. The optimal point is obtained as (1,3,4) with respect to the co-ordinate system .x1; x2; x3/. The modified co-ordinate system .z1; z2; z3/ is obtained as z1Dx1 1I z2 D x2 3I z3 D x3 4;
192
5 Optimization
The Modified equations corresponding to the functions g1; g2, and f are u1; u2 and v respectively which are displayed below u1.z1; z2; z3/ D z1 C 1 C z2 C 3 4 D 0 D z1 C z2 D 0 u2.z1; z2; z3/ D z1 C 1 C z3 C 4 5 D 0 D z1 C z3 D 0 v.z1; z2; z3/ D 2.z1 C 1/2 C 4 .z2 C 3/ C 2 .z3 C 4/ C 2 D 2.z12 C 1 C 2z1/ C 4z2 C 12 C 2z3 C 8 C 2 D 2z12 C 4z1 C 4z2 C 2z3 C 24 The Modified Hesian matrix ‘H’ at the point (0,0,0) with respect to the modified co2 3 4 0 0 ordinate system is given below. 40 0 05. (Note that the value of œ1 D 2 and 0 0 0 œ2 D 2, which are same as that of the one calculated with the previous co-ordinate system). The Equation of the tangent plane passing through the point (0,0,0) in the new co-ordinate system is described as the set of points (p,q,r) satisfies the following condition. 2 3 p T 4 5 Œrg1 rg2 q D0 r 2 3 p 1 1 0 4 5 ) q D0 1 0 1 r The set of points on the tangent plane is given as the null space of the matrix 2 1 3 p 3 1 1 0 6 p1 7 . The Eigen basis of the null space is represented as B D 4 35 1 0 1 p1 2 32 33 4 0 0 1 T 4 5 4 The Eigen value of the matrix E HE D Œ1 1 1 0 0 0 15 D Œ4 0 0 0 1 is 4 which is greater than zero and hence the obtained minima point is the minima point. Example 5.4. Maximize the function x1 x2 C x2 x3 C x1 x3 .subject to the constraint x1 C x2 C x3 D 3. The Lagrangean function is obtained as x1 x2 C x2 x3 C x1 x3 C .x1 C x2 C x3 3/ D 0. The following set of equations are obtained by computing by differentiating the Lagrangean function with respect to .x1; x2; x3; / and equate to zero.
5.4 Constrained Optimization with Complex Numbers
193
x2 C x3 C D 0I x1 C x3 C D 0I x1 C x2 C D 0I x1 C x2 C x3 D 3 Solving the above equation gives D 2; x1 D x2 D x3 D 1 The modified co-ordinate system .z1 ; z1 ; z1 / are obtained as z1 D x1 1I z2 D x2 1I z3 D x3 1. The modified equation corresponding to f and g is as shown below. v.z1 ; z2 ; z3 / D .z1 C 1/.z2 C 1/ C .z2 C 1/.z3 C 1/ C .z1 C 1/.z3 C 1/ D z1 z2 C z2 z3 C z3 z1 C z1 C z2 C z3 C 3 u.z1 ; z2 ; z3 / D z1 C 1 C z2 C 1 C z3 C 1 D z1 C z2 C z3 C 3 D 0 The Modified Hessian matrix at (0,0,0) in the new co-ordinate system is computed 2 3 0 1 1 as 41 0 15 1 1 0 The Equation of the tangent plane with the modified co-ordinate system is set of 2 3 p T 4 5 points .p; q; r/ satisfying the condition ru q D0 r This implies the set of points form the null space of the matrix Œ1 1 1 2 3 0:5774 0:5774 The basis of the null space of the matrix Œ1 1 1 is given as 4 0:7887 0:21135. 0:2113 07887 3 2 3T 2 0 1 1 0:5774 0:5774 The Eigen values of the matrix 4 0:7887 0:21135 41 0 15 1 1 0 0:2113 07887 2 3 0:5774 0:5774 4 0:7887 0:21135 D 1 0 are given as 1 and 1 which are less 0 1 0:2113 07887 than zero and hence the obtained extremum point is the maxima point.
5.4 Constrained Optimization with Complex Numbers Consider the problem of minimizing the function f .x1; x2; x3/ where x1; x2; x3 2 C subject to the constraint g.x1; x2; x3/ D 0; where x1; x2; x3 2 C The problem can be viewed as maximizing both real and imaginary part of the function. Hence the Lagrangean function is given as f .x1; x2; x3/ C g.x1; x2; x3/
194
5 Optimization
The equation can be written as below treating the complex Lagrange multiplier defined as œ D œ1 C j œ2 .
N f .x1; x2; x3/ C Re g.x1; x2; x3/ D 0 Differentiating the above equation with respect to x1; x2; x3 and œN and equate to zero gives the extremal point for the above problem. Note: Differentiation of the function with respect the complex number is defined as shown below. @ 1 @ 1 @ @ @ @ D D and Cj j 2 @x11 @x12 @x1 2 @x11 @x12 @x1 Properties of the complex differentiation @x1
D0 @x1 @x1 D1 2. @x1 @.AH x1/ 3. D0 @x1 @.AH x1/ 4. D AN @x1 @.zH A/ 5. DA @Nz @Re.zH A/ 6. D 12 A @Nz 1.
5.5 Dual Optimization Problem Consider the problem of minimizing the function f .x1; x2; x3/, subject to the constraint g1.x1; x2; x3/ D 0. Framing the Lagrange equation we get, L.X; / D f .x1; x2; x3/ C g1.x1; x2; x3/. Differentiate the above equation with respect to x1; x2; x3; and equate to zero. Solve for x1; x2; x3 in terms and substitute in the equation L.X; / to obtain the function h./. Thus function thus obtained is called dual optimization problem for the above mentioned constrained optimization problem. Thus the Dual problem is as shown below. Maximize the function h./ without constraints. (i.e.) Unconstrained optimization problem. Example 5.5. Consider the problem of minimizing the function f .x1; x2; x3/ D x1x2 C x2x3 C x3x1 C x1x2x3 4 D 0. Subject to the following constraints g1.x1; x2; x3/ D x1 C x2 C x3 3 D 0.
5.6 Kuhn-Tucker Conditions
195
Framing the Lagrange equation we get L.X; / D x1x2 C x2x3 C x3x1 C x1x2x3 4 C .x1 C x2 C x3 3/ Differentiating the above equation with respect to x1; x2; x3 and equate to zero and solving for x1; x2; x3 in terms of , we get x1 D
1 1 1 1 1 1 .7 C 2/ 2 ; x2 D .7 C 2/ 2 ; x3 D 2 C .7 C 2/1=2 2 2 2 2
Substituting the above equation in the Lagrange equation L.X;1/, we get the Dual
1 1 1 1 1 2 without any conproblem Maximizing the 2 2 .7 C 2/ 2 2 2 .7 C 2/ straints. (i.e.)Unconstrained optimization.
5.6 Kuhn-Tucker Conditions Consider the optimization problem as shown below. Minimize the function f .x1; x2; x3/. Subject to the constraints that g1.x1; x2; x3/ D 0 g2.x1; x2; x3/ D 0 h1.x1; x2; x2/ 0 h2.x1; x2; x3/ 0 The Lagrangean function is framed with the Lagrangean multipliers 1 ; 2 ; 1 ; 2 as shown below. f .x1; x2; x3/ C 1 g1.x1; x2; x3/ C 2 g2.x1; x2; x3/ C1 h1.x1; x2; x3/ C 2 h2.x1; x2; x3/ D 0 Differentiating the above equation with respect x1; x2; x3 and equate to zero to obtain the three equations. Also, The Lagrange multiplier used for inequality constraints satisfies the following conditions 1 h1.x1; x2; x3/ C 2 h2.x1; x2; x3/ D 0 1 0; 2 0 Along with this, the two equations g1.x1; x2; x3/ D 0 g2.x1; x2; x3/ D 0 are used to obtain the extremal point.1 Example 5.6. Minimize the function f .x1; x2; x3/ D x1x2 C x2x3 C x3x1 C x1x2x3 4 D 0. Subject to the following constraints g1.x1; x2; x3/ D x1 C x2 C x3 3 D 0 h1.x1; x2; x3/ D x1 x2 x3 0; h2.x1; x2; x3/ D x1 C x2 2x3 0
196
5 Optimization
Construct the Lagrangean function x1x2 C x2x3 C x3x1 C x1x2x3 C 1 .x1 C x2 C x3 4/ C1 .x1 x2 x3/ C 2 .x1 C x2 C 2x3/ D 0 Differentiating with respect to x1; x2; x3 and equate to zero, the following equations are obtained x2 C x3 C x2x3 C 1 C 1 C 2 D 0 x1 C x3 C x1x3 C 1 1 C 2 D 0 x2 C x1 C x1x2 C 1 1 22 D 0 Also 1 .x1 x2 x3/ C 2 .x1 C x2 2x3/ D 0 Along with above equation g1.x1; x2; x3/ D x1 C x2 C x3 3 D 0 is used to obtain the extremal points as given below. Case 1: 1 D 0. The following solutions are obtained .x1; x2; x3; 1 ; 1 ; 2 / D .1; 0; 4; 1; 0; 0/; .3; 0; 0; 3; 0; 0/; .1; 1; 1; 3; 0; 0/; .1; 5; 1; 1; 0; 0/ All the solutions are valid as 1 0, and 2 0 Case 2: 2 D 0. The following solutions are obtained .x1; x2; x3; 1 ; 1 ; 2 / D .1; 0; 4; 1; 0; 0/.3; 0; 0; 3; 0; 0/ 3 3 3 99 9 ; ; ; ; ; 0 .1; 1; 1; 3; 0; 0/.1; 5; 1; 1; 0; 0/ 2 4 4 32 32 All the solutions obtained above are valid as 1 0, and 2 0.
Chapter 6
Matlab Illustrations
6.1 Generation of Multivariate Gaussian Distributed Sample Outcomes with the Required Mean Vector ‘MY ’ and Covariance Matrix ‘CY ’ Consider the transfer of random variables X D R cos ‚ and Y D R sin ‚, where ‘‚’ is uniformly distributed between 0 and 2 and ‘R’ is Rayleigh density function as described below. Also R and ‘‚’ are independent random variables. The Rayleigh density function is given as follows r r 22 e 2 8r 0
2 D 0; elsewhere
fR .r/ D
The corresponding distribution function is given as follows. FR .r/ D 1 e
r2 2 2
8r 0
D 0; elsewhere The joint density function of X and Y represented as fXY .x; y/ is obtained using Jacobian as follows. fXY .x; y/ D
1 fR‚ .r.x; y/; ‚.x; y// jJ j
p The solution for the above set of equation gives R D X 2 C Y 2 and ‚ D
1 Y . The Jacobian matrix at the solution is obtained as follows. tan X 2
3 @X @X p Y 6 @R @‚ 7 2 C Y 2 ; ‚ D tan1 is obtained as X at R D 4 @Y @Y 5 X @R @‚p R D X2 C Y 2 E.S. Gopi, Mathematical Summary for Digital Signal Processing Applications with Matlab, DOI 10.1007/978-90-481-3747-3 6, c Springer Science+Business Media B.V. 2010
197
198
6 Matlab Illustrations
Thus the joint density function of
p
y 1 fXY .x; y/ D p fR‚ x 2 C y 2 ; tan1 x X2 C Y 2
y p 1 fR x 2 C y 2 f‚ tan1 D p x X2 C Y 2
p x2 C y 2 2 2 1 .x Cy / 1 e 2 2 D p 2
2 X2 C Y 2 2 2 y x 1 1 e 2 2 p e 2 2 D fX .x/fY .y/ D p 2 2 2
2
Note that the density functions fX .x/; fY .y/ obtained above are the independent Gaussian density function with variance 2 and mean zero. Thus the steps involved in generating the sample outcomes that is Gaussian distributed with mean D 0 and variance D 2 is summarized below. Step 1: Generation of sample outcomes that is Rayleigh distributed from the uniformly distributed sample outcomes Consider the Uniformly distributed random variable be ‘U’ and Rayleigh distributed random variable be ‘R’. Also Let R D g.U/ FR .r/ D P .R r/ D P .g.U/ r/ D P.U g1 .r// D FU .g1 .r// D g1 .r/ (As U is the uniformly distributed function). ) FR .r/ D g1 .r/ ) 1e Solving for ‘r’, we get e
r2 2 2
r2 2 2
D g1 .r/ D u
D1u) s
)rD
2 2
r2 2 2
D ln
1 ln 1u
1 1u
r2
Note that the distribution function FR .r/ D 1 e 2 2 is valid only for positive values of ‘r’. It is also noticed that the range for the u is from 0 to 1. Thus to generate the outcomes with Rayleigh distributed, generate the outcomes is uniformly distributed over the range 0–1 and use the formula q ‘u’, that
1 2 r D 2 ln 1u to obtain the Rayleigh distributed outcomes. Step 2: Generate the sample outcomes of the uniformly distributed random variable ‘‚’ that ranges from 0 to 2 .
6.1 Generation of Multivariate Gaussian Distributed Sample Outcomes
199
x104 Probability density function of the sample outcomes of the generated data 4.5 4 3.5 3
pdf
2.5 2 1.5 1 0.5 0 −6
−4
−2
0 Values
2
4
6
Fig. 6.1 Probability density function of the 10,000,000 samples generated using Matlab (Gaussian distribution with mean D 0 and variance D 1)
Step 3: Compute X D Rcos‚ and Y D Rsin‚ to obtain the two sets of sample outcomes that are independent Gaussian distributed with mean ‘0’ and variance 2 (Fig. 6.1). Steps to Generate the Multivariate (say ‘N’) Gaussian distributed sample outcomes Repeat the procedure described above to generate ‘N’ outcomes which are individually Gaussian distributed with mean D 0 and variance D 1. Note that they are independent in nature (i.e.) If we compute the co-variance matrix for the above generated ‘N’ outcomes, it is identity matrix. Let the generated Multivariate Gaussian distributed outcomes be represented as the outcomes of the random vector ‘X’. Now the requirement is to obtain the outcomes of the Gaussian random vector ‘Y’ whose mean and co-variance matrix are represented as ‘CY ’ and ‘M’ respectively the generated outcomes of the random vector ‘X’. They are obtained using transformation matrix ‘A’ as described below. Let Y D AX C b be the transformation equation which transforms the multivariate random variable X to the multivariate random variable ‘Y’, where ‘A’ is the transformation matrix and ‘b’ is the column vector. If ‘X’ is the Multivariate Gaussian distributed, then ‘Y’ is also the Multivariate Gaussian distributed with Co-variance matrix CY D ACX AT and the mean vector ‘AmX C b’, where mX is the mean vector of the multivariate random variable X.
200
6 Matlab Illustrations
The generated Multivariate Gaussian distributed outcomes has the Identity covariance matrix CX and the mean vector mX D 0, ) CY D AAT : ) Mean vector D b Thus the requirement is to represent the required co-variance CY matrix as the product of A and AT and hence the matrix ‘A’ is obtained. Then applying the transformation Y D AX C b to obtain Multivariate Gaussian distributed outcomes with the specified mean ‘b’ and the co-variance matrix CY . Choose ‘b D mY ’ for the specifications mentioned above. Representing the matrix CY as the product of AAT is obtained using Eigen decomposition as described below. Represent the matrix CY D EDET , where E is the Eigen column matrix, in which the columns are the orthonormal Eigen vectors obtained from the co-variance matrix CY and D is the Diagonal matrix with Diagonal elements filled up with the corresponding Eigen values of the co-variance matrix CY .
1 1 T 1 1 CY D EDET D ED 2 D 2 E T D ED 2 ED 2 :
1 ) A D ED 2 is obtained: Using the obtained transformation matrix ‘A’, the outcomes of the random vector ‘X’ is transformed to the outcomes of the random vector ‘Y’ using the equation Y D AX C mY . gengd.m %m-file for generating the Multivariate Gaussian distributed %sample outcomes with zero mean vector and Identity matrix %co-variance matrix for k D 1:2:10 u D rand(1,1000000); r D sqrt(2 log(1./(1 u))); theta D rand(1,1000000).2 pi; Z1 D r. cos(theta); Z2 D r. sin(theta); Xfkg D X; Xfk C 1g D Y; end CX D cov(cell2mat(X’)’);
6.1 Generation of Multivariate Gaussian Distributed Sample Outcomes
201
The Co-variance matrix of the generated data X is computed as 3 2 1:0 0 0 0 0 0 0 0 0 0 6 0 0:99 0 0 0 0 0 0 0 07 7 6 60 0 :99 0 0 0 0 0 0 07 7 6 60 0 0 1 0 0 0 0 0 07 7 6 7 6 0 0 0 :99 0 0 0 0 07 60 CX D 6 7 60 0 0 0 0 1:0 0 0 0 07 7 6 60 0 0 0 0 0 1:0 0 0 07 7 6 60 0 0 0 0 0 1:0 0 0 07 7 6 40 0 0 0 0 0 0 0 0:99 0 5 0 0 0 0 0 0 0 0 0 1: Note that the co-variance matrix of the generated sample outcomes is almost identity matrix. (As expected) Let mean vector mY D [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0] and 2 0:1 60:2 6 60:3 6 60:4 6 6 60:5 CY D 6 60:6 6 60:7 6 60:8 6 40:9 1:0
0:2 0:4 0:6 0:8 1:0 1:2 1:4 1:6 1:8 2:0
0:3 0:6 0:9 1:2 1:5 1:8 2:1 2:4 2:7 3:0
0:4 0:8 1:2 1:6 2:0 2:4 2:8 3:2 3:6 4:0
0:5 1:0 1:5 2:0 2:5 3:0 3:5 4:0 4:5 5
0:6 1:2 1:8 2:4 3:0 3:6 4:2 4:8 5:4 6:0
0:7 1:4 2:1 2:8 3:5 4:2 4:9 5:6 6:3 7:0
0:8 1:6 2:4 3:2 4:0 4:8 5:6 6:4 7:2 8:0
0:9 1:8 2:7 3:6 4:5 5:4 6:3 7:2 8:1 9:0
3 1:0 2:0 7 7 3:0 7 7 4:0 7 7 7 5:0 7 7 6:0 7 7 7:0 7 7 8:0 7 7 9:0 5 10:0
%m-file for generating the Multivariate Gaussian distributed sample %outcomes with mean vector mY and the co-variance matrix Co%variance matrix CY as displayed below MY D 0.1:0.1:1.0; CY D [1:1:10;2:2:20;3:3:30;4:4:40;5:5:50;6:6:60;7:7: 70;8:8:80;9:9:90;10:10:100]0.1; %Representing the matrix CY as the product of two matrix A and A0 as follows %As the co-variance matrix is the symmetric matrix, the eigen values are %real [P,Q] D eig(CY); A D P sqrt(abs(Q));
202
6 Matlab Illustrations
for i D 1:2:10 r D rand(1,1000000); sigma D 1; R D sqrt(2 sigmaˆ2log(1./(1 r))); theta D rand(1,1000000)2 pi; u D R. cos(theta); v D R. sin(theta); [p1,q1] D hist(u,100) [p2,q2] D hist(v,100) datafig D u datafi C 1g D v end X D cell2mat(data’); Y D A X C repmat(MY’,1,1000000); CY D cov(Y’); The co-variance matrix computed for the generated Multivariate Gaussian distribution sample outcomes is as shown below (as expected) (Figs. 6.2 and 6.3). 2 0:1 60:2 6 60:3 6 60:4 6 6 60:5 CY D 6 60:6 6 60:7 6 60:8 6 40:9 1:0
0:2 0:4 0:6 0:8 1:0 1:2 1:4 1:6 1:8 2:0
0:3 0:6 0:9 1:2 1:5 1:8 2:1 2:4 2:7 3:0
0:4 0:8 1:2 1:6 2:0 2:4 2:8 3:2 3:6 4:0
0:5 1:0 1:5 2:0 2:5 3:0 3:5 4:0 4:5 5
0:6 1:2 1:8 2:4 3:0 3:6 4:2 4:8 5:4 6:0
0:7 1:4 2:1 2:8 3:5 4:2 4:9 5:6 6:3 7:0
0:8 1:6 2:4 3:2 4:0 4:8 5:6 6:4 7:2 8:0
0:9 1:8 2:7 3:6 4:5 5:4 6:3 7:2 8:1 9:0
3 1:0 2:0 7 7 3:0 7 7 4:0 7 7 7 5:0 7 7 6:0 7 7 7:0 7 7 8:0 7 7 9:0 5 10:0
6.2 Bacterial Foraging Optimization Technique Evolutionary algorithms that are formulated from the inspiration of the natural biological behavior are called Biologically Inspired Algorithms (BIA).Bacterial Foraging is one of the BIA inspired from the Foraging behavior of the E-Coli Bacteria. During Foraging, Bacteria tries to move towards the region where more nutrients are available. (i.e.) Moving towards the region where the concentration of nutrients are large. They pass through the neutral medium and they avoid poisonous substances. This biological behavior is inspired to formulate the Bacterial Foraging Optimization technique as described below. Consider the unconstrained optimization problem of minimizing the function J.X /; X 2 Rm
6.2 Bacterial Foraging Optimization Technique
203
Fig. 6.2 Sample outcomes of the 10-Variate ŒX1 X2 X3 : : : X10] Gaussian distribution with mean zero and Identity Co-variance matrix
Fig. 6.3 Sample outcomes of the 10-Variate ŒY1 Y2 Y3 : : : Y10 Gaussian distribution with mean MY and Co-variance matrix CY
Analogy: The vector X can be viewed as the position of the Bacteria J.X / < 0 can be treated as the presence of Nutrients J.X / D 0 can be treated as the Neutral J.X / > 0 can be treated as the presence of Toxic substances
204
6 Matlab Illustrations
Step 1: Initialization of the population Initialize the positions of ‘N’ (say) number of Bacteria. Let it be X 1 ; X 2 ; X 3 ; : : : X N corresponding to ‘N’ Bacterium b 1 ; b 2 ; b 3 ; : : : b N . The position vector X 1 ; X 2 ; X 3 ; : : : X N is the initial population. Step 2: Chemo taxis It is tendency of the bacteria to move towards the sources of Nutrients. It consists of two stages. They are the following (a) Tumbling: It is the tendency of the bacteria to change their positions in search i of Nutrients. Let Xnew be the next position of the i th Bacteria whose current i position is X . They are related as described below. i Xnew 2 Rm D X i C c'; where ' D p T such that each element of the vector is in the range Œ1 1. ˚ is the unit walk in random direction. ‘c’ is called as chemo tactic step size. The new positions are computed for i D 1; 2; : : : N (b) Swimming: Bacterium will tend to keep on moving in the particular direction if it is in the direction that is rich in nutrients.
i < J.X i /, then another swimming in the same direcMathematically if J Xnew th tion .'/ is taken by the i Bacteria and it can be continued upto Ns steps. After the completion
i of Ns stepsi Bacteria goes to the step 3. J.X /, Bacteria comes out of the tumbling stage and goes to the If J Xnew step 3. Step 3: Reproduction After step 2, best ‘N/2’ (50%) bacteria measured in terms of its Health are survived. The survived Bacterium are subjected to reproduction to obtain ‘N’ Bacterium as described below. Health of the Bacteria is measured in terms of J.X /. If the functional value J.X / is less, then the corresponding Bacteria is healthier. Compute J.X i / for i D 1; 2; : : : N . Arrange them in ascending order. First ‘N/2’ Bacterium and the corresponding positions are selected Let the positions be ŒY 1 ; Y 2 ; Y 3 ; : : : Y N=2 . Every Bacteria is split into two Bacterium and are placed in the same positions. Thus new set of positions corresponding to ‘N’ Bacterium are given as ŒY 1 ; Y 1 ; Y 2 ; Y 2 ; Y 3 ; Y 3 ; : : : Y N=2 Y N=2 D ŒZ 1 ; Z 2 ; Z 3 ; : : : Z N (say). Go to step 2. Repeat 2 and 3 for finite number of iterations. Then Go to step 4. Step 4: Elimination-dispersion In real world process, some of the bacterium (i.e.) with probability ‘Pd ’ are dispersed to new locations. This is simulated as shown below. Generate the random vector of size 1 N. Sort the elements of the vector in the ascending order. Find the index corresponding to the first N Pd sorted numbers. Choose the positions of the Bacterium corresponding to the obtained index. They are replaced with the randomly generated positions on the optimization domain.
6.2 Bacterial Foraging Optimization Technique
205
The positions thus obtained are treated as the current best positions. Go to step 2. Repeat the steps 2–4 for the finite number of iterations. The best value in every iteration can also be tracked and the best among them can be declared as the optimal solution. Social Communication In nature there is the social communication between Bacterium such that they are neither close together nor far away from each other. This is done by releasing the chemical by the Bacteria. The chemical signal can be either attractant or Repellent. If the chemical signal released by the particular Bacteria is attractant in nature, then it attracts other Bacteria to come to its position. On the contrary if the chemical signal released by the particular Bacteria is Repellent in nature, it doesn’t allow other Bacteria to come to its position. The social communication between Bacterium can be simulated using the modified objective function to be computed for the i th position corresponding to the i th position Bacteria as given below. Jmod.X i / D J.X i / C Jsocial.X i /; where Jmod is the modified Objective function computed for the i th position X i corresponding to the i th Bacteria. J.X i / is the actual objective function value computed for the i th position X i corresponding to the i th Bacteria. social.X i / is the attractant cum repellent signal computed for the i th position X i corresponding to the i th Bacteria as displayed below. ˇ ˇ2 Let dij D ˇjX i X j jˇ 0 1 N N X X Jsocial.X i / D M @ e Rdij e Adij A j D1
j D1
P Rdij Note that if the first term N is reduced if distance between the i th position j D1 e and others are made large and hence it acts as the repellent signal. Similarly the P Rd ij is reduced if the distance between the i th position and second term N j D1 e others are made small and hence it acts as the attractant signal. ‘R’ is the Repellent factor and ‘A’ is the attractant factor. The convergence graph obtained using Bacterial Foraging technique for minimizing the function J.X / D .x1^ 2 9/^ 2 C .x2^ 2 9/^ 2 C .x3^ 2 16/^ 2 is shown in the figure given below (Fig. 6.4). The corresponding Matlab program is also displayed below After 100 iteration, the best solution obtained is [2.9025 2.9512 4.0720] bactalgo.m %Bacterial Foraging Technique %Edit bactfun.m to insert the objective function to be minimized %TRACEERROR traces the minimum error obtained in every iteration
206
6 Matlab Illustrations
Fig. 6.4 Illustration of the convergence of the bacterial foraging algorithm
%TRACEVAL traces the corresponding best solution in every iteration %jsocial regulates the social communication between bacterium. pos1 D rand(1,100)10; pos2 D rand(1,100)10; pos3 D rand(1,100)10; vect D [pos1;pos2;pos3]’; figure TRACEVALUE D []; TRACEVECTOR D []; c D 1; for i D 1:1:100 Jcurvalue(i) D bactfun(vect(i,:)); Jswcurvalue(i) D Jcurvalue(i) C jsocial(vect(i,:), vect); End %Iteration starts for iter D 1:1:100 for dispersal D 1:1:50 for survey D 1:1:50 for bact D 1:1:100 pos1 D rand(1,1)10; pos2 D rand(1,1)10; pos3 D rand(1,1)10; phi D [pos1 pos2 pos3]; newvect D vect(bact,:) C c phi;
6.2 Bacterial Foraging Optimization Technique
207
Jcurvaluenew(bact) D bactfun(newvect); Jswcurvaluenew(bact) D Jcurvalue(bact) C jsocial (newvect,vect); %m is the maximum number swimming for m D 1:1:25 if (Jswcurvaluenew(bact) < Jswcurvalue(bact)) vect(bact,:) D newvect; Jcurvalue(bact) D bactfun(vect(bact,:)); Jswcurvalue(bact) D Jcurvalue(bact) C jsocial (vect(bact,:),vect); newvect D vect(bact,:) C c phi; Jcurvaluenew(bact) D bactfun(newvect); Jswcurvaluenew(bact) D Jswcurvaluenew(bact) C jsocial(newvect,vect); else m D 25; end end end %Reproduction of bacteria [p,q] D sort(Jswcurvaluenew); vect1 D vect(q(1:1:50),:); vect D [vect1;vect1]; end %Dispersal of bacteria with probability 0.2 [p,q] D sort(rand(1,100)); tempvect1 D vect(q(1:1:80),:); pos1 D rand(1,20)10; pos2 D rand(1,20)10; pos3 D rand(1,20)10; tempvect2 D [pos1;pos2;pos3]’; vect D [tempvect1;tempvect2]; for i D 1:1:100 Jcurvalue(i) D bactfun(vect(i,:)); Jswcurvalue(i) D Jcurvalue(i) C jsocial(vect(i,:), vect); end end [p,q] D sort(Jswcurvalue); temp D vect(q(1),:); TRACEVALUE D [TRACEVALUE bactfun(temp)]; TRACEVECTOR D [TRACEVECTOR; temp]; hold on plot(TRACEVALUE) pause(0.2);
208
6 Matlab Illustrations
end %Solution for i D 1:1:100 Jcurvalue(i) D bactfun(vect(i,:)); end [p,q] D sort(TRACEVALUE); FINALRES D TRACEVECTOR(q(1),:); jsocial.m function [res] D jsocial(a,vect) M D 1; b D repmat(a,[100 1]); vect D vect b; vect D vect.^ 2; vect D sum(vect’); Wa D 2; Wr D 2; res1 D sum(exp(vect(Wa))); res2 D sum(exp(vect(Wr))); bactfun.m function [res] D bactfun(z); p D z(1); q D z(2); r D z(3); res D .p^ 2 9/^ 2 C .q^ 2 9/^ 2 C .r^ 2 16/^2;
6.3 Particle Swarm Optimization The Particle swarm optimization is the biologically inspired algorithm inspired from the behavior of birds on deciding the optimal path to move from the particular source to the destination. Consider the task of movement of group of birds (say A,B,C) from the particular source point ‘S’ to the destination ‘D’. Let ‘A’ (which is currently at point PA ) the decides to move towards the point PA0 . Similarly the bird ‘B’ and ‘C’ decides to move towards the point PB0 and PC0 from PB and PC respectively. Let the distance between the point PC0 and the destination point ‘D’ is less compared to the distance between the point PB0 and the destination point ‘D’ and the distance between the point PB0 and the destination point ‘D’. Thus the final decision taken by the bird ‘B’ is the combination of the individual decision taken by bird ‘B’ and the best global decision taken by the neighboring birds (In the current scenario, the best global decision is the decision taken by the bird ‘C’). Mathematically the bird moves from the current position to the next position described as follows. The next position moved by the bird ‘B’ is
PB C gB PC0 PB C lB PB0 PB
6.3 Particle Swarm Optimization
209
Similarly the next position moved by the bird ‘A’ and ‘C’ are given below
PA C gA PC0 PA C lA PB0 PA
PC C gC PC0 PC where ‘gA ; gB ; gC ’ are the global constants and ‘lA ; lB ’ are the local constants. Consider the unconstrained optimization problem of minimizing the function J.X /; X 2 Rm . Analogy: The vector ‘X ’ in the above definition is treated as the current position of the bird in the PSO algorithm. The corresponding value J.X / is the distance between the current position of the bird and the destination. The PSO algorithm identifies the shortest path as described above so that it reaches the destination as early as possible. (i.e.) Identifying the optimal value of X such that J.X / is minimized Algorithm Step 1: Initialize the positions of ‘N ’ number of birds. Let it be X1 ; X2 ; X3 ; : : : XN . Step 2: Obtain the next positions of the ‘N’ birds using the combination of local decision taken by the individual birds and the best decision taken by the neighboring birds (as described earlier). Step 3: The next positions thus obtained are treated as current positions. Local decisions taken by the individual birds for the next move are taken as per the procedure given below. If the distance between current local decision taken by the i th bird position and the destination is greater than the distance between the next position mentioned in the step 2 and the destination, next position is treated as the local decision taken by the i th bird position for the next movement. Otherwise the current local decision taken by the i th bird position is considered as the local decision for the next move also. Repeat step 2 and 3 for the finite number of iterations. Step 4: Best position among the ‘N ’ positions of the birds in the last iteration corresponding to ‘N ’ birds is declared as the final solution for minimizing the function J.X /. Best position among the ‘N ’ positions in the final iteration is the position whose distance from the destination is minimum among all other positions. (i.e.) Best positionD argi .kXi Dk/; i D 1; 2; 3; : : : N The convergence graph obtained using Particle Swarm Optimization for minimizing the function J.X / D 2.x1 4/^ 2 C 4.x2 3/^ 2 C 5.x3 6/^ 2 is shown in the Fig. 6.5. The corresponding Matlab program is also displayed below. 2 3 x1 For the problem mentioned above the vector of the form 4x25 is treated as the x3 arbitrary position of the bird. PSO algorithm is performed to obtain the optimal value of the position of the bird such that its distance from the destination is minimized. The distance between the current position of the bird and the destination is measured using the objective function J.X / and hence the function is minimized.
210
6 Matlab Illustrations
Fig. 6.5 Illustration of the convergence of the PSO algorithm
Immediate after fifth iteration, the solution obtained using PSO algorithm is given as [3.9999 3 6] and the is corresponding
6.4 Newton’s Iterative Method Consider the problem of minimizing the function f .x1; x2; x3/ D 2.x1 4/2 C 4.x2 3/2 C 5.x3 6/2 Expressing the above function using Taylor series we get the following. 02 3 1 3 2 31 02 3 1 02 31 x1 x1 x1 x1 x1 2 D 2 f @4x25A C : : : f @4x25 C 4x25A D f @4x25A C Df @4x25A C 2Š x3 x3 x3 x3 x3 02
@ @ @ where D D x1 @x1 C x2 @x2 C x3 @x3 . The above series can also be represented in the matrix form as shown below
02
3 2 31 x1 x1 f @4x25 C 4x25A x3 x3
3 @f 02 31 6 @x1 7 x1 7 6 6 @f 7 D f @4x25A C Œx1 x2 3 6 7 6 @x1 7 4 @f 5 x3 @x1 2
6.4 Newton’s Iterative Method
211
2
3
@2 f @2 f @2 f 6 @x12 @x1@x2 @x1@x3 7 6 7 2x13 6 2 7 6 @ f @2 f @2 f 7 4 C Œx1 x2 3 6 7 x25 2 6 @x2@x1 7 @x2 @x2@x3 6 7 x3 4 @2 f @2 f @2 f 5 @x3@x1 @x3@x2 @x32 C ::: 2 2 3 3 2 3 2 3 2 3 x1nC1 x1n x1 x1 x1 6 nC1 7 6 7 7 4 5 6 n7 Let 4x25 C 4x25 D 6 4x2 5 and x2 D 4x2 5 x3 x3 x3 x nC1 xn 3
3
Rewriting the Taylor series using the notations used above we get, 02
x1nC1
31
02
x1n
31
B6 7C B6 7C f @4x2nC1 5A D f @4x2n 5A C x1nC1 x1n x2nC1 x2n x3nC1 x3n x3nC1 x3n 2 3 @f 6 @x1 7 0 2 31 6 7 x1n 6 @f 7 6 7 B 6 n 7C nC1 6 7 @at 4x2 5A C x1 x1n x2nC1 x2n x3nC1 x3n 6 @x2 7 6 7 x3n 4 @f 5 @x3 3 @2 f 0 2 31 2 3 @x1 @x3 7 7 x1n x1nC1 x1n 7 6 7C 6 nC1 7 @2 f 7 B C 6x at 6 x2n 7 x2n 7 7B 2 @ 4 5 A 4 5 C ::: @x2 @x3 7 7 x3n x3nC1 x3n @2 f 5 @x3 @x3 02 31 x1nC1 B6 nC1 7C 7C 6 If we want to find out the roots of the above equation, we equate f B @4x2 5A D 0 x3nC1 3 2 x1nC1 6 nC1 7 7 and solve for the vector 6 4x2 5 as shown below x3nC1 2
@2 f 6 @x1 @x1 6 6 2 6 @ f 6 6 @x2 @x1 6 4 @2 f @x3 @x1
@2 f @x1 @x2 @2 f @x2 @x2 @2 f @x3 @x2
212
6 Matlab Illustrations
(Considering only first two terms) 02 n 31 x1 B6 n 7C f @4x2 5A D x1nC1 x1n x3n
x2nC1 x2n
02
nC1
) x1nC1 x2nC1 x2
2 @f 3
0 2 n 31 6 @x1 7 x1 6 @f 7 7 B 6 n 7C 6 7 @at 4x2 5A 6 6 @x2 7 4 @f 5 x3n @x3
x3nC1 x3n
n
D x1n x2n x2
31 n
2 @f 31
6 @x1 7 6 7 B6 n 7C 6 @f 7 f @4x2 5A 6 7 6 @x2 7 4 @f 5 x3n @x3 x1
0
2 n 31 x1 B 6 n 7C @at 4x2 5A x3n
The above mentioned equation is called Newton’s method of computing the roots of the multivariable equation f .x1; x2; x3/. Note:
2
3
0 2 n 31 x1 6 7 6 @f 7 B 6 n 7C Note that 6 @x2 7 @at 4x2 5A is the column vector and hence the inverse mentioned 4 5 @f x3n @f @x1
@x3
in the equation is the pseudo inverse (see Chapter 1 for details). Consider the Taylor series equation as mentioned below 02 n 31 02 nC1 31 x1 x1 B6 n 7C nC1 B6 nC1 7C f @4x2 5A D f @4x2 5A C x1 x1n x2nC1 x2n x3nC1
x3n 2 3 @f 6 @x1 7 0 2 n 31 6 7 x1 6 @f 7 6 7 B 6 n 7C 6 7 @at 4x2 5A C : : : 6 @x2 7 6 7 x3n 4 @f 5 @x3
x3nC1 x3n
6.4 Newton’s Iterative Method
213
2 3 x1 Differentiating with respect to the vector 4x2 5 on both sides we get the following x3 2 3 @f 0 2 n 31 0 2 nC1 31 6 @x1 7 x1 x1 6 7 B 6 nC1 7C 6 @f 7 B 6 n 7C rf @at 4x2 5A D 6 7 @at 4x2 5A C 6 @x2 7 nC1 4 x xn @f 5 3
2
@2 f 6 @x1 @x1 6 6 2 6 @ f 6 6 @x2 @x1 6 4 @2 f @x3 @x1
3
@2 f @x1 @x2 @2 f @x2 @x2 @2 f @x3 @x2 2
@x3 3
@2 f 0 2 n 31 2 nC1 3 @x1 @x3 7 7 x1 x1 x1n 7 2 @ f 7 B 6 n 7C 6 nC1 7 7 @at 4x2 5A 4x2 x2n 5 7 @x2 @x3 7 x3n x3nC1 x3n @2 f 5 @x3 @x3 x1nC1
3
6 7 Also to consider the point 4x2nC1 5 as the extremal point of the equation x3nC1 nC1 31 x1 B 6 nC1 7C rf @at 4x2 5A x3nC1 0 2
f .x1; x2; x3/, then
D 0.
Using the above condition and solving the expression for extremal point we get 2
x1nC1
3
6 nC1 7 4x2 5 D x3nC1
2
@2 f 2 n 3 6 @x1 @x1 6 x1 6 6 n 7 6 @2 f 4x2 5 6 6 @x2 @x1 6 x3n 4 @2 f @x3 @x1
@2 f @x1 @x2 @2 f @x2 @x2 @2 f @x3 @x2
31 2 @f 3 @2 f 0 2 n 31 6 @x1 7 0 2 n 31 @x1 @x3 7 7 x x1 6 7 7 2 7 B 6 1 7C @ f 7 B 6 n 7C 6 @f n 7 7 @at 4x2 5A 6 6 @x2 7 @at 4x2 5A @x2 @x3 7 6 7 7 4 5 x3n x3n @f @2 f 5 @x3 @x3
@x3
Consider the problem of minimizing the function f .x1; x2; x3/ D 2.x1 4/2 C 4.x2 3/2 C 5.x3 6/2
214
6 Matlab Illustrations
Newton’s iteration equation is formulated as shown below. 2 nC1 3 2 n 3 2 3 2 4 x n 4 3 x1 x1 1 1=4 0 0 7 6 n 6 nC1 7 6 n 7 4 5 D 0 1=8 0 4 8 x1 3 5 4x2 5 4x2 5
0 0 1=10 x nC1 xn 12 x n 6 3
3
3
2 3 0 Let us initialize the extremum vector as 405 0 Iterations are performed using Matlab and Error (vs) Iteration table is mentioned below for illustration. Iteration 1 Error
2
3
4
5
7.2 0.2880 0.0115 0.0005 0.0000
Note that Error function converges to zero immediate after reaching fifth iteration.
6.5 Steepest Descent Algorithm Consider the problem of minimizing the function f .x1; x2; x3/ D 2.x1 4/2 C 4.x2 3/2 C 5.x3 6/2 The Linear approximation of the curve f .x1; x2; x3/ can be obtained as follows f .x1 C k 1 ; x2 C k 2 ; x3 C k 3 / @f @f D f .x1; x2; x3/ C k @x1 @x2
2 3 @f 4 1 5 2 @x3 3
.i:e/f .X C k/ D f .X / C krf T 3 1 42 5 is the unit vector and k is some scalar constant 3 The maximum increase in the value of the function f .x1; x2; x3/ (i.e.) f .x1 C 2 3 i 1 h k 1 ; x2 C k2 ; x3 C k3 / f .x1; x2; x3/ occurs when @f @f @f 42 5 2
@x1 @x2 @x3
3 is maximum. We also know from Cauchy-Schwarz inequality (see Chapter 4)
6.5 Steepest Descent Algorithm
215
h that maximum value of the inner product
@f @f @x1 @x2
2 3 i 1 @f 42 5 occurs only when @x3 3
2 @f 3 3 @x1 1 6 @f 7 7 6 4 2 5 D l 4 @x2 5, where ‘l’ is some scalar constant. 3 @f 2
2 @f 3 2 3 @x1 1 6 @f 7 7 Similarly maximum decrease in the function occurs when 42 5 D l 6 4 @x2 5 3 @f @x3 2 n3 x1 Let 4x2n 5 be the current value of the vector X used in the steepest descent iterx2n 2 nC1 3 x1 6 nC1 7 ation algorithm. Then the next best value for the vector X represented as 4x2 5 @x3
x2nC1 such that the vector is in the direction of decreasing function f (X) is given as follows. Error value (vs) Iteration 250
Error value
200
150
100
50
0
0
10
20
30
40
50 60 Iteration
70
Fig. 6.6 Illustration of the convergence of the steepest descent algorithm
80
90
100
216
6 Matlab Illustrations
2
3
@f 6 @x1 7 6 7 6 nC1 7 6 n 7 6 @f 7 7 4x2 5 D 4x2 5 l 6 6 @x2 7 nC1 n 4 x2 x2 @f 5 @x3 For the above problem the iterative equation is obtained as follows 2
2
x1nC1
3
3
2
2
x1n
3
3
2 3 4 .x1 4/ 6 nC1 7 6 n 7 4x2 5 D 4x2 5 l 4 8 .x2 3/ 5 10 .x3 6/ x nC1 xn x1nC1
2
x1n
2
2 3 0 Let us initialize the vector D 405. Iterations are performed using Matlab and the 0 convergence graph is plotted for illustration (Fig. 6.6). The learning rate is chosen as l D 0:01. After 100 iterations, function value 2 3 3:9325 reaches 0.0091 and the corresponding vector obtained is 42:99935. 5:9998
Index
A Algebraic multiplicity, 44–46, 171
B Bacterial foraging, 202–208 Band limited random process, 136, 137, 303 Band pass random process, 144–146, 148–152 Basis, 12–20, 22–26, 28, 34, 36–40, 42, 45, 56–58, 154, 157–159, 162–172, 174, 175, 177–179, 191–193 Bayes theorem, 70 Block multiplication, 4, 5
C Cayley-Hamilton theorem, 167 Central limit theorem, 122 Characteristic polynomial, 42, 167, 171 Chebyshev inequality, 100 Chernoff bound, 101, 106–107 Column space, 14–22, 27–30, 32–38, 42 Complex random process, 127, 128 Complex random variable, 118–119 Conditional probability, 69, 82, 103 Constrained optimization, 181–189, 193–194 Contours, 110–112 Convergence, 119–122, 205, 206, 209, 210, 215, 216 Correlation-coefficient, 101 Correlation matrix, 104, 107, 128 Co-variance matrix, 104, 107–118, 197–203 Cumulative distribution function (cdf), 75–76, 89, 102, 124 Cyclo-stationary process, 140 Cyclo stationary random process, 139–142
D Diagonalization of the matrix, 47–49, 58–60 Direct sum, 28, 33, 160–162, 168, 170–173 Dual optimization problem, 194–195 Dual space, 158–160
E Eigen basis, 168, 191, 192 Eigen space, 168, 169, 171–173 Eigen values, 42–47, 49–54, 57, 61–63, 167–169, 171–173, 189–193, 200, 201 Empty set, 67 Ensemble average, 128–132 Ergodic process, 128–132 Event, 67–74, 75, 78–80, 82, 104, 120 Expectations, 99, 103, 126 Extremum, 181–183, 186, 187, 189, 191, 193, 214
G Gaussian density function, 84, 105–107, 198 Gaussian random process, 138 Generalized Eigen space, 169, 171–173 Geometric multiplicity, 44–46, 171 Gradient vector, 181–183, 186 Gram Schmidt orthogonalization procedure, 37, 38, 40
H Hermitian matrix, 50–52, 56, 57, 61 Hessian matrix, 185, 187–193 Hilbert transformation, 146–150
217
218 I Identity matrix, 5, 7, 10, 11, 17, 29, 33, 36, 42, 52, 58, 167, 176, 199 Image, 24, 75, 157, 172 Independence, 12–13, 70–71 Independent random process, 132 Indicator, 100–101 Injective, 155, 156 Inner product space, 176–177 Invariant space, 168 Isomorphic, 155–157, 162, 164, 167
J Jordon basis, 291
K Kernel, 157, 172, 173, 176, 179 Kuhn-Tucker conditions, 195–196
L LDU decomposition, 7–10 Learning rate, 216 Left Null space, 14, 18–22, 27, 28, 33, 34 Linearly independent, 13, 26, 153, 154, 157, 172, 178 Low pass random process, 145, 148–152
M Markov inequality, 100 Matlab, 197–216 Minimal polynomial, 167–169, 171, 172, 176 Moment generating function, 102–107, 114 Multivariate Gaussian distribution, 197–202 Mutually exclusive set, 67
N Newton’s iterative method, 210–214 Nilpotent transformation, 173–175 Non-deficient matrix, 49, 56–60 Normal matrix, 56–58 Null space, 14, 16–22, 27–29, 31–34, 42, 43, 45, 56, 67, 190–193
O Orthogonal basis, 26, 57, 177–179 Orthogonal complement, 27, 178
Index P Particle swarm optimization, 208–210 Partition, 67, 70, 72 Poison random variable, 114 Positive definite matrix, 189, 191 Power spectral density (PSD), 134–137 Probability density function, 77–79, 82, 85, 89, 90, 92, 93, 102, 103, 105, 110, 111, 115, 122, 126, 128, 199 Probability space, 68, 75, 77, 118 Projection matrix, 35, 36, 38 Pseudo inverse, 63, 64, 66, 212
Q QR factorization, 40–42
R Random process, 123–152 Random variable, 75–80, 82, 84–100, 102–107, 109, 113–115, 118–122, 124–128, 138, 197, 198, 199 Random vector, 102–118, 123, 132, 138, 199, 200, 204 Rank-Nullity theorem, 22, 158 Rayleigh distribution, 198 Riegtz representation, 179 Row space, 14, 18–22, 27–29, 34
S Sample space, 67, 68, 70–72, 120, 123 Schur’s lemma, 49–50, 52 Schwarz inequality, 100 Sequence of random variables, 119–122 Set, 12–16, 18, 19, 26, 36, 39, 40, 47, 57, 64, 67, 68, 75, 76, 86, 95, 123, 128, 138, 153–154, 157–159, 166, 168–170, 176–178, 181, 183, 186, 187, 190, 192, 193, 197, 199, 204 Similar matrices, 26, 45, 164–166 Singular value decomposition (SVD), 36, 60–66 Skew Hermitian matrix, 50–52 Span, 12–15, 189 Square matrix, 5–7, 42, 49, 169 Steepest-descent algorithm, 214–216 Strictly stationary random process, 124–125, 139 Subset, 12, 67, 68, 75, 76, 153 Subspace, 12, 14, 21, 22, 27, 153, 154, 157, 159, 160, 168, 178 Surjective, 155, 156
Index T Tangent vector, 181–185 Taylor series, 181, 210–212 Time average, 129–132 Total probability theorem, 70 Transformation matrix, 24–26, 109, 162–172, 174, 175, 199, 200 Transpose, 5, 22, 49, 50, 52, 60
U Uncorrelated random process, 132 Unitary matrix, 49, 50, 52–58, 60, 62, 65, 66
219 V Vector space, 12–21, 24, 26–28, 33–36, 39, 45, 56–58, 153–154, 157, 158, 160, 162, 163, 166–168, 170–172, 176–179, 189
W White random process, 137–138 Wide sense cyclo stationary random process, 139–142 Wide sense stationary random process, 125–127