235 92 3MB
English Pages 227 [229] Year 2018
MATHEMATICS RESEARCH DEVELOPMENTS
QUATERNION MATRIX COMPUTATIONS
No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in rendering legal, medical or any other professional services.
MATHEMATICS RESEARCH DEVELOPMENTS Additional books and e-books in this series can be found on Nova’s website under the Series tab.
MATHEMATICS RESEARCH DEVELOPMENTS
QUATERNION MATRIX COMPUTATIONS
MUSHENG WEI YING LI FENGXIA ZHANG AND
JIANLI ZHAO
Copyright © 2018 by Nova Science Publishers, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions to reuse content from this publication. Simply navigate to this publication’s page on Nova’s website and locate the “Get Permission” button below the title description. This button is linked directly to the title’s permission page on copyright.com. Alternatively, you can visit copyright.com and search by title, ISBN, or ISSN. For further questions about using the service on copyright.com, please contact: Copyright Clearance Center Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail: [email protected].
NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS. Additional color graphics may be available in the e-book version of this book.
Library of Congress Cataloging-in-Publication Data ISBN: H%RRN
Published by Nova Science Publishers, Inc. † New York
Contents Preface
ix
Acknowledgments
xi
Notations
xiii
1 Preliminaries 1.1. Introduction . . . . . . . . . . . . . . . . . . 1.2. Quaternions . . . . . . . . . . . . . . . . . . 1.3. Quaternion Matrices . . . . . . . . . . . . . 1.4. Eigenvalue Problem . . . . . . . . . . . . . . 1.5. Norms . . . . . . . . . . . . . . . . . . . . . 1.5.1. Vector Norms . . . . . . . . . . . . 1.5.2. Matrix Norms . . . . . . . . . . . . 1.6. Generalized Inverses . . . . . . . . . . . . . 1.7. Projections . . . . . . . . . . . . . . . . . . 1.7.1. Idempotent Matrices and Projections . 1.7.2. Orthogonal Projections . . . . . . . . 1.7.3. Geometric Meanings of AA† and A† A 1.8. Properties of Real Representation Matrices . 2 Computing Matrix Decompositions 2.1. Elementary Matrices . . . . . . . . 2.2. The Quaternion LU Decomposition 2.3. The Quaternion LDLH and Cholesky Decompositions . . . . . . . . . . . 2.4. The Quaternion QR Decomposition
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
1 1 2 4 7 14 14 17 23 25 26 29 30 31
35 . . . . . . . . . . . . . . . 36 . . . . . . . . . . . . . . . 45 . . . . . . . . . . . . . . . 51 . . . . . . . . . . . . . . . 58
vi
Contents 2.4.1. The Quaternion Householder QRD . 2.4.2. The Givens QRD . . . . . . . . . . . 2.4.3. The Modified Gram-Schimit Scheme 2.4.4. Complete Orthogonal Decomposition 2.5. The Quaternion SVD . . . . . . . . . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
59 66 67 69 69
3 Linear System and Generalized Least Squares Problems 3.1. Linear System . . . . . . . . . . . . . . . . . . . . . . 3.2. The Linear Least Squares Problem . . . . . . . . . . . 3.2.1. The LS Problem and Its Equivalent Problems . 3.2.2. The Regularization of the LS Problem . . . . . 3.2.3. Some Matrix Equations . . . . . . . . . . . . 3.3. The Total Least Squares Problem . . . . . . . . . . . . 3.4. The Equality Constrained Least Squares Problem . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
79 79 82 82 85 87 92 96
4 Direct Methods for Solving Linear System and Generalized LS Problems 4.1. Direct Methods for Linear System . . . . . 4.2. Direct Methods for the LS Problem . . . . . 4.3. Direct Methods for the TLS Problem . . . . 4.4. Direct Methods for the LSE Problem . . . . 4.5. Some Matrix Equations . . . . . . . . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
103 104 105 107 108 111
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
5 Iterative Methods for Solving Linear System and Generalized LS Problems 5.1. Basic Knowledge . . . . . . . . . . . . . . . . . . . . . . . . 5.1.1. The Chebyshev Polynomials . . . . . . . . . . . . . . 5.1.2. The Range of Eigenvalues of Real Symmetric Tridiagonal Matrices . . . . . . . . . . . . . . . . . . . . . . 5.2. Iterative Methods for Linear System . . . . . . . . . . . . . . 5.2.1. Basic Theory of Splitting Iterative Method . . . . . . 5.2.2. The Krylov Subspace Methods . . . . . . . . . . . . . 5.3. Iterative Methods for the LS Problem . . . . . . . . . . . . . 5.3.1. Splitting Iterative Methods . . . . . . . . . . . . . . . 5.3.2. The Krylov Subspace Methods . . . . . . . . . . . . . 5.3.3. Preconditioning Hermitian-Skew Hermitian Splitting Iteration Methods . . . . . . . . . . . . . . . . . . . .
119 . 120 . 120 . . . . . . .
121 122 122 132 137 138 143
. 149
vii
Contents 5.4. Iterative Methods for the TLS Problem 5.4.1. The Partial SVD Method . . . 5.4.2. Bidiagonalization Method . . 5.5. Some Matrix Equations . . . . . . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
6 Computations of Quaternion Eigenvalue Problems 6.1. Quaternion Hermitian Right Eigenvalue Problem . . . . . . . 6.1.1. The Power Method and Inverse Power Method for Quaternion Hermitian Right Eigenvalue Problem . . . 6.1.2. Real Structure-Preserving Algorithm of Hermitian QR Algorithm for Hermitian Right Eigenvalue Problem . 6.1.3. Real Structure-Preserving Algorithm of the Jacobi Method for Hermitian Right Eigenvalue Problem . . . 6.1.4. Subspace Methods . . . . . . . . . . . . . . . . . . . 6.2. Quaternion Non-Hermitian Right Eigenvalue Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.1. The Power Method and the Inverse Power Method . . 6.2.2. The Quaternion QR Algorithm for Quaternion Non-Hermitian Right Eigenvalue Problem . . . . . . .
. . . .
151 151 152 154
157 . 158 . 158 . 164 . 168 . 175 . 177 . 177 . 185
References
189
About the Authors
207
Index
209
Preface In 1843, Sir William Rowan Hamilton (1805-1865) introduced quaternion as he tried to extend complex numbers to higher spatial dimensions, and then he spent the rest of his life obsessed with them and their applications. Nevertheless he probably never thought that one day in the future the quaternion he had invented would be used in quaternionic quantum mechanics (qQM) , color image processing and many other fields. About 100 years later, Finkelstein et al built the foundations of qQM and gauge theories. Their fundamental works led to a renewed interest in algebrization and geometrization of physical theories by non-commutative fields. Among the numerous references on this subject, the important paper of Horwitz and Biedenharn showed that the assumption of a complex projection of the scalar product, also called complex geometry Rembielinski, permits the definition of a suitable tensor product between single-particle quaternionic wave functions. Now quaternion becomes to play an important role in many application fields, such as special relativity, group representations, non-relativistic and relativistic dynamics, field theory, Lagrangian formalism, electro weak model, grand unification theories and so on. In 1996, Sangwine proposed to encode three channel components of an RGB image on the three imaginary parts of a pure quaternion. Thus, a color image can be represented by a pure imaginary quaternion matrix. Since then, quaternion representation of a color image has attracted great attention. With the rapid development of applications in the above disciplines, it is necessary to further study numerical computations of quaternion matrices. Sangwine and Le Bihan established a plausible Quaternion Toolbox software for Matlab. During the recent years, there are two kinds of methods proposed for quaternion matrix computations based on real arithmetic operations. The first kind of method is to directly perform computations on the real representation matrices
x
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
of the original quaternion matrices which do not consider the structure of the quaternion matrix. These methods contain two drawbacks: The sizes of the real representation matrices become much larger and stability properties differ from the original quaternion matrices. The second kind of method appeared in the literature was originally suggested by Wei with his former students. At each step of implementations, one can apply quaternion operations to the quaternion matrices, and then use the properties of quaternion matrices and the real representation matrices to obtain real structure-preserving operations. It suffices to perform these operations on the first column block or the first row block of the real representation matrices and to perform high-level performance computations to utilize vector pipelining arithmetic operations. The advantages of these methods are that these operations have the same numbers of real arithmetic operations as in quaternion operations therefore gain much higher CPU speeds than those in the Quaternion Toolbox, and that these operations retain all properties of quaternion matrices. In order to present state-of-the-art technique for quaternion matrix computations, promote the further development of these kinds of problems, and meet the practical requirement of applications, in this monograph we describe real structure-preserving methods for commonly used factorizations such as the LUD, QRD, SVD, direct and iterative methods for solving quaternion linear system, generalized least squares problems, and right eigenvalue problems. For self-contained purpose, we also briefly mention some basic properties of quaternions and quaternion matrices. Some applications to real scientific problems are also included. This monograph can be used as a reference book for scientists, engineers and researchers in color image processing, quaternionic quantum mechanics, information engineering, information security and scientific computing, or a text book in graduate student level in related areas. Musheng Wei Ying Li Fengxia Zhang Jianli Zhao June, 2018
Acknowledgments We are grateful to the publishers, grants, institution and individuals that provided supports when we prepared this monograph. We express our gratitude to Nova Science Publisher for inviting us to write this monograph and publish it by Nova Science Publisher. We thank publishers Elsevier B.V. and Springer Nature for permission that in this monograph we can include some materials and figures contained in articles published by Elsevier B.V. and Springer Nature. Some research material in this monograph was supported by the National Natural Science Foundation of China under grants 11301247 and 11171226, the Natural Science Foundation of Shandong Province, under grant ZR2012FQ005, Science and Technology Project of Department of Education, Shandong Province, P. R. China under grand J15LI10, the Science Foundation of Liaocheng University under grants 31805, 318011318. Liaocheng University supplied all necessary facilities when we prepared manuscript. Dr. Tongsong Jiang and Dr. Fuzhen Zhang pointed out important properties of quaternion matrices. Dr. Zhigang Jia provided useful suggestions for numerical performances of real structure-preserving methods. Finally, this manuscript would have been impossible to produce without the support of the families of the authors.
Notations • e, i, j, k: e is the identity operator, and i, j, k are unit imaginary quaternion. • R, C, Q: Sets of real, complex and quaternion numbers, respectively. • Rm×n , Cm×n , Qm×n : Sets of m × n matrices of real, complex and quaternion elements, respectively. Rn = Rn×1 , Cn = Cn×1 , Qn = Qn×1 . • R0 :
Set of nonnegative numbers.
• Un :
Set of n × n unitary matrices.
• A¯ : Conjugate of the matrix A. • AT : Transpose of the matrix A. • AH : Conjugate transpose of the matrix A (i.e. A¯ T ). • A−1 : The inverse of the matrix A. • A† :
The Moore-Penrose (MP) generalized inverse of the matrix A.
• A > 0 : The matrix A is an Hermitian positive definite matrix. • A ≥ 0 : The matrix A is an Hermitian semi-positive definite matrix. • In : The n × n identity matrix. If there is no confusion occurs, we also denote as I. • 0m×n : The m × n matrix of all zero entries. If there is no confusion occurs, we also denote as 0. • R (A) : The subspace spanned by all columns of the matrix A.
xiv
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao • span (u1 , · · · , ui ) : The subspace spanned by the vectors u1 , · · · , ui . • N (A ) :
The null space of the matrix A.
• PA : The orthogonal projection on R (A ). • rank(A) : The rank of the matrix A. • Re(x) : The real part of the quaternion x. • Im(x) : The imaginary part of the quaternion x. • Co(x) : The complex part of the quaternion x. • ρ(A) : The spectral radius of the matrix A. • λl (A), λr(A) : The sets of left and right eigenvalues of the matrix A, respectively. • σ(A) : • [x]:
The set of the singular values of the matrix A. The equivalent set of the quaternion x.
• σmin + (A) : The minimal positive singular value of the matrix A. • kxk2 :
The Euclidion norm of the vector x.
• kAk2 : The spectral norm of the matrix A. • kAkF : The Frobenius norm of the matrix A.
Chapter 1
Preliminaries 1.1. Introduction In 1843, Sir William Rowan Hamilton (1805-1865) introduced quaternion as he tried to extend complex numbers to higher spatial dimensions, and then he spent the rest of his life obsessed with them and their applications. About 100 years later, Finkelstein et al. built the foundations of qQM and gauge theories. Their fundamental works led to a renewed interest in algebrization and geometrization of physical theories by non-commutative fields. Now quaternion becomes to play an important role in many application fields, such as special relativity, group representations, non-relativistic and relativistic dynamics, field theory, Lagrangian formalism, electro weak model, grand unification theories and so on. In 1996, Sangwine proposed to encode three channel components of an RGB image on the three imaginary parts of a pure quaternion. Thus, a color image can be represented by a pure imaginary quaternion matrix. Since then, quaternion representation of a color image has attracted great attention. Many researchers applied the quaternion matrix to study the problems of color image processing due to its capability to treat the three color channels holistically without losing color information. With the rapid development of applications in the above disciplines, it is necessary to further study algebraic properties and numerical computations of quaternion matrices. In order to present state-of-the-art technique for quaternion matrix computations, promote the further development of these kinds of problems, and meet the practical requirement of applications, in this monograph we describe real structure-preserving methods for commonly used factorizations such as the LUD, QRD, SVD, direct and iterative methods
2
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
for solving quaternion linear system, generalized least squares problems and right eigenvalue problems. In this chapter, we describe basic definitions and properties of quaternions and quaternion matrices. For further study of this topic, we refer to nice surveys of Zhang [209, 210], Jiang [85] and Rodman [151]. In §1.2 we describe basic properties of quaternions; in §1.3 we describe basic properties of quaternion matrices; in §1.4 we introduce the results of the eigenvalues of quaternion matrices; in §1.5 we introduce the quaternion vector norms and the quaternion matrix norms; in §1.6 we discuss the properties of generalized inverses of the quaternion matrices; in §1.7 we describe the corresponding definitions and results about projections; in §1.8 we describe properties of real representation matrices of quaternion matrices.
1.2. Quaternions In this section, we recall some basic properties about quaternion. As usual, let C and R denote the fields of the complex and real numbers, respectively. Let Q be a four-dimensional vector space over R with an ordered basis, denoted by e, i, j and k. A real quaternion, simply called quaternion, is a vector x = x1 e + x2 i + x3 j + x4 k ∈ Q with real coefficients x1 , x2 , x3 , x4 . Besides the addition and the scalar multiplication of the vector space Q over R, the product of any two of the quaternions e, i, j, k is defined by the requirement that e act as an identity and by the table i2 = j2 = k2 = ijk = −1. From the above identities, one can check that ij = −ji = k, jk = −kj = i, ki = −ik = j. If a and b are any (real) scalars, while u, v are any two of e, i, j, k, then the product (au)(bv) is defined as (ab)(uv). These rules, along with the distribution law, determine the product of any two quaternion . Q is clearly an associative but non-commutative algebra of rank four over R, called quaternion skew-field. Real numbers and complex numbers can be thought of as quaternions in the natural way. Thus x1 e + x2 i + x3 j + x4 k is simply written as x1 + x2 i + x3 j + x4 k.
Preliminaries
3
For any x = x1 + x2 i + x3 j + x4 k ∈ Q, we define Re(x) = x1 , the real part of x; Co(x) = x1 + x2 i the complex part of x; Im(x) = x2 i + x3 j + x4 k, the imaginary part of x; x¯ = x1 − x2 i − x2 j − x3 k, the conjugate of x; and q
|x| = x21 + x22 + x23 + x24 , the norm of x. x is said to be a unit quaternion if its norm is 1. From the above definitions, we have the following facts. Theorem 1.2.1. Let x, y and z be quaternions. Then 1. xH x = xxH , i.e., |x| = |xH |; 2. |x|2 + |y|2 = 12 (|x + y|2 + |x − y|2 ); 3. x = |x|u for some unit quaternion u; 4. jc = cj ¯ or jcjH = c¯ for any complex number c; 5. xH ix = (x21 + x22 − x23 − x24 )i + 2(−x1 x4 + x2 x3 )j + 2(x1 x3 + x2 x4 )k if x = x1 + x2 i + x3 j + x4 k;
6. x = 12 (x + xH ) + 21 (x + jxH j) + 21 (x + kxH k), and xH = − 12 (x + ixi + jxj + kxk); 7. x2 = |Re(x)|2 − |Im(x)|2 + 2Re(x)Im(x); 8. (xy)H = yH xH ; 9. (xy)z = x(yz); 10. (x + y)2 6= x2 + 2xy + y2 in general; 11. xH = x if and only if x ∈ R; 12. ax = xa for every x ∈ Q if and only if a ∈ R; 13.
xH |x|2
is the inverse of x if x 6= 0, and if x−1 denotes the inverse of x, then |x−1 | =
1 |x| ;
14. x2 = −1 has infinitely many of quaternion x as solutions; 15. x and xH are solutions of t 2 − 2Re(x)t + |x|2 = 0;
4
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao 16. every quaternion q can be uniquely expressed as q = c1 + c2 j, in which c1 and c2 are complex numbers.
Two quaternion x and y are said to be similar if there exists a nonzero quaternion u such that u−1 xu = y, this is written as x ∼ y. Obviously, x and y are similar if and only if there is a unit quaternion v such that v−1 xv = y, and two similar quaternion have the same norm. It is routine to check that ∼ is an equivalence relation on the quaternion skew-field. We denote by [x] the equivalent class containing x. Lemma 1 , q2 , q3 , q4 ∈ R, then q and q 1.2.2. If q = q1 + q2 i + q3 j + q4 k with qq q1 + q22 + q23 + q24 i are similar, namely, q ∈ [q1 + q22 + q23 + q24 i].
Proof. Consider the equation of quaternion q qx = x(q1 + q22 + q23 + q24 i) (1.2.1) q It is easy to verify that x = ( q22 + q23 + q24 + q2 ) − q4 j + q3 k is a solution to (1.2.1) if q23 + q24 6= 0.
It is readily seen that [x] contains a single element if and only if x ∈ R. If x 6∈ R, then [x] contains infinitely many quaternions, among which there are only two complex numbers that are a conjugate pair, moreover x ∼ xH for every quaternion x. This lemma yields the following theorem. Theorem 1.2.3. (Brenner, 1951; Au-Yeung, 1984). Let x = x1 + x2 i + x3j + x4 k and y = y1 + y2 i + y3 j + y4 k be quaternions. Then x and y are similar if and only if x1 = y1 and x22 + x23 + x24 = y22 + y23 + y24 , i.e., Re(x) = Re(y) and |Im(x)| = |Im(y)|.
1.3. Quaternion Matrices In this section, we introduce concepts, properties and operations of quaternion matrices. Let Qm×n denote the collection of all m × n matrices with quaternion entries. Besides the ordinary addition and multiplication, for A = (ast ) ∈ Qm×n , q ∈ Q, the left (right) scalar multiplication is defined as follows qA = (qast )
(Aq = (ast q)).
Preliminaries
5
It is easy to see that for A ∈ Qm×n , B ∈ Qn×o and p, q ∈ Q, (qA)B = q(AB), (Aq)B = A(qB), (pq)A = p(qA). Moreover Qm×n is a left (right) vector space over Q. All operators for complex matrices can be performed except the ones such as (qA)B = A(qB), in which commutativity is involved. Just as for complex matrices, we associate to A = (ast ) ∈ Qm×n with A¯ = T n×m the transpose of A; and (ast ) = (aH st ), the conjugate of A; and A = (ats ) ∈ Q ¯ T ∈ Qn×m , the conjugate transpose of A. AH = (A) A square matrix A ∈ Qn×n is said to be normal if AAH = AH A; Hermitian H if A = A; unitary if AH A = I, and invertible (nonsingular) if AB = BA = I for some B ∈ Qn×n . Just as with complex matrices, we can define elementary row (column) operations for quaternion matrices and the corresponding elementary quaternion matrices. It is easy to see that applying an elementary row (column) operation on A is equivalent to multiplying A by the corresponding elementary quaternion matrix from left (right) and that any square quaternion matrix can be brought to a diagonal matrix by elementary quaternion matrices. A list of facts follows immediately, some of which are unexpected. Theorem 1.3.1. Let A ∈ Qm×n , B ∈ Qn×o . Then ¯ T = (AT ); 1. (A) 2. (AB)H = BH AH ; 3. AB 6= (A)(B) in general; 4. (AB)T 6= BT AT in general; 5. (AB)−1 = B−1 A−1 if A and B are invertible; 6. (AH )−1 = (A−1 )H if A is invertible; 7. (A)−1 6= A−1 in general;
6
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao 8. (AT )−1 6= (A−1 )T in general.
Many aspects of quaternion matrices, such as rank, linear independence, similarity, characteristic matrix, Gram matrix and determinantal expansion theorem have been discussed in [32–36, 192, 195]. One can define, in the usual sense, the left and right linear independence over Q for a set of quaternion vectors. Definition 1.3.1. For vectors η1 , η2 , · · · , ηr ∈ Qn , we say that η1 , η2 , · · · , ηr are left linearly independent, if for k1 , k2 , · · · , kr ∈ Q, k1 η1 + k2 η2 + · · · + kr ηr = 0 implies k1 = k2 = · · · = kr = 0. Otherwise we say that η1 , η2 , · · · , ηr are left linearly dependent. we say that η1 , η2 , · · · , ηr are right linearly independent, if for k1 , k2 , · · · , kr ∈ Q, η1 k1 + η2 k2 + · · · + ηr kr = 0 implies k1 = k2 = · · · = kr = 0. Otherwise, we say that η1 , η2 , · · · , ηr are right linearly dependent. One can also easily find an example of two vectors which are left linearly dependent but right linearly independent. The Gram-Schmidt process [76] is still effective. The rank of a quaternion matrix A is defined to be the maximum number of columns of A which are right linearly independent, and denoted by rank(A). It is easy to see that for any invertible matrices P and Q of suitable sizes, rank(A) = rank(PAQ). Thus rank(A) is equal to the number of positive singular values of A. If rank(A) = r, then r is also the maximum number of rows of A that are left linearly independent, and A is nonsingular if and only if A is of (full) rank n. Let Qn denote the collection of quaternion vectors with n components. Qn is a right vector space over Q under the addition and the right scalar multiplication. If A ∈ Qm×n is a quaternion matrix, then the solutions of Ax = 0 form a subspace of Qn , and the subspace has dimension r if and only if rank(A) = n − r. Next we introduce the complex representation of a quaternion and a quaternion matrix. Definition 1.3.2. For any quaternion x = x1 + x2 i + x3 j + x4 k = y + zj ∈ Q and quaternion matrix A = A1 + A2 i + A3 j + A4 k = B1 + B2 j ∈ Qm×n , in which
Preliminaries
7
y = x1 + x2 i, z = x3 + x4 i and B1 = A1 + A2 i, B2 = A3 + A4 i, the complex representation of x is defined as √ √ x1 + x2 √−1 −x3 + x4√ −1 y −z C = ∈ C2×2 , x = x3 + x4 −1 x1 − x2 −1 z y the complex representation of A is defined as B1 −B2 C A = ∈ C2m×2n . B2 B1 From the definition of the complex representation of a quaternion matrix, we have the following results. Theorem 1.3.2. 1. If A, B ∈ Qm×n , a ∈ R, then (A + B)C = AC + BC , (aA)C = aAC ; 2. If A ∈ Qm×n , B ∈ Qn×s , then (AB)C = AC BC , (AH )C = (AC )H ; 3. If A ∈ Qn×n , then A is invertible if and only if AC is invertible, and (AC )−1 = (A−1 )C ;
4. If A ∈ Qn×n , then A is unitary if and only if AC is unitary . From Theorem 1.3.2, we have Theorem 1.3.3. Qm×n ∼ = (Qm×n )C ⊆ C2m×2n ,
here ∼ = denotes the isomorphism.
1.4. Eigenvalue Problem In this section, we turn our attention to the eigenvalues of quaternion matrices. Since left and right scalar multiplications are different, we need to treat Ax = λx and Ax = xλ, separately. A quaternion λ is said to be a left (right) eigenvalue provided that Ax = λx(Ax = xλ). The set {λ ∈ Q : Ax = λx, for some x 6= 0} is called the left spectrum of A, denoted by λl (A). Similarly, the set {λ ∈ Q : Ax = xλ, for some x 6= 0} is called the right spectrum of A, denoted by λr (A).
8
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Theorem 1.4.1. Let A ∈ Qn×n be an upper triangular matrix. Then a quaternion λ is a left eigenvalue of A if and only if λ is a diagonal entry. The proof is basically the same as in the complex case. The situation of general matrices is much more complicated. Generally speaking, there is no very close relation between left and right eigenvalues. For real matrices, however, we have the following theorem. Theorem 1.4.2. If A is a real n × n matrix, then the left and right eigenvalues of A coincide; that is, λl (A) = λr (A). Proof. Let λ be a left eigenvalue of A, i.e., Ax = λx for some x 6= 0. For any quaternion q 6= 0, we have (qAq−1)qx = (qλq−1 )qx and Aqx = (qλq−1 )qx, since A is real. Taking 0 6= q ∈ Q such that qλq−1 is a complex number and writing qx = y = y1 + y2 j, we have Ay1 = y1 qλq−1 and Ay2 = y2 qλq−1 . It follows that λ is a right eigenvalue of A. Similarly one can prove that every right eigenvalue is also a left eigenvalue. For any given matrix A ∈ Qn×n , does there always exist λ ∈ Q and nonzero column vector x of quaternion such that Ax = λx? This is of course a very basic question. It was raised in [35] and was later proved by Wood [196]. Theorem 1.4.3. (Wood, 1985). Every n-by-n quaternion matrix has at least one left eigenvalue in Q. An elementary proof for the cases of 2-by-2 and 3-by-3 matrices has been recently obtained [153]. On the contrast, the right eigenvalues have been well studied. Since they are invariant under the similarity transformation, right eigenvalues are more useful. Their existence and properties are as follows. Lemma 1.4.4. Suppose that u1 ∈ Qn is unit. Then there exist unit vectors u2 , · · · , un ∈ Qn such that u1 , u2 , · · · , un is an orthogonal set, i.e., uH s ut = 0, s 6= t. Theorem 1.4.5. Given A ∈ Qn×n .
Preliminaries
9
If λ ∈ Q is a right eigenvalue of A. Then for any nonzero q ∈ Q, q−1 λq is also a right eigenvalue of A. ¯ is also a right eigenvalue of A. If λ ∈ C is a right eigenvalue of A. Then λ From Theorem 1.4.5, we know that if λ is a non-real right eigenvalue of A, so is any element in [λ]. Therefore A has finite eigenvalues if and only if all right eigenvalues of A are real. On the other hand, for A ∈ Qn×n , there must exist a complex right eigenvalue with nonnegative imaginary part. Moreover, we have the result as follows. Theorem 1.4.6. (Brenner, 1951; Lee, 1949). Any n-by-n quaternion matrix A has exactly n right eigenvalues which are complex numbers with nonnegative imaginary parts. Those eigenvalues are called the standard right eigenvalues of A. We describe the following quaternion matrix decomposition related to right eigenvalues. Theorem 1.4.7. (The Jordan canonical decomposition ) Suppose that λ1 , · · · , λr are different right eigenvalues of A ∈ Qn×n , in which the algebraic multiplicity of the right eigenvalues are n(λ1 ), · · · , n(λr), respectively. Then there exists a nonsingular matrix P ∈ Qn×n such that P−1 AP = J ≡ diag(J1 (λ1 ), · · · , Jr (λr )), in which
(1)
(k )
Ji (λi ) = diag(Ji (λi ), · · · , Ji i (λi )) ∈ Qn(λi )×n(λi) ,
(k) Ji (λi )
=
λi
1 .. .
ki
∈ Qnk (λi )×nk (λi ) , 1 ≤ k ≤ ki , .. . 1 λi ..
.
(1.4.1) (1.4.2)
(1.4.3)
∑ nk (λi) = n(λi), 1 ≤ i ≤ r,
k=1 (k)
and except the order of Ji (λi )(1 ≤ k ≤ ki , 1 ≤ i ≤ r), J is unique. (k) All matrices Ji (λi )(1 ≤ k ≤ ki , 1 ≤ i ≤ r) in (1.4.3) are called Jordan blocks, and the matrix J is called the Jordan canonical form of A.
10
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Theorem 1.4.8. (The Schur decomposition) Suppose that A ∈ Qn×n , then there exists a unitary matrix U ∈ Un such that U H AU = T,
(1.4.4)
in which T is an upper triangular matrix. Moreover, by selecting U appropriately, the diagonal elements of T can be arranged in any prescribed order. From Theorem 1.4.8, we have the following corollary. Corollary 1.4.9. 1. A is normal ⇔ There exists a unitary matrix U ∈ Un such that U H AU = diag(λ1 , · · · , λn ), i.e., there exists an orthonormal basis of Qn , which is consisted of right eigenvectors of A. 2. A is Hermitian ⇔ A is normal, and the right eigenvalues of A are all real. Thus all the right eigenvalues are also left eigenvalues of A, and λr (A) ⊆ λl (A). 3. A is unitary ⇔ A is normal, and λr (A) ⊆ ϕ ≡ {λ ∈ Q : |λ| = 1} Example 1.4.1. Let A =
0 i −i 0
. Then
λr (A) = {1, −1} and λl (A) = {λ : λ = α + βj + γk}, α2 + β2 + γ2 = 1. A is Hermitian, 1, −1, j and k are left eigenvalues. Thus λr (A) ⊂ λl (A). Theorem 1.4.10. Let A ∈ Qn×n be Hermitian. Then there exists a unitary matrix U ∈ Un such that U H AU = diag(λ1, λ2 , · · · , λn ), in which λ1 > λ2 > · · · > λn are all n right eigenvalues of A.
Preliminaries
11
The number n(λi ) of all standard right eigenvalues of A which are the same as λi is called algebra multiplicity of λi . The number ki of all Jordan blocks ( j) Ji (λi ) for λi is called geometry multiplicity of λi . From the eigen decomposition of semi-positive definite quaternion Hermitian matrix, we can derive the important singular value decomposition (SVD) of a quaternion matrix A, and generalizations of the SVD for complex matrices. The SVD of a quaternion matrix was described by Zhang [210]. Now the quaternion SVD has been exploited generally in reduced-rank signal processing in which the idea is to extract the significant parts of a signal, and it is also an effective algebraic feature extraction method for any colour image in image pattern recognition, therefore the quaternion SVD has gained much interest. Theorem 1.4.11. (The SVD) For given A ∈ Qm×n with r > 0, there exist unitary r matrices U ∈ Um and V ∈ Un such that Σ1 0 r U H AV = 0 0 m−r , (1.4.5) r n−r in which Σ1 = diag(σ1 , · · · , σr ), σ1 ≥ · · · ≥ σr > 0. Proof. Since AH A ∈ Qn×n is a semi-positive definite Hermitian matrix, its r right eigenvalues are nonnegative numbers, denoted as σ21 , · · · , σ2n , satisfying σ1 ≥ · · · ≥ σr > 0, σr+1 = · · · = σn = 0. Let v1 , · · · , vn be orthogonal right eigenvectors of AH A, associated with σ21 , · · · , σ2n , respectively. Write V1 = (v1 , · · · , vr ), V2 = (vr+1 , · · · , vn ), V = (V1 ,V2), Σ1 = diag(σ1 , · · · , σr ). Then we have AH AV1 = V1 Σ21 , V1H AH AV1 = Σ21 , AH AV2 = 0, V2H AH AV2 = 0. H m×(m−r) Therefore AV2 = 0. Let U1 = AV1Σ−1 such 1 , then U1 U1 = Ir . Take U2 ∈ Q that U = (U1 ,U2) is a unitary matrix. Then H H U1 AV1 U1H AV2 U1 U1 Σ1 0 Σ1 0 U H AV = = = . U2H AV1 U2H AV2 U2H U1 Σ1 0 0 0
The decomposition in (1.4.5) is called the singular value decomposition (SVD) of the matrix A, σ1 ≥ · · · ≥ σr > 0 = σr+1 = · · · = σl are called the
12
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
singular values of A, in which l = min{m, n}, u j and v j are left and right singular vectors of A associated with σ j , j = 1 : l. The set of all the singular values of A is denoted as σ(A). By applying the SVD of quaternion matrix, we can derive quaternion CS decomposition and the quaternion quotient SVD (Q-SVD) which can be used to study some generalized LS problems. Theorem 1.4.12. (CS decomposition) Suppose that W ∈ Un . Partition W into the following form W11 W12 r1 W= W21 W22 r2 , (1.4.6) c1 c2 in which r1 + r2 = c1 + c2 = n. Then we have the following decomposition H 0 U1 0 D11 D12 V1 , (1.4.7) W= 0 U2 D21 D22 0 V2H in which U1 ∈ Ur1 , U2 ∈ Ur2 , V1 ∈ Uc1 , V2 ∈ Uc2 , 0H I S S C I 0 D11 D12 C = D21 D22 I 0S S −C I 0CH and
,
C = diag(c1 , c2 , · · · , cl ), 1 > c1 ≥ c2 ≥ · · · ≥ cl > 0,
S = diag(s1 , s2 , · · · , sl ), 0 < s1 ≤ s2 ≤ · · · ≤ sl < 1,
(1.4.8)
(1.4.9)
in which C and S satisfy
C2 + S2 = Il . Proof. Let the SVD of W11 be W11 = U1 D11V1H , in which U1 and V1 are uniH H tary matrices of appropriate sizes. Since W is unitary, W11 W11 + W21 W21 = Ic1 . H W and W H W are Hermitian semi-positive definite maNotice that both W11 11 21 21 H H W11 and W21 W21 are in [0, 1], thus D11 is as trices, all the singular values of W11 mentioned in Theorem 1.4.12. Since 0t , (W21V1 )H (W21V1 ) = I − DH I −C2 11 D11 = I
13
Preliminaries
the columns of W21V1 are orthogonal, and the first t columns of W21V1 are zero vectors. Normalize the last c1 − t columns of W21V1 , and expand it as an or0 0 thonormal basis of Cr2 , denoted as U2 ∈ Ur2 , then (U2 )H W21V1 = D21 . Similarly, there exists a unitary matrix V2 ∈ Uc2 such that U1H W12V2 = D12 . Then we obtain 0H I S C S I 0C D11 D12 . = D21 D22 X11 X12 X13 0S S X21 X22 X23 I X31 X32 X33 0
The above matrix is unitary, in which D22 = (U2 )H W22V2 . Since the last row block and the last column block are unit, we have Xi3 = 0 for i = 1 : 3, X3 j = 0 for j = 1 : 3. Since the second and forth column blocks are mutually perpendicular, and the second and forth row blocks are mutually perpendicular, we have X21 = 0, X12 = 0. Since the second and fifth column blocks are mutually perpenH dicular, we have CH S+SH X22 = 0, and so X22 = −C. Finally we have X11 X11 = I 0 H and X11 X11 = I, and so X11 is unitary. By choosing U2 = U2 diag(X11, I, I), we complete the proof of the theorem. Theorem 1.4.13. (The Q-SVD) Suppose that A ∈ Qm×n , B ∈ Q p×n ,CH = (AH , BH ) are given with k = rank(C). Then there exist unitary matrices U ∈ Um ,V ∈ U p ,W ∈ Uk and Q ∈ Un such that U H AQ = ΣA (W H ΣC , 0),
V H BQ = ΣB (W H ΣC , 0),
(1.4.10)
in which ΣC = diag(σ1 (C), · · · , σk (C)), σ1 (C), · · · , σk (C) are positive singular values of C, ΣA , ΣB are as follows IA 0B , , SA SB ΣA = ΣB = (1.4.11) 0A IB r s k−r−s r s k−r−s in which IA , IB are identity matrices, 0A , 0B are zero matrices or null, SA = diag(αr+1 , · · · , αr+s), SB = diag(βr+1 , · · · , βr+s), 1 > αr+1 ≥ · · · ≥ αr+s > 0, 0 < βr+1 ≤ · · · ≤ βr+s < 1, α2i + β2i = 1, i = r + 1 : r + s.
14 Proof.
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao Since rank(C) = k, the SVD of C is ΣC 0 H P CQ = , 0 0
in which P ∈ Um+p , Q ∈ Un . Partition P as P11 P12 P= , P21 P22 in which P11 ∈ Qm×k . Let the CS decomposition of P be H U P11 ΣA W= , P21 ΣB VH in which U ∈ Um ,V ∈ U p ,W ∈ Uk , ΣA , ΣB are as in (1.4.11). Then A P11 ΣC 0 UΣAW H ΣC 0 Q= = . B P21 ΣC 0 V ΣBW H ΣC 0 Remark 1.4.1. The CS decomposition of a complex unitary matrix in the form (1.4.7) was provided by Paige and Sanders [140]. The Q-SVD (G-SVD) of two complex matrices was provided by Van Loan [169].
1.5. Norms In this section, we discuss definitions and theoretical properties of vector norms and matrix norms. Norm is an important concept in matrix theory and numerical algebra, which are needed in discussing linear system and generalized least squares problems.
1.5.1.
Vector Norms
The vector norm on Qn is a generalization of the norm of quaternion. Definition 1.5.1. A vector norm on Qn is a function ν : Qn → R0 with the following properties, for any x, y ∈ Qn , a ∈ Q, (1) x 6= 0 ⇒ ν(x) > 0 ( positivity), (2) ν(ax) = ν(xa) = |a|ν(x) (homogeneity), (3) ν(x + y) ≤ ν(x) + ν(y) (triangular inequality).
(1.5.1)
15
Preliminaries From Definition 1.5.1, it is easy to check ν(0) = 0,
ν(−x) = ν(x), |ν(x) − ν(y)| ≤ ν(x − y), ∀x, y ∈ Qn .
For any x = (ξ1 , · · · , ξn )T ∈ Qn , n
1
kxk p = ( ∑ |ξi | p ) p , 1 ≤ p ≤ ∞.
(1.5.2)
i=1
k · k p are called Holder ¨ norms (or p−norms ) on Qn . Frequently used p norms are kxk1 , kxk2 , kxk∞ , in which kxk∞ = max |ξi |, ∀x = (ξ1 , · · · , ξn )T ∈ Qn . 1≤i≤n
We can construct some new norms by known norms. For example, suppose that µ is a vector norm on Qm , A ∈ Qnm×n . Then from ν(x) = µ(Ax), x ∈ Qn , ν is also a vector norm on Qn . Another example is that, suppose that A ∈ Qn×n is a positive definite matrix. Then from √ ν(x) = xH Ax, x ∈ Qn , ν is also a vector norm on Qn . Measurement derived from k · k2 on Qn is called Euclidean measure, i.e., for any vectors x = (ξ1 , · · · , ξn )T and y = (η1 , · · · , ηn )T in Qn , we define s n
kx − yk2 =
∑ |ξi − ηi |2
i=1
as the distance of x, y. Therefore Qn is a metric space. Lemma 1.5.1. ous in x ∈ Qn . Proof.
Suppose that ν is a vector norm on Qn . Then ν(x) is continu-
Suppose that ei is the ith column of In . Denote γ =
r
n
∑ ν2 (ei ) > 0. i=1
Then for any x = (ξ1 , · · · , ξn )T , y = (η1 , · · · , ηn )T ∈ Qn , from the properties of
16
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
norms and Cauchy inequality, we have n
n
|ν(y) − ν(x)| ≤ ν(y − x) = ν( ∑ (ηi − ξi )ei ) ≤ ∑ |ηi − ξi |ν(ei) i=1 r i=1 r n n ≤ ∑ |ηi − ξi |2 × ∑ ν2 (ei ) = γky − xk2 . i=1
i=1
For any ε > 0, take δ = γε , if ky − xk2 < δ, we have |ν(y) − ν(x)| < ε. Lemma 1.5.2. Qn . Proof.
The set F∞ = {x ∈ Qn : kxk∞ = 1} is a bounded close set in
For any x = (ξ1 , · · · , ξn )T ∈ F∞ , from s n √ √ kxk2 = ∑ |ξi |2 ≤ nkxk∞ = n, i=1
we know F∞ is a bounded set in Qn . Suppose that {x(k)} is a countable set of points in F∞, and lim kx(k) − xk2 = k→∞
0. Obviously, there exists a countable subset {x(ki) } of {x(k)}, such that for some (k ) j(1 ≤ j ≤ n), the norms of the jth components, ξ j i , of x(ki ) are 1, i = 1, 2, · · · . Suppose that the jth component of x is ξ j , then from (k )
(k )
|1 − |ξ j || = ||ξ j i | − |ξ j || ≤ |ξ j i − ξ j | ≤ kx(ki ) − xk2 → 0 (i → ∞), we obtain |ξ j | = 1. Moreover, for any i = 1 : n and k = 1, 2, · · ·, we have (k) (k) |ξi | ≤ 1. Then |ξi | = lim |ξi | ≤ 1. Therefore x ∈ F∞, i.e., F∞ is a bounded
close set in Qn .
k→∞
Theorem 1.5.3. Suppose that ν and µ are any two norms on Qn . Then there exist positive numbers r1 , r2 , which only depend on ν and µ, such that r1 ν(x) ≤ µ(x) ≤ r2 ν(x),
∀x ∈ Qn .
(1.5.3)
Proof. When x = 0, (1.5.3) is obvious. Suppose that 0 6= x = (ξ1 , · · · , ξn )T ∈ Qn . Then n
n
µ(x) = µ( ∑ ξi ei ) ≤ ∑ |ξi |µ(ei) ≤ p2 kxk∞ , i=1
i=1
Preliminaries
17
n
in which p2 = ∑ µ(ei ) > 0 and it is not related to x. On the other hand, µ is i=1
a continuous function on Qn , and F∞ is a bounded close set on Qn , therefore µ attain the minimum p1 = min µ(y) > 0 on F∞ . Let y = kxkx ∞ , then y ∈ F∞ . y∈F∞
Therefore
µ(x) = µ(ykxk∞) = kxk∞ µ(y) ≥ p1 kxk∞ . Then we have p1 kxk∞ ≤ µ(x) ≤ p2 kxk∞ .
(1.5.4)
Similarly, we can obtain the following inequalities q1 kxk∞ ≤ ν(x) ≤ q2 kxk∞ ,
(1.5.5)
in which the positive numbers p1 , p2 , q1 , q2 are not related to x. From (1.5.4) and (1.5.5), we obtain 0 < r1 =
Example 1.5.1.
p1 µ(x) p2 ≤ ≤ = r2 . q2 ν(x) q1
Suppose that x ∈ Qn . then kxk∞ ≤ kxk1 ≤ nkxk∞, √1 kxk1 ≤ kxk2 ≤ kxk1 , n √1 kxk2 ≤ kxk∞ ≤ kxk2 . n
1.5.2.
Matrix Norms
We can obtain the concept of matrix norms on Qm×n by generalizing the vector norms on Qn . Definition 1.5.2. A matrix norm on Qm×n is a function ν : Qm×n → R0 with the following properties, for any α ∈ Q, (1) A 6= 0 ⇒ ν(A) > 0 ( positivity), (2) ν(αA) = ν(Aα) = |α|ν(A) (homogeneity), (3) ν(A + B) ≤ ν(A) + ν(B) (triangular inequality), in which A, B are any matrices in Qm×n .
(1.5.6)
18
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao From Definition 1.5.2, it is easy to see ν(0) = 0,
Example 1.5.2.
ν(−A) = ν(A),
|ν(A) − ν(B)| ≤ ν(A − B).
For A = (ai j ) ∈ Qm×n , suppose that r kAkF = ∑ |ai j |2 . i, j
kAkF is called the Frobenius norm of A, F-norm of A for short. F-norm is a direct generalization of Euclidean norm of vectors. Example 1.5.3.
For A = (ai j ) ∈ Qm×n , suppose that kAk0∞ = max |ai j |. i, j
The norm k · k0∞ is a direct generalization of the vector norm k · k∞. Example 1.5.4.
For A = (ai j ) ∈ Qm×n , suppose that kAkα =
1 m n ∑ ∑ |ai j |. n i=1 j=1
The norm k · kα is a direct generalization of the vector norm k · k1 .
Note that if A ∈ Qm×n is viewed as a vector Qmn , then any matrix function on Qm×n can be viewed as a vector function on Qmn . From Theorem 1.5.3, we have the following result. Theorem 1.5.4. Suppose that ν and µ are norms on Qm×n . Then there exist positive constants s1 and s2 which only depend on ν, µ, such that s1 ν(A) ≤ µ(A) ≤ s2 ν(A),
∀A ∈ Qm×n .
In numerical analysis, it is very convenient to use consistent norms as defined below. Definition 1.5.3. Suppose that ρ : Qm×k → R0 , µ : Qm×n → R0 , ν : Qn×k → R0 are all matrix norms. If for any A ∈ Qm×n , B ∈ Qn×k , (4) ρ(AB) ≤ µ(A)ν(B) (consistency), then we say µ, ν and ρ are consistent. In particular, if the norm ν on Qm×m satisfies ν(AB) ≤ ν(A)ν(B), for all A, B ∈ Qm×n , we say ν is self-consistent, or consistent for short.
19
Preliminaries
The norm k · kF is consistent, while norms k · k0∞ and k · kα are not consistent norms. Theorem 1.5.5. Suppose that k · k : Qn×n → R0 is a consistent matrix norm. Then there must exist a vector norm ν on Qn which is consistent with k · k. Proof.
For any nonzero vector a ∈ Qn . We define ν(x) = kxaH k,
x ∈ Qn .
It is easy to check that ν is a vector norm on Qn , and from ν(Ax) = kAxaH k ≤ kAk kxaH k = kAkν(x) we know that ν is consistent with k · k. The consistent matrix norms have a property as follows. Theorem 1.5.6. Suppose that k · k : Qn×n → R0 is a consistent matrix norm. Then for any A ∈ Qn×n , we have |λ| ≤ kAk, ∀ λ ∈ λl (A) or λ ∈ λr (A). Proof. When A = 0, the assertion is obvious. Suppose that A 6= 0. According to Theorem 1.5.5, there exists a vector norm ν on Qn , which is consistent with k · k. Suppose that λ ∈ λl (A), and x is the associated left eigenvector, i.e., Ax = λx,
x ∈ Qn ,
x 6= 0.
Then from |λ|ν(x) = ν(λx) = ν(Ax) ≤ kAkν(x) and ν(x) > 0, we have |λ| ≤ kAk. In a similar manner, we can prove the assertion for λ ∈ λr (A). Matrix norms can be classified from different aspects. Operator norms and unitary invariant norms are most frequently used. We now discuss these matrix norms.
20
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Theorem 1.5.7. Suppose that µ : Qm → R0 and ν : Qn → R0 are vector norms. We define k · kµ,ν : Qm×n → R0 , kAkµ,ν = sup x∈Qn x6=0
µ(Ax) = maxn µ(Ax). x∈Q ν(x) ν(x)=1
(1.5.7)
Then k · kµ,ν is a matrix norm on Qm×n . x Proof. For any x ∈ Qn , x 6= 0, let y = ν(x) . Then ν(y) = 1. Obviously, µ(Ay) is a continuous function in y, and it attains the maximum on {y ∈ Qn : ν(y) = 1}. Therefore, the second equality in (1.5.7) is meaningful, and
µ(Ax) ≤ kAkµ,νν(x), ∀A ∈ Qm×n , ∀x ∈ Qn .
(1.5.8)
Next we prove that k · kµ,ν satisfies items (1), (2) and (3) in Definition 1.5.2. (1) Positivity. Suppose that A 6= 0. then there exists a positive integer i ≤ n such that Aei 6= 0. Therefore 0 < µ(Aei) ≤ kAkµ,νν(ei ) ⇒ kAkµ,ν > 0. (2) Homogeneity. For any α ∈ Q, we have kαAkµ,ν = maxn µ(αAx) = maxn |α|µ(Ax) = |α|kAkµ,ν. x∈Q ν(x)=1
x∈Q ν(x)=1
kAαkµ,ν = maxn µ(Aαx) = kAαkµ,νν(x) = |α|kAkµ,ν. x∈Q ν(x)=1
(3) Triangular inequality. For any A, B ∈ Qm×n , suppose that x satisfies ν(x) = 1 and µ((A + B)x) = kA + Bkµ,ν . Then kA + Bkµ,ν = µ((A + B)x) ≤ µ(Ax) + µ(Bx) ≤ kAkµ,ν ν(x) + kBkµ,νν(x) = kAkµ,ν + kBkµ,ν . Definition 1.5.4. Suppose that µ and ν are norms on Qm and Qn , respectively. k · kµ,ν : Qm×n → R0 defined in (1.5.7) is called the operator norm on Qm×n , or subordinate matrix norm to µ, ν.
21
Preliminaries
We have the following result about the consistent property of operator norms. Theorem 1.5.8. Suppose that µ, ν and ω are norms on Qm , Qn and Qk , respectively. We define operator norms k · kµ,ν , k · kν,ω and k · kµ,ω on Qm×n , Qn×k and Qm×k , respectively, according to (1.5.8). Then kABkµ,ω ≤ kAkµ,ν kBkν,ω, Proof. (1.5.8),
∀A ∈ Qm×n , ∀B ∈ Qn×k .
(1.5.9)
Suppose that u ∈ Qk , ω(u) = 1, and µ(ABu) = kABkµ,ω. Then from kABkµ,ω = µ(ABu) = µ(A(Bu)) ≤ kAkµ,νν(Bu) ≤ kAkµ,νkBkν,ω ω(u) = kAkµ,νkBkν,ω .
Corollary 1.5.9. Suppose that ν is a norm on Qn . Then the subordinate matrix norm k · kν to ν on Qn×n is consistent, i.e., kABkν ≤ kAkνkBkν ,
∀A, B ∈ Qn×n .
Theorem 1.5.10. Suppose that ν is a norm on Qn , k · kν is a matrix norm on Qn×n subordinate to ν. Suppose that k · k is any matrix norm on Qn×n consistent with ν. Then kAkν ≤ kAk, ∀A ∈ Qn×n . Proof.
Suppose that x ∈ Qn satisfies ν(x) = 1, and ν(Ax) = kAkν. Then kAkν = ν(Ax) ≤ kAkν(x) = kAk.
For p = 1, 2, ∞, k · k p are the most frequently used operator norms on Qm×n , kAk p = max kAxk p , kxk p=1
A ∈ Qm×n .
Moreover from Theorem 1.5.8, we know that these operator norms are all consistent. We describe the following properties of matrix norms k · k1 , k · k2 and k · k∞.
22
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Example 1.5.5.
Suppose that A = (ai j ) ∈ Qm×n . Then m
kAk1 = max
∑ |ai j |,
1≤ j≤n i=1 n
kAk∞ = max
∑ |ai j |,
1≤i≤m j=1
kAk2 =
q λmax (AH A) = max σi (A). i
kAk1 and kAk∞ are called the column-sum norm and row-sum norm, respectively. kAk2 is called the spectral norm. We can prove that kAk2 = max |yH Ax|, kxk2 =1 kyk2 =1
kAH k2 = kAT k2 = kAk2, kAH Ak2 = kAk22 , kAk22 ≤ kAk1 kAk∞, and for any unitary matrices U ∈ U m and V ∈ U n , we have kUAVk2 = kAk2 . Suppose that A ∈ Qn×n . Then ρ(A) = max{|λ| : λ ∈ λr (A)} is called the spectral radius of A. From Theorem 1.5.6, if k · k is a consistent norm on Qn×n , then ρ(A) ≤ kAk, ∀A ∈ Qn×n . The unitary invariant norms play an important role in research of generalized least squares problems. Definition 1.5.5. A nonnegative real-value function on Qm×n k · k is called unitary invariant norm, if it satisfies the following two conditions and (1)-(3) in Definition 1.5.2, (4) kUAVk = kAk, ∀U ∈ Um , ∀V ∈ Un , (5) kAk = kAk2 , ∀A with rank(A) = 1. The spectral norm k · k2 and the F-norm k · kF are unitary invariant. While k · k1 and k · k∞ are not.
23
Preliminaries
1.6.
Generalized Inverses
We now discuss generalized inverse matrices which are frequently used. Suppose that A ∈ Qn×n . If A is nonsingular, then there exists a unique inverse matrix of A, A−1 , satisfies AA−1 = A−1 A = In . However, if matrix A is singular or rectangular, its inverse does not exist. In these cases, we should study generalized inverse matrices of A. The concept of the generalized inverse matrix was introduced by Moore [135]. For given A ∈ Cm×n , we say X ∈ Cn×m is the generalized inverse of A if it satisfies (1.6.1) AX = PR (A) , XA = PR (X) , denoted by A† , in which PR (A) and PR (X) are orthogonal projection operators on R (A) and R (X), respectively. Penrose [146] defined the generalized inverse X ∈ Cn×m of A ∈ Cm×n via the following four matrix equations (1) AXA = A; H
(3) (AX) = AX;
(2) XAX = X; (4) (XA)H = XA.
(1.6.2)
In fact, the two definitions are equivalent. These definitions can be extended to the quaternion matrices. Moreover the generalized inverse matrix satisfy the definition is unique. Theorem 1.6.1. Suppose that A ∈ Qm×n . Then there is only one solution n×m X ∈Q to (1.6.2). Proof.
Uniqueness. Suppose that X, Y are two solutions to (1.6.2). Then Y
= YAY = AH Y H Y = (AXA)HY H Y = (XA)H (YA)HY = XAY = XAXAY = XX H AH Y H AH = XX H AH = XAX = X.
Existence. Obviously, if A = 0, X = 0 is the only solutionto (1.6.2). If Σ1 0 rank(A) > 0, suppose that the SVD of A is A = U V H , in which 0 0 U and V are unitary, Σ1 = diag(σ1 , · · · , σr ) > 0. Let −1 Σ1 0 X =V UH. 0 0 It is easy to show that X satisfy (1.6.2).
24
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Definition 1.6.1. Suppose that A ∈ Qm×n . The solution X to (1.6.2) is called the Moore-Penrose inverse of A, referred to as the MP inverse, denoted by A† . The equations in (1.6.2) are called the definition equations of the MP inverse. Remark 1.6.1. From (1.6.2), when A is nonsingular, A† = A−1 . Therefore, † the MP inverse A is a generalization of inverse of nonsingular matrix. We can define different generalized inverses by one or several equations in (1.6.2). If η = {i, j, k} is a subset of {1, 2, 3, 4}, matrix X ∈ Qn×m satisfying the ith, jth and kth equations in (1.6.2) is called a η - inverse of A, denoted by Aη . If A is singular, Aη = {X ∈ Qn×m : X = Aη } is a set of matrices consisting infinite number of matrices. Suppose that rank(A) = r > 0. The SVD of A ∈ Qm×n is Σ1 0 A =U VH, 0 0 in which U,V are unitary, Σ1 = diag(σ1 , · · · , σr ). It is easy to check that −1 Σ1 K (1) A =V UH, L M −1 Σ1 K (1,2) A =V UH, L LΣ K 1 −1 Σ1 0 (1,2,3) A =V UH, L 0 −1 Σ1 K (1,2,4) A =V UH, 0 0 −1 Σ1 0 † A =V U H = A(1,2,4)AA(1,2,3), 0 0 in which K, L, M are arbitrary matrices with appropriate sizes. Now we describe the properties of the MP inverse as follows 1. (A† )† = A. 2. (AH )† = (A† )H . 3. rank(A) = rank(A† ) = rank(A† A). 4. (AAH )† = AH† A† ,
(AH A)† = A† AH† .
Preliminaries 5. (AAH )† AAH = AA† ,
25
(AH A)† AH A = A† A.
6. A† = (AH A)† AH , A† = AH (AAH )† . m×n In particular, if A ∈ Qnm×n , then A† = (AH A)−1 AH . If A ∈ Qm , then † H H −1 A = A (AA ) .
7. If U,V are unitary, then (UAV)† = V H A†U H . Remark 1.6.2. Many properties of A† are not same as A−1 . Example, if A is square, in general 1. (AB)† 6= B† A† . 2. AA† 6= A† A. 3. If A ∈ Qn×n , k ≥ 2, then (Ak )† 6= (A† )k . 4. The nonzero right eigenvalues of A and A† have no reciprocal relation.
1.7. Projections In this section, we introduce the concepts and properties of projections. Firstly, we introduce several concepts of subspaces. Definition 1.7.1. Suppose that A ∈ Qm×n . Then
R (A) = {x ∈ Qm : x = Ay, y ∈ Qn }, N (A) = {y ∈ Qn : Ay = 0} are called the column space and null space of A, respectively. Definition 1.7.2. Suppose that L and M are two subspaces of Qn . If ∀x ∈ Qn , there exist x1 ∈ L and x2 ∈ M such that x = x1 + x2 . Moreover, if x ∈ L ∩ M, then x = 0. Then Qn is called the direct sum of two subspaces L and M, denoted by Qn = L ⊕ M.
26
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
If Qn = L ⊕ M, then L and M are called mutually complementary subspaces. If ∀x ∈ L, ∀y ∈ M, x ⊥ y, then M is called orthogonal complementary subspace of L, and L is called orthogonal complementary subspace of M, denoted by M = L⊥ or L = M ⊥. Obviously, if Qn = L ⊕ M, then arbitrary vector x ∈ Qn can be uniquely expressed as x = y + z, y ∈ L, z ∈ M. Lemma 1.7.1. For arbitrary matrix A ∈ Qm×n , the following relationships hold.
N (A) = R (AH )⊥, R (A) = N (AH )⊥ .
(1.7.1)
Proof. Suppose that x ∈ Qn , y ∈ Qm . If x ∈ N (A), then (AH y)H x = yH (Ax) = 0, ∀y ∈ Qm , i.e., x ∈ R (AH )⊥. Therefore N (A) ⊆ R (AH )⊥. If x ∈ R (AH )⊥, then yH (Ax) = (AH y)H x = 0, ∀y ∈ Qm . Therefore x ∈ N (A), i.e., R (AH )⊥ ⊆ N (A). Similarly, we can obtain R (A) = N (AH )⊥ .
1.7.1.
Idempotent Matrices and Projections
Definition 1.7.3. If E ∈ Qrn×n satisfies E 2 = E, then E is called the idempotent matrix. Theorem 1.7.2. Suppose that E ∈ Qn×n is an idempotent matrix. Then 1. E H and In − E are idempotent matrices; Ir 0 2. The Jordan canonical form of E is , r ≤ n; 0 0 3. R (In − E) = N (E), N (In − E) = R (E); 4. If E has a full rank decomposition E = FG, then GF = Ir .
27
Preliminaries
Proof. Parts 1-2 are easy to prove. Therefore we only prove Parts 3 and 4. 3. If x ∈ R (In − E), then there exists y ∈ Qn such that x = (In − E)y. Therefore Ex = E(In − E)y = Ey − E 2 y = 0, i.e., x ∈ N (E), then R (In − E) ⊆ N (E). On the contrary , if x ∈ N (E), then Ex = 0, therefore (In − E)x = x, i.e., x ∈ R (In − E). Thus N (E) ⊆ R (In − E). Therefore R (In − E) = N (E). Similarly, we can prove N (In − E) = R (E). 4. From E 2 = E, we have F(GF − Ir )G = 0, and therefore F H F(GF − I)GGH = 0. Note that both F H F and GGH are nonsingular matrices, therefore GF − Ir = 0, i.e., GF = Ir . Definition 1.7.4. Suppose that Qn = L ⊕ M, and x ∈ Qn has the formula x = y + z, y ∈ L, z ∈ M.
(1.7.2)
Then y is called the projection of x along M to L. PL,M denotes the corresponding mapping from Qn to L, and is called projection transformation along M to L, or projection operator. Corollary 1.7.3. If Qn = L ⊕ M, then any vector x in Qn has unique decomposition (1.7.2), in which y and z can be expressed as y = PL,M x, z = (I − PL,M )x.
(1.7.3)
Theorem 1.7.4. If E ∈ Qn×n is an idempotent matrix, then Qn = R (E) ⊕ N (E), and E = PR (E),N (E) . If Qn = L ⊕ M, then there exists a unique idempotent matrix PL,M such that
R (PL,M ) = L, N (PL,M ) = M.
28
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Proof. Suppose that E ∈ Qn×n is an idempotent matrix. Then on the one hand, from x = Ex + (I − E)x, ∀x ∈ Qn , in which Ex ∈ R (E), (I − E)x ∈ R (I − E) = N (E). On the other hand, if z ∈ R (E) ∩ N (E), there exist x1 , x2 ∈ Qn such that z = Ex1 = (I − E)x2 , then z = Ex1 = E 2 x1 = E(I − E)x2 = 0. Therefore Qn = R (E) ⊕ N (E) and Ex is the projection along N (E) to R (E). Thus E = PR (E),N (E) . On the other hand , if Qn = L ⊕ M, then we choose the following bases in L and M, respectively, {x1 , · · · , xl }, {y1 , · · · , ym }, in which l + m = n. We denote PL,M as the projection operator along M to L, and then ( PL,M xi = xi , i = 1, · · · , l, (1.7.4) PL,M y j = 0, j = 1, · · · , m. Suppose that X = (x1 , · · · , xl ), Y = (y1 , · · · , ym ). Then (1.7.4) can be expressed as PL,M (X,Y ) = (X, 0). (1.7.5) Since (X,Y ) ∈ Qn×n is nonsingular, we have PL,M = (X, 0)(X,Y)−1 , and R (PL,M ) = L, N (PL,M ) = M. In addition, 2 PL,M = PL,M (X, 0)(X,Y)−1 = (X, 0)(X,Y)−1 = PL,M ,
i.e., PL,M is an idempotent matrix.
Preliminaries
1.7.2.
29
Orthogonal Projections
Projections mentioned in the previous subsection are oblique, and they correspond to idempotent matrices. In this subsection, we discuss orthogonal projections. Theorem 1.7.5. Suppose that L is a subspace in Qn , and x is an arbitrary vector in Qn . Then there exists a unique vector ux in L such that kx − ux k2 < kx − uk2 , ∀u ∈ L, u 6= ux .
(1.7.6)
Proof. Suppose that L⊥ is orthogonal complementary subspace of L. It is easy to check that Qn = L ⊕ L⊥ . Therefore, for arbitrary vector x ∈ Qn , there has a unique decomposition x = ux + (x − ux ), ux ∈ L, x − ux ∈ L⊥ . Therefore, for arbitrary u ∈ L, we have kx − uk22 = k(ux − u) + (x − ux )k22 = kux − uk22 + kx − ux k22 . Obviously, kx − uk2 reaches the minimal value kx − uxk2 if and only if u = ux . For arbitrary vector x ∈ Qn , from Theorem 1.7.5, there exists a unique vector ux in L, which is called orthogonal projection of x along L⊥ to L. If PL denotes the corresponding mapping from Qn to L, then PL is called the orthogonal projection transformation along L⊥ to L, or orthogonal projection operator. If PL⊥ denotes the corresponding mapping from Qn to L⊥ , then PL⊥ = I − PL is called the orthogonal projection transformation along L to L⊥ , or the orthogonal projection operator. A matrix P is called Hermitian idempotent matrix, if P is not only an idempotent matrix, but also an Hermitian matrix. The following theorem illustrates the relation between orthogonal projection operator and Hermite idempotent matrix. Theorem 1.7.6. Suppose that Qn = L ⊕ M. Then PL,M is an Hermitian matrix if and only if M = L⊥ .
Proof. From Lemma 1.7.1 and Theorem 1.7.4, we have
$$\mathcal{R}(P_{L,M}^H) = \mathcal{N}(P_{L,M})^\perp = M^\perp, \qquad \mathcal{N}(P_{L,M}^H) = \mathcal{R}(P_{L,M})^\perp = L^\perp. \tag{1.7.7}$$
Since $P_{L,M}^H$ is an idempotent matrix, according to Theorem 1.7.4, $\mathcal{R}(P_{L,M}^H)$ and $\mathcal{N}(P_{L,M}^H)$ are complementary, i.e., $M^\perp$ and $L^\perp$ are complementary. According to Theorem 1.7.4, there exists a unique idempotent matrix $P_{M^\perp,L^\perp}$ such that
$$\mathcal{R}(P_{M^\perp,L^\perp}) = M^\perp, \qquad \mathcal{N}(P_{M^\perp,L^\perp}) = L^\perp. \tag{1.7.8}$$
Comparing (1.7.7) and (1.7.8), we have $P_{L,M}^H = P_{M^\perp,L^\perp}$. Therefore, $P_{L,M}$ is a Hermitian matrix if and only if $P_{L,M} = P_{M^\perp,L^\perp}$; thus, $P_{L,M}$ is a Hermitian matrix if and only if $M = L^\perp$.
1.7.3. Geometric Meanings of $AA^\dagger$ and $A^\dagger A$
In this subsection, we study properties of the column spaces and null spaces of $AA^\dagger$ and $A^\dagger A$.

Theorem 1.7.7. Suppose that $A \in \mathbb{Q}^{m\times n}$. Then
$$\mathcal{R}(AA^\dagger) = \mathcal{R}(AA^H) = \mathcal{R}(A), \quad \mathcal{R}(A^\dagger A) = \mathcal{R}(A^H A) = \mathcal{R}(A^H) = \mathcal{R}(A^\dagger), \tag{1.7.9}$$
$$\mathcal{N}(AA^\dagger) = \mathcal{N}(AA^H) = \mathcal{N}(A^H) = \mathcal{N}(A^\dagger), \quad \mathcal{N}(A^\dagger A) = \mathcal{N}(A^H A) = \mathcal{N}(A). \tag{1.7.10}$$
Proof. Suppose that $\operatorname{rank}(A) = r > 0$, and let the SVD of $A$ be
$$A = U \begin{pmatrix} \Sigma & 0 \\ 0 & 0 \end{pmatrix} V^H,$$
in which $U = (U_1, U_2)$ and $V = (V_1, V_2)$ are unitary matrices, $U_1$, $V_1$ denote the first $r$ columns of $U$, $V$, respectively, and $\Sigma = \operatorname{diag}(\sigma_1, \cdots, \sigma_r) > 0$. Therefore
$$A = U_1 \Sigma V_1^H, \quad A^H = V_1 \Sigma U_1^H, \quad A^\dagger = V_1 \Sigma^{-1} U_1^H,$$
$$AA^H = U_1 \Sigma^2 U_1^H, \quad A^H A = V_1 \Sigma^2 V_1^H, \quad AA^\dagger = U_1 U_1^H, \quad A^\dagger A = V_1 V_1^H.$$
Thus
$$\mathcal{R}(A) = \mathcal{R}(U_1), \quad \mathcal{R}(A^H) = \mathcal{R}(V_1), \quad \mathcal{R}(A^\dagger) = \mathcal{R}(V_1),$$
$$\mathcal{R}(AA^H) = \mathcal{R}(U_1), \quad \mathcal{R}(A^H A) = \mathcal{R}(V_1), \quad \mathcal{R}(AA^\dagger) = \mathcal{R}(U_1), \quad \mathcal{R}(A^\dagger A) = \mathcal{R}(V_1),$$
$$\mathcal{N}(A) = \mathcal{R}(V_2), \quad \mathcal{N}(A^H) = \mathcal{R}(U_2), \quad \mathcal{N}(A^\dagger) = \mathcal{R}(U_2),$$
$$\mathcal{N}(AA^H) = \mathcal{R}(U_2), \quad \mathcal{N}(A^H A) = \mathcal{R}(V_2), \quad \mathcal{N}(AA^\dagger) = \mathcal{R}(U_2), \quad \mathcal{N}(A^\dagger A) = \mathcal{R}(V_2).$$
From the above formulas, we obtain (1.7.9)-(1.7.10). When $A = 0$, the assertions are obvious.

Now we consider the geometric meanings of $AA^\dagger$ and $A^\dagger A$. Obviously, $AA^\dagger \in \mathbb{Q}^{m\times m}$ is a Hermitian idempotent matrix,
$$\mathbb{Q}^m = \mathcal{R}(AA^\dagger) \oplus \mathcal{N}(AA^\dagger),$$
and $AA^\dagger$ is the orthogonal projection operator onto $\mathcal{R}(A)$, denoted by $P_{\mathcal{R}(A)} = AA^\dagger \equiv P_A$. Similarly, $A^\dagger A \in \mathbb{Q}^{n\times n}$ is also a Hermitian idempotent matrix,
$$\mathbb{Q}^n = \mathcal{R}(A^\dagger A) \oplus \mathcal{N}(A^\dagger A),$$
and $A^\dagger A$ is the orthogonal projection operator onto $\mathcal{R}(A^H)$, denoted by $P_{\mathcal{R}(A^H)} = A^\dagger A \equiv P_{A^H}$.
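The following small MATLAB check illustrates the projector interpretation of $AA^\dagger$ over $\mathbb{C}$ (base MATLAB has no quaternion type; the quaternion statement is structurally identical).

    % AA^dagger is the Hermitian idempotent projector onto R(A).
    A = randn(6, 3) + 1i*randn(6, 3);
    PA = A * pinv(A);
    norm(PA*PA - PA, 'fro')    % idempotent
    norm(PA - PA', 'fro')      % Hermitian
    norm(PA*A - A, 'fro')      % fixes the column space of A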
1.8. Properties of Real Representation Matrices

In this section, we describe properties of real representation matrices of quaternion matrices.

Definition 1.8.1. For any quaternion matrix $A = A_1 + A_2 i + A_3 j + A_4 k$, in which $A_1, A_2, A_3, A_4 \in \mathbb{R}^{m\times n}$, its real representation is defined as
$$A^R \equiv \begin{pmatrix} A_1 & -A_2 & -A_3 & -A_4 \\ A_2 & A_1 & -A_4 & A_3 \\ A_3 & A_4 & A_1 & -A_2 \\ A_4 & -A_3 & A_2 & A_1 \end{pmatrix}. \tag{1.8.1}$$
It is worth observing that $A^R$ has some nice structural properties. By simple computations, we obtain the following properties of the real representation of quaternion matrices.

Theorem 1.8.1. Let $A, B \in \mathbb{Q}^{m\times n}$, $C \in \mathbb{Q}^{n\times s}$ and $a \in \mathbb{R}$. Then
1. $(A + B)^R = A^R + B^R$, $(aA)^R = aA^R$, $(AC)^R = A^R C^R$;
2. $(A^H)^R = (A^R)^T$;
3. $A \in \mathbb{Q}^{m\times m}$ is unitary if and only if $A^R$ is orthogonal;
4. $A$ is invertible if and only if $A^R$ is invertible, and $(A^{-1})^R = (A^R)^{-1}$;
5. if $0 \neq q \in \mathbb{Q}$, then $q^{-1} = \bar{q}/|q|^2$ and $(q^{-1})^R = (q^R)^T/|q|^2$.

From Theorem 1.8.1, we have

Theorem 1.8.2. $\mathbb{Q}^{m\times n} \cong (\mathbb{Q}^{m\times n})^R \subseteq \mathbb{R}^{4m\times 4n}$,

here $\cong$ denotes the isomorphism. Define three unitary matrices
$$J_n = \begin{pmatrix} 0 & 0 & -I_n & 0 \\ 0 & 0 & 0 & -I_n \\ I_n & 0 & 0 & 0 \\ 0 & I_n & 0 & 0 \end{pmatrix}, \quad
R_n = \begin{pmatrix} 0 & -I_n & 0 & 0 \\ I_n & 0 & 0 & 0 \\ 0 & 0 & 0 & I_n \\ 0 & 0 & -I_n & 0 \end{pmatrix}, \quad
S_n = \begin{pmatrix} 0 & 0 & 0 & -I_n \\ 0 & 0 & I_n & 0 \\ 0 & -I_n & 0 & 0 \\ I_n & 0 & 0 & 0 \end{pmatrix}.$$
A real matrix $M \in \mathbb{R}^{4n\times 4n}$ is called JRS-symmetric if
$$J_n M J_n^T = M, \quad R_n M R_n^T = M, \quad S_n M S_n^T = M. \tag{1.8.2}$$
A matrix $O \in \mathbb{R}^{4n\times 4n}$ is called JRS-symplectic if
$$O J_n O^T = J_n, \quad O R_n O^T = R_n, \quad O S_n O^T = S_n. \tag{1.8.3}$$
A matrix $W \in \mathbb{R}^{4n\times 4n}$ is called orthogonal JRS-symplectic if it is both orthogonal and JRS-symplectic. One can see that an orthogonal JRS-symplectic matrix must be orthogonal symplectic, but the converse is not always true.

Theorem 1.8.3. [83] Let $A \in \mathbb{Q}^{m\times n}$. Then
1. $A^R$ is JRS-symmetric;
2. if $A^R$ is orthogonal, then it must be orthogonal JRS-symplectic.

We have the following result on the relation between the singular values of a quaternion matrix and those of its real representation.

Theorem 1.8.4. Suppose that $A \in \mathbb{Q}^{m\times n}$ and $A^R$ is the real representation of $A$. Then there exist orthogonal JRS-symplectic matrices $U \in \mathbb{R}^{4m\times 4m}$ and $V \in \mathbb{R}^{4n\times 4n}$ such that
$$U^T A^R V = \begin{pmatrix} \Sigma & 0 & 0 & 0 \\ 0 & \Sigma & 0 & 0 \\ 0 & 0 & \Sigma & 0 \\ 0 & 0 & 0 & \Sigma \end{pmatrix}, \tag{1.8.4}$$
in which $\Sigma \in \mathbb{R}^{m\times n}$ is a diagonal matrix containing the singular values of $A$. Therefore the singular values of $A^R$ appear in fours.
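Assuming the function Realp of Algorithm 2.1.1 below is available, the following MATLAB sketch checks numerically that the singular values of $A^R$ indeed appear in fours.

    m = 3; n = 2;
    AR = Realp(randn(m,n), randn(m,n), randn(m,n), randn(m,n));
    s = svd(AR);          % 4*min(m,n) singular values, sorted
    reshape(s, 4, [])     % each column holds four (nearly) equal values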
Now we present the standard form of the real representation of a quaternion Hermitian matrix under orthogonal JRS-symplectic transformations.

Theorem 1.8.5. Suppose that $A \in \mathbb{Q}^{n\times n}$ is Hermitian and $A^R$ is the real representation of $A$. Then there exists an orthogonal JRS-symplectic matrix $W \in \mathbb{R}^{4n\times 4n}$ such that
$$W A^R W^T = \begin{pmatrix} D & 0 & 0 & 0 \\ 0 & D & 0 & 0 \\ 0 & 0 & D & 0 \\ 0 & 0 & 0 & D \end{pmatrix}, \tag{1.8.5}$$
in which $D \in \mathbb{R}^{n\times n}$ is a symmetric tridiagonal matrix.
Theorem 1.8.6. Let A ∈ Qn×n . Then the real eigenvalues of AR appear in fours; the purely imaginary eigenvalues of AR appear in pairs and in conjugate pairs.
Therefore, if we want to perform some transformation on $A \in \mathbb{Q}^{m\times n}$, we can exploit the properties of its real representation $A^R$ to greatly reduce both the computational workload and the computational time. From (1.8.1) and Theorem 1.8.1, for the matrix $A^R$ we only need to store the first column block or the first row block of $A^R$, denoted by
$$A^R_c = \begin{pmatrix} A_1 \\ A_2 \\ A_3 \\ A_4 \end{pmatrix} \quad \text{or} \quad A^R_r = \begin{pmatrix} A_1 & -A_2 & -A_3 & -A_4 \end{pmatrix},$$
respectively. With this notation and Theorem 1.8.1, we have the following results.

Theorem 1.8.7. Let $A, B \in \mathbb{Q}^{m\times n}$, $C \in \mathbb{Q}^{n\times s}$, $q \in \mathbb{Q}^m$ and $a \in \mathbb{R}$. Then
1. $(A + B)^R_c = A^R_c + B^R_c$, $(aA)^R_c = aA^R_c$, $(AC)^R_c = A^R C^R_c$;
2. $(A^H)^R_c = ((A^R)^T)_c \equiv (A^R)^T_c$;
3. $\|A\|_F = \|A^R_c\|_F$, $\|q\|_2 = \|q^R_c\|_2$.

Theorem 1.8.8. Let $A, B \in \mathbb{Q}^{m\times n}$, $C \in \mathbb{Q}^{n\times s}$, $q \in \mathbb{Q}^m$ and $a \in \mathbb{R}$. Then
1. $(A + B)^R_r = A^R_r + B^R_r$, $(aA)^R_r = aA^R_r$, $(AC)^R_r = A^R_r C^R$;
2. $(A^H)^R_r = ((A^R)^T)_r \equiv (A^R)^T_r$;
3. $\|A\|_F = \|A^R_r\|_F$, $\|q\|_2 = \|q^R_r\|_2$.
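As a minimal illustration of this storage trick, the following MATLAB sketch (again assuming Realp from Algorithm 2.1.1 below) computes a quaternion matrix product through first column blocks only, via $(AC)^R_c = A^R C^R_c$.

    m = 3; n = 3; s = 2;
    A1 = randn(m,n); A2 = randn(m,n); A3 = randn(m,n); A4 = randn(m,n);
    C1 = randn(n,s); C2 = randn(n,s); C3 = randn(n,s); C4 = randn(n,s);
    AR   = Realp(A1, A2, A3, A4);
    CRc  = [C1; C2; C3; C4];        % first column block of C^R
    ACRc = AR * CRc;                % first column block of (AC)^R
    % Check against the full product (AC)^R = A^R C^R:
    ACR = Realp(ACRc(1:m,:), ACRc(m+1:2*m,:), ACRc(2*m+1:3*m,:), ACRc(3*m+1:4*m,:));
    norm(ACR - AR*Realp(C1, C2, C3, C4), 'fro')   % numerically zero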
Chapter 2
Computing Matrix Decompositions

Matrix decompositions play a prominent role in matrix computations, and real matrix decompositions have been extensively studied. Most of the ideas of the real matrix decompositions can be extended to quaternion matrices. In this chapter, we describe several kinds of quaternion matrix decompositions. In §2.1, we introduce the basic knowledge and elementary matrices. In §2.2 and §2.3, we discuss the quaternion LU, LDL^H and Cholesky decompositions and real structure-preserving algorithms for computing them. In §2.4, we provide the quaternion QR decomposition based on quaternion Householder, Givens and modified Gram-Schmidt transformations, respectively. In §2.5, we introduce a real structure-preserving algorithm for the quaternion SVD; applying it, we solve the problem of color image compression. All these computations are performed on an Intel Core i7-2600 @ 3.40GHz/8GB computer. The version of Matlab used is R2013a.

One way to quantify the volume of work associated with a computation is to count flops. Usually $fl(x \text{ op } y)$ stands for one arithmetic operation $+$, $-$, $\times$ or $\div$ on $x$ and $y$. For example, for $a, b \in \mathbb{R}$, there is only one real flop in $a \pm b$ and in $a \times b$, while for $a, b \in \mathbb{Q}$, there are 4 real flops in $a \pm b$ and 28 real flops in $a \times b$. For $a, b \in \mathbb{R}^n$, there are $2n$ real flops in $a^T b$, while for $a, b \in \mathbb{Q}^n$, there are $32n$ real flops in $a^H b$, 16 times the real flops required for the inner product of real vectors. An algorithm should satisfy the following three basic requirements. First, the algorithm should be numerically stable and hence reliable. Second, the computational speed should be as fast as possible. Third, it should save storage space.

(1) In real matrix computations, there are many stable and highly accurate algorithms. Most of them can be generalized to quaternion matrix computations, where they remain stable and reliable.

(2) The number of flops in a given matrix computation is usually obtained by summing the amount of arithmetic associated with the most deeply nested statements. To reduce the complexity of an algorithm, we should minimize the flops. Quaternion arithmetic is much more complicated than real arithmetic. By using Theorem 1.8.1, we can convert a quaternion matrix computation into a computation on its real representation matrix, and by applying Theorem 1.8.7 we can design real structure-preserving algorithms, which ensure that the numbers of real flops of the two approaches are the same.

When it comes to designing a high-performance matrix computation, however, it is not enough to minimize flops, since the number of assignments also affects the complexity of an algorithm and hence the computational speed. An assignment refers to calling a subroutine or performing a matrix operation, and it typically requires several cycles to complete: the input scalars proceed along a computational assembly line, spending one cycle at each work station. A vector operation is a very regular sequence of scalar operations, and vector processors exploit the key idea of pipelining: the input vectors are streamed through the operation unit, and once the pipeline is filled and a steady state is reached, an output component is produced every cycle. The rate of vector processing is about $n$ times that of scalar processing, in which $n$ is the number of cycles in a floating point operation. In addition, parallel algorithms using multiprocessors play a great role in improving the efficiency of matrix computations. For example, for $B = AX + Y$ we adopt the assignment B = A*X + Y, which utilizes vector pipelining arithmetic operations rather than explicitly using triply-nested for-end loops, and thereby speeds up computations remarkably. Therefore, both real flop counts and assignment counts are important measures; see §1.5 and §1.6 of [64].
2.1. Elementary Matrices

In this section, we introduce some elementary matrices which are used in computations of quaternion matrix decompositions. We first propose the following numerical code to transform a quaternion, quaternion vector or quaternion matrix into its real representation.

Algorithm 2.1.1. This function transforms a quaternion, quaternion vector or quaternion matrix $g = g_1 + g_2 i + g_3 j + g_4 k$ into its real representation.

    function GR = Realp(g1, g2, g3, g4)
    % Real representation (1.8.1) of g = g1 + g2*i + g3*j + g4*k.
    GR = [g1, -g2, -g3, -g4;
          g2,  g1, -g4,  g3;
          g3,  g4,  g1, -g2;
          g4, -g3,  g2,  g1];
    end
We introduce three kinds of elementary matrices used in the quaternion LU decomposition, Cholesky decomposition, QR decomposition and SVD.
1. Gauss Transformation Matrices

Gauss transformation matrices are widely used in the LU, $LDL^H$ and Cholesky decompositions. For $x = (x_1, \cdots, x_n)^T \in \mathbb{Q}^n$ with $x_i \neq 0$, let $l_i = (0, \cdots, 0, l_{i+1,i}, l_{i+2,i}, \cdots, l_{n,i})^T$, in which
$$l_{k,i} = x_k x_i^{-1} = \frac{x_k \bar{x}_i}{|x_i|^2}, \quad k = i+1, \cdots, n.$$
We define
$$L_i = I_n - l_i e_i^T = \begin{pmatrix} 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & -l_{i+1,i} & 1 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & -l_{n,i} & 0 & \cdots & 1 \end{pmatrix}; \tag{2.1.1}$$
then
$$L_i x = (x_1, \cdots, x_i, 0, \cdots, 0)^T.$$
The matrix $L_i \in \mathbb{Q}^{n\times n}$ in (2.1.1) is called the quaternion Gauss transformation matrix, and $l_{i+1,i}, \cdots, l_{n,i}$ are called quaternion multipliers.
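A tiny real-field MATLAB sketch (quaternions replaced by reals for brevity) may clarify how $L_i$ acts:

    % The Gauss transformation L_i zeros out entries i+1:n of x.
    n = 5; i = 2;
    x = randn(n, 1);
    li = zeros(n, 1);
    li(i+1:n) = x(i+1:n) / x(i);      % multipliers l_{k,i} = x_k * x_i^{-1}
    Li = eye(n) - li * [zeros(1,i-1), 1, zeros(1,n-i)];   % I_n - l_i e_i^T
    Li * x                             % entries i+1:n are (numerically) zero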
38
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
2. Givens Matrices

We can annihilate a prescribed sub-column or sub-row of a matrix by a Householder matrix, which is exceedingly useful for introducing zeros on a grand scale. However, if we want to zero elements more selectively, Givens rotation matrices are the better choice.
(1) The Real Givens Matrix

In two-dimensional space, we can express the clockwise rotation by $\theta$ as
$$G_0(\theta) = \begin{pmatrix} c & -s \\ s & c \end{pmatrix}, \quad c = \cos\theta, \; s = \sin\theta.$$
The Givens rotation is clearly an orthogonal matrix. For $a = (a_1, a_2)^T \in \mathbb{R}^2$, denote $G_0(\theta)^T a = b = (b_1, b_2)^T$; then $b_1 = ca_1 + sa_2$ and $b_2 = -sa_1 + ca_2$. Setting $c = a_1/\sigma$, $s = a_2/\sigma$ with $\sigma = (a_1^2 + a_2^2)^{1/2} \neq 0$, we obtain
$$G_0(\theta)^T \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} \sigma \\ 0 \end{pmatrix}.$$
It is thus simple to eliminate the second entry of $a$. The following numerical code for the Givens rotation guards against overflow in forming $\sigma$.

Algorithm 2.1.2. Given scalars $a_1, a_2 \in \mathbb{R}$, this function computes $c$, $s$, $\sigma$ in the Givens matrix such that $-sa_1 + ca_2 = 0$.

    function [c, s, sigma] = givens(a1, a2)
    % Overflow-safe computation of c, s and sigma with -s*a1 + c*a2 = 0.
    if a2 == 0
        c = 1; s = 0; sigma = a1;
    elseif abs(a2) > abs(a1)
        t = a1/a2; s = 1/sqrt(1 + t^2); c = s*t; sigma = a2/s;
    else
        t = a2/a1; c = 1/sqrt(1 + t^2); s = c*t; sigma = a1/c;
    end
    end
This algorithm requires 6 flops.
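A hypothetical usage sketch of the function givens above:

    a = [3; 4];
    [c, s, sigma] = givens(a(1), a(2));
    G = [c, -s; s, c];
    G' * a          % equals [sigma; 0]: the second entry is annihilated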
39
Computing Matrix Decompositions
(2) The JRSGivens Matrix

The authors of [83] provided the following formula, which can be used to obtain an orthogonal matrix from a quaternion number. For given $g = g_1 + g_2 i + g_3 j + g_4 k \in \mathbb{Q}$ with $g_1, g_2, g_3, g_4 \in \mathbb{R}$ and $g \neq 0$, define
$$G_2 = \begin{pmatrix} \cos\alpha_1 & -\cos\alpha_2 & -\cos\alpha_3 & -\cos\alpha_4 \\ \cos\alpha_2 & \cos\alpha_1 & -\cos\alpha_4 & \cos\alpha_3 \\ \cos\alpha_3 & \cos\alpha_4 & \cos\alpha_1 & -\cos\alpha_2 \\ \cos\alpha_4 & -\cos\alpha_3 & \cos\alpha_2 & \cos\alpha_1 \end{pmatrix}
= \begin{pmatrix} g_1 & -g_2 & -g_3 & -g_4 \\ g_2 & g_1 & -g_4 & g_3 \\ g_3 & g_4 & g_1 & -g_2 \\ g_4 & -g_3 & g_2 & g_1 \end{pmatrix}\Big/|g| = \frac{g^R}{|g|}.$$
Then
$$G_2^T g^R = g^R G_2^T = |g| I_4, \quad G_2^T G_2 = G_2 G_2^T = I_4, \quad G_2 = \Big(\frac{g}{|g|}\Big)^R.$$
Therefore, $G_2$ is an orthogonal matrix. Here we propose the corresponding numerical code.

Algorithm 2.1.3. A method for generating an orthogonal matrix $G_2$ from $g = g_1 + g_2 i + g_3 j + g_4 k \in \mathbb{Q}$.

    function G2 = JRSGivens(g1, g2, g3, g4)
    % Orthogonal matrix G2 = (g/|g|)^R; the identity if g is real.
    if all([g2, g3, g4] == 0)
        G2 = eye(4);
    else
        G2 = Realp(g1, g2, g3, g4)/norm([g1, g2, g3, g4]);
    end
    end
This algorithm costs 24 flops, including 1 square root operation. Notice that the transformation G2 acts as a four-dimensional Givens rotation [54, 129].
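A short usage sketch (assuming Realp from Algorithm 2.1.1), illustrating the identity $G_2^T g^R = |g| I_4$:

    g = [1, -2, 0.5, 3];                  % hypothetical quaternion parts
    G2 = JRSGivens(g(1), g(2), g(3), g(4));
    G2' * Realp(g(1), g(2), g(3), g(4))   % equals norm(g)*eye(4)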
(3) The qGivens Matrix

When we apply the idea of the Givens rotation to the quaternion case, we have the following result.

Theorem 2.1.4. Suppose that $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{Q}^2$ with $x_2 \neq 0$. Then there exists a unitary matrix $G_1 = \begin{pmatrix} q_1 & q_3 \\ q_2 & q_4 \end{pmatrix}$ such that $G_1^H x = \alpha e_1$, in which $\alpha = \|(x_1; x_2)\|_2$.
A choice of $G_1$ is
$$q_1 = \frac{x_1}{\|(x_1; x_2)\|_2}, \quad q_2 = \frac{x_2}{\|(x_1; x_2)\|_2};$$
$$q_3 = |q_2|, \quad q_4 = -|q_2| q_2^{-H} q_1^H = -q_2 q_1^H/|q_2|, \quad \text{for } |x_1| \leq |x_2|;$$
$$q_4 = |q_1|, \quad q_3 = -|q_1| q_1^{-H} q_2^H = -q_1 q_2^H/|q_1|, \quad \text{for } |x_1| > |x_2|. \tag{2.1.2}$$
Proof. Because $G_1$ is unitary, we can define
$$q_1 = \frac{x_1}{\|(x_1; x_2)\|_2}, \quad q_2 = \frac{x_2}{\|(x_1; x_2)\|_2},$$
and $q_3$, $q_4$ should satisfy
$$q_1^H q_3 + q_2^H q_4 = 0, \quad q_3^H q_3 + q_4^H q_4 = 1. \tag{2.1.3}$$
In order to ensure stability, the selection of $q_3$, $q_4$ is discussed in the following two cases.

(1) $|x_1| \leq |x_2| \Leftrightarrow |q_1| \leq |q_2|$. From (2.1.3), we get
$$q_4 = -q_2^{-H} q_1^H q_3, \qquad 1 = |q_3|^2 + |q_3|^2 |q_2^{-H} q_1^H|^2,$$
$$|q_3| = \frac{1}{\sqrt{1 + |q_2^{-H} q_1^H|^2}} = \frac{1}{\sqrt{1 + |q_2|^{-2}|q_1|^2}} = \frac{|q_2|}{\sqrt{|q_2|^2 + |q_1|^2}} = |q_2|,$$
therefore we can choose
$$q_3 = |q_2|, \quad q_4 = -|q_2| q_2^{-H} q_1^H = -q_2 q_1^H/|q_2|.$$

(2) $|x_1| > |x_2| \Leftrightarrow |q_1| > |q_2|$. From (2.1.3), we get
$$q_3 = -q_1^{-H} q_2^H q_4, \qquad 1 = q_3^H q_3 + q_4^H q_4 = |q_4|^2 |q_1|^{-2}|q_2|^2 + |q_4|^2,$$
therefore
$$|q_4| = \frac{1}{\sqrt{1 + |q_1|^{-2}|q_2|^2}} = \frac{|q_1|}{\sqrt{|q_1|^2 + |q_2|^2}} = |q_1|,$$
and so we can choose
$$q_4 = |q_1|, \quad q_3 = -|q_1| q_1^{-H} q_2^H = -q_1 q_2^H/|q_1|.$$
Obviously, $G_1$ with such structure is unitary. Finally,
$$G_1^H \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \|(x_1; x_2)\|_2 \\ 0 \end{pmatrix} = \|(x_1; x_2)\|_2 e_1.$$

Remark 2.1.1. The Givens matrix $G_1$ in Theorem 2.1.4 is the generalization of the real Givens matrix $G_0$. Moreover, we have $|q_1| = |q_4|$ and $|q_2| = |q_3|$.

Remark 2.1.2. According to the sizes of $|x_1|$ and $|x_2|$, we take different $q_3$, $q_4$. When $|x_1| \leq |x_2|$, then $|q_3| = |q_2| \geq \frac{\sqrt{2}}{2}$, and when $|x_1| > |x_2|$, then $|q_4| = |q_1| > \frac{\sqrt{2}}{2}$. With such choices we can ensure stability in the process of computing $G_1$.
The following is the numerical code for generating the real representation $G^R$ of the quaternion Givens matrix.

Algorithm 2.1.5. Given a vector $x = (x_1; x_2)$, in which $x_1$, $x_2$ are the first column blocks of the real representations of $a_1, a_2 \in \mathbb{Q}$, respectively, and $a = (a_1, a_2)^T$ is a quaternion vector, this function generates the real representation $G^R$ of the quaternion Givens matrix $G = \begin{pmatrix} q_1 & q_3 \\ q_2 & q_4 \end{pmatrix}$ such that $G^H a = (\|a\|_2; 0)$.

    function GR = qGivens(x1, x2)
    % x1, x2 are the 4-by-1 column blocks of a1, a2.
    t = [norm(x1), norm(x2), norm([x1; x2])];
    q1 = x1/t(3); q2 = x2/t(3);          % q1 = a1/||a||, q2 = a2/||a||
    if t(1) < t(2)
        tt = t(2)/t(3);                  % tt = |q2|
        q3 = [tt; 0; 0; 0];
        q4 = -Realp(q2(1), q2(2), q2(3), q2(4)) ...
             *[q1(1); -q1(2); -q1(3); -q1(4)]/tt;   % q4 = -q2*conj(q1)/|q2|
    else
        tt = t(1)/t(3);                  % tt = |q1|
        q4 = [tt; 0; 0; 0];
        q3 = -Realp(q1(1), q1(2), q1(3), q1(4)) ...
             *[q2(1); -q2(2); -q2(3); -q2(4)]/tt;   % q3 = -q1*conj(q2)/|q1|
    end
    GR = Realp([q1(1), q3(1); q2(1), q4(1)], [q1(2), q3(2); q2(2), q4(2)], ...
               [q1(3), q3(3); q2(3), q4(3)], [q1(4), q3(4); q2(4), q4(4)]);
    end
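A hypothetical usage sketch of qGivens, checking that $(G^H a)^R_c = (G^R)^T a^R_c$ annihilates the second quaternion component:

    a1 = [1; 2; 3; 4];       % column block of a quaternion a1
    a2 = [0.5; -1; 2; 0];    % column block of a quaternion a2
    GR = qGivens(a1, a2);
    aRc = [a1(1); a2(1); a1(2); a2(2); a1(3); a2(3); a1(4); a2(4)];
    GR' * aRc                % equals [norm([a1; a2]); 0; ...; 0]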
42
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
3. Householder Based Transformations

The Householder transformation is one of the most powerful tools in matrix computations and is widely used in the real QR decomposition and the real SVD; see [64] and the references cited therein. It was introduced by Householder in a paper about the triangularization of a non-symmetric matrix [79], was generalized to the complex case by Morrison [134], and was further generalized to the quaternion case by Bunse-Gerstner et al. [58].
(1) The Real Householder Transformation

Suppose that the vector $u_0 \in \mathbb{R}^n$ satisfies $u_0^T u_0 = 1$. Then a Householder matrix $H_0(u_0)$ is defined as
$$H_0(u_0) \equiv I_n - 2u_0 u_0^T, \tag{2.1.4}$$
in which $u_0$ is called the Householder vector. If $0 \neq x \in \mathbb{R}^n$, then
$$H_0 = I_n - \beta_0 x x^T, \quad \beta_0 = \frac{2}{x^T x}, \tag{2.1.5}$$
is a Householder matrix. The following is the numerical code for generating the Householder vector $u_0$ and the scalar $\beta_0$ in $H_0$.

Algorithm 2.1.6. A method for generating the Householder vector $u_0$ and the scalar $\beta_0$ in $H_0$ for a given real vector $x \in \mathbb{R}^n$.

    function [u0, beta0] = Householder0(x, n)
    % u0 = x + sign(x(1))*norm(x)*e1, beta0 = 2/(u0'*u0), so that
    % (I - beta0*u0*u0')*x = -sign(x(1))*norm(x)*e1.
    u0 = x;
    alpha1 = norm(x);
    if u0(1) >= 0
        u0(1) = u0(1) + alpha1;
    else
        u0(1) = u0(1) - alpha1;
    end
    beta0 = 1/(abs(u0(1))*alpha1);
    end
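A brief usage sketch of Householder0:

    x = [3; 1; 2];
    [u0, beta0] = Householder0(x, 3);
    (eye(3) - beta0*(u0*u0')) * x     % equals [-norm(x); 0; 0]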
43
Computing Matrix Decompositions
(2) Quaternion Householder Based Transformations

Real Householder matrices play an important role in matrix computations. In fact, they can be extended to quaternion matrices to produce Householder based transformations. The authors of [23] and [115] mentioned the following necessary and sufficient conditions for the existence of the quaternion Householder reflection.

Theorem 2.1.7. (H1) Suppose that $x, y \in \mathbb{Q}^n$, $x \neq y$. There exists a unit vector $u = \frac{y - x}{\|y - x\|_2} \in \mathbb{Q}^n$ such that $H_1 y = (I - 2uu^H)y = x$ if and only if $\|x\|_2 = \|y\|_2$ and $y^H x = x^H y$. In particular, if $y \in \mathbb{Q}^n$ is not a multiple of $e_1$, which is the first column of the unit matrix $I_n$, then the choice $x = \alpha e_1$, in which
$$\alpha = \begin{cases} -\frac{y_1}{|y_1|}\|y\|_2, & y_1 \neq 0, \\ -\|y\|_2, & \text{otherwise}, \end{cases}$$
gives a reflection that maps $y$ to $\alpha e_1$.

Proof. Necessity. If there exists a unit vector $u \in \mathbb{Q}^n$ such that $(I - 2uu^H)y = x$, then we observe that $H_1 = I - 2uu^H$ is a unitary matrix:
$$(I - 2uu^H)^H (I - 2uu^H) = I - 2uu^H - 2uu^H + 4uu^H uu^H = I.$$
So $\|x\|_2 = \|(I - 2uu^H)y\|_2 = \|y\|_2$. Left-multiplying both sides of $(I - 2uu^H)y = x$ by $y^H$, $x^H$ and $u^H$, respectively, we obtain the following three equations:
$$y^H y - 2y^H uu^H y = y^H x, \quad x^H y - 2x^H uu^H y = x^H x, \quad u^H y - 2u^H uu^H y = u^H x \;\Rightarrow\; x^H u = -y^H u.$$
From the above three equalities and the fact that $\|x\|_2 = \|y\|_2$, we have
$$x^H y = x^H x + 2x^H uu^H y = y^H y - 2y^H uu^H y = y^H x.$$
Sufficiency. If $x \neq y$, $\|x\|_2 = \|y\|_2$ and $y^H x \in \mathbb{R}$, let $u = \frac{y - x}{\|y - x\|_2}$; then we can easily obtain $(I - 2uu^H)y = x$.
The authors of [159] gave the following Householder based transformation.
Theorem 2.1.8. (H2) Given an arbitrary quaternion vector $x \in \mathbb{Q}^n$ and a real vector $v \in \mathbb{R}^n$ with unit norm, there exists a quaternion vector $u \in \mathbb{Q}^n$ with $\|u\|_2 = \sqrt{2}$ and a unit quaternion scalar $z$ such that $z(I - uu^H)x = \|x\|_2 v$, in which
$$z = 1/\xi, \quad \xi = \begin{cases} 1, & |x^T v| = 0, \\ -\dfrac{x^T v}{|x^T v|}, & \text{otherwise}, \end{cases}$$
$$u = \frac{x - \xi v\|x\|_2}{\mu}, \quad \mu = \sqrt{\|x\|_2(\|x\|_2 + |x^T v|)}.$$
We denote $z(I - uu^H)$ as $H_2$.
If $x \in \mathbb{R}^n$, we can perform a real Householder transformation on $x$. So for $x = (x_1, x_2, \cdots, x_n)^T \in \mathbb{Q}^n$, we can first transform $x$ into a real vector by left-multiplying a diagonal unitary matrix $X_M$,
$$X_M = \operatorname{diag}(z_1, z_2, \cdots, z_n), \quad z_l = \begin{cases} \dfrac{\bar{x}_l}{|x_l|}, & x_l \neq 0, \\ 1, & \text{otherwise}, \end{cases} \tag{2.1.6}$$
and then perform a real Householder transformation on $X_M x$. This is the basis of the third form of quaternion Householder based transformation, introduced in [83].

Theorem 2.1.9. (H3) Given an arbitrary quaternion vector $x = (x_1, x_2, \cdots, x_n)^T \in \mathbb{Q}^n$ and $X_M$ as in (2.1.6), there exists a real unit vector $u \in \mathbb{R}^n$ such that $(I - 2uu^T)X_M x = -\|x\|_2 e_1$. Here we denote
$$H_3 = (I - 2uu^T)X_M = H_0 X_M, \tag{2.1.7}$$
in which $H_0$ is the real Householder matrix.

For a given vector $x \in \mathbb{Q}^n$, by performing the above Householder based transformations, we have $H_l x = \alpha e_1$, $l = 1, 2, 3$. Notice that $\alpha$ is a real number when $l = 2, 3$, while $\alpha$ is a quaternion when $l = 1$. In many applications, it is more convenient when $\alpha$ is a real number. For this purpose, the authors of [115] modified the Householder matrix $H_1$ to obtain the Householder based transformation $H_4$, as follows.

Theorem 2.1.10. Suppose that $0 \neq y \in \mathbb{Q}^n$ is not a multiple of $e_1$. Denote $u = \frac{y - \alpha e_1}{\|y - \alpha e_1\|_2}$, in which
$$\alpha = \begin{cases} -\frac{y_1}{|y_1|}\|y\|_2, & y_1 \neq 0, \\ -\|y\|_2, & \text{otherwise}, \end{cases}$$
$\alpha_M = \operatorname{diag}(\frac{\bar{\alpha}}{|\alpha|}, I_{n-1})$, and $H_4 = \alpha_M H_1$. Then $H_4$ maps $y$ to $|\alpha| e_1$.
Remark 2.1.3. The following remarks are in order.
1. In Theorem 2.1.8, if $v = e_1$, then we have $H_2 = zH_1$, in which $z$ is a unit quaternion scalar that makes the first element of $H_1 x$ a positive number.
2. $H_l$ for $l = 1, 2, 3, 4$ described above are all quaternion unitary matrices. Among them, only $H_1$ is Hermitian and a reflection.
3. In Theorem 2.1.7 and Theorem 2.1.8, the choices of signs in $\alpha$ and $\xi$ avoid the subtraction of two close numbers, which guarantees numerical stability, as in the real matrix case; see, e.g., [64].

Because both $H_2$ and $H_4$ depend on $H_1$, we only propose the procedure for computing the Householder vector and the scalar in $H_1$. The following is the numerical code for generating the real representation matrix $u$ of the quaternion Householder vector $u_1$ and the scalar $\beta_1$ in $H_1$.

Algorithm 2.1.11. A method for generating the real representation matrix $u$ of the quaternion Householder vector $u_1$ and the scalar $\beta_1$ in $H_1$ from a given quaternion vector $x = x_1 + x_2 i + x_3 j + x_4 k \in \mathbb{Q}^n$.

    function [u, beta1] = Householder1(x1, x2, x3, x4, n)
    % u1 = x - alpha*e1 (unnormalized), u = u1^R, and beta1 = 2/(u1^H u1),
    % so that H1 = I - beta1*u1*u1^H maps x to alpha*e1.
    u1(1:n, 1:4) = [x1, x2, x3, x4];
    aa = norm([x1; x2; x3; x4]);                 % ||x||_2
    xx = norm([x1(1), x2(1), x3(1), x4(1)]);     % |x(1)|
    if xx == 0
        alpha1 = aa*[1, 0, 0, 0];
    else
        alpha1 = -(aa/xx)*[x1(1), x2(1), x3(1), x4(1)];
    end
    u1(1, 1:4) = u1(1, 1:4) - alpha1;
    beta1 = 1/(aa*(aa + xx));
    u = Realp(u1(:,1), u1(:,2), u1(:,3), u1(:,4));
    end
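A brief usage sketch of Householder1, applying $H_1$ in real-representation form to the column block of $x$:

    n = 3;
    x1 = randn(n,1); x2 = randn(n,1); x3 = randn(n,1); x4 = randn(n,1);
    [u, beta1] = Householder1(x1, x2, x3, x4, n);
    xRc = [x1; x2; x3; x4];
    y = (eye(4*n) - beta1*(u*u')) * xRc;
    % y is the column block of alpha*e1: only entries 1, n+1, 2n+1, 3n+1
    % are (numerically) nonzero, and norm(y) = norm(xRc).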
2.2. The Quaternion LU Decomposition

The LU decomposition factors a matrix into the product of a unit lower triangular matrix and an upper triangular matrix. It is mainly used in numerical computations to solve linear systems or to invert matrices. In this section, we propose a real structure-preserving algorithm for the quaternion LU decomposition [118], based on the identities in Theorem 1.8.7.
First, we review the quaternion Gauss transformation, which is the basic tool of the quaternion LU decomposition.
1. The Quaternion LU Decomposition

For a quaternion matrix $A \in \mathbb{Q}^{m\times m}$, if $A(1,1)$ is nonzero, we can construct a quaternion Gauss transformation $L_1$ such that
$$L_1 A = \begin{pmatrix} a_{11}^{(1)} & a_{12}^{(1)} & \cdots & a_{1m}^{(1)} \\ 0 & a_{22}^{(2)} & \cdots & a_{2m}^{(2)} \\ 0 & a_{32}^{(2)} & \cdots & a_{3m}^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & a_{m2}^{(2)} & \cdots & a_{mm}^{(2)} \end{pmatrix} = A^{(2)}.$$
In general, after $k - 1$ steps, we have
$$A^{(k)} = \begin{pmatrix} a_{11}^{(1)} & a_{12}^{(1)} & \cdots & a_{1k}^{(1)} & \cdots & a_{1m}^{(1)} \\ & a_{22}^{(2)} & \cdots & a_{2k}^{(2)} & \cdots & a_{2m}^{(2)} \\ & & \ddots & \vdots & & \vdots \\ & & & a_{kk}^{(k)} & \cdots & a_{km}^{(k)} \\ & & & \vdots & & \vdots \\ & & & a_{mk}^{(k)} & \cdots & a_{mm}^{(k)} \end{pmatrix}.$$
If $a_{kk}^{(k)} \neq 0$, then we can construct a quaternion Gauss transformation $L_k$ to eliminate $a_{k+1,k}^{(k)}, \cdots, a_{m,k}^{(k)}$ in $L_k A^{(k)}$. If the leading principal submatrices $A_i$ $(i = 1, \cdots, n-1)$ of $A$ are all nonsingular, we can construct a sequence of quaternion Gauss transformations $L_1, L_2, \cdots, L_{n-1}$ such that
$$L_{n-1} \cdots L_1 A = U.$$
Let $L = L_1^{-1} L_2^{-1} \cdots L_{n-1}^{-1}$; then
$$A = LU,$$
in which $L$ is a unit lower triangular quaternion matrix and $U$ is an upper triangular quaternion matrix. In this process, the principal diagonal entries $A^{(k)}(k, k)$ are all nonzero and can be used as denominators to obtain the multipliers. These quantities are referred to as pivots. We now discuss properties of $L_i$, $i = 1, 2, \cdots, n-1$.
Theorem 2.2.1. Suppose that $L_i = I_n - l_i e_i^T \in \mathbb{Q}^{n\times n}$ are given in (2.1.1). Then
1. for $i < j$, $L_i L_j = I_n - l_i e_i^T - l_j e_j^T$;
2. $L_i^{-1} = I_n + l_i e_i^T$;
3. $L = L_1^{-1} L_2^{-1} \cdots L_{n-1}^{-1} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ * & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ * & * & \cdots & 1 \end{pmatrix}$.
The following is the numerical code for the real structure-preserving algorithm of the quaternion LU decomposition.

Algorithm 2.2.2. (Real structure-preserving algorithm of the quaternion LU decomposition) For a given nonsingular matrix $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{Q}^{m\times m}$, the input AA is the first column block of $A^R$, i.e. $A^R_c$; the outputs LL and UU are the first column blocks of a quaternion unit lower triangular matrix $L$ and a quaternion upper triangular matrix $U$, respectively, such that $A = LU$.

    function [LL, UU] = qLU(AA)
    m = size(AA, 2);
    B = AA;
    for k = 1:m-1
        alpha = norm([B(k,k), B(m+k,k), B(2*m+k,k), B(3*m+k,k)]);
        % Multipliers: l_k = A(k+1:m,k)*A(k,k)^{-1}, computed on column blocks.
        B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k) = ...
            Realp(B(k+1:m,k), B(k+1+m:2*m,k), B(k+1+2*m:3*m,k), B(k+1+3*m:4*m,k)) ...
            *([B(k,k); -B(k+m,k); -B(k+2*m,k); -B(k+3*m,k)]/alpha^2);
        % Rank-one update of the trailing submatrix.
        B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k+1:m) = ...
            B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k+1:m) ...
            - Realp(B(k+1:m,k), B(k+1+m:2*m,k), B(k+1+2*m:3*m,k), B(k+1+3*m:4*m,k)) ...
            *B([k, k+m, k+2*m, k+3*m], k+1:m);    %(1)
    end
    UU = [triu(B(1:m,:)); triu(B(m+1:2*m,:)); triu(B(2*m+1:3*m,:)); triu(B(3*m+1:4*m,:))];
    LL = [tril(B(1:m,:),-1) + eye(m); tril(B(m+1:2*m,:),-1); ...
          tril(B(2*m+1:3*m,:),-1); tril(B(3*m+1:4*m,:),-1)];
    end
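A minimal usage sketch of qLU, assuming Realp and qLU above are on the path; the matrix is made diagonally dominant so that all leading principal submatrices are safely nonsingular:

    m = 4;
    A1 = randn(m) + 10*m*eye(m);    % diagonally dominant real part
    A2 = randn(m); A3 = randn(m); A4 = randn(m);
    AR = Realp(A1, A2, A3, A4);
    [LL, UU] = qLU(AR(:, 1:m));     % input: first column block A^R_c
    LR = Realp(LL(1:m,:), LL(m+1:2*m,:), LL(2*m+1:3*m,:), LL(3*m+1:4*m,:));
    UR = Realp(UU(1:m,:), UU(m+1:2*m,:), UU(2*m+1:3*m,:), UU(3*m+1:4*m,:));
    norm(LR*UR - AR, 'fro')/norm(AR, 'fro')   % small relative residual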
2. The Partial Pivoting Quaternion LU Decomposition

When $A \in \mathbb{Q}^{m\times m}$ is nonsingular, we cannot guarantee that the principal diagonal entries $A^{(k)}(k, k)$ $(k = 1 : m-1)$ are nonzero, or that $|A^{(k)}(k, k)|$ $(k = 1 : m-1)$ are relatively large. In fact, if the norms of the pivots are relatively small, the LU decomposition is unstable. In order to guarantee computational stability, we can interchange two rows of $A^{(k)}$, which is equivalent to left-multiplying $A^{(k)}$ by a permutation matrix. So we have the following theorem.

Theorem 2.2.3. [178] Suppose that $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{Q}^{m\times m}$ is nonsingular. Then there exist a permutation matrix $P \in \mathbb{Q}^{m\times m}$, a unit lower triangular matrix $L \in \mathbb{Q}^{m\times m}$ and an upper triangular matrix $U \in \mathbb{Q}^{m\times m}$ such that $PA = LU$.

A particular row interchange strategy, called partial pivoting, can be used to guarantee that the absolute values of all multipliers are at most one. To get the smallest possible multipliers in the kth Gauss transformation using row interchanges, for $k = 1 : n-1$, we need $|A(k, k)|$ to be the largest in $A(k : m, k)$. Assume that $|A(\mu, k)| = \|A(k : m, k)\|_\infty$; then interchange the kth and the $\mu$th rows, which is equivalent to left-multiplying by a permutation matrix $E_k$. Letting $A = E_k A$, we can determine the Gauss transformation $L_k$ such that if $\nu$ is the kth column of $L_k A$, then $\nu(k+1 : m) = 0$. Upon completion we emerge with
$$L_{n-1} E_{n-1} \cdots L_1 E_1 A = L_{n-1} L_{n-2} \cdots L_1 PA = U,$$
where $U$ is an upper triangular matrix. Partial pivoting effectively guarantees stable computations, and the overhead associated with it is minimal from the standpoint of floating point arithmetic. In the following, we give the numerical code for the real structure-preserving algorithm of the partial pivoting LU decomposition of $A \in \mathbb{Q}^{m\times m}$.

Algorithm 2.2.4. (The real structure-preserving algorithm of the partial pivoting quaternion LU decomposition) For a given nonsingular matrix $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{Q}^{m\times m}$, the input AA is the first column block of $A^R$, i.e. $A^R_c$; the outputs LL and UU are the first column blocks of a unit lower triangular quaternion matrix $L$ and an upper triangular quaternion matrix $U$, respectively, and J is a vector which expresses the permutation matrix $P$, such that $PA = LU$.

    function [J, LL, UU] = qPLU(AA)
    m = size(AA, 2);
    B = AA;
    J = (1:m);
    for k = 1:m-1
        % Partial pivoting: entry of largest squared norm in column k.  %(1)
        [mx, id] = max(sum([B(k:m,k), B(m+k:2*m,k), B(2*m+k:3*m,k), ...
                            B(3*m+k:4*m,k)].^2, 2));
        if id > 1
            id = id + k - 1;
            B([k, id, k+m, id+m, k+2*m, id+2*m, k+3*m, id+3*m], :) = ...
                B([id, k, id+m, k+m, id+2*m, k+2*m, id+3*m, k+3*m], :);
            J([k, id]) = J([id, k]);
        end
        % Multipliers (mx is the squared norm of the pivot).
        B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k) = ...
            Realp(B(k+1:m,k), B(k+1+m:2*m,k), B(k+1+2*m:3*m,k), B(k+1+3*m:4*m,k)) ...
            *([B(k,k); -B(k+m,k); -B(k+2*m,k); -B(k+3*m,k)]/mx);
        % Rank-one update of the trailing submatrix.  %(2)
        B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k+1:m) = ...
            B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k+1:m) ...
            - Realp(B(k+1:m,k), B(k+1+m:2*m,k), B(k+1+2*m:3*m,k), B(k+1+3*m:4*m,k)) ...
            *B([k, k+m, k+2*m, k+3*m], k+1:m);
    end
    UU = [triu(B(1:m,:)); triu(B(m+1:2*m,:)); triu(B(2*m+1:3*m,:)); triu(B(3*m+1:4*m,:))];
    LL = [tril(B(1:m,:),-1) + eye(m); tril(B(m+1:2*m,:),-1); ...
          tril(B(2*m+1:3*m,:),-1); tril(B(3*m+1:4*m,:),-1)];
    end
Remark 2.2.1. In fact, in the assignment line containing %(1) of Function qPLU, we select the maximal squared norm among the elements of $A^{(k)}(k : m, k)$ and record its value and position, i.e., $[mx, id] = \max(|A^{(k)}(k, k)|^2, |A^{(k)}(k+1, k)|^2, \cdots, |A^{(k)}(m, k)|^2)$.

Remark 2.2.2. In Function qPLU, the vector J is the index vector recording the permutation of the rows of $A$, such that $LU = PA = A([J], :)$.

Now we provide two numerical examples to compare the efficiency of the real structure-preserving algorithms qLU, qPLU and of QTFM on randomly generated quaternion matrices of sizes $5k \times 5k$ for $k = 1 : 100$. We compare the CPU times and errors of the real structure-preserving algorithms and QTFM for performing the qLU and qPLU decompositions.
In the first example, we generate diagonally dominant matrices, so that we can perform the algorithm qLU. We run the algorithm qLU and the 'lu' command in QTFM.
Figure 2.2.1. CPU times for the LU decomposition.

Fig 2.2.1 shows that the CPU time of the real structure-preserving algorithm qLU is lower: it costs about one-fifth of the CPU time of QTFM LU when the size of the quaternion matrix is 500 × 500. In Fig 2.2.2, we observe that the Frobenius norm of LU − A for the real structure-preserving algorithm is about one percent of that for QTFM LU. When the size of the quaternion matrix is 500 × 500, the absolute error is 0.2156 × 10⁻⁹ for QTFM LU, whereas it is only 0.2958 × 10⁻¹¹ for the real structure-preserving algorithm. In the second example, we run the algorithm qPLU and the 'lu' command in QTFM. In Fig 2.2.3 and Fig 2.2.4, we observe a big difference in CPU times and errors between the real structure-preserving algorithm qPLU and QTFM LU. The CPU time of qPLU is about one-fifth of that of QTFM LU. When the size of the quaternion matrix is 500 × 500, the Frobenius norm of LU − A is 0.4939 × 10⁻¹¹ for the real structure-preserving algorithm, whereas it is 0.1675 × 10⁻¹⁰ for QTFM LU; that is, the absolute error of QTFM LU is about 10 times that of the real structure-preserving algorithm.
Figure 2.2.2. Errors (F-norm) for the LU decomposition.
2.3. The Quaternion $LDL^H$ and Cholesky Decompositions

When $A$ is a real positive definite matrix, we can perform the $LDL^T$ and $LL^T$ decompositions [64]. In this section, we generalize the real $LDL^T$ and $LL^T$ decompositions to propose real structure-preserving $LDL^H$ and $LL^H$ decompositions for quaternion Hermitian positive definite matrices [119].

Theorem 2.3.1. [177] For a Hermitian positive definite matrix $A \in \mathbb{Q}^{m\times m}$, there exists a unique lower triangular matrix $L \in \mathbb{Q}^{m\times m}$ with positive diagonal elements such that $A = LL^H$.

The first algorithm we propose is the real structure-preserving algorithm qLDLH1, which is based on the quaternion LU decomposition.
Figure 2.2.3. CPU times for the partial pivoting LU decomposition.

Figure 2.2.4. Errors (F-norm) for the partial pivoting LU decomposition.

Because the quaternion matrix $A \in \mathbb{Q}^{m\times m}$ is Hermitian positive definite, the leading principal submatrices $A_i$ $(i = 1, \cdots, m-1)$ of $A$ are all nonsingular, so we have $A = LU$, in which $L$ is a quaternion unit lower triangular matrix, $U$ is a quaternion upper triangular matrix, and all the diagonal elements of $U$ are positive. Let $D$ be the diagonal matrix whose diagonal elements are those of $U$. Then $L^H = D^{-1}U$, because $A$ is Hermitian, and so $A = LDL^H$. We take the first column block of the real representation of $A$, i.e., $A^R_c$, and implement the $LDL^H$ decomposition of $A$ by executing the following numerical code for the real structure-preserving algorithm on $A^R_c$.

Algorithm 2.3.2. (qLDLH1: The quaternion $LDL^H$ decomposition) For a given Hermitian positive definite matrix $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{Q}^{m\times m}$,
the input AA is the first column block of $A^R$, i.e. $A^R_c$; the output LL is the first column block of $L^R$, and D is the diagonal matrix such that $A = LDL^H$.

    function [LL, D] = qLDLH1(AA)
    B = AA;
    m = size(B, 2);
    for k = 1:m-1
        % Multipliers: divide column k below the diagonal by the real pivot.
        B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k) = ...
            B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k)/B(k,k);
        % Rank-one update of the trailing submatrix.
        B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k+1:m) = ...
            B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k+1:m) ...
            - Realp(B(k+1:m,k), B(k+1+m:2*m,k), B(k+1+2*m:3*m,k), B(k+1+3*m:4*m,k)) ...
            *B([k, k+m, k+2*m, k+3*m], k+1:m);
    end
    LL = [tril(B(1:m,:),-1) + eye(m); tril(B(m+1:2*m,:),-1); ...
          tril(B(2*m+1:3*m,:),-1); tril(B(3*m+1:4*m,:),-1)];
    D = diag(diag(B(1:m,1:m)));
    end
The second algorithm that we propose is another $LDL^H$ decomposition of $A$, which is the generalization of the real $LDL^T$ decomposition (p. 158, [64]). Suppose that $A = LDL^H$, and that we know the first $j-1$ columns of $L$ and the diagonal elements $d_1, \cdots, d_{j-1}$, for some $j$ with $1 \leq j \leq m$. From $A(1:j, j) = L(1:j, 1:j)v(1:j)$ with
$$v(1:j) = \begin{pmatrix} d_1 \overline{L(j,1)} \\ \vdots \\ d_{j-1} \overline{L(j,j-1)} \\ d_j \end{pmatrix},$$
we have
$$v(j) = d_j = A(j,j) - \sum_{k=1}^{j-1} d_k |L(j,k)|^2,$$
and the $j$th column of $L$ satisfies
$$L(j+1:m, j)v(j) = A(j+1:m, j) - L(j+1:m, 1:j-1)v(1:j-1).$$
We implement the $LDL^H$ decomposition of $A$ by executing the following numerical code for the real structure-preserving algorithm, named qLDLH2, on $A^R_c$.

Algorithm 2.3.3. (qLDLH2: The quaternion $LDL^H$ decomposition) For a given Hermitian positive definite matrix $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{Q}^{m\times m}$, the input AA is the first column block of $A^R$, i.e. $A^R_c$; the output LL is the first column block of the lower triangular quaternion matrix $L$, and D is the diagonal matrix such that $A = LDL^H$.
    function [LL, D] = qLDLH2(AA)
    B = AA;
    m = size(B, 2);
    v = zeros(4*m, 1);
    B([2:m, 2+m:2*m, 2+2*m:3*m, 2+3*m:4*m], 1) = ...
        B([2:m, 2+m:2*m, 2+2*m:3*m, 2+3*m:4*m], 1)/B(1,1);
    for j = 2:m
        % v(i) = d_i * conj(L(j,i)), i = 1:j-1, in column-block form.
        for i = 1:j-1
            v([i, i+m, i+2*m, i+3*m], 1) = ...
                B(i,i)*[B(j,i); -B(j+m,i); -B(j+2*m,i); -B(j+3*m,i)];
        end
        % d_j = A(j,j) - sum_k d_k*|L(j,k)|^2 (a real number).
        B(j,j) = B(j,j) - [B(j,1:j-1), -B(j+m,1:j-1), -B(j+2*m,1:j-1), ...
            -B(j+3*m,1:j-1)] ...
            *v([1:j-1, m+1:j-1+m, 2*m+1:j-1+2*m, 3*m+1:j-1+3*m], 1);
        % Column j of L.
        B([j+1:m, j+1+m:2*m, j+1+2*m:3*m, j+1+3*m:4*m], j) = ...
            (B([j+1:m, j+1+m:2*m, j+1+2*m:3*m, j+1+3*m:4*m], j) - ...
            Realp(B(j+1:m,1:j-1), B(j+1+m:2*m,1:j-1), ...
                  B(j+1+2*m:3*m,1:j-1), B(j+1+3*m:4*m,1:j-1)) ...
            *v([1:j-1, m+1:j-1+m, 2*m+1:j-1+2*m, 3*m+1:j-1+3*m], 1))/B(j,j);
    end
    LL = [tril(B(1:m,:),-1) + eye(m); tril(B(m+1:2*m,:),-1); ...
          tril(B(2*m+1:3*m,:),-1); tril(B(3*m+1:4*m,:),-1)];
    D = diag(diag(B(1:m,:)));
    end
The third algorithm that we propose is a generalization of the real Cholesky decomposition: the quaternion Cholesky decomposition. It can be derived from the partitioning
$$A = \begin{pmatrix} \alpha & v^H \\ v & B \end{pmatrix}
= \begin{pmatrix} \beta & 0 \\ \frac{v}{\beta} & I_{m-1} \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & B - \frac{vv^H}{\alpha} \end{pmatrix}
\begin{pmatrix} \beta & \frac{v^H}{\beta} \\ 0 & I_{m-1} \end{pmatrix}, \quad \beta = \sqrt{\alpha}. \tag{2.3.1}$$
We can derive the quaternion Cholesky decomposition by repeated application of (2.3.1). Now we propose the following numerical code for the real structure-preserving algorithm qChol of the quaternion Cholesky decomposition, based on outer products.

Algorithm 2.3.4. (qChol: The quaternion Cholesky decomposition) For a given Hermitian positive definite matrix $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{Q}^{m\times m}$, the input AA is the first column block of $A^R$, i.e. $A^R_c$; the output LL is the first column block of $L^R$ such that $A = LL^H$.
Table 2.3.1. Computational amounts and assignment numbers

              real flops    assignment numbers
    qLDLH1    32m^3/3       2m + 2
    qLDLH2    16m^3/3       (m^2 + 3m)/2
    qChol     16m^3/3       5m
    function LL = qChol(AA)
    m = size(AA, 2);
    B = AA;
    for k = 1:m
        B(k,k) = sqrt(B(k,k));               % the pivot is real and positive
        B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k) = ...
            B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k)/B(k,k);
        C = Realp(B(k+1:m,k), B(k+1+m:2*m,k), B(k+1+2*m:3*m,k), B(k+1+3*m:4*m,k));
        Ct = C';
        % Outer-product update: trailing block minus v*v^H, in column blocks.
        B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k+1:m) = ...
            B([k+1:m, k+1+m:2*m, k+1+2*m:3*m, k+1+3*m:4*m], k+1:m) - C*Ct(:, 1:m-k);
    end
    LL = zeros(4*m, m);
    LL([2:m, 2+m:2*m, 2+2*m:3*m, 2+3*m:4*m], 1:m-1) = ...
        [tril(B(2:m, 1:m-1)); tril(B(m+2:2*m, 1:m-1)); ...
         tril(B(2*m+2:3*m, 1:m-1)); tril(B(3*m+2:4*m, 1:m-1))];
    LL(1:m,:) = LL(1:m,:) + diag(diag(B(1:m,:)));
    end
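A minimal usage sketch of qChol: we build a quaternion Hermitian positive definite matrix as $A = BB^H + I$ (so that $A^R = B^R(B^R)^T + I$), factor it, and check the residual:

    m = 4;
    BR = Realp(randn(m), randn(m), randn(m), randn(m));
    AR = BR*BR' + eye(4*m);           % real representation of A = B*B^H + I
    LL = qChol(AR(:, 1:m));           % input: first column block A^R_c
    LR = Realp(LL(1:m,:), LL(m+1:2*m,:), LL(2*m+1:3*m,:), LL(3*m+1:4*m,:));
    norm(LR*LR' - AR, 'fro')          % small residual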
We list the real computational amounts and the assignment numbers of the above algorithms for $A \in \mathbb{Q}^{m\times m}$ in Table 2.3.1. From this table, we can see that the real computational amount of qLDLH2 is half that of qLDLH1, but the assignment number of qLDLH2 is much higher than those of qLDLH1 and qChol. This is because in qLDLH1 and qChol we can use high-level operations at each step of the computation, while in qLDLH2 we cannot compute $v(1:j, 1)$ with high-level operations. Now we provide numerical examples to compare the efficiency of the above three real structure-preserving algorithms qLDLH1, qLDLH2, qChol with QTFM on matrices of sizes $10k \times 10k$ for $k = 1 : 50$.
Figure 2.3.1. CPU times for the Cholesky decomposition.
Figure 2.3.2. Errors (F-norm) for the Cholesky decomposition.
From Fig 2.3.1, we observe that all three real structure-preserving algorithms are superior to that of QTFM. When the sizes of the matrices are large, the CPU time of qLDLH1 is almost the same as that of qChol, which is about two-thirds of that of qLDLH2, and one-fifth of that of the function 'lu' of the Quaternion Toolbox, respectively. In Fig 2.3.2, we observe that the Frobenius norms of $LL^H - A$ for qLDLH1, qChol and the function 'lu' of the Quaternion Toolbox are almost the same, while that of qLDLH2 is about twice that of the other three algorithms. Therefore, the algorithms qLDLH1 and qChol are the most efficient.
2.4. The Quaternion QR Decomposition

The QR decomposition is one of the most important numerical tools in matrix theory and numerical computations. For a real matrix $A$, several efficient methods have been developed to perform the QR decomposition. In this section, we generalize this idea to provide the quaternion QR decomposition.

Theorem 2.4.1. (The quaternion QR decomposition) For given $A \in \mathbb{Q}_r^{m\times n}$ with $r > 0$, there exist a permutation matrix $\Pi \in \mathbb{R}^n$, $Q \in \mathbb{U}^m$ and an upper trapezoidal matrix $R \in \mathbb{Q}_r^{r\times n}$ such that
$$A\Pi = Q \begin{pmatrix} R \\ 0 \end{pmatrix}. \tag{2.4.1}$$
Proof. We use induction on $r$, the rank of $A$. The result is obvious when $r = 1$; suppose it holds for $1 \leq r < k$. When $r = k$, there exists a permutation matrix $\Pi_k$ such that $\tilde{a}_1 \neq 0$, which is the first column of $\tilde{A} \equiv A\Pi_k$. Let $\sqrt{\tilde{a}_1^H \tilde{a}_1} = r_{11}$, $\tilde{q}_1 = \tilde{a}_1/r_{11}$, and $\tilde{Q} = (\tilde{q}_1, \tilde{Q}_2) \in \mathbb{U}^m$. Then
$$\tilde{Q}^H \tilde{A} = \begin{pmatrix} r_{11} & * \\ 0 & A_1 \end{pmatrix},$$
in which $\operatorname{rank}(A_1) = k - 1$. By the induction hypothesis, there exist a permutation matrix $\tilde{\Pi}_1 \in \mathbb{R}^{n-1}$, $\tilde{Q}_1 \in \mathbb{U}^{m-1}$ and an upper trapezoidal matrix $R_1 \in \mathbb{Q}_{k-1}^{(k-1)\times(n-1)}$ such that $A_1 \tilde{\Pi}_1 = \tilde{Q}_1 \begin{pmatrix} R_1 \\ 0 \end{pmatrix}$. Take $\Pi = \Pi_k \operatorname{diag}(1, \tilde{\Pi}_1)$ and $Q = \tilde{Q}\operatorname{diag}(1, \tilde{Q}_1)$; then we have $A\Pi = Q\begin{pmatrix} R \\ 0 \end{pmatrix}$.
Corollary 2.4.2. (Full rank decomposition) For given $A \in \mathbb{Q}_r^{m\times n}$ with $r > 0$, there exist $F \in \mathbb{Q}_r^{m\times r}$ and $G \in \mathbb{Q}_r^{r\times n}$ such that
$$A = FG. \tag{2.4.2}$$
Proof. In Theorem 2.4.1, let $F = Q_1 \in \mathbb{Q}_r^{m\times r}$, i.e., the first $r$ columns of $Q$, and $G = R\Pi^T \in \mathbb{Q}_r^{r\times n}$; then we have $A = FG$.

Remark 2.4.1. If we rewrite the QR decomposition of $A$ as $A = Q_1 \tilde{R}$, in which $Q_1$ is the first $r$ columns of $Q$ and $\tilde{R} = R\Pi^T$, then the decomposition $A = Q_1 \tilde{R}$ is called the unitary decomposition of $A$. In addition, for an $n \times n$ nonsingular matrix $A$, its QRD $A = QR$ is unique up to a diagonal matrix $D$ with $|D| = I_n$, in which $Q$ is an $n \times n$ unitary matrix and $R$ is an $n \times n$ upper triangular matrix with nonzero diagonal entries. In fact, if $A = Q_1 R_1 = Q_2 R_2$ are two QRDs of $A$, in which $Q_1$, $Q_2$ are $n \times n$ unitary matrices and $R_1$, $R_2$ are $n \times n$ upper triangular matrices, then $Q_1 = Q_2 D$ and $R_1 = D^{-1} R_2$, in which $|D| = I_n$.
2.4.1. The Quaternion Householder QRD
In §2.1.2, we introduced four quaternion Householder based transformations. In this subsection, we apply these transformations to perform the quaternion Householder QRD, and we compare the computational amounts and assignment numbers of the different Householder transformations to find the most efficient ones [115]. For convenience, $H_1$ can be rewritten as $H_1 = I - \beta uu^H$, in which $u = x - y$, $\beta = 2/(x-y)^H(x-y)$, and similarly for $H_0$. Also, to reduce the computational amounts, for $A \in \mathbb{Q}^{m\times n}$ we use the following forms to compute $H_l A$, $l = 1, 2, 3, 4$. Notice that this trick is also used in Householder transformations for real matrices; see, e.g., [64].
$$\begin{aligned}
1.\;& H_1 A = A - (\beta u)(u^H A),\\
2.\;& H_2 A = zH_1 A = z(A - (\beta u)(u^H A)),\\
3.\;& H_3 A = H_0 X_M A = \hat{A} - (\beta u)(u^T \hat{A}), \quad \hat{A} = \operatorname{diag}\Big(\frac{\bar{A}_{11}}{|A_{11}|}, \frac{\bar{A}_{21}}{|A_{21}|}, \cdots, \frac{\bar{A}_{m1}}{|A_{m1}|}\Big)A,\\
4.\;& H_4 A = \alpha_M H_1 A = \alpha_M (A - (\beta u)(u^H A)).
\end{aligned} \tag{2.4.3}$$
Here we describe the procedures of $H_l A$ $(l = 1 : 4)$ with a $5 \times 4$ quaternion matrix:
$$A = \begin{pmatrix} \times&\times&\times&\times \\ \times&\times&\times&\times \\ \times&\times&\times&\times \\ \times&\times&\times&\times \\ \times&\times&\times&\times \end{pmatrix}, \qquad
H_1 A = \begin{pmatrix} y&y&y&y \\ 0&y&y&y \\ 0&y&y&y \\ 0&y&y&y \\ 0&y&y&y \end{pmatrix},$$
$$H_2 A = \xi \begin{pmatrix} y&y&y&y \\ 0&y&y&y \\ 0&y&y&y \\ 0&y&y&y \\ 0&y&y&y \end{pmatrix} = \begin{pmatrix} r&z&z&z \\ 0&z&z&z \\ 0&z&z&z \\ 0&z&z&z \\ 0&z&z&z \end{pmatrix}, \qquad
H_3 A = H_0 \begin{pmatrix} r&y&y&y \\ r&y&y&y \\ r&y&y&y \\ r&y&y&y \\ r&y&y&y \end{pmatrix} = \begin{pmatrix} r&z&z&z \\ 0&z&z&z \\ 0&z&z&z \\ 0&z&z&z \\ 0&z&z&z \end{pmatrix},$$
$$H_4 A = \begin{pmatrix} \xi & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} y&y&y&y \\ 0&y&y&y \\ 0&y&y&y \\ 0&y&y&y \\ 0&y&y&y \end{pmatrix} = \begin{pmatrix} r&z&z&z \\ 0&y&y&y \\ 0&y&y&y \\ 0&y&y&y \\ 0&y&y&y \end{pmatrix},$$
in which the letters $r$ denote real numbers. Denoting by $n_r$ and $n_q$ a real number and a quaternion number, respectively, we list the computational amounts for computing $H_l A$ with $A \in \mathbb{Q}^{m\times n}$ in Table 2.4.2. In the multiplication of two quaternion numbers there are 16 multiplications and 12 additions of real numbers, and the differences in the cost of the four kinds of Householder based transformations are concentrated mainly in the multiplications of quaternion numbers.
Table 2.4.2. Computational amounts for $H_l A$

             n_q * n_q    n_q + n_q    n_r * n_q    n_r + n_r
    H1 A     2mn          (2m-1)n      m
    H2 A     3mn          (2m-1)n      m
    H3 A     mn           (2m-1)n      2mn          m
    H4 A     2mn+n        (2m-1)n      m

Table 2.4.3. Computational amounts and assignment numbers for $(H_l A)^R_c$

                 real flops    assignment numbers
    (H1 A)Rc     64mn          4
    (H2 A)Rc     92mn          m + 5
    (H3 A)Rc     44mn          2m + 2
    (H4 A)Rc     64mn          6
We can see that the computation of $H_3 A$ has the lowest computational amount, which is one half of those of $H_1 A$ and $H_4 A$, and one third of that of $H_2 A$. Next, we propose real structure-preserving algorithms for quaternion Householder based transformations. In Table 2.4.2 we list the numbers of quaternion arithmetic operations for computing $H_l A$; in Table 2.4.3, we list the real flops and assignment numbers of $(H_l A)^R_c$ for a quaternion matrix $A$. Comparing Tables 2.4.2 and 2.4.3, we observe that the numbers of real arithmetic operations contained in computing $H_l A$ and $(H_l A)^R_c$ are the same, because the multiplication of two quaternion numbers contains 12 real additions and 16 real multiplications, and the addition of two quaternion numbers contains 4 real additions. Therefore, real structure-preserving algorithms are much more efficient than the original transformations, which adopt quaternion arithmetic.

Let $A \in \mathbb{Q}^{m\times n}$ be of full column rank $n$ with $m \geq n$. We can obtain the QR decomposition utilizing Householder transformations as follows:
$$R = \begin{pmatrix} I_{n-1} & 0 \\ 0 & H_l^{(n-1)} \end{pmatrix} \cdots \begin{pmatrix} I_s & 0 \\ 0 & H_l^{(s)} \end{pmatrix} \cdots \begin{pmatrix} 1 & 0 \\ 0 & H_l^{(1)} \end{pmatrix} H_l^{(0)} A, \tag{2.4.4}$$
in which for $l = 1, \cdots, 4$ and $s = 0, \cdots, n-1$, $H_l^{(s)}$ are $(m-s) \times (m-s)$ unitary quaternion matrices of type $H_l$ defined above. After $n$ steps, we obtain the upper triangular matrix $R$, and
$$Q = (H_l^{(0)})^H \begin{pmatrix} 1 & 0 \\ 0 & H_l^{(1)} \end{pmatrix}^H \cdots \begin{pmatrix} I_s & 0 \\ 0 & H_l^{(s)} \end{pmatrix}^H \cdots \begin{pmatrix} I_{n-1} & 0 \\ 0 & H_l^{(n-1)} \end{pmatrix}^H. \tag{2.4.5}$$
We illustrate this process with a $5 \times 4$ quaternion matrix $A$.
A=
=⇒
× × × × × y 0 0 0 0
× × × × × y z 0 0 0
× × × × × y z w 0 0
× y y y 0 y y × × =⇒ 0 y y 0 y y × × 0 y y y y y y 0 z z z w =⇒ 0 0 w 0 0 0 w 0 0 0 w
y y y =⇒ y y y z w . u 0
y y y 0 z z 0 0 z 0 0 z 0 0 z
y z z z z
It is not difficult to write real structure-preserving algorithms from (2.4.4) and $H_l^{(s)}$, by applying Theorem 2.1.7–Theorem 2.1.10 and Algorithm 2.1.11 proposed above, to perform real structure-preserving QR decomposition algorithms, named qQR1, qQR2, qQR3 and qQR4, respectively. Notice that the initial data is contained in $B := A^R_c$, and the factorizations are also performed on $B$. For brevity, here we give numerical codes only for the algorithms qQR1 and qQR4.

Table 2.4.4. Computational amounts and assignment numbers for qQRl

            real flops            assignment numbers
    qQR1    32(mn^2 - n^3/3)      4n
    qQR2    46(mn^2 - n^3/3)      nm - n^2/2
    qQR3    22(mn^2 - n^3/3)      2nm - n^2
    qQR4    32(mn^2 - n^3/3)      6n
Algorithm 2.4.3. (qQR1) For a given quaternion matrix $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{Q}_n^{m\times n}$, the input AA is the first column block of $A^R$, i.e. $A^R_c$; the output B1 is the first column block of $R^R$ in the QR decomposition of $A$.

    function B1 = qQR1(AA)
    [M, n] = size(AA);
    m = M/4;
    B1 = AA;
    for s = 1:n
        % Householder vector and scalar for column s of the trailing block.
        [u1, beta1] = Householder1(B1(s:m,s), B1(m+s:2*m,s), ...
            B1(2*m+s:3*m,s), B1(3*m+s:4*m,s), m-s+1);
        u1R = Realp(u1(:,1), u1(:,2), u1(:,3), u1(:,4));
        Y = B1([s:m, s+m:2*m, s+2*m:3*m, s+3*m:4*m], s:n);
        Y = Y - (beta1*u1R)*(u1R'*Y);       % apply H1 = I - beta1*u1*u1^H
        B1([s:m, s+m:2*m, s+2*m:3*m, s+3*m:4*m], s:n) = Y;
    end
    end
Algorithm 2.4.4. (qQR4) For a given quaternion matrix $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{Q}_n^{m\times n}$, the input AA is the first column block of $A^R$, i.e. $A^R_c$; the output B4 is the first column block of $R^R$ in the QR decomposition of $A$.

    function B4 = qQR4(AA)
    [M, n] = size(AA);
    m = M/4;
    B4 = AA;
    for s = 1:n
        [u4, beta4] = Householder1(B4(s:m,s), B4(m+s:2*m,s), ...
            B4(2*m+s:3*m,s), B4(3*m+s:4*m,s), m-s+1);
        u4R = Realp(u4(:,1), u4(:,2), u4(:,3), u4(:,4));
        Y = B4([s:m, s+m:2*m, s+2*m:3*m, s+3*m:4*m], s:n);
        Y = Y - (beta4*u4R)*(u4R'*Y);       % apply H1
        B4([s:m, s+m:2*m, s+2*m:3*m, s+3*m:4*m], s:n) = Y;
        % Multiply row s by alpha_M to make the diagonal entry real (H4).
        G4 = JRSGivens(B4(s,s), B4(s+m,s), B4(s+2*m,s), B4(s+3*m,s));
        B4([s, s+m, s+2*m, s+3*m], s:n) = G4'*B4([s, s+m, s+2*m, s+3*m], s:n);
    end
    end
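A minimal usage sketch of qQR1 on a random quaternion matrix:

    m = 6; n = 3;
    AR = Realp(randn(m,n), randn(m,n), randn(m,n), randn(m,n));
    B1 = qQR1(AR(:, 1:n));                  % first column block of R^R
    % The strictly lower triangular parts of all four blocks are zero:
    norm(B1(1:m,:) - triu(B1(1:m,:)), 'fro')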
The computational flops and the assignment numbers of qQRl for $A \in \mathbb{Q}^{m\times n}$ are listed in Table 2.4.4. We now provide a numerical example to compare the performance of our real structure-preserving algorithms with the known method in QTFM. For $a = 9$, $b = 6$, $k = 1 : 60$, $m = ak$, $n = bk$, we perform five different algorithms to compute the QR decomposition of $A$: the four real structure-preserving algorithms qQR1, qQR2, qQR3, qQR4 and one adopting the function 'qr' in QTFM [158]. We compare the CPU times of these five algorithms.
From Fig 2.4.1, we observe that the four real structure-preserving algorithms are superior to the function 'qr' in QTFM. Among the four real structure-preserving algorithms, qQR1 and qQR4 cost about one-twentieth of the CPU time of running 'qr' of QTFM. Therefore, the algorithms qQR1 and qQR4 are the most efficient.

In the QR decomposition, column pivoting is the key step to guarantee numerical stability. For real structure-preserving algorithms of the QR decomposition of quaternion matrices, this step can be easily performed, because $\|z\|_2 = \|z^R_c\|_2$ for a quaternion vector $z$. In the following we propose numerical codes for two real structure-preserving Householder QRD algorithms based on $H_1$ and $H_4$.

Algorithm 2.4.5. (qPQR1) For a given quaternion matrix $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{Q}^{m\times n}$, the input AA is the first column block of $A^R$, i.e. $A^R_c$; the output J is a vector which expresses the permutation matrix $P$, and B1 is the first column block of $R^R$ in the QR decomposition of $A$ with column pivoting.

    function [B1, J] = qPQR1(AA)
    [M, n] = size(AA);
    m = M/4;
    J = (1:n);
    B1 = AA;
    for s = 1:n
        if s < n
            % Column pivoting: move the column of largest norm to position s.
            [mx, id] = max(sum(B1([s:m, m+s:2*m, 2*m+s:3*m, 3*m+s:4*m], s:n) ...
                .*B1([s:m, m+s:2*m, 2*m+s:3*m, 3*m+s:4*m], s:n)));
            if id > 1
                id = id + s - 1;
                B1(:, [s, id]) = B1(:, [id, s]);
                J([s, id]) = J([id, s]);
            end
        end
        [u1, beta1] = Householder1(B1(s:m,s), B1(m+s:2*m,s), ...
            B1(2*m+s:3*m,s), B1(3*m+s:4*m,s), m-s+1);
        u1R = Realp(u1(:,1), u1(:,2), u1(:,3), u1(:,4));
        Y = B1([s:m, s+m:2*m, s+2*m:3*m, s+3*m:4*m], s:n);
        Y = Y - (beta1*u1R)*(u1R'*Y);
        B1([s:m, s+m:2*m, s+2*m:3*m, s+3*m:4*m], s:n) = Y;
    end
    end
Algorithm 2.4.6. (qPQR4) For a given quaternion matrix $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{Q}^{m\times n}$, the input AA is the first column block of $A^R$, i.e. $A^R_c$; the output J is a vector which expresses the permutation matrix $P$, and B4 is the first column block of $R^R$ in the QR decomposition of $A$ with column pivoting.

    function [B4, J] = qPQR4(AA)
    [M, n] = size(AA);
    m = M/4;
    J = (1:n);
    B4 = AA;
    for s = 1:n
        if s < n
            % Column pivoting, as in qPQR1.
            [mx, id] = max(sum(B4([s:m, m+s:2*m, 2*m+s:3*m, 3*m+s:4*m], s:n) ...
                .*B4([s:m, m+s:2*m, 2*m+s:3*m, 3*m+s:4*m], s:n)));
            if id > 1
                id = id + s - 1;
                B4(:, [s, id]) = B4(:, [id, s]);
                J([s, id]) = J([id, s]);
            end
        end
        [u4, beta4] = Householder1(B4(s:m,s), B4(m+s:2*m,s), ...
            B4(2*m+s:3*m,s), B4(3*m+s:4*m,s), m-s+1);
        u4R = Realp(u4(:,1), u4(:,2), u4(:,3), u4(:,4));
        Y = B4([s:m, s+m:2*m, s+2*m:3*m, s+3*m:4*m], s:n);
        Y = Y - (beta4*u4R)*(u4R'*Y);
        B4([s:m, s+m:2*m, s+2*m:3*m, s+3*m:4*m], s:n) = Y;
        % Make the diagonal entry real via JRSGivens (H4).
        G4 = JRSGivens(B4(s,s), B4(s+m,s), B4(s+2*m,s), B4(s+3*m,s));
        B4([s, s+m, s+2*m, s+3*m], s:n) = G4'*B4([s, s+m, s+2*m, s+3*m], s:n);
    end
    end
Figure 2.4.1. CPU times for the QR decompositions.
2.4.2. The Givens QRD
The second way to obtain the QR decomposition is to perform Givens transformations. For a matrix $A$, there exist Givens matrices $G_1, G_2, \cdots, G_t$, in which $t$ is the total number of rotations, such that $G_t \cdots G_2 G_1 A = R_2$. Denote $Q_2 = (G_t \cdots G_2 G_1)^H$; then we have the QRD
$$A = Q_2 R_2,$$
in which $Q_2$ is an $n \times n$ unitary matrix and $R_2$ is an $n \times n$ upper triangular matrix. This method requires $(2n^3 + \frac{5}{2}n^2) \times 16$ real flops and $\frac{1}{2}n^2$ square roots.
2.4.3. The Modified Gram-Schmidt Scheme
The third way to obtain the quaternion QR decomposition is the modified Gram-Schmidt (MGS) algorithm, obtained by a rearrangement of the calculation of the classical Gram-Schmidt algorithm; it is a direct method for implementing the quaternion QR decomposition of a full column rank matrix. For a matrix $A$, at the kth step of the MGS, the kth column of $Q$ (denoted by $q_k$) and the kth row of $R$ (denoted by $r_k^T$) are determined. To derive the MGS method, we define the matrix $A^{(k)}$ of size $m \times (n-k+1)$ by
$$(0 \,\vdots\, A^{(k)}) = A - \sum_{i=1}^{k-1} q_i r_i^H = \sum_{i=k}^{n} q_i r_i^H.$$
It follows that if
$$A^{(k)} = (z \,\vdots\, B),$$
then $r_{kk} = \|z\|_2$, $q_k = z/r_{kk}$ and $(r_{k,k+1}, \cdots, r_{kn}) = q_k^H B$. We then compute the outer product
$$A^{(k+1)} = B - q_k (r_{k,k+1}, \cdots, r_{kn})$$
and proceed to the next step. This completely describes the kth step of the MGS. This method requires $32n^3$ flops. Compared with the classical Gram-Schmidt algorithm, the MGS has better numerical properties and is a more reliable procedure. Among the Householder transformation algorithm, the Givens transformation algorithm and the modified Gram-Schmidt algorithm, the modified Gram-Schmidt algorithm is the most effective. Now we give numerical codes for the real structure-preserving modified Gram-Schmidt algorithm and the column pivoting MGS algorithm for quaternion matrices, which efficiently decrease the computational cost and increase computational efficiency.

Algorithm 2.4.7. (The real structure-preserving quaternion MGS algorithm) For a given quaternion matrix $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{Q}^{n\times n}$, the input AA is the first column block of $A^R$, i.e. $A^R_c$; the outputs QQ and RR are the first column blocks of $Q^R$ and $R^R$ in the QR decomposition of $A$, respectively.
    function [QQ, RR] = qMGS1(AA)
    n = size(AA, 2);
    RR = zeros(4*n, n);
    for s = 1:n-1
        RR(s,s) = norm(AA(:,s));
        AA(:,s) = AA(:,s)/RR(s,s);        % normalize column s: q_s
        GD = Realp(AA(1:n,s), AA(1+n:2*n,s), AA(2*n+1:3*n,s), AA(3*n+1:4*n,s));
        % Row s of R: r_{s,s+1:n} = q_s^H * A(:, s+1:n).
        RR([s, s+n, s+2*n, s+3*n], s+1:n) = GD'*AA(:, s+1:n);
        % Orthogonalize the remaining columns against q_s.
        AA(:, s+1:n) = AA(:, s+1:n) - GD*RR([s, s+n, s+2*n, s+3*n], s+1:n);
    end
    RR(n,n) = norm(AA(:,n));
    AA(:,n) = AA(:,n)/RR(n,n);
    QQ = AA;
    end
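A minimal usage sketch of qMGS1, reconstructing $A = QR$ through real representations:

    n = 4;
    AR = Realp(randn(n), randn(n), randn(n), randn(n));
    [QQ, RR] = qMGS1(AR(:, 1:n));
    QR_ = Realp(QQ(1:n,:), QQ(n+1:2*n,:), QQ(2*n+1:3*n,:), QQ(3*n+1:4*n,:));
    RR_ = Realp(RR(1:n,:), RR(n+1:2*n,:), RR(2*n+1:3*n,:), RR(3*n+1:4*n,:));
    norm(QR_*RR_ - AR, 'fro')             % small residual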
Algorithm 2.4.8. (The real structure-preserving column pivoting quaternion MGS algorithm) For a given quaternion matrix A = A1 + A2 i + A3 j + A4 k ∈ Qn×n , the input AA is the first column block of AR , i.e. ARc , the output J is a vector expressing the permutation matrix P, QQ is the the first column block of QR in the QR decomposition of A Function : [J, QQ, RR] = qPMGS1(AA) [M, n] = size(AA); R = zeros(4 ∗ n, n); for s = 1 : n − 1 [mx, id] = max(sum(AA(:, s : n). ∗ AA(:, s : n))); if id > 1 id = id + s − 1; AA(:, [s, id]) = AA(:, [id, s]); J([s, id]) = J([id, s]); end
69
Computing Matrix Decompositions R(s, s) = norm(AA(:, s), 2); AAc (:, s) = AAc (:, s)/R(s, s); R([s, s + n, s + 2n, s + 3n], s + 1 : n)) = (AR )T AA(:, s + 1 : n); AA(:, s + 1 : n) = AA(:, s + 1 : n) − AR R([s, s + n, s + 2n, s + 3n], ... s + 1 : n); end R(n, n) = norm(AA(:, n), 2); AA(:, n) = AA(:, n)/R(n, n); QQ = AA; end
In Algorithm 2.4.7, the AR is overwritten by QR , and the matrix R is stored in a separate matrix.
2.4.4.
Complete Orthogonal Decomposition
For A ∈ Qrm×n with r < n, R obtained by QR decomposition is upper trapezoidal. It is difficult to compute the MP inverse of R. To solve this problem, we should compute the QR decomposition of RH . Let the QR decomposition with column pivoting be AΠ = Q1 R. Compute QR decomposition of RH , we obtain RH = V1 SH , in which V1H V1 = Ir , SH is nonsingular upper triangular. Then we obtain AΠ = Q1 SV1H ,
(2.4.6)
H in which QH 1 Q1 = V1 V1 = Ir , S is nonsingular lower triangular. (2.4.6) is called complete orthogonal decomposition of A.
2.5. The Quaternion SVD The SVD is regarded as one of the most fundamental and powerful decompositions. It deals with the rows and columns in a symmetric fashion, reveals useful properties of the original matrix. This is the property that makes the SVD so useful in many areas. In this section, we propose a real structure-preserving algorithm for the quaternion SVD [114]. In the process of performing the quaternion SVD of the matrix A ∈ Qm×n , we apply the idea of Golub and Reinsch in [62] for the
70
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
SVD of a real matrix, first perform a sequence of left-multiplying and rightmultiplying Householder based transformations on A, to obtain a real upper bidiagonal matrix T as follows, "
# " # " # In−1 0 Is 0 1 0 (0) T= ... ... Hl A (n−1) (s) (1) 0 Hl 0 Hl 0 Hl " #H " #H " #H 1 0 It 0 In−1 0 × ... ... , b (1) b (t) b (n−1) 0 H 0 H 0 H l l l (s)
in which for l = 2, 3, 4 and s = 0, · · · , n − 1, Hl are m − s unitary quaternion b (t) are n − s unitary matrices of type Hl , and for l = 2, 3, 4 and t = 1, · · · , n − 1, H l quaternion matrices of type Hl . Then we perform a sequence of iterations using Givens rotations on the real upper bidiagonal matrix T , such that the resulting matrix converges to a diagonal matrix Σ. Here we describe this procedure with a 6 × 4 quaternion matrix.
A=
=⇒
× × × × × × r 0 0 0 0 0
× × × × × × r r 0 0 0 0
0 r v v v v
× × × × × ×
0 0 v v v v
× × × × × ×
=⇒
=⇒
r 0 0 0 0 0
r 0 0 0 0 0
y y y y y y r r 0 0 0 0
y y y y y y 0 r r 0 0 0
0 0 w w w w
y r y 0 y 0 =⇒ y 0 0 y y 0 r 0 0 =⇒ 0 0 0
0 z z z z z
r z z z z z r r 0 0 0 0
0 r r 0 0 0
0 z z z z z 0 0 r p p p
=⇒
=⇒
r 0 0 0 0 0
r 0 0 0 0 0
0 u u u u u
r r 0 0 0 0 r r 0 0 0 0
0 r r 0 0 0
0 u u u u u 0 0 r r 0 0
= T,
in which the letters r denote some real numbers. We can propose three kinds of real structure-preserving algorithms, named qSVD2, qSVD3 and qSVD4, respectively, in terms of H2 , H3 and H4 . Here we only give the numerical code of algorithm of qSVD4. Algorithm 2.5.1. (qSVD4) For a given matrix A = A1 + A2 i + A3 j + A4 k ∈ Qm×n , the input AA is the first column block of AR , i.e. ARc , the output P R R and W arethe first column block of PP and WW , respectively, and D = Σ1 0 , Σ1 = diag(σ1 , · · · , σr ), such that A = PP ∗ D ∗WW H . 0 0
Computing Matrix Decompositions Function : [P, D,W] = qSV D4(AA) B4 = AA; P = zeros(4 ∗ m, m); P(1 : m, :) = eye(m); W = zeros(4 ∗ n, n); W (1 : n, :) = eye(n); Y = W 0; for s = 1 : n − 1 if norm([B4(s : m, s); B4((m + s) : (2 ∗ m), s); B4((2 ∗ m + s) : (3 ∗ m), s); ... B4((3 ∗ m + s) : (4 ∗ m), s)]) 6= 0 if norm([B4(s, s); B4(s + m, s); B4(s + 2 ∗ m, s); B4(s + 3 ∗ m, s)]) 6= 0 G4 = JRSGivens(B4(s, s), B4(s + m, s), B4(s + 2 ∗ m, s), B4(s + 3 ∗ m, s)); t = s; B4([t, t + m, t + 2 ∗ m, t + 3 ∗ m], s : n) = ... G40 ∗ B4([t, t + m, t + 2 ∗ m, t + 3 ∗ m], s : n); P([t, t + m, t + 2 ∗ m, t + 3 ∗ m], 1 : m) = ... G40 ∗ P([t, t + m, t + 2 ∗ m, t + 3 ∗ m], 1 : m); end [u4R, beta4] = Householder(B4(s : m, s), B4((m + s) : (2 ∗ m), s), ... B4((2 ∗ m + s) : (3 ∗ m), s), B4((3 ∗ m + s) : (4 ∗ m), s), m − s + 1); BB = B4([s : m, s + m : 2 ∗ m, s + 2 ∗ m : 3 ∗ m, s + 3 ∗ m : 4 ∗ m], s : n); BB = BB − (beta4 ∗ u4R) ∗ (u4R0 ∗ BB); B4([s : m, s + m : 2 ∗ m, s + 2 ∗ m : 3 ∗ m, s + 3 ∗ m : 4 ∗ m], s : n) = BB; PP = P([s : m, s + m : 2 ∗ m, s + 2 ∗ m : 3 ∗ m, s + 3 ∗ m : 4 ∗ m], 1 : m); PP = PP − (beta4 ∗ u4R) ∗ (u4R0 ∗ PP); P([s : m, s + m : 2 ∗ m, s + 2 ∗ m : 3 ∗ m, s + 3 ∗ m : 4 ∗ m], 1 : m) = PP; end if s n [u4R, beta4] = Householder(B4(s : m, s), B4((m + s) : (2 ∗ m), s), ... B4((2 ∗ m + s) : (3 ∗ m), s), B4((3 ∗ m + s) : (4 ∗ m), s), m − s + 1); BB4 = B4([s : m, s + m : 2 ∗ m, s + 2 ∗ m : 3 ∗ m, s + 3 ∗ m : 4 ∗ m], s : n); BB4 = BB4 − (beta4 ∗ u4R) ∗ (u4R0 ∗ BB4); B4([s : m, s + m : 2 ∗ m, s + 2 ∗ m : 3 ∗ m, s + 3 ∗ m : 4 ∗ m], s : n) = BB4; PP4 = P([s : m, s + m : 2 ∗ m, s + 2 ∗ m : 3 ∗ m, s + 3 ∗ m : 4 ∗ m], 1 : m); PP4 = PP4 − (beta4 ∗ u4R) ∗ (u4R0 ∗ PP4); P([s : m, s + m : 2 ∗ m, s + 2 ∗ m : 3 ∗ m, s + 3 ∗ m : 4 ∗ m], 1 : m) = PP4; end end [U, D,V] = svd(B4(1 : m, 1 : n)); P = [U 0 ∗ P(1 : m, :);U 0 ∗ P(m + 1 : 2 ∗ m, :); ... U 0 ∗ P(2 ∗ m + 1 : 3 ∗ m, :);U 0 ∗ P(3 ∗ m + 1 : 4 ∗ m, :)]; P = [P(1 : m, :)0; −P(m + 1 : 2 ∗ m, :)0; ... −P(2 ∗ m + 1 : 3 ∗ m, :)0; −P(3 ∗ m + 1 : 4 ∗ m, :)0]; Y = [Y (:, 1 : n) ∗ V,Y(:, n + 1 : 2 ∗ n) ∗ V, ... Y (:, 2 ∗ n + 1 : 3 ∗ n) ∗ V,Y(:, 3 ∗ n + 1 : 4 ∗ n) ∗ V]; W = [Y (:, 1 : n); −Y (:, n + 1 : 2 ∗ n); ... −Y (:, 2 ∗ n + 1 : 3 ∗ n); −Y(:, 3 ∗ n + 1 : 4 ∗ n)]; end
Remark 2.5.1. When organizing real structure-preserving algorithms, two issues should be pointed out. 1. If x is a row quaternion vector, then xH is a column quaternion vector. T Therefore when Hl xH = αe1 with α a real number, then xQH l = αe1 . So we can obtain the Householder vector u and real scalar β from the vector xH to make T xQH l = αe1 . 2. When performing real structure-preserving algorithms for the quaternion SVD, we need to treat the performance of the form (AHlH )R = AR (HlR )T . In order to save storage and computational amounts, we only store B := ARc =
74
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Table 2.5.5. Computational amounts and assignment numbers of qSVDl
qSVD2 qSVD3 qSVD4
real flops 92(mn2 − n3 /3) 44(mn2 − n3 /3) 64(mn2 − n3 /3)
assignment numbers mn 2 5n /2 + 2mn 9n
(AT1 , AT2 , AT3 , AT4 )T , the first column block of AR . Therefore, we first form a real matrix C := (A1 , −A2 , −A3 , −A4 ), the first row block of AR , according to (1.8.1), then we obtain C(HlR )T , which is the first row block of (AHlH )R = AR (HlR )T . Finally, we store these data back to B. In Table 2.5.5, we list computational flops and assignment numbers of Algorithms qSVD2, qSVD3 and qSVD4 for the real bidiagonalization of A ∈ Qm×n . We now provide a numerical example to compare the efficiency of real structure-preserving algorithms. For a = 9, b = 6, k = 1 : 70, m = ak, n = bk, we generate random matrices A1 , A2 , A3 , A4 ∈ Rm×n obtained using the function ‘rand 0 . We compare CPU times of three real structure-preserving algorithms qSVD2, qSVD3, qSVD4 for performing real bidiagonalization of quaternion matrices and one adopting the function 0 svd 0 in QTFM [158]. Fig 2.5.1 shows that the CPU time of the Algorithm qSVD4 is the smallest, which cost about the half CPU time of qSVD3, and one-seventh CPU time of the function 0 svd 0 in QTFM. From the above discussion and Fig 2.5.1, we see that our real structurepreserving algorithms are efficient. Among them, the real structure-preserving algorithm based on H4 is the most efficient and convenient. We now apply the above Algorithm 2.5.1 to the color image compression problem. In [155], Sangwine proposed to encode the three channel components of a RGB image on the three imaginary parts of a pure imaginary quaternion, that is q(x, y) = r(x, y)i + g(x, y)j + b(x, y)k, in which r(x, y), g(x, y), b(x, y) are the red, green and blue values of the pixel (x, y), respectively. Thus, a color image with N rows and N columns can be
75
Computing Matrix Decompositions 70 −
−Algorithm qSVD2 +−−Algorithm qSVD3 o−−Algorithm qSVD4 *−−svd of Quaternion Toolbox
60
CPU time(second)
50
40
30
20
10
0
0
10
20
30
40
50
60
70
k=1:70
Figure 2.5.1. CPU times for the SVD. represented by a pure imaginary quaternion matrix A = (qi j )N×N = Ri + Gj + Bk, qi j ∈ Q. We can reconstruct a compressed color image by implementing the quaternion SVD for A using the real structure-preserving algorithm as follows [117]. Step1. Performing the quaternion SVD for A using the real structurepreserving algorithm to obtain A = UA ΣAVAH , in which ΣA = diag(σA1 , · · · , σAN ). Step2. Color image compressing Assume q is an integer and q < N. Denote AC = UA (:, 1 : q)ΣA (1 : q, 1 : q)VA(:, 1 : q)H . By removing some smallest singular values, we get a new quaternion matrix AC . Since the quaternion matrix of a color image is pure imaginary, we modify
76
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao a. original image
b. QC with q=16
c. QC with q=32
d. QC with q=64
e. QC with q=128
Figure 2.5.2. Original and compressed image Pepper using two schemes with different q.
the real part of AC as zero matrix to get a pure imaginary quaternion matrix A˜ C , which is corresponding to the compressed color image. We carry out an experiment to evaluate the effectiveness of the above compression schemes. We choose the color image Pepper of size 512 × 512 as the original image, shown in (a) of Fig 2.5.2. The compressed images with q = 16, 32, 64, 128 using quaternion compression scheme are shown in (b) − (e) of Fig 2.5.2. We use several different quantitative criterias to judge the effect of the quaternion compression algorithm, PSNR and CR. The visual fidelity can be measured by calculating two types of parameters known as Mean Square Error (MSE) or Peak Signal to Noise Ratio (PSNR) e Mean Square Error between the original image A and the compressed image A.
77
Computing Matrix Decompositions
Table 2.5.6. PSNR and CR values with different numbers of selected singular values image Pepper Pepper Pepper Pepper Pepper
N 512 512 512 512 512
q 8 16 32 64 128
PSNR 15.4628 18.6249 21.8394 24.8440 27.2391
CR 24 12 6 3 1.5
(MSE) is defined as N
MSE =
N
3
e y, k))2 ∑ ∑ ∑ (A(x, y, k) − A(x,
x=1 y=1 k=1
3N 2
and Peak Signal to Noise Ratio (PSNR) is defined as PSNR = 10lg
3N 2 (max A(x, y, k))2 N
N
3
,
e y, k))2 ∑ ∑ ∑ (A(x, y, k) − A(x,
x=1 y=1 k=1
in which A is a matrix of size N × N, max A(x, y, k) represents the maximum e y, k) are pixel value of a color image, and here it is 255. A(x, y, k) and A(x, the pixel values location at position (x, y, k) in the original image and the compressed image, respectively. In general, the lower MSE or the larger PSNR means better image quality. Compression Ratio (CR) is a term that is being used to describe ratio of compressed image to the original image defined as CR =
N2 , K
in which K is the size of compressed image. We can use CR to judge how compression efficiency is, as higher CR means better compression. Table 2.5.6 shows the quantitative comparison and compression ratio of the quaternion techniques with different q, the numbers of selected singular values.
Chapter 3
Linear System and Generalized Least Squares Problems For given nonsingular matrix A ∈ Qn×n and arbitrary vector b ∈ Qn , linear system Ax = b always exists a unique solution x = A−1 b. However, in many practical problems, the matrix A is not square and even not of full rank, or b 6∈ R (A), therefore linear system Ax = b does not exist a solution in the usual sense. For real or complex linear system and generalized least squares problems (GLS), theoretical properties and numerical method have been extensively studied. Concepts, properties and most numerical methods of linear system and GLS problems can be generalized to quaternion cases. In this chapter, we will study the linear system and GLS problems of Ax = b. In §3.1, we discuss basic definitions and properties of linear system. In §3.2, we study the least squares problem (LS). In §3.3, we study the total least squares problem (TLS). In §3.4, we discuss the equality constrained least squares problem (LSE).
3.1. Linear System Many scientific problems result in solving linear system. When a linear system has a solution, it is called compatible, otherwise it is called incompatible. In this section, we discuss equivalent conditions for a compatible quaternion linear system and the structures of the solutions.
80
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
1. Homogeneous Linear System Suppose that Ax = 0,
(3.1.1)
in which A = (ai j ) ∈ Qm×n is a given coefficient matrix, 0 ∈ Qm is the zero vector, x ∈ Qn is an unknown vector. For (3.1.1), zero vector is obviously a solution of (3.1.1), and therefore we mainly discuss non-zero solutions of (3.1.1). Theorem 3.1.1. (3.1.1) has non-zero solutions if and only if rank(A) < n. (3.1.1) has only zero solution if and only if rank(A) = n. When (3.1.1) has non-zero solutions, we study the structures of the solutions. We first introduce an important property of the solutions of (3.1.1). Theorem 3.1.2. Suppose that α1 , α2 , · · · , αr are solutions of (3.1.1). Then α1 k1 + α2 k2 + · · · + αr kr is also a solution of (3.1.1), in which k1 , k2 , · · · , kr are arbitrary quaternions. Now, we give the definition of the set of fundamental solutions. Definition 3.1.1. Suppose that η1 , η2 , · · · , ηt are the solutions of (3.1.1). If they satisfy (1) η1 , η2 , · · · , ηt are right linearly independent; (2) An arbitrary solution x of (3.1.1) can be uniquely expressed as x = η1 k1 + η2 k2 + · · · + ηt kt , in which k1 , k2 , · · · , kt are quaternions. Then {η1 , η2 , · · · , ηt } is called a set of fundamental solutions of (3.1.1). Obviously two sets of fundamental solutions of (3.1.1) are equivalent, and therefore they have the same number of vectors. How many vectors are included in a set of fundamental solutions? Now we discuss this problem. Theorem 3.1.3. For (3.1.1), suppose that rank(A) = r < n. Then (3.1.1) has a set of fundamental solutions, and the number of solutions in the set of fundamental solutions is n − r.
2. Nonhomogeneous Linear System Suppose that Ax = b,
(3.1.2)
Linear System and Generalized Least Squares Problems
81
in which A = (ai j ) ∈ Qm×n , 0 6= b ∈ Qm are given, x ∈ Qn is an unknown vector. Denote b = (A b). A
b is called the augmented matrix of (3.1.2) Then A
Theorem 3.1.4. Nonhomogeneous linear system (3.1.2) has a solution if and b = rank(A) only if rank(A)
From above theorem, we have that (3.1.2) does not exist a solution if and b 6= rank(A). only if rank(A)
Theorem 3.1.5. Suppose that (3.1.2) has a solution. If rank(A) = n, then (3.1.2) has a unique solution; If rank(A) < n, then (3.1.2) has infinite number of solutions. When (3.1.2) has infinite number of solutions, we discuss the structures of the solutions. The solutions of (3.1.1) and (3.1.2) have close relations. Now we list them as follows. Theorem 3.1.6. Suppose that u1 , u2 are two solutions of (3.1.2). Then u1 − u2 is a solution of (3.1.1). Theorem 3.1.7. Suppose that u is a solution of (3.1.2), and v is a solution of (3.1.1). Then u + v is a solution of (3.1.2). From the above properties, we can easily obtain the following theorem. Theorem 3.1.8. If γ0 is a solution of (3.1.2), then arbitrary solution γ of (3.1.2) can be expressed as γ = γ0 + η, in which η is a solution of (3.1.1). From above theorem, we observe that when we compute solutions of (3.1.2), we only need to compute a special solution of (3.1.2) and a set of fundamental solutions of (3.1.1). Suppose that γ0 is a special solution of (3.1.2), and {η1 , η2 , · · · , ηt } is a set of fundamental solutions of (3.1.1), then any solution of (3.1.2) can be expressed as γ0 + η1 k1 + η2 k2 + · · · + ηt kt , in which k1 , k2 , · · · , kt are quaternions.
82
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Theorem 3.1.9. Suppose that A ∈ Qn×n is nonsingular. Then (3.1.1) has the unique solution x = 0. Theorem 3.1.10. Suppose that A ∈ Qn×n is nonsingular. Then (3.1.2) has the unique solution x = A−1 b.
3.2. The Linear Least Squares Problem In this section, we discuss the linear least squares problem (LS), including its equivalent problems and the regularization of the LS problem.
3.2.1.
The LS Problem and Its Equivalent Problems
Definition 3.2.1. Suppose that A ∈ Qm×n , b ∈ Qm . The LS problem refers to find a vector x ∈ Qn such that ρ(x) = kAx − bk2 = minn kAv − bk2 . v∈Q
(3.2.1)
Theorem 3.2.1. The expression of any solution of the LS problem (3.2.1) is given as follows x = A† b + (I − PAH )z = A† b + PA⊥H z, (3.2.2)
in which z ∈ Qn is an arbitrary vector. (3.2.1) has the unique minimal norm solution xLS = A† b, which satisfies (3.2.1) and kxLS k2 = min kxk2 , x∈S
S is the solution set of the LS problem (3.2.1). When the column vectors of A are right linearly independent, the LS problem (3.2.1) has the unique solution xLS = A† b. Proof. Notice that for arbitrary x ∈ Qn , we have Ax ∈ R (A). Therefore, we can split b as follows b = b1 + b2 , in which b1 = AA† b is the orthogonal projection from b to R (A), b2 = b − b1 = PA⊥ b. From ρ(x)2 = kb − Axk22 = kb1 − Axk22 + kb2 k22 , we have ρ(x) = minn kAv − bk2 if and only if Ax = AA† b. v∈Q
83
Linear System and Generalized Least Squares Problems Suppose that x = A† b + y, and x satisfies Ax = AA† b. From Ay = A(x − A† b) = b1 − b1 = 0,
we have y ∈ N (A) = R (AH )⊥, and the vector y in R (AH )⊥ can be expressed as y = (In − PAH )z = PA⊥H z, z ∈ Qn . Therefore x = A† b + PA⊥H z = xLS + PA⊥H z. Notice that A† b and PA⊥H z are mutually perpendicular, and kxk22 = kA† bk22 + kPA⊥H zk22 ≥ kA† bk22 , thus we obtain the minimal norm solution xLS = A† b immediately . When A ∈ Qnm×n , we have PAH = In . Therefore the LS solution xLS = A† b is unique. We now introduce two equivalent problems to the LS problem (3.2.1), which are the normal equation and the KKT equation.
1. The Normal Equation Theorem 3.2.2. Suppose that A ∈ Qm×n , b ∈ Qm . Then the linear system AH Ax = AH b
(3.2.3)
is compatible. (3.2.3) is called the normal equation of the LS problem (3.2.1). The normal equation (3.2.3) and the LS problem (3.2.1) have the same solution sets. Proof. Obviously, R (AH b) ⊆ R (AH A), therefore (3.2.3) is compatible. From the expression of general solution of the LS problem and the properties of the MP inverse, a general solution of (3.2.3) have the following expression x = (AH A)† AH b + (In − (AH A)† (AH A))z = A† b + (I − A† A)z, z ∈ Qn . Therefore (3.2.3) and the LS problem (3.2.1) have the same solution sets.
84
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
2. The KKT Equation For arbitrary LS solution x of (3.2.1), r = b−Ax = b−AxLS is called the residual vector of the LS problem (3.2.1). Theorem 3.2.3. Suppose that A ∈ Qm×n , b ∈ Qm . Then x and r = b − Ax are a solution and the residual vector of the LS problem (3.2.1), respectively, if and only if x and r are the solutions of the following compatible system I A r b = . (3.2.4) AH 0 x 0 The above linear system is called the Karush- Kuhn -Tucker equation (KKT) of the LS problem (3.2.1) . Proof. Suppose that x and r are a solution and the residual vector of the LS problem (3.2.1), respectively. Then x = A† b + (I − A† A)z, r = b − Ax = b − AA† b = (I − AA† )b. Therefore, from the properties of the MP inverse A† and r + Ax = b, we have AH r = AH (I − AA† )b = ((I − AA† )A)H b = 0. Thus (3.2.4) is a compatible linear system, x and r satisfy (3.2.4). Conversely, by checking the four conditions of the MP inverse, we can check that † I − AA† (A† )H I A † = . T ≡ A† −A† (A† )H AH 0 Therefore, any solutions x, r of (3.2.4) have the following form r b y (I − AA† )b † † =T + (I − T T ) = , x 0 z A† b + (I − A† A)z in which y ∈ Qm , z ∈ Qn are arbitrary vectors. Therefore, any vectors x, r satisfying (3.2.4) are the LS solution and the residual vector of (3.2.1), respectively.
Linear System and Generalized Least Squares Problems
3.2.2.
85
The Regularization of the LS Problem
Suppose that matrix A ∈ Qrm×n has the singular value decomposition A = Udiag(Σ1 , 0)V H ,
(3.2.5)
in which U ∈ Um , V ∈ Un , l = min{m, n}, σ1 ≥ · · · ≥ σr > 0 = σr+1 = · · · = σl , Σ1 = diag(σ1 , · · · , σr ).
(3.2.6)
Then the minimal norm least squares solution xLS of the LS problem (3.2.1) can be expressed as r uH b xLS = A† b = ∑ i vi . (3.2.7) i=1 σi Since there may have rounding errors, all algorithms computing A† and xLS b† and xbLS = A b† b in fact compute the MP inverse A b of some perturbed matrix b A = A + ∆A. When A is an ill-conditioned full rank matrix, i.e., there exists k, 1 ≤ k < l, uH b such that σk σk+1 ≈ 0, since there exist σi i vi , i ≥ k + 1 in the expression of the LS solution xLS , a big error will occur in the LS solution xLS although A has a small perturbation. Here the matrix A is called numerically rank deficient. When A is rank deficient and k∆Ak is very small, the discontinuity of the MP inverse indicates that the original rank is not fit to numerical calculations. When b and A are not same, the difference of A b† and A† , xLS and xbLS may the ranks of A be very large. k∆Ak becomes smaller, then the degree of difference may be bigger. In the following two subsections, we provide two regularization methods, which are the truncated LS problem and the Tikhonov regularization.
1. The Truncated LS Problem We first provide the definition of δ-rank of A. Definition 3.2.2. Suppose that A ∈ Qm×n and δ > 0 are given. If k = min {rank(B) : kA − Bk2 ≤ δ}, m×n B∈Q
then k is called the δ-rank of A.
(3.2.8)
86
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
From above definition and the best approximation theorem of singular matrix, when k < l, we have min kA − Bk2 = σk+1 .
rank(B)≤k
When
k
B = Ak = ∑ σi ui vH i , i=1
the minimal value is achieved. Therefore δ-rank of A is k if and only if σ1 ≥ · · · ≥ σk ≥ δ > σk+1 ≥ · · · ≥ σl , (l = min{m, n}). For the LS problem (3.2.1), the truncated LS problem is as follows kAkx − bk2 = minn kAk y − bk2 , y∈Q
(3.2.9)
in which Ak is the best rank-k approximation of A. Here the minimal norm LS solution xLS of (3.2.9) can be expressed as k
uH i b vi . i=1 σi
xbLS = A†k b = ∑
(3.2.10)
2. The Tikhonov Regularization
We consider the following regularization problem kAx − bk22 + τ2 kDxk22 = minn {kAy − bk22 + τ2 kDyk22 }, y∈Q
(3.2.11)
in which τ > 0, D = diag(d1 , · · · , dn ) is a positive definite diagonal matrix . Obviously, the problem (3.2.11) is equivalent to
τD
τD 0 0
= min
. y− (3.2.12) x−
A
n b A b 2 y∈Q 2
When τ > 0, the coefficient matrix is of full column rank, therefore there exists a unique LS solution.
Linear System and Generalized Least Squares Problems
87
Suppose that m ≥ n, then when D = In , the singular values of the coefficient matrix in (3.2.12) are σ˜ i = (σ2i + τ2 )1/2 , i = 1, · · · , n. Hence the solution of (3.2.11) can be expressed as n
n uH uH i bσi i b fi vi , v = i ∑ 2 + τ2 σ i=1 i i=1 σi
x(τ) = ∑
fi =
σ2i , σ2i + τ2
(3.2.13)
in which f i is called filtering factor. When τ σi , f i ≈ 1. When τ σi , f i 1. The advantage of regularization problem (3.2.11) is that its solution can be obtained by the SVD of A or the following QR decomposition τD R =Q . A 0 Remark 3.2.1. The regularization of the LS problem has converted the original LS problem into a new LS problem. In fact, from (3.2.10) and (3.2.13), we can see that the minimal norm LS solution of the LS problem (3.2.1) and the corresponding residual vectors of original LS problem, the truncated LS problem, and the Tikhonov regularization method are not same. The regularization method is often used to deal with ill-posed problems, such as ill-conditioned LS problem, the inverse problem and et al.
3.2.3.
Some Matrix Equations
In this subsection, we illustrate applications of the SVD for two simple matrix equations. The first matrix equation is as follows kAXB −CkF = min,
(3.2.14)
in which A ∈ Q p×m , B ∈ Qn×q and C ∈ Q p×q are given, and the matrix X ∈ Qm×n is unknown. Consider the following problems. (1) Compute X ∈ Qm×n which satisfies (3.2.14). Denote the set of matrices satisfying (3.2.14) by S1 . (2) Determine X ∈ S1 such that kXkF = min. (3) Under what conditions, is the matrix equation AXB = C compatible? The following theorem answers above three problems.
(3.2.15)
88
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao p×m
n×q
Theorem 3.2.4. Suppose that A ∈ QrA , B ∈ QrB and the SVD of A, B are as follows
and C ∈ Q p×q are given,
A = UAdiag(ΣA , 0)VAH , B = UB diag(ΣB, 0)VBH ,
(3.2.16)
in which UA , UB , VA, VB are unitary matrices of appropriate sizes, ΣA = diag(σ1 (A), · · · , σrA (A)) > 0, ΣB = diag(σ1 (B), · · · , σrB (B)) > 0. Denote X˜
= VAH XUB
=
X11 X12 X21 X22
, C˜
= UAH CVB
=
C11 C12 C21 C22
,
(3.2.17)
in which X11 , C11 ∈ QrA ×rB . Any matrix X satisfying (3.2.14) has the following expression −1 Z12 ΣA C11 Σ−1 B X = VA UBH = A†CB† + Z − PAH ZPB , (3.2.18) Z21 Z22
Z11 Z12 in which Z = VA UBH ∈ Qm×n is an arbitrary matrix, and Z11 ∈ Z21 Z22 QrA ×rB . Among all matrices satisfying (3.2.14), −1 ΣA C11 Σ−1 0 B XLS = VA UBH (3.2.19) 0 0 has the minimal F-norm. And if and only if A, B and C satisfy PACPBH = C,
(3.2.20)
the matrix equation (3.2.15) is compatible. ˜ we have ˜ C, Proof. From the SVD of A, B and the partitions of X, ΣA 0 ΣB 0 H kAXB −CkF = kUA VA XUB VBH −CkF 0 0 0 0 ΣA 0 ΣB 0 ˜ F ˜ =k X − Ck 0 0 0 0 ΣA X11 ΣB −C11 −C12 =k kF . −C21 −C22
(3.2.21)
Linear System and Generalized Least Squares Problems
89
−1 Obviously, kAXB −CkF = min if and only if X11 = Σ−1 A C11 ΣB . In addition, for m×n an arbitrary matrix Z ∈ Q , 0 Z12 Z − PAH ZPB = VA UBH , Z21 Z22
Z11 Z12 = VAH ZUB . Therefore, X has the form in (3.2.18). In Z21 Z22 addition, the solution XLS has the minimal F-norm when Zi j = 0 for (i, j) 6= (1, 1). From (3.2.21), (3.2.15) is compatible if and only if 0 = kAXB − CkF , which is equivalent to Ci j = 0 for (i, j) 6= (1, 1). Therefore C 0 11 H ˜ B = UA C = UACV VBH = PACPBH . 0 0 in which
The second matrix equation is as follows kHB −CkF = min,
(3.2.22)
in which B, C ∈ Qn×l are given, H ∈ Qn×n is a unknown Hermitian matrix. Consider the following problems. (1) Compute Hermitian matrix H ∈ Qn×n satisfying (3.2.22). Denote the set S2 of Hermitian matrices satisfying (3.2.22). (2) Determine H ∈ S2 such that kHkF = min. (3) Under what conditions, is the matrix equation HB = C
(3.2.23)
compatible? The following theorem answers above three problems. Theorem 3.2.5. Suppose that B ∈ Qrn×l , C ∈ Qn×l are given, the SVD of B is as B follows B = UB diag(ΣB , 0)VBH , (3.2.24) in which UB , VB are unitary matrices of appropriate sizes, ΣB = diag(σ1 (B), · · · , σrB (B)) > 0.
90
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Suppose that H˜
= UBH HUB
=
H11 H12 H21 H22
, C˜
= UBH CVB
=
C11 C12 C21 C22
,
(3.2.25)
in which H11 , C11 ∈ QrB ×rB . Then any Hermitian matrix H satisfying (3.2.22) has the following form H ) Σ−1CH K ∗ (C11 ΣB + ΣBC11 B 21 H = UB UBH = HLS + PB⊥ T PB⊥ , (3.2.26) C21 Σ−1 T 22 B HLS = UB
H H K ∗ (C11ΣB + ΣBC11 ) Σ−1 B C21 −1 C21 ΣB 0
UBH ,
(3.2.27)
in which K = (ki j ) ∈ Qrb ×rB , ki j = 1/(σ2i + σ2j ), K ∗ D denotes the Hadamard product of K and D, and T11 T12 T = UB UBH ∈ Qn×n T21 T22 is an arbitrary Hermitian matrix, T11 ∈ QrB ×rB . Among all Hermitian matrices satisfying (3.2.22), HLS has the minimal F-norm. Furthermore, if and only if B and C satisfy CPBH = C and (PBCB† )H = PBCB† , (3.2.28) the matrix equation (3.2.23) is compatible, here H has the following form H = CB† + B†H CH − B†H CH PB + PB⊥ T PB⊥ ,
(3.2.29)
in which T is an arbitrary n × n Hermitian matrix. ˜ we have ˜ C, Proof. From the SVD of B and the partitions of H, ΣB 0 kHB −CkF = kHUB VBH −CkF 0 0 Σ 0 B ˜ F = kH˜ − Ck 0 0 H11 ΣB −C11 −C12 =k kF . H21 ΣB −C21 −C22
(3.2.30)
Linear System and Generalized Least Squares Problems
91
Obviously, kHB−CkF = min if and only if H21 ΣB = C21 , and kH11ΣB −C11 kF = min(6= 0 in general, because H11 is Hermitian.) Therefore, we get H21 = C21 Σ−1 B . Suppose that f (H11) = kH11 ΣB −C11 k2F ,
then f is a quadratic polynomial made up of the elements of Hermitian matrix H11 . From the necessary condition for achieving the minimal value of difH ferential functions ∂ f (H11 )/∂H11 = 0, we obtain H11 = K ∗ (C11ΣB + ΣBC11 ). Therefore we obtain the first equation of (3.2.26). For an arbitrary Hermitian matrix T ∈ Qn×n , we have 0 0 ⊥ ⊥ PB T PB = UB UBH , 0 T22
therefore the second equation of (3.2.25) holds. Obviously, among all Hermitian matrices satisfying (3.2.26), HLS has the minimal F-norm. From (3.2.30), the matrix equation (3.2.23) is compatible, if and only if −1 −1 H C12 = 0, C22 = 0, H21 = C21 Σ−1 B , C11 ΣB = ΣB C11 = H11 .
(3.2.31)
˜ the conditions C12 = 0, C22 = 0 From the relation formula (3.2.25) of C and C, −1 H are equivalent to C = CPBH . The conditions C11 Σ−1 B = ΣB C11 is equivalent to † H † (PBCB ) = PBCB . Therefore (3.2.31) and (3.2.28) are equivalent. Conversely, when (3.2.28) and so (3.2.31) hold, from H defined in (3.2.25), H , and we have H11 = H11
Therefore
H) H11 = K ∗ (C11 ΣB + ΣBC11 −1 H 2 = K ∗ (C11 ΣB + ΣB ΣB C11 ) = K ∗ (C11 ΣB + Σ2BC11 Σ−1 B ) −1 = K ∗ (C11 Σ2B + Σ2BC11 )Σ−1 B = C11 ΣB .
H21 ΣB = C21 , H11 ΣB = C11 , C12 = 0, C22 = 0, i.e., HB −C = 0, and
H = CB† + B†H CH − B†H CH PB + PB⊥ T PB⊥ ,
in which T is an arbitrary n × n Hermitian matrix. Remark 3.2.2. For more complicated matrix equations, we need generalized singular value decompositions to deal with. For example, for matrix equation AXB + CY D = E, in which A, B, C, D, E are given, we can solve it by the Q-SVD or CCD, see, e.g., [31, 121, 197, 198].
92
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
3.3. The Total Least Squares Problem In scientific computing, the least squares is a classical method of solving linear system Ax = b. From analysis of the LS problem, we know that the LS problem minn kAx − bk2 is equivalent to the consistent linear system Ax = AA† b, that is, in
x∈Q
the LS problem, one assumes that all errors are contained in b, and A is accurate. But in many practical problems, there would exist errors in both A and b. In this case, we need to study the total least squares method (TLS). When A is a real matrix and b is a real vector, Golub and Van Loan [63] in 1980 provided the definition of the TLS problem and studied the algebraic properties of the TLS problem. In order to make the TLS problem meaningful in any case, Van Huffel and Vandewalle [173] generalized the definition of the TLS problem. And Wei [188, 189] further generalized the definition of the TLS problem. In this section, we discuss the relevant definitions and the solution sets of the quaternion TLS problem AX = B with multiple right hand, in which A ∈ Qm×n , B ∈ Qm×d . Notice that B = b ∈ Qm is a special case. Suppose that A0 ∈ Qm×n , B0 ∈ Qm×d are accurate but non-observable matrices, A ∈ Qm×n , B ∈ Qm×d are observable approximate matrices, R (B0 ) ⊆ R (A0). In order to simplify discussion, we suppose that m ≥ n + d in this section. For the situation m < n + d, we can discuss the TLS problem similarly. Suppose that the observable linear system AX = B
(3.3.1)
is perturbation from a accurate linear system A0 X = B0 ,
(3.3.2)
which is compatible. In general, there exist errors in both matrices A and B , therefore linear system (3.3.1) is not compatible. From the discussion in §3.2, if we regard (3.3.1) as the LS problem, then the solution sets of (3.3.1) and compatible linear system AX = AA† B (3.3.3) are the same, and an arbitrary LS solution X has the following form X = A† B + (I − A† A)Z, Z ∈ Qn×d . Therefore, solving the LS problem (3.3.1) is equivalent to solving linear system (3.3.3). It is equivalent to regarding coefficient matrix of A invariable,
Linear System and Generalized Least Squares Problems
93
and considering all errors produced by B. It is obviously not reasonable in many scientific computing problems. Therefore one considers low rank optimal approximation of (A, B) to obtain a new compatible linear system. This leads to the TLS problem b = B. b AX (3.3.4)
Definition 3.3.1. Suppose that A ∈ Qm×n , B ∈ Qm×d are given. Solve the linear b B) b = (A − EA , B − EB ) satisfies system (3.3.4), in which (A, b b ⊆ R (A). k(EA, EB )kF = min k(EeA , EeB)kF , and R (B) EeA ,EeB
(3.3.5)
b = rank(A, b B), b therefore (3.3.5) is equivalent to From (3.3.5), n ≥ rank(A)
b B − B)k b F= k(A − A,
b b ⊆ R (A). min k(A, B) − GkF, and R (B)
rank(G)≤n
(3.3.6)
Definition 3.3.1 does not always exist a solution.
Definition 3.3.2. Suppose that A ∈ Qm×n , B ∈ Qm×d . Determine an integer p ( b ∈ Qm×n , Bb ∈ Qm×d , G b = (A, b B), b such that 0 ≤ p ≤ n ), and A b F= k(A, B) − Gk
min
rank(G)≤p
b b ⊆ R (A). k(A, B) − GkF, and R (B)
(3.3.7)
From Definitions 3.3.1 and 3.3.2, we can see that the TLS problem involves low rank optimal approximation, and therefore the SVD and CS decomposition are important tools for studying the TLS problem. Suppose that A ∈ Qm×n , B ∈ Qm×d , the SVD of A and C = (A, B) are as follows eV eΣ e H , C = (A, B) = UΣV H , A =U (3.3.8) e e are unitary matrices , in which U,U, V,V
e = diag(σ e1 , · · · , σ en ), Σ = diag(σ1 , · · · , σn+d ), Σ e1 ≥ · · · ≥ σ en , σ1 ≥ σ2 ≥ · · · ≥ σn+d , σ Σ1 = diag(σ1 , · · · , σ p ), Σ2 = diag(σ p+1 , · · · , σn+d ),
For some integer p ≤ n, we partition V in the following form V11(p) V12(p) n V= V21(p) V22(p) d p n+d − p
(3.3.9)
(3.3.10)
94
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Lemma 3.3.1. Suppose that A ∈ Qm×n , B ∈ Qm×d , the SVD of A and (A, B) are given by (3.3.8) and (3.3.9), respectively. For some integer p ≤ n, we partition V as in (3.3.10). Then (1) V22(p) is of full row rank, if and only if V11 (p) is of full column rank. e p > σ p+1 , then V22 (p) is of full row rank , and V11(p) is of full column (2) If σ rank . Corollary 3.3.2. Suppose that A ∈ Qm×n , B ∈ Qm×d , the SVD of A and (A, B) are given by (3.3.8)-(3.3.9) respectively. For some integer p ≤ n, we partition V as in (3.3.10). If V11 (p) is not of full column rank, then e p ≤ σ p+1 . σ p+d ≤ σ
(3.3.11)
Corollary 3.3.3. Suppose that A ∈ Qm×n , B ∈ Qm×d , the SVD of A and (A, B) are given by (3.3.8) and (3.3.9) respectively. For some integer p ≤ n, we partition V as in (3.3.10). Then the following two conditions are equivalent: (1) σ p > σ p+1 = · · · = σn+d , V11(p) is of full column rank ; e p > σ p+1 = · · · = σn+d . (2) σ
(3.3.12)
Now we discuss the solvability of the TLS problem according to Definition 3.3.1. We extend the results of Liu [126] and Wei [188, 189] into the quaternion case, as mentioned in the following Theorem 3.3.4. Theorem 3.3.4. Suppose that A ∈ Qm×n , B ∈ Qm×d , and the SVD of C = (A, B) is given by (3.3.8) and (3.3.9). Then there exists a solution of the TLS problem according to Definition 3.3.1, if and only if one of the following conditions holds. (1) If σn > σn+1 , then rank(V22(n)) = d. (2) If p < n and σ p > σ p+1 = · · · = σn+d , then rank(V22(p)) = d. (3) If p < n < q < n + d and σ p > σ p+1 = · · · = σq > σq+1 , then rank(V22(p)) = d rank(V22(q)) = n + d − q. For the sake of simplicity, we denote σ0 = +∞, σn+d+1 = −∞. Remark 3.3.1. From Theorem 3.3.4, we know the TLS problem under Definition 3.3.1 does not always exist solution. On the other hand, V22(p) is a part of unitary matrix V , and therefore, there exists an integer p in Definition 3.3.2 satisfying σ p > σ p+1 , rank(V22 (p)) = d. For example, we take p = 0, and therefore the TLS problem under Definition 3.3.2 always exists a solution. If many
Linear System and Generalized Least Squares Problems
95
integers p satisfy the conditions of Definition 3.3.2, we often take the biggest p. Consider the numerical stability, p is not the bigger the better. It should be pointed out that when the TLS problem under Definition 3.3.1 is solvable, then the TLS problem under Definition 3.3.2 is also solvable. Further the solution of the TLS problem under Definition 3.3.1 must be the solution of TLS problem under Definition 3.3,2. In the sequel, we discuss the TLS problem under Definition 3.3.2. Now we discuss the solution set of the TLS problem. Theorem 3.3.5. Suppose that A ∈ Qm×n , B ∈ Qm×d , the SVD of A and C = (A, B) are given by (3.3.8) and (3.3.9), respectively. For some integer p ≤ n, we partition V in the following form V11 (p) V12 (p) n V11 V12 V= ≡ . V21 V22 V21 (p) V22 (p) d (3.3.13) p n+d − p b Bb of Definition 3.3.2 Suppose that σ p > σ p+1 , V22 is of full row rank. Then A, are as follows H b H b = U1 Σ1V11 A , B = U1 Σ1V21 , (3.3.14)
in which the matrix U1 is the first p columns of U. The minimal norm TLS b = Bb is as follows solution XTLS of the compatible linear system AX b† Bb = (V H )†V H = A b† B = −V12V † XTLS = A 11 21 22 H † H H H = (AH A −V12 ΣH 2 Σ2V12 ) (A B −V12 Σ2 Σ2V22 ).
(3.3.15)
The solution set STLS of the TLS problem is as follows
b : Z ∈ Qm×d } STLS = {X = XTLS + (I − Ab† A)Z = {X = −V12 Q(V22Q)−1 : QH Q = Id such that V22Q nonsingular} (3.3.16) and the minimal F-norm modified matrix (δA, δB) according to X satisfies H ,V H ), (δA, δB) = U2 Σ2 QQH (V12 22 s s d
∑ σ2p+i ≥ k(δA, δB)kF ≥
i=1
d
∑ σ2n+i .
i=1
(3.3.17)
96
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
3.4. The Equality Constrained Least Squares Problem Suppose that A ∈ Qm×n , b ∈ Qm . When we solve the linear system Ax = b, sometimes we have certain requirement for the coefficient matrix A and the vector b. For example, we partition A and b in the following forms L h A= , b= , K g in which m = m1 + m2 , L ∈ Qm1 ×n , K ∈ Qm2 ×n , h ∈ Qm1 , g ∈ Qm2 , and there do not exist errors in L and h, but there exist errors in K and g, that is, h ∈ R (L). This results in the equality constrained least squares problem (LSE), which will be discussed in this section. In the following, we give the definition of the LSE problem, and derive several equivalent problems. Definition 3.4.1. Suppose that L ∈ Qm1 ×n , K ∈ Qm2 ×n , h ∈ Qm1 , g ∈ Qm2 , m = m1 + m2 , L h A= , b= , (3.4.1) K g
The LSE problem refers to that there exists x ∈ Qn such that kKx − gk2 = min kKy − gk2 , y∈S
such that S = {y ∈ Qn : Ly = h}.
(3.4.2)
If h 6∈ R (L), then the set S can be defined in the least squares sense, that is, S = {y ∈ Qn : kLy − hk2 = minn kLz − hk2 }. z∈Q
If the following conditions hold: rank(L) = m1 , rank(A) = n, there exists a unique solution to the LSE problem (3.4.2). Now we deduce the solution of the LSE problem. Suppose that P = I − L† L, L‡K = (I − (KP)† K)L† .
(3.4.3)
Linear System and Generalized Least Squares Problems
97
Theorem 3.4.1. For given matrices L ∈ Qm1 ×n , K ∈ Qm2 ×n , and the vectors h ∈ Qm1 , g ∈ Qm2 , the minimal norm solution xLSE of the LSE problem (3.4.2) has the following form xLSE = L‡K h + (KP)† g.
(3.4.4)
The solution set of the LSE problem (3.4.2) is as follows SLSE = {x = L‡K h + (KP)† g +(P − (KP)† KP)z : z ∈ Qn }.
(3.4.5)
Proof. From the definition of the LSE problem, any solution x of the LSE problem satisfies Lx = h (when h ∈ R (L) ) or kLx − hk2 = min (when h 6∈ R (L) ). Therefore, x can be expressed as x = L† h + (I − L† L)w = L† h + Pw, in which w ∈ Qn is some vector. We substitute x into the first condition of (3.4.2), to have kKx − gk2 = kKL† h + KPw − gk2 = minn kKL† h + KPu − gk2 . u∈Q
Therefore w = (KP)† (g − KL† h) + (I − (KP)† (KP))z, in which z ∈ Qn is an arbitrary vector. Notice that P is an orthogonal projection matrix, we have the following equality by the properties of the MP inverse P(KP)† = P(KP)H ((KP)(KP)H )† = (KP)H ((KP)(KP)H )† = (KP)† . (3.4.6) Therefore x has the following expression x = L† h + P(KP)† (g − KL† h) + (P − P(KP)† (KP))z = L‡K h + (KP)† g + (P − (KP)† KP)z, in which xLSE = L‡K h + (KP)† g has the minimal 2-norm. Now we deduce five equivalent problems to the LSE problem (3.4.2), which are the KKT equation, the unitary decomposition method, the Q-SVD method, the weighted LS method and the unconstrained LS method, respectively.
98
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
1. The KKT Equation Theorem 3.4.2. For given matrices L ∈ Qm1 ×n , K ∈ Qm2 ×n , and the vectors h ∈ Qm1 , g ∈ Qm2 , the LSE problem (3.4.2) is equivalent to the following KarushKuhn -Tucker equation (KKT) kBy − dk2 = min kBw − dk2 , m+n w∈Q 0 0 L h v B = 0 Im2 K , d = g , y = rK , H L KH 0 0 x
(3.4.7)
in which m = m1 + m2 , v is the Lagrange vector, rK = g − Kx is the residual vector, and (KL‡K )H KL‡K −(KL‡K )H (L‡K )H B† = −KL‡K Im2 − K(KP)† (KP)†H . ‡ † † †H LK (KP) −(KP) (KP)
(3.4.8)
2. The Unitary Decomposition Method Theorem 3.4.3. Suppose that L ∈ Qmp 1 ×n , K ∈ Qm2 ×n , A ∈ Qrm×n , h ∈ Qm1 , g ∈ Qm2 are given by (3.4.1). If A has the unitary decomposition R Q11 m1 A=Q = Q1 R, Q1 = , (3.4.9) 0 Q21 m2 in which Q ∈ Um, Q1 is the first r columns of Q, R is the full row rank matrix. Then L‡K = R† Q†11 , (KP)† = R† (I − Q†11 Q11 )QH 21 , (3.4.10) † † † † R R = A A = L L + (KP) KP. Any solution x of the LSE problem (3.4.2) has the following expression † x = R† Q†11 h + R† (I − Q†11 Q11 )QH 21 g + (In − R R)z,
(3.4.11)
in which z ∈ Qn , xLSE = R† Q†11 h + R† (I − Q†11 Q11 )QH 21 g is the minimal norm solution of the LSE problem (3.4.2).
(3.4.12)
Linear System and Generalized Least Squares Problems
99
3. The Q-SVD Method Theorem 3.4.4. Suppose that L ∈ Qmp 1 ×n , K ∈ Qm2 ×n , A ∈ Qrm×n , h ∈ Qm1 , g ∈ Qm2 are given by (3.4.1). Suppose that the SVD of A is as follows A = ZT QH = Z1 T1 H1H ,
(3.4.13)
T = diag(T1 , 0), T1 = diag(t1 , · · · ,tr ),
(3.4.14)
in which Z ∈ Um , H ∈ Un ,
t1 ≥ · · · ≥ tr > 0 are the singular values of A, Z1 , H1 are the first r columns of Z, H, respectively. Then † † H, L‡K = H1 T1−1 Z11 , (KP)† = H1 T1−1 (I − Z11 Z11 )Z21 A† A = H1 H1H ,
(3.4.15)
in which Z11 , Z21 are the first m1 rows and last m2 rows of Z1 , respectively. Any solution x of the LSE problem (3.4.2) has the form † † H Z11 )Z21 g + (In − H1 H1H )z, h + H1 T1−1 (I − Z11 x = H1 T1−1 Z11
(3.4.16)
in which † † H xLSE = H1 T1−1 Z11 h + H1 T1−1 (I − Z11 Z11 )Z21 g
(3.4.17)
is the minimal norm solution of the LSE problem (3.4.2). Proof. From the SVD of A, we obtain the unitary decomposition of A as follows A = Q1 R, in which Q1 = Z1 , Z11 = Q11 , Z21 = Q21 , R = T1 H1H . Then R† = H1 T1−1 , from Theorem 3.4.3, we obtain the assertions immediately.
4. The Weighted LS Method Theorem 3.4.5. Suppose that L ∈ Qmp 1 ×n , K ∈ Qm2 ×n , A ∈ Qrm×n , h ∈ Qm1 , g ∈ Qm2 are given by (3.4.1). We define 2 τ Im1 W (τ) = , (3.4.18) Im2
100
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
then when τ → +∞, the solution set of the WLS problem 1
1
kW (τ) 2 (Ax − b)k2 = minn kW (τ) 2 (Ay − b)k2
(3.4.19)
y∈Q
tends to the solution set of the LSE problem (3.4.2). Proof. Suppose that the SVD of A is given by (3.4.13) and (3.4.14), the CS decomposition of Z is as follows Z11 Z12 m1 Z = Z21 Z22 m2 r (3.4.20) r m − H U1 D11 D12 V1 = , U2 D21 D22 V2H in which U1 , U2 , V1 and V2 are all unitary matrices, D11 = diag(Iq ,C, 0C ), D21 = diag(0H S , S, Ir−p ),
D12 = diag(0S , S, Im1 −p ), D22 = diag(Im2 +q−r , −C, 0CH ),
(3.4.21)
C, S are the positive definite diagonal matrices of order p − q satisfying C2 + S2 = Ip−q (when p − q = 0 , both C and S are empty). From Theorem 3.4.4, we have 1 1 lim (W (τ) 2 A)†W (τ) 2 = lim (AHW (τ)A)†AH W (τ) τ→+∞
τ→+∞
H H H H = lim H1 T1−1 (τ2 Z11 Z11 + Z21 Z21 )−1 (τ2 Z11 , Z21 ) τ→+∞
= H1 T −1 (V1diag(Iq ,C−1 , 0)U1H , V1 diag(0, 0, Ir−p )U2H ) † † H = H1 T1−1 (Z11 , (Ir − Z11 Z11 )Z21 ) = (L‡K , (KP)†).
In addition, for an arbitrary integer τ > 0, we have 1
1
((W(τ) 2 A)†W (τ) 2 )(W(τ)A(W(τ)A)†A) = A† A, 1 1 (W (τ) 2 A)†W (τ) 2 A = A† A. Therefore, any solution x(τ) of the WLS problem (3.4.19) is as follows 1
1
1
1
x(τ) = (W (τ) 2 A)†W (τ) 2 b + (I − (W (τ) 2 A)†W (τ) 2 A)z 1 1 = (W (τ) 2 A)†W (τ) 2 b + (I − A† A)z, in which z ∈ Qn . We take the limits τ → ∞ in the above equality, then x(τ) tends to a solution of the LSE problem (3.4.2).
Linear System and Generalized Least Squares Problems
101
5. Unconstrained LS Method Lemma 3.4.6. Suppose that L ∈ Qmp 1 ×n , K ∈ Qm2 ×n , A ∈ Qrm×n are given by (3.4.1). We define L AI = , (3.4.22) KP(KP)†K then we have A†I = (L‡K , (KP)†),
(3.4.23)
A†I AI = A†I A = A† A = L† L + (KP)† KP.
(3.4.24)
Theorem 3.4.7. Suppose that L ∈ Qmp 1 ×n , K ∈ Qm2 ×n , A ∈ Qrm×n , h ∈ Qm1 , g ∈ Qm2 are given by (3.4.1). Then the LSE problem (3.4.2) is equivalent to the following unconstrained LS problem h kAI x − bk2 = minn kAI y − bk2 , b = . (3.4.25) g y∈Q
Chapter 4
Direct Methods for Solving Linear System and Generalized LS Problems Linear system and generalized LS problems are basic problems in matrix computations. In Chapter 4 and 5, we discuss numerical computations of these problems. In this chapter, we discuss direct methods, i.e., change the original problem into an equivalent form which is much simpler by using decompositions of coefficient matrices. After finite steps, we can obtain solution of the original problem. In §4.1, we study direct methods for nonsingular linear system. In §4.2, we study direct methods for the LS problem. In §4.3 and §4.4, we propose direct methods for the TLS and LSE problems, respectively. In §4.5, we apply direct methods to compute special solutions of two kinds of quaternion matrix equations. In Chapter 2, we have discussed the real structure-preserving algorithms of several kinds of matrix decompositions in detail. In this chapter, we only provide the main idea of direct methods for solving linear system and generalized LS problems, and do not provide numerical codes of corresponding real structure-preserving algorithms.
104
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
4.1. Direct Methods for Linear System Consider the following linear system Ax = b,
(4.1.1)
in which A ∈ Qm×m is nonsingular and b ∈ Qm . In §2.2, we discussed the real structure-preserving algorithms of the LU and PLU decompositions of a quaternion matrix. Suppose that A in (4.1.1) has the LU decomposition A = LU, in which L ∈ Qm×m is unit lower triangular and U ∈ Qm×m is upper triangular. Substitute A = LU into Ax = b, we have LUx = b. Let y = Ux, we obtain two triangular system Ly = b and Ux = y. y is obtained by forward substitutions in Ly = b, and then x is obtained by back substitutions in Ux = y. This algorithm is called the Gaussian elimination. For the partial pivoting LU decomposition, PA = LU, in which P is a permutation matrix, L is unit lower triangular, and U is upper triangular. We obtain LUx = Pb. Then Ax = b can be solved by solving two simple linear systems. In fact, we can simplify the numerical computations as follows. Denote e = (A, b). Then we can perform Gaussian eliminations on A, and so A, e to A obtain Ln−1 Ln−2 · · ·L1 (A, b) = (U, e b) or partial pivoting Gaussian eliminations to obtain
Ln−1 En−2 Ln−2 · · ·L1 E1 (A, b) = (U, e b).
Based on the above discussion, we can modify Function qLU (Algorithm 2.2.2) and Function qPLU (Algorithm 2.2.4) as follows. For Algorithms 2.2.2 and 2.2.4, we only need to substitute the input AA as the first column block of the real representation matrix of the augmented matrix A˜ = (A, b), and change the transformations implemented on k + 1 : m columns into k + 1 : m + 1 columns in assignment lines containing %(1) of Algorithm 2.2.2 and assignment lines containing %(2) of Algorithm 2.2.4, respectively. Here we omit detailed algorithms.
Direct Methods for Solving Linear System and Generalized LS ...
105
4.2. Direct Methods for the LS Problem In this section, we discuss the numerical computations of xLS for the LS problem kAx − bk = min kAy − bk, y
(4.3.1)
in which A ∈ Qrm×n , b ∈ Qm .
1. The QR Decomposition Method The following algorithm is backward stable for the three kinds of methods using the Householder, the Givens or the MGS methods, when we implement the QR decomposition on (A, b), R z (AΠ, b) = (Q1 , qn+1) , (4.3.2) 0 ρ in which qn+1 is orthogonal to Q1 , R ∈ Qr×n is an upper trapezoid matrix. The solution xLS of the LS problem (4.3.1) can be obtained by solving RΠT x = z,
r = ρqn+1 .
When RΠT x = z, kAx − bk2 attains its minimum, and the residual vector is ρqn+1 . If r = n, R is a nonsingular upper triangular matrix, the unique LS solution is xLS = ΠR−1 z. (4.3.3) H H e the If r < n, we should implement the QR decomposition on R , R = U R, minimal norm LS solution is e−H z. xLS = ΠR† z = ΠU R
(4.3.4)
2. The Normal Equation Method
Theorem 4.2.1. For given A ∈ Qrm×n , b ∈ Qm , the linear system AH Ax = AH b
(4.3.5)
is consistent, (4.3.5) is called the normal equation of LS problem (4.3.1). The normal equation has the same solutions as the original LS problem. When A ∈ Qnm×n , b ∈ Qm , AH A is an Hermitian positive definite matrix. After computing Cholesky decomposition AH A = LLH , we get Ly = b, LH xLS = y.
106
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
3. The Complete Orthogonal Decomposition Method If A ∈ Qm×n is (almost) rank deficient, we can solve the LS problem by the complete orthogonal decomposition. Suppose that R11 R12 AΠ = Q (4.3.6) 0 R22 is the PQR decomposition of A, in which for a given positive number ε, R22 satisfies inequality kR22 k2 < ε. We then set R22 := 0, and perform the QRD of (R11 , R12)H : e R H (R11, R12 ) = U , 0
in which Re is a nonsingular upper triangular matrix. Then we have H R11 R12 Re 0 e AΠ = Q =Q UH 0 0 0 0
e and complete orthogonal decomposition of A H Re 0 e A=Q U H ΠT . 0 0
Then the minimal norm solution of the (modified) LS problem is
4. The SVD Method
xLS = ΠU1 Re−H QH 1 b.
Suppose that A ∈ Qrm×n , b ∈ Qm , the SVD of A is A = U1 ΣV1H , U1H U1 = V1H V1 = Ir , Σ = diag(σ1 , · · · , σr ), σ1 ≥ · · · ≥ σr > 0. Then xLS = V1 Σ−1U1H b =
uHj b v j. j=1 σ j r
∑
(4.3.7)
If A has full rank and is well-conditioned, the QR decomposition method is more efficient than the SVD method. But if A is (almost) rank deficient, the SVD method or the complete orthogonal decomposition method are more stable.
Direct Methods for Solving Linear System and Generalized LS ...
107
4.3. Direct Methods for the TLS Problem In this section, we present three kinds of algorithms for the TLS problem, i.e., b ∈ Qm×n , Bb ∈ Qm×d , G b = (A, b B), b such that determine p ≤ n, A b F= k(A, B) − Gk
min
rank(G)≤p
b b ⊂ R (A), k(A, B) − GkF, and R (B)
(4.4.1)
in which A ∈ Qm1 ×n , B ∈ Qm×d are given and m ≥ n + d.
1. Basic SVD Method We can computer the minimum norm TLS solution xTLS as follows. Let the SVD of C = (A, B) be C = (A, B) = UΣV H , (4.4.2) in which both U,V are unitary, Σ = diag(σ1 , · · · , σn+d ) with σ1 ≥ σ2 ≥ · · · ≥ σn+d . Given parameter η > 0. Step 1 For some integer p ≤ n, if σ p ≥ η > σ p+1 , partition V into the following form V11 V12 n V= V21 V22 d . (4.4.3) p n+d − p Step 2 note
Implement the QL decomposition on V22, V22 = (0, Γ)QT , de V12 Y Z Q= . (4.4.4) V22 0 Γ
If Γ is a nonsingular square matrix, then xTLS = −ZΓ−1 . Otherwise, let p := p − 1, and go back to Steps 1 ∼ 2 until Γ is nonsingular. When we compute (4.4.3)-(4.4.4), repeatedly, Γ obtained from the QL decomposition of the former V22 , can be used in the latter steps. The computational amounts can be reduced by adding the first column of V22 to Γ as the first column.
2. The Complete Orthogonal Decomposition Method We can compute xTLS by the method of the complete orthogonal decomposition of C. Let the orthogonal decomposition of C be R11 R12 H C = URV = U VH, (4.4.5) 0 R22
108
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
in which U, V are all unitary, R11 ∈ Q p×p is nonsingular upper triangular, for a small parameter tol, |r pp | ≥ tol, kR12 k2 < tol, kR22 k2 < tol. Then V2, the last n+d − p columns of V is an approximation to the right singular vector of C associated to the smallest singular values. By implementing the QL decomposition of V2 , we have xTLS = −ZΓ−1 .
3. The Cholesky Decomposition Method When rank(A) = n, and the linear systemAx = b is almost consistent, such xTLS that σn (A) > σn+1 (C) ≡ σn+1 . Note that is a right eigenvector of −1 CH C = (A, b)H (A, b), H A A AH b xTLS xTLS 2 = σn+1 , bH A bH b −1 −1 therefore (AH A − σ2n+1 I)xTLS = AH b.
(4.4.6)
Take parameter η ≈ σ2n+1 , we computer xTLS , the solution of (AH A − ηI)x = AH b by the Cholesky decomposition. The main problem of this method is how to determine the parameter η exactly.
4.4. Direct Methods for the LSE Problem L Suppose that L ∈ Qm1 ×n , K ∈ Qm2 ×n , h ∈ Qm1 , g ∈ Qm2 , A = , b= K h , and suppose that rank(L) = p, rank(A) = r. In this section, we provide g several numerical algorithms for sloving the minimal norm solution xLSE of the LSE problem (3.4.2).
Direct Methods for Solving Linear System and Generalized LS ...
109
1. Null Space Method From Theorem 3.4.1, the minimal norm solution xLSE of the LSE problem (3.4.2) is as follows xLSE = (I − (KP)† K)L† h + (KP)† g = L† h + (KP)† (g − KL† h).
(4.4.1)
Suppose that the complete orthogonal decomposition of L (If L is full row rank, then we can compute the QR decomposition of LH with column pivoting) is R 0 L=U QH = U1 RQH (4.4.2) 1, 0 0 in which U = (U1 ,U2) and Q = (Q1 , Q2 ) are unitary matrices, U1 and Q1 are the first p columns of U and Q, respectively, and R is a nonsingular upper triangular matrix. Therefore we have H P = I − L† L = I − Q1 QH 1 = Q2 Q2 , H y =Q xLSE R−1U1H h y1 = = , (KQ2 )† (g − KQ1 R−1U1H h) y2
(4.4.3)
in which y1 = R−1U1H h, y2 = (KQ2 )† (g − KQ1 y1 ). From above formula, we obtain that xLSE = Q1 y1 + Q2 y2 . Remark 4.4.1. When p = m1 , we compute the QR decomposition with column pivoting of LH , R is a nonsingular upper triangular matrix, therefore (RH )† ΠH h = (RH )−1 Πh. When we compute K2† , we still need to compute the complete orthogonal decomposition of K2 ( if K2 is of full column rank, then we need to compute the QR decomposition of K2 ).
2. The Weighted LS Method We set a big weighting factor τ, and compute the minimal norm LS solution x(τ) of the following unconstrained LS problem
τL
τL τh τh
x− = min y− (4.4.4)
K y g 2 K g 2
110
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
From Theorem 3.4.5, the minimal norm LS solution x(τ) of (4.5.4) satisfies lim x(τ) = xLSE .
r→∞ 1
From the analysis of §3.4, τ ≥ u− 2 is suitable, in which u is the machine precision. Any algorithm for computing the LS problem can be applied to weighted LS method to compute the solution of the LSE problem. But as the analysis b they must satisfy rank(b of §3.4, when we compute matrices b L and A, L) = p, b = r. Therefore, when rank(L) = m1 , rank(A) = n , we can compute rank(A) x(τ) by the QR decomposition with column pivoting; when rank(L) = p < m1 , rank(A) = n, we first need to implement the complete orthogonal decomposition ˜ for L = U1 RQH 1 = U1 L to determine the numerical rank of L. Then we obtain ta new LS problem
τL˜
τL˜ τU1H h τU1H h
, x− y− (4.4.5)
K
= min
y g K g 2 2
and we compute x(τ) by the QR decomposition with column pivoting. When rank(L) = p < m1 , rank(A) = r < n , we first need to implement the complete orthogonal decomposition L = U1 RQH 1 to determine the numerical ranks of L and A. Of course, when L is not of full row rank and is well conditioned, we may compute the QR decomposition of L with column pivoting instead of completing orthogonal decomposition.
3. Direct Elimination Method When L is of full row rank, and A is of full column rank, we have the following direct elimination method to solve the LSE problem. First, we compute the QR decomposition of L with column pivoting QH R11 R12 , L LΠL = in which R11 is a nonsingular upper triangular matrix. Then we act QH L on the vector h, and the constrained equation is converted to b (R11 , R12)b x=b h, xb = ΠH h = QH L x, L h. We partition the matrix Kb = KΠL and the vector xb conforming with (R11, R12 ), then xb1 b b b b = KΠL . Kx − g = Kb x − g = (K1 , K2 ) − g, K xb2
111
Direct Methods for Solving Linear System and Generalized LS ... b Substituting xb1 = R−1 b2 ) to the above formula, we obtain 11 (h − R12 x e2 b Kx − g = K x2 − gb,
b gb = g − Kb1 R−1 11 h.
e2 = Kb2 − K b1 R−1 R12 , K 11
Then we can solve the LS problem
e2 xb2 − gbk2 = min kK e2 y − gbk2 kK y
b to obtain xb2 , and then compute xb1 = R−1 b2 ). Finally we obtain x = ΠH b. Lx 11 (h− R12 x
4. The QR Decomposition and the Q-SVD Methods
From §3.4, when we compute xLSE , the numerical ranks of A and L should be the same as the ranks of the original A and L, respectively. When L is not of full row rank, we need to compute xLSE by the QR decomposition and the Q-SVD method, and compute Q11 by the complete orthogonal method. We also compute xLSE according to the QR decomposition and the Q-SVD method obtained in §3.4, here we omit detail.
4.5. Some Matrix Equations In this section, we study three kinds special least squares solutions of two quaternion matrix equations AX = B and AXB + CXD = E, respectively [207, 208].
1. Special Solutions to the Quaternion Matrix Equation AX = B For given A ∈ Qm×n , B ∈ Qm×p , we consider the following three kinds special least squares solutions of the quaternion matrix equation AX = B.
(4.5.1)
Problem 1. Find XQ ∈ QL such that kXQkF = min kXkF , in which X∈QL
QL = {X|X ∈ Qn×p , kAX − BkF = min}.
112
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Problem 2. Find XI ∈ IL such that kXI kF = min kXkF , in which X∈IL
IL = {X|X ∈ IQn×p , kAX − BkF = min}, in which IQn×p denotes the set of all n × p pure imaginary quaternion matrices. Problem 3. Find XR ∈ RL such that kXRkF = min kXkF , in which X∈RL
RL = {X|X ∈ Rn×p , kAX − BkF = min}. XQ, XI and XR are respectively called the minimal norm LS solution, pure imaginary LS solution, and real LS solution of the quaternion matrix equation (4.5.1). By applying the real representations of quaternion matrices, the particular structures of real representation matrices and the MP inverse, we study the solutions of Problems 1, 2, 3. Theorem 4.5.1. Let A ∈ Qm×n , B ∈ Qm×p . Then the set QL of Problem 1 can be expressed as QL = {X|XcR = (AR )† BRc + (I4n − (AR )† AR )Y },
(4.5.2)
in which Y ∈ R4n×p is an arbitrary matrix. Furthermore, the unique solution XQ ∈ QL of Problem 1 has the form (XQ)Rc = (AR)† BRc .
(4.5.3)
(4.5.1) has a solution X ∈ Qn×p , if and only if [I4m − AR (AR )† ]BRc = 0.
(4.5.4)
Proof. According to Lemma 1.8.7, we obtain the following equality kAX − BkF = kAR XcR − BRc kF . Therefore, min kAX − BkF = min if and only if n×p X∈Q
min kAR XcR − BRc kF =
XcR ∈R4n×p
min . We then obtain the formulas in (4.5.2) and (4.5.3). Furthermore, (4.5.1) is consistent if and only if for any X ∈ QL , 0 = kB − AXkF = kBRc − AR XcR kF = k[I4m − AR (AR)† ]BRc kF , that is, [I4m − AR (AR )† ]BRc = 0.
Direct Methods for Solving Linear System and Generalized LS ...
113
The proof of the following results are similar to those of Theorem 4.5.1, therefore we only state them without proof. Theorem 4.5.2. Let A = A0 + A1 i + A2 j + A3 k ∈ Qm×n , B = B0 + B1 i + B2 j + B3 k ∈ Qm×p and X = X0 + X1 i + X2 j + X3 k ∈ Qn×p . Denote −A1 −A2 −A3 e = A0 −A3 A2 . A A3 A0 −A1 −A2 A1 A0 Then the set IL of Problem 2 can be expressed as X1 e }, e † BRc + (I3n − (A) e † A)Y IL = {X|X0 = 0, X2 = (A) X3
(4.5.5)
in which Y ∈ R3n×p is an arbitrary matrix. Furthermore, the unique solution XI ∈ IL of Problem 2 satisfies X1 e † BRc . X2 = (A) (4.5.6) X3 (4.5.1) has a solution X ∈ IQn×p , if and only if
e A) e † ]BRc = 0. [I4m − A(
(4.5.7)
RL = {X|X = (ARc )† BRc + [In − (ARc )† ARc ]Y },
(4.5.8)
Theorem 4.5.3. Let A ∈ Qm×n and B ∈ Qm×p . Then the set RL of Problem 3 can be expressed as
in which Y ∈ Rn×p is an arbitrary matrix. Furthermore, the unique solution of Problem 3 is XR = (ARc )† BRc . (4.5.9) (4.5.1) has a solution X ∈ Rn×p , if and only if [I4m − ARc (ARc )† ]BRc = 0.
(4.5.10)
114
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Second
8 6 4 2 0
0
20
40 60 K, Problem 1
80
100
0
20
40 60 K, Problem 2
80
100
0
20
40 60 K,Problem 3
80
100
Second
6 4 2 0
Second
0.4 0.3 0.2 0.1 0
Figure 4.5.1. The CPU times for solving Problems 1, 2, 3.
Example 4.5.1. Let m = 6K, n = 4K, p = 3K, A and B be generated randomly for K = 1 : 100. Consider the quaternion matrix equation AX = B, in which A ∈ Qm×n , B ∈ Qm×p and X ∈ Qn×p . We record the CPU times according to (4.5.3), (4.5.6), (4.5.9) and those reported in [205] for Problems 1, 2, 3, respectively, as shown in figure 4.5.1, in which the ’star’s and ’o’s stand for the CPU times by (4.5.3), (4.5.6), (4.5.9) and those reported in [205], respectively. From figure 4.5.1, we observe that the CPU times cost by (4.5.3) and (4.5.6) are shorter for Problems 1, 2, which illustrates the efficiency of these methods and confirms the above discussion.
Direct Methods for Solving Linear System and Generalized LS ...
115
2. Special Solutions to the Quaternion Matrix Equation AXB + CXD = E For given A ∈ Qm×n , B ∈ Qk×s, C ∈ Qm×n , D ∈ Qk×s and E ∈ Qm×s , we consider the following three kinds special LS solutions of the quaternion matrix equation AXB +CXD = E.
(4.5.11)
Problem 1. Find XQ ∈ QL such that kXQkF = min kXkF , in which X∈QL
QL = {X|X ∈ Qn×k ,
kAXB +CXD − EkF = min}.
Problem 2. Find XI ∈ IL such that kXI kF = min kXkF , in which X∈IL
IL = {X|X ∈ IQn×k ,
kAXB +CXD − EkF ) = min}.
Problem 3. Find XR ∈ RL such that kXRkF = min kXkF , in which X∈RL
RL = {X|X ∈ Rn×k ,
kAXB +CXD − EkF = min}.
We first summarize some results that are needed in the sequel. Lemma 4.5.4. Let X ∈ Qn×k . Then vec(X R ) = F vec(XcR ), in which diag(I4n, · · · , I4n ) diag(Qn, · · · , Qn ) 16nk×4nk , F = diag(Rn, · · · , Rn ) ∈ R diag(Sn, · · · , Sn )
and
0 In Qn = 0 0
−In 0 0 0
0 0 0 −In
0 0 ,R = In n 0
0 0 In 0
0 0 0 In
0 0 0 −In 0 0 In 0 . Sn = 0 −In 0 0 In 0 0 0
−In 0 0 0
0 −In , 0 0
116
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Lemma 4.5.5. Suppose X = 0 + X1 i + X2 j + X3 k ∈ Qn×k . Then X1 vec(XcR ) = H vec X2 , X3
in which
I
0
H = ··· 0
0 0 I 0 0 ∈ R4nk×3nk, ··· ··· ··· 0 0 I 0
0 0 0 In 0 0 4n×3n I = . 0 In 0 ∈ R 0 0 In
Lemma 4.5.6. Suppose X = X0 + 0i + 0j + 0k ∈ Qn×k . Then e vec(X ), vec(XcR ) = H 0
in which e I 0 e 0 I He = ··· ··· 0 0
0 0 0 0 ∈ R4nk×nk, ··· ··· 0 Ie
In 0 4n×n Ie = . 0 ∈R 0
Theorem 4.5.7. For given A ∈ Qm×n , B ∈ Qk×s, C ∈ Qm×n , D ∈ Qk×s, E ∈ Qm×s , denote M = (BRc )T ⊗ AR + (DRc )T ⊗ CR . Then the set QL of Problem 1 can be expressed as QL = {X|vec(XcR) = (M F )† vec(EcR ) + [I4nk − (M F )† (M F )]y, y ∈ R4nk }. (4.5.12) Therefore, the unique LS solution XQ ∈ QL of Problem 1 satisfies vec((XQR )c ) = (M F )† vec(EcR ).
(4.5.13)
(4.5.11) has a solution X ∈ Qn×k if and only if [I4ms − (M F )(M F )† ]vec(EcR ) = 0.
(4.5.14)
Direct Methods for Solving Linear System and Generalized LS ...
117
Proof. According to Lemmas 4.5.4-4.5.6, we have 1 kAXB +CXD − EkF = k(AXB +CXD − E)R kF 2 = kAR X R BRc +CR X R DRc − EcR kF
= k[(BRc)T ⊗ AR + (DRc )T ⊗CR ]vec(X R ) − vec(EcR )kF
= k[(BRc)T ⊗ AR + (DRc )T ⊗CR ]F vec(XcR ) − vec(EcR )kF
= kM F vec(XcR ) − vec(EcR )kF .
Therefore min kAXB +CXD − EkF = min, if and only if X∈Qn×k
min kM F vec(XcR ) − vec(EcR )kF = min.
XcR ∈R4n×k
We then obtain the formulas in (4.5.12) and (4.5.13). Furthermore, (4.5.11) is compatible if and only if for any X ∈ QL , 0 = kE − AXB −CXDkF = kvec(EcR ) − M F vec(XcR )kF = k[I4ms − (M F )(M F )† ]vec(EcR )kF , that is, [I4ms − (M F )(M F )† ]vec(EcR ) = 0. Therefore Theorem 4.5.7 holds. Now we state the results of Problem 2 and 3. Since the methods are similar to those of Theorem 4.5.7, we only describe the results and omit the detailed proof. Theorem 4.5.8. Let A ∈ Qm×n , B ∈ Qk×s , C ∈ Qm×n , D ∈ Qk×s, E ∈ Qm×s , X = X0 + X1 i + X2 j + X3 k ∈ Qn×k , and M is the same as that of Theorem 4.5.7. Then the set IL of Problem 2 can be expressed as X1 IL = {X : X0 = 0, vec X2 = (M F H )† vec(EcR )+ X3 [I3nk − (M F H )† (M F H )]y, y ∈ R3nk}. Therefore, the unique pure imaginary LS solution XI = X1 i + X2 j + X3 k ∈ IL of Problem 2 satisfies X1 vec X2 = (M F H )† vec(EcR ). (4.5.15) X3
118
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
(4.5.11) has a solution X ∈ IQn×k if and only if [I4ms − (M F H )(M F H )† ]vec(EcR ) = 0.
(4.5.16)
Theorem 4.5.9. Let A ∈ Qm×n , B ∈ Qk×s, C ∈ Qm×n , D ∈ Qk×s, E ∈ Qm×s , and M is the same as that of Theorem 4.5.7. Then the set RL of Problem 3 can be expressed as e )† vec(E R ) + [I − (M F H e )† (M F H e )]y, y ∈ Rnk }. RL = {X|vec(X) = (M F H nk c (4.5.17) Therefore, the unique real LS solution XR ∈ RL of Problem 3 satisfies e )† vec(E R ). vec(XR ) = (M F H c
(4.5.18)
e )(M F H e )† ]vec(E R ) = 0. [I4ms − (M F H c
(4.5.19)
(4.5.11) has a solution X ∈ Rn×k if and only if
Chapter 5
Iterative Methods for Solving Linear System and Generalized LS Problems In this chapter, we describe iterative methods for solving quaternion linear system and quaternion generalized LS problems, that is, select an initial vector x(0) , · · · , x(r−1) such that vector sequence {x(k)} obtained by the iterative process x(k) = f k (x(k−r), · · · , x(k−1)), k = r, r + 1, · · · converge to the solution of the original problem. Both direct methods and iterative methods have advantages and disadvantages. Direct methods have a small computational amount, but they need large storage capacity. The problems are complexity in general. On the contrary, iterative methods need large computational amount and relative small storage capacity, and procedures are simple and efficient, especially for solving large sparse quaternion system and generalized LS problems. In §5.1, we introduce some basic knowledge. In §5.2, we describe iterative methods for linear system. In §5.3 and §5.4, we describe iterative methods for the quaternion LS and the quaternion TLS problems, respectively. In §5.5, we describe two iterative algorithms for solving quaternion matrix equation AXB = C.
120
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
5.1. Basic Knowledge In this section, we provide some basic knowledge needed in this chapter, including properties of the Chebyshev polynomials and basic theory for splitting iterative methods.
5.1.1.
The Chebyshev Polynomials
For n = 0, 1, 2, · · ·, the Chebyshev polynomial Tn (x) is defined as
p p 1 Tn (x) = ((x + x2 − 1)n + (x − x2 − 1)n ), −∞ < x < +∞. 2
(5.1.1)
If |x| 6 1, Tn (x) can be expressed as
Tn (x) = cos(n arccosx).
(5.1.2)
In fact, let x = cos θ with 0 6 θ 6 π, then from the Euler’s formula, we have √ √ Tn (x) = 21 ((x + i 1 − x2 )n + (x − i 1 − x2 )n ) = 12 ((cosθ + i sin θ)n + (cos θ − i sinθ)n ) = 12 (einθ + e−inθ ) = cos(nθ) = cos(n arccos x). Tn (x) have the following properties. (1) Recursive relations T0 (x) = 1, T1 (x) = x, Tn+1(x) = 2xTn (x) − Tn−1 (x), n = 1, 2, · · · .
(5.1.3)
(2) If |x| 6 1, then |Tn (x)| 6 1. If |x| > 1, then |Tn (x)| > 1, n = 1, 2, · · · . Therefore max |Tn (x)| = 1. −16x61
(3) Tn (x) has n different real roots. (n)
xk = cos
(2k − 1)π , k = 1 : n. 2n
(4) Suppose that z is a given positive number with z > 1, Θn is a set of real coefficient polynomials satisfying the following conditions.
Iterative Methods for Solving Linear System ...
121
(i) The degree of any polynomial qn (x) ∈ Θn is not greater than n; (ii) For any qn (x) ∈ Θn , we have qn (z) = 1. Then min
max |qn (x)| = max |Tbn (x)|,
qn (x)∈Θn −16x61
in which
−16x61
Tn (x) . Tbn (x) = Tn (z)
(5.1.4)
(5.1.5)
Theorem 5.1.1. Suppose that Ψ1n is the set of polynomials pn (λ), whose degrees are not greater than n and constant terms are 1. For 0 < a < b, polynomials pbn (λ) ∈ Ψ1n satisfying max | pbn (λ)| =
a6λ6b
max |pn (λ)|
min
pn (λ)∈Ψ1n a6λ6b
is unique, and has the form
5.1.2.
pbn (λ) =
Tn ( b+a−2λ b−a ) Tn ( b+a b−a )
.
(5.1.6)
The Range of Eigenvalues of Real Symmetric Tridiagonal Matrices
Now we consider the range of matrix a1 b1 T =
right eigenvalues of real symmetric tridiagonal
b1 a2 .. .
..
.
..
.
bn−1
bn−1 an
.
(5.1.7)
Suppose that b1 , · · · , bn−1 are not zero, else we can consider the case of smaller size. Let Tr be the principal submatrix of order r of T , pr (λ) = det(Tr − λIr ) is the principal minor of degree r of T − λI and p0 (λ) = 1. Expanding det(Tr − λIr ) according to the last row, then we obtain the following recursive formulas p0 (λ) = 1, p1 (λ) = a1 − λ, pr (λ) = (ar − λ)pr−1 (λ) − b2r−1 pr−2 (λ), r = 2, · · · , n.
(5.1.8)
122
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
The polynomial sequence {pr (λ)} have the following properties. (1) The roots of pr (λ) are all real ( r = 1, · · · , n ). (2) pr (−∞) > 0, the sign of pr (+∞) is (−1)r (r = 1, · · · , n), in which pr (+∞) is the value pr (λ) when λ is sufficiently large, pr (−∞) is the value of pr (λ) when −λ is sufficiently large. (3)There are no common roots for two neighboring polynomials. (4) If pr (α) = 0, then pr−1 (α)pr+1(α) < 0, r = 1, · · · , n − 1. (5) pr (λ) has only simple roots, and the roots of pr+1 (λ) is strictly separated by the roots of pr (λ), (r = 1, · · · , n − 1). For given α ∈ R, polynomial sequence {pr (α)}nr=0 is called the Sturm sequence. sk (α) is the number of the same sign in the neighboring items of the sequence p0 (α), p1(α), · · · , pk (α). If pr (α) = 0, we prescript the sign of pr (α) is opposite to that of pr−1 (α) . Theorem 5.1.2. For given real number α, the number of roots of pr (λ) which are strictly larger than α is sr (α). Theorem 5.1.3. For given real number α, the number of right eigenvalues of T which are strictly larger than α is sn (α).
5.2. Iterative Methods for Linear System Direct methods for solving quaternion linear system Ax = b in Chapter 4 involve the decomposition of the coefficient matrix A. It can be impractical if A is large and sparse. In this case, iterative methods could be used to solve various linear systems. In this section, we study two kinds of iterative methods, splitting iterations and the Krylov subspace methods.
5.2.1.
Basic Theory of Splitting Iterative Method
Consider the following linear system Ax = b,
(5.2.1)
Iterative Methods for Solving Linear System ...
123
in which A ∈ Qn×n is nonsingular. Splitting A as A = M − N,
(5.2.2)
where M is nonsingular, and it is easy to compute the inverse of M. Thus Ax = b can be expressed as x = Gx + g, (5.2.3) in which G = M −1 N = I − M −1 A,
g = M −1 b.
(5.2.4)
Obviously, (5.2.1) and (5.2.3) are equivalent. For any x(0) ∈ Qn , construct iteration x(k) = Gx(k−1) + g, k = 1, 2, · · · .
(5.2.5)
Above formula is called order-1 linear stationary iterative method. In general, for k = 1, 2, · · · , A is expressed as A = Mk − Nk , Gk = Mk−1 Nk = I − Mk−1 A,
(5.2.6) gk = Mk−1 b,
(5.2.7)
in which Mk are nonsingular. For any x(0) ∈ Qn , construct iteration x(k) = Gk x(k−1) + gk , k = 1, 2, · · · .
(5.2.8)
Above formula is called order-1 linear non-stationary iterative method. Definition 5.2.1. For any given initial vector x(0) , if vector sequence {x(k)} obtained from the iteration (5.2.8) converge to the solution u of (5.2.1), then we say the iteration is convergent, otherwise, the iteration is divergent. And e(k) = x(k) − u is called the error vector of the k-th step. If the iteration is convergent, x(k) is called the approximate solution obtained at the k-th step. We have the following results about linear iteration. Theorem 5.2.1. The sufficient and necessary condition of convergence of iterative method (5.2.8) is the matrix sequence Tk = Gk Gk−1 · · ·G1 , k = 1, 2, · · · converge to the zero matrix.
(5.2.9)
124
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Proof. Suppose that u is the solution of (3.1.2). From (5.2.6)-(5.2.8), we have e(k) = Gk (x(k−1) − u) = Gk e(k−1) = Gk Gk−1 · · ·G1 e(0) = Tk e(0) . From definition of convergence of iteration, for any initial error vector e(0) , e(k) converge to 0, (k → ∞). Therefore the sufficient and necessary condition of convergence of iterative methods (5.2.8) is lim Tk = 0.
k→∞
Theorem 5.2.2. The sufficient and necessary condition of convergence of iteration (5.2.5) is lim Gk = 0, or equivalently, ρ(G) < 1. (5.2.10) k→∞
Proof. If ρ(G) ≥ 1, suppose that λ ∈ λr (G), and |λ| = ρ(G). Let y be the right eigenvector of G associated with λ, and x0 = u + y. Then when k → ∞, e(k) = Gk e(0) = Gk y = yλk 6→ 0. If ρ(G) < 1, suppose that J is the Jordan canonical form of G, G = XJX −1 , X is a nonsingular matrix. Since ρ(G) < 1, J is bidiagonal with the absolute values of elements on diagonal are all less than 1, elements at upper diagonal are 0 or 1. Then when k → ∞, we have J k → 0, Gk = XJ k X −1 → 0. Theorem 5.2.3. If kGk < 1 for some consistent matrix norm k · k on Qn×n , then the iteration (5.2.5) is convergent. Proof.
From Theorem 1.5.6, we have ρ(G) ≤ kGk < 1.
Now we discuss iterative speed of iteration (5.2.5). Suppose that k · k is a vector norm on Qn , or matrix norm consistent with vector norm on Qn . We want that the error vector satisfies ke(k)k 6 kGk kke(0)k 6 ke(0)kξ (ξ < 1),
Iterative Methods for Solving Linear System ...
125
that is kGk k 6 ξ. So the number of iterations k should satisfy 1 k > (− lnkGk k)−1 ln ξ−1 . k Definition 5.2.2. Suppose that k·k is a consistent matrix norm on Qn×n , ρ(G) < 1. Let 1 Rk (G) = − ln kGk k, R(G) = − ln ρ(G). (5.2.11) k Rk (G) is called the average rate of convergence of k steps according to k · k. R(G) is called the asymptotic rate of convergence of the iteration (5.2.5). Theorem 5.2.4. Suppose that k · k is a consistent matrix norm on Qn×n , ρ(G) < 1. Then for iterative method (5.2.5), we have R(G) = lim Rk (G) = − ln ρ(G). k→∞
(5.2.12)
Theorem 5.2.5. Suppose that k · k is a vector norm on Qn , or a matrix norm on Qn×n which is consistent with vector norm with kGk < 1. Then the approximate solution x(k) according to the iterative method (5.2.5) satisfies (1 − kGk)kx(k) − uk ≤ kx(k+1) − x(k) k ≤ (1 + kGk)kx(k) − uk, kx(k) − uk 6
kGkk (1) − x(0) k, 1−kGk kx
(5.2.13)
in which u is the solution of (5.2.1). Remark 5.2.1. From Theorem 5.2.5, when kGk < 1 , the iterative method (5.2.5) is at least linearly convergent and kx(k+1) − x(k) k is a good estimation of the iterative error kx(k) − uk. From A = M − N, the splitting iterative method has the form Mx(k+1) = Nx(k) + b
(5.2.14)
For the iteration (5.2.14) to be practical, it must be easy to solve a linear system with M a nonsingular matrix. So we hope that M has some special structure. For example, M is diagonal or triangular. In view of this idea, we can construct different iterative methods based on different splitting of A. The Jacobi and
126
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Gauss-Seidel procedures are typical members of splitting iterative methods, and iterative formula from x(k−1) to x(k) can be concisely described in terms of the matrices L, D and U, defined by 0 0 ··· 0 0 −a21 0 ··· 0 0 L= 0 0 −a31 −a32 · · · , ··· ··· ··· ··· ··· −an1 −an2 · · · −an,n−1 0 D = diag(a11, a22 , · · · , ann ), 0 −a12 · · · −a1,n−1 −a1n 0 0 · · · −a −a 2,n−1 2n 0 0 · · · −a −a 3,n−1 3n U = ··· ··· ··· ··· ··· 0 0 ··· 0 −an−1,n 0 0 ··· 0 0
Obviously, A = D − L −U.
.
1. The Jacobi Iteration Splitting A as A = M − N, in which M = D, N = L + U. The Jacobi iterative matrix J is J = M −1 N = D−1 (L +U), and the Jacobi iteration has the form Dx(k) = Lx(k−1) +Ux(k−1) + b, k = 1, 2, · · · or b (k−1) + b x(k) = (b L + U)x b, k = 1, 2, · · ·
b = D−1U, b in which b L = D−1 L, U b = D−1 b. The program is as follows for i = 1 : n b (k−1) + b x(k) = (b L + U)x b(i) end The numerical code of real structure-preserving algorithm of the Jacobi iterative is as follows.
Iterative Methods for Solving Linear System ...
127
Algorithm 5.2.6. For given nonsingular matrix A = A1 + A2 i + A3 j + A4 k ∈ Qn×n and the vector b = b1 + b2 i + b3 j + b4 k ∈ Qn . The inputs AA, bb are the first column blocks of AR , bR , respectively, i.e., ARc , bRc , M is the maximal number of iterations, tol is a given small parameter. The output xx is the first column block of the real representation of the approximate solution of (5.2.1), and k is the number of iterations. Function: [xx, k] = jacobi(AA, bb, M,tol) [m, n] = size(AA); xx = ones(4 ∗ n, 1); DD = [diag(diag(AA(1 : n, :))); diag(diag(AA(n + 1 : 2 ∗ n, :))); ... (diag(AA(2 ∗ n + 1 : 3 ∗ n, :))); diag(diag(AA(3 ∗ n + 1 : 4 ∗ n, :)))]; AA1 = DD − AA; bbb = bb for i = 1 : n q = AA(i, i) ∗ AA(i, i) + AA(n + i, i) ∗ AA(n + i, i) + AA(2 ∗ n + i, i)... ∗AA(2 ∗ n + i, i) + AA(3 ∗ n + i, i) ∗ AA(3 ∗ n + i, i); ss = Real p(AA(i, i), −AA(n + i, i), −AA(2 ∗ n + i, i), −AA(3 ∗ n + i, i))/q; AA1([i, n + i, 2 ∗ n + i, 3 ∗ n + i], :) = ss ∗ AA1([i, n + i, 2 ∗ n + i, 3 ∗ n + i], :); bbb([i, n + i, 2 ∗ n + i, 3 ∗ n + i]) = ss ∗ bb([i, n + i, 2 ∗ n + i, 3 ∗ n + i]); end for j j = 1 : M zz = Real p(AA1(1 : n, :), AA1(n + 1 : 2 ∗ n, :), ... AA1(2 ∗ n + 1 : 3 ∗ n, :), AA1(3 ∗ n + 1 : 4 ∗ n, :)) ∗ xx + bbb; if norm(zz − xx) < tol k = j j; xx = zz; break end xx = zz; end end
Notice that in the Jacobi iteration, one does not use the most recently avail(k) able information when computing xi . If we revise the Jacobi iteration so that we always use the most current estimation of the computed xi , then we obtain The Gauss-Seidel iteration.
128
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
2. The Gauss-Seidel Iteration Splitting A as A = M − N, in which M = D − L, N = U. The Gauss-Seidel iteration matrix G is G = (D − L)−1U, and the Gauss-Seidel step has the form b (k−1) + b x(k) = b Lx(k) + Ux b, k = 1, 2, · · · .
The detail program is as follows. b 2 : n)x(k−1)(2 : n) + b x(k)(1) = U(1, b(1) for i = 2 : n − 1 b (i, i+1 : n)x(k−1)(i+1 : n)+ b x(k) (i) = b L(i, 1 : i−1)x(k)(1 : i−1)+ U b(i) end x(k)(n) = b L(n, 1 : n − 1)x(k)(1 : n − 1) + b b(n) The numerical code of real structure-preserving algorithm of the GaussSeidel iteration is as follows. Algorithm 5.2.7. For given nonsingular matrix A = A1 + A2 i + A3 j + A4 k ∈ Qn×n and the vector b = b1 + b2 i + b3 j + b4 k ∈ Qn . The inputs AA, bb are the first column blocks of AR , bR , respectively, i.e., ARc , bRc , M is the maximal number of iterations, tol is a given small parameter. The output xx is the first column block of the real representation of the approximate solution of (5.2.1), and k is the number of iterations. Function: [k, xx] = gauss(AA, bb, M,tol) a = AA; [m, n] = size(a); xx = ones(4 ∗ n, 1); d = [diag(diag(a(1 : n, :))); diag(diag(a(n + 1 : 2 ∗ n, :))); ... diag(diag(a(2 ∗ n + 1 : 3 ∗ n, :))); diag(diag(a(3 ∗ n + 1 : 4 ∗ n, :)))]; a1 = d − a; bbb = bb; for i = 1 : n q = a(i, i) ∗ a(i, i) + a(n + i, i) ∗ a(n + i, i) + a(2 ∗ n + i, i) ∗ a(2 ∗ n + i, i)... +a(3 ∗ n + i, i) ∗ a(3 ∗ n + i, i); ss = Real p(a(i, i), −a(n + i, i), −a(2 ∗ n + i, i), −a(3 ∗ n + i, i))/q; a1([i, n + i, 2 ∗ n + i, 3 ∗ n + i], :) = ss ∗ a1([i, n + i, 2 ∗ n + i, 3 ∗ n + i], :); bbb([i, n + i, 2 ∗ n + i, 3 ∗ n + i]) = ss ∗ bb([i, n + i, 2 ∗ n + i, 3 ∗ n + i]); end
Iterative Methods for Solving Linear System ...
129
ll = [tril(a1(1 : n, :), −1); tril(a1(n + 1 : 2 ∗ n, :), −1); ... tril(a1(2 ∗ n + 1 : 3 ∗ n, :), −1);tril(a1(3 ∗ n + 1 : 4 ∗ n, :), −1)]; uu = [triu(a1(1 : n, :), +1); triu(a1(n + 1 : 2 ∗ n, :), +1); ... triu(a1(2 ∗ n + 1 : 3 ∗ n, :), +1);triu(a1(3 ∗ n + 1 : 4 ∗ n, :), +1)]; for j j = 1 : M yy = xx; yy([1, n + 1, 2 ∗ n + 1, 3 ∗ n + 1]) = Real p(uu(1, 2 : n), uu(n + 1, 2 : n), ... uu(2 ∗ n + 1, 2 : n), uu(3 ∗ n + 1, 2 : n))... ∗yy([2 : n, n + 2 : 2 ∗ n, 2 ∗ n + 2 : 3 ∗ n, 3 ∗ n + 2 : 4 ∗ n])... +bbb([1, n + 1, 2 ∗ n + 1, 3 ∗ n + 1]); for i = 2 : n − 1 yy([i, n + i, 2 ∗ n + i, 3 ∗ n + i]) = ... Real p(ll(i, 1 : i − 1), ll(n + i, 1 : i − 1), ll(2 ∗ n + i, 1 : i − 1), ... ll(3 ∗ n + i, 1 : i − 1)) ∗ yy([1 : i − 1, n + 1 : n + i − 1, 2 ∗ n + 1 : 2 ∗ n + i − 1, ... 3 ∗ n + 1 : 3 ∗ n + i − 1]) + Real p(uu(i, i + 1 : n), uu(n + i, i + 1 : n), ... uu(2 ∗ n + i, i + 1 : n), uu(3 ∗ n + i, i + 1 : n)) ∗ yy([i + 1 : n, n + i + 1 : 2 ∗ n, ... 2 ∗ n + i + 1 : 3 ∗ n, 3 ∗ n + i + 1 : 4 ∗ n]) + bbb([i, n + i, 2 ∗ n + i, 3 ∗ n + i]); end yy([n, 2 ∗ n, 3 ∗ n, 4 ∗ n]) = Real p(ll(n, 1 : n − 1), ll(2 ∗ n, 1 : n − 1), ... ll(3 ∗ n, 1 : n − 1), ll(4 ∗ n, 1 : n − 1)) ∗ yy([1 : n − 1, n + 1 : 2 ∗ n − 1, ... 2 ∗ n + 1 : 3 ∗ n − 1, 3 ∗ n + 1 : 4 ∗ n − 1]) + bbb([n, 2 ∗ n, 3 ∗ n, 4 ∗ n]); i f norm(yy − xx) < tol k = j j; xx = yy; break end xx = yy; end end
If the spectral radius of (D − L)−1U is close to one, then the convergent speed of the Gauss-Seidel iteration may be slow. To improve the convergent speed, we change the Gauss-Seidel step to obtain the successive over-relaxation (SOR).
3. The Successive over Relaxation Iteration (SOR) Splitting A as A = M − N, in which M = ω1 D − L, N = 1−ω ω D + U and ω ∈ R, which is called relaxed parameter. The successive over relaxation (SOR) iterative matrix S is S = (D − ωL)−1 ((1 − ω)D + ωU)
130
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
and SOR step has the form (D − ωL)x(k) = ((1 − ω)D + ωU)x(k−1) + ωb or Dx(k) = Dx(k−1) + ω(b + Lx(k) +Ux(k−1) − Dx(k−1) ), k = 1, 2, · · · . The detailed program is as follows b 2 : n)x(k−1)(2 : n) − x(k−1)(1)] x(k)(1) = x(k−1)(1) + ω[b b(1) + U(1, for i = 2 : n − 1 x(k) (i) = x(k−1)(i) + ω[b b(i) + b L(i, 1 : i − 1)x(k) (1 : i − 1)+ b i + 1 : n)x(k−1)(i + 1 : n) − x(k−1)(i)] U(i, end x(k)(n) = x(k−1)(n) + ω[b b(n) + b L(n, 1 : n − 1)x(k)(1 : n − 1) − x(k−1)(n)] The numerical code of real structure-preserving algorithm of the SOR is as follows. Algorithm 5.2.8. For given nonsingular matrix A = A1 + A2 i + A3 j + A4 k ∈ Qn×n and the vector b = b1 + b2 i + b3 j + b4 k ∈ Qn . The inputs AA, bb are the first column blocks of AR , bR , respectively, i.e., ARc , bRc , w is a relaxed parameter, M is the maximal number of iterations, and tol is a given small parameter. The output xx is the first column block of the real representation of the approximate solution of (5.2.1), i.e. the first column block of xR , and k is the number of iterations. Function: [k, xx] = SOR(AA, bb, w, M,tol) a = AA; [m, n] = size(a); xx = ones(4 ∗ n, 1); d = [diag(diag(a(1 : n, :))); diag(diag(a(n + 1 : 2 ∗ n, :))); ... diag(diag(a(2 ∗ n + 1 : 3 ∗ n, :))); diag(diag(a(3 ∗ n + 1 : 4 ∗ n, :)))]; a1 = d − a; bbb = bb; for i = 1 : n q = a(i, i) ∗ a(i, i) + a(n + i, i) ∗ a(n + i, i) + a(2 ∗ n + i, i) ∗ a(2 ∗ n + i, i) + ... a(3 ∗ n + i, i) ∗ a(3 ∗ n + i, i); ss = Real p(a(i, i), −a(n + i, i), −a(2 ∗ n + i, i), −a(3 ∗ n + i, i))/q; a1([i, n + i, 2 ∗ n + i, 3 ∗ n + i], :) = ss ∗ a1([i, n + i, 2 ∗ n + i, 3 ∗ n + i], :); bbb([i, n + i, 2 ∗ n + i, 3 ∗ n + i]) = ss ∗ bb([i, n + i, 2 ∗ n + i, 3 ∗ n + i]); end
Iterative Methods for Solving Linear System ...
131
ll = [tril(a1(1 : n, :), −1); tril(a1(n + 1 : 2 ∗ n, :), −1); ... tril(a1(2 ∗ n + 1 : 3 ∗ n, :), −1);tril(a1(3 ∗ n + 1 : 4 ∗ n, :), −1)]; uu = [triu(a1(1 : n, :), +1); triu(a1(n + 1 : 2 ∗ n, :), +1); ... triu(a1(2 ∗ n + 1 : 3 ∗ n, :), +1);triu(a1(3 ∗ n + 1 : 4 ∗ n, :), +1)]; for j j = 1 : M yy = xx; yy([1, n + 1, 2 ∗ n + 1, 3 ∗ n + 1]) = (1 − w) ∗ yy([1, n + 1, 2 ∗ n + 1, 3 ∗ n + 1])... +w ∗ (Real p(uu(1, 2 : n), uu(n + 1, 2 : n), uu(2 ∗ n + 1, 2 : n), ... uu(3 ∗ n + 1, 2 : n)) ∗ yy([2 : n, n + 2 : 2 ∗ n, 2 ∗ n + 2 : 3 ∗ n, 3 ∗ n + 2 : 4 ∗ n])... +bbb([1, n + 1, 2 ∗ n + 1, 3 ∗ n + 1])); for i = 2 : n − 1 yy([i, n + i, 2 ∗ n + i, 3 ∗ n + i]) = (1 − w) ∗ yy([i, n + i, 2 ∗ n + i, 3 ∗ n + i])... +w ∗ (Real p(ll(i, 1 : i − 1), ll(n + i, 1 : i − 1), ll(2 ∗ n + i, 1 : i − 1), ... ll(3 ∗ n + i, 1 : i − 1)) ∗ yy([1 : i − 1, n + 1 : n + i − 1, 2 ∗ n + 1 : 2 ∗ n + i − 1, ... 3 ∗ n + 1 : 3 ∗ n + i − 1]) + Real p(uu(i, i + 1 : n), uu(n + i, i + 1 : n), ... uu(2 ∗ n + i, i + 1 : n), uu(3 ∗ n + i, i + 1 : n)) ∗ yy([i + 1 : n, n + i + 1 : 2 ∗ n, ... 2 ∗ n + i + 1 : 3 ∗ n, 3 ∗ n + i + 1 : 4 ∗ n]) + bbb([i, n + i, 2 ∗ n + i, 3 ∗ n + i])); end yy([n, 2 ∗ n, 3 ∗ n, 4 ∗ n]) = (1 − w) ∗ yy([n, 2 ∗ n, 3 ∗ n, 4 ∗ n]) +w ∗ (Real p(ll(n, 1 : n − 1), ll(2 ∗ n, 1 : n − 1), ll(3 ∗ n, 1 : n − 1), ... ll(4 ∗ n, 1 : n − 1)) ∗ yy([1 : n − 1, n + 1 : 2 ∗ n − 1, 2 ∗ n + 1 : 3 ∗ n − 1, ... 3 ∗ n + 1 : 4 ∗ n − 1]) + bbb([n, 2 ∗ n, 3 ∗ n, 4 ∗ n])); i f norm(yy − xx) < tol k = j j; xx = yy; break end xx = yy; end end
In some cases, the best relaxation parameter is known. However, it is difficult to determine the best ω in general.
4. The Chebyshev Semi-Iterative Method A way to accelerate the convergence of an iterative method is to make use of the Chebyshev polynomials. Suppose that x(1), · · · , x(k) have been generated via the iteration Mx(k) = Nx(k−1) + b, and we wish to determine coefficients v j (k), j = 0, · · · , k such that
132
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
y(k) =
k
∑ x( j) v j (k) j=0
k
represents an improvement over x(k) . Obviously, ∑ v j (k) = 1 is required. Unj=0
der this constraint, we want to determine how to choose v j (k) so that kx∗ − yk2 is minimized in which x∗ is the solution of (3.1.2). Denote G = M −1 N, assume that G is Hermitian with right eigenvalues λi that satisfy −1 < α ≤ λn ≤ · · · ≤ λ1 ≤ β < 1 By exploiting the Chebyshev polynomials, we obtain the following algorithm y(k+1) = ω(k+1)(y(k) − y(k−1) + γz(k)) + y(k−1),
Mz(k) = b − Ay(k), in which
y(0) = x(0) , y(1) = x(1) , γ = 2/(2 − α − β), 2 − β − α Tk (u) ω(k+1) = 2 , β − α Tk+1 (u) 1−β u = 1+2 . β−α Tk and Tk+1 are the Chebyshev polynomials with degrees k and k + 1, respectively. We refer to this scheme as the Chebyshev semi-iterative method associated with My(k+1) = Ny(k) + b. For the acceleration to be effective, we need good lower and upper bounds α and β. As in SOR, these parameters may be difficult to determine except in a few structured problems. A difficulty associated with the SOR, the Chebyshev Semi-iteration method is that they depend on parameters which are sometimes hard to choose properly.
5.2.2.
The Krylov Subspace Methods
In this subsection, we describe the Krylov subspace methods for solving linear system Ax = b, i.e., we first construct a Krylov subspace
Kk (A, b) = span{b, Ab, · · · , Ak−1b}.
Iterative Methods for Solving Linear System ...
133
Then we search xk ∈ Kk (A, b) such that xk is the best approximation of the solution of Ax = b in the sense that krk k2 = min{kb − Axk2 , x ∈ K (A, b)}, in which rk = b − Axk is the residual vector for xk . We mainly discuss the conjugate gradient method applied to Hermitian positive definite system, which is one of the most famous Krylov subspace methods.
1. The Conjugate Gradient Method (CG) Suppose that A ∈ Qn×n is Hermitian positive definite and b ∈ Qn . Consider the linear system Ax = b. Ax − b = 0 ⇔ 0 = kAx − bk2 = k(Ax − b)Rc k2 = kAR xRc − bRc k2 . Notice that A is a quaternion Hermitian positive definite matrix if and only if AR is a real symmetric positive definite matrix. Therefore, for the linear system b = AR , b Ax = b, we can adopt the real CG method. Denote A b = bRc , y = xRc , and define a function 1 b bT − b y. ϕ(y) = yT Ay 2 It is easy to check that the gradient of ϕ is b −b ϕ(y) = Ay b,
b −1b and ϕ has a unique minimum point y∗ = (A) b. Thus, minimizing ϕ and solvb =b ing Ay b are equivalent problems.
Theorem 5.2.9. Suppose that A ∈ Qn×n is Hermitian positive definite. y∗ is a b =b solution of Ay b if and only if ϕ(y∗ ) = min ϕ(y). y∈R4n
The method of searching y∗ such that ϕ(y) attains its minimum value is to construct a vector system {y(k) } such that ϕ(y(k)) converge to ϕ(y∗ ).
Definition 5.2.3. Suppose that p, q ∈ Qn , and A ∈ Qn×n is an Hermitian positive definite matrix. If p 6= 0, q 6= 0, and pH Aq = 0,
we say p and q are A-conjugate.
134
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao Suppose that p0 , p1 , · · · , pm ∈ Qn are all nonzero vectors which satisfy pH i Ap j = 0,
i 6= j, i, j = 0, 1, 2, · · · , m,
we say p0 , p1 , · · · , pm is a set of A-conjugate vectors. b 0 and take b For any initial vector y0 ∈ R4n , compute b r0 = b b − Ay r0 as the T descent direction, denoted as pb0 = b r0 and ρ0 = b r0 b r0 . For k = 0, 1, · · ·, compute αk =
T b rk−1 b rk−1 , T b pb Ab pk k
yk = yk−1 + αk pbk ,
βk =
b rkT b rk , T b rk−1b rk−1
pbk+1 = b rk + βk pbk .
To do this, we can determine the new descent direction pbk+1. In general, the termination criteria is ρk ≤ ερ0 , in which ε > 0 is a given termination parameter. From yk , xk can be obtained accordingly. For the properties of the CG method, we have the following results. Theorem 5.2.10. In the conjugate gradient method, if k > 0, b r(k) 6= 0, then bp(i) = 0, 0 ≤ i < k, b r(k)T b r(i) =br(k)T pb(i) = pb(k)T Ab (k)T (i) (k)T (k) pb b r =b r b r , 0 ≤ i ≤ k.
(5.2.15)
bp(k) = (b Remark 5.2.2. If αk 6= 0, we have Ab r(k) − b r(k+1))/αk. Therefore bpb(k), βk = kb αk = kb r(k)k22 /( pb(k))H A r(k+1)k22 /kb r(k)k22 .
(5.2.16)
Algorithm 5.2.11. (CG) For given A ∈ Qnm×n , b ∈ Qm , x(0) ∈ Qn and small b = AR , b parameter tol > 0. Let A b = bRc , y(0) ∈ R4n . Step 1.
Step 2.
end
Compute
b (0), pb(0) = b b r(0) = b b − Ay r(0), ρ0 = (b r(0))T b r(0).
For k = 0, 1, 2, · · ·, if ρk > tol, compute bpb(k), αk = ρk /( pb(k))T A y(k+1) = y(k) + αk pb(k), b (k+1), b r(k+1) = b b − Ay (k+1) T (k+1) ρk+1 = (b r ) b r , (k+1) β = ρk+1 /ρk , pb(k+1) = b r(k+1) + β pb(k)
135
Iterative Methods for Solving Linear System ...
From above discussion, we now provide the following numerical code of real structure-preserving algorithm of the CG. Algorithm 5.2.12. For given quaternion Hermitian positive definite matrix A = A1 + A2 i + A3 j + A4 k ∈ Qn×n and the vector b = b1 + b2 i + b3 j + b4 k ∈ Qn . The inputs AA, bb are the first column blocks of AR , bR , respectively, i.e. ARc , bRc , and tol is a given small parameter. The output xx is the first column block of the real representation of the approximate solution of (5.2.1), and l is the numbers of iterations. Function: [xx,l] =CG(AA,bb,tol) n=size(AA,2) sAA=Realp(AA(1:n,:),AA(n+1:2*n,:),AA(2*n+1:3*n:),AA(3*n+1:4*n,:)); AA=sAA’*sAA; x=zeros(4*n,1); rr=bb-AA*x; pp=rr; rho1=rr’*rr; for k=0:4*n-1 a=rho1/(pp’*AA*pp); x=x+a*pp; rrr=bb-AA*x; i f (norm(rrr) 0 is the diagonal part matrix of AH A, so is positive definite, −L is the strict lower triangular of AH A.
1. The Jacobi Iteration Splitting AH A = M − N, Let M = D, N = L + LH , then we obtain the Jacobi iteration x(k) = x(k−1) + D−1 AH (b − Ax(k−1)), k = 1, 2, · · · , (5.3.5) (k)
or equivalently, let r(k−1) = b − Ax(k−1), for every x j , j = 1, · · · , n, compute (k)
(k−1)
xj = xj
+ aHj r(k−1)/d j , k = 1, 2, · · · .
(5.3.6)
The iterative matrix of the Jacobi iteration is GJ = I − D−1 AH A, and we should not compute AH A explicitly in the process of iteration.
2. The Gauss-Seidel Iteration Splitting AH A = M − N, Let M = D − L, N = LH , then we obtain Gauss-Seidel iteration x(k) = D−1 (AH b + Lx(k) + LH x(k−1)), k = 1, 2, · · · . (5.3.7)
Iterative Methods for Solving Linear System ...
139
The iterative matrix of the Gauss-Seidel iteration is GS = (D − L)−1LH , and we should compute AH A explicitly. At the kth step, suppose that z(1) = x(k−1), r(1) = b − Ax(k−1) , compute z( j+1) = z( j) + δ j e j , r( j+1) = r( j) − δ j a j , δ j = aHj r( j) /d j , j = 1 : n,
(5.3.8)
then x(k) = z(n+1). AH A does not need to be computed explicitly in this algorithm.
3. The Successive over Rrelaxation Iteration (SOR) Splitting AH A = M − N, let M=
1 1 (D − ωL), N = ((1 − ω)D + ωLH ), ω ω
ω 6= 0,
then we obtain the successive over relaxation iteration x(k) = ωD−1 (Lx(k) + LH x(k−1) + AH b) + (1 − ω)x(k−1) ,
(5.3.9)
in which ω is the relaxation parameter, the iterative matrix is Gω = (D − ωL)−1 ((1 − ω)D + ωLH ). In this case, AH A should be computed explicitly. At the kth step, let z(1) = x(k−1), r(1) = b − Ax(k−1), compute z( j+1) = z( j) + δ j e j , r( j+1) = r( j) − δ j a j , δ j = ωaHj r( j) /d j , j = 1 : n,
(5.3.10)
then x(k) = z(n+1). AH A does not need to be computed explicitly in this algorithm. If in the SOR iterative method, we take ω = 1, then G1 = GS = (D − L)−1LH is the iterative matrix of the Gauss-Seidel iteration. Theorem 5.3.1. Suppose that A ∈ Qnm×n . Then the iterative matrix Gω of the SOR method satisfies ρ(Gω) > |1 − ω|. Therefore the necessary condition of the SOR method to be convergent is 0 < ω < 2.
(5.3.11)
140
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Proof. Since Gω = (I − ωD−1 L)−1 ((1 − ω)I + ωD−1 LH ), I − ωD−1 L is a unit lower triangular matrix, (1 − ω)I + ωD−1 LH is an upper triangular matrix, then det(Gω) = det((I − ωD−1 L)−1 ) det((1 − ω)I + ωD−1 LH ) = det((1 − ω)I + ωD−1 LH ) = (1 − ω)n . Therefore the product of all right eigenvalues of Gω is (1 − ω)n , i.e., ρ(Gω ) > |1 − ω|. If the SOR method is convergent, then |1 − ω| ≤ ρ(Gω ) < 1. While ω is real, we have 0 < ω < 2. Theorem 5.3.2. Suppose that A ∈ Qnm×n . When 0 < ω < 2 , the SOR method is convergent. Corollary 5.3.3. Suppose that A ∈ Qnm×n . Then the Gauss-Sidel iteration is convergent. The convergent speed of the SOR method is related to the iterative factor ω. Now, for a kind of special matrices, we discuss how to choose the best iterative factor. Definition 5.3.1. Suppose that A = (a1 , a2 , · · · , an ) ∈ Qnm×n . If set S = {1, 2, · · · , n} can be divided into t disjoint subsets S1 , · · · , St , such that (1) ∪tj=1 S j = S; (2) If aH i a j 6= 0, i ∈ Sk , and when i < j, j ∈ Sk+1 , when i > j, j ∈ Sk−1 , then we say that AH A has a consistent order. If AH A has a consistent order, then there exist a permutation matrix P of order n and diagonal matrices E1 , E2 , such that E1 F H H PA AP = . (5.3.12) F H E2 Theorem 5.3.4. Suppose that A ∈ Qnm×n , AH A has a consistent order. Its splitting formula is (5.3.4), right eigenvalues of the Jacobi iteration matrix GJ are all real numbers, and ρ(GJ ) < 1. Then ( 1 ωρ(GJ )+(ω2 ρ(GJ )2 −4(ω−1)) 2 2 ] , If 0 < ω 6 ωopt ; ρ(Gω) = [ 2 (5.3.13) ω − 1, If ωopt 6 ω < 2,
141
Iterative Methods for Solving Linear System ... in which ωopt is the optimal factor of the SOR method, p 1 − 1 − ρ(GJ )2 2 p p ωopt = = 1+ . 1 + 1 − ρ(GJ )2 1 + 1 − ρ(GJ )2
(5.3.14)
p
(5.3.15)
Therefore
ρ(Gωopt ) = ωopt − 1 =
1−
1+
p
1 − ρ(GJ )2 1 − ρ(GJ )2
=
2 ρ(GJ ) p . 1 + 1 − ρ(GJ )2
The proof of the theorem is very complicated, and we refer to [199]. Theorem 5.3.5. Suppose that A ∈ Qnm×n , AH A has a consistent order. Its splitting formula is (5.3.4) and ρ(GJ ) < 1. Then (1) R(GS ) = 2R(GJ ), 1 1 (2) 2ρ(GJ )[R(GS)] 2 6 R(Gωopt ) 6 R(GS ) + 2[R(GS )] 2 ,
(5.3.16)
in which the second inequality in (2) holds when R(GS ) 6 3, and lim
ρ(GJ )→1−
R(Gωopt ) 1
2[R(GS )] 2
= 1.
(5.3.17)
Remark 5.3.1. Under the assumption of Theorem 5.3.5, the asymptotic rate of convergence of the Gauss-Seidel iteration is twice of that of the Jacobi iteration when ρ(GJ ) → 1− , if we take the best iterative factor ωopt , the quantitative series of the asymptotic convergence rate of the SOR is one smaller than that of the Gauss-Seidel iteration. But if AH A has no consistent order, the asymptotic convergence rate is almost the same for any ω ∈ (0, 2). At this moment, we can use the symmetric SOR method, and choose iterative factor ω = 1.
4. The Chebyshev Semi-Iterative Acceleration For A ∈ Qnm×n , consider the splitting of AH A, AH A = M − N, in which M is an Hermitian positive definite matrix. Let G = M −1 N = I − B, B = M −1 AH A, g = M −1 AH b, x(k) = x(k−1) + τk (g − Bx(k−1)), k = 1, 2, · · · .
(5.3.18) (5.3.19)
142
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
(5.3.19) is called the Richardson iterative method, in which τk , k = 1, 2, · · · are given real parameters. When τk = τ, (5.3.19) is a general iterative method. Denote e(k) = x(k) − u, in which u is the LS solution of (5.3.1). Then k
e(k) = e(k−1) − τk Be(k−1) = (I − τk B)e(k−1) = ∏(I − τk B)e(0).
(5.3.20)
j=1
1 e = M − 12 AH AM − 21 . Then Be is an HermiFor y ∈ Qn , denote kyk ≡ kM 2 yk2 , B tian positive definite matrix. Suppose that λ0 and λ1 are the biggest and smallest
k
e respectively, pk (λ) = ∏ (1 − τk λ). Then from (5.3.20), right eigenvalues of B, j=1
k
(0) e ke(k)k ≤ ρ( ∏(I − τk B))ke k = max |pk (λ)|ke(0)k, λ1 ≤λ≤λ0
j=1
since pk (0) = 1, pk (λ) ∈ Ψ1k . Therefore according to Theorem 5.1.1, if we want to make max |pk (λ)| = min, we can choose λ1 6λ6λ0
pk (λ) =
1 −2λ Tk ( λ0λ+λ ) 0 −λ1 1 Tk ( λλ00 +λ −λ1 )
.
Therefore the optimal value of τ j is that which makes all the zeros of pk (λ) and Tk (
λ0 + λ1 − 2λ ) λ0 − λ1
are the same, i.e., 1 λ0 − λ1 2j −1 λ0 + λ1 = cos( π) + , j = 1 : k. τj 2 2k 2
(5.3.21)
The Richardson method taking {τk } as (5.3.21) is also called the Chebyshev semi-iterative method. In general the aim is to accelerate the original stationary iterative method by selecting the best coefficients in the Chebyshev semiiterative method. It is easy to check that q k λ0 − 1 1 λ1 . 6 2q (5.3.22) λ0 +λ1 λ0 Tk ( λ0 −λ1 ) + 1 λ1
143
Iterative Methods for Solving Linear System ... Therefore, taking 1 k> 2
s
1 λ0 2 1 2 ln = κ2 (AM − 2 ) ln , λ1 ε 2 ε
we have
ke(k)k 6 ε. ke(0)k
Remark 5.3.2. In practical computations, we usually do not know the biggest and smallest right eigenvalues λ0 and λ1 of B. If we can estimate a lower bound a > 0 of λ1 and an upper bound b of λ0 , then 1 b−a 2j −1 b+a = cos( π) + . τj 2 2k 2 q ke(k) k 1 Notice that, if k > 2 ba ln 2ε , then ke(0) k 6 ε. Furthermore, in the splitting of
AH A, one often takes M = D, or the corresponding matrix in the SOR and then M is an Hermitian positive definite matrix.
5.3.2.
The Krylov Subspace Methods
Now we discuss the Krylov subspace methods for the linear LS problem.
1. The Conjugate Gradient Method (CGLS) Suppose that A ∈ Qnm×n . Consider the normal equation (5.3.2) of the LS problem (5.3.1) AH Ax = AH b. Obviously, AH A is a positive definite matrix. Because 0 = kAH Ax − AH bk2 = k(AR )T AR xRc − (AR )T bRc k2 , b = AR , b let y = xRc , A b = bRc , (5.3.2) can be reduced to find the minimum vector of 1 bT b bT b f (y) = yT A Ay − (A b)T y. 2
In fact, the gradient g(y) of f (y) is g(y) = 5 f (y) = (
∂f ∂ f T bT b ,··· , ) = A Ay − AbT b b. ∂y1 ∂y4n
(5.3.23)
144
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Moreover, for any given nonzero vector p ∈ R4n and t ∈ R, we have 1 bT Ap. b f (y + t p) − f (y) = tg(y)T p + t 2 pT A 2
If u is the solution of (5.3.2), let ub = uRc , then g(b u) = 0. Therefore for any nonzero vector p ∈ R4n , we have > 0, if t 6= 0; f (b u + t p) − f (b u) = 0, if t = 0, bT A b is positive defi.e., ub is the minimum vector of f (y). Furthermore, since A inite, quadratic function f (y) has only one minimum vector in R4n . If ub is the minimum vector of f (y), then Therefore
1 b f (b u + t p) − f (b u) = tg(b u)T p + t 2 pT Ap. 2
d f (b u + t p) |t=0 = g(b u)T p = 0. dt Since p is arbitrary, we have g(b u) = 0, so u is a solution of (5.3.2). If a set of nonzero vectors p(0), p(1), · · · , p(k−1) ∈ R4n satisfy bT Ap b ( j) = 0, i 6= j, p(i)T A
(5.3.24)
y(k+1) = y(k) + αk pk .
(5.3.25)
bT A b in R4n then {p(k)} is called a set of conjugate vectors with respect to A Given any initial vector y(0) ∈ R4n . For k = 0, 1, 2, · · · , find the minimum vector of f (y) on the straight line y = y(k) +t p(k) along the direction of p(k) from y(k), we obtain
Search bT (b b (k) ), αk = s(k) = A b − Ay
ks(k)k22 s(k)T p(k) = . bT Ap b (k) kAp b (k) k2 p(k)T A 2
(5.3.26)
p(k) is called the search direction, and (5.3.25) is called the conjugate direction method. Particularly, take p(0) = s(0), p(k+1) = s(k+1) + βk p(k) , βk = −
bT Ap b (k) ks(k+1)k2 s(k+1)T A 2 = , (k)k2 (k)T (k) T b b ks p A Ap 2
(5.3.27)
Iterative Methods for Solving Linear System ...
145
(5.3.26)-(5.3.27) is called the conjugate gradient method (CG). From (5.3.25)(5.3.27) we know, if k ≥ 0 such that s(k) = 0, then y(k) , and so x(k) is the solution of the LS problem, and αk = βk = 0, s(k+1) = p(k+1) = 0.
2. The QR Least Squares Method (LSQR) The QR least squares method (LSQR) is proposed for real matrices by Paige and Saunders in [140], which is obtained based on Lanczos bidiagonalization (LBD). The LSQR method can be generalized to solve the quaternion LS problems [182]. Suppose that A ∈ Qm×n , m ≥ n. Then there exist unitary quaternion matrices U = (u1 , · · · , um ), V = (v1 , · · · , vn ), U1 = (u1 , · · · , un ), and a real bidiagonal matrix α 1 β1 α 2 β2 . . . . B= ∈ Rn×n , . . αn−1 βn−1 αn such that
A=U
B 0
V H , or equivalently, AV = U1 B, AH U1 = V BH .
(5.3.28)
By applying relation of operations of quaternion matrices and their real representation matrices in Theorem 1.8.7, we have ARVcR = (U1 )R BRc , (AR)T (U1)Rc = V R (BR)Tc . b = AR , V b = VcR , U b1 = (U1 )Rc , from the jth column of the last two formulas Let A above, we have bv j = α j ubj + β j−1 ubj−1 , Ab bT ubj = α j vbj + β j vbj+1 , j = 1 : n. A
For convenience, we take β0 ub0 = 0, βn vbn+1 = 0. For given vectors vb1 ∈ R4n , kb v1 k2 = 1, using the following Golub-Kahan bidiagonalization recursive formula [61], we obtain α j , u j , β j , v j+1 , bv j − β j−1 ubj−1 , α j = kb b r j = Ab r j k2 , ubj = b r j /α j , bT ubj − α j vbj , β j = k pbj k2 , vbj+1 = pbj /β j , j = 1 : n. pbj = A
(5.3.29)
146
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Obviously, if α j 6= 0, β j 6= 0, j = 1 : k, then vb1 , · · · , vbk are orthogonal to each other, ub1 , · · · , ubk are orthogonal to each other, too, bT A, b vb1 ), ubj ∈ Kk (A bA bT , Ab bv1 ). vbj ∈ Kk (A
When α j = 0 or β j = 0, iterative process (5.3.29) stops. If iterative bv j = β j−1 ubj−1 and process stops, when α j = 0, j < n, then we have Ab b b b < n, such span{Ab v1 , · · · , Ab v j } ⊂ span{b u1 , · · · , ubj−1 }. Therefore when rank(A) situation will happen. If iterative process stops when β j = 0, then we can check that bT (b A(b v1 , · · · , vbj ) = (b u1 , · · · , ubj )B j , A u1 , · · · , ubj ) = (b v1 , · · · , vbj )BbTj .
b Therefore σ(Bb j ) ⊂ σ(A). Paige and Saunders [141] described another bidiagonalization algorithm. Takeing initial vector u1 ∈ Qm , after n steps, the matrix A is transformed to a real lower bidiagonal matrix as in (5.3.28), in which α1 β2 α 2 .. ∈ R(n+1)×n, . B = Bn = β3 (5.3.30) . .. α n βn+1
here Bn is not square matrix. Define β1 v0 ≡ 0, αn+1vn+1 ≡ 0, then from (5.3.28), we obtain recursive relation. bT ubj = β j vbj−1 + α j vbj , A bv j = α j ubj + β j+1 ubj+1 , j = 1 : n. Ab
For given initial vector ub1 ∈ R4m , kb u1 k2 = 1, and for j = 1, 2, · · ·, we take the following formulas bT ubj − β j vbj−1 , α j = kb b rj = A r j k2 , b vj = b r j /α j , b pbj = Ab v j − α j ubj , β j+1 = k pbj k2 , ubj+1 = pbj /β j+1 ,
(5.3.31)
bn and the vectors vbi , ubi . For this bidiagonalization scheme, we obtain αi , βi of B we have bA bT , ub1 ), vbj ∈ K j (A bT A, bA bT ub1 ). ubj ∈ K j (A
Iterative Methods for Solving Linear System ...
147
Theoretically, we will obtain the same vector sequences by applying the bA bT and A bT A, b whether we take b bT b bT b Lanczos process to A v1 = A b/kA bk2 as initial vector in LBD process (5.3.29), or we take ub1 = b b/kb bk2 as initial vector. In the floating point operations, the Lanczos vectors will lose orthogonality, and some of above relations will not hold for adequate accuracy requirements. Even though the maximal and minimal singular values of truncated bidiagnoal matrix b when Bbk ∈ Q4(k+1)×k can well approach the corresponding singular values of A, k n. Now we consider the LSQR algorithm [141] for the quaternion linear least squares problem (5.3.1). We take the vector ub1 = b b/kb bk2 , and adopt (5.3.31). After k steps of iterations, we obtain the matrices bk = (b V v1 , · · · , vbk ),
bk+1 = (b U u1 , · · · , ubk+1),
and Bbk , in which Bbk is the (k + 1) × k submatrix of the upper left corner of Bbn , and (5.3.31) can be written with bV bk = U bk+1Bbk , A
bk+1 b β1U e1 = b b,
bT U bk+1 = V bk BbTk + αk+1 vbk+1 ebTk+1 . A
Now we search for an approximate solution of (5.3.1) x(k) ∈ Kk , Kk = Kk (AH A, AH b). Then x(k) can be expressed as x(k) = Vky(k) , and b − Ax(k) = Uk+1tk+1,
tk+1 = β1 e1 − Bk y(k).
Therefore we have bx(k) − b min kAb bk2 = min kBbk yb(k) − β1 eb1 k2 .
xb(k) ∈Kk
yb(k)
Theoretically, the LSQR and the CGLS methods produce the same approximate sequences, therefore, the convergent properties of the LSQR and the CGLS methods are the same. Now we describe the LSQR algorithm. Because Bk is a rectangular low bidiagonal matrix, we can compute the QR decomposition of Bk by applying a sequence of the Givens matrices, fk Rk Qk Bk = , Qk (β1 e1 ) = b , 0 φk+1
148
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
in which Rk is an upper triangular matrix, Qk = Gk,k+1Gk−1,k · · ·G12 . By computing the following relations 0 (k) H Rk y = f k , tk+1 = Qk , b φk+1
we can obtain the solution vector y(k) and corresponding residual vector tk+1 . Above steps do not require to compute at the beginning of each step. If we have computed the decomposition of Bk−1 , we add the kth column in the next step and compute plan rotation transformation Qk = Gk,k+1Qk , such that
0 θk b φk φk Gk,k+1Gk−1,k αk = ρk , Gk,k+1 = b . φk+1 0 βk+1 0
Theoretically, the LSQR and the CGLS methods will produce the same approximate sequences x(k). However, Paige and Saunders [141] proved that the LSQR is more reliable numerically when we need to perform many iterations, or A is an ill conditioned matrix. Algorithm 5.3.6. (LSQR). Suppose that A ∈ Qnm×n , b ∈ Qm , x(0) ∈ Qn are b = AR , xb = xRc , b b = V R, given and a small parameter tol > 0. Let A b = bRc , V Step 1. Compute
bT u1 ; xb(0) := 0; β1 u1 := b b; α1 v1 = A w1 = v1 ; b φ 1 = β1 ; b ρ1 = α 1 ;
Step 2. For i = 1, 2, · · ·, repeat the following steps until convergence
Stop
b i − αi ui ; βi+1 ui+1 := Av bT ui+1 − βi+1 vi ; αi+1 vi+1 := A (ci , si , ρi ) = qGivens(b ρi, βi+1 ); θi = si αi+1 ; b ρi+1 = ci αi+1 ; φi = ci b φi ; b φi+1 = −sib φi ; (i) (i−1) xb = xb + (φi /ρi )wi ; wi+1 = vi+1 − (θi /ρi )wi ;
Iterative Methods for Solving Linear System ...
149
Here, 0 qGivens0 is an algorithm computing Givens rotation and in quaternion case, we can choose qGivens mentioned in Algorithm 2.1.5, and we choose parameters αi ≥ 0 and βi ≥ 0 such that corresponding vectors ui , vi are unit. Amount of computations and storages of the LSQR and the CGLS are considerable. When the condition number of A is very big, we can adopt the preconditioning CGLS or the preconditioning LSQR algorithms.
5.3.3.
Preconditioning Hermitian-Skew Hermitian Splitting Iteration Methods
Now we consider the iterative method of the KKT equation of the WLS problem in §3.4. −1 W A r b Bz = d, B = ,z = , d= (5.3.32) H A 0 x 0
in which A ∈ Qnm×n , W is Hermitian positive definite. If W = I, (5.3.32) is the KKT equation of the LS problem. From the conditions satisfied by A and W , the KKT matrix B is nonsingular symmetry indefinite matrix. Since B in (5.3.32) is an Hermitian indefinite matrix, we rewrite the equation (5.3.32) as the following form in order to propose effective iterative method. −1 W A r b Bz = d, B = , z= , d= . (5.3.33) −AH 0 x 0
Here the KKT matrix B is a non-Hermitian positive definite matrix . When all involved matrices and vectors are real, based on Hermitian/skew Hermitian splitting iterative method (HSS) for nonsymmetric positive definite linear equations, Bai and Ng [12], Golub and Pan [13] proposed the preconditioning Hermitian/skew Hermitian splitting iterative methods (PHSS) for nonsymmetric positive definite linear equations. We now generalize the HSS iterative method to quaternion case. We introduce the matrix −1 W 0 b = W 12 AC− 21 ∈ Qm×n , P= , A 0 C in which C ∈ Qn×n is an Hermitian positive definite matrix with appropriate size, and define ! b b 1 1 1 1 I A b r r b −2 −2 b b 2 B = P BP = , =P , d= = P− 2 d. H b xb x 0 −A 0
150
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Therefore linear system (5.3.33) can be converted to the following form r b b b B = d. (5.3.34) xb Denote
b = 1 (Bb + BbH ) = H 2
I 0
1 , Sb = (Bb − BbH ) = 2
0 bH −A
and then the HSS iterative method [12] of linear system (5.3.34) is ! (k) (k+ 21 ) b r b r b b (αI + H) = (αI − S) + db 1 (k+ ) 2 x b(k) xb ! (k+1) (k+ 21 ) b r b r b b b (αI + S) xb(k+1) = (αI − H) xb(k+ 21 ) + d,
b A 0
!
,
(5.3.35)
in which α > 0 is a parameter. Iterative matrix of the HSS iteration is as follows b (α) = (αI + S) b −1 (αI − H)(αI b b b −1 (αI − S). L + H)
(5.3.36)
After inverse transform of the HSS iteration (5.3.35), we obtain PHSS iterative method [13] about linear system (5.3.33). ! (k) (k+ 21 ) r r (αP + H) = (αP − S) +d 1 x(k) x(k+ 2 ) ! (5.3.37) (k+1) (k+ 21 ) r r (αP + S) x(k+1) = (αP − H) x(k+ 12 ) + d, in which H = 21 (B + BH ), S = 21 (B − BH ). We obtain iterative matrix after direct verification. 1
1
b (α)P 2 . L (α) = (αP + S)−1 (αP − H)(αP + H)−1 (αP − S) = P− 2 L
(5.3.38)
Therefore we have
b (α)). ρ(L (α)) = ρ(L
(5.3.39)
When we apply the PHSS iteration (5.3.37), each iteration can be computed by direct methods or inner iterative methods.
151
Iterative Methods for Solving Linear System ...
bk (k = 1 : n) are the positive singular Theorem 5.3.7. Suppose that A ∈ Qnm×n , σ b = W 12 AC− 21 . Then the iterative matrix L (α) of the PHSS iterative values of A α−1 method has the right eigenvalue α+1 with the multiplicity m − n, and the right eigenvalue q 1 2 b2 2 2 2 2 4 bk ) − 4α σ bk , k = 1 : n. α(α − σk ) ± (α + σ b2k ) (α + 1)(α2 + σ Therefore ,
ρ(L (α)) < 1,
∀α > 0,
i.e. the PHSS iteration converges to the exact solution of the linear system (5.3.33). Theorem 5.3.8. Suppose that W ∈ Qm×m is an Hermitian positive definite matrix, A ∈ Qm×n is a full column rank matrix, α > 0 is a given constant, and bk (k = 1 : n) are the positive singuC ∈ Qn×n is Hermitian positive definite. If σ 1 1 − bk }, σmax = max {σ bk }, then lar values of the matrix W 2 AC 2 , σmin = min {σ 1≤k≤n
1≤k≤n
the optimal parameter α of the PHSS iterative method for solving the linear system (5.3.33) is as follows α∗ = arg min ρ(L (α)) = α
and ρ(L (α∗)) =
√
σmin σmax ,
σmax − σmin . σmax + σmin
5.4. Iterative Methods for the TLS Problem In this section, we discuss the iterative method of the total least squares problem.
5.4.1.
The Partial SVD Method
Consider the TLS problem AX = B, in which A ∈ Qm×n , B ∈ Qm×d , m ≥ n + d. When we compute XTLS by basic SVD method, we need to compute the SVD of C = [A, B], which need huge calculation quantity. We notice that computing XTLS is in fact computing the right singular vectors of several smallest singular values of C, and therefore we do not have to compute the SVD of C. Thus we can adopt the following partial SVD method. For given parameter η > 0.
152
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Step 1. Transform C to a real bidiagonal matrix J by applying a sequence of Househodler matrices P1 , Q1 , · · · , Pn+d−1, Qn+d−1, Pn+d , Qn+d , i.e., the bidiagonalization process of C α 1 β1 · · · 0 .. 0 α2 . . . . J . . . Pn+d Pn+d−1 · · ·P1CQ1 · · ·Qn+d = = . . . . . βn+d−1 . 0 . 0 ··· 0 αn+d 0 ··· 0 0 (5.4.1) Step 2. Perform diagonalization iteration to J by applying implicit symmetrical QR algorithm. In the iterative process, we obtain J = diag(J1 , · · · , Jk ). Divide J1 , · · · , Jk into three sets D1 = {Ji : σ j (Ji ) ≥ η}; D2 = {Ji : σ j (Ji ) < η}; D3 = {Ji : some σ j (Ji ) ≥ η some σ j (Ji ) < η},
(5.4.2)
in which σ j (Ji ) denotes the singular values of Ji . We only perform diagonalization for Ji in D3 , until D3 does not exist. Step 3. For theset D2, we compute the corresponding right singular vecV12 tors of C to obtain , and then we compute the QL decomposition of V22 (4.4.4) and xTLS = −ZΓ−1 . Remark 5.4.1. If η is too small, Γ maybe singular. Then we should increase η. In addition, we can apply Sturm sequence of the real symmetrical tridiagonal matrix JiH Ji to determine D1 , D2 , D3 .
5.4.2.
Bidiagonalization Method
Let Ax = b, in which A ∈ Qm×n , b ∈ Qm . We consider its approximation solution of the TLS problem by the Lanczos bidiagonalization process. min k(E, r)kF, E,r
(A + E)x = b + r
153
Iterative Methods for Solving Linear System ...
bn > σn+1 , then the solution of the TLS According to the discussion in §3.3, if σ problem can be determined by right singular vector vn+1 of (A, b). Therefore, we can apply the LSQR Lanczos bidiagonalization method proposed by Paige and Saunders [141] to the augmented matrix (A, b), which can produce a approximate TLS solution in the Krylov subspace Kk = Kk (AH A, AH u1 ). Take β1 = kbk2, u1 = b/β1 , and apply the same iterative process as in §5.3, suppose that β1 v0 ≡ 0, αn+1 vn+1 ≡ 0, and for j = 1, 2, · · ·, we apply the following formulas r j = AH u j − β j v j−1 , α j = kr j k2 , v j = r j /α j , p j = Av j − α j u j , β j+1 = kp j k2 , u j+1 = p j /β j+1 ,
(5.4.3)
to obtain the elements αi , βi of Bn and the vectors vi , ui . For this bidiagonalization scheme, we have u j ∈ K j (AAH , u1 ), v j ∈ K j (AH A, AH u1 ), β1 α 1 β2 α 2 .. ∈ Q(k+1)×(k+1). . (β1 e1 , Bk ) = β3 .. . αk βk+1
And then, we seek a approximate TLS solution x(k) = Vk yk ∈ Kk . So we have (A + Ek )x(k) = (A + Ek )Vk yk = (Uk+1Bk + EkVk)yk = β1Uk+1e1 + rk . Thus compatibility relations are converted to (Bk + Fk )yk = β1 e1 + sk ,
T Fk = Uk+1 EkVk ,
H sk = Uk+1 rk .
Therefore yk is the solution of the following TLS subproblem, min k(F, s)kF, F,s
(Bk + F)yk = β1 e1 + s.
We compute the SVD of Bk by standard implicit QR algorithm, and we have Bk = Pk+1 Ωk QH k+1 ,
154
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
and (b, A) Therefore, from
Vk1 Vk2
zk γk
(Qkek ) = ωk (Uk+1Pk+1 ek ).
=
Vk1 Vk2
Qk ek ,
we obtain the approximate TLS solution x(k) = −zk /γk ∈ Kk . In order to compute x(k) , we only need the right singular vector Qk ek associated with the smallest singular value σn+1 , while the vector vk is needed to store and renew. If the (k) smallest singular value equals to σk+1 , then we have (k)
k(Ek, rk )kF = σk+1 , and we obtain the following estimate formula k(Ek , rk )k2F ≤ min k(E, r)k2F + σ21 (
tan θ(v, u1) 2 ) . Tk−1 (1 + 2γ1 )
5.5. Some Matrix Equations In this section, we describe two iterative algorithms of the least squares solutions of the quaternion matrix equation AXB = C. The first iterative algorithm is the quaternion matrix CGLS method (qMCGLS), which is a generalization of one in [144, 146] for real matrix A, B and C. The second iterative algorithm is the quaternion matrix LSQR method (qMLSQR) [180, 181]. Algorithm 5.5.1. (qMCGLS for AXB = C) For given A ∈ Qm×n , B ∈ Q p×q ,C ∈ Qm×q , this algorithm computes the minimum F-norm solution matrix X ∈ Qn×p of the quaternion matrix LS problem kAXB −CkF = min. b = AR , X b = X R, B b = BR , Cb = CR . Choose an initial matrix Step 1. Set A Xb1 = 0 and a small tolerant parameter ε > 0. Step 2. Calculate
Set k := 1.
bT (Cb − AbXb1 B)( b BbT )c ; (c R1 ) c = A bT A bR b1 B( b BbT )c . (Pb1 )c = A
Iterative Methods for Solving Linear System ...
155
bk is an approximation of the minStep 3. If k(Rbk)c kF < ε, then stop and X imum F-norm LS solution; else Step 4. Calculate k(Rbk)c k2F b (Pk )c; k(Pbk )c k2F bT (Cb − A bXbk+1B) b BbTc ; (Rbk+1)c = A (Xbk+1)c = (Xbk )c +
bT A bRbk+1BbBbTc + (Pbk+1 )c = A
Set k := k + 1. Step 5. Go to Step 3.
k(Rbk+1)c k2F b (Pk )c ; k(Rbk)c k2F
The following algorithm qMLSQR described in [180] is a generalization of the LSQR algorithm for real matrix proposed by Paige and Saunders [141]. Here we make some improvement over the algorithm mentioned in [180]. Algorithm 5.5.2. (qMLSQR for AXB = C) For given A ∈ Qm×n , B ∈ Q p×q ,C ∈ Qm×q , this algorithm computes the minimum F-norm solution matrix X ∈ Qn×p of the quaternion matrix LS problem kAXB −CkF = min. b = AR , Xb = X R , Bb = BR , Cb = CR . Choose a small tolerant Step 1. Set A parameter ε > 0. Initialization. β1 = kCcR kF ,
(U1 )Rc = CcR /β1 ,
α1 = k(AR)T U1R (BR)Tc kF , (H1 )Rc = (V1 )Rc ,
(V1)Rc = (AR )T U1R (BR )Tc /α1 ,
XcR = 04n×p ,
ξ¯ 1 = β1 ,
ρ¯ 1 = α1 ;
Step 2. Iteration. For t = 1, 2, · · · (a) Bidiagonalization (i) βt+1 = kARVtR BRc − αt (Ut )Rc kF ,
(Ut+1 )Rc = (ARVtR BRc − αt (Ut )Rc )/βt+1 ,
R (ii) αt+1 = k(AR)T Ut+1 (BR )Tc − βt+1 (Vt )Rc kF ,
R (BR)T − β R (Vt+1 )Rc = ((AR )T Ut+1 t+1 (Vt )c /αt+1 ; c
156
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
(b) Construct and use quaternion Givens rotation q 2 , ρt = ρ¯ t2 + βt+1 ct =
ρ¯ t ρt , st
=
βt+1 ρt , θt+1
= st αt+1 ,
ρ¯ t+1 = −ct αt+1 , ξt = ct ξ¯ t , ξ¯ t+1 = st ξ¯ t ; (c) Update X and H (Xt )Rc = (Xt−1 )Rc + ( ρξtt )(Ht )Rc , R (Ht+1 )Rc = (Vt+1 )Rc − ( θt+1 ρt )(Ht )c ;
(d) Check convergence.
Chapter 6
Computations of Quaternion Eigenvalue Problems The algebraic eigenvalue problem is one of the major problems in matrix computations. For a real or complex matrices A, eigenvalue problem is well studied, and various numerical methods are proposed [64]. For a quaternion matrix A, since the non-commutativity of quaternion multiplication, there is no very close relation between left and right eigenvalues [210]. Compared with the left eigenvalue problem, the right eigenvalue problem has been well studied, since right eigenvalues are invariant under the similarity transformation, and so are more useful [210]. If A is a quaternion Hermitian matrix, then A is unitary similarity to a real diagonal matrix, thus the methods for solving eigenvalue problem of real or complex matrices can be directly generalized to solve quaternion Hermitian right eigenvalue problem. On the contrary, if A is a non-Hermitian matrix, the situation is quite different and more complicated, the computational methods of real or complex matrix eigenvalue problem can not be directly extended to the quaternion non-Hermitian matrix, and the working on computing right eigenvalues of quaternion matrices is much more difficult than that of real or complex matrices. In this chapter, we discuss numerical computations for right eigenvalue problems of quaternion Hermitian matrices and quaternion non-Hermitian matrices. In §6.1, we generalize the power method and the inverse power method to quaternion Hermitian right eigenvalue problem and propose the real structurepreserving algorithms of the power method and the inverse power method. We propose the real structure-preserving algorithm of Hermitian QR algorithm for
158
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
computing all right eigenvalues and associated right eigenvectors of a quaternion Hermitian matrix. We also propose several subspace methods and quaternion Jacobi method for quaternion Hermitian right eigenvalue problem. In §6.2, we propose the real structure-preserving algorithms of the power method and the inverse power method for quaternion non-Hermitian right eigenvalue problem. We also give the real structure-preserving algorithm of QR algorithm for right eigenvalue problem of quaternion non-Hermitian matrix. Suppose that the matrix A ∈ Qn×n . λ1 , λ2 , · · · , λn are standard right eigenvalues of A, and x1 , x2 , · · · , xn ∈ Qn are right eigenvectors associated with λ1 , λ2 , · · · , λn , respectively. Suppose that the right eigenvalues of A satisfy |λ1 | > |λ2 | ≥ · · · ≥ |λn |.
(6.0.1)
6.1. Quaternion Hermitian Right Eigenvalue Problem Suppose that A ∈ Qn×n is Hermitian, then all the right eigenvalues of A are real. The Hermitian right eigenvalue problem has nice properties and rich mathematical theory. In this section, we study several methods for solving quaternion Hermitian right eigenvalue problems.
6.1.1.
The Power Method and Inverse Power Method for Quaternion Hermitian Right Eigenvalue Problem
When A is large and sparse, the power method is useful and simple to evaluate the maximal norm eigenvalue and associated eigenvector. We now generalize the power method and the inverse power method used in complex eigenvalue problem to quaternion Hermitian right eigenvalue problem [120].
1. The Power Method for Quaternion Hermitian Right Eigenvalue Problem For Hermitian matrix A ∈ Qn×n , there exists a unitary matrix X such that X −1 AX = Λ = diag(λ1 , λ2 , · · · , λn ), in which λ1 , λ2 , · · · , λn ∈ R.
159
Computations of Quaternion Eigenvalue Problems
In analogy to the complex matrix, we propose the following iterative algorithm which is the power method. yk+1 = Auk , (6.1.1) yk+1 uk+1 = kyk+1 k2 , for k = 0, 1, 2, . . . until convergent, in which u0 ∈ Qn is any given initial vector with ku0 k2 = 1.
Theorem 6.1.1. Suppose that A ∈ Qn×n is Hermitian, and the right eigenvalues of A satisfy (6.0.1). For a unit quaternion vector u0 ∈ Qn which has a non-zero component in the direction of the right eigenvector x1 associated with the right eigenvalue λ1 , then the power method produces two sequences of quaternion vectors yk+1 and uk according to the formula (6.1.1). When k is sufficiently large, ˆ 1 ≡ uH yk+1 = λ1 (1 + O(| λ2 |2k )) ≈ λ1 , λ k λ1 λk α
uk = x1 |λk1||α1 | (1 + O(| λλ12 |k )), 1
xˆ1 ≡
x¯ (i )
(6.1.2)
1
x¯ (i ) uˆk = uk |x¯11 (i11 )|
≈ x1 β1 ,
in which β1 = |x¯11 (i11 )| , i1 is the position of element with the maximal norm in uk , therefore x1 β1 is the unit right eigenvector of A associated with λ1 . Proof. Let u0 ∈ Qn be a unit vector. Since the column vectors x1 , x2 , · · · , xn form a basis in Qn , we have n
u0 =
∑ x jα j,
j=1 n
k
uk =
A u0 = kAk u0 k2
n
∑ x j λk+1 j αj
∑ x j λkj α j
j=1 n
, yk+1 =
k ∑ x j λkj α j k2
j=1 n
.
k ∑ x j λkj α j k2
j=1
j=1
Since X = (x1 , x2 , · · · , xn ) is unitary, then n
uH k yk+1 = =
n n 2k+1 αj ∑ αH i λi j=1 i=1 n n k H k 2k ∑ αH ∑ αH i λi xi ∑ x j λ j α j i λi α j i=1 j=1 i=1 n λ α λ12k+1 |α1 |2 (1+ ∑ (( λ i )2k+1 | α i |2 ) 1 1 i=2 n 1 2 (1+ ∑ (( λi )2k | αi |2 ) λ2k |α | 1 1 λ1 α1 i=2 k+1 k H ∑ αH i λi xi ∑ x j λ j α j
i=1 n
=
= λ (1 + O(| λλ21 |2k )),
(6.1.3)
160
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
n
n
∑ x j λkj α j
uk =
∑ x j λkj α j
j=1 n
j=1
=
k ∑ x j λkj α j k2
s
j=1
n
=
n
2 ∑ λ2k j |α j |
j=1
|λk1 ||α1 |
(6.1.4)
λ
λk1 (x1 α1 + ∑ x j ( λ j )k α j ) 1 j=2 s n
λ
α
1+ ∑ ( λ j )2k | α j |2 ) j=2
1
1
=
λk α x1 |λk1||α1 | (1 + O(| λλ21 |k )). 1 1
Suppose that |uk(i1 )| = max{|uk (1)|, |uk(2)|, · · · , |uk(n)|}, then uk (i1 ) ≈ x1 (i1 )
λk1 α1 . |λk1 ||α1 |
u¯ (i )
Let xˆ1 ≡ uk |ukk (i11 )| , then kxˆ1 k2 = 1 and xˆ1 ≈ x1
¯ k x¯1 (i1 ) λk1 α1 α¯ 1 λ x¯1 (i1 ) 1 = x1 . k k |x¯1 (i1 )| |λ1 ||α1| |α1 ||λ1 | |x¯1 (i1 )|
Remark 6.1.1. 1. From Theorem 6.1.1, when k is sufficiently large, uH k yk+1 is a good approximation of λ1 , and uˆk is a good approximation of right eigenvector x1 associated with λ1 . Notice that uH k yk+1 =
uk Auk kuk k22
is the Rayleigh quotient. 2. Since we do not know the appropriate k, in the following algorithm, we set the outer iterations and inner iterations. In the inner iterations, we only compute yk+1 and uk according to (6.1.1), and we perform (6.1.2) after the end of each inner iteration to reduce computational amounts. In the following, we provide the numerical code of real structure-preserving algorithm for the power method for quaternion Hermitian right eigenvalue problem.
Computations of Quaternion Eigenvalue Problems
161
Algorithm 6.1.2. For a quaternion Hermitian matrix A = A1 + A2 i + A3 j + A4 k ∈ Qn×n , the input B is the first column block of the real representation matrix of A, i.e., ARc , u0c is the first column block of the real representation of the initial vector u0. M and N are the iterate numbers of the outer iterations and inner iterations, respectively, and tol is the termination parameter. The output d1 is the right eigenvalue with the maximum norm, and u is the first column block of real representation of the right eigenvector associated with d1. Function : [d1, u] = qPower(B, u0c, M, N,tol) A0 = B; n = size(A0, 2); d1 = 1; u = u0c; A1 = Real p(A0(1 : n, :), A0(n + 1 : 2 ∗ n, :), A0(2 ∗ n + 1 : 3 ∗ n, :), ... A0(3 ∗ n + 1 : 4 ∗ n, :)); fox i = 1 : M fox k = 1 : N y = A1 ∗ u; u = y/norm(y); end y = A1 ∗ u; d2 = u0 ∗ y; [mx, i1] = max(sum(([u(1 : n), u(1 + n : 2 ∗ n), u(1 + 2 ∗ n : 3 ∗ n), ... u(1 + 3 ∗ n : 4 ∗ n)] ∗ [u(1 : n), u(1 + n : 2 ∗ n), ... u(1 + 2 ∗ n : 3 ∗ n), u(1 + 3 ∗ n : 4 ∗ n)])0));% (1) uR = Real p(u(1 : n), u(n + 1 : 2 ∗ n), u(2 ∗ n + 1 : 3 ∗ n), u(3 ∗ n + 1 : 4 ∗ n)); u = uR ∗ [u(i1); −u(i1 + n); −u(i1 + 2 ∗ n); −u(i1 + 3 ∗ n)]/sqrt(mx); if norm(d2 − d1) < tol d1 = d2; S = i ∗ N; break end d1 = d2; end end
The assignment lines containing %(1) compute mx and i1, in which mx = max{|uk(1)|2 , |uk(2)|2, · · · , |uk(n)|2 } ≡ |uk(i1)|2.
162
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
2. The Inverse Power Method for Quaternion Hermitian Right Eigenvalue Problem The inverse power method can be used to compute the minimal norm right eigenvalue and associated right eigenvector. For an Hermitian matrix A ∈ Qn×n , suppose that λ1 , λ2 , · · · , λn are the right eigenvalues of A, and x1 , x2 , · · · , xn ∈ Qn are right eigenvectors associated with λ1 , λ2 , · · · , λn , respectively, and the right eigenvalues of A satisfy |λ1 | ≥ |λ2 | ≥ · · ·|λn−1| > |λn |.
(6.1.5)
−1 −1 −1 Then the right eigenvalues λ−1 satisfy 1 , λ2 , · · · , λn of A −1 −1 |λ−1 n | > |λn−1 | ≥ · · · ≥ |λ1 |.
So we can obtain λn and xn by implementing the power method on A−1 . In fact, we need not to compute A−1 in practice. We can adopt the following iterative algorithm Ayk+1 = uk , (6.1.6) k+1 , uk+1 = kyyk+1 k2 in which u0 ∈ Qn is any given initial vector with ku0 k2 = 1. In each iterative step, we need to compute a linear system Ayk+1 = uk . Since all the linear systems have the same coefficient matrix A, we can implement the pivoting LU decomposition (PLU) on A with the real structure-preserving Algorithm 2.2.4 in advance. Suppose that PA = LU, in which P ∈ Qn×n is a permutation matrix, L ∈ Qn×n is a unit lower triangular matrix and U ∈ Qn×n is an upper triangular matrix. Substitute A = PH LU into Ayk+1 = uk , we obtain LUyk+1 = Puk. Denote Uyk+1 = zk , then we only need to solve two triangular linear systems Lzk = Puk , (6.1.7) Uyk+1 = zk ,
and
uk+1 =
yk+1 kyk+1 k2
Computations of Quaternion Eigenvalue Problems
163
Table 6.1.1. The power method for Hermitian matrices n 20 40 60 80 100 120 140 160 180 200
Number of iterative steps 15 10 10 10 10 10 10 10 10 10
ˆ 1 k2 kAxˆ1 − xˆ1 λ 0.2458 ∗ 10−8 0.1276 ∗ 10−7 0.2843 ∗ 10−8 0.8710 ∗ 10−9 0.3735 ∗ 10−9 0.2380 ∗ 10−9 0.8805 ∗ 10−10 0.5822 ∗ 10−10 0.4415 ∗ 10−10 0.2525 ∗ 10−10
at each iterative step. Here we omit the algorithm for the inverse power method, since it is similar to the power method. Let µ ∈ R be approximation to the right eigenvalue λk , µ is referred to as a shift. Then λ1 − µ, · · · , λk − µ, · · · , λn − µ are right eigenvalues of A − µI, and since µ is much closer to λk than other right eigenvalues, λk − µ has the minimal norm. We can implement the inverse power method on A − µI to compute λk and xk . This is the shift inverse power method. We now provide a numerical example to illustrate the effectiveness of the above real structure-preserving algorithms for quaternion Hermitian right eigenvalue problem. For n = 20 : 20 : 200, and randomly obtained quaternion Hermitian matrices A ∈ Qn×n , we perform the real structure-preserving algorithms for the power method and the inverse power method for quaternion Hermitian right eigenvalue problem. We choose M = 300, N = 5, tol = 10−5 . We list numbers of ˆ n k2 of these iterative steps and computational errors kAxˆ1 − xˆ1 λˆ1 k2 , kAxˆn − xˆn λ algorithms in Table 6.1.1 and Table 6.1.2, respectively. From Table 6.1.1 and Table 6.1.2, we observe that the power method and the inverse power method are efficient.
164
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao Table 6.1.2. The inverse power method for Hermitian matrices n 10 20 30 40 50 60 70 80 90 100
6.1.2.
Number of iterative steps 10 15 15 75 20 20 10 65 20 10
ˆ n k2 kAxˆn − xˆn λ 0.8382 ∗ 10−13 0.6580 ∗ 10−8 0.1968 ∗ 10−7 0.9347 ∗ 10−5 0.7018 ∗ 10−7 0.2060 ∗ 10−5 0.4364 ∗ 10−9 0.9440 ∗ 10−5 0.1961 ∗ 10−5 0.3555 ∗ 10−11
Real Structure-Preserving Algorithm of Hermitian QR Algorithm for Hermitian Right Eigenvalue Problem
In this subsection, we propose a real structure-preserving algorithm of Hermitian QR algorithm to compute all right eigenvalues and associated right eigenvectors of a quaternion Hermitian matrix [83], in which at the first part of the tridiagonalization, we can implement the idea in [115] using the Householder based matrix H4 to speed up computations. From Theorems 1.8.5 and 1.8.6, we know that right eigenvalues λ1 , · · · , λn of A ∈ Qn×n can be obtained from 4n eigenvalues of AR ∈ R4n×4n λ1 , λ1 , λ1 , λ1 , · · · , λn , λn , λn , λn . Therefore, if we want to compute the right eigenvalues of a quaternion matrix A, we can deal with the eigenvalue problem of the 4n-by-4n real representation matrix AR . In general, that will cost more computational workload, computational time and storage space. But we find that all these costs will be greatly reduced if the structures of AR are fully considered. From Theorem 1.8.5, for a quaternion Hermitian matrix A, the right eigeninformation about AR can be obtained by computing the eigen-information of a real symmetric tridiagonal matrix D. By applying Theorem 1.8.1 and 1.8.5,
Computations of Quaternion Eigenvalue Problems
165
DR = V R AR (V H )R = (V H AV )R , in which V = W0 +W1 i +W2 j +W3 k ∈ Un . Then we have V H AV = D.
(6.1.8)
We call (6.1.8) the real tridiagonalization of the quaternion Hermitian matrix A. We formulate this result in the following theorem. Theorem 6.1.3. Let A ∈ Qn×n be Hermitian. Then there exists a quaternion matrix V ∈ Un such that V H AV = D, in which D is a real symmetric tridiagonal matrix. Furthermore, 1.
If (λ, x) is an eigenpair of D, then λ is a right eigenvalue of A and V H x is the associated right eigenvector;
2.
If D has a diagonalization D = Xdiag(λ1 , · · · , λn )X T , in which X ∈ R is an orthogonal matrix and λs ∈ R (s = 1, 2, · · · , n), then A has a diagonalization n×n
A = Zdiag(λ1 , · · · , λn )Z H , Z = V H X. By Theorem 6.1.3, we only need to handle an eigenvalue problem of a real symmetric tridiagonal matrix D. We now present a method for quaternion Hermitian right eigenvalue problem based on the real structure-preserving tridiagonalization of the real representation of A = A1 + A2 i + A3 j + A4 k ∈ Qn×n . The numerical code of real structure-preserving algorithm based on H4 for computing the right eigenvalues and associated right eigenvectors of a quaternion Hermitian matrix is as follows. Algorithm 6.1.4. For A = A1 +A2 i+A3 j+A4 k ∈ Qn×n , in which As ∈ Rn×n , s = 1, 2, 3, 4. The input AA is a 4n × n real matrix which is the first column block of AR . The output Λ is a n × n diagonal matrix whose diagonal elements are the right eigenvalues of A, and PP is a 4n × n matrix which is the first column block of PR satisfying A = PΛPH .
166
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao Function : [PP, Λ] = qHermeig(AA) n = size(AA, 2); B = A; P = zeros(4 ∗ n, n); P(1 : n, :) = eye(n); for t = 1 : n − 2 s = t + 1; if norm([B(s : n, t), B((n + s) : (2 ∗ n), t), B((2 ∗ n + s) : (3 ∗ n), t), ... B((3 ∗ n + s) : (4 ∗ n), t)]) > 0 [u, beta] = Householder1(B(s : n, t), B((n + s) : (2 ∗ n), t), ... B((2 ∗ n + s) : (3 ∗ n), t), B((3 ∗ n + s) : (4 ∗ n), t), n − s + 1); B([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], t : n) = ... B([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], t : n) − ... (beta ∗ u) ∗ (u0 ∗ B([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], t : n)); P([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], 1 : n) = ... P([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], 1 : n) − (beta ∗ u) ∗ ... (u0 ∗ P([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], 1 : n)); end Z(t : n, [s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n]) = ... [B(t : n, s : n), −B(t + n : 2 ∗ n, s : n), −B(t + 2 ∗ n : 3 ∗ n, s : n), ... −B(t + 3 ∗ n : 4 ∗ n, s : n)]; Z(t : n, [s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n]) = ... Z(t : n, [s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n])... −Z(t : n, [s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n]) ∗ u ∗ (beta ∗ u0); B([t : n, t + n : 2 ∗ n, t + 2 ∗ n : 3 ∗ n, t + 3 ∗ n : 4 ∗ n], s : n)... = [Z(t : n, s : n); −Z(t : n, s + n : 2 ∗ n); −Z(t : n, s + 2 ∗ n : 3 ∗ n); ... −Z(t : n, s + 3 ∗ n : 4 ∗ n)]; end for t = 1 : n − 2 s = t + 1; if norm([B(s, t), B(s + n, t), B(s + 2 ∗ n,t), B(s + 3 ∗ n,t)]) > 0 G = JRSGivens(B(s, t), B(s + n, t), B(s + 2 ∗ n,t), B(s + 3 ∗ n,t)); B([s, s + n, s + 2 ∗ n, s + 3 ∗ n],t : t + 2) = G0 ∗ B([s, s + n, s + 2 ∗ n, s + 3 ∗ n], ... t : t + 2); P([s, s + n, s + 2 ∗ n, s + 3 ∗ n], 1 : n) = G0 ∗ P([s, s + n, s + 2 ∗ n, s + 3 ∗ n], 1 : n); Z(t : t + 2, [s, s + n, s + 2 ∗ n, s + 3 ∗ n]) = [B(t : t + 2, s), −B(t + n : t + 2 + n, s), ... −B(t + 2 ∗ n : t + 2 + 2 ∗ n, s), −B(t + 3 ∗ n : t + 2 + 3 ∗ n, s)]; Z(t : t + 2, [s, s + n, s + 2 ∗ n, s + 3 ∗ n]) = Z(t : t + 2, [s, s + n, s + 2 ∗ n, ... s + 3 ∗ n]) ∗ G; B([t : t + 2, t + n : t + 2 + n, t + 2 ∗ n : t + 2 + 2 ∗ n, t + 3 ∗ n : t + 2 + 3 ∗ n], s) = ... [Z(t : t + 2, s); −Z(t : t + 2, s + n); −Z(t : t + 2, s + 2 ∗ n); −Z(t : t + 2, s + 3 ∗ n)]; end end
Computations of Quaternion Eigenvalue Problems
167
c = norm(B([n, 2 ∗ n, 3 ∗ n, 4 ∗ n], n − 1)); if c > 0 G = Real p(B(n, n − 1), B(2 ∗ n, n − 1), B(3 ∗ n, n − 1), B(4 ∗ n, n − 1))/c; P([n, 2 ∗ n, 3 ∗ n, 4 ∗ n], 1 : n) = G0 ∗ P([n, 2 ∗ n, 3 ∗ n, 4 ∗ n], 1 : n); B([n, 2 ∗ n, 3 ∗ n, 4 ∗ n], n − 1) = [c; 0; 0; 0]; B([n − 1, 2 ∗ n − 1, 3 ∗ n − 1, 4 ∗ n − 1], n) = [c; 0; 0; 0]; end D = B(1 : n, 1 : n) [U, Λ] = eig(D); P = [U 0 ∗ P(1 : n, :);U 0 ∗ P(n + 1 : 2 ∗ n, :);U 0 ∗ P(2 ∗ n + 1 : 3 ∗ n, :); ... U 0 ∗ P(3 ∗ n + 1 : 4 ∗ n, :)]; P = [P(1 : n, :)0 ; −P(n + 1 : 2 ∗ n, :)0; −P(2 ∗ n + 1 : 3 ∗ n, :)0; ... −P(3 ∗ n + 1 : 4 ∗ n, :)0]; end
In Algorithm 6.1.4, we emphasize that the process contains real structurepreserving tridiagonalization of the real representation of a quaternion matrix, and compute the right eigenvalues and associated right eigenvectors of the real symmetric tridiagonal matrix D by the function 0 eig0 in MATLAB. This algorithm is strongly backward stable, since in every step the structures of AR are preserved and all transforming matrices are orthogonal. Now we present a numerical example, in which we apply Algorithm 6.1.4 to solve the right eigenvalue problem of a quaternion Hermitian linear operator from Quaternion Quantum Mechanics. The tolerance is taken to be tol = 10−15 . Example 6.1.1. For given quaternion Hermitian matrix OQ ∈ Qn×n
OQ =
3
−2i
2i
3
j
2i .. . .. . .. . ···
2k 0 .. . 0
−j
−2k .. . −2i .. .. . . .. .. . . .. .. . . .. .. . . 0 2k
0 ··· 0 .. .. .. . . . .. .. . . 0 .. .. . . −2k , .. . −2i − j 2i 3 −2i j
2i
3
when n = 5, we compute the right eigenvalues and associated right eigenvectors of OQ by Algorithm 6.1.4.
168
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Solution. Since OQ is Hermitian, by Algorithm 6.1.4 we get Λ = {−1.2426, 0.0000, 3.0000, 6.0000, 7.2426}, V = V1 +V2 i +V3 j +V4 k, in which V1 =
0.3557 0.4969 0.5031 0.4969 −0.3557
V3 =
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
−0.4943 0.1988 0.3926 −0.0994 0.1998
0.0936 −0.0994 −0.4908 0.1988 −0.4617
0 0 0 0 0
0 0 0 0 0
T 0.1996 −0.3975 0.5031 −0.3975 −0.1996
0.2945 !T −0.2981 −0.0000 0.2981 0.2945
,
, V2 =
V4 =
0 0 0 0 0
0 00 0 0
−0.5895 −0.1739 0.0245 0.5714 0.0005
−0.0647 0.2236 −0.2209 −0.2236 −0.3770
0 0 0 0 0
0 0 0 0 0
0.3118 −0.3230 0.0245 0.2733 0.2771
0.2119 0.5217 −0.2209 −0.0745 0.5243
In fact OQ is similar to a real symmetric tridiagonal matrix ! 3.0000 3.0000 0 0 0 D=
3.0000 3.0000 2.1344 0 0 0 2.1344 3.0000 2.1082 0 0 0 2.1082 3.0000 3.0000 0 0 0 3.0000 3.0000
0 !T 0 0 0 0
0 T 0 0 0 0
,
.
.
Now we obtain max{min |λ − eλ|} = 1.998401444325282e − 015, eλ∈Λ e
in which
λ∈Λ
min{min |λ − eλ|} = 2.220446049250313e − 016, eλ∈Λ e λ∈Λ
√ √ e = {3 − 3 2, 0, 3, 6, 3 + 3 2} Λ
is the set of all explicit right eigenvalues of OQ . Above results show that our algorithm is reliable.
6.1.3.
Real Structure-Preserving Algorithm of the Jacobi Method for Hermitian Right Eigenvalue Problem
A quaternion Hermitian matrix can be reduced to a diagonal matrix by unitary similarity transformations. The quaternion Jacobi method exploits the Hermitian property and chooses suitable rotations to reduce an Hermitian matrix to
169
Computations of Quaternion Eigenvalue Problems
a diagonal form. The Jacobi method is usually much slower than the Hermitian QR algorithm. However, the calculation precision is much better than the QR algorithm when computing the small right eigenvalues. What is more, the quaternion Jacobi method is capable of programming and inherently parallelism recognized. We now introduce its basic idea. Let A = (ai j ) ∈ Qn×n be an Hermitian matrix and 1/2 n n 1/2 n off(A) ≡ kAk2F − ∑ a2ii = ∑ ∑ a2i j . i=1
i=1
j=1
j6=i
The idea of the Jacobi method is to systematically reduce off(A) to be zero. The basic tool for doing this is called the Jacobi rotation defined as follows
J(p, q, θ) = In +
ep eq
cr − 1 s −s¯ cr − 1
eTp eTq
∈ Qn×n
in which p < q and ek denotes the kth unit vector. Let B = J H AJ, we want B pq = Bqp = 0. We observe that the updated matrix B agree with A except the pth row (column) and the qth row (column). We also note that the Frobenius norm is unitary invariant. We then get |a pp |2 + |aqq |2 + 2|a pq |2 = |b pp |2 + |bqq |2 . Thus n
off(B)2 = kBk2F − ∑ b2ii
i=1 n
= kAk2F −
=
∑
∑ a2ii − |a pp |2 − |aqq|2 + |b pp |2 + |bqq|2
i=1 n kAk2F − a2ii + |a pp |2 + |aqq |2 − |b pp |2 − |bqq |2 i=1 2 off(A) − 2|a pq |2 .
= , a pp , aqq ∈ R, a pq , aqp ∈ Q with a¯qp = a pq .
(6.1.9)
a pp a pq aqp aqq In [127], the authors proposed the real structure-preserving Jacobi algorithm for solving the quaternion Hermitian right eigenvalue problem. The real For
170
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
structure-preserving algorithm works by performing a series of orthogonal JRSsymplectic transformations AR ← (GR )T AR GR with the property that each updated AR is full but more approximating to a diagonal form than the previous one. Eventually, the off-diagonal entries of the updated AR are small enough and the diagonal entries can be seen as the eigenvalues of AR numerically. In this subsection, we introduce the real structure-preserving algorithm of the Jacobi method. We now derive a quaternion Givens matrix for eliminating off-diagonal elements of a 2 − by − 2 quaternion matrix. Implementation for obtaining the real representation matrix of this quaternion Givens matrix is much simpler than that of orthogonal JRS-symplectic Jacobi rotation obtained in [127]. Now we derive a quaternion Givens matrix, with which a 2-by-2 quaternion Hermitian matrix can be transformed to a real diagonal matrix. c11 c12 Theorem 6.1.5. Suppose C = is a 2-by-2 quaternion Hermitian c21 c22 matrix, in which c11 , c22 ∈ R, and c21 = c12 6= 0. Then there exists a 2-by-2 quaternion unitary matrix c −su12 G= (6.1.10) su12 c in which c, s ∈ R, c2 + s2 = 1, cs 6= 0 and u12 = |cc12 . Let t = 12 | s p 1 t c= + √ , s = 1 − c2 , 2 2 4 + t2 and if t < 0, s= Then we have
s
If t ≥ 0,
p 1 t − √ , c = 1 − s2 . 2 2 4 + t2 H
G CG = with b pp , bqq ∈ R.
C11 −C22 |C12 | .
b pp 0 0 bqq
.
Proof. By the hypothesis, the 2-by-2 quaternion unitary matrix G can be written as follows c su12 G= su21 cu22
Computations of Quaternion Eigenvalue Problems
171
with c, s ∈ R, c2 + s2 = 1, cs 6= 0 and |ui j | = 1, i, j = 1, 2. Since the two columns of G are mutually perpendicular, we have H c su12 = 0, su21 cu22 therefore cs(u12 + u¯21 u22 ) = 0, and so u12 = −u¯21 u22 . From (GH CG)12 = (GH CG)21 = 0, we obtain c2C12 u22 + s2 u¯21C21 u12 + cs(C11 u12 + u¯21C22 u22 ) = 0. In the above equality we substitute u12 = −u¯21 u22 to obatain [c2C12 − s2 u¯21C21 u¯21 + cs(C22 −C11 )u¯21 ]u22 = 0. Since |ui j | = 1, then c2C12 − s2 u¯21C21 u¯21 + cs(C22 −C11 )u¯21 = 0. Right-multiplying u21 in both sides of the above equality, we obtain c2C12 u21 − s2 u¯21C21 + cs(C22 −C11 ) = 0. We can take simple forms of u21 and u22 as u21 =
C¯12 , u22 = 1, |C12 |
such that the above equality becomes (c2 − s2 )|C12 | + cs(C22 −C11 ) = 0, and so
c2 − s2 C11 −C22 = ≡ t. cs |C12 |
For stability, if t ≥ 0, we take s p 1 t c= + √ , s = 1 − c2 , 2 2 4 + t2 and if t < 0, we take s=
s
p 1 t − √ , c = 1 − s2 , 2 2 4 + t2
172
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
then we have H
G CG =
b pp 0 0 bqq
.
with b pp , bqq ∈ R. Based on Theorem 6.1.5, we present the following numerical code for generating a generalized orthogonal JRS-symplectic Jacobi rotation for a 2-by-2 quaternion Hermitian matrix. Algorithm 6.1.6. Given a 2-by-2 quaternion Hermitian matrix C = c11 c12 , in which c11, c22 ∈ R, and c21 = c12 6= 0. This algorithm comc21 c22 putes the real representation GR of a 2-by-2 quaternion unitary Givens matrix G as (6.1.10) such that GH CG is a diagonal matrix. function GR = qqGivens(c11, c22, c12Rc ) t = (c11 − c22)/norm(c12Rc ); if t >= 0 c = sqrt((1 + t/sqrt(4 + t ∗ t))/2); s = sqrt(1 − c ∗ c); else s = sqrt((1 − t/(2 ∗ sqrt(4 +t ∗ t))/2); c = sqrt(1 − s ∗ s); u12 = c12Rc /norm(c12Rc); u12 = −s ∗ c12Rc ; u21 = (−u12(1); u12(2); u12(3); u12(4)); GR = Real p([c, u12(1); u21(1), c], [0, u12(2); u21(2), 0], [0, u12(3); u21(3), 0], ... [0, u12(4); u21(4), 0]); end
Now we present the numerical code of the real structure-preserving Jacobi method for any n-by-n quaternion Hermitian matrix A. Algorithm 6.1.7. (Cyclic real structure-preserving Jacobi) Given an n-byn quaternion Hermitian matrix A = A1 + A2 i + A3 j + A4 k ∈ Qn×n and a tolerance tol > 0, this algorithm overlaps the real representation matrix AR by (GR )T AR GR , in which GR is a generalized JRS-symplectic rotation and off((GR)T AR GR ) ≤ tol · kARkF . V = (In )R , ζ = tol · kARkF while off(AR ) > tol
Computations of Quaternion Eigenvalue Problems
173
for p = 1 : n − 1 for q = p + 1 : n J = qqGivens(A pp , Aqq , (A pq )Rc ); A([p, q], :)R = J T ∗ A([p, q], :)R; A(:, [p, q])R = A(:, [p, q])R ∗ J; V = V GR (p, q, θ); end for end for end while
We demonstrate the efficiency of the real structure-preserving Jacobi algorithm by two numerical examples. Example 6.1.2. Let A be a 5-by-5 quaternion Hermitian matrix: 3 −2i − j −2k 0 2i 3 −2i − j −2k A= j 2i 3 −2i − j . 2k j 2i 3 −2i 0 2k j 2i 3
In this experiment, we apply the cyclic real structure-preserving Jacobi method (Algorithm 6.1.7) to compute the eigenvalues of A and compare them with the √ √ ˜ = {3 − 2 2, 0, 3, 6, 3 + 2 2}. This is a toy example to show the explicit ones Λ validity and reliability of our algorithm. Using Algorithm 6.1.7 by five sweeps, we get the numerical results (AR )(5) with −1.2426 0 0 0 0 0 7.2426 0 0 0 (5) , A1 = 0 0 0.0000 0 0 0 0 0 6.0000 0 0 0 0 0 3.0000 (5)
(5)
(5)
A2 = A3 = A4 = 05×5 .
(5)
The diagonal entries of A1 are computed eigenvalues of A, collected in the set Λ. Since ˜ = 7.1054e − 015, min{min |λ − λ|} ˜ = 1.72201e − 016, max{min |λ − λ|} ˜ Λ ˜ λ∈
λ∈Λ
˜ Λ ˜ λ∈Λ λ∈
the errors of the computed eigenvalues have a upper bound 7.1054e − 015, which is smaller than the given tol = 10−14 .
174
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao 700 Classical complex Jacobi method Structure−preserving method
600
CPU time(second)
500
400
300
200
100
0
0
20
40
60 80 100 Dimension n=10:160
120
140
160
Figure 6.1.1. CPU times for computing eigenvalues and eigenvectors. Example 6.1.3. In this experiment, we deal with the diagonalization of 150 random quaternion Hermitian matrices, which dimension are n = 11 : 160 and figure out the CPU times by utilizing Algorithm 6.1.7 and the classical complex Jacobi algorithm in [18]. The CPU times are shown in Figure 6.1.1, in which the star line and the dot line show the CPU times costed by the classical complex Jacobi algorithm in [18] and Algorithm 6.1.7, respectively. From Figure 6.1.1, one can see that Algorithm 6.1.7 is faster than the classical complex Jacobi method when n is large, and this advantage becomes more obvious as the dimensions of quaternion matrices become larger. Meantime, Algorithm 6.1.7 and the classical complex Jacobi algorithm almost have same calculation accuracy. Taking n = 100 for an example, the computed eigenvalues of Algorithm 6.1.7 and the classical complex Jacobi algorithm, shown in Figure 6.1.2, are almost equal to each other. These results indicate that Algorithm 6.1.7 is superior to the classical complex Jacobi algorithm under the same high accuracy.
Computations of Quaternion Eigenvalue Problems
175
30 Structure−preserving method Classical complex Jacobi method 20
Eigenvalues
10
0
−10
−20
−30
0
20
40 60 Dimension n=100
80
100
Figure 6.1.2. Eigenvalues in descending order.
6.1.4.
Subspace Methods
Suppose A ∈ Qn×n is large and sparse, the subspace methods are more practical to compute quaternion Hermitian right eigenvalue problem. In this subsection, we propose several subspace methods such as Rayleigh-Ritz projection method and Lanczos method.
1. The Rayleigh-Ritz Projection Method We can compute a few right eigenvalues and associated right eigenvectors using Rayleigh-Ritz projection method. Let A = AH ∈ Qn×n , H is a subspace of Qn with dimension m. We hope to obtain some approximate right eigenvalues and associated right eigenvectors via H. The basic idea is choosing µ ∈ R and u ∈ H such that (Au − uµ)⊥H (6.1.11) Suppose that v1 , v2 , · · · , vm is an orthonormal basis of H, let V = (v1 , v2 , · · · , vm ).
176
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
Then u can be expressed as u = V y, in which y ∈ Qm . Thus the equivalent form of (6.1.11) is V H (AV y −V yµ) = 0 i.e., Bmy = yµ, in which Bm = V H AV is exactly the Rayleigh quotient of A about V . For every (µ, y), µ is called the Ritz value, and u = V y is called the Ritz vector. The right eigenvalue problem of A is converted into computing right eigenvalues of Bm . In general, m n, Bm is a quaternion Hermitian matrix of small order. We have several efficient numerical methods to solve this eigenvalue problem of Bm and with this method, we can compute approximate right eigenvalues and associated right eigenvectors of A. The following is its basic process. Algorithm 6.1.8. (The Rayleigh-Ritz projection method) For a quaternion Hermitian matrix A ∈ Qn×n and q1 ∈ Qn , kq1 k = 1, the following algorithm computes (µ j , u j ) as the estimate of a few right eigenvalues and associated right eigenvectors of A. Step 1. Compute an orthonormal basis v1 , v2 , · · · , vm of H, denote V = (v1 , v2 , · · · , vm ); Step 2. Compute the Rayleigh quotient Bm = V H AV ; Step 3. Compute right eigenvalues of Bm , and choose some needed as approximate right eigenvalues µ1 , µ2 , · · · , µk ; Step 4. Compute right eigenvectors y j associated with µ j , and form Ritz vectors u j = V y j , j = 1, 2, · · · , k.
2. The Hermitian Lanczos Method For a quaternion Hermitian matrix A ∈ Qn×n , the Lanczos method can be used to compute a few of its largest norm and/or smallest norm right eigenvalues, which are called extremal right eigenvalues. The method generates a sequence of tridiagonal matrices Tk with the property that the extremal right eigenvalues of Tk are progressively better estimates of those of A. Algorithm 6.1.9. (Lanczos method) For a quaternion Hermitian matrix A ∈ Qn×n and q1 ∈ Qn , kq1 k = 1, the following algorithm compute (µ j , u j ) as the estimate of its extremal right eigenvalues and associated right eigenvectors.
Computations of Quaternion Eigenvalue Problems
177
Step 1. Compute a matrix Qk = (q1 , · · · , qk) with orthonormal columns and a tridiagonal matrix Tk ∈ Qk×k so that Tk = QH k AQk . The diagonal and superdiagonal entries of Tk are α1 , · · · , αk and β1 , · · · , βk−1 , respectively; Step 2. Compute right eigenvalues of Tk , select a few right eigenvalues µ j , j = 1, 2, · · · , l, which satisfy some conditions; Step 3. Compute right eigenvectors y j associated with µ j , and the Ritz vectors u j = Qk y j , j = 1, 2, · · · , l; Step 4. Examine whether |βk ||eTk y j | satisfy conditions, if not, let k = k + 1, return to Step 1.
6.2.
Quaternion Non-Hermitian Right Eigenvalue Problem
In this section, we propose the power method, the inverse power method and the quaternion QR method for quaternion non-Hermitian right eigenvalue problem.
6.2.1.
The Power Method and the Inverse Power Method
1. The Power Method for Quaternion Non-Hermitian Right Eigenvalue Problem In this subsection, we discuss the power method for quaternion non-Hermitian right eigenvalue problem [120]. For given initial vector u0 ∈ Qn with ku0 k2 = 1, we propose the following iterative algorithm which is similar to that of quaternion Hermitian right eigenvalue problem yk+1 = Auk , (6.2.1) k+1 uk+1 = kyyk+1 . k2 We have the following result.
Theorem 6.2.1. Suppose that A ∈ Qn×n is non-Hermitian, and the standard right eigenvaluesof A satisfy (6.0.1). Given a unit quaternion vector u0 ∈ Qn which has a non-zero component α1 in the direction of the right eigenvector x1 associated with the right eigenvalues λ1 , Jordan canonical decomposition of A is as in (1.4.1), (1.4.2) and (1.4.3). Then the power method produces two
178
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
sequences of quaternion vectors yk+1 and uk according to the formula (6.2.1). When k is sufficiently large, x1 λk1 α1 (1 + O(| λλ12 |k kn1 (λ2 )−1 )), kx1 λk1 α1 k2 x λk+1 α yk+1 = kx1 λ1k α k1 (1 + O(| λλ12 |k kn1 (λ2 )−1 )). 1 1 1 2
uk =
(6.2.2)
Suppose that |uk (i1 )| = max{|uk(1)|, |uk(2)|, · · · , |uk (n)|}, then we have yk+1 (i1 )uk (i1 )−1 = x1 (i1 )λ1 x1 (i1 )−1 (1 + O(| λλ2 |k kn1 (λ2 )−1 )) ≈ x1 (i1 )λ1 x1 (i1 )−1 , 1
uk uk (i1 )−1 = x1 x1 (i1 )−1 (1 + O(| λλ2 |k kn1 (λ2 )−1 )) ≈ x1 x1 (i1 )−1 , 1 uk uk (i1 )−1 /|uk (i1 )−1 | = uk u¯k (i1 )/|uk (i1 )|.
(6.2.3)
Suppose that yk+1(i1 )uk (i1 )−1 = a + bi + cj + dk, a, b, c, d ∈ R, we choose p z z = b + b2 + c2 + d 2 + dj − ck, and v = , |z| then
√ λ1 ≈ vyk+1 (i1 )uk (i1 )−1 v¯ = a + b2 + c2 + d 2 i.
(6.2.4)
ˆ 1 ≡ vyk+1 (i1 )uk (i1 )−1 v¯ is a good approximation of the standard right Therefore λ eigenvalue λ1 , and xˆ1 ≡ uk u¯k (i1 )v/|u ¯ k(i1 )| is a good approximation of the unit right eigenvector x1 associated with the standard right eigenvalue λ1 . Proof. Let u0 ∈ Qn be an initial unit vector. x1 , x2 , · · · , xn form a basis in Qn , we have
Since the column vectors
n
u0 =
∑ x jα j,
j=1
From Theorem 1.4.7, A has Jordan canonical form X −1 AX = J as in (1.4.1)– (1.4.3). Suppose that u0 = Xη. Partition X and η as X = (x1 , X2 (λ2 ), · · · , Xr (λr )) and η = (α1 , ηT2 (λ2 ), · · · , ηTr (λr ))T ,
Computations of Quaternion Eigenvalue Problems
179
respectively, in which the partitions of X and η are conforming with that of J as in (1.4.1). When k is sufficiently large, we have r
uk =
Ak u
x1 λk1 α1 + ∑ X j (λ j )J j (λ j )η j (λ j ) 0
kAk u0 k2
=
j=2 r
kx1 λk1 α1 + ∑ X j (λ j )J j (λ j )η j (λ j )k2 j=2
r
=
(x1 λk1 α1 + ∑ X j (λ j )J j (λ j )η j (λ j ))/|λ1 |k k(x1λk1 α1 +
j=2 r
∑ X j (λ j )J j (λ j )η j (λ j )/|λ1 |k )k2
.
j=2
Suppose that in (1.4.3), (l)
in which N =
0
1 .. .
Ji (λi ) = λi I + N,
. Notice that .. . 1 0 ..
(l) (Ji (λi ))k
.
k
k
= (λi I + N) = ∑
t=0
k t
(λi )t N k−t ,
and so for k ≥ nl (λi ) and λi 6= 0, √ (l) (l) nl (λi )kJi (λi )k k∞ k(Ji (λi ))k k2 ≤ k |λ1 | |λ1 |k nl (λi )−1 nl (λi )−1 p k |λi |k−t |λi |k−t nl (λi )−1 ≤ nl (λi ) ∑ ≤ k ∑ k |λ1 | |λ1 |k t t=0 t=0 nl (λi )−1 p = nl (λi )knl (λi )−1 | λλ1i |k ∑ |λ1i |t t=0
= cknl (λi )−1 | λλ1i |k ,
in which c=
Therefore, we have uk =
p 1−| λ1 |nl (λi )−1 i , |λi| = 6 1 nl (λi ) 1− 1
|λi |
3
nl (λi) 2 , |λi | = 1.
k x1 λk1 α1 n1 (λ2 )−1 |λ2 | )) (1 + O(k k |λ1 |k kx1λ1 α1 k2
≈
x1 λk1 α1 . kx1 λk1 α1 k2
180
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao Similarly, we have yk+1 =
k x1 λk+1 n1 (λ2 )−1 |λ2 | )) 1 α1 (1 + O(k k |λ1 |k kx1λ1 α1 k2
≈
x1 λk+1 1 α1 . kx1 λk1 α1 k2
Suppose that |uk(i1 )| = max{|uk (1)|, |uk(2)|, · · · , |uk(n)|}, then we have yk+1 (i1 ) ≈
x1 (i1 )λk+1 1 α1 , kx1 λk1 α1 k2
uk (i1 ) ≈
x1 (i1 )λk1 α1 , kx1λk1 α1 k2
and so yk+1 (i1 )uk(i1 )−1 ≈ x1 (i1 )λ1 x1 (i1 )−1 ,
uk uk (i1 )−1 ≈ x1 x1 (i1)−1 .
(6.2.5)
Obviously, x1 (i1 )λ1 x1 (i1 )−1 ∈ [λ1 ] is a right eigenvalue of A and x|x1 x1¯(i1 (i1 1)|) is associated right eigenvector. Suppose that yk+1 (i1 )uk(i1 )−1 = a+bi+cj+dk, a, b, c, d ∈ R, from Lemma 1.2.2, we can choose v = |z|z with p z = b + b2 + c2 + d 2 + dj − ck (6.2.6)
such that
ˆ 1 v¯ = a + vλ x x¯ (i )
p
b2 + c2 + d 2 i ≈ λ1 .
(6.2.7)
Denote x˜1 = |x1 1 1(i1 1)| v, ¯ then x˜1 is a good approximation of the unit right eigenvector of A associated with λ1 . We call the above algorithm as the power method for the quaternion non-Hermitian right eigenvalue problem. When A is a non-Hermitian diagonalizable matrix, then with the power method mentioned in (6.2.1) and (6.2.2), we have a better estimation, as mentioned below. Corollary 6.2.2. Under the conditions and procedures in Theorem 6.2.1, if furthermore, A is diagonalizable, when k is sufficiently large, x1 λk1 α1 (1 + O(| λλ12 |k )), kx1 λk1 α1 k2 x λk+1 α yk+1 = kx1 λ1k α k1 (1 + O(| λλ21 |k )). 1 1 1 2
uk =
Suppose that |uk (i1 )| = max{|uk(1)|, |uk(2)|, · · · , |uk (n)|}, then we have
(6.2.8)
yk+1 (i1 )uk(i1 )−1 = x1 (i1 )λ1 x1 (i1 )−1 (1 + O(| λλ12 |k )) ≈ x1 (i1 )λ1 x1 (i1 )−1 , uk uk (i1 )−1 = x1 x1 (i1 )−1 (1 + O(| λλ21 |k )) ≈ x1 x1 (i1 )−1 , uk uk (i1 )−1 /|uk(i1 )−1 | = uk u¯k (i1 )/|uk(i1 )|. (6.2.9)
Computations of Quaternion Eigenvalue Problems
181
Suppose that yk+1(i1 )uk (i1 )−1 = a + bi + cj + dk, a, b, c, d ∈ R, we choose p z z = b + b2 + c2 + d 2 + dj − ck, and v = , |z| then
√ λ1 ≈ vyk+1 (i1 )uk (i1 )−1 v¯ = a + b2 + c2 + d 2 i.
(6.2.10)
ˆ 1 ≡ vyk+1 (i1 )uk (i1 )−1 v¯ is a good approximation of the standard right Therefore λ eigenvalue λ1 , and xˆ1 ≡ uk u¯k (i1 )v/|u ¯ k(i1 )| is a good approximation of the unit right eigenvector x1 associated with the standard right eigenvalue λ1 . Now we provide the numerical code for the real structure-preserving algorithm of the power method for quaternion non-Hermitian right eigenvalue problem. Algorithm 6.2.3. For a given matrix A = A1 + A2 i + A3 j + A4 k ∈ Qm×m , the input B is the first column block of the real representation of A, i.e. ARc , u0c is the first column block of the real representation of the initial vector u0. M and N are the iterate numbers of the outer iteration and inner iteration, respectively, and tol is the terminate parameter. Function : [d1,u] = qPower(B,u0c,M,N,tol) A0 = B; n = size(A0,2); d1 = [0;0;0;0]; u = u0c; A1 = Real p(A0(1 : n,:),A0(n + 1 : 2 ∗ n,:),A0(2 ∗ n + 1 : 3 ∗ n,:),... A0(3 ∗ n + 1 : 4 ∗ n,:)); for i = 1 : M for k = 1 : N y = A1 ∗ u; u = y/norm(y); end y = A1 ∗ u; [mx,i1] = max(sum(([u(1 : n),u(1 + n : 2 ∗ n),u(1 + 2 ∗ n : 3 ∗ n),... u(1 + 3 ∗ n : 4 ∗ n)] ∗ [u(1 : n),u(1 + n : 2 ∗ n),u(1 + 2 ∗ n : 3 ∗ n),... u(1 + 3 ∗ n : 4 ∗ n)])0));% (1) d2 = Real p(y(i1),y(i1 + n),y(i1 + 2 ∗ n),y(i1 + 3 ∗ n)) ∗ ... [u(i1);−u(i1 + n);−u(i1 + 2 ∗ n);−u(i1 + 3 ∗ n)]/mx; u = Real p(u(1 : n),u(1 + n : 2 ∗ n),u(1 + 2 ∗ n : 3 ∗ n),u(1 + 3 ∗ n : 4 ∗ n))... ∗[u(i1);−u(i1 + n);−u(i1 + 2 ∗ n);−u(i1 + 3 ∗ n)]/sqrt(mx); z = [d2(2) + sqrt(d2(2) ∗ d2(2) + d2(3) ∗ d2(3) + d2(4) ∗ d2(4));0;d2(4);−d2(3)]; v = z/norm(z);
182
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao d2 = Real p(v(1),v(2),v(3),v(4)) ∗ (Real p(d2(1),d2(2),d2(3),d2(4))... ∗[v(1);−v(2);−v(3);−v(4)]); u = Real p(u(1 : n),u(n + 1 : 2 ∗ n),u(2 ∗ n + 1 : 3 ∗ n),u(3 ∗ n + 1 : 4 ∗ n))... ∗[v(1);−v(2);−v(3);−v(4)]; if norm(d2 − d1) < tol d1 = d2; S = i ∗ N; break end d1 = d2; end end
The assignment lines containing %(1) compute mx and i1, in which mx = max{|uk(1)|2 , |uk(2)|2, · · · , |uk(n)|2 } ≡ |uk(i1)|2.
2. The Inverse Power Method for Quaternion Non-Hermitian Right Eigenvalue Problem The inverse power method can be used to compute the standard right eigenvalue with the minimal norm and associated right eigenvector. For non-Hermitian matrix A ∈ Qn×n , suppose that λ1 , λ2 , · · · , λn are the standard right eigenvalues of A, and x1 , x2 , · · · , xn ∈ Qn are the right eigenvectors associated with λ1 , λ2 , · · · , λn , respectively, and the standard right eigenvalues of A satisfy |λ1 | ≥ |λ2 | ≥ · · ·|λn−1| > |λn |.
(6.2.11)
−1 −1 −1 satisfy Then the eigenvalues λ−1 1 , λ2 , · · · , λn of A −1 −1 |λ−1 n | > |λn−1 | ≥ · · · ≥ |λ1 |.
So we can obtain λn and associated right eigenvector xn by implementing the power method on A−1 . In fact, we need not to compute A−1 in practice. We can adopt the following iterative algorithm Ayk+1 = uk , (6.2.12) k+1 uk+1 = kyyk+1 , k2
in which u0 ∈ Qn is any given initial vector with ku0 k2 = 1. At each iterative step, we need to compute a linear system Ayk+1 = uk . Since all the linear systems
Computations of Quaternion Eigenvalue Problems
183
have the same coefficient matrix A, we can implement the pivoting LU decomposition (PLU) on A with the real structure-preserving algorithm [118, 178] in advance. Suppose that PA = LU, in which P ∈ Qn×n is a permutation matrix, L ∈ Qn×n is a unit lower triangular matrix and U ∈ Qn×n is an upper triangular matrix. Substituting A = PH LU into Ayk+1 = uk , we obtain LUyk+1 = Puk. Denote Uyk+1 = zk , then we only need to solve two triangular linear systems Lzk = Puk , (6.2.13) Uyk+1 = zk ,
and
yk+1 kyk+1 k2 at each iterative step. Here we omit the algorithm of the inverse power method, since it is also similar to the power method. We do not know whether the shift inverse power method can be extended to quaternion non-Hermitian matrices. We now provide two numerical examples to illustrate the effectiveness of the above real structure-preserving algorithms for quaternion non-Hermitian right eigenvalue problem. uk+1 =
Example 1. Let quaternion matrix A = A1 + A2 i + A3 j + A4 k ∈ Q3×3 with
0.4309 2.3036 −1.2251 A1 = −1.5248 2.2281 1.7447 , A2 = −0.5634 0.5770 2.3410 0.7802 −1.5571 3.6028 A3 = 0.2742 2.4791 −0.7757 , A4 = 0.9590 2.1893 −2.1153
−0.0800 −4.5249 4.4569 −0.4619 −3.7069 6.1034 , −0.4263 −1.7585 3.3300 −0.1395 −0.1690 2.7813 −0.7953 0.9238 1.1998 . −0.4127 0.1302 0.4477
Three standard right eigenvalues of A are 3 + 4i, 1 + 2i and 1 + i, respectively. We choose M = 300, N = 5, tol = 10−5 , the initial vector 1 1 1 1 2 + 2i+ 2 j + 2k 1 1 . u0 = 2i+ 2k 1 1 2−2j
184
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao Table 6.2.3. The power method for non-Hermitian matrices n 20 40 60 80 100 120 140 160 180 200
Number of iterative steps 15 15 15 15 15 15 10 10 15 10
ˆ 1 k2 kAxˆ1 − xˆ1 λ 0.1426 ∗ 10−11 0.1443 ∗ 10−14 0.5950 ∗ 10−15 0.7493 ∗ 10−15 0.1106 ∗ 10−14 0.1633 ∗ 10−14 0.9129 ∗ 10−13 0.6065 ∗ 10−13 0.3020 ∗ 10−14 0.2933 ∗ 10−13
After 25 iterate steps, we obtain the standard right eigenvalue d = 3.0000 + 4.0000i which has the maximal norm, and associated right eigenvector u = u1 + u2 i + u3 j + u4 k with
0.0671 −0.0448 −0.0564 0.5886 u1 = 0.4789 , u2 = 0.0000 , u3 = 0.0404 , u4 = 0.4455 . 0.4266 0.0007 −0.0041 0.1803
After 35 iterate steps, we obtain the standard right eigenvalue d = 1.0000 + 1.0000i which has the minimal norm, and associated right eigenvector u = u1 + u2 i + u3 j + u4 k with
0.3779 −0.1853 0.0975 0.4260 u1 = 0.5988 , u2 = 0.0000 , u3 = 0.0397 , u4 = 0.1931 . 0.3451 0.0392 −0.1201 0.3153
Example 2. For n = 20 : 20 : 200, and randomly obtained quaternion nonHermitian matrices A ∈ Qn×n , we perform the real structure-preserving algorithms of the power method and the inverse power method for quaternion nonHermitian right eigenvalue problem. We choose M = 300, N = 5, tol = 10−5 . We list the numbers of iterative steps and computational errors kAxˆ1 − xˆ1 λˆ1 k2 , ˆ n k2 of these algorithms in Table 6.2.3 and Table 6.2.4, respectively. kAxˆn − xˆn λ From above two examples, we can see the power method and the inverse
Computations of Quaternion Eigenvalue Problems
185
Table 6.2.4. The inverse power method for non-Hermitian matrices n 10 20 30 40 50 60 70 80 90 100
Number of iterative steps 25 25 20 50 45 55 35 70 35 30
ˆ n k2 kAxˆn − xˆn λ 0.2751 ∗ 10−8 0.1860 ∗ 10−8 0.2749 ∗ 10−8 0.6721 ∗ 10−7 0.3512 ∗ 10−6 0.5257 ∗ 10−8 0.2022 ∗ 10−8 0.4024 ∗ 10−7 0.1070 ∗ 10−7 0.6833 ∗ 10−8
power method proposed for computing the maximal norm standard right eigenvalue and minimal norm standard right eigenvalue are reliable.
6.2.2.
The Quaternion QR Algorithm for Quaternion Non-Hermitian Right Eigenvalue Problem
In this subsection, we show how to extend the Francis QR algorithm to quaternion matrices [23]. The QR algorithm is the iterative method which makes full use of the unitary similarity relation between a quaternion non-Hermitian matrix and a quaternion tridiagonal matrix. Suppose that A ∈ Qn×n , from the Schur decomposition (Theorem 1.4.8), we know that there exist a quaternion unitary matrix U ∈ Un and a quaternion upper triangular matrix T ∈ Qn×n , such that U H AU = T. The QR algorithm reduces a quaternion matrix to a Schur-like triangular form through a sequence of quaternion unitary similarity transformations. And then the diagonal entries of T are representatives of right eigenvalues of A. The quaternion QR algorithm can be written as follows. Algorithm 6.2.4. H0 = U0 AU0H for k = 1, 2, · · · Factor Hk−1 = Qk Rk with unitary Qk and triangular Rk . Set Hk =: Rk Qk .
186
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
The shortcoming in Algorithm 6.2.4 is that if Hk−1 is full, each step requires a full QR decomposition costing O(n3 ) flops. In order to compute the Schur decomposition efficiently, we must carefully choose the initial unitary similarity U0 . Fortunately, if we choose U0 so that H0 is upper Hessenberg, then the amount of work in each iteration is reduced from O(n3 ) to O(n2 ). Furthermore, when Hk−1 = Qk Rk is upper Hessenberg, Hk =: Rk Qk is also an upper Hessenberg matrix. To reduce a quaternion upper Hessenberg matrix of order n into a quaternion upper triangular matrix, we need n − 1 unitary transformations based on the qGivens matrices (Algorithm 2.1.5). Therefore, if we want to compute the right eigenvalues of a quaternion non-Hermitian matrix A, we can deal with the right eigenvalue problem of the quaternion triangular matrix Hk . We now present the two steps of the triangularliaztion of A in the QR method for quaternion non-Hermitian right eigenvalue problem [84]. Step 1. In the process of upper Hessenberglization of a non-Hermitian matrix A ∈ Qn×n , we perform a sequence of left-multiplying and right-multiplying Householder based transformations on A, to obtain a quaternion upper Hessenberg matrix H0 as follows In−2 0 Is 0 1 0 H0 = . . . . . . A 0 H˜ n−2 0 H˜ s 0 H˜ 1 H H H 1 0 Is 0 In−2 0 × ... ... , 0 H˜ 1 0 H˜ s 0 H˜ n−2 in which H˜ s are n − s quaternion Householder matrices, s = 1, · · · , n − 2. We can perform Step 1 [84, 115] with the following numerical code of the real structure-preserving algorithm. Algorithm 6.2.5. Real structure-preserving method for Hessenberglization of a quaternion matrix A = A1 + A2 i + A3 j + A4 k ∈ Qn×n , in which As ∈ Rn×n , s = 1, 2, 3, 4. The input AA is a 4n × n real matrix which is the first column block of AR . The output H0 R is a 4n × n matrix which is the first column block of H0R , and PP is a 4n × n matrix which is the first column block of PR satisfying A = PH0 PH .
Computations of Quaternion Eigenvalue Problems
187
Function : [PP, H0R] = qHermeig(AA) n = size(AA, 2); B = A; P = zeros(4 ∗ n, n); P(1 : n, :) = eye(n); for t = 1 : n − 2 s = t + 1; if norm([B(s : n, t), B((n + s) : (2 ∗ n), t), B((2 ∗ n + s) : (3 ∗ n), t), ... B((3 ∗ n + s) : (4 ∗ n), t)]) > 0 [u, beta] = Householder1(B(s : n, t), B((n + s) : (2 ∗ n), t), ... B((2 ∗ n + s) : (3 ∗ n), t), B((3 ∗ n + s) : (4 ∗ n), t), n − s + 1); B([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], t : n) = ... B([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], t : n) − ... (beta ∗ u) ∗ (u0 ∗ B([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], t : n)); P([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], 1 : n) = ... P([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], 1 : n) − (beta ∗ u) ∗ ... (u0 ∗ P([s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n], 1 : n)); end Z(t : n, [s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n]) = ... [B(t : n, s : n), −B(t + n : 2 ∗ n, s : n), −B(t + 2 ∗ n : 3 ∗ n, s : n), ... −B(t + 3 ∗ n : 4 ∗ n, s : n)]; Z(t : n, [s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n]) = ... Z(t : n, [s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n])... −Z(t : n, [s : n, s + n : 2 ∗ n, s + 2 ∗ n : 3 ∗ n, s + 3 ∗ n : 4 ∗ n]) ∗ u ∗ (beta ∗ u0); B([t : n, t + n : 2 ∗ n, t + 2 ∗ n : 3 ∗ n, t + 3 ∗ n : 4 ∗ n], s : n)... = [Z(t : n, s : n); −Z(t : n, s + n : 2 ∗ n); −Z(t : n, s + 2 ∗ n : 3 ∗ n); ... −Z(t : n, s + 3 ∗ n : 4 ∗ n)]; H0 R = B; end
Step 2. Then we perform a sequence of iterations using qGivens rotations as described in Theorem 2.1.4 on the upper Hessenberg matrix H0 , such that the resulting matrix converts to an upper triangular matrix R as follows. It 0 0 ˜1 In−2 0 G 0 0 ··· R= · · · 0 G˜ t H0 0 G˜ n−1 0 In−2 0 0 In−t−2 H H H I 0 0 t G˜ 1 0 In−2 0 ˜ × · · · 0 Gt 0 ··· , 0 In−2 0 G˜ n−1 0 0 In−t−2
188
Musheng Wei, Ying Li, Fengxia Zhang and Jianli Zhao
in which Gt are 2 × 2 quaternion Givens matrices t = 1, · · · , n − 1. Step 2 can be performed with Algorithm 2.1.5. 3. Implicit Double Shift Trick To ensure rapid convergence of the quaternion QR algorithm, we need to shift the eigenvalue. Bunse-Gerstner, Byers and Mehrmann [23] pointed that the single-shift technique cannot choose any nonreal quaternion as the shift because of noncommunity of quaternions and directly proposed the implicitly double shift QR algorithm. They proposed the implicitly double shift QR algorithm directly. Algorithm 6.2.6 (Implicitly Double Shift Quaternion QR Algorithm [23]). Given a quaternion matrix A ∈ Qn×n , set A0 := U0∗ AU0 in which U0 is unitary chosen so that A0 is upper Hessenberg. For s = 0, 1, 2, . . . 1. Select an approximate eigenvalue µ ∈ Q.
2. Set Ak+1 := Q∗k Ak Qk in which Qk is unitary chosen so that Q∗k (A2k − (µ + µ¯ )Ak + µ¯µ)I is upper triangular.
In general, Mk = A2k −(µ+ µ)A ¯ k +µ¯µI can not be explained as (Ak −µI)(Ak − µI) ¯ when the shift µ is a nonreal quaternion number. How to calculate the real JRS−Schur form by the Francis QR step, we can refer to [84]. It is inconvenient to work with Mk directly. As the choice of shift varies, it becomes difficult to trace back to the original eigenvalues. Moreover, there is a well known danger of destructive rounding errors in forming the matrix product A2k . So, we must work with Mk implicitly by generalizing the usual Hessenberg form driven implicit shift trick [55]. For detailed analysis for the quaternion QR method, we refer to [23, 84].
References [1] Adler S. L., Quaternionic Quantum Mechanics and Quantum Fields, Oxford University Press, New York, 1995. [2] Adler S. L., Emch G. G., A rejoinder on quaternionic projective representations, J. Math. Phys. 38 (1997) 4758-4762. [3] Adler S. L., Scattering and decay theory for quaternionic quantum mechanics and the structure of induced T non-conservation, Phys. Rev. D 37 (1998) 3654-3662. [4] Anda A. A., Park H., Fast plane rotations with dynamic scaling. SIAM J. Matrix Anal. Appl., 15(1994) 162-174. [5] Apostol T. M., Mathematical Analysis, 2nd ed., Addison-Wesley, 1977. [6] Au-Yeung Y. H., On the convexity of numerical range in quaternionic Hilbert spaces, Linear and Multilinear Algebra, 16(1984) 93-100. [7] Au-Yeung Y. H., Cheng C. M., On the pure imaginary quaternionic solutions of the Hurwitz matrix equations, Linear Algebra Appl. 419(2006) 630-642. [8] Axelsson O., A generalized SSOR method. BIT, 12(1972) 443-467. [9] Axelsson O., A generalized conjugate gradient least squares method. Numer. Math., 51(1987) 209-227. [10] Axelsson O., Iterative Solution Methods. Cambridge: Cambridge Univ. Press, 1994.
190
References
[11] Bai Z., Demmel J. W., Computing the generalized singular value decomposition. SIAM J. Sci. Comput., 14(1993) 1464-1486. [12] Bai Z., Golub G. H., Ng M. K., Hermitian and skew-Hermitian splitting methods for non-Hermitian positive definite linear systems. SIAM J. Matrix Anal. Appl., 24(2003) 603-626. [13] Bai Z., Golub G. H., Pan J., Preconditioned Hermitian and skewHermitian splitting method for non-Hermitian positive semidefinite linear systems. Numer. Math., 98(2004) 1-32. [14] Baksalary J. K., Kala R., The matrix equation AY B − CZD = E. Linear Algebra Appl., 130(1987) 141-147. [19] Barlow J. L., Handy S. L., The direct solution of weighted and equality constrained least squares problems. SIAM J. Sci. Stat. Comput. 9(1988) 704-716. [24] Ben-Israel A., Greville T. N. E., Generalized Inverses: Theory and Applications. New York: John Wiley, 1974. [17] Benner P., Symplectic balancing of Hamiltonian matrices, SIAM J. Sci. Comput. 22(5)(2000) 1885-1904. [18] Bihan N. L., Sangwine S. J., Jacobi method for quaternion matrix singular value decomposition, Appl. Math. Comp., 187(2007) 1265–1271. [19] Bischof C. H., Hansen P. C., Structure preserving and rank-revealing QR factorizations. SIAM J. Sci. Statist. Comput., 12(1991) 1332-1350. ˚ Solving linear least squares problems by Gram-Schmidt or[20] Bj¨orck A. thogonalization. BIT, 7(1967) 1-21. ˚ Numerical Methods for Least Squares Problems. Philadelphia: [21] Bj¨orck A. SIAM, 1996. [22] Brenner J. L., Matrices of quaternion. Pacific J. Math. 1 (1951) 329-335. [23] Bunse-Gerstner A., Byers R., Mehrmann V., A quternion QR algorithm, Numer. Math., 55 (1989) 83-95.
References
191
[24] Businger P., Golub G. H., Linear least squares solutions by Householder transformations. Numer. Math., 7(1965) 269-276. [25] Caccavale F., Natale C., Siciliano B., Villani L., Six-DOF impedance control based on angle/axis representations, IEEE Trans. Robot. Autom. 15(1999) 289-300. [26] Campbell S. L., Meyer Jr. C. D., Generalized Inverses of Linear Transformations. London, San Francisco: Pitman, 1979. [27] Chan T. F., An improved algorithm for computing the singular value decomposition. ACM Trans. Math. Software, 8 (1982) 72-83. [28] Chan T. F., Rank revealing QR-factorizations. Linear Algebra Appl., 88/89(1987) 67-82. [29] Chan T. F., Hansen P. C., Computing truncated SVD least squares solutions by rank revealing QR factorizations. SIAM J. Sci. Statist. Comput., 11(1990) 519-530. [30] Chandrasekaran S., Ipsen I. C. F., On rank-revealing factorizations. SIAM J. Matrix Anal., 15(1994) 592-622. [31] Chang X. W., Wang J. S., The symmetric solution of the matrix equation AY + ZA = C, AYAT + BZBT = C, and (AT YA, BT Y B) = (C, D), Linear Algebra Appl., 179(1993) 171-189. [32] Chen L. X., Definition of determinant and Cramer solutions over the quatemion field, Acta Math. Sinica N. S. 7(1991) 171-180. [33] Chen L. X., The extension of Cayley-Hamilton theorem over the quatemion field, Chinese Sci. Bull. 17(1991) 1291-1293. [34] Chen L. X., Inverse matrix and properties of double determinant over quaternion field, Sci. China Ser. A 34(1991) 528-540 . [35] Cohn P. M., Skew Field Constructions, London Math. Sot. Lecture Note Ser. 27, Cambridge U.P., 1977. [36] Cohn P. M., The similarity reduction of matrices over a skew field, Math. 2.132 (1973) 151-163.
192
References
[37] Collection of Facial Images: Faces95. http://cswww.essex.ac.uk/mv/allfaces/faces95.html. [38] Cullum J. K., Willoughby R. A., Lake M., A Lanczos algorithm for computing singular values and vectors of large matrices. SIAM J. Sci. Stat. Comput., 4(1983) 197-215. [39] Dai H., On the symmetric solutions of linear matrix equation. Linear Algebra Appl., 131(1990) 1-7. [40] Davies A. J., Quaternionic Dirac equation, Phys. Rev. D, 41 (1990) 26282630. [41] Davies A. J., Mckellar B. H., Non-relativistic quaternionic quantum mechanics, Phys. Rev. A, 40 (1989) 4209-4214. [42] Davies A. J., Mckellar B. H., Observability of quaternionic quantum mechanics, Phys. Rev. A, 46 (1992) 3671-3675. [43] De Leo S., Scolarici G., Right eigenvalue equation in quaternionic quantum mechanics, J. Phys. A: Math. Gen., 33(2000) 2971-2995. [44] De Pierro A. R., Wei M., Some new properties of the constrained and weighted least squares problem. Linear Algebra Appl., 320(2000) 145165. [45] Demmel J. W., Accurate singular value decompositions of structured matrices. SIAM J. Matrix Anal. Appl., 21(1999) 562-580. [46] Demmel J. W., Kahan W., Accurate singular values of bidiagonal matrices. SIAM J. Sci. Statist. Comput., 11(1990) 873-912. [47] Eilenberg S., Niven I., The fundamental theorem of algebra for quaternions, Bull. Amer. Math. Sot., 50(1944) 246-248. [48] Eckart C., Young G., The approximation of one matrix by another of lower rank. Psychometrica, 1(1936) 211-218. [49] Fabrizio C., Ciro N., Bruno S., Luigi V., Six-DOF impedance control based on angle/axis representations, IEEE Trans. Robot. Autom., 15(1999) 289-300.
[50] Farenick D. R., Pidkowich B. A. F., The spectral theorem in quaternions. Linear Algebra Appl., 371(2003) 75-102.
[51] Finkelstein D., Jauch J. M., Schiminovich S., Speiser D., Foundations of quaternion quantum mechanics. J. Math. Phys., 3(1962) 207-220.
[52] Finkelstein D., Jauch J. M., Speiser D., Quaternionic representations of compact groups. J. Math. Phys., 4(1963) 136-140.
[53] Fletcher R., Conjugate gradient methods for indefinite systems. In: G. A. Watson (Ed.), Proc. Dundee Biennial Conference in Numerical Analysis, Springer-Verlag, 1975.
[54] Faßbender H., Mackey D. S., Mackey N., Hamilton and Jacobi come full circle: Jacobi algorithms for structured Hamiltonian eigenproblems. Linear Algebra Appl., 332-334(2001) 37-80.
[55] Francis J., The QR transformation – Part II. Computer Journal, 5(1962) 332-345.
[56] Freund R. W., Golub G. H., Nachtigal N., Iterative solution of linear systems. Acta Numerica, 1(1991) 57-100.
[57] Gentleman W. M., Least squares computations by Givens transformations without square roots. J. Inst. Maths. Applic., 12(1973) 329-336.
[58] Bunse-Gerstner A., Byers R., Mehrmann V., A quaternion QR algorithm. Numer. Math., 55(1989) 83-95.
[59] Givens W., Computation of plane unitary rotations transforming a general matrix to triangular form. SIAM J. Appl. Math., 6(1958) 26-50.
[60] Golub G. H., Numerical methods for solving linear least squares problems. Numer. Math., 7(1965) 206-216.
[61] Golub G. H., Kahan W., Calculating the singular values and pseudo-inverse of a matrix. SIAM J. Numer. Anal. Ser. B, 2(1965) 205-224.
[62] Golub G. H., Reinsch C., Singular value decomposition and least squares solutions. Numer. Math., 14(1970) 403-420.
[63] Golub G. H., Van Loan C. F., An analysis of the total least squares problem. SIAM J. Numer. Anal., 17(1980) 883-893.
[64] Golub G. H., Van Loan C. F., Matrix Computations. 4th Edition, The Johns Hopkins University Press, Baltimore, MD, 2013.
[65] Gu M., Eisenstat S. C., A divide-and-conquer algorithm for the bidiagonal SVD. SIAM J. Matrix Anal. Appl., 16(1995) 79-92.
[66] Guerlebeck K., Sproessig W., Quaternionic Analysis. Berlin: Akademie Verlag, 1989.
[67] Gulliksson M., On modified Gram-Schmidt for weighted and constrained linear least squares. BIT, 35(1995) 458-473.
[68] Gulliksson M., Wedin P. Å., Modifying the QR-decomposition to constrained and weighted linear least squares. SIAM J. Matrix Anal. Appl., 13(1992) 1298-1313.
[69] Hamilton W. R., The Mathematical Papers of Sir William Rowan Hamilton. Cambridge: Cambridge University Press, 1967.
[70] Hamilton W. R., Elements of Quaternions. New York: Chelsea, 1969.
[71] Hankins T. L., Sir William Rowan Hamilton. Baltimore: The Johns Hopkins University Press, 1980.
[72] Herstein I. N., Topics in Algebra. Second edition, Toronto: Xerox College Publishing, 1975.
[73] Hestenes M. R., Stiefel E., Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Standards, 49(1952) 409-436.
[74] Higham N. J., Iterative refinement enhances the stability of QR factorization methods for solving linear equations. BIT, 31(1991) 447-468.
[75] Higham N. J., Accuracy and Stability of Numerical Algorithms. 2nd Edit., Philadelphia: SIAM, 2002.
[76] Horn R. A., Johnson C. R., Matrix Analysis. Cambridge U.P., 1985.
[77] Horn R. A., Johnson C. R., Topics in Matrix Analysis. Cambridge U.P., 1994.
[78] Hong H. P., Pan C. T., Rank-revealing QR factorization and SVD. Math. Comp., 58(1992) 213-232.
[79] Householder A. S., Unitary triangularization of a nonsymmetric matrix. J. Assoc. Comput. Mach., 5(1958) 339-342.
[80] Huang L., The matrix equation AXB − CXD = E over the quaternion field. Linear Algebra Appl., 234(1996) 197-208.
[81] Huang L., On two questions about quaternion matrices. Linear Algebra Appl., 318(2000) 79-86.
[82] Ji P., Wu H., A closed-form forward kinematics solution for the 6-6^p Stewart platform. IEEE Trans. Robot. Autom., 17(2001) 522-526.
[83] Jia Z., Wei M., Ling S., A new structure-preserving method for quaternion Hermitian eigenvalue problems. J. Comput. Appl. Math., 239(2013) 12-24.
[84] Jia Z., Wei M., Zhao M., Chen Y., A new real structure-preserving quaternion QR algorithm. J. Comput. Appl. Math., 343(2018) 26-48.
[85] Jiang T., Representation Theory of Matrices with Applications in Numerical Computations (in Chinese). PhD thesis, East China Normal University, 2003.
[86] Jiang T., An algorithm for eigenvalues and eigenvectors of quaternion matrices in quaternionic quantum mechanics. J. Math. Phys., 45(2004) 3334-3338.
[87] Jiang T., Chen L., Algebraic algorithms for least squares problem in quaternionic quantum theory. Comput. Phys. Commun., 176(2007) 481-485.
[88] Jiang T., Algebraic methods for diagonalization of a quaternion matrix in quaternionic quantum theory. J. Math. Phys., 46(2005) 52106-52108.
[89] Jiang T., Chen L., An algebraic method for Schrödinger equation in quaternionic quantum mechanics. Comput. Phys. Commun., 178(2008) 795-799.
[90] Jiang T., Wei M., Equality constrained least squares problem over quaternion field. Appl. Math. Lett., 16(2003) 883-888.
[91] Jiang T., Wei M., On a solution of the quaternion matrix equation X − AX̃B = C and its application. Acta Math. Sin., 21(2005) 483-490.
[92] Jiang T., Zhao J., Wei M., A new technique of quaternion equality constrained least squares problem. J. Comput. Appl. Math., 216(2008) 509-513.
[93] Kahan W., Numerical linear algebra. Canad. Math. Bull., 9(1966) 757-801.
[94] Kaiser H., George E. A., Werner S. A., Neutron interferometric search for quaternions in quantum mechanics. Phys. Rev. A, 29(1984) 2276-2279.
[95] Kelley C. T., Iterative Methods for Linear and Nonlinear Equations. Philadelphia: SIAM, 1995.
[96] Klein A. G., Schrödinger inviolate: Neutron optical searches for violations of quantum mechanics. Phys. B, 151(1988) 44-49.
[97] Krieg A., Modular Forms on Half-Spaces of Quaternions. Lecture Notes in Math. 1143, Springer-Verlag, 1985.
[98] Kyrchei I., Explicit representation formulas for the minimum norm least squares solutions of some quaternion matrix equations. Linear Algebra Appl., 438(1)(2013) 136-152.
[99] Kyrchei I., Analogs of Cramer's rule for the minimum norm least squares solutions of some matrix equations. Appl. Math. Comput., 218(11)(2012) 6375-6384.
[100] Lawson C. L., Hanson R. J., Solving Least Squares Problems. 3rd Edit., Philadelphia: SIAM, 1995.
[101] Lax M., Symmetry Principles in Solid State and Molecular Physics. New York: Wiley, 1974.
[102] Le Bihan N., Mars J., Singular value decomposition of quaternion matrices: a new tool for vector-sensor signal processing. Signal Process., 84(7)(2004) 1177-1199.
[103] Le Bihan N., Sangwine S. J., Quaternion principal component analysis of color images. IEEE International Conference on Image Processing (ICIP), 1(2003) 809-812.
[104] Le Bihan N., Sangwine S. J., Color image decomposition using quaternion singular value decomposition. IEEE International Conference on Visual Information Engineering (VIE), Guildford, (2003) 113-116.
[105] Lee H. C., Eigenvalues and canonical forms of matrices with quaternion coefficients. Proc. Roy. Irish Acad. Sect. A, 52(1949) 253-260.
[106] De Leo S., Quaternions and special relativity. J. Math. Phys., 37(1996) 2955-2968.
[107] De Leo S., Ducati G., Quaternionic groups in physics. Int. J. Theor. Phys., 38(1999) 2197-2220.
[108] De Leo S., Ducati G., Solving simple quaternionic differential equations. J. Math. Phys., 44(2003) 2224-2233.
[109] De Leo S., Rotelli P., Quaternion scalar field. Phys. Rev. D, 45(1992) 575-579.
[110] De Leo S., Rotelli P., Quaternionic electroweak theory. J. Phys. G, 22(1996) 1137-1150.
[111] De Leo S., Rodrigues W. A., Quaternionic electron theory: Dirac's equation. Int. J. Theor. Phys., 37(1998) 1511-1529.
[112] De Leo S., Scolarici G., Solombrino L., Quaternionic eigenvalue problem. J. Math. Phys., 43(2002) 5815-5829.
[113] Li N., Wang Q., Jiang J., An efficient algorithm for the reflexive solution of the quaternion matrix equation AXB + CX^H D = E. J. Appl. Math., 2013(2013) 1-14.
[114] Li Y., Wei M., Zhang F., Zhao J., A fast structure-preserving method for computing the singular value decomposition of quaternion matrix. Appl. Math. Comput., 235(2014) 157-167.
[115] Li Y., Wei M., Zhang F., Zhao J., Real structure-preserving algorithms of Householder based transformations for quaternion matrices. J. Comput. Appl. Math., 305(2016) 82-91.
[116] Li Y., Wei M., Zhang F., Zhao J., A new double color image watermarking algorithm based on the SVD and Arnold scrambling. J. Appl. Math., (2016) 1-9.
[117] Li Y., Wei M., Zhang F., Zhao J., Comparison of two SVD-based color image compression schemes. PLoS ONE, 2017, DOI: 10.1371/journal.pone.0172746.
[118] Li Y., Wei M., Zhang F., Zhao J., A structure-preserving method for the quaternion LU decomposition: revisit. Calcolo, 54(2017) 1553-1563.
[119] Li Y., Wei M., Zhang F., Zhao J., Real structure-preserving algorithms for the quaternion Cholesky decomposition: revisit. J. Liaocheng University (Natural Science Edition), 32(1)(2018) 27-34.
[120] Li Y., Wei M., Zhang F., Zhao J., On the power method for quaternion right eigenvalue problem. J. Comput. Appl. Math., 345(2019) 59-69.
[121] Liao A. P., Bai Z., The bisymmetric minimum norm solution of the matrix equation A^T XA = D. Math. Num. Sinica, 24(2002) 9-20.
[123] Ling S., Jia Z., Matrix iterative algorithms for least-squares problem in quaternionic quantum theory. Int. J. Comput. Math., 90(3)(2013) 727-745.
[124] Ling S., Wang M., Wei M., Hermitian tridiagonal solution with the least norm to quaternionic least squares problem. Comput. Phys. Commun., 181(2010) 481-488.
[125] Liu Y., On the best approximation problem of quaternion matrices. J. Math. Study, 37(2)(2004) 129-134.
[126] Liu X. G., On the solvability and perturbation analysis of the TLS problem. Acta Math. Appl. Sin., 19(1996) 254-262.
[127] Ma R., Jia Z., Bai Z., A structure-preserving Jacobi algorithm for quaternion Hermitian eigenvalue problems. Comput. Math. Appl., 75(2018) 809-820.
[128] Mackey D. S., Mackey N., Tisseur F., Structured tools for structured matrices. Electron. J. Linear Algebra, 10(2003) 106-145.
[129] Mackey D. S., Mackey N., On n-dimensional Givens rotations. Numerical Analysis Report No. 423, Manchester Centre for Computational Mathematics, Manchester, England, 2003.
[130] Magnus J. R., L-structured matrices and linear matrix equations. Linear and Multilinear Algebra, 14(1983) 67-88.
[131] Martin R. S., Wilkinson J. H., Similarity reduction of a general matrix to Hessenberg form. Numer. Math., 12(1968) 349-368.
[132] Mirsky L., Symmetric gauge functions and unitarily invariant norms. Quart. J. Math. Oxford, 11(1960) 50-59.
[133] Mitra S. K., The matrix equation AXB + CXD = E. SIAM J. Appl. Math., 32(1977) 823-825.
[134] Morrison D. D., Remarks on the unitary triangularization of a nonsymmetric matrix. J. Assoc. Comput. Mach., 7(1960) 185-186.
[135] Moore E. H., On the reciprocal of the general algebraic matrix (Abstract). Bull. Amer. Math. Soc., 26(1920) 394-395.
[136] Moxey C. E., Sangwine S. J., Ell T. A., Hypercomplex correlation techniques for vector images. IEEE Trans. Signal Process., 51(2003) 1941-1953.
[137] O'Leary D. P., Rust B. W., Confidence intervals for inequality-constrained least squares problems, with applications to ill-posed problems. SIAM J. Sci. Statist. Comput., 7(1986) 473-489.
[138] Ortega J. M., Rheinboldt W. C., Iterative Solution of Nonlinear Equations in Several Variables. New York: Academic Press, 1970.
[139] Paige C. C., Saunders M. A., Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12(1975) 617-629.
[140] Paige C. C., Saunders M. A., Towards a generalized singular value decomposition. SIAM J. Numer. Anal., 18(1981) 398-405.
[141] Paige C. C., Saunders M. A., LSQR: An algorithm for sparse linear equations and sparse least squares. ACM Trans. Math. Software, 8(1982) 43-71.
[142] Parlett B. N., Global convergence of the basic QR algorithm on Hessenberg matrices. Math. Comp., 22(1968) 803-817.
[143] Peng Z., An iterative method for the least squares symmetric solution of the linear matrix equation AXB = C. Appl. Math. Comput., 170(2005) 711-723.
[144] Peng Y., Hu X., Zhang L., An iteration method for the symmetric solutions and the optimal approximation solution of the matrix equation AXB = C. Appl. Math. Comput., 160(3)(2005) 763-777.
[145] Peng Y., Hu X., Zhang L., An iterative method for symmetric solutions and optimal approximation solution of the system of matrix equations A_1 XB_1 = C_1, A_2 XB_2 = C_2. Appl. Math. Comput., 183(2006) 1127-1137.
[146] Penrose R., A generalized inverse for matrices. Proc. Cambridge Philos. Soc., 51(1955) 406-413.
[147] Peres A., Proposed test for complex versus quaternion quantum theory. Phys. Rev. Lett., 42(1979) 683-686.
[148] Pereyra V., Iterative methods for solving nonlinear least squares problems. SIAM J. Numer. Anal., 4(1967) 27-36.
[149] Powell M., Reid J., On applying Householder transformations to linear least squares problems. Proc. of the IFIP Congress, (1968) 122-126.
[150] Rao C. R., Mitra S. K., Generalized Inverse of Matrices and Its Applications. New York: John Wiley & Sons, 1971.
[151] Rodman L., Topics in Quaternion Linear Algebra. Princeton University Press, 2014.
[152] Rösch N., Time-reversal symmetry, Kramers degeneracy and the algebraic eigenvalue problem. Chemical Physics, 80(1983) 1-5.
[153] So W., Left eigenvalues of quaternionic matrices. Private communication, 1994.
[154] Saad Y., Schultz M. H., GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 7(1986) 856-869.
[155] Sangwine S. J., Fourier transforms of colour images using quaternion, or hypercomplex, numbers. Electron. Lett., 32(21)(1996) 1979-1980.
[156] Sangwine S. J., Comment on "A structure-preserving method for the quaternion LU decomposition in quaternionic quantum theory". Comput. Phys. Commun., 188(2015) 128-130.
[157] Sangwine S. J., Ell T. A., Moxey C. E., Vector phase correlation. Electron. Lett., 37(25)(2001) 1513-1515.
[158] Sangwine S. J., Le Bihan N., Quaternion Toolbox for MATLAB, http://qtfm.sourceforge.net/.
[159] Sangwine S. J., Le Bihan N., Quaternion singular value decomposition based on bidiagonalization to a real or complex matrix using quaternion Householder transformations. Appl. Math. Comput., 182(2006) 727-738.
[160] Schmidt E., Über die Auflösung linearer Gleichungen mit unendlich vielen Unbekannten. Rend. Circ. Mat. Palermo Ser. 1, 25(1908) 53-77.
[161] Scolarici G., Solombrino L., Quaternionic representation of magnetic groups. J. Math. Phys., 38(1997) 1147-1160.
[162] Stewart G. W., Matrix Algorithms Volume II: Eigensystems. Philadelphia: SIAM, 2001.
[163] Stewart G. W., On the weighting method for least squares problems with linear equality constraints. BIT, 37(1997) 961-967.
[164] Stewart G. W., Sun J. G., Matrix Perturbation Theory. Boston: Academic Press, 1990.
[165] van der Sluis A., Veltkamp G. W., Restoring rank and consistency by orthogonal projection. Linear Algebra Appl., 28(1979) 257-278.
[166] van der Vorst H., Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 13(1992) 631-644.
[167] van der Vorst H., Iterative Krylov Methods for Large Linear Systems. Cambridge: Cambridge Univ. Press, 2003.
[168] Van Loan C. F., Generalizing the singular value decomposition. SIAM J. Numer. Anal., 13(1976) 76-83.
[169] Van Loan C. F., Computing the CS and the generalized singular value decomposition. Numer. Math., 46(1985) 479-492.
[170] Van Loan C. F., On the method of weighting for equality-constrained least squares problems. SIAM J. Numer. Anal., 22(1985) 851-864.
[171] Van Huffel S. (Ed.), Recent Advances in Total Least Squares Techniques and Errors-in-Variables Modelling. Philadelphia: SIAM, 1997.
[172] Van Huffel S., Lemmerling P. (Eds.), Total Least Squares and Errors-in-Variables Modelling: Analysis, Algorithms, and Applications. Dordrecht: Kluwer Academic Publishers, 2002.
[173] Van Huffel S., Vandewalle J., Analysis and solution of the nongeneric total least squares problem. SIAM J. Matrix Anal. Appl., 9(1988) 360-372.
[174] Varga R. S., Matrix Iterative Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1962.
[175] Viswanath K., Contributions to Linear Quaternionic Hilbert Analysis. Thesis, Indian Statistical Institute, Calcutta, 1968.
[176] Viswanath K., Normal operators on quaternionic Hilbert spaces. Trans. Amer. Math. Soc., 162(1971).
[177] Wang M., Ma W., A structure-preserving algorithm for the quaternion Cholesky decomposition. Appl. Math. Comput., 223(2013) 354-361.
[178] Wang M., Ma W., A structure-preserving method for the quaternion LU decomposition in quaternionic quantum theory. Comput. Phys. Commun., 184(2013) 2182-2186.
[179] Wang M., Cheng X., Wei M., Iterative algorithms for solving the matrix equation AXB + CX^T D = E. Appl. Math. Comput., 187(2007) 622-629.
[180] Wang M., Wei M., Feng Y., An iterative algorithm for least squares problem in quaternionic quantum theory. Comput. Phys. Commun., 179(2008) 203-207.
[181] Wang M., Wei M., Feng Y., An iterative algorithm for a least squares solution of a matrix equation. Int. J. Comput. Math., 87(2010) 1289-1298.
[182] Wang M., Algorithm Q-LSQR for the least squares problem in quaternionic quantum theory. Comput. Phys. Commun., 181(2010) 1047-1050.
[183] Wang M., Wei M., Hu S., An iterative method for the least-squares minimum-norm symmetric solution. Comput. Model. Eng. Sci., 77(2011) 173-182.
[184] Wang Q., Yu S., Xie W., Extreme ranks of real matrices in solution of the quaternion matrix equation AXB = C with applications. Algebra Colloq., 17(2010) 345-360.
[185] Wang Q., Jiang J., Extreme ranks of (skew-)Hermitian solutions to a quaternion matrix equation. Electron. J. Linear Algebra, 20(2010) 552-573.
[186] Watkins D., Understanding the QR algorithm. SIAM Review, 24(1982) 427-440.
[187] Wei M., Algebraic properties of the rank deficient equality constrained least squares and weighted least squares problems. Linear Algebra Appl., 161(1992) 27-43.
[188] Wei M., The analysis for the total least squares problem with more than one solution. SIAM J. Matrix Anal. Appl., 13(1992) 746-763.
[190] Wei M., Generalized Least Squares Problems: Theory and Computations (in Chinese). Beijing: Science Press, 2006.
[191] Wei M., Liu Q., Roundoff error estimates of the modified Gram-Schmidt algorithm with column pivoting. BIT, 43(2003) 627-645.
[192] Wiegmann N. A., Some theorems on matrices with real quaternion elements. Canad. J. Math., 7(1955) 191-201.
[193] Wigner E. P., Group Theory and its Application to the Quantum Mechanics of Atomic Spectra. New York: Academic Press, 1959.
[194] Wilkinson J. H., The Algebraic Eigenvalue Problem. Oxford: Clarendon Press, 1965.
[195] Wolf L. A., Similarity of matrices in which the elements are real quaternions. Bull. Amer. Math. Soc., 42(1936) 737-743.
[196] Wood R. M. W., Quaternionic eigenvalues. Bull. London Math. Soc., 17(1985) 137-138.
[197] Xie D. X., Hu X. Y., Zhang L., Solvability conditions for the inverse eigenproblems of symmetric and anti-persymmetric matrices and its approximation. Numer. Linear Algebra Appl., 10(2003) 223-234.
[198] Xu G., Wei M., Zheng D., On solutions of matrix equation AXB + CYD = F. Linear Algebra Appl., 279(1998) 93-109.
[199] Young D. M., Iterative Solution of Large Linear Systems. New York: Academic Press, 1971.
[200] Yuan S., Least squares pure imaginary solution and real solution of the quaternion matrix equation AXB + CXD = E with the least norm. J. Appl. Math., 4(2014) 1-9.
[201] Yuan S., Liao A., Least squares solution of the quaternion matrix equation X − AX̂B = C with the least norm. Linear and Multilinear Algebra, 59(2011) 985-998.
[202] Yuan S., Liao A., Lei Y., Least squares Hermitian solution of the matrix equation (AXB, CXD) = (E, F) with the least norm over the skew field of quaternions. Math. Comput. Model., 48(2008) 91-100.
[203] Yuan S., Liao A., Yao G., The matrix nearness problem associated with the quaternion matrix equation AXA^H + BYB^H = C. J. Appl. Math. Comput., 37(2011) 133-144.
[204] Yuan S., Wang Q., Two special kinds of least squares solutions for the quaternion matrix equation AXB + CXD = E. Electron. J. Linear Algebra, 23(2012) 257-274.
[205] Yuan S., Wang Q., Duan X., On solutions of the quaternion matrix equation AX = B and their applications in color image restoration. Appl. Math. Comput., 221(2013) 10-20.
[206] Yuan S., Wang Q., Zhang X., Least squares problem for quaternion matrix equation AXB + CYD = E over different constrained matrices. Int. J. Comput. Math., 90(3)(2013) 565-576.
[207] Zhang F., Wei M., Li Y., Zhao J., Special least squares solutions of the quaternion matrix equation AX = B with applications. Appl. Math. Comput., 270(2015) 425-433.
[208] Zhang F., Wei M., Li Y., Zhao J., Special least squares solutions of the quaternion matrix equation AXB + CXD = E. Comput. Math. Appl., 72(2016) 1426-1435.
[209] Zhang F., Permanent Inequalities and Quaternion Matrices. PhD Dissertation, University of California at Santa Barbara, 1993.
[210] Zhang F., Quaternions and matrices of quaternions. Linear Algebra Appl., 251(1997) 21-57.
[211] Zhang F., Wei Y., Jordan canonical form of a partitioned complex matrix and its application to real quaternion matrices. Commun. Algebra, 29(6)(2001) 2363-2375.
[212] Zoltowski M. D., Generalized minimum norm and constrained total least squares with applications to array processing. San Diego: SPIE Signal Processing III, 975(1988) 78-85.
About the Authors

Ying Li, PhD
Professor
College of Mathematical Sciences
Liaocheng University
Shandong, P. R. China
Email: [email protected]

Dr. Ying Li received the PhD degree from the University of Shanghai for Science and Technology in 2012. Since 2018, she has been a professor with the College of Mathematical Sciences at Liaocheng University, Liaocheng, China. Her current research interests cover numerical algebra, matrix analysis, and image processing.
Musheng Wei, PhD
Professor
College of Mathematics and Science
Shanghai Normal University
Shanghai, P. R. China (retired)
College of Mathematical Sciences
Liaocheng University
Shandong, P. R. China

Dr. Musheng Wei received a BS degree in Mathematics in 1982 from Nanjing University, Nanjing, China, and received the MS and PhD degrees in Applied Mathematics in 1984 and 1986, respectively, both from Brown University, RI, USA. Between 1986 and 1988, he was a postdoctoral fellow at the IMA at the University of Minnesota, the Ohio State University, and Michigan State University.
Between 1988 and 2008, he held a position at the Department of Mathematics, East China Normal University, Shanghai, China, where he was a professor. In 2008 he moved to the College of Mathematics and Science, Shanghai Normal University, Shanghai, China, as a professor. Since 2012 he has served as a visiting professor at the College of Mathematical Sciences, Liaocheng University, Shandong, China. His research interests include numerical algebra, matrix analysis, scattering theory, signal processing, control theory, and scientific computing.
Fengxia Zhang, MS
Associate Professor
College of Mathematical Sciences
Liaocheng University
Shandong, P. R. China

Fengxia Zhang received her MS degree from the College of Mathematical Sciences, Liaocheng University in 2008. Since 2013, she has been an associate professor with the College of Mathematical Sciences at Liaocheng University, Liaocheng, China. Her current research interests cover numerical algebra, matrix analysis, and image processing.
Jianli Zhao, PhD
Professor
College of Mathematical Sciences
Liaocheng University
Shandong, P. R. China

Dr. Jianli Zhao received his PhD degree from the Department of Mathematics, East China Normal University in 2007. Since 1997, he has been a professor with the College of Mathematical Sciences at Liaocheng University, Liaocheng, China. His research interests cover matrix analysis, numerical algebra, and image processing.
Index

δ-rank, 85
JRS-symmetric, 32
JRS-symplectic, 32, 172
p-norm, 15
A-conjugate, 133
accurate, 36, 92
arithmetic, 35, 36, 61
assignment number, 36, 56, 59, 61, 62, 74
asymptotic rate of convergence, 125, 141
augmented matrix, 81, 104, 153
average rate of convergence, 125
best approximation, 86, 133
bidiagonal matrix, 145, 152
Bidiagonalization method, 152
bounded close set, 16, 17
Cauchy inequality, 16
Chebyshev polynomial, 120, 131, 132, 136, 137
Chebyshev semi-iterative acceleration, 141
Chebyshev semi-iterative method, 132, 142
Cholesky decomposition, 37, 51, 55, 105, 108
color image, 1, 74, 77
column block, 13, 127, 161, 165, 181, 186
column pivoting, 64, 67–69, 109, 110
column space, 25, 30
compatible, 79, 83, 87–92, 95, 117
complementary subspace, 26, 29
complete orthogonal decomposition, 69, 106, 107, 109, 110
complex representation, 6
Compression Ratio, CR, 77
condition number, 137
conjugate gradient LS method, CGLS, 143, 147, 148
conjugate gradient method, CG, 133–137, 145
consistent matrix norm, 19, 124, 125
consistent norm, 18, 19, 21, 22
consistent order, 140, 141
continuous, 15, 17, 20
convergent, 123–125, 129, 136, 137, 139, 147, 159
countable set, 16
CS decomposition, CSD, 12, 14, 93, 100
diagonalizable matrix, 180
direct elimination, 110
direct method, 67, 103, 119, 150
direct sum, 25
divergent, 123
equality constrained least squares, LSE, 79, 96–101, 103, 108–110
equivalent class, 4
equivalent problem, 82, 83, 96, 97, 133
filtering factor, 87
floating point, 147
floating point arithmetic, 48
flop, 35, 36, 38, 39, 61, 63, 66, 74, 186
Francis QR algorithm, 185
Frobenius norm, F-norm, 18, 22, 50, 169
full column rank, 61, 67, 86, 94, 109, 110, 151
Full rank decomposition, 26, 59
full row rank, 94, 95, 98, 109–111
Gauss transformation, 37, 46, 48
Gauss-Seidel iteration, 128, 129, 138–141
generalized inverse, 23, 24
generalized LS, GLS, 12, 79, 103, 119
geometric meaning, 31
Givens matrix, 38, 41, 147, 170, 172, 188
Givens QRD, 66
Givens rotation, 38, 39, 70, 149, 156
good approximation, 160, 178, 180, 181
Gram-Schmidt, 6
Hadamard product, 90
Hölder norms, 15
Hermitian idempotent matrix, 29, 31
Hermitian Lanczos method, 176
Hermitian matrix, 5, 10, 33, 89–91, 157, 164, 165, 167, 172, 176
Hermitian positive definite, 51, 52, 54, 55, 105, 133, 135–137, 141, 143, 149, 151
Hermitian QR method, 164
Hermitian right eigenvalue problem, 158, 162, 163, 165, 168, 175, 177
Hermitian/skew Hermitian splitting iterative method, HSS, 149, 150
high-level operation, 56
high-performance, 36
homogeneous, 80
Householder based transformation, 42–44, 59, 60, 70, 186
Householder matrix, 38, 42, 44
Householder QRD, 59, 64
Householder vector, 42, 45, 73
idempotent matrix, 26–30
ill-conditioned, 85, 87
ill-posed problem, 87
implicit double shift, 188
incompatible, 79
inner iteration, 160, 161, 181
inverse power method, 157, 158, 162, 163, 177, 182–184
isomorphism, 7, 32
iterative algorithm, 119, 154, 162, 177, 182
iterative method, 122, 123, 125, 131, 137, 142, 149–151, 185
iterative speed, 124
Jacobi iteration, 126, 127, 138, 140, 141
Jacobi method, 158, 168, 172–174
Jordan block, 9, 11
Jordan canonical decomposition, 9, 177
Jordan canonical form, 9, 26, 124, 178
JRSGivens matrix, 39
Karush-Kuhn-Tucker equation, KKT equation, 83, 84, 98, 138, 149
Krylov subspace, 122, 132, 138, 143, 153
Lanczos bidiagonalization, LBD, 145, 147, 152
Lanczos method, 176
Lanczos process, 147
least squares, LS, 79, 82–87, 92, 103, 105, 106, 110, 111, 137, 143, 145
left eigenvalue, 7, 8, 10, 157
left linearly independent, 6
left singular vector, 12
left spectrum, 7
linear iteration, 123
linear least squares problem, 82
linear system, 2, 14, 45, 79, 80, 83, 84, 92, 95, 96, 103–105, 108, 119, 122, 125, 132, 133, 136, 138, 150, 151, 162, 182
low rank optimal approximation, 93
LS, 105
LS solution, 105
LU decomposition, LUD, 37, 45, 47, 51, 104
matrix computation, 35, 36, 43
matrix decomposition, 9, 35, 36, 45, 103
matrix equation, 87–91, 103, 111, 112, 114, 119, 154
matrix norm, 17, 18, 20, 21, 124
Mean Square Error, MSE, 76, 77
minimal norm solution, 82, 83, 97, 98, 106, 108, 109
modified Gram-Schmidt, MGS, 67, 105
modified matrix, 95
Moore-Penrose inverse, MP inverse, 24, 69, 83, 85, 97, 112
mutually perpendicular, 13, 83, 171
non-Hermitian matrix, 157, 158, 182, 185, 186
non-stationary, 123
nonhomogeneous, 80
nonsingular matrix, 9, 24, 47, 48, 59, 79, 124, 125, 127, 128, 130
normal equation, 83, 105, 138, 143
normal matrix, 5, 10
null space, 25, 30
numerical rank, 110, 111
numerical stability, 45, 64, 95
operator norm, 20, 21
optimal factor, 141
orthogonal JRS-symplectic, 33, 170
orthogonal matrix, 39, 165
orthogonal projection, 23, 29, 31, 82, 97
orthogonal vector, 44
outer iteration, 160, 161, 181
partial pivoting, 48, 104
partial SVD, 151
Peak Signal to Noise Ratio, PSNR, 76, 77
permutation matrix, 48, 58, 64, 68, 104, 140, 162, 183
pipelining, 36
pivoting LU decomposition, PLU, 47, 104, 162, 183
polynomial, 122, 136
positive definite, 15, 86, 100, 138, 143
power method, 158, 160, 162, 163, 177, 180, 182–184
preconditioning, 137, 149
preconditioning Hermitian/skew Hermitian splitting iterative method, PHSS, 149–151
projection, 25–28
pure imaginary, ix, 112, 117
qGivens matrix, 39
qMLSQR, 155
QR algorithm, 152, 153, 157, 164, 185, 188
QR decomposition, 58, 59, 61–67, 69, 87, 105, 106, 109–111, 147, 186
QR least squares, LSQR, 145, 147, 148, 153–155
quadratic polynomial, 91
quaternion Cholesky decomposition, 51
quaternion Householder based transformation, 43
quaternion LU decomposition, 46
quotient singular value decomposition, Q-SVD, 13, 14, 91, 97, 99, 111
rank deficient, 85, 106
Rayleigh quotient, 160, 176
Rayleigh-Ritz projection, 175
Rayleigh-Ritz projection method, 175
real Givens matrix, 38, 41
real Householder transformation, 42, 44
real representation, 31, 33, 36, 37, 41, 45, 52, 104, 112, 127, 128, 130, 135, 161, 164, 165, 167, 172, 181
real structure-preserving, 36, 45, 47–49, 51, 52, 54, 56, 61–64, 67, 69, 70, 73–75, 103, 104, 126, 128, 130, 135, 137, 157, 158, 160, 162–165, 167, 169, 172, 181, 183, 184, 186
real symmetric positive definite, 133, 135
real symmetric tridiagonal matrix, 121, 164, 165, 167, 168
real tridiagonalization, 165
recursive formula, 121, 145
recursive relation, 120, 146
regularization, 82, 85, 87
residual vector, 84, 87, 98, 105, 133, 148
Richardson iterative method, 142
right eigenvalue, 7–10, 157, 162–165, 167, 169, 175–177, 180, 183, 186
right eigenvector, 10, 11, 158, 162, 164, 165, 167, 176–178, 180–182, 184
right linearly independent, 6, 80, 82
right singular vector, 12, 108, 151, 152, 154
right spectrum, 7
Ritz vector, 176
Schur decomposition, 10, 185
self-consistent, 18
set of fundamental solutions, 80, 81
shift inverse power method, 163, 183
similar, 4
singular value decomposition, SVD, 11, 14, 23, 30, 35, 37, 42, 69, 73, 75, 87–90, 93–95, 99, 100, 106, 107, 151, 153
SOR, 130, 139
spectral condition number, 136, 137
spectral norm, 22
spectral radius, 22, 129
splitting, 122, 123, 125, 128, 129, 138, 140, 141, 143
stable, 35, 48, 105, 106, 167
standard right eigenvalue, 9, 11, 158, 161, 177, 178, 181–183
stationary, 123
Sturm sequence, 122, 152
subspace, 25, 29, 175
subspace method, 158, 175
successive over-relaxation, SOR, 129, 132, 139, 141, 143
The Chebyshev semi-iterative method, 131
the first column block, 34
the first row block, 34
the minimal norm least squares solution, 85, 105, 109, 110, 112
Tikhonov regularization, 86, 87
total least squares, TLS, 79, 92–94, 103, 107, 119, 151, 153
triangular matrix, 185
triangularization, 186
tridiagonal matrix, 33, 177
truncated LS, 85–87
unconstrained least square, 101, 109
unit lower triangular matrix, 45–48, 52, 104, 140, 183
unit vector, 8, 43, 159, 178
unitary decomposition, 59, 97–99
unitary invariant norm, 22
unitary matrix, 5, 7, 11, 32, 39, 43–45, 88, 158, 185
unitary similarity, 157, 185
upper bidiagonal, 70
upper Hessenberg, 186, 187
upper trapezoidal matrix, 58
upper triangular, 104, 108
upper triangular matrix, 8, 10, 45, 47, 48, 52, 62, 66, 105, 106, 109, 110, 140, 148, 162, 183, 187
vector norm, 14, 15, 17–20, 124
weighted least square, WLS, 97, 99, 100, 110, 149
well-conditioned, 106