English Pages 775 [763] Year 2023
Mathematics Online First Collections
Mohammad Sal Moslehian Ed.
Matrix and Operator Equations and Applications
This series covers the broad spectrum of theoretical, applied, and computational mathematics. Once peer-reviewed chapters are accepted for publication, they are published online ahead of the completion of the full volume. The readership for books in the series is intended to be made up of researchers and often times graduate students, as well. As in the publication Math in the Time of Corona, the series may occasionally publish books fashioned for a general audience.
Editor Mohammad Sal Moslehian Department of Pure Mathematics Ferdowsi University of Mashhad Mashhad, Iran
ISSN 2730-633X ISSN 2730-6348 (electronic) Mathematics Online First Collections ISBN 978-3-031-25385-0 ISBN 978-3-031-25386-7 (eBook) https://doi.org/10.1007/978-3-031-25386-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Dedicated to my beloved children: Anahita Arash
Preface
This book focuses on a comprehensive and self-contained study of matrix and operator equations and equalities, which find extensive application in diverse scientific disciplines for formulating challenging problems and solving them accurately. The investigation is conducted systematically, employing powerful methods that have proven effective in exploring significant equations within the fields of functional analysis, operator theory, linear algebra, and other related subjects over the past few decades. These developments hold substantial relevance to a broad spectrum of pure and applied mathematicians, thereby catering to a wide-ranging audience. The book is structured into two distinct sections: Part I focuses on matrix equations, while Part II delves into operator equations.

In the first part of the book, an overview of the current state of the art of systems of matrix equations is provided with a particular focus on conditions for the existence and representations of solutions. The quaternion restricted two-sided matrix equation $AXB = D$ and approximation problems related to it are discussed, and in another chapter, the semi-tensor product of matrices is introduced to solve quaternion matrix equations. Additionally, the application of quaternion algebras is explored in the context of solving well-known matrix equations such as Sylvester, Stein, and Lyapunov equations. Another chapter explores the interconnections between matrix means and quadratic matrix equations. There is a comprehensive analysis of the Yang-Baxter-like matrix equation $AXA = XAX$. A chapter is devoted to the investigation of Hermitian polynomial matrix equations $X^{s} \pm \sum_{i=1}^{\ell} \delta_i A_i^{*} X^{t_i} A_i = Q$, which arise in linear-quadratic control problems. Furthermore, a review of both classical and recently discovered inequalities pertaining to matrix exponentials is presented. Finally, a broad survey of some major developments in the study of numerical ranges is given.
The second part of the book showcases the latest advancements in various equations within modern operator theory. These developments hold significant interest for both pure and applied mathematicians. The specific topics covered include the stability and boundary controllability of operator differential equations, singular integral operators with shifts, some bounds related to the Berezin number and Berezin norm, an expository survey of different results about the norms of derivations, semicircular elements induced by connected finite graphs, generalizing data analysis in Hilbert C*-modules, a general theory of operator equations and iterative processes as well as applications to integral equations, and finally the Daugavet equation $\|I + T\| = 1 + \|T\|$ in both linear and nonlinear settings.

Each chapter offers a comprehensive and in-depth exploration of its respective subject matter. The chapters are written in a reader-friendly style and can be approached independently, allowing readers to select specific topics of interest. Furthermore, each chapter provides an extensive bibliography, ensuring access to a wide range of related research.

This book is primarily targeted toward researchers and graduate students in the disciplines of mathematics, physics, and engineering, as it aims to provide them with a valuable resource to support their studies and research undertakings.

The editor expresses his sincere appreciation not only to the authors for their wonderful contributed chapters but also to the numerous mathematicians who devoted their time to thoroughly reviewing the chapters included in this book. The invaluable feedback and insightful comments provided by these reviewers have contributed to the improvement of the content.

Mashhad, Iran
Spring 2023
Mohammad Sal Moslehian
Contents

Part I
Matrix Equations

Existence and Representations of Solutions to Some Constrained Systems of Matrix Equations
Dijana Mosić and Predrag S. Stanimirović

Quaternion Two-Sided Matrix Equations with Specific Constraints
Ivan I. Kyrchei, Dijana Mosić, and Predrag S. Stanimirović

Matrices over Quaternion Algebras
Xin Liu and Yang Zhang

Direct Methods of Solving Quaternion Matrix Equation Based on STP
Ying Li, WenXu Ding, XiaoYu Zhao, AnLi Wei, and JianLi Zhao

Geometric Mean and Matrix Quadratic Equations
Mitsuru Uchiyama

Yang-Baxter-Like Matrix Equation: A Road Less Taken
Nebojša Č. Dinčić and Bogdan D. Djordjević

Hermitian Polynomial Matrix Equations and Applications
Zhigang Jia, Linlin Zhao, and Meixiang Zhao

Inequalities for Matrix Exponentials and Their Extensions to Lie Groups
Luyining Gan, Xuhua Liu, and Tin-Yau Tam

Numerical Ranges of Operators and Matrices
Pei Yuan Wu and Hwa-Long Gau

Part II
Operator Equations

Stability and Controllability of Operator Differential Equations
Jin Liang, Ti-Jun Xiao, and Zhe Xu

On Singular Integral Operators with Shifts
Yuri I. Karlovich and Jennyffer Rosales-Méndez

Berezin Number and Norm Inequalities for Operators in Hilbert and Semi-Hilbert Spaces
Cristian Conde, Kais Feki, and Fuad Kittaneh

Norm Equalities for Derivations
Mohamed Boumazgour and Abdelghani Sougrati

On Semicircular Elements Induced by Connected Finite Graphs
Ilwoo Cho and Palle E. T. Jorgensen

Hilbert C*-Module for Analyzing Structured Data
Yuka Hashimoto, Fuyuta Komura, and Masahiro Ikeda

Iterative Processes and Integral Equations of the Second Kind
Sanda Micula and Gradimir V. Milovanović

The Daugavet Equation: Linear and Nonlinear Recent Results
Sheldon Dantas, Domingo García, Manuel Maestre, and Juan B. Seoane-Sepúlveda
Part I
Matrix Equations
Existence and Representations of Solutions to Some Constrained Systems of Matrix Equations Dijana Mosić and Predrag S. Stanimirović
Abstract The main goal of this chapter is to present necessary and sufficient conditions for the existence and representations of solutions to some restricted matrix equations and the systems of matrix equations. In particular, equivalent conditions for the existence and representations of {2}-, {1}-, and {1, 2}-inverses with additional assumptions on ranges and/or kernel are surveyed and analyzed. Conditions for the existence and representations of (B, C)-inverses and one-sided (B, C)-inverses are investigated in relation to outer inverses with given image and kernel. G-outer inverses and one-sided G-outer inverses are considered as particular inner inverses to which additional restrictions characteristic of outer inverses are imposed. We also derive purely algebraic necessary and sufficient conditions for the solvability of some new systems of matrix equations and the general forms of their solutions. Particular cases of considered equations and systems of matrix equations with their solutions are investigated too. Keywords Matrix equation • Generalized inverse Mathematics Subject Classification (MSC2020) Primary 15A24 • Secondary 15A09, 15A23, 65F20
D. Mosić (✉) • P. S. Stanimirović Faculty of Sciences and Mathematics, University of Niš, Niš, Serbia e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_44
1 Introduction

Generalized inverses of matrices generalize the notion of the ordinary matrix inverse, which is unique and exists only for square nonsingular matrices. Generalized inverses were introduced as extensions of the usual inverse to singular and rectangular matrices. Uniqueness of a generalized inverse is a desirable property, but generalized inverses are typically not unique. The broadest requirements on a generalized inverse of a matrix are that it:
• exists for a larger class of matrices than the ordinary inverse does;
• has some properties of the ordinary inverse;
• reduces to the ordinary inverse for a given square nonsingular matrix.

The idea of defining generalized inverses of matrices originates from the need to solve a given system of linear equations, a problem that appears in many scientific and practical disciplines, such as statistics, operational research, physics, economics, electrical engineering, and many others. Generalized inverses provide a simple way of obtaining a solution of the so-called “ill-conditioned” linear problems.

The idea of generalizing the usual inverse to the case of a singular matrix was first presented by Moore [44] in 1920. However, the subject was not investigated systematically until the mid-1950s, when the investigation of generalized inverses progressed rapidly. The study of generalized inverses has progressed significantly since its rebirth in the early 1950s. Namely, Roger Penrose [62], unfamiliar with the previous work, rediscovered Moore's “reciprocal inverse” in 1955. He showed that the Moore inverse can be characterized by four equations (known as the Penrose equations). Another well-known kind of generalized inverse is the Drazin inverse, named after Michael P. Drazin (1958). A major extension of this field came in the 1950s, when C. R. Rao [65] and J. Chipman [18] exploited the relation between generalized inverses, least squares, and statistics.

Generalized inverses are very powerful tools and are applicable in many branches of mathematics, technology, and engineering. Their most frequent and important application is in finding solutions of matrix equations and systems of linear equations. Besides numerical linear algebra, there are many other mathematical and technical disciplines in which generalized inverses play an important role. Some of them are: estimation theory (regression), computing polar decomposition, electrical circuits (networks) theory, automatic control theory, filtering, difference equations, pattern recognition, and image restoration.

The real expansion in the development of this scientific field began with the work of Penrose [62] in 1955. The general reciprocal rediscovered by Penrose in [62] is now widely recognized as the Moore-Penrose inverse. Penrose was the first to show the close connection between the Moore-Penrose inverse and the least-squares solution of a system of linear equations. The latter is a special case of a nonlinear optimization problem. Additionally, the minimal properties of the solution of a linear system obtained by means of the Moore-Penrose inverse led to the intensive use of optimization methods.

Thousands of papers on various theoretical and computational aspects of generalized inverses and their applications have appeared since 1955. Also, a number of monographs and survey papers on the subject have been written [2, 7, 9, 23, 46, 80, 83, 88, 89, 91]. Ben-Israel in [3] noted about 2000 articles and 15 books on generalized inverses of matrices and linear operators, and that number is increasing every day. It is justifiable to say that the theory of generalized inverses has become an important part of mathematics as well as of many applied scientific areas, such as computer science, electrical engineering, image restoration, and many others. Some details about the history of generalized inverses are available in the two survey papers [3] and [4]. A global overview of various applications of generalized inverses was presented in [3].

A widely recognized use of generalized inverses is to calculate least-squares solutions to a system of linear equations or a matrix equation. It is known that least-squares solutions are not unique. However, the Moore-Penrose (MP) inverse ensures uniqueness by generating the unique least-squares solution of minimum norm.

A chemical equation is only a symbolic representation of a chemical reaction and represents an expression of atoms, elements, compounds, or ions.
Balancing chemical equations is an important application of generalized inverses (see [66]). Krishnamurthy in [33] gave a mathematical method for balancing chemical equations based on a generalized matrix inverse. In fact, the problem of finding generalized inverses of large-scale matrices resulting from real applications with underlying 2D or 3D geometry (such as partial differential equations, optimization, computational fluid dynamics, simulation, and computer graphics) frequently occurs in practice.

We now collect some notation, definitions, and preliminary results that are used in the next sections. The set of complex (resp. real) numbers is denoted by $\mathbb{C}$ (resp. $\mathbb{R}$). Accordingly, $\mathbb{C}^{m\times n}$ (resp. $\mathbb{R}^{m\times n}$) denotes the set of all $m \times n$ complex (resp. real) matrices. Following the standard notation, $A^{T}$, $A^{*}$, $\mathrm{rank}(A)$, $\mathcal{R}(A)$, and $\mathcal{N}(A)$ will represent the transpose, the conjugate-transpose, the rank, the range (column space), and the kernel (null space) of $A$, respectively. Further, $\mathbb{C}^{m\times n}_{r}$ (resp. $\mathbb{R}^{m\times n}_{r}$) will stand for the set of complex (resp. real) $m \times n$ matrices of rank $r$.

For $A \in \mathbb{C}^{m\times n}$, the Moore-Penrose inverse of $A$ is the unique matrix $X \in \mathbb{C}^{n\times m}$ (denoted by $A^{\dagger}$) which satisfies the Penrose equations [62]:

$$(1)\ AXA = A, \qquad (2)\ XAX = X, \qquad (3)\ (AX)^{*} = AX, \qquad (4)\ (XA)^{*} = XA.$$
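The four Penrose equations can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using `numpy.linalg.pinv` as the Moore-Penrose inverse; the helper name `penrose_residuals` and the test matrix are illustrative choices, not part of the chapter.

```python
import numpy as np

def penrose_residuals(A, X):
    """Return the Frobenius-norm residuals of the four Penrose equations."""
    return (
        np.linalg.norm(A @ X @ A - A),             # (1) AXA = A
        np.linalg.norm(X @ A @ X - X),             # (2) XAX = X
        np.linalg.norm((A @ X).conj().T - A @ X),  # (3) (AX)^* = AX
        np.linalg.norm((X @ A).conj().T - X @ A),  # (4) (XA)^* = XA
    )

# A rank-deficient 3 x 2 example: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
X = np.linalg.pinv(A)  # candidate for the Moore-Penrose inverse of A
residuals = penrose_residuals(A, X)
print(residuals)       # all four values should be at rounding-error level
```

The example deliberately uses a singular, rectangular matrix, since this is the situation in which the ordinary inverse does not exist.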
The usefulness of the Moore-Penrose inverse has led to its extensive use in image restoration [8, 19–21]. Linear motion causes a blur on the original image. The blur of an image is modeled by the matrix equations $FH^{T} = G$ and $HF = G$ with respect to the unknown original matrix $F$, the blurred matrix $G$, and the matrix $H$ causing the blur. The Moore-Penrose inverse $H^{\dagger}$ is usable in solving these equations [19, 21], since it gives their best approximate solutions. An application of the Moore-Penrose inverse in solving the basis pursuit problem $\min_{u \in \mathbb{R}^{n}} \{\|u\|_{1} : Au = f\}$ was investigated in [67]. Moreover, the pseudoinverse has been used in diverse applications such as linear estimation, differential and difference equations, Markov chains, graphics, cryptography, coding theory, and robotics.

The set of all matrices obeying the conditions contained in $\delta \subseteq \{1, 2, 3, 4\}$ is denoted by $A\{\delta\}$. Any matrix from $A\{\delta\}$ is called a $\delta$-inverse of $A$ and is denoted by $A^{(\delta)}$. Notice that $A\{1, 2, 3, 4\} = \{A^{\dagger}\}$.

Let $A \in \mathbb{C}^{m\times n}$ be of rank $r$, let $T$ be a subspace of $\mathbb{C}^{n}$ of dimension $s \le r$, and let $S$ be a subspace of $\mathbb{C}^{m}$ of dimension $m - s$. The outer inverse of $A$ with the range $T$ and the null space $S$ is the unique matrix $X \in \mathbb{C}^{n\times m}$ (denoted by $A^{(2)}_{T,S}$) satisfying

$$XAX = X, \qquad \mathcal{R}(X) = T, \qquad \mathcal{N}(X) = S.$$

It is well known that $A^{(2)}_{T,S}$ exists if and only if $AT \oplus S = \mathbb{C}^{m}$ [2]. The symbol $A \in \mathbb{C}^{m\times n}_{T,S}$ will indicate that $A \in \mathbb{C}^{m\times n}$ and $A^{(2)}_{T,S}$ exists. Recall that

$$A^{\dagger} = A^{(2)}_{\mathcal{R}(A^{*}),\mathcal{N}(A^{*})}.$$

For $A \in \mathbb{C}^{n\times n}$, the Drazin inverse of $A$ is the unique matrix $X \in \mathbb{C}^{n\times n}$ (denoted by $A^{D}$) such that

$$A^{k+1}X = A^{k}, \qquad XAX = X, \qquad AX = XA,$$

where $k = \mathrm{ind}(A)$ is the index of $A$, that is, the smallest nonnegative integer $k$ satisfying $\mathrm{rank}(A^{k}) = \mathrm{rank}(A^{k+1})$. When $\mathrm{ind}(A) = 1$, the Drazin inverse $A^{D}$ reduces to the group inverse of $A$ (denoted by $A^{\#}$). Notice that

$$A^{D} = A^{(2)}_{\mathcal{R}(A^{k}),\mathcal{N}(A^{k})} \qquad \text{and} \qquad A^{\#} = A^{(2)}_{\mathcal{R}(A),\mathcal{N}(A)}.$$
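The index and the Drazin inverse can be illustrated on a small matrix. The sketch below assumes NumPy and relies on the standard representation $A^{D} = A^{k}(A^{2k+1})^{(1)}A^{k}$, valid for any $k \ge \mathrm{ind}(A)$, with the pseudoinverse used as the inner inverse. This representation is a classical result and is not taken from this chapter; the helper names `index_of` and `drazin` are hypothetical, and the numerical rank depends on the tolerance `tol`.

```python
import numpy as np

def index_of(A, tol=1e-10):
    """Smallest nonnegative k with rank(A^k) == rank(A^(k+1))."""
    n = A.shape[0]
    P = np.eye(n)  # P holds A^k, starting from A^0 = I
    for k in range(n + 1):
        if np.linalg.matrix_rank(P, tol=tol) == np.linalg.matrix_rank(P @ A, tol=tol):
            return k
        P = P @ A
    return n  # not reached for exact ranks, since ind(A) <= n

def drazin(A, tol=1e-10):
    """Drazin inverse via A^k (A^(2k+1))^(1) A^k with k = ind(A)."""
    k = index_of(A, tol)
    Ak = np.linalg.matrix_power(A, k)
    return Ak @ np.linalg.pinv(np.linalg.matrix_power(A, 2 * k + 1)) @ Ak

# Example of index 2: an invertible 1 x 1 block and a 2 x 2 nilpotent block.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
k = index_of(A)
AD = drazin(A)
Ak = np.linalg.matrix_power(A, k)
print(k)
print(AD)
```

For this matrix the nilpotent block is annihilated and only the invertible block is inverted, which is the behavior the three defining equations encode.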
For computing {2}-inverses, the following formula, known as the Urquhart formula, is convenient. The Urquhart formula originated in [81] and was confirmed in [2, Theorem 13, P. 72].

Proposition 1.1 (Urquhart Formula) Let $A \in \mathbb{C}^{m\times n}_{r}$, $U \in \mathbb{C}^{n\times p}$, $V \in \mathbb{C}^{q\times m}$ and

$$X = U(VAU)^{(1)}V,$$

where $(VAU)^{(1)}$ is a fixed but arbitrary element of $(VAU)\{1\}$. Then
(1) $X \in A\{1\}$ if and only if $\mathrm{rank}(VAU) = r$;
(2) $X \in A\{2\}$ and $\mathcal{R}(X) = \mathcal{R}(U)$ if and only if $\mathrm{rank}(VAU) = \mathrm{rank}(U)$;
(3) $X \in A\{2\}$ and $\mathcal{N}(X) = \mathcal{N}(V)$ if and only if $\mathrm{rank}(VAU) = \mathrm{rank}(V)$;
(4) $X = A^{(2)}_{\mathcal{R}(U),\mathcal{N}(V)}$ if and only if $\mathrm{rank}(VAU) = \mathrm{rank}(U) = \mathrm{rank}(V)$;
(5) $X = A^{(1,2)}_{\mathcal{R}(U),\mathcal{N}(V)}$ if and only if $\mathrm{rank}(VAU) = \mathrm{rank}(U) = \mathrm{rank}(V) = r$.

Outer generalized inverses with prescribed image and kernel are pivotal in matrix theory. Some of their applications are in defining iterative methods for solving nonlinear equations [2], in statistics [30, 31], as well as in stable approximations of ill-posed problems and in linear and nonlinear problems [60]. Inner generalized inverses (or {1}-inverses) play an essential role in solving matrix and operator equations and systems of equations. The next results establish the extremely important relationship between {1}-inverses and the solutions of a linear matrix equation [2, 83].

Lemma 1.2 For arbitrary $A \in \mathbb{C}^{m\times n}$, $B \in \mathbb{C}^{p\times q}$, $D \in \mathbb{C}^{m\times q}$, the general linear matrix equation

$$AXB = D \tag{1.1}$$

is solvable if and only if

$$AA^{(1)}DB^{(1)}B = D$$

holds for some $A^{(1)}$, $B^{(1)}$. In this case, the general solution to (1.1) is given as

$$X = A^{(1)}DB^{(1)} + Y - A^{(1)}AYBB^{(1)}$$

for arbitrary $Y \in \mathbb{C}^{n\times p}$.

The Moore-Penrose inverse and {1}, {1, 3}, {1, 4}-inverses play a fundamental role concerning solutions to the system of linear equations

$$Ax = b, \qquad A \in \mathbb{C}^{m\times n},\ b \in \mathbb{C}^{m} \tag{1.2}$$

with respect to the unknown $x \in \mathbb{C}^{n}$.

Corollary 1.3 For $A \in \mathbb{C}^{m\times n}$ and $b \in \mathbb{C}^{m}$, the system (1.2) is consistent if and only if for some $A^{(1)}$ it holds

$$AA^{(1)}b = b,$$

in which case the general solution to the system (1.2) is

$$x = A^{(1)}b + (I - A^{(1)}A)y,$$

for arbitrary $y \in \mathbb{C}^{n}$.

Lemma 1.4 The linear system (1.2) is solvable if and only if $b \in \mathcal{R}(A)$ (or equivalently, $AA^{\dagger}b = b$). In this case, least-squares solutions to (1.2) are given by the set

$$S = A^{\dagger}b \oplus \mathcal{N}(A) = A^{\dagger}b + (I - A^{\dagger}A)y, \quad \text{for arbitrary } y \in \mathbb{C}^{n}. \tag{1.3}$$
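The solvability test and the general solution of Lemma 1.2 translate directly into a few lines of NumPy. This is a minimal sketch that assumes the Moore-Penrose inverse (`numpy.linalg.pinv`) is used as the {1}-inverse of both $A$ and $B$; the function name `solve_axb`, the tolerance, and the test matrices are illustrative assumptions.

```python
import numpy as np

def solve_axb(A, B, D, Y=None, tol=1e-10):
    """Return one solution X of AXB = D, or None if the equation is inconsistent.

    The pseudoinverse serves as the {1}-inverse of A and of B.
    Y is the free parameter of the general solution (zero by default).
    """
    A1 = np.linalg.pinv(A)
    B1 = np.linalg.pinv(B)
    # Solvability criterion: A A^(1) D B^(1) B = D.
    if np.linalg.norm(A @ A1 @ D @ B1 @ B - D) > tol * max(1.0, np.linalg.norm(D)):
        return None
    if Y is None:
        Y = np.zeros((A.shape[1], B.shape[0]))
    # General solution: X = A^(1) D B^(1) + Y - A^(1) A Y B B^(1).
    return A1 @ D @ B1 + Y - A1 @ A @ Y @ B @ B1

A = np.array([[1.0, 0.0, 1.0],
              [2.0, 0.0, 2.0]])        # 2 x 3, rank 1
B = np.array([[1.0, 1.0],
              [0.0, 0.0]])             # 2 x 2, rank 1
X0 = np.array([[1.0, 2.0],
               [3.0, 4.0],
               [5.0, 6.0]])
D = A @ X0 @ B                         # consistent right-hand side by construction
D_bad = np.eye(2)                      # not of the form A X B

X_particular = solve_axb(A, B, D)
X_shifted = solve_axb(A, B, D, Y=np.ones((3, 2)))
print(solve_axb(A, B, D_bad))          # the criterion fails for this right-hand side
```

Different choices of `Y` give different solutions of the same equation, which reflects the non-uniqueness described by the general solution formula.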
The fundamental results presented in Lemma 1.2 and Corollary 1.3 have been extended to various classes of generalized inverses and various matrix equations or systems of matrix equations. Outer inverses with prescribed range and null space are useful in solving a restricted system of linear equations (SoLE). This application is based on the following essential result from [16]:

Proposition 1.5 ([16]) Let $A \in \mathbb{C}^{m\times n}$ be of rank $r$, let $T$ be a subspace of $\mathbb{C}^{n}$, and let the condition

$$b \in AT, \qquad \dim(AT) = \dim(T)$$

be satisfied. Then the unique solution to the constrained SoLE

$$Ax = b, \qquad x \in T$$

is given by

$$x = A^{(2)}_{T,S}\,b,$$

for any subspace $S$ of $\mathbb{C}^{m}$ satisfying $AT \oplus S = \mathbb{C}^{m}$.

The solvability of matrix equations and finding their explicit solutions have many applications in physics, mechanics, control theory, and many other fields [2, 83]. Many matrix equations have been extended to Hilbert space operators [82, 97].

The aim of this chapter is to present equivalent conditions for the existence and representation of solutions to some restricted matrix equations and systems of matrix equations. Precisely, we give necessary and sufficient conditions for the existence and representations of {2}-, {1}-, and {1, 2}-inverses which satisfy certain conditions on ranges and/or kernel. Also, we obtain purely algebraic equivalent conditions for the solvability of our new systems of matrix equations, and we find the general forms of their solutions. Special cases of our equations and systems with their solutions are also considered.

The chapter is organized as follows. Section 2 contains equivalent conditions for the existence and representations of the outer inverses with the prescribed range and/or kernel. Outer inverses with the prescribed range are studied in Section 2.1, outer inverses with the prescribed kernel are studied in Section 2.2, and outer inverses with the prescribed range and kernel in Section 2.3. Main particular cases of outer inverses are discussed in Section 2.4. Matrix equations corresponding to composite outer inverses involving the MP inverse are investigated in Section 2.5. Existence and representations of one-sided (B, C)-inverses are considered in Section 3. Properties of the left and right (B, C)-inverses are presented in Sections 3.1 and 3.2, respectively, and as their particular cases, left and right inverse along a matrix are obtained in Section 3.3. Existence and representations of inner inverses with adequate range and/or kernel are investigated in Section 4.
Inner inverses with corresponding ranges are investigated in Section 4.1, and inner inverses with adequate kernels in Section 4.2. Section 4.3 involves {1, 2}-inverses with the determined range, while
Section 4.4 is devoted to {1, 2}-inverses with the determined kernel. Further, {1, 2}-inverses with determined range and kernel are considered in Section 4.5. G-outer inverses and one-sided G-outer inverses are considered in Section 5. G-outer inverses as solutions of certain systems of matrix equations are stated in Section 5.1. Left and right G-outer inverses are characterized in Section 5.2, and as their special cases, left and right G-Drazin inverses in Section 5.3. Section 6 solves several systems of equations applying G-outer inverses and describes the set of all G-outer inverses. Section 7 proposes the solvability of systems of equations by left and right G-outer or G-Drazin inverses as well as general forms of these inverses.
2 Existence and Representations of Outer Inverses

The set of all outer inverses (also called {2}-inverses) of an arbitrary matrix $A \in \mathbb{C}^{m\times n}$ is defined by

$$A\{2\} = \{X \in \mathbb{C}^{n\times m} \mid XAX = X\}. \tag{2.1}$$

According to standard notation, the set of all outer inverses of rank $s$ is denoted by $A\{2\}_{s}$, while $A^{(2)}$ stands for an arbitrary outer inverse of $A$. The {2}-inverses have applications in iterative methods for solving nonlinear equations [2] as well as in statistics [30, 65]. In particular, outer inverses play an important role in stable approximations of ill-posed problems and in linear and nonlinear problems involving rank-deficient generalized inverses [60]. Main results concerning outer inverses of matrices with the prescribed range and kernel were obtained in [2, 68, 74, 83, 90, 92, 95] and other research articles cited in these references.

It follows immediately from the definition that $\mathrm{rank}(A^{(2)}) \le r = \mathrm{rank}(A)$. Further, it is known that an arbitrary $X \in A\{1, 2\}$ is an outer inverse of $A$ satisfying $\mathrm{rank}(X) = \mathrm{rank}(A) = r$. The Urquhart formula given in Proposition 1.1 gives a uniform representation $X = B(CAB)^{(1)}C$ of inner and outer inverses of $A$. Moreover, relationships between $\mathrm{rank}(B)$, $\mathrm{rank}(CAB)$, $\mathrm{rank}(C)$, and $\mathrm{rank}(A)$ determine the range and null space of $X$. A straightforward way to compute $(CAB)^{(1)}$ is based on Lemma 2.1.

Lemma 2.1 Let $A \in \mathbb{C}^{m\times n}_{r}$ and let $E \in \mathbb{C}^{m\times m}_{m}$ and $P \in \mathbb{C}^{n\times n}_{n}$ be matrices satisfying

$$EAP = \begin{bmatrix} I_{r} & K \\ O & O \end{bmatrix}.$$

Then the $n \times m$ matrix

$$X = P \begin{bmatrix} I_{r} & O \\ O & L \end{bmatrix} E \tag{2.2}$$

is a {1}-inverse (or inner inverse) of $A$, for any $L \in \mathbb{C}^{(n-r)\times(m-r)}$.

However, the representation (2.2) requires the numerically unstable Gauss-Jordan elimination. Another computational approach is based on the following full-rank representation of the $A^{(2)}_{T,S}$ inverse with the prescribed range and kernel. The representation was proposed in [68] for the case when $FG = R$ is a full-rank factorization of a selected $R \in \mathbb{C}^{n\times m}$ such that $\mathcal{R}(R) = T$, $\mathcal{N}(R) = S$, and $GAF$ is invertible.

Proposition 2.2 ([68]) Let $A \in \mathbb{C}^{m\times n}_{r}$, let $T$ be a subspace of $\mathbb{C}^{n}$ of dimension $s \le r$, and let $S$ be a subspace of $\mathbb{C}^{m}$ of dimension $m - s$. In addition, suppose that $R \in \mathbb{C}^{n\times m}$ satisfies $\mathcal{R}(R) = T$ and $\mathcal{N}(R) = S$. Let $R = FG$ be an arbitrary full-rank decomposition of $R$. If $A^{(2)}_{T,S}$ exists, then:
(1) $GAF$ is an invertible matrix;
(2) $A^{(2)}_{T,S} = F(GAF)^{-1}G$.

One approach to calculating $B(CAB)^{(1)}C$ is to compute the inner inverse $(CAB)^{(1)}$ using one of the known direct methods based on various decompositions [73, 77, 78], iterative algorithms for computing generalized inverses [15, 63, 72, 79], or the Gaussian elimination method for computing the generalized inverse $A^{(2)}_{T,S}$ [69, 70]. In addition, various methods arise from the general Groetsch representation theorem [17, 36, 90, 93]. The main idea exploited here is to compute $(CAB)^{(1)}$ by solving the matrix equation $BUCAB = B$ or $CABUC = C$ and then simply generate the result $X := BUC$.
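The representation $X = U(VAU)^{(1)}V$ from Proposition 1.1 can be tried on a small example. The sketch below assumes NumPy and takes the pseudoinverse as the inner inverse $(VAU)^{(1)}$, which is only one admissible choice; the function name `urquhart` and the matrices are illustrative. The range and null space conditions are checked through ranks of stacked matrices.

```python
import numpy as np

def urquhart(A, U, V):
    """X = U (V A U)^(1) V, with the pseudoinverse used as the inner inverse."""
    return U @ np.linalg.pinv(V @ A @ U) @ V

rank = np.linalg.matrix_rank

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])        # rank r = 2
U = np.array([[1.0], [1.0], [0.0]])    # R(U) is spanned by (1, 1, 0)
V = np.array([[1.0, 0.0, 0.0]])        # N(V) consists of vectors with first entry 0

X = urquhart(A, U, V)

# Here rank(VAU) = rank(U) = rank(V) = 1 < r, so item (4) applies:
# X is an outer inverse with R(X) = R(U) and N(X) = N(V), but not an inner inverse.
is_outer = np.allclose(X @ A @ X, X)
is_inner = np.allclose(A @ X @ A, A)
same_range = rank(np.hstack([X, U])) == rank(X) == rank(U)   # equal column spaces
same_kernel = rank(np.vstack([X, V])) == rank(X) == rank(V)  # equal row spaces
print(is_outer, is_inner, same_range, same_kernel)
```

Choosing $U$ and $V$ of rank $r$ instead would, by item (5), produce a {1, 2}-inverse.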
2.1 Existence and Representations of Outer Inverses with Prescribed Range

For $A \in \mathbb{C}^{m\times n}$ and $B \in \mathbb{C}^{n\times k}$, an outer inverse (or {2}-inverse) of $A$ with the prescribed range $\mathcal{R}(B)$ is a solution to the constrained equation:

$$XAX = X, \qquad \mathcal{R}(X) = \mathcal{R}(B). \tag{2.3}$$

The symbol $A^{(2)}_{\mathcal{R}(B),\ast}$ will stand for a solution to the equation (2.3), i.e., an outer inverse of $A$ with the prescribed range $\mathcal{R}(B)$. The set of all solutions to the equation (2.3), i.e., the set of all outer inverses with the prescribed range $\mathcal{R}(B)$, is denoted by $A\{2\}_{\mathcal{R}(B),\ast}$.

In the first theorem of this section, we will prove that $X$ is a solution to (2.3) if and only if it is a solution to one of two systems of matrix equations presented in parts (ii) and (iii) of Theorem 2.3. To the best of our knowledge, this result is new in the literature.

Theorem 2.3 Let $A \in \mathbb{C}^{m\times n}$, $X \in \mathbb{C}^{n\times m}$ and $B \in \mathbb{C}^{n\times k}$. The following statements are equivalent:
(i) $X$ is a solution to (2.3), i.e., $X \in A\{2\}_{\mathcal{R}(B),\ast}$;
(ii) $X = BB^{\dagger}X$ and $XAB = B$;
(iii) $XAX = X$, $X = BB^{\dagger}X$, and $XAB = B$.

Proof (i) ⇒ (ii): Since $\mathcal{R}(X) = \mathcal{R}(B)$ and $XAX = X$, it follows that

$$X = BU = BB^{\dagger}(BU) = BB^{\dagger}X \quad \text{and} \quad B = XV = XA(XV) = XAB,$$

for some $U \in \mathbb{C}^{k\times m}$ and $V \in \mathbb{C}^{m\times k}$.
(ii) ⇒ (iii): The assumptions $X = BB^{\dagger}X$ and $XAB = B$ imply

$$XAX = (XAB)B^{\dagger}X = BB^{\dagger}X = X.$$

(iii) ⇒ (i): It is clear. □
The conditions $X = BB^{\dagger}X$ and $XAB = B$, which appeared in Theorem 2.3, can be replaced with some of the equivalent conditions presented in Corollary 2.4. In this way, we can obtain several matrix equation systems with solutions satisfying $X \in A\{2\}_{\mathcal{R}(B),\ast}$.

Corollary 2.4 Let $A \in \mathbb{C}^{m\times n}$, $X \in \mathbb{C}^{n\times m}$, and $B \in \mathbb{C}^{n\times k}$.
(a) If $XAX = X$, then the following statements are equivalent:
(i) $X = BB^{\dagger}X$;
(ii) $XA = BB^{\dagger}XA$;
(iii) $XAA^{\dagger} = BB^{\dagger}XAA^{\dagger}$;
(iv) $XAA^{*} = BB^{\dagger}XAA^{*}$;
(v) $\mathcal{R}(X) \subseteq \mathcal{R}(B)$.
(b) The following statements are equivalent:
(i) $XAB = B$;
(ii) $XABB^{\dagger} = BB^{\dagger}$;
(iii) $XABB^{*} = BB^{*}$;
(iv) $XA(B^{\dagger})^{*} = (B^{\dagger})^{*}$.

Under the hypothesis $XAX = X$, we observe that $XAB = B$ is equivalent to $\mathcal{R}(B) \subseteq \mathcal{R}(X)$.

The following conditions for the existence of outer inverses of $A$ with the prescribed range $\mathcal{R}(B)$ and their representations were proved in [74, Theorem 3] and present a theoretical basis for calculating these inverses.

Theorem 2.5 ([74, Theorem 3]) Let $A \in \mathbb{C}^{m\times n}$ and $B \in \mathbb{C}^{n\times k}$.
(a) The following statements are equivalent:
(i) there exists a {2}-inverse $X$ of $A$ satisfying $\mathcal{R}(X) = \mathcal{R}(B)$;
(ii) there exists $U \in \mathbb{C}^{k\times m}$ such that $BUAB = B$;
(iii) $\mathcal{N}(AB) = \mathcal{N}(B)$;
(iv) $\mathrm{rank}(AB) = \mathrm{rank}(B)$;
(v) $B(AB)^{(1)}AB = B$, for some (equivalently every) $(AB)^{(1)} \in (AB)\{1\}$.
(b) If the statements in (a) are true, then the set of all outer inverses with the prescribed range $\mathcal{R}(B)$ is represented by

$$A\{2\}_{\mathcal{R}(B),\ast} = \left\{ B(AB)^{(1)} \mid (AB)^{(1)} \in (AB)\{1\} \right\} = \left\{ BU \mid U \in \mathbb{C}^{k\times m},\ BUAB = B \right\}.$$

Moreover,

$$A\{2\}_{\mathcal{R}(B),\ast} = \left\{ B(AB)^{(1)} + BY\left(I_{m} - AB(AB)^{(1)}\right) \mid Y \in \mathbb{C}^{k\times m} \right\},$$

where $(AB)^{(1)} \in (AB)\{1\}$ is arbitrary but fixed.

Theorem 2.5 provides not only criteria for the existence of an outer inverse $A^{(2)}_{\mathcal{R}(B),\ast}$ with prescribed range, but also a method for computing such an inverse. Namely, the problem of computing a {2}-inverse $X$ of $A$ satisfying $\mathcal{R}(X) = \mathcal{R}(B)$ boils down to the problem of computing a solution to the matrix equation $BUAB = B$, where $U$ is an unknown matrix taking values in $\mathbb{C}^{k\times m}$. If $U$ is an arbitrary solution to this equation, then $X := BU$ is a {2}-inverse of $A$ satisfying $\mathcal{R}(X) = \mathcal{R}(B)$.

Algorithm 2.1 Computing an outer inverse with prescribed range
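The procedure described by Theorem 2.5 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions and not the chapter's own listing: the existence test uses the rank condition (iv), the equation $BUAB = B$ is solved by the particular choice $U = (AB)^{\dagger}$, and the function name `outer_inverse_with_range` and the tolerance are hypothetical.

```python
import numpy as np

def outer_inverse_with_range(A, B, tol=1e-10):
    """Return a {2}-inverse X of A with R(X) = R(B), or None if none exists."""
    AB = A @ B
    # Existence criterion: rank(AB) = rank(B).
    if np.linalg.matrix_rank(AB, tol=tol) != np.linalg.matrix_rank(B, tol=tol):
        return None
    U = np.linalg.pinv(AB)  # one particular solution of B U A B = B
    return B @ U            # X := BU

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])               # 3 x 2
B = np.array([[1.0], [1.0]])             # prescribed range: span of (1, 1)
X = outer_inverse_with_range(A, B)

A_bad = np.array([[1.0, -1.0],
                  [2.0, -2.0],
                  [0.0, 0.0]])           # A_bad @ B = 0, so rank(AB) < rank(B)
print(X)
print(outer_inverse_with_range(A_bad, B))
```

The returned matrix can also be checked against part (ii) of Theorem 2.3, namely $X = BB^{\dagger}X$ and $XAB = B$. Other solutions $U$ of $BUAB = B$ would give other outer inverses with the same range.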
2.2 Existence and Representations of Outer Inverses with Prescribed Kernel

Let $A \in \mathbb{C}^{m\times n}$ and $C \in \mathbb{C}^{l\times m}$. An outer inverse (or {2}-inverse) of $A$ with the prescribed kernel $\mathcal{N}(C)$ is a solution of the constrained equation:

$$XAX = X, \qquad \mathcal{N}(X) = \mathcal{N}(C). \tag{2.4}$$

We use $A^{(2)}_{\ast,\mathcal{N}(C)}$ to denote a solution of the equation (2.4), i.e., an outer inverse of $A$ with the prescribed kernel $\mathcal{N}(C)$, and $A\{2\}_{\ast,\mathcal{N}(C)}$ to denote the set of all solutions to the equation (2.4), i.e., the set of all outer inverses with the prescribed kernel $\mathcal{N}(C)$.

We verify that $X$ is a solution of the constrained equation (2.4) if and only if it is a solution of one of two systems of matrix equations given in parts (ii) and (iii) of Theorem 2.6.

Theorem 2.6 Let $A \in \mathbb{C}^{m\times n}$, $X \in \mathbb{C}^{n\times m}$ and $C \in \mathbb{C}^{l\times m}$. The following statements are equivalent:
(i) $X$ is a solution of (2.4), i.e., $X \in A\{2\}_{\ast,\mathcal{N}(C)}$;
(ii) $X = XC^{\dagger}C$ and $CAX = C$;
(iii) $XAX = X$, $X = XC^{\dagger}C$, and $CAX = C$.

Proof (i) ⇒ (ii): By $\mathcal{N}(X) = \mathcal{N}(C)$ and $XAX = X$, we conclude that

$$X = UC = (UC)C^{\dagger}C = XC^{\dagger}C \quad \text{and} \quad C = VX = (VX)AX = CAX,$$

for some $U \in \mathbb{C}^{n\times l}$ and $V \in \mathbb{C}^{l\times n}$.
(ii) ⇒ (iii): From $X = XC^{\dagger}C$ and $CAX = C$, we notice that

$$XAX = XC^{\dagger}(CAX) = XC^{\dagger}C = X.$$

(iii) ⇒ (i): This implication is obvious. □
To get new systems of matrix equations which have an outer inverse of $A$ with the prescribed kernel $\mathcal{N}(C)$ as a solution, we can replace the conditions $X = XC^\dagger C$ and $CAX = C$ of Theorem 2.6 with some of the following necessary and sufficient conditions.

Remark 2.7 Let $A \in \mathbb{C}^{m \times n}$, $X \in \mathbb{C}^{n \times m}$, and $C \in \mathbb{C}^{l \times m}$.
(a) Under the assumption $XAX = X$, the following statements are equivalent:
(i) $X = XC^\dagger C$;
(ii) $AX = AXC^\dagger C$;
(iii) $A^\dagger AX = A^\dagger AXC^\dagger C$;
(iv) $A^*AX = A^*AXC^\dagger C$;
(v) $\mathcal{N}(C) \subseteq \mathcal{N}(X)$.
(b) The following statements are equivalent:
(i) $CAX = C$;
(ii) $C^\dagger CAX = C^\dagger C$;
(iii) $C^*CAX = C^*C$;
(iv) $(C^\dagger)^*AX = (C^\dagger)^*$.
Under the hypothesis $XAX = X$, we observe that $CAX = C$ is equivalent to $\mathcal{N}(X) \subseteq \mathcal{N}(C)$.
The following existence criteria for outer inverses with the prescribed kernel were presented in [74, Theorem 5].

Theorem 2.8 ([74, Theorem 5]) Let $A \in \mathbb{C}^{m \times n}$ and $C \in \mathbb{C}^{l \times m}$.
(a) The following statements are equivalent:
(i) there exists a {2}-inverse $X$ of $A$ satisfying $\mathcal{N}(X) = \mathcal{N}(C)$;
(ii) there exists $V \in \mathbb{C}^{n \times l}$ such that $CAVC = C$;
(iii) $\mathcal{R}(CA) = \mathcal{R}(C)$;
(iv) $\operatorname{rank}(CA) = \operatorname{rank}(C)$;
(v) $CA(CA)^{(1)}C = C$, for some (equivalently every) $(CA)^{(1)} \in (CA)\{1\}$.
(b) If the statements in (a) are true, then the set of all outer inverses with the prescribed null space $\mathcal{N}(C)$ is represented by
$$A\{2\}_{*,\mathcal{N}(C)} = \left\{ (CA)^{(1)}C \mid (CA)^{(1)} \in (CA)\{1\} \right\} = \left\{ VC \mid V \in \mathbb{C}^{n \times l},\ CAVC = C \right\}.$$
Moreover,
$$A\{2\}_{*,\mathcal{N}(C)} = \left\{ (CA)^{(1)}C + \left(I_n - (CA)^{(1)}CA\right)YC \mid Y \in \mathbb{C}^{n \times l} \right\},$$
where $(CA)^{(1)}$ is an arbitrary fixed matrix from $(CA)\{1\}$.

For more details about the existence of $A^{(2)}_{\mathcal{R}(B),*}$ or $A^{(2)}_{*,\mathcal{N}(C)}$ for complex matrices, see [10], for tensors see [73], and for matrices over a ring, see [35].

Theorem 2.8 reduces the problem of computing a {2}-inverse $X$ of $A$ satisfying $\mathcal{N}(X) = \mathcal{N}(C)$ to the problem of computing a solution to the matrix equation $CAVC = C$, where $V$ is an unknown matrix taking values in $\mathbb{C}^{n \times l}$. Then $X := A^{(2)}_{*,\mathcal{N}(C)} = VC$.

Algorithm 2.2 Computing an outer inverse with prescribed null space
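The dual procedure of Algorithm 2.2 (solve $CAVC = C$, then form $X = VC$) admits the same kind of sketch. The function name `outer_inverse_with_kernel` and the data are assumptions made for illustration, and the pseudoinverse again stands in for an arbitrary inner inverse $(CA)^{(1)}$.

```python
import numpy as np

def outer_inverse_with_kernel(A, C, tol=1e-10):
    """Return X = V C with C A V C = C, or None if rank(CA) != rank(C)."""
    CA = C @ A
    if np.linalg.matrix_rank(CA, tol=tol) != np.linalg.matrix_rank(C, tol=tol):
        return None  # by Theorem 2.8, no {2}-inverse with null space N(C) exists
    V = np.linalg.pinv(CA)  # one particular inner inverse of CA (assumed choice)
    return V @ C

# Illustrative data (not taken from the text).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 1.0]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
X = outer_inverse_with_kernel(A, C)
print(np.allclose(X @ A @ X, X))
```

Equality of the null spaces can be checked through ranks: stacking $X$ on top of $C$ must not increase the rank beyond $\operatorname{rank}(C) = \operatorname{rank}(X)$.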
Remark 2.9 Algorithms 2.1 and 2.2 share a straightforward algorithmic framework consisting of two global steps:
1. Solve the requested equation(s);
2. Multiply the solution obtained in the first step by appropriate expressions, if necessary.
The underlying equations can be solved by various methods, which leads to various algorithms. An approach based on recurrent neural networks was used in [74]. An approach based on symbolic solutions of the underlying matrix equations in the computational package Mathematica was proposed in [75].
2.3 Existence and Representations of Outer Inverses with Prescribed Range and Kernel
For $A \in \mathbb{C}^{m \times n}$, $B \in \mathbb{C}^{n \times k}$, and $C \in \mathbb{C}^{l \times m}$, an outer inverse (or {2}-inverse) of $A$ with the prescribed range $\mathcal{R}(B)$ and kernel $\mathcal{N}(C)$ is the unique solution of the constrained equation
$$XAX = X, \qquad \mathcal{R}(X) = \mathcal{R}(B), \qquad \mathcal{N}(X) = \mathcal{N}(C). \tag{2.5}$$
By $A^{(2)}_{\mathcal{R}(B),\mathcal{N}(C)}$ we denote the solution of equation (2.5), i.e., the outer inverse of $A$ with the prescribed range $\mathcal{R}(B)$ and kernel $\mathcal{N}(C)$. By Theorems 2.3 and 2.6, we obtain some systems of matrix equations whose solutions include an outer inverse of $A$ with the prescribed range $\mathcal{R}(B)$ and kernel $\mathcal{N}(C)$.

Corollary 2.10 Let $A \in \mathbb{C}^{m \times n}$, $X \in \mathbb{C}^{n \times m}$, $B \in \mathbb{C}^{n \times k}$, and $C \in \mathbb{C}^{l \times m}$. The following statements are equivalent:
(i) $X$ is a solution of (2.5), i.e., $X = A^{(2)}_{\mathcal{R}(B),\mathcal{N}(C)}$;
(ii) $X = BB^\dagger X = XC^\dagger C$, $XAB = B$, and $CAX = C$;
(iii) $XAX = X$, $X = BB^\dagger X = XC^\dagger C$, $XAB = B$, and $CAX = C$.

Several necessary and sufficient conditions for the existence of an outer inverse with the prescribed range and kernel were shown in [74, Theorem 6].
Theorem 2.11 ([74, Theorem 6]) Let $A \in \mathbb{C}^{m \times n}$, $B \in \mathbb{C}^{n \times k}$, and $C \in \mathbb{C}^{l \times m}$.
(a) The following statements are equivalent:
(i) there exists a {2}-inverse $X$ of $A$ satisfying $\mathcal{R}(X) = \mathcal{R}(B)$ and $\mathcal{N}(X) = \mathcal{N}(C)$;
(ii) there exists $U \in \mathbb{C}^{k \times l}$ such that $BUCAB = B$ and $CABUC = C$;
(iii) there exist $U, V \in \mathbb{C}^{k \times l}$ such that $BUCAB = B$ and $CABVC = C$;
(iv) there exist $U \in \mathbb{C}^{k \times m}$ and $V \in \mathbb{C}^{n \times l}$ such that $BUAB = B$, $CAVC = C$, and $BU = VC$;
(v) there exist $U \in \mathbb{C}^{k \times m}$ and $V \in \mathbb{C}^{n \times l}$ such that $CABU = C$ and $VCAB = B$;
(vi) $\mathcal{N}(CAB) = \mathcal{N}(B)$, $\mathcal{R}(CAB) = \mathcal{R}(C)$;
(vii) $\operatorname{rank}(CAB) = \operatorname{rank}(B) = \operatorname{rank}(C)$;
(viii) $B(CAB)^{(1)}CAB = B$ and $CAB(CAB)^{(1)}C = C$, for some (equivalently every) $(CAB)^{(1)} \in (CAB)\{1\}$.
(b) If the statements in (a) are true, then the unique {2}-inverse of $A$ with the prescribed range $\mathcal{R}(B)$ and null space $\mathcal{N}(C)$ is represented by
$$A^{(2)}_{\mathcal{R}(B),\mathcal{N}(C)} = B(CAB)^{(1)}C = BUC,$$
for arbitrary $(CAB)^{(1)} \in (CAB)\{1\}$ and arbitrary $U \in \mathbb{C}^{k \times l}$ satisfying $BUCAB = B$ and $CABUC = C$.

Comparing the representations of Theorem 2.11 with the full-rank representation restated in Proposition 2.2, it is remarkable that the representations given in Theorem 2.11 do not require computation of a full-rank factorization $R = FG$ of the matrix $R$. More precisely, the representations of $A^{(2)}_{\mathcal{R}(B),\mathcal{N}(C)}$ from Theorem 2.11 boil down to the full-rank representation of $A^{(2)}_{\mathcal{R}(F),\mathcal{N}(G)}$ from Proposition 2.2 in the case when $BC = R$ is a full-rank factorization of $R$ and $CAB$ is invertible.

Remark 2.12 In particular, for idempotents $P$ and $Q$, the notion of the image-kernel $(P, Q)$-inverse of $A$ was presented in [32] for elements of a ring. We can define the image-kernel $(P, Q)$-inverse in the matrix case as follows: for idempotents $P \in \mathbb{C}^{n \times n}$ and $Q \in \mathbb{C}^{m \times m}$, $X \in \mathbb{C}^{n \times m}$ is an image-kernel $(P, Q)$-inverse of $A \in \mathbb{C}^{m \times n}$ if
$$XAX = X, \qquad \mathcal{R}(XA) = \mathcal{R}(P), \qquad \text{and} \qquad \mathcal{N}(AX) = \mathcal{N}(I - Q).$$
Note that, for idempotents $B$ and $C$, $X$ is the outer inverse of $A$ with the prescribed range $\mathcal{R}(B)$ and kernel $\mathcal{N}(C)$ if and only if $X$ is the image-kernel $(B, I - C)$-inverse of $A$. More characterizations of the image-kernel $(P, Q)$-inverse can be found in [45, 48, 49, 52, 53].

Theorem 2.11 provides a representation of a {2}-inverse $X$ of $A$ satisfying $\mathcal{R}(X) = \mathcal{R}(B)$ and $\mathcal{N}(X) = \mathcal{N}(C)$. It also suggests the following procedure for computing those generalized inverses. First, it is necessary to verify whether $\operatorname{rank}(CAB) = \operatorname{rank}(B) = \operatorname{rank}(C)$. If this is true, then by Theorem 2.11 the equations $BUCAB = B$ and $CABVC = C$ are solvable and have the same sets of solutions. We compute an arbitrary solution $U$ of the equation $BUCAB = B$, and then $X = BUC$ is the desired {2}-inverse of $A$.

Algorithm 2.3 Computing a {2}-inverse with prescribed range and null space
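The procedure just described (rank test, solve $BUCAB = B$, return $X = BUC$) can be sketched as below. The function name `outer_inverse_range_kernel` and the small test matrices are assumptions, and $U = (CAB)^\dagger$ is used as one admissible solution of $BUCAB = B$; this is a sketch, not the authors' implementation of Algorithm 2.3.

```python
import numpy as np

def outer_inverse_range_kernel(A, B, C, tol=1e-10):
    """Return X = B U C with R(X) = R(B), N(X) = N(C), or None if it does not exist."""
    CAB = C @ A @ B
    r = np.linalg.matrix_rank(CAB, tol=tol)
    if r != np.linalg.matrix_rank(B, tol=tol) or r != np.linalg.matrix_rank(C, tol=tol):
        return None  # condition (vii) of Theorem 2.11 fails
    U = np.linalg.pinv(CAB)  # one inner inverse of CAB (assumed choice)
    return B @ U @ C

# Illustrative data (not taken from the text); here CAB = [[1, 2], [0, 1]] is invertible.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 1.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
X = outer_inverse_range_kernel(A, B, C)
print(X)
```

Because the outer inverse with prescribed range and kernel is unique, any other inner inverse of $CAB$ would produce the same $X$.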
Example 2.1 Consider
$$A = \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 & 0 & 0 \\ -1 & -1 & 1 & -1 & 0 & 0 \\ -1 & -1 & -1 & 1 & 0 & 0 \\ -1 & -1 & -1 & 0 & 2 & -1 \\ -1 & -1 & 0 & -1 & -1 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 0.793372 & 0.265655 \\ 0.140305 & 0.633824 \\ 0.329002 & 0.184927 \\ 0.14117 & 0.427424 \\ 0.0468532 & 0.0979332 \\ 0.89495 & 0.253673 \end{bmatrix}, \qquad C = I_6.$$
An application of Algorithm 2.3 produces the class $A^{(2)}_{\mathcal{R}(B),*} = BUC$, where $U$ is a solution of $BUCAB = B$ defined by $u_{1,1}, u_{1,2}, u_{1,3}, u_{1,4}, u_{2,1}, u_{2,2}, u_{2,3}, u_{2,4}$ arbitrary and
$$\begin{aligned}
u_{1,5} &= 0.13u_{1,1} - 0.13u_{1,2} - 0.54u_{1,3} - 0.55u_{1,4} - 0.35, \\
u_{1,6} &= -0.57u_{1,1} + 0.57u_{1,2} - 0.57u_{1,3} - 0.03u_{1,4} + 0.43, \\
u_{2,5} &= -2.7 \times 10^{-17}u_{1,1} + 2.7 \times 10^{-17}u_{1,2} - 5.5 \times 10^{-19}u_{1,3} + 2.7 \times 10^{-17}u_{1,4} + 0.13u_{2,1} - 0.13u_{2,2} - 0.54u_{2,3} - 0.56u_{2,4} - 0.2515, \\
u_{2,6} &= 2.1 \times 10^{-17}u_{1,1} - 2.1 \times 10^{-17}u_{1,2} + 1.1 \times 10^{-17}u_{1,3} + 3.1 \times 10^{-18}u_{1,4} - 0.57u_{2,1} + 0.57u_{2,2} - 0.57u_{2,3} - 0.03u_{2,4}.
\end{aligned}$$
Example 2.2 (a) Let us select the two-variable rational matrix
$$A = \begin{bmatrix} \frac{1}{z_2} & z_1 & 0 \\ 0 & \frac{1}{z_2} & z_1 \\ 0 & 0 & 0 \end{bmatrix}.$$
Since the index of $A$ is $\operatorname{ind}(A) = 1$, we can request the group inverse of $A$ as the output of Algorithm 2.3 under the choice $B = C = A$. The underlying matrix equation $BUCAB = B$ becomes $AUAAA = A$. The solution to $AUAAA = A$ is given as
$$U = \begin{bmatrix} u_{1,1} & -z_1 z_2\left(2z_2^3 + u_{2,2}\right) & u_{1,3} \\ \dfrac{z_2^3 - u_{1,1}}{z_1 z_2} & u_{2,2} & u_{2,3} \\ \dfrac{u_{1,1} - z_2^3}{z_1^2 z_2^2} & \dfrac{z_2^3 - u_{2,2}}{z_1 z_2} & u_{3,3} \end{bmatrix}.$$
The result $X = AUA$ is equal to
$$X = A^D = A^\# = \begin{bmatrix} z_2 & -z_1 z_2^2 & -2z_1^2 z_2^3 \\ 0 & z_2 & z_1 z_2^2 \\ 0 & 0 & 0 \end{bmatrix}.$$
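(b) In this part, consider $B = C = A^T$. The symbolic solution to $A^TUA^TAA^T = A^T$ is equal to
$$U = \begin{bmatrix} u_{1,1} & \dfrac{\frac{z_1^2 z_2^5 + z_2^3}{z_1^4 z_2^4 + z_1^2 z_2^2 + 1} - u_{1,1}}{z_1 z_2} & \dfrac{u_{1,1} - \frac{2z_1^2 z_2^5 + z_2^3}{z_1^4 z_2^4 + z_1^2 z_2^2 + 1}}{z_1^2 z_2^2} \\ z_1 z_2\left(-\dfrac{z_2^3}{z_1^4 z_2^4 + z_1^2 z_2^2 + 1} - u_{2,2}\right) & u_{2,2} & \dfrac{\frac{z_1^2 z_2^5 + z_2^3}{z_1^4 z_2^4 + z_1^2 z_2^2 + 1} - u_{2,2}}{z_1 z_2} \\ u_{3,1} & u_{3,2} & u_{3,3} \end{bmatrix},$$
and the product $A^TUA^T$ gives
$$A^\dagger = \begin{bmatrix} \dfrac{z_1^2 z_2^3 + z_2}{z_1^4 z_2^4 + z_1^2 z_2^2 + 1} & -\dfrac{z_1 z_2^2}{z_1^4 z_2^4 + z_1^2 z_2^2 + 1} & 0 \\ \dfrac{z_1^3 z_2^4}{z_1^4 z_2^4 + z_1^2 z_2^2 + 1} & \dfrac{z_2}{z_1^4 z_2^4 + z_1^2 z_2^2 + 1} & 0 \\ -\dfrac{z_1^2 z_2^3}{z_1^4 z_2^4 + z_1^2 z_2^2 + 1} & \dfrac{z_2^2\left(z_2^2 z_1^3 + z_1\right)}{z_1^4 z_2^4 + z_1^2 z_2^2 + 1} & 0 \end{bmatrix}.$$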
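(c) Now, let us choose
$$B = \begin{bmatrix} z_1 z_2 & 0 \\ z_2 & z_1^2 \\ z_1 z_2 & z_2^3 \end{bmatrix}, \qquad C = \begin{bmatrix} z_1 & z_2^2 & 0 \\ 0 & z_1^2 & z_1 z_2 \end{bmatrix}.$$
Clearly, the requirement $\operatorname{rank}(CAB) = \operatorname{rank}(B) = \operatorname{rank}(C) = 2 = \operatorname{rank}(A)$ is satisfied. Then Algorithm 2.3 produces $A^{(1,2)}_{\mathcal{R}(B),\mathcal{N}(C)}$, which, writing $\delta := z_2^5 + z_2^4 - z_1^3 z_2^2 + z_1$ for the common factor of the denominators, is equal to
$$\begin{bmatrix} \dfrac{z_2\left(z_2^4 + z_1\right)}{\delta} & -\dfrac{z_1^2 z_2^2}{\delta} & -\dfrac{z_2^3\left(z_2^5 + z_1 z_2 + z_1^3\right)}{z_1^2\,\delta} \\ -\dfrac{\left(z_1^3 - z_2^3\right)z_2^2}{z_1\,\delta} & \dfrac{z_1 z_2}{\delta} & -\dfrac{z_2^8 - z_1^3 z_2^5 - z_1^3 z_2^2}{z_1^3\,\delta} \\ -\dfrac{z_2^4 - z_1^3 z_2}{z_1^2\,\delta} & -\dfrac{z_2^2\left(z_1^3 - z_2^2(z_2 + 1)\right)}{z_1\,\delta} & \dfrac{z_2^3\left(-z_1^5 - z_2 z_1^3 + z_2^2(z_2 + 1)z_1^2 + z_2^4\right)}{z_1^4\,\delta} \end{bmatrix}.$$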
(d) Consider the same $A$, $B$ as in part (c) and
$$C = \begin{bmatrix} z_1 & z_2^2 & 0 \\ 0 & z_1^2 & z_1 z_2 \\ z_2 & z_1 & 0 \end{bmatrix}.$$
Since $\operatorname{rank}(CAB) = \operatorname{rank}(B)$ holds, Algorithm 2.3 produces the class $A\{2\}_{\mathcal{R}(B),*}$. Writing $\delta := z_2^5 + z_2^4 - z_1^3 z_2^2 + z_1$ and
$$\varphi := \left(z_1^2 - z_2^3\right)u_{1,1} - \frac{z_2^4 + z_1 z_2^2 + z_1}{\delta}, \qquad \psi := z_1\left(z_1^2 - z_2^3\right)u_{2,1} + \frac{z_2\left(z_1^2 z_2 + z_2^2 + z_2 + 1\right)}{\delta},$$
the class is defined as
$$A\{2\}_{\mathcal{R}(B),*} = \begin{bmatrix} \dfrac{z_2\left(z_2^4 + z_1\right)}{\delta} & -\dfrac{z_1^2 z_2^2}{\delta} & z_2\,\varphi \\ -\dfrac{\left(z_1^3 - z_2^3\right)z_2^2}{z_1\,\delta} & \dfrac{z_1 z_2}{\delta} & \psi + \dfrac{z_2}{z_1}\,\varphi \\ -\dfrac{z_2^4 - z_1^3 z_2}{z_1^2\,\delta} & -\dfrac{z_2^2\left(z_1^3 - z_2^2(z_2 + 1)\right)}{z_1\,\delta} & z_2\,\varphi + \dfrac{z_2^3}{z_1^2}\,\psi \end{bmatrix},$$
where $u_{1,1}$ and $u_{2,1}$ are arbitrary.

2.4 Special Cases of Outer Inverses with Given Range and Kernel
Several special cases of Theorem 2.11 are listed below.
(a) If $\operatorname{rank}(CAB) = \operatorname{rank}(B) = \operatorname{rank}(C) = \operatorname{rank}(A)$, then the outer inverse $A^{(2)}_{\mathcal{R}(B),\mathcal{N}(C)}$ becomes $A^{(1,2)}_{\mathcal{R}(B),\mathcal{N}(C)}$.
(b) In the case that $A$ is nonsingular and $B = C = I$, the outer inverse $A^{(2)}_{\mathcal{R}(B),\mathcal{N}(C)}$ coincides with the usual inverse $A^{-1}$. Then the matrix equation $BUCAB = B$ becomes $UA = I$ and $A^{-1} = U$.
(c) For $B = C = A^*$ or when $A^* = BC$ is a full-rank factorization of $A^*$, it follows that $A^{(2)}_{\mathcal{R}(B),\mathcal{N}(C)} = A^\dagger$.
(d) The choice $m = n$, $B = C = A^l$, $l \geq \operatorname{ind}(A)$, or the full-rank factorization $A^l = BC$ implies $A^{(2)}_{\mathcal{R}(B),\mathcal{N}(C)} = A^D$.
(e) The choice $m = n$, $B = C = A$ or the full-rank factorization $A = BC$ produces $A^{(2)}_{\mathcal{R}(B),\mathcal{N}(C)} = A^\#$.
(f) In the case $m = n$ when $A$ is invertible, the inverse matrix $A^{-1}$ can be generated by two choices: $B = C = A$, or $B = C = I$.
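Cases (c) and (d) can be checked numerically with the representation $X = B(CAB)^{(1)}C$ of Theorem 2.11. The sketch below is illustrative only: the helper `outer_inverse`, the matrix $A$ (chosen here with $\operatorname{ind}(A) = 2$), and the use of the pseudoinverse as inner inverse are assumptions, not part of the text.

```python
import numpy as np
from numpy.linalg import matrix_power, pinv

def outer_inverse(A, B, C):
    """X = B (CAB)^(1) C, with the Moore-Penrose inverse as the inner inverse."""
    return B @ pinv(C @ A @ B) @ C

# Illustrative matrix: rank(A) = 2 and rank(A^2) = rank(A^3) = 1, so ind(A) = 2.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

# Case (c): B = C = A^* yields the Moore-Penrose inverse.
X_mp = outer_inverse(A, A.conj().T, A.conj().T)

# Case (d): B = C = A^l with l >= ind(A) yields the Drazin inverse.
A2 = matrix_power(A, 2)
X_d = outer_inverse(A, A2, A2)

print(np.allclose(X_mp, pinv(A)))
print(np.allclose(A @ X_d, X_d @ A), np.allclose(matrix_power(A, 3) @ X_d, A2))
```

The Drazin inverse is verified through its defining equations $XAX = X$, $AX = XA$, and $A^{k+1}X = A^k$ rather than against a library routine.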
(g) Theorem 2.11 and the full-rank representation of {2, 4}- and {2, 3}-inverses from [72] are a theoretical basis for computing {2, 4}- and {2, 3}-inverses with the prescribed range and null space. Taking $B = (CA)^*$ in Theorem 2.11, we can verify the next result.

Corollary 2.13 Let $A \in \mathbb{C}^{m \times n}$ and $C \in \mathbb{C}^{l \times m}$.
(a) The following statements are equivalent:
(i) there exists a {2, 4}-inverse $X$ of $A$ satisfying $\mathcal{R}(X) = \mathcal{R}((CA)^*)$ and $\mathcal{N}(X) = \mathcal{N}(C)$;
(ii) there exists $U \in \mathbb{C}^{l \times l}$ such that $(CA)^*UCA(CA)^* = (CA)^*$ and $CA(CA)^*UC = C$;
(iii) there exist $U, V \in \mathbb{C}^{l \times l}$ such that $(CA)^*UCA(CA)^* = (CA)^*$ and $CA(CA)^*VC = C$;
(iv) there exist $U \in \mathbb{C}^{l \times m}$ and $V \in \mathbb{C}^{n \times l}$ such that $(CA)^*UA(CA)^* = (CA)^*$, $CAVC = C$, and $(CA)^*U = VC$;
(v) there exist $U \in \mathbb{C}^{l \times m}$ and $V \in \mathbb{C}^{n \times l}$ such that $CA(CA)^*U = C$ and $VCA(CA)^* = (CA)^*$;
(vi) $\mathcal{N}(CA(CA)^*) = \mathcal{N}((CA)^*)$, $\mathcal{R}(CA(CA)^*) = \mathcal{R}(C)$;
(vii) $\operatorname{rank}(CA(CA)^*) = \operatorname{rank}((CA)^*) = \operatorname{rank}(C)$;
(viii) $(CA)^*(CA(CA)^*)^{(1)}CA(CA)^* = (CA)^*$ and $CA(CA)^*(CA(CA)^*)^{(1)}C = C$, for some (equivalently every) $(CA(CA)^*)^{(1)} \in (CA(CA)^*)\{1\}$.
(b) If the statements in (a) are true, then the unique {2, 4}-inverse of $A$ with the prescribed range $\mathcal{R}((CA)^*)$ and null space $\mathcal{N}(C)$ is represented by
$$A^{(2,4)}_{\mathcal{R}((CA)^*),\mathcal{N}(C)} = (CA)^*\left(CA(CA)^*\right)^{(1)}C = (CA)^*UC,$$
for arbitrary $(CA(CA)^*)^{(1)} \in (CA(CA)^*)\{1\}$ and arbitrary $U \in \mathbb{C}^{l \times l}$ satisfying $(CA)^*UCA(CA)^* = (CA)^*$ and $CA(CA)^*UC = C$.

Corollary 2.14 Let $A \in \mathbb{C}^{m \times n}$ and $B \in \mathbb{C}^{n \times k}$.
(a) The following statements are equivalent:
(i) there exists a {2, 3}-inverse $X$ of $A$ satisfying $\mathcal{R}(X) = \mathcal{R}(B)$ and $\mathcal{N}(X) = \mathcal{N}((AB)^*)$;
(ii) there exists $U \in \mathbb{C}^{k \times k}$ such that $BU(AB)^*AB = B$ and $(AB)^*ABU(AB)^* = (AB)^*$;
(iii) there exist $U, V \in \mathbb{C}^{k \times k}$ such that $BU(AB)^*AB = B$ and $(AB)^*ABV(AB)^* = (AB)^*$;
(iv) there exist $U \in \mathbb{C}^{k \times m}$ and $V \in \mathbb{C}^{n \times k}$ such that $BUAB = B$, $(AB)^*AV(AB)^* = (AB)^*$, and $BU = V(AB)^*$;
(v) there exist $U \in \mathbb{C}^{k \times m}$ and $V \in \mathbb{C}^{n \times k}$ such that $(AB)^*ABU = (AB)^*$ and $V(AB)^*AB = B$;
(vi) $\mathcal{N}((AB)^*AB) = \mathcal{N}(B)$, $\mathcal{R}((AB)^*AB) = \mathcal{R}((AB)^*)$;
(vii) $\operatorname{rank}((AB)^*AB) = \operatorname{rank}(B) = \operatorname{rank}((AB)^*)$;
(viii) $B((AB)^*AB)^{(1)}(AB)^*AB = B$ and $(AB)^*AB((AB)^*AB)^{(1)}(AB)^* = (AB)^*$, for some (equivalently every) $((AB)^*AB)^{(1)} \in ((AB)^*AB)\{1\}$.
(b) If the statements in (a) are true, then the unique {2, 3}-inverse of $A$ with the prescribed range $\mathcal{R}(B)$ and null space $\mathcal{N}((AB)^*)$ is represented by
$$A^{(2,3)}_{\mathcal{R}(B),\mathcal{N}((AB)^*)} = B\left((AB)^*AB\right)^{(1)}(AB)^* = BU(AB)^*,$$
for arbitrary $((AB)^*AB)^{(1)} \in ((AB)^*AB)\{1\}$ and arbitrary $U \in \mathbb{C}^{k \times k}$ satisfying $BU(AB)^*AB = B$ and $(AB)^*ABU(AB)^* = (AB)^*$.

Corollary 2.15 shows the equivalence between the first representation given in (2.14) of Corollary 2.14 and [71, Corollary 1].
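Corollary 2.15 Let $A \in \mathbb{C}^{m \times n}$ and $B \in \mathbb{C}^{n \times k}$ satisfy $\operatorname{rank}(AB) = \operatorname{rank}(B)$. Then
$$A^{(2,3)}_{\mathcal{R}(B),\mathcal{R}(AB)^\perp} = B(AB)^{(1,3)}.$$
In the dual case, Corollary 2.16 is an additional result to Corollary 1 from [71].

Corollary 2.16 Let $A \in \mathbb{C}^{m \times n}$ and $C \in \mathbb{C}^{l \times m}$ satisfy $\operatorname{rank}(CA) = \operatorname{rank}(C)$. Then
$$A^{(2,4)}_{\mathcal{N}(CA)^\perp,\mathcal{N}(C)} = (CA)^{(1,4)}C.$$
(h) Further, Theorems 2.5 and 2.8 provide a way to characterize {1, 2, 4}- and {1, 2, 3}-inverses of a matrix.

Theorem 2.17 Let $A \in \mathbb{C}^{m \times n}$. Then
$$A\{1, 2, 4\} = A\{2\}_{\mathcal{R}(A^*),*} = A\{1, 2\}_{\mathcal{R}(A^*),*} = \left\{ A^*U \mid U \in \mathbb{C}^{m \times m},\ A^*UAA^* = A^* \right\}.$$
Theorem 2.18 Let $A \in \mathbb{C}^{m \times n}$. Then
$$A\{1, 2, 3\} = A\{2\}_{*,\mathcal{N}(A^*)} = A\{1, 2\}_{*,\mathcal{N}(A^*)} = \left\{ VA^* \mid V \in \mathbb{C}^{n \times n},\ A^*AVA^* = A^* \right\}.$$
Example 2.3 Consider the matrix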
$$A = \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 & 0 & 0 \\ -1 & -1 & 1 & -1 & 0 & 0 \\ -1 & -1 & -1 & 1 & 0 & 0 \\ -1 & -1 & -1 & 0 & 2 & -1 \\ -1 & -1 & 0 & -1 & -1 & 2 \end{bmatrix} \in \mathbb{C}_5^{6 \times 6}$$
and $B \in \mathbb{C}_2^{6 \times 2}$, $C \in \mathbb{C}_4^{4 \times 6}$ defined by
$$B = \begin{bmatrix} 0.793372 & 0.265655 \\ 0.140305 & 0.633824 \\ 0.329002 & 0.184927 \\ 0.141169569 & 0.427424 \\ 0.0468532 & 0.0979332 \\ 0.89494969 & 0.253673 \end{bmatrix},$$
$$C = \begin{bmatrix} 0.714297 & 0.734462 & 0.790305 & 1.1837035 & 0.850446 & 1.143219 \\ 0.596075 & 0.5652303 & 0.745458 & 1.011021 & 0.785712 & 1.013570 \\ 0.780387 & 0.931596 & 0.630581 & 1.23033 & 0.723199 & 1.0876717 \\ 0.298214 & 0.30235998 & 0.337657 & 0.496275 & 0.361875 & 0.482631 \end{bmatrix}.$$
(a) This part of the example illustrates the results of Theorem 2.11, and the numerical data are obtained by Algorithm 2.3. The matrices $A$, $B$, $C$ satisfy $\operatorname{rank}(B) = 2$, $\operatorname{rank}(C) = 4$, $\operatorname{rank}(CAB) = 2$. Since the conditions in (vii) of Theorem 2.11 are not satisfied, there is no unique solution to the system of matrix equations $BUCAB = B$ and $CABUC = C$. The outer inverses $X = B(CAB)^{(1)}C$ can be computed using the following framework:

Step 1. Solve the matrix equation $BUCAB = B$ with respect to $U$. The matrix $B$ is of full column rank, and it possesses a left inverse $B_l^{-1}$. Consequently, the equation $BUCAB = B$ is equivalent to $UCAB = I$, and the solution to $BUCAB = B$ is equal to
$$U = (CAB)^\dagger = \begin{bmatrix} 1.76069 & 4.94967 & -7.50589 & 1.04706 \\ -1.49938 & -4.03114 & 5.8936 & -0.875175 \end{bmatrix}.$$
Step 2. The outer generalized inverse $X = BUC$ is equal to
$$X = BUC = B(CAB)^\dagger C = \begin{bmatrix} -0.83127 & -1.56045 & 0.352412 & -1.03387 & 0.135393 & -0.449024 \\ 0.360282 & 0.807398 & -0.389035 & 0.384252 & -0.267666 & 0.0308925 \\ -0.28009 & -0.509081 & 0.0886387 & -0.356487 & 0.0189951 & -0.172153 \\ 0.180706 & 0.424133 & -0.22968 & 0.183394 & -0.164812 & -0.00844537 \\ 0.0220041 & 0.0596765 & -0.0424447 & 0.0184201 & -0.0328727 & -0.0110591 \\ -0.977459 & -1.84514 & 0.432908 & -1.21068 & 0.175583 & -0.515159 \end{bmatrix}.$$
Further, the matrix $U = (CAB)^\dagger$ is an approximate solution of the matrix equations $CABUC = C$ and $BUCAB = B$. Also, $X = BUC$ is an approximate solution to the equations (3.1), since
$$\|CABUC - C\| = 2.23 \times 10^{-14}, \qquad \|BUCAB - B\| = 9.46 \times 10^{-15}.$$
Therefore, the equations $XAB = B$ and $CAX = C$ are satisfied.
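Steps 1 and 2 of part (a) can be replayed in NumPy with the data of Example 2.3. This is a hedged sketch, not the computation used by the authors: `numpy.linalg.pinv` is assumed as the solver for Step 1, the variable names are illustrative, and the entries are the rounded values printed above, so the results agree with the printed matrices only to a few digits.

```python
import numpy as np

A = np.array([[ 1, -1,  0,  0,  0,  0],
              [-1,  1,  0,  0,  0,  0],
              [-1, -1,  1, -1,  0,  0],
              [-1, -1, -1,  1,  0,  0],
              [-1, -1, -1,  0,  2, -1],
              [-1, -1,  0, -1, -1,  2]], dtype=float)
B = np.array([[0.793372,    0.265655],
              [0.140305,    0.633824],
              [0.329002,    0.184927],
              [0.141169569, 0.427424],
              [0.0468532,   0.0979332],
              [0.89494969,  0.253673]])
C = np.array([[0.714297, 0.734462,   0.790305, 1.1837035, 0.850446, 1.143219],
              [0.596075, 0.5652303,  0.745458, 1.011021,  0.785712, 1.013570],
              [0.780387, 0.931596,   0.630581, 1.23033,   0.723199, 1.0876717],
              [0.298214, 0.30235998, 0.337657, 0.496275,  0.361875, 0.482631]])

U = np.linalg.pinv(C @ A @ B)   # Step 1: U = (CAB)^+, a solution of BUCAB = B
X = B @ U @ C                   # Step 2: X = BUC

res_outer = np.linalg.norm(X @ A @ X - X)           # residual of XAX = X
res_range = np.linalg.norm(B @ U @ C @ A @ B - B)   # residual of BUCAB = B
print(res_outer, res_range)
```

Both residuals should be at the level of rounding error, since $U\,CAB = I_2$ holds whenever $CAB$ has full column rank.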
(b) The dual approach in Theorem 2.11 is based on the solution of $CABVC = C$ and the resulting outer inverse $X_1 = BVC$. Since the matrix $C$ is right invertible, the matrix equation $CABVC = C$ gives the dual form of the matrix equation for computing $(CAB)^\dagger$, that is, $CABV = I$. We conclude that both $X$ and $X_1$ are approximations of the same outer inverse of $A$, which is equal to $B(CAB)^\dagger C$. To that end, it can be verified that $X$ and $X_1$ satisfy
$$\|X - X_1\| = 4.14 \times 10^{-11}.$$
(c) This part of the example illustrates Theorem 2.5 and Algorithm 2.1. The matrices $A$ and $B$ satisfy $\operatorname{rank}(AB) = \operatorname{rank}(B)$, so it is justifiable to search for a solution $U$ of the matrix equation $BUAB = B$ and the resulting outer inverse $X = BU$. In order to highlight the results derived by the implementation of Algorithm 2.1, it is important to mention that $U := (AB)^\dagger$ is equal to
$$U = \begin{bmatrix} 0.167297 & -0.167297 & 0.00708203 & -0.123801 & -0.236308 & 0.239756 \\ -0.203528 & 0.203528 & -0.279822 & -0.0731705 & -0.112743 & -0.385548 \end{bmatrix}$$
and
$$X := B(AB)^\dagger = BU = \begin{bmatrix} 0.0786607 & -0.0786607 & -0.0687173 & -0.117658 & -0.217431 & 0.0877929 \\ -0.105529 & 0.105529 & -0.176364 & -0.0637471 & -0.104615 & -0.21073 \\ 0.0174033 & -0.0174033 & -0.0494166 & -0.0542619 & -0.098595 & 0.00758195 \\ -0.0633756 & 0.0633756 & -0.118603 & -0.0487517 & -0.0815486 & -0.130946 \\ -0.0120938 & 0.0120938 & -0.027072 & -0.0129663 & -0.0221131 & -0.0265246 \\ 0.0980931 & -0.0980931 & -0.0646451 & -0.129357 & -0.240083 & 0.116766 \end{bmatrix}.$$
According to the theoretical results, we conclude $X \in A\{2\}_{\mathcal{R}(B),*}$.

(d) This part of the example illustrates Theorem 2.8 and Algorithm 2.2. Since $\operatorname{rank}(CA) = \operatorname{rank}(C)$, it is justifiable to search for a solution of the matrix equation $CAVC = C$. One possible solution to $CAVC = C$ is given as
$$V := (CA)^\dagger = \begin{bmatrix} 120140 & 129792 & 27421.9 & -618952 \\ -90013.5 & -47865.2 & -6777.93 & 329013 \\ -52937.6 & -1464.19 & 3452.26 & 120689 \\ 23062 & -103225 & -30460.2 & 230800 \\ -36793.4 & -112814 & -28769.9 & 388910 \\ 66669 & 217503 & 55777.9 & -740399 \end{bmatrix},$$
and
$$X := (CA)^\dagger C = VC = \begin{bmatrix} 0.800499 & 0.290122 & -0.192667 & -0.201861 & -0.0498093 & -0.0355252 \\ -0.714584 & -0.247028 & -0.0707528 & -0.0994969 & -0.155725 & -0.111067 \\ -0.615629 & -0.552896 & 0.153409 & -0.291012 & 0.0511091 & -0.0352418 \\ 0.408051 & 0.436046 & -0.154673 & 0.37013 & -0.115624 & -0.15416 \\ -0.373293 & -0.441438 & 0.209825 & -0.0172156 & 0.0895808 & -0.149952 \\ 0.580871 & 0.558288 & -0.208561 & -0.0619027 & -0.0250655 & 0.339354 \end{bmatrix}.$$
The theoretical results imply $X \in A\{2\}_{*,\mathcal{N}(C)}$.
2.5 Matrix Equations Corresponding to Composite Outer Inverses
Combinations of the Moore-Penrose inverse with outer inverses that possess known image and kernel have become one of the most popular branches of the theory of generalized inverses. Such expressions have aroused great interest because the combinations are themselves outer inverses. Main properties, representations, and characterizations of these inverses have been investigated in a number of papers. A survey on this topic was presented in [76].

2.5.1 Core and Core-EP Inverses

The core-EP inverse is introduced as an outer inverse that satisfies specific conditions [64]. Let $A \in \mathbb{C}^{n \times n}$ and $k = \operatorname{ind}(A)$. A matrix $X \in \mathbb{C}^{n \times n}$, denoted by $A^{\textcircled{\dagger}}$, is called the core-EP inverse of $A$ if it satisfies
$$XAX = X, \qquad \mathcal{R}(X) = \mathcal{R}(X^*) = \mathcal{R}(A^k).$$
Also, the following representation from [64] is known:
$$A^{\textcircled{\dagger}} = A^k\left((A^k)^*A^{k+1}\right)^{-}(A^k)^* = A^{(2)}_{\mathcal{R}(A^k),\mathcal{N}((A^k)^*)}. \tag{2.6}$$
In [29], the core-EP inverse is represented as the product of the Drazin inverse, a rank-invariant matrix power, and the Moore-Penrose inverse:
$$A^{\textcircled{\dagger}} := A^DA^k(A^k)^\dagger.$$
Various representations of the core-EP inverse were investigated in [34, 39, 57, 84, 98].

In the case $\operatorname{ind}(A) = 1$, the core-EP inverse $A^{\textcircled{\dagger}}$ becomes the core inverse $A^{\textcircled{\#}} = A^\#AA^\dagger$ [1].

The dual core-EP inverse is given by $A_{\textcircled{\dagger}} = (A^k)^\dagger A^kA^D$, while the dual core inverse of $A$ is defined as $A_{\textcircled{\#}} = A^\dagger AA^\#$.
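The two representations of the core-EP inverse quoted above can be compared numerically. The sketch below rests on assumptions that are not part of the text: the helper names `drazin` and `core_ep`, the test matrix with $\operatorname{ind}(A) = 2$, the standard identity $A^D = A^k(A^{2k+1})^\dagger A^k$ for $k \geq \operatorname{ind}(A)$, and the pseudoinverse as the inner inverse in (2.6).

```python
import numpy as np
from numpy.linalg import matrix_power, pinv

def drazin(A, k):
    """Drazin inverse via A^D = A^k (A^(2k+1))^+ A^k, assuming k >= ind(A)."""
    Ak = matrix_power(A, k)
    return Ak @ pinv(matrix_power(A, 2 * k + 1)) @ Ak

def core_ep(A, k):
    """Core-EP inverse as the product A^D A^k (A^k)^+."""
    Ak = matrix_power(A, k)
    return drazin(A, k) @ Ak @ pinv(Ak)

# Illustrative matrix with ind(A) = 2 (not taken from the text).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
k = 2
Ak = matrix_power(A, k)

X = core_ep(A, k)
# Representation (2.6), with the pseudoinverse as one admissible inner inverse.
X_alt = Ak @ pinv(Ak.conj().T @ matrix_power(A, k + 1)) @ Ak.conj().T
print(np.allclose(X, X_alt), np.allclose(X @ A @ X, X))
```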
2.5.2 DMP and MPD Inverses
The DMP inverse and the MPD inverse are defined on square matrices of arbitrary index in [41] as hybrid compositions of the Drazin inverse and the Moore-Penrose inverse. The DMP inverse $A^{D,\dagger} := A^DAA^\dagger$ of $A \in \mathbb{C}^{n \times n}$ is the unique solution to the following matrix equations:
$$XAX = X, \qquad XA = A^DA, \qquad A^kX = A^kA^\dagger, \qquad k = \operatorname{ind}(A). \tag{2.7}$$
The authors of [37, 96] described the range space and null space of the DMP inverse as follows:
$$\mathcal{R}(A^{D,\dagger}) = \mathcal{R}(A^k), \qquad \mathcal{N}(A^{D,\dagger}) = \mathcal{N}(A^kA^\dagger), \qquad k = \operatorname{ind}(A).$$
Therefore, the DMP inverse is just the outer inverse $A^{D,\dagger} = A^{(2)}_{\mathcal{R}(A^k),\mathcal{N}(A^kA^\dagger)}$ [24, 96]. The most important results concerning the matrix DMP inverse were discovered in [37, 38, 96]. Various representations for the DMP inverse can be found in [38]. Algorithms for calculating the DMP inverse were proposed in [37, 85].
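The defining system (2.7) is easy to verify on a small example. The following sketch makes several assumptions for illustration: the test matrix with $\operatorname{ind}(A) = 2$, the variable names, and the standard identity $A^D = A^k(A^{2k+1})^\dagger A^k$ used to obtain the Drazin inverse.

```python
import numpy as np
from numpy.linalg import matrix_power, pinv

# Illustrative matrix with ind(A) = 2 (not taken from the text).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
k = 2
Ak = matrix_power(A, k)
AD = Ak @ pinv(matrix_power(A, 2 * k + 1)) @ Ak   # Drazin inverse (assumed identity)
Ap = pinv(A)                                      # Moore-Penrose inverse

X = AD @ A @ Ap                                   # DMP inverse A^{D,+}

checks = {
    "XAX = X": np.allclose(X @ A @ X, X),
    "XA = A^D A": np.allclose(X @ A, AD @ A),
    "A^k X = A^k A^+": np.allclose(Ak @ X, Ak @ Ap),
}
print(checks)
```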
The MPD inverse $A^{\dagger,D} := A^\dagger AA^D$ is defined as the dual to the DMP inverse [41].

2.5.3 MPCEP Inverse
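The MPCEP (or MP-Core-EP) inverse of $A \in \mathbb{C}^{n \times n}$ was proposed in [12] as the matrix product involving the Moore-Penrose and the core-EP inverse as follows:
$$A^{\dagger,\textcircled{\dagger}} := A^\dagger AA^{\textcircled{\dagger}}. \tag{2.8}$$
It is known that $A^{\dagger,\textcircled{\dagger}}$ is the unique solution to the matrix equations
$$XA = A^\dagger AA^{\textcircled{\dagger}}A, \qquad XAX = X, \qquad AX = AA^{\textcircled{\dagger}}. \tag{2.9}$$
The CEPMP inverse of $A \in \mathbb{C}^{n \times n}$ is defined in [12] by
$$A^{\textcircled{\dagger},\dagger} := A_{\textcircled{\dagger}}AA^\dagger,$$
and it is the unique solution to the matrix equations
$$AX = AA_{\textcircled{\dagger}}AA^\dagger, \qquad XAX = X, \qquad XA = A_{\textcircled{\dagger}}A.$$
Main representations and properties of the MPCEP inverse were introduced in [55].

Theorem 2.19 ([55]) Let $A \in \mathbb{C}^{n \times n}$ satisfy $\operatorname{ind}(A) = k$. For an arbitrary $l \geq k$, the MPCEP inverse of $A$ is given by
$$A^{\dagger,\textcircled{\dagger}} = A^\dagger A^l(A^l)^\dagger = A^\dagger P_{\mathcal{R}(A^l)}. \tag{2.10}$$
Corollary 2.20 ([55]) Let $A \in \mathbb{C}^{n \times n}$ be such that $\operatorname{ind}(A) = k$. Then
$$\begin{aligned}
A^{\dagger,\textcircled{\dagger}} &= A^{(2)}_{\mathcal{R}(A^\dagger A^k),\mathcal{N}((A^k)^*)} = A^{(2)}_{\mathcal{R}(A^\dagger A^k(A^k)^*),\mathcal{N}(A^\dagger A^k(A^k)^*)} \\
&= A^\dagger A^k(A^k)^*\left(A^k(A^k)^*\right)^\dagger = A^\dagger A^k\left((A^k)^*A^k\right)^\dagger(A^k)^* = A^\dagger A^k(A^k)^\dagger.
\end{aligned} \tag{2.11}$$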
2.5.4 CMP Inverse
The CMP inverse is denoted by $A^{c,\dagger}$ and defined in [43] by
$$A^{c,\dagger} := A^\dagger AA^DAA^\dagger. \tag{2.12}$$
It is well known that $A^{c,\dagger}$ is the unique solution to the system of matrix equations
$$XAX = X, \qquad XA = A^\dagger AA^DA, \qquad AXA = AA^DA, \qquad AX = AA^DAA^\dagger.$$
Representations and main properties of the CMP inverse are available in [43, 94]. The following representation of the CMP inverse
$$A^{c,\dagger} = A^{(2)}_{\mathcal{R}(A^\dagger A^D),\mathcal{N}(A^DA^\dagger)}$$
follows from [54, Corollary 3.7]. The CMP inverse is the outer inverse with known null space and range:
$$A^{c,\dagger} = A^{(2)}_{\mathcal{R}(A^\dagger A^kA^\dagger),\mathcal{N}(A^\dagger A^{2k}A^\dagger)}. \tag{2.13}$$

2.5.5 OMP, MPO, and MPOMP Inverses

The OMP, MPO, and MPOMP inverses are defined as compositions of outer inverses with the Moore-Penrose inverse in [56] and are known as composite outer inverses. The leading idea is to replace the particular outer inverses that appear in the definitions of the core-EP, dual core-EP, DMP, MPD, MPCEP, and CMP inverses by the general outer inverse $A^{(2)}_{T,S}$.

If $A \in \mathbb{C}^{m \times n}$ and $A^{(2)}_{T,S}$ exists, then the OMP inverse of $A$ is defined as
$$A^{(2),\dagger}_{T,S} := A^{(2)}_{T,S}AA^\dagger.$$
The outer Moore-Penrose (or OMP) inverse was introduced as a solution to an appropriate system of matrix equations in [56, Theorem 2.1].
Theorem 2.21 ([56, Theorem 2.1]) If $A \in \mathbb{C}^{m \times n}$ and $A^{(2)}_{T,S}$ exists, then the matrix system
$$XAX = X, \qquad AX = AA^{(2)}_{T,S}AA^\dagger, \qquad XA = A^{(2)}_{T,S}A$$
is solvable and its unique solution is $X = A^{(2)}_{T,S}AA^\dagger$.

Main appearances of the OMP inverse are listed as follows:
(i) For $m = n$, $\operatorname{ind}(A) = k$, and $A^{(2)}_{T,S} = A^D$, the OMP inverse reduces to the DMP inverse.
(ii) In the particular setting $m = n$, $\operatorname{ind}(A) = 1$, and $A^{(2)}_{T,S} = A^\#$, the OMP inverse becomes the core inverse.

If $A \in \mathbb{C}^{m \times n}$ and $A^{(2)}_{T,S}$ exists, then the Moore-Penrose outer (MPO) inverse of $A$ is defined as
$$A^{\dagger,(2)}_{T,S} := A^\dagger AA^{(2)}_{T,S}.$$
The MPO inverse is the unique solution to the corresponding system of matrix equations [56].

Theorem 2.22 ([56, Theorem 2.1]) If $A \in \mathbb{C}^{m \times n}$ and $A^{(2)}_{T,S}$ exists, the system of equations
$$XAX = X, \qquad AX = AA^{(2)}_{T,S}, \qquad XA = A^\dagger AA^{(2)}_{T,S}A$$
is solvable and its unique solution is $X = A^\dagger AA^{(2)}_{T,S}$.

Particular cases of the MPO inverse are listed as follows:
(i) If $m = n$, $\operatorname{ind}(A) = k$, and $A^{(2)}_{T,S} = A^D$, the MPD inverse is a particular case of the MPO inverse.
(ii) For $m = n$, $\operatorname{ind}(A) = 1$, and $A^{(2)}_{T,S} = A^\#$, the MPO inverse becomes the dual core inverse.
(iii) In the case that $m = n$, $\operatorname{ind}(A) = k$, and $A^{(2)}_{T,S} = A^{\textcircled{\dagger}}$, the MPO inverse $A^\dagger AA^{(2)}_{T,S}$ becomes the MPCEP inverse $A^{\dagger,\textcircled{\dagger}}$.
More characterizations of the OMP inverse were proved in [56, Theorem 2.2].

The Moore-Penrose outer Moore-Penrose (MPOMP) inverse is defined in [56] by
$$A^{\dagger,(2),\dagger}_{T,S} := A^\dagger AA^{(2)}_{T,S}AA^\dagger.$$
Theorem 2.23 ([56, Theorem 2.1]) If $A \in \mathbb{C}^{m \times n}$ and $A^{(2)}_{T,S}$ exists, then the system of equations
$$XAX = X, \qquad AX = AA^{(2)}_{T,S}AA^\dagger, \qquad XA = A^\dagger AA^{(2)}_{T,S}A$$
is solvable and its unique solution is $X := A^{\dagger,(2),\dagger}_{T,S}$.
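The three composite outer inverses can be formed directly from their definitions and checked against the systems in Theorems 2.21-2.23. The sketch below is illustrative only: the outer inverse $A^{(2)}_{T,S}$ is generated as $B(CAB)^\dagger C$ with $T = \mathcal{R}(B)$ and $S = \mathcal{N}(C)$, and the matrices and variable names are assumptions.

```python
import numpy as np
from numpy.linalg import pinv

# Illustrative data (not taken from the text).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 1.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

G = B @ pinv(C @ A @ B) @ C    # outer inverse A^(2)_{T,S}, T = R(B), S = N(C)
Ap = pinv(A)

omp = G @ A @ Ap               # OMP inverse
mpo = Ap @ A @ G               # MPO inverse
mpomp = Ap @ A @ G @ A @ Ap    # MPOMP inverse

# System of Theorem 2.21 for the OMP inverse.
omp_ok = (np.allclose(omp @ A @ omp, omp)
          and np.allclose(A @ omp, A @ G @ A @ Ap)
          and np.allclose(omp @ A, G @ A))
# System of Theorem 2.22 for the MPO inverse.
mpo_ok = (np.allclose(mpo @ A @ mpo, mpo)
          and np.allclose(A @ mpo, A @ G)
          and np.allclose(mpo @ A, Ap @ A @ G @ A))
# System of Theorem 2.23 for the MPOMP inverse.
mpomp_ok = (np.allclose(mpomp @ A @ mpomp, mpomp)
            and np.allclose(A @ mpomp, A @ G @ A @ Ap)
            and np.allclose(mpomp @ A, Ap @ A @ G @ A))
print(omp_ok, mpo_ok, mpomp_ok)
```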
3 Existence and Representations of (B, C) and One-Sided (B, C)-Inverses

Drazin in [25] generalized the concept of the outer inverse with the prescribed range and null space by introducing the concept of a (b, c)-inverse in a semigroup. In the matrix case, these concepts can be defined as follows. Let $A \in \mathbb{C}^{m \times n}$, $B \in \mathbb{C}^{n \times k}$, and $C \in \mathbb{C}^{l \times m}$. Then $X \in \mathbb{C}^{n \times m}$ is a (B, C)-inverse of $A$ if the following equations hold:
$$XAB = B, \qquad CAX = C, \tag{3.1}$$
$$X = BU = VC, \quad \text{for some } U \in \mathbb{C}^{k \times m},\ V \in \mathbb{C}^{n \times l}. \tag{3.2}$$
Notice that $X$ is the (B, C)-inverse of $A$ if and only if $X$ is the {2}-inverse of $A$ satisfying $\mathcal{R}(X) = \mathcal{R}(B)$ and $\mathcal{N}(X) = \mathcal{N}(C)$. Various interesting results related to the (B, C)-inverse can be found in [11, 58, 61, 86].

Following their invention, (b, c)-inverses and inverses along an element have been investigated on matrices over a field [6, 13, 74, 87], on matrices over a ring [35], and for tensors [73]. In [26], the left and right (B, C)-invertible elements of rings were given as extensions of the (B, C)-inverse. One-sided (B, C)-inverses were considered for arbitrary matrices in [6].
3.1 Existence and Representations of Left (B, C)-Inverses

Suppose that $A \in \mathbb{C}^{m \times n}$, $B \in \mathbb{C}^{n \times k}$, and $C \in \mathbb{C}^{l \times m}$. A left (B, C)-inverse of $A$ is defined as a solution to the restricted equation
$$XAB = B, \qquad \mathcal{N}(C) \subseteq \mathcal{N}(X). \tag{3.3}$$
The set of all left (B, C)-inverses of $A$, i.e., the set of all solutions to equation (3.3), is denoted by $A\{l, B, C\}$. In the following theorem, we present equivalent conditions for the existence of left (B, C)-inverses of $A$, which originated in [58, Corollary 2.3].

Theorem 3.1 The subsequent statements are valid for $A \in \mathbb{C}^{m \times n}$, $B \in \mathbb{C}^{n \times k}$, and $C \in \mathbb{C}^{l \times m}$.
(a) The following statements are equivalent:
(i) $A$ is left (B, C)-invertible;
(ii) there exists $X \in \mathbb{C}^{n \times m}$ satisfying $XAB = B$ and $X = VC$, for some $V \in \mathbb{C}^{n \times l}$;
(iii) there exists $X \in \mathbb{C}^{n \times m}$ satisfying $XAB = B$ and $X = XC^{(1)}C$, for an arbitrary inner inverse $C^{(1)} \in \mathbb{C}^{m \times l}$ of $C$;
(iv) there exists $X \in \mathbb{C}^{n \times m}$ satisfying $XAB = B$ and $X \in X\mathbb{C}^{m \times l}C$;
(v) $B \in \mathbb{C}^{n \times l}CAB$;
(vi) $\mathcal{N}(B) = \mathcal{N}(CAB)$;
(vii) $\mathcal{N}(B) \supseteq \mathcal{N}(CAB)$;
(viii) $B = VCAB$, for some $V \in \mathbb{C}^{n \times l}$;
(ix) $B = B(CAB)^{(1)}CAB$, for an arbitrary inner inverse $(CAB)^{(1)} \in \mathbb{C}^{k \times l}$ of $CAB$;
(x) $B = BVCAB$, for some $V \in \mathbb{C}^{k \times l}$;
(xi) $\operatorname{rank}(B) = \operatorname{rank}(CAB)$.
(b) If one of the statements (i)–(xi) holds, then the set of left (B, C)-inverses of $A$ is given by
$$A\{l, B, C\} = \left\{ VC \mid V \in \mathbb{C}^{n \times l},\ VCAB = B \right\} = \left\{ BYC + ZC - ZCABYC \mid Z \in \mathbb{C}^{n \times l} \right\},$$
for an arbitrary inner inverse $Y$ of $CAB$. Moreover, $BYC \in A\{2\}_{\mathcal{R}(B),*}$.
(c) Suppose that $B = B_1B_2$ and $C = C_1C_2$, where $B_1 \in \mathbb{C}^{n \times r}$, $B_2 \in \mathbb{C}^{r \times k}$, $C_1 \in \mathbb{C}^{l \times s}$, and $C_2 \in \mathbb{C}^{s \times m}$, are full-rank factorizations of $B$ and $C$, respectively. Then the subsequent statements are equivalent:
(ic) $A$ is left (B, C)-invertible;
(iic) $\operatorname{rank}(B_1) = \operatorname{rank}(C_2AB_1)$.
Thereafter, if one of the statements (ic)–(iic) holds, then the set of left (B, C)-inverses of $A$ is given by
$$A\{l, B, C\} = \left\{ B_1YC_2 + ZC_2 - ZC_2AB_1YC_2 \mid Z \in \mathbb{C}^{n \times s} \right\},$$
for an arbitrary inner inverse $Y$ of $C_2AB_1$.
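The parametrization in part (b) can be explored numerically: each choice of $Z$ gives a left (B, C)-inverse. The sketch below is a hedged illustration; the function name `left_bc_inverse`, the data, and the choice $Y = (CAB)^\dagger$ are assumptions.

```python
import numpy as np

def left_bc_inverse(A, B, C, Z):
    """Return B Y C + Z C - Z C A B Y C with Y = (CAB)^+, or None if rank(B) != rank(CAB)."""
    CAB = C @ A @ B
    if np.linalg.matrix_rank(CAB) != np.linalg.matrix_rank(B):
        return None  # condition (xi) of Theorem 3.1 fails
    Y = np.linalg.pinv(CAB)  # one inner inverse of CAB (assumed choice)
    return B @ Y @ C + Z @ C - Z @ CAB @ Y @ C

# Illustrative data (not taken from the text); C has rank 2 and C e_3 = 0.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 1.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
Z = np.arange(9.0).reshape(3, 3)   # an arbitrary n x l parameter matrix

X = left_bc_inverse(A, B, C, Z)
e3 = np.array([0.0, 0.0, 1.0])     # spans N(C)
print(np.allclose(X @ A @ B, B), np.allclose(X @ e3, 0.0))
```

In line with the remark that a one-sided (B, C)-inverse need not be an outer inverse, the sketch checks only $XAB = B$ and $\mathcal{N}(C) \subseteq \mathcal{N}(X)$.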
3.2 Existence and Representations of Right (B, C)-Inverses
Arbitrary matrices $A \in \mathbb{C}^{m \times n}$, $B \in \mathbb{C}^{n \times k}$, and $C \in \mathbb{C}^{l \times m}$ are considered again. A right (B, C)-inverse of $A$ is introduced as a solution to the constrained matrix equation
$$CAX = C, \qquad \mathcal{R}(X) \subseteq \mathcal{R}(B). \tag{3.4}$$
We use the notation $A\{r, B, C\}$ for the set of all right (B, C)-inverses of $A$, i.e., the set of all solutions to (3.4). Now, we give necessary and sufficient conditions for the existence of right (B, C)-inverses of $A$, which can be verified by results proved in [58].
Theorem 3.2 Consider $A \in \mathbb{C}^{m \times n}$, $B \in \mathbb{C}^{n \times k}$, and $C \in \mathbb{C}^{l \times m}$.
(a) The next statements are mutually equivalent:
(i) $A$ is right (B, C)-invertible;
(ii) there exists $X \in \mathbb{C}^{n \times m}$ satisfying $CAX = C$ and $X = BU$, for some $U \in \mathbb{C}^{k \times m}$;
(iii) there exists $X \in \mathbb{C}^{n \times m}$ satisfying $CAX = C$ and $X = BB^{(1)}X$, for an arbitrary inner inverse $B^{(1)} \in \mathbb{C}^{k \times n}$;
(iv) there exists $X \in \mathbb{C}^{n \times m}$ which satisfies $CAX = C$ and $X \in B\mathbb{C}^{k \times n}X$;
(v) $C \in CAB\mathbb{C}^{k \times m}$;
(vi) $\mathcal{R}(C) = \mathcal{R}(CAB)$;
(vii) $\mathcal{R}(C) \subseteq \mathcal{R}(CAB)$;
(viii) $C = CABU$, for some $U \in \mathbb{C}^{k \times m}$;
(ix) $C = CAB(CAB)^{(1)}C$, for an arbitrary inner inverse $(CAB)^{(1)} \in \mathbb{C}^{k \times l}$ of $CAB$;
(x) $C = CABUC$, for some $U \in \mathbb{C}^{k \times l}$;
(xi) $\operatorname{rank}(C) = \operatorname{rank}(CAB)$.
(b) Additionally, if at least one of the statements (i)–(xi) is true, the set of right (B, C)-inverses of $A$ is given by
$$A\{r, B, C\} = \left\{ BU \mid U \in \mathbb{C}^{k \times m},\ CABU = C \right\} = \left\{ BYC + BZ - BYCABZ \mid Z \in \mathbb{C}^{k \times m} \right\},$$
for an arbitrary inner inverse $Y$ of $CAB$. Moreover, $BYC \in A\{2\}_{*,\mathcal{N}(C)}$.
(c) Suppose that $B = B_1B_2$ and $C = C_1C_2$, where $B_1 \in \mathbb{C}^{n \times r}$, $B_2 \in \mathbb{C}^{r \times k}$, $C_1 \in \mathbb{C}^{l \times s}$, and $C_2 \in \mathbb{C}^{s \times m}$, are full-rank factorizations of $B$ and $C$, respectively. The next statements are equivalent:
(ic) $A$ is right (B, C)-invertible;
(iic) $\operatorname{rank}(C_2) = \operatorname{rank}(C_2AB_1)$.
In addition, if one of the statements (ic)–(iic) holds, then all right (B, C)-inverses of $A$ can be characterized by
$$A\{r, B, C\} = \left\{ B_1YC_2 + B_1Z - B_1YC_2AB_1Z \mid Z \in \mathbb{C}^{r \times m} \right\},$$
for an arbitrary inner inverse $Y$ of $C_2AB_1$.

Notice that the statements in parts (c) of Theorems 3.1 and 3.2 are proved in [6, Theorem 3.22].

Generally, a left (or right) (B, C)-inverse of a given matrix is not its outer inverse. Note that, by Theorem 3.1 (or Theorem 3.2), if $A$ is left (or right) (B, C)-invertible, then the outer inverse $A^{(2)}_{\mathcal{R}(B),*}$ (or $A^{(2)}_{*,\mathcal{N}(C)}$) exists and it is equal to one of the left (or right) (B, C)-inverses of $A$. The converse is not valid without some additional conditions. For example, $A$ is left (or right) (B, C)-invertible if and only if $A^{(2)}_{\mathcal{R}(B),*}$ (or $A^{(2)}_{*,\mathcal{N}(C)}$) exists and $\mathcal{N}(C) \subseteq \mathcal{N}(A^{(2)}_{\mathcal{R}(B),*})$ (or $\mathcal{R}(A^{(2)}_{*,\mathcal{N}(C)}) \subseteq \mathcal{R}(B)$).
3.3 Existence and Representations of Inverses Along a Matrix
The concepts of one-sided inverses along B and inverse along B are obtained in the case B = C in the definitions of one-sided (B, C)-inverses and (B, C)inverse, respectively. The inverse along an element was defined in [40]. In [99], left and right inverses along an element were presented in the context of semigroups, generalizing the notion of the inverse along an element. The notation of one-sided inverses along a given matrix was presented in [6]. Let A 2 m×n and B 2 n × m . (i) The matrix X 2 n × m is said to be a left inverse along B of A if it is satisfying XAB = B,
N ðBÞ ⊆ N ðXÞ:
Existence and Representations of Solutions to Some Constrained Systems. . .
39
(ii) The matrix X 2 n × m is said to be a right inverse along B of A if it fulfills BAX = B,
RðXÞ ⊆ RðBÞ:
(iii) The matrix X 2 n × m is said to be an inverse along B of A, denoted by A||B, if XAB = B,
BAX = B,
RðXÞ ⊆ RðBÞ,
N ðBÞ ⊆ N ðXÞ:
Necessary and sufficient conditions for the existence of left and right inverses along B of matrices are obtained in Corollary 3.3, following the results of Theorems 3.1 and 3.2. We present only some of these conditions, in order to observe that left and right invertibility along a matrix are equivalent notions.
Corollary 3.3 Let $A \in \mathbb{C}^{m\times n}$ and $B \in \mathbb{C}^{n\times m}$.
(a) Then the following statements are equivalent:
(i) A is left invertible along B;
(ii) $\mathrm{rank}(B) = \mathrm{rank}(BAB)$;
(iii) A is right invertible along B;
(iv) A is invertible along B.
(b) In addition, if one of the statements (i)–(iv) holds, then the unique inverse along B of A is defined as
$$A^{\|B} = B(BAB)^{(1)}B,$$
for an arbitrary inner inverse $(BAB)^{(1)} \in \mathbb{C}^{m\times n}$ of BAB. Various properties of inverses along an element are available in [5, 14, 42].
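The representation in Corollary 3.3(b) can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using the Moore-Penrose inverse as one particular inner inverse of BAB; the function name `inverse_along` and the test matrices are hypothetical choices made only for illustration.

```python
import numpy as np

def inverse_along(A, B):
    """Inverse of A along B via B (BAB)^(1) B.

    The Moore-Penrose inverse is used as the inner inverse (BAB)^(1);
    this is an illustrative choice, any inner inverse would do.
    """
    BAB = B @ A @ B
    # Corollary 3.3(a)(ii): invertibility along B needs rank(B) = rank(BAB).
    if np.linalg.matrix_rank(B) != np.linalg.matrix_rank(BAB):
        raise ValueError("rank(B) != rank(BAB): A is not invertible along B")
    return B @ np.linalg.pinv(BAB) @ B

# Hypothetical data: A is 2 x 3 and B is 3 x 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
B = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
X = inverse_along(A, B)

# Defining equations XAB = B and BAX = B.
print(np.allclose(X @ A @ B, B), np.allclose(B @ A @ X, B))
# Range condition R(X) ⊆ R(B), tested as B B^+ X = X.
print(np.allclose(B @ np.linalg.pinv(B) @ X, X))
```

The range condition is tested through the projector $BB^{\dagger}$; the kernel condition $\mathcal{N}(B) \subseteq \mathcal{N}(X)$ holds trivially for this B because it has full column rank.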
4 Existence and Representations of Inner Inverses with Adequate Range and/or Kernel
The set of inner inverses (also called {1}-inverses) of an arbitrary $A \in \mathbb{C}^{m\times n}$ is defined by
D. Mosić and P. S. Stanimirović
$$A\{1\} = \{X \in \mathbb{C}^{n\times m} \mid AXA = A\}. \tag{4.1}$$
Conditions for the existence and representations of inner inverses with the prescribed range and/or null space were considered in [74]. Representations of inner inverses are given in terms of solutions to proper matrix equations and usage of ranks of involved matrices.
4.1
Existence and Representations of Inner Inverses with Adequate Range
If $A \in \mathbb{C}^{m\times n}$ and $B \in \mathbb{C}^{n\times k}$, we consider the existence and representations of the solutions of the following constrained matrix equation:
$$AXA = A, \qquad \mathcal{R}(X) \subseteq \mathcal{R}(B). \tag{4.2}$$
Theorem 4.1 reveals that X is a solution to the restricted equation (4.2) if and only if it is a solution to the system of matrix equations presented in its part (ii).
Theorem 4.1 Let $A \in \mathbb{C}^{m\times n}$, $X \in \mathbb{C}^{n\times m}$, and $B \in \mathbb{C}^{n\times k}$. The following statements are equivalent:
(i) X is a solution to (4.2);
(ii) $AXA = A$ and $X = BB^{\dagger}X$.
Theorem 4.2, given in [74], is significant for calculating a {1}-inverse X of A satisfying $\mathcal{R}(X) \subseteq \mathcal{R}(B)$.
Theorem 4.2 ([74, Theorem 8]) Let $A \in \mathbb{C}^{m\times n}$ and $B \in \mathbb{C}^{n\times k}$.
(a) The following statements are equivalent:
(i) there exists a {1}-inverse X of A satisfying $\mathcal{R}(X) \subseteq \mathcal{R}(B)$;
(ii) there exists $U \in \mathbb{C}^{k\times m}$ such that $ABUA = A$;
(iii) $\mathcal{R}(AB) = \mathcal{R}(A)$;
(iv) $AB(AB)^{(1)}A = A$, for some (equivalently every) $(AB)^{(1)} \in (AB)\{1\}$;
(v) $\mathrm{rank}(AB) = \mathrm{rank}(A)$.
(b) If the statements in (a) are true, then the set of all inner inverses of A whose range is contained in $\mathcal{R}(B)$ is represented by
$$\{X \in A\{1\} \mid \mathcal{R}(X) \subseteq \mathcal{R}(B)\} = \{B(AB)^{(1)} \mid (AB)^{(1)} \in (AB)\{1\}\} = \{BU \mid U \in \mathbb{C}^{k\times m},\ ABUA = A\}.$$
Moreover,
$$\{X \in A\{1\} \mid \mathcal{R}(X) \subseteq \mathcal{R}(B)\} = \{B(AB)^{(1)}AA^{(1)} + BY - B(AB)^{(1)}ABYAA^{(1)} \mid Y \in \mathbb{C}^{k\times m}\},$$
where $(AB)^{(1)} \in (AB)\{1\}$ and $A^{(1)} \in A\{1\}$ are arbitrary but fixed.
Theorem 4.2 can be used in a similar way as Theorem 2.5: if the equation $ABUA = A$ is solvable and any of its solutions U is computed, then a {1}-inverse X of A satisfying $\mathcal{R}(X) \subseteq \mathcal{R}(B)$ is computed as $X = BU$. Algorithm 4.1 describes the corresponding computational procedure.
Algorithm 4.1 Computing a {1}-inverse X of A satisfying $\mathcal{R}(X) \subseteq \mathcal{R}(B)$
It is important to mention that each particular solution $U \in \mathbb{C}^{k\times m}$ to the matrix equation $ABUA = A$ induces a corresponding {1}-inverse $X = BU$ of A satisfying $\mathcal{R}(X) \subseteq \mathcal{R}(B)$.
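The procedure described around Theorem 4.2 (solve $ABUA = A$, then set $X = BU$) can be sketched as follows. This is not the authors' Algorithm 4.1, only an illustration under the assumption that the particular solution $U = (AB)^{\dagger}$ is used, which is admissible by Theorem 4.2(a)(iv); the function name and the data are hypothetical.

```python
import numpy as np

def inner_inverse_with_range(A, B):
    """Return a {1}-inverse X = B U of A with R(X) ⊆ R(B).

    U = pinv(AB) is one particular solution of A B U A = A whenever
    rank(AB) = rank(A) (Theorem 4.2(a)); other solutions U exist.
    """
    AB = A @ B
    if np.linalg.matrix_rank(AB) != np.linalg.matrix_rank(A):
        raise ValueError("rank(AB) != rank(A): no such inner inverse exists")
    U = np.linalg.pinv(AB)
    return B @ U

# Hypothetical data: a rank-one A and a single prescribed range direction.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
B = np.array([[1.0],
              [0.0]])
X = inner_inverse_with_range(A, B)

print(X)                                          # second row is zero: R(X) ⊆ span{e1}
print(np.allclose(A @ X @ A, A))                  # inner inverse property
print(np.allclose(B @ np.linalg.pinv(B) @ X, X))  # X = B B^+ X, Theorem 4.1(ii)
```

For this data the computation can be followed by hand: $AB = (1, 2)^T$, $U = (1, 2)/5$, and $X = BU$ has first row $(0.2, 0.4)$ and zero second row.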
4.2
Existence and Representations of Inner Inverses with Adequate Kernel
It is also interesting to calculate the solutions of the restricted matrix equation
$$AXA = A, \qquad \mathcal{N}(C) \subseteq \mathcal{N}(X), \tag{4.3}$$
wherein $A \in \mathbb{C}^{m\times n}$ and $C \in \mathbb{C}^{l\times m}$. Remark that the constrained matrix equation (4.3) can be replaced with the system of matrix equations given in Theorem 4.3(ii).
Theorem 4.3 Let $A \in \mathbb{C}^{m\times n}$, $X \in \mathbb{C}^{n\times m}$, and $C \in \mathbb{C}^{l\times m}$. The following statements are equivalent:
(i) X is a solution of (4.3);
(ii) $AXA = A$ and $X = XC^{\dagger}C$.
For finding a {1}-inverse X of A satisfying $\mathcal{N}(C) \subseteq \mathcal{N}(X)$, we can use the following result proved in [74].
Theorem 4.4 ([74, Theorem 9]) Let $A \in \mathbb{C}^{m\times n}$ and $C \in \mathbb{C}^{l\times m}$.
(a) The following statements are equivalent:
(i) there exists a {1}-inverse X of A satisfying $\mathcal{N}(C) \subseteq \mathcal{N}(X)$;
(ii) there exists $V \in \mathbb{C}^{n\times l}$ such that $AVCA = A$;
(iii) $\mathcal{N}(CA) = \mathcal{N}(A)$;
(iv) $A(CA)^{(1)}CA = A$, for some (equivalently every) $(CA)^{(1)} \in (CA)\{1\}$;
(v) $\mathrm{rank}(CA) = \mathrm{rank}(A)$.
(b) If the statements in (a) are true, then the set of all inner inverses of A whose null space contains $\mathcal{N}(C)$ is represented by
$$\{X \in A\{1\} \mid \mathcal{N}(C) \subseteq \mathcal{N}(X)\} = \{(CA)^{(1)}C \mid (CA)^{(1)} \in (CA)\{1\}\} = \{VC \mid V \in \mathbb{C}^{n\times l},\ AVCA = A\}.$$
Moreover,
$$\{X \in A\{1\} \mid \mathcal{N}(C) \subseteq \mathcal{N}(X)\} = \{A^{(1)}A(CA)^{(1)}C + YC - A^{(1)}AYCA(CA)^{(1)}C \mid Y \in \mathbb{C}^{n\times l}\},$$
where $(CA)^{(1)} \in (CA)\{1\}$ and $A^{(1)} \in A\{1\}$ are arbitrary but fixed.
Similarly, Theorem 4.4 can be used for computing {1}-inverses X of A satisfying $\mathcal{N}(C) \subseteq \mathcal{N}(X)$, as presented in Algorithm 4.2.
Algorithm 4.2 Computing a {1}-inverse X of A satisfying $\mathcal{N}(C) \subseteq \mathcal{N}(X)$
A useful observation is that each particular solution $V \in \mathbb{C}^{n\times l}$ to the matrix equation $AVCA = A$ yields a corresponding {1}-inverse $X = VC$ of A satisfying $\mathcal{N}(C) \subseteq \mathcal{N}(X)$.
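The kernel-constrained case can be illustrated in the same spirit. The sketch below is not the authors' Algorithm 4.2; it assumes the particular solution $V = (CA)^{\dagger}$ of $AVCA = A$, which is valid under Theorem 4.4(a)(iv), and uses hypothetical names and data.

```python
import numpy as np

def inner_inverse_with_kernel(A, C):
    """Return a {1}-inverse X = V C of A with N(C) ⊆ N(X).

    V = pinv(CA) is one particular solution of A V C A = A whenever
    rank(CA) = rank(A) (Theorem 4.4(a)).
    """
    CA = C @ A
    if np.linalg.matrix_rank(CA) != np.linalg.matrix_rank(A):
        raise ValueError("rank(CA) != rank(A): no such inner inverse exists")
    V = np.linalg.pinv(CA)
    return V @ C

# Hypothetical data: N(C) is spanned by the second unit vector.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
C = np.array([[1.0, 0.0]])
X = inner_inverse_with_kernel(A, C)

print(X)                                          # second column is zero: X e2 = 0
print(np.allclose(A @ X @ A, A))                  # inner inverse property
print(np.allclose(X @ np.linalg.pinv(C) @ C, X))  # X = X C^+ C, Theorem 4.3(ii)
```

Here $CA = (1, 2)$, $V = (1, 2)^T/5$, and $X = VC$ has first column $(0.2, 0.4)^T$ and zero second column, so X annihilates $\mathcal{N}(C)$.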
4.3
Existence and Representations of {1, 2}-Inverses with Prescribed Range
In the case $A \in \mathbb{C}^{m\times n}$ and $B \in \mathbb{C}^{n\times k}$, a {1, 2}-inverse of A with the prescribed range $\mathcal{R}(B)$ is a solution to the restricted system of matrix equations:
$$AXA = A, \qquad XAX = X, \qquad \mathcal{R}(X) = \mathcal{R}(B). \tag{4.4}$$
We use $A^{(1,2)}_{\mathcal{R}(B),*}$ to denote a solution to the system (4.4), that is, a {1, 2}-inverse of A with the prescribed range $\mathcal{R}(B)$. The symbol $A\{1,2\}_{\mathcal{R}(B),*}$ will stand for the set of all solutions to the system (4.4), i.e., the set of all {1, 2}-inverses with the prescribed range $\mathcal{R}(B)$. By Theorem 2.3, we obtain the next consequence.
Corollary 4.5 Let $A \in \mathbb{C}^{m\times n}$, $X \in \mathbb{C}^{n\times m}$, and $B \in \mathbb{C}^{n\times k}$. The following statements are equivalent:
(i) X is a solution to (4.4), i.e., $X \in A\{1,2\}_{\mathcal{R}(B),*}$;
(ii) $AXA = A$, $X = BB^{\dagger}X$, and $XAB = B$;
(iii) $AXA = A$, $XAX = X$, $X = BB^{\dagger}X$, and $XAB = B$.
Some equivalent conditions for the existence and representations of {1, 2}-inverses with the prescribed range were proposed in [74, Theorem 10].
Theorem 4.6 ([74, Theorem 10]) Let $A \in \mathbb{C}^{m\times n}$ and $B \in \mathbb{C}^{n\times k}$.
(a) The following statements are equivalent:
(i) there exists a {1, 2}-inverse X of A satisfying $\mathcal{R}(X) = \mathcal{R}(B)$;
(ii) there exist $U, V \in \mathbb{C}^{k\times m}$ such that $BUAB = B$ and $ABVA = A$;
(iii) there exists $W \in \mathbb{C}^{k\times m}$ such that $BWAB = B$ and $ABWA = A$;
(iv) $\mathcal{N}(AB) = \mathcal{N}(B)$ and $\mathcal{R}(AB) = \mathcal{R}(A)$;
(v) $\mathrm{rank}(AB) = \mathrm{rank}(A) = \mathrm{rank}(B)$;
(vi) $B(AB)^{(1)}AB = B$ and $AB(AB)^{(1)}A = A$, for some (equivalently every) $(AB)^{(1)} \in (AB)\{1\}$.
(b) If the statements in (a) are true, then the set of all {1, 2}-inverses with the prescribed range $\mathcal{R}(B)$ is represented by
$$A\{1,2\}_{\mathcal{R}(B),*} = A\{2\}_{\mathcal{R}(B),*} = \{X \in A\{1\} \mid \mathcal{R}(X) \subseteq \mathcal{R}(B)\}.$$
4.4
Existence and Representations of {1, 2}-Inverses with Prescribed Kernel
Let $A \in \mathbb{C}^{m\times n}$ and $C \in \mathbb{C}^{l\times m}$. A solution to the following system of matrix equations
$$AXA = A, \qquad XAX = X, \qquad \mathcal{N}(X) = \mathcal{N}(C), \tag{4.5}$$
is called a {1, 2}-inverse of A with the prescribed kernel $\mathcal{N}(C)$ and it is denoted by $A^{(1,2)}_{*,\mathcal{N}(C)}$. The set of all {1, 2}-inverses of A with the prescribed kernel $\mathcal{N}(C)$, that is, the set of all solutions to (4.5), will be denoted by $A\{1,2\}_{*,\mathcal{N}(C)}$. Theorem 2.6 implies the following result.
Corollary 4.7 Let $A \in \mathbb{C}^{m\times n}$, $X \in \mathbb{C}^{n\times m}$, and $C \in \mathbb{C}^{l\times m}$. The following statements are equivalent:
(i) X is a solution to (4.5), i.e., $X \in A\{1,2\}_{*,\mathcal{N}(C)}$;
(ii) $AXA = A$, $X = XC^{\dagger}C$, and $CAX = C$;
(iii) $AXA = A$, $XAX = X$, $X = XC^{\dagger}C$, and $CAX = C$.
Necessary and sufficient conditions for the existence and representations of $A^{(1,2)}_{*,\mathcal{N}(C)}$ were proposed in [74, Theorem 11].
Theorem 4.8 ([74, Theorem 11]) Let $A \in \mathbb{C}^{m\times n}$ and $C \in \mathbb{C}^{l\times m}$.
(a) The following statements are equivalent:
(i) there exists a {1, 2}-inverse X of A satisfying $\mathcal{N}(X) = \mathcal{N}(C)$;
(ii) there exist $U, V \in \mathbb{C}^{n\times l}$ such that $CAUC = C$ and $AVCA = A$;
(iii) there exists $W \in \mathbb{C}^{n\times l}$ such that $CAWC = C$ and $AWCA = A$;
(iv) $\mathcal{N}(CA) = \mathcal{N}(A)$ and $\mathcal{R}(CA) = \mathcal{R}(C)$;
(v) $\mathrm{rank}(CA) = \mathrm{rank}(A) = \mathrm{rank}(C)$;
(vi) $CA(CA)^{(1)}C = C$ and $A(CA)^{(1)}CA = A$, for some (equivalently every) $(CA)^{(1)} \in (CA)\{1\}$.
(b) If the statements in (a) are true, then the set of all {1, 2}-inverses with the prescribed kernel $\mathcal{N}(C)$ is given by
$$A\{1,2\}_{*,\mathcal{N}(C)} = A\{2\}_{*,\mathcal{N}(C)} = \{X \in A\{1\} \mid \mathcal{N}(C) \subseteq \mathcal{N}(X)\}.$$
4.5
Existence and Representations of {1, 2}-Inverses with Prescribed Range and Kernel
Suppose $A \in \mathbb{C}^{m\times n}$, $B \in \mathbb{C}^{n\times k}$, and $C \in \mathbb{C}^{l\times m}$. A {1, 2}-inverse of A with the prescribed range $\mathcal{R}(B)$ and kernel $\mathcal{N}(C)$ is a solution to the system of equations:
$$AXA = A, \qquad XAX = X, \qquad \mathcal{R}(X) = \mathcal{R}(B), \qquad \mathcal{N}(X) = \mathcal{N}(C). \tag{4.6}$$
Denote by $A^{(1,2)}_{\mathcal{R}(B),\mathcal{N}(C)}$ the solution to the system (4.6) (that is, the {1, 2}-inverse of A with the prescribed range $\mathcal{R}(B)$ and kernel $\mathcal{N}(C)$). Notice that Corollary 2.10 gives some characterizations of a {1, 2}-inverse of A with the prescribed range $\mathcal{R}(B)$ and kernel $\mathcal{N}(C)$.
Corollary 4.9 Let $A \in \mathbb{C}^{m\times n}$, $X \in \mathbb{C}^{n\times m}$, $B \in \mathbb{C}^{n\times k}$, and $C \in \mathbb{C}^{l\times m}$. The following statements are equivalent:
(i) X is a solution to (4.6), i.e., $X = A^{(1,2)}_{\mathcal{R}(B),\mathcal{N}(C)}$;
(ii) $AXA = A$, $X = BB^{\dagger}X = XC^{\dagger}C$, $XAB = B$ and $CAX = C$;
(iii) $AXA = A$, $XAX = X$, $X = BB^{\dagger}X = XC^{\dagger}C$, $XAB = B$ and $CAX = C$.
Theorem 4.10 was proved in [74] and gives characterizations of {1, 2}-inverses with prescribed range and null space.
Theorem 4.10 ([74, Theorem 12]) Let $A \in \mathbb{C}^{m\times n}$, $B \in \mathbb{C}^{n\times k}$, and $C \in \mathbb{C}^{l\times m}$.
(a) The following statements are equivalent:
(i) there exists a {1, 2}-inverse X of A satisfying $\mathcal{R}(X) = \mathcal{R}(B)$ and $\mathcal{N}(X) = \mathcal{N}(C)$;
(ii) there exist $U \in \mathbb{C}^{k\times m}$ and $V \in \mathbb{C}^{n\times l}$ such that $BUAB = B$, $ABUA = A$, $CAVC = C$ and $AVCA = A$;
(iii) $\mathcal{N}(AB) = \mathcal{N}(B)$, $\mathcal{R}(AB) = \mathcal{R}(A)$, $\mathcal{R}(CA) = \mathcal{R}(C)$, and $\mathcal{N}(CA) = \mathcal{N}(A)$;
(iv) $\mathrm{rank}(AB) = \mathrm{rank}(A) = \mathrm{rank}(B)$ and $\mathrm{rank}(CA) = \mathrm{rank}(A) = \mathrm{rank}(C)$;
(v) $\mathrm{rank}(CAB) = \mathrm{rank}(C) = \mathrm{rank}(B) = \mathrm{rank}(A)$;
(vi) $B(AB)^{(1)}AB = B$, $AB(AB)^{(1)}A = A$, $CA(CA)^{(1)}C = C$ and $A(CA)^{(1)}CA = A$, for some (equivalently every) $(AB)^{(1)} \in (AB)\{1\}$ and $(CA)^{(1)} \in (CA)\{1\}$.
(b) If the statements in (a) are true, then the unique {1, 2}-inverse of A with the prescribed range $\mathcal{R}(B)$ and null space $\mathcal{N}(C)$ is represented by
$$A^{(1,2)}_{\mathcal{R}(B),\mathcal{N}(C)} = B(AB)^{(1)}A(CA)^{(1)}C = BUAVC = B(CAB)^{(1)}C, \tag{4.7}$$
for arbitrary $(AB)^{(1)} \in (AB)\{1\}$, $(CA)^{(1)} \in (CA)\{1\}$ and $(CAB)^{(1)} \in (CAB)\{1\}$, and arbitrary $U \in \mathbb{C}^{k\times m}$, $V \in \mathbb{C}^{n\times l}$ satisfying $BUAB = B$ and $CAVC = C$.
Corollary 4.11 Theorem 2.11 is equivalent to Theorem 4.10 in the case $\mathrm{rank}(CAB) = \mathrm{rank}(B) = \mathrm{rank}(C) = \mathrm{rank}(A)$.
Remark 4.12 It is evident that only condition (v) of Theorem 4.10 can be derived from Urquhart's results. All other conditions are based on the solutions to certain matrix equations and they are introduced in Theorem 4.10. Also, the first two representations in (4.7) are introduced in the present research.
Theorem 4.10 is very important for computing a {1, 2}-inverse with the predefined range $\mathcal{R}(B)$ and kernel $\mathcal{N}(C)$. Algorithm 4.3 gives an efficient computational framework.
Algorithm 4.3 Calculating a {1, 2}-inverse of prescribed range and null space
Algorithm 4.4 gives an alternative procedure for computing $A^{(1,2)}_{\mathcal{R}(B),\mathcal{N}(C)}$.
Algorithm 4.4 Alternative calculation of a {1, 2}-inverse of prescribed range and null space
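The two representations in (4.7) can be compared numerically. The following is only a sketch under stated assumptions, not a transcription of Algorithms 4.3 and 4.4: Moore-Penrose inverses stand in for the arbitrary inner inverses, and the rank-5 test matrix A together with the selector matrices B and C are hypothetical choices that make condition (v) of Theorem 4.10 hold.

```python
import numpy as np

# Hypothetical rank-deficient 6 x 6 test matrix (rank 5).
A = np.array([[ 1, -1,  0,  0,  0,  0],
              [-1,  1,  0,  0,  0,  0],
              [-1, -1,  1, -1,  0,  0],
              [-1, -1, -1,  1,  0,  0],
              [-1, -1, -1,  0,  2, -1],
              [-1, -1,  0, -1, -1,  2]], dtype=float)
B = np.eye(6)[:, :5]   # R(B) = span{e1, ..., e5}
C = np.eye(6)[1:, :]   # N(C) = span{e1}

def rank(M):
    return np.linalg.matrix_rank(M)

def inverse_with_range_and_kernel(A, B, C):
    """B (CAB)^(1) C from (4.7), with pinv as the inner inverse."""
    CAB = C @ A @ B
    # Condition (v) of Theorem 4.10.
    if not (rank(CAB) == rank(A) == rank(B) == rank(C)):
        raise ValueError("rank(CAB) = rank(A) = rank(B) = rank(C) fails")
    return B @ np.linalg.pinv(CAB) @ C

X = inverse_with_range_and_kernel(A, B, C)
# First representation in (4.7): B (AB)^(1) A (CA)^(1) C.
X_alt = B @ np.linalg.pinv(A @ B) @ A @ np.linalg.pinv(C @ A) @ C

print(np.allclose(X, X_alt))                                     # both forms agree
print(np.allclose(A @ X @ A, A), np.allclose(X @ A @ X, X))      # {1,2}-inverse
print(np.allclose(X[5], 0), np.allclose(X[:, 0], 0))             # range and kernel
```

With these selector matrices the range and kernel constraints are visible directly: the last row of X vanishes (so $\mathcal{R}(X) \subseteq \mathcal{R}(B)$) and the first column of X vanishes (so $\mathcal{N}(C) \subseteq \mathcal{N}(X)$).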
Example 4.1 (a) Let us consider
$$A = \begin{bmatrix} 1 & -1 & 0 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 & 0 & 0 \\ -1 & -1 & 1 & -1 & 0 & 0 \\ -1 & -1 & -1 & 1 & 0 & 0 \\ -1 & -1 & -1 & 0 & 2 & -1 \\ -1 & -1 & 0 & -1 & -1 & 2 \end{bmatrix}$$
and
0.225335
B=
0.243036
0.897702
0.45812
0.566272
0.889351 0.0980218 0.973943
0.27347
0.622659
0.70979
0.736926
0.933898 0.314849 0.218703
0.840236
0.978944
0.221621 0.210693 0.271547
, 0.811756 0.0749757 0.631064 0.767058 0.943193 0.262034
C=
0.521909
0.639143 0.426581 0.519599
0.449225
0.547829
0.201558
0.704045 0.540362
0.871495
0.863133
0.677018 0.0668656 0.727383 0.654979
0.943263
0.88328
0.907449
0.365506
0.606665 0.142921 0.0816509 :
0.372605
0.251005
0.33544
0.926017
0.0644001 0.664659
0.361055
0.381101 0.456523
0.6328
0.274991 0.213369
The matrices B and C are generated in order to illustrate Theorem 4.10 and Algorithms 4.3 and 4.4. The conditions (iv) and (v) of Theorem 4.10 are satisfied. Therefore, the results generated by Algorithms 4.3 and 4.4 are expected to coincide up to rounding errors. One of the possible solutions to $BUCAB = B$ and $CABUC = C$ is given by $U = (CAB)^{\dagger}$. The corresponding outer inverse is given as $X = BUC$, which is equal to
$$X = A^{(2)}_{\mathcal{R}(B),\mathcal{N}(C)} = A^{(1,2)}_{\mathcal{R}(B),\mathcal{N}(C)} = B(CAB)^{\dagger}C = \begin{bmatrix} 3.35289 & 2.85289 & -0.25 & -0.25 & -9.44\times 10^{-15} & -1.53\times 10^{-14} \\ -8.71026 & -8.21026 & -0.25 & -0.25 & -2.22\times 10^{-15} & 3.66\times 10^{-15} \\ 12.0644 & 11.2978 & 0.506077 & 0.235422 & -0.448913 & -0.83511 \\ -9.88819 & -10.6548 & 0.00607695 & 0.735422 & -0.448913 & -0.83511 \\ 5.01206 & 4.24545 & -0.16059 & -0.0979117 & 0.217754 & -0.501777 \\ -2.40549 & -3.1721 & -0.327256 & 0.068755 & -0.115579 & -0.168444 \end{bmatrix}.$$
In this case, the matrix equations $CAX = CABUC = C$ and $XAB = BUCAB = B$ are satisfied, since
$$\|CABUC - C\|_2 = 1.171753294529215 \times 10^{-13}, \qquad \|BUCAB - B\|_2 = 1.171753294529215 \times 10^{-13}.$$
(b) The outer inverse $X = A^{(2)}_{*,\mathcal{N}(C)} = A^{(1,2)}_{*,\mathcal{N}(C)}$ can be computed using Algorithm 2.2.
Step 1.
Solve the matrix equation $CAVC = C$ with respect to an unknown matrix $V \in \mathbb{C}^{n\times l}$. The MATLAB Simulink model gives
$$V = \begin{bmatrix} -5.73388 & 5.44832 & 2.20603 & -2.15892 & 1.27204 \\ 11.9853 & -11.3053 & -6.9465 & 5.39402 & -3.26872 \\ -14.9338 & 13.8379 & 9.20539 & -7.54047 & 4.84552 \\ 17.4675 & -16.7211 & -8.11103 & 8.20388 & -6.06537 \\ -7.71027 & 7.17545 & 2.17421 & -2.73204 & 2.85034 \\ 5.17655 & -4.29222 & -3.2692 & 2.06864 & -1.63049 \end{bmatrix}.$$
Step 2.
The output matrix is
$$X = VC = A^{(1,2)}_{*,\mathcal{N}(C)} = \begin{bmatrix} 3.35285 & 2.85285 & -0.250001 & -0.249997 & 5.26\times 10^{-6} & 3.29\times 10^{-6} \\ -8.71021 & -8.21021 & -0.249998 & -0.250006 & -0.00001221 & -7.13\times 10^{-6} \\ 10.8687 & 10.8687 & 0.499998 & 7.88\times 10^{-6} & -0.249984 & -0.249991 \\ -11.0838 & -11.0838 & 2.93\times 10^{-6} & 0.499991 & -0.250018 & -0.25001 \\ 3.81574 & 3.81572 & -0.1669 & -0.333715 & 0.416583 & 0.0832855 \\ -3.60117 & -3.60117 & -0.333333 & -0.166669 & 0.0833285 & 0.416664 \end{bmatrix}.$$
Further, it can be verified that $\|XAX - X\| = 0.00088129447877$.
5 G-Outer Inverses and One-Sided G-Outer Inverses
A G-Drazin inverse of a square matrix $A \in \mathbb{C}^{n\times n}$ with k = ind(A) is defined in [84] as a matrix $X \in \mathbb{C}^{n\times n}$ (which is not unique in general) such that
$$AXA = A, \qquad A^{k+1}X = A^{k} \qquad \text{and} \qquad XA^{k+1} = A^{k}. \tag{5.1}$$
The system (5.1) is equivalent to the system
$$AXA = A, \qquad A^{D}AX = A^{D} \qquad \text{and} \qquad XAA^{D} = A^{D}.$$
As an extension of the G-Drazin inverse, a G-outer inverse was introduced for Banach space operators in terms of an outer inverse with fixed range and null space in [47]. A matrix $X \in \mathbb{C}^{n\times m}$ is a G-outer (T, S)-inverse of $A \in \mathbb{C}^{m\times n}_{T,S}$ if
$$AXA = A, \qquad A^{(2)}_{T,S}AX = A^{(2)}_{T,S} \qquad \text{and} \qquad XAA^{(2)}_{T,S} = A^{(2)}_{T,S}. \tag{5.2}$$
The G-Drazin inverse is defined as an inner inverse satisfying equations related to the Drazin inverse, and the G-outer (T, S)-inverse is introduced as an inner inverse satisfying equations related to the corresponding outer inverse. Since the G-outer (T, S)-inverse is a generalization of the G-Drazin inverse, its name comes from the name of the G-Drazin inverse. Since a G-outer inverse is not unique, denote by A{GO, T, S} the set of all G-outer (T, S)-inverses of A. It is clear that $A\{GO, T, S\} \subseteq A\{1\}$. When $A^{(2)}_{T,S} = A^{D}$, the G-outer (T, S)-inverse of A becomes the G-Drazin inverse of A. G-Drazin and G-outer inverses are essential in studying partial orders and solving some matrix equations [22, 27, 28, 50, 51].
5.1
Properties of G-Outer Inverses
According to [47, Theorem 2.1], X is a G-outer (T, S)-inverse of A if and only if it is a solution to one of the systems presented in parts (ii) and (iii) of Theorem 5.1.
Theorem 5.1 Let $A \in \mathbb{C}^{m\times n}_{T,S}$. For $X \in \mathbb{C}^{n\times m}$, the following statements are equivalent:
(i) $X \in A\{GO, T, S\}$;
(ii) $AXA = A$ and $A^{(2)}_{T,S}AX = XAA^{(2)}_{T,S}$;
(iii) $AXA = A$, $AA^{(2)}_{T,S}AX = AA^{(2)}_{T,S}$ and $XAA^{(2)}_{T,S}A = A^{(2)}_{T,S}A$.
In the case that T and S are the range and null space of some matrices D and B, respectively, we obtain additional characterizations of G-outer inverses.
Theorem 5.2 Let $B \in \mathbb{C}^{p\times m}$, $D \in \mathbb{C}^{n\times q}$ and assume $A \in \mathbb{C}^{m\times n}_{\mathcal{R}(D),\mathcal{N}(B)}$. The following statements are equivalent for $X \in \mathbb{C}^{n\times m}$:
(i) $X \in A\{GO, \mathcal{R}(D), \mathcal{N}(B)\}$;
(ii) $AXA = A$, $BAX = B$, and $XAD = D$;
(iii) $AXA = A$, $\mathcal{N}(AX) \subseteq \mathcal{N}(B)$ and $\mathcal{R}(D) \subseteq \mathcal{R}(XA)$.
5.2
Left and Right G-Outer Inverses
To present weaker versions of G-outer invertibility, either the second or the third equation in (5.2) was omitted, and left and right G-outer (T, S)-inverses (or one-sided G-outer (T, S)-inverses) of a rectangular matrix were defined in [59]. Let $A \in \mathbb{C}^{m\times n}_{T,S}$. A matrix $X \in \mathbb{C}^{n\times m}$ is
(i) a left G-outer (T, S)-inverse of A if the following equalities hold:
$$AXA = A \qquad \text{and} \qquad XAA^{(2)}_{T,S} = A^{(2)}_{T,S};$$
(ii) a right G-outer (T, S)-inverse of A if the following equalities hold:
$$AXA = A \qquad \text{and} \qquad A^{(2)}_{T,S}AX = A^{(2)}_{T,S}.$$
Obviously, if $X \in \mathbb{C}^{n\times m}$ is both a left and a right G-outer (T, S)-inverse of $A \in \mathbb{C}^{m\times n}_{T,S}$, then X is a G-outer (T, S)-inverse of A [47]. Also, an arbitrary G-outer inverse of A is both a left and a right G-outer inverse, which implies that wider classes of generalized inverses are presented. The sets of all left and right G-outer (T, S)-inverses of A are denoted by A{l, GO, T, S} and A{r, GO, T, S}, respectively. Necessary and sufficient conditions for a matrix to be a left or right G-outer inverse are considered.
Theorem 5.3 Let $A \in \mathbb{C}^{m\times n}_{T,S}$. For $X \in \mathbb{C}^{n\times m}$, the following statements are equivalent:
(i) $X \in A\{l, GO, T, S\}$;
(ii) $AXA = A$ and $XAA^{(2)}_{T,S}A = A^{(2)}_{T,S}A$.
Theorem 5.4 Let $A \in \mathbb{C}^{m\times n}_{T,S}$. For $X \in \mathbb{C}^{n\times m}$, the following statements are equivalent:
(i) $X \in A\{r, GO, T, S\}$;
(ii) $AXA = A$ and $AA^{(2)}_{T,S}AX = AA^{(2)}_{T,S}$.
In particular, if T and S are the range and null space of some matrices D and B, respectively, the following characterizations of left and right G-outer inverses are derived.
Theorem 5.5 Let $B \in \mathbb{C}^{p\times m}$, $D \in \mathbb{C}^{n\times q}$ and let $A \in \mathbb{C}^{m\times n}_{\mathcal{R}(D),\mathcal{N}(B)}$. For $X \in \mathbb{C}^{n\times m}$, the following statements are equivalent:
(i) $X \in A\{l, GO, \mathcal{R}(D)\}$;
(ii) $AXA = A$ and $XAD = D$;
(iii) $AXA = A$ and $\mathcal{R}(D) \subseteq \mathcal{R}(XA)$.
Theorem 5.6 Let $B \in \mathbb{C}^{p\times m}$, $D \in \mathbb{C}^{n\times q}$ and let $A \in \mathbb{C}^{m\times n}_{\mathcal{R}(D),\mathcal{N}(B)}$. For $X \in \mathbb{C}^{n\times m}$, the following statements are equivalent:
(i) $X \in A\{r, GO, \mathcal{N}(B)\}$;
(ii) $AXA = A$ and $BAX = B$;
(iii) $AXA = A$ and $\mathcal{N}(AX) \subseteq \mathcal{N}(B)$.
5.3
Left and Right G-Drazin Inverses
Taking m = n and $A^{(2)}_{T,S} = A^{D}$ in the definitions of the left and right G-outer (T, S)-inverses, left and right G-Drazin inverses were presented in [59]. Let $A \in \mathbb{C}^{n\times n}$ and ind(A) = k. A matrix $X \in \mathbb{C}^{n\times n}$ is
(i) a left G-Drazin inverse of A if the following equalities hold:
$$AXA = A \qquad \text{and} \qquad XA^{k+1} = A^{k};$$
(ii) a right G-Drazin inverse of A if the following equalities hold:
$$AXA = A \qquad \text{and} \qquad A^{k+1}X = A^{k}.$$
If $X \in \mathbb{C}^{n\times n}$ is both a left and a right G-Drazin inverse of $A \in \mathbb{C}^{n\times n}$, then X is a G-Drazin inverse of A. A G-Drazin inverse of A is evidently both a left and a right G-Drazin inverse of A. Denote by A{l, GD} and A{r, GD} the sets of all left and right G-Drazin inverses of A, respectively. We characterize left G-Drazin inverses in the following result by Theorems 5.3 and 5.5 and some well-known properties of Drazin inverses.
Corollary 5.7 ([59, Corollary 3.1]) Let $A \in \mathbb{C}^{n\times n}$ and ind(A) = k. For $X \in \mathbb{C}^{n\times n}$, the following statements are equivalent:
(i) $X \in A\{l, GD\}$;
(ii) $AXA = A$ and $XAA^{D} = A^{D}$;
(iii) $AXA = A$ and $XAA^{D}A = A^{D}A$;
(iv) $AXA = A$ and $\mathcal{R}(A^{k}) \subseteq \mathcal{R}(XA)$.
Applying Theorems 5.4 and 5.6, we get characterizations of right G-Drazin inverses.
Corollary 5.8 ([59, Corollary 3.2]) Let $A \in \mathbb{C}^{n\times n}$ and ind(A) = k. For $X \in \mathbb{C}^{n\times n}$, the following statements are equivalent:
(i) $X \in A\{r, GD\}$;
(ii) $AXA = A$ and $A^{D}AX = A^{D}$;
(iii) $AXA = A$ and $AA^{D}AX = AA^{D}$;
(iv) $AXA = A$ and $\mathcal{N}(AX) \subseteq \mathcal{N}(A^{k})$.
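The difference between left and right G-Drazin inverses can be seen on a tiny example. The sketch below uses only integer arithmetic and checks the defining equations directly; the matrix A (an idempotent of index 1), the candidates X and all helper names are hypothetical and serve only as an illustration.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def matpow(A, p):
    """A raised to the integer power p >= 1."""
    R = A
    for _ in range(p - 1):
        R = matmul(R, A)
    return R

def is_left_g_drazin(A, X, k):
    # AXA = A and X A^(k+1) = A^k
    return matmul(matmul(A, X), A) == A and matmul(X, matpow(A, k + 1)) == matpow(A, k)

def is_right_g_drazin(A, X, k):
    # AXA = A and A^(k+1) X = A^k
    return matmul(matmul(A, X), A) == A and matmul(matpow(A, k + 1), X) == matpow(A, k)

A = [[1, 0], [0, 0]]          # idempotent and singular, so ind(A) = 1
k = 1
X_left = [[1, 5], [0, 3]]     # left but not right G-Drazin inverse
X_right = [[1, 0], [7, 3]]    # right but not left G-Drazin inverse
X_both = [[1, 0], [0, 3]]     # both, hence a G-Drazin inverse

for X in (X_left, X_right, X_both):
    print(is_left_g_drazin(A, X, k), is_right_g_drazin(A, X, k))
# prints: True False / False True / True True
```

For this A the conditions reduce to $XA = A$ (left) and $AX = A$ (right), which explains why a nonzero entry in the first row or first column outside the diagonal breaks exactly one of the two properties.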
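Example 5.1 Consider
$$A = \begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix}, \qquad D = \begin{bmatrix} 1 & 0 \\ c & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 \\ b & 0 \end{bmatrix},$$
where a, b, c ≠ 0 are free variables. The unknown matrix X is of the general form
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} \\ x_{2,1} & x_{2,2} \end{bmatrix},$$
where $x_{1,1}, x_{1,2}, x_{2,1}, x_{2,2}$ are unevaluated variables with values from the domain $\mathbb{C}$. The matrices D and B satisfy rank(AD) = rank(D) = rank(BA) = rank(B) = 1, and the general solution to the system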
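$AXA = A$, $BAX = B$, $XAD = D$ required in Theorem 5.2 is equal to
$$A\{GO, \mathcal{R}(D), \mathcal{N}(B)\} = \left\{ \begin{bmatrix} \dfrac{1}{a+bc} & x_{1,2} \\ \dfrac{c}{a+bc} & -\dfrac{a x_{1,2}}{b} \end{bmatrix} \;\middle|\; a, b, c, x_{1,2} \in \mathbb{C} \right\}.$$
Further, the general solution to $AXA = A$, $XAD = D$ gives
$$A\{l, GO, \mathcal{R}(D)\} = \left\{ \begin{bmatrix} \dfrac{1}{a+bc} & x_{1,2} \\ \dfrac{c}{a+bc} & x_{2,2} \end{bmatrix} \;\middle|\; a, b, c, x_{1,2}, x_{2,2} \in \mathbb{C} \right\}.$$
Finally, the symbolic solution to $AXA = A$, $BAX = B$ is equal to
$$A\{r, GO, \mathcal{N}(B)\} = \left\{ \begin{bmatrix} x_{1,1} & x_{1,2} \\ \dfrac{1 - a x_{1,1}}{b} & -\dfrac{a x_{1,2}}{b} \end{bmatrix} \;\middle|\; a, b, x_{1,1}, x_{1,2} \in \mathbb{C} \right\}.$$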
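Further, for
$$C_1 = D(BAD)^{\dagger}B = \begin{bmatrix} \dfrac{b^2}{(b^2+1)(a+bc)} + \dfrac{1}{(b^2+1)(a+bc)} & 0 \\ \dfrac{cb^2}{(b^2+1)(a+bc)} + \dfrac{c}{(b^2+1)(a+bc)} & 0 \end{bmatrix},$$
the general solution to $AXA = A$, $XAC_1 = C_1$, $C_1AX = C_1$ is represented by
$$A\{GO, \mathcal{R}(C_1), \mathcal{N}(C_1)\} = \left\{ \begin{bmatrix} \dfrac{1}{a+bc} & x_{1,2} \\ \dfrac{c}{a+bc} & -\dfrac{a x_{1,2}}{b} \end{bmatrix} \;\middle|\; a, b, c, x_{1,2} \in \mathbb{C} \right\}.$$
Further, the general solution to $AXA = A$, $XAC_1 = C_1$ gives
$$A\{l, GO, \mathcal{R}(C_1)\} = \left\{ \begin{bmatrix} \dfrac{1}{a+bc} & x_{1,2} \\ \dfrac{c}{a+bc} & x_{2,2} \end{bmatrix} \;\middle|\; a, b, c, x_{1,2}, x_{2,2} \in \mathbb{C} \right\}.$$
Finally, the solution to $AXA = A$, $C_1AX = C_1$ in symbolic form is
$$A\{r, GO, \mathcal{N}(C_1)\} = \left\{ \begin{bmatrix} \dfrac{1}{a+bc} & x_{1,2} \\ \dfrac{c}{a+bc} & c\,x_{1,2} \end{bmatrix} \;\middle|\; a, b, c, x_{1,2} \in \mathbb{C} \right\}.$$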
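The first two solution families of Example 5.1 can be checked in exact rational arithmetic. The sketch below takes $A = \begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix}$, $D = \begin{bmatrix} 1 & 0 \\ c & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 \\ b & 0 \end{bmatrix}$ and substitutes the hypothetical values a = 2, b = 3, c = 1; these values and the helper names are assumptions made only for this illustration.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

# Hypothetical concrete values of the free variables.
a, b, c = F(2), F(3), F(1)
A = [[a, b], [0, 0]]
D = [[1, 0], [c, 0]]
B = [[1, 0], [b, 0]]
s = a + b * c

def g_outer_member(x12):
    """Element of A{GO, R(D), N(B)} for a given parameter x_{1,2}."""
    return [[1 / s, x12], [c / s, -a * x12 / b]]

def left_member(x12, x22):
    """Element of A{l, GO, R(D)} for given parameters x_{1,2}, x_{2,2}."""
    return [[1 / s, x12], [c / s, x22]]

def satisfies(X):
    """Return the truth values of AXA = A, BAX = B, XAD = D."""
    return (matmul(matmul(A, X), A) == A,
            matmul(matmul(B, A), X) == B,
            matmul(matmul(X, A), D) == D)

print(satisfies(g_outer_member(F(7))))      # (True, True, True)
print(satisfies(left_member(F(1), F(1))))   # (True, False, True)
```

The second line shows a left G-outer inverse that is not a G-outer inverse: the equation $BAX = B$ fails as soon as $a x_{1,2} + b x_{2,2} \neq 0$.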
6 Solvability of Some Systems by G-Outer Inverses
Purely algebraic equivalent conditions for the solvability of some new systems of matrix equations, as well as the general forms of their solutions in terms of G-outer inverses, were proposed in [50].
Theorem 6.1 ([50, Theorem 2.1]) Let $A \in \mathbb{C}^{m\times n}$ and $B, D, E \in \mathbb{C}^{n\times m}$. If $A^{(2)}_{\mathcal{R}(B),\mathcal{N}(D)}$ exists and $A\{GO, \mathcal{R}(B), \mathcal{N}(D)\} \neq \emptyset$, then the system
$$AXA = AEA \qquad \text{and} \qquad BAEAX = XAEAD \tag{6.1}$$
has a solution if and only if $ABA(EA)^2 = (AE)^2ADA$. In this case, the general solution X to (6.1) is given as
$$X = C_1AEAC_2 + M - (I - A^{-}A)MAEAD(AEAD)^{-} - (BAEA)^{-}BAEAM(I - AA^{-}) - A^{-}AMAA^{-},$$
for arbitrary $M \in \mathbb{C}^{n\times m}$ and arbitrary but fixed $(AEAD)^{-} \in (AEAD)\{1\}$, $(BAEA)^{-} \in (BAEA)\{1\}$, $C_1, C_2 \in A\{GO, \mathcal{R}(B), \mathcal{N}(D)\}$, and $A^{-} \in A\{1\}$.
Remark that [27, Theorem 2.2] can be obtained as a special case of Theorem 6.1 for m = n, k = ind(A) and $B = D = A^{k-1}$. The results stated in Theorem 6.1 give solutions to some additional systems of matrix equations.
Corollary 6.2 Let $A \in \mathbb{C}^{m\times n}$ and $B, D \in \mathbb{C}^{n\times m}$.
(i) If $A^{(2)}_{\mathcal{R}(B),\mathcal{N}(D)}$ exists and $A\{GO, \mathcal{R}(B), \mathcal{N}(D)\} \neq \emptyset$, then the system
$$AXA = A \qquad \text{and} \qquad BAX = XAD \tag{6.2}$$
is solvable if and only if $ABA = ADA$. In this case, the general solution X to (6.2) is given as
$$X = C + M - (I - A^{-}A)MAD(AD)^{-} - (BA)^{-}BAM(I - AA^{-}) - A^{-}AMAA^{-},$$
for arbitrary $M \in \mathbb{C}^{n\times m}$ and arbitrary but fixed $A^{-} \in A\{1\}$, $(AD)^{-} \in (AD)\{1\}$, $(BA)^{-} \in (BA)\{1\}$ and $C \in A\{GO, \mathcal{R}(B), \mathcal{N}(D)\}$.
(ii) If $A^{(2)}_{\mathcal{R}(B),\mathcal{N}(B)}$ exists and $A\{GO, \mathcal{R}(B), \mathcal{N}(B)\} \neq \emptyset$, then the system
$$AXA = A \qquad \text{and} \qquad BAX = XAB \tag{6.3}$$
has a solution. In this case, the general solution X to (6.3) is given as
$$X = C + M - (I - A^{-}A)MAB(AB)^{-} - (BA)^{-}BAM(I - AA^{-}) - A^{-}AMAA^{-},$$
for arbitrary $M \in \mathbb{C}^{n\times m}$ and arbitrary but fixed $A^{-} \in A\{1\}$, $(AB)^{-} \in (AB)\{1\}$, $(BA)^{-} \in (BA)\{1\}$ and $C \in A\{GO, \mathcal{R}(B), \mathcal{N}(B)\}$.
Results obtained in [50, Theorem 2.1] give an equivalent condition for solving a new matrix equation system which is a generalization of the system given by (5.2). Also, the general solution to this new system is presented in terms of G-outer inverses. Besides the fact that the matrices B and D are not of the same type in Theorems 6.1 and 6.3, we observe that the G-outer inverses which appear in Theorems 6.1 and 6.3 are not from the same set. Precisely, we use G-outer inverses from the set $A\{GO, \mathcal{R}(B), \mathcal{N}(D)\}$ in Theorem 6.1, but from the set $A\{GO, \mathcal{R}(D), \mathcal{N}(B)\}$ in Theorem 6.3.
Theorem 6.3 ([50, Theorem 2.1]) Let $A \in \mathbb{C}^{m\times n}$, $B \in \mathbb{C}^{p\times m}$, $D \in \mathbb{C}^{n\times q}$ and $E \in \mathbb{C}^{n\times m}$. If $A^{(2)}_{\mathcal{R}(D),\mathcal{N}(B)}$ exists (or rank(D) = rank(B) = rank(BAD)) and $A\{GO, \mathcal{R}(D), \mathcal{N}(B)\} \neq \emptyset$, then the system
$$AXA = AEA, \qquad BAEAX = B \qquad \text{and} \qquad XAEAD = D \tag{6.4}$$
has a solution if and only if
$$BA(EA)^2 = BA \qquad \text{and} \qquad (AE)^2AD = AD.$$
In this case, the general solution X to (6.4) is
$$X = C_1AEAC_2 + M - (I - A^{-}A)MAEAD(AEAD)^{-} - (BAEA)^{-}BAEAM(I - AA^{-}) - A^{-}AMAA^{-},$$
for arbitrary $M \in \mathbb{C}^{n\times m}$ and arbitrary but fixed $A^{-} \in A\{1\}$, $(BAEA)^{-} \in (BAEA)\{1\}$, $(AEAD)^{-} \in (AEAD)\{1\}$, and $C_1, C_2 \in A\{GO, \mathcal{R}(D), \mathcal{N}(B)\}$.
Specializing the matrices B, D, E of Theorem 6.3, we obtain some interesting applications. In the particular case $E = A^{\dagger}$ in Theorem 6.3, we solve the following system of three equations using a G-outer inverse.
Corollary 6.4 Let $A \in \mathbb{C}^{m\times n}$, $B \in \mathbb{C}^{p\times m}$ and $D \in \mathbb{C}^{n\times q}$. If $A^{(2)}_{\mathcal{R}(D),\mathcal{N}(B)}$ exists and $A\{GO, \mathcal{R}(D), \mathcal{N}(B)\} \neq \emptyset$, then the system
$$AXA = A, \qquad BAX = B \qquad \text{and} \qquad XAD = D \tag{6.5}$$
has a solution. In addition, the general solution X to (6.5) is given as
$$X = C + M - (I - A^{-}A)MAD(AD)^{-} - (BA)^{-}BAM(I - AA^{-}) - A^{-}AMAA^{-},$$
for arbitrary $M \in \mathbb{C}^{n\times m}$ and arbitrary but fixed $C \in A\{GO, \mathcal{R}(D), \mathcal{N}(B)\}$, $A^{-} \in A\{1\}$, $(AD)^{-} \in (AD)\{1\}$ and $(BA)^{-} \in (BA)\{1\}$.
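The general solution formula of Corollary 6.4 can be exercised numerically. The following sketch assumes NumPy, uses Moore-Penrose inverses as the fixed inner inverses $A^{-}$, $(AD)^{-}$, $(BA)^{-}$, and works with hypothetical 2 × 2 data for which one particular G-outer inverse C is known in closed form; the function names are illustrative only.

```python
import numpy as np
from numpy.linalg import pinv

# Hypothetical data with rank(AD) = rank(D) = rank(BA) = rank(B) = 1.
A = np.array([[2.0, 3.0], [0.0, 0.0]])
D = np.array([[1.0, 0.0], [1.0, 0.0]])
B = np.array([[1.0, 0.0], [3.0, 0.0]])
# One particular solution of AXA = A, BAX = B, XAD = D (checked by hand).
C = np.array([[0.2, 0.0], [0.2, 0.0]])

def general_solution(A, B, D, C, M):
    """X = C + M - (I - A^-A) M AD (AD)^- - (BA)^- BA M (I - AA^-) - A^-A M AA^-."""
    m, n = A.shape
    A_in, AD_in, BA_in = pinv(A), pinv(A @ D), pinv(B @ A)
    P = A_in @ A          # A^- A
    Q = A @ A_in          # A A^-
    return (C + M
            - (np.eye(n) - P) @ M @ A @ D @ AD_in
            - BA_in @ B @ A @ M @ (np.eye(m) - Q)
            - P @ M @ Q)

def solves_system(X):
    """Check the three equations of the system (6.5)."""
    return (np.allclose(A @ X @ A, A)
            and np.allclose(B @ A @ X, B)
            and np.allclose(X @ A @ D, D))

rng = np.random.default_rng(0)
samples = [general_solution(A, B, D, C, rng.standard_normal((2, 2)))
           for _ in range(5)]
print(all(solves_system(X) for X in samples))
```

Every randomly drawn parameter M should produce a solution, since the three correction terms remove exactly the components of M that would violate one of the equations.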
Choosing $B = D = A^{(2)}_{T,S}$ in Theorem 6.3, we present a system which generalizes (5.2) and show that its solvability is equivalent to some algebraic conditions. We also describe the general form of its solutions.
Corollary 6.5 Let $A \in \mathbb{C}^{m\times n}_{T,S}$ and $E \in \mathbb{C}^{n\times m}$. If $A\{GO, T, S\} \neq \emptyset$, then the system
$$AXA = AEA, \qquad A^{(2)}_{T,S}AEAX = A^{(2)}_{T,S} \qquad \text{and} \qquad XAEAA^{(2)}_{T,S} = A^{(2)}_{T,S} \tag{6.6}$$
has a solution if and only if
$$A^{(2)}_{T,S}A(EA)^2 = A^{(2)}_{T,S}A \qquad \text{and} \qquad (AE)^2AA^{(2)}_{T,S} = AA^{(2)}_{T,S}.$$
In this case, the general solution X to (6.6) is given as
$$X = C_1AEAC_2 + M - (I - A^{-}A)MAEAA^{(2)}_{T,S}(AEAA^{(2)}_{T,S})^{-} - (A^{(2)}_{T,S}AEA)^{-}A^{(2)}_{T,S}AEAM(I - AA^{-}) - A^{-}AMAA^{-},$$
for arbitrary $M \in \mathbb{C}^{n\times m}$ and arbitrary but fixed $C_1, C_2 \in A\{GO, T, S\}$, $A^{-} \in A\{1\}$, $(AEAA^{(2)}_{T,S})^{-} \in (AEAA^{(2)}_{T,S})\{1\}$ and $(A^{(2)}_{T,S}AEA)^{-} \in (A^{(2)}_{T,S}AEA)\{1\}$.
Setting $E = A^{(2)}_{T,S}$ in Corollary 6.5, we show the next consequence.
Corollary 6.6 Let $A \in \mathbb{C}^{m\times n}_{T,S}$ and $E \in \mathbb{C}^{n\times m}$. If $A\{GO, T, S\} \neq \emptyset$, then the system
$$AXA = AA^{(2)}_{T,S}A, \qquad A^{(2)}_{T,S}AX = A^{(2)}_{T,S} \qquad \text{and} \qquad XAA^{(2)}_{T,S} = A^{(2)}_{T,S} \tag{6.7}$$
is consistent. In addition, the general solution X to (6.7) is given as
$$X = A^{(2)}_{T,S} + M - (I - A^{-}A)MAA^{(2)}_{T,S} - A^{(2)}_{T,S}AM(I - AA^{-}) - A^{-}AMAA^{-},$$
for an arbitrary $M \in \mathbb{C}^{n\times m}$ and arbitrary but fixed $A^{-} \in A\{1\}$.
If $A^{(2)}_{T,S} = A^{D}$ in Corollary 6.6, we obtain the next corollary.
Corollary 6.7 Let $A \in \mathbb{C}^{n\times n}$ and k = ind(A). Then the system
$$AXA = AA^{D}A, \qquad A^{D}AX = A^{D} \qquad \text{and} \qquad XAA^{D} = A^{D} \tag{6.8}$$
has a solution. In addition, the general solution X to (6.8) is given as
$$X = A^{D} + M - (I - A^{-}A)MAA^{D} - A^{D}AM(I - AA^{-}) - A^{-}AMAA^{-},$$
for arbitrary $M \in \mathbb{C}^{n\times n}$ and arbitrary but fixed $A^{-} \in A\{1\}$.
We can easily check that the system appearing in Corollary 6.7 has a solution if and only if the system $AXA = AA^{D}A$ and $A^{k}X = XA^{k}$, considered in [27, Corollary 3.5], has a solution, and these two systems have the same general solution forms. Applying Theorem 6.3 for $A \in \mathbb{C}^{n\times n}$, k = ind(A) and $B = D = A^{k}$, we consider the solvability of an extension of system (5.1).
Corollary 6.8 Let $A \in \mathbb{C}^{n\times n}$, k = ind(A), and $E \in \mathbb{C}^{n\times n}$. Then the system $AXA = AEA$,
$A^{k+1}EAX = A^{k}$ and $XAEA^{k+1} = A^{k}$ (6.9)
has a solution if and only if
$$A^{k+1}(EA)^2 = A^{k+1} \qquad \text{and} \qquad (AE)^2A^{k+1} = A^{k+1}.$$
In this case, the general solution X to the matrix system (6.9) is given by
$$X = C_1AEAC_2 + M - (I - A^{-}A)MAEA^{k+1}(AEA^{k+1})^{-} - (A^{k+1}EA)^{-}A^{k+1}EAM(I - AA^{-}) - A^{-}AMAA^{-},$$
for arbitrary $M \in \mathbb{C}^{n\times n}$ and arbitrary but fixed $C_1, C_2 \in A\{GD\}$, $A^{-} \in A\{1\}$, $(AEA^{k+1})^{-} \in (AEA^{k+1})\{1\}$ and $(A^{k+1}EA)^{-} \in (A^{k+1}EA)\{1\}$.
By Corollary 6.5, we describe the set of all G-outer (T, S)-inverses of A using one particular G-outer (T, S)-inverse of A. Precisely, we present two general representations for G-outer inverses, one in terms of a single parameter M and one in terms of two parameters U and V.
Theorem 6.9 Let $A \in \mathbb{C}^{m\times n}_{T,S}$. If $A\{GO, T, S\} \neq \emptyset$, then
$$A\{GO, T, S\} = \{C + (I - A^{-}A)M(I - AA^{(2)}_{T,S}) + (I - A^{(2)}_{T,S}A)M(I - AA^{-}) - (I - A^{-}A)M(I - AA^{-}) \mid M \in \mathbb{C}^{n\times m} \text{ is arbitrary}\}$$
$$= \{C + (I - A^{-}A)V(I - AA^{(2)}_{T,S}) + (I - A^{(2)}_{T,S}A)U(I - AA^{-}) \mid U, V \in \mathbb{C}^{n\times m} \text{ are arbitrary}\},$$
for arbitrary but fixed $C \in A\{GO, T, S\}$ and $A^{-} \in A\{1\}$.
Remark that, if m = n and $A^{(2)}_{T,S} = A^{D}$ in Theorem 6.9, we obtain [27, Theorem 3.2].
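The two-parameter representation of Theorem 6.9 can be tried out in the G-Drazin case $A^{(2)}_{T,S} = A^{D}$. The sketch below is illustrative only: it assumes NumPy, computes the Drazin inverse through the classical formula $A^{D} = A^{k}(A^{2k+1})^{\dagger}A^{k}$, uses $A^{\dagger}$ as the fixed inner inverse $A^{-}$, and works with a hypothetical 3 × 3 matrix of index 2 and a hand-checked particular G-Drazin inverse C.

```python
import numpy as np
from numpy.linalg import matrix_power, matrix_rank, pinv

def index(A):
    """Smallest k >= 0 with rank(A^k) = rank(A^(k+1))."""
    k = 0
    while matrix_rank(matrix_power(A, k)) != matrix_rank(matrix_power(A, k + 1)):
        k += 1
    return k

def drazin(A):
    """Drazin inverse via A^k (A^(2k+1))^+ A^k with k = ind(A)."""
    k = index(A)
    Ak = matrix_power(A, k)
    return Ak @ pinv(matrix_power(A, 2 * k + 1)) @ Ak

def is_g_drazin(A, X):
    """Check the system (5.1): AXA = A, A^(k+1) X = A^k, X A^(k+1) = A^k."""
    k = index(A)
    Ak, Ak1 = matrix_power(A, k), matrix_power(A, k + 1)
    return (np.allclose(A @ X @ A, A)
            and np.allclose(Ak1 @ X, Ak)
            and np.allclose(X @ Ak1, Ak))

# Hypothetical data: an invertible 1 x 1 block plus a nilpotent 2 x 2 block.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
C = np.array([[0.5, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])   # one particular G-Drazin inverse of A

AD, A_in, I = drazin(A), pinv(A), np.eye(3)
rng = np.random.default_rng(1)
U, V = rng.standard_normal((2, 3, 3))

# Second representation of Theorem 6.9 with A^(2)_{T,S} replaced by A^D.
X = C + (I - A_in @ A) @ V @ (I - A @ AD) + (I - AD @ A) @ U @ (I - A @ A_in)

print(index(A), is_g_drazin(A, C), is_g_drazin(A, X))
```

For this A the projectors are diagonal, so the parametrization only changes the entries (2,2), (2,3) and (3,3) of C, which is easy to confirm by printing `X - C`.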
7 Solvability of Some Systems by Left and Right G-Outer Inverses
Using left and right G-outer inverses, new matrix equation systems were solved in [50]. Necessary and sufficient conditions for solving these new systems and their general solutions are presented now.
Theorem 7.1 Let $B \in \mathbb{C}^{p\times m}$, $D \in \mathbb{C}^{n\times q}$, $E \in \mathbb{C}^{n\times m}$, $A \in \mathbb{C}^{m\times n}_{\mathcal{R}(D),\mathcal{N}(B)}$ and $A\{l, GO, \mathcal{R}(D)\} \neq \emptyset$. Then the system
$$AXA = AEA \qquad \text{and} \qquad XAEAD = D \tag{7.1}$$
has a solution if and only if $(AE)^2AD = AD$. In this case, the general solution X to (7.1) is given as
$$X = C_1AEAC_2 + W - (I - A^{-}A)WAEAD(AEAD)^{-} - A^{-}AWAA^{-},$$
for arbitrary $W \in \mathbb{C}^{n\times m}$ and fixed but arbitrary $A^{-} \in A\{1\}$, $(AEAD)^{-} \in (AEAD)\{1\}$ and $C_1, C_2 \in A\{l, GO, \mathcal{R}(D)\}$.
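Theorem 7.2 Let $B \in \mathbb{C}^{p\times m}$, $D \in \mathbb{C}^{n\times q}$, $E \in \mathbb{C}^{n\times m}$, $A \in \mathbb{C}^{m\times n}_{\mathcal{R}(D),\mathcal{N}(B)}$ and $A\{r, GO, \mathcal{N}(B)\} \neq \emptyset$. Then the system
$$AXA = AEA \qquad \text{and} \qquad BAEAX = B \tag{7.2}$$
has a solution if and only if $BA(EA)^2 = BA$. In this case, the general solution X to (7.2) is given as
$$X = C_1AEAC_2 + W - (BAEA)^{-}BAEAW(I - AA^{-}) - A^{-}AWAA^{-},$$
for arbitrary $W \in \mathbb{C}^{n\times m}$ and fixed but arbitrary $A^{-} \in A\{1\}$, $(BAEA)^{-} \in (BAEA)\{1\}$ and $C_1, C_2 \in A\{r, GO, \mathcal{N}(B)\}$.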
For $E = A^{\dagger}$ in Theorem 7.1, we get the general solution of the corresponding system of two equations based on a left G-outer inverse.
Corollary 7.3 Suppose that $B \in \mathbb{C}^{p\times m}$, $D \in \mathbb{C}^{n\times q}$, $A \in \mathbb{C}^{m\times n}_{\mathcal{R}(D),\mathcal{N}(B)}$, and $A\{l, GO, \mathcal{R}(D)\} \neq \emptyset$. Then the system
$$AXA = A \qquad \text{and} \qquad XAD = D \tag{7.3}$$
has a solution. In addition, the general solution X to (7.3) is given as
$$X = C + W - (I - A^{-}A)WAD(AD)^{-} - A^{-}AWAA^{-},$$
for arbitrary $W \in \mathbb{C}^{n\times m}$ and for fixed but arbitrary $A^{-} \in A\{1\}$, $(AD)^{-} \in (AD)\{1\}$ and $C \in A\{l, GO, \mathcal{R}(D)\}$.
Similarly, we have the next result by Theorem 7.2.
Corollary 7.4 Suppose that $B \in \mathbb{C}^{p\times m}$, $D \in \mathbb{C}^{n\times q}$, $A \in \mathbb{C}^{m\times n}_{\mathcal{R}(D),\mathcal{N}(B)}$, and $A\{r, GO, \mathcal{N}(B)\} \neq \emptyset$. Then the system
$$AXA = A \qquad \text{and} \qquad BAX = B \tag{7.4}$$
has a solution. In addition, the general solution X to (7.4) is
$$X = C + W - (BA)^{-}BAW(I - AA^{-}) - A^{-}AWAA^{-},$$
for arbitrary $W \in \mathbb{C}^{n\times m}$ and for fixed but arbitrary $A^{-} \in A\{1\}$, $(BA)^{-} \in (BA)\{1\}$ and $C \in A\{r, GO, \mathcal{N}(B)\}$.
The choice $B = D = A^{(2)}_{T,S}$ in Theorems 7.1 and 7.2 leads to systems that extend those given in the definitions of left and right G-outer inverses.
Corollary 7.5 Let $A \in \mathbb{C}^{m\times n}_{T,S}$, $E \in \mathbb{C}^{n\times m}$ and $A\{l, GO, T, S\} \neq \emptyset$. Then the system
$$AXA = AEA \qquad \text{and} \qquad XAEAA^{(2)}_{T,S} = A^{(2)}_{T,S} \tag{7.5}$$
has a solution if and only if $(AE)^2AA^{(2)}_{T,S} = AA^{(2)}_{T,S}$. In this case, the general solution X to (7.5) is given as
$$X = C_1AEAC_2 + W - (I - A^{-}A)WAEAA^{(2)}_{T,S}(AEAA^{(2)}_{T,S})^{-} - A^{-}AWAA^{-},$$
for arbitrary $W \in \mathbb{C}^{n\times m}$ and for fixed but arbitrary $C_1, C_2 \in A\{l, GO, T, S\}$, $A^{-} \in A\{1\}$ and $(AEAA^{(2)}_{T,S})^{-} \in (AEAA^{(2)}_{T,S})\{1\}$.
Corollary 7.6 Let $A \in \mathbb{C}^{m\times n}_{T,S}$, $E \in \mathbb{C}^{n\times m}$ and $A\{r, GO, T, S\} \neq \emptyset$. Then the system
$$AXA = AEA \qquad \text{and} \qquad A^{(2)}_{T,S}AEAX = A^{(2)}_{T,S} \tag{7.6}$$
has a solution if and only if $A^{(2)}_{T,S}A(EA)^2 = A^{(2)}_{T,S}A$. In this case, the general solution X to (7.6) is equal to
$$X = C_1AEAC_2 + W - (A^{(2)}_{T,S}AEA)^{-}A^{(2)}_{T,S}AEAW(I - AA^{-}) - A^{-}AWAA^{-},$$
for arbitrary $W \in \mathbb{C}^{n\times m}$, and for fixed but arbitrary $C_1, C_2 \in A\{r, GO, T, S\}$, $A^{-} \in A\{1\}$, and $(A^{(2)}_{T,S}AEA)^{-} \in (A^{(2)}_{T,S}AEA)\{1\}$.
Now, the sets of all left and right G-outer (T, S)-inverses of A can be described using one particular left and right G-outer (T, S)-inverse of A, respectively. In particular, we obtain general representations of left and right G-outer inverses based on one parameter W and based on two parameters U and V.
Theorem 7.7 Let $A \in \mathbb{C}^{m\times n}_{T,S}$ and $A\{l, GO, T, S\} \neq \emptyset$. Then
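$$A\{l, GO, T, S\} = \{C + W - (I - A^{-}A)WAA^{(2)}_{T,S} - A^{-}AWAA^{-} \mid W \in \mathbb{C}^{n\times m} \text{ is arbitrary}\}$$
$$= \{C + (I - A^{-}A)V(I - AA^{(2)}_{T,S}) + U(I - AA^{-}) \mid U, V \in \mathbb{C}^{n\times m} \text{ are arbitrary}\},$$
for fixed but arbitrary $C \in A\{l, GO, T, S\}$ and $A^{-} \in A\{1\}$.
Theorem 7.8 Let $A \in \mathbb{C}^{m\times n}_{T,S}$ and $A\{r, GO, T, S\} \neq \emptyset$. Then
$$A\{r, GO, T, S\} = \{C + W - A^{(2)}_{T,S}AW(I - AA^{-}) - A^{-}AWAA^{-} \mid W \in \mathbb{C}^{n\times m} \text{ is arbitrary}\}$$
$$= \{C + (I - A^{-}A)V + (I - A^{(2)}_{T,S}A)U(I - AA^{-}) \mid U, V \in \mathbb{C}^{n\times m} \text{ are arbitrary}\},$$
for fixed but arbitrary $C \in A\{r, GO, T, S\}$ and $A^{-} \in A\{1\}$.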
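The second representation in Theorem 7.7 lends itself to a quick numerical experiment. The sketch below assumes NumPy and hypothetical 2 × 2 data; the outer inverse is taken as $A^{(2)}_{\mathcal{R}(D),\mathcal{N}(B)} = D(BAD)^{\dagger}B$, which presumes rank(BAD) = rank(B) = rank(D), and $A^{\dagger}$ plays the role of the fixed inner inverse $A^{-}$. All names are illustrative.

```python
import numpy as np
from numpy.linalg import pinv

# Hypothetical data with rank(BAD) = rank(B) = rank(D) = 1.
A = np.array([[2.0, 3.0], [0.0, 0.0]])
D = np.array([[1.0, 0.0], [1.0, 0.0]])
B = np.array([[1.0, 0.0], [3.0, 0.0]])

A2 = D @ pinv(B @ A @ D) @ B     # outer inverse with range R(D) and kernel N(B)
A_in = pinv(A)                   # fixed inner inverse A^-
I = np.eye(2)
C = A2.copy()                    # here A2 itself is a left G-outer inverse

def is_left_g_outer(X):
    """AXA = A and X A A2 = A2."""
    return np.allclose(A @ X @ A, A) and np.allclose(X @ A @ A2, A2)

def is_right_g_outer(X):
    """AXA = A and A2 A X = A2."""
    return np.allclose(A @ X @ A, A) and np.allclose(A2 @ A @ X, A2)

def left_family(V, U):
    """C + (I - A^-A) V (I - A A2) + U (I - AA^-) from Theorem 7.7."""
    return C + (I - A_in @ A) @ V @ (I - A @ A2) + U @ (I - A @ A_in)

rng = np.random.default_rng(2)
members = [left_family(rng.standard_normal((2, 2)), rng.standard_normal((2, 2)))
           for _ in range(5)]
print(all(is_left_g_outer(X) for X in members))

# A deterministic member that is left but not right G-outer.
X0 = left_family(np.zeros((2, 2)), np.array([[0.0, 1.0], [0.0, 1.0]]))
print(X0, is_left_g_outer(X0), is_right_g_outer(X0))
```

The deterministic member X0 equals C with its second column replaced by ones, which keeps both left conditions but violates $A^{(2)}AX = A^{(2)}$; this matches the observation that the left class is strictly wider than the class of G-outer inverses.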
In the case when $E = A^{(2)}_{T,S}$ in Corollary 7.5, we get solutions to one more system.
Corollary 7.9 Let $A \in \mathbb{C}^{m\times n}_{T,S}$ and $A\{l, GO, T, S\} \neq \emptyset$. Then the system
$$AXA = AA^{(2)}_{T,S}A \qquad \text{and} \qquad XAA^{(2)}_{T,S} = A^{(2)}_{T,S} \tag{7.7}$$
has a solution. In addition, the general solution X to (7.7) is
$$X = A^{(2)}_{T,S}AC + W - (I - A^{-}A)WAA^{(2)}_{T,S} - A^{-}AWAA^{-},$$
for arbitrary $W \in \mathbb{C}^{n\times m}$ and for fixed but arbitrary $C \in A\{l, GO, T, S\}$ and $A^{-} \in A\{1\}$.
Similarly, we prove the following result.
Corollary 7.10 Let $A \in \mathbb{C}^{m\times n}_{T,S}$ and $A\{r, GO, T, S\} \neq \emptyset$. Then the system
$$AXA = AA^{(2)}_{T,S}A \qquad \text{and} \qquad A^{(2)}_{T,S}AX = A^{(2)}_{T,S} \tag{7.8}$$
has a solution. In addition, the general solution X to (7.8) is given as
$$X = CAA^{(2)}_{T,S} + W - A^{(2)}_{T,S}AW(I - AA^{-}) - A^{-}AWAA^{-},$$
for arbitrary $W \in \mathbb{C}^{n\times m}$ and for fixed but arbitrary $C \in A\{r, GO, T, S\}$ and $A^{-} \in A\{1\}$.
Using AT,S = AD in Corollary 7.9, we get the following consequence. Corollary 7.11 Let A 2 n × n and k = ind(A). Then the system AXA = AAD A and
XAAD = AD
ð7:9Þ
has a solution. In addition, the general solution X to (7.9) is given as X = AD AC + W - ðI - A - AÞWAAD - A - AWAA - , for arbitrary W 2 n × m and for fixed but arbitrary C 2 A{l, GD} and A-2 A {1}. Also, we can check the next result.
Corollary 7.12 Let $A \in \mathbb{C}^{n \times n}$ and $k = \mathrm{ind}(A)$. Then the system
\[ AXA = AA^DA \quad \text{and} \quad A^DAX = A^D \tag{7.10} \]
has a solution. In addition, the general solution $X$ to (7.10) is given as
\[ X = CAA^D + W - A^DAW(I - AA^{-}) - A^{-}AWAA^{-}, \]
for arbitrary $W \in \mathbb{C}^{n \times n}$ and for fixed but arbitrary $C \in A\{r, GD\}$ and $A^{-} \in A\{1\}$.

For $A \in \mathbb{C}^{n \times n}$, $k = \mathrm{ind}(A)$ and $B = D = A^k$ in Theorem 7.1, we obtain solvability of an extension of system (5.1).

Corollary 7.13 Let $A \in \mathbb{C}^{n \times n}$, $k = \mathrm{ind}(A)$ and $E \in \mathbb{C}^{n \times n}$. Then the system
\[ AXA = AEA \quad \text{and} \quad XAEA^{k+1} = A^k \tag{7.11} \]
has a solution if and only if
\[ (AE)^2A^k = A^k. \]
In this case, the general solution $X$ to (7.11) is given as
\[ X = C_1AEAC_2 + W - (I - A^{-}A)WAEA^{k+1}(AEA^{k+1})^{-} - A^{-}AWAA^{-}, \]
for arbitrary $W \in \mathbb{C}^{n \times n}$ and for fixed but arbitrary $C_1, C_2 \in A\{l, GD\}$, $A^{-} \in A\{1\}$ and $(AEA^{k+1})^{-} \in (AEA^{k+1})\{1\}$.

Corollary 7.14 Let $A \in \mathbb{C}^{n \times n}$, $k = \mathrm{ind}(A)$ and $E \in \mathbb{C}^{n \times n}$. Then the system
\[ AXA = AEA \quad \text{and} \quad A^{k+1}EAX = A^k \tag{7.12} \]
has a solution if and only if
\[ A^k(EA)^2 = A^k. \]
In this case, the general solution $X$ to (7.12) is
\[ X = C_1AEAC_2 + W - (A^{k+1}EA)^{-}A^{k+1}EAW(I - AA^{-}) - A^{-}AWAA^{-}, \]
for arbitrary $W \in \mathbb{C}^{n \times n}$ and for fixed but arbitrary $C_1, C_2 \in A\{r, GD\}$, $A^{-} \in A\{1\}$ and $(A^{k+1}EA)^{-} \in (A^{k+1}EA)\{1\}$.

Applying Theorems 7.7 and 7.8, we describe the sets of all left and right G-Drazin inverses of A.

Corollary 7.15 Let $A \in \mathbb{C}^{n \times n}$ and $k = \mathrm{ind}(A)$. Then
\[ A\{l, GD\} = \{C + W - (I - A^{-}A)WAA^D - A^{-}AWAA^{-} \mid W \in \mathbb{C}^{n \times n}\} \]
\[ = \{C + (I - A^{-}A)V(I - AA^D) + U(I - AA^{-}) \mid U, V \in \mathbb{C}^{n \times n}\} \]
and
\[ A\{r, GD\} = \{C' + W - A^DAW(I - AA^{-}) - A^{-}AWAA^{-} \mid W \in \mathbb{C}^{n \times n}\} \]
\[ = \{C' + (I - A^{-}A)V + (I - A^DA)U(I - AA^{-}) \mid U, V \in \mathbb{C}^{n \times n}\}, \]
for fixed but arbitrary $C \in A\{l, GD\}$, $C' \in A\{r, GD\}$ and $A^{-} \in A\{1\}$.
8 Conclusion
The presented results provide equivalent conditions for the existence of outer and inner inverses with prescribed range and/or null space, together with the corresponding characterizations and representations. These representations are derived from appropriate matrix equations, which leads to efficient computational procedures, presented in the form of algorithms. The proposed methods and algorithms, arising from theoretical investigations, are aimed at computing various classes of outer and/or inner generalized inverses of the form $B(CAB)^{(1)}C$, where $(CAB)^{(1)}$ is an arbitrary solution to the proper matrix equation(s), solvable under specified conditions. One approach to calculating $B(CAB)^{(1)}C$ is to compute the inner inverse $(CAB)^{(1)}$ using one of the known direct methods based on various decompositions, or iterative algorithms for computing generalized inverses. On the other hand, derived representations
of {1}-, {2}-, and {1, 2}-inverses and the initiated algorithms are based on two global steps:
Step 1. Solve the required equation(s);
Step 2. Multiply the solution obtained in Step 1 by appropriate matrix expressions, if necessary.
The underlying equations can be solved using various methods. An approach based on recurrent neural networks was utilized in [74]. Computational procedures based on finding exact solutions to the underlying linear matrix equations required during the computation of outer inverses were presented in [75]. Importantly, other techniques can be applied in solving equations corresponding to certain generalized inverses, leading to numerous and varied computational procedures.
Conditions for the existence and representations of one-sided (B, C)-inverses are considered, and some correlations with outer inverses possessing prescribed image and kernel are investigated. G-outer inverses and one-sided G-outer inverses are considered as particular inner inverses on which additional restrictions characteristic of outer inverses are imposed.
Acknowledgements Dijana Mosić and Predrag Stanimirović are supported by the Ministry of Education, Science and Technological Development, Republic of Serbia, Grants 451-03-47/2023-01/200124. Predrag S. Stanimirović is supported by the Science Fund of the Republic of Serbia (No. 7750185, Quantitative Automata Models: Fundamental Problems and Applications - QUAM).
References
1. Baksalary, O. M., & Trenkler, G. (2010). Core inverse of matrices. Linear and Multilinear Algebra, 58, 681–697
2. Ben-Israel, A., & Greville, T. N. E. (2003). Generalized inverses: theory and applications (2nd edn.). New York: Springer
3. Ben-Israel, A. (1986). Generalized inverses of matrices: a perspective of the work of Penrose. Mathematical Proceedings of the Cambridge Philosophical Society, 100, 407–425
4. Ben-Israel, A. (2002). The Moore of the Moore-Penrose inverse. The Electronic Journal of Linear Algebra, 9, 150–157
5. Benítez, J., & Boasso, E. (2017). The inverse along an element in rings with an involution, Banach algebras and C*-algebras. Linear and Multilinear Algebra, 65, 284–299
6. Benítez, J., Boasso, E., & Jin, H. (2017). On one-sided (B, C)-inverses of arbitrary matrices. Electronic Journal of Linear Algebra, 32, 391–422
7. Bjorck, A. (1996). Numerical methods for least squares problems. Philadelphia: SIAM
8. Bovik, A. (2000). Handbook of image and video processing. San Diego, San Francisco, New York, Boston, London, Sydney, Tokyo: Academic Press
9. Campbell, S. L., & Meyer, C. D. Jr. (2008). Generalized inverses of linear transformations. New York: Dover Publications, Inc. Corrected reprint of the 1979 original, SIAM, Philadelphia
10. Cao, C. G., & Zhang, X. (2003). The generalized inverse $A^{(2)}_{T,*}$ and its applications. Journal of Applied Mathematics and Computing, 11, 155–164
11. Chen, J., Ke, Y., & Mosić, D. (2017). The reverse order law of the (b, c)-inverse in semigroups. Acta Mathematica Hungarica, 151, 181–198
12. Chen, L. J., Mosić, D., & Xu, Z. S. (2020). On a new generalized inverse for Hilbert space operators. Quaestiones Mathematicae, 43, 1331–1348
13. Chen, J., Xu, S., Benítez, J., & Chen, X. (2019). Rank equalities related to a class of outer generalized inverse. Filomat, 33, 5611–5622
14. Chen, J., Zou, H., Zhu, H., & Patrício, P. (2017b). The one-sided inverse along an element in semigroups and rings. Mediterranean Journal of Mathematics, 14, 208
15. Chen, L., Krishnamurthy, V. E., & Macleod, I. (1994). Generalized matrix inversion and rank computation by successive matrix powering. Parallel Computing, 20, 297–311
16. Chen, L. Y. (1993). A Cramer rule for solution of the general restricted linear equation. Linear and Multilinear Algebra, 34, 177–186
17. Chen, Y., & Chen, X. (2000). Representation and approximation of the outer inverse $A^{(2)}_{T,S}$ of a matrix A. Linear Algebra and Its Applications, 308, 85–107
18. Chipman, S. J. (1976). Estimation and aggregation in econometrics: An application of the theory of generalized inverses. Generalized Inverses and Applications (pp. 549–769). Academic Press
19. Chountasis, S., Katsikis, N. V., & Pappas, D. (2009a). Applications of the Moore-Penrose inverse in digital image restoration. Mathematical Problems in Engineering (vol. 2009, Article ID 170724, 12 p.). https://doi.org/10.1155/2009/170724
20. Chountasis, S., Katsikis, N. V., & Pappas, D. (2009b). Image restoration via fast computing of the Moore-Penrose inverse matrix. Systems, Signals and Image Processing, IWSSIP 2009
21. Chountasis, S., Katsikis, N. V., & Pappas, D. (2010). Digital image reconstruction in the spectral domain utilizing the Moore-Penrose inverse. Mathematical Problems in Engineering (vol. 2010, Article ID 750352, 14 p.). https://doi.org/10.1155/2010/750352
22. Coll, C., Lattanzi, M., & Thome, N. (2018). Weighted G-Drazin inverses and a new pre-order on rectangular matrices. Applied Mathematics and Computation, 317, 12–24
23. Cvetković-Ilić, D. S., & Wei, Y. (2017). Algebraic properties of generalized inverses. Developments in Mathematics (vol. 52). Singapore: Springer
24. Deng, C., & Yu, A. (2015). Relationships between DMP relation and some partial orders. Applied Mathematics and Computation, 266, 41–53
25. Drazin, P. M. (2012). A class of outer generalized inverses. Linear Algebra and Its Applications, 436, 1909–1923
26. Drazin, P. M. (2016). Left and right generalized inverses. Linear Algebra and Its Applications, 510, 64–78
27. Ferreyra, E. D., Lattanzi, M., Levis, E. F., & Thome, N. (2019). Parametrized solutions X of the system AXA = AEA and $A^kEAX = XAEA^k$ for a matrix A having index k. Electronic Journal of Linear Algebra, 35, 503–510
28. Ferreyra, E. D., Lattanzi, M., Levis, E. F., & Thome, N. (2020). Solving an open problem about the G-Drazin partial order. Electronic Journal of Linear Algebra, 36, 55–66
29. Gao, Y., & Chen, J. (2018). Pseudo core inverses in rings with involution. Communications in Algebra, 46, 38–50
30. Getson, J. A., & Hsuan, C. F. (1988). {2}-inverses and their statistical applications. Lecture Notes in Statistics (vol. 47). Berlin: Springer
31. Husen, F., Langenberg, P., & Getson, A. (1985). The {2}-inverse with applications to statistics. Linear Algebra and Its Applications, 70, 241–248
32. Kantún-Montiel, G. (2014). Outer generalized inverses with prescribed ideals. Linear and Multilinear Algebra, 62, 1187–1196
33. Krishnamurthy, V. E. (1978). Generalized matrix inverse approach for automatic balancing of chemical equations. International Journal of Mathematical Education in Science and Technology, 9, 323–328
34. Ma, H., & Stanimirović, P. S. (2019). Characterizations, approximation and perturbations of the core-EP inverse. Applied Mathematics and Computation, 359, 404–417
35. Ke, Y., Chen, J., Stanimirović, P. S., & Ćirić, M. (2021). Characterizations and representations of outer inverse for matrices over a ring. Linear and Multilinear Algebra, 69, 155–176
36. Li, X., & Wei, Y. (2002). A note on computing the generalized inverse $A^{(2)}_{T,S}$ of a matrix A. International Journal of Mathematics and Mathematical Sciences, 31(8), 497–507
37. Liu, X., & Cai, N. (2018). High-order iterative methods for the DMP inverse. Journal of Mathematics, 2018, Article ID 8175935, 6 p.
38. Ma, H., Gao, X., & Stanimirović, P. S. (2020). Characterizations, iterative method, sign pattern and perturbation analysis for the DMP inverse with its applications. Applied Mathematics and Computation, 378, 125196
39. Ma, H., Stanimirović, P. S., Mosić, D., & Kyrchei, I. I. (2021). Sign pattern, usability, representations and perturbation for the core-EP and weighted core-EP inverse. Applied Mathematics and Computation, 404, 126247
40. Mary, X. (2011). On generalized inverses and Green's relations. Linear Algebra and Its Applications, 434, 1836–1844
41. Malik, B. S., & Thome, N. (2014). On a new generalized inverse for matrices of an arbitrary index. Applied Mathematics and Computation, 226, 575–580
42. Mary, X., & Patrício, P. (2012). The inverse along a lower triangular matrix. Applied Mathematics and Computation, 219, 886–891
43. Mehdipour, M., & Salemi, A. (2018). On a new generalized inverse of matrices. Linear and Multilinear Algebra, 66, 1046–1053
44. Moore, H. E. (1920). On the reciprocal of the general algebraic matrix. Bulletin of the American Mathematical Society, 26, 394–395
45. Mosić, D. (2018a). Characterizations of the image-kernel (p, q)-inverses. Bulletin of the Malaysian Mathematical Sciences Society, 41, 91–104
46. Mosić, D. (2018b). Generalized inverses. Faculty of Sciences and Mathematics. Niš: University of Niš
47. Mosić, D. (2020a). G-outer inverse of Banach spaces operators. Journal of Mathematical Analysis and Applications, 481, 123501
48. Mosić, D. (2017). Reflexive-EP elements in rings. Bulletin of the Malaysian Mathematical Sciences Society, 40, 655–664
49. Mosić, D. (2020b). Representations for the image-kernel (p, q)-inverses of block matrices in rings. Georgian Mathematical Journal, 27, 297–305
50. Mosić, D. (2020c). Solvability to some systems of matrix equations using G-outer inverses. Electronic Journal of Linear Algebra, 36, 265–276
51. Mosić, D. (2019). Weighted G-Drazin inverse for operators on Banach spaces. Carpathian Journal of Mathematics, 35, 171–184
52. Mosić, D., Djordjević, D. S., & Kantún-Montiel, G. (2014). Image-kernel (p, q)-inverses in rings. Electronic Journal of Linear Algebra, 27, 272–283
53. Mosić, D., & Djordjević, D. S. (2014). Inner image-kernel (p, q)-inverses in rings. Applied Mathematics and Computation, 239, 144–152
54. Mosić, D., & Kolundžija, M. Z. (2019). Weighted CMP inverse of an operator between Hilbert spaces. Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales. Serie A: Matemáticas. RACSAM, 113, 2155–2173
55. Mosić, D., Kyrchei, I., & Stanimirović, P. S. (2021a). Journal of Applied Mathematics and Computation, 67, 101–130
56. Mosić, D., & Stanimirović, P. S. (2021). Composite outer inverses for rectangular matrices. Quaestiones Mathematicae, 44, 45–72
57. Mosić, D., Stanimirović, P. S., & Katsikis, N. V. (2020a). Solvability of some constrained matrix approximation problems using core-EP inverses. Computational and Applied Mathematics, 39, 311. https://doi.org/10.1007/s40314-020-01360-y
58. Mosić, D., Stanimirović, P. S., Sahoo, K. J., Behera, R., & Katsikis, K. V. (2021b). One-sided weighted outer inverses of tensors. Journal of Computational and Applied Mathematics, 388, 113293
59. Mosić, D., & Wang, L. (2020b). Left and right G-outer inverses. Linear and Multilinear Algebra. https://doi.org/10.1080/03081087.2020.1837062
60. Nashed, Z. M. (1976). Generalized inverse and applications. New York: Academic Press
61. Mosić, D., Zou, H., & Chen, J. (2018). On the (b, c)-inverse in rings. Filomat, 32, 1221–1231
62. Penrose, R. (1955). A generalized inverse for matrices. Proceedings of the Cambridge Philosophical Society, 51, 406–413
63. Petković, M. D., & Petković, M. S. (2015). Hyper-power methods for the computation of outer inverses. Journal of Computational and Applied Mathematics, 278, 110–118
64. Prasad, M. K., & Mohana, S. K. (2014). Core-EP inverse. Linear and Multilinear Algebra, 62, 792–802
65. Rao, R. C. (1962). A note on a generalized inverse of a matrix with applications to problems in mathematical statistics. Journal of the Royal Statistical Society, Series B, 24, 152–158
66. Risteski, B. I. (2008). A new pseudoinverse matrix method for balancing chemical equations and their stability. Journal of the Korean Chemical Society, 52, 223–238
67. Saha, T., Srivastava, S., Khare, S., Stanimirović, P. S., & Petković, M. D. (2019). An improved algorithm for basis pursuit problem and its applications. Applied Mathematics and Computation, 355, 385–398
68. Sheng, X., & Chen, G. (2007). Full-rank representation of generalized inverse $A^{(2)}_{T,S}$ and its applications. Computers & Mathematics with Applications, 54, 1422–1430
69. Sheng, X., Chen, L. G., & Gong, Y. (2008). The representation and computation of generalized inverse $A^{(2)}_{T,S}$. Journal of Computational and Applied Mathematics, 213, 248–257
70. Sheng, X., & Chen, G. (2013). Innovation based on Gaussian elimination to compute generalized inverse $A^{(2)}_{T,S}$. Computers & Mathematics with Applications, 65, 1823–1829
71. Srivastava, S., & Gupta, K. D. (2014). A new representation for $A^{(2,3)}_{T,S}$. Applied Mathematics and Computation, 243, 514–521
72. Stanimirović, P. S., Cvetković-Ilić, D. S., Miljković, S., & Miladinović, M. (2011). Full-rank representations of {2, 4}, {2, 3}-inverses and successive matrix squaring algorithm. Applied Mathematics and Computation, 217, 9358–9367
73. Stanimirović, P. S., Ćirić, M., Katsikis, N. V., Li, C., & Ma, H. (2020). Outer and (b, c) inverses of tensors. Linear and Multilinear Algebra, 68, 940–971
74. Stanimirović, P. S., Ćirić, M., Stojanović, I., & Gerontitis, D. (2017). Conditions for existence, representations and computation of matrix generalized inverses. Complexity, 2017, Article ID 6429725, 27 p.
75. Stanimirović, P. S., Ćirić, M., Lastra, A., Sendra, R. J., & Sendra, J. (2021a). Representations and symbolic computation of generalized inverses over fields. Applied Mathematics and Computation, 406, 126287
76. Stanimirović, P. S., Mosić, D., & Wei, Y. (2021b). Least squares properties of generalized inverses. Communications in Mathematical Research, 37, 421–447
77. Stanimirović, P. S., Pappas, D., Katsikis, N. V., & Stanimirović, I. P. (2012a). Full-rank representations of outer inverses based on the QR decomposition. Applied Mathematics and Computation, 218, 10321–10333
78. Stanimirović, P. S., Pappas, D., Katsikis, N. V., & Stanimirović, I. P. (2012b). Symbolic computation of $A^{(2)}_{T,S}$-inverses using QDR factorization. Linear Algebra and Its Applications, 437, 1317–1331
79. Stanimirović, P. S., & Soleymani, F. (2014). A class of numerical algorithms for computing outer inverses. Journal of Computational and Applied Mathematics, 263, 236–245
80. Stewart, W. G., & Sun, J. (1990). Matrix perturbation theory. New York: Academic Press
81. Urquhart, S. N. (1968). Computation of generalized inverse matrices which satisfy specified conditions. SIAM Review, 10, 216–218
82. Vosough, M., & Moslehian, S. M. (2017). Solutions of the system of operator equations BXA = B = AXB via the *-order. Electronic Journal of Linear Algebra, 32, 172–183
83. Wang, R. G., Wei, Y., & Qiao, S. (2004). Generalized inverses: theory and computations. Beijing/New York: Science Press
84. Wang, H., & Liu, X. (2016). Partial orders based on core-nilpotent decomposition. Linear Algebra and Its Applications, 488, 235–248
85. Wang, H., Chen, J., & Yan, G. (2018). Generalized Cayley-Hamilton theorem for core-EP inverse matrix and DMP inverse matrix. Journal of Southeast University (English Edition), 34, 135–138
86. Wang, L., Castro-González, N., & Chen, L. J. (2017). Characterizations of outer generalized inverses. Canadian Mathematical Bulletin, 60, 861–871
87. Wang, W., Xu, S., & Benítez, J. (2019). Rank equalities related to the generalized inverses $A^{\parallel(B_1,C_1)}$, $D^{\parallel(B_2,C_2)}$ of two matrices A and D. Symmetry, 11, 539. https://doi.org/10.3390/sym11040539
88. Wei, M. (2006). Theory and computation for generalized least squares problems. Beijing: Science Press
89. Wei, Y. (2014). Generalized inverses of matrices. Chapter 27 of Handbook of Linear Algebra, Edited by L. Hogben (2nd ed.). Boca Raton, FL: CRC Press
90. Wei, Y. (1998). A characterization and representation of the generalized inverse $A^{(2)}_{T,S}$ and its applications. Linear Algebra and Its Applications, 280, 87–96
91. Wei, Y., Stanimirović, P. S., & Petković, M. (2018). Numerical and symbolic computations of generalized inverses. Singapore: World Scientific
92. Wei, Y., & Wu, H. (2003). The representation and approximation for the generalized inverse $A^{(2)}_{T,S}$. Applied Mathematics and Computation, 135, 263–276
93. Wei, Y., & Zhang, N. (2004). A note on the representation and approximation of the outer inverse $A^{(2)}_{T,S}$ of a matrix A. Applied Mathematics and Computation, 147, 837–841
94. Xu, Z. S., Chen, L. J., & Mosić, D. (2018). New characterizations of the CMP inverse of matrices. Linear and Multilinear Algebra, 68, 790–804
95. Yang, H., & Liu, D. (2009). The representation of generalized inverse $A^{(2)}_{T,S}$ and its applications. Journal of Computational and Applied Mathematics, 224, 204–209
96. Yu, A., & Deng, C. (2016). Characterizations of DMP inverse in a Hilbert space. Calcolo, 53, 331–341
97. Zhang, X., & Ji, G. (2018). Solutions to the system of operator equations AXB = C = BXA. Acta Mathematica Scientia, 38, 1143–1150
98. Zhou, M. M., Chen, L. J., Li, T. T., & Wang, G. D. (2018). Three limit representations of the core-EP inverse. Filomat, 32, 5887–5894
99. Zhu, H. H., Chen, L. J., & Patrício, P. (2016). Further results on the inverse along an element in semigroups and rings. Linear and Multilinear Algebra, 64, 393–403
Quaternion Two-Sided Matrix Equations with Specific Constraints
Ivan I. Kyrchei, Dijana Mosić, and Predrag S. Stanimirović
Abstract This chapter surveys the quaternion restricted two-sided matrix equation AXB = D and the approximation problems related to it. Unique solutions to the considered approximation matrix problems and to the restricted quaternion two-sided matrix equations with specific constraints are expressed in terms of the core-EP inverse and the dual core-EP inverse, the MPCEP and CEPMP inverses, and the DMP and MPD inverses. The MPCEP-CEPMP inverses and the DMP-MPD inverses are generalized inverses obtained by combining the Moore-Penrose (MP-)inverse with the core-EP (CEP-)inverse and the MP-inverse with the Drazin (D-)inverse, respectively. Several particular cases of these equations and approximation matrix problems are presented too. Cramer's rules for solving these constrained quaternion matrix equations and approximation matrix problems, with their particular cases, are developed using noncommutative row-column determinants. As a consequence, Cramer's rules for solving these constrained matrix equations with complex matrices are derived as well. Numerical examples are given to illustrate the obtained results.
Keywords Generalized inverse • Quaternion restricted matrix equation • Determinantal representation
Mathematics Subject Classification (MSC2020) Primary 15A09 • Secondary 15A24, 15B33
I. I. Kyrchei (✉) Pidstryhach Institute for Applied Problems of Mechanics and Mathematics of NAS of Ukraine, L’viv, Ukraine D. Mosić • P. S. Stanimirović Faculty of Sciences and Mathematics, University of Niš, Niš, Serbia e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_45
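1 Introduction
As usual, the field of complex (natural or real) numbers is denoted by $\mathbb{C}$ ($\mathbb{N}$ or $\mathbb{R}$), and $\mathbb{H}^{m \times n}$ stands for the set of all $m \times n$ matrices over the quaternion skew field
\[ \mathbb{H} = \{u_0 + u_1 i + u_2 j + u_3 k \mid i^2 = j^2 = k^2 = ijk = -1,\ u_0, u_1, u_2, u_3 \in \mathbb{R}\}. \]
For $u = u_0 + u_1 i + u_2 j + u_3 k \in \mathbb{H}$, its conjugate is $\bar{u} = u_0 - u_1 i - u_2 j - u_3 k$, and its norm is $\|u\| = \sqrt{u\bar{u}} = \sqrt{\bar{u}u} = \sqrt{u_0^2 + u_1^2 + u_2^2 + u_3^2}$. The conjugate transpose of $\mathbf{A} \in \mathbb{H}^{m \times n}$ is denoted by $\mathbf{A}^*$. If $\mathbf{A} \in \mathbb{H}^{n \times n}$ and $\mathbf{A}^* = \mathbf{A}$, then $\mathbf{A}$ is Hermitian. Applying the appropriate $\mathbb{H}$-valued inner product [84]
\[ \langle \mathbf{x}, \mathbf{z} \rangle_r = \bar{z}_1 x_1 + \cdots + \bar{z}_n x_n \]
for $\mathbf{z} = (z_i)_{i=1}^n$, $\mathbf{x} = (x_i)_{i=1}^n \in \mathbb{H}^{n \times 1}$, the right quaternionic vector space $\mathcal{H}_r = \mathbb{H}^{n \times 1}$ is given. Note that an adequate $\mathbb{H}$-valued inner product on the left quaternionic vector space $\mathcal{H}_l = \mathbb{H}^{1 \times n}$ is determined as well [41]. The presented inner product induces the vector norm $\|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle_r}$ on $\mathbb{H}^{n \times 1}$ and the quaternion matrix Frobenius norm for $\mathbf{A} = (a_{il}) \in \mathbb{H}^{m \times n}$ by
\[ \|\mathbf{A}\|_F = \sqrt{\operatorname{tr}(\mathbf{A}^*\mathbf{A})} = \sqrt{\sum_l \|\mathbf{a}_{.l}\|^2} = \sqrt{\sum_l \sum_i \|a_{il}\|^2}. \]
This is clear by
\[ \sum_l \|\mathbf{a}_{.l}\|^2 = \sum_l \langle \mathbf{a}_{.l}, \mathbf{a}_{.l} \rangle_r = \sum_l \sum_i \bar{a}_{il} a_{il}, \]
where $\mathbf{a}_{l.}$ means the lth row of $\mathbf{A}$ and $\mathbf{a}_{.l}$ is the lth column of $\mathbf{A}$. Due to noncommutativity in the quaternion skew field, for arbitrary $\mathbf{A} \in \mathbb{H}^{m \times n}$, it makes sense to introduce the notations:
– $\mathcal{R}_l(\mathbf{A}) = \{\mathbf{r} \in \mathbb{H}^{1 \times n} : \mathbf{r} = \mathbf{s}\mathbf{A},\ \mathbf{s} \in \mathbb{H}^{1 \times m}\}$ is the left row space of $\mathbf{A}$;
– $\mathcal{C}_r(\mathbf{A}) = \{\mathbf{r} \in \mathbb{H}^{m \times 1} : \mathbf{r} = \mathbf{A}\mathbf{s},\ \mathbf{s} \in \mathbb{H}^{n \times 1}\}$ is the right column space of $\mathbf{A}$;
– $\mathcal{N}_l(\mathbf{A}) = \{\mathbf{s} \in \mathbb{H}^{1 \times m} : \mathbf{s}\mathbf{A} = 0\}$ denotes the left null space of $\mathbf{A}$;
– $\mathcal{N}_r(\mathbf{A}) = \{\mathbf{s} \in \mathbb{H}^{n \times 1} : \mathbf{A}\mathbf{s} = 0\}$ is the right null space of $\mathbf{A}$.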
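The quaternion arithmetic recalled above (noncommutative product, conjugate, and norm) can be sketched in a few lines of plain Python. This is only an illustration: the class name `Quaternion` and its methods are hypothetical helpers introduced here, not part of the chapter, and the product is the Hamilton product derived from $i^2 = j^2 = k^2 = ijk = -1$.

```python
import math


class Quaternion:
    """u0 + u1*i + u2*j + u3*k with real coefficients (illustrative helper)."""

    def __init__(self, u0, u1=0, u2=0, u3=0):
        self.c = (u0, u1, u2, u3)

    def __eq__(self, other):
        return self.c == other.c

    def __neg__(self):
        return Quaternion(*(-x for x in self.c))

    def __mul__(self, other):
        a0, a1, a2, a3 = self.c
        b0, b1, b2, b3 = other.c
        # Hamilton product, obtained from i^2 = j^2 = k^2 = ijk = -1.
        return Quaternion(
            a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0,
        )

    def conj(self):
        u0, u1, u2, u3 = self.c
        return Quaternion(u0, -u1, -u2, -u3)

    def norm(self):
        return math.sqrt(sum(x * x for x in self.c))


i, j, k = Quaternion(0, 1), Quaternion(0, 0, 1), Quaternion(0, 0, 0, 1)
u = Quaternion(1, 2, 3, 4)

print((i * j) == k, (j * i) == -k)  # ij = k but ji = -k: the product does not commute
print((u * u.conj()).c)             # a real quaternion whose real part is ||u||^2
```

Because $ij \neq ji$, left and right multiplication by scalars differ, which is why the row space and null space above are defined with the scalar on the left and the column space with the scalar on the right.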
The rank of $\mathbf{A} \in \mathbb{H}^{m \times n}$ is determined as $\operatorname{rank}(\mathbf{A}) = \dim \mathcal{C}_r(\mathbf{A}) = \dim \mathcal{R}_l(\mathbf{A}^*)$. We refer the reader to [98] for more details on quaternion matrices. Solutions to quaternion linear systems were considered in [17]. Quaternion matrices play a vital role in computer science, quantum physics, and signal and color image processing [8, 15, 46, 72, 87, 90, 96, 97]. It is known that quaternion matrices can be used to represent the red, green, and blue channels of color images [79]. Determinants of quaternion matrices are investigated in [16]. In addition, various quaternion matrix equations have been investigated recently. Main results were obtained in [10, 22, 74, 89, 100]. Bold capital letters denote quaternion matrices and vectors, while capital letters are used for complex ones. As usual, $\mathbb{C}^{m \times n}$ denotes the set of $m \times n$ complex matrices.

The definitions of significant generalized inverses are stated now. Balancing chemical equations, i.e., finding the symbolic representation of a chemical reaction in terms of atoms, elements, compounds, or ions, is an important application of generalized inverses (see [27, 70]). The problem of finding generalized inverses of large-scale matrices resulting from real applications with underlying 2D or 3D geometry (such as partial differential equations, optimization, computational fluid dynamics, simulation, and computer graphics) frequently occurs in practice. The unique $\mathbf{A}^{\dagger} := \mathbf{X}$ is the Moore-Penrose inverse of $\mathbf{A} \in \mathbb{H}^{n \times m}$ iff
\[ \mathbf{X} = \mathbf{X}\mathbf{A}\mathbf{X}, \quad \mathbf{A} = \mathbf{A}\mathbf{X}\mathbf{A}, \quad \mathbf{A}\mathbf{X} = (\mathbf{A}\mathbf{X})^*, \quad \mathbf{X}\mathbf{A} = (\mathbf{X}\mathbf{A})^*. \]
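The Moore-Penrose inverse has been used in applications such as linear estimation, differential and difference equations, Markov chains, graphics, cryptography, coding theory, and robotics [7, 11]. A matrix $\mathbf{X}$ that satisfies only the equation $\mathbf{X} = \mathbf{X}\mathbf{A}\mathbf{X}$ is called an outer inverse of $\mathbf{A}$. The Drazin inverse of $\mathbf{A} \in \mathbb{H}^{n \times n}$ is the unique $\mathbf{A}^D := \mathbf{X}$ such that
\[ \mathbf{X} = \mathbf{X}\mathbf{A}\mathbf{X}, \quad \mathbf{A}^k = \mathbf{X}\mathbf{A}^{k+1}, \quad \mathbf{X}\mathbf{A} = \mathbf{A}\mathbf{X}, \]
where $k = \mathrm{Ind}(\mathbf{A}) = \min\{k \in \mathbb{N} \cup \{0\} \mid \mathrm{rk}(\mathbf{A}^k) = \mathrm{rk}(\mathbf{A}^{k+1})\}$ is the index of $\mathbf{A}$. Note that $\mathbf{A}^D$ becomes the group inverse $\mathbf{A}^{\#}$ when $\mathrm{Ind}(\mathbf{A}) \leq 1$. The Drazin inverse has many applications in the theory of finite Markov chains as well as in the study of differential equations and singular linear difference equations [11], cryptography [45], etc. Analogously to [62], the core-EP inverse was defined for quaternion matrices in [38] using characteristics of quaternionic vector spaces.

Definition 1.1 The unique matrix $\mathbf{A}^{\textcircled{\dagger}} := \mathbf{X}$ is the core-EP inverse of $\mathbf{A} \in \mathbb{H}^{n \times n}$ if $\mathcal{C}_r(\mathbf{A}^D) = \mathcal{C}_r(\mathbf{X}) = \mathcal{C}_r(\mathbf{X}^*)$ and $\mathbf{X} = \mathbf{X}\mathbf{A}\mathbf{X}$.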
According to [62], we have the next characterization of the quaternion core-EP inverse.
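Lemma 1.2 For $\mathbf{A}, \mathbf{X} \in \mathbb{H}^{n \times n}$ and $k = \mathrm{Ind}(\mathbf{A})$, we have $\mathbf{X} = \mathbf{A}^{\textcircled{\dagger}}$ iff
\[ \mathbf{X}\mathbf{A}^{k+1} = \mathbf{A}^k, \quad (\mathbf{A}\mathbf{X})^* = \mathbf{A}\mathbf{X}, \quad \mathbf{A}\mathbf{X}^2 = \mathbf{X} \quad \text{and} \quad \mathcal{C}_r(\mathbf{X}) \subseteq \mathcal{C}_r(\mathbf{A}^k). \]
By [20, Theorem 2.3], proved for elements of a ring with involution, the following expression can be obtained for the core-EP inverse of quaternion matrices.

Lemma 1.3 If $\mathbf{A} \in \mathbb{H}^{n \times n}$ and $m \geq k = \mathrm{Ind}(\mathbf{A})$, then $\mathbf{A}^{\textcircled{\dagger}} = \mathbf{A}^D\mathbf{A}^m(\mathbf{A}^m)^{\dagger}$.

In the case that $\mathrm{Ind}(\mathbf{A}) = 1$, $\mathbf{A}^{\textcircled{\dagger}}$ reduces to the core inverse $\mathbf{A}^{\textcircled{\#}} = \mathbf{A}^{\#}\mathbf{A}\mathbf{A}^{\dagger}$ [4].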
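To make Lemmas 1.2 and 1.3 concrete, here is a minimal sketch on a 2 × 2 real matrix, viewed as a quaternion matrix with real entries. The matrix `A`, its Moore-Penrose inverse `A_mp`, and the helpers `matmul` and `transpose` are all choices made for this illustration. Since the chosen `A` is idempotent, its index is 1 and its Drazin inverse is `A` itself. The sketch forms the product from Lemma 1.3 with m = 1 and then checks the four conditions of Lemma 1.2 in exact rational arithmetic.

```python
from fractions import Fraction as F


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[r][t] * Q[t][c] for t in range(len(Q)))
             for c in range(len(Q[0]))] for r in range(len(P))]


def transpose(P):
    """Transpose; for real entries this is the conjugate transpose."""
    return [list(row) for row in zip(*P)]


A = [[F(1), F(1)], [F(0), F(0)]]            # A*A = A, hence Ind(A) = 1
A_drazin = A                                # for an idempotent matrix, A^D = A^# = A
A_mp = [[F(1, 2), F(0)], [F(1, 2), F(0)]]   # candidate Moore-Penrose inverse

# The four Penrose equations confirm that A_mp really is the MP inverse of A.
AX, XA = matmul(A, A_mp), matmul(A_mp, A)
assert matmul(AX, A) == A and matmul(XA, A_mp) == A_mp
assert transpose(AX) == AX and transpose(XA) == XA

# Lemma 1.3 with m = 1: core-EP inverse = A^D * A * A^dagger.
X = matmul(matmul(A_drazin, A), A_mp)
print(X)

# Conditions of Lemma 1.2 with k = 1.
A2 = matmul(A, A)
print(matmul(X, A2) == A)                           # X A^{k+1} = A^k
print(transpose(matmul(A, X)) == matmul(A, X))      # (AX)^* = AX
print(matmul(A, matmul(X, X)) == X)                 # A X^2 = X
print(matmul(A, matmul(A, A_mp)) == X)              # X = A Z, so C_r(X) lies in C_r(A)
```

For this example the result is the matrix with a single 1 in the top-left corner, which is also the core inverse $\mathbf{A}^{\#}\mathbf{A}\mathbf{A}^{\dagger}$, as expected when the index equals 1.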
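Definition 1.4 The dual core-EP inverse $\mathbf{A}_{\textcircled{\dagger}} := \mathbf{X}$ of $\mathbf{A} \in \mathbb{H}^{n \times n}$ is the unique matrix which satisfies
\[ \mathbf{X} = \mathbf{X}\mathbf{A}\mathbf{X} \quad \text{and} \quad \mathcal{R}_l(\mathbf{X}) = \mathcal{R}_l(\mathbf{A}^D) = \mathcal{R}_l(\mathbf{X}^*). \]
We can characterize the dual core-EP inverse as follows.

Lemma 1.5 For $\mathbf{A}, \mathbf{X} \in \mathbb{H}^{n \times n}$ and $m \geq k = \mathrm{Ind}(\mathbf{A})$, the following claims are equivalent:
(i) $\mathbf{X} = \mathbf{A}_{\textcircled{\dagger}}$;
(ii) $\mathbf{A}^{k+1}\mathbf{X} = \mathbf{A}^k$, $(\mathbf{X}\mathbf{A})^* = \mathbf{X}\mathbf{A}$, $\mathbf{X}^2\mathbf{A} = \mathbf{X}$ and $\mathcal{R}_l(\mathbf{X}) \subseteq \mathcal{R}_l(\mathbf{A}^k)$;
(iii) $\mathbf{X} = \mathbf{A}_{\textcircled{\dagger}} = (\mathbf{A}^m)^{\dagger}\mathbf{A}^m\mathbf{A}^D$.

An interesting relation between the core-EP inverse and its dual was provided in [20]:
\[ (\mathbf{A}^{\textcircled{\dagger}})^* = (\mathbf{A}^*)_{\textcircled{\dagger}}. \]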
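Various representations of the core-EP inverse were proposed in [38, 51, 91, 99]. In [63, 64], iterative and bordering methods to find the core-EP inverse were established. In [21], the continuity of the core-EP inverse was studied. Some extensions of the core-EP inverse were given to rectangular matrices [18], tensors [71], Hilbert space operators [59], Banach algebra elements [55], and elements of rings [20, 56]. The notions of the MPCEP inverse and CEPMP inverse for Hilbert space operators were defined in [12] as adequate combinations of the Moore-Penrose inverse with the core-EP inverse or its dual. The definitions of the MPCEP and CEPMP inverses are generalized to quaternion matrices using the special properties of quaternion core-EP inverses.

Definition 1.6 The MPCEP (or MP-Core-EP) inverse $\mathbf{A}^{\dagger,\textcircled{\dagger}} := \mathbf{X}$ of $\mathbf{A} \in \mathbb{H}^{n \times n}$ is determined uniquely by
\[ \mathbf{X} = \mathbf{X}\mathbf{A}\mathbf{X}, \quad \mathbf{A}\mathbf{X} = \mathbf{A}\mathbf{A}^{\textcircled{\dagger}}, \quad \mathbf{X}\mathbf{A} = \mathbf{A}^{\dagger}\mathbf{A}\mathbf{A}^{\textcircled{\dagger}}\mathbf{A}. \]
The CEPMP (or Core-EP-MP) inverse $\mathbf{A}_{\textcircled{\dagger},\dagger} := \mathbf{X}$ of $\mathbf{A}$ is uniquely determined by
\[ \mathbf{X} = \mathbf{X}\mathbf{A}\mathbf{X}, \quad \mathbf{X}\mathbf{A} = \mathbf{A}_{\textcircled{\dagger}}\mathbf{A}, \quad \mathbf{A}\mathbf{X} = \mathbf{A}\mathbf{A}_{\textcircled{\dagger}}\mathbf{A}\mathbf{A}^{\dagger}. \]
By [12], recall that the MPCEP and CEPMP inverses can be given by
\[ \mathbf{A}^{\dagger,\textcircled{\dagger}} = \mathbf{A}^{\dagger}\mathbf{A}\mathbf{A}^{\textcircled{\dagger}}, \tag{1.1} \]
\[ \mathbf{A}_{\textcircled{\dagger},\dagger} = \mathbf{A}_{\textcircled{\dagger}}\mathbf{A}\mathbf{A}^{\dagger}. \tag{1.2} \]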
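The DMP inverse, defined in [52] as a combination of the Drazin (D-)inverse and the Moore-Penrose (MP-)inverse, was extended to quaternion matrices in [38].

Definition 1.7 For $\mathbf{A} \in \mathbb{H}^{n \times n}$ with $\mathrm{Ind}(\mathbf{A}) = k$, $\mathbf{X} := \mathbf{A}^{D,\dagger}$ is the DMP inverse of $\mathbf{A}$ if it is the unique solution to
\[ \mathbf{X}\mathbf{A}\mathbf{X} = \mathbf{X}, \quad \mathbf{X}\mathbf{A} = \mathbf{A}^D\mathbf{A}, \quad \mathbf{A}^k\mathbf{X} = \mathbf{A}^k\mathbf{A}^{\dagger}. \tag{1.3} \]
A matrix $\mathbf{X} := \mathbf{A}^{\dagger,D}$ is said to be the MPD inverse of $\mathbf{A}$ if it is the unique solution to
\[ \mathbf{X}\mathbf{A}\mathbf{X} = \mathbf{X}, \quad \mathbf{A}\mathbf{X} = \mathbf{A}\mathbf{A}^D, \quad \mathbf{X}\mathbf{A}^k = \mathbf{A}^{\dagger}\mathbf{A}^k. \]
Notice that
\[ \mathbf{A}^{D,\dagger} = \mathbf{A}^D\mathbf{A}\mathbf{A}^{\dagger}, \tag{1.4} \]
\[ \mathbf{A}^{\dagger,D} = \mathbf{A}^{\dagger}\mathbf{A}\mathbf{A}^D. \tag{1.5} \]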
In the last decade, many researchers have studied the DMP inverse. In particular, characterizations and representations of the DMP inverse were developed in [19, 50]. Determinantal and integral representations of the DMP inverse were proved in [40, 101]. An iterative method for computing the DMP inverse was established in [48]. Several generalizations of the DMP inverse were given for rectangular matrices in [53], tensors in [86], operators in [59], and elements of rings in [102]. Papers [41, 57, 58, 60, 93] contain further significant properties of the DMP inverse.
Generalized inverses are usually applied in solving matrix equations or linear systems of matrix equations. For an unknown vector $x$ and a given vector $b$, recall that, for a nonsingular $A$, $x = A^{-1}b$ is the uniquely determined solution to $Ax = b$. When $A$ is singular or rectangular, the equation $Ax = b$ may have many solutions or no solution. Obviously, $Ax = b$ has a solution if and only if $b \in \mathcal{C}_r(A)$. Sometimes it is important that the vector $x$ lies in a prescribed subspace, and sometimes one looks for the best approximate solution $x_0$ of $Ax = b$, that is, $\|Ax - b\| \geq \|Ax_0 - b\|$ for all $x$, and $\|x_0\| < \|x\|$ for every $x \neq x_0$ that attains equality. The best approximate solution is determined by the Moore-Penrose inverse, that is, $x_0 = A^{\dagger}b$ [7]. The constrained equation [11]
\[ Ax = b, \quad b, x \in \mathcal{R}(A^k), \]
where $A \in \mathbb{C}^{n \times n}$ with $\mathrm{Ind}(A) = k$, has the unique Drazin-inverse solution $x = A^Db$. When $\mathrm{Ind}(A) = 1$ and $b \in \mathcal{R}(A)$, $x = A^{\#}b$ is the unique solution to $Ax = b$. One can see that $A^{\#}b = A^{\textcircled{\#}}b$. Without the assumption $b \in \mathcal{R}(A)$, [88] verified, for $A \in \mathbb{C}^{n \times n}$, $\mathrm{Ind}(A) = 1$, and $b \in \mathbb{C}^n$, that $x = A^{\textcircled{\#}}b$ is the uniquely determined solution in the Frobenius norm to the matrix approximation problem
\[ \min \|Ax - b\|_F \quad \text{subject to} \quad x \in \mathcal{R}(A). \tag{1.6} \]
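A small worked instance may help to see why the core inverse, rather than the group inverse, appears in (1.6) once $b \notin \mathcal{R}(A)$. The matrix $A$ and vector $b$ below are chosen only for illustration; the values of $A^{\#}$, $A^{\dagger}$ and $A^{\textcircled{\#}} = A^{\#}AA^{\dagger}$ were computed by hand for this idempotent $A$ of index 1.

```latex
% Illustrative data (not from the chapter): A is idempotent, so A^{\#} = A.
\begin{align*}
A &= \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \quad
b = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \notin \mathcal{R}(A)
  = \Bigl\{ \begin{pmatrix} t \\ 0 \end{pmatrix} : t \in \mathbb{C} \Bigr\},\\
A^{\dagger} &= \tfrac{1}{2}\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}, \quad
A^{\textcircled{\#}} = A^{\#}AA^{\dagger}
  = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},\\
x = \begin{pmatrix} t \\ 0 \end{pmatrix} \in \mathcal{R}(A)
  &\;\Longrightarrow\; \|Ax - b\|^2 = |t - 1|^2 + 1,
  \quad \text{minimal exactly at } t = 1,\\
A^{\textcircled{\#}}b &= \begin{pmatrix} 1 \\ 0 \end{pmatrix}
  \ \text{(the minimizer)}, \qquad
A^{\#}b = \begin{pmatrix} 2 \\ 0 \end{pmatrix}
  \ \text{(residual } \|AA^{\#}b - b\|^2 = 2 > 1\text{)}.
\end{align*}
```

In this example the constrained minimizer is unique and equals $A^{\textcircled{\#}}b$, while $A^{\#}b$ lies in $\mathcal{R}(A)$ but gives a strictly larger residual, which is consistent with the statement attributed to [88].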
In [23, 24], quaternionic least squares models which solve $\min\|\mathbf{M}\mathbf{X} - \mathbf{D}\|$ under the constraints included in $\mathbf{N}\mathbf{X} = \mathbf{D}$ were investigated. The least squares minimization $\min\|\mathbf{M}\mathbf{x} - \mathbf{D}\|_2$ was studied in [25, 47] provided that $\|\mathbf{N}\mathbf{x} - \mathbf{d}\|_2 \leq c$, $c \geq 0$. Solutions of these models have significant applications in quaternionic quantum mechanics [1]. In this chapter, the object of our research is the two-sided quaternion matrix equation (TQME) $\mathbf{A}\mathbf{X}\mathbf{B} = \mathbf{D}$. TQME, as a special kind of the Sylvester equation, finds applications in various areas such as image and signal processing [67], photogrammetry [66], etc. For this equation, [54] considered solutions with fixed ranks, and [26] established a necessary and sufficient condition for the existence of a Hermitian solution to TQME. Applying the matrix rank method, [85] investigated its solutions. Using the Kronecker product, some authors found algorithms for computing solutions of TQME with or without constraints [3, 13, 61, 92]. Recall that $\mathbf{X} = \mathbf{A}^{\dagger}\mathbf{D}\mathbf{B}^{\dagger}$ is its unique best approximate solution.
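One way to see why $\mathbf{A}^{\dagger}\mathbf{D}\mathbf{B}^{\dagger}$ is a least squares solution of $\mathbf{A}\mathbf{X}\mathbf{B} = \mathbf{D}$ is to check the normal equations $\mathbf{A}^*(\mathbf{A}\mathbf{X}\mathbf{B} - \mathbf{D})\mathbf{B}^* = 0$. The short computation below is a sketch of that check; the name $\mathbf{X}_0$ is introduced only here, and the argument uses nothing beyond the four Penrose equations and the rule $(\mathbf{P}\mathbf{Q})^* = \mathbf{Q}^*\mathbf{P}^*$, which remains valid over the quaternions.

```latex
% X_0 is a hypothetical name for the candidate solution A^\dagger D B^\dagger.
\begin{align*}
\mathbf{A}^*\mathbf{A}\mathbf{A}^{\dagger}
  &= \mathbf{A}^*(\mathbf{A}\mathbf{A}^{\dagger})^*
   = (\mathbf{A}\mathbf{A}^{\dagger}\mathbf{A})^* = \mathbf{A}^*,
\qquad
\mathbf{B}^{\dagger}\mathbf{B}\mathbf{B}^*
   = (\mathbf{B}^{\dagger}\mathbf{B})^*\mathbf{B}^*
   = (\mathbf{B}\mathbf{B}^{\dagger}\mathbf{B})^* = \mathbf{B}^*,\\
\mathbf{X}_0 &:= \mathbf{A}^{\dagger}\mathbf{D}\mathbf{B}^{\dagger}
\;\Longrightarrow\;
\mathbf{A}^*(\mathbf{A}\mathbf{X}_0\mathbf{B} - \mathbf{D})\mathbf{B}^*
   = (\mathbf{A}^*\mathbf{A}\mathbf{A}^{\dagger})\,\mathbf{D}\,
     (\mathbf{B}^{\dagger}\mathbf{B}\mathbf{B}^*) - \mathbf{A}^*\mathbf{D}\mathbf{B}^*
   = 0.
\end{align*}
```

This confirms only the least squares part of the claim. The minimal-norm property, which makes the best approximate solution unique, requires a separate argument that is not reproduced here.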
Quaternion Two-Sided Matrix Equations with Specific Constraints
Investigations of best approximate solutions motivated the principal aim of this chapter: to study solutions of two-sided quaternion restricted matrix equations (TQRME) with specific constraints involving the MPCEP and CEPMP inverses and the DMP and MPD inverses, as well as quaternion matrix approximation problems related to TQRME and stated in terms of the core-EP inverse and its dual. We give their explicit representations and obtain their Cramer’s rule analogs. This chapter continues the line of research of a number of papers in which two-sided quaternion matrix equations were investigated with respect to different generalized inverses [31–34, 42, 43, 68, 75–78].

The results presented in this investigation are organized as follows. Section 2 describes the motivation and additional preliminary results. Section 3 concerns the solvability of TQRME and their special cases; in particular, the constrained quaternion matrix approximation problems stated in terms of the (dual) core-EP inverse are explored in Section 3.1, and the solvability of TQRME with constraints containing the MPCEP-CEPMP inverses and the DMP-MPD inverses is studied in Sections 3.2 and 3.3, respectively. Cramer’s rules for the considered solutions are derived in Section 4. More precisely, Cramer’s rule for the constrained quaternion matrix approximation problems is derived in Section 4.1, and the determinantal (D)-representations of MPCEP-CEPMP-solutions and DMP-MPD-solutions are developed in Sections 4.2 and 4.3. Numerical examples are given in Section 5. Concluding comments are stated in Section 6. The chapter generalizes the main results that were partly published in [42–44].
2 Detailed Motivation and Preliminaries

Quaternion matrix equations (QMEs) have been studied by many researchers because of their wide applications in quantum physics, statistics, computer science, signal and color image processing, quantum mechanics, rigid mechanics, control theory, field theory, and so on. In engineering, mathematics, and other fields, a number of problems can be transformed into linear matrix equations. Using various methods (e.g., the Kronecker product of matrices, various representations of quaternion matrices, and properties of the Moore-Penrose inverse), the solvability of QMEs and their general solutions, as well as some efficient algorithms for calculating these solutions, have been investigated.
I. I. Kyrchei et al.
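We use the following notation:

$$\mathbb{H}(n)^{(k)} = \{A \in \mathbb{H}^{n\times n} \mid k = \operatorname{Ind}(A)\}, \qquad \mathbb{H}(n)^{(k)}_{r} = \{A \in \mathbb{H}(n)^{(k)} \mid r = \operatorname{rk}(A)\},$$

as well as the symbols from [43]

$$M \in \mathcal{C}_{r,\subset}(A) \Leftrightarrow \mathcal{C}_r(M) \subset \mathcal{C}_r(A), \qquad M \in \mathcal{R}_{l,\subset}(B) \Leftrightarrow \mathcal{R}_l(M) \subset \mathcal{R}_l(B).$$

Also, we denote

$$M \in \mathcal{O}_{\subset}(A, B) \Leftrightarrow \mathcal{C}_r(M) \subset \mathcal{C}_r(A),\ \mathcal{R}_l(M) \subset \mathcal{R}_l(B), \tag{2.1}$$

$$(A|B) \in \mathbb{H}(m|n)^{(k|l)} \Leftrightarrow A \in \mathbb{H}(m)^{(k)},\ B \in \mathbb{H}(n)^{(l)},$$

$$(A|B) \in \mathbb{H}(m|n)^{(k|l)}_{r,s} \Leftrightarrow A \in \mathbb{H}(m)^{(k)}_{r},\ B \in \mathbb{H}(n)^{(l)}_{s}.$$

Especially, the following problems will be investigated. For $(A|B) \in \mathbb{H}(m|n)^{(k|l)}$ and $D \in \mathbb{H}^{m\times n}$, we explore the two-sided quaternion matrix equation

$$AXB = D \tag{2.2}$$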
with the following constraints and research directions.

1. Firstly, in terms of the core-EP inverse, we provide the uniquely determined solution to the next quaternion-matrix constrained approximation problem in the Frobenius norm:

$$\|AXB - D\|_F = \min \quad \text{subject to} \quad X \in \mathcal{O}_{\subset}(A^k, B^l), \tag{2.3}$$

where the relation $\mathcal{O}_{\subset}$ is defined in (2.1). We call (2.3) the inside-unknown minimization quaternion-matrix problem (IU Q-matrix problem for short). Thus, a problem that generalizes (1.6), proposed in [88] for complex matrices of index one, is stated and solved for quaternion matrices of arbitrary index.

2. We get solutions to the following problems, which are special cases of (2.3):

$$\|AX - D\|_F = \min \quad \text{subject to} \quad \mathcal{C}_r(X) \subset \mathcal{C}_r(A^k), \tag{2.4}$$

$$\|XB - D\|_F = \min \quad \text{subject to} \quad \mathcal{R}_l(X) \subset \mathcal{R}_l(B^l). \tag{2.5}$$

We call the minimization problem (2.4) the right-unknown quaternion-matrix (RU Q-matrix) approximation problem. The dual problem (2.5) will be termed the left-unknown quaternion-matrix (LU Q-matrix) approximation problem.

3. We find the solutions to Equation (2.2) with the next constraints (where the equality $D = E$ means that $D$ is replaced with the corresponding matrix $E$ in (2.2)):

$$D = A^k (A^k)^{\dagger} D (B^l)^{\dagger} B^l, \qquad X \in \mathcal{O}_{\subset}(A^{\dagger}A^k, B^l B^{\dagger}). \tag{2.6}$$

The unique solution to the problem (2.6) is given by the MPCEP inverse of $A$ and the CEPMP inverse of $B$.

4. $$D = A (A^k)^{\dagger} A^k A^{\dagger} D B^{\dagger} B^l (B^l)^{\dagger} B, \qquad X \in \mathcal{O}_{\subset}\big((A^k)^{\dagger}, (B^l)^{\dagger}\big). \tag{2.7}$$

The problem (2.7) is considered in terms of the CEPMP inverse of $A$ and the MPCEP inverse of $B$.

5. $$D = A A^{D} D B^{D} B, \qquad X \in \mathcal{O}_{\subset}(A^{\dagger}A^k, B^l B^{\dagger}). \tag{2.8}$$

The unique solution to TQRME (2.8) is determined based on the MPD inverse of $A$ and the DMP inverse of $B$.

6. It is proven that the problem (2.8) is equivalent to the next one:

$$A^{k+1} X B^{l+1} = A^k D B^l, \qquad X \in \mathcal{O}_{\subset}(A^{\dagger}A^k, B^l B^{\dagger}). \tag{2.9}$$

7. Consider

$$D = A A^{D} A A^{\dagger} D B^{\dagger} B B^{D} B, \qquad X \in \mathcal{O}_{\subset}(A^k, B^l). \tag{2.10}$$

The unique solution to TQRME (2.10) is determined based on the DMP inverse of $A$ and the MPD inverse of $B$.

8. The problem (2.10) is equivalent to

$$A^k X B^l = A^k A^{\dagger} D B^{\dagger} B^l, \qquad X \in \mathcal{O}_{\subset}(A^k, B^l). \tag{2.11}$$
9. Special and analogous cases of all proposed QRMEs are considered when $B = I_n$, or $A = I_m$, or $\operatorname{Ind}(A) = 1 = \operatorname{Ind}(B)$.

10. For all above-mentioned equations, Cramer’s rules are given.

As is well known, Cramer’s rule for systems of linear equations is derived from the determinantal representation (D-representation for short) of the ordinary inverse, that is, the matrix whose entries are cofactors. It is desirable to have the same for generalized inverses. However, the construction of determinantal representations of generalized inverses is neither evident nor unambiguous, even for matrices with complex or real entries. In the search for more applicable explicit expressions, various D-representations of generalized inverses have been obtained for matrices over complex numbers [9, 49, 73, 81, 82] and over integral domains [5, 6, 65, 94, 95]. A D-representation for outer inverses in Riemannian space was proposed in [80, 83]. Because of the complexity and difficult applicability of some of the previously obtained D-representations, they did not yield convenient Cramer’s rules for matrix equations.

Taking into account the non-commutativity of quaternions, the task of D-representing quaternion generalized inverses is more complicated. Difficulties arise already in defining the determinant of a quaternion matrix (see the survey articles [2, 14, 98] for details). Recently, this problem has begun to be resolved thanks to the theory of column-row determinants developed in [28, 29]. Within the framework of this theory, by using the developed limit-rank method, determinantal representations of various kinds of generalized inverses have been derived, in particular, of the Moore-Penrose inverse in [30], the Drazin inverse in [31], and the weighted Moore-Penrose and weighted Drazin inverses in [32, 33], respectively. Corresponding analogs of Cramer’s rule for quaternion Sylvester-type matrix equations have been obtained in [34–37, 69].
Applying the limit-rank method in the case of complex matrices gives new determinantal representations of generalized inverses and new expressions for Cramer’s rules as well [39]. The usual determinants and minors are used in this case.

For $A = (a_{ij}) \in \mathbb{H}^{n\times n}$, there is a method to produce $n$ row ($\mathfrak{R}$-)determinants and $n$ column ($\mathfrak{C}$-)determinants by fixing a certain order of factors in each term.

• The $i$th $\mathfrak{R}$-determinant of $A$, for an arbitrary row index $i \in I_n = \{1, \ldots, n\}$, is given by

$$\mathrm{rdet}_i A := \sum_{\sigma \in S_n} (-1)^{n-r} \left(a_{i\, i_{k_1}} a_{i_{k_1} i_{k_1+1}} \cdots a_{i_{k_1+l_1}\, i}\right) \cdots \left(a_{i_{k_r} i_{k_r+1}} \cdots a_{i_{k_r+l_r}\, i_{k_r}}\right),$$

where $S_n$ denotes the symmetric group on $I_n$, while the permutation $\sigma$ is written as a product of mutually disjoint cycles ordered from left to right by the rules

$$\sigma = (i\, i_{k_1} i_{k_1+1} \ldots i_{k_1+l_1})(i_{k_2} i_{k_2+1} \ldots i_{k_2+l_2}) \cdots (i_{k_r} i_{k_r+1} \ldots i_{k_r+l_r}),$$

$$i_{k_t} < i_{k_t+s}, \quad i_{k_2} < i_{k_3} < \cdots < i_{k_r}, \quad \forall\, t = 2, \ldots, r, \quad s = 1, \ldots, l_t.$$

• For an arbitrary column index $j \in I_n$, the $j$th $\mathfrak{C}$-determinant of $A$ is defined as the sum

$$\mathrm{cdet}_j A = \sum_{\tau \in S_n} (-1)^{n-r} \left(a_{j_{k_r} j_{k_r+l_r}} \cdots a_{j_{k_r+1} j_{k_r}}\right) \cdots \left(a_{j\, j_{k_1+l_1}} \cdots a_{j_{k_1+1} j_{k_1}} a_{j_{k_1} j}\right),$$

in which the permutation $\tau$ is ordered from right to left in the following way:

$$\tau = (j_{k_r+l_r} \cdots j_{k_r+1} j_{k_r}) \cdots (j_{k_2+l_2} \cdots j_{k_2+1} j_{k_2})(j_{k_1+l_1} \cdots j_{k_1+1} j_{k_1} j),$$

$$j_{k_t} < j_{k_t+s}, \quad j_{k_2} < j_{k_3} < \cdots < j_{k_r}.$$
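The ordering rules for the row and column determinants can be made concrete with a short brute-force sketch. The code below is an illustration only, not the authors’ implementation: quaternions are modeled as 4-tuples `(a, b, c, d)` standing for $a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$, indices are 0-based, and the helper names `qmul`, `rdet`, `cdet` are hypothetical. It enumerates all permutations, so it is only meant for very small matrices.

```python
from itertools import permutations

def qmul(p, q):
    """Hamilton product of two quaternions stored as 4-tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def _cycles(perm, first):
    """Disjoint cycles of perm: the cycle through `first` comes first and starts
    at `first`; the others start at their smallest element, in increasing order."""
    seen, cycles = [False] * len(perm), []
    for start in [first] + [x for x in range(len(perm)) if x != first]:
        if seen[start]:
            continue
        cyc, nxt = [start], perm[start]
        seen[start] = True
        while nxt != start:
            cyc.append(nxt)
            seen[nxt] = True
            nxt = perm[nxt]
        cycles.append(cyc)
    return cycles

def _det(A, index, column):
    n, total = len(A), (0, 0, 0, 0)
    for perm in permutations(range(n)):
        cycles = _cycles(perm, index)
        if column:                      # column case: cycles are multiplied right to left
            cycles = cycles[::-1]
        term = (1, 0, 0, 0)
        for cyc in cycles:
            if column:                  # a_{c0,cL} a_{cL,cL-1} ... a_{c1,c0}
                rows, cols = [cyc[0]] + cyc[:0:-1], cyc[::-1]
            else:                       # a_{c0,c1} a_{c1,c2} ... a_{cL,c0}
                rows, cols = cyc, cyc[1:] + cyc[:1]
            for r, c in zip(rows, cols):
                term = qmul(term, A[r][c])
        sign = (-1) ** (n - len(cycles))
        total = tuple(t + sign * x for t, x in zip(total, term))
    return total

def rdet(A, i):
    return _det(A, i, column=False)

def cdet(A, j):
    return _det(A, j, column=True)

one, qi, qj, qk = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
N2 = [[qi, qj], [qk, one]]                                  # non-Hermitian example
H2 = [[(2, 0, 0, 0), (0, 1, 1, 0)], [(0, -1, -1, 0), (3, 0, 0, 0)]]  # Hermitian example
```

For the non-Hermitian matrix `N2`, the first and second row determinants differ (`rdet(N2, 0)` is `0` while `rdet(N2, 1)` is `2i`), whereas for the Hermitian matrix `H2` all row and column determinants coincide and are real.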
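Due to the non-commutativity of quaternions, all $\mathfrak{R}$- and $\mathfrak{C}$-determinants are, in general, different. However, the following equalities are verified for a Hermitian matrix $A$ in [28]:

$$\mathrm{rdet}_1 A = \cdots = \mathrm{rdet}_n A = \mathrm{cdet}_1 A = \cdots = \mathrm{cdet}_n A \in \mathbb{R}.$$

For more details on quaternion column-row determinants, see [29].

The next symbols related to D-representations will be used. The $i$th row and $j$th column of $A$ are denoted by $a_{i.}$ and $a_{.j}$, respectively. Let $A_{.j}(c)$ (resp. $A_{i.}(b)$) denote the matrix formed by replacing the $j$th column (resp. $i$th row) of $A$ by the column vector $c$ (resp. by the row vector $b$). Suppose $\alpha := \{\alpha_1, \ldots, \alpha_k\} \subseteq \{1, \ldots, m\}$ and $\beta := \{\beta_1, \ldots, \beta_k\} \subseteq \{1, \ldots, n\}$ are subsets with $1 \le k \le \min\{m, n\}$. For $A \in \mathbb{H}^{m\times n}$, the notation $A^{\alpha}_{\beta}$ stands for the submatrix with rows and columns indexed by $\alpha$ and $\beta$, respectively. Further, $A^{\alpha}_{\alpha}$ and $|A|^{\alpha}_{\alpha}$ denote a principal submatrix and a principal minor of a Hermitian $A$, respectively. The standard notation

$$L_{k,n} := \{\alpha : \alpha = (\alpha_1, \ldots, \alpha_k),\ 1 \le \alpha_1 < \cdots < \alpha_k \le n\}$$

denotes the set of strictly increasing sequences of $k \in \{1, \ldots, n\}$ integers chosen from $\{1, \ldots, n\}$. In this respect, we put

$$I_{r,m}\{i\} := \{\alpha : \alpha \in L_{r,m},\ i \in \alpha\}, \qquad J_{r,n}\{j\} := \{\beta : \beta \in L_{r,n},\ j \in \beta\}$$

for some fixed $i \in \alpha$ and $j \in \beta$.

Recently, D-representations of the quaternion core-EP inverse with its dual, and of the DMP and MPD inverses, were proved in [38] (and for complex matrices in [40]), and those of the quaternion CEPMP and MPCEP inverses in [44].

Lemma 2.1 ([38]) If $A \in \mathbb{H}(n)^{(k)}$ and $\operatorname{rk}(A^k) = s$, then determinantal representations of $A^{\textcircled{\dagger}} = \big(a^{\textcircled{\dagger},r}_{ij}\big)$ and $A_{\textcircled{\dagger}} = \big(a^{\textcircled{\dagger},l}_{ij}\big)$, respectively, are given by

$$a^{\textcircled{\dagger},r}_{ij} = \frac{\sum\limits_{\alpha \in I_{s,n}\{j\}} \mathrm{rdet}_j \left( \big(A^{k+1}(A^{k+1})^*\big)_{j.}(\hat{a}_{i.}) \right)^{\alpha}_{\alpha}}{\sum\limits_{\alpha \in I_{s,n}} \left| A^{k+1}(A^{k+1})^* \right|^{\alpha}_{\alpha}}, \tag{2.12}$$

$$a^{\textcircled{\dagger},l}_{ij} = \frac{\sum\limits_{\beta \in J_{s,n}\{i\}} \mathrm{cdet}_i \left( \big((A^{k+1})^*A^{k+1}\big)_{.i}(\check{a}_{.j}) \right)^{\beta}_{\beta}}{\sum\limits_{\beta \in J_{s,n}} \left| (A^{k+1})^*A^{k+1} \right|^{\beta}_{\beta}}, \tag{2.13}$$

where $\check{a}_{.j}$ is the $j$th column of $\check{A} = (A^{k+1})^* A^k$ and $\hat{a}_{i.}$ is the $i$th row of $\hat{A} = A^k (A^{k+1})^*$.

D-representations of the complex core-EP inverse and its dual can be obtained [40] by replacing the noncommutative row and column determinants with the usual determinant in Equations (2.12) and (2.13).

D-representations for the quaternion MPD and DMP inverses depend on whether the matrix is Hermitian.

Lemma 2.2 ([38]) Let $A \in \mathbb{H}(n)^{(k)}_{s}$ with $\operatorname{rk}(A^k) = s_1$. The DMP inverse $A^{D,\dagger} = \big(a^{D,\dagger}_{ij}\big)$ has the D-representations given below.

(i) If $A$ is arbitrary, then

$$a^{D,\dagger}_{ij} = \frac{\sum\limits_{\alpha \in I_{s,n}\{j\}} \mathrm{rdet}_j \left( (AA^*)_{j.}(\tilde{u}_{i.}) \right)^{\alpha}_{\alpha}}{\sum\limits_{\alpha \in I_{s_1,n}} \left| A^{2k+1}(A^{2k+1})^* \right|^{\alpha}_{\alpha} \sum\limits_{\alpha \in I_{s,n}} \left| AA^* \right|^{\alpha}_{\alpha}}, \tag{2.14}$$

where $\tilde{u}_{i.}$ means the $i$th row of $\tilde{U} := U A^2 A^*$, and $U = (u_{if}) \in \mathbb{H}^{n\times n}$ is expressed as

$$u_{if} = \sum_{\alpha \in I_{s_1,n}\{f\}} \mathrm{rdet}_f \left( \big(A^{2k+1}(A^{2k+1})^*\big)_{f.}(\check{a}_{i.}) \right)^{\alpha}_{\alpha}, \tag{2.15}$$

where $\check{a}_{i.}$ stands for the $i$th row of $\check{A} := A (A^{2k+1})^*$.

(ii) For a Hermitian $A$, it can be derived that

$$a^{D,\dagger}_{ij} = \frac{\sum\limits_{\alpha \in I_{s,n}\{j\}} \mathrm{rdet}_j \left( (A^2)_{j.}(v^{(1)}_{i.}) \right)^{\alpha}_{\alpha}}{\sum\limits_{\beta \in J_{s_1,n}} \left| A^{k+1} \right|^{\beta}_{\beta} \sum\limits_{\alpha \in I_{s,n}} \left| A^2 \right|^{\alpha}_{\alpha}} \tag{2.16}$$

$$\phantom{a^{D,\dagger}_{ij}} = \frac{\sum\limits_{\beta \in J_{s_1,n}\{i\}} \mathrm{cdet}_i \left( (A^{k+1})_{.i}(u^{(1)}_{.j}) \right)^{\beta}_{\beta}}{\sum\limits_{\beta \in J_{s_1,n}} \left| A^{k+1} \right|^{\beta}_{\beta} \sum\limits_{\beta \in J_{s,n}} \left| A^2 \right|^{\beta}_{\beta}}, \tag{2.17}$$

where

$$v^{(1)}_{i.} = \left[ \sum_{\beta \in J_{s_1,n}\{i\}} \mathrm{cdet}_i \left( (A^{k+1})_{.i}\big(a^{(k+2)}_{.f}\big) \right)^{\beta}_{\beta} \right] \in \mathbb{H}^{1\times n}, \quad f = 1, \ldots, n,$$

$$u^{(1)}_{.j} = \left[ \sum_{\alpha \in I_{s,n}\{j\}} \mathrm{rdet}_j \left( (A^2)_{j.}\big(a^{(k+2)}_{l.}\big) \right)^{\alpha}_{\alpha} \right] \in \mathbb{H}^{n\times 1}, \quad l = 1, \ldots, n,$$

respectively, are the row-vector and the column-vector.

Lemma 2.3 ([38]) Consider $A \in \mathbb{H}(n)^{(k)}_{s}$ with $\operatorname{rk}(A^k) = s_1$. The D-representations of the MPD inverse $A^{\dagger,D} = \big(a^{\dagger,D}_{ij}\big)$ are listed below.

(i) For arbitrary $A$, we have

$$a^{\dagger,D}_{ij} = \frac{\sum\limits_{\beta \in J_{s,n}\{i\}} \mathrm{cdet}_i \left( (A^*A)_{.i}(\tilde{v}_{.j}) \right)^{\beta}_{\beta}}{\sum\limits_{\beta \in J_{s,n}} \left| A^*A \right|^{\beta}_{\beta} \sum\limits_{\beta \in J_{s_1,n}} \left| (A^{2k+1})^*A^{2k+1} \right|^{\beta}_{\beta}}, \tag{2.18}$$

where $\tilde{v}_{.j}$ is the $j$th column of $\tilde{V} := A^* A^2 V$, and $V = (v_{fj}) \in \mathbb{H}^{n\times n}$ is expressed as

$$v_{fj} = \sum_{\beta \in J_{s_1,n}\{f\}} \mathrm{cdet}_f \left( \big((A^{2k+1})^*A^{2k+1}\big)_{.f}(\hat{a}_{.j}) \right)^{\beta}_{\beta}, \tag{2.19}$$

where $\hat{a}_{.j}$ stands for the $j$th column of $\hat{A} := (A^{2k+1})^* A$.

(ii) For a Hermitian $A$, it is

$$a^{\dagger,D}_{ij} = \frac{\sum\limits_{\beta \in J_{s,n}\{i\}} \mathrm{cdet}_i \left( (A^2)_{.i}(v^{(2)}_{.j}) \right)^{\beta}_{\beta}}{\sum\limits_{\beta \in J_{s,n}} \left| A^2 \right|^{\beta}_{\beta} \sum\limits_{\alpha \in I_{s_1,n}} \left| A^{k+1} \right|^{\alpha}_{\alpha}} \tag{2.20}$$

$$\phantom{a^{\dagger,D}_{ij}} = \frac{\sum\limits_{\alpha \in I_{s_1,n}\{j\}} \mathrm{rdet}_j \left( (A^{k+1})_{j.}(u^{(2)}_{i.}) \right)^{\alpha}_{\alpha}}{\sum\limits_{\alpha \in I_{s,n}} \left| A^2 \right|^{\alpha}_{\alpha} \sum\limits_{\alpha \in I_{s_1,n}} \left| A^{k+1} \right|^{\alpha}_{\alpha}}, \tag{2.21}$$

where

$$v^{(2)}_{.j} = \left[ \sum_{\alpha \in I_{s_1,n}\{j\}} \mathrm{rdet}_j \left( (A^{k+1})_{j.}\big(a^{(k+2)}_{l.}\big) \right)^{\alpha}_{\alpha} \right] \in \mathbb{H}^{n\times 1}, \quad l = 1, \ldots, n,$$

$$u^{(2)}_{i.} = \left[ \sum_{\beta \in J_{s,n}\{i\}} \mathrm{cdet}_i \left( (A^2)_{.i}\big(a^{(k+2)}_{.f}\big) \right)^{\beta}_{\beta} \right] \in \mathbb{H}^{1\times n}, \quad f = 1, \ldots, n.$$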
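For the case of complex matrices, we have the following.

Lemma 2.4 ([40]) Let $A \in \mathbb{C}(n)^{(k)}_{s}$ and $\operatorname{rk}(A^k) = s_1$.

1) Its DMP inverse $A^{D,\dagger} = \big(a^{D,\dagger}_{ij}\big)$ has the determinantal representations

$$a^{D,\dagger}_{ij} = \frac{\sum\limits_{\alpha \in I_{s,n}\{j\}} \left| (AA^*)_{j.}(u^{(1)}_{i.}) \right|^{\alpha}_{\alpha}}{\sum\limits_{\beta \in J_{s_1,n}} \left| A^{k+1} \right|^{\beta}_{\beta} \sum\limits_{\alpha \in I_{s,n}} \left| AA^* \right|^{\alpha}_{\alpha}} = \frac{\sum\limits_{\beta \in J_{s_1,n}\{i\}} \left| (A^{k+1})_{.i}(u^{(2)}_{.j}) \right|^{\beta}_{\beta}}{\sum\limits_{\beta \in J_{s_1,n}} \left| A^{k+1} \right|^{\beta}_{\beta} \sum\limits_{\beta \in J_{s,n}} \left| AA^* \right|^{\beta}_{\beta}},$$

where

$$u^{(1)}_{i.} = \left[ \sum_{\beta \in J_{s_1,n}\{i\}} \left| (A^{k+1})_{.i}(\tilde{a}_{.f}) \right|^{\beta}_{\beta} \right] \in \mathbb{C}^{1\times n}, \quad f = 1, \ldots, n,$$

$$u^{(2)}_{.j} = \left[ \sum_{\alpha \in I_{s,n}\{j\}} \left| (AA^*)_{j.}(\tilde{a}_{l.}) \right|^{\alpha}_{\alpha} \right] \in \mathbb{C}^{n\times 1}, \quad l = 1, \ldots, n.$$

Here $\tilde{a}_{.f}$ and $\tilde{a}_{l.}$ are the $f$th column and the $l$th row of $\tilde{A} := A^{k+1} A^*$.

2) The MPD inverse $A^{\dagger,D} = \big(a^{\dagger,D}_{ij}\big)$ has the determinantal representations

$$a^{\dagger,D}_{ij} = \frac{\sum\limits_{\beta \in J_{s,n}\{i\}} \left| (A^*A)_{.i}(v^{(1)}_{.j}) \right|^{\beta}_{\beta}}{\sum\limits_{\beta \in J_{s,n}} \left| A^*A \right|^{\beta}_{\beta} \sum\limits_{\alpha \in I_{s_1,n}} \left| A^{k+1} \right|^{\alpha}_{\alpha}} = \frac{\sum\limits_{\alpha \in I_{s_1,n}\{j\}} \left| (A^{k+1})_{j.}(v^{(2)}_{i.}) \right|^{\alpha}_{\alpha}}{\sum\limits_{\beta \in J_{s,n}} \left| A^*A \right|^{\beta}_{\beta} \sum\limits_{\alpha \in I_{s_1,n}} \left| A^{k+1} \right|^{\alpha}_{\alpha}},$$

where

$$v^{(1)}_{.j} = \left[ \sum_{\alpha \in I_{s_1,n}\{j\}} \left| (A^{k+1})_{j.}(\hat{a}_{l.}) \right|^{\alpha}_{\alpha} \right] \in \mathbb{C}^{n\times 1}, \quad l = 1, \ldots, n,$$

$$v^{(2)}_{i.} = \left[ \sum_{\beta \in J_{s,n}\{i\}} \left| (A^*A)_{.i}(\hat{a}_{.f}) \right|^{\beta}_{\beta} \right] \in \mathbb{C}^{1\times n}, \quad f = 1, \ldots, n.$$

Here $\hat{a}_{l.}$ and $\hat{a}_{.f}$ are the $l$th row and the $f$th column of $\hat{A} := A^* A^{k+1}$.

The next lemmas give the D-representations of the quaternion MPCEP and CEPMP inverses.

Lemma 2.5 ([44]) If $A \in \mathbb{H}(n)^{(k)}_{s}$ and $\operatorname{rk}(A^k) = s_1$, then $A^{\dagger,\textcircled{\dagger}} = \big(a^{\dagger,\textcircled{\dagger}}_{ij}\big)$ can be represented in the form

$$a^{\dagger,\textcircled{\dagger}}_{ij} = \frac{\sum\limits_{\alpha \in I_{s_1,n}\{j\}} \mathrm{rdet}_j \left( \big(A^{k+1}(A^{k+1})^*\big)_{j.}(v^{(1)}_{i.}) \right)^{\alpha}_{\alpha}}{\sum\limits_{\beta \in J_{s,n}} \left| A^*A \right|^{\beta}_{\beta} \sum\limits_{\alpha \in I_{s_1,n}} \left| A^{k+1}(A^{k+1})^* \right|^{\alpha}_{\alpha}} \tag{2.22}$$

$$\phantom{a^{\dagger,\textcircled{\dagger}}_{ij}} = \frac{\sum\limits_{\beta \in J_{s,n}\{i\}} \mathrm{cdet}_i \left( (A^*A)_{.i}(u^{(1)}_{.j}) \right)^{\beta}_{\beta}}{\sum\limits_{\beta \in J_{s,n}} \left| A^*A \right|^{\beta}_{\beta} \sum\limits_{\alpha \in I_{s_1,n}} \left| A^{k+1}(A^{k+1})^* \right|^{\alpha}_{\alpha}}, \tag{2.23}$$

where

$$v^{(1)}_{i.} = \left[ \sum_{\beta \in J_{s,n}\{i\}} \mathrm{cdet}_i \left( (A^*A)_{.i}(\tilde{a}_{.l}) \right)^{\beta}_{\beta} \right] \in \mathbb{H}^{1\times n}, \quad l = 1, \ldots, n,$$

$$u^{(1)}_{.j} = \left[ \sum_{\alpha \in I_{s_1,n}\{j\}} \mathrm{rdet}_j \left( \big(A^{k+1}(A^{k+1})^*\big)_{j.}(\tilde{a}_{f.}) \right)^{\alpha}_{\alpha} \right] \in \mathbb{H}^{n\times 1}, \quad f = 1, \ldots, n,$$

and $\tilde{a}_{.l}$ and $\tilde{a}_{f.}$ are the $l$th column and the $f$th row of $\tilde{A} = A^* A^{k+1} (A^{k+1})^*$.

Lemma 2.6 ([44]) If $A \in \mathbb{H}(n)^{(k)}_{s}$ with $\operatorname{rk}(A^k) = s_1$, then $A_{\textcircled{\dagger},\dagger} = \big(a^{\textcircled{\dagger},\dagger}_{ij}\big)$ has the following D-representations:

$$a^{\textcircled{\dagger},\dagger}_{ij} = \frac{\sum\limits_{\alpha \in I_{s,n}\{j\}} \mathrm{rdet}_j \left( (AA^*)_{j.}(v^{(2)}_{i.}) \right)^{\alpha}_{\alpha}}{\sum\limits_{\beta \in J_{s,n}} \left| AA^* \right|^{\beta}_{\beta} \sum\limits_{\alpha \in I_{s_1,n}} \left| (A^{k+1})^*A^{k+1} \right|^{\alpha}_{\alpha}} \tag{2.24}$$

$$\phantom{a^{\textcircled{\dagger},\dagger}_{ij}} = \frac{\sum\limits_{\beta \in J_{s_1,n}\{i\}} \mathrm{cdet}_i \left( \big((A^{k+1})^*A^{k+1}\big)_{.i}(u^{(2)}_{.j}) \right)^{\beta}_{\beta}}{\sum\limits_{\beta \in J_{s,n}} \left| AA^* \right|^{\beta}_{\beta} \sum\limits_{\alpha \in I_{s_1,n}} \left| (A^{k+1})^*A^{k+1} \right|^{\alpha}_{\alpha}}, \tag{2.25}$$

where

$$v^{(2)}_{i.} = \left[ \sum_{\beta \in J_{s_1,n}\{i\}} \mathrm{cdet}_i \left( \big((A^{k+1})^*A^{k+1}\big)_{.i}(\hat{a}_{.l}) \right)^{\beta}_{\beta} \right] \in \mathbb{H}^{1\times n}, \quad l = 1, \ldots, n,$$

$$u^{(2)}_{.j} = \left[ \sum_{\alpha \in I_{s,n}\{j\}} \mathrm{rdet}_j \left( (AA^*)_{j.}(\hat{a}_{f.}) \right)^{\alpha}_{\alpha} \right] \in \mathbb{H}^{n\times 1}, \quad f = 1, \ldots, n.$$

Here $\hat{a}_{.l}$ and $\hat{a}_{f.}$ are the $l$th column and the $f$th row of $\hat{A} = (A^{k+1})^* A^{k+1} A^*$.

D-representations of the complex MPCEP and CEPMP inverses can be obtained by replacing the noncommutative row and column determinants with the usual determinant in Lemmas 2.5 and 2.6, respectively.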
3 Solvability of QRME

We solve the quaternion two-sided matrix equation (2.2) with constraints (2.3)–(2.11) and their special cases in this section.
3.1
Solvability of the Quaternion Matrix Minimization Problems
Solvability of the constrained quaternion matrix approximation problems (2.3)–(2.5) is considered now. For a quaternion matrix $M$ and its core-EP inverse $M^{\textcircled{\dagger}}$, the next decompositions are obtained as in [91].

Lemma 3.1 If $M \in \mathbb{H}(n)^{(k)}$, there exist a unitary matrix $P \in \mathbb{H}^{n\times n}$, a nonsingular matrix $M_1 \in \mathbb{H}^{t\times t}$, $t = \operatorname{rk}(M^k)$, and a nilpotent matrix $M_3 \in \mathbb{H}^{(n-t)\times(n-t)}$ of index $k$ which transform $M$ into the form

$$M = P \begin{pmatrix} M_1 & M_2 \\ 0 & M_3 \end{pmatrix} P^*. \tag{3.1}$$

In addition,

$$M^{\textcircled{\dagger}} = P \begin{pmatrix} M_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} P^*. \tag{3.2}$$

Based on the corresponding core-EP inverse and its dual, we firstly get the unique solution to the approximation problem (2.3).

Theorem 3.2 ([42]) The unique solution to the IU Q-matrix minimization problem (2.3) is given by

$$X = A^{\textcircled{\dagger}} D B_{\textcircled{\dagger}}. \tag{3.3}$$
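Proof The conditions $\mathcal{C}_r(X) \subset \mathcal{C}_r(A^k)$ and $\mathcal{R}_l(X) \subset \mathcal{R}_l(B^l)$ imply that there is $Y \in \mathbb{H}^{m\times n}$ satisfying $X = A^k Y B^l$. Applying Lemma 3.1 to $A$ and $B^*$, we have

$$A = P_1 \begin{pmatrix} A_1 & A_2 \\ 0 & A_3 \end{pmatrix} P_1^*, \qquad B^* = P_2 \begin{pmatrix} B_1 & B_2 \\ 0 & B_3 \end{pmatrix} P_2^*,$$

where $A_1 \in \mathbb{H}^{r\times r}$ and $B_1 \in \mathbb{H}^{s\times s}$ are nonsingular matrices with $r = \operatorname{rk}(A^k)$, $s = \operatorname{rk}(B^l)$, and $A_3 \in \mathbb{H}^{(m-r)\times(m-r)}$ and $B_3 \in \mathbb{H}^{(n-s)\times(n-s)}$ are nilpotent matrices of indexes $k$ and $l$, respectively. Using Lemma 3.1 and [20, Remark 1.3],

$$A^{\textcircled{\dagger}} = P_1 \begin{pmatrix} A_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} P_1^*, \qquad B_{\textcircled{\dagger}} = \left[(B^*)^{\textcircled{\dagger}}\right]^* = P_2 \begin{pmatrix} (B_1^{-1})^* & 0 \\ 0 & 0 \end{pmatrix} P_2^*.$$

Let

$$P_1^* Y P_2 = \begin{pmatrix} Y_1 & Y_2 \\ Y_3 & Y_4 \end{pmatrix}, \qquad P_1^* D P_2 = \begin{pmatrix} D_1 & D_2 \\ D_3 & D_4 \end{pmatrix},$$

where $Y_1, D_1 \in \mathbb{H}^{r\times s}$, $Y_2, D_2 \in \mathbb{H}^{r\times(n-s)}$, $Y_3, D_3 \in \mathbb{H}^{(m-r)\times s}$, and $Y_4, D_4 \in \mathbb{H}^{(m-r)\times(n-s)}$. Furthermore,

$$\begin{aligned} AXB = A^{k+1} Y B^{l+1} &= P_1 \begin{pmatrix} A_1 & A_2 \\ 0 & A_3 \end{pmatrix} \begin{pmatrix} A_1^k & Z_1 \\ 0 & 0 \end{pmatrix} P_1^* Y P_2 \begin{pmatrix} (B_1^*)^l & 0 \\ Z_2 & 0 \end{pmatrix} \begin{pmatrix} B_1^* & 0 \\ B_2^* & B_3^* \end{pmatrix} P_2^* \\ &= P_1 \begin{pmatrix} A_1^{k+1} & A_1 Z_1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} Y_1 & Y_2 \\ Y_3 & Y_4 \end{pmatrix} \begin{pmatrix} (B_1^*)^{l+1} & 0 \\ Z_2 B_1^* & 0 \end{pmatrix} P_2^* \\ &= P_1 \begin{pmatrix} \tilde{Y} & 0 \\ 0 & 0 \end{pmatrix} P_2^*, \end{aligned}$$

where

$$\tilde{Y} = A_1^{k+1} Y_1 (B_1^*)^{l+1} + A_1 Z_1 Y_3 (B_1^*)^{l+1} + A_1^{k+1} Y_2 Z_2 B_1^* + A_1 Z_1 Y_4 Z_2 B_1^*$$

for appropriate matrices $Z_1 \in \mathbb{H}^{r\times(m-r)}$ and $Z_2 \in \mathbb{H}^{(n-s)\times s}$. So,

$$\|AXB - D\|_F^2 = \left\| \begin{pmatrix} \tilde{Y} - D_1 & -D_2 \\ -D_3 & -D_4 \end{pmatrix} \right\|_F^2 = \|\tilde{Y} - D_1\|_F^2 + \|D_2\|_F^2 + \|D_3\|_F^2 + \|D_4\|_F^2.$$

Since $X$ is a solution to (2.3) if and only if $Y$ is a solution to $\|A^{k+1} Y B^{l+1} - D\|_F = \min$, we note that

$$\min_{Y_i,\ i = 1, \ldots, 4} \|\tilde{Y} - D_1\|_F^2 = 0,$$

i.e., $\min \|A^{k+1} Y B^{l+1} - D\|_F^2 = \|D_2\|_F^2 + \|D_3\|_F^2 + \|D_4\|_F^2$, which is attained for arbitrary $Y_i$, $i = 2, 3, 4$, and

$$Y_1 = A_1^{-(k+1)} D_1 (B_1^*)^{-(l+1)} - A_1^{-k} Z_1 Y_3 - Y_2 Z_2 (B_1^*)^{-l} - A_1^{-k} Z_1 Y_4 Z_2 (B_1^*)^{-l}. \tag{3.4}$$

Thus,

$$\begin{aligned} X = A^k Y B^l &= P_1 \begin{pmatrix} A_1^k & Z_1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} Y_1 & Y_2 \\ Y_3 & Y_4 \end{pmatrix} \begin{pmatrix} (B_1^*)^l & 0 \\ Z_2 & 0 \end{pmatrix} P_2^* \\ &= P_1 \begin{pmatrix} A_1^k Y_1 (B_1^*)^l + Z_1 Y_3 (B_1^*)^l + A_1^k Y_2 Z_2 + Z_1 Y_4 Z_2 & 0 \\ 0 & 0 \end{pmatrix} P_2^*. \end{aligned} \tag{3.5}$$

Substituting (3.4) into (3.5), one can see that (2.3) has the unique solution

$$X = P_1 \begin{pmatrix} A_1^{-1} D_1 (B_1^*)^{-1} & 0 \\ 0 & 0 \end{pmatrix} P_2^* = P_1 \begin{pmatrix} A_1^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} D_1 & D_2 \\ D_3 & D_4 \end{pmatrix} \begin{pmatrix} (B_1^*)^{-1} & 0 \\ 0 & 0 \end{pmatrix} P_2^* = A^{\textcircled{\dagger}} D B_{\textcircled{\dagger}}. \qquad \square$$

For $\operatorname{Ind}(A) = 1$ or $\operatorname{Ind}(B) = 1$ in Theorem 3.2, the core-EP inverse $A^{\textcircled{\dagger}}$ reduces to the core inverse $A^{\textcircled{\#}}$, or the dual core-EP inverse $B_{\textcircled{\dagger}}$ becomes the dual core inverse $B_{\textcircled{\#}}$.

If $B = I_n$ in (2.3), we get the approximation problem (2.4), whose solution can be obtained by Theorem 3.2.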
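Theorem 3.2 can be checked numerically on real matrices, which form a special case of quaternion matrices. The following is a minimal sketch, assuming NumPy is available and using the identities $A^{\textcircled{\dagger}} = A^k (A^{k+1})^{\dagger}$ and $B_{\textcircled{\dagger}} = (B^{l+1})^{\dagger} B^l$ (these are consistent with the matrices $\hat{A}$ and $\check{A}$ in Lemma 2.1); the helper names `matrix_index`, `core_ep`, `dual_core_ep` and the test matrices are hypothetical.

```python
import numpy as np

def matrix_index(A):
    """Smallest k >= 0 such that rank(A^k) == rank(A^(k+1))."""
    k, P = 0, np.eye(A.shape[0])
    while np.linalg.matrix_rank(P) != np.linalg.matrix_rank(P @ A):
        P, k = P @ A, k + 1
    return k

def core_ep(A):
    """Core-EP inverse computed as A^k (A^(k+1))^+ with k = Ind(A)."""
    k = matrix_index(A)
    return np.linalg.matrix_power(A, k) @ np.linalg.pinv(np.linalg.matrix_power(A, k + 1))

def dual_core_ep(B):
    """Dual core-EP inverse computed as (B^(l+1))^+ B^l with l = Ind(B)."""
    l = matrix_index(B)
    return np.linalg.pinv(np.linalg.matrix_power(B, l + 1)) @ np.linalg.matrix_power(B, l)

# A is already in the block form of Lemma 3.1 with P = I (illustrative data)
A = np.array([[1., 2., 1., 0.],
              [0., 3., 0., 1.],
              [0., 0., 0., 1.],
              [0., 0., 0., 0.]])
B = np.array([[1., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.]])
rng = np.random.default_rng(1)
D = rng.standard_normal((4, 3))

k, l = matrix_index(A), matrix_index(B)
X_opt = core_ep(A) @ D @ dual_core_ep(B)            # candidate from (3.3)
res_opt = np.linalg.norm(A @ X_opt @ B - D)

# Other feasible matrices X = A^k Y B^l satisfy the constraint of (2.3)
Ak, Bl = np.linalg.matrix_power(A, k), np.linalg.matrix_power(B, l)
res_others = [np.linalg.norm(A @ (Ak @ rng.standard_normal((4, 3)) @ Bl) @ B - D)
              for _ in range(20)]
```

In this sketch, `core_ep(A)` reproduces the block matrix of (3.2), and no randomly drawn feasible matrix attains a smaller residual than `X_opt`.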
Corollary 3.3 ([42]) The unique solution to the RU Q-matrix minimization problem (2.4) is given by

$$X = A^{\textcircled{\dagger}} D. \tag{3.6}$$

The constrained approximation problem (2.5) is obtained in the case $A = I_m$.

Corollary 3.4 ([42]) The unique solution to the LU Q-matrix minimization problem (2.5) is expressed by

$$X = D B_{\textcircled{\dagger}}. \tag{3.7}$$

The following constrained RU and LU Q-vector approximation problems, respectively, are studied as particular cases of problems (2.4) and (2.5):

$$\|Ax - D\|_F = \min \quad \text{subject to} \quad x \in \mathcal{C}_r(A^k), \tag{3.8}$$

$$\|xB - d\|_F = \min \quad \text{subject to} \quad x \in \mathcal{R}_l(B^l), \tag{3.9}$$

where $A, B \in \mathbb{H}^{n\times n}$, $k = \operatorname{Ind}(A)$, $l = \operatorname{Ind}(B)$, $D \in \mathbb{H}^{n\times 1}$, and $d \in \mathbb{H}^{1\times n}$.

Corollary 3.5 ([42]) The unique solutions to (3.8) and (3.9) are respectively expressed by

$$x = A^{\textcircled{\dagger}} D, \qquad x = d B_{\textcircled{\dagger}}.$$

Notice that the solution of the approximation problem (3.8) coincides with the known results for complex matrices of index one, verified in [88], and for nonsingular complex matrices.
3.2
MPCEP-CEPMP-Solutions of Restricted Equations
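Solvability of QRMEs (2.6)–(2.7), verified in [44], is the subject of this subsection. We firstly give the unique solution of (2.6) in terms of the MPCEP inverse and the CEPMP inverse of appropriate quaternion matrices.

Theorem 3.6 For $(A|B) \in \mathbb{H}(m|n)^{(k|l)}$ and $D \in \mathbb{H}^{m\times n}$, there is just one solution to (2.6), expressed as

$$X = A^{\dagger,\textcircled{\dagger}} D B_{\textcircled{\dagger},\dagger}. \tag{3.10}$$

Proof From (1.1), (1.2), $A^{\textcircled{\dagger}} = A^{D} A^k (A^k)^{\dagger}$ and $B_{\textcircled{\dagger}} = (B^l)^{\dagger} B^l B^{D}$, it follows that

$$\begin{aligned} X = A^{\dagger,\textcircled{\dagger}} D B_{\textcircled{\dagger},\dagger} &= A^{\dagger} A A^{\textcircled{\dagger}} D B_{\textcircled{\dagger}} B B^{\dagger} \\ &= A^{\dagger} (A A^{D} A^k)(A^k)^{\dagger} D (B^l)^{\dagger} (B^l B^{D} B) B^{\dagger} \\ &= A^{\dagger} A^k (A^k)^{\dagger} D (B^l)^{\dagger} B^l B^{\dagger}. \end{aligned}$$

Hence, (2.6) is solved by $X = A^{\dagger,\textcircled{\dagger}} D B_{\textcircled{\dagger},\dagger}$. Suppose that $X$ and $Y$ are solutions of (2.6). The equality $A(X - Y)B = 0$ in conjunction with $X, Y \in \mathcal{C}_{r,\subset}(A^{\dagger}A^k)$ gives

$$(X - Y)B \in \mathcal{C}_r(A^{\dagger}A^k) \cap \mathcal{N}_r(A) \subseteq \mathcal{C}_r(A^{\dagger,\textcircled{\dagger}}A) \cap \mathcal{N}_r(A^{\dagger,\textcircled{\dagger}}A) = \{0\}.$$

By $(X - Y)B = 0$ and $X, Y \in \mathcal{R}_{l,\subset}(B^l B^{\dagger})$, we get

$$X - Y \in \mathcal{N}_l(B) \cap \mathcal{R}_l(B^l B^{\dagger}) \subseteq \mathcal{N}_l(B B_{\textcircled{\dagger},\dagger}) \cap \mathcal{R}_l(B B_{\textcircled{\dagger},\dagger}) = \{0\}.$$

Thus, (3.10) is the uniquely determined solution to (2.6). $\square$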
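The statement of Theorem 3.6 can be verified numerically for real matrices. The sketch below is an illustration under stated assumptions: NumPy is available, the MPCEP and CEPMP inverses are formed as $A^{\dagger}AA^{\textcircled{\dagger}}$ and $B_{\textcircled{\dagger}}BB^{\dagger}$ (as in the first line of the proof), the core-EP inverses are computed from $A^k(A^{k+1})^{\dagger}$ and $(B^{l+1})^{\dagger}B^l$, and all variable names and test matrices are hypothetical.

```python
import numpy as np
from numpy.linalg import matrix_power as mpow, matrix_rank, pinv

def matrix_index(A):
    """Smallest k >= 0 such that rank(A^k) == rank(A^(k+1))."""
    k, P = 0, np.eye(A.shape[0])
    while matrix_rank(P) != matrix_rank(P @ A):
        P, k = P @ A, k + 1
    return k

A = np.array([[1., 2., 1., 0.],
              [0., 3., 0., 1.],
              [0., 0., 0., 1.],
              [0., 0., 0., 0.]])
B = np.array([[1., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.]])
D = np.random.default_rng(2).standard_normal((4, 3))

k, l = matrix_index(A), matrix_index(B)
Ak, Bl = mpow(A, k), mpow(B, l)

A_mpcep = pinv(A) @ A @ (Ak @ pinv(mpow(A, k + 1)))     # A^+ A A^(core-EP)
B_cepmp = (pinv(mpow(B, l + 1)) @ Bl) @ B @ pinv(B)     # B_(dual core-EP) B B^+
X = A_mpcep @ D @ B_cepmp                               # candidate from (3.10)

lhs = A @ X @ B
rhs = Ak @ pinv(Ak) @ D @ pinv(Bl) @ Bl                 # right-hand side of (2.6)

# Orthogonal projectors onto C_r(A^+ A^k) and R_l(B^l B^+) test the constraint on X
G, H = pinv(A) @ Ak, Bl @ pinv(B)
P_col, P_row = G @ pinv(G), pinv(H) @ H
```

Here `lhs` agrees with `rhs`, and `X` is fixed by both projectors, so `X` satisfies the equation and the range constraints of (2.6).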
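We obtain the next results when $B = I_n$ and $A = I_m$ in Theorem 3.6, respectively.

Corollary 3.7 For $A \in \mathbb{H}(m)^{(k)}$ and $D \in \mathbb{H}^{m\times n}$, the unique solution to

$$AX = A^k (A^k)^{\dagger} D, \qquad X \in \mathcal{C}_{r,\subset}(A^{\dagger}A^k) \tag{3.11}$$

is given as

$$X = A^{\dagger,\textcircled{\dagger}} D. \tag{3.12}$$

Corollary 3.8 Consider $B \in \mathbb{H}(n)^{(l)}$ and $D \in \mathbb{H}^{m\times n}$. The unique solution to

$$XB = D (B^l)^{\dagger} B^l, \qquad X \in \mathcal{R}_{l,\subset}(B^l B^{\dagger}) \tag{3.13}$$

is expressed as

$$X = D B_{\textcircled{\dagger},\dagger}. \tag{3.14}$$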
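If we add an extra assumption on $D$ in (2.6), a new constrained equation can be studied. Indeed, if $D \in \mathcal{C}_{r,\subset}(A^k)$ (resp. $D \in \mathcal{R}_{l,\subset}(B^l)$), the equation $AXB = A^k(A^k)^{\dagger}D(B^l)^{\dagger}B^l$ becomes $AXB = D(B^l)^{\dagger}B^l$ (resp. $AXB = A^k(A^k)^{\dagger}D$) and, by Theorem 3.6, the solution is $X = A^{\dagger}DB_{\textcircled{\dagger},\dagger}$ (resp. $X = A^{\dagger,\textcircled{\dagger}}DB^{\dagger}$).

Similarly, when $D \in \mathcal{C}_{r,\subset}(A^k)$ (resp. $D \in \mathcal{R}_{l,\subset}(B^l)$) in (3.11) (resp. (3.13)), the equation $AX = A^k(A^k)^{\dagger}D$ (resp. $XB = D(B^l)^{\dagger}B^l$) reduces to $AX = D$ (resp. $XB = D$), and the solution (3.12) (resp. (3.14)) becomes $X = A^{\dagger}D$ (resp. $X = DB^{\dagger}$).

When $\operatorname{Ind}(A) = 1$ and $\operatorname{Ind}(B) = 1$ in Theorem 3.6, the following equation can be solved.

Corollary 3.9 If $(A|B) \in \mathbb{H}(m|n)^{(1|1)}$ and $D \in \mathbb{H}^{m\times n}$, then

$$AXB = A A^{\dagger} D B^{\dagger} B, \qquad X \in \mathcal{O}_{\subset}(A^{\dagger}, B^{\dagger})$$

is uniquely solvable by $X = A^{\dagger}DB^{\dagger}$.

The solvability of new QRMEs can be verified as in Theorem 3.6.

Theorem 3.10 If $(A|B) \in \mathbb{H}(m|n)^{(k|l)}$ and $D \in \mathbb{H}^{m\times n}$, the only solution to

$$(A^k)^{\dagger} A X B (B^l)^{\dagger} = (A^k)^{\dagger} D (B^l)^{\dagger}, \qquad X \in \mathcal{O}_{\subset}(A^{\dagger}A^k, B^l B^{\dagger})$$

is expressed as in (3.10).

Theorem 3.11 For $(A|B) \in \mathbb{H}(m|n)^{(k|l)}$ and $D \in \mathbb{H}^{m\times n}$, the unique solution to (2.7) is given by

$$X = A_{\textcircled{\dagger},\dagger} D B^{\dagger,\textcircled{\dagger}}. \tag{3.15}$$

Theorem 3.12 If $(A|B) \in \mathbb{H}(m|n)^{(k|l)}$ and $D \in \mathbb{H}^{m\times n}$, the unique solution of

$$A^k X B^l = A^k A^{\dagger} D B^{\dagger} B^l, \qquad X \in \mathcal{O}_{\subset}\big((A^k)^{\dagger}, (B^l)^{\dagger}\big)$$

is presented as in (3.15).

The following results are obtained for $B = I_n$ and $A = I_m$ in Theorem 3.11, respectively.

Corollary 3.13 If $A \in \mathbb{H}(m)^{(k)}$ and $D \in \mathbb{H}^{m\times n}$, then

$$AX = A (A^k)^{\dagger} A^k A^{\dagger} D, \qquad X \in \mathcal{C}_{r,\subset}\big((A^k)^{\dagger}\big)$$

is uniquely solvable by

$$X = A_{\textcircled{\dagger},\dagger} D. \tag{3.16}$$

Corollary 3.14 For $B \in \mathbb{H}(n)^{(l)}$ and $D \in \mathbb{H}^{m\times n}$, the unique solution to

$$XB = D B^{\dagger} B^l (B^l)^{\dagger} B, \qquad X \in \mathcal{R}_{l,\subset}\big((B^l)^{\dagger}\big)$$

is represented by

$$X = D B^{\dagger,\textcircled{\dagger}}. \tag{3.17}$$

3.3
MPD-DMP-Solutions to Restricted Matrix Equations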
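In this subsection, the solutions to QRMEs (2.8)–(2.11) are investigated. Notice that the following results were proven in [43].

Theorem 3.15 For $A \in \mathbb{H}(m)^{(k)}$, $B \in \mathbb{H}(n)^{(l)}$, and $D \in \mathbb{H}^{m\times n}$, the matrix

$$X = A^{\dagger,D} D B^{D,\dagger} \tag{3.18}$$

is the unique solution to the constrained equations (2.8) and (2.9).

Proof On the basis of definitions (1.4) and (1.5), the following identities can be concluded for $X = A^{\dagger,D} D B^{D,\dagger}$:

$$\begin{aligned} A^{k+1} X B^{l+1} &= A^{k+1} A^{\dagger,D} D B^{D,\dagger} B^{l+1} = (A^{k+1} A^{\dagger} A) A^{D} D B^{D} (B B^{\dagger} B^{l+1}) \\ &= A^{k+1} A^{D} D B^{D} B^{l+1} = A^k D B^l. \end{aligned}$$

Also,

$$\mathcal{C}_r(X) = \mathcal{C}_r(A^{\dagger,D} D B^{D,\dagger}) \subset \mathcal{C}_r(A^{\dagger,D}) = \mathcal{C}_r(A^{\dagger} A A^{D}) = \mathcal{C}_r(A^{\dagger} A^{D}) = \mathcal{C}_r(A^{\dagger} A^k),$$

$$\mathcal{R}_l(X) \subset \mathcal{R}_l(B^{D} B B^{\dagger}) = \mathcal{R}_l(B^{D} B^{\dagger}) = \mathcal{R}_l(B^l B^{\dagger}).$$

Hence, (3.18) is a solution to (2.9).

Assume that (2.9) has two solutions, denoted by $X$ and $X_1$. Since $A^{k+1}(X - X_1)B^{l+1} = 0$, $\mathcal{C}_r(X) \subset \mathcal{C}_r(A^{\dagger}A^k)$, and $\mathcal{C}_r(X_1) \subset \mathcal{C}_r(A^{\dagger}A^k)$, we conclude

$$(X - X_1)B^{l+1} \in \mathcal{N}_r(A^{k+1}) \cap \mathcal{C}_r(A^{\dagger}A^k) \subseteq \mathcal{N}_r(A^{\dagger,D}A) \cap \mathcal{C}_r(A^{\dagger,D}A) = \{0\}.$$

According to $(X - X_1)B^{l+1} = 0$, $\mathcal{R}_l(X) \subset \mathcal{R}_l(B^l B^{\dagger})$, and $\mathcal{R}_l(X_1) \subset \mathcal{R}_l(B^l B^{\dagger})$, we notice

$$X - X_1 \in \mathcal{N}_l(B^{l+1}) \cap \mathcal{R}_l(B^l B^{\dagger}) \subseteq \mathcal{N}_l(B B^{D,\dagger}) \cap \mathcal{R}_l(B B^{D,\dagger}) = \{0\},$$

which implies that (3.18) is the unique solution to (2.9). Analogously, it is verified that (3.18) is the unique solution to (2.8). $\square$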
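The two identities used in Theorem 3.15 can be checked numerically for real matrices. The following sketch makes several assumptions that are not part of the source: NumPy is available, the Drazin inverse is computed by the known formula $A^{D} = A^k (A^{2k+1})^{\dagger} A^k$, the MPD and DMP inverses are formed as $A^{\dagger}AA^{D}$ and $B^{D}BB^{\dagger}$, and the names `drazin`, `matrix_index` and the test data are hypothetical.

```python
import numpy as np
from numpy.linalg import matrix_power as mpow, matrix_rank, pinv

def matrix_index(A):
    """Smallest k >= 0 such that rank(A^k) == rank(A^(k+1))."""
    k, P = 0, np.eye(A.shape[0])
    while matrix_rank(P) != matrix_rank(P @ A):
        P, k = P @ A, k + 1
    return k

def drazin(A, k):
    """Drazin inverse via A^D = A^k (A^(2k+1))^+ A^k, valid for k >= Ind(A)."""
    return mpow(A, k) @ pinv(mpow(A, 2 * k + 1)) @ mpow(A, k)

A = np.array([[1., 2., 1., 0.],
              [0., 3., 0., 1.],
              [0., 0., 0., 1.],
              [0., 0., 0., 0.]])
B = np.array([[1., 0., 0.],
              [1., 0., 0.],
              [0., 1., 0.]])
D = np.random.default_rng(3).standard_normal((4, 3))

k, l = matrix_index(A), matrix_index(B)
AD, BD = drazin(A, k), drazin(B, l)

A_mpd = pinv(A) @ A @ AD          # MPD inverse of A
B_dmp = BD @ B @ pinv(B)          # DMP inverse of B
X = A_mpd @ D @ B_dmp             # candidate from (3.18)

eq_29 = (mpow(A, k + 1) @ X @ mpow(B, l + 1), mpow(A, k) @ D @ mpow(B, l))   # (2.9)
eq_28 = (A @ X @ B, A @ AD @ D @ BD @ B)                                     # (2.8)
```

Both pairs `eq_29` and `eq_28` agree up to rounding, in line with the equivalence of (2.8) and (2.9) stated above.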
After the replacement $B := I_n$ in Theorem 3.15, it is possible to solve some particular cases of the equations (2.9) and (2.8).

Corollary 3.16 Under the suppositions $A \in \mathbb{H}(m)^{(k)}$ and $D \in \mathbb{H}^{m\times n}$, the matrix

$$X = A^{\dagger,D} D \tag{3.19}$$

is the unique solution to the equations (3.20) and (3.21) defined as

$$A^{k+1} X = A^k D, \qquad X \in \mathcal{C}_{r,\subset}(A^{\dagger}A^k), \tag{3.20}$$

$$AX = A A^{D} D, \qquad X \in \mathcal{C}_{r,\subset}(A^{\dagger}A^k). \tag{3.21}$$

The specialization $A := I_m$ in Theorem 3.15 gives the following consequence.

Corollary 3.17 Consider $B \in \mathbb{H}(n)^{(l)}$ and $D \in \mathbb{H}^{m\times n}$. Then

$$X = D B^{D,\dagger} \tag{3.22}$$

is the unique solution to the constrained equations (3.23) and (3.24):

$$X B^{l+1} = D B^l, \qquad X \in \mathcal{R}_{l,\subset}(B^l B^{\dagger}), \tag{3.23}$$

$$XB = D B^{D} B, \qquad X \in \mathcal{R}_{l,\subset}(B^l B^{\dagger}). \tag{3.24}$$

If the additional constraint $D \in \mathcal{O}_{\subset}(A^k, B^l)$ is imposed in (2.8), then $AA^{D}DB^{D}B = D$ and the next statement follows.

Corollary 3.18 Assume $A \in \mathbb{H}(m)^{(k)}$, $B \in \mathbb{H}(n)^{(l)}$, and $D \in \mathbb{H}^{m\times n}$. Then $X = A^{\dagger} D B^{\dagger}$ is the unique solution of

$$AXB = D, \qquad X \in \mathcal{O}_{\subset}(A^{\dagger}A^k, B^l B^{\dagger}), \quad D \in \mathcal{O}_{\subset}(A^k, B^l).$$
Remark 3.19 Corollary 3.18 provides a new characterization of the best approximate solution $X = A^{\dagger}DB^{\dagger}$ to the QME $AXB = D$. According to Theorem 3.15 and Corollary 3.18, the additional assumption $D \in \mathcal{O}_{\subset}(A^k, B^l)$ reduces the solution $A^{\dagger,D}DB^{D,\dagger}$ to the best approximate solution $A^{\dagger}DB^{\dagger}$. In this way, $A^{\dagger,D}DB^{D,\dagger}$ is a generalization of $A^{\dagger}DB^{\dagger}$. It is appropriate to use the term MPD-DMP best approximate solution for $A^{\dagger,D}DB^{D,\dagger}$.

The following result related to the solvability of equations (2.10) and (2.11) is confirmed in the same way as Theorem 3.15.

Theorem 3.20 Assume $A \in \mathbb{H}(m)^{(k)}$, $B \in \mathbb{H}(n)^{(l)}$, and $D \in \mathbb{H}^{m\times n}$. Then

$$X = A^{D,\dagger} D B^{\dagger,D} \tag{3.25}$$

is the unique solution to both equations (2.10) and (2.11).

When $B := I_n$ or $A := I_m$ in Theorem 3.20, we solve some particular quaternion systems.

Corollary 3.21 For arbitrary $A \in \mathbb{H}(m)^{(k)}$ and $D \in \mathbb{H}^{m\times n}$, the matrix

$$X = A^{D,\dagger} D \tag{3.26}$$

is the unique solution to

$$AX = A A^{D} A A^{\dagger} D, \qquad X \in \mathcal{C}_{r,\subset}(A^k),$$

$$A^k X = A^k A^{\dagger} D, \qquad X \in \mathcal{C}_{r,\subset}(A^k).$$

Corollary 3.22 For $B \in \mathbb{H}(n)^{(l)}$ and $D \in \mathbb{H}^{m\times n}$, the matrix

$$X = D B^{\dagger,D} \tag{3.27}$$

is the unique solution to the constrained equations

$$XB = D B^{\dagger} B B^{D} B, \qquad X \in \mathcal{R}_{l,\subset}(B^l),$$

$$X B^l = D B^{\dagger} B^l, \qquad X \in \mathcal{R}_{l,\subset}(B^l).$$

Theorem 3.20 leads to the conclusion that $A^{D,\dagger}DB^{\dagger,D}$ extends the Drazin-inverse solution $A^{D}DB^{D}$.

Corollary 3.23 For $A \in \mathbb{H}(m)^{(k)}$, $B \in \mathbb{H}(n)^{(l)}$, and $D \in \mathbb{H}^{m\times n}$, the matrix $X = A^{D} D B^{D}$ is the unique solution of

$$AXB = D, \qquad X, D \in \mathcal{O}_{\subset}(A^k, B^l).$$

Proof Since $D \in \mathcal{C}_{r,\subset}(A^k)$, we have $D = A^{D}AD$ and $D \in \mathcal{C}_{r,\subset}(A)$, which gives $D = AA^{\dagger}D$. In the same manner, $D \in \mathcal{R}_{l,\subset}(B^l)$ yields $D = DBB^{D} = DB^{\dagger}B$. The rest follows by Theorem 3.20. $\square$

Remark 3.24 Corollary 3.23 gives a new characterization of the Drazin-inverse solution $X = A^{D}DB^{D}$ of the QME $AXB = D$. By Theorem 3.20 and Corollary 3.23, the additional assumption $D \in \mathcal{O}_{\subset}(A^k, B^l)$ reduces $A^{D,\dagger}DB^{\dagger,D}$ to the Drazin-inverse solution $A^{D}DB^{D}$. In this way, $A^{D,\dagger}DB^{\dagger,D}$ is a generalization of $A^{D}DB^{D}$. The term DMP-MPD best approximate solution is used for the expression $A^{D,\dagger}DB^{\dagger,D}$.

The following results are consequences in the complex matrix space. The QMEs (2.9)–(2.11) can be considered in the complex matrix setting, and our results reduce to new results valid in the domain of complex matrices. Notice that, in the complex matrix case, the following statements hold:

$$X \in \mathcal{C}_{r,\subset}(D) \Leftrightarrow X = D D^{\dagger} X, \qquad X \in \mathcal{R}_{l,\subset}(D) \Leftrightarrow X = X D^{\dagger} D.$$

Corollary 3.25 Assume $A \in \mathbb{C}(m)^{(k)}$, $B \in \mathbb{C}(n)^{(l)}$, and $D \in \mathbb{C}^{m\times n}$. In this case, $X = A^{\dagger} D B^{\dagger}$ is the unique solution to

$$AXB = D, \quad X = A^{\dagger}A^k (A^{\dagger}A^k)^{\dagger} X, \quad X = X (B^l B^{\dagger})^{\dagger} B^l B^{\dagger}, \quad D = A^k (A^k)^{\dagger} D, \quad D = D (B^l)^{\dagger} B^l.$$

Corollary 3.26 Suppose $A \in \mathbb{C}(m)^{(k)}$, $B \in \mathbb{C}(n)^{(l)}$, and $D \in \mathbb{C}^{m\times n}$. In this case, $X = A^{D} D B^{D}$ is the unique solution to

$$AXB = D, \quad X = A^k (A^k)^{\dagger} X, \quad X = X (B^l)^{\dagger} B^l, \quad D = A^k (A^k)^{\dagger} D, \quad D = D (B^l)^{\dagger} B^l.$$
4 Cramer’s Representations of Derived Solutions

In this section, we establish the D-representations for solutions of QRMEs discovered in Section 3.

4.1
Cramer’s Rules for Constrained Q-Matrix Approximation Problems
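Now, we give determinantal representations, considered in [42], for solutions to constrained matrix approximation problems.

Theorem 4.1 Consider $(A|B) \in \mathbb{H}(m|n)^{(k|l)}$, $\operatorname{rk}(A^k) = r$, and $\operatorname{rk}(B^l) = s$. The solution $X = (x_{ij}) \in \mathbb{H}^{m\times n}$ from (3.3) is represented componentwise as

$$x_{ij} = \frac{\tilde{d}_{ij}}{\sum\limits_{\alpha \in I_{r,m}} \left| A^{k+1}(A^{k+1})^* \right|^{\alpha}_{\alpha} \sum\limits_{\beta \in J_{s,n}} \left| (B^{l+1})^*B^{l+1} \right|^{\beta}_{\beta}}, \tag{4.1}$$

where $\tilde{D} = (\tilde{d}_{ij}) = \Phi D \Psi$. Here $\Phi = (\phi_{ig})$ and $\Psi = (\psi_{pj})$ are determined, respectively, by

$$\phi_{ig} = \sum_{\alpha \in I_{r,m}\{g\}} \mathrm{rdet}_g \left( \big(A^{k+1}(A^{k+1})^*\big)_{g.}(\hat{a}_{i.}) \right)^{\alpha}_{\alpha}, \tag{4.2}$$

$$\psi_{pj} = \sum_{\beta \in J_{s,n}\{p\}} \mathrm{cdet}_p \left( \big((B^{l+1})^*B^{l+1}\big)_{.p}(\check{b}_{.j}) \right)^{\beta}_{\beta}, \tag{4.3}$$

where $\check{b}_{.j}$ is the $j$th column of $\check{B} = (B^{l+1})^* B^l$ and $\hat{a}_{i.}$ is the $i$th row of $\hat{A} = A^k (A^{k+1})^*$.

Proof According to (3.3) and the determinantal representations (2.12) and (2.13) for the core-EP inverse $A^{\textcircled{\dagger}} = \big(a^{\textcircled{\dagger},r}_{ij}\big)$ and the dual core-EP inverse $B_{\textcircled{\dagger}} = \big(b^{\textcircled{\dagger},l}_{ij}\big)$, respectively, we have

$$x_{ij} = \sum_{g=1}^{m} \sum_{p=1}^{n} a^{\textcircled{\dagger},r}_{ig}\, d_{gp}\, b^{\textcircled{\dagger},l}_{pj} = \sum_{g=1}^{m} \sum_{p=1}^{n} \frac{\sum\limits_{\alpha \in I_{r,m}\{g\}} \mathrm{rdet}_g \left( \big(A^{k+1}(A^{k+1})^*\big)_{g.}(\hat{a}_{i.}) \right)^{\alpha}_{\alpha}}{\sum\limits_{\alpha \in I_{r,m}} \left| A^{k+1}(A^{k+1})^* \right|^{\alpha}_{\alpha}}\, d_{gp}\, \frac{\sum\limits_{\beta \in J_{s,n}\{p\}} \mathrm{cdet}_p \left( \big((B^{l+1})^*B^{l+1}\big)_{.p}(\check{b}_{.j}) \right)^{\beta}_{\beta}}{\sum\limits_{\beta \in J_{s,n}} \left| (B^{l+1})^*B^{l+1} \right|^{\beta}_{\beta}},$$

where $\check{b}_{.j}$ denotes the $j$th column of $\check{B} = (B^{l+1})^* B^l$ and $\hat{a}_{i.}$ means the $i$th row of $\hat{A} = A^k (A^{k+1})^*$. If we take $\Phi = (\phi_{ig})$ and $\Psi = (\psi_{pj})$ determined by (4.2) and (4.3), respectively, then, using $\tilde{D} = \Phi D \Psi$, the equality (4.1) follows. $\square$

When $B = I_n$ or $A = I_m$, we immediately obtain the next corollaries.

Corollary 4.2 For $A \in \mathbb{H}(m)^{(k)}$ and $\operatorname{rk}(A^k) = r$, the unique solution $X \in \mathbb{H}^{m\times n}$ to (2.4), given as in (3.6), is expressed by

$$x_{ij} = \frac{\tilde{d}_{ij}}{\sum\limits_{\alpha \in I_{r,m}} \left| A^{k+1}(A^{k+1})^* \right|^{\alpha}_{\alpha}},$$

where $\tilde{D} = (\tilde{d}_{ij}) = \Phi D$. Here $\Phi = (\phi_{ig})$ is determined by

$$\phi_{ig} = \sum_{\alpha \in I_{r,m}\{g\}} \mathrm{rdet}_g \left( \big(A^{k+1}(A^{k+1})^*\big)_{g.}(\hat{a}_{i.}) \right)^{\alpha}_{\alpha},$$

where $\hat{a}_{i.}$ is the $i$th row of $\hat{A} = A^k (A^{k+1})^*$.

Corollary 4.3 Let $B \in \mathbb{H}(n)^{(l)}$ and $\operatorname{rk}(B^l) = s$. The unique solution $X \in \mathbb{H}^{m\times n}$ of (2.5), given as in (3.7), can be expressed componentwise by

$$x_{ij} = \frac{\tilde{d}_{ij}}{\sum\limits_{\beta \in J_{s,n}} \left| (B^{l+1})^*B^{l+1} \right|^{\beta}_{\beta}},$$

where $\tilde{D} = (\tilde{d}_{ij}) = D\Psi$. Here $\Psi = (\psi_{pj})$ is determined by

$$\psi_{pj} = \sum_{\beta \in J_{s,n}\{p\}} \mathrm{cdet}_p \left( \big((B^{l+1})^*B^{l+1}\big)_{.p}(\check{b}_{.j}) \right)^{\beta}_{\beta},$$

where $\check{b}_{.j}$ represents the $j$th column of $\check{B} = (B^{l+1})^* B^l$.

It is important to mention that all row and column determinants reduce to usual determinants in the case of complex matrices. So, the next corollary of Theorem 4.1 is true.

Corollary 4.4 Let $(A|B) \in \mathbb{C}(m|n)^{(k|l)}$, $\operatorname{rk}(A^k) = r$, and $\operatorname{rk}(B^l) = s$.

(i) The unique solution $X \in \mathbb{C}^{m\times n}$ from (3.3) can be expressed componentwise by

$$x_{ij} = \frac{\tilde{d}_{ij}}{\sum\limits_{\alpha \in I_{r,m}} \left| A^{k+1}(A^{k+1})^* \right|^{\alpha}_{\alpha} \sum\limits_{\beta \in J_{s,n}} \left| (B^{l+1})^*B^{l+1} \right|^{\beta}_{\beta}},$$

where $\tilde{D} = (\tilde{d}_{ij}) = \Phi D \Psi$. Here $\Phi = (\phi_{ig})$ and $\Psi = (\psi_{pj})$ are determined, respectively, by

$$\phi_{ig} = \sum_{\alpha \in I_{r,m}\{g\}} \left| \big(A^{k+1}(A^{k+1})^*\big)_{g.}(\hat{a}_{i.}) \right|^{\alpha}_{\alpha}, \tag{4.4}$$

$$\psi_{pj} = \sum_{\beta \in J_{s,n}\{p\}} \left| \big((B^{l+1})^*B^{l+1}\big)_{.p}(\check{b}_{.j}) \right|^{\beta}_{\beta}, \tag{4.5}$$

where $\check{b}_{.j}$ is the $j$th column of $\check{B} = (B^{l+1})^* B^l$ and $\hat{a}_{i.}$ is the $i$th row of $\hat{A} = A^k (A^{k+1})^*$.

(ii) The unique solution $X \in \mathbb{C}^{m\times n}$ of the form (3.6) can be represented componentwise by

$$x_{ij} = \frac{\tilde{d}_{ij}}{\sum\limits_{\alpha \in I_{r,m}} \left| A^{k+1}(A^{k+1})^* \right|^{\alpha}_{\alpha}},$$

where $\tilde{D} = (\tilde{d}_{ij}) = \Phi D$ and $\Phi$ is determined by (4.4).

(iii) The unique solution $X \in \mathbb{C}^{m\times n}$ from (3.7) can be expressed componentwise by

$$x_{ij} = \frac{\tilde{d}_{ij}}{\sum\limits_{\beta \in J_{s,n}} \left| (B^{l+1})^*B^{l+1} \right|^{\beta}_{\beta}},$$

where $\tilde{D} = (\tilde{d}_{ij}) = D\Psi$, and $\Psi = (\psi_{pj})$ is determined by (4.5).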
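Part (ii) of Corollary 4.4 can be turned into a small brute-force computation. The sketch below is an illustration under stated assumptions: NumPy is available, the sums over $I_{r,m}$ and $I_{r,m}\{g\}$ are enumerated with `itertools.combinations` (feasible only for small $m$), and the result is compared with $A^{\textcircled{\dagger}}D$ computed from $A^k (A^{k+1})^{\dagger}$. The function name `cramer_core_ep_solution` and the test data are hypothetical.

```python
import numpy as np
from itertools import combinations

def cramer_core_ep_solution(A, D, k):
    """Entries x_ij = (Phi D)_ij / sum of principal r-minors of A^(k+1) (A^(k+1))^*."""
    m = A.shape[0]
    M = np.linalg.matrix_power(A, k + 1)
    H = M @ M.conj().T                                   # A^(k+1) (A^(k+1))^*
    A_hat = np.linalg.matrix_power(A, k) @ M.conj().T    # A^k (A^(k+1))^*
    r = np.linalg.matrix_rank(np.linalg.matrix_power(A, k))
    subsets = list(combinations(range(m), r))            # the index set I_{r,m}
    denom = sum(np.linalg.det(H[np.ix_(a, a)]) for a in subsets)
    Phi = np.zeros((m, m), dtype=complex)
    for i in range(m):
        for g in range(m):
            Hg = H.astype(complex)                       # fresh copy of H
            Hg[g, :] = A_hat[i, :]                       # g-th row replaced by i-th row of A_hat
            Phi[i, g] = sum(np.linalg.det(Hg[np.ix_(a, a)]) for a in subsets if g in a)
    return (Phi @ D) / denom

A = np.array([[1., 2., 1., 0.],
              [0., 3., 0., 1.],
              [0., 0., 0., 1.],
              [0., 0., 0., 0.]])      # Ind(A) = 2, rk(A^2) = 2 (illustrative data)
rng = np.random.default_rng(4)
D = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

k = 2
X_cramer = cramer_core_ep_solution(A, D, k)
A_core_ep = np.linalg.matrix_power(A, k) @ np.linalg.pinv(np.linalg.matrix_power(A, k + 1))
X_direct = A_core_ep @ D
```

In this example the determinantal formula and the pseudoinverse-based computation give the same matrix up to rounding.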
4.2
D-Representations for the MPCEP-CEPMP-Solutions
The content of this subsection was presented in [44]. Firstly, we get the D-representations for the solution (3.10) and its special cases (3.12) and (3.14).
Quaternion Two-Sided Matrix Equations with Specific Constraints
103
Theorem 4.5 Let ðAjBÞ 2 ðmjnÞðkjlÞ , rkðAk Þ = r1 , and rkðBl Þ = s1 . The r,s matrix X = ½xij 2 m×n in the form (3.10) is defined elementwise in one of the next representations. (1)
α
~ ði:1Þ rdetj ðBB Þj: ψ
α
α2I s,n fjg
xij ¼ β2J r,m
jA Ajββ
β2J r1 ,m
β Akþ1 Akþ1 β
α α
Blþ1 Blþ1
α2I s1 ,n
α2I s,n
jBB jαα
ð4:6Þ
ð1Þ ~ 1 : = Ψ1 V2 . Here, Ψ1 = ψ ð1Þ where ψ~i: stands for the ith row in Ψ ig ð2Þ Þ are determined as follows: and V2 = ðvgp
vð2Þ gp = ð1Þ ψ ig
Blþ1 Blþ1
cdetg β2J s1 ,n fgg
=
β2J r,m fig
cdeti ðA AÞ: i
~:g u
:g
^:p b
β β
, ð4:7Þ
β , β
^ = ðBlþ1 Þ Blþ1 B and u ^:p is the pth column of B ~:g is the gth where b ð1Þ m×n ~ column of U = U1 D, and U1 = ðuf t Þ 2 satisfies ð1Þ
uf t =
rdetf
Akþ1 Akþ1
f:
α2I r1 ,m ff g
ð~at:
α
ð4:8Þ
, α
~ = A Akþ1 ðAkþ1 Þ . where ~at: is the tth row of A (2) β2J r,m fig
xij ¼ β2J r,m
jA Ajββ
^ ð1Þ cdeti ðA AÞ:i ϕ :j
Akþ1 Akþ1 β2J r1 ,m
β β
α2I s1 ,n
β β
Blþ1 Blþ1
α α
α2I s,n
jBB jαα
ð4:9Þ
^ð1Þ is the jth column of Φ ^ 1 = U1 Φ1 . Here U1 is determined by where ϕ :j (4.8) and Φ = (ϕfj) is such that ð1Þ
ϕf j :=
α2I s,n fjg
rdetj ðBB Þj: ð^ vf : Þ
α α
,
^ = DV2 , where V2 := ðvð2Þ Þ is with ^vf : which signifies the fth row of V fp determined by (4.7).
104
I. I. Kyrchei et al.
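Proof According to (3.10) and the representation (2.23) of the MPCEP inverse $A^{\dagger,\textcircled{\dagger}} = (a^{\dagger,\textcircled{\dagger}}_{ij})$, together with the representation (2.24) of the CEPMP inverse $B_{\textcircled{\dagger},\dagger} = (b^{\textcircled{\dagger},\dagger}_{ij})$, one concludes
\[
x_{ij} = \sum_{f=1}^{m}\sum_{g=1}^{n} a^{\dagger,\textcircled{\dagger}}_{if}\, d_{fg}\, b^{\textcircled{\dagger},\dagger}_{gj}
= \sum_{f=1}^{m}\sum_{g=1}^{n}
\frac{\sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(u^{(1)}_{.f})\bigr)^{\beta}_{\beta}}
{\sum_{\beta\in J_{r,m}} |A^{*}A|^{\beta}_{\beta}\, \sum_{\alpha\in I_{r_1,m}} \bigl|A^{k+1}(A^{k+1})^{*}\bigr|^{\alpha}_{\alpha}}\; d_{fg}
\times
\frac{\sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(v^{(2)}_{g.})\bigr)^{\alpha}_{\alpha}}
{\sum_{\beta\in J_{s,n}} |BB^{*}|^{\beta}_{\beta}\, \sum_{\alpha\in I_{s_1,n}} \bigl|(B^{l+1})^{*}B^{l+1}\bigr|^{\alpha}_{\alpha}},
\tag{4.10}
\]
where
\[
u^{(1)}_{.f} = \Bigl[\sum_{\alpha\in I_{r_1,m}\{f\}} \operatorname{rdet}_f\bigl((A^{k+1}(A^{k+1})^{*})_{f.}(\tilde{a}_{t.})\bigr)^{\alpha}_{\alpha}\Bigr] \in \mathbb{H}^{m\times 1}, \quad t = 1,\ldots,m,
\]
\[
v^{(2)}_{g.} = \Bigl[\sum_{\beta\in J_{s_1,n}\{g\}} \operatorname{cdet}_g\bigl(((B^{l+1})^{*}B^{l+1})_{.g}(\hat{b}_{.p})\bigr)^{\beta}_{\beta}\Bigr] \in \mathbb{H}^{1\times n}, \quad p = 1,\ldots,n.
\]
Here $\tilde{a}_{t.}$ is the $t$th row in $\tilde{A} = A^{*}A^{k+1}(A^{k+1})^{*}$ and $\hat{b}_{.p}$ is the $p$th column in $\hat{B} = (B^{l+1})^{*}B^{l+1}B^{*}$.
Convolutions of (4.10) lead to expressive formulas. Extracting the columns $u^{(1)}_{.f}$ and the rows $v^{(2)}_{g.}$, it is possible to calculate the matrices $U_1 = (u^{(1)}_{tf}) \in \mathbb{H}^{m\times m}$ and $V_2 = (v^{(2)}_{gp}) \in \mathbb{H}^{n\times n}$. Denote $\tilde{U} = U_1 D$ and $\hat{V} = DV_2$. Then,
\[
\sum_{f=1}^{m} u^{(1)}_{.f}\, d_{fg} = \tilde{u}_{.g}, \qquad \sum_{g=1}^{n} d_{fg}\, v^{(2)}_{g.} = \hat{v}_{f.}.
\]
Denote by
\[
\psi^{(1)}_{ig} := \sum_{f=1}^{m} \sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(u^{(1)}_{.f})\bigr)^{\beta}_{\beta}\, d_{fg}
= \sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(\tilde{u}_{.g})\bigr)^{\beta}_{\beta}
\]
the $(ig)$th entry in $\Psi_1 = (\psi^{(1)}_{ig}) \in \mathbb{H}^{m\times n}$ and put $\tilde{\Psi}_1 := \Psi_1 V_2$. Then
\[
\sum_{g=1}^{n} \psi^{(1)}_{ig} \sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(v^{(2)}_{g.})\bigr)^{\alpha}_{\alpha}
= \sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(\tilde{\psi}^{(1)}_{i.})\bigr)^{\alpha}_{\alpha}
\]
gives (4.6). If we denote by
\[
\phi^{(1)}_{fj} := \sum_{g=1}^{n} d_{fg} \sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(v^{(2)}_{g.})\bigr)^{\alpha}_{\alpha}
= \sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(\hat{v}_{f.})\bigr)^{\alpha}_{\alpha}
\]
the $(fj)$th element of $\Phi_1 = (\phi^{(1)}_{fj}) \in \mathbb{H}^{m\times n}$ and put $\hat{\Phi}_1 := U_1\Phi_1$, then
\[
\sum_{f=1}^{m} \sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(u^{(1)}_{.f})\bigr)^{\beta}_{\beta}\, \phi^{(1)}_{fj}
= \sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(\hat{\phi}^{(1)}_{.j})\bigr)^{\beta}_{\beta}
\]
gives (4.9). □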
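The convolution step in the proof relies on the row determinant being left-linear in the replaced row. The following Python sketch checks this on a 2×2 quaternion matrix. It assumes the 2×2 formula $\operatorname{rdet}_1 = a_{11}a_{22} - a_{12}a_{21}$, which agrees with the worked example in Section 5; the helper names (`qmul`, `rdet1_2x2`) and all numerical values are hypothetical and chosen only for illustration.

```python
# Quaternions are tuples (a, b, c, d) standing for a + b*i + c*j + d*k.

def qmul(p, q):
    """Hamilton product p*q (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qadd(p, q):
    return tuple(x + y for x, y in zip(p, q))

def qsub(p, q):
    return tuple(x - y for x, y in zip(p, q))

def rdet1_2x2(row1, row2):
    """Assumed 2x2 row determinant along row 1: a11*a22 - a12*a21."""
    (a11, a12), (a21, a22) = row1, row2
    return qsub(qmul(a11, a22), qmul(a12, a21))

# Hypothetical data: a fixed second row, two candidate first rows v_1, v_2,
# and two left coefficients psi_1, psi_2.
row2 = ((2, 0, 1, 0), (0, 1, 0, -1))
v = [((1, 2, 0, 0), (0, 0, 3, 1)), ((0, 1, 1, 0), (4, 0, 0, -2))]
psi = [(0, 1, 0, 2), (3, 0, -1, 0)]

# Left-hand side: sum_g psi_g * rdet_1(row 1 replaced by v_g).
lhs = (0, 0, 0, 0)
for coeff, vg in zip(psi, v):
    lhs = qadd(lhs, qmul(coeff, rdet1_2x2(vg, row2)))

# Right-hand side: rdet_1(row 1 replaced by sum_g psi_g * v_g).
combined = tuple(
    qadd(qmul(psi[0], v[0][t]), qmul(psi[1], v[1][t])) for t in range(2)
)
rhs = rdet1_2x2(combined, row2)

left_linear = lhs == rhs
print(left_linear)
```

Integer components keep the comparison exact. The coefficients multiply from the left, which is the side on which the row determinant is linear in the replaced row.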
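Corollary 4.6 Under the assumptions $A \in \mathbb{H}(m)(k)_{r}$, $D \in \mathbb{H}^{m\times n}$, and $\operatorname{rk}(A^{k}) = r_1$, the matrix $X = [x_{ij}] \in \mathbb{H}^{m\times n}$ defined in (3.12) is represented as
\[
x_{ij} = \frac{\sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(\hat{u}_{.j})\bigr)^{\beta}_{\beta}}
{\sum_{\beta\in J_{r,m}} |A^{*}A|^{\beta}_{\beta}\, \sum_{\beta\in J_{r_1,m}} \bigl|A^{k+1}(A^{k+1})^{*}\bigr|^{\beta}_{\beta}},
\]
where $\hat{u}_{.j}$ denotes the $j$th column of $\hat{U} = U_1 D$ and $U_1 = (u^{(1)}_{ft}) \in \mathbb{H}^{m\times m}$ satisfies (4.8).
Corollary 4.7 Assume $B \in \mathbb{H}(n)(l)_{s}$, $D \in \mathbb{H}^{m\times n}$, and $\operatorname{rk}(B^{l}) = s_1$. An arbitrary entry in $X = [x_{ij}] \in \mathbb{H}^{m\times n}$ of the pattern (3.14) is defined as
\[
x_{ij} = \frac{\sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(\tilde{v}_{i.})\bigr)^{\alpha}_{\alpha}}
{\sum_{\alpha\in I_{s_1,n}} \bigl|(B^{l+1})^{*}B^{l+1}\bigr|^{\alpha}_{\alpha}\, \sum_{\alpha\in I_{s,n}} |BB^{*}|^{\alpha}_{\alpha}},
\]
where $\tilde{v}_{i.}$ is the $i$th row of $\tilde{V} := DV_2$, and $V_2 = (v^{(2)}_{gp})$ is defined by (4.7).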
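The D-representations for the solutions (3.15), (3.16), and (3.17) are obtained in the subsequent results.
Theorem 4.8 Let $(A|B) \in \mathbb{H}(m|n)(k|l)_{r,s}$, $D \in \mathbb{H}^{m\times n}$, $\operatorname{rk}(A^{k}) = r_1$, and $\operatorname{rk}(B^{l}) = s_1$. In this case, $X = [x_{ij}] \in \mathbb{H}^{m\times n}$ defined by (3.15) is expressible elementwise in one of two possible cases.
(1)
\[
x_{ij} = \frac{\sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(\tilde{\psi}^{(2)}_{i.})\bigr)^{\alpha}_{\alpha}}
{\sum_{\beta\in J_{r,m}} |A^{*}A|^{\beta}_{\beta}\, \sum_{\beta\in J_{r_1,m}} \bigl|A^{k+1}(A^{k+1})^{*}\bigr|^{\beta}_{\beta}\, \sum_{\alpha\in I_{s_1,n}} \bigl|(B^{l+1})^{*}B^{l+1}\bigr|^{\alpha}_{\alpha}\, \sum_{\alpha\in I_{s,n}} |BB^{*}|^{\alpha}_{\alpha}},
\tag{4.11}
\]
where $\tilde{\psi}^{(2)}_{i.}$ is the $i$th row of $\tilde{\Psi}_2 := \Psi_2 V_2$. Here, $\Psi_2 = (\psi^{(2)}_{ig})$ and $V_2 = (v^{(2)}_{gp})$ are representable as
\[
v^{(2)}_{gp} = \sum_{\beta\in J_{s_1,n}\{g\}} \operatorname{cdet}_g\bigl(((B^{l+1})^{*}B^{l+1})_{.g}(\hat{b}_{.p})\bigr)^{\beta}_{\beta},
\tag{4.12}
\]
\[
\psi^{(2)}_{ig} = \sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(\tilde{u}_{.g})\bigr)^{\beta}_{\beta},
\tag{4.13}
\]
where $\hat{b}_{.p}$ is the $p$th column of $\hat{B} = (B^{l+1})^{*}B^{l+1}B^{*}$, $\tilde{u}_{.g}$ is the $g$th column of $\tilde{U} = U_1 D$, and $U_1 = (u^{(1)}_{ft}) \in \mathbb{H}^{m\times m}$ satisfies
\[
u^{(1)}_{ft} = \sum_{\alpha\in I_{r_1,m}\{f\}} \operatorname{rdet}_f\bigl((A^{k+1}(A^{k+1})^{*})_{f.}(\tilde{a}_{t.})\bigr)^{\alpha}_{\alpha},
\tag{4.14}
\]
where $\tilde{a}_{t.}$ is the $t$th row of $\tilde{A} = A^{*}A^{k+1}(A^{k+1})^{*}$.
(2)
\[
x_{ij} = \frac{\sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(\hat{\phi}^{(2)}_{.j})\bigr)^{\beta}_{\beta}}
{\sum_{\beta\in J_{r,m}} |A^{*}A|^{\beta}_{\beta}\, \sum_{\beta\in J_{r_1,m}} \bigl|A^{k+1}(A^{k+1})^{*}\bigr|^{\beta}_{\beta}\, \sum_{\alpha\in I_{s_1,n}} \bigl|(B^{l+1})^{*}B^{l+1}\bigr|^{\alpha}_{\alpha}\, \sum_{\alpha\in I_{s,n}} |BB^{*}|^{\alpha}_{\alpha}},
\tag{4.15}
\]
where $\hat{\phi}^{(2)}_{.j}$ is the $j$th column of $\hat{\Phi}_2 = U_1\Phi_2$. Here $U_1$ is determined by (4.14) and $\Phi_2 = (\phi^{(2)}_{fj})$ is such that
\[
\phi^{(2)}_{fj} := \sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(\hat{v}_{f.})\bigr)^{\alpha}_{\alpha},
\tag{4.16}
\]
with $\hat{v}_{f.}$ representing the $f$th row of $\hat{V} = DV_2$, where $V_2 := (v^{(2)}_{gp})$ is given by (4.12).
Proof According to (3.15), considering the derived D-representation (2.25) of $A_{\textcircled{\dagger},\dagger} = (a^{\textcircled{\dagger},\dagger}_{ij})$ and the representation (2.22) of $B^{\dagger,\textcircled{\dagger}} = (b^{\dagger,\textcircled{\dagger}}_{ij})$, it follows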
xij =
f =1 g=1
m
=
n
n
{ {, { a,{ if d f g bgj
β2J r1 ,m fig
f =1 g=1
×
Akþ1 Akþ1
cdeti
α2I s1 ,n fjg β2J s,n
β2J r1 ,m
jAA jββ
jB Bjββ
α2I s1 ,n
:i
β
α Akþ1 Akþ1 α
α2I r1 ,m
Blþ1 Blþ1
rdetj
β
ð2Þ
u:f
j:
ðvð1Þ g: Þ
α Blþ1 Blþ1 α
df g
ð4:17Þ
α α
,
where ð2Þ
u:f = vð1Þ g: =
α2I r,m fjg
rdetj ðAA Þj: ð^ af :
~:g cdeti ðB BÞ:i b
β2J s,n fig
α α β β
2 m×1 ,
f = 1, . . . , m,
2 1×n ,
l = 1, . . . , n:
^ = ðAkþ1 Þ Akþ1 A and b ~:g is the gth column in Here ^af : is the fth row of A ~ = B Blþ1 ðBlþ1 Þ . B To obtain expressive formulas, we make some convolutions of (4.17). ð2Þ Using the columns u:f and the rows vð1Þ g: , ones construct the matrices ð2Þ
U2 = utf 2 m×m ~ = DV1 . Then, V
^ = U2 D 2 n×n . Denote U and V1 = vð1Þ gp
and
m
ð2Þ
f =1
^:g , u:f cf g = u
n
cf g vð1Þ vf : g: = ~
g=1
If we denote by m
ð2Þ
ψ ig : =
Akþ1 Akþ1
cdeti
:i
f = 1 β2J r1 ,m fig
=
Akþ1 Akþ1
cdeti
:i
β2J r1 ,m fig
β
ð2Þ
u:f
β
df g
β
^:g u
β
ð2Þ ~ 2 =: Ψ2 V1 , then the (ig)th element of Ψ2 = ðψ ig Þ 2 m×n and put Ψ n
ð2Þ
g=1
ψ ig
Blþ1 Blþ1
rdetj α2I s1 ,n fjg
=
rdetj
Blþ1 Blþ1
α
ðvð1Þ Þ j: g:
α2I s1 ,n fjg
α
ð2Þ
ðψ~i: Þ j:
= α α
gives (4.11). If we denote by n
ð2Þ
ϕf j : = =
g=1
df g
rdetj
Blþ1 Blþ1
α2I s,n fjg
rdetj
Blþ1 Blþ1
j:
α2I s,n fjg
ðvð1Þ Þ j: g:
α α
α
ð~vf : Þ
α
ð2Þ ^ 2 =: U2 Φ2 , the ( fj)th element of the matrix Φ2 = ðϕf j Þ 2 m×n and put Φ then m
ð2Þ
f = 1 β2J r,m fig
= gives (4.15).
cdeti ðA AÞ:i u:f
β β
ð2Þ
ϕf j =
^ð2Þ cdeti ðA AÞ:i ϕ :j
β2J r,m fig
β β
□
Corollary 4.9 Let A 2 ðmÞðkÞ , D 2 m×n , and rkðAk Þ = r 1 : Then r X = ½xij 2 m×n of the form (3.16) can be expressed as
β2J r,m fig
xij =
β2J r,m
β
^ð2Þ cdeti ðA AÞ:i ϕ :j
jA Ajββ
A
kþ1
β
Akþ1
β2J r1 ,m
β β
^ð2Þ is the jth column of Φ ^ 2 = DΦ2 , and Φ2 is of the form (4.16). where ϕ :j , D 2 m×n and rkðBl Þ = s1 . Then Corollary 4.10 Assume that B 2 ðnÞðlÞ s the entries of the matrix X 2 m×n defined in (3.17) are defined by ð2Þ
xij =
α2I s,n fjg α2I s1 ,n
rdetj ðBB Þj: ðψ~i: Þ
B
lþ1
α Blþ1 α
α2I s,n
α α
jBB jαα
~ 2 : = Ψ2 D, and Ψ 2 is defined as in (4.13). wherein ψ~i: is the ith row of Ψ ð2Þ
Since all row and column determinants reduce to usual determinants in the complex matrix case, the next consequences of Theorem 4.5 and Theorem 4.8 are clearly concluded. , rkðAk Þ = r 1 , and rkðBl Þ = s1 . The Corollary 4.11 Let ðAjBÞ 2 ðmjnÞðkjlÞ r,s m×n matrix X = ½xij 2 determined by (3.10) is defined in one of the following representations: (1) ð1Þ
α2I s,n fjg
xij ¼ β2J r,m
ð1Þ
where ψ~i:
V 2 = ðvð2Þ gp Þ
jA Ajββ
β2J r1 ,m
ðBB Þj: ψ~ i:
β Akþ1 Akþ1 β
α α
α2I s1 ,n
Blþ1 Blþ1
α α
α2I s,n
jBB jαα
~ 1 := Ψ1 V 2 . Here, Ψ1 = ψ ð1Þ is the ith row of Ψ ig are determined as follows:
and
vð2Þ gp =
Blþ1 Blþ1 β2J s1 ,n fgg
ð1Þ
ψ ig =
β2J r,m fig
ðA AÞ:i u~:g
b^:p
:g
β β
, ð4:18Þ
β , β
where b^:p is the pth column of B^ = ðBlþ1 Þ Blþ1 B and u~:g is the gth ð1Þ column of U~ = U 1 D, and U 1 = ðuf t Þ 2 m×n satisfies ð1Þ
uf t =
Akþ1 Akþ1
α
ð~ a Þ , f : t:
ð4:19Þ
α
α2I r1 ,m ff g
where a~t: is the tth row of A~ = A Akþ1 ðAkþ1 Þ .
(2) β2J r,m fig
xij ¼ β2J r,m
jA Ajββ
^ ð1Þ ðA AÞ:i ϕ :j
Akþ1 Akþ1 β2J r1 ,m
β β
α2I s1 ,n
β β
Blþ1 Blþ1
α α
α2I s,n
jBB jαα
ð1Þ ^ 1 = U 1 Φ1. Here U1 is defined such that ϕ^:j stands for the jth column of Φ by (4.19) and Φ = (ϕfj) is presented as α
ð1Þ
ϕf j : =
α2I s,n fjg
ðBB Þj: ð^ vf : Þ , α
ð2Þ with v^f : representing the fth row of V^ = DV 2 , where V 2 := ðvf p Þ is determined by (4.18).
and D 2 m×n , such Corollary 4.12 Consider the pair ðAjBÞ 2 ðmjnÞðkjlÞ r,s m×n that rkðAk Þ = r 1 and rkðBl Þ = s1 . Then X = ½xij 2 defined by (3.15) is expressible elementwise in one of two possible representations. (1) ð2Þ
α2I s,n fjg
xij ¼ β2J r,m
jA Ajββ
β2J r1 ,m
ðBB Þj: ψ~ i:
β Akþ1 Akþ1 β
α α
α2I s1 ,n
Blþ1 Blþ1
α α
α2I s,n
jBB jαα
~ 2 : = Ψ2 V 2 . Here, Ψ2 = ðψ Þ and where ψ~i: is the ith row of Ψ ig ð2Þ V 2 = ðvgp Þ are representable as ð2Þ
ð2Þ
vð2Þ gp =
Blþ1 Blþ1 β2J s1 ,n fgg
ð2Þ
ψ ig =
β2J r,m fig
ðA AÞ:i u~:g
b^:p
:g
β β
, ð4:20Þ
β , β
where b^:p is the pth column of B^ = ðBlþ1 Þ Blþ1 B , u~:g is the gth column of ð1Þ U~ = U 1 D, and the matrix U 1 = ðuf t Þ 2 m×n satisfies ð1Þ
uf t =
Akþ1 Akþ1
α
ð~ a Þ , f : t:
ð4:21Þ
α
α2I r1 ,m ff g
where a~t: is the tth row of A~ = A Akþ1 ðAkþ1 Þ .
(2) β2J r,m fig
xij ¼ β2J r,m
jA Ajββ
^ ð2Þ ðA AÞ:i ϕ :j
Akþ1 Akþ1 β2J r1 ,m
β β
α2I s1 ,n
β β
Blþ1 Blþ1
α α
α2I s,n
jBB jαα
ð2Þ ^ 2 = U 1 Φ2 . Here U1 is determined by where ϕ^:j is the jth column of Φ ð2Þ
(4.21) and Φ2 = ϕf j
is such that α
ð2Þ
ϕf j : =
α2I s,n fjg
ðBB Þj: ð^ vf : Þ , α
ð2Þ with v^f : representing the fth row of V^ = DV 2 , where V 2 := ðvf p Þ is fixated on (4.20).
4.3
D-Representations for the MPD-DMP-Solutions
The D-representations of the solutions (3.18), (3.19), (3.22), (3.25), (3.26), and (3.27) developed in [43] are analogous to Cramer's rule. Several cases are distinguished, depending on whether the matrices A and B are Hermitian.
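Theorem 4.13 Suppose $A \in \mathbb{H}(m)(k)_{r}$, $B \in \mathbb{H}(n)(l)_{s}$, $D \in \mathbb{H}^{m\times n}$, $\operatorname{rk}(A^{k}) = r_1$, and $\operatorname{rk}(B^{l}) = s_1$. Then the unique solution $X \in \mathbb{H}^{m\times n}$ defined by (3.18) is represented elementwise in one of the subsequent cases.
(1)
\[
x_{ij} = \frac{\sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(\tilde{\psi}^{(1)}_{i.})\bigr)^{\alpha}_{\alpha}}
{\sum_{\beta\in J_{r,m}} |A^{*}A|^{\beta}_{\beta}\, \sum_{\beta\in J_{r_1,m}} \bigl|(A^{2k+1})^{*}A^{2k+1}\bigr|^{\beta}_{\beta}\, \sum_{\alpha\in I_{s_1,n}} \bigl|B^{2l+1}(B^{2l+1})^{*}\bigr|^{\alpha}_{\alpha}\, \sum_{\alpha\in I_{s,n}} |BB^{*}|^{\alpha}_{\alpha}},
\tag{4.22}
\]
where $\tilde{\psi}^{(1)}_{i.}$ denotes the $i$th row of $\tilde{\Psi}_1 := \Psi_1 U B^{2}B^{*}$. Here, $\Psi_1 = (\psi^{(1)}_{ig})$ and $U = (u_{gp})$ are determined as follows:
\[
\psi^{(1)}_{ig} = \sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(a^{(1)}_{.g})\bigr)^{\beta}_{\beta},
\tag{4.23}
\]
\[
u_{gp} = \sum_{\alpha\in I_{s_1,n}\{p\}} \operatorname{rdet}_p\bigl((B^{2l+1}(B^{2l+1})^{*})_{p.}(\check{b}_{g.})\bigr)^{\alpha}_{\alpha},
\tag{4.24}
\]
where $\check{b}_{g.}$ is the $g$th row of $\check{B} := B(B^{2l+1})^{*}$ and $a^{(1)}_{.g}$ is the $g$th column of $A_1 := A^{*}A^{2}W_1$, and the matrix $W_1 := (w^{(1)}_{tg}) \in \mathbb{H}^{m\times n}$ satisfies
\[
w^{(1)}_{tg} = \sum_{\beta\in J_{r_1,m}\{t\}} \operatorname{cdet}_t\bigl(((A^{2k+1})^{*}A^{2k+1})_{.t}(d^{(1)}_{.g})\bigr)^{\beta}_{\beta},
\tag{4.25}
\]
where $d^{(1)}_{.g}$ is the $g$th column of $D_1 := (A^{2k+1})^{*}AD$.
(2)
\[
x_{ij} = \frac{\sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(\tilde{\phi}^{(1)}_{.j})\bigr)^{\beta}_{\beta}}
{\sum_{\beta\in J_{r,m}} |A^{*}A|^{\beta}_{\beta}\, \sum_{\beta\in J_{r_1,m}} \bigl|(A^{2k+1})^{*}A^{2k+1}\bigr|^{\beta}_{\beta}\, \sum_{\alpha\in I_{s_1,n}} \bigl|B^{2l+1}(B^{2l+1})^{*}\bigr|^{\alpha}_{\alpha}\, \sum_{\alpha\in I_{s,n}} |BB^{*}|^{\alpha}_{\alpha}},
\tag{4.26}
\]
where $\tilde{\phi}^{(1)}_{.j}$ is the $j$th column of $\tilde{\Phi}_1 := A^{*}A^{2}V\Phi_1$. Here $V = (v_{tf}) \in \mathbb{H}^{m\times m}$ and $\Phi_1 = (\phi^{(1)}_{fj})$ are determined as
\[
v_{tf} = \sum_{\beta\in J_{r_1,m}\{t\}} \operatorname{cdet}_t\bigl(((A^{2k+1})^{*}A^{2k+1})_{.t}(\hat{a}_{.f})\bigr)^{\beta}_{\beta},
\tag{4.27}
\]
\[
\phi^{(1)}_{fj} = \sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(b^{(1)}_{f.})\bigr)^{\alpha}_{\alpha},
\tag{4.28}
\]
with $\hat{a}_{.f}$ representing the $f$th column of $\hat{A} := (A^{2k+1})^{*}A$ and $b^{(1)}_{f.}$ denoting the $f$th row of $B_1 := W_2B^{2}B^{*}$. Here, $W_2 := (w^{(2)}_{fp})$ is such that
\[
w^{(2)}_{fp} = \sum_{\alpha\in I_{s_1,n}\{p\}} \operatorname{rdet}_p\bigl((B^{2l+1}(B^{2l+1})^{*})_{p.}(d^{(2)}_{f.})\bigr)^{\alpha}_{\alpha},
\tag{4.29}
\]
where $d^{(2)}_{f.}$ is the $f$th row of $D_2 := DB(B^{2l+1})^{*}$.
Proof According to (3.18), considering the D-representation (2.18) of $A^{\dagger,D} = (a^{\dagger,D}_{ij})$ as well as the representation (2.14) of the DMP inverse $B^{D,\dagger} = (b^{D,\dagger}_{ij})$, one obtains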
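\[
x_{ij} = \sum_{f=1}^{m}\sum_{g=1}^{n} a^{\dagger,D}_{if}\, d_{fg}\, b^{D,\dagger}_{gj}
= \sum_{f=1}^{m}\sum_{g=1}^{n}
\frac{\sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(\tilde{v}_{.f})\bigr)^{\beta}_{\beta}}
{\sum_{\beta\in J_{r,m}} |A^{*}A|^{\beta}_{\beta}\, \sum_{\beta\in J_{r_1,m}} \bigl|(A^{2k+1})^{*}A^{2k+1}\bigr|^{\beta}_{\beta}}\; d_{fg}
\times
\frac{\sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(\tilde{u}_{g.})\bigr)^{\alpha}_{\alpha}}
{\sum_{\alpha\in I_{s_1,n}} \bigl|B^{2l+1}(B^{2l+1})^{*}\bigr|^{\alpha}_{\alpha}\, \sum_{\alpha\in I_{s,n}} |BB^{*}|^{\alpha}_{\alpha}},
\tag{4.30}
\]
where $\tilde{v}_{.f}$ is the $f$th column of $\tilde{V} := A^{*}A^{2}V$, and $\tilde{u}_{g.}$ denotes the $g$th row of $\tilde{U} := UB^{2}B^{*}$. Here, $V = (v_{tf}) \in \mathbb{H}^{m\times m}$ and $U = (u_{gp}) \in \mathbb{H}^{n\times n}$ are defined by
\[
v_{tf} = \sum_{\beta\in J_{r_1,m}\{t\}} \operatorname{cdet}_t\bigl(((A^{2k+1})^{*}A^{2k+1})_{.t}(\hat{a}_{.f})\bigr)^{\beta}_{\beta}, \qquad
u_{gp} = \sum_{\alpha\in I_{s_1,n}\{p\}} \operatorname{rdet}_p\bigl((B^{2l+1}(B^{2l+1})^{*})_{p.}(\check{b}_{g.})\bigr)^{\alpha}_{\alpha},
\]
where $\hat{a}_{.f}$ is the $f$th column of $\hat{A} := (A^{2k+1})^{*}A$ and $\check{b}_{g.}$ stands for the $g$th row of $\check{B} := B(B^{2l+1})^{*}$.
The following two possibilities are acquired as computational algorithms based on (4.30):
Case (1).
(i) Put $\sum_{f=1}^{m} \hat{a}_{.f}\, d_{fg} = d^{(1)}_{.g}$ as the $g$th column in $D_1 := (A^{2k+1})^{*}AD$.
(ii) Construct the matrix $W_1 = (w^{(1)}_{tg}) \in \mathbb{H}^{m\times n}$ such that
\[
w^{(1)}_{tg} = \sum_{\beta\in J_{r_1,m}\{t\}} \operatorname{cdet}_t\bigl(((A^{2k+1})^{*}A^{2k+1})_{.t}(d^{(1)}_{.g})\bigr)^{\beta}_{\beta},
\]
where $d^{(1)}_{.g}$ is the $g$th column of $D_1$.
(iii) Denote $A_1 := A^{*}A^{2}W_1$ and determine $\Psi_1 = (\psi^{(1)}_{ig}) \in \mathbb{H}^{m\times n}$ such that
\[
\psi^{(1)}_{ig} = \sum_{\beta\in J_{r,m}\{i\}} \operatorname{cdet}_i\bigl((A^{*}A)_{.i}(a^{(1)}_{.g})\bigr)^{\beta}_{\beta},
\]
with $a^{(1)}_{.g}$ representing the $g$th column of $A_1$.
(iv) Then the D-representation (4.22) is verified by putting $\sum_{g=1}^{n} \psi^{(1)}_{ig}\, \tilde{u}_{g.} = \tilde{\psi}^{(1)}_{i.}$ as the $i$th row of $\tilde{\Psi}_1 := \Psi_1\tilde{U}$ in conjunction with (4.30).
Case (2).
(i) Put $\sum_{g=1}^{n} d_{fg}\, \check{b}_{g.} = d^{(2)}_{f.}$ as the $f$th row in $D_2 = DB(B^{2l+1})^{*}$.
(ii) Construct the matrix $W_2 = (w^{(2)}_{fp}) \in \mathbb{H}^{m\times n}$ such that
\[
w^{(2)}_{fp} = \sum_{\alpha\in I_{s_1,n}\{p\}} \operatorname{rdet}_p\bigl((B^{2l+1}(B^{2l+1})^{*})_{p.}(d^{(2)}_{f.})\bigr)^{\alpha}_{\alpha},
\]
where $d^{(2)}_{f.}$ is the $f$th row of $D_2$.
(iii) Denote $B_1 := W_2B^{2}B^{*}$ and determine $\Phi_1 = (\phi^{(1)}_{fj}) \in \mathbb{H}^{m\times n}$ such that
\[
\phi^{(1)}_{fj} = \sum_{\alpha\in I_{s,n}\{j\}} \operatorname{rdet}_j\bigl((BB^{*})_{j.}(b^{(1)}_{f.})\bigr)^{\alpha}_{\alpha},
\]
where $b^{(1)}_{f.}$ is the $f$th row in $B_1$.
(iv) By putting $\sum_{f=1}^{m} \tilde{v}_{.f}\, \phi^{(1)}_{fj} = \tilde{\phi}^{(1)}_{.j}$ as the $j$th column in $\tilde{\Phi}_1 := \tilde{V}\Phi_1$, (4.26) can be derived from (4.30).
The proof is complete. □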
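The two cases of the algorithm evaluate the same double sum in two different orders: Case (1) absorbs D into the left factor first, whereas Case (2) absorbs it into the right factor first. That both orders agree rests on associativity of matrix multiplication, which still holds over the noncommutative quaternions. The Python sketch below checks this for small quaternion matrices; the function names and the numerical entries are hypothetical and do not come from the chapter.

```python
# Quaternions are tuples (a, b, c, d) standing for a + b*i + c*j + d*k.

def qmul(p, q):
    """Hamilton product p*q (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qadd(p, q):
    return tuple(x + y for x, y in zip(p, q))

def matmul(X, Y):
    """Product of quaternion matrices given as lists of rows."""
    result = []
    for i in range(len(X)):
        row = []
        for j in range(len(Y[0])):
            acc = (0, 0, 0, 0)
            for t in range(len(Y)):
                # Factor order matters: entry of X on the left, entry of Y on the right.
                acc = qadd(acc, qmul(X[i][t], Y[t][j]))
            row.append(acc)
        result.append(row)
    return result

# Hypothetical 2x2 stand-ins for the left factor, the matrix D, and the right factor.
P = [[(0, 1, 0, 0), (1, 0, 2, 0)], [(0, 0, 0, 1), (3, 0, 0, 0)]]
D = [[(1, 0, 0, -1), (0, 2, 0, 0)], [(0, 0, 1, 0), (1, 1, 0, 0)]]
Q = [[(2, 0, 0, 0), (0, 0, 1, 1)], [(0, 1, 0, 0), (0, 0, 0, -1)]]

case_1 = matmul(matmul(P, D), Q)   # D absorbed on the left first
case_2 = matmul(P, matmul(D, Q))   # D absorbed on the right first
same_result = case_1 == case_2
print(same_result)
```

In practice the choice between the two cases is a matter of which intermediate matrices are cheaper to form, not of the final result.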
and B 2 ðnÞðlÞ are Hermitian, Theorem 4.14 Assume that both A 2 ðmÞðkÞ r s k m×n such that rkðA Þ = r1 and rkðBl Þ = s1 . Then and further consider D 2 m×n the unique solution X 2 defined as in (3.18) is represented in one of the following forms: (1)
α
~ ð1Þ rdetj ðB2 Þj: ω i:
α2I s,n fjg 2 β A β β2J r1 ,m
xij = β2J r,m
Akþ1
β β
α
Blþ1
α2I s1 ,n
α α
α2I s,n
B2
α α
,
ð4:31Þ
~ 1 = Ω1 V2 . Further, V2 = vð2Þ 2 n×n ~ i:ð1Þ is the ith row of Ω where ω gj ð1Þ
and Ω1 = ðωig Þ are determined as follows: ð2Þ
vgj =
cdetg β2J s1 ,n fig
ð1Þ
ωig =
Bkþ1
β2J r,m fig
β
ðkþ2Þ
b:j
:g
β
β
cdeti ðA2 Þ:i ðdð1Þ :g Þ
β
ð4:32Þ
,
ð4:33Þ
,
such that dð1Þ :g is the gth column of D1 := V1D. Here the matrix V1 = ð1Þ
ðvtf Þ 2 m×n is such that ð1Þ
vtf =
rdetf
Akþ1
α2I r1 ,m ff g
ðkþ2Þ
f:
at:
(2) ð1Þ cdeti ðA2 Þ:i υ~:j
xij = β2J r,m
β2J r,m fig β A2 β β2J r1 ,m
β Akþ1 β
α2I s1 ,n
α Blþ1 α
α α
:
ð4:34Þ
β β
α2I s,n
B2
α α
ð4:35Þ
I. I. Kyrchei et al. ð1Þ ~ 1 = V1 Υ1. Further, V1 = ðvð1Þ Þ 2 m×m where υ~:j means jth column of Υ tf ð1Þ
is defined by (4.34) and Υ1 = ðυf j Þ is determined as ð1Þ
υf j =
ð2Þ
α2I s,n fjg
rdetj ðB2 Þj: ðdf : Þ
α α
,
ð4:36Þ
ð2Þ
where df : is the fth row of D2 := DV2, and V2 is determined in (4.32). Proof According to (3.18) and taking into account the D-representations of D,{ the MPD inverse A{,D = ða{,D = ðbD,{ ij Þ and the DMP inverse B ij Þ given, respectively, by (2.20) and (2.16), it follows that ð1Þ
m
xij =
n
f =1 g=1
D,{ a{,D if d f g bgj =
m
n
β2J r,m fig
cdeti ðA2 Þ:i ðv:f Þ A2
f =1 g=1 β2J r,m
×
α2I s,n fjg
β2J s1 ,n
α2I s,n
Akþ1 β2I r1 ,m
β
df g
β β
α
rdetj ðB2 Þj: ðvð2Þ g: Þ β Bkþ1 β
β β
β
α
α B2 α
, ð4:37Þ
where ð1Þ
v:f =
rdetf
Akþ1
α2I r1 ,m ff g
vð2Þ g: =
cdetg
Bkþ1
β2J s1 ,n fig
ðkþ2Þ
f:
at:
:g
bðkþ2Þ :p
α α β β
2 n×1 ,
t = 1, . . . , m,
2 1×n ,
p = 1, . . . , n:
ð1Þ
ð2Þ , p = 1, . . ., n span the matrix The columns v:f , f = 1, . . ., m and the rows vg: ð2Þ
V1 = vtf
2 m×m and V2 = vð1Þ 2 n×n , respectively. gp
Thereafter, in view of (4.37), the following two cases are observed. m
In the case (1), and build Ω1 =
f =1 ð1Þ ψ ig
ð1Þ
v:f d f g = dð1Þ :g denotes gth column inside D1 := V1D 2 m×n such that
Quaternion Two-Sided Matrix Equations with Specific Constraints ð1Þ
ωig = n
Using
g=1
β2J r,m fig
cdeti ðA2 Þ:i ðdð2Þ :g Þ
β β
:
ð1Þ ~ ~ ð1Þ ωig vð2Þ i: as ith row of Ω1 = Ω1 V2 , (4.31) is derived from g: = ω
(4.37). n
For the case (2), we put
ð2Þ
ð2Þ d f g vg: = df : as the fth row of D2 := DV2 and
g=1 ð1Þ Υ1 = υf j
determine the matrix
2 m×n such that
ð1Þ
υf j = m
Using f =1
ð2Þ
α2I s,n fjg
rdetj ðB2 Þj: ðdf : Þ
α α
:
ð1Þ ð1Þ ð1Þ ~ 1 : = V1 Υ, (4.35) is derived v:f υf j = υ~:j as the jth column of Υ
□
on the basis of (4.37). Also two mixed cases are considered.
Theorem 4.15 Let A 2 ðmÞðkÞ be Hermitian, B 2 ðnÞðlÞ be arbitrary, r s D 2 m×n , rkðAk Þ = r 1 , and rkðBl Þ = s1 . Then the unique solution X 2 m×n defined by (3.18) is expressed componentwise in one of the following occurrences: (1) ^ ð1Þ rdetj ðBB Þj: ω i:
α2I s,n fjg
xij = β2J r,m
β A2 β
β2J r1 ,m
β Akþ1 β
α2I s1 ,n
B2lþ1 B2lþ1
α α α α
α2I s,n
jBB jαα
ð4:38Þ
ð1Þ 2 ^ ^ð1Þ where ω i: is the i-th row of Ω1 = Ω1 UB B , and Ω1 = ðωig Þ and U = (ugp) are determined by (4.33) and (4.24), respectively.
(2) ^ð1Þ cdeti ðA2 Þ:i ϕ :j
xij = β2J r,m
β2J r,m fig β A2 β β2J r1 ,m
β Akþ1 β
α2I s1 ,n
α Blþ1 α
β β
α2I s,n
B2
α α
^ð1Þ means jth column in Φ ^ 1 := V2 Φ1 . Here V2 = ðvð2Þ Þ 2 m×m where ϕ :j tf ð1Þ
and Φ1 = ðϕf j Þ are defined by (4.32) and (4.28), respectively.
Theorem 4.16 Let A 2 ðmÞðkÞ be arbitrary, B 2 sðnÞðlÞ be Hermitian, r D 2 m×n , rkðAk Þ = r 1 , and rkðBl Þ = s1 . The unique solution X 2 m×n defined in (3.18) can be represented componentwise by one of the following cases: (1) α2I s,n fjg
xij = β2J r,m
jA Ajββ
A
α
ð1Þ rdetj ðB2 Þj: ψ^i: 2kþ1
β2J r1 ,m
β A2kþ1 β
α2I s1 ,n
α
Blþ1
α α
α2I s,n
B2
α α
ð4:39Þ
ð1Þ ^ 1 := Ψ1 V2 and V2 = vð2Þ 2 n×n , where ψ^i: denotes the ith row of Ψ gj ð1Þ
and Ψ1 = ðψ ig Þ are determined by (4.32) and (4.23), respectively. (2)
β
ð1Þ cdeti ðA AÞ:i υ^:j
β
β2J r,m fig
xij = β2J r,m
jA Ajββ
A2kþ1 A2kþ1 β2J r1 ,m
β β
α2I s1 ,n
Blþ1
α α
α2I s,n
B2
α α
ð1Þ ^ 1 := A A2 VΥ1. Here, V = ðvtf Þ 2 m×m and where υ^:j is jth column of Υ ð1Þ
Υ1 = ðυf j Þ are determined by (4.27) and (4.36), respectively. D -representations for solutions (3.19) and (3.22) are developed Corollary 4.17. Corollary 4.17 Let A 2 ðmÞðkÞ , D 2 m×n , and rkðAk Þ = r 1 . Under these r assumptions, the unique solution X 2 m×n of the form (3.19) is defined elementwise by one of the following occurrences: (i) If A is arbitrary, then ð1Þ
xij =
β2J r,m fig β2J r,m
ð1Þ
cdeti ðA AÞ:i ða:j Þ
jA Ajββ
β β
A2kþ1 A2kþ1 β2J r1 ,m
β
,
β
where a:j is the jth column of A1 := AA2W1 and W1 is determined by (4.25). (ii) If A is Hermitian, it follows
Quaternion Two-Sided Matrix Equations with Specific Constraints β
cdeti ðA2 Þ:i ðdð1Þ :g Þ
xij =
β
β2J r,m fig β2J r,m
β A2 β
β2J r1 ,m
β Akþ1 β
,
where dð1Þ :g is the gth column of D1 := V1D, and V1 is defined by (4.34). satisfying rkðBl Þ = s1 and D 2 m×n . Corollary 4.18 Consider B 2 ðnÞðlÞ s The unique solution X 2 m×n defined in (3.22) is expressed in one of the subsequent two assertions. (i) If B is arbitrary, then α
ð1Þ
α2I s,n fjg
xij =
α2I s1 ,n
rdetj ðBB Þj: ðbi: Þ
B2lþ1 B2lþ1
α α
α2I s,n
α
,
jBB jαα
,
ð1Þ
where bi: denotes ith row of B1 := W2B2B and W2 is defined as in (4.29). (ii) In the case when B is Hermitian, xij is equal to α
ð2Þ
rdetj ðB2 Þj: ðdi: Þ
xij =
α2I s,n fjg
α2I s1 ,n
Blþ1
α α
α2I s,n
B2
α
α α
, ,
ð2Þ
where di: is the ith row of D2 := DV2. Here V2 is obtained in (4.32) In Theorem 4.19, we develop D-representations of the solution (3.25) and its manifest forms (3.26) and (3.27). Individual situations are observable, due to the presence or absence of Hermicity of A and B. , B 2 ðnÞðlÞ , D 2 m×n , Theorem 4.19 Let us suppose A 2 ðmÞðkÞ r s rkðAk Þ = r1 and rkðBl Þ = s1 . Under these assumptions, the unique solution X 2 m×n of the form (3.25) is represented by the expression d~ij
xij ¼ α2I r1 ,m
α A2kþ1 A2kþ1 α
α2I r,m
jAA jαα
β2J s,n
jB Bjββ
B2lþ1 B2lþ1 β2J s1 ,n
~ = ðd~ij Þ = Ψ2 DΦ2 . The matrices Ψ2 = ψ where D gj
ð2Þ
determined as follows:
β β
, ð4:40Þ
ð2Þ
and Φ2 = ϕif
are
I. I. Kyrchei et al. ð2Þ
ψ if =
α2I r,m ff g
ð2Þ
ϕgj =
β2J s,n fgg
rdetf ðAA Þf : ð~ ui: Þ
cdetg ðB BÞ:g ð~v:j Þ
α α
β β
,
ð4:41Þ
,
ð4:42Þ
~ : = UA2 A , while ~v:j stands for jth column of ~i: means ith row in U where u ~ : = B B2 V . The matrices U and V are defined by (2.15) and (2.19), V respectively. Proof According to (3.25) and the D-representations for the DMP inverse {,D AD,{ = ðaD,{ = ðb{,D ij Þ and the MPD inverse B ij Þ, respectively, by (2.14) and (2.18), it is concluded m
xij =
n
f =1 g=1
×
{,D aD,{ if d f g bgj
=
α2I r1 ,m
β2J s,n fgg β2J s,n
α2I r,m ff g
jB Bjββ
2lþ1
B
α α
A2kþ1 A2kþ1
cdetg ðB BÞ:g ð~ v:j Þ
α2I r,m
α
df g
jAA jαα
β β
B2lþ1
β2J s1 ,n
α
rdetf ðAA Þf : ð~ ui: Þ
β β
, ð4:43Þ
~ := UA2 A , and ~v:j is jth column in ~i: indicates ith row of U so that u ~ := B B2 V. Here U is determined by (2.15), and V = ðvf j Þ 2 n×n is defined V by vf j =
cdetf
B2lþ1 B2lþ1
β2J s1 ,n ff g
:f
^:j Þ ðb
β β
,
ð4:44Þ
^ := ðB2lþ1 Þ B. ^:j denotes jth column of B wherein b Since an ℜ-determinant possesses the left distributivity and a ℭ-determinant fulfills the right distributivity only, then a convolution in (4.43) (similar as in Theorem 4.13) is not accessible. So, it is necessary to create the matrices ð2Þ
Ψ2 = ψ gj
ð2Þ
and Φ2 = ϕif
such that
Quaternion Two-Sided Matrix Equations with Specific Constraints ð2Þ
ψ if =
α
α2I r,m ff g
ð2Þ
ϕgj =
α
rdetf ðAA Þf : ð~ ui: Þ
β2J s,n fgg
β
cdetg ðB BÞ:g ð~v:j Þ
β
, :
~ := Ψ2 DΦ2 . Then, (4.40) can be obtained by putting D
□
Theorem 4.20 Let A 2 ðmÞðkÞ and B 2 ðnÞðlÞ be Hermitian and satisfy r s k m×n l rkðB Þ = s1 , and assume D 2 , rkðA Þ = r 1 . The unique matrix X 2 m×n adopted as in (3.25) is represented as in one of the subsequent two scenarios. (1)
α
~ð2Þ rdetj ðBlþ1 Þj: ω i:
xij = β2J r,m
α2I s,n fjg β A2 β β2J r1 ,m
Akþ1
β β
α2I s1 ,n
Blþ1
α α α
α2I s,n
B2
α α
,
ð4:45Þ
ð2Þ ~ ~ ð2Þ 2 n×n and where ω i: the ith row of Ω2 := Ω2 U2 . Here, U2 = ugj ð2Þ
Ω2 = ðωig Þ are determined as follows: ð2Þ
ugj =
β2J s,n fgg
ð1Þ
ð2Þ
ωig =
ðlþ2Þ cdetg ðB2 Þ:g ðb:p Þ
β2J r,m fig
^ Þ cdeti ðAkþ1 Þ:i ðd :g
β β β β
, ð4:46Þ
,
^ð1Þ is the gth column of D ^ 1 := U1 D. Here, the entries of the matrix where d :g ð1Þ
U1 = ðutf Þ 2 m×n are defined by ð1Þ
utf =
ðkþ2Þ
α2I r,m ff g
rdetf ðA2 Þf : ðat:
Þ
α α
:
ð4:47Þ
I. I. Kyrchei et al. β
ð2Þ cdeti ðAkþ1 Þ:i υ~:j
(2) xij = β2J r,m
β2J r,m fig 2 β A β β2J r1 ,m
β β
Akþ1
α2I s1 ,n
Blþ1
β α α
B2
α2I s,n
α α
,
ð4:48Þ
~ 2 := U1 Υ2 . Here U1 = ðu Þ 2 m × m is where υ~:j is the jth column of Υ tf ð2Þ
ð1Þ
ð2Þ
obtained in (4.47), and Υ2 = ðυf j Þ is expressed with ð2Þ
υf j =
α2I s,n fjg
^ð2Þ Þ rdetj ðBlþ1 Þj: ðd f:
α α
,
^ 2 := DU2, and U2 = uð2Þ is determined by ^ð2Þ is the fth row of C where d gp f: (4.46). Proof According to (3.25), in combination with the D-representation (2.17) of the DMP inverse AD,{ = ðaD,{ ij Þ as well as the representation (2.21) of the
MPD inverse B{,D = ðb{,D ij Þ, it follows m
n D,{ a{,D if d f g bgj
xij ¼ f ¼1
g¼1
m
n
f ¼1
g¼1
¼
×
β2J r1 ,m fig
α2I s1 ,n fjg
Akþ1
cdeti
Akþ1 β2J r1 ,m
rdetj
α2I s,
n
Blþ1
α Blþ1 α
j:
α2I s1 ,n
β β
A2 β2J r,m
uðg:2Þ α B2 α
β
ð1Þ
u:f
: i
β
df g
β β
ð4:49Þ
α α
,
where ð1Þ
u:f = uð2Þ g: =
ðkþ2Þ
rdetf ðA2 Þf : ðat:
Þ
α2I r,m ff g
β2J s,n fgg
cdetg ðB2 Þ:g ðbðlþ2Þ Þ :p
α α β β
2 m×1 ,
t = 1, . . . , m,
2 1×n ,
p = 1, . . . , n:
ð1Þ
Exploiting columns u:f , f = 1, . . ., m and rows uð2Þ g: , p = 1, . . ., n, we build ð1Þ
2 m×m and U2 = uð2Þ 2 n×n , respectively. gp
U1 = utf
Then, from (4.49), we observe the following two possibilities. m
ð1Þ
^ as the gth column of D ^ 1 := U1 D u:f d f g = d :g ð1Þ
In the case (1), we set f =1
ð2Þ
and determine the matrix Ω2 = ωig ð2Þ
ωig = n
By putting g=1
2 m×n such that
ð1Þ cdeti ðAkþ1 Þ:i ð^c:g Þ
β2J r,m fig
β β
:
ð2Þ ~ ~ ð2Þ ωig uð2Þ i: as the i-th row of Ω2 = Ω2 U2 , from (4.49), g: = ω
it follows (4.45). n
For the case (2), we put g=1
ð2Þ
^ ^ df g uð2Þ g: = df : as the fth row of D2 := DU2 and
~ 2 = υð2Þ 2 m×n such that form the matrix Υ fj ð2Þ
υf j =
ð2Þ
α2I s,n fjg
rdetj ðBlþ1 Þj: ð^cf : Þ m
Then (4.48) is generated using f =1
~ 2 := U1 Υ2 . Υ
ð1Þ ð2Þ
ð2Þ
u:f υf j = υ~:j
α α
:
as the jth column of □
The subsequent mixed manifestations follow from Theorems 4.19 and 4.20. Theorem 4.21 Assume that A 2 ðmÞðkÞ is Hermitian, B 2 sðnÞðlÞ is arbir trary, D 2 m×n , rkðAk Þ = r1 , and rkðBl Þ = s1 . Then the unique solution X 2 m×n of the form (3.25) is given componentwise as follows: ~ð1Þ cdeti ðAkþ1 Þ:i d :j
β2J r,m fig
xij = β2J r,m
β A2 β
β2J r1 ,m
β Akþ1 β
α2I s1 ,n
β β
α B2lþ1 B2lþ1 α
α2I s,n
jBB jαα
,
~ð1Þ is jth column of D ~ 1 := U1 DΨ2. Here Ψ 2 and U1 are determined by where d :j (4.41) and (4.47), respectively. Theorem 4.22 Suppose A 2 ðmÞðkÞ is arbitrary, B 2 sðnÞðlÞ is Hermitian, r D 2 m×n , as well as rkðAk Þ = r1 and rkðBl Þ = s1 . Then X 2 m×n given by (3.25) possesses the characterization α2I s,n fjg
xij = β2J r,m
jA Ajββ
α
~ð2Þ rdetj ðBlþ1 Þj: d i:
A
2kþ1
β2J r1 ,m
β A2kþ1 β
α
α2I s1 ,n
Blþ1
α α
α2I s,n
B2
α α
,
~ð2Þ is the ith row of D ~ 2 := Φ2 DU2. Here Φ2 and U1 are expressed by where d i: (4.42) and (4.47), respectively. Corollary 4.23 Let A 2 ðmÞðkÞ , D 2 m×n , and rkðAk Þ = r1 . Then X 2 r m×n given by (3.26) is defined as in one of the subsequent scenarios. (i) If A is an arbitrary, it follows ð1Þ dij
xij = α2I r1 ,m
A2kþ1 A2kþ1
α α
α2I r,m
jAA jαα
,
1 = ðdð1Þ Þ = Ψ2 D, and Ψ2 = ψ ð2Þ is determined by (4.41). where D gj ij (ii) If A is Hermitian, it follows
xij =
β2J r,m fig
^ð1Þ Þ cdeti ðAkþ1 Þ:i ðd :j A2
β2J r,m
β β
Akþ1 β2J r1 ,m
β β
β β
,
^ð1Þ is the gth column of D ^ 1 = U1 D, and U1 is defined by (4.47). where d :j Corollary 4.24 Consider B 2 ðnÞðlÞ satisfying rkðBl Þ = s1 and D 2 m×n . s m×n The solution X 2 defined by (3.27) is represented as in one of the subsequent situations. (i) In the case of arbitrary B, it follows xij =
ð2Þ dij β β2J s,n jB Bjβ
β2J s1 ,n
B2lþ1 B2lþ1
β β
,
2 := DΦ2 = ðdð2Þ Þ, and Φ2 = ϕð2Þ is obtained in (4.41). where D ij if
(ii) In the case of Hermitian B, it is derived ð2Þ
^ Þ rdetj ðB2 Þj: ðd i:
xij =
α2I s,n fjg
Blþ1
α2I s1 ,n
α α
α2I s,n
B2
α α
,
α α
,
^ 2 := DU2 , and U2 is obtained in (4.46). ^ð2Þ is the ith row of D where d i:
5 Illustrative Examples
To illustrate the derived results and representations, the following examples are presented.
Example Consider the restricted Q-matrix approximation problem (2.3) with input matrices
A=
-k
-j
0
i
-1-j
iþk
j
1þj
k
0
i
0
-i þ k
1-j
i
i-k
0 k ,B=
i 0
0 j
0 -j ,D=
i
0
j
-k
j
0
1
k
i
0
-i
k
0
:
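(5.1)
One can find
\[
A^{3}(A^{3})^{*} =
\begin{bmatrix}
3 & 6i+4k & -4-3j & -4-6j\\
-6i-4k & 19 & 4i+13k & 19k\\
-4+3j & -4i-13k & 10 & 13+4j\\
-4+6j & -19k & 13-4j & 19
\end{bmatrix},
\qquad
(B^{2})^{*}B^{2} =
\begin{bmatrix}
2 & 0 & 2k\\
0 & 2 & 0\\
-2k & 0 & 2
\end{bmatrix}.
\]
Since $\operatorname{rk}(A) = \operatorname{rk}(A^{*}A) = 3$, $\operatorname{rk}(A^{3}) = \operatorname{rk}(A^{2}) = 2$, and $\operatorname{rk}(B^{2}) = \operatorname{rk}(B) = 2$, then $k_1 = \operatorname{Ind}(A) = 2$ and $k_2 = \operatorname{Ind}(B) = 1$.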
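The two Hermitian matrices above supply the normalizing constants used at the end of this example, namely the sums of their principal minors of order 2. As a quick check, the following Python sketch recomputes these sums. It uses the fact that a 2×2 principal minor of a Hermitian quaternion matrix equals $h_{pp}h_{qq} - |h_{pq}|^{2}$; the function names are hypothetical, and the entries are transcribed from the matrices displayed above.

```python
# Quaternions are tuples (a, b, c, d) standing for a + b*i + c*j + d*k.

def norm_sq(q):
    """Squared norm |q|^2 of a quaternion."""
    return sum(x * x for x in q)

def sum_principal_minors_order2(H):
    """Sum of all 2x2 principal minors of a Hermitian quaternion matrix H.

    For Hermitian H the diagonal is real, and each 2x2 principal minor
    reduces to h_pp * h_qq - |h_pq|^2.
    """
    n = len(H)
    total = 0
    for p in range(n):
        for q in range(p + 1, n):
            total += H[p][p][0] * H[q][q][0] - norm_sq(H[p][q])
    return total

# A^3 (A^3)^* as displayed above.
G = [
    [(3, 0, 0, 0), (0, 6, 0, 4), (-4, 0, -3, 0), (-4, 0, -6, 0)],
    [(0, -6, 0, -4), (19, 0, 0, 0), (0, 4, 0, 13), (0, 0, 0, 19)],
    [(-4, 0, 3, 0), (0, -4, 0, -13), (10, 0, 0, 0), (13, 0, 4, 0)],
    [(-4, 0, 6, 0), (0, 0, 0, -19), (13, 0, -4, 0), (19, 0, 0, 0)],
]

# (B^2)^* B^2 as displayed above.
H = [
    [(2, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 2)],
    [(0, 0, 0, 0), (2, 0, 0, 0), (0, 0, 0, 0)],
    [(0, 0, 0, -2), (0, 0, 0, 0), (2, 0, 0, 0)],
]

sum_G = sum_principal_minors_order2(G)
sum_H = sum_principal_minors_order2(H)
print(sum_G, sum_H)  # 25 8
```

The values 25 and 8 agree with the sums quoted later in the example, and their product 200 is the common denominator of the entries of X.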
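Hence,
\[
\hat{A} = A^{2}(A^{3})^{*} =
\begin{bmatrix}
3k & -4+6j & 3i-4k & 6i-4k\\
3+4j & 13i-4k & -10j & 4-13j\\
i-3k & 2-9j & -6i+3k & -9i+2k\\
4i-3k & -4-13j & -10i & -13i-4k
\end{bmatrix},
\]
and by (4.2)
\[
\phi_{11} = \sum_{\alpha\in I_{2,4}\{1\}} \operatorname{rdet}_1\bigl((A^{3}(A^{3})^{*})_{1.}(\hat{a}_{1.})\bigr)^{\alpha}_{\alpha}
= \operatorname{rdet}_1\begin{bmatrix} 3k & -4+6j\\ -6i-4k & 19 \end{bmatrix}
+ \operatorname{rdet}_1\begin{bmatrix} 3k & 3i-4k\\ -4+3j & 10 \end{bmatrix}
+ \operatorname{rdet}_1\begin{bmatrix} 3k & 6i-4k\\ -4+6j & 19 \end{bmatrix}
= 15k.
\]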
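The value of $\phi_{11}$ can be rechecked with a few lines of quaternion arithmetic. The Python sketch below assumes the 2×2 row determinant $\operatorname{rdet}_1 = a_{11}a_{22} - a_{12}a_{21}$ with the factors kept in exactly this order; the helper names are hypothetical, and the entries are transcribed from the three minors in the example.

```python
# Quaternions are tuples (a, b, c, d) standing for a + b*i + c*j + d*k.

def qmul(p, q):
    """Hamilton product p*q (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qadd(p, q):
    return tuple(x + y for x, y in zip(p, q))

def qsub(p, q):
    return tuple(x - y for x, y in zip(p, q))

def rdet1_2x2(m):
    """Assumed 2x2 row determinant along row 1: a11*a22 - a12*a21."""
    (a11, a12), (a21, a22) = m
    return qsub(qmul(a11, a22), qmul(a12, a21))

# The three minors for alpha = {1,2}, {1,3}, {1,4}: row 1 comes from the
# first row of A^2 (A^3)^*, row 2 from the matching row of A^3 (A^3)^*.
minors = [
    (((0, 0, 0, 3), (-4, 0, 6, 0)), ((0, -6, 0, -4), (19, 0, 0, 0))),
    (((0, 0, 0, 3), (0, 3, 0, -4)), ((-4, 0, 3, 0), (10, 0, 0, 0))),
    (((0, 0, 0, 3), (0, 6, 0, -4)), ((-4, 0, 6, 0), (19, 0, 0, 0))),
]

phi_11 = (0, 0, 0, 0)
for minor in minors:
    phi_11 = qadd(phi_11, rdet1_2x2(minor))

print(phi_11)  # (0, 0, 0, 15), i.e. 15k
```

Each of the three minors contributes 5k. Swapping the factors in the second product would give a different result, so the order of multiplication in the determinant is essential.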
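By the same reasoning, we come to the conclusion
\[
\Phi =
\begin{bmatrix}
15k & 5j & -10i & 5i\\
-10 & 5i & -15j & -5j\\
-15i+10k & 5-5j & -15i-10k & -5i+5k\\
10k & -5j & -15i & -5i
\end{bmatrix}.
\]
Similarly,
\[
\check{B} = (B^{2})^{*}B =
\begin{bmatrix}
0 & -2i & 0\\
i-k & 0 & 1-j\\
0 & 2j & 0
\end{bmatrix}.
\]
By (4.3),
\[
\psi_{11} = \sum_{\beta\in J_{2,3}\{1\}} \operatorname{cdet}_1\bigl(((B^{2})^{*}B^{2})_{.1}(\check{b}_{.1})\bigr)^{\beta}_{\beta}
= \operatorname{cdet}_1\begin{bmatrix} 0 & 0\\ i-k & 2 \end{bmatrix}
+ \operatorname{cdet}_1\begin{bmatrix} 0 & 2k\\ 0 & 2 \end{bmatrix}
= 0,
\]
and
\[
\Psi =
\begin{bmatrix}
0 & -4i & 0\\
4i-4k & 0 & 4-4j\\
0 & 4j & 0
\end{bmatrix}.
\]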
Since
\[
\tilde{D} = \Phi D\Psi =
\begin{bmatrix}
-40i-40k & -40+40j & 40+40j\\
60-60j & -60i-60k & -60i+60k\\
-20i-100k & -20+100j & 100+20j\\
-60i-60k & -60+60j & 60+60j
\end{bmatrix}
\]
and
\[
\sum_{\alpha\in I_{2,4}} \bigl|A^{3}(A^{3})^{*}\bigr|^{\alpha}_{\alpha} = 25, \qquad
\sum_{\beta\in J_{2,3}} \bigl|(B^{2})^{*}B^{2}\bigr|^{\beta}_{\beta} = 8,
\]
then by (4.1),
\[
X =
\begin{bmatrix}
-0.2i-0.2k & -0.2+0.2j & 0.2+0.2j\\
0.3-0.3j & -0.3i-0.3k & -0.3i+0.3k\\
-0.1i-0.5k & -0.1+0.5j & 0.5+0.1j\\
-0.3i-0.3k & -0.3+0.3j & 0.3+0.3j
\end{bmatrix}
\]
is the unique solution to the constrained Q-matrix approximation problem (2.3) with the constant matrices given in (5.1).
Example Let us derive Cramer's rule for solving (3.10) with the matrix A given in (5.1) and
B=
4k
4i
- 5i
- 2j
2k
3
i
-1
, D=
-i
0
-1
0
-k
0
k
0
j
0
-j
0
k
:
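It follows that $\operatorname{rk}(A) = 3$, $\operatorname{rk}(A^{3}) = \operatorname{rk}(A^{2}) = 2$, $\operatorname{rk}(B) = 2$, and $\operatorname{rk}(B^{2}) = \operatorname{rk}(B^{3}) = 1$. Additionally, $k = \operatorname{Ind}(A) = 2$ and $l = \operatorname{Ind}(B) = 2$. Based on Theorem 4.5 in the case (4.6), Cramer's rule for the solution (3.10) is expressed in the following way.
1. Compute $\tilde{A} = A^{*}A^{3}(A^{3})^{*}$, $\hat{B} = (B^{3})^{*}B^{3}B^{*}$, and some others. Consequently,
\[
\tilde{A} =
\begin{bmatrix}
7i+27j & -55+48j & 21i-48k & 48i-55k\\
-20+7j & -34i-44k & 37+14j & 44+34j\\
12i-15k & -4-51j & -36i+8k & -51i-4k\\
-7i-20k & 44-33j & 14i+37k & -34i+44k
\end{bmatrix},
\qquad
A^{*}A =
\begin{bmatrix}
3 & i-4k & -2-3j & -4+j\\
-i+4k & 5 & 2i-2k & -5k\\
-2+3j & -2i+2k & 3 & 2-2j\\
-4-j & 5k & 2+2j & 5
\end{bmatrix},
\]
\[
BB^{*} =
\begin{bmatrix}
57 & -31i & -13j\\
31i & 17 & -7k\\
13j & 7k & 3
\end{bmatrix},
\qquad
(B^{3})^{*}B^{3} =
\begin{bmatrix}
11 & 11i & 11j\\
-11i & 11 & -11k\\
-11j & 11k & 11
\end{bmatrix},
\qquad
\hat{B} =
\begin{bmatrix}
-143k & 77j & -33i\\
-143j & -77k & -33\\
-143i & 77 & -33k
\end{bmatrix}.
\]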
2. Based on (4.8), the following is valid:
U1 =
25k
- 10 þ 25j
0
- 25i - 10k
- 25 þ 25j
- 20i - 25k
- 25j
25 - 20j
- 25k
10j
- 25i
- 10i
25i - 25k
25 þ 20j
- 25i
- 20i þ 25k
,
and the direct calculation gives
~ 1 = U1 D = U
25j
- 35i þ 35k
- 25k
- 50i - 25k
- 45 - 45j
50 - 25j
0
- 10i þ 10k
0
- 25
5i - 5k
- 25i
:
^ Since rk ðB3 Þ B3 = 1, then, by (4.7), V2 = B. 3. Compute the matrix
~ 1 ¼ Ψ1 V2 Ψ
¼
þ
35750
43890
- 8250
- 317460
92400
- 12210
- 50050
- 20790
- 11500
178750
- 6160
- 34650
þ
- 35750
- 16170
- 8250
- 52910
- 77000
73260
- 50050
- 60830
- 150150
- 140910
and the values
81510
- 19250
6930
171610
170940
33000
- 38610
26950
26070
- 11440
i - 96250 60390
30030
- 19250
- 18810
143000
- 28490
- 39600
11550
112970
- 26950
8910
- 41250
261690
- 80850
2640
jþ
k,
β2J 3,4
jA Ajββ = 4,
α
B3 B3 α = 33,
α2I 1,3
α2I 2,3
jBB jαα = 10:
4. Finally, by (4.6), we have
X¼
þ
0:33
- 2:96
- 0:47
1:67
- 0:62
- 1:6
0:36
0:11
- 0:17
- 0:50
- 0:35
- 1:4
- 0:33
- 0:49
- 0:47
- 1:4
0:28
1:33
1:05
2:44
- 0:17
2:96
- 0:47
- 1:67
þ
0:62
1:6
- 0:36
- 0:11
0:33
- 2:96
- 0:47
1:67
0:14
1:33
1:05
2:44
- 1:33 1:05 2:44
0:28 jþ
iþ
0:33
0:49
- 0:38
0:46
1:4
k:
- 1:63 0:36 0:11
Example Consider Cramer’s rule of the solution (3.18) with A given in (5.1), and
B=
0
i
-i
0
0
k
0 -k , D= 0
-i
0
-1
0
-k
0
k
0
j
0
-j
0
:
Because A is non-Hermitian and B is Hermitian, Theorem 4.16 is applicable. It follows rkðAÞ = 3, rkðA3 Þ = rkðA2 Þ = 2, rkðB2 Þ = rkðBÞ = 2. In addition, k = IndðAÞ = 2 and l = IndðBÞ = 1. Due to Theorem 4.16 with the case (4.39), there is the following algorithm to find Cramer’s rule of the solution (3.18).
^ = ðA5 Þ A and B2, which gives 1. Calculate A
^ = ðA5 Þ A = A
ðA5 Þ A5 =
B2 =
16 - 2j
i - 12k
- 8 - 9j
- 12 þ j
- 13i þ 4k
3 - 10j
8i þ 6k
10i - 3k
- 2 þ 3j
- 2i þ 2k
3
2 - 2j
- 4 - 13j 50
10i þ 3k 41i - 8k
- 6 þ 8j - 8 - 9j
3 þ 10j - 8 þ 41j
- 41i þ 8k
35
8i þ 6k
- 35k
- 8 þ 9j
- 8i - 6k
3
- 6 - 8j
- 8 - 41j 1 0 j
35k
- 6 þ 8j 0 2i 0
35
0
2 0 , B3 =
-j
0 1
,
,
- 2k :
- 2i
0
0
2k
0
2. By direct calculation, find
D1 := ðA5 Þ AD =
7i - 6k
- 11 þ 13j
- 7 - 6j
7 - 4j
7i - 13k
7i þ 4k
- 2i
- 4j
2
4i þ 7k
13 þ 7j
- 4 þ 7j
3. By (4.25), it follows
W1 =
- 15i þ 20k
15 - 35j
15 þ 20j
20 - 10j
- 36k
20i þ 10k
30i þ 35k
73 þ 15j
- 30 þ 35j
10i þ 20k
36
- 10 þ 10j
:
:
4. Further, compute
A1 := A A2 W1 =
- 50 þ 100j
- 84i þ 174k
- 50i - 160k
- 100i þ 25k
- 59 - 143j
140 þ 15j
- 75 - 100j
205i þ 9k
- 105i þ 120k
- 25 - 100j
143i - 59k
- 15i þ 140k
:
5. By (4.32) and (4.23), find, respectively,
V2 ¼
0
4i
- 4i
0
0
4k
Ψ1 ¼
0 - 4k , 0
- 100j
148i - 60k
60k
0
0
0
- 200 - 200j
432i þ 120k
- 200i þ 200k
0
0
0
,
and compute ^ 1 ¼ V2 Ψ1 Ψ 592 þ 240j ¼
- 240 þ 400k
- 240 þ 592j
0
0
0
1728 - 480j
- 800 - 800i - 800j þ 800k
480 þ 1728j
0
0
0
6. Since β2J 3,4
jA Ajββ = 4,
β
β2J 2,4
A5 A5 β = 25, and
:
α
α2I 2,3
B2 α = 4, then, by
(4.39), finally, we obtain
X=
0:74 þ 0:3j
- 0:3 þ k
- 0:3 þ 0:74j
0
0
0
2:16 - 0:6j
- 0:5 - 0:5i - 0:5j þ 0:5k
0:6 þ 2:16j
0
0
0
:
6 Conclusion
Our principal results concern the solution of the restricted quaternion two-sided matrix equation AXB = D and of the approximation problems related to it. The considered approximation problems and the quaternion two-sided matrix equations with specific constraints are expressed in terms of the core-EP inverse and the dual core-EP inverse, the MPCEP and CEPMP inverses, and the DMP and MPD inverses. The MPCEP-CEPMP inverses and the DMP-MPD inverses are generalized inverses that combine the Moore-Penrose (MP-)inverse with the core-EP (CEP-)inverse and the MP-inverse with the Drazin (D-)inverse, respectively. Within the framework of the theory of noncommutative row-column determinants recently developed by one of the authors, we derive Cramer's rules for solving these constrained quaternion matrix equations and approximation problems based on determinantal representations of generalized inverses. Some numerical examples are given to illustrate the obtained results.
Acknowledgements Ivan I. Kyrchei thanks the Erwin Schrödinger Institute for Mathematics and Physics (ESI) at the University of Vienna for the support given by the Special Research Fellowship Programme for Ukrainian Scientists. Dijana Mosić and Predrag Stanimirović are supported by the Ministry of Education, Science and Technological Development, Republic of Serbia, Grant 451-03-47/2023-01/200124. Predrag S. Stanimirović is supported by the Science Fund of the Republic of Serbia (No. 7750185, Quantitative Automata Models: Fundamental Problems and Applications – QUAM).
References 1. Adler, L. S. (1995). Quaternionic quantum mechanics and quantum fields. New York: Oxford University Press 2. Aslaksen, H. (1996). Quaternionic determinants. The Mathematical Intelligencer, 18(3), 57–65 3. Bai, Z. Z., Deng, B. Y., & Gao, H. Y. (2006). Iterative orthogonal direction methods for Hermitian minimum norm solutions of two consistent matrix equations. Numerical Linear Algebra with Applications, 13, 801–823 4. Baksalary, M. O., & Trenkler, G. (2010). Core inverse of matrices. Linear and Multilinear Algebra, 58(6), 681–697 5. Bapat, B. R., K.P.S. Bhaskara, & Manjunatha Prasad, K. (1990). Generalized inverses over integral domains. Linear Algebra and Its Applications, 140, 181–196 6. Bapat, B. R. (1994). Generalized inverses with proportional minors. Linear Algebra and Its Applications, 211, 27–35 7. Ben-Israel, A., & Grevile, T. N. E. (2003). Generalized inverses, theory and applications (2nd ed.). Canadian Mathematical Society. New York: Springer
8. Le Bihan, N., & Sangwine, J. S. (2003). Quaternion principal component analysis of color images. In IEEE International Conference on Image Processing (ICIP), Barcelona, Spain 9. Cai, J., & Chen, G. (2007). On determinantal representation for the generalized ð2Þ inverse AT,S and its applications. Numerical Linear Algebra with Applications, 14, 169–182 10. Caiqin, S., & Guolian, C. (2012). On solutions of quaternion matrix equations XF AX = BY and XF - Ã = BY. Acta Mathematica Scientia, 32B(5), 1967–1982 11. Campbell, L. S., & Meyer, D. C. (1979). Generalized inverses of linear transformations. London: Pitman 12. Chen, L. J., Mosić, D., & Xu, Z. S. (2020). On a new generalized inverse for Hilbert space operators. Quaestiones Mathematicae, 43, 1331–1348 13. Chu, E. K. (1987). Singular value generalized singular value decompositions and the solution of linear matrix equations. Linear Algebra and Its Applications, 88/89, 83–98 14. Cohen, N., & De Leo, S. (2000). The quaternionic determinant. Electronic Journal of Linear Algebra, 7, 100–111 15. De Leo, S., & Scolarici, G. (2000). Right eigenvalue equation in quaternionic quantum mechanics. Journal of Physics A, 33, 2971–2995 16. Fan, J. (2003). Determinants and multiplicative functionals on quaternion matrices. Linear Algebra and Its Applications, 369, 193–201 17. Fan, X., Li, Y., Liu, Z., & Zhao, J. (2022). Solving quaternion linear system based on semi-tensor product of quaternion matrices. Symmetry, 14, 1359 18. Ferreyra, E. D., Levis, E. F., & Thome, N. (2018). Revisiting the core EP inverse and its extension to rectangular matrices. Quaestiones Mathematicae, 41(2), 265–281 19. Ferreyra, E. D., Levis, E. F., & Thome, N. (2018). Maximal classes of matrices determining generalized inverses. Applied Mathematics and Computation, 333, 42–52 20. Gao, Y., & Chen, J. (2018). Pseudo core inverses in rings with involution. Communications in Algebra, 46(1), 38–50 21. Gao, Y., Chen, J., & Patricio, P. (2021). 
Continuity of the core-EP inverse and its applications. Linear and Multilinear Algebra, 69(3), 557–571 22. He, Z.-H., & Wang, Q.-W. (2013). A real quaternion matrix equation with applications. Linear and Multilinear Algebra, 61(6), 725–740 23. Jiang, T., & Wei, M. (2003). Equality constrained least squares problem over quaternion field. Applied Mathematics Letters, 16, 883–888 24. Jiang, T., Zhao, J., & Wei, M. (2008). A new technique of quaternion equality constrained least squares problem. Journal of Computational and Applied Mathematics, 216, 509–513 25. Jiang, T., Zhang, Z., & Jiang, Z. (2018). A new algebraic technique for quaternion constrained least squares problems. Advances in Applied Clifford Algebras, 28, 14 26. Khatri, G. C., & Mitra, K. S. (1976). Hermitian and nonnegative definite solutions of linear matrix equations. SIAM Journal on Applied Mathematics, 31, 579–585 27. Krishnamurthy, V. E. (1978). Generalized matrix inverse approach for automatic balancing of chemical equations. International Journal of Mathematical Education in Science and Technology, 9, 323–328
28. Kyrchei, I. I. (2008). Cramer’s rule for quaternionic systems of linear equations. Journal of Mathematical Sciences, 155(6), 839–858 29. Kyrchei, I. I. (2012). The theory of the column and row determinants in a quaternion linear algebra. In A.R. Baswell (Ed.), Advances in mathematics research (vol. 15, pp. 301–359). New York: Nova Sci. Publ. 30. Kyrchei, I. I. (2012). Determinantal representation of the Moore-Penrose inverse matrix over the quaternion skew field. Journal of Mathematical Sciences, 108(1), 23–33 31. Kyrchei, I. I. (2014). Determinantal representations of the Drazin inverse over the quaternion skew field with applications to some matrix equations. Applied Mathematics and Computation, 238, 193–207 32. Kyrchei, I. I. (2017). Determinantal representations of the Drazin and W-weighted Drazin inverses over the quaternion skew field with applications. In S. Griffin (Ed.) Quaternions: theory and applications (pp. 201–275). New York: Nova Sci. Publ. 33. Kyrchei, I. I. (2017). Determinantal representations of the quaternion weighted Moore-Penrose inverse and its applications. In A.R. Baswell (Ed.), Advances in mathematics research (vol. 23, pp. 35–96). New York: Nova Sci. Publ. 34. Kyrchei, I. I. (2018). Explicit determinantal representation formulas for the solution of the two-sided restricted quaternionic matrix equation. Journal of Applied Mathematics and Computing, 58(1–2), 335–365 35. Kyrchei, I. I. (2018). Cramer’s rules for Sylvester quaternion matrix equation and its special cases. Advances in Applied Clifford Algebras, 28(5), 90 36. Kyrchei, I. I. (2019). Determinantal representations of general and (skew-) Hermitian solutions to the generalized Sylvester-type quaternion matrix equation. Abstract and Applied Analysis, 2019, ID 5926832, 14 p. 37. Kyrchei, I. I. (2019). Cramer’s Rules for Sylvester-type matrix equations. In I.I. Kyrchei (Ed.), Hot topics in Linear Algebra (pp. 45–110). New York: Nova Sci. Publ. 38. Kyrchei, I. I. (2019). 
Determinantal representations of the quaternion core inverse and its generalizations. Advances in Applied Clifford Algebras, 29(5), 104 39. Kyrchei, I. I. (2015). Cramer’s rule for generalized inverse solutions. In I.I. Kyrchei (Ed.), Advances in Linear Algebra research (pp. 79–132). New York: Nova Sci. Publ. 40. Kyrchei, I. I. (2019). Determinantal representations of the core inverse and its generalizations with applications. Journal of Mathematics, 2019, ID 1631979, 13 p. 41. Kyrchei, I. I. (2020). Weighted quaternion core-EP, DMP, MPD, and CMP inverses and their determinantal representations. Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales. Serie A. Matemáticas, 114, 198 42. Kyrchei, I. I., Mosić, D., & Stanimirović, P. S. (2021). Solvability of new constrained quaternion matrix approximation problems based on core-EP inverses. Advances in Applied Clifford Algebras, 31, 3 43. Kyrchei, I. I., Mosić, D., & Stanimirović, P. S. (2021). MPD-DMP-solutions to quaternion two-sided restricted matrix equations. Computers & Mathematics with Applications, 40, 177 44. Kyrchei, I. I., Mosić, D., & Stanimirović, P. S. (2022). MPCEP-* CEPMP-solutions of some restricted quaternion matrix equations. Advances in Applied Clifford Algebras, 32, 16 45. Levine, J., & Hartwig, E. R. (1980). Applications of Drazin inverse to the Hill cryptographic systems. Cryptologia, 5(2), 67–77
46. Liu, L.-S., Wang, Q.-W., Chen. J.-F., & Xie, Y.-Z. (2022). An exact solution to a quaternion matrix equation with an application. Symmetry, 14(2), 375 47. Ling, S., Xu, X., & Jiang, T. (2013). Algebraic method for inequality constrained quaternion least squares problem. Advances in Applied Clifford Algebras, 23, 919–928 48. Liu, X., & Cai, N. (2018). High-order iterative methods for the DMP inverse. Journal of Mathematics, 2018, ID 8175935, 6 p. 49. Liu, X., Yu, Y., & Wang, H. (2009). Determinantal representation of the weighted generalized inverse. Applied Mathematics and Computation, 208, 556–563 50. Ma, H., Gao, X., & Stanimirović, P. S. (2020). Characterizations, iterative method, sign pattern and perturbation analysis for the DMP inverse with its applications. Applied Mathematics and Computation, 378, 125196 51. Ma, H., & Stanimirović, P. S. (2019). Characterizations, approximation and perturbations of the core-EP inverse. Applied Mathematics and Computation, 359, 404–417 52. Malik, B. S., & Thome, N. (2014). On a new generalized inverse for matrices of an arbitrary index. Applied Mathematics and Computation, 226, 575–580 53. Meng, S. L. (2017). The DMP inverse for rectangular matrices. Filomat, 31(19), 6015–6019 54. Mitra, K. S. (1972). Fixed rank solutions of linear matrix equations. Sankhya Ser. A., 35, 387–392 55. Mosić, D. (2021). Core-EP inverses in Banach algebras. Linear and Multilinear Algebra, 69(16), 2976–2989 56. Mosić, D. (2020). Core-EP inverse in rings with involution. Publicationes Mathematicae Debrecen, 96(3–4), 427–443 57. Mosić, D. (2020). Weighted gDMP inverse of operators between Hilbert spaces. Bulletin of the Korean Mathematical Society, 55, 1263–1271 58. Mosić, D. (2020). Maximal classes of operators determining some weighted generalized inverses. Linear and Multilinear Algebra, 68(11), 2201–2220 59. Mosić, D., & Djordjević, D. S. (2018). The gDMP inverse of Hilbert space operators. Journal of Spectral Theory, 8(2), 555–573 60. 
Pablos Romo, F. (2021). On Drazin-Moore-Penrose inverses of finite potent endomorphisms. Linear and Multilinear Algebra, 69(4), 627–647 61. Peng, Y. Z. (2010). New matrix iterative methods for constraint solutions of the matrix equation AXB = C. Journal of Computational and Applied Mathematics, 235, 726–735 62. Prasad, M. K., & Mohana, S. K. (2014). Core-EP inverse. Linear and Multilinear Algebra, 62(6), 792–802 63. Prasad, M. K., & Raj, D. M. (2018). Bordering method to compute core-EP inverse. Special Matrices, 6, 193–200 64. Prasad, M. K., Raj, D. M., & Vinay, M. (2018). Iterative method to find core-EP inverse. Bulletin of Kerala Mathematics Association, Special Issue, 16(1), 139–152 65. Prasad, M. K., Rao, K. P. S. B., & Bapat, B. R. (1991). Generalized inverses over integral domains. II. Group inverses and Drazin inverses. Linear Algebra and Its Applications, 146, 31–47 66. Rauhala, A. U. (1980). Introduction to array algebra. Photogrammetric Engineering & Remote Sensing, 46, 177–192 67. Regalia, A. P., & Mitra, K. S. (1989). Kronecker products, unitary matrices and signal processing applications. SIAM Review, 31, 586–613
68. Rehman, A., Kyrchei, I., Ali, I., Akram, M., & Shakoor, A. (2020). Explicit formulas and determinantal representation for η-skew-hermitian solution to a system of quaternion matrix equations. Filomat, 34(8), 2601–2627 69. Rehman, A., Kyrchei, I. I., Ali, I., Akram, M., & Shakoor, A. (2021). Constraint solution of a classical system of quaternion matrix equations and its Cramer’s rule. Iranian Journal of Science and Technology. Transaction A, Science, 45, 1015–1024 70. Risteski, B. I. (2008). A new generalized matrix inverse method for balancing chemical equations and their stability. Boletín de la sociedad química de México, 2, 104–115 71. Sahoo, K. J., Behera, R., Stanimirović, P. S., Katsikis, N. V., & Ma, H. (2020). Core and core-EP inverses of tensors. Computers & Mathematics with Applications, 39, 9 72. Sangwine, J. S., & Le Bihan, N. (2006). Quaternion singular value decomposition based on bidiagonalization to a real or complex matrix using quaternion Householder transformations. Applied Mathematics and Computation, 182(1), 727–738 ð2Þ 73. Sheng, X., & Chen, G. (2007). Full-rank representation of generalized inverse AT,S and its applications. Computers & Mathematics with Applications, 54, 1422–1430 (2007) 74. Song, C., Feng, J., Wang, X., & Zhao, J. (2014). A real representation method for solving Yakubovich-j-Conjugate quaternion matrix equation. Abstract and Applied Analysis, 2014, ID 285086, 9 p. 75. Song, J. G. (2012). Determinantal representation of the generalized inverses over the quaternion skew field with applications. Applied Mathematics and Computation, 219, 656–667 76. Song, J. G., Wang, Q. W., & Yu, W. S. (2018). Condensed Cramer rule for some restricted quaternion linear equations. Applied Mathematics and Computation, 336, 490–499 77. Song, J. G., & Yu, W. S. (2019). Cramer’s rule for the general solution to a restricted system of quaternion matrix equations. Advances in Applied Clifford Algebras, 29, 91 78. Song, J. G., Wang, W. 
Q., & Yu, W. S. (2018). Cramer’s rule for a system of quaternion matrix equations with applications. Applied Mathematics and Computation, 336, 490–499 79. Song, J. G., Ding, W., & Ng, M. K. (2021). Low rank pure quaternion approximation for pure quaternion matrices. SIAM Journal on Matrix Analysis and Applications, 42(1), 58–82 80. Stanimirović, P. S. (1999). General determinantal representation of generalized inverses over integral domains. Publicationes Mathematicae Debrecen, 54, 221–249 81. Stanimirović, P. S., Bogdanović, S., & Ćirić, M. (2006). Adjoint mappings and inverses of matrices. Algebra Colloquium, 13(3), 421–432 82. Stanimirović, P. S., & Djordjević, D. S. (2000). Full-rank and determinantal representation of the Drazin inverse. Linear Algebra and Its Applications, 311, 31–51 83. Stanimirović, P. S., & Zlatanović, M. L. (2012). Determinantal representation of outer inverses in Riemannian space. Algebra Colloquium, 19, 877–892 84. Strasek, R. (2003). Uniform primeness of the Jordan algebra of hermitian quaternion matrices. Linear Algebra and Its Applications, 367, 235–242 85. Tian, Y. (2003). Ranks of solutions of the matrix equation AXB = C. Linear and Multilinear Algebra, 51, 111–125
86. Wang, B., Du, H., & Ma, H. (2020). Perturbation bounds for DMP and CMP inverses of tensors via Einstein product. Computers & Mathematics with Applications, 39, 28 87. Wang, D., Li, Y., & Ding, W. (2022). Several kinds of special least squares solutions to quaternion matrix equation AXB=C. Journal of Applied Mathematics and Computation, 68, 1881–1899 88. Wang, H. X., & Zhang, X. X. (2020). The core inverse and constrained matrix approximation problem. Open Mathematics, 18, 653–661 89. Wang, Q., Yu, S., & Xie, W. (2010). Extreme ranks of real matrices in solution of the quaternion matrix equation AXB = C with applications. Algebra Colloquium, 17, 345–360 90. Wang, Q.-W., & Zhang, F. (2008). The reflexive re-nonnegative definite solution to a quaternion matrix equation. Electronic Journal of Linear Algebra, 17, 88–101 91. Wang, H. (2016). Core-EP decomposition and its applications. Linear Algebra and Its Applications, 508, 289–300 92. Wang, X., Li, Y., & Dai, L. (2013). On Hermitian and skew-Hermitian splitting iteration methods for the linear matrix equation AXB = C. Computers & Mathematics with Applications, 65, 657–664 93. Yu, A., & Deng, C. (2016). Characterization of DMP inverse in Hilbert space. Calcolo, 53, 331–341 ð2Þ 94. Yu, Y., & Wang, G. (2007). On the generalized inverse AT,S over integral domains. Australian Journal of Mathematical Analysis and Applications, 4, 1–20 95. Yu, Y., & Wei, Y. (2009). Determinantal representation of the generalized inverse ð2Þ AT,S over integral domains and its applications. Linear and Multilinear Algebra, 57, 547–559 96. Yuan, S. (2012). Least squares η-Hermitian solution for quaternion matrix equation AXB=C. In C. Liu, L. Wang, A. Yang (Eds), Information computing and applications, ICICA 2012. Communications in Computer and Information Science, vol. 307. Berlin, Heidelberg: Springer 97. Zhang, F. (2007). Geršgorin type theorems for quaternionic matrices. Linear Algebra and Its Applications, 424, 139–153 98. Zhang, F. Z. 
(1997). Quaternions and matrices of quaternions. Linear Algebra and Its Applications, 251, 21–57 99. Zhou, M. M., Chen, L. J., Li, T. T., & Wang, G. D. (2018). Three limit representations of the core-EP inverse. Filomat, 32, 5887–5894 100. Zhang, Y., Li, Y., Zhao, H., Zhao, J., & Wang, G. (2022). Least-squares bihermitian and skew bihermitian solutions of the quaternion matrix equation AXB = C. Linear and Multilinear Algebra, 70(6), 1081–1095 101. Zhou, M., & Chen, J. (2018). Integral representations of two generalized core inverses. Applied Mathematics and Computation, 333, 187–193 102. Zhu, H. (2019). On DMP inverses and m-EP elements in rings. Linear and Multilinear Algebra, 67(4), 756–766
Matrices over Quaternion Algebras Xin Liu and Yang Zhang
Abstract This survey provides an overview of several powerful real and complex representations of matrices over quaternion algebras including split quaternions and biquaternions as well as their applications in solving some famous matrix equations like Sylvester/Stein/Lyapunov equations and computing singular value (resp. eigenvalue) decompositions. Keywords Quaternion algebra • Split quaternion • Biquaternion • Matrix equation Mathematics Subject Classification (MSC2020) Primary 15B33 • Secondary 15A24, 15A66
X. Liu
Macau Institute of Systems Engineering, Faculty of Innovation Engineering, Macau University of Science and Technology, Taipa, P.R. China
e-mail: [email protected]
Y. Zhang (✉)
Department of Mathematics, University of Manitoba, Winnipeg, MB, Canada
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_46

1 Introduction
In 1843, Hamilton discovered the real quaternions
\[ \mathbb{H} = \{a_1 + a_2 i + a_3 j + a_4 k \mid i^2 = j^2 = k^2 = -1,\ ijk = -1,\ a_1, \dots, a_4 \in \mathbb{R}\}, \]
which form a four-dimensional division algebra over the real number field $\mathbb{R}$. One year later, in 1844, Hamilton [17, 18] extended his real quaternions to biquaternions over the complex number field $\mathbb{C}$, that is,
\[ \mathbb{H}_c = \{a_1 + a_2 i + a_3 j + a_4 k \mid i^2 = j^2 = k^2 = -1,\ ijk = -1,\ a_1, \dots, a_4 \in \mathbb{C}\}. \]
Both kinds of quaternions have been widely used in many areas such as computer graphics, control theory, physics, robotics, image and signal processing, electromechanics, quantum mechanics, etc. We refer the readers to [3, 8, 33, 48, 51–53, 58, 60] for more information. In 1849, Cockle [7] introduced the split quaternions $\mathbb{H}_s$, which form a four-dimensional algebra but not a division algebra, that is,
\[ \mathbb{H}_s = \{a_1 + a_2 i + a_3 j + a_4 k \mid i^2 = -1,\ j^2 = k^2 = 1,\ ijk = 1,\ a_1, \dots, a_4 \in \mathbb{R}\}. \]
$\mathbb{H}_s$ also has many applications in mathematics and other areas like differential geometry, split quaternionic mechanics, rotations in four-dimensional space, the design of public key cryptosystems, etc. [11, 13, 24, 28, 69, 75]. Moreover, a quaternion algebra was introduced in Chapter III of Lam's book [34]. Let $\mathbb{F}$ be an arbitrary field of characteristic not two. For $\alpha, \beta \in \mathbb{F}$, the quaternion algebra $A = \mathbb{H}_{\mathbb{F}}(\alpha, \beta)$ is an $\mathbb{F}$-algebra with two generators $i, j$ satisfying
\[ i^2 = \alpha,\quad j^2 = \beta,\quad ij = -ji. \]
Let $k := ij \in A$. Then it is easy to see that
\[ k^2 = -\alpha\beta \in \mathbb{F},\quad ik = -ki = \alpha j,\quad kj = -jk = \beta i. \]
The general form of $A$ can be written as
\[ \mathbb{H}_{\mathbb{F}}(\alpha, \beta) = \{a_1 + a_2 i + a_3 j + a_4 k \mid i^2 = \alpha,\ j^2 = \beta,\ ij = -ji = k,\ a_1, \dots, a_4 \in \mathbb{F}\}. \]
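The multiplication rules above can be checked mechanically. The following is a minimal sketch, not taken from the chapter: it stores an element $a_1 + a_2 i + a_3 j + a_4 k$ as a 4-tuple and multiplies two elements using only $i^2 = \alpha$, $j^2 = \beta$, $ij = -ji = k$. The names `qmul` and `check_relations` are hypothetical helpers introduced for illustration.

```python
def qmul(p, q, alpha, beta):
    """Product in the algebra with i^2 = alpha, j^2 = beta, ij = -ji = k.

    p and q are 4-tuples (a1, a2, a3, a4) standing for a1 + a2*i + a3*j + a4*k.
    (Hypothetical helper, written for this illustration only.)
    """
    p1, p2, p3, p4 = p
    q1, q2, q3, q4 = q
    return (
        p1 * q1 + alpha * p2 * q2 + beta * p3 * q3 - alpha * beta * p4 * q4,
        p1 * q2 + p2 * q1 - beta * p3 * q4 + beta * p4 * q3,
        p1 * q3 + p3 * q1 + alpha * p2 * q4 - alpha * p4 * q2,
        p1 * q4 + p4 * q1 + p2 * q3 - p3 * q2,
    )


def check_relations(alpha, beta):
    """Return True if the derived relations for k = ij hold for this (alpha, beta)."""
    i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

    def mul(p, q):
        return qmul(p, q, alpha, beta)

    return (
        mul(i, i) == (alpha, 0, 0, 0)
        and mul(j, j) == (beta, 0, 0, 0)
        and mul(i, j) == k
        and mul(j, i) == (0, 0, 0, -1)
        and mul(k, k) == (-alpha * beta, 0, 0, 0)   # k^2 = -alpha*beta
        and mul(i, k) == (0, 0, alpha, 0)           # ik = alpha*j
        and mul(k, i) == (0, 0, -alpha, 0)          # ki = -alpha*j
        and mul(k, j) == (0, beta, 0, 0)            # kj = beta*i
        and mul(j, k) == (0, -beta, 0, 0)           # jk = -beta*i
    )


# (-1, -1): Hamilton quaternions, (-1, 1): split quaternions, (2, 3): a generic pair.
results = [check_relations(a, b) for a, b in [(-1, -1), (-1, 1), (2, 3)]]
```

With integer inputs all comparisons are exact, so `results` should contain only `True` under the stated rules.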
Clearly, $\mathbb{H}_{\mathbb{F}}(\alpha, \beta)$ includes various types of generalized quaternions. In particular, let $\mathbb{F}$ be the real number field $\mathbb{R}$. Then $\mathbb{H}_{\mathbb{R}}(\alpha, \beta)$ is called the generalized quaternion algebra in [47]. Moreover, taking different values of $\alpha$ and $\beta$, $\mathbb{H}_{\mathbb{R}}(-1, -1)$ is the Hamilton quaternions $\mathbb{H}$; $\mathbb{H}_{\mathbb{R}}(-1, 1)$ is the split quaternions $\mathbb{H}_s$; $\mathbb{H}_{\mathbb{R}}(-1, 0)$ is the degenerate quaternions; $\mathbb{H}_{\mathbb{R}}(1, 0)$ is the pseudo-degenerate quaternions; and $\mathbb{H}_{\mathbb{R}}(0, 0)$ is the doubly degenerate quaternions. The detailed properties and applications can be found in, for example, [1, 14, 22, 25, 42]. On the other hand, in 1892, Segre [54] defined the reduced biquaternion algebra:
\[ \mathbb{H}_{rb} = \{a_1 + a_2 i + a_3 j + a_4 k \mid i^2 = -1,\ j^2 = 1,\ ij = ji = k,\ a_1, \dots, a_4 \in \mathbb{R}\}. \]
Clearly, $k^2 = -1$, $jk = kj = i$, $ki = ik = -j$. Hence, $\mathbb{H}_{rb}$ is a commutative algebra. The reduced biquaternions have also been extensively studied due to their various applications. For example, functions of reduced biquaternion variables were studied in [5], where the generalized Cauchy-Riemann conditions were obtained. Pei et al. [45] proposed a simplified reduced biquaternion polar form with applications in processing color images. In [46], several algorithms were developed for calculating the eigenvalues, eigenvectors, and the singular value decomposition of reduced biquaternion matrices, and the results were applied to the processing of color images in digital media. Two types of multistate Hopfield neural networks based on reduced biquaternions were investigated in [23]. Moreover, [16, 30, 31] discussed some algebraic properties of reduced biquaternion matrices as well as the generalized Sylvester/Stein matrix equation by means of real/complex representations. Motivated by the above works, in [57], we defined the generalized reduced biquaternion algebra $\mathbb{H}_{rb}(\alpha, \beta)$ as a commutative four-dimensional Clifford algebra satisfying the following: for a fixed pair $\alpha, \beta \in \mathbb{R}$,
\[ \mathbb{H}_{rb}(\alpha, \beta) = \{a_1 + a_2 i + a_3 j + a_4 k \mid i^2 = \alpha,\ j^2 = \beta,\ ij = ji = k,\ a_1, \dots, a_4 \in \mathbb{R}\}. \]
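As an illustration of the commutative case, here is a small sketch (an assumption-labeled example, not code from [57]) of the product in $\mathbb{H}_{rb}(\alpha, \beta)$ derived only from $i^2 = \alpha$, $j^2 = \beta$, $ij = ji = k$. The helper name `rb_mul` is hypothetical.

```python
def rb_mul(p, q, alpha, beta):
    """Product in the commutative algebra with i^2 = alpha, j^2 = beta, ij = ji = k.

    Elements are 4-tuples (a1, a2, a3, a4) for a1 + a2*i + a3*j + a4*k.
    (Hypothetical helper for illustration.)
    """
    p1, p2, p3, p4 = p
    q1, q2, q3, q4 = q
    return (
        p1 * q1 + alpha * p2 * q2 + beta * p3 * q3 + alpha * beta * p4 * q4,
        p1 * q2 + p2 * q1 + beta * (p3 * q4 + p4 * q3),
        p1 * q3 + p3 * q1 + alpha * (p2 * q4 + p4 * q2),
        p1 * q4 + p4 * q1 + p2 * q3 + p3 * q2,
    )


i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

# Segre's reduced biquaternions correspond to (alpha, beta) = (-1, 1).
segre_k_squared = rb_mul(k, k, -1, 1)   # k^2 = -1
segre_jk = rb_mul(j, k, -1, 1)          # jk = i
segre_ki = rb_mul(k, i, -1, 1)          # ki = -j

# Commutativity on a sample pair with generic parameters.
p, q = (1, 2, -3, 4), (0, 5, 1, -2)
commutes = rb_mul(p, q, 2, 3) == rb_mul(q, p, 2, 3)
```

The formula is symmetric in `p` and `q`, which is exactly the commutativity claimed in the text.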
It is easy to see that $k^2 = ijk = \alpha\beta$, $jk = kj = \beta i$, $ki = ik = \alpha j$, and $\mathbb{H}_{rb}(-1, 1)$ is the same as Segre's reduced biquaternion algebra $\mathbb{H}_{rb}$. In Section 4, we will deal with the least-squares problem of matrix equations over $\mathbb{H}_{rb}(\alpha, \beta)$. Matrix equations are ubiquitous in control theory, signal and image processing, and systems theory (see, e.g., [55] and references therein). Linear matrix equations play important roles in the stability analysis of linear dynamical systems and also take part in the theoretical development of nonlinear systems. Famous linear matrix equations include the Sylvester equation $AX + XB = C$, the Lyapunov equation $AX + XA^{*} = C$, and the Stein equation $X + AXB = C$. Various generalized matrix equations have been studied [6, 15, 21, 27, 29, 41, 49, 50, 56, 62, 67, 72]. Finding solutions to these kinds of matrix equations is a fundamental problem. In this chapter, we summarize some recent results on solving matrix equations over various quaternion algebras. The real/complex representation method is one of the standard and popular ways to investigate the fundamental properties of different kinds of quaternions [28, 30, 31, 36, 37, 39, 40, 43, 44, 65]. In this chapter, we outline some powerful real and complex representations and their applications. In
Section 2, we introduce three real representations and one complex representation for a generalized quaternion matrix over $\mathbb{H}_{\mathbb{R}}(\alpha, \beta)$ and use them to solve the equation $AXB + CX^{\star}D = E$ (2.3), to find the $X = \pm X^{\star}$ solutions to $AXB + CXD = E$ (2.4), and to treat the least-squares problem for $AXD = B$. In Section 3, we consider matrices over the split quaternions $\mathbb{H}_s$. We introduce two new real representations and apply them to solve two kinds of matrix equations over $\mathbb{H}_s$: $AX^{\star} - XB = CY + D$ and $X - AX^{\star}B = CY + D$, $X^{\star} \in \{X^{i}, X^{j}, X^{k}, X^{*}\}$. Moreover, we investigate the (least-squares) η-(anti-)Hermitian solutions to the following split quaternion matrix equations: $AXA^{\eta} = B$, $A^{\eta} \in \{A^{i}, A^{j}, A^{k}\}$. In the first part of Section 4, we discuss eigenvalues/eigenvectors and the singular value (resp. eigenvalue) decomposition of an elliptic biquaternion matrix and apply them to give the least-squares solution to $AX = B$. In the second part, we explore the least-squares problem for the matrix equation $AXC = B$ over the generalized reduced biquaternion algebra $\mathbb{H}_{rb}(\alpha, \beta)$ with nonzero $\alpha, \beta$.
Throughout this chapter, we denote the set of all $m \times n$ matrices over an algebra $R$ by $R^{m \times n}$. Let the symbols $I_n$ and $0$ stand for the $n \times n$ identity matrix and the zero matrix of appropriate size, respectively. For a matrix $A$, $A^{\dagger}$ stands for the Moore-Penrose inverse of $A$. $L_A = I - A^{\dagger}A$ and $R_A = I - AA^{\dagger}$ are two projectors induced by $A$.
2 Quaternion Algebra $\mathbb{H}_{\mathbb{R}}(\alpha, \beta)$
In this section, we consider the class of quaternion algebras with nonzero $\alpha, \beta$ and $\mathbb{F} = \mathbb{R}$, denoted by $\mathbb{H}_G = \mathbb{H}_{\mathbb{R}}(\alpha, \beta)$, whose elements are also called generalized quaternions (see, e.g., Yu et al. [65], Wang et al. [59]). For any $A \in \mathbb{H}_G^{m \times n}$, there is a unique representation $A = A_1 + A_2 i + A_3 j + A_4 k$ with $A_1, A_2, A_3, A_4 \in \mathbb{R}^{m \times n}$. We define three corresponding η-conjugates as follows:
\[
\begin{aligned}
A^{i} &= i^{-1} A i = A_1 + A_2 i - A_3 j - A_4 k, \\
A^{j} &= j^{-1} A j = A_1 - A_2 i + A_3 j - A_4 k, \\
A^{k} &= k^{-1} A k = A_1 - A_2 i - A_3 j + A_4 k.
\end{aligned}
\tag{2.1}
\]
Let $A^{*} = A_1^{T} - A_2^{T} i - A_3^{T} j - A_4^{T} k$ be the usual conjugate transpose of $A$. Then the other three η-conjugate transposes of $A$ are defined as follows:
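\[
\begin{aligned}
A^{i*} &= i^{-1} A^{*} i = A_1^{T} - A_2^{T} i + A_3^{T} j + A_4^{T} k, \\
A^{j*} &= j^{-1} A^{*} j = A_1^{T} + A_2^{T} i - A_3^{T} j + A_4^{T} k, \\
A^{k*} &= k^{-1} A^{*} k = A_1^{T} + A_2^{T} i + A_3^{T} j - A_4^{T} k.
\end{aligned}
\tag{2.2}
\]
There are several ways to give the real representations of $A \in \mathbb{H}_G^{m \times n}$. We introduce two of them defined in [65].
Definition 2.1 For $A = A_1 + A_2 i + A_3 j + A_4 k \in \mathbb{H}_G^{m \times n}$ with $A_1, A_2, A_3, A_4 \in \mathbb{R}^{m \times n}$, we define two real representations of $A$ as follows:
\[
A^{\tau} =
\begin{bmatrix}
A_1 & \alpha A_2 & \beta A_3 & -\alpha\beta A_4 \\
A_2 & A_1 & \beta A_4 & -\beta A_3 \\
A_3 & -\alpha A_4 & A_1 & \alpha A_2 \\
A_4 & -A_3 & A_2 & A_1
\end{bmatrix}
\in \mathbb{R}^{4m \times 4n},
\]
\[
A^{\sigma} = -G_m A^{\tau} =
\begin{bmatrix}
A_4 & -A_3 & A_2 & A_1 \\
-A_3 & \alpha A_4 & -A_1 & -\alpha A_2 \\
A_2 & A_1 & \beta A_4 & -\beta A_3 \\
-A_1 & -\alpha A_2 & -\beta A_3 & \alpha\beta A_4
\end{bmatrix}
\in \mathbb{R}^{4m \times 4n},
\]
where
\[
G_n =
\begin{bmatrix}
0 & 0 & 0 & -I_n \\
0 & 0 & I_n & 0 \\
0 & -I_n & 0 & 0 \\
I_n & 0 & 0 & 0
\end{bmatrix}.
\]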
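To make Definition 2.1 concrete, the sketch below builds $A^{\tau}$ and $A^{\sigma}$ for the simplest case $m = n = 1$ (scalar generalized quaternions, so every block is a number) and checks numerically that $\tau$ is multiplicative and that $(pq)^{\sigma} = p^{\sigma} G_1 q^{\sigma}$. This is an illustrative check under that simplifying assumption, not the construction used in [65]; the helper names `qmul`, `tau`, `sigma`, and `matmul` are hypothetical.

```python
def qmul(p, q, alpha, beta):
    """Product of 4-tuples in the algebra with i^2 = alpha, j^2 = beta, ij = -ji = k."""
    p1, p2, p3, p4 = p
    q1, q2, q3, q4 = q
    return (
        p1 * q1 + alpha * p2 * q2 + beta * p3 * q3 - alpha * beta * p4 * q4,
        p1 * q2 + p2 * q1 - beta * p3 * q4 + beta * p4 * q3,
        p1 * q3 + p3 * q1 + alpha * p2 * q4 - alpha * p4 * q2,
        p1 * q4 + p4 * q1 + p2 * q3 - p3 * q2,
    )


def matmul(X, Y):
    """Plain nested-list matrix product."""
    return [
        [sum(X[r][t] * Y[t][c] for t in range(len(Y))) for c in range(len(Y[0]))]
        for r in range(len(X))
    ]


def tau(a, alpha, beta):
    """Real representation A^tau of Definition 2.1 for a scalar (1 x 1 blocks)."""
    a1, a2, a3, a4 = a
    return [
        [a1, alpha * a2, beta * a3, -alpha * beta * a4],
        [a2, a1, beta * a4, -beta * a3],
        [a3, -alpha * a4, a1, alpha * a2],
        [a4, -a3, a2, a1],
    ]


# G_1 from Definition 2.1 (each block is a 1 x 1 identity).
G = [[0, 0, 0, -1], [0, 0, 1, 0], [0, -1, 0, 0], [1, 0, 0, 0]]


def sigma(a, alpha, beta):
    """Real representation A^sigma = -G A^tau for a scalar."""
    return [[-v for v in row] for row in matmul(G, tau(a, alpha, beta))]


alpha, beta = -2, 3
p, q = (1, 2, -3, 4), (0, 5, 1, -2)
pq = qmul(p, q, alpha, beta)

tau_is_multiplicative = tau(pq, alpha, beta) == matmul(tau(p, alpha, beta), tau(q, alpha, beta))
sigma_rule_holds = sigma(pq, alpha, beta) == matmul(
    matmul(sigma(p, alpha, beta), G), sigma(q, alpha, beta)
)
```

The first column of `tau(a, ...)` is the coefficient vector of `a`, which is why `tau` acts as left multiplication on coefficient vectors.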
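The properties of $A^{\tau}$ and $A^{\sigma}$ are given in the following proposition. For simplicity, we denote
\[
R_n =
\begin{bmatrix}
0 & \alpha I_n & 0 & 0 \\
I_n & 0 & 0 & 0 \\
0 & 0 & 0 & -\alpha I_n \\
0 & 0 & -I_n & 0
\end{bmatrix},
\quad
Q_n =
\begin{bmatrix}
0 & 0 & \beta I_n & 0 \\
0 & 0 & 0 & \beta I_n \\
I_n & 0 & 0 & 0 \\
0 & I_n & 0 & 0
\end{bmatrix},
\quad
S_n =
\begin{bmatrix}
0 & 0 & 0 & \alpha\beta I_n \\
0 & 0 & \beta I_n & 0 \\
0 & -\alpha I_n & 0 & 0 \\
-I_n & 0 & 0 & 0
\end{bmatrix}.
\]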
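Proposition 2.2 Let $A, B \in \mathbb{H}_G^{m \times n}$, $C \in \mathbb{H}_G^{n \times p}$, $D \in \mathbb{H}_G^{n \times n}$, $a \in \mathbb{R}$. Then
(a) (i) $(A + B)^{\tau} = A^{\tau} + B^{\tau}$, $(AC)^{\tau} = A^{\tau}C^{\tau}$, $(aA)^{\tau} = aA^{\tau}$;
    (ii) $(A + B)^{\sigma} = A^{\sigma} + B^{\sigma}$, $(AC)^{\sigma} = A^{\sigma}G_nC^{\sigma}$, $(aA)^{\sigma} = aA^{\sigma}$;
(b) (i) $R_m^{-1}A^{\tau}R_n = A^{\tau}$, $Q_m^{-1}A^{\tau}Q_n = A^{\tau}$, $S_m^{-1}A^{\tau}S_n = A^{\tau}$;
    (ii) $(R_m^{T})^{-1}A^{\sigma}R_n = A^{\sigma}$, $(Q_m^{T})^{-1}A^{\sigma}Q_n = A^{\sigma}$, $(S_m^{T})^{-1}A^{\sigma}S_n = -A^{\sigma}$;
(c) $(A^{*})^{\tau} = -G_n(A^{\tau})^{T}G_m$, $(A^{*})^{\sigma} = -(A^{\sigma})^{T}$;
(d) $I_n^{\tau} = I_{4n}$;
(e) If $D$ is invertible, then $(D^{-1})^{\tau} = (D^{\tau})^{-1}$.
In the following, we use the above two real representations of a generalized quaternion matrix to consider the solvability of the generalized quaternion matrix equation
\[ AXB + CX^{\star}D = E \tag{2.3} \]
for $X^{\star} \in \{X, X^{i}, X^{j}, X^{k}, X^{*}, X^{i*}, X^{j*}, X^{k*}\}$. Furthermore, we consider the $X = \pm X^{\star}$ solutions to the generalized quaternion matrix equation
\[ AXB + CXD = E \tag{2.4} \]
in a unified way.
Theorem 2.3 Let $A, C \in \mathbb{H}_G^{m \times m}$, $B, D \in \mathbb{H}_G^{n \times n}$, $E \in \mathbb{H}_G^{m \times n}$. Then the generalized quaternion matrix equation (2.3) with $X^{\star}$ being $X^{\eta}$, $\eta \in \{1, i, j, k\}$, has a solution $X \in \mathbb{H}_G^{m \times n}$ if and only if one of the following equivalent statements holds:
(a) its corresponding real matrix equation
\[ \widetilde{A}Y\widetilde{B} + \widetilde{C}Y\widetilde{D} = \widetilde{E} \tag{2.5} \]
has a solution $Y \in \mathbb{R}^{4m \times 4n}$, where $\widetilde{A} = A^{\tau}$, $\widetilde{B} = B^{\tau}$, $\widetilde{C} = (C\eta^{-1})^{\tau}$, $\widetilde{D} = (\eta D)^{\tau}$, and $\widetilde{E} = E^{\tau}$.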
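(b) there exist nonsingular matrices $P_1, P_2, P_3 \in \mathbb{R}^{4(m+n) \times 4(m+n)}$ such that
\[
P_2^{-1}
\begin{bmatrix} \widetilde{A} & \widetilde{E} \\ 0 & \widetilde{D} \end{bmatrix}
P_1 =
\begin{bmatrix} \widetilde{A} & 0 \\ 0 & \widetilde{D} \end{bmatrix},
\quad
P_2^{-1}
\begin{bmatrix} \widetilde{C} & 0 \\ 0 & -I_{4n} \end{bmatrix}
P_3 =
\begin{bmatrix} \widetilde{C} & 0 \\ 0 & -I_{4n} \end{bmatrix},
\quad
P_3^{-1}
\begin{bmatrix} I_{4m} & 0 \\ 0 & \widetilde{B} \end{bmatrix}
P_1 =
\begin{bmatrix} I_{4m} & 0 \\ 0 & \widetilde{B} \end{bmatrix}.
\]
Moreover, if $Y \in \mathbb{R}^{4m \times 4n}$ is a solution to the real matrix equation (2.5), then
\[
X = \frac{1}{16}
\begin{bmatrix} I_m & I_m i & I_m j & I_m k \end{bmatrix}
\left( Y + R_m Y R_n^{-1} + Q_m Y Q_n^{-1} + S_m Y S_n^{-1} \right)
\begin{bmatrix} I_n \\ \frac{1}{\alpha} I_n i \\ \frac{1}{\beta} I_n j \\ -\frac{1}{\alpha\beta} I_n k \end{bmatrix}
\in \mathbb{H}_G^{m \times n}
\]
is a solution to the generalized quaternion matrix equation (2.3).
Corollary 2.4 Let $A, C \in \mathbb{H}_G^{m \times m}$, $B, D \in \mathbb{H}_G^{n \times n}$, $E \in \mathbb{H}_G^{m \times n}$. If the matrix pencils $A^{\tau} + \lambda (C\eta^{-1})^{\tau}$ and $(\eta D)^{\tau} - \lambda B^{\tau}$ are regular and
\[ \operatorname{sp}\big(A^{\tau}, -(C\eta^{-1})^{\tau}\big) \cap \operatorname{sp}\big((\eta D)^{\tau}, B^{\tau}\big) = \emptyset, \]
then the real matrix equation (2.5) has a unique solution $Y = (Y_{ij})_{4 \times 4}$, $Y_{ij} \in \mathbb{R}^{m \times n}$, $i, j = 1, 2, 3, 4$. Furthermore, the generalized quaternion matrix equation (2.3) with $X^{\star}$ being $X^{\eta}$, $\eta \in \{1, i, j, k\}$, also has a unique solution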
\[ X = Y_{11} + \frac{1}{\alpha} Y_{12} i + \frac{1}{\beta} Y_{13} j - \frac{1}{\alpha\beta} Y_{14} k. \]
Example Consider the generalized quaternion matrix equation $AXB + CX^{i}D = E$ (with $\alpha = -1$, $\beta = 1$), where A= B= C= D= E=
1
-1
0
2
þ
1
-1
-2
3
1 0
þ
1 0 1
-1
0
1
1
1
1
-1
-2
1
4
3
þ
0
0
2
1
2
-1
0
1
þ þ
2
0
0
0
0
0
2
1
3
1
0
-1
1
1
0
2
-3
0
0
1
iþ iþ
iþ iþ iþ
0
1
1
1
3
-1
0
1
jþ
jþ jþ
jþ
0
0
2
0
1
0
-1
3
k,
2
-1
-1
0
-1
3
4
0
jþ
k,
k,
k,
0
1
-1
0
k:
By Theorem 2.3, the corresponding real matrix equation is
\[ A^{\tau} Y B^{\tau} + (Ci^{-1})^{\tau} Y (iD)^{\tau} = E^{\tau}. \tag{2.6} \]
Using MATLAB, we obtain (quoted to two decimal places)
\[ \operatorname{sp}\big(A^{\tau}, -(Ci^{-1})^{\tau}\big) = \{-6.70, -2.43, 1.84, -0.20, -6.70, -2.43, 1.84, -0.20\} \]
and
\[ \operatorname{sp}\big((iD)^{\tau}, B^{\tau}\big) = \{-10.46, -10.46, \pm 0.19 \pm 1.21i, 0.71, 0.71\}. \]
Clearly, $\operatorname{sp}\big(A^{\tau}, -(Ci^{-1})^{\tau}\big) \cap \operatorname{sp}\big((iD)^{\tau}, B^{\tau}\big) = \emptyset$. Moreover, the matrix pencils $A^{\tau} + \lambda (Ci^{-1})^{\tau}$ and $(iD)^{\tau} - \lambda B^{\tau}$ are regular. By Corollary 2.4, the real matrix equation (2.6) has a unique solution:
\[
Y =
\begin{bmatrix}
0.56 & 0.59 & 0.32 & -0.25 & 0.87 & 0.11 & 0.08 & -0.60 \\
1.60 & 3.27 & 3.75 & -0.69 & 3.76 & 0.55 & -0.60 & -3.36 \\
-0.32 & 0.25 & 0.56 & 0.59 & 0.08 & -0.60 & -0.87 & -0.11 \\
-3.75 & 0.69 & 1.60 & 3.27 & -0.60 & -3.36 & -3.76 & -0.55 \\
0.87 & 0.11 & 0.08 & -0.60 & 0.56 & 0.59 & 0.32 & -0.25 \\
3.76 & 0.55 & -0.60 & -3.36 & 1.60 & 3.27 & 3.75 & -0.69 \\
0.08 & -0.60 & -0.87 & -0.11 & -0.32 & 0.25 & 0.56 & 0.59 \\
-0.60 & -3.36 & -3.76 & -0.55 & -3.75 & 0.69 & 1.60 & 3.27
\end{bmatrix}
\]
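and the generalized quaternion matrix equation $AXB + CX^{i}D = E$ also has a unique solution
\[
X =
\begin{bmatrix} 0.56 & 0.59 \\ 1.60 & 3.27 \end{bmatrix}
+
\begin{bmatrix} -0.32 & 0.25 \\ -3.75 & 0.69 \end{bmatrix} i
+
\begin{bmatrix} 0.87 & 0.11 \\ 3.76 & 0.55 \end{bmatrix} j
+
\begin{bmatrix} 0.08 & -0.60 \\ -0.60 & -3.36 \end{bmatrix} k.
\]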
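The relation between the printed $Y$ and $X$ can be verified directly from the numbers above. The sketch below is illustrative only and assumes the two-decimal values as printed: it reads $X$ off the first block row of $Y$ via $X = Y_{11} + \frac{1}{\alpha}Y_{12}i + \frac{1}{\beta}Y_{13}j - \frac{1}{\alpha\beta}Y_{14}k$ with $\alpha = -1$, $\beta = 1$, and confirms that $Y$ has the block structure of $X^{\tau}$ from Definition 2.1. The helper names `scale`, `tau_blocks`, and `recover` are hypothetical.

```python
ALPHA, BETA = -1, 1

# The unique real solution Y as printed in the example (two decimals).
Y = [
    [0.56, 0.59, 0.32, -0.25, 0.87, 0.11, 0.08, -0.60],
    [1.60, 3.27, 3.75, -0.69, 3.76, 0.55, -0.60, -3.36],
    [-0.32, 0.25, 0.56, 0.59, 0.08, -0.60, -0.87, -0.11],
    [-3.75, 0.69, 1.60, 3.27, -0.60, -3.36, -3.76, -0.55],
    [0.87, 0.11, 0.08, -0.60, 0.56, 0.59, 0.32, -0.25],
    [3.76, 0.55, -0.60, -3.36, 1.60, 3.27, 3.75, -0.69],
    [0.08, -0.60, -0.87, -0.11, -0.32, 0.25, 0.56, 0.59],
    [-0.60, -3.36, -3.76, -0.55, -3.75, 0.69, 1.60, 3.27],
]


def scale(c, M):
    """Multiply every entry of the nested-list matrix M by the scalar c."""
    return [[c * v for v in row] for row in M]


def tau_blocks(A1, A2, A3, A4, alpha, beta):
    """Assemble the 4m x 4n real representation A^tau from its four real parts."""
    grid = [
        [A1, scale(alpha, A2), scale(beta, A3), scale(-alpha * beta, A4)],
        [A2, A1, scale(beta, A4), scale(-beta, A3)],
        [A3, scale(-alpha, A4), A1, scale(alpha, A2)],
        [A4, scale(-1, A3), A2, A1],
    ]
    rows = []
    for block_row in grid:
        for r in range(len(A1)):
            rows.append([v for block in block_row for v in block[r]])
    return rows


def recover(Y, m, n, alpha, beta):
    """Read X1..X4 off the first block row of Y, as in Corollary 2.4."""
    def block(c):
        return [row[c * n:(c + 1) * n] for row in Y[:m]]

    return (
        block(0),
        scale(1 / alpha, block(1)),
        scale(1 / beta, block(2)),
        scale(-1 / (alpha * beta), block(3)),
    )


X1, X2, X3, X4 = recover(Y, 2, 2, ALPHA, BETA)
rebuilt = tau_blocks(X1, X2, X3, X4, ALPHA, BETA)
max_abs_diff = max(
    abs(a - b) for row_a, row_b in zip(rebuilt, Y) for a, b in zip(row_a, row_b)
)
```

A value of `max_abs_diff` at rounding level indicates that the printed $Y$ is exactly of the form $X^{\tau}$, so in this example the averaging over $R$, $Q$, $S$ in Theorem 2.3 leaves $Y$ unchanged.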
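Theorem 2.5 Let $A, B, C, D, E \in \mathbb{H}_G^{n \times n}$. Then the generalized quaternion matrix equation (2.3) with $X^{\star}$ being $X^{\eta *}$, $\eta \in \{1, i, j, k\}$, has a solution $X \in \mathbb{H}_G^{n \times n}$ if and only if one of the following equivalent statements holds:
(a) its corresponding real matrix equation
\[ \widetilde{A}Y\widetilde{B} - \widetilde{C}Y^{T}\widetilde{D} = \widetilde{E} \tag{2.7} \]
has a solution $Y \in \mathbb{R}^{4n \times 4n}$, where $\widetilde{A} = A^{\sigma}G_n$, $\widetilde{B} = G_nB^{\sigma}$, $\widetilde{C} = (C\eta^{-1})^{\sigma}G_n$, $\widetilde{D} = G_n(\eta D)^{\sigma}$, and $\widetilde{E} = E^{\sigma}$.
(b) there exist nonsingular matrices $P_1, P_2, P_3 \in \mathbb{R}^{8n \times 8n}$ such that $P_2^{T}$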
0
D
A
E
P1 =
0 A P3 - 1
D
, P3 - 1
0 In
0
0
B
In
0
0
-C
P1 =
T
In
0
0
B
P2 =
:
In
0
0
- CT
,
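Moreover, if $Y \in \mathbb{R}^{4n \times 4n}$ is a solution to the real matrix equation (2.7), then
\[
X = \frac{1}{16}
\begin{bmatrix} I_n k & -I_n j & I_n i & -I_n \end{bmatrix}
\left( Y + R_n^{T} Y R_n^{-1} + Q_n^{T} Y Q_n^{-1} - S_n^{T} Y S_n^{-1} \right)
\begin{bmatrix} I_n \\ \frac{1}{\alpha} I_n i \\ \frac{1}{\beta} I_n j \\ -\frac{1}{\alpha\beta} I_n k \end{bmatrix}
\in \mathbb{H}_G^{n \times n}
\]
is a solution to the generalized quaternion matrix equation (2.3).
We also use the real representation $A^{\tau}$ to convert the problem to solving a system of matrix equations over $\mathbb{R}$.
Theorem 2.6 Let $A, C \in \mathbb{H}_G^{m \times m}$, $B, D \in \mathbb{H}_G^{n \times n}$, $E \in \mathbb{H}_G^{m \times n}$. Then the generalized quaternion matrix equation (2.4) has a solution $X = \pm X^{\eta}$ with $\eta \in \{i, j, k\}$ if and only if one of the following equivalent statements holds:
(a) the system
\[
\begin{cases}
\widetilde{A}Y\widetilde{B} + \widetilde{C}Y\widetilde{D} = \widetilde{E}, \\
\widetilde{F}Y \mp Y\widetilde{G} = 0
\end{cases}
\tag{2.8}
\]
has a solution $Y \in \mathbb{R}^{4m \times 4n}$, where $\widetilde{A} = A^{\tau}$, $\widetilde{B} = B^{\tau}$, $\widetilde{C} = C^{\tau}$, $\widetilde{D} = D^{\tau}$, $\widetilde{E} = E^{\tau}$, $\widetilde{F} = (\eta I_m)^{\tau}$, and $\widetilde{G} = (\eta I_n)^{\tau}$.
(b) there exist nonsingular matrices $Q_1, Q_2, Q_3 \in \mathbb{R}^{4(m+n) \times 4(m+n)}$ such that
\[
Q_2^{-1}
\begin{bmatrix} \widetilde{A} & \widetilde{E} \\ 0 & \widetilde{D} \end{bmatrix}
Q_1 =
\begin{bmatrix} \widetilde{A} & 0 \\ 0 & \widetilde{D} \end{bmatrix},
\quad
Q_2^{-1}
\begin{bmatrix} \widetilde{C} & 0 \\ 0 & -I_{4n} \end{bmatrix}
Q_3 =
\begin{bmatrix} \widetilde{C} & 0 \\ 0 & -I_{4n} \end{bmatrix},
\]
\[
Q_3^{-1}
\begin{bmatrix} I_{4m} & 0 \\ 0 & \widetilde{B} \end{bmatrix}
Q_1 =
\begin{bmatrix} I_{4m} & 0 \\ 0 & \widetilde{B} \end{bmatrix},
\quad
Q_3^{-1}
\begin{bmatrix} \widetilde{F} & 0 \\ 0 & \pm\widetilde{G} \end{bmatrix}
Q_3 =
\begin{bmatrix} \widetilde{F} & 0 \\ 0 & \pm\widetilde{G} \end{bmatrix}.
\]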
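Moreover, if $Y \in \mathbb{R}^{4m \times 4n}$ is a solution to the system (2.8), then
\[
X = \frac{1}{16}
\begin{bmatrix} I_m & I_m i & I_m j & I_m k \end{bmatrix}
\left( Y + R_m Y R_n^{-1} + Q_m Y Q_n^{-1} + S_m Y S_n^{-1} \right)
\begin{bmatrix} I_n \\ \frac{1}{\alpha} I_n i \\ \frac{1}{\beta} I_n j \\ -\frac{1}{\alpha\beta} I_n k \end{bmatrix}
\in \mathbb{H}_G^{m \times n}
\]
is an $X = \pm X^{\eta}$ solution to the generalized quaternion matrix equation (2.4).
Theorem 2.7 Let $A, B, C, D, E \in \mathbb{H}_G^{n \times n}$. Then the generalized quaternion matrix equation (2.4) has a solution $X = \pm X^{\eta *}$ with $\eta \in \{1, i, j, k\}$ if and only if one of the following equivalent statements holds:
(a) the system
\[
\begin{cases}
\widetilde{A}Y\widetilde{B} + \widetilde{C}Y\widetilde{D} = \widetilde{E}, \\
\widetilde{F}Y \mp Y^{T}\widetilde{F} = 0
\end{cases}
\tag{2.9}
\]
has a solution $Y \in \mathbb{R}^{4n \times 4n}$, where $\widetilde{A} = A^{\tau}$, $\widetilde{B} = B^{\tau}$, $\widetilde{C} = C^{\tau}$, $\widetilde{D} = D^{\tau}$, $\widetilde{E} = E^{\tau}$, and $\widetilde{F} = G_n(\eta I_n)^{\tau}$.
(b) there exist nonsingular matrices $Q_1, Q_2, Q_3 \in \mathbb{R}^{8n \times 8n}$ such that
\[
Q_2^{-1}
\begin{bmatrix} \widetilde{A} & \widetilde{E} \\ 0 & \widetilde{D} \end{bmatrix}
Q_1 =
\begin{bmatrix} \widetilde{A} & 0 \\ 0 & \widetilde{D} \end{bmatrix},
\quad
Q_2^{-1}
\begin{bmatrix} \widetilde{C} & 0 \\ 0 & -I_{4n} \end{bmatrix}
Q_3 =
\begin{bmatrix} \widetilde{C} & 0 \\ 0 & -I_{4n} \end{bmatrix},
\]
\[
Q_3^{-1}
\begin{bmatrix} I_{4n} & 0 \\ 0 & \widetilde{B} \end{bmatrix}
Q_1 =
\begin{bmatrix} I_{4n} & 0 \\ 0 & \widetilde{B} \end{bmatrix},
\quad
Q_3^{-1}
\begin{bmatrix} 0 & \mp\widetilde{F} \\ \widetilde{F} & 0 \end{bmatrix}
Q_3 =
\begin{bmatrix} 0 & \mp\widetilde{F} \\ \widetilde{F} & 0 \end{bmatrix}.
\]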
Moreover, if Y 2 4n × 4n is a solution to the system (2.9), then In
X¼
1 I i α n
1 ½I I i I j I k Y þ Rn YRn- 1 þ Qn YQn- 1 þ Sn YSn- 1 16 n n n n
1 I j β n -
1 I k αβ n
2 nG× n is an X = ±Xη solution to the generalized quaternion matrix equation (2.4). In [59], they defined a real and a complex representation of A = ×n A1 þ A2 i þ A3 j þ A4 k 2 m with A1 , A2 , A3 , A4 2 m × n as follows: let G α < 0, A1 p A2 - α AR = p A3 βδ p A4 - αβδ
- A2
p
-α
p A3 δ βδ p A4 δ - αβδ
p A4 δ - αβδ p - A3 δ βδ p - A2 - α
A1 p A4 - αβδ A1 p p - A3 βδ A2 - α A1 p p p δðA3 βδ þ A4 αβδÞ A1 þ A2 α AC = , p p p A1 - A2 α A3 βδ - A4 αβδ
where δ =
1 β>0 -1 β 0 have W ½s,m ⋉ A ⋉ W ½m,t = I m A, where W ½m,n 2 mn × mn , called the swap matrix, is defined as W ½m,n = ½I n δ1m , I n δ2m , . . . , I n δm m = δmn ½1, . . . , ðn - 1Þm þ 1, . . . , m, . . . , nm, and δk[i1, . . . , is] is a abbreviation of ½δik1 , . . . , δiks . Especially, when m = n, we denote W[n] := W[n,n]. Based on the semi-tensor product of matrices, we can implement multilinear computations. Definition 3.7 Let Wi (i = 0, 1, . . . , n) be vector spaces. The mapping F : Πni= 1 W i → W 0 is called a multilinear mapping, if for any 1 ≤ i ≤ n, α, β 2 , Fðx1 , . . . , αxi þ βyi , . . . , xn Þ = αFðx1 , . . . , xi , . . . , xn Þ þ βFðx1 , . . . , yi , . . . , xn Þ, in which xi 2 Wi, 1 ≤ i ≤ n, yi 2 Wi. If dim(Wi) = ki, (i = 0, 1, . . . , n), and ðδ1ki , δ2ki , . . . , δkkii Þ is the basis of Wi. Denote j
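\[
F(\delta_{k_1}^{j_1}, \delta_{k_2}^{j_2}, \ldots, \delta_{k_n}^{j_n}) = \sum_{s=1}^{k_0} c_s^{j_1,j_2,\ldots,j_n}\, \delta_{k_0}^{s},
\]
in which $j_t = 1, \ldots, k_t$, $t = 1, \ldots, n$. Then
\[
\{ c_s^{j_1,j_2,\ldots,j_n} \mid j_t = 1, \ldots, k_t,\ t = 1, \ldots, n;\ s = 1, \ldots, k_0 \}
\]
are called the structure constants of $F$. Arranging these structure constants in the following form
\[
M_F = \begin{bmatrix}
c_1^{11\cdots 1} & \cdots & c_1^{11\cdots k_n} & \cdots & c_1^{k_1 k_2 \cdots k_n}\\
c_2^{11\cdots 1} & \cdots & c_2^{11\cdots k_n} & \cdots & c_2^{k_1 k_2 \cdots k_n}\\
\vdots & & \vdots & & \vdots\\
c_{k_0}^{11\cdots 1} & \cdots & c_{k_0}^{11\cdots k_n} & \cdots & c_{k_0}^{k_1 k_2 \cdots k_n}
\end{bmatrix},
\]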
$M_F$ is called the product structure matrix of $F$. Let $x, y \in W_i$ be given by
\[
x = \sum_{i=1}^{n} a_i e_i, \qquad y = \sum_{i=1}^{n} b_i e_i.
\]
Fix the basis; then $x, y$ can be expressed in vector form as
\[
\vec{x} = (a_1, a_2, \ldots, a_n)^T, \qquad \vec{y} = (b_1, b_2, \ldots, b_n)^T.
\]
Using the vector form, the vector product of $x, y$ can be simply calculated as
\[
x \ast y = M_F \ltimes \vec{x} \ltimes \vec{y}.
\]
4 Real Vector Representation Methods of Solving Quaternion Matrix Equations

In this section, we propose the definition of the real vector representation of a quaternion matrix and study its properties.

Definition 4.1 Let $x = x_1 + x_2 i + x_3 j + x_4 k \in \mathbb{Q}$, and denote $v^R(x) = [x_1, x_2, x_3, x_4]^T$. Then $v^R(x)$ is called the real stacking form of $x$.

By Definition 4.1, we can get the following theorem.

Theorem 4.2 Let $x, y \in \mathbb{Q}$. Then
\[
v^R(xy) = M_Q \ltimes v^R(x) \ltimes v^R(y), \tag{4.1}
\]
where the product structure matrix $M_Q$ of the quaternions is
\[
M_Q = \left[\begin{array}{rrrrrrrrrrrrrrrr}
1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & -1\\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 & 1 & 0 & 0 & 0
\end{array}\right].
\]
Proof Let $x = x_1 + x_2 i + x_3 j + x_4 k$ and $y = y_1 + y_2 i + y_3 j + y_4 k$; then $v^R(x) = [x_1, x_2, x_3, x_4]^T$ and $v^R(y) = [y_1, y_2, y_3, y_4]^T$. Since
\[
\begin{aligned}
xy ={}& (x_1y_1 - x_2y_2 - x_3y_3 - x_4y_4) + (x_1y_2 + x_2y_1 + x_3y_4 - x_4y_3)i\\
&+ (x_1y_3 - x_2y_4 + x_3y_1 + x_4y_2)j + (x_1y_4 + x_2y_3 - x_3y_2 + x_4y_1)k,
\end{aligned}
\]
we obtain
\[
v^R(xy) = \begin{bmatrix}
x_1y_1 - x_2y_2 - x_3y_3 - x_4y_4\\
x_1y_2 + x_2y_1 + x_3y_4 - x_4y_3\\
x_1y_3 - x_2y_4 + x_3y_1 + x_4y_2\\
x_1y_4 + x_2y_3 - x_3y_2 + x_4y_1
\end{bmatrix}.
\]
The right side of formula (4.1) is
\[
\begin{aligned}
M_Q \ltimes v^R(x) \ltimes v^R(y) &= M_Q \ltimes (v^R(x) \otimes v^R(y))\\
&= M_Q \ltimes [x_1y_1\ x_1y_2\ x_1y_3\ x_1y_4\ x_2y_1\ x_2y_2\ x_2y_3\ x_2y_4\ x_3y_1\ x_3y_2\ x_3y_3\ x_3y_4\ x_4y_1\ x_4y_2\ x_4y_3\ x_4y_4]^T\\
&= \begin{bmatrix}
x_1y_1 - x_2y_2 - x_3y_3 - x_4y_4\\
x_1y_2 + x_2y_1 + x_3y_4 - x_4y_3\\
x_1y_3 - x_2y_4 + x_3y_1 + x_4y_2\\
x_1y_4 + x_2y_3 - x_3y_2 + x_4y_1
\end{bmatrix}.
\end{aligned}
\]
The left and right sides of formula (4.1) are equal, so Theorem 4.2 is proved. □

Definition 4.3 Let $x = [x_1, x_2, \ldots, x_n]$ and $y = [y_1, y_2, \ldots, y_n]^T$ be quaternion vectors with $x_i, y_i \in \mathbb{Q}$ $(i = 1, 2, \ldots, n)$. Denote
\[
v^R(x) = \begin{bmatrix} v^R(x_1)\\ \vdots\\ v^R(x_n) \end{bmatrix}, \qquad
v^R(y) = \begin{bmatrix} v^R(y_1)\\ \vdots\\ v^R(y_n) \end{bmatrix}.
\]
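$v^R(x)$ and $v^R(y)$ are called the real stacking forms of the quaternion vectors $x$ and $y$, respectively.

Furthermore, we give the concepts of the real column stacking form and the real row stacking form of a quaternion matrix $A$.

Definition 4.4 For $A \in \mathbb{Q}^{m \times n}$, denote
\[
v_c^R(A) = \begin{bmatrix} v^R(\mathrm{Col}_1(A))\\ v^R(\mathrm{Col}_2(A))\\ \vdots\\ v^R(\mathrm{Col}_n(A)) \end{bmatrix}, \qquad
v_r^R(A) = \begin{bmatrix} v^R(\mathrm{Row}_1(A))\\ v^R(\mathrm{Row}_2(A))\\ \vdots\\ v^R(\mathrm{Row}_m(A)) \end{bmatrix}.
\]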
$v_c^R(A)$ and $v_r^R(A)$ are called the real column stacking form and the real row stacking form of $A$, respectively. The real column stacking form and the real row stacking form are collectively called the real vector representation of the quaternion matrix.

We give the following properties of the real vector representation with respect to vectors and matrices, respectively.

Theorem 4.5 Let $x = [x_1, x_2, \ldots, x_n]$, $\hat{x} = [\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n]$, $y = [y_1, y_2, \ldots, y_n]^T$, $a \in \mathbb{R}$. Then
(1) $v^R(x + \hat{x}) = v^R(x) + v^R(\hat{x})$,
(2) $v^R(ax) = a\, v^R(x)$,
(3) $v^R(xy) = M_Q \ltimes \Big( \sum_{i=1}^{n} (\delta_n^i)^T \ltimes \big(I_{4n} \otimes (\delta_n^i)^T\big) \Big) \ltimes v^R(x) \ltimes v^R(y)$.
Proof We only give a detailed proof of (3). Using (4.1), we have
\[
\begin{aligned}
v^R(xy) &= v^R(x_1y_1 + \cdots + x_ny_n) = v^R(x_1y_1) + \cdots + v^R(x_ny_n)\\
&= M_Q \ltimes v^R(x_1) \ltimes v^R(y_1) + \cdots + M_Q \ltimes v^R(x_n) \ltimes v^R(y_n)\\
&= M_Q \ltimes \big( v^R(x_1) \ltimes v^R(y_1) + \cdots + v^R(x_n) \ltimes v^R(y_n) \big)\\
&= M_Q \ltimes \big( (\delta_n^1)^T \ltimes v^R(x) \ltimes (\delta_n^1)^T \ltimes v^R(y) + \cdots + (\delta_n^n)^T \ltimes v^R(x) \ltimes (\delta_n^n)^T \ltimes v^R(y) \big)\\
&= M_Q \ltimes \Big( \sum_{i=1}^{n} (\delta_n^i)^T \ltimes v^R(x) \ltimes (\delta_n^i)^T \ltimes v^R(y) \Big)\\
&= M_Q \ltimes \Big( \sum_{i=1}^{n} (\delta_n^i)^T \ltimes \big(I_{4n} \otimes (\delta_n^i)^T\big) \Big) \ltimes v^R(x) \ltimes v^R(y). \qquad \square
\end{aligned}
\]
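Formula (4.1) is easy to check numerically. The following is a minimal sketch in plain Python, assuming quaternions are stored as lists [x1, x2, x3, x4]; the helper names (kron_vec, quat_mul_stp, quat_mul_direct, row_times_col) are illustrative and not taken from the chapter. For two column vectors the semi-tensor product reduces to the Kronecker product, so $M_Q \ltimes v^R(x) \ltimes v^R(y)$ is an ordinary matrix-vector product.

```python
# Structure matrix M_Q of the quaternion product: rows correspond to the
# components (1, i, j, k); columns to x1*y1, x1*y2, ..., x4*y4.
M_Q = [
    [1, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1, 0, 0, 0, 0, -1],
    [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, -1, 0],
    [0, 0, 1, 0, 0, 0, 0, -1, 1, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 0, 0, -1, 0, 0, 1, 0, 0, 0],
]


def kron_vec(u, v):
    """Kronecker product of two vectors given as flat lists."""
    return [a * b for a in u for b in v]


def quat_mul_stp(x, y):
    """v^R(xy) computed as M_Q (v^R(x) kron v^R(y)), i.e. formula (4.1)."""
    z = kron_vec(x, y)
    return [sum(m * t for m, t in zip(row, z)) for row in M_Q]


def quat_mul_direct(x, y):
    """v^R(xy) computed from the expanded quaternion product."""
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return [
        x1 * y1 - x2 * y2 - x3 * y3 - x4 * y4,
        x1 * y2 + x2 * y1 + x3 * y4 - x4 * y3,
        x1 * y3 - x2 * y4 + x3 * y1 + x4 * y2,
        x1 * y4 + x2 * y3 - x3 * y2 + x4 * y1,
    ]


def row_times_col(xs, ys):
    """v^R(xy) for a quaternion row xs and column ys, as the sum of v^R(x_i y_i)."""
    total = [0, 0, 0, 0]
    for x, y in zip(xs, ys):
        total = [s + t for s, t in zip(total, quat_mul_stp(x, y))]
    return total
```

For example, quat_mul_stp([0, 1, 0, 0], [0, 0, 1, 0]) reproduces $ij = k$, and reversing the arguments gives $ji = -k$, which makes the non-commutativity visible in the real stacking form.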
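Theorem 4.6 Let $A, \hat{A} \in \mathbb{Q}^{m \times n}$ and $B \in \mathbb{Q}^{n \times p}$. Then
(1) $v_r^R(A + \hat{A}) = v_r^R(A) + v_r^R(\hat{A})$, $\quad v_c^R(A + \hat{A}) = v_c^R(A) + v_c^R(\hat{A})$,
(2) $v_r^R(A) = v_c^R(A^T)$, $\quad v_c^R(A) = W_{[m,n]} \ltimes v_r^R(A)$, $\quad v_r^R(A) = W_{[n,m]} \ltimes v_c^R(A)$,
(3) $\|A\|_{(F)} = \|v_r^R(A)\| = \|v_c^R(A)\|$,
(4) $v_c^R(AB) = G\,(v_r^R(A) \ltimes v_c^R(B))$, $\quad v_r^R(AB) = G'\,(v_r^R(A) \ltimes v_c^R(B))$,
in which
\[
F = M_Q \ltimes \Big( \sum_{i=1}^{n} (\delta_n^i)^T \ltimes \big(I_{4n} \otimes (\delta_n^i)^T\big) \Big),
\]
and
\[
G = \begin{bmatrix}
F \ltimes (\delta_m^1)^T \ltimes [I_{4mn} \otimes (\delta_p^1)^T]\\
\vdots\\
F \ltimes (\delta_m^m)^T \ltimes [I_{4mn} \otimes (\delta_p^1)^T]\\
\vdots\\
F \ltimes (\delta_m^1)^T \ltimes [I_{4mn} \otimes (\delta_p^p)^T]\\
\vdots\\
F \ltimes (\delta_m^m)^T \ltimes [I_{4mn} \otimes (\delta_p^p)^T]
\end{bmatrix}, \qquad
G' = \begin{bmatrix}
F \ltimes (\delta_m^1)^T \ltimes [I_{4mn} \otimes (\delta_p^1)^T]\\
\vdots\\
F \ltimes (\delta_m^1)^T \ltimes [I_{4mn} \otimes (\delta_p^p)^T]\\
\vdots\\
F \ltimes (\delta_m^m)^T \ltimes [I_{4mn} \otimes (\delta_p^1)^T]\\
\vdots\\
F \ltimes (\delta_m^m)^T \ltimes [I_{4mn} \otimes (\delta_p^p)^T]
\end{bmatrix}.
\]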
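Proof We only prove (4). We partition $A$ by rows and $B$ by columns as follows:
\[
A = \begin{bmatrix} \mathrm{Row}_1(A)\\ \mathrm{Row}_2(A)\\ \vdots\\ \mathrm{Row}_m(A) \end{bmatrix}, \qquad
B = [\, \mathrm{Col}_1(B)\ \ \mathrm{Col}_2(B)\ \ \cdots\ \ \mathrm{Col}_p(B) \,].
\]
Then we have
\[
\begin{aligned}
v_c^R(AB) &= \begin{bmatrix}
v^R(\mathrm{Row}_1(A)\mathrm{Col}_1(B))\\ \vdots\\ v^R(\mathrm{Row}_m(A)\mathrm{Col}_1(B))\\ \vdots\\ v^R(\mathrm{Row}_1(A)\mathrm{Col}_p(B))\\ \vdots\\ v^R(\mathrm{Row}_m(A)\mathrm{Col}_p(B))
\end{bmatrix}
= \begin{bmatrix}
F \ltimes v^R(\mathrm{Row}_1(A)) \ltimes v^R(\mathrm{Col}_1(B))\\ \vdots\\ F \ltimes v^R(\mathrm{Row}_m(A)) \ltimes v^R(\mathrm{Col}_1(B))\\ \vdots\\ F \ltimes v^R(\mathrm{Row}_1(A)) \ltimes v^R(\mathrm{Col}_p(B))\\ \vdots\\ F \ltimes v^R(\mathrm{Row}_m(A)) \ltimes v^R(\mathrm{Col}_p(B))
\end{bmatrix}\\
&= \begin{bmatrix}
F \ltimes [(\delta_m^1)^T \ltimes v_r^R(A)] \ltimes [(\delta_p^1)^T \ltimes v_c^R(B)]\\ \vdots\\ F \ltimes [(\delta_m^m)^T \ltimes v_r^R(A)] \ltimes [(\delta_p^1)^T \ltimes v_c^R(B)]\\ \vdots\\ F \ltimes [(\delta_m^1)^T \ltimes v_r^R(A)] \ltimes [(\delta_p^p)^T \ltimes v_c^R(B)]\\ \vdots\\ F \ltimes [(\delta_m^m)^T \ltimes v_r^R(A)] \ltimes [(\delta_p^p)^T \ltimes v_c^R(B)]
\end{bmatrix}\\
&= \begin{bmatrix}
F \ltimes (\delta_m^1)^T \ltimes [I_{4mn} \otimes (\delta_p^1)^T]\\ \vdots\\ F \ltimes (\delta_m^m)^T \ltimes [I_{4mn} \otimes (\delta_p^1)^T]\\ \vdots\\ F \ltimes (\delta_m^1)^T \ltimes [I_{4mn} \otimes (\delta_p^p)^T]\\ \vdots\\ F \ltimes (\delta_m^m)^T \ltimes [I_{4mn} \otimes (\delta_p^p)^T]
\end{bmatrix} (v_r^R(A) \ltimes v_c^R(B)). \qquad \square
\end{aligned}
\]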
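Property (2) of Theorem 4.6, which converts between the row and column stacking forms through the swap matrix, can be illustrated with a short sketch in plain Python. It assumes the definition $W_{[m,n]} = [I_n \otimes \delta_m^1, \ldots, I_n \otimes \delta_m^m]$ and computes $W \ltimes v$ as $(W \otimes I_t)v$ when the length of $v$ is $t$ times the number of columns of $W$. All function names are illustrative, and a quaternion matrix is stored as nested lists whose entries are 4-element lists.

```python
def unit_vec(k, i):
    """delta_k^i: the i-th column of I_k (1-based), as a flat list."""
    return [1 if j == i - 1 else 0 for j in range(k)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def swap_matrix(m, n):
    """W_[m,n] = [I_n kron delta_m^1, ..., I_n kron delta_m^m], of size mn x mn."""
    blocks = [kron(identity(n), [[e] for e in unit_vec(m, i)])
              for i in range(1, m + 1)]
    # Concatenate the m blocks (each mn x n) side by side.
    return [[entry for blk in blocks for entry in blk[r]] for r in range(m * n)]


def stp_matvec(W, v):
    """Semi-tensor product W |x v = (W kron I_t) v, where len(v) = t * cols(W)."""
    cols = len(W[0])
    t = len(v) // cols
    return [sum(W[r][c] * v[c * t + s] for c in range(cols))
            for r in range(len(W)) for s in range(t)]


def row_stack(A):
    """Real row stacking form: entries taken row by row."""
    return [c for row in A for q in row for c in q]


def col_stack(A):
    """Real column stacking form: entries taken column by column."""
    m, n = len(A), len(A[0])
    return [c for j in range(n) for i in range(m) for c in A[i][j]]
```

With a $2 \times 3$ quaternion matrix A, stp_matvec(swap_matrix(2, 3), row_stack(A)) returns col_stack(A), and applying swap_matrix(3, 2) to the column stacking form recovers the row stacking form.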
For $G'$, the result follows by the same method.

Nowadays, Sylvester matrix equations are widely used in descriptor systems control theory, neural networks, robust and feedback control, graph theory, and many other disciplines. In the last ten years, interest in Sylvester matrix equations has expanded to quaternion algebra. Next, we take the quaternion Sylvester matrix equation $AX + XB = C$ as an example to illustrate the application of the real vector representation of a quaternion matrix. First, we recall some well-known results on real linear systems of equations.

Lemma 4.7 ([24]) The least squares solutions of the linear system of equations $Ax = b$, with $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^{m}$, can be represented as
\[
x = A^{\dagger} b + (I - A^{\dagger} A) y,
\]
where $y \in \mathbb{R}^{n}$ is an arbitrary vector. The minimal norm least squares solution of the linear system of equations $Ax = b$ is $A^{\dagger} b$.

Lemma 4.8 ([24]) The linear system of equations $Ax = b$, with $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^{m}$, has a solution $x \in \mathbb{R}^{n}$ if and only if
\[
A A^{\dagger} b = b.
\]
In that case, it has the general solution
\[
x = A^{\dagger} b + (I - A^{\dagger} A) y,
\]
where $y \in \mathbb{R}^{n}$ is an arbitrary vector. The minimal norm solution of the linear system of equations $Ax = b$ is $A^{\dagger} b$.

Theorem 4.9 Let $A \in \mathbb{Q}^{n \times n}$, $B \in \mathbb{Q}^{n \times n}$, and denote
\[
\tilde{M} = G'_1 \ltimes v_r^R(A) + G'_1 \ltimes W_{[n,n]} \ltimes W_{[4n^2,4n^2]} \ltimes v_c^R(B), \tag{4.2}
\]
where $G'_1$ has the same structure as $G'$ in Theorem 4.6, except for the dimension. Then the least squares solutions of the quaternion Sylvester matrix equation $AX + XB = C$ can be represented as
\[
S_Q = \left\{ X \in \mathbb{Q}^{n \times n} \;\middle|\; v_c^R(X) = \tilde{M}^{\dagger} v_r^R(C) + (I_{4n^2} - \tilde{M}^{\dagger} \tilde{M}) y,\ \forall y \in \mathbb{R}^{4n^2} \right\}, \tag{4.3}
\]
and the minimal norm least squares solution $X_Q$ satisfies
\[
v_c^R(X_Q) = \tilde{M}^{\dagger} v_r^R(C). \tag{4.4}
\]
Proof By Theorem 4.6, we get
\[
\begin{aligned}
\|AX + XB - C\|_{(F)} &= \|v_r^R(AX + XB - C)\|\\
&= \|v_r^R(AX) + v_r^R(XB) - v_r^R(C)\|\\
&= \|G'_1 \ltimes v_r^R(A) \ltimes v_c^R(X) + G'_1 \ltimes v_r^R(X) \ltimes v_c^R(B) - v_r^R(C)\|\\
&= \|G'_1 \ltimes v_r^R(A) \ltimes v_c^R(X) + G'_1 \ltimes W_{[n,n]} \ltimes v_c^R(X) \ltimes v_c^R(B) - v_r^R(C)\|\\
&= \|G'_1 \ltimes v_r^R(A) \ltimes v_c^R(X) + G'_1 \ltimes W_{[n,n]} \ltimes W_{[4n^2,4n^2]} \ltimes v_c^R(B) \ltimes v_c^R(X) - v_r^R(C)\|\\
&= \big\| \big( G'_1 \ltimes v_r^R(A) + G'_1 \ltimes W_{[n,n]} \ltimes W_{[4n^2,4n^2]} \ltimes v_c^R(B) \big) v_c^R(X) - v_r^R(C) \big\|\\
&= \|\tilde{M} v_c^R(X) - v_r^R(C)\|.
\end{aligned}
\]
Thus, $\|AX + XB - C\|_{(F)} = \min$ if and only if
\[
\|\tilde{M} v_c^R(X) - v_r^R(C)\| = \min.
\]
For the real matrix equation
\[
\tilde{M} v_c^R(X) = v_r^R(C),
\]
by Lemma 4.7, its least squares solutions can be represented as
\[
v_c^R(X) = \tilde{M}^{\dagger} v_r^R(C) + (I_{4n^2} - \tilde{M}^{\dagger} \tilde{M}) y, \qquad \forall y \in \mathbb{R}^{4n^2}.
\]
Thus, we get formula (4.3). Notice that
\[
\min_{X \in \mathbb{Q}^{n \times n}} \|X\|_{(F)} \iff \min_{v_c^R(X) \in \mathbb{R}^{4n^2}} \|v_c^R(X)\|;
\]
according to the previous part of the proof, the minimal norm least squares solution $X_Q$ satisfies
\[
v_c^R(X_Q) = \tilde{M}^{\dagger} v_r^R(C).
\]
Therefore, (4.4) holds. □
We can also get the necessary and sufficient condition for the compatibility of the quaternion Sylvester matrix equation $AX + XB = C$ and the expression of the solution when $AX + XB = C$ is compatible.

Corollary 4.10 Let $A \in \mathbb{Q}^{n \times n}$, $B \in \mathbb{Q}^{n \times n}$, and let $\tilde{M}$ be as in (4.2). Then the quaternion Sylvester matrix equation $AX + XB = C$ has a solution $X \in \mathbb{Q}^{n \times n}$ if and only if
\[
(\tilde{M} \tilde{M}^{\dagger} - I_{4n^2})\, v_r^R(C) = O. \tag{4.5}
\]
Moreover, if (4.5) holds, the solution set of the quaternion Sylvester matrix equation $AX + XB = C$ can be represented as
\[
S_Q = \left\{ X \in \mathbb{Q}^{n \times n} \;\middle|\; v_c^R(X) = \tilde{M}^{\dagger} v_r^R(C) + (I_{4n^2} - \tilde{M}^{\dagger} \tilde{M}) y,\ \forall y \in \mathbb{R}^{4n^2} \right\},
\]
and the minimal norm solution $X_Q$ satisfies
\[
v_c^R(X_Q) = \tilde{M}^{\dagger} v_r^R(C). \tag{4.6}
\]
Proof The quaternion Sylvester matrix equation $AX + XB = C$ has a solution $X \in \mathbb{Q}^{n \times n}$ if and only if
\[
\|AX + XB - C\|_{(F)} = O.
\]
By Theorem 4.9 and the properties of the MP inverse, we get
\[
\begin{aligned}
\|AX + XB - C\|_{(F)} &= \|\tilde{M} v_c^R(X) - v_r^R(C)\|\\
&= \|\tilde{M} \tilde{M}^{\dagger} \tilde{M} v_c^R(X) - v_r^R(C)\|\\
&= \|\tilde{M} \tilde{M}^{\dagger} v_r^R(C) - v_r^R(C)\|\\
&= \|(\tilde{M} \tilde{M}^{\dagger} - I_{4n^2})\, v_r^R(C)\|.
\end{aligned}
\]
Therefore, for $X \in S_Q$, we obtain
\[
\begin{aligned}
\|AX + XB - C\|_{(F)} = O &\iff \|(\tilde{M} \tilde{M}^{\dagger} - I_{4n^2})\, v_r^R(C)\| = O\\
&\iff (\tilde{M} \tilde{M}^{\dagger} - I_{4n^2})\, v_r^R(C) = O.
\end{aligned}
\]
In case that $AX + XB = C$ is compatible, its solution $X \in \mathbb{Q}^{n \times n}$ satisfies
\[
\tilde{M} v_c^R(X) = v_r^R(C).
\]
Moreover, according to Lemma 4.8, the solution $X$ satisfies
\[
v_c^R(X) = \tilde{M}^{\dagger} v_r^R(C) + (I_{4n^2} - \tilde{M}^{\dagger} \tilde{M}) y,
\]
and we can obtain the minimal norm solution $X_Q$ that satisfies
\[
v_c^R(X_Q) = \tilde{M}^{\dagger} v_r^R(C). \qquad \square
\]

We give a numerical example to test the effectiveness of the above method.

Example Consider the quaternion Sylvester matrix equation $AX + XB = C$, where
\[
A, B = \mathrm{rand}(n, n) + \mathrm{rand}(n, n)i + \mathrm{rand}(n, n)j + \mathrm{rand}(n, n)k.
\]
Here $\mathrm{rand}(n, n)$ refers to a function that randomly generates $n \times n$ real matrices. Let $X_Q = \mathrm{rand}(n, n) + \mathrm{rand}(n, n)i + \mathrm{rand}(n, n)j + \mathrm{rand}(n, n)k$, and compute
\[
C = A X_Q + X_Q B.
\]
Obviously, the quaternion Sylvester matrix equation $AX + XB = C$ has the exact solution $X_Q$. According to Corollary 4.10, we compute the numerical solution $\breve{X}_Q$. Let $n = 2K$ $(K = 1 : 8)$, and denote $\varepsilon = \log_{10} \|X_Q - \breve{X}_Q\|$. The relation between $K$ and the error $\varepsilon$ is shown in Figure 1.
Figure 1 ε under different matrix dimensions (horizontal axis: K; vertical axis: ε)
We see clearly that this method only involves real operations and completely avoids quaternion operations, which is one of the advantages of this method. From the above numerical experiment, we can see that this method is effective.
5 Vector Operator Method of Solving Quaternion Matrix Equation

Vector operators play an important role in solving matrix equations. However, the application of vector operators to quaternion matrix equations is limited by the non-commutativity of the quaternion product. The properties of vector operators can be revisited by using the semi-tensor product of matrices over the quaternion skew-field.

Definition 5.1 Let $A = (a_{ij})_{m \times n} \in \mathbb{Q}^{m \times n}$. Then
\[
V_c(A) = (a_{11}, \cdots, a_{m1}, a_{12}, \cdots, a_{m2}, \cdots, a_{1n}, \cdots, a_{mn})^T,
\]
\[
V_r(A) = (a_{11}, \cdots, a_{1n}, a_{21}, \cdots, a_{2n}, \cdots, a_{m1}, \cdots, a_{mn})^T.
\]
Theorem 5.2 Let A 2 m × n , X 2 n × q , Y 2 p × m , and then ð1Þ
V r ðAXÞ = A ⋉ V r ðXÞ, V c ðAXÞ = A ⋊ V c ðXÞ:
ð2Þ
V c ðYAÞ = AH ⋉ V c ðYÞ, V r ðYAÞ = AH ⋊ V r ðYÞ:
Proof (1) For V r ðAXÞ = A ⋉ V r ðXÞ: Suppose C = AX, ai(i = 1, ⋯ , m) represents the i-th row of matrix A, x j( j = 1, ⋯ , n) represents the j-th row of matrix X, and ci(i = 1, ⋯ , m) represents the i-th row of matrix C, then the i-th block of A ⋉ V r ðXÞ is n k = 1 aik xk1
T
ðx1 Þ ai ⋉ V r ðXÞ = ai ⋉
⋮ ðxn ÞT
=
⋮
T
= ðci Þ ,
n k = 1 aik xkq
and then we have V r ðAXÞ = A ⋉ V r ðXÞ: By the properties of the swap matrix and V r ðAXÞ = A ⋉ V r ðXÞ, we have V c ðAXÞ = W ½m,q ⋉ V r ðAXÞ = W ½m,q ⋉ A ⋉ V r ðXÞ = W ½m,q ⋉ A ⋉ W ½q,n ⋉ V c ðXÞ = A ⋊ V c ðXÞ: (2) According to (1) and Vr(AT) = Vc(A), we can obtain T
V c ðYAÞ = V r ððYAÞ Þ = V r ððYAÞH Þ = V r ðAH Y H Þ T
= AH ⋉ V r ðY H Þ = AH ⋉ V r ðY Þ = AH ⋉ V c ðYÞ: By the properties of the swap matrix and V c ðYAÞ = AH ⋉ V c ðYÞ, we have V r ðYAÞ = W ½n,p ⋉ V c ðYAÞ = W ½n,p ⋉ AH ⋉ V c ðYÞ = W ½n,p ⋉ AH ⋉ W ½p,m ⋉ V r ðYÞ = AH ⋊ V r ðYÞ: □ By using Theorem 5.2, the quaternion matrix equation can be transformed into a quaternion linear system of equations, so as to change the position of the variable X, and then the quaternion matrix equation can be solved. And
the quaternion Sylvester matrix equation AX + XB = C is taken as an example to illustrate this method. Theorem 5.3 Let A 2 n × n , B 2 n × n , C = C1 + C2j, In A = D1 + D2j, BH In = E1 + E2j, X = X r þ X i i þ X j j þ X k k 2 n × n , and
Ñ
=
=
i I n2
O
O
D1 þ E 1
O
E2
- D2
I n2
- i I n2
O
O
- E2
D2
E1
D1
O
O
I n2
i I n2
O
O
I n2
- i I n2
ð
V c ðC1 Þ
ð
V c ðC1 Þ
Re T~
I n2
ReðÑÞ
~= , C
ImðÑÞ Im
V c ðC2 Þ
V c ðC2 Þ
,
Þ Þ
:
ð5:1Þ Now we give the set of least squares solutions of quaternion Sylvester matrix equation AX + XB = C. V c ðX r Þ SQ =
X 2 n × n j
V c ðX i Þ V c ðX j Þ
{ ~ þ ðI 4n2 - T~ { TÞz ~ = T~ C ,
V c ðX k Þ where z is an arbitrary vector of appropriate order. And then the minimal norm solution X Q = X rQ þ X iQ i þ X jQ j þ X kQ k satisfies V c ðX rQ Þ V c ðX iQ Þ V c ðX jQ Þ V c ðX kQ Þ
{ ~ = T~ C:
Proof Using Theorem 5.2, we can obtain kAX þ XB - C k 2ðFÞ = kV c ðAX þ XB - CÞk 2ðFÞ = kV c ðAXÞ þ V c ðXBÞ - V c ðCÞk 2ðFÞ = A ⋊ V c ðXÞ þ BH ⋉ V c ðXÞ - V c ðCÞ
2 ðFÞ
= ðI n AÞV c ðXÞ þ ðB I n ÞV c ðXÞ - V c ðCÞ H
2 ðFÞ
= ðD1 þ D2 jÞðV c ðX 1 þ X 2 jÞÞ þ ðE1 þ E 2 jÞV c ðX 1 - X 2 jÞ - V c ðC 1 þ C 2 jÞ = D1 V c ðX 1 Þ þ D1 V c ðX 2 Þj þ D2 V c ðX 1 Þj - D2 V c ðX 2 Þ
2 ðFÞ
2 ðFÞ
þ E1 V c ðX 1 Þ - E1 V c ðX 2 Þj þ E 2 V c ðX 1 Þj þ E2 V c ðX 2 Þ - ðV c ðC 1 Þ þ V c ðC2 ÞjÞ
2 ðFÞ
= D1 V c ðX 1 Þ - D2 V c ðX 2 Þ þ E 1 V c ðX 1 Þ þ E2 V c ðX 2 Þ - V c ðC 1 Þ þkD1 V c ðX 2 Þ þ D2 V c ðX 1 Þ - E2 V c ðX 1 Þ þ E1 V c ðX 2 Þ - V c ðC2 Þk V c ðX 1 Þ =
D1 þ E 1
O
E2
- D2
V c ðX 1 Þ
- E2
D2
E 1 þ D1
O
V c ðX 2 Þ
-
V c ðC1 Þ V c ðC2 Þ
V c ðX 2 Þ V c ðX r þ X i iÞ =
D1 þ E 1
O
E2
- D2
V c ðX r - X i iÞ
- E2
D2
E 1 þ D1
O
V c ðX j þ X k iÞ
-
V c ðC 1 Þ V c ðC 2 Þ
V c ðX j - X k iÞ
=
-
i I n2
O
O
V c ðX r Þ
D1 þ E 1
O
E2
- D2
I n2
- i I n2
O
O
V c ðX i Þ
- E2
D2
E 1 þ D1
O
O
O
I n2
i I n2
V c ðX j Þ
O
O
I n2
- i I n2
V c ðX k Þ
V c ðC 1 Þ V c ðC 2 Þ V c ðX r Þ
= Ñ
I n2
V c ðX i Þ V c ðX j Þ V c ðX k Þ
-
V c ðC1 Þ V c ðC2 Þ
V c ðX r Þ = ðReðÑÞ þ ImðÑÞiÞ
V c ðX i Þ
- Re
V c ðX j Þ
ð
V c ðC1 Þ V c ðC2 Þ
Þ þ Imð
V c ðC1 Þ V c ðC2 Þ
Þi
V c ðX k Þ V c ðX r Þ =
ReðÑÞ
V c ðX i Þ
ImðÑÞ
V c ðX j Þ
Re Im
V c ðX k Þ
ð
ð
V c ðC1 Þ V c ðC2 Þ V c ðC1 Þ V c ðC2 Þ
Þ Þ
V c ðX r Þ = T~
V c ðX i Þ V c ðX j Þ
~ -C
V c ðX k Þ
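Since $\|AX + XB - C\|^2_{(F)} = \min$ if and only if
\[
\left\| \tilde{T} \begin{bmatrix} V_c(X_r)\\ V_c(X_i)\\ V_c(X_j)\\ V_c(X_k) \end{bmatrix} - \tilde{C} \right\|^2 = \min,
\]
for the real matrix equation
\[
\tilde{T} \begin{bmatrix} V_c(X_r)\\ V_c(X_i)\\ V_c(X_j)\\ V_c(X_k) \end{bmatrix} = \tilde{C},
\]
by Lemma 4.7, its least squares solutions can be represented as
\[
\begin{bmatrix} V_c(X_r)\\ V_c(X_i)\\ V_c(X_j)\\ V_c(X_k) \end{bmatrix} = \tilde{T}^{\dagger} \tilde{C} + (I_{4n^2} - \tilde{T}^{\dagger} \tilde{T}) z.
\]
And the minimal norm least squares solution $X_Q$ can be expressed as
\[
\begin{bmatrix} V_c(X_{rQ})\\ V_c(X_{iQ})\\ V_c(X_{jQ})\\ V_c(X_{kQ}) \end{bmatrix} = \tilde{T}^{\dagger} \tilde{C}. \qquad \square
\]

Corollary 5.4 Let $A \in \mathbb{Q}^{n \times n}$, $B \in \mathbb{Q}^{n \times n}$, and let $\tilde{N}$, $\tilde{T}$, $\tilde{C}$ be as in (5.1). Then the quaternion Sylvester matrix equation $AX + XB = C$ has a solution $X \in \mathbb{Q}^{n \times n}$ if and only if
\[
(\tilde{T} \tilde{T}^{\dagger} - I_{4n^2})\, \tilde{C} = 0. \tag{5.2}
\]
Moreover, if (5.2) holds, the solution set of the quaternion Sylvester matrix equation $AX + XB = C$ can be represented as
\[
S_Q = \left\{ X \in \mathbb{Q}^{n \times n} \;\middle|\; \begin{bmatrix} V_c(X_r)\\ V_c(X_i)\\ V_c(X_j)\\ V_c(X_k) \end{bmatrix} = \tilde{T}^{\dagger} \tilde{C} + (I_{4n^2} - \tilde{T}^{\dagger} \tilde{T}) z \right\},
\]
where $z$ is an arbitrary vector of appropriate order. And then the minimal norm solution $X_Q = X_{rQ} + X_{iQ} i + X_{jQ} j + X_{kQ} k$ satisfies
\[
\begin{bmatrix} V_c(X_{rQ})\\ V_c(X_{iQ})\\ V_c(X_{jQ})\\ V_c(X_{kQ}) \end{bmatrix} = \tilde{T}^{\dagger} \tilde{C}.
\]

Now we provide a numerical example to test the effectiveness of the above method. Let $A \in \mathbb{Q}^{n \times n}$, $B \in \mathbb{Q}^{n \times n}$, $X \in \mathbb{Q}^{n \times n}$ with $n = 2L$ $(L = 1 : 30)$. We compare the error between the exact solution and the solution obtained by the above method, denoted as $\kappa$. The values of $\kappa$ under different matrix dimensions are recorded in Figure 2.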
Figure 2 κ under different matrix dimensions (horizontal axis: L; vertical axis: κ)
As can be seen from Figure 2, the error between the exact solution and the solution obtained by the above method is very small, indicating the effectiveness of the method. All computations are performed on an Intel(R) Core(TM) i9-10940U @3.30GHz/64GB computer using MATLAB R2019b software.

Two kinds of methods for solving quaternion matrix equations have been proposed based on semi-tensor products of matrices. In Section 4, the real vector representation of the quaternion matrix is proposed first, and some new properties are obtained. The quaternion matrix equation problem can be equivalently transformed into a real vector equation problem in one step, which simplifies the process of solving the quaternion matrix equation. However, this method handles only relatively small problems because of the number of matrix operations involved. In Section 5, some new properties of vector operators for quaternion matrices are obtained by using semi-tensor products of matrices. The position of variables in quaternion matrix equations can be changed directly without the help of large matrices, so larger quaternion matrix equation problems can be solved. At present, we are combining the vector operator method with several equivalent representations of quaternion matrices and proposing some new methods to solve quaternion matrix equation problems. In addition, the ideas of the two kinds of methods proposed in this paper can be carried over to other number systems, such as commutative quaternions, split quaternions, octonions, sedenions, and so on. Therefore, the semi-tensor product of matrices will become a powerful tool for solving problems of hypercomplex matrix equations.

Acknowledgements This work is supported by the National Natural Science Foundation of China [grant number 62176112] and the Natural Science Foundation of Shandong Province [grant number ZR2020MA053].
References

1. Beik, F., & Salman, A. A. (2015). An iterative algorithm for η-(anti)-Hermitian least-squares solutions of quaternion matrix equations. Electronic Journal of Linear Algebra, 30, 372–401
2. Song, C. Q., Chen, G. L., & Zhang, X. Y. (2012). An iterative solution to coupled quaternion matrix equations. Filomat, 26(4), 809–826
3. Wang, M. H., Wei, M. S., & Feng, Y. (2008). An iterative algorithm for least squares problem in quaternionic quantum theory. Computer Physics Communications, 179(4), 203–207
4. Salman, A. A., & Fatemeh, P. A. B. (2017). Iterative algorithms for least-squares solutions of a quaternion matrix equation. Journal of Applied Mathematics and Computing, 53(1), 95–127
5. Wang, Q. W., He, Z. H., & Zhang, Y. (2019). Constrained two-sided coupled Sylvester-type quaternion matrix equations. Automatica, 101, 207–213
6. Wang, Q. W., Yang, X. X., & Yuan, S. F. (2018). The least square solution with the least norm to a system of quaternion matrix equations. Iranian Journal of Science and Technology, Transaction A, 42, 1317–1325
7. Rehman, A., Wang, Q. W., Ali, I., Akram, M., & Ahmad, M. O. (2017). A constraint system of generalized Sylvester quaternion matrix equations. Advances in Applied Clifford Algebras, 27, 3183–3196
8. Rehman, A., Wang, Q. W., & He, Z. H. (2015). Solution to a system of real quaternion matrix equations encompassing η-Hermicity. Applied Mathematics and Computation, 265, 945–957
9. Liu, L. S., Wang, Q. W., Chen, J. F., & Xie, Y. Z. (2022). An exact solution to a quaternion matrix equation with an application. Symmetry, 14(2), 375
10. Mehany, M. S., & Wang, Q. W. (2022). Three symmetrical systems of coupled Sylvester-like quaternion matrix equations. Symmetry, 14(3), 550
11. Yuan, S. F., Liao, A. P., & Lei, Y. (2008). Least squares Hermitian solution of the matrix equation (AXB, CXD) = (E, F) with the least norm over the skew field of quaternions. Mathematical and Computer Modelling, 48, 91–100
12. Yuan, S. F., Wang, Q. W., & Duan, X. F. (2013). On solutions of the quaternion matrix equation AX = B and their applications in color image restoration. Applied Mathematics and Computation, 221, 10–20
13. Yuan, S. F., Wang, Q. W., & Zhang, X. (2013). Least-squares problem for the quaternion matrix equation AXB + CYD = E over different constrained matrices. International Journal of Computer Mathematics, 90, 565–576
14. Yuan, S. F., & Wang, Q. W. (2012). Two special kinds of least squares solutions for the quaternion matrix equation AXB + CXD = E. Electronic Journal of Linear Algebra, 23, 257–274
15. Zhang, F. X., Wei, M. S., Li, Y., & Zhao, J. L. (2016). Special least squares solutions of the quaternion matrix equation AXB + CXD = E. Computers & Mathematics with Applications, 72, 1426–1435
16. Zhang, F. X., Wei, M. S., Li, Y., & Zhao, J. L. (2015). Special least squares solutions of the quaternion matrix equation AX = B with applications. Applied Mathematics and Computation, 270, 425–433
17. Ding, W. X., Li, Y., & Wang, D. (2021). A real method for solving quaternion matrix equation X - A X B = C based on semi-tensor product of matrices. Advances in Applied Clifford Algebras, 31(2), 4–17
18. Wang, D., Li, Y., & Ding, W. X. (2021). Several kinds of special least squares solutions to quaternion matrix equation AXB = C. Journal of Applied Mathematics and Computing, 1–19
19. Fan, X. L., Li, Y., Liu, Z. H., & Zhao, J. L. (2022). Solving quaternion linear system based on semi-tensor product of quaternion matrices. Symmetry, 14(7), 1359
20. Liu, Z. H., Li, Y., Fan, X. L., & Ding, W. X. (2022). A new method of solving special solutions of quaternion generalized Lyapunov matrix equation. Symmetry, 14(6), 1120
21. Wei, M. S., Li, Y., Zhang, F. X., & Zhao, J. L. (2018). Quaternion matrix computations. New York: Nova Science Publisher
22. Cheng, D. Z., Qi, H. S., & Li, Z. Q. (2011). Analysis and control of Boolean networks: A semi-tensor product approach. Springer
23. Cheng, D. Z. (2019). From dimension-free matrix theory to cross-dimensional dynamic systems. Academic Press
24. Golub, G. H., & Van Loan, C. F. (2013). Matrix computations (4th edn.). Baltimore: The Johns Hopkins University Press
Geometric Mean and Matrix Quadratic Equations

Mitsuru Uchiyama

Abstract We review and extend the relation between roots and coefficients of a numerical quadratic equation to that of a matrix quadratic equation by using the matrix geometric mean. The idea seems to be quite new. Let $A, B$ be $N \times N$ matrices such that $A \ge B \ge 0$. We show that the matrix equation $\frac{X+Y}{2} = A$, $X \# Y = B$, $0 \le X \le Y$ has a unique solution $X = A - (A - B)\#(A + B)$, $Y = A + (A - B)\#(A + B)$, where the symbol "#" denotes the matrix geometric mean. We then determine every solution $\{X, Y\}$ of the same equation under the weaker condition $0 \le X$, $0 \le Y$ instead of $0 \le X \le Y$. Such $X$ and $Y$ both satisfy the quadratic equation $XA^{\dagger}X - 2X + BA^{\dagger}B = 0$, where $A^{\dagger}$ denotes the generalized inverse of $A$. Conversely, if $X$ satisfies this equation, then so does $Y := 2A - X$ and we have $X \# Y = B$. We also state a similar result regarding the equation $\frac{X+Y}{2} = A$, $X \,!\, Y = C$ for given $A \ge C \ge 0$, where "!" denotes the matrix harmonic mean. As an application, we decompose simultaneously $X$ and $Y$ such that $X \# Y = a(X + Y)$, where $0 < a < 1/2$.

Keywords Positive definite matrix • Operator geometric mean • Operator harmonic mean • Matrix quadratic equation • Generalized inverse

Mathematics Subject Classification (MSC2020) Primary 47A64, 15A24 • Secondary 15A39, 47A63
The author was supported in part by (JSPS) KAKENHI 17K05286. M. Uchiyama (✉) Shimane University Matsue, Matsue, Japan Ritsumeikan University Otsu, Otsu, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_48
1 Introduction

We begin by recalling an elementary problem: for given real numbers $a > b > 0$, find $x, y > 0$ such that
\[
\frac{x + y}{2} = a, \qquad \sqrt{xy} = b.
\]
These $x$ and $y$ satisfy the quadratic equation
\[
x^2 - 2ax + b^2 = 0.
\]
By solving this equation, we get
\[
x = a \pm \sqrt{a^2 - b^2}, \qquad y = a \mp \sqrt{a^2 - b^2}.
\]
In this chapter we consider the same problem for finite Hermitian matrices and obtain solutions given by the same formulas.
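The scalar problem can be checked in a few lines. The sketch below is only an illustration of the formulas above; the function name solve_means and the sample values $a = 5$, $b = 4$ are chosen here and do not come from the chapter.

```python
import math


def solve_means(a, b):
    """Return (x, y) with x <= y, (x + y) / 2 == a and sqrt(x * y) == b.

    Assumes a >= b > 0, so that a^2 - b^2 is nonnegative.
    """
    d = math.sqrt(a * a - b * b)
    return a - d, a + d


# With a = 5 and b = 4 the discriminant term is sqrt(25 - 16) = 3.
x, y = solve_means(5.0, 4.0)
arithmetic_mean = (x + y) / 2
geometric_mean = math.sqrt(x * y)
```

Both roots solve $x^2 - 2ax + b^2 = 0$, and swapping them gives the second solution pair of the original problem.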
1.1 Positivity
Let $A$ be an $N \times N$ complex matrix and $A^*$ the conjugate transpose of $A$. We naturally consider $A$ as a linear transformation on $\mathbb{C}^N$ with the conventional inner product $(x, y)$. We denote the range and the null space of $A$ by $\mathcal{R}(A)$ and $\mathcal{N}(A)$, respectively, and we thereby get
\[
\mathbb{C}^N = \mathcal{R}(A) \oplus \mathcal{N}(A^*),
\]
where $\oplus$ means the orthogonal sum. The set of all eigenvalues of $A$ is denoted by $\sigma(A)$. By taking account of the formula
\[
4(Ax, y) = \sum_{k=1}^{4} i^k \big(A(x + i^k y),\, x + i^k y\big),
\]
we can see that $A$ is Hermitian, i.e., $A = A^*$, if and only if $(Ax, x)$ is real for every $x \in \mathbb{C}^N$. Thus, every eigenvalue of a Hermitian $A$ is real, and setting $\sigma(A) = \{\lambda_1, \ldots, \lambda_k\}$ we have
\[
\mathbb{C}^N = \mathcal{N}(A - \lambda_1 I) \oplus \cdots \oplus \mathcal{N}(A - \lambda_k I),
\]
where $I$ denotes the identity matrix. We abbreviate $mI$ to $m$ unless it would be misunderstood. Let $P_i$ be the orthogonal projection onto $\mathcal{N}(A - \lambda_i)$. Then the above decomposition of the space leads us to the spectral decomposition of the Hermitian $A$:
\[
A = \lambda_1 P_1 \oplus \cdots \oplus \lambda_k P_k, \qquad P_1 \oplus \cdots \oplus P_k = I \qquad (1 \le k \le N). \tag{1.1}
\]
For $a$ and $b$ in $\mathbb{C}^N$, define an operator $a \otimes b$ by $(a \otimes b)x := (x, b)a$. Then $a \otimes a$ is a projection if $\|a\| = 1$. Let $L$ be a linear subspace with an orthonormal basis $\{e_1, \cdots, e_k\}$. Then the orthogonal projection $P$ onto $L$ is represented as
\[
P = e_1 \otimes e_1 + \cdots + e_k \otimes e_k.
\]
$A$ is called positive semi-definite, denoted by $A \ge 0$, if $(Ax, x) \ge 0$ for every $x$, and positive definite, denoted by $A > 0$, if $(Ax, x) > 0$ for every $x \ne 0$. Recall that a Hermitian matrix $A$ is positive semi-definite if and only if all principal minors of $A$ are nonnegative. If $A$ is positive definite, then the inverse $A^{-1}$ exists and is also positive definite. Suppose $A \ge 0$ and $\mathcal{N}(A) \ne \{0\}$. Since the restriction of $A$ to $\mathcal{R}(A)$ is invertible, the generalized inverse $A^{\dagger}$ is defined by
\[
A^{\dagger} := (A|_{\mathcal{R}(A)})^{-1} P_A,
\]
where $P_A$ is the orthogonal projection onto $\mathcal{R}(A)$. We have
\[
A^{\dagger} A = A A^{\dagger} = P_A.
\]
1.2 Square Roots
Let $A$ be a Hermitian matrix, and let (1.1) be the spectral decomposition of $A$. Then $P_i$ $(i = 1, \cdots, k)$ commutes with every $X$ which commutes with $A$. Indeed, since $\mathcal{N}(A - \lambda_i)$ is invariant for $X$, we have $(I - P_i) X P_i = 0$. Since $X^*$ also commutes with $A$, $(I - P_i) X^* P_i = 0$. Thus
\[
X P_i = P_i X P_i = (P_i X^* P_i)^* = P_i X.
\]
Let $f(t)$ be a real continuous function defined on an interval including $\sigma(A)$. Then $f(A)$ is defined by
\[
f(A) = f(\lambda_1) P_1 \oplus \cdots \oplus f(\lambda_k) P_k.
\]
It is trivial to see that if $\{f_n\}$ converges uniformly to $f$ on $\sigma(A)$, then $\{f_n(A)\}$ converges to $f(A)$ in the operator norm. We are now ready to state a fundamental result: every $A \ge 0$ has a unique positive semi-definite square root, which is denoted by $A^{1/2}$.

Proof Let (1.1) be the spectral decomposition of $A$. We notice that $\lambda_i \ge 0$ and $\lambda_i \ne \lambda_j$ if $i \ne j$. For $f(t) = t^{1/2}$ $(0 \le t < \infty)$ define $f(A)$ as above. It is clear that $f(A) \ge 0$ and $f(A)^2 = A$. We need to show the uniqueness. Suppose $B \ge 0$ and $B^2 = A$. Since $BA = B^3 = AB$, $B$ commutes with every $P_i$. Let $B_i$ be the restriction of $B$ to $\mathcal{N}(A - \lambda_i)$. Then $B$ is an orthogonal sum of the $B_i$. It is apparent that $B_i \ge 0$. Take an arbitrary eigenvalue $\mu \ge 0$ of $B_i$ and a corresponding vector $0 \ne x$ in $\mathcal{N}(A - \lambda_i)$, i.e., $B_i x = \mu x$. Since
\[
\mu^2 x = B_i^2 x = B^2 x = Ax = \lambda_i x,
\]
we have $\mu = \sqrt{\lambda_i}$. This says that $B_i x = \sqrt{\lambda_i}\, x$ for every $x$ in $\mathcal{N}(A - \lambda_i)$. We thereby arrive at $B = f(A)$. □

It goes without saying that $A^{1/2} > 0$ if $A > 0$.
1.3 Order
For Hermitian matrices $A, B$, we write $A \ge B$ if $A - B \ge 0$, and $A > B$ if $A - B > 0$. This order has curious properties. First,
\[
A \ge B \ge 0 \ \not\Rightarrow\ A^2 \ge B^2.
\]
For instance, put
\[
A := \begin{bmatrix} 2 & 1\\ 1 & 1 \end{bmatrix}, \qquad B := \begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}.
\]
Then $A \ge B \ge 0$, but $A^2 \not\ge B^2$. Second,
\[
A \ge B \ge 0 \ \Rightarrow\ A^{1/2} \ge B^{1/2}.
\]
This is the most important fact in the study of operator inequalities, and many proofs are known. Here, however, we give a proof which seems to be new.
Proof Assume $A > 0$. From the hypothesis,
\[
\|B^{1/2} A^{-1/2} x\| \le \|x\| \qquad (\forall x).
\]
Take an arbitrary eigenvalue $\lambda > 0$ of $A^{-1/4} B^{1/2} A^{-1/4}$ and a corresponding eigenvector $y$. Since $A^{-1/4} B^{1/2} A^{-1/4} y = \lambda y$, we have $B^{1/2} A^{-1/2} (A^{1/4} y) = \lambda A^{1/4} y$. By virtue of the above inequality,
\[
\|\lambda A^{1/4} y\| \le \|A^{1/4} y\|.
\]
This says $0 < \lambda \le 1$ because $y \ne 0$. We consequently obtain $A^{-1/4} B^{1/2} A^{-1/4} \le I$ and hence $B^{1/2} \le A^{1/2}$. For the general case, consider $A + \varepsilon$ and take the limit of $(A + \varepsilon)^{1/2} \ge B^{1/2}$. □

Let $A \ge 0$, $B \ge 0$, and $C \ge 0$. By applying the above result, one can verify the following:
(i) If $A > 0$, then $BAB \le CAC \Rightarrow B \le C$.
(ii) If $A \ge 0$, then $BAB \le CAC \Rightarrow PBP \le PCP$, where $P$ is the orthogonal projection onto $\mathcal{R}(A)$.

In general, a function $f(t)$ which has the property
\[
A \ge B \ \Rightarrow\ f(A) \ge f(B)
\]
is called an operator monotone function, and it has been deeply investigated by Löwner [9] (cf. [11]).
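The two order properties can be checked numerically on the matrices $A$ and $B$ of the example above. The following sketch is restricted to real symmetric $2 \times 2$ matrices and uses two standard facts that are assumptions of the sketch, not statements from the chapter: such a matrix is positive semi-definite exactly when its trace and determinant are nonnegative, and for a nonzero positive semi-definite $M$ the Cayley-Hamilton theorem gives $M^{1/2} = (M + \sqrt{\det M}\, I)/\sqrt{\operatorname{tr} M + 2\sqrt{\det M}}$. All function names are illustrative.

```python
import math


def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]


def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def is_psd(M, tol=1e-12):
    """PSD test for a real symmetric 2x2 matrix: trace >= 0 and det >= 0."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return tr >= -tol and det >= -tol


def sqrt_psd(M):
    """Square root of a nonzero real symmetric PSD 2x2 matrix."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    s = math.sqrt(max(det, 0.0))   # s = det(M^{1/2})
    t = math.sqrt(tr + 2.0 * s)    # t = trace(M^{1/2})
    return [[(M[0][0] + s) / t, M[0][1] / t],
            [M[1][0] / t, (M[1][1] + s) / t]]


A = [[2.0, 1.0], [1.0, 1.0]]
B = [[1.0, 0.0], [0.0, 0.0]]

order_holds = is_psd(sub(A, B)) and is_psd(B)              # A >= B >= 0
squares_ordered = is_psd(sub(mul(A, A), mul(B, B)))        # A^2 >= B^2 ?
roots_ordered = is_psd(sub(sqrt_psd(A), sqrt_psd(B)))      # A^{1/2} >= B^{1/2} ?
```

Here $A^2 - B^2 = \begin{bmatrix} 4 & 3\\ 3 & 2 \end{bmatrix}$ has determinant $-1$, so squaring destroys the order, while the square roots remain ordered.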
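1.4 Harmonic Mean

The harmonic mean of $A \ge 0$ and $B \ge 0$ was defined in [1] by
\[
A \,!\, B = 2A(A + B)^{\dagger} B. \tag{1.2}
\]
Of course, if $A$ and $B$ are both invertible, then
\[
A \,!\, B = \left( \frac{A^{-1} + B^{-1}}{2} \right)^{-1}.
\]
Let us confirm the continuity of this operation by showing the claim: for $A \ge 0$ and $B \ge 0$,
\[
(A + \varepsilon)\big((A + \varepsilon) + (B + \varepsilon)\big)^{-1}(B + \varepsilon) \to A(A + B)^{\dagger} B \qquad (\varepsilon \to 0).
\]
Proof Let $P$ be the orthogonal projection onto $\mathcal{R}(A + B)$. Since $\mathcal{R}(A) \subset \mathcal{R}(A + B)$, $AP = A$ and similarly $BP = B$. Put
\[
H(\varepsilon) := (A + \varepsilon)\big((A + \varepsilon) + (B + \varepsilon)\big)^{-1}(B + \varepsilon).
\]
Then $H(\varepsilon)P = P(A + \varepsilon)P\big((A + \varepsilon) + (B + \varepsilon)\big)^{-1}P(B + \varepsilon)P$. Since
\[
P\big((A + \varepsilon) + (B + \varepsilon)\big)^{-1}P \to (A + B)^{\dagger}, \qquad (A + \varepsilon)P \to A, \qquad P(B + \varepsilon) \to B,
\]
we have
\[
H(\varepsilon)P \to PA(A + B)^{\dagger}BP \qquad (\varepsilon \to 0).
\]
On the other hand,
\[
H(\varepsilon)(I - P) = \varepsilon(2\varepsilon)^{-1}\varepsilon(I - P) \to 0.
\]
Combine them to obtain the claim. □

From the above it follows that
\[
A \,!\, B = 2A(A + B)^{\dagger}B = 2B(A + B)^{\dagger}A = B \,!\, A \ge 0.
\]
We also show
\[
\mathcal{R}(A \,!\, B) = \mathcal{R}(A) \cap \mathcal{R}(B).
\]
Proof From the definition, $\mathcal{R}(A \,!\, B) \subset \mathcal{R}(A) \cap \mathcal{R}(B)$. For $x \in \mathcal{R}(A) \cap \mathcal{R}(B)$,
\[
(A \,!\, B)\,\tfrac{1}{2}(A^{\dagger}x + B^{\dagger}x) = B(A + B)^{\dagger}AA^{\dagger}x + A(A + B)^{\dagger}BB^{\dagger}x = (A + B)(A + B)^{\dagger}x = x.
\]
Thus $x \in \mathcal{R}(A \,!\, B)$. □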
Geometric Mean and Matrix Quadratic Equations
1.5
217
Geometric Mean
For A ≥ 0 and B ≥ 0, the geometric mean A#B was defined in [2, 7, 10] by A#B = max fX ≥ 0 j
A
X
X
B
≥ 0g:
ð1:3Þ
Let us check that this definition is well defined. Suppose A > 0 and X ≥ 0. Then A
X
X
B
≥ 0 , jðXx, yÞj2 ≤ ðAx, xÞðBy, yÞ ð8x, yÞ 2
, B ≥ XA - 1 X , A - 1∕2 BA - 1∕2 ≥ ðA - 1∕2 XA - 1∕2 Þ 1∕2
) A1∕2 ðA - 1∕2 BA - 1∕2 Þ A1∕2 ≥ X: This deduces that if A > 0, then 1∕2
A#B = A1∕2 ðA - 1∕2 BA - 1∕2 Þ A1∕2 :
ð1:4Þ
If $A$ is not invertible, we have
$$A\#B = \lim_{\varepsilon \to 0}(A+\varepsilon)\#B,$$
because the sequence on the right side is decreasing. Thus $A\#B$ is well defined for $A, B \ge 0$. The following properties arise from the above results:
$$A\#B = B\#A, \qquad A\#B = (AB)^{1/2} \text{ if } AB = BA, \qquad B = (A\#B)A^{-1}(A\#B) \text{ if } A \text{ is invertible}. \tag{1.5}$$
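Formula (1.4) and the properties (1.5) lend themselves to a quick numerical check. The sketch below is an illustration only: the helper names `psd_sqrt` and `geometric_mean` and the test matrices are assumptions, and the square root is computed through an eigendecomposition of a Hermitian matrix.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a Hermitian positive semidefinite matrix via eigh."""
    M = (M + M.conj().T) / 2.0                      # symmetrize against round-off
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def geometric_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}, assuming A > 0 (formula (1.4))."""
    Ah = psd_sqrt(A)
    Aih = np.linalg.inv(Ah)
    return Ah @ psd_sqrt(Aih @ B @ Aih) @ Ah

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 3.0]])
G = geometric_mean(A, B)

# Properties (1.5): symmetry in the arguments and the Riccati-type identity.
symmetric_in_args = np.allclose(G, geometric_mean(B, A))
riccati = np.allclose(G @ np.linalg.inv(A) @ G, B)

# The block matrix in definition (1.3) is positive semidefinite for X = A # B.
block = np.block([[A, G], [G, B]])
block_is_psd = np.linalg.eigvalsh(block).min() > -1e-10

# Commuting case: A # B = (AB)^{1/2}.
C, D = np.diag([4.0, 9.0]), np.diag([1.0, 4.0])
commuting_case = np.allclose(geometric_mean(C, D), np.diag([2.0, 6.0]))

print(symmetric_in_args, riccati, block_is_psd, commuting_case)
```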
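We next show
$$\mathcal{R}(A\#B) = \mathcal{R}(A) \cap \mathcal{R}(B).$$
Equation (1.3) implies $\mathcal{R}(A\#B) \subseteq \mathcal{R}(A) \cap \mathcal{R}(B)$, while $A\,!\,B \le A\#B$ entails $\mathcal{R}(A\,!\,B) \subseteq \mathcal{R}(A\#B)$. By $\mathcal{R}(A\,!\,B) = \mathcal{R}(A) \cap \mathcal{R}(B)$, we obtain the required equality. We remark that this equality guarantees
$$\mathcal{N}(A\#B) = \mathcal{N}(A) + \mathcal{N}(B).$$

1.6 Relations Among Means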
We first show the following basic inequalities for $A \ge 0$ and $B \ge 0$:
$$A\,!\,B \le A\#B \le \frac{1}{2}(A+B). \tag{1.6}$$
Proof Assume $A > 0$. The trivial inequalities for real functions
$$\frac{2t}{1+t} \le \sqrt{t} \le \frac{1+t}{2} \quad (t > 0)$$
ensure
$$A\,!\,B = 2A^{1/2}(I + A^{-1/2}BA^{-1/2})^{-1}A^{-1/2}BA^{-1/2}A^{1/2} \le A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2} \le \frac{1}{2}A^{1/2}(I + A^{-1/2}BA^{-1/2})A^{1/2}.$$
This secures the required result for $A > 0$. By considering $A + \varepsilon$ and taking the limits, (1.6) holds for every $A \ge 0$. □

The following important formula is due to [2]:
$$\frac{A+B}{2} \,\#\, (A\,!\,B) = A\#B. \tag{1.7}$$
We give a simple proof shown in [13], because this formula will be used often later.
Proof We may assume $A > 0$ and $B > 0$ without loss of generality. Since
$$B = (A\#B)A^{-1}(A\#B), \qquad A = (A\#B)B^{-1}(A\#B),$$
we have
$$A + B = (A\#B)(A^{-1}+B^{-1})(A\#B).$$
Multiply by $\left(\frac{A^{-1}+B^{-1}}{2}\right)^{1/2}$ from both sides and take the square roots. Then
$$\left(\frac{A^{-1}+B^{-1}}{2}\right)^{1/2}(A\#B)\left(\frac{A^{-1}+B^{-1}}{2}\right)^{1/2} = \left(\left(\frac{A^{-1}+B^{-1}}{2}\right)^{1/2}\frac{A+B}{2}\left(\frac{A^{-1}+B^{-1}}{2}\right)^{1/2}\right)^{1/2},$$
which gives
$$(A\,!\,B)^{1/2}\left((A\,!\,B)^{-1/2}\,\frac{A+B}{2}\,(A\,!\,B)^{-1/2}\right)^{1/2}(A\,!\,B)^{1/2} = A\#B.$$
This means (1.7). □
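The inequalities (1.6) and the formula (1.7) can be illustrated numerically as follows. This is a sketch under the assumption that both matrices are positive definite; the helper names and the sample matrices are not from the source.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a Hermitian positive semidefinite matrix via eigh."""
    M = (M + M.conj().T) / 2.0
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def geometric_mean(A, B):
    """A # B by formula (1.4), assuming A > 0."""
    Ah = psd_sqrt(A)
    Aih = np.linalg.inv(Ah)
    return Ah @ psd_sqrt(Aih @ B @ Aih) @ Ah

def harmonic_mean(A, B):
    """A ! B = 2 A (A + B)^dagger B."""
    return 2.0 * A @ np.linalg.pinv(A + B) @ B

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 3.0]])   # A and B do not commute

H = harmonic_mean(A, B)
G = geometric_mean(A, B)
M = (A + B) / 2.0

# (1.6): A ! B <= A # B <= (A + B)/2 in the positive semidefinite order.
lower_ok = np.linalg.eigvalsh(G - H).min() >= -1e-10
upper_ok = np.linalg.eigvalsh(M - G).min() >= -1e-10

# (1.7): the geometric mean of the arithmetic and harmonic means is A # B.
ando_ok = np.allclose(geometric_mean(M, H), G)

print(lower_ok, upper_ok, ando_ok)
```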
1.7 A Problem on Matrices
We now propose a problem similar to the one given at the beginning: for given $A \ge B \ge 0$, find $X, Y \ge 0$ such that
$$\frac{X+Y}{2} = A, \qquad X\#Y = B.$$
Let us consider the simple case where $AB = BA$. Then, by taking account of $A^2 \ge B^2$, we can see that
$$X = A - (A^2 - B^2)^{1/2}, \qquad Y = A + (A^2 - B^2)^{1/2} \tag{1.8}$$
is a pair of solutions, and both of them satisfy the quadratic equation
$$X^2 - AX - XA + B^2 = 0. \tag{1.9}$$
That said, there may be another pair of solutions which does not satisfy equation (1.9). We give an example here.
Example Let
$$A = \frac{1}{2}\begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}, \qquad B = \frac{1}{\sqrt{5}}\begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}.$$
Since these matrices commute, we have a pair of solutions given by (1.8).
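However, put
$$X := \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \qquad Y := \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.$$
Since
$$X^{1/2} = \frac{1}{\sqrt{5}}\begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}, \qquad X^{-1/2} = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & -1 \\ -1 & 3 \end{pmatrix},$$
we get
$$(X^{-1/2}YX^{-1/2})^{1/2} = \left(\frac{1}{5}\begin{pmatrix} 2 & -1 \\ -1 & 13 \end{pmatrix}\right)^{1/2} = \frac{1}{5\sqrt{5}}\begin{pmatrix} 7 & -1 \\ -1 & 18 \end{pmatrix}.$$
We consequently derive
$$\frac{X+Y}{2} = A, \qquad X\#Y = B.$$
On the other hand, one can easily check that $X$ does not satisfy (1.9). We therefore need to consider an alternative equation to (1.9), which will be given in the next subsection.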
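The example can be reproduced numerically. The sketch below (helper names are illustrative assumptions) checks that the pair $\{X, Y\}$ has arithmetic mean $A$ and geometric mean $B$, that $X$ leaves a nonzero residual in (1.9), and that the commuting pair (1.8) does satisfy (1.9).

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a Hermitian positive semidefinite matrix via eigh."""
    M = (M + M.conj().T) / 2.0
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def geometric_mean(A, B):
    """A # B by formula (1.4), assuming A > 0."""
    Ah = psd_sqrt(A)
    Aih = np.linalg.inv(Ah)
    return Ah @ psd_sqrt(Aih @ B @ Aih) @ Ah

A = 0.5 * np.array([[3.0, 2.0], [2.0, 3.0]])
B = np.array([[3.0, 2.0], [2.0, 3.0]]) / np.sqrt(5.0)
X = np.array([[2.0, 1.0], [1.0, 1.0]])
Y = np.array([[1.0, 1.0], [1.0, 2.0]])

def residual_19(Z):
    """Left-hand side of the quadratic equation (1.9)."""
    return Z @ Z - A @ Z - Z @ A + B @ B

# The non-commuting pair {X, Y}.
print(np.allclose((X + Y) / 2.0, A))          # arithmetic mean is A
print(np.allclose(geometric_mean(X, Y), B))   # geometric mean is B
print(residual_19(X))                         # nonzero: X does not satisfy (1.9)

# The commuting pair (1.8).
D = psd_sqrt(A @ A - B @ B)
X0, Y0 = A - D, A + D
print(np.allclose(residual_19(X0), 0.0))      # X0 satisfies (1.9)
print(np.allclose(geometric_mean(X0, Y0), B))
```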
The above figures, provided by courtesy of M. Fujii, may help in understanding (1.8) geometrically.
1.8 Main Theorems
The objective of this chapter is the following [14]:
Theorem 1 Let $A, B$ be $N \times N$ matrices such that $A \ge B \ge 0$. Then the matrix equation
$$A = \frac{X+Y}{2}, \qquad B = X\#Y, \qquad 0 \le X \le Y \tag{1.10}$$
has a unique solution
$$X = A - (A-B)\#(A+B), \qquad Y = A + (A-B)\#(A+B). \tag{1.11}$$
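As an illustration of Theorem 1, the following sketch builds a random pair $A \ge B > 0$ and checks that the pair (1.11) satisfies (1.10). The random construction, the seed, and the helper names are assumptions made for this example; positive definiteness is assumed so that formula (1.4) applies directly.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a Hermitian positive semidefinite matrix via eigh."""
    M = (M + M.conj().T) / 2.0
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def geometric_mean(A, B):
    """A # B by formula (1.4), assuming A > 0."""
    Ah = psd_sqrt(A)
    Aih = np.linalg.inv(Ah)
    return Ah @ psd_sqrt(Aih @ B @ Aih) @ Ah

rng = np.random.default_rng(0)
n = 4
R = rng.standard_normal((n, n))
S = rng.standard_normal((n, n))
B = R @ R.T + np.eye(n)      # B > 0
A = B + S @ S.T              # A >= B

# (A - B) # (A + B); the mean is symmetric, so the definite factor A + B goes first.
G = geometric_mean(A + B, A - B)
X, Y = A - G, A + G          # candidate solution (1.11)

x_is_psd = np.linalg.eigvalsh(X).min() > -1e-9
x_below_y = np.linalg.eigvalsh(Y - X).min() > -1e-9
mean_ok = np.allclose((X + Y) / 2.0, A)
geo_ok = np.allclose(geometric_mean(Y, X), B)   # X # Y = Y # X
print(x_is_psd, x_below_y, mean_ok, geo_ok)
```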
Theorem 2 Let $A, B$ be $N \times N$ matrices such that $A \ge B \ge 0$. Put $X_- = A - (A-B)\#(A+B)$, $X_+ = A + (A-B)\#(A+B)$. If a pair $\{X, Y\}$ satisfies the equation
$$A = \frac{X+Y}{2}, \qquad B = X\#Y, \qquad 0 \le X, Y, \tag{1.12}$$
then $X$ and $Y$ are both in the interval $[X_-, X_+]$ and are solutions of the quadratic equation
$$XA^{\dagger}X - 2X + BA^{\dagger}B = 0, \qquad P_A X = X, \tag{1.13}$$
where $P_A$ is the orthogonal projection onto $\mathcal{R}(A)$. Conversely, if a Hermitian matrix $X$ is a solution of (1.13), then $X_- \le X \le X_+$ and $Y := 2A - X$ is a solution of (1.13) too. Moreover, this pair $\{X, Y\}$ satisfies (1.12).

Theorem 3 Let $A, B$ be $N \times N$ matrices such that $A \ge B \ge 0$. Put $K = A^{\dagger/2}(A-B)A^{\dagger/2}$, where $A^{\dagger/2} := (A^{\dagger})^{1/2}$. Then $X$ is a solution of (1.13) if and only if there is a unique orthogonal projection $Q$ such that
$$Q \le P_K, \qquad QK = KQ, \qquad X = X_+A^{\dagger/2}QA^{1/2} + X_-A^{\dagger/2}(I-Q)A^{1/2},$$
where $X_-, X_+$ were given in Theorem 2. In this case $Y := 2A - X$ is represented by
$$Y = X_+A^{\dagger/2}(P_K - Q)A^{1/2} + X_-A^{\dagger/2}(I - (P_K - Q))A^{1/2}.$$

Theorem 4 Let $A, B$ be $N \times N$ matrices such that $A \ge B \ge 0$. The number of solutions $X$ of (1.13) is finite if and only if the multiplicity of each nonzero eigenvalue of $K$ given above is 1. Precisely, if the number of such eigenvalues is $k$, the number of $X$'s is $2^k$. Hence, the number of pairs $\{X, Y\}$ satisfying (1.12) is $2^{k-1}$ if $k \ge 1$, and only $X = Y = A = B$ satisfies (1.12) if $k = 0$.

Theorem 5 Let $A, B$ be $N \times N$ matrices such that $A \ge B \ge 0$. For any solution $X$ of (1.13) and $X_-, X_+$ in Theorem 2,
$$X_+ = X + \lim_{t \to \infty}\left(\int_0^t e^{-2s}\,e^{sA^{\dagger}X}A^{\dagger}e^{sXA^{\dagger}}\,ds\right)^{\dagger},$$
$$X_- = X - \lim_{t \to \infty}\left(\int_0^t e^{2s}\,e^{-sA^{\dagger}X}A^{\dagger}e^{-sXA^{\dagger}}\,ds\right)^{\dagger}.$$
Recall that the idea of the formulas in Theorems 3 and 5 originated with Coppel [5] (cf. [8]), where a general equation with controllable coefficients was treated, while the coefficients $\{-I, A^{\dagger}\}$ of equation (1.13) are not controllable, namely, the ranges $\mathcal{R}((-I)^kA^{\dagger})$ $(0 \le k)$ do not span $\mathbb{C}^N$. The proofs of Theorem 3 and Theorem 5 are far from those of [5]. Theorem 5 may seem puzzling, so we will give a simple example in the fourth section to illustrate it.
2 Lemmas
In this section we deal with some elementary but not widely known properties of the operator geometric mean and the operator harmonic mean, which we will use later.
Lemma 6 Let $A, B \ge 0$.
(i) (a) $\mathcal{R}(B) \subseteq \mathcal{R}(A)$ if and only if $B = (A\#B)A^{\dagger}(A\#B)$.
    (b) $A\,!\,B = (A\#B)\left(\frac{A+B}{2}\right)^{\dagger}(A\#B)$.
(ii) (a) $A^{1/2}(A^{\dagger/2}BA^{\dagger/2})^{1/2}A^{1/2} = A\#(P_ABP_A) \ge A\#B$, where $A^{\dagger/2} := (A^{\dagger})^{1/2}$.
    (b) If $BP_A = P_AB$, then $A^{1/2}(A^{\dagger/2}BA^{\dagger/2})^{1/2}A^{1/2} = A\#B$.
    (c) $\mathcal{R}(B) \subseteq \mathcal{R}(A)$ if and only if $A\#(BA^{\dagger}B) = B$.
(iii) Let $B = HA^{\dagger}H$ for $H \ge 0$. Then $P_AHP_A = A\#(P_ABP_A)$. In addition, if $\mathcal{R}(B) \subseteq \mathcal{R}(A)$, then
$$P_AH = HP_A, \qquad P_AHP_A = A\#B.$$
Proof Let $P$ be short for $P_A$.
(i)(a). Suppose $\mathcal{R}(B) \subseteq \mathcal{R}(A)$. Let $A_1$ and $B_1$ be the restrictions of $A$ and $B$, respectively, to $\mathcal{R}(A)$. Since
$$A\#B = (A_1\#B_1) \oplus 0, \qquad B_1 = (A_1\#B_1)A_1^{-1}(A_1\#B_1), \qquad A^{\dagger} = A_1^{-1} \oplus 0,$$
we get
$$B = B_1 \oplus 0 = (A_1\#B_1)A_1^{-1}(A_1\#B_1) \oplus 0 = (A\#B)A^{\dagger}(A\#B).$$
The converse implication is trivial.
(i)(b). Since
$$\frac{A+B}{2}\,\#\,(A\,!\,B) = A\#B \quad \text{and} \quad \mathcal{R}(A+B) \supseteq \mathcal{R}(A\,!\,B),$$
this follows from (a).
(ii)(a). In view of (i)(a),
$$PBP = (A\#(PBP))A^{\dagger}(A\#(PBP)).$$
By multiplying by $A^{\dagger/2}$ from both sides,
$$A^{\dagger/2}(PBP)A^{\dagger/2} = \bigl(A^{\dagger/2}(A\#(PBP))A^{\dagger/2}\bigr)^2.$$
Take the square roots of both sides to get
$$A\#(PBP) = A^{1/2}(A^{\dagger/2}(PBP)A^{\dagger/2})^{1/2}A^{1/2} = A^{1/2}(A^{\dagger/2}BA^{\dagger/2})^{1/2}A^{1/2},$$
where the last equality arises from $PA = A$. Besides, we have
$$A\#(PBP) = (PAP)\#(PBP) \ge P(A\#B)P = A\#B,$$
where the inequality is due to (1.3). (ii)(a) has been proved.
(ii)(b). Since $B \ge PBP$ arises from $PB = BP$, we obtain
$$A\#B \ge A\#(PBP) \ge A\#B,$$
where the second inequality is due to (ii)(a). We therefore get (ii)(b).
(ii)(c). Suppose $\mathcal{R}(B) \subseteq \mathcal{R}(A)$. Since $BA^{\dagger}B$ commutes with $P$, by (ii)(b)
$$A\#(BA^{\dagger}B) = A^{1/2}(A^{\dagger/2}BA^{\dagger}BA^{\dagger/2})^{1/2}A^{1/2} = P_ABP_A = B.$$
The converse is clear.
(iii). Since $A^{\dagger/2}BA^{\dagger/2} = (A^{\dagger/2}HA^{\dagger/2})^2$, we get
$$PHP = A^{1/2}(A^{\dagger/2}BA^{\dagger/2})^{1/2}A^{1/2} = A\#(PBP).$$
Suppose $\mathcal{R}(B) \subseteq \mathcal{R}(A)$. From $0 = (I-P)B(I-P) = (I-P)HA^{\dagger}H(I-P)$ it follows that $PH(I-P) = 0$ and hence $PH = HP$. The last equality is trivial. □

Recall that (i)(b) is due to [13] and that (ii)(b) is a slight extension of [6], where the conclusion has been shown under the condition $\mathcal{R}(B) \subseteq \mathcal{R}(A)$. Items (i) and (ii) of the following lemma are extensions of results given in [12], where $A + B > 0$ and $A > 0$ were respectively assumed.
Lemma 7 Let $A, B \ge 0$. Then
(i) $\frac{A+B}{2} - A\,!\,B = \frac{1}{2}(A-B)(A+B)^{\dagger}(A-B)$.
(ii) If $A \ge B \ge 0$, then $A - B = (A + (A\#B))\,!\,(A - (A\#B))$.
(iii) If $A \ge B \ge 0$, then
$$(A-B)\,!\,(A+B) = A - BA^{\dagger}B \le A, \qquad (A-B)\#(A+B) = A\#(A - BA^{\dagger}B) \le A.$$
Proof (i). This equality arises from the next relation:
$$A + B - (A-B)(A+B)^{\dagger}(A-B) = (A+B)(A+B)^{\dagger}(A+B) - (A-B)(A+B)^{\dagger}(A-B) = 2A(A+B)^{\dagger}B + 2B(A+B)^{\dagger}A = 2\,A\,!\,B.$$
(ii). It is clear that $A \ge A\#B$ and $\mathcal{R}(A) \supseteq \mathcal{R}(B)$. By (i) and Lemma 6(i)(a),
$$A - (A + (A\#B))\,!\,(A - (A\#B)) = \frac{1}{2}\,2(A\#B)(2A)^{\dagger}\,2(A\#B) = B.$$
(iii). (i) leads us to the first equality. By (1.7) and the first equality,
$$(A-B)\#(A+B) = \frac{(A-B)+(A+B)}{2}\,\#\,\bigl((A-B)\,!\,(A+B)\bigr) = A\#(A - BA^{\dagger}B).$$
The two inequalities are evident. □
The next lemma is well known; however, it plays an important role in this chapter, so for the sake of completeness we give a proof.
Lemma 8 Let $A \ge 0$. Then a Hermitian $X$ satisfies $X^2 = A^2$ if and only if there is a unique orthogonal projection $Q$ such that
$$Q \le P_A, \qquad QA = AQ, \qquad X = AQ + (-A)(I-Q).$$
In this case, $-A \le X \le A$.
Proof Assume $X^2 = A^2$. Put $X_+ := \frac{|X|+X}{2} \ge 0$ and $X_- := \frac{|X|-X}{2} \ge 0$. Denote the orthogonal projection onto $\mathcal{R}(X_+)$ by $Q$. Since $X_+ \le |X| = A$, we have $Q \le P_A$. Since $QX_+ = X_+$ and $QX_- = 0$, $Q$ commutes with $A$ and
$$X = X_+ - X_- = AQ + (-A)(I-Q).$$
The converse assertion is evident. We next show the uniqueness of $Q$. If there were another $Q'$, then $2QA = 2Q'A$, which implies $Q = QP_A = Q'P_A = Q'$. The last statement is evident. □
We remark that $Q$ is uniquely determined by $X$, while we can replace $(-A)(I-Q)$ with $(-A)(P_A - Q)$.
3 Proofs of Theorems
In this section we prove the theorems in order. The abbreviation $P$ for $P_A$ remains in force here.

3.1 Proof of Theorem 1
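Proof Assume $X$ and $Y$ satisfy (1.10). Then by Lemma 6(i)(b) and Lemma 7(i),
$$A - BA^{\dagger}B = \frac{X+Y}{2} - X\,!\,Y = \frac{1}{2}(Y-X)(2A)^{\dagger}(Y-X) = \frac{Y-X}{2}A^{\dagger}\frac{Y-X}{2}.$$
Since $Y - X \ge 0$ and $\mathcal{R}(Y-X) \subseteq \mathcal{R}(A)$, by Lemma 6(iii)
$$\frac{Y-X}{2} = A\#(A - BA^{\dagger}B).$$
By Lemma 7(iii), we therefore get
$$X = A - (A-B)\#(A+B), \qquad Y = A + (A-B)\#(A+B).$$
Let us confirm that this pair $\{X, Y\}$ satisfies $X\#Y = B$:
$$X\#Y = \bigl(A - A\#(A - BA^{\dagger}B)\bigr)\,\#\,\bigl(A + A\#(A - BA^{\dagger}B)\bigr)$$
$$= A\,\#\,\Bigl(A - \bigl(A\#(A - BA^{\dagger}B)\bigr)A^{\dagger}\bigl(A\#(A - BA^{\dagger}B)\bigr)\Bigr) \quad (\text{Lemma 7(iii)})$$
$$= A\,\#\,\bigl(A - (A - BA^{\dagger}B)\bigr) \quad (\text{Lemma 6(i)(a)})$$
$$= A\#(BA^{\dagger}B) = B \quad (\text{Lemma 6(ii)(c)}). \qquad □$$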
3.2 Proof of Theorem 2
Proof Assume $X$ and $Y$ satisfy (1.12). It is obvious that $X = XP$ and $Y = YP$. Since $X\,!\,Y = XA^{\dagger}Y$, by Lemma 6(i)(b),
$$XA^{\dagger}Y = BA^{\dagger}B.$$
Substitute $Y = 2A - X$ (or $X = 2A - Y$) into this equality to see that $X$ (or $Y$) fulfills (1.13). To show that $X$ and $Y$ are in $[X_-, X_+]$, we prove a general claim: if a Hermitian $X$, which is not necessarily positive semi-definite, is a solution of (1.13), then $X_- \le X \le X_+$.
Multiplying (1.13) by $A^{\dagger/2}$ from both sides gives
$$(A^{\dagger/2}XA^{\dagger/2})^2 - 2A^{\dagger/2}XA^{\dagger/2} + A^{\dagger/2}BA^{\dagger}BA^{\dagger/2} = 0,$$
$$(A^{\dagger/2}XA^{\dagger/2} - P)^2 = P - A^{\dagger/2}BA^{\dagger}BA^{\dagger/2}.$$
Since $A^{\dagger/2}BA^{\dagger/2}$ commutes with $P$ and $A^{\dagger/2}BA^{\dagger/2} \le P$, we have $(A^{\dagger/2}BA^{\dagger/2})^2 \le P$. Put
$$H := \bigl(P - (A^{\dagger/2}BA^{\dagger/2})^2\bigr)^{1/2} \ge 0. \tag{3.1}$$
Then we have
$$(A^{\dagger/2}XA^{\dagger/2} - P)^2 = H^2. \tag{3.2}$$
By Lemma 8 we observe
$$-H \le A^{\dagger/2}XA^{\dagger/2} - P \le H,$$
which says
$$A - A^{1/2}HA^{1/2} \le X \le A + A^{1/2}HA^{1/2}.$$
Because of $\mathcal{R}(A - BA^{\dagger}B) \subseteq \mathcal{R}(A)$, by Lemma 6(ii)(b), we have
$$A^{1/2}HA^{1/2} = A^{1/2}\bigl(A^{\dagger/2}(A - BA^{\dagger}B)A^{\dagger/2}\bigr)^{1/2}A^{1/2} = A\#(A - BA^{\dagger}B) = (A-B)\#(A+B), \tag{3.3}$$
where the last equality is owing to Lemma 7(iii). From the above inequalities, it follows that
$$A - (A-B)\#(A+B) \le X \le A + (A-B)\#(A+B),$$
which means $X_- \le X \le X_+$. Thus, the proof of the claim is complete, so $X$ and $Y$ that satisfy (1.12) are both in $[X_-, X_+]$.
Conversely, assume $X$ satisfies (1.13). By the above claim $X$ is positive semi-definite. Put $Y = 2A - X$. Then one can easily verify that $Y$ satisfies (1.13) as well. By Lemma 6(i)(b),
$$(X\#Y)A^{\dagger}(X\#Y) = X\,!\,Y = XA^{\dagger}Y = 2X - XA^{\dagger}X = BA^{\dagger}B.$$
By Lemma 6(iii) and Lemma 6(ii)(c),
$$X\#Y = P(X\#Y)P = A\#(BA^{\dagger}B) = B.$$
$X$ and $Y$ consequently fulfill (1.12). □
3.3 Proof of Theorem 3
Proof Assume $X$ satisfies (1.13). Then (3.2) holds for $H$ defined by (3.1). By Lemma 8 there is a unique orthogonal projection $Q$ such that
$$Q \le P_H, \qquad QH = HQ, \qquad A^{\dagger/2}XA^{\dagger/2} - P = HQ + (-H)(I-Q),$$
and then
$$X = A^{1/2}\bigl(P + HQ + (-H)(I-Q)\bigr)A^{1/2}. \tag{3.4}$$
Since, by (3.3), $A^{1/2}HA^{1/2} = (A-B)\#(A+B)$,
$$X_+ = A + A^{1/2}HA^{1/2}, \qquad X_- = A - A^{1/2}HA^{1/2}.$$
We therefore obtain
$$X = X_+A^{\dagger/2}QA^{1/2} + X_-A^{\dagger/2}(I-Q)A^{1/2}. \tag{3.5}$$
It is easy to get the required expression for $Y = 2A - X$. For $K$ given in Theorem 3, we have
$$\mathcal{R}(H) = \mathcal{R}(H^2) = \mathcal{R}(P - A^{\dagger/2}BA^{\dagger/2}) = \mathcal{R}(K),$$
and $P_H = P_K$. It is easy to see that $QH = HQ$ if and only if $QK = KQ$. Assume there were an orthogonal projection $Q'$ satisfying the statement. Then we have
$$X = A^{1/2}\bigl(P + HQ' + (-H)(I-Q')\bigr)A^{1/2},$$
and hence
$$A^{\dagger/2}XA^{\dagger/2} - P = HQ' + (-H)(I-Q').$$
By the uniqueness of $Q$ as mentioned at the beginning, $Q = Q'$. Conversely, $X$ represented by (3.5) fulfills $P_AX = X$, (3.4), and (3.2). We observe
$$(A^{\dagger/2}XA^{\dagger/2})^2 - 2A^{\dagger/2}XA^{\dagger/2} + A^{\dagger/2}BA^{\dagger}BA^{\dagger/2} = 0.$$
Since this is equivalent to (1.13), $X$ is a solution of (1.13). The proof is complete. □

We remark that both terms of the right-hand side of (3.5) are Hermitian since $HQ$ is Hermitian, and that $A^{\dagger/2}QA^{1/2}$ and $A^{\dagger/2}(I-Q)A^{1/2}$ are both idempotents such that $A^{\dagger/2}QA^{1/2} + A^{\dagger/2}(I-Q)A^{1/2} = P$. In the proof of Theorem 1, we showed that the solution of (1.10) is unique; it was denoted by $\{X_-, X_+\}$ in Theorem 2. Let us now see this in a different way. Suppose $\{X, Y\}$ satisfies (1.12) and $X \le Y$. Then, since $X \le A$, by (3.4), $Q = 0$. This gives $X = A - A^{1/2}HA^{1/2} = X_-$.
3.4 Proof of Theorem 4
Proof Assume $\dim \mathcal{N}(K - \lambda I) \ge 2$ for an eigenvalue $\lambda \ne 0$. There are infinitely many orthogonal projections $Q$ such that
$$\mathcal{R}(Q) \subseteq \mathcal{N}(K - \lambda I) \subseteq \mathcal{R}(K).$$
Because of $QK = KQ$, by Theorem 3, the number of $X$'s is infinite. Assume next that the multiplicity of every nonzero eigenvalue $\lambda_i > 0$ $(1 \le i \le k)$ of $K$ is 1, and let $e_i$ be the corresponding unit eigenvector. Then we have
$$K = \lambda_1 e_1e_1^{*} + \cdots + \lambda_k e_ke_k^{*},$$
where $\lambda_i \ne \lambda_j$ for $i \ne j$. An orthogonal projection $Q$ satisfies $Q \le P_K$ and $QK = KQ$ if and only if
$$Q = \delta_1 e_1e_1^{*} + \cdots + \delta_k e_ke_k^{*},$$
where $\delta_i = 0$ or $\delta_i = 1$. By Theorem 3 the number of $X$'s is $2^k$. Suppose $k = 0$. Then $K = 0$ and hence $X = Y = A = B$. □
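The counting argument above can be turned into a small enumeration. The following sketch is illustrative only: it assumes that $A$ is real symmetric and positive definite (so $A^{\dagger} = A^{-1}$) and that the nonzero eigenvalues of $K$ are simple, and the function names are hypothetical. It builds every projection $Q = \sum \delta_i e_ie_i^{*}$ and the corresponding $X$ from the formula in Theorem 3.

```python
import itertools
import numpy as np

def psd_sqrt(M):
    """Square root of a symmetric positive semidefinite matrix via eigh."""
    M = (M + M.T) / 2.0
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T

def geometric_mean(A, B):
    """A # B by formula (1.4), assuming A > 0."""
    Ah = psd_sqrt(A)
    Aih = np.linalg.inv(Ah)
    return Ah @ psd_sqrt(Aih @ B @ Aih) @ Ah

def enumerate_solutions(A, B, tol=1e-10):
    """Solutions of X A^{-1} X - 2X + B A^{-1} B = 0 built from Theorem 3 (A > 0 assumed)."""
    n = A.shape[0]
    Ah = psd_sqrt(A)
    Aih = np.linalg.inv(Ah)
    G = geometric_mean(A + B, A - B)          # (A - B) # (A + B)
    X_plus, X_minus = A + G, A - G
    K = Aih @ (A - B) @ Aih
    w, U = np.linalg.eigh((K + K.T) / 2.0)
    nonzero = [i for i in range(n) if w[i] > tol]
    solutions = []
    for bits in itertools.product([0, 1], repeat=len(nonzero)):
        Q = np.zeros((n, n))
        for bit, i in zip(bits, nonzero):
            if bit:
                Q += np.outer(U[:, i], U[:, i])   # rank-one projection onto e_i
        X = X_plus @ Aih @ Q @ Ah + X_minus @ Aih @ (np.eye(n) - Q) @ Ah
        solutions.append(X)
    return solutions

def residual(A, B, X):
    """Left-hand side of (1.13) for invertible A."""
    Ainv = np.linalg.inv(A)
    return X @ Ainv @ X - 2.0 * X + B @ Ainv @ B

A = np.diag([2.0, 3.0])
B = np.eye(2)
sols = enumerate_solutions(A, B)
print(len(sols))                                   # 2^k solutions with k = 2
print(max(np.abs(residual(A, B, X)).max() for X in sols))
```

With $A = \mathrm{diag}(2, 3)$ and $B = I$ the matrix $K$ has two simple nonzero eigenvalues, so four solutions are produced, the first being $X_-$ and the last $X_+$.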
3.5 Proof of Theorem 5
We remark that $D^{\dagger} \le C^{\dagger}$ provided $0 \le C \le D$ and $\mathcal{R}(C) = \mathcal{R}(D)$.
Proof Put
$$V(t) = \int_0^t e^{-2s}\,e^{sA^{\dagger}X}A^{\dagger}e^{sXA^{\dagger}}\,ds.$$
One can see that $V(0) = 0$, that $V(t) \ge 0$ is increasing for $t \ge 0$, i.e., $V(s) \le V(t)$ as $0 \le s \le t$, and that $\mathcal{N}(V(t)) = \mathcal{N}(A)$ for $t > 0$. Since $V(t)^{\dagger} \le V(s)^{\dagger}$ as $0 < s < t$, there exists $\lim_{t \to \infty}V(t)^{\dagger}$. To show that this equals $X_+ - X$, we may assume $A$ is invertible with no loss of generality; indeed, denoting the restrictions of $A$ and $X$ to $\mathcal{R}(A)$, respectively, by $A_1$ and $X_1$, we have
$$e^{sA^{\dagger}X}A^{\dagger}e^{sXA^{\dagger}} = e^{sA_1^{-1}X_1}A_1^{-1}e^{sX_1A_1^{-1}} \oplus 0,$$
and
$$V(t) = \int_0^t e^{-2s}\,e^{sA_1^{-1}X_1}A_1^{-1}e^{sX_1A_1^{-1}}\,ds \oplus 0.$$
Accordingly, we assume $A$ is invertible, and we have only to show
$$\lim_{t \to \infty}V(t)^{-1} = X_+ - X.$$
Since
$$\frac{dV(t)}{dt} = e^{-2t}e^{tA^{-1}X}A^{-1}e^{tXA^{-1}} = A^{-1} + \int_0^t \frac{d}{ds}\Bigl(e^{-2s}e^{sA^{-1}X}A^{-1}e^{sXA^{-1}}\Bigr)ds = A^{-1} - 2V(t) + A^{-1}XV(t) + V(t)XA^{-1},$$
we have
$$\frac{dV(t)^{-1}}{dt} = -V(t)^{-1}\frac{dV(t)}{dt}V(t)^{-1} = -V(t)^{-1}A^{-1}V(t)^{-1} + 2V(t)^{-1} - V(t)^{-1}A^{-1}X - XA^{-1}V(t)^{-1}.$$
Because $V(t)^{-1}$ converges as $t \to \infty$, this implies that so does $\frac{dV(t)^{-1}}{dt}$. As $V(t)^{-1} \ge 0$ is decreasing, we deduce that
$$\lim_{t \to \infty}\frac{dV(t)^{-1}}{dt} = 0.$$
By putting
$$V^{-1}(\infty) := \lim_{t \to \infty}V(t)^{-1} \ge 0,$$
we arrive at
$$0 = -V^{-1}(\infty)A^{-1}V^{-1}(\infty) + 2V^{-1}(\infty) - V^{-1}(\infty)A^{-1}X - XA^{-1}V^{-1}(\infty).$$
Put $W = V^{-1}(\infty) + X$. Then from the above equation, we get
$$0 = XA^{-1}X - 2X + 2W - WA^{-1}W.$$
Since $X$ is a solution of (1.13), we obtain
$$0 = -BA^{-1}B + 2W - WA^{-1}W,$$
which says $W$ itself is a solution of (1.13), because $P_A = I$. By the equality shown above,
$$\frac{dV(t)}{dt} + A^{-1} = 2A^{-1} - 2V(t) + A^{-1}XV(t) + V(t)XA^{-1} = A^{-1}\bigl(V(t)^{-1} - A + X\bigr)V(t) + V(t)\bigl(V(t)^{-1} - A + X\bigr)A^{-1}.$$
Since $\frac{dV(t)}{dt} + A^{-1} > 0$ and $V(t) > 0$ for $t > 0$, the real part of each eigenvalue of $(V(t)^{-1} - A + X)A^{-1}$ is positive. From this we deduce that $V(t)^{-1} - A + X > 0$, whence comes
$$W = X + V^{-1}(\infty) \ge A.$$
By Theorem 2, $2A - W$ is also a solution of (1.13). Since $W \ge 2A - W$, $W$ must coincide with $X_+$, i.e., $X_+ = X + V^{-1}(\infty)$. The second equality in Theorem 5 can be shown analogously. □
4 An Example and Miscellaneous Results

4.1 An Example
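Theorem 5 does not seem to be easy to understand, so we give a simple example to illustrate it. Let
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Then
$$X_+ := A + (A-B)\#(A+B) = \begin{pmatrix} 2+\sqrt{3} & 0 \\ 0 & 3+2\sqrt{2} \end{pmatrix}, \qquad X_- := A - (A-B)\#(A+B) = \begin{pmatrix} 2-\sqrt{3} & 0 \\ 0 & 3-2\sqrt{2} \end{pmatrix}.$$
We now easily see that
$$\frac{X_+ + X_-}{2} = A, \qquad X_+\#X_- = B,$$
and that $X_+$ and $X_-$ are both solutions of
$$XA^{-1}X - 2X + BA^{-1}B = 0.$$
Moreover,
$$K := A^{-1/2}(A-B)A^{-1/2} = \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & \frac{2}{3} \end{pmatrix}.$$
Since $K$ is invertible, the projections $Q$ such that $Q \le P_K$ and $QK = KQ$ are
$$Q_1 := \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad Q_2 := \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad Q_3 := \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad Q_4 := \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
Apply Theorem 3 for each $Q_i$. Then we obtain
$$X_1 := X_+A^{-1/2}Q_1A^{1/2} + X_-A^{-1/2}(I-Q_1)A^{1/2} = X_+,$$
$$X_2 := X_+A^{-1/2}Q_2A^{1/2} + X_-A^{-1/2}(I-Q_2)A^{1/2} = \begin{pmatrix} 2+\sqrt{3} & 0 \\ 0 & 3-2\sqrt{2} \end{pmatrix},$$
$$X_3 := X_+A^{-1/2}Q_3A^{1/2} + X_-A^{-1/2}(I-Q_3)A^{1/2} = \begin{pmatrix} 2-\sqrt{3} & 0 \\ 0 & 3+2\sqrt{2} \end{pmatrix},$$
$$X_4 := X_+A^{-1/2}Q_4A^{1/2} + X_-A^{-1/2}(I-Q_4)A^{1/2} = X_-.$$
The arithmetic mean and the geometric mean of $X_2$ and $X_3$ are, respectively, $A$ and $B$. They also satisfy the quadratic equation. Let us confirm Theorem 5. We first show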
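$$X_+ = X_2 + \lim_{t \to \infty}\left(\int_0^t e^{-2s}\,e^{sA^{-1}X_2}A^{-1}e^{sX_2A^{-1}}\,ds\right)^{-1}, \tag{4.1}$$
$$X_- = X_2 - \lim_{t \to \infty}\left(\int_0^t e^{2s}\,e^{-sA^{-1}X_2}A^{-1}e^{-sX_2A^{-1}}\,ds\right)^{-1}. \tag{4.2}$$
Since
$$A^{-1}X_2 = X_2A^{-1} = \begin{pmatrix} 1+\frac{\sqrt{3}}{2} & 0 \\ 0 & 1-\frac{2\sqrt{2}}{3} \end{pmatrix},$$
we have
$$e^{-2s}\,e^{sA^{-1}X_2}A^{-1}e^{sX_2A^{-1}} = \begin{pmatrix} \frac{1}{2}e^{\sqrt{3}s} & 0 \\ 0 & \frac{1}{3}e^{-\frac{4\sqrt{2}}{3}s} \end{pmatrix}.$$
We therefore get
$$\left(\int_0^t e^{-2s}\,e^{sA^{-1}X_2}A^{-1}e^{sX_2A^{-1}}\,ds\right)^{-1} = \begin{pmatrix} \dfrac{2\sqrt{3}}{e^{\sqrt{3}t}-1} & 0 \\ 0 & \dfrac{4\sqrt{2}}{1-e^{-\frac{4\sqrt{2}}{3}t}} \end{pmatrix} \to \begin{pmatrix} 0 & 0 \\ 0 & 4\sqrt{2} \end{pmatrix} \quad (t \to \infty).$$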
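Consequently we obtain (4.1). We similarly obtain
$$e^{2s}\,e^{-sA^{-1}X_2}A^{-1}e^{-sX_2A^{-1}} = \begin{pmatrix} \frac{1}{2}e^{-\sqrt{3}s} & 0 \\ 0 & \frac{1}{3}e^{\frac{4\sqrt{2}}{3}s} \end{pmatrix},$$
$$\left(\int_0^t e^{2s}\,e^{-sA^{-1}X_2}A^{-1}e^{-sX_2A^{-1}}\,ds\right)^{-1} = \begin{pmatrix} \dfrac{2\sqrt{3}}{1-e^{-\sqrt{3}t}} & 0 \\ 0 & \dfrac{4\sqrt{2}}{e^{\frac{4\sqrt{2}}{3}t}-1} \end{pmatrix} \to \begin{pmatrix} 2\sqrt{3} & 0 \\ 0 & 0 \end{pmatrix} \quad (t \to \infty).$$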
This yields (4.2). By analogous calculations, one can show
$$X_+ = X_3 + \lim_{t \to \infty}\left(\int_0^t e^{-2s}\,e^{sA^{-1}X_3}A^{-1}e^{sX_3A^{-1}}\,ds\right)^{-1},$$
$$X_- = X_3 - \lim_{t \to \infty}\left(\int_0^t e^{2s}\,e^{-sA^{-1}X_3}A^{-1}e^{-sX_3A^{-1}}\,ds\right)^{-1}.$$
This concludes the explanation of the example.
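The limits in (4.1) and (4.2) can also be checked numerically. The sketch below does not follow the chapter's computation step by step; it relies on a rearrangement that is an assumption of this illustration: for invertible $A$ and $M := A^{-1/2}XA^{-1/2}$ one has $e^{sA^{-1}X} = A^{-1/2}e^{sM}A^{1/2}$, so the integrand equals $A^{-1/2}e^{\pm 2s(M-I)}A^{-1/2}$ and the integral has a closed form in the eigenbasis of $M$. The function name `v_inverse` is hypothetical, and the code assumes that no eigenvalue of $M$ equals 1.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a symmetric positive semidefinite matrix via eigh."""
    M = (M + M.T) / 2.0
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T

def v_inverse(A, X, t, sign=1.0):
    """Inverse of the integral over [0, t] of
    e^{-2 sign s} e^{sign s A^{-1} X} A^{-1} e^{sign s X A^{-1}} ds
    for real symmetric A > 0 and X (assumes no eigenvalue of A^{-1/2} X A^{-1/2} is 1)."""
    Ah = psd_sqrt(A)
    Aih = np.linalg.inv(Ah)
    M = Aih @ X @ Aih
    mu, U = np.linalg.eigh((M + M.T) / 2.0)
    c = sign * 2.0 * (mu - 1.0)
    # scalar integral of e^{c s} over [0, t] is expm1(c t) / c, so its inverse is c / expm1(c t)
    d = c / np.expm1(c * t)
    return Ah @ (U * d) @ U.T @ Ah

A = np.diag([2.0, 3.0])
X_plus = np.diag([2 + np.sqrt(3), 3 + 2 * np.sqrt(2)])
X_minus = np.diag([2 - np.sqrt(3), 3 - 2 * np.sqrt(2)])
X2 = np.diag([2 + np.sqrt(3), 3 - 2 * np.sqrt(2)])

t = 50.0   # large enough for the exponentials to have settled
upper = X2 + v_inverse(A, X2, t, sign=1.0)    # should approximate X_+  (4.1)
lower = X2 - v_inverse(A, X2, t, sign=-1.0)   # should approximate X_-  (4.2)
print(np.allclose(upper, X_plus), np.allclose(lower, X_minus))
```

Replacing `X2` by $X_3 = \mathrm{diag}(2-\sqrt{3},\, 3+2\sqrt{2})$ gives the analogous check for the last two displayed formulas.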
4.2 Corollary
We give a simple corollary of Theorem 1.
Corollary 9 Let $A \ge 0$ and $B \ge 0$ be $N \times N$ matrices. Then
(I) there is a unique pair $\{X_-, X_+\}$ such that
$$0 \le X_- \le X_+, \qquad X_- + X_+ = A + B, \qquad X_-\#X_+ = A\#B.$$
They are precisely given by
$$X_+ = \frac{A+B}{2} + \left(\frac{A+B}{2} + A\#B\right)\#\left(\frac{A+B}{2} - A\#B\right),$$
$$X_- = \frac{A+B}{2} - \left(\frac{A+B}{2} + A\#B\right)\#\left(\frac{A+B}{2} - A\#B\right).$$
Further, they satisfy $X_-\,!\,X_+ = A\,!\,B$ too.
(II) $A$ and $B$ are both in the interval $[X_-, X_+]$ and are solutions of
$$2X(A+B)^{-1}X - 2X + A\,!\,B = 0.$$
Proof Most of these statements follow from Theorems 1 and 2, except for the range constraint of the quadratic equation. However, we can remove it since the range of $A$ is included in that of $A + B$. □
We remark about the above corollary that if $AB = BA$, then $X_-X_+ = X_+X_-$ and the matrices in $\{X_-, X_+\}$ commute with those in $\{A, B\}$.
4.3 Simultaneous Decomposition
In [3] (cf. [4]) it was mentioned that for $2 \times 2$ matrices $A > 0$, $B > 0$ such that $\det A = \det B = 1$,
$$A\#B = \frac{A+B}{\sqrt{\det(A+B)}}.$$
We extend this formula and decompose $A$ and $B$ simultaneously by using Theorem 3. We denote the spectrum of $X$ by $\sigma(X)$.
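The determinant formula for $2 \times 2$ matrices of determinant one is easy to test. In the following sketch the helper names and the two sample pairs are illustrative assumptions.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a symmetric positive semidefinite matrix via eigh."""
    M = (M + M.T) / 2.0
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T

def geometric_mean(A, B):
    """A # B by formula (1.4), assuming A > 0."""
    Ah = psd_sqrt(A)
    Aih = np.linalg.inv(Ah)
    return Ah @ psd_sqrt(Aih @ B @ Aih) @ Ah

def det_formula(A, B):
    """(A + B) / sqrt(det(A + B)), valid for 2x2 positive definite A, B with det A = det B = 1."""
    return (A + B) / np.sqrt(np.linalg.det(A + B))

A = np.array([[2.0, 1.0], [1.0, 1.0]])   # det A = 1
B1 = np.eye(2)                           # det B1 = 1
B2 = np.array([[1.0, 1.0], [1.0, 2.0]])  # det B2 = 1

print(np.allclose(geometric_mean(A, B1), det_formula(A, B1)))
print(np.allclose(geometric_mean(A, B2), det_formula(A, B2)))
```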
Lemma 10 ([15]) Let $A, B$ be linearly independent $N \times N$ positive definite matrices. Then there is a real number $0 < a < 1$ such that
$$A\#B = \sqrt{a}\,\frac{A+B}{2} \tag{4.3}$$
if and only if $\sigma(A^{-1}B) = \{\lambda, 1/\lambda\}$, where $0 < \lambda < 1$. In this case, we have $\sqrt{a} = \frac{2\sqrt{\lambda}}{1+\lambda}$.
Proof We notice that $A^{-1}B$ has at least two different eigenvalues and that (4.3) is equivalent to
$$(A^{-1/2}BA^{-1/2})^{1/2} = \frac{\sqrt{a}}{2}(I + A^{-1/2}BA^{-1/2}). \tag{4.4}$$
Assume (4.3) and take an arbitrary $\lambda \in \sigma(A^{-1/2}BA^{-1/2})$. Then by (4.4), $\sqrt{\lambda} = \frac{\sqrt{a}}{2}(1+\lambda)$. This implies that $\lambda$ is a solution of a numerical quadratic equation; so $\sigma(A^{-1}B)$ consists of two points. Since $1/\lambda$ is also a solution of this equation, $\sigma(A^{-1}B) = \{\lambda, 1/\lambda\}$. Conversely, if $\sigma(A^{-1}B) = \{\lambda, 1/\lambda\}$, then by putting $\sqrt{a} = \frac{2\sqrt{\lambda}}{1+\lambda}$, we get (4.4). □

Proposition 11 Let $A, B$ be linearly independent $N \times N$ positive definite matrices. Then (4.3) holds if and only if there is an orthogonal projection $Q$ such that
$$A = \frac{1}{2}(A+B)^{1/2}\bigl((1+\sqrt{1-a})Q + (1-\sqrt{1-a})(I-Q)\bigr)(A+B)^{1/2},$$
$$B = \frac{1}{2}(A+B)^{1/2}\bigl((1-\sqrt{1-a})Q + (1+\sqrt{1-a})(I-Q)\bigr)(A+B)^{1/2}.$$
Proof Substitute $\frac{A+B}{2}$ and $A\#B$ for $A$ and $B$ in Theorem 1, respectively. Then
$$X_+ = \frac{A+B}{2} + \sqrt{(1-\sqrt{a})(1+\sqrt{a})}\,\frac{A+B}{2} = (1+\sqrt{1-a})\frac{A+B}{2},$$
$$X_- = \frac{A+B}{2} - \sqrt{(1-\sqrt{a})(1+\sqrt{a})}\,\frac{A+B}{2} = (1-\sqrt{1-a})\frac{A+B}{2}.$$
We also get $K = (1-\sqrt{a})I$. By Theorem 3 there is an orthogonal projection $Q$ such that
$$A = (1+\sqrt{1-a})\,\frac{A+B}{2}\left(\frac{A+B}{2}\right)^{-1/2}Q\left(\frac{A+B}{2}\right)^{1/2} + (1-\sqrt{1-a})\,\frac{A+B}{2}\left(\frac{A+B}{2}\right)^{-1/2}(I-Q)\left(\frac{A+B}{2}\right)^{1/2}$$
$$= \frac{1}{2}(A+B)^{1/2}\bigl((1+\sqrt{1-a})Q + (1-\sqrt{1-a})(I-Q)\bigr)(A+B)^{1/2},$$
$$B = \frac{1}{2}(A+B)^{1/2}\bigl((1-\sqrt{1-a})Q + (1+\sqrt{1-a})(I-Q)\bigr)(A+B)^{1/2}. \qquad □$$
5 Relevant Equations
We consider a problem analogous to the one in Section 1.8, replacing the geometric mean with the harmonic mean. The first statement (I) of the following proposition is a minor extension of a result in our earlier paper [12].
Proposition 12 ([14]) Let $A, C$ be $N \times N$ matrices such that $A \ge C \ge 0$. Then
(I) The matrix equation
$$A = \frac{X+Y}{2}, \qquad C = X\,!\,Y, \qquad 0 \le X \le Y \tag{5.1}$$
has a unique solution
$$X = A - A\#(A-C), \qquad Y = A + A\#(A-C). \tag{5.2}$$
(II) Put $X_- = A - A\#(A-C)$ and $X_+ = A + A\#(A-C)$. If a pair $\{X, Y\}$ satisfies the matrix equation
$$A = \frac{X+Y}{2}, \qquad C = X\,!\,Y, \qquad 0 \le X, Y, \tag{5.3}$$
then $X$ and $Y$ are both solutions of the quadratic equation
$$XA^{\dagger}X - 2X + C = 0, \qquad P_AX = X, \tag{5.4}$$
and they are in the interval $[X_-, X_+]$.
Conversely, if $X$ is a Hermitian solution of (5.4), then $X_- \le X \le X_+$, and $Y := 2A - X$ is a solution of (5.4) too. Moreover, this pair $\{X, Y\}$ satisfies (5.3).
(III) Put $L = A^{\dagger/2}(A-C)A^{\dagger/2}$. $X$ satisfies (5.4) if and only if there is a unique orthogonal projection $Q$ such that
$$Q \le P_L, \qquad QL = LQ, \qquad X = X_+A^{\dagger/2}QA^{1/2} + X_-A^{\dagger/2}(I-Q)A^{1/2}.$$
In this case $Y = 2A - X$ is represented by
$$Y = X_+A^{\dagger/2}(P_L - Q)A^{1/2} + X_-A^{\dagger/2}(I - (P_L - Q))A^{1/2}.$$
(IV) The number of solutions $X$ of (5.4) is finite if and only if the multiplicity of each nonzero eigenvalue of $L$ given above is 1. Precisely, if the number of such eigenvalues is $k$, the number of $X$'s is $2^k$. Hence, the number of pairs $\{X, Y\}$ satisfying (5.3) is $2^{k-1}$ if $k \ge 1$, and only $X = Y = A = C$ satisfies (5.3) if $k = 0$.
(V) For any solution $X$ of (5.4) and $X_-, X_+$ in Theorem 2, we have
$$X_+ = X + \lim_{t \to \infty}\left(\int_0^t e^{-2s}\,e^{sA^{\dagger}X}A^{\dagger}e^{sXA^{\dagger}}\,ds\right)^{\dagger},$$
$$X_- = X - \lim_{t \to \infty}\left(\int_0^t e^{2s}\,e^{-sA^{\dagger}X}A^{\dagger}e^{-sXA^{\dagger}}\,ds\right)^{\dagger}.$$
Proof Put $B = A\#C$. Then $A \ge B \ge C$, and the relations (1.10) and (5.1) are equivalent. Indeed, suppose a pair $\{X, Y\}$ satisfies (1.10). Then by Lemma 6(i)(b), $X\,!\,Y = BA^{\dagger}B = C$. It therefore fulfills (5.1). Conversely, suppose a pair $\{X, Y\}$ satisfies (5.1). Since
$$X\#Y = (X\,!\,Y)\,\#\,\frac{X+Y}{2} = C\#A = B,$$
it also satisfies (1.10). By Lemma 7(iii),
$$A\#(A-C) = A\#(A - BA^{\dagger}B) = (A-B)\#(A+B),$$
from which it follows that (1.11) and (5.2) are equivalent to each other. That (1.13) is equivalent to (5.4) is apparent. Since
$$L = P - (A^{\dagger/2}BA^{\dagger/2})^2 = H^2,$$
where $H$ has been given in (3.1), for an orthogonal projection $Q$ it is clear that $Q \le P_L$, $QL = LQ$ if and only if $Q \le P_K$, $QK = KQ$. Thanks to such correspondences, one can derive Proposition 12 from the theorems in Section 1.8. □

We have seen that every Hermitian solution of (1.13) or (5.4) is positive semi-definite. We end this chapter by showing the converse.
Proposition 13 Let $A$ be a positive definite matrix and $C$ a Hermitian matrix. Then
$$XA^{-1}X - 2X + C = 0$$
has a Hermitian solution and every Hermitian solution is positive semi-definite if and only if $A \ge C \ge 0$.
Proof It is enough for us to show the necessity. Let us transform the equation as
$$(A^{-1/2}XA^{-1/2} - I)^2 = I - A^{-1/2}CA^{-1/2}.$$
Let $X$ be a Hermitian solution. Then both sides of the equality are positive semi-definite. Thus $A \ge C$ arises. In this case, $X$ defined by
$$A^{-1/2}XA^{-1/2} - I = -(I - A^{-1/2}CA^{-1/2})^{1/2}$$
is a Hermitian solution; so it must be positive semi-definite. This yields $I \ge (I - A^{-1/2}CA^{-1/2})^{1/2}$ and hence $I \ge I - A^{-1/2}CA^{-1/2}$. Thus we arrive at $C \ge 0$. □

Acknowledgements The author wishes to thank the reviewers for valuable comments and Prof. M. Fujii for good suggestions.
References
1. Anderson, W. N., & Duffin, R. J. (1969). Series and parallel addition of matrices. Journal of Mathematical Analysis and Applications, 26, 576–594
2. Ando, T. (1978). Topics on operator inequalities. Lecture Note, Sapporo
3. Ando, T., Li, C. K., & Mathias, R. (2004). Geometric means. Linear Algebra and Its Applications, 385, 305–334
4. Bhatia, R. (2007). Positive definite matrices. Princeton Series in Applied Mathematics
5. Coppel, W. A. (1974). Matrix quadratic equations. Bulletin of the Australian Mathematical Society, 10, 377–401
6. Fujimoto, M., & Seo, Y. (2019). The Schwarz inequality via operator-valued inner product and the geometric operator mean. Linear Algebra and Its Applications, 561, 141–160
7. Kubo, F., & Ando, T. (1980). Means of positive linear operators. Mathematische Annalen, 246, 205–224
8. Lancaster, P., & Rodman, L. (1995). Algebraic Riccati equations. Clarendon Press
9. Löwner, K. (1934). Über monotone Matrixfunktionen. Mathematische Zeitschrift, 38, 177–216
10. Pusz, W., & Woronowicz, S. L. (1975). Functional calculus for sesquilinear forms and purification map. Reports on Mathematical Physics, 8, 159–170
11. Simon, B. (2019). Loewner's Theorem on Monotone Matrix Functions
12. Uchiyama, M. (2020). Operator functions and the operator harmonic mean. Proceedings of the American Mathematical Society, 148, 797–809
13. Uchiyama, M. (2020). Some results on matrix means. Advances in Operator Theory, 5(3), 728–733
14. Uchiyama, M. (2021). Operator means and matrix quadratic equations. Linear Algebra and Its Applications, 609, 163–175
15. Uchiyama, M. (2023). Symmetric matrix means. Linear Algebra and Its Applications, 656, 112–130
Yang-Baxter-Like Matrix Equation: A Road Less Taken
Nebojša Č. Dinčić and Bogdan D. Djordjević

Abstract This chapter represents a comprehensive analysis of the matrix equation AXA = XAX. We revise some of our published results regarding this topic and provide some new original unpublished results. In particular, we revisit our methods for constructing infinitely many nontrivial solutions, for both regular and singular matrix A, and we revisit our characterization of all permutation and doubly stochastic solutions when A is a permutation matrix. Additionally, we prove new results which concern the case when A is invertible: we obtain the closed-form formula for all commuting solutions and characterize the existence of non-commuting solutions (these conditions cannot be weakened). We also provide an alternative way for proving the existence of doubly stochastic solutions when A is a permutation matrix.
Keywords Yang-Baxter-like matrix equation • Sylvester equation • Matrix functions
Mathematics Subject Classification (MSC2020) Primary 15A24 • Secondary 47A60, 47J05

N. Č. Dinčić (✉) Faculty of Sciences and Mathematics, University of Niš, Niš, Serbia
B. D. Djordjević Mathematical Institute of the Serbian Academy of Sciences and Arts, Belgrade, Serbia
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_49
1 Introduction
In this chapter we study and solve a matrix equation known as the Yang-Baxter-like matrix equation (abbreviated as YBME), which is given by
$$AXA = XAX, \tag{1.1}$$
where $A \in M_n(\mathbb{C})$ is a known nonzero square matrix, often called the coefficient matrix, while $X \in M_n(\mathbb{C})$ is the unknown solution matrix. A closer look reveals that the YBME is a nonlinear equation which is equivalent to a system of $n^2$ quadratic equations with $n^2$ variables, so finding all its solutions for an arbitrary matrix $A$ is a difficult task.
The equation (1.1) has its roots in physics. It seems that Onsager [39] was the first one to study the so-called star-triangle transforms for the two-dimensional square-lattice Ising model in 1944. Two decades later, C. N. Yang observed a simple one-dimensional quantum many-body problem in [51], while R. J. Baxter studied a classical statistical mechanics problem in two dimensions in [4]. They both independently introduced the following equation:
$$A(u)B(u+v)A(v) = B(v)A(u+v)B(u), \tag{1.2}$$
now widely known as the Yang-Baxter equation (YBE for short), where $A$ and $B$ are rational functions of their arguments. If $A$ and $B$ are independent of $u$ and $v$, we obtain a special form of the YBE, the Yang-Baxter-like matrix equation (1.1). The YBME is a significant tool for studying the equation (1.2), as every result obtained for the matrix equation provides a new insight about the solutions of the original problem (1.2). Other than physical applications, both YBE and YBME are very popular research topics, which have remained open to this day: neither problem has been completely solved. The equations YBME and YBE are closely related to braid groups, knot theory, and their matrix representations. For more information we refer to, e.g., [29, 30, 38], and [52].
1.1 Fundamental Concepts
Returning to the YBME, we proceed to define some concepts which will be understood in the entire chapter. We denote the solution set of the YMBE (1.1) by
Yang-Baxter-Like Matrix Equation: A Road Less Taken
243
SðAÞ := fX : AXA = XAXg: If it is clear from the context, we will simply write S instead of SðAÞ. Since Sð0Þ = M n ðÞ, it is reasonable to assume that A ≠ 0. Additionally, when n = 1, the YBME reduces to a simple scalar quadratic equation a2 x = ax2 ) x = 0 _ x = a, thus throughout the chapter it is assumed that A ≠ 0 and n > 1. Definition 1.1 Let n > 1 and let A 2 M n ðÞ be an arbitrary nonzero square matrix. Let X 2 SðAÞ be a solution to YBME (1.1). • The solution X is said to be trivial if X 2{0, A}. Otherwise, the solution X is said to be nontrivial. • If the solution X commutes with A, then it is called a commuting solution. The set of all commuting solutions is denoted by S c ðAÞ S c ðAÞ = fX 2 SðAÞ : AX = XAg ⊃ f0, Ag: If the solution X does not commute with A, then it is said to be a non-commuting solution. Simply put, a solution X 2 SðAÞ is a commuting one if it subjects to AX = XA, while it is non-commuting if there exists a nonzero matrix Y 2 M n ðÞ such that AX - XA = Y . Both of these equations are special cases of Sylvester matrix equations SX - XZ = D, where S, Z, and D are given matrices of appropriate dimensions. Due to their essential role in studying the YBME, a special attention will be given to Sylvester matrix equations in Section 3. Note that searching for commuting solutions is the same as solving the quadratic matrix equation AX 2 - A2 X = 0: If A is invertible, the latter reduces to X 2 = AX,
ð1:3Þ
which will be examined and solved in Section 4.1. Let BL(r) denote the open ball in M n ðÞ centered at L 2 M n ðÞ having the radius r > 0:
N. Č. Dinčić and B. D. Djordjević
244
BL ðrÞ = fS 2 M n ðÞ:kS - Lk < rg, where kk is an a priori fixed matrix norm. Naturally, BL ðrÞ denotes its closure in M n ðÞ with respect to the topology induced by the said norm. Definition 1.2 A solution X 2 SðAÞ is isolated if there exists an open ball BX(r) centered at X with a radius r > 0 such that SðAÞ \ BX ðrÞ = fXg: Otherwise the solution X is not isolated. Definition 1.3 Let k be a positive integer and let τ := τ1 × τ2 × ⋯ × τk ⊂ k be a non-empty k -dimensional set. For every i, j = 1, n let xij : τ → be a complex-valued coordinate function defined in τ. If for every (t1, . . ., tk) 2 τ, the matrix X(t1, . . ., tk), given as Xðt 1 , . . ., t k Þ:= xij ðt 1 , . . ., tk Þ
i,j = 1,n
,
is a solution to the YBME (1.1), then X(t1, t2, . . ., tk) is a parametric solution to the YBME, depending on k complex parameters t1, . . ., tk. The set S τ ðAÞ:= fXðt 1 , . . ., t k Þ : ðt 1 , . . ., t k Þ 2 τg ⊂ SðAÞ defines a subset of the solution set SðAÞ and is parametrized by the parameter set τ. Specially, if τ is a connected set in k and the mapping t ° X(t) is continuous for every t 2 τ, then the solutions X(t) belong to the same connected component S τ ðAÞ of the solution set SðAÞ and are therefore connected. Examples in Section 2.2 will help understand parametric solutions more clearly. It is not difficult to see that by imposing certain properties on the coefficient matrix A, we also impose some (but not necessarily the same) properties on the solution matrix X as well. For example, the Cauchy-Binet theorem states that
Yang-Baxter-Like Matrix Equation: A Road Less Taken
$$(\det A)^2 \det X = \det A\,(\det X)^2,$$
so if we assume that $A$ is invertible and we seek regular solutions to (1.1), it follows that every such solution has the same determinant as the matrix $A$. Other times, depending on the problem which is modeled via the YBME, it is convenient to seek those solutions which have a fixed rank, trace, or norm or possess some other property, like being (positive) definite, Hermitian, normal, unitary, (doubly) stochastic or a permutation matrix, nilpotent, $m$-potent, idempotent, etc. In those instances, the problem of solving the YBME is not the same as finding all the matrices $X$ which algebraically solve the equation, but rather obtaining those solutions with the desired property. Such solutions are called particular solutions. Dually, every time we impose any criteria on the input matrix $A$ (meaning that we require $A$ to have certain properties as well), we are not dealing with the general YBME, but rather we are analyzing a special case of the equation.

There are a lot of papers in the available literature which deal with various special cases of the YBME. It is practically impossible to collect them all, nor is that the goal of this chapter; this manuscript covers the original scientific contribution of the authors concerning this topic, along with some indispensable results obtained by other authors. Below we mention those which are relevant to our findings regarding this problem, while an interested reader is encouraged to investigate further. For solutions obtained via the spectral projections, consult [17, 18, 27], and [55]. When the matrix $A$ has a special Jordan form, paper [20] provides infinitely many parametric solutions. In particular, if $A$ is diagonalizable, paper [9] characterizes all solutions; see also [11, 21, 28, 45], and [59].
When $A$ is invertible such that $A^{-1}$ is a doubly stochastic matrix, paper [16] provides sufficient conditions for the existence of doubly stochastic solutions to the YBME. This case was later completely solved in [22] by extending the results to permutation matrices. Commuting solutions were considered, for example, in [19]. In [46] commuting solutions were also investigated, and a special Toeplitz form of these solutions was presented. Some new matrices $H^{\eta}_{\delta_1,\delta_2}$ related to those solutions were introduced. When $A$ is a rank one matrix, all solutions were found in paper [48]. The case when $A$ is any rank two matrix was completely solved in [53] and [54]. In paper [57] all solutions were obtained when $A$ is a nilpotent matrix of index two and has its rank equal to one or two. In [10] and [37], all solutions
were considered for an idempotent matrix $A$. All commuting solutions were found in [56] in the case when $A^3 = 0$, $A^2 \neq 0$. In [26] all solutions of the form $I - uv^T$ were considered, with $v^Tu \neq 0$. In [43] all commuting solutions were found when $A = I - PQ^T$, where $P$ and $Q$ are two $n \times 2$ complex matrices of full column rank and $\det(Q^TP) \neq 0$. In [35] two new iterative algorithms with the second-order convergence rate were proposed. An iterative method based on the Hermitian and skew-Hermitian splitting of the coefficient matrix $A$ was used in [13]. For more recent results, see, e.g., [12] and [36].
1.2 Chapter Organization
This chapter is organized as follows: The introductory section (Section 1) familiarizes the reader with the topic of this text. It is a brief survey on some of the existing results regarding this problem and serves as a starting point. Section 2 reviews some basic properties about the equation itself. Some convenient results are combined in this section, since they will be called upon in remaining parts of the manuscript. In particular, Section 2 contains results about spectral solutions (obtained in papers [17] and [18]), results about all solutions when the coefficient matrix A is diagonal (obtained in papers [9] and [28]), some very important examples in the case when n = 2 (obtained by the authors), results regarding the isolation of the trivial solutions (obtained by the authors in [15]), and the results which concern nonlinearity of the equation (acquired by the authors in [15]). Section 3 is dedicated to Sylvester matrix equations. These equations will be exploited heavily throughout the chapter. The section starts with the original Sylvester theorem (see [47]) but mostly consists of the results obtained by the authors in their joint and individual papers (see [14, 23, 24], and [25]). In Section 4 we study the equation (1.1) under the premise that A is a regular matrix. We provide a way for obtaining all commuting solutions in this case (Section 4.1 consists of original unpublished results obtained by Dinčić), and afterward we generate a method for obtaining infinitely many non-commuting solutions (Section 4.2 consists of one result obtained by the authors in [15] and one unpublished result obtained by Djordjević). In Section 5 we solve the YBME by assuming that A is a singular matrix: we provide several methods for obtaining infinitely many solutions. This entire section is mainly based on results obtained by the authors in [15]. We provide the solutions by applying generalized inverses and the core-nilpotent
decomposition of $A$ and by solving occurring singular Sylvester equations. Special attention is dedicated to the case when $A$ is a nilpotent matrix. In Section 6 functional calculus is applied to the coefficient matrix $A$ (regardless of whether it is regular or singular), and infinitely many non-commuting solutions are generated from an arbitrary initial non-commuting solution. Consequently, it is shown that the non-commuting solutions always belong to a path-connected subset of the solution set. These findings are obtained by the authors in [15]. A special case is investigated when $A$ is an $m$-potent matrix, i.e., $A^m = A$, and some relevant results are revisited as well (published in papers [28] and [37]). Section 7 concerns the case where $A$ is a permutation matrix. We manage to find all permutation solutions in this case and provide necessary and sufficient conditions for the existence of doubly stochastic solutions. These results were obtained by Djordjević; some of them are unpublished, while some of them are published in [22]. This section also contains the scenario where $A$ is an invertible matrix such that $A^{-1}$ is a doubly stochastic matrix, and doubly stochastic solutions are provided in this case (these results were obtained by Ding and Rhee in [16]).
2 Some Basic Properties

In this section we revisit some important results about the YBME and provide some appropriate examples which will come in handy for the remaining text.
2.1 Invariance Under Similarity
One way of simplifying the equation is to reduce the matrix $A$ to its Jordan normal form. The following result enables us to consider the equivalence classes of similar matrices.

Lemma 2.1 Let $A \in M_n(\mathbb{C})$ and let $T \in M_n(\mathbb{C})$ be any invertible matrix. Then
$$\mathcal{S}(T^{-1}AT) = T^{-1}\mathcal{S}(A)T. \tag{2.1}$$
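Proof We have the following chain of equivalences:
$$X \in \mathcal{S}(T^{-1}AT) \Leftrightarrow (T^{-1}AT)X(T^{-1}AT) = X(T^{-1}AT)X \Leftrightarrow A(TXT^{-1})A = (TXT^{-1})A(TXT^{-1}) \Leftrightarrow TXT^{-1} \in \mathcal{S}(A) \Leftrightarrow X \in T^{-1}\mathcal{S}(A)T,$$
which concludes the proof. □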
In particular, we may take $T$ to be a matrix whose columns are generalized eigenvectors of $A$ organized in accordance with the Jordan blocks in the Jordan normal form $J$ of the matrix $A$. Therefore, without loss of generality, in what follows we will assume that a matrix $A$ is already in its Jordan normal form, unless otherwise stated. It is worth pointing out that similarity and commutativity are closely related. In fact, the following useful lemma holds due to this connection.

Lemma 2.2 (Horn and Johnson [32, p. 81]) Let $L_1$ and $L_2$ be square matrices of the same dimensions. If $L_1$ and $L_2$ commute, then $\sigma(L_1 - L_2) \subset \sigma(L_1) - \sigma(L_2)$, where
$$\sigma(L_1) - \sigma(L_2) = \{\lambda_i(L_1) - \lambda_j(L_2) : \lambda_i(L_1) \in \sigma(L_1),\ \lambda_j(L_2) \in \sigma(L_2)\}.$$

Proof Since $L_1$ and $L_2$ commute, it follows that for any eigenvalue $\lambda(L_1)$ and for any corresponding eigenvector $u_1$ of the matrix $L_1$, we have $L_1L_2u_1 = L_2L_1u_1 = \lambda(L_1)L_2u_1$; therefore $L_2$ preserves the eigenspaces of $L_1$ and vice versa. Consequently, $L_1$ and $L_2$ can be simultaneously triangularized: there exists a unitary matrix $U$ such that $L_i = U\Delta_iU^*$, where $\Delta_i$ is the upper triangular matrix obtained from $L_i$, for $i = 1, 2$. By construction, the elements on the main diagonal of $\Delta_i$ are precisely the eigenvalues of the matrix $L_i$. Thus $L_1 - L_2 = U(\Delta_1 - \Delta_2)U^*$, and the spectrum of $L_1 - L_2$ consists of the diagonal elements of the matrix $\Delta_1 - \Delta_2$. Finally, denote $d = \dim L_1 = \dim L_2$. We then have
$$\sigma(L_1 - L_2) = \{(\Delta_1 - \Delta_2)_{ii} : i = 1, \dots, d\} = \{\lambda_i(L_1) - \lambda_i(L_2) : i = 1, \dots, d\} \subset \{\lambda_i(L_1) - \lambda_j(L_2) : i, j = 1, \dots, d\}.$$
□

2.1.1 The Case of a Diagonal Matrix A
In particular, when the Jordan normal form of $A$ is a diagonal matrix, the problem of solving the YBME is simplified even further. This fine structure of $A$ was considered by Dong and Ding in [28]. Suppose that $\sigma(A) = \{\lambda_1(A), \dots, \lambda_k(A)\}$ and that $A = TDT^{-1}$, where
$$D = \operatorname{diag}(\lambda_1(A)I_1, \dots, \lambda_k(A)I_k), \tag{2.2}$$
then we may assume $A_i = \lambda_i(A)I_i$, $i = 1, \dots, k$. The $i$-th equation becomes
$$\lambda_i(A)^2X_i = \lambda_i(A)X_i^2.$$
If $\lambda_i(A) = 0$, then there is no restriction on $X_i$, while $\lambda_i(A) \neq 0$ implies $X_i = \lambda_i(A)P_i$, where $P_i$ is a projection matrix.

Lemma 2.3 (Dong and Ding [28, Lemma 2.1]) Let $\lambda$ be any number and let $D = \lambda I_m$. Then all solutions $Y$ to $DYD = YDY$ are commuting ones and are given by the solutions of the homogeneous equation $\lambda Y(\lambda I - Y) = 0$. So, if $\lambda = 0$, then all $m \times m$ matrices $Y$ are solutions of $DYD = YDY$, and if $\lambda \neq 0$, then all the solutions of the equation are given by $Y = \lambda P$, where $P$ ranges over all $m \times m$ projection matrices.

The previous lemma facilitates the problem for those solutions which commute with $A$, i.e., which are simultaneously block-diagonal whenever $A$ is diagonal. Thus Dong and Ding obtained the following result:

Theorem 2.4 (Dong and Ding [28, Theorem 2.1]) Let $A$ be any $n \times n$ diagonalizable matrix with its Jordan form $D$ being given as (2.2), so that $A = TDT^{-1}$ for an invertible matrix $T$. Then all the commuting solutions of the Yang-Baxter-like matrix equation (1.1) are exactly
$$X = T\operatorname{diag}(Y_1, \dots, Y_k)T^{-1},$$
where $Y_i = \lambda_i(A)P_i$ with $P_i$ any $m_i \times m_i$ projection matrix if $\lambda_i(A) \neq 0$ and $Y_i$ is any $m_i \times m_i$ matrix if $\lambda_i(A) = 0$, for $i = 1, \dots, k$.

Proof ([28]) By Lemma 2.1 it suffices to consider the restricted equation $DYD = YDY$. Assume that $Y$ is one of its commuting solutions. Then $Y$ can be decomposed as $Y = [Y_{ij}]$ with respect to $D$, following its block structure. Then from $YD = DY$ we observe the block equations $\lambda_i(A)I_iY_{ij} = Y_{ij}\lambda_j(A)I_j$, which are homogeneous Sylvester equations $SY - YZ = 0$. Recall that such equations have the unique solution $Y = 0$ if and only if $\sigma(S) \cap \sigma(Z) = \emptyset$; see Theorem 3.1 below or [47, Theorem 1]. Precisely, for $i \neq j$ we have that $Y_{ij} = 0$; therefore $Y$ has a block-diagonal form with respect to $D$. Now we have $\lambda_i^2(A)Y_{ii} = \lambda_i(A)Y_{ii}^2$ for $i = 1, \dots, k$, and Lemma 2.3 finishes the proof. □

Commuting solutions are characterized in the above manner when $A$ is diagonalizable. This methodology was developed further by Chen and Yong in [9], where the authors characterized all solutions to the YBME when $A$ is diagonalizable. With respect to our notation, we formulate the main results from [9]:

Lemma 2.5 (Chen and Yong [9, Lemma 1]) Suppose that $Y \in \mathcal{S}(D)$, where $D$ is a diagonal matrix given as $D = \operatorname{diag}(\lambda_1I_1, \lambda_2I_2, \dots, \lambda_kI_k)$, where $\lambda_1\lambda_2\cdots\lambda_k \neq 0$ and $m_i := \dim I_i$, $i = 1, \dots, k$, $m = m_1 + \dots + m_k$. Then
(1) $Y$ is diagonalizable;
(2) any nonzero eigenvalue $\lambda(Y)$ of $Y$ satisfies $\lambda(Y) \in \{\lambda_1, \dots, \lambda_k\}$, and if $\lambda(Y) = \lambda_i$ for some $i \in \{1, \dots, k\}$, then the algebraic multiplicity of $\lambda(Y)$ is no more than $m_i$.

Theorem 2.6 (Chen and Yong [9, Theorem 2]) Given $A \in M_n(\mathbb{C})$, if $A = T\operatorname{diag}(D_m, 0)T^{-1}$ for some nonsingular matrix $T$ and a nonsingular diagonal matrix $D_m = \operatorname{diag}(\lambda_1(A)I_1, \dots, \lambda_k(A)I_k)$, then the general solution $X$ to the YBME (1.1) is given by
$$X = T\begin{bmatrix} Y_1 & Y_2 \\ Y_3 & Y_4 \end{bmatrix}T^{-1},$$
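where
• $Y_4 \in M_{n-m}(\mathbb{C})$ is arbitrary;
• $Y_1 = P\begin{bmatrix} 0 & 0 \\ 0 & \Delta \end{bmatrix}P^{-1} \in M_m(\mathbb{C})$, where $P \in M_m(\mathbb{C})$ is a nonsingular matrix and $\Delta$ is a nonsingular diagonal matrix of dimension $s$, $s \le m$.
• If $Y_1$ is nonsingular (i.e., if $m = s$), then $Y_2 = 0_{m \times (n-m)}$ and $Y_3 = 0_{(n-m) \times m}$.
• Otherwise, let $\tilde{P}$ be the leading principal submatrix of order $m - s$ of $P^{-1}D_m^{-1}P$. For an arbitrary $Q \in M_{(m-s) \times (n-m)}(\mathbb{C})$, let $W \in M_{(n-m) \times (m-s)}(\mathbb{C})$ be an arbitrary matrix which solves $W\tilde{P}Q = 0$. Then
$$Y_2 = D_m^{-1}P\begin{bmatrix} Q \\ 0 \end{bmatrix} \quad\text{and}\quad Y_3 = \begin{bmatrix} W & 0 \end{bmatrix}P^{-1}D_m^{-1}.$$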
Proof [9] Once again, due to Lemma 2.1 we restrict to the equation
$$Y\operatorname{diag}(D_m, 0)Y = \operatorname{diag}(D_m, 0)\,Y\operatorname{diag}(D_m, 0). \tag{2.3}$$
Respectively, $Y$ has the form $Y = \begin{bmatrix} Y_1 & Y_2 \\ Y_3 & Y_4 \end{bmatrix}$. Combining this partition with the previous equation (2.3), we get the system of equations
$$Y_1D_mY_1 = D_mY_1D_m, \quad Y_1D_mY_2 = 0, \quad Y_3D_mY_1 = 0, \quad Y_3D_mY_2 = 0.$$
Since $Y_4$ does not appear in the latter system, there are no restrictions on it; therefore $Y_4$ can be arbitrary. By applying Lemma 2.5 to the first equation, we conclude that $Y_1$ is diagonalizable; thus there exist a nonsingular matrix $P$ and a diagonal nonsingular matrix $\Delta$ such that $Y_1 = P\begin{bmatrix} 0 & 0 \\ 0 & \Delta \end{bmatrix}P^{-1}$. If $\dim\Delta = m$, then $Y_1$ is nonsingular, so the second and the third equation force $Y_2 = 0$ and $Y_3 = 0$. Otherwise, if $\dim\Delta < m$, then there exist nonzero matrices $Q$ and $W$, defined in the manner proposed by the theorem, which solve
the equation $W\tilde{P}Q = 0$. Respectively, solving the equations $Y_1D_mY_2 = 0$, $Y_3D_mY_1 = 0$, and $Y_3D_mY_2 = 0$ in terms of $Q$ and $W$ gives the desired forms for $Y_2$ and $Y_3$. □
2.2 The Case When n = 2
In this section we demonstrate how the YBME (1.1) behaves when $n = 2$. Since we can reduce $A$ to its Jordan normal form, we will simply observe the $2 \times 2$ blocks. Recall that $\operatorname{ran}(e^s) = \mathbb{C} \setminus \{0\}$ for a complex variable $s$.

Example 1 In the case when $A = J_2(0)$, there are the following two families of solutions that depend on two complex parameters $s$ and $t$:
$$\mathcal{S}(A) = \left\{ \begin{bmatrix} 0 & s \\ 0 & t \end{bmatrix}, \begin{bmatrix} e^s & t \\ 0 & 0 \end{bmatrix} : t, s \in \mathbb{C} \right\}.$$
♣
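Example 2 In the case when $A = J_2(\lambda)$, $\lambda \neq 0$, the nontrivial solutions form only one family, which depends on one complex parameter $t$:
$$\left\{ \begin{bmatrix} t & (1 - t/\lambda)^2 \\ -\lambda^2 & 2\lambda - t \end{bmatrix} : t \in \mathbb{C} \right\}.$$
♣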
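Example 3 In the case when $A = \lambda I_2$, $\lambda \neq 0$, there are three families of solutions: two of which depend only on one complex parameter $t$ and one family of solutions that depends on two complex parameters $s$ and $t$:
$$\mathcal{S}(A) = \left\{ \begin{bmatrix} 0 & 0 \\ t & \lambda \end{bmatrix}, \begin{bmatrix} \lambda & 0 \\ t & 0 \end{bmatrix}, \begin{bmatrix} t & e^s \\ e^{-s}(\lambda t - t^2) & \lambda - t \end{bmatrix} : t, s \in \mathbb{C} \right\}.$$
♣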
Example 4 In the case when $A = \operatorname{diag}(\lambda, 0)$, $\lambda \neq 0$, there are three families of solutions: two which depend on two complex parameters $s$ and $t$ and one which depends on one complex parameter $t$:
$$\mathcal{S}(A) = \left\{ \begin{bmatrix} 0 & 0 \\ s & t \end{bmatrix}, \begin{bmatrix} 0 & s \\ 0 & t \end{bmatrix}, \begin{bmatrix} \lambda & 0 \\ 0 & t \end{bmatrix} : t, s \in \mathbb{C} \right\}.$$
♣
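Example 5 In the case when $A = \operatorname{diag}(\lambda, \mu)$, $\lambda\mu(\lambda - \mu) \neq 0$, all nontrivial solutions are given by the following parametric family:
$$\left\{ \begin{bmatrix} \lambda & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & \mu \end{bmatrix}, \begin{bmatrix} \dfrac{\mu^2}{\mu - \lambda} & e^t \\[2mm] -e^{-t}\dfrac{\lambda\mu(\lambda^2 - \lambda\mu + \mu^2)}{(\lambda - \mu)^2} & \dfrac{\lambda^2}{\lambda - \mu} \end{bmatrix} : t \in \mathbb{C} \right\}.$$
Notice that if $\lambda^2 - \lambda\mu + \mu^2 = 0$, then the last parametric family of solutions becomes
$$\left\{ \begin{bmatrix} \lambda & e^t \\ 0 & \mu \end{bmatrix} : t \in \mathbb{C} \right\}.$$
Moreover, in this case (i.e., when $\lambda = \mu(1 \pm i\sqrt{3})/2$), there is one more family of solutions,
$$\left\{ \begin{bmatrix} \dfrac{1 \pm i\sqrt{3}}{2}\mu & 0 \\ t & \mu \end{bmatrix} : t \in \mathbb{C} \right\},$$
cf. Theorem 3.4 in [45]. ♣

Remark 2.7 In the case $A = J_2(0)$, the first solution is commuting for $t = 0$, and the second one is never commuting. For $A = J_2(\lambda)$, $\lambda \neq 0$, there are no commuting solutions in the given family. When $A = \lambda I_2$, all three families of solutions are commuting. For $A = \operatorname{diag}(\lambda, 0)$, $\lambda \neq 0$, both the first and the second solutions are commuting for $s = 0$, while the third family of solutions is always commuting. Finally, in the case $A = \operatorname{diag}(\lambda, \mu)$, $\lambda\mu(\lambda - \mu) \neq 0$, both the first and the second solutions are commuting, while the third one is not; in the special case $\lambda^2 - \lambda\mu + \mu^2 = 0$, the additional lower triangular family commutes with $A$ only for $t = 0$, where it reduces to $X = A$.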
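The parametric families of Examples 1 to 5 can be spot-checked by substituting sample parameter values into $AXA - XAX$. The sketch below uses plain Python complex arithmetic; the helper names and the particular values of $\lambda$, $\mu$, $s$, $t$ are arbitrary illustrative choices, and a small residual at one sample point is of course only a sanity check, not a proof.

```python
import cmath

# Spot check of the 2x2 families from Examples 1-5 at one sample point.
# Helper names and the sample values below are illustrative assumptions.

def mul(P, Q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def residual(A, X):
    """Largest entry of |AXA - XAX|."""
    L, R = mul(mul(A, X), A), mul(mul(X, A), X)
    return max(abs(L[i][j] - R[i][j]) for i in range(2) for j in range(2))

lam, mu = 2 + 1j, -1.5            # nonzero, distinct sample eigenvalues
s, t = 0.3 - 0.2j, 0.7 + 0.4j     # sample parameters
E = cmath.exp

cases = {
    "Ex. 1, first family": ([[0, 1], [0, 0]], [[0, s], [0, t]]),
    "Ex. 1, second family": ([[0, 1], [0, 0]], [[E(s), t], [0, 0]]),
    "Ex. 2": ([[lam, 1], [0, lam]],
              [[t, (1 - t / lam) ** 2], [-lam ** 2, 2 * lam - t]]),
    "Ex. 3, third family": ([[lam, 0], [0, lam]],
                            [[t, E(s)], [E(-s) * (lam * t - t * t), lam - t]]),
    "Ex. 4, third family": ([[lam, 0], [0, 0]], [[lam, 0], [0, t]]),
    "Ex. 5, third family": (
        [[lam, 0], [0, mu]],
        [[mu ** 2 / (mu - lam), E(t)],
         [-E(-t) * lam * mu * (lam ** 2 - lam * mu + mu ** 2) / (lam - mu) ** 2,
          lam ** 2 / (lam - mu)]]),
}

residuals = {name: residual(A, X) for name, (A, X) in cases.items()}
for name, r in residuals.items():
    print(f"{name}: {r:.2e}")
```

Every residual should be at the level of floating-point rounding.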
2.3 Spectral Solutions
In paper [17] Ding and Rhee introduced the notion of a spectral solution. Let $\mathcal{H}(A)$ be the class of complex-valued functions $f$ of a complex variable $\lambda$ which are analytic in an open neighborhood $\Omega \supset \sigma(A)$. The neighborhood $\Omega$ is not fixed: it depends on the choice of $f$ and does not need to be connected. For $\lambda_i \in \sigma(A)$ define $h_{\lambda_i} \in \mathcal{H}(A)$ to be
$$h_{\lambda_i}(\lambda) = \begin{cases} 1, & \text{in an open neighborhood of } \lambda_i, \\ 0, & \text{in an open neighborhood of } \sigma(A) \setminus \{\lambda_i\}. \end{cases}$$
Respectively, the spectral projector $P_i$ associated with $\lambda_i$ is defined as $P_i := h_{\lambda_i}(A)$. It is not hard to verify its most important properties:
(a) $P_i^2 = P_i$;
(b) $P_i$ is a projection onto $\ker((A - \lambda_iI)^{\nu(\lambda_i)})$ along $\operatorname{ran}((A - \lambda_iI)^{\nu(\lambda_i)})$, where $\nu(\lambda_i)$ is the index of the eigenvalue $\lambda_i$, defined as the smallest nonnegative integer $k$ such that $\ker((A - \lambda_iI)^{k+1}) = \ker((A - \lambda_iI)^k)$;
(c) $P_iP_j = 0$, $i \neq j$;
(d) $AP_i = P_iA$;
(e) $\sum_{\lambda_i \in \sigma(A)} P_i = I$.

There are several ways to evaluate those spectral projections, also called Frobenius covariants. One of them is via the Cauchy integral formula:
$$P_j = \frac{1}{2\pi i}\int_{\Gamma_j}(zI - A)^{-1}\,dz,$$
where $\Gamma_j$ is a simple closed rectifiable curve enclosing the domain $D_j$, such that $D_j$ contains precisely one $\lambda_j \in \sigma(A)$ and no other eigenvalue of $A$ belongs to the set $D_j \cup \Gamma_j$. Note that when $P_i$ is a spectral projector associated with $\lambda_i \in \sigma(A)$, then $AP_i \in \mathcal{S}(A)$: since $A$ and $P_i$ commute and $P_i^2 = P_i$, we have
$$A(AP_i)A - (AP_i)A(AP_i) = A^3P_i - A^3P_i^2 = 0.$$
Definition 2.8 If $P_i$ is a spectral projector corresponding to the eigenvalue $\lambda_i \in \sigma(A)$, then $AP_i$ is a spectral solution to the YBME.

Theorem 2.9 ([17]) Let $\lambda_1, \dots, \lambda_s$ denote all distinct eigenvalues of $A$. Then the sum of any number of matrices among $AP_1, \dots, AP_s$ is a solution to the YBME, where $P_1, \dots, P_s$ are the spectral projectors defined by $P_i = h_{\lambda_i}(A)$.

The previous theorem is directly verifiable: for an arbitrary $t \le s$, consider (after relabeling the eigenvalues, if necessary) the sum $AP_1 + \dots + AP_t$. By using the fact that $A$ and $P_i$ commute, we have
$$A\Big(\sum_{i=1}^{t} AP_i\Big)A - \Big(\sum_{i=1}^{t} AP_i\Big)A\Big(\sum_{j=1}^{t} AP_j\Big) = A^3\sum_{i=1}^{t} P_i - A^3\sum_{i=1}^{t} P_i\sum_{j=1}^{t} P_j = 0,$$
where the last equality follows from $P_i^2 = P_i$ and $P_iP_j = 0$ for $i \neq j$.
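Example 6 Let us consider $A = \operatorname{diag}(\lambda, 0)$, $\lambda \neq 0$. Here $\sigma(A) = \{0, \lambda\}$ and we have
$$P_0 = \frac{1}{2\pi i}\int_{\Gamma_0}(zI - A)^{-1}\,dz = \frac{1}{2\pi i}\int_{\Gamma_0}\begin{bmatrix} z - \lambda & 0 \\ 0 & z \end{bmatrix}^{-1}dz = \operatorname{diag}\left(\frac{1}{2\pi i}\int_{\Gamma_0}\frac{dz}{z - \lambda},\ \frac{1}{2\pi i}\int_{\Gamma_0}\frac{dz}{z}\right) = \operatorname{diag}(0, 1),$$
where $\Gamma_0$ is a simple closed rectifiable curve enclosing the domain $D_0$ such that $D_0$ contains the eigenvalue $0$ but $\lambda \notin \Gamma_0 \cup D_0$. Since $P_0 + P_\lambda = I_2$, we conclude that
$$P_0 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \quad P_\lambda = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$
By Theorem 2.9,
$$AP_0 = \begin{bmatrix} \lambda & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = 0, \quad AP_\lambda = \begin{bmatrix} \lambda & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = A$$
are spectral solutions to (1.1); however, these are just the trivial solutions. ♣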
Example 7 On a similar note as in the preceding example, in the case where $A = \operatorname{diag}(\lambda, \mu)$, $\lambda\mu(\lambda - \mu) \neq 0$, we obtain the spectral projections
$$P_\lambda = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad P_\mu = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$
Now, by Theorem 2.9,
$$AP_\lambda = \begin{bmatrix} \lambda & 0 \\ 0 & 0 \end{bmatrix}, \quad AP_\mu = \begin{bmatrix} 0 & 0 \\ 0 & \mu \end{bmatrix}$$
are spectral solutions, and their sum, $AP_\lambda + AP_\mu = A$, is also a solution but a trivial one. ♣

Theorem 2.10 [18] If $E$ is a projection that commutes with $A$, then the matrix $AE$ is a solution of the YBME. Moreover, $(A - AE)^k = A^k - A^kE$.

Indeed, direct calculation shows that
$$A(AE)A - (AE)A(AE) = A^3E - A^3E^2 = 0,$$
hence $AE \in \mathcal{S}(A)$. The second part follows by mathematical induction. For $k = 1$ the statement holds. Suppose that the statement is true for $k$; then we have
$$(A - AE)^{k+1} = (A - AE)^k(A - AE) = (A^k - A^kE)(A - AE) = A^k(I - E)A(I - E) = A^{k+1}(I - E)^2 = A^{k+1}(I - E) = A^{k+1} - A^{k+1}E,$$
which completes the proof.

Recall that an eigenvalue $\lambda(A)$ of a matrix $A \in M_n(\mathbb{C})$ is called simple if its algebraic multiplicity is 1, and it is called semisimple if its algebraic and geometric multiplicities are equal. The index $\nu(z)$ of some $z \in \mathbb{C}$ with respect to a matrix $A$ is the smallest $k \in \mathbb{N}_0$ such that $\ker((A - zI)^{k+1}) = \ker((A - zI)^k)$. Note that $\lambda \in \sigma(A) \Leftrightarrow \nu(\lambda) > 0$, and $\lambda$ is a semisimple eigenvalue if and only if $\nu(\lambda) = 1$.
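Theorem 2.10 can be illustrated on a small non-diagonal example. In the sketch below, which uses exact integer arithmetic in plain Python, the matrix $A = J_2(2) \oplus (5)$ and the projection $E = \operatorname{diag}(1, 1, 0)$ (the spectral projector of the eigenvalue 2) are sample choices made here for illustration; the helper names are likewise hypothetical.

```python
# Illustration of Theorem 2.10 with exact integer arithmetic.
# A, E and all helper names are sample choices, not taken from the chapter.

def mul(P, Q):
    """Product of two square matrices stored as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sub(P, Q):
    """Entrywise difference P - Q."""
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def power(P, k):
    """P raised to the integer power k >= 1."""
    R = P
    for _ in range(k - 1):
        R = mul(R, P)
    return R

A = [[2, 1, 0], [0, 2, 0], [0, 0, 5]]   # J_2(2) direct sum (5)
E = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]   # spectral projector of eigenvalue 2
AE = mul(A, E)

commutes = mul(A, E) == mul(E, A)                    # AE = EA
idempotent = mul(E, E) == E                          # E^2 = E
solves = mul(mul(A, AE), A) == mul(mul(AE, A), AE)   # A(AE)A = (AE)A(AE)
k = 3
power_identity = power(sub(A, AE), k) == sub(power(A, k), mul(power(A, k), E))

print(commutes, idempotent, solves, power_identity)
```

Here $AE = J_2(2) \oplus (0)$ is a nontrivial commuting solution, in contrast to Examples 6 and 7, where the spectral solutions of the diagonal $2 \times 2$ matrices are trivial or diagonal.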
Theorem 2.11 ([18]) Suppose that $\lambda$ is a semisimple eigenvalue of $A$ with multiplicity $m$. Let $X$ and $Y$ be two $n \times m$ matrices whose columns form the bases of $\ker((A - \lambda I)^{\nu(\lambda)})$ and $\ker(((A - \lambda I)^{*})^{\nu(\lambda)})$, respectively, such that $E = XY^{*}$ and $Y^{*}X = I$. Then for any $m \times m$ matrix $M$ such that $M^2 = M$, the matrix $E_M = XMY^{*}$ is a projection such that $AE_M = E_MA$. So the matrix $AE_M$ is a solution of the YBME (1.1).
2.4 Trivial Solutions
As in many nonlinear problems, the two trivial solutions behave quite differently from the nontrivial ones. In this section we prove that, when certain conditions are imposed on $A$, the trivial solutions $X = 0$ and $X = A$ are isolated points in the solution set. Conversely, we demonstrate that without these conditions the two trivial solutions need not be isolated in $\mathcal{S}(A)$. This conditional isolation property is remarkable, as we will later on prove that the nontrivial solutions are never isolated.

Below we formulate and prove the famous Banach fixed point theorem. This is almost a part of mathematical folklore and is frequently used in solving nonlinear problems. Since matrix equations are studied not only by mathematicians, but by engineers and physicists as well, we have decided to give a formal proof of the theorem, in order to make this chapter self-contained. The formulation of the theorem and its proof are the standard ones and can be found in any textbook on mathematical analysis (see, e.g., [44]).

Definition 2.12 For a non-empty set $V$, a function $f : V \to V$ is said to have a fixed point in $V$ if there exists an element $a \in V$ such that $f(a) = a$. In that case, the said element $a \in V$ is a fixed point for the function $f$.

Definition 2.13 Let $W$ be a subset of a metric space $(V, d)$. A function $f : V \to V$ is a contraction on $W$ if there exists a $q \in (0, 1)$ such that for any $x, y \in W$ the inequality $d(f(x), f(y)) \le q\,d(x, y)$ holds. If $W = V$, then $f$ is simply called a contraction on $V$.

Theorem 2.14 (Banach fixed point theorem) Let $(V, d)$ be a complete metric space and let $f : V \to V$ be a contraction on $V$. Then there exists a unique fixed point $x_\infty$ for $f$ in $V$.
Proof Let $x_0$ be an arbitrary point in the complete metric space $V$, and put $x_1 := f(x_0)$ and $x_2 := f(x_1) = f(f(x_0))$. By virtue of $f$ being a contraction, there exists a $q \in (0, 1)$ such that $d(f(x_2), f(x_1)) \le q\,d(x_2, x_1) \le q^2d(x_1, x_0)$. Continuing this process, for any $n \in \mathbb{N}$ let $x_n := f(x_{n-1})$ be a recursively defined $V$-valued sequence. Then
$$d(f(x_n), f(x_{n-1})) \le q\,d(x_n, x_{n-1}) \le \dots \le q^n d(x_1, x_0).$$
Fix arbitrary $n$ and $m$, where $m > n$, and observe the afore-given $V$-valued sequence $(x_k)_{k \in \mathbb{N}}$. Let $\varepsilon' > 0$ be an arbitrarily small positive number. Due to the triangle inequality, the following chain of inequalities holds:
$$d(x_m, x_n) \le d(x_m, x_{m-1}) + d(x_{m-1}, x_{m-2}) + \dots + d(x_{n+1}, x_n) \le q^{m-1}d(x_1, x_0) + q^{m-2}d(x_1, x_0) + \dots + q^n d(x_1, x_0)$$
$$= q^n d(x_1, x_0)\sum_{k=0}^{m-1-n} q^k < q^n d(x_1, x_0)\sum_{k=0}^{\infty} q^k = \frac{q^n}{1 - q}\,d(x_1, x_0) < \varepsilon',$$
for large enough $n$. Thus the $V$-valued sequence $(x_k)_{k \in \mathbb{N}}$ is a Cauchy sequence with values in the complete metric space; therefore it is convergent, i.e., there exists a unique $x_\infty \in V$ such that $x_\infty = \lim_n x_n$, that is, $\lim_n d(x_\infty, x_n) = 0$. But then again for any positive $\varepsilon > 0$, there exists a large enough $N_1^\varepsilon \in \mathbb{N}$ such that $d(x_\infty, x_k) < \varepsilon/2$ for any $k \ge N_1^\varepsilon$, and similarly we define
$$N_0^\varepsilon := \left[\log_q\frac{(1 - q)\varepsilon}{2d(x_1, x_0)}\right] + 1,$$
where $[\cdot]$ is the floor function. Then
$$N_0^\varepsilon > \log_q\frac{(1 - q)\varepsilon}{2d(x_1, x_0)}, \qquad (1 - q)\varepsilon > 2d(x_1, x_0)\,q^{N_0^\varepsilon}.$$
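Choose $N^\varepsilon := \max\{N_1^\varepsilon, N_0^\varepsilon\}$ and verify that
$$d(f(x_\infty), x_\infty) \le d(f(x_\infty), f(x_{N^\varepsilon})) + d(x_\infty, f(x_{N^\varepsilon})) \le q\,d(x_\infty, x_{N^\varepsilon}) + d(x_\infty, x_{N^\varepsilon}) + d(x_{N^\varepsilon}, f(x_{N^\varepsilon}))$$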
[…] $R > 0$ be arbitrary. Denote the following:
$$K = \{S \in \mathbb{C}^{n \times n} : \|S\| \le R\} \quad\text{and}\quad W = \{S \in \mathbb{C}^{n \times n} : \|S\| \le \|A^{-1}\|^2R\}.$$
Introduce the function $f : K \to W$ as $f(Z) := A^{-1}ZA^{-1}$, for every $Z \in K$. Let $0 < q < 1$ be arbitrary and denote $s := q(2R\|A^{-1}\|^3)^{-1}$. Define $g$ on $W$ as $g : Z \mapsto sZAZ$. Then $g \circ f$ maps $K$ into $K$: for any $Z \in K$ we have
$$\|g(f(Z))\| = s\|A^{-1}ZA^{-1}ZA^{-1}\| \le s\|A^{-1}\|^3R^2 = qR/2 < R.$$
On the other hand, for any $Z_1$ and $Z_2$ in $K$, we get
$$\|g(f(Z_1)) - g(f(Z_2))\| = s\|A^{-1}Z_1A^{-1}Z_1A^{-1} - A^{-1}Z_2A^{-1}Z_2A^{-1}\| \le s\|A^{-1}\|^2\,\|(Z_2 + (Z_1 - Z_2))A^{-1}Z_1 - Z_2A^{-1}Z_2\|$$
$$\le s\|A^{-1}\|^2\big(\|Z_2A^{-1}(Z_1 - Z_2)\| + \|(Z_1 - Z_2)A^{-1}Z_1\|\big) \le 2sR\|A^{-1}\|^3\|Z_1 - Z_2\| = q\|Z_1 - Z_2\|.$$
The above proves that $g \circ f$ maps $K$ into $K$ and is a contraction on $K$. Thus there exists a unique $Z_0 \in K$ such that $Z_0 = g(f(Z_0))$. Since $0 = g(f(0))$, it follows that $Z_0 = 0$. On the other hand, let $X$ be a solution to (1.1) which satisfies $\|X\| \le q(2k^2(A)\|A^{-1}\|)^{-1}$ (with $k(A) = \|A\|\|A^{-1}\|$). Then
$$\|X\| \le q\big(2k^2(A)\|A^{-1}\|\big)^{-1} = \frac{sR}{\|A\|^2},$$
which proves that $\|s^{-1}X\| \le R/\|A\|^2$ and finally that $A(s^{-1}X)A \in K$. But then
$$AXA = XAX \Leftrightarrow s^{-2}AXA = s^{-2}XAX \Leftrightarrow s^{-1}A(s^{-1}X)A = (s^{-1}X)A(s^{-1}X) \Leftrightarrow A(s^{-1}X)A = g(f(A(s^{-1}X)A)),$$
which proves that $0 = A(s^{-1}X)A$ and consequently $X = 0$. Since this conclusion holds for every solution $X$ which satisfies $\|X\| \le q(2k^2(A)\|A^{-1}\|)^{-1}$ and $q \in (0, 1)$ was arbitrary, we conclude that the only solution to (1.1) which satisfies $\|X\| < (2k^2(A)\|A^{-1}\|)^{-1}$ is $X = 0$. □

Example 8 Recall Example 2 with $A = J_2(\lambda)$, where $\lambda \neq 0$. Then the set of nontrivial solutions is given as
$$\left\{ X(t) = \begin{bmatrix} t & (1 - t/\lambda)^2 \\ -\lambda^2 & 2\lambda - t \end{bmatrix} : t \in \mathbb{C} \right\}.$$
Respectively, the Frobenius norm of any nontrivial solution is given as
$$\|X(t)\|_F^2 = |\lambda|^4 + |2\lambda - t|^2 + |t|^2 + |t/\lambda - 1|^4.$$
First we prove that
$$\inf\{\|X(t)\|_F^2 : t \in \mathbb{C}\} = \min\{\|X(t)\|_F^2 : t \in \mathbb{C}\} = \|X(\lambda)\|_F^2. \tag{2.4}$$
We obtain the relation (2.4) in the following manner: for $x \in \mathbb{R}_0^+$ and $a \in \mathbb{R}^+$, observe the real function $h$ defined as
$$h(x) := (2a - x)^2 + x^2 + \left(\frac{x}{a} - 1\right)^4.$$
The function $h$ is convex and nonnegative, so there must exist an absolute minimum for $h$. Differentiating the function with respect to $x$, we get
$$h'(x) = -4a + 4x + \frac{4}{a}\left(\frac{x}{a} - 1\right)^3,$$
while the second derivative is strictly positive:
$$h''(x) = 4\left(1 + \frac{3}{a^2}\left(\frac{x}{a} - 1\right)^2\right) > 0.$$
Thus the absolute minimum for $h$ is obtained at $x_0$ iff $h'(x_0) = 0$, i.e.,
$$(x_0 - a)\big(a^4 + (x_0 - a)^2\big) = 0 \Leftrightarrow x_0 = a.$$
Therefore $h(x) \ge h(a) = 2a^2$ for every $x \ge 0$. Substituting $a := |\lambda|$ and $x := |t|$, we get (by using the triangle inequality $|z_1 - z_2| \ge \big||z_1| - |z_2|\big|$)
$$\|X(t)\|_F^2 = |\lambda|^4 + |2\lambda - t|^2 + |t|^2 + |t/\lambda - 1|^4 \ge |\lambda|^4 + \big|2|\lambda| - |t|\big|^2 + |t|^2 + \big||t/\lambda| - 1\big|^4 = |\lambda|^4 + h(|t|) \ge |\lambda|^4 + h(|\lambda|) = |\lambda|^4 + 2|\lambda|^2 = \|X(\lambda)\|_F^2,$$
so indeed (2.4) holds, that is, the solution $X(\lambda)$ is the nontrivial solution to the YBME which is the closest to the trivial solution $X = 0$. We are going to show that $X(\lambda) \notin B_0(r(\lambda))$, where $B_0(r(\lambda))$ is the centered ball which contains only the zero solution to the YBME.
By Theorem 2.15, the radius $r(\lambda)$ of the ball $B_0(r(\lambda))$ is calculated as
$$r(\lambda) = \left(2\|A\|_F^2\|A^{-1}\|_F^3\right)^{-1} = \left(2(1 + 2|\lambda|^2)\left(\frac{1}{|\lambda|^4} + \frac{2}{|\lambda|^2}\right)^{3/2}\right)^{-1} = \frac{|\lambda|^6}{2(1 + 2|\lambda|^2)^{5/2}}.$$
Note that $r(\cdot)$ depends only on the modulus of the parameter $\lambda$. It is not hard to see that $r(\lambda) \to 0$ when $|\lambda| \to 0$, and $r(\lambda) \sim 2^{-7/2}|\lambda|$ when $|\lambda| \to \infty$. Also note that $A \notin B_0(r(\lambda))$, since $\|A\|_F = \sqrt{1 + 2|\lambda|^2} > r(\lambda)$. We compute that

|λ|      ‖X(λ)‖_F    r(λ)        ‖X(λ)‖_F − r(λ)
0.01     0.0141      ≈ 0.0000    0.0141
0.1      0.1418      ≈ 0.0000    0.1418
0.3      0.4337      0.0002      0.4335
0.5      0.7500      0.0028      0.7472
1        1.7321      0.0321      1.7000
1.1      1.9708      0.0410      1.9299
1.3      2.4972      0.0601      2.4371
1.5      3.0923      0.0803      3.0120
2        4.8990      0.1317      4.7673
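The tabulated values follow directly from the closed-form expressions $\|X(\lambda)\|_F = \sqrt{|\lambda|^4 + 2|\lambda|^2}$ and $r(\lambda) = |\lambda|^6 / \big(2(1 + 2|\lambda|^2)^{5/2}\big)$ derived above. A short sketch that recomputes them is given below; the function names `norm_X` and `radius` are illustrative.

```python
import math

# Recompute the quantities tabulated in Example 8 from the closed forms.
# The function names are illustrative, not taken from the chapter.

def norm_X(lam_abs):
    """Frobenius norm of X(lambda): sqrt(|lambda|^4 + 2|lambda|^2)."""
    return math.sqrt(lam_abs ** 4 + 2 * lam_abs ** 2)

def radius(lam_abs):
    """r(lambda) = |lambda|^6 / (2 (1 + 2|lambda|^2)^(5/2))."""
    return lam_abs ** 6 / (2 * (1 + 2 * lam_abs ** 2) ** 2.5)

rows = [(a, norm_X(a), radius(a))
        for a in (0.01, 0.1, 0.3, 0.5, 1, 1.1, 1.3, 1.5, 2)]

for a, nx, r in rows:
    # columns: |lambda|, ||X(lambda)||_F, r(lambda), their difference
    print(f"{a:5.2f}  {nx:7.4f}  {r:7.4f}  {nx - r:7.4f}")
```

In every row the norm of the closest nontrivial solution exceeds the radius $r(\lambda)$, as the example claims.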
Below we plot the graph (Figure 1) for $d(|\lambda|) := \|X(\lambda)\|_F - r(\lambda)$, which represents the Euclidean distance between the solution $X(\lambda)$ and the ball $B_0(r(\lambda))$. Obviously there are no nontrivial solutions $X(t)$ in $B_0(r(\lambda))$. ♣

Figure 1 Graph of the function $d(|\lambda|)$
The trivial solution $X = A$ is isolated under stronger conditions.

Theorem 2.16 [15] Let $A$ be an invertible matrix, such that for any (not necessarily different) eigenvalues $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4 \in \sigma(A)$, the following condition holds:
$$\mu_1\mu_2 - (\mu_3^2 + \mu_4^2) \neq 0. \tag{2.5}$$
Then there exists a positive $r_0$ such that the only solution to (1.1) in $B_A(r_0)$ is $X = A$.

Proof Let $R > 0$ be arbitrary. Define the sets $K$ and $W$ as before, $K = \overline{B_0(R)}$ and $W = \overline{B_0(3^{-1}\|A\|^{-2}R)}$. For any $Z \in W$ introduce the following linear operators (which are in fact square matrices over $M_n(\mathbb{C})$): $L_1 : Z \mapsto AZA$ and $L_2 : Z \mapsto A^2Z + ZA^2$. Then
$$\|(L_1 - L_2)(Z)\| \le \|A\|^2\|Z\| + 2\|A\|^2\|Z\| \le R,$$
so $(L_1 - L_2)(Z) \in K$ for every $Z \in W$. In what follows we are going to show that $L_1 - L_2$ is an invertible matrix, by applying Lemma 2.2. Since $L_1$ and $L_2$ commute, it follows that
$$\sigma(L_1 - L_2) \subset \sigma(L_1) - \sigma(L_2) = \{\lambda_1 - \lambda_2 : \lambda_i \in \sigma(L_i),\ i = 1, 2\}.$$
To start, we have that $\lambda \in \sigma(L_1)$ if and only if there exists an $S \neq 0$ such that
$$ASA = \lambda S \Leftrightarrow AS - \lambda SA^{-1} = 0, \quad S \neq 0.$$
This requires the homogeneous Sylvester equation $AY - \lambda YA^{-1} = 0$ to be solvable for a nontrivial $Y$. Thus, it is necessary and sufficient (see Section 3, Theorem 3.1 or Sylvester [47, Theorem 1]) that $\sigma(A) \cap \sigma(\lambda A^{-1}) \neq \emptyset$. Applying the spectral mapping theorem, it follows that there exists a $\mu \in \sigma(A)$ such that
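$$\mu \in \sigma(\lambda A^{-1}) = \frac{\lambda}{\sigma(A)} \Leftrightarrow \lambda \in \sigma(\mu A).$$
This proves that
$$\sigma(L_1) \subset \{\mu_1\mu_2 : \mu_1, \mu_2 \in \sigma(A)\}.$$
On the other hand, for $L_2$ we have $L_2(Z) = A^2Z + ZA^2$, where the operators $Z \mapsto A^2Z$ and $Z \mapsto ZA^2$ are commuting square matrices over $M_n(\mathbb{C})$. Therefore, Lemma 2.2 gives
$$\sigma(L_2) \subset \sigma(A^2) + \sigma(A^2) = \{\mu_3^2 + \mu_4^2 : \mu_3, \mu_4 \in \sigma(A)\}.$$
Combining the previous observations, we get
$$\sigma(L_1 - L_2) \subset \{\mu_1\mu_2 - (\mu_3^2 + \mu_4^2) : \mu_1, \mu_2, \mu_3, \mu_4 \in \sigma(A)\} \not\ni 0.$$
This shows that $L_1 - L_2$ is an invertible linear operator from $W$ to $\operatorname{ran}(L_1 - L_2)$ and $\operatorname{ran}(L_1 - L_2) \subset K$. Decompose $K = \operatorname{ran}(L_1 - L_2) \oplus \operatorname{ran}(L_1 - L_2)^\perp$. There exists a bounded linear operator $L$ (which is a matrix), such that it maps $K$ to $W$ and is defined as
$$L = \begin{bmatrix} (L_1 - L_2)^{-1} & 0 \end{bmatrix} : \begin{bmatrix} \operatorname{ran}(L_1 - L_2) \\ \operatorname{ran}(L_1 - L_2)^\perp \end{bmatrix} \to W.$$
Observe the following chain of inequalities:
$$6\|A\|^2 > 3\|A\|^2 \ge \|L_1 - L_2\| \Rightarrow 6\|A\|^2\|L\| > \|L_1 - L_2\|\|L\| \ge 1 \Rightarrow 6\|L\|\|A\|^2 > 1.$$
In that sense, let $0 < q < (6\|L\|\|A\|^2)^{-1} < 1$ be arbitrary. Let $s := q\,\dfrac{9\|A\|^3}{R}$. Define $g$ on $W$ as $g(Z) := -sZAZ$. It follows that $g \circ L$ maps $K$ into $K$: for any $Z \in K$ we have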
$$\|g(L(Z))\| \le s\|A\|\|L(Z)\|^2 \le s\|A\|\frac{R^2}{(3\|A\|^2)^2} = s\frac{R^2}{9\|A\|^3} = qR < R.$$
Additionally, $g \circ L$ is a contraction on $K$: for any two $Z_1$ and $Z_2$ in $K$, we have
$$\|g(L(Z_1)) - g(L(Z_2))\| = s\|L(Z_1)AL(Z_1) - L(Z_2)AL(Z_2)\| \le s\|(L(Z_1) - L(Z_2))AL(Z_1)\| + s\|L(Z_2)A(L(Z_1) - L(Z_2))\|$$
$$\le s\|A\|\|L(Z_1)\|\|L(Z_1 - Z_2)\| + s\|A\|\|L(Z_2)\|\|L(Z_1 - Z_2)\| \le s\left(\|A\|\|L\|\frac{2}{3}\|A\|^{-2}R\right)\|Z_1 - Z_2\| = s\frac{2\|L\|R}{3\|A\|}\|Z_1 - Z_2\|$$
$$= q\frac{9\|A\|^3}{R}\cdot\frac{2\|L\|R}{3\|A\|}\|Z_1 - Z_2\| = 6q\|L\|\|A\|^2\|Z_1 - Z_2\| < \|Z_1 - Z_2\|.$$
The previous analysis shows that there exists a unique $Z_0 \in K$ such that $Z_0 = g(L(Z_0))$, and that is precisely $Z_0 = 0$. To complete the proof, let $r_q := 3q\|A\|$ and assume there exists an $X \in B_A(r_q)$ which is a solution to (1.1). Then $Z := A - X$ satisfies
$$\|Z\| \le 3q\|A\| = \frac{1}{3}\|A\|^{-2}R\cdot q\frac{9\|A\|^3}{R} = \frac{1}{3}\|A\|^{-2}Rs$$
and
$$AZA = A^3 - XAX = A^3 - (A - Z)A(A - Z) = ZA^2 + A^2Z - ZAZ \Leftrightarrow AZA - ZA^2 - A^2Z = -ZAZ.$$
Multiplying the last identity by $s^{-1}$ gives
$$A(s^{-1}Z)A - (s^{-1}Z)A^2 - A^2(s^{-1}Z) = -s(s^{-1}Z)A(s^{-1}Z).$$
By substituting $Z' := s^{-1}Z$, we get $Z' \in W$ and the latter equation is equivalent to
$$(L_1 - L_2)(Z') = AZ'A - Z'A^2 - A^2Z' = -sZ'AZ' = g(L((L_1 - L_2)(Z')))$$
$$\Leftrightarrow (L_1 - L_2)(Z') \text{ is a fixed point for } g \circ L \text{ in } K \Leftrightarrow (L_1 - L_2)(Z') = 0 \Leftrightarrow Z' = 0 \Leftrightarrow Z = 0 \Leftrightarrow A = X.$$
The previous analysis holds for every solution $X$ to (1.1) which is in $B_A(r_q)$, so varying $q$ from $0$ to $(6\|L\|\|A\|^2)^{-1}$, we obtain
$$r_0 = \sup_q 3q\|A\| = \frac{1}{2\|A\|\|L\|}.$$
□
Corollary 2.17 [15] Let $A$ be an invertible matrix.
(a) If $A$ has only one eigenvalue, $\sigma(A) = \{\lambda\}$, then $X = A$ is an isolated solution to the YBME (1.1).
(b) If $A$ has two distinct real eigenvalues $\lambda_1$ and $\lambda_2$ such that $\lambda_i \neq 2\lambda_j$ and $|\lambda_i| \neq \sqrt{2}\,|\lambda_j|$ when $i, j \in \{1, 2\}$, $i \neq j$, then $X = A$ is an isolated solution to (1.1).

Proof We verify that (2.5) holds in both cases.
(a) When $\sigma(A) = \{\lambda\}$, then $(\mu_1, \mu_2, \mu_3, \mu_4) \in \sigma(A)^4$ gives only one choice, $\mu_k = \lambda$, $k = 1, \dots, 4$. Thus the condition (2.5) obviously holds: $2\lambda^2 \neq \lambda^2$.
(b) When $A$ has two distinct eigenvalues, then $(\mu_1, \mu_2, \mu_3, \mu_4) \in \sigma(A)^4$ has several possibilities. When $\mu_k = \lambda_1$ for all $k = 1, \dots, 4$, or $\mu_k = \lambda_2$ for all $k = 1, \dots, 4$, the condition (2.5) obviously holds. Thus we are interested in the remaining cases. Let $i, j \in \{1, 2\}$ be fixed such that $i \neq j$. Then by the assumptions of the corollary, one has $2\lambda_i^2 \neq \lambda_i\lambda_j$, $2\lambda_i^2 \neq \lambda_i^2$, $2\lambda_i^2 \neq \lambda_j^2$, and $\lambda_i^2 + \lambda_j^2 \neq \lambda_i^2$. Finally, since the quadratic mean is greater than the geometric mean, we get
\[
\frac{\lambda_1^2 + \lambda_2^2}{2} > |\lambda_1\lambda_2| \iff \lambda_1^2 + \lambda_2^2 > 2|\lambda_1||\lambda_2| \ge |\lambda_1\lambda_2| \ge \lambda_1\lambda_2,
\]
so the condition (2.5) holds. □

The following example shows that the condition (2.5) cannot be weakened.

Example 9 Recall Example 5, with coefficient matrix $A = \mathrm{diag}(\lambda, \mu)$, where $\lambda\mu(\lambda - \mu) \neq 0$. Assume that $\lambda^2 + \mu^2 = \lambda\mu$. Then one family of solutions to the YBME is parametrized as
\[
X(t) = \begin{bmatrix} \lambda & e^{t} \\ 0 & \mu \end{bmatrix}, \quad t \in \mathbb{C}.
\]
By taking $t_n := \ln\frac{1}{n}$, where $\ln$ is any branch of the complex logarithm, we obtain a sequence of nontrivial solutions
\[
X_n := X(t_n) = \begin{bmatrix} \lambda & 1/n \\ 0 & \mu \end{bmatrix} \to A, \quad n \to +\infty,
\]
where the convergence occurs in every matrix topology. This proves that the trivial solution $A$ is not isolated if (2.5) does not hold. However, note that the trivial solution $X = A$ is never attained via the afore-given parametric set, since zero is not in the range of the complex exponential function. ♣

As stated before, neither Theorem 2.15 nor Theorem 2.16 holds when $A$ is a singular matrix. Recall Example 1: if $A = J_2(0)$ is given as
\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},
\]
then one family of solutions is given as
\[
X(t, s) = \begin{bmatrix} 0 & s \\ 0 & t \end{bmatrix}, \quad t, s \in \mathbb{C}.
\]
When $t = 0$, varying $s$ from 0 to 1, we obtain both trivial solutions $X = 0$ and $X = A$, so the trivial solutions are not isolated. ♣
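The convergence in Example 9 can be observed numerically. The following is a minimal sketch, assuming the sample values $\lambda = 1$ and $\mu = e^{i\pi/3}$ (chosen here only because they satisfy $\lambda^2 + \mu^2 = \lambda\mu$); the helper name `ybme_residual` is hypothetical and not part of the chapter.

```python
import numpy as np

def ybme_residual(A, X):
    """Frobenius norm of AXA - XAX (zero exactly when X solves the YBME)."""
    return np.linalg.norm(A @ X @ A - X @ A @ X)

# Assumed sample data: mu = lam * exp(i*pi/3) gives lam^2 + mu^2 = lam*mu.
lam = 1.0 + 0.0j
mu = lam * np.exp(1j * np.pi / 3)
A = np.diag([lam, mu])

residuals = []
distances = []
for n in (1, 10, 100, 1000):
    # X_n = X(t_n) with e^{t_n} = 1/n in the upper-right corner.
    Xn = np.array([[lam, 1.0 / n], [0.0, mu]])
    residuals.append(ybme_residual(A, Xn))
    distances.append(np.linalg.norm(Xn - A))
```

Each `Xn` differs from `A`, the residuals stay at rounding level, and the distances to `A` shrink like $1/n$, which is the non-isolation phenomenon described above.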
2.5 Nonlinearity of the Problem
Notice that the YBME is indeed a nonlinear equation in $X$, in spite of $A \in M_n(\mathbb{C})$ being a square matrix (a continuous linear operator on $\mathbb{C}^n$). To see this, observe the matrix-valued mapping
\[
F : M_n(\mathbb{C}) \to M_n(\mathbb{C}), \quad F(X) := AXA - XAX.
\]
Then solving the YBME (1.1) is equivalent to solving
\[
F(X) = 0. \tag{2.6}
\]
Let $\alpha \notin \{0, 1\}$ be an arbitrary complex scalar. Then
\[
F(\alpha X) - \alpha F(X) = A(\alpha X)A - (\alpha X)A(\alpha X) - \alpha(AXA - XAX) = (\alpha - \alpha^2)XAX,
\]
which is nonzero whenever $XAX \neq 0$, i.e., the mapping $F$ is not homogeneous. This proves the following theorem:

Theorem 2.18 [15] If $X_1 \in S$, then $\alpha X_1 \in S$ iff $\alpha = 0 \lor \alpha = 1 \lor AX_1A = 0$.

Further, for any $X_1, X_2 \in M_n(\mathbb{C})$ we have
\[
\begin{aligned}
F(X_1) + F(X_2) - F(X_1 + X_2) &= AX_1A - X_1AX_1 + AX_2A - X_2AX_2 \\
&\quad - \big(A(X_1 + X_2)A - (X_1 + X_2)A(X_1 + X_2)\big) \\
&= X_1AX_2 + X_2AX_1,
\end{aligned}
\]
which need not be zero, proving that $F$ is not additive. Therefore, $F$ is not a linear operator; hence the YBME is not a linear equation in $X$. The previous calculations prove the following statement:

Theorem 2.19 [15] If $X_1, X_2 \in S$, then $X_1 + X_2 \in S$ if and only if $X_1AX_2 + X_2AX_1 = 0$.

An effective tool for solving nonlinear problems such as (2.6) (or, equivalently, the YBME (1.1)) is fixed point theory. In Section 2.4 the Banach fixed point theorem was applied and it was shown that the trivial solutions are isolated under certain conditions. Note, however, that the Banach fixed point theorem is not the only one suitable for our problem. The following lemma is rather convenient for the forthcoming results.

Lemma 2.20 Let $A$ be an invertible matrix. Then $X \in S(A)$ if and only if $XA$ is a fixed point for the function
\[
\Psi(Y) := A^{-1}Y^2A^{-1}. \tag{2.7}
\]
Proof Rewriting the initial equation, we get
\[
AXA = XAX \iff XA = A^{-1}XAXAA^{-1} \iff XA = \Psi(XA). \qquad \square
\]
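The two identities behind Theorems 2.18 and 2.19, and the fixed point property of Lemma 2.20, can be checked on sample data. This is only an illustrative sketch: the matrix `A` below is an arbitrary invertible example (its determinant is 25), and `X1`, `X2` are random test matrices, none of which come from the chapter.

```python
import numpy as np

# Assumed sample data; A is invertible (det = 25).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])
rng = np.random.default_rng(0)
X1 = rng.standard_normal((3, 3))
X2 = rng.standard_normal((3, 3))

def F(X):
    """F(X) = AXA - XAX, so the YBME reads F(X) = 0."""
    return A @ X @ A - X @ A @ X

def Psi(Y):
    """Psi(Y) = A^{-1} Y^2 A^{-1} from (2.7)."""
    A_inv = np.linalg.inv(A)
    return A_inv @ Y @ Y @ A_inv

alpha = 0.5
homogeneity_gap = F(alpha * X1) - alpha * F(X1)   # expected: (alpha - alpha^2) X1 A X1
additivity_gap = F(X1) + F(X2) - F(X1 + X2)       # expected: X1 A X2 + X2 A X1

# The trivial solution X = A gives XA = A^2, which Psi must fix.
fixed_point_gap = Psi(A @ A) - A @ A
```

Both gaps are generically nonzero, which is exactly the failure of homogeneity and additivity used above, while `fixed_point_gap` vanishes up to rounding.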
The function Ψ is useful for obtaining particular nontrivial solutions when A is invertible. This will be exploited throughout further text.
3 Sylvester Matrix Equations: A Tool for Generating Nontrivial Solutions

Before we proceed with further results, we revisit one important class of matrix equations, called the Sylvester equations. We begin with some standard results, which can be found in the available literature, e.g., [5, 6], and [47], and finish this section with the authors' original results on this topic, which are essential for studying the YBME (1.1). These original results were obtained in [14, 23, 24], and [25].

For given positive integers $n$ and $m$, let $A \in M_n(\mathbb{C})$, $B \in M_m(\mathbb{C})$, and $C \in M_{n \times m}(\mathbb{C})$ be given matrices. Equations of the form
\[
AX - XB = C \tag{3.1}
\]
with $X \in M_{n \times m}(\mathbb{C})$ being the unknown matrix are called Sylvester equations, in honor of J. J. Sylvester. These equations have found vast applications in numerous different fields of matrix analysis and operator theory, as well as in physics, engineering, control theory, robotics, etc. We start with the very first result on this topic, which was proved by Sylvester in 1884 and is nowadays known as the Sylvester theorem.

Theorem 3.1 [47] Let $A$, $B$, and $C$ be matrices of appropriate dimensions. The equation (3.1) has a unique solution $X$ if and only if $\sigma(A) \cap \sigma(B) = \emptyset$.

Proof This result can be proved in several different ways. Here we provide the proof from [6], as it is the most suitable for the further text.

The if part. Observe the linear maps $L_1 : X \mapsto AX$ and $L_2 : X \mapsto XB$ for $X \in M_{n \times m}(\mathbb{C})$. Then $L_1, L_2 \in M_{(nm) \times (nm)}(\mathbb{C})$ and $L_1L_2 = L_2L_1$. By Lemma 2.2 it follows that
\[
\sigma(L_1 - L_2) \subset \sigma(A) - \sigma(B).
\]
Since $0 \notin \sigma(A) - \sigma(B)$, the matrix $L_1 - L_2$ generates an invertible continuous linear operator over the space $M_{n \times m}(\mathbb{C})$. Consequently, for every $C \in M_{n \times m}(\mathbb{C})$ there exists a unique $X \in M_{n \times m}(\mathbb{C})$ such that
\[
C = (L_1 - L_2)(X) = AX - XB,
\]
thus the Sylvester equation (3.1) is uniquely solved in this case and the solution is $X := (L_1 - L_2)^{-1}(C)$.

The only if part. Conversely, assume that $A$ and $B$ share an eigenvalue $\lambda$. Then $\bar{\lambda} \in \sigma(A^*)$; thus there exist (nonzero) eigenvectors $u$ and $v$ for $B$ and $A^*$, respectively, which correspond to $\lambda$ and $\bar{\lambda}$, respectively. Define $Cu := v$ and assume there exists a unique solution $X$ to the appropriate Sylvester equation. Then
\[
\begin{aligned}
0 &= \lambda\langle Xu, v\rangle - \lambda\langle Xu, v\rangle = \langle Xu, \bar{\lambda}v\rangle - \lambda\langle Xu, v\rangle = \langle Xu, A^*v\rangle - \langle \lambda Xu, v\rangle \\
&= \langle AXu, v\rangle - \langle XBu, v\rangle = \langle (AX - XB)u, v\rangle = \langle Cu, v\rangle = \langle v, v\rangle = \|v\|^2 > 0,
\end{aligned}
\]
which is impossible. □
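The operator $L_1 - L_2$ from the proof can be written as an $nm \times nm$ matrix through column-stacking vectorization, since $\mathrm{vec}(AX - XB) = (I_m \otimes A - B^T \otimes I_n)\,\mathrm{vec}(X)$. The sketch below uses this standard Kronecker identity; the function name `solve_sylvester_kron` and the sample matrices are assumptions made for illustration, and the approach is only practical for small sizes.

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve AX - XB = C by inverting the matrix of L1 - L2 (regular case only)."""
    n, m = C.shape
    # Matrix of X -> AX - XB acting on column-stacked vec(X).
    K = np.kron(np.eye(m), A) - np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((n, m), order="F")

# Assumed sample data with disjoint spectra {1, 3} and {5, 6, 7}.
A = np.array([[1.0, 2.0], [0.0, 3.0]])
B = np.array([[5.0, 1.0, 0.0], [0.0, 6.0, 0.0], [0.0, 0.0, 7.0]])
C = np.arange(6, dtype=float).reshape(2, 3)

X = solve_sylvester_kron(A, B, C)
residual = np.linalg.norm(A @ X - X @ B - C)

# Singular case B = A: the matrix of X -> AX - XA loses rank.
K_sing = np.kron(np.eye(2), A) - np.kron(A.T, np.eye(2))
rank_sing = np.linalg.matrix_rank(K_sing)
```

In the regular case the residual is at rounding level; in the commutator case `K_sing` is rank deficient, matching the loss of unique solvability when the spectra intersect.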
The only if part of the previous theorem suggests that in some instances the Sylvester equation is not even solvable. Ergo, when $\sigma(A) \cap \sigma(B) = \emptyset$, the Sylvester equation is said to be regular, since it is always solvable and with a unique solution (provided as $X = (L_1 - L_2)^{-1}(C)$ in the previous proof). On the contrary, when $\sigma(A) \cap \sigma(B) \neq \emptyset$, the equation (3.1) is said to be singular. There exist singular Sylvester equations which are unsolvable (one example being the only if part of the proof of the Sylvester theorem), singular Sylvester equations with a unique solution (see [23]), and singular Sylvester equations with infinitely many solutions.

The fundamental difference between regular and singular Sylvester equations lies in the homogeneous Sylvester equation, $AX - XB = 0$. If the equation is regular, then the only solution to $AX = XB$ is the zero matrix, $X = 0$. On the other hand, if the homogeneous Sylvester equation is assumed to be singular, then it is solvable (this will be shown shortly) and has infinitely many nontrivial solutions. Moreover, observe the inhomogeneous singular (but solvable) Sylvester equation $AX - XB = C$, $C \neq 0$, and let $X_1$ be one of its solutions. Then a matrix $X_2$ is also a solution to the equation (3.1) if and only if $X_1 - X_2$ is a solution to the homogeneous Sylvester equation $AX = XB$. In other words, if the equation (3.1) is solvable, then the set of its solutions, denoted as $S_{Syl}(A; B; C)$, can be described as
\[
S_{Syl}(A; B; C) = \{X_p + X_h : AX_h = X_hB\},
\]
i.e., $S_{Syl}(A; B; C)$ can be expressed as the set sum of one particular solution $X_p$ and the set of all solutions to the homogeneous Sylvester equation $AX = XB$. Note that, when the equation is regular, the latter simply reduces to $X_h = 0$ and $X_p$ is indeed the only solution.

The main advantage of solvable singular Sylvester equations over the regular ones is the possibility to study the commutator problems $AX = XA$ and $AX - XA = Y$, $Y \neq 0$. Notice that the assumption $\sigma(A) \cap \sigma(B) = \emptyset$ implies that $A \neq B$; thus the commutator problems can never be modeled via regular Sylvester equations. Consequently, the commutator problems mentioned above can be (and will be) modeled as special cases of singular Sylvester equations where $A = B$. However, note that even in this special case, not every singular Sylvester equation is solvable: simply taking the traces of both sides shows that there is no matrix $X$ such that $AX - XA = I$; see [5, 6, 14, 23], and [24].

Since solvability of the equation is not a given, in what follows we proceed to obtain sufficient conditions for the existence of infinitely many solutions to (3.1), as well as their general form. We start with the results obtained by Djordjević in [23] (which is an improvement of the results obtained by Djordjević and Dinčić in [24]). As mentioned above, we assume that $A$ and $B$ share $s$ different eigenvalues:
\[
\sigma := \{\lambda_1, \dots, \lambda_s\} = \sigma(A) \cap \sigma(B).
\]
Observe the corresponding eigenspaces for $B$ and $A$, $\ker(B - \lambda_iI)$ and $\ker(A - \lambda_iI)$, whenever $\lambda_i \in \sigma$. Different eigenvalues generate linearly independent eigenvectors, so there exists a direct sum
\[
E_B := \ker(B - \lambda_1I) \oplus \ker(B - \lambda_2I) \oplus \dots \oplus \ker(B - \lambda_sI),
\]
which is a closed subspace of $\mathbb{C}^m$; thus $\mathbb{C}^m = E_B \oplus E_B^{\perp}$. With respect to that decomposition, denote $B_E := BP_{E_B}$, $B_1 := BP_{E_B^{\perp}}$, and $C_1 := CP_{E_B^{\perp}}$. In that sense, the upper triangular splitting of the matrix $B$ holds:
\[
B = \begin{bmatrix} B_E & B_0 \\ 0 & B_{11} \end{bmatrix} : \begin{bmatrix} E_B \\ E_B^{\perp} \end{bmatrix} \to \begin{bmatrix} E_B \\ E_B^{\perp} \end{bmatrix}, \tag{3.2}
\]
where
\[
B_1 = \begin{bmatrix} B_0 \\ B_{11} \end{bmatrix}.
\]
Notice that $E_B$ is a $B$-invariant subspace of $\mathbb{C}^m$, that is, $B(E_B) = E_B$, and consequently $E_B \subset \mathrm{ran}(B)$, while $E_B^{\perp}$ is a $B_{11}$-invariant subspace of $\mathbb{C}^m$. Additionally, $B_{11}$ is a square matrix which defines a bounded linear operator on $E_B^{\perp}$.

Lemma 3.2 (Djordjević [23, Lemma 2.1]) With respect to the previous notation, if
\[
B_0 : \ker(B_{11} - \lambda I_{E_B^{\perp}}) \to \mathrm{ran}(B_E - \lambda I_{E_B}), \quad \text{for every } \lambda \in \sigma(B_{11}), \tag{3.3}
\]
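then $\sigma(B_{11}) \subset \sigma(B)$.

Proof Let $\lambda \in \sigma(B_{11})$ be arbitrary. Then for every $v \in \ker(B_{11} - \lambda I_{E_B^{\perp}})$ there exists a vector $u \in E_B$ such that $B_E(-u) = -\lambda I_{E_B}u - B_0v$. Then
\[
B\begin{bmatrix} -u \\ v \end{bmatrix} = \begin{bmatrix} B_E & B_0 \\ 0 & B_{11} \end{bmatrix}\begin{bmatrix} -u \\ v \end{bmatrix} = \begin{bmatrix} B_E(-u) + B_0v \\ \lambda v \end{bmatrix} = \lambda\begin{bmatrix} -u \\ v \end{bmatrix},
\]
so $\lambda \in \sigma(B)$, with $[-u \;\; v]^T$ being the corresponding eigenvector for $B$. □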
Theorem 3.3 (Djordjević [23, Theorem 2.1]) With respect to the previous notation, let $B$ be such that (3.3) holds. Additionally, if the condition
\[
C : \ker(B - \lambda_iI) \to \mathrm{ran}(A - \lambda_iI) \tag{3.4}
\]
holds for every $i = 1, \dots, s$, then there exist infinitely many solutions $X$ to the matrix equation (3.1).

Proof ([23]) With respect to the previous notation, observe the space $E_B$. For every $i \in \{1, \dots, s\}$ let $N_i$ be an arbitrary linear map from $\ker(B - \lambda_iI)$ to $\ker(A - \lambda_iI)$. By (3.4), for every $u \in \ker(B - \lambda_iI)$ there exists a unique $d_u \in \ker(A - \lambda_iI)^{\perp}$ such that
\[
(A - \lambda_iI)d_u = Cu.
\]
Define
\[
X_{(i, N_i)} : u \mapsto N_iu + d_u, \quad u \in \ker(B - \lambda_iI),
\]
which is a solution to $AY - YB_E = CP_{E_B}$ on the space $\ker(B - \lambda_iI)$. Varying $i$ from 1 to $s$ produces linear operators $X_{(1, N_1)}, \dots, X_{(s, N_s)}$, each defined on a different eigenspace for $B$. Adding them together gives
\[
X_E(N_1, \dots, N_s) := \sum_{i=1}^{s} X_{(i, N_i)},
\]
which is a solution to (3.1) on $E_B$.

Now observe the complemented space $E_B^{\perp}$. By construction it follows that $\sigma = \sigma(B_E)$ and for every $\mu \in \sigma$ the eigenspace $\ker(B_E - \mu I_{E_B})$ is just a formal projection of $\ker(B - \mu I)$, i.e., $\ker(B_E - \mu I_{E_B}) \oplus 0_{E_B^{\perp}} = \ker(B - \mu I)$. By virtue of condition (3.3), Lemma 3.2 states that $\sigma(B_{11}) \subset \sigma(B)$; thus for every $\mu \in \sigma(B_{11})$ and for every corresponding eigenvector $u \in \ker(B_{11} - \mu I_{E_B^{\perp}})$, there exists a vector $v \in E_B$ such that $(B_E - \mu I_{E_B})v = B_0u$, and in that case $[-v \;\; u]^T$ is an eigenvector for $B$ which corresponds to $\mu$.

Assume that $\sigma(B_{11}) \cap \sigma(B_E) \neq \emptyset$ and denote by $\mu_0$ their shared eigenvalue. As previously explained, for every eigenvector $u \in \ker(B_{11} - \mu_0I_{E_B^{\perp}})$, there exists a $v \in E_B$ such that $[-v \;\; u]^T$ is an element of $\ker(B - \mu_0I)$. However, since $\mu_0 \in \sigma(B_E)$, it follows that $\ker(B - \mu_0I) = \ker(B_E - \mu_0I_{E_B}) \oplus 0_{E_B^{\perp}}$; thus $u = 0_{E_B^{\perp}}$, which is impossible. Ergo, $\sigma(B_E) \cap \sigma(B_{11}) = \emptyset$ and finally $\sigma(B_{11}) \cap \sigma(A) = \emptyset$.

Observe the reduced Sylvester equation on $E_B^{\perp}$:
\[
AXP_{E_B^{\perp}} - XB_1 = C_1 \iff AX_1 - X_1B_{11} = C_1 + X_E(N_1, \dots, N_s)B_0, \tag{3.5}
\]
where $B_{11} \in \mathcal{B}(E_B^{\perp})$, $A \in \mathcal{B}(\mathbb{C}^n)$, and $C_1 + X_E(N_1, \dots, N_s)B_0 \in \mathcal{B}(E_B^{\perp}, \mathbb{C}^n)$ are known matrices such that $\sigma(B_{11}) \cap \sigma(A) = \emptyset$, while $X_1 \in \mathcal{B}(E_B^{\perp}, \mathbb{C}^n)$ is the sought solution. By the Sylvester theorem, there exists a unique $X_1(N_1, \dots, N_s)$ in $\mathcal{B}(E_B^{\perp}, \mathbb{C}^n)$ such that (3.5) holds. Finally, it follows that
\[
X = \begin{bmatrix} X_E(N_1, \dots, N_s) & X_1(N_1, \dots, N_s) \end{bmatrix} : \begin{bmatrix} E_B \\ E_B^{\perp} \end{bmatrix} \to \mathbb{C}^n \tag{3.6}
\]
is an infinite family of solutions to the equation (3.1). □
The previous theorem illustrates that when the matrix $B$ obeys the form (3.2) and (3.3), then all the solutions to (3.1) are obtained via the formula (3.6), where each $N_i$ is an arbitrary linear operator from $\ker(B - \lambda_iI)$ to $\ker(A - \lambda_iI)$, for every $i = 1, \dots, s$.

If the matrix $B$ does not have the decomposition (3.2) and (3.3), then we exploit the Jordan normal form of the matrices $A$ and $B$. This approach was conducted by Dinčić in [14]. Below we formulate the results from that paper, which will be useful for this chapter.

Without loss of generality, we consider the equation $AX - XB = C$ when $\sigma(A) = \sigma(B) = \{\lambda_1, \dots, \lambda_s\} \neq \emptyset$. Suppose that the matrices $A$ and $B$ are already in their Jordan forms:
\[
\begin{aligned}
A &= \mathrm{diag}\big(J(\lambda_1; p_{11}, p_{12}, \dots, p_{1,k_1}), \dots, J(\lambda_s; p_{s1}, p_{s2}, \dots, p_{s,k_s})\big), \\
B &= \mathrm{diag}\big(J(\lambda_1; q_{11}, q_{12}, \dots, q_{1,\ell_1}), \dots, J(\lambda_s; q_{s1}, q_{s2}, \dots, q_{s,\ell_s})\big),
\end{aligned}
\]
where $p_{i1} \ge \dots \ge p_{i,k_i} > 0$, $q_{j1} \ge \dots \ge q_{j,\ell_j} > 0$, $i, j = 1, \dots, s$, and that $C = [C_{ij}]_{s \times s}$ and $X = [X_{ij}]_{s \times s}$ are partitioned in accordance with $A$ and $B$. It is not hard to see that the equations on which solvability depends are precisely of the form
\[
J(\lambda_i; p_{i1}, p_{i2}, \dots, p_{i,k_i})X_{ii} - X_{ii}J(\lambda_i; q_{i1}, q_{i2}, \dots, q_{i,\ell_i}) = C_{ii}, \quad i = 1, \dots, s,
\]
where we remark the notation
\[
C_{ij} = \big[C^{(ij)}_{uv}\big]_{u = 1, \dots, k_i,\; v = 1, \dots, \ell_j} \in M_{p_i \times q_j}(\mathbb{C}).
\]
Translating the equation by $-\lambda_i$, we get
\[
J(0; p_{i1}, p_{i2}, \dots, p_{i,k_i})X_{ii} - X_{ii}J(0; q_{i1}, q_{i2}, \dots, q_{i,\ell_i}) = C_{ii}, \quad i = 1, \dots, s.
\]

Lemma 3.4 [14] Let $A = \mathrm{diag}(J_{m_1}(0), \dots, J_{m_p}(0))$, $m_1 \ge \dots \ge m_p > 0$, and define
\[
A^{\langle k \rangle} := \mathrm{diag}\big(J_{m_1}(0)^{m_1 - 1 - k}, \dots, J_{m_p}(0)^{m_p - 1 - k}\big), \quad k = -1, \dots, m_p - 1.
\]
We have:
(i) $A^{\langle -1 \rangle} = 0$,
(ii) $A^{\langle k+1 \rangle}A = AA^{\langle k+1 \rangle} = A^{\langle k \rangle}$, $k = -1, \dots, m_p - 2$,
(iii) $A^{\langle k \rangle} = I \iff (\forall i = 1, \dots, p)\; m_i = k + 1$,
(iv) $(I \pm A^{\langle k \rangle})^{-1} = I \mp A^{\langle k \rangle}$, $k = -1, \dots, \lfloor m_p/2 \rfloor - 1$.
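Items (i), (ii) and (iv) of Lemma 3.4 are easy to test on a small instance. The sketch below assumes the sample block sizes $m_1 = 3$, $m_2 = 2$; the helpers `jordan_nilpotent`, `block_diag` and `A_bracket` are hypothetical names introduced only for this check.

```python
import numpy as np

def jordan_nilpotent(m):
    """The m x m nilpotent Jordan block J_m(0)."""
    return np.diag(np.ones(m - 1), k=1)

def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        out[pos:pos + k, pos:pos + k] = b
        pos += k
    return out

sizes = [3, 2]  # assumed sample sizes m_1 >= m_2 > 0
A = block_diag([jordan_nilpotent(m) for m in sizes])
I = np.eye(A.shape[0])

def A_bracket(k):
    """A^<k> = diag(J_{m_i}(0)^(m_i - 1 - k)) for k = -1, ..., m_p - 1."""
    return block_diag([np.linalg.matrix_power(jordan_nilpotent(m), m - 1 - k)
                       for m in sizes])

m_p = min(sizes)
item_i = np.allclose(A_bracket(-1), 0)
item_ii = all(np.allclose(A_bracket(k + 1) @ A, A_bracket(k))
              and np.allclose(A @ A_bracket(k + 1), A_bracket(k))
              for k in range(-1, m_p - 1))
item_iv = all(np.allclose((I + A_bracket(k)) @ (I - A_bracket(k)), I)
              for k in range(-1, m_p // 2))
```

Item (iii) is visible in the same setting: `A_bracket(1)` contains the identity only in the block with $m_i = 2$, so it is not the full identity matrix here.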
Notice that $J_1(0)^0 = I_1$, because $0^0 = 1$. Recall that the Hadamard (or entrywise) product of two matrices $A$ and $B$ from $M_{m \times n}(\mathbb{C})$, denoted by $A \circ B$, is an $m \times n$ matrix with entries
\[
(A \circ B)_{ij} = a_{ij}b_{ij}.
\]
In addition to being associative and distributive, it is also commutative, unlike the usual matrix multiplication.

Theorem 3.5 (Dinčić [14, Theorem 3.5]) Let
\[
A = \mathrm{diag}(J_{m_1}(0), \dots, J_{m_p}(0)), \quad m_1 \ge \dots \ge m_p > 0, \qquad B = \mathrm{diag}(J_{n_1}(0), \dots, J_{n_q}(0)), \quad n_1 \ge \dots \ge n_q > 0.
\]
Suppose $M$ and $N$ to be
\[
M = \{(i, j) \in \mathbb{N}_p \times \mathbb{N}_q : m_i \ge n_j\}, \quad N = \{(i, j) \in \mathbb{N}_p \times \mathbb{N}_q : m_i < n_j\},
\]
$C_M = C \circ M$, $C_N = C \circ N$, and $d = \min\{m_1, n_1\}$. The Sylvester equation (3.1) is consistent if and only if
\[
\sum_{k=0}^{d-1} A^{\langle k \rangle}C_MB^k = 0, \qquad \sum_{k=0}^{d-1} A^kC_NB^{\langle k \rangle} = 0,
\]
or, in more condensed form,
\[
\sum_{k=0}^{d-1} \begin{bmatrix} A^{\langle k \rangle} & 0 \\ 0 & A^k \end{bmatrix}\begin{bmatrix} C_M & 0 \\ 0 & C_N \end{bmatrix}\begin{bmatrix} B^k & 0 \\ 0 & B^{\langle k \rangle} \end{bmatrix} = 0.
\]
The particular solution is given as $X_p = X_M + X_N$, where
\[
X_M = \sum_{k=0}^{n_1 - 1} (A^T)^{k+1}C_MB^k, \qquad X_N = -\sum_{k=0}^{m_1 - 1} A^kC_N(B^T)^{k+1},
\]
and the solutions of homogeneous equation AX - XB = 0 are given by
When considering homogeneous Sylvester equation AX = XA, Theorem 3.5 gives all its solutions by means of Toeplitz matrices. Corollary 3.6 For A = diagðJ m1 ð0Þ, . . . , J mp ð0ÞÞ, m1 ≥ . . . ≥ mp > 0, all solutions of homogeneous Sylvester equation AX = XA are given by
3.1 Applications to YBME
The results derived in this section will be applied on several different occasions: in proving the existence of non-commuting solutions to the YBME when $A$ is regular, in the core-nilpotent decomposition of $A$ when it is singular, and in characterizing the commuting solutions to the YBME when $A$ is singular. At this point, however, we demonstrate how the results concerning singular Sylvester equations can be used to generate new nontrivial solutions to the YBME from the trivial solution $X = A$.

Recall Theorem 2.19, which says: If $X_1, X_2 \in S$, then $X_1 + X_2 \in S$ iff $X_1AX_2 + X_2AX_1 = 0$. Since $X = A$ is always a solution, the following corollary holds:

Corollary 3.7 ([15]) If $X_0 \in S$, then $A + X_0 \in S$ iff $A^2X_0 + X_0A^2 = 0$.

By the well-known spectral mapping theorem, if $\sigma(A) = \{\lambda_1, \dots, \lambda_s\}$, then $\sigma(A^2) = \{\lambda_1^2, \dots, \lambda_s^2\}$ and $\sigma(-A^2) = \{-\lambda_1^2, \dots, -\lambda_s^2\}$. So, $\sigma(A^2) \cap \sigma(-A^2) \neq \emptyset$ if $0 \in \sigma(A)$ or there are some $j, k \in \{1, \dots, s\}$ such that $\lambda_j = i\lambda_k$. From Corollary 3.7 we have the following method for generating infinitely many solutions:
• solve the homogeneous Sylvester equation $A^2X + XA^2 = 0$ and characterize all its solutions $X$;
• check whether any $X$ in addition solves the YBME;
• if so, then $A + X$ is a solution of the YBME as well.

Example 10 For
\[
A = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}
\]
all solutions to $A^2X + XA^2 = 0$ are
\[
X(t) = \begin{bmatrix} 0 & 0 \\ 0 & t \end{bmatrix}, \quad t \in \mathbb{C}.
\]
Every $X(t)$ is also a solution to the YBME (1.1) (recall Example 4). Hence
\[
A + X(t) = \begin{bmatrix} 2 & 0 \\ 0 & t \end{bmatrix}
\]
is a solution to (1.1) as well, for every $t \in \mathbb{C}$. ♣
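The three-step method can be replayed on Example 10. This is a minimal sketch: the parameter values in `ts` are arbitrary sample choices and the helper `is_ybme_solution` is a hypothetical name.

```python
import numpy as np

A = np.diag([2.0, 0.0])

def is_ybme_solution(A, X, tol=1e-12):
    """True when AXA = XAX up to the given tolerance."""
    return np.linalg.norm(A @ X @ A - X @ A @ X) <= tol

ts = [-3.0, 0.5, 2.0 + 1.0j]  # assumed sample parameters
checks = []
for t in ts:
    X = np.diag([0.0, t])
    # Step 1: X solves the homogeneous Sylvester equation A^2 X + X A^2 = 0.
    sylvester_ok = np.allclose(A @ A @ X + X @ A @ A, 0)
    # Steps 2 and 3: X solves the YBME, hence so does A + X.
    checks.append((sylvester_ok,
                   is_ybme_solution(A, X),
                   is_ybme_solution(A, A + X)))
```

Every triple in `checks` consists of three true values, in line with Corollary 3.7.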
4 Solving the Equation When A Is Regular

We start with the case when A is a regular matrix. We obtain all commuting solutions to the YBME and provide a way for generating infinitely many non-commuting solutions.

4.1 All Commuting Solutions
Recall that a matrix $B \in M_n(\mathbb{C})$ is a square root of the matrix $A \in M_n(\mathbb{C})$ if $B^2 = A$. Immediately, if $B$ is a square root of $A$, then $-B$ is a square root of $A$ as well. There are matrices that do not have a square root, e.g., $J_2(0)$; for the existence of the square root of a matrix, see [31, Theorem 1.22] or [33, Theorem 6.4.12].

In what follows we will provide a method for finding all square roots of a given invertible matrix $A$. These results are original and brand-new. They are obtained by the first author and have not been published anywhere.

Let $A$ be an invertible matrix, with $s$ distinct eigenvalues $\lambda_1, \dots, \lambda_s$ and with $p$ Jordan blocks, $p \ge s$:
\[
A = T\,\mathrm{diag}(J_1(\lambda_1), \dots, J_p(\lambda_p))\,T^{-1},
\]
and denote by $n_k$ the dimension of each block: $n_k := \dim J_k(\lambda_k)$, $k = 1, \dots, p$, $\sum_k n_k = n$. Let $f(z) := z^{1/2}$ be a fixed branch of the complex square root which is regular at all points $\lambda_1, \dots, \lambda_s$, i.e., having its branch cut away from the set $\{\lambda_1, \dots, \lambda_s\}$. Then the matrix square root $f$ is well defined on every $J_k(\lambda_k)$ and is given by (see, e.g., [31] or [33])
\[
f(J_k(\lambda_k)) := \begin{bmatrix}
f(\lambda_k) & f'(\lambda_k) & \dots & \dfrac{f^{(n_k - 1)}(\lambda_k)}{(n_k - 1)!} \\
0 & f(\lambda_k) & \dots & \dfrac{f^{(n_k - 2)}(\lambda_k)}{(n_k - 2)!} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & f(\lambda_k)
\end{bmatrix}, \quad k = 1, \dots, p.
\]
This provides a way for introducing a square root of the matrix $A$. Remark that for $f(x) = \sqrt{x}$ we have
\[
f^{(j)}(x) = \frac{(-1)^{j-1}(2j - 3)!!}{2^j\sqrt{x^{2j-1}}}, \quad j = 1, 2, \dots
\]
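The two formulas above combine into a direct construction of a square root of a single Jordan block. The following sketch uses the principal branch only and the sample block $J_3(4)$; the function names are hypothetical, and $\sqrt{x^{2j-1}}$ is evaluated as $(\sqrt{x})^{2j-1}$ so that the same branch is used throughout.

```python
import numpy as np
from math import factorial

def double_factorial(m):
    """m!! for odd m, with the convention (-1)!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def sqrt_derivative(j, x):
    """j-th derivative (j >= 1) of the principal square root at x."""
    return ((-1) ** (j - 1) * double_factorial(2 * j - 3)
            / (2 ** j * np.sqrt(x) ** (2 * j - 1)))

def sqrt_jordan_block(lam, n):
    """Upper triangular Toeplitz matrix f(J_n(lam)) for f = principal sqrt."""
    lam = complex(lam)
    F = np.zeros((n, n), dtype=complex)
    for j in range(n):
        coeff = np.sqrt(lam) if j == 0 else sqrt_derivative(j, lam) / factorial(j)
        F += coeff * np.diag(np.ones(n - j), k=j)
    return F

lam, n = 4.0, 3  # assumed sample block J_3(4)
J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)
R = sqrt_jordan_block(lam, n)
```

Here `R @ R` reproduces `J`, and `-R` is the square root obtained from the other branch.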
The matrix $B$ is said to be a primary square root of $A$ if it can be written as
\[
B = T\,\mathrm{diag}\big((-1)^{j_1}f(J_1(\lambda_1)), \dots, (-1)^{j_p}f(J_p(\lambda_p))\big)\,T^{-1},
\]
where $j_1, \dots, j_p \in \{1, 2\}$ are independent of one another but subject to the condition that $\lambda_{i_1} = \lambda_{i_2} \Rightarrow j_{i_1} = j_{i_2}$, that is, the Jordan blocks which correspond to the same eigenvalue have the same sign. Otherwise, $B$ is a nonprimary square root of $A$ (in other words, it cannot stem from the scalar complex square root; see [31]). Either way, if $B$ is a square root of $A$, then $AB = BA$. For the existence of the matrix square root, consult [31, Theorem 1.22] or [33, Theorem 6.4.12]. For a more general approach to matrix functions, see Section 6.1.

Theorem 4.1 (Higham [31, Theorem 1.26]) Let the nonsingular matrix $A \in M_n(\mathbb{C})$ have the Jordan form
\[
T^{-1}AT = J = \mathrm{diag}(J_1(\lambda_1), \dots, J_p(\lambda_p)),
\]
with $p$ Jordan blocks, and let $s \le p$ be the number of distinct eigenvalues of $A$. Then $A$ has precisely $2^s$ square roots that are primary functions of $A$, given by
\[
B_j = T\,\mathrm{diag}\big((-1)^{j_1}f(J_1(\lambda_1)), \dots, (-1)^{j_p}f(J_p(\lambda_p))\big)\,T^{-1}, \quad j = 1, \dots, 2^s,
\]
corresponding to all possible choices of $j_1, \dots, j_p$, $j_k = 1$ or $2$, subject to the constraint that $j_i = j_k$ whenever $\lambda_i = \lambda_k$.

If $s < p$, $A$ has nonprimary square roots. They form parametrized families $B_j(U)$, $j = 2^s + 1, \dots, 2^p$, given by
\[
B_j(U) = TU\,\mathrm{diag}\big((-1)^{j_1}f(J_1(\lambda_1)), \dots, (-1)^{j_p}f(J_p(\lambda_p))\big)\,U^{-1}T^{-1},
\]
where $j_k = 1$ or $2$, $U$ is an arbitrary nonsingular matrix that commutes with $J$, and for each $j$ there exist $i$ and $k$, depending on $j$, such that $\lambda_i = \lambda_k$ while $j_i \neq j_k$.

The previous theorem holds only for a regular $A$. This is because $0 \in \mathbb{C}$ always belongs to every branch cut of the complex square root (see Higham [31, Theorem 1.36]). By the said Theorem 1.36, since $f'(\lambda_k) \neq 0$ for all $k$, the Jordan block structure is preserved by $f$.

Recall that when $A$ is invertible, solving $AXA = XAX$ such that $XA = AX$ is equivalent to solving (1.3), i.e., $X^2 = AX\,(= XA)$.

Theorem 4.2 (Dinčić) If $A \in M_n(\mathbb{C})$ is invertible, then all commuting solutions of the YBME are given by the closed-form formula
\[
X = \frac{1}{2}\big(A + (A^2)^{1/2}\big). \tag{4.1}
\]
Proof In the proof, we use the well-known fact that both primary and nonprimary functions of $A$ commute with $A$. Let $X$ be a commuting solution to the YBME. Then
\[
\Big(X - \frac{1}{2}A\Big)^2 = X^2 - \frac{1}{2}XA - \frac{1}{2}AX + \frac{1}{4}A^2 = X^2 - AX + \frac{1}{4}A^2,
\]
hence the equation for commuting solutions $X^2 = AX$ becomes the quadratic equation
\[
\Big(X - \frac{1}{2}A\Big)^2 = \frac{1}{4}A^2.
\]
All its solutions are given by
\[
X = \frac{1}{2}\big(A + (A^2)^{1/2}\big),
\]
where we consider both primary and nonprimary square roots. Since
\[
\begin{aligned}
XAX - AXA &= \frac{1}{4}\big(A + (A^2)^{1/2}\big)A\big(A + (A^2)^{1/2}\big) - \frac{1}{2}A\big(A + (A^2)^{1/2}\big)A \\
&= \frac{1}{4}\Big(A^3 + A^2(A^2)^{1/2} + (A^2)^{1/2}A^2 + (A^2)^{1/2}A(A^2)^{1/2}\Big) - \frac{1}{2}\Big(A^3 + A(A^2)^{1/2}A\Big) \\
&= \frac{1}{4}\Big(A^3 + 2A^2(A^2)^{1/2} + A(A^2)^{1/2}(A^2)^{1/2}\Big) - \frac{1}{2}\Big(A^3 + A^2(A^2)^{1/2}\Big) \\
&= \frac{1}{4}\Big(A^3 + 2A^2(A^2)^{1/2} + A^3\Big) - \frac{1}{2}\Big(A^3 + A^2(A^2)^{1/2}\Big) = 0,
\end{aligned}
\]
and
\[
AX - XA = \frac{1}{2}A\big(A + (A^2)^{1/2}\big) - \frac{1}{2}\big(A + (A^2)^{1/2}\big)A = 0,
\]
all $X$ given by (4.1) are indeed commuting solutions. □
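Formula (4.1) can be tried out in the simplest setting. The sketch below assumes a diagonalizable sample matrix with eigenvalues $2$ and $-3$, for which $A^2$ has distinct eigenvalues and hence only primary square roots; each square root of $A^2$ is built by choosing a sign per eigenvalue in the eigenbasis of $A$.

```python
import numpy as np
from itertools import product

# Assumed sample matrix with distinct eigenvalues 2 and -3.
A = np.array([[2.0, 1.0], [0.0, -3.0]])
eigvals, T = np.linalg.eig(A)
T_inv = np.linalg.inv(T)

solutions = []
for signs in product((1, -1), repeat=len(eigvals)):
    # One square root of A^2: replace each lambda_k by sign_k * lambda_k.
    root = T @ np.diag(np.array(signs) * eigvals) @ T_inv
    # Formula (4.1).
    X = 0.5 * (A + root)
    solutions.append(X)
```

The four matrices in `solutions` are $A$, $0$ and the two spectral solutions $AP_{\lambda}$; each of them commutes with $A$ and satisfies the YBME.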
Let us consider $(A^2)^{1/2}$ for an invertible matrix $A$. It is well-known that for any $p \in \mathbb{N}$ we have $(A^{1/p})^p = A$, but it was shown in [2] that $(A^{\alpha})^{1/\alpha} = A$ only for $\alpha \in [-1, 1]$; hence $(A^2)^{1/2} \neq A$ in general and further analysis is required.

Theorem 4.3 (Dinčić) Let $A \in M_n(\mathbb{C})$ be invertible with Jordan normal form
\[
T^{-1}AT = J = \mathrm{diag}(J(\lambda_1), \dots, J(\lambda_s)),
\]
where $\lambda_1, \dots, \lambda_s$ are different eigenvalues, we abbreviated
\[
J(\lambda_i; p_{i1}, \dots, p_{ik_i}) = J(\lambda_i), \quad i = 1, \dots, s,
\]
and $p_{i1} \ge \dots \ge p_{ik_i} > 0$, $i = 1, \dots, s$.
(i) If a primary square root is used (i.e., the same sign was taken for all Jordan blocks containing the same eigenvalue), then
\[
(A^2)^{1/2} = T\,\mathrm{diag}(\pm J(\lambda_1), \dots, \pm J(\lambda_s))\,T^{-1},
\]
and all commuting solutions to the YBME (1.1) are spectral solutions,
\[
X_j = \frac{1}{2}\big(A + (A^2)^{1/2}\big) = AP_j, \quad j = 1, \dots, 2^s,
\]
where $P_j$ is a spectral projector associated with the Jordan block $J_j(\lambda_i)$ which corresponds to the eigenvalue $\lambda_i$, $j = 1, \dots, 2^s$.
(ii) If a nonprimary square root is used (i.e., different signs were taken for the Jordan blocks containing the same eigenvalue), then
\[
(A^2)^{1/2}(U) = TSUS^{-1}\,\mathrm{diag}\big(\pm J_{p_{11}}(\lambda_1), \dots, \pm J_{p_{sk_s}}(\lambda_s)\big)\,SU^{-1}S^{-1}T^{-1},
\]
where $U$ is any invertible matrix that commutes with
\[
\mathrm{diag}\big(J_{p_{11}}(\lambda_1^2), \dots, J_{p_{1k_1}}(\lambda_1^2), \dots, J_{p_{s1}}(\lambda_s^2), \dots, J_{p_{sk_s}}(\lambda_s^2)\big).
\]
Consequently, all solutions given this way are connected.

Proof Let us denote by $S_{ij}$ an invertible matrix such that
\[
J_{p_{ij}}^2(\lambda_i) = S_{ij}J_{p_{ij}}(\lambda_i^2)S_{ij}^{-1}, \quad j = 1, \dots, k_i, \; i = 1, \dots, s,
\]
and $S := \mathrm{diag}(S_{11}, \dots, S_{sk_s})$. Then we have
\[
\begin{aligned}
A^2 &= T\,\mathrm{diag}\big(J^2(\lambda_1), \dots, J^2(\lambda_s)\big)\,T^{-1} \\
&= TS\,\mathrm{diag}\big(J_{p_{11}}(\lambda_1^2), \dots, J_{p_{1k_1}}(\lambda_1^2), \dots, J_{p_{s1}}(\lambda_s^2), \dots, J_{p_{sk_s}}(\lambda_s^2)\big)\,S^{-1}T^{-1}.
\end{aligned}
\]
(i) Theorem 4.1 for the primary root case gives
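\[
\begin{aligned}
(A^2)^{1/2} &= TS\,\mathrm{diag}\big(J_{p_{11}}^{1/2}(\lambda_1^2), \dots, J_{p_{1k_1}}^{1/2}(\lambda_1^2), \dots, J_{p_{sk_s}}^{1/2}(\lambda_s^2)\big)\,S^{-1}T^{-1} \\
&= TSS^{-1}\,\mathrm{diag}\big(\pm J_{p_{11}}(\lambda_1), \dots, \pm J_{p_{1k_1}}(\lambda_1), \dots, \pm J_{p_{sk_s}}(\lambda_s)\big)\,SS^{-1}T^{-1} \\
&= T\,\mathrm{diag}(\pm J(\lambda_1), \dots, \pm J(\lambda_s))\,T^{-1},
\end{aligned}
\]
where the same sign must be taken at all blocks containing the same eigenvalue. Now
\[
\begin{aligned}
X &= \frac{1}{2}\big(A + (A^2)^{1/2}\big) \\
&= \frac{1}{2}T\big(\mathrm{diag}(J(\lambda_1), \dots, J(\lambda_s)) + \mathrm{diag}(\pm J(\lambda_1), \dots, \pm J(\lambda_s))\big)T^{-1} \\
&= T\,\mathrm{diag}\Big(\frac{J(\lambda_1) \pm J(\lambda_1)}{2}, \dots, \frac{J(\lambda_s) \pm J(\lambda_s)}{2}\Big)T^{-1}.
\end{aligned}
\]
Without loss of generality, let us take the positive square root for the $\lambda_1$ blocks and the negative branch for the other eigenvalues. In this case the solution is
\[
X_{\lambda_1} = T\,\mathrm{diag}\Big(\frac{J(\lambda_1) + J(\lambda_1)}{2}, \frac{J(\lambda_2) - J(\lambda_2)}{2}, \dots, \frac{J(\lambda_s) - J(\lambda_s)}{2}\Big)T^{-1} = T\,\mathrm{diag}(J(\lambda_1), 0, \dots, 0)\,T^{-1}.
\]
Let us find the spectral projector associated with $\lambda_1$:
\[
P_{\lambda_1} = \frac{1}{2\pi i}\oint_{\Gamma_{\lambda_1}} (zI - A)^{-1}\,dz.
\]
Since
\[
zI - A = zTT^{-1} - T\,\mathrm{diag}(J(\lambda_1), \dots, J(\lambda_s))\,T^{-1} = T\,\mathrm{diag}(zI_1 - J(\lambda_1), \dots, zI_s - J(\lambda_s))\,T^{-1},
\]
we have
\[
(zI - A)^{-1} = T\,\mathrm{diag}\big((zI_1 - J(\lambda_1))^{-1}, \dots, (zI_s - J(\lambda_s))^{-1}\big)\,T^{-1},
\]
where
\[
\begin{aligned}
(zI_i - J(\lambda_i))^{-1} &= \big(\mathrm{diag}(zI_{i1} - J_{i1}(\lambda_i), \dots, zI_{ik_i} - J_{ik_i}(\lambda_i))\big)^{-1} \\
&= \mathrm{diag}\big((zI_{i1} - J_{i1}(\lambda_i))^{-1}, \dots, (zI_{ik_i} - J_{ik_i}(\lambda_i))^{-1}\big), \quad i = 1, \dots, s.
\end{aligned}
\]
It is not hard to see that
\[
(zI_q - J_q(\lambda))^{-1} = \begin{bmatrix}
(z - \lambda)^{-1} & (z - \lambda)^{-2} & \dots & (z - \lambda)^{-q} \\
0 & (z - \lambda)^{-1} & \dots & (z - \lambda)^{-(q-1)} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & (z - \lambda)^{-1}
\end{bmatrix}_{q \times q}.
\]
Let $\Gamma_{\lambda_1}$ be a simple contour that encloses $\lambda_1$ and no other eigenvalue of $A$. Then it is clear that, for $\lambda = \lambda_1$,
\[
\frac{1}{2\pi i}\oint_{\Gamma_{\lambda_1}} \frac{dz}{z - \lambda} = 1, \qquad \frac{1}{2\pi i}\oint_{\Gamma_{\lambda_1}} \frac{dz}{(z - \lambda)^k} = 0, \quad k = 2, 3, \dots
\]
Because of that, we conclude
\[
P_{\lambda_1} = T\,\mathrm{diag}(I_{\lambda_1}, 0, \dots, 0)\,T^{-1}.
\]
Therefore, indeed $X_{\lambda_1} = AP_{\lambda_1}$.
(ii) Theorem 4.1 for the nonprimary root case gives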
\[
\begin{aligned}
(A^2)^{1/2}(U) &= TSU\,\mathrm{diag}\big(J_{p_{11}}^{1/2}(\lambda_1^2), \dots, J_{p_{1k_1}}^{1/2}(\lambda_1^2), \dots, J_{p_{sk_s}}^{1/2}(\lambda_s^2)\big)\,U^{-1}S^{-1}T^{-1} \\
&= TSUS^{-1}\,\mathrm{diag}\big(\pm J_{p_{11}}(\lambda_1), \dots, \pm J_{p_{1k_1}}(\lambda_1), \dots, \pm J_{p_{sk_s}}(\lambda_s)\big)\,SU^{-1}S^{-1}T^{-1},
\end{aligned}
\]
where $U$ is any invertible matrix that commutes with
\[
\mathrm{diag}\big(J_{p_{11}}(\lambda_1^2), \dots, J_{p_{1k_1}}(\lambda_1^2), \dots, J_{p_{s1}}(\lambda_s^2), \dots, J_{p_{sk_s}}(\lambda_s^2)\big),
\]
and the same sign must not be taken at all blocks containing the same eigenvalue. Then by (4.1) we have
\[
X(U) = \frac{1}{2}\big(A + (A^2)^{1/2}(U)\big),
\]
i.e., the solution depends on the matrix $U$ (hence on at least one complex parameter), so it cannot be an isolated solution to the YBME (1.1). □

Example 11 Let $A = \mathrm{diag}(\lambda, \lambda) = \lambda I_2$. Then we have (by [31, Theorem 1.24, p. 18])
\[
\big((\lambda I_2)^2\big)^{1/2} = (\lambda^2I_2)^{1/2} = U\begin{bmatrix} \pm\lambda & 0 \\ 0 & \pm\lambda \end{bmatrix}U^{-1},
\]
where $U$ is an arbitrary nonsingular matrix. Remark that the choice of the sign for each lambda is independent of one another. On the other hand, from Example 3 we know that all solutions to the YBME are commuting when $A = \lambda I_2$, $\lambda \neq 0$, and are precisely given by
\[
\begin{bmatrix} t & e^{s} \\ e^{-s}(\lambda t - t^2) & \lambda - t \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ t & \lambda \end{bmatrix}, \quad \begin{bmatrix} \lambda & 0 \\ t & 0 \end{bmatrix}, \quad t, s \in \mathbb{C}.
\]
Below we proceed to inspect whether the solutions $X$ given by (4.1) are contained in the abovementioned families of solutions.
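From [31, Theorem 1.24] the expression $(A^2)^{1/2}$ is written out as
\[
U\left\{\begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}, \begin{bmatrix} \lambda & 0 \\ 0 & -\lambda \end{bmatrix}, \begin{bmatrix} -\lambda & 0 \\ 0 & \lambda \end{bmatrix}, \begin{bmatrix} -\lambda & 0 \\ 0 & -\lambda \end{bmatrix}\right\}U^{-1},
\]
where $U$ is an arbitrary invertible matrix. Further computation reduces these possibilities down to (regarded as set equalities)
\[
(A^2)^{1/2} = \left\{\lambda I_2 = A, \; U\begin{bmatrix} \lambda & 0 \\ 0 & -\lambda \end{bmatrix}U^{-1}, \; U\begin{bmatrix} -\lambda & 0 \\ 0 & \lambda \end{bmatrix}U^{-1}, \; -\lambda I_2 = -A\right\}.
\]
Respectively, the family of solutions $X$ given as $X = (A + (A^2)^{1/2})/2$ becomes
\[
\frac{1}{2}\big(A + (A^2)^{1/2}\big) = \left\{A, \; U\begin{bmatrix} \lambda & 0 \\ 0 & 0 \end{bmatrix}U^{-1}, \; U\begin{bmatrix} 0 & 0 \\ 0 & \lambda \end{bmatrix}U^{-1}, \; 0\right\}.
\]
Let
\[
U = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad \det(U) = ad - bc \neq 0, \quad U^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\]
Then we have
\[
U\begin{bmatrix} \lambda & 0 \\ 0 & 0 \end{bmatrix}U^{-1} = \frac{\lambda}{ad - bc}\begin{bmatrix} ad & -ab \\ cd & -bc \end{bmatrix}.
\]
Let us consider the following cases:
1. Let $ab = 0$ (but $a$ and $b$ cannot both be 0 because $\det(U) \neq 0$).
(i) If $a = 0$, $b \neq 0$ (so $c \neq 0$), we have
\[
-\frac{\lambda}{bc}\begin{bmatrix} 0 & 0 \\ cd & -bc \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ -\lambda d/b & \lambda \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ t & \lambda \end{bmatrix};
\]
(ii) If $b = 0$, $a \neq 0$ (so $d \neq 0$), we have
\[
\frac{\lambda}{ad}\begin{bmatrix} ad & 0 \\ cd & 0 \end{bmatrix} = \begin{bmatrix} \lambda & 0 \\ \lambda c/a & 0 \end{bmatrix} = \begin{bmatrix} \lambda & 0 \\ t & 0 \end{bmatrix};
\]
2. Let $ab \neq 0$. If we take
\[
s := -\frac{\lambda ab}{ad - bc} \neq 0, \quad t := \frac{\lambda ad}{ad - bc},
\]
then it is an easy calculation to see that
\[
\frac{\lambda}{ad - bc}\begin{bmatrix} ad & -ab \\ cd & -bc \end{bmatrix} = \begin{bmatrix} t & s \\ \dfrac{\lambda t - t^2}{s} & \lambda - t \end{bmatrix}, \quad s \neq 0.
\]
The case
\[
U\begin{bmatrix} 0 & 0 \\ 0 & \lambda \end{bmatrix}U^{-1}
\]
reduces to the previous one, which becomes obvious if instead of $U$ we take
\[
V := UP = U\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
\]
Therefore, we showed that all solutions are indeed obtained via the formula $X = (A + (A^2)^{1/2})/2$. Notice that the solutions obtained via the primary square roots of $A^2$ are precisely $A$ and $0$, because
\[
A = \frac{1}{2}(A + A), \qquad 0 = \frac{1}{2}(A + (-A)),
\]
while all remaining solutions are obtained via nonprimary square roots of $A^2$. We conclude that every nontrivial solution belongs to a path-connected component contained in $S(A)$, i.e., none of the nontrivial solutions are isolated in this example. On the other hand, the trivial solution $X = 0$ is isolated due to Theorem 2.15, while the trivial solution $X = A$ is isolated by virtue of Corollary 2.17. ♣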
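Case 2 of Example 11 can be checked on concrete numbers. The sketch below assumes the sample values $\lambda = 2$ and $U$ with entries $a, b, c, d = 1, 2, 3, 4$ (so $ab \neq 0$ and $\det U = -2$); these values are illustrative only.

```python
import numpy as np

lam = 2.0
A = lam * np.eye(2)
U = np.array([[1.0, 2.0], [3.0, 4.0]])  # assumed sample U with ab != 0
a, b, c, d = U.ravel()
det = a * d - b * c

# Solution obtained from a nonprimary square root of A^2 via (4.1).
X = U @ np.diag([lam, 0.0]) @ np.linalg.inv(U)

# The same matrix written with the parameters s and t from case 2.
s = -lam * a * b / det
t = lam * a * d / det
X_param = np.array([[t, s], [(lam * t - t ** 2) / s, lam - t]])
```

For these values both constructions give the matrix with rows $(-4, 2)$ and $(-12, 6)$, which satisfies $X^2 = \lambda X$ and therefore the YBME.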
4.1.1 $(A^2)^{1/2}$ and the Matrix Sign Function

Recall that the signum of a complex number, sometimes denoted also by csgn, is defined by
\[
\operatorname{sgn} z = \begin{cases} 1, & \operatorname{Re} z > 0, \\ -1, & \operatorname{Re} z < 0, \end{cases}
\]
and is not defined when $\operatorname{Re} z = 0$. The sign function brings together the absolute value and the initial variable: since $(z^2)^{1/2} = \pm z$, it follows that
\[
(z^2)^{1/2} = z\operatorname{sgn} z, \quad \operatorname{Re} z \neq 0,
\]
defines the principal branch of the complex square root of $z^2$, with the branch cut at $(-\infty, 0]$. Therefore, the sign function can be regarded as multivalued, having the same branches as the complex square root of $z^2$.

Similarly, if $A$ does not have purely imaginary eigenvalues, there exists a matrix $\operatorname{sgn} A$. More precisely, assume that
\[
A = T\,\mathrm{diag}(J_1, J_2)\,T^{-1},
\]
where $J_1$ consists of those Jordan blocks of the matrix $A$ which correspond to the eigenvalues of $A$ with positive real parts, while $J_2$ consists of those Jordan blocks of the matrix $A$ which correspond to the eigenvalues of $A$ with negative real parts. Then the principal sign of the matrix $A$ is defined as (see Bhatia and Rosenthal [6, p. 11] or Higham [31, p. 107])
\[
\operatorname{sgn} A = T\begin{bmatrix} I_1 & 0 \\ 0 & -I_2 \end{bmatrix}T^{-1}.
\]
In that sense, the principal sign function satisfies $\operatorname{sgn} A = \operatorname{sgn} A^{-1}$ and consequently,
\[
\operatorname{sgn} A = A(A^2)^{-1/2},
\]
where $(A^2)^{-1/2}$ is the principal square root of the matrix $A^{-2}$. Thus any branch of the multivalued matrix sign function is defined via the formula
\[
(A^2)^{1/2} = A\operatorname{sgn} A,
\]
where $(A^2)^{1/2}$ is an arbitrary square root of the matrix $A^2$.

Corollary 4.4 (Dinčić) The expression for all commuting solutions (4.1) to the YBME (1.1) can be rewritten as
\[
X = \frac{1}{2}A(I + \operatorname{sgn} A), \tag{4.2}
\]
where $\operatorname{sgn} A$ is an arbitrary branch of the sign function of the matrix $A$.

The formula (4.2) is convenient in applications since the computation of $\operatorname{sgn} A$ is a numerically stable procedure. However, note that this holds only when $A$ does not have imaginary eigenvalues.

Example 12 Let $A = \mathrm{diag}(J_2(i), J_2(i), J_1(2))$; hence $s = 2 < p = 3$. We will apply the formula (4.1) to obtain all commuting solutions to the YBME (1.1). In order to apply Theorem 4.1, we will find the Jordan normal form for $A^2$:
\[
A^2 = \mathrm{diag}\big(J_2^2(i), J_2^2(i), J_1^2(2)\big) = T\,\mathrm{diag}(J_2(-1), J_2(-1), J_1(4))\,T^{-1},
\]
where
\[
T = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & -i/2 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & -i/2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}.
\]
Now, Theorem 4.1 gives a recipe for obtaining both primary and nonprimary roots.

Primary Roots The theorem says that there are $2^s = 4$ primary square roots of $A^2$, and they are given by
\[
(A^2)^{1/2}_j = T\,\mathrm{diag}\big(J_2^{1/2}(-1), J_2^{1/2}(-1), J_1^{1/2}(4)\big)\,T^{-1} = T\,\mathrm{diag}(\pm J_2(i), \pm J_2(i), \pm J_1(2))\,T^{-1}, \quad j = 1, \dots, 4,
\]
where we choose the same sign for both $J_2(i)$ blocks. Now, the solutions are:
\[
\begin{aligned}
X_1 &= \frac{1}{2}\big(A + (A^2)^{1/2}_1\big) = \frac{1}{2}\big(A + T\,\mathrm{diag}(+J_2(i), +J_2(i), +J_1(2))\,T^{-1}\big) = A, \\
X_2 &= \frac{1}{2}\big(A + (A^2)^{1/2}_2\big) = \frac{1}{2}\big(A + T\,\mathrm{diag}(-J_2(i), -J_2(i), -J_1(2))\,T^{-1}\big) = 0, \\
X_3 &= \frac{1}{2}\big(A + (A^2)^{1/2}_3\big) = \frac{1}{2}\big(A + T\,\mathrm{diag}(+J_2(i), +J_2(i), -J_1(2))\,T^{-1}\big) = \mathrm{diag}(J_2(i), J_2(i), 0) = AP_i, \\
X_4 &= \frac{1}{2}\big(A + (A^2)^{1/2}_4\big) = \frac{1}{2}\big(A + T\,\mathrm{diag}(-J_2(i), -J_2(i), +J_1(2))\,T^{-1}\big) = \mathrm{diag}(0, 0, J_1(2)) = AP_2,
\end{aligned}
\]
where $P_i$ and $P_2$ are spectral projectors corresponding to the eigenvalues $i$ and $2$, respectively. We conclude that in the primary case we obtain both trivial solutions and all spectral solutions.

Nonprimary Roots The theorem says that there are $2^p - 2^s = 4$ nonprimary square roots of $A^2$, and they are given by
\[
(A^2)^{1/2}_j(U) = TU\,\mathrm{diag}\big(J_2^{1/2}(-1), J_2^{1/2}(-1), J_1^{1/2}(4)\big)\,U^{-1}T^{-1} = TU\,\mathrm{diag}(\pm J_2(i), \pm J_2(i), \pm J_1(2))\,U^{-1}T^{-1}, \quad j = 5, \dots, 8,
\]
where we choose different signs for the $J_2(i)$ blocks; $U$ is some invertible matrix that commutes with $J = \mathrm{diag}(J_2(-1), J_2(-1), J_1(4))$. By the Sylvester Theorem 3.1, the matrix U must be of the form
where a, b, c, d, e, f, g, j are some complex numbers. Since detðUÞ = ðce - agÞ2 j, it must be that ce ≠ ag and j ≠ 0. For example, we find X5(U) as follows: 1∕2 1 A þ ðA2 Þ5 ðUÞ 2 1 = ðA þ TUdiagðþJ 2 ðiÞ, - J 2 ðiÞ, þ J 1 ð2ÞÞU - 1 T - 1 Þ, 2
X 5 ðUÞ =
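and after some calculations we arrive at
\[ X_5(U) = \big[ X_5^{(1)}(U);\ X_5^{(2)}(U);\ X_5^{(3)}(U);\ X_5^{(4)}(U);\ X_5^{(5)}(U) \big], \]
where the columns $X_5^{(k)}(U)$ are
\[ X_5^{(1)}(U) = \begin{pmatrix} \dfrac{ice}{ce-ag} \\ 0 \\ \dfrac{iac}{ce-ag} \\ 0 \\ 0 \end{pmatrix}, \qquad
X_5^{(2)}(U) = \begin{pmatrix} \dfrac{-c(ae(g+2h) - 2afg + 2beg) + 2adeg + c^2e^2}{(ce-ag)^2} \\ \dfrac{ice}{ce-ag} \\ \dfrac{a^2(2dg - c(g+2h)) + ac^2(e+2f) - 2bc^2e}{(ce-ag)^2} \\ \dfrac{iac}{ce-ag} \\ 0 \end{pmatrix}, \]
\[ X_5^{(3)}(U) = \begin{pmatrix} \dfrac{ieg}{ag-ce} \\ 0 \\ \dfrac{iag}{ag-ce} \\ 0 \\ 0 \end{pmatrix}, \qquad
X_5^{(4)}(U) = \begin{pmatrix} \dfrac{g\big(ag(e-2f) + 2beg - e^2(c+2d)\big) + 2ce^2h}{(ce-ag)^2} \\ \dfrac{ieg}{ag-ce} \\ \dfrac{g\big(a^2g - a(c(e+2f) + 2de) + 2bce\big) + 2aceh}{(ce-ag)^2} \\ \dfrac{iag}{ag-ce} \\ 0 \end{pmatrix}, \qquad
X_5^{(5)}(U) = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 2 \end{pmatrix}. \]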
We see that $X_5(U)$ depends on eight complex parameters, $a, b, c, d, e, f, g, h$, such that $ce \neq ag$; the parameter $j \neq 0$ cancels out. Remark that, because $\sigma(A) \cap i\mathbb{R} \neq \emptyset$, we cannot apply the formula (4.2) as the sign of $A$ is not defined. ♣
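The computations of Example 12 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it relies on NumPy, the helper names (`block_diag`, `candidate`, `is_commuting_solution`) are hypothetical, and the square root of the block $J_2(-1)$ is evaluated as the primary function value $\begin{pmatrix} i & -i/2 \\ 0 & i \end{pmatrix}$, which is how the abbreviated notation $\pm J_2(i)$ is read here. The sample values of $a, \dots, j$ are arbitrary.

```python
import numpy as np

I = 1j

def block_diag(*blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n), dtype=complex)
    k = 0
    for b in blocks:
        m = b.shape[0]
        out[k:k + m, k:k + m] = b
        k += m
    return out

def J2(lam):
    """2 x 2 Jordan block with eigenvalue lam."""
    return np.array([[lam, 1], [0, lam]], dtype=complex)

A = block_diag(J2(I), J2(I), np.array([[2.0]]))

# Similarity matrix T from the example (it is symmetric).
T = np.zeros((5, 5), dtype=complex)
T[0, 2] = T[2, 0] = T[4, 4] = 1
T[1, 3] = T[3, 1] = -I / 2

# Assumption: sqrt of J_2(-1) on the branch sqrt(-1) = i, i.e. [[f, f'], [0, f]]
# with f = i and f' = 1/(2i) = -i/2.
R = np.array([[I, -I / 2], [0, I]])

def candidate(s1, s2, s3, U=None):
    """X = (A + T U diag(s1 R, s2 R, s3 * 2) U^{-1} T^{-1}) / 2."""
    U = np.eye(5, dtype=complex) if U is None else U
    core = block_diag(s1 * R, s2 * R, np.array([[s3 * 2.0]]))
    root = T @ U @ core @ np.linalg.inv(U) @ np.linalg.inv(T)
    return (A + root) / 2

def is_commuting_solution(X):
    return np.allclose(A @ X @ A, X @ A @ X) and np.allclose(A @ X, X @ A)

# Arbitrary sample parameters with ce != ag and j != 0.
a, b, c, d, e, f, g, h, j = 1, 2, 3, 4, 5, 6, 7, 8, 9
U = np.array([[a, b, c, d, 0],
              [0, a, 0, c, 0],
              [e, f, g, h, 0],
              [0, e, 0, g, 0],
              [0, 0, 0, 0, j]], dtype=complex)

X1 = candidate(+1, +1, +1)
X2 = candidate(-1, -1, -1)
X3 = candidate(+1, +1, -1)
X4 = candidate(-1, -1, +1)
X5 = candidate(+1, -1, +1, U)
```

With these definitions, `is_commuting_solution` can be applied to each of `X1`, ..., `X5`, and individual entries of `X5` can be compared with the closed-form columns given above.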
4.1.2
$(A^2)^{1/2}$ and the Matrix Unwinding Function
The unwinding number of a complex number $z$ is usually defined as
\[ \mathcal{U}(z) = \frac{z - \log e^z}{2\pi i}, \]
where $\log$ is the principal logarithm, $-\pi < \operatorname{Im} \log z \le \pi$. The matrix unwinding function is defined as
\[ \mathcal{U}(A) = \frac{A - \log e^A}{2\pi i}; \]
please see [2] for details. In terms of Jordan blocks, the matrix unwinding function is given by
\[ \mathcal{U}(A) = T \operatorname{diag}(\mathcal{U}(\lambda_1) I_{n_1}, \dots, \mathcal{U}(\lambda_p) I_{n_p}) T^{-1}. \]
Recall that for an invertible matrix $A$ we have [2, Lemma 3.11 for n = 2]
\[ (A^2)^{1/2} = A e^{i\pi\, \mathcal{U}(2 \log A)}. \]
Corollary 4.5 (Dinčić) The expression for all commuting solutions (4.1) to the YBME (1.1) can be rewritten as
\[ X = \tfrac12 A \big( I + e^{i\pi\, \mathcal{U}(2 \log A)} \big), \]
where $\mathcal{U}$ is the matrix unwinding function.
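For intuition, the scalar case of the identity $(A^2)^{1/2} = A e^{i\pi\,\mathcal{U}(2\log A)}$ can be tried out directly. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes that the principal branch is the one implemented by `cmath.log` and `cmath.sqrt`.

```python
import cmath
import math

def unwinding_number(z: complex) -> int:
    """U(z) = (z - log(exp(z))) / (2*pi*i), with log the principal logarithm."""
    value = (z - cmath.log(cmath.exp(z))) / (2j * math.pi)
    return round(value.real)   # the exact value is an integer; rounding removes noise

def sqrt_of_square(a: complex) -> complex:
    """Scalar form of the identity: a * exp(i*pi*U(2 log a))."""
    return a * cmath.exp(1j * math.pi * unwinding_number(2 * cmath.log(a)))
```

For a nonzero scalar `a` away from the branch cut, `sqrt_of_square(a)` should agree with `cmath.sqrt(a * a)`, i.e. with `a` or `-a` depending on the half-plane in which `a` lies.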
4.2
Existence of Non-commuting Solutions
In this section we provide sufficient conditions for the existence of non-commuting solutions. Recall Example 3; when A = λI2 for λ ≠ 0, there are no non-commuting solutions; thus their existence is not a given. Since the case of a diagonalizable matrix A was closed in [9] (recall Theorem 2.6 from Section 2.1.1), we only inspect the remaining case, i.e., the case when A is not diagonalizable.
We proceed to prove our results by transforming the initial equation (1.1) into a Sylvester equation (3.1). We thereby derive a method for computing some initial non-commuting solutions to the YBME (1.1). Later on, the obtained solutions will be used in Section 6 to generate infinitely many new non-commuting solutions. Theorem 4.6 is an original result proved by Djordjević, and has not been published elsewhere, while Theorem 4.8 was also obtained by the authors and is a part of the published paper [15].

Recall notation and results from Section 3. Let $A = UDW$ be the singular-value decomposition of $A$, where $U$ and $W$ are unitary while $D$ is a positive definite diagonal matrix. If $W = U^{-1}$, i.e., if $A$ is a positive matrix, then $A$ is also diagonalizable; thus we assume that $W \neq U^{-1}$. Recall that in the SVD the unitary matrices $U$ and $W$ are not uniquely determined, but are unique up to a phase shift (see Trefethen and Bau III [49, pp. 25–30] and [50]): in other words, if there exist unitary matrices $U'$ and $W'$ such that $A = U'DW'$ holds, then there exist $\alpha_1, \dots, \alpha_n, \beta_1, \dots, \beta_n \in [0, 2\pi)$ such that
\[ U' = U \operatorname{diag}(e^{i\alpha_1}, \dots, e^{i\alpha_n}), \qquad W' = W \operatorname{diag}(e^{i\beta_1}, \dots, e^{i\beta_n}), \]
where $\alpha_k + \beta_k = \gamma_k = \text{const}$, $k = 1, \dots, n$. Consequently, the unitary matrix $V := WU$ is uniquely determined. By assumption, $W \neq U^{-1}$; therefore $V \neq I$.

Theorem 4.6 (Djordjević) Let $A \in M_n(\mathbb{C})$ be an invertible matrix, $A = UDW$ its SVD such that $V := WU \neq I$ and $\sigma(V^*) \cap \sigma(V) = \emptyset$. The following statements are equivalent:
(a) There exists a nonzero square matrix $Z$ which solves the matrix equation
\[ Z = \sqrt{D} \big( \sqrt{D} (VZV) \sqrt{D} \big)^{1/2} \sqrt{D}, \tag{4.3} \]
where $\sqrt{D}$ is the positive square root of $D$ while $(\cdot)^{1/2}$ is an arbitrary square root of the matrix expression.
(b) The matrix $X = W^{-1} D^{-1} Z D^{-1} U^{-1}$ is a nontrivial solution to the YBME (1.1).
If (a) or (b) holds, then the solution $X$ is non-commuting if and only if
\[ DVZ \neq ZVD. \tag{4.4} \]
Proof For an arbitrary matrix $X$, denote $Y := WXU$. Observe the following chain of equivalences:
\[ AXA = XAX \iff UDWXUDW = XUDWX \iff UDYDWU = W^{-1}YDY \iff VDYD = YDYV^*. \]
Ergo $X$ solves the YBME if and only if $Y$ solves
\[ VDYD = YDYV^*. \tag{4.5} \]
(b) ⇒ (a): Assume that $X_0$ is a nontrivial solution to the YBME and let $Y_0 = WX_0U$. If $Y_0$ solves the equation $DYD = YDY$ (e.g., if $Y_0 = D$ or is equal to any other commuting solution with respect to $D$ obtained via Theorem 4.2), then there exists a matrix $L_0$ such that $DY_0D = Y_0DY_0 = L_0$. However, substituting $Y_0$ into the equation (4.5) gives
\[ VL_0 = L_0V^* \iff L_0 = 0 \iff Y_0 = 0 \iff X_0 = 0, \]
which is not possible. Thus there exists a nonzero matrix $L_1$ given as
\[ L_1 := Y_0DY_0 - DY_0D. \]
By the Sylvester Theorem 3.1, there exists a unique $Z_1 \neq 0$ such that $VZ_1 - Z_1V^* = L_1V^*$. Substituting into the equation, we get
\[ 0 = VDY_0D - Y_0DY_0V^* = VDY_0D - (DY_0D + L_1)V^* = V(DY_0D) - (DY_0D)V^* - L_1V^*, \]
i.e., due to the regularity of the Sylvester equation, $Z_1 = DY_0D$. Consequently, $X_0 = W^{-1}D^{-1}Z_1D^{-1}U^{-1}$ is one solution to the YBME (1.1). To verify that (4.3) holds, note that
\[ \begin{aligned}
& Y_0DY_0 - DY_0D = L_1 \\
\iff{}& Y_0DY_0 = L_1 + DY_0D = L_1 + Z_1 = (VZ_1 - Z_1V^*)V + Z_1 = VZ_1V \\
\iff{}& \sqrt{D}\,Y_0DY_0\sqrt{D} = \sqrt{D}\,VZ_1V\sqrt{D} \\
\iff{}& \big(\sqrt{D}\,Y_0\sqrt{D}\big)^2 = \sqrt{D}\,VZ_1V\sqrt{D} \\
\iff{}& Y_0 = \sqrt{D}^{-1} \big(\sqrt{D}\,VZ_1V\sqrt{D}\big)^{1/2} \sqrt{D}^{-1} \\
\iff{}& DY_0D = Z_1 = \sqrt{D} \big(\sqrt{D}\,VZ_1V\sqrt{D}\big)^{1/2} \sqrt{D}.
\end{aligned} \]
Finally, we verify that
\[ AX_0 = UZ_1D^{-1}U^{-1}, \qquad X_0A = W^{-1}D^{-1}Z_1W, \]
so
\[ AX_0 = X_0A \iff UZ_1D^{-1}U^{-1} = W^{-1}D^{-1}Z_1W \iff DVZ_1 = Z_1VD. \]
(a) ⇒ (b): Conversely, let $Z$ be such that (4.3) holds. Respectively, let $L := VZV - Z$ and $Y := D^{-1}ZD^{-1}$. Similarly as before, we have
\[ \begin{aligned}
& Y = \sqrt{D}^{-1} \big(\sqrt{D}\,VZV\sqrt{D}\big)^{1/2} \sqrt{D}^{-1} \\
\iff{}& \big(\sqrt{D}\,Y\sqrt{D}\big)^2 = \sqrt{D}\,VZV\sqrt{D} \\
\iff{}& YDY = VZV = VZV - Z + Z = L + DYD \\
\iff{}& YDY - DYD = L.
\end{aligned} \]
To verify that (4.5) holds for such a $Y$, we compute
\[ VDYD - YDYV^* = VDYD - (DYD + L)V^* = 0 \iff VZ - ZV^* = LV^* \iff VZV - Z = L, \]
which is true. Thus $Y$ solves the equation (4.5), or equivalently, $X := W^{-1}D^{-1}ZD^{-1}U^{-1}$ solves the YBME (1.1). Direct verification shows that $X$ is a non-commuting solution if and only if (4.4) holds. □
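The change of variables $Y = WXU$ used in the proof can be illustrated numerically: for any $X$, the residual of the YBME is unitarily similar to the residual of (4.5). The Python sketch below assumes NumPy's convention `A = U @ diag(s) @ W` for `numpy.linalg.svd` (its third return value plays the role of $W$) and uses random test matrices; it is an illustration only, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

U, s, W = np.linalg.svd(A)      # A = U @ diag(s) @ W with U, W unitary
D = np.diag(s)
V = W @ U                       # the unitary matrix V := WU
Y = W @ X @ U                   # transformed unknown

# Residual of the YBME, conjugated by W, versus the residual of (4.5).
ybme_residual = W @ (A @ X @ A - X @ A @ X) @ W.conj().T
reduced_residual = V @ D @ Y @ D - Y @ D @ Y @ V.conj().T

# The trivial solution X = A must satisfy the reduced equation as well.
Y_trivial = W @ A @ U
trivial_residual = V @ D @ Y_trivial @ D - Y_trivial @ D @ Y_trivial @ V.conj().T
```

Here `ybme_residual` and `reduced_residual` should coincide up to rounding, and `trivial_residual` should vanish.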
Remark 4.7 Note that solving (4.3) amounts to finding nonzero fixed points for the mapping
\[ Z \mapsto \sqrt{D} \big( \sqrt{D} (VZV) \sqrt{D} \big)^{1/2} \sqrt{D}, \]
which is a special form of the function $\Psi$ from (2.7). Written out in these terms, it is a numerically stable problem that can be solved via the Banach fixed point theorem or the Brouwer fixed point theorem; see Section 7.

The next result concerns the scenario where the unitary matrix $V$ and its inverse $V^*$ have common eigenvalues. We denote this spectral intersection by
\[ \sigma := \sigma(V) \cap \sigma(V^*) \neq \emptyset. \]
Accordingly, denote $E_V = \bigoplus_{\lambda(V) \in \sigma} E_V^{\lambda(V)}$, where $E_V^{\lambda(V)}$ is the eigenspace for $V$ which corresponds to the shared eigenvalue $\lambda(V) \in \sigma$. For easier notation, let $|\sigma| = \ell$, $\dim E_V^{\lambda_i(V)} = k_i$ for $i = 1, \dots, \ell$, $\dim E_V = k = k_1 + \cdots + k_\ell$ and $\dim E_V^{\perp} = n - k$, $k \le n$. Finally, since $D$ is a diagonal positive matrix, we have
\[ D = \begin{pmatrix} D_1 & 0 \\ 0 & D_4 \end{pmatrix} : \begin{pmatrix} E_V \\ E_V^{\perp} \end{pmatrix} \to \begin{pmatrix} E_V \\ E_V^{\perp} \end{pmatrix}. \]
Theorem 4.8 [15] With respect to the previous notation, assume that $\sigma \neq \emptyset$. There exists a Hermitian matrix $G_0 \in M_k(\mathbb{C})$ such that $G_0 : E_V^{\lambda(V)} \to E_V^{\lambda(V)}$ for every $\lambda(V) \in \sigma$ and the matrix
\[ X = W^{-1} \begin{pmatrix} D_1^{-1}G_0 & 0 \\ 0 & 0 \end{pmatrix} U^{-1} \tag{4.6} \]
is a solution to (1.1). There are at least $2^k$ different choices for $G_0$.

Proof As in the previous proof, for any square matrix $X \in M_n(\mathbb{C})$, denote $Y := WXU$. Then $X$ solves (1.1) if and only if $Y$ solves (4.5). As opposed to the previous theorem, where $\sigma(V^*) \cap \sigma(V) = \emptyset$, we now have a singular Sylvester equation
\[ VZ - ZV^* = 0, \tag{4.7} \]
which is solvable for a nontrivial $Z$. Therefore we solve the homogeneous equation (4.7) and obtain the set $\mathcal{S}_{Syl}(V; V^*; 0)$. Afterward, we seek those matrices $Z \in \mathcal{S}_{Syl}(V; V^*; 0)$ which are fixed points for the mapping
\[ f(Z) := D^{-1}ZD^{-1}ZD^{-1}. \]
Once such matrices $Z$ are found, we once again verify that the matrices $Y := D^{-1}ZD^{-1}$ solve (4.5), while the matrices $X := W^{-1}YU^{-1}$ solve (1.1).

To start, $V$ and $V^*$ are unitary matrices which are dual to each other ($V^* = V^{-1}$), so every eigenspace $E_V^{\lambda(V)}$ for $V$ which corresponds to its eigenvalue $\lambda(V) \in \sigma(V)$ is simultaneously an eigenspace for $V^*$ which corresponds to $\overline{\lambda(V)} \in \sigma(V^*)$. In that sense, $E_V := \bigoplus_{\lambda(V) \in \sigma} E_V^{\lambda(V)}$ is an invariant subspace under both $V$ and $V^*$. On the other hand, the fact that $E_V$ is a $V$-invariant subspace implies that $E_V^{\perp}$ is a $V^*$-invariant subspace. Simultaneously, since $E_V$ is a $V^*$-invariant subspace, it follows that $E_V^{\perp}$ is a $V$-invariant subspace. All this justifies the decomposition
\[ V = \begin{pmatrix} V_1 & 0 \\ 0 & V_4 \end{pmatrix} : \begin{pmatrix} E_V \\ E_V^{\perp} \end{pmatrix} \to \begin{pmatrix} E_V \\ E_V^{\perp} \end{pmatrix}, \]
where $V_1$ and $V_4$ are invertible and unitary. Analogously
\[ V^* = \begin{pmatrix} V_1^* & 0 \\ 0 & V_4^* \end{pmatrix} : \begin{pmatrix} E_V \\ E_V^{\perp} \end{pmatrix} \to \begin{pmatrix} E_V \\ E_V^{\perp} \end{pmatrix}. \]
It follows that all solutions to the Sylvester equation (4.7) are of the form (see Theorem 3.3)
\[ Z = \begin{pmatrix} N & 0 \\ 0 & 0 \end{pmatrix} : \begin{pmatrix} E_V \\ E_V^{\perp} \end{pmatrix} \to \begin{pmatrix} E_V \\ E_V^{\perp} \end{pmatrix}, \tag{4.8} \]
where $N = \bigoplus_{\lambda(V) \in \sigma} N_\sigma$ and $N_\sigma : E_V^{\lambda(V)} \to E_V^{\lambda(V)}$ are arbitrary square matrices of the appropriate dimensions.
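In order to verify that among all the solutions (4.8) there exists a $Z_0$ which is a fixed point for $f$, we proceed as follows:
\[ \begin{aligned}
D^{-1}ZD^{-1}ZD^{-1}
&= \begin{pmatrix} D_1^{-1} & 0 \\ 0 & D_4^{-1} \end{pmatrix} \begin{pmatrix} N & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} D_1^{-1} & 0 \\ 0 & D_4^{-1} \end{pmatrix} \begin{pmatrix} N & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} D_1^{-1} & 0 \\ 0 & D_4^{-1} \end{pmatrix} \\
&= \begin{pmatrix} D_1^{-1}N & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} D_1^{-1}N & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} D_1^{-1} & 0 \\ 0 & D_4^{-1} \end{pmatrix} \\
&= \begin{pmatrix} D_1^{-1}ND_1^{-1}ND_1^{-1} & 0 \\ 0 & 0 \end{pmatrix}.
\end{aligned} \]
Obviously $Z_0 = f(Z_0)$ if and only if there exists an $N_0$ in (4.8) such that $N_0 = D_1^{-1}N_0D_1^{-1}N_0D_1^{-1}$, if and only if there exists an $N_0$ from (4.8) such that
\[ N_0D_1^{-1} = D_1^{-1} (N_0D_1^{-1})^2 D_1^{-1} \]
holds, that is, if and only if there exists an $N_0$ in (4.8) such that $N_0D_1^{-1}$ is a fixed point for the function (recall the function $\Psi$ from (2.7))
\[ g(Z) := D_1^{-1}Z^2D_1^{-1}. \]
Conversely, if $G_0$ is a nontrivial fixed point for $g$ such that $G_0 : E_V^{\lambda(V)} \to E_V^{\lambda(V)}$ for every $\lambda(V) \in \sigma$, then $N_0 := G_0D_1$, and $Z_0 := \begin{pmatrix} N_0 & 0 \\ 0 & 0 \end{pmatrix}$ is a nontrivial solution to (4.7) which solves the equation $Z_0 = f(Z_0)$; thus
\[ X = W^{-1} \begin{pmatrix} D_1^{-1} & 0 \\ 0 & D_4^{-1} \end{pmatrix} \begin{pmatrix} G_0D_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} D_1^{-1} & 0 \\ 0 & D_4^{-1} \end{pmatrix} U^{-1} = W^{-1} \begin{pmatrix} D_1^{-1}G_0 & 0 \\ 0 & 0 \end{pmatrix} U^{-1} \]
is a nontrivial solution to (1.1).

Now to compute the sought $G_0$, we conduct the following analysis: Let $1 \le i \le \ell$ be arbitrary. Observe the corresponding restriction of $D_1$ to $E_V^{\lambda_i(V)}$, denoted as $D_{1i}$. In that sense, the restriction of $G_0$ to $E_V^{\lambda_i(V)}$ is denoted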
as $G_{0i}$. Finding a self-adjoint fixed point for $g$ on $E_V^{\lambda_i(V)}$ amounts to solving $G_{0i} = D_{1i}^{-1} G_{0i}^2 D_{1i}^{-1}$. Recall that $D_{1i}^{-1}$ is a diagonal matrix, having singular values of $A^{-1}$ on its main diagonal. Denote by $s_{i_1}(A^{-1}), \dots, s_{i_p}(A^{-1})$ the ordered set of diagonal elements of $D_{1i}^{-1}$, $i_p \le k_i$. Proceed to choose an arbitrary $r_i \le k_i$ which would be the rank for $G_{0i}$. Precisely, we define $G_{0i}$ to be a diagonal matrix, having zeros everywhere except for $r_i$ positions on its main diagonal. Finally, at those positions $(\operatorname{diag}(G_{0i}))_j$, where $j$ belongs to the index set $\{j_1, \dots, j_{r_i}\} \subset \{1, \dots, k_i\}$, we write
\[ (\operatorname{diag}(G_{0i}))_j := s_j^2(A). \]
By construction, $G_{0i}$ maps $E_V^{\lambda_i}$ to $E_V^{\lambda_i}$ and is a diagonal Hermitian matrix which commutes with $D_{1i}$ while it solves $G_{0i} = D_{1i}^{-1} G_{0i}^2 D_{1i}^{-1}$. Since $V$ is a normal matrix, it follows that eigenvectors which correspond to different eigenvalues are mutually orthogonal; thus we can extend $G_{0i}$ to the entire $E_V$, in a manner that it annihilates every other subspace $E_V^{\lambda_j(V)} \subset (E_V^{\lambda_i(V)})^{\perp}$, for every $j \neq i$. Finally we take $G_0 := \bigoplus_{1 \le i \le \ell} G_{0i}$. It follows that $G_0 : E_V^{\lambda(V)} \to E_V^{\lambda(V)}$ for every $\lambda(V) \in \sigma$, and it is a Hermitian diagonal matrix. As stated before, $Z_0 := \begin{pmatrix} G_0D_1 & 0 \\ 0 & 0 \end{pmatrix}$ solves (4.7) while it is simultaneously a fixed point for $f$. This implies that $X$ provided by (4.6) is a solution to (1.1). Notice that each $0 \le r_i \le k_i$ was arbitrarily chosen, for each $1 \le i \le \ell$, so there are
\[ \sum_{0 \le r_i \le k_i} \binom{k_i}{r_i} = 2^{k_i} \]
possible choices for each $G_{0i}$, which gives
\[ \prod_{1 \le i \le \ell} 2^{k_i} = 2^k \]
possible choices for $G_0$. □
Remark 4.9 By construction, it follows that $G_0$ is obtained from $D_1^2$ by taking an arbitrary projector $P$ onto an arbitrary $D_1$-invariant subspace: $G_0 = PD_1^2$, and consequently $X$ from (4.6) is obtained as
\[ X = W^{-1} \begin{pmatrix} D_1P & 0 \\ 0 & 0 \end{pmatrix} U^{-1}. \]
Thus the above solutions mimic the spectral solutions with respect to $|A|$ instead of $A$. However, the matrix $G_{0i}$ need not be defined in this manner: one could apply Theorem 4.2. This will be explored in greater detail in Section 7, though we will lose the self-adjointness of $G_{0i}$. Once again notice that
\[ AX = U \begin{pmatrix} D_1^2P & 0 \\ 0 & 0 \end{pmatrix} U^{-1} \quad \text{and} \quad XA = W^{-1} \begin{pmatrix} D_1^2P & 0 \\ 0 & 0 \end{pmatrix} W, \]
so these solutions are not commuting due to $W \neq U^{-1}$.

Remark 4.10 The fact that none of the non-commuting solutions are isolated will be proved in Section 6, as this statement holds for both regular and singular coefficient matrix $A$.
5 Solving the Equation When A Is Singular

At this point we study the YBME when the coefficient matrix $A$ is singular. Unlike the invertible matrix case, it is rather easy to prove the existence of nontrivial solutions under this premise. We start by providing nontrivial solutions via the left and right zero divisors, and then we generate more nontrivial solutions by means of generalized inverses. At the end of this section, we will study the set of commuting solutions, since they behave differently from the non-commuting ones. The main results in this section were proved by the authors in [15].

Recall that, when $A$ is a singular matrix, there exist left and right zero divisors for $A$, i.e., there exist nonzero matrices $B_1$ and $B_2$ of appropriate dimensions such that $AB_1 = 0$ and $B_2A = 0$.
Theorem 5.1 [15] For a singular matrix $A \in M_n(\mathbb{C})$, there exists a nontrivial solution to the YBME (1.1).

Proof Let $A \in M_n(\mathbb{C})$ be a singular matrix. There exist some nonzero matrices $B_1, B_2 \in M_n(\mathbb{C})$ such that $AB_1 = B_2A = 0$. Moreover, we can always choose $B_1, B_2$ not to be $A$, even in the case when $A^2 = 0$. We verify that $AB_1A = B_1AB_1 = 0$ and $AB_2A = B_2AB_2 = 0$ and conclude that $B_1$ and $B_2$ are nontrivial solutions to the YBME (1.1). □

Since in general the matrices $B_1$ and $B_2$ need not commute with $A$, the above-provided solutions can be commuting or non-commuting. The next result can be proven by direct verification.

Proposition 5.2 [15] If $A$ is a singular matrix and $X_0$ is a solution to the YBME (1.1), then
(i) if there is nonzero $P$ such that $PA = 0$, then $X_0P$ is a solution,
(ii) if there is nonzero $Q$ such that $AQ = 0$, then $QX_0$ is a solution,
(iii) if there is nonzero $P$ such that $PA = 0$ or nonzero $Q$ such that $AQ = 0$, then $QX_0P$ is a solution.
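The construction in the proof of Theorem 5.1 is easy to reproduce numerically. The sketch below is an illustration only: the test matrix, the vector `w` and the helper `null_space` are hypothetical choices, and the kernel is read off from NumPy's SVD.

```python
import numpy as np

def null_space(M, tol=1e-12):
    """Orthonormal basis (as columns) of ker(M), obtained from the SVD of M."""
    _, s, vh = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vh[rank:].conj().T

# A singular test matrix: the second row is twice the first one.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 1.0]])

right = null_space(A)      # columns v with A v = 0
left = null_space(A.T)     # columns u with u^T A = 0

w = np.array([1.0, -1.0, 2.0])      # arbitrary nonzero vector
B1 = np.outer(right[:, 0], w)       # right zero divisor: A @ B1 = 0
B2 = np.outer(w, left[:, 0])        # left zero divisor:  B2 @ A = 0

def is_solution(X):
    return np.allclose(A @ X @ A, X @ A @ X)
```

Both `B1` and `B2` should pass `is_solution`, and, in the spirit of Proposition 5.2 (i), so should `A @ B2`, obtained from the trivial solution $X_0 = A$ and $P = B_2$.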
5.1
Obtaining New Solutions via Generalized Inverses
Below we recall some basic results about generalized inverses for matrices. For a detailed survey on the matter, consult [8]. These results will serve as a toolkit for producing new nontrivial solutions to the YBME (this technique was also applied by the authors in [15]). In general, the solutions obtained in this section need not commute with $A$, but there is no guarantee: in other words, the solutions obtained below via the generalized inverses are not necessarily commuting or non-commuting.

Let $A \in M_{m \times n}(\mathbb{C})$ be a given rectangular matrix. A matrix $B \in M_{n \times m}(\mathbb{C})$ such that $ABA = A$ is called an inner inverse for $A$, usually denoted by $A^-$, while $A$ is said to be inner regular. Precisely, if $A$ is inner regular, then the set consisting of its inner inverses is usually denoted by $A\{1\}$. Furthermore, if $A$ is inner regular, and there exists an inner inverse $B$ for $A$ which in addition satisfies $BAB = B$, then $B$ is a reflexive generalized inverse for $A$, usually denoted by $A^+$, while $A$ is said to be reflexively regular. If $A$ is reflexively regular, the set of its reflexive inverses is denoted by $A\{1, 2\}$.

We formulate probably the most important theorem regarding applications of the generalized inverses: solving the equation $AXC = D$ where $A$ and $C$ need not be invertible matrices. This famous result was obtained by Penrose in [40]; cf. [8].
Theorem 5.3 (Penrose) Let $A \in M_{m \times n}(\mathbb{C})$, $C \in M_{p \times q}(\mathbb{C})$, $D \in M_{m \times q}(\mathbb{C})$. Then the matrix equation
\[ AXC = D \]
is consistent if and only if, for some $A^-$, $C^-$,
\[ AA^-DC^-C = D, \]
in which case the general solution is
\[ X = A^-DC^- + Y - A^-AYCC^- \]
for arbitrary $Y \in M_{n \times p}(\mathbb{C})$.

By Theorem 5.3, all solutions to the equation $AXA = 0$ are
\[ X = Y - A^-AYAA^-, \]
where $Y \in M_n(\mathbb{C})$ is an arbitrary matrix and $A^-$ is any inner generalized inverse for the matrix $A$. We now have the following method for obtaining infinitely many new solutions (which was derived by the authors in [15]):
• we solve the equation $AXA = 0$ and compute $X$ as $X = Y - A^-AYAA^-$;
• we check whether $XAX = 0$, i.e., if $X$ solves the YBME (1.1) (when $Y = A^-$, one such possible solution is $X = A^- - A^-AA^-$);
• by Theorem 2.18 we conclude that $\alpha X$ is a solution to the YBME (1.1), for every $\alpha \in \mathbb{C}$.

Example 13 For $A = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}$, all solutions to $AXA = 0$ are
\[ X(b, c, d) = \begin{pmatrix} 0 & b \\ c & d \end{pmatrix}. \]
Such $X(b, c, d)$ is a solution to the YBME if and only if $bc = 0$; hence any
\[ X = \alpha X(b, c, d) = \begin{pmatrix} 0 & \alpha b \\ \alpha c & \alpha d \end{pmatrix}, \]
where $bc = 0$, is a solution as well. Notice that in this simple case the set $\{X(b, c, d)\}$ coincides with $A\{1\}$, which does not hold in general. ♣
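The three-step method above can be sketched on the matrix of Example 13. In this illustration the Moore-Penrose inverse returned by `numpy.linalg.pinv` is used as one convenient inner inverse (an assumption made here; any other inner inverse could be substituted), and the function names and test matrices `Y_good`, `Y_bad` are hypothetical.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 0.0]])
A_inner = np.linalg.pinv(A)    # one particular inner inverse: A @ A_inner @ A == A

def solve_AXA_zero(Y):
    """Step 1: X = Y - A^- A Y A A^- solves A X A = 0 for every Y."""
    return Y - A_inner @ A @ Y @ A @ A_inner

def is_ybme_solution(X):
    """Step 2: check A X A = X A X."""
    return np.allclose(A @ X @ A, X @ A @ X)

Y_good = np.array([[7.0, 3.0], [0.0, 5.0]])   # leads to b = 3, c = 0, d = 5
Y_bad = np.array([[7.0, 3.0], [4.0, 5.0]])    # leads to b = 3, c = 4, d = 5

X_good = solve_AXA_zero(Y_good)
X_bad = solve_AXA_zero(Y_bad)
```

Only `X_good`, for which $bc = 0$, should pass `is_ybme_solution`; by step 3, every scalar multiple of it should pass as well.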
The above-described method can be used for finding infinitely many (unfortunately, not all) solutions to the initial YBME. Below we proceed to apply the core-nilpotent decomposition on $A$, in order to make the equations $AXA = 0$ and $XAX = 0$ easier to analyze. This reduction was also done by the authors in [15].

Recall that (see [5]) for a square matrix $A \in M_n(\mathbb{C})$, its index $\operatorname{ind}(A)$ is defined as
\[ \operatorname{ind}(A) = \min\{k \in \mathbb{N}_0 : \operatorname{rank}(A^k) = \operatorname{rank}(A^{k+1})\}. \]
Let $p$ be the index of the matrix $A$. It is a well-known fact (see [5], [32]) that the space $\mathbb{C}^n$ can be decomposed into a direct sum of the range and the null space of $A^p$, i.e., $\mathbb{C}^n = \operatorname{ran}(A^p) \oplus \ker(A^p)$, and then the matrix $A$ has the following form (the core-nilpotent decomposition):
\[ A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix} : \begin{pmatrix} \operatorname{ran}(A^p) \\ \ker(A^p) \end{pmatrix} \to \begin{pmatrix} \operatorname{ran}(A^p) \\ \ker(A^p) \end{pmatrix}, \]
where $A_1$ is invertible, $A_2$ is nilpotent, and $A_2^p = 0$. Suppose the unknown matrix $X$ has the following form:
\[ X = \begin{pmatrix} X_1 & X_2 \\ X_3 & X_4 \end{pmatrix} : \begin{pmatrix} \operatorname{ran}(A^p) \\ \ker(A^p) \end{pmatrix} \to \begin{pmatrix} \operatorname{ran}(A^p) \\ \ker(A^p) \end{pmatrix}. \]
We once again return to the previous method for solving the YBME:
(1) We solve $AXA = 0$ to find $X$:
\[ AXA = \begin{pmatrix} A_1X_1A_1 & A_1X_2A_2 \\ A_2X_3A_1 & A_2X_4A_2 \end{pmatrix} = 0 \iff X_1 = 0,\ X_2A_2 = 0,\ A_2X_3 = 0,\ A_2X_4A_2 = 0. \]
By Theorem 5.3, it can be rewritten as
\[ X = \begin{pmatrix} 0 & X_2 \\ X_3 & X_4 \end{pmatrix} = \begin{pmatrix} 0 & Y_2(I - A_2A_2^-) \\ (I - A_2^-A_2)Y_3 & Y_4 - A_2^-A_2Y_4A_2A_2^- \end{pmatrix}, \]
for arbitrary matrices $Y_2$, $Y_3$, $Y_4$.
(2) We check whether $XAX = 0$:
\[ XAX = 0 \iff (I - A_2^-A_2)(Y_3A_1Y_2 + Y_4A_2Y_4)(I - A_2A_2^-) = 0. \]
The last equality holds in some cases, among which are the following:
• $Y_2 = 0$ and $Y_4 = A_2^q$, where $q \ge (p-1)/2$;
• $Y_3 = 0$ and $Y_4 = A_2^q$, where $q \ge (p-1)/2$;
• if we use a reflexive inverse $A_2^+$ instead of an inner inverse $A_2^-$, we may take $Y_2$ or $Y_3$ to be $A_2^+$, and $Y_4 = A_2^q$, where $q \ge (p-1)/2$;
hence we find some families of solutions: for any $q \ge (p-1)/2$ the following parametric matrices are solutions to the YBME:
\[ \begin{pmatrix} 0 & 0 \\ (I - A_2^-A_2)Y_3 & A_2^q - A_2^-A_2^{q+2}A_2^- \end{pmatrix}, \qquad \begin{pmatrix} 0 & Y_2(I - A_2A_2^-) \\ 0 & A_2^q - A_2^-A_2^{q+2}A_2^- \end{pmatrix}, \]
\[ \begin{pmatrix} 0 & 0 \\ (I - A_2^+A_2)Y_3 & A_2^q - A_2^+A_2^{q+2}A_2^+ \end{pmatrix}, \qquad \begin{pmatrix} 0 & Y_2(I - A_2A_2^+) \\ 0 & A_2^q - A_2^+A_2^{q+2}A_2^+ \end{pmatrix}. \]
The above-derived solutions can be used to generate more solutions. The following theorem provides one way of producing a new solution:

Theorem 5.4 ([15]) For a given $A \in M_n(\mathbb{C})$, if $X_1, X_2 \in \mathcal{S}$ and $X_1A^-X_2 = X_2AX_1$, then $X_1AX_2 \in \mathcal{S}$.

Proof Let $AX_1A = X_1AX_1$ and $AX_2A = X_2AX_2$. Suppose that $X_1A^-X_2 = X_2AX_1$; then we have
\[ \begin{aligned}
(X_1AX_2)A(X_1AX_2) - A(X_1AX_2)A &= X_1AX_2AX_1AX_2 - AX_1AA^-AX_2A \\
&= X_1AX_2AX_1AX_2 - X_1AX_1A^-X_2AX_2 \\
&= X_1A(X_2AX_1 - X_1A^-X_2)AX_2 = 0,
\end{aligned} \]
hence $X_1AX_2$ is a solution. □
At last, we finish this section with the group inverse. For a given square matrix $A \in M_n(\mathbb{C})$, if there exists a matrix $B \in M_n(\mathbb{C})$ such that
\[ ABA = A, \quad BAB = B, \quad AB = BA, \]
then $B$ is called a group inverse for $A$ and is commonly denoted by $A^{\#}$. It is a well-known fact (see [8]) that $A^{\#}$ exists if and only if $\operatorname{ind}(A) \le 1$ and in that case it is unique. Of course, for the invertible $A$, we have $A^{\#} = A^{-1}$.

Theorem 5.5 If $\operatorname{ind}(A) \le 1$ and $X_0 \in \mathcal{S}$, then $A^nX_0(A^{\#})^n \in \mathcal{S}$.

Proof Let $AX_0A = X_0AX_0$; then we have
\[ A\big(A^nX_0(A^{\#})^n\big)A - \big(A^nX_0(A^{\#})^n\big)A\big(A^nX_0(A^{\#})^n\big) = A^n(AX_0A - X_0AX_0)(A^{\#})^n = 0, \]
because
\[ (A^{\#})^nA^{n+1} = (A^{\#})^{n-1}A^n = \dots = A^{\#}A^2 = A. \] □

Remark 5.6 Instead of the group inverse, any generalized inverse $A^g$ such that $(A^g)^nA^{n+1} = A$ can be used. For example, this is true for the core inverse but is not true for the dual core inverse, the Moore-Penrose inverse, and the Drazin inverse. For more details on the core and dual core inverse, please see [3, 41, 42].
5.2
Nilpotent Coefficient Matrix A
As shown in the previous section, the core-nilpotent decomposition of the coefficient matrix $A$ can drastically simplify the calculations needed to solve the YBME. Because of this, we dedicate a separate section to the case when $A$ itself is a nilpotent matrix. Recall that a (nonzero) matrix $A \in M_n(\mathbb{C})$ is nilpotent if $A^k = 0$ for some positive integer $k$. The smallest such $k$ is called the nilpotency index of a matrix. Unless stated differently, when we say that $A$ is nilpotent with its nilpotency index $k$, we assume that $k > 1$, to avoid the trivial case $A = 0$. It follows from the spectral mapping theorem that $\sigma(A) = \{0\}$ whenever $A$ is nilpotent. Consequently, there exists a nontrivial solution $X_0$ to the
YBME in this case: recall Theorem 5.1 and Proposition 5.2. In particular, $X_0$ could be provided as $X_0 = P_0A$, where $P_0$ is a projector from $\mathbb{C}^n$ onto $\ker(A)$, or as $X = B_1$ or $X = B_2$, where $B_1$ and $B_2$ are nonzero matrices which are zero divisors for the nilpotent matrix $A$: $AB_1 = B_2A = 0$. Note that if $X_0$ is chosen to be $X_0 = P_0A$, then it is a commuting solution, while if $X_0$ is an arbitrary left or right zero divisor for $A$ (if $X_0 = B_1$ or $X_0 = B_2$), then $X_0$ could be a non-commuting solution, since in general
\[ AB = 0 \not\Leftrightarrow BA = 0. \]
That being said, one initial solution is enough to generate a trajectory of solutions. None of the nontrivial solutions are isolated in this case:

Theorem 5.7 ([15]) Let $A$ be of nilpotency index $k > 1$ and let $X_0 \in \mathcal{S}$. Then $X_0 + aA^{k-1} \in \mathcal{S}$, where $a \in \mathbb{C}$.

Proof Indeed, we have
\[ A(X_0 + aA^{k-1})A - (X_0 + aA^{k-1})A(X_0 + aA^{k-1}) = AX_0A - X_0AX_0 = 0. \] □

The previous theorem also applies to the trivial solutions $X_0 = 0$ and $X_0 = A$:

Corollary 5.8 ([15]) Let $A$ be of nilpotency index $k > 1$. Then
\[ \{\alpha A + \beta A^{k-1} : \alpha, \beta \in \mathbb{C}\} \subset \mathcal{S}(A). \]
The following result is convenient if the initial solution $X_0$ is non-commuting:

Corollary 5.9 ([15]) Let $A$ be of nilpotency index $k > 1$ and let $X_0 \in \mathcal{S}$. Then
\[ (I - A^p)X_0(I + A^p) \in \mathcal{S}, \]
for any positive integer $(k-1)/2 \le p \le k$.

Proof Because $p \ge (k-1)/2$, it is not difficult to see that
\[ (I + A^p)A(I - A^p) = A. \]
Therefore
\[ \begin{aligned}
A(I - A^p)X_0(I + A^p)A &= (I - A^p)AX_0A(I + A^p) = (I - A^p)X_0AX_0(I + A^p) \\
&= (I - A^p)X_0(I + A^p)A(I - A^p)X_0(I + A^p),
\end{aligned} \]
i.e., the matrix $(I - A^p)X_0(I + A^p)$ is also a solution to the YBME. □
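Theorem 5.7 and Corollary 5.9 can be tried out on a small nilpotent matrix. The following sketch is an illustration under assumptions chosen here: $A$ is the $3 \times 3$ nilpotent Jordan block (so $k = 3$), the starting solution `X0` is a right zero divisor picked by hand, and the scalar `2.5` is arbitrary.

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])     # nilpotent, nilpotency index k = 3
k = 3
I3 = np.eye(3)

def is_solution(X):
    return np.allclose(A @ X @ A, X @ A @ X)

# Non-commuting starting solution: A @ X0 = 0, while X0 @ A != 0.
X0 = np.zeros((3, 3))
X0[0, 1] = 1.0

# Theorem 5.7: shift along A^(k-1).
shifted = X0 + 2.5 * np.linalg.matrix_power(A, k - 1)

# Corollary 5.9 with p = 1, which satisfies (k - 1)/2 <= p <= k.
p = 1
Ap = np.linalg.matrix_power(A, p)
conjugated = (I3 - Ap) @ X0 @ (I3 + Ap)
```

Both `shifted` and `conjugated` should again pass `is_solution`; since `X0` does not commute with `A`, `conjugated` differs from `X0` (compare Remark 5.10).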
Remark 5.10 Note that if $X_0$ is a commuting solution, then $(I - A^p)X_0(I + A^p) = X_0$.

Theorem 5.11 ([15]) Let $A$ be nilpotent with its nilpotency index $k = 3$; let $X_0 \in \mathcal{S}$ such that $A^2X_0 + X_0A^2 = 0$. Then
\[ X_0 + nA \in \mathcal{S}, \quad n \in \mathbb{N}. \]
Proof Note that $\sigma(A) = \{0\}$, so $A^2X + XA^2 = 0$ has infinitely many solutions. Let $X_0$ be a solution to the YBME (1.1), and let $A^2X_0 + X_0A^2 = 0$ hold. By Corollary 3.7, $X_1 = X_0 + A$ is a solution. Then we have
\[ A^2X_1 + X_1A^2 = A^2(X_0 + A) + (X_0 + A)A^2 = 2A^3 + A^2X_0 + X_0A^2 = 0, \]
because $A^3 = 0$, so by Corollary 3.7 the matrix $X_2 = X_1 + A = X_0 + 2A$ is a solution. Let $X_n := X_0 + nA$ be a solution. Since
\[ A^2X_n + X_nA^2 = A^2(X_0 + nA) + (X_0 + nA)A^2 = 2nA^3 + A^2X_0 + X_0A^2 = 0, \]
by Corollary 3.7 $X_{n+1} = X_n + A = X_0 + (n+1)A$ is a solution. Mathematical induction completes the proof. □

Example 14 For $A = \begin{pmatrix} 0 & 2 & 0 \\ 0 & 0 & i \\ 0 & 0 & 0 \end{pmatrix}$, all solutions to $A^2X + XA^2 = 0$ are
\[ X = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ 0 & x_{22} & x_{23} \\ 0 & 0 & -x_{11} \end{pmatrix}. \]
Such $X$ is a solution to the YBME iff $x_{11}x_{22} = 0$ and $ix_{11}x_{12} - 2x_{11}x_{23} + 2ix_{22} = 0$, i.e., there are several possibilities:
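1. if $x_{11} = x_{22} = 0$, then we obtain the three-parameter family of solutions
\[ X(x_{12}, x_{13}, x_{23}) = \begin{pmatrix} 0 & x_{12} & x_{13} \\ 0 & 0 & x_{23} \\ 0 & 0 & 0 \end{pmatrix}; \]
2. if $x_{11} \neq 0$, $x_{22} = 0$, then $x_{23} = ix_{12}/2$ and the three-parameter family of solutions is
\[ X(x_{11}, x_{12}, x_{13}) = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ 0 & 0 & ix_{12}/2 \\ 0 & 0 & -x_{11} \end{pmatrix}. \]
By Theorem 5.11, we have:
1. if $x_{11} = x_{22} = 0$, then the three-parameter family of solutions is
\[ X(x_{12}, x_{13}, x_{23}) + nA = \begin{pmatrix} 0 & x_{12} + 2n & x_{13} \\ 0 & 0 & x_{23} + in \\ 0 & 0 & 0 \end{pmatrix}; \]
2. if $x_{11} \neq 0$, $x_{22} = 0$, then $x_{23} = ix_{12}/2$ and the three-parameter family of solutions is
\[ X(x_{11}, x_{12}, x_{13}) + nA = \begin{pmatrix} x_{11} & x_{12} + 2n & x_{13} \\ 0 & 0 & ix_{12}/2 + in \\ 0 & 0 & -x_{11} \end{pmatrix}. \] ♣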
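The two families of Example 14 and their shifts by $nA$ can be verified numerically. The sketch below assumes NumPy; the function names `family_1` and `family_2` and the sample parameter values are hypothetical.

```python
import numpy as np

A = np.array([[0, 2, 0],
              [0, 0, 1j],
              [0, 0, 0]], dtype=complex)     # A^3 = 0, A^2 != 0

def is_solution(X):
    return np.allclose(A @ X @ A, X @ A @ X)

def anticommutes_with_A2(X):
    A2 = A @ A
    return np.allclose(A2 @ X + X @ A2, 0)

def family_1(x12, x13, x23):
    """Case x11 = x22 = 0."""
    return np.array([[0, x12, x13],
                     [0, 0, x23],
                     [0, 0, 0]], dtype=complex)

def family_2(x11, x12, x13):
    """Case x11 != 0, x22 = 0, which forces x23 = i * x12 / 2."""
    return np.array([[x11, x12, x13],
                     [0, 0, 1j * x12 / 2],
                     [0, 0, -x11]], dtype=complex)

samples = [family_1(1 + 2j, 3, -1j), family_2(2, 1 - 1j, 4)]
shifted = [X + n * A for X in samples for n in range(5)]
```

Every matrix in `shifted` should pass both `is_solution` and `anticommutes_with_A2`, in agreement with Theorem 5.11.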
Even though the previous results provide a way for generating infinitely many solutions, the general problem (finding all solutions in this case) remains open. We point out paper [57], where all solutions were given, provided that $A^2 = 0$ and its rank is equal to 1 or 2.
5.3
Commuting Solutions When A Is Singular
In this section we analyze the set of all commuting solutions,
\[ \mathcal{S}_c(A) = \{X \in M_n(\mathbb{C}) : AXA = XAX \wedge AX = XA\}. \]
The idea is to use the fact that
\[ \mathcal{S}_c(A) = \mathcal{S}_{Syl}(A; A; 0) \cap \mathcal{S}(A), \]
where $\mathcal{S}_{Syl}(A; A; 0)$ denotes the set of all solutions to the homogeneous Sylvester equation $AX = XA$. Let $A$ now be singular, and suppose that it is of the form
\[ A = \operatorname{diag}\{J_{n_1}(0), J_{n_2}(0), \dots, J_{n_p}(0)\}, \]
where $n_1 \ge n_2 \ge \dots \ge n_p > 0$. By Corollary 3.6, we have $X = [X_{ij}]_{p \times p}$, where each block $X_{ij}$ is an $n_i \times n_j$ upper triangular Toeplitz matrix padded with zeros (compare the explicit form in Example 15); in particular, the diagonal blocks are $X_{ii} = p_{n_i - 1}(J_{n_i}(0))$.

Let us find the number of parameters appearing in $X$. First we consider $X_{11} = p_{n_1 - 1}(J_{n_1}(0))$, and see that it depends on $n_1$ complex parameters. Then we consider its neighbor blocks: $X_{12}$, $X_{21}$, $X_{22}$. We see that both $X_{12}$ and $X_{21}$ depend on $\min\{n_1, n_2\} = n_2$ parameters, and $X_{22} = p_{n_2 - 1}(J_{n_2}(0))$ on $n_2$ parameters as well; hence their contribution is $3n_2$. By continuing in such a manner, we conclude that the solution $X$ depends on
\[ n_1 + 3n_2 + 5n_3 + 7n_4 + \dots + (2p - 1)n_p = \sum_{j=1}^{p} (2j - 1)n_j \tag{5.1} \]
complex parameters.
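The count (5.1) is the dimension of the solution space of $AX = XA$, so it can be cross-checked by computing the nullity of the linear map $X \mapsto AX - XA$. The sketch below is an illustration under stated assumptions: it uses NumPy, a Kronecker-product representation with row-major vectorization, and hypothetical helper names.

```python
import numpy as np

def nilpotent_jordan(sizes):
    """diag(J_{n1}(0), ..., J_{np}(0)) for the given block sizes."""
    n = sum(sizes)
    A = np.zeros((n, n))
    start = 0
    for m in sizes:
        for r in range(start, start + m - 1):
            A[r, r + 1] = 1.0
        start += m
    return A

def commutant_dimension(A):
    """Dimension of {X : AX = XA}, computed as the nullity of a Kronecker matrix."""
    n = A.shape[0]
    I = np.eye(n)
    # Row-major vectorization: vec(AX - XA) = (A kron I - I kron A^T) vec(X).
    K = np.kron(A, I) - np.kron(I, A.T)
    return n * n - np.linalg.matrix_rank(K)

def count_from_formula(sizes):
    """Right-hand side of (5.1); sizes must be sorted in non-increasing order."""
    return sum((2 * j - 1) * m for j, m in enumerate(sizes, start=1))
```

For block sizes sorted in non-increasing order, `commutant_dimension(nilpotent_jordan(sizes))` and `count_from_formula(sizes)` should agree.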
Now we put such $X$ into the equation $A^2X = AX^2$. Recall that multiplying from the left side by a Jordan zero-block matrix shifts the rows of the observed matrix up and adds the zero row below. We see that, on the left-hand side, squared Jordan zero blocks multiply the corresponding blocks of $X$, which corresponds to shifting the blocks in $X$ twice. Similarly, on the right-hand side, we see that Jordan zero blocks multiply the blocks from $X^2$, which corresponds to shifting the blocks once. This implies that all elements on the main diagonal that correspond to some $J_w(0)$, $w > 1$, of $A$ must be zero; hence we slightly refine (5.1) to be
\[ n_1 + 3n_2 + 5n_3 + 7n_4 + \dots + (2p - 1)n_p - q = \sum_{j=1}^{p} (2j - 1)n_j - q, \]
where $q$ is the number of $J_w(0)$ blocks in the Jordan normal form of $A$ with $w > 1$. It is obvious that commuting solutions are not isolated.

Example 15 Let $A = \operatorname{diag}(J_4(0), J_3(0), J_2(0), J_1(0), J_1(0))$. By [14, Theorem 3.5] we have the following form for $X$:
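\[ X = \left( \begin{array}{ccccccccccc}
x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 & x_8 & x_9 & x_{10} & x_{11} \\
0 & x_1 & x_2 & x_3 & 0 & x_5 & x_6 & 0 & x_8 & 0 & 0 \\
0 & 0 & x_1 & x_2 & 0 & 0 & x_5 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & x_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & x_{12} & x_{13} & x_{14} & x_{15} & x_{16} & x_{17} & x_{18} & x_{19} & x_{20} & x_{21} \\
0 & 0 & x_{12} & x_{13} & 0 & x_{15} & x_{16} & 0 & x_{18} & 0 & 0 \\
0 & 0 & 0 & x_{12} & 0 & 0 & x_{15} & 0 & 0 & 0 & 0 \\
0 & 0 & x_{22} & x_{23} & 0 & x_{24} & x_{25} & x_{26} & x_{27} & x_{28} & x_{29} \\
0 & 0 & 0 & x_{22} & 0 & 0 & x_{24} & 0 & x_{26} & 0 & 0 \\
0 & 0 & 0 & x_{30} & 0 & 0 & x_{31} & 0 & x_{32} & x_{33} & x_{34} \\
0 & 0 & 0 & x_{35} & 0 & 0 & x_{36} & 0 & x_{37} & x_{38} & x_{39}
\end{array} \right), \]
and we see that it depends on
\[ \sum_{k=1}^{5} (2k - 1)n_k = 4 + 3 \cdot 3 + 5 \cdot 2 + 7 \cdot 1 + 9 \cdot 1 = 39 \]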
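in general complex parameters. When we put this $X$ into the YBME, because of the multiplication by zero Jordan blocks, it follows that $x_1 = x_{15} = x_{26} = 0$ (i.e., $q = 3$), so we are dealing with 36 parameters. Moreover, we arrive at the following set of equations:
\[ \begin{aligned}
x_{12}x_5 &= 0, \\
x_2 - x_2^2 - x_{13}x_5 - x_{12}x_6 - x_{22}x_8 &= 0, \\
x_5 - x_{16}x_5 - x_2x_5 - x_{24}x_8 &= 0, \\
x_{18}x_5 &= 0, \\
x_{12} - x_{12}x_{16} - x_{12}x_2 - x_{18}x_{22} &= 0, \\
x_{18}x_{24} &= 0, \\
x_{12}x_{24} &= 0.
\end{aligned} \]
We see that those nonlinear equations include only 10 variables (namely, $x_2, x_5, x_6, x_8, x_{12}, x_{13}, x_{16}, x_{18}, x_{22}, x_{24}$), which means that there are 26 free parameters (they are shown in bold in the matrix below).
\[ \left( \begin{array}{ccccccccccc}
0 & x_2 & \mathbf{x}_3 & \mathbf{x}_4 & x_5 & x_6 & \mathbf{x}_7 & x_8 & \mathbf{x}_9 & \mathbf{x}_{10} & \mathbf{x}_{11} \\
0 & 0 & x_2 & \mathbf{x}_3 & 0 & x_5 & x_6 & 0 & x_8 & 0 & 0 \\
0 & 0 & 0 & x_2 & 0 & 0 & x_5 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & x_{12} & x_{13} & \mathbf{x}_{14} & 0 & x_{16} & \mathbf{x}_{17} & x_{18} & \mathbf{x}_{19} & \mathbf{x}_{20} & \mathbf{x}_{21} \\
0 & 0 & x_{12} & x_{13} & 0 & 0 & x_{16} & 0 & x_{18} & 0 & 0 \\
0 & 0 & 0 & x_{12} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & x_{22} & \mathbf{x}_{23} & 0 & x_{24} & \mathbf{x}_{25} & 0 & \mathbf{x}_{27} & \mathbf{x}_{28} & \mathbf{x}_{29} \\
0 & 0 & 0 & x_{22} & 0 & 0 & x_{24} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \mathbf{x}_{30} & 0 & 0 & \mathbf{x}_{31} & 0 & \mathbf{x}_{32} & \mathbf{x}_{33} & \mathbf{x}_{34} \\
0 & 0 & 0 & \mathbf{x}_{35} & 0 & 0 & \mathbf{x}_{36} & 0 & \mathbf{x}_{37} & \mathbf{x}_{38} & \mathbf{x}_{39}
\end{array} \right). \]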
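For example, let $x_5 \neq 0$; then $x_{12} = x_{18} = 0$ and $x_{22}$, $x_{24}$ are arbitrary. It follows that
\[ x_{16} = 1 - x_2 - \frac{x_8x_{24}}{x_5}, \qquad x_{13} = \frac{x_2 - x_2^2 - x_8x_{22}}{x_5}, \]
and one family of commuting solutions is (recall that $x_5 \neq 0$)
\[ \left( \begin{array}{ccccccccccc}
0 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 & x_8 & x_9 & x_{10} & x_{11} \\
0 & 0 & x_2 & x_3 & 0 & x_5 & x_6 & 0 & x_8 & 0 & 0 \\
0 & 0 & 0 & x_2 & 0 & 0 & x_5 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & x_{13} & x_{14} & 0 & x_{16} & x_{17} & 0 & x_{19} & x_{20} & x_{21} \\
0 & 0 & 0 & x_{13} & 0 & 0 & x_{16} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & x_{22} & x_{23} & 0 & x_{24} & x_{25} & 0 & x_{27} & x_{28} & x_{29} \\
0 & 0 & 0 & x_{22} & 0 & 0 & x_{24} & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & x_{30} & 0 & 0 & x_{31} & 0 & x_{32} & x_{33} & x_{34} \\
0 & 0 & 0 & x_{35} & 0 & 0 & x_{36} & 0 & x_{37} & x_{38} & x_{39}
\end{array} \right). \]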
♣

For a more detailed inspection on this problem, we refer to the paper [46] by Shen, Wei, and Jia. The above discussion shows that it is difficult to address the commuting solutions when $A$ is singular. In that sense, it is adequate to simplify the problem by reducing the coefficient matrix $A$ to a form that is easier to analyze. One way this is achieved is via the core-nilpotent decomposition of $A$. We then have
\[ A = \operatorname{diag}(A_1, A_0), \]
where $\sigma(A_0) = \{0\}$ and $A_0$ is nilpotent, while $A_1$ is regular, i.e., $0 \notin \sigma(A_1)$. Respectively, any matrix $X$ can be partitioned as
\[ X = \begin{pmatrix} X_1 & X_2 \\ X_3 & X_0 \end{pmatrix}. \]
In that case, the equality $AX = XA$ is equivalent to the following four uncoupled homogeneous Sylvester equations:
\[ A_1X_1 = X_1A_1, \quad A_1X_2 = X_2A_0, \quad A_0X_3 = X_3A_1, \quad A_0X_0 = X_0A_0. \]
Since $\sigma(A_0) \cap \sigma(A_1) = \emptyset$, it follows that $X_2 = 0$ and $X_3 = 0$, so we write
\[ X = \operatorname{diag}(X_1, X_0). \]
When we plug such $X$ into the YBME, we arrive at the following two uncoupled YBMEs:
\[ A_1X_1A_1 = X_1A_1X_1, \qquad A_0X_0A_0 = X_0A_0X_0, \]
or, when we use the commutativity $A_1X_1 = X_1A_1$, $A_0X_0 = X_0A_0$:
\[ A_1^2X_1 = A_1X_1^2, \qquad A_0^2X_0 = A_0X_0^2. \]
Once we solve these two equations, we can construct the commutative solution set of the original YBME:
\[ \mathcal{S}_c(\operatorname{diag}(A_1, A_0)) = \operatorname{diag}(\mathcal{S}_c(A_1), \mathcal{S}_c(A_0)). \]
Since $A_1$ is invertible, the first equation $A_1X_1A_1 = X_1A_1X_1$ is completely closed in Section 4.1. The second equation, $A_0X_0A_0 = X_0A_0X_0$, which deals with the nilpotent matrix $A_0$, was considered in the previous Section 5.2. Recall that not all solutions are provided for the nilpotent matrix $A_0$, but there are infinitely many of them to choose from.
6 Non-commuting Solutions

The methods for generating new non-commuting solutions from the existing ones rely on the matrix functions; hence we will recall some basic facts about the primary and nonprimary matrix functions. The main results in this section were obtained by the authors in [15].
6.1
Intermezzo: Elements of the Matrix Functions Theory
This subsection is mainly based on [31] and [33], and its aim is to provide a brief reminder to the reader about the matrix functions theory. For a matrix $A \in M_n(\mathbb{C})$ with distinct eigenvalues $\lambda_1, \dots, \lambda_s$, a matrix function $f$ is said to be well defined on $\sigma(A)$ if the values
$$f^{(j)}(\lambda_i), \quad j = 0, \dots, n_i - 1, \quad i = 1, \dots, s,$$
exist, where $f^{(j)}(\lambda_i)$ denotes the $j$-th derivative of $f$ at the point $\lambda_i$ and $n_i$ is the dimension of the corresponding maximal Jordan block. When a function $f$ is well defined on $\sigma(A)$, there exists a unique matrix $f(A)$ which is determined via the matrix function $f$ at the point $A$. More precisely, if a matrix $A \in M_n(\mathbb{C})$ has the following Jordan normal form that contains $p$ Jordan blocks (with $s \le p$)
$$T^{-1}AT = J = \operatorname{diag}(J_1(\lambda_1), \dots, J_p(\lambda_p)),$$
where $T$ is invertible, $J_i(\lambda_i) \in M_{n_i}(\mathbb{C})$, and $n_1 + \dots + n_p = n$, then
$$f(A) := Tf(J)T^{-1} = T\operatorname{diag}\big(f(J_1(\lambda_1)), \dots, f(J_p(\lambda_p))\big)T^{-1},$$
where
$$f(J_k(\lambda_k)) := \begin{bmatrix} f(\lambda_k) & f'(\lambda_k) & \cdots & \dfrac{f^{(n_k-1)}(\lambda_k)}{(n_k-1)!} \\ 0 & f(\lambda_k) & \cdots & \dfrac{f^{(n_k-2)}(\lambda_k)}{(n_k-2)!} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(\lambda_k) \end{bmatrix}, \quad k = 1, \dots, p.$$
Another way of defining $f(A)$ is by using the Hermite interpolating polynomial, given explicitly by the Lagrange-Hermite formula
$$p(t) = \sum_{i=1}^{s}\left[\sum_{j=0}^{n_i-1}\frac{1}{j!}\,\phi_i^{(j)}(\lambda_i)(t-\lambda_i)^j\right]\prod_{j \neq i}(t-\lambda_j)^{n_j},$$
where
$$\phi_i(t) = f(t)\Big/\prod_{j \neq i}(t-\lambda_j)^{n_j}.$$
The third way to define $f(A)$ is appropriate for a function $f$ which is analytic inside a closed contour $\Gamma$ that encloses $\sigma(A)$. Respectively, the matrix $f(A)$ is given via the Cauchy integral
$$f(A) = \frac{1}{2\pi i}\oint_{\Gamma} f(z)(zI - A)^{-1}\,dz.$$
All three definitions are essentially equivalent, and a matrix function obtained by any of them is called a primary matrix function. If we are dealing with a multivalued function, e.g., a complex square root or logarithm, then the same branch must be taken for all Jordan blocks of $A$ which correspond to the same eigenvalue. Note that every primary matrix function of $A$ is a polynomial in $A$; thus it always commutes with $A$. A matrix function $f$ is said to be a nonprimary function if the mapping $A \mapsto f(A)$ cannot be expressed in the above manner, i.e., as a polynomial in $A$. In this chapter we restrict ourselves to those nonprimary functions which stem from multivalued complex scalar functions. More precisely, we say that the mapping $A \mapsto f(A)$ is a nonprimary matrix function if there exists a multivalued complex scalar function $f$ such that different branches of $f$ are chosen for the Jordan blocks of $A$ which correspond to the same eigenvalue. Such functions depend on the similarity matrix $T$ that appears in the Jordan form of $A$, and they cannot be expressed as a polynomial in $A$. Nevertheless, they do commute with the matrix $A$. Such nonprimary functions were studied in [31]. We emphasize that under our restrictions on both primary and nonprimary matrix functions $f$, the matrix $A$ always commutes with $f(A)$. Moreover, for any primary function $f$, the matrix $f(A)$ commutes with any matrix $B$ that commutes with $A$, and the spectral mapping theorem holds, i.e., $\sigma(f(A)) = f(\sigma(A))$.
6.2 Generating New Non-commuting Solutions
The results presented in this section demonstrate how to generate infinitely many new non-commuting solutions starting from one non-commuting solution. Consequently, we conclude that none of the non-commuting solutions are ever isolated. The techniques exhibited below hold for both regular and singular coefficient matrix A.
Theorem 6.1 [15] For a given square matrix $A$, let $X_0 \in \mathcal{S}$ and let $f$ be a well-defined function on $\sigma(A)$ such that $f$ has no zeros in $\sigma(A)$. Then
$$f(A)X_0f(A)^{-1} \in \mathcal{S}.$$
Proof Since $f$ has no zeros in $\sigma(A)$, it follows that $f(A)$ is invertible. Further, since $A$ and $f(A)$ commute, it follows that $A$ and $(f(A))^{-1}$ commute as well. Let $AX_0A = X_0AX_0$. Then
$$f(A)X_0(f(A))^{-1}Af(A)X_0(f(A))^{-1} - Af(A)X_0(f(A))^{-1}A = f(A)(X_0AX_0 - AX_0A)(f(A))^{-1} = 0,$$
hence $f(A)X_0(f(A))^{-1} \in \mathcal{S}$. □
Remark 6.2 Again, note that if $X_0$ commutes with $A$, then $X_0$ commutes with $f(A)$ and $f(A)X_0(f(A))^{-1}$ reduces to $X_0$ once again.
Example 16 It is known that for $A = J_2(3)$ (note that $\sigma(A) = \{3\}$) one family of solutions is
$$X(t) = \begin{bmatrix} t & (t/3 - 1)^2 \\ -9 & 6 - t \end{bmatrix}, \quad t \in \mathbb{C}.$$
Fix one $t_0 \in \mathbb{C}$. Suppose that $f$ is such that $f(3) \neq 0$, so the requirements of Theorem 6.1 are fulfilled. Hence,
$$f(A)X(t_0)f(A)^{-1} = \begin{bmatrix} f(3) & f'(3) \\ 0 & f(3) \end{bmatrix}\begin{bmatrix} t_0 & (t_0/3 - 1)^2 \\ -9 & 6 - t_0 \end{bmatrix}\begin{bmatrix} 1/f(3) & -f'(3)/f(3)^2 \\ 0 & 1/f(3) \end{bmatrix}$$
$$= \begin{bmatrix} t_0 - 9f'(3)/f(3) & \big((t_0 - 9f'(3)/f(3))/3 - 1\big)^2 \\ -9 & 6 - t_0 + 9f'(3)/f(3) \end{bmatrix} = X\big(t_0 - 9f'(3)/f(3)\big),$$
which is indeed a solution.
♣
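The computation in Example 16 can be checked in exact rational arithmetic. The sketch below is only an illustration: the choice $f(z) = z^2 + 1$ (so $f(3) = 10$, $f'(3) = 6$), the value $t_0 = 5$, and the helper names `matmul`, `solves_ybme` and `X_family` are assumptions made here, not part of the original text.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def solves_ybme(A, X):
    """Exact test of the Yang-Baxter-like equation A X A == X A X."""
    return matmul(matmul(A, X), A) == matmul(matmul(X, A), X)

def X_family(t):
    """The solution family X(t) of Example 16 for A = J_2(3)."""
    t = F(t)
    return [[t, (t / 3 - 1) ** 2], [F(-9), 6 - t]]

A = [[F(3), F(1)], [F(0), F(3)]]

# Assumed test function f(z) = z^2 + 1, hence f(3) = 10 and f'(3) = 6.
f3, df3 = F(10), F(6)
fA = [[f3, df3], [F(0), f3]]                          # f(J_2(3))
fA_inv = [[1 / f3, -df3 / f3 ** 2], [F(0), 1 / f3]]   # its inverse

t0 = F(5)
conjugated = matmul(matmul(fA, X_family(t0)), fA_inv)  # f(A) X(t0) f(A)^{-1}
shifted = X_family(t0 - 9 * df3 / f3)                  # X(t0 - 9 f'(3)/f(3))
```

With these choices `conjugated` and `shifted` coincide, and both satisfy the equation, in agreement with the closed form derived in the example.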
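Example 17 Recall Example 9, where $A = \operatorname{diag}(\lambda, \mu)$, with $\lambda\mu(\lambda - \mu) \neq 0$ (note that $\sigma(A) = \{\lambda, \mu\}$); one family of solutions is
$$X(t) = \begin{bmatrix} \dfrac{\mu^2}{\mu - \lambda} & t \\ -\dfrac{\lambda\mu(\lambda^2 - \lambda\mu + \mu^2)}{(\lambda - \mu)^2\, t} & \dfrac{\lambda^2}{\lambda - \mu} \end{bmatrix}, \quad t \in \mathbb{C} \setminus \{0\}.$$
For a fixed $t_0 \in \mathbb{C} \setminus \{0\}$, suppose that $f$ is given such that $f(\lambda)f(\mu) \neq 0$, so the requirements of Theorem 6.1 are fulfilled. Hence,
$$f(A)X(t_0)f(A)^{-1} = \begin{bmatrix} f(\lambda) & 0 \\ 0 & f(\mu) \end{bmatrix}\begin{bmatrix} \dfrac{\mu^2}{\mu - \lambda} & t_0 \\ -\dfrac{\lambda\mu(\lambda^2 - \lambda\mu + \mu^2)}{(\lambda - \mu)^2\, t_0} & \dfrac{\lambda^2}{\lambda - \mu} \end{bmatrix}\begin{bmatrix} \dfrac{1}{f(\lambda)} & 0 \\ 0 & \dfrac{1}{f(\mu)} \end{bmatrix}$$
$$= \begin{bmatrix} \dfrac{\mu^2}{\mu - \lambda} & t_0\dfrac{f(\lambda)}{f(\mu)} \\ -\dfrac{\lambda\mu(\lambda^2 - \lambda\mu + \mu^2)}{(\lambda - \mu)^2\, t_0\dfrac{f(\lambda)}{f(\mu)}} & \dfrac{\lambda^2}{\lambda - \mu} \end{bmatrix} = X\left(t_0\frac{f(\lambda)}{f(\mu)}\right),$$
which is indeed a solution to (1.1). Let us consider another family of solutions,
$$X(t) = \begin{bmatrix} \alpha\mu & 0 \\ t & \mu \end{bmatrix},$$
where $\alpha = (1 \pm i\sqrt{3})/2$. In a similar way as before, we have
$$f(A)X(t_0)f(A)^{-1} = X\left(\frac{f(\mu)}{f(\alpha\mu)}\,t_0\right),$$
which is also a solution to (1.1).
♣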
Theorem 6.3 [15] Let $A$ be an arbitrary square matrix. There are no isolated nontrivial solutions to the YBME (1.1) which do not commute with $A$.
Proof Let $X_0$ be a nontrivial solution that does not commute with $A$. Consider the matrix function $f(A) = e^{At}$, where $t$ is an arbitrary complex parameter. Then $f(A)$ is invertible, regardless of whether $A$ is invertible or not. According to Theorem 6.1, $X(t) := e^{At}X_0e^{-At}$ is a path-connected family of nontrivial solutions to (1.1). Thus $X_0 = X(0)$ is not an isolated solution. □
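As a worked instance of this path, one may substitute $f(z) = e^{z\tau}$ into the formula of Example 16. This is only a sketch: the symbol $\tau$ for the path parameter is introduced here to avoid a clash with the family parameter $t$ of that example, and $X(\cdot)$ below denotes the family from Example 16, not the path from the proof.

```latex
f(z) = e^{z\tau}, \qquad \frac{f'(3)}{f(3)} = \tau
\quad\Longrightarrow\quad
e^{A\tau}\, X(t_0)\, e^{-A\tau} = X(t_0 - 9\tau), \qquad A = J_2(3).
```

Under these assumptions, conjugation by $e^{A\tau}$ simply slides the solution along the one-parameter family of Example 16, which makes the non-isolation of $X(t_0)$ explicit in that case.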
Theorem 6.1 can be further weakened in the sense that we do not necessarily need the invertibility of $f(A)$.
Theorem 6.4 ([15]) For a given matrix $A \in M_n(\mathbb{C})$, let $X_0 \in \mathcal{S}$ and let $f$, $g$ be well-defined functions on $\sigma(A)$ such that
$$g(A)Af(A) = A. \tag{6.1}$$
Then $f(A)X_0g(A) \in \mathcal{S}$.
Proof Since
$$f(A)X_0g(A)Af(A)X_0g(A) - Af(A)X_0g(A)A = f(A)X_0AX_0g(A) - f(A)AX_0Ag(A) = f(A)(X_0AX_0 - AX_0A)g(A) = 0,$$
$f(A)X_0g(A)$ is indeed a solution. □
For an invertible $A$, the condition (6.1) reduces to $f(A)g(A) = g(A)f(A) = I$, i.e., to Theorem 6.1. However, for a singular matrix $A$, the functions $f$ and $g$ need not be uniquely related, as can be seen from the next example.
Example 18 Let $A = J_2(0)$ and suppose that the functions $f$ and $g$ are defined on $\sigma(A) = \{0\}$. The condition (6.1) is satisfied iff
$$g(A)Af(A) = \begin{bmatrix} g(0) & g'(0) \\ 0 & g(0) \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} f(0) & f'(0) \\ 0 & f(0) \end{bmatrix} = \begin{bmatrix} 0 & f(0)g(0) \\ 0 & 0 \end{bmatrix} = A,$$
i.e., iff $f(0)g(0) = 1$. Hence, the following choices for $f$ and $g$ are possible:
$$f(A) = \begin{bmatrix} f(0) & f'(0) \\ 0 & f(0) \end{bmatrix}, \quad g(A) = \begin{bmatrix} 1/f(0) & g'(0) \\ 0 & 1/f(0) \end{bmatrix},$$
where all three values, $f(0) \neq 0$, $f'(0)$, $g'(0)$, are arbitrary. It is well-known (cf. Example 1) that every solution to the YBME (1.1) is of the form
$$X_0 = \begin{bmatrix} a & b \\ 0 & d \end{bmatrix}, \quad \text{where } ad = 0.$$
By Theorem 6.4, the matrix $f(A)X_0g(A)$ is
$$f(A)X_0g(A) = \begin{bmatrix} a & b + af(0)g'(0) + df'(0)/f(0) \\ 0 & d \end{bmatrix},$$
which is indeed an element of the set $\mathcal{S}(A)$.
♣
6.3 The m-Potent Matrix Case
Recall that a matrix $A \in M_n(\mathbb{C})$ is $m$-potent if $A^m = A$. As a particular case, for $m = 2$ we have idempotent, for $m = 3$ tripotent, and for $m = 4$ quadripotent matrices. Idempotent matrices were considered in [37], and all solutions in this case were obtained.
Theorem 6.5 [37] Let $A$ be an $n \times n$ idempotent matrix with rank $r$. Suppose $A = T\operatorname{diag}(I_r, 0)T^{-1}$ for some nonsingular matrix $T$. Then all the solutions of the Yang-Baxter-like matrix equation (1.1) are given by
$$X = T\begin{bmatrix} Y_1 & C \\ E & Y_2 \end{bmatrix}T^{-1}, \quad Y_1 = U\begin{bmatrix} I_s & 0 \\ 0 & 0 \end{bmatrix}U^{-1}, \quad Y_2 \text{ arbitrary},$$
where $U$ is any $r \times r$ nonsingular matrix, $s \le r$ is any nonnegative integer, $C = UZ$ and $E = WU^{-1}$ with $Z$ and $W$ given by
$$Z = \begin{bmatrix} 0 \\ Z_2 \end{bmatrix}, \quad W = \begin{bmatrix} 0 & W_2 \end{bmatrix},$$
and $W_2Z_2 = 0$.
The case of a tripotent matrix $A$ was investigated in [1], and all solutions were obtained. The paper [56] collects all commuting solutions for a tripotent matrix $A$ as well. The results in [1] were obtained as a corollary of [59], where a singular diagonalizable matrix with three distinct eigenvalues was considered. This paper was based on [34], where the case $A^2 = I$ was investigated. Some non-commuting solutions for a quadripotent matrix $A$ were considered in the paper [58]. The $m$-potent case was considered in [28], as a corollary of a more general result concerning diagonalizable matrices, and all commuting solutions were characterized.
Theorem 6.6 [28] Suppose $A$ is such that $A^m = A$ and $A^j \neq A$ for $j = 2, \dots, m - 1$ with some $m > 2$. Let $\lambda_1, \dots, \lambda_r$ be all the distinct eigenvalues of $A$ with multiplicities $n_1, \dots, n_r$, respectively. Then all commuting solutions of (1.1) are
$$X = U\operatorname{diag}(Y_1, \dots, Y_r)U^{-1},$$
where each $Y_i = \lambda_iP_i$ with $P_i$ any $n_i \times n_i$ projection matrix if $\lambda_i$ is an $(m-1)$-st root of 1, and $Y_i$ is any $n_i \times n_i$ matrix if $\lambda_i = 0$.
Proposition 6.7 [15] For an $m$-potent matrix $A$, if $X_0$ is a nontrivial solution to the YBME, then
$$A^jX_0A^{m-1-j}, \quad j = 0, \dots, m - 1,$$
is a solution as well.
Proof If we take $f(A) := A^j$ and $g(A) := A^{m-1-j}$, where $j = 0, 1, \dots, m - 1$, the condition (6.1) is satisfied, so by Theorem 6.4 we have the result. □
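Proposition 6.7 can be tried out on a small tripotent matrix. The sketch below is only an illustration under stated assumptions: the matrix $A = \operatorname{diag}(1, -1)$ and the solution $X_0$ (the member of the first family in Example 17 with $\lambda = 1$, $\mu = -1$, $t = 1$) are choices made here, and the helper names are hypothetical.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def matpow(A, k):
    """A^k for a nonnegative integer k (A^0 is the identity)."""
    n = len(A)
    R = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = matmul(R, A)
    return R

def solves_ybme(A, X):
    """Exact test of A X A == X A X."""
    return matmul(matmul(A, X), A) == matmul(matmul(X, A), X)

m = 3
A = [[F(1), F(0)], [F(0), F(-1)]]            # tripotent: A^3 = A, A^2 != A
X0 = [[F(-1, 2), F(1)], [F(3, 4), F(1, 2)]]  # assumed non-commuting solution

# The matrices A^j X0 A^(m-1-j), j = 0, ..., m-1, from Proposition 6.7.
candidates = [matmul(matmul(matpow(A, j), X0), matpow(A, m - 1 - j))
              for j in range(m)]
all_solve = all(solves_ybme(A, Y) for Y in candidates)
```

In this small case the list contains $X_0$ itself (for $j = 0$ and $j = 2$, since $A^2 = I$) and the genuinely different solution $AX_0A$.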
7 A Special Case: Doubly Stochastic and Permutation Solutions
Due to their close connections with braid groups, graph theory, probability, and statistics, a special section of this chapter is dedicated to doubly stochastic and permutation matrices. We start with the results obtained by Ding and Rhee in [16] and finish with those obtained by Djordjević (some unpublished, some published in [22]), which complete this case (provide sufficient and necessary conditions for the existence of doubly stochastic and permutation solutions). Unless stated differently, we are working with the $n$-dimensional real space $\mathbb{R}^n$ and the corresponding space of square $n$-dimensional matrices $M_n(\mathbb{R})$. Respectively, we are working with standard Euclidean basis vectors denoted as $e_k$, $k = 1, \dots, n$, where $e_k = [\delta_{ki}]^T_{i=1,\dots,n}$ and $\delta_{ki}$ is the Kronecker delta symbol.
At this point we revisit some basic concepts about permutation and doubly stochastic matrices. For a more detailed survey on the matter, consult, e.g., [5]. A real square matrix $A$ is a permutation matrix if it has exactly one nonzero entry in each of its rows and columns, and those nonzero entries are all equal to 1. Equivalently, $A$ is a permutation matrix if and only if there exists a permutation $\pi_A$ of the index set $\{1, \dots, n\}$ such that $Ae_k = e_{\pi_A(k)}$ for every $1 \le k \le n$. In that sense, the permutation $\pi_A$ is uniquely determined by $A$ and vice versa. The matrix $A$ is said to have a fixed point if and only if the permutation $\pi_A$ has a fixed point, which is equivalent to $A$ having the entry 1 on its main diagonal, i.e., $1 \in \operatorname{diag}(A)$. It is not difficult to see that $A$ is a permutation matrix if and only if it is invertible and $A^{-1}$ is a permutation matrix as well; in that case, if $\pi_A$ is the permutation which corresponds to $A$, then $\pi_A^{-1}$ is the permutation which corresponds to $A^{-1}$. Furthermore, the product of any two permutation matrices is also a permutation matrix, and the corresponding permutation of the index set is in that case equal to the composition of the corresponding permutations of the index set. This shows that the set of permutation matrices in $M_n(\mathbb{R})$, equipped with the standard matrix multiplication, is isomorphic to the symmetric group $S_n$, with $I_n$ as the identity element. A real square matrix is a doubly stochastic matrix if its entries are nonnegative and every row sum and every column sum equals one. The set of all doubly stochastic matrices is the smallest convex polytope in $M_n(\mathbb{R})$ which contains all permutation matrices, and consequently, permutation matrices are the extreme points of the said polytope: in other words, any doubly stochastic matrix can be expressed as a convex combination of permutation matrices.
A product of any two doubly stochastic matrices is also a doubly stochastic matrix, but that is as far as we can go algebraically speaking: the set of doubly stochastic matrices, equipped with the standard matrix multiplication, forms only a semigroup and not a proper group. Not every doubly stochastic matrix is invertible, nor do their inverses (for the invertible ones) need to be doubly stochastic matrices. In fact, the following characterization holds: for an invertible square matrix $A$, both $A$ and $A^{-1}$ are doubly stochastic matrices if and only if $A$ is a permutation matrix.
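These closure properties are easy to probe numerically. The following sketch uses exact fractions; the particular matrices `D`, `D_inv` and `P` and the helper names are assumptions chosen here for illustration only.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def is_doubly_stochastic(M):
    """Nonnegative entries, every row sum and column sum equal to 1."""
    return (all(x >= 0 for row in M for x in row)
            and all(sum(row) == 1 for row in M)
            and all(sum(col) == 1 for col in zip(*M)))

# An invertible doubly stochastic matrix whose inverse has negative entries.
D = [[F(2, 3), F(1, 3)], [F(1, 3), F(2, 3)]]
D_inv = [[F(2), F(-1)], [F(-1), F(2)]]

# A permutation matrix: its inverse is its transpose, again a permutation matrix.
P = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
P_inv = transpose(P)

product_is_ds = is_doubly_stochastic(matmul(D, D))
inverse_is_ds = is_doubly_stochastic(D_inv)
```

Here `product_is_ds` is true while `inverse_is_ds` is false, which is the semigroup-but-not-group behavior described above; the permutation matrix `P` is the contrasting case where the inverse stays doubly stochastic.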
7.1 Doubly Stochastic Solutions
Motivation for finding doubly stochastic solutions comes from the results obtained in [16]. At this point we assume that $A$ is an arbitrary invertible matrix. Recall the Brouwer fixed point theorem (see, e.g., [7]):
Theorem 7.1 (Brouwer) Let $W$ be a compact and convex set in the Euclidean space $V$. Then any continuous mapping $f : W \to W$ has a fixed point in $W$.
Theorem 7.2 [16] Let $A$ be an invertible matrix such that $A^{-1}$ is a doubly stochastic matrix. Then there exists a doubly stochastic matrix $B$ such that $BA^{-1}$ is a doubly stochastic solution to the YBME (1.1).
Proof Recall the function $\Psi$ from (2.7): the matrix $X$ is a solution to the YBME (1.1) if and only if $XA$ is a fixed point for the mapping
$$\Psi(XA) = A^{-1}(XA)^2A^{-1}.$$
Since $A^{-1}$ is a doubly stochastic matrix, and the polytope of doubly stochastic matrices $W$ is a convex and compact set in $M_n(\mathbb{R})$, it follows by the Brouwer fixed point theorem that $\Psi$ has a fixed point in $W$: there exists a $B \in W$ such that $B = \Psi(B)$. Equivalently, $X := BA^{-1}$ is a doubly stochastic solution to the YBME. □
The previous theorem provides solvability of the equation in the set of doubly stochastic matrices; however, there is no guarantee that the obtained solution $BA^{-1}$ is not the trivial one $X = A$. Thus the following corollary was introduced:
Corollary 7.3 ([16]) With respect to the previous theorem, if in addition $A$ is not a doubly stochastic matrix, then the obtained solution $X = BA^{-1}$ is a nontrivial one.
In other words, if $A$ is invertible and not a doubly stochastic matrix, while its inverse $A^{-1}$ is a doubly stochastic matrix, then there exists a nontrivial doubly stochastic solution $X$. Thus we are interested in the remaining case, and that is when $A$ and $A^{-1}$ are both doubly stochastic matrices, i.e., the case when $A$ is a permutation matrix. We prove the following lemma, which is valid for an arbitrary invertible matrix $A$.
Lemma 7.4 (Djordjević) Let $A$ be an invertible matrix and let $B$ be such that $AB = B$ and $BA = B^2$. Then $BA^{-1}$ is a solution to (1.1).
Proof Denote $X := BA^{-1} = (A^{-1}B)A^{-1}$. It immediately follows that $B = AXA$. By assumption, $B = B^2A^{-1} = B(BA^{-1}) = BX$ on one hand and $B = A^{-1}BA^{-1}A = XA$ on the other, giving $B = XA = BX$. This implies
$$XAX - AXA = XAX - B = BX^2 - B = B(X^2 - I) = B(X - I)(X + I) = (BX - BA^{-1}A)(X + I) = (X - BA^{-1})A(X + I) = 0.$$ □
The previous calculation contains some trivial cases that need to be excluded.
Corollary 7.5 (Djordjević) With respect to the previous notation and assumptions, denote by $X := BA^{-1}$ the solution to (1.1) obtained via Lemma 7.4. The following statements hold:
(a) If $B = 0$, then $X = 0$.
(b) If $A = I$, then $X$ is a projector.
(c) If $B$ is invertible, then $A = B = X = I$.
(d) If $\sigma(A) \cap \sigma(B) = \emptyset$, then $X = 0$.
Proof
(a) Obvious.
(b) If $A = I = A^{-1}$, then $B^2 = B$; thus $B$ is a projector and so is $X$.
(c) If $B$ is invertible, then from $A^{-1}B = B$ it follows that $A^{-1} = I = A$. The previous claim implies that $B$ is an invertible projector, i.e., $B = I$ and finally $X = A = B = I$.
(d) Since $B = BX = XA$, it follows that $X$ is a solution to the homogeneous Sylvester equation $BX - XA = 0$. If $\sigma(A) \cap \sigma(B) = \emptyset$, then its solution $X$ is unique and it is precisely $X = 0$. □
In order to find a nontrivial solution $X$ via Lemma 7.4, it is necessary for the following assumptions to hold:
• $A$ should be an invertible nonidentity matrix.
• $B$ must be a singular nonzero matrix such that $A^{-1}B = B = B^2A^{-1}$ holds.
• $A$ and $B$ must have common eigenvalues.
In the following theorem, we do not exclude the case when $A = I$ because the statement still holds, i.e., we manage to find a (projector) matrix $B$ which is a doubly stochastic singular matrix such that $B^2A^{-1} = B = A^{-1}B$.
Theorem 7.6 ([22]) Let $A$ be a permutation matrix such that the corresponding permutation $\pi_A$ has at least one fixed point. Then there exists a doubly stochastic singular matrix $X$ which is a solution to (1.1).
Proof Let $A$ be a permutation matrix such that the corresponding permutation of column vectors has $k$ fixed points, $1 \le k \le n$. Assume that in the standard basis $\{e_1, \dots, e_n\}$ the matrix $A$ has the form
$$A = \operatorname{diag}(I_k, S_{(n-k)}),$$
where $I_k$ is the identity matrix on $\mathbb{R}^k$ while $S_{(n-k)} \in M_{(n-k)}(\mathbb{R})$ is an arbitrary permutation matrix without fixed points. With respect to the block-diagonal form of $A$, the space $\mathbb{R}^n$ allows the decomposition
$$\mathbb{R}^n = \mathbb{R}^k \oplus \mathbb{R}^{n-k}.$$
Let $\ell \in \{1, \dots, k - 1\}$ be arbitrary and choose $B$ to be of the form
$$B = \operatorname{diag}(I_{k-\ell}, B_0),$$
where $I_{k-\ell}$ is the identity matrix on the space $\mathbb{R}^k \ominus \operatorname{span}(e_k, e_{k-1}, \dots, e_{k-\ell+1})$, while $B_0$ is a square matrix given on $\mathbb{R}^{n-k} \oplus \operatorname{span}(e_k, e_{k-1}, \dots, e_{k-\ell+1})$, defined as
$$B_0 = \frac{1}{n-k+\ell}\begin{bmatrix} 1 & 1 & \cdots & 1 & 1 \\ 1 & 1 & \cdots & 1 & 1 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 1 & \cdots & 1 & 1 \\ 1 & 1 & \cdots & 1 & 1 \end{bmatrix}_{(n-k+\ell) \times (n-k+\ell)}.$$
It is not difficult to see that $B$ is a doubly stochastic singular matrix: its null space $\ker(B)$ consists of vectors of the form
$$\ker(B) = \Big\{\big[\underbrace{0, \dots, 0}_{k-\ell};\ \textstyle\sum_{i=1}^{n-k+\ell-1}(-\lambda_i);\ \lambda_1;\ \lambda_2;\ \dots;\ \lambda_{n-k+\ell-1}\big]^T : \lambda_1, \dots, \lambda_{n-k+\ell-1} \in \mathbb{R}\Big\}$$
and
$$\ker(B)^{\perp} = \Big\{\big[\alpha_1;\ \alpha_2;\ \dots;\ \alpha_{k-\ell};\ \underbrace{\alpha;\ \alpha;\ \dots;\ \alpha}_{n-k+\ell}\big]^T : \alpha_1, \dots, \alpha_{k-\ell}, \alpha \in \mathbb{R}\Big\}.$$
It is obvious that $A$ and $B$ commute, $AB = B$ and $B^2 = BA$, so Lemma 7.4 applies and $X = BA^{-1} = B$ is indeed a doubly stochastic solution to (1.1). If $A$ does not allow the diagonal representation introduced above with respect to the standard Euclidean basis, then once again we apply the transformation $A = T^{-1}A'T$, where $T$ transforms the standard basis $\{e_1, \dots, e_n\}$ into its permutation $\{f_1, \dots, f_n\}$, such that $A'$ has the above diagonal form in terms of $\{f_1, \dots, f_n\}$. By the proved part of the theorem, there exists a $B'$ given as above (w.r.t. the new basis), such that $B = T^{-1}B'T$ is the sought doubly stochastic singular matrix. □
Remark 7.7 In the previous proof, we managed to find a solution $X$ which commutes with $A$. Note that, when $k = n$, one should not choose $\ell = 1$, in order to avoid the trivial solution $X = A$.
Example 19 Let
$$A = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}$$
be given in the standard basis, $e_i = [\delta_{i,j}]^T_{j=1,\dots,n}$, where $\delta_{i,j}$ is the Kronecker delta. Then $A : e_1 \mapsto e_5$, $A : e_2 \mapsto e_2$, $A : e_3 \mapsto e_3$, $A : e_4 \mapsto e_1$, and $A : e_5 \mapsto e_4$. In that sense, define $V_1 = \operatorname{span}\{e_2, e_3\}$ and $V_2 = \operatorname{span}\{e_1, e_4, e_5\}$, with the corresponding basis vectors: $f_1, f_2$ form a basis for $V_1$ and $f_3, f_4, f_5$ form a basis for $V_2$. Precisely, we have $f_1 = e_2$, $f_2 = e_3$, $f_3 = e_5$, $f_4 = e_1$, and $f_5 = e_4$. Then $A$ has the form of $A'$ in terms of $\{f_1, \dots, f_5\}$:
$$A' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix} : \begin{bmatrix} V_1 \\ V_2 \end{bmatrix} \to \begin{bmatrix} V_1 \\ V_2 \end{bmatrix}.$$
The corresponding transition matrices $T$ and $T^{-1}$ are computed as
$$T = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \quad T^{-1} = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix},$$
and $A = T^{-1}A'T$. With respect to the basis $\{f_1, \dots, f_5\}$, we choose the matrix $B'$ to be
$$B' = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/4 & 1/4 & 1/4 & 1/4 \\ 0 & 1/4 & 1/4 & 1/4 & 1/4 \end{bmatrix}.$$
Then the auxiliary equality $A'B'A' = B'A'B'$ holds, and for $B := T^{-1}B'T$ we get
$$B = \begin{bmatrix} 1/4 & 0 & 1/4 & 1/4 & 1/4 \\ 0 & 1 & 0 & 0 & 0 \\ 1/4 & 0 & 1/4 & 1/4 & 1/4 \\ 1/4 & 0 & 1/4 & 1/4 & 1/4 \\ 1/4 & 0 & 1/4 & 1/4 & 1/4 \end{bmatrix},$$
which is a doubly stochastic singular ($\sigma(B) = \{0, 1\}$) solution to (1.1), since $BA^{-1} = B = X$ in this case. ♣
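The matrices of Example 19 can be checked directly. The sketch below works in exact fractions and uses only the data of the example; the helper names `matmul`, `transpose` and `is_doubly_stochastic` are hypothetical and introduced here for illustration.

```python
from fractions import Fraction as F

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def is_doubly_stochastic(M):
    """Nonnegative entries, every row sum and column sum equal to 1."""
    return (all(x >= 0 for row in M for x in row)
            and all(sum(row) == 1 for row in M)
            and all(sum(col) == 1 for col in zip(*M)))

# Permutation matrix A of Example 19 (fixed points e2 and e3).
A = [[0, 0, 0, 1, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1],
     [1, 0, 0, 0, 0]]

q = F(1, 4)
B = [[q, 0, q, q, q],
     [0, 1, 0, 0, 0],
     [q, 0, q, q, q],
     [q, 0, q, q, q],
     [q, 0, q, q, q]]

A_inv = transpose(A)          # the inverse of a permutation matrix
X = matmul(B, A_inv)          # candidate solution X = B A^{-1}

lemma_hypotheses = matmul(A, B) == B and matmul(B, A) == matmul(B, B)
is_solution = matmul(matmul(A, X), A) == matmul(matmul(X, A), X)
```

Both `lemma_hypotheses` (the assumptions of Lemma 7.4) and `is_solution` evaluate to true, and `X` equals `B`, as stated in the example.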
The solution $X = BA^{-1}$ is not randomly chosen; this decomposition also appears in the results obtained by Ding and Rhee in [16]. Theorem 7.6 gives sufficient conditions for the existence of nontrivial solutions. Below we proceed to show that the same conditions are also necessary. The following proposition is well-known and is easy to verify (see, e.g., [5], once again):
Proposition 7.8 A unitary doubly stochastic matrix is also a permutation matrix.
Proof Denote by $L$ the said unitary doubly stochastic matrix. Note that $L$ is invertible, and $L^{-1} = L^* = L^T$, since $L \in M_n(\mathbb{R})$. Further, $L$ is doubly stochastic; then so is $L^T = L^{-1}$; thus $L$ is a permutation matrix. □
Theorem 7.9 [22] With respect to the previous notation, let $A$ be a permutation matrix such that $\pi_A$ does not have fixed points. Then there does not exist a doubly stochastic nontrivial solution $X$ to (1.1).
Proof Since $\pi_A$ does not have fixed points, it can be broken down into a finite number of cycles, where each cycle has an appropriate cyclic matrix associated to it:
$$\pi_A = \pi_1\pi_2 \cdots \pi_p, \quad 1 \le p < n.$$
Assume there exists a doubly stochastic matrix $X$ which is a nontrivial solution to (1.1).
Case 1 First assume that $p = 1$. Then $A$ has the cyclic property that $A^n = I_n$ and the space $\mathbb{R}^n$ is spanned by an arbitrary basis vector $e$:
$$\mathbb{R}^n = \operatorname{span}(e, Ae, \dots, A^{n-1}e).$$
If the solution $X$ is singular, then $\ker(X)$ is a nontrivial subspace of $\mathbb{R}^n$ and consequently $XAX : \ker(X) \to \{0\}$. But then
$$AXA : \ker(X) \to \{0\} \iff A : \ker(X) \to \ker(X),$$
which is impossible since $A$ cannot have a nontrivial invariant subspace. Therefore $X$ is a regular matrix and the Cauchy-Binet theorem states that $\det A = \det X$. By virtue of $A$ being a permutation matrix, we have $|\det A| = 1$ on one hand and $0 < |\lambda(X)| \le 1$ for every $\lambda(X) \in \sigma(X)$ on the other, while simultaneously
$$\prod_{\lambda(X) \in \sigma(X)} |\lambda(X)| = 1.$$
This proves that all eigenvalues of $X$ have moduli equal to one. However, this implies that $X$ must be a permutation matrix: indeed, if $X = UDW$ is the singular value decomposition of $X$, then $|\lambda(X)| = 1$ for every $\lambda(X) \in \sigma(X)$ implies that $D = I_n$ and $X = UW$ is a unitary doubly stochastic matrix, for which Proposition 7.8 applies. In that sense, there exists the corresponding permutation of indices $\pi_X$ such that $Xe_s = e_{\pi_X(s)}$ for every $1 \le s \le n$, where $e_1, \dots, e_n$ is the standard basis. Observe the mathematically equivalent problem:
$$\pi_X\pi_A\pi_X = \pi_A\pi_X\pi_A.$$
Recall that $\pi_A$ is a cyclic permutation; thus the set of indices $\{1, \dots, n\}$ is written out in terms of $\pi_A$ as $(a_1, \dots, a_n)$, in a manner that $\pi_A : a_i \mapsto a_{i +_n 1}$ for every $i = 1, \dots, n$, where $+_n$ stands for addition modulo $n$. In that sense, we define the function $\operatorname{succ}(\cdot)$ on $\{1, \dots, n\}$ as the successor of the given element in the cyclic permutation $\pi_A$:
$$\operatorname{succ}(a_i) := a_{i +_n 1},$$
for every $i = 1, \dots, n$. Notice that $\pi_X$ cannot have fixed points in $\{1, \dots, n\}$. This follows from an auxiliary verification: if $\pi_X(m) = m$ for some $m \in \{1, \dots, n\}$, then
$$\pi_X\pi_A\pi_X(m) = \pi_X(m +_n 1)$$
on one hand, while
$$\pi_A\pi_X\pi_A(m) = 1 +_n \pi_X(m +_n 1)$$
on the other, and the two can never be equal modulo $n$ (naturally $n > 1$). Therefore the permutation $\pi_X$ can be represented as a finite composition of $r$ cyclic permutations:
$$\pi_X = \big(q_1\ q_2\ \dots\ q_{\ell_1}\big)\big(q_{(\ell_1+1)}\ \dots\ q_{(\ell_1+\ell_2)}\big) \cdots \big(q_{(\ell_{r-1}+1)}\ \dots\ q_{\ell_r}\big),$$
where $r \in \mathbb{N}$ denotes the number of cycles, $\ell_0 := 0$, and thus $\ell_i - \ell_{(i-1)}$ is the length of the $i$th cycle, $i = 1, \dots, r$, $\ell_r = n$, and $q_1, \dots, q_{\ell_r}$ are mutually different elements from the set $\{1, \dots, n\}$. Now denote $m_i := \pi_A(q_i)$ for every $i = 1, \dots, n$. Then
$$\pi_X\pi_A\pi_X(q_1) = \pi_X(m_2), \qquad \pi_A\pi_X\pi_A(q_1) = \operatorname{succ}\pi_X(m_1),$$
where the successor is referred to in the sense of the cyclic shift given by $\pi_A$ (these shifts have nothing to do with the ordering $q_i$ in $\pi_X$). Therefore $\pi_X$ is subject to the ordering imposed by $\pi_A$, in the sense that $\pi_X(m_2) = \operatorname{succ}\pi_X(m_1)$. Similarly we get
$$\pi_X\pi_A\pi_X(q_2) = \pi_X(m_3), \qquad \pi_A\pi_X\pi_A(q_2) = \operatorname{succ}\pi_X(m_2),$$
thus $\pi_X(m_3) = \operatorname{succ}\pi_X(m_2) = \operatorname{succ}^2\pi_X(m_1)$, and by mathematical induction $\pi_X(m_{\ell_1}) = \operatorname{succ}^{\ell_1}\pi_X(m_1)$. Finally
$$\pi_X\pi_A\pi_X(q_{\ell_1}) = \pi_X(m_1), \qquad \pi_A\pi_X\pi_A(q_{\ell_1}) = \operatorname{succ}\pi_X(m_{\ell_1}),$$
which implies that
$$\pi_X(m_1) = \operatorname{succ}\pi_X(m_{\ell_1}) = \operatorname{succ}^{\ell_1+1}\pi_X(m_1).$$
But this gives that $(\pi_X(m_1)\ \dots\ \pi_X(m_{\ell_1}))$ is an invariant cycle in the cyclic permutation $\pi_A$, which is possible if and only if $\ell_1 = n$ and $\pi_A(m_i) = \pi_X(m_i)$ for every $i = 1, \dots, n$. But this implies that $X = A$, which is not possible.
Case 2 Assume $1 < p < n$ and that the permutation $\pi_A$ is broken down into $p$ disjoint cycles
$$\pi_A = \pi_1\pi_2 \cdots \pi_p.$$
Respectively, the matrix $A$ has the decomposition
$$A = \operatorname{diag}(A_1, I_{n_2}, \dots, I_{n_p}) \cdot \operatorname{diag}(I_{n_1}, A_2, \dots, I_{n_p}) \cdots \operatorname{diag}(I_{n_1}, I_{n_2}, \dots, A_p),$$
where each $A_s$ is the corresponding cyclic permutation matrix of a certain length $n_s$, $\sum_{s=1}^{p} n_s = n$. If $X$ is invertible, then from $\det A = \det X$ it once again follows that $X$ must be a permutation matrix (unitary and doubly stochastic, then Proposition 7.8). Similarly as before, $\pi_X$ cannot have a fixed point, and it allows the decomposition into a finite number of cycles
$$\pi_X = \big(q_1\ q_2\ \dots\ q_{\ell_1}\big)\big(q_{(\ell_1+1)}\ \dots\ q_{(\ell_1+\ell_2)}\big) \cdots \big(q_{(\ell_{r-1}+1)}\ \dots\ q_{\ell_r}\big),$$
where the letters denote the same entities as in the first case. Observe the first cycle of $\pi_X$, the cyclic permutation $(q_1\ q_2\ \dots\ q_{\ell_1})$. Denote $m_i := \pi_A(q_i)$ for every $i$ as before. There exists a unique $s$, $1 \le s \le p$, such that $\pi_X(m_1)$ belongs to the cycle $\pi_s$ of $\pi_A$. Then $\pi_X\pi_A\pi_X(q_1) = \pi_X(m_2)$ and $\pi_A\pi_X\pi_A(q_1) = \pi_A\pi_X(m_1) = \pi_s\pi_X(m_1)$. The two are equal if and only if $\pi_X(m_2)$ belongs to the same cycle $\pi_s$ as does the number $\pi_X(m_1)$ and $\pi_X(m_2) = \operatorname{succ}_s\pi_X(m_1)$, where $\operatorname{succ}_s$ is the successor function defined via $A_s$ on the cycle $\pi_s$ the same way $\operatorname{succ}$ was defined in the first case via $A$. Then once again mathematical induction applies and we conclude that $\pi_X(m_1), \pi_X(m_2), \dots, \pi_X(m_{\ell_1})$ belong to the same cycle $\pi_s$ of the permutation $\pi_A$ and
$$\pi_X(m_1) = \operatorname{succ}_s\pi_X(m_{\ell_1}) = \operatorname{succ}_s^2\pi_X(m_{\ell_1-1}) = \dots = \operatorname{succ}_s^{\ell_1+1}\pi_X(m_1),$$
therefore $(\pi_X(m_1)\ \dots\ \pi_X(m_{\ell_1}))$ is a sub-cycle of $\pi_s$, which is possible if and only if $\ell_1 = n_s$ and $\pi_X(m_i) = \pi_A(m_i) = \pi_{A_s}(m_i)$ for every $i = 1, \dots, n_s$. In other words, the first cycle of $\pi_X$ is equal to the $s$th cycle of $\pi_A$. Continuing this methodology, we conclude that $p = r$, and each cycle of $\pi_X$ is equal to precisely one cycle of $\pi_A$, but to a different one each time the cycle of $\pi_X$ is changed. In other words, it follows that $\pi_X = \pi_A$, i.e., $X = A$, which is impossible.
Finally, assume that $X$ is a singular doubly stochastic matrix. Since $A$ is invertible, it follows that $A : \ker(X) \to \ker(X)$, i.e., $\ker(X)$ is a nontrivial $A$-invariant subspace. Since none of the cyclic matrices $A_1, \dots, A_p$ allow a nontrivial invariant subspace, there exist finitely many indices
$$\{j(1), \dots, j(r)\} \subset \{1, \dots, p\}, \quad r \le p,$$
such that $\ker(X)$ is spanned by all those basis vectors in which the cyclic matrices $A_{j(1)}, \dots, A_{j(r)}$ are defined: if $\mathbb{R}^n = \bigoplus_{s=1}^{p} V_s$ such that $\dim V_s = n_s$ and $A_s : V_s \to V_s$, then
$$\ker(X) = V_{j(1)} \oplus \dots \oplus V_{j(r)} \quad \text{and} \quad \ker(X)^{\perp} = \bigoplus_i V_i,$$
where the sum runs over those indices $i \notin \{j(1), \dots, j(r)\}$. It follows that 0 is a simple eigenvalue for $X$: if there exists a $u \in \mathbb{R}^n$ such that $u \perp V_{j(1)} \oplus \dots \oplus V_{j(r)}$ and $Xu \in \ker(X)$, then
$$XA(\underbrace{Xu}_{\in \ker(X)}) = X(\underbrace{AXu}_{\in \ker(X)}) = 0,$$
while $AXAu = 0$ if and only if $A : u \to \ker(X)$, if and only if $u \in V_{j(1)} \oplus \dots \oplus V_{j(r)}$, which is impossible. Thus $\mathbb{R}^n = \ker(X) \oplus \ker(X)^{\perp}$, and the reduction $Y_X$ of $X$ onto $\ker(X)^{\perp}$ is a square, invertible doubly stochastic matrix, while the reduction $Y_A$ of $A$ on $\ker(X)^{\perp}$ is a square permutation matrix. The previous scenario applies to $Y_A$ and $Y_X$, respectively, and we once again conclude that no such stochastic $Y_X$ exists and neither does the initial solution $X$. □
7.2 Permutation Solutions
In this section we want to refine the previous results, in the sense of finding all permutation solutions when $A$ is a permutation matrix. As stated before, each permutation matrix $A$ corresponds to one permutation $\pi_A$ of the basis vectors' index set, $Ae_i = e_{\pi_A(i)}$ for every $i = 1, \dots, n$. Respectively, we observe the symmetric group $S_n$ of permutations over the set $\{1, \dots, n\}$ and we rewrite the initial YBME (1.1) as
$$\pi_A\pi_X\pi_A = \pi_X\pi_A\pi_X, \tag{7.1}$$
for $\pi_X, \pi_A \in S_n$.
Lemma 7.10 [22] Let $\pi_A$ and $\pi_X \in S_n$ be such that (7.1) holds. Then $\pi_A$ has $k$ fixed points if and only if $\pi_X$ has $k$ fixed points as well.
Proof If $\pi_A = \pi_X$, then the statement is obvious. If $\pi_X \neq \pi_A$, let $\{i_1, \dots, i_k\}$ be the set of fixed points for $\pi_A$:
$$\pi_A = (i_1)(i_2) \cdots (i_k)\,\pi_{n-k},$$
where $\pi_{n-k}$ is a permutation of the remaining $n - k$ elements without fixed points. From $\pi_X\pi_A\pi_X(i_\ell) = \pi_A\pi_X\pi_A(i_\ell) = \pi_A\pi_X(i_\ell)$ for every $\ell = 1, \dots, k$, it follows that $\{\pi_A\pi_X(i_1), \dots, \pi_A\pi_X(i_k)\}$ is a set of different fixed points for $\pi_X$. Thus $\pi_X$ has at least $k$ fixed points. Moreover, if $\pi_X$ has $m > k$ fixed points, then due to the symmetry of $\pi_X\pi_A\pi_X = \pi_A\pi_X\pi_A$, it follows that $\pi_A$ has at least $m$ fixed points. This proves that $m = k$. Conversely, if $\pi_X$ has $k$ fixed points, then so does $\pi_A$. □
Theorem 7.11 [22] Let $\pi_A \in S_n$ be a permutation with one fixed point. The following statements are equivalent:
(a) There exists a nontrivial permutation solution $\pi_X$ to (7.1).
(b) The permutation $\pi_A$ has a cycle of length two or a cycle of length three.
Moreover, if
$$\pi_A = (i)(s_1\ s_2)\,\pi_{n-3}, \tag{7.2}$$
where $i$, $s_1$, and $s_2$ are three different elements from the set $\{1, \dots, n\}$ and $\pi_{n-3}$ is a permutation without fixed points or cycles of length three of the remaining $n - 3$ elements $\{1, \dots, n\} \setminus \{i, s_1, s_2\}$, then
$$\pi_X = (s_2)(i\ s_1)\,\pi_{n-3}. \tag{7.3}$$
Similarly, if
$$\pi_A = (i)(s_1\ s_2\ s_3)\,\pi_{n-4}, \tag{7.4}$$
where $i$, $s_1$, $s_2$, and $s_3$ are four different elements from the set $\{1, \dots, n\}$ and $\pi_{n-4}$ is a permutation without fixed points or cycles of length two of the remaining $n - 4$ elements $\{1, \dots, n\} \setminus \{i, s_1, s_2, s_3\}$, then
$$\pi_X = (s_2)(i\ s_1\ s_3)\,\pi_{n-4}. \tag{7.5}$$
If $\pi_A$ has at least one cycle of length two and at least one cycle of length three,
$$\pi_A = (i)(s_1\ s_2)(m_1\ m_2\ m_3)\,\pi_{n-6}, \tag{7.6}$$
where $\pi_{n-6}$ is a permutation without fixed points of the remaining $n - 6$ elements $\{1, \dots, n\} \setminus \{i, s_1, s_2, m_1, m_2, m_3\}$, then $\pi_X$ has the form
$$\pi_X = (s_1)(i\ s_2)(m_1\ m_2\ m_3)\,\pi_{n-6} \quad \text{or} \quad \pi_X = (m_2)(s_1\ s_2)(i\ m_1\ m_3)\,\pi_{n-6}. \tag{7.7}$$
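The explicit pairs (7.2)-(7.3), (7.4)-(7.5) and (7.6)-(7.7) can be verified mechanically on small index sets. The sketch below is an illustration only: it uses zero-based labels, takes the remaining permutation $\pi_{n-3}$, $\pi_{n-4}$, $\pi_{n-6}$ to be empty, and the helper names `from_cycles`, `compose`, `solves` and `fixed_points` are hypothetical. It checks only that the displayed pairs satisfy (7.1); it does not test the equivalence claimed in the theorem.

```python
def from_cycles(n, cycles):
    """Permutation of {0, ..., n-1} as a tuple p, where p[x] is the image of x."""
    p = list(range(n))
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            p[a] = b
    return tuple(p)

def compose(p, q):
    """Composition (p after q): x -> p[q[x]]."""
    return tuple(p[q[x]] for x in range(len(p)))

def solves(pa, px):
    """Test of (7.1): pa px pa == px pa px."""
    return compose(pa, compose(px, pa)) == compose(px, compose(pa, px))

def fixed_points(p):
    return [x for x in range(len(p)) if p[x] == x]

# (7.2)-(7.3) with i = 0, s1 = 1, s2 = 2 and n = 3.
pa_2 = from_cycles(3, [[1, 2]])
px_2 = from_cycles(3, [[0, 1]])

# (7.4)-(7.5) with i = 0, s1 = 1, s2 = 2, s3 = 3 and n = 4.
pa_3 = from_cycles(4, [[1, 2, 3]])
px_3 = from_cycles(4, [[0, 1, 3]])

# (7.6)-(7.7) with i = 0, s1 = 1, s2 = 2, m1 = 3, m2 = 4, m3 = 5 and n = 6.
pa_6 = from_cycles(6, [[1, 2], [3, 4, 5]])
px_6a = from_cycles(6, [[0, 2], [3, 4, 5]])
px_6b = from_cycles(6, [[1, 2], [0, 3, 5]])

pairs = [(pa_2, px_2), (pa_3, px_3), (pa_6, px_6a), (pa_6, px_6b)]
all_solve = all(solves(pa, px) for pa, px in pairs)
```

In each pair both permutations have exactly one fixed point, in line with Lemma 7.10, and the fixed point of $\pi_X$ differs from that of $\pi_A$.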
Proof (b) ) (a) : If π A has the form (7.2), (7.4), or (7.6), then, respectively, choose π X to be given as (7.3), (7.5), or (7.7). Direct verification shows that (7.1) holds. (a) ) (b) : Assume that (a) holds. There exists a unique permutation matrix X 2 n × n which corresponds to π X, i.e., X is a permutation matrix which is a nontrivial solution to (1.1). Lemma 7.10 implies that π X has precisely one fixed point; respectively X has precisely one nonzero entry (the number 1) on its main diagonal. Assume that i is a shared fixed point for permutations π A and π X. Then there exists a basis of n such that A=
1
01 × ðn-1Þ
0ðn-1Þ × 1
Pn-1
and X =
1
01 × ðn-1Þ
0ðn-1Þ × 1
Qn-1
,
where Pn-1 and Qn-1 are permutation matrices of order n - 1 without fixed points. However, AXA = XAX implies that Pn-1 Qn-1 Pn-1 = Qn-1 Pn-1 Qn-1 holds, which is impossible by Theorem 7.9 when Pn-1 ≠ Qn-1. Thus i is not a fixed point for π X and there exists a unique π Aπ X(i) 2{1, . . . , n}∖{i} which is the unique fixed point for π X. Further, there exists a unique cycle in π A of length r, n > r > 1 such that π Aπ X(i) is contained in that cycle: π A = ðiÞðπ A π X ðiÞ π 2A π X ðiÞ . . . π rA π X ðiÞÞ π n-r-1 :
ð7:8Þ
Once again, we denote by π n-r-1 the permutation of the remaining elements from the set {1, . . . , n} without fixed points. Since π X(i) ≠ i, it follows that π Aπ X(i) ≠ π X(i). In other words, there exists a unique cycle in the π X permutation, having the length r′, n > r′ > 1, such that i belongs to that cycle: π X = ðπ A π X ðiÞÞ ði π X ðiÞ π 2X ðiÞ . . . π r0X - 1 ðiÞÞ π ′n - r ′ - 1 ,
ð7:9Þ
where $\pi'_{n-r'-1}$ is a permutation of the remaining elements from the set $\{1,\dots,n\}$ without fixed points. Notice that
\[
\pi_X\pi_A\pi_X^2(i) = \pi_A\pi_X\pi_A(\pi_X(i)) = \pi_A^2\pi_X(i) \tag{7.10}
\]
and similarly
\[
\begin{aligned}
\pi_X\pi_A^2\pi_X(i) &= \pi_X\pi_A(\pi_A\pi_X\pi_A(i)) = \pi_X\pi_A\pi_X(\pi_A\pi_X(i)) \\
&= \pi_A\pi_X\pi_A(\pi_A\pi_X(i)) = \pi_A\pi_X\pi_A^2\pi_X(i) \\
&\Rightarrow\ i = \pi_X\pi_A^2\pi_X(i)\ \Rightarrow\ \pi_A^2\pi_X(i) = \pi_X^{r'-1}(i).
\end{aligned} \tag{7.11}
\]
Equating (7.10) and (7.11), we get
\[
\pi_X\pi_A\pi_X^2(i) = \pi_X^{r'-1}(i)\ \Leftrightarrow\ \pi_A\pi_X^2(i) = \pi_X^{r'-2}(i). \tag{7.12}
\]
Further,
\[
\pi_X\pi_A\pi_X(\pi_A^r\pi_X(i)) = \pi_A\pi_X(\pi_A\pi_X(i)) = \pi_A^2\pi_X(i), \tag{7.13}
\]
thus (7.10) and (7.13) give $\pi_X(i) = \pi_A^r\pi_X(i)$ and it follows that
\[
\pi_A = (i)\,\bigl(\pi_X(i)\ \ \pi_A\pi_X(i)\ \ \pi_A^2\pi_X(i)\ \dots\ \pi_A^{r-1}\pi_X(i)\bigr)\,\pi_{n-r-1}. \tag{7.14}
\]
On the other hand, (7.11) and (7.13) give
\[
\pi_X^{r'-1}(i) = \pi_X\pi_A\pi_X(\pi_A^r\pi_X(i))\ \Leftrightarrow\ \pi_X^{r'-2}(i) = \pi_A\pi_X(\pi_A^r\pi_X(i)) \tag{7.15}
\]
and
\[
\pi_X = \bigl(\pi_A\pi_X(i)\bigr)\,\bigl(i\ \ \pi_X(i)\ \ \pi_X^2(i)\ \dots\ \underbrace{\pi_X^{r'-2}(i)}_{\pi_A\pi_X^2(i)}\ \ \underbrace{\pi_X^{r'-1}(i)}_{\pi_A^2\pi_X(i)}\bigr)\,\pi'_{n-r'-1}. \tag{7.16}
\]
Below we proceed to characterize all $\pi_X^{r'-s-3}(i)$, $s = 0,\dots,r'-3$, in a similar manner. When $s = 0$, we have
\[
\begin{aligned}
&\pi_X\pi_A\pi_X(\pi_A^{r-1}\pi_X(i)) = \pi_A\pi_X\pi_A^r\pi_X(i) = \pi_X^{r'-2}(i) \\
&\Rightarrow\ \pi_A\pi_X(\pi_A^{r-1}\pi_X(i)) = \pi_X^{r'-3}(i).
\end{aligned} \tag{7.17}
\]
Assume that for a fixed $s \in \{0,\dots,r'-4\}$ we have
\[
\pi_X^{r'-s-3}(i) = \pi_A\pi_X(\pi_A^{r-s-1}\pi_X(i)). \tag{7.18}
\]
Then
\[
\begin{aligned}
&\pi_X\pi_A\pi_X(\pi_A^{r-(s+1)-1}\pi_X(i)) = \pi_X\pi_A\pi_X(\pi_A^{r-s-2}\pi_X(i)) = \pi_A\pi_X\pi_A^{r-s-1}\pi_X(i) = \pi_X^{r'-s-3}(i) \\
&\Rightarrow\ \pi_A\pi_X(\pi_A^{r-s-2}\pi_X(i)) = \pi_X^{r'-s-4}(i) = \pi_X^{r'-(s+1)-3}(i).
\end{aligned} \tag{7.19}
\]
Combined, (7.15), (7.18), and (7.19) prove that
\[
\pi_X^{r'-\ell}(i) = \pi_A\pi_X(\pi_A^{r-\ell+2}\pi_X(i)), \qquad \ell = 2,\dots,r'. \tag{7.20}
\]
When $\ell = r'$, it follows that $i = \pi_A\pi_X\pi_A^{r-r'+2}\pi_X(i)$, that is,
\[
i = \pi_X\pi_A^{r-r'+2}\pi_X(i)\ \Leftrightarrow\ \pi_A^2\pi_X(i) = \pi_A^{r-r'+2}\pi_X(i)\ \Leftrightarrow\ r = |\alpha|\,r',
\]
for some integer $\alpha$. However, using the symmetry $\pi_A\pi_X\pi_A = \pi_X\pi_A\pi_X$, we can also prove that $r' = |\beta|\,r$ for some integer $\beta$. This follows from the substitution $j := \pi_A\pi_X(i)$. Then $j$ is the unique fixed point for $\pi_X$, while $i = \pi_X\pi_A^2\pi_X(i) = \pi_X\pi_A(j)$ is the unique fixed point for $\pi_A$. Respectively, the cycle $(\pi_X(i)\ \ \pi_A\pi_X(i)\ \dots\ \pi_A^{r-1}\pi_X(i))$ is the unique cycle in the permutation $\pi_A$ which contains $j$. In other words, the representation (7.14) can be rewritten in terms of $j$ as
\[
\pi_A = \bigl(\pi_X\pi_A(j)\bigr)\,\underbrace{\bigl(j\ \ \pi_A(j)\ \dots\ \pi_A^{r-1}(j)\bigr)}_{\text{length } r}\,\pi_{n-r-1}. \tag{7.21}
\]
On the other hand, since $i = \pi_X\pi_A(j)$, it follows that in the permutation $\pi_X$ the cycle $(i\ \ \pi_X(i)\ \ \pi_X^2(i)\ \dots\ \pi_X^{r'-1}(i))$ is the unique cycle which contains $\pi_X\pi_A(j)$. Thus (7.16) can be rewritten in terms of $j$:
\[
\pi_X = (j)\,\underbrace{\bigl(\pi_X\pi_A(j)\ \ \pi_X^2\pi_A(j)\ \dots\ \pi_X^{r'-1}\pi_A(j)\bigr)}_{\text{length } r'}\,\pi'_{n-r'-1}.
\]
Now applying formulae (7.17)–(7.20), we conclude that $r' = |\beta|\,r$ for some integer $\beta$; therefore $r = r'$. Now we can rewrite (7.20) as
\[
\pi_X^{r-\ell}(i) = \pi_A\pi_X(\pi_A^{r-\ell+2}\pi_X(i)), \qquad \ell = 2,\dots,r. \tag{7.22}
\]
Analogously, with respect to $j = \pi_A\pi_X(i)$, we have
\[
\pi_A^{r-\ell}(j) = \pi_X\pi_A(\pi_X^{r-\ell+2}\pi_A(j)), \qquad \ell = 2,\dots,r. \tag{7.23}
\]
Using that $r = r'$, it is not difficult to conclude that (7.22) and (7.23) hold even when $\ell \in \{0, 1\}$. Indeed, when $\ell = 0$, (7.22) gives
\[
\pi_X^r(i) = i = \pi_X\pi_A^2\pi_X(i)\ \Leftrightarrow\ \pi_X^{r-0}(i) = \pi_A\pi_X\pi_A^{r-0+2}\pi_X(i),
\]
and similarly when $\ell = 1$,
\[
\pi_X^{r-1}(i) = \pi_A^2\pi_X(i) = \pi_A(\pi_X\pi_A\pi_X(i)) = \pi_A\pi_X\pi_A^{r-1+2}\pi_X(i).
\]
Since the symmetry holds for (7.23), it follows that the recursive relations (7.22) and (7.23) hold for every $\ell = 0,\dots,r$. If $r \ge 4$, then (7.23) implies for every $\ell \in \{0,\dots,r\}$
\[
\pi_A^{r-\ell+3}\pi_X(i) = \pi_A^{r-\ell+2}\pi_A\pi_X(i) = \pi_A^{r-\ell+2}(j) = \pi_X\pi_A\pi_X^{r-\ell+4}\pi_A^2\pi_X(i). \tag{7.24}
\]
Replacing (7.24) in (7.22), we get
\[
\pi_X^{r-\ell}(i) = \pi_A\pi_X\pi_A^{r-\ell+2}\pi_X(i) = \pi_A\pi_X\pi_X\pi_A\pi_X^{r-\ell+4}\pi_A^2\pi_X(i);
\]
when $r = \ell$ we get
\[
\begin{aligned}
i = \pi_X^2\pi_A\pi_X^4\pi_A^2\pi_X(i)\ &\Leftrightarrow\ \pi_A\pi_X^2(i) = \pi_A\pi_X^4\pi_A^2\pi_X(i)\ \Leftrightarrow\ i = \pi_X^2\pi_A^2\pi_X(i) \\
&\Leftrightarrow\ \pi_X^2\pi_A\pi_X^2(i) = \pi_X^2\pi_A^2\pi_X(i)\ \Leftrightarrow\ \pi_X^{r-2}(i) = \pi_X^{r-1}(i),
\end{aligned} \tag{7.25}
\]
which is impossible. This proves that the cycle for $\pi_A$ which contains $\pi_X(i)$ must be of length two or three, that is, the representation (7.21) or, equivalently, (7.14) has the form (7.2) or (7.4) or (7.6).

Additionally, this proves the form of the solution $\pi_X$: since $\pi_X$ was an arbitrary nontrivial solution to $\pi_A\pi_X\pi_A = \pi_X\pi_A\pi_X$, we concluded that $\pi_X$ must have precisely one fixed point $j \neq i$. Respectively, $\pi_A$ must contain a cycle of length two or three which contains $j$, and consequently the solution $\pi_X$ has a cycle of the same length which contains the element $\pi_A^{-1}(j)$. Since the length of these two cycles is the same and equals two or three, it is obvious that the ordered quadruple $i, \pi_A^{-1}(j), j, \pi_A(j)$ is the same as the ordered quadruple $i, \pi_X(i), j, \pi_X^2(i)$ (the elements need not be mutually different; hence we do not use the set notation here). Thus $\pi_{n-r-1}$ and $\pi'_{n-r-1}$ from (7.8) and (7.9), respectively, are permutations of the same elements $\{1,\dots,n\}\setminus\{i, \pi_X(i), j, \pi_X^2(i)\}$ without fixed points. Theorem 7.9 applies once again to $\pi_{n-r-1}$ and $\pi'_{n-r-1}$ and we conclude that $\pi_{n-r-1} = \pi'_{n-r-1}$. Thus if $\pi_A$ has the form (7.2), then $\pi_X$ must be of the form (7.3); respectively, if $\pi_A$ has the form (7.4), then $\pi_X$ has the form (7.5); and finally, if $\pi_A$ has the form (7.6), then $\pi_X$ has the form (7.7). □

Corollary 7.12 ([22]) Let $A$ be a permutation matrix, such that $\pi_A$ has one fixed point, $k_2$ cycles of length two, and $k_3$ cycles of length three, $0 \le \min\{k_2, k_3\}$, $k_2 + k_3 > 0$. There are in total $2k_2 + 3k_3$ different nontrivial permutations $\pi_X$ which satisfy $\pi_A\pi_X\pi_A = \pi_X\pi_A\pi_X$.

Proof. Given that $\pi_A$ has one fixed point $i$, we have by Lemma 7.10 that every such $\pi_X$ has precisely one fixed point $j$ as well and that $j \neq i$. Theorem 7.11 states that there must exist a cycle of length two or three in the permutation $\pi_A$, such that $j$ is contained in that cycle. There are in total $k_2$ different choices for such cycles of length two and $k_3$ different choices for such cycles of length three. Further, for a fixed cycle of length two, there are two different choices for $j$ (recall the notation from (7.2)). Analogously, for each fixed cycle of length three, there are three choices for $j$ (recall the notation from (7.4)). Adding them together gives in total $2k_2 + 3k_3$. □

Theorem 7.13 ([22]) Let $A$ be a permutation matrix with $1 \le k \le n$ fixed points. The following statements are equivalent:
(a) The YBME (1.1) has a nontrivial solution $X$ which is a permutation matrix.
(b) The permutation $\pi_A$ of the basis vectors, which corresponds to the matrix $A$, has at least one cycle of length two or three, and $k < n$.
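The count in Corollary 7.12 can be checked by exhaustive search on small examples. The following is only a minimal sketch under stated assumptions: permutations act on $\{0,\dots,n-1\}$ and are stored as tuples, "nontrivial" is taken to mean $\pi_X \neq \pi_A$, and the helper names (`compose`, `from_cycles`, `nontrivial_solutions`) are hypothetical and not taken from [22].

```python
from itertools import permutations


def compose(p, q):
    """Return the composition p∘q, i.e. the map i -> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))


def from_cycles(n, cycles):
    """Build a permutation of {0,...,n-1} from disjoint cycles; unlisted points stay fixed."""
    perm = list(range(n))
    for cycle in cycles:
        for pos, elem in enumerate(cycle):
            perm[elem] = cycle[(pos + 1) % len(cycle)]
    return tuple(perm)


def nontrivial_solutions(a):
    """All permutations x != a with a∘x∘a == x∘a∘x, found by brute force."""
    n = len(a)
    return [
        x
        for x in permutations(range(n))
        if x != a and compose(a, compose(x, a)) == compose(x, compose(a, x))
    ]


# One fixed point (0) and one cycle of length two: k2 = 1, k3 = 0.
a_two = from_cycles(3, [(1, 2)])
# One fixed point (0) and one cycle of length three: k2 = 0, k3 = 1.
a_three = from_cycles(4, [(1, 2, 3)])

print(len(nontrivial_solutions(a_two)))    # 2 = 2*k2 + 3*k3
print(len(nontrivial_solutions(a_three)))  # 3 = 2*k2 + 3*k3
```

The search is factorial in $n$, so it is meant only as a sanity check for very small permutations, not as a way to compute solutions.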
Proof. (b) ⇒ (a): The permutation $\pi_A$ has exactly one of the following forms:
\[
\pi_A = (i_1)\dots(i_k)\,(s_1\ s_2)\,\pi_{n-k-2}, \tag{7.26}
\]
\[
\pi_A = (i_1)\dots(i_k)\,(m_1\ m_2\ m_3)\,\pi_{n-k-3}, \tag{7.27}
\]
\[
\pi_A = (i_1)\dots(i_k)\,(s_1\ s_2)\,(m_1\ m_2\ m_3)\,\pi_{n-k-5}, \tag{7.28}
\]
where once again $\pi_{n-k-2}$, $\pi_{n-k-3}$, and $\pi_{n-k-5}$ are permutations of the remaining elements from the set $\{1,\dots,n\}$ without fixed points. Choose an arbitrary index $k_0 \in \{1,\dots,k\}$. If $\pi_A$ has the form (7.26), the permutation $\pi_{X_{k_0}}$ can be chosen to be
\[
\pi_{X_{k_0}} = \prod_{\substack{1 \le \ell \le k \\ \ell \neq k_0}} (i_\ell)\ (s_1)\,(i_{k_0}\ s_2)\,\pi_{n-k-2}. \tag{7.29}
\]
Similarly, if $\pi_A$ has the form (7.27), then
\[
\pi_{X_{k_0}} = \prod_{\substack{1 \le \ell \le k \\ \ell \neq k_0}} (i_\ell)\ (m_2)\,(i_{k_0}\ m_1\ m_3)\,\pi_{n-k-3}, \tag{7.30}
\]
and finally if $\pi_A$ has the form (7.28), then
\[
\pi_{X_{k_0}} = \prod_{\substack{1 \le \ell \le k \\ \ell \neq k_0}} (i_\ell)\ (s_1)\,(i_{k_0}\ s_2)\,(m_1\ m_2\ m_3)\,\pi_{n-k-5}
\quad\text{or}\quad
\pi_{X_{k_0}} = \prod_{\substack{1 \le \ell \le k \\ \ell \neq k_0}} (i_\ell)\ (m_2)\,(s_1\ s_2)\,(i_{k_0}\ m_1\ m_3)\,\pi_{n-k-5}. \tag{7.31}
\]

(a) ⇒ (b): If $k = n$, then $A = I_n$ and $X$ is a permutation matrix which is idempotent (Lemma 7.4), i.e., $X = I_n = A$. Therefore we proceed with $k < n$ and the permutation $\pi_A$ is given as
\[
\pi_A = (i_1)(i_2)\dots(i_k)\,\pi_{n-k},
\]
where $\pi_{n-k}$ is the permutation of the remaining elements from the set $\{1,\dots,n\}$ without fixed points. From Lemma 7.10 it follows that $\pi_X$ has $k$ fixed points as well. Similarly as before, $\pi_X$ cannot share all $k$ fixed points with $\pi_A$. Therefore, there exists a fixed point $i_0$ for $\pi_A$ which is not a fixed point for $\pi_X$, and in that case $\pi_A\pi_X(i_0)$ is a fixed point for $\pi_X$. If $\pi_A\pi_X(i_0)$ is a fixed point for $\pi_A$, it follows that $\pi_X(i_0)$ is a fixed point for $\pi_A$ and
\[
\pi_X(i_0) = \pi_A\pi_X(i_0) = \pi_X\pi_A\pi_X(i_0) = \pi_X^2(i_0),
\]
which proves that $\pi_X(i_0)$ is a fixed point for $\pi_X$ and consequently $i_0$ is a fixed point for $\pi_X$, which is impossible. This proves that there exists a cycle in the permutation $\pi_A$, of length $r$, $r > 1$, which contains the point $\pi_X(i_0)$:
\[
\pi_A = (i_0)\,\bigl(\pi_X(i_0)\ \ \pi_A\pi_X(i_0)\ \ \pi_A^2\pi_X(i_0)\ \dots\ \pi_A^{r-1}\pi_X(i_0)\bigr)\,\pi_{n-r-1},
\]
where $\pi_{n-r-1}$ is a permutation of the remaining elements from the set $\{1,\dots,n\}$ with $k-1$ fixed points (the cycle $(\pi_A\pi_X(i_0)\ \ \pi_A^2\pi_X(i_0)\ \dots\ \pi_A^r\pi_X(i_0))$ cannot contain any fixed points for $\pi_A$). If $\pi_{n-r-1}$ has a cycle of length two or three in it, then the statement holds. Thus we assume that $\pi_{n-r-1}$ does not have such a cycle. Analogously as before, there exists a cycle of length $r'$, $n > r' > 1$, in the permutation $\pi_X$ which contains $i_0$:
\[
\pi_X = \bigl(\pi_A\pi_X(i_0)\bigr)\,\bigl(i_0\ \ \pi_X(i_0)\ \ \pi_X^2(i_0)\ \dots\ \pi_X^{r'-1}(i_0)\bigr)\,\pi'_{n-r'-1},
\]
where $\pi'_{n-r'-1}$ is a permutation of the remaining elements from the set $\{1,\dots,n\}$ with $k-1$ fixed points. By applying the same logic as in (7.10), (7.11), and (7.12), we conclude that
\[
\pi_X^{r'-1}(i_0) = \pi_A^2\pi_X(i_0), \qquad \pi_A\pi_X^2(i_0) = \pi_X^{r'-2}(i_0).
\]
Then once again we obtain
\[
\pi_X^{r'-\ell}(i_0) = \pi_A\pi_X(\pi_A^{r-\ell+2}\pi_X(i_0)), \qquad \ell = 2,\dots,r', \tag{7.32}
\]
which once again proves that $r = \alpha r'$ for some positive integer $\alpha$. Continuing the procedure as in (7.22) and (7.23), the recursive formula (7.32) holds for $\pi_A$ and $r = r'$. This proves that (7.32) holds even for $\ell \in \{0, 1\}$. Finally, if $r \ge 4$, we once again obtain the formula (7.24) with respect to $i_0$ instead of $i$ and conclude that
\[
\pi_X^{r-2}(i_0) = \pi_X^{r-1}(i_0),
\]
which is impossible. Similarly as before, we conclude that if $\pi_A$ is given as (7.26), then $\pi_X$ can be provided as (7.29). Respectively, if $\pi_A$ is given via (7.27), then $\pi_X$ is given as (7.30), and finally, if $\pi_A$ is given as (7.28), then $\pi_X$ is given as (7.31). □

Definition 7.14 ([22]) Let $\pi_3$ be a permutation given as $\pi_3 = (i)(s_1\ s_2)$. Swapping the fixed point of $\pi_3$ produces a permutation $\pi'_3$ defined as $\pi'_3 = (s_1)(i\ s_2)$. In that case, $\pi'_3$ is obtained from $\pi_3$ by swapping its fixed point and the cycle $(s_1\ s_2)$ is said to be involved in the swapping. Analogously, if $\pi_4$ is a permutation given as $\pi_4 = (i)(m_1\ m_2\ m_3)$, then swapping the fixed point of $\pi_4$ defines a permutation $\pi'_4$ given as $\pi'_4 = (m_2)(i\ m_1\ m_3)$. In that case, $\pi'_4$ is obtained from $\pi_4$ by swapping its fixed point and the cycle $(m_1\ m_2\ m_3)$ is said to be involved in the swapping.

Remark 7.15 Recall that every permutation $\pi$ can be written as a product of cycles indexed by a set $I$, $\pi = \prod_{i \in I} c_i$, where $c_i$ represents one cycle. By convention, if $I = \emptyset$, then the empty product is the trivial permutation, i.e., $\prod_{i \in \emptyset} c_i = \mathrm{id}$ on the given set.

Theorem 7.16 ([22]) Assume that $\pi_A$ has $k_1$ fixed points, $k_2$ cycles of length two, and $k_3$ cycles of length three, such that $0 < k_1 < n$, $0 \le \min\{k_2, k_3\}$, and $0 < k_2 + k_3$:
\[
\pi_A = \prod_{\ell=1}^{k_1} (i_\ell) \prod_{\substack{p=0 \\ k_2+k_3>0}}^{k_2-1} (s_{2p+1}\ s_{2p+2}) \prod_{\substack{q=0 \\ k_2+k_3>0}}^{k_3-1} (m_{3q+1}\ m_{3q+2}\ m_{3q+3})\ \pi_{n-(k_1+2k_2+3k_3)}, \tag{7.33}
\]
where $\pi_{n-(k_1+2k_2+3k_3)}$ is a permutation of the remaining elements from the set $\{1,\dots,n\}$ without fixed points and without cycles of lengths two and three. Then a permutation $\pi_X$ is a nontrivial solution to (7.1) if and only if it has the form
\[
\pi_X = \prod_{\ell=0}^{k'_1} (i_\ell) \prod_{p=1}^{k'_2} (s_{2\gamma(p)-1})\,(i_{k'_1+p}\ s_{2\gamma(p)}) \prod_{q=1}^{k'_3} (m_{3\delta(q)-1})\,(i_{k'_1+k'_2+q}\ m_{3\delta(q)-2}\ m_{3\delta(q)})\ \pi'_{n'}, \tag{7.34}
\]
where
1. $k'_2$ is the number of cycles of length two from (7.33) which were involved in swapping $k'_2$ fixed points from $\pi_A$, $0 \le k'_2 \le \min\{k_2, k_1\}$. If $k'_2 > 0$, then $\gamma$ is an arbitrary injection from the set $\{1,\dots,k'_2\}$ into the set $\{1, 3, 5, \dots, 2k_2 - 1\}$; otherwise $\gamma : \emptyset \to \emptyset$;
2. $k'_3$ is the number of cycles of length three from (7.33) which were used in swapping $k'_3$ fixed points from $\pi_A$, $0 \le k'_3 \le \min\{k_3, k_1 - k'_2\}$. If $k'_3 > 0$, then $\delta$ is an arbitrary injection from the set $\{1,\dots,k'_3\}$ into the set $\{1, 4, 7, \dots, 3k_3 - 2\}$; otherwise $\delta : \emptyset \to \emptyset$;
3. $\min\{k_2 + k_3, k_1\} \ge k'_2 + k'_3 > 0$;
4. $k'_1$ is the number of shared fixed points for $\pi_A$ and $\pi_X$, i.e., the number of fixed points which were not involved in the swapping procedures, $k'_1 = k_1 - (k'_2 + k'_3)$;
5. the permutation $\pi'_{n'}$ consists of those elements from (7.33) which are not fixed points for $\pi_A$ and which were not involved in swapping of the fixed points for $\pi_A$,
\[
\pi'_{n'} = \prod_{\substack{p = 0,\dots,k_2-1 \\ 2p-1 \notin \operatorname{ran}(\gamma)}} (s_{2p+1}\ s_{2p+2}) \prod_{\substack{q = 0,\dots,k_3-1 \\ 3q-2 \notin \operatorname{ran}(\delta)}} (m_{3q+1}\ m_{3q+2}\ m_{3q+3})\ \pi_{n-(k_1+2k_2+3k_3)}. \tag{7.35}
\]
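To make the swapping construction of Definition 7.14 and Theorem 7.16 concrete, here is a small hedged sketch. It assumes permutations on $\{0,\dots,n-1\}$ stored as tuples; the helper names and the particular example ($n = 7$, two fixed points, one cycle of length two and one of length three) are illustrative assumptions and do not come from the source.

```python
def compose(p, q):
    """Return the composition p∘q, i.e. the map i -> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))


def from_cycles(n, cycles):
    """Build a permutation of {0,...,n-1} from disjoint cycles; unlisted points stay fixed."""
    perm = list(range(n))
    for cycle in cycles:
        for pos, elem in enumerate(cycle):
            perm[elem] = cycle[(pos + 1) % len(cycle)]
    return tuple(perm)


def satisfies_braid(a, x):
    """Check the permutation form of AXA = XAX, i.e. a∘x∘a == x∘a∘x."""
    return compose(a, compose(x, a)) == compose(x, compose(a, x))


def swap_with_2cycle(i, s1, s2):
    """Definition 7.14: (i)(s1 s2) -> (s1)(i s2); s1 becomes the new fixed point."""
    return [(i, s2)]


def swap_with_3cycle(i, m1, m2, m3):
    """Definition 7.14: (i)(m1 m2 m3) -> (m2)(i m1 m3); m2 becomes the new fixed point."""
    return [(i, m1, m3)]


n = 7
# pi_A = (0)(1)(2 3)(4 5 6): k1 = 2, k2 = 1, k3 = 1.
pi_a = from_cycles(n, [(2, 3), (4, 5, 6)])

# Swap fixed point 0 with the 2-cycle and fixed point 1 with the 3-cycle.
pi_x_both = from_cycles(n, swap_with_2cycle(0, 2, 3) + swap_with_3cycle(1, 4, 5, 6))

# Swap only fixed point 0; fixed point 1 and the 3-cycle are kept as in pi_A.
pi_x_one = from_cycles(n, swap_with_2cycle(0, 2, 3) + [(4, 5, 6)])

print(satisfies_braid(pi_a, pi_x_both))  # True
print(satisfies_braid(pi_a, pi_x_one))   # True
```

Both constructed permutations differ from $\pi_A$ and keep the same number of fixed points, in line with Lemma 7.10.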
Proof. The "if" part follows directly: whenever the permutation $\pi_X$ is given as (7.34), a simple calculation verifies that it solves (7.1). To prove the converse, assume that $\pi_X$ solves the equation (7.1). Lemma 7.10 implies that $\pi_X$ also has $k_1$ fixed points. Denote that set by
\[
\{\pi_A\pi_X(i_1), \dots, \pi_A\pi_X(i_{k_1})\}.
\]
By Theorem 7.13, the two permutations $\pi_X$ and $\pi_A$ cannot share all $k_1$ fixed points; thus there is a nonnegative integer $k'_1 \in \{0,\dots,k_1-1\}$ such that $i_1, \dots, i_{k'_1}$ are shared fixed points for $\pi_A$ and $\pi_X$ while the remaining ones are not: $\{i_{k'_1+1}, \dots, i_{k_1}\} \neq \{\pi_A\pi_X(i_{k'_1+1}), \dots, \pi_A\pi_X(i_{k_1})\}$. Now for every $\ell = k'_1+1, \dots, k_1$, either there is a cycle $c_\ell^2$ of length two in $\pi_A$ such that
\[
c_\ell^2 = \bigl(\pi_X(i_\ell)\ \ \pi_A\pi_X(i_\ell)\bigr), \tag{7.36}
\]
while respectively $\pi_X$ has a cycle $d_\ell^2$ of length two such that
\[
d_\ell^2 = \bigl(i_\ell\ \ \pi_X(i_\ell)\bigr), \tag{7.37}
\]
or $\pi_A$ has a cycle $c_\ell^3$ of length three such that
\[
c_\ell^3 = \bigl(\pi_X(i_\ell)\ \ \pi_A\pi_X(i_\ell)\ \ \pi_A^2\pi_X(i_\ell)\bigr), \tag{7.38}
\]
while respectively $\pi_X$ has a cycle $d_\ell^3$ of length three such that
\[
d_\ell^3 = \bigl(i_\ell\ \ \pi_X(i_\ell)\ \ \pi_A^2\pi_X(i_\ell)\bigr). \tag{7.39}
\]
In other words, there must exist a nonnegative integer $k'_2$, in the range $0 \le k'_2 \le \min\{k_2, k_1 - k'_1\}$, which tells exactly how many cycles of length two from the permutation $\pi_A$ are involved in swapping $k'_2$ out of $k_1 - k'_1$ fixed points for $\pi_A$. Consequently, there are precisely $k'_3$ cycles of length three in $\pi_A$, $k'_3 = k_1 - k'_1 - k'_2$, which are involved in swapping the remaining $k_1 - k'_1 - k'_2$ fixed points for $\pi_A$. By construction, there are $k'_1$ shared fixed points for $\pi_X$ and $\pi_A$. Furthermore, there are $k'_2$ fixed points for $\pi_X$ of the form $\pi_A\pi_X(i_p)$, $p = 0,\dots,k'_2-1$, as well as $k'_2$ cycles of length two in $\pi_X$ which are obtained by (7.37) from (7.36). Finally, there are $k'_3$ fixed points for $\pi_X$ having the form $\pi_A\pi_X(i_q)$, $q = 0,\dots,k'_3-1$, as well as $k'_3$ cycles of length three in $\pi_X$ which are obtained by (7.39) from (7.38). This indeed proves that $\pi_X$ must be given as (7.34). To prove (7.35), notice that $\pi'_{n'}$ is a permutation of the remaining $n'$ elements, where $n' = n - k_1 - 2k'_2 - 3k'_3$, without fixed points. Then, similarly as in the proofs of Theorem 7.13 and Theorem 7.9, we conclude that $\pi'_{n'}$ must coincide with $\pi_A$ on those elements which are neither fixed points for $\pi_A$ nor parts of the cycles used in swapping the fixed points for $\pi_A$. □

We introduce $\chi_p(m)$ for $p \in \mathbb{N}$ and $m \in \mathbb{N}_0$ as
\[
\chi_p(m) = \begin{cases} 1, & m = 0, \\ p\,m!, & m > 0. \end{cases}
\]
Corollary 7.17 ([22]) With respect to the notation from Theorem 7.16, there are
\[
\sum_{k'_2=0}^{\min\{k_1, k_2\}} \binom{k_2}{k'_2}\binom{k_1}{k'_2}\,\chi_2(k'_2) \sum_{k'_3=0}^{\min\{k_3,\, k_1-k'_2\}} \chi_3(k'_3)\binom{k_3}{k'_3}\binom{k_1-k'_2}{k'_3}\ -\ 1 \tag{7.40}
\]
nontrivial permutation solutions to (7.1).

Proof. Recall the notation from Theorem 7.16. For a fixed $k'_2$, there are $\binom{k_2}{k'_2}$ possible choices for the cycles of length two from $\pi_A$ to be involved in the swapping of $k'_2$ fixed points for $\pi_A$. Further, once the number $k'_2$ for the cycles of length two is fixed, there are $\binom{k_1}{k'_2}$ available choices for $k'_2$ fixed points of $\pi_A$ (out of $k_1$ of them), which will be involved in the said swapping. Once the $k'_2$ fixed points for $\pi_A$ are chosen, as well as the $k'_2$ cycles of length two from $\pi_A$ which will be involved in the swapping, there are $k'_2!$ possibilities of pairing up one fixed point with one cycle of length two. Finally, since $(s_1\ s_2) = (s_2\ s_1)$, we have two possibilities for swapping one observed fixed point within one cycle of length two: one would involve swapping the observed fixed point with $s_1$ of the cycle, while the other would be swapping the observed fixed point with $s_2$ of the cycle. Thus we have
\[
2\binom{k_1}{k'_2}\binom{k_2}{k'_2}(k'_2!)
\]
possibilities for swapping the $k'_2$ fixed points from $\pi_A$ with $k'_2$ cycles of length two from $\pi_A$. Similarly, there are
\[
3\binom{k_3}{k'_3}\binom{k_1-k'_2}{k'_3}(k'_3!)
\]
ways for swapping the $k'_3$ fixed points for $\pi_A$ with $k'_3$ cycles of length three, $0 \le k'_3 \le \min\{k_1 - k'_2, k_3\}$. Combining these observations we get
\[
6\sum_{k'_2,\,k'_3} \binom{k_1}{k'_2}\binom{k_2}{k'_2}(k'_2!)\binom{k_3}{k'_3}\binom{k_1-k'_2}{k'_3}(k'_3!),
\]
where the sum runs over all possible choices for $(k'_2, k'_3)$. Since $\min\{k_1, k_2 + k_3\} \ge k'_2 + k'_3 > 0$, we obtain the formula (7.40), where the "$-1$" accounts for the case $k'_2 = k'_3 = 0$. □

Conflict of Interest The authors declare that there are no conflicts of interest in publishing the findings obtained in this paper.

Data Availability Statement Data availability is not applicable as no data sets were generated during this research.

Funding The first author is supported by the Ministry of Education, Science and Technological Development, Republic of Serbia, Grant No. 451-03-68/2022-14/200124. The second author is supported by the Ministry of Education, Science and Technological Development, Republic of Serbia, Grant No. 451-03-68/2022-14/200029 and by the bilateral project between Serbia and Slovenia (Generalized inverses, operator equations and applications, Grant No. 337-00-21/2020-09/32).
References
1. Adam, M. S. I., Ding, J., Huang, Q., & Zhu, L. (2019). All solutions of the Yang-Baxter-like matrix equation when $A^3 = A$. Journal of Applied Analysis and Computation, 9(3), 1022–1031.
2. Aprahamian, M., & Higham, N. J. (2014). The matrix unwinding function, with an application to computing the matrix exponential. SIAM Journal on Matrix Analysis and Applications, 35(1), 88–109.
3. Baksalary, O. M., & Trenkler, G. (2010). Core inverse of matrices. Linear and Multilinear Algebra, 58(6), 681–697.
4. Baxter, R. J. (1972). Partition function of the eight-vertex lattice model. Annals of Physics, 70, 193–228.
5. Bhatia, R. (1997). Matrix analysis. Springer.
6. Bhatia, R., & Rosenthal, P. (1997). How and why to solve the operator equation $AX - XB = Y$. Bulletin of the London Mathematical Society, 29, 1–21.
7. Ben–El–Mechaieh, H., & Mechaiekh, Y. A. (2022). An elementary proof of the Brouwer's fixed point theorem. Arabian Journal of Mathematics (Springer), 11, 179–188. https://doi.org/10.1007/s40065-022-00366-0
8. Ben–Israel, A., & Greville, T. N. E. (2003). Generalized inverses, theory and applications (2nd ed.). Springer.
9. Chen, D., & Yong, X. (2022). Finding solutions to the Yang-Baxter-like matrix equation for diagonalizable coefficient matrix. Symmetry, 14, 1577. https://doi.org/10.3390/sym14081577
10. Cibotarica, A., Ding, J., Kolibal, J., & Rhee, N. H. (2013). Solutions of the Yang-Baxter matrix equation for an idempotent. Numerical Algebra, Control & Optimization, 3(2), 347–352. https://doi.org/10.3934/naco.2013.3.347
11. Chen, D., Chen, Z., & Yong, X. (2019). Explicit solutions of the Yang–Baxter-like matrix equation for a diagonalizable matrix with spectrum contained in {1, α, 0}. Applied Mathematics and Computation, 348, 523–530.
12. Dai, L., Liang, M., & Shen, Y. (2021). Some rank formulas for the Yang-Baxter matrix equation $AXA = XAX$. Wuhan University Journal of Natural Sciences Edition, 26(6), 459–463.
13. Dehghan, M., & Shirilord, A. (2020). HSS-like method for solving complex nonlinear Yang–Baxter matrix equation. Engineering with Computers. https://doi.org/10.1007/s00366-020-00947-7
14. Dinčić, N. Č. (2019). Solving the Sylvester equation $AX - XB = C$ when $\sigma(A) \cap \sigma(B) \neq \emptyset$. Electronic Journal of Linear Algebra, 35, 1–23.
15. Dinčić, N. Č., & Djordjević, B. D. (2022). On the intrinsic structure of the solution set to the Yang-Baxter-like matrix equation. Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales. Serie A. Matemáticas, 116, 73.
16. Ding, J., & Rhee, N. H. (2012). A nontrivial solution to a stochastic matrix equation. East Asian Journal on Applied Mathematics, 2(4), 277–284.
17. Ding, J., & Rhee, N. H. (2013). Spectral solutions of the Yang–Baxter matrix equation. Journal of Mathematical Analysis and Applications, 402, 567–573.
18. Ding, J., & Zhang, C. (2014). On the structure of the spectral solutions of the Yang–Baxter matrix equation. Applied Mathematics Letters, 35, 86–89.
19. Ding, J., Zhang, C., & Rhee, N. H. (2013). Further solutions of a Yang-Baxter-like matrix equation. East Asian Journal on Applied Mathematics, 3(4), 352–362.
20. Ding, J., Zhang, C., & Rhee, N. H. (2015). Commuting solutions of the Yang–Baxter matrix equation. Applied Mathematics Letters, 44, 1–4. https://doi.org/10.1016/j.aml.2014.11.017
21. Ding, J., & Rhee, N. H. (2015). Computing solutions of the Yang-Baxter-like matrix equation for diagonalisable matrices. East Asian Journal on Applied Mathematics, 5, 75–84.
22. Djordjević, B. D. (2023). Doubly stochastic and permutation solutions to $AXA = XAX$ when $A$ is a permutation matrix. Linear Algebra and its Applications, 661, 79–105. https://doi.org/10.1016/j.laa.2022.12.013
23. Djordjević, B. D. (2022). The equation $AX - XB = C$ without a unique solution: The ambiguity which benefits applications. Zbornik Radova. (Beograd), 20(28), 395–442.
24. Djordjević, B. D., & Dinčić, N. Č. (2019). Classification and approximation of solutions to Sylvester matrix equation. Filomat, 33(13), 4261–4280. https://doi.org/10.2298/FIL1913261D
25. Djordjević, B. D., & Dinčić, N. Č. (2018). Solving the operator equation $AX - XB = C$ with closed $A$ and $B$. Integral Equations Operator Theory, 90 (51). https://doi.org/10.1007/s00020-018-2473-3
26. Ding, J., & Tian, H. (2016). Solving the Yang–Baxter-like matrix equation for a class of elementary matrices. Computers and Mathematics with Applications, 72, 1541–1548.
27. Dong, Q. (2017). Projection-based commuting solutions of the Yang–Baxter matrix equation for non-semisimple eigenvalues. Applied Mathematics Letters, 64, 231–234.
28. Dong, Q., & Ding, J. (2016). Complete commuting solutions of the Yang–Baxter-like matrix equation for diagonalizable matrices. Computers and Mathematics with Applications, 72, 194–201.
29. Dragović, V. (2012). Algebro-geometric approach to the Yang–Baxter equation and related topics. Publications de l'Institut Mathematique (Beograd) (N.S.), 91(105), 25–48. https://doi.org/10.2298/PIM1205025D
30. Felix, F. (2009). Nonlinear equations, quantum groups and duality theorems: A primer on the Yang–Baxter equation. VDM.
31. Higham, N. J. (2008). Functions of matrices: Theory and computation. SIAM.
32. Horn, R. A., & Johnson, C. R. (2012). Matrix analysis. Cambridge University Press. https://doi.org/10.1017/9781139020411
33. Horn, R. A., & Johnson, C. R. (1991). Topics in matrix analysis. Cambridge University Press.
34. Huang, Q., Saeed Ibrahim Adam, M., Ding, J., & Zhu, L. (2019). All non-commuting solutions of the Yang-Baxter matrix equation for a class of diagonalizable matrices. Operators and Matrices, 13(1), 187–195.
35. Kumar, A., & Cardoso, J. R. (2018). Iterative methods for finding commuting solutions of the Yang–Baxter-like matrix equation. Applied Mathematics and Computation, 333, 246–253.
36. Kumar, A., Mosić, D., Stanimirović, P. S., Singh, G., & Kazakovtsev, L. A. (2022). Commuting outer inverse-based solutions to the Yang–Baxter-like matrix equation. Mathematics, 10(15), 2738. https://doi.org/10.3390/math10152738
37. Mansour, S. I. A., Ding, J., & Huang, Q. (2017). Explicit solutions of the Yang-Baxter-like matrix equation for an idempotent matrix. Applied Mathematics Letters, 63, 71–76.
38. Nichita, F. F. (2015). Yang–Baxter equations, computational methods and applications. Axioms, 4(4), 423–435. https://doi.org/10.3390/axioms4040423
39. Onsager, L. (1944). Crystal statistics. I. A two-dimensional model with an order-disorder transition. Physical Review, Series II, 65(3–4), 117–149.
40. Penrose, R. (1955). A generalized inverse for matrices. Mathematical Proceedings of the Cambridge Philosophical Society, 51, 406–413.
41. Rakić, D. S., Dinčić, N. Č., & Djordjević, D. S. (2014). Core inverse and core partial order of Hilbert space operators. Applied Mathematics and Computation, 244, 283–302.
42. Rakić, D. S., Dinčić, N. Č., & Djordjević, D. S. (2014). Group, Moore–Penrose, core and dual core inverse in rings with involution. Linear Algebra and its Applications, 463, 115–133.
43. Ren, H., Wang, X., & Wang, T. (2018). Commuting solutions of the Yang–Baxter-like matrix equation for a class of rank-two updated matrices. Computers and Mathematics with Applications, 76, 1085–1098.
44. Rudin, W. (1991). Real and complex analysis (3rd ed.). New York: McGraw-Hill.
45. Shen, D., & Wei, M. (2020). All solutions of the Yang-Baxter-like matrix equation for diagonalizable coefficient matrix with two different eigenvalues. Applied Mathematics Letters, 101, 106048.
46. Shen, D., Wei, M., & Jia, Z. (2018). On commuting solutions of the Yang–Baxter-like matrix equation. Journal of Mathematical Analysis and Applications, 462, 665–696.
47. Sylvester, J. J. (1884). Sur l'équation en matrices $px = xq$. Comptes Rendus de l'Académie des Sciences Paris, 99, 67–71 and 115–116.
48. Tian, H. (2016). All solutions of the Yang–Baxter-like matrix equation for rank-one matrices. Applied Mathematics Letters, 51, 55–59.
49. Trefethen, L. N., & Bau III, D. (1997). Numerical linear algebra. SIAM.
50. Wie, C. R. Phase factors in singular value decomposition and Schmidt decomposition. https://doi.org/10.48550/arXiv.2203.12579
51. Yang, C. N. (1967). Some exact results for the many-body problem in one dimension with repulsive delta-function interaction. Physical Review Letters, 19, 1312–1315.
52. Yang, C., & Ge, M. (1989). Braid group, knot theory, and statistical mechanics. World Scientific.
53. Zhou, D., Chen, G., & Ding, J. (2017). Solving the Yang-Baxter-like matrix equation for rank-two matrices. Journal of Computational and Applied Mathematics, 313, 142–151.
54. Zhou, D., Chen, G., & Ding, J. (2017). On the Yang-Baxter-like matrix equation for rank-two matrices. Open Mathematics, 15, 340–353.
55. Zhou, D., Chen, G., Yu, G., & Zhong, J. (2018). On the projection-based commuting solutions of the Yang–Baxter matrix equation. Applied Mathematics Letters, 79, 155–161.
56. Zhou, D., & Ding, J. (2018). Solving the Yang-Baxter-like matrix equation for nilpotent matrices of index three. International Journal of Computer Mathematics, 95(2), 303–315.
57. Zhou, D., & Ding, J. (2020). All solutions of the Yang–Baxter-like matrix equation for nilpotent matrices of index two. Complexity (Hindawi), 2020, 7, Article ID 2585602.
58. Zhou, D.-M., & Vu, H.-Q. (2020). Some non-commuting solutions of the Yang-Baxter-like matrix equation. Open Mathematics, 18, 948–969.
59. Zhou, D.-M., Ye, X.-X., Wang, Q.-W., Ding, J.-W., & Hu, W.-Y. (2021). Explicit solutions of the Yang-Baxter-like matrix equation for a singular diagonalizable matrix with three distinct eigenvalues. Filomat, 35(12), 3971–3982. https://doi.org/10.2298/FIL2112971Z
Hermitian Polynomial Matrix Equations and Applications
Zhigang Jia, Linlin Zhao, and Meixiang Zhao

Abstract In this chapter, we consider Hermitian polynomial matrix equations $X^s \pm \sum_{i=1}^{\ell} \delta_i A_i^* X^{t_i} A_i = Q$, which frequently arise from linear-quadratic control problems. The latest results of several special cases are reviewed and further developed to the general situation. Based on the spectral analysis of coefficient matrices, sufficient conditions are presented to guarantee the existence and uniqueness of Hermitian positive definite solutions. A general algorithm is designed to compute the maximal or maximal-like solutions. Their feasibility and efficiency are indicated by numerical experiments.

Keywords Polynomial matrix equation • Hermitian definite positive • Maximal-like solution • Maximal solution • Iteration method

Mathematics Subject Classification (MSC2020) Primary 47J05 • Secondary 47H14
Z. Jia (✉) The Research Institute of Mathematical Science, Jiangsu Normal University, Xuzhou, P.R. China School of Mathematics and Statistics, Jiangsu Normal University, Xuzhou, P.R. China e-mail: [email protected] L. Zhao School of Mathematics and Big Data, Dezhou University, Dezhou, P.R. China e-mail: [email protected] M. Zhao School of Mathematics and Statistics, Jiangsu Normal University, Xuzhou, P.R. China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_50
1 Introduction

Hermitian polynomial matrix equations have the general form
\[
X^s \pm \sum_{i=1}^{\ell} \delta_i A_i^* X^{t_i} A_i = Q, \tag{1.1}
\]
where $\ell$, $s$, $t_i$ are positive integers, $\delta_i \in \{-1, 0, 1\}$, $A_i, Q \in \mathbb{C}^{n \times n}$, and $Q$ is Hermitian positive definite. A basic characteristic of these equations is that the operator on the left-hand side is self-adjoint. From a practical point of view, the Hermitian positive definite (HPD) solutions of (1.1) reflect intrinsic physical properties, so their existence and computation are often the targets of research.

Many important linear or nonlinear matrix equations can be unified into the general form (1.1), such as continuous-time or discrete-time algebraic Riccati equations (CARE or DARE) [1–3], the generalized discrete-time algebraic Lyapunov equation (GDALE) [4, 5], and nonlinear matrix equations (NME) [6–8]. Among them, GDALE and CARE arise directly from the linear(-quadratic) control problem; many NMEs are introduced and studied by mathematicians based on interest and mathematical evolution. The solvability theory and algorithms of these matrix equations have been widely studied in [9–13] and [14]. The perturbation theory was well developed in [15–21], and the structured condition numbers were defined for the GDALE and NME in [5]. These topics have inspired many interesting works on nonlinear matrix equations [22–24].

However, there is still a lack of efficient methods to solve the Hermitian polynomial matrix equations (1.1). The solvability of (1.1) depends strongly on the values of the parameters $s$, $t_i$, $\delta_i$ and the spectral properties of $A_i$. These factors are often mixed together, making the analysis complex. This also makes it difficult to design a unified solution method. A possible approach is to use the alternating direction variable method, that is, to use the solution of the previous step to replace the unknown variables of some terms, and to solve a relatively simple nonlinear matrix equation at each iteration step. The framework is as follows: Let $Y$ be the solution from the previous iteration. Replace $X$ in several terms of (1.1) with $Y$, and move them to the right-hand side. One then obtains simpler Hermitian polynomial matrix equations, for instance,
\[
X^s \pm \delta_j A_j^* X^{t_j} A_j = Q \mp \sum_{i \neq j} \delta_i A_i^* Y^{t_i} A_i. \tag{1.2}
\]
Then compute the solution of (1.2), if it exists, and go to the next iteration. There are many choices for constructing the intermediate iterative nonlinear matrix equations. Two principles shall be followed: one is that the intermediate iterative nonlinear matrix equations have a solution and are easy to solve; the other is that the convergence of the whole iteration is guaranteed.

Under the above framework, we first review the solvability of two typical Hermitian polynomial matrix equations of the form (1.2):
\[
X^s + A^* X^t A = Q \tag{1.3}
\]
and
\[
X^s - A^* X^t A = Q, \tag{1.4}
\]
where $s$, $t$ are positive integers, $A, Q \in \mathbb{C}^{n \times n}$, and $Q > 0$; then we present a general method to solve (1.1). The core work falls on the solutions of (1.3) and (1.4), for which sufficient conditions will be proposed to guarantee their existence and uniqueness. It is worth mentioning that (1.3) and (1.4) themselves also have important applications to system and control theory and signal processing. When $s = t = 1$, they are exactly the famous discrete-time algebraic Lyapunov equations (DALEs) [3, 25].

The organization is as follows: In Section 2 we review the existence and uniqueness conditions of HPD solutions of (1.3) and minimal-like or maximal-like solutions of (1.4); a general method of solving (1.1) and algorithms are proposed in Section 3; numerical experiments are given in Section 4; and in Section 5 we present a conclusion.

Notation: For a Hermitian matrix $H \in \mathbb{C}^{n \times n}$, $\lambda_1(H) \ge \lambda_2(H) \ge \cdots \ge \lambda_n(H)$ denote its eigenvalues in descending order. For a matrix $A \in \mathbb{C}^{m \times n}$, $\sigma_1(A) \ge \sigma_2(A) \ge \cdots \ge \sigma_l(A) \ge 0$ denote the singular values, where $l = \min\{m, n\}$. For $X = X^*$ and $Y = Y^*$, $X \ge Y$ ($X > Y$) means that $X - Y$ is positive semi-definite (definite). $[\alpha I, \beta I]$ denotes the matrix set $\{X \mid X - \alpha I \ge 0 \text{ and } \beta I - X \ge 0\}$.
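The alternating framework around (1.2) can be illustrated in the scalar case $n = 1$ with $\ell = 2$ and the "+" sign, where (1.1) reads $x^s + a_1^2 x^{t_1} + a_2^2 x^{t_2} = q$. The sketch below is only an illustration under these assumptions and is not the algorithm of Section 3: the function names, the bisection inner solver, the starting value, and the stopping rule are all hypothetical choices, and convergence is not guaranteed for arbitrary data.

```python
def solve_inner(s, t, a_sq, rhs, tol=1e-13):
    """Positive root of x**s + a_sq * x**t = rhs (rhs > 0), found by bisection.

    The left-hand side is increasing on [0, inf), is below rhs at x = 0
    and at least rhs at x = rhs**(1/s), so the root lies in that interval.
    """
    lo, hi = 0.0, rhs ** (1.0 / s)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid**s + a_sq * mid**t < rhs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def alternating_iteration(s, t1, a1_sq, t2, a2_sq, q, y0=0.0, steps=100, tol=1e-12):
    """Scalar analogue of (1.2): freeze the second term at the previous iterate y.

    Each step solves x**s + a1_sq * x**t1 = q - a2_sq * y**t2 for x > 0.
    """
    y = y0
    for _ in range(steps):
        rhs = q - a2_sq * y**t2
        if rhs <= 0:
            raise ValueError("intermediate equation has no positive solution")
        x = solve_inner(s, t1, a1_sq, rhs)
        if abs(x - y) < tol:
            return x
        y = x
    return y


# Example data (assumed for illustration): x**2 + x + 0.25 * x = 3.
x_star = alternating_iteration(s=2, t1=1, a1_sq=1.0, t2=1, a2_sq=0.25, q=3.0)
residual = x_star**2 + 1.0 * x_star + 0.25 * x_star - 3.0
print(x_star, residual)
```

In this example each intermediate equation has a unique positive root, which is the first of the two principles stated above; the second principle (convergence of the outer loop) happens to hold here because the frozen term is small relative to $q$.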
2 HPD Solutions of (1.3) and (1.4)
In this section, we analyze the solvability of (1.3) and (1.4). The simplest case, $n = 1$, is considered in Section 2.1. In Section 2.2 we derive sufficient conditions for the existence of an HPD solution of (1.3). In Section 2.3, (1.4) is studied in three different cases. We first recall two well-known results from [26] and [27].
Lemma 2.1 [26, Theorem 1.1] Let $M$ and $N$ be two Hermitian positive semi-definite matrices. If $M \ge N \ge 0$ and $0 \le r \le 1$, then $M^r \ge N^r$.
Lemma 2.2 [27, Theorem 2.1] Let $M$ and $N$ be positive operators on a Hilbert space $\mathcal{H}$. If $\alpha_1 I \ge M \ge \beta_1 I > 0$, $\alpha_2 I \ge N \ge \beta_2 I > 0$, and $0 < M \le N$, then for any $t \ge 1$,
$$M^t \le (\alpha_1/\beta_1)^{t-1} N^t, \qquad M^t \le (\alpha_2/\beta_2)^{t-1} N^t. \tag{2.1}$$
2.1 The Corresponding One-Dimensional Problems
If $n = 1$, $A = a \in \mathbb{C}$, $Q = q \in \mathbb{R}$, and $q > 0$, then (1.3) and (1.4) reduce, respectively, to the scalar polynomial equations
$$x^s + |a|^2 x^t = q \tag{2.2}$$
and
$$x^s - |a|^2 x^t = q. \tag{2.3}$$
Let $f_1(x) = x^s + |a|^2 x^t - q$ and $f_2(x) = x^s - |a|^2 x^t - q$.
If $|a| = 0$, then $f_1(x)$, as well as $f_2(x)$, has a unique positive root $\alpha = q^{1/s}$.
If $|a| > 0$, then $f_1(x)$ has a unique positive root $\alpha \in (0, \min\{q^{1/s}, (|a|^{-2} q)^{1/t}\})$. For $f_2(x)$, the situation is slightly more complicated. Notice that when $s \ne t$, $\beta = \left(\frac{t}{s}|a|^2\right)^{1/(s-t)}$ is the only positive stationary point of $f_2(x)$.
(1) If $t > s > 0$, then: when $|a|^2 < \frac{s}{t}\left(1 - \frac{s}{t}\right)^{(t-s)/s} q^{(s-t)/s}$, $f_2(x)$ has two positive roots $\alpha_1$ and $\alpha_2$, with $q^{1/s} < \alpha_1 < \beta < \alpha_2 < |a|^{2/(s-t)}$; when $|a|^2 = \frac{s}{t}\left(1 - \frac{s}{t}\right)^{(t-s)/s} q^{(s-t)/s}$, $f_2(x)$ has a unique positive root $\alpha = \beta$; when $|a|^2 > \frac{s}{t}\left(1 - \frac{s}{t}\right)^{(t-s)/s} q^{(s-t)/s}$, $f_2(x)$ has no positive root.
(2) If $s > t > 0$, $f_2(x)$ has a unique positive root $\alpha \in [q^{1/s}, +\infty)$.
(3) If $t = s > 0$, then: when $|a|^2 < 1$, $f_2(x)$ has a unique positive root $\alpha = [q(1 - |a|^2)^{-1}]^{1/s}$; when $|a|^2 \ge 1$, $f_2(x)$ has no positive root.
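The case analysis above can be checked numerically. The following is a minimal Python sketch, not taken from the chapter: the helper names `bisect_root` and `positive_roots_f2` are hypothetical, and plain bisection is an assumed root finder chosen for simplicity. It brackets each positive root of $f_2$ using the bounds stated in cases (1)–(3).

```python
def bisect_root(f, lo, hi, iters=200):
    """Bisection on [lo, hi]; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0:
            return mid
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)


def positive_roots_f2(a, q, s, t):
    """Positive roots of f2(x) = x**s - |a|**2 * x**t - q, assuming q > 0."""
    c = abs(a) ** 2
    f2 = lambda x: x ** s - c * x ** t - q
    if c == 0:
        return [q ** (1.0 / s)]
    if s == t:                      # case (3)
        return [(q / (1.0 - c)) ** (1.0 / s)] if c < 1 else []
    if s > t:                       # case (2): one root in [q**(1/s), +inf)
        lo = q ** (1.0 / s)         # f2(lo) < 0 here
        hi = 2.0 * lo
        while f2(hi) <= 0:
            hi *= 2.0
        return [bisect_root(f2, lo, hi)]
    # case (1): t > s, beta is the positive stationary point of f2
    beta = (t * c / s) ** (1.0 / (s - t))
    if f2(beta) < 0:
        return []
    if f2(beta) == 0:
        return [beta]
    upper = c ** (1.0 / (s - t))    # f2(upper) = -q < 0
    return [bisect_root(f2, q ** (1.0 / s), beta),
            bisect_root(f2, beta, upper)]
```

For example, with $a = 0.4$, $q = 1$, $s = 1$, $t = 2$ the equation is $x - 0.16x^2 = 1$, whose two positive roots $1.25$ and $5$ lie on either side of $\beta = 3.125$.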
2.2 HPD Solutions of (1.3)
Define
$$f_1(x) = x^s + \lambda_n(A^*A)x^t - \lambda_1(Q) \tag{2.4}$$
and
$$f_2(x) = x^s + \lambda_1(A^*A)x^t - \lambda_n(Q). \tag{2.5}$$
Let $\alpha_1 \in (0, (\lambda_1(Q))^{1/s}]$ and $\beta_1 \in (0, (\lambda_n(Q))^{1/s}]$ be the unique positive roots of $f_1(x)$ and $f_2(x)$, respectively.
Lemma 2.3 [28, Theorem 3.3.16] Let $M$ and $N$ be two $m$-by-$n$ matrices and let $q = \min\{m, n\}$. The following inequalities hold for the decreasingly ordered singular values of $M$, $N$, $M + N$, and $MN^*$:
(1) $\sigma_{i+j-1}(M + N) \le \sigma_i(M) + \sigma_j(N)$,
(2) $\sigma_{i+j-1}(MN^*) \le \sigma_i(M)\sigma_j(N)$,
for $1 \le i, j \le q$ and $i + j \le q + 1$. In particular,
(3) $|\sigma_i(M + N) - \sigma_i(M)| \le \sigma_1(N)$,
(4) $\sigma_i(MN^*) \le \sigma_i(M)\sigma_1(N)$,
for $i = 1, \cdots, q$.
Theorem 2.4 If $X \in \mathbb{C}^{n \times n}$ is an HPD solution of (1.3), then
$$\beta_1 \le \lambda(X) \le \alpha_1. \tag{2.6}$$
Proof From Lemma 2.3(4), we have
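$$\sigma_i(A^*X^tA) \le \sigma_i(X^t)(\sigma_1(A))^2, \quad \text{i.e.,} \quad \lambda_i(A^*X^tA) \le \lambda_i(X^t)\lambda_1(A^*A). \tag{2.7}$$
In the case that $A$ is invertible,
$$\sigma_i(X^t) = \sigma_i((A^{-1})^*A^*X^tAA^{-1}) \le (\sigma_{\min}(A))^{-2}\sigma_i(A^*X^tA).$$
This yields
$$\sigma_i(A^*X^tA) \ge \sigma_i(X^t)\sigma_{\min}(A^*A), \quad \text{i.e.,} \quad \lambda_i(A^*X^tA) \ge \lambda_i(X^t)\lambda_n(A^*A). \tag{2.8}$$
In fact, (2.8) still holds when $A$ is singular, since then $\lambda_n(A^*A) = 0$. By Weyl's theorem in [29], $X^s = Q - A^*X^tA$ implies
$$\lambda(X^s) \le \lambda_1(Q) - \lambda(A^*X^tA) \quad \text{and} \quad \lambda(X^s) \ge \lambda_n(Q) - \lambda(A^*X^tA). \tag{2.9}$$
Then from inequalities (2.7), (2.8), and (2.9), we have
$$\lambda(X^s) \le \lambda_1(Q) - \lambda_n(A^*A)\lambda(X^t) \quad \text{and} \quad \lambda(X^s) \ge \lambda_n(Q) - \lambda_1(A^*A)\lambda(X^t).$$
That means
$$\lambda(X)^s \le \lambda_1(Q) - \lambda_n(A^*A)\lambda(X)^t, \quad \text{i.e.,} \quad f_1(\lambda(X)) \le 0;$$
$$\lambda(X)^s \ge \lambda_n(Q) - \lambda_1(A^*A)\lambda(X)^t, \quad \text{i.e.,} \quad f_2(\lambda(X)) \ge 0.$$
Then (2.6) is obtained. □
Theorem 2.5 If $X$ is an HPD solution of (1.3), then
$$[\lambda_n(Q) - \lambda_1(X)^s]\lambda_1(X)^{-t} \le |\lambda(A)|^2 \le [\lambda_1(Q) - \lambda_n(X)^s]\lambda_n(X)^{-t}. \tag{2.10}$$
Proof Let $y$ be an eigenvector of unit norm corresponding to $\lambda(A)$. Then $y^*X^sy + |\lambda(A)|^2 y^*X^ty = y^*Qy$. So we have $\lambda_n(X)^s + |\lambda(A)|^2\lambda_n(X)^t \le \lambda_1(Q)$ and $\lambda_1(X)^s + |\lambda(A)|^2\lambda_1(X)^t \ge \lambda_n(Q)$. This implies (2.10). □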
Based on the above results, we analyze the solvability of (1.3).
Theorem 2.6 If $\lambda_1(A^*A) \le \lambda_n(Q)(\lambda_1(Q))^{-t/s}$, then (1.3) has an HPD solution $X \in [\beta_1 I, \alpha_1 I]$. Furthermore, the solution $X$ is unique if $\lambda_1(A^*A) < (s\beta_1^{s-1})(t\alpha_1^{t-1})^{-1}$.
Proof
(1) At first, we prove the existence of a solution. Define $F(X) = (Q - A^*X^tA)^{1/s}$ with $X \in [0, (\lambda_1(Q))^{1/s} I]$. Applying Lemma 2.2, we can derive $X^t \le (\lambda_1(Q))^{t/s} I$ from the facts $\sigma_1(X)I \ge X \ge \sigma_n(X)I > 0$ and $0 < X \le (\lambda_1(Q))^{1/s} I$. From Lemma 2.1,
$$F(X) \le Q^{1/s} \le (\lambda_1(Q))^{1/s} I. \tag{2.11}$$
The condition $\lambda_1(A^*A) \le \lambda_n(Q)(\lambda_1(Q))^{-t/s}$ implies
$$F(X) \ge (\lambda_n(Q) - \lambda_1(A^*A)\lambda_1(X^t))^{1/s} I \tag{2.12}$$
$$\ge [\lambda_n(Q) - \lambda_1(A^*A)(\lambda_1(Q))^{t/s}]^{1/s} I \ge 0. \tag{2.13}$$
The first assertion follows from Brouwer's fixed-point theorem and Theorem 2.4, by applying (2.11), (2.12), and (2.13).
(2) Now we prove the uniqueness of the solution $X$. Assume (1.3) has two HPD solutions $X \in \mathbb{C}^{n \times n}$ and $Y \in \mathbb{C}^{n \times n}$ on $[\beta_1 I, \alpha_1 I]$, and $\|X - Y\|_F > 0$. Then $X^s - Y^s = A^*(Y^t - X^t)A$ and
$$\|X^s - Y^s\|_F \le \|A\|_2^2 \|Y^t - X^t\|_F. \tag{2.14}$$
One can derive that
$$X^s - Y^s = \sum_{k=0}^{s-1} X^k (X - Y) Y^{s-1-k} \tag{2.15}$$
and
$$\|X^s - Y^s\|_F = \Big\| \Big[ \sum_{k=0}^{s-1} (Y^{s-1-k})^{T} \otimes X^k \Big] \operatorname{vec}(X - Y) \Big\|_2. \tag{2.16}$$
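Suppose $X$ and $Y$ have the spectral decompositions $X = U^*\Lambda_x U$ and $Y = V^*\Lambda_y V$, where $U, V \in \mathbb{C}^{n \times n}$ are unitary, $\Lambda_x = \operatorname{diag}(\lambda_1(X), \cdots, \lambda_n(X))$, and $\Lambda_y = \operatorname{diag}(\lambda_1(Y), \cdots, \lambda_n(Y))$. So
$$\sum_{k=0}^{s-1} (Y^{s-1-k})^{T} \otimes X^k = \sum_{k=0}^{s-1} (V^{T}\Lambda_y^{s-1-k}\overline{V}) \otimes (U^*\Lambda_x^k U) = (\overline{V} \otimes U)^* \Big[ \sum_{k=0}^{s-1} \Lambda_y^{s-1-k} \otimes \Lambda_x^k \Big] (\overline{V} \otimes U).$$
It is observed that $\sum_{k=0}^{s-1} (Y^{s-1-k})^{T} \otimes X^k$ is HPD and its eigenvalues are
$$\sum_{k=0}^{s-1} \lambda_j(X)^k \lambda_i(Y)^{s-1-k}, \quad 1 \le i, j \le n.$$
The assumption $X, Y \in [\beta_1 I, \alpha_1 I]$ implies
$$\|X^s - Y^s\|_F \ge s\beta_1^{s-1}\|X - Y\|_F \quad \text{and} \quad \|X^t - Y^t\|_F \le t\alpha_1^{t-1}\|X - Y\|_F. \tag{2.17}$$
Under the further condition $\lambda_1(A^*A) < (s\beta_1^{s-1})(t\alpha_1^{t-1})^{-1}$, one can derive from (2.14) and (2.17) that
$$\|X - Y\|_F \le t\alpha_1^{t-1}(s\beta_1^{s-1})^{-1}\lambda_1(A^*A)\|X - Y\|_F < \|X - Y\|_F.$$
This is a contradiction, so the solution is unique. □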
In the above analysis, we have presented a sufficient condition for the existence and uniqueness of the solution of (1.3). This condition is easy to check and becomes simple in special cases. For instance, when $s = 2$, $t = 1$, and $Q = I$, (1.3) has a unique HPD solution if $\lambda_1(A^*A) \le 1$. This is because $2\beta_1 = (\lambda_1(A^*A)^2 + 4)^{1/2} - \lambda_1(A^*A) > \lambda_1(A^*A)$. These results still hold if $A$ and $Q$ are real matrices and the solution is symmetric positive definite. We refer to [8] for the details.
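The special case $s = 2$, $t = 1$, $Q = I$ can be worked out explicitly from (2.5) and Theorem 2.6. The derivation below is a sketch added for illustration; the shorthand $\mu = \lambda_1(A^*A)$ is introduced only here and does not appear in the chapter.

```latex
% Special case s = 2, t = 1, Q = I, with \mu := \lambda_1(A^*A) (shorthand used only here).
\begin{align*}
&\text{Root of (2.5):} &
  f_2(x) = x^2 + \mu x - 1 = 0
  &\;\Longrightarrow\; \beta_1 = \tfrac{1}{2}\bigl(\sqrt{\mu^2 + 4} - \mu\bigr), \\
&\text{Existence:} &
  \mu \le \lambda_n(Q)\,\lambda_1(Q)^{-t/s} &= 1, \\
&\text{Uniqueness:} &
  \mu < (s\beta_1^{s-1})(t\alpha_1^{t-1})^{-1} = 2\beta_1
  &\;\Longleftrightarrow\; 2\mu < \sqrt{\mu^2 + 4}
  \;\Longleftrightarrow\; \mu < 2/\sqrt{3}.
\end{align*}
```

Since $1 < 2/\sqrt{3} \approx 1.155$, the existence condition $\mu \le 1$ already implies the uniqueness condition in this case.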
2.3 HPD Solutions of (1.4)
The analysis of the solvability of (1.4) is more difficult. Three different situations need to be distinguished, and the solutions are also divided into different kinds, such as the maximal solution, the maximal-like solution, and the minimal-like solution. In this part, $A$ is always assumed to be invertible, that is, $\lambda_n(A^*A) > 0$.
2.3.1 Case 1: s < t.
Define
$$f_3(x) = x^s - \lambda_1(A^*A)x^t - \lambda_1(Q) \tag{2.18}$$
and
$$f_4(x) = x^s - \lambda_n(A^*A)x^t - \lambda_n(Q), \tag{2.19}$$
where $A, Q \in \mathbb{C}^{n \times n}$, $A^*A > 0$, and $Q > 0$. If $s < t$ and $\lambda_1(A^*A) \le \frac{s}{t}\left(\frac{t}{t-s}\lambda_1(Q)\right)^{(s-t)/s}$, $f_3(x)$ has two roots $\alpha_1, \alpha_2 \in [(\lambda_1(Q))^{1/s}, (\lambda_1(A^*A))^{1/(s-t)}]$. If $\lambda_n(A^*A) \le \frac{s}{t}\left[\frac{t}{t-s}\lambda_n(Q)\right]^{(s-t)/s}$, $f_4(x)$ has two roots $\beta_1, \beta_2 \in [(\lambda_n(Q))^{1/s}, (\lambda_n(A^*A))^{1/(s-t)}]$. Moreover, if $\lambda_1(A^*A) \le \frac{s}{t}\left(\frac{t}{t-s}\lambda_1(Q)\right)^{(s-t)/s}$, then $0 < (\lambda_n(Q))^{1/s} \le \beta_1 \le \alpha_1 \le \alpha_2 \le \beta_2 \le (\lambda_n(A^*A))^{1/(s-t)}$.
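Theorem 2.7 If $\lambda_1(A^*A) \le \frac{s}{t}\left(\frac{t}{t-s}\lambda_1(Q)\right)^{(s-t)/s}$ and $X \in \mathbb{C}^{n \times n}$ is an HPD solution of (1.4), then
$$\beta_1 \le \lambda(X) \le \alpha_1 \quad \text{or} \quad \alpha_2 \le \lambda(X) \le \beta_2. \tag{2.20}$$
Proof From (2.7) and (2.8), $X^s = Q + A^*X^tA$ implies that $f_3(\lambda(X)) \le 0$ and $f_4(\lambda(X)) \ge 0$, from which we obtain (2.20). □
Theorem 2.8 If $\lambda_1(A^*A) \le \frac{s}{t}\left(\frac{t}{t-s}\lambda_1(Q)\right)^{(s-t)/s}$, then the following two assertions hold:
(1) (1.4) has an HPD solution on $[\beta_1 I, \alpha_1 I]$ and, furthermore, if $\lambda_n(A^*A) > s\alpha_1^{s-1}(t\beta_1^{t-1})^{-1}$, then the solution is unique.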
(2) (1.4) has an HPD solution on $[\alpha_2 I, \beta_2 I]$ and, furthermore, if $\lambda_n(A^*A) > s\beta_2^{s-1}(t\alpha_2^{t-1})^{-1}$, then the solution is unique.
Proof (1) At first, we prove the existence. Define $F_1(X) = (Q + A^*X^tA)^{1/s}$ for $X \in [(\lambda_n(Q))^{1/s} I, (s/(\lambda_1(A^*A)t))^{1/(t-s)} I]$. Applying Lemmas 2.1 and 2.2, we derive
$$(\lambda_n(Q))^{1/s} I \le F_1(X) \le \{\lambda_1(Q) + \lambda_1(A^*A)[s/(\lambda_1(A^*A)t)]^{t/(t-s)}\}^{1/s} I \le [s/(\lambda_1(A^*A)t)]^{\frac{s}{t-s} \times \frac{1}{s}} I = [s/(\lambda_1(A^*A)t)]^{1/(t-s)} I.$$
$F_1(X)$ has a fixed point $X$ on $[(\lambda_n(Q))^{1/s} I, (s/(\lambda_1(A^*A)t))^{1/(t-s)} I]$ by Brouwer's fixed-point theorem. Moreover, we know $X \in [\beta_1 I, \alpha_1 I]$ by applying Theorem 2.7.
Now we prove the uniqueness. Suppose (1.4) has another HPD solution $Y \in [(\lambda_n(Q))^{1/s} I, (s/(\lambda_1(A^*A)t))^{1/(t-s)} I]$ and $Y \ne X$. One can derive that
$$\|X^t - Y^t\|_F = \|(A^{-1})^*(X^s - Y^s)A^{-1}\|_F \le (\lambda_n(A^*A))^{-1}\|X^s - Y^s\|_F. \tag{2.21}$$
Then $\|X^s - Y^s\|_F \le s\alpha_1^{s-1}\|X - Y\|_F$ and $\|X^t - Y^t\|_F \ge t\beta_1^{t-1}\|X - Y\|_F$ imply
$$\|X - Y\|_F \le s\alpha_1^{s-1}[t\beta_1^{t-1}\lambda_n(A^*A)]^{-1}\|X - Y\|_F < \|X - Y\|_F.$$
This is a contradiction. Hence $X = Y$.
(2) Define $F_2(Z) = [(A^{-1})^*(Z^s - Q)A^{-1}]^{1/t}$, $Z \in [\alpha_2 I, \beta_2 I]$. We can see that $F_2(Z)$ is continuous. Since $(A^{-1})^*(\alpha_2^s I - Q)A^{-1} \le (A^{-1})^*(Z^s - Q)A^{-1} \le (A^{-1})^*(\beta_2^s I - Q)A^{-1}$, we have
$$F_2(\alpha_2 I) \le F_2(Z) \le F_2(\beta_2 I). \tag{2.22}$$
By Lemmas 2.1 and 2.2 and Brouwer's fixed-point theorem, we only need to prove $F_2(\alpha_2 I) \ge \alpha_2 I$ and $F_2(\beta_2 I) \le \beta_2 I$. The existence of the HPD solution $Z \in [\alpha_2 I, \beta_2 I]$ follows from the inequalities
$$F_2(\alpha_2 I) = [(A^{-1})^*(\alpha_2^s I - Q)A^{-1}]^{1/t} \ge [(A^{-1})^*(\alpha_2^s I - \lambda_1(Q)I)A^{-1}]^{1/t} \ge [(\lambda_1(A^*A))^{-1}(\alpha_2^s I - \lambda_1(Q)I)]^{1/t} = \alpha_2 I \tag{2.23}$$
and
$$F_2(\beta_2 I) = [(A^{-1})^*(\beta_2^s I - Q)A^{-1}]^{1/t} \le [(A^{-1})^*(\beta_2^s I - \lambda_n(Q)I)A^{-1}]^{1/t} \le [(\lambda_n(A^*A))^{-1}(\beta_2^s I - \lambda_n(Q)I)]^{1/t} = \beta_2 I. \tag{2.24}$$
Assume that (1.4) has another HPD solution $Y$ on $[\alpha_2 I, \beta_2 I]$ with $Y \ne Z$. Then
$$\|Z^t - Y^t\|_F = \|(A^{-1})^*(Z^s - Y^s)A^{-1}\|_F \le (\lambda_n(A^*A))^{-1}\|Z^s - Y^s\|_F \le (\lambda_n(A^*A))^{-1}s\beta_2^{s-1}\|Z - Y\|_F.$$
If $\lambda_n(A^*A) > s\beta_2^{s-1}(t\alpha_2^{t-1})^{-1}$, the inequality $\|Z^t - Y^t\|_F \ge t\alpha_2^{t-1}\|Z - Y\|_F$ implies
$$\|Z - Y\|_F \le (t\alpha_2^{t-1}\lambda_n(A^*A))^{-1}s\beta_2^{s-1}\|Z - Y\|_F < \|Z - Y\|_F.$$
This is a contradiction, so the assumption does not hold and the uniqueness of $Z$ is proved. □
When the maximal or minimal solution (see, e.g., [30] and [31]) does not exist, we consider a maximal-like or minimal-like solution, defined as follows:
Definition 2.9 Suppose that $X_M$ and $X_L$ are HPD solutions of (1.4) on $[\alpha I, \beta I]$ with $\beta \ge \alpha > 0$. Then $X_M$ is a maximal-like solution if $\lambda_1(X_M) \ge \lambda_1(Y)$, and $X_L$ is a minimal-like solution if $\lambda_1(X_L) \le \lambda_1(Y)$, where $Y \in [\alpha I, \beta I]$ is an arbitrary HPD solution of (1.4).
From Definition 2.9 we can see that if (1.4) is solvable, then there always exist a minimal-like solution and a maximal-like solution. So Theorem 2.8 implies the following theorem directly:
Theorem 2.10 If $\lambda_1(A^*A) \le \frac{s}{t}\left(\frac{t}{t-s}\lambda_1(Q)\right)^{(s-t)/s}$, (1.4) has a minimal-like solution $X_L$ and a maximal-like solution $X_M$ on $[\beta_1 I, \beta_2 I]$.
In the next theorem, we present a sufficient condition under which (1.4) has a maximal solution.
Theorem 2.11 Suppose that $\lambda_1(A^*A) \le \frac{s}{t}\left(\frac{t}{t-s}\lambda_1(Q)\right)^{(s-t)/s}$.
(1) If $\lambda_n(A^*A) > s\alpha_1^{s-1}(t\beta_1^{t-1})^{-1}$, (1.4) has a unique minimal-like solution $X_L \in [\beta_1 I, \alpha_1 I]$, which can be computed by
$$X_i = (Q + A^*X_{i-1}^tA)^{1/s}, \quad i = 1, 2, \cdots \tag{2.25}$$
with an initial value $X_0 \in [\beta_1 I, \alpha_1 I]$.
(2) If $\lambda_n(A^*A) > s\beta_2^{s-1}(t\alpha_2^{t-1})^{-1}$, (1.4) has a maximal solution $X_{\max} \in [\alpha_2 I, \beta_2 I]$, which can be computed by
$$X_i = [(A^{-1})^*(X_{i-1}^s - Q)A^{-1}]^{1/t}, \quad i = 1, 2, \cdots \tag{2.26}$$
with the initial value $X_0 = \beta_2 I$.
Proof (1). From Theorem 2.8 (1) and Theorem 2.10, there is a unique minimal-like solution on $[\beta_1 I, \alpha_1 I]$. We now prove the convergence of $\{X_0, X_1, X_2, \cdots\}$ generated by (2.25). Set $X_0 \in [\beta_1 I, \alpha_1 I]$. Recall the following facts:
$$\beta_1 I = [\lambda_n(A^*A)^{-1}(\beta_1^s I - \lambda_n(Q)I)]^{1/t} \le [(A^{-1})^*(\beta_1^s I - Q)A^{-1}]^{1/t},$$
$$\alpha_1 I = [\lambda_1(A^*A)^{-1}(\alpha_1^s I - \lambda_1(Q)I)]^{1/t} \ge [(A^{-1})^*(\alpha_1^s I - Q)A^{-1}]^{1/t},$$
and
$$[(A^{-1})^*(\beta_1^s I - Q)A^{-1}]^{1/t} \ge [(A^{-1})^*(\beta_1^s I - \lambda_n(Q)I)A^{-1}]^{1/t} \ge [\lambda_n(A^*A)^{-1}(\beta_1^s - \lambda_n(Q))]^{1/t}.$$
Similar to (2.23), we can deduce that if $X_{i-1} \in [\beta_1 I, \alpha_1 I]$, then
$$\beta_1 I \le (A^{-1})^*(\beta_1^s I - Q)A^{-1} \le (A^{-1})^*(X_{i-1}^s - Q)A^{-1} \le (A^{-1})^*(\alpha_1^s I - Q)A^{-1} \le \alpha_1 I.$$
Let $F_2(Z) = [(A^{-1})^*(Z^s - Q)A^{-1}]^{1/t}$, $Z \in [\alpha_2 I, \beta_2 I]$. $F_2(Z)$ is continuous, and
$$F_2(\alpha_2 I) \le F_2(Z) \le F_2(\beta_2 I), \tag{2.27}$$
because $(A^{-1})^*(\alpha_2^s I - Q)A^{-1} \le (A^{-1})^*(Z^s - Q)A^{-1} \le (A^{-1})^*(\beta_2^s I - Q)A^{-1}$. By Lemma 2.1, Lemma 2.2, and Brouwer's fixed-point theorem, it is sufficient to prove $F_2(\alpha_2 I) \ge \alpha_2 I$ and $F_2(\beta_2 I) \le \beta_2 I$ for an HPD solution $Z \in [\alpha_2 I, \beta_2 I]$ to exist. The existence of such $Z$ follows from the inequalities
$$F_2(\alpha_2 I) = [(A^{-1})^*(\alpha_2^s I - Q)A^{-1}]^{1/t} \ge [(A^{-1})^*(\alpha_2^s I - \lambda_1(Q)I)A^{-1}]^{1/t} \ge [(\lambda_1(A^*A))^{-1}(\alpha_2^s I - \lambda_1(Q)I)]^{1/t} = \alpha_2 I$$
and
$$F_2(\beta_2 I) = [(A^{-1})^*(\beta_2^s I - Q)A^{-1}]^{1/t} \le [(A^{-1})^*(\beta_2^s I - \lambda_n(Q)I)A^{-1}]^{1/t} \le [(\lambda_n(A^*A))^{-1}(\beta_2^s I - \lambda_n(Q)I)]^{1/t} = \beta_2 I.$$
Next we prove the uniqueness of $Z$ under the additional condition that $\lambda_n(A^*A) > s\beta_2^{s-1}(t\alpha_2^{t-1})^{-1}$. Suppose (1.4) has two different HPD solutions $Z$ and $Y$ on $[\alpha_2 I, \beta_2 I]$. Then
$$\|Z^t - Y^t\|_F = \|(A^{-1})^*(Z^s - Y^s)A^{-1}\|_F \le (\lambda_n(A^*A))^{-1}\|Z^s - Y^s\|_F \le (\lambda_n(A^*A))^{-1}s\beta_2^{s-1}\|Z - Y\|_F.$$
Moreover, if $\lambda_n(A^*A) > s\beta_2^{s-1}(t\alpha_2^{t-1})^{-1}$, applying the inequality $\|Z^t - Y^t\|_F \ge t\alpha_2^{t-1}\|Z - Y\|_F$, we have
$$\|Z - Y\|_F \le (t\alpha_2^{t-1}\lambda_n(A^*A))^{-1}s\beta_2^{s-1}\|Z - Y\|_F < \|Z - Y\|_F,$$
which is impossible. Hence, $Y = Z$.
Suppose that $X$ is an arbitrary HPD solution of (1.4). From Theorem 2.7, $X_0 = \beta_1 I \le X$ and then $X_0^t = \beta_1^t I \le X^t$. Assume that $X_{i-1}^t \le X^t$; then from $s < t$,
$$X_i^t = (A^{-1})^*(X_{i-1}^s - Q)A^{-1} = (A^{-1})^*[(X_{i-1}^t)^{s/t} - Q]A^{-1} \le (A^{-1})^*[(X^t)^{s/t} - Q]A^{-1} = (A^{-1})^*(X^s - Q)A^{-1} = X^t.$$
Then we have $X_L^t = \lim_{i \to +\infty} X_i^t \le X^t$, which implies that $\lambda_1(X_L^t) \le \lambda_1(X^t)$. So we have $\lambda_1(X_L) \le \lambda_1(X)$.
(2). For any HPD solution $X$ of (1.4), $X_0 = \beta_2 I \ge X$ and $X_0^t = \beta_2^t I \ge X^t$. Assume that $X_{i-1}^t \ge X^t$; then from $s < t$,
$$X_i^t = (A^{-1})^*[(X_{i-1}^t)^{s/t} - Q]A^{-1} \ge (A^{-1})^*[(X^t)^{s/t} - Q]A^{-1} = X^t.$$
Then we have $X_M^t = \lim_{i \to +\infty} X_i^t \ge X^t$, which implies that $\lambda_1(X_M) \ge \lambda_1(X)$. Moreover, if $\lambda_n(A^*A) > s\beta_2^{s-1}(t\alpha_2^{t-1})^{-1}$, then (1.4) has a unique HPD solution on $[\alpha_2 I, \beta_2 I]$, denoted by $Y$. Then $Y^t \le X_M^t \le \beta_2^t I$, which means $\lambda_n(X_M) \ge \lambda_n(Y) \ge \alpha_2$ and $\lambda_1(X_M) \le \beta_2$. So $X_M \in [\alpha_2 I, \beta_2 I]$ and $X_M = Y$. □
In particular, when $t > s = 1$, there exist minimal and maximal solutions of (1.4) under the conditions of Theorem 2.8. From Theorems 2.8 and 2.11, we have:
Theorem 2.12 If $t > s = 1$, $A \in \mathbb{C}^{n \times n}$ is nonsingular, and $\lambda_1(A^*A) \le \frac{1}{t}\left[\frac{t-1}{t\lambda_1(Q)}\right]^{t-1}$, then (1.4) has minimal and maximal solutions, denoted by $X_{\min}$ and $X_{\max}$, respectively.
(1). Let $X_0 = \beta_1 I$, assuming $\beta_1^s \ge \lambda_1(Q)$. Then $\{X_0, X_1, \cdots\}$ from (2.26) will converge to $X_{\min}$ and $X_{\min} \in [\beta_1 I, \alpha_1 I]$.
(2). Let $X_0 = \beta_2 I$. Then $\{X_0, X_1, \cdots\}$ from (2.26) will converge to $X_{\max}$ and $X_{\max} \in [\alpha_2 I, \beta_2 I]$.
Proof (1). From the proof of Theorem 2.11 (1), we only need to prove that $X_i \le X$ for any HPD solution $X$ of (1.4), which implies $X_{\min} = \lim_{i \to +\infty} X_i \le X$. Indeed, $X_0 \le X$, and assuming $X_{i-1} \le X$, we have
$$X_i = [(A^{-1})^*(X_{i-1} - Q)A^{-1}]^{1/t} \le [(A^{-1})^*(X - Q)A^{-1}]^{1/t} = X.$$
From Theorems 2.7 and 2.8 (1), it is easy to prove that $X_{\min} \in [\beta_1 I, \alpha_1 I]$.
(2). Similar to the proof of (1). □
Now we consider (1.4) when $t > s = 1$, $A$ is nonsingular, and $Q = I$. From the above analysis, if $\lambda_1(A^*A) \le (t-1)^{t-1}/t^t$, then $1 \le \beta_1 \le \alpha_1 \le \alpha_2 \le \lambda_1(A^*A)^{1/(1-t)}$ and $\alpha_2 \le \beta_2 \le \lambda_n(A^*A)^{1/(1-t)}$, and (1.4) has minimal and maximal solutions. From Theorem 2.11, we have:
Corollary 2.13 Suppose that $t > s = 1$, $A$ is nonsingular, $Q = I$, and $\lambda_1(A^*A) \le (t-1)^{t-1}/t^t$; then (1.4) has minimal and maximal solutions, denoted by $X_{\min}$ and $X_{\max}$, respectively.
(1) Let $X_0 = \beta_1 I$; then $\{X_0, X_1, \cdots\}$ from
$$X_i = [(A^{-1})^*(X_{i-1} - I)A^{-1}]^{1/t}, \quad i = 1, 2, \cdots, \tag{2.28}$$
will converge to $X_{\min}$ and $X_{\min} \in [\beta_1 I, \alpha_1 I]$.
(2) Let $X_0 = \beta_2 I$; then $\{X_0, X_1, \cdots\}$ from (2.28) will converge to $X_{\max}$ and $X_{\max} \in [\alpha_2 I, \beta_2 I]$.
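Part (2) of Corollary 2.13 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names `herm_power`, `larger_root_f4`, and `maximal_solution` are hypothetical, $\beta_2$ is located by bisection (an assumed choice), and the stopping rule on successive iterates is likewise an assumption.

```python
import numpy as np


def herm_power(H, p):
    """H**p for a Hermitian positive semi-definite H, via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    w = np.clip(w, 0.0, None)              # guard against tiny negative round-off
    return (V * w ** p) @ V.conj().T


def larger_root_f4(lam_n, t, iters=200):
    """Larger root beta_2 of f4(x) = x - lam_n * x**t - 1 (case s = 1, Q = I)."""
    f4 = lambda x: x - lam_n * x ** t - 1.0
    lo = (t * lam_n) ** (1.0 / (1 - t))    # stationary point; f4(lo) >= 0 is assumed
    hi = lam_n ** (1.0 / (1 - t))          # f4(hi) = -1 < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f4(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def maximal_solution(A, t, tol=1e-12, max_iter=500):
    """Iterate (2.28) from X0 = beta_2 * I for the equation X - A^* X^t A = I."""
    n = A.shape[0]
    I = np.eye(n)
    lam_n = np.linalg.eigvalsh(A.conj().T @ A)[0]   # smallest eigenvalue of A^* A
    A_inv = np.linalg.inv(A)
    X = larger_root_f4(lam_n, t) * I
    for _ in range(max_iter):
        M = A_inv.conj().T @ (X - I) @ A_inv
        X_new = herm_power(0.5 * (M + M.conj().T), 1.0 / t)
        done = np.linalg.norm(X_new - X, "fro") < tol
        X = X_new
        if done:
            break
    return X
```

With the diagonal test matrix $A = \operatorname{diag}(0.4, 0.3)$ and $t = 2$, the equation decouples into two scalar quadratics whose larger roots are $5$ and $10$, so the iterates should approach $\operatorname{diag}(5, 10)$.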
2.3.2 Case 2: s > t > 0.
Denote
$$\tilde{m}_1 = \max\{(\lambda_1(Q))^{1/s}, (\lambda_1(A^*A))^{1/(s-t)}\} \tag{2.29}$$
and
$$\tilde{m}_2 = \max\{(\lambda_n(Q))^{1/s}, (\lambda_n(A^*A))^{1/(s-t)}\}. \tag{2.30}$$
Recalling Section 2.1 and formulas (2.18) and (2.19), if $s > t > 0$, $f_3(x)$ has a unique positive root $\gamma_1 \ge \tilde{m}_1$ and $f_4(x)$ has a unique positive root $\gamma_2 \ge \tilde{m}_2$. If (1.4) has an HPD solution $X$, then $\gamma_2 \le \lambda(X) \le \gamma_1$. So we always suppose that $\gamma_1 \ge \gamma_2$. In the following theorem, we show that this condition is also sufficient for the existence of an HPD solution of (1.4) if $s > t > 0$.
Theorem 2.14 Suppose that $s > t > 0$; then (1.4) has an HPD solution on $[\gamma_2 I, \gamma_1 I]$ if and only if $\gamma_1 \ge \gamma_2$.
(1) Let $X_0 = \gamma_2 I$. Then $\{X_0, X_1, \cdots\}$ from
$$X_i = (Q + A^*X_{i-1}^tA)^{1/s}, \quad i = 1, 2, \cdots \tag{2.31}$$
will converge to a minimal-like solution of (1.4).
(2) Let $X_0 = \gamma_1 I$. Then $\{X_0, X_1, \cdots\}$ from (2.31) will converge to a maximal-like solution of (1.4).
Proof If (1.4) has an HPD solution, it is obvious that $\gamma_1 \ge \gamma_2$. Conversely, if $\gamma_1 \ge \gamma_2$, define the matrix function $F(X) = (Q + A^*X^tA)^{1/s}$ on $[\gamma_2 I, \gamma_1 I]$. Since $F(X)$ is continuous, we only need to show $F(\gamma_2 I) \ge \gamma_2 I$ and $F(\gamma_1 I) \le \gamma_1 I$. Indeed,
$$F(\gamma_2 I) = (Q + \gamma_2^t A^*A)^{1/s} \ge (\lambda_n(Q) + \gamma_2^t\lambda_n(A^*A))^{1/s} I = \gamma_2 I,$$
$$F(\gamma_1 I) = (Q + \gamma_1^t A^*A)^{1/s} \le (\lambda_1(Q) + \gamma_1^t\lambda_1(A^*A))^{1/s} I = \gamma_1 I.$$
(1) For any HPD solution $X$ of (1.4), $X_0 = \gamma_2 I \le X$ and then $X_0^s \le X^s$. Assuming that $X_{i-1}^s \le X^s$, we have $X_i^s = Q + A^*(X_{i-1}^s)^{t/s}A \le Q + A^*(X^s)^{t/s}A = X^s$. Then $\lambda_1(X_i) \le \lambda_1(X)$, which implies that $\lim_{i \to \infty} X_i$ is a minimal-like solution.
(2) Similar to the proof of (1). □
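The iteration (2.31) in Theorem 2.14 is straightforward to prototype. The sketch below is illustrative only: `herm_power`, `unique_root`, and `iterate_2_31` are hypothetical names, the roots $\gamma_1$ and $\gamma_2$ of $f_3$ and $f_4$ are found by an assumed doubling-plus-bisection search, and the stopping rule on successive iterates is an assumption.

```python
import numpy as np


def herm_power(H, p):
    """H**p for a Hermitian positive semi-definite H, via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    w = np.clip(w, 0.0, None)
    return (V * w ** p) @ V.conj().T


def unique_root(lam_A, lam_Q, s, t, iters=200):
    """Unique positive root of x**s - lam_A * x**t - lam_Q, assuming s > t > 0."""
    f = lambda x: x ** s - lam_A * x ** t - lam_Q
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:          # f < 0 before the root and f > 0 after it
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def iterate_2_31(A, Q, s, t, start="max", tol=1e-12, max_iter=500):
    """Run (2.31) from gamma_1 * I (start='max') or gamma_2 * I (start='min')."""
    n = A.shape[0]
    AhA = A.conj().T @ A
    lam_A = np.linalg.eigvalsh(AhA)        # ascending order
    lam_Q = np.linalg.eigvalsh(Q)
    gamma1 = unique_root(lam_A[-1], lam_Q[-1], s, t)   # root of f3
    gamma2 = unique_root(lam_A[0], lam_Q[0], s, t)     # root of f4
    X = (gamma1 if start == "max" else gamma2) * np.eye(n)
    for _ in range(max_iter):
        M = Q + A.conj().T @ np.linalg.matrix_power(X, t) @ A
        X_new = herm_power(0.5 * (M + M.conj().T), 1.0 / s)
        done = np.linalg.norm(X_new - X, "fro") < tol
        X = X_new
        if done:
            break
    return X
```

For $s > t = 1$ the solution is unique (Corollary 2.15), so both starting points should lead to the same limit; comparing the two runs is a cheap consistency check.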
From Theorem 9.3 in [7] and Theorem 2.14, we have:
Corollary 2.15 If $s > t = 1$ and $\gamma_1 \ge \gamma_2$, then (1.4) has a unique solution, and
$$X_0 \in [\gamma_2 I, \gamma_1 I], \quad X_i = (Q + A^*X_{i-1}A)^{1/s}, \quad i = 1, 2, \cdots \tag{2.32}$$
will converge to the unique solution.
2.3.3 Case 3: s = t > 0
In this case, (1.4) can be reduced to the linear matrix equation $Y - A^*YA = Q$, which is the discrete-time algebraic Lyapunov equation (DALE) or Hermitian Stein equation [3, Page 5], with $X^s = Y$. According to Section 6.4 of [28], if $Y \in \mathbb{C}^{n \times n}$ is HPD, there exists a unique HPD matrix $X \in \mathbb{C}^{n \times n}$ such that $X^s = Y$. Note that if $(\lambda, y)$ is an eigenpair of the matrix $A \in \mathbb{C}^{n \times n}$,
$$\lambda_1(A^*A) = \max_{x \in \mathbb{C}^n,\, x \ne 0} \frac{x^*(A^*A)x}{x^*x} \ge \frac{y^*A^*Ay}{y^*y} = \bar{\lambda}\lambda = |\lambda|^2.$$
Hence, if $\lambda_1(A^*A) < 1$, then $|\lambda_1(A)| < 1$, which means $A$ is d-stable (see [3]).
Lemma 2.16 [3, Theorem 1.1.3] The algebraic Stein equation $S - NSA = R$ has a unique solution $S \in \mathbb{C}^{n \times n}$ if and only if $\lambda_j\mu_k \ne 1$ for all $\lambda_j \in \sigma(A)$, $\mu_k \in \sigma(N)$.
Lemma 2.17 [32, Theorem 1 in §13.2] Let $A, V \in \mathbb{C}^{n \times n}$ and let $V$ be positive definite. If $A$ is stable with respect to the unit circle, then the equation $H - A^*HA = V$ has a unique solution $H$, and $H$ is positive definite.
Applying the above two lemmas, we obtain the following lemma:
Lemma 2.18 If $Q \in \mathbb{C}^{n \times n}$ is HPD and $\lambda_1(A^*A) < 1$, then the DALE (or Hermitian Stein equation) $Y - A^*YA = Q$ has a unique HPD solution.
Let $s = t > 0$. Then (2.18) and (2.19) reduce, respectively, to
$$g_3(x) = x^s - \lambda_1(A^*A)x^s - \lambda_1(Q)$$
and
$$g_4(x) = x^s - \lambda_n(A^*A)x^s - \lambda_n(Q).$$
We know that $g_3(x)$ has a unique root $\eta_1 = [\lambda_1(Q)/(1 - \lambda_1(A^*A))]^{1/s} > 0$ if and only if $\lambda_1(A^*A) < 1$, and $g_4(x)$ has a unique root $\eta_2 = [\lambda_n(Q)/(1 - \lambda_n(A^*A))]^{1/s} > 0$ if and only if $\lambda_n(A^*A) < 1$.
Lemma 2.19 [3, Theorem 1.1.18] Let $A \in \mathbb{C}^{n \times n}$ be d-stable, and $B \ge 0$. Then $W_0 = \sum_{j=0}^{\infty} (A^*)^j Q A^j$ defines the (unique) solution of the discrete-time algebraic Lyapunov equation $W - A^*WA = Q$. Note that $\pm Q \ge 0$ implies $\pm W_0 \ge 0$. Moreover, if $Q \ge B^*B$ and $(B, A)$ is observable, then $W_0 > 0$.
From Lemma 2.19, we directly present the following theorem without proof:
Theorem 2.20 If $\lambda_1(A^*A) < 1$, $s = t > 0$, and $\eta_1 \ge \eta_2 > 0$, then (1.4) has a unique HPD solution $X \in [\eta_2 I, \eta_1 I]$ and $X^s = \sum_{j=0}^{\infty} (A^*)^j Q A^j$.
At the current stage, we have presented sufficient conditions for the existence and uniqueness of solutions of (1.3) and (1.4). The proofs are constructive and suggest how to design iteration methods for computing the solutions. Based on these results, we present a general method for solving Hermitian polynomial matrix equations in the next section.
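The series in Theorem 2.20 gives a direct way to compute the solution in Case 3. The following NumPy sketch is an illustration under stated assumptions: the names `herm_power` and `solve_case3` are hypothetical, the series is simply truncated once a term becomes negligible relative to the partial sum (an assumed stopping rule), and $\lambda_1(A^*A) < 1$ is taken for granted rather than checked.

```python
import numpy as np


def herm_power(H, p):
    """H**p for a Hermitian positive semi-definite H, via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    w = np.clip(w, 0.0, None)
    return (V * w ** p) @ V.conj().T


def solve_case3(A, Q, s, tol=1e-15, max_terms=10000):
    """Return X with X**s = sum_j (A^*)^j Q A^j, assuming lambda_1(A^* A) < 1."""
    term = Q.astype(complex)
    Y = term.copy()
    for _ in range(max_terms):
        term = A.conj().T @ term @ A            # next term (A^*)^j Q A^j
        Y = Y + term
        if np.linalg.norm(term, "fro") <= tol * np.linalg.norm(Y, "fro"):
            break
    Y = 0.5 * (Y + Y.conj().T)                  # remove round-off asymmetry
    return herm_power(Y, 1.0 / s)               # unique HPD s-th root of Y
```

As a sanity check, $A = 0.5I$, $Q = I$, $s = t = 2$ gives $Y = \sum_j 0.25^j I = \tfrac{4}{3}I$, hence $X = \sqrt{4/3}\,I$.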
3 Algorithms and Applications
In this section, we propose a general method for solving (1.1) and introduce its applications. Before doing so, we present two algorithms for computing the solutions of (1.3) and (1.4). To compute the HPD solution of (1.3), we design Algorithm 3.1 according to Theorem 2.6.
Algorithm 3.1 Given matrices $A, Q \in \mathbb{C}^{n \times n}$ and integers $s$, $t$.
Step 1 Compute $\lambda_1(A^*A)$, $\lambda_n(A^*A)$, $\lambda_1(Q)$, $\lambda_n(Q)$, the unique root $\alpha_1$ of $f_1(x)$ defined by (2.4), and the unique root $\beta_1$ of $f_2(x)$ defined by (2.5).
Step 2 If $\lambda_1(A^*A) \le \lambda_n(Q)\lambda_1(Q)^{-t/s}$, let $X_0 = \beta_1 I$, and choose a tolerance tol. For $i = 1, 2, \cdots$
while $\|X_i - (Q - A^*X_{i-1}^tA)^{1/s}\|_F \ge$ tol
$X_i = (Q - A^*X_{i-1}^tA)^{1/s}$.
end.
To compute the HPD solution of (1.4), we design Algorithm 3.2 according to Theorems 2.11, 2.14, and 2.20.
Algorithm 3.2 Given matrices $A, Q \in \mathbb{C}^{n \times n}$ and integers $s$, $t$.
Step 1 Compute $\lambda_1(A^*A)$, $\lambda_n(A^*A)$, $\lambda_1(Q)$, $\lambda_n(Q)$.
Step 2 Input (2.18) and (2.19).
Step 3 If $s < t$, run steps 4–5; if $0 < t < s$, run steps 6–7; otherwise, run steps 8–9.
Step 4 Compute the roots $\alpha_1, \alpha_2$ of $f_3(x)$, and $\beta_1, \beta_2$ of $f_4(x)$, respectively.
Step 5 Let $X_0 = \beta_2 I$; run (2.26).
Step 6 Compute the root $\gamma_1$ of $f_3(x)$ and the root $\gamma_2$ of $f_4(x)$, respectively.
Step 7 Let $X_0 = \gamma_1 I$; run (2.31).
Step 8 Compute the root $\eta_1$ of $f_3(x)$ and the root $\eta_2$ of $f_4(x)$, respectively.
Step 9 If $\lambda_1(A^*A) < 1$ and $\eta_1 \ge \eta_2$, let $\gamma_1 := \eta_1$, $\gamma_2 := \eta_2$. Run (2.31) with $X_0 = \gamma_1 I$.
Now we are ready to present an algorithm for computing the HPD solution of (1.1), which is always assumed to be solvable. Recalling the framework described in the Introduction, the solving process contains an inner iteration and an outer iteration. In the outer iteration, let $Y_k$ denote the computed HPD solution at the $k$th iteration, and let $\varepsilon$ be the tolerance; then the convergence condition is $\|Y_{k+1} - Y_k\|_F/\|Y_k\|_F < \varepsilon$. In the inner iteration, we apply Algorithm 3.1 or Algorithm 3.2 to compute the HPD solution of the intermediate Hermitian polynomial matrix equation such as (1.2).
Algorithm 3.3 Input matrices $A_i, Q \in \mathbb{C}^{n \times n}$ and integers $s$, $t_i$, $\delta_i$. Set an initial solution $Y_0$ and a tolerance $\varepsilon$. For $k = 1, 2, \cdots$
Step 1 Choose an index $j$ from $\{1, 2, \cdots, \ell\}$ and construct the intermediate matrix equation:
$$X^s \pm A_j^*X^{t_j}A_j = Q \mp \sum_{i \ne j} \delta_i A_i^* Y_k^{t_i} A_i. \tag{3.1}$$
Step 2 Check the solvability of (3.1). If it is solvable, choose Algorithm 3.1 or Algorithm 3.2 to compute the solution $X$ of (3.1) and set $Y_{k+1} = X$; otherwise, stop the iteration.
Step 3 Check the convergence condition. If $\|Y_{k+1} - Y_k\|_F/\|Y_k\|_F < \varepsilon$, stop the iteration; otherwise, go to Step 1.
In practical implementations, the index $j$ in Algorithm 3.3 is chosen to guarantee the solvability of (3.1); that is, $Q \mp \sum_{i \ne j} \delta_i A_i^* Y_k^{t_i} A_i > 0$ and $\lambda_1(A_j^*A_j)$ satisfies the sufficient conditions of the theorems in Section 2.
To end this section, we briefly point out the applications of (1.1). As mentioned in the introduction, (1.1) unifies many well-known linear and nonlinear matrix equations from control theory and applications. Algorithm 3.3 provides an elementary but efficient framework for solving general Hermitian polynomial matrix equations, which facilitates their application. A typical example is the solution of the following generalized algebraic Riccati equations:
$$YS^*SY - B^*Y - YB + C^*YC - R = 0, \tag{3.2}$$
which are exactly the continuous-time algebraic Riccati equations when C = 0 (see, e.g., [32] and [2]). Suppose that S is nonsingular. Let X = S(Y B(SS)-1)S, A = SCS-1, Q = S[R - CB(SS)-1C + (B2) (SS)-1]S, B = SSB(SS)-1, and R - CB(SS)-1C + (B2) (SS)-1 > 0 with A, S, B, R 2 n × n . Then (3.2) is equivalent to (1.1) with ℓ = 1, s = 2, and t1 = 1. We refer to [5] for the statement of application to (generalized) discrete-time algebraic Lyapunov equations.
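Algorithm 3.1 can be prototyped as follows. This is a minimal NumPy sketch, not the authors' code: the names `herm_power`, `increasing_root`, and `algorithm_3_1` are hypothetical, $\beta_1$ is found by bisection (an assumed choice), and the loop stops when successive iterates differ by less than `tol`, which is one reading of the stopping test in Step 2.

```python
import numpy as np


def herm_power(H, p):
    """H**p for a Hermitian positive semi-definite H, via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    w = np.clip(w, 0.0, None)
    return (V * w ** p) @ V.conj().T


def increasing_root(f, hi, iters=200):
    """Root of an increasing function f on [0, hi], assuming f(0) < 0 <= f(hi)."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def algorithm_3_1(A, Q, s, t, tol=1e-12, max_iter=200):
    """Fixed-point iteration X_i = (Q - A^* X_{i-1}^t A)^(1/s) for (1.3)."""
    n = A.shape[0]
    lam_A = np.linalg.eigvalsh(A.conj().T @ A)   # ascending order
    lam_Q = np.linalg.eigvalsh(Q)
    # Step 2: sufficient condition of Theorem 2.6
    if lam_A[-1] > lam_Q[0] * lam_Q[-1] ** (-t / s):
        raise ValueError("sufficient condition of Theorem 2.6 is not satisfied")
    # Step 1: beta_1 is the unique positive root of f2 in (2.5)
    f2 = lambda x: x ** s + lam_A[-1] * x ** t - lam_Q[0]
    beta1 = increasing_root(f2, lam_Q[0] ** (1.0 / s))
    X = beta1 * np.eye(n)
    for _ in range(max_iter):
        M = Q - A.conj().T @ np.linalg.matrix_power(X, t) @ A
        X_new = herm_power(0.5 * (M + M.conj().T), 1.0 / s)
        done = np.linalg.norm(X_new - X, "fro") < tol
        X = X_new
        if done:
            break
    return X
```

The residual $\|X^s + A^*X^tA - Q\|_F$ of the returned matrix is the natural quantity to monitor, in the spirit of $r(s, t)$ in Section 4.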
4 Numerical Experiments
In this section, we illustrate the feasibility and efficiency of the proposed algorithms with several numerical experiments.
Example 4.1 Suppose $A = \mathrm{rand}(100) \times 10^{-2}$ and $Q = \mathrm{eye}(100)$. Let tol $= 10^{-12}$ and let the maximal number of iterations be 200. We apply Algorithms 3.1 and 3.2 to solve (1.3) and (1.4), respectively. In Table 1, we present the iterations and CPU time (short format) before convergence: E1U, unique solution of (1.3); E2UM, unique maximal-like solution of (1.4) ($t \ge s > 0$); and E2M, maximal-like solution of (1.4) ($s \ge t > 0$). In Figures 1 and 2, $r(s, t) = \|X^s + A^*X^tA - Q\|_F$ and $e(s, t) = \|X^s - A^*X^tA - Q\|_F$, respectively.
Table 1 Iterations and CPU time
(s,t)   (1,2)               (2,1)              (5,8)               (8,5)
E1U     (29, 0.4125 s)      (11, 0.8455 s)     (24, 2.4037 s)      (14, 1.3625 s)
E2UM    (200, 28.2264 s)    -                  (35, 77.8823 s)     -
E2M     -                   (10, 0.8239 s)     -                   (10, 0.9154 s)
Figure 1 r(s, t) for unique solutions of (1.3) [plot; curves labeled log10(r(1,2)), log10(r(2,1)), log10(r(5,8)), log10(r(8,5))]
Example 4.2 Solve the Hermitian generalized algebraic Riccati equation (HGARE) $X^2 + BX + XB + A^*XA = Q$ with Algorithm 3.3, when the given matrices $A$, $B = B^*$, $Q$ are sparse. Let $B = \mathrm{sprandsym}(200, 0.3)$, $Q = \mathrm{speye}(200)$. With the stopping condition $\|(X + B) - [(Q + B^2 + A^*BA) - A^*(X + B)A]^{1/2}\|_F < 10^{-12}$, in Table 2 we show the iterations and the CPU time (long format) for the unique Hermitian positive definite solution. In Figure 3, $E(A, B, Q) = \|X^2 + BX + XB + A^*XA - Q\|_F$.
Figure 2 e(s, t) for maximal-like solutions of (1.4) [plot; curves labeled log10(e(1,2)), log10(e(2,1)), log10(e(5,8)), log10(e(8,5))]
Table 2 Iterations and CPU time for HGARE
10^3 × A                 Iterations    CPU time
sprand(200,200,0.2)      5             3.280131972080250e+000
sprandn(200,200,0.2)     4             2.686544798291403e+000
Example 4.3 We show the relation between the number of iterations and the values of s and t, using two simple matrices, A × 10^2 =
8.93-0.28i
-3.33-2.53i -1.09-2.95i
1.66-6.94i
2.65-0.88i
20.16+1.38i
-0.43+0.06i -3.33+6.10i -2.45+2.03i
4.04+4.86i
1.43+4.00i
20.63+11.29i
0.37+1.05i
-4.00-1.68i -5.10+5.88i -4.34+3.36i 7.99+21.44i 5.31+1.07i
3.73-3.02i
2.82-1.42i
8.97-11.48i
1.87-2.31i
-0.01+0.35i , -4.44+7.15i 25.79-3.83i
Figure 3 log10(E(A,B,Q)) for unique solutions of HGARE [plot; curves labeled Sprand and Sprandn]
and
Q=
0.9762
0.0531
-0.0008 0.0782
-0.0369
-0.1084
1.2215
-0.0132 0.1059
-0.0698
-0.1073
0.0641
-0.0059
0.0155
-0.0224
0.0044
1.0963
0.0768
-0.0022 1.1102 0.0007
0.0116
-0.0410 . -0.0082 1.0959
In Figure 4, the first panel shows that, with $s = 5$ fixed and $t$ ranging from 1 to 25, the larger $t$ is, the more costly it becomes to compute the unique HPD solution of (1.3); the second panel shows that, with $t = 5$ fixed and $1 \le s \le 25$, the larger $s$ is, the easier the computation becomes. The last two panels show that the cost of computing maximal solutions of (1.4) also depends strongly on the values of $s$ and $t$.
Figure 4 Iterations and the values of s, t [four panels showing the number of iterations: (1) s=5, t=1:25; (2) t=5, s=1:25; (3) s=5, t=5:17; (4) t=5, s=5:17]
5 Conclusion
In this chapter, we consider maximal, minimal-like, and maximal-like solutions of Hermitian polynomial matrix equations in general forms, which are important in many applications. Sufficient conditions are given to guarantee the existence and uniqueness of Hermitian positive definite solutions. A general iteration method is proposed for unique and maximal or maximal-like solutions, and its efficiency is illustrated by numerical experiments. The perturbation and sensitivity analysis of such Hermitian polynomial matrix equations will be an interesting topic for future work.
Acknowledgements The authors would like to thank Professors Musheng Wei and Jianli Zhao for wonderful suggestions on the presentation of this chapter. This paper is partly supported by the National Natural Science Foundation of China under grants 12171210, 11771188, and 12090011, the Natural Science Research of Jiangsu Higher Education Institutions of China under grant 21KJA110001, the Provincial Natural Science Foundation of Fujian of China under grant 2022J01378, the Priority Academic Program Development Project (PAPD), and the Top-notch Academic Programs Project (No. PPZY2015A013) of Jiangsu Higher Education Institutions.
References
1. Lancaster, P., & Rodman, L. (1995). The algebraic Riccati equation. Oxford University Press, Oxford
2. Benner, P., Laub, A. J., & Mehrmann, V. (1997). Benchmarks for the numerical solution of algebraic Riccati equations. IEEE Control Systems Magazine, 7, 18–28
3. Abou-Kandil, H., Freiling, G., Ionescu, V., & Jank, G. (2003). Matrix Riccati equations in control and systems theory. Birkhäuser
4. Tippett, M. K., Cohn, S. E., Todling, R., & Marchesin, D. (2000). Conditioning of the stable, discrete-time Lyapunov operator. SIAM Journal on Matrix Analysis and Applications, 22, 56–65
5. Jia, Z. G., & Zhao, M. X. (2018). A structured condition number for self-adjoint polynomial matrix equations with applications in linear control. Journal of Computational and Applied Mathematics, 331, 208–216
6. Ran, A. C. M., & Reurings, M. C. B. (2002). On the nonlinear matrix equation $X + A^*F(X)A = Q$: Solutions and perturbation theory. Linear Algebra and its Applications, 346, 15–26
7. Lee, H., & Lim, Y. (2008). Invariant metrics, contractions and nonlinear matrix equations. Nonlinearity, 21, 857–878
8. Jia, Z. G., & Wei, M. (2009). Solvability and sensitivity analysis of polynomial matrix equation $X^s + A^TX^tA = Q$. Applied Mathematics and Computation, 209(2), 230–237
9. Laub, A. J. (1979). A Schur method for solving algebraic Riccati equations. IEEE Transactions on Automatic Control, 24, 913–921
10. Mehrmann, V. (1996). A step towards a unified treatment of continuous and discrete time control problems. Linear Algebra and its Applications, 241–243, 749–779
11. Chu, E. K. W., Fan, H. Y., Lin, W. W., & Wang, C. S. (2004). Structure-preserving algorithms for periodic discrete-time algebraic Riccati equations. International Journal of Control, 77, 767–788
12. Chu, E. K. W., Fan, H. Y., & Lin, W. W. (2005). A structure-preserving doubling algorithm for continuous-time algebraic Riccati equations. Linear Algebra and its Applications, 396, 55–80
13. Xu, H. (2007). Transformations between discrete-time and continuous-time algebraic Riccati equations. Linear Algebra and its Applications, 425, 77–101
14. Jia, Z. G., Zhao, M. X., Wang, M. H., & Ling, S. T. (2014). Solvability theory and iteration method for one self-adjoint polynomial matrix equation. Journal of Applied Mathematics, 2014, Article ID 681605, 7 pages
15. Byers, R. (1985). Numerical condition of the algebraic Riccati equation. In B. N. Datta (Ed.), Linear algebra and its role in system theory (Vol. 47, pp. 35–49). AMS Contemporary Mathematics Series, AMS
16. Higham, N. J. (1993). Perturbation theory and backward error for $AX - XB = C$. BIT Numerical Mathematics, 33, 124–136
17. Xu, S. F. (1996). Sensitivity analysis of the algebraic Riccati equations. Numerical Mathematics, 75, 121–134
18. Sun, J. G. (1998). Perturbation theory for algebraic Riccati equations. SIAM Journal of Matrix Analysis and Application, 19, 39–65
19. Konstantinov, M., & Petkov, P. (1999). Note on perturbation theory for algebraic Riccati equations. SIAM Journal of Matrix Analysis and Application, 21, 327–327
20. Sun, J. G. (2002). Condition numbers of algebraic Riccati equations in the Frobenius norm. Linear Algebra and its Applications, 350, 237–261
21. Zhou, L. M., Lin, Y. Q., Wei, Y. M., & Qiao, S. Z. (2009). Perturbation analysis and condition numbers of symmetric algebraic Riccati equations. Automatica, 45, 1005–1011
22. Meng, J., & Kim, H.-M. (2017). The positive definite solution of the nonlinear matrix equation $X^p = A + M(B + X^{-1})^{-1}M^*$. Journal of Computational and Applied Mathematics, 322, 139–147
23. Lee, H., Kim, K.-M., & Meng, J. (2019). On the nonlinear matrix equation $X^p = A + M^T(X \# B)M$. Journal of Computational and Applied Mathematics, 373(391), 112380
24. Jin, Z., & Zhai, C. (2021). On the nonlinear matrix equation $X^p = A + \sum_{i=1}^{m} M_i^*(B + X^{-1})^{-1}M_i$. Linear and Multilinear Algebra. https://doi.org/10.1080/03081087.2021.1882371
25. Gohberg, I., Lancaster, P., & Rodman, L. (1982). Matrix polynomials. Academic Press
26. Zhan, X. (2002). Matrix inequalities. Springer
27. Furuta, T. (1998). Operator inequalities associated with Hölder–McCarthy and Kantorovich inequalities. Journal of Inequalities and Applications, 6, 137–148
28. Horn, R. A., & Johnson, C. R. (1991). Topics in matrix analysis. Cambridge University Press, Cambridge
29. Horn, R. A., & Johnson, C. R. (1990). Matrix analysis. Cambridge University Press
30. Liu, X., & Gao, H. (2003). On the positive definite solutions of the matrix equations $X^s \pm A^TX^{-t}A = I_n$. Linear Algebra and its Applications, 368, 83–97
31. Xu, S. F. (1997). Perturbation analysis of the maximal solution of the matrix equation $X + A^*X^{-1}A = P$. Linear Algebra and its Applications, 336, 61–70
32. Lancaster, P., & Tismenetsky, M. (1985). The theory of matrices second edition with applications. Academic Press
Inequalities for Matrix Exponentials and Their Extensions to Lie Groups Luyining Gan, Xuhua Liu, and Tin-Yau Tam
Abstract This chapter surveys some classical and recently discovered inequalities for matrix exponentials and their extensions to Lie groups. Some questions are asked. Keywords Matrix exponential • Positive definite matrix • Log-majorization • Kostant’s preorder • Lie group Mathematics Subject Classification (MSC2020) Primary 15A42 • Secondary 22E15
1 Introduction

Let us recall some basic concepts in matrix analysis. Denote by $M_n$ the vector space of all $n \times n$ complex matrices and by $H_n$ the set of all $n \times n$ Hermitian matrices, which is a real subspace of $M_n$. Let $P_n$ be the set of all $n \times n$ positive definite matrices in $M_n$ and let $N_n$ be the set of all $n \times n$ normal matrices in $M_n$.

L. Gan
Department of Mathematics and Statistics, University of Nevada, Reno, NV, USA
School of Science, Beijing University of Posts and Telecommunications, Beijing, China
e-mail: [email protected]; [email protected]

X. Liu
Department of Mathematics, North Greenville University, Tigerville, SC, USA
e-mail: [email protected]

T.-Y. Tam (✉)
Department of Mathematics and Statistics, University of Nevada, Reno, NV, USA
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_51
For all $X \in M_n$, let $\lambda(X) = (\lambda_1(X), \lambda_2(X), \dots, \lambda_n(X))$ denote the vector of eigenvalues of $X$, ordered so that their absolute values are nonincreasing; let $s(X) = (s_1(X), s_2(X), \dots, s_n(X))$ denote the vector of singular values of $X$ in nonincreasing order; let $[\lambda(X)]^m$ and $[s(X)]^m$ be the tuples consisting of the $m$-th powers of the entries of $\lambda(X)$ and $s(X)$; let $|X| = (X^*X)^{1/2}$, so that $\lambda(|X|) = s(X)$; let $\|X\|$ denote the spectral norm of $X$ (the largest singular value of $X$); and let $\operatorname{tr} X$, $X^\top$, and $X^*$ denote the trace, transpose, and conjugate transpose of $X$, respectively. The matrix exponential map $\exp : M_n \to M_n$ is defined as
\[ \exp X = e^X = \sum_{n=0}^{\infty} \frac{X^n}{n!}, \qquad \forall X \in M_n. \]
The spectral theorem implies that the restriction of $\exp$ to $H_n$ is one-to-one and onto $P_n$. So studying positive definite matrices can be carried out via the exponential map and $H_n$. Moreover, for all $X, Y \in H_n$, it is known that $e^X e^Y = e^Y e^X$ if and only if $XY = YX$. The famous Lie-Trotter product formula states that
\[ \lim_{m \to \infty} \left(e^{X/m} e^{Y/m}\right)^m = e^{X+Y}, \qquad \forall X, Y \in M_n. \]
The Golden-Thompson inequality asserts that
\[ \operatorname{tr} e^{X+Y} \le \operatorname{tr}(e^X e^Y), \qquad \forall X, Y \in H_n. \tag{1.1} \]
Equality in (1.1) holds if and only if $XY = YX$ [34, 46]. The inequality can be viewed as a generalization of the equality $e^{x+y} = e^x e^y$, $x, y \in \mathbb{R}$, though the multiplication in $P_n$ is not commutative; indeed $P_n$ is not even closed under multiplication. This celebrated result was independently discovered by Golden [19], Symanzik [48], and Thompson [51] in the same year, 1965, all motivated by statistical mechanics. Since then, the Golden-Thompson inequality has received much attention and has been generalized in various ways and applied in many fields (see, e.g., [2, 5–7, 10, 16, 22, 24, 31, 34, 43, 52] and the references therein). For historical aspects, one may see a recent paper by Forrester and Thompson [16]. However, $\operatorname{tr} e^{X+Y+Z} \not\le \operatorname{tr}(e^X e^Y e^Z)$ in general.
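The Lie-Trotter formula and the Golden-Thompson inequality (1.1) are easy to probe numerically. The following is a minimal sketch, assuming NumPy is available; the helper names (`expm_hermitian`, `random_hermitian`, `golden_thompson_gap`, `lie_trotter_error`) and the random test matrices are illustrative choices, not part of the chapter.

```python
import numpy as np

def expm_hermitian(H):
    """Exponential of a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(w)) @ V.conj().T

def random_hermitian(n, rng, scale=0.5):
    """Random n x n complex Hermitian matrix (illustrative test data)."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return scale * (Z + Z.conj().T) / 2

def golden_thompson_gap(X, Y):
    """tr(e^X e^Y) - tr(e^{X+Y}); inequality (1.1) says this is >= 0."""
    lhs = np.trace(expm_hermitian(X + Y)).real
    rhs = np.trace(expm_hermitian(X) @ expm_hermitian(Y)).real
    return rhs - lhs

def lie_trotter_error(X, Y, m):
    """Spectral-norm distance between (e^{X/m} e^{Y/m})^m and e^{X+Y}."""
    step = expm_hermitian(X / m) @ expm_hermitian(Y / m)
    approx = np.linalg.matrix_power(step, m)
    return np.linalg.norm(approx - expm_hermitian(X + Y), 2)

rng = np.random.default_rng(0)
X, Y = random_hermitian(4, rng), random_hermitian(4, rng)
gap = golden_thompson_gap(X, Y)
errors = [lie_trotter_error(X, Y, m) for m in (1, 10, 100, 1000)]
print("Golden-Thompson gap:", gap)
print("Lie-Trotter errors:", errors)
```

The eigendecomposition route is used only because the inputs are Hermitian; for general matrices a dedicated matrix-exponential routine would be the more natural choice.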
Motivated by the Golden-Thompson inequality (1.1) and problems in linear-quadratic optimal feedback control, Bernstein [6] proved the following inequality:
\[ \operatorname{tr}(e^{X^*} e^{X}) \le \operatorname{tr} e^{X^* + X}, \qquad \forall X \in M_n. \tag{1.2} \]
So [46] showed that equality in (1.2) holds if and only if $X$ is normal. Another interesting trace inequality is the Lieb-Thirring [36] inequality
\[ \operatorname{tr}(A^{1/2} B A^{1/2})^r \le \operatorname{tr}(A^{r/2} B^r A^{r/2}), \qquad \forall A, B \in P_n, \ \forall r \ge 1. \tag{1.3} \]
Because the trace is cyclic and is the sum of eigenvalues, and $\lambda(XY) = \lambda(YX)$ for all $X, Y \in M_n$, the inequality (1.3) can be rewritten as
\[ \operatorname{tr}(AB)^r \le \operatorname{tr}(A^r B^r), \qquad \forall A, B \in P_n, \ \forall r \ge 1. \tag{1.4} \]
The Lieb-Thirring inequality (1.3) was further generalized by Araki [3] as
\[ \operatorname{tr}(A^{1/2} B A^{1/2})^{rq} \le \operatorname{tr}(A^{r/2} B^r A^{r/2})^{q}, \qquad \forall q \ge 0, \ \forall r \ge 1, \tag{1.5} \]
\[ \operatorname{tr}(A^{1/2} B A^{1/2})^{rq} \ge \operatorname{tr}(A^{r/2} B^r A^{r/2})^{q}, \qquad \forall q \ge 0, \ \forall\, 0 \le r \le 1. \tag{1.6} \]
In 1975, Pusz and Woronowicz [44] first introduced the metric geometric mean (abbreviated geometric mean) of $A, B \in P_n$, which is formally defined as
\[ A \sharp B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}. \]
In 1997, Fiedler and Pták [15] defined the spectral geometric mean (abbreviated spectral mean) of $A, B \in P_n$ as
\[ A \natural B = (A^{-1} \sharp B)^{1/2} A (A^{-1} \sharp B)^{1/2}. \]
It was called the spectral mean because $(A \natural B)^2$ is similar to $AB$, so that the entries of $\lambda(A \natural B)$ are the principal square roots of those of $\lambda(AB)$. It is straightforward to see that $A \sharp B = B \sharp A$ and $A \natural B = B \natural A$. We have $A \sharp B = A \natural B = A^{1/2} B^{1/2}$ if $A$ and $B$ commute. As the spectral mean is defined using the geometric mean, one may expect some nice relation between the two means. For example, a trace inequality between these two means was shown by Kim and Lim [29]:
\[ \operatorname{tr}(e^{rX} \sharp\, e^{rY})^{2/r} \le \operatorname{tr} e^{X+Y} \le \operatorname{tr}(e^{rX} \natural\, e^{rY})^{2/r}, \qquad \forall r > 0. \tag{1.7} \]
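The two means and the sandwich (1.7) can be checked on small examples. The sketch below assumes NumPy and real symmetric test matrices; the function names and the choice $r = 1$ are illustrative assumptions, and the code simply transcribes the defining formulas for $A \sharp_t B$ and $A \natural_t B$ (with $t = 1/2$ by default).

```python
import numpy as np

def sym(M):
    """Symmetrize to remove rounding-level asymmetry."""
    return (M + M.conj().T) / 2

def powm(A, p):
    """A**p for a Hermitian positive definite matrix A."""
    w, V = np.linalg.eigh(sym(A))
    return (V * w**p) @ V.conj().T

def expm_hermitian(H):
    w, V = np.linalg.eigh(sym(H))
    return (V * np.exp(w)) @ V.conj().T

def geometric_mean(A, B, t=0.5):
    """A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    Ah, Aih = powm(A, 0.5), powm(A, -0.5)
    return sym(Ah @ powm(Aih @ B @ Aih, t) @ Ah)

def spectral_mean(A, B, t=0.5):
    """(A^{-1} # B)^t A (A^{-1} # B)^t."""
    G = powm(geometric_mean(np.linalg.inv(A), B), t)
    return sym(G @ A @ G)

rng = np.random.default_rng(1)

def rand_sym(n):
    Z = rng.standard_normal((n, n))
    return (Z + Z.T) / 2

X, Y = rand_sym(3), rand_sym(3)
A, B = expm_hermitian(X), expm_hermitian(Y)
S = spectral_mean(A, B)

# (A spectral-mean B)^2 should have the same eigenvalues as AB.
eig_S2 = np.sort(np.linalg.eigvalsh(S) ** 2)
eig_AB = np.sort(np.linalg.eigvals(A @ B).real)

# Inequality (1.7) with r = 1.
lower = np.trace(powm(geometric_mean(A, B), 2)).real
middle = np.trace(expm_hermitian(X + Y)).real
upper = np.trace(powm(S, 2)).real
print(lower, middle, upper)
```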
In fact Hiai and Petz [24] had obtained the first inequality in (1.7) as a complement to the Golden-Thompson inequality (1.1). The last inequality in (1.7) is viewed as a refinement of the Golden-Thompson inequality when $0 < r \le 1$. Let $A, B \in P_n$ and $t \in [0, 1]$. One may extend the two mean notions by assigning weights: the t-metric geometric mean (abbreviated t-geometric mean) is naturally defined by
\[ A \sharp_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^{t} A^{1/2}, \]
and the t-spectral geometric mean (abbreviated t-spectral mean) is defined by
\[ A \natural_t B = (A^{-1} \sharp B)^{t} A (A^{-1} \sharp B)^{t}. \]
Lee and Lim [33] first introduced and defined the t-spectral mean. Note that both weighted means are paths in $P_n$ connecting the starting point $A$ (when $t = 0$) and the endpoint $B$ (when $t = 1$). The t-geometric mean has attracted much recent attention because of its connection with Riemannian geometry: the curve $\gamma(t) = A \sharp_t B$ ($0 \le t \le 1$) is the unique geodesic between $A$ and $B$ in $P_n$ when $P_n$ is equipped with a suitable Riemannian metric. Liao et al. [35] generalized the t-geometric mean to symmetric spaces of noncompact type. In 2007, Ahn et al. [1, p.191] (also see [29, p.446]) studied the t-spectral mean and presented Lie-Trotter formulae for the two weighted means. In 2021, Kim [28] further studied the algebraic and geometric meaning of the t-spectral mean. The classical trace inequalities (1.1), (1.2), (1.3), (1.4), (1.5), (1.6), and (1.7) were generalized in terms of unitarily invariant norms as well as the stronger form of log-majorization. The main purpose of this chapter is to survey these beautiful Golden-Thompson type inequalities and their extensions to Lie groups.
2 Log-Majorization and Unitarily Invariant Norms

Let $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ be in $\mathbb{R}^n$. Rearrange the components of $x$ to form $x^{\downarrow} = (x_{[1]}, x_{[2]}, \dots, x_{[n]})$ such that $x_{[1]} \ge x_{[2]} \ge \cdots \ge x_{[n]}$. We say that $x$ is weakly majorized by $y$, denoted by $x \prec_w y$, if
\[ \sum_{i=1}^{k} x_{[i]} \le \sum_{i=1}^{k} y_{[i]}, \qquad \forall\, 1 \le k \le n. \]
If, in addition, equality holds for $k = n$, we say that $x$ is majorized by $y$ and denote this by $x \prec y$. There are many equivalent conditions for majorization. Let $\operatorname{conv} S_n x$ be the convex hull of the orbit of $x$ under the action of the symmetric group $S_n$. The following condition related to $\operatorname{conv} S_n x$ is suitable for generalization to Lie groups [25]:
\[ x \prec y \iff \operatorname{conv} S_n x \subset \operatorname{conv} S_n y. \tag{2.1} \]
Let $x$ and $y$ be nonnegative $n$-tuples. If
\[ \prod_{i=1}^{k} x_{[i]} \le \prod_{i=1}^{k} y_{[i]}, \qquad \forall\, 1 \le k \le n, \]
we say that $x$ is weakly log-majorized by $y$, denoted by $x \prec_{w\text{-}\log} y$. If, in addition, equality holds for $k = n$, we say that $x$ is log-majorized by $y$ and denote this by $x \prec_{\log} y$. In other words, when $x$ and $y$ are positive, $x \prec_{\log} y$ if and only if $\log x \prec \log y$, where $\log x = (\log x_1, \dots, \log x_n)$. It is known that [22, Proposition 1.3]
\[ x \prec_{\log} y \ \Longrightarrow\ x \prec_{w\text{-}\log} y \ \Longrightarrow\ x \prec_w y. \tag{2.2} \]
Each of the above four types of relation is a preorder on $\mathbb{R}^n$ or $\mathbb{R}^n_+ := \{x \in \mathbb{R}^n : x \ge 0\}$ and thus induces a partial order on the orbits of $\mathbb{R}^n$ under the action of $S_n$.

Remark 2.1 There are two important orderings that can be defined on $H_n$:
(1) the Löwner order (a partial order) defined by
\[ X \le Y \iff Y - X \text{ is positive semidefinite;} \]
(2) the majorization (a preorder) defined by
\[ Y \prec X \iff \lambda(Y) \prec \lambda(X). \]
Given $X, Y \in H_n$, both $Y \prec X$ and $X \prec Y$ occur if and only if $\lambda(X) = \lambda(Y)$, i.e., $X$ and $Y$ are unitarily similar.
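The four preorders defined in this section translate directly into partial-sum comparisons of sorted tuples. The following sketch uses only the Python standard library; the function names and the tolerance parameter `tol` are illustrative assumptions.

```python
import math

def weakly_majorized(x, y, tol=1e-12):
    """True if x is weakly majorized by y (partial sums of decreasingly sorted entries)."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    sx = sy = 0.0
    for a, b in zip(xs, ys):
        sx += a
        sy += b
        if sx > sy + tol:
            return False
    return True

def majorized(x, y, tol=1e-12):
    """Weak majorization plus equality of the total sums."""
    return weakly_majorized(x, y, tol) and abs(sum(x) - sum(y)) <= tol

def weakly_log_majorized(x, y, tol=1e-12):
    """For positive tuples: compare partial products via logarithms."""
    return weakly_majorized([math.log(v) for v in x], [math.log(v) for v in y], tol)

def log_majorized(x, y, tol=1e-12):
    """For positive tuples: x is log-majorized by y iff log x is majorized by log y."""
    return majorized([math.log(v) for v in x], [math.log(v) for v in y], tol)

x, y = (2.0, 2.0), (4.0, 1.0)
print(log_majorized(x, y), weakly_log_majorized(x, y), weakly_majorized(x, y), majorized(x, y))
```

For $x = (2, 2)$ and $y = (4, 1)$ the products agree, so $x \prec_{\log} y$, and the chain (2.2) gives $x \prec_w y$; the sums differ, so $x \prec y$ fails.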
Let us recall the definition of a matrix norm from [27]. A function $|||\cdot||| : M_n \to \mathbb{R}$ is a vector norm if for all $A, B \in M_n$ the following properties are satisfied:
(1) $|||A||| \ge 0$, with $|||A||| = 0$ if and only if $A = 0$;
(2) $|||cA||| = |c|\, |||A|||$ for all $c \in \mathbb{C}$;
(3) $|||A + B||| \le |||A||| + |||B|||$.
If in addition $|||AB||| \le |||A|||\, |||B|||$ for all $A, B \in M_n$, then $|||\cdot|||$ is a matrix norm. A vector norm $|||\cdot|||$ on $M_n$ is a unitarily invariant norm (u.i. norm for short) if $|||UXV||| = |||X|||$ for all $X \in M_n$ and for all unitary $U, V \in M_n$. The spectral norm $\|\cdot\|$ given by $\|X\| = s_1(X)$ is unitarily invariant. It is shown in [27, p.469] that a unitarily invariant norm $|||\cdot|||$ on $M_n$ is a matrix norm if and only if $|||X||| \ge s_1(X)$ for all $X \in M_n$. As characterized by von Neumann [53], a function $f : M_n \to \mathbb{R}$ is a unitarily invariant norm if and only if $f(X)$ is a symmetric gauge function of the singular values of $X$ (see also [7, p.91]). An equivalent formulation of the Ky Fan dominance theorem [13] is that for all $X, Y \in M_n$
\[ s(X) \prec_w s(Y) \iff |||X||| \le |||Y||| \ \text{for all u.i. norms } |||\cdot|||. \tag{2.3} \]
Combining (2.2) and (2.3), we get that for all $X, Y \in M_n$
\[ s(X) \prec_{\log} s(Y) \ \Longrightarrow\ |||X||| \le |||Y||| \ \text{for all u.i. norms } |||\cdot|||. \tag{2.4} \]
For all $1 \le k \le n$, the $k$-th compound of $X \in M_n$ is the $\binom{n}{k} \times \binom{n}{k}$ matrix $C_k(X)$ whose elements are given by
\[ C_k(X)_{\alpha,\beta} = \det X[\alpha|\beta], \]
where $\alpha, \beta \in Q_{k,n}$,
\[ Q_{k,n} = \{\omega = (\omega(1), \dots, \omega(k)) : 1 \le \omega(1) < \cdots < \omega(k) \le n\} \]
is the set of strictly increasing sequences of length $k$ chosen from $\{1, \dots, n\}$, and $X[\alpha|\beta]$ is the submatrix of $X$ whose rows and columns are indexed by $\alpha$ and $\beta$, respectively. In particular, $C_1(X) = X$ and $C_n(X) = \det X$. For example, if
\[ X = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \]
then
\[ C_2(X) = \begin{pmatrix} \det X[1,2|1,2] & \det X[1,2|1,3] & \det X[1,2|2,3] \\ \det X[1,3|1,2] & \det X[1,3|1,3] & \det X[1,3|2,3] \\ \det X[2,3|1,2] & \det X[2,3|1,3] & \det X[2,3|2,3] \end{pmatrix} = \begin{pmatrix} -3 & -6 & -3 \\ -6 & -12 & -6 \\ -3 & -6 & -3 \end{pmatrix}. \]
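The compound matrix in the example above can be reproduced mechanically. The sketch below is a small standard-library illustration; `det`, `compound`, and `matmul` are hypothetical helper names, index sets are taken in lexicographic order (0-based in the code), and the second matrix `Y` is an arbitrary choice used only to illustrate the multiplicativity $C_k(XY) = C_k(X) C_k(Y)$.

```python
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row (fine for small M)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def compound(X, k):
    """k-th compound: all k x k minors, index sets in lexicographic order."""
    index_sets = list(combinations(range(len(X)), k))
    return [[det([[X[i][j] for j in beta] for i in alpha])
             for beta in index_sets] for alpha in index_sets]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Y = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
C2 = compound(X, 2)
print(C2)
```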
The following lemma collects some useful properties of compound matrices that will be used later.

Lemma 2.2 (Marcus [38, 39], Merris [41]) Let $X, Y \in M_n$. The following statements are true:
(1) $C_k(I_n) = I_{\binom{n}{k}}$ and $C_k(XY) = C_k(X) C_k(Y)$, i.e., $C_k : GL(n, \mathbb{C}) \to GL(\binom{n}{k}, \mathbb{C})$ is a representation.
(2) $C_k(X^*) = [C_k(X)]^*$.
(3) If $X$ is unitary (normal, Hermitian, positive definite), so is $C_k(X)$, respectively.
(4) The eigenvalues of $C_k(X)$ are $\prod_{j=1}^{k} \lambda_{\omega(j)}(X)$ for all $\omega \in Q_{k,n}$.
(5) The singular values of $C_k(X)$ are $\prod_{j=1}^{k} s_{\omega(j)}(X)$ for all $\omega \in Q_{k,n}$.

For all $1 \le k \le n$, the $k$-th additive compound of $X \in M_n$ is defined by
\[ \Delta_k(X) = \frac{d}{dt}\Big|_{t=0} C_k(I_n + tX). \]
In other words, $\Delta_k(X)$ is the directional derivative of $C_k$ at the identity $I_n$ in the direction $X$. It is known that
\[ \Delta_k(X + Y) = \Delta_k(X) + \Delta_k(Y), \qquad \forall X, Y \in M_n, \tag{2.5} \]
and
\[ e^{\Delta_k(X)} = C_k(e^X), \qquad \forall X \in M_n, \tag{2.6} \]
and
\[ \Delta_k(X^*) = \Delta_k(X)^*, \qquad \forall X \in M_n. \tag{2.7} \]
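As a quick sanity check on the definition of $\Delta_k$ and on (2.6), the two extreme cases $k = 1$ and $k = n$ can be worked out directly. This is a standard computation, included here only as an illustration:

```latex
% k = 1: C_1 is the identity map on M_n, so
\Delta_1(X) = \frac{d}{dt}\Big|_{t=0} (I_n + tX) = X .
% k = n: C_n is the determinant, and \det(I_n + tX) = 1 + t\,\operatorname{tr} X + O(t^2), so
\Delta_n(X) = \frac{d}{dt}\Big|_{t=0} \det(I_n + tX) = \operatorname{tr} X .
% For k = n, identity (2.6) therefore reduces to the familiar formula
e^{\operatorname{tr} X} = \det e^{X} .
```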
3 Inequalities for Matrix Exponentials

Recall that $s_1(X) = \|X\| = \|X^*X\|^{1/2}$ for all $X \in M_n$, where $\|\cdot\|$ denotes the spectral norm. Therefore for $A, B \in P_n$,
\[ \|AB\|^2 = \|BA^2B\| = s_1(BA^2B) = \lambda_1(BA^2B) = \lambda_1(A^2B^2). \tag{3.1} \]
The following result is of fundamental importance. The first inequality appeared in [11, p.24], where credit was given to E. Heinz [21]. For completeness a proof of the first inequality is given, adapted from [7, p.255] and Zhan [55, p.2–3]. The original idea is from Pedersen [42].

Theorem 3.1 (Heinz [21]) Let $A, B \in P_n$ and let $\|\cdot\|$ denote the spectral norm. The following statements are true and equivalent:
(1) $\|A^r B^r\| \le \|AB\|^r$ for all $0 \le r \le 1$.
(2) $\|A^r B^r\| \ge \|AB\|^r$ for all $r \ge 1$.
(3) The function $r \mapsto \|A^r B^r\|^{1/r}$ is monotonically increasing on $(0, \infty)$.
(4) The function $r \mapsto \|A^{1/r} B^{1/r}\|^r$ is monotonically decreasing on $(0, \infty)$.
Moreover,
\[ \lim_{r \to \infty} \|A^{1/r} B^{1/r}\|^r = \|e^{\log A + \log B}\|. \]

Proof We first show that (1) is valid. Let
\[ S = \{r \in [0, 1] : \|A^r B^r\| \le \|AB\|^r\}. \]
Obviously, $0 \in S$ and $1 \in S$. We will see that $S = [0, 1]$. If $\|A^r B^r\| \le \|AB\|^r$ and $\|A^t B^t\| \le \|AB\|^t$ for some $0 \le r \le t \le 1$, then
\begin{align*}
\|A^{(r+t)/2} B^{(r+t)/2}\|^2 &= \lambda_1(A^{r+t} B^{r+t}) && \text{(by (3.1))} \\
&= \lambda_1(A^r B^{r+t} A^t) \\
&\le \|A^r B^{r+t} A^t\| \\
&\le \|A^r B^r\|\, \|B^t A^t\| \\
&= \|A^r B^r\|\, \|A^t B^t\| && (\|X^*\| = \|X\| \text{ for all } X \in M_n) \\
&\le \|AB\|^r \|AB\|^t = \|AB\|^{r+t},
\end{align*}
and thus $\|A^{(r+t)/2} B^{(r+t)/2}\| \le \|AB\|^{(r+t)/2}$. This shows that $S$ contains the midpoint of any two of its points. Since $S$ is also closed, by the continuity of $r \mapsto \|A^r B^r\|$ and $r \mapsto \|AB\|^r$, it is a convex set. Therefore, $S = [0, 1]$ and hence (1) is valid.

(1) $\Rightarrow$ (2). By (1), we have $\|A^{1/r} B^{1/r}\| \le \|AB\|^{1/r}$ for all $r \ge 1$. So $\|A^{1/r} B^{1/r}\|^r \le \|AB\|$ for all $r \ge 1$. Replacing $A$ with $A^r$ and $B$ with $B^r$, respectively, we have (2).

(2) $\Rightarrow$ (3). By (2), we have $\|A^{p/q} B^{p/q}\| \ge \|AB\|^{p/q}$ for all $p \ge q > 0$. So $\|A^{p/q} B^{p/q}\|^{1/p} \ge \|AB\|^{1/q}$ for all $p \ge q > 0$. Replacing $A$ with $A^q$ and $B$ with $B^q$, respectively, we have (3).

(3) $\Rightarrow$ (4). This is obvious.

(4) $\Rightarrow$ (1). By (4), we have $\|A^{1/r} B^{1/r}\|^r \ge \|AB\|$ for all $0 < r \le 1$. Replacing $A$ with $A^r$ and $B$ with $B^r$, respectively, we derive (1).

Finally, note that $\|X\| = s_1(X) = (\lambda_1(X^*X))^{1/2}$ for all $X \in M_n$. So
\begin{align*}
\lim_{r \to \infty} \|A^{1/r} B^{1/r}\|^r &= \lim_{r \to \infty} \lambda_1(A^{1/r} B^{1/r} B^{1/r} A^{1/r})^{r/2} \\
&= \lim_{r \to \infty} \lambda_1(A^{1/r} B^{2/r} A^{1/r})^{r/2} \\
&= \lim_{r \to \infty} \lambda_1(A^{2/r} B^{2/r})^{r/2} && (\lambda(XY) = \lambda(YX)) \\
&= \lambda_1\Big(\lim_{r \to \infty} (A^{2/r} B^{2/r})^{r/2}\Big) && \text{(by eigenvalue continuity)} \\
&= \lambda_1(e^{\log A + \log B}) && \text{(by the Lie-Trotter formula)} \\
&= \|e^{\log A + \log B}\|. && \square
\end{align*}
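Statements (1)–(3) of Theorem 3.1 can be observed numerically. The following is a minimal sketch, assuming NumPy; the helper names, the random positive definite test matrices, and the sampled values of $r$ are illustrative choices.

```python
import numpy as np

def powm(A, p):
    """A**p for a Hermitian positive definite matrix A."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    return (V * w**p) @ V.conj().T

def spectral_norm_of_powers(A, B, r):
    """||A^r B^r|| in the spectral norm."""
    return np.linalg.norm(powm(A, r) @ powm(B, r), 2)

def heinz_ratio(A, B, r):
    """||A^r B^r|| / ||AB||^r: at most 1 for 0 <= r <= 1, at least 1 for r >= 1."""
    return spectral_norm_of_powers(A, B, r) / np.linalg.norm(A @ B, 2) ** r

def random_pd(n, rng):
    """Random real positive definite matrix (illustrative test data)."""
    Z = rng.standard_normal((n, n))
    return Z @ Z.T + np.eye(n)

rng = np.random.default_rng(2)
A, B = random_pd(4, rng), random_pd(4, rng)
ratios_small = [heinz_ratio(A, B, r) for r in (0.1, 0.25, 0.5, 0.75, 1.0)]
ratios_large = [heinz_ratio(A, B, r) for r in (1.0, 1.5, 2.0, 3.0)]
# Statement (3): r -> ||A^r B^r||^{1/r} should be nondecreasing.
f = [spectral_norm_of_powers(A, B, r) ** (1 / r) for r in (0.25, 0.5, 1.0, 2.0, 4.0)]
print(ratios_small, ratios_large, f)
```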
Theorem 3.1 is not true for all unitarily invariant norms. For a counterexample, consider $A = B = I_n$ and $r = 2$ for the Frobenius norm. Given $A, B \in P_n$, $AB$ may not be positive definite. However, the eigenvalues of $AB$ are positive, so we have $\lambda_1(AB) > 0$. Though $\|AB\| \ne \lambda_1(AB)$ in general, the following inequality holds:
\[ \lambda_1(AB) \le \lambda_1(A) \lambda_1(B). \]
The reason is that $\lambda_1(AB) \le s_1(AB) \le s_1(A) s_1(B) = \lambda_1(A) \lambda_1(B)$, as $s_1(A) = \lambda_1(A)$ for all $A \in P_n$. Moreover, the following result is equivalent to Theorem 3.1.

Theorem 3.2 ([7, p.257]) Let $A, B \in P_n$. The following statements are true and equivalent:
(1) $\lambda_1(A^r B^r) \le \lambda_1((AB)^r)$ for all $0 \le r \le 1$.
(2) $\lambda_1(A^r B^r) \ge \lambda_1((AB)^r)$ for all $r \ge 1$.
(3) The function $r \mapsto \lambda_1((A^r B^r)^{1/r})$ is monotonically increasing on $(0, \infty)$.
(4) The function $r \mapsto \lambda_1((A^{1/r} B^{1/r})^r)$ is monotonically decreasing on $(0, \infty)$.
Moreover,
\[ \lim_{r \to \infty} \lambda_1((A^{1/r} B^{1/r})^r) = \lambda_1(e^{\log A + \log B}). \]

Proof The proof of the equivalence of (1)–(4) is similar to that of Theorem 3.1. We only show that (1) is true. For all $0 \le r \le 1$, by (3.1) and Theorem 3.1 (1), we have
\[ \lambda_1(A^{2r} B^{2r}) = \|A^r B^r\|^2 \le \|AB\|^{2r} = [\lambda_1(A^2 B^2)]^r = \lambda_1((A^2 B^2)^r). \]
Replacing $A^2$ with $A$ and $B^2$ with $B$, respectively, (1) follows. By the Lie-Trotter product formula and the continuity of eigenvalues [45, p.43–44],
\[ \lim_{r \to \infty} \lambda_1((A^{1/r} B^{1/r})^r) = \lambda_1\Big(\lim_{r \to \infty} (A^{1/r} B^{1/r})^r\Big) = \lambda_1(e^{\log A + \log B}). \qquad \square \]
The following result (seemingly stronger than but equivalent to Theorem 3.2) can be obtained by using arguments involving compound matrices. It can be viewed as a generalization of the Lieb-Thirring inequality (1.4).

Theorem 3.3 Let $A, B \in P_n$. The following statements are true and equivalent:
(1) $\lambda(A^r B^r) \prec_{\log} \lambda((AB)^r)$ for all $0 \le r \le 1$.
(2) $\lambda((AB)^r) \prec_{\log} \lambda(A^r B^r)$ for all $r \ge 1$.
(3) The function $r \mapsto \lambda((A^r B^r)^{1/r})$ is monotonically increasing on $(0, \infty)$ with respect to $\prec_{\log}$. In other words,
\[ \lambda((A^r B^r)^{1/r}) \prec_{\log} \lambda((A^t B^t)^{1/t}), \qquad \forall\, 0 < r < t. \]
(4) The function $r \mapsto \lambda((A^{1/r} B^{1/r})^r)$ is monotonically decreasing on $(0, \infty)$ with respect to $\prec_{\log}$.
Moreover, the function $r \mapsto \lambda((A^{1/r} B^{1/r})^r)$ is bounded and
\[ \lim_{r \to \infty} \lambda((A^{1/r} B^{1/r})^r) = \lambda(e^{\log A + \log B}). \]

Proof The proof of the equivalence of (1)–(4) is similar to that of Theorem 3.1. The function in (4) is bounded because it is monotonically decreasing with limit $\lambda(e^{\log A + \log B})$ as $r$ tends to $\infty$, while as $r$ tends to $0$ it is monotonically increasing and its limit exists, as mentioned by Audenaert and Hiai [4]. The limit follows from the Lie-Trotter product formula and the continuity of eigenvalues [45, p.43–44]. We only need to show that (1) is valid. Let $0 \le r \le 1$. For all $1 \le k \le n$, by Lemma 2.2, we have
\begin{align*}
\prod_{i=1}^{k} \lambda_i(A^r B^r) &= \lambda_1(C_k(A^r B^r)) = \lambda_1(C_k(A^r) C_k(B^r)) = \lambda_1([C_k(A)]^r [C_k(B)]^r) \\
&\le \lambda_1([C_k(A) C_k(B)]^r) && \text{(by Theorem 3.2)} \\
&= \lambda_1([C_k(AB)]^r) = \lambda_1(C_k((AB)^r)) \\
&= \prod_{i=1}^{k} \lambda_i((AB)^r).
\end{align*}
In other words, $\lambda(A^r B^r) \prec_{w\text{-}\log} \lambda((AB)^r)$. Note also that
\[ \det(A^r B^r) = \det(AB)^r. \]
Therefore, $\lambda(A^r B^r) \prec_{\log} \lambda((AB)^r)$. This completes the proof. $\square$
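Statements (1) and (2) of Theorem 3.3 can be tested on random examples. The sketch below assumes NumPy; it uses the facts that $\lambda(AB) = \lambda(A^{1/2} B A^{1/2})$ and that the eigenvalues of $(AB)^r$ are the $r$-th powers of those of $AB$. The helper names and the tolerance are illustrative assumptions.

```python
import numpy as np

def powm(A, p):
    """A**p for a Hermitian positive definite matrix A."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    return (V * w**p) @ V.conj().T

def eig_product(A, B):
    """Eigenvalues of AB for positive definite A, B, via A^{1/2} B A^{1/2}."""
    Ah = powm(A, 0.5)
    M = Ah @ B @ Ah
    return np.linalg.eigvalsh((M + M.conj().T) / 2)

def log_majorized(x, y, tol=1e-8):
    """x log-majorized by y, for positive vectors, via cumulative sums of sorted logs."""
    lx = np.cumsum(np.sort(np.log(x))[::-1])
    ly = np.cumsum(np.sort(np.log(y))[::-1])
    return bool(np.all(lx <= ly + tol) and abs(lx[-1] - ly[-1]) <= tol)

def theorem_3_3_holds(A, B, r):
    lam_powers = eig_product(powm(A, r), powm(B, r))  # lambda(A^r B^r)
    lam_product = eig_product(A, B) ** r              # lambda((AB)^r)
    if r <= 1:
        return log_majorized(lam_powers, lam_product)
    return log_majorized(lam_product, lam_powers)

def random_pd(n, rng):
    Z = rng.standard_normal((n, n))
    return Z @ Z.T + np.eye(n)

rng = np.random.default_rng(3)
A, B = random_pd(4, rng), random_pd(4, rng)
results = {r: theorem_3_3_holds(A, B, r) for r in (0.2, 0.5, 1.0, 2.0, 3.0)}
print(results)
```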
The following result combines Theorems 3.2 and 3.3. Although it is about $(A^r B^r)^{1/r}$, one may also formulate similar results for $(A^{1/r} B^{1/r})^r$ and for $A^r B^r$ and $(AB)^r$.

Theorem 3.4 Let $A, B \in P_n$. The following statements are true and equivalent:
(1) The function $r \mapsto \lambda((A^r B^r)^{1/r})$ is monotonically increasing on $(0, \infty)$ with respect to $\prec_{\log}$.
(2) The function $r \mapsto \lambda((A^r B^r)^{1/r})$ is monotonically increasing on $(0, \infty)$ with respect to $\prec_w$.
(3) The function $r \mapsto \lambda_1((A^r B^r)^{1/r})$ is monotonically increasing on $(0, \infty)$.

Proof The implications (1) $\Rightarrow$ (2) $\Rightarrow$ (3) $\Rightarrow$ (1) follow by (2.2) and the equivalence of Theorems 3.2 and 3.3. $\square$

The following result is then obvious, from which the Golden-Thompson inequality (1.1) follows.

Corollary 3.5 Let $A, B \in P_n$. The following statements are true and equivalent:
(1) $\operatorname{tr}(A^r B^r) \le \operatorname{tr}(AB)^r$ for all $0 \le r \le 1$.
(2) $\operatorname{tr}(A^r B^r) \ge \operatorname{tr}(AB)^r$ for all $r \ge 1$.
(3) The function $r \mapsto \operatorname{tr}(A^r B^r)^{1/r}$ is monotonically increasing on $(0, \infty)$.
(4) The function $r \mapsto \operatorname{tr}(A^{1/r} B^{1/r})^r$ is monotonically decreasing on $(0, \infty)$.
Moreover, $\lim_{r \to 0} \operatorname{tr}(A^{1/r} B^{1/r})^r$ exists and
\[ \lim_{r \to \infty} \operatorname{tr}(A^{1/r} B^{1/r})^r = \operatorname{tr} e^{\log A + \log B}. \]
For all $X, Y \in H_n$, while $e^X e^Y$ is not positive definite in general, $e^{X/2} e^Y e^{X/2}$ is. Note that $e^X e^Y$ and $e^{X/2} e^Y e^{X/2}$ have the same eigenvalues. Thus using the cyclic property of the trace, the Golden-Thompson inequality (1.1) is equivalent to
\[ \operatorname{tr} e^{X+Y} \le \operatorname{tr}(e^{X/2} e^Y e^{X/2}), \qquad \forall X, Y \in H_n. \]
The following result is stronger than the Golden-Thompson inequality.

Theorem 3.5 (Thompson [52]) Let $X, Y \in H_n$. The following statements are true and equivalent:
(1) $\lambda(e^{X+Y}) \prec_{\log} \lambda(e^{X/2} e^Y e^{X/2}) = \lambda(e^X e^Y)$.
(2) $\lambda(e^{X+Y}) \prec_w \lambda(e^{X/2} e^Y e^{X/2}) = \lambda(e^X e^Y)$.
(3) $s(e^{X+Y}) \prec_w s(e^{X/2} e^Y e^{X/2}) \prec_w s(e^X e^Y)$.
(4) $|||e^{X+Y}||| \le |||e^{X/2} e^Y e^{X/2}||| \le |||e^X e^Y|||$ for all unitarily invariant norms $|||\cdot|||$ on $M_n$.
(5) $\|e^{X+Y}\| \le \|e^{X/2} e^Y e^{X/2}\| \le \|e^X e^Y\|$, where $\|\cdot\|$ is the spectral norm.
(6) $\lambda_1(e^{X+Y}) \le \lambda_1(e^{X/2} e^Y e^{X/2}) = \lambda_1(e^X e^Y)$.
In particular, the Golden-Thompson inequality (1.1) follows from (2).

Proof The fact that (6) is valid follows from the Lie-Trotter product formula and Theorem 3.2 (4):
\[ \lambda_1(e^{X+Y}) = \lambda_1\Big(\lim_{n \to \infty} (e^{X/n} e^{Y/n})^n\Big) = \lim_{n \to \infty} \lambda_1((e^{X/n} e^{Y/n})^n) \le \lambda_1(e^X e^Y). \]
The implications (1) $\Rightarrow$ (2) $\Rightarrow$ (3) $\Rightarrow$ (4) $\Rightarrow$ (5) $\Rightarrow$ (6) are obvious in light of (2.2), (2.3), and (2.4), since
\[ \lambda(e^{X+Y}) = s(e^{X+Y}) \quad \text{and} \quad \lambda(e^{X/2} e^Y e^{X/2}) = s(e^{X/2} e^Y e^{X/2}). \]
In (2) $\Rightarrow$ (3), $\lambda(e^X e^Y) \prec_w s(e^X e^Y)$ follows from Weyl's inequality
\[ |\lambda(M)| \prec_{\log} s(M), \qquad \forall M \in M_n. \tag{3.2} \]
It remains to show that (6) $\Rightarrow$ (1). Suppose
\[ \lambda_1(e^{X+Y}) \le \lambda_1(e^X e^Y), \qquad \forall X, Y \in H_n. \]
For all $1 \le k \le n$, we have
\begin{align*}
\prod_{i=1}^{k} \lambda_i(e^{X+Y}) &= \lambda_1(C_k(e^{X+Y})) \\
&= \lambda_1(e^{\Delta_k(X+Y)}) && \text{(by (2.6))} \\
&= \lambda_1(e^{\Delta_k(X) + \Delta_k(Y)}) && \text{(by (2.5))} \\
&\le \lambda_1(e^{\Delta_k(X)} e^{\Delta_k(Y)}) && \text{(by assumption (6) and (2.7))} \\
&= \lambda_1(C_k(e^X) C_k(e^Y)) && \text{(by (2.6))} \\
&= \lambda_1(C_k(e^X e^Y)) && \text{(by Lemma 2.2 (1))} \\
&= \prod_{i=1}^{k} \lambda_i(e^X e^Y).
\end{align*}
In other words, $\lambda(e^{X+Y}) \prec_{w\text{-}\log} \lambda(e^X e^Y)$. Note also that
\[ \det e^{X+Y} = e^{\operatorname{tr}(X+Y)} = e^{\operatorname{tr} X} e^{\operatorname{tr} Y} = \det e^X \det e^Y = \det(e^X e^Y). \]
Therefore, $\lambda(e^{X+Y}) \prec_{\log} \lambda(e^X e^Y)$. This completes the proof. $\square$
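Statement (1) of Thompson's theorem, together with its trace consequence (1.1), can be checked numerically. The following sketch assumes NumPy and random complex Hermitian inputs; all function names and the tolerance are illustrative assumptions.

```python
import numpy as np

def expm_hermitian(H):
    """Exponential of a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh((H + H.conj().T) / 2)
    return (V * np.exp(w)) @ V.conj().T

def random_hermitian(n, rng, scale=0.5):
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return scale * (Z + Z.conj().T) / 2

def log_majorized(x, y, tol=1e-8):
    """x log-majorized by y, for positive vectors, via cumulative sums of sorted logs."""
    lx = np.cumsum(np.sort(np.log(x))[::-1])
    ly = np.cumsum(np.sort(np.log(y))[::-1])
    return bool(np.all(lx <= ly + tol) and abs(lx[-1] - ly[-1]) <= tol)

rng = np.random.default_rng(4)
X, Y = random_hermitian(4, rng), random_hermitian(4, rng)

# lambda(e^{X+Y})
lam_sum = np.linalg.eigvalsh(expm_hermitian(X + Y))

# lambda(e^X e^Y), computed from the Hermitian matrix e^{X/2} e^Y e^{X/2}
half = expm_hermitian(X / 2)
M = half @ expm_hermitian(Y) @ half
lam_prod = np.linalg.eigvalsh((M + M.conj().T) / 2)

thompson_ok = log_majorized(lam_sum, lam_prod)
print(thompson_ok, lam_sum.sum(), lam_prod.sum())
```

Summing the two eigenvalue vectors recovers the two sides of (1.1), which is the implication from (2) noted at the end of the theorem.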
Note that $\lambda((A^{1/2} B A^{1/2})^r) = \lambda((AB)^r)$ and $\lambda(A^{r/2} B^r A^{r/2}) = \lambda(A^r B^r)$ for all $A, B \in P_n$. Therefore, Theorem 3.3 (2) is equivalent to
\[ \lambda((A^{1/2} B A^{1/2})^r) \prec_{\log} \lambda(A^{r/2} B^r A^{r/2}), \qquad \forall r \ge 1. \tag{3.3} \]
Thus we have
\[ \lambda((A^{1/2} B A^{1/2})^{rq}) \prec_{\log} \lambda((A^{r/2} B^r A^{r/2})^{q}), \qquad \forall q \ge 0, \ \forall r \ge 1. \]
It then follows from (2.2) that
\[ \lambda((A^{1/2} B A^{1/2})^{rq}) \prec_{w} \lambda((A^{r/2} B^r A^{r/2})^{q}), \qquad \forall q \ge 0, \ \forall r \ge 1. \]
Hence (1.5) is valid. It is worthwhile to formulate some equivalent forms of (3.3), which correspond to and are equivalent to those in Theorem 3.3.

Theorem 3.7 (Araki [3]) Let $A, B \in P_n$. The following statements are true and equivalent:
(1) $\lambda(A^{r/2} B^r A^{r/2}) \prec_{\log} \lambda((A^{1/2} B A^{1/2})^r)$ for all $0 \le r \le 1$.
(2) $\lambda((A^{1/2} B A^{1/2})^r) \prec_{\log} \lambda(A^{r/2} B^r A^{r/2})$ for all $r \ge 1$.
(3) The function $r \mapsto \lambda((A^{r/2} B^r A^{r/2})^{1/r})$ is monotonically increasing on $(0, \infty)$ with respect to $\prec_{\log}$.
(4) The function $r \mapsto \lambda((A^{1/(2r)} B^{1/r} A^{1/(2r)})^r)$ is monotonically decreasing on $(0, \infty)$ with respect to $\prec_{\log}$.
Moreover, the function $r \mapsto \lambda((A^{1/(2r)} B^{1/r} A^{1/(2r)})^r)$ is bounded and
\[ \lim_{r \to \infty} \lambda((A^{1/(2r)} B^{1/r} A^{1/(2r)})^r) = \lambda(e^{\log A + \log B}). \]

Remark 3.8 The statements in Theorems 3.1, 3.2, 3.3, 3.4, and 3.7 are all equivalent to each other. Combined with the Lie-Trotter product formula, each of them implies Theorem 3.5. Because of the bijection $X \mapsto e^X$ from $H_n$ onto $P_n$, they can be expressed in a form involving the matrix exponential map. Furthermore, in the form of $\prec_{\log}$, they have extensions to Lie groups.

Remark 3.9 Recently Audenaert and Hiai [4] considered the convergence of the sequence $\{(A^{r/2} B^r A^{r/2})^{1/r}\}_{r \in \mathbb{N}}$. They proved that
\[ \lim_{r \to \infty} (A^{r/2} B^r A^{r/2})^{1/r} \]
exists, but its explicit form is not known yet.
Recall that $|X| = (X^*X)^{1/2}$ for all $X \in M_n$. The following result of Ky Fan is important.

Theorem 3.10 (Ky Fan [12]) Let $X \in M_n$ and $m \in \mathbb{N}$. The following two relations are equivalent and valid:
\[ \lambda((X^m)^* X^m) \prec_{\log} \lambda((X^*X)^m), \tag{3.4} \]
\[ \lambda(|X^m|) = s(X^m) \prec_{\log} [s(X)]^m = [\lambda(|X|)]^m. \tag{3.5} \]
Moreover, $(X^m)^* X^m = (X^*X)^m$ if and only if $X \in N_n$.

Proof The equivalence of (3.4) and (3.5) follows immediately. By compound matrix arguments similar to the proof of Theorem 3.3, to derive (3.4) it suffices to show
\[ \lambda_1((X^m)^* X^m) \le \lambda_1((X^*X)^m). \tag{3.6} \]
But (3.6) is just the first of the following inequalities by Ky Fan [12] (see [9, Theorem 1] for more interesting inequalities):
\[ \sum_{j=1}^{k} \lambda_j((X^m)^* X^m) \le \sum_{j=1}^{k} \lambda_j((X^*X)^m), \qquad \forall\, 1 \le k \le n. \]
If $X \in N_n$, then $(X^m)^* X^m = (X^*)^m X^m = (X^*X)^m$. Conversely, if $(X^m)^* X^m = (X^*X)^m$, then $\operatorname{tr}(X^m)^* X^m = \operatorname{tr}(X^*X)^m$, and hence $X \in N_n$ by [46, Theorem 4.4], which states that "if $\operatorname{tr}(M^p)^* M^p = \operatorname{tr}(M^*M)^p$ for some $p \ge 2$, then $M \in M_n$ is normal." $\square$

As an application of Theorem 3.10, the following result is a generalization of the Bernstein inequality (1.2). It is a matrix version of the scalar identity $|e^{x+iy}| = e^x$ for all $x, y \in \mathbb{R}$.

Theorem 3.11 (Cohen [9]) Let $X \in M_n$. The following two relations are equivalent and valid:
\[ \lambda(e^{X^*} e^{X}) \prec_{\log} \lambda(e^{X^* + X}), \tag{3.7} \]
\[ \lambda(|e^X|) = s(e^X) \prec_{\log} s(e^{\operatorname{Re} X}) = \lambda(e^{\operatorname{Re} X}), \tag{3.8} \]
where $\operatorname{Re} X := (X^* + X)/2$ is the Hermitian part of $X$. Moreover, $e^{X^*} e^{X} = e^{X^* + X}$ if and only if $X \in N_n$.

Proof Obviously, (3.7) and (3.8) are equivalent. Applying (3.4) to $e^{X/m}$ and noting that $(e^X)^* = e^{X^*}$, we get
\[ \lambda(e^{X^*} e^{X}) \prec_{\log} \lambda([e^{X^*/m} e^{X/m}]^m), \qquad \forall m \in \mathbb{N}. \tag{3.9} \]
Combining (3.9) with the Lie-Trotter product formula, we have (3.7) by the continuity of eigenvalues. If $X \in N_n$, then $X$ and $X^*$ commute so that $e^{X^*} e^{X} = e^{X^* + X}$. Conversely, if $e^{X^*} e^{X} = e^{X^* + X}$, then $\operatorname{tr} e^{X^*} e^{X} = \operatorname{tr} e^{X^* + X}$, and hence $X \in N_n$ by [46, Theorem 4.7]. $\square$

Theorem 3.12 is a generalization of Theorem 3.5 to normal matrices.

Theorem 3.12 Let $X, Y \in N_n$. Then
\[ \lambda(|e^{X+Y}|) \prec_{\log} \lambda(|e^X|\,|e^Y|). \]

Proof Applying (3.8) to $X + Y$, we have
\begin{align*}
\lambda(|e^{X+Y}|) &\prec_{\log} \lambda(e^{((X+Y)^* + (X+Y))/2}) = \lambda(e^{(X^* + X)/2 + (Y^* + Y)/2}) \\
&\prec_{\log} \lambda(e^{(X^* + X)/2} e^{(Y^* + Y)/2}) && \text{(by Theorem 3.5 (1))} \\
&= \lambda(|e^X|\,|e^Y|). && \text{(since } X, Y \in N_n)
\end{align*}
This completes the proof. $\square$

Theorem 3.13 is a generalization of Theorem 3.7 to normal matrices.

Theorem 3.13 Let $X, Y \in N_n$. Then
\[ \lambda((|e^{X/2}|\,|e^{Y}|\,|e^{X/2}|)^r) \prec_{\log} \lambda(|e^{rX/2}|\,|e^{rY}|\,|e^{rX/2}|), \qquad \forall r \ge 1, \]
\[ \lambda(|e^{rX/2}|\,|e^{rY}|\,|e^{rX/2}|) \prec_{\log} \lambda((|e^{X/2}|\,|e^{Y}|\,|e^{X/2}|)^r), \qquad \forall\, 0 \le r \le 1. \]

Proof Note that $|e^X| = (e^{X^*} e^{X})^{1/2} = (e^{X^* + X})^{1/2} = e^{(X^* + X)/2}$ for $X \in N_n$ and that $(X^* + X)/2 \in H_n$. Application of Theorem 3.7 to $A = e^{(X^* + X)/2}$ and $B = e^{(Y^* + Y)/2}$ yields the desired results. $\square$

Recall that the t-geometric mean
\[ A \sharp_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^{t} A^{1/2}, \qquad t \in [0, 1], \]
is the geodesic joining $A, B \in P_n$. Since the determinant function is multiplicative, we have
\[ \det(A \sharp_t B) = \det(A^{1-t}) \det(B^t) = (\det A)^{1-t} (\det B)^t. \]
The following limit result was proved in [24, p.172].

Theorem 3.14 (Hiai and Petz [24]) If $A, B \in P_n$ and $t \in [0, 1]$, then
\[ \lim_{r \to 0} (A^r \sharp_t B^r)^{1/r} = e^{(1-t) \log A + t \log B}. \]
The following interesting result summarizes relations for t-geometric means in a way similar to Theorem 3.3. See [17] for a geometric interpretation of the inequalities in Theorem 3.15.

Theorem 3.15 (Ando and Hiai [2]) Let $A, B \in P_n$ and $t \in [0, 1]$. Then the following relations are equivalent and valid:
(1) $\lambda((A \sharp_t B)^r) \prec_{\log} \lambda(A^r \sharp_t B^r)$ for all $0 \le r \le 1$.
(2) $\lambda(A^r \sharp_t B^r) \prec_{\log} \lambda((A \sharp_t B)^r)$ for all $r \ge 1$.
(3) The function $r \mapsto \lambda((A^r \sharp_t B^r)^{1/r})$ is monotonically decreasing on $(0, \infty)$ with respect to $\prec_{\log}$. In other words,
\[ \lambda((A^p \sharp_t B^p)^{1/p}) \prec_{\log} \lambda((A^q \sharp_t B^q)^{1/q}), \qquad \forall\, 0 < q < p. \]
(4) The function $r \mapsto \lambda((A^{1/r} \sharp_t B^{1/r})^r)$ is monotonically increasing on $(0, \infty)$ with respect to $\prec_{\log}$.
Moreover,
\[ \lim_{r \to \infty} \lambda((A^{1/r} \sharp_t B^{1/r})^r) = \lambda(e^{(1-t) \log A + t \log B}). \]

Proof The proof of the equivalence of (1)–(4) is similar to that in Theorem 3.1. By a compound matrix argument, the validity of (2) can be reduced to the validity of
\[ \lambda_1(A^r \sharp_t B^r) \le \lambda_1((A \sharp_t B)^r), \qquad \forall r \ge 1, \]
which was shown in [2, p.119–120]. $\square$

By Theorem 3.15 (4), $\lambda_1((A^{1/r} \sharp_t B^{1/r})^r) \in \mathbb{R}$ is monotonically decreasing as $r$ tends to $0$. It is bounded by $0$ from below since $(A^{1/r} \sharp_t B^{1/r})^r \in P_n$. So the limit of $\lambda_1((A^{1/r} \sharp_t B^{1/r})^r)$ exists as $r$ tends to $0$. Thus $\lim_{r \to 0} \lambda((A^{1/r} \sharp_t B^{1/r})^r)$ exists by the compound matrix argument. It is natural to ask the following questions, which are also mentioned in [4].

Question 3.16 ([4]) What is $\lim_{r \to 0} \lambda((A^{1/r} \sharp_t B^{1/r})^r)$?

Motivated by Remark 3.9, we ask if the limit of the t-geometric mean matrix exists, not just its eigenvalues, as follows:

Question 3.17 ([4]) Does $\lim_{r \to 0} (A^{1/r} \sharp_t B^{1/r})^r$ exist? If so, what is the limit?

Combining Theorem 3.3 and Theorem 3.15 yields the following result. We will see more in Theorem 3.24.

Theorem 3.18 If $A, B \in P_n$ and $t \in [0, 1]$, then
\[ \lambda((A^r \sharp_t B^r)^{1/r}) \prec_{\log} \lambda(e^{(1-t) \log A + t \log B}) \prec_{\log} \lambda((A^{(1-t)s} B^{ts})^{1/s}) \]
for all $r > 0$ and $s > 0$. When $r = s = 1$, we have
\[ \lambda(A \sharp_t B) \prec_{\log} \lambda(A^{1-t} B^t). \]

Recall that a real-valued function $f : I \to \mathbb{R}$, where $I \subset \mathbb{R}$ is an interval, is called convex if
\[ f((1-\alpha)s + \alpha r) \le (1-\alpha) f(s) + \alpha f(r), \qquad \forall s, r \in I, \ \alpha \in [0, 1], \]
i.e., the line segment between the points $(s, f(s))$ and $(r, f(r))$ on the graph of the function lies above the graph between $s$ and $r$. Similarly, it is called concave if
\[ (1-\alpha) f(s) + \alpha f(r) \le f((1-\alpha)s + \alpha r), \qquad \forall s, r \in I, \ \alpha \in [0, 1]. \]
Motivated by these classical concepts and Theorem 3.15 (3) and (4), we introduce analogous notions. Let $\xi : I \to P_n$, where $I \subset \mathbb{R}$ is an interval. We call $\xi$ geodesically log-convex if
\[ \lambda(\xi((1-\alpha)s + \alpha r)) \prec_{\log} \lambda(\xi(s) \sharp_\alpha \xi(r)), \qquad \forall s, r \in I, \ \alpha \in [0, 1], \]
and geodesically log-concave if
\[ \lambda(\xi(s) \sharp_\alpha \xi(r)) \prec_{\log} \lambda(\xi((1-\alpha)s + \alpha r)), \qquad \forall s, r \in I, \ \alpha \in [0, 1]. \]
Here is the geometry behind the definition. The line segment $[s, r]$ is the geodesic from $s$ to $r$ in the one-dimensional Euclidean manifold $(0, \infty)$, the domain of $\xi$. Now the role of $\mathbb{R}$, the range of $f$, is played by $P_n$, the range of $\xi$. Thus the line segment $(1-\alpha) f(s) + \alpha f(r)$, $\alpha \in [0, 1]$, from $f(s)$ to $f(r)$ in $\mathbb{R}$ should be replaced by the geodesic in $P_n$: $A \sharp_\alpha B$, $\alpha \in [0, 1]$ (see [8, p.205] and [35]). So the definitions of geodesically log-convex and geodesically log-concave functions match our intuition (Figure 1). Let $g : (0, \infty) \to P_n$ be defined by
\[ g(r) := (A^r \sharp_t B^r)^{1/r}. \]
Based on Figure 2, which shows the function $r \mapsto \lambda_1((A^r \sharp_t B^r)^{1/r})$ plotted in MATLAB, and on many other computer experiments, we ask the following question:

Question 3.19 Given $A, B \in P_n$ and $t \in [0, 1]$, does there exist $x \in (0, \infty)$ depending on $A$, $B$, and $t$ such that $g(r)$ is geodesically log-concave on $(0, x]$ and geodesically log-convex on $[x, \infty)$? In other words, does there exist $x \in (0, \infty)$ depending on $A$, $B$, and $t$ such that the following inequalities hold?
\[ \lambda(g(s) \sharp_\alpha g(r)) \prec_{\log} \lambda(g((1-\alpha)s + \alpha r)), \qquad \text{for } s, r \in (0, x], \]
\[ \lambda(g((1-\alpha)s + \alpha r)) \prec_{\log} \lambda(g(s) \sharp_\alpha g(r)), \qquad \text{for } s, r \in [x, \infty). \]
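Figure 1 Geodesically log-concave function. [Schematic: the horizontal axis $\mathbb{R}$ is marked at $0$, $s$, $(1-\alpha)s + \alpha r$, and $r$; the curve passes through $\xi(s)$ and $\xi(r)$.]

Figure 2 The largest eigenvalues of $(A^r \sharp_t B^r)^{1/r}$. [Plot: horizontal axis $r$, from 0 to 35; vertical axis $\lambda_1((A^r \sharp_t B^r)^{1/r})$, from 11.8 to 13.2.]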
Explicitly, for $s, r \in (0, x]$ and $\alpha \in [0, 1]$,
\[ \lambda\big((A^s \sharp_t B^s)^{1/s} \sharp_\alpha (A^r \sharp_t B^r)^{1/r}\big) \prec_{\log} \lambda\big((A^{(1-\alpha)s + \alpha r} \sharp_t B^{(1-\alpha)s + \alpha r})^{1/((1-\alpha)s + \alpha r)}\big), \]
and for $s, r \in [x, \infty)$ and $\alpha \in [0, 1]$,
\[ \lambda\big((A^{(1-\alpha)s + \alpha r} \sharp_t B^{(1-\alpha)s + \alpha r})^{1/((1-\alpha)s + \alpha r)}\big) \prec_{\log} \lambda\big((A^s \sharp_t B^s)^{1/s} \sharp_\alpha (A^r \sharp_t B^r)^{1/r}\big). \]

Remark 3.20 Lee and Lim [33] proved for $A, B \in P_n$ and $\alpha \in [0, 1]$,
\[ A \sharp_\alpha B \le (1-\alpha) A + \alpha B, \tag{3.10} \]
which implies that
\[ \lambda(A \sharp_\alpha B) \prec_{w\text{-}\log} \lambda((1-\alpha) A + \alpha B). \]
The inequality (3.10) is a consequence of the numerical inequality $t^\alpha \le 1 - \alpha + \alpha t$ for $t > 0$, due to Kubo-Ando's theory of operator means in [32]. There may exist a stronger relation, such as log-majorization. Since the determinant function is multiplicative, we have
\[ \det\big((A^s \sharp_t B^s)^{1/s} \sharp_\alpha (A^r \sharp_t B^r)^{1/r}\big) = \det\big(A^{(1-\alpha)s + \alpha r} \sharp_t B^{(1-\alpha)s + \alpha r}\big)^{1/((1-\alpha)s + \alpha r)}. \]
The computer-generated examples analogous to Figure 2 and the determinant equality are the motivation for us to ask Question 3.19.

Recall the t-spectral mean
\[ A \natural_t B = (A^{-1} \sharp B)^{t} A (A^{-1} \sharp B)^{t}, \qquad t \in [0, 1]. \]
The following result is a generalization of (1.7).

Theorem 3.21 (Gan et al. [18]) Let $X, Y \in H_n$. For all $r > 0$, let
\[ \phi(r) = (e^{rX} \sharp\, e^{rY})^{2/r} \quad \text{and} \quad \psi(r) = (e^{rX} \natural\, e^{rY})^{2/r}. \]
Then the following statements are valid:
(1) $\lambda(\phi(r))$ is monotonically decreasing on $(0, \infty)$ with respect to log-majorization:
\[ \lambda((e^{sX} \sharp\, e^{sY})^{2/s}) \prec_{\log} \lambda((e^{rX} \sharp\, e^{rY})^{2/r}), \qquad \forall\, 0 < r < s. \]
(2) $\lambda(\psi(r))$ is monotonically increasing on $(0, \infty)$ with respect to log-majorization.
(3) $\lim_{r \to 0} \phi(r) = e^{X+Y} = \lim_{r \to 0} \psi(r)$.
(4) $\lambda((e^{rX} \sharp\, e^{rY})^{2/r}) \prec_{\log} \lambda(e^{X+Y}) \prec_{\log} \lambda((e^{sX} \natural\, e^{sY})^{2/s})$ for all $r > 0$ and $s > 0$.
In particular, $\operatorname{tr}(e^{rX} \sharp\, e^{rY})^{2/r}$ is monotonically decreasing in $r$ and $\operatorname{tr}(e^{rX} \natural\, e^{rY})^{2/r}$ is monotonically increasing in $r$.

Proof (1) This follows from a special case of Theorem 3.15.
(2) According to [29, Proposition 2.3], there exists a unitary $U \in M_n$ such that
\[ \psi(r) = U (e^{rX/2} e^{rY} e^{rX/2})^{1/r} U^*. \tag{3.11} \]
So $\lambda(\psi(r)) = \lambda((e^{rX/2} e^{rY} e^{rX/2})^{1/r}) = \lambda((e^{rX} e^{rY})^{1/r})$ for all $r > 0$. Then (2) follows from Theorem 3.3.
394
L. Gan et al.
(3) This is implied by the Lie-Trotter formula, as also stated in [1, p.191] and [29, p.444].
(4) Combining (1)–(3) yields (4). □

It can be seen from Theorem 3.21 (4), taking $r = s = 2$, that
\[ \lambda(A \sharp B) \prec_{\log} \lambda(A \natural B), \quad \forall A, B \in \mathbb{P}_n, \]
which was first pointed out by Ahn et al. [1, p.192]. It was generalized to the t-geometric mean and the t-spectral mean in [18].

Theorem 3.22 (Gan et al. [18]) For all $A, B \in \mathbb{P}_n$ and $t \in [0, 1]$, we have
\[ \lambda(A \sharp_t B) \prec_{\log} \lambda(A \natural_t B). \]

Since $\lambda\big((A^{(1-t)s} B^{ts})^{1/s}\big) = \lambda\big((B^{ts/2} A^{(1-t)s} B^{ts/2})^{1/s}\big)$, we may rewrite Theorem 3.18 as
\[ \lambda(A \sharp_t B) \prec_{\log} \lambda\big(e^{(1-t)\log A + t \log B}\big) \prec_{\log} \lambda\big((B^{ts/2} A^{(1-t)s} B^{ts/2})^{1/s}\big), \tag{3.12} \]
for any $t \in [0, 1]$ and $s > 0$. Motivated by the inequalities (3.12) and Theorem 3.22, it is natural to ask whether $\lambda(A \natural_t B)$ is an upper bound of $\lambda\big((B^{ts/2} A^{(1-t)s} B^{ts/2})^{1/s}\big)$ (and thus an upper bound of $\lambda(e^{(1-t)\log A + t \log B})$). Very recently Gan and Tam [17] proved that this is the case for a specific range of $s$, and the range depends on the given $t \in [0, 1]$.

Theorem 3.23 (Gan and Tam [17]) Let $A, B \in \mathbb{P}_n$. For each chosen $t \in [0, 1]$, let $0 < s \le \min\{1/t, 1/(1-t)\}$. We have
\[ \lambda\big((B^{ts/2} A^{(1-t)s} B^{ts/2})^{1/s}\big) \prec_{\log} \lambda(A \natural_t B). \]
Moreover, a numerical example was given in [17] to show that the upper bound $\min\{1/t, 1/(1-t)\}$ for $s$ is needed. Gan and Tam [17] also proved the following result.

Theorem 3.24 (Gan and Tam [17]) Let $A, B \in \mathbb{P}_n$. For $t \in [0, 1]$ and $s > 0$, we have
\[ \lambda\big((B^{ts/2} A^{(1-t)s} B^{ts/2})^{1/s}\big) \prec_{\log} \lambda\big((A^s \natural_t B^s)^{1/s}\big). \]
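In particular, setting $s = 1$ yields
\[ \lambda(A \sharp_t B) \prec_{\log} \lambda\big(e^{(1-t)\log A + t \log B}\big) \prec_{\log} \lambda\big(B^{t/2} A^{1-t} B^{t/2}\big) \prec_{\log} \lambda(A \natural_t B), \]
for all $0 \le t \le 1$.

Motivated by Theorem 3.15, we would like to know whether an analogous relation holds for the t-spectral mean. The following theorem shows that such a relation does exist, but in the reverse order.

Theorem 3.25 (Gan and Tam [17]) For every $A, B \geqslant 0$ and $0 \le t \le 1$,
(1) $\lambda(A^r \natural_t B^r) \prec_{\log} \lambda((A \natural_t B)^r)$, $0 < r \le 1$.
(2) $\lambda((A \natural_t B)^r) \prec_{\log} \lambda(A^r \natural_t B^r)$, $r \geqslant 1$.
(3) The function $r \mapsto \lambda\big((A^r \natural_t B^r)^{1/r}\big)$ is monotonically increasing on $(0, \infty)$ with respect to $\prec_{\log}$. In other words,
\[ \lambda\big((A^q \natural_t B^q)^{1/q}\big) \prec_{\log} \lambda\big((A^p \natural_t B^p)^{1/p}\big), \quad \forall\, 0 < q < p. \]
(4) The function $r \mapsto \lambda\big((A^{1/r} \natural_t B^{1/r})^{r}\big)$ is monotonically decreasing on $(0, \infty)$ with respect to $\prec_{\log}$.

Moreover, Ahn et al. [1] proved that
\[ \lim_{r \to 0} (A^r \natural_t B^r)^{1/r} = e^{(1-t)\log A + t \log B}, \quad \forall t \in [0, 1]. \]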
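The chain of log-majorizations obtained by setting $s = 1$ can be checked numerically. The following is a minimal sketch, assuming NumPy and positive definite test matrices; the helper names (`herm_fun`, `geo_mean`, `spec_mean`, `log_majorized`, `check_chain`) are hypothetical and introduced only for illustration.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    return (V * f(w)) @ V.conj().T

def powm(A, t):
    """Real power of a positive definite matrix."""
    return herm_fun(A, lambda w: w ** t)

def geo_mean(A, B, t=0.5):
    """t-geometric mean A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    Ah, Aih = powm(A, 0.5), powm(A, -0.5)
    return Ah @ powm(Aih @ B @ Aih, t) @ Ah

def spec_mean(A, B, t=0.5):
    """t-spectral mean (A^{-1} # B)^t A (A^{-1} # B)^t."""
    G = powm(geo_mean(powm(A, -1.0), B), t)
    return G @ A @ G

def eigs_desc(M):
    """Eigenvalues of a Hermitian matrix in nonincreasing order."""
    return np.sort(np.linalg.eigvalsh((M + M.conj().T) / 2))[::-1]

def log_majorized(x, y, tol=1e-8):
    """True if x is log-majorized by y (both positive, nonincreasing)."""
    cx, cy = np.cumsum(np.log(x)), np.cumsum(np.log(y))
    return bool(np.all(cx[:-1] <= cy[:-1] + tol) and abs(cx[-1] - cy[-1]) <= tol)

def random_pd(n, rng):
    """A well-conditioned random positive definite matrix (test data only)."""
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return M @ M.conj().T / n + np.eye(n)

def check_chain(A, B, t):
    """Check the four-term chain for the case s = 1."""
    mid = herm_fun((1 - t) * herm_fun(A, np.log) + t * herm_fun(B, np.log), np.exp)
    Bt = powm(B, t / 2)
    terms = [geo_mean(A, B, t), mid, Bt @ powm(A, 1 - t) @ Bt, spec_mean(A, B, t)]
    lams = [eigs_desc(T) for T in terms]
    return all(log_majorized(lams[i], lams[i + 1]) for i in range(3))
```

A call such as `check_chain(random_pd(4, rng), random_pd(4, rng), 0.5)` compares the partial products of the ordered eigenvalues term by term; the tolerance is a numerical assumption, not part of the theory.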
4 Extensions to Lie Groups

We quickly review some algebraic structures of semisimple Lie groups (see [20, 30]). We denote by $G$ a noncompact connected semisimple Lie group and by $\mathfrak{g}$ its Lie algebra. A Cartan involution of $G$ is denoted by $\Theta : G \to G$. The fixed point set $K$ of $\Theta$ is an analytic subgroup of $G$. Denote by $\theta$ the differential map $d\Theta$ of $\Theta$. Then $\theta : \mathfrak{g} \to \mathfrak{g}$ is a Cartan involution. Let $\mathfrak{k}$ be the eigenspace of $\theta$ corresponding to the eigenvalue $1$, which is also the Lie algebra of $K$, and let $\mathfrak{p}$ be the eigenspace of $\theta$ corresponding to the eigenvalue $-1$, which is also an $\operatorname{Ad} K$-invariant subspace of $\mathfrak{g}$ complementary to $\mathfrak{k}$. Then $\mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p}$ is a Cartan decomposition. The Killing form $B$ on $\mathfrak{g}$ is negative definite on $\mathfrak{k}$ and positive definite on $\mathfrak{p}$. The bilinear form $B_\theta$ is an inner product on $\mathfrak{g}$, defined as
\[ B_\theta(X, Y) = -B(X, \theta Y), \quad X, Y \in \mathfrak{g}. \]

For each $X \in \mathfrak{g}$, let $e^X = \exp X$ be the exponential of $X$. Let
\[ P := \{ e^X : X \in \mathfrak{p} \}. \]
The mapping $K \times \mathfrak{p} \to G$, defined by $(k, X) \mapsto k e^X$, is a diffeomorphism. So each $g \in G$ can be uniquely written as
\[ g = kp = k(g)\,p(g) \tag{4.1} \]
with $k = k(g) \in K$ and $p = p(g) \in P$. The right Cartan decomposition of $G$ can be written as $G = KP$. Correspondingly, $G = PK$ is the left Cartan decomposition. In this section, we use the right Cartan decomposition unless otherwise specified, and $p(g)$ in (4.1) is called the $P$-component of $g \in G$.

Let $* : G \to G$ be the diffeomorphism defined by $*(g) = \Theta(g^{-1})$. We also write $g^* = *(g)$ for convenience. Note that $*$ is not an automorphism of $G$, since $(fg)^* = g^* f^*$ for all $f, g \in G$. Since $\theta$ is the differential of $\Theta$ at the identity, by the naturality of the exponential map, we have
\[ e^{d*(X)} = (e^X)^* = \Theta(e^{-X}) = e^{-\theta X}, \quad \forall X \in \mathfrak{g}. \]
Thus the differential $d*$ of $*$ is just $-\theta$, and hence $\mathfrak{p}$ is the eigenspace of $d* : \mathfrak{g} \to \mathfrak{g}$ associated with the eigenvalue $1$. Similar to the group case, for convenience we denote $d*(X) = X^*$ for all $X \in \mathfrak{g}$. Hence we have
\[ (X^*)^* = X \quad \text{and} \quad X + X^* \in \mathfrak{p}, \quad \forall X \in \mathfrak{g}. \]
It follows that
\[ X = \frac{X - X^*}{2} + \frac{X + X^*}{2} = X_{\mathfrak{k}} + X_{\mathfrak{p}} \in \mathfrak{k} + \mathfrak{p}, \quad \forall X \in \mathfrak{g}. \]
So we call
\[ X_{\mathfrak{k}} := \frac{X - X^*}{2} \quad \text{and} \quad X_{\mathfrak{p}} := \frac{X + X^*}{2} \tag{4.2} \]
the $\mathfrak{k}$-component and $\mathfrak{p}$-component of $X \in \mathfrak{g}$, respectively. Because $K$ is the fixed point set of $\Theta$ and $\exp_{\mathfrak{g}} : \mathfrak{p} \to P$ is bijective, we see that $k^* = k^{-1}$ for all $k \in K$ and $p^* = p$ for all $p \in P$. By the Cartan decomposition (4.1), we have the $P$-component
\[ p(g) = (g^* g)^{1/2}, \quad \forall g \in G. \tag{4.3} \]
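For matrix groups, formula (4.3) is the familiar polar decomposition. The sketch below assumes the concrete case of invertible complex matrices with $g^*$ the conjugate transpose (an assumption for illustration, anticipating the matrix setting); the function name `cartan_decomposition` is hypothetical.

```python
import numpy as np

def cartan_decomposition(g):
    """Right Cartan (polar) decomposition g = k p with p = (g* g)^{1/2}.

    Assumes g is an invertible complex matrix and * is the conjugate transpose.
    """
    w, V = np.linalg.eigh(g.conj().T @ g)      # g* g is positive definite
    p = (V * np.sqrt(w)) @ V.conj().T          # P-component p(g)
    k = g @ np.linalg.inv(p)                   # K-component k(g)
    return k, p

rng = np.random.default_rng(1)
n = 3
g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
g = g / np.linalg.det(g) ** (1.0 / n)          # rescale so that det g = 1
k, p = cartan_decomposition(g)
```

Here `k` should be unitary and `p` Hermitian positive definite, with `k @ p` reproducing `g` up to rounding.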
An element $g \in G$ (resp., $X \in \mathfrak{g}$) is said to be normal if $g^* g = g g^*$ (resp., $[X^*, X] = 0$). It follows that if $X \in \mathfrak{g}$ is normal, then $e^X$ is normal in $G$. According to [20, p.183], the Cartan involution (and hence the Cartan decomposition) is unique up to conjugation, so normality is independent of the choice of $\Theta$. Obviously, elements in $P$ are normal. If $g = kp$ is the Cartan decomposition of $g \in G$, then $g$ is normal if and only if $kp = pk$.

Fix a maximal abelian subspace $\mathfrak{a}$ of $\mathfrak{p}$ and let $A$ be the analytic subgroup generated by $\mathfrak{a}$. A fixed closed Weyl chamber in $\mathfrak{a}$ is written as $\mathfrak{a}_+$, and let $A_+ = \exp \mathfrak{a}_+$. Every element of $\mathfrak{p}$ is $K$-conjugate to a unique element of $\mathfrak{a}_+$; that is, if $X \in \mathfrak{p}$, there is a unique $Z \in \mathfrak{a}_+$ and some $k \in K$ such that
\[ X = \operatorname{Ad} k(Z). \]
We thus let $a_+(X) = Z$. It follows that
\[ \exp X = \exp(\operatorname{Ad} k(Z)) = k \exp(Z) k^{-1} \in K A_+ K. \]
With the Cartan decomposition $g = pk$, applying this relation to the $P$-component of any $g \in G$ yields the Lie group decomposition $G = K A_+ K$. Thus each $g \in G$ can be decomposed as
\[ g = uav, \quad u, v \in K,\ a \in A_+. \]
Furthermore, $a \in A_+$ is uniquely determined, and we let $a_+(g) = a$.

Example 4.1 Consider the real simple Lie group $G = SL_n(\mathbb{C})$, the special linear group consisting of all $n \times n$ complex matrices whose determinants are 1. Its Lie algebra is $\mathfrak{g} = \mathfrak{sl}_n(\mathbb{C})$, the special linear algebra consisting of all $n \times n$ complex matrices whose traces are 0. Let a Cartan involution $\Theta$ of $G$ be chosen as
\[ \Theta(g) = (g^{-1})^*, \quad \forall g \in G, \]
where $*$ denotes conjugate transpose. Then the diffeomorphism $* : G \to G$ is given by
\[ *(g) = \Theta(g^{-1}) = g^*, \quad \forall g \in G, \]
the same as the usual conjugate transpose for matrices (so normality in $G$ is the usual normality of matrices), $K = SU(n)$ is the special unitary group consisting of all $n \times n$ unitary matrices whose determinants are 1, the corresponding Cartan involution $\theta = d\Theta$ of $\mathfrak{g}$ is given by
\[ \theta(X) = -X^*, \quad \forall X \in \mathfrak{g}, \]
the corresponding Lie algebra Cartan decomposition is
\[ \mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p} = \mathfrak{su}_n \oplus i\,\mathfrak{su}_n, \]
with $\mathfrak{k} = \mathfrak{su}_n$ consisting of skew-Hermitian matrices in $\mathfrak{g}$ and $\mathfrak{p} = i\,\mathfrak{su}_n$ consisting of Hermitian matrices in $\mathfrak{g}$, and the symmetric positive definite bilinear form $B_\theta$ is given by (up to a positive scalar)
\[ B_\theta(X, Y) = \operatorname{tr} XY^*, \quad \forall X, Y \in \mathfrak{g}. \]
If we denote $P = \exp \mathfrak{p} = \exp(i\,\mathfrak{su}_n)$, then the corresponding Lie group Cartan decompositions are $G = PK$ and $G = KP$, which are the usual left and right polar decompositions for matrices on the element level, respectively. If we choose $\mathfrak{a}$ to be the maximal abelian subspace of $\mathfrak{p}$ consisting of all diagonal traceless matrices and choose $\mathfrak{a}_+$ to be the closed Weyl chamber in $\mathfrak{a}$ consisting of all diagonal traceless matrices whose entries are in nonincreasing order, then the $K A_+ K$ decomposition of $G$ is just the singular value decomposition for matrices.

An element $X \in \mathfrak{g}$ is called real semisimple (resp., nilpotent) if $\operatorname{ad} X$ is diagonalizable over $\mathbb{R}$ (resp., nilpotent). An element $g \in G$ is called hyperbolic (resp., unipotent) if $g = \exp X$ for some real semisimple (resp., nilpotent) $X \in \mathfrak{g}$; in either case $X$ is unique and we write $X = \log g$. An element $g \in G$ is called elliptic if $\operatorname{Ad} g$ is diagonalizable over $\mathbb{C}$ with eigenvalues of modulus 1. Let $e$ be elliptic, $h$ hyperbolic, and $u$ unipotent. By [31, Proposition 2.1], each $g \in G$ can be uniquely decomposed as
\[ g = ehu, \tag{4.4} \]
where $e$, $h$ and $u$ commute. Equation (4.4) is called the complete multiplicative Jordan decomposition (CMJD for abbreviation). According to [31, Proposition 6.2], the set $L$ of all hyperbolic elements in $G$ can be described as
\[ L = P^2 = \{ pq : p, q \in P \}. \]
Let $\mathfrak{l}$ denote the set of all real semisimple elements in $\mathfrak{g}$. Then the restriction of the exponential map to $\mathfrak{l}$ is a bijection onto $L$. By [31, Proposition 2.4], $X \in \mathfrak{l}$ if and only if $\operatorname{Ad} g(X) \in \mathfrak{a}$ for some $g \in G$. Since $\mathfrak{p} = \operatorname{Ad} K(\mathfrak{a})$, we have
\[ \mathfrak{l} = \operatorname{Ad} G(\mathfrak{a}) = \operatorname{Ad} G(\mathfrak{p}). \]
The Weyl group $W$ of $(\mathfrak{g}, \mathfrak{a})$ acts on $\mathfrak{a}$ (and also on $A$ through the exponential map $\exp : \mathfrak{a} \to A$), and it acts simply transitively on the set of Weyl chambers in $\mathfrak{a}$. For any real semisimple $X \in \mathfrak{g}$, let $W(X)$ denote the set of elements in $\mathfrak{a}$ that are conjugate to $X$, i.e.,
\[ W(X) = \operatorname{Ad} G(X) \cap \mathfrak{a}. \]
It is known from [31, Proposition 2.4] that $W(X)$ is a single $W$-orbit in $\mathfrak{a}$. Let $\operatorname{conv} W(X)$ be the convex hull in $\mathfrak{a}$ generated by $W(X)$. For each $g \in G$, define
\[ A(g) := \exp \operatorname{conv} W(\log h(g)), \]
where $h(g)$ is the hyperbolic component of $g$ in its CMJD. Kostant's preorder $\prec_G$ on $G$ is defined (see [31, p.426]) by setting
\[ f \prec_G g \iff A(f) \subset A(g). \]
This preorder induces a partial order on the conjugacy classes of $G$. Moreover, this preorder $\prec_G$ does not depend on the choice of $\mathfrak{a}$, because according to [31, Theorem 3.1]
\[ f \prec_G g \iff \rho(\pi(f)) \le \rho(\pi(g)) \tag{4.5} \]
for all finite dimensional irreducible representations $\pi$ of $G$, where $\rho(\pi(g))$ denotes the spectral radius of $\pi(g)$.

Example 4.2 Consider the real simple Lie group $G = SL_n(\mathbb{C})$. Let the Cartan decomposition and the $K A_+ K$ decomposition of $G$ be as in Example 4.1. Then
the Weyl group $W$ is isomorphic to the symmetric group $S_n$ (see [50, Section 2.8] for details), and the CMJD of $g \in G$ is given by $g = ehu$, where $e$ is diagonalizable with eigenvalues of modulus 1, $h$ is diagonalizable with positive eigenvalues, and $u$ is unipotent (see [50, Theorem 1.3] for details). In particular, if $g \in G$, then $h(g)$ is conjugate to $\operatorname{diag} |\lambda(g)| \in A_+$, where $\operatorname{diag} |\lambda(g)|$ is the diagonal matrix whose diagonal entries are the absolute values of the eigenvalues of $g$ in nonincreasing order. It follows that
\[ A(g) = \exp \operatorname{conv} W \log h(g) \cong \exp \operatorname{conv} S_n \log(|\lambda(g)|) \subset \mathbb{R}^n_+. \]
Therefore, by (2.1), Kostant's preorder $\prec_G$ on $SL_n(\mathbb{C})$ means that
\[ f \prec_G g \iff |\lambda(f)| \prec_{\log} |\lambda(g)|. \]
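For $SL_n(\mathbb{C})$ the preorder can therefore be tested directly on eigenvalue moduli. The following is a small sketch under that identification, assuming NumPy; `kostant_precedes` is a hypothetical helper name. The sanity check uses the classical fact that eigenvalue moduli are log-majorized by singular values.

```python
import numpy as np

def abs_eigs_desc(g):
    """Moduli of the eigenvalues of g in nonincreasing order."""
    return np.sort(np.abs(np.linalg.eigvals(g)))[::-1]

def kostant_precedes(f, g, tol=1e-9):
    """Test f ≺_G g in SL_n(C) as |λ(f)| ≺_log |λ(g)|.

    Both inputs are assumed to have determinant 1, so the full products of
    the moduli agree automatically; only partial products are compared.
    """
    cf = np.cumsum(np.log(abs_eigs_desc(f)))
    cg = np.cumsum(np.log(abs_eigs_desc(g)))
    return bool(np.all(cf <= cg + tol))

# A unipotent element of SL_2 and the P-component of its polar decomposition.
g = np.array([[1.0, 1.0], [0.0, 1.0]])
w, V = np.linalg.eigh(g.T @ g)
p = (V * np.sqrt(w)) @ V.T                     # p = (g* g)^{1/2}
```

In this example `g` has eigenvalues 1, 1 while `p` carries the singular values of `g`, so `g` precedes `p` but not conversely.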
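This allows us to think of log-majorization in the context of group actions.

Let $\pi : G \to \operatorname{End} V$ be any finite dimensional irreducible representation, and let $d\pi : \mathfrak{g} \to \operatorname{End} V$ be the induced representation of $\mathfrak{g}$ (i.e., $d\pi$ is the differential of $\pi$ at the identity of $G$). So by the naturality property of the exponential map, we have
\[ \exp_{\operatorname{End} V} \circ\, d\pi = \pi \circ \exp_{\mathfrak{g}}. \tag{4.6} \]
Let $\mathfrak{g}_{\mathbb{C}} = \mathfrak{g} + i\mathfrak{g}$ be the complexification of $\mathfrak{g}$. Since $\mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p}$ is a Cartan decomposition, $\mathfrak{g}_{\mathbb{C}} = \mathfrak{u} \oplus i\mathfrak{u} = (\mathfrak{k} + i\mathfrak{p}) \oplus (\mathfrak{p} + i\mathfrak{k})$ is a Cartan decomposition of $\mathfrak{g}_{\mathbb{C}}$, where $\mathfrak{u} = \mathfrak{k} + i\mathfrak{p}$ is a compact real form of $\mathfrak{g}_{\mathbb{C}}$. Thus there exists a unique (up to scalar) inner product $\langle \cdot, \cdot \rangle$ on $V$ such that $d\pi(X)$ is skew-Hermitian for all $X \in \mathfrak{u}$ and $d\pi(Y)$ is Hermitian for all $Y \in i\mathfrak{u}$, and hence $\pi(k)$ is unitary for $k \in K$ and $\pi(p)$ is positive definite for $p \in P$ by (4.6). We will assume that $V$ is given this inner product. Now if $g = kp$ with $k \in K$ and $p \in P$, we have
\[ \pi(g^*) = (\pi(g))^*, \tag{4.7} \]
where $(\pi(g))^*$ denotes the adjoint operator of $\pi(g)$, because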
\[ \pi(g^*) = \pi(p k^{-1}) = \pi(p)(\pi(k))^{-1} = (\pi(p))^* (\pi(k))^* = (\pi(k)\pi(p))^* = (\pi(g))^*. \]
Because of the existence of such an inner product on $V$, the operator norm $\|\pi(g)\|$ is defined for all $g \in G$. Therefore, as a special case of (4.5), we have (see [31, Proposition 4.3])
\[ \|\pi(g)\| \le \|\pi(p)\| \implies g \prec_G p, \quad \forall g \in G,\ \forall p \in P, \tag{4.8} \]
because
\[ \rho(\pi(g)) \le \|\pi(g)\| \le \|\pi(p)\| = \rho(\pi(p)). \]
In particular,
\[ kpv \prec_G p, \quad \forall k, v \in K,\ \forall p \in P. \tag{4.9} \]
In the language of matrices, (4.9) amounts to saying that if $A \in \mathbb{P}_n$ and $X \in GL_n(\mathbb{C})$ has the same singular values as $A$, then
\[ |\lambda(X)| \prec_{\log} |\lambda(A)| = s(A) = s(X). \]
In other words, (4.9) is an extension of Weyl's inequality (3.2) to Lie groups.

Now we are ready to extend the log-majorization results for matrix exponentials in Section “Inequalities for Matrix Exponentials” to semisimple Lie groups with respect to Kostant's preorder. The following is an extension of Theorem 3.3.

Theorem 4.3 (Liao et al. [35]) Let $p, q \in P$. The following statements are true and equivalent:
(1) $p^r q^r \prec_G (pq)^r$ for all $0 \le r \le 1$.
(2) $(pq)^r \prec_G p^r q^r$ for all $r \geqslant 1$.
(3) The mapping $r \mapsto (p^r q^r)^{1/r}$ is monotonically increasing on $(0, \infty)$ with respect to $\prec_G$. More precisely,
\[ (p^r q^r)^{1/r} \prec_G (p^t q^t)^{1/t}, \quad \forall\, 0 < r < t. \tag{4.10} \]
(4) The mapping $r \mapsto (p^{1/r} q^{1/r})^r$ is monotonically decreasing on $(0, \infty)$ with respect to $\prec_G$.
Proof The proof of the equivalence of (1)–(4) is similar to that in Theorem 3.3. We only need to show that (4.10) is true. Suppose $0 < r < t$. Let $\pi : G \to \operatorname{End} V$ be any irreducible finite dimensional representation. Fix an inner product on $V$ such that $\pi(p)$ and $\pi(q)$ are positive definite. By (4.5), it suffices to show that
\[ \rho\big(\pi[(p^r q^r)^{1/r}]\big) \le \rho\big(\pi[(p^t q^t)^{1/t}]\big). \tag{4.11} \]
Now we have
\begin{align*}
\rho\big(\pi[(p^r q^r)^{1/r}]\big) &= \rho\big([\pi(p^r q^r)]^{1/r}\big) = \big(\rho[\pi(p^r q^r)]\big)^{1/r} \\
&= \big(\rho([\pi(p)]^r [\pi(q)]^r)\big)^{1/r} = \big(\lambda_1([\pi(p)]^r [\pi(q)]^r)\big)^{1/r} \\
&\le \big(\lambda_1([\pi(p)]^t [\pi(q)]^t)\big)^{1/t} \quad \text{(by Theorem 3.2(3))} \\
&= \big(\rho([\pi(p)]^t [\pi(q)]^t)\big)^{1/t} = \rho\big(\pi[(p^t q^t)^{1/t}]\big).
\end{align*}
Thus the desired result (4.11) is valid. This completes the proof. □
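In the matrix case the monotonicity (4.10) can be observed numerically. The sketch below assumes positive definite matrices `p`, `q` and NumPy; the helper names are hypothetical. It uses the similarity of $p^r q^r$ to the positive definite matrix $p^{r/2} q^r p^{r/2}$ to obtain real eigenvalues.

```python
import numpy as np

def powm(A, t):
    """Real power of a positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    return (V * w ** t) @ V.conj().T

def spectrum_power_product(p, q, r):
    """Eigenvalues of (p^r q^r)^{1/r} in nonincreasing order."""
    ph = powm(p, r / 2)
    # p^r q^r is similar to the positive definite matrix p^{r/2} q^r p^{r/2}
    w = np.linalg.eigvalsh(ph @ powm(q, r) @ ph)
    return np.sort(w)[::-1] ** (1.0 / r)

def log_majorized(x, y, tol=1e-8):
    """True if x is log-majorized by y (both positive, nonincreasing)."""
    cx, cy = np.cumsum(np.log(x)), np.cumsum(np.log(y))
    return bool(np.all(cx[:-1] <= cy[:-1] + tol) and abs(cx[-1] - cy[-1]) <= tol)

def random_pd(n, rng):
    """A well-conditioned random real positive definite matrix (test data only)."""
    M = rng.standard_normal((n, n))
    return M @ M.T / n + np.eye(n)
```

Evaluating `spectrum_power_product` on an increasing grid of `r` and comparing consecutive spectra with `log_majorized` illustrates statement (3) for matrices; the grid and tolerance are arbitrary choices.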
Note that $p^r q^r$ and $p^{r/2} q^r p^{r/2}$ are conjugate and that the order $\prec_G$ is preserved under conjugation. So Theorem 4.3 can also be formulated for $p^{r/2} q^r p^{r/2}$ below (which was first introduced in [49]), as an extension of Theorem 3.7. Also see Sarver and Tam [47] and Wang and Gong [54].
Theorem 4.4 (Tam [49]) Let $p, q \in P$. The following statements are true and equivalent:
(1) $p^{r/2} q^r p^{r/2} \prec_G (p^{1/2} q p^{1/2})^r$ for all $0 \le r \le 1$.
(2) $(p^{1/2} q p^{1/2})^r \prec_G p^{r/2} q^r p^{r/2}$ for all $r \geqslant 1$.
(3) The function $r \mapsto (p^{r/2} q^r p^{r/2})^{1/r}$ is monotonically increasing on $(0, \infty)$ with respect to $\prec_G$.
(4) The function $r \mapsto (p^{1/(2r)} q^{1/r} p^{1/(2r)})^r$ is monotonically decreasing on $(0, \infty)$ with respect to $\prec_G$.

Theorem 4.3 and the Lie-Trotter product formula
\[ \lim_{m \to \infty} (e^{X/m} e^{Y/m})^m = e^{X+Y}, \quad \forall X, Y \in \mathfrak{g}, \tag{4.12} \]
together yield the following result of Kostant [31, Theorem 6.3], as an extension of Theorem 3.5.
Theorem 4.5 (Kostant [31]) If $X, Y \in \mathfrak{p}$, then
\[ e^{X+Y} \prec_G e^X e^Y. \tag{4.13} \]

Proof Theorem 4.3(4) implies that
\[ (e^{X/r} e^{Y/r})^r \prec_G e^X e^Y, \quad \forall r > 1. \]
Since the spectral radius is a continuous function on the space of operators, it follows from (4.5) that $\prec_G$ is continuous. So we have
\[ \lim_{r \to \infty} (e^{X/r} e^{Y/r})^r \prec_G e^X e^Y. \]
But the left-hand side is exactly $e^{X+Y}$ by the Lie-Trotter product formula (4.12). □

So for any character $\chi : G \to \mathbb{C}$ associated with a finite dimensional irreducible representation of $G$, it can be deduced [31, p.447] that
\[ \chi(e^{X+Y}) \le \chi(e^X e^Y). \]
So we recover the Golden-Thompson inequality when $G = SL_n(\mathbb{C})$ and $\chi = \operatorname{tr}$. The following result is an extension of Theorem 3.10.

Theorem 4.6 (Kostant [31]) For any $g \in G$, we have
\[ g^{2m} \prec_G (g^m)^* g^m \prec_G (g^* g)^m, \quad \forall m \in \mathbb{N}. \tag{4.14} \]

Proof Let $\pi$ be any finite dimensional irreducible representation of $G$. By the properties of the operator norm, we have
\[ \|\pi(g^{2m})\| \le \|\pi(g^m)\|^2 \le \|\pi(g)\|^{2m}. \]
Note that $\|\pi(g^m)\|^2 = \|[\pi(g^m)]^* \pi(g^m)\|$ and that by (4.7)
\[ \|\pi(g)\|^{2m} = \|[\pi(g)]^* \pi(g)\|^m = \|\pi(g^* g)\|^m = \|[\pi(g^* g)]^m\| = \|\pi((g^* g)^m)\|, \]
where the second to last equality holds because $\pi(g^* g)$ is positive definite. It follows that
\[ \|\pi(g^{2m})\| \le \|\pi((g^m)^* g^m)\| \le \|\pi((g^* g)^m)\|. \tag{4.15} \]
Since $(g^m)^* g^m$ and $(g^* g)^m$ are in $P$, (4.15) yields (4.14) by (4.8). □
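For matrices, Theorems 4.5 and 4.6 become statements about eigenvalue moduli and singular values, which can be probed numerically. The following sketch assumes Hermitian `X`, `Y` and an invertible complex matrix `g`, with NumPy; all function names are hypothetical.

```python
import numpy as np

def expm_herm(X):
    """Exponential of a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.exp(w)) @ V.conj().T

def desc(x):
    """Sort a real array in nonincreasing order."""
    return np.sort(np.asarray(x, dtype=float))[::-1]

def log_majorized(x, y, tol=1e-8):
    """True if x is log-majorized by y (both positive, nonincreasing)."""
    cx, cy = np.cumsum(np.log(x)), np.cumsum(np.log(y))
    return bool(np.all(cx[:-1] <= cy[:-1] + tol) and abs(cx[-1] - cy[-1]) <= tol)

def golden_thompson_spectra(X, Y):
    """Spectra of e^{X+Y} and of e^X e^Y (via the similar e^{X/2} e^Y e^{X/2})."""
    h = expm_herm(X / 2)
    return (desc(np.linalg.eigvalsh(expm_herm(X + Y))),
            desc(np.linalg.eigvalsh(h @ expm_herm(Y) @ h)))

def theorem_4_6_spectra(g, m):
    """|λ(g^{2m})|, λ((g^m)* g^m) and λ((g* g)^m), each nonincreasing."""
    gm = np.linalg.matrix_power(g, m)
    a = desc(np.abs(np.linalg.eigvals(g)) ** (2 * m))
    b = desc(np.linalg.eigvalsh(gm.conj().T @ gm))
    c = desc(np.linalg.eigvalsh(g.conj().T @ g) ** m)
    return a, b, c
```

Summing the two spectra returned by `golden_thompson_spectra` gives the two traces in the Golden-Thompson inequality, and `log_majorized` applied to consecutive outputs of `theorem_4_6_spectra` mirrors (4.14).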
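Recall from (4.2) and (4.3) that $X_{\mathfrak{p}} = (X + X^*)/2$ and $p(g) = (g^* g)^{1/2}$ are the $\mathfrak{p}$-component of $X \in \mathfrak{g}$ and the $P$-component of $g \in G$, respectively. The following result is an extension of Theorem 3.11.

Theorem 4.7 (Liu [37]) For all $X \in \mathfrak{g}$, we have
\[ e^{X^*} e^X \prec_G e^{X^* + X}. \tag{4.16} \]
Equivalently,
\[ p(e^X) \prec_G e^{X_{\mathfrak{p}}}, \quad \forall X \in \mathfrak{g}. \]

Proof Applying $(g^m)^* g^m \prec_G (g^* g)^m$ in Theorem 4.6 to $g = e^{X/m}$, we see that
\[ (e^X)^* e^X \prec_G \big((e^{X/m})^* e^{X/m}\big)^m, \quad \forall m \in \mathbb{N}. \]
Note that $e^{X^*} = (e^X)^*$ by the naturality of the exponential map. So the above relation means that
\[ e^{X^*} e^X \prec_G \big(e^{X^*/m} e^{X/m}\big)^m, \quad \forall m \in \mathbb{N}. \]
Applying the Lie-Trotter product formula on the right-hand side yields (4.16). Now (4.16) is equivalent to
\[ (p(e^X))^2 = (e^X)^* e^X \prec_G e^{X^* + X} = (e^{X_{\mathfrak{p}}})^2. \]
Thus $p(e^X) \prec_G e^{X_{\mathfrak{p}}}$ for all $X \in \mathfrak{g}$. □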
Now we combine Theorem 4.5, Theorem 4.6, and Theorem 4.7.

Theorem 4.8 For all $X, Y \in \mathfrak{g}$, we have
\[ e^{X+Y} \prec_G p(e^{X+Y}) \prec_G e^{(X+Y)^*/2} e^{(X+Y)/2} \prec_G e^{X_{\mathfrak{p}} + Y_{\mathfrak{p}}} \prec_G e^{X_{\mathfrak{p}}} e^{Y_{\mathfrak{p}}}. \]
In particular,
\[ e^X \prec_G p(e^X) \prec_G e^{X^*/2} e^{X/2} \prec_G e^{X_{\mathfrak{p}}}, \quad \forall X \in \mathfrak{g}. \]

Proof The first inequality of the theorem follows from (4.9) directly. Applying $(g^m)^* g^m \prec_G (g^* g)^m$ in Theorem 4.6 to $g = e^{(X+Y)/2}$ with $m = 2$, we see that
\[ (e^{X+Y})^* e^{X+Y} \prec_G \big(e^{(X+Y)^*/2} e^{(X+Y)/2}\big)^2, \]
which implies the second inequality. Theorem 4.7 implies that
\[ e^{(X+Y)^*/2} e^{(X+Y)/2} \prec_G e^{((X+Y)^* + (X+Y))/2} = e^{X_{\mathfrak{p}} + Y_{\mathfrak{p}}}, \]
and $e^{X_{\mathfrak{p}} + Y_{\mathfrak{p}}} \prec_G e^{X_{\mathfrak{p}}} e^{Y_{\mathfrak{p}}}$ follows from Theorem 4.5. □
The following result generalizes Theorem 3.12 and Theorem 4.5.

Theorem 4.9 If $X, Y \in \mathfrak{g}$ are normal, then
\[ p(e^{X+Y}) \prec_G p(e^X)\, p(e^Y). \tag{4.17} \]
In particular, if $X, Y \in \mathfrak{p}$, then (4.17) reduces to (4.13) in Theorem 4.5.

Proof Since $X \in \mathfrak{g}$ is normal, we have $e^{X^* + X} = e^{X^*} e^X$ and thus
\[ e^{X_{\mathfrak{p}}} = e^{(X^* + X)/2} = \big(e^{X^* + X}\big)^{1/2} = \big(e^{X^*} e^X\big)^{1/2} = p(e^X). \tag{4.18} \]
The same holds for $Y$. According to Theorem 4.8, we have
\[ p(e^{X+Y}) \prec_G e^{X_{\mathfrak{p}}} e^{Y_{\mathfrak{p}}} = p(e^X)\, p(e^Y). \]
This completes the proof. □

The following result generalizes Theorem 3.13 and Theorem 4.4.

Theorem 4.10 If $X, Y \in \mathfrak{g}$ are normal, then
\[ p(e^{rX/2})\, p(e^{rY})\, p(e^{rX/2}) \prec_G \big(p(e^{X/2})\, p(e^Y)\, p(e^{X/2})\big)^r, \quad \forall\, 0 \le r \le 1, \tag{4.19} \]
\[ \big(p(e^{X/2})\, p(e^Y)\, p(e^{X/2})\big)^r \prec_G p(e^{rX/2})\, p(e^{rY})\, p(e^{rX/2}), \quad \forall\, r \geqslant 1. \tag{4.20} \]
In particular, if $X, Y \in \mathfrak{p}$, then (4.19) and (4.20) reduce to Theorem 4.4 (1) and (2), respectively.
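Proof Since $X \in \mathfrak{g}$ is normal, by (4.18), we have
\[ p(e^X) = e^{X_{\mathfrak{p}}}, \quad p(e^{X/2}) = e^{X_{\mathfrak{p}}/2}, \quad p(e^{rX}) = e^{rX_{\mathfrak{p}}}, \quad p(e^{rX/2}) = e^{rX_{\mathfrak{p}}/2}, \]
where $X_{\mathfrak{p}} \in \mathfrak{p}$, and likewise for $Y$. Thus (4.19) and (4.20) follow from Theorem 4.4 (1) and (2), respectively. □

The mapping $p \mapsto p^{1/2} K$ identifies $P$ with $G/K$ as a symmetric space of noncompact type. The t-geometric mean of $p, q \in P$ was defined in [35] as
\[ p \sharp_t q = p^{1/2} \big(p^{-1/2} q p^{-1/2}\big)^t p^{1/2}, \quad \forall\, 0 \le t \le 1. \]
As $t$ varies, it traces the unique geodesic in $P$ from $p$ (at $t = 0$) to $q$ (at $t = 1$). It is known that $p \sharp_t q = q \sharp_{1-t} p$ and $(p \sharp_t q)^{-1} = p^{-1} \sharp_t q^{-1}$. When $t = 1/2$, we abbreviate $p \sharp_{1/2} q$ as $p \sharp q$. The following result is an extension of Theorem 3.14.

Theorem 4.11 If $X, Y \in \mathfrak{p}$ and $t \in [0, 1]$, then
\[ \lim_{r \to 0} \big(e^{rX} \sharp_t e^{rY}\big)^{1/r} = e^{(1-t)X + tY}. \tag{4.21} \]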
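The two symmetry identities of the t-geometric mean and the limit (4.21) can be illustrated for Hermitian matrices. This is a minimal sketch assuming NumPy and the matrix case $P = \mathbb{P}_n$; the names `geo_mean` and `lie_trotter_gap` are hypothetical, and the small value of `r` used to approximate the limit is an arbitrary choice.

```python
import numpy as np

def herm_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    return (V * f(w)) @ V.conj().T

def powm(A, t):
    """Real power of a positive definite matrix."""
    return herm_fun(A, lambda w: w ** t)

def geo_mean(p, q, t=0.5):
    """t-geometric mean p^{1/2} (p^{-1/2} q p^{-1/2})^t p^{1/2}."""
    ph, pih = powm(p, 0.5), powm(p, -0.5)
    return ph @ powm(pih @ q @ pih, t) @ ph

def lie_trotter_gap(X, Y, t, r):
    """Relative distance between (e^{rX} #_t e^{rY})^{1/r} and e^{(1-t)X + tY}."""
    lhs = powm(geo_mean(herm_fun(r * X, np.exp), herm_fun(r * Y, np.exp), t), 1.0 / r)
    rhs = herm_fun((1 - t) * X + t * Y, np.exp)
    return np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs)
```

Calling `lie_trotter_gap` with decreasing `r` should show the gap shrinking, in line with (4.21).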
The following result is an extension of Theorem 3.15 to Lie groups.

Theorem 4.12 Let $X, Y \in \mathfrak{p}$ and $t \in [0, 1]$. Then the following relations are equivalent and valid:
(1) $(e^X \sharp_t e^Y)^r \prec_G e^{rX} \sharp_t e^{rY}$ for all $0 \le r \le 1$.
(2) $e^{rX} \sharp_t e^{rY} \prec_G (e^X \sharp_t e^Y)^r$ for all $r \geqslant 1$.
(3) The mapping $r \mapsto (e^{rX} \sharp_t e^{rY})^{1/r}$ is monotonically decreasing on $(0, \infty)$ with respect to $\prec_G$.
(4) The mapping $r \mapsto (e^{X/r} \sharp_t e^{Y/r})^r$ is monotonically increasing on $(0, \infty)$ with respect to $\prec_G$.
Moreover,
\[ \lim_{r \to \infty} \big(e^{X/r} \sharp_t e^{Y/r}\big)^r = e^{(1-t)X + tY}. \]

Proof The proof of the equivalence of (1)–(4) is similar to that of Theorem 3.1. The proof of (1) is similar to the proof of Theorem 4.3 by (4.5). □

The t-spectral mean of $p, q \in P$ is defined in [18] as
\[ p \natural_t q = (p^{-1} \sharp q)^t\, p\, (p^{-1} \sharp q)^t, \quad \forall\, 0 \le t \le 1. \]
When $t = 1/2$, we abbreviate $p \natural_{1/2} q$ as $p \natural q$. The following result, which follows from Theorem 4.11 and Theorem 4.12, is an extension of Theorem 3.18 to $G$.

Theorem 4.13 (Gan and Tam [17]) If $X, Y \in \mathfrak{p}$ and $t \in [0, 1]$, then
\[ \big(e^{rX} \sharp_t e^{rY}\big)^{1/r} \prec_G e^{(1-t)X + tY} \prec_G \big(e^{(1-t)sX} e^{tsY}\big)^{1/s} \prec_G \big(e^{sX} \natural_t e^{sY}\big)^{1/s} \]
for all $r > 0$ and $s > 0$.

The following result is an extension of (3.11) to semisimple Lie groups.

Lemma 4.14 For $p, q \in P$, there exists a unique $k \in K$ such that
\[ p \natural q = k\, (p^{1/2} q p^{1/2})^{1/2}\, k^{-1}. \]
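Consequently, $(p \natural q)^2$ is $K$-conjugate to $p^{1/2} q p^{1/2} \in P$ and $G$-conjugate to $pq \in G$.

Proof Let $g = (p^{-1} \sharp q)^{1/2} p^{1/2}$ and let $g = kr$ be the right Cartan decomposition of $g$ with $k \in K$ and $r \in P$. Then $g^* g = r^2$ and $g g^* = k r^2 k^{-1} = k (g^* g) k^{-1}$. Therefore,
\begin{align*}
p \natural q &= \big[(p^{-1} \sharp q)^{1/2} p^{1/2}\big] \big[(p^{-1} \sharp q)^{1/2} p^{1/2}\big]^* \\
&= k \Big( \big[(p^{-1} \sharp q)^{1/2} p^{1/2}\big]^* \big[(p^{-1} \sharp q)^{1/2} p^{1/2}\big] \Big) k^{-1} \\
&= k \big( p^{1/2} (p^{-1} \sharp q) p^{1/2} \big) k^{-1} = k\, (p^{1/2} q p^{1/2})^{1/2}\, k^{-1}.
\end{align*}
□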
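The construction in this proof can be replayed for positive definite matrices. The sketch below assumes NumPy and the matrix case, where the right Cartan decomposition is the polar decomposition; `lemma_4_14_k` is a hypothetical name for the unitary factor built exactly as in the proof.

```python
import numpy as np

def powm(A, t):
    """Real power of a positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh((A + A.conj().T) / 2)
    return (V * w ** t) @ V.conj().T

def geo_mean(A, B, t=0.5):
    """t-geometric mean A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    Ah, Aih = powm(A, 0.5), powm(A, -0.5)
    return Ah @ powm(Aih @ B @ Aih, t) @ Ah

def spec_mean(p, q):
    """Spectral mean (p^{-1} # q)^{1/2} p (p^{-1} # q)^{1/2}."""
    G = powm(geo_mean(np.linalg.inv(p), q), 0.5)
    return G @ p @ G

def lemma_4_14_k(p, q):
    """Unitary factor k of g = (p^{-1} # q)^{1/2} p^{1/2} = k r, as in the proof."""
    g = powm(geo_mean(np.linalg.inv(p), q), 0.5) @ powm(p, 0.5)
    r = powm(g.conj().T @ g, 0.5)
    return g @ np.linalg.inv(r)

def random_pd(n, rng):
    """A well-conditioned random positive definite matrix (test data only)."""
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return M @ M.conj().T / n + np.eye(n)
```

With `k = lemma_4_14_k(p, q)`, the product `k @ powm(p_half @ q @ p_half, 0.5) @ k.conj().T`, where `p_half = powm(p, 0.5)`, should agree with `spec_mean(p, q)` up to rounding.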
The following result is an extension of Theorem 3.21.

Theorem 4.15 (Gan et al. [18]) Let $X, Y \in \mathfrak{p}$. For all $r > 0$, let
\[ \phi(r) = \big(e^{rX} \sharp e^{rY}\big)^{2/r} \quad \text{and} \quad \psi(r) = \big(e^{rX} \natural e^{rY}\big)^{2/r}. \]
Then the following statements are valid:
1. $\phi(r)$ is monotonically decreasing on $(0, \infty)$ with respect to $\prec_G$.
2. $\psi(r)$ is monotonically increasing on $(0, \infty)$ with respect to $\prec_G$.
3. $\lim_{r \to 0} \phi(r) = e^{X+Y} = \lim_{r \to 0} \psi(r)$.
4. $(e^{rX} \sharp e^{rY})^{2/r} \prec_G e^{X+Y} \prec_G (e^{sX} \natural e^{sY})^{2/s}$ for all $r > 0$ and $s > 0$.

Proof The statements about the geometric mean are special cases of Theorem 4.12. Lemma 4.14 and Theorem 4.4 together yield the statements about the spectral mean. □

The following result is an extension of Theorem 3.22.

Theorem 4.16 (Gan et al. [18]) For all $p, q \in P$ and $t \in [0, 1]$, we have
\[ p \sharp_t q \prec_G p \natural_t q. \]

We can extend Theorems 3.23, 3.24, and 3.25.

Theorem 4.17 (Gan and Tam [17]) Let $p, q \in P$. For each chosen $t \in [0, 1]$, let $0 < s \le \min\{1/t, 1/(1-t)\}$. We have
\[ \big(q^{ts/2} p^{(1-t)s} q^{ts/2}\big)^{1/s} \prec_G p \natural_t q. \]

Theorem 4.18 (Gan and Tam [17]) Let $p, q \in P$. For $t \in [0, 1]$ and $s > 0$,
\[ \big(q^{ts/2} p^{(1-t)s} q^{ts/2}\big)^{1/s} \prec_G (p^s \natural_t q^s)^{1/s}. \]
Moreover,
\[ p \sharp_t q \prec_G e^{(1-t)\log p + t \log q} \prec_G q^{t/2} p^{1-t} q^{t/2} \prec_G p \natural_t q, \quad 0 \le t \le 1. \]

Theorem 4.19 (Gan and Tam [17]) For every $p, q \in P$ and $t \in [0, 1]$,
(1) $p^r \natural_t q^r \prec_G (p \natural_t q)^r$, $0 < r \le 1$.
(2) $(p \natural_t q)^r \prec_G p^r \natural_t q^r$, $r \geqslant 1$.
(3) The function $r \mapsto (p^r \natural_t q^r)^{1/r}$ is monotonically increasing on $(0, \infty)$ with respect to $\prec_G$. In other words,
\[ (p^v \natural_t q^v)^{1/v} \prec_G (p^u \natural_t q^u)^{1/u}, \quad \forall\, 0 < v < u. \]
(4) The function $r \mapsto (p^{1/r} \natural_t q^{1/r})^r$ is monotonically decreasing on $(0, \infty)$ with respect to $\prec_G$.
Moreover,
\[ \lim_{r \to 0} (p^r \natural_t q^r)^{1/r} = e^{(1-t)\log p + t \log q}, \quad \forall t \in [0, 1]. \]

See [17] for more results on the weighted geometric mean and the weighted spectral mean.
Acknowledgements We are thankful to the anonymous referees for their careful reading of the original manuscript and constructive suggestions which are helpful to us for improving the paper. The work of L. Gan was supported by the AMS-Simons Travel Grant 2022–2024.
References
1. Ahn, E., Kim, S., & Lim, Y. (2007). An extended Lie-Trotter formula and its applications. Linear Algebra and its Applications, 427, 190–196
2. Ando, T., & Hiai, F. (1994). Log majorization and complementary Golden-Thompson type inequalities. Linear Algebra and its Applications, 197/198, 113–131
3. Araki, H. (1990). On an inequality of Lieb and Thirring. Letters in Mathematical Physics, 19, 167–170
4. Audenaert, K. M. R., & Hiai, F. (2016). Reciprocal Lie-Trotter formula. Linear and Multilinear Algebra, 64, 1220–1235
5. Bebiano, N., da Providência, J., & Lemos, R. (2004). Matrix inequalities in statistical mechanics. Linear Algebra and its Applications, 376, 265–273
6. Bernstein, D. S. (1988). Inequalities for the trace of matrix exponentials. SIAM Journal on Matrix Analysis and Applications, 9, 156–158
7. Bhatia, R. (1997). Matrix analysis. Springer
8. Bhatia, R. (2007). Positive definite matrices. Princeton University Press
9. Cohen, J. E. (1988). Spectral inequalities for matrix exponentials. Linear Algebra and its Applications, 111, 25–28
10. Cohen, J. E., Friedland, S., Kato, T., & Kelly, F. (1982). Eigenvalue inequalities for products of matrix exponentials. Linear Algebra and its Applications, 45, 55–95
11. Cordes, H. O. (1987). Spectral theory of linear differential operators and comparison algebras. Cambridge University Press
12. Fan, K. (1949). On a theorem of Weyl concerning eigenvalues of linear transformations. I. Proceedings of the National Academy of Sciences of the United States of America, 35, 652–655
13. Fan, K. (1951). Maximum properties and inequalities for the eigenvalues of completely continuous operators. Proceedings of the National Academy of Sciences of the United States of America, 37, 760–766
14. Fan, K., & Hoffman, A. J. (1955). Some metric inequalities in the space of matrices. Proceedings of the American Mathematical Society, 6, 111–116
15. Fiedler, M., & Pták, V. (1997). A new positive definite geometric mean of two positive definite matrices. Linear Algebra and its Applications, 251, 1–20
16. Forrester, P. J., & Thompson, C. J. (2014). The Golden-Thompson inequality: Historical aspects and random matrix applications. Journal of Mathematical Physics, 55, 023503
17. Gan, L., & Tam, T.-Y. (2021). Inequalities and limits of weighted spectral geometric mean. arXiv:2109.00351
18. Gan, L., Liu, X., & Tam, T.-Y. (2021). On two geometric means and sum of adjoint orbits. Linear Algebra and its Applications, 631, 156–173
19. Golden, S. (1965). Lower bounds for Helmholtz function. Physical Review, 137, B1127–B1128
20. Helgason, S. (1978). Differential geometry, Lie groups, and symmetric spaces. Academic Press
21. Heinz, E. (1951). Beiträge zur Störungstheorie der Spektralzerlegung. Mathematische Annalen, 123, 415–438
22. Hiai, F. (1994). Log-majorizations and norm inequalities for exponential operators, Linear Operators. (1997). Polish Academy of Sciences, 38, 119–181. Banach Center Publication
23. Hiai, F. (1995). Trace norm convergence of exponential product formula. Letters in Mathematical Physics, 33, 147–158
24. Hiai, F., & Petz, D. (1993). The Golden-Thompson trace inequality is complemented. Linear Algebra and its Applications, 181, 153–185
25. Horn, A. (1954). Doubly stochastic matrices and the diagonal of a rotation matrix. American Journal of Mathematics, 76, 620–630
26. Horn, A. (1954). On the eigenvalues of a matrix with prescribed singular values. Proceedings of the American Mathematical Society, 5, 4–7
27. Horn, R. A., & Johnson, C. R. (2013). Matrix analysis (2nd ed.). Cambridge University Press
28. Kim, S. (2021). Operator inequalities and gyrolines of the weighted geometric means. Mathematical Inequalities and Applications, 24, 491–514
29. Kim, H., & Lim, Y. (2007). An extended matrix exponential formula. Journal of Mathematical Inequalities, 1, 443–447
30. Knapp, A. W. (2002). Lie groups beyond an introduction (2nd ed.). Birkhäuser
31. Kostant, B. (1973). On convexity, the Weyl group and the Iwasawa decomposition. Annales scientifiques de l’École Normale Supérieure, 6(4), 413–455
32. Kubo, F., & Ando, T. (1980). Means of positive linear operators. Mathematische Annalen, 246, 205–224
33. Lee, H., & Lim, Y. (2007). Metric and spectral geometric means on symmetric cones. Kyungpook Mathematical Journal, 47, 133–150
34. Lenard, L. (1971). Generalization of the Golden-Thompson inequality $\operatorname{tr}(e^A e^B) \geqslant \operatorname{tr} e^{A+B}$. Indiana University Mathematics Journal, 21, 457–467
35. Liao, M., Liu, X., & Tam, T.-Y. (2014). A geometric mean for symmetric spaces of noncompact type. Journal of Lie Theory, 24, 725–736
36. Lieb, E., & Thirring, W. (1976). Inequalities for the moments of the eigenvalues of the Schrödinger Hamiltonian and their relation to Sobolev inequalities. Studies in Mathematical Physics (pp. 269–303). Princeton University Press
37. Liu, X. (2017). Generalization of some inequalities for matrix exponentials to Lie groups. Journal of Lie Theory, 27, 185–192
38. Marcus, M. (1973). Finite dimensional multilinear algebra. I. Marcel Dekker
39. Marcus, M. (1975). Finite dimensional multilinear algebra. II. Marcel Dekker
40. Marshall, A. W., Olkin, I., & Arnold, B. C. (2011). Inequalities: Theory of majorization and its applications (2nd ed.). Springer
41. Merris, R. (1997). Multilinear algebra. Gordon and Breach Science Publishers
42. Pedersen, G. K. (1972). Some operator monotone functions. Proceedings of the American Mathematical Society, 36, 309–310
43. Petz, D. (1994). A survey of certain trace inequalities. Functional Analysis and Operator Theory, 30, 287–298. Banach Center Publication, Polish Academy of Sciences
44. Pusz, W., & Woronowicz, S. L. (1975). Functional calculus for sesquilinear forms and the purification map. Reports on Mathematical Physics, 8, 159–170
45. Serre, D. (2010). Matrices: Theory and applications (2nd ed.). Springer
46. So, W. (1992). Equality cases in matrix exponential inequalities. SIAM Journal on Matrix Analysis and Applications, 13, 1154–1158
47. Sarver, Z., & Tam, T.-Y. (2015). Extension of Wang-Gong monotonicity result in semisimple Lie groups. Special Matrices, 3, 244–249
48. Symanzik, K. (1965). Proof and refinements of an inequality of Feynman. Journal of Mathematical Physics, 6, 1155–1156
49. Tam, T.-Y. (2010). Some exponential inequalities for semisimple Lie groups. Topics in operator theory (Vol. 1: Operators, Matrices and Analytic Functions). Operator Theory: Advances and Applications, 202, 539–552. Birkhäuser
50. Tam, T.-Y., & Liu, X. (2018). Matrix inequalities and their extensions to Lie groups. CRC Press
51. Thompson, C. J. (1965). Inequality with applications in statistical mechanics. Journal of Mathematical Physics, 6, 1812–1813
52. Thompson, C. J. (1971). Inequalities and partial orders on matrix spaces. Indiana University Mathematics Journal, 21, 469–480
53. von Neumann, J. (1937). Some matrix-inequalities and metrization of matric-space. Tomsk University Review, 1, 286–300
54. Wang, B. Y., & Gong, M. P. (1993). Some eigenvalue inequalities for positive semidefinite matrix power products. Linear Algebra and its Applications, 184, 249–260
55. Zhan, X. (2002). Matrix inequalities. Springer
Numerical Ranges of Operators and Matrices Pei Yuan Wu and Hwa-Long Gau
Abstract In this chapter, we briefly survey some major developments over the past 100 years in the study of the numerical ranges of bounded linear operators on a complex Hilbert space. This is done in nine sections. We start with some basic properties of the numerical range, including its various parameters. For the numerical contractions, we discuss among others the power inequality and Ando’s theorem. The essential numerical range, together with its implications to the numerical ranges of compact operators, is studied next. We then relate the numerical range to dilations of an operator, focusing on the unitary dilation and Berger power dilation. In the succeeding two sections, we move to finite matrices. Section 7 discusses using the Kippenhahn polynomial to prove Anderson’s theorem. Another highlight is the numerical range analogue of the Perron–Frobenius theorem for nonnegative matrices. Then in Section 8, we consider matrices of class Sn, whose numerical ranges can be used to generalize Poncelet’s porism and other classical results from projective geometry. Finally, in the concluding Section 9, we discuss various generalized numerical ranges, including the more recent higher-rank numerical ranges. The main concern is their convexity and interrelationships. Keywords Kippenhahn polynomial • Numerical radius • Numerical range
P. Y. Wu (✉)
Department of Applied Mathematics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
H.-L. Gau
Department of Mathematics, National Central University, Chungli, Taiwan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_52
Mathematics Subject Classification (MSC2020) Primary 47A12 • Secondary 15A60
1 Introduction For a bounded linear operator A on a complex Hilbert space ðH, h , iÞ with inner product h, i and associated norm k k, its numerical range W(A) is the subset fhAx, xi : x 2 H, kxk = 1g of the complex plane . The theory of numerical range starts with the discovery of its convexity by Toeplitz and Hausdorff in 1918 and 1919. The purpose of this chapter is to give a survey of some of its major developments and achievements over the past 100 years. The first concern we have is, given an operator, what its numerical range is. Since an explicit description is in general nonattainable, we settle for something less by proving its geometric or topological properties and describing its shape and relative location in the plane. From such information, we can get more insight of the structure of the operator. Another line of investigation is along the opposite direction to determine, for a given subset of the plane, when it is the numerical range of some operator (in a certain special class). Since this latter problem has been covered in a brief survey [60] by the present authors, we concentrate here more on the discussions of the former. The main reference for our approach is the recently published monograph [59]. If A is an operator on H, then Re A and Im A denote its real part (A + A) ∕ 2 and imaginary part (A - A) ∕ (2i), and kerðAÞ and ran (A) its kernel and range, respectively. The spectrum, point spectrum, and approximate point spectrum of A are denoted by σ(A), σ p(A), and σ ap(A), respectively. We use ρðAÞ max fjzj : z 2 σðAÞg for the spectral radius of A. If A acts on a finite-dimensional H, then tr (A) and detðAÞ are its trace and determinant, respectively. The identity operator on H is denoted by I H (or I for brevity), and the n-by-n identity (resp., zero) matrix by In (resp., 0n). We use diag (a1, . . . , an) to denote the n-by-n diagonal matrix with diagonal entries a1, . . . , an. 
The algebra of all operators on $H$ (resp., all $n$-by-$n$ matrices) is $\mathcal{B}(H)$ (resp., $M_n(\mathbb{C})$). For a subset $\triangle$ of $\mathbb{C}$, $\mathrm{Int}\, \triangle$, $\mathrm{cl}\, \triangle$, and $\partial \triangle$ denote its interior, closure, and boundary, respectively. We use $\mathbb{D}$ to denote the open unit disc $\{z \in \mathbb{C} : |z| < 1\}$ in the plane. The field of real numbers is $\mathbb{R}$. For $t$ in $\mathbb{R}$, we use $\lfloor t \rfloor$ (resp., $\lceil t \rceil$) for the largest (resp., smallest) integer smaller (resp., larger) than or equal to $t$.
Numerical Ranges of Operators and Matrices
2 Basic Properties

We start with some elementary properties of the numerical range.

Proposition 2.1 Let $A$ be an operator on $H$ and $a$ and $b$ be scalars. Then
(a) $W(A)$ is nonempty and bounded and is compact if $H$ is finite dimensional,
(b) $W(aA + bI) = aW(A) + b$,
(c) $W(\mathrm{Re}\, A) = \mathrm{Re}\, W(A)$ and $W(\mathrm{Im}\, A) = \mathrm{Im}\, W(A)$,
(d) $W(A^*) = W(A)^*$ ($\equiv \{\bar{z} : z \in W(A)\}$),
(e) $W(U^*AU) = W(A)$ for any unitary operator $U$ on $H$,
(f) $W(A) = \{a\}$ if and only if $A = aI$, and
(g) $W(A) \subseteq \mathbb{R}$ (resp., $W(A) \subseteq [0, \infty)$) if and only if $A$ is Hermitian (resp., $A \ge 0$).

If the $A_n$'s, $n \ge 1$, are operators on $H$ with $\|A_n - A\| \to 0$, then $\mathrm{cl}\, W(A_n) \to \mathrm{cl}\, W(A)$ in the Hausdorff metric.
That the numerical range $W(A)$ of $A$ is always convex can be proven via the case of a two-dimensional $H$: if $A = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix}$, then $W(A)$ is the (closed) elliptic disc with foci $a$ and $c$ and major and minor axes of lengths $(|a - c|^2 + |b|^2)^{1/2}$ and $|b|$, respectively.

Theorem 2.2 The numerical range of an operator is always convex.

It is easy to see that if $A_n$ is an operator on $H_n$ for $n \ge 1$, then $W(\sum_n \oplus A_n) = \mathrm{conv}(\bigcup_n W(A_n))$, the convex hull of $\bigcup_n W(A_n)$. The spectrum and numerical range of an operator are closely related.

Proposition 2.3 For any operator $A$, we have $\sigma_p(A) \subseteq W(A)$ and $\sigma(A) \subseteq \mathrm{cl}\, W(A)$. If $A$ is normal, then $\mathrm{conv}(\sigma(A)) = \mathrm{cl}\, W(A)$.

A point $a$ on the boundary of $W(A)$ is in general not in $\sigma(A)$, let alone in $\sigma_p(A)$. However, with some additional condition, it can be shown to be a reducing eigenvalue of $A$, meaning that $Ax = ax$ and $A^*x = \bar{a}x$ for some nonzero vector $x$ or, equivalently, $A$ is unitarily similar to $aI \oplus B$ for some operator $B$.

Proposition 2.4 If $a$ is in $\partial W(A)$, then $\ker(A - aI) = \ker(A - aI)^*$. If, in addition, $a$ is in $\sigma_p(A)$, then it is a reducing eigenvalue of $A$.

The next proposition gives two different conditions yielding a reducing eigenvalue. For a convex subset $\triangle$ of $\mathbb{C}$, the point $a$ in $\triangle$ is a corner of $\triangle$ if there are at least two supporting lines of $\triangle$ which pass through $a$.
P. Y. Wu and H.-L. Gau
Proposition 2.5 (a) If $a$ is a corner of $W(A)$ (resp., of $\mathrm{cl}\, W(A)$), then $a$ is a reducing eigenvalue of $A$ (resp., $a$ is in $\sigma_{ap}(A)$). (b) If $a$ in $W(A)$ (resp., $\mathrm{cl}\, W(A)$) is such that $|a| = \|A\|$, then $a$ is a reducing eigenvalue of $A$ (resp., $a$ is in $\sigma_{ap}(A)$).

Part (b) of the preceding proposition can be used to obtain the numerical range of the (simple) unilateral shift $S$, defined by $S(x_0, x_1, \ldots) = (0, x_0, x_1, \ldots)$ for $(x_0, x_1, \ldots)$ in $\ell^2$. Its numerical range equals $\mathbb{D}$ (cf. Wu and Gau [59, Lemma 1.4.2]). Examples of numerical ranges of other special-class operators can be found in Wu and Gau [59, Chapter 2].
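For a finite matrix, boundary points of $W(A)$ can be sampled numerically. The following sketch is not taken from the text: it assumes NumPy and relies on the standard fact that, for each angle $t$, a unit eigenvector for the largest eigenvalue of $\mathrm{Re}(e^{-it}A)$ gives a point of $W(A)$ on the supporting line with outward normal $e^{it}$. The function name `boundary_points` is an illustrative choice. The test matrix is the 2-by-2 case with $a = c = 0$ and $b = 2$, for which the elliptic disc described in this section is the closed unit disc.

```python
import numpy as np

def boundary_points(A, num_angles=360):
    """Sample points on the boundary of W(A) for a square matrix A.

    For each angle t, a unit eigenvector x for the largest eigenvalue of
    Re(e^{-it} A) gives the point <Ax, x> of W(A) lying on the supporting
    line whose outward normal direction is e^{it}.
    """
    A = np.asarray(A, dtype=complex)
    points = []
    for t in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        rotated = np.exp(-1j * t) * A
        hermitian_part = (rotated + rotated.conj().T) / 2
        _, vecs = np.linalg.eigh(hermitian_part)  # eigenvalues in ascending order
        x = vecs[:, -1]                           # unit eigenvector, largest eigenvalue
        points.append(np.vdot(x, A @ x))          # x^* A x = <Ax, x>
    return np.array(points)

# 2-by-2 example [[a, b], [0, c]] with a = c = 0 and b = 2.
A = np.array([[0.0, 2.0], [0.0, 0.0]])
pts = boundary_points(A)
```

Plotting `pts.real` against `pts.imag` traces the boundary; for a general matrix, flat portions of the boundary are only hit at their endpoints, so the convex hull of the sampled points should be taken.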
3 Parameters of Numerical Range

To measure the shape, size, and relative location in the plane of the numerical range, we have various parameters. Foremost among them is the numerical radius $w(A)$ of $A$ defined as $\sup\{|z| : z \in W(A)\}$. Geometrically, it is the radius of the smallest closed circular disc centered at the origin and containing $W(A)$. Here are some of its basic properties.

Proposition 3.1 (a) $w(\cdot)$ is a norm on $\mathcal{B}(H)$, that is, $w(\cdot)$ satisfies (i) $w(A) \ge 0$, (ii) $w(A) = 0$ if and only if $A = 0$, (iii) $w(aA) = |a| w(A)$ for $a$ in $\mathbb{C}$, and (iv) $w(A + B) \le w(A) + w(B)$ for $A$ and $B$ in $\mathcal{B}(H)$.
For any operator $A$, we have
(b) $\rho(A) \le w(A) \le \|A\|$,
(c) $w(A) = \|A\|$ if and only if $\rho(A) = \|A\|$,
(d) $w(A) = \sup\{\|\mathrm{Re}(e^{i\theta}A)\| : \theta \in \mathbb{R}\}$,
(e) $\|A\|/2 \le w(A) \le \|A\|$, and
(f) $\lim_n w(A^n)^{1/n} = \rho(A)$.

The structure condition on $A$ for the equality $w(A) = \|A\|/2$ was first obtained by Williams and Crimmins (cf. [58] or [59, Theorem 3.1.6]); for other equivalent conditions, consult [59, Theorem 3.1.7].

Theorem 3.2 Let $A$ be an operator on $H$ satisfying $\|A\| = \|Ax\|$ for some unit vector $x$ in $H$. Then $w(A) = \|A\|/2$ if and only if $A$ is unitarily similar to $\begin{bmatrix} 0 & a \\ 0 & 0 \end{bmatrix} \oplus B$ for some operator $B$ with $w(B) \le |a|/2$. In this case, we have $W(A) = \{z \in \mathbb{C} : |z| \le |a|/2\}$.
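Proposition 3.1 (d) suggests a simple way to approximate the numerical radius of a matrix. The sketch below assumes NumPy, replaces the supremum over $\theta$ by a maximum over a uniform grid (so the result never exceeds the true $w(A)$), and uses the illustrative name `numerical_radius`; none of this is from the text. The example matrix $\begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix}$ has $W(A)$ equal to an elliptic disc centered at the origin with semi-major axis $\sqrt{2}$.

```python
import numpy as np

def numerical_radius(A, num_angles=720):
    """Approximate w(A) = sup_t ||Re(e^{it} A)|| on a uniform grid of angles."""
    A = np.asarray(A, dtype=complex)
    best = 0.0
    for t in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        rotated = np.exp(1j * t) * A
        hermitian_part = (rotated + rotated.conj().T) / 2
        # The operator norm of a Hermitian matrix is its largest |eigenvalue|.
        best = max(best, np.max(np.abs(np.linalg.eigvalsh(hermitian_part))))
    return float(best)

A = np.array([[1.0, 2.0], [0.0, -1.0]])
w = numerical_radius(A)
rho = np.max(np.abs(np.linalg.eigvals(A)))   # spectral radius
norm = np.linalg.norm(A, 2)                  # operator norm

# Nilpotent 2-by-2 block: here w equals half the norm.
w_nilpotent = numerical_radius(np.array([[0.0, 1.0], [0.0, 0.0]]))
```

The values satisfy $\rho(A) \le w(A) \le \|A\|$ and $\|A\|/2 \le w(A)$, in line with Proposition 3.1 (b) and (e), and the nilpotent block attains $w(A) = \|A\|/2$.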
The inequalities in Proposition 3.1 (e) can be further strengthened as in the next proposition. This is due to Kittaneh [36].

Proposition 3.3 For any operator $A$, the inequalities
\[ \frac{1}{2} \|A^*A + AA^*\|^{1/2} \le w(A) \le \frac{1}{\sqrt{2}} \|A^*A + AA^*\|^{1/2} \]
hold.

The norm of the self-commutator $A^*A - AA^*$ of an operator $A$ can be estimated by the planar Lebesgue measure of $W(A)$: $\|A^*A - AA^*\| \le 2\, \mathrm{mes}_2(W(A))$. Gevorgyan conjectured that this can be sharpened to $\|A^*A - AA^*\| \le (4/\pi)\, \mathrm{mes}_2(W(A))$ and showed that this is indeed the case if $\mathrm{cl}\, W(A)$ is an elliptic disc (cf. also [59, Problem 1.52]).

Other parameters of $W(A)$ are the Crawford number $c(A)$ defined as $\inf\{|z| : z \in W(A)\}$ and the generalized Crawford number $C(A)$ defined as $\inf\{|z| : z \in \partial W(A)\}$. Geometrically, if 0 is not in $W(A)$ (resp., 0 is in $W(A)$), then $c(A)$ (resp., $C(A)$) is the radius of the largest open circular disc centered at the origin which is disjoint from $W(A)$ (resp., contained in $W(A)$). The next proposition gives some of their basic properties.

Proposition 3.4 For any operator $A$, the following hold:
(a) $c(A^*) = c(A)$ and $C(A^*) = C(A)$.
(b) $0 \le c(A) \le C(A) \le w(A)$.
(c) If $A$ is invertible, then $c(A) > 0$ (resp., $C(A) > 0$) if and only if $c(A^{-1}) > 0$ (resp., $C(A^{-1}) > 0$).
(d) $\limsup_n c(A^n)^{1/n} \le \mathrm{dist}(0, \sigma(A))$ and $\limsup_n C(A^n)^{1/n} \le \rho(A)$.

As the example $A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ with
\[ c(A^n)^{1/n} = C(A^n)^{1/n} = \begin{cases} 1 & \text{if } n \text{ is even}, \\ 0 & \text{if } n \text{ is odd} \end{cases} \]
shows, the sequence $c(A^n)^{1/n}$ (resp., $C(A^n)^{1/n}$), $n \ge 1$, does not always converge. More precise information on their limit suprema can be found in [56].

The minimal and maximal widths of the numerical range are used to measure its thickness via its supporting lines. More precisely, for any operator $A$ and any real number $\theta$, let $B(\theta)$ be the distance between the two parallel supporting lines $y = x \tan\theta + d_1$ and $y = x \tan\theta + d_2$ of $W(A)$. The minimal width $m(W(A))$ (resp., maximal width $M(W(A))$) of $W(A)$ is $\inf_{\theta \in \mathbb{R}} B(\theta)$ (resp., $\sup_{\theta \in \mathbb{R}} B(\theta)$). It follows from the general result for convex sets in the plane that $M(W(A))$ equals the diameter $\mathrm{diam}\, W(A)$ ($\equiv \sup\{|z_1 - z_2| : z_1, z_2 \in W(A)\}$) of $W(A)$ (cf. [59, Proposition A.2.11]). The next theorem, due to Tsing [54], says that the consideration of $\mathrm{diam}\, W(A)$ may be reduced to the case of 2-by-2 matrices (cf. also [59, Theorem 1.5.10]). Recall that an operator $B$ is said to dilate to $A$ if $A$ is unitarily similar to $\begin{bmatrix} B & * \\ * & * \end{bmatrix}$.

Theorem 3.5 For any operator $A$, $\mathrm{diam}\, W(A)$ equals each of the following quantities:
(a) $\sup\{|a + b| : \begin{bmatrix} * & a \\ b & * \end{bmatrix} \text{ dilates to } A\}$,
(b) $\sup\{|a| + |b| : \begin{bmatrix} * & a \\ b & * \end{bmatrix} \text{ dilates to } A\}$,
(c) $2 \sup\{|a| : \begin{bmatrix} * & a \\ a & * \end{bmatrix} \text{ dilates to } A\}$, and
(d) $\sup\{\mathrm{diam}\, W(B) : B \text{ a 2-by-2 matrix that dilates to } A\}$.

Another parameter of the numerical range, its inradius, will be discussed in association with $S_n$-matrices in Section 8.
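The Crawford number and the widths of $W(A)$ can be estimated for matrices from the extreme eigenvalues of $\mathrm{Re}(e^{-it}A)$. The sketch below is an illustration, not the authors' method. It assumes NumPy and uses two standard facts about compact convex sets: the distance from the origin to $W(A)$ is $\max\{0, \max_t \lambda_{\min}(\mathrm{Re}(e^{-it}A))\}$, and the width of $W(A)$ in the direction $e^{it}$ is $\lambda_{\max} - \lambda_{\min}$ of the same Hermitian matrix. Directions are parametrized here by the normal angle $t$ rather than by the slope $\tan\theta$, which yields the same infimum and supremum. All function names are hypothetical.

```python
import numpy as np

def support_extremes(A, num_angles=720):
    """Smallest and largest eigenvalues of Re(e^{-it} A) on a grid of angles t.

    These are the minimum and maximum of Re(e^{-it} z) over z in W(A).
    """
    A = np.asarray(A, dtype=complex)
    lam_min, lam_max = [], []
    for t in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        rotated = np.exp(-1j * t) * A
        eigs = np.linalg.eigvalsh((rotated + rotated.conj().T) / 2)
        lam_min.append(eigs[0])
        lam_max.append(eigs[-1])
    return np.array(lam_min), np.array(lam_max)

def crawford_number(A, num_angles=720):
    """Grid estimate (a lower bound) of c(A) = dist(0, W(A))."""
    lam_min, _ = support_extremes(A, num_angles)
    return max(0.0, float(np.max(lam_min)))

def widths(A, num_angles=720):
    """Grid estimates of the minimal and maximal widths of W(A)."""
    lam_min, lam_max = support_extremes(A, num_angles)
    gaps = lam_max - lam_min
    return float(np.min(gaps)), float(np.max(gaps))

# The example from the text: A = diag(1, -1), so A^2 = I.
A = np.diag([1.0, -1.0])
c_odd = crawford_number(A)        # W(A) = [-1, 1] contains 0
c_even = crawford_number(A @ A)   # W(I) = {1}
m_width, M_width = widths(A)      # a segment of length 2
```

For this example the odd powers give Crawford number 0 and the even powers give 1, matching the non-convergent sequence described above, and the maximal width 2 agrees with the diameter of the segment $[-1, 1]$.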
4 Numerical Contraction

A numerical contraction $A$ is one with $w(A) \le 1$. The power inequality is one of its most eminent properties.

Theorem 4.1 For any operator $A$, the inequality $w(A^n) \le w(A)^n$ holds for all $n \ge 1$.

Equivalently, the assertion can be rephrased as follows: powers of a numerical contraction are all numerical contractions. This was first proved by Berger [6] via his power dilation theorem (cf. Theorem 6.3). The usual proof is the elementary one as in [28, Problem 221]. An easy corollary of Theorem 4.1 follows: if $w(A) \le 1$, then $\|A^n\| \le 2$ for all $n \ge 1$. The characterization of numerical contractions by Ando [1] is a powerful tool in many applications (cf. also [59, Theorem 3.3.3]). An operator $A$ is a contraction if $\|A\| \le 1$.

Theorem 4.2 The operator $A$ is a numerical contraction if and only if there is a Hermitian contraction $Z$ such that $\begin{bmatrix} I + Z & A \\ A^* & I - Z \end{bmatrix} \ge 0$.
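The power inequality of Theorem 4.1 and its corollary $\|A^n\| \le 2$ are easy to check numerically on small matrices. The following sketch assumes NumPy and a grid approximation of the numerical radius; the names `numerical_radius` and `check_power_inequality` and the tolerance are illustrative assumptions. The first test matrix is scaled so that $w(A) = 1$; the second is the 3-by-3 Jordan block $J$, for which $w(J^2) = w(J)^2 = 1/2$.

```python
import numpy as np

def numerical_radius(A, num_angles=720):
    """Grid estimate of w(A) = max_t lambda_max(Re(e^{it} A))."""
    A = np.asarray(A, dtype=complex)
    best = 0.0
    for t in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        rotated = np.exp(1j * t) * A
        top = np.linalg.eigvalsh((rotated + rotated.conj().T) / 2)[-1]
        best = max(best, top)
    return float(best)

def check_power_inequality(A, max_power=6, tol=1e-9):
    """Return True if w(A^n) <= w(A)^n (up to tol) for n = 1, ..., max_power."""
    w = numerical_radius(A)
    return all(
        numerical_radius(np.linalg.matrix_power(A, n)) <= w ** n + tol
        for n in range(1, max_power + 1)
    )

A = np.array([[1.0, 2.0], [0.0, -1.0]]) / np.sqrt(2.0)   # w(A) = 1
J = np.diag(np.ones(2), k=1)                             # 3-by-3 Jordan block

# Norms of the first six powers of the numerical contraction A.
power_norms = [np.linalg.norm(np.linalg.matrix_power(A, n), 2) for n in range(1, 7)]
```

Here `power_norms` stays below 2 although $\|A\| > 1$, which illustrates that a numerical contraction need not be a contraction.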
The next corollary gives various factorizations of a numerical contraction.

Corollary 4.3 The following conditions are equivalent for an operator $A$:
(a) $w(A) \le 1$,
(b) $A = (I - Z)^{1/2} B (I + Z)^{1/2}$ for some contraction $B$ and some Hermitian contraction $Z$,
(c) $A = 2(I - C^*C)^{1/2} C$ for some contraction $C$, and
(d) $A = 2BC$ for some operators $B$ and $C$ satisfying $C^*C + BB^* \le I$.

The factorization in Corollary 4.3 (b) can be used to generalize the power inequality from powers to analytic functions. This bears a strong resemblance to its contraction analogue, the von Neumann inequality: if $\|A\| \le 1$ and $f$ is analytic on $\mathrm{cl}\, \mathbb{D}$, then $\|f(A)\| \le \|f\|_\infty$ ($\equiv \max\{|f(z)| : z \in \mathrm{cl}\, \mathbb{D}\}$) (cf. [28, Problem 229]).

Theorem 4.4 If $A$ is a numerical contraction and $f$ an analytic function on $\mathrm{cl}\, \mathbb{D}$, then $w(f(A)) \le \|f\|_\infty + 2|f(0)|$.

The proof can be found in [59, Theorem 3.3.6]. Theorem 4.4 is even true for more general functions, namely, those in the disc algebra $A(\mathbb{D})$ ($\equiv \{f : \mathrm{cl}\, \mathbb{D} \to \mathbb{C} : f \text{ analytic on } \mathbb{D} \text{ and continuous on } \mathrm{cl}\, \mathbb{D}\}$) (cf. [7, Corollary 2]). A further refinement has been obtained by Drury [22, Theorem 2], a corollary of which is the following: if $A$ is a numerical contraction and $f$ is in $A(\mathbb{D})$, then $w(f(A)) \le (5/4)\|f\|_\infty$, where the constant "5/4" is the best possible. Corollary 4.3 (b) can also be used to deduce the asymptotic behavior of $\|A^n x\|$, $n \ge 1$, for a numerical contraction $A$ and a unit vector $x$. It is due to Crabb [17, Theorem 1] (cf. also [59, Theorem 3.3.13]).

Theorem 4.5 If $A$ is a numerical contraction on $H$ and $x$ a unit vector in $H$, then $\|A^n x\|$, $n \ge 1$, converges to a limit which is less than or equal to $\sqrt{2}$.

Necessary and sufficient conditions for the preceding limit to be equal to $\sqrt{2}$ have also been obtained in [25, Theorem 3.2] (cf. also [59, Theorem 3.3.14]). A simpler proof of a result due to Sz.-Nagy and Foiaş relating contractions and numerical contractions can be given by using Corollary 4.3 (c) (cf. [52, Corollary II.8.2] or [59, Theorem 3.3.11]).
Theorem 4.6 If $A$ is a numerical contraction, then there is an invertible normal operator $X$ with $\|X\| \|X^{-1}\| \le 2$ such that $X^{-1}AX$ is a contraction. In this case, the constant "2" is the best possible.
Another direction of generalization of the power inequality concerns the numerical radius of the product of two commuting operators. It was proposed in [29] in 1969 that $w(AB) \le \|A\| w(B)$ might be true for commuting operators $A$ and $B$. Over the years, the inequality has been verified for many special classes of $A$ and $B$. Then in [47], Müller gave a counterexample of 12-by-12 commuting matrices. Simpler examples in terms of zero–one matrices were found right after by Davidson and Holbrook [20]. It is now known that the smallest size of commuting matrices $A$ and $B$ with $w(AB) > \|A\| w(B)$ is 3 (cf. [30] and [31, p. 278]). On the other hand, Crabb also obtained a constant $c$ ($= \sqrt{2 + 2\sqrt{3}}/2$) for which $w(AB) \le c \|A\| w(B)$ for commuting operators $A$ and $B$ (cf. [59, Theorem 3.4.17]). The question of the smallest such $c$ remains open. So is the smallest $r$ for which $w(AB) \le r\, w(A) w(B)$ for all 3-by-3 commuting $A$ and $B$ (cf. [31, Corollary 3.6]), though it is known that $r = 1$ for commuting $A$ and $B$ of size 2 (cf. [30] or [31, Corollary 3.2]). More detailed information on this topic can be found in [59, Sections 3.4 and 7.4].
5 Essential Numerical Range

Over the years, there appeared many different kinds of generalized numerical ranges, one of which is the algebraic numerical range. For an element $x$ of a (complex) Banach algebra $\mathcal{A}$ with identity 1, its algebraic numerical range, denoted by $W_{\mathcal{A}}(x)$, is the set $\{f(x) : f : \mathcal{A} \to \mathbb{C} \text{ linear}, \|f\| = f(1) = 1\}$. Trivially, such sets are always convex. The development of its theory up to the early 1970s has been codified in the two monographs [9] and [10]. When $\mathcal{A} = \mathcal{B}(H)$ for some Hilbert space $H$, the algebraic numerical range $W_{\mathcal{A}}(A)$ coincides with $\mathrm{cl}\, W(A)$ for all operators $A$ on $H$. If $\mathcal{A}$ is the Calkin algebra $\mathcal{B}(H)/\mathcal{K}(H)$ for an infinite-dimensional separable space $H$, where $\mathcal{K}(H)$ denotes the $*$-ideal of compact operators on $H$, then $W_{\mathcal{A}}(\pi(A))$ (with $\pi$ the quotient map taking $A$ in $\mathcal{B}(H)$ to the coset $A + \mathcal{K}(H)$ in $\mathcal{A}$) is called the essential numerical range of $A$ and denoted by $W_e(A)$. In the remainder of this section, we only consider operators on an infinite-dimensional separable Hilbert space. The next proposition lists some basic properties of $W_e(A)$.

Proposition 5.1 Let $A$ be an operator on $H$. Then
(a) $W_e(A) \subseteq \mathrm{cl}\, W(A)$,
(b) $W_e(A) = W_e(A + K)$ for any compact operator $K$ on $H$,
(c) $W_e(A) = \{0\}$ if and only if $A$ is compact,
(d) $W_e(A)$ is a nonempty compact convex subset of $\mathbb{C}$, and
(e) $W_e(A) = \bigcap \{\mathrm{cl}\, W(A + K) : K \text{ compact on } H\}$.
The following theorem gives some spatial expressions for elements in $W_e(A)$.

Theorem 5.2 The conditions below are equivalent for an operator $A$ on $H$:
(a) $z$ is in $W_e(A)$,
(b) $A = \begin{bmatrix} D & * \\ * & * \end{bmatrix}$, where $D = \mathrm{diag}(z_1, z_2, \ldots)$ is a diagonal operator with $z_n \to z$,
(c) there is an orthonormal basis $\{e_n\}_{n=1}^{\infty}$ of $H$ such that $\langle Ae_n, e_n \rangle \to z$, and
(d) there are unit vectors $x_n$, $n \ge 1$, in $H$ for which $x_n \to 0$ weakly and $\langle Ax_n, x_n \rangle \to z$.
The proof of the preceding theorem can be found in [59, Theorem 4.2.3]. The next theorem, which relates the numerical range and the essential numerical range, is due to Lancaster [37] (cf. also [59, Theorem 4.5.1]).

Theorem 5.3 For any operator $A$, the equality $\mathrm{cl}\, W(A) = \mathrm{conv}(W(A) \cup W_e(A))$ holds. In particular, if $A$ is compact, then $\mathrm{cl}\, W(A) = \mathrm{conv}(W(A) \cup \{0\})$.

This shows that $W_e(A)$ plays a major role in determining the boundary of $W(A)$. An easy corollary is that $W(A)$ is closed if and only if $W_e(A)$ is contained in $W(A)$, and, for $A$ compact, $W(A)$ is closed if and only if 0 is in $W(A)$. A more detailed analysis yields the following theorem (cf. [37] or [59, Theorem 4.5.7]).

Theorem 5.4 If $A$ is a compact operator, then $\partial W(A) \setminus W(A)$ consists of at most two line segments of the form $[0, \lambda)$.

For a compact $A$, more is true for the boundary of $W(A)$ as the next theorem shows.

Theorem 5.5 Let $A$ be a compact operator. (a) If 0 is in the interior of $W(A)$, then $\partial W(A)$ is the union of finitely many analytic arcs. (b) If 0 is in the boundary of $W(A)$, then $\partial W(A)$ is the union of countably many analytic arcs with the only possible accumulation points of the endpoints of these arcs being the endpoints of the longest line segment (possibly degenerate) in $\partial W(A)$ which contains 0.

This is due to Narcowich [48, Corollaries 3.5 and 3.6] (cf. also [59, Corollary 4.5.16 and Theorem 4.5.17]). Other generalized numerical ranges will be discussed later in Section 9.
6 Numerical Range and Dilation

Recall that an operator $A$ on $H$ is said to dilate to an operator $B$ on $K$ (or $A$ is a compression of $B$) if $A = V^*BV$ for some isometry $V$ ($V^*V = I_H$) from $H$ to $K$ or, equivalently, $B$ is unitarily similar to an operator of the form $\begin{bmatrix} A & * \\ * & * \end{bmatrix}$. In this case, $W(A)$ is contained in $W(B)$ and thus $w(A) \le w(B)$. A classical result of Halmos says that every contraction $A$ can be dilated to the unitary operator
\[ \begin{bmatrix} A & -(I - AA^*)^{1/2} \\ (I - A^*A)^{1/2} & A^* \end{bmatrix}. \]
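For a matrix contraction, the Halmos dilation can be formed explicitly and its unitarity verified numerically. The sketch below assumes NumPy; the helper names `psd_sqrt` and `halmos_dilation` are illustrative, and the square roots are computed from an eigendecomposition rather than by any method prescribed in the text.

```python
import numpy as np

def psd_sqrt(M):
    """Square root of a Hermitian positive semidefinite matrix via eigh."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, 0.0, None)          # clear tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def halmos_dilation(A):
    """Return [[A, -(I - AA*)^(1/2)], [(I - A*A)^(1/2), A*]] for a contraction A."""
    A = np.asarray(A, dtype=complex)
    identity = np.eye(A.shape[0])
    A_star = A.conj().T
    return np.block([
        [A, -psd_sqrt(identity - A @ A_star)],
        [psd_sqrt(identity - A_star @ A), A_star],
    ])

# A 2-by-2 contraction (its Frobenius norm is below 1, hence so is its norm).
A = np.array([[0.5, 0.3], [0.0, -0.2j]])
U = halmos_dilation(A)
```

The off-diagonal blocks of $U^*U$ cancel because $A(I - A^*A)^{1/2} = (I - AA^*)^{1/2}A$, and $A$ is recovered as the upper-left block `U[:2, :2]`.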
In 1964, he posed in [27] the question whether the numerical range of $A$ can be completely determined by those of its unitary dilations. The problem was only settled affirmatively after 35-plus years by Choi and Li in [15].

Theorem 6.1 For any contraction $A$, $\mathrm{cl}\, W(A)$ is equal to $\bigcap \{\mathrm{cl}\, W(U) : U \text{ unitary dilation of } A\}$.
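An illuminating example is the case of $A = J_n$, the $n$-by-$n$ Jordan block
\[ \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix}. \]
Its numerical range is the circular disc $\{z \in \mathbb{C} : |z| \le \cos(\pi/(n+1))\}$ (cf. [59, Lemma 2.4.1 (a)]). On the other hand, its $(n+1)$-by-$(n+1)$ unitary dilation $U_\theta$, $\theta \in \mathbb{R}$, of the form
\[ \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ e^{i\theta} & & & 0 \end{bmatrix} \]
has numerical range the closed $(n+1)$-polygonal region $P_\theta$ with vertices $e^{(\theta + 2k\pi)i/(n+1)}$, $0 \le k \le n$. It can be easily seen that $W(J_n)$ equals the intersection $\bigcap \{P_\theta : \theta \in \mathbb{R}\}$.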
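The Jordan block example can be reproduced numerically. The sketch below assumes NumPy, with hypothetical helper names `jordan_block` and `unitary_dilation`. It checks that $w(J_n) = \cos(\pi/(n+1))$, that $U_\theta$ is unitary with $J_n$ as its upper-left block, and that the eigenvalues of $U_\theta$ are the stated polygon vertices. The values $n = 4$ and $\theta = 0.7$ are arbitrary.

```python
import numpy as np

def jordan_block(n):
    """n-by-n nilpotent Jordan block with ones on the superdiagonal."""
    return np.diag(np.ones(n - 1), k=1)

def unitary_dilation(n, theta):
    """(n+1)-by-(n+1) matrix U_theta: superdiagonal ones, e^{i theta} in the lower-left corner."""
    U = np.diag(np.ones(n), k=1).astype(complex)
    U[n, 0] = np.exp(1j * theta)
    return U

n, theta = 4, 0.7
J = jordan_block(n)
U = unitary_dilation(n, theta)

# W(J_n) is a disc centered at 0, so w(J_n) is the largest eigenvalue of Re J_n.
w_J = np.linalg.eigvalsh((J + J.T) / 2)[-1]

# Vertices of the polygon P_theta, to be compared with the eigenvalues of U_theta.
vertices = np.exp(1j * (theta + 2.0 * np.pi * np.arange(n + 1)) / (n + 1))
eigenvalues = np.linalg.eigvals(U)
```

Since $U_\theta$ is normal, $W(U_\theta)$ is the convex hull of its eigenvalues, which is the polygonal region $P_\theta$.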
One of the later developments yields that the unitary dilations in the intersection can be restricted to the "most economical" ones. This is due to Bercovici and Timotin [5, Theorem 2.4] as follows (cf. also [59, Theorem 5.1.2]).

Theorem 6.2 If $A$ is a contraction on $H$ with defect indices $d_A = d_{A^*} = n$, $1 \le n < \infty$, then $\mathrm{cl}\, W(A) = \bigcap \{\mathrm{cl}\, W(U) : U \text{ unitary dilation of } A \text{ on } H \oplus K \text{ with } \dim K = n\}$.

The defect index $d_A$ of a contraction $A$ is, by definition, $\dim(\mathrm{cl}\, \mathrm{ran}\, (I - A^*A)^{1/2})$. Note that the "most economical" claim is validated by the fact that, for a contraction $A$ on $H$, the minimum dimension of a space $K$ for which $A$ dilates to a unitary operator on $H \oplus K$ is $d_A$ if $d_A = d_{A^*}$ and $\infty$ otherwise (cf. [59, Lemma 5.1.4 (b)]).

Another prominent dilation result relating to numerical contractions is the Berger power dilation. An operator $A$ on $H$ is said to have power dilation $B$ on $K$ if there is an isometry $V$ from $H$ to $K$ such that $A^n = V^*B^nV$ for all $n \ge 1$. This is the same as saying that $B^n$ is unitarily similar, via a common unitary operator, to $\begin{bmatrix} A^n & * \\ * & * \end{bmatrix}$ for all $n \ge 1$. A classical result of Sz.-Nagy [51] says that any contraction can be power dilated to a unitary operator and the latter is unique (in a certain sense). The Berger power dilation theorem [6] is its numerical contraction counterpart.

Theorem 6.3 If $A$ is a numerical contraction on $H$, then there is a unitary operator $U$ on a space $K$ containing $H$ such that $A^n = 2P_H U^n|_H$ for all $n \ge 1$, where $P_H$ denotes the (orthogonal) projection from $K$ onto $H$. We may even require that $K = \mathrm{cl}\, \mathrm{span}(\bigcup_{n=-\infty}^{\infty} U^n H)$, in which case $U$ is unique up to isomorphism in the sense that if $U'$ is another unitary operator on $K'$ containing $H$ such that $A^n = 2P'_H U'^n|_H$ for all $n$, then there is a unitary operator $W$ from $K$ onto $K'$ such that $W|_H = I_H$ and $WU = U'W$.

It turns out that both the Sz.-Nagy and Berger power dilations are covered by the more general Naimark dilation for a positive definite sequence of operators $\{A_n\}_{n=-\infty}^{\infty}$.
The details can be found in [59, Section 5.2]. Note that the power inequality can be easily derived from the Berger power dilation theorem. In fact, this is how it was first proved back in 1963. In the same vein, inequalities involving numerical contractions and functions in $A(\mathbb{D})$ (or even in $H^\infty$) can also be proved (cf. [59, Theorem 5.2.4 and Problem 7.39]).

The notion of spectral sets originates from the von Neumann inequality: $A$ is a contraction if and only if $\mathrm{cl}\, \mathbb{D}$ is spectral for $A$, meaning that $\|f(A)\| \le \max\{|f(z)| : z \in \mathrm{cl}\, \mathbb{D}\}$ for any rational function $f$ with poles outside $\mathrm{cl}\, \mathbb{D}$. The condition for general spectral sets is equivalent to the normal power dilation via the Berger–Foiaş–Lebow dilation theorem (cf. [59, Theorem 5.4.3]). In recent years, the more general $k$-spectral sets have been studied intensively. To be more precise, a (nonempty proper) subset $\triangle$ of $\mathbb{C}$ is a $k$-spectral set ($k \ge 1$) for an operator $A$ if $\triangle$ contains $\sigma(A)$ and $\|f(A)\| \le k \sup\{|f(z)| : z \in \triangle\}$ for all rational functions $f$ with poles outside $\triangle$. A conjecture proposed by Crouzeix [18] in 2004 says that $k$ can be taken to be "2" for the case $\triangle = \mathrm{cl}\, W(A)$, as suggested by the known inequality $\|A\| \le 2w(A)$. The efforts of subsequent years to resolve this conjecture resulted in partial progress: either with "2" for some special classes of $A$ or with a universal constant slightly larger than 2 for general $A$. The next two theorems are examples of each.

Theorem 6.4 If $A$ is an operator on a two-dimensional space and $f$ a function analytic on $\mathrm{Int}\, W(A)$ and continuous on $W(A)$, then $\|f(A)\| \le 2 \max\{|f(z)| : z \in W(A)\}$. Moreover, if this becomes an equality for some nonzero $f$, then $W(A)$ is a circular disc.

Theorem 6.5 If $A$ is an operator and $f$ a function analytic on $\mathrm{Int}\, W(A)$ and continuous on $\mathrm{cl}\, W(A)$, then $\|f(A)\| \le (1 + \sqrt{2}) \max\{|f(z)| : z \in \mathrm{cl}\, W(A)\}$.

These were proved in [18, Theorem 4.1] and [19, Theorem 3.1], respectively (cf. also [59, Theorems 5.4.6 and 5.4.8]). The constant "$1 + \sqrt{2}$" is, at present, the closest to the conjectured "2".
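The ratio $\|p(A)\| / \max\{|p(z)| : z \in W(A)\}$ appearing in Theorems 6.4 and 6.5 can be estimated for a matrix $A$ and a polynomial $p$. The sketch below is only an illustration under stated assumptions: it uses NumPy, samples $\partial W(A)$ through eigenvectors of $\mathrm{Re}(e^{-it}A)$, and relies on the maximum modulus principle to restrict the maximum to the boundary. Because the boundary is only sampled, the denominator is a lower estimate and the computed ratio an upper estimate. The names `boundary_points`, `poly_of_matrix`, and `crouzeix_ratio` are hypothetical.

```python
import numpy as np

def boundary_points(A, num_angles=360):
    """Points <Ax, x> on the boundary of W(A), one per supporting direction."""
    A = np.asarray(A, dtype=complex)
    points = []
    for t in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        rotated = np.exp(-1j * t) * A
        _, vecs = np.linalg.eigh((rotated + rotated.conj().T) / 2)
        x = vecs[:, -1]
        points.append(np.vdot(x, A @ x))
    return np.array(points)

def poly_of_matrix(coeffs, A):
    """Evaluate p(A) by Horner's rule; coeffs are highest degree first, as in numpy.polyval."""
    A = np.asarray(A, dtype=complex)
    identity = np.eye(A.shape[0])
    result = np.zeros_like(A)
    for c in coeffs:
        result = result @ A + c * identity
    return result

def crouzeix_ratio(coeffs, A):
    """Estimate ||p(A)|| / max{|p(z)| : z in W(A)} from sampled boundary points."""
    numerator = np.linalg.norm(poly_of_matrix(coeffs, A), 2)
    denominator = np.max(np.abs(np.polyval(coeffs, boundary_points(A))))
    return float(numerator / denominator)

A = np.array([[0.0, 2.0], [0.0, 0.0]])       # W(A) is the closed unit disc
ratio_identity = crouzeix_ratio([1.0, 0.0], A)        # p(z) = z
ratio_quadratic = crouzeix_ratio([1.0, 0.0, 1.0], A)  # p(z) = z^2 + 1
```

For this $A$ and $p(z) = z$ the ratio is 2, so the constant in Theorem 6.4 is attained, consistent with $W(A)$ being a circular disc.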
7 Numerical Range of Finite Matrix

In case the operator $A$ acts on a finite-dimensional space, that is, $A$ is represented as a finite matrix, the study of $W(A)$ can be facilitated by the use of the Kippenhahn polynomial given by $p_A(x, y, z) = \det(x\, \mathrm{Re}\, A + y\, \mathrm{Im}\, A + zI_n)$. As the next theorem of Kippenhahn [35] shows, $W(A)$ can be obtained from the algebraic curve $C(A) \equiv \{a + bi \in \mathbb{C} : a, b \in \mathbb{R},\ ax + by + z = 0 \text{ tangent to } p_A(x, y, z) = 0\}$.

Theorem 7.1 If $A$ is an $n$-by-$n$ matrix, then $W(A)$ is the convex hull of the curve $C(A)$.

The numerical ranges of 3-by-3 matrices can be classified, according to the factorizability of the Kippenhahn polynomial and in terms of the eigenvalues
and entries of the matrix, into four different types (cf. [33]; also [59, Section 6.2]). Another application of the Kippenhahn polynomial is a proof of Anderson's theorem.

Theorem 7.2 If $A$ is an $n$-by-$n$ matrix with $W(A)$ contained in $\mathrm{cl}\, \mathbb{D}$ and $\partial W(A)$ containing more than $n$ points of $\partial \mathbb{D}$, then $W(A) = \mathrm{cl}\, \mathbb{D}$.

There are now several different proofs of this theorem, among which one depends on Bézout's theorem from algebraic geometry (if two curves in the complex projective plane of orders $m$ and $n$ have no common component, then they intersect at exactly $mn$ points counting multiplicity) and the other on the Riesz–Fejér theorem from classical analysis (a degree-$n$ trigonometric polynomial which assumes only nonnegative values on the unit circle of the plane must be the square of the modulus of a degree-$n$ polynomial). Anderson's original unpublished proof is based on the former. Further results along this line, such as replacing $\mathrm{cl}\, \mathbb{D}$ by an elliptic disc or restricting to nilpotent matrices and companion matrices, can be found in [59, Section 6.3]. For operators on an infinite-dimensional space, there is also an analogue of Anderson's theorem (cf. [59, Theorem 4.5.18]).

Theorem 7.3 If $A$ is an operator on an infinite-dimensional space with $W(A) \subseteq \mathrm{cl}\, \mathbb{D}$, $\partial W(A) \cap \partial \mathbb{D}$ an infinite set, and $W_e(A) \subseteq \mathbb{D}$, then $W(A) = \mathrm{cl}\, \mathbb{D}$. In particular, this applies to compact operators.

A finite matrix $A = [a_{ij}]_{i,j=1}^{n}$ is nonnegative if $a_{ij} \ge 0$ for all $i$ and $j$. It is irreducible if there is no permutation matrix $P$ (one with entries all equal to 0 or 1 and exactly one 1 in every row and every column) for which $P^*AP$ is of the form $\begin{bmatrix} B & C \\ 0 & D \end{bmatrix}$ with square matrices $B$ and $D$. The next theorem on its numerical range is the analogue of the Perron–Frobenius theorem for its spectrum. This is due to Issos [32] (cf. also [59, Theorem 6.5.4]).

Theorem 7.4 Let $A$ be an $n$-by-$n$ irreducible nonnegative matrix. Then
(a) $w(A)$ is in $W(A)$,
(b) $w(A) = \langle Ax, x \rangle$ for some unit vector $x$ with strictly positive components,
(c) $\{x \in \mathbb{C}^n : \langle Ax, x \rangle = w(A)\|x\|^2\}$ is a subspace of $\mathbb{C}^n$ with dimension one,
(d) if $z$ is in $W(A)$ with $|z| = w(A)$, then there is a diagonal unitary matrix $D$ such that $A = (z/w(A))D^{-1}AD$,
(e) $z$ is in $W(A)$ with $|z| = w(A)$ if and only if $z = w(A)\omega_m^j$ for some $j$, $0 \le j \le m - 1$, where $m$ is the number of eigenvalues of $A$ with moduli equal to $\rho(A)$, and $\omega_m = e^{2\pi i/m}$, and
(f) $\omega_m W(A) = W(A)$.

Note that the real part of an irreducible nonnegative matrix is also irreducible, but not conversely. Some of the assertions in Theorem 7.4 can be generalized to this larger class. The following theorem, due to Tam and Yang [53, Lemma 1], is one such example. Its proof is based on Theorem 7.4 and the operator theory results in [43].

Theorem 7.5 Let $A$ be a nonnegative matrix with $\mathrm{Re}\, A$ irreducible and let $w(A)e^{i\theta}$ be in $W(A)$, where $\theta$ is in $\mathbb{R}$ with $e^{i\theta} \ne 1$.
(a) If $\theta$ is an irrational multiple of $2\pi$, then $A$ is permutationally similar to a matrix of the form
\[ \begin{bmatrix} 0 & A_1 & & \\ & 0 & \ddots & \\ & & \ddots & A_{m-1} \\ & & & 0 \end{bmatrix} \quad (2 \le m \le n), \]
where the diagonal zeros are zero square matrices, and, in particular, $W(A)$ is a circular disc centered at the origin.
(b) If $\theta$ is a rational multiple of $2\pi$, say, $\theta = 2\pi p/q$, where $p$ and $q$ are relatively prime integers and $q \ge 2$, then $A$ is permutationally similar to
\[ \begin{bmatrix} 0 & A_1 & & \\ & 0 & \ddots & \\ & & \ddots & A_{q-1} \\ A_q & & & 0 \end{bmatrix} \]
and, in particular, $W(A)$ satisfies $W(A) = e^{2\pi i/q}W(A)$.

Properties of the numerical ranges of (nonnegative) doubly (resp., row) stochastic matrices are discussed in the second part of [59, Section 6.5]. The problem of determining the possible shapes of the numerical ranges of doubly (resp., row) stochastic matrices is still quite open.
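Parts of Theorem 7.4 can be observed on a small example. The sketch below assumes NumPy and uses a weighted cyclic 3-by-3 matrix chosen for illustration; it is irreducible and satisfies $A^3 = 6I$, so $m = 3$. Since $w(A)$ lies in $W(A)$ by part (a), it is the point of $W(A)$ with the largest real part and hence equals the largest eigenvalue of $\mathrm{Re}\, A$. The code compares this with a grid estimate of $w(A)$, checks positivity of the eigenvector in (b), and tests the rotational symmetry in (f) through the support function $h(t) = \lambda_{\max}(\mathrm{Re}(e^{-it}A))$. The function name is hypothetical.

```python
import numpy as np

def support_function(A, num_angles=720):
    """h[k] = largest eigenvalue of Re(e^{-it} A) for t = 2*pi*k/num_angles."""
    A = np.asarray(A, dtype=complex)
    values = []
    for t in np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False):
        rotated = np.exp(-1j * t) * A
        values.append(np.linalg.eigvalsh((rotated + rotated.conj().T) / 2)[-1])
    return np.array(values)

# Irreducible nonnegative matrix with cyclic pattern 1 -> 2 -> 3 -> 1.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [3.0, 0.0, 0.0]])

h = support_function(A)
w = float(np.max(h))                      # grid estimate of w(A)

vals, vecs = np.linalg.eigh((A + A.T) / 2)
top = vals[-1]                            # largest eigenvalue of Re A
x = vecs[:, -1]
x = x if x.sum() > 0 else -x              # fix the sign of the eigenvector

m = 3                                     # eigenvalues of A of modulus rho(A)
shift = len(h) // m                       # grid steps in a rotation by 2*pi/m
rotation_invariant = np.allclose(h, np.roll(h, shift))
```

The invariance of `h` under a shift by $2\pi/3$ reflects $\omega_3 W(A) = W(A)$, since a compact convex set is determined by its support function.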
8 Numerical Range of $S_n$-Matrix

An $n$-by-$n$ matrix $A$ is said to be of class $S_n$ if it is a contraction, has eigenvalues all in $\mathbb{D}$, and satisfies $\mathrm{rank}(I_n - A^*A) = 1$. Such matrices are finite-dimensional versions of the compression of the shift $S(\phi)$, first studied by Sarason [50], defined on $H(\phi) \equiv H^2 \ominus \phi H^2$ by $S(\phi)f = P(zf(z))$ for $f$ in $H(\phi)$, where $H^2$ is the Hardy space of square-integrable analytic functions on $\mathbb{D}$, $\phi$ is an inner function ($\phi$ bounded analytic on $\mathbb{D}$ with $|\phi| = 1$ a.e. on $\partial \mathbb{D}$), and $P$ is the (orthogonal) projection from $H^2$ onto $H(\phi)$. When $\phi(z) = \prod_{j=1}^{n} (z - a_j)/(1 - \bar{a}_j z)$, $|a_j| < 1$ for all $j$, a Blaschke product with $n$ zeros, $S(\phi)$ represents an $S_n$-matrix with the $a_j$'s as its eigenvalues. One such example is the transpose of the Jordan block $J_n$, which corresponds to $S(\phi)$ with $\phi(z) = z^n$ on $\mathbb{D}$. The next proposition lists some basic properties of the numerical ranges of $S_n$-matrices.

Proposition 8.1 Let $A$ be an $S_n$-matrix. Then the following hold:
(a) $W(A)$ is contained in $\mathbb{D}$ and has no corner.
(b) If $M$ is a proper invariant subspace of $A$, then $W(A|_M) \subseteq \mathrm{Int}\, W(A)$.
(c) If $\lambda$ is an extreme point of $W(A)$, then the set $\{x \in \mathbb{C}^n : \langle Ax, x \rangle = \lambda \|x\|^2\}$ is a subspace of $\mathbb{C}^n$ of dimension one.
(d) If $\langle Ax, x \rangle$ is in $\partial W(A)$, where $x$ is a unit vector in $\mathbb{C}^n$, then $x$ is cyclic for both $A$ and $A^*$, that is, $x$ satisfies $\mathrm{span}(\{A^k x : k \ge 0\}) = \mathrm{span}(\{A^{*k} x : k \ge 0\}) = \mathbb{C}^n$.
(e) $\partial W(A)$ contains no line segment.

Recall that $\lambda$ is an extreme point of the convex subset $\triangle$ of $\mathbb{C}$ if $\lambda$ belongs to $\triangle$ and is not expressible as $t\lambda_1 + (1 - t)\lambda_2$ for any $\lambda_1 \ne \lambda_2$ in $\triangle$ and any $t$, $0 < t < 1$. The proof of Proposition 8.1 can be found in [59, Section 7.1]. Note that if $S(\phi)$ acts on an infinite-dimensional space, then $\partial W(S(\phi))$ may contain a line segment (cf. [5, Example 7.5]).

More importantly, the numerical ranges of $S_n$-matrices are related to Poncelet's porism, a classical result of projective geometry. The latter was established by J.-V. Poncelet in 1813 (and published in 1822) and says that, for two ellipses $E_1$ and $E_2$ in the plane with $E_1$ contained in the elliptical region enclosed by $E_2$, if there is an $n$-polygon with $n$ sides all tangent to $E_1$ and $n$ vertices on $E_2$, then for any point $\lambda$ on $E_2$, there is a unique such circumscribing–inscribing $n$-polygon with $\lambda$ as a vertex. Normalizing the outer ellipse as the unit circle via an affine transformation, the next theorem provides more examples of the inner curve with infinitely many circumscribing–inscribing polygons. It first appeared in [45, Theorem 1] and [26, Theorem 2.1].
Theorem 8.2 For any $S_n$-matrix $A$ and any point $\lambda$ on $\partial \mathbb{D}$, there is a unique $(n+1)$-polygon $P$ which circumscribes $\partial W(A)$, is inscribed in $\partial \mathbb{D}$, and has $\lambda$ as a vertex. Moreover, such polygons $P$ are in one-to-one correspondence with the unitary dilations $U$ of $A$ on an $(n+1)$-dimensional space such that the $n + 1$ vertices of $P$ are exactly the eigenvalues of the corresponding $U$.

Several consequences follow from this theorem.

Corollary 8.3 Let $A$ be an $S_n$-matrix. Then the following hold:
(a) If $B$ is another $S_n$-matrix with $W(A) \subseteq W(B)$, then $W(A) = W(B)$.
(b) Every point in $\partial W(A)$ is an extreme point of $W(A)$.
(c) $\partial W(A)$ is an analytic and an algebraic curve.

It seems natural to ask whether the converse of Theorem 8.2 is true, namely, whether the $(n+1)$-Poncelet property characterizes the numerical ranges of $S_n$-matrices. That this is not so is shown by an example given by Mirman [46, Example 1] (cf. [59, Theorem 7.2.7] for an explicit proof).

Other connections between numerical ranges of $S_n$-matrices and classical projective geometry results have also been established. The Brianchon–Ceva theorem says that, for a planar triangle $\triangle ABC$ with points $P$, $Q$, and $R$ on its sides $AB$, $BC$, and $CA$, respectively, the existence of an ellipse circumscribed by $\triangle ABC$ at $P$, $Q$, and $R$ is equivalent to the equality $AP \cdot BQ \cdot CR = PB \cdot QC \cdot RA$. The next theorem generalizes it to $(n+1)$-polygons.

Theorem 8.4 Let $a_1, \ldots, a_{n+1}$ be $n + 1$ distinct points arranged counterclockwise around $\partial \mathbb{D}$, and let $b_1, \ldots, b_{n+1}$ be points on $(a_1, a_2), \ldots, (a_{n+1}, a_1)$, respectively. Then there is an $S_n$-matrix $A$ with $\partial W(A)$ circumscribed by the $(n+1)$-polygon $a_1 \cdots a_{n+1}$ at the tangent points $b_1, \ldots, b_{n+1}$ if and only if $\prod_{j=1}^{n+1} |b_j - a_j| = \prod_{j=1}^{n+1} |a_{j+1} - b_j|$ ($a_{n+2} \equiv a_1$). In this case, $A$ is unique up to unitary similarity.

An easy consequence is the following corollary, part (a) of which corresponds to the fact that an $S_n$-matrix is determined, up to unitary similarity, by its spectrum.
Corollary 8.5 (a) Two $S_n$-matrices are unitarily similar if and only if they have equal numerical ranges. (b) If $A$ is an $S_n$-matrix, then $w(A) \ge \cos(\pi/(n+1))$. Moreover, the equality holds if and only if $A$ is unitarily similar to $J_n$.
Note that, for general inner functions $\phi$ and $\psi$, the equality $\mathrm{cl}\, W(S(\phi)) = \mathrm{cl}\, W(S(\psi))$ does not guarantee the unitary similarity of $S(\phi)$ and $S(\psi)$. One such example was given in [5, Example 7.5]. However, when $\sigma_e(S(\phi))$ is a singleton or when $\phi$ is a singular inner function with $\sigma_e(S(\phi))$ consisting of exactly two points, this can still be true (cf. [13, Corollary 20] and [34, Theorem 4], respectively).

The classical Lucas theorem says that zeros of the derivative of a polynomial are contained in the convex hull of the zeros of the polynomial, and Siebeck's theorem of 1864 says that if $a_1$, $a_2$, and $a_3$ are distinct zeros of a cubic polynomial and $b_1$ and $b_2$ are zeros of its derivative, then there is an ellipse with foci $b_1$ and $b_2$ circumscribed by $\triangle a_1a_2a_3$ at the midpoints of its three sides. The next theorem is their combined generalization.

Theorem 8.6 Let $p$ be a degree-$(n+1)$ polynomial with distinct zeros $a_1, \ldots, a_{n+1}$ having modulus one and arranged counterclockwise around $\partial \mathbb{D}$. If $A$ is an $S_n$-matrix whose eigenvalues are zeros of $p'$, then $\partial W(A)$ is circumscribed by the $(n+1)$-polygon $a_1 \cdots a_{n+1}$ at the midpoints $(a_1 + a_2)/2, \ldots, (a_{n+1} + a_1)/2$ of its sides.

A more general version of this theorem, which characterizes all such $S_n$-matrices, can also be given. The proofs of these can be found in [59, Section 7.3].

A class of operators more general than the compressions of the shift is that of $C_0$ contractions. A contraction $A$ is of class $C_0$ if it has no unitary direct summand and satisfies $\phi(A) = 0$ for some inner function $\phi$. In this case, the annihilating inner function which is a factor of all such annihilating inner functions is called the minimal function of $A$. The next theorem shows that $C_0$ contractions can be extended to direct sums of the compressions of the shift (cf. [59, Theorems 7.4.1 and 7.4.3, and Corollary 7.4.4]).

Theorem 8.7 If $A$ is a $C_0$ contraction on $H$ with minimal function $\phi$, then $A$ can be extended to $\underbrace{S(\phi) \oplus \cdots \oplus S(\phi)}_{d}$, where $d$ ($1 \le d \le \infty$) is the defect index of $A$. Moreover, if $H$ is finite dimensional, then the following conditions are equivalent:
(a) $\partial W(A) \cap \partial W(S(\phi)) \ne \emptyset$,
(b) $W(A) = W(S(\phi))$,
(c) $w(A) = w(S(\phi))$, and
(d) $A$ is unitarily similar to $S(\phi) \oplus B$ for some operator $B$.

This can be used to measure the inradius of the numerical range (cf. [59, Theorem 7.4.7]). Recall that, for any nonempty bounded convex subset $\triangle$ of $\mathbb{C}$, its inradius $i(\triangle)$ is the radius of the largest open circular disc contained in $\triangle$.
Theorem 8.8 For any $n$-by-$n$ matrix $A$, we have $i(W(A)) \le \|A\| \cos(\pi/(n+1))$. Moreover, the equality holds if and only if $A$ is unitarily similar to $\|A\| J_n$.

Applying Theorem 8.7 to a nilpotent operator, we can easily relate its numerical radius and norm in a similar fashion (cf. [59, Theorem 5.3.5]). On the other hand, the numerical radius and generalized Crawford number of a nilpotent operator can also be related as follows (cf. [59, Theorem 7.4.8]). The nilpotency of a nilpotent operator $A$ is the smallest integer $n$ for which $A^n = 0$.

Theorem 8.9 Let $A$ be a nilpotent operator on $H$ with nilpotency $n$. Then $w(A) \le (n - 1)C(A)$. Moreover, if $A$ attains its numerical radius, that is, if $w(A) = |\langle Ax, x \rangle|$ for some unit vector $x$ in $H$, then the following conditions are equivalent:
(a) $w(A) = (n - 1)C(A)$,
(b) $A$ is unitarily similar to $aA_n \oplus B$, where $a$ is in $\mathbb{C}$ satisfying $|a| = 2C(A)$, $A_n$ is the $n$-by-$n$ matrix
\[ \begin{bmatrix} 0 & 1 & \cdots & 1 \\ & 0 & \ddots & \vdots \\ & & \ddots & 1 \\ & & & 0 \end{bmatrix}, \]
and $B$ is some nilpotent operator, and
(c) $W(A) = bW(A_n)$ for some $b \ge 0$.
In this case, $W(A)$ is closed.

We remark that extensions of some of the properties of $S_n$-matrices to compressions of the shift $S(\phi)$ deserve to be further explored.
9 Generalized Numerical Ranges

In this final section, we consider various generalizations of the classical numerical range other than the algebraic and essential numerical ranges discussed in Section 5. We start with the joint numerical range. For operators A_1, . . . , A_m on H, the joint numerical range W(A_1, . . . , A_m) is the subset {(⟨A_1x, x⟩, . . . , ⟨A_mx, x⟩) : x ∈ H, ‖x‖ = 1} of ℂ^m. As with the classical
Numerical Ranges of Operators and Matrices
numerical range, W(A_1, . . . , A_m) is nonempty and bounded in ℂ^m, and is compact if H is finite dimensional. However, it is in general not convex even for m = 3 and dim H = 2. The first such example was discovered by Brickman [11, Remark 2]:
\[
A_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 0 & i \\ -i & 0 \end{bmatrix}, \quad \text{and} \quad
A_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.
\]
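The failure of convexity in Brickman's example can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `joint_point` and `random_unit_vector` are illustrative and not taken from the cited sources. It samples points of W(A_1, A_2, A_3) and records their Euclidean norms, then averages the two points coming from the standard basis vectors.

```python
import numpy as np

# Brickman's three Hermitian matrices, as displayed above.
A1 = np.array([[0, 1], [1, 0]], dtype=complex)
A2 = np.array([[0, 1j], [-1j, 0]], dtype=complex)
A3 = np.array([[1, 0], [0, -1]], dtype=complex)

def joint_point(x):
    """Return (<A1 x, x>, <A2 x, x>, <A3 x, x>) for a unit vector x in C^2."""
    # np.vdot conjugates its first argument, so vdot(x, A @ x) = x* A x,
    # which is real because each A is Hermitian.
    return np.array([np.vdot(x, A @ x).real for A in (A1, A2, A3)])

def random_unit_vector(rng):
    """Draw a random unit vector in C^2 (hypothetical helper for sampling)."""
    x = rng.normal(size=2) + 1j * rng.normal(size=2)
    return x / np.linalg.norm(x)

rng = np.random.default_rng(0)
radii = np.array(
    [np.linalg.norm(joint_point(random_unit_vector(rng))) for _ in range(1000)]
)

# Two specific points of the joint numerical range and their midpoint.
north = joint_point(np.array([1, 0], dtype=complex))
south = joint_point(np.array([0, 1], dtype=complex))
midpoint = (north + south) / 2
```

Every sampled point has norm 1, in line with the identity (2 Re(ā b))² + (2 Im(ā b))² + (|a|² − |b|²)² = (|a|² + |b|²)² for x = (a, b), while the midpoint of `north` and `south` is the origin. The range therefore lies on the unit sphere of ℝ³ and cannot be convex.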
On the other hand, under certain conditions on the A_j's, W(A_1, . . . , A_m) may still be convex. One such result is the following.

Theorem 9.1 If A_1, . . . , A_m are Hermitian operators on H, then W(A_1, . . . , A_m) is convex under either of the following conditions:
(a) dim span({A_1, . . . , A_m, I}) ≤ 3 or
(b) dim H ≥ 3 and dim span({A_1, . . . , A_m, I}) ≤ 4.

This is due to Au-Yeung and Poon [3]; a direct proof was given in [49] (cf. also [59, Theorem 8.1.3]). The next theorem concerns commuting matrices.

Theorem 9.2 Let A_1, . . . , A_m be n-by-n matrices.
(a) If the A_j's are doubly commuting (A_jA_k = A_kA_j and A_j^*A_k = A_kA_j^* for all j ≠ k), then W(A_1, . . . , A_m) is convex.
(b) If 1 ≤ n ≤ 3 and the A_j's are commuting, then W(A_1, . . . , A_m) is convex.
(c) If n ≥ 4 and m ≥ 2, then there are commuting A_j's for which W(A_1, . . . , A_m) is not convex.

Here part (a) is from [8] and (b) and (c) were proven in [38].

Another generalized numerical range is the C-numerical range. For n-by-n matrices A and C, the C-numerical range of A is the subset W_C(A) = {tr(CU^*AU) : U n-by-n unitary matrix} of the plane. Special cases of the C-numerical range are the c-numerical range W_c(A) for c = (c_1, . . . , c_n) in ℂ^n and the k-numerical range W_k(A) for 1 ≤ k ≤ n. The former is W_C(A) with C = diag(c_1, . . . , c_n) and the latter is W_c(A) with c = (1∕k, . . . , 1∕k, 0, . . . , 0) in ℂ^n, where 1∕k appears k times. That W_k(A) is convex was proved by Berger (even for operator A). More generally, it was shown by Au-Yeung and Tsing [4] that W_c(A) is convex for all n-by-n matrices A if and only if the points c_1, . . . , c_n are collinear in the plane. In spite of this, the C-numerical
range can still be convex under certain conditions. The next theorem gives two of them.

Theorem 9.3 Let A and C be n-by-n matrices. Then W_C(A) is convex if either n = 2 or C is Hermitian. In the former case, W_C(A) is a (closed) elliptic disc.

Note that the assertion for Hermitian C is covered by Au-Yeung and Tsing's result mentioned above. The proof of the theorem can be found in [59, Theorems 8.2.3 and 8.2.5]. More generally, it is known via [43, 55, 57] that if C is an n-by-n matrix satisfying any one of the following conditions:
(1) C − μI_n is of rank one for some μ in ℂ,
(2) e^{it}(C − tr(C)I_n∕n) is Hermitian for some t in ℝ, and
(3) C − tr(C)I_n∕n is unitarily similar to a matrix of the form
\[
\begin{bmatrix} 0 & C_1 & & & \\ & 0 & C_2 & & \\ & & \ddots & \ddots & \\ & & & 0 & C_k \\ & & & & 0 \end{bmatrix}
\]
for some k, 1 ≤ k ≤ n − 1,
then W_C(A) is convex for every n-by-n matrix A. It is open whether the converse is true. The case for a normal C is known: the convexity of W_C(A) for all n-by-n matrices A is equivalent to the collinearity of the eigenvalues of C. The C-numerical range can also be considered for C an n-by-n matrix and A an operator on H with dim H ≥ n as W_C(A) = {tr((C ⊕ 0)U^*AU) : U unitary on H}, where C ⊕ 0 is regarded as an operator on H. Recently, a condition on C under which the W_C(A) analogue of Theorem 6.2 holds for all contractions A was proven in [39].

Without convexity in general, the next best thing is star-shapedness. Recall that a subset △ of the plane is star-shaped with star center a if ta + (1 − t)b is in △ for all t, 0 ≤ t ≤ 1, and all b in △. The following theorem was obtained by Cheung and Tsing [14, Theorem 4 (a)] (cf. also [59, Theorem 8.2.9]).
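To make the definition of the C-numerical range concrete, here is a minimal sketch, assuming NumPy; the function names `random_unitary` and `c_numerical_point` are illustrative only. It evaluates one point tr(CU^*AU) of W_C(A) and checks that, for C = diag(1, 0, . . . , 0), this point is ⟨Au, u⟩ with u the first column of U, so that W_C(A) reduces to the classical numerical range in this case.

```python
import numpy as np

def random_unitary(n, rng):
    """Return an n-by-n unitary matrix: the Q factor of a complex Gaussian matrix."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

def c_numerical_point(C, A, U):
    """Return tr(C U* A U), one point of the C-numerical range W_C(A)."""
    return np.trace(C @ U.conj().T @ A @ U)

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
C = np.diag([1.0, 0.0, 0.0]).astype(complex)  # rank-one projection onto e_1
U = random_unitary(n, rng)

point = c_numerical_point(C, A, U)

# With this C, tr(C U* A U) is the (1, 1) entry of U* A U, i.e. <A u, u>.
u = U[:, 0]
classical = np.vdot(u, A @ u)
```

Sampling many unitaries in this way gives a point cloud approximating W_C(A) for any choice of C, though random sampling tends to under-resolve the boundary.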
Theorem 9.4 For any n-by-n matrices A and C, the C-numerical range W_C(A) is star-shaped with star center (tr(A))(tr(C))∕n.

For the next topic, we consider the q-numerical range and Davis–Wielandt shell. For an operator A on H and a complex number q with |q| ≤ 1, the q-numerical range W_q(A) of A is the set {⟨Ax, y⟩ : x, y ∈ H, ‖x‖ = ‖y‖ = 1, ⟨x, y⟩ = q}. For q = 1, W_q(A) reduces to the classical numerical range W(A). The next theorem, due to Tsing [55], gives the convexity of W_q(A) and its connection with W_C(A) (cf. also [59, Theorem 8.3.2 and Corollary 8.3.3]).

Theorem 9.5 Let A be an operator on H and |q| ≤ 1. Then
(a) W_q(A) is convex, and
(b) if 2 ≤ n = dim H < ∞, then W_q(A) equals W_C(A), where
\[
C = \begin{bmatrix} q & \sqrt{1 - |q|^2} \\ 0 & 0 \end{bmatrix} \oplus 0_{n-2}.
\]

Note that if S is the (simple) unilateral shift on ℓ², then W_q(S) = cl 𝔻 if |q| < 1 and 𝔻 if |q| = 1. Another example is in the next theorem, whose proof can be found in [59, Theorem 8.3.5].

Theorem 9.6 If A is an n-by-n Hermitian matrix with eigenvalues a_1 ≥ ⋯ ≥ a_n and |q| ≤ 1, then W_q(A) is the (closed) elliptic disc with foci qa_1 and qa_n and length of the minor axis √(1 − |q|²)(a_1 − a_n).

The Davis–Wielandt shell DW(A) of an operator A on H is defined as the subset {(⟨Ax, x⟩, ⟨A^*Ax, x⟩) : x ∈ H, ‖x‖ = 1} of ℂ × ℝ. As a generalization of the numerical range, it captures more information about the operator A. The next theorem concerns its convexity.

Theorem 9.7 Let A be an operator on H.
(a) If dim H = 2 and A is represented as
\[
\begin{bmatrix} a & b \\ 0 & c \end{bmatrix},
\]
then DW(A) is an ellipsoid centered at ((a + c)∕2, (|a|² + |b|² + |c|²)∕2) and thus is convex if and only if b = 0. In this latter case, DW(A) is the (closed) line segment connecting (a, |a|²) and (c, |c|²) in ℂ × ℝ.
(b) If dim H ≥ 3, then DW(A) is always convex.

Note that part (a) appeared in [21, Section 9] and part (b) is an easy consequence of Theorem 9.1 (b) since DW(A) can be identified as the joint numerical range W(Re A, Im A, A^*A). For the (simple) unilateral shift S, it can be computed that DW(S) = {(λ, 1) : |λ| < 1} and DW(S^*) = {(λ, 1) : |λ| < 1} ∪ {(λ, r) : |λ|² ≤ r < 1}. The Davis–Wielandt shell and q-numerical range are connected by the fact that, for matrices A and B possibly of different sizes, the equality of DW(A)
and DW(B) implies that of W_q(A) and W_q(B) for all q, |q| ≤ 1, but not conversely. A brief account of this, together with the proofs of other assertions here, can be found in [59, Section 8.3].

We now move to the two types of matricial range: the spatial and the algebraic matricial ranges. They generalize the classical and the algebraic numerical ranges, respectively. Let A be an operator on H and let the integer n be such that 1 ≤ n ≤ dim H. Then the nth spatial matricial range W_s^n(A) of A is {B ∈ M_n(ℂ) : B dilates to A}. It turns out that the W_s^n(A)'s are in general not convex, nor are they star-shaped.

Theorem 9.8 Let A be an n-by-n matrix and n∕2 < k ≤ n. Then W_s^k(A) is convex if and only if A is a scalar matrix.

The proof depends on the Fan–Pall theorem [23, Theorem 1]: if A is an n-by-n Hermitian matrix with eigenvalues a_1 ≥ ⋯ ≥ a_n and 1 ≤ k ≤ n, then W_s^k(A) consists of all k-by-k Hermitian matrices with eigenvalues b_1 ≥ ⋯ ≥ b_k satisfying a_j ≥ b_j ≥ a_{n−k+j} for all j, 1 ≤ j ≤ k. However, for the (simple) unilateral shift S, W_s^n(S) can be shown to be {B ∈ M_n(ℂ) : ‖B‖ ≤ 1, σ(B) ⊆ 𝔻} for all n ≥ 1, which yields their convexity (cf. [59, Theorem 8.4.7]). The star-shapedness of spatial matricial ranges for Hermitian matrices and for Hermitian operators without eigenvalues is discussed in [59, Theorem 8.4.4 and Proposition 8.4.6]. For compact operators, the spatial matricial range provides a powerful tool in their classification up to unitary similarity.

Theorem 9.9 Two compact operators A and B on an infinite-dimensional separable space are unitarily similar if and only if dim(ker(A) ∩ ker(A^*)) = dim(ker(B) ∩ ker(B^*)) and cl W_s^n(A) = cl W_s^n(B) for all n ≥ 1.

This appeared in [16, Theorems 1 and 2], whose proof depends on Voiculescu's noncommutative Weyl–von Neumann theorem. A restricted version of it with ker(A) ∩ ker(A^*) = ker(B) ∩ ker(B^*) = {0} was proven via elementary arguments by Parrott, which appeared in [10, Section 36, Theorem 9] (cf. also [59, Theorem 8.4.10]).

We next consider the algebraic matricial range. For any operator A on (a separable) H, let C^*(A) denote the C^*-algebra generated by A and I_H, and CP_n(A), n ≥ 1, the class of all unital completely positive maps ϕ from C^*(A) to M_n(ℂ), that is, ϕ is such that the induced maps ϕ_k([A_{ij}]_{i,j=1}^k) = [ϕ(A_{ij})]_{i,j=1}^k take positive elements in M_k(C^*(A)) to positive elements in M_k(M_n(ℂ)) for all k ≥ 1. For any n ≥ 1, the nth algebraic matricial range W^n(A) of A is the set {ϕ(A) ∈ M_n(ℂ) : ϕ ∈ CP_n(A)}. The notion, together with many related developments, was pioneered by Arveson [2].
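The interlacing inequalities in the Fan–Pall theorem quoted above for spatial matricial ranges can be observed on a random example. The sketch below assumes NumPy and takes "B dilates to A" to mean B = V^*AV for an isometry V; the helper names are illustrative. It only checks the necessity direction (every compression of a Hermitian matrix has interlacing eigenvalues), not the converse.

```python
import numpy as np

def random_hermitian(n, rng):
    """Return a random n-by-n Hermitian matrix."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (z + z.conj().T) / 2

def random_isometry(n, k, rng):
    """Return an n-by-k matrix V with orthonormal columns, so that V* V = I_k."""
    z = rng.normal(size=(n, k)) + 1j * rng.normal(size=(n, k))
    q, _ = np.linalg.qr(z)  # reduced QR: q has shape (n, k)
    return q

def interlaces(a, b, tol=1e-10):
    """Check a_j >= b_j >= a_{n-k+j} (1 <= j <= k) for decreasingly sorted a, b."""
    n, k = len(a), len(b)
    return all(a[j] + tol >= b[j] >= a[n - k + j] - tol for j in range(k))

rng = np.random.default_rng(2)
n, k = 6, 3
A = random_hermitian(n, rng)
V = random_isometry(n, k, rng)
B = V.conj().T @ A @ V  # a k-by-k compression of A

a = np.sort(np.linalg.eigvalsh(A))[::-1]  # eigenvalues in decreasing order
b = np.sort(np.linalg.eigvalsh(B))[::-1]
holds = interlaces(a, b)
```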
As a generalized notion of convexity, the C^*-convexity of a subset K of ℬ(H) means that, for any finitely many operators A_1, . . . , A_k in K and V_1, . . . , V_k in ℬ(H) with Σ_{j=1}^k V_j^*V_j = I_H, the operator Σ_{j=1}^k V_j^*A_jV_j is in K. A result of Salinas [44, Proposition 31] says that the properties of compactness and C^*-convexity are enough to characterize the algebraic matricial range. For the (simple) unilateral shift S, it is known that W^n(S) = {B ∈ M_n(ℂ) : ‖B‖ ≤ 1} and, in particular, W^n(S) = cl W_s^n(S) for all n ≥ 1 (cf. [59, Corollary 8.4.19]). More generally, the next theorem, due to Bunce and Salinas [12, Theorem 3.5], gives a connection between the two types of matricial ranges for general operators.

Theorem 9.10 For any operator A on H and any integer n, 1 ≤ n ≤ dim H, W^n(A) equals the closure of the C^*-convex hull of W_s^n(A).

The C^*-convex hull of any nonempty subset K of ℬ(H) is the smallest C^*-convex set containing K. For compact operators, Theorem 9.9 has an analogue for algebraic matricial ranges.

Theorem 9.11 Two (unitarily) irreducible compact operators A and B on a separable Hilbert space are unitarily similar if and only if W^n(A) = W^n(B) for all n ≥ 1.

This was proved by Arveson [2, Section 2.4]. For an exposition of its proof in the finite-dimensional case, consult [24].

Finally, we come to the higher-rank numerical ranges, whose study was undertaken only in the past one and a half decades. For any operator A on H and any integer k, 1 ≤ k ≤ dim H, the rank-k numerical range Λ_k(A) of A is the set {λ ∈ ℂ : λI_k dilates to A}. In contrast to other types of generalized numerical ranges, the higher-rank numerical ranges can be empty. One such example is an n-by-n matrix A with all its eigenvalues having geometric multiplicity one, in which case Λ_k(A) = ∅ for any k, ⌈n∕2⌉ < k ≤ n (cf. [59, Proposition 8.5.2]). To prove the convexity of Λ_k(A), we need an alternative expression for its elements.
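As a small illustration of the definition of Λ_k(A), the following sketch (assuming NumPy, and reading "λI_k dilates to A" as V^*AV = λI_k for some isometry V from ℂ^k to H) exhibits an explicit isometry showing that 0 lies in Λ_2(A) for the normal matrix A = diag(1, −1, i, −i). The function name `compresses_to_scalar` is hypothetical.

```python
import numpy as np

def compresses_to_scalar(A, V, lam, tol=1e-12):
    """Check that V has orthonormal columns and that V* A V = lam * I_k."""
    k = V.shape[1]
    is_isometry = np.allclose(V.conj().T @ V, np.eye(k), atol=tol)
    compression = V.conj().T @ A @ V
    return bool(is_isometry and np.allclose(compression, lam * np.eye(k), atol=tol))

A = np.diag([1, -1, 1j, -1j])

# Columns (e1 + e2)/sqrt(2) and (e3 + e4)/sqrt(2): each averages a pair of
# opposite eigenvalues, and the two columns have disjoint supports.
s = 1 / np.sqrt(2)
V = np.array([[s, 0], [s, 0], [0, s], [0, s]], dtype=complex)

in_rank_2_range = compresses_to_scalar(A, V, 0.0)
```

The same helper rejects λ = 1 for this particular V, although a single isometry can only certify membership in Λ_k(A), never exclusion from it.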
Consider the set Ω_k(A), defined for any operator A on H and any integer k, 1 ≤ k ≤ dim H, by
\[
\Omega_k(A) = \bigcap_{0 \le \theta < 2\pi} \{\lambda \in \mathbb{C} : \operatorname{Re}(e^{-i\theta}\lambda) \le \lambda_k(\operatorname{Re}(e^{-i\theta}A))\},
\]
where, for a Hermitian operator B on H with n = dim H < ∞, λ_1(B) ≥ ⋯ ≥ λ_n(B) denote its (ordered) eigenvalues, and, for dim H = ∞, λ_k(B) = sup{λ_k(V^*BV) : V isometry from ℂ^k to H}. Although the containment Λ_k(A) ⊆ Ω_k(A) follows easily from the Fan–Pall theorem, the converse containment is much more difficult to prove even for a normal A. The general case was finally established by Li and Sze [42]
through a detailed analysis involving the *-congruence canonical form of matrices (cf. also [59, Theorem 8.5.11]).

Theorem 9.12 For any n-by-n matrix A and any k, 1 ≤ k ≤ n, the equality Λ_k(A) = Ω_k(A) holds.

The convexity of Λ_k(A) for any operator A follows easily from the preceding theorem though, when A acts on an infinite-dimensional space, we only have Int Ω_k(A) ⊆ Λ_k(A) ⊆ Ω_k(A) = cl Λ_k(A) for k ≥ 1 (cf. [40, Theorem 2.1] or [59, Theorem 8.5.16]). The condition for its nonemptiness can also be derived (cf. [41, Theorem 3] or [59, Theorem 8.5.13, Example 8.5.14 and Corollary 8.5.15]).

Theorem 9.13 Let A be an operator on H and n = dim H.
(a) If n < ∞, then Λ_k(A) is nonempty for any k, 1 ≤ k ≤ ⌊(n + 2)∕3⌋, and, moreover, the number "⌊(n + 2)∕3⌋" is the best possible.
(b) If n = ∞, then Λ_k(A) is nonempty for all k ≥ 1.

Hopefully, the contents of this chapter convince the readers that the study of numerical ranges has gone through exciting developments from its beginnings to the present time. Most likely, the study will continue to play a major role in operator theory for years to come.

Acknowledgements The research of H.-L. Gau was partially supported by the Ministry of Science and Technology of the Republic of China under project MOST 111-2115-M008-004. The authors appreciate very much the comments on this chapter by three reviewers, some of which have been incorporated into the present edition.
References
1. Ando, T. (1973). Structure of operators with numerical radius one. Acta Scientiarum Mathematicarum (Szeged), 34, 11–15
2. Arveson, W. B. (1969). Subalgebras of C*-algebras. Acta Mathematica, 123, 141–224
3. Au-Yeung, Y. H., & Poon, Y.-T. (1979). A remark on the convexity and positive definiteness concerning Hermitian matrices. Southeast Asian Bulletin of Mathematics, 3, 85–92
4. Au-Yeung, Y. H., & Tsing, N.-K. (1983). A conjecture of Marcus on the generalized numerical range. Linear and Multilinear Algebra, 14, 235–239
5. Bercovici, H., & Timotin, D. (2014). The numerical range of a contraction with finite defect numbers. Journal of Mathematical Analysis and Applications, 417, 42–56
6. Berger, C. A. (1963). Normal dilations. Ph.D. dissertation, Cornell University
7. Berger, C. A., & Stampfli, J. G. (1967). Mapping theorems for the numerical range. American Journal of Mathematics, 89, 1047–1055
8. Bolotnikov, V., & Rodman, L. (1999). Normal forms and joint numerical ranges of doubly commuting matrices. Linear Algebra and Its Applications, 301, 187–194
9. Bonsall, F. F., & Duncan, J. (1971). Numerical ranges of operators on normed spaces and of elements of normed algebras. London: Cambridge University Press
10. Bonsall, F. F., & Duncan, J. (1973). Numerical ranges II. London: Cambridge University Press
11. Brickman, L. (1961). On the field of values of a matrix. Proceedings of the American Mathematical Society, 12, 61–66
12. Bunce, J. W., & Salinas, N. (1976). Completely positive maps on C*-algebras and the left matricial spectra of an operator. Duke Mathematical Journal, 43, 747–774
13. Chalendar, I., Gorkin, P., & Partington, J. R. (2011). Determination of inner functions by their value sets on the circle. Computational Methods and Function Theory, 11, 353–373
14. Cheung, W.-S., & Tsing, N.-K. (1996). The C-numerical range of matrices is star-shaped. Linear and Multilinear Algebra, 41, 245–250
15. Choi, M.-D., & Li, C.-K. (2001). Constrained unitary dilations and numerical ranges. Journal of Operator Theory, 46, 435–447
16. Chuan, W.-F. (1985). The unitary equivalence of compact operators. Glasgow Mathematical Journal, 26, 145–149
17. Crabb, M. J. (1971). The powers of an operator of numerical radius one. Michigan Mathematical Journal, 18, 252–256
18. Crouzeix, M. (2004). Bounds for analytic functions of matrices. Integral Equations and Operator Theory, 48, 461–477
19. Crouzeix, M., & Palencia, C. (2017). The numerical range is a (1 + √2)-spectral set. SIAM Journal on Matrix Analysis and Applications, 38, 649–655
20. Davidson, K. R., & Holbrook, J. A. R. (1988). Numerical radii of zero-one matrices. Michigan Mathematical Journal, 35, 261–267
21. Davis, C. (1968). The shell of a Hilbert-space operator. Acta Scientiarum Mathematicarum (Szeged), 29, 69–86
22. Drury, S. W. (2008). Symbolic calculus of operators with unit numerical radius. Linear Algebra and Its Applications, 428, 2061–2069
23. Fan, K., & Pall, G. (1957). Imbedding conditions for Hermitian and normal matrices. Canadian Journal of Mathematics, 9, 298–304
24. Farenick, D. R. (2011). Arveson's criterion for unitary similarity. Linear Algebra and Its Applications, 435, 769–777
25. Gau, H.-L., & Wang, K.-Z. (2020). Matrix powers with circular numerical range. Linear Algebra and Its Applications, 603, 190–211
26. Gau, H.-L., & Wu, P. Y. (1998). Numerical range of S(ϕ). Linear and Multilinear Algebra, 45, 49–73
27. Halmos, P. R. (1964). Numerical ranges and normal dilations. Acta Scientiarum Mathematicarum (Szeged), 25, 1–5
28. Halmos, P. R. (1982). A Hilbert space problem book (2nd ed.). New York: Springer
29. Holbrook, J. A. R. (1969). Multiplicative properties of the numerical radius in operator theory. Journal für die Reine und Angewandte Mathematik, 237, 166–174
30. Holbrook, J. A. R. (1992). Inequalities of von Neumann type for small matrices. In K. Jarosz (Ed.), Function spaces (pp. 189–193). New York: Marcel Dekker
31. Holbrook, J. A. R., & Schoch, J.-P. (2010). Theory vs. experiment: multiplicative inequalities for the numerical radius of commuting matrices. Operator Theory: Advances and Applications, 202, 273–284
32. Issos, J. N. (1966). The field of values of non-negative irreducible matrices. Ph.D. dissertation, Auburn University
33. Keeler, D. S., Rodman, L., & Spitkovsky, I. M. (1997). The numerical range of 3 × 3 matrices. Linear Algebra and Its Applications, 252, 115–139
34. Kérchy, L. (2017). Uniqueness of the numerical range of truncated shifts. Acta Scientiarum Mathematicarum (Szeged), 83, 243–261
35. Kippenhahn, R. (1951). Über den Wertevorrat einer Matrix. Mathematische Nachrichten, 6, 193–228
36. Kittaneh, F. (2005). Numerical radius inequalities for Hilbert space operators. Studia Mathematica, 168, 73–80
37. Lancaster, J. S. (1975). The boundary of the numerical range. Proceedings of the American Mathematical Society, 49, 393–398
38. Lau, P.-S., Li, C.-K., & Poon, Y.-T. (2022). The joint numerical range of commuting matrices. Studia Mathematica, 267, 241–259
39. Li, C.-K. (2022). The C-numerical range and unitary dilations. Acta Scientiarum Mathematicarum (Szeged). arXiv:2208.01405v4
40. Li, C.-K., Poon, Y.-T., & Sze, N.-S. (2008). Higher rank numerical ranges and low rank perturbations of quantum channels. Journal of Mathematical Analysis and Applications, 348, 843–855
41. Li, C.-K., Poon, Y.-T., & Sze, N.-S. (2009). Condition for the higher rank numerical range to be non-empty. Linear and Multilinear Algebra, 57, 365–368
42. Li, C.-K., & Sze, N.-S. (2008). Canonical forms, higher rank numerical ranges, totally isotropic subspaces, and matrix equations. Proceedings of the American Mathematical Society, 136, 3013–3023
43. Li, C.-K., & Tsing, N.-K. (1991). Matrices with circular symmetry on their unitary orbits and C-numerical ranges. Proceedings of the American Mathematical Society, 111, 19–28
44. Loebl, R. I., & Paulsen, V. I. (1981). Some remarks on C*-convexity. Linear Algebra and Its Applications, 35, 63–78
45. Mirman, B. (1998). Numerical ranges and Poncelet curves. Linear Algebra and Its Applications, 281, 59–85
46. Mirman, B. (2003). UB-matrices and conditions for Poncelet polygon to be closed. Linear Algebra and Its Applications, 360, 123–150
47. Müller, V. (1988). The numerical radius of a commuting product. Michigan Mathematical Journal, 35, 255–260
48. Narcowich, F. J. (1980). Analytic properties of the boundary of the numerical range. Indiana University Mathematics Journal, 29, 67–77
49. Poon, Y.-T. (1997). Generalized numerical ranges, joint positive definiteness and multiple eigenvalues. Proceedings of the American Mathematical Society, 125, 1625–1634
50. Sarason, D. (1967). Generalized interpolation in H^∞. Transactions of the American Mathematical Society, 127, 179–203
51. Sz.-Nagy, B. (1953). Sur les contractions de l'espace de Hilbert. Acta Scientiarum Mathematicarum (Szeged), 15, 87–92
52. Sz.-Nagy, B., Foiaş, C., Bercovici, H., & Kérchy, L. (2010). Harmonic analysis of operators on Hilbert space (2nd ed.). New York: Springer
53. Tam, B.-S., & Yang, S. (1999). On matrices whose numerical ranges have circular or weak circular symmetry. Linear Algebra and Its Applications, 302/303, 193–221
54. Tsing, N.-K. (1983). Diameter and minimal width of the numerical range. Linear and Multilinear Algebra, 14, 179–185
55. Tsing, N.-K. (1984). The constrained bilinear form and the C-numerical range. Linear Algebra and Its Applications, 56, 195–206
56. Wang, K.-Z., Wu, P. Y., & Gau, H.-L. (2010). Crawford numbers of powers of a matrix. Linear Algebra and Its Applications, 433, 2243–2254
57. Westwick, R. (1975). A theorem on numerical range. Linear and Multilinear Algebra, 2, 311–315
58. Williams, J. P., & Crimmins, T. (1967). On the numerical radius of a linear operator. The American Mathematical Monthly, 74, 832–833
59. Wu, P. Y., & Gau, H.-L. (2021). Numerical ranges of Hilbert space operators. Cambridge: Cambridge University Press
60. Wu, P. Y., & Gau, H.-L. (2022). Which set is the numerical range of an operator? Acta Scientiarum Mathematicarum (Szeged), 88, 527–545
Part II
Operator Equations
Stability and Controllability of Operator Differential Equations
Jin Liang, Ti-Jun Xiao, and Zhe Xu
Abstract The purpose of this chapter is twofold. First, we review an important research idea and process on the stability of coupled systems of operator differential equations in Hilbert spaces that has led to many subsequent studies. Second, we present a study about the boundary controllability of operator differential equations affected by memory in Hilbert spaces. A general time optimal boundary controllability theorem for the operator differential equations is established. Applying this theorem, we obtain the exact boundary controllability for a class of viscoelastic wave and plate equations.
Keywords Operator • Differential equation • Hilbert space • Memory • Solution
Mathematics Subject Classification (MSC2020) Primary 47N20 • Secondary 46N20, 35R20, 35B35, 35B40, 93B05
J. Liang School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China e-mail: [email protected] T.-J. Xiao (✉) • Z. Xu Shanghai Key Laboratory for Contemporary Applied Mathematics, School of Mathematical Sciences, Fudan University, Shanghai, China e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_53
J. Liang et al.
1 Notation

ℕ                  Set of natural numbers
ℤ′                 Set of nonzero integer numbers
‖·‖                Norm of a normed space
‖·‖_X              Norm of a normed space X when X needs special mention
(·, ·)_H           Inner product in Hilbert space H
X′                 Dual of a normed space X
•|_X               Restriction of • to X
Δ                  Laplacian
D(A)               Domain of operator A
[D(A)]             Domain D(A) endowed with the graph norm
A^*                The Hilbert space adjoint of A when A is a densely defined linear operator in H
g * x(t)           The convolution ∫_0^t g(t − s)x(s)ds of g and x
∂Ω                 Boundary of Ω
Ω̄                  Closure of Ω
ℝ^d                d-dimensional real Euclidean space
ℋ                  Cartesian product space of the Hilbert space H and itself
C(Ω; X)            Space of continuous X-valued functions in Ω for a space X
C^p(Ω; X)          Space of p-times continuously differentiable X-valued functions in Ω for a space X
ℓ²                 Space of square-summable sequences
L^p(Ω; X)          Space of p-integrable X-valued functions on Ω for a space X
L^p(Ω)             Space of p-integrable functions on Ω
H^p(Ω; X)          Abstract Sobolev space W^{p,2}(Ω; X) for a space X
H^p(Ω)             Sobolev space W^{p,2}(Ω)
H^p_0(Ω)           W^{p,2}_0(Ω)
H^{−1}(Ω)          Dual space to H^1_0(Ω)
X_loc(0, +∞; •)    Space of all x with x ∈ X(V; •) for each finite open interval V of (0, +∞)
X_loc(0, +∞)       Space of all x with x ∈ X(V) for each finite open interval V of (0, +∞)
L(X, Y)            Space of all bounded linear operators from X to Y
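The convolution g * x in the notation list can be approximated numerically. The following is a minimal sketch for scalar-valued g and x (in the chapter, x takes values in a Hilbert space), assuming NumPy; the function name `convolution`, the trapezoidal rule, and the sample kernel g(t) = e^{−t} are choices made here for illustration.

```python
import numpy as np

def convolution(g, x, t, n=2000):
    """Approximate (g * x)(t) = integral_0^t g(t - s) x(s) ds by the trapezoidal rule."""
    s = np.linspace(0.0, t, n + 1)   # uniform grid on [0, t]
    values = g(t - s) * x(s)         # integrand sampled on the grid
    h = t / n
    return h * (values.sum() - 0.5 * (values[0] + values[-1]))

# Sample data: exponential kernel and the constant function x = 1.
g = lambda t: np.exp(-t)
x = lambda s: np.ones_like(s)

approx = convolution(g, x, 2.0)
exact = 1.0 - np.exp(-2.0)  # closed form of integral_0^2 e^{-(2 - s)} ds
```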
2 Introduction

In this chapter, we are concerned with some problems about operator differential equations in Hilbert spaces. It is well known that many initial value or initial-boundary value problems for partial differential equations stemming from mechanics, physics, engineering, control theory, etc., can be translated into the Cauchy problem for some operator differential equations in abstract spaces by regarding the partial differential operators in the space variables as coefficient operators in some function space and letting the boundary conditions (if any) be absorbed into the definition of the function space or of the domain of the coefficient operators. The idea of dealing with initial value or initial-boundary value problems in this way was originally developed by Hille [22] and Yosida [51] in the 1940s. Moreover, the theory of operator differential equations is closely connected with many other branches of mathematics: operator theory, space theory, semigroup theory, partial differential equations, differential operator theory, control theory, integro-differential equations, etc. Therefore, the study of operator differential equations is important for both theoretical investigations and practical applications.

From the monographs [4, 13–16, 20, 22, 23, 43, 52], the reader can learn abundant basic or modern theories and methods about first-order operator differential equations and operator semigroups. The monographs [13, 15, 16, 20] also give some demonstration of the theory of more complicated higher-order operator differential equations. Monograph [35] is the first systematic treatment of higher-order operator differential equations. It presents a rich and systematic theory for such equations as well as the profound application of the idea of "direct study" (which is quite different from the traditional method of reduction) in the research of higher-order operator equation theory. From monograph [35], the reader can see many characterizations, in terms of the coefficient operators, of some basic properties of higher-order operator differential equations, for instance, Hille–Yosida–Phillips-type generation theorems, an analyticity theorem, a norm continuity theorem, and a differentiability theorem. Moreover, various techniques in operator calculus and vector-valued Laplace transforms can be found in [35]. So far, the theory of linear and nonlinear operator differential equations has been developed extensively and has been widely applied to physics, theoretical mechanics, materials science, engineering, aerospace, network engineering, and many other related fields (cf., e.g., [8, 9, 12, 13, 15, 20, 24, 35, 43, 45–50] and references therein).
Clearly, since the study of operator differential equations rests on linear or nonlinear analysis in a general framework, with various types of concrete linear or nonlinear mathematical models as the basic background, it needs a combination of multidisciplinary theories and methods to make good progress.

This chapter consists of two sections. In Section 1, we present a literature review based on some papers about the stability of operator differential equations with indirect memory damping in Hilbert spaces. In particular, we recall and summarize a study on the stability of a class of coupled systems of operator differential equations indirectly damped via memory effects in Hilbert spaces by virtue of operator theory and estimation techniques, which opens up a new way to investigate coupled systems effectively and has led to a series of subsequent studies about the stability of various evolution equations (cf., e.g., [17, 18, 24–26, 34, 38, 41] and references cited there). In Section 2, we investigate the boundary controllability of a class of operator differential equations affected by memory in Hilbert spaces, which cover many viscoelastic systems. Among other things, by showing the Riesz property of a family of functions associated with the equations, we obtain a general time optimal boundary controllability theorem for the operator differential equations. Applications illustrating our theoretical results are also given, from which we obtain the exact boundary controllability for a class of viscoelastic wave and plate equations.
3 Coupled Systems of Operator Differential Equations with Indirect Memory Damping in Hilbert Spaces

The contents of the section are mainly from paper [50]. In this section, H is a real Hilbert space. Let A_1, A_2 be positive self-adjoint linear operators in H, g(t) : [0, ∞) → [0, ∞) a decreasing and locally absolutely continuous function, B_i, i = 1, 2, linear operators in H, and f : D(√A_1) → H. Of concern is the Cauchy problem for coupled systems of operator differential equations in H
\[
\begin{cases}
u''(t) + A_1 u(t) + \alpha u(t) - \displaystyle\int_0^t g(t-s) A_1 u(s)\,ds + B_2 v(t) = f(u(t)), \\
v''(t) + A_2 v(t) + B_1 u(t) = 0,
\end{cases}
\tag{3.1}
\]
with initial data
\[
u(0) = u_0, \quad v(0) = v_0, \quad u'(0) = u_1, \quad v'(0) = v_1.
\tag{3.2}
\]
Definition 3.1 A pair (u, v) of functions is called a (classical) solution of (3.1)–(3.2) on [0, T), T > 0, if
\[
u \in C^2([0,T); H) \cap C^1([0,T); D(\sqrt{A_1})) \cap C([0,T); [D(A_1)]),
\]
\[
v \in C^2([0,T); H) \cap C^1([0,T); D(\sqrt{A_2})) \cap C([0,T); [D(A_2)]),
\]
satisfying (3.1)–(3.2) for t ∈ [0, T).

Basic assumptions:
(i) There exist constants a_1, a_2 > 0 such that
\[
(A_i u, u)_H \ge a_i \|u\|^2, \quad u \in D(A_i), \quad i = 1, 2.
\tag{3.3}
\]
(ii) D(B_i) ⊃ D(√A_i) (i = 1, 2),
\[
(B_2|_{D(A_2)})^* \supset B_1|_{D(A_1)},
\tag{3.4}
\]
\[
\beta \|\sqrt{A_i}\,u\| \le \|B_i u\| \le \beta_1 \|\sqrt{A_i}\,u\|, \quad u \in D(\sqrt{A_i}), \quad i = 1, 2,
\tag{3.5}
\]
for some positive constants β and β_1.
(iii) α ≥ 0 is a constant, and
\[
\int_0^\infty g(t)\,dt < 1, \qquad a_1\Big(1 - \int_0^\infty g(t)\,dt\Big) + \alpha - \beta_1^2 > 0.
\tag{3.6}
\]
(iv) f is a locally Lipschitz continuous gradient operator with f(0) = 0, and either
\[
(f(u), u)_H \le 0, \quad u \in D(\sqrt{A_1}),
\tag{3.7}
\]
or there exists an increasing continuous function ϕ : [0, ∞) → [0, ∞) with ϕ(0) = 0 such that
\[
|(f(u), u)_H| \le \phi\big(\|\sqrt{A_1}\,u\|\big)\,\|\sqrt{A_1}\,u\|^2, \quad \forall\, u \in D(\sqrt{A_1}).
\tag{3.8}
\]
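For a concrete kernel, assumption (iii) is easy to test. The sketch below reads condition (3.6) as "∫_0^∞ g(t)dt < 1 and a_1(1 − ∫_0^∞ g(t)dt) + α − β_1² > 0" and uses the exponential kernel g(t) = g_0 e^{−δt}, whose integral over [0, ∞) is g_0∕δ. Both the kernel and the function names are assumptions made for illustration and are not taken from [50].

```python
def exponential_kernel_mass(g0, delta):
    """Integral over [0, inf) of g(t) = g0 * exp(-delta * t), for g0 >= 0 and delta > 0."""
    return g0 / delta

def satisfies_condition_36(a1, alpha, beta1, kernel_mass):
    """Check kernel_mass < 1 and a1 * (1 - kernel_mass) + alpha - beta1**2 > 0."""
    return kernel_mass < 1 and a1 * (1 - kernel_mass) + alpha - beta1 ** 2 > 0

mass = exponential_kernel_mass(g0=0.5, delta=1.0)            # equals 0.5
weak_coupling = satisfies_condition_36(1.0, 0.0, 0.5, mass)  # 0.5 - 0.25 > 0
strong_coupling = satisfies_condition_36(1.0, 0.0, 0.8, mass)  # 0.5 - 0.64 < 0
```

In this toy setting the condition holds when the coupling bound β_1 is small relative to a_1(1 − ∫g) + α and fails once β_1² exceeds that quantity.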
In 1970, Dafermos [11] showed that the dissipation given by the memory effect asymptotically stabilizes the solutions of a single abstract Volterra equation. Since then, there have been many papers about the rates of uniform decay of various memory systems under some conditions on the memory kernels. In 1993, Lasiecka and Tataru [30] presented an algorithm for obtaining optimal decay rates for the energy function of the wave equation with a damping that is not "quantified" at the origin. The arguments in [30] were later applied also in the context of viscoelasticity. For the coupled system, we see that one equation can be regarded as a stabilizer for the other equation, that is, the dissipation produced by the memory term contained in one equation can be transmitted, through the coupling, to the other equation and bring the whole system to decay. Such indirectly damped systems were first investigated by Russell [44] in 1993, where the indirect damping was classified as displacement coupled stabilizer and velocity coupled stabilizer. In [1, 2], the authors studied the memory-dissipative (viscoelastic) linear single equation
\[
u''(t) + A_1 u(t) - \int_0^t g(t-s) A_1 u(s)\,ds = f(u(t))
\tag{3.9}
\]
with general kernel functions g satisfying
\[
g'(t) \le -K(g(t)) \quad \text{for a.e. } t \ge 0
\]
and some other assumptions, where K is a nonnegative, strictly convex, and continuously differentiable function on [0, g(0)], and obtained sharp energy decay rates for the memory-dissipative abstract operator differential equation. The work [29] presented an intrinsic method for determining decay rates (in terms of the function K) of solution energies of some abstract operator differential equations with memory. A single equation with singular convolution kernels possessing strongly positive definite primitives is investigated
in [10], and some exponential stability results are established. In [3], the authors dealt with a class of linear Timoshenko systems with memory and obtained the energy decay rate. As can be seen, the stability results stated in this section generalize and improve the corresponding results given in [1, 3]. In 2013, Xiao and Liang [50] studied the stability of a class of coupled systems of operator differential equations indirectly damped via memory effects in Hilbert spaces by means of operator theory and estimation techniques. This opened up a new way to investigate coupled systems effectively and led to a series of subsequent studies on the stability of various evolution equations (cf., e.g., [17, 18, 24–26, 34, 38, 41] and references therein). Specifically, Jin et al. [24] discussed the Cauchy problem for coupled second-order operator differential equations in a Hilbert space, with indirect memory damping:
\[ \begin{aligned} &u''(t) + Au(t) - \int_0^t g(t-s)Au(s)\,ds + \alpha u(t) + Bv(t) = f(u),\\ &v''(t) + Av(t) + Bu(t) = 0,\\ &u(0) = u_0,\quad u'(0) = u_1,\quad v(0) = v_0,\quad v'(0) = v_1, \end{aligned} \]
where $A$ is a positive self-adjoint operator in $H$, $B$ is a symmetric linear operator in $H$, and $g(t) \in L^1(0, +\infty)$ is the memory kernel, and obtained an optimal rate of uniform decay for the system energy under only basic conditions on the memory kernels. The same rate was also obtained for the corresponding single memory-dissipative second-order operator differential equations. Gao et al. [18] obtained uniform decay rates for a class of nonlinear acoustic wave motions with boundary and localized interior damping, where the damping and potential in the boundary displacement equation are nonlinear. Moreover, the nonlinear system in [18] contains the localized interior damping term, which indicates that there is a thin absorption material and flow resistance on the endophragm of the boundary. Mustafa [41] considered a nonlinear viscoelastic equation with relaxation function $g \in L^1(0, +\infty)$ satisfying the minimal condition
\[ g'(t) \le -\xi(t)K(g(t)), \tag{3.10} \]
where $K$ is an increasing and convex function near the origin and $\xi$ is a nonincreasing function, and obtained optimal explicit and general energy decay results for the nonlinear viscoelastic equation. Jin et al. [25] established uniform exponential and polynomial decay rates for the solutions to the
mixed initial-boundary value problem for semilinear wave equations with complementary frictional dampings and memory effects, under much weaker conditions concerning memory effects. In fact, Jin et al. [25] removed the fundamental condition that the memory-effect region includes a part of the system boundary, a condition that is necessary in the previous literature, and still obtained exponential and polynomial decay rates. Moreover, for the polynomial decay rates, only minimal conditions on the memory kernel function $g$ were assumed, without the usual assumption that $g'$ is controlled by $g$. Messaoudi and Hassan [38] considered a memory-type Bresse system with homogeneous Dirichlet-Neumann-Neumann boundary conditions and gave some decay results in the cases of equal and non-equal speeds of wave propagation. Feng and Soufyane [17] investigated the stability of a von Kármán plate equation with memory-type boundary conditions. Assuming that the relaxation functions $g_1$ and $g_2$ belong to $L^1(0, +\infty)$ and satisfy minimal conditions, they presented an optimal explicit and general energy decay result. Moreover, the energy result holds for $K(s) = s^p$ with the full admissible range $p \in [1, 2)$ instead of $[1, 3/2)$. Khemmoudj and Djaidja [26] considered a viscoelastic rotating Euler-Bernoulli beam that has one end fixed to a rotated motor in a horizontal plane and a tip mass at the other end. Under the condition that the relaxation function $g$ satisfies (3.10), together with other conditions, some optimal explicit and general energy decay results were proved. In [34], Li, Liang, and Xiao were concerned with the asymptotic behavior of solutions for linear wave equations with frictional damping only on the Wentzell boundary, but without any interior damping. With the help of the spectral approach, differential operator theory, the frequency domain method, and a precise analysis of the associated auxiliary system, Li et al. [34] obtained an ideal estimate of the resolvent of the generator of the system along the imaginary axis and the polynomial decay rate of the energy, which gives a solution to an open problem in the related field for linear systems.
3.1
Global Existence and Uniqueness
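Set
\[ w(t) = \begin{pmatrix} u(t) \\ v(t) \end{pmatrix}, \qquad w_0 = \begin{pmatrix} u_0 \\ v_0 \end{pmatrix}, \qquad w_1 = \begin{pmatrix} u_1 \\ v_1 \end{pmatrix}, \]
\[ \mathcal{A} = \begin{pmatrix} A_1 + \alpha & 0 \\ 0 & A_2 \end{pmatrix}, \qquad \mathcal{B}(t) = \begin{pmatrix} g(t)A_1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad F = \begin{pmatrix} f & -B_2 \\ -B_1 & 0 \end{pmatrix}. \]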
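Then the assumptions (i)–(iv) imply that $\mathcal{A}$ and every $\mathcal{B}(t)$ are self-adjoint linear operators in $\mathcal{H}$, $\mathcal{B}(t) \in W^{1,1}_{loc}(0, \infty; L([D(\mathcal{A})], \mathcal{H}))$, and for $w_1 \in D(\mathcal{A})$, $w_2 \in D(\sqrt{\mathcal{A}})$,
\[ \begin{aligned} &(\mathcal{A}w_1, w_1)_{\mathcal{H}} \ge \min\{a_1, a_2\}\|w_1\|^2,\\ &|(\mathcal{B}(t)w_1, w_2)_{\mathcal{H}}| \le g(t)\,\|\sqrt{\mathcal{A}}\,w_1\|\,\|\sqrt{\mathcal{A}}\,w_2\|, \qquad t \ge 0,\\ &|(\mathcal{B}'(t)w_1, w_2)_{\mathcal{H}}| \le |g'(t)|\,\|\sqrt{\mathcal{A}}\,w_1\|\,\|\sqrt{\mathcal{A}}\,w_2\|, \qquad \text{a.e. } t \ge 0. \end{aligned} \]
Moreover, $F$ is locally Lipschitz continuous from $D(\sqrt{\mathcal{A}})$ to $\mathcal{H}$, and (3.1)–(3.2) can be translated into
\[ \begin{aligned} &w''(t) + \mathcal{A}w(t) - \int_0^t \mathcal{B}(t-s)w(s)\,ds = F(w(t)), \qquad t \in [0, \infty),\\ &w(0) = w_0,\quad w'(0) = w_1, \end{aligned} \tag{3.11} \]
in the space $\mathcal{H} := H \times H$. For a solution $(u, v)$ of (3.1)–(3.2), we define its energy as
\[ \begin{aligned} E(t) = E_{u,v}(t) := {} & \frac12\|u'(t)\|^2 + \frac12\|v'(t)\|^2 + \frac12\Big(1 - \int_0^t g(s)\,ds\Big)\|\sqrt{A_1}\,u(t)\|^2\\ & + \frac12\int_0^t g(t-s)\|\sqrt{A_1}\,u(s) - \sqrt{A_1}\,u(t)\|^2\,ds + \frac12\|\sqrt{A_2}\,v(t)\|^2\\ & + (B_1u, v)_H + \frac{\alpha}{2}\|u(t)\|^2 - \int_0^1 (f(su(t)), u(t))_H\,ds. \end{aligned} \tag{3.12} \]
Then,
\[ E'(t) = -\frac12 g(t)\|\sqrt{A_1}\,u(t)\|^2 + \frac12\int_0^t g'(t-s)\|\sqrt{A_1}\,u(s) - \sqrt{A_1}\,u(t)\|^2\,ds \le 0. \]
This indicates that the energy is nonincreasing.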
Using (3.3), (3.4), (3.5), (3.6), and (3.7) or (3.8), we can prove the following global existence and uniqueness theorem with the help of (3.11), the energy $E(t)$ in (3.12) and (??) (see [50] for details).

Theorem 3.2 Let the assumptions (i)–(iv) hold. Then there exists $\rho > 0$ such that for $u_0 \in D(A_1)$, $v_0 \in D(A_2)$, and $u_1 \in D(\sqrt{A_1})$, $v_1 \in D(\sqrt{A_2})$, with
\[ \|u_1\|,\ \|v_1\|,\ \|\sqrt{A_1}\,u_0\|,\ \|\sqrt{A_2}\,v_0\| < \rho, \]
(3.1)–(3.2) has a unique solution $(u(t), v(t))$ on $[0, \infty)$, and
\[ E(t) \ge C_0\Big(\|u'(t)\|^2 + \|v'(t)\|^2 + \|\sqrt{A_1}\,u(t)\|^2 + \|\sqrt{A_2}\,v(t)\|^2 + \int_0^t g(t-s)\|\sqrt{A_1}\,u(s) - \sqrt{A_1}\,u(t)\|^2\,ds\Big), \]
for a constant $C_0 > 0$. Moreover, one can take $\rho = \infty$ in the case of (3.7).
3.2
Stability
In order to obtain stability of the coupled system (3.1)–(3.2), we need further assumptions on the relationship among the operators $A_1$, $A_2$, $B_1$, $B_2$, as well as on the kernel $g$, as follows.

(v) There exist a nonnegative functional $P$ on $B_1D(A_1) \cup B_2D(A_2)$ and two bounded linear operators $\Lambda_i$ on $H$, $i = 1, 2$, such that for $u \in D(A_1)$, $v \in D(A_2)$,
\[ |(A_1u, B_2v)_H - (B_1u, A_2v)_H| \le C\big(\|B_1u\|\|B_2v\| + P(B_1u)P(B_2v)\big), \]
and for $u \in D(A_i)$,
\[ (A_iu, \Lambda_iB_iu)_H + C_1\|B_iu\|^2 \ge P^2(B_iu), \qquad (u, \Lambda_iB_iu)_H \le C_1\|u\|^2, \]
where $C$, $C_1$ are positive constants.
(vi) There exists a nonnegative, strictly convex, and continuously differentiable function $K$ on $[0, g(0)]$, with $K(0) = K'(0) = 0$, such that
\[ g'(t) \le -K(g(t)) \quad \text{for a.e. } t \ge 0. \]
Moreover, there exists a natural number $j_0$ such that
\[ \int_0^\infty g^{1 - \frac{1}{j_0}}(s)\,ds < \infty. \]

Remark 3.3 When $A_1 = A_2$ and $B_1 = B_2 = \sqrt{A_1}$, the assumptions (ii) and (v) hold automatically, with $P = 0$, $\Lambda_i = 0$, $i = 1, 2$.

Theorem 3.4 Let the assumptions (i)–(vi) hold, and let $u_0, v_0, u_1, v_1, \rho$ be as in Theorem 3.2. Then

(1)
\[ E(t) \le M(E(0))E(0)\,g_0(t), \qquad \forall t \ge 0, \]
where $M(\cdot) : [0, \infty) \to (0, \infty)$ is a locally bounded function, and $g_0$ is a function decaying to $0$ as $t \to \infty$;

(2) if for each $j \in \{1, \ldots, j_0\}$, the function $\eta \mapsto \eta^{1-j}K(\eta^j)$ is strictly convex on $[0, g^{1/j}(0)]$,
\[ 0 < \liminf_{\eta \to 0^+} \frac{K(\eta)}{\eta K'(\eta)} \le \limsup_{\eta \to 0^+} \frac{K(\eta)}{\eta K'(\eta)} < 1, \]
and
\[ g'(t) = -K(g(t)) \quad \text{for a.e. } t \ge 0, \]
then
\[ E(t) \le M(E(0))E(0)\,g(t), \qquad \forall t \ge 0, \]
where $M(\cdot)$ is as in (1). In the case of $f = 0$, one can take $M \equiv \text{const}$.
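As a hedged illustration of assumption (vi) only (it is not taken from the source), consider the polynomial kernel $g(t) = (1+t)^{-p}$ with $p > 1$. A direct computation gives $g'(t) = -K(g(t))$ with $K(s) = p\,s^{1+1/p}$, the ratio $K(\eta)/(\eta K'(\eta))$ is the constant $p/(p+1) \in (0, 1)$, and $g^{1-1/j_0}$ is integrable exactly when $p(1 - 1/j_0) > 1$. The function names below are hypothetical helpers chosen for this sketch.

```python
def g(t, p):
    """Hypothetical example kernel g(t) = (1 + t)^(-p), p > 1."""
    return (1.0 + t) ** (-p)


def K(s, p):
    """K(s) = p * s^(1 + 1/p), chosen so that g'(t) = -K(g(t))."""
    return p * s ** (1.0 + 1.0 / p)


def K_prime(s, p):
    """Derivative of K: K'(s) = (p + 1) * s^(1/p)."""
    return (p + 1.0) * s ** (1.0 / p)


def ratio(s, p):
    """K(s) / (s K'(s)); for this K it equals p / (p + 1) for every s > 0."""
    return K(s, p) / (s * K_prime(s, p))


def smallest_j0(p):
    """Smallest natural number j0 with p * (1 - 1/j0) > 1.

    For g(t) = (1 + t)^(-p) this is the smallest j0 for which
    g^(1 - 1/j0) is integrable on [0, infinity).
    """
    if p <= 1.0:
        raise ValueError("no such j0 exists unless p > 1")
    j0 = 1
    while p * (1.0 - 1.0 / j0) <= 1.0:
        j0 += 1
    return j0


if __name__ == "__main__":
    p = 3.0
    print("ratio K/(eta K'):", ratio(0.3, p))
    print("smallest j0:", smallest_j0(p))
```

For this kernel the equality $g' = -K(g)$ holds, so part (2) of Theorem 3.4 is the relevant statement whenever its remaining hypotheses are satisfied.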
Theorem 3.4 can be proved by virtue of the following seven lemmas (see [50] for details). Throughout, we suppose that the assumptions (i)–(vi) hold and that $u_0, v_0, u_1, v_1, \rho$ are as in Theorem 3.2, and we do not repeat these assumptions in the lemmas.

Lemma 3.5 Let $(u(t), v(t))$ be the unique solution on $[0, \infty)$ of the coupled system (3.1)–(3.2) guaranteed by Theorem 3.2, and let $\omega(t) : [0, \infty) \to [0, \infty)$ be a decreasing and locally absolutely continuous function. Then for $T \ge S \ge 0$,
\[ \int_S^T \omega(t)\|v'(t)\|^2\,dt \le 3D_1\omega(0)E(S) + D_2\int_S^T \omega(t)\big(\|\sqrt{A_1}\,u(t)\|^2 + \|\sqrt{A_2}\,v(t)\|^2\big)\,dt, \]
where $D_1 > 0$ and $D_2 > 1$ are constants.

Lemma 3.6 Let $(u(t), v(t))$ be the unique solution on $[0, \infty)$ of the coupled system (3.1)–(3.2) guaranteed by Theorem 3.2, let $\omega(t) : [0, \infty) \to [0, \infty)$ be a decreasing and locally absolutely continuous function, and let
\[ J(u, v) := (A_1u(t) - g * A_1u(t), B_2v(t))_H - (B_1u(t) - g * B_1u(t), A_2v(t))_H. \]
Then there exists a constant $D_3 > 0$ such that for $T \ge S \ge 0$,
\[ \begin{aligned} \int_S^T \omega(t)\|\sqrt{A_2}\,v(t)\|^2\,dt \le {} & D_3\omega(0)E(S) + D_3\int_S^T \omega(t)\int_0^t g(t-s)\|\sqrt{A_1}\,u(s) - \sqrt{A_1}\,u(t)\|^2\,ds\,dt\\ & + D_3\int_S^T \omega(t)\big(\|u'(t)\|^2 + \|\sqrt{A_1}\,u(t)\|^2\big)\,dt + \frac16\int_S^T \omega(t)\|\sqrt{A_2}\,v(t)\|^2\,dt\\ & + \frac{1}{6D_2}\int_S^T \omega(t)\|v'(t)\|^2\,dt + \beta^{-2}\int_S^T \omega(t)|J(u, v)|\,dt, \end{aligned} \]
where the constant $D_2 > 1$ is given by Lemma 3.5.
Lemma 3.7 Let $(u(t), v(t))$ be the unique solution on $[0, \infty)$ of the coupled system (3.1)–(3.2) guaranteed by Theorem 3.2, let $\omega(t) : [0, \infty) \to [0, \infty)$ be a decreasing and locally absolutely continuous function, and let
\[ J(u, v) := (A_1u(t) - g * A_1u(t), B_2v(t))_H - (B_1u(t) - g * B_1u(t), A_2v(t))_H. \tag{3.13} \]
Then there exists a constant $C_4 > 0$ such that for $T \ge S \ge 0$,
\[ \begin{aligned} \int_S^T \omega(t)|J(u, v)|\,dt \le {} & C_4\omega(0)E(S) + C_4\int_S^T \omega(t)\int_0^t g(t-s)\|\sqrt{A_1}\,u(s) - \sqrt{A_1}\,u(t)\|^2\,ds\,dt\\ & + C_4\int_S^T \omega(t)\big(\|u'(t)\|^2 + \|\sqrt{A_1}\,u(t)\|^2\big)\,dt\\ & + \frac{\beta^2}{6}\int_S^T \omega(t)\|\sqrt{A_2}\,v(t)\|^2\,dt + \frac{\beta^2}{12D_2}\int_S^T \omega(t)\|v'(t)\|^2\,dt, \end{aligned} \]
where the constant $D_2 > 1$ is given by Lemma 3.5.

Lemma 3.8 Let $(u(t), v(t))$ be the unique solution on $[0, \infty)$ of the coupled system (3.1)–(3.2) guaranteed by Theorem 3.2, and let $\omega(t) : [0, \infty) \to [0, \infty)$ be a decreasing and locally absolutely continuous function. Then there exist positive constants $D_5$ and $D_6$ such that for $T \ge S \ge 0$,
\[ \begin{aligned} \int_S^T \omega(t)\big(\|\sqrt{A_1}\,u(t)\|^2 + \|\sqrt{A_2}\,v(t)\|^2\big)\,dt \le {} & D_6\omega(0)E(S) + D_6\int_S^T \omega(t)\int_0^t g(t-s)\|\sqrt{A_1}\,u(s) - \sqrt{A_1}\,u(t)\|^2\,ds\,dt\\ & + D_6\int_S^T \omega(t)\|u'(t)\|^2\,dt + \frac{1}{3D_2}\int_S^T \omega(t)\|v'(t)\|^2\,dt\\ & + \frac13\int_S^T \omega(t)\|\sqrt{A_2}\,v(t)\|^2\,dt, \end{aligned} \]
where the constant $D_2 > 1$ is given by Lemma 3.5.
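Lemma 3.9 Let $(u(t), v(t))$ be the unique solution on $[0, \infty)$ of the coupled system (3.1)–(3.2) guaranteed by Theorem 3.2, and let $\omega(t) : [0, \infty) \to [0, \infty)$ be a decreasing and locally absolutely continuous function. Then there exists a positive constant $C_7$ such that for $T \ge S \ge 0$,
\[ \int_S^T \omega(t)E(t)\,dt \le C_7\omega(0)E(S) + C_7\int_S^T \omega(t)\int_0^t g(t-s)\|\sqrt{A_1}\,u(s) - \sqrt{A_1}\,u(t)\|^2\,ds\,dt. \]

Lemma 3.10 Let
\[ G(\eta) = K(\eta^{j_0}), \quad \eta \in [0, g^{1/j_0}(0)], \qquad L(\zeta) = \begin{cases} \dfrac{G^\star(\zeta)}{\zeta}, & \zeta > 0,\\ 0, & \zeta = 0, \end{cases} \]
where $G^\star$ is the convex conjugate of $G$. Then there exists a positive constant $C_8$ such that for $T \ge S \ge 0$,
\[ \int_S^T L^{-1}\Big(\frac{E(t)}{\sigma}\Big)E(t)\,dt \le C_8E(S); \qquad E(t) \le \sigma L\bigg(\frac{1}{J^{-1}\big(\frac{t}{D_{10}}\big)}\bigg), \quad t \ge \frac{D_{10}}{G'(r_0)}, \tag{3.14} \]
where $r_0 := g^{1/j_0}(0)$, and
\[ J(\eta) := \frac{1}{G'(r_0)} + \int_{1/\eta}^{G'(r_0)} \frac{1}{\nu^2\big(1 - \Upsilon((G')^{-1}(\nu))\big)}\,d\nu, \quad \eta \ge \frac{1}{G'(r_0)}, \qquad \Upsilon(\zeta) := \frac{G(\zeta)}{\zeta G'(\zeta)}. \]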
Lemma 3.11 EðtÞ ≤ R~2 Eð0Þgj0 ðtÞ 2
for some constant $\tilde{R}_2 > 0$.

Remark 3.12 (1) The constant $M(E(0))E(0)$ in Theorem 3.4 above is more precise than that in [1, Theorems 2.1 and 2.2], in that it shows how the constant relates to $E(0)$. (2) Lemmas 3.10 and 3.11 are new even for the single equation case. (3) The condition
\[ \liminf_{\eta \to 0^+} \frac{K(\eta)}{\eta K'(\eta)} > 0 \]
in Theorem 3.4 above is weaker than the assumption
\[ \liminf_{\eta \to 0^+} \frac{K(\eta)}{\eta K'(\eta)} > \frac12 \]
used in [1]. (4) The condition on $g(t)$ is weaker than that used in [1, Theorems 2.1 and 2.2].
3.3
Applications to Concrete Equations
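Example Consider a Timoshenko beam of length $l$, and denote by $\varphi$ and $\psi$ the transverse displacement of the beam and the rotation angle of the beam element, respectively. The evolution of the pair $(\varphi, \psi)$ is described by the following system:
\[ \begin{aligned} &\rho_1\varphi_{tt} - \big(b_1(\xi)\varphi_\xi + q\psi\big)_\xi = 0 && \text{in } (0, l) \times (0, \infty),\\ &\rho_2\psi_{tt} - \big(b_2(\xi)\psi_\xi - g * b_2(\xi)\psi_\xi\big)_\xi + q\varphi_\xi + \alpha_1\psi = h(\psi) && \text{in } (0, l) \times (0, \infty), \end{aligned} \tag{3.15} \]
with the initial data
\[ \varphi(\cdot, 0) = \varphi_0, \quad \varphi_t(\cdot, 0) = \varphi_1, \quad \psi(\cdot, 0) = \psi_0, \quad \psi_t(\cdot, 0) = \psi_1 \quad \text{in } (0, l), \tag{3.16} \]
and subject to the boundary condition
\[ \begin{aligned} &\sigma_{11}\varphi(0) - \tau_{11}\varphi_\xi(0) = 0, \qquad \sigma_{12}\varphi(l) + \tau_{12}\varphi_\xi(l) = 0,\\ &\sigma_{21}\psi(0) - \tau_{21}\psi_\xi(0) = 0, \qquad \sigma_{22}\psi(l) + \tau_{22}\psi_\xi(l) = 0, \end{aligned} \tag{3.17} \]
where $\rho_1$, $\rho_2$, $q$, and $\alpha_1$ are positive constants, and $0 < b_1(\xi), b_2(\xi) \in C^1([0, l])$ satisfy
\[ \rho_1^{-1}b_1(\xi) = \rho_2^{-1}b_2(\xi), \quad \xi \in [0, l], \qquad \alpha_1 \ge \frac{q^2}{\min b_1}; \]
$g(t) : [0, \infty) \to (0, \infty)$ is a decreasing and locally absolutely continuous function such that
\[ \int_0^\infty g(t)\,dt < 1, \]
and $h : \mathbb{R} \to \mathbb{R}$ is locally Lipschitz continuous, with
\[ \xi h(\xi) \le 0, \qquad \forall \xi \in \mathbb{R}; \]
for $i, j = 1, 2$, the $\sigma_{ij}$, $\tau_{ij}$ in (3.17) are nonnegative constants such that $\sigma_{ij} + \tau_{ij} > 0$, and for $i = 1, 2$,
\[ \sigma_{i1} + \sigma_{i2} > 0, \qquad \tau_{1i}\tau_{2i} = 0. \]
Then, by our theorems above, we can show that for
\[ \psi_0 \in D(A_1), \quad \varphi_0 \in D(A_2), \quad \psi_1 \in D(\sqrt{A_1}), \quad \varphi_1 \in D(\sqrt{A_2}), \]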
the initial-boundary value problem (3.15)–(3.17) has a unique solution $(\psi, \varphi)$, and its energy $E(t)$ decays with various rates corresponding to the behavior of the memory kernel $g$.

Example Consider the system of coupled Petrovsky type equations as follows:
\[ \begin{aligned} &\partial_t^2u(t, \xi) + \Delta^2u(t, \xi) + \alpha u(t, \xi) - \int_0^t g(t-s)\Delta^2u(s, \xi)\,ds - \beta\Delta v(t, \xi) = u^\gamma(t, \xi), && t \ge 0,\ \xi \in \Omega,\\ &\partial_t^2v(t, \xi) + \Delta^2v(t, \xi) - \beta\Delta u(t, \xi) = 0, && t \ge 0,\ \xi \in \Omega,\\ &u(t, \xi) = v(t, \xi) = \Delta u(t, \xi) = \Delta v(t, \xi) = 0, && t \ge 0,\ \xi \in \partial\Omega,\\ &u(0, \xi) = u_0(\xi), \quad v(0, \xi) = v_0(\xi), && \xi \in \Omega,\\ &\partial_tu(0, \xi) = u_1(\xi), \quad \partial_tv(0, \xi) = v_1(\xi), && \xi \in \Omega, \end{aligned} \]
where $\Omega$ is a bounded domain in $\mathbb{R}^3$ with smooth boundary $\partial\Omega$, $\alpha \ge 0$, $\beta > 0$, and $\gamma > 1$. Then, we can use our theorems above to obtain various rates of energy decay based on the properties of $g$.
4 Operator Differential Equations Affected by Memory in Hilbert Spaces

In this section, we investigate the following control problem for operator differential equations affected by memory in Hilbert spaces,
\[ u''(t) + au'(t) + Au(t) + \int_0^t \big(M(t-s)A + N(t-s)\big)u(s)\,ds = 0, \qquad t \in [0, T], \tag{4.1} \]
subject to the boundary control
\[ Pu(t) = g(t), \qquad t \in [0, T], \tag{4.2} \]
where $T > 0$,
\[ a \in \mathbb{R}, \qquad M(\cdot) \in H^2(0, T), \qquad N(\cdot) \in L^2(0, T), \tag{4.3} \]
$A : D(A) \subset H \to H$ and $P : D(P) \subset H \to F$ are linear operators, and $H$ and $F$ are complex Hilbert spaces. We regard $P$ as a boundary operator and $g(t) \in F$ as the control function. The controllability of various concrete systems with memory in one space dimension, such as the heat and wave equations with memory, has been studied by many researchers (cf., e.g., [5, 6, 21, 32, 33, 36, 37, 40, 54] and the references therein). We study here the control problem for the abstract second-order operator differential equation above in (infinite-dimensional) Hilbert spaces, which covers many concrete viscoelastic equations and systems, in terms of operator theory, moment theory, and the Riesz property of a family of functions associated with the operator differential equation. The section is organized as follows. In Section 4.1, we give a definition of weak solutions to (4.1)–(4.2) and present the existence and uniqueness of solutions to the system. In Section 4.2, we study the controllability of the abstract second-order operator differential equations. Section 4.3 is devoted to the proof of the Riesz property theorem. In Section 4.4, applications illustrating our theoretical results are given, which show the exact boundary controllability of various viscoelastic wave and plate equations.
4.1
Existence and Uniqueness Theorem
First, we recall some basic notions to be used (cf. [7, 28, 53]).

Solvability Let $\{h_n\}$ be a family in $H$, and let $Q : D(Q) \subset H \to l^2$ be the operator defined by
\[ Q(f) = \{(f, h_n)_H\}, \qquad \text{with } D(Q) := \{f \in H : Q(f) \in l^2\}. \]
The moment problem
\[ Q(f) = \gamma \tag{4.4} \]
is said to be solvable if for any $\gamma = \{\gamma_n\} \in l^2$, there exists $f \in D(Q)$ satisfying (4.4).
Riesz Sequence A sequence $\{h_n\}$ in $H$ is said to be a Riesz sequence if there exist constants $c, C > 0$ such that
\[ c\,\|\{\alpha_n\}\|_{l^2} \le \Big\|\sum \alpha_nh_n\Big\|_H \le C\,\|\{\alpha_n\}\|_{l^2}, \]
whenever $\{\alpha_n\} \in l^2$. It is known that

(1) $\{h_n\}$ is a Riesz sequence if and only if it is the image of some orthonormal Riesz sequence under an isomorphic mapping.

(2) The moment problem (4.4) is solvable if the family $\{h_n\}$ forms a Riesz sequence in $H$, and a solution is given by
\[ f = \sum \gamma_n\tilde{h}_n, \tag{4.5} \]
where $\{\tilde{h}_n\}$ is any biorthogonal Riesz sequence of $\{h_n\}$.

Define the operator $A_0$ as the restriction of $A$ to $\ker(P)$. Then we have
\[ Pu = 0, \qquad \forall u \in D(A_0). \]
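The solution formula (4.5) for the moment problem (4.4) can be illustrated in a finite-dimensional real setting, where any linearly independent family is a Riesz sequence. The sketch below is an assumption-laden toy, not the construction of the source: it works in $\mathbb{R}^d$ with the Euclidean inner product and finds $f$ in the span of $\{h_n\}$ by solving the Gram system, which is equivalent to expanding in the biorthogonal family lying in that span. All function names are hypothetical.

```python
def inner(x, y):
    """Euclidean inner product on R^d (a real stand-in for (., .)_H)."""
    return sum(a * b for a, b in zip(x, y))


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def solve_moment_problem(h, gamma):
    """Return f in span{h_n} with (f, h_n) = gamma_n for every n.

    Writing f = sum_m c_m h_m turns the moment conditions into the
    Gram system G c = gamma, where G[n][m] = (h_n, h_m).
    """
    gram = [[inner(hn, hm) for hm in h] for hn in h]
    coeffs = solve(gram, gamma)
    dim = len(h[0])
    return [sum(c * hm[i] for c, hm in zip(coeffs, h)) for i in range(dim)]


if __name__ == "__main__":
    family = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
    moments = [2.0, 5.0]
    f = solve_moment_problem(family, moments)
    print(f, [inner(f, hn) for hn in family])
```

In infinite dimensions the Gram matrix is replaced by a bounded, boundedly invertible operator on $l^2$, which is exactly what the two-sided Riesz inequality guarantees.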
Basic assumptions.

(I) There exists $a_0 > 0$ such that $A_0 + a_0I$ is a strictly positive self-adjoint operator on $H$, with eigenvalues
\[ 0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \le \lambda_{n+1} \le \cdots, \]
and the corresponding normalized eigenfunctions $\{\phi_n\}_{n \in \mathbb{N}}$ form an orthogonal basis in $H$. Besides, there exist $s, m, M, N_0 > 0$ such that
\[ mn^s \le \lambda_n \le Mn^s \qquad \text{for } n \in \mathbb{N} \text{ and } n > N_0. \tag{4.6} \]

(II) $D(A) \subset D(P)$ and there exists a bounded linear operator $P_1 : D(A_0) \to F$ such that
\[ (Au, v)_H + (Pu, P_1v)_F = (u, Av)_H, \qquad u \in D(A),\ v \in D(A_0), \]
and there exists an $s_0 \in \mathbb{R}$ such that
\[ \Big\{ P_1\phi_n \big/ (1 + |\lambda_n|)^{\frac{s_0}{2}} \Big\} \ \text{is bounded in } F. \tag{4.7} \]
For a fixed real number $s$, we denote by $H_s$ the completion of the linear hull of the basis vectors $\{\phi_n\}$ with respect to the norm defined by
\[ \|u\|_{H_s}^2 = \sum_{k=1}^{+\infty} (1 + |\lambda_k|)^{2s}|u_k|^2 \qquad \text{for } u = \sum_{k=1}^{+\infty} u_k\phi_k. \tag{4.8} \]
It is easy to see that
\[ H_0 = H, \qquad H_1 = D(A_0), \]
and $H_{-s}$ is the dual space of $H_s$. For any $s > r$, we have $H_s \subset H_r$ with a dense and continuous inclusion. Each space $H_s$ has the orthogonal basis
\[ \{\phi_k^s = (1 + |\lambda_k|)^{-s}\phi_k : k = 1, 2, \cdots\}. \]
Clearly,
\[ A\phi_k^s = \lambda_k\phi_k^s, \qquad k = 1, 2, \cdots \]
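The weighted norm (4.8) is easy to evaluate for a finite expansion. The following minimal sketch assumes a truncated coefficient list and a hypothetical function name `hs_norm`; it also checks the monotonicity in $s$ and the duality bound between $H_s$ and $H_{-s}$ numerically for one sample.

```python
import math


def hs_norm(coeffs, eigenvalues, s):
    """Norm (4.8) of u = sum_k u_k phi_k, truncated to the given coefficients.

    coeffs      -- the coefficients u_k (real or complex numbers)
    eigenvalues -- the eigenvalues lambda_k, in the same order
    s           -- the real index of the space H_s
    """
    total = sum(
        (1.0 + abs(lam)) ** (2.0 * s) * abs(u) ** 2
        for u, lam in zip(coeffs, eigenvalues)
    )
    return math.sqrt(total)


if __name__ == "__main__":
    lam = [1.0, 4.0, 9.0]   # sample eigenvalues (an assumption for the demo)
    u = [1.0, 0.5, 0.25]
    w = [0.3, -0.2, 0.1]
    # The norms increase with s, reflecting the inclusion H_s in H_r for s > r.
    print([hs_norm(u, lam, s) for s in (0.0, 0.5, 1.0)])
    # Duality pairing bound: |sum u_k w_k| <= ||u||_{H_s} ||w||_{H_{-s}}.
    pairing = abs(sum(x * y for x, y in zip(u, w)))
    print(pairing, hs_norm(u, lam, 0.5) * hs_norm(w, lam, -0.5))
```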
It is not hard to get the following lemma.

Lemma 4.1 Let the assumptions (I) and (II) hold, $u_0 \in H_s$, and $u_1 \in H_{s - \frac12}$. Assume $u(\cdot)$ is the solution of the initial value problem
\[ \begin{aligned} &u''(t) + au'(t) + A_0u(t) + \int_0^t \big(M(t-s)A_0 + N(t-s)\big)u(s)\,ds = 0, \qquad t \in [0, T],\\ &u(0) = u_0,\quad u'(0) = u_1. \end{aligned} \tag{4.9} \]
Then,
\[ \|u(t)\|_{H_s} + \|u'(t)\|_{H_{s - \frac12}} \le C_T\Big(\|u_0\|_{H_s} + \|u_1\|_{H_{s - \frac12}}\Big), \qquad \forall t \in [0, T], \tag{4.10} \]
and when $u_0 \in H_{\frac{s_0}{2}}$ and $u_1 \in H_{\frac{s_0 - 1}{2}}$,
\[ \|P_1u(\cdot)\|_{L^2(0,T;F)} \le C_T\Big(\|u_0\|_{H_{\frac{s_0}{2}}} + \|u_1\|_{H_{\frac{s_0 - 1}{2}}}\Big), \tag{4.11} \]
with constants $C_T > 0$ independent of $u_0$ and $u_1$.
In order to give the definition of weak solutions to equation (4.1), we consider the following Cauchy problem for the dual equation:
\[ \begin{aligned} &v''(t) - av'(t) + A_0v(t) + \int_t^T M(s-t)A_0v(s)\,ds + \int_t^T N(s-t)v(s)\,ds = 0, \qquad t \in [0, T],\\ &v(T) = v_0,\quad v'(T) = v_1, \end{aligned} \tag{4.12} \]
where
\[ v_0 = \sum_{n=1}^{N_1} v_{0n}\phi_n, \qquad v_1 = \sum_{n=1}^{N_2} v_{1n}\phi_n; \]
this means that $v_0, v_1 \in D(A_0)$. Suppose $u(t)$ is the solution of (4.1) with null initial data, and $u(t) \in H_r$ for any $r$. Then we get
\[ \int_0^T \Big(u''(t) + au'(t) + Au(t) + \int_0^t \big(M(t-s)A + N(t-s)\big)u(s)\,ds,\ v(t)\Big)_H dt = 0. \]
Integrating by parts in $t$, using assumption (II) with $Pu(t) = g(t)$, and interchanging the order of integration, we obtain
\[ \begin{aligned} \int_0^T (u''(t), v(t))_H\,dt &= (u'(T), v_0)_H - (u(T), v_1)_H + \int_0^T (u(t), v''(t))_H\,dt,\\ \int_0^T (au'(t), v(t))_H\,dt &= a(u(T), v(T)) - \int_0^T a(u(t), v'(t))\,dt,\\ \int_0^T (Au(t), v(t))_H\,dt &= \int_0^T \big[(u(t), A_0v(t))_H - (g(t), P_1v(t))_F\big]\,dt, \end{aligned} \]
and
\[ \begin{aligned} \int_0^T \Big(\int_0^t M(t-s)Au(s)\,ds,\ v(t)\Big)_H dt &= \int_0^T\!\!\int_0^t M(t-s)\big[(u(s), A_0v(t))_H - (g(s), P_1v(t))_F\big]\,ds\,dt\\ &= \int_0^T\!\!\int_t^T M(s-t)\big[(u(t), A_0v(s))_H - (g(t), P_1v(s))_F\big]\,ds\,dt. \end{aligned} \]
Therefore, in view of (4.12),
\[ (u'(T), v_0)_H - (u(T), v_1 - av_0)_H = \int_0^T \Big(g(t),\ P_1v(t) + \int_t^T M(s-t)P_1v(s)\,ds\Big)_F dt. \]
Consequently, under the assumptions (I) and (II), we can give the following definition of weak solutions to equation (4.1).

Definition 4.2 Let $g \in L^2(0, T; F)$, $u_0 \in H_{\frac{1 - s_0}{2}}$, and $u_1 \in H_{-\frac{s_0}{2}}$. A function
\[ u \in C\big([0, T]; H_{\frac{1 - s_0}{2}}\big) \cap C^1\big([0, T]; H_{-\frac{s_0}{2}}\big) \]
is called a (weak) solution of equation (4.1) with boundary condition (4.2) and initial condition
\[ u(0) = u_0, \qquad u'(0) = u_1, \tag{4.13} \]
if there holds
\[ -\langle u_1, v(0)\rangle + (u_0, v_t(0)) - (u(S), v_1) + \langle u'(S), v_0\rangle = \int_0^S \Big(g(t),\ P_1v(t) + \int_t^S M(s-t)P_1v(s)\,ds\Big)_F dt \]
for any $v_0 \in H_{\frac{s_0}{2}}$, $v_1 \in H_{\frac{s_0 - 1}{2}}$, and $S \in [0, T]$. Here, $v(t)$ is the solution of the dual system:
\[ \begin{aligned} &v''(t) - av'(t) + A_0v(t) + \int_t^T \big(M(s-t)A_0 + N(s-t)\big)v(s)\,ds = 0, \qquad t \in [0, T],\\ &v(T) = v_0,\quad v'(T) = v_1; \end{aligned} \]
$\langle\cdot, \cdot\rangle$ and $(\cdot, \cdot)$ denote the duality pairings between $H_{\frac{s_0}{2}}$, $H_{-\frac{s_0}{2}}$, and $H_{\frac{s_0 - 1}{2}}$, $H_{\frac{1 - s_0}{2}}$, respectively.

In view of Lemma 4.1, we can obtain the existence and uniqueness theorem of weak solutions to equation (4.1).

Theorem 4.3 Let the assumptions (I) and (II) hold. Then for every initial data
\[ (u_0, u_1) \in H_{\frac{1 - s_0}{2}} \times H_{-\frac{s_0}{2}}, \]
there exists a unique weak solution $u(t)$ of system (4.1)–(4.2) with (4.13), which satisfies
\[ \|u(t)\|_{H_{\frac{1 - s_0}{2}}} + \|u'(t)\|_{H_{-\frac{s_0}{2}}} \le C_T\Big(\|u_0\|_{H_{\frac{1 - s_0}{2}}} + \|u_1\|_{H_{-\frac{s_0}{2}}} + \|g\|_{L^2(0,T;F)}\Big), \qquad \forall t \in [0, T], \]
with constants $C_T > 0$ independent of $u_0$, $u_1$ and $g$.
4.2
Controllability
Let $s_0 \ge 0$. We define a family $\chi = \{z_n(t) : n \in \mathbb{Z}'\}$ by
\[ z_n(t) = \begin{cases} \dfrac{e^{i\omega_nt}P_1\phi_{|n|}}{|\omega_n|^{s_0}}, & \text{if } \omega_n \ne 0,\\[2mm] P_1\phi_n, & \text{if } \omega_n = 0 \text{ and } n > 0,\\[1mm] tP_1\phi_{|n|}, & \text{if } \omega_n = 0 \text{ and } n < 0, \end{cases} \]
where
\[ \begin{aligned} &\omega_n := \sqrt{\lambda_n}, && \omega_{-n} := -\sqrt{\lambda_n}, && \text{if } \lambda_n > 0,\\ &\omega_n := \sqrt{|\lambda_n|}\,i, && \omega_{-n} := -\sqrt{|\lambda_n|}\,i, && \text{if } \lambda_n < 0. \end{aligned} \]

Assumptions:

(III) There exists a $T_0 > 0$ such that $\chi$ is a Riesz sequence in $L^2(0, T_0; F)$.

(III)′ Let $T_0 > 0$, $s_0 \ge 0$. For the solution $u(\cdot)$ of the initial value problem (4.9) with $a = 0$ and $M(t), N(t) \equiv 0$, there holds the observability inequality
\[ c\Big(\|u_0\|^2_{H_{\frac{s_0}{2}}} + \|u_1\|^2_{H_{\frac{s_0 - 1}{2}}}\Big) \le \|P_1u(\cdot)\|^2_{L^2(0,T_0;F)}, \tag{4.14} \]
with a constant $c > 0$ independent of $u_0$ and $u_1$.

Remark 4.4 When the assumption (III) holds, $\big\{P_1\phi_n / (1 + |\lambda_n|)^{\frac{s_0}{2}}\big\}$ must be bounded in $F$.

Theorem 4.5 Let the assumptions (I), (II), and (III) hold. Then the system (4.1) with null initial data and boundary control (4.2) is controllable at time $T \ge T_0$ and the control space is $H_{\frac{1 - s_0}{2}} \times H_{-\frac{s_0}{2}}$. That is, for any given final state $u_0 \in H_{\frac{1 - s_0}{2}}$ and $u_1 \in H_{-\frac{s_0}{2}}$, there exists a control function $g(t) \in L^2(0, T; F)$ driving the solution $(u, u')$ from the null initial state to the prescribed target at time $T$.

Corollary 4.6 Let the assumptions (I), (II), and (III)′ hold. Then the conclusion of Theorem 4.5 is true.

Before giving proofs of the above results, we show the connection between the controllability of the system and the solvability of a moment problem. For simplicity, we assume $\omega_n \ne 0$ and $s_0 = 1$, which does not make a crucial difference. We write the solution of the system (4.1)–(4.2) with null initial data as
\[ u(t) = \sum_{n=1}^{+\infty} u_n(t)\phi_n. \]
Using assumption (II) we get, for $n \in \mathbb{N}$,
\[ u_n''(t) + au_n'(t) + \omega_n^2u_n(t) + \omega_n^2\int_0^t M(t-s)u_n(s)\,ds + \int_0^t N(t-s)u_n(s)\,ds = f_n(t), \tag{4.15} \]
and $u_n(0) = u_n'(0) = 0$, where
\[ f_n(t) := (g(t), P_1\phi_n)_F + \int_0^t M(t-s)(g(s), P_1\phi_n)_F\,ds. \]
Let $e_{1n}(t)$ solve the homogeneous equation corresponding to (4.15) with initial condition
\[ e_{1n}(0) = 0, \qquad e_{1n}'(0) = 1, \qquad n \in \mathbb{N}; \]
then we have
\[ u_n(t) = \int_0^t e_{1n}(t-s)f_n(s)\,ds, \qquad u_n'(t) = \int_0^t e_{2n}(t-s)f_n(s)\,ds, \]
where $e_{2n}(t) := e_{1n}'(t)$ solves the homogeneous equation with initial condition $e_{2n}(0) = 1$, $e_{2n}'(0) = -a$. For any given final state $u_0 \in H$, $u_1 \in H_{-\frac12}$, we write
\[ u(T) = u_0 = \sum_{n=1}^{+\infty} \xi_n\phi_n, \qquad u'(T) = u_1 = \sum_{n=1}^{+\infty} \eta_n\phi_n, \]
where
\[ \xi_n := (u_0, \phi_n)_H, \qquad \eta_n := (u_1, \phi_n)_H. \]
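The variation-of-constants representation $u_n(t) = \int_0^t e_{1n}(t-s)f_n(s)\,ds$ can be checked numerically in the memory-free special case $M = N = 0$, which is an assumption made only for this sketch. In that case $e_{1n}(t) = e^{-\gamma t}\sin(ht)/h$ with $\gamma = a/2$ and $h = \sqrt{\omega_n^2 - \gamma^2}$ (assuming $\omega_n^2 > \gamma^2$), and for constant forcing $f_n \equiv 1$ the solution of $u'' + au' + \omega_n^2u = 1$ with null initial data is known in closed form. The function names are hypothetical.

```python
import math


def e1(t, a, omega):
    """Solution of e'' + a e' + omega^2 e = 0 with e(0) = 0, e'(0) = 1.

    Memory-free case M = N = 0; assumes omega^2 > (a/2)^2.
    """
    gamma = a / 2.0
    h = math.sqrt(omega ** 2 - gamma ** 2)
    return math.exp(-gamma * t) * math.sin(h * t) / h


def duhamel(t, f, a, omega, steps=2000):
    """Composite Simpson approximation of int_0^t e1(t - s) f(s) ds."""
    if t == 0.0:
        return 0.0
    n = steps if steps % 2 == 0 else steps + 1
    dx = t / n
    total = e1(t, a, omega) * f(0.0) + e1(0.0, a, omega) * f(t)
    for k in range(1, n):
        s = k * dx
        weight = 4.0 if k % 2 == 1 else 2.0
        total += weight * e1(t - s, a, omega) * f(s)
    return total * dx / 3.0


def exact_constant_forcing(t, a, omega):
    """Closed-form solution of u'' + a u' + omega^2 u = 1, u(0) = u'(0) = 0."""
    gamma = a / 2.0
    h = math.sqrt(omega ** 2 - gamma ** 2)
    decay = math.exp(-gamma * t)
    return (1.0 - decay * (math.cos(h * t) + gamma / h * math.sin(h * t))) / omega ** 2


if __name__ == "__main__":
    a, omega, t = 0.4, 3.0, 1.5
    print(duhamel(t, lambda s: 1.0, a, omega), exact_constant_forcing(t, a, omega))
```

With nonzero kernels $M$ and $N$ the function $e_{1n}$ is no longer explicit, but the convolution structure of the representation is unchanged.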
It is easy to see that $\{\xi_n\}, \{\frac{\eta_n}{\omega_n}\} \in l^2$. Define $\alpha_n$, $e_n(t)$ as
\[ \alpha_n = \eta_{|n|} + \omega_n\xi_{|n|}i, \qquad e_n(t) = e_{2|n|}(t) + \omega_ne_{1|n|}(t)i, \qquad n \in \mathbb{Z}'. \tag{4.16} \]
Then
\[ \Big\{\frac{\alpha_n}{|\omega_n|} : n \in \mathbb{Z}'\Big\} \in l^2, \]
and $\{e_n(t) : n \in \mathbb{Z}'\}$ satisfies
\[ \begin{aligned} &e_n''(t) + ae_n'(t) + \omega_n^2e_n(t) + \omega_n^2\int_0^t M(t-s)e_n(s)\,ds + \int_0^t N(t-s)e_n(s)\,ds = 0,\\ &e_n(0) = 1, \qquad e_n'(0) = -a + \omega_ni. \end{aligned} \tag{4.17} \]
This leads to a moment problem
\[ \int_0^T (f(s), \tilde{e}_n(T-s))_F\,ds = \frac{\alpha_n}{|\omega_n|}, \qquad n \in \mathbb{Z}', \tag{4.18} \]
where $f(t), \tilde{e}_n(t) \in L^2(0, T; F)$ are defined by
\[ f(t) = g(t) + \int_0^t M(t-s)g(s)\,ds, \qquad \tilde{e}_n(t) = \frac{e_n(t)P_1\phi_{|n|}}{|\omega_n|}, \quad n \in \mathbb{Z}'. \tag{4.19} \]
Since the map from $g(t)$ to $f(t)$ is bounded and boundedly invertible, the control problem is reduced to the moment problem. That is, for any $\{\xi_n\}, \{\frac{\eta_n}{\omega_n}\} \in l^2$, we need to find $f(t) \in L^2(0, T; F)$ satisfying (4.18). Thus, when the moment problem is solvable, the system is controllable. Accordingly, the Riesz property of $\{\tilde{e}_n(t)\}$ in Theorem 4.7 (2) below indicates that Theorem 4.5 is true.
Theorem 4.7 Let the assumptions (I) and (II) hold, and let $\{e_n(t) : n \in \mathbb{Z}'\}$ solve the initial value problem:
\[ \begin{aligned} &e_n''(t) + ae_n'(t) + \omega_n^2e_n(t) + \omega_n^2\int_0^t M(t-s)e_n(s)\,ds + \int_0^t N(t-s)e_n(s)\,ds = 0,\\ &e_n(0) = 1,\quad e_n'(0) = b + \omega_ni, && \text{if } \omega_n \ne 0,\\ &e_n(0) = 1,\quad e_n'(0) = 0,\quad e_{-n}(0) = 0,\quad e_{-n}'(0) = 1, && \text{if } \omega_n = 0, \end{aligned} \tag{4.20} \]
where
\[ a, b \in \mathbb{R}, \qquad M(t) \in H^2(0, T), \qquad N(t) \in L^2(0, T). \]
Then

(1) we have
\[ \sup_{n \in \mathbb{Z}',\ t \in [0,T]} |e_n(t)| < \infty, \qquad \sup_{n \in \mathbb{Z}',\ t \in [0,T]} |(1 + |\omega_n|)^{-1}e_n'(t)| < \infty; \tag{4.21} \]

(2) the sequence
\[ \Big\{\tilde{e}_n(t) := \frac{e_n(t)P_1\phi_{|n|}}{|\omega_n|^{s_0}} : \omega_n \ne 0\Big\} \cup \big\{\tilde{e}_n(t) := e_n(t)P_1\phi_{|n|} : \omega_n = 0\big\} \]
is a Riesz sequence in $L^2(0, T; F)$ for $T \ge T_0$, provided that the assumption (III) is also satisfied.

We close this section with a brief proof of Lemma 4.1 and Corollary 4.6, for the case of $\omega_n \ne 0$ and $s_0 = 1$ (without loss of generality), while Theorem 4.7 will be proved in the next section. Let $u(t)$ be the solution of the system (4.9) and
\[ u(t) = \sum_{n=1}^{+\infty} u_n(t)\phi_n, \qquad u_0 = \sum_{n=1}^{+\infty} a_n\phi_n, \qquad u_1 = \sum_{n=1}^{+\infty} b_n\phi_n. \]
It is clear that
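\[ \|u_0\|^2_{H_{\frac12}} \asymp \sum_{n=1}^{+\infty} \omega_n^2|a_n|^2, \qquad \|u_1\|^2_H = \sum_{n=1}^{+\infty} |b_n|^2, \]
and
\[ \|u(t)\|^2_{H_{\frac12}} \asymp \sum_{n=1}^{+\infty} \omega_n^2|u_n(t)|^2, \qquad \|u'(t)\|^2_H = \sum_{n=1}^{+\infty} |u_n'(t)|^2. \]
Let $e_n(t)$ satisfy (4.20) with $b = -a$; then we have
\[ u_n(t) = \frac12e_n(t)\Big(a_n + \frac{b_n}{\omega_ni}\Big) + \frac12e_{-n}(t)\Big(a_n + \frac{b_n}{\omega_{-n}i}\Big), \qquad n > 0. \tag{4.22} \]
Because $\{e_n(t) : n \in \mathbb{Z}'\}$ and $\{e_n'(t)/\omega_n : n \in \mathbb{Z}'\}$ are bounded by (4.21), we infer that inequalities (4.10) and (4.11) hold.

For Corollary 4.6, let $u(\cdot)$ be the solution of (4.9) with $a = M(t) = N(t) = 0$, and
\[ u_0 = \sum_{n=1}^{+\infty} \xi_n\phi_n, \qquad u_1 = \sum_{n=1}^{+\infty} \eta_n\phi_n. \]
It is easy to get
\[ P_1u(t) = \frac12\sum_{n \in \mathbb{Z}'} \Big(|\omega_n|\xi_{|n|} - \frac{|\omega_n|}{\omega_n}\eta_{|n|}i\Big)z_n(t). \]
From this we see that
\[ \text{(4.11)–(4.14) is equivalent to the Riesz property of } \{z_n(t) : n \in \mathbb{Z}'\} \text{ in } L^2(0, T_0; F). \tag{4.23} \]
Therefore, the assumptions (I), (II), and (III)′ imply the assumption (III), and so Corollary 4.6 is a direct consequence of Theorem 4.5.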
4.3
Proof of Theorem 4.7
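In this section, we prove Theorem 4.7 for the case of $s_0 = 1$ and $M(0) = 0$ (without loss of generality).

Proof Take $n_0 > N_0$ such that $\omega_{n_0}^2 > \gamma^2$. From (4.20) we have, for $|n| > n_0$,
\[ \begin{aligned} e_n(t) = {} & e^{(ih_n - \gamma)t} + c_{1n}e^{(ih_n - \gamma)t} - c_{1n}e^{(-ih_n - \gamma)t}\\ & - \frac{\omega_n^2}{h_n}\int_0^t e^{-\gamma(t-s)}\sin h_n(t-s)\int_0^s M(s-r)e_n(r)\,dr\,ds\\ & - \frac{1}{h_n}\int_0^t e^{-\gamma(t-s)}\sin h_n(t-s)\int_0^s N(s-r)e_n(r)\,dr\,ds, \end{aligned} \]
where
\[ \gamma := \frac{a}{2}, \qquad h_n := \sqrt{\omega_n^2 - \gamma^2}, \qquad h_{-n} := -\sqrt{\omega_n^2 - \gamma^2}, \qquad c_{1n} := \frac{b + \gamma + (\omega_n - h_n)i}{2ih_n}. \]
Next, we introduce a slightly simpler function
\[ v_n(t) = e^{\gamma t}e_n(t), \tag{4.24} \]
and study the properties of $v_n(t)$ instead of $e_n(t)$. Let
\[ M_1(t) = e^{\gamma t}M(t), \qquad N_1(t) = e^{\gamma t}N(t), \qquad M_1(0) = 0; \]
then for $|n| > n_0$, we have
\[ v_n(t) = e^{ih_nt} + I_{1n}(t) + \mathcal{M}_n * v_n - M_1 * v_n, \tag{4.25} \]
where
\[ I_{1n}(t) := c_{1n}e^{ih_nt} - c_{1n}e^{-ih_nt}, \qquad \mathcal{M}_n := -\frac{\gamma^2}{h_n^2}M_1 + \frac{\omega_n^2}{h_n^2}\cos h_nt * M_1' - \frac{1}{h_n}\sin h_nt * N_1. \]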
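The memory-free part of the representation of $e_n$ can be sanity-checked directly. The sketch below assumes $M = N = 0$ (so that only the first three terms survive) and verifies that $(1 + c_{1n})e^{(ih_n - \gamma)t} - c_{1n}e^{(-ih_n - \gamma)t}$ satisfies the differential equation in (4.20) together with $e_n(0) = 1$ and $e_n'(0) = b + \omega_ni$. The helper names are hypothetical.

```python
import cmath


def coefficients(a, b, omega):
    """Return (gamma, h, c1) as defined in the proof of Theorem 4.7."""
    gamma = a / 2.0
    h = cmath.sqrt(omega ** 2 - gamma ** 2)
    c1 = (b + gamma + (omega - h) * 1j) / (2j * h)
    return gamma, h, c1


def e_n_derivatives(t, a, b, omega):
    """Value, first and second derivative of the memory-free e_n at time t."""
    gamma, h, c1 = coefficients(a, b, omega)
    r_plus = 1j * h - gamma     # exponent of the first mode
    r_minus = -1j * h - gamma   # exponent of the second mode
    amp_plus, amp_minus = 1.0 + c1, -c1
    return tuple(
        amp_plus * r_plus ** k * cmath.exp(r_plus * t)
        + amp_minus * r_minus ** k * cmath.exp(r_minus * t)
        for k in range(3)
    )


def residual(t, a, b, omega):
    """e'' + a e' + omega^2 e, which should vanish when M = N = 0."""
    e, de, dde = e_n_derivatives(t, a, b, omega)
    return dde + a * de + omega ** 2 * e


if __name__ == "__main__":
    a, b, omega = 0.6, -0.6, 4.0
    e0, de0, _ = e_n_derivatives(0.0, a, b, omega)
    print(e0, de0, abs(residual(0.7, a, b, omega)))
```

The check uses only that $\pm ih_n - \gamma$ are the roots of $r^2 + ar + \omega_n^2 = 0$ and that $c_{1n}$ is fixed by the initial velocity.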
Since
\[ \cos h_nt * M_1' = \frac{1}{h_n}\Big(M_1'(0)\sin h_nt + \int_0^t \sin h_n(t-s)M_1''(s)\,ds\Big), \]
there exists a constant $C_1 > 0$ such that
\[ |I_{1n}(t)| \le \frac{C_1}{|h_n|}, \qquad |\mathcal{M}_n(t)| \le \frac{C_1}{|h_n|}, \qquad |n| > n_0,\ t \in [0, T]. \tag{4.26} \]
Using Gronwall’s inequality, we can also get a $C_2 > 0$ such that
\[ |v_n(t)| \le C_2, \qquad |n| > n_0,\ t \in [0, T]. \tag{4.27} \]
Thus, an easy deduction shows that (4.21) holds, and so the assertion (1) holds.

For the assertion (2), we first recall two concepts. Let $\{u_n\}$ and $\{v_n\}$ be two sequences in $H$. We say that $\{u_n\}$ is quadratically close to $\{v_n\}$ when
\[ \sum \|u_n - v_n\|_H^2 < +\infty. \tag{4.28} \]
A sequence $\{u_n\}$ in $H$ is said to be $\omega$-independent if
\[ \sum \alpha_nu_n = 0 \ (\text{norm convergent}) \quad \text{with } \{\alpha_n\} \in l^2 \tag{4.29} \]
implies $\{\alpha_n\} = 0$. Our proof will be based on the Bari theorem (cf. [19, 53]), which is as follows (with a slight change).

Proposition 4.8 Let $\{u_n\}_{n \ge 1}$ be an $\omega$-independent sequence in $H$, and let $\{v_n\}_{n \ge n_0}$ (for some $n_0 \in \mathbb{N}$) be a Riesz sequence in $H$. If $\{u_n\}_{n \ge n_0}$ is quadratically close to $\{v_n\}_{n \ge n_0}$, then $\{u_n\}_{n \ge 1}$ is also a Riesz sequence in $H$.

Also, we need the Paley-Wiener theorem stated below.

Lemma 4.9 Let $\{u_n\}$ be a Riesz sequence in a Hilbert space $H$. If $\{v_n\}$ is quadratically close to $\{u_n\}$, then there exists an $l > 0$ such that $\{v_n\}_{n > l}$ is a Riesz sequence.
We proceed in three steps to prove the assertion (2).

Step 1. Prove the following lemma.

Lemma 4.10 Let the assumptions (I), (II), and (III) hold, and set
\[ \tilde{z}_n(t) = \frac{e^{ih_nt}P_1\phi_{|n|}}{|\omega_n|}, \qquad |n| > n_0. \]
Then there exists $n_1 > n_0$ such that $\{\tilde{z}_n(t) : |n| > n_1\}$ is a Riesz sequence in $L^2(0, T; F)$ when $T \ge T_0$.

Clearly, we just need to prove that $\{\tilde{z}_n(t)\}$ is quadratically close to a Riesz sequence, according to Lemma 4.9. It is easy to see that
\[ \tilde{z}_n(t) = z_n(t) + \big(e^{i(h_n - \omega_n)t} - 1\big)z_n(t), \qquad |n| > n_0. \tag{4.30} \]
This implies the existence of a constant $C_3 > 0$ such that
\[ \|\tilde{z}_n(t) - z_n(t)\|_{L^2(0,T;F)} \le \frac{C_3}{|\omega_n|}, \qquad \forall t \in [0, T],\ |n| > n_0. \]
Noticing (4.6), we have
\[ \|\tilde{z}_n(t) - z_n(t)\|^2_{L^2(0,T;F)} \le \frac{C_3}{|n|^s}, \qquad \forall t \in [0, T],\ |n| > n_0. \]
Therefore, $\{\tilde{z}_n(t) : |n| > n_0\}$ and $\{z_n(t) : |n| > n_0\}$ are quadratically close when $s > 1$. Next, we indicate that the condition $s > 1$ is actually not crucial. By the assumption (III), $\{z_n(t) : n \in \mathbb{Z}'\}$ is a Riesz sequence in $L^2(0, T; F)$ when $T \ge T_0$. Fix $l > n_0$, and construct a bounded linear operator $B_l$ on $\operatorname{span}\{z_n(t) : n \in \mathbb{Z}'\}$ by
\[ B_lz_n(t) = 0, \quad \forall\, 0 < |n| \le l; \qquad B_lz_n(t) = i(h_n - \omega_n)tz_n(t), \quad \forall\, |n| > l. \]
Apparently one can extend $B_l$ by linearity and continuity to $L^2(0, T; F)$.
For $y = \sum_{|n| > l} \gamma_nz_n(t)$ in the closure of $\operatorname{span}\{z_n(t) : |n| > l\}$, we have
\[ m_0\sum_{|n| > l} |\gamma_n|^2 \le \|y\|^2_{L^2(0,T;F)} \le m_1\sum_{|n| > l} |\gamma_n|^2, \]
the numbers $m_0, m_1 > 0$ not depending on $l$. Observe that
\[ \|B_ly\|^2_{L^2(0,T;F)} \le \frac{C_3}{\omega_l^2}\sum_{|n| > l} |\gamma_n|^2 \le \frac{C_3}{m_0\omega_l^2}\|y\|^2_{L^2(0,T;F)}, \]
where $C_3 > 0$ is independent of $l$. Thus we can choose $l$ large enough to make
\[ \|B_l\| < \frac12. \]
Thus, $I + B_l$ is bounded and boundedly invertible on $L^2(0, T; F)$. Now, from (4.30) we have
\[ \tilde{z}_n(t) = (I + B_l)z_n(t) + r_n(t), \qquad |n| > l, \]
where
\[ r_n(t) := \big(e^{i(h_n - \omega_n)t} - 1\big)z_n(t) - B_lz_n(t). \]
Clearly,
\[ \|r_n(t)\|^2_{L^2(0,T;F)} \le \frac{C_4}{|\omega_n|^4} \le \frac{C_5}{|n|^{2s}}, \qquad \forall t \in [0, T],\ |n| > l. \]
Accordingly, when $s > \frac12$, $\{\tilde{z}_n(t) : |n| > l\}$ is quadratically close to $\{(I + B_l)z_n(t) : |n| > l\}$. But the latter is a Riesz sequence in $L^2(0, T; F)$ for $T \ge T_0$. Hence, so is $\{\tilde{z}_n(t) : |n| > n_1\}$ for some $n_1 > l$, according to Lemma 4.9. The argument can be iterated to every $s > 0$. For example, we can define $B_l$ by
\[ B_lz_n(t) = i(h_n - \omega_n)tz_n(t) - \frac{(h_n - \omega_n)^2t^2z_n(t)}{2}, \qquad \forall\, |n| > l. \]
Then the conclusion can be proved for $s > \frac13$.
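The estimates in Step 1 rest on the elementary fact that, for real $\omega_n > |\gamma|$, one has $h_n - \omega_n = -\gamma^2/(h_n + \omega_n)$, so that $|h_n - \omega_n| \le \gamma^2/\omega_n$. Hence the factor $e^{i(h_n - \omega_n)t} - 1$ in (4.30) is $O(1/\omega_n)$, and after subtracting its first-order term (the role played by $B_l$) the remainder is $O(1/\omega_n^2)$. The following sketch checks these two bounds numerically; the function names and sample values are assumptions made for illustration.

```python
import cmath
import math


def phase_gap(omega, a):
    """Return h - omega with h = sqrt(omega^2 - (a/2)^2); assumes omega > |a|/2."""
    gamma = a / 2.0
    return math.sqrt(omega ** 2 - gamma ** 2) - omega


def first_order_error(omega, a, t):
    """|exp(i (h - omega) t) - 1|, the size of the factor in (4.30)."""
    return abs(cmath.exp(1j * phase_gap(omega, a) * t) - 1.0)


def second_order_error(omega, a, t):
    """|exp(i d t) - 1 - i d t| with d = h - omega, the size of the factor in r_n."""
    d = phase_gap(omega, a)
    return abs(cmath.exp(1j * d * t) - 1.0 - 1j * d * t)


if __name__ == "__main__":
    a, horizon = 1.0, 3.0
    gamma_sq = (a / 2.0) ** 2
    for omega in (2.0, 5.0, 10.0, 100.0):
        print(
            omega,
            first_order_error(omega, a, horizon), gamma_sq * horizon / omega,
            second_order_error(omega, a, horizon), (gamma_sq * horizon / omega) ** 2 / 2.0,
        )
```

Squaring these bounds and using $\omega_n^2 = \lambda_n \ge mn^s$ from (4.6) gives the summability thresholds $s > 1$ and $s > \frac12$ quoted in the text.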
Step 2. We prove that the family
\[ \Big\{\tilde{v}_n(t) = \frac{v_n(t)P_1\phi_{|n|}}{|\omega_n|} : |n| > n_1\Big\} \]
is quadratically close to a Riesz sequence, where $v_n(t)$ is given in (4.24). From (4.25) it follows that
\[ v_n(t) + M_1 * v_n = e^{ih_nt} + I_{1n}(t) + \mathcal{M}_n * v_n. \]
Let $M_2(t)$ satisfy
\[ M_2 + M_1 * M_2 = M_1; \]
then
\[ (I + M_1*)(I - M_2*)x = x, \qquad \forall x \in L^2(0, T; F), \]
where $I \pm M_{1,2}*$ are bounded and boundedly invertible operators defined by
\[ (I \pm M_{1,2}*)x = x \pm M_{1,2} * x, \qquad \forall x \in L^2(0, T; F). \tag{4.31} \]
Thus, we know the existence of a sequence $\{\varepsilon_n(t)\}$ of bounded and continuous functions such that
\[ v_n(t) = e^{ih_nt} - M_2 * e^{ih_nt} + \frac{\varepsilon_n(t)}{\omega_n}. \tag{4.32} \]
Accordingly, we deduce
\[ \|\tilde{v}_n(t) - (I - M_2*)\tilde{z}_n(t)\|^2_{L^2(0,T;F)} \le \frac{C_5}{|\omega_n|^2} \le \frac{C_6}{|n|^s}, \qquad \forall\, |n| > n_1, \]
so that $\{\tilde{v}_n(t) : |n| > n_1\}$ is quadratically close to $\{(I - M_2*)\tilde{z}_n(t) : |n| > n_1\}$ when $s > 1$. For $s > \frac12$, we set $\hat{v}_n(t) = (I + M_1*)v_n(t)$. Then
\[ \hat{v}_n(t) = e^{ih_nt} + I_{1n}(t) + (I - M_2*)\mathcal{M}_n * \hat{v}_n, \]
and so
\[ \hat{v}_n(t) = e^{ih_nt} + I_{1n}(t) + (I - M_2*)\mathcal{M}_n * e^{ih_nt} + (I - M_2*)\mathcal{M}_n * I_{1n}(t) + (I - M_2*)^2\mathcal{M}_n^{*2} * \hat{v}_n, \tag{4.33} \]
where the exponent $2$ denotes iterated convolution. It is easy to see that
\[ M_1' * \cos h_nt * e^{ih_nt} = \frac{1}{h_n}\Big(M_1'(0)\sin h_nt * e^{ih_nt} + M_1'' * \sin h_nt * e^{ih_nt}\Big), \]
and
\[ \sin h_nt * e^{ih_nt} = \frac{1}{2i}te^{ih_nt} + \frac{1}{4h_n}\big(e^{ih_nt} - e^{-ih_nt}\big). \]
Since $\{\tilde{z}_n(t) : |n| > n_1\}$ is a Riesz sequence and $|c_{1n}| \le O\big(\frac{1}{|\omega_n|}\big)$, noticing the form of $\mathcal{M}_n$, we can define an operator $C_{l'}$, similar to the operator $B_l$ defined in Step 1, such that $C_{l'}\tilde{z}_n = 0$ for $|n| \le l'$ and
\[ C_{l'}\tilde{z}_n = \tilde{I}_{1n}(t) + (I - M_2*)\mathcal{M}_n * \tilde{z}_n(t), \qquad |n| > l', \]
where
\[ \tilde{I}_{1n}(t) := c_{1n}\tilde{z}_n(t) - c_{1n}\tilde{z}_{-n}(t). \tag{4.34} \]
Set
\[ \tilde{\delta}_n(t) = \delta_n(t)\frac{P_1\phi_{|n|}}{|\omega_n|} \qquad \text{with} \qquad \delta_n(t) := (I - M_2*)\mathcal{M}_n * I_{1n}(t) + (I - M_2*)^2\mathcal{M}_n^{*2} * \hat{v}_n. \]
Apparently
\[ \|\tilde{\delta}_n(t)\|^2_{L^2(0,T_0;F)} \le \frac{C_7}{|\omega_n|^4} \le \frac{C_8}{n^{2s}}, \qquad \forall t \in [0, T]; \]
thus
\[ \tilde{v}_n(t) = (I - M_2*)\tilde{z}_n(t) + (I - M_2*)C_{l'}\tilde{z}_n(t) + (I - M_2*)\tilde{\delta}_n(t), \qquad |n| > l'. \]
Hence, one can choose $l' > n_0$ large enough such that $\{\tilde{v}_n(t) : |n| > l'\}$ is quadratically close to a Riesz sequence $\{(I - M_2*)\tilde{z}_n(t) + (I - M_2*)C_{l'}\tilde{z}_n(t) : |n| > l'\}$ when $s > \frac12$. The argument can be iterated to every $s > 0$ by the same method.
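From Step 2, we see that $\{\tilde e_n(t) : |n| > n_1\}$ is quadratically close to a Riesz sequence in $L^2(0,T;F)$ ($T \ge T_0$), and so $\{\tilde e_n(t) : |n| > n_2\}$ is also a Riesz sequence for some $n_2 > n_1$, according to Lemma 4.9. Thus, it remains to show the ω-independence of $\{\tilde e_n(t)\}$ in $L^2(0,T;F)$.
Let $\{\alpha_n\} \in l^2$ satisfy
\[ \sum_{n \in \mathbb{Z}'} \alpha_n \tilde e_n(t) = 0, \]
then
\[ \sum_{n \in \mathbb{Z}'} \alpha_n \tilde v_n(t) = 0. \]
Set
\[ G(t) = \sum_{n \in \mathbb{Z}'} \alpha_n \tilde z_n(t) = -\sum_{n \in \mathbb{Z}'} \alpha_n \tilde I_{1n}(t) - \sum_{n \in \mathbb{Z}'} \alpha_n M_n * \tilde v_n(t). \]
We will prove $G(t) \in H^1(0,T;F)$ when $T \ge T_0$.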
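Noting (4.34) we have
\[ \sum_{n \in \mathbb{Z}'} \alpha_n \tilde I_{1n}(t) = \sum_{n \in \mathbb{Z}'} \alpha_n c_{1n}\tilde z_n(t) - \sum_{n \in \mathbb{Z}'} \alpha_n c_{1n}\tilde z_{-n}(t). \]
Since $\{\tilde z_n(t)\}$ is Riesz and $|c_{1n}| \le O\bigl(\frac{1}{\omega_n}\bigr)$, we get
\[ \sum_{n \in \mathbb{Z}'} \alpha_n \tilde I_{1n}'(t) = i\sum_{n \in \mathbb{Z}'} \alpha_n h_n c_{1n}\tilde z_n(t) + i\sum_{n \in \mathbb{Z}'} \alpha_n c_{1n} h_n \tilde z_{-n}(t). \]
Thus the first term of $G(t)$ is in $H^1(0,T;F)$. For the second term of $G(t)$, we consider
\[ R_n(t) = \cos h_n t * M_1' * v_n, \qquad \tilde R_n(t) = R_n(t)\,\frac{P_1\phi_{|n|}}{|\omega_n|}, \]
then
\[ R_n'(t) = M_1'(0)\cos h_n t * v_n + \cos h_n t * M_1'' * v_n. \]
Hence, using (4.32) gives
\begin{align*}
R_n'(t) = {} & M_1'(0)\cos h_n t * e^{ih_n t} - M_1'(0) M_2 * \cos h_n t * e^{ih_n t} + \frac{M_1'(0)}{\omega_n}\cos h_n t * \varepsilon_n \\
& + M_1'' * \cos h_n t * e^{ih_n t} - M_2 * M_1'' * \cos h_n t * e^{ih_n t} + \frac{M_1''}{\omega_n} * \cos h_n t * \varepsilon_n,
\end{align*}
and
\[ \cos h_n t * e^{ih_n t} = \frac{1}{2}\, t e^{ih_n t} + \frac{1}{4h_n i}\, e^{ih_n t} - \frac{1}{4h_n i}\, e^{-ih_n t}. \]
Thus $\sum_{n \in \mathbb{Z}'} \alpha_n \tilde R_n'(t)$ converges when $s > \frac{1}{2}$. Observing the form of $M_n$, one knows that the second term of $G(t)$ is in $H^1(0,T;F)$. The same conclusion can be proved for all $s > 0$. For example, by (4.33) we see there exists a sequence $\{\varepsilon_{1n}(t)\}$ of bounded continuous functions such that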
\[ v_n(t) = (I - M_2 *)\bigl[e^{ih_n t} + I_{1n}(t) + (I - M_2 *)M_n * e^{ih_n t}\bigr] + \frac{\varepsilon_{1n}(t)}{\omega_n^2}. \]
Using this, we can repeat the above process and justify the conclusion for the case $s > \frac{1}{4}$.
Now, we recall the following result (cf. [32, Lemma 3.4]).
Lemma 4.11 Let $K$ be a Hilbert space and $\{k_n\}$ be a sequence in $K$. Let $h_n$ be a sequence of real numbers such that $\{e^{ih_n t}k_n\}$ is a Riesz sequence in $L^2(0,T;K)$. If
\[ \sum_n \alpha_n e^{ih_n t} k_n \in H^1(0,T+h;K) \qquad (h > 0), \]
then $\{\alpha_n h_n\} \in l^2$.
Using Lemma 4.11 and the fact that $G(t) \in H^1(0,T;F)$, we know that $\{\alpha_n\omega_n\} \in l^2$. The same procedure shows $G(t) \in H^2(0,T;F)$ and $\{\alpha_n\omega_n^2\} \in l^2$. By the definition of $\tilde e_n(t)$, we have
\[ \sum_{n \in \mathbb{Z}'} \alpha_n\omega_n^2 \tilde e_n(t) + \int_0^t M(t-s) \sum_{n \in \mathbb{Z}'} \alpha_n\omega_n^2 \tilde e_n(s)\,ds = 0, \]
which gives
\[ \sum_{n \in \mathbb{Z}'} \alpha_n\omega_n^2 \tilde e_n(t) = 0. \]
By induction we obtain
\[ \{\alpha_n\omega_n^{2k}\} \in l^2, \qquad k = 1, 2, 3, \cdots \]
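Next, choose $n_3 > n_2$ such that
\[ \omega_{n_3} \ne \omega_i, \qquad \forall i \le n_3 - 1. \]
We construct $\{\alpha_n^{(k)}\}$ by
\[ \alpha_n^{(1)} = (\omega_n^2 - \omega_1^2)\alpha_n, \qquad \alpha_n^{(k)} = (\omega_n^2 - \omega_k^2)\alpha_n^{(k-1)}, \qquad k = 1, 2, 3, \cdots \]
so that
\[ \sum_{|n| > n_3} \alpha_n^{(n_3)}\omega_n^2 \tilde e_n(t) = 0. \]
Recalling that $\{\tilde e_n(t) : |n| > n_2\}$ is a Riesz sequence in $L^2(0,T;F)$ ($T \ge T_0$), we see that $\alpha_n^{(n_3)} = 0$ for $|n| > n_3$. This implies
\[ \alpha_n = 0, \qquad \forall |n| > n_3. \]
Consequently,
\[ \sum_{|n| \le n_3} \alpha_n \tilde e_n(t) = 0. \]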
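If $\alpha_n = 0$ for all $|n| \le n_3$, then the proof of the ω-independence of $\{\tilde e_n(t)\}$ is complete. Otherwise, $\{\tilde e_n(t)\}_{|n| \le n_3}$ is linearly dependent, which leads to a contradiction. Indeed, when $\{\tilde e_n(t)\}_{|n| \le n_3}$ is linearly dependent, one has a subset $\mathbb{Z}_0$ of $\{n \in \mathbb{Z}' : |n| \le n_3\}$ such that $\{\tilde e_n(t)\}_{n \in \mathbb{Z}_0}$ is also linearly dependent, but none of its proper subsets is. Hence, there exist numbers $h_n \ne 0$ ($n \in \mathbb{Z}_0$) satisfying
\[ \sum_{n \in \mathbb{Z}_0} h_n \tilde e_n(t) = 0, \qquad t \in [0,T]. \]
From this we see
\[ \sum_{n \in \mathbb{Z}_0} h_n\omega_n^2 \tilde e_n(t) + \int_0^t M(t-s) \sum_{n \in \mathbb{Z}_0} h_n\omega_n^2 \tilde e_n(s)\,ds = 0, \]
so that
\[ \sum_{n \in \mathbb{Z}_0} h_n\omega_n^2 \tilde e_n(t) = 0. \]
When there exist $k_1, k_2 \in \mathbb{Z}_0$ such that $\omega_{k_1}^2 \ne \omega_{k_2}^2$, we deduce
\[ \sum_{n \in \mathbb{Z}_0,\ n \ne k_1} h_n(\omega_n^2 - \omega_{k_1}^2)\tilde e_n(t) = 0, \tag{4.35} \]
which contradicts the minimality of $\mathbb{Z}_0$. On the other hand, when $\omega_n^2 = \lambda$ for all $n \in \mathbb{Z}_0$, we observe that $e_n(t)$ and $e_{-n}(t)$ are linearly independent for each $n$. Take $\tilde n \in \mathbb{Z}_0$. Then there exists $t_0 \in [0,T]$ such that
\[ h_{\tilde n}\tilde e_{\tilde n}(t_0) \ne 0 \ \text{ if } -\tilde n \notin \mathbb{Z}_0, \quad\text{or}\quad h_{\tilde n}\tilde e_{\tilde n}(t_0) + h_{-\tilde n}\tilde e_{-\tilde n}(t_0) \ne 0 \ \text{ if } -\tilde n \in \mathbb{Z}_0. \tag{4.36} \]
Setting
\[ \psi := \begin{cases} \displaystyle\sum_{n \in \mathbb{Z}_0} h_n e_n(t_0)\,\frac{\phi_{|n|}}{\sqrt{|\lambda|}} & \text{if } \lambda \ne 0, \\[2ex] \displaystyle\sum_{n \in \mathbb{Z}_0} h_n e_n(t_0)\,\phi_{|n|} & \text{if } \lambda = 0, \end{cases} \]
we have $P_1\psi = 0$ from (4.35). Also, we see by (4.36) that $\psi \ne 0$, and hence it is an eigenvector of $A_0$. Let $u$ be the solution of
\[ u'' + A_0 u = 0, \qquad u(0) = \psi, \qquad u'(0) = 0; \]
then $u(t) = \psi\cos\sqrt{\lambda}\,t$, and
\[ P_1 u = \cos\sqrt{\lambda}\,t\; P_1\psi = 0. \]
In view of (4.23), we have the observability inequality (4.14), which shows that $P_1 u(t) = 0$ implies $u(0) = u'(0) = 0$. This contradicts $\psi \ne 0$. Therefore, the whole proof of Theorem 4.7 is finished. □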
4.4 Application to Concrete Equations
First, we consider the following wave equation:
\[ u_{tt}(x,t) - \Delta u(x,t) + b\,u(x,t) - \int_0^t M(t-s)\Delta u(x,s)\,ds + \int_0^t N(t-s)u(s)\,ds = 0 \tag{4.37} \]
in $\Omega \times (0,T)$, with boundary condition
\[ u(t) = g(t) \ \text{on } \Gamma_0 \times (0,T), \qquad u(t) = 0 \ \text{on } \Gamma_1 \times (0,T), \tag{4.38} \]
where
\[ b \in \mathbb{R}, \quad g \in L^2\bigl(0,T;L^2(\Gamma_0)\bigr), \quad M(\cdot) \in H^2_{loc}(0,+\infty), \quad N(\cdot) \in L^2_{loc}(0,+\infty), \tag{4.39} \]
$T$ is a given positive time and $\Omega$ is a bounded open set of $\mathbb{R}^d$ with smooth boundary $\partial\Omega = \Gamma_0 \cup \Gamma_1$, $\Gamma_0$ being nonempty and $\Gamma_0 \cap \Gamma_1 = \emptyset$. In order to put the concrete system into our abstract setting, we take
\[ H = L^2(\Omega), \quad F = L^2(\Gamma_0), \quad A = -\Delta + bI \ \text{ with } D(A) := H^2(\Omega) \cap \{u \in H^1(\Omega) : u|_{\Gamma_1} = 0\}, \]
and
\[ P : u \mapsto u|_{\Gamma_0}, \quad \forall u \in H^1(\Omega), \qquad D(A_0) = H^2(\Omega) \cap H^1_0(\Omega). \]
Then we can write (4.37) as
\[ u_{tt}(t) + Au(t) + au'(t) + \int_0^t M(t-s)Au(s)\,ds + \int_0^t \bigl(N(t-s) - bM(t-s)\bigr)u(s)\,ds = 0, \tag{4.40} \]
with boundary control $Pu(t) = g(t)$. Take
\[ A_0 = -\Delta \ \text{ with } D(A_0) := H^2(\Omega) \cap H^1_0(\Omega). \]
For $\{\lambda_n\}$, the eigenvalues of the operator $A_0$, we have the following asymptotic estimate: there exist $m_0, m_1 > 0$ such that
\[ m_0 n^{2/d} < \lambda_n < m_1 n^{2/d}; \tag{4.41} \]
see [39, Page 192].
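The two-sided estimate (4.41) can be observed numerically in a simple case. The sketch below is an illustration under an assumption that is not part of the chapter: it takes $\Omega = (0,\pi)^2$, so that $d = 2$ and the Dirichlet eigenvalues of $-\Delta$ are $j^2 + k^2$ with $j, k \ge 1$, and it checks that $\lambda_n / n^{2/d} = \lambda_n / n$ stays between two positive constants. The function names are hypothetical.

```python
import math


def dirichlet_eigenvalues_square(count):
    """First `count` Dirichlet eigenvalues j^2 + k^2 of -Laplace on (0, pi)^2.

    Eigenvalues are listed in nondecreasing order and repeated by multiplicity.
    Indices up to `count` are enough: the pairs (1,1), ..., (count,1) already
    give `count` values below every eigenvalue with an index larger than `count`.
    """
    box = range(1, count + 1)
    values = sorted(j * j + k * k for j in box for k in box)
    return values[:count]


def ratio_bounds(count):
    """Smallest and largest value of lambda_n / n for n = 1, ..., count."""
    eigs = dirichlet_eigenvalues_square(count)
    ratios = [lam / n for n, lam in enumerate(eigs, start=1)]
    return min(ratios), max(ratios)


if __name__ == "__main__":
    low, high = ratio_bounds(200)
    print(low, high)
```

For this domain the lower constant can be taken as $4/\pi$, since the lattice points $(j,k)$ with $j^2 + k^2 \le \lambda$ correspond to disjoint unit squares inside a quarter disc of area $\pi\lambda/4$.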
It is clear that the assumptions (I) and (II) hold with $s_0 = 1$ and
\[ P_1 : v \mapsto -\frac{\partial v}{\partial\mu}\Big|_{\Gamma_0}, \quad \forall v \in D(A_0); \qquad H_{\frac{1}{2}} = H^1_0(\Omega), \quad H_0 = L^2(\Omega), \]
where $\mu$ denotes the unit outward normal vector to $\Gamma$. When $M(t) = N(t) = 0$, the controllability and observability results for the system have been studied (cf. [27, 36]). Under suitable restrictions on the boundary part $\Gamma_0$ and the time $T$, the system is controllable. For example, let $x_0 \in \mathbb{R}^d$, $m(x) = x - x_0$,
\[ \Gamma_0 = \{x \in \Gamma \mid m(x)\cdot\mu(x) > 0\}, \qquad \Gamma_1 = \{x \in \Gamma \mid m(x)\cdot\mu(x) \le 0\}. \tag{4.42} \]
Then there exists $T_0 > 0$ such that the system is controllable at time $T > T_0$ and the control space is $L^2(\Omega) \times H^{-1}(\Omega)$. Therefore, the assumption (III′) is satisfied with
\[ H_{\frac{s_0}{2}} = H^1_0(\Omega), \qquad H_{\frac{s_0 - 1}{2}} = L^2(\Omega). \]
Thus we can apply Corollary 4.6 to conclude that under condition (4.42), the system (4.37) with (4.38) is controllable when $T > T_0$ and the control space is $L^2(\Omega) \times H^{-1}(\Omega)$. The above result is known for $N = 0$ (cf. [42]). Clearly, Corollary 4.6 can be applied to the system (4.37) with other boundary conditions as well, such as Neumann boundary controls.
Remark 4.12 Green et al. [21] extended the boundary observability inequality for (4.37) (with $b = N(t) = 0$) to the case of an arbitrary space dimension; the earlier results were restricted to $d = 1$ (in [5, 37]) and $d \le 3$ (in [42]). The authors of [21] state about their approach, “We do not expect to be able to extend the quadratically close property to arbitrary dimensions. Rather, we incorporate the estimates on . . .” On the other hand, we see from the proof of Theorem 4.7 (2) that the quadratically close property can indeed be extended to arbitrary dimensions.
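The partition (4.42) of the boundary is purely geometric and can be made concrete in a toy case. The sketch below assumes, for illustration only, that $\Omega$ is the unit disc in $\mathbb{R}^2$, so that the outward normal at a boundary point $x$ is $\mu(x) = x$; the function names and the sample points $x_0$ are hypothetical and not taken from the chapter.

```python
import math


def m_dot_mu(theta, x0):
    """Value of m(x) . mu(x) at x = (cos theta, sin theta) on the unit circle.

    Here m(x) = x - x0 and, for the unit disc, mu(x) = x.
    """
    x = (math.cos(theta), math.sin(theta))
    return (x[0] - x0[0]) * x[0] + (x[1] - x0[1]) * x[1]


def in_gamma0(theta, x0):
    """True if the boundary point with angle theta belongs to Gamma_0 in (4.42)."""
    return m_dot_mu(theta, x0) > 0.0


def gamma0_fraction(x0, samples=3600):
    """Fraction of equally spaced boundary points that fall into Gamma_0."""
    hits = sum(in_gamma0(2 * math.pi * k / samples, x0) for k in range(samples))
    return hits / samples


if __name__ == "__main__":
    print(gamma0_fraction((0.0, 0.0)))  # x0 at the centre of the disc
    print(gamma0_fraction((2.0, 0.0)))  # x0 outside the disc
```

With $x_0$ at the centre, $m\cdot\mu = 1$ everywhere and the control acts on the whole boundary; with $x_0 = (2, 0)$ the condition reads $1 - 2\cos\theta > 0$, so the arc facing $x_0$ belongs to $\Gamma_1$ and is left uncontrolled.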
Second, we consider the Petrovsky system:
\[ u_{tt}(x,t) + \Delta^2 u(x,t) + a\,u_t(x,t) + \int_0^t M(t-s)\Delta^2 u(x,s)\,ds + \int_0^t N(t-s)u(s)\,ds = 0 \tag{4.43} \]
in $\Omega \times (0,T)$, with boundary conditions
\[ \frac{\partial u}{\partial\mu} = g(t) \ \text{on } \Gamma_0 \times (0,T); \qquad \frac{\partial u}{\partial\mu} = 0 \ \text{on } \Gamma_1 \times (0,T); \qquad u = 0 \ \text{on } \Gamma \times (0,T), \tag{4.44} \]
where $a \in \mathbb{R}$, and $M$, $N$, $\Omega$, $\Gamma_0$, $\Gamma_1$ are as for System (4.37). Let
\[ H = L^2(\Omega), \quad F = L^2(\Gamma_0), \quad A = \Delta^2 \ \text{ with } D(A) := H^4(\Omega) \cap \Bigl\{u \in H^2(\Omega) : u|_{\Gamma} = 0,\ \frac{\partial u}{\partial\mu}\Big|_{\Gamma_1} = 0\Bigr\}, \]
and
\[ P : u \mapsto \frac{\partial u}{\partial\mu}\Big|_{\Gamma_0}, \qquad \forall u \in H^2(\Omega). \]
Apparently, the assumptions (I) and (II) are satisfied with $s_0 = 1$, and
\[ P_1 : v \mapsto -\Delta v|_{\Gamma_0}, \quad \forall v \in D(A_0) = H^4(\Omega) \cap H^2_0(\Omega), \qquad H_{\frac{1}{2}} = H^2_0(\Omega), \quad H_0 = L^2(\Omega). \]
We note that controllability for the case of $a = M(t) = N(t) = 0$ is studied in [27] when $\Gamma_0$ is a suitable subset of $\Gamma$. A notable fact is that the control time $T > 0$ is arbitrary (first proved in [54]). When $\Gamma_0$ and $\Gamma_1$ satisfy (4.42), the control space is $L^2(\Omega) \times \bigl(H^2_0(\Omega)\bigr)'$. Thus, applying Corollary 4.6, we assert that the Petrovsky system (4.43) with (4.44) is controllable at any time $T > 0$, and the control space is $L^2(\Omega) \times \bigl(H^2_0(\Omega)\bigr)'$.
Consider the boundary conditions
\[ u = g(t) \ \text{on } \Gamma_0 \times (0,T); \qquad u = 0 \ \text{on } \Gamma_1 \times (0,T); \qquad \frac{\partial u}{\partial\mu} = 0 \ \text{on } \Gamma \times (0,T), \]
instead of (4.44); we see that the assumptions (I) and (II) hold with $s_0 = \frac{3}{2}$, and
\[ P : u \mapsto u|_{\Gamma_0}, \qquad P_1 : u \mapsto \frac{\partial\Delta u}{\partial\mu}. \tag{4.45} \]
The controllability result for the case of $a = M(t) = N(t) = 0$ is investigated in [31] when $\Gamma_0 = \partial\Omega$. For any $T > 0$, the control space is $H_{-\frac{1}{4}} \times H_{-\frac{3}{4}}$ ($H_s$ being defined in Section 4.1). Thus Corollary 4.6 tells us that the same controllability is inherited by the system (4.43) with (4.45).
Acknowledgements This work was completed with the support of the National Natural Science Foundation of China (11971306, 12171094, 11831011) and the Shanghai Key Laboratory for Contemporary Applied Mathematics (08DZ2271900). The authors would like to thank the referees very much for valuable and helpful comments and suggestions.
References
1. Alabau-Boussouira, F., & Cannarsa, P. (2009). A general method for proving sharp energy decay rates for memory-dissipative evolution equations. Comptes Rendus de l’Académie des Sciences Paris, 347(15–16), 867–872
2. Alabau-Boussouira, F., Cannarsa, P., & Sforza, D. (2008). Decay estimates for second order evolution equations with memory. Journal of Functional Analysis, 254(5), 1342–1372
3. Ammar-Khodja, F., Benabdallah, A., Muñoz Rivera, J. E., & Racke, R. (2003). Energy decay for Timoshenko systems of memory type. Journal of Differential Equations, 194(1), 82–115
4. Arendt, W., Batty, C. J. K., Hieber, M., & Neubrander, F. (2011). Vector-valued Laplace transforms and Cauchy problems. Monographs in Math. (vol. 96). Basel: Birkhäuser/Springer
5. Avdonin, S. A., & Belinskiy, B. P. (2013). On controllability of a non-homogeneous string with memory. Journal of Mathematical Analysis and Applications, 398(1), 254–269
6. Avdonin, S. A., & Belinskiy, B. P. (2014). On controllability of a linear elastic beam with memory under longitudinal load. Evolution Equations and Control Theory, 3(2), 231–245
7. Avdonin, S. A., & Ivanov, S. A. (1995). Families of exponentials. The method of moments in controllability problems for distributed parameter systems. New York: Cambridge University Press
8. Batty, C. J. K., Liang, J., & Xiao, T. J. (2005). On the spectral and growth bound of semigroups associated with hyperbolic equations. Advances in Mathematics, 191(1), 1–10
9. Cannarsa, P., & Sforza, D. (2004). Semilinear integrodifferential equations of hyperbolic type: existence in the large. Mediterranean Journal of Mathematics, 1(2), 151–174
10. Cannarsa, P., & Sforza, D. (2011). Integro-differential equations of hyperbolic type with positive definite kernels. Journal of Differential Equations, 250(12), 4289–4335
11. Dafermos, C. M. (1970). An abstract Volterra equation with applications to linear viscoelasticity. Journal of Differential Equations, 7, 554–569
12. Diagana, T. (2018). Semilinear evolution equations and their applications. New York: Springer
13. Engel, K.-J., & Nagel, R. (2000). One-parameter semigroups for linear evolution equations. GTM (vol. 194). Berlin, New York: Springer
14. Engel, K.-J., & Nagel, R. (2006). A short course on operator semigroups. Universitext. New York: Springer
15. Fattorini, H. O. (1985). Second order linear differential equations in Banach spaces. Amsterdam: Elsevier Science Publishers B.V.
16. Favini, A., & Yagi, A. (1999). Degenerate differential equations in Banach spaces. Monographs and Textbooks in Pure & Appl. Math. (vol. 215). New York: Marcel Dekker, Inc.
17. Feng, B., & Soufyane, A. (2020). New general decay results for a von-Kármán plate equation with memory-type boundary conditions. Discrete and Continuous Dynamical Systems Series A, 40(3), 1757–1774
18. Gao, Y., Liang, J., & Xiao, T. J. (2018). A new method to obtain uniform decay rates for damped wave equations with nonlinear acoustic boundary conditions. SIAM Journal on Control and Optimization, 56(2), 1303–1320
19. Gohberg, I. C., & Krein, M. G. (1969). Introduction to the theory of linear nonselfadjoint operators. Transl. Math. Monogr. (vol. 18). Providence, RI: Amer. Math. Soc.
20. Goldstein, J. A. (1985). Semigroups of linear operators and applications. New York: Oxford Univ. Press
21. Green, W., Liu, S. T., & Mitkovski, M. (2019). Boundary observability for the viscoelastic wave equation. SIAM Journal on Control and Optimization, 57(3), 1629–1645
22. Hille, E. (1948). Functional analysis and semi-groups (vol. 31). New York: Amer. Math. Soc. Colloquium Publications
23. Hille, E., & Phillips, R. S. (1957). Functional analysis and semi-groups (vol. 31). Providence, R.I.: Amer. Math. Soc. Colloquium Publications
24. Jin, K. P., Liang, J., & Xiao, T. J. (2014). Coupled second order evolution equations with fading memory: Optimal energy decay rate. Journal of Differential Equations, 257(5), 1501–1528
25. Jin, K. P., Liang, J., & Xiao, T. J. (2019). Uniform stability of semilinear wave equations with arbitrary local memory effects versus frictional dampings. Journal of Differential Equations, 266(11), 7230–7263
26. Khemmoudj, A., & Djaidja, I. (2020). General decay for a viscoelastic rotating Euler-Bernoulli beam. Communications on Pure and Applied Analysis, 19(7), 3531–3557
27. Komornik, V. (1994). Exact controllability and stabilization. The multiplier method. Res. Appl. Math. New York: Wiley
28. Komornik, V., & Loreti, P. (2005). Fourier series in control theory. New York: Springer
29. Lasiecka, I., Messaoudi, S. A., & Mustafa, M. (2013). Note on intrinsic decay rates for abstract wave equations with memory. Journal of Mathematical Physics, 54, 031504
30. Lasiecka, I., & Tataru, D. (1993). Uniform boundary stabilization of semilinear wave equations with nonlinear boundary dissipation. Differential and Integral Equations, 6, 507–533
31. Lasiecka, I., & Triggiani, R. (1989). Exact controllability of the Euler-Bernoulli equation with controls in the Dirichlet and Neumann boundary conditions: a nonconservative case. SIAM Journal on Control and Optimization, 27(2), 330–373
32. Leugering, G. (1987a). Exact boundary controllability of an integro-differential equation. Applied Mathematics & Optimization, 15(3), 223–250
33. Leugering, G. (1987b). Time optimal boundary controllability of a simple linear viscoelastic liquid. Mathematical Methods in the Applied Sciences, 9(3), 413–430
34. Li, C., Liang, J., & Xiao, T. J. (2021). Asymptotic behaviours of solutions for wave equations with damped Wentzell boundary conditions but no interior damping. Journal of Differential Equations, 271(1), 76–106
35. Liang, J., Nagel, R., & Xiao, T. J. (2008). Approximation theorems for the propagators of higher order abstract Cauchy problems. Transactions of the American Mathematical Society, 360(4), 1723–1739
36. Lions, J. L. (1988). Contrôlabilité exacte, perturbations et stabilisation de systèmes distribués. Recherches en Mathématiques Appliquées (vol. 8). Paris: Masson
37. Loreti, P., Pandolfi, L., & Sforza, D. (2012). Boundary controllability and observability of a viscoelastic string. SIAM Journal on Control and Optimization, 50(2), 820–844
38. Messaoudi, S. A., & Hassan, J. H. (2019). New general decay results in a finite-memory Bresse system. Communications on Pure and Applied Analysis, 18(4), 1637–1662
39. Mikhailov, V. P. (1978). Partial differential equations. Moscow: Mir
40. Mophou, G., & Nakoulima, O. (2009). Null controllability with constraints on the state for the semilinear heat equation. Journal of Optimization Theory and Applications, 143(3), 539–565
41. Mustafa, M. I. (2018). General decay result for nonlinear viscoelastic equations. Journal of Mathematical Analysis and Applications, 457(1), 134–152
42. Pandolfi, L. (2014). Distributed systems with persistent memory control and moment problems. Springer Briefs in Electrical and Computer Engineering, Control, Automation and Robotics. Cham: Springer
43. Pazy, A. (1983). Semigroups of linear operators and applications to partial differential equations. New York: Springer
44. Russell, D. L. (1993). A general framework for the study of indirect damping mechanisms in elastic systems. Journal of Mathematical Analysis and Applications, 173(2), 339–358
45. Xiao, T. J., & Liang, J. (1998). The Cauchy problem for higher-order abstract differential equations. Lecture Notes in Mathematics (vol. 1701). Berlin, Germany: Springer
46. Xiao, T. J., & Liang, J. (2003). Higher order abstract Cauchy problems and their existence, uniqueness families. Journal of the London Mathematical Society, 67(1), 149–164
47. Xiao, T. J., & Liang, J. (2004a). Second order parabolic equations in Banach spaces with dynamic boundary conditions. Transactions of the American Mathematical Society, 356(12), 4787–4809
48. Xiao, T. J., & Liang, J. (2004b). Complete second order differential equations in Banach spaces with dynamic boundary conditions. Journal of Differential Equations, 200(1), 105–136
49. Xiao, T. J., & Liang, J. (2008). Second order differential operators with Feller-Wentzell type boundary conditions. Journal of Functional Analysis, 254(6), 1467–1486
50. Xiao, T. J., & Liang, J. (2013). Coupled second order semilinear evolution equations indirectly damped via memory effects. Journal of Differential Equations, 254(5), 2128–2157
51. Yosida, K. (1948). On the differentiability and the representation of one-parameter semi-group of linear operators. Journal of the Mathematical Society of Japan, 1, 15–21
52. Yosida, K. (1995). Functional analysis. Classics in Math. Berlin: Springer
53. Young, R. M. (2001). An introduction to nonharmonic Fourier series. New York: Academic Press
54. Zuazua, E. (1987). Contrôlabilité exacte d’un modèle de plaques vibrantes en un temps arbitrairement petit. Comptes Rendus de l’Académie des Sciences Paris Series I - Mathematics, 304(7), 173–176
On Singular Integral Operators with Shifts Yuri I. Karlovich and Jennyffer Rosales-Méndez
Abstract Given $p \in (1,\infty)$, the chapter is devoted to studying the Fredholmness on the space $L^p(\Gamma)$ of the singular integral operator with shift $Y = (a_+ I - b_+ V_\alpha)P_+ + (a_- I - b_- V_\alpha)P_-$, where $a_\pm, b_\pm \in QC(\Gamma)$ are quasicontinuous functions on a star-like curve $\Gamma = \bigcup_{k=1}^{N}\Gamma_k$, $\alpha$ is an orientation-preserving homeomorphism of each arc $\Gamma_k$ onto itself with quasicontinuous derivative $\alpha'$, $V_\alpha : f \mapsto f \circ \alpha$ is a shift operator, $P_\pm := \frac{1}{2}(I \pm S_\Gamma)$, $I$ is the identity operator, and $S_\Gamma$ is the Cauchy singular integral operator. Applying results on the Mellin pseudodifferential operators with quasicontinuous $V(\mathbb{R})$-valued symbols, where $V(\mathbb{R})$ is the Banach algebra of all absolutely continuous functions of bounded total variation on $\mathbb{R}$, and using $C^*$-algebra representations related to investigations based on the local-trajectory method combined with spectral measures and lifting theorems, we establish an invertibility criterion for the functional operators $A_\pm = a_\pm I - b_\pm V_\alpha$ and Fredholm criteria for the operator $Y$ on the spaces $L^p(\Gamma)$. A literature review of investigations on the Fredholm theory of singular integral operators with shifts is provided.
Keywords Singular integral operator with shifts • Mellin pseudodifferential operator • Quasicontinuous functions and symbols • Fredholmness
The work was partially supported by the CONACYT Projects A1-S-8793 and A1-S-9201 (México). The second author was also sponsored by the CONACYT scholarship No. 666743.
Y. I. Karlovich (✉) • J. Rosales-Méndez, Centro de Investigación en Ciencias, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_54
Mathematics Subject Classification (MSC2020) Primary 45E05 • Secondary 47A53, 47G10, 47G30
1 Introduction
This chapter is devoted to the Fredholm study of singular integral operators with shifts and quasicontinuous data by applying the theory of Mellin pseudodifferential operators with quasicontinuous symbols (see [27, 28]). Nonlocal singular integral operators with shifts arise in studying boundary value problems for analytic functions (see [33–35]).
Let $\mathcal{B}(X)$ be the Banach algebra of all bounded linear operators acting on a Banach space $X$ and let $\mathcal{K}(X)$ be the ideal of all compact operators in $\mathcal{B}(X)$. An operator $A \in \mathcal{B}(X)$ is said to be Fredholm if its image is closed and the spaces $\ker A$ and $\ker A^*$ are finite-dimensional (see, e.g., [15, 40]). If $A \in \mathcal{B}(X)$ is a Fredholm operator, then $\operatorname{Ind} A := \dim\ker A - \dim\ker A^*$ is the index of $A$. Let $|A| := \inf\{\|A + K\|_{\mathcal{B}(X)} : K \in \mathcal{K}(X)\}$ for all $A \in \mathcal{B}(X)$.
Given $p \in (1,\infty)$, a star-like curve $\Gamma = \bigcup_{k=1}^{N}\Gamma_k$ and a shift $\alpha : \Gamma \to \Gamma$, we study the Fredholmness of the singular integral operator with shift
\[ Y = (a_+ I - b_+ V_\alpha)P_+ + (a_- I - b_- V_\alpha)P_- \in \mathcal{B}(L^p(\Gamma)), \tag{1.1} \]
where $a_\pm, b_\pm \in QC(\Gamma)$ are quasicontinuous functions on $\Gamma$, $P_\pm := \frac{1}{2}(I \pm S_\Gamma)$, $I$ is the identity operator, and $S_\Gamma$ is the Cauchy singular integral operator given for $t \in \Gamma$ by
\[ (S_\Gamma f)(t) := \lim_{\varepsilon \to 0} \frac{1}{\pi i}\int_{\Gamma\setminus\Gamma(t,\varepsilon)} \frac{f(\tau)}{\tau - t}\,d\tau, \qquad \Gamma(t,\varepsilon) := \{\tau \in \Gamma : |\tau - t| < \varepsilon\}, \tag{1.2} \]
$V_\alpha$ is the shift operator given by $V_\alpha f = f \circ \alpha$, with $\alpha' \in QC(\Gamma)$, and $\alpha$, $\Gamma$ and the set $QC(\Gamma)$ are defined in Section 3. By [11, 16], the operator $S_\Gamma$ is bounded on the space $L^p(\Gamma)$, and $V_\alpha^{\pm 1} \in \mathcal{B}(L^p(\Gamma))$ as well.
Singular integral operators with piecewise continuous coefficients and infinite cyclic groups of shifts, as well as algebras of such operators, were studied by different methods by V.G. Kravchenko and the first author (see, e.g., [29, 30]), A.G. Myasnikov and L.I. Sazonov [36–38], V.N. Semenjuta [43], and A.P. Soldatov [44, 45] (also see [33] and the references therein). The Fredholmness on the Lebesgue spaces $L^p(\Gamma)$ with $p \in (1,\infty)$ for singular
integral operators with discrete subexponential groups of shifts acting topologically freely on $\Gamma$ and Banach algebras of such operators were studied in [23] and [32] in the case of piecewise smooth contours, piecewise continuous coefficients and piecewise smooth shifts. The $C^*$-algebra approach to nonlocal operator $C^*$-algebras related to $C^*$-dynamical systems was developed in [1–4]. The Fredholm theory for the Banach algebra of singular integral operators with shifts on Lebesgue spaces was constructed in the case of piecewise slowly oscillating coefficients and slowly oscillating derivative $\alpha'$ (see, e.g., [18–22]). $C^*$-algebras of singular integral operators with piecewise slowly oscillating data and different groups of shifts having common and non-common fixed points were also investigated by other means: by applying the local-trajectory method, spectral measures, and lifting theorems (see, e.g., [5–7]). The Fredholmness in $C^*$-algebras of singular integral operators with piecewise quasicontinuous coefficients and infinite groups of shifts acting freely on the unit circle was investigated in [12]. $C^*$-algebras of singular integral operators with piecewise quasicontinuous data and infinite groups of shifts having different sets of fixed points were studied in [8, 10].
The present chapter deals with studying the Fredholmness of the nonlocal singular integral operator $Y = A_+P_+ + A_-P_-$ with functional operators $A_\pm = a_\pm I - b_\pm V_\alpha$ and quasicontinuous data on the Lebesgue spaces $L^p(\Gamma)$ for $p \in (1,\infty)$. The operator $T = V_\alpha P_+ + GP_- \in \mathcal{B}(L^p(\Gamma))$ related to the Haseman boundary value problem with quasicontinuous data was earlier studied in [31]. Applying recent results [27, 28] on Mellin pseudodifferential operators with quasicontinuous symbols (see also [39]), we obtain the invertibility criteria for the binomial functional operators $A_\pm$ on the spaces $L^p(\Gamma)$ with $p \in (1,\infty)$ and establish a Fredholm criterion for the operator $Y$ on the space $L^2(\Gamma)$ and a conditional Fredholm criterion for the operator $Y$ on the space $L^p(\Gamma)$ for every $p \in (1,\infty)$.
The chapter is organized as follows. In Section 2 we collect preliminaries on quasicontinuous functions defined on the unit circle $\mathbb{T}$, on the real line $\mathbb{R}$, and on the half-line $\mathbb{R}_+$. In Section 3 we introduce quasicontinuous data for the operators $Y$ given by (1.1) and present the main results of the chapter: invertibility criteria for the functional operators $A_\pm$ on the spaces $L^p(\Gamma)$ and Fredholm criteria for the operator $Y$ on the spaces $L^p(\Gamma)$. In Section 4 we recall properties of quasicontinuous shifts obtained in [31]. In Section 5, making use of [24, 26–28], we consider the boundedness of Mellin pseudodifferential operators with bounded measurable and quasicontinuous $V(\mathbb{R})$-valued symbols on $\mathbb{R}_+$, where $V(\mathbb{R})$ is the Banach algebra of absolutely continuous functions of bounded total variation on $\mathbb{R}$, on the spaces $L^p(\mathbb{R}_+, d\mu)$ with invariant measure $d\mu(\varrho) = d\varrho/\varrho$, describe the Fredholm symbol calculus constructed in [28] for the Banach algebra $\mathfrak{D}_p$ of Mellin pseudodifferential operators with quasicontinuous $V(\mathbb{R})$-valued symbols, and give a Fredholm criterion for the operators $D \in \mathfrak{D}_p$ on the space $L^p(\mathbb{R}_+, d\mu)$ with $p \in (1,\infty)$ in terms of their Fredholm symbols. Section 6 deals with applications of Mellin pseudodifferential operators with quasicontinuous $V(\mathbb{R})$-valued symbols. Finally, in Sections 7–10 we prove the main results of the chapter: the invertibility criterion for the functional operator $A = aI - bV_\alpha$ with $a, b, \alpha' \in QC(\Gamma)$ on the spaces $L^p(\Gamma)$ (Theorem 3.1), a conditional Fredholm criterion for the operator (1.1) with quasicontinuous data on the spaces $L^p(\Gamma)$ (Theorem 3.2), and a complete Fredholm criterion for the operator (1.1) on the space $L^2(\Gamma)$ (Theorem 3.3). To prove Theorem 3.2, we apply results on Mellin pseudodifferential operators with quasicontinuous $V(\mathbb{R})$-valued symbols, while for the proof of Theorem 3.3, we use a general form of singular integral operators with shifts (Theorem 9.2) and $C^*$-algebra representations related to those in [10] that are based on applying the local-trajectory method, spectral measures and lifting theorems.
2 Quasicontinuous Functions on $\mathbb{T}$, $\mathbb{R}$ and $\mathbb{R}_+$
2.1 The $C^*$-Algebra QC of Quasicontinuous Functions on $\mathbb{T}$
Let $H^\infty$ be the closed subalgebra of $L^\infty(\mathbb{T})$ that consists of all functions being non-tangential limits on $\mathbb{T}$ of bounded analytic functions on the open unit disc $\mathbb{D} := \{z \in \mathbb{C} : |z| < 1\}$, and let $C := C(\mathbb{T})$. By [41] and [42], the $C^*$-algebra $QC := QC(\mathbb{T})$ of quasicontinuous functions on $\mathbb{T}$ is defined by
\[ QC := (H^\infty + C) \cap (\overline{H^\infty} + C). \tag{2.1} \]
Given a commutative unital $C^*$-algebra $\mathcal{A}$, we denote by $M(\mathcal{A})$ the maximal ideal space of $\mathcal{A}$. Identifying the points $t \in \mathbb{T}$ with the evaluation functionals $\delta_t(f) = f(t)$, we obtain $M(C(\mathbb{T})) = \mathbb{T}$. Since $C \subset QC$, the maximal ideal space of $QC$ is of the form
\[ M(QC) = \bigcup_{t \in \mathbb{T}} M_t(QC), \qquad M_t(QC) := \{\xi \in M(QC) : \xi|_C = t\}, \tag{2.2} \]
where $M_t(QC)$ are fibers of $M(QC)$ over points $t \in \mathbb{T}$.
For $t \in \mathbb{T}$, let $M_t^0(QC)$ be the set of functionals in $M_t(QC)$ that lie in the weak-star closure in $QC^*$ of the set $\{\delta_{\lambda,t} : \lambda \in (1,\infty)\}$, where for $t = e^{i\theta}$,
\[ \delta_{\lambda,t} : QC \to \mathbb{C}, \quad f \mapsto \frac{\lambda}{2\pi}\int_{\theta - \pi/\lambda}^{\theta + \pi/\lambda} f(e^{ix})\,dx \quad \text{for all } (\lambda,t) \in (1,\infty) \times \mathbb{T}; \]
\[ M_t^\pm(QC) := \bigl\{\xi \in M_t(QC) : \xi(f) = 0 \ \text{if } f \in QC \text{ and } \limsup_{z \to t^\pm}|f(z)| = 0\bigr\}. \]
For each $t \in \mathbb{T}$, it follows from [42, Lemma 8] (see also [15, Section 3.3]) that
\[ M_t^+(QC) \cap M_t^-(QC) = M_t^0(QC), \qquad M_t^+(QC) \cup M_t^-(QC) = M_t(QC). \]
Hence, the fiber $M_t(QC)$ splits into the three disjoint sets:
\[ M_t^0(QC), \qquad M_t^+(QC)\setminus M_t^0(QC), \qquad M_t^-(QC)\setminus M_t^0(QC). \]
Let $PQC := \operatorname{alg}(QC, PC)$ be the $C^*$-subalgebra of $L^\infty(\mathbb{T})$ generated by the $C^*$-algebras $QC$ and $PC$, where $PC := PC(\mathbb{T})$ consists of all piecewise continuous functions on $\mathbb{T}$, that is, the functions having finite one-sided limits at each point $t \in \mathbb{T}$. The functions in $PQC$ are referred to as the piecewise quasicontinuous functions.
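2.2 Quasicontinuous Functions on $\mathbb{R}$ and $\mathbb{R}_+$
Let $\mathbb{R}_+ = (0,\infty)$, $\overline{\mathbb{R}}_+ = [0,\infty]$, and $\dot{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$. The $C^*$-algebra $QC(\mathbb{R})$ of quasicontinuous functions on $\mathbb{R}$ is defined similarly to (2.1) by
\[ QC(\mathbb{R}) := \bigl(H^\infty(\mathbb{R}) + C(\dot{\mathbb{R}})\bigr) \cap \bigl(\overline{H^\infty(\mathbb{R})} + C(\dot{\mathbb{R}})\bigr), \]
where $H^\infty(\mathbb{R}) = \{f \circ \gamma^{-1} : f \in H^\infty\}$ and
\[ \gamma : \mathbb{T} \to \dot{\mathbb{R}}, \qquad t \mapsto i(1+t)/(1-t). \tag{2.3} \]
As $M(C(\dot{\mathbb{R}})) = \dot{\mathbb{R}}$, it follows from (2.2) that
\[ M(QC(\mathbb{R})) = \bigcup_{x \in \dot{\mathbb{R}}} M_x(QC(\mathbb{R})), \qquad M_x(QC(\mathbb{R})) := \{\xi \in M(QC(\mathbb{R})) : \xi|_{C(\dot{\mathbb{R}})} = x\}. \]
Applying the homeomorphism (2.3), we conclude that $a \circ \gamma \in QC$ if and only if $a \in QC(\mathbb{R})$. Hence, we can associate the fibers $M_x(QC(\mathbb{R}))$ and $M_t(QC)$ as follows: $\xi \in M_x(QC(\mathbb{R}))$ for $x \in \dot{\mathbb{R}}$ if and only if the functional $\tilde\xi$, defined by $\tilde\xi(a \circ \gamma) = \xi(a)$ for every $a \in QC(\mathbb{R})$, belongs to $M_t(QC)$ for $t = \gamma^{-1}(x) \in \mathbb{T}$. Similarly, we define the sets $M_x^0(QC(\mathbb{R}))$ and $M_x^\pm(QC(\mathbb{R}))$ for every $x \in \dot{\mathbb{R}}$.
The $C^*$-algebra of quasicontinuous functions on $\mathbb{R}_+$ is defined by $QC(\mathbb{R}_+) := QC(\mathbb{R})|_{\mathbb{R}_+}$. The maximal ideal space $M(QC(\mathbb{R}_+))$ can be identified with the set
\[ M(QC(\mathbb{R}_+)) = M_0^+(QC(\mathbb{R})) \cup \Bigl(\bigcup_{x \in \mathbb{R}_+} M_x(QC(\mathbb{R}))\Bigr) \cup M_\infty^-(QC(\mathbb{R})), \tag{2.4} \]
where $M_\infty^-(QC(\mathbb{R}))$ means $M_\infty^-(QC(\dot{\mathbb{R}}))$.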
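The transfer between fibers over $\mathbb{T}$ and over $\dot{\mathbb{R}}$ rests entirely on the map $\gamma$ in (2.3). As a small sanity check, and only as an illustrative sketch with hypothetical function names, the code below evaluates $\gamma$ on the unit circle, confirms numerically that the values are real, and inverts the map via $\gamma^{-1}(x) = (x - i)/(x + i)$, which follows by solving $x = i(1+t)/(1-t)$ for $t$.

```python
import cmath
import math


def gamma(t):
    """The map (2.3): t -> i(1 + t)/(1 - t), for a point t != 1 of the unit circle."""
    return 1j * (1 + t) / (1 - t)


def gamma_inverse(x):
    """Inverse of (2.3) on the real line: x -> (x - i)/(x + i)."""
    return (x - 1j) / (x + 1j)


def circle_point(theta):
    """Point e^{i theta} of the unit circle."""
    return cmath.exp(1j * theta)


if __name__ == "__main__":
    t = circle_point(2.0)
    x = gamma(t)
    print(x)                        # imaginary part vanishes up to rounding
    print(-1 / math.tan(1.0))       # closed form -cot(theta/2) for theta = 2
    print(gamma_inverse(x.real), t)
```

The point $t = 1$ is sent to $\infty$ and $t = -1$ to $0$, which is how the fibers $M_{1}(QC)$ and $M_{-1}(QC)$ correspond to the fibers over $\infty$ and $0$ used in Section 3.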
3 Main Results
3.1 Quasicontinuous Data of the Operator Y
Star-Like Curve Let $\Gamma$ denote the star-like curve
\[ \Gamma = \bigcup_{k=1}^{N}\Gamma_k, \qquad \Gamma_k := e^{i\beta_k}\mathbb{R}_+ \quad (k = 1, 2, \ldots, N), \tag{3.1} \]
where the rays $\Gamma_k$ are oriented either from $0$ to $\infty$ or from $\infty$ to $0$, and
\[ 0 \le \beta_1 < \beta_2 < \ldots < \beta_N < 2\pi. \tag{3.2} \]
With each ray $\Gamma_k$ we associate the number
\[ \varepsilon_k = \begin{cases} 1 & \text{if } 0 \text{ is the starting point of } \Gamma_k, \\ -1 & \text{if } 0 \text{ is the terminating point of } \Gamma_k. \end{cases} \tag{3.3} \]
The curve $\Gamma$ given by (3.1) and (3.2) is a Carleson curve (see [11]).
Quasicontinuous Functions on Γ Let $QC(\Gamma)$ denote the set of all quasicontinuous functions $b : \Gamma \to \mathbb{C}$, which means that there exist functions $b_k \in QC(\mathbb{R}_+)$ such that $b(e^{i\beta_k}r) = b_k(r)$ for $r \in \mathbb{R}_+$ and all $k = 1, 2, \ldots, N$.
Quasicontinuous Shift Let $\alpha$ be an orientation-preserving homeomorphism of each ray $\Gamma_k$ ($k = 1, 2, \ldots, N$) onto itself that possesses the property
\[ \ln|\alpha'| \in L^\infty(\Gamma). \tag{3.4} \]
Then the shift operator $V_\alpha : f \mapsto f \circ \alpha$ and its inverse $V_\alpha^{-1}$ are bounded on the spaces $L^p(\Gamma)$, $p \in (1,\infty)$. We call such $\alpha$ a quasicontinuous shift on $\Gamma$ if
\[ \alpha(e^{i\beta_k}r) = e^{i\beta_k} r e^{\omega_k(r)} \quad \text{for } r \in \mathbb{R}_+ \text{ and } k = 1, 2, \ldots, N, \tag{3.5} \]
where $\omega_k$ and $\phi_k : r \mapsto r\omega_k'(r)$ are real-valued functions in $QC(\mathbb{R}_+)$. If $\alpha$ is a quasicontinuous shift, then it follows from (3.4) and (3.5) that
\[ \alpha'(e^{i\beta_k}r) = \bigl(1 + r\omega_k'(r)\bigr)e^{\omega_k(r)} \quad \text{for } r \in \mathbb{R}_+ \text{ and all } k = 1, 2, \ldots, N, \]
\[ \operatorname*{ess\,inf}_{r \in \mathbb{R}_+}\bigl(1 + r\omega_k'(r)\bigr) > 0 \quad \text{for all } k = 1, 2, \ldots, N. \tag{3.6} \]
By (3.6), $\alpha' \in QC(\Gamma)$. We define the real-valued functions $\omega, c_\alpha \in QC(\Gamma)$ by
\[ \omega(t) := \ln[\alpha(t)/t], \qquad c_\alpha(t) := e^{\omega(t)/p} \quad \text{for } t \in \Gamma. \tag{3.7} \]
3.2 Invertibility of Binomial Functional Operators
Given p 2 (1, 1), we consider the binomial functional operator: A = aI - bV α 2 BðLp ðΓÞÞ,
ð3:8Þ
where a, b 2 QC( Γ), α is a quasicontinuous shift defined in Section 3.1, I is the identity operator, and Vα is the shift operator given by Vαf = f ∘ α. Theorem 3.1 If p 2 (1, 1), a, b 2 QC( Γ), a star-like curve Γ = Nk= 1 Γk and a quasicontinuous shift α satisfy the conditions of Section 3.1, and α has only two fixed points 0 and 1, then the functional operator A = aI - bVα is invertible on the space L p( Γ) if and only if for every k = 1, 2, . . ., N one of the following two conditions holds:
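\[ \operatorname*{ess\,inf}_{r \in \mathbb{R}_+} |a_k(r)| > 0 \quad \text{and} \quad \max_{\eta \in \widetilde{\Delta}} \big( |b_k(\eta)|\, |\alpha_k'(\eta)|^{-1/p} / |a_k(\eta)| \big) < 1, \tag{3.9} \]
\[ \operatorname*{ess\,inf}_{r \in \mathbb{R}_+} |b_k(r)| > 0 \quad \text{and} \quad \max_{\eta \in \widetilde{\Delta}} \big( |a_k(\eta)|\, |\alpha_k'(\eta)|^{1/p} / |b_k(\eta)| \big) < 1, \tag{3.10} \]
where $a_k(r) := a(e^{i\beta_k} r)$, $b_k(r) := b(e^{i\beta_k} r)$ and $\alpha_k(r) := e^{-i\beta_k} \alpha(e^{i\beta_k} r)$ for $r \in \mathbb{R}_+$ and $k = 1, 2, \dots, N$, and
\[ \widetilde{\Delta} := M_0^+(QC(\mathbb{R})) \cup M_\infty^-(QC(\mathbb{R})). \tag{3.11} \]

3.3 Fredholmness of a Singular Integral Operator with Shift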
Making use of interrelations of singular integral operators with quasicontinuous coefficients and Mellin pseudodifferential operators with quasicontinuous symbols established in [27] and applying a reduction to such operators, we obtain Fredholm criteria for the operator (1.1) with quasicontinuous coefficients $a_\pm, b_\pm \in QC(\Gamma)$ and quasicontinuous shift $\alpha$. We now define the sets
\[ \Delta_0 := M_0^0(QC(\mathbb{R})) \cup M_\infty^0(QC(\mathbb{R})), \tag{3.12} \]
\[ \mathfrak{M}_0 := M_0^0(QC(\mathbb{R})) \times \mathbb{R}, \qquad \mathfrak{M}_\infty := M_\infty^0(QC(\mathbb{R})) \times \mathbb{R}. \tag{3.13} \]
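Fix $p \in (1, \infty)$ and consider the singular integral operator with shift
\[ Y = (a_+ I - b_+ V_\alpha) P_+ + (a_- I - b_- V_\alpha) P_- \in \mathcal{B}(L^p(\Gamma)), \]
where $a_\pm, b_\pm \in QC(\Gamma)$, the star-like curve $\Gamma$ and a shift $\alpha : \Gamma \to \Gamma$ satisfy the conditions of Section 3.1. Then the following conditional Fredholm criterion for the operator $Y$ is valid.

Theorem 3.2 Let $p \in (1, \infty)$, $a_\pm, b_\pm \in QC(\Gamma)$ and let a star-like curve $\Gamma$ and a shift $\alpha : \Gamma \to \Gamma$ satisfy the conditions of Section 3.1, where $\alpha$ has only two fixed points $0$ and $\infty$. If the functional operators $A_\pm = a_\pm I - b_\pm V_\alpha$ are invertible on the space $L^p(\Gamma)$, then the operator $Y = A_+ P_+ + A_- P_-$ is Fredholm on the space $L^p(\Gamma)$ if and only if
\[ \det \mathcal{Y}(\xi, x) := \Big( \prod_{\varepsilon_k = 1} \mathcal{A}_{+,k}(\xi, x) \Big) \Big( \prod_{\varepsilon_k = -1} \mathcal{A}_{-,k}(\xi, x) \Big) P_+(x) + \Big( \prod_{\varepsilon_k = -1} \mathcal{A}_{+,k}(\xi, x) \Big) \Big( \prod_{\varepsilon_k = 1} \mathcal{A}_{-,k}(\xi, x) \Big) P_-(x) \ne 0 \tag{3.14} \]
for all $(\xi, x) \in \mathfrak{M}_0 \cup \mathfrak{M}_\infty$, where for $k = 1, 2, \dots, N$,
\[ P_\pm(x) := \tfrac{1}{2} \big[ 1 \pm \coth(\pi(x + i/p)) \big] \quad \text{for } x \in \overline{\mathbb{R}}, \quad \overline{\mathbb{R}} := [-\infty, +\infty], \tag{3.15} \]
\[ \mathcal{A}_{\pm,k}(\xi, x) := a_{\pm,k}(\xi) - b_{\pm,k}(\xi)\, e^{i\omega_k(\xi)(x + i/p)} \quad \text{for } (\xi, x) \in \mathfrak{M}_0 \cup \mathfrak{M}_\infty, \tag{3.16} \]
$a_{\pm,k}(r) := a_\pm(e^{i\beta_k} r)$, $b_{\pm,k}(r) := b_\pm(e^{i\beta_k} r)$, $\omega_k(r) := \ln[\alpha_k(r)/r]$ for $r \in \mathbb{R}_+$, and the shifts $\alpha_k : \mathbb{R}_+ \to \mathbb{R}_+$ are defined in Theorem 3.1.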
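As a hedged illustration of how (3.14)–(3.16) are evaluated at a single point, the following Python sketch computes $\det \mathcal{Y}(\xi, x)$ from numbers standing for the Gelfand-transform values $a_{\pm,k}(\xi)$, $b_{\pm,k}(\xi)$, $\omega_k(\xi)$. The function names (`P`, `A_symbol`, `det_Y`) and all sample values are assumptions made for this example, not data from the text, and $x$ is taken finite and of moderate size.

```python
import cmath
import math

def P(sign, x, p):
    """P_+ (sign=+1) or P_- (sign=-1) from (3.15), for finite real x."""
    z = math.pi * complex(x, 1.0 / p)
    coth = cmath.cosh(z) / cmath.sinh(z)
    return 0.5 * (1.0 + sign * coth)

def A_symbol(a, b, omega, x, p):
    """A_{+-,k}(xi, x) from (3.16); a, b, omega are the values at the point xi."""
    return a - b * cmath.exp(1j * omega * complex(x, 1.0 / p))

def det_Y(eps, a_plus, b_plus, a_minus, b_minus, omega, x, p):
    """Left-hand side of (3.14) at one point (xi, x)."""
    term_plus = 1.0   # product multiplying P_+(x)
    term_minus = 1.0  # product multiplying P_-(x)
    for k, e in enumerate(eps):
        Ap = A_symbol(a_plus[k], b_plus[k], omega[k], x, p)
        Am = A_symbol(a_minus[k], b_minus[k], omega[k], x, p)
        if e == 1:
            term_plus *= Ap
            term_minus *= Am
        else:
            term_plus *= Am
            term_minus *= Ap
    return term_plus * P(+1, x, p) + term_minus * P(-1, x, p)

# Hypothetical sample: two rays with opposite orientations, p = 3.
value = det_Y(eps=[1, -1],
              a_plus=[2.0, 1.5], b_plus=[0.5, 0.2],
              a_minus=[1.0, 3.0], b_minus=[0.1, 0.4],
              omega=[0.7, -0.3], x=0.25, p=3.0)
print(abs(value))
```

Condition (3.14) asks that this quantity be nonzero at every point of $\mathfrak{M}_0 \cup \mathfrak{M}_\infty$; the sketch only evaluates it at one chosen point.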
Theorem 3.2 gives sufficient Fredholm conditions for the operator $Y$ on the space $L^p(\Gamma)$ that consist of the invertibility criteria for the binomial functional operators $A_\pm = a_\pm I - b_\pm V_\alpha$ on the space $L^p(\Gamma)$, which are described by Theorem 3.1, and the fulfillment of (3.14). To get the complete Fredholm criterion for the operator $Y$, it remains to prove the invertibility of the operators $A_\pm$ on the space $L^p(\Gamma)$ for the Fredholm operator $Y$. For $p \in (1, 2) \cup (2, \infty)$ this question is open. If $p = 2$, then we obtain the following Fredholm criterion for the operator $Y$ by applying $C^*$-algebra representations related to those in [10, Section 4] obtained by using spectral measures and the local-trajectory method [5, 25] related to $C^*$-algebras associated with $C^*$-dynamical systems (see, e.g., [1, 2]).

Theorem 3.3 Let $p = 2$, $a_\pm, b_\pm \in QC(\Gamma)$ and let a star-like curve $\Gamma$ and a shift $\alpha : \Gamma \to \Gamma$ satisfy the conditions of Section 3.1, where $\alpha$ has only two fixed points $0$ and $\infty$. Then the operator $Y = A_+ P_+ + A_- P_-$ is Fredholm on the space $L^2(\Gamma)$ if and only if the functional operators $A_\pm = a_\pm I - b_\pm V_\alpha$ are invertible on the space $L^2(\Gamma)$ and (3.14) holds for all $(\xi, x) \in \mathfrak{M}_0 \cup \mathfrak{M}_\infty$, where $p = 2$ in (3.15) and (3.16).
4 Properties of Quasicontinuous Shifts

By analogy with [18], we say that an orientation-preserving homeomorphism $\alpha : \mathbb{R}_+ \to \mathbb{R}_+$ onto itself is a quasicontinuous shift on $\mathbb{R}_+$ if $\ln \alpha' \in L^\infty(\mathbb{R}_+)$ and $\alpha' \in QC(\mathbb{R}_+)$. We denote by $QCS(\mathbb{R}_+)$ the set of such shifts. The following assertion generalizes [18, Lemma 2.2] and modifies [31, Theorem 4.1], where the required condition of absolute continuity for the homeomorphism $\alpha$ is absent (such a condition is also required for the homeomorphisms in [31, Lemmas 4.5, 4.6]).

Theorem 4.1 An orientation-preserving homeomorphism $\alpha : \mathbb{R}_+ \to \mathbb{R}_+$ belongs to $QCS(\mathbb{R}_+)$ if and, under the condition of absolute continuity of $\alpha$, only if $\alpha(r) = r e^{\omega(r)}$ for $r \in \mathbb{R}_+$, where the real-valued functions $\omega$ and $\phi : r \mapsto r\omega'(r)$ belong to $QC(\mathbb{R}_+)$ and
\[ \operatorname*{ess\,inf}_{r \in \mathbb{R}_+} (1 + r\omega'(r)) > 0. \]

Theorem 4.2 ([31, Theorem 4.3]) If $b \in QC(\mathbb{R}_+)$ and $\alpha \in QCS(\mathbb{R}_+)$, then the functions $b \circ \alpha$ and $b - b \circ \alpha$ belong to $QC(\mathbb{R}_+)$ and $b(\xi) - (b \circ \alpha)(\xi) = 0$ for every $\xi \in \widetilde{\Delta}$, where $\widetilde{\Delta}$ is given by (3.11).

Combining [31, Theorem 4.4] and [31, Lemma 4.5] subject to the mentioned modification, we obtain the following.

Theorem 4.3 If $\alpha \in QCS(\mathbb{R}_+)$, then its inverse $\beta = \alpha^{-1}$ is in $QCS(\mathbb{R}_+)$. If $\alpha$ and $\alpha^{-1}$ are absolutely continuous functions on $\mathbb{R}_+ \cup \{0\}$, then the functions $\omega(r) := \ln[\alpha(r)/r]$ and $\breve{\omega}(r) := \ln[\beta(r)/r]$ defined for $r \in \mathbb{R}_+$ belong to $QC(\mathbb{R}_+)$, $\breve{\omega}(r) = -\omega[\beta(r)]$ for $r \in \mathbb{R}_+$ and $\omega(\xi) = -\breve{\omega}(\xi)$ for all $\xi \in \widetilde{\Delta}$.

Since $\alpha'(r) = (1 + r\omega'(r)) e^{\omega(r)}$ for $r \in \mathbb{R}_+$, it follows from the modified [31, Lemma 4.6] that $\alpha'(\xi) = e^{\omega(\xi)}$ for every $\xi \in \Delta_0$, where $\Delta_0$ is given by (3.12).
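5 Mellin Pseudodifferential Operators

Let $V(\mathbb{R})$ be the Banach algebra of all absolutely continuous functions $a$ of bounded total variation $V(a)$ on $\mathbb{R}$, equipped with the norm
\[ \|a\|_V := \|a\|_{L^\infty(\mathbb{R})} + V(a), \qquad V(a) = \int_{\mathbb{R}} |a'(x)|\, dx. \]
Following [24, 26], let $L^\infty(\mathbb{R}_+, V(\mathbb{R}))$ be the set of all functions $a : \mathbb{R}_+ \times \mathbb{R} \to \mathbb{C}$ such that $a : r \mapsto a(r, \cdot)$ is a bounded measurable $V(\mathbb{R})$-valued function on $\mathbb{R}_+$. Therefore, $a(\cdot, x) \in L^\infty(\mathbb{R}_+)$ for every $x \in \mathbb{R}$, and the function
\[ r \mapsto \|a(r, \cdot)\|_V := \max_{x \in \mathbb{R}} |a(r, x)| + \int_{\mathbb{R}} \Big| \frac{\partial a}{\partial x}(r, x) \Big|\, dx \]
belongs to $L^\infty(\mathbb{R}_+)$. Note that the limits $a(r, \pm\infty) = \lim_{x \to \pm\infty} a(r, x)$ exist a.e. on $\mathbb{R}_+$. Clearly, $L^\infty(\mathbb{R}_+, V(\mathbb{R}))$ is a Banach algebra with the norm
\[ \|a\|_{L^\infty(\mathbb{R}_+, V(\mathbb{R}))} := \operatorname*{ess\,sup}_{r \in \mathbb{R}_+} \|a(r, \cdot)\|_V. \]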
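As usual, let $C_0^\infty(\mathbb{R}_+)$ be the set of all infinitely differentiable functions of compact support on $\mathbb{R}_+$. Let $d\mu(\varrho) = d\varrho/\varrho$ be the (normalized) invariant measure on $\mathbb{R}_+$. The following boundedness result for Mellin pseudodifferential operators was obtained in [26, Theorem 9.1] due to [24, Theorem 3.1].

Theorem 5.1 If $a \in L^\infty(\mathbb{R}_+, V(\mathbb{R}))$, then the Mellin pseudodifferential operator $\operatorname{Op}(a)$, defined for functions $f \in C_0^\infty(\mathbb{R}_+)$ by the iterated integral
\[ [\operatorname{Op}(a) f](r) = \frac{1}{2\pi} \int_{\mathbb{R}} dx \int_{\mathbb{R}_+} a(r, x) \Big( \frac{r}{\varrho} \Big)^{ix} f(\varrho)\, \frac{d\varrho}{\varrho} \quad \text{for } r \in \mathbb{R}_+, \tag{5.1} \]
extends to a bounded linear operator on the space $L^p(\mathbb{R}_+, d\mu)$ and there is a number $C_p \in (0, \infty)$ depending only on $p$ such that
\[ \|\operatorname{Op}(a)\|_{\mathcal{B}(L^p(\mathbb{R}_+, d\mu))} \le C_p \|a\|_{L^\infty(\mathbb{R}_+, V(\mathbb{R}))}. \]

Let $L^\infty(\mathbb{T}, V(\mathbb{R}))$ denote the set of all functions $b : \mathbb{T} \times \mathbb{R} \to \mathbb{C}$ such that $t \mapsto b(t, \cdot)$ is a bounded measurable $V(\mathbb{R})$-valued function on $\mathbb{T}$. We say that $b \in L^\infty(\mathbb{T}, V(\mathbb{R}))$ is a quasicontinuous $V(\mathbb{R})$-valued function on $\mathbb{T}$, $b \in QC(\mathbb{T}, V(\mathbb{R}))$, if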
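\[ \lim_{\delta \to 0} \sup_{|I| \le \delta} \frac{1}{|I|} \int_I \Big\| b(t, \cdot) - \frac{1}{|I|} \int_I b(\tau, \cdot)\, dm(\tau) \Big\|_V dm(t) = 0, \]
where $m(\cdot)$ is the Lebesgue length measure on $\mathbb{T}$ and $|I| = \int_I dm$. A function $a \in L^\infty(\mathbb{R}, V(\mathbb{R}))$ is referred to as a quasicontinuous $V(\mathbb{R})$-valued function on $\mathbb{R}$, $a \in QC(\mathbb{R}, V(\mathbb{R}))$, if the function $b$ defined by $b(t, \cdot) = a(\gamma(t), \cdot)$ for $t \in \mathbb{T}$, where $\gamma$ is given by (2.3), belongs to $QC(\mathbb{T}, V(\mathbb{R}))$. Let $QC(\mathbb{R}_+, V(\mathbb{R}))$ be the set of restrictions of functions $a \in QC(\mathbb{R}, V(\mathbb{R}))$ to $\mathbb{R}_+$ with respect to the first variable. Then $QC(\mathbb{R}_+, V(\mathbb{R}))$ is a Banach subalgebra of $L^\infty(\mathbb{R}_+, V(\mathbb{R}))$.

Given $a \in QC(\mathbb{R}_+, V(\mathbb{R}))$ and $h \in \mathbb{R}$, let $a^h(r, x) := a(r, x + h)$ for almost all $r \in \mathbb{R}_+$ and all $x \in \mathbb{R}$. Let $\mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))$ be the Banach subalgebra of $L^\infty(\mathbb{R}_+, V(\mathbb{R}))$ that consists of all functions $a \in QC(\mathbb{R}_+, V(\mathbb{R}))$ satisfying the conditions:
\[ \lim_{|h| \to 0} \operatorname*{ess\,sup}_{r \in \mathbb{R}_+} \|a(r, \cdot) - a^h(r, \cdot)\|_V = 0, \tag{5.2} \]
\[ \lim_{M \to \infty} \operatorname*{ess\,sup}_{r \in \mathbb{R}_+} \int_{\mathbb{R} \setminus [-M, M]} |\partial_x a(r, x)|\, dx = 0. \tag{5.3} \]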
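Following [28], for $p \in (1, \infty)$, we consider the Banach algebra $\mathfrak{D}_p \subset \mathcal{B}_p := \mathcal{B}(L^p(\mathbb{R}_+, d\mu))$ generated by the Mellin pseudodifferential operators $\operatorname{Op}(a)$ of the form (5.1) with quasicontinuous $V(\mathbb{R})$-valued symbols $a \in \mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))$. Note that the algebra $\mathfrak{D}_p$ contains operators which are not Mellin pseudodifferential operators $\operatorname{Op}(a)$ with symbols $a \in \mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))$ but are limits of sequences of operators $\operatorname{Op}(a_n)$ with symbols $a_n \in \mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))$.

Theorem 5.2 ([28, Theorem 5.3]) For every $p \in (1, \infty)$ and for all operators $D_1, D_2 \in \mathfrak{D}_p$, the commutator $[D_1, D_2]$ is a compact operator on the space $L^p(\mathbb{R}_+, d\mu)$.

The ideal $\mathcal{K}_p := \mathcal{K}(L^p(\mathbb{R}_+, d\mu))$ of compact operators is contained in the Banach algebra $\mathfrak{D}_p$ (see [28, Section 5]). Thus the quotient Banach algebra $\mathfrak{D}_p^\pi := \mathfrak{D}_p / \mathcal{K}_p$ is commutative, and $\mathfrak{D}_p^\pi \subset \mathcal{B}_p^\pi := \mathcal{B}_p / \mathcal{K}_p$. Let $B(\mathfrak{M}, \mathbb{C})$ be the algebra of all bounded complex-valued functions on the compact set $\mathfrak{M} \subset M(QC(\mathbb{R}_+)) \times \overline{\mathbb{R}}$ defined by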
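\[ \mathfrak{M} := \mathfrak{M}_{+\infty} \cup \mathfrak{M}_{-\infty} \cup \mathfrak{M}_0 \cup \mathfrak{M}_\infty, \]
\[ \mathfrak{M}_{\pm\infty} = \Big[ M_0^+(QC(\mathbb{R})) \cup \Big( \bigcup_{t \in \mathbb{R}_+} M_t(QC(\mathbb{R})) \Big) \cup M_\infty^-(QC(\mathbb{R})) \Big] \times \{\pm\infty\}, \tag{5.4} \]
\[ \mathfrak{M}_0 = M_0^0(QC(\mathbb{R})) \times \mathbb{R}, \qquad \mathfrak{M}_\infty = M_\infty^0(QC(\mathbb{R})) \times \mathbb{R}. \]
Define the Fredholm symbol $\mathcal{D}$ of the operator $D := \operatorname{Op}(a) \in \mathfrak{D}_p$ with a quasicontinuous $V(\mathbb{R})$-valued symbol $a \in \mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))$ by
\[ \mathcal{D}(\xi, x) := a(\xi, x) \quad \text{for all } (\xi, x) \in \mathfrak{M}, \tag{5.5} \]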
where $a(\xi, \pm\infty)$ are the values of the Gelfand transforms of the functions $a(\cdot, \pm\infty)$ on the maximal ideal space $M(QC(\mathbb{R}_+))$ and, by [28], the values $a(\xi, x)$ are defined for every $\xi \in M_0^0(QC(\mathbb{R})) \cup M_\infty^0(QC(\mathbb{R}))$ and every $x \in \mathbb{R}$ in the form
\[ a(\xi, x) = \lim_{n \to \infty} \frac{1}{|I_n|} \int_{I_n} a(\gamma(t), x)\, dm(t) \]
for the function $a(\gamma(\cdot), \cdot) \in \mathcal{E}(\mathbb{T}_-, V(\mathbb{R}))$, $\mathbb{T}_- := \{z \in \mathbb{T} : \operatorname{Im} z < 0\}$, and a sequence of arcs $I_n \subset \mathbb{T}$ is such that their centers $t_n$ are equal to $-1$ or $1$ if, respectively, $\xi \in M_0^0(QC(\mathbb{R}))$ or $\xi \in M_\infty^0(QC(\mathbb{R}))$, and $\lim_{n \to \infty} |I_n| = 0$.

A unital closed subalgebra $\mathcal{A}$ of a unital Banach algebra $\mathcal{B}$ with the same unit is called inverse closed in $\mathcal{B}$ if for every $A \in \mathcal{A}$, its spectra in the algebras $\mathcal{A}$ and $\mathcal{B}$ coincide (see, e.g., [13, p. 3]). Hence, if $A \in \mathcal{A}$ is invertible in $\mathcal{B}$, then $A$ is invertible in $\mathcal{A}$. The Fredholm symbol calculus for the Banach algebra $\mathfrak{D}_p$ and the Fredholm criterion for the operators $D \in \mathfrak{D}_p$ in terms of their Fredholm symbols are given as follows.

Theorem 5.3 ([28, Theorem 5.4]) For $p \in (1, \infty)$, the map $D \mapsto \mathcal{D}(\cdot, \cdot)$ given by (5.5) on the generators of the Banach algebra $\mathfrak{D}_p$ that are Mellin pseudodifferential operators $\operatorname{Op}(a)$ with symbols $a \in \mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))$ extends to a Banach algebra homomorphism $\Phi_p : \mathfrak{D}_p \to \Phi_p(\mathfrak{D}_p)$, where $\Phi_p(\mathfrak{D}_p)$ is a subalgebra of the algebra $B(\mathfrak{M}, \mathbb{C})$, $\ker \Phi_p \supset \mathcal{K}_p$, and $\ker \Phi_p = \mathcal{K}_p$ for $p = 2$. The quotient Banach algebra $\mathfrak{D}_p^\pi$ is commutative and inverse closed in the Calkin algebra $\mathcal{B}_p^\pi$, and its maximal ideal space is homeomorphic to $\mathfrak{M}$. An operator $D \in \mathfrak{D}_p$ is Fredholm on the space $L^p(\mathbb{R}_+, d\mu)$ if and only if the Gelfand transform $D^\pi \mapsto \mathcal{D}(\cdot, \cdot)$ is invertible on $\mathfrak{M}$, which means that
\[ \mathcal{D}(\xi, x) \ne 0 \quad \text{for all } (\xi, x) \in \mathfrak{M}. \tag{5.6} \]

The compactness criterion for Mellin pseudodifferential operators with symbols $a \in \mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))$ has the following form.

Theorem 5.4 ([28, Theorem 5.5]) For $p \in (1, \infty)$, the Mellin pseudodifferential operator $\operatorname{Op}(a)$ with symbol $a \in \mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))$ is a compact operator on the space $L^p(\mathbb{R}_+, d\mu)$ if and only if
\[ a(\xi, x) = 0 \quad \text{for all } (\xi, x) \in \mathfrak{M}. \tag{5.7} \]
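6 Applications of Mellin Pseudodifferential Operators

Let $p \in (1, \infty)$, let $\Gamma = \bigcup_{k=1}^N \Gamma_k$ be the star-like curve defined by (3.1) and (3.2), and let $\alpha : \Gamma \to \Gamma$ satisfy the conditions of Section 3. Let $\mathcal{B}_p := \mathcal{B}(L^p_N(\mathbb{R}_+))$ and $\mathcal{K}_p := \mathcal{K}(L^p_N(\mathbb{R}_+))$, where $L^p_N(\mathbb{R}_+)$ is the Banach space of vector-functions $\varphi = \{\varphi_k\}_{k=1}^N$ with entries $\varphi_k \in L^p(\mathbb{R}_+)$ and the norm $\|\varphi\| = \big( \sum_{k=1}^N \|\varphi_k\|^p_{L^p(\mathbb{R}_+)} \big)^{1/p}$. We now consider the isomorphisms
\[ \Upsilon : L^p(\Gamma) \to L^p_N(\mathbb{R}_+, d\mu), \quad (\Upsilon f)(r) = \{ r^{1/p} f(e^{i\beta_k} r) \}_{k=1}^N \ (r \in \mathbb{R}_+), \qquad \Psi : \mathcal{B}(L^p(\Gamma)) \to \mathcal{B}(L^p_N(\mathbb{R}_+, d\mu)), \quad A \mapsto \Upsilon A \Upsilon^{-1}. \tag{6.1} \]
For $\beta \in \mathbb{C}$ with $\operatorname{Re} \beta \in (0, 2\pi)$, we take the operators $R_\beta \in \mathcal{B}_p$ defined by
\[ (R_\beta f)(r) = \frac{1}{\pi i} \int_{\mathbb{R}_+} \frac{f(\varrho)}{\varrho - e^{i\beta} r}\, d\varrho, \quad r \in \mathbb{R}_+. \tag{6.2} \]
The operator $R_\beta$ belongs to the Banach algebra $\operatorname{alg}\{I, S_{\mathbb{R}_+}\} \subset \mathcal{B}_p$ generated by the operators $I$ and $S_{\mathbb{R}_+}$ (see, e.g., [40, Section 4.2]).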
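By (3.4) and (3.5), we obtain
\[ \Psi(V_\alpha) = \operatorname{diag}\{ c_k^{-1} V_{\alpha_k} \}_{k=1}^N, \quad V_{\alpha_k} f = f \circ \alpha_k, \quad c_k(r) := e^{\omega_k(r)/p}, \quad r \in \mathbb{R}_+, \tag{6.3} \]
where $\alpha_k : r \mapsto r e^{\omega_k(r)}$ are in $QCS(\mathbb{R}_+)$, $V_{\alpha_k} \in \mathcal{B}(L^p(\mathbb{R}_+, d\mu))$ and, by (3.7),
\[ c_\alpha(e^{i\beta_k} r) = c_k(r) = e^{\omega_k(r)/p} \quad \text{for } r \in \mathbb{R}_+ \text{ and all } k = 1, 2, \dots, N. \tag{6.4} \]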
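The diagonal form (6.3) can be verified directly from the definition of $\Upsilon$ in (6.1). A short check, using only (3.5) and the identity $(\Upsilon^{-1}\varphi)(e^{i\beta_k}\varrho) = \varrho^{-1/p}\varphi_k(\varrho)$, reads as follows for $\varphi \in L^p_N(\mathbb{R}_+, d\mu)$ and $r \in \mathbb{R}_+$:
\[ (\Upsilon V_\alpha \Upsilon^{-1} \varphi)_k(r) = r^{1/p} (\Upsilon^{-1}\varphi)(e^{i\beta_k}\alpha_k(r)) = \Big( \frac{r}{\alpha_k(r)} \Big)^{1/p} \varphi_k(\alpha_k(r)) = e^{-\omega_k(r)/p} (V_{\alpha_k}\varphi_k)(r) = c_k^{-1}(r)\, (V_{\alpha_k}\varphi_k)(r). \]
The same change of variable shows that $\Upsilon$ is isometric, since $\int_{\mathbb{R}_+} |r^{1/p} f(e^{i\beta_k} r)|^p \, dr/r = \int_{\mathbb{R}_+} |f(e^{i\beta_k} r)|^p \, dr$ for each $k$.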
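We get the following two crucial $\Psi$-images from [31, Theorems 7.2, 7.3].

Theorem 6.1 If $p \in (1, \infty)$ and $\Gamma = \bigcup_{k=1}^N \Gamma_k$ is given by (3.1)–(3.2), then
\[ \Psi(S_\Gamma) = [\operatorname{Op}(\varepsilon_k \mathfrak{s}_{j,k})]_{j,k=1}^N, \qquad \mathfrak{s}_{j,k}(r, x) := s_{j,k}(x) \quad \text{for } (r, x) \in \mathbb{R}_+ \times \mathbb{R}, \tag{6.5} \]
where $\varepsilon_k := 1$ if $0$ is the starting point of $\Gamma_k$, $\varepsilon_k := -1$ if $0$ is the terminating point of $\Gamma_k$, the functions $s_{j,k}$ are given for $x \in \mathbb{R}$ and $j, k = 1, 2, \dots, N$ by
\[ s_{j,k}(x) := \begin{cases} \coth(\pi(x + i/p)) & \text{if } j = k, \\[4pt] \dfrac{\exp(\theta_{j,k}(x + i/p))}{\sinh(\pi(x + i/p))} & \text{if } j \ne k, \end{cases} \tag{6.6} \]
\[ \theta_{j,k} := \pi \operatorname{sgn}(j - k) - (\beta_j - \beta_k) \in (-\pi, \pi), \tag{6.7} \]
and $s_{j,k} \in V(\mathbb{R})$, $\mathfrak{s}_{j,k} \in QC(\mathbb{R}_+, V(\mathbb{R}))$ for all $j, k = 1, 2, \dots, N$.

Theorem 6.2 If $p \in (1, \infty)$ and if a star-like curve $\Gamma = \bigcup_{k=1}^N \Gamma_k$ and a quasicontinuous shift $\alpha : \Gamma \to \Gamma$ satisfy the conditions of Section 3.1, then $[\Psi(V_\alpha S_\Gamma)]_{j,k} = \varepsilon_k \operatorname{Op}(\mathfrak{v}_{j,k})$ for all $j, k = 1, 2, \dots, N$ ($j \ne k$), where the functions $\mathfrak{v}_{j,k} \in QC(\mathbb{R}_+, V(\mathbb{R}))$ are given for $(r, x) \in \mathbb{R}_+ \times \mathbb{R}$ by
\[ \mathfrak{v}_{j,k}(r, x) := e^{i\omega_j(r)(x + i/p)}\, \frac{\exp(\theta_{j,k}(x + i/p))}{\sinh(\pi(x + i/p))} \tag{6.8} \]
with $\theta_{j,k}$ defined by (6.7), and $\varepsilon_k$ are given by (3.3).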
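The entries (6.6)–(6.7) are explicit, so their basic properties are easy to probe numerically. The following Python sketch is an illustration only: the angles in `beta`, the exponent `p`, and the function names `theta` and `s` are assumptions made for this example. It evaluates $\theta_{j,k}$ and $s_{j,k}(x)$ for finite real $x$.

```python
import cmath
import math

def theta(j, k, beta):
    """theta_{j,k} from (6.7); j, k are 1-based ray indices, beta the list of angles."""
    sgn = (j > k) - (j < k)
    return math.pi * sgn - (beta[j - 1] - beta[k - 1])

def s(j, k, x, p, beta):
    """Entry s_{j,k}(x) from (6.6) for finite real x."""
    z = complex(x, 1.0 / p)
    if j == k:
        return cmath.cosh(math.pi * z) / cmath.sinh(math.pi * z)
    return cmath.exp(theta(j, k, beta) * z) / cmath.sinh(math.pi * z)

# Hypothetical data: three rays with 0 <= beta_1 < beta_2 < beta_3 < 2*pi, and p = 3.
beta = [0.0, 2.0, 4.5]
p = 3.0

off_diagonal_thetas = [theta(j, k, beta)
                       for j in (1, 2, 3) for k in (1, 2, 3) if j != k]
print(off_diagonal_thetas)
print(s(1, 1, 8.0, p, beta), s(2, 1, 8.0, p, beta))
```

For these sample angles every off-diagonal $\theta_{j,k}$ lies in $(-\pi, \pi)$, the diagonal entries approach $\pm 1$ as $x \to \pm\infty$, and the off-diagonal entries decay, which is consistent with $s_{j,k} \in V(\mathbb{R})$.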
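It is easily seen that the functions $\mathfrak{s}_{j,k}$ and $\mathfrak{v}_{j,k}$ given by (6.5)–(6.7) and (6.8), respectively, satisfy conditions (5.2) and (5.3), and therefore belong to the algebra $\mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))$. For $j, k = 1, 2, \dots, N$, we get $[\Psi(S_\Gamma)]_{k,k} = \varepsilon_k S$ and $[\Psi(S_\Gamma)]_{j,k} = \varepsilon_k R_{j,k}$ if $j \ne k$, where the operators $S, R_{j,k} \in \mathcal{B}(L^p(\mathbb{R}_+, d\mu))$ are given for $r \in \mathbb{R}_+$ by
\[ (Sf)(r) = \frac{1}{\pi i} \int_{\mathbb{R}_+} \frac{(r/\varrho)^{1/p} f(\varrho)}{\varrho - r}\, d\varrho, \qquad (R_{j,k} f)(r) = \frac{1}{\pi i} \int_{\mathbb{R}_+} \frac{(r/\varrho)^{1/p} f(\varrho)}{\varrho - e^{i(\beta_j - \beta_k)} r}\, d\varrho. \]
Thus, by Theorems 6.1 and 6.2, the $\Psi$-images of $S_\Gamma$ and the commutator $[V_\alpha, S_\Gamma]$ are Mellin pseudodifferential operators with matrix quasicontinuous symbols in the algebra $[\mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))]_{N \times N}$. Below we will use the following important result.

Theorem 6.3 ([31, Theorem 7.7]) If $p \in (1, \infty)$, $\Gamma$ and $\alpha$ satisfy the conditions of Section 3 and the function $c_\alpha \in QC(\Gamma)$ is given by (3.7), then the operators $T_\pm := (c_\alpha V_\alpha)^{\pm 1} P_+ + P_-$ are Fredholm on the space $L^p(\Gamma)$ and $\operatorname{Ind} T_\pm = 0$.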
7 Proof of Theorem 3.1

Obviously, the operator $A = aI - bV_\alpha$ defined by (3.8) is invertible on the space $L^p(\Gamma)$ if and only if for every $k = 1, 2, \dots, N$ the operators $A_k := a_k I - b_k V_{\alpha_k}$, with coefficients $a_k, b_k \in QC(\mathbb{R}_+)$ and shifts $\alpha_k : \mathbb{R}_+ \to \mathbb{R}_+$ given in Theorem 3.1, are invertible on the space $L^p(\mathbb{R}_+)$. Rewriting the operators $A_k$ in the form
\[ A_k = a_k I - \tilde{b}_k U_{\alpha_k}, \tag{7.1} \]
where $\tilde{b}_k := b_k (\alpha_k')^{-1/p}$ and $U_{\alpha_k} := (\alpha_k')^{1/p} V_{\alpha_k}$ are isometric operators, we deduce from [17, Theorem 3] that each operator $A_k$ of the form (7.1) is invertible on the space $L^p(\mathbb{R}_+)$ if and only if it is invertible on the space $L^2(\mathbb{R}_+)$. For every $k = 1, 2, \dots, N$, we consider the cyclic group $\{\alpha_k^n : n \in \mathbb{Z}\}$ that acts topologically freely (see, e.g., [9]) on the maximal ideal space $M(QC(\mathbb{R}_+))$ defined by (2.4). Let $V$ be the unitary operator on the space $l^2 := l^2(\mathbb{Z})$ given by $(Vf)(n) = f(n + 1)$ for all $f \in l^2$ and all $n \in \mathbb{Z}$. For every $\xi \in M_+(QC(\mathbb{R})) := \bigcup_{t \in \mathbb{R}_+} M_t(QC(\mathbb{R}))$, we define the discrete operators
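\[ A_{k,\xi} := a_{k,\xi} I - \tilde{b}_{k,\xi} V \in \mathcal{B}(l^2), \tag{7.2} \]
where the functions $a_{k,\xi}, \tilde{b}_{k,\xi} \in l^\infty := l^\infty(\mathbb{Z})$ are given by
\[ a_{k,\xi}(n) = (a_k \circ \alpha_k^n)(\xi), \qquad \tilde{b}_{k,\xi}(n) = (\tilde{b}_k \circ \alpha_k^n)(\xi), \qquad n \in \mathbb{Z}. \]
Similarly to [9, Theorem 5.2], the operator $A_k$ given by (7.1) is invertible on the space $L^2(\mathbb{R}_+)$ if and only if for every $\xi \in M_+(QC(\mathbb{R}))$ the operator $A_{k,\xi}$ given by (7.2) is invertible on the space $l^2$ and
\[ \sup\{ \|A_{k,\xi}^{-1}\|_{\mathcal{B}(l^2)} : \xi \in M_+(QC(\mathbb{R})) \} < \infty. \tag{7.3} \]
By [17, Theorem 17], for every $\xi \in M_+(QC(\mathbb{R}))$, the operator $A_{k,\xi}$ is invertible on the space $l^2$ if and only if one of the following conditions holds:
\[ \text{(i)} \ \inf_{n \in \mathbb{Z}} |a_{k,\xi}(n)| > 0, \ \ r\big( (\tilde{b}_{k,\xi}/a_{k,\xi}) V \big) < 1; \qquad \text{(ii)} \ \inf_{n \in \mathbb{Z}} |\tilde{b}_{k,\xi}(n)| > 0, \ \ r\big( (a_{k,\xi}/\tilde{b}_{k,\xi}) V^{-1} \big) < 1; \tag{7.4} \]
where $r(A)$ is the spectral radius of an operator $A \in \mathcal{B}(l^2)$.

Let the operator $A_k$ of the form (7.1) be invertible on the space $L^2(\mathbb{R}_+)$. Following [9, Theorem 5.2(iii)], we then conclude that the operators $A_{k,\eta} := a_k(\eta) I - \tilde{b}_k(\eta) V$ are invertible on the space $l^2$ for every $\eta \in \widetilde{\Delta}$ as well, which means that $|a_k(\eta)| \ne |\tilde{b}_k(\eta)|$ for all $\eta \in \widetilde{\Delta}$. Hence, either $|a_k(\eta)| > |\tilde{b}_k(\eta)|$ for all $\eta \in M_0^+(QC(\mathbb{R}))$ or $|a_k(\eta)| < |\tilde{b}_k(\eta)|$ for all $\eta \in M_0^+(QC(\mathbb{R}))$. The same property is valid for $\eta \in M_\infty^-(QC(\mathbb{R}))$. We claim that, actually, either $|a_k(\eta)| > |\tilde{b}_k(\eta)|$ for all $\eta \in \widetilde{\Delta}$ or $|a_k(\eta)| < |\tilde{b}_k(\eta)|$ for all $\eta \in \widetilde{\Delta}$. Indeed, if $0$ is the repelling point of $\alpha_k$ and $\infty$ is the attracting point of $\alpha_k$, and $|a_k(\eta)| > |\tilde{b}_k(\eta)|$ for all $\eta \in M_0^+(QC(\mathbb{R}))$ and $|a_k(\eta)| < |\tilde{b}_k(\eta)|$ for all $\eta \in M_\infty^-(QC(\mathbb{R}))$, then for every $\xi \in M_+(QC(\mathbb{R}))$ there is an $n_\xi \in \mathbb{N}$ such that $\inf_{n > n_\xi} |\tilde{b}_{k,\xi}(n)| > 0$, $\inf_{n < -n_\xi} |a_{k,\xi}(n)| > 0$ and
\[ |a_{k,\xi}(n)| / |\tilde{b}_{k,\xi}(n)| \le \max_{\eta \in M_\infty^-(QC(\mathbb{R}))} \{ |a_k(\eta)| / |\tilde{b}_k(\eta)| \} < 1 \quad \text{for all } n > n_\xi, \]
\[ |\tilde{b}_{k,\xi}(n)| / |a_{k,\xi}(n)| \le \max_{\eta \in M_0^+(QC(\mathbb{R}))} \{ |\tilde{b}_k(\eta)| / |a_k(\eta)| \} < 1 \quad \text{for all } n < -n_\xi. \]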
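Since the operator $A_k$ of the form (7.1) is invertible on the space $L^2(\mathbb{R}_+)$, for each $\xi \in M_+(QC(\mathbb{R}))$ we can attain the property $a_{k,\xi}(n) \tilde{b}_{k,\xi}(n) \ne 0$ for all $n \in \{-n_\xi, \dots, n_\xi\}$ by a small perturbation of coefficients $a_k, \tilde{b}_k \in QC(\mathbb{R}_+)$ that preserves the invertibility of the operator $A_{k,\xi}$ on the space $l^2$. By [17, Proposition 14], the function
\[ f(n) = \begin{cases} \prod_{s=n}^{-1} [\tilde{b}_{k,\xi}(s) / a_{k,\xi}(s)] & \text{if } n \in -\mathbb{N}, \\ 1 & \text{if } n = 0, \\ \prod_{s=1}^{n} [a_{k,\xi}(s-1) / \tilde{b}_{k,\xi}(s-1)] & \text{if } n \in \mathbb{N}, \end{cases} \tag{7.5} \]
belongs to $\ker A_{k,\xi} \subset l^2$, which contradicts the invertibility of $A_k$ on $L^2(\mathbb{R}_+)$. If $0$ is the attracting point of $\alpha_k$ and $\infty$ is the repelling point of $\alpha_k$, and $|a_k(\eta)| < |\tilde{b}_k(\eta)|$ for all $\eta \in M_0^+(QC(\mathbb{R}))$ and $|a_k(\eta)| > |\tilde{b}_k(\eta)|$ for all $\eta \in M_\infty^-(QC(\mathbb{R}))$, then again the operator $A_k$ is not invertible on $L^2(\mathbb{R}_+)$ because the function (7.5) belongs to $\ker A_{k,\xi} \subset l^2$, which contradicts the invertibility of $A_k$ on $L^2(\mathbb{R}_+)$. Two other cases of different inequalities at $0$ and $\infty$ contradict the invertibility of the adjoint operator $A_k^*$ on the space $L^2(\mathbb{R}_+)$, which proves the claim.

Thus, for the invertible operator $A_k$ of the form (7.1), only one of the cases (i) or (ii) can be realized for all $\xi \in M_+(QC(\mathbb{R}))$ in (7.4). Namely,
\[ \min_{\xi \in M(QC(\mathbb{R}_+))} |a_k(\xi)| > 0, \qquad \sup_{\xi \in M_+(QC(\mathbb{R}))} r\Big( \frac{\tilde{b}_{k,\xi}}{a_{k,\xi}} V \Big) = \max_{\eta \in \widetilde{\Delta}} \frac{|\tilde{b}_k(\eta)|}{|a_k(\eta)|} < 1 \tag{7.6} \]
if $|a_k(\eta)| > |\tilde{b}_k(\eta)|$ for all $\eta \in \widetilde{\Delta}$, and
\[ \min_{\xi \in M(QC(\mathbb{R}_+))} |\tilde{b}_k(\xi)| > 0, \qquad \sup_{\xi \in M_+(QC(\mathbb{R}))} r\Big( \frac{a_{k,\xi}}{\tilde{b}_{k,\xi}} V^{-1} \Big) = \max_{\eta \in \widetilde{\Delta}} \frac{|a_k(\eta)|}{|\tilde{b}_k(\eta)|} < 1 \tag{7.7} \]
if $|a_k(\eta)| < |\tilde{b}_k(\eta)|$ for all $\eta \in \widetilde{\Delta}$, which proves the necessity of conditions (3.9) and (3.10). Combining (7.4), (7.6), and (7.7), we obtain the sufficiency of conditions (3.9) and (3.10) in view of fulfillment of (7.3). □

If the conditions of Theorem 3.1 are fulfilled and for $k = 1, 2, \dots, N$ the operator $A_k = a_k I - b_k V_{\alpha_k}$ is invertible on the space $L^p(\mathbb{R}_+)$, then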
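\[ A_k^{-1} = \begin{cases} \displaystyle \sum_{n=0}^{\infty} \big( (b_k/a_k) V_{\alpha_k} \big)^n a_k^{-1} I & \text{if (3.9) holds}, \\[8pt] \displaystyle -V_{\alpha_k}^{-1} \sum_{n=0}^{\infty} \big( (a_k/b_k) V_{\alpha_k}^{-1} \big)^n b_k^{-1} I & \text{if (3.10) holds}. \end{cases} \tag{7.8} \]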
We write ak ≫ bk if (3.9) holds in (7.8), and bk ≫ ak if (3.10) holds there.
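The first branch of (7.8) is a Neumann series, and it can be checked pointwise on a toy example. The Python sketch below is an illustration under stated assumptions: the coefficients `a`, `b`, the dilation `alpha` and the right-hand side `g` are hypothetical, chosen so that the stronger pointwise bound $\sup |b/a| \le 1/2$ holds; the code verifies $a f - b\,(f \circ \alpha) = g$ at sample points and does not test convergence in $L^p$.

```python
import math

def a(r):
    # Hypothetical coefficient, bounded away from zero (values in [1, 3]).
    return 2.0 + math.sin(math.log(r))

def b(r):
    # Hypothetical coefficient with |b/a| <= 0.5 everywhere.
    return 0.5 * math.cos(math.log(r))

def alpha(r):
    # Dilation shift whose only fixed points are 0 and infinity.
    return 2.0 * r

def g(r):
    # Right-hand side of the equation A f = g.
    return math.exp(-r)

def neumann_inverse(rhs, r, terms=60):
    """Truncated series sum_{n < terms} ((b/a) V_alpha)^n (rhs / a), evaluated at r."""
    total = 0.0
    weight = 1.0   # product of (b/a) along r, alpha(r), ..., alpha^{n-1}(r)
    point = r      # current orbit point alpha^n(r)
    for _ in range(terms):
        total += weight * rhs(point) / a(point)
        weight *= b(point) / a(point)
        point = alpha(point)
    return total

def apply_A(f, r):
    # (A f)(r) = a(r) f(r) - b(r) f(alpha(r)).
    return a(r) * f(r) - b(r) * f(alpha(r))

f = lambda r: neumann_inverse(g, r)
residual = max(abs(apply_A(f, r) - g(r)) for r in [0.01, 0.1, 1.0, 3.0, 25.0])
print(residual)
```

The loop uses the identity $\big( (b/a) V_\alpha \big)^n h\,(r) = \prod_{m=0}^{n-1} (b/a)(\alpha^m(r)) \cdot h(\alpha^n(r))$, so each term of the series costs one step along the orbit of $r$.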
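8 Proof of Theorem 3.2

In what follows, $A \simeq B$ means that $A - B$ is a compact operator. Since the operators $A_\pm = a_\pm I - b_\pm V_\alpha$ are invertible on the space $L^p(\Gamma)$ and hence the operators $A_{\pm,k} = a_{\pm,k} I - b_{\pm,k} V_{\alpha_k}$ are invertible on the space $L^p(\mathbb{R}_+)$ for all $k = 1, 2, \dots, N$, we infer from Theorem 3.1 and (3.16) that $\mathcal{A}_{\pm,k}(\xi, x) \ne 0$ for all $(\xi, x) \in \mathfrak{M}_0 \cup \mathfrak{M}_\infty$ and all $k = 1, 2, \dots, N$, where $\mathfrak{M}_0$ and $\mathfrak{M}_\infty$ are defined by (3.13). Take the shift operators $V_{\gamma_\pm} \in \mathcal{B}(L^p(\Gamma))$ with shifts $\gamma_\pm : \Gamma \to \Gamma$ given by
\[ \gamma_\pm(e^{i\beta_k} r) = e^{i\beta_k} \gamma_{\pm,k}(r), \qquad \gamma_{\pm,k}(r) = \begin{cases} r & \text{if } a_{\pm,k} \gg b_{\pm,k}, \\ \alpha_k^{-1}(r) & \text{if } b_{\pm,k} \gg a_{\pm,k}, \end{cases} \tag{8.1} \]
for $r \in \mathbb{R}_+$ and all $k = 1, 2, \dots, N$. Let $c_{\gamma_\pm}(e^{i\beta_k} r) = c_{\gamma_\pm, k}(r) = (\gamma_{\pm,k}(r)/r)^{1/p}$ for $r \in \mathbb{R}_+$ and all $k = 1, 2, \dots, N$ according to (6.4). Theorem 6.3 implies that the operator $Y_0 := (c_{\gamma_+} V_{\gamma_+}) P_+ + (c_{\gamma_-} V_{\gamma_-}) P_-$ is Fredholm on the space $L^p(\Gamma)$. Indeed, the operator $c_{\gamma_-} V_{\gamma_-}$ is invertible on the space $L^p(\Gamma)$,
\[ Y_0 = (c_{\gamma_-} V_{\gamma_-}) (c_\gamma V_\gamma P_+ + P_-), \]
where $\gamma := \gamma_+ \circ \gamma_-^{-1} : \Gamma \to \Gamma$, $c_\gamma(t) := (\gamma(t)/t)^{1/p}$ for $t \in \Gamma$, and the operator $(c_\gamma V_\gamma) P_+ + P_-$ is Fredholm on the space $L^p(\Gamma)$ by Theorem 6.3. Similarly, the operator $\widetilde{Y}_0 := (c_{\gamma_+} V_{\gamma_+})^{-1} P_+ + (c_{\gamma_-} V_{\gamma_-})^{-1} P_-$ is also Fredholm on the space $L^p(\Gamma)$. Then the Mellin pseudodifferential operator $\Psi(Y_0 \widetilde{Y}_0)$ is Fredholm on the space $L^p_N(\mathbb{R}_+, d\mu)$ as well. We then get
\[ Y Y_0 = A_+ c_{\gamma_+} V_{\gamma_+} P_+ + A_- c_{\gamma_-} V_{\gamma_-} P_- + H_0, \qquad H_0 := (A_+ - A_-)(P_+ c_{\gamma_-} V_{\gamma_-} P_- - P_- c_{\gamma_+} V_{\gamma_+} P_+). \tag{8.2} \]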
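For every operator $R_\beta$ given by (6.2), we deduce from Theorems 6.1 and 6.2 that the operator $\operatorname{diag}\{A_{\pm,k}\}_{k=1}^N R_\beta$ is a Mellin pseudodifferential operator with quasicontinuous symbol in $[\mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))]_{N \times N}$. It follows by analogy with [20, Lemma 5.5] that the operator $\operatorname{diag}\{A_{\pm,k}^{-1}\}_{k=1}^N R_\beta$ also possesses this property. Hence we conclude from (8.2) that $\Psi(H_0) = \operatorname{Op}[(\mathcal{A}_+ - \mathcal{A}_-)(\mathcal{P}_+ \mathcal{D}_- \mathcal{P}_- - \mathcal{P}_- \mathcal{D}_+ \mathcal{P}_+)]$ is a Mellin pseudodifferential operator with quasicontinuous matrix symbol in $[\mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))]_{N \times N}$, where
\[ \mathcal{A}_\pm(r, x) := \operatorname{diag}\{ a_{\pm,k}(r) - b_{\pm,k}(r) e^{i\omega_k(r)(x + i/p)} \}_{k=1}^N, \tag{8.3} \]
\[ \mathcal{P}_\pm(r, x) := \tfrac{1}{2} \big( I_N \pm [\varepsilon_k \mathfrak{s}_{j,k}(r, x)]_{j,k=1}^N \big), \tag{8.4} \]
\[ \mathcal{D}_\pm = \operatorname{diag}\{d_{\pm,k}\}_{k=1}^N, \qquad d_{\pm,k}(r, x) = \begin{cases} 1 & \text{if } a_{\pm,k} \gg b_{\pm,k}, \\ e^{-i(\omega_k \circ \alpha_k^{-1})(r) x} & \text{if } b_{\pm,k} \gg a_{\pm,k}, \end{cases} \]
$\operatorname{ess\,inf}\{ |\det(\mathcal{D}_\pm(r, x))| : (r, x) \in \mathbb{R}_+ \times \mathbb{R} \} > 0$, $I_N$ is the identity $N \times N$ matrix and the functions $\mathfrak{s}_{j,k}$ are defined in Theorem 6.1. Applying (8.1) and (6.3), we deduce that
\[ \Psi(A_\pm c_{\gamma_\pm} V_{\gamma_\pm}) = \operatorname{diag}\{B_{\pm,k}\}_{k=1}^N, \]
where, respectively,
\[ B_{\pm,k} := \begin{cases} a_{\pm,k} I - b_{\pm,k} c_k^{-1} V_{\alpha_k} & \text{if } a_{\pm,k} \gg b_{\pm,k}, \\ -(b_{\pm,k} c_k^{-1} I - a_{\pm,k} V_{\alpha_k}^{-1}) & \text{if } b_{\pm,k} \gg a_{\pm,k}, \end{cases} \]
and the functions $c_k$ are given by (6.3). Consider the operators
\[ \widetilde{A}_{\pm,k} := \begin{cases} I - (b_{\pm,k}/a_{\pm,k}) c_k^{-1} V_{\alpha_k} & \text{if } a_{\pm,k} \gg b_{\pm,k}, \\ I - (a_{\pm,k}/b_{\pm,k}) c_k V_{\alpha_k}^{-1} & \text{if } b_{\pm,k} \gg a_{\pm,k}, \end{cases} \]
which are invertible on the space $L^p(\mathbb{R}_+, d\mu)$, and the operators
\[ \widetilde{A}_{\pm,m} = \Psi^{-1}\big( \operatorname{diag}\{ (1 - \delta_{m,k}) I + \delta_{m,k} \widetilde{A}_{\pm,k} \}_{k=1}^N \big), \quad m = 1, 2, \dots, N, \tag{8.5} \]
which are invertible on the space $L^p(\Gamma)$,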
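where $\delta_{m,k}$ is the Kronecker symbol. For $m = 1, 2, \dots, N$, we introduce the operators $P_{\pm,2}, Y_m, \widetilde{Y}_m \in \mathcal{B}(L^p(\Gamma))$ given by
\[ (P_{\pm,2} f)(t) := \frac{f(t)}{2} \pm \frac{1}{2\pi i} \int_\Gamma \frac{(t/\tau)^{1/2 - 1/p} f(\tau)}{\tau - t}\, d\tau, \quad f \in L^p(\Gamma), \ t \in \Gamma, \]
\[ Y_m := \widetilde{A}_{+,m} P_{+,2} + \widetilde{A}_{-,m} P_{-,2}, \qquad \widetilde{Y}_m := \widetilde{A}_{+,m}^{-1} P_{+,2} + \widetilde{A}_{-,m}^{-1} P_{-,2}, \tag{8.6} \]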
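where the operators $\widetilde{A}_{\pm,m}$ are defined by (8.5). Clearly,
\[ \Psi(Y_m \widetilde{Y}_m) \simeq \operatorname{Op}(\mathcal{N}_m), \qquad \Psi(\widetilde{Y}_m Y_m) \simeq \operatorname{Op}(\widetilde{\mathcal{N}}_m), \tag{8.7} \]
where $\operatorname{Op}(\mathcal{N}_m)$ and $\operatorname{Op}(\widetilde{\mathcal{N}}_m)$ are Mellin pseudodifferential operators with quasicontinuous symbols $\mathcal{N}_m, \widetilde{\mathcal{N}}_m \in [\mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))]_{N \times N}$, respectively. These symbols are given for $(r, x) \in \mathbb{R}_+ \times \mathbb{R}$ by
\[ \mathcal{N}_m(r, x) = \mathcal{Y}_m(r, x) \widetilde{\mathcal{Y}}_m(r, x), \qquad \widetilde{\mathcal{N}}_m(r, x) = \widetilde{\mathcal{Y}}_m(r, x) \mathcal{Y}_m(r, x), \tag{8.8} \]
\[ \mathcal{Y}_m(r, x) = \widetilde{\mathcal{A}}_{+,m}(r, x) \mathcal{P}_{+,2}(r, x) + \widetilde{\mathcal{A}}_{-,m}(r, x) \mathcal{P}_{-,2}(r, x), \tag{8.9} \]
\[ \widetilde{\mathcal{Y}}_m(r, x) = \widetilde{\mathcal{A}}_{+,m}^{-1}(r, x) \mathcal{P}_{+,2}(r, x) + \widetilde{\mathcal{A}}_{-,m}^{-1}(r, x) \mathcal{P}_{-,2}(r, x), \tag{8.10} \]
where $\mathcal{P}_{\pm,2}(r, x)$ are given by (8.4) with $\mathfrak{s}_{j,k}(r, x) := s_{j,k}(x)$ defined by (6.6) with $p = 2$, $\widetilde{\mathcal{A}}_{\pm,m}(r, x) = \operatorname{diag}\{ (1 - \delta_{m,k}) + \delta_{m,k} \widetilde{\mathcal{A}}_{\pm,k}(r, x) \}_{k=1}^N$, and
\[ \widetilde{\mathcal{A}}_{\pm,k}(r, x) = \begin{cases} 1 - (b_{\pm,k}(r)/a_{\pm,k}(r))\, e^{i\omega_k(r)(x + i/p)} & \text{if } a_{\pm,k} \gg b_{\pm,k}, \\ 1 - (a_{\pm,k}(r)/b_{\pm,k}(r))\, e^{-i\omega_k(r)(x + i/p)} & \text{if } b_{\pm,k} \gg a_{\pm,k}. \end{cases} \tag{8.11} \]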
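Further, we deduce from (8.8)–(8.10) and (8.5) that
\[ \det(\mathcal{N}_m(r, x)) = \det(\widetilde{\mathcal{N}}_m(r, x)) = \det(\mathcal{Y}_m(r, x)) \det(\widetilde{\mathcal{Y}}_m(r, x)), \]
\[ \det(\mathcal{Y}_m(r, x)) = \widetilde{\mathcal{A}}_{j_m, m}(r, x) P_{+,2}(x) + \widetilde{\mathcal{A}}_{-j_m, m}(r, x) P_{-,2}(x), \tag{8.12} \]
\[ \det(\widetilde{\mathcal{Y}}_m(r, x)) = \widetilde{\mathcal{A}}_{j_m, m}^{-1}(r, x) P_{+,2}(x) + \widetilde{\mathcal{A}}_{-j_m, m}^{-1}(r, x) P_{-,2}(x), \]
where $j_m = \pm$ if $\varepsilon_m = \pm 1$, $P_{\pm,2}(x) = \tfrac{1}{2}(1 \pm \tanh(\pi x))$ for $x \in \mathbb{R}$, and $\widetilde{\mathcal{A}}_{\pm,m}(r, x)$ are given by (8.11).

Given $p \in (1, \infty)$, let $P_2^\pm := \tfrac{1}{2}(I \pm S_2) \in \mathcal{B}(L^p(\mathbb{R}_+))$, where
\[ (S_2 f)(t) = \frac{1}{\pi i} \int_{\mathbb{R}_+} \frac{(t/\tau)^{1/2 - 1/p} f(\tau)}{\tau - t}\, d\tau, \quad t \in \mathbb{R}_+. \]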
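The formula $P_{\pm,2}(x) = \tfrac{1}{2}(1 \pm \tanh(\pi x))$ is the case $p = 2$ of (3.15). A short verification, using only the addition formulas $\cosh(u + i\pi/2) = i \sinh u$ and $\sinh(u + i\pi/2) = i \cosh u$ for real $u$:
\[ \coth\big( \pi (x + i/2) \big) = \frac{\cosh(\pi x + i\pi/2)}{\sinh(\pi x + i\pi/2)} = \frac{i \sinh(\pi x)}{i \cosh(\pi x)} = \tanh(\pi x), \qquad x \in \mathbb{R}. \]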
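It follows similarly to [20, Theorem 7.1] that the operators
\[ W_m := \widetilde{A}_{j_m, m} P_2^+ + \widetilde{A}_{-j_m, m} P_2^-, \qquad \widetilde{W}_m := \widetilde{A}_{j_m, m}^{-1} P_2^+ + \widetilde{A}_{-j_m, m}^{-1} P_2^- \tag{8.13} \]
are Fredholm on the space $L^p(\mathbb{R}_+)$. Consider the isomorphisms
\[ \Upsilon_0 : L^p(\mathbb{R}_+) \to L^p(\mathbb{R}_+, d\mu), \quad (\Upsilon_0 f)(r) = r^{1/p} f(r) \ (r \in \mathbb{R}_+), \qquad \Psi_0 : \mathcal{B}(L^p(\mathbb{R}_+)) \to \mathcal{B}(L^p(\mathbb{R}_+, d\mu)), \quad A \mapsto \Upsilon_0 A \Upsilon_0^{-1}. \tag{8.14} \]
We infer from (8.12)–(8.14) that
\[ \Psi_0(W_m \widetilde{W}_m) \simeq \Psi_0(\widetilde{W}_m W_m) \simeq \operatorname{Op}(\det \mathcal{Y}_m \det \widetilde{\mathcal{Y}}_m) = \operatorname{Op}(\det \mathcal{N}_m). \tag{8.15} \]
As the operators $W_m$ and $\widetilde{W}_m$ are Fredholm on the space $L^p(\mathbb{R}_+)$, we deduce from (8.15) that the operator $\operatorname{Op}(\det \mathcal{N}_m)$ is Fredholm on the space $L^p(\mathbb{R}_+, d\mu)$. Since the entries of the operator matrices $\operatorname{Op}(\mathcal{N}_m)$ and $\operatorname{Op}(\widetilde{\mathcal{N}}_m)$ belong to the Banach algebra $\mathfrak{D}_p$ and therefore their pairwise commutators are compact operators on the space $L^p(\mathbb{R}_+, d\mu)$ in view of Theorem 5.2, it follows that the operators $\operatorname{Op}(\mathcal{N}_m)$ and $\operatorname{Op}(\widetilde{\mathcal{N}}_m)$ are Fredholm on the space $L^p_N(\mathbb{R}_+, d\mu)$ if and only if the Mellin pseudodifferential operators $\operatorname{Op}(\det \mathcal{N}_m)$ and $\operatorname{Op}(\det \widetilde{\mathcal{N}}_m)$ with symbols $\det \mathcal{N}_m, \det \widetilde{\mathcal{N}}_m \in \mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))$ are Fredholm on the space $L^p(\mathbb{R}_+, d\mu)$. By Theorem 5.3 (see (5.6)),
\[ \det \mathcal{N}_m(\xi, x) \ne 0 \quad \text{and} \quad \det \widetilde{\mathcal{N}}_m(\xi, x) \ne 0 \quad \text{for all } (\xi, x) \in \mathfrak{M}, \tag{8.16} \]
where the compact set $\mathfrak{M}$ is defined by (5.4). By (8.7), for every $m = 1, 2, \dots, N$, the Fredholmness of the operators $\operatorname{Op}(\mathcal{N}_m)$ and $\operatorname{Op}(\widetilde{\mathcal{N}}_m)$ on the space $L^p_N(\mathbb{R}_+, d\mu)$ implies the Fredholmness of the operators $Y_m$ and $\widetilde{Y}_m$ on the space $L^p(\Gamma)$. Hence, we deduce from (8.16) for $m = 1, 2, \dots, N$ that
\[ \det(\widetilde{\mathcal{Y}}_m(\xi, x)) \ne 0 \quad \text{for all } (\xi, x) \in \mathfrak{M}_0 \cup \mathfrak{M}_\infty. \tag{8.17} \]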
Further, we infer from (8.2) and (8.6) that N 0 := YY 0 Y 1 Y 2 ⋯Y N ≃ d + P + ,2 + d - P - ,2 + H, ð8:18Þ
N
H :=
m=0
H m Y m + 1 Y m + 2 ⋯Y N ,
where H0 is given in (8.2) and -1
-1
H 1 := ðA + cγ + V γ + - A - cγ - V γ - ÞðP + A - ,1 P - ,2 - P - A + ,1 P + ,2 ÞY 2 Y 3 ⋯Y N , -1
-1
-1
-1
H m := ðA + cγ + V γ + A + ,1 ⋯A + ,m - 1 - A - cγ - V γ - A - ,1 ⋯A - ,m - 1 Þ -1
-1
× ðP + ,2 A - ,m P - ,2 - P - ,2 A + ,m P + ,2 ÞY m + 1 Y m + 2 ⋯Y N Y m + 1 Y m + 2 ⋯Y N = I
if
d ðeiβm rÞ := d,m ðrÞ =
m=N am ðrÞ
for
ðm = 2, . . ., NÞ,
m = 1, 2, . . ., N, if a,m ≫ b,m ,
- b,m ðrÞcm- 1 ðrÞ if b,m ≫ a,m ,
r 2 + :
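Since the operators $Y_0$ and $\widetilde{Y}_m$ for $m = 1, 2, \dots, N$ are Fredholm on $L^p(\Gamma)$, we conclude from (8.18) that the operator $Y$ is Fredholm on the space $L^p(\Gamma)$ if and only if so is the operator $N_0$. But $\Psi(N_0) = \operatorname{Op}(\mathcal{N}_0)$ is a Mellin pseudodifferential operator with a quasicontinuous symbol $\mathcal{N}_0 \in [\mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))]_{N \times N}$. The operator $\operatorname{Op}(\mathcal{N}_0)$ is Fredholm on the space $L^p_N(\mathbb{R}_+, d\mu)$ if and only if the operator $\operatorname{Op}(\det \mathcal{N}_0)$ is Fredholm on the space $L^p(\mathbb{R}_+, d\mu)$. By (8.18), we infer that
\[ \det \mathcal{N}_0(r, x) = \det\big[ \operatorname{diag}\{d_{+,m}(r)\}_{m=1}^N \mathcal{P}_{+,2}(r, x) + \operatorname{diag}\{d_{-,m}(r)\}_{m=1}^N \mathcal{P}_{-,2}(r, x) \big] \]
\[ = \Big( \prod_{\varepsilon_m = 1} d_{+,m}(r) \Big) \Big( \prod_{\varepsilon_m = -1} d_{-,m}(r) \Big) P_+(x) + \Big( \prod_{\varepsilon_m = -1} d_{+,m}(r) \Big) \Big( \prod_{\varepsilon_m = 1} d_{-,m}(r) \Big) P_-(x) \]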
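for $(r, x) \in \mathfrak{M}_{+\infty} \cup \mathfrak{M}_{-\infty}$. Since the functions $d_{\pm,m}$ are invertible in $L^\infty(\mathbb{R}_+)$ for every $m = 1, 2, \dots, N$ in view of Theorem 3.1 for the invertible operators $A_\pm$, we conclude that $\operatorname{ess\,inf}_{r \in \mathbb{R}_+} |\det \mathcal{N}_0(r, \pm\infty)| > 0$. Hence, we infer from Theorem 5.3 (see also [27, Theorem 7.1]) that the operator $\operatorname{Op}(\det \mathcal{N}_0)$ is Fredholm on the space $L^p(\mathbb{R}_+, d\mu)$ if and only if $\det \mathcal{N}_0(\xi, x) \ne 0$ for all $(\xi, x) \in \mathfrak{M}_0 \cup \mathfrak{M}_\infty$.

Since the Mellin pseudodifferential operator $\Psi(Y_0 \widetilde{Y}_0) = \operatorname{Op}(\mathcal{Y}_0 \widetilde{\mathcal{Y}}_0)$ with symbol
\[ \mathcal{Y}_0 \widetilde{\mathcal{Y}}_0 = (\mathcal{D}_+ \mathcal{P}_+ + \mathcal{D}_- \mathcal{P}_-)(\mathcal{D}_+^{-1} \mathcal{P}_+ + \mathcal{D}_-^{-1} \mathcal{P}_-) \in [\mathcal{E}(\mathbb{R}_+, V(\mathbb{R}))]_{N \times N} \]
is Fredholm on the space $L^p_N(\mathbb{R}_+, d\mu)$, we deduce from Theorem 5.3 that
\[ \det[\mathcal{Y}_0(\xi, x) \widetilde{\mathcal{Y}}_0(\xi, x)] \ne 0 \quad \text{for all } (\xi, x) \in \mathfrak{M}. \tag{8.19} \]
It follows from (8.18) for all $(\xi, x) \in \mathfrak{M}_0 \cup \mathfrak{M}_\infty$ that
\[ \det \mathcal{N}_0(\xi, x) = \det \mathcal{Y}(\xi, x) \det \mathcal{Y}_0(\xi, x) \det \widetilde{\mathcal{Y}}_1(\xi, x) \cdots \det \widetilde{\mathcal{Y}}_N(\xi, x), \tag{8.20} \]
where $\det \mathcal{Y}_0(\xi, x) \ne 0$ by (8.19), $\det \widetilde{\mathcal{Y}}_m(\xi, x) \ne 0$ for all $m \in \{1, 2, \dots, N\}$ in view of (8.17), and with $\mathcal{A}_\pm$ given by (8.3), $\det \mathcal{Y}(\xi, x) = \det(\mathcal{A}_+(\xi, x) \mathcal{P}_+(\xi, x) + \mathcal{A}_-(\xi, x) \mathcal{P}_-(\xi, x))$ is calculated by formula (3.14) similarly to [31, formula (3.12), Theorem 3.1]. Hence, we infer from (3.16) and (8.20) that $\det \mathcal{N}_0(\xi, x) \ne 0$ for all $(\xi, x) \in \mathfrak{M}_0 \cup \mathfrak{M}_\infty$ if and only if $\det \mathcal{Y}(\xi, x) \ne 0$ for these points, which completes the proof. □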
9 General Form of Singular Integral Operators with Shifts

To prove Theorem 3.3 under the conditions of Section 3, we need to find, by analogy with [10], a general form of operators in the $C^*$-algebra
\[ \mathfrak{B} := \operatorname{alg}\{QC(\Gamma), S_\Gamma, U_\alpha\} \subset \mathcal{B}(L^2(\Gamma)) \tag{9.1} \]
generated by all multiplication operators $aI$ with $a \in QC(\Gamma)$, by the Cauchy singular integral operator $S_\Gamma$ defined by (1.2) and by the unitary shift operator $U_\alpha : f \mapsto |\alpha'|^{1/2} (f \circ \alpha)$. Clearly, $U_\alpha^n \in \mathfrak{B}$ for all $n \in \mathbb{Z}$.
The cyclic group $G = \{\alpha^n : n \in \mathbb{Z}\}$ generated by the shift $\alpha$ has the set $\Lambda := \{0, \infty\} \subset \Gamma$ of common fixed points and acts freely on the set $\Gamma_{arc} := \Gamma \setminus \Lambda$. Thus, the group $G$ acts topologically freely on $\Gamma$, which means that for every finite set $F \subset G \setminus \{e\}$, where $e = \alpha^0$ is the unit of $G$, and every open set $\Omega \subset \Gamma$, there is a $t \in \Omega$ such that $g(t) \ne t$ for every $g \in F$ (cf. [9]). Let $\Lambda^\circ$ and $\partial\Lambda$ be, respectively, the interior and the boundary of $\Lambda$. Then $\Lambda^\circ = \emptyset$ and $\partial\Lambda = \Lambda$. For the set $G_\Gamma$ of all $G$-orbits $G(t) := \{g(t) : g \in G\}$ of points $t \in \Gamma$, we have the following partition: $G_\Gamma = G_\Lambda \cup G_{arc}$, where $G_\Lambda$ is the set of all one-point $G$-orbits on $\Gamma$, and $G_{arc}$ is the set of all countable $G$-orbits on $\Gamma_{arc}$. For every $k = 1, 2, \dots, N$, we define the mapping
\[ \gamma_k : \mathbb{R}_+ \to \Gamma_k, \qquad \gamma_k(r) := e^{i\beta_k} r \quad \text{for all } r \in \mathbb{R}_+. \tag{9.2} \]
Fix tw 2 w for every G-orbit w 2 Garc, and put Γk,arc := Γk ∖ Λ for k = 1, 2, . . . , N. For every G-orbit w 2 Garc, we also define tw := γ k- 1 ðt w Þ and λw := γ k if t w 2 Γk,arc for k = 1, 2, . . ., N:
ð9:3Þ
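Along with $\mathfrak{B}$ given by (9.1), we consider its $C^*$-subalgebra
\[ \mathfrak{A} := \operatorname{alg}\{QC(\Gamma), U_\alpha\} \subset \mathcal{B}(L^2(\Gamma)) \tag{9.4} \]
generated by the operators $U_\alpha$ and $aI$ for all $a \in QC(\Gamma)$. Let $\mathfrak{A}^0$ and $\mathfrak{B}^0$ be the non-closed subalgebras of $\mathfrak{B}$ that consist of all operators of the form $\sum_{i=1}^n T_{i,1} T_{i,2} \cdots T_{i,j_i}$ ($n, j_i \in \mathbb{N}$), where $T_{i,k}$ are, respectively, the generators $aI$ ($a \in QC(\Gamma)$) and $U_\alpha$ of the $C^*$-algebra $\mathfrak{A}$ given by (9.4), or the generators $aI$ ($a \in QC(\Gamma)$), $S_\Gamma$ and $U_\alpha$ of the $C^*$-algebra $\mathfrak{B}$. The algebras $\mathfrak{A}^0$ and $\mathfrak{B}^0$ are dense subalgebras of $\mathfrak{A}$ and $\mathfrak{B}$, respectively. Let $\mathfrak{H}$ be the closed two-sided ideal of $\mathfrak{B}$ being the closure of the set
\[ \mathfrak{H}^0 := \Big\{ \sum_{i=1}^m B_i H_i C_i : B_i, C_i \in \mathfrak{B}^0, \ H_i \in \mathcal{H}, \ m \in \mathbb{N} \Big\} \subset \mathfrak{B}, \tag{9.5} \]
where $\mathcal{H} := \{ [aI, S_\Gamma], \ U_\alpha^n S_\Gamma (U_\alpha^n)^* - S_\Gamma : a \in QC(\Gamma), \ n \in \mathbb{Z} \}$. The ideal $\mathcal{K} = \mathcal{K}(L^2(\Gamma))$ is contained in $\mathfrak{H}$ (see, e.g., [8, Section 11]). Applying Theorems 6.1, 6.2 and 5.4, we infer that the operators $H \in \mathfrak{H}$ have point singularities at $0$ and $\infty$ and are locally compact at all other points of $\Gamma$ due to (5.7), which means that $cH \simeq HcI \simeq 0$ for all functions $c \in C(\Gamma)$ that vanish at $0$ and $\infty$.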
For every k = 1, 2, . . ., N, we define the set Ωk,arc := w2Gk,arc
M t ðQCðÞÞ, w
ð9:6Þ
where $G_{k,arc}$ is the set of all $G$-orbits of points $t \in \Gamma_{k,arc}$. For every $k = 1, 2, \dots, N$, every $\xi \in \Omega_{k,arc}$ and every operator $A = \sum_{m \in F} a_m U_\alpha^m \in \mathfrak{A}^0$ with a finite set $F \subset \mathbb{Z}$ and coefficients $a_m \in QC(\Gamma)$, we define similarly to (7.2) the discrete operators $A_{k,\xi}$ on the space $l^2 = l^2(\mathbb{Z})$ by
\[ A_{k,\xi} = \sum_{m \in F} a_{m,k,\xi} V^m, \tag{9.7} \]
where $(Vf)(n) = f(n + 1)$ for all $f \in l^2$ and all $n \in \mathbb{Z}$, the coefficients $a_{m,k,\xi} \in l^\infty$ are given by $a_{m,k,\xi}(n) = ((a_m)_k \circ \alpha_k^n)(\xi)$ for all $n \in \mathbb{Z}$, and $((a_m)_k \circ \alpha_k^n)(\xi)$ means the value of the Gelfand transform of the function $(a_m)_k \circ \alpha_k^n \in QC(\mathbb{R}_+)$ at the point $\xi \in \Omega_{k,arc}$. Since $\Lambda^\circ = \emptyset$, we obtain the following crucial lemma by analogy with [10, Lemma 8.2].

Lemma 9.1 Every operator $B \in \mathfrak{B}^0$ is represented in the form
\[ B = A_+ P_+ + A_- P_- + H_B, \quad \text{with } A_\pm = \sum_{m \in F} a_m^\pm U_\alpha^m \in \mathfrak{A}^0, \ H_B \in \mathfrak{H}^0, \tag{9.8} \]
where $F$ is a finite subset of $\mathbb{Z}$, $a_m^\pm \in QC(\Gamma)$ for all $m \in F$,
\[ \max_{k = 1, 2, \dots, N} \ \sup_{\xi \in \Omega_{k,arc}} \|A^\pm_{k,\xi}\|_{\mathcal{B}(l^2)} \le \inf\{ \|B + H\|_{\mathcal{B}(L^2(\Gamma))} : H \in \mathfrak{H}^0 \}, \tag{9.9} \]
the sets $\Omega_{k,arc}$ are given by (9.6), and for every $k = 1, 2, \dots, N$ and every $\xi \in \Omega_{k,arc}$ the operators $A^\pm_{k,\xi} \in \mathcal{B}(l^2)$ are defined for $A_\pm \in \mathfrak{A}^0$ as in (9.7).

Proof Fix $B \in \mathfrak{B}^0$. Since $U_\alpha a U_\alpha^* = (a \circ \alpha) I$ and $a \circ \alpha \in QC(\Gamma)$ along with $a \in QC(\Gamma)$ in view of Theorem 4.2 applied to the diagonal matrix function being the coefficient in $\Psi[(a \circ \alpha) I]$ (see (6.1)), we deduce from (9.5) that $B$ is of the form (9.8). Thus, we only need to prove inequality (9.9). First, by analogy with [7, Theorem 7.2], let us prove that
\[ \|A^\pm_{k,\xi}\|_{\mathcal{B}(l^2)} \le \|B + H\|_{\mathcal{B}(L^2(\Gamma))} \tag{9.10} \]
for every H 2 ℌ0 , every k = 1, 2, . . ., N and every ξ 2 Ωk,arc. Fix H 2 ℌ0 , which implies that B + H = A + P + + A - P - + H0,
ð9:11Þ
with H 0 = H B + H 2 ℌ0 :
For every n 2 , let Πn 2 Bðl2 Þ be the multiplication operator by the characteristic function of the finite set F n = fm 2 : jmj ≤ ng. Fix k = 1, 2, . . ., N, w 2 Gk,arc, ξ 2 M t ðQCðÞÞ. Then w
k A k,ξ k Bðl2 Þ = lim inf k Πn Ak,ξ Πn k Bðl2 Þ :
ð9:12Þ
n→1
Choose a segment u ⊂ + such that t w 2 u and the segments αm k ðuÞ are disjoint subsets of + for all m 2 . Let M u ðQCðÞÞ := τ2u M τ ðQCðÞÞ. Consider the Hilbert space l2(Fn) consisting of the restrictions of functions f 2 l2 to the (2n + 1)-point set Fn. Let χ n be the characteristic function of the set un = γ k ðun Þ ⊂ Γk,arc , where un := nm = - n αm k ðuÞ ⊂ + . Then χ nH0χ nI ≃ 0 on the space L2( Γ). Applying the isometric isomorphism σ n : L2 ðun Þ → L22n + 1 ðuÞ, ðσ n φÞðtÞ = ½ðU m α φÞðγ k ðtÞÞm = - n , t 2 u, n
we infer form (9.11) similarly to [7, Theorem 7.2] that σ n χ n ðB + HÞχ n σ n- 1 ≃ σ n ðχ n A + χ n P + χ n I + χ n A - χ n P - χ n I Þσ n- 1 +
-
≃ AðnÞ Pu+ + AðnÞ Pu- 2 BðL22n + 1 ðuÞÞ, ð9:13Þ 1 where P u = 2 ðI Su Þ and Su is given by (1.2) with Γ replaced by u ⊂ + ,
and AðnÞ are (2n + 1) × (2n + 1) matrix functions with entries in QCðÞju
m such that AðnÞ ðxÞ := ½ða s - m,k ∘αk ÞðxÞm,s = - n for x 2 u. Then it follows n
similarly to [14, Theorem 7.1 and Section 7.4] that the operators AðnÞ I are +
-
invertible on the space L22n + 1 ðuÞ if the operator AðnÞ Pu+ + AðnÞ PuFredholm on this space. This implies that
+
-
kAðnÞ I kBðL22n + 1 ðuÞÞ ≤ jAðnÞ Pu+ + AðnÞ Pu- j:
is
ð9:14Þ
Y. I. Karlovich and J. Rosales-Méndez
Combining (9.13) and (9.14), we infer that
kAðnÞ I kBðL22n + 1 ðuÞÞ ≤ jχ n ðB + HÞχ n Ij ≤ k B + H k BðL2 ðΓÞÞ :
ð9:15Þ
By (9.15), for every η 2 M u ðQCðÞÞ and every n 2 , it follows that
kΠn Ak,η Πn kBðl2 Þ ≤ kAðnÞ I kBðL22n + 1 ðuÞÞ ≤ k B + H k BðL2 ðΓÞÞ , which implies that k Πn A k,ξ Πn k Bðl2 Þ ≤
sup η2M u ðQCðÞÞ
k Πn A k,η Πn k Bðl2 Þ ≤ k B + H k BðL2 ðΓÞÞ : ð9:16Þ
Applying (9.12) and (9.16), we infer that for every k = 1, 2, . . ., N, every w 2 Gk,arc and every ξ 2 M t ðQCðÞÞ ⊂ Ωk,arc , w
k A k,ξ k Bðl2 Þ = lim inf k Πn Ak,ξ Πn k Bðl2 Þ ≤ k B + H k BðL2 ðΓÞÞ , n→1
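which implies (9.10) for all $\xi \in \Omega_{k,\mathrm{arc}}$ and all $H \in \mathfrak{H}^0$. Hence, by (9.10),
\[ \max_{k = 1, 2, \dots, N} \ \sup_{\xi \in \Omega_{k,\mathrm{arc}}} \|A^{\pm}_{k,\xi}\|_{\mathcal{B}(l^2)} \le \|B + H\|_{\mathcal{B}(L^2(\Gamma))} \quad \text{for all } H \in \mathfrak{H}^0, \]
which immediately gives (9.9). □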
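Consider the C*-algebra of $2 \times 2$ diagonal matrices with $\mathfrak{A}$-valued entries.
Theorem 9.2 Every operator $B \in \mathfrak{B}$ is uniquely represented in the form
\[ B = A^{+}P_{+} + A^{-}P_{-} + H_B, \tag{9.17} \]
where $A^{\pm}$ are functional operators in the C*-algebra $\mathfrak{A}$, $P_{\pm} = \frac{1}{2}(I \pm S_\Gamma)$ and $H_B \in \mathfrak{H}$. Moreover, the mapping $B \mapsto \mathrm{diag}\{A^{+}, A^{-}\}$ is a C*-algebra homomorphism of $\mathfrak{B}$ onto this algebra of diagonal matrices, whose kernel is $\mathfrak{H}$, and
\[ \|A^{\pm}\| \le \inf \{ \|B + H\|_{\mathcal{B}(L^2(\Gamma))} : H \in \mathfrak{H} \} \le |B|. \tag{9.18} \]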
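Proof Fix $B \in \mathfrak{B}^0$ represented in the form (9.8). It is obvious in view of (9.9) that the mapping $B \mapsto \mathrm{diag}\{A^{+}, A^{-}\}$ is an algebraic *-homomorphism of the non-closed algebra $\mathfrak{B}^0$ into the algebra of diagonal matrices with $\mathfrak{A}$-valued entries, whose kernel is $\mathfrak{H}^0$. This map is given on the generators of $\mathfrak{B}$ by
\[ aI \mapsto \mathrm{diag}\{aI, aI\}, \qquad U_g \mapsto \mathrm{diag}\{U_g, U_g\}, \qquad S_\Gamma \mapsto \mathrm{diag}\{I, -I\}. \]
It follows from [9, Theorem 5.2] adapted to the space $L^2(\Gamma)$ that, respectively,
\[ \|A^{\pm}\|_{\mathcal{B}(L^2(\Gamma))} = \max_{k = 1, 2, \dots, N} \ \sup_{\xi \in \Omega_{k,\mathrm{arc}}} \|A^{\pm}_{k,\xi}\|_{\mathcal{B}(l^2)}. \tag{9.19} \]
Condition (9.18) for $B \in \mathfrak{B}^0$ follows from (9.19), Lemma 9.1 and $\mathcal{K} \subset \mathfrak{H}$. By continuity, (9.18) is valid for any operator $B \in \mathfrak{B}$, which implies the uniqueness of decomposition (9.17) for any operator $B \in \mathfrak{B}$. □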
10 Proof of Theorem 3.3
The proof of Theorem 3.3 is divided into the proofs of several theorems. According to [10, Section 4], with the C-algebra B given by (9.1), we associate the Hilbert space H1 :=
H1 w, w2Garc
ð10:1Þ
where the Hilbert spaces H1 w are given by 2 2 2 H1 w := l ðM t ðQCðÞÞ × f 1g, l ð, ÞÞ for all w 2 Garc : w
ð10:2Þ
We now consider the representation Φ1 of the C-algebra B on the Hilbert space (10.1), defined for every B 2 B by Φ1 ðBÞ :=
Φ1 w ðBÞ, w2Garc
ð10:3Þ
where Φ1 is the direct sum of the C-algebra homomorphisms 1 1 1 Φ1 w : B → BðHw Þ, B ° Φw ðBÞ = Symw ðBÞI ðw 2 Garc Þ,
ð10:4Þ
defined initially on the generators aI, SΓ, and U κα of the C-algebra B for all a 2 QC( Γ) and all κ 2 as follows. If w 2 Garc, tw 2 Γarc, and t w 2 + , then Φ1 w ðBÞ are operators of multiplication by the infinite matrix functions 2 2 Sym1 w ðBÞ : M t ðQCðÞÞ × f 1g → Bðl ð, ÞÞ w
ð10:5Þ
whose values at the points ðξ, xÞ 2 M t ðQCðÞÞ × f 1g define bounded w
linear operators on the space l2 ð, 2 Þ and are given on the generators of B by n n ½Sym1 w ðaIÞðξ, xÞ:= diagfdiagfða∘α ∘λw ÞðξÞ, ða∘α ∘λw ÞðξÞggn2 ,
½Sym1 w ðSΓ Þðξ, xÞ:= diagfdiagf tanh ðπxÞ, - tanh ðπxÞggn2 , κ ½Sym1 w ðU α Þðξ, xÞ:= ðdiagfδn,m - κ , δn,m - κ gÞn,m2 ,
ð10:6Þ where for every a 2 QC( Γ) and every tw 2 Γk,arc with k = 1, 2, . . ., N, the function a ∘ αn ∘ γ k belongs to QCð + Þ, δn,m is the Kronecker symbol, and γ k and λw are defined by (9.2) and (9.3), respectively. Applying Theorem 9.2, we establish the following result by analogy with [10, Theorem 10.1] (see also [10, Theorem 4.1]). Theorem 10.1 Let a star-like curve Γ and a shift α : Γ → Γ satisfy the conditions of Section 3.1, where α has only two fixed points 0 and 1. Then 1 for every w 2 Garc the map Φ1 w : B → BðHw Þ defined on the generators of the C -algebra B by formulas (10.4)–(10.6) extends to a C-algebra homo 1 morphism Φ1 w of B into the C -algebra BðHw Þ such that 1 k Φ1 w ðBÞ k BðHw Þ ≤ jBj =
inf
K2KðL2 ðΓÞÞ
k B + K k BðL2 ðΓÞÞ for all B 2 B, ð10:7Þ
1 2 where H1 w is given by (10.2) and kerΦw ⊃ KðL ðΓÞÞ.
Proof Let w 2 Garc. It is easily seen from (10.6) that for every w 2 Garc there exists a matrix Dw such that Dw I 2 Bðl2 ð, 2 ÞÞ, Dw has exactly one nonzero entry in each row and each column, all these entries equal 1, and the 2 1 -1 similarity transform Sym1 w ðBÞ ° Dw Symw ðBÞDw := ðAi,j Þi,j = 1 changing positions of rows and columns sends odd diagonal entries into A1,1 and even
diagonal entries into A2,2. Then for any operator B 2 B0 represented in the form B = A+P+ + A-P- + HB by Lemma 9.1, every k = 1, 2, . . ., N, every w 2 Gk,arc, and every point ξ 2 M t ðQCðÞÞ, we infer from (10.6) that w
+ -1 Dw ð½Sym1 w ðBÞðξ, + 1ÞÞDw I = diagfAk,ξ , Ak,ξ g, + -1 Dw ð½Sym1 w ðBÞðξ, - 1ÞÞDw I = diagfAk,ξ , Ak,ξ g:
ð10:8Þ
Applying (9.19) and estimate (9.18), we deduce from (10.8) that k½Sym1 w ðBÞðξ, þ 1ÞkBðl2 ð,2 ÞÞ ≤ max f k A k BðL2 ðΓÞÞ g ≤ jBj, k½Sym1 w ðBÞðξ, - 1ÞkBðl2 ð,2 ÞÞ ≤ max f k A k BðL2 ðΓÞÞ g ≤ jBj:
ð10:9Þ 0 1 By (10.9), the algebraic -homomorphisms Ψ1 w : B → BðHw Þ extend by continuity to the whole C -algebra B, and the -homomorphisms Ψ1 w : 1 □ B → BðHw Þ satisfy (10.7) for all w 2 Garc.
Armed by Theorem 10.1 and applying the C-algebra homomorphisms Φ1 w given by (10.4)–(10.6), we immediately obtain the following necessary Fredholm condition by analogy with [10, Theorem 4.2(iv)]. Theorem 10.2 Under the conditions of Theorem 10.1, if an operator B 2 B is Fredholm on the space L2( Γ), then the operator Φ1(B) defined by (10.3)–(10.6) is invertible on the space H1 given by (10.1)–(10.2), that is, for every w 2 Garc and every ðξ, xÞ 2 M t ðQCðÞÞ × f1g, the operator w
2 2 ½Sym1 w ðBÞðξ, xÞI is invertible on the Hilbert space l ð, Þ and
sup
sup
-1
kð½Sym1 kBðl2 ð,2 ÞÞ < 1: w ðBÞðξ, xÞI Þ
w2Garc ðξ, xÞ2M ðQCðÞÞ × f1g tw
According to Section 3, to prove Theorem 3.3, it remains to obtain the following assertion on the basis of Theorem 10.2.
Theorem 10.3 If the operator $Y$ defined by (1.1) is Fredholm on the space $L^2(\Gamma)$, then the functional operators $A_{\pm} = a_{\pm}I - b_{\pm}V_\alpha$ are invertible on the space $L^2(\Gamma)$.
Proof For every $k = 1, 2, \dots, N$, every point $t \in \Gamma_{k,\mathrm{arc}}$, and every $\xi \in M_{\gamma_k^{-1}(t)}(QC(\mathbb{R}))$, we consider the discrete operators
\[ A^{\pm}_{k,\xi} = a^{\pm}_{k,\xi} I - b^{\pm}_{k,\xi} V \in \mathcal{B}(l^2), \tag{10.10} \]
1 where the functions a k,ξ , bk,ξ 2 l are given by n a k,ξ ðnÞ = ða,k ∘αk ÞðξÞ,
n b k,ξ ðnÞ = ðb,k ∘αk ÞðξÞ,
n 2 ,
b,k := b,k ðα0k Þ - 1∕2 , and a,k ðrÞ = a ðeiβk rÞ, b,k ðrÞ = b ðeiβk rÞ and αk ðrÞ := e - iβk αðeiβk rÞ for r 2 + and k = 1, 2, . . ., N. By (10.6), for w 2 Garc and Y defined by (1.1), the operator 1 Φ1 w ðYÞ = Symw ðYÞI is given by the following infinite matrix function: if tw 2 Γk,arc for k = 1, 2, . . ., N and ξ 2 M t ðQCðÞÞ, then w
n n ½Sym1 w ðYÞðξ, + 1Þ = ðdiagfða + ;k ∘αk ÞðξÞ, ða - ,k ∘αk ÞðξÞgδn,m Þn,m2
+ ðdiagfðb + ,k ∘αnk ÞðξÞ, ðb - ,k ∘αnk ÞðξÞgδn,m - 1 Þn,m2 , n n ½Sym1 w ðYÞðξ, - 1Þ = ðdiagfða - ,k ∘αk ÞðξÞ, ða + ,k ∘αk ÞðξÞgδn,m Þn,m2
+ ðdiagfðb - ,k ∘αnk ÞðξÞ, ðb + ,k ∘αnk ÞðξÞgδn,m - 1 Þn,m2 :
Then for k = 1, 2, . . ., N, tw 2 Γk,arc and ξ 2 M t ðQCðÞÞ, we infer that the w
2 2 operators ½Sym1 w ðYÞðξ, 1ÞI are invertible on the space l ð, Þ if and only if the discrete operators A k,ξ given by (10.10) are invertible on the space 2 2 l = l ðÞ. Thus, if the operator Y is Fredholm on the space L2( Γ), then we deduce from Theorem 10.2 that the discrete operators A k,ξ are invertible on 2 the space l for every k = 1, 2, . . ., N, every w 2 Garc with tw 2 Γk,arc, and every ξ 2 M t ðQCðÞÞ, and w
-1
kðAk,ξ Þ kBðl2 Þ < 1:
sup
sup
w2Gk,arc
ξ2M ðQCðÞÞ tw
Hence, by analogy with [9, Theorem 5.2(ii)], the operators A,k = a,k Ib,k V αk are invertible on the spaces L2( Γk) for all k = 1, 2, . . ., N. This implies the invertibility of both the operators A = aI - bVα on the space L2( Γ) for the Fredholm operator Y 2 BðL2 ðΓÞÞ. □ Finally, combining Theorem 10.3 and Theorem 3.2 in case p = 2, we complete the proof of Theorem 3.3. □
Acknowledgements The authors are grateful to the referees for the useful comments and suggestions.
References
1. Antonevich, A. B. (1988). Linear functional equations. Operator approach. Oper. Theory Adv. Appl. (vol. 83). Basel: Birkhäuser. Russian original: University Press, Minsk, 1988
2. Antonevich, A., & Lebedev, A. (1994). Functional differential equations: I. C*-theory. Pitman Monographs and Surveys in Pure and Applied Mathematics (vol. 70). Harlow: Longman Scientific & Technical
3. Antonevich, A., Belousov, M., & Lebedev, A. (1998). Functional differential equations: II. C*-applications. Part 1 Equations with continuous coefficients. Pitman Monographs and Surveys in Pure and Applied Mathematics (vol. 94). Harlow: Longman Scientific & Technical
4. Antonevich, A., Belousov, M., & Lebedev, A. (1998). Functional differential equations: II. C*-applications. Part 2 Equations with discontinuous coefficients and boundary value problems. Pitman Monographs and Surveys in Pure and Applied Mathematics (vol. 95). Harlow: Longman Scientific & Technical
5. Bastos, M. A., Fernandes, C. A., & Karlovich, Yu. I. (2007). Spectral measures in C*-algebras of singular integral operators with shifts. Journal of Functional Analysis, 242, 86–126
6. Bastos, M. A., Fernandes, C. A., & Karlovich, Yu. I. (2008). C*-algebras of singular integral operators with shifts having the same nonempty set of fixed points. Complex Analysis and Operator Theory, 2, 241–272
7. Bastos, M. A., Fernandes, C. A., & Karlovich, Yu. I. (2014). A C*-algebra of singular integral operators with shifts admitting distinct fixed points. Journal of Mathematical Analysis and Applications, 413, 502–524
8. Bastos, M. A., Fernandes, C. A., & Karlovich, Yu. I. (2018). A C*-algebra of singular integral operators with shifts and piecewise quasicontinuous coefficients. Operator Theory, Operator Algebras, and Matrix Theory (pp. 25–64), Oper. Theory Adv. Appl. (vol. 267). Cham: Birkhäuser/Springer
9. Bastos, M. A., Fernandes, C. A., & Karlovich, Yu. I. (2019). Invertibility criteria in C*-algebras of functional operators with shifts and PQC coefficients. Integral Equations and Operator Theory, 91, 19
10. Bastos, M. A., Fernandes, C. A., & Karlovich, Yu. I. (2021). On C*-algebras of singular integral operators with PQC coefficients and shifts with fixed points. Complex Variables and Elliptic Equations, 67(3), 581–614
11. Böttcher, A., & Karlovich, Yu. I. (1997). Carleson curves, Muckenhoupt weights, and Toeplitz operators. Progress in Mathematics (vol. 154). Basel: Birkhäuser
12. Böttcher, A., Karlovich, Yu. I., & Silbermann, B. (1994). Singular integral equations with PQC coefficients and freely transformed argument. Mathematische Nachrichten, 166, 113–133
13. Böttcher, A., Karlovich, Yu. I., & Spitkovsky, I. M. (2002). Convolution operators and factorization of almost periodic matrix functions. Basel: Birkhäuser
14. Böttcher, A., Roch, S., Silbermann, B., & Spitkovsky, I. M. (1990). A Gohberg-Krupnik-Sarason symbol calculus for algebras of Toeplitz, Hankel, Cauchy, and Carleman operators. Topics in Operator Theory. Ernst D. Hellinger Memorial (pp. 189–234). Oper. Theory Adv. Appl. (vol. 48). Basel: Birkhäuser
15. Böttcher, A., & Silbermann, B. (2006). Analysis of Toeplitz operators (2nd ed.). Berlin: Springer
16. Garnett, J. B. (1981). Bounded analytic functions. New York: Academic Press
17. Karlovich, A. Yu., & Karlovich, Yu. I. (2002). Invertibility in Banach algebras of functional operators with non-Carleman shifts. In Proceedings: Functional Analysis (pp. 107–124), Ukrainian Mathematical Congress-2001, Kiev, Ukraine, August 21–23, 2001. Kyïv: Instytut Matematyky NAN Ukraïny
18. Karlovich, A. Yu., Karlovich, Yu. I., & Lebre, A. B. (2011). Sufficient conditions for Fredholmness of singular integral operators with shifts and slowly oscillating data. Integral Equations and Operator Theory, 70, 451–483
19. Karlovich, A. Yu., Karlovich, Yu. I., & Lebre, A. B. (2011). Necessary conditions for Fredholmness of singular integral operators with shifts and slowly oscillating data. Integral Equations and Operator Theory, 71, 29–53
20. Karlovich, A. Yu., Karlovich, Yu. I., & Lebre, A. B. (2016). On a weighted singular integral operator with shifts and slowly oscillating data. Complex Analysis and Operator Theory, 10, 1101–1131
21. Karlovich, A. Yu., Karlovich, Yu. I., & Lebre, A. B. (2017). The index of weighted singular integral operators with shifts and slowly oscillating data. Journal of Mathematical Analysis and Applications, 450, 606–630
22. Karlovich, A. Yu., Karlovich, Yu. I., & Lebre, A. B. (2018). Criteria for n(d)-normality of weighted singular integral operators with shifts and slowly oscillating data. Proceedings of the London Mathematical Society, 116, 997–1027
23. Karlovich, Yu. I. (1989). On algebras of singular integral operators with discrete groups of shifts in Lp-spaces. Soviet Mathematics - Doklady, 39, 48–53
24. Karlovich, Yu. I. (2006). An algebra of pseudodifferential operators with slowly oscillating symbols. Proceedings of the London Mathematical Society, 92, 713–761
25. Karlovich, Yu. I. (2007). A local-trajectory method and isomorphism theorems for nonlocal C*-algebras. Modern Operator Theory and Applications. The Igor Borisovich Simonenko Anniversary (pp. 137–166), Oper. Theory Adv. Appl. (vol. 170). Basel: Birkhäuser
26. Karlovich, Yu. I. (2007). Algebras of pseudo-differential operators with discontinuous symbols. Modern Trends in Pseudo-Differential Operators (pp. 207–233), Oper. Theory Adv. Appl. (vol. 172). Basel: Birkhäuser
27. Karlovich, Yu. I. (2021). On Mellin pseudodifferential operators with quasicontinuous symbols. Special issue paper. Mathematical Methods in the Applied Sciences, 44, 9782–9816
28. Karlovich, Yu. I. (2022). Algebras of Mellin pseudodifferential operators with quasicontinuous symbols. Journal of Pseudo-Differential Operators and Applications, 13, 18. Published online: 04 April 2022. https://doi.org/10.1007/s11868-022-00448-9
29. Karlovich, Yu. I., & Kravchenko, V. G. (1977). On a singular integral operator with non-Carleman shifts on an open contour. Soviet Mathematics - Doklady, 18, 1263–1267
30. Karlovich, Yu. I., & Kravchenko, V. G. (1984). An algebra of singular integral operators with piecewise-continuous coefficients and a piecewise-smooth shift on a composite contour. Mathematics of the USSR-Izvestiya, 23, 307–352
31. Karlovich, Yu. I., & Rosales-Méndez, J. (2022). The Haseman boundary value problem with quasicontinuous coefficients and shifts. Complex Analysis and Operator Theory, 16, 68. Published online: 15 June 2022. https://doi.org/10.1007/s11785-022-01245-4
32. Karlovich, Yu. I., & Silbermann, B. (2004). Fredholmness of singular integral operators with discrete subexponential groups of shifts on Lebesgue spaces. Mathematische Nachrichten, 272, 55–94
33. Kravchenko, V. G., & Litvinchuk, G. S. (1994). Introduction to the theory of singular integral operators with shift. Dordrecht: Kluwer
34. Litvinchuk, G. S. (1977). Boundary value problems and singular integral equations with shift. Moscow: Nauka (in Russian)
35. Litvinchuk, G. S. (2000). Solvability theory of boundary value problems and singular integral equations with shift. Dordrecht: Kluwer
36. Myasnikov, A. G., & Sazonov, L. I. (1977). On singular integral operators with non-Carleman shift. Soviet Mathematics - Doklady, 18, 1559–1562
37. Myasnikov, A. G., & Sazonov, L. I. (1980). Singular integral operators with non-Carleman shift. Soviet Mathematics (Iz. VUZ), 24(3), 22–31
38. Myasnikov, A. G., & Sazonov, L. I. (1980). On singular operators with a non-Carleman shift and their symbols. Soviet Mathematics - Doklady, 22, 531–535
39. Rabinovich, V., Roch, S., & Silbermann, B. (2004). Limit operators and their applications in operator theory. Basel: Birkhäuser
40. Roch, S., Santos, P. A., & Silbermann, B. (2011). Non-commutative Gelfand theories. A tool-kit for operator theorists and numerical analysts. London: Springer
41. Sarason, D. (1975). Functions of vanishing mean oscillation. Transactions of the American Mathematical Society, 207, 391–405
42. Sarason, D. (1977). Toeplitz operators with piecewise quasicontinuous symbols. Indiana University Mathematics Journal, 26, 817–838
43. Semenjuta, V. N. (1977). On singular operator equations with shift on a circle. Soviet Mathematics - Doklady, 18, 1572–1574
44. Soldatov, A. P. (1979). On the theory of singular operators with a shift. Differential Equations, 15, 80–91
45. Soldatov, A. P. (1980). Singular integral operators on the real line. Differential Equations, 16, 98–105
Berezin Number and Norm Inequalities for Operators in Hilbert and Semi-Hilbert Spaces
Cristian Conde, Kais Feki, and Fuad Kittaneh
Abstract Let $(\mathcal{H}_\Omega, \langle \cdot, \cdot \rangle)$ be the reproducing kernel Hilbert space over some (non-empty) set $\Omega$. Let $\hat{k}_\lambda$ and $\hat{k}_\mu$ denote two normalized reproducing kernels of $\mathcal{H}_\Omega$. The Berezin number and the Berezin norm of a bounded linear operator $T$ acting on $\mathcal{H}_\Omega$ are, respectively, given by $\mathrm{ber}(T) = \sup_{\lambda \in \Omega} |\langle T\hat{k}_\lambda, \hat{k}_\lambda \rangle|$ and $\|T\|_{\mathrm{ber}} = \sup_{\lambda, \mu \in \Omega} |\langle T\hat{k}_\lambda, \hat{k}_\mu \rangle|$. Our aim in this chapter is to review and present several inequalities involving $\mathrm{ber}(\cdot)$ and $\|\cdot\|_{\mathrm{ber}}$. In addition, some bounds related to the Berezin number and Berezin norm are established when an additional semi-inner product structure induced by a positive operator $A$ on $\mathcal{H}_\Omega$ is considered.
Keywords Berezin number • Berezin norm • Reproducing kernel Hilbert space • Positive operator • Semi-inner product
C. Conde
Instituto de Ciencias, Universidad Nacional de Gral. Sarmiento and Consejo Nacional de Investigaciones Científicas y Técnicas, Los Polvorines, Argentina
K. Feki
Faculty of Economic Sciences and Management of Mahdia, University of Monastir, Mahdia, Tunisia
Laboratory Physics-Mathematics and Applications (LR/13/ES-22), Faculty of Sciences of Sfax, University of Sfax, Sfax, Tunisia
F. Kittaneh (✉)
Department of Mathematics, The University of Jordan, Amman, Jordan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_55
C. Conde et al.
Mathematics Subject Classification (MSC2020) Primary: 47A63 • Secondary: 46C05, 47A12, 15A60, 47A30
1 Introduction
Reproducing kernel Hilbert spaces have become a crucial tool in many fields, especially machine learning and statistics. Furthermore, they play an important role in some areas of complex analysis, group representation theory, the theory of integral operators, as well as probability theory. In this chapter, we derive various inequalities involving the Berezin number and the Berezin norm of operators. Moreover, after considering an additional semi-inner product structure induced by a nonzero positive operator acting on the corresponding reproducing kernel Hilbert space, we develop some bounds related to the Berezin number and Berezin norm. Before proceeding further, we introduce the following notations and terminologies.
Let $\Omega$ be a non-empty set and $\mathcal{F}(\Omega)$ be the set of all functions from $\Omega$ to $\mathbb{C}$, where $\mathbb{C}$ stands for the field of all complex numbers. A set $\mathcal{H}_\Omega \subseteq \mathcal{F}(\Omega)$ is called a reproducing kernel Hilbert space (RKHS for short) on $\Omega$ if $\mathcal{H}_\Omega$ is a Hilbert space, and for every $\lambda \in \Omega$, the map $E_\lambda : \mathcal{H}_\Omega \to \mathbb{C}$ is bounded, where $E_\lambda(f) = f(\lambda)$ for all $f \in \mathcal{H}_\Omega$. We endow $\mathcal{H}_\Omega$ with the inner product $\langle \cdot, \cdot \rangle$. By the Riesz representation theorem, we deduce that for each $\lambda \in \Omega$, there exists a unique vector $k_\lambda \in \mathcal{H}_\Omega$ such that $f(\lambda) = E_\lambda(f) = \langle f, k_\lambda \rangle$ for every $f \in \mathcal{H}_\Omega$. The map $k : \Omega \times \Omega \to \mathbb{C}$ defined by $k(z, \lambda) = k_\lambda(z) = \langle k_\lambda, k_z \rangle$ is called the reproducing kernel function of the RKHS $(\mathcal{H}_\Omega, \langle \cdot, \cdot \rangle)$. If $\{e_n\}$ denotes any orthonormal basis for $\mathcal{H}_\Omega$, then it has been shown in [28, problem 37] that the kernel function is given by
\[ k(z, \lambda) = \sum_{n=0}^{\infty} e_n(z)\,\overline{e_n(\lambda)}. \]
Let $\mathbb{D}$ be the unit disk of $\mathbb{C}$. A well-known example of an RKHS is the Hardy space, denoted by $H^2(\mathbb{D})$, which is the Hilbert space of all square summable holomorphic functions on $\mathbb{D}$ (cf. [38]). Namely, we have
\[ H^2(\mathbb{D}) := \Big\{ f(z) = \sum_{n \ge 0} a_n z^n \in \mathrm{Hol}(\mathbb{D}) \ ; \ \sum_{n \ge 0} |a_n|^2 < \infty \Big\}, \]
where $\mathrm{Hol}(\mathbb{D})$ denotes the collection of all holomorphic functions on $\mathbb{D}$. Let $f(z) = \sum_{n \ge 0} a_n z^n$ and $g(z) = \sum_{n \ge 0} b_n z^n$ belong to $H^2(\mathbb{D})$. The inner product between $f$ and $g$ is defined as
\[ \langle f, g \rangle_{H^2(\mathbb{D})} = \sum_{n \ge 0} a_n \overline{b_n}. \]
It can be checked that the reproducing kernel for $H^2(\mathbb{D})$, known as the Szegő kernel, is given by
\[ k(z, \lambda) = \frac{1}{1 - z\overline{\lambda}}, \]
for every $z, \lambda \in \mathbb{D}$. For more information, details and references on reproducing kernel Hilbert spaces, the reader is invited to see [5, 36].
Let $(\mathcal{H}_\Omega, \langle \cdot, \cdot \rangle)$ be an RKHS on a set $\Omega$ with inner product $\langle \cdot, \cdot \rangle$ and associated norm $\|\cdot\|$. For $\lambda \in \Omega$, let $\hat{k}_\lambda = k_\lambda / \|k_\lambda\|$ be the normalized reproducing kernel of $\mathcal{H}_\Omega$. Notice that the set $\{\hat{k}_\lambda ; \lambda \in \Omega\}$ is a total set in $\mathcal{H}_\Omega$. For $T \in \mathbb{B}(\mathcal{H}_\Omega)$, i.e., a bounded linear operator on $\mathcal{H}_\Omega$, the function $\widetilde{T}$ defined on $\Omega$ by $\widetilde{T}(\lambda) = \langle T\hat{k}_\lambda, \hat{k}_\lambda \rangle$ is the Berezin symbol of $T$, which was first introduced by Berezin [9, 10]. It is crucial to note that the Berezin transform of an operator provides important information about the operator. For example, it is well-known that on the most familiar reproducing kernel Hilbert spaces, including the Bergman, Hardy, Fock, and Dirichlet Hilbert spaces, the operator is uniquely determined by the Berezin transform, i.e., if $T, S \in \mathbb{B}(\mathcal{H}_\Omega)$, then $T = S$ if and only if $\widetilde{T}(\lambda) = \widetilde{S}(\lambda)$ for all $\lambda \in \Omega$. The reader may consult, for instance, [40] and its references or [30]. For some applications related to the Berezin symbol of operators, the interested reader is invited to consult, for example, the references [27, 31, 32, 35]. The Berezin set, the Berezin number and the Berezin norm of the operator $T$ are, respectively, defined by
\[ \mathrm{Ber}(T) := \{ \widetilde{T}(\lambda) ; \lambda \in \Omega \}, \qquad \mathrm{ber}(T) := \sup_{\lambda \in \Omega} |\widetilde{T}(\lambda)| = \sup_{\lambda \in \Omega} |\langle T\hat{k}_\lambda, \hat{k}_\lambda \rangle| \]
and
\[ \|T\|_{\mathrm{ber}} := \sup \{ |\langle T\hat{k}_\lambda, \hat{k}_\mu \rangle| ; \lambda, \mu \in \Omega \}, \]
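where $\hat{k}_\lambda$ and $\hat{k}_\mu$ are two normalized reproducing kernels of the space $\mathcal{H}_\Omega$ (see [18] and references therein). Recently, several inequalities involving the Berezin number and Berezin norm of operators have been studied (see, e.g., [15, 16, 34] and the references therein). For $T \in \mathbb{B}(\mathcal{H}_\Omega)$, we recall from [25] the following quantity:
\[ \|T\|_{\widetilde{\mathrm{ber}}} := \sup_{\lambda \in \Omega} \|T\hat{k}_\lambda\|. \]
Since $\{\hat{k}_\lambda ; \lambda \in \Omega\}$ is a total set in $\mathcal{H}_\Omega$, one can verify that $\mathrm{ber}(\cdot)$, $\|\cdot\|_{\mathrm{ber}}$ and $\|\cdot\|_{\widetilde{\mathrm{ber}}}$ are norms on $\mathbb{B}(\mathcal{H}_\Omega)$. In addition, it is not difficult to see that
\[ \mathrm{ber}(T) \le \|T\|_{\mathrm{ber}} \le \|T\|_{\widetilde{\mathrm{ber}}} \le \|T\|, \qquad \forall\, T \in \mathbb{B}(\mathcal{H}_\Omega), \tag{1.1} \]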
where $\|T\|$ stands for the usual operator norm of $T$. Before we proceed further, it should be mentioned here that the inequalities (1.1) are in general strict, as is shown in the next example.
Example 1.1 Considering $\mathbb{C}^2$ as an RKHS, it is easy to see that the canonical basis for $\mathbb{C}^2$, $\{e_1, e_2\}$, is precisely the set of normalized reproducing kernels (see [36, pp. 4–5]). If we take $T = \begin{pmatrix} 1 & 3 \\ 0 & 2 \end{pmatrix} \in \mathbb{B}(\mathbb{C}^2)$, then we can easily show that
\[ \mathrm{ber}(T) = \sup_{i \in \{1, 2\}} |\langle Te_i, e_i \rangle| = 2, \qquad \|T\|_{\mathrm{ber}} = \sup_{i, j \in \{1, 2\}} |\langle Te_i, e_j \rangle| = 3, \]
\[ \|T\|_{\widetilde{\mathrm{ber}}} = \sup_{i \in \{1, 2\}} \|Te_i\| = \sqrt{13} \qquad \text{and} \qquad \|T\| = \sqrt{3\sqrt{5} + 7}. \]
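The four values in Example 1.1 can be checked numerically. The following is only a sketch under the assumption made in the example, namely that $\mathbb{C}^2$ is viewed as an RKHS whose normalized reproducing kernels are the canonical basis vectors; the helper names are illustrative and do not come from the chapter.

```python
import math

def berezin_quantities(T):
    """Return (ber(T), ||T||_ber, sup_j ||T e_j||) for a square matrix T,
    assuming the normalized kernels are the canonical basis vectors."""
    n = len(T)
    # With the canonical basis, <T e_j, e_i> is simply the entry T[i][j].
    ber = max(abs(T[i][i]) for i in range(n))
    ber_norm = max(abs(T[i][j]) for i in range(n) for j in range(n))
    # ||T e_j|| is the Euclidean norm of the j-th column.
    col_norm = max(
        math.sqrt(sum(abs(T[i][j]) ** 2 for i in range(n))) for j in range(n)
    )
    return ber, ber_norm, col_norm

def operator_norm_2x2(T):
    """Largest singular value of a real 2x2 matrix via the eigenvalues of T^T T."""
    (a, b), (c, d) = T
    g11, g22, g12 = a * a + c * c, b * b + d * d, a * b + c * d
    tr, det = g11 + g22, g11 * g22 - g12 * g12
    return math.sqrt((tr + math.sqrt(tr * tr - 4 * det)) / 2)

T = [[1, 3], [0, 2]]
ber, ber_norm, col_norm = berezin_quantities(T)
op_norm = operator_norm_2x2(T)
print(ber, ber_norm, col_norm, op_norm)
```

For this matrix the four quantities are strictly increasing, which is exactly the strictness of the chain (1.1) claimed in the example.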
It is well-known that the operator norm on Hilbert spaces is submultiplicative. However, this property may fail in general for the Berezin norm of operators, even for positive operators, as is shown in the next example. Before that, we recall that an operator $T \in \mathbb{B}(\mathcal{H}_\Omega)$ is said to be positive if $\langle Tx, x \rangle \ge 0$ for every $x \in \mathcal{H}_\Omega$, and we write $T \ge 0$.
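Example 1.2 As in the previous example, we consider $\mathbb{C}^2$ as an RKHS and $T = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \in \mathbb{B}(\mathbb{C}^2)$. Clearly, $T$ is an orthogonal projection. So, $T \ge 0$. Further, we have
\[ \|T^2\|_{\mathrm{ber}} = \|T\|_{\mathrm{ber}} = \sup_{i, j \in \{1, 2\}} |\langle Te_i, e_j \rangle| = \frac{1}{2}. \]
Thus, $\|T^2\|_{\mathrm{ber}} = \frac{1}{2} > \|T\|^2_{\mathrm{ber}} = \frac{1}{4}$.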
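The failure of submultiplicativity in Example 1.2 is easy to reproduce. The sketch below again assumes the canonical basis of $\mathbb{C}^2$ as the set of normalized kernels, so that the Berezin norm is the largest modulus of a matrix entry; `matmul` and `berezin_norm` are hypothetical helper names.

```python
def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def berezin_norm(T):
    """sup |<T e_i, e_j>| over the canonical basis: the largest entry modulus."""
    return max(abs(v) for row in T for v in row)

T = [[0.5, 0.5], [0.5, 0.5]]   # orthogonal projection onto span{(1, 1)}
T2 = matmul(T, T)              # equals T, since T is a projection

print(berezin_norm(T2), berezin_norm(T) ** 2)
```

Since $T^2 = T$, the left-hand value stays at $1/2$ while the squared Berezin norm drops to $1/4$.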
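From the definitions of $\mathrm{ber}(\cdot)$ and $\|\cdot\|_{\mathrm{ber}}$, the following properties
\[ \mathrm{ber}(T) = \mathrm{ber}(T^*) \qquad \text{and} \qquad \|T\|_{\mathrm{ber}} = \|T^*\|_{\mathrm{ber}} \]
hold immediately for every $T \in \mathbb{B}(\mathcal{H}_\Omega)$. Here $T^*$ denotes the adjoint of $T$. It should be mentioned here that the equality $\|T\|_{\widetilde{\mathrm{ber}}} = \|T^*\|_{\widetilde{\mathrm{ber}}}$ fails to be true for some $T \in \mathbb{B}(\mathcal{H}_\Omega)$. In fact, if we consider $T = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}$, then one can check that $\|T^*\|_{\widetilde{\mathrm{ber}}} = 1$ and $\|T\|_{\widetilde{\mathrm{ber}}} = \sqrt{2}$.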
Clearly, $\mathrm{Ber}(T) \subseteq W(T)$, which implies that $\mathrm{ber}(T) \le \omega(T)$. Here $W(T)$ and $\omega(T)$ denote the numerical range and numerical radius of $T$, respectively, and are defined by
\[ W(T) := \{ \langle Tf, f \rangle ; f \in \mathcal{H}_\Omega, \|f\| = 1 \} \qquad \text{and} \qquad \omega(T) := \sup_{\|f\| = 1} |\langle Tf, f \rangle|. \]
For an account of the results involving the numerical range and radius of operators, we refer the reader to [11, 13, 26, 34] and references therein. It is well-known that $\omega(T^n) \le \omega^n(T)$ for every positive integer $n$ and the equality $\omega(T^n) = \omega^n(T)$ holds if $T$ is a self-adjoint operator, i.e., $T = T^*$ (or in particular if $T$ is positive). However, the above two properties may fail in general for the Berezin number of operators. Indeed, by using the same operator as in Example 1.2, we see that
\[ \mathrm{ber}(T^2) = \mathrm{ber}(T) = \sup_{i \in \{1, 2\}} |\langle Te_i, e_i \rangle| = \frac{1}{2} > \mathrm{ber}^2(T) = \frac{1}{4}. \]
The present chapter is structured as follows: Section 2 deals with basic definitions and properties of notions related to semi-Hilbert spaces associated to a positive bounded linear operator $A$. In particular, we give a brief description of the well-known theorem due to R. G. Douglas [19]. This result plays a crucial role in this chapter. In Section 3, we establish some inequalities involving $\mathrm{ber}(\cdot)$ and $\|\cdot\|_{\mathrm{ber}}$. Finally, in Section 4, some bounds related to the Berezin number and Berezin norm are proved in the context of semi-Hilbert spaces.
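2 Semi-Hilbert Spaces
Throughout this section, let $\mathcal{H}$ be a complex Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and associated norm $\|\cdot\|$. By $\mathbb{B}(\mathcal{H})$ we denote the C*-algebra of all bounded linear operators from $\mathcal{H}$ to $\mathcal{H}$. It is crucial to mention that all operators in this work are assumed to be bounded and linear. If $T \in \mathbb{B}(\mathcal{H})$, then its range and its null space are denoted by $\mathrm{ran}(T)$ and $\ker(T)$, respectively. Also, $T^*$ stands for the adjoint of $T$. Let $\mathcal{S}$ be any linear subspace of $\mathcal{H}$. Then we denote by $\overline{\mathcal{S}}$ its closure with respect to the topology generated by $\|\cdot\|$. If $\mathcal{M}$ is a closed subspace of $\mathcal{H}$, then $P_{\mathcal{M}}$ stands for the orthogonal projection onto $\mathcal{M}$.
In all that follows, we suppose that $A \in \mathbb{B}(\mathcal{H})$ is a nonzero positive operator. The semi-inner product on $\mathcal{H}$ induced by $A$ is given by $\langle x, y \rangle_A = \langle Ax, y \rangle = \langle A^{1/2}x, A^{1/2}y \rangle$ for every $x, y \in \mathcal{H}$. Here $A^{1/2}$ stands for the square root of $A$. Let $\|\cdot\|_A$ be the seminorm induced by $\langle \cdot, \cdot \rangle_A$. Clearly, we have $\|x\|_A = \|A^{1/2}x\|$ for all $x \in \mathcal{H}$. It can be observed that $\|x\|_A = 0$ if and only if $x \in \ker(A)$. This implies that $\|\cdot\|_A$ is a norm on $\mathcal{H}$ if and only if $A$ is one-to-one. Moreover, it is not difficult to verify that the semi-Hilbert space $(\mathcal{H}, \|\cdot\|_A)$ is complete if and only if $\overline{\mathrm{ran}(A)} = \mathrm{ran}(A)$. Let
\[ \mathbb{B}^{A}(\mathcal{H}) = \{ T \in \mathbb{B}(\mathcal{H}) ; \ \exists\, c > 0 \text{ such that } \|Tx\|_A \le c\|x\|_A, \ \forall\, x \in \overline{\mathrm{ran}(A)} \}. \]
Notice that $\mathbb{B}^{A}(\mathcal{H})$ is not in general a subalgebra of $\mathbb{B}(\mathcal{H})$ (see [22]). If $T \in \mathbb{B}^{A}(\mathcal{H})$, then the $A$-operator seminorm of $T$ is given by
\[ \|T\|_A = \sup_{x \in \overline{\mathrm{ran}(A)},\ x \ne 0} \frac{\|Tx\|_A}{\|x\|_A} = \sup_{x \in \overline{\mathrm{ran}(A)},\ \|x\|_A = 1} \|Tx\|_A < \infty. \]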
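To make the semi-inner product concrete, here is a small sketch on $\mathbb{R}^2$ with the non-injective positive matrix $A = \mathrm{diag}(1, 0)$. The choice of $A$, the test operator $T$ and the function names are assumptions made only for illustration.

```python
import math

def inner_A(A, x, y):
    """Semi-inner product <x, y>_A = <Ax, y> for real vectors."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(Ax[i] * y[i] for i in range(n))

def seminorm_A(A, x):
    """Seminorm ||x||_A = sqrt(<x, x>_A)."""
    return math.sqrt(inner_A(A, x, x))

A = [[1, 0], [0, 0]]           # positive, but ker(A) = span{e_2}

# e_2 lies in ker(A), so its A-seminorm vanishes although e_2 != 0.
norm_e2 = seminorm_A(A, [0, 1])
norm_x = seminorm_A(A, [3, 4])  # only the first coordinate is seen

# Here ran(A) = span{e_1} is closed and its A-unit vectors are +-e_1,
# so ||T||_A reduces to ||T e_1||_A for this particular A.
T = [[2, 5], [7, 1]]
Te1 = [T[0][0], T[1][0]]
T_seminorm = seminorm_A(A, Te1)

print(norm_e2, norm_x, T_seminorm)
```

The example illustrates why $\|\cdot\|_A$ is only a seminorm when $A$ is not one-to-one: the second coordinate is invisible to it.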
It should be mentioned here that $\|T\|_A$ may be equal to $+\infty$ for some $T \in \mathbb{B}(\mathcal{H})$ (see [22]). Before we proceed further, we recall from [2] the following definition.
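Definition 2.1 [2] Let $T \in \mathbb{B}(\mathcal{H})$. An operator $R \in \mathbb{B}(\mathcal{H})$ is said to be an $A$-adjoint operator of $T$ if the equality $\langle Tx, y \rangle_A = \langle x, Ry \rangle_A$ holds for every $x, y \in \mathcal{H}$.
It can be observed from the above definition that an operator $R$ is an $A$-adjoint of $T$ if and only if $AR = T^*A$. That is, $R$ is a solution of the operator equation $AX = T^*A$. Notice that, in general, not every $T \in \mathbb{B}(\mathcal{H})$ admits an $A$-adjoint operator. Even if there exists an $A$-adjoint of $T$, it may not be unique. The following result, which is due to Douglas [19], plays an important role in studying the existence of solutions of the operator equation $AX = T^*A$.
Theorem 2.1 [19] Let $T, S \in \mathbb{B}(\mathcal{H})$. Then the following statements are equivalent:
(i) $\mathrm{ran}(S) \subseteq \mathrm{ran}(T)$.
(ii) $TD = S$ for some $D \in \mathbb{B}(\mathcal{H})$.
(iii) There exists $\lambda > 0$ such that $\|S^*x\| \le \lambda\|T^*x\|$ for all $x \in \mathcal{H}$.
Moreover, if one of these conditions holds, then there exists a unique solution $Q \in \mathbb{B}(\mathcal{H})$ of the equation $TX = S$ such that $\mathrm{ran}(Q) \subseteq \overline{\mathrm{ran}(T^*)}$. The operator $Q$ is called the reduced solution of the equation $TX = S$.
Now, we denote by $\mathbb{B}_A(\mathcal{H})$ and $\mathbb{B}_{A^{1/2}}(\mathcal{H})$ the collections of all operators which admit $A$-adjoint and $A^{1/2}$-adjoint, respectively. An application of Theorem 2.1 gives
\[ \mathbb{B}_A(\mathcal{H}) = \{ T \in \mathbb{B}(\mathcal{H}) ; \ \mathrm{ran}(T^*A) \subseteq \mathrm{ran}(A) \} \]
and
\[ \mathbb{B}_{A^{1/2}}(\mathcal{H}) = \{ T \in \mathbb{B}(\mathcal{H}) ; \ \exists\, c > 0 \text{ such that } \|Tx\|_A \le c\|x\|_A, \ \forall\, x \in \mathcal{H} \}. \]
If $T \in \mathbb{B}_{A^{1/2}}(\mathcal{H})$, then $T$ is called $A$-bounded. It can be observed that $\mathbb{B}_A(\mathcal{H})$ and $\mathbb{B}_{A^{1/2}}(\mathcal{H})$ are two subalgebras of $\mathbb{B}(\mathcal{H})$. Notice that they are, in general, neither closed nor dense in $\mathbb{B}(\mathcal{H})$ (see [22]). Further, we have $\mathbb{B}_A(\mathcal{H}) \subseteq \mathbb{B}_{A^{1/2}}(\mathcal{H}) \subseteq \mathbb{B}^{A}(\mathcal{H}) \subseteq \mathbb{B}(\mathcal{H})$. The above inclusions become equalities if $A$ is one-to-one and $\mathrm{ran}(A)$ is a closed subspace of $\mathcal{H}$. If $T \in \mathbb{B}_A(\mathcal{H})$, then the reduced solution of the equation $AX = T^*A$ will be denoted by $T^{\sharp_A}$. It is well-known that $T^{\sharp_A} = A^{\dagger}T^*A$, where $A^{\dagger}$ denotes the Moore-Penrose inverse of $A$. It is important to note that if $T \in \mathbb{B}_A(\mathcal{H})$, then $T^{\sharp_A} \in \mathbb{B}_A(\mathcal{H})$, $(T^{\sharp_A})^{\sharp_A} = P_{\overline{\mathrm{ran}(A)}}\,T\,P_{\overline{\mathrm{ran}(A)}}$ and $((T^{\sharp_A})^{\sharp_A})^{\sharp_A} = T^{\sharp_A}$. An operator $T \in \mathbb{B}(\mathcal{H})$ is called $A$-self adjoint if the operator $AT$ is self adjoint. Clearly, if $T$ is an $A$-self adjoint operator, then $T \in \mathbb{B}_A(\mathcal{H})$, but the equality $T^{\sharp_A} = T$ may not hold in general (see [23]). However, we have
\[ \|T^{\sharp_A}x\|_A = \|Tx\|_A \ \text{ for every } x \in \mathcal{H} \qquad \text{and} \qquad (T^{\sharp_A})^{\sharp_A} = T^{\sharp_A}. \tag{2.1} \]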
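The formula $T^{\sharp_A} = A^{\dagger}T^*A$ can be illustrated in the simplest situation where $A$ is invertible, so that $A^{\dagger} = A^{-1}$. The sketch below uses real $2 \times 2$ matrices (hence $T^* = T^{\mathsf{T}}$); the concrete matrices and helper names are assumptions chosen for illustration only.

```python
def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def apply(X, v):
    return [sum(X[i][j] * v[j] for j in range(len(v))) for i in range(len(X))]

def inner_A(A, x, y):
    """Semi-inner product <x, y>_A = <Ax, y> for real vectors."""
    Ax = apply(A, x)
    return sum(p * q for p, q in zip(Ax, y))

A = [[2.0, 0.0], [0.0, 1.0]]        # positive and invertible
A_inv = [[0.5, 0.0], [0.0, 1.0]]    # Moore-Penrose inverse equals the inverse here
T = [[1.0, 2.0], [3.0, 4.0]]

# T^{sharp_A} = A^{-1} T^T A in this invertible, real case.
T_sharp = matmul(A_inv, matmul(transpose(T), A))

x, y = [1.0, -1.0], [2.0, 3.0]
lhs = inner_A(A, apply(T, x), y)        # <Tx, y>_A
rhs = inner_A(A, x, apply(T_sharp, y))  # <x, T^{sharp_A} y>_A
print(T_sharp, lhs, rhs)
```

Both the operator identity $AT^{\sharp_A} = T^*A$ and the defining relation of Definition 2.1 hold for this choice.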
An important observation is that for $T \in \mathbb{B}_{A^{1/2}}(\mathcal{H})$ we have
\[ \|T\|_A = \sup \{ \|Tx\|_A ; x \in \mathcal{H}, \|x\|_A = 1 \} = \sup \{ |\langle Tx, y \rangle_A| ; x, y \in \mathcal{H}, \|x\|_A = \|y\|_A = 1 \}. \]
For more results related to the theory of semi-Hilbert spaces, the reader is invited to consult [2–4, 8, 14, 22, 24] and references therein. For $T \in \mathbb{B}(\mathcal{H})$, the $A$-numerical range of $T$ was defined by Baklouti et al. in [7] as
\[ W_A(T) = \{ \langle Tx, x \rangle_A ; x \in \mathcal{H}, \|x\|_A = 1 \}. \]
It has been proved in [7] that $W_A(T)$ is a non-empty convex subset of $\mathbb{C}$ which is not necessarily closed. The supremum modulus of $W_A(T)$ is denoted by $\omega_A(T)$ and called the $A$-numerical radius of $T$ (see [7]). More precisely, we have
\[ \omega_A(T) = \sup \{ |\xi| ; \xi \in W_A(T) \} = \sup \{ |\langle Tx, x \rangle_A| ; x \in \mathcal{H}, \|x\|_A = 1 \}. \]
Clearly $\omega_A(\cdot)$ is a seminorm on $\mathbb{B}_{A^{1/2}}(\mathcal{H})$ which is equivalent to $\|\cdot\|_A$. Namely, we have $\frac{1}{2}\|T\|_A \le \omega_A(T) \le \|T\|_A$ for all $T \in \mathbb{B}_{A^{1/2}}(\mathcal{H})$ (see [7]). Note that the definition of $\omega_A(\cdot)$ was first introduced by Saddi in [37]. For more facts and results involving the $A$-numerical radius of operators, the reader is referred to [1, 7, 12, 39] and the references therein. We close this section with the following lemma, which is a generalization of the Buzano inequality (see [17]) and will be used in due course.
Lemma 2.1 [37] Let $x, y, z \in \mathcal{H}$ be such that $\|z\|_A = 1$. Then
\[ |\langle x, z \rangle_A \langle z, y \rangle_A| \le \frac{1}{2}\big( |\langle x, y \rangle_A| + \|x\|_A \|y\|_A \big). \]
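Lemma 2.1 can be spot-checked on random vectors. The following sketch works in $\mathbb{R}^2$ with a fixed positive definite matrix $A$; the matrix, the sample size and the random seed are arbitrary assumptions, and a numerical check of this kind is of course no substitute for the proof in [37].

```python
import math
import random

A = [[2.0, 1.0], [1.0, 1.0]]   # symmetric positive definite (trace 3, det 1)

def inner_A(x, y):
    """Semi-inner product <x, y>_A = <Ax, y> on R^2."""
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return Ax[0] * y[0] + Ax[1] * y[1]

def norm_A(x):
    return math.sqrt(inner_A(x, x))

random.seed(0)
worst_gap = -math.inf          # largest observed value of lhs - rhs
for _ in range(1000):
    x, y, z = ([random.uniform(-1, 1) for _ in range(2)] for _ in range(3))
    nz = norm_A(z)
    z = [v / nz for v in z]    # normalize so that ||z||_A = 1
    lhs = abs(inner_A(x, z) * inner_A(z, y))
    rhs = 0.5 * (abs(inner_A(x, y)) + norm_A(x) * norm_A(y))
    worst_gap = max(worst_gap, lhs - rhs)

print(worst_gap)
```

A non-positive `worst_gap` (up to rounding) means that no sampled triple violated the inequality.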
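3 Berezin Number and Berezin Norm Inequalities

In this section, $(\mathcal{H}_\Omega, \langle\cdot,\cdot\rangle)$ denotes an RKHS on a set $\Omega$ with associated norm $\|\cdot\|$. In all that follows, the Cartesian decomposition of $T \in \mathbb{B}(\mathcal{H}_\Omega)$ is given by $T = \mathrm{Re}(T) + i\,\mathrm{Im}(T)$, where $\mathrm{Re}(T)$ and $\mathrm{Im}(T)$ denote the real part and the imaginary part of $T$, respectively, i.e.,
\[ \mathrm{Re}(T) = \frac{T + T^*}{2} \quad\text{and}\quad \mathrm{Im}(T) = \frac{T - T^*}{2i}. \]
Also, the identity operator on $\mathcal{H}_\Omega$ will be simply denoted by $I$. Our first result provides a lower bound for $\mathrm{ber}^2(T)$ and reads as follows.

Theorem 3.1 Let $T \in \mathbb{B}(\mathcal{H}_\Omega)$. Then
\[ \mathrm{ber}^2(T) \ge \sup_{0 \le \alpha \le 1} \Big( \tfrac{1}{2}\,\mathrm{ber}^2\big(\sqrt{\alpha}\,\mathrm{Re}(T) + \sqrt{1-\alpha}\,\mathrm{Im}(T)\big) + \tfrac{1}{2}\,\mathrm{ber}^2\big(\sqrt{\alpha}\,\mathrm{Re}(T) - \sqrt{1-\alpha}\,\mathrm{Im}(T)\big) \Big). \]

Proof Let $k_\lambda$ be a normalized reproducing kernel of $\mathcal{H}_\Omega$ and $\alpha \in [0, 1]$. Then, applying the well-known inequality $(ab + cd)^2 \le (a^2 + c^2)(b^2 + d^2)$ for any real numbers $a, b, c, d$, we have
\[ \big(\sqrt{\alpha}\,|\langle \mathrm{Re}(T)k_\lambda, k_\lambda\rangle| + \sqrt{1-\alpha}\,|\langle \mathrm{Im}(T)k_\lambda, k_\lambda\rangle|\big)^2 \le |\langle \mathrm{Re}(T)k_\lambda, k_\lambda\rangle|^2 + |\langle \mathrm{Im}(T)k_\lambda, k_\lambda\rangle|^2. \]
Therefore,
\[ \begin{aligned} |\langle Tk_\lambda, k_\lambda\rangle| &= \big(|\langle \mathrm{Re}(T)k_\lambda, k_\lambda\rangle|^2 + |\langle \mathrm{Im}(T)k_\lambda, k_\lambda\rangle|^2\big)^{1/2} \\ &\ge \sqrt{\alpha}\,|\langle \mathrm{Re}(T)k_\lambda, k_\lambda\rangle| + \sqrt{1-\alpha}\,|\langle \mathrm{Im}(T)k_\lambda, k_\lambda\rangle| \\ &\ge \big|\big\langle \big(\sqrt{\alpha}\,\mathrm{Re}(T) \pm \sqrt{1-\alpha}\,\mathrm{Im}(T)\big)k_\lambda, k_\lambda\big\rangle\big|. \end{aligned} \]
Taking the supremum over all normalized reproducing kernels $k_\lambda$ in $\mathcal{H}_\Omega$, we get
\[ \mathrm{ber}(T) \ge \mathrm{ber}\big(\sqrt{\alpha}\,\mathrm{Re}(T) \pm \sqrt{1-\alpha}\,\mathrm{Im}(T)\big). \]
This implies that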
C. Conde et al.
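\[ 2\,\mathrm{ber}^2(T) \ge \mathrm{ber}^2\big(\sqrt{\alpha}\,\mathrm{Re}(T) + \sqrt{1-\alpha}\,\mathrm{Im}(T)\big) + \mathrm{ber}^2\big(\sqrt{\alpha}\,\mathrm{Re}(T) - \sqrt{1-\alpha}\,\mathrm{Im}(T)\big). \]
As the last inequality holds for all $\alpha \in [0, 1]$, we get the desired inequality. □

In order to prove an upper bound for $\mathrm{ber}(T)$, we require the following lemma.

Lemma 3.1 [6] Let $T \in \mathbb{B}(\mathcal{H}_\Omega)$. Then
\[ \mathrm{ber}(T) = \sup_{\theta \in \mathbb{R}} \mathrm{ber}\big(\mathrm{Re}(e^{i\theta}T)\big). \tag{3.1} \]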
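The identity (3.1) can be illustrated numerically. The following is a minimal sketch, assuming the finite-dimensional model used later in Remark 3.4, in which $\mathbb{C}^n$ is regarded as an RKHS whose normalized reproducing kernels are the standard basis vectors, so that $\mathrm{ber}(T)$ is the largest modulus of a diagonal entry. The matrix `T` and the helper names are hypothetical and chosen only for illustration.

```python
import cmath
import math

def adjoint(T):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(T)
    return [[T[j][i].conjugate() for j in range(n)] for i in range(n)]

def real_part(T):
    """Re(T) = (T + T*) / 2."""
    Ts = adjoint(T)
    n = len(T)
    return [[(T[i][j] + Ts[i][j]) / 2 for j in range(n)] for i in range(n)]

def scale(c, T):
    """Multiply every entry of T by the scalar c."""
    return [[c * x for x in row] for row in T]

def ber(T):
    """Berezin number in the assumed model k_i = e_i: max modulus on the diagonal."""
    return max(abs(T[i][i]) for i in range(len(T)))

# Hypothetical test matrix.
T = [[1 + 2j, 3.0], [-1j, -2 + 0.5j]]

# Grid over [0, 2*pi), plus the angles that rotate each diagonal entry onto the real axis.
thetas = [2 * math.pi * k / 720 for k in range(720)]
thetas += [-cmath.phase(T[i][i]) for i in range(len(T))]

sup_over_theta = max(ber(real_part(scale(cmath.exp(1j * t), T))) for t in thetas)
print(ber(T), sup_over_theta)
```

In this model the supremum is attained at the angle that makes the dominant diagonal entry real, which is why those angles are added to the grid explicitly.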
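Theorem 3.2 Let $T \in \mathbb{B}(\mathcal{H}_\Omega)$. Then
\[ \mathrm{ber}(T) \le \inf_{\phi \in \mathbb{R}} \sqrt{\mathrm{ber}^2\big(\mathrm{Re}(e^{i\phi}T)\big) + \mathrm{ber}^2\big(\mathrm{Im}(e^{i\phi}T)\big)}. \]

Proof By employing the identity (3.1), we have that
\[ \mathrm{ber}(T) = \sup_{\alpha^2 + \beta^2 = 1} \mathrm{ber}\big(\alpha\,\mathrm{Re}(T) + \beta\,\mathrm{Im}(T)\big). \tag{3.2} \]
Indeed, for any $\theta \in \mathbb{R}$, we have
\[ \begin{aligned} \mathrm{Re}(e^{i\theta}T) &= \frac{e^{i\theta}T + e^{-i\theta}T^*}{2} \\ &= \frac{1}{2}\big\{(\cos\theta + i\sin\theta)T + (\cos\theta - i\sin\theta)T^*\big\} \\ &= (\cos\theta)\,\frac{T + T^*}{2} - (\sin\theta)\,\frac{T - T^*}{2i} \\ &= (\cos\theta)\,\mathrm{Re}(T) - (\sin\theta)\,\mathrm{Im}(T). \end{aligned} \]
Therefore, by putting $\alpha = \cos\theta$ and $\beta = -\sin\theta$ in the previous equality, we obtain (3.2). Now, let $\phi \in \mathbb{R}$. Then
\[ \begin{aligned} \mathrm{ber}(T) = \mathrm{ber}(e^{i\phi}T) &= \sup_{\alpha^2 + \beta^2 = 1} \mathrm{ber}\big(\alpha\,\mathrm{Re}(e^{i\phi}T) + \beta\,\mathrm{Im}(e^{i\phi}T)\big) \\ &\le \sup_{\alpha^2 + \beta^2 = 1} \Big( |\alpha|\,\mathrm{ber}\big(\mathrm{Re}(e^{i\phi}T)\big) + |\beta|\,\mathrm{ber}\big(\mathrm{Im}(e^{i\phi}T)\big) \Big) \\ &\le \sup_{\alpha^2 + \beta^2 = 1} \sqrt{\mathrm{ber}^2\big(\mathrm{Re}(e^{i\phi}T)\big) + \mathrm{ber}^2\big(\mathrm{Im}(e^{i\phi}T)\big)}\,\sqrt{\alpha^2 + \beta^2} \\ &= \sqrt{\mathrm{ber}^2\big(\mathrm{Re}(e^{i\phi}T)\big) + \mathrm{ber}^2\big(\mathrm{Im}(e^{i\phi}T)\big)}. \end{aligned} \]
Thus,
\[ \mathrm{ber}(T) \le \inf_{\phi \in \mathbb{R}} \sqrt{\mathrm{ber}^2\big(\mathrm{Re}(e^{i\phi}T)\big) + \mathrm{ber}^2\big(\mathrm{Im}(e^{i\phi}T)\big)}. \]
□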
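The bound of Theorem 3.2 can be explored numerically. The sketch below again assumes the model in which $\mathbb{C}^n$ is an RKHS with the standard basis as normalized reproducing kernels; the infimum over $\phi$ is only approximated on a finite grid, and the matrix `T` and all function names are hypothetical.

```python
import cmath
import math

def adjoint(T):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(T)
    return [[T[j][i].conjugate() for j in range(n)] for i in range(n)]

def real_part(T):
    """Re(T) = (T + T*) / 2."""
    Ts = adjoint(T)
    n = len(T)
    return [[(T[i][j] + Ts[i][j]) / 2 for j in range(n)] for i in range(n)]

def imag_part(T):
    """Im(T) = (T - T*) / (2i)."""
    Ts = adjoint(T)
    n = len(T)
    return [[(T[i][j] - Ts[i][j]) / 2j for j in range(n)] for i in range(n)]

def rotate(T, phi):
    """Return e^{i*phi} T."""
    c = cmath.exp(1j * phi)
    return [[c * x for x in row] for row in T]

def ber(T):
    """Berezin number in the assumed model k_i = e_i."""
    return max(abs(T[i][i]) for i in range(len(T)))

def upper_bound(T, phi):
    """sqrt(ber^2(Re(e^{i phi} T)) + ber^2(Im(e^{i phi} T)))."""
    R = rotate(T, phi)
    return math.hypot(ber(real_part(R)), ber(imag_part(R)))

# Hypothetical test matrix.
T = [[1 + 2j, 3.0], [-1j, -2 + 0.5j]]

phis = [2 * math.pi * k / 3600 for k in range(3600)]
best = min(upper_bound(T, p) for p in phis)   # grid approximation of the infimum
print(ber(T), best, upper_bound(T, 0.0))
```

The printed values show $\mathrm{ber}(T)$, the approximate infimum, and the value at $\phi = 0$, so the gain over the classical bound can be read off directly.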
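Remark 3.1 Theorem 3.2 refines the following well-known inequality, which is recovered by taking $\phi = 0$:
\[ \mathrm{ber}(T) \le \sqrt{\mathrm{ber}^2(\mathrm{Re}(T)) + \mathrm{ber}^2(\mathrm{Im}(T))} \]
(see Remark 2.2 in [6]). On the other hand, if $T \in \mathbb{B}(\mathcal{H}_\Omega)$ is a self-adjoint operator, then
\[ \mathrm{ber}(T) = \inf_{\phi \in \mathbb{R}} \sqrt{\mathrm{ber}^2\big(\mathrm{Re}(e^{i\phi}T)\big) + \mathrm{ber}^2\big(\mathrm{Im}(e^{i\phi}T)\big)}, \]
since for any $\phi \in \mathbb{R}$ we have $\mathrm{ber}^2(\mathrm{Re}(e^{i\phi}T)) = \cos^2(\phi)\,\mathrm{ber}^2(T)$ and $\mathrm{ber}^2(\mathrm{Im}(e^{i\phi}T)) = \sin^2(\phi)\,\mathrm{ber}^2(T)$.

Our next result reads as follows.

Theorem 3.3 Let $T \in \mathbb{B}(\mathcal{H}_\Omega)$. Then
\[ \mathrm{ber}^{2r}(T) \le \frac{\alpha}{2}\,\mathrm{ber}^r(T^2) + \Big(1 - \frac{\alpha}{2}\Big)\|T\|^r_{\mathrm{ber}}\,\|T^*\|^r_{\mathrm{ber}} \]
for every $r \ge 1$ and $\alpha \in [0, 1]$.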
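Proof We recall the following refinement of the Cauchy-Schwarz inequality obtained by Khosravi et al. in [33, Corollary 2.5]:
\[ |\langle a, e\rangle\langle e, b\rangle| \le \frac{\alpha}{2}\,|\langle a, b\rangle| + \Big(1 - \frac{\alpha}{2}\Big)\|a\|\,\|b\| \]
for any $a, b, e \in \mathcal{H}_\Omega$ with $\|e\| = 1$ and $\alpha \in [0, 1]$. Indeed, for $\alpha \in (0, 1]$ (the case $\alpha = 0$ being the Cauchy-Schwarz inequality), we get
\[ \begin{aligned} |\langle a, e\rangle\langle e, b\rangle| &= \Big|\langle a, e\rangle\langle e, b\rangle - \frac{\alpha}{2}\langle a, b\rangle + \frac{\alpha}{2}\langle a, b\rangle\Big| \\ &\le \Big|\langle a, e\rangle\langle e, b\rangle - \frac{\alpha}{2}\langle a, b\rangle\Big| + \frac{\alpha}{2}\,|\langle a, b\rangle| \\ &\le \frac{\alpha}{2}\max\Big\{1, \Big|1 - \frac{2}{\alpha}\Big|\Big\}\|a\|\,\|b\| + \frac{\alpha}{2}\,|\langle a, b\rangle| \\ &= \Big(1 - \frac{\alpha}{2}\Big)\|a\|\,\|b\| + \frac{\alpha}{2}\,|\langle a, b\rangle|. \end{aligned} \]
Let $k_\lambda$ be a normalized reproducing kernel of $\mathcal{H}_\Omega$. Then, by letting $e = k_\lambda$, $a = Tk_\lambda$ and $b = T^*k_\lambda$ in the above inequality, we get
\[ \begin{aligned} |\langle Tk_\lambda, k_\lambda\rangle|^2 &= |\langle Tk_\lambda, k_\lambda\rangle\langle k_\lambda, T^*k_\lambda\rangle| \\ &\le \frac{\alpha}{2}\,|\langle Tk_\lambda, T^*k_\lambda\rangle| + \Big(1 - \frac{\alpha}{2}\Big)\|Tk_\lambda\|\,\|T^*k_\lambda\| \\ &= \frac{\alpha}{2}\,|\langle T^2k_\lambda, k_\lambda\rangle| + \Big(1 - \frac{\alpha}{2}\Big)\|Tk_\lambda\|\,\|T^*k_\lambda\|. \end{aligned} \]
Now, if we consider the convex function $f(t) = t^r$ on $[0, \infty)$, we have
\[ \begin{aligned} |\langle Tk_\lambda, k_\lambda\rangle|^{2r} &\le \frac{\alpha}{2}\,|\langle T^2k_\lambda, k_\lambda\rangle|^r + \Big(1 - \frac{\alpha}{2}\Big)\|Tk_\lambda\|^r\,\|T^*k_\lambda\|^r \\ &\le \frac{\alpha}{2}\,|\langle T^2k_\lambda, k_\lambda\rangle|^r + \Big(1 - \frac{\alpha}{2}\Big)\|T\|^r_{\mathrm{ber}}\,\|T^*\|^r_{\mathrm{ber}}. \end{aligned} \]
Taking the supremum over all $\lambda \in \Omega$ in the last inequality, we get the desired inequality. □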
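Remark 3.2 From the above result we obviously have, for $r = 1$ and $\alpha = 1$,
\[ \mathrm{ber}(T) \le \Big(\frac{1}{2}\,\mathrm{ber}(T^2) + \frac{1}{2}\,\|T\|_{\mathrm{ber}}\,\|T^*\|_{\mathrm{ber}}\Big)^{1/2} \le \Big(\frac{1}{2}\,\|T^2\|_{\mathrm{ber}} + \frac{1}{2}\,\|T\|^2\Big)^{1/2}. \]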
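Our next theorem is stated as follows.

Theorem 3.4 Let $T \in \mathbb{B}(\mathcal{H}_\Omega)$. Then
\[ \|T\|^2_{\mathrm{ber}} \le \big[(1-\alpha)^2 + \alpha^2\big]\,\mathrm{ber}^2(T) + \alpha\,\|tI - T\|^2_{\mathrm{ber}} + (1-\alpha)\,\|itI - T\|^2_{\mathrm{ber}} \]
for every $\alpha \in [0, 1]$ and $t \in \mathbb{R}$.

Proof We use the following inequality obtained by Dragomir in [21]:
\[ \big[\alpha\|tb - a\|^2 + (1-\alpha)\|itb - a\|^2\big]\|b\|^2 \ge \|a\|^2\|b\|^2 - \big[(1-\alpha)\,\Im\langle a, b\rangle + \alpha\,\Re\langle a, b\rangle\big]^2 \]
to get
\[ \begin{aligned} \|a\|^2\|b\|^2 &\le \big[\alpha\|tb - a\|^2 + (1-\alpha)\|itb - a\|^2\big]\|b\|^2 + \big[(1-\alpha)\,\Im\langle a, b\rangle + \alpha\,\Re\langle a, b\rangle\big]^2 \\ &\le \big[\alpha\|tb - a\|^2 + (1-\alpha)\|itb - a\|^2\big]\|b\|^2 + \big[(1-\alpha)^2 + \alpha^2\big]\,|\langle a, b\rangle|^2 \end{aligned} \tag{3.3} \]
for any $a, b \in \mathcal{H}_\Omega$, $\alpha \in [0, 1]$, and $t \in \mathbb{R}$. Here $\Re z$ and $\Im z$ denote the real and imaginary part of any complex number $z$, respectively. Let $k_\lambda$ be a normalized reproducing kernel of $\mathcal{H}_\Omega$. Choosing $a = Tk_\lambda$ and $b = k_\lambda$ in (3.3), we have
\[ \|Tk_\lambda\|^2 \le \alpha\,\|tk_\lambda - Tk_\lambda\|^2 + (1-\alpha)\,\|itk_\lambda - Tk_\lambda\|^2 + \big[(1-\alpha)^2 + \alpha^2\big]\,|\langle Tk_\lambda, k_\lambda\rangle|^2. \]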
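Finally, taking the supremum over all $\lambda \in \Omega$ in the last inequality, we deduce the desired result. □

The following particular cases, for $\alpha = 1$, $\alpha = 0$, or $\alpha = \tfrac{1}{2}$, may be of interest.

Corollary 3.1 Let $T \in \mathbb{B}(\mathcal{H}_\Omega)$. Then
\[ \|T\|^2_{\mathrm{ber}} \le \mathrm{ber}^2(T) + \inf_{t \in \mathbb{R}} \|tI - T\|^2_{\mathrm{ber}}, \tag{3.4} \]
\[ \|T\|^2_{\mathrm{ber}} \le \mathrm{ber}^2(T) + \inf_{t \in \mathbb{R}} \|itI - T\|^2_{\mathrm{ber}}, \tag{3.5} \]
and
\[ \|T\|^2_{\mathrm{ber}} \le \frac{1}{2}\,\mathrm{ber}^2(T) + \inf_{t \in \mathbb{R}} \Big(\frac{1}{2}\,\|tI - T\|^2_{\mathrm{ber}} + \frac{1}{2}\,\|itI - T\|^2_{\mathrm{ber}}\Big). \tag{3.6} \]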
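In the following statements, we present upper bounds for $\|T\|^2_{\mathrm{ber}} - \mathrm{ber}^2(T)$.

Proposition 3.1 Let $T \in \mathbb{B}(\mathcal{H}_\Omega)$. Then
\[ \|T\|^2_{\mathrm{ber}} - \mathrm{ber}^2(T) \le \|T\|^2_{\mathrm{ber}}\,\inf_{\alpha \in \mathbb{C}} \|I - \alpha T\|^2_{\mathrm{ber}}. \]

Proof Recall that for any $a, b \in \mathcal{H}_\Omega$ and $\alpha \in \mathbb{C}$, we have
\[ \|a\|^2\|b\|^2 - |\langle a, b\rangle|^2 \le \|b\|^2\,\|a - \alpha b\|^2. \tag{3.7} \]
Let $\lambda \in \Omega$ and $k_\lambda$ be the normalized reproducing kernel of the space $\mathcal{H}_\Omega$. If we take $a = k_\lambda$ and $b = Tk_\lambda$ in (3.7), then we obtain
\[ \|Tk_\lambda\|^2 \le |\langle k_\lambda, Tk_\lambda\rangle|^2 + \|Tk_\lambda\|^2\,\|k_\lambda - \alpha Tk_\lambda\|^2. \]
So, by taking the supremum over all $\lambda \in \Omega$ in the last inequality, we get
\[ \|T\|^2_{\mathrm{ber}} - \mathrm{ber}^2(T) \le \|T\|^2_{\mathrm{ber}}\,\|I - \alpha T\|^2_{\mathrm{ber}}. \]
This immediately yields the desired result. □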
Corollary 3.2 Let $t_0 \in \mathbb{R}$ be such that $\|t_0I - T\|^2_{\mathrm{ber}} \le p$ or $\|it_0I - T\|^2_{\mathrm{ber}} \le p$. Then
\[ \|T\|^2_{\mathrm{ber}} - \mathrm{ber}^2(T) \le p. \]

Proof By placing $t = t_0$ in (3.4) or (3.5), we have that
\[ \|T\|^2_{\mathrm{ber}} \le \mathrm{ber}^2(T) + p, \]
and we obtain the desired result. □
As mentioned in Example 1.1, the equality $\mathrm{ber}(T) = \|T\|_{\mathrm{ber}}$ may not hold in general for operators in $\mathbb{B}(\mathcal{H}_\Omega)$. However, Bhunia et al. proved recently in [15] that the above equality is true for the class of positive operators acting on $\mathcal{H}_\Omega$. Namely, we have the following lemma, which plays a crucial role in proving several results in this section.

Lemma 3.2 Let $T \in \mathbb{B}(\mathcal{H}_\Omega)$ be a positive operator. Then
\[ \mathrm{ber}(T) = \|T\|_{\mathrm{ber}}. \tag{3.8} \]
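Remark 3.3 (1) If $T \in \mathbb{B}(\mathcal{H}_\Omega)$, then clearly $TT^*$ and $T^*T$ are positive operators. So, by applying Lemma 3.2, we see that
\[ \|TT^*\|_{\mathrm{ber}} = \|T^*\|^2_{\mathrm{ber}} \quad\text{and}\quad \|T^*T\|_{\mathrm{ber}} = \|T\|^2_{\mathrm{ber}}. \tag{3.9} \]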
(2) Bhunia et al. gave in [15] an example which shows that the equality (3.8) does not hold even for self-adjoint operators. It is important to note that the equality berðTÞ = kTk may not be true even for positive operators. ber
In fact, if we consider the same operator as in Example 1.2, i.e., p 1 , then it be verified that kT k = 2 but berðTÞ = 1. T = 12 1 2 2 1 1 ber Now, we are ready to prove the following theorem in this chapter which provides an improvement of a result by Huban et al. in [29].
Theorem 3.5 Let $T \in \mathbb{B}(\mathcal{H}_\Omega)$. Then
\[ \mathrm{ber}(T) \le \frac{\sqrt{2}}{2}\sqrt{\|T^*T + TT^*\|_{\mathrm{ber}}}. \tag{3.10} \]
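Proof Since $T = \mathrm{Re}(T) + i\,\mathrm{Im}(T)$, a short calculation shows that
\[ \frac{1}{2}(TT^* + T^*T) = \mathrm{Re}^2(T) + \mathrm{Im}^2(T). \tag{3.11} \]
Let $\lambda \in \Omega$ and $k_\lambda$ be the normalized reproducing kernel of the space $\mathcal{H}_\Omega$. Then, by applying the Cauchy-Schwarz inequality, we see that
\[ \begin{aligned} |\langle Tk_\lambda, k_\lambda\rangle|^2 &= |\langle \mathrm{Re}(T)k_\lambda, k_\lambda\rangle|^2 + |\langle \mathrm{Im}(T)k_\lambda, k_\lambda\rangle|^2 \\ &\le \|\mathrm{Re}(T)k_\lambda\|^2 + \|\mathrm{Im}(T)k_\lambda\|^2 \\ &= \langle \mathrm{Re}^2(T)k_\lambda, k_\lambda\rangle + \langle \mathrm{Im}^2(T)k_\lambda, k_\lambda\rangle \\ &= \big\langle \big(\mathrm{Re}^2(T) + \mathrm{Im}^2(T)\big)k_\lambda, k_\lambda\big\rangle \\ &\le \mathrm{ber}\big(\mathrm{Re}^2(T) + \mathrm{Im}^2(T)\big) \\ &= \|\mathrm{Re}^2(T) + \mathrm{Im}^2(T)\|_{\mathrm{ber}}, \end{aligned} \]
where we have used Lemma 3.2 in the last equality since $\mathrm{Re}^2(T) + \mathrm{Im}^2(T) \ge 0$. So, by taking (3.11) into consideration, we get
\[ |\langle Tk_\lambda, k_\lambda\rangle|^2 \le \frac{1}{2}\,\|TT^* + T^*T\|_{\mathrm{ber}}. \]
Now, by taking the supremum over all $\lambda \in \Omega$ in the last inequality, we get the inequality in (3.10) as desired. This finishes the proof. □

Remark 3.4 (1) Huban et al. proved in [29] that for every $T \in \mathbb{B}(\mathcal{H}_\Omega)$, we have
\[ \frac{1}{2}\sqrt{\|T^*T + TT^*\|} \le \mathrm{ber}(T) \le \frac{\sqrt{2}}{2}\sqrt{\|T^*T + TT^*\|}. \tag{3.12} \]
It is clear that the inequality (3.10) refines the second inequality in (3.12). However, it should be noted that the first inequality in (3.12) may not hold in general for every $T \in \mathbb{B}(\mathcal{H}_\Omega)$. Also, the following inequality
\[ \frac{1}{2}\sqrt{\|T^*T + TT^*\|_{\mathrm{ber}}} \le \mathrm{ber}(T) \]
fails to hold for some $T \in \mathbb{B}(\mathcal{H}_\Omega)$. Indeed, let
\[ T = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \in \mathbb{B}(\mathbb{C}^2), \]
where $\mathbb{C}^2$ is regarded as an RKHS. Then, it is easy to verify that
\[ \mathrm{ber}(T) = \sup_{i \in \{1, 2\}} |\langle Te_i, e_i\rangle| = 0 \quad\text{and}\quad \|T^*T + TT^*\|_{\mathrm{ber}} = \|T^*T + TT^*\| = 1. \]
So, we infer that
\[ \frac{1}{2}\sqrt{\|T^*T + TT^*\|} = \frac{1}{2} > \mathrm{ber}(T) = 0 \]
and
\[ \frac{1}{2}\sqrt{\|T^*T + TT^*\|_{\mathrm{ber}}} = \frac{1}{2} > \mathrm{ber}(T) = 0. \]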
(2) By applying (3.9), we deduce that p
2 berðT Þ ≤ 2
p
kT T þ TT kber ≤
2 2
kT k2 þ kT k2 ≤ kT k2 : ber
ber
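The counterexample in Remark 3.4 (1) is small enough to check by direct computation. The following sketch assumes, as in that remark, that $\mathbb{C}^2$ is an RKHS with normalized reproducing kernels $e_1, e_2$; it further assumes that the Berezin norm is computed as $\sup_{i,j}|\langle Te_j, e_i\rangle|$, in analogy with Definition 4.1 for $A = I$. All function names are hypothetical.

```python
import math

def adjoint(T):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(T)
    return [[T[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def ber(T):
    """Berezin number in the assumed model: max |<T e_i, e_i>|."""
    return max(abs(T[i][i]) for i in range(len(T)))

def berezin_norm(T):
    """Berezin norm in the assumed model: max |<T e_j, e_i>| over all i, j."""
    return max(abs(x) for row in T for x in row)

T = [[0, 1], [0, 0]]
Ts = adjoint(T)
M = add(matmul(Ts, T), matmul(T, Ts))   # T*T + TT*, here the identity matrix

lower = 0.5 * math.sqrt(berezin_norm(M))             # left side of the failing inequality
upper = math.sqrt(2) / 2 * math.sqrt(berezin_norm(M))  # right side of (3.10)
print(ber(T), berezin_norm(M), lower, upper)
```

The output confirms that the candidate lower bound exceeds $\mathrm{ber}(T)$, while the upper bound (3.10) is respected.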
Remark 3.5 We now turn our attention to show that the upper bounds, obtained in Theorems 3.3 and 3.5, respectively, are not comparable in general. The following numerical examples will illustrate the incomparability. 1 0
Consider T = 2 1
2 ð2 Þ,
then one may verify that berðT 2 Þ = kT k2
0 2
ber
= 14 and for any α 2 [0, 1], we get p α α 1 1 2 2 2 2 ber T þ 1 - kT k ¼ kT k ¼ < ¼ ber ber 2 2 4 2 2
kT T þ TT kber :
Again, if we consider T = 1 0 p
2 2
kT T þ TT kber ¼
p
2 1
2 ð2 Þ, then
1 1 3 < 3 ¼ ber T 2 þ kT k kT k ber ber 2 2 α α 2 ≤ ber T þ 1 - kT k kT k , ber ber 2 2
for any $\alpha \in [0, 1]$.

The following lemma is useful in the proof of our next result.

Lemma 3.3 [20] For any $x, y, z \in \mathcal{H}$, we have
\[ |\langle x, y\rangle|^2 + |\langle x, z\rangle|^2 \le \|x\|^2\big(\max\{\|y\|^2, \|z\|^2\} + |\langle y, z\rangle|\big). \tag{3.13} \]
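Now, we are in a position to prove the following theorem, which provides a Berezin norm inequality for the sum of two operators acting on $\mathcal{H}_\Omega$.

Theorem 3.6 Let $T, S \in \mathbb{B}(\mathcal{H}_\Omega)$. Then
\[ \|T + S\|_{\mathrm{ber}} \le \sqrt{\frac{1}{2}\big(\|TT^* + SS^*\|_{\mathrm{ber}} + \mathrm{ber}(TT^* - SS^*)\big) + \mathrm{ber}(ST^*) + 2\,\|T\|_{\mathrm{ber}}\,\|S\|_{\mathrm{ber}}}. \]

Proof We first recall that for every $a, b \in \mathbb{R}$, we have
\[ \max\{a, b\} = \frac{1}{2}\big(a + b + |a - b|\big). \tag{3.14} \]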
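Now, let $\lambda, \mu \in \Omega$ and $k_\lambda, k_\mu$ be two normalized reproducing kernels of the space $\mathcal{H}_\Omega$. By letting $x = k_\lambda$, $y = T^*k_\mu$ and $z = S^*k_\mu$ in Lemma 3.3, we get
\[ \begin{aligned} |\langle (T + S)k_\lambda, k_\mu\rangle|^2 &\le |\langle k_\lambda, T^*k_\mu\rangle|^2 + |\langle k_\lambda, S^*k_\mu\rangle|^2 + 2\,|\langle Tk_\lambda, k_\mu\rangle|\,|\langle Sk_\lambda, k_\mu\rangle| \\ &\le \max\{\|T^*k_\mu\|^2, \|S^*k_\mu\|^2\} + |\langle ST^*k_\mu, k_\mu\rangle| + 2\,|\langle Tk_\lambda, k_\mu\rangle|\,|\langle Sk_\lambda, k_\mu\rangle| \\ &= \frac{1}{2}\Big(\|T^*k_\mu\|^2 + \|S^*k_\mu\|^2 + \big|\|T^*k_\mu\|^2 - \|S^*k_\mu\|^2\big|\Big) + |\langle ST^*k_\mu, k_\mu\rangle| \\ &\qquad + 2\,|\langle Tk_\lambda, k_\mu\rangle|\,|\langle Sk_\lambda, k_\mu\rangle| \quad (\text{by (3.14)}) \\ &= \frac{1}{2}\Big(\langle (TT^* + SS^*)k_\mu, k_\mu\rangle + |\langle (TT^* - SS^*)k_\mu, k_\mu\rangle|\Big) + |\langle ST^*k_\mu, k_\mu\rangle| \\ &\qquad + 2\,|\langle Tk_\lambda, k_\mu\rangle|\,|\langle Sk_\lambda, k_\mu\rangle| \\ &\le \frac{1}{2}\big(\mathrm{ber}(TT^* + SS^*) + \mathrm{ber}(TT^* - SS^*)\big) + \mathrm{ber}(ST^*) + 2\,\|T\|_{\mathrm{ber}}\,\|S\|_{\mathrm{ber}} \\ &= \frac{1}{2}\big(\|TT^* + SS^*\|_{\mathrm{ber}} + \mathrm{ber}(TT^* - SS^*)\big) + \mathrm{ber}(ST^*) + 2\,\|T\|_{\mathrm{ber}}\,\|S\|_{\mathrm{ber}}, \end{aligned} \]
where the last equality follows from Lemma 3.2 since $TT^* + SS^* \ge 0$. So, we get
\[ |\langle (T + S)k_\lambda, k_\mu\rangle|^2 \le \frac{1}{2}\big(\|TT^* + SS^*\|_{\mathrm{ber}} + \mathrm{ber}(TT^* - SS^*)\big) + \mathrm{ber}(ST^*) + 2\,\|T\|_{\mathrm{ber}}\,\|S\|_{\mathrm{ber}} \]
for every $\lambda, \mu \in \Omega$. Therefore, by taking the supremum over all $\lambda, \mu \in \Omega$ in the above inequality, we obtain the desired result. □

In the next result, we establish another upper bound for $\|T + S\|_{\mathrm{ber}}$, where $T$ and $S$ are in $\mathbb{B}(\mathcal{H}_\Omega)$.

Theorem 3.7 Let $T, S \in \mathbb{B}(\mathcal{H}_\Omega)$. Then
\[ \|T + S\|_{\mathrm{ber}} \le \sqrt{\|T\|^2_{\mathrm{ber}} + \|S\|^2_{\mathrm{ber}} + \|T\|_{\mathrm{ber}}\,\|S^*\|_{\mathrm{ber}} + \mathrm{ber}(ST)}. \]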
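Proof Let $\lambda, \mu \in \Omega$ and $k_\lambda, k_\mu$ be two normalized reproducing kernels of the space $\mathcal{H}_\Omega$. By using the triangle inequality together with Lemma 2.1 for $A = I$, we get
\[ \begin{aligned} |\langle (T + S)k_\lambda, k_\mu\rangle|^2 &\le \big(|\langle Tk_\lambda, k_\mu\rangle| + |\langle Sk_\lambda, k_\mu\rangle|\big)^2 \\ &= |\langle Tk_\lambda, k_\mu\rangle|^2 + |\langle Sk_\lambda, k_\mu\rangle|^2 + 2\,|\langle Tk_\lambda, k_\mu\rangle|\,|\langle Sk_\lambda, k_\mu\rangle| \\ &= |\langle Tk_\lambda, k_\mu\rangle|^2 + |\langle Sk_\lambda, k_\mu\rangle|^2 + 2\,|\langle Tk_\lambda, k_\mu\rangle|\,|\langle k_\mu, S^*k_\lambda\rangle| \\ &\le |\langle Tk_\lambda, k_\mu\rangle|^2 + |\langle Sk_\lambda, k_\mu\rangle|^2 + \|Tk_\lambda\|\,\|S^*k_\lambda\| + |\langle Tk_\lambda, S^*k_\lambda\rangle| \\ &= |\langle Tk_\lambda, k_\mu\rangle|^2 + |\langle Sk_\lambda, k_\mu\rangle|^2 + \|Tk_\lambda\|\,\|S^*k_\lambda\| + |\langle STk_\lambda, k_\lambda\rangle| \\ &\le \|T\|^2_{\mathrm{ber}} + \|S\|^2_{\mathrm{ber}} + \|T\|_{\mathrm{ber}}\,\|S^*\|_{\mathrm{ber}} + \mathrm{ber}(ST). \end{aligned} \]
Thus, we obtain
\[ |\langle (T + S)k_\lambda, k_\mu\rangle|^2 \le \|T\|^2_{\mathrm{ber}} + \|S\|^2_{\mathrm{ber}} + \|T\|_{\mathrm{ber}}\,\|S^*\|_{\mathrm{ber}} + \mathrm{ber}(ST) \]
for every $\lambda, \mu \in \Omega$. Therefore, by taking the supremum over all $\lambda, \mu \in \Omega$ in the above inequality, we get
\[ \|T + S\|^2_{\mathrm{ber}} \le \|T\|^2_{\mathrm{ber}} + \|S\|^2_{\mathrm{ber}} + \|T\|_{\mathrm{ber}}\,\|S^*\|_{\mathrm{ber}} + \mathrm{ber}(ST). \]
This completes the proof. □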
Remark 3.6 We now turn our attention to the bounds, obtained in Theorems 3.6 and 3.7, respectively, that are not comparable in general. The following numerical examples will illustrate the incomparability of such the upper bounds. Consider T = 1 2
1
0
1 2
0
2 ð2 Þ
and S = T. Then one may verify that
ðkTT þ SS kber þ berðTT - SS ÞÞ þ berðST Þ þ 2kTkberkSkber = 4
while kTk2ber þ kSk2ber þ kTk
ber
kS k
ber
þ berðSTÞ = 3 þ
0 α Again, if we consider T = α 0 β ,S= 0 0 ≤ γ < β < α, then
0 γ
5 4:
2 ð2 Þ, where
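\[ \begin{aligned} \|T\|^2_{\mathrm{ber}} + \|S\|^2_{\mathrm{ber}} + \|T\|_{\mathrm{ber}}\,\|S^*\|_{\mathrm{ber}} + \mathrm{ber}(ST) &= 3\alpha^2 + \mathrm{ber}(ST) \\ &< 3\alpha^2 + \frac{1}{2}(\beta^2 - \gamma^2) + \mathrm{ber}(ST) \\ &= \frac{1}{2}\big(\|TT^* + SS^*\|_{\mathrm{ber}} + \mathrm{ber}(TT^* - SS^*)\big) + \mathrm{ber}(ST^*) + 2\,\|T\|_{\mathrm{ber}}\,\|S\|_{\mathrm{ber}}. \end{aligned} \]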
This shows that the upper bounds obtained in Theorems 3.6 and 3.7 are not comparable.

Another application of Lemma 3.2 can be seen in the next result.

Theorem 3.8 Let $T, S \in \mathbb{B}(\mathcal{H}_\Omega)$ be two positive operators. Then
\[ \|T + S\|_{\mathrm{ber}} \le \sqrt{\mathrm{ber}^2(T + iS) + \|T\|_{\mathrm{ber}}\,\|S^*\|_{\mathrm{ber}} + \mathrm{ber}(ST)}. \]
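Proof Since $T \ge 0$ and $S \ge 0$, we have $T + S \ge 0$. Thus, by Lemma 3.2 we have
\[ \|T + S\|_{\mathrm{ber}} = \mathrm{ber}(T + S). \tag{3.15} \]
Now, let $\lambda \in \Omega$ and $k_\lambda$ be the normalized reproducing kernel of the space $\mathcal{H}_\Omega$. By proceeding as in the proof of Theorem 3.7, we see that
\[ \begin{aligned} |\langle (T + S)k_\lambda, k_\lambda\rangle|^2 &\le |\langle Tk_\lambda, k_\lambda\rangle|^2 + |\langle Sk_\lambda, k_\lambda\rangle|^2 + 2\,|\langle Tk_\lambda, k_\lambda\rangle|\,|\langle k_\lambda, S^*k_\lambda\rangle| \\ &= |\langle Tk_\lambda, k_\lambda\rangle + i\langle Sk_\lambda, k_\lambda\rangle|^2 + 2\,|\langle Tk_\lambda, k_\lambda\rangle|\,|\langle k_\lambda, S^*k_\lambda\rangle| \\ &\le |\langle (T + iS)k_\lambda, k_\lambda\rangle|^2 + \|Tk_\lambda\|\,\|S^*k_\lambda\| + |\langle Tk_\lambda, S^*k_\lambda\rangle| \\ &= |\langle (T + iS)k_\lambda, k_\lambda\rangle|^2 + \|Tk_\lambda\|\,\|S^*k_\lambda\| + |\langle STk_\lambda, k_\lambda\rangle| \\ &\le \mathrm{ber}^2(T + iS) + \|T\|_{\mathrm{ber}}\,\|S^*\|_{\mathrm{ber}} + \mathrm{ber}(ST). \end{aligned} \]
Hence, we obtain
\[ |\langle (T + S)k_\lambda, k_\lambda\rangle|^2 \le \mathrm{ber}^2(T + iS) + \|T\|_{\mathrm{ber}}\,\|S^*\|_{\mathrm{ber}} + \mathrm{ber}(ST) \]
for every $\lambda \in \Omega$. Therefore, by taking the supremum over all $\lambda \in \Omega$ in the above inequality, we get
\[ \mathrm{ber}^2(T + S) \le \mathrm{ber}^2(T + iS) + \|T\|_{\mathrm{ber}}\,\|S^*\|_{\mathrm{ber}} + \mathrm{ber}(ST). \tag{3.16} \]
Hence, we deduce the desired result by combining (3.15) and (3.16). □

Our next result reads as follows.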
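Theorem 3.9 Let $T, S \in \mathbb{B}(\mathcal{H}_\Omega)$. Then
\[ \mathrm{ber}(T + S) \le \sqrt{\mathrm{ber}^2(T) + \mathrm{ber}^2(S) + \frac{1}{2}\,\|T^*T + SS^*\|_{\mathrm{ber}} + \mathrm{ber}(ST)}. \]

Proof Let $\lambda \in \Omega$ and $k_\lambda$ be the normalized reproducing kernel of the space $\mathcal{H}_\Omega$. By proceeding as in the proof of Theorem 3.7, we see that
\[ \begin{aligned} |\langle (T + S)k_\lambda, k_\lambda\rangle|^2 &\le |\langle Tk_\lambda, k_\lambda\rangle|^2 + |\langle Sk_\lambda, k_\lambda\rangle|^2 + 2\,|\langle Tk_\lambda, k_\lambda\rangle|\,|\langle Sk_\lambda, k_\lambda\rangle| \\ &= |\langle Tk_\lambda, k_\lambda\rangle|^2 + |\langle Sk_\lambda, k_\lambda\rangle|^2 + 2\,|\langle Tk_\lambda, k_\lambda\rangle|\,|\langle k_\lambda, S^*k_\lambda\rangle| \\ &\le |\langle Tk_\lambda, k_\lambda\rangle|^2 + |\langle Sk_\lambda, k_\lambda\rangle|^2 + \|Tk_\lambda\|\,\|S^*k_\lambda\| + |\langle Tk_\lambda, S^*k_\lambda\rangle| \\ &= |\langle Tk_\lambda, k_\lambda\rangle|^2 + |\langle Sk_\lambda, k_\lambda\rangle|^2 + \|Tk_\lambda\|\,\|S^*k_\lambda\| + |\langle STk_\lambda, k_\lambda\rangle| \\ &\le \mathrm{ber}^2(T) + \mathrm{ber}^2(S) + \|Tk_\lambda\|\,\|S^*k_\lambda\| + \mathrm{ber}(ST). \end{aligned} \]
Moreover, by applying the arithmetic-geometric mean inequality, we get
\[ \begin{aligned} |\langle (T + S)k_\lambda, k_\lambda\rangle|^2 &\le \mathrm{ber}^2(T) + \mathrm{ber}^2(S) + \frac{1}{2}\big(\|Tk_\lambda\|^2 + \|S^*k_\lambda\|^2\big) + \mathrm{ber}(ST) \\ &= \mathrm{ber}^2(T) + \mathrm{ber}^2(S) + \frac{1}{2}\big\langle (T^*T + SS^*)k_\lambda, k_\lambda\big\rangle + \mathrm{ber}(ST) \\ &\le \mathrm{ber}^2(T) + \mathrm{ber}^2(S) + \frac{1}{2}\,\mathrm{ber}(T^*T + SS^*) + \mathrm{ber}(ST) \\ &= \mathrm{ber}^2(T) + \mathrm{ber}^2(S) + \frac{1}{2}\,\|T^*T + SS^*\|_{\mathrm{ber}} + \mathrm{ber}(ST), \end{aligned} \]
where the last equality follows from Lemma 3.2 since $T^*T + SS^* \ge 0$. Hence, we obtain
\[ |\langle (T + S)k_\lambda, k_\lambda\rangle|^2 \le \mathrm{ber}^2(T) + \mathrm{ber}^2(S) + \frac{1}{2}\,\|T^*T + SS^*\|_{\mathrm{ber}} + \mathrm{ber}(ST) \]
for every $\lambda \in \Omega$. Therefore, by taking the supremum over all $\lambda \in \Omega$ in the above inequality, we get
\[ \mathrm{ber}^2(T + S) \le \mathrm{ber}^2(T) + \mathrm{ber}^2(S) + \frac{1}{2}\,\|T^*T + SS^*\|_{\mathrm{ber}} + \mathrm{ber}(ST). \]
This completes the proof. □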
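The inequality of Theorem 3.9 lends itself to a quick randomized sanity check. The sketch below assumes the finite-dimensional model in which $\mathbb{C}^3$ is an RKHS with the standard basis as normalized reproducing kernels, and assumes the Berezin norm is computed as the largest modulus of a matrix entry. The random matrices, the seed, and the function names are hypothetical; passing such a check is of course no substitute for the proof.

```python
import random

def adjoint(T):
    """Conjugate transpose of a square matrix given as a list of rows."""
    n = len(T)
    return [[T[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def ber(T):
    """Berezin number in the assumed model: max |<T e_i, e_i>|."""
    return max(abs(T[i][i]) for i in range(len(T)))

def berezin_norm(T):
    """Berezin norm in the assumed model: max |<T e_j, e_i>| over all i, j."""
    return max(abs(x) for row in T for x in row)

def random_matrix(n, rng):
    """Random complex n x n matrix with entries in the square [-1, 1]^2."""
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]

def theorem_3_9_gap(T, S):
    """Right-hand side minus left-hand side of the squared form of Theorem 3.9."""
    lhs = ber(add(T, S)) ** 2
    P = add(matmul(adjoint(T), T), matmul(S, adjoint(S)))   # T*T + SS*
    rhs = ber(T) ** 2 + ber(S) ** 2 + 0.5 * berezin_norm(P) + ber(matmul(S, T))
    return rhs - lhs

rng = random.Random(0)
gaps = [theorem_3_9_gap(random_matrix(3, rng), random_matrix(3, rng)) for _ in range(1000)]
print(min(gaps))   # expected to be non-negative up to rounding
```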
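4 A-Berezin Number and A-Berezin Seminorm Inequalities

In this section, $(\mathcal{H}_\Omega, \langle\cdot,\cdot\rangle)$ denotes an RKHS on a set $\Omega$ with associated norm $\|\cdot\|$. Let us introduce the following definition.

Definition 4.1 Let $T \in \mathbb{B}_{A^{1/2}}(\mathcal{H}_\Omega)$ and $k_\lambda, k_\mu$ be two normalized reproducing kernels of $\mathcal{H}_\Omega$.
(i) For $\lambda \in \Omega$, the $A$-Berezin symbol of $T$ at $\lambda$ is $\widetilde{T}(\lambda) = \langle Tk_\lambda, k_\lambda\rangle_A$.
(ii) The $A$-Berezin range of $T$ is $\mathrm{Ber}_A(T) := \{\langle Tk_\lambda, k_\lambda\rangle_A ;\ \lambda \in \Omega\}$.
(iii) The $A$-Berezin number of $T$ is $\mathrm{ber}_A(T) := \sup\{|\langle Tk_\lambda, k_\lambda\rangle_A| ;\ \lambda \in \Omega\}$.
(iv) The $A$-Berezin seminorm of $T$ is $\|T\|_{\mathrm{ber}_A} := \sup\{|\langle Tk_\lambda, k_\mu\rangle_A| ;\ \lambda, \mu \in \Omega\}$.

In the following proposition, we sum up some elementary properties of $\mathrm{ber}_A(\cdot)$ and $\|\cdot\|_{\mathrm{ber}_A}$ which follow immediately from their definitions.

Proposition 4.1 Let $T, S \in \mathbb{B}_{A^{1/2}}(\mathcal{H}_\Omega)$ and $\alpha \in \mathbb{C}$. Then, the following properties hold:
(i) $\mathrm{ber}_A(\alpha T) = |\alpha|\,\mathrm{ber}_A(T)$,
(ii) $\mathrm{ber}_A(T + S) \le \mathrm{ber}_A(T) + \mathrm{ber}_A(S)$,
(iii) $\mathrm{ber}_A(T) \le \|T\|_{\mathrm{ber}_A} \le \|A\|\,\|T\|$,
(iv) $\|\alpha T\|_{\mathrm{ber}_A} = |\alpha|\,\|T\|_{\mathrm{ber}_A}$,
(v) $\|T + S\|_{\mathrm{ber}_A} \le \|T\|_{\mathrm{ber}_A} + \|S\|_{\mathrm{ber}_A}$,
(vi) If $T \in \mathbb{B}_A(\mathcal{H}_\Omega)$, then $\mathrm{ber}_A(T) = \mathrm{ber}_A(T^{\sharp_A})$ and $\|T\|_{\mathrm{ber}_A} = \|T^{\sharp_A}\|_{\mathrm{ber}_A}$.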
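Definition 4.1 and the first properties in Proposition 4.1 can be made concrete in a small model. The sketch below assumes that $\mathbb{C}^2$ is an RKHS with the standard basis as normalized reproducing kernels and that $A$ is a positive semidefinite matrix, so that $\langle Te_j, e_i\rangle_A = (AT)_{ij}$. The matrices `A`, `T`, `S`, the scalar `c`, and the function names are hypothetical.

```python
def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, T):
    """Multiply every entry of T by the scalar c."""
    return [[c * x for x in row] for row in T]

def ber_A(A, T):
    """A-Berezin number in the assumed model: max |<T e_i, e_i>_A| = max |(AT)_ii|."""
    AT = matmul(A, T)
    return max(abs(AT[i][i]) for i in range(len(AT)))

def norm_ber_A(A, T):
    """A-Berezin seminorm in the assumed model: max |<T e_j, e_i>_A| = max |(AT)_ij|."""
    AT = matmul(A, T)
    return max(abs(x) for row in AT for x in row)

A = [[2, 1], [1, 1]]              # positive definite: trace 3, determinant 1
T = [[1 + 1j, 2], [0, -1j]]
S = [[0.5, -1], [1j, 3]]
c = 2 - 3j

print(ber_A(A, T), norm_ber_A(A, T))
print(ber_A(A, scale(c, T)), abs(c) * ber_A(A, T))                   # property (i)
print(ber_A(A, add(T, S)), ber_A(A, T) + ber_A(A, S))                # property (ii)
print(norm_ber_A(A, add(T, S)), norm_ber_A(A, T) + norm_ber_A(A, S)) # property (v)
```

Taking $A$ to be the identity matrix in this sketch recovers the Berezin number and Berezin norm of Section 3 in the same model.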
One main goal of this section is to derive several bounds involving $\mathrm{ber}_A(\cdot)$ and $\|\cdot\|_{\mathrm{ber}_A}$. Our first result in this section reads as follows.
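Theorem 4.1 Let $T, S, X, Y \in \mathbb{B}_A(\mathcal{H}_\Omega)$. Then
\[ \mathrm{ber}_A(TX \pm YS) \le \sqrt{\|TT^{\sharp_A} + S^{\sharp_A}S\|_{\mathrm{ber}_A}\,\|X^{\sharp_A}X + YY^{\sharp_A}\|_{\mathrm{ber}_A}}, \tag{4.1} \]
and
\[ \mathrm{ber}_A(TX \pm YS) \le \sqrt{\|YY^{\sharp_A} + TT^{\sharp_A}\|_{\mathrm{ber}_A}\,\|X^{\sharp_A}X + S^{\sharp_A}S\|_{\mathrm{ber}_A}}. \tag{4.2} \]

Proof Let $\lambda \in \Omega$ and $k_\lambda$ be the normalized reproducing kernel of the space $\mathcal{H}_\Omega$. By applying the Cauchy-Schwarz inequality, we get
\[ \begin{aligned} |\langle (TX \pm YS)k_\lambda, k_\lambda\rangle_A|^2 &\le \big(|\langle TXk_\lambda, k_\lambda\rangle_A| + |\langle YSk_\lambda, k_\lambda\rangle_A|\big)^2 \\ &= \big(|\langle Xk_\lambda, T^{\sharp_A}k_\lambda\rangle_A| + |\langle Sk_\lambda, Y^{\sharp_A}k_\lambda\rangle_A|\big)^2 \\ &\le \big(\|Xk_\lambda\|_A\,\|T^{\sharp_A}k_\lambda\|_A + \|Sk_\lambda\|_A\,\|Y^{\sharp_A}k_\lambda\|_A\big)^2 \\ &\le \big(\|Sk_\lambda\|^2_A + \|T^{\sharp_A}k_\lambda\|^2_A\big)\big(\|Xk_\lambda\|^2_A + \|Y^{\sharp_A}k_\lambda\|^2_A\big) \\ &= \big\langle (S^{\sharp_A}S + TT^{\sharp_A})k_\lambda, k_\lambda\big\rangle_A\,\big\langle (X^{\sharp_A}X + YY^{\sharp_A})k_\lambda, k_\lambda\big\rangle_A \\ &\le \mathrm{ber}_A(TT^{\sharp_A} + S^{\sharp_A}S)\,\mathrm{ber}_A(X^{\sharp_A}X + YY^{\sharp_A}) \\ &\le \|TT^{\sharp_A} + S^{\sharp_A}S\|_{\mathrm{ber}_A}\,\|X^{\sharp_A}X + YY^{\sharp_A}\|_{\mathrm{ber}_A}. \end{aligned} \]
Thus
\[ |\langle (TX \pm YS)k_\lambda, k_\lambda\rangle_A| \le \sqrt{\|TT^{\sharp_A} + S^{\sharp_A}S\|_{\mathrm{ber}_A}\,\|X^{\sharp_A}X + YY^{\sharp_A}\|_{\mathrm{ber}_A}} \]
for all $\lambda \in \Omega$. By taking the supremum over all $\lambda \in \Omega$ in the above inequality, we obtain (4.1) as desired. On the other hand, an application of the Cauchy-Schwarz inequality gives
\[ \|Xk_\lambda\|_A\,\|T^{\sharp_A}k_\lambda\|_A + \|Sk_\lambda\|_A\,\|Y^{\sharp_A}k_\lambda\|_A \le \sqrt{\|Y^{\sharp_A}k_\lambda\|^2_A + \|T^{\sharp_A}k_\lambda\|^2_A}\,\sqrt{\|Xk_\lambda\|^2_A + \|Sk_\lambda\|^2_A}. \]
So, by taking the last inequality into consideration and using a similar argument as in the proof of (4.1), we get the second inequality (4.2). Hence, the proof is complete. □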
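Inequality (4.1) can be probed numerically in a simple setting. The following is a minimal sketch, assuming that $\mathbb{C}^3$ is an RKHS with the standard basis as normalized reproducing kernels and that $A$ is an invertible diagonal positive matrix, in which case the $A$-adjoint reduces to $T^{\sharp_A} = A^{-1}T^*A$. The weights `a`, the random matrices, and all function names are hypothetical.

```python
import random

def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, T):
    """Multiply every entry of T by the scalar c."""
    return [[c * x for x in row] for row in T]

def a_adjoint(a, T):
    """A-adjoint for A = diag(a) with positive entries: A^{-1} T* A."""
    n = len(T)
    return [[T[j][i].conjugate() * a[j] / a[i] for j in range(n)] for i in range(n)]

def ber_A(a, T):
    """max |<T e_i, e_i>_A| = max |a_i T_ii| for A = diag(a)."""
    return max(abs(a[i] * T[i][i]) for i in range(len(T)))

def norm_ber_A(a, T):
    """max |<T e_j, e_i>_A| = max |a_i T_ij| for A = diag(a)."""
    n = len(T)
    return max(abs(a[i] * T[i][j]) for i in range(n) for j in range(n))

def random_matrix(n, rng):
    """Random complex n x n matrix with entries in the square [-1, 1]^2."""
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]

def gap_4_1(a, T, S, X, Y, sign):
    """Squared right-hand side of (4.1) minus squared left-hand side."""
    lhs = ber_A(a, add(matmul(T, X), scale(sign, matmul(Y, S)))) ** 2
    left = add(matmul(T, a_adjoint(a, T)), matmul(a_adjoint(a, S), S))
    right = add(matmul(a_adjoint(a, X), X), matmul(Y, a_adjoint(a, Y)))
    return norm_ber_A(a, left) * norm_ber_A(a, right) - lhs

a = [1.0, 2.5, 0.4]            # diagonal of the positive matrix A (hypothetical weights)
rng = random.Random(1)
gaps = []
for _ in range(500):
    T, S, X, Y = (random_matrix(3, rng) for _ in range(4))
    gaps.append(min(gap_4_1(a, T, S, X, Y, +1), gap_4_1(a, T, S, X, Y, -1)))
print(min(gaps))               # expected to be non-negative up to rounding
```

A diagonal $A$ keeps the $A$-adjoint explicit without a pseudo-inverse; a general positive semidefinite $A$ would require the Moore-Penrose inverse of $A$ instead.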
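In all that follows, for any arbitrary operator $X \in \mathbb{B}_A(\mathcal{H}_\Omega)$, we write
\[ \mathrm{Re}_A(X) := \frac{X + X^{\sharp_A}}{2} \quad\text{and}\quad \mathrm{Im}_A(X) := \frac{X - X^{\sharp_A}}{2i}. \]
The next lemma is useful in proving our next result.

Lemma 4.1 Let $X \in \mathbb{B}_A(\mathcal{H}_\Omega)$. Then
\[ \mathrm{ber}_A(X) = \sup_{\theta \in \mathbb{R}} \mathrm{ber}_A\big(\mathrm{Re}_A(e^{i\theta}X)\big). \tag{4.3} \]

Proof Notice first that it follows from [39, Lemma 2.4] that
\[ \sup_{\theta \in \mathbb{R}} \big|\big\langle \mathrm{Re}_A(e^{i\theta}X)k_\lambda, k_\lambda\big\rangle_A\big| = |\langle Xk_\lambda, k_\lambda\rangle_A|. \tag{4.4} \]
So, by taking (4.4) into consideration, we see that
\[ \begin{aligned} \sup_{\theta \in \mathbb{R}} \mathrm{ber}_A\big(\mathrm{Re}_A(e^{i\theta}X)\big) &= \sup_{\theta \in \mathbb{R}}\,\sup_{\lambda \in \Omega} \big|\big\langle \mathrm{Re}_A(e^{i\theta}X)k_\lambda, k_\lambda\big\rangle_A\big| \\ &= \sup_{\lambda \in \Omega} |\langle Xk_\lambda, k_\lambda\rangle_A| \\ &= \mathrm{ber}_A(X). \end{aligned} \]
This finishes the proof. □

Remark 4.1 By replacing $X$ by $iX$ in (4.3), we get
\[ \mathrm{ber}_A(X) = \sup_{\theta \in \mathbb{R}} \mathrm{ber}_A\big(\mathrm{Im}_A(e^{i\theta}X)\big). \tag{4.5} \]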
We are in a position to prove the following result.

Theorem 4.2 Let $T, X \in \mathbb{B}_A(\mathcal{H}_\Omega)$. Then
ð
ð
Þ
ð
Þ
ber2A ðTX ± XT ♯A Þ ≤ 2kA1∕2 k2 kT ♯A k2 berA Re2A ðX Þ þ berA Im2A ðX Þ
ð
Þ
ð
Þ
Þ
þ berA Re2A ðX Þ þ berA Im2A ðX Þ þ ber2A ðLA Þ , where LA = ReA(X)ImA(X) + ImA(X)ReA(X).
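Proof Notice first that it can be verified that
\[ \mathrm{Re}_A\Big[e^{i\theta}\big(X^{\sharp_A}T^{\sharp_A} + (T^{\sharp_A})^{\sharp_A}X^{\sharp_A}\big)\Big] = (T^{\sharp_A})^{\sharp_A}\,\mathrm{Re}_A(e^{i\theta}X^{\sharp_A}) + \mathrm{Re}_A(e^{i\theta}X^{\sharp_A})\,T^{\sharp_A}. \tag{4.6} \]
Let $\lambda \in \Omega$ and $k_\lambda$ be the normalized reproducing kernel of the space $\mathcal{H}_\Omega$. Then, by using (4.6), we see that
\[ \begin{aligned} &\Big|\Big\langle \mathrm{Re}_A\Big[e^{i\theta}\big(X^{\sharp_A}T^{\sharp_A} + (T^{\sharp_A})^{\sharp_A}X^{\sharp_A}\big)\Big]k_\lambda, k_\lambda\Big\rangle_A\Big|^2 \\ &\quad = \Big|\Big\langle \Big((T^{\sharp_A})^{\sharp_A}\,\mathrm{Re}_A(e^{i\theta}X^{\sharp_A}) + \mathrm{Re}_A(e^{i\theta}X^{\sharp_A})\,T^{\sharp_A}\Big)k_\lambda, k_\lambda\Big\rangle_A\Big|^2 \\ &\quad \le 2\Big(\big|\big\langle (T^{\sharp_A})^{\sharp_A}\,\mathrm{Re}_A(e^{i\theta}X^{\sharp_A})k_\lambda, k_\lambda\big\rangle_A\big|^2 + \big|\big\langle \mathrm{Re}_A(e^{i\theta}X^{\sharp_A})\,T^{\sharp_A}k_\lambda, k_\lambda\big\rangle_A\big|^2\Big), \end{aligned} \]
where the last inequality follows by using the triangle inequality, together with the convexity of the function $t \mapsto t^2$. Now, by using the Cauchy-Schwarz inequality, we observe that
\[ \begin{aligned} &\Big|\Big\langle \mathrm{Re}_A\Big[e^{i\theta}\big(X^{\sharp_A}T^{\sharp_A} + (T^{\sharp_A})^{\sharp_A}X^{\sharp_A}\big)\Big]k_\lambda, k_\lambda\Big\rangle_A\Big|^2 \\ &\quad \le 2\Big(\big|\big\langle \mathrm{Re}_A(e^{i\theta}X^{\sharp_A})k_\lambda, T^{\sharp_A}k_\lambda\big\rangle_A\big|^2 + \big|\big\langle T^{\sharp_A}k_\lambda, \big[\mathrm{Re}_A(e^{i\theta}X^{\sharp_A})\big]^{\sharp_A}k_\lambda\big\rangle_A\big|^2\Big) \\ &\quad \le 2\,\|A^{1/2}T^{\sharp_A}\|^2\,\big\|\mathrm{Re}_A(e^{i\theta}X^{\sharp_A})k_\lambda\big\|^2_A + 2\,\|A^{1/2}T^{\sharp_A}\|^2\,\big\|\big[\mathrm{Re}_A(e^{i\theta}X^{\sharp_A})\big]^{\sharp_A}k_\lambda\big\|^2_A \\ &\quad = 4\,\|A^{1/2}T^{\sharp_A}\|^2\,\big\|\big[\mathrm{Re}_A(e^{i\theta}X^{\sharp_A})\big]^{\sharp_A}k_\lambda\big\|^2_A. \end{aligned} \]
Since $\mathrm{Re}_A(e^{i\theta}X^{\sharp_A})$ is an $A$-self-adjoint operator, by taking (2.1) into consideration, we infer that
\[ \begin{aligned} &\Big|\Big\langle \mathrm{Re}_A\Big[e^{i\theta}\big(X^{\sharp_A}T^{\sharp_A} + (T^{\sharp_A})^{\sharp_A}X^{\sharp_A}\big)\Big]k_\lambda, k_\lambda\Big\rangle_A\Big|^2 \\ &\quad \le 4\,\|A^{1/2}T^{\sharp_A}\|^2\,\Big\langle \Big(\big[\mathrm{Re}_A(e^{i\theta}X^{\sharp_A})\big]^{\sharp_A}\Big)^2 k_\lambda, k_\lambda\Big\rangle_A \\ &\quad \le 4\,\|A^{1/2}\|^2\,\|T^{\sharp_A}\|^2\,\Big\langle \Big(\big[\mathrm{Re}_A(e^{i\theta}X^{\sharp_A})\big]^2\Big)^{\sharp_A} k_\lambda, k_\lambda\Big\rangle_A \\ &\quad = 4\,\|A^{1/2}\|^2\,\|T^{\sharp_A}\|^2\,\Big\langle \big[\mathrm{Re}_A(e^{i\theta}X^{\sharp_A})\big]^2 k_\lambda, k_\lambda\Big\rangle_A, \end{aligned} \]
where the last equality follows since $\big[\mathrm{Re}_A(e^{i\theta}X^{\sharp_A})\big]^2 \ge_A 0$. Now, since we have $X = \mathrm{Re}_A(X) + i\,\mathrm{Im}_A(X)$, then
\[ X^{\sharp_A} = [\mathrm{Re}_A(X)]^{\sharp_A} - i\,[\mathrm{Im}_A(X)]^{\sharp_A} = \mathrm{Re}_A(X^{\sharp_A}) - i\,\mathrm{Im}_A(X^{\sharp_A}). \]
So, we see that
\[ \begin{aligned} \big[\mathrm{Re}_A(e^{i\theta}X^{\sharp_A})\big]^2 &= \Big[\mathrm{Re}_A\Big(e^{i\theta}\big[\mathrm{Re}_A(X^{\sharp_A}) - i\,\mathrm{Im}_A(X^{\sharp_A})\big]\Big)\Big]^2 \\ &= \big(\cos\theta\,\mathrm{Re}_A(X^{\sharp_A}) + \sin\theta\,\mathrm{Im}_A(X^{\sharp_A})\big)^2 \\ &= \cos^2\theta\,\mathrm{Re}^2_A(X^{\sharp_A}) + \sin^2\theta\,\mathrm{Im}^2_A(X^{\sharp_A}) \\ &\qquad + \cos\theta\sin\theta\,\big(\mathrm{Re}_A(X^{\sharp_A})\,\mathrm{Im}_A(X^{\sharp_A}) + \mathrm{Im}_A(X^{\sharp_A})\,\mathrm{Re}_A(X^{\sharp_A})\big) \\ &= \cos^2\theta\,\mathrm{Re}^2_A(X^{\sharp_A}) + \sin^2\theta\,\mathrm{Im}^2_A(X^{\sharp_A}) + \cos\theta\sin\theta\,L_A^{\sharp_A}, \end{aligned} \]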
where LA = ReA(X)ImA(X) + ImA(X)ReA(X). Hence, we see that sup h ReA ðeiθ X ♯A Þ kλ , k λ iA 2
θ2
½
= sup cos 2 θhRe2A ðX ♯A Þk λ , k λ iA þ sin2 θhIm2A ðX ♯A Þk λ , k λ iA θ2
þ cos θ sin θhL♯AA k λ , k λ iA
½
ð
Þ
ð
≤ sup cos2 θ berA Re2A ðX ♯A Þ þ sin2 θ berA Im2A ðX ♯A Þ θ2
þ cos θ sin θhL♯AA k λ , k λ iA
½
ð
Þ
Þ
ð
Þ
= sup cos2 θ berA Re2A ðX Þ þ sin2 θ berA Im2A ðX Þ θ2
≤
ð
þ cos θ sin θhL♯AA k λ , k λ iA
ð
Þ
ð
Þ
1 berA Re2A ðX Þ þ berA Im2A ðX Þ 2
ð
Þ
ð
Þ ð
þ berA Re2A ðX Þ þ berA Im2A ðX Þ þ hL♯AA kλ , kλ iA So, by taking (4.3) into account, we infer that
Þ Þ, 2
Þ
ber2A ðTX þ XT ♯A Þ
ð
♯
= ber2A X ♯A T ♯A þ ðT ♯A Þ X ♯A
Þ
½ ð
♯
= sup hReA eiθ X ♯A T ♯A þ ðT ♯A Þ A X ♯A θ2
Þkλ, kλiA
2
≤ 4kA1∕2 k kT ♯A k suph ReA ðeiθ X ♯A Þ k λ , kλ iA 2
2
2
θ2
ð
ð
Þ
ð
≤ 2kA1∕2 k kT ♯A k berA Re2A ðX Þ þ berA Im2A ðX Þ 2
2
ð
Þ
ð
Þ
Þ ð
þ berA Re2A ðX Þ þ berA Im2A ðX Þ þ hL♯AA k λ , k λ iA
ÞÞ 2
for all λ 2 Ω. So, by taking the supremum over all λ 2 Ω in the last inequality, we obtain
ð
ð
Þ
ð
ber2A ðTX þ XT ♯A Þ ≤ 2kA1∕2 k2 kT ♯A k2 berA Re2A ðX Þ þ berA Im2A ðX Þ
ð
Þ
ð
Þ
Þ
Þ
þ berA Re2A ðX Þ þ berA Im2A ðX Þ þ ber2A ðL♯AA Þ , whence
ð
ð
Þ
ð
ber2A ðTX þ XT ♯A Þ ≤ 2kA1∕2 k2 kT ♯A k2 berA Re2A ðX Þ þ berA Im2A ðX Þ
ð
Þ
ð
Þ
Þ
Þ
þ berA Re2A ðX Þ þ berA Im2A ðX Þ þ ber2A ðLA Þ : Now, replacing T by iT in the last inequality yields
ð
ð
Þ
ð
Þ
ber2A ðTX - XT ♯A Þ ≤ 2kA1∕2 k2 kT ♯A k2 berA Re2A ðX Þ þ berA Im2A ðX Þ
ð
Þ
ð
Þ
Þ
þ berA Re2A ðX Þ þ berA Im2A ðX Þ þ ber2A ðLA Þ : By combining the above two inequalities, we get the desired result.
□
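For $T \in \mathbb{B}_{A^{1/2}}(\mathcal{H}_\Omega)$, we define
\[ c_A(T) := \inf\{|\langle Tk_\lambda, k_\lambda\rangle_A| ;\ \lambda \in \Omega\}. \]
In the next theorem, we obtain the following lower bound for $\mathrm{ber}_A(T)$.

Theorem 4.3 Let $T \in \mathbb{B}_A(\mathcal{H}_\Omega)$. Then
\[ \mathrm{ber}_A(T) \ge \frac{\sqrt{2}}{2}\sqrt{\mathrm{ber}^2_A(\mathrm{Re}_A(T)) + \mathrm{ber}^2_A(\mathrm{Im}_A(T)) + c_A^2(\mathrm{Re}_A(T)) + c_A^2(\mathrm{Im}_A(T))}. \]

Proof Let $\lambda \in \Omega$ and $k_\lambda$ be the normalized reproducing kernel of the space $\mathcal{H}_\Omega$. Since $T = \mathrm{Re}_A(T) + i\,\mathrm{Im}_A(T)$, we have
\[ |\langle \mathrm{Re}_A(T)k_\lambda, k_\lambda\rangle_A|^2 + |\langle \mathrm{Im}_A(T)k_\lambda, k_\lambda\rangle_A|^2 = |\langle Tk_\lambda, k_\lambda\rangle_A|^2. \]
This implies that
\[ c_A^2(\mathrm{Re}_A(T)) + \mathrm{ber}^2_A(\mathrm{Im}_A(T)) \le \mathrm{ber}^2_A(T), \tag{4.7} \]
and
\[ c_A^2(\mathrm{Im}_A(T)) + \mathrm{ber}^2_A(\mathrm{Re}_A(T)) \le \mathrm{ber}^2_A(T). \tag{4.8} \]
A combination of (4.7), together with (4.8), gives
\[ 2\,\mathrm{ber}^2_A(T) \ge c_A^2(\mathrm{Re}_A(T)) + \mathrm{ber}^2_A(\mathrm{Im}_A(T)) + c_A^2(\mathrm{Im}_A(T)) + \mathrm{ber}^2_A(\mathrm{Re}_A(T)). \]
Hence, the proof is complete. □

The following lemma is useful in proving our next result.

Lemma 4.2 Let $T, S \in \mathbb{B}_A(\mathcal{H}_\Omega)$. Then
\[ \mathrm{ber}_A(ST^{\sharp_A}) = \mathrm{ber}_A(TS^{\sharp_A}). \]

Proof Since $AP_{\overline{\mathrm{ran}(A)}} = A$ and $P_{\overline{\mathrm{ran}(A)}}S^{\sharp_A} = S^{\sharp_A}$, we see that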
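\[ \begin{aligned} \mathrm{ber}_A(ST^{\sharp_A}) &= \mathrm{ber}_A\big(P_{\overline{\mathrm{ran}(A)}}\,T\,P_{\overline{\mathrm{ran}(A)}}\,S^{\sharp_A}\big) \\ &= \mathrm{ber}_A\big(P_{\overline{\mathrm{ran}(A)}}\,TS^{\sharp_A}\big) \\ &= \sup\big\{|\langle AP_{\overline{\mathrm{ran}(A)}}\,TS^{\sharp_A}k_\lambda, k_\lambda\rangle| ;\ \lambda \in \Omega\big\} \\ &= \sup\big\{|\langle TS^{\sharp_A}k_\lambda, k_\lambda\rangle_A| ;\ \lambda \in \Omega\big\} \\ &= \mathrm{ber}_A(TS^{\sharp_A}), \end{aligned} \]
where the first equality uses $\mathrm{ber}_A(Y) = \mathrm{ber}_A(Y^{\sharp_A})$ with $Y = ST^{\sharp_A}$ together with $(T^{\sharp_A})^{\sharp_A} = P_{\overline{\mathrm{ran}(A)}}\,T\,P_{\overline{\mathrm{ran}(A)}}$. This proves the desired result. □

Next, we prove the following $A$-Berezin norm inequality.

Theorem 4.4 Let $T, S \in \mathbb{B}_A(\mathcal{H}_\Omega)$. Then, the following inequalities hold:
\[ \|T + S\|_{\mathrm{ber}_A} \le \sqrt{\|T\|^2_{\mathrm{ber}_A} + \|S\|^2_{\mathrm{ber}_A} + \frac{1}{2}\,\mathrm{ber}_A(T^{\sharp_A}T + S^{\sharp_A}S) + \mathrm{ber}_A(T^{\sharp_A}S)}, \]
and
\[ \|T + S\|_{\mathrm{ber}_A} \le \sqrt{\|T\|^2_{\mathrm{ber}_A} + \|S\|^2_{\mathrm{ber}_A} + \frac{1}{2}\,\mathrm{ber}_A(TT^{\sharp_A} + SS^{\sharp_A}) + \mathrm{ber}_A(TS^{\sharp_A})}. \]

Proof Let $\lambda, \mu \in \Omega$ and $k_\lambda, k_\mu$ be two normalized reproducing kernels of the space $\mathcal{H}_\Omega$. Then, we obtain
\[ \begin{aligned} |\langle (T + S)k_\lambda, k_\mu\rangle_A|^2 &\le \big(|\langle Tk_\lambda, k_\mu\rangle_A| + |\langle Sk_\lambda, k_\mu\rangle_A|\big)^2 \\ &= |\langle Tk_\lambda, k_\mu\rangle_A|^2 + |\langle Sk_\lambda, k_\mu\rangle_A|^2 + 2\,|\langle Tk_\lambda, k_\mu\rangle_A\,\langle Sk_\lambda, k_\mu\rangle_A| \\ &= |\langle Tk_\lambda, k_\mu\rangle_A|^2 + |\langle Sk_\lambda, k_\mu\rangle_A|^2 + 2\,|\langle Tk_\lambda, k_\mu\rangle_A\,\langle k_\mu, Sk_\lambda\rangle_A| \\ &\le |\langle Tk_\lambda, k_\mu\rangle_A|^2 + |\langle Sk_\lambda, k_\mu\rangle_A|^2 + \|Tk_\lambda\|_A\,\|Sk_\lambda\|_A + |\langle Tk_\lambda, Sk_\lambda\rangle_A|, \end{aligned} \]
where the last inequality follows by applying Lemma 2.1. So, by applying the arithmetic-geometric mean inequality, we obtain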
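\[ \begin{aligned} |\langle (T + S)k_\lambda, k_\mu\rangle_A|^2 &\le |\langle Tk_\lambda, k_\mu\rangle_A|^2 + |\langle Sk_\lambda, k_\mu\rangle_A|^2 + \frac{1}{2}\big(\|Tk_\lambda\|^2_A + \|Sk_\lambda\|^2_A\big) + |\langle Tk_\lambda, Sk_\lambda\rangle_A| \\ &\le |\langle Tk_\lambda, k_\mu\rangle_A|^2 + |\langle Sk_\lambda, k_\mu\rangle_A|^2 + \frac{1}{2}\big\langle (T^{\sharp_A}T + S^{\sharp_A}S)k_\lambda, k_\lambda\big\rangle_A + |\langle T^{\sharp_A}Sk_\lambda, k_\lambda\rangle_A| \\ &\le \|T\|^2_{\mathrm{ber}_A} + \|S\|^2_{\mathrm{ber}_A} + \frac{1}{2}\,\mathrm{ber}_A(T^{\sharp_A}T + S^{\sharp_A}S) + \mathrm{ber}_A(T^{\sharp_A}S). \end{aligned} \]
By taking the supremum over all $\lambda, \mu \in \Omega$, we deduce that
\[ \|T + S\|^2_{\mathrm{ber}_A} \le \|T\|^2_{\mathrm{ber}_A} + \|S\|^2_{\mathrm{ber}_A} + \frac{1}{2}\,\mathrm{ber}_A(T^{\sharp_A}T + S^{\sharp_A}S) + \mathrm{ber}_A(T^{\sharp_A}S). \tag{4.9} \]
This proves the first inequality in Theorem 4.4. Finally, by replacing $T$ by $T^{\sharp_A}$ and $S$ by $S^{\sharp_A}$ in (4.9) and then using the fact that $\mathrm{ber}_A(X^{\sharp_A}) = \mathrm{ber}_A(X)$ and $\|X^{\sharp_A}\|_{\mathrm{ber}_A} = \|X\|_{\mathrm{ber}_A}$ for all $X \in \mathbb{B}_A(\mathcal{H}_\Omega)$, we obtain
\[ \|T + S\|^2_{\mathrm{ber}_A} \le \|T\|^2_{\mathrm{ber}_A} + \|S\|^2_{\mathrm{ber}_A} + \frac{1}{2}\,\mathrm{ber}_A(TT^{\sharp_A} + SS^{\sharp_A}) + \mathrm{ber}_A(ST^{\sharp_A}). \]
This immediately proves the second inequality in Theorem 4.4 by taking Lemma 4.2 into consideration. □

Our next result reads as follows.

Theorem 4.5 Let $T, S \in \mathbb{B}_A(\mathcal{H}_\Omega)$. Then
\[ \|T + S\|_{\mathrm{ber}_A} \le \sqrt{\|T\|^2_{\mathrm{ber}_A} + \|S\|^2_{\mathrm{ber}_A} + \sqrt{\mathrm{ber}_A(T^{\sharp_A}T)}\,\sqrt{\mathrm{ber}_A(S^{\sharp_A}S)} + \mathrm{ber}_A(T^{\sharp_A}S)}. \]

Proof Let $\lambda, \mu \in \Omega$ and $k_\lambda, k_\mu$ be two normalized reproducing kernels of the space $\mathcal{H}_\Omega$. By proceeding as in the proof of Theorem 4.4 and then using Lemma 2.1, we get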
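\[ \begin{aligned} |\langle (T + S)k_\lambda, k_\mu\rangle_A|^2 &\le |\langle Tk_\lambda, k_\mu\rangle_A|^2 + |\langle Sk_\lambda, k_\mu\rangle_A|^2 + \|Tk_\lambda\|_A\,\|Sk_\lambda\|_A + |\langle Tk_\lambda, Sk_\lambda\rangle_A| \\ &\le |\langle Tk_\lambda, k_\mu\rangle_A|^2 + |\langle Sk_\lambda, k_\mu\rangle_A|^2 + |\langle T^{\sharp_A}Tk_\lambda, k_\lambda\rangle_A|^{1/2}\,|\langle S^{\sharp_A}Sk_\lambda, k_\lambda\rangle_A|^{1/2} + |\langle T^{\sharp_A}Sk_\lambda, k_\lambda\rangle_A| \\ &\le \|T\|^2_{\mathrm{ber}_A} + \|S\|^2_{\mathrm{ber}_A} + \sqrt{\mathrm{ber}_A(T^{\sharp_A}T)}\,\sqrt{\mathrm{ber}_A(S^{\sharp_A}S)} + \mathrm{ber}_A(T^{\sharp_A}S). \end{aligned} \]
By taking the supremum over all $\lambda, \mu \in \Omega$, we deduce that
\[ \|T + S\|^2_{\mathrm{ber}_A} \le \|T\|^2_{\mathrm{ber}_A} + \|S\|^2_{\mathrm{ber}_A} + \sqrt{\mathrm{ber}_A(T^{\sharp_A}T)}\,\sqrt{\mathrm{ber}_A(S^{\sharp_A}S)} + \mathrm{ber}_A(T^{\sharp_A}S). \]
This proves the desired result. □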
Acknowledgements The authors would like to express their gratitude to the anonymous referees for their comments toward an improved final version of this chapter.
References

1. Altwaijry, N., Feki, K., & Minculete, N. (2022). Further inequalities for the weighted numerical radius of operators. Mathematics, 10, 3576. https://doi.org/10.3390/math10193576
2. Arias, M. L., Corach, G., & Gonzalez, M. C. (2008). Partial isometries in semi-Hilbertian spaces. Linear Algebra and Its Applications, 428(7), 1460–1475
3. Arias, M. L., Corach, G., & Gonzalez, M. C. (2008). Metric properties of projections in semi-Hilbertian spaces. Integral Equations and Operator Theory, 62, 11–28
4. Arias, M. L., Corach, G., & Gonzalez, M. C. (2009). Lifting properties in operator ranges. Acta Scientiarum Mathematicarum (Szeged), 75(3–4), 635–653
5. Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68, 337–404
6. Bakherad, M., & Karaev, M. T. (2019). Berezin number inequalities for operators. Concrete Operators, 6(1), 33–43
7. Baklouti, H., Feki, K., & Sid Ahmed, O. A. M. (2018). Joint numerical ranges of operators in semi-Hilbertian spaces. Linear Algebra and Its Applications, 555, 266–284
8. Baklouti, H., & Namouri, S. (2022). Spectral analysis of bounded operators on semi-Hilbertian spaces. Banach Journal of Mathematical Analysis, 16, 12. https://doi.org/10.1007/s43037-021-00167-1
9. Berezin, F. A. (1972). Covariant and contravariant symbols for operators. Mathematics of the USSR-Izvestiya, 6, 1117–1151
10. Berezin, F. A. (1974). Quantizations. Mathematics of the USSR-Izvestiya, 8, 1109–1163
11. Bani-Domi, W., & Kittaneh, F. (2021). Norm and numerical radius inequalities for Hilbert space operators. Linear and Multilinear Algebra, 69(5), 934–945
12. Bhunia, P., Feki, K., & Paul, K. (2022). Generalized A-numerical radius of operators and related inequalities. Bulletin of the Iranian Mathematical Society. https://doi.org/10.1007/s41980-022-00727-7
13. Bhunia, P., & Paul, K. (2021). New upper bounds for the numerical radius of Hilbert space operators. Bulletin des Sciences Mathematiques, 167, 102959. https://doi.org/10.1016/j.bulsci.2021.102959
14. Bhunia, P., Kittaneh, F., Paul, K., & Sen, A. (2023). Anderson's theorem and A-spectral radius bounds for semi-Hilbertian space operators. Linear Algebra and Its Applications, 657, 147–162
15. Bhunia, P., Paul, K., & Sen, A. (2022). Inequalities involving Berezin norm and Berezin number. https://arxiv.org/abs/2112.10186
16. Bhunia, P., Sen, A., & Paul, K. (2022). Development of the Berezin number inequalities. arXiv:2202.03790v1
17. Buzano, M. L. (1974). Generalizzazione della diseguaglianza di Cauchy-Schwarz (Italian). Rendiconti del Seminario Matematico Università e Politecnico di Torino, 31, 405–409
18. Chien, F., Bakherad, M., & Alomari, M. W. (2023). Refined Berezin number inequalities via superquadratic and convex functions. Filomat, 37(1), 265–277
19. Douglas, R. G. (1966). On majorization, factorization and range inclusion of operators in Hilbert space. Proceedings of the American Mathematical Society, 17, 413–416
20. Dragomir, S. S. (2006). Some inequalities for the Euclidean operator radius of two operators in Hilbert spaces. Linear Algebra and Its Applications, 419, 256–264
21. Dragomir, S. S. (2006). A potpourri of Schwarz related inequalities in inner product spaces (II). Journal of Inequalities in Pure and Applied Mathematics, 7(1), Art. 14
22. Feki, K. (2020). Spectral radius of semi-Hilbertian space operators and its applications. Annals of Functional Analysis, 11, 929–946
23. Feki, K. (2020). A note on the A-numerical radius of operators in semi-Hilbert spaces. Archiv der Mathematik (Basel), 115(5), 535–544
24. Feki, K. (2022). Some A-spectral radius inequalities for A-bounded Hilbert space operators. Banach Journal of Mathematical Analysis, 16, 31. https://doi.org/10.1007/s43037-022-00185-7
25. Garayev, M. T., & Alomari, M. W. (2021). Inequalities for the Berezin number of operators and related questions. Complex Analysis and Operator Theory, 15, 30
26. Gustafson, K. E., & Rao, D. K. M. (1997). Numerical range. New York: Springer
27. Garayev, M., Saltan, S., Bouzeffour, F., & Aktan, B. (2020). Some inequalities involving Berezin symbols of operator means and related questions. RACSAM, 114, 85
28. Halmos, P. R. (1982). A Hilbert space problem book (2nd ed.). New York: Springer
29. Huban, M. B., Başaran, H., & Gürdal, M. (2021). New upper bounds related to the Berezin number inequalities. Journal of Inequalities and Special Functions, 12(3), 1–12
30. Karaev, M. T. (2013). Reproducing kernels and Berezin symbols techniques in various questions of operator theory. Complex Analysis and Operator Theory, 7, 983–1018
31. Karaev, M. T. (2006). Berezin symbol and invertibility of operators on the functional Hilbert spaces. Journal of Functional Analysis, 238, 181–192
32. Karaev, M. T., & Saltan, S. (2005). Some results on Berezin symbols. Complex Variables, Theory and Application, 50(3), 185–193
33. Khosravi, M., Drnovšek, R., & Moslehian, M. S. (2012). A commutator approach to Buzano's inequality. Filomat, 26(4), 827–832
34. Majee, S., Maji, A., & Manna, A. (2023). Numerical radius and Berezin number inequality. Journal of Mathematical Analysis and Applications, 517, 126566
35. Nordgren, E., & Rosenthal, P. (1994). Boundary values of Berezin symbols. Operator Theory: Advances and Applications, 73, 362–368
36. Paulsen, V. I., & Raghupathi, M. (2016). An introduction to the theory of reproducing kernel Hilbert spaces. Cambridge Studies in Advanced Mathematics (vol. 152). Cambridge: Cambridge University Press
37. Saddi, A. (2012). A-normal operators in semi-Hilbertian spaces. Australian Journal of Mathematical Analysis and Applications, 9(1), Art. 5, 12 pp.
38. Nagy, B. Sz., & Foias, C. (1970). Harmonic analysis of operators on Hilbert space. Amsterdam–London: North-Holland
39. Zamani, A. (2019). A-numerical radius inequalities for semi-Hilbertian space operators. Linear Algebra and Its Applications, 578, 159–183
40. Zhu, K. (2007). Operator theory in function spaces (2nd ed.). New York: Springer
Norm Equalities for Derivations Mohamed Boumazgour and Abdelghani Sougrati
Abstract We give an expository survey of different results about the norm of a derivation on a Banach space, with particular emphasis on the special case of derivations having the same norm when they are restricted to any symmetric norm ideal.

Keywords Derivations • Norm • Numerical range

Mathematics Subject Classification (MSC2020) 47A12 • 47A30 • 47B47
M. Boumazgour (✉) Faculty of Economical Science, Ibn Zohr University, Agadir, Morocco e-mail: [email protected]
A. Sougrati Faculty of Science, Ibn Zohr University, Agadir, Morocco
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_56

1 Introduction
Let $\mathcal{B}(E)$ denote the algebra of all bounded linear operators acting on a Banach space $E$. For $A, B \in \mathcal{B}(E)$, $L_A$ denotes the left multiplication on $\mathcal{B}(E)$ defined by $L_A(X) = AX$ ($X \in \mathcal{B}(E)$); $R_B$ denotes the corresponding right multiplication, $R_B(X) = XB$. The inner derivation on $\mathcal{B}(E)$ induced by $A \in \mathcal{B}(E)$ is defined by $\delta_A(X) = AX - XA$; the generalized derivation corresponding to $A$ and $B$ is given by $\delta_{A,B}(X) = AX - XB$. Generalized derivations first appeared in a series of notes by Sylvester [23] in the 1880s, who proved that if $A$ and $B$ are $n \times n$ matrices ($n \geq 1$), then $\delta_{A,B}$ is invertible if and only if $A$ and $B$ have no common eigenvalue. According to M. Rosenblum [17], it was D.C. Kleinecke who, in 1954, started the study of generalized derivations on arbitrary Banach algebras. The first formula for the norm of a derivation on a Hilbert space was established by J. Stampfli [22] in 1970.
This expository survey is mainly dedicated to the norm of a derivation on a Banach space, with particular emphasis on the norm of a derivation when it is restricted to a norm ideal of a Hilbert space. Its organization is as follows. In Section 2, we describe the results of Johnson [12], Kyle [13], and Stampfli [22] about the norm of a derivation. We start by giving Stampfli's identity for the norm of a generalized derivation on a Hilbert space; next, we discuss different results obtained about the norm in the general setting of Banach spaces. In Section 3, we present the norm properties of a derivation on the Calkin algebra, first obtained by Fong [9] and subsequently by Saksman and Tylli [19]. Section 4 is concerned with the problem of computing the norm of a derivation when it is restricted to a given symmetric norm ideal. First, we present different estimates of the norm obtained in [3, 7, 8]. Second, we discuss the results of [2, 4, 7, 8] about S-universal derivations.
We finish this introduction by recalling some definitions and fixing notation. Let $H$ be a complex Hilbert space. For $A \in \mathcal{B}(H)$, let $\sigma(A)$ and $r(A)$ denote, respectively, the spectrum and spectral radius of $A$. The numerical range of $A$ is defined by $W(A) = \{\langle Ax, x\rangle : x \in H, \|x\| = 1\}$, and the numerical radius of $A$ is given by $w(A) = \sup\{|\lambda| : \lambda \in W(A)\}$. If $\Omega$ is a non-empty bounded subset of the plane, then the diameter of $\Omega$ is defined by $\operatorname{diam}(\Omega) = \sup\{|\alpha - \beta| : \alpha, \beta \in \Omega\}$. The closure of $\Omega$ will be denoted by $\overline{\Omega}$. The complex conjugate, real part, and imaginary part of a complex number $\lambda$ are denoted by $\overline{\lambda}$, $\Re(\lambda)$, and $\Im(\lambda)$, respectively. If $\Omega \subseteq \mathbb{C}$, we also denote $\Re\Omega = \{\Re(\lambda) : \lambda \in \Omega\}$. Throughout, $I$ stands for the identity operator. For $A \in \mathcal{B}(H)$, let $A^*$ and $A'$ denote the operator adjoint of $A$ and the opposed operator to $A$, respectively. For $x, y \in H$, let $x \otimes y$ be the rank-1 operator defined on $H$ by $(x \otimes y)(z) = \langle z, y\rangle x$ for all $z \in H$.
2 Norm of a Derivation on a Hilbert Space
Following [22], the maximal numerical range of $A \in \mathcal{B}(H)$, denoted by $W_0(A)$, is defined to be the set
\[
W_0(A) = \{\lambda \in \mathbb{C} : \text{there exists } \{x_n\} \subseteq H,\ \|x_n\| = 1, \text{ such that } \lim_{n\to\infty} \langle Ax_n, x_n\rangle = \lambda \text{ and } \lim_{n\to\infty} \|Ax_n\| = \|A\|\}.
\]
The normalized maximal numerical range of $A \in \mathcal{B}(H)$, denoted by $W_N(A)$, is given by
\[
W_N(A) = \begin{cases} W_0\!\left(\dfrac{A}{\|A\|}\right) & \text{if } A \neq 0, \\[1ex] 0 & \text{if } A = 0. \end{cases}
\]
The set $W_0(A)$ is non-empty, closed, convex, and contained in the closure of the numerical range. Moreover, if $A + \lambda \neq 0$ for any $\lambda$, then the map $\lambda \mapsto W_N(A + \lambda)$ is upper semicontinuous; see [22].
Proposition 2.1 [22] For $A \in \mathcal{B}(H)$, the following conditions are equivalent:
1. $0 \in W_0(A)$.
2. $\|\delta_A\| = 2\|A\|$.
3. $\|A\| \leq \|A + \lambda\|$ for all $\lambda \in \mathbb{C}$.
4. $\|A\|^2 + |\lambda|^2 \leq \|A + \lambda\|^2$ for all $\lambda \in \mathbb{C}$.
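Proof (1) $\Rightarrow$ (2): Let $0 \in W_0(A)$. Then there exists a sequence $\{x_n\} \subseteq H$ such that $\|x_n\| = 1$, $\lim_{n\to\infty} \langle Ax_n, x_n\rangle = 0$ and $\lim_{n\to\infty} \|Ax_n\| = \|A\|$. Set $Ax_n = \alpha_n x_n + \beta_n y_n$, where $\alpha_n, \beta_n \in \mathbb{C}$, $\langle x_n, y_n\rangle = 0$ and $\|y_n\| = 1$; thus $\alpha_n = \langle Ax_n, x_n\rangle \to 0$ and $|\alpha_n|^2 + |\beta_n|^2 = \|Ax_n\|^2 \to \|A\|^2$. Let $V_n = x_n \otimes x_n - y_n \otimes y_n$. Then $\|V_n\| = 1$ and $\delta_A(V_n)x_n = 2\beta_n y_n$. Thus
\[
\|\delta_A\| \geq \|\delta_A(V_n)\| \geq \|\delta_A(V_n)x_n\| = 2|\beta_n| = 2\sqrt{\|Ax_n\|^2 - |\alpha_n|^2} = 2\sqrt{\|Ax_n\|^2 - |\langle Ax_n, x_n\rangle|^2} \to 2\|A\|.
\]
Since $\|\delta_A\| \leq 2\|A\|$ for any $A$, we have $\|\delta_A\| = 2\|A\|$.
(2) $\Rightarrow$ (1): We know that $W_0(A)$ is convex, so to show that $0 \in W_0(A)$, it suffices to prove that $W_0(A)$ contains two opposite points. Thus, assume that $\|\delta_A\| = 2\|A\|$. Then there exist two unit sequences $\{x_n\} \subseteq H$ and $\{X_n\} \subseteq \mathcal{B}(H)$ such that $\lim_{n\to\infty} \|AX_n x_n - X_n A x_n\| = 2\|A\|$. Moreover, we may choose $x_n$ and $X_n$ such that $\|AX_n - X_n A\| - \|AX_n x_n - X_n A x_n\| \leq \frac{1}{n}$. Clearly, $\lim_{n\to\infty} \|AX_n x_n\| = \|A\|$, $\lim_{n\to\infty} \|X_n x_n\| = 1$, $\lim_{n\to\infty} \|Ax_n\| = \|A\|$ and $\lim_{n\to\infty} (AX_n x_n + X_n A x_n) = 0$. Let $\mu = \lim_{n\to\infty} \langle Ax_n, x_n\rangle$, choosing a subsequence if necessary. Clearly $\mu \in W_0(A)$. On the other hand, since $\lim_{n\to\infty} \left\|A \frac{X_n x_n}{\|X_n x_n\|}\right\| = \|A\|$, we obtain that
\[
\lambda = \lim_{n\to\infty} \left\langle A \frac{X_n x_n}{\|X_n x_n\|}, \frac{X_n x_n}{\|X_n x_n\|} \right\rangle \in W_0(A).
\]
Since $\lim_{n\to\infty} \|X_n x_n\| = 1$, it follows that $\lim_{n\to\infty} \langle (I - X_n^* X_n)x_n, x_n\rangle = 0$, so $\lim_{n\to\infty} (I - X_n^* X_n)x_n = 0$ because $I - X_n^* X_n$ is a positive operator. Therefore
\[
\begin{aligned}
\lambda &= \lim_{n\to\infty} \left\langle A \frac{X_n x_n}{\|X_n x_n\|}, \frac{X_n x_n}{\|X_n x_n\|} \right\rangle
= -\lim_{n\to\infty} \left\langle \frac{X_n A x_n}{\|X_n x_n\|}, \frac{X_n x_n}{\|X_n x_n\|} \right\rangle \\
&= -\lim_{n\to\infty} \left\langle \frac{A x_n}{\|X_n x_n\|}, \frac{X_n^* X_n x_n}{\|X_n x_n\|} \right\rangle
= -\lim_{n\to\infty} \langle Ax_n, x_n\rangle
= -\mu.
\end{aligned}
\]
This implies that $0 \in W_0(A)$.
(1) $\Rightarrow$ (4): If $0 \in W_0(A)$, then there exists a sequence $\{x_n\} \subseteq H$ such that $\|x_n\| = 1$, $\lim_{n\to\infty} \langle Ax_n, x_n\rangle = 0$ and $\lim_{n\to\infty} \|Ax_n\| = \|A\|$. Thus for $\lambda \in \mathbb{C}$, we have
\[
\|A + \lambda\|^2 \geq \lim_{n\to\infty} \|(A + \lambda)x_n\|^2 = \lim_{n\to\infty} \left( \|Ax_n\|^2 + |\lambda|^2 + 2\Re(\overline{\lambda}\langle Ax_n, x_n\rangle) \right) = \|A\|^2 + |\lambda|^2.
\]
(4) $\Rightarrow$ (3): Obvious.
(3) $\Rightarrow$ (1): Assume to the contrary that $0 \notin W_0(A)$. We shall prove the existence of a $\lambda \in \mathbb{C}$ such that $\|A\| > \|A + \lambda\|$. By rotating $A$, we may assume that $\Re W_0(A) \geq \tau > 0$. Let $F = \{x \in H : \|x\| = 1 \text{ and } \Re(\langle Ax, x\rangle) \leq \tau/2\}$, and let $\eta = \sup\{\|Ax\| : x \in F\}$. Then $\eta < \|A\|$. Let $\mu = \min\{\tau/2, (\|A\| - \eta)/2\}$. If $x \in F$, then $\|(A - \mu)x\| \leq \|Ax\| + \mu \leq (\|A\| + \eta)/2 < \|A\|$. If $x \notin F$ and $\|x\| = 1$, write $Ax = (\alpha + i\beta)x + y$, where $\alpha, \beta \in \mathbb{R}$ and $y \perp x$; since $\alpha > \tau/2 \geq \mu > 0$, then
\[
\|(A - \mu)x\|^2 = (\alpha - \mu)^2 + \beta^2 + \|y\|^2 = \|Ax\|^2 + (\mu^2 - 2\alpha\mu) \leq \|A\|^2 - \mu^2 < \|A\|^2.
\]
Thus, $\|A - \mu\| < \|A\|$, which leads to a contradiction. □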
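As a quick sanity check of Proposition 2.1, the following worked example may help; it is an illustration added here and is not taken from [22]. The matrix $N$, the basis vectors $e_1, e_2$ of $\mathbb{C}^2$, and the operator $V$ are assumptions of the example.

```latex
% Illustration on H = C^2 (an assumed toy example, not from the source).
\[
N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
N e_2 = e_1, \qquad \|N\| = 1 .
\]
% The unit vector e_2 attains the norm and <N e_2, e_2> = <e_1, e_2> = 0,
% so the constant sequence x_n = e_2 shows that 0 lies in W_0(N).
\[
\|N e_2\| = 1 = \|N\|, \qquad \langle N e_2, e_2\rangle = 0
\quad\Longrightarrow\quad 0 \in W_0(N).
\]
% With x = e_2 and y = e_1, the operator V = x (x) x - y (x) y used in the
% proof of (1) => (2) is diag(-1, 1).
\[
V = e_2 \otimes e_2 - e_1 \otimes e_1
  = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix},
\qquad
\delta_N(V) = NV - VN = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix} = 2N .
\]
% Hence ||delta_N|| >= ||delta_N(V)|| = 2 = 2||N||; the reverse bound always holds.
\[
\|\delta_N\| = 2\|N\| = 2 .
\]
```

In this example conditions (1) and (2) of Proposition 2.1 can both be read off directly, which is consistent with their equivalence.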
Theorem 2.1 [22] Let $A \in \mathcal{B}(H)$. Then
\[
\|\delta_A\| = 2 \inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\}. \tag{2.1}
\]
Proof Since $\|AX - XA\| = \|(A - \lambda)X - X(A - \lambda)\| \leq 2\|A - \lambda\|\|X\|$ for any $X \in \mathcal{B}(H)$, it follows that $\|\delta_A\| \leq 2 \inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\}$. On the other hand, it follows by a compactness argument that there exists $\mu \in \mathbb{C}$ such that $\inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\} = \|A - \mu\|$, whence $\|A - \mu\| \leq \|(A - \mu) + \lambda\|$ for all $\lambda \in \mathbb{C}$. By Proposition 2.1, this implies that $0 \in W_0(A - \mu)$, and hence $\|\delta_A\| = \|\delta_{A-\mu}\| = 2\|A - \mu\|$, which completes the proof. □
For $A \in \mathcal{B}(H)$, let $r_A$ denote the radius of the smallest disk containing $\sigma(A)$. Recall that an operator $A \in \mathcal{B}(H)$ is said to be hyponormal if $A^*A - AA^*$ is positive. (For good accounts of the theory of hyponormal operators, we refer to [15].)
Corollary 2.1 Let $A \in \mathcal{B}(H)$ be hyponormal. Then $\|\delta_A\| = 2r_A$.
Proof It is well-known that if $A$ is hyponormal, then $r(A - \lambda) = \|A - \lambda\|$ for any $\lambda \in \mathbb{C}$; thus $\inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\} = r_A$ by [22, Corollary 1], and the proof is complete. □
Proposition 2.2 [22] If $A, B \in \mathcal{B}(H)$ are nonzero operators, then the following assertions are equivalent:
1. $W_N(A) \cap W_N(-B) \neq \emptyset$.
2. $\|\delta_{A,B}\| = \|A\| + \|B\|$.
3. $\|A\| + \|B\| \leq \|A + \lambda\| + \|B + \lambda\|$ for all $\lambda \in \mathbb{C}$.
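Proof (1) $\Rightarrow$ (2): Let $\mu \in W_N(A) \cap W_N(-B)$. Then there exist unit sequences $\{x_n\}, \{y_n\} \subseteq H$ with the properties that $\lim_{n\to\infty} \|Ax_n\| = \|A\|$, $\lim_{n\to\infty} \|By_n\| = \|B\|$, and $\lim_{n\to\infty} \langle Ax_n, x_n\rangle = \mu\|A\|$, $\lim_{n\to\infty} \langle By_n, y_n\rangle = -\mu\|B\|$. Let $Ax_n = \alpha_n x_n + \beta_n u_n$ and $-By_n = \lambda_n y_n + \gamma_n v_n$, where $u_n$ and $v_n$ are unit vectors orthogonal to $x_n$ and $y_n$, respectively, and $\beta_n, \gamma_n \geq 0$. Set $X_n = x_n \otimes y_n + u_n \otimes v_n$. Then
\[
\begin{aligned}
(AX_n - X_n B)y_n &= Ax_n - \langle By_n, y_n\rangle x_n - \langle By_n, v_n\rangle u_n \\
&= (\langle Ax_n, x_n\rangle - \langle By_n, y_n\rangle)x_n + (\langle Ax_n, u_n\rangle - \langle By_n, v_n\rangle)u_n.
\end{aligned}
\]
Since $\lim_{n\to\infty} \langle Ax_n, x_n\rangle = \mu\|A\|$ and $\lim_{n\to\infty} \|Ax_n\| = \|A\|$, it follows that $\lim_{n\to\infty} \langle Ax_n, u_n\rangle = \sqrt{1 - |\mu|^2}\,\|A\|$. Analogously, we have $\lim_{n\to\infty} -\langle By_n, y_n\rangle = \mu\|B\|$ and $\lim_{n\to\infty} -\langle By_n, v_n\rangle = \sqrt{1 - |\mu|^2}\,\|B\|$. Thus
\[
\lim_{n\to\infty} \|(AX_n - X_n B)y_n\|^2 = |\mu|^2(\|A\| + \|B\|)^2 + (1 - |\mu|^2)(\|A\| + \|B\|)^2 = (\|A\| + \|B\|)^2,
\]
which implies that $\|A\| + \|B\| \leq \|\delta_{A,B}\| \leq \|A\| + \|B\|$.
(2) $\Rightarrow$ (3): Let $\{X_n\} \subseteq \mathcal{B}(H)$ be such that $\|X_n\| = 1$ and $\lim_{n\to\infty} \|AX_n - X_n B\| = \|A\| + \|B\|$. For any $\lambda \in \mathbb{C}$, we have
\[
\|AX_n - X_n B\| = \|(A + \lambda)X_n - X_n(B + \lambda)\| \leq \|A + \lambda\| + \|B + \lambda\|.
\]
It follows that $\|A\| + \|B\| \leq \|A + \lambda\| + \|B + \lambda\|$.
(3) $\Rightarrow$ (1): Let $\|A\| + \|B\| \leq \|A + \lambda\| + \|B + \lambda\|$ for every $\lambda \in \mathbb{C}$. Assume that $W_N(A) \cap W_N(-B) = \emptyset$. By rotating $A$ and $B$, we may assume that $\Re(\beta - \alpha) \geq \tau > 0$ for $\alpha \in W_N(A)$ and $\beta \in W_N(-B)$. Let $\{x_n\}, \{y_n\} \subseteq H$ be two unit sequences such that $\|(A + \frac{1}{n})x_n\| \geq \|A + \frac{1}{n}\| - \frac{1}{n^2}$ and $\|(B + \frac{1}{n})y_n\| \geq \|B + \frac{1}{n}\| - \frac{1}{n^2}$. Then $\|Ax_n\| + \frac{1}{n} - \|A\| \geq \|A + \frac{1}{n}\| - \frac{1}{n^2} - \|A\|$, and similarly $\|By_n\| + \frac{1}{n} - \|B\| \geq \|B + \frac{1}{n}\| - \frac{1}{n^2} - \|B\|$. Since $\|A + \frac{1}{n}\| + \|B + \frac{1}{n}\| \geq \|A\| + \|B\|$, it follows that
\[
(\|A\| - \|Ax_n\|) + (\|B\| - \|By_n\|) \leq \frac{2}{n} + \frac{2}{n^2}.
\]
Let $\alpha = \lim_{n\to\infty} \left\langle \frac{A}{\|A\|} x_n, x_n \right\rangle$ and $\beta = \lim_{n\to\infty} \left\langle \frac{-B}{\|B\|} y_n, y_n \right\rangle$, choosing subsequences if necessary. It is easily seen that $\alpha \in W_N(A)$ and $\beta \in W_N(-B)$. Consider $\Re(\beta - \alpha)$. Since
\[
\left\|\left(A + \tfrac{1}{n}\right)x_n\right\|^2 \leq \|A\|^2 \left( 1 + 2\|A\|^{-2}\left( \frac{\Re\langle Ax_n, x_n\rangle}{n} + \frac{1}{2n^2} \right) \right) \leq \|A\|^2 \left( 1 + \|A\|^{-2}\left( \frac{\Re\langle Ax_n, x_n\rangle}{n} + \frac{1}{2n^2} \right) \right)^2,
\]
it follows that
\[
\left\|\left(A + \tfrac{1}{n}\right)x_n\right\| \leq \|A\| + \frac{\Re\langle Ax_n, x_n\rangle}{n\|A\|} + \frac{1}{2n^2\|A\|}.
\]
Therefore,
\[
-\frac{\Re\langle Ax_n, x_n\rangle}{\|A\|} \leq n\left( \|A\| - \left\|\left(A + \tfrac{1}{n}\right)x_n\right\| \right) + \frac{1}{2n}\|A\|^{-1}.
\]
By a similar reasoning, we get
\[
-\frac{\Re\langle By_n, y_n\rangle}{\|B\|} \leq n\left( \|B\| - \left\|\left(B + \tfrac{1}{n}\right)y_n\right\| \right) + \frac{1}{2n}\|B\|^{-1}.
\]
Thus
\[
-\frac{\Re\langle Ax_n, x_n\rangle}{\|A\|} - \frac{\Re\langle By_n, y_n\rangle}{\|B\|} \leq n\left( \|A\| + \|B\| - \left\|\left(A + \tfrac{1}{n}\right)x_n\right\| - \left\|\left(B + \tfrac{1}{n}\right)y_n\right\| \right) + \frac{1}{2n}\|A\|^{-1} + \frac{1}{2n}\|B\|^{-1}.
\]
Letting $n \to \infty$, we deduce that $\lim_{n\to\infty} \left( -\frac{\Re\langle Ax_n, x_n\rangle}{\|A\|} - \frac{\Re\langle By_n, y_n\rangle}{\|B\|} \right) \leq 0$, that is, $\Re(\beta - \alpha) \leq 0$, which leads to a contradiction. □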
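Theorem 2.2 [22] Let $A, B \in \mathcal{B}(H)$. Then
\[
\|\delta_{A,B}\| = \inf\{\|A - \lambda\| + \|B - \lambda\| : \lambda \in \mathbb{C}\}. \tag{2.2}
\]
Proof If $\lambda \in \mathbb{C}$, then $\delta_{A,B}(X) = AX - XB = (A - \lambda)X - X(B - \lambda)$ for every $X \in \mathcal{B}(H)$. It follows that $\|\delta_{A,B}(X)\| \leq (\|A - \lambda\| + \|B - \lambda\|)\|X\|$. So
\[
\|\delta_{A,B}\| \leq \inf\{\|A - \lambda\| + \|B - \lambda\| : \lambda \in \mathbb{C}\}.
\]
On the other hand, $\|A - \lambda\| + \|B - \lambda\|$ is large for $\lambda$ large, so $\inf\{\|A - \lambda\| + \|B - \lambda\| : \lambda \in \mathbb{C}\}$ must be attained at some point, say $\mu$. By Proposition 2.2, the fact that $\|A - \mu\| + \|B - \mu\| \leq \|(A - \mu) + \lambda\| + \|(B - \mu) + \lambda\|$ for every $\lambda \in \mathbb{C}$ implies that
\[
W_N(A - \mu) \cap W_N(-(B - \mu)) \neq \emptyset.
\]
Hence
\[
\|\delta_{A,B}\| = \|\delta_{A-\mu,B-\mu}\| = \|A - \mu\| + \|B - \mu\|,
\]
which completes the proof. □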
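A minimal check of formula (2.2), added here as an illustration and not taken from [22]: for scalar operators both sides can be computed by hand. The scalars $a$ and $b$ are assumptions of the example.

```latex
% Assumed toy case: A = aI and B = bI with complex scalars a and b.
\[
\delta_{A,B}(X) = aX - Xb = (a - b)X
\quad\Longrightarrow\quad
\|\delta_{A,B}\| = |a - b| .
\]
% Right-hand side of (2.2): the triangle inequality gives the lower bound
% |a - b|, and every lambda on the segment joining a and b attains it.
\[
\inf_{\lambda \in \mathbb{C}} \bigl( \|A - \lambda\| + \|B - \lambda\| \bigr)
= \inf_{\lambda \in \mathbb{C}} \bigl( |a - \lambda| + |b - \lambda| \bigr)
= |a - b| .
\]
```

The example also shows that the minimizing point $\mu$ in the proof need not be unique.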
Stampfli [22] asked whether Theorem 2.2 is still valid for derivations on $\mathcal{B}(E)$, where $E$ is a Banach space. This question was answered in the negative by the following example; see [12].
Example 2.1 Let $1 < p < \infty$ and $p \neq 2$. Then there exists a rank-1 operator $A \in \mathcal{B}(\ell^p)$ such that
\[
\|\delta_A\| < 2 \inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\}.
\]
Subsequently, Kyle [13] found a characterization of uniformly convex Banach spaces on which Stampfli's identity (2.1) holds.
Theorem 2.3 [13] Let $E$ be a uniformly convex Banach space. Then the following conditions are equivalent:
1. $E$ is a Hilbert space.
2. $\|\delta_A\| = 2 \inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\}$ for all $A$ in $\mathcal{B}(E)$.
3. $\|\delta_A\| = 2 \inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\}$ for all rank-1 operators $A$ in $\mathcal{B}(E)$.
Note that there are nonuniformly convex Banach spaces on which Stampfli’s identity (2.1) holds; see [12]. Example 2.2 Let ℓ 1n ðÞ = ðn , k k 1 Þ. Then kδA k = 2 inf fkA - λk, λ 2 g for all A in Bðℓ 1n ðÞÞ.
3 The Norm of a Derivation on the Calkin Algebra
Let $\mathcal{K}(E)$ denote the set of all compact operators on a Banach space $E$. The Calkin algebra $\mathcal{C}(E) = \mathcal{B}(E)/\mathcal{K}(E)$ is equipped with the essential norm $\|A\|_e = \inf\{\|A + K\| : K \text{ is compact}\}$. The left and right multiplications $L_A$ and $R_B$ with arbitrary $A, B \in \mathcal{B}(E)$ induce left and right multiplications $L_a$ and $R_b$ on $\mathcal{C}(E)$ defined through $L_a(t) = at$ and $R_b(t) = tb$ for $t \in \mathcal{C}(E)$. The generalized derivation $\delta_{a,b}$ is then defined by $\delta_{a,b} = L_a - R_b$. The inner derivation determined by $a$ is $\delta_a = \delta_{a,a}$. Let $A \in \mathcal{B}(E)$, and let $a$ be its image in the algebra $\mathcal{C}(E)$, that is, $a = A + \mathcal{K}(E)$; then we have $\|A\|_e = \|a\|$. Recall that an operator $A \in \mathcal{B}(E)$ is weakly compact, denoted $A \in \mathcal{W}(E)$, if the image of the unit ball of $E$ under $A$ is relatively compact in the weak topology of $E$. The corresponding weak Calkin algebra is $\mathcal{B}(E)/\mathcal{W}(E)$ equipped with the weak essential norm $\|A\|_w = \operatorname{dist}(A, \mathcal{W}(E))$.
Let $A$ be a bounded linear operator on a complex Hilbert space $H$. In order to find a formula for $\|\delta_a\|$, Fong [9] introduced the concept of the essential maximal numerical range of $A$, denoted by $\operatorname{ess}W_0(A)$, and defined by
\[
\operatorname{ess}W_0(A) = \{\lambda \in \mathbb{C} : \text{there exists an orthonormal sequence } \{x_n\} \subseteq H \text{ such that } \lim_{n\to\infty} \langle Ax_n, x_n\rangle = \lambda \text{ and } \lim_{n\to\infty} \|Ax_n\| = \|A\|_e\}.
\]
The set $\operatorname{ess}W_0(A)$ is non-empty, compact, convex, and contained in the essential numerical range of $A$; see [9].
Proposition 3.1 [9] Suppose that $A \in \mathcal{B}(H)$ and $U$ is a neighborhood of $\operatorname{ess}W_0(A)$. Then there exist $\delta > 0$ and a subspace $M$ of $H$ of finite codimension such that $x \in M$, $\|x\| = 1$ and $\|Ax\| \geq \|A\|_e - \delta$ imply $\langle Ax, x\rangle \in U$.
Proposition 3.2 [9] Let $A \in \mathcal{B}(H)$ and $\lambda \in \mathbb{C}$. Then the following conditions are equivalent:
1. There exists an orthonormal sequence $\{x_n\} \subseteq H$ such that $\|Ax_n\| \to \|A\|_e$ and $\langle Ax_n, x_n\rangle \to \lambda$.
2. There exists a sequence $\{x_n\}$ of unit vectors such that $x_n \to 0$ weakly, $\|Ax_n\| \to \|A\|_e$ and $\langle Ax_n, x_n\rangle \to \lambda$.
3. There exists a projection $P$ of infinite rank such that $PAP - \lambda P$ is compact and $\|AP\|_e = \|A\|_e$.
Proposition 3.3 [9] Let $A \in \mathcal{B}(H)$. Then $0 \in \operatorname{ess}W_0(A)$ if and only if $\|A\|_e \leq \|A - \lambda\|_e$ for all $\lambda \in \mathbb{C}$.
Proof Suppose $0 \in \operatorname{ess}W_0(A)$. By Proposition 3.2, there exists a projection $P$ of infinite rank, with image $p$ in the Calkin algebra, such that $pap = 0$ and $\|ap\| = \|a\|$. Hence
\[
\|\delta_a\| \geq \|\delta_a(2p - 1)\| = \|a(2p - 1) - (2p - 1)a\| = 2\|ap - pa\| \geq 2\|ap - pap\| = 2\|ap\| = 2\|a\|.
\]
Therefore $2\|A\|_e \leq \|\delta_a\| \leq 2\|A - \lambda\|_e$ for all $\lambda \in \mathbb{C}$.
Conversely, suppose that $0 \notin \operatorname{ess}W_0(A)$. Then, by rotating $A$, we may assume that $\Re(\lambda) \geq 2\varepsilon$ ($\lambda \in \operatorname{ess}W_0(A)$) for some $\varepsilon > 0$. By Proposition 3.1, there exist $\delta > 0$ and a subspace $M$ of $H$ of finite codimension such that $x \in M$, $\|x\| = 1$, and $\|Ax\| \geq \|A\|_e - 3\delta$ imply $\Re(\langle Ax, x\rangle) \geq \varepsilon$. We may assume that $\delta \leq \varepsilon$. Let $\{x_n\}$ be an orthonormal sequence in $M$ such that $\|(A - \delta)x_n\| \to \|A - \delta\|_e$ as $n \to \infty$. For sufficiently large $n$, we have $\|(A - \delta)x_n\| \geq \|A - \delta\|_e - \delta$, and hence $\|Ax_n\| \geq \|A\|_e - 3\delta$. Therefore, when $n$ is large enough, we have $\Re(\langle Ax_n, x_n\rangle) \geq \delta$, and hence
\[
\|(A - \delta)x_n\|^2 = \|Ax_n\|^2 - 2\delta\Re(\langle Ax_n, x_n\rangle) + \delta^2 \leq \|Ax_n\|^2 - 2\delta^2 + \delta^2 = \|Ax_n\|^2 - \delta^2.
\]
Letting $n \to \infty$ we get $\|A - \delta\|_e^2 \leq \|A\|_e^2 - \delta^2$. Thus $\|A - \delta\|_e < \|A\|_e$, which leads to a contradiction. Therefore $0 \in \operatorname{ess}W_0(A)$. □
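To make Proposition 3.3 concrete, here is a short worked example; it is an illustration added here and does not come from [9]. The unilateral shift $S$ on $\ell^2$ and the orthonormal basis $\{e_n\}$ are assumptions of the example.

```latex
% Assumed example: S e_n = e_{n+1} on l^2 with orthonormal basis {e_n}.
% For every compact K we have ||K e_n|| -> 0, because e_n -> 0 weakly.
% Hence ||S + K|| >= lim ||(S + K) e_n|| = 1, and also ||S||_e <= ||S|| = 1.
\[
\|S\|_e = 1, \qquad \langle S e_n, e_n\rangle = 0, \qquad \|S e_n\| = 1
\quad\Longrightarrow\quad 0 \in \operatorname{ess}W_0(S).
\]
% The same weak-null argument applied to S - lambda gives a lower bound
% on the essential norm, which confirms the inequality of Proposition 3.3.
\[
\|S - \lambda\|_e \;\geq\; \lim_{n\to\infty} \|(S - \lambda)e_n\|
= \|e_{n+1} - \lambda e_n\| = \sqrt{1 + |\lambda|^2} \;\geq\; 1 = \|S\|_e
\qquad (\lambda \in \mathbb{C}).
\]
% With s the image of S in the Calkin algebra, the estimate in the proof
% (taken at lambda = 0) then gives the norm of the induced derivation.
\[
2 = 2\|S\|_e \leq \|\delta_s\| \leq 2\|s\| = 2 .
\]
```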
Theorem 3.1 [9] Let $A \in \mathcal{B}(H)$. Then
\[
\|\delta_a\| = 2 \inf\{\|A - \lambda\|_e : \lambda \in \mathbb{C}\}.
\]
Proof Since $\|A - \lambda\|_e$ is large for $\lambda$ large, $\inf\{\|A - \lambda\|_e : \lambda \in \mathbb{C}\}$ must be attained at some point, say $\mu$. By Proposition 3.3, we have $0 \in \operatorname{ess}W_0(A - \mu)$. Hence, as in the proof of Proposition 3.3, $\|\delta_a\| = \|\delta_{a-\mu}\| \geq 2\|A - \mu\|_e$. Since $\|\delta_a\| \leq 2\|A - \lambda\|_e$ for all $\lambda \in \mathbb{C}$, the theorem is valid. □
Let $A \in \mathcal{B}(H)$ be non-compact. Following [19], the normalized essential maximal numerical range of $A$ is defined by
\[
W_{N,\mathrm{ess}}(A) = \|A\|_e^{-1} \operatorname{ess}W_0(A).
\]
Let $A \in \mathcal{B}(H)$ be such that $A + \lambda \notin \mathcal{K}(H)$ for any $\lambda \in \mathbb{C}$. Then $W_{N,\mathrm{ess}}(A + \lambda)$ is a non-empty, convex, and compact subset of $\mathbb{C}$ for all $\lambda$, and the set-valued map $\lambda \mapsto W_{N,\mathrm{ess}}(A + \lambda)$ is upper semicontinuous; see [19]. Saksman and Tylli [19] used the normalized essential maximal numerical range to find a formula for the essential norm of a generalized derivation on the Calkin algebra. Before giving this formula, we recall some concepts.
Let $E$ be a Banach space with a Schauder basis $\{e_k\}$. The natural basis projections $\{P_k\}$ and $\{Q_k\}$ on $E$ are defined by
\[
P_k\Bigl( \sum_{j=1}^{\infty} a_j e_j \Bigr) = \sum_{j=1}^{k} a_j e_j, \qquad Q_k = I - P_k
\]
for $\sum_{j=1}^{\infty} a_j e_j \in E$ and $k \in \mathbb{N}$. For integers $t, r \in \mathbb{N}$ with $t \leq r$, we denote $P_{]t,r]} = P_r - P_t$. For a bounded linear operator $T$ on $E$, we also recall the following facts (see [19]):
1. $P_m T \to T$, $TP_m \to T$, and $P_m T P_m \to T$ as $m \to \infty$ in the strong operator topology.
2. If $E$ has a basis such that $\|Q_k\| = 1$ for all $k$, then $\|T\|_e = \lim_{n\to\infty} \|Q_n T\|$.
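Theorem 3.2 [19] Let $A, B \in \mathcal{B}(H)$. Then
\[
\|\delta_{A,B}\|_w = \|\delta_{a,b}\| = \inf\{\|A - \lambda\|_e + \|B - \lambda\|_e : \lambda \in \mathbb{C}\}.
\]
Proof It suffices to establish that $\|\delta_{A,B}\|_w = \inf\{\|A - \lambda\|_e + \|B - \lambda\|_e : \lambda \in \mathbb{C}\}$ in view of [19, Theorem 7]. Suppose first that $K_0, K_1 \in \mathcal{K}(H)$ and $\lambda \in \mathbb{C}$ are arbitrary. Since $L_{K_0} - R_{K_1} \in \mathcal{W}(\mathcal{B}(H))$ (see [18, 3.2]), it follows that
\[
\|\delta_{A,B}\|_w \leq \|\delta_{A,B} - L_{K_0} + R_{K_1}\| = \|\delta_{A-\lambda-K_0,\,B-\lambda-K_1}\| \leq \|A - \lambda - K_0\| + \|B - \lambda - K_1\|.
\]
Consequently,
\[
\|\delta_{A,B}\|_w \leq \inf\{\|A - \lambda\|_e + \|B - \lambda\|_e : \lambda \in \mathbb{C}\}. \tag{3.1}
\]
To prove the converse of (3.1), we first suppose that $A = \lambda + K$, where $\lambda \in \mathbb{C}$ and $K \in \mathcal{K}(H)$. Then $\delta_{A,B} = R_{\lambda-B} + L_K$ and $\|\delta_{A,B}\|_w = \|R_{\lambda-B}\|_w = \|\lambda - B\|_e$ in view of [18] and the fact that $L_K \in \mathcal{W}(\mathcal{B}(H))$. A similar conclusion holds if we let $B = \lambda + K$, where $K$ is any compact operator.
Next, assume that $A + \lambda \notin \mathcal{K}(H)$ and $B + \lambda \notin \mathcal{K}(H)$ for any $\lambda \in \mathbb{C}$. Let $V \in \mathcal{W}(\mathcal{B}(H))$ be arbitrary, and consider the operator $\delta_{A,B} - V$. We first claim that there exists $\lambda \in \mathbb{C}$ such that
\[
W_{N,\mathrm{ess}}(A + \lambda) \cap W_{N,\mathrm{ess}}(-(B + \lambda)) \neq \emptyset. \tag{3.2}
\]
Let $D = \{\lambda \in \mathbb{C} : |\lambda| < 1\}$ and let $\varphi : D \to \mathbb{C}$ be the surjective map defined by $\varphi(re^{i\theta}) = r(1 - r)^{-1}e^{i\theta}$. Define $F(\lambda) = [W_{N,\mathrm{ess}}(A + \lambda) - W_{N,\mathrm{ess}}(-(B + \lambda))]/2$ for $\lambda \in \mathbb{C}$ and
\[
\psi(\lambda) = \begin{cases} F(\varphi(\lambda)) & \text{if } \lambda \in D, \\ \lambda & \text{if } |\lambda| = 1. \end{cases}
\]
The set-valued map $\psi : \overline{D} \to 2^{\overline{D}}$ is upper semicontinuous on $D$ and $\psi(\lambda)$ is non-empty, closed, and convex for all $\lambda$. Note that $\psi$ is also upper semicontinuous on the boundary of $D$, since $W_{N,\mathrm{ess}}(A + r(1 - r)^{-1}e^{i\theta}) \to e^{i\theta}$ uniformly in $\theta$ as $r \to 1^-$. By the Kakutani fixed point theorem for set-valued maps (see [22] and [24, p. 745]), we have $\lambda \in D$ so that $0 \in \psi(\lambda)$, which yields (3.2).
Since $\delta_{A+\lambda,B+\lambda} = \delta_{A,B}$, we may assume that $\lambda = 0$. Fix $\mu \in W_{N,\mathrm{ess}}(A) \cap W_{N,\mathrm{ess}}(-B)$. Then there are orthonormal sequences $\{x_n\}$ and $\{y_n\}$ such that
\[
\lim_{n\to\infty} \|Ax_n\| = \|A\|_e, \qquad \lim_{n\to\infty} \|By_n\| = \|B\|_e \tag{3.3}
\]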
and
\[
\mu = \lim_{n\to\infty} \langle \|A\|_e^{-1} Ax_n, x_n\rangle = -\lim_{n\to\infty} \langle \|B\|_e^{-1} By_n, y_n\rangle.
\]
Thus we may choose a norm-null sequence $\{w_n\} \subseteq H$ such that the operator $V_n$, defined on $H$ by $V_n y_n = x_n$, $V_n By_n = -\frac{\|B\|_e}{\|A\|_e} Ax_n + w_n$ and $V_n = 0$ on $(\operatorname{span}(y_n, By_n))^{\perp}$, is at most two-dimensional and $\|V_n\| = 1$.
To continue, we will choose by induction increasing sequences $\{m_k\}$ and $\{r_k\}$ of natural numbers so that, by denoting $X_{r_k} = P_{]m_k,m_{k+1}]} V_{r_k} P_{]m_k,m_{k+1}]}$, one has
\[
\|X_{r_k} y_{r_k} - V_{r_k} y_{r_k}\| \leq 1/k \quad \text{and} \quad \|X_{r_k} By_{r_k} - V_{r_k} By_{r_k}\| \leq 1/k \quad \text{for all } k. \tag{3.4}
\]
Here $\{P_k\}$ are the basis projections with respect to the natural coordinates of $H$. Suppose that we have found $1 = m_1 < \cdots < m_k < m_{k+1}$ and $r_1 < \cdots < r_k$ with the desired properties. We first show that it is possible to choose $r_{k+1} > r_k$ so that
\[
\|Q_{m_{k+1}} V_{r_{k+1}} Q_{m_{k+1}} z_{r_{k+1}} - V_{r_{k+1}} z_{r_{k+1}}\| < 1/(2k) \tag{3.5}
\]
holds when $z_{r_{k+1}}$ is either $y_{r_{k+1}}$ or $By_{r_{k+1}}$ for all $k \in \mathbb{N}$. In fact, consider the decomposition
\[
Q_{m_{k+1}} V_s Q_{m_{k+1}} y_s - V_s y_s = -P_{m_{k+1}} V_s y_s + [(P_{m_{k+1}} - 1)V_s P_{m_{k+1}} y_s],
\]
where $\{y_s\}$ and $\{V_s y_s\} = \{x_s\}$ are weak-null sequences. The compactness of $P_{m_{k+1}}$ ensures that the sequences $\{P_{m_{k+1}} y_s\}_s$ and $\{P_{m_{k+1}} V_s y_s\}_s$ are norm-null. This yields (3.5) as soon as $s = r_{k+1}$ is large enough in the case $\{z_s\} = \{y_s\}$. The argument is similar for $z_s = By_s$ as the sequences $\{By_s\}$ and $\{V_s By_s\} = \{-\frac{\|B\|_e}{\|A\|_e} Ax_s + w_s\}$ are weak-null. Hence we obtain (3.5) in both cases once $r_{k+1} > r_k$ is large enough.
Since $V_{r_{k+1}}$ is a fixed finite dimensional operator, it follows that the sequence $\{P_{]m_k,j]} V_{r_{k+1}} P_{]m_k,j]}\}$ tends to $Q_{m_k} V_{r_{k+1}} Q_{m_k}$ in the operator norm as $j \to \infty$. Hence by choosing the index $m_{k+2} > m_{k+1}$ sufficiently large, we get (3.4) and (3.5). This completes the induction step.
By combining (3.4), (3.3) and the fact that $\{w_s\}$ is a norm-null sequence, we obtain that
\[
\begin{aligned}
\liminf_{j\to\infty} \|\delta_{A,B} X_{r_j}\| &\geq \liminf_{j\to\infty} \|AV_{r_j} y_{r_j} - V_{r_j} By_{r_j}\| - \limsup_{j\to\infty} (1 + \|A\|)/j \\
&\geq \liminf_{j\to\infty} \Bigl\|Ax_{r_j} + \frac{\|B\|_e}{\|A\|_e} Ax_{r_j}\Bigr\| - \limsup_{j\to\infty} \|w_{r_j}\| \\
&\geq \|A\|_e + \|B\|_e.
\end{aligned} \tag{3.6}
\]
Let $L \in \mathcal{W}(\mathcal{B}(H))$ be arbitrary. Obviously $\|X_{r_j}\| \leq 1$ for all $j$; thus (3.6) implies the estimate
\[
\|\delta_{A,B} - L\| \geq \limsup_{j\to\infty} (1/\|X_{r_j}\|)\,\|\delta_{A,B}(X_{r_j}) - L(X_{r_j})\| \geq \limsup_{j\to\infty} \|\delta_{A,B}(X_{r_j})\| \geq \|A\|_e + \|B\|_e.
\]
Above we applied the Dunford-Pettis property of the subspace $M = \{\sum_{j=1}^{\infty} \lambda_j X_{r_j} : \{\lambda_j\} \in c_0\}$ to the weakly compact restriction $L|_M$ and the weak-null sequence $\{X_{r_j}\}$ to conclude that $\lim_{j\to\infty} \|L(X_{r_j})\| = 0$. This completes the proof. □
4 Estimation of the Norm of a Derivation on a Norm Ideal
Following [20], a (symmetric) norm ideal $(J, \|\cdot\|_J)$ of $\mathcal{B}(H)$ consists of a proper two-sided ideal $J$ together with a norm $\|\cdot\|_J$ satisfying the following conditions:
1. $(J, \|\cdot\|_J)$ is a Banach space.
2. $\|AXB\|_J \leq \|A\|\|B\|\|X\|_J$ for all $X \in J$ and all operators $A$ and $B$ in $\mathcal{B}(H)$.
3. $\|X\|_J = \|X\|$ for any rank-1 operator $X$.
For a compact operator $A \in \mathcal{B}(H)$, let $s_1(A) \geq s_2(A) \geq \cdots \geq 0$ denote the sequence of the singular values of $A$ taken in decreasing order and counting multiplicities. For $1 \leq p < \infty$, define the Schatten $p$-norm of $A$ by
\[
\|A\|_p = \Bigl( \sum_j s_j^p(A) \Bigr)^{1/p}.
\]
The norm ideals associated with these norms are the Schatten $p$-ideals defined by
\[
\mathcal{C}_p(H) = \{K \in \mathcal{B}(H) : K \text{ is compact with } \|K\|_p < \infty\}.
\]
Hence $\mathcal{C}_1(H)$ and $\mathcal{C}_2(H)$ are the trace class and the Hilbert-Schmidt class of operators, respectively. Note that the Hilbert-Schmidt class of operators on $H$ is a Hilbert space with respect to the inner product $\langle A, B\rangle = \operatorname{tr}(AB^*)$, where $\operatorname{tr}$ denotes the usual trace functional. For a complete account of the theory of norm ideals, we refer to [10, 20, 21].
Let $(J, \|\cdot\|_J)$ be a norm ideal of $\mathcal{B}(H)$. Obviously, $\delta_{A,B}(J) \subseteq J$. We denote by $\delta_{J,A,B}$ the restriction of $\delta_{A,B}$ to $J$. We set $\delta_{J,A} = \delta_{J,A,A}$, and in case $J = \mathcal{C}_p(H)$ ($1 \leq p < \infty$), we denote $\delta_{J,A}$ by $\delta_{p,A}$ and $\delta_{J,A,B}$ by $\delta_{p,A,B}$. The restrictions of $\delta_A$ and $\delta_{A,B}$ to the ideal $\mathcal{K}(H)$ of all compact operators are denoted by $\delta_{\infty,A}$ and $\delta_{\infty,A,B}$, respectively.
If $(J, \|\cdot\|_J)$ is a norm ideal of $\mathcal{B}(H)$ and $X \in J$, then
\[
\|\delta_{J,A,B}(X)\|_J = \|(A - \lambda)X - X(B - \lambda)\|_J \leq (\|A - \lambda\| + \|B - \lambda\|)\|X\|_J
\]
for all $\lambda \in \mathbb{C}$. In view of (2.2), it follows that
\[
\|\delta_{J,A,B}\| \leq \|\delta_{A,B}\| \quad \text{for all } J. \tag{4.1}
\]
Estimating the norm of $\delta_{J,A,B}$ in the opposite direction has been investigated by Fialkow [7, 8] and Ando [1]. We start this section by proving that there exists a positive number $\alpha_J$, depending only on $J$, with $1 \leq \alpha_J \leq 2$ such that $\|\delta_{A,B}\| \leq \alpha_J\|\delta_{J,A,B}\|$ for every pair $(A, B)$ of operators on $H$. Indeed, let $x, y, u, v \in H$ be unit vectors such that $\langle x, u\rangle = \langle y, v\rangle = 0$, and let
\[
X = x \otimes y + u \otimes v.
\]
Let $(J, \|\cdot\|_J)$ be a norm ideal of $\mathcal{B}(H)$; then $X \in J$, since $X$ has finite rank. We set
\[
\alpha_J := \|X\|_J.
\]
We claim that $\alpha_J$ depends only on the unitary equivalence class of $X$, that is, it does not depend on the choice of the vectors $x, y, u, v$. Indeed, let $e, f, g, h \in H$ be other unit vectors such that $\langle e, g\rangle = \langle f, h\rangle = 0$. Let $F_0 = \operatorname{span}(y, v)$, $F_1 = \operatorname{span}(f, h)$, and let $\{e_n\}$ and $\{g_n\}$ be orthonormal bases of $F_0^{\perp}$ (the orthogonal complement of $F_0$) and $F_1^{\perp}$, respectively. Define the operator $U : F_0 \oplus F_0^{\perp} \to F_1 \oplus F_1^{\perp}$ by
\[
U(\alpha y + \beta v + f_0) = \alpha f + \beta h + f_1,
\]
where $\alpha, \beta \in \mathbb{C}$, $f_0 = \sum_n \alpha_n e_n \in F_0^{\perp}$ with $\alpha_n \in \mathbb{C}$, and $f_1 = \sum_n \alpha_n g_n \in F_1^{\perp}$. Let $V : \operatorname{span}(e, g) \oplus (\operatorname{span}(e, g))^{\perp} \to \operatorname{span}(x, u) \oplus (\operatorname{span}(x, u))^{\perp}$ be the operator defined in the same manner as $U$. It is easy to see that $U$ and $V$ are unitary operators and that $X = VYU$, where $Y$ is the rank two operator defined by $Y = e \otimes f + g \otimes h$. Thus, by the unitary invariance of the norm $\|\cdot\|_J$, we obtain that
\[
\|X\|_J = \|Y\|_J.
\]
In the particular case where $J = \mathcal{C}_p(H)$ ($1 \leq p < \infty$), we have $\alpha_J = 2^{1/p}$. Since $1 = \|X\| \leq \|X\|_J \leq \|X\|_1 = 2$, we have $1 \leq \alpha_J \leq 2$. Now we are in a position to prove the following theorem.
Theorem 4.1 [3] Let $J$ be a norm ideal of $\mathcal{B}(H)$. Then there exists a constant $\alpha_J$ with $1 \leq \alpha_J \leq 2$ such that the inequality $\|\delta_{A,B}\| \leq \alpha_J\|\delta_{J,A,B}\|$ holds for all $A, B \in \mathcal{B}(H)$.
Proof By a compactness argument, there exists $\lambda_0 \in \mathbb{C}$ such that
\[
\|\delta_{A,B}\| = \|\delta_{A-\lambda_0,B-\lambda_0}\| = \|A - \lambda_0\| + \|B - \lambda_0\|.
\]
Without loss of generality, we may assume that $\lambda_0 = 0$, and hence $\|\delta_{A,B}\| = \|A\| + \|B\|$. By Proposition 2.2, it follows that $W_N(A) \cap W_N(-B) \neq \emptyset$. Let $\mu \in W_N(A) \cap W_N(-B)$. Then there exist two unit sequences $\{x_n\}$ and $\{y_n\}$ of elements of $H$ such that $\lim_{n\to\infty} \langle Ax_n, x_n\rangle = \mu\|A\|$, $\lim_{n\to\infty} \|Ax_n\| = \|A\|$, and $\lim_{n\to\infty} \langle By_n, y_n\rangle = -\mu\|B\|$, $\lim_{n\to\infty} \|By_n\| = \|B\|$. Set $Ax_n = \alpha_n x_n + \beta_n u_n$ and $By_n = \gamma_n y_n + \delta_n v_n$, where $u_n$ and $v_n$ are unit vectors of $H$ such that $\langle x_n, u_n\rangle = \langle y_n, v_n\rangle = 0$, and $\alpha_n, \beta_n, \gamma_n, \delta_n \in \mathbb{C}$. We may choose $\beta_n \geq 0$ and $\delta_n \leq 0$. For each $n \geq 1$, let
\[
X_n = x_n \otimes y_n + u_n \otimes v_n.
\]
Clearly $\|X_n\| = 1$, $\alpha_J = \|X_n\|_J$ and
\[
\langle AX_n y_n, X_n By_n\rangle = \alpha_n \overline{\gamma_n} + \beta_n \delta_n.
\]
Since $\lim_{n\to\infty} \|Ax_n\|^2 = \lim_{n\to\infty} (|\alpha_n|^2 + \beta_n^2) = \|A\|^2$ and $\lim_{n\to\infty} |\alpha_n| = |\mu|\|A\|$, it follows that
\[
\lim_{n\to\infty} \beta_n = \sqrt{1 - |\mu|^2}\,\|A\|.
\]
Similarly, we obtain
\[
\lim_{n\to\infty} \delta_n = -\sqrt{1 - |\mu|^2}\,\|B\|.
\]
Hence
\[
\lim_{n\to\infty} \langle AX_n y_n, X_n By_n\rangle = \lim_{n\to\infty} (\alpha_n \overline{\gamma_n} + \beta_n \delta_n) = -|\mu|^2\|A\|\|B\| - (1 - |\mu|^2)\|A\|\|B\| = -\|A\|\|B\|.
\]
Since $|\langle AX_n y_n, X_n By_n\rangle| \leq \|AX_n y_n\|\|X_n By_n\| \leq \|A\|\|B\|$ for all $n$, it follows that $\lim_{n\to\infty} \|AX_n y_n\| = \|A\|$ and $\lim_{n\to\infty} \|X_n By_n\| = \|B\|$. After a short computation we obtain that
\[
\lim_{n\to\infty} \|AX_n y_n - X_n By_n\| = \|A\| + \|B\|. \tag{4.2}
\]
Since for each $n \geq 1$,
\[
\|\delta_{J,A,B}\| \geq \frac{1}{\alpha_J}\|\delta_{A,B}(X_n)\|_J \geq \frac{1}{\alpha_J}\|\delta_{A,B}(X_n)\| \geq \frac{1}{\alpha_J}\|AX_n y_n - X_n By_n\|,
\]
then (4.2) implies that
\[
\|\delta_{J,A,B}\| \geq \frac{1}{\alpha_J}\|\delta_{A,B}\|,
\]
which completes the proof. □
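Two steps used above are left to the reader; the following sketch spells them out. It is an added illustration that uses only the notation of this section, and it is not claimed to be the computation carried out in [3].

```latex
% (a) The "short computation" behind (4.2): expand the squared norm and use
%     the three limits established in the proof of Theorem 4.1.
\[
\|AX_n y_n - X_n By_n\|^2
= \|AX_n y_n\|^2 + \|X_n By_n\|^2 - 2\Re\langle AX_n y_n, X_n By_n\rangle
\;\longrightarrow\; \|A\|^2 + \|B\|^2 + 2\|A\|\|B\|
= (\|A\| + \|B\|)^2 .
\]
% (b) Why alpha_J = 2^{1/p} for J = C_p(H): with X = x (x) y + u (x) v,
%     unit vectors, <x, u> = <y, v> = 0, and (x (x) y)^* = y (x) x,
%     the cross terms vanish and X^*X is the projection onto span(y, v).
\[
X^*X = (y \otimes x + v \otimes u)(x \otimes y + u \otimes v)
     = y \otimes y + v \otimes v ,
\]
% Hence the singular values of X are s_1(X) = s_2(X) = 1 and s_j(X) = 0 for j >= 3.
\[
\|X\|_p = \bigl(1^p + 1^p\bigr)^{1/p} = 2^{1/p}, \qquad \|X\| = s_1(X) = 1 .
\]
```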
An immediate consequence of Theorem 4.1 is the following equality proved in [7] by a different method.
Corollary 4.1 For $A, B \in \mathcal{B}(H)$, we have
\[
\|\delta_{1,A,B}\| = \|\delta_{\infty,A,B}\| = \inf\{\|A - \lambda\| + \|B - \lambda\| : \lambda \in \mathbb{C}\}.
\]
For $B \in \mathcal{B}(H)$, the set $\{U^*BU : U \text{ unitary}\}$ is the unitary similarity orbit through $B$. The anti-distance from $A$ to the orbit with respect to the norm $\|\cdot\|$ is $\sup\{\|A - U^*BU\| : U \text{ unitary}\}$. As an application of Theorem 4.1, we get the following bounds for the anti-distance between the operators $A$ and $B$. Note that the second one was proved in [1, Theorem 1].
Corollary 4.2 If $A, B \in \mathcal{B}(H)$, then we have
\[
\sup\{\|A - U^*BU\| : U \text{ unitary}\} \leq 2^{1/p}\|\delta_{p,A,B}\| \qquad (1 \leq p \leq \infty). \tag{4.3}
\]
In particular,
\[
\sup\{\|A - U^*BU\| : U \text{ unitary}\} \leq \sqrt{2}\,\|A \otimes I - I \otimes B'\|. \tag{4.4}
\]
Proof Let $\lambda \in \mathbb{C}$ and let $U \in \mathcal{B}(H)$ be unitary. Then
\[
\|A - U^*BU\| = \|(A - \lambda) - U^*(B - \lambda)U\| \leq \|A - \lambda\| + \|B - \lambda\|.
\]
Then, it follows from (2.2) that $\sup\{\|A - U^*BU\| : U \text{ unitary}\} \leq \|\delta_{A,B}\|$. Since the unit ball of $\mathcal{B}(H)$ is the closure of the convex hull of unitary operators on $H$, we derive that
\[
\|\delta_{A,B}\| = \sup\{\|A - U^*BU\| : U \text{ unitary}\}. \tag{4.5}
\]
Thus the inequality in (4.3) follows from Theorem 4.1. The inequality in (4.4) follows from (4.3) and the fact that $\delta_{2,A,B}$ is unitarily equivalent to the operator $A \otimes I - I \otimes B'$; see [5]. □
Note that the equality in (4.5) was proved independently in [6]. Recall that a bounded operator $A \in \mathcal{B}(H)$ is said to be cohyponormal if $A^*$ is hyponormal.
Corollary 4.3 If $A$ and $B$ are hyponormal and cohyponormal operators, respectively (if, in particular, both of them are normal), then
\[
\sup\{\|A - U^*BU\| : U \text{ unitary}\} \leq \sqrt{2}\,\sup\{|\alpha - \beta| : \alpha \in \sigma(A), \beta \in \sigma(B)\}. \tag{4.6}
\]
Proof If $A$ is hyponormal and $B$ is cohyponormal, then by [14], $\delta_{2,A,B}$ is hyponormal as an operator on $\mathcal{C}_2(H)$. Hence, it follows from [11] that $r(\delta_{2,A,B}) = \|\delta_{2,A,B}\|$. On the other hand, we have by [5], $\sigma(\delta_{2,A,B}) = \sigma(A) - \sigma(B)$, so the inequality in (4.6) follows from (4.3) with $p = 2$, and the proof of the corollary is complete. □
The bound in Corollary 4.3 is known to be sharp in the simplest case $\dim H = 2$. To see this, just consider the $2 \times 2$ matrices
\[
A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.
\]
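The sharpness claim can be checked by hand. The computation below is an added sketch: it reads the two matrices as $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ (an assumption about the intended entries, although the transposed reading of $B$ has the same spectrum) and combines (4.5) with (2.2).

```latex
% Both matrices are normal, with spectra {1, -1} and {i, -i}.
\[
\sigma(A) = \{1, -1\}, \qquad \sigma(B) = \{i, -i\}, \qquad
\sup\{|\alpha - \beta| : \alpha \in \sigma(A),\ \beta \in \sigma(B)\} = |1 - i| = \sqrt{2} ,
\]
% so the right-hand side of (4.6) equals sqrt(2) * sqrt(2) = 2.
% For the left-hand side, (4.5) and (2.2) give
\[
\sup\{\|A - U^*BU\| : U \text{ unitary}\} = \|\delta_{A,B}\|
= \inf_{\lambda \in \mathbb{C}} \bigl( \|A - \lambda\| + \|B - \lambda\| \bigr).
\]
% The norm of a normal matrix is its spectral radius, and
% max{|1 - l|, |1 + l|} >= (|1 - l| + |1 + l|)/2 >= 1 (likewise for +-i).
\[
\|A - \lambda\| = \max\{|1 - \lambda|, |1 + \lambda|\} \geq 1, \qquad
\|B - \lambda\| = \max\{|i - \lambda|, |i + \lambda|\} \geq 1 ,
\]
% with equality in both at lambda = 0, so the infimum is 2 and (4.6) is an equality.
\[
\sup\{\|A - U^*BU\| : U \text{ unitary}\} = 2 = \sqrt{2} \cdot \sqrt{2} .
\]
```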
Note that the above corollary was proved by Omladič and Šemrl [16] and Ando [1] for normal matrices $A$ and $B$.
Let $\{e_n\}$ be the unit coordinate basis of the space $\ell^2$. Define the operator $A \in \mathcal{B}(\ell^2)$ by $Ae_{2n} = e_{2n-1}$ and $Ae_{2n-1} = 0$ for $n \in \mathbb{N}$. In [7], Fialkow observed that the operator $A$ satisfies the inequality
\[
\|\delta_{2,A}\| < 2 \inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\}.
\]
He then called a generalized derivation $\delta_{A,B}$ ($A, B \in \mathcal{B}(H)$) S-universal when $\|\delta_{J,A,B}\|$ does not depend on $J$, that is, $\|\delta_{J,A,B}\| = \|\delta_{A,B}\|$ for every norm ideal $J$. In the remainder of this section, we shall give a complete characterization of the class of S-universal generalized derivations. We begin with the next proposition.
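Proposition 4.1 [2] Let $A, B \in \mathcal{B}(H)$. Then $\|A + B\| = \|A\| + \|B\|$ if and only if $\|A\|\|B\| \in \overline{W(A^*B)}$.
Proof Suppose $\|A + B\| = \|A\| + \|B\|$. Then, there exists a sequence of unit vectors $\{x_n\} \subseteq H$ such that $\lim_{n\to\infty} \|Ax_n + Bx_n\| = \|A\| + \|B\|$. Thus, it follows that $\lim_{n\to\infty} \|Ax_n\| = \|A\|$ and $\lim_{n\to\infty} \|Bx_n\| = \|B\|$. From the equality
\[
\|Ax_n + Bx_n\|^2 = \|Ax_n\|^2 + \|Bx_n\|^2 + 2\Re(\langle A^*Bx_n, x_n\rangle),
\]
we deduce that
\[
\lim_{n\to\infty} \Re(\langle A^*Bx_n, x_n\rangle) = \|A\|\|B\|.
\]
Since $|\langle A^*Bx_n, x_n\rangle|^2 = (\Re(\langle A^*Bx_n, x_n\rangle))^2 + (\Im(\langle A^*Bx_n, x_n\rangle))^2$ and $\|A^*Bx_n\| \leq \|A\|\|B\|$, we infer that
\[
\lim_{n\to\infty} |\langle A^*Bx_n, x_n\rangle| = \|A\|\|B\|.
\]
Thus $\lim_{n\to\infty} \langle A^*Bx_n, x_n\rangle = \|A\|\|B\|$, i.e., $\|A\|\|B\| \in \overline{W(A^*B)}$.
Conversely, if $\|A\|\|B\| \in \overline{W(A^*B)}$, then we can find a unit sequence $\{x_n\} \subseteq H$ such that $\lim_{n\to\infty} \langle A^*Bx_n, x_n\rangle = \|A\|\|B\|$. Since $|\langle A^*Bx_n, x_n\rangle| \leq \|Ax_n\|\|Bx_n\| \leq \|A\|\|B\|$, we deduce that $\lim_{n\to\infty} \|Ax_n\| = \|A\|$ and $\lim_{n\to\infty} \|Bx_n\| = \|B\|$. On the other hand, we have
\[
\|Ax_n + Bx_n\|^2 = \|Ax_n\|^2 + \|Bx_n\|^2 + 2\Re(\langle A^*Bx_n, x_n\rangle)
\]
and
\[
\lim_{n\to\infty} \Re(\langle A^*Bx_n, x_n\rangle) = \|A\|\|B\|.
\]
Hence, we get
\[
\lim_{n\to\infty} \|Ax_n + Bx_n\| = \|A\| + \|B\|.
\]
Therefore $\|A + B\| = \|A\| + \|B\|$, which completes the proof. □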
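Two small cases may help to see how the criterion of Proposition 4.1 behaves; they are illustrations added here and are not taken from [2]. The diagonal matrices on $\mathbb{C}^2$ are assumptions of the example.

```latex
% Case 1 (the norms do not add up): orthogonal projections on C^2.
\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}: \qquad
\|A + B\| = \|I\| = 1 < 2 = \|A\| + \|B\| ,
\]
% and indeed A^*B = 0, so the closed numerical range is {0} and misses ||A|| ||B|| = 1.
\[
A^*B = 0, \qquad \overline{W(A^*B)} = \{0\} \not\ni 1 = \|A\|\|B\| .
\]
% Case 2 (the norms add up): B = A for an arbitrary bounded operator A.
% A^*A is positive with norm ||A||^2, and the closed numerical range of a
% positive operator contains its norm.
\[
\|A + A\| = 2\|A\|, \qquad
\|A\|^2 = \|A^*A\| = \sup W(A^*A) \in \overline{W(A^*A)} .
\]
```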
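Theorem 4.2 [2] For $A, B \in \mathcal{B}(H)$, the following assertions are equivalent:
1. $\|\delta_{2,A,B}\| = \|\delta_{A,B}\|$.
2. $r(\delta_{2,A,B}) = \|\delta_{A,B}\|$.
Proof Assume that $\|\delta_{2,A,B}\| = \|\delta_{A,B}\|$. By (2.2), we have $\|\delta_{A,B}\| = \inf\{\|A - \lambda\| + \|B - \lambda\| : \lambda \in \mathbb{C}\}$. Thus, by a compactness argument, there exists $\mu \in \mathbb{C}$ such that
\[
\inf\{\|A - \lambda\| + \|B - \lambda\| : \lambda \in \mathbb{C}\} = \|A - \mu\| + \|B - \mu\|.
\]
Hence $\|\delta_{2,A,B}\| = \|\delta_{2,A-\mu,B-\mu}\| = \|L_{2,A-\mu} - R_{2,B-\mu}\| = \|A - \mu\| + \|B - \mu\|$. Without loss of generality, we may suppose that $\mu = 0$, and then $\|L_{2,A} - R_{2,B}\| = \|L_{2,A}\| + \|R_{2,B}\|$. By Proposition 4.1, this is equivalent to
\[
\|A\|\|B\| = \|L_{2,A}\|\|R_{2,B}\| \in \overline{W(-L_{2,A}^* R_{2,B})}.
\]
Since a point of the closed numerical range of an operator whose modulus equals the norm of the operator belongs to its spectrum, this implies that
\[
\|A\|\|B\| \in \sigma(-L_{2,A}^* R_{2,B}).
\]
But $\sigma(L_{2,A}^* R_{2,B}) = \sigma(A^*)\sigma(B)$ (see [5]). Thus there exist $\alpha \in \sigma(A)$ and $\beta \in \sigma(B)$ such that $\|A\|\|B\| = -\overline{\alpha}\beta$. Since $|\alpha| \leq \|A\|$ and $|\beta| \leq \|B\|$, we can find $\theta \in \mathbb{R}$ such that $\alpha = \|A\|e^{i\theta}$ and $\beta = -\|B\|e^{i\theta}$. Therefore
\[
r(\delta_{2,A,B}) = \sup\{|\lambda - \mu| : \lambda \in \sigma(A), \mu \in \sigma(B)\} \geq |\alpha - \beta| = \|A\| + \|B\|.
\]
From the inequality $r(\delta_{2,A,B}) \leq \|\delta_{2,A,B}\| \leq \|\delta_{A,B}\| \leq \|A\| + \|B\|$, it follows that
\[
r(\delta_{2,A,B}) = \|\delta_{2,A,B}\| = \|\delta_{A,B}\|.
\]
The reverse implication is easy to see since we always have $r(\delta_{2,A,B}) \leq \|\delta_{2,A,B}\| \leq \|\delta_{A,B}\|$. □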
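Theorem 4.2 gives a quick way to recover Fialkow's strict inequality for the operator $A \in \mathcal{B}(\ell^2)$ with $Ae_{2n} = e_{2n-1}$ and $Ae_{2n-1} = 0$. The following is an added sketch and not the argument of [7]; it uses only Proposition 2.1, Theorem 2.1, inequality (4.1), and the spectral identity $\sigma(\delta_{2,A,B}) = \sigma(A) - \sigma(B)$ quoted from [5].

```latex
% A is nilpotent of order two, so its spectrum is {0} and delta_{2,A} is quasinilpotent.
\[
A^2 = 0 \quad\Longrightarrow\quad \sigma(A) = \{0\}, \qquad
\sigma(\delta_{2,A}) = \sigma(A) - \sigma(A) = \{0\}, \qquad r(\delta_{2,A}) = 0 .
\]
% The unit vector e_2 attains the norm of A and <A e_2, e_2> = <e_1, e_2> = 0,
% so 0 lies in W_0(A); Proposition 2.1 and Theorem 2.1 then give
\[
\|A e_2\| = 1 = \|A\|, \qquad \langle A e_2, e_2\rangle = 0
\quad\Longrightarrow\quad
\|\delta_A\| = 2\|A\| = 2 = 2\inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\} .
\]
% Since r(delta_{2,A}) = 0 differs from ||delta_A|| = 2, Theorem 4.2 excludes
% equality of the two norms, and (4.1) turns this into a strict inequality.
\[
\|\delta_{2,A}\| < \|\delta_A\| = 2\inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\} .
\]
```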
For $A, B \in \mathcal{B}(H)$ let $d(\delta_{A,B}) = \sup\{\|AX - XB\| : X \in \mathcal{B}(H), \|X\| = 1, \operatorname{rank} X = 1\}$. By [4, Remark 2.9], we have
\[
d(\delta_{A,B}) \leq \|\delta_{J,A,B}\| \tag{4.7}
\]
for any norm ideal $J$. Then, from the last theorem, we obtain the following characterization of S-universal generalized derivations.
Theorem 4.3 For $A, B \in \mathcal{B}(H)$, the following assertions are equivalent:
1. $\|\delta_{J,A,B}\| = \|\delta_{A,B}\|$ for all $J$.
2. $w(\delta_{A,B}) = \|\delta_{A,B}\|$.
3. $r(\delta_{A,B}) = \|\delta_{A,B}\|$.
4. $d(\delta_{A,B}) = \|\delta_{A,B}\|$.
Proof The implications (3) $\Rightarrow$ (2) $\Rightarrow$ (1) follow from the inequalities
\[
r(\delta_{A,B}) \leq w(\delta_{A,B}) \leq \|\delta_{J,A,B}\| \leq \|\delta_{A,B}\|.
\]
The implication (1) $\Rightarrow$ (3) follows directly from Theorem 4.2. The implication (4) $\Rightarrow$ (1) follows from (4.7) and the fact that $\|\delta_{J,A,B}\| \leq \|\delta_{A,B}\|$. The implication (1) $\Rightarrow$ (4) follows from [4, Theorem 3.1]. □
Corollary 4.4 No nonzero quasinilpotent generalized derivation is S-universal.
In the case of an inner derivation, we have the following characterization.
Corollary 4.5 For $A \in \mathcal{B}(H)$, the following assertions are equivalent:
1. $\delta_A$ is S-universal.
2. $\operatorname{diam}(W(A)) = 2 \inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\}$.
3. $\operatorname{diam}(\sigma(A)) = 2 \inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\}$.
Let $A \in \mathcal{B}(H)$ be hyponormal. Since $r_A = \inf\{\|A - \lambda\| : \lambda \in \mathbb{C}\}$, we have the next characterization.
Corollary 4.6 Let $A \in \mathcal{B}(H)$ be hyponormal. Then $\delta_A$ is S-universal if and only if $\operatorname{diam}(\sigma(A)) = 2r_A$.
Acknowledgements We are grateful to the referees for their valuable comments and suggestions.
References
1. Ando, T. (1996). Bounds for anti-distance. Journal of Convex Analysis, 3, 371–373
2. Barraa, M., & Boumazgour, M. (2002). Inner derivations and norm equality. Proceedings of the American Mathematical Society, 130, 471–476
3. Boumazgour, M. (2006). An estimate for the norm of a derivation on a norm ideal. Linear and Multilinear Algebra, 54(5), 321–327
4. Boumazgour, M. (2016). On the S-universal elementary operators. Linear Algebra and its Applications, 507, 274–287
5. Brown, A., & Pearcy, C. (1966). Spectra of tensor products of operators. Proceedings of the American Mathematical Society, 17, 162–166
6. Choi, M. D., & Li, C. K. (2006). The ultimate estimate of the upper norm bound for the summation of operators. Journal of Functional Analysis, 232(2), 455–476
7. Fialkow, L. (1979). A note on norm ideals and the operator X → AX - XB. Israel Journal of Mathematics, 32, 331–348
8. Fialkow, L. (1992). Structural properties of elementary operators. In M. Mathieu (Ed.), Elementary operators and applications (pp. 55–113). World Scientific
9. Fong, C. K. (1979). On the essential maximal numerical range. Acta Science and Mathematics, 41, 307–315
10. Gohberg, I. C., & Krein, M. G. (1969). Introduction to the theory of linear nonselfadjoint operators. In Translations of mathematical monographs (Vol. 18). American Mathematical Society
11. Halmos, P. R. (1970). A Hilbert space problem book. Van Nostrand
12. Johnson, B. E. (1971). Norms of derivations on L(X). Pacific Journal of Mathematics, 38, 465–469
13. Kyle, J. (1977). Norms of derivations. Journal of the London Mathematical Society, 16, 297–312
14. Magajna, B. (1985). On subnormality of generalized derivations and tensor products. Bulletin of the Australian Mathematical Society, 31, 235–243
15. Martin, M., & Putinar, M. (1989). Lectures on hyponormal operators. In Operator theory: advances and applications (Vol. 39). Birkhäuser
16. Omladič, M., & Šemrl, P. (1990). On the distance between normal matrices. Proceedings of the American Mathematical Society, 110, 591–596
17. Rosenblum, M. (1956). On the operator equation BX - XA = Q. Duke Mathematical Journal, 23, 263–269
18. Saksman, E., & Tylli, H.-O. (1994). Weak essential spectra of multiplication operators on spaces of bounded operators. Mathematische Annalen, 299, 299–309
19. Saksman, E., & Tylli, H.-O. (1999). The Apostol-Fialkow formula for elementary operators on Banach spaces. Journal of Functional Analysis, 161, 1–26
20. Schatten, R. (1960). Norm ideals of completely continuous operators. Springer
21. Simon, B. (1979). Trace ideals and their applications. Cambridge University Press
22. Stampfli, J. (1970). The norm of a derivation. Pacific Journal of Mathematics, 33, 737–747
23. Sylvester, J. (1884). Comptes Rendus de l’Académie des Sciences, 99, 67–71, 115–118, 409–412, 432–436, 527–529
24. Wojtaszczyk, P. (1991). Banach spaces for analysts. In Cambridge studies in advanced mathematics (Vol. 25). Cambridge University Press
On Semicircular Elements Induced by Connected Finite Graphs Ilwoo Cho and Palle E. T. Jorgensen
Abstract In this chapter, we study C*-probability spaces which are induced by finite graphs. We first introduce classes of graph groupoids (derived from connected graphs). We then show that their representations act on specific C*-probability spaces. Via these representations, we further show that groupoidal elements act as Banach-space operators, realized on our free-probabilistic structures, in such a way that they deform the original free-distributional data. We characterize the deformations arising this way from individual graph-groupoid elements. Applications are given to deformed semicircular laws.
Keywords Graphs • Groupoids • Loops • Semicircular elements
Mathematics Subject Classification (MSC2020) Primary 47A99 • Secondary 17A50, 18B40
1 Introduction
In this paper, a graph is a combinatorial object consisting of vertices and edges connecting vertices. A directed graph is a combinatorial diagrammatic form consisting of a set of dots expressing vertices and a set of arrowed curves joining them, indicating edges, where the arrows show the direction on the graph (e.g., [20, 34, 36, 41, 43]). These graphs play important roles in pure and applied mathematics (e.g., [3, 18, 19, 21, 22, 30–32, 42, 49]).

I. Cho, Department of Mathematics and Statistics, St. Ambrose University, Davenport, IA, USA
P. E. T. Jorgensen (✉), Department of Mathematics, University of Iowa, Iowa City, IA, USA
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_57
1.1 Motivation
For operator-algebraic structures induced by directed graphs, see e.g., [28, 29, 35, 40, 45, 47]. In particular, for details about graph groupoids generated by graphs, and the operator-algebraic structures induced by them, see e.g., [5, 11, 12]. As an application, the fractal property induced by graph groupoids is considered in [6, 7]. The graph groupoids satisfying the fractal property are said to be graph fractaloids, and we call the graphs inducing graph fractaloids fractal graphs. Meanwhile, free probability handles measure theory and statistics on noncommutative structures (e.g., [1, 15, 33, 37, 39, 46, 48]). In this noncommutative analysis, the semicircular law plays a key role because, by the central limit theorem(s), it acts like the Gaussian (or normal) distribution of commutative analysis (e.g., [4, 8–10, 13, 14, 39, 46, 48]). In [4], we characterized semicircularity in terms of a combinatorial property on graphs. It is shown there that the semicircularity on free-probabilistic structures induced by graphs is fully characterized by the so-called loopness on graph (groupoid)s.
1.2 Overview
In Sections 2, 3, and 4, we characterize the free distributions of the generating free random variables of the C*-probability spaces of connected finite directed graphs. In Section 5, fractal graphs and their graph fractaloids are considered. It is shown that a fractal graph always generates suitably many semicircular elements in the C*-probability space. Motivated by the main results of Sections 3, 4, and 5, we study how graph groupoids affect the original free-distributional data in Sections 6 and 7. These deformed free-distributional data are characterized in Section 8.
2 Definitions and Backgrounds
In this section, we introduce the background of our study. For motivations and details, see e.g., [5–7, 11, 12].
2.1 Graph Groupoids
Throughout this paper, all given graphs are automatically assumed to have more than one vertex, i.e., $|V(G)| > 1$ for any given graph $G$, where $|Y|$ means the cardinality of a set $Y$. Let $G = (V(G), E(G))$ be a directed graph with its vertex set $V(G)$ and its edge set $E(G)$. If $e \in E(G)$ is an edge connecting its initial vertex $v_1$ to its terminal vertex $v_2$ (up to direction), then we write $e = v_1 e$, or $e = e v_2$, or $e = v_1 e v_2$, to emphasize that the edge $e$ has the initial vertex $v_1$ and the terminal vertex $v_2$. In this case, we say that "$v_1$ and $e$" and "$e$ and $v_2$" are admissible.
For a graph $G$, one can define the oppositely directed graph, denoted by $G^{-1}$, with its vertex set
\[ V(G^{-1}) = V(G), \]
and the edge set
\[ E(G^{-1}) = \{ e^{-1} : e \in E(G) \}, \]
where $e^{-1}$ means the edge $e^{-1} = v_2 e^{-1} v_1$ in $E(G^{-1})$, whenever $e = v_1 e v_2$ in $E(G)$, with $v_1, v_2 \in V(G) = V(G^{-1})$. This oppositely directed edge $e^{-1} \in E(G^{-1})$ of $e \in E(G)$ is said to be the shadow of $e$, and the new graph $G^{-1}$ is called the shadow of $G$. By definition,
\[ (G^{-1})^{-1} = G, \text{ as graphs.} \]
Define the shadowed graph $\hat{G}$ of $G$ by the graph union with
\[ V(\hat{G}) = V(G) = V(G^{-1}), \]
and
\[ E(\hat{G}) = E(G) \cup E(G^{-1}), \]
where $G^{-1}$ is the shadow of $G$. The admissibility on the edges of the shadowed graphs is similarly determined. By extending the admissibility on edges, one can naturally
have the admissibility on finite paths of shadowed graphs. That is, if $w_1$ and $w_2$ are finite paths, then they are admissible if $w_1 w_2$ is again a finite path on $\hat{G}$. Denote the set of all finite paths by $FP(\hat{G})$.
Now, we define the free semigroupoid $\mathbb{F}^{+}(\hat{G})$ of the shadowed graph $\hat{G}$ as the algebraic structure
\[ \mathbb{F}^{+}(\hat{G}) \overset{\text{denote}}{=} \left( F^{+}(\hat{G}), \cdot \right), \]
where
\[ F^{+}(\hat{G}) = \{\phi\} \cup V(\hat{G}) \cup FP(\hat{G}), \]
and the binary operation $(\cdot)$ is the admissibility, where the element $\phi$ of $F^{+}(\hat{G})$ is the empty word in $V(\hat{G}) \cup E(\hat{G})$, representing the cases where two elements of $F^{+}(\hat{G})$ are "not admissible."
Define the reduction (RR) on $\mathbb{F}^{+}(\hat{G})$ by the rule:
\[ \text{(RR)} \quad w = v_1 w v_2 \in \mathbb{F}^{+}(\hat{G}) \;\Rightarrow\; w w^{-1} = v_1 \text{ and } w^{-1} w = v_2 \text{ on } \mathbb{F}^{+}(\hat{G}), \]
where $v_1, v_2 \in V(\hat{G})$. The admissibility on $\mathbb{F}^{+}(\hat{G})$ under (RR) is called the "reduced-admissibility."
Definition 2.1 The algebraic pair $\left( \mathbb{F}^{+}(\hat{G}) / \text{(RR)}, \bullet \right)$ of the quotient set $\mathbb{F}^{+}(\hat{G}) / \text{(RR)}$ and the reduced-admissibility $(\bullet)$ is called the graph groupoid of $G$, and we denote it by $\mathbb{G}$.
Indeed, graph "groupoids" are algebraically groupoids. In particular, all vertices of $\mathbb{G}$ are the units (e.g., [5, 7, 16]).
Notation. In the following text, we simply denote the reduced-admissibility $(\bullet)$ by $(\cdot)$. Also, we denote the set of all "reduced" finite paths of $\mathbb{G}$ by $FP_r(\hat{G})$.
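The reduced-admissibility of Definition 2.1 can be illustrated by a minimal Python sketch. The encoding below is an assumption made only for illustration and is not taken from the chapter: the dictionary `EDGES`, the triple `(initial vertex, letters, terminal vertex)`, and the function names are hypothetical, and `None` stands for the empty word.

```python
# Hypothetical toy graph: e = v1 e v2 and f = v2 f v1.
EDGES = {"e": ("v1", "v2"), "f": ("v2", "v1")}

def vertex(v):
    """A vertex, encoded as a path with no letters."""
    return (v, (), v)

def edge(name, sign=1):
    """The edge `name` (sign=+1) or its shadow (sign=-1) as a groupoid element."""
    s, t = EDGES[name]
    return (s, ((name, 1),), t) if sign == 1 else (t, ((name, -1),), s)

def inverse(x):
    """Shadow of a reduced path: reverse the letters and flip every sign."""
    if x is None:
        return None
    s, letters, t = x
    return (t, tuple((n, -sg) for n, sg in reversed(letters)), s)

def multiply(x, y):
    """Reduced-admissible product; returns None (the empty word) if not admissible."""
    if x is None or y is None or x[2] != y[0]:
        return None
    out = list(x[1])
    for name, sg in y[1]:
        if out and out[-1] == (name, -sg):
            out.pop()            # (RR): a letter next to its shadow collapses
        else:
            out.append((name, sg))
    return (x[0], tuple(out), y[2])

e = edge("e")
print(multiply(e, inverse(e)))   # the vertex v1
print(multiply(e, e))            # None, since v2 != v1
```

In this encoding a product that reduces completely returns a vertex, which matches the rule $w w^{-1} = v_1$ and $w^{-1} w = v_2$ of (RR).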
2.2 Graph-Groupoid C*-Algebras
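Define the graph Hilbert space $H_G$ of $G$ by
\[ H_G \overset{\text{def}}{=} \left( \bigoplus_{v \in V(\hat{G})} \mathbb{C}\,\xi_v \right) \oplus \left( \bigoplus_{w \in FP_r(\hat{G})} \mathbb{C}\,\xi_w \right), \]
with its orthonormal basis
\[ B_G = \{ \xi_w : w \in \mathbb{G} \setminus \{\phi\} \}, \]
and its zero vector $\xi_\phi = 0_{H_G}$. On $H_G$, we have a well-determined vector-multiplication,
\[ \xi_{w_1} \xi_{w_2} = \begin{cases} \xi_{w_1 w_2} & \text{if } w_1 w_2 \neq \phi \\ \xi_\phi = 0_{H_G} & \text{if } w_1 w_2 = \phi, \end{cases} \]
for all $w_1, w_2 \in \mathbb{G}$. Define a canonical action $L : \mathbb{G} \to B(H_G)$ of $\mathbb{G}$ by
\[ L(w) \overset{\text{def}}{=} L_w \in B(H_G), \text{ for all } w \in \mathbb{G}, \]
where $B(H_G)$ is the operator algebra consisting of all operators on the graph Hilbert space $H_G$. Note that each operator $L_w$ has its adjoint
\[ L_w^{*} = L_{w^{-1}}, \text{ for all } w \in \mathbb{G}. \]
By definition, the operators $L_v$ are projections on $H_G$, since
\[ L_v^{*} = L_{v^{-1}} = L_v = L_{vv} = L_v L_v = L_v^{2} \]
in $B(H_G)$, for all $v \in V(\hat{G})$, and hence, the operators $L_w$ are partial isometries on $H_G$, since
\[ L_w^{*} L_w = L_{w^{-1}} L_w = L_{w^{-1} w} \]
is a projection in $B(H_G)$, because $w^{-1} w \in V(\hat{G})$ by (RR), for all $w \in FP_r(\hat{G})$.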
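Note that, for all $w_1, w_2, w \in \mathbb{G}$, we have
\[ L_{w_1 w_2} = L_{w_1} L_{w_2}, \text{ and } L_w^{*} = L_{w^{-1}}, \]
in $B(H_G)$, implying that the pair $(H_G, L)$ is a well-defined Hilbert space representation of $\mathbb{G}$.
Definition 2.2 For a graph $G$ and the representation $(H_G, L)$ of the graph groupoid $\mathbb{G}$, the C*-algebra $M_G$ is defined by
\[ M_G \overset{\text{denote}}{=} C^{*}(L(\mathbb{G})) \overset{\text{def}}{=} \overline{\mathbb{C}[L(\mathbb{G})]} \]
in $B(H_G)$, where $\mathbb{C}[X]$ is the polynomial algebra generated by a set $X$, and $\overline{Z}$ is the operator-norm-topology closure of a subset $Z$ of $B(H_G)$. The C*-algebra $M_G$ is called the graph-groupoid (C*-)algebra of $G$.
Define the C*-subalgebra $D_G$ of $M_G$ by
\[ D_G \overset{\text{def}}{=} \bigoplus_{v \in V(\hat{G})} (\mathbb{C}\, L_v), \]
called the diagonal subalgebra of $M_G$. Let $x$ be in $M_G$. Then it is expressed by
\[ x = \sum_{w \in \mathbb{G}} t_w L_w \text{ with } t_w \in \mathbb{C}. \]
Also, the unity (or, the multiplication-identity operator) $1_G$ of $M_G$ is the operator
\[ 1_G = \sum_{v \in V(\hat{G})} L_v \text{ in } D_G \subseteq M_G, \]
since
\[ 1_G L_w = L_{v_1} L_w = L_{v_1 w} = L_w = L_{w v_2} = L_w L_{v_2} = L_w 1_G, \]
where $w = v_1 w v_2 \in \mathbb{G} \setminus \{\phi\}$, with $v_1, v_2 \in V(\hat{G})$, implying that
\[ 1_G T = T = T 1_G, \quad \forall T \in M_G. \]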
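As a small sanity check of the relations above, the following Python sketch builds the operators $L_w$ as 4×4 matrices for a single-edge graph $e = v_1 e v_2$. It assumes that $L_w$ acts by left multiplication, $\xi_h \mapsto \xi_w \xi_h$, which the vector-multiplication above suggests but which is an assumption here; the labels and helper names are hypothetical.

```python
# Graph groupoid of the single-edge graph: {empty word, v1, v2, e, e^-1}.
BASIS = ["v1", "v2", "e", "e^-1"]
ENDS = {"v1": ("v1", "v1"), "v2": ("v2", "v2"),
        "e": ("v1", "v2"), "e^-1": ("v2", "v1")}

def mul(x, y):
    """Reduced product; None stands for the empty word."""
    if x is None or y is None or ENDS[x][1] != ENDS[y][0]:
        return None
    if x in ("v1", "v2"):
        return y
    if y in ("v1", "v2"):
        return x
    return ENDS[x][0]  # e e^-1 = v1 and e^-1 e = v2

def left_mult_matrix(w):
    """Matrix (entry [row][col]) of xi_h -> xi_{w h}, assumed to realize L_w."""
    m = [[0] * len(BASIS) for _ in BASIS]
    for col, h in enumerate(BASIS):
        wh = mul(w, h)
        if wh is not None:
            m[BASIS.index(wh)][col] = 1
    return m

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

L = {w: left_mult_matrix(w) for w in BASIS}
print(transpose(L["e"]) == L["e^-1"])          # adjoint relation
print(matmul(L["e"], L["e^-1"]) == L["v1"])    # L_w L_w^* = L_{v1}
```

Since all matrix entries are real, the adjoint is the transpose, so the first check mirrors $L_w^{*} = L_{w^{-1}}$; the sum of the two vertex matrices is the identity, mirroring the formula for $1_G$.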
Now, define a conditional expectation $E : M_G \to D_G$ by
\[ E\left( \sum_{w \in \mathbb{G}} t_w L_w \right) \overset{\text{def}}{=} \sum_{v \in V(\hat{G})} t_v L_v, \]
for all $\sum_{w \in \mathbb{G}} t_w L_w \in M_G$.
Combinatorially, two directed graphs $G_1$ and $G_2$ are said to be graph-isomorphic if there exist bijections
\[ g_V : V(G_1) \to V(G_2), \quad g_E : E(G_1) \to E(G_2), \]
such that
\[ g_E(e) = g_E(v_1 e v_2) = g_V(v_1)\, g_E(e)\, g_V(v_2) \]
in $E(G_2)$, for all $e = v_1 e v_2 \in E(G_1)$, with $v_1, v_2 \in V(G_1)$. The pair $(g_V, g_E)$ is called a graph-isomorphism from $G_1$ to $G_2$. Algebraically, two groupoids $\mathcal{G}_1$ and $\mathcal{G}_2$ are groupoid-isomorphic if there is a bijection $g : \mathcal{G}_1 \to \mathcal{G}_2$ such that
\[ g(w_1 w_2) = g(w_1)\, g(w_2) \text{ in } \mathcal{G}_2, \]
for all $w_1, w_2 \in \mathcal{G}_1$.
Proposition 2.3 Let $G_1$ and $G_2$ be directed graphs. If their shadowed graphs $\hat{G}_1$ and $\hat{G}_2$ are graph-isomorphic, then the graph-groupoid algebras $M_{G_1}$ and $M_{G_2}$ are *-isomorphic.
Proof If $\hat{G}_1 \overset{\text{graph}}{=} \hat{G}_2$, then the graph groupoids $\mathbb{G}_1$ and $\mathbb{G}_2$ are groupoid-isomorphic, i.e.,
\[ \hat{G}_1 \overset{\text{graph}}{=} \hat{G}_2 \;\Rightarrow\; \mathbb{G}_1 \overset{\text{groupoid}}{=} \mathbb{G}_2, \]
where "$\overset{\text{groupoid}}{=}$" means "being groupoid-isomorphic to." So, the graph-groupoid algebras $M_{G_1}$ and $M_{G_2}$ are *-isomorphic, by definition. □
2.3 Semicircular Elements
Let $A$ be a topological (noncommutative) *-algebra (i.e., a C*-algebra, or a von Neumann algebra, or a Banach *-algebra, etc.), and let $\varphi$ be a (bounded) linear functional on $A$. Then the pair $(A, \varphi)$ is called a (noncommutative) topological *-probability space (respectively, a C*-probability space, or a W*-probability space, or a Banach *-probability space, etc.). An element $a \in A$ is said to be a free random variable as an element of $(A, \varphi)$.
The free distribution of free random variables $a_1, \ldots, a_s$, for $s \in \mathbb{N}$, is characterized by the joint-free moments
\[ \varphi\left( \prod_{k=1}^{n} a_{i_k}^{r_k} \right) = \varphi\left( a_{i_1}^{r_1} \cdots a_{i_n}^{r_n} \right), \]
for all $(i_1, \ldots, i_n) \in \{1, \ldots, s\}^n$ and $(r_1, \ldots, r_n) \in \{1, *\}^n$, for all $n \in \mathbb{N}$. For details, see e.g., [46, 48].
Definition 2.4 A free random variable $x \in (A, \varphi)$ is semicircular if it is self-adjoint and
\[ \varphi(x^n) = \omega_n c_{n/2}, \quad \forall n \in \mathbb{N}, \tag{2.1} \]
where
\[ \omega_n = \begin{cases} 1 & \text{if } n \text{ is even} \\ 0 & \text{if } n \text{ is odd,} \end{cases} \]
for all $n \in \mathbb{N}$, and
\[ c_k = \frac{1}{k+1} \binom{2k}{k} = \frac{(2k)!}{k!\,(k+1)!} \]
are the $k$-th Catalan numbers for all $k \in \mathbb{N}_0 = \mathbb{N} \cup \{0\}$.
By the Moebius inversion of [46], a free random variable $x$ is semicircular in $(A, \varphi)$ if and only if its free cumulants satisfy
\[ k_n^{\varphi}(x, \ldots, x) = \delta_{n,2} \]
for all $n \in \mathbb{N}$, by (2.1), where $\delta$ is the Kronecker delta.
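To make the moment sequence (2.1) concrete, the following short Python sketch evaluates $\omega_n c_{n/2}$ for the first few $n$. The function names are illustrative only.

```python
from math import comb

def catalan(k: int) -> int:
    """k-th Catalan number c_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)

def semicircular_moment(n: int) -> int:
    """n-th moment omega_n * c_{n/2} of the semicircular law in (2.1)."""
    return catalan(n // 2) if n % 2 == 0 else 0

print([semicircular_moment(n) for n in range(1, 9)])
# → [0, 1, 0, 2, 0, 5, 0, 14]
```

The odd moments vanish and the even moments run through the Catalan numbers $c_1, c_2, c_3, c_4 = 1, 2, 5, 14$.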
3 Radial Operators
Recall that we assume that all given graphs are "connected," "finite," and have more than one vertex. We say that $G$ is disconnected if there exist distinct vertices $v_1 \neq v_2 \in V(\hat{G}) = V(G)$ such that there does not exist a reduced finite path $w \in FP_r(\hat{G})$ in $\mathbb{G}$ such that either
\[ w = v_1 w v_2, \text{ or } w = v_2 w v_1. \]
A graph $G$ is said to be connected if it is not disconnected. A graph $G$ is said to be finite if
\[ |V(G)| < \infty, \text{ and } |E(G)| < \infty. \]
Assumption and Notation. In the following text, if we mention that "$G$ is a graph," then it means that "$G$ is a connected finite directed graph having more than one vertex."
Definition 3.1 On the graph-groupoid algebra $M_G$, define operators $T_w \in M_G$ by
\[ T_w = L_w + L_w^{*} = L_w + L_{w^{-1}}, \quad \forall w \in FP_r(\hat{G}). \tag{3.1} \]
We call these elements $T_w$ of (3.1) the $w$-radial operators for $w \in FP_r(\hat{G})$.
By the very definition, every radial operator is self-adjoint in $M_G$. Define a linear functional $\varphi$ on the diagonal subalgebra $D_G$ of $M_G$ by
\[ \varphi\left( \sum_{v \in V(\hat{G})} t_v L_v \right) = \sum_{v \in V(\hat{G})} t_v. \]
It is a (bounded) trace satisfying
\[ \varphi(S_1 S_2) = \varphi(S_2 S_1), \quad \forall S_1, S_2 \in D_G. \]
Now, define a linear functional $\tau$ on $M_G$ by
\[ \tau \overset{\text{def}}{=} \varphi \circ E \text{ on } M_G. \tag{3.2} \]
Since $\tau$ of (3.2) is a well-defined bounded linear functional, we have a C*-probability space $(M_G, \tau)$ induced by a graph $G$. In particular, the boundedness of $\tau$ is guaranteed by the finiteness of $G$.
Definition 3.2 The C*-probability space $(M_G, \tau)$ is called the graph C*-probability space of $G$ (or, of $\mathbb{G}$).
Two C*-probability spaces $(A_1, \psi_1)$ and $(A_2, \psi_2)$ are said to be free-isomorphic if there exists a *-isomorphism $\Phi : A_1 \to A_2$ such that
\[ \psi_2(\Phi(a)) = \psi_1(a), \quad \forall a \in (A_1, \psi_1). \]
This *-isomorphism $\Phi$ is called a free-isomorphism. By definition, two free-isomorphic C*-probability spaces are regarded as the same free-probabilistic structure.
Theorem 3.3 Let $G_1$ and $G_2$ be graphs. If the shadowed graphs $\hat{G}_1$ and $\hat{G}_2$ are graph-isomorphic, then the graph C*-probability spaces $(M_{G_1}, \tau_1)$ and $(M_{G_2}, \tau_2)$ are free-isomorphic.
Proof We obtained that
\[ \hat{G}_1 \overset{\text{graph}}{=} \hat{G}_2 \;\Rightarrow\; \mathbb{G}_1 \overset{\text{groupoid}}{=} \mathbb{G}_2 \;\Rightarrow\; M_{G_1} \overset{*\text{-iso}}{=} M_{G_2}. \]
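Indeed, if $g : \mathbb{G}_1 \to \mathbb{G}_2$ is the groupoid-isomorphism induced by a graph-isomorphism $(g_V, g_E)$ satisfying
\[ g(w) = \begin{cases} \phi, \text{ the empty word of } \mathbb{G}_2 & \text{if } w = \phi \text{ in } \mathbb{G}_1 \\ g_V(w) & \text{if } w \in V(\hat{G}_1) \\ g_E(e_1) \cdots g_E(e_n) & \text{if } w = e_1 \cdots e_n \in FP_r(\hat{G}_1), \end{cases} \]
in $\mathbb{G}_2$, for all $w \in \mathbb{G}_1$, where $e_1, \ldots, e_n \in E(\hat{G}_1)$ for $n \in \mathbb{N}$, then we have a *-isomorphism $\Phi : M_{G_1} \to M_{G_2}$ satisfying
\[ \Phi\left( \sum_{w \in \mathbb{G}_1} t_w L_w^{(1)} \right) = \sum_{g(w) \in \mathbb{G}_2} t_{g(w)} L_{g(w)}^{(2)} \text{ in } M_{G_2}, \]
for all $\sum_{w \in \mathbb{G}_1} t_w L_w^{(1)} \in M_{G_1}$, where $\left( H_{G_k}, L^{(k)} \right)$ are the Hilbert space representations of $\mathbb{G}_k$, for $k = 1, 2$. By (3.2), it is shown that
\[ \tau_2\left( \Phi\left( \sum_{w \in \mathbb{G}_1} t_w L_w^{(1)} \right) \right) = \tau_2\left( \sum_{g(w) \in \mathbb{G}_2} t_{g(w)} L_{g(w)}^{(2)} \right) = \sum_{g(v) \in V(\hat{G}_2),\, v \in V(\hat{G}_1)} t_{g(v)} = \sum_{g_V(v) \in V(\hat{G}_2)} t_{g_V(v)} = \sum_{v \in V(\hat{G}_1)} t_v = \tau_1\left( \sum_{w \in \mathbb{G}_1} t_w L_w^{(1)} \right). \]
Therefore, $\Phi$ is a free-isomorphism, and hence,
\[ (M_{G_1}, \tau_1) \overset{\text{free-iso}}{=} (M_{G_2}, \tau_2). \]
□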
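The functional $\tau = \varphi \circ E$ of (3.2) only reads off the coefficients attached to vertices. A minimal Python sketch follows, under the assumption (made only for illustration) that an element $\sum_w t_w L_w$ with finitely many non-zero terms is stored as a dictionary from hypothetical labels $w$ to coefficients $t_w$.

```python
# Hypothetical vertex labels of a small graph; other keys stand for reduced paths.
VERTICES = {"v1", "v2"}

def cond_expectation(x):
    """E: keep only the vertex terms t_v L_v of x = sum_w t_w L_w."""
    return {w: t for w, t in x.items() if w in VERTICES}

def phi(d):
    """phi on the diagonal subalgebra: the sum of the vertex coefficients."""
    return sum(d.values())

def tau(x):
    """tau = phi o E, as in (3.2)."""
    return phi(cond_expectation(x))

x = {"v1": 2 + 1j, "v2": -1, "e": 5, "e^-1": 0.5}
print(tau(x))  # the coefficients of e and e^-1 do not contribute
```

In the proof above, $\Phi$ merely relabels the keys through $g$, so the sum of the vertex coefficients is unchanged, which is the content of $\tau_2 \circ \Phi = \tau_1$.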
4 Free Probability on $(M_G, \tau)$
In this section, we concentrate on studying the free distributions of the generating free random variables of $(M_G, \tau)$.
Theorem 4.1 For $v \in V(\hat{G})$, the corresponding projection $L_v \in (M_G, \tau)$ has its free-moment sequence
\[ \left( \tau(L_v^n) \right)_{n=1}^{\infty} = (1, 1, 1, 1, 1, 1, \ldots). \tag{4.1} \]
Proof Indeed, the very first moment of $L_v$ determines all other free moments, since $L_v$ is a projection, and, by (3.2),
\[ \tau(L_v) = \varphi(E(L_v)) = \varphi(L_v) = 1. \]
□
All projections $\{L_v\}_{v \in V(\hat{G})}$ are identically free-distributed in $(M_G, \tau)$, by (4.1).
Theorem 4.2 Let $w = v_1 w v_2 \in FP_r(\hat{G})$ with $v_1, v_2 \in V(\hat{G})$.
(i) If $v_1 \neq v_2$ in $V(\hat{G})$, then the free distribution of $L_w$ is determined by the only non-zero joint-free moments
\[ \tau\left( L_w^{r_1} \cdots L_w^{r_n} \right) = 1, \]
for all even numbers $n \in \mathbb{N}$, and
\[ (r_1, \ldots, r_n) = (1, *, 1, *, \ldots, 1, *), \text{ or } (*, 1, *, 1, \ldots, *, 1). \]
(ii) If $v_1 = v_2$ in $V(\hat{G})$, then the free distribution of $L_w$ is determined by the joint-free moments
\[ \tau\left( \prod_{k=1}^{l} L_w^{r_k} \right) = \begin{cases} 1 & \text{if } \sum_{k=1}^{l} e_k = 0 \\ 0 & \text{otherwise,} \end{cases} \]
for all $(r_1, \ldots, r_l) \in \{1, *\}^l$, where
\[ e_k = \begin{cases} 1 & \text{if } r_k = 1 \\ -1 & \text{if } r_k = *, \end{cases} \]
for $k = 1, \ldots, l$, for all $l \in \mathbb{N}$.
Proof Suppose the initial and the terminal vertices $v_1$ and $v_2$ of $w \in FP_r(\hat{G})$ are distinct in $V(\hat{G})$. Then there are no reduced finite paths $\{w^n\}_{n=2}^{\infty}$, since
\[ w^2 = (v_1 w v_2)(v_1 w v_2) = v_1 w (\phi) w v_2 = \phi \]
in $\mathbb{G}$, and hence, $w^n = \phi$, for all $n \in \mathbb{N} \setminus \{1\}$. So, we have
\[ \tau(L_w^n) = \begin{cases} \tau(L_w) = \varphi(E(L_w)) = \varphi(0_G) = 0 & \text{if } n = 1 \\ \tau(L_{w^n}) = \tau(L_\phi) = \tau(0_G) = 0 & \text{if } n > 1, \end{cases} \]
for all $n \in \mathbb{N}$, where $0_G$ is the zero operator of $M_G$. Similarly, one has
\[ \tau\left( (L_w^{*})^n \right) = \tau\left( L_{w^{-1}}^n \right) = 0, \]
since $w^{-1} = v_2 w^{-1} v_1 \in FP_r(\hat{G})$ in $\mathbb{G}$, with $v_2 \neq v_1$ in $V(\hat{G})$. The above computations show that
\[ \tau\left( L_{w^{e_1}}^{n_1} L_{w^{e_2}}^{n_2} \cdots L_{w^{e_k}}^{n_k} \right) = 0, \]
if there is an $n_i \in \{n_1, \ldots, n_k\}$ such that $n_i > 1$, for all $(n_1, \ldots, n_k) \in \mathbb{N}^k$ and $(e_1, \ldots, e_k) \in \{1, -1\}^k$, for all $k \in \mathbb{N}$. So, it is sufficient to consider the only possible non-zero cases
\[ \tau\left( L_w L_w^{*} L_w L_w^{*} \cdots L_w L_w^{*} \right), \text{ or } \tau\left( L_w^{*} L_w L_w^{*} L_w \cdots L_w^{*} L_w \right). \]
Since
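\[ L_w L_w^{*} = L_w L_{w^{-1}} = L_{w w^{-1}} = L_{v_1}, \text{ and } L_w^{*} L_w = L_{w^{-1}} L_w = L_{w^{-1} w} = L_{v_2}, \]
in $D_G \subseteq M_G$, one has
\[ \tau\left( L_w L_w^{*} L_w L_w^{*} \cdots L_w L_w^{*} \right) = \tau\left( L_{v_1} L_{v_1} \cdots L_{v_1} \right) = 1, \]
and
\[ \tau\left( L_w^{*} L_w L_w^{*} L_w \cdots L_w^{*} L_w \right) = \tau\left( L_{v_2} L_{v_2} \cdots L_{v_2} \right) = 1, \]
by (4.1).
Now, assume that $v_1 = v = v_2$ in $V(\hat{G})$, and hence, $w = v w v$, with $w^{-1} = v w^{-1} v$, in $\mathbb{G}$. Then, for all $n \in \mathbb{N}$, there are non-empty reduced finite paths $w^n$, satisfying
\[ (w^n)^{-1} = (w^{-1})^n \text{ in } FP_r(\hat{G}), \quad \forall n \in \mathbb{N}. \]
And they satisfy that
\[ \tau(L_w^n) = \tau(L_{w^n}) = 0 = \tau(L_{w^{-n}}) = \tau\left( (L_w^{*})^n \right), \]
for all $n \in \mathbb{N}$, by (3.2). Thus
\[ \tau\left( \prod_{k=1}^{l} L_w^{r_k} \right) = \tau\left( \prod_{k=1}^{l} L_{w^{e_k}} \right) = \begin{cases} \varphi(L_v) = 1 & \text{if } \sum_{k=1}^{l} e_k = 0 \\ \varphi(0_G) = 0 & \text{otherwise,} \end{cases} \]
for all $(r_1, \ldots, r_l) \in \{1, *\}^l$, where $e_k = 1$ if $r_k = 1$, and $e_k = -1$ if $r_k = *$, for $k = 1, \ldots, l$, for all $l \in \mathbb{N}$. □
The free distributions of the partial isometries $\{L_w\}_{w \in FP_r(\hat{G})}$ in $(M_G, \tau)$ are characterized by (i) and (ii) in Theorem 4.2.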
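Theorem 4.3 Let $w = v_1 w v_2 \in FP_r(\hat{G})$ with $v_1, v_2 \in V(\hat{G})$, and let $T_w \in (M_G, \tau)$ be the $w$-radial operator. If $v_1 \neq v_2$ in $\mathbb{G}$, then
\[ \tau(T_w^n) = \begin{cases} 2 & \text{if } n \text{ is even} \\ 0 & \text{if } n \text{ is odd,} \end{cases} \tag{4.2} \]
for all $n \in \mathbb{N}$.
Proof Observe that, for any $n \in \mathbb{N}$,
\[ T_w^n = (L_w + L_w^{*})^n = (L_w + L_{w^{-1}})^n = \sum_{(w_1, \ldots, w_n) \in \{w^{\pm 1}\}^n} \prod_{l=1}^{n} L_{w_l} = \sum_{(w_1, \ldots, w_n) \in \{w^{\pm 1}\}^n} L_{\prod_{l=1}^{n} w_l}. \tag{4.3} \]
For an $n$-tuple $(w_1, \ldots, w_n) \in \{w, w^{-1}\}^n$,
\[ \prod_{l=1}^{n} w_l = \begin{cases} v_1 & \text{if } (w_1, \ldots, w_n) = (w, w^{-1}, w, w^{-1}, \ldots, w, w^{-1}) \\ v_2 & \text{if } (w_1, \ldots, w_n) = (w^{-1}, w, w^{-1}, w, \ldots, w^{-1}, w) \\ \phi & \text{otherwise,} \end{cases} \tag{4.4} \]
in $\mathbb{G}$, for all $n \in \mathbb{N}$, by the same arguments as in the proof of Theorem 4.2(i), because $v_1 \neq v_2$ in $V(\hat{G})$, and hence, $w^n = \phi = w^{-n}$ in $\mathbb{G}$, for all $n \in \mathbb{N} \setminus \{1\}$. So, we have that: if $n \in \mathbb{N}$ is even, then
\[ \tau(T_w^n) = \tau\left( \sum_{(w_1, \ldots, w_n) \in \{w^{\pm 1}\}^n} L_{\prod_{l=1}^{n} w_l} \right) \]
by (4.3)
\[ = \tau\left( L_{w w^{-1} w w^{-1} \cdots w w^{-1}} + L_{w^{-1} w w^{-1} w \cdots w^{-1} w} + [\text{Rest}] \right), \]
where $[\text{Rest}]$ is the sum of all other terms for the $n$-tuples $(w_1, \ldots, w_n) \in \{w, w^{-1}\}^n$, other than
\[ (w, w^{-1}, w, w^{-1}, \ldots, w, w^{-1}), \quad (w^{-1}, w, w^{-1}, w, \ldots, w^{-1}, w), \]
and hence,
\[ = \tau\left( L_{w w^{-1} w w^{-1} \cdots w w^{-1}} + L_{w^{-1} w w^{-1} w \cdots w^{-1} w} + 0_G \right) \]
by (4.4)
\[ = \tau(L_{v_1} + L_{v_2}) = \varphi(L_{v_1} + L_{v_2}) = 1 + 1 = 2; \]
meanwhile, if $n \in \mathbb{N}$ is odd, then
\[ \tau(T_w^n) = 0, \]
because the odd powers of $T_w$ contain no vertex-depending terms.
□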
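The counting argument behind (4.2) can be checked by brute force. The Python sketch below is only an illustration under stated assumptions: it models a non-loop path $w = v_1 w v_2$ with $v_1 \neq v_2$ together with $w^{-1}$ and the two vertices (the labels are hypothetical), expands $T_w^n$ as in (4.3), and counts the words that reduce to a vertex, since only those contribute 1 to $\tau$.

```python
from itertools import product

# (initial vertex, terminal vertex) of each non-empty element of the toy model.
ENDS = {"v1": ("v1", "v1"), "v2": ("v2", "v2"),
        "w": ("v1", "v2"), "w^-1": ("v2", "v1")}

def mul(x, y):
    """Reduced product; None stands for the empty word."""
    if x is None or y is None or ENDS[x][1] != ENDS[y][0]:
        return None
    if x in ("v1", "v2"):
        return y
    if y in ("v1", "v2"):
        return x
    return ENDS[x][0]  # w w^-1 = v1 and w^-1 w = v2

def tau_radial_moment(n):
    """tau(T_w^n): number of n-tuples in {w, w^-1}^n whose product is a vertex."""
    total = 0
    for letters in product(("w", "w^-1"), repeat=n):
        acc = letters[0]
        for x in letters[1:]:
            acc = mul(acc, x)
        if acc in ("v1", "v2"):
            total += 1
    return total

print([tau_radial_moment(n) for n in range(1, 7)])
# → [0, 2, 0, 2, 0, 2]
```

Only the two alternating tuples survive for even $n$, which reproduces the sequence $(0, 2, 0, 2, \ldots)$ of (4.2).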
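The free distribution of the $w$-radial operator $T_w$ in $(M_G, \tau)$ is characterized by (4.2), whenever $w$ is a non-loop with distinct initial and terminal vertices.
Lemma 4.4 If $w = v w v \in FP_r(\hat{G})$ is a reduced finite path with identical initial-and-terminal vertex $v \in V(\hat{G})$ in $\mathbb{G}$, inducing the $w$-radial operator $T_w \in (M_G, \tau)$, then
\[ \tau(T_w^n) = \omega_n c_{n/2}, \quad \forall n \in \mathbb{N}, \tag{4.5} \]
where $c_k$ are the $k$-th Catalan numbers.
Proof For $n \in \mathbb{N}$,
\[ \tau(T_w^n) = \sum_{(w_1, \ldots, w_n) \in \{w^{\pm 1}\}^n} \tau\left( L_{\prod_{l=1}^{n} w_l} \right). \]
But, by Theorem 4.2(ii), every summand satisfies
\[ \tau\left( L_{\prod_{l=1}^{n} w_l} \right) = \begin{cases} 1 & \text{if } \sum_{l=1}^{n} \varepsilon_l = 0 \\ 0 & \text{otherwise,} \end{cases} \tag{4.6} \]
where
\[ \varepsilon_l = \begin{cases} 1 & \text{if } w_l = w \\ -1 & \text{if } w_l = w^{-1}, \end{cases} \]
for all $l = 1, \ldots, n$. By Theorem 4.2(ii), the formula (4.6) gives
\[ \tau(T_w^n) = \sum_{(w_1, \ldots, w_n) \in \{w^{\pm 1}\}^n,\ \prod_{l=1}^{n} w_l = v} 1, \]
and hence,
\[ \tau(T_w^n) = \left| \left\{ (w_1, \ldots, w_n) \in \{w^{\pm 1}\}^n : \prod_{l=1}^{n} w_l = v \right\} \right|, \]
for all $n \in \mathbb{N}$. So, by using (4.10),
\[ \tau(T_w^n) = \left| \left\{ (\varepsilon_1, \ldots, \varepsilon_n) \in \{\pm 1\}^n : \sum_{l=1}^{n} \varepsilon_l = 0 \right\} \right|. \]
It is well-known that
\[ \left| \left\{ (\varepsilon_1, \ldots, \varepsilon_n) \in \{\pm 1\}^n : \sum_{l=1}^{n} \varepsilon_l = 0 \right\} \right| = \begin{cases} c_{n/2} & \text{if } n \text{ is even} \\ 0 & \text{if } n \text{ is odd.} \end{cases} \]
□
In the following text, a reduced finite path $w$ is said to be a loop if $w = v w v \in FP_r(\hat{G})$ in $\mathbb{G}$.
Theorem 4.5 A $w$-radial operator $T_w$ is semicircular in $(M_G, \tau)$ if and only if $w \in FP_r(\hat{G})$ is a loop.
Proof By the lemma above, if $w$ is a loop, then the $w$-radial operator $T_w$ is semicircular in $(M_G, \tau)$, by (4.5).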
Conversely, if we assume that $w$ is not a loop in $\mathbb{G}$, then the free distribution of $T_w$ is determined by the formula (4.2). So, it is not semicircular in $(M_G, \tau)$. □
The following result is immediately obtained from the above theorem because the semicircular law is universal.
Corollary 4.6 If $w_l = v_l w_l v_l \in FP_r(\hat{G})$ are loops in $\mathbb{G}$ with $v_l \in V(\hat{G})$, for $l = 1, 2$, then the radial operators $T_{w_1}$ and $T_{w_2}$ are identically free-distributed.
Proof The identical free-distributedness of $T_{w_1}$ and $T_{w_2}$ is shown by the above theorem and the universality of the semicircular law. □
Suppose $G_e$ is a single-edge graph,
\[ V(G_e) = \{v_1 \neq v_2\} \text{ and } E(G_e) = \{e = v_1 e v_2\}. \]
Then the corresponding graph groupoid $\mathbb{G}_e$ does not contain a loop. This illustrates that there do exist graphs that do not generate semicircular elements in the corresponding graph C*-probability spaces.
5 Fractal Graphs and Semicircular Elements
Fractality has been studied in various areas (e.g., [3, 6, 7, 23–27, 39]). In particular, in [6, 7], we studied the fractality of graph groupoids. Graph groupoids satisfying the fractality are called graph fractaloids. In such a case, the graphs are called fractal graphs.
Let $G$ be a graph. Then each vertex $v$ induces two quantities,
\[ \deg_{in}(v) \overset{\text{def}}{=} |\{ e \in E(G) : e = e v \}|, \]
and
\[ \deg_{out}(v) \overset{\text{def}}{=} |\{ e \in E(G) : e = v e \}|, \]
called the in-degree and the out-degree, respectively.
Definition 5.1 A graph $G$ is fractal if, for
\[ \mu_G = \max\{ \deg_{out}(v), \deg_{in}(v) : v \in V(G) \} < \infty, \tag{5.1} \]
one has
\[ \deg^{o}_{out}(x) = 2\mu_G = \deg^{o}_{in}(x), \quad \forall x \in V(\hat{G}) = V(G), \]
where $\hat{G}$ is the shadowed graph of $G$, and $\deg^{o}_{out}$ and $\deg^{o}_{in}$ are the out-degree and the in-degree on $\hat{G}$. The graph groupoids of fractal graphs are said to be graph fractaloids.
We now concentrate on the fact that every fractal graph generates suitably many semicircular elements in its graph C*-probability space.
Lemma 5.2 Every fractal graph $G$ induces infinitely many loops at each vertex in its graph fractaloid $\mathbb{G}$.
Proof The existence of loops of $\mathbb{G}$ adjacent to every vertex is guaranteed by the fractality (5.1) on the shadowed graph $\hat{G}$ of $G$ (see [4] for details). □
By the above lemma, we obtain the following result.
Theorem 5.3 Let $(M_G, \tau)$ be the graph C*-probability space of a fractal graph $G$. Then, for any vertex $v \in V(\hat{G})$, there exist loops $w_v = v w_v v \in FP_r(\hat{G})$ such that the $w_v^n$-radial operators $T_{w_v^n}$ are semicircular in $(M_G, \tau)$, for all $n \in \mathbb{N}$.
Proof It is proven by the fractality and (4.5).
□
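The degree condition of Definition 5.1 can be sketched in Python as follows. This is one reading of the definition, stated as an assumption: on the shadowed graph every edge also contributes its shadow, so both the shadowed out-degree and the shadowed in-degree of a vertex equal $\deg_{in}(v) + \deg_{out}(v)$. The function names and the edge-list encoding are hypothetical.

```python
from collections import Counter

def degrees(edges):
    """Return (deg_in, deg_out) for a list of directed edges (source, target)."""
    deg_in, deg_out = Counter(), Counter()
    for s, t in edges:
        deg_out[s] += 1
        deg_in[t] += 1
    return deg_in, deg_out

def is_fractal(vertices, edges):
    """Check deg^o_out(x) = 2 * mu_G = deg^o_in(x) at every vertex x."""
    deg_in, deg_out = degrees(edges)
    mu = max(max(deg_in[v], deg_out[v]) for v in vertices)
    # Assumed reading: shadowed degree of v is deg_in(v) + deg_out(v).
    return all(deg_in[v] + deg_out[v] == 2 * mu for v in vertices)

cycle = [("a", "b"), ("b", "c"), ("c", "a")]
print(is_fractal(["a", "b", "c"], cycle))        # True
print(is_fractal(["v1", "v2"], [("v1", "v2")]))  # False
```

Under this reading, the directed 3-cycle satisfies the condition with $\mu_G = 1$, while the single-edge graph does not, because each of its vertices has shadowed degree 1.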
For any arbitrarily fixed graph $G$ which is not fractal, construct a graph $G_o$ by the processes (I), (II), and (III) below:
(I) If there are vertices $v \in V(G)$ adjacent to loop edges, and if $x \in V(G)$ is the vertex having the maximal number of loop edges, say $N_O \in \mathbb{N}$, then attach $N_O$-many loop edges to "all" vertices of $G$. By $G^{o}$, we denote the resulting graph. Meanwhile, if there are no loop edges in $G$, then $G^{o} = G$;
(II) In the graph $G^{o}$ obtained in the process (I), if there exists a pair $(v_1, v_2)$ of two distinct vertices having no edge connecting from $v_1$ to $v_2$, then attach an additional edge connecting from $v_1$ to $v_2$. Do this process for all such pairs. By $G^{o\#}$, we denote this new graph;
(III) For the resulting graph $G^{o\#}$ of the process (II), let $\mu_{G^{o\#}}$ be the quantity in the sense of (5.1). Then, for "all" vertices $v \in V(G)$, make them have
\[ \deg_{out}(v) = \mu_{G^{o\#}} = \deg_{in}(v), \]
by attaching suitably many edges. We denote this new graph by $G_o$.
Note here that
\[ V(G) = V(G^{o}) = V(G^{o\#}) = V(G_o). \]
Lemma 5.4 The graph $G_o$ from the processes (I), (II), and (III) is a fractal graph containing $G$ as its full subgraph.
Proof By the processes,
\[ \deg^{o}_{out}(v) = 2\mu_{G^{o\#}} = \deg^{o}_{in}(v), \]
for all $v \in V(\hat{G}_o)$ (e.g., see [4]).
□
Different from subgraphs (e.g., [4, 5, 34, 40]), $K$ is said to be a full subgraph of $G$ if
\[ E(K) \subseteq E(G), \]
and
\[ V(K) = \{ v_1, v_2 \in V(G) : \exists\, e \in E(K) \text{ s.t. } e = v_1 e v_2 \}. \]
The notation "$K \le G$" means "$K$ is a full subgraph of $G$."
Theorem 5.5 Every connected finite graph $G$ has its fractal cover $G_{oo}$, the minimal fractal graph containing $G$ as its full subgraph.
Proof By the above lemma, for any graph $G$, there always exists a fractal graph $G_o$ by the above processes (I), (II), and (III), containing $G$ as its full subgraph. So, by the axiom of choice for the full subgraph inclusion, one can find the minimal fractal graph $G_{oo}$ containing $G$ as its full subgraph. □
By the existence of fractal covers of graphs, one has the following result.
Theorem 5.6 Every connected finite graph $G$ generates suitably many semicircular elements.
Proof Suppose the graph groupoid $\mathbb{G}$ of $G$ has a loop $w \in FP_r(\hat{G})$. Then there are semicircular elements
\[ \{ T_{w^n} = L_{w^n} + L_{w^{-n}} : n \in \mathbb{N} \} \]
in $(M_G, \tau)$. Assume now that $\mathbb{G}$ has no loops in $FP_r(\hat{G})$. Then one can construct the fractal cover $G_{oo}$ of $G$, whose graph fractaloid $\mathbb{G}_{oo}$ contains infinitely many loops $w$ adjacent to all vertices. So, there are semicircular elements $L_w + L_{w^{-1}}$ in $(M_{G_{oo}}, \tau)$. In this sense, the graph $G$ generates infinitely many semicircular elements. □
6 Acting Graph Groupoids on Graph C*-Probability Spaces I
Let $G$ be our graph (connected, finite, and having more than one vertex) with its graph groupoid $\mathbb{G}$, and let $(M_G, \tau)$ be the graph C*-probability space of $G$. In this section, we consider how the graph groupoid $\mathbb{G}$ acts on $(M_G, \tau)$ and how such an action deforms the original free-distributional data on $(M_G, \tau)$.
In Section 4, we characterized the free distributions of the generating free random variables $\{L_w\}_{w \in \mathbb{G} \setminus \{\phi\}}$ of $(M_G, \tau)$: for all $v \in V(\hat{G})$, the free distributions of the projections $L_v$ are characterized by
\[ \left( \tau(L_v^n) \right)_{n=1}^{\infty} = (1, 1, 1, 1, 1, \ldots) \tag{6.1} \]
by (4.1); meanwhile, for all $w = v_1 w v_2 \in FP_r(\hat{G})$ with $v_1 \neq v_2 \in V(\hat{G})$, the corresponding free distributions of the partial isometries $L_w$ are characterized by "the only non-zero" joint-free moments of $\{L_w, L_w^{*} = L_{w^{-1}}\}$,
\[ \tau\left( (L_w^{*} L_w)^n \right) = 1 = \tau\left( (L_w L_w^{*})^n \right) \tag{6.2} \]
for all $n \in \mathbb{N}$, by Theorem 4.2(i); and if $v_1 = v_2$ in $V(\hat{G})$, then the free distribution of $L_w$ is characterized by "the only non-zero" joint-free moments of $\{L_w, L_{w^{-1}}\}$,
\[ \tau\left( \prod_{l=1}^{n} L_{w^{e_l}} \right) = 1, \quad \sum_{l=1}^{n} e_l = 0, \tag{6.3} \]
for all $(e_1, \ldots, e_n) \in \{\pm 1\}^n$, for all $n \in \mathbb{N}$, by Theorem 4.2(ii). Moreover, the free distributions of the $w$-radial operators $T_w = L_w + L_{w^{-1}}$ are characterized too, for $w \in FP_r(\hat{G})$: if $w$ is not a loop, then
\[ \left( \tau(T_w^n) \right)_{n=1}^{\infty} = (2\omega_n)_{n=1}^{\infty} = (0, 2, 0, 2, 0, 2, \ldots), \]
meanwhile, if $w$ is a loop, then
\[ \left( \tau(T_w^n) \right)_{n=1}^{\infty} = \left( \omega_n c_{n/2} \right)_{n=1}^{\infty} = (0, c_1, 0, c_2, 0, c_3, \ldots), \tag{6.4} \]
by (4.2) and (4.5), respectively.
Now, regard our graph-groupoid algebra $M_G$ as a Banach space with its C*-norm (inherited from the operator norm on $B(H_G)$), and let $B(M_G)$ be the operator space of all Banach-space operators (or, bounded linear transformations) on the Banach space $M_G$ (e.g., [15]). Define a linear morphism $\alpha : \mathbb{G} \to B(M_G)$ by
\[ \alpha(w)(T) = L_w T, \text{ for all } w \in \mathbb{G}, \tag{6.5} \]
for all $T \in M_G$. Indeed, by the definition (6.5), the operators $\alpha(w) \in B(M_G)$ are bounded and linear for all $w \in \mathbb{G}$. From below, for convenience, we denote the Banach-space operators $\alpha(w) \in B(M_G)$ simply by $\alpha_w$, for all $w \in \mathbb{G}$. Trivially, $\alpha_\phi = 0_{M_G}$, the zero operator of $B(M_G)$ acting on $M_G$. By (6.5), one obtains that
\[ \alpha_{w_1 w_2}(T) = L_{w_1} L_{w_2} T = \alpha_{w_1} \alpha_{w_2}(T), \quad \forall T \in M_G, \]
implying that
\[ \alpha_{w_1 w_2} = \alpha_{w_1} \alpha_{w_2}, \text{ for all } w_1, w_2 \in \mathbb{G}, \tag{6.6} \]
in $B(M_G)$. That is, the triple
\[ \mathcal{G} \overset{\text{denote}}{=} (\mathbb{G}, M_G, \alpha) \tag{6.7} \]
forms a well-defined groupoid dynamical system.
Proposition 6.1 The triple $\mathcal{G} = (\mathbb{G}, M_G, \alpha)$ of (6.7) is a well-defined groupoid dynamical system acting on the Banach space $M_G$ via the action $\alpha$ of (6.5).
Proof The elements of $\{\alpha_w\}_{w \in \mathbb{G}}$ are well-determined Banach-space operators acting on the graph-groupoid algebra $M_G$, i.e., $\{\alpha_w\}_{w \in \mathbb{G}} \subset B(M_G)$ by (6.5); moreover, the morphism $\alpha$ is a groupoid-action of $\mathbb{G}$ by (6.6). Therefore, the triple $\mathcal{G}$ of (6.7) forms a groupoid dynamical system. □
The above proposition illustrates that the graph groupoid $\mathbb{G}$ of a given graph $G$ canonically acts on our graph-groupoid algebra $M_G$ as Banach-space operators via the action $\alpha$.
Definition 6.2 The Banach-space operator $\alpha_w \in B(M_G)$ of an arbitrary element $w \in \mathbb{G}$ is called the $w$(-Banach-space-)operator (on $M_G$).
We are interested in how the action $\alpha$ of (6.5) affects the original free-probabilistic information on the graph C*-probability space $(M_G, \tau)$ of $G$; equivalently, we are interested in how the $w$-operators $\{\alpha_w\}_{w \in \mathbb{G}} \subset B(M_G)$ deform the original free-distributional data on $(M_G, \tau)$.
Theorem 6.3 Let $v \in V(\hat{G})$ be a vertex of $\mathbb{G}$ and $L_v \in (M_G, \tau)$ the corresponding projection, and let $x \in V(\hat{G})$ and $\alpha_x \in B(M_G)$ the $x$-operator. If $T = \alpha_x(L_v) \in (M_G, \tau)$, then
\[ \tau(T^n) = \delta_{v,x}, \text{ for all } n \in \mathbb{N}, \tag{6.8} \]
where $\delta$ is the Kronecker delta.
Proof Let $x, v \in V(\hat{G})$ be vertices of $\mathbb{G}$, and let $L_x, L_v \in (M_G, \tau)$ be the corresponding projections. Then
\[ L_x L_v = L_{xv} = \delta_{x,v} L_v \text{ in } M_G, \]
since $xv = \delta_{x,v} v$ in $\mathbb{G}$, with the axiomatization $0 \cdot u = \phi$, for all $u \in \mathbb{G}$. Thus,
\[ T = \alpha_x(L_v) = \delta_{x,v} L_v \in M_G, \]
implying that
\[ T^n = \delta_{x,v} L_v^n = \delta_{x,v} L_v = T, \quad \forall n \in \mathbb{N}, \]
in $(M_G, \tau)$. Therefore, one has
\[ \tau(T^n) = \tau(T) = \delta_{x,v} \tau(L_v) = \delta_{x,v} \cdot 1, \]
for all $n \in \mathbb{N}$, by (6.1). Therefore, the free-moment formula (6.8) holds.
□
The above theorem shows that the free distribution of a projection $L_v \in M_G$ for $v \in V(\hat{G})$, characterized by (6.1), is deformed by the $x$-operator $\alpha_x \in B(M_G)$ for $x \in V(\hat{G})$, and the deformed free distribution is characterized by the free-moment sequence, either
\[ (1, 1, 1, \ldots), \text{ or } (0, 0, 0, \ldots), \]
by (6.8). That is, if $x = v$ in $\mathbb{G}$, then the free distribution of $L_v$ is preserved by the $x$-operator $\alpha_x$; meanwhile, if $x \neq v$ in $\mathbb{G}$, then it is distorted to be the free zero distribution.
Theorem 6.4 Let $w = v_1 w v_2 \in FP_r(\hat{G})$ be a reduced finite path of $\mathbb{G}$ with $v_1, v_2 \in V(\hat{G})$ and $L_w \in (M_G, \tau)$ the corresponding partial isometry, and let $x \in V(\hat{G})$ and $\alpha_x \in B(M_G)$ the $x$-operator. If $W = \alpha_x(L_w) \in (M_G, \tau)$, then the free distribution of $W$ is characterized by the joint-free moments of $\{W, W^{*}\}$,
\[ \tau\left( \prod_{l=1}^{n} W^{r_l} \right) = \delta_{v_1,x}\, \tau\left( \prod_{l=1}^{n} L_w^{r_l} \right), \tag{6.9} \]
for all ðr1 , . . . , rn Þ 2 f1, gn , for all n 2 . In particular, the joint-free moments τ
n l=1
Lrwn in the right-hand side of (6.9) are characterized by
the only non-zero quantities, either (6.2) (if v1 ≠ v2) or (6.3) (if v1 = v2). Proof Under hypothesis, one has Lx Lw = Lxw = δx,v1 Lw in M G , since xw = xv1 wv2 = ðxv1 Þw = δx,v1 w 0 w = ϕ for all w 2 . Thus,
in , under the axiomatization:
W = αx ðLw Þ = δx,v1 Lw 2 M G , implying that n l=1
n
W rl =
l=1
ðδx,v1 Lw Þrl = δx,v1
n l=1
Lrwl ,
in ðM G , τÞ, for all ðr1 , . . . , rn Þ 2 f1, gn , for all n 2 . Recall-and-remark that Lw is not self-adjoint in MG with Lw = Lw - 1 . Therefore, one has n
τ
n
W rl l=1
= τ δx,v1
l=1
n
Lrwl
= δx,v1 τ
l=1
Lrwl ,
for all ðr 1 , :::, rn Þ 2 f1, gn , for all n 2 . Therefore, the free moments (6.9) holds with help of (6.2) and (6.3) case by case. □ The above theorem shows that if a vertex x and a reduced finite path w are admissible in (i.e., xw = w in FPr G ), then the free distributions of Lw and that of αx ðLw Þ are identically free-distributed in ðM G , τÞ; equivalently, the free distribution of Lw is preserved by the action of x-operator αx 2 BðM G Þ; meanwhile, if x and w are not admissible in (i.e., xw = ϕ), then the free distribution of Lw is deformed to be the free zero distribution in ðM G , τÞ by the action of αx.
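The admissibility rule behind Theorems 6.3 and 6.4 can be illustrated with a short Python sketch. This is only a toy model and not part of the paper: a groupoid element is represented by a hypothetical triple (initial vertex, word, terminal vertex), a vertex carries the empty word, and `None` stands for the empty word of the groupoid. All function names are illustrative assumptions.

```python
from typing import Optional, Tuple

# Toy model (an assumption for illustration): an element of the graph groupoid
# is (initial vertex, word, terminal vertex); a vertex has the empty tuple as
# its word, and None plays the role of the empty word of the groupoid.
Element = Tuple[str, Tuple[str, ...], str]


def vertex(v: str) -> Element:
    """Return the groupoid element representing the vertex v."""
    return (v, (), v)


def act_by_vertex(x: str, w: Element) -> Optional[Element]:
    """Left product x.w: equal to w if x is the initial vertex of w, else None."""
    initial, _, _ = w
    return w if x == initial else None


def kronecker_factor(x: str, w: Element) -> int:
    """The scalar delta_{x, v1} in alpha_x(L_w) = delta_{x, v1} L_w."""
    return 1 if act_by_vertex(x, w) is not None else 0


# A non-loop path w = v1 w v2 consisting of one hypothetical edge "e".
w = ("v1", ("e",), "v2")
print(kronecker_factor("v1", w))  # x = v1 is admissible with w
print(kronecker_factor("v2", w))  # x = v2 is not admissible with w
```

In this model the scalar returned by `kronecker_factor` is exactly the factor that multiplies every joint free moment in (6.9), so a non-admissible vertex sends all moments to zero.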
Theorem 6.5 Let $w = v_1 w v_2 \in FP_r(\widehat{G})$ with $v_1, v_2 \in V(\widehat{G})$ in $\mathbb{G}$, and let $\alpha_w \in B(M_G)$ be the $w$-operator. Let $v \in V(\widehat{G})$ be a vertex inducing a projection $L_v \in (M_G, \tau)$. If $v_1 \neq v_2$, then the free distribution of $T = \alpha_w(L_v) \in (M_G, \tau)$ is characterized by the only "possible" non-zero joint free moments of $\{T, T^*\}$,

$$\tau((T^* T)^n) = \delta_{v_2,v} = \tau((T T^*)^n), \quad \forall n \in \mathbb{N}. \tag{6.10}$$

Meanwhile, if $v_1 = x = v_2$ in $V(\widehat{G})$, then the free distribution of $T$ is determined by the only "possible" non-zero joint free moments of $\{T, T^*\}$,

$$\tau\left(\prod_{l=1}^{n} T^{r_l}\right) = \delta_{x,v}\, \tau\left(\prod_{l=1}^{n} L_{w^{e_l}}\right) = \begin{cases} \delta_{v_2,v} & \text{if } \sum_{l=1}^{n} e_l = 0, \\ 0 & \text{otherwise,} \end{cases} \tag{6.11}$$

for all $(r_1, \ldots, r_n) \in \{1, *\}^n$, for all $n \in \mathbb{N}$, where

$$e_l = \begin{cases} 1 & \text{if } r_l = 1, \\ -1 & \text{if } r_l = *, \end{cases}$$

for all $l = 1, \ldots, n$, for all $n \in \mathbb{N}$.

Proof Suppose $w = v_1 w v_2 \in FP_r(\widehat{G})$ is a non-loop reduced finite path of $\mathbb{G}$ with $v_1 \neq v_2 \in V(\widehat{G})$. Then, for a vertex $v \in V(\widehat{G})$, one has

$$wv = w v_2 v = w(v_2 v) = \delta_{v_2,v} w \ \text{ in } \mathbb{G},$$

implying that

$$T = \alpha_w(L_v) = L_w L_v = L_{wv} = \delta_{v_2,v} L_w \ \text{ in } M_G.$$

Thus, if $\delta_{v_2,v} = 0$, then the free distribution of $T$ is the free zero distribution in $(M_G, \tau)$. Meanwhile, if $\delta_{v_2,v} = 1$, equivalently, if $v_2 = v$ in $\mathbb{G}$, then $T = L_w$ in $M_G$, so the free distribution of $T$ is identical to the free distribution of $L_w$ in $(M_G, \tau)$, characterized by the joint free moments of $\{L_w, L_w^*\}$. Hence the free distribution of $T$ is characterized by the only non-zero joint free moments of $\{T = L_w, T^* = L_{w^{-1}}\}$, which are

$$\tau((T^* T)^n) = 1 = \tau((T T^*)^n), \quad \forall n \in \mathbb{N}.$$

Therefore, the free-distributional data (6.10) holds.

Now, suppose $w = xwx \in FP_r(\widehat{G})$ is a loop in $\mathbb{G}$ with $x \in V(\widehat{G})$. Then

$$T = \alpha_w(L_v) = L_w L_v = L_{wv} = \delta_{x,v} L_w \ \text{ in } M_G,$$

since

$$wv = wxv = w(xv) = \delta_{x,v} w \ \text{ in } \mathbb{G}.$$

If $\delta_{x,v} = 0$, then the free distribution of $T$ is the free zero distribution in $(M_G, \tau)$. Meanwhile, if $\delta_{x,v} = 1$, equivalently, if $w$ and $v$ are admissible in $\mathbb{G}$, then $T = L_w$ in $(M_G, \tau)$, so the free distribution of $T$ is characterized by the joint free moments of $\{L_w, L_{w^{-1}}\}$, i.e.,

$$\tau\left(\prod_{l=1}^{n} T^{r_l}\right) = \tau\left(\prod_{l=1}^{n} L_{w^{e_l}}\right) = \begin{cases} \delta_{v_2,v} = 1 & \text{if } \sum_{l=1}^{n} e_l = 0, \\ 0 & \text{otherwise,} \end{cases}$$

for all $(r_1, \ldots, r_n) \in \{1, *\}^n$, where $e_l = 1$ if $r_l = 1$, while $e_l = -1$ if $r_l = *$, for $l = 1, \ldots, n$, for all $n \in \mathbb{N}$, by (6.3). Therefore, the free-distributional data (6.11) holds. □

The above theorem characterizes how the $w$-operators $\alpha_w \in B(M_G)$ for $w \in FP_r(\widehat{G})$ deform the free distribution of projections $L_v \in (M_G, \tau)$ for $v \in V(\widehat{G})$, by (6.10) and (6.11).

Theorem 6.6 Let $w = v_1 w v_2 \in FP_r(\widehat{G})$ be a reduced finite path with $v_1 \neq v_2$ in $V(\widehat{G})$, and let $\alpha_w \in B(M_G)$ be the corresponding $w$-operator. Let $w_1 = x_1 w_1 x_2 \in FP_r(\widehat{G})$ be a reduced finite path with $x_1, x_2 \in V(\widehat{G})$, and $L_{w_1} \in (M_G, \tau)$, a partial isometry. Suppose $W = \alpha_w(L_{w_1})$ in $(M_G, \tau)$.

(i) If $w_1 = w^{-1}$ in $\mathbb{G}$, then the free distribution of $W$ is determined by the free-moment sequence $(1, 1, 1, 1, \ldots)$ in $(M_G, \tau)$.

(ii) If $w_1 \neq w^{-1}$, $v_2 = x_1$, and $v_1 = x_2$ in $\mathbb{G}$, then the free distribution of $W$ is characterized by the joint free moments of $\{W, W^*\}$,

$$\tau\left(\prod_{l=1}^{n} W^{r_l}\right) = \begin{cases} 1 & \text{if } \sum_{l=1}^{n} e_l = 0, \\ 0 & \text{otherwise,} \end{cases}$$

for all $(r_1, \ldots, r_n) \in \{1, *\}^n$, where $e_l = 1$ if $r_l = 1$, and $e_l = -1$ if $r_l = *$, for $l = 1, \ldots, n$, for all $n \in \mathbb{N}$.

(iii) If $w_1 \neq w^{-1}$, $v_2 = x_1$, and $v_1 \neq x_2$ in $\mathbb{G}$, then the free distribution of $W$ is characterized by the only non-zero joint free moments of $\{W, W^*\}$,

$$\tau((W^* W)^n) = 1 = \tau((W W^*)^n), \quad \forall n \in \mathbb{N}.$$

(iv) If $v_2 \neq x_1$ in $V(\widehat{G}) \subset \mathbb{G}$, then $W$ has the free zero distribution in $(M_G, \tau)$.

Proof Assume first that the two reduced finite paths $w$ and $w_1$ are admissible in $\mathbb{G}$, and hence,

$$W = \alpha_w(L_{w_1}) = L_{w w_1} \neq 0_G \ \text{ in } (M_G, \tau).$$

Then there are three different cases: (i) $w_1 = w^{-1}$, and hence, $w w_1 = w w^{-1} = v_1$ in $\mathbb{G}$; (ii) $w_1 \neq w^{-1}$, and $w w_1 \in FP_r(\widehat{G})$ is a loop in $\mathbb{G}$; and (iii) $w_1 \neq w^{-1}$, and $w w_1 \in FP_r(\widehat{G})$ is not a loop in $\mathbb{G}$.

If $w w_1 = v_1$ is a vertex in $\mathbb{G}$ as in case (i), then $W = L_{v_1}$ is a projection whose free distribution is characterized by the free-moment sequence

$$(\tau(W^n))_{n=1}^{\infty} = (\tau(L_{v_1}^n))_{n=1}^{\infty} = (1, 1, 1, \ldots),$$

by (6.1). Thus, the statement (i) holds.

If $w w_1 \neq \emptyset$ is a loop in $\mathbb{G}$ as in case (ii), then $W = L_{w w_1}$ is a partial isometry for the loop $w w_1$, whose free distribution is characterized by (6.3) under the identically free-distributedness of Theorem 4.2(ii). Therefore, the statement (ii) holds.

If $w w_1 \neq \emptyset$ is a non-loop reduced finite path in $\mathbb{G}$ as in case (iii), then $W = L_{w w_1}$ is a partial isometry for $w w_1$, whose free distribution is determined by (6.2) under the identically free-distributedness of Theorem 4.2(i). This shows that the statement (iii) holds.

In contrast to the above discussion, assume now that $w$ and $w_1$ are not admissible in $\mathbb{G}$, and hence, $w w_1 = \emptyset$. Then

$$W = \alpha_w(L_{w_1}) = L_{w w_1} = L_{\emptyset} = 0_G$$

in $(M_G, \tau)$. Therefore, the free distribution of $W$ is the free zero distribution in $(M_G, \tau)$, showing that the statement (iv) holds. □

The above theorem fully characterizes how the action of the $w$-operators $\alpha_w \in B(M_G)$ for $w \in FP_r(\widehat{G})$ deforms the original free distributions of the partial isometries $\{L_w\}_{w \in FP_r(\widehat{G})}$ in the graph $C^*$-probability space $(M_G, \tau)$.
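The four cases in the proof of Theorem 6.6 depend only on what kind of element the product $w w_1$ is. A minimal Python sketch of that case analysis follows. It is a toy model, not the authors' construction: elements are hypothetical triples (initial vertex, word of signed edges, terminal vertex), `None` stands for the empty word, and the edge and vertex names are made up for illustration.

```python
from typing import Optional, Tuple

Letter = Tuple[str, int]  # (edge name, +1 for the edge, -1 for its shadow)
Element = Tuple[str, Tuple[Letter, ...], str]


def inverse(w: Element) -> Element:
    """Shadow w^{-1}: reverse the word, flip the signs, swap the end vertices."""
    initial, word, terminal = w
    return (terminal, tuple((e, -s) for e, s in reversed(word)), initial)


def multiply(a: Optional[Element], b: Optional[Element]) -> Optional[Element]:
    """Admissible product with reduction; None if a does not end where b starts."""
    if a is None or b is None or a[2] != b[0]:
        return None
    word = list(a[1])
    for edge, sign in b[1]:
        if word and word[-1] == (edge, -sign):
            word.pop()  # cancel an edge against its shadow
        else:
            word.append((edge, sign))
    return (a[0], tuple(word), b[2])


def classify(p: Optional[Element]) -> str:
    """Name the kind of element: empty word, vertex, loop, or non-loop path."""
    if p is None:
        return "empty"
    if not p[1]:
        return "vertex"
    return "loop" if p[0] == p[2] else "non-loop"


w = ("v1", (("e", 1),), "v2")            # non-loop path from v1 to v2
back = ("v2", (("f", 1),), "v1")         # a different path from v2 back to v1
onward = ("v2", (("g", 1),), "v3")       # a path from v2 to a third vertex
elsewhere = ("v3", (("h", 1),), "v1")    # starts at v3, so not admissible

for w1 in (inverse(w), back, onward, elsewhere):
    print(classify(multiply(w, w1)))
```

Under the theorem, a "vertex" result corresponds to the moment sequence $(1, 1, 1, \ldots)$, a "loop" to (6.3), a "non-loop" to (6.2), and "empty" to the free zero distribution.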
7 Acting Graph Groupoids on Graph $C^*$-Probability Spaces II

In this section, we further consider how the graph groupoid $\mathbb{G}$ of a graph $G$ affects the free probability on the graph $C^*$-probability space $(M_G, \tau)$. In Section 6, we studied how the groupoid action $\alpha$ of (6.5) deforms the free distributions of the generating free random variables $\{L_w\}_{w \in \mathbb{G}}$ of $(M_G, \tau)$, and the deformed distributions were characterized there. Here, we consider how the action $\alpha$ of $\mathbb{G}$ deforms the free distributions of the $w$-radial operators,

$$T_w = L_w + L_{w^{-1}} \in M_G, \quad \forall w \in FP_r(\widehat{G}),$$

in $(M_G, \tau)$. Recall that if $w = v_1 w v_2 \in FP_r(\widehat{G})$ with $v_1 \neq v_2 \in V(\widehat{G})$, then the free distribution of $T_w$ is characterized by

$$(\tau(T_w^n))_{n=1}^{\infty} = (2\omega_n)_{n=1}^{\infty} = (0, 2, 0, 2, 0, 2, 0, 2, \ldots); \tag{7.1}$$

meanwhile, if $v_1 = v_2$ in $V(\widehat{G})$, then $T_w$ is semicircular in $(M_G, \tau)$, satisfying

$$(\tau(T_w^n))_{n=1}^{\infty} = (\omega_n c_{n/2})_{n=1}^{\infty} = (0, c_1, 0, c_2, 0, c_3, \ldots), \tag{7.2}$$

by (6.4), where $\omega_n = 1$ if $n$ is even and $\omega_n = 0$ if $n$ is odd, for all $n \in \mathbb{N}$, and $c_k$ is the $k$-th Catalan number for all $k \in \mathbb{N}_0$.
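The two moment sequences (7.1) and (7.2) are easy to tabulate. The following Python sketch simply evaluates the stated formulas $2\omega_n$ and $\omega_n c_{n/2}$; the function names are illustrative assumptions, and the closed form $c_k = \binom{2k}{k}/(k+1)$ is the standard expression for the Catalan numbers.

```python
from math import comb


def omega(n: int) -> int:
    """omega_n = 1 if n is even, 0 if n is odd."""
    return 1 if n % 2 == 0 else 0


def catalan(k: int) -> int:
    """k-th Catalan number c_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def nonloop_radial_moment(n: int) -> int:
    """n-th free moment 2 * omega_n of T_w for a non-loop path w, as in (7.1)."""
    return 2 * omega(n)


def loop_radial_moment(n: int) -> int:
    """n-th free moment omega_n * c_{n/2} of T_w for a loop w, as in (7.2)."""
    return omega(n) * catalan(n // 2)


print([nonloop_radial_moment(n) for n in range(1, 9)])  # [0, 2, 0, 2, 0, 2, 0, 2]
print([loop_radial_moment(n) for n in range(1, 9)])     # [0, 1, 0, 2, 0, 5, 0, 14]
```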
Theorem 7.1 Let $w = v_1 w v_2 \in FP_r(\widehat{G})$ with $v_1 \neq v_2 \in V(\widehat{G})$, and let $T_w \in (M_G, \tau)$ be the $w$-radial operator. If $\alpha_v \in B(M_G)$ is the $v$-operator for $v \in V(\widehat{G})$ and $T = \alpha_v(T_w)$ in $M_G$, then the free distribution of $T$ is characterized by the only "possible" non-zero joint free moments of $\{T, T^*\}$, either

$$\tau((T^* T)^n) = \delta_{v,v_1} \quad \text{or} \quad \tau((T^* T)^n) = \delta_{v,v_2}, \tag{7.3}$$

for all $n \in \mathbb{N}$.

Proof Under hypothesis, one has

$$vw = v(v_1 w) = (v v_1) w = \delta_{v,v_1} w,$$

and

$$v w^{-1} = v(v_2 w^{-1}) = (v v_2) w^{-1} = \delta_{v,v_2} w^{-1},$$

in $\mathbb{G}$, under the axiom $0 \cdot y = \emptyset$ for all $y \in \mathbb{G}$. So,

$$T = \alpha_v(L_w + L_{w^{-1}}) = L_{vw} + L_{v w^{-1}} = \delta_{v,v_1} L_w + \delta_{v,v_2} L_{w^{-1}}$$

in $M_G$. Since $v_1 \neq v_2$ in $\mathbb{G}$,

$$\delta_{v,v_1} = 1 \Longrightarrow \delta_{v,v_2} = 0, \quad \text{and} \quad \delta_{v,v_2} = 1 \Longrightarrow \delta_{v,v_1} = 0,$$

implying that

$$T = \text{either } \delta_{v,v_1} L_w \text{ or } \delta_{v,v_2} L_{w^{-1}}$$

in $(M_G, \tau)$. Thus, the resulting free random variable $T$ is not self-adjoint in $(M_G, \tau)$ if it is non-zero in $M_G$, and hence, in general, the free distribution of $T$ is determined by the joint free moments of $\{T, T^*\}$,

$$\tau\left(\prod_{l=1}^{n} T^{r_l}\right) = \tau\left(\prod_{l=1}^{n} \delta_{v,v_1} L_{w^{e_l}}\right), \quad \text{or} \quad \tau\left(\prod_{l=1}^{n} \delta_{v,v_2} L_{w^{-e_l}}\right),$$

where $e_l = 1$ if $r_l = 1$ and $e_l = -1$ if $r_l = *$, for all $(r_1, \ldots, r_n) \in \{1, *\}^n$, for all $n \in \mathbb{N}$. Therefore, one can conclude that the only "possible" non-zero joint free moments of $\{T, T^*\}$ are either

$$\delta_{v,v_1}\, \tau((L_w L_{w^{-1}})^n) = \delta_{v,v_1}, \quad \text{or} \quad \delta_{v,v_2}\, \tau((L_{w^{-1}} L_w)^n) = \delta_{v,v_2},$$

in $(M_G, \tau)$, for all $n \in \mathbb{N}$, by (6.2). □
The above theorem shows how the original data (7.1) is distorted by the action $\alpha$ of $V(\widehat{G})$ in $(M_G, \tau)$, by (7.3).

Theorem 7.2 Let $w = xwx \in FP_r(\widehat{G})$ be a loop with $x \in V(\widehat{G})$, and let $T_w \in (M_G, \tau)$ be the $w$-radial operator. If $\alpha_v \in B(M_G)$ is the $v$-operator for $v \in V(\widehat{G})$ and $W = \alpha_v(T_w)$ in $M_G$, then the free distribution of $W$ is characterized by the only "possible" non-zero free moments,

$$\tau(W^n) = \delta_{x,v}\, \omega_n c_{n/2}, \tag{7.4}$$

for all $n \in \mathbb{N}$. So, if $vw \neq \emptyset$ in $\mathbb{G}$, then $W$ is semicircular; meanwhile, if $vw = \emptyset$, then $W$ has the free zero distribution in $(M_G, \tau)$.

Proof Under hypothesis, if $v$ and $w$ are admissible, equivalently, if $x = v$ in $V(\widehat{G})$, then

$$W = \alpha_v(T_w) = L_{vw} + L_{v w^{-1}} = T_w \ \text{ in } M_G,$$

and hence, the free random variable $W$ is identified with the $w$-radial operator $T_w$. It is therefore not only self-adjoint but also semicircular in $(M_G, \tau)$ by (7.2), since $w$ is a loop in $\mathbb{G}$. However, if $v$ and $w$ are not admissible, equivalently, if $x \neq v$ in $V(\widehat{G})$, then

$$W = \alpha_v(T_w) = L_{\emptyset} + L_{\emptyset} = 0_G \ \text{ in } M_G,$$

and hence, it has the free zero distribution in $(M_G, \tau)$. Therefore, the free-distributional data (7.4) holds true. □
The above theorem shows that the semicircular law of a loop-radial operator $T_w$ is deformed by the action of $\alpha$ for $V(\widehat{G})$ by (7.4).

Theorem 7.3 Let $w = v_1 w v_2 \in FP_r(\widehat{G})$ with $v_1 \neq v_2 \in V(\widehat{G})$ and $T_w \in (M_G, \tau)$ the corresponding $w$-radial operator, and let $y = x_1 y x_2 \in FP_r(\widehat{G})$ be a reduced finite path with $x_1 \neq x_2 \in V(\widehat{G})$, inducing the $y$-operator $\alpha_y \in B(M_G)$, and $T = \alpha_y(T_w) \in (M_G, \tau)$.

(i) If $\{x_1, x_2\} \cap \{v_1, v_2\} = \emptyset$ in $V(\widehat{G}) \subset \mathbb{G}$, then $T$ has the free zero distribution.

(ii) If $y = w^{-1} \in FP_r(\widehat{G})$ in $\mathbb{G}$, then the free distribution of $T$ is characterized by the free-moment sequence

$$(\tau(T^n))_{n=1}^{\infty} = (1, 1, 1, 1, 1, \ldots).$$

(iii) If $y = w \in FP_r(\widehat{G})$ in $\mathbb{G}$, then the free distribution of $T$ is characterized by the only non-zero joint free moments of $\{T, T^*\}$,

$$\tau((T T^*)^n) = 1 = \tau((T^* T)^n), \quad \forall n \in \mathbb{N}.$$

(iv) If neither $y = w^{-1}$ nor $y = w$, and if $v_1 = x_2$ and $v_2 = x_1$ in $\mathbb{G}$, then the free distribution of $T$ is characterized by the joint free moments of $\{T, T^*\}$,

$$\tau\left(\prod_{l=1}^{n} T^{r_l}\right) = \begin{cases} 1 & \text{if } \sum_{l=1}^{n} e_l = 0, \\ 0 & \text{otherwise,} \end{cases}$$

for all $(r_1, \ldots, r_n) \in \{1, *\}^n$, where $e_l = 1$ if $r_l = 1$, and $e_l = -1$ if $r_l = *$, for $l = 1, \ldots, n$, for all $n \in \mathbb{N}$.

(v) If neither $y = w^{-1}$ nor $y = w$, and if $v_2 = x_1$ and $v_1 \neq x_2$ in $\mathbb{G}$, then the free distribution of $T$ is characterized by the only non-zero joint free moments of $\{T, T^*\}$,

$$\tau((T^* T)^n) = 1 = \tau((T T^*)^n), \quad \forall n \in \mathbb{N}.$$

Proof Under hypothesis, if the initial and terminal vertices $x_1$ and $x_2$ of $y \in FP_r(\widehat{G})$ are distinct, then

$$yw = (y x_2)(v_1 w) = y(x_2 v_1)w = \delta_{x_2,v_1}\, yw,$$

and

$$y w^{-1} = (y x_2)(v_2 w^{-1}) = y(x_2 v_2) w^{-1} = \delta_{x_2,v_2}\, y w^{-1},$$

in $\mathbb{G}$, under the axiom $0 \cdot u = \emptyset$ for all $u \in \mathbb{G}$, and hence,

$$\alpha_y(L_w) = L_y L_w = L_{yw} = \delta_{x_2,v_1} L_{yw},$$

and

$$\alpha_y(L_{w^{-1}}) = L_y L_{w^{-1}} = L_{y w^{-1}} = \delta_{x_2,v_2} L_{y w^{-1}},$$

respectively, implying that

$$T = \alpha_y(T_w) = \delta_{x_2,v_1} L_{yw} + \delta_{x_2,v_2} L_{y w^{-1}}, \tag{7.5}$$

in $(M_G, \tau)$.

If $\{v_1, v_2\} \cap \{x_1, x_2\} = \emptyset$ in $V(\widehat{G})$, then the vertices $\{v_1, v_2\}$ and the vertices $\{x_1, x_2\}$ are not mutually admissible, and hence,

$$x_n v_m = \emptyset \ \text{ in } \mathbb{G}, \quad \forall n, m = 1, 2,$$

implying that

$$\delta_{x_2,v_1} = 0 = \delta_{x_2,v_2}.$$

So, one has $T = 0_G + 0_G = 0_G$ in $(M_G, \tau)$ by (7.5). Therefore, the free random variable $T$ has the free zero distribution in $(M_G, \tau)$ in this case. Thus, the statement (i) holds true.

Assume now that $y = w^{-1}$ is the shadow of $w$ in $\mathbb{G}$, i.e., $yw = w^{-1} w = v_2$ in $\mathbb{G}$. Then, by (7.5), one has

$$T = \alpha_y(T_w) = L_{v_2} + 0 \cdot L_{y w^{-1}} = L_{v_2}$$

in $(M_G, \tau)$, since $w^{-1} = v_2 w^{-1} v_1$ with $v_1 \neq v_2$, guaranteeing that

$$(w^{-1})^k = \emptyset \ \text{ in } \mathbb{G}, \quad \text{for all } k \in \mathbb{N}_{>1} = \mathbb{N} \setminus \{1\}.$$

That is, if $y = w^{-1}$, then the resulting free random variable $T$ becomes the projection $L_{v_2}$ in $(M_G, \tau)$ for the vertex $v_2$. Therefore, by (6.1), the free distribution of $T$ is characterized by the free-moment sequence

$$(\tau(T^n))_{n=1}^{\infty} = (1, 1, 1, 1, 1, \ldots).$$

So, the statement (ii) holds.

Similarly to the previous case, suppose $y = w$ in $\mathbb{G}$. Then, by (7.5), we have

$$T = \alpha_y(T_w) = 0 \cdot L_{yw} + L_{v_1} = L_{v_1} \ \text{ in } (M_G, \tau).$$

So, the free distribution of $T$ is the free distribution of the projection $L_{v_1}$, characterized by the free-moment sequence

$$(\tau(T^n))_{n=1}^{\infty} = (1, 1, 1, 1, 1, \ldots),$$

by (6.1). Thus, the statement (iii) is satisfied.

Suppose neither $y = w^{-1}$ nor $y = w$ in $\mathbb{G}$, and suppose $\delta_{v_2,x_1} = 1 = \delta_{v_1,x_2}$. Then $yw$ and $w^{-1} y^{-1}$ are loops, but $y w^{-1} = \emptyset$ in $\mathbb{G}$. So, by (7.5),

$$T = L_{yw} + L_{y w^{-1}} = L_{yw} + L_{\emptyset} = L_{yw}$$

in $(M_G, \tau)$. Since $yw$ is a loop, the free distribution of $T = L_{yw}$ is characterized by the joint free-moment formula (6.3). Therefore, the statement (iv) holds.

Finally, assume neither $y = w^{-1}$ nor $y = w$, and $\delta_{v_2,x_1} = 1$, but $\delta_{v_1,x_2} = 0$. Then $y$ and $w$ are admissible, but $y$ and $w^{-1}$ are not admissible in $\mathbb{G}$; moreover, the reduced finite path $yw = x_1 y w v_2 \in FP_r(\widehat{G})$ is a non-loop reduced finite path in $\mathbb{G}$. Thus, one has

$$T = \alpha_y(T_w) = L_{yw} + 0 \cdot L_{\emptyset} = L_{yw}$$

in $(M_G, \tau)$, by (7.5), where $yw$ is a non-loop reduced finite path. Thus, the free distribution of $T$ is characterized by the only non-zero joint free moments of $\{T, T^*\}$,

$$\tau((T^* T)^n) = 1 = \tau((T T^*)^n), \quad \forall n \in \mathbb{N},$$

by (6.2). This shows that the statement (v) holds. □
The above theorem fully characterizes how the original free distributions (7.1) of the $w$-radial operators $T_w \in (M_G, \tau)$ are deformed by the action of the Banach-space operators $\alpha_y \in B(M_G)$, where $w, y \in FP_r(\widehat{G})$ are "non-loop" reduced finite paths of $\mathbb{G}$, case by case from (i) through (v).

Theorem 7.4 Let $w = vwv \in FP_r(\widehat{G})$ with $v \in V(\widehat{G})$ and $T_w \in (M_G, \tau)$ the corresponding $w$-radial operator, and let $y = xyx \in FP_r(\widehat{G})$ be a reduced finite path with $x \in V(\widehat{G})$, inducing the $y$-operator $\alpha_y \in B(M_G)$, and $T = \alpha_y(T_w) \in (M_G, \tau)$. Then

(i) If $v \neq x$ in $V(\widehat{G})$, then $T$ has the free zero distribution in $(M_G, \tau)$.

(ii) If $x = v$ in $V(\widehat{G})$, then $T$ has a non-zero free distribution, but it is not semicircular in $(M_G, \tau)$.

Proof Similar to (7.5), one obtains that

$$T = \alpha_y(T_w) = \delta_{x,v} L_{yw} + \delta_{x,v} L_{y w^{-1}}. \tag{7.6}$$

If $\delta_{v,x} = 0$, equivalently, if $y$ and $w$ (and hence, $y$ and $w^{-1}$) are not admissible in $\mathbb{G}$ (because $w$ is a loop), then the free random variable $T$ has the free zero distribution, since it is identical to the zero element $0_G$ in $(M_G, \tau)$ by (7.6). This shows that the statement (i) holds.

Assume that $\delta_{v,x} = 1$, equivalently, that $y$ and $w$ (and hence, $y$ and $w^{-1}$) are admissible in $\mathbb{G}$ (because $w$ is a loop), and suppose $y = w^{-1}$ in $\mathbb{G}$. Then

$$T = L_v + L_{w^{-2}} \ \text{ in } (M_G, \tau),$$

by (7.6). So, it is not self-adjoint, since the summand $L_{w^{-2}}$ is not self-adjoint in $M_G$. So, this free random variable $T$ cannot be semicircular in $(M_G, \tau)$. Observe that

$$\tau(T) = \varphi(L_v) = 1,$$

showing that the free distribution of $T$ is non-zero in $(M_G, \tau)$.

Assume now that $\delta_{v,x} = 1$, and suppose $y \neq w^{-1}$ in $\mathbb{G}$. Then

$$T = L_{yw} + L_{y w^{-1}} \in (M_G, \tau),$$

by (7.6), where both $yw$ and $y w^{-1}$ are loops in $FP_r(\widehat{G}) \subset \mathbb{G}$. Note in this case that

$$y w^{-1} \neq w^{-1} y^{-1} = (yw)^{-1} \ \text{ in } \mathbb{G},$$

and hence, $T \neq T_{yw}$ in $M_G$; moreover, it is not self-adjoint in $M_G$, since

$$T^* = L_{w^{-1} y^{-1}} + L_{w y^{-1}} \neq T \ \text{ in } M_G,$$

directly implying that $T$ cannot be semicircular in $(M_G, \tau)$. Observe now that

$$T^* T = L_{w^{-1} y^{-1} y w} + L_{w^{-1} y^{-1} y w^{-1}} + L_{w y^{-1} y w} + L_{w y^{-1} y w^{-1}} = L_v + L_{w^{-2}} + L_{w^2} + L_v = 2 L_v + T_{w^2},$$

where $T_{w^2}$ is the $w^2$-radial operator of the loop $w^2 \in \mathbb{G}$. It shows that

$$\tau(T^* T) = \varphi(2 L_v) = 2,$$

showing that the free random variable $T$ has a non-zero free distribution in $(M_G, \tau)$. Therefore, if $\delta_{x,v} = 1$, then the free distribution of $T$ is non-zero, but it is not the semicircular law in $(M_G, \tau)$. So, the statement (ii) holds. □

The above theorem shows that the semicircular law of a semicircular $w$-radial operator $T_w$ induced by a loop $w \in \mathbb{G}$ is completely distorted by the action of the Banach-space operators $\alpha_y \in B(M_G)$ of loops $y \in FP_r(\widehat{G}) \subset \mathbb{G}$. That is, if $w, y \in FP_r(\widehat{G})$ are loops of $\mathbb{G}$, then the free random variable $\alpha_y(T_w)$ cannot be semicircular, even though $T_w$ is semicircular in $(M_G, \tau)$.
8 Acting Graph Groupoids on Graph $C^*$-Probability Spaces III

In Sections 6 and 7, we considered how the Banach-space operators $\{\alpha_w\}_{w \in \mathbb{G}} \subset B(M_G)$ deform the free probability on the graph $C^*$-probability space $(M_G, \tau)$, where $G$ is a given (connected finite directed) graph (having more than one vertex) with its graph groupoid $\mathbb{G}$, and where $\alpha : \mathbb{G} \to B(M_G)$ is the groupoid action (6.5). In this section, we consider a different groupoid action,

$$\beta : \mathbb{G} \to B(M_G),$$

defined by

$$\beta_w(T) = T L_w, \quad \forall T \in M_G, \tag{8.1}$$

for all $w \in \mathbb{G}$. By definition, it is not hard to show that it is an "intertwining" (or, right) groupoid action satisfying

$$\beta_{w_1 w_2}(T) = T L_{w_1 w_2} = T L_{w_1} L_{w_2} = \beta_{w_2} \beta_{w_1}(T),$$

for all $T \in M_G$, i.e.,

$$\beta_{w_1 w_2} = \beta_{w_2} \beta_{w_1}, \quad \text{for all } w_1, w_2 \in \mathbb{G}. \tag{8.2}$$

Note the order of products in (8.2).

Observation. The triple $(\mathbb{G}, M_G, \beta)$ is a well-defined groupoid (intertwining) dynamical system by (8.2), where $\beta$ is the intertwining groupoid action (8.1).

By the action $\beta$ of $\mathbb{G}$, the free probability on $(M_G, \tau)$ is deformed, and such deformations would be similar to those we considered in Sections 6 and 7. We now define a new action $\gamma : \mathbb{G} \to B(M_G)$ by

$$\gamma_w(T) = L_w T L_{w^{-1}} = L_w T L_w^*, \quad \forall T \in M_G. \tag{8.3}$$

Then it indeed satisfies

$$\gamma_{w_1 w_2}(T) = L_{w_1 w_2}\, T\, L_{w_2^{-1} w_1^{-1}} = \gamma_{w_1} \gamma_{w_2}(T),$$

for all $T \in M_G$, implying that

$$\gamma_{w_1 w_2} = \gamma_{w_1} \gamma_{w_2} \ \text{ on } M_G, \quad \forall w_1, w_2 \in \mathbb{G}. \tag{8.4}$$

Proposition 8.1 The triple $(\mathbb{G}, M_G, \gamma)$ is a well-defined groupoid dynamical system of $\mathbb{G}$ acting on $M_G$ via the groupoid action $\gamma$ of (8.3).

Proof For all $w \in \mathbb{G}$, the images $\gamma_w$ of the morphism $\gamma$ of (8.3) become well-defined Banach-space operators acting on the graph-groupoid algebra $M_G$ (by regarding $M_G$ as a Banach space). Moreover, the relation (8.4) holds in $B(M_G)$. □
By definition, for all $w \in \mathbb{G}$,

$$\gamma_w = \alpha_w \circ \beta_{w^{-1}} = \beta_{w^{-1}} \circ \alpha_w \ \text{ on } M_G,$$

where $\alpha$ and $\beta$ are in the sense of (6.5) and (8.1), respectively. We are interested in how the groupoid action $\gamma$ of (8.3) affects the free probability on $(M_G, \tau)$.

Theorem 8.2 Let $v \in V(\widehat{G})$ be a vertex of $\mathbb{G}$ inducing a Banach-space operator $\gamma_v \in B(M_G)$, and let $w \in \mathbb{G}$ and $L_w \in (M_G, \tau)$. Suppose $W = \gamma_v(L_w)$ in $(M_G, \tau)$.

(i) If $w \in V(\widehat{G})$, then the free distribution of $W$ is characterized by the free-moment sequence

$$(\tau(W^n))_{n=1}^{\infty} = (\delta_{v,w}, \delta_{v,w}, \delta_{v,w}, \delta_{v,w}, \delta_{v,w}, \ldots).$$

(ii) If $w = v_1 w v_2 \in FP_r(\widehat{G})$ with $v_1 \neq v_2 \in V(\widehat{G})$, then $W$ has the free zero distribution.

(iii) If $w = vwv \in FP_r(\widehat{G})$, then the free distribution of $W$ is characterized by the joint free moments (6.3) of $\{W, W^*\}$.

(iv) If $w = xwx \in FP_r(\widehat{G})$ with $x \neq v \in V(\widehat{G})$, then $W$ has the free zero distribution.

Proof Under hypothesis, one has

$$W = \gamma_v(L_w) = L_v L_w L_v, \quad \forall w \in \mathbb{G},$$

in $(M_G, \tau)$. So, if $w$ is a vertex in $\mathbb{G}$, then

$$W = \delta_{v,w} L_v \ \text{ in } (M_G, \tau),$$

and hence, the statement (i) holds.

Now, suppose a reduced finite path $w = v_1 w v_2$ is a "non-loop" element of $\mathbb{G}$ with distinct vertices $v_1$ and $v_2$. Then

$$W = L_v L_w L_v = L_{v v_1 w v_2 v} = \delta_{v,v_1} \delta_{v_2,v} L_w$$

in $M_G$. Note that, since $v_1 \neq v_2$, we have

$$\delta_{v,v_1} = 1 \Longrightarrow \delta_{v,v_2} = 0, \quad \text{and} \quad \delta_{v_2,v} = 1 \Longrightarrow \delta_{v,v_1} = 0,$$

and hence,

$$\delta_{v,v_1} \delta_{v_2,v} = 0,$$

implying that the free random variable $W$ is the zero element $0_G$ of $(M_G, \tau)$, whose free distribution is the zero one. Thus, the statement (ii) holds.

Now, suppose a reduced finite path $w = vwv$ is a loop in $\mathbb{G}$ having initial-and-terminal vertex $v$. Then

$$W = L_v L_w L_v = L_{vwv} = L_w \ \text{ in } (M_G, \tau).$$

So, in this case, the free distribution of $W$ is the free distribution of $L_w$, characterized by the joint free moments (6.3) of $\{L_w, L_{w^{-1}}\}$. Therefore, the statement (iii) is shown.

Assume that $w = xwx$ is a loop with initial-and-terminal vertex $x$, which is distinct from the vertex $v$. Then

$$W = L_v L_w L_v = L_{(vx)w(xv)} = L_{\emptyset} = 0_G$$

in $(M_G, \tau)$. So, this free random variable $W$ has the free zero distribution. Thus, the statement (iv) is proven. □

The above theorem fully characterizes how the Banach-space operators $\{\gamma_v\}_{v \in V(\widehat{G})} \subset B(M_G)$ deform the original free-distributional data on $(M_G, \tau)$.
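The four cases of Theorem 8.2 reduce to one check: the product $v w v$ is non-empty exactly when $v$ is both the initial and the terminal vertex of $w$. The following Python sketch illustrates this under a toy model that is assumed here for illustration only (elements as hypothetical triples of initial vertex, word, terminal vertex, with `None` for the empty word).

```python
from typing import Optional, Tuple

# Toy model (an assumption, not the paper's construction): an element is
# (initial vertex, word, terminal vertex); a vertex has an empty word.
Element = Tuple[str, Tuple[str, ...], str]


def conjugate_by_vertex(v: str, w: Element) -> Optional[Element]:
    """Index of gamma_v(L_w) = L_v L_w L_v: the product v.w.v, or None if empty."""
    initial, _, terminal = w
    return w if (initial == v and terminal == v) else None


same_vertex = ("v", (), "v")            # case (i) with w = v
other_vertex = ("x", (), "x")           # case (i) with w != v
non_loop = ("v", ("e",), "u")           # case (ii): v1 != v2
loop_at_v = ("v", ("a", "b"), "v")      # case (iii): loop based at v
loop_at_x = ("x", ("c",), "x")          # case (iv): loop based at x != v

for w in (same_vertex, other_vertex, non_loop, loop_at_v, loop_at_x):
    print(conjugate_by_vertex("v", w))
```

A `None` result corresponds to $W = 0_G$ and hence to the free zero distribution; any other result means $W = L_w$, so the free distribution of $L_w$ is preserved.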
Theorem 8.3 Let $w = v_1 w v_2 \in FP_r(\widehat{G})$ with $v_1 \neq v_2 \in V(\widehat{G})$, and $\gamma_w \in B(M_G)$ the corresponding Banach-space operator induced by the action $\gamma$ of (8.3). Let $y \in \mathbb{G}$ satisfy $y \neq w^{-1}$, and $L_y \in (M_G, \tau)$, a generating free random variable. Suppose $T = \gamma_w(L_y)$ in $(M_G, \tau)$.

(i) If $y \in V(\widehat{G})$, then the free distribution of $T$ is characterized by the free-moment sequence

$$(\tau(T^n))_{n=1}^{\infty} = (\delta_{y,v_2}, \delta_{y,v_2}, \delta_{y,v_2}, \delta_{y,v_2}, \ldots).$$

(ii) If $y = x_1 y x_2 \in FP_r(\widehat{G})$ with $x_1 \neq x_2 \in V(\widehat{G})$, then $T$ has the free zero distribution.

(iii) If $y = xyx \in FP_r(\widehat{G})$ with $x \in V(\widehat{G})$, and if $x = v_2$, then the free distribution of $T$ is characterized by the joint free moments (6.3) of $\{T, T^*\}$.

(iv) If $y = xyx \in FP_r(\widehat{G})$ with $x \in V(\widehat{G})$, and if $x \neq v_2$, then $T$ has the free zero distribution.

Proof Recall that we assumed $y \neq w^{-1}$ in $\mathbb{G}$. If $w = v_1 w v_2$ is a "non-loop" reduced finite path of $\mathbb{G}$ with two distinct vertices $v_1$ and $v_2$, then, for all $y \in \mathbb{G}$,

$$T = \gamma_w(L_y) = L_w L_y L_{w^{-1}} = L_{w y w^{-1}}$$

in $(M_G, \tau)$. Note that

$$w y w^{-1} = w(v_2 y v_2) w^{-1} \ \text{ in } \mathbb{G}.$$

Assume first that $y \in V(\widehat{G})$. Then, one has

$$w y w^{-1} = \delta_{y,v_2}\, w w^{-1} = \delta_{y,v_2}\, v_1 \ \text{ in } \mathbb{G},$$

under the axiom $0 \cdot u = \emptyset$ for all $u \in \mathbb{G}$. So,

$$T = \delta_{y,v_2} L_{v_1} \in (M_G, \tau),$$

and hence, the free distribution of $T$ is characterized by the free-moment sequence

$$(\tau(T^n))_{n=1}^{\infty} = (\delta_{y,v_2}, \delta_{y,v_2}, \delta_{y,v_2}, \ldots)$$

in $(M_G, \tau)$. This proves the statement (i).

Suppose now that $y = x_1 y x_2$ is a non-loop reduced finite path with distinct vertices $x_1$ and $x_2$ in $\mathbb{G}$. Then

$$w y w^{-1} = w(v_2 x_1 y x_2 v_2) w^{-1} = \delta_{v_2,x_1} \delta_{x_2,v_2}\, w y w^{-1}$$

in $\mathbb{G}$, implying that

$$T = \delta_{v_2,x_1} \delta_{x_2,v_2} L_{w y w^{-1}} \in (M_G, \tau).$$

Since $x_1 \neq x_2$ in $V(\widehat{G})$, one can verify that

$$\delta_{v_2,x_1} \delta_{x_2,v_2} = 0,$$

concluding that $T = 0 \cdot L_{w y w^{-1}} = 0_G$ in $M_G$. So, this free random variable $T$ has the free zero distribution in $(M_G, \tau)$. Therefore, the statement (ii) holds true.

Now, assume that $y = xyx$ is a loop with initial-and-terminal vertex $x$ in $\mathbb{G}$. Then

$$w y w^{-1} = w(v_2 x y x v_2) w^{-1} = \delta_{v_2,x}\, w y w^{-1}$$

in $\mathbb{G}$. It shows that if $v_2 \neq x$, then $w y w^{-1} = \emptyset$; meanwhile, if $v_2 = x$, then $w y w^{-1}$ is a loop in $\mathbb{G}$ with initial-and-terminal vertex $v_1$. So, if $v_2 = x$, then

$$T = L_{w y w^{-1}} \in (M_G, \tau),$$

a generating free random variable induced by the loop $w y w^{-1}$, having the free-distributional data (6.3). So, the statement (iii) holds. If instead $v_2 \neq x$, then

$$T = 0 \cdot L_{w y w^{-1}} = 0_G \ \text{ in } (M_G, \tau),$$

whose free distribution is the zero distribution in $(M_G, \tau)$. Therefore, the statement (iv) is shown. □

The above theorem characterizes how the $w$-operators $\gamma_w$ for non-loop reduced finite paths $w$ of $\mathbb{G}$ affect the original free-distributional data on $(M_G, \tau)$.

Theorem 8.4 Let $w = vwv \in FP_r(\widehat{G})$ with $v \in V(\widehat{G})$ and $\gamma_w \in B(M_G)$ the corresponding Banach-space operator. Let $y \in \mathbb{G}$ satisfy $y \neq w^{-1}$, and $L_y \in (M_G, \tau)$. Suppose $T = \gamma_w(L_y)$ in $(M_G, \tau)$.

(i) If $y \in V(\widehat{G})$, then the free distribution of $T$ is characterized by the free-moment sequence

$$(\tau(T^n))_{n=1}^{\infty} = (\delta_{y,v}, \delta_{y,v}, \delta_{y,v}, \delta_{y,v}, \ldots).$$

(ii) If $y = x_1 y x_2 \in FP_r(\widehat{G})$ with $x_1 \neq x_2 \in V(\widehat{G})$, then $T$ has the free zero distribution.

(iii) If $y = xyx \in FP_r(\widehat{G})$ with $x \in V(\widehat{G})$, and if $x = v$, then the free distribution of $T$ is characterized by the joint free moments (6.3) of $\{T, T^*\}$.

(iv) If $y = xyx \in FP_r(\widehat{G})$ with $x \in V(\widehat{G})$, and if $x \neq v$, then $T$ has the free zero distribution.

Proof Let $w = vwv$ be a loop with initial-and-terminal vertex $v$ in $\mathbb{G}$, and $\gamma_w \in B(M_G)$ the corresponding Banach-space operator. If $y \in V(\widehat{G})$ is a vertex of $\mathbb{G}$, then

$$w y w^{-1} = w(v y v) w^{-1} = \delta_{y,v}\, w w^{-1} = \delta_{y,v}\, v$$

in $\mathbb{G}$, implying that

$$T = \gamma_w(L_y) = L_{w y w^{-1}} = \delta_{y,v} L_v \in (M_G, \tau).$$

Thus, the statement (i) holds.

If now $y = x_1 y x_2$ is a non-loop reduced finite path with distinct vertices $x_1$ and $x_2$ in $\mathbb{G}$, then

$$w y w^{-1} = w(v x_1 y x_2 v) w^{-1} = \delta_{v,x_1} \delta_{x_2,v}\, w y w^{-1}$$

in $\mathbb{G}$, implying that

$$T = \delta_{v,x_1} \delta_{x_2,v} L_{w y w^{-1}} \in (M_G, \tau).$$

Since $x_1 \neq x_2$, one has $\delta_{v,x_1} \delta_{x_2,v} = 0$, and hence, $T = 0_G$ in $M_G$, having the free zero distribution in $(M_G, \tau)$. This proves the statement (ii).

Now, suppose that $y = xyx$ is a loop with initial-and-terminal vertex $x$ in $\mathbb{G}$. Then

$$w y w^{-1} = w(v x y x v) w^{-1} = \delta_{v,x}\, w y w^{-1}$$

in $\mathbb{G}$, implying that

$$T = \delta_{v,x} L_{w y w^{-1}} \in (M_G, \tau),$$

the generating free random variable for the loop $w y w^{-1}$ of $\mathbb{G}$. So, if $v = x$, equivalently, $\delta_{v,x} = 1$, then the free distribution of $T$ is characterized by the joint free moments (6.3) of $\{L_{w y w^{-1}}, L_{w y^{-1} w^{-1}}\}$; meanwhile, if $v \neq x$, equivalently, $\delta_{v,x} = 0$, then $T$ has the free zero distribution in $(M_G, \tau)$. This proves the statements (iii) and (iv), respectively. □

The above theorem characterizes how the $w$-operators $\gamma_w \in B(M_G)$ for loops $w$ deform the original free-probabilistic information on $(M_G, \tau)$.
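The conjugation $y \mapsto w y w^{-1}$ used in Theorems 8.3 and 8.4 can be traced in a small Python sketch. As before, this is a toy model assumed only for illustration: elements are hypothetical triples (initial vertex, word of signed edges, terminal vertex), `None` is the empty word, and products are reduced by cancelling an edge against its shadow.

```python
from typing import Optional, Tuple

Letter = Tuple[str, int]  # (edge name, +1 for the edge, -1 for its shadow)
Element = Tuple[str, Tuple[Letter, ...], str]


def inverse(w: Element) -> Element:
    """Shadow w^{-1}: reverse the word, flip the signs, swap the end vertices."""
    initial, word, terminal = w
    return (terminal, tuple((e, -s) for e, s in reversed(word)), initial)


def multiply(a: Optional[Element], b: Optional[Element]) -> Optional[Element]:
    """Admissible product with reduction; None if a does not end where b starts."""
    if a is None or b is None or a[2] != b[0]:
        return None
    word = list(a[1])
    for edge, sign in b[1]:
        if word and word[-1] == (edge, -sign):
            word.pop()
        else:
            word.append((edge, sign))
    return (a[0], tuple(word), b[2])


def conjugate(w: Element, y: Element) -> Optional[Element]:
    """Index of gamma_w(L_y) = L_w L_y L_{w^{-1}}, i.e. the product w.y.w^{-1}."""
    return multiply(multiply(w, y), inverse(w))


w = ("v1", (("e", 1),), "v2")                 # non-loop path from v1 to v2
print(conjugate(w, ("v2", (), "v2")))         # vertex y = v2: gives the vertex v1
print(conjugate(w, ("v1", (), "v1")))         # vertex y != v2: empty word
print(conjugate(w, ("v2", (("f", 1),), "v3")))  # non-loop y: empty word
print(conjugate(w, ("v2", (("g", 1),), "v2")))  # loop at v2: loop based at v1
print(conjugate(w, ("v3", (("h", 1),), "v3")))  # loop elsewhere: empty word
```

The five printed results mirror statements (i) through (iv) of Theorem 8.3 in this model: only a vertex or a loop based at the terminal vertex $v_2$ of $w$ survives the conjugation.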
9 Deformed Semicircular Laws on $(M_G, \tau)$ by Acting G

In this section, motivated by the main results of Section 8, we consider deformed semicircular laws on the graph $C^*$-probability space $(M_G, \tau)$ of a graph $G$ under the action $\gamma$ of the graph groupoid $\mathbb{G}$, where $\gamma$ is in the sense of (8.3). In this section, we assume the graph groupoid $\mathbb{G}$ of $G$ contains a loop

$$w_0 = v_0 w_0 v_0 \in FP_r(\widehat{G})$$

with initial-and-terminal vertex $v_0 \in V(\widehat{G})$, and fix it. Then, the corresponding $w_0$-radial operator, denoted by

$$T_0 = T_{w_0} = L_{w_0} + L_{w_0^{-1}} \in (M_G, \tau),$$

is semicircular by (7.2).

Theorem 9.1 Let $v \in V(\widehat{G})$ be a vertex of $\mathbb{G}$ and $\gamma_v \in B(M_G)$, and let $W = \gamma_v(T_0)$ be a free random variable of $(M_G, \tau)$. Then

$$\tau(W^n) = \delta_{v_0,v}\, \omega_n c_{n/2}, \quad \text{for all } n \in \mathbb{N}. \tag{9.1}$$

Proof Under hypothesis, observe that

$$W = \gamma_v(T_0) = L_v T_0 L_v = L_{v w_0 v} + L_{v w_0^{-1} v}$$

in $M_G$. Note that

$$v w_0 v = (v v_0) w_0 (v_0 v) = \delta_{v_0,v}\, w_0,$$

and

$$v w_0^{-1} v = (v v_0) w_0^{-1} (v_0 v) = \delta_{v_0,v}\, w_0^{-1},$$

in $\mathbb{G}$, implying that

$$W = \delta_{v_0,v} L_{w_0} + \delta_{v_0,v} L_{w_0^{-1}} = \delta_{v_0,v} T_{w_0}$$

in $(M_G, \tau)$. Since $\delta_{v_0,v} \in \{0, 1\}$, the free random variable $W$ is self-adjoint in $M_G$, and hence, the free distribution is characterized by its free moments,

$$\tau(W^n) = \tau((\delta_{v_0,v} T_{w_0})^n) = \delta_{v_0,v}\, \tau(T_0^n), \quad \forall n \in \mathbb{N}.$$

Therefore, by the semicircularity of $T_0$, the free-distributional data (9.1) is obtained. □

By Theorem 9.1, one obtains the following corollary.

Corollary 9.2 Let $\gamma_v \in B(M_G)$ be the $v$-operator for a vertex $v \in V(\widehat{G})$. Then the free distribution of $W = \gamma_v(T_0)$ is either the free zero distribution or the semicircular law. In particular, it is semicircular if and only if $v = v_0$ in $V(\widehat{G})$.

Proof It is proven by (9.1). □
Now, consider the cases where we have the Banach-space operators $\{\gamma_w\}_{w \in FP_r(\widehat{G})}$ of $B(M_G)$.

Theorem 9.3 Let $w = x_1 w x_2 \in FP_r(\widehat{G})$ be a reduced finite path with $x_1, x_2 \in V(\widehat{G})$, satisfying both $w \neq w_0$ and $w \neq w_0^{-1}$ in $\mathbb{G}$, and let $\gamma_w \in B(M_G)$ be the $w$-operator. If $T = \gamma_w(T_0)$ in $(M_G, \tau)$, then

$$\tau(T^n) = \delta_{v_0,x_2}\, \omega_n c_{n/2}, \quad \text{for all } n \in \mathbb{N}. \tag{9.2}$$

Proof Under hypothesis, observe that

$$T = \gamma_w(T_0) = \gamma_w(L_{w_0} + L_{w_0^{-1}}) = L_w L_{w_0} L_{w^{-1}} + L_w L_{w_0^{-1}} L_{w^{-1}} = L_{w w_0 w^{-1}} + L_{w w_0^{-1} w^{-1}},$$

i.e.,

$$T = L_{w w_0 w^{-1}} + L_{w w_0^{-1} w^{-1}} \in (M_G, \tau), \tag{9.3}$$

and

$$w w_0 w^{-1} = w(x_2 v_0) w_0 (v_0 x_2) w^{-1} = \delta_{v_0,x_2}\, w w_0 w^{-1},$$

and

$$w w_0^{-1} w^{-1} = w(x_2 v_0) w_0^{-1} (v_0 x_2) w^{-1} = \delta_{v_0,x_2}\, w w_0^{-1} w^{-1},$$

in $\mathbb{G}$, since $w^{-1} = x_2 w^{-1} x_1$. So, more precisely than (9.3), one has

$$T = \delta_{v_0,x_2} \left( L_{w w_0 w^{-1}} + L_{w w_0^{-1} w^{-1}} \right) \in (M_G, \tau). \tag{9.4}$$

Note now that

$$(w w_0 w^{-1})^{-1} = w w_0^{-1} w^{-1} \ \text{ in } \mathbb{G},$$

and hence,

$$L_{w w_0 w^{-1}}^* = L_{(w w_0 w^{-1})^{-1}} = L_{w w_0^{-1} w^{-1}}$$

in $M_G$, implying that

$$T = \delta_{v_0,x_2} \left( L_{w w_0 w^{-1}} + L_{w w_0 w^{-1}}^* \right) = \delta_{v_0,x_2}\, T_{w w_0 w^{-1}}$$

in $M_G$, by (9.4). Thus, this operator $T$ is self-adjoint in $M_G$. Furthermore, if it is non-zero, equivalently, if $\delta_{v_0,x_2} = 1$, equivalently, if $v_0 = x_2$ in $V(\widehat{G})$, then $T$ is identified with the $(w w_0 w^{-1})$-radial operator $T_{w w_0 w^{-1}}$ in $(M_G, \tau)$.

Note also that if it is non-empty, then

$$w w_0 w^{-1} = x_1 (w w_0 w^{-1}) x_1 \ \text{ in } \mathbb{G},$$

and hence, it is a loop (because we assumed that $w \neq w_0$ and $w \neq w_0^{-1}$) with initial-and-terminal vertex $x_1$. Thus, one can conclude that

$$\tau(T^n) = \delta_{v_0,x_2}\, \tau\left(T_{w w_0 w^{-1}}^n\right) = \delta_{v_0,x_2}\, \omega_n c_{n/2},$$

for all $n \in \mathbb{N}$. Therefore, the free-distributional data (9.2) holds. □

The above theorem shows how the Banach-space operators

$$\left\{ \gamma_w : w \in FP_r(\widehat{G}) \setminus \{w_0^{\pm 1}\} \right\} \subset B(M_G)$$

deform the semicircular law induced by a loop $w_0$ of $\mathbb{G}$ in $(M_G, \tau)$. In the above theorem, note that we did not impose any conditions on the vertices $x_1$ and $x_2$ of a given reduced finite path $w$. As we have seen above, our proof covers both cases, $x_1 \neq x_2$ and $x_1 = x_2$. Note also that the assumption that neither $w = w_0$ nor $w = w_0^{-1}$ is crucial there.

Corollary 9.4 Let $w = x_1 w x_2 \in FP_r(\widehat{G}) \setminus \{w_0^{\pm 1}\}$ with $x_1, x_2 \in V(\widehat{G})$ (which are not necessarily distinct), and $\gamma_w \in B(M_G)$, the $w$-operator. Then the free distribution of the free random variable $T = \gamma_w(T_0)$ is either the semicircular law or the free zero distribution in $(M_G, \tau)$. In particular, $T$ is semicircular if and only if $x_2 = v_0$ in $V(\widehat{G})$.

Proof The proof is done by (9.2). □
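The deformed moment sequences of Theorems 9.1 and 9.3 share one shape: the semicircular moments $\omega_n c_{n/2}$ multiplied by a Kronecker delta comparing $v_0$ with the terminal vertex of the acting element. A brief Python sketch evaluating this formula is given below; the function and parameter names are illustrative assumptions, and the vertex labels are made up.

```python
from math import comb


def catalan(k: int) -> int:
    """k-th Catalan number c_k = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def semicircular_moment(n: int) -> int:
    """omega_n * c_{n/2}: zero for odd n, a Catalan number for even n."""
    return catalan(n // 2) if n % 2 == 0 else 0


def deformed_moment(base_vertex: str, terminal_vertex: str, n: int) -> int:
    """delta_{v0, x2} * omega_n * c_{n/2}, as in (9.1) and (9.2).

    base_vertex is v0, the base vertex of the fixed loop w0; terminal_vertex is
    the terminal vertex x2 of the acting path w (for a vertex v, it is v itself).
    """
    delta = 1 if base_vertex == terminal_vertex else 0
    return delta * semicircular_moment(n)


print([deformed_moment("v0", "v0", n) for n in range(1, 9)])  # [0, 1, 0, 2, 0, 5, 0, 14]
print([deformed_moment("v0", "u", n) for n in range(1, 9)])   # [0, 0, 0, 0, 0, 0, 0, 0]
```

The first sequence is the semicircular law and the second is the free zero distribution, which are the only two outcomes allowed by Corollaries 9.2 and 9.4.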
Acknowledgements Many thanks to our TeX-pert for developing this class file.
References

1. Barndorff-Nielsen, O. E. (2006). Classical and free infinite divisibility and Lévy processes. Quantum independent increment processes II. Lecture Notes in Mathematics, 1866 (pp. 33–159). Springer
2. Barndorff-Nielsen, O. E., & Thorbjørnsen, S. (2005). The Lévy-Itô decomposition in free probability. Probability Theory and Related Fields, 131(2), 197–228
3. Bartholdi, L., Grigorchuk, R., & Nekrashevych, V. (2002). From fractal groups to fractal sets. arXiv:math.GR/0202001v4, Preprint
4. Cho, I. (2002). Semicircular elements induced by connected finite graphs. Preprint
5. Cho, I. (2007). Graph von Neumann algebras. ACTA Applied Mathematics, 95, 95–135
6. Cho, I. (2010). Frames, fractals and radial operators in Hilbert space. Journal of Mathematical Science: Advances and Application, 5(2), 333–393
7. Cho, I. (2011). Fractal properties in B(H) induced by partial isometries. Complex Analysis and Operator Theory, 5(1), 1–40
8. Cho, I. (2017). Free semicircular families in free product Banach *-algebras induced by p-adic number fields $\mathbb{Q}_p$ over primes p. Complex Analysis and Operator Theory, 11(3), 507–565
9. Cho, I. (2019a). Semicircular-like, and semicircular laws on Banach *-probability spaces induced by dynamical systems of the finite adele ring. Advances in Operator Theory, 4(1), 24–70
10. Cho, I. (2019b). Banach-space operators acting on semicircular elements induced by orthogonal projections. Complex Analysis and Operator Theory, 13(8), 4065–4115
11. Cho, I., & Jorgensen, P. E. T. (2008). C*-subalgebras generated by partial isometries. Journal of Applied Mathematics and Computing, 26, 1–48
12. Cho, I., & Jorgensen, P. E. T. (2009). C*-subalgebras generated by a single operator in B(H). ACTA Applied Mathematics, 108, 625–664
13. Cho, I., & Jorgensen, P. E. T. (2019). Deformations of semicircular and circular laws via p-adic number fields $\mathbb{Q}_p$ and sampling of primes. Opuscula Mathematica, 39(6), 771–811
14. Cho, I., & Jorgensen, P. E. T. (2020). Certain *-homomorphisms on C*-algebras and sequences of semicircular elements: A Banach space view. Illinois Journal of Mathematics, To Appear
15. Connes, A. (1992). Noncommutative geometry. Lecture Note in Mathematics, Mathematics Research Today and Tomorrow (Barcelona) (Vol. 1525, pp. 40–58), MR:1247054. Springer
16. Dicks, W., & Ventura, E. (1996). The group fixed by a family of injective endomorphisms of a free group. Contemporary Mathematics, 195, AMS
17. Dutkay, D. E., & Jorgensen, P. E. T. (2005). Iterated function systems, Ruelle operators and invariant projective measures. arXiv:math.DS/0501077/v3, Preprint
18. Exel, R. (2005). A new look at the crossed-product of a C*-algebra by a semigroup of endomorphisms. Preprint
19. Ghosh, A., Boyd, S., & Saberi, A. (2008). Minimizing effective resistance of a graph. SIAM Review, 50(1), 37–66
20. Gibbons, A., & Novak, L. (1999). Hybrid graph theory and network analysis. Cambridge University Press. ISBN: 0-521-46117-0
21. Gill, A. (1962). Introduction to the theory of finite-state machines (MR0209083). McGraw-Hill Book Co
22. Gilman, R., Shpilrain, V., & Myasnikov, A. G. (Eds.), (2001). Computational and statistical group theory. Contemporary Mathematics, 298, AMS
23. Guido, D., Isola, T., & Lapidus, M. L. (2006). A trace on fractal graphs and the Ihara zeta function. arXiv:math.OA/0608060v1, Preprint
24. Jorgensen, P. E. T. (2005). Use of operator algebras in the analysis of measures from wavelets and iterated function systems. Preprint
25. Jorgensen, P. E. T., Schmitt, L. M., & Werner, R. F. (1994). q-Canonical commutation relations and stability of the Cuntz algebra. Pacific Journal of Mathematics, 165(1), 131–151 26. Jorgensen, P. E. T., Song, M. (2007). Entropy encoding, Hilbert spaces, and Kahunen-Loeve transforms. Journal of Mathematics Physics, 48(10), 103503 27. Kigami, J., Strichartz, R. S., & Walker, K. C. (2001). Constructing a laplacian on the diamond fractal. Experimental Mathematics, 10(3), 437–448 28. Kribs, D.W. (2005). Quantum causal histories and the directed graph operator framework. arXiv:math.OA/0501087v1, Preprint 29. Kribs, D.W., & Jury, M.T. (2003). Ideal structure in free semigroupoid algebras from directed graphs. arXiv:math/0309397, Preprint 30. Kucherenko, I. V. (2007). On the Structurization of a class of reversible cellular automata. Discrete Mathematics, 19(3), 102–121 31. Lind, D. A. (1987). Entropies of automorphisms of a topological markov shift. Proceedings of the American Mathematical Society, 99(3), 589–595 32. Lind, D. A., & Marcus, B. (1995). An introduction to symbolic dynamics and coding. Cambridge University Press 33. Liu, W. (2021). Relations between Convolutions and Transforms in Operator-Valued Free Probability. Advances in Mathematics, 390, 48. Paper No. 107949 34. Marshall, C. W. (1971). Applied graph theory. John Wiley & Sons. ISBN: 0-471-57300-0 35. Mitchener, P. D. (2005). C-categories, groupoid actions, equivalent KK-theory, and the Baum-Connes conjecture. arXiv:math.KT/0204291v1, Preprint 36. Myasnikov, A. G., & Shapilrain, V. (Eds.), (2003). Group theory, statistics and cryptography. Contemporary Mathematics, 360, AMS 37. Opper, M., & Cakmak, B. (2020). Understanding the dynamics of message passing algorithm; A free probability heuristics. ACTA Physica Polonica B, 51(7), 1673–1685 38. Potgieter, P. (2007). Nonstandard analysis, fractal properties and brownian motion. arXiv:math.FA/0701649v1, Preprint 39. Radulescu, F. (1994). 
Random matrices, amalgamated free products and subfactors of the C–algebra of a free group, of noninteger index. Inventiones Mathematicae, 115, 347–389 40. Raeburn, I. (2005). Graph algebras. CBMS no 3, AMS 41. Scapellato, R., & Lauri, J. (2003). Topics in graph automorphisms and reconstruction. London Mathematical Society, Student Text (Vol. 54). Cambridge University Press 42. Schiff, J.L. (2008). Cellular automata, discrete view of the World. Wiley-Interscience Series in Discrete Mathematics and Optimazation. John Wiley & Sons Press. ISBN: 978-0-470-16879-0 43. Shirai, T. (2000). The spectrum of infinite regular line graphs. Transactions of the American Mathematical Society, 352(1), 115–132 44. Shlyakhtenko, D. (2019). Ransom matrices and free probability, ransom matrices, IAS/Park City Mathematics Series (Vol. 26, pp. 389–459). American Mathematical Society 45. Solel, B. (2000). You can see the arrows in a Quiver operator algebras. Preprint
On Semicircular Elements Induced by Connected Finite Graphs
631
46. Speicher, R. (1998). Combinatorial theory of the free product with amalgamation and operator-valued free probability theory. Memoirs of the American Mathematical Society, 132(627) 47. Vega, V. (2007). Finite directed graphs and W-correspondences, Ph.D thesis, University of Iowa 48. Voiculescu, D., Dykema, K., & Nica, A. (1992). Free random variables. CRM Monograph Series, 1 49. Weintraub, S. H. (2003). representation theory of finite groups: Algebra and arithmetic. Graduate Studies in Mathematics, 59, AMS 50. Williams, J. D. (2017). Analytic function theory for operator-valued free probability. Journal fur die Reine und Angewandte Mathematik, 729, 119–149
Hilbert $C^*$-Module for Analyzing Structured Data
Yuka Hashimoto, Fuyuta Komura, and Masahiro Ikeda
Abstract Generalizing data analysis in Hilbert spaces to Hilbert $C^*$-modules has been investigated. This generalization enables us to analyze structured data such as functional data. In this chapter, we review recent results on data analysis in Hilbert $C^*$-modules, for example, [Hashimoto et al., J Mach Learn Res 22(267): 1–56], which allow us to generalize analysis in Hilbert spaces to that in Hilbert $C^*$-modules.

Keywords Hilbert $C^*$-module • Orthonormalization • Neural network • Reproducing kernel

Mathematics Subject Classification (MSC2020) Primary 46L08 • Secondary 46N99
Y. Hashimoto (✉)
NTT Network Service Systems Laboratories, Musashinoshi, Tokyo, Japan
Center for Advanced Intelligence Project, RIKEN, Chuo-ku, Tokyo, Japan
e-mail: [email protected]

F. Komura
Faculty of Science and Technology, Keio University, Kanagawa, Yokohama, Japan
e-mail: [email protected]

M. Ikeda
Center for Advanced Intelligence Project, RIKEN, Chuo-ku, Tokyo, Japan
Faculty of Science and Technology, Keio University, Kanagawa, Yokohama, Japan
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_58
Y. Hashimoto et al.
1 Introduction

Analyzing data with Hilbert $C^*$-modules instead of Hilbert spaces has been investigated [15, 16, 51]. Since a Hilbert $C^*$-module is a natural generalization of a Hilbert space [26, 29], we can generalize several important results in Hilbert spaces to Hilbert $C^*$-modules. For example, Hashimoto et al. [15] showed that if we consider the $C^*$-algebra of continuous functions on a compact Hausdorff space, we can generalize analysis in Hilbert spaces to that in Hilbert $C^*$-modules for functional data.

In this chapter, we review data analysis with Hilbert $C^*$-modules. We first focus on a minimization property of a projection and provide a practical orthonormalization scheme for constructing the projection. We generalize the results in [15], which require the compactness of operators, to the case of general Hilbert $C^*$-modules over von Neumann algebras. The minimization property of projections on Hilbert spaces is important for obtaining an approximation of an operator in an appropriate subspace [14, 18, 23]. The generalization of this minimization property to Hilbert $C^*$-modules allows us to obtain a similar approximation of an operator defined on a Hilbert $C^*$-module.

We also review the application of Hilbert $C^*$-modules to neural networks. Recently, neural networks have been actively investigated and applied to various problems [19]. Originally, a neural network is defined as a map on a finite-dimensional Hilbert space, and the map is constructed as a composition of linear and nonlinear transformations. These maps are characterized by real- or complex-valued parameters, and we find suitable parameters from given data. Several extensions of the neural network have also been proposed [3, 4, 43, 44]. Chen et al. [4] introduced the idea of regarding the sequence of transformations as a dynamical system and extending it to continuous dynamical systems. We can make use of tools for continuous dynamical systems to find suitable parameters.
Since then, methods related to their work have been investigated [28, 36, 53]. Moreover, extending the map on a finite-dimensional space to that on an infinite-dimensional space using integral operators has also been investigated for theoretically analyzing neural networks [3, 43, 44]. Generalizing the neural network using Hilbert $C^*$-modules corresponds to combining multiple neural networks continuously [16]. In that framework, we generalize real- or complex-valued parameters to $C^*$-algebra-valued ones.

After focusing on the neural network, we focus on the $C^*$-module version of the reproducing kernel Hilbert space, which is called the reproducing kernel Hilbert $C^*$-module (RKHM) [17, 21, 48]. The reproducing kernel Hilbert space (RKHS) [37] has been applied to data analysis for dealing with nonlinearity of data using nonlinear kernel functions. It was first applied to data analysis by Aizerman et al. [1] and has been actively studied [9, 10, 20, 23, 38, 40, 47]. A generalization of RKHS, called vector-valued RKHS (vvRKHS), has also been investigated to analyze vector-valued data [2, 22, 25, 27, 30, 32]. One important theorem for data analysis in RKHSs is the representer theorem [39], which guarantees that solutions of a minimization problem are represented only with given data. The representer theorem is generalized to RKHMs [15]. We weaken an assumption of the representer theorem in RKHMs and derive an approximate representer theorem. The theorem guarantees that we can obtain a vector represented only with given data that attains a sufficiently close value of the objective function of the minimization problem. We also introduce a generalization of kernel mean embedding in RKHS [14, 33, 42, 45, 46] to that in RKHM [15] to analyze distributions of data.

This chapter is composed of five sections. We review mathematical notions and fundamental tools in Section 2. We see fundamental results for data analysis with Hilbert $C^*$-modules in Section 3. We focus on the reproducing kernel Hilbert $C^*$-module in Section 4. Section 5 concludes the chapter and discusses possible directions for future work.
2 Preliminaries

We review mathematical notions required for this chapter in Sections 2.1 and 2.2. Then, we focus on their applications to data analysis in Sections 2.3 and 2.4.
2.1 Hilbert $C^*$-Module

In this subsection, we review $C^*$-algebras, Hilbert $C^*$-modules, and their related notions. A motivation for applying Hilbert $C^*$-modules to data analysis is to construct a suitable framework for analyzing structured data using $C^*$-algebras. For example, for functional data, we can consider the $C^*$-algebra $C(\Omega)$ for a compact Hausdorff space $\Omega$ or $L^\infty(\Omega)$ for a measure space $\Omega$.

Definition 2.1 Let $\mathcal{A}$ be an algebra over $\mathbb{C}$. If $\mathcal{A}$ satisfies the following conditions, it is called a $C^*$-algebra:
1. $\mathcal{A}$ is equipped with a bijection $(\cdot)^* : \mathcal{A} \to \mathcal{A}$ that satisfies
• $(\alpha a + \beta b)^* = \overline{\alpha} a^* + \overline{\beta} b^*$,
• $(ab)^* = b^* a^*$,
• $(a^*)^* = a$
for $\alpha, \beta \in \mathbb{C}$ and $a, b \in \mathcal{A}$.
2. $\mathcal{A}$ is a Banach space endowed with $\|\cdot\|_{\mathcal{A}}$. In addition, for $a, b \in \mathcal{A}$, we have $\|ab\|_{\mathcal{A}} \le \|a\|_{\mathcal{A}} \|b\|_{\mathcal{A}}$.
3. For $a \in \mathcal{A}$, we have $\|a^* a\|_{\mathcal{A}} = \|a\|_{\mathcal{A}}^2$.

In the remaining part of this chapter, we denote a $C^*$-algebra by $\mathcal{A}$. In addition, we assume $\mathcal{A}$ is unital and denote its unit by $1_{\mathcal{A}}$. It is known that any $C^*$-algebra can be regarded as a subalgebra of the $C^*$-algebra $\mathcal{B}(\mathcal{H})$ of bounded linear operators on some Hilbert space $\mathcal{H}$ (see, for example, Murphy [34]). In this chapter, some results require the $C^*$-algebra to be a von Neumann algebra.

Definition 2.2 A von Neumann algebra is a $C^*$-algebra $\mathcal{A} \subseteq \mathcal{B}(\mathcal{H})$ that is closed in the strong operator topology.

The notion of positiveness is important since it provides an order in $\mathcal{A}$. We can define minimization or maximization problems using the order.

Definition 2.3 Let $a \in \mathcal{A}$. If $a = b^* b$ for some $b \in \mathcal{A}$, then $a$ is called positive. We put $\mathcal{A}_+ = \{a \in \mathcal{A} \mid a \text{ is positive}\}$ in this chapter.

Note that we can define a (partial) order $\le_{\mathcal{A}}$ in $\mathcal{A}$ by "$a \le_{\mathcal{A}} b$ if and only if $b - a$ is positive." In addition, we write $a \lneqq_{\mathcal{A}} b$ if $b - a$ is positive and not zero. We omit the subscript $\mathcal{A}$ if it is obvious. In practical applications, we formulate problems as minimization or maximization problems. Thus, we introduce supremum, maximum, infimum, and minimum in $\mathcal{A}$ with respect to the order $\le_{\mathcal{A}}$.

Definition 2.4 Let $\mathcal{S}$ be a subset of $\mathcal{A}$.
1. If an element $a \in \mathcal{A}$ satisfies $b \le a$ for any $b \in \mathcal{S}$, then it is called an upper bound of $\mathcal{S}$. If we have $c \le a$ for any upper bound $a$ of $\mathcal{S}$ and if $c$ is also an upper bound of $\mathcal{S}$, then $c \in \mathcal{A}$ is called a supremum of $\mathcal{S}$. If $c \in \mathcal{S}$, then $c$ is said to be a maximum of $\mathcal{S}$.
2. If an element $a \in \mathcal{A}$ satisfies $a \le b$ for any $b \in \mathcal{S}$, then it is called a lower bound of $\mathcal{S}$. If we have $a \le c$ for any lower bound $a$ of $\mathcal{S}$ and if $c$ is also a lower bound of $\mathcal{S}$, then $c \in \mathcal{A}$ is called an infimum of $\mathcal{S}$. If $c \in \mathcal{S}$, then $c$ is said to be a minimum of $\mathcal{S}$.
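To make the order of Definition 2.3 concrete, the following minimal sketch assumes that the $C^*$-algebra is the algebra of 2 × 2 complex matrices, where an element is positive exactly when it is Hermitian with nonnegative eigenvalues. The helper names `is_positive` and `leq` are hypothetical, and the tolerance only absorbs floating-point error. The example illustrates that the order is only partial: two projections can be incomparable although both lie below the unit.

```python
import numpy as np

def is_positive(a, tol=1e-12):
    """Positivity in the matrix algebra: Hermitian with nonnegative eigenvalues."""
    hermitian = np.allclose(a, a.conj().T)
    return bool(hermitian and np.linalg.eigvalsh(a).min() >= -tol)

def leq(a, b):
    """a <= b in the sense of Definition 2.3, i.e., b - a is positive."""
    return is_positive(b - a)

p = np.diag([1.0, 0.0])   # projection onto the first coordinate
q = np.diag([0.0, 1.0])   # projection onto the second coordinate
one = np.eye(2)           # unit of the algebra

print(leq(p, one), leq(q, one))   # both projections lie below the unit
print(leq(p, q), leq(q, p))       # neither p <= q nor q <= p holds
```

Because such incomparable pairs exist, a subset of the algebra need not have a minimum or maximum, which is why Definition 2.4 states these notions explicitly.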
We now introduce the Hilbert $C^*$-module. We first introduce an $\mathcal{A}$-valued version of the inner product.
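Definition 2.5 An abelian group $\mathcal{X}$ with an operation $+$ is called a (right) $C^*$-module over $\mathcal{A}$ if it is equipped with a (right) $\mathcal{A}$-multiplication. For a $C^*$-module $\mathcal{X}$ over $\mathcal{A}$, a map $\langle \cdot, \cdot \rangle_{\mathcal{X}} : \mathcal{X} \times \mathcal{X} \to \mathcal{A}$ is referred to as an $\mathcal{A}$-valued inner product if it is $\mathbb{C}$-linear with respect to the second variable and has the following properties: For $x, y, z \in \mathcal{X}$ and $a, b \in \mathcal{A}$,
1. $\langle x, ya + zb \rangle_{\mathcal{X}} = \langle x, y \rangle_{\mathcal{X}} a + \langle x, z \rangle_{\mathcal{X}} b$.
2. $\langle x, y \rangle_{\mathcal{X}} = \langle y, x \rangle_{\mathcal{X}}^*$.
3. $\langle x, x \rangle_{\mathcal{X}} \ge_{\mathcal{A}} 0$.
4. If $\langle x, x \rangle_{\mathcal{X}} = 0$, then $x = 0$.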
For $x \in \mathcal{X}$, let $a \in \mathcal{A}_+$ be the positive element in $\mathcal{A}$ that satisfies $a^2 = \langle x, x \rangle_{\mathcal{X}}$. We denote $a$ by $|x|_{\mathcal{X}}$ and call it the $\mathcal{A}$-valued absolute value on $\mathcal{X}$. Let $\|x\|_{\mathcal{X}} = \| |x|_{\mathcal{X}} \|_{\mathcal{A}}$. Then $\|\cdot\|_{\mathcal{X}}$ is a ($\mathbb{R}_+$-valued) norm in $\mathcal{X}$. Here, $\mathbb{R}_+$ is the set of all nonnegative real numbers. The $\mathcal{A}$-valued absolute value $|\cdot|_{\mathcal{X}}$ and the norm $\|\cdot\|_{\mathcal{X}}$ are both generalizations of the norm in a Hilbert space. Since the square of the $\mathcal{A}$-valued absolute value $|x|_{\mathcal{X}}$ is calculated only with $\mathcal{A}$-valued inner products, results in Hilbert spaces are generalized to those in Hilbert $C^*$-modules using the $\mathcal{A}$-valued absolute value. See the results in Sections 3 and 4 for more details.

Definition 2.6 A $C^*$-module $\mathcal{X}$ over $\mathcal{A}$ equipped with an $\mathcal{A}$-valued inner product is called a Hilbert $C^*$-module over $\mathcal{A}$ or Hilbert $\mathcal{A}$-module if $\mathcal{X}$ is a Banach space with respect to the norm $\|\cdot\|_{\mathcal{X}}$.

In the remaining part of this chapter, we denote a (right) Hilbert $C^*$-module over $\mathcal{A}$ by $\mathcal{X}$.

Example For $n \in \mathbb{N}$, let $\mathcal{X} = \mathcal{A}^n$. Then $\mathcal{X}$ is a Hilbert $C^*$-module over $\mathcal{A}$. For $a = (a_1, \ldots, a_n),\ b = (b_1, \ldots, b_n) \in \mathcal{A}^n$, the $\mathcal{A}$-valued inner product is defined as $\langle a, b \rangle_{\mathcal{X}} = \sum_{i=1}^n a_i^* b_i$. The absolute value is given by $|a|_{\mathcal{X}} = (\sum_{i=1}^n a_i^* a_i)^{1/2}$. The norm is given by $\|a\|_{\mathcal{X}} = \|\sum_{i=1}^n a_i^* a_i\|_{\mathcal{A}}^{1/2}$.

For practical applications, we identify an $n$-dimensional Hilbert space with $\mathbb{R}^n$ or $\mathbb{C}^n$, which enables us to implement algorithms on computers. The Hilbert $C^*$-module $\mathcal{A}^n$ generalizes this setting from Hilbert spaces to Hilbert $C^*$-modules.

Orthonormality is important for data analysis. For example, we obtain orthogonal projections from an orthonormal basis. Then we can project vectors onto a finite-dimensional subspace, and the projected vector can serve as an approximation of the original vector in that subspace.
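As an illustration of the example above, the following sketch assumes that the $C^*$-algebra is the algebra of m × m complex matrices and represents an element of the n-fold module as an array of shape (n, m, m). The function names `inner`, `psd_sqrt`, `abs_val`, and `norm` are hypothetical helpers introduced only for this illustration. It computes the algebra-valued inner product, the absolute value, and the norm, and checks linearity in the second variable numerically.

```python
import numpy as np

def inner(a, b):
    """A-valued inner product on A^n: sum_i a_i^* b_i for arrays of shape (n, m, m)."""
    return sum(ai.conj().T @ bi for ai, bi in zip(a, b))

def psd_sqrt(p):
    """Positive square root of a positive semidefinite Hermitian matrix."""
    w, u = np.linalg.eigh(p)
    w = np.clip(w, 0.0, None)              # discard tiny negative round-off
    return (u * np.sqrt(w)) @ u.conj().T

def abs_val(a):
    """A-valued absolute value: the positive square root of <a, a>."""
    return psd_sqrt(inner(a, a))

def norm(a):
    """Real-valued norm: operator norm of the A-valued absolute value."""
    return np.linalg.norm(abs_val(a), 2)

rng = np.random.default_rng(0)
n, m = 3, 2
a = rng.standard_normal((n, m, m)) + 1j * rng.standard_normal((n, m, m))
b = rng.standard_normal((n, m, m)) + 1j * rng.standard_normal((n, m, m))
c = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))

# <a, b c> = <a, b> c, where (b c)_i = b_i c is the right multiplication.
print(np.allclose(inner(a, b @ c), inner(a, b) @ c))
# <a, b> = <b, a>^*
print(np.allclose(inner(a, b), inner(b, a).conj().T))
print(norm(a))
```

The inner product here is a matrix rather than a scalar, so it retains how the components of `a` and `b` interact, and the real-valued norm is recovered only after taking the operator norm.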
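Definition 2.7 If $z \in \mathcal{X}$ satisfies $0 \neq \langle z, z \rangle_{\mathcal{X}} = \langle z, z \rangle_{\mathcal{X}}^2$, then it is called normalized. For an index set $\mathcal{I}$ and for $i \in \mathcal{I}$, let $z_i \in \mathcal{X}$. We put $\mathcal{S} = \{z_i\}_{i \in \mathcal{I}} \subseteq \mathcal{X}$. If $z_i$ is normalized for any $i \in \mathcal{I}$ and $\langle z_i, z_j \rangle_{\mathcal{X}} = 0$ for $i \neq j$, then $\mathcal{S}$ is called an orthonormal system of $\mathcal{X}$. If $\mathcal{S}$ is an orthonormal system and the module generated by $\mathcal{S}$ is dense in $\mathcal{X}$, then it is called an orthonormal basis.

Another important notion related to Hilbert $C^*$-modules is the internal tensor. We focus on the $C^*$-algebra $\mathcal{B}(\mathcal{H})$ of bounded linear operators on some Hilbert space $\mathcal{H}$. We construct a Hilbert space using a Hilbert $\mathcal{B}(\mathcal{H})$-module. The internal tensor is useful when we investigate the connection between Hilbert $C^*$-modules and Hilbert spaces.

Lemma 2.8 Let $\mathcal{X}$ be a Hilbert $\mathcal{B}(\mathcal{H})$-module, and let $\mathcal{X} \otimes \mathcal{H}$ be the tensor product of $\mathcal{X}$ and $\mathcal{H}$ as vector spaces. We define a map $\langle \cdot, \cdot \rangle_{\mathcal{X} \otimes \mathcal{H}} : \mathcal{X} \otimes \mathcal{H} \times \mathcal{X} \otimes \mathcal{H} \to \mathbb{C}$ as
\[ \langle x \otimes w, y \otimes h \rangle_{\mathcal{X} \otimes \mathcal{H}} = \langle w, \langle x, y \rangle_{\mathcal{X}} h \rangle_{\mathcal{H}} \]
for $x, y \in \mathcal{X}$ and $w, h \in \mathcal{H}$. Then $\langle \cdot, \cdot \rangle_{\mathcal{X} \otimes \mathcal{H}}$ is a $\mathbb{C}$-valued pre-inner product on $\mathcal{X} \otimes \mathcal{H}$.

Definition 2.9 Let $\mathcal{W}$ be the completion of $\mathcal{X} \otimes \mathcal{H}$ with respect to the pre-inner product $\langle \cdot, \cdot \rangle_{\mathcal{X} \otimes \mathcal{H}}$. Then $\mathcal{W}$ is called the internal tensor between $\mathcal{X}$ and $\mathcal{H}$. We denote $\mathcal{W}$ by $\mathcal{X} \otimes_{\mathcal{B}(\mathcal{H})} \mathcal{H}$.

See Murphy [34] and Lance [26] for more details.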
2.2 $\mathcal{A}$-Valued Measure and Integral

We introduce $\mathcal{A}$-valued measures and integrals, which are special cases of vector measures and integrals [7, 8], respectively. To deal with the randomness of data, we often assume each sample in data is a random variable and address its distribution. $\mathcal{A}$-valued measures and integrals thus allow us to generalize existing analyses of distributions in Hilbert spaces to those in Hilbert $C^*$-modules. See Section 4.3 for more details. We denote a locally compact Hausdorff space by $\mathcal{V}$ and a σ-algebra on $\mathcal{V}$ by $\Sigma$ in this subsection.

Definition 2.10 Let $\mu$ be an $\mathcal{A}$-valued map on $\Sigma$ such that for any countable and pairwise disjoint collection $\{E_i\}_{i=1}^{\infty}$ in $\Sigma$, we have $\mu(\bigcup_{i=1}^{\infty} E_i) = \sum_{i=1}^{\infty} \mu(E_i)$, where the convergence is with respect to the norm in $\mathcal{A}$. Then $\mu$ is called a (countably additive) $\mathcal{A}$-valued measure.
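Definition 2.11 Let $\mu$ be an $\mathcal{A}$-valued measure.
1. If $|\mu|(E) < \infty$ for any $E \in \Sigma$, where
\[ |\mu|(E) = \sup\Big\{ \sum_{i=1}^{n} \|\mu(E_i)\|_{\mathcal{A}} \ \Big|\ n \in \mathbb{N},\ \{E_i\}_{i=1}^{n} \text{ is a finite partition of } E \in \Sigma \Big\}, \]
then $\mu$ is referred to as finite. The $\mathbb{R}_+$-valued measure $|\mu|$ is referred to as the total variation of $\mu$.
2. If, for any $E \in \Sigma$ and $\varepsilon > 0$, there exist a compact set $K \subseteq E$ and an open set $G \supseteq E$ such that $\|\mu(F)\|_{\mathcal{A}} \le \varepsilon$ for any $F \subseteq G \setminus K$, then $\mu$ is called regular.
3. The Borel σ-algebra $\mathcal{B}$ on $\mathcal{V}$ is the σ-algebra generated by all compact subsets of $\mathcal{V}$. When $\Sigma = \mathcal{B}$, $\mu$ is referred to as a Borel measure.
We denote the set of all $\mathcal{A}$-valued finite regular Borel measures by $\mathcal{D}(\mathcal{V}, \mathcal{A})$.

We define an integral of an $\mathcal{A}$-valued function with respect to an $\mathcal{A}$-valued measure by using $\mathcal{A}$-valued step functions.

Definition 2.12 Let $\chi_E : \mathcal{V} \to \{0, 1\}$ be the indicator function for $E \in \mathcal{B}$. Let $s : \mathcal{V} \to \mathcal{A}$ be an $\mathcal{A}$-valued map satisfying $s(v) = \sum_{i=1}^{n} a_i \chi_{E_i}(v)$ for $n \in \mathbb{N}$, $a_i \in \mathcal{A}$, and a finite partition $\{E_i\}_{i=1}^{n}$ of $\mathcal{V}$. Then $s$ is called a step function. We denote the set of all $\mathcal{A}$-valued step functions on $\mathcal{V}$ by $\mathcal{S}(\mathcal{V}, \mathcal{A})$. Let $s \in \mathcal{S}(\mathcal{V}, \mathcal{A})$ and $\mu \in \mathcal{D}(\mathcal{V}, \mathcal{A})$. Let $s = \sum_{i=1}^{n} a_i \chi_{E_i}$. We define the left integral of $s$ with respect to $\mu$ as
\[ \int_{v \in \mathcal{V}} s(v)\, d\mu(v) := \sum_{i=1}^{n} a_i \mu(E_i). \]
Similarly, we define the right integral of $s$ as
\[ \int_{v \in \mathcal{V}} d\mu(v)\, s(v) := \sum_{i=1}^{n} \mu(E_i) a_i. \]

Let $\nu$ be an $\mathbb{R}_+$-valued finite measure. Let $L^1_{\nu}(\mathcal{V}, \mathcal{A})$ be the space of all $\mathcal{A}$-valued $\nu$-Bochner integrable functions on $\mathcal{V}$, i.e., the space of functions $x$ such that there exists a sequence $\{s_i\}_{i=1}^{\infty} \subseteq \mathcal{S}(\mathcal{V}, \mathcal{A})$ of step functions satisfying $\lim_{i \to \infty} \int_{v \in \mathcal{V}} \|x(v) - s_i(v)\|_{\mathcal{A}}\, d\nu(v) = 0$ [6, Chapter IV].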
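The left and right integrals of a step function in Definition 2.12 can differ when the algebra is noncommutative. The following small sketch assumes, purely for illustration, that the algebra consists of 2 × 2 real matrices and that the measure is specified by its values on a two-set partition; the function names and the numerical values are hypothetical.

```python
import numpy as np

def left_integral(coeffs, mu_vals):
    """Left integral of s = sum_i a_i chi_{E_i}: sum_i a_i mu(E_i)."""
    return sum(a @ m for a, m in zip(coeffs, mu_vals))

def right_integral(coeffs, mu_vals):
    """Right integral of s: sum_i mu(E_i) a_i."""
    return sum(m @ a for a, m in zip(coeffs, mu_vals))

# Values mu(E_1), mu(E_2) of a matrix-valued measure on a partition {E_1, E_2}.
mu_vals = [np.array([[1.0, 0.0], [0.0, 0.0]]),
           np.array([[0.0, 1.0], [1.0, 0.0]])]
# Coefficients a_1, a_2 of the step function.
coeffs = [np.array([[0.0, 1.0], [0.0, 0.0]]),
          np.array([[2.0, 0.0], [0.0, 3.0]])]

L = left_integral(coeffs, mu_vals)
R = right_integral(coeffs, mu_vals)
print(L)
print(R)
```

For a commutative algebra such as the continuous functions on a compact space, the two integrals coincide, so the distinction matters only in the noncommutative case.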
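Definition 2.13 Let $\mu \in \mathcal{D}(\mathcal{V}, \mathcal{A})$. Let $x \in L^1_{|\mu|}(\mathcal{V}, \mathcal{A})$ and let $x = \lim_{i \to \infty} s_i$. In accordance with Definition 2.12, we define the left integral of $x$ with respect to $\mu$ as
\[ \int_{v \in \mathcal{V}} x(v)\, d\mu(v) := \lim_{i \to \infty} \int_{v \in \mathcal{V}} s_i(v)\, d\mu(v). \]
Similarly, we define the right integral of $x$ as
\[ \int_{v \in \mathcal{V}} d\mu(v)\, x(v) := \lim_{i \to \infty} \int_{v \in \mathcal{V}} d\mu(v)\, s_i(v). \]
Here, the sequence $\{s_i\}_{i=1}^{\infty} \subseteq \mathcal{S}(\mathcal{V}, \mathcal{A})$ is composed of step functions whose limit in $L^1_{|\mu|}(\mathcal{V}, \mathcal{A})$ is $x$.

Note that $|\mu|$ is a (real nonnegative-valued) measure [8, Proposition 10 in Section 1.2]. We consider a continuous function space that is dense in $L^1_{\nu}(\mathcal{V}, \mathcal{A})$ for any $\nu$.

Definition 2.14 Let $x$ be an $\mathcal{A}$-valued continuous function on $\mathcal{V}$. It is said to vanish at infinity if for any $\varepsilon > 0$, the set $\{v \in \mathcal{V} \mid \|x(v)\|_{\mathcal{A}} \ge \varepsilon\}$ is compact in $\mathcal{V}$. We denote the set of all $\mathcal{A}$-valued continuous functions vanishing at infinity by $C_0(\mathcal{V}, \mathcal{A})$.

Proposition 2.15 The continuous function space $C_0(\mathcal{V}, \mathcal{A})$ is dense in $L^1_{\nu}(\mathcal{V}, \mathcal{A})$ for any $\mathbb{R}_+$-valued finite regular measure $\nu$.

See Dinculeanu [7, 8] for further details.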
2.3 Application of Reproducing Kernel Hilbert Space to Data Analysis

As explained in Section 1, the application of reproducing kernel Hilbert space (RKHS) to data analysis has been actively investigated for dealing with nonlinearity of data using nonlinear kernel functions. Let $\mathcal{V}$ be a non-empty set. For the application to data analysis, $\mathcal{V}$ is a given set where the data lives, and we set a positive definite kernel $\tilde{k}$, defined below, and construct the RKHS $\mathcal{W}_{\tilde{k}}$ associated with it.

Definition 2.16 Let $\tilde{k} : \mathcal{V} \times \mathcal{V} \to \mathbb{C}$ be a $\mathbb{C}$-valued function satisfying the following two conditions:
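1. $\tilde{k}(v, u) = \overline{\tilde{k}(u, v)}$ for $v, u \in \mathcal{V}$,
2. $\sum_{i,j=1}^{n} \overline{\alpha_i} \alpha_j \tilde{k}(v_i, v_j) \ge 0$ for $n \in \mathbb{N}$, $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$, $v_1, \ldots, v_n \in \mathcal{V}$.
Then $\tilde{k}$ is called a positive definite kernel.

Let $\tilde{\phi} : \mathcal{V} \to \mathbb{C}^{\mathcal{V}}$ be the feature map, which is defined as $\tilde{\phi}(v) = \tilde{k}(\cdot, v)$. Let
\[ \mathcal{W}_{\tilde{k},0} = \Big\{ \sum_{i=1}^{n} \alpha_i \tilde{\phi}(v_i) \ \Big|\ n \in \mathbb{N},\ \alpha_1, \ldots, \alpha_n \in \mathbb{C},\ v_1, \ldots, v_n \in \mathcal{V} \Big\}. \]
Then we define a map $\langle \cdot, \cdot \rangle_{\mathcal{W}_{\tilde{k}}} : \mathcal{W}_{\tilde{k},0} \times \mathcal{W}_{\tilde{k},0} \to \mathbb{C}$ as follows:
\[ \Big\langle \sum_{i=1}^{n} \alpha_i \tilde{\phi}(v_i), \sum_{j=1}^{l} \beta_j \tilde{\phi}(u_j) \Big\rangle_{\mathcal{W}_{\tilde{k}}} = \sum_{i=1}^{n} \sum_{j=1}^{l} \overline{\alpha_i} \beta_j \tilde{k}(v_i, u_j). \]
By the definition of $\tilde{k}$, the map $\langle \cdot, \cdot \rangle_{\mathcal{W}_{\tilde{k}}}$ is well-defined and satisfies the conditions of the inner product. In addition, it satisfies
\[ \langle \tilde{\phi}(v), x \rangle_{\mathcal{W}_{\tilde{k}}} = x(v) \]
for $x \in \mathcal{W}_{\tilde{k},0}$ and $v \in \mathcal{V}$, that is, we can evaluate functions in $\mathcal{W}_{\tilde{k},0}$ using the feature map. We call the completion of $\mathcal{W}_{\tilde{k},0}$ the RKHS associated with $\tilde{k}$ and denote it by $\mathcal{W}_{\tilde{k}}$.

Example For data analysis, we often use Gaussian and Laplacian kernels as positive definite kernels, which are defined as $\tilde{k}(v, u) = e^{-c\|v - u\|^2}$ and $\tilde{k}(v, u) = e^{-c\|v - u\|_1}$ for $v, u \in \mathbb{R}^d$, respectively [33]. Here, $c > 0$ is an arbitrary positive constant, $\|\cdot\|$ is the Euclidean norm, and $\|\cdot\|_1$ is the norm defined as $\|(v_1, \ldots, v_d)\|_1 = \sum_{i=1}^{d} |v_i|$. We can also use positive definite kernels that are specific to the kind of data such as semantic data [49] and image data [27].

In many cases, the set $\mathcal{V}$ is a finite-dimensional vector space. In this case, the dimension of $\mathcal{W}_{\tilde{k}}$ is generally higher than that of $\mathcal{V}$, or $\mathcal{W}_{\tilde{k}}$ is infinite-dimensional. Thus, complicated behavior of data in $\mathcal{V}$ is expected to become simple in $\mathcal{W}_{\tilde{k}}$. For example, implementing principal component analysis and support vector machines in RKHSs enables us to capture nonlinear behavior of data. See, for example, Schölkopf and Smola [38].

Another important application of RKHSs to data analysis is the kernel ridge regression [35, Chapter 14.4.3]. For given samples $v_1, \ldots, v_n \in \mathcal{V}$ and $\gamma_1, \ldots, \gamma_n \in \mathbb{C}$, we try to find a function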
$f \in \mathcal{W}_{\tilde{k}}$ that minimizes the sum of the squared error and the regularization term
\[ \sum_{i=1}^{n} |f(v_i) - \gamma_i|^2 + \lambda \|f\|_{\mathcal{W}_{\tilde{k}}}^2, \tag{2.1} \]
where $\lambda > 0$. To find the function $f$ minimizing equation (2.1), we use the following representer theorem [39].

Proposition 2.17 (Representer Theorem in RKHS) Let $v_1, \ldots, v_n \in \mathcal{V}$ and $\gamma_1, \ldots, \gamma_n \in \mathbb{C}$. Let $h : \mathcal{V} \times \mathbb{C} \times \mathbb{C} \to \mathbb{R}_+$ be an error function and let $g : \mathbb{R}_+ \to \mathbb{R}_+$ satisfy $g(\alpha) < g(\beta)$ for $\alpha < \beta$. Then any $x \in \mathcal{W}_{\tilde{k}}$ minimizing $\sum_{i=1}^{n} h(v_i, \gamma_i, x(v_i)) + g(\|x\|_{\mathcal{W}_{\tilde{k}}})$ admits a representation of the form $\sum_{i=1}^{n} \alpha_i \tilde{\phi}(v_i)$ for some $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$.

By Proposition 2.17, the problem of finding $f \in \mathcal{W}_{\tilde{k}}$ minimizing equation (2.1) is reduced to that of finding $(\alpha_1, \ldots, \alpha_n) \in \mathbb{C}^n$, which allows us to compute the solution with a finite number of operations.
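The reduction to finitely many coefficients can be sketched as follows, assuming real-valued data and the Gaussian kernel. With the Gram matrix K of the samples, a minimizer of (2.1) of the form given by the representer theorem has coefficients solving the linear system (K + λI)α = γ, which is the standard closed form for kernel ridge regression. The function names, the sample data, and the parameter values are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, c=1.0):
    """Gram matrix with entries exp(-c * ||x_i - y_j||^2)."""
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-c * sq_dist)

def fit(V, gamma, lam, c=1.0):
    """Coefficients alpha of a minimizer f = sum_i alpha_i k(., v_i) of (2.1)."""
    K = gaussian_kernel(V, V, c)
    return np.linalg.solve(K + lam * np.eye(len(V)), gamma)

def predict(V_train, alpha, V_new, c=1.0):
    """Evaluate f(v) = sum_i alpha_i k(v, v_i) at new points."""
    return gaussian_kernel(V_new, V_train, c) @ alpha

def objective(alpha, K, gamma, lam):
    """Value of (2.1): squared error plus lam * ||f||^2 = lam * alpha^T K alpha."""
    return np.sum((K @ alpha - gamma) ** 2) + lam * alpha @ K @ alpha

V = np.linspace(0.0, 1.0, 8).reshape(-1, 1)   # samples v_1, ..., v_n in R^1
gamma = np.sin(2 * np.pi * V[:, 0])           # targets gamma_1, ..., gamma_n
lam, c = 0.1, 10.0
K = gaussian_kernel(V, V, c)
alpha = fit(V, gamma, lam, c)

print(objective(alpha, K, gamma, lam))
print(predict(V, alpha, np.array([[0.25]]), c))
```

Only an n × n linear system is solved, although the RKHS of the Gaussian kernel is infinite-dimensional.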
2.4 Neural Network

Neural networks have been actively researched and have been successfully applied to many problems such as classification and data generation [11, 19]. Let $n \in \mathbb{N}$ and let $N_0, \ldots, N_{n+1} \in \mathbb{N}$. In addition, for $i = 1, \ldots, n + 1$, let $\tilde{W}_i : \mathbb{R}^{N_{i-1}} \to \mathbb{R}^{N_i}$ be an $N_{i-1} \times N_i$ matrix and $\tilde{\sigma}_i : \mathbb{R}^{N_i} \to \mathbb{R}^{N_i}$ be an (often nonlinear) activation function. Typical choices of $\tilde{\sigma}_i$ are the elementwise rectified linear unit (ReLU) $\tilde{\sigma}_i(v) = (\max(v_1, 0), \ldots, \max(v_{N_i}, 0))$ and the sigmoid $\tilde{\sigma}_i(v) = (1/(1 + e^{-v_1}), \ldots, 1/(1 + e^{-v_{N_i}}))$ for $v = (v_1, \ldots, v_{N_i}) \in \mathbb{R}^{N_i}$. The neural network model $\tilde{f} : \mathbb{R}^{N_0} \to \mathbb{R}^{N_{n+1}}$ is defined as
\[ \tilde{f} = \tilde{\sigma}_{n+1} \circ \tilde{W}_{n+1} \circ \tilde{\sigma}_n \circ \tilde{W}_n \circ \cdots \circ \tilde{\sigma}_1 \circ \tilde{W}_1. \tag{2.2} \]
We fix $\tilde{\sigma}_1, \ldots, \tilde{\sigma}_{n+1}$ and optimize $\tilde{W}_1, \ldots, \tilde{W}_{n+1}$ by minimizing an objective function called a loss function $\tilde{L}$. Let $N = \sum_{i=1}^{n+1} N_{i-1} N_i$. Let $\tilde{\theta}$ be the $N$-dimensional vector composed of the entries of the matrices $\tilde{W}_1, \ldots, \tilde{W}_{n+1}$. We set a loss function $\tilde{L} : \mathbb{R}^N \to \mathbb{R}_+$, which depends on $\tilde{\theta}$ (and usually on inputs and outputs). We often apply a gradient descent method such as stochastic gradient descent (SGD) [52] and Adam [24] to solve the minimization problem with respect to the loss function $\tilde{L}$ numerically.
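The composition in equation (2.2) can be sketched as a short forward pass. The code below is only an illustration under several assumptions: the layer sizes and random weights are arbitrary, inputs are treated as row vectors so that each weight matrix has shape (N_{i-1}, N_i), and the names `forward`, `sizes`, and `theta` are hypothetical. It also assembles the N-dimensional parameter vector from the weight matrices.

```python
import numpy as np

def relu(v):
    """Elementwise rectified linear unit."""
    return np.maximum(v, 0.0)

def sigmoid(v):
    """Elementwise sigmoid."""
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, weights, activations):
    """Apply sigma_{n+1} o W_{n+1} o ... o sigma_1 o W_1 to a row vector x."""
    for W, sigma in zip(weights, activations):
        x = sigma(x @ W)
    return x

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]                          # N_0, ..., N_{n+1} with n = 2
weights = [rng.standard_normal((sizes[i], sizes[i + 1]))
           for i in range(len(sizes) - 1)]
activations = [relu, relu, sigmoid]

# Parameter vector theta of dimension N = sum_i N_{i-1} N_i.
theta = np.concatenate([W.ravel() for W in weights])
y = forward(rng.standard_normal(sizes[0]), weights, activations)
print(theta.size, y)
```

A loss function would take `theta` as its argument, and a gradient-based optimizer would update `theta` iteratively; that step is omitted here.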
3 Recent Progress in Analysis with Hilbert $C^*$-Module

The Hilbert $C^*$-module generalizes the notion of Hilbert space, which allows us to generalize analysis in Hilbert spaces to that in Hilbert $C^*$-modules. We review recent progress in data analysis in Hilbert $C^*$-modules [15].
3.1 Gram–Schmidt Orthonormalization
Orthogonal projection onto a finite-dimensional subspace of a Hilbert space is important since it provides the vector in the finite-dimensional subspace that minimizes the squared error with respect to a given vector in the Hilbert space. This minimization property of orthogonal projection is also available in Hilbert $C^*$-modules [15, Theorem 4.7].

Theorem 3.1 Let $\mathcal{A}$ be a unital $C^*$-algebra, and let $\mathcal{I}$ be a finite index set. Let $\{z_i\}_{i \in \mathcal{I}}$ be an orthonormal system of $\mathcal{X}$ and let $\mathcal{Y}$ be the submodule of $\mathcal{X}$ generated by $\{z_i\}_{i \in \mathcal{I}}$. Then there exists the orthogonal projection $P$ onto $\mathcal{Y}$. Moreover, for any $x \in \mathcal{X}$, $Px$ is the unique solution of the following minimization problem:
\[ \min_{y \in \mathcal{Y}} |x - y|_{\mathcal{X}}^2. \]
Here, the minimum is with respect to the order $\le_{\mathcal{A}}$ in $\mathcal{A}$.

Note that Theorem 4.7 in [15] lacks the assumption that $\mathcal{I}$ is finite. We provide the proof of Theorem 3.1 above as follows.

Proof of Theorem 3.1 For $x \in \mathcal{X}$, define
\[ Px = \sum_{i \in \mathcal{I}} z_i \langle z_i, x \rangle_{\mathcal{X}}. \]
One can see that $P : \mathcal{X} \to \mathcal{X}$ is the orthogonal projection onto $\mathcal{Y}$. More precisely, $P$ is a bounded $\mathcal{A}$-linear map that satisfies $P^2 = P$, $P^* = P$, and $P\mathcal{X} = \mathcal{Y}$. Here, a map $f : \mathcal{X} \to \mathcal{X}$ is called $\mathcal{A}$-linear if $f$ is linear and satisfies $f(xa) = f(x)a$ for any $x \in \mathcal{X}$ and $a \in \mathcal{A}$. Let $x \in \mathcal{X}$. For any $y \in \mathcal{Y}$, we have
\[ |x - y|_{\mathcal{X}}^2 = |Px - y|_{\mathcal{X}}^2 + |(I - P)x|_{\mathcal{X}}^2 \ge |Px - x|_{\mathcal{X}}^2. \]
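Thus, by Murphy [34, Theorem 2.2.6], we have $|x - y|_{\mathcal{X}} \ge |x - Px|_{\mathcal{X}}$. Moreover, assume $y \in \mathcal{Y}$ satisfies $|x - y|_{\mathcal{X}} = |x - Px|_{\mathcal{X}}$. Then we have $\langle x - Px, x - Px \rangle_{\mathcal{X}} = \langle x - y, x - y \rangle_{\mathcal{X}}$. Since $\langle x, Px \rangle_{\mathcal{X}} = \langle Px, Px \rangle_{\mathcal{X}}$ and $\langle x, y \rangle_{\mathcal{X}} = \langle Px, y \rangle_{\mathcal{X}}$, we have
\[ \langle x, x \rangle_{\mathcal{X}} - \langle Px, Px \rangle_{\mathcal{X}} = \langle x, x \rangle_{\mathcal{X}} - \langle Px, y \rangle_{\mathcal{X}} - \langle y, Px \rangle_{\mathcal{X}} + \langle y, y \rangle_{\mathcal{X}}. \]
Therefore, we have $|Px - y|_{\mathcal{X}}^2 = 0$, which shows $Px = y$.
□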
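The minimization property in Theorem 3.1 can be checked numerically in a small case. The sketch below assumes that the algebra consists of 2 × 2 real matrices and that the module is its 4-fold product, with elements stored as arrays of shape (4, 2, 2). The orthonormal system `z1`, `z2` and the helper names are illustrative choices. It verifies that the difference between the squared absolute values of x − y and x − Px is positive in the algebra for a randomly chosen y in the submodule.

```python
import numpy as np

def inner(a, b):
    """A-valued inner product on A^n: sum_i a_i^* b_i."""
    return sum(ai.conj().T @ bi for ai, bi in zip(a, b))

def project(x, zs):
    """Px = sum_i z_i <z_i, x> for an orthonormal system zs."""
    return sum(z @ inner(z, x) for z in zs)

n, m = 4, 2
I, O = np.eye(m), np.zeros((m, m))
z1 = np.array([I, I, O, O]) / np.sqrt(2)     # <z1, z1> is the identity
z2 = np.array([I, -I, O, O]) / np.sqrt(2)    # <z1, z2> = 0
zs = [z1, z2]

rng = np.random.default_rng(1)
x = rng.standard_normal((n, m, m))
y = sum(z @ rng.standard_normal((m, m)) for z in zs)   # arbitrary element of Y
Px = project(x, zs)

# |x - y|^2 - |x - Px|^2 should be a positive element (here: positive semidefinite).
gap = inner(x - y, x - y) - inner(x - Px, x - Px)
print(np.linalg.eigvalsh(gap).min() >= -1e-10)
```

The comparison is made in the order of the algebra, so the check is on the eigenvalues of a matrix rather than on a single real number.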
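Propositions 6.10 and 6.11 in [15] and their proofs show an approach for constructing an orthonormal system of a Hilbert $C^*$-module from a sequence $\{x_i\}_{i=1}^{\infty}$. However, $\langle x_i, x_j \rangle_{\mathcal{X}}$ is assumed to be compact for any $i$ and $j$ in [15]. Here, we generalize them and prove that if $\mathcal{A}$ is a von Neumann algebra, then we do not need the assumption.

Proposition 3.2 Let $\mathcal{A}$ be a von Neumann algebra. Let $\varepsilon > 0$ and let $\hat{z} \in \mathcal{X}$ be a vector satisfying $\|\hat{z}\|_{\mathcal{X}} > \varepsilon$. Then there exists $\hat{b} \in \mathcal{A}$ such that $\|\hat{b}\|_{\mathcal{A}} < 1/\varepsilon$ and $z := \hat{z}\hat{b}$ is normalized. Moreover, there exists $b \in \mathcal{A}$ such that $\|\hat{z} - zb\|_{\mathcal{X}} \le \varepsilon$.

Proof Let $a = \langle \hat{z}, \hat{z} \rangle_{\mathcal{X}}$ and let $\int_{\lambda \in \sigma(a)} \lambda\, dE(\lambda)$ be the spectral decomposition of $a$, where $\sigma(a)$ is the spectrum of $a$. Let $\hat{b} = \int_{\lambda \in \sigma(a) \setminus B_{\varepsilon^2}(0)} \lambda^{-1/2}\, dE(\lambda) \in \mathcal{A}$, where $B_{\varepsilon}(0) = \{z \in \mathbb{C} \mid |z| \le \varepsilon\}$. By the definition of $\hat{b}$, we have $\|\hat{b}\|_{\mathcal{A}} < 1/\varepsilon$. Moreover, we have
\[ \langle \hat{z}\hat{b}, \hat{z}\hat{b} \rangle_{\mathcal{X}} = \hat{b}^* a \hat{b} = \int_{\lambda \in \sigma(a) \setminus B_{\varepsilon^2}(0)} dE(\lambda). \]
Thus, $\langle \hat{z}\hat{b}, \hat{z}\hat{b} \rangle_{\mathcal{X}}$ is a nonzero orthogonal projection. Let $b = \int_{\lambda \in \sigma(a) \setminus B_{\varepsilon^2}(0)} \lambda^{1/2}\, dE(\lambda)$. Since $\hat{b} b = \int_{\lambda \in \sigma(a) \setminus B_{\varepsilon^2}(0)} dE(\lambda)$, we have $\langle \hat{z}, \hat{z}\hat{b}b \rangle_{\mathcal{X}} = \langle \hat{z}\hat{b}b, \hat{z}\hat{b}b \rangle_{\mathcal{X}}$ and obtain
\[ \langle \hat{z} - zb, \hat{z} - zb \rangle_{\mathcal{X}} = \langle \hat{z} - \hat{z}\hat{b}b, \hat{z} - \hat{z}\hat{b}b \rangle_{\mathcal{X}} = \langle \hat{z}, \hat{z} \rangle_{\mathcal{X}} - \langle \hat{z}\hat{b}b, \hat{z}\hat{b}b \rangle_{\mathcal{X}} = \int_{\lambda \in B_{\varepsilon^2}(0)} \lambda\, dE(\lambda). \]
Thus, we have $\|\hat{z} - zb\|_{\mathcal{X}} \le \varepsilon$.
□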
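In the finite-dimensional case, the construction in the proof of Proposition 3.2 reduces to an eigenvalue decomposition. The sketch below assumes that the algebra consists of 3 × 3 real matrices (a von Neumann algebra) and that the module is its 3-fold product; the function names and the test vector are hypothetical. The spectral integrals become sums over the eigenvalues of the inner product of the vector with itself, and only eigenvalues larger than ε² are inverted.

```python
import numpy as np

def inner(a, b):
    """A-valued inner product on A^n: sum_i a_i^* b_i."""
    return sum(ai.conj().T @ bi for ai, bi in zip(a, b))

def module_norm(x):
    """Norm of x in the module: square root of the operator norm of <x, x>."""
    return np.sqrt(np.linalg.norm(inner(x, x), 2))

def normalize(z_hat, eps):
    """Return (z, b_hat, b) as in Proposition 3.2, for a matrix algebra."""
    lam, u = np.linalg.eigh(inner(z_hat, z_hat))
    keep = lam > eps ** 2                      # spectrum outside the ball of radius eps^2
    safe = np.where(keep, lam, 1.0)            # avoid dividing by discarded eigenvalues
    b_hat = (u * np.where(keep, safe ** -0.5, 0.0)) @ u.conj().T
    b = (u * np.where(keep, safe ** 0.5, 0.0)) @ u.conj().T
    return z_hat @ b_hat, b_hat, b

rng = np.random.default_rng(0)
n, m, eps = 3, 3, 1e-3
# A vector whose inner product with itself is singular (one zero eigenvalue).
z_hat = rng.standard_normal((n, m, m)) @ np.diag([1.0, 0.5, 0.0])

z, b_hat, b = normalize(z_hat, eps)
q = inner(z, z)                                # should be a nonzero projection
print(np.allclose(q @ q, q))
print(module_norm(z_hat - z @ b) <= eps)
```

The inner product of the normalized vector with itself is a projection rather than the unit, which is exactly what "normalized" means in Definition 2.7.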
Proposition 3.3 (Gram–Schmidt Orthonormalization) Let $\mathcal{A}$ be a von Neumann algebra. Let $\{x_i\}_{i=1}^{\infty}$ be a sequence in $\mathcal{X}$. For $j = 1, 2, \ldots$ and $\varepsilon > 0$, let
\[ \hat{z}_j = x_j - \sum_{i=1}^{j-1} z_i \langle z_i, x_j \rangle_{\mathcal{X}}, \qquad z_j = \begin{cases} \hat{z}_j \hat{b}_j & \text{if } \|\hat{z}_j\|_{\mathcal{X}} > \varepsilon, \\ 0 & \text{otherwise.} \end{cases} \]
Here, $\hat{b}_j$ is defined in the same manner as $\hat{b}$ in Proposition 3.2 by replacing $\hat{z}$ by $\hat{z}_j$. Then $\{z_j\}_{j=1}^{\infty}$ is an orthonormal system of $\mathcal{X}$. Moreover, any $x_j$ is in the $\varepsilon$-neighborhood of the module generated by $\{z_j\}_{j=1}^{\infty}$.

Proposition 3.3 is proved in the same manner as Proposition 6.11 in [15].

Remark 3.4 If $\mathcal{A}$ is the $C^*$-algebra of $n$ by $n$ matrices for $n \in \mathbb{N}$, then we can replace the condition $\varepsilon > 0$ by $\varepsilon \ge 0$. Indeed, for $a \in \mathcal{A}$, $\sigma(a)$ is discrete.

If $\varepsilon$ appearing in the Gram–Schmidt orthonormalization is small, then we obtain an orthonormal system that approximates the vectors $x_j$ well. However, if $\varepsilon$ is small, then the norm of $\hat{b}_j$ can be large. From the perspective of implementation, this may cause numerical instability while computing the orthonormal system.

Note that if $\mathcal{A}$ is not a von Neumann algebra, the Gram–Schmidt orthonormalization does not always work.

Example Let $\mathcal{X} = C([0, 1])$. We consider $\mathcal{X}$ as a Hilbert $C([0, 1])$-module in the natural way. Let
\[ \mathcal{Y} = \{f \in \mathcal{X} \mid f(0) = 0\}. \]
Then $\mathcal{Y}$ is a closed submodule of $\mathcal{X}$, and $\mathcal{Y}$ has no normalized elements. Indeed, if $z \in \mathcal{Y}$ is normalized, then $\langle z, z \rangle_{\mathcal{X}} \in C([0, 1])$ is a projection. Since the only projections in $C([0, 1])$ are $0$ and $1$ and we have $\langle z, z \rangle_{\mathcal{X}}(0) = 0$, we obtain $\langle z, z \rangle_{\mathcal{X}} = 0$ and $z = 0$, which contradicts the normalization. Hence, the Gram–Schmidt orthonormalization does not work in general.

Cnops [5] and Wang and Qian [50] investigated an orthonormalization in Hilbert modules over $H^*$-algebras.
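The scheme of Proposition 3.3 can be sketched for a matrix algebra as follows. The code assumes that the algebra consists of 2 × 2 real matrices and that the module is its 4-fold product; all function names, the input vectors, and the value of ε are illustrative assumptions. The third input is chosen inside the module generated by the first one, so the corresponding output is set to zero.

```python
import numpy as np

def inner(a, b):
    """A-valued inner product on A^n: sum_i a_i^* b_i."""
    return sum(ai.conj().T @ bi for ai, bi in zip(a, b))

def module_norm(x):
    """Norm of x in the module: square root of the operator norm of <x, x>."""
    return np.sqrt(np.linalg.norm(inner(x, x), 2))

def normalizer(z_hat, eps):
    """b_hat of Proposition 3.2: inverse square root on eigenvalues above eps^2."""
    lam, u = np.linalg.eigh(inner(z_hat, z_hat))
    keep = lam > eps ** 2
    inv_sqrt = np.where(keep, np.where(keep, lam, 1.0) ** -0.5, 0.0)
    return (u * inv_sqrt) @ u.conj().T

def gram_schmidt(xs, eps):
    """Orthonormalize xs following the recursion in Proposition 3.3."""
    zs = []
    for x in xs:
        z_hat = x - sum((z @ inner(z, x) for z in zs), np.zeros_like(x))
        if module_norm(z_hat) > eps:
            zs.append(z_hat @ normalizer(z_hat, eps))
        else:
            zs.append(np.zeros_like(x))
    return zs

rng = np.random.default_rng(0)
n, m, eps = 4, 2, 1e-6
x1 = rng.standard_normal((n, m, m))
x2 = rng.standard_normal((n, m, m))
x3 = x1 @ rng.standard_normal((m, m))     # lies in the module generated by x1
xs = [x1, x2, x3]
zs = gram_schmidt(xs, eps)

# Distance from x_j to its expansion in z_1, ..., z_j.
residuals = [module_norm(x - sum(z @ inner(z, x) for z in zs[: j + 1]))
             for j, x in enumerate(xs)]
print(residuals)
```

A smaller ε keeps more of the spectrum but allows larger normalizers, which is the trade-off between approximation quality and numerical stability noted above.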
3.2 Neural Network Defined on Hilbert $C^*$-Module

A generalization of the parameter $\tilde{\theta}$ in $\mathbb{R}^N$, which is a Hilbert space, to a parameter in a Hilbert $C^*$-module was proposed by Hashimoto et al. [16]. This generalization allows us to combine multiple neural networks continuously and learn them efficiently. As in Section 2.4, let $N_0, \ldots, N_{n+1} \in \mathbb{N}$. In addition, for $i = 1, \ldots, n + 1$, let $W_i : \mathcal{A}^{N_{i-1}} \to \mathcal{A}^{N_i}$ be an $N_{i-1} \times N_i$ $\mathcal{A}$-valued matrix and $\sigma_i : \mathcal{A}^{N_i} \to \mathcal{A}^{N_i}$ be an (often nonlinear) activation function. We define the neural network model $f : \mathcal{A}^{N_0} \to \mathcal{A}^{N_{n+1}}$ as
\[ f = \sigma_{n+1} \circ W_{n+1} \circ \sigma_n \circ W_n \circ \cdots \circ \sigma_1 \circ W_1 \]
in the same manner as equation (2.2). Let $N = \sum_{i=1}^{n+1} N_{i-1} N_i$. Let $\theta$ be the $N$-dimensional $\mathcal{A}$-valued vector composed of the entries of the $\mathcal{A}$-valued matrices $W_1, \ldots, W_{n+1}$. Then we set an $\mathcal{A}$-valued loss function $L : \mathcal{A}^N \to \mathcal{A}_+$, which depends on $\theta$.

If we set $\mathcal{A} = C(\Omega)$, the $C^*$-algebra of continuous functions on a compact Hausdorff space $\Omega$, then we can regard $f$ as a combination of infinitely many neural networks $\tilde{f}$ in Section 2.4. Indeed, for each $\omega \in \Omega$, let $\tilde{f}_{\omega} : \mathbb{R}^{N_0} \to \mathbb{R}^{N_{n+1}}$ be defined as in equation (2.2). Then we can set $f(x)(\omega) = \tilde{f}_{\omega}(x(\omega))$. Since $\tilde{f}_{\omega}$ depends continuously on $\omega$, $f$ can be regarded as a continuous aggregation of neural networks $\tilde{f}$. Thus, this framework allows us to adapt multiple models continuously to problems.
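The identification of a function-valued network with a family of ordinary networks indexed by ω can be sketched by sampling Ω on a grid. The code below is an illustration under several assumptions that are not taken from [16]: Ω = [0, 1] is discretized at five points, each weight matrix depends affinely on ω, ReLU is used in every layer, and all names and sizes are hypothetical.

```python
import numpy as np

def relu(v):
    """Elementwise rectified linear unit."""
    return np.maximum(v, 0.0)

def forward(x, weights):
    """Evaluate f(x)(omega_g) at all grid points at once.

    x has shape (G, N_0) with rows x(omega_g); weights[i] has shape
    (G, N_{i-1}, N_i) with slices W_i(omega_g).
    """
    for W in weights:
        x = relu(np.einsum('gi,gij->gj', x, W))
    return x

rng = np.random.default_rng(0)
omega = np.linspace(0.0, 1.0, 5)              # grid on Omega = [0, 1]
sizes = [3, 4, 2]

weights = []
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    A, B = rng.standard_normal((2, n_in, n_out))
    # W_i(omega) = A + omega * B depends continuously on omega.
    weights.append(A[None] + omega[:, None, None] * B[None])

# An input in C(Omega)^3, sampled on the grid.
x = np.stack([np.cos(omega), np.sin(omega), omega], axis=1)
y = forward(x, weights)
print(y.shape)
```

Each grid point carries its own real-valued network, and the continuity of the weights in ω is what ties these networks together.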
4 Recent Progress in Analysis with Reproducing Kernel Hilbert $C^*$-Module

As mentioned in Section 2.3, the analysis with RKHSs has been actively studied. Vector-valued RKHS (vvRKHS) is a generalization of RKHS and is used for analyzing vector-valued data [2, 22, 25, 27, 30, 32]. Reproducing kernel Hilbert $C^*$-module (RKHM) generalizes RKHS and vvRKHS and has been studied for pure operator algebraic and mathematical physics problems [17, 29, 31]. For data analysis, Ye [51] focused on the case of the $C^*$-algebra of matrices and discussed a connection of the support vector machine (SVM) with RKHMs. Hashimoto et al. [15] proposed applying RKHMs to the analysis of functional data.
4.1 Vector-Valued RKHS
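Let $\mathcal{V}$ be a non-empty set. Let $\mathcal{H}$ be a Hilbert space. Moreover, we set an operator-valued positive definite kernel $k$ as follows:

Definition 4.1 Let $k : \mathcal{V} \times \mathcal{V} \to \mathcal{B}(\mathcal{H})$ be a $\mathcal{B}(\mathcal{H})$-valued map satisfying the following two conditions:
1. $k(v, u) = k(u, v)^*$ for $v, u \in \mathcal{V}$,
2. $\sum_{i,j=1}^{n} \langle h_i, k(v_i, v_j) h_j \rangle_{\mathcal{H}} \ge 0$ for $n \in \mathbb{N}$, $h_1, \ldots, h_n \in \mathcal{H}$, $v_1, \ldots, v_n \in \mathcal{V}$.
Then $k$ is called an operator-valued positive definite kernel.

Let $\phi : \mathcal{V} \to \mathcal{B}(\mathcal{H})^{\mathcal{V}}$ be a map defined as $\phi(v) = k(\cdot, v)$. Let
\[ \mathcal{W}^{\mathrm{v}}_{k,0} = \Big\{ \sum_{i=1}^{n} \phi(v_i) h_i \ \Big|\ n \in \mathbb{N},\ h_1, \ldots, h_n \in \mathcal{H},\ v_1, \ldots, v_n \in \mathcal{V} \Big\}. \]
Then we define a map $\langle \cdot, \cdot \rangle_{\mathcal{W}^{\mathrm{v}}_k} : \mathcal{W}^{\mathrm{v}}_{k,0} \times \mathcal{W}^{\mathrm{v}}_{k,0} \to \mathbb{C}$ as follows:
\[ \Big\langle \sum_{i=1}^{n} \phi(v_i) h_i, \sum_{j=1}^{l} \phi(u_j) w_j \Big\rangle_{\mathcal{W}^{\mathrm{v}}_k} = \sum_{i=1}^{n} \sum_{j=1}^{l} \langle h_i, k(v_i, u_j) w_j \rangle_{\mathcal{H}}. \]
By the properties of $k$ in Definition 4.1, the map $\langle \cdot, \cdot \rangle_{\mathcal{W}^{\mathrm{v}}_k}$ is well-defined, and it is an inner product. In addition, we can evaluate functions in $\mathcal{W}^{\mathrm{v}}_{k,0}$ in the following way:
\[ \langle \phi(v) h, x \rangle_{\mathcal{W}^{\mathrm{v}}_k} = \langle h, x(v) \rangle_{\mathcal{H}} \]
for $x \in \mathcal{W}^{\mathrm{v}}_{k,0}$, $v \in \mathcal{V}$, and $h \in \mathcal{H}$. We call the completion of $\mathcal{W}^{\mathrm{v}}_{k,0}$ the vvRKHS associated with $k$ and denote it by $\mathcal{W}^{\mathrm{v}}_k$. We remark that even though the positive definite kernel is $\mathcal{B}(\mathcal{H})$-valued, the inner product in $\mathcal{W}^{\mathrm{v}}_k$ is $\mathbb{C}$-valued. See, for example, Kadri et al. [22] for more details on vvRKHS.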
4.2 Reproducing Kernel Hilbert $C^*$-Module

We review the definition of RKHM.
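Definition 4.2 Let $k : V \times V \to \mathcal{A}$ be an $\mathcal{A}$-valued map satisfying the following two conditions:
1. $k(v, u) = k(u, v)^*$ for $v, u \in V$,
2. $\sum_{i,j=1}^{n} a_i^* k(v_i, v_j) a_j \ge_{\mathcal{A}} 0$ for $n \in \mathbb{N}$, $a_1, \ldots, a_n \in \mathcal{A}$, $v_1, \ldots, v_n \in V$.
Then $k$ is called an $\mathcal{A}$-valued positive definite kernel.

Let $\phi : V \to \mathcal{A}^V$ be the feature map, which is defined as $\phi(v) = k(\cdot, v)$ for $v \in V$. Let $\mathcal{X}_{k,0}$ be the $C^*$-module defined as
$$\mathcal{X}_{k,0} = \Big\{ \sum_{i=1}^{n} \phi(v_i) a_i \;\Big|\; n \in \mathbb{N},\ a_1, \ldots, a_n \in \mathcal{A},\ v_1, \ldots, v_n \in V \Big\}.$$
We define an $\mathcal{A}$-valued map $\langle \cdot, \cdot \rangle_{\mathcal{X}_k} : \mathcal{X}_{k,0} \times \mathcal{X}_{k,0} \to \mathcal{A}$ as follows:
$$\Big\langle \sum_{i=1}^{n} \phi(v_i) a_i,\ \sum_{j=1}^{l} \phi(u_j) b_j \Big\rangle_{\mathcal{X}_k} = \sum_{i=1}^{n} \sum_{j=1}^{l} a_i^* k(v_i, u_j) b_j.$$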
By the properties of $k$ in Definition 4.2, the map $\langle \cdot, \cdot \rangle_{\mathcal{X}_k}$ is well-defined, and it is an $\mathcal{A}$-valued inner product. In addition, we can evaluate functions in $\mathcal{X}_k$ as $\langle \phi(v), x \rangle_{\mathcal{X}_k} = x(v)$ for $x \in \mathcal{X}_{k,0}$ and $v \in V$. We call the completion of $\mathcal{X}_{k,0}$ the reproducing kernel Hilbert $\mathcal{A}$-module (RKHM) associated with $k$ and denote it by $\mathcal{X}_k$. The inner products in RKHMs carry more information than those in vvRKHSs since their values are in $\mathcal{A}$. The following theorem generalizes Proposition 2.17 (the representer theorem in RKHS) to RKHM [15, Theorem 4.8].

Theorem 4.3 (Representer Theorem) Let $\mathcal{A}$ be a unital $C^*$-algebra. Let $v_1, \ldots, v_n \in V$ and $a_1, \ldots, a_n \in \mathcal{A}$. Let $h : V \times \mathcal{A} \times \mathcal{A} \to \mathcal{A}_+$ be an error function and let $g : \mathcal{A}_+ \to \mathcal{A}_+$ satisfy $g(a) \lneq_{\mathcal{A}} g(b)$ for $a \lneq_{\mathcal{A}} b$. In addition, let $f : \mathcal{X}_k \to \mathcal{A}_+$ be defined as $f(x) = \sum_{i=1}^{n} h(v_i, a_i, x(v_i)) + g(|x|_{\mathcal{X}_k})$. Assume the module (algebraically) generated by $\{\phi(v_i)\}_{i=1}^{n}$ is closed. Then any $x \in \mathcal{X}_k$ minimizing $f$ admits a representation of the form $\sum_{i=1}^{n} \phi(v_i) c_i$ for some $c_1, \ldots, c_n \in \mathcal{A}$.

Note that the proof of Theorem 4.8 in [15] (corresponding to Theorem 4.3 above) requires Proposition 4.3 in [15]. However, the statement of Proposition 4.3 in [15] lacks the assumption that the submodule is finitely generated. We can correct the statement as follows [29, Lemma 2.3.7].

Proposition 4.4 Let $\mathcal{A}$ be a unital $C^*$-algebra. Let $\mathcal{Y}$ be a finitely (algebraically) generated closed submodule of $\mathcal{X}$. Then $\mathcal{Y}$ is orthogonally complemented in $\mathcal{X}$ (i.e., $\mathcal{X} = \mathcal{Y} \oplus \mathcal{Y}^{\perp}$, where $\mathcal{Y}^{\perp} = \{x \in \mathcal{X} \mid \langle x, y \rangle_{\mathcal{X}} = 0 \text{ for any } y \in \mathcal{Y}\}$).

We now prove the representer theorem.

Proof of Theorem 4.3 Let $\mathcal{Y}$ be the submodule generated by $\{\phi(v_i)\}_{i=1}^{n}$. Since we assume that $\mathcal{Y}$ is closed, by Proposition 4.4, $\mathcal{Y}$ is orthogonally complemented in $\mathcal{X}_k$. Let $P$ be the orthogonal projection onto $\mathcal{Y}$ and assume there is a minimizer $x \in \mathcal{X}_k$ of $f$. Then for $i = 1, \ldots, n$, we have
$$x(v_i) = \langle \phi(v_i), x \rangle_{\mathcal{X}_k} = \langle \phi(v_i), Px \rangle_{\mathcal{X}_k} = Px(v_i).$$
Moreover, if $(I - P)x \neq 0$, then $|(I - P)x|_{\mathcal{X}_k} \neq 0$, and we have
$$g(|x|_{\mathcal{X}_k}) = g(|Px + (I - P)x|_{\mathcal{X}_k}) = g\big( (|Px|^2_{\mathcal{X}_k} + |(I - P)x|^2_{\mathcal{X}_k})^{1/2} \big) \gneq_{\mathcal{A}} g(|Px|_{\mathcal{X}_k}),$$
which contradicts the fact that $x$ is a minimizer. Thus, we have $(I - P)x = 0$. □

Note that a submodule generated by finitely many elements is not always closed.

Example Put $\mathcal{X} = C([0, 1])$ and $\mathcal{Y} = \{f \in \mathcal{X} \mid f(0) = 0\}$. Then $\mathcal{Y}$ is a closed submodule of $\mathcal{X}$. Define the function $g : [0, 1] \to \mathbb{C}$ by $g(x) = x$ for $x \in [0, 1]$. Then $g \in \mathcal{Y}$. Let $\mathcal{Z}$ be the $C([0, 1])$-module generated by $g$. One can see that $\mathcal{Z}$ is dense in $\mathcal{Y}$. In addition, $\mathcal{Z}$ is a proper subset of $\mathcal{Y}$. Indeed, define the function $h \in \mathcal{X}$ by $h(x) = \sqrt{x}$ for $x \in [0, 1]$. Then $h \in \mathcal{Y} \setminus \mathcal{Z}$, since $h = g f$ would force $f(x) = 1/\sqrt{x}$ for $x > 0$, which has no continuous extension to $[0, 1]$. Hence, $\mathcal{Z}$ is not closed, and a finitely generated submodule is not closed in general.

Together with Proposition 3.3, we obtain the following new result for von Neumann algebras.

Proposition 4.5 (Approximate Representer Theorem) Let $\mathcal{A}$ be a von Neumann algebra. Let $v_1, \ldots, v_n$, $a_1, \ldots, a_n$, $h$, $g$, and $f$ be the same as in Theorem 4.3. Assume $h$ is Lipschitz continuous with respect to the last variable with a constant $L > 0$ and there exists $x \in \mathcal{X}_k$ such that $x$ minimizes $f$. Then for any $\varepsilon > 0$, there exists an orthonormal system $S$ of $\mathcal{X}_k$ such that $\|f(Px) - f(x)\|_{\mathcal{A}} \le L n \varepsilon \|x\|_{\mathcal{X}_k}$, where $P$ is the orthogonal projection onto the submodule generated by $S$.

Proof Let $\varepsilon > 0$. By Proposition 3.3, there exists an orthonormal system $S$ such that for any $i = 1, \ldots, n$, there exists $y_i \in \mathcal{Y}$ satisfying $\|\phi(v_i) - y_i\|_{\mathcal{X}_k} \le \varepsilon$. Here, $\mathcal{Y}$ is the submodule generated by $S$. By Proposition 3.1, the projection $P$ onto $\mathcal{Y}$ exists. For $i = 1, \ldots, n$, let $z_i = \phi(v_i) - y_i$. Then we have
$$x(v_i) = \langle \phi(v_i), x \rangle_{\mathcal{X}_k} = \langle y_i + z_i, x \rangle_{\mathcal{X}_k} = \langle y_i, Px \rangle_{\mathcal{X}_k} + \langle z_i, x \rangle_{\mathcal{X}_k} = \langle \phi(v_i), Px \rangle_{\mathcal{X}_k} + \langle z_i, (I - P)x \rangle_{\mathcal{X}_k} = Px(v_i) + \langle z_i, (I - P)x \rangle_{\mathcal{X}_k}.$$
Thus, by the Cauchy–Schwarz inequality in Hilbert $C^*$-modules [26, Proposition 1.1], we obtain $\|x(v_i) - Px(v_i)\|_{\mathcal{A}} \le \varepsilon \|x\|_{\mathcal{X}_k}$. On the other hand, in the same manner as the proof of Theorem 4.3, we obtain $g(|x|_{\mathcal{X}_k}) \ge_{\mathcal{A}} g(|Px|_{\mathcal{X}_k})$. Therefore, we have
$$0 \le f(Px) - f(x) = \sum_{i=1}^{n} h(v_i, a_i, Px(v_i)) - \sum_{i=1}^{n} h(v_i, a_i, x(v_i)) + g(|Px|_{\mathcal{X}_k}) - g(|x|_{\mathcal{X}_k}) \le \sum_{i=1}^{n} h(v_i, a_i, Px(v_i)) - \sum_{i=1}^{n} h(v_i, a_i, x(v_i)).$$
Since $h$ is Lipschitz continuous with respect to the last variable, we have $\|f(Px) - f(x)\|_{\mathcal{A}} \le L n \varepsilon \|x\|_{\mathcal{X}_k}$. □

Note that $\mathcal{Y}$ in the proof of Proposition 4.5 is contained in the submodule generated by $\{\phi(v_i)\}_{i=1}^{n}$. Proposition 4.5 guarantees that when $\mathcal{A}$ is a von Neumann algebra, even if the submodule generated by $\{\phi(v_i)\}_{i=1}^{n}$ is not closed, we can obtain a vector $y$ that has a representation $y = \sum_{i=1}^{n} \phi(v_i) c_i$ ($c_1, \ldots, c_n \in \mathcal{A}$) and for which $f(y)$ is sufficiently close to the minimum of $f$.

As for the connection between RKHMs and vvRKHSs, the following theorem shows that we can reconstruct vvRKHSs using RKHMs [15]. We use the internal tensor (Definition 2.9) on $\mathcal{B}(\mathcal{H})$ to construct spaces isomorphic to vvRKHSs.

Theorem 4.6 Let $\mathcal{A} = \mathcal{B}(\mathcal{H})$. Then the Hilbert spaces $\mathcal{W}^{v}_{k}$ and $\mathcal{X}_k \otimes_{\mathcal{B}(\mathcal{H})} \mathcal{H}$ are isomorphic.
Theorem 4.6 implies that the framework of RKHMs is more general than that of vvRKHSs. We can reconstruct existing algorithms in vvRKHSs, for example, principal component analysis (PCA), using RKHMs [15]. In the case of $\mathcal{A} = \mathcal{B}(\mathcal{H})$, we can construct an RKHM and a vvRKHS whose associated positive definite kernels are the same. However, although the inner product in the vvRKHS is $\mathbb{C}$-valued, that in the RKHM is $\mathcal{B}(\mathcal{H})$-valued. By taking $\mathcal{H}$ to be an infinite-dimensional Hilbert space and making use of the information in the $\mathcal{B}(\mathcal{H})$-valued inner product that is specific to infinite-dimensional $\mathcal{H}$, we expect to capture more detailed behavior of data through the RKHM.
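The relation between the two inner products can be illustrated in finite dimensions. The sketch below is an illustration under stated assumptions, not the construction of [15]: the algebra is taken to be the $d \times d$ complex matrices acting on $\mathbb{C}^d$, the kernel is a hypothetical Gaussian kernel times a fixed positive semidefinite matrix `B`, and the internal tensor product is assumed to carry the usual inner product $\langle x \otimes h, y \otimes w \rangle = \langle h, \langle x, y \rangle w \rangle$ (Definition 2.9 is not reproduced here). Under these assumptions, the scalar vvRKHS inner product of $\sum_i \phi(v_i)(a_i h)$ and $\sum_j \phi(u_j)(b_j w)$ should coincide with $\langle h, \langle x, y \rangle w \rangle$ for $x = \sum_i \phi(v_i) a_i$ and $y = \sum_j \phi(u_j) b_j$.

```python
import numpy as np

def kernel(v, u, B):
    """Assumed matrix-valued kernel k(v, u) = exp(-(v - u)^2) * B."""
    return np.exp(-(v - u) ** 2) * B

def rkhm_inner(vs, a, us, b, B):
    """Matrix-valued inner product  sum_{i,j} a_i^* k(v_i, u_j) b_j."""
    d = B.shape[0]
    out = np.zeros((d, d), dtype=complex)
    for vi, ai in zip(vs, a):
        for uj, bj in zip(us, b):
            out += ai.conj().T @ kernel(vi, uj, B) @ bj
    return out

def vvrkhs_inner(vs, hs, us, ws, B):
    """Scalar inner product  sum_{i,j} <h_i, k(v_i, u_j) w_j>  on C^d."""
    total = 0.0 + 0.0j
    for vi, hi in zip(vs, hs):
        for uj, wj in zip(us, ws):
            total += np.vdot(hi, kernel(vi, uj, B) @ wj)
    return total

rng = np.random.default_rng(1)
d = 2
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
B = M.conj().T @ M                                   # positive semidefinite

def rand_mat():
    return rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

def rand_vec():
    return rng.standard_normal(d) + 1j * rng.standard_normal(d)

vs, us = [0.1, 0.7, -0.4], [0.3, -0.9]
a = [rand_mat() for _ in vs]                         # coefficients of x
b = [rand_mat() for _ in us]                         # coefficients of y
h, w = rand_vec(), rand_vec()

G = rkhm_inner(vs, a, us, b, B)                      # <x, y>, a d x d matrix
lhs = np.vdot(h, G @ w)                              # <h, <x, y> w>
rhs = vvrkhs_inner(vs, [ai @ h for ai in a], us, [bj @ w for bj in b], B)

# <x, x> should be a positive semidefinite matrix.
min_eig = np.linalg.eigvalsh(rkhm_inner(vs, a, vs, a, B)).min()
```

The matrix `G` keeps all $d^2$ entries of the inner product, whereas the vvRKHS inner product only returns the single number obtained after pairing with `h` and `w`; this is one concrete way to read the remark that the RKHM inner product carries more information.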
4.3 Kernel Mean Embedding in RKHM
Kernel mean embedding is a map that sends measures to vectors in an RKHS [14, 33, 42, 45, 46] and is used in analyzing measures. It is generalized to RKHM with respect to $\mathcal{A}$-valued measures [15]. In this subsection, we assume $\mathcal{A}$ is a von Neumann algebra. In addition, $V$ denotes a locally compact Hausdorff space in this subsection. For kernel mean embeddings in RKHSs, we focus on the function space $C_0(V, \mathbb{C})$ since, by the Riesz–Markov representation theorem, the space of measures $\mathcal{D}(V, \mathbb{C})$ can be regarded as the continuous dual space of the Banach space $C_0(V, \mathbb{C})$. Thus, kernels related to $C_0(V, \mathbb{C})$, referred to as $c_0$-kernels, are considered [46]. The following type of kernel is a generalization of the $c_0$-kernels in RKHSs and is used for kernel mean embedding in RKHM.

Definition 4.7 Let $k : V \times V \to \mathcal{A}$ be an $\mathcal{A}$-valued positive definite kernel. If $k$ is bounded and $\phi(x) = k(\cdot, x) \in C_0(V, \mathcal{A})$ for any $x \in V$, then $k$ is called a $c_0$-kernel.

Moreover, we impose the closedness of $\mathcal{X}$ with respect to the strong operator topology.

Definition 4.8 Let $\mathcal{A} \subseteq \mathcal{B}(\mathcal{H})$ for a Hilbert space $\mathcal{H}$. Let $\mathcal{X}$ be a Hilbert $\mathcal{A}$-module. Let $\mathcal{W} = \mathcal{X} \otimes_{\mathcal{A}} \mathcal{H}$ be the Hilbert space defined by the internal tensor (Definition 2.9). We identify $x \in \mathcal{X}$ with the map $h \mapsto x \otimes_{\mathcal{A}} h$ and regard $\mathcal{X}$ as a subspace of $\mathcal{B}(\mathcal{H}, \mathcal{W})$, which is the space of bounded linear operators from $\mathcal{H}$ to $\mathcal{W}$. If $\mathcal{X} \subseteq \mathcal{B}(\mathcal{H}, \mathcal{W})$ is strongly closed, $\mathcal{X}$ is called a von Neumann $\mathcal{A}$-module.

Example If $k = \tilde{k} 1_{\mathcal{A}}$ for some $\mathbb{C}$-valued positive definite kernel $\tilde{k}$, then $\mathcal{X}_k$ is a von Neumann module. Indeed, in this case, we have $\mathcal{X}_k = \mathcal{W}_{\tilde{k}} \otimes \mathcal{A}$. Here, the tensor product means the completion of the algebraic tensor product of $\mathcal{W}_{\tilde{k}}$ and $\mathcal{A}$ with respect to the $\mathcal{A}$-valued inner product defined as $\langle w \otimes a, h \otimes b \rangle = \langle w, h \rangle_{\mathcal{W}_{\tilde{k}}} a^* b$ for $w, h \in \mathcal{W}_{\tilde{k}}$ and $a, b \in \mathcal{A}$.

Definition 4.9 Let $k$ be a $c_0$-kernel. Assume $\mathcal{X}_k$ is a von Neumann module. We define a map $\Phi : \mathcal{D}(V, \mathcal{A}) \to \mathcal{X}_k$ as
$$\Phi(\mu) = \int_{v \in V} \phi(v)\, \mathrm{d}\mu(v).$$
Then $\Phi$ is called a kernel mean embedding in $\mathcal{X}_k$.

Note that for von Neumann modules, the Riesz representation theorem is valid [41], which guarantees the well-definedness of $\Phi$. A metric on measures called the maximum mean discrepancy (MMD) is represented using the kernel mean embedding. This fact was originally investigated in RKHSs [12, 13], and as seen below, it is generalized to RKHMs [15].

Definition 4.10 Let $\mathcal{U}(V, \mathcal{A})$ be a set of $\mathcal{A}$-valued bounded and measurable functions and let $\mu, \nu \in \mathcal{D}(V, \mathcal{A})$. For $\mu$ and $\nu$, we define
$$\gamma_{\mathcal{U}(V, \mathcal{A})}(\mu, \nu) = \sup_{x \in \mathcal{U}(V, \mathcal{A})} \Big| \int_{v \in V} x(v)\, \mathrm{d}\mu(v) - \int_{v \in V} x(v)\, \mathrm{d}\nu(v) \Big|_{\mathcal{A}}.$$
Then $\gamma_{\mathcal{U}(V, \mathcal{A})}(\mu, \nu)$ is called the MMD for $\mathcal{A}$-valued measures. Here, the supremum is taken with respect to the (pre) order in $\mathcal{A}$.

Proposition 4.11 Let $k$ and $\mathcal{X}_k$ be the same as in Definition 4.9. Let $\mathcal{U}_{\mathrm{RKHM}}(V, \mathcal{A}) = \{x \in \mathcal{X}_k \mid \|x\|_{\mathcal{X}_k} \le 1\}$. Then we obtain $\gamma_{\mathcal{U}_{\mathrm{RKHM}}(V, \mathcal{A})}(\mu, \nu) = |\Phi(\mu) - \Phi(\nu)|_{\mathcal{X}_k}$ for $\mu, \nu \in \mathcal{D}(V, \mathcal{A})$.

Using the MMD, we can compute the difference between two measures. For example, a two-sample test [12, 13] using MMDs in RKHSs enables us to judge whether two given distributions of data are the same or not. We can generalize the two-sample test to an $\mathcal{A}$-valued version of distributions of data. We expect that, by this generalization, we can compare multiple distributions efficiently.
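For orientation, the classical scalar case (an RKHS rather than an RKHM) can be sketched numerically. The example below is an assumption-laden illustration: it uses a Gaussian kernel on the real line and the standard biased sample estimate of the squared MMD, which equals the squared norm of the difference of the two empirical kernel mean embeddings. The function names and the sample sizes are hypothetical, and nothing here implements the $\mathcal{A}$-valued version.

```python
import numpy as np

def gauss_gram(x, y, sigma=1.0):
    """Gram matrix of the Gaussian kernel between 1-D samples x and y."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2))

def mmd_squared(x, y, sigma=1.0):
    """Biased estimate of ||Phi(mu) - Phi(nu)||^2 from samples of mu and nu."""
    return (gauss_gram(x, x, sigma).mean()
            + gauss_gram(y, y, sigma).mean()
            - 2.0 * gauss_gram(x, y, sigma).mean())

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=300)          # sample from mu
y_same = rng.normal(0.0, 1.0, size=300)     # sample from the same distribution
y_shifted = rng.normal(2.0, 1.0, size=300)  # sample from a shifted distribution

mmd_same = mmd_squared(x, y_same)
mmd_shifted = mmd_squared(x, y_shifted)
```

The estimate is close to zero when both samples come from the same distribution and clearly larger for the shifted one, which is the quantity a two-sample test thresholds.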
Remark 4.12 Although $\Phi$ is called a kernel mean embedding, it is not always an injection. The injectivity of $\Phi$ is important for comparing two measures since the MMD of two measures $\mu$ and $\nu$ is calculated by the difference between $\Phi(\mu)$ and $\Phi(\nu)$. For an RKHS, it is shown that the kernel mean embedding is injective if and only if the RKHS is dense in $C_0(V, \mathbb{C})$. For an RKHM, we can also see that if the RKHM is dense in $C_0(V, \mathcal{A})$, then the kernel mean embedding is injective [15]. On the other hand, whether the converse is true or not is an open problem. Indeed, to prove the equivalence of the injectivity and the denseness in RKHSs, we use the Riesz–Markov representation theorem and the Hahn–Banach theorem [46]. Since generalizing the Riesz–Markov representation theorem to $\mathcal{A}$-valued functions and applying the Hahn–Banach theorem to $\mathcal{A}$-valued functions are not straightforward, showing whether the equivalence of the injectivity and the denseness is true or not is a challenging problem.
5 Conclusion and Future Work
In this chapter, we reviewed fundamental results for data analysis with Hilbert $C^*$-modules [15, 16]. These results allow us to analyze structured data such as functional data. We introduced an orthonormalization in Hilbert $C^*$-modules and a generalization of neural networks using Hilbert $C^*$-modules. Regarding the orthonormalization, we generalized an existing result to general Hilbert $C^*$-modules over von Neumann algebras. We also focused on the generalization of analysis in reproducing kernel Hilbert spaces to reproducing kernel Hilbert $C^*$-modules (RKHMs). We gave an approximate representer theorem in RKHMs over von Neumann algebras using the above generalization of the result about orthonormalization. For future work, we should pursue both theoretical investigations and practical applications. We list possible future work as follows:
• Theoretical investigations
  – Find sufficient conditions for a finitely generated submodule to be closed.
  – Investigate positive definite kernels whose associated RKHMs are von Neumann modules.
  – Show the relationship between the injectivity of a kernel mean embedding and the denseness of the corresponding RKHM in $C_0(V, \mathcal{A})$.
• Practical applications
  – Use the noncommutative product structures of noncommutative $C^*$-algebras.
  – Use the spectra of elements in the $C^*$-algebra $\mathcal{B}(\mathcal{H})$.
  – Implement algorithms with $C^*$-algebras efficiently.
  – Applications to neural networks
    · Investigate the representation power and generalization bound of the neural network model on a Hilbert $C^*$-module.
    · Generalize hyperparameters in neural networks to $C^*$-algebra-valued ones.
    · Generalize the framework of neural network models on Hilbert $C^*$-modules to that of models whose shapes are different.
  – Applications of RKHM
    · Generalize classical methods with RKHSs to those with RKHMs.
    · Investigate positive definite kernels that can achieve high performance.
    · Apply an RKHM instead of multiple RKHSs to improve computational efficiency.
    · Analyze neural networks on Hilbert $C^*$-modules using RKHMs.

Regarding theoretical investigations, we should find sufficient conditions for a finitely generated submodule to be closed. We showed that if it is generated by an orthonormal system, then it is closed. However, as in the representer theorem, the generators of the submodules that we focus on are not always orthonormal. Finding sufficient conditions may also connect to the effective design of $C^*$-algebras. Another theoretical topic is to find more examples of von Neumann modules. In particular, we should investigate what type of positive definite kernels make their associated RKHMs von Neumann modules. As mentioned in Section 4.3, an RKHM associated with a positive definite kernel of the form $\tilde{k} 1_{\mathcal{A}}$ for some $\mathbb{C}$-valued positive definite kernel $\tilde{k}$ is a von Neumann module. However, providing more complicated kernels may help us design effective kernels. Furthermore, investigating the relationship between the injectivity of a kernel mean embedding and the denseness of the corresponding RKHM in $C_0(V, \mathcal{A})$ is also an interesting topic. The denseness of an RKHM is important to guarantee that we can approximate any function in $C_0(V, \mathcal{A})$ in the RKHM.
Regarding practical applications, we should focus more on the case where $\mathcal{A}$ is noncommutative. In fact, existing studies often focus on the case where $\mathcal{A}$ is commutative. We believe that setting $\mathcal{A}$ as a noncommutative $C^*$-algebra and making use of its noncommutative product structure may break through the performance limitations of existing methods. Moreover, considering the $C^*$-algebra $\mathcal{B}(\mathcal{H})$ and making use of the spectra of elements in $\mathcal{B}(\mathcal{H})$ may help us understand complicated behavior of data. For example, we sometimes encounter situations where we want to approximate operators on infinite-dimensional Hilbert spaces. In that case, extracting information about their continuous and residual spectra is a challenging problem since we usually approximate them in finite-dimensional Hilbert spaces. By properly extending the operators on Hilbert spaces to Hilbert $\mathcal{B}(\mathcal{H})$-modules, we may extract information about the spectra of the original operators on the Hilbert space from the spectra of elements in the $C^*$-algebra $\mathcal{B}(\mathcal{H})$. One challenge in applying $C^*$-algebras to practical problems is implementing algorithms on computers, especially if $\mathcal{A}$ is infinite-dimensional. We also need to investigate how we can implement algorithms with $C^*$-algebras efficiently.

For the application to neural networks, more concrete directions are as follows. Investigating the representation power and generalization bound of a neural network model is one of the important topics in this area. The generalization bound is the gap between the true error and the empirical error of the approximation obtained by a model. Investigating the representation power and generalization bound of the neural network model on a Hilbert $C^*$-module can clarify the advantages of neural network models on Hilbert $C^*$-modules over those on Hilbert spaces. Moreover, generalizing hyperparameters in neural networks to $C^*$-algebra-valued ones would also be interesting. Examples of hyperparameters are parameters related to solving optimization problems for neural networks and those that characterize the nonlinear activation function. Furthermore, the neural network models on Hilbert $C^*$-modules allow us to aggregate multiple related neural network models.
However, the shape of each model, that is, the dimension $N_i$ ($i = 1, \ldots, n + 1$), should be the same. Generalizing the framework to that of models whose shapes are different would be interesting. This corresponds to considering a large $N_i$ and restricting functions in the $C^*$-algebra $C(\Omega)$ to be 0 on certain subsets of $\Omega$. Investigating ways to solve the optimization problems for neural networks with this restriction allows us to implement the generalized framework. This may also connect to an effective design of the shape of multiple related neural network models.

As for the application of RKHMs, more concrete directions are as follows. First, more investigations for generalizing methods with RKHSs to those with RKHMs are required. In [15], PCA in RKHMs is investigated. Generalizing other classical methods such as SVM and canonical correlation analysis (CCA) would make the framework of RKHMs solid for data analysis. In addition, for practical situations, choosing an appropriate positive definite kernel is important to obtain the best possible approximations in RKHMs with a given finite number of samples. The choice of kernels often depends on data spaces and properties of data. Investigating what type of $C^*$-algebra-valued positive definite kernels can achieve high performance is interesting. In particular, as we mentioned above, making use of the product structure of noncommutative $C^*$-algebras to construct a positive definite kernel is interesting. Another important problem for methods with RKHSs is their computational cost. For many algorithms, the computational cost grows quadratically or cubically with the number of samples. Similar to the application to neural networks, considering the $C^*$-algebra of continuous functions enables us to combine multiple problems continuously. When we need to solve multiple problems simultaneously, applying an RKHM instead of multiple RKHSs may improve computational efficiency. Finally, RKHSs are closely related to analyzing neural networks. We can construct a $\mathbb{C}$-valued positive definite kernel that is related to a neural network. Constructing a $C^*$-algebra-valued positive definite kernel that is related to a neural network on a Hilbert $C^*$-module may help us analyze the neural network on the Hilbert $C^*$-module.

We believe that there are many other interesting directions related to the application of Hilbert $C^*$-modules to be investigated, which will lead to new perspectives in data analysis.

Acknowledgement This work was partially supported by JST CREST Grant Number JPMJCR1913.
References
1. Aizerman, M. A., Braverman, E. M., & Rozonoer, L. I. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25, 821–837
2. Álvarez, M., Rosasco, L., & Lawrence, N. (2012). Kernels for vector-valued functions: A review. Foundations and Trends in Machine Learning, 4, 195–266
3. Candès, E. J. (1999). Harmonic analysis of neural networks. Applied and Computational Harmonic Analysis, 6(2), 197–218
4. Chen, R. T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. K. (2018). Neural ordinary differential equations. In Proceedings of Advances in Neural Information Processing Systems 31
5. Cnops, J. (1992). A Gram–Schmidt method in Hilbert modules. Fundamental Theories of Physics, 47, 193–203
6. Diestel, J. (1984). Sequences and series in Banach spaces. Graduate texts in mathematics (Vol. 92). Springer
7. Dinculeanu, N. (1967). Vector measures. International series of monographs on pure and applied mathematics (Vol. 95). Pergamon
8. Dinculeanu, N. (2000). Vector integration and stochastic integration in Banach spaces. Wiley
9. Fukumizu, K., Gretton, A., Sun, X., & Schölkopf, B. (2007). Kernel measures of conditional dependence. In Proceedings of Advances in Neural Information Processing Systems 20
10. Fukumizu, K., Bach, F. R., & Jordan, M. I. (2004). Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5, 73–99
11. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Proceedings of Advances in Neural Information Processing Systems 27
12. Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., & Smola, A. J. (2006). A kernel method for the two-sample-problem. In Proceedings of Advances in Neural Information Processing Systems 19
13. Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., & Smola, A. J. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13, 723–773
14. Hashimoto, Y., Ishikawa, I., Ikeda, M., Matsuo, Y., & Kawahara, Y. (2020). Krylov subspace method for nonlinear dynamical systems with random noise. Journal of Machine Learning Research, 21(172), 1–29
15. Hashimoto, Y., Ishikawa, I., Ikeda, M., Komura, F., Katsura, T., & Kawahara, Y. (2021). Reproducing kernel Hilbert $C^*$-module and kernel mean embeddings. Journal of Machine Learning Research, 22(267), 1–56
16. Hashimoto, Y., Wang, Z., & Matsui, T. (2022). $C^*$-algebra net: A new approach generalizing neural network parameters to $C^*$-algebra. In Proceedings of the 39th International Conference on Machine Learning, PMLR 162, 8523–8534
17. Heo, J. (2008). Reproducing kernel Hilbert $C^*$-modules and kernels associated with cocycles. Journal of Mathematical Physics, 49, 103507
18. Hestenes, M. R., & Stiefel, E. (1952). Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), 409–436
19. Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1544
20. Ishikawa, I., Fujii, K., Ikeda, M., Hashimoto, Y., & Kawahara, Y. (2018). Metric on nonlinear dynamical systems with Perron–Frobenius operators. In Proceedings of Advances in Neural Information Processing Systems 31
21. Itoh, S. (1990). Reproducing kernels in modules over $C^*$-algebras and their applications. Journal of Mathematics and Natural Sciences, 37, 1–20
22. Kadri, H., Duflos, E., Preux, P., Canu, S., Rakotomamonjy, A., & Audiffren, J. (2016). Operator-valued kernels for learning from functional response data. Journal of Machine Learning Research, 17(20), 1–54
23. Kawahara, Y. (2016). Dynamic mode decomposition with reproducing kernels for Koopman spectral analysis. In Proceedings of Advances in Neural Information Processing Systems 29
24. Kingma, D. P. & Ba, J. (2015). Adam: A method for stochastic optimization. ICLR
25. Laforgue, P., Clémençon, S., & d’Alché-Buc, F. (2019). Autoencoding any data through kernel autoencoders. In Proceedings of the Twenty-Second International Conference on Artificial Intelligence and Statistics, PMLR 89, 1061–1069
26. Lance, E. C. (1995). Hilbert $C^*$-modules: A toolkit for operator algebraists. London Mathematical Society Lecture Note Series (Vol. 210). Cambridge University Press
27. Mairal, J., Koniusz, P., Harchaoui, Z., & Schmid, C. (2014). Convolutional kernel networks. In Proceedings of Advances in Neural Information Processing Systems 27
28. Kang, Q., Song, Y., Ding, Q., & Tay, W. P. (2021). Stable neural ODE with Lyapunov-stable equilibrium points for defending against adversarial attacks. In Proceedings of Advances in Neural Information Processing Systems 34
29. Manuilov, V. M. & Troitsky, E. V. (2000). Hilbert $C^*$- and $W^*$-modules and their morphisms. Journal of Mathematical Sciences, 98, 137–201
30. Micchelli, C. A. & Pontil, M. (2005). On learning vector-valued functions. Neural Computation, 17, 177–204
31. Moslehian, M. S. (2022). Vector-valued reproducing kernel Hilbert $C^*$-modules. Complex Analysis and Operator Theory, 16(1), 2
32. Minh, H. Q., Bazzani, L., & Murino, V. (2016). A unifying framework in vector-valued reproducing kernel Hilbert spaces for manifold regularization and co-regularized multi-view learning. Journal of Machine Learning Research, 17(25), 1–72
33. Muandet, K., Fukumizu, K., Sriperumbudur, B. K., & Schölkopf, B. (2017). Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10(1–2), 1–141
34. Murphy, G. J. (1990). $C^*$-Algebras and operator theory. Academic Press
35. Murphy, K. P. (2012). Machine learning: A probabilistic perspective. MIT Press
36. Rubanova, Y., Chen, R. T. Q., & Duvenaud, D. (2019). Latent ODEs for irregularly-sampled time series. In Proceedings of Advances in Neural Information Processing Systems 32
37. Saitoh, S. & Sawano, Y. (2016). Theory of reproducing kernels and applications. Springer
38. Schölkopf, B. & Smola, A. J. (2001). Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT Press
39. Schölkopf, B., Herbrich, R., & Smola, A. J. (2001). A generalized representer theorem. In Computational Learning Theory. Lecture Notes in Computer Science (Vol. 2111). Berlin: Springer
40. Shawe-Taylor, J. & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge University Press
41. Skeide, M. (2000). Generalised matrix $C^*$-algebras and representations of Hilbert modules. Mathematical Proceedings of the Royal Irish Academy, 100A(1), 11–38
42. Smola, A. J., Gretton, A., Song, L., & Schölkopf, B. (2007). A Hilbert space embedding for distributions. In Algorithmic Learning Theory. Lecture Notes in Computer Science (Vol. 4754)
43. Sonoda, S. & Murata, N. (2017). Neural network with unbounded activation functions is universal approximator. Applied and Computational Harmonic Analysis, 43(2), 233–268
44. Sonoda, S., Ishikawa, I., & Ikeda, M. (2021). Ridge regression with overparametrized two-layer networks converge to Ridgelet spectrum. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, PMLR 130, 2674–2682
45. Sriperumbudur, B. K., Gretton, A., Fukumizu, K., Schölkopf, B., & Lanckriet, G. R. G. (2010). Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11, 1517–1561
46. Sriperumbudur, B. K., Fukumizu, K., & Lanckriet, G. R. G. (2011). Universality, characteristic kernels and RKHS embedding of measures. Journal of Machine Learning Research, 12, 2389–2410
47. Steinwart, I. (2001). On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2, 67–93
48. Szafraniec, F. H. (2010). Murphy’s positive definite kernels and Hilbert $C^*$-modules reorganized. Noncommutative Harmonic Analysis with Applications to Probability II, 89, 275–295
49. Tsivtsivadze, E., Urban, J., Geuvers, H., & Heskes, T. (2011). Semantic graph kernels for automated reasoning. In Proceedings of the 2011 SIAM International Conference on Data Mining, 795–803
50. Wang, J., & Qian, T. (2021). Orthogonalization in Clifford Hilbert modules and applications. arXiv:2103.09416
51. Ye, Y. (2017). The matrix Hilbert space and its application to matrix learning. arXiv:1706.08110v2
52. Zhang, T. (2004). Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the 21st International Conference on Machine Learning
53. Zhang, T., Yao, Z., Gholami, A., Gonzalez, J. E., Keutzer, K., Mahoney, M. W., & Biros, G. (2019). ANODEV2: A coupled neural ODE framework. In Proceedings of Advances in Neural Information Processing Systems 32
Iterative Processes and Integral Equations of the Second Kind
Sanda Micula and Gradimir V. Milovanović
Abstract Besides the general theory of operator equations and iterative processes, including existence and uniqueness of solutions, fixed point theory, local properties of iterative processes, the main theorems of the Newton-Kantorovich method, and methods for accelerating iterative processes, special attention is devoted to applications to integral equations of the second kind, including discretization by quadrature formulas. Several kinds of integral equations are reviewed and considered: nonlinear Volterra-Fredholm integral equations, mixed Volterra-Fredholm integral equations, Volterra integral equations with delayed argument, functional Volterra integral equations, and fractional integral equations.
Keywords Iterative processes • Operator equations • Integral equations of the second kind
Mathematics Subject Classification (MSC2020) Primary 47J05 • Secondary 65R20, 47J26
S. Micula
Faculty of Mathematics and Computer Science, Babeş-Bolyai University, Cluj-Napoca, Romania
e-mail: [email protected]
G. V. Milovanović (✉)
Serbian Academy of Sciences and Arts, Belgrade, Serbia
Faculty of Sciences and Mathematics, University of Niš, Niš, Serbia
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_59
1 Introduction to Operator Equations and Iterative Processes
Let $X$ and $Y$ be two Banach spaces, $D$ be a convex subset of $X$, and $F : D \to Y$ be an operator, in general nonlinear. We consider the operator equation
$$Fu = 0, \tag{1.1}$$
where $0$ is the zero vector of the space $Y$. A large number of problems in science and engineering come down to solving equations of the form (1.1). A special and important case is when $Y = X$ and $Fu = u - Tu = 0$. We mention a few typical examples.

(a) If $X = Y = \mathbb{R}$, $u = x$, and $F = f$, the nonlinear equation $f(x) = x - \cos x = 0$ and the algebraic equation
$$f(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n = 0$$
are of the form (1.1).

(b) If $X = Y = \mathbb{R}^n$, $u = x = [x_1 \ \cdots \ x_n]^T$, and
$$Fu = F(x) = \begin{bmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_n(x_1, \ldots, x_n) \end{bmatrix},$$
where $f_i : \mathbb{R}^n \to \mathbb{R}$ are given functions, the equation (1.1) represents a system of nonlinear equations
$$f_i(x_1, \ldots, x_n) = 0, \quad i = 1, \ldots, n.$$
If $F$ is affine, for example, $F(x) = Ax - b$, where the matrix $A$ and the vector $b$ are given by
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & & a_{2n} \\ \vdots & & & \\ a_{n1} & a_{n2} & & a_{nn} \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix},$$
respectively, then the equation (1.1) reduces to a system of linear algebraic equations
$$a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n = b_i, \quad i = 1, \ldots, n.$$

(c) In the case $X = C^2[a, b]$, $Y = C[a, b] \times \mathbb{R}$, $u \equiv u(t)$,
$$Fu = \begin{bmatrix} f_1(u) \\ f_2(u) \end{bmatrix},$$
and
$$f_1(u)(t) = u''(t) - f(t, u(t), u'(t)) \quad (t \in [a, b]), \qquad f_2(u) = g(u(a), u(b)),$$
where $f : \mathbb{R}^3 \to \mathbb{R}$ and $g : \mathbb{R}^2 \to \mathbb{R}$ are given functions, the operator equation (1.1) represents the boundary value problem
$$u''(t) = f(t, u(t), u'(t)) \quad (t \in [a, b]), \qquad g(u(a), u(b)) = 0.$$

(d) Let $X = C([a, b])$, $K_1, K_2 \in C([a, b]^2 \times \mathbb{R})$, $g \in C([a, b])$, and let the operator $T : X \to X$ be defined by
$$(Tu)(t) = g(t) + \int_a^t K_1(t, x, u(x))\, \mathrm{d}x + \int_a^b K_2(t, x, u(x))\, \mathrm{d}x, \tag{1.2}$$
where $t \in [a, b]$. In this case we have the operator equation
$$u(t) = (Tu)(t), \quad t \in [a, b],$$
which is, in fact, a nonlinear Volterra-Fredholm integral equation of the second kind.

All the equations mentioned, as well as a number of others, can be treated in a unified way. That is why the subject of this chapter is solving the operator equation (1.1), i.e., finding a point $u \in D$ which satisfies (1.1). This unified approach is applied to the equation
$$u = Tu, \tag{1.3}$$
where the operator $T$ maps $D$ to $D$ and $Tu = H(u, Fu)$, with an operator $H : D \times Y \to D$. It is clear that for a given equation (1.1), the form (1.3) is not unique, as the following example shows. A simple equation $f(x) = 0$ can be represented in the equivalent form
$$x = x + \lambda f(x) \tag{1.4}$$
for each $\lambda$ different from zero, but there are many other equivalent forms different from (1.4).
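To make example (d) concrete, the operator in (1.2) can be approximated on a uniform grid by a quadrature rule. The sketch below is only one possible discretization, chosen for illustration: it assumes the composite trapezoidal rule, and the kernels `K1`, `K2` and the function `g` form a hypothetical test problem on $[0, 1]$ built so that $u(t) = t$ is an exact solution of $u = Tu$.

```python
import numpy as np

def trapezoid_weights(m, h):
    """Weights of the composite trapezoidal rule on m + 1 equally spaced nodes."""
    w = np.full(m + 1, h)
    w[0] = w[-1] = h / 2.0
    return w

def apply_T(u, t, g, K1, K2):
    """Discretised (Tu)(t_i) = g(t_i) + int_a^{t_i} K1 dx + int_a^b K2 dx."""
    m = len(t) - 1
    h = t[1] - t[0]
    w_full = trapezoid_weights(m, h)
    Tu = np.empty_like(u)
    for i, ti in enumerate(t):
        fredholm = np.sum(w_full * K2(ti, t, u))
        if i == 0:
            volterra = 0.0                       # integral over [a, a]
        else:
            w_part = trapezoid_weights(i, h)     # nodes t_0, ..., t_i
            volterra = np.sum(w_part * K1(ti, t[: i + 1], u[: i + 1]))
        Tu[i] = g(ti) + volterra + fredholm
    return Tu

# Hypothetical test problem on [0, 1] with exact solution u(t) = t:
# int_0^t x dx = t^2 / 2 and int_0^1 t * x * x dx = t / 3.
def K1(t, x, u):
    return u

def K2(t, x, u):
    return t * x * u

def g(t):
    return t - t ** 2 / 2.0 - t / 3.0

t = np.linspace(0.0, 1.0, 201)
u_exact = t.copy()
residual = np.max(np.abs(apply_T(u_exact, t, g, K1, K2) - u_exact))
```

The residual `residual` measures how well the exact solution satisfies the discretised equation; for the trapezoidal rule it is expected to shrink like the square of the grid spacing.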
1.1 Iterative Processes

One of the ways to solve the equation (1.3), as one form of (1.1), is to construct the sequence $\{u_k\}_{k \in \mathbb{N}_0}$ as
\[ u_{k+1} = Tu_k, \qquad k = 0, 1, \ldots, \tag{1.5} \]
starting from some point $u_0 \in D$. Under certain conditions on the operator $T$, the sequence $\{u_k\}_{k \in \mathbb{N}_0}$ can converge to the desired solution. The formula (1.5) is known as an iterative process.

Remark 1.1 Besides iterative processes of the form (1.5), one can consider more general processes, the so-called iterative processes with memory
\[ u_{k+1} = S(u_k, u_{k-1}, \ldots, u_{k-m+1}), \qquad k = m-1, m, \ldots, \]
where $S : X^m \to X$. Such a process, with a memory of length $m$, needs $m$ starting points $u_0, u_1, \ldots, u_{m-1} \in D$.
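The process (1.5), combined with the rewriting (1.4), can be illustrated by a minimal sketch. The test function $f(x) = x^2 - 2$, the value $\lambda = -1/4$, the stopping rule, and the helper name `fixed_point` are assumptions chosen for illustration only; they do not come from the text.

```python
import math

def fixed_point(T, u0, tol=1e-12, max_iter=200):
    """Iterate u_{k+1} = T(u_k) until two successive iterates differ by less than tol."""
    u = u0
    for k in range(1, max_iter + 1):
        u_next = T(u)
        if abs(u_next - u) < tol:
            return u_next, k
        u = u_next
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical test problem: f(x) = x^2 - 2, rewritten as x = x + lam * f(x), cf. (1.4).
f = lambda x: x * x - 2.0
lam = -0.25                      # illustrative choice; then |T'(x)| < 1 near sqrt(2)
T = lambda x: x + lam * f(x)

root, iterations = fixed_point(T, u0=1.0)
print(root, iterations)
```

With this $\lambda$ the derivative of $T$ at $\sqrt{2}$ is about 0.29, so the convergence is linear; a different $\lambda$ can make the same equation converge faster, slower, or not at all, which is the practical content of the remark that the form (1.3) is not unique.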
1.2 Existence and Uniqueness of Solutions. Fixed Point Theory

Let $(X, \|\cdot\|)$ be a Banach space and (1.5) be an iterative process converging to $u^* \in X$, so that $u^* = Tu^*$. It means that there exists a point $u^* \in X$ such that
\[ \lim_{k \to +\infty} \|u_k - u^*\| = 0. \]
Such $u^* \in X$ is a fixed point of the operator $T$. The fixed point $u^* \in X$ is a solution of the previous equation (1.3). To discuss solvability and other properties of the operator equations, let us recall the main results of the fixed point theory [5] on a Banach space.

Definition 1.2 Let $(X, \|\cdot\|)$ be a Banach space. A nonlinear operator $T : X \to X$ is a $q$-contraction if $0 \le q < 1$ and
\[ \|Tu - Tv\| \le q\,\|u - v\| \qquad (\forall u, v \in X). \]
A classical result is the contraction principle on a Banach space.

Theorem 1.3 Let $(X, \|\cdot\|)$ be a Banach space and $T : X \to X$ be a $q$-contraction. Then
(a) The equation $u = Tu$ has exactly one solution $u^* \in X$.
(b) The iterative process $u_{k+1} = Tu_k$, $k = 0, 1, \ldots$, converges to the solution $u^*$ for any arbitrary choice of the initial point $u_0 \in X$.
(c) The error estimate
\[ \|u_k - u^*\| \le \frac{q^k}{1-q}\,\|Tu_0 - u_0\| \]
holds for each $k \in \mathbb{N}$.

A stronger fixed point result can be formulated in the following form (see Altman [2]).

Theorem 1.4 Let $(X, \|\cdot\|)$ be a Banach space and $T : X \to X$ be a $q$-contraction. Let $\{\varepsilon_k\}_{k=0}^{+\infty}$ be a sequence of positive numbers such that $\varepsilon_k \le 1$ and $\sum_{k=0}^{+\infty} \varepsilon_k = +\infty$. Then
(a) The equation $u = Tu$ has exactly one solution $u^* \in X$.
(b) The iterative process
\[ u_{k+1} = (1 - \varepsilon_k)u_k + \varepsilon_k Tu_k, \qquad k = 0, 1, \ldots, \]
converges to $u^*$ for any arbitrary choice of the initial point $u_0 \in X$.
(c) The error estimate
\[ \|u_k - u^*\| \le \frac{e^{1-q}}{1-q}\,\|Tu_0 - u_0\|\,e^{-(1-q)v_k} \]
holds for each $k \in \mathbb{N}$, where $v_0 = 0$ and $v_k = \sum_{\nu=0}^{k-1} \varepsilon_\nu$, $k \ge 1$.

Remark 1.5 The above results remain valid if instead of the entire space $X$, we consider any closed subset $Y \subset X$, satisfying $T(Y) \subseteq Y$. Many times, such results are useful if applied on a closed ball $B_\varrho = \{u \in X : \|u - u_0\| \le \varrho\}$, for a suitable point $u_0 \in X$. This issue is addressed in more detail in the next section.
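The relaxed process of Theorem 1.4 and its error estimate can be checked numerically on a scalar example. The sketch below assumes the hypothetical contraction $T(u) = \tfrac12 \cos u$ (so $q = 1/2$) and the choice $\varepsilon_k = 1/(k+1)$; the helper names are illustrative, and the reference fixed point is itself computed by plain Picard iteration.

```python
import math

def relaxed_iteration(T, u0, eps, n_steps):
    """Return [u_0, ..., u_n] for u_{k+1} = (1 - eps_k) u_k + eps_k T(u_k)."""
    us = [u0]
    for k in range(n_steps):
        e = eps(k)
        us.append((1.0 - e) * us[-1] + e * T(us[-1]))
    return us

def altman_bound(q, residual0, eps, k):
    """Right-hand side of Theorem 1.4(c), with v_k = eps_0 + ... + eps_{k-1}."""
    v_k = sum(eps(nu) for nu in range(k))
    return math.exp(1.0 - q) / (1.0 - q) * residual0 * math.exp(-(1.0 - q) * v_k)

# Hypothetical scalar example: T(u) = cos(u)/2 is a q-contraction on R with q = 1/2.
T = lambda u: 0.5 * math.cos(u)
q = 0.5
eps = lambda k: 1.0 / (k + 1)     # eps_k <= 1 and the series of eps_k diverges

u0 = 0.0
us = relaxed_iteration(T, u0, eps, 50)

# Reference fixed point obtained from many plain Picard steps (eps_k = 1).
u_star = u0
for _ in range(200):
    u_star = T(u_star)

errors = [abs(u - u_star) for u in us]
bounds = [altman_bound(q, abs(T(u0) - u0), eps, k) for k in range(len(us))]
```

Comparing `errors` with `bounds` entry by entry shows the estimate (c) in action; for this particular example the bound is far from sharp.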
1.3 Local Properties of Iterative Processes

Now we consider some local properties of the iterative process (1.5). Let $u^* \in X$ be a fixed point of the operator $T : X \to X$ and let $U$ be a convex neighborhood of the limit point $u^*$. The iterative process (1.5) is of order $r$ ($\ge 1$) if
\[ \|Tu - u^*\| = O(\|u - u^*\|^r) \qquad (u \in U). \]

Theorem 1.6 If the operator $T$ is $r$-times differentiable in Fréchet's sense on $U$, then the iterative process (1.5) is of the order $r$ if and only if the following conditions are satisfied:
(1) $Tu^* = u^*$;
(2) $T'(u^*), T''(u^*), \ldots, T^{(r-1)}(u^*)$ are zero operators;
(3) $T^{(r)}(u)$ is a nonzero operator, with a norm bounded on $U$.

For the proof of this theorem see, for example, the book by Collatz [9, p. 291].

The best-known iterative process of the second order is the Newton-Kantorovich method
\[ u_{k+1} = u_k - [F'(u_k)]^{-1} Fu_k, \qquad k = 0, 1, \ldots, \tag{1.6} \]
for solving the equation $Fu = 0$. This fundamental extension of the well-known Newton method to function spaces was given by L.V. Kantorovich$^1$ in 1948 (see Kantorovich [15] and also the book Kantorovich and Akilov [16]). Here $F'(u_k)$ is the Fréchet derivative of the nonlinear operator $F$ at the point $u_k$, and $[F'(u_k)]^{-1}$ is its inverse. This is one of the fundamental techniques in functional analysis and numerical analysis.
Theorem 1.7 (Kantorovich [15]) Assume that the operator $F$ is defined and twice continuously differentiable on a ball $B = \{u : \|u - u_0\| \le \varrho\}$, the linear operator $F'(u_0)$ is invertible, $\|[F'(u_0)]^{-1} Fu_0\| \le \eta$, $\|[F'(u_0)]^{-1} F''(u)\| \le K$ $(u \in B)$, and
\[ h = K\eta < \frac{1}{2}, \qquad \varrho \ge \frac{1 - \sqrt{1 - 2h}}{h}\,\eta. \]
Then the equation $Fu = 0$ has a solution $u^* \in B$, the iterative process (1.6) is well-defined and converges to $u^*$ with quadratic rate
\[ \|u_k - u^*\| \le \frac{\eta}{h\,2^k}\,(2h)^{2^k}. \]
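To see how the quantities in Theorem 1.7 are obtained in practice, the following sketch evaluates $\eta$, $K$, $h$, the smallest admissible radius, and the error bound for a scalar example. The function $F(u) = u^2 - 2$ and the starting point $u_0 = 1.5$ are assumptions made for illustration; in one dimension the operator norms reduce to absolute values.

```python
import math

# Hypothetical scalar example F(u) = u^2 - 2 with starting point u0 = 1.5.
F = lambda u: u * u - 2.0
dF = lambda u: 2.0 * u
d2F = lambda u: 2.0           # constant second derivative

u0 = 1.5
eta = abs(F(u0) / dF(u0))     # |[F'(u0)]^{-1} F u0|
K = abs(d2F(u0) / dF(u0))     # |[F'(u0)]^{-1} F''(u)|, here independent of u
h = K * eta
rho_min = (1.0 - math.sqrt(1.0 - 2.0 * h)) / h * eta   # smallest admissible radius

def kantorovich_bound(k):
    """Bound eta / (h 2^k) * (2h)^(2^k) from Theorem 1.7."""
    return eta / (h * 2 ** k) * (2.0 * h) ** (2 ** k)

u, errors = u0, []
for k in range(4):
    errors.append(abs(u - math.sqrt(2.0)))
    u = u - F(u) / dF(u)      # Newton step (1.6) in one dimension
bounds = [kantorovich_bound(k) for k in range(4)]
```

Only the first four iterates are compared, because beyond that both the error and the bound fall below double-precision rounding.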
There exist numerous versions of Kantorovich's theorem, which differ in assumptions and results (cf. Polyak [29] and references therein, as well as the books Kantorovich and Akilov [16], Krasnoselski et al. [17], Ortega and Rheinboldt [28]). We mention just one of them, due to Ivan Petrovich Mysovskikh.

Theorem 1.8 (Mysovskikh [27]) Assume that the operator $F$ is defined and twice continuously differentiable on a ball $B = \{u : \|u - u_0\| \le \varrho\}$, the linear operator $F'(u_0)$ is invertible, $\|[F'(u)]^{-1}\| \le \beta$, $\|F''(u)\| \le K$ $(u \in B)$, $\|Fu_0\| \le \eta$, and
\[ h = K\beta^2\eta < 2, \qquad \varrho \ge \beta\eta \sum_{\nu=0}^{+\infty} (h/2)^{2^\nu - 1}. \]
Then the equation $Fu = 0$ has a solution $u^* \in B$, the iterative process (1.6) is well-defined and converges to $u^*$ with quadratic rate
\[ \|u_k - u^*\| \le \frac{\beta\eta\,(h/2)^{2^k - 1}}{1 - (h/2)^{2^k}}. \]

$^1$ Leonid Vitalyevich Kantorovich (1912–1986) was a famous Soviet mathematician and economist, the winner of the Nobel Prize in economic sciences in 1975 for his theory and development of techniques for the optimal allocation of resources.
Taking $F'(u_0)$ instead of $F'(u_k)$ in (1.6), we get the iterative process
\[ u_{k+1} = u_k - [F'(u_0)]^{-1} Fu_k, \qquad k = 0, 1, \ldots, \]
of the first order. There are several approaches for modifying the Newton-Kantorovich method in order to achieve global convergence. The simplest way is the so-called damped Newton-Kantorovich method
\[ u_{k+1} = u_k - \gamma_k [F'(u_k)]^{-1} Fu_k, \qquad k = 0, 1, \ldots, \]
where $\gamma_k$ ($0 < \gamma_k \le 1$) is chosen so that $\|Fu_k\| < \|Fu_{k-1}\|$. This kind of minimization enables a balance between convergence and order of convergence. Here we mention also the Levenberg-Marquardt method (cf. Polyak [29])
\[ u_{k+1} = u_k - [\gamma_k I + F'(u_k)]^{-1} Fu_k, \qquad k = 0, 1, \ldots, \]
which reduces to the Newton-Kantorovich method for $\gamma_k = 0$.
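The difference between (1.6), the variant with the frozen derivative $F'(u_0)$, and the damped variant can be seen on scalar examples. In the sketch below, the function name `newton`, the halving strategy for $\gamma_k$, the stopping rule on $|F(u)|$, and both test functions are assumptions for illustration; the text does not prescribe how $\gamma_k$ is chosen beyond the decrease condition.

```python
import math

def newton(F, dF, u0, tol=1e-12, max_iter=100, freeze=False, damped=False):
    """Scalar sketch of (1.6) and two variants.

    freeze=True  uses F'(u0) in every step (first-order variant);
    damped=True  halves gamma_k until |F| decreases.
    Returns the approximation and the number of steps taken.
    """
    u, slope0 = u0, dF(u0)
    for k in range(max_iter):
        if abs(F(u)) < tol:
            return u, k
        step = F(u) / (slope0 if freeze else dF(u))
        gamma = 1.0
        if damped:
            while abs(F(u - gamma * step)) >= abs(F(u)) and gamma > 1e-10:
                gamma *= 0.5
        u = u - gamma * step
    return u, max_iter

# Hypothetical test functions.
F2, dF2 = (lambda u: u * u - 2.0), (lambda u: 2.0 * u)
r_newton, k_newton = newton(F2, dF2, 1.5)
r_frozen, k_frozen = newton(F2, dF2, 1.5, freeze=True)

# For F = arctan the undamped method started at 1.5 overshoots; damping repairs this.
r_damped, k_damped = newton(math.atan, lambda u: 1.0 / (1.0 + u * u), 1.5, damped=True)
```

The frozen variant needs noticeably more steps than (1.6) on the same problem, which reflects its first-order convergence.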
1.4 Acceleration of Iterative Processes

Having an iterative process of order $r$, we can obtain a process of higher order (see Jovanović [14], Milovanović [24], Simeunović [30], [31]).

Theorem 1.9 (Jovanović [14]) Let (1.5) be an iterative method of the order $r$ and the operator $T$ be $(r+1)$-times differentiable in the sense of Fréchet on $U$. If we suppose that the inverse operator $\bigl[I - \tfrac{1}{r}T'(u)\bigr]^{-1}$ exists for $u \in U$, then the iterative process
\[ u_{k+1} = u_k - \Bigl[I - \frac{1}{r}T'(u_k)\Bigr]^{-1}(u_k - Tu_k) \]
is at least of the order $r + 1$.

Theorem 1.10 (Milovanović [24]) Let (1.5) be an iterative method of the order $r \ge 2$ and the operator $T$ be $(r+1)$-times differentiable in the sense of Fréchet on $U$. Then the iterative process
\[ u_{k+1} = Tu_k - \frac{1}{r}T'(u_k)(u_k - Tu_k) \]
is at least of the order $r + 1$.

Applying Theorem 1.9 to the iterative process (1.5) of the order $r = 1$ (with a linear convergence), $u_{k+1} = Tu_k$, we get the iterative process
\[ u_{k+1} = u_k - [I - T'(u_k)]^{-1}(u_k - Tu_k), \qquad k = 0, 1, \ldots, \]
of the order at least two (quadratic convergence). It is exactly the Newton-Kantorovich method
\[ u_{k+1} = u_k - [F'(u_k)]^{-1} Fu_k, \qquad k = 0, 1, \ldots, \]
where $Fu = u - Tu = 0$. Applying Theorem 1.10 to the Newton-Kantorovich method of the second order ($r = 2$),
\[ u_{k+1} = u_k - [F'(u_k)]^{-1} Fu_k, \qquad k = 0, 1, \ldots, \]
we obtain an iterative method of the third order $u_{k+1} = \Phi u_k$, $k = 0, 1, \ldots$, where the operator $\Phi$ is given by
\[ \Phi u = u - [F'(u)]^{-1}Fu - \frac{1}{2}[F'(u)]^{-1}F''(u)\bigl([F'(u)]^{-1}Fu,\ [F'(u)]^{-1}Fu\bigr), \]
supposing the existence of certain higher derivatives of $F$. Similarly, an application of Theorem 1.9 to the Newton-Kantorovich method gives also a method of the third order (method of tangent hyperbolas), considered in 1961 by Altman [1].
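In the scalar case the two third-order methods above take a familiar shape: the operator $\Phi$ becomes $x - f/f' - \tfrac12 f'' f^2/(f')^3$, and the method of tangent hyperbolas becomes $x - (f/f')/(1 - \tfrac12 f f''/(f')^2)$. The following sketch, with the assumed test function $f(x) = x^2 - 2$ and illustrative helper names, records the errors of a few steps of each.

```python
import math

def chebyshev_step(f, df, d2f, x):
    """Scalar form of Phi: Newton step plus the second-order correction."""
    n = f(x) / df(x)                      # [F'(u)]^{-1} F u
    return x - n - 0.5 * d2f(x) * n * n / df(x)

def halley_step(f, df, d2f, x):
    """Scalar method of tangent hyperbolas: u - [I - T'(u)/2]^{-1} (u - Tu)."""
    n = f(x) / df(x)                      # u - Tu for the Newton map T
    return x - n / (1.0 - 0.5 * d2f(x) * n / df(x))

f = lambda x: x * x - 2.0      # hypothetical test function
df = lambda x: 2.0 * x
d2f = lambda x: 2.0

def errors(step, x0, n_steps=3):
    """Absolute errors after each of n_steps applications of the given step."""
    x, out = x0, []
    for _ in range(n_steps):
        x = step(f, df, d2f, x)
        out.append(abs(x - math.sqrt(2.0)))
    return out

cheb_err = errors(chebyshev_step, 1.5)
halley_err = errors(halley_step, 1.5)
```

In both lists the second error is much smaller than the square of the first, which is what one expects from a process of order three.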
2 Applications to Integral Equations of the Second Kind: Discretization

Integral equations play a significant role in applied mathematics, since they arise in many applications in areas of physics, engineering, biology, hydrodynamics, thermodynamics, elasticity, quantum mechanics, etc. They represent an important tool for modeling the progress of an epidemic and various other biological problems. Also, many reformulations of initial and boundary value problems for partial differential equations can be written as integral equations. As such, finding numerical solutions and approximations of the true solution at a discrete set of points is an important task for researchers. Iterative methods are particularly suitable, as they not only guarantee the existence of a unique solution (under certain conditions) but they also provide means for finding approximate solutions, via successive approximations. We present several types of integral equations of the second kind and various numerical iterative methods that produce good approximations for their solutions.
2.1 Nonlinear Volterra-Fredholm Integral Equations

We consider nonlinear Volterra-Fredholm integral equations of the second kind
\[ u(t) = g(t) + \int_a^t K_1(t, x, u(x))\,dx + \int_a^b K_2(t, x, u(x))\,dx, \qquad t \in [a,b], \tag{2.1} \]
where $K_1, K_2 \in C([a,b]^2 \times \mathbb{R})$ and $g \in C([a,b])$. There are several methods for solving this equation, especially for linear equations. We employ fixed point theory for this kind of equations, and therefore, we take $X = C([a,b])$, equipped with the usual norm $\|u\| = \max_{t \in [a,b]} |u(t)|$, and define the integral operator $T : X \to X$ as in (1.2), i.e.,
\[ (Tu)(t) = g(t) + \int_a^t K_1(t, x, u(x))\,dx + \int_a^b K_2(t, x, u(x))\,dx. \]
In this way we get the operator equation $u(t) = (Tu)(t)$, $t \in [a,b]$. Several variants of the integral operator $T$ have appeared in recent papers. The equation just mentioned was considered in Micula [23]. The method is based on Picard iteration and uses a suitable quadrature formula (composite trapezoidal rule):
\[ \int_a^b f(t)\,dt = h\Bigl\{\frac{1}{2}f(a) + \sum_{\nu=1}^{n} f(\tau_\nu) + \frac{1}{2}f(b)\Bigr\} + R_{n+2}(f), \tag{2.2} \]
where $h = (b-a)/(n+1)$, $\tau_\nu = a + h\nu$, $\nu = 0, 1, \ldots, n, n+1$, and the remainder term ($= 0$ for all $f \in \mathcal{P}_1$) is
\[ R_{n+2}(f) = -\frac{h^2}{12}(b-a)f''(\xi), \qquad \xi \in (a,b). \]
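A direct transcription of the rule (2.2) makes the role of $n$ and $h$ explicit. In this sketch the function name `composite_trapezoid` and the test integrand $\cos t$ on $[0, \pi/2]$ are illustrative assumptions.

```python
import math

def composite_trapezoid(f, a, b, n):
    """Rule (2.2): n interior nodes, h = (b - a)/(n + 1), end values halved."""
    h = (b - a) / (n + 1)
    interior = sum(f(a + h * nu) for nu in range(1, n + 1))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))

exact = 1.0   # integral of cos over [0, pi/2]
err_8 = abs(composite_trapezoid(math.cos, 0.0, math.pi / 2, 8) - exact)
err_17 = abs(composite_trapezoid(math.cos, 0.0, math.pi / 2, 17) - exact)
print(err_8 / err_17)
```

Going from $n = 8$ to $n = 17$ halves $h$, so the printed ratio is close to 4, in agreement with the factor $h^2$ in the remainder term.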
The existence and uniqueness of the solution, as well as the error estimates in the approximate solutions, were given under certain conditions, which ensure the application of the fixed point theory. The basic idea is an approximation of the equation $u(t) = (Tu)(t)$ by
\[ \tilde u(t) = (\tilde T \tilde u)(t) \qquad (t \in [a,b]), \]
usually on a discrete set of points in $[a,b]$, e.g.,
\[ a = \tau_0 < \tau_1 < \cdots < \tau_n < \tau_{n+1} = b. \]
Such discretization leads to the determination of a sequence of vectors at the points $\tau = (\tau_0, \tau_1, \ldots, \tau_n, \tau_{n+1})$,
\[ \tilde u_k = (\tilde u_k(\tau_0), \tilde u_k(\tau_1), \ldots, \tilde u_k(\tau_n), \tilde u_k(\tau_{n+1})), \qquad k = 1, 2, \ldots, \]
starting from some $\tilde u_0$. Here, $\tilde u_k = \tilde u_k(\tau) \in \mathbb{R}^{n+2}$. Then, the iterative process
\[ \tilde u_{k+1} = \tilde T \tilde u_k, \qquad k = 0, 1, \ldots, \]
should converge to the solution of the equation $\tilde u(\tau) = (\tilde T \tilde u)(\tau)$, denoted by $\tilde u^* = \tilde u^*(\tau)$, and this solution should be close enough to the solution of the equation $u(t) = (Tu)(t)$ at the $(n+2)$ points $\tau = (\tau_0, \tau_1, \ldots, \tau_n, \tau_{n+1})$. Denote this (discrete) solution by $u^* = u^*(\tau)$. Using the uniform norm of vectors in $\mathbb{R}^{n+2}$, we have that
\[ \|\tilde u_k - u^*\| = \|\tilde u_k - \tilde u^* + \tilde u^* - u^*\| \le \|\tilde u_k - \tilde u^*\| + \|\tilde u^* - u^*\|. \]
The first term depends on the iterative process and its speed, and the second one depends on the approximation of integrals by the quadrature formulas. If the kernels $K_1$ and $K_2$ satisfy Lipschitz conditions with respect to the third argument, with constants $L_1$ and $L_2$, respectively, such that $q = (b-a)(L_1 + L_2) < 1$, and the (weight) coefficients of the quadrature formula $A_\nu$, $\nu = 0, 1, \ldots, n+1$, are such that
\[ \gamma = (L_1 + L_2)\sum_{\nu=0}^{n+1} |A_\nu| < 1, \]
then
\[ \|\tilde u_k - u^*\| \le \frac{M_1 q^k}{1-q} + \frac{M_2}{1-\gamma}, \tag{2.3} \]
for some positive constants $M_1$ and $M_2$. For the composite trapezoidal formula, it was proved in Micula [23] that
\[ \gamma = q = (b-a)(L_1 + L_2), \qquad \frac{M_2}{1-\gamma} = O(h^2), \qquad h = \frac{b-a}{n+1}, \tag{2.4} \]
and a few examples were presented there to illustrate the method based on Picard iteration with this composite rule.

Here we propose the following approximation of the integral equation (2.1): (i) at $t = \tau_0 = a$ by
\[ \tilde u_{k+1}(\tau_0) = g(\tau_0) + \sum_{\nu=0}^{n+1} A_\nu^{(n+1)} K_2(\tau_0, \tau_\nu, \tilde u_k(\tau_\nu)) \tag{2.5} \]
and (ii) at $t = \tau_j$, $j = 1, \ldots, n, n+1$, by
\[ \tilde u_{k+1}(\tau_j) = g(\tau_j) + \sum_{\nu=0}^{n+1} A_\nu^{(n+1)} K_2(\tau_j, \tau_\nu, \tilde u_k(\tau_\nu)) + \sum_{\nu=0}^{n+1} A_\nu^{(j)} K_1(\tau_j, \tau_\nu, \tilde u_k(\tau_\nu)), \tag{2.6} \]
with starting function $\tilde u_0(t) = g(t)$. These quadrature formulas should be of degree of precision at least $n + 1$.
2.2 Construction of Interpolatory Quadrature Formulas for Volterra and Fredholm Parts

For simplicity, we construct here interpolatory quadrature formulas of closed type on the interval $[0, 1]$, with arbitrary $n$ internal nodes,
\[ 0 = \tau_0 < \tau_1 < \cdots < \tau_n < \tau_{n+1} = 1. \]
Such formulas are exact for all algebraic polynomials of degree at most $n + 1$. We consider only non-weighted formulas ($w(t) = 1$). For any other finite interval $[a,b]$, such quadratures can be obtained by a simple linear transformation $t \mapsto a + (b-a)t$. Thus, we consider
\[ \int_0^1 f(t)\,dt = \sum_{\nu=0}^{n+1} A_\nu f(\tau_\nu) + R_{n+2}(f) \tag{2.7} \]
and for each $j = 1, \ldots, n$,
\[ \int_0^{\tau_j} f(t)\,dt = \sum_{\nu=0}^{n+1} A_\nu^{(j)} f(\tau_\nu) + R_{n+2}^{(j)}(f), \tag{2.8} \]
where for $j = n + 1$, $A_\nu^{(n+1)} \equiv A_\nu$ and $R_{n+2}^{(n+1)}(f) = R_{n+2}(f)$. Here, for each $f \in \mathcal{P}_{n+1}$, we have
\[ R_{n+2}^{(j)}(f) = 0, \qquad j = 1, 2, \ldots, n, n+1. \]

Remark 2.1 It is easy to transform these formulas to a general interval $[a,b]$ by a linear transformation.

In order to construct these quadrature rules, we start with the Lagrange polynomial of degree $\le n + 1$ at the selected nodes
\[ L_{n+1}(f; t) = \sum_{\nu=0}^{n+1} f(\tau_\nu)\,\frac{\omega(t)}{(t - \tau_\nu)\,\omega'(\tau_\nu)}, \]
where $\omega$ is the node polynomial
\[ \omega(t) = (t - \tau_0)(t - \tau_1)\cdots(t - \tau_n)(t - \tau_{n+1}) = t(t-1)(t - \tau_1)\cdots(t - \tau_n), \tag{2.9} \]
so that $f(t) = L_{n+1}(f; t) + r(f; t)$ and $r(\mathcal{P}_{n+1}; t) \equiv 0$. Integrating over $[0, \tau_j]$, $j = 1, \ldots, n, n+1$, we get
\[ \int_0^{\tau_j} f(t)\,dt = \int_0^{\tau_j} L_{n+1}(f; t)\,dt + R_{n+2}^{(j)}(f), \]
where $R_{n+2}^{(j)}(f) = 0$ for each $f \in \mathcal{P}_{n+1}$. Taking the expression for the Lagrange polynomial, we obtain
\[ \int_0^{\tau_j} L_{n+1}(f; t)\,dt = \sum_{\nu=0}^{n+1} A_\nu^{(j)} f(\tau_\nu), \]
where
\[ A_\nu^{(j)} = \frac{1}{\omega'(\tau_\nu)} \int_0^{\tau_j} \frac{\omega(t)}{t - \tau_\nu}\,dt, \qquad \nu = 0, 1, \ldots, n, n+1, \]
for each $j = 1, \ldots, n, n+1$, and $\omega(t)$ is defined in (2.9). Thus,
\[ A_\nu^{(j)} = \int_0^{\tau_j} \prod_{\substack{i=0 \\ i \ne \nu}}^{n+1} \frac{t - \tau_i}{\tau_\nu - \tau_i}\,dt, \qquad \nu = 0, 1, \ldots, n, n+1. \]
After changing the variables (for fixed $j \in \{1, 2, \ldots, n, n+1\}$)
\[ t = \tau_j \xi \quad\text{and}\quad \tau_\nu = \tau_j \xi_\nu, \qquad \nu = 0, 1, \ldots, n, n+1, \]
so that $\xi_\nu = \tau_\nu/\tau_j$, $\nu = 0, 1, \ldots, n, n+1$, i.e.,
\[ \xi_0 = 0, \ \ldots, \ \xi_j = 1, \qquad \xi_\nu > 1 \ (\nu > j), \]
we get the following:
\[ A_\nu^{(j)} = \tau_j \int_0^1 \prod_{\substack{i=0 \\ i \ne \nu}}^{n+1} \frac{\xi - \xi_i}{\xi_\nu - \xi_i}\,d\xi, \qquad \nu = 0, 1, \ldots, n, n+1; \ j = 1, 2, \ldots, n, n+1, \tag{2.10} \]
in the quadrature formulas for each $j = 1, 2, \ldots, n, n+1$,
\[ \int_0^{\tau_j} f(t)\,dt = \sum_{\nu=0}^{n+1} A_\nu^{(j)} f(\tau_\nu) + R_{n+2}^{(j)}(f), \qquad R_{n+2}^{(j)}(\mathcal{P}_{n+1}) = 0. \tag{2.11} \]
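The coefficients $A_\nu^{(j)}$ can be computed exactly whenever the nodes are rational, by expanding each Lagrange basis polynomial in powers of $t$ and integrating term by term over $[0, \tau_j]$. The sketch below does this with Python's `fractions` module for equidistant nodes with $n = 5$; the helper names `lagrange_basis_coeffs` and `weights` are illustrative, and the monomial expansion is just one convenient way to evaluate the integral.

```python
from fractions import Fraction

def lagrange_basis_coeffs(nodes, nu):
    """Coefficients (ascending powers of t) of prod_{i != nu} (t - tau_i)/(tau_nu - tau_i)."""
    coeffs = [Fraction(1)]
    for i, tau_i in enumerate(nodes):
        if i == nu:
            continue
        denom = nodes[nu] - tau_i
        new = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += -tau_i * c / denom      # constant part of (t - tau_i)/denom
            new[k + 1] += c / denom           # part multiplied by t
        coeffs = new
    return coeffs

def weights(nodes, j):
    """Row (A_0^{(j)}, ..., A_{n+1}^{(j)}): integrals of the basis polynomials over [0, tau_j]."""
    upper = nodes[j]
    row = []
    for nu in range(len(nodes)):
        c = lagrange_basis_coeffs(nodes, nu)
        row.append(sum(ck * upper ** (k + 1) / (k + 1) for k, ck in enumerate(c)))
    return row

# Equidistant nodes tau_nu = nu/(n + 1) with n = 5.
uniform_nodes = [Fraction(nu, 6) for nu in range(7)]
A = [weights(uniform_nodes, j) for j in range(1, 7)]   # rows j = 1, ..., n + 1
print(A[-1])   # last row: weights of the closed formula on [0, 1]
```

Since constants are integrated exactly, the entries of row $j$ sum to $\tau_j$, which gives a quick consistency check of any computed table of coefficients.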
Using the error in the Lagrange interpolation polynomial and supposing that $f \in C^{n+2}[0, 1]$, with $|f^{(n+2)}(t)| \le M_{n+2}$, we can get the following estimate
\[ |R_{n+2}^{(j)}(f)| \le \frac{M_{n+2}}{(n+2)!} \int_0^{\tau_j} |\omega(t)|\,dt, \qquad j = 1, 2, \ldots, n, n+1, \tag{2.12} \]
where $\omega(t)$ is defined in (2.9). Now, we give a few standard sequences for the internal nodes.

2.2.1 Uniform Distribution of Nodes

We take the internal nodes as in the Newton-Cotes formulas,
\[ \tau_\nu = \frac{\nu}{n+1}, \qquad \nu = 0, 1, \ldots, n, n+1. \]
Here, for each $j \in \{1, 2, \ldots, n, n+1\}$, we have
\[ \xi_0 = 0, \qquad \xi_\nu = \frac{\nu}{j}, \quad \nu = 1, \ldots, n, n+1, \]
and
\[ A_\nu^{(j)} = \frac{j}{n+1} \int_0^1 \prod_{\substack{i=0 \\ i \ne \nu}}^{n+1} \frac{j\xi - i}{\nu - i}\,d\xi, \qquad \nu = 0, 1, \ldots, n, n+1. \]
These coefficients can be calculated very easily in symbolic form. For $n = 5$, i.e., for the nodes $\{\tau_\nu\}_{\nu=0}^{6} = \{0, \tfrac16, \tfrac13, \tfrac12, \tfrac23, \tfrac56, 1\}$, we obtain the weight coefficients given by the following matrix:
\[
\begin{bmatrix}
\frac{19087}{362880} & \frac{2713}{15120} & -\frac{15487}{120960} & \frac{293}{2835} & -\frac{6737}{120960} & \frac{263}{15120} & -\frac{863}{362880} \\[2pt]
\frac{1139}{22680} & \frac{47}{189} & \frac{11}{7560} & \frac{166}{2835} & -\frac{269}{7560} & \frac{11}{945} & -\frac{37}{22680} \\[2pt]
\frac{137}{2688} & \frac{27}{112} & \frac{387}{4480} & \frac{17}{105} & -\frac{243}{4480} & \frac{9}{560} & -\frac{29}{13440} \\[2pt]
\frac{143}{2835} & \frac{232}{945} & \frac{64}{945} & \frac{752}{2835} & \frac{29}{945} & \frac{8}{945} & -\frac{4}{2835} \\[2pt]
\frac{3715}{72576} & \frac{725}{3024} & \frac{2125}{24192} & \frac{125}{567} & \frac{3875}{24192} & \frac{235}{3024} & -\frac{275}{72576} \\[2pt]
\frac{41}{840} & \frac{9}{35} & \frac{9}{280} & \frac{34}{105} & \frac{9}{280} & \frac{9}{35} & \frac{41}{840}
\end{bmatrix}.
\]
The coefficients $A_\nu^{(j)}$, $\nu = 0, 1, \ldots, n+1$ (here $n = 5$), in the quadrature formula (2.8) for $j = 1, \ldots, n$ are given in the $j$-th row of the previous matrix, while the elements of the last row ($j = n + 1$) are the weight coefficients of the quadrature formula (2.7). In the $j$-th row, the weight coefficients corresponding to nodes in $[0, \tau_j]$ are the first $j + 1$ entries.

Remark 2.2 Quadrature formulas with nodes outside the interval of integration can be found in the literature, e.g., quadratures of Birkhoff-Young type (see Birkhoff and Young [6], Milovanović [25]). The so-called extended Simpson rule [32, p. 124] has also two nodes outside the interval of integration,
\[ \int_{c-h}^{c+h} f(t)\,dt = \frac{h}{90}\bigl\{114 f(c) + 34[f(c+h) + f(c-h)] - [f(c+2h) + f(c-2h)]\bigr\} + R_5^{ES}(f), \]
where
\[ |R_5^{ES}(f)| \le \frac{|h|^7}{756}\,|f^{(6)}(\xi)|, \qquad c - 2h < \xi < c + 2h, \]
supposing that $f \in C^6[c - 2h, c + 2h]$.

According to (2.12) the bounds for $|R_{n+2}^{(j)}(f)|$, when $n = 5$ and $j \in \{1, 2, \ldots, 6\}$, are respectively
\[ \frac{M_7}{7!}\left\{\frac{1375}{40310784},\ \frac{863}{20155392},\ \frac{71}{1492992},\ \frac{527}{10077696},\ \frac{2459}{40310784},\ \frac{71}{746496}\right\}, \]
where $M_7 = \max_{0 \le t \le 1} |f^{(7)}(t)|$.
2.2.2 Nodes of Lobatto Formula

We take the internal nodes as zeros of the polynomials $\pi_n(t)$, which are orthogonal on $(0, 1)$ with respect to the weight function $t \mapsto w(t) = t(1 - t)$ (see Mastroianni and Milovanović [18, p. 330]). Using the moments
\[ m_k = \int_0^1 w(t)\,t^k\,dt = \frac{1}{(k+2)(k+3)}, \qquad k = 0, 1, \ldots, \]
we can obtain the coefficients $\alpha_k$ and $\beta_k$ in the three-term recurrence relation for the orthogonal polynomials $\pi_k(t)$,
\[ \pi_{k+1}(t) = (t - \alpha_k)\pi_k(t) - \beta_k \pi_{k-1}(t), \qquad k = 0, 1, \ldots, \]
where $\pi_0(t) = 1$ and $\pi_{-1}(t) = 0$.

In this case, using the Mathematica package OrthogonalPolynomials (see Cvetković and Milovanović [10], Milovanović and Cvetković [26]), we obtain (by the routine aChebyshevAlgorithm in symbolic mode)
\[ \alpha_k = \frac{1}{2} \ (k \ge 0); \qquad \beta_0 = \frac{1}{6}, \quad \beta_k = \frac{k(k+2)}{4(2k+1)(2k+3)} \ (k \ge 1). \]
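With $\alpha_k$ and $\beta_k$ known in closed form, the polynomials $\pi_n$ can be evaluated by the recurrence alone, and their zeros located numerically. The sketch below is one simple way to do this and is not the procedure of the package mentioned above: it assumes the standard interlacing property of zeros of consecutive orthogonal polynomials to obtain brackets, then applies bisection. All function names are illustrative.

```python
def pi_poly(n, t):
    """Evaluate the monic orthogonal polynomial pi_n(t) by the three-term recurrence."""
    prev, cur = 0.0, 1.0                      # pi_{-1}, pi_0
    for k in range(n):
        alpha = 0.5
        beta = 1.0 / 6.0 if k == 0 else k * (k + 2) / (4.0 * (2 * k + 1) * (2 * k + 3))
        prev, cur = cur, (t - alpha) * cur - beta * prev
    return cur

def bisect(f, a, b, steps=80):
    """Plain bisection on [a, b], assuming f changes sign there."""
    fa = f(a)
    for _ in range(steps):
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0.0:
            b = m
        else:
            a, fa = m, fm
    return 0.5 * (a + b)

def zeros(n):
    """Zeros of pi_n in (0, 1); the zeros of pi_{k-1} bracket those of pi_k."""
    z = []
    for k in range(1, n + 1):
        brackets = [0.0] + z + [1.0]
        z = [bisect(lambda t, k=k: pi_poly(k, t), brackets[i], brackets[i + 1])
             for i in range(k)]
    return z

lobatto_nodes = [0.0] + zeros(5) + [1.0]      # tau_0, ..., tau_6 for n = 5
print(lobatto_nodes)
```

For larger $n$ the eigenvalue formulation via the Jacobi matrix is usually preferred; the bisection approach is shown here only because it needs nothing beyond the recurrence itself.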
The quadrature nodes are as follows: $\tau_0 = 0$, $\tau_{n+1} = 1$, as well as the zeros of $\pi_n(t)$: $\tau_1, \ldots, \tau_n$. These internal nodes $\tau_1, \ldots, \tau_n$ are also eigenvalues of the Jacobi matrix (cf. Mastroianni and Milovanović [18, p. 326])
\[
J_n = \begin{bmatrix}
\alpha_0 & \sqrt{\beta_1} & & & O \\
\sqrt{\beta_1} & \alpha_1 & \sqrt{\beta_2} & & \\
 & \sqrt{\beta_2} & \alpha_2 & \ddots & \\
 & & \ddots & \ddots & \sqrt{\beta_{n-1}} \\
O & & & \sqrt{\beta_{n-1}} & \alpha_{n-1}
\end{bmatrix},
\]
and they can be determined very easily (with arbitrary precision) using the mentioned package. For example, in the case $n = 5$, we have the nodes
\[
\{\tau_\nu\}_{\nu=0}^{6} = \Bigl\{0,\ \tfrac{1}{66}\bigl(33 - \sqrt{495 + 66\sqrt{15}}\bigr),\ \tfrac{1}{66}\bigl(33 - \sqrt{495 - 66\sqrt{15}}\bigr),\ \tfrac12,\ \tfrac{1}{66}\bigl(33 + \sqrt{495 - 66\sqrt{15}}\bigr),\ \tfrac{1}{66}\bigl(33 + \sqrt{495 + 66\sqrt{15}}\bigr),\ 1\Bigr\},
\]
i.e., $\{\tau_\nu\}_{\nu=0}^{6} = \{0, 0.084888052, 0.26557560, 0.5, 0.73442440, 0.91511195, 1\}$. The weight coefficients (with only $\le 4$ decimal digits to save space) are given by the following matrix:
\[
\begin{bmatrix}
0.03285 & 0.0593 & -0.0108 & 0.0056 & -0.0035 & 0.0022 & -0.00084 \\
0.01800 & 0.1577 & 0.1024 & -0.0185 & 0.0096 & -0.0057 & 0.00210 \\
0.02753 & 0.1278 & 0.2375 & 0.1219 & -0.0216 & 0.0106 & -0.00372 \\
0.02171 & 0.1441 & 0.2063 & 0.2623 & 0.1135 & -0.0193 & 0.00581 \\
0.02465 & 0.1362 & 0.2194 & 0.2382 & 0.2266 & 0.0791 & -0.00904 \\
0.02381 & 0.1384 & 0.2159 & 0.2438 & 0.2159 & 0.1384 & 0.02381
\end{bmatrix}.
\]
In this case the values of the integrals $\int_0^{\tau_j} |\omega(t)|\,dt$, for $j = 1, 2, \ldots, n, n+1$ ($n = 5$), are
\[ 6.102 \times 10^{-6},\ 2.447 \times 10^{-5},\ 5.154 \times 10^{-5},\ 7.860 \times 10^{-5},\ 9.697 \times 10^{-5},\ 1.031 \times 10^{-4}, \]
respectively.

2.2.3 Use of Chebyshev Polynomials of the First Kind

Chebyshev polynomials of the first kind, defined by $T_n(x) = \cos(n \arccos x)$, $n = 0, 1, \ldots$, are orthogonal on $[-1, 1]$ with respect to the weight function $w(x) = (1 - x^2)^{-1/2}$. We use here their transformed version $T_n(2t - 1)$ on $[0, 1]$. The graphs for $n = 5, 6, 7$ are displayed in Figure 1.

Figure 1 Chebyshev polynomials $T_n(2t - 1)$ transformed to [0, 1] for n = 5 (blue), n = 6 (red), and n = 7 (green)

As internal nodes we can use the zeros of $T_n(2t - 1)$ (cf. Mastroianni and Milovanović [18, p. 12])
\[ \tau_\nu = \frac{1}{2}\left(1 - \cos\frac{(2\nu - 1)\pi}{2n}\right) = \sin^2\frac{(2\nu - 1)\pi}{4n}, \qquad \nu = 1, \ldots, n, \]
adding then the two endpoints $\tau_0 = 0$ and $\tau_{n+1} = 1$. The case $n = 5$ is shown in Figure 2.

Figure 2 Zeros of $T_n(2t - 1)$ for n = 5 as internal nodes and additional zeros at 0 and 1

Similarly, we can use the extremal points of $T_{n+1}(2t - 1)$ (cf. Mastroianni and Milovanović [18, p. 12])
\[ \tau_\nu = \frac{1}{2}\left(1 - \cos\frac{\nu\pi}{n+1}\right) = \sin^2\frac{\nu\pi}{2(n+1)}, \qquad \nu = 0, 1, \ldots, n, n+1. \]
The case $n = 5$ is presented in Figure 3.

Figure 3 Extremal points of $T_{n+1}(2t - 1)$ for n = 5
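Both node families are given by closed formulas and are easy to generate and check. In the sketch below the function names are illustrative; `shifted_chebyshev` evaluates $T_n(2t - 1)$ from the trigonometric definition.

```python
import math

def chebyshev_zero_nodes(n):
    """0, the n zeros of T_n(2t - 1) in increasing order, and 1."""
    inner = [math.sin((2 * nu - 1) * math.pi / (4 * n)) ** 2 for nu in range(1, n + 1)]
    return [0.0] + inner + [1.0]

def chebyshev_extremal_nodes(n):
    """The n + 2 extremal points of T_{n+1}(2t - 1) on [0, 1]."""
    return [math.sin(nu * math.pi / (2 * (n + 1))) ** 2 for nu in range(n + 2)]

def shifted_chebyshev(n, t):
    """T_n(2t - 1) for t in [0, 1]."""
    return math.cos(n * math.acos(2.0 * t - 1.0))

zero_nodes = chebyshev_zero_nodes(5)
extremal_nodes = chebyshev_extremal_nodes(5)
print(zero_nodes)
print(extremal_nodes)
```

For $n = 5$ the extremal nodes are $0, (2 - \sqrt3)/4, 1/4, 1/2, 3/4, (2 + \sqrt3)/4, 1$, which follows directly from the values of $\sin^2$ at multiples of $\pi/12$.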
Remark 2.3 Standard Newton-Cotes formulas with zeros of the Chebyshev polynomials of the first and second kind are known as Fejér's rules (cf. Dahlquist and Björk [11, pp. 538–539]).

2.2.4 Nodes of the Clenshaw-Curtis Formula

This is an interesting choice of the nodes induced by the Clenshaw-Curtis formula [8] (cf. also Trefethen [33]). The nodes are the extremal points of $T_{n+1}(2t - 1)$; these polynomials are orthogonal on $(0, 1)$ with respect to the weight function $t \mapsto w(t) = 1/\sqrt{t(1 - t)}$. Since the extremal points of $T_{n+1}(t)$ are given by $-\cos(\nu\pi/(n+1))$, $\nu = 0, 1, \ldots, n, n+1$, we have
\[ \tau_\nu = \sin^2\frac{\nu\pi}{2(n+1)}, \qquad \nu = 0, 1, \ldots, n, n+1. \tag{2.13} \]
Note that $\tau_0 = 0$ and $\tau_{n+1} = 1$. In the case $n = 5$, the nodes are
\[ \{\tau_\nu\}_{\nu=0}^{6} = \Bigl\{0,\ \tfrac14\bigl(2 - \sqrt3\bigr),\ \tfrac14,\ \tfrac12,\ \tfrac34,\ \tfrac14\bigl(2 + \sqrt3\bigr),\ 1\Bigr\}, \]
i.e., $\{\tau_\nu\}_{\nu=0}^{6} = \{0, 0.06698729810778, 0.25, 0.5, 0.75, 0.9330127018922, 1\}$. The weight coefficients (with only $\le 4$ decimal digits to save space) are given by the following matrix:
\[
\begin{bmatrix}
0.02722 & 0.0433 & -0.0050 & 0.0023 & -0.0015 & 0.0012 & -0.00056 \\
0.00357 & 0.1546 & 0.1040 & -0.0183 & 0.0103 & -0.0078 & 0.00357 \\
0.02103 & 0.1116 & 0.2532 & 0.1302 & -0.0246 & 0.0154 & -0.00675 \\
0.01071 & 0.1348 & 0.2183 & 0.2786 & 0.1246 & -0.0276 & 0.01071 \\
0.01485 & 0.1258 & 0.2301 & 0.2580 & 0.2336 & 0.0837 & -0.01293 \\
0.01429 & 0.1270 & 0.2286 & 0.2603 & 0.2286 & 0.1270 & 0.01429
\end{bmatrix}.
\]
In this case the values of the integrals $\int_0^{\tau_j} |\omega(t)|\,dt$, for $j = 1, 2, \ldots, n, n+1$ ($n = 5$), are
\[ \frac{5}{1572864},\ \frac{37}{1572864},\ \frac{1}{16384},\ \frac{155}{1572864},\ \frac{187}{1572864},\ \frac{1}{8192}, \]
respectively. These quadratures will be considered in detail elsewhere.

In this subsection we present numerical results for iterations with the composite trapezoidal formula (2.2), as in Micula [23], as well as for the method based on approximation of the integral equation (2.1) at the selected nodes $a = \tau_0 < \tau_1 < \cdots < \tau_n < \tau_{n+1} = b$, using the quadrature formulas (2.7) and (2.8), especially with the Clenshaw-Curtis nodes.

Example In this example we consider the integral equation [23]
\[ u(t) = g(t) + \frac{1}{12}\int_0^t \sin x\; u(x)^2\,dx + \frac{1}{36}\int_0^{\pi/2} \cos t\,(1 + \cos^2 t)\,u(x)\,dx \tag{2.14} \]
on $t \in [0, \pi/2]$, where
\[ g(t) = \frac{1}{36}(35\cos t - 1) \quad\text{and}\quad u^*(t) = \cos t \]
is the exact solution. The equation (2.14) is of the form (2.1), with
\[ K_1(t, x, u) = \frac{1}{12}\sin x\; u^2, \qquad K_2(t, x, u) = \frac{1}{36}\cos t\,(1 + \cos^2 t)\,u, \qquad t, x \in [0, \pi/2], \]
where $x \mapsto u(x)$.

(1) First we use the composite trapezoidal rule (2.2), with the nodes $\tau_\nu = \dfrac{\nu\pi}{2(n+1)}$, $\nu = 0, 1, \ldots, n, n+1$, and $h = \dfrac{\pi}{2(n+1)}$, so that we have at $t = \tau_0 = a$:
\[ \tilde u_{k+1}(\tau_0) = g(\tau_0) + h \sum_{\nu=0}^{n+1}{}^{\prime\prime}\, K_2(\tau_0, \tau_\nu, \tilde u_k(\tau_\nu)) \tag{2.15} \]
and at $t = \tau_j$, $j = 1, \ldots, n, n+1$:
\[ \tilde u_{k+1}(\tau_j) = g(\tau_j) + h \sum_{\nu=0}^{n+1}{}^{\prime\prime}\, K_2(\tau_j, \tau_\nu, \tilde u_k(\tau_\nu)) + h \sum_{\nu=0}^{j}{}^{\prime\prime}\, K_1(\tau_j, \tau_\nu, \tilde u_k(\tau_\nu)), \tag{2.16} \]
where the double prime on the sum means that the first and last terms should be halved. We present the absolute errors $|\tilde u_k(\tau_\nu) - u^*(\tau_\nu)|$, finding $\tilde u_k(\tau_\nu)$, $\nu = 0, 1, \ldots, n+1$, in iterations $k = 1, 2, \ldots$, starting with $\tilde u_0(\tau_\nu) = g(\tau_\nu)$, $\nu = 0, 1, \ldots, n+1$. We give the graphs $t \mapsto E_{n+2}(t)$ obtained by joining the $n + 2$ points of each iteration by line segments (a first-order spline interpolation). Also, we calculate
\[ \|\tilde u_k(\tau_\nu) - u^*(\tau_\nu)\|_\infty = \max_{0 \le \nu \le n+1} |\tilde u_k(\tau_\nu) - u^*(\tau_\nu)|, \qquad k = 1, 2, \ldots. \]
In this case, the uniform (maximum) norm of the function $t \mapsto E_{n+2}(t)$ in the $k$-th iteration is exactly
\[ \|E_{n+2}^{[k]}\|_\infty := \|E_{n+2}\|_\infty = \max_{0 \le t \le \pi/2} |E_{n+2}(t)| = \|\tilde u_k(\tau_\nu) - u^*(\tau_\nu)\|_\infty. \]
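The iteration (2.15)–(2.16) for the equation (2.14) can be sketched in a few lines. The kernel and data definitions follow the example; the function names `halved_sum` and `picard_trapezoid`, and the decision to return only the final maximum nodal error, are illustrative assumptions rather than the implementation used to produce the reported figures.

```python
import math

def g(t):
    return (35.0 * math.cos(t) - 1.0) / 36.0

def K1(t, x, u):
    return math.sin(x) * u * u / 12.0

def K2(t, x, u):
    return math.cos(t) * (1.0 + math.cos(t) ** 2) * u / 36.0

def halved_sum(values):
    """Sum with the first and last terms halved (the double-primed sum)."""
    if len(values) < 2:
        return 0.0            # a single node means an empty integration interval
    return sum(values) - 0.5 * (values[0] + values[-1])

def picard_trapezoid(n, iterations):
    """Iterate (2.15)-(2.16) on n + 2 equidistant nodes of [0, pi/2].

    Returns the maximum nodal error against the exact solution cos t.
    """
    h = math.pi / (2 * (n + 1))
    tau = [h * nu for nu in range(n + 2)]
    u = [g(t) for t in tau]                       # starting values u_0 = g
    for _ in range(iterations):
        new_u = []
        for j, tj in enumerate(tau):
            fredholm = halved_sum([K2(tj, x, ux) for x, ux in zip(tau, u)])
            volterra = halved_sum([K1(tj, x, ux)
                                   for x, ux in zip(tau[: j + 1], u[: j + 1])])
            new_u.append(g(tj) + h * (fredholm + volterra))
        u = new_u
    return max(abs(uj - math.cos(tj)) for uj, tj in zip(u, tau))

err_10_nodes = picard_trapezoid(8, 10)      # n + 2 = 10
err_100_nodes = picard_trapezoid(98, 10)    # n + 2 = 100
print(err_10_nodes, err_100_nodes)
```

Running more iterations with a fixed number of nodes does not reduce the error further once the quadrature error dominates; only increasing $n$ does.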
Graphs of the functions $t \mapsto E_{n+2}(t)$ on $[0, \pi/2]$, when $n + 2 = 10$, 15, and 100, are presented in Figures 4, 5, and 6, respectively, after $k$ iterations. We mention again that the integral equation (2.14) is approximated by the discrete model, given by
\[ \tilde u(\tau_j) = g(\tau_j) + h \sum_{\nu=0}^{n+1}{}^{\prime\prime}\, K_2(\tau_j, \tau_\nu, \tilde u(\tau_\nu)) + h \sum_{\nu=0}^{j}{}^{\prime\prime}\, K_1(\tau_j, \tau_\nu, \tilde u(\tau_\nu)), \]
for $j = 0, 1, \ldots, n+1$, where for $j = 0$ the last sum vanishes.

Using the iterative process (2.15)–(2.16), the iteration $\tilde u_k$, for sufficiently large $k$, approximates the exact solution $\tilde u^*$ of this discrete model up to machine precision (see the first term on the right-hand side in the inequality (2.3)). However, the second term in (2.3), $M_2/(1 - \gamma) = O(h^2)$ (see (2.4)), depends on the quadrature rule, and it determines the main part of the error $\|\tilde u_k - u^*\|$. This shows that we have a limitation in obtaining a satisfactory approximation of the exact solution $u^*$ of the integral equation (2.14), depending on the quadrature rule. For example, with the rule with $n + 2 = 10$ nodes (see Figure 4), the minimal absolute error is achieved with $k = 5$ iterations, $\|E_{10}^{[k]}\|_\infty = 4.18 \times 10^{-4}$, so that further iterations do not reduce this error. For the rule with $n + 2 = 15$ nodes, this minimal error is $1.73 \times 10^{-4}$ for $k \ge 6$. Furthermore, the 100-point rule gives the minimal error $\|E_{100}^{[k]}\|_\infty = 3.46 \times 10^{-6}$ for $k \ge 7$.

Figure 4 Errors at n + 2 = 10 equidistant nodes for the trapezoidal rule and k = 1, 2, 3 and k ≥ 5 iterations

Figure 5 Errors at n + 2 = 15 equidistant nodes for the trapezoidal rule and k = 1, 2, 3 and k ≥ 6 iterations

Figure 6 Errors at n + 2 = 100 equidistant nodes for the trapezoidal rule and k = 1, 2, 3, 4 and k ≥ 7 iterations

(2) Now we consider the interpolatory quadrature process (2.11), with the Clenshaw-Curtis nodes $\tau_\nu$ and Cotes numbers $A_\nu^{(j)}$, $\nu = 0, 1, \ldots, n, n+1$, $j = 0, 1, \ldots, n, n+1$, which are given by (2.13) and (2.10) (transformed to $[0, \pi/2]$ or $[a, b]$, in general), respectively. According to (2.11) these interpolatory formulas are exact for all polynomials of degree at most $n + 1$. Then, the corresponding discrete approximation of the integral equation (2.14) is given by
\[ \tilde u_k(\tau_j) = g(\tau_j) + \sum_{\nu=0}^{n+1} A_\nu^{(j)} K_1(\tau_j, \tau_\nu, \tilde u_k(\tau_\nu)) + \sum_{\nu=0}^{n+1} A_\nu^{(n+1)} K_2(\tau_j, \tau_\nu, \tilde u_k(\tau_\nu)), \]
for $j = 0, 1, \ldots, n+1$, where for $j = 0$ the first sum on the right-hand side vanishes, because $A_\nu^{(0)} = 0$ for each $\nu = 0, 1, \ldots, n+1$. However, for $j = n + 1$, these Cotes numbers $A_\nu^{(n+1)} = A_\nu$ are just the coefficients of the Clenshaw-Curtis formula [8] (cf. also Trefethen [33]). Using the iterative process (2.5) at $t = \tau_0 = 0$ and (2.6) at $t = \tau_j$, $j = 1, \ldots, n, n+1$, we obtain the iterations $\tilde u_k(\tau_j)$, $j = 0, 1, \ldots, n+1$, as in the previous case with the composite trapezoidal rule. The corresponding graphs for the rules with $n + 2 = 6$, 10 and 15 Clenshaw-Curtis nodes are presented in Figures 7, 8 and 9, respectively. We note that in the case $n + 2 = 6$, the maximal errors after 2, 3, and 4 iterations are $3.88 \times 10^{-4}$, $5.20 \times 10^{-5}$, and $3.39 \times 10^{-5}$, respectively. In the case with $n + 2 = 10$ Clenshaw-Curtis nodes, the maximal error after 7 iterations is $7.85 \times 10^{-9}$, and further increasing the number of iterations does not contribute to further reducing the norm $\|E_{10}^{[k]}\|_\infty$ $(k \ge 7)$. In the case $n + 2 = 15$, after 10 iterations the maximal error is $1.33 \times 10^{-12}$, while for 12 iterations it is $3.71 \times 10^{-14}$. This is also the maximal number of iterations worth performing for this kind of quadrature rule with 15 nodes (see Figure 9).
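The same example can be run with the Clenshaw-Curtis nodes (2.13) and the interpolatory coefficients $A_\nu^{(j)}$. The sketch below computes the coefficients in floating point by expanding the Lagrange basis polynomials in powers of $t$ (adequate for the small $n$ used here, though not a numerically robust choice for large $n$), rescales them to $[0, \pi/2]$, and applies (2.5)–(2.6). All function names are illustrative assumptions, and the code is not the one used to obtain the figures reported above.

```python
import math

def g(t):
    return (35.0 * math.cos(t) - 1.0) / 36.0

def K1(t, x, u):
    return math.sin(x) * u * u / 12.0

def K2(t, x, u):
    return math.cos(t) * (1.0 + math.cos(t) ** 2) * u / 36.0

def basis_coeffs(nodes, nu):
    """Monomial coefficients (ascending) of the Lagrange basis polynomial for node nu."""
    coeffs = [1.0]
    for i, x_i in enumerate(nodes):
        if i == nu:
            continue
        d = nodes[nu] - x_i
        new = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] -= x_i * c / d
            new[k + 1] += c / d
        coeffs = new
    return coeffs

def weight_matrix(nodes):
    """A[j][nu]: integral of the nu-th basis polynomial over [0, nodes[j]], cf. (2.10)."""
    polys = [basis_coeffs(nodes, nu) for nu in range(len(nodes))]
    return [[sum(c * tj ** (k + 1) / (k + 1) for k, c in enumerate(p)) for p in polys]
            for tj in nodes]

def picard_clenshaw_curtis(n, iterations):
    """Iterate (2.5)-(2.6) for (2.14); returns the maximum nodal error against cos t."""
    tau = [math.sin(nu * math.pi / (2 * (n + 1))) ** 2 for nu in range(n + 2)]  # (2.13)
    A = weight_matrix(tau)
    scale = math.pi / 2                       # linear map from [0, 1] to [0, pi/2]
    x = [scale * t for t in tau]
    u = [g(t) for t in x]                     # starting values u_0 = g
    for _ in range(iterations):
        u = [g(xj)
             + scale * sum(A[j][nu] * K1(xj, x[nu], u[nu]) for nu in range(n + 2))
             + scale * sum(A[-1][nu] * K2(xj, x[nu], u[nu]) for nu in range(n + 2))
             for j, xj in enumerate(x)]
    return max(abs(uj - math.cos(xj)) for uj, xj in zip(u, x))

print(picard_clenshaw_curtis(4, 8), picard_clenshaw_curtis(8, 12))
```

With the same number of nodes, the interpolatory rule reaches a far smaller error than the composite trapezoidal rule, which is the point of the comparison made in the text.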
Figure 7 Errors at the Clenshaw-Curtis n + 2 = 6 nodes for the rules and k = 1, 2, . . ., 4 iterations

Figure 8 Errors at the Clenshaw-Curtis n + 2 = 10 nodes for the rules and k = 1, 2, . . ., 8 iterations

Figure 9 Errors at the Clenshaw-Curtis n + 2 = 15 nodes for the rules and k = 1, 3, 5, 8, 10, 12, 13 iterations
Remark 2.4 The Newton-Kantorovich method for solving a system of 2 × 2 nonlinear Volterra integral equations, where the unknown function is in logarithmic form, was considered in Hameed et al. [13]. We also mention a paper by Ezquerro et al. [12], where the authors use high-order iterative methods for solving nonlinear integral equations of Fredholm type.
2.3 Mixed Volterra-Fredholm Integral Equations

Consider integral equations of the form [19]:
\[ u(t, x) = \int_0^t\!\!\int_a^b K(t, x, \tau, y, u(\tau, y))\,dy\,d\tau + g(t, x), \tag{2.17} \]
$(t, x) \in D = [0, T] \times [a, b]$, where $K \in C(D^2 \times \mathbb{R})$ and $g \in C(D)$. Such equations arise in integral reformulations of the heat equation with Dirichlet, Neumann, or mixed boundary conditions. Let $X = C(D)$ be equipped with the (uniform) Chebyshev norm $\|u\| = \max_{(t,x) \in D} |u(t, x)|$, consider a closed ball $B_\varrho := \{u \in C(D) : \|u - g\| \le \varrho\}$, $\varrho > 0$, and define the integral operator $F : X \to X$ by
\[ Fu(t, x) := \int_0^t\!\!\int_a^b K(t, x, \tau, y, u(\tau, y))\,dy\,d\tau + g(t, x). \]
The method described in Micula [19] uses Theorem 1.4 on $B_\varrho$ with $\varepsilon_k = 1/(k+1)$. For the first part of the approximation (the iterative process), the following result holds. Let $K \in C(D^2 \times \mathbb{R})$, $g \in C(D)$ and
\[ \varrho_1 = \min_{(t,x) \in D} g(t, x), \qquad \varrho_2 = \max_{(t,x) \in D} g(t, x). \]
Assume that there exists a constant $L > 0$ such that
\[ |K(t, x, \tau, y, u) - K(t, x, \tau, y, v)| \le L\,|u - v|, \tag{2.18} \]
for all $(t, x), (\tau, y) \in D$ and all $u, v \in [\varrho_1 - \varrho, \varrho_2 + \varrho]$. In addition, assume that
\[ q := LT(b - a) < 1 \tag{2.19} \]
and
\[ M_K T(b - a) \le \varrho, \tag{2.20} \]
where $M_K := \max |K(t, x, \tau, y, u)|$ over all $(t, x), (\tau, y) \in D$ and all $u \in [\varrho_1 - \varrho, \varrho_2 + \varrho]$. Then the equation (2.17) has a unique solution $u^* \in B_\varrho$, and that solution can be found as the limit of the sequence of successive approximations
\[ u_{k+1} = \Bigl(1 - \frac{1}{k+1}\Bigr)u_k + \frac{1}{k+1}Fu_k, \qquad k = 0, 1, \ldots, \tag{2.21} \]
starting with any initial point $u_0 \in B_\varrho$. In addition, the error estimate
\[ \|u_k - u^*\| \le \frac{e^{1-q}}{1-q}\,e^{-(1-q)z_k}\,\|u_0 - u_1\| \]
holds for every $k \in \mathbb{N}$, where the sequence $\{z_k\}$ is given by
\[ z_0 = 0, \qquad z_k = \sum_{i=0}^{k-1} \frac{1}{i+1}, \quad k \ge 1. \tag{2.22} \]
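The size of the factor multiplying $\|u_0 - u_1\|$ in the estimate above is easy to tabulate. The sketch below assumes the hypothetical value $q = 1/2$; the function names are illustrative.

```python
import math

def z(k):
    """z_k from (2.22): partial sums of the harmonic series, with z_0 = 0."""
    return sum(1.0 / (i + 1) for i in range(k))

def decay_factor(q, k):
    """The factor e^{1-q}/(1-q) * e^{-(1-q) z_k} multiplying ||u_0 - u_1||."""
    return math.exp(1.0 - q) / (1.0 - q) * math.exp(-(1.0 - q) * z(k))

q = 0.5    # hypothetical contraction constant
factors = [decay_factor(q, k) for k in (1, 10, 100, 1000)]
print(factors)
```

Since $z_k$ grows like $\ln k$, this bound decays only algebraically, roughly like $k^{-(1-q)}$, so the a priori estimate for $\varepsilon_k = 1/(k+1)$ is much more pessimistic than the geometric bound $q^k/(1-q)$ of Theorem 1.3.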
It is worth mentioning that the Lipschitz and contraction conditions (2.18) and (2.19) could restrict the area of applicability of this method if they had to be satisfied on the entire space. That is why only this local existence and uniqueness result is used on $B_\varrho$, for some $\varrho > 0$.

For the second part of the method (the approximation of the iterates in (2.21)), consider a numerical integration scheme
\[ \int_a^b\!\!\int_c^d \varphi(x, w)\,dw\,dx = \sum_{i=0}^{n_1}\sum_{j=0}^{n_2} a_{ij}\,\varphi(x_i, w_j) + R_\varphi, \tag{2.23} \]
with nodes $a = x_0 < x_1 < \cdots < x_{n_1} = b$ and $c = w_0 < w_1 < \cdots < w_{n_2} = d$ and coefficients $a_{ij} \in \mathbb{R}$, $i = 0, 1, \ldots, n_1$, and $j = 0, 1, \ldots, n_2$, such that there exists $M > 0$ with $|R_\varphi| \le M$, where $M \to 0$ as $n_1, n_2 \to \infty$.

For our approximations, let $0 = t_0 < t_1 < \cdots < t_{n_1} = T$ and $a = x_0 < x_1 < \cdots < x_{n_2} = b$ be partitions of $[0, T]$ and $[a, b]$, respectively, and let $u_0 = \tilde u_0 \equiv g$ be the initial approximation. We use the successive iterations (2.21) and the numerical integration formula (2.23) to approximate the values $u_k(t_l, x_\nu)$ by $\tilde u_k(t_l, x_\nu)$, for $l = 0, \ldots, n_1$, $\nu = 0, \ldots, n_2$, and $k = 0, 1, \ldots$, where
\[ \tilde u_k(t_l, x_\nu) = \Bigl(1 - \frac{1}{k}\Bigr)\tilde u_{k-1}(t_l, x_\nu) + \frac{1}{k}\Bigl(\sum_{i=0}^{l}\sum_{j=0}^{n_2} a_{ij}\,K(t_l, x_\nu, t_i, x_j, \tilde u_{k-1}(t_i, x_j)) + g(t_l, x_\nu)\Bigr). \tag{2.24} \]
By an inductive argument, we get
\[ \mathrm{err}(u_k, \tilde u_k) = \max_{(t_l, x_\nu) \in D} |u_k(t_l, x_\nu) - \tilde u_k(t_l, x_\nu)| \le M\,(1 + \gamma + \cdots + \gamma^{k-1}), \]
where $\gamma := L \sum_{i=0}^{n_1}\sum_{j=0}^{n_2} |a_{ij}|$. Thus, under the conditions (2.18)–(2.20) and the extra assumption
\[ \gamma = L \sum_{i=0}^{n_1}\sum_{j=0}^{n_2} |a_{ij}| < 1, \]
the error estimate
\[ \mathrm{err}(u^*, \tilde u_k) \le \frac{e^{1-q}}{1-q}\,e^{-(1-q)z_k}\,\|u_0 - Fu_0\| + \frac{M}{1-\gamma} \tag{2.25} \]
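holds for every $k \in \mathbb{N}$, where $u^*$ is the true solution of equation (2.17) and the sequence $\{z_k\}_{k \in \mathbb{N}}$ is defined in (2.22). Let us use the two-dimensional trapezoidal rule (as in Micula [19]):
\[ \int_a^b \int_c^d \varphi(\tau, y)\, dy\, d\tau = \frac{(b-a)(d-c)}{4 n_1 n_2} \Bigl[ \varphi(a, c) + \varphi(b, c) + \varphi(a, d) + \varphi(b, d) + 2 \sum_{i=1}^{n_1 - 1} \bigl(\varphi(\tau_i, c) + \varphi(\tau_i, d)\bigr) + 2 \sum_{j=1}^{n_2 - 1} \bigl(\varphi(a, y_j) + \varphi(b, y_j)\bigr) + 4 \sum_{i=1}^{n_1 - 1} \sum_{j=1}^{n_2 - 1} \varphi(\tau_i, y_j) \Bigr] + R_\varphi \]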
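with nodes
\[ x_i = a + \frac{b-a}{n_1}\, i, \qquad w_j = c + \frac{d-c}{n_2}\, j, \qquad i = 0, \dots, n_1, \quad j = 0, \dots, n_2. \]
The remainder is given by
\[ R_\varphi = \Bigl[ \frac{(b-a)^3 (d-c)}{12\, n_1^2 n_2}\, \varphi^{(2,0)}(\xi, \eta_1) + \frac{(b-a)(d-c)^3}{12\, n_1 n_2^2}\, \varphi^{(0,2)}(\xi_1, \eta) + \frac{(b-a)^3 (d-c)^3}{144\, n_1^2 n_2^2}\, \varphi^{(2,2)}(\xi, \eta) \Bigr], \qquad \xi, \xi_1 \in (a, b), \quad \eta, \eta_1 \in (c, d), \]
where we used the notation
\[ \varphi^{(\alpha, \beta)}(t, x) = \frac{\partial^{\alpha + \beta} \varphi}{\partial t^\alpha\, \partial x^\beta}(t, x). \]
For our integrals, we get
\[ \int_0^{t_l} \int_a^b K\bigl(t_l, x_\nu, \tau, y, u_k(\tau, y)\bigr)\, dy\, d\tau = \frac{t_l (b-a)}{4 l n_2} \Bigl[ K_{l,\nu,0,0} + K_{l,\nu,l,0} + K_{l,\nu,0,n_2} + K_{l,\nu,l,n_2} + 2 \sum_{i=1}^{l-1} \bigl(K_{l,\nu,i,0} + K_{l,\nu,i,n_2}\bigr) + 2 \sum_{j=1}^{n_2 - 1} \bigl(K_{l,\nu,0,j} + K_{l,\nu,l,j}\bigr) + 4 \sum_{i=1}^{l-1} \sum_{j=1}^{n_2 - 1} K_{l,\nu,i,j} \Bigr] + R_K, \]
with nodes
\[ t_l = \frac{T}{n_1}\, l, \qquad x_\nu = a + \frac{b-a}{n_2}\, \nu, \qquad l = 0, \dots, n_1, \quad \nu = 0, \dots, n_2, \]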
and the simplifying notation $K_{l,\nu,i,j} = K(t_l, x_\nu, t_i, x_j, u_k(t_i, x_j))$. Since $t_l / l = T / n_1$, in this case $\gamma \le L T (b - a) = q$, which is already assumed to be strictly less than 1 by (2.19). Now we turn to the remainder. It is clear that if $K^{(2,0)}(\tau, y, u_k(\tau, y))$, $K^{(0,2)}(\tau, y, u_k(\tau, y))$, and $K^{(2,2)}(\tau, y, u_k(\tau, y))$ are bounded, then the remainder $R_K$ is of the form $O(1/n_1^2) + O(1/n_2^2)$. So, if $K$ and $g$ are $C^4$ functions with bounded fourth-order partial derivatives, then there exists $M > 0$, independent of $k$, such that $|R_K| \le M$, with $M \to 0$ as $n_1, n_2 \to \infty$. Then we have the error estimate (2.25).

Example We now illustrate the applicability of the above method on a numerical example. Consider the nonlinear mixed Volterra-Fredholm integral equation
\[ u(t, x) = 2 \int_0^t \int_0^1 x^2 y\, \tau e^{-\tau} e^{u(\tau, y)}\, dy\, d\tau + x^2 \bigl(1 - e^{-t}\bigr), \tag{2.26} \]
for $t \in [0, 1/4]$, whose exact solution is $u^*(t, x) = t x^2$. The theoretical assumptions are satisfied for $\varrho = 1$. We use the trapezoidal rule with $n_1 = n_2 = 18$ and nodes
\[ t_i = \frac{i}{4 n_1}, \quad i = 0, \dots, n_1, \qquad x_j = \frac{j}{n_2}, \quad j = 0, \dots, n_2. \]
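The setup just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not the authors' MATLAB code, the names `kernel`, `trap_weights` and `solve` are hypothetical, the factor 2 is absorbed into the kernel, and $x$ is assumed to range over $[0, 1]$ (as the nodes $x_j = j/n_2$ suggest). It is not claimed to reproduce the published error values.

```python
import math

T = 0.25  # t ranges over [0, 1/4]; x is assumed to range over [0, 1]


def kernel(t, x, tau, y, u):
    # Integrand of (2.26), with the leading factor 2 absorbed into K.
    return 2.0 * x * x * y * tau * math.exp(-tau) * math.exp(u)


def g(t, x):
    return x * x * (1.0 - math.exp(-t))


def exact(t, x):
    return t * x * x


def trap_weights(m, h):
    """Composite trapezoidal weights for m subintervals of width h.

    m = 0 corresponds to an interval of length zero, so the weight is 0.
    """
    if m == 0:
        return [0.0]
    return [h * (0.5 if i in (0, m) else 1.0) for i in range(m + 1)]


def solve(n1, n2, iterations):
    """Run `iterations` steps of (2.24); return the maximum nodal error."""
    t = [T * l / n1 for l in range(n1 + 1)]
    x = [v / n2 for v in range(n2 + 1)]
    wx = trap_weights(n2, 1.0 / n2)
    u = [[g(tl, xv) for xv in x] for tl in t]  # initial approximation u_0 = g
    for k in range(1, iterations + 1):
        new = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
        for l, tl in enumerate(t):
            wt = trap_weights(l, T / n1)  # Volterra part uses t_0, ..., t_l
            for v, xv in enumerate(x):
                quad = sum(
                    wt[i] * wx[j] * kernel(tl, xv, t[i], x[j], u[i][j])
                    for i in range(l + 1)
                    for j in range(n2 + 1)
                )
                new[l][v] = (1.0 - 1.0 / k) * u[l][v] + (quad + g(tl, xv)) / k
        u = new
    return max(
        abs(u[l][v] - exact(t[l], x[v]))
        for l in range(n1 + 1)
        for v in range(n2 + 1)
    )


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, solve(18, 18, k))
```

The loop recomputes the full double sum at every node for clarity; because the kernel here is separable in $x$, a production implementation could factor $x^2$ out and reuse the inner sums.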
The numerical implementation of (2.24) is done in MATLAB, in double precision arithmetic. The errors $\mathrm{err}(u^*, \tilde{u}_k)$ are given in the table below, with initial approximation $u_0(t, x) = g(t, x) = x^2 (1 - e^{-t})$ (Table 1).

Table 1 Errors for equation (2.26), n1 = n2 = 18

k     err(u*, ũ_k)
1     2.034743e-01
5     9.354733e-04
10    3.077314e-05

2.4 Volterra Integral Equations with Delayed Argument
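Next, let us consider Volterra integral equations of the form [22]:
\[ u(t) = \begin{cases} \varphi(0) + g(t) + \displaystyle\int_0^t K\bigl(t, x, u(x), u(x - \delta)\bigr)\, dx, & t \in [0, b], \\ \varphi(t), & t \in [-\delta, 0], \end{cases} \tag{2.27} \]
where $\delta > 0$, $K \in C\bigl([0, b] \times [0, b] \times \mathbb{R}^2\bigr)$, $\varphi \in C[-\delta, 0]$, $g \in C[0, b]$, and $g(0) = 0$.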
These delayed argument equations are used to model dynamical systems, such as population growth or decay, or the evolution of an epidemic. Consider the space $X = C[-\delta, b]$ endowed with the Bielecki norm
\[ \|u\|_\tau := \max_{t \in [-\delta, b]} |u(t)|\, e^{-\tau t}, \qquad u \in X, \]
for some suitable $\tau > 0$. Then $(X, \|\cdot\|_\tau)$ is a Banach space on which the theoretical results in Section 1 hold. Let us remark that, when employing such fixed point results, the Bielecki norm sometimes has a major advantage over the usual max norm: the Lipschitz or contraction-type conditions that the operator has to satisfy can be fulfilled by a convenient choice of the parameter $\tau$, without imposing extra restrictions on the kernel function. We define the operator $T : X \to X$ by
\[ Tu(t) = \begin{cases} \varphi(0) + g(t) + \displaystyle\int_0^t K\bigl(t, x, u(x), u(x - \delta)\bigr)\, dx, & t \in [0, b], \\ \varphi(t), & t \in [-\delta, 0]. \end{cases} \tag{2.28} \]
Again, we use a local fixed point result. Let $B_\varrho \subset X$ be the closed ball $B_\varrho = \{u \in X : \|u - \tilde{\varphi}\| \le \varrho\}$, where
\[ \tilde{\varphi}(t) = \begin{cases} \varphi(t), & t \in [-\delta, 0], \\ \varphi(0) + g(t), & t \in [0, b], \end{cases} \]
and $\|\cdot\|$ denotes the Chebyshev norm on $X$. Applying Theorem 1.4 with $\varepsilon_k = 1/(k+1)$ to the operator $T$ from (2.28), we have the following result. Assume that there exist constants $L_1, L_2 > 0$ such that
\[ |K(t, x, u_1, v_1) - K(t, x, u_2, v_2)| \le L_1 |u_1 - u_2| + L_2 |v_1 - v_2|, \]
for all $t, x \in [0, b]$ and all $u_1, u_2, v_1, v_2 \in [\varrho_1 - \varrho, \varrho_2 + \varrho]$, where
\[ \varrho_1 := \min_{t \in [-\delta, b]} \tilde{\varphi}(t), \qquad \varrho_2 := \max_{t \in [-\delta, b]} \tilde{\varphi}(t). \]
Further assume that $bM \le \varrho$,
where $M := \max |K(t, x, u, v)|$ over all $t, x \in [0, b]$ and all $u, v \in [\varrho_1 - \varrho, \varrho_2 + \varrho]$. Then the integral equation (2.27) has a unique solution $u^* \in B_\varrho$, and the sequence defined by
\[ u_{k+1} = \Bigl(1 - \frac{1}{k+1}\Bigr) u_k + \frac{1}{k+1}\, T u_k, \qquad k = 0, 1, \dots, \tag{2.29} \]
converges to the solution $u^*$ for any initial point $u_0 \in B_\varrho$. Moreover, for every $k \in \mathbb{N}$, the error estimate
\[ \|u_k - u^*\|_\tau \le \frac{e^{1-q}}{1-q}\, \|u_0 - T u_0\|_\tau\, e^{-(1-q) z_k} \tag{2.30} \]
holds, where
\[ z_0 = 0, \qquad z_k = \sum_{i=0}^{k-1} \frac{1}{i+1} \quad (k \ge 1) \]
and $q =$
$L_1 + L_2$ … 0 depends on the bounds of the derivatives of the functions $K$, $g$ and $\varphi$, but not on $k$. Under the assumptions above, we can choose $u_0 \in X_0 \cap C^2[-\delta, b] \cap B_\varrho$, such that the sequence defined in (2.29) has the following properties:
(a) $u_k \in X_0 \cap C^2[-\delta, b] \cap B_\varrho$, $\|u_k - g\| \le \varrho$.
(b) $\{u_k'\}$ and $\{u_k''\}$ are bounded sequences.
Combining the errors in (2.30) and (2.31), we get the composite error
\[ \begin{aligned} |u^*(t_\nu) - \tilde{u}_k(t_\nu)| &\le |u^*(t_\nu) - u_k(t_\nu)| + |u_k(t_\nu) - \tilde{u}_k(t_\nu)| \\ &= |u^*(t_\nu) - u_k(t_\nu)|\, e^{-\tau t_\nu} e^{\tau t_\nu} + |u_k(t_\nu) - \tilde{u}_k(t_\nu)| \\ &\le \|u_k - u^*\|_\tau\, e^{\tau b} + |u_k(t_\nu) - \tilde{u}_k(t_\nu)| \\ &\le \frac{e^{\tau b + 1 - q}}{1-q}\, \|u_0 - T u_0\|_\tau\, e^{-(1-q) z_k} + \frac{b^3}{12 n^2}\, M, \end{aligned} \]
at each node $t_\nu$, $\nu = 0, \dots, n$. Example Consider the integral equation with delayed argument [22]:
uðtÞ =
gðtÞ þ
35 34
t
ð
t 2 uðxÞ - 1
Þðuðx - 1Þ þ 1Þdx,
t 2 ½0, 2,
0
t 2 ½ - 1, 0,
0,
ð2:32Þ where gðtÞ = -
ð
Þ
352 1 1 3 1 3 35 t 9 - t 8 - t 7 - t 6 þ t5 - t 4 : 34 7 2 5 4 2
The exact solution of equation (2.32) is u ðtÞ =
0,
t 2 ½ - 1, 0,
35t 3 ,
t 2 ½0, 2:
Figure 10 Errors at the nonnegative nodes for k = 8 and k = 10 iterations
We take $n = 48$ and the nodes $t_\nu = -1 + \nu/16$, $\nu = 0, \dots, 48$. Notice that $t_{16} = 0$. The initial approximations are $u_0(t_\nu) = 0$, for $\nu = 0, \dots, 16$, and $u_0(t_\nu) = g(t_\nu)$, for $\nu = 17, \dots, 48$. The graph of the errors at the nonnegative nodes is given in Figure 10 for $k = 8$ and $k = 10$ iterations (for the negative nodes, the approximation is exact).
2.5 Functional Volterra Integral Equations
Functional integral equations have many applications in radiative transfer, control theory, mechanical engineering, etc. We consider a Volterra functional integral equation of the type [21]
\[ u(t) = \lambda \int_{-t}^{t} K\bigl(t, x, u(x)\bigr)\, dx + g(t), \qquad t \in [-T, T], \tag{2.33} \]
for $T > 0$ and $\lambda \in \mathbb{R}$, with $K \in C\bigl([-T, T]^2 \times \mathbb{R}\bigr)$ and $g \in C[-T, T]$. Let $X = C[-T, T]$ be endowed with the uniform norm $\|u\| = \max_{t \in [-T, T]} |u(t)|$ and consider the closed ball $B_\varrho = \{u \in X : \|u - g\| \le \varrho\}$, $\varrho > 0$. We define the integral operator $F : X \to X$ associated with equation (2.33) by
\[ Fu(t) := \lambda \int_{-t}^{t} K\bigl(t, x, u(x)\bigr)\, dx + g(t). \]
Using the contraction principle (Theorem 1.3) on the ball $B_\varrho$, we have the following result. We assume that there exists a function $L : [-T, T] \to \mathbb{R}_+$ such that
\[ |K(t, x, u) - K(t, x, v)| \le L(x)\, |u - v|, \]
for all $t, x \in [-T, T]$ and all $u, v \in [\varrho_1 - \varrho, \varrho_2 + \varrho]$, where $\varrho_1 := \min_{t \in [-T, T]} g(t)$ and $\varrho_2 := \max_{t \in [-T, T]} g(t)$. Also, assume that
\[ q := |\lambda| \int_{-T}^{T} L(x)\, dx < 1 \]
and that $2 |\lambda| M_K T \le \varrho$, where $M_K = \max |K(t, x, u)|$ over $t, x \in [-T, T]$ and $u \in [\varrho_1 - \varrho, \varrho_2 + \varrho]$. Then equation (2.33) has exactly one solution $u^* \in B_\varrho$, which is the limit of the sequence given by
\[ u_{k+1} = F u_k, \qquad k = 0, 1, \dots, \tag{2.34} \]
with an arbitrary initial point $u_0 \in B_\varrho$, and we have the error estimate
\[ \|u_k - u^*\| \le \frac{q^k}{1-q}\, \|u_1 - u_0\|, \]
for every $k \in \mathbb{N}$. To approximate numerically the integrals in (2.34), we consider a symmetric quadrature formula
\[ \int_{-b}^{b} \varphi(x)\, dx = \sum_{i=-j}^{j} a_i\, \varphi(x_i) + R_{\varphi, j}, \]
for any $0 < b \le T$, with nodes $0 = x_0 < x_1 < \dots < x_j = b$, $x_{-i} = -x_i$, $i = 0, 1, \dots, j$, and coefficients $a_i \in \mathbb{R}$, $i = -j, \dots, j$, and for which the remainder satisfies
\[ R_{\varphi, j} \to 0 \quad \text{as} \quad j \to \infty. \]
Let $n \ge 1$ be fixed and let $0 = t_0 < t_1 < \dots < t_n = T$, $t_{-i} = -t_i$. Then, for $\nu = -n, \dots, n$, we have
\[ u_0(t_\nu) = g(t_\nu), \qquad u_{k+1}(t_\nu) = \lambda \int_{-t_\nu}^{t_\nu} K\bigl(t_\nu, x, u_k(x)\bigr)\, dx + g(t_\nu), \quad k = 0, 1, \dots. \tag{2.35} \]
In addition, we assume that when the quadrature formula is applied on the interval $[-t_\nu, t_\nu]$, $\nu = 0, 1, \dots, n$, the remainders $R_{K,\nu}$ satisfy $|R_{K,\nu}| \le M$, where $M$ depends on the fixed number $n$ and $M \to 0$ as $n \to \infty$. Now we apply the quadrature scheme to our integrals above. To simplify the writing in (2.35), we use the notation
\[ K_{\nu, i, k} := K\bigl(t_\nu, t_i, u_k(t_i)\bigr), \qquad \tilde{K}_{\nu, i, k} := K\bigl(t_\nu, t_i, \tilde{u}_{k,n}(t_i)\bigr), \]
for $\nu, i = -n, \dots, n$; $k = 1, 2, \dots$, where
\[ \tilde{u}_{1,n}(t_\nu) = \lambda \sum_{i=-\nu}^{\nu} a_i\, K\bigl(t_\nu, t_i, g(t_i)\bigr) + g(t_\nu) = \lambda \sum_{i=-\nu}^{\nu} a_i\, K_{\nu, i, 0} + g(t_\nu), \]
\[ \tilde{u}_{k,n}(t_\nu) = \lambda \sum_{i=-\nu}^{\nu} a_i\, \tilde{K}_{\nu, i, k-1} + g(t_\nu). \]
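For a fixed $n$, we approximate $u_k(t_\nu)$ by $\tilde{u}_{k,n}(t_\nu)$ in the following way:
\[ \begin{aligned} u_1(t_\nu) &= \lambda \int_{-t_\nu}^{t_\nu} K\bigl(t_\nu, x, g(x)\bigr)\, dx + g(t_\nu) \\ &= \lambda \Bigl( \sum_{i=-\nu}^{\nu} a_i\, K\bigl(t_\nu, t_i, g(t_i)\bigr) + R_{K,\nu} \Bigr) + g(t_\nu) \\ &= \tilde{u}_{1,n}(t_\nu) + \tilde{R}_{1,\nu}. \end{aligned} \]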
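Then, denoting by
\[ \|u_k - \tilde{u}_{k,n}\| := \max_{t_\nu \in [-T, T]} |u_k(t_\nu) - \tilde{u}_{k,n}(t_\nu)| \quad \text{and} \quad \tilde{R}_k := \max_{-n \le \nu \le n} |\tilde{R}_{k,\nu}|, \]
we have
\[ \|u_1 - \tilde{u}_{1,n}\| \le \tilde{R}_1 \le |\lambda| M. \]
Proceeding further, in a similar fashion, we get
\[ \begin{aligned} u_2(t_\nu) &= \lambda \int_{-t_\nu}^{t_\nu} K\bigl(t_\nu, x, u_1(x)\bigr)\, dx + g(t_\nu) \\ &= \lambda \Bigl( \sum_{i=-\nu}^{\nu} a_i\, K\bigl(t_\nu, t_i, u_1(t_i)\bigr) + R_{K,\nu} \Bigr) + g(t_\nu) \\ &= \lambda \Bigl( \sum_{i=-\nu}^{\nu} a_i\, K\bigl(t_\nu, t_i, \tilde{u}_{1,n}(t_i) + \tilde{R}_{1,i}\bigr) + R_{K,\nu} \Bigr) + g(t_\nu) \\ &= \tilde{u}_{2,n}(t_\nu) + \tilde{R}_{2,\nu}. \end{aligned} \]
To estimate the error $\tilde{R}_2$, denote by $\gamma = |\lambda| \sum_{i=-n}^{n} |a_i|\, L(t_i)$. Then,
\[ \begin{aligned} \|u_2 - \tilde{u}_{2,n}\| \le \tilde{R}_2 &\le |\lambda| \sum_{i=-\nu}^{\nu} |a_i|\, L(t_i)\, \tilde{R}_1 + |\lambda|\, |R_{K,\nu}| \\ &\le |\lambda| \sum_{i=-n}^{n} |a_i|\, L(t_i)\, |\lambda| M + |\lambda| M \\ &= |\lambda| M (1 + \gamma) \end{aligned} \]
and, by induction, we get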
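\[ \|u_k - \tilde{u}_{k,n}\| \le \tilde{R}_k \le \tilde{R}_{k-1}\, \gamma + |\lambda| M = |\lambda| M \bigl[ \gamma \bigl(1 + \gamma + \dots + \gamma^{k-2}\bigr) + 1 \bigr] = |\lambda| M \bigl(1 + \gamma + \dots + \gamma^{k-1}\bigr). \]
Thus, under all the conditions assumed so far, if $\gamma < 1$, then the error estimate
\[ \|\tilde{u}_{k,n} - u^*\| \le \frac{q^k}{1-q}\, \|u_1 - u_0\| + \frac{|\lambda| M}{1 - \gamma} \]
holds for every $k \in \mathbb{N}$. Thus, as $k, n \to \infty$, $\tilde{u}_{k,n} \to u^*$. In particular, let us consider the trapezoidal rule for approximating integrals over symmetric intervals:
\[ \int_{-b}^{b} \varphi(x)\, dx = \frac{b}{2n} \Bigl[ \varphi(-b) + 2 \sum_{j=-n+1}^{n-1} \varphi(x_j) + \varphi(b) \Bigr] + R_{\varphi, n}, \]
where the $2n + 1$ nodes are $x_j = -b + bj/n$, $j = 0, \dots, 2n$, and the remainder is given by
\[ R_{\varphi, n} = -\frac{b^3}{6 n^2}\, \varphi''(\eta), \qquad \eta \in (-b, b). \]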
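For $\nu = 0, \dots, n$, let $t_\nu = T\nu/n$ and $t_{-\nu} = -t_\nu$. We have, for $\nu = 0, \dots, n$ (i.e., for $t_\nu \ge 0$),
\[ \int_{-t_\nu}^{t_\nu} K\bigl(t_\nu, x, u_k(x)\bigr)\, dx = \frac{2 t_\nu}{4 \nu} \Bigl( K_{\nu, -\nu, k} + 2 \sum_{j=-\nu+1}^{\nu-1} K_{\nu, j, k} + K_{\nu, \nu, k} \Bigr) + R_{K,\nu}, \]
\[ \int_{t_\nu}^{-t_\nu} K\bigl(-t_\nu, x, u_k(x)\bigr)\, dx = -\frac{2 t_\nu}{4 \nu} \Bigl( K_{-\nu, -\nu, k} + 2 \sum_{j=-\nu+1}^{\nu-1} K_{-\nu, j, k} + K_{-\nu, \nu, k} \Bigr) - R_{K,\nu}. \]
Notice that $\dfrac{2 t_\nu}{4 \nu} = \dfrac{T}{2n}$, so, in this case, $\gamma \le \dfrac{|\lambda| T}{n} \displaystyle\sum_{i=-n}^{n} L(t_i)$, which will be assumed to be less than 1. For the remainder, notice that for all $\nu = 0, 1, \dots, n$, we have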
\[ |R_{K,\nu}| = \frac{t_\nu^3}{6 \nu^2} \bigl| K\bigl(t_\nu, \eta_\nu, u_k(\eta_\nu)\bigr)''_x \bigr| = \frac{T^3 \nu^3}{6 n^3 \nu^2} \bigl| K\bigl(t_\nu, \eta_\nu, u_k(\eta_\nu)\bigr)''_x \bigr| \le \frac{T^3}{6 n^2} \bigl| K\bigl(t_\nu, \eta_\nu, u_k(\eta_\nu)\bigr)''_x \bigr|. \]
Thus, if $K$ and $g$ are $C^2$ functions with bounded second-order (partial) derivatives and $\dfrac{|\lambda| T}{n} \displaystyle\sum_{i=-n}^{n} L(t_i) < 1$, then
\[ |R_{K,\nu}| \le \frac{T^3}{6 n^2}\, M_0 =: M, \]
for some $M_0 > 0$ that does not depend on $k$ or $\nu$. Hence, from all the work above, we get
\[ \|\tilde{u}_{k,n} - u^*\| \le \frac{q^k}{1-q}\, \|u_1 - u_0\| + \frac{T^3}{6 n^2} \cdot \frac{M_0}{1 - \gamma}, \]
for all $k = 1, 2, \dots$.

Example Let us consider the nonlinear functional integral equation [21]
\[ u(t) = \frac{1}{32} \int_{-t}^{t} \cos(x)\, u^2(x)\, dx + \sin(t) - \frac{1}{48} \sin^3(t), \tag{2.36} \]
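for $t \in [-\pi/2, \pi/2]$. The exact solution of equation (2.36) is $u^*(t) = \sin(t)$. In this case
\[ \lambda = \frac{1}{32}, \qquad K(t, x, u) = \cos(x)\, u^2, \qquad g(t) = \sin(t) - \frac{1}{48} \sin^3(t). \]
Let $\varrho = 1$. Notice that $g$ is an increasing function on $[-\pi/2, \pi/2]$, so
\[ \varrho_1 = g\Bigl(-\frac{\pi}{2}\Bigr) = -\frac{47}{48}, \qquad \varrho_2 = g\Bigl(\frac{\pi}{2}\Bigr) = \frac{47}{48}. \]
We have $M_K = (\varrho_2 + \varrho)^2 = \Bigl(\dfrac{95}{48}\Bigr)^2$, so that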
\[ 2 |\lambda| M_K T = \frac{\pi}{32} \Bigl(\frac{95}{48}\Bigr)^2 < 1 = \varrho. \]
Also, on $\bigl[0, \tfrac{\pi}{2}\bigr] \times \bigl[0, \tfrac{\pi}{2}\bigr] \times \bigl[-\tfrac{95}{48}, \tfrac{95}{48}\bigr]$, we have
\[ |K(t, x, u) - K(t, x, v)| \le \cos(x)\, |u + v|\, |u - v| \le \frac{95}{24} \cos(x)\, |u - v|. \]
Let $L(x) = \dfrac{95}{24} \cos(x)$. Then
\[ q = |\lambda| \int_{-\pi/2}^{\pi/2} L(x)\, dx = \frac{95}{384} < 1. \]
For the second part, note that
\[ \gamma \le \frac{1}{32} \cdot \frac{\pi}{2n} \cdot \frac{95}{24} \sum_{i=-n}^{n} \cos(t_i) \le \frac{95 \pi}{768} \cdot \frac{2n + 1}{2n} \le \frac{95 \pi}{768} \cdot \frac{3}{2} < 1, \]
for any $n \ge 1$. Thus, all our theoretical assumptions are satisfied. We use the trapezoidal rule with $n = 12$ and the 25 nodes $t_\nu = \dfrac{\pi}{24} \nu$, $t_{-\nu} = -t_\nu$, $\nu = 0, \dots, 12$. In Figure 11 we present the graph of the errors at the nodes for $k = 8$ and $k = 10$ iterations.

Figure 11 Errors at the nodes for k = 8 and k = 10 iterations
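The iteration (2.35) with the symmetric trapezoidal rule is short enough to sketch for equation (2.36). The code below is an illustration only, not the implementation from [21]; the function names `picard_trapezoid` and `max_error` are hypothetical, and the nodes are stored in a list indexed by $\nu + n$.

```python
import math

LAM = 1.0 / 32.0
T = math.pi / 2.0


def K(t, x, u):
    return math.cos(x) * u * u


def g(t):
    return math.sin(t) - math.sin(t) ** 3 / 48.0


def picard_trapezoid(n, iterations):
    """Apply (2.35) `iterations` times, with trapezoidal quadrature on [-t_v, t_v]."""
    h = T / n
    nodes = [h * v for v in range(-n, n + 1)]  # nodes[v + n] holds t_v
    u = [g(t) for t in nodes]  # u_0 = g
    for _ in range(iterations):
        new = []
        for v in range(-n, n + 1):
            m = abs(v)
            tv = nodes[v + n]
            s = 0.0
            for j in range(-m, m + 1):
                w = 0.5 if abs(j) == m else 1.0  # endpoint weights are halved
                s += w * K(tv, nodes[j + n], u[j + n])
            integral = h * s if m > 0 else 0.0
            # For t_v < 0 the limits -t_v > t_v are reversed, hence the sign flip.
            sign = 1.0 if v >= 0 else -1.0
            new.append(LAM * sign * integral + g(tv))
        u = new
    return nodes, u


def max_error(n, iterations):
    nodes, u = picard_trapezoid(n, iterations)
    return max(abs(ui - math.sin(t)) for t, ui in zip(nodes, u))


if __name__ == "__main__":
    for k in (8, 10):
        print(k, max_error(12, k))
```

With $q = 95/384$ the Picard part of the error is negligible after about ten iterations, so the remaining error is essentially the quadrature error, which should shrink roughly by a factor of four when $n$ is doubled.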
2.6 Fractional Integral Equations
In recent years, fractional calculus has been studied extensively, as more and more applications have developed in various fields of physics and engineering where domains are fractal curves (continuous, but non-differentiable functions), where ideas and methods from classical calculus cannot be used. Here, we consider the following fractional-order integral equation [20]
\[ u(t) = a(t)\, J^\alpha \bigl[ b(t) u(t) \bigr] + g(t), \tag{2.37} \]
i.e.,
\[ u(t) = \frac{a(t)}{\Gamma(\alpha)} \int_0^t b(x)\, (t - x)^{\alpha - 1} u(x)\, dx + g(t), \qquad t \in [0, T], \]
where $0 < \alpha < 1$ and $a, b, g : [0, T] \to \mathbb{R}$ are continuous functions. The term
\[ J^\alpha f(t) = \frac{1}{\Gamma(\alpha)} \int_0^t (t - x)^{\alpha - 1} f(x)\, dx \]
is called the fractional integral of $f$ of order $\alpha$ and
\[ \Gamma(\alpha) = \int_0^\infty e^{-x} x^{\alpha - 1}\, dx, \qquad \alpha > 0, \]
is Euler's gamma function. On the space $X = C[0, T]$, we consider again the Bielecki norm $\|u\|_\tau = \max_{t \in [0, T]} |u(t)|\, e^{-\tau t}$ for some $\tau > 0$ and the ball $B_\varrho := \{u \in X : \|u - g\|_\tau \le \varrho\}$, for some $\varrho > 0$. We define the fractional integral operator
\[ Fu(t) = \frac{a(t)}{\Gamma(\alpha)} \int_0^t b(x)\, (t - x)^{\alpha - 1} u(x)\, dx + g(t). \]
For continuous functions $a$, $b$, $g$ and $u$ on $[0, T]$, it can be shown that $Fu$ is also continuous on $[0, T]$ (e.g., see Andras [3]), so $F : X \to X$ is well-defined. We choose the constant $\tau$ such that $\tau \ge (2 \|a\| \|b\|)^{1/\alpha}$, where $\|\cdot\|$ denotes the Chebyshev norm, and the radius $\varrho$ so that $\varrho \ge \max\{-\varrho_1, \varrho_2\}$, where $\varrho_1 := \min_{t \in [0, T]} g(t)$ and $\varrho_2 := \max_{t \in [0, T]} g(t)$. These conditions ensure that $F(B_\varrho) \subseteq B_\varrho$ and that $F : B_\varrho \to B_\varrho$ is a contraction with constant $q =$ $\|a\| \|b\|$ … 0 such that
\[ |\hat{R}_k| \le |\hat{R}| = \frac{T^2}{8 n^2} \cdot \frac{T^\alpha}{\alpha}\, M, \]
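where the constant $M$ depends on $a$, $b$, $g$, $\varrho$ and $\tau$, but not on $n$, $\nu$, or $k$. As before, we approximate the values $u_k(t_\nu)$ by $\tilde{u}_k(t_\nu)$ given by
\[ \tilde{u}_0(t_\nu) = g(t_\nu), \qquad \tilde{u}_{k+1}(t_\nu) = \frac{1}{\Gamma(\alpha)}\, a(t_\nu) \sum_{j=0}^{\nu} w_{j,\nu}\, b(t_j)\, \tilde{u}_k(t_j) + g(t_\nu). \]
Denoting by
\[ \gamma = \frac{T^\alpha}{\Gamma(\alpha)}\, M\, M_a, \]
where $M_a = \max\{\|a\|, \|a'\|, \|a''\|\}$, by computations similar to the ones in the previous section (details can be found in Micula [20]), we get, inductively, that
\[ \|u_k - \tilde{u}_k\| := \max_{t_\nu \in [0, T]} |u_k(t_\nu) - \tilde{u}_k(t_\nu)| \le \frac{T^2}{8 n^2}\, \gamma \bigl(1 + \gamma + \dots + \gamma^{\,n-1}\bigr). \]
So, if we assume $\gamma < 1$, we have the error estimate
\[ \|\tilde{u}_k - u^*\| \le \frac{q^k}{1-q}\, \|u_1 - u_0\| + \frac{T^2}{8 n^2} \cdot \frac{\gamma}{1 - \gamma}, \]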
for every $k \in \mathbb{N}$.

Example Now, consider the fractional integral equation
\[ u(t) = \frac{0.01\, t^{5/2}}{\Gamma(1/2)} \int_0^t (t - x)^{-1/2}\, u(x)\, dx + \sqrt{\pi}\, (1 + t)^{-3/2} - 0.02\, \frac{t^3}{1 + t}, \]
for $t \in [0, 1]$, whose exact solution is $u^*(t) = \sqrt{\pi}\, (1 + t)^{-3/2}$.
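A possible way to experiment with this example is sketched below. The definition of the weights $w_{j,\nu}$ is not visible in the text at hand, so the sketch assumes the standard product trapezoidal rule (the weakly singular factor is integrated exactly against the piecewise linear interpolant of the smooth part); this assumption, as well as the names `product_trapezoid_weights`, `iterate` and `max_error`, are illustrative and may differ from the scheme in [20].

```python
import math

ALPHA = 0.5


def a(t):
    return 0.01 * t ** 2.5


def b(t):
    return 1.0


def g(t):
    return math.sqrt(math.pi) * (1.0 + t) ** -1.5 - 0.02 * t ** 3 / (1.0 + t)


def exact(t):
    return math.sqrt(math.pi) * (1.0 + t) ** -1.5


def product_trapezoid_weights(v, h, alpha):
    """Assumed weights w_{j,v}, j = 0..v, for int_0^{t_v} (t_v - x)^(alpha-1) f(x) dx.

    The rule is exact when f is piecewise linear on the grid t_j = j*h.
    """
    if v == 0:
        return [0.0]
    c = h ** alpha / (alpha * (alpha + 1.0))
    p = alpha + 1.0
    w = [c * ((v - 1) ** p - (v - alpha - 1.0) * v ** alpha)]
    for j in range(1, v):
        d = v - j
        w.append(c * ((d + 1) ** p - 2.0 * d ** p + (d - 1) ** p))
    w.append(c)
    return w


def iterate(n, iterations, T=1.0):
    h = T / n
    t = [h * v for v in range(n + 1)]
    weights = [product_trapezoid_weights(v, h, ALPHA) for v in range(n + 1)]
    u = [g(tv) for tv in t]  # initial approximation u_0 = g
    for _ in range(iterations):
        u = [
            a(t[v]) / math.gamma(ALPHA)
            * sum(w * b(t[j]) * u[j] for j, w in enumerate(weights[v]))
            + g(t[v])
            for v in range(n + 1)
        ]
    return t, u


def max_error(n, iterations):
    t, u = iterate(n, iterations)
    return max(abs(ui - exact(tv)) for tv, ui in zip(t, u))


if __name__ == "__main__":
    for k in (8, 10):
        print(k, max_error(24, k))
```

A quick sanity check on the assumed weights is that they sum to $t_\nu^\alpha / \alpha$, the exact integral of the singular factor alone.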
Here, we have $\alpha = 1/2$, $a(t) = 0.01\, t^{5/2}$, and $b(t) \equiv 1$. Then $\|a\| = 0.01$, $\|b\| = 1$, so we can take $\tau = 1$, satisfying our theoretical requirements. Since
\[ g(t) = \sqrt{\pi}\, (1 + t)^{-3/2} - 0.02\, \frac{t^3}{1 + t}, \]
we have $\varrho_1 = \sqrt{\pi}\, 2^{-3/2}$, $\varrho_2 = \sqrt{\pi}$, so we can choose $\varrho = 2$. Then $u^* \in B_\varrho$, and since $M_a = 0.06$, $M = \dfrac{15}{4} \sqrt{\pi}$, we have $\gamma \approx 0.281 < 1$. We use the iterative scheme described above with $n = 24$, the corresponding nodes $t_\nu = \nu/24$, $\nu = 0, \dots, 24$, and the initial approximation $u_0(t) = g(t)$. The errors $|\tilde{u}_k(t_\nu) - u^*(t_\nu)|$ at the nodes, for $k = 8$ and $k = 10$ iterations, are illustrated in Figure 12.

Figure 12 Errors at the nodes for k = 8 and k = 10 iterations

Acknowledgment The work was supported in part by the Serbian Academy of Sciences and Arts (Φ-96).
References 1. Altman, M. (1961). Concerning the method of tangent hyperbolas for operator equations. Bulletin de L’Académie Polonaise Des Sciences: Série des Sciences Mathématiques, Astronomiques, et Physiques, 9, 633–637 2. Altman, M. M. (1981). A stronger fixed point theorem for contraction mappings, Preprint 3. Andras, S. (2003). Weakly singular Volterra and Fredholm-Volterra integral equations. Studia. Universitatis Babeş-Bolyai. Mathematica, 48(3), 147–155 4. Atkinson, K. E. (1989). An introduction to numerical analysis (2nd ed.). New York: Wiley 5. Berinde, V. (2007). Iterative approximation of fixed points. Lecture Notes in Mathematics. Berlin: Springer 6. Birkhoff, G., & Young, D. M. (1950). Numerical quadrature of analytic and harmonic functions. Journal of Mathematical Physics, 29, 217–221 7. Brunner, H., Pedas, A., & Vainikko, G. (1999). The piecewise polynomial collocation method for nonlinear weakly singular Volterra equations. Mathematics of Computation, 68(227), 1079–1095 8. Clenshaw, C. W., & Curtis, A. R. (1960). A method for numerical integration on an automatic computer. Numerical Mathematics, 2, 197–205 9. Collatz, L. (1966). Functional analysis and numerical mathematics. New York: Academic. (Translated from the German) 10. Cvetković, A. S. & Milovanović, G. V. (2004). The Mathematica Package “OrthogonalPolynomials”. Facta Universitatis. Series: Mathematics and Informatics, 19, 17–36 11. Dahlquist, G., & Björk, Å. (2008). Numerical methods in scientific computing, vol. I. Philadelphia: SIAM 12. Ezquerro, J. A., Hernández, M. A., & Romero, N. (2011). Solving nonlinear integral equations of Fredholm type with high order iterative methods. Journal of Computational and Applied Mathematics, 236, 1449–1463 13. Hameed, H. H., Eshkuvatov, Z. K., Ahmedov, A., & Nik Long, N. M. A. (2015). On Newton-Kantorovich method for solving the nonlinear operator equation. Abstract and Applied Analysis, 12, Art. ID 219616 14. Jovanović, B. (1972). 
A method for obtaining iterative formulas of higher order. Matematichki Vesnik, 9, 365–369 15. Kantorovich, L. V. (1948). On Newton’s method for functional equations. Doklady Akademii Nauk SSSR, 59, 1237–1240 16. Kantorovich, L. V., & Akilov, G. P. (1977). Functional analysis. Moscow: Nauka 17. Krasnoselski, M. A., Vainikko, G. M., Zabreiko, P. P., Rutitski, Ya. B., & Stetsenko, V. Ya. (1969). Approximate solution of operator equations. Moscow: Nauka 18. Mastroianni, G., & Milovanović, G. V. (2008). Interpolation processes—Basic theory and applications. Springer Monographs in Mathematics. Berlin: Springer 19. Micula, S. (2021). Numerical solution of two-dimensional Fredholm-Volterra integral equations of the second kind. Symmetry, 13(1326), 12. https://doi.org/10.3390/ sym13081326 20. Micula, S. (2018). An iterative numerical method for fractional integral equations of the second kind. Journal of Computational and Applied Mathematics, 339, 124–133
21. Micula, S. (2017). On some iterative numerical methods for a Volterra functional integral equation of the second kind. Journal of Fixed Point Theory and Applications, 19, 1815–1824 22. Micula, S. (2015). A fast converging iterative method for Volterra integral equations of the second kind with delayed arguments. Fixed Point Theory, 16(2), 371–380 23. Micula, S. (2015). An iterative numerical method for Fredholm-Volterra integral equations of the second kind. Applied Mathematics and Computation, 270, 935–942 24. Milovanović, G. V. (1974). A method to accelerate iterative processes in Banach space. Univerzitet u Beogradu. Publikacije Elektrotehničkog Fakulteta. Serija Matematika i Fizika, 461–479, 67–71 25. Milovanović, G. V. (2017). Generalized weighted Birkhoff-Young quadratures with the maximal degree of exactness. Applied Numerical Mathematics, 116, 238–255 26. Milovanović, G. V., & Cvetković, A. S. (2012). Special classes of orthogonal polynomials and corresponding quadratures of Gaussian type. Mathematica Balkanica, 26, 169–184 27. Mysovskikh, I. P. (1950). On convergence of L.V. Kantorovich’s method for functional equations and its applications. Doklady Akademii Nauk SSSR, 70, 565–568 28. Ortega, J. M., & Rheinboldt, W. C., (2000). Iterative solution of nonlinear equations in several variables. Classics in Applied Mathematics, vol. 30. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM) 29. Polyak, B. T. (2006). Newton-Kantorovich method and its global convergence. Journal of Mathematical Sciences, 133(4), 1513–1523 30. Simeunović, D. M. (1995). On a process for obtaining iterative formulas of higher order for roots of equations. Revue d’Analyse Numrique et de Thorie de l’Approximation, 24, 225–229 31. Simeunović, D. (2005). On a method for obtaining iterative formulas of higher order. Mathematica Moravica, 9, 53–58 32. Scarborough, J. B. (1930). Numerical mathematical analysis. Baltimore: Johns Hopkins Press 33. Trefethen, L. N. 
(2008). Is Gauss quadrature better than Clenshaw-Curtis?. SIAM Review, 50, 67–87
The Daugavet Equation: Linear and Nonlinear Recent Results Sheldon Dantas, Domingo García, Manuel Maestre, and Juan B. Seoane-Sepúlveda
Abstract A bounded linear operator $T : X \to X$ satisfies the Daugavet equation whenever the equality $\|\mathrm{Id}_X + T\| = 1 + \|T\|$ holds true. In 1963, I. K. Daugavet proved such an equation for $X = C[0, 1]$ and for every compact operator $T$. Since then it has become a field of great and non-stopping activity. We survey recent and relevant results on this operator equation and the corresponding Daugavet property, both in the linear and nonlinear setting.
Keywords Daugavet equation • Daugavet property • Alternative Daugavet equation • Alternative Daugavet property
Mathematics Subject Classification (MSC2020) 46B04 • 46B25 • 46E15 • 47L10
1 Introduction
All the theory we will be considering in this survey is based on a result obtained in 1963 by I. K. Daugavet [24]. He showed that every compact linear operator $T$ on the Banach space $C[0, 1]$ satisfies the equality:
S. Dantas • D. García • M. Maestre (✉) Departamento de Análisis Matemático, Facultad de Ciencias Matemáticas, Universidad de Valencia, Burjasot, Valencia, Spain e-mail: [email protected]; [email protected]; [email protected] J. B. Seoane-Sepúlveda Instituto de Matemática Interdisciplinar (IMI), Departamento de Análisis y Matemática Aplicada, Facultad de Ciencias Matemáticas, Madrid, Spain e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Moslehian (ed.), Matrix and Operator Equations and Applications, Mathematics Online First Collections, https://doi.org/10.1007/16618_2023_60
S. Dantas et al.
\[ \|\mathrm{Id} + T\| = 1 + \|T\|, \tag{1.1} \]
where Id stands for the identity operator. Such a norm equality has become known as the Daugavet equation. Over the years, the validity of (1.1) has been established for many different classes of operators defined on many different Banach spaces. It is also a remarkable result given in 1970 by J. Duncan et al. (see [26]) that, for every compact Hausdorff space $K$ and every bounded linear operator $T$ on $C(K)$, the equality:
\[ \max_{\omega \in \mathbb{T}} \|\mathrm{Id} + \omega T\| = 1 + \|T\| \tag{1.2} \]
holds true (here, $\mathbb{T}$ is the unit sphere of the scalar field $\mathbb{K}$, which is either $\mathbb{R}$ or $\mathbb{C}$). Nowadays, the equation (1.2) is known as the alternative Daugavet equation [53], and we refer the reader to [26, 53] and references therein for a proper background. The original aim of this survey, as the title says, was to present the recent progress in this area, both in the linear and nonlinear setting. However, we realized that this goal was too ambitious. We consider that a whole long book should be dedicated to that task, and we have chosen a more modest objective. We decided to present here only (a part of) the nonlinear theory associated with the Daugavet and alternative Daugavet properties. Even so, we present a few results of the linear theory that satisfy two requirements: first, that they are very recent, and second, that they are needed to justify and even to provide cases and examples for the polynomial and more general nonlinear cases. For these reasons we plan to discuss the Daugavet and the alternative Daugavet properties in the linear setting only in tensor product spaces, Lipschitz spaces, and JB*-triples (which include, among others, C*-algebras). That point of view has therefore forced us to leave many deep and interesting results out of this survey. We would like to apologize to the many authors whose important contributions to the Daugavet theory will not be reflected here. Unfortunately, they will have to wait until this needed book is written.
1.1 Basic Notations and Definitions
Throughout the text, we will be using the letters $X$, $Y$, and $Z$ for vector spaces over a field $\mathbb{K}$, which can be the real or complex numbers unless explicitly stated. We denote by $B_X$ the open unit ball, by $\overline{B}_X$ the closed unit ball, and by $S_X$ the unit sphere of a Banach space $X$. Given two Banach spaces $X$, $Y$, we denote the set of all bounded linear operators from $X$ into $Y$ by $L(X, Y)$. When $X = Y$, we simply denote it by $L(X)$. The symbol $K(X)$ stands for the Banach space of all compact operators from $X$ into itself. We denote the identity operator on a Banach space $X$ by $\mathrm{Id}_X$ or simply by Id when there is no confusion about where the operator is defined. Given a Banach space $X$, its topological dual is denoted by $X'$. The rest of the notation will be introduced in each section of this manuscript whenever necessary. Based on Daugavet's abovementioned result, the concepts of the Daugavet equation and the Daugavet property in the context of Banach spaces were introduced in the nineties. Later, in 2004, M. Martín and T. Oikhberg introduced the alternative Daugavet equation and property, which have a very strong relation with the numerical radius of an operator and the numerical index of the involved Banach space (see [53]).

Definition 1.1 Let $X$ be a real or complex Banach space and $T : X \to X$ be a bounded operator.
(a) We say that $T$ satisfies the Daugavet equation whenever:
\[ \|\mathrm{Id} + T\| = 1 + \|T\|. \tag{1.3} \]
(b) We say that $T$ satisfies the alternative Daugavet equation if there exists $\omega \in \mathbb{T}$ such that $\omega T$ satisfies the Daugavet equation or, equivalently, if:
\[ \max_{|\omega| = 1} \|\mathrm{Id} + \omega T\| = 1 + \|T\|. \tag{1.4} \]
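The two equations in Definition 1.1 are easy to test numerically for matrices acting on the real space $\ell_\infty^n$, where the operator norm is the maximum absolute row sum. The sketch below is only an illustration and not part of the survey; all function names are hypothetical, and over the reals the unit sphere of the scalar field reduces to $\{1, -1\}$.

```python
def op_norm_inf(A):
    """Operator norm of the matrix A on real l_infinity^n: max absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)


def identity_plus(A, omega=1.0):
    """Matrix of Id + omega * A."""
    n = len(A)
    return [
        [(1.0 if i == j else 0.0) + omega * A[i][j] for j in range(n)]
        for i in range(n)
    ]


def satisfies_daugavet(A, tol=1e-12):
    """Check (1.3): ||Id + A|| = 1 + ||A||."""
    return abs(op_norm_inf(identity_plus(A)) - (1.0 + op_norm_inf(A))) <= tol


def satisfies_alternative_daugavet(A, tol=1e-12):
    """Check (1.4) over real scalars, where omega ranges over {1, -1}."""
    best = max(op_norm_inf(identity_plus(A, w)) for w in (1.0, -1.0))
    return abs(best - (1.0 + op_norm_inf(A))) <= tol


def rank_one(functional, vector):
    """Matrix of the rank-one operator x -> functional(x) * vector."""
    return [[v * f for f in functional] for v in vector]


if __name__ == "__main__":
    T_plus = rank_one([1.0, 0.0], [1.0, 0.0])    # x -> x_1 * e_1
    T_minus = rank_one([1.0, 0.0], [-1.0, 0.0])  # x -> -x_1 * e_1
    print(satisfies_daugavet(T_plus))
    print(satisfies_daugavet(T_minus))
    print(satisfies_alternative_daugavet(T_minus))
```

The operator `T_minus` fails (1.3) because $\mathrm{Id} + T$ has norm 1 while $1 + \|T\| = 2$, yet it satisfies (1.4) with $\omega = -1$. This is consistent with (1.2), since $\ell_\infty^n$ is the space $C(K)$ for a finite set $K$.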
Whenever all rank-one operators on a Banach space satisfy the corresponding Daugavet equation, we say that the Banach space $X$ has the corresponding Daugavet property. More precisely, we have the following definitions.

Definition 1.2 Let $X$ be a real or complex Banach space.
(a) We say that $X$ satisfies the Daugavet property if every rank-one operator $T : X \to X$ satisfies the Daugavet equation.
(b) We say that $X$ satisfies the alternative Daugavet property if every rank-one operator $T : X \to X$ satisfies the alternative Daugavet equation.

It is the same to ask such properties for all rank-one operators or all weakly compact ones. Recall that a rank-one operator $T : X \to X$ can be written as $T = x^* \otimes x_0$ with $x^* \in X'$ and $x_0 \in X$.

Theorem 1.3 ([41, Lemma 2.2 and Theorem 2.3]) Let $X$ be a real or complex Banach space. The following statements are equivalent.
(a) For every $x^* \in S_{X'}$ and every $x_0 \in X$, the operator $x^* \otimes x_0$ satisfies the Daugavet equation.
(b) For every $x^* \in S_{X'}$, every $x_0 \in S_X$, and every $\varepsilon > 0$, there exists $y \in B_X$ such that:
\[ \operatorname{Re} x^*(y) > 1 - \varepsilon \quad \text{and} \quad \|x_0 + y\| > 2 - \varepsilon. \]
(c) Every operator $T : X \to X$ such that $T(B_X)$ is a relatively weakly compact set satisfies the Daugavet equation.

Theorem 1.4 ([53, Proposition 2.1 and Theorem 2.2]) Let $X$ be a real or complex Banach space. The following statements are equivalent.
(a) For every $x^* \in X'$ and every $x_0 \in X$, the operator $x^* \otimes x_0$ satisfies the alternative Daugavet equation.
(b) For every $x^* \in S_{X'}$, every $x_0 \in S_X$, and every $\varepsilon > 0$, there exist $\omega \in \mathbb{T}$ and $y \in B_X$ such that:
\[ \operatorname{Re} x^*(y) > 1 - \varepsilon \quad \text{and} \quad \|x_0 + \omega y\| > 2 - \varepsilon. \]
(c) Every operator $T : X \to X$ such that $T(B_X)$ is a relatively weakly compact set satisfies the alternative Daugavet equation.

The reader expert on the present theory will immediately realize that in the definitions above we have avoided the explicit definition of slices, so useful in the theory.
2 The (Alternative) Daugavet Property in the Linear Setting
We first treat the linear case of the Daugavet property, that is, rank-one linear operators $T \in L(X)$ satisfying the Daugavet equation. We have selected a few results from recent papers on the topic. More specifically, we treat the Daugavet property in tensor product spaces, in Lipschitz function spaces, and in JB-algebras. Our main motivation to treat the Daugavet property in these spaces is the fact that they are strongly connected to the nonlinear case (see Section 3).

2.1 Tensor Product Spaces
In the survey [72], Dirk Werner provided a list of open problems on the Daugavet property. One of them was related to the tensor product between Banach spaces. More specifically, he wondered whether $X \widehat{\otimes}_\pi Y$ and $X \widehat{\otimes}_\varepsilon Y$ have the Daugavet property whenever $X$ and/or $Y$ satisfies the property. This indeed seems to be a natural question, as the Daugavet property had been deeply studied for classical Banach spaces and it was time to figure out what happens to the stability of such a property under tensor products in order to produce more positive or negative examples. Before stating the main results of this section, let us recall some basic definitions about tensor products in Banach spaces. The projective and injective tensor products between the Banach spaces $X$ and $Y$, denoted, respectively, by $X \widehat{\otimes}_\pi Y$ and $X \widehat{\otimes}_\varepsilon Y$, are defined as the completion of the algebraic tensor product $X \otimes Y$ endowed with the norms:
\[ \|z\|_\pi := \inf \Bigl\{ \sum_{i=1}^{n} \|x_i\| \|y_i\| : z = \sum_{i=1}^{n} x_i \otimes y_i \Bigr\}, \]
where the infimum is taken over all such representations of $z$, and:
\[ \Bigl\| \sum_{i=1}^{n} x_i \otimes y_i \Bigr\|_\varepsilon := \sup \Bigl\{ \Bigl| \sum_{i=1}^{n} x^*(x_i)\, y^*(y_i) \Bigr| : x^* \in B_{X'},\ y^* \in B_{Y'} \Bigr\}. \]
We would like to mention that it is well-known that $B_{X \widehat{\otimes}_\pi Y} = \overline{\mathrm{co}}(B_X \otimes B_Y)$ and also that $(X \widehat{\otimes}_\pi Y)' = B(X \times Y; \mathbb{K}) = L(X, Y')$, where $B(X \times Y; \mathbb{K})$ stands for the bilinear forms on $X \times Y$. On the other hand, given a natural number $N \in \mathbb{N}$, we denote by $z^N$ the element $z \otimes \cdots \otimes z \in X \otimes \cdots \otimes X$ ($N$ factors). The projective symmetric tensor product of $X$, denoted by $\widehat{\otimes}_{\pi,s,N} X$, is defined as the completion of the linear space $\otimes_{\pi,s,N} X$ generated by $\{z^N : z \in X\}$ endowed with the norm given by:
\[ \|z\|_{\pi,s,N} := \inf \Bigl\{ \sum_{i=1}^{n} |\lambda_i| : z = \sum_{i=1}^{n} \lambda_i x_i^N,\ n \in \mathbb{N},\ x_i \in S_X,\ \lambda_i \in \mathbb{K} \Bigr\}, \]
where the infimum is taken over all the possible representations of $z$. It is worth mentioning that its topological dual $(\widehat{\otimes}_{\pi,s,N} X)'$ can be identified (in the sense that there exists an isometric isomorphism) with $P(^N X)$, the Banach space of all $N$-homogeneous polynomials on $X$ into $\mathbb{K}$.
718
S. Dantas et al.
Now we are ready to present the results related to tensor products. It turns out that the answer for Werner’s question is negative in general when we consider only one of the factors to satisfy the Daugavet property. Indeed, we have the following example. Example Not long after this question was posed, Vladimir Kadets, Nigel Kalton, and Dirk Werner himself proved that: (a) There exists a two-dimensional complex Banach space Y such that the Banach space L 1 ½0, 1ε Y fails to have the Daugavet property (see [40, Theorem 4.2]). (b) There exists a two-dimensional complex Banach space Z such that the Banach space L 1 π Z fails the Daugavet property (see [40, Corollary 4.3]). (Here, L 1 ½0, 1 is the space of all complex-valued L1-functions, and L1 denotes the Banach space of all complex-valued L1-functions) Furthermore, Johann Langemets, Vegard Lima, and Abraham Rueda Zoca showed that there are real “Daugavet spaces” such that the tensor product between them fails to be octahedral or to have the strong diameter 2 property. In order to understand this better, recall that a Banach space with the Daugavet property is octahedral (see [8, Corollary 2.5]) and has the strong diameter 2 property (see [1, Theorem 4.4]). (Actually, an important characterization has been established recently by Vladimir Kadets: in [37], he proves that the diametral strong diameter 2 property is equivalent to the Daugavet property (this answered an open question posed by Julio Becerra Guerrero, Ginés López-Pérez, and Abraham Rueda Zoca).) With this in mind, we have that: n (c) [45, Theorem 3.8] shows that L 1 ½0, 1ε ℓ p is not octahedral for every 1 < p < 2 and n ≥ 3. n (d) [45, Corollary 3.9] shows that L 1 ½0, 1π ℓ p fails to have the strong diameter 2 property for 2 < p < 1 and n ≥ 3.
It is worth noting that both the real and the complex Banach spaces $L_1[0,1]$ and $L_\infty[0,1]$ satisfy the Daugavet property. Taking into account the previous examples, the following problem was the main motivation for two recent papers authored by Miguel Martín, Abraham Rueda Zoca, Pedro Tradacete, and Ignacio Villanueva (see [54, 62]): do the Banach spaces $X \hat{\otimes}_\pi Y$ and $X \hat{\otimes}_\varepsilon Y$ have the Daugavet property whenever both $X$ and $Y$ have the Daugavet property? In what follows, we describe their main results.
Rueda Zoca, Tradacete, and Villanueva proved the following result concerning the injective and projective tensor products. Let us notice that item (c) is an immediate consequence of item (b) of the same result, but we write it down for the sake of readability. It is worth pointing out that the results from [62] are for real Banach spaces.

Theorem 2.1 ([62, Theorem 1.1 and Theorem 1.2]) The following real Banach spaces satisfy the Daugavet property.
(a) $L_1(\mu_1) \hat{\otimes}_\varepsilon L_1(\mu_2)$ where $\mu_1, \mu_2$ are purely nonatomic measures.
(b) $X \hat{\otimes}_\pi Y$ whenever $X, Y$ are $L_1$-preduals with the Daugavet property.
(c) $C(K_1) \hat{\otimes}_\pi C(K_2)$ whenever $K_1, K_2$ are compact spaces without isolated points.

On the way to proving item (b) of Theorem 2.1, the authors provide a characterization of the Daugavet property in the projective tensor product $X \hat{\otimes}_\pi Y$ (see [62, Corollary 3.2]), which allowed them to come up with a useful (and we will justify this adjective in a moment) property called the operator Daugavet property. In fact, the operator Daugavet property is a sufficient condition on a pair of Banach spaces $X$ and $Y$ for $X \hat{\otimes}_\pi Y$ to satisfy the Daugavet property (see [62, Theorem 4.3]). On the other hand, Martín and Rueda Zoca considered a weaker property, called the weak operator Daugavet property, which still implies the Daugavet property for $X \hat{\otimes}_\pi Y$. Since the authors of this survey find both properties relevant (it is not known whether they are equivalent), we highlight both of them in the following lines. We send the reader to [62, Definition 4.1] and [54, Definition 5.2].

Definition 2.2 Let $X$ be a real or complex Banach space.
(a) We say that $X$ has the operator Daugavet property whenever, for every $x_1, \ldots, x_n \in S_X$, every slice $S$ of $B_X$, and every $\varepsilon > 0$, there is $x \in S$ such that, for every $x' \in B_X$, there is an operator $T \in \mathcal{L}(X)$ with $\|T\| \leq 1 + \varepsilon$, $T(x) = x'$, and $\|T(x_i) - x_i\| < \varepsilon$ for every $i \in \{1, \ldots, n\}$.
(b) We say that $X$ has the weak operator Daugavet property if, for every $x_1, \ldots, x_n \in S_X$, every slice $S$ of $B_X$, every $\varepsilon > 0$, and every $x' \in B_X$, there are $x \in S$ and $T \in \mathcal{L}(X)$ with $\|T\| \leq 1 + \varepsilon$, $\|T(x) - x'\| < \varepsilon$, and $\|T(x_i) - x_i\| < \varepsilon$ for every $i \in \{1, \ldots, n\}$.

It is known that the operator Daugavet property implies the Daugavet property (see [62, Remark 4.2]), and it is immediate that the operator Daugavet property implies the weak operator Daugavet property. By [62, Proposition 4.4 and Remark 4.5], we have that real $L_1$-preduals with the
Daugavet property and real $L_1(\mu, Y)$ with $\mu$ atomless have the operator Daugavet property. This provides more examples of Banach spaces $X, Y$ such that $X \hat{\otimes}_\pi Y$ has the Daugavet property (see [62, Corollary 4.8]). We will return to this property when we treat the nonlinear case of the Daugavet property (see Definition 3.32 of this survey in Section 3.1), where we provide a connection between the linear and nonlinear scenarios (see, for instance, Theorem 3.34).

Still as a tool, the (weak) operator Daugavet property was used to provide examples of Banach spaces $X$ such that the symmetric tensor product $\hat{\otimes}_{\pi,s,N} X$ satisfies the Daugavet property. Indeed, the authors in [62] proved that whenever $K$ is a compact Hausdorff space without isolated points, the real Banach space $\hat{\otimes}_{\pi,s,N} C(K)$ satisfies the Daugavet property (see [62, Proposition 5.3]). This result was extended by Martín and Rueda Zoca in [54], and they provided even more examples, as we can see in the following result.

Theorem 2.3 ([54, Theorem 1.1]) Let $Y$ be a real or complex Banach space. The real or complex Banach space $\hat{\otimes}_{\pi,s,N} X$ satisfies the Daugavet property whenever:
(a) $X$ is an $L_1$-predual space satisfying the Daugavet property.
(b) $X$ is an $L_1(\mu, Y)$ space with $\mu$ an atomless $\sigma$-finite positive measure.

Concerning positive results for tensor products when only one of the factors is assumed to have the Daugavet property, we would like to highlight two results. Rueda Zoca proved in [60] that whenever $X$ is a separable L-embedded real Banach space (see, for instance, [31] for background on these spaces) satisfying the Daugavet property and $Y$ is a real Banach space with the metric approximation property, the projective tensor product $X \hat{\otimes}_\pi Y$ has the Daugavet property (see [60, Theorem 3.7]).
On the other hand, Julio Becerra Guerrero and Angel Rodríguez-Palacios showed that whenever a real or complex Banach space $X$ has no minimal L-summand (see these definitions in the next paragraph), then $X \hat{\otimes}_\pi Y$ has the Daugavet property (see [7, Corollary 4.7]; we send the reader also to Corollaries 3.24, 3.25, and 3.26 of this survey). The latter result is strongly based on the theory of the centralizer and function module representations of Banach spaces, and, for this, we send the interested reader to the references from that paper. We recall that, given a Banach space $X$, a closed subspace $Y$ of $X$ is said to be an L-summand (M-summand, respectively) if there is a closed subspace $Z$ of $X$ such that $X$ is the algebraic direct sum of $Y$ and $Z$ and:
$\|y + z\| = \|y\| + \|z\|$ ($\|y + z\| = \max\{\|y\|, \|z\|\}$, respectively) for all $y \in Y$ and $z \in Z$. An L-summand or an M-summand is said to be minimal if it is minimal with respect to inclusion.

One might wonder whether we can pass the Daugavet property from the tensor product $X \hat{\otimes}_\pi Y$ or $X \hat{\otimes}_\varepsilon Y$ to one of its factors $X$ and $Y$. This problem was tackled by Miguel Martín, Javier Merí, and Alicia Quero in [52]. Let us notice that this is false in general: the real or complex Banach spaces $L_1([0,1], Y) = L_1[0,1] \hat{\otimes}_\pi Y$ and $C([0,1], Y) = C[0,1] \hat{\otimes}_\varepsilon Y$ (see [63] for background on these equalities) both have the Daugavet property for every Banach space $Y$, regardless of whether $Y$ satisfies the Daugavet property or not. By proving that the projective tensor product of a slicely countably determined operator (see its definition in, for instance, [52, pg. 72]) and a rank-one operator is once again a slicely countably determined operator on a projective tensor product (see [52, Lemma 4.2]), they proved the following result.

Theorem 2.4 ([52, Theorem 4.1]) Let $X, Y$ be real or complex Banach spaces. Suppose that $B_Y$ is a slicely countably determined set and that $X \hat{\otimes}_\pi Y$ satisfies the Daugavet property. Then, $X$ satisfies the Daugavet property.

On the other hand, by assuming Fréchet differentiability at points of the unit sphere, the authors provided the following result. It is worth mentioning that they used the classical characterization of the Daugavet property in terms of slices of the closed unit ball (see Theorem 1.3).

Theorem 2.5 ([52, Proposition 4.3 and Proposition 4.5]) Let $X, Y$ be real or complex Banach spaces. Suppose that one of the following statements holds true:
(a) $X \hat{\otimes}_\varepsilon Y$ has the Daugavet property, and the norm of $Y$ is Fréchet differentiable at a point $y_0 \in S_Y$.
(b) $X \hat{\otimes}_\pi Y$ has the Daugavet property, and the norm of $Y'$ is Fréchet differentiable at a point $y_0 \in S_{Y'}$.
Then, $X$ has the Daugavet property.
For the connection, in the context of tensor products, between the linear and nonlinear Daugavet property, we send the reader to Theorem 3.34 in Section 3.
2.2 Lipschitz Function Spaces
Once again we refer to [72] to motivate the problems of this section. In the last part of that survey (see [72, Problem (1), pg. 94]), Dirk Werner asked whether the real Banach space of Lipschitz functions over the unit square satisfies the Daugavet property. Let us notice first that the space of Lipschitz functions over $[0, 1]$ is isometric to $L_\infty[0, 1]$ (through differentiation almost everywhere) and therefore satisfies the Daugavet property.

Before highlighting the main results of this section, we need to fix some notation. Let $M$ be a metric space. We assume that $M$ contains a distinguished point $0 \in M$, and we refer to $(M, 0)$ as a pointed metric space, or simply to $M$ when the context is clear. We denote by $\operatorname{Lip}(M)$ the vector space of Lipschitz functions from $M$ into $\mathbb{R}$. If $f \in \operatorname{Lip}(M)$, then its Lipschitz constant is defined by:

$$\|f\|_L = \sup\left\{ \frac{|f(x) - f(y)|}{d(x, y)} : x, y \in M,\ x \neq y \right\}.$$
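As a small numerical illustration of the Lipschitz constant just defined, the following sketch evaluates the supremum over a finite sample of a metric space. The grid on $[0, 1]$, the test functions, and the helper name `lipschitz_constant` are assumptions chosen for illustration only; on a finite sample the value is merely a lower estimate of $\|f\|_L$ on the whole space.

```python
from itertools import combinations


def lipschitz_constant(points, dist, f):
    """Return max |f(x) - f(y)| / d(x, y) over all pairs of distinct sample points."""
    return max(abs(f(x) - f(y)) / dist(x, y) for x, y in combinations(points, 2))


# Hypothetical sample: a uniform grid on [0, 1] with the usual distance.
N = 100
grid = [i / N for i in range(N + 1)]
dist = lambda x, y: abs(x - y)

square = lambda t: t * t            # f(t) = t^2
shifted = lambda t: t * t + 5.0     # the same function plus a constant
constant = lambda t: 5.0            # a nonzero constant function

print(lipschitz_constant(grid, dist, square))
print(lipschitz_constant(grid, dist, shifted))
print(lipschitz_constant(grid, dist, constant))
```

On this grid the first two values agree up to rounding (both are close to $2 - 1/N$), while the constant function has Lipschitz constant 0 although it is not the zero function. This is the reason for passing to functions vanishing at the distinguished point when a norm is wanted.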
It turns out that $\|\cdot\|_L$ is a seminorm on $\operatorname{Lip}(M)$. When one considers $(\operatorname{Lip}_0(M), \|\cdot\|_L)$, where $\operatorname{Lip}_0(M)$ is the space of all Lipschitz functions on $M$ which vanish at $0$, we obtain a Banach space. It is nowadays a well-known fact that $\operatorname{Lip}_0(M)$ is a dual space whose predual is the Lipschitz free space:

$$\mathcal{F}(M) = \overline{\operatorname{span}}\{\delta_x : x \in M\} \subseteq \operatorname{Lip}_0(M)'$$

(here, $\delta_x$ is defined by $\delta_x(f) = f(x)$ for every $x \in M$ and $f \in \operatorname{Lip}_0(M)$). We need a little more background. A metric space $M$ is length if, for every $x, y \in M$ with $x \neq y$, the distance $d(x, y)$ is the infimum of the lengths of rectifiable curves joining $x$ and $y$. We say that $M$ is local if, for every $\varepsilon > 0$ and every Lipschitz function $f : M \to \mathbb{R}$, there exist $x \neq y$ such that $d(x, y) < \varepsilon$ and:

$$\frac{f(x) - f(y)}{d(x, y)} > \|f\|_L - \varepsilon.$$

We send the reader to [19, 30, 34, 70] for a complete background on these topics. In [34], Yevgen Ivakhno, Vladimir Kadets, and Dirk Werner proved that if $M$ is a length metric space, then $\operatorname{Lip}_0(M)$ satisfies the Daugavet property, answering in the positive the first question we have posed in this section.
Moreover, they showed that the Daugavet property for $\operatorname{Lip}_0(M)$ and for $\mathcal{F}(M)$ are equivalent when $M$ is a compact metric space. Eleven years after Ivakhno, Kadets, and Werner’s paper, Luis Carlos García-Lirola, Antonín Procházka, and Abraham Rueda Zoca gave a complete characterization, in terms of the metric space, of the Daugavet property in $\operatorname{Lip}_0(M)$ and $\mathcal{F}(M)$. Indeed, they proved the following result.

Theorem 2.6 ([29, Theorem 3.5]) Let $M$ be a complete metric space. The following statements are equivalent:
(a) $M$ is a length metric space.
(b) $\operatorname{Lip}_0(M)$ satisfies the Daugavet property.
(c) $\mathcal{F}(M)$ satisfies the Daugavet property.

Let us notice that if $\operatorname{Lip}_0(M)$ has the Daugavet property, then $\mathcal{F}(M)$ has it too, since the Daugavet property passes to preduals. On the other hand, to prove that $M$ is length whenever $\mathcal{F}(M)$ has the Daugavet property, the authors in fact prove that $M$ is local and use their own equivalence (see [29, Proposition 3.4]), a strengthening of [34, Corollary 2.10], which says that $M$ is a length space if and only if $M$ is a local space. Finally, in order to prove that $M$ is local, they consider the following useful function as a tool:

$$f_{xy}(t) = \frac{d(x, y)}{2} \, \frac{d(t, y) - d(t, x)}{d(t, y) + d(t, x)}$$

for every $x, y \in M$ with $x \neq y$ and use [29, Lemma 3.6 and Lemma 3.7] to construct a sequence $(x_n, y_n)_{n \in \mathbb{N}}$ in $M$ with $x_n \neq y_n$ such that:

$$\frac{f(x_n) - f(y_n)}{d(x_n, y_n)} > 1 - \varepsilon \quad \text{and} \quad d(x_n, y_n) < \left( \frac{2\varepsilon}{(1 - \varepsilon)^2} \right)^{n} d(x, y)$$

for every $n \in \mathbb{N}$. This allows them to conclude that $M$ is local.

Still in the same paper [29], and still working with Lipschitz functions, the authors treated the vector-valued case. For a metric space $M$ and a Banach space $X$ (which we assume to be real throughout this section), we consider the Banach space $\operatorname{Lip}_0(M, X)$ of all vector-valued Lipschitz functions $f$ with $f(0) = 0$, endowed with the norm given by the smallest Lipschitz constant. The space $\operatorname{Lip}_0(M, X)$ is isometrically isomorphic to $\mathcal{L}(\mathcal{F}(M), X)$. We say that the pair $(M, X)$ has the contraction-extension property if whenever $N \subseteq M$ and $f : N \to X$ is a Lipschitz function, there is a Lipschitz mapping $F : M \to X$ that extends $f$ and:
$\|F\|_{\operatorname{Lip}_0(M, X)} = \|f\|_{\operatorname{Lip}_0(N, X)}$ (see [9, Definition 2.3]). We send the reader to [11, Chapter 2] for examples of pairs $(M, X)$ satisfying such an extension property. The following result extends (a) $\Rightarrow$ (b) in Theorem 2.6.

Proposition 2.7 ([29, Proposition 3.11]) Let $M$ be a pointed length metric space. Suppose that the pair $(M, X)$ has the contraction-extension property. Then, $\operatorname{Lip}_0(M, X)$ satisfies the Daugavet property.

As an immediate consequence of Proposition 2.7, the following result, which is related to Section 2.1, is obtained in [29].

Corollary 2.8 ([29, Corollary 3.12.(b)]) Let $M$ be a pointed metric space and $X$ a Banach space. Assume that the pair $(M, X')$ has the contraction-extension property and also that $\mathcal{F}(M)$ satisfies the Daugavet property. Then, the projective tensor product $\mathcal{F}(M) \hat{\otimes}_\pi X$ also satisfies the Daugavet property.

García-Lirola, Procházka, and Rueda Zoca conclude their paper by presenting the following complete characterization of the Daugavet property in $\operatorname{Lip}_0(M)$ when $M$ is a compact metric space. This result extends previous results by Ivakhno, Kadets, and Werner (see [34, Theorem 3.3]).

Theorem 2.9 ([29, Corollary 5.11]) Let $M$ be a pointed compact metric space. The following statements are equivalent:
(a) $\operatorname{Lip}_0(M)$ satisfies the Daugavet property.
(b) $B_{\mathcal{F}(M)}$ has no preserved extreme points.
(c) $B_{\mathcal{F}(M)}$ has no strongly exposed points.
(d) The norm of $\operatorname{Lip}_0(M)$ has no Gâteaux differentiable points.
(e) The norm of $\operatorname{Lip}_0(M)$ has no Fréchet differentiable points.
For the connection, in the context of Lipschitz spaces, between the linear and nonlinear Daugavet property, we send the reader to Theorem 3.28 in Section 3.
2.3 C*-Algebras and JB*-Triples
In this short section, we will be dealing with JB*-triples (in particular, with C*-algebras), since they are important classes of spaces with a strong connection to the nonlinear setting (see Section 3). To make the
readability of this survey as smooth as possible, and to spare the reader from jumping between too many references when some concept is unfamiliar, we introduce some notation and terminology regarding JB*-triples. A JB*-triple is a complex Banach space $X$ endowed with a continuous triple product:

$$\{\cdot\,\cdot\,\cdot\} : X \times X \times X \to X$$

that is linear and symmetric in the outer variables and conjugate-linear in the middle variable, and that also satisfies the following three conditions:
(a) For every $x \in X$, the mapping $y \mapsto \{xxy\}$ is a Hermitian operator on $X$ and has nonnegative spectrum.
(b) The identity $\{ab\{xyz\}\} = \{\{abx\}yz\} - \{x\{bay\}z\} + \{xy\{abz\}\}$ is satisfied for every $a, b, x, y, z \in X$.
(c) $\|\{xxx\}\| = \|x\|^3$ for every $x \in X$.

It is worth mentioning that every C*-algebra is a JB*-triple under the product:

$$\{xyz\} = \frac{1}{2}\left(x y^* z + z y^* x\right). \qquad (2.1)$$
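The identity in condition (b) can be checked numerically for the product (2.1) in the C*-algebra of $2 \times 2$ complex matrices. The sketch below is only a sanity check on randomly chosen matrices, not a proof; the matrix size, the random seed, and the helper names (`matmul`, `adjoint`, `combine`, `triple`, `jordan_defect`) are assumptions made for this illustration.

```python
import random


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adjoint(a):
    """Conjugate transpose a*."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def combine(a, b, sa=1.0, sb=1.0):
    """Entrywise linear combination sa*a + sb*b."""
    n = len(a)
    return [[sa * a[i][j] + sb * b[i][j] for j in range(n)] for i in range(n)]


def triple(x, y, z):
    """Triple product (2.1): {x y z} = (x y* z + z y* x) / 2."""
    ys = adjoint(y)
    return combine(matmul(matmul(x, ys), z), matmul(matmul(z, ys), x), 0.5, 0.5)


def jordan_defect(a, b, x, y, z):
    """Largest entry of {ab{xyz}} - ({{abx}yz} - {x{bay}z} + {xy{abz}})."""
    lhs = triple(a, b, triple(x, y, z))
    rhs = combine(
        combine(triple(triple(a, b, x), y, z),
                triple(x, triple(b, a, y), z), 1.0, -1.0),
        triple(x, y, triple(a, b, z)))
    n = len(a)
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(n) for j in range(n))


def random_matrix(rng, n=2):
    """Matrix with small Gaussian-integer entries, so the arithmetic stays exact."""
    return [[complex(rng.randint(-3, 3), rng.randint(-3, 3)) for _ in range(n)]
            for _ in range(n)]


rng = random.Random(0)
a, b, x, y, z = (random_matrix(rng) for _ in range(5))
print(jordan_defect(a, b, x, y, z))
```

The printed defect should vanish up to rounding. Conditions (a) and (c) involve spectra and norms and are not tested by this sketch.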
The Banach space $\mathcal{L}(X, Y)$ of all bounded linear operators, when $X, Y$ are complex Hilbert spaces, is a JB*-triple; also, every complex Hilbert space is a JB*-triple. A JBW*-triple is a JB*-triple whose underlying Banach space is a dual Banach space. It is also worth mentioning that real JB*-triples are norm-closed real subtriples (i.e., subspaces which are closed under triple products of their elements) of complex JB*-triples. In this survey, we deal only with complex JB*-triples, which is the classical framework.

We will now highlight some characterizations concerning the Daugavet property for C*-algebras, JB*-triples, and their preduals. These results are collected from [6], and we send the reader there for more details. We start with a result about a real or complex JBW*-triple $X$ and its predual $X_*$, which characterizes the Daugavet property in terms of the geometry of these spaces.

Theorem 2.10 ([6, Theorem 3.2]) Suppose that $X$ is a JBW*-triple. Denote by $X_*$ its predual. The following statements are equivalent:
(a) $X$ has the Daugavet property.
(b) $X_*$ has the Daugavet property.
(c) If $A$ is a relative weak-open set of $B_{X_*}$, then $A$ has diameter 2.
(d) $B_{X_*}$ has no strongly exposed points.
(e) $B_{X_*}$ has no extreme points.
In order to prove Theorem 2.10, when dealing with preduals, we remind the reader that the Daugavet property passes to preduals. Also, the authors apply some results from [69]; more specifically, (b) $\Rightarrow$ (c) is a consequence of [69, Lemma 3]. To prove that (d) implies (e), the authors show that every extreme point of $B_{X_*}$ is a strongly exposed point. Finally, to prove that (e) implies (a), they use the fact that the predual of every real or complex JBW*-triple is L-embedded (see [10, Proposition 2.2]) and their own result [6, Corollary 2.3], which says that if $X$ is an L-embedded Banach space without extreme points, then $X$ satisfies the Daugavet property.

We send the reader to [6, Corollaries 3.3, 3.4, 3.5, 3.7, Proposition 3.9 and Theorem 3.10] to check the consequences of Theorem 2.10. They offer different points of view on the Daugavet property in these classes of spaces. We highlight one of them, which says that one can characterize the Daugavet property for JB*-triples in terms of their norms; this makes the JB*-triple scenario even more interesting since Theorem 2.11 is not valid for general Banach spaces (see [6, Remark 3.11]): the norm of $\ell_1$ is extremely rough (which implies that $\ell_1$ has no Fréchet differentiable points), but $\ell_1$ does not satisfy the Daugavet property. Moreover, there are Banach spaces whose norms do not have Fréchet differentiable points but are not rough (see [35, Remark 4, pg. 341]).

Theorem 2.11 ([6, Theorem 3.10]) Suppose that $X$ is a JB*-triple. The following statements are equivalent:
(a) $X$ satisfies the Daugavet property.
(b) The norm of $X$ is extremely rough.
(c) The norm of $X$ is not Fréchet differentiable at any point.

Since C*-algebras are JB*-triples under the product (2.1), we have that real C*-algebras are real JB*-triples. Therefore, we can state the following characterization of the Daugavet property for C*-algebras as a consequence of the previous results on JB*-triples.
Theorem 2.12 ([6, Corollary 4.1]) Let $X$ be a C*-algebra. The following statements are equivalent:
(a) $X$ satisfies the Daugavet property.
(b) The norm of $X$ is extremely rough.
(c) The norm of $X$ is not Fréchet differentiable at any point.

For an algebraic characterization of the Daugavet property for C*-algebras, we send the reader to [6, Corollary 4.4]. On the other hand, as a consequence of some results concerning JBW*-triples (that we will not treat here), we have the following results.

Theorem 2.13 ([6, Corollary 3.5 and Corollary 4.3.(b)]) Suppose that $X$ satisfies one of the following statements:
(a) $X$ is the dual of a JB*-triple.
(b) $X$ is a JB*-triple that is the bidual of some space.
(c) $X$ is the dual of a C*-algebra.
(d) $X$ is a C*-algebra that is the bidual of some space.
Then, $X$ does not have the Daugavet property.

We now turn to characterizations of the alternative Daugavet property for JB*-triples. As we have already mentioned in the introduction, the alternative Daugavet property is strongly connected to the concept of numerical index. Since we will return to this topic later on in this survey, we send the reader to (3.1) and (3.2) for the definitions of numerical range and numerical radius, respectively. The numerical index of a Banach space $X$ is defined as follows:

$$n(X) := \inf\{v(T) : T \in \mathcal{L}(X),\ \|T\| = 1\}.$$

We have that an operator $T \in \mathcal{L}(X)$ satisfies the alternative Daugavet equation if and only if its numerical radius coincides with its operator norm, that is, if and only if $v(T) = \|T\|$ (see, for instance, [26, pg. 483]). This means that $n(X) = 1$ if and only if every operator in $\mathcal{L}(X)$ satisfies the alternative Daugavet equation. Consequently, if $n(X) = 1$, then $X$ has the alternative Daugavet property. Nevertheless, having the alternative Daugavet property is not enough to guarantee that the numerical index of $X$ is 1. Indeed, in [53], the authors provide a counterexample: the Banach space $X := c_0 \oplus_\infty C([0, 1], \ell_2)$ does not have numerical index 1, but it does satisfy the alternative Daugavet property. In fact, such an $X$ does not satisfy the Daugavet property (see [53, Example 3.2]); this means that the alternative Daugavet property does not imply the classical Daugavet property.
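To see the equivalence between $v(T) = \|T\|$ and the alternative Daugavet equation at work in the simplest possible case, the following sketch looks at the rotation by a right angle on the real Euclidean plane. This example is an assumption chosen for illustration and is not taken from the references above. In a real Hilbert space the only norming functional of a unit vector $x$ is $\langle \cdot, x \rangle$, so the numerical radius is $\sup |\langle Tx, x \rangle|$ over the unit sphere. All suprema are approximated by sampling the unit circle, which in general gives lower estimates only.

```python
import math


def apply(T, x):
    """Apply a 2x2 real matrix T to a vector x."""
    return (T[0][0] * x[0] + T[0][1] * x[1], T[1][0] * x[0] + T[1][1] * x[1])


def unit_circle(samples=3600):
    """Sample points of the Euclidean unit sphere of R^2."""
    return [(math.cos(2 * math.pi * k / samples), math.sin(2 * math.pi * k / samples))
            for k in range(samples)]


def operator_norm(T, samples=3600):
    """Sampled estimate of sup ||Tx|| over the unit sphere."""
    return max(math.hypot(*apply(T, x)) for x in unit_circle(samples))


def numerical_radius(T, samples=3600):
    """Sampled estimate of sup |<Tx, x>| over the unit sphere."""
    radius = 0.0
    for x in unit_circle(samples):
        y = apply(T, x)
        radius = max(radius, abs(x[0] * y[0] + x[1] * y[1]))
    return radius


def identity_plus(T, omega):
    """The matrix Id + omega * T."""
    return [[(1.0 if i == j else 0.0) + omega * T[i][j] for j in range(2)]
            for i in range(2)]


rotation = [[0.0, -1.0], [1.0, 0.0]]

norm_T = operator_norm(rotation)
radius_T = numerical_radius(rotation)
alternative_lhs = max(operator_norm(identity_plus(rotation, w)) for w in (1.0, -1.0))

print(norm_T, radius_T, alternative_lhs, 1.0 + norm_T)
```

Here $\langle Tx, x \rangle = 0$ for every $x$, so the numerical radius is 0 while the norm is 1, and accordingly $\max_{\omega = \pm 1} \|\mathrm{Id} + \omega T\| = \sqrt{2}$ falls short of $1 + \|T\| = 2$.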
One might wonder for which Banach spaces $X$ the alternative Daugavet property forces $X$ to have numerical index 1. This is the case, for instance, for Asplund spaces: by using [56, Lemmas 1.1 and 1.2], Miguel Martín proved the following result. If $X$ is Asplund, then the following statements are equivalent: (a) $n(X) = 1$; (b) $X$ satisfies the alternative Daugavet property; and (c) $|x^{**}(x^*)| = 1$ for every extreme point $x^{**}$ of $B_{X''}$ and every $w^*$-strongly exposed point $x^*$ of $B_{X'}$. This can be found in [56, Corollary 1.3]. We also send the reader to [56, Proposition 3.3] for a similar phenomenon when one is dealing with C*-algebras.

In the same paper, Martín provides the following characterization of the alternative Daugavet property for JB*-triples. For the necessary background we send the interested reader to [56, Section 2] and to all references therein. We include all the characterizations obtained there for the sake of completeness.

Theorem 2.14 ([56, Theorem 2.6]) If $X$ is a JB*-triple, then the following statements are equivalent:
(a) $X$ satisfies the alternative Daugavet property.
(b) $|x^{**}(x^*)| = 1$ for every extreme point $x^{**}$ of $B_{X''}$ and every $w^*$-strongly exposed point $x^*$ of $B_{X'}$.
(c) Every elementary triple ideal of $X$ coincides with $\mathbb{C}$.
(d) There exists a closed triple ideal $Y$ with $c_0(\Gamma) \subseteq Y \subseteq \ell_\infty(\Gamma)$ for a convenient index set $\Gamma$ such that $X / Y$ is nonatomic.
(e) All minimal tripotents of $X$ are diagonalizing.

We also present a characterization that Martín obtained for C*-algebras. In the same spirit as before, we present all the characterizations of [56], although we have not introduced some of the terminology and concepts that appear in Theorem 2.15. Although it is (almost) a consequence of Theorem 2.14, we do not state it as a corollary but as a theorem. For more information about this topic, we also send the reader to [53, 58].
Theorem 2.15 ([56, Corollary 3.2]) If $X$ is a C*-algebra, then the following are equivalent:
(a) $X$ satisfies the alternative Daugavet property.
(b) $|x^{**}(x^*)| = 1$ for every extreme point $x^{**}$ of $B_{X''}$ and every $w^*$-strongly exposed point $x^*$ of $B_{X'}$.
(c) There is a two-sided commutative ideal $Y$ such that $X / Y$ is nonatomic.
(d) $K_0(X)$ is isometric to $c_0(\Gamma)$.
(e) $K_0(X)$ is commutative.
(f) All atomic projections in $X$ are central.
(g) $K_0(X) \subseteq Z(X)$.

For the connection, in the context of JB*-triples and C*-algebras, between the linear and nonlinear Daugavet property, we send the reader to Theorem 3.40 in Section 3.
3 The Daugavet Equation in the Nonlinear Setting

The aim of this section is to study both the Daugavet and alternative Daugavet equations for polynomials in Banach spaces. We will survey the most relevant results in this setting. In fact, we will be working with a more general definition that has its roots in the works by Lawrence A. Harris (see [32]) and Ángel Rodríguez-Palacios (see [59]). Both of them have shown that the most convenient setting to deal with numerical ranges is to define them for bounded uniformly continuous functions on the unit sphere of a Banach space $X$ with values in $X$. In 2007, Yun Sung Choi and Miguel Martín, in a joint work with the second and third authors of this survey (see [22]), showed that, in the nonlinear setting, the study of the Daugavet equation becomes clearer if it is also defined for bounded continuous functions on the closed unit ball of a Banach space. We will now highlight some of the results of [22], since it is the paper that lays the foundation for the nonlinear Daugavet equations.

Before doing so, let us introduce some necessary notation we will be using throughout this section. Given two Banach spaces $X, Y$ over a field $\mathbb{K}$, which can be either $\mathbb{R}$ or $\mathbb{C}$, and given $k \in \mathbb{N}$, a mapping $P : X \to Y$ is said to be a continuous $k$-homogeneous polynomial if there is a (unique) continuous $k$-linear symmetric mapping $\check{P} : X^k \to Y$ such that:

$$P(x) = \check{P}(x, \ldots, x)$$

for every $x$ in $X$, where $X^k$ denotes the Cartesian product of $X$ with itself $k$ times. We might also consider $k = 0$, and, in this case, the 0-homogeneous polynomials are exactly the constant mappings. It is worth noting that the case $k = 1$ corresponds to $\mathcal{L}(X, Y)$. In this survey, we will be mainly interested in two cases: the first one is when $Y = X$, while the second is when $Y = \mathbb{K}$; for these cases we are going to use the following notation: given $k \geq 0$, we will denote by $\mathcal{P}(^k X; X)$ the Banach space of all $k$-homogeneous continuous polynomials from $X$ into
itself and by $\mathcal{P}(^k X)$ the Banach space of all $k$-homogeneous continuous scalar polynomials. We say that $P : X \to X$ is a polynomial on $X$ whenever $P$ can be written as a finite sum of continuous homogeneous polynomials from $X$ into $X$, and we denote the space of all such polynomials by $\mathcal{P}(X; X)$. We denote by $\mathcal{P}(X)$ the space of all finite sums of continuous homogeneous scalar polynomials. The space $\mathcal{P}(X; X)$ is a normed space whenever it is endowed with the norm:

$$\|P\| := \sup\{\|P(x)\| : x \in B_X\}.$$

In other words, $\mathcal{P}(X; X)$ is isometrically embedded into $\ell_\infty(B_X, X)$, the Banach space of all bounded functions from $B_X$ into $X$ endowed with the supremum norm. In what follows, we will write simply $\ell_\infty(B_X)$ whenever we are dealing with scalar-valued functions. With the idea of generalizing the linear case (see Definition 1.1), the authors in [22] give the following general definition.

Definition 3.1 Let $X$ be a Banach space.
(a) We say that $\Phi$ in $\ell_\infty(B_X, X)$ satisfies the Daugavet equation whenever:

$$\|\mathrm{Id} + \Phi\| = 1 + \|\Phi\|.$$

(b) We say that $\Phi$ satisfies the alternative Daugavet equation if there exists $\omega \in \mathbb{K}$ with $|\omega| = 1$ such that $\omega \Phi$ satisfies the Daugavet equation or, equivalently, if:

$$\max_{|\omega| = 1} \|\mathrm{Id} + \omega \Phi\| = 1 + \|\Phi\|.$$
Let us notice that if a mapping satisfies the Daugavet equation, then it satisfies the alternative Daugavet equation. Nevertheless, the converse does not hold true in general. Indeed, an immediate counterexample is the following. The operator $T : c_0 \to c_0$ given by:

$$T(x) = -x(1)\, e_1 \qquad (x \in c_0)$$

satisfies the alternative Daugavet equation but does not satisfy the Daugavet equation.

We start by presenting the following generalization of [41, Lemma 2.2 and Theorem 2.3]. It is worth mentioning that the results in [41] are presented for linear operators, and here we will be treating a more general setting in the context of Definition 3.1. Before doing that, let us present some new notation
we will be using throughout. Let $X$ be a Banach space. Suppose that $Z$ is a subspace of $\ell_\infty(B_X)$. We write $Z \otimes X$ for the space of all functions $\Phi : B_X \to X$ such that $x^* \circ \Phi \in Z$ for every $x^* \in X'$. Let us notice that $\varphi \otimes x_0 \in Z \otimes X$ for every $\varphi \in Z$ and $x_0 \in X$, where $\varphi \otimes x_0$ denotes the function $y \mapsto \varphi(y)\, x_0$.

Theorem 3.2 ([22, Theorem 1.1]) Let $X$ be a real or complex Banach space. Suppose that $Z$ is a subspace of $\ell_\infty(B_X)$. Then, the following statements are equivalent:
(a) For every $\varphi \in Z$ and every $x_0 \in X$, the function $\varphi \otimes x_0$ satisfies the Daugavet equation.
(b) For every $\varphi \in S_Z$, every $x_0 \in S_X$, and every $\varepsilon > 0$, there exist $\omega \in \mathbb{K}$ with $|\omega| = 1$ and $y \in B_X$ such that:

$$\operatorname{Re} \omega \varphi(y) > 1 - \varepsilon \quad \text{and} \quad \|x_0 + \omega y\| > 2 - \varepsilon.$$

(c) Every $\Phi \in Z \otimes X$ whose image is relatively weakly compact satisfies the Daugavet equation.

The counterpart of Theorem 3.2 for the alternative Daugavet equation is the following result, which is also taken from [22].

Theorem 3.3 Let $X$ be a real or complex Banach space. Suppose that $Z$ is a subspace of $\ell_\infty(B_X)$. Then, the following statements are equivalent:
(a) For every $\varphi \in Z$ and every $x_0 \in X$, the function $\varphi \otimes x_0$ satisfies the alternative Daugavet equation.
(b) For every $\varphi \in S_Z$, every $x_0 \in S_X$, and every $\varepsilon > 0$, there exist $\omega_1, \omega_2 \in \mathbb{K}$ with $|\omega_1| = |\omega_2| = 1$ and $y \in B_X$ such that:

$$\operatorname{Re} \omega_1 \varphi(y) > 1 - \varepsilon \quad \text{and} \quad \|x_0 + \omega_2 y\| > 2 - \varepsilon.$$

(c) For every $\varphi \in S_Z$, every $x_0 \in S_X$, and every $\varepsilon > 0$, there exist $\omega \in \mathbb{K}$ with $|\omega| = 1$ and $y \in B_X$ such that:

$$|\varphi(y)| > 1 - \varepsilon \quad \text{and} \quad \|x_0 + \omega y\| > 2 - \varepsilon.$$

(d) Every $\Phi \in Z \otimes X$ whose image is relatively weakly compact satisfies the alternative Daugavet equation.

As we have already mentioned in the linear case (see Section 2.3), there is a strong connection between the concepts of numerical range and numerical radius and the Daugavet and alternative Daugavet equations. This was
explicitly stated in [22]. Before providing the results in this direction, let us present some concepts. For a Banach space $X$, we set:

$$\Pi(X) := \{(x, x^*) : x \in S_X,\ x^* \in S_{X'},\ x^*(x) = 1\}.$$

Let $f : S_X \to X$ be a bounded function. The spatial numerical range of the function $f$ is defined by:

$$V(f) := \{x^*(f(x)) : (x, x^*) \in \Pi(X)\} \qquad (3.1)$$

and its numerical radius is:

$$v(f) := \sup\{|\lambda| : \lambda \in V(f)\}. \qquad (3.2)$$

These definitions were first given by L. Harris in [33] as extensions of the definition of numerical range for bounded operators on Banach spaces given by Bauer [5]. We send the reader to references [12, 13] for a complete background on these topics. Now we are ready to state the promised relation between the (alternative) Daugavet equation and the concepts of spatial numerical range and numerical radius.

Proposition 3.4 ([22, Proposition 1.3]) Let $X$ be a real or complex Banach space. Suppose that $\Phi : B_X \to X$ is a uniformly continuous mapping. Then, we have the following results:
(a) $\Phi$ satisfies the Daugavet equation if and only if $\|\Phi\| = \sup \operatorname{Re} V(\Phi)$.
(b) $\Phi$ satisfies the alternative Daugavet equation if and only if $\|\Phi\| = v(\Phi)$.
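Returning to the operator $T(x) = -x(1)\, e_1$ on $c_0$ mentioned after Definition 3.1, the two equations can be checked by a direct computation. The sketch below works on a finite-dimensional truncation, namely $\mathbb{R}^n$ with the supremum norm, which is an assumption made only to obtain something computable. Since $T$ only touches the first coordinate, the norms computed here agree with those on $c_0$. On $(\mathbb{R}^n, \|\cdot\|_\infty)$ the operator norm of a matrix is its largest absolute row sum; the helper names are hypothetical.

```python
def sup_operator_norm(A):
    """Operator norm on (R^n, sup norm): the largest absolute row sum."""
    return max(sum(abs(entry) for entry in row) for row in A)


def identity_plus(A, omega):
    """The matrix Id + omega * A."""
    n = len(A)
    return [[(1.0 if i == j else 0.0) + omega * A[i][j] for j in range(n)]
            for i in range(n)]


# Truncation of T(x) = -x(1) e_1 to the first n coordinates (n is arbitrary).
n = 4
T = [[0.0] * n for _ in range(n)]
T[0][0] = -1.0

norm_T = sup_operator_norm(T)
daugavet_lhs = sup_operator_norm(identity_plus(T, 1.0))
alternative_lhs = max(sup_operator_norm(identity_plus(T, w)) for w in (1.0, -1.0))

print("||T|| =", norm_T)
print("||Id + T|| =", daugavet_lhs)
print("max over omega = +1, -1 of ||Id + omega T|| =", alternative_lhs)
```

The Daugavet equation fails because $\|\mathrm{Id} + T\| = 1 < 2 = 1 + \|T\|$, whereas the choice $\omega = -1$ gives $\|\mathrm{Id} - T\| = 2$, so the alternative Daugavet equation holds.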
3.1 Nonhomogeneous Polynomial Spaces
Now we focus on the (nonhomogeneous) polynomial case of the Daugavet and alternative Daugavet equations, although we will return to
this more general situation (in the sense of Definition 3.1) soon, in Section 3.4. We start with weakly compact polynomials. Notice that linear operators are polynomials; therefore $\mathbb{R}$ and $\mathbb{C}$ are examples of Banach spaces on which there are (weakly compact) polynomials that do not satisfy the Daugavet equation. Even so, by using the maximum modulus theorem, one can prove that every polynomial on $\mathbb{C}$ satisfies the alternative Daugavet equation (see [22, Examples 2.1.(b)]); this is no longer true for the real case (see [22, Examples 2.1.(c)]). In order to provide examples of Banach spaces for which every weakly compact polynomial satisfies the Daugavet equation, the following result, which is a corollary of Theorem 3.2, is useful.

Corollary 3.5 ([22, Corollary 2.2]) Let $X$ be a real or complex Banach space. The following statements are equivalent:
(a) For every $P \in \mathcal{P}(X)$ and every $x_0 \in X$, the polynomial $P \otimes x_0$ satisfies the Daugavet equation.
(b) For every $P \in \mathcal{P}(X)$ with $\|P\| = 1$, every $x_0 \in S_X$, and every $\varepsilon > 0$, there exist $\omega \in \mathbb{K}$ with $|\omega| = 1$ and $y \in B_X$ such that:

$$\operatorname{Re} \omega P(y) > 1 - \varepsilon \quad \text{and} \quad \|x_0 + \omega y\| > 2 - \varepsilon.$$

(c) Every weakly compact $P \in \mathcal{P}(X; X)$ satisfies the Daugavet equation.

Based on what we have seen in Corollary 3.5, we can define the following property.

Definition 3.6 Let $X$ be a real or complex Banach space. If $X$ satisfies one (and hence all) of the conditions of Corollary 3.5, we say that $X$ satisfies the polynomial Daugavet property.

Two useful criteria, in terms of sequences, for checking whether a Banach space $X$ satisfies the polynomial Daugavet property are the following.

Proposition 3.7 ([23, Proposition 6.3]) Let $X$ be a real or complex Banach space. Suppose that, for all $x, z \in S_X$, $\omega \in \mathbb{K}$ with $|\omega| = 1$, and $\varepsilon > 0$, there exists a sequence $(z_n)_{n=1}^{\infty}$ in $X$ such that:
(a) The series $\sum_{n=1}^{\infty} z_n$ is weakly unconditionally Cauchy.
(b) $\limsup \|z + z_n\| \leq 1$.
(c) $\|x + \omega(z + z_n)\| > 2 - \varepsilon$ for every $n \in \mathbb{N}$.
Then, $X$ satisfies the polynomial Daugavet property.
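The earlier observation that real polynomials may fail the alternative Daugavet equation can be made concrete with a one-dimensional computation. The polynomial $P(x) = 1 - x^2$ on $X = \mathbb{R}$ used below is a hypothetical choice made for illustration and is not necessarily the example given in [22]. The suprema over $B_X = [-1, 1]$ are approximated on a uniform grid, whose size is likewise an assumption.

```python
def sup_on_interval(g, samples=2000):
    """Approximate sup |g(x)| over [-1, 1] on a uniform grid."""
    return max(abs(g(-1.0 + 2.0 * k / samples)) for k in range(samples + 1))


# Hypothetical example: a nonhomogeneous polynomial on the real line.
P = lambda x: 1.0 - x * x

norm_P = sup_on_interval(P)

# In the real case the only scalars of modulus one are +1 and -1.
lhs = max(sup_on_interval(lambda x, w=w: x + w * P(x)) for w in (1.0, -1.0))
rhs = 1.0 + norm_P

print("||P|| =", norm_P)
print("max ||Id + omega P|| =", lhs)
print("1 + ||P|| =", rhs)
```

A short computation by hand confirms the numerical values: $x + 1 - x^2$ attains its largest absolute value $5/4$ at $x = 1/2$, and $x - 1 + x^2$ attains $5/4$ in absolute value at $x = -1/2$, so the left-hand side is $5/4 < 2 = 1 + \|P\|$.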
The second sequential criterion for the polynomial Daugavet property was provided very recently (we send the reader once again to [54]). We highlight it as follows.

Proposition 3.8 ([54, Proposition 4.3]) Let $X$ be a real or complex Banach space. Suppose that, given $x \in S_X$, $y \in B_X$, and $\omega \in \mathbb{K}$ with $|\omega| = 1$, there is a sequence $(x_n^{**})$ in $B_{X''}$ such that:

$$\limsup \|x + \omega x_n^{**}\| = 2$$

and that the linear operator from $c_0$ to $X''$ defined by $e_n \mapsto x_n^{**} - y$ for all $n \in \mathbb{N}$ is continuous. Then, $X$ satisfies the polynomial Daugavet property.

Remark 3.9 The Daugavet property is infinite-dimensional in nature. Indeed, let $X$ be a real or complex finite-dimensional Banach space; if $\mathrm{Id} : X \to X$ denotes the identity operator, then $-\mathrm{Id}$ is a compact operator that does not satisfy the Daugavet equation since:

$$\|\mathrm{Id} + (-\mathrm{Id})\| = 0 < 2 = 1 + \|{-\mathrm{Id}}\|.$$

When it comes to the alternative Daugavet property, the situation changes drastically. Indeed, the complex field $\mathbb{C}$ satisfies the alternative polynomial Daugavet property, but $\mathbb{R}$ does not.

The first known examples of Banach spaces having the polynomial Daugavet property are the $C(K)$-spaces when $K$ is compact Hausdorff without isolated points (see Theorem 3.11 and Corollary 3.12). In fact, a more general result is known. We will provide some notation before mentioning such results. Here, $\Omega$ will stand for a completely regular Hausdorff topological space. We denote by $C_b(\Omega, X)$ the Banach space of all bounded $X$-valued continuous functions on $\Omega$ endowed with the supremum norm. For a locally compact Hausdorff space $L$, we denote by $C_0(L, X)$ the Banach space of all continuous functions from $L$ into $X$ vanishing at infinity endowed with the supremum norm. We have the following definition.

Definition 3.10 ([23, Definition 2.3]) We say that a subspace $Z$ of $C_b(\Omega, X)$ is $C_b$-rich if for every open subset $U \subseteq \Omega$, every $x \in X$, and every $\varepsilon > 0$, there exists a continuous function $\varphi : \Omega \to [0, 1]$ of norm one with support included in $U$ such that the distance between $\varphi \otimes x$ and $Z$ is less than $\varepsilon$.
It is worth noticing that Definition 3.10 is an extension of C-rich subspaces of C(K) given in [17, Definition 2.3]. Notice also that, without loss of
generality, one can assume that there exists $t \in U$ such that $\varphi(t) = 1$ in Definition 3.10. Now, we can state the first relevant result of [22] for $C(K)$-spaces, obtained in 2007 (see also [28, Theorem 11.23] for a different proof).

Theorem 3.11 ([22, Theorem 2.4]) Let $\Omega$ be a completely regular Hausdorff topological space without isolated points. Let $X$ be a real or complex Banach space and let $Z$ be a $C_b$-rich subspace of $C_b(\Omega, X)$. Then, every weakly compact polynomial from $Z$ into itself satisfies the Daugavet equation.

As $C_b(\Omega, X)$-spaces are $C_b$-rich in themselves, the above theorem applies to them, too. Moreover, given a perfect compact Hausdorff space $K$, the spaces $C(K, X)$ and $C_b(K, X)$ coincide. Also, if $L$ is a locally compact Hausdorff topological space without isolated points, then the space $C_0(L, X)$ is $C_b$-rich in $C_b(L, X)$. Therefore, we may apply Theorem 3.11, and we obtain the following corollaries.

Corollary 3.12 ([22, Corollary 2.5] and [23, Corollary 6.2]) Let $K$ be a perfect compact Hausdorff space, $L$ a locally compact Hausdorff topological space without isolated points, $\Omega$ a completely regular Hausdorff topological space without isolated points, and let $X$ be a real or complex Banach space. Then:
(a) The real and complex $C(K, X)$-spaces
(b) The real and complex $C_0(L, X)$-spaces
(c) The real and complex $C_b(\Omega, X)$-spaces
(d) All the $C_b$-rich subspaces of (a), (b), and (c)
have the polynomial Daugavet property.

An extension of the above result for the case of compact Hausdorff spaces without isolated points is possible since any finite-codimensional subspace of $C(K)$ is $C$-rich in $C(K)$. Thus, the above theorem gives the following corollary.

Corollary 3.13 ([22, Corollary 2.7]) Let $K$ be a compact Hausdorff topological space without isolated points and let $Y$ be a finite-codimensional subspace of $C(K)$. Then, $Y$ satisfies the polynomial Daugavet property.

What about the alternative Daugavet equation? One might first ask whether the absence of isolated points is a necessary condition for the theorem above to hold. The answer is yes: it is indeed necessary, since even in the linear setting, when $K$ has an isolated point, it is always possible to find weakly compact operators on $C(K)$ which do not satisfy the Daugavet
equation. For the alternative Daugavet equation, we distinguish between the real and the complex case. In the real case, even in the simplest case $C(K) = \mathbb{R}$, there exist (weakly compact) polynomials which do not satisfy the alternative Daugavet equation. Nevertheless, the complex case is totally different, as we can see in the next result.

Proposition 3.14 ([22, Corollary 2.10]) Let $K$ be a compact Hausdorff space. Then, every polynomial on the complex space $C(K)$ satisfies the alternative Daugavet equation.

It is worth mentioning that a more general result than Proposition 3.14 holds; we refer the reader to Theorem 3.61, where it is stated in the context of holomorphic functions.

In 2008, the authors of [23] provided more examples of Banach spaces which satisfy the polynomial Daugavet property. In order to state them, we need some extra notation. The symbol $C_w(K, X)$ stands for the space of all functions from $K$ into $X$ that are continuous when $X$ is endowed with the weak topology; the symbol $C_{w^*}(K, X')$ stands for the space of functions from $K$ into $X'$ that are continuous when $X'$ is endowed with the weak-star topology. These examples are presented in the following result.

Theorem 3.15 ([23, Theorem 6.5]) Let $X$ be a real or complex Banach space.
(a) If $\mu$ is an atomless $\sigma$-finite measure, then the Banach space $L_\infty(\mu, X)$ satisfies the polynomial Daugavet property.
(b) If $K$ is a perfect compact Hausdorff topological space, then both spaces $C_w(K, X)$ and $C_{w^*}(K, X')$ have the polynomial Daugavet property.

The authors of [23] provide the following stability results for $c_0$- and $\ell_\infty$-sums.

Proposition 3.16 ([23, Proposition 6.7]) The polynomial Daugavet property is inherited by M-summands. Conversely, if $\{X_\lambda : \lambda \in \Lambda\}$ is a non-empty family of real or complex Banach spaces with the polynomial Daugavet property, then $[\oplus_{\lambda \in \Lambda} X_\lambda]_{\ell_\infty}$ and $[\oplus_{\lambda \in \Lambda} X_\lambda]_{c_0}$ have the polynomial Daugavet property.
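The contrast between the real and the complex scalar field noted before Proposition 3.14 can be seen on a single example. The sketch below is an illustration chosen here (not necessarily the example used in [22]): it takes $P(t) = 1 - t^2$ on the one-dimensional space, uses the alternative Daugavet equation in its usual form $\max_{|\omega| = 1} \|\mathrm{Id} + \omega P\| = 1 + \|P\|$, and approximates the suprema on uniform grids. The function names and grid sizes are assumptions of this sketch; in the complex case the suprema over the closed unit disc are sampled on the unit circle only, which is justified by the maximum modulus principle.

```python
import cmath
import math


def poly(t):
    """Illustrative polynomial P(t) = 1 - t^2 (accepts real or complex t)."""
    return 1 - t * t


def real_sides(steps=2000):
    """Return (max over sign of ||Id + sign*P||, 1 + ||P||) on the real line."""
    grid = [-1.0 + 2.0 * i / steps for i in range(steps + 1)]
    norm_p = max(abs(poly(x)) for x in grid)
    lhs = max(abs(x + sign * poly(x)) for sign in (1.0, -1.0) for x in grid)
    return lhs, 1.0 + norm_p


def complex_sides(steps=360):
    """Same two quantities over the complex field, sampled on the unit circle."""
    circle = [cmath.exp(2j * math.pi * i / steps) for i in range(steps)]
    norm_p = max(abs(poly(z)) for z in circle)
    # w ranges over unimodular scalars, z over the boundary of the unit ball
    lhs = max(abs(z + w * poly(z)) for w in circle for z in circle)
    return lhs, 1.0 + norm_p


if __name__ == "__main__":
    print("real:   ", real_sides())
    print("complex:", complex_sides())
```

In the real case the two sides are 5/4 and 2, so the alternative Daugavet equation fails for this polynomial; in the complex case both sides equal 3 up to rounding (take $z = \omega = i$), in line with Proposition 3.14 for a one-point $K$.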
Let us now see that vector-valued function spaces have the polynomial Daugavet property whenever the range spaces do. Proposition 3.16 allowed the authors to obtain another positive result for the Banach space $L_\infty(\mu, X)$ in both the real and complex cases. They used the following fact: given a Banach space $X$ and a $\sigma$-finite measure $\mu$, there is a countable set $J$ and an atomless $\sigma$-finite measure $\nu$ such that:
$$L_\infty(\mu, X) \equiv L_\infty(\nu, X) \oplus_\infty \ell_\infty(J, X).$$
Therefore, Theorem 3.15 and Proposition 3.16 provide the following result (we refer the reader to [55, Theorem 2.5] for the linear case of Corollary 3.17).

Corollary 3.17 ([23, Corollary 6.8]) Let $(\Omega, \Sigma, \mu)$ be a $\sigma$-finite measure space and let $X$ be a real or complex Banach space. Then, the space $L_\infty(\mu, X)$ satisfies the polynomial Daugavet property if and only if $X$ satisfies the polynomial Daugavet property or $\mu$ is atomless.

For the spaces of vector-valued continuous functions, the authors needed other technical tools; the most relevant of them is the following lemma, from which they obtained the promised result.

Lemma 3.18 ([23, Lemma 6.9]) Let $X$ be a complex Banach space with the polynomial Daugavet property, $\Gamma$ a non-empty set, and let $Y$ be a closed subspace of $\ell_\infty(\Gamma, X)$. Suppose that for every $f_0 \in Y$ there is a subset $U_0$ of $\Gamma$ which is norming for $Y$ such that for every $t_0 \in U_0$ and every $\delta > 0$, there is a function $\varphi : \Gamma \to [0, 1]$ with $\varphi(t_0) = 1$ such that
$$\Psi(x) = (1 - \varphi) f_0 + \varphi \otimes x \in Y \qquad (x \in X)$$
and there is $x_0 \in B_X$ such that $\|f_0 - \Psi(x_0)\| < \delta$. Then $Y$ has the polynomial Daugavet property.

Now, as a consequence of Corollary 3.12, Theorem 3.15, Proposition 3.16, and Lemma 3.18, we have the following series of characterizations of the polynomial Daugavet property for the function spaces $C(K, X)$, $C_w(K, X)$, $C_0(L, X)$, $C_b(\Omega, X)$, and $C_{w^*}(K, X')$.

Theorem 3.19 ([23, Proposition 6.10]) Let $X$ be a complex Banach space, $K$ a compact space, $L$ a locally compact space, and $\Omega$ a completely regular space.
(a) $C(K, X)$ has the polynomial Daugavet property if and only if $X$ does or $K$ is perfect.
(b) $C_w(K, X)$ has the polynomial Daugavet property if and only if $X$ does or $K$ is perfect.
(c) $C_0(L, X)$ has the polynomial Daugavet property if and only if $X$ does or $L$ is perfect.
(d) $C_b(\Omega, X)$ has the polynomial Daugavet property if and only if $X$ does or $\Omega$ is perfect.
(e) If $K$ contains a dense subset of isolated points and $X'$ has the polynomial Daugavet property, then $C_{w^*}(K, X')$ has the polynomial Daugavet property.

In 2010, Miguel Martín, Javier Merí, and Mikhail Popov proved analogous results in the context of spaces of integrable functions.

Theorem 3.20 ([51, Theorem 2.6]) If $(\Omega, \Sigma, \mu)$ is an atomless measure space, then the Banach space $L_1(\mu)$ satisfies the polynomial Daugavet property.

One key element in their proof is the fact that $L_1(\mu)$ satisfies the Dunford-Pettis property (see, for instance, [2, Theorem 5.4.5]). This implies (see, for instance, [25, Proposition 2.34]) that every scalar polynomial on $L_1(\mu)$ is weakly sequentially continuous. With a different approach, they were able to obtain the vector-valued version of Theorem 3.20. Notice that it is not an immediate adaptation of Theorem 3.20, since $L_1(\mu, X)$ does not satisfy the Dunford-Pettis property in general. They instead use the density of simple functions and bring the problem down to a finite-dimensional space $F$, where they can use the Dunford-Pettis property of $L_1(\mu, F)$ (see the proof of [51, Theorem 3.3]). We highlight it as follows.

Theorem 3.21 ([51, Theorem 3.3]) Let $(\Omega, \Sigma, \mu)$ be an atomless measure space and let $X$ be a real or complex Banach space. Then, the Banach space $L_1(\mu, X)$ satisfies the polynomial Daugavet property.

In 2008, Julio Becerra Guerrero and Angel Rodríguez Palacios (see [7]) defined and studied the concept of Banach spaces that are representable in a compact Hausdorff space, and they proved that these spaces have the (linear) Daugavet property. In 2016, Geraldo Botelho and Elisa R. Santos improved that result in [16] by proving that these spaces indeed have the polynomial Daugavet property. Before stating these results, we need some background.

Definition 3.22 ([7, Definition 2.3]) Let $X$ be a real or complex Banach space and let $K$ be a compact Hausdorff space.
We say that the space $X$ is $K$-representable if there is a family $(X_k)_{k \in K}$ of Banach spaces such that $X$ is linearly isometric to a closed $C(K)$-submodule of the $C(K)$-module $\prod_{k \in K}^{\infty} X_k$ in such a way that, for every $x \in S_X$ and every $\varepsilon > 0$, the set $\{k \in K : \|x(k)\| > 1 - \varepsilon\}$ is infinite. $X$ is said to be representable if it is $K$-representable for some compact Hausdorff space $K$.
Botelho and Santos, by using topological methods, the characterization of weakly unconditionally Cauchy series, and a characterization of the polynomial Daugavet property in terms of sequences (see Proposition 3.7), proved that every representable Banach space satisfies the polynomial Daugavet property (see also [7, Lemma 2.4]).

Theorem 3.23 ([16, Theorem 2.5]) Every representable Banach space has the polynomial Daugavet property.

The definition of a representable Banach space looks very abstract. However, in [7], it is shown that many examples of Banach spaces having the Daugavet property can be obtained by checking that they satisfy that definition. Botelho and Santos exploited this mine to get a plethora of examples of Banach spaces having the polynomial Daugavet property. Some of them are listed in the following corollaries.

Corollary 3.24 ([16, Corollary 2.6]) Let $X$, $Y$ be real or complex Banach spaces. Suppose that $Y$ is a representable Banach space. Then, we have the following results:
(a) $X \hat{\otimes}_\varepsilon Y$ has the polynomial Daugavet property.
(b) If $M$ is a closed subspace of $\mathcal{L}(X, Y)$ such that $\mathcal{L}(Y) \circ M \subseteq M$, then $M$ has the polynomial Daugavet property.

Corollary 3.25 ([16, Corollary 2.9]) Every dual Banach space without minimal M-summands has the polynomial Daugavet property.

By applying Corollary 3.24, they also get the following result.

Corollary 3.26 ([16, Corollary 2.10]) Let $X$ be a real or complex Banach space. Suppose that $Y$ is a dual Banach space without minimal M-summands and take $M$ to be a closed subspace of $\mathcal{L}(X, Y)$ such that $\mathcal{L}(Y) \circ M \subseteq M$. Then, $M$ satisfies the polynomial Daugavet property.

In 2022, Miguel Martín and Abraham Rueda Zoca considered in [54] the natural open question (see also [51, Problem 1.2], where it is explicitly stated): when does the (linear) Daugavet property imply the polynomial Daugavet property? From their study of Daugavet points and $\Delta$-points in $L_1$-preduals (see [54, Section 3] for more details), they obtain several new and deep results. Among them is the following theorem.

Theorem 3.27 ([54, Theorem 4.2]) The following Banach spaces have the polynomial Daugavet property:
(a) $L_1$-preduals with the Daugavet property
(b) More generally, spaces nicely embedded in $C_b(\Omega)$ when $\Omega$ has no isolated points and for which $\{p_s : s \in S\}$ is (pairwise) linearly independent

In the same spirit as Theorem 3.27, Martín and Rueda Zoca also prove that if $\mathrm{Lip}_0(M)$ satisfies the (linear) Daugavet property, then it satisfies the polynomial Daugavet property, whenever $M$ is complete. We refer the reader to Section 2.2 of this survey for more results about the Daugavet property in the context of Lipschitz spaces.

Proposition 3.28 ([54, Proposition 4.4]) Let $M$ be a pointed complete metric space. If $\mathrm{Lip}_0(M)$ has the Daugavet property, then it has the polynomial Daugavet property.

Even more recent is a 2022 paper by David Cabezas, Miguel Martín, and Antonio M. Peralta (see [20]). They have obtained very interesting results in the context of C*-algebras and, more generally, in the context of JB*-triples (we refer the reader to Section 2.3 of this survey for more details on these spaces and their relation with the Daugavet properties). They extend part of the above-mentioned results by proving that, for these spaces, the (linear) Daugavet property and the polynomial Daugavet property are in fact equivalent. Before describing their results precisely, let us highlight a result about polynomials they have obtained. It says that, in complex Banach spaces with the Daugavet property, weakly compact polynomials which are weakly continuous on bounded sets satisfy the Daugavet equation.

Theorem 3.29 ([20, Theorem 2.3]) Let $X$ be a complex Banach space with the Daugavet property. Then, given $P \in \mathcal{P}(X)$ which is weakly continuous on bounded sets and satisfies $\|P\| = 1$, $a \in X$ with $\|a\| = 1$, and $\varepsilon > 0$, there are $x \in B_X$ and $w \in \mathbb{C}$ with $|w| = 1$ such that
$$\operatorname{Re} w P(x) > 1 - \varepsilon \qquad \text{and} \qquad \|a + w x\| > 2 - \varepsilon.$$
As a consequence, every weakly compact $P \in \mathcal{P}(X, X)$ which is weakly continuous on bounded sets satisfies the Daugavet equation.

It is important to notice that Theorem 3.29 cannot be used to get the polynomial Daugavet property from the (linear) Daugavet property in an arbitrary Banach space: indeed, every Banach space with the Daugavet property contains an isomorphic copy of $\ell_1$, and it is known that every Banach space containing $\ell_1$ admits a continuous scalar polynomial which is not
weakly continuous on bounded sets (see the first item in [20, Remarks 2.5] for proper references for these facts). Then, they undertake a deep study of some properties of JB*-triples, including properties of minimal tripotents, the description of the tripotents in JBW*-triples, the characterization of when a bounded multilinear mapping defined on the product of a finite number of JB*-triples is quasi completely continuous in terms of its Aron-Berner extension, and the use of the triple spectrum of an element of a JB*-triple. Combining all these tools and more (we refer the reader to [20, Section 3]), they are able to obtain the following result.

Theorem 3.30 ([20, Theorem 4.1]) Let $X$ be a JB*-triple satisfying the Daugavet property. Then, $X$ satisfies the polynomial Daugavet property.

An important remark is in order. The polynomial Daugavet property for C*-algebras and JB*-triples was first studied by Elisa R. Santos in her PhD thesis, and her results were published in [66]. It is worth recalling some of them here. For nonatomic commutative C*-algebras with unit, Santos proved that they satisfy the polynomial Daugavet property. In fact, she proved a more general result that we state later in this survey (see Proposition 3.63). For JB*-triples, she obtained the following result, which was extended in 2022 by the already stated Theorem 3.30. We highlight it since it was proved before the extension given by Cabezas, Martín, and Peralta.

Theorem 3.31 ([66, Theorem 3.5]) Let $X$ be a JB*-triple. The following statements are equivalent:
(a) $X$ has no minimal tripotents.
(b) Every polynomial of finite type on $X$ satisfies the Daugavet equation.
(c) Every approximable polynomial on $X$ satisfies the Daugavet equation.
(d) Every rank-one operator on $X$ satisfies the Daugavet equation.

In 2022, Martín and Rueda Zoca introduced a stronger version of the Daugavet property called the (weak) operator Daugavet property. We refer the reader to Definition 2.2 of this survey for the linear setting. To establish this property for polynomials, we need some preliminary notation. Let $X$ be a Banach space. Given $x_1, \ldots, x_n \in S_X$, $\varepsilon > 0$, and $x' \in B_X$, we denote by $OF(x_1, \ldots, x_n; x', \varepsilon)$ the set of all $y \in B_X$ for which there exists an operator $T \in \mathcal{L}(X)$ such that:
$$\|T\| \le 1 + \varepsilon, \qquad \|T(y) - x'\| < \varepsilon, \qquad \text{and} \qquad \|T(x_i) - x_i\| < \varepsilon$$
for every $i = 1, \ldots, n$. We refer the reader to [54, Lemma 5.6], where it is checked that this set is large in the sense that it is weakly dense. We are now ready to give the definition of the weak operator Daugavet property in the context of polynomials.

Definition 3.32 ([54, Definition 5.7]) Let $X$ be a real or complex Banach space. We say that $X$ satisfies the polynomial weak operator Daugavet property if, for every $P \in \mathcal{P}(X)$ with $\|P\| = 1$, every $x_1, \ldots, x_n \in S_X$, $x' \in B_X$, $\alpha > 0$, and $\varepsilon > 0$, there are $y \in OF(x_1, \ldots, x_n; x', \varepsilon)$ and $\omega \in \mathbb{T}$ with
$$\operatorname{Re} \omega P(y) > 1 - \alpha.$$

It is worth saying that [54, Remark 5.8] points out that the polynomial weak operator Daugavet property implies both the weak operator Daugavet property (see Definition 2.2 of this survey) and the polynomial Daugavet property. Plenty of examples satisfying this new property are given in the next result.

Theorem 3.33 ([54, Proposition 5.9 and Proposition 5.11])
1. If $X$ is an $L_1$-predual with the Daugavet property, then $X$ has the polynomial weak operator Daugavet property.
2. If $\mu$ is an atomless $\sigma$-finite positive measure and $Y$ is a Banach space, then $L_1(\mu, Y)$ has the polynomial weak operator Daugavet property.

Finally, they provide the following remarkable result. Roughly speaking, one can pass from the polynomial weak operator Daugavet property of a Banach space to the Daugavet property of its symmetric tensor products.

Theorem 3.34 ([54, Theorem 5.12]) Let $X$ be a Banach space with the polynomial weak operator Daugavet property and let $N \in \mathbb{N}$. The space $\hat{\otimes}_{\pi,s,N} X$ has the weak operator Daugavet property and hence the Daugavet property.

In 2014, Santos studied the alternative Daugavet property for polynomials in [65]. We remind the reader that a Banach space $X$ satisfies the alternative polynomial Daugavet property if every weakly compact polynomial on $X$ satisfies the alternative Daugavet equation.
First she proved the stability of the alternative polynomial Daugavet property for $c_0$- and $\ell_\infty$-sums.

Proposition 3.35 ([65, Proposition 2.1]) Let $\{X_j\}_{j=1}^{\infty}$ be a sequence of Banach spaces. Then, the following statements are equivalent:
(a) $[\oplus_{j=1}^{\infty} X_j]_{\ell_\infty}$ or $[\oplus_{j=1}^{\infty} X_j]_{c_0}$ satisfies the alternative polynomial Daugavet property.
(b) $X_j$ satisfies the alternative polynomial Daugavet property for every $j \in \mathbb{N}$.

She continued by studying whether the alternative polynomial Daugavet property is stable under $\ell_1$-sums. It was known that the (linear) alternative Daugavet property is stable under $\ell_1$-sums (proved by Martín and Oikhberg in 2004 in [53, Proposition 3.1]), but the result does not hold in general for the alternative polynomial Daugavet property. Indeed, $\mathbb{C}$ has the alternative polynomial Daugavet property, but $\ell_1(\mathbb{C})$ does not. Even so, Santos pointed out that whenever an $\ell_1$-sum of Banach spaces satisfies either the polynomial Daugavet property or the alternative polynomial Daugavet property, then so do its summands. This is the content of the next result.

Proposition 3.36 ([65, Proposition 2.3]) Let $\{X_j\}_{j=1}^{\infty}$ be a sequence of Banach spaces. If $[\oplus_{j=1}^{\infty} X_j]_{\ell_1}$ has the polynomial Daugavet property (respectively, the alternative polynomial Daugavet property), then every $X_j$ has the polynomial Daugavet property (respectively, the alternative polynomial Daugavet property).

Santos was also able to check that the techniques used in other papers (see [22] and [53]) to prove that some spaces have the polynomial Daugavet property can be used to prove the alternative polynomial Daugavet property. More precisely, she obtained the following result.

Proposition 3.37 ([65, Proposition 3.1]) Let $(\Omega, \Sigma, \mu)$ be a $\sigma$-finite measure space and let $X$ be a Banach space. The space $L_\infty(\mu, X)$ has the alternative polynomial Daugavet property if and only if $\mu$ is atomless or $X$ has the alternative polynomial Daugavet property.

By using a result [49, Lemma 1] about the density of the points at which a weakly continuous function is actually norm continuous, Santos provides a characterization of the alternative polynomial Daugavet property for continuous vector-valued function spaces in the spirit of Theorem 3.19.

Proposition 3.38 ([65, Proposition 3.3]) Let $X$ be a complex Banach space, $K$ a compact Hausdorff space, $L$ a locally compact Hausdorff space, and $\Omega$ a completely regular Hausdorff space. The following statements hold:
(a) $C(K, X)$ has the alternative polynomial Daugavet property if and only if $K$ is perfect or $X$ has the alternative polynomial Daugavet property.
(b) $C_w(K, X)$ has the alternative polynomial Daugavet property if and only if $K$ is perfect or $X$ has the alternative polynomial Daugavet property.
(c) $C_0(L, X)$ has the alternative polynomial Daugavet property if and only if $L$ is perfect or $X$ has the alternative polynomial Daugavet property.
(d) $C_b(\Omega, X)$ has the alternative polynomial Daugavet property if and only if $\Omega$ is perfect or $X$ has the alternative polynomial Daugavet property.

When it comes to the Banach space $L_1(\mu, X)$ of Bochner integrable functions, she obtained the following result regarding the alternative polynomial Daugavet property.

Proposition 3.39 ([65, Proposition 3.4]) Let $(\Omega, \Sigma, \mu)$ be a $\sigma$-finite measure space and let $X$ be a Banach space. If $L_1(\mu, X)$ has the polynomial Daugavet property (respectively, the alternative polynomial Daugavet property), then $\mu$ is atomless or $X$ has the polynomial Daugavet property (respectively, the alternative polynomial Daugavet property).

Cabezas, Martín, and Peralta, in their recent paper of 2022 (see [20]), show that whenever a JB*-triple satisfies the alternative Daugavet property, then it also satisfies the alternative polynomial Daugavet property.

Theorem 3.40 ([20, Theorem 4.2]) If $X$ is a JB*-triple satisfying the alternative Daugavet property, then $X$ also satisfies the alternative polynomial Daugavet property.

In general, a tool coming from numerical radius theory can provide stronger results in the context of (alternative) Daugavet properties. In the setting of polynomials, we may (and we do) define the polynomial numerical index by the following expression:
$$n_P(X) := \inf\{v(P) : P \in \mathcal{P}(X; X),\ \|P\| = 1\}.$$
By using Proposition 3.4, notice that whenever $X$ is a Banach space with $n_P(X) = 1$, then any continuous polynomial $P : X \to X$ satisfies the alternative Daugavet equation.
In particular, in this case, $X$ has the alternative polynomial Daugavet property.

In the context of Banach spaces, a trend has been established during the past 20 years: it is nowadays very common to measure or quantify “how far” a given Banach space $X$ is from satisfying a given property P. Usually, this is done by defining a “numerical function” in such a way that the following holds: $X$ attains the maximum of such a function if
and only if $X$ satisfies the desired property P. In 2003, in the context of the Daugavet property, Miguel Martín defined the Daugavetian index of a Banach space $X$ in [50]. Indeed, using the fact that in infinite-dimensional real or complex Banach spaces the compact operators are noninvertible, he defined the Daugavetian index of an infinite-dimensional Banach space $X$ as follows:
$$\operatorname{daug}(X) = \max\{m \ge 0 : \|\mathrm{Id} + T\| \ge 1 + m\|T\| \text{ for all } T \in \mathcal{K}(X)\}.$$
Let us notice that $0 \le \operatorname{daug}(X) \le 1$ and also that $\operatorname{daug}(X) = 1$ if and only if $X$ satisfies the Daugavet property. Notice also that $\operatorname{daug}(X) = 0$ whenever $X$ is an infinite-dimensional Banach space with a bicontractive projection of finite rank, that is, a finite-rank projection $P$ such that $\|P\| = \|\mathrm{Id} - P\| = 1$. Examples of Banach spaces with this feature are $c_0$, $c$, $\ell_p$ with $1 \le p \le \infty$, $C(K)$-spaces with $K$ non-perfect, and also $L_1(\mu)$- and $L_\infty(\mu)$-spaces when $\mu$ has atoms (see [50, Example 4]). Martín also showed that
$$\operatorname{daug}(X) = \inf\{\omega(T) : T \in \mathcal{K}(X),\ \|T\| = 1\},$$
where, for an operator $T \in \mathcal{L}(X)$, $\omega(T) = \sup \operatorname{Re} V(T)$ (see [50, Proposition 1]). We refer the reader to [50] for more on the function $\operatorname{daug}(X)$, more specifically, to [50, Proposition 5, Example 6] and the stability results in [50, Section 3].

In 2019, Elisa R. Santos defined an analogous polynomial index. The definition is only given for (infinite-dimensional) complex Banach spaces, since the Cauchy inequalities are needed for the results. We denote by $\mathcal{P}_K(X; X)$ the Banach space of all compact polynomials from a Banach space $X$ into itself.

Definition 3.41 ([67, pg. 408]) Let $X$ be an infinite-dimensional complex Banach space. The polynomial Daugavetian index of $X$ is defined as follows:
$$\operatorname{daug}_p(X) = \max\{m \ge 0 : \|\mathrm{Id} + P\| \ge 1 + m\|P\| \text{ for all } P \in \mathcal{P}_K(X; X)\}.$$
The fact that $X$ is a complex Banach space implies that $0 \le \operatorname{daug}_p(X) \le 1$ and also that if $\operatorname{daug}_p(X) = 1$, then the space $X$ has the polynomial Daugavet property.
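Returning for a moment to the linear index, the claim that $\operatorname{daug}(X) = 0$ when $X$ admits a bicontractive finite-rank projection can be checked in one line. The sketch below only uses the definition of $\operatorname{daug}(X)$ and the fact that finite-rank operators are compact; the test operator $T = -P$ is a choice made here and not necessarily the argument given in [50].

```latex
% P: finite-rank projection with \|P\| = \|\mathrm{Id} - P\| = 1.
% Test the defining inequality of daug(X) on the compact operator T := -P.
T := -P \in \mathcal{K}(X), \qquad \|T\| = \|P\| = 1, \qquad
\|\mathrm{Id} + T\| = \|\mathrm{Id} - P\| = 1 .
% Every admissible m \ge 0 must satisfy \|\mathrm{Id} + T\| \ge 1 + m\|T\|, hence
1 \ \ge\ 1 + m
\quad\Longrightarrow\quad m = 0
\quad\Longrightarrow\quad \operatorname{daug}(X) = 0 .
```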
Moreover, by applying the Cauchy inequalities, Santos was able to observe that $\operatorname{daug}_p(X) \le \operatorname{daug}(X)$ for every infinite-dimensional complex Banach space $X$, and as in the linear case, she was able to prove the following result. Recall that $\omega(P) = \sup \operatorname{Re} V(P)$.
Proposition 3.42 ([67, Proposition 2.1]) Let $X$ be an infinite-dimensional complex Banach space. Then, we have the following equalities:
$$\operatorname{daug}_p(X) = \inf\{\omega(Q) : Q \in \mathcal{P}_K(X; X),\ \|Q\| = 1\} = \max\{n \ge 0 : \omega(P) \ge n\|P\| \text{ for all } P \in \mathcal{P}_K(X; X)\}.$$

Santos studied the relationship between the Daugavetian indices of a countable family of complex Banach spaces and the corresponding $c_0$-, $\ell_1$-, and $\ell_\infty$-sums. We summarize her results in the next proposition. This generalizes some results from [50].

Proposition 3.43 ([67, Proposition 2.2 and Proposition 2.3]) Let $(X_\lambda)_{\lambda \in \Lambda}$ be a countable family of infinite-dimensional complex Banach spaces. Then:
(a) $\operatorname{daug}_p\big([\oplus_{\lambda \in \Lambda} X_\lambda]_{c_0}\big) = \inf\{\operatorname{daug}_p(X_\lambda) : \lambda \in \Lambda\}$.
(b) $\operatorname{daug}_p\big([\oplus_{\lambda \in \Lambda} X_\lambda]_{\ell_\infty}\big) = \inf\{\operatorname{daug}_p(X_\lambda) : \lambda \in \Lambda\}$.
(c) $\operatorname{daug}_p\big([\oplus_{\lambda \in \Lambda} X_\lambda]_{\ell_1}\big) \le \inf\{\operatorname{daug}_p(X_\lambda) : \lambda \in \Lambda\}$.

Santos applied the above proposition to compute the polynomial Daugavetian index of vector-valued spaces of continuous functions and of measurable bounded functions.

Proposition 3.44 ([67, Proposition 2.4 and Proposition 2.5]) Let $X$ be an infinite-dimensional complex Banach space.
1. If $K$ is a compact Hausdorff space, then:
$$\operatorname{daug}_p(C(K, X)) = \max\{\operatorname{daug}_p(C(K)),\ \operatorname{daug}_p(X)\}.$$
2. If $(\Omega, \Sigma, \mu)$ is a $\sigma$-finite measure space, then:
$$\operatorname{daug}_p(L_\infty(\mu, X)) = \max\{\operatorname{daug}_p(L_\infty(\mu)),\ \operatorname{daug}_p(X)\}.$$
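As an illustration of how item 1 of Proposition 3.44 can be used, consider a perfect compact Hausdorff space $K$. The sketch below works in the complex case and reads Corollary 3.12 as giving the Daugavet equation for every compact polynomial on $C(K)$ (compact polynomials being weakly compact); this is a consequence drawn here for illustration, not a statement quoted from [67].

```latex
% K perfect compact Hausdorff, X infinite-dimensional complex Banach space.
% Corollary 3.12(a): every weakly compact (in particular, every compact)
% polynomial P on C(K) satisfies \|\mathrm{Id} + P\| = 1 + \|P\|,
% so m = 1 is admissible in the definition of the index:
\operatorname{daug}_p(C(K)) = 1 .
% Proposition 3.44, item 1, together with \operatorname{daug}_p(X) \le 1, gives
\operatorname{daug}_p(C(K, X))
  = \max\{\operatorname{daug}_p(C(K)),\ \operatorname{daug}_p(X)\}
  = \max\{1,\ \operatorname{daug}_p(X)\}
  = 1 ,
% in agreement with Theorem 3.19(a): C(K, X) has the polynomial Daugavet
% property whenever K is perfect, for every such X.
```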
3.2 Homogeneous Polynomial Spaces
We now study the k-order Daugavet (k-Daugavet for short) and the k-order alternative Daugavet (k-alternative Daugavet for short) properties. We will also be working with the following definitions, which are once again taken from
the linear case. In what comes next, let us notice that in item (a), when $k = 1$, we have that $X$ satisfies the Daugavet property. Analogously, in the linear case of item (b), we have that $X$ has the alternative Daugavet property.

Definition 3.45 Let $X$ be a Banach space. We say that $X$ has the:
(a) k-order Daugavet property if all rank-one k-homogeneous polynomials satisfy the Daugavet equation.
(b) k-order alternative Daugavet property if all rank-one k-homogeneous polynomials satisfy the alternative Daugavet equation.

We would like to highlight that there are Banach spaces on which some weakly compact polynomials do not satisfy the alternative Daugavet equation, and yet every weakly compact k-homogeneous polynomial does satisfy it (we refer the reader to [22, Examples 3.4(b)]). This makes the study of the k-order properties a nontrivial task. The results stated for continuous polynomials in Section 3.1 also hold in the context of homogeneous polynomials. More precisely, Theorem 3.11 and Proposition 3.14 give us, as particular cases, the following examples of Banach spaces satisfying the k-Daugavet and k-alternative Daugavet properties.

Corollary 3.46 ([22, Example 3.1])
(a) Let $\Omega$ be a completely regular Hausdorff topological space without isolated points and let $X$ be a real or complex Banach space. The space $C_b(\Omega, X)$ satisfies the k-Daugavet property for every $k \in \mathbb{N}$.
(b) Let $K$ be a compact Hausdorff space. The complex space $C(K)$ satisfies the k-alternative Daugavet property for every $k \in \mathbb{N}$.

The versions of Theorem 3.2, Theorem 3.3, and Proposition 3.4 presented previously in this survey (see Section 3.1) in terms of homogeneous polynomials produce even more examples (we refer the reader once again to [22]). We highlight them in the following result.

Corollary 3.47 Let $X$ be a Banach space and let $k$ be a positive integer.
(a) The following statements are equivalent:
(i) $X$ has the k-Daugavet property.
(ii) For every $P \in \mathcal{P}(^k X)$ with $\|P\| = 1$, every $x_0 \in S_X$, and every $\varepsilon > 0$, there exist $\omega \in \mathbb{T}$ and $y \in S_X$ such that
$$\operatorname{Re} \omega P(y) > 1 - \varepsilon \qquad \text{and} \qquad \|x_0 + \omega y\| > 2 - \varepsilon.$$
(iii) Every weakly compact $P \in \mathcal{P}(^k X; X)$ satisfies the Daugavet equation.
(iv) For every weakly compact $P \in \mathcal{P}(^k X; X)$, we have $\|P\| = \sup \operatorname{Re} V(P)$.
(b) The following statements are equivalent:
(i) $X$ has the k-alternative Daugavet property.
(ii) For every $P \in \mathcal{P}(^k X)$ with $\|P\| = 1$, every $x_0 \in S_X$, and every $\varepsilon > 0$, there exist $\omega \in \mathbb{T}$ and $y \in S_X$ such that
$$\operatorname{Re} \omega P(y) > 1 - \varepsilon \qquad \text{and} \qquad \|x_0 + y\| > 2 - \varepsilon.$$
(iii) For every $P \in \mathcal{P}(^k X)$ with $\|P\| = 1$, every $x_0 \in S_X$, and every $\varepsilon > 0$, there exists $y \in S_X$ such that
$$|P(y)| > 1 - \varepsilon \qquad \text{and} \qquad \|x_0 + y\| > 2 - \varepsilon.$$
(iv) Every weakly compact $P \in \mathcal{P}(^k X; X)$ satisfies the alternative Daugavet equation.
(v) For every weakly compact $P \in \mathcal{P}(^k X; X)$, we have $\|P\| = v(P)$.

The following result tells us that the linear case and the nonlinear case are very different from each other. Indeed, surprisingly, in the complex case, the Daugavet equation and the alternative Daugavet equation for k-homogeneous polynomials with $k$ greater than 1 are equivalent properties.

Theorem 3.48 ([22, Proposition 3.2 and Corollary 3.3]) Let $X$ be a Banach space. Suppose that $k \ge 2$.
(a) If $X$ is a complex Banach space, then the k-Daugavet property and the k-alternative Daugavet property are equivalent.
(b) If $X$ is a real Banach space and $k$ is even, then the k-Daugavet property and the k-alternative Daugavet property are equivalent.

Concerning Theorem 3.48, we have the following observations.

Remark 3.49 In the real case, the k-Daugavet and the k-alternative Daugavet properties are not equivalent for $k$ odd (see [22, Remark 3.5]). In the complex case, the Banach spaces $c_0$ and $\ell_\infty$ have the k-order Daugavet property for every $k \ge 2$. However, in the real case, $c_0$ and $\ell_\infty$ do not have the 2-alternative Daugavet property. More negative examples: the real or
complex Banach spaces $\ell_1^2$ and $\ell_1$; indeed, they do not have the 2-alternative Daugavet property. Moreover, the real space $\ell_\infty^2$ does not have the 2-alternative Daugavet property.

Therefore, it is natural to wonder whether there is a relation between the k-Daugavet properties (respectively, the k-alternative Daugavet properties) for different values of $k$. For the k-alternative Daugavet property, we have the following result.

Proposition 3.50 ([22, Proposition 3.7]) Let $X$ be a real or complex Banach space. Suppose that $k$ is a positive integer. If $X$ has the (k + 1)-alternative Daugavet property, then $X$ has the k-alternative Daugavet property.

The converse of Proposition 3.50 is not true. Indeed, the Banach space $\ell_1$ satisfies the 1-alternative Daugavet property, but it does not satisfy the 2-alternative Daugavet property. On the other hand, the analogous result for the k-Daugavet property changes quite a bit depending on whether the Banach space $X$ is real or complex, as we can see in Proposition 3.51 and Proposition 3.52.

Proposition 3.51 ([22, Corollary 3.8]) Let $X$ be a complex Banach space. Suppose that $k \ge 2$. If $X$ satisfies the (k + 1)-Daugavet property, then $X$ has the k-Daugavet property.

It is worth mentioning that Proposition 3.51 does not hold for $k = 1$, either in the complex or in the real case. Indeed, the complex Banach space $c_0$ satisfies the 2-Daugavet property while it does not satisfy the 1-Daugavet property. Also, the real Banach space $\mathbb{R}$ satisfies the k-Daugavet property for every even $k$, but it does not have the k-Daugavet property for any odd $k$. In the real case, a result similar to the above corollary can be proved if we allow a two-step jump. Indeed, we have the following result from [22].

Proposition 3.52 ([22, Proposition 3.10]) Let $X$ be a real Banach space. Suppose that $k$ is a positive integer. If $X$ has the (k + 2)-Daugavet property, then $X$ has the k-Daugavet property.
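For the real two-dimensional space with the maximum norm, a failure of the 2-alternative Daugavet property of the kind listed in Remark 3.49 can be exhibited by hand. The sketch below is an example constructed here, not taken from [22]: it uses the rank-one 2-homogeneous polynomial $P(x, y) = (x^2 - y^2/2)\,(0, 1)$, for which a direct computation gives $\|P\| = 1$ and $\|\mathrm{Id} \pm P\| = 3/2 < 2 = 1 + \|P\|$. The grid search is only a numerical confirmation (a grid bounds the suprema from below, and here the maxima sit at grid points); all function names are hypothetical.

```python
def scalar_part(x, y):
    """Scalar 2-homogeneous polynomial p(x, y) = x^2 - y^2/2."""
    return x * x - 0.5 * y * y


def rank_one_poly(x, y):
    """Rank-one polynomial P(x, y) = p(x, y) * a with direction a = (0, 1)."""
    return (0.0, scalar_part(x, y))


def sup_norm(f, steps=200):
    """Grid approximation of the sup of the max-norm of f over [-1, 1]^2."""
    best = 0.0
    for i in range(steps + 1):
        x = -1.0 + 2.0 * i / steps
        for j in range(steps + 1):
            y = -1.0 + 2.0 * j / steps
            u, v = f(x, y)
            best = max(best, abs(u), abs(v))
    return best


def identity_plus(sign):
    """Return the map (x, y) -> (x, y) + sign * P(x, y)."""
    def shifted(x, y):
        px, py = rank_one_poly(x, y)
        return (x + sign * px, y + sign * py)
    return shifted


# In the real case the unimodular scalars are just +1 and -1.
norm_p = sup_norm(rank_one_poly)
alt_lhs = max(sup_norm(identity_plus(s)) for s in (1.0, -1.0))

if __name__ == "__main__":
    print("||P|| =", norm_p)
    print("max ||Id +/- P|| =", alt_lhs, "versus 1 + ||P|| =", 1.0 + norm_p)
```

The hand computation behind the numbers: the second coordinate of $(\mathrm{Id} + P)(x, y)$ is $y - y^2/2 + x^2$, and since $y \mapsto y - y^2/2$ is increasing on $[-1, 1]$ with range $[-3/2, 1/2]$, its absolute value never exceeds $3/2$; the case $\mathrm{Id} - P$ is symmetric.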
For many more examples and counterexamples, we refer to [22]. As was the case for the space of polynomials defined on a Banach space, the numerical index of the space of k-homogeneous polynomials from a Banach space into itself is a very useful tool to prove that certain Banach spaces satisfy the k-alternative Daugavet property. Let us be more precise. The k-polynomial numerical index is defined as follows:
S. Dantas et al.
n^(k)(X) := inf{v(P) : P ∈ P(^kX; X), ‖P‖ = 1}.
By Proposition 3.4, if X is a Banach space with n^(k)(X) = 1, then every continuous k-homogeneous polynomial P : X → X satisfies the alternative Daugavet equation. In particular, X has the k-alternative Daugavet property. For the study of polynomial numerical indices, we refer the reader to [27, 42, 43, 46, 47]. We also refer the reader to the recent book on spear operators by Vladimir Kadets, Miguel Martín, Javier Merí, and Antonio Pérez [38], which contains plenty of examples and concepts related to the Daugavet property.
In 2009, Sun Kwang Kim and Han Ju Lee [43] studied the problem, proposed in [39], of when n^(k)(X) = 1 for a finite-dimensional Banach space X, and they obtained a remarkable result. In order to state it, let us introduce some notation. A norm ‖·‖ on ℂ^n is said to be absolute if ‖(z_1, …, z_n)‖ = ‖(|z_1|, …, |z_n|)‖ for every (z_1, …, z_n) in ℂ^n and ‖e_j‖ = 1 for j = 1, …, n, where {e_1, …, e_n} is the canonical basis of ℂ^n. The symbol ℓ_∞^n stands for (ℂ^n, ‖·‖_∞), where ‖(z_1, …, z_n)‖_∞ = max{|z_1|, …, |z_n|}. Observe that if the canonical basis {e_1, …, e_n} is 1-unconditional in (ℂ^n, ‖·‖), then the norm is absolute. The result is the following characterization.
Theorem 3.53 ([43, Theorem 4.6]) Let X be an n-dimensional complex Banach space with an absolute norm and let k be an integer greater than or equal to 2. Then n^(k)(X) = 1 if and only if X is isometric to ℓ_∞^n.
In our setting, since every k-homogeneous polynomial from ℂ^n into itself is compact, Theorem 3.53 combined with Theorem 3.48 gives the following corollary.
Corollary 3.54 Let X be an n-dimensional complex Banach space with an absolute norm and let k be an integer greater than or equal to 2. Then X satisfies the k-alternative Daugavet property (equivalently, the k-Daugavet property) if and only if X is isometric to ℓ_∞^n.
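For the reader's convenience, the link between the polynomial numerical index and the alternative Daugavet equation can be written out as follows. This is a sketch under the assumption that v denotes the usual numerical radius and that Proposition 3.4 records the standard equivalence between v(P) = ‖P‖ and the alternative Daugavet equation; the notation S_X and S_{X^*} for the unit spheres of X and of its dual is assumed here.

```latex
% Numerical radius of a continuous polynomial P : X -> X (standard definition)
\[
  v(P) = \sup\bigl\{\, |x^*(P(x))| \;:\; x \in S_X,\ x^* \in S_{X^*},\ x^*(x) = 1 \,\bigr\}.
\]
% Equivalence with the alternative Daugavet equation
\[
  v(P) = \|P\|
  \quad\Longleftrightarrow\quad
  \max_{|\omega| = 1} \|\mathrm{Id} + \omega P\| = 1 + \|P\| .
\]
% Since v(\lambda P) = |\lambda|\, v(P) and v(P) \le \|P\|, the index equals 1
% exactly when the numerical radius is a norm that coincides with \|\cdot\|:
\[
  n^{(k)}(X) = 1
  \quad\Longleftrightarrow\quad
  v(P) = \|P\| \ \text{ for every } P \in \mathcal{P}({}^k X; X).
\]
```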
Let us point out that Corollary 3.54 shows, in particular, that the linear theory of the Daugavet property (k = 1) and the theory for k-homogeneous polynomials (k ≥ 2) are quite different, since in the linear case the Daugavet property fails for every finite-dimensional Banach space (see Remark 3.9). Since Corollary 3.54 is stated for complex Banach spaces, it is natural to wonder what happens for finite-dimensional real Banach spaces. The answer was given by Han Ju Lee and Miguel Martín
in 2012 (see [46]), partly based on [47, Proposition 4.1]. Since the answer relies on a 2011 paper by Lee, Martín, and Merí [47], we state the two main results of that paper without going into details (the interested reader is referred to [46, 47]); both are given in the context of the 2-numerical radius. The first result we would like to highlight is the following one, which is based on a deep study of extreme and denting points.
Theorem 3.55 ([47, Theorem 4.2]) Let Λ be a non-empty set and let X be a (complex) linear subspace of ℂ^Λ with absolute norm such that n^(2)(X) = 1. We have the following: (a) If X satisfies the Radon-Nikodým property, then Λ is finite and X = ℓ_∞^m for some m ∈ ℕ. (b) If X is an Asplund space, then X = c_0(Λ).
The second result is stated as follows.
Theorem 3.56 ([47, Theorem 4.3]) Let Λ be a non-empty set and let X be a (real) linear subspace of ℝ^Λ with absolute norm such that n^(2)(X) = 1. If X has the Radon-Nikodým property or X is an Asplund space, then X = ℝ.
Lee and Martín showed that, among Banach spaces with 1-unconditional basis, the only ones with polynomial numerical index of order 2 equal to 1 are c_0 and ℓ_∞^m in the complex case and ℝ in the real case.
Theorem 3.57 ([46, Theorem 3.1]) Let X be a Banach space with 1-unconditional basis and n^(2)(X) = 1. (a) If X is a real space, then X = ℝ. (b) If X is a complex space, then either X = c_0 or there exists m ∈ ℕ such that X = ℓ_∞^m.
Thus, we immediately obtain the following consequence for the 2-alternative Daugavet property.
Corollary 3.58 If X is a real finite-dimensional Banach space with 1-unconditional basis, then X has the 2-alternative Daugavet property if and only if X = ℝ.
Once again we get a strong indication that the theory of the polynomial and alternative polynomial Daugavet equations is far richer in the context of complex Banach spaces than in the real ones.
3.3 Holomorphic Function Spaces
In this section, we treat the Daugavet properties in the context of holomorphic functions. We start by defining the Banach spaces we will be working with. Let X be a complex Banach space. The three most usual spaces of holomorphic (i.e., complex Fréchet differentiable) functions defined on the open unit ball B_X are the following. We write H^∞(B_X) (respectively, H^∞(B_X, X)) for the Banach space of complex-valued (respectively, X-valued) functions on B_X which are holomorphic and bounded. We write A_∞(B_X) (respectively, A_∞(B_X, X)) for the Banach space of complex-valued (respectively, X-valued) functions which are continuous and bounded on the closed unit ball B̄_X and holomorphic on the open unit ball. This space obviously embeds isometrically into ℓ_∞(S_X) (respectively, ℓ_∞(S_X, X)). We also write A_u(B_X) (respectively, A_u(B_X, X)) for the closed subspace of A_∞(B_X) (respectively, A_∞(B_X, X)) formed by the functions which admit a (unique) uniformly continuous extension to the closed unit ball B̄_X. It is worth noticing that all of them are Banach spaces when endowed with the supremum norm. In the scalar case we have a richer structure, since the spaces H^∞(B_X), A_∞(B_X), and A_u(B_X) are all uniform Banach algebras. It is well known that the following chain of inclusions holds:
A_u(B_X) ⊆ A_∞(B_X) ⊆ H^∞(B_X).
In general, these two inclusions are strict. However, whenever X is finite-dimensional, we have A_u(B_X) = A_∞(B_X), and in fact both coincide with
A(B_X) = {f ∈ H^∞(B_X) : f has a continuous extension to B̄_X}.
It is a well-known fact that for any complex Banach spaces X, Y and every holomorphic mapping f : B_X → Y with Taylor series Σ_{n=0}^∞ P_n f(0) at 0, if f is a weakly compact mapping, then for each n the n-homogeneous polynomial P_n f(0) satisfies that P_n f(0)(B_X) is a relatively weakly compact subset of Y (in 1976, Richard M. Aron and M. Schottenloher proved this in [4, Proposition 3.4] for compact entire mappings, but their technique clearly also works for weakly compact holomorphic mappings defined on the open ball B_X). Moreover, if f is
additionally uniformly continuous on B_X, then f belongs to the norm closure in A_u(B_X, Y) of the family (S_{m,r})_{m,r∈ℕ} defined, for x in B_X, by
S_{m,r}(x) = Σ_{n=0}^{m} P_n f(0)(((r − 1)/r) x).
Hence, for any complex Banach space, the following result holds true.
Theorem 3.59 Let X be a complex Banach space. Suppose that X satisfies the polynomial Daugavet property. Then every weakly compact Φ in A_u(B_X, X) satisfies the Daugavet equation.
As a consequence of Theorem 3.59, for any complex Banach space X satisfying the hypotheses of Theorems 3.11, 3.15, 3.19, 3.20, 3.21, 3.23, 3.27, 3.30, 3.33, 3.34, Corollaries 3.12, 3.13, 3.17, 3.24, 3.25, 3.26, and Propositions 3.14, 3.16, 3.28, it follows that every weakly compact Φ in A_u(B_X, X) satisfies the Daugavet equation. This application of the polynomial Daugavet property first appeared in the following corollary from [22].
Corollary 3.60 ([22, Corollary 2.6]) Let Ω be a completely regular Hausdorff topological space without isolated points and let X be a complex Banach space. Then every weakly compact Φ in A_u(B_{C_b(Ω,X)}, C_b(Ω, X)) satisfies the Daugavet equation.
With respect to the alternative Daugavet property for complex C(K) spaces, it is shown in [22], as a consequence of the lemma below, that every element of A_∞(B_{C(K)}, C(K)) satisfies the alternative Daugavet equation.
Theorem 3.61 ([22, Theorem 2.8]) If K is a compact Hausdorff space and Φ ∈ A_∞(B_{C(K)}, C(K)), then v(Φ) = ‖Φ‖.
Lemma 3.62 ([22, Lemma 2.9]) Let Ω be a set, let Z be a subspace of ℓ_∞(Ω), and let Λ ⊆ Ω be a norming set for Z (i.e., ‖f‖ = sup{|f(λ)| : λ ∈ Λ} for every f ∈ Z). Then, given a Banach space Y and a function Φ ∈ ℓ_∞(Ω, Y) such that y* ∘ Φ ∈ Z for every y* ∈ Y′, we have
‖Φ‖ = sup{‖Φ(λ)‖_Y : λ ∈ Λ}.
In 2014, Elisa R. Santos (see [66]), by using the Gelfand-Naimark theorem, which implies that if A is a commutative C*-algebra, then it is isometrically
*-isomorphic to C(S(A)), where S(A) is the spectrum of A, and also using the fact that if A is nonatomic, then S(A) has no isolated points, was able to establish the following two propositions.
Proposition 3.63 ([66, Proposition 2.1]) If A is a nonatomic commutative C*-algebra with unit, then every weakly compact Φ in A_u(B_A, A) satisfies the Daugavet equation.
Proposition 3.64 ([66, Proposition 2.2]) If A is a commutative C*-algebra with unit, then every Φ in A_u(B_A, A) satisfies the alternative Daugavet equation.
We conclude this section by presenting some very recent results due to Mingu Jung [36]. These results show that the Daugavet property holds for the usual algebras of holomorphic functions defined on the open unit ball B_X of a Banach space X. By combining the geometry of Banach spaces with the study of the Gleason parts of the maximal ideals (see [3]) of the three Banach algebras mentioned in the first part of this section, Jung proved that the duals of these Banach algebras of bounded holomorphic functions do not have weak-star strongly exposed points. Let us recall that a unit vector x in a Banach space X is strongly exposed if there exists a unit vector x* in its topological dual such that x*(x) = 1 and, given any sequence (x_n)_{n∈ℕ} contained in B_X, if Re x*(x_n) → 1, then x_n converges to x in norm. If X is a dual space and x is strongly exposed by some x* belonging to the predual of X, then x is said to be weak-star strongly exposed. Jung proved the following result.
Theorem 3.65 ([36, Theorem 2.1]) Let X be a Banach space. If A is one of the spaces A_u(B_X), A_∞(B_X) or H^∞(B_X), then its topological dual A′ does not have weak-star strongly exposed points.
Jung then combines the above result with a 1991 result of Jorge Mujica [57], which uses the Ng-Dixmier theorem on the existence of a predual of a Banach space to show the existence of a Banach space G^∞(X) that is a predual of H^∞(B_X). All in all, Mingu Jung proves the following theorem.
Theorem 3.66 ([36, Corollary 2.2]) For any Banach space X, the predual G^∞(X) of H^∞(B_X) does not have strongly exposed points; hence G^∞(X) fails to have the Radon-Nikodým property.
Finally, as an application of the above results, Jung obtains the following theorem.
Theorem 3.67 ([36, Corollary 2.3]) Let X be a Banach space. If A denotes A_u(B_X), A_∞(B_X) or H^∞(B_X), then A satisfies the Daugavet property.
In fact, Jung proves this last theorem for any bounded open balanced convex subset of X. The fact that A_u(B_X) satisfies the Daugavet property for any Banach space X has been obtained independently by Sun Kwang Kim and Han Ju Lee [44]. It is worth mentioning that Jung's results extend two earlier results: one by Wojtaszczyk, who proved in [73, Corollary 3] that H^∞(V) and A(V) have the Daugavet property whenever V is an open and connected subset of ℂ^N with N ∈ ℕ, and one by Werner, who obtained in [71, Theorem 3.3] that the disk algebra A(𝔻) satisfies the Daugavet property.
If A is a uniform algebra and Y is a complex Banach space, then
A^Y = {f ∈ C(K, Y) : y* ∘ f ∈ A for all y* ∈ Y′}.
As usual, if f ∈ A and y ∈ Y, we denote by f ⊗ y the element of C(K, Y) given by (f ⊗ y)(t) = f(t)y for t ∈ K, and also
A ⊗ Y = {f ⊗ y : f ∈ A, y ∈ Y}.
In 2014, the following general result for uniform algebras was proved in [21].
Theorem 3.68 ([21, Theorem 2.7, Corollary 2.8 and Corollary 2.9]) If A is a uniform algebra whose Choquet boundary has no isolated points and Y is a complex Banach space, then any closed subspace B of A^Y such that A ⊗ Y ⊂ B has the polynomial Daugavet property.
From the above theorem, one easily obtains the next theorem, which extends Theorem 3.67 to the polynomial Daugavet property and to the vector-valued case.
Theorem 3.69 Let X and Y be complex Banach spaces, and let A denote H^∞(B_X), A_∞(B_X), or A_u(B_X). Then any closed subspace B of A^Y such that A ⊗ Y ⊂ B has the polynomial Daugavet property. In particular, H^∞(B_X; Y), A_∞(B_X; Y), and A_u(B_X; Y) have the polynomial Daugavet property.
Proof It is straightforward to check that, given a Hausdorff topological space Ω and a dense subset Ω_0 ⊂ Ω, the space Ω has no isolated points if and only if Ω_0 has no isolated points.
Hence, given a uniform algebra A, its Choquet boundary Γ_0(A) has no isolated points if and only if its Shilov boundary Γ(A) has no isolated points. Now it is enough to apply the remark of P. Wojtaszczyk in [73] just before Corollary 3, where it is shown that if a uniform algebra has no nontrivial idempotent elements (see also [48, Lemma 2.5]), then its Shilov boundary does not have isolated points. Thus, its
Choquet boundary has no isolated points either. But, by the Baire theorem and the identity principle, the only functions f in H^∞(B_X), A_∞(B_X), or A_u(B_X) that satisfy f² = f (i.e., f(f − 1) = 0) are f = 0 and f = 1. Therefore, none of these algebras has nontrivial idempotent elements, for any complex Banach space X. As a consequence, the Choquet boundary of each of the uniform algebras H^∞(B_X), A_∞(B_X), and A_u(B_X) has no isolated points. □
Whenever a complex Banach space Y satisfies the polynomial Daugavet property, no hypothesis on the uniform algebra is required to obtain that A^Y has the same property.
Theorem 3.70 ([21, Corollary 2.11]) If A is a uniform algebra and Y is a complex Banach space with the polynomial Daugavet property, then A^Y has the polynomial Daugavet property.
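The idempotent step in the proof of Theorem 3.69 can be spelled out as follows. This is only a sketch of one way to read the appeal to the Baire theorem and the identity principle; the sets Z_0 and Z_1 are notation introduced here.

```latex
% Let f be holomorphic on the open unit ball B_X with f^2 = f pointwise.
\[
  f(x)\bigl(f(x) - 1\bigr) = 0 \quad (x \in B_X)
  \;\Longrightarrow\;
  B_X = Z_0 \cup Z_1,
  \qquad
  Z_0 = f^{-1}(\{0\}), \quad Z_1 = f^{-1}(\{1\}).
\]
% Z_0 and Z_1 are closed in B_X because f is continuous, and B_X is a Baire
% space (an open subset of a complete metric space), so at least one of them
% has non-empty interior.
\[
  \operatorname{int} Z_0 \neq \emptyset \;\Longrightarrow\; f \equiv 0,
  \qquad
  \operatorname{int} Z_1 \neq \emptyset \;\Longrightarrow\; f - 1 \equiv 0,
\]
% by the identity principle applied to f, respectively f - 1, on the
% connected open set B_X.
```

Since f takes values in {0, 1} and B_X is connected, the same conclusion also follows directly from the continuity of f.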
3.4 Bounded Function Spaces
Since the very beginning (see [22]), the general setting in which to define the nonlinear Daugavet equation, the Daugavet property, the alternative Daugavet equation, and the alternative Daugavet property has been ℓ_∞(B_X, X), the space of all bounded mappings from the closed unit ball of a real or complex Banach space X into X. However, up to now, only a few results in this general setting have been established. For spaces of integrable functions, Martín, Merí, and Popov proved the following result in 2010.
Proposition 3.71 ([51, Proposition 3.2.(d)]) Let (Ω, Σ, μ) be an atomless measure space and let X be a real or complex Banach space. If φ : B_{L_1(μ,X)} → 𝕂 is a weakly sequentially continuous bounded function with ‖φ‖ = 1 and x_0 ∈ S_{L_1(μ,X)}, then for every ε > 0 there exist ω ∈ 𝕋 and y ∈ L_1(μ, X) with ‖y‖ ≤ 1 such that
Re ωφ(y) > 1 − ε and ‖x_0 + ωy‖ > 2 − ε.
Equivalently, every weakly sequentially continuous Φ ∈ ℓ_∞(B_{L_1(μ,X)}, L_1(μ, X)) with relatively weakly compact range satisfies the Daugavet equation.
It is worth pointing out that all the definitions related to the Daugavet property have a common denominator: we always add the identity mapping Id to a member of our class of functions and study when the norm of this sum equals the sum of the norms. The situation changed a bit in 2010
where, in the linear setting of the Daugavet equation, it was studied when the identity operator can be replaced by another bounded operator, which was given the name of Daugavet center in [14, 15]. Many results have been obtained since then. However, here we would like to describe the advances made in the nonlinear setting, mainly by Stefan Brach, Enrique A. Sánchez Pérez, and Dirk Werner in 2017 in [18], where they study the equation
‖Φ + Ψ‖ = ‖Φ‖ + ‖Ψ‖,
where Φ and Ψ are bounded mappings on the unit ball of some Banach space X with values in some (possibly different) Banach space Y. They give the following definitions.
Definition 3.72 ([18, Definition 2.1]) Let X, Y be Banach spaces and let Φ ∈ ℓ_∞(B_X, Y). We say that Ψ ∈ ℓ_∞(B_X, Y) satisfies the Φ-Daugavet equation if the norm equality ‖Φ + Ψ‖ = ‖Φ‖ + ‖Ψ‖ holds true.
Definition 3.73 ([18, Definition 2.2]) Let X, Y be Banach spaces and let Φ ∈ ℓ_∞(B_X, Y). We define the following properties: (1) Y has the Φ-Daugavet property with respect to V ⊆ ℓ_∞(B_X, Y) if the Φ-Daugavet equation is satisfied by all Ψ ∈ V. (2) Y has the Φ-Daugavet property if ‖Φ + R‖ = ‖Φ‖ + ‖R‖ for all R ∈ L(X, Y) with one-dimensional range. (3) Y has the Daugavet property if (2) holds for X = Y and Φ = Id.
Most of their results are based on a deep study of what the authors call slice continuity, which was introduced by the last two authors in [64]. They obtain many interesting results, of which we mention only a few. Before doing so, we need to fix some notation. The first notion is that of a slice for a nonlinear scalar-valued function. If p : X → 𝕂 is a bounded function with norm ≤ 1, we write
S(p, ε) = {x ∈ B_X : Re p(x) ≥ 1 − ε}.
Note that, in this case, it may happen that S(p, ε) = ∅. Next we need the definition of slice continuity. The following definition is taken from [64].
Definition 3.74 ([64, Definition 2.4] and [18, Definition 3.1]) Let X, Y be Banach spaces and let Φ ∈ ℓ_∞(B_X, Y).
(a) If y* ∈ Y′ with y* ∘ Φ ≠ 0, we define Φ_{y*} : B_X → 𝕂 by Φ_{y*}(x) = (1/‖y* ∘ Φ‖) y*(Φ(x)).
(b) The natural set of slices defined by Φ is given by 𝒮_Φ = {S(Φ_{y*}, ε) : 0 < ε < 1, y* ∈ Y′, y* ∘ Φ ≠ 0}.
(c) We write 𝒮_Ψ ≤ 𝒮_Φ if, for every S(Ψ_{z*}, ε) ∈ 𝒮_Ψ, there is a slice S(Φ_{y*}, μ) ∈ 𝒮_Φ with S(Φ_{y*}, μ) ⊆ S(Ψ_{z*}, ε). In this instance, we say that Ψ is slice continuous with respect to Φ.
The definition of strong slice continuity is needed in order to state their theorem.
Definition 3.75 ([18, Definition 3.2]) Let X, Y be Banach spaces and let Φ, Ψ ∈ ℓ_∞(B_X, Y). We write 𝒮_Ψ < 𝒮_Φ if, for every slice S(Ψ_{z*}, ε) ∈ 𝒮_Ψ, there is a slice S(Φ_{y*}, μ) ∈ 𝒮_Φ such that
S(ωΦ_{y*}, μ) ⊂ S(ωΨ_{z*}, ε) for all ω ∈ 𝕋.
In this case, we say that Ψ is strongly slice continuous with respect to Φ.
Given Φ ∈ ℓ_∞(B_X, Y), the symbol Y′ ⊗ Φ ⊗ Y stands for
Y′ ⊗ Φ ⊗ Y = {y* ⊗ Φ ⊗ y : y* ∈ Y′, y ∈ Y}.
One of their main results is the following.
Theorem 3.76 ([18, Theorem 3.11]) Let X, Y be Banach spaces. Let Φ, Υ, Ψ ∈ ℓ_∞(B_X, Y) with ‖Φ‖ = ‖Υ‖ = ‖Ψ‖ = 1. Assume that Y has the Φ-Daugavet property with respect to Y′ ⊗ Υ ⊗ Y. Then, if 𝒮_Ψ < 𝒮_Υ and Ψ is weakly compact:
‖Φ + Ψ‖ = 2.
Analogously, they carry out a general study of the alternative Daugavet equation. We describe their results on this topic in less detail.
Definition 3.77 ([18, Definition 2.7]) Let X, Y be Banach spaces and let Φ ∈ ℓ_∞(B_X, Y).
(a) We say that Y has the alternative Φ-Daugavet property with respect to V ⊆ ℓ_∞(B_X, Y) if the alternative Φ-Daugavet equation is satisfied for all Ψ ∈ V.
(b) We say that Y has the alternative Φ-Daugavet property if
max_{|ω|=1} ‖Φ + ωR‖ = ‖Φ‖ + ‖R‖
for all R ∈ L(X, Y) with one-dimensional range.
It is worth noting that Y has the alternative Daugavet property if it has the alternative Id-Daugavet property. Now that the notation is fixed, we look at the interrelation between the Daugavet and the alternative Daugavet equations. Again they show that one can pass from the property holding for rank-one functions to its holding for weakly compact ones. More precisely, we have the following result.
Theorem 3.78 ([18, Theorem 3.15]) Let X, Y be Banach spaces. Let Z be a subspace of ℓ_∞(B_X). Assume that Φ ∈ ℓ_∞(B_X, Y) with ‖Φ‖ = 1. Then the following are equivalent:
(a) For every φ ∈ Z and y ∈ Y, φ ⊗ y satisfies the Φ-Daugavet equation.
(b) Every weakly compact Ψ ∈ ℓ_∞(B_X, Y) such that y* ∘ Ψ ∈ Z for all y* ∈ Y′ satisfies the Φ-Daugavet equation.
In order to state their main result for the alternative Φ-Daugavet property, the notion of weak slice continuity is needed. We refer the interested reader to [18, Definition 4.3] for it.
Theorem 3.79 ([18, Theorem 4.10]) Let X, Y be Banach spaces. Let Φ, Υ, Ψ ∈ ℓ_∞(B_X, Y) with ‖Φ‖ = ‖Υ‖ = ‖Ψ‖ = 1. Assume that Y has the alternative Φ-Daugavet property with respect to Y′ ⊗ Υ ⊗ Y. Then, if 𝒮′_Ψ < 𝒮′_Υ and Ψ is weakly compact, we have that:
max_{|ω|=1} ‖Φ + ωΨ‖ = 2.
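The difference between the Daugavet equation and its alternative version is already visible in the simplest linear situation, with Φ = Id and a rank-one operator on the real two-dimensional space with the maximum norm. The Python sketch below is an illustration added here and is not taken from [18]; the helper names are hypothetical. It uses the standard fact that the operator norm on this space is the largest absolute row sum of the matrix, and that for a linear map the supremum over the unit ball equals the operator norm. In the real case, |ω| = 1 means ω ∈ {1, −1}.

```python
def op_norm_sup(matrix):
    """Operator norm on real l_infty^n: the largest absolute row sum."""
    return max(sum(abs(entry) for entry in row) for row in matrix)


def combine(phi, psi, omega):
    """Return the matrix of phi + omega * psi."""
    return [
        [p + omega * q for p, q in zip(row_p, row_q)]
        for row_p, row_q in zip(phi, psi)
    ]


identity = [[1.0, 0.0], [0.0, 1.0]]
rank_one = [[-1.0, 0.0], [0.0, 0.0]]  # x -> -x_1 * e_1, a norm-one rank-one map

# Right-hand side ||Id|| + ||R|| of both equations.
target = op_norm_sup(identity) + op_norm_sup(rank_one)

# Left-hand side of the Daugavet equation: ||Id + R||.
plain = op_norm_sup(combine(identity, rank_one, 1.0))

# Left-hand side of the alternative equation: max over omega in {1, -1}.
alternative = max(
    op_norm_sup(combine(identity, rank_one, omega)) for omega in (1.0, -1.0)
)

if __name__ == "__main__":
    print(target, plain, alternative)
```

Here the Daugavet equation fails for this particular R, while the alternative equation holds for it because the sign ω = −1 is allowed. Checking a single operator does not, of course, establish either property for the whole space.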
To conclude this survey, we would like to describe some results from 2020 by Elisa R. Santos (see [68]) on polynomial Daugavet centers, which are based on the research by Brach, Sánchez Pérez, and Werner, and on the reference [15]. A polynomial Q : X → Y is said to be a polynomial Daugavet center if the equality ‖Q + P‖ = ‖Q‖ + ‖P‖ is satisfied for all rank-one polynomials P : X → Y (see, for instance, [68]; see also [68, Theorem 2.2] for a characterization).
Theorem 3.80 ([68, Theorem 2.4]) Let X, Y be real Banach spaces. Suppose that Q ∈ P(X, Y) with ‖Q‖ = 1. Then Q is a polynomial Daugavet center if and only if, for every polynomial slice S of B_X, whenever there exist y_1, …, y_n ∈ Y and radii r_1, …, r_n such that Q(S) ⊂ ⋃_{i=1}^{n} B(y_i, r_i), there exists i_0 ∈ {1, …, n} such that r_{i_0} ≥ 1 + ‖y_{i_0}‖, where B(y_i, r_i) stands for the closed ball centered at y_i with radius r_i for all i.
Santos, by using Theorem 3.78, obtained the following corollary.
Corollary 3.81 ([68, Corollary 2.9]) Let X, Y be real Banach spaces. Suppose that Q ∈ P(X, Y) with ‖Q‖ = 1. If Q is a polynomial Daugavet center, then every weakly compact P ∈ P(X, Y) satisfies the Q-Daugavet equation, that is:
‖Q + P‖ = ‖Q‖ + ‖P‖.
A final result gives conditions under which a polynomial Daugavet center can be extended from a subspace to a superspace.
Theorem 3.82 ([68, Theorem 2.12]) Let X, Y and E be real Banach spaces. If Q : X → Y is a polynomial Daugavet center, Y is a subspace of E, and J : Y → E is the natural embedding operator, then E can be equivalently renormed in such a way that the new norm coincides with the original one on Y and J ∘ Q : X → E is also a polynomial Daugavet center.
Related to the paper by Santos [68], we also refer the reader to [61] by Abraham Rueda Zoca. Santos asked the following question: if Q : X → Y is a polynomial Daugavet center and S = S(P, α) is a polynomial slice of B_{X″}, is there u ∈ S ∩ S_{X″} such that
‖Q̂(u) + sign(P(u))y‖ = 1 + ‖y‖
for every y ∈ Y, where Q̂ denotes the Aron-Berner extension of Q? Rueda Zoca answered this question in the negative (see [61, Remark 3.10] for more details). All the theory developed in this section is done in the setting of real Banach spaces (notice, on the other hand, that the results in [18] are also valid for complex Banach spaces). It is natural to ask for the corresponding study in the setting of complex Banach spaces.
Acknowledgements The authors are thankful to Miguel Martín for fruitful conversations during the preparation of this manuscript. They also thank the anonymous referees, who suggested many useful changes which improved the final version of the manuscript.
Funding Information The first, second, and third authors were all supported by project PID2021-122126NB-C33/MCIN/AEI/10.13039/501100011033 (FEDER). The first author was also supported by project PID2019-106529GB-I00/AEI/10.13039/501100011033. The second and third authors were also supported by project PROMETEU/2021/070. The fourth author was supported by project PGC2018097286-B-I00/MCIU/AEI/10.13039/501100011033.
References 1. Abrahamsen, T. A., Lima, V., & Nygaard, O. (2013). Remarks on diameter 2 properties. Journal of Convex Analysis, 20, 439–452 2. Albiac, F., & Kalton, N. J. (2006). Topics in Banach space theory. Graduate Texts in Mathematics (Vol. 233). New York: Springer 3. Aron, R. M., Dimant, V., Lassalle, S., & Maestre, M. (2020). Gleason parts for algebras of holomorphic functions in infinite dimensions. Revista Matemática Complutense, 33, 415–436 4. Aron, R. M., & Schottenloher, M. (1976). Compact holomorphic mappings on Banach spaces and the approximation property. Journal of Functional Analysis, 21(1), 7–30 5. Bauer, F. L. (1962). On the field of values subordinate to a norm. Numerical Mathematics, 4, 103–111 6. Becerra Guerrero, J., & Martín, M. (2005). The Daugavet property of C-algebras, JBC-triples, and of their isometric preduals. Journal of Functional Analysis, 224, 316–337 7. Becerra Guerrero, J., & Rodríguez-Palacios, A. (2008). Banach spaces with the Daugavet property, and the centralizer. Journal of Functional Analysis, 254, 2294–2302
8. Becerra Guerrero, J., López-Pérez, G., & Rueda Zoca, A. (2014). Octahedral norms and convex combination of slices in Banach spaces. Journal of Functional Analysis, 266(4), 2424–2435 9. Becerra Guerrero, J., López-Pérez, G., & Rueda Zoca, A. (2018). Octahedrality in Lipschitz-free Banach spaces. Proceedings of the Royal Society of Edinburgh. Section A. Mathematics, 148(3), 447–460 10. Becerra Guerrero, J., López, G., Peralta, A. M. & Rodríguez-Palacios, A. (2004). Relatively weakly open sets in closed balls of Banach spaces and real JB-triples of finite rank. Mathematische Annalen, 330, 45–58 11. Benyamini, Y., & Lindenstrauss, J. (2000). Geometric Nonlinear Functional Analysis (Vol. 1), American Mathematical Society Colloquium Publications, vol. 48. Providence, RI: American Mathematical Society 12. Bonsall, F. F., & Duncan, J. (1971). Numerical Ranges of operators on normed spaces and of elements of normed algebras. London Mathematical Society Lecture Note Series (Vol. 2). Cambridge: Cambridge University Press 13. Bonsall, F. F., & Duncan, J. (1973). Numerical Ranges II. London Mathematical Society Lecture Note Series (Vol. 10). Cambridge: Cambridge University Press 14. Bosenko, T. V. (2010). Daugavet centers and direct sums of Banach spaces. Central European Journal of Mathematics, 8, 346–356 15. Bosenko, T. V., & Kadets, V. (2010). Daugavet centers. Zhurnal Matematicheskoi Fiziki, Analiza, Geometrii, 6(1), 3–20, 134 16. Botelho, G., & Santos, E. R. (2016). Representable spaces have the polynomial Daugavet property. Archiv der Mathematik (Basel), 107(1), 37–42 17. Boyko, K., Kadets, V., Martín, M., & Werner, D. (2007). Numerical index of Banach spaces and duality. Mathematical Proceedings of the Cambridge Philosophical Society, 142(1), 93–102 18. Brach, S., Sánchez Pérez, E. A., & Werner, D. (2017). The Daugavet equation for bounded vector-valued functions. The Rocky Mountain Journal of Mathematics, 47(6), 1765–1801 19. Bridson, M. R., & Haefliger, A. (1999). 
Metric spaces of non-positive curvature. Grundlehren der mathematischen wissenschaften (Fundamental principles of mathematics sciences, Vol. 319). Berlin: Springer 20. Cabezas, D., Martín, M., & Peralta, A. (2022). The Daugavet equation for polynomials on C -algebras and JB*-triples. Preprint 21. Choi, Y. S., García, D., Kim, S. K., & Maestre, M. (2014). Some geometric properties of disk algebras. Journal of Mathematical Analysis and Applications, 409, 147–157 22. Choi, Y. S., García, D., Maestre, M., & Martín, M. (2007). The Daugavet equation for polynomials. Studia Mathematica, 178, 63–84 23. Choi, Y. S., García, D., Maestre, M., & Martín, M. (2008). The polynomial numerical index for some complex vector-valued function spaces. Quarterly Journal of Mathematics, 59, 455–474 24. Daugavet, I. K. (1963). On a property of completely continuous operators in the space C . Uspekhi Matematicheskikh Nauk, 18, 157–158 (Russian) 25. Dineen, S. (1999). Complex analysis on infinite dimensional spaces. Springer-Verlag, Springer Monographs in Mathematics. London: Springer 26. Duncan, J., McGregor, C. M., Pryce, J. D., & White, A. J. (1970). The numerical index of a normed space. Journal of the London Mathematical Society, 2, 481–488
27. García, D., Grecu, B., Maestre, M., Martín, M., & Merí, J. (2014). Polynomial numerical indices C(K ) and L1(μ). Proceedings of the American Mathematical Society, 142, 1229–1232 28. García, D., & Maestre, M. (2014). Some non-linear geometrical properties of Banach spaces. In J. C. Ferrando, & M. Lopéz–Pellicer (Eds.), Descriptive topology and functional analysis. Springer (Proceedings in mathematics and statistics, PROMS series, Vol. 80, pp. 208–240). Cham: Springer 29. García-Lirola, L. C., Procházka, A., & Rueda Zoca, A. (2018). A characterisation of the Daugavet property in spaces of Lipschitz functions. Journal of Mathematical Analysis and Applications, 464(1), 473–492 30. Godefroy, G., & Kalton, N. (2003). Lipschitz-free Banach spaces. Studia Mathematica, 159(1), 121–141 31. Harmand, P., Werner, D., & Werner, W. (1993). M-ideals in Banach spaces and Banach algebras. Lecture notes in mathematics (Vol. 1547). Berlin: Springer 32. Harris, L. A. (1971). The numerical range of holomorphic functions in Banach spaces. American Journal of Mathematics, 93, 1005–1019 33. Harris, L. A. (1974). The numerical range of functions and best approximation. Proceedings of the Cambridge Philosophical Society, 76, 133–141 34. Ivakhno, Y., Kadets, V., & Werner, D. (2007). The Daugavet property for spaces of Lipschitz functions. Mathematica Scandinavica, 101, 261–279 35. John, K., & Zizler, V. (1978). On rough norms on Banach spaces. Commentationes Mathematicae Universitatis Carolinae, 19, 335–349 36. Jung, M. (2021). Daugavet property of Banach algebras of holomorphic functions and norm-attaining holomorphic functions. Preprint 37. Kadets, V. (2021). The diametral strong diameter 2 property of Banach spaces is the same as the Daugavet property. Proceedings of the American Mathematical Society, 149(6), 2579–2582 38. Kadets, V., Martín, M., Merí, J., & Pérez, A. (2018). Spear operators between Banach spaces. Lecture Notes in Mathematics (Vol. 2205). Springer 39. 
Kadets, V., Martín, M., & Payá, R. (2006) Recent progress and open questions on the numerical index of Banach spaces. RACSAM Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales. Serie A. Matematicas 100(1–2), 155–182 40. Kadets, V., Kalton, N., & Werner, D. (2003). Remarks on rich subspaces of Banach spaces. Studia Mathematica, 159(2), 195–206 41. Kadets, V. M., Shvidkoy, R. V., Sirotkin, G. G., & Werner, D. (2000). Banach spaces with the Daugavet property. Transactions of the American Mathematical Society, 352, 855–873 42. Kim, S., Martín, M., & Merí, J. (2008). On the polynomial numerical index for the real spaces c0, ℓ1 and ℓ 1. Journal of Mathematical Analysis and Applications, 337, 98–106 43. Kim, S. K., & Lee, H. J. (2009). Strong peak points and strongly norm attaining points with applications to denseness and polynomial numerical indices. Journal of Functional Analysis, 257, 931–947 44. Kim, S. K., & Lee, H. J. (2023). Geometric properties of a vector valued version of a function algebra, in preparation 45. Langemets, J., Lima, V., & Rueda Zoca, A. (2017). Octahedral norms in tensor products of Banach spaces. The Quarterly Journal of Mathematics, 68(4), 1247–1260
46. Lee, H. J., & Martín, M. (2012). Polynomial numerical indices of Banach spaces with 1-unconditional bases. Linear Algebra and its Applications, 437(8), 2001–2008
47. Lee, H. J., Martín, M., & Merí, J. (2011). Polynomial numerical indices of Banach spaces with absolute norm. Linear Algebra and its Applications, 435(2), 400–408
48. Lee, H. J., & Tag, H. J. (2022). Diameter two properties in some vector-valued functions spaces. RACSAM Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales. Serie A. Matemáticas, 116(1), 19. Paper No. 17
49. López, G., Martín, M., & Merí, J. (2008). Numerical index of Banach spaces of weakly or weakly-star continuous functions. The Rocky Mountain Journal of Mathematics, 38, 213–223
50. Martín, M. (2003). The Daugavetian index of a Banach space. Taiwanese Journal of Mathematics, 7, 631–640
51. Martín, M., Merí, J., & Popov, M. (2010). The polynomial Daugavet property for atomless L1(μ)-spaces. Archiv der Mathematik (Basel), 94(4), 383–389
52. Martín, M., Merí, J., & Quero, A. (2021). Numerical index and Daugavet property of operator ideals and tensor products. Mediterranean Journal of Mathematics, 18(2), 15. Paper No. 72
53. Martín, M., & Oikhberg, T. (2004). The alternative Daugavet property. Journal of Mathematical Analysis and Applications, 294, 158–180
54. Martín, M., & Rueda Zoca, A. (2022). Daugavet property in projective symmetric tensor products of Banach spaces. Banach Journal of Mathematical Analysis, 16(2), 32. Paper No. 35
55. Martín, M., & Villena, A. (2003). Numerical index and the Daugavet property for L1(μ, X). Proceedings of the Edinburgh Mathematical Society. Series II, 46, 415–420
56. Martín, M. (2008). The alternative Daugavet property of C*-algebras and JB*-triples. Mathematische Nachrichten, 281(3), 376–385
57. Mujica, J. (1991). Linearization of bounded holomorphic mappings on Banach spaces. Transactions of the American Mathematical Society, 324, 867–887
58. Oikhberg, T. (2002). The Daugavet property of C*-algebras and non-commutative Lp-spaces. Positivity, 6, 59–73
59. Rodríguez-Palacios, A. (2004). Numerical ranges of uniformly continuous functions on the unit sphere of a Banach space. Journal of Mathematical Analysis and Applications, 297(2), 472–476
60. Rueda Zoca, A. (2018). Daugavet property and separability in Banach spaces. Banach Journal of Mathematical Analysis, 12(1), 68–84
61. Rueda Zoca, A. (2022). L-orthogonality in Daugavet centers and narrow operators. Journal of Mathematical Analysis and Applications, 505, 12. Paper No. 125447
62. Rueda Zoca, A., Tradacete, P., & Villanueva, I. (2021). Daugavet property in tensor product spaces. Journal of the Institute of Mathematics of Jussieu, 20(4), 1409–1428
63. Ryan, R. A. (2002). Introduction to tensor products in Banach spaces. Springer Monographs in Mathematics. London: Springer
64. Sánchez Pérez, E. A., & Werner, D. (2014). Slice continuity for operators and the Daugavet property for bilinear maps. Functiones et Approximatio Commentarii Mathematici, 50, 251–269
65. Santos, E. R. (2014). An alternative Daugavet property for polynomials. Studia Mathematica, 224(3), 265–276
66. Santos, E. R. (2014). The Daugavet equation for polynomials on C*-algebras. Journal of Mathematical Analysis and Applications, 409(1), 598–606
67. Santos, E. R. (2019). The polynomial Daugavetian index of a complex Banach space. Archiv der Mathematik (Basel), 112(4), 407–416
68. Santos, E. R. (2020). Polynomial Daugavet centers. The Quarterly Journal of Mathematics, 71(4), 1237–1251
69. Shvidkoy, R. V. (2000). Geometric aspects of the Daugavet property. Journal of Functional Analysis, 176, 198–212
70. Weaver, N. (1999). Lipschitz algebras. River Edge, NJ: World Scientific Publishing
71. Werner, D. (1997). The Daugavet equation for operators on function spaces. Journal of Functional Analysis, 143, 117–128
72. Werner, D. (2001). Recent progress on the Daugavet property. Irish Mathematical Society Bulletin, 46, 77–97
73. Wojtaszczyk, P. (1992). Some remarks on the Daugavet equation. Proceedings of the American Mathematical Society, 115, 1047–1052