English Pages 622 Year 2019
Springer Series in Computational Mathematics 56
Wolfgang Hackbusch
Tensor Spaces and Numerical Tensor Calculus Second Edition
Springer Series in Computational Mathematics Volume 56
Series Editors
Randolph E. Bank, Department of Mathematics, University of California, San Diego, La Jolla, CA, USA
Ronald L. Graham, Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
Wolfgang Hackbusch, Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig, Germany
Josef Stoer, Institut für Mathematik, University of Würzburg, Würzburg, Germany
Richard S. Varga, Kent State University, Kent, OH, USA
Harry Yserentant, Institut für Mathematik, Technische Universität Berlin, Berlin, Germany
This is basically a numerical analysis series in which high-level monographs are published. We develop this series aiming at having more publications in it which are closer to applications. There are several volumes in the series which are linked to some mathematical software.
More information about this series at http://www.springer.com/series/797
Wolfgang Hackbusch Max Planck Institute for Mathematics in the Sciences Leipzig, Germany
ISSN 0179-3632
ISSN 2198-3712 (electronic)
Springer Series in Computational Mathematics
ISBN 978-3-030-35553-1
ISBN 978-3-030-35554-8 (eBook)
https://doi.org/10.1007/978-3-030-35554-8

© Springer Nature Switzerland AG 2012, 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Dedicated to my grandchildren Alina and Daniel
Preface
Large-scale problems have always been a challenge for numerical computations. An example is the treatment of fully populated $n \times n$ matrices when $n^2$ is close to or beyond the computer’s memory capacity. Here the technique of hierarchical matrices can reduce the storage and the cost of numerical operations from $O(n^2)$ to almost $O(n)$.

Tensors of order (or spatial dimension) $d$ can be understood as $d$-dimensional generalisations of matrices, i.e., arrays with $d$ discrete or continuous arguments. For large $d \ge 3$, the data size $n^d$ is far beyond any computer capacity. This book concerns the development of compression techniques for such high-dimensional data via suitable data sparse representations.

Just as the hierarchical matrix technique was based on a successful application of the low-rank strategy, in recent years, related approaches have been used to solve high-dimensional tensor-based problems numerically. The results are quite encouraging, at least for data arising from suitably behaved problems, and even some problems of the size $n^d = 1000^{1000}$ have become computable. The methods which can be applied to these multilinear problems are black-box-like. In this respect they are similar to methods used in linear algebra. On the other hand, most of the methods are approximate (computing suitably accurate approximations to quantities of interest) and in this respect they are similar to some approaches in analysis.

The crucial key step is to construct an efficient new tensor representation, thus overcoming the drawbacks of the traditional tensor formats. In 2009 rapid progress could be achieved by introducing the hierarchical format, as well as the TT format, for tensor representation. Under suitable conditions, these formats allow a stable representation and a reduction of the data size from $n^d$ to $O(dn)$. Another recent advancement is the so-called tensorisation technique, which may replace the size $n$ with $O(\log n)$. Altogether, there is hope that problems of the size $n^d$ can be reduced to size $O(d \log(n)) = O(\log(n^d))$; i.e., the problems are reduced to logarithmic size.
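To give a feeling for the orders of magnitude mentioned above, the following small Python sketch compares the three data sizes for n = d = 1000. It is only an illustration: the function names are hypothetical, the constants hidden in the O-notation (for instance rank parameters of a concrete format) are ignored, and the base-2 logarithm is an arbitrary choice.

```python
import math


def full_entries(n: int, d: int) -> int:
    """Entries of a full order-d tensor with n indices per direction: n^d."""
    return n ** d


def linear_entries(n: int, d: int) -> int:
    """Order-of-magnitude count d*n (rank-dependent constants are ignored)."""
    return d * n


def log_entries(n: int, d: int) -> float:
    """Order-of-magnitude count d*log2(n) = log2(n^d)."""
    return d * math.log2(n)


n, d = 1000, 1000
# n^d is far too large to store, but Python can still count its digits exactly.
digits = len(str(full_entries(n, d)))
print(f"n^d has {digits} decimal digits")
print(f"d*n       = {linear_entries(n, d)}")
print(f"d*log2(n) = {log_entries(n, d):.0f}")
```

Since 1000^1000 = 10^3000, the full tensor would have a number of entries with 3001 decimal digits, whereas the counts d·n and d·log2(n) are about one million and ten thousand, respectively.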
It turned out that some of the raw material for the methods described in this book was already known in the literature belonging to other (applied) fields outside of mathematics, such as chemistry. However, the particular language used to describe this material, combined with the fact that the algorithms (although potentially of general interest) were given names relating them only to a particular application, prevented the dissemination of the methods to a wider audience. One of the aims of this monograph is to introduce a more mathematically based treatment of this topic. Through this more abstract approach, the methods can be better understood, independently of the physical or technical details of the application. Accordingly, the book tries to introduce a nomenclature adapted to the mathematical background. The material in this monograph was used as the basis for a course of lectures at the University of Leipzig in the summer semester of 2010. The author’s research at the Max-Planck Institute of Mathematics in the Sciences has been supported by a larger group of researchers. In particular we would like to mention: B. Khoromskij, M. Espig, L. Grasedyck, and H.J. Flad. The help of H.J. Flad was indispensable for bridging the terminological gap between quantum chemistry and mathematics. The research programme has also benefited from the collaboration between the group in Leipzig and a group in Moscow headed by E. Tyrtyshnikov (Russian Academy of Sciences). Further inspiring cooperations have involved R. Schneider (TU Berlin, formerly University of Kiel) and A. Falcó (CEU Cardenal Herrera University, Valencia). Later important contacts included D. Kressner (EPFL, Lausanne), H. Matthies (TU Braunschweig), L. De Lathauwer (University of Leuven), and A. Uschmajew and M. Michałek (both MPI, Leipzig). The author thanks many more colleagues for stimulating discussions. The first edition of this book appeared in 2012.
The present revised edition not only contains corrections of the unavoidable misprints, but also includes new parts ranging from single additional statements to new subchapters. The additional chapters §4.5.4 and §10.1.5 allow $L^\infty$ estimates for truncations with respect to the Hilbert norm $L^2$. New statements about symmetric and antisymmetric tensors are added in §3.5.4, §6.9, and §7.7. The analysis of nonclosed formats has led to new results in §9.5 and §12.5. The cyclic matrix product representation and its site-independent version are discussed in §12.5. The discussion of the ALS method is continued in §§9.6.2.3–9.6.2.7. The analysis of minimal subspaces of topological tensors is refined in §6.6. Other chapters are newly arranged: §4.2, §4.3, §7.4, §8.2, and §8.4. The number of references has increased by 50%. Finally, the author wishes to express his gratitude to the publisher, Springer, for their friendly cooperation.

Leipzig, September 2019
Wolfgang Hackbusch
Contents

Part I Algebraic Tensors

1 Introduction
  1.1 What are Tensors?
    1.1.1 Tensor Product of Vectors
    1.1.2 Tensor Product of Matrices, Kronecker Product
    1.1.3 Tensor Product of Functions
  1.2 Where do Tensors Appear?
    1.2.1 Tensors as Coefficients
    1.2.2 Tensor Decomposition for Inverse Problems
    1.2.3 Tensor Spaces in Functional Analysis
    1.2.4 Large-Sized Tensors in Analysis Applications
    1.2.5 Tensors in Quantum Chemistry
  1.3 Tensor Calculus
  1.4 Preview
    1.4.1 Part I: Algebraic Properties
    1.4.2 Part II: Functional Analysis of Tensors
    1.4.3 Part III: Numerical Treatment
    1.4.4 Topics Outside the Scope of the Monograph
  1.5 Software
  1.6 Comments about the Early History of Tensors
  1.7 Notations

2 Matrix Tools
  2.1 Matrix Notations
  2.2 Matrix Rank
  2.3 Matrix Norms
  2.4 Semidefinite Matrices
  2.5 Matrix Decompositions
    2.5.1 Cholesky Decomposition
    2.5.2 QR Decomposition
    2.5.3 Singular-Value Decomposition
  2.6 Low-Rank Approximation
  2.7 Linear Algebra Procedures
  2.8 Dominant Columns

3 Algebraic Foundations of Tensor Spaces
  3.1 Vector Spaces
    3.1.1 Basic Facts
    3.1.2 Free Vector Space over a Set
    3.1.3 Quotient Vector Space
    3.1.4 (Multi-)Linear Maps, Algebraic Dual, Basis Transformation
  3.2 Tensor Product
    3.2.1 Constructive Definition
    3.2.2 Characteristic Properties
    3.2.3 Isomorphism to Matrices for d = 2
    3.2.4 Tensors of Order d ≥ 3
    3.2.5 Different Types of Isomorphisms
    3.2.6 $\mathcal{R}_r$ and Tensor Rank
  3.3 Linear and Multilinear Mappings
    3.3.1 Definition on the Set of Elementary Tensors
    3.3.2 Embeddings
  3.4 Tensor Spaces with Algebra Structure
  3.5 Symmetric and Antisymmetric Tensor Spaces
    3.5.1 Basic Definitions
    3.5.2 Quantics
    3.5.3 Determinants
    3.5.4 Application of Functionals
Part II Functional Analysis of Tensor Spaces

4 Banach Tensor Spaces
  4.1 Banach Spaces
    4.1.1 Norms
    4.1.2 Basic Facts about Banach Spaces
    4.1.3 Examples
    4.1.4 Operators
    4.1.5 Dual Spaces
    4.1.6 Examples
    4.1.7 Weak Convergence
    4.1.8 Continuous Multilinear Mappings
  4.2 Topological Tensor Spaces
    4.2.1 Notations
    4.2.2 Continuity of the Tensor Product, Crossnorms
    4.2.3 Projective Norm $\|\cdot\|_{\wedge(V,W)}$
    4.2.4 Duals and Injective Norm $\|\cdot\|_{\vee(V,W)}$
    4.2.5 Embedding of $V^*$ into $L(V \otimes W, W)$
    4.2.6 Reasonable Crossnorms
    4.2.7 Reflexivity
    4.2.8 Uniform Crossnorms
    4.2.9 Nuclear and Compact Operators
  4.3 Tensor Spaces of Order d
    4.3.1 Continuity, Crossnorms
    4.3.2 Recursive Definition of the Topological Tensor Space
    4.3.3 Proofs
    4.3.4 Embedding into $L(V, V_j)$ and $L(V, V_\alpha)$
    4.3.5 Intersections of Banach Tensor Spaces
    4.3.6 Tensor Space of Operators
  4.4 Hilbert Spaces
    4.4.1 Scalar Product
    4.4.2 Basic Facts about Hilbert Spaces
    4.4.3 Operators on Hilbert Spaces
    4.4.4 Orthogonal Projections
  4.5 Tensor Products of Hilbert Spaces
    4.5.1 Induced Scalar Product
    4.5.2 Crossnorms
    4.5.3 Tensor Products of $L(V_j, V_j)$
    4.5.4 Gagliardo–Nirenberg Inequality
    4.5.5 Partial Scalar Products
  4.6 Tensor Operations
    4.6.1 Vector Operations
    4.6.2 Matrix-Vector Multiplication
    4.6.3 Matrix-Matrix Operations
    4.6.4 Hadamard Multiplication
    4.6.5 Convolution
    4.6.6 Function of a Matrix
  4.7 Symmetric and Antisymmetric Tensor Spaces
    4.7.1 Hilbert Structure
    4.7.2 Banach Spaces and Dual Spaces

5 General Techniques
  5.1 Vectorisation
    5.1.1 Tensors as Vectors
    5.1.2 Kronecker Tensors
  5.2 Matricisation
    5.2.1 General Case
    5.2.2 Finite-Dimensional Case
    5.2.3 Hilbert Structure
    5.2.4 Matricisation of a Family of Tensors
  5.3 Tensorisation

6 Minimal Subspaces
  6.1 Statement of the Problem, Notations
  6.2 Tensors of Order Two
    6.2.1 Existence of Minimal Subspaces
    6.2.2 Use of the Singular-Value Decomposition
    6.2.3 Minimal Subspaces for a Family of Tensors
  6.3 Minimal Subspaces of Tensors of Higher Order
  6.4 Hierarchies of Minimal Subspaces and $\mathrm{rank}_\alpha$
  6.5 Sequences of Minimal Subspaces
  6.6 Minimal Subspaces of Topological Tensors
    6.6.1 Setting of the Problem
    6.6.2 First Approach
    6.6.3 Second Approach
  6.7 Minimal Subspaces for Intersection Spaces
    6.7.1 Algebraic Tensor Space
    6.7.2 Topological Tensor Space
  6.8 Linear Constraints and Regularity Properties
  6.9 Minimal Subspaces for (Anti-)Symmetric Tensors

Part III Numerical Treatment

7 r-Term Representation
  7.1 Representations in General
    7.1.1 Concept
    7.1.2 Computational and Memory Cost
    7.1.3 Tensor Representation versus Tensor Decomposition
  7.2 Full and Sparse Representation
  7.3 r-Term Representation
  7.4 Tangent Space and Sensitivity
    7.4.1 Tangent Space
    7.4.2 Sensitivity
  7.5 Representation of $V_j$
  7.6 Conversions between Formats
    7.6.1 From Full Representation into r-Term Format
    7.6.2 From r-Term Format into Full Representation
    7.6.3 From r-Term into N-Term Format for r > N
    7.6.4 Sparse-Grid Approach
    7.6.5 From Sparse Format into r-Term Format
  7.7 Representation of (Anti-)Symmetric Tensors
    7.7.1 Sums of Symmetric Rank-1 Tensors
    7.7.2 Indirect Representation
  7.8 Modifications

8 Tensor Subspace Representation
  8.1 The Set $\mathcal{T}_r$
  8.2 Tensor Subspace Formats
    8.2.1 General Frame or Basis
    8.2.2 Transformations
    8.2.3 Tensors in $\mathbb{K}^I$
    8.2.4 Orthonormal Basis
    8.2.5 Summary of the Formats
    8.2.6 Hybrid Format
  8.3 Higher-Order Singular-Value Decomposition (HOSVD)
    8.3.1 Definitions
    8.3.2 Examples
    8.3.3 Computation and Computational Cost
  8.4 Tangent Space and Sensitivity
    8.4.1 Uniqueness
    8.4.2 Tangent Space
    8.4.3 Sensitivity
  8.5 Conversions between Different Formats
    8.5.1 Conversion from Full Representation into Tensor Subspace Format
    8.5.2 Conversion from $\mathcal{R}_r$ to $\mathcal{T}_r$
    8.5.3 Conversion from $\mathcal{T}_r$ to $\mathcal{R}_r$
    8.5.4 A Comparison of Both Representations
    8.5.5 r-Term Format for Large r > N
  8.6 Joining two Tensor Subspace Representation Systems
    8.6.1 Trivial Joining of Frames
    8.6.2 Common Bases

9 r-Term Approximation
  9.1 Approximation of a Tensor
  9.2 Discussion for r = 1
  9.3 Discussion in the Matrix Case d = 2
  9.4 Discussion in the Tensor Case d ≥ 3
    9.4.1 Nonclosedness of $\mathcal{R}_r$
    9.4.2 Border Rank
    9.4.3 Stable and Unstable Sequences
    9.4.4 A Greedy Algorithm
  9.5 General Statements on Nonclosed Formats
    9.5.1 Definitions
    9.5.2 Nonclosed Formats
    9.5.3 Discussion of $\mathcal{F} = \mathcal{R}_r$
    9.5.4 General Case
    9.5.5 On the Strength of Divergence
    9.5.6 Uniform Strength of Divergence
    9.5.7 Extension to Vector Spaces of Larger Dimension
  9.6 Numerical Approaches for the r-Term Approximation
    9.6.1 Use of the Hybrid Format
    9.6.2 Alternating Least-Squares Method
    9.6.3 Stabilised Approximation Problem
    9.6.4 Newton’s Approach
  9.7 Generalisations
  9.8 Analytical Approaches for the r-Term Approximation
    9.8.1 Quadrature
    9.8.2 Approximation by Exponential Sums
    9.8.3 Sparse Grids

10 Tensor Subspace Approximation
  10.1 Truncation to $\mathcal{T}_r$
    10.1.1 HOSVD Projection
    10.1.2 Successive HOSVD Projection
    10.1.3 Examples
    10.1.4 Other Truncations
    10.1.5 $L^\infty$ Estimate of the Truncation Error
  10.2 Best Approximation in the Tensor Subspace Format
    10.2.1 General Setting
    10.2.2 Approximation with Fixed Format
    10.2.3 Properties
  10.3 Alternating Least-Squares Method (ALS)
    10.3.1 Algorithm
    10.3.2 ALS for Different Formats
    10.3.3 Approximation with Fixed Accuracy
  10.4 Analytical Approaches for the Tensor Subspace Approximation
    10.4.1 Linear Interpolation Techniques
    10.4.2 Polynomial Approximation
    10.4.3 Polynomial Interpolation
    10.4.4 Sinc Approximations
  10.5 Simultaneous Approximation
  10.6 Résumé

11 Hierarchical Tensor Representation
  11.1 Introduction
    11.1.1 Hierarchical Structure
    11.1.2 Properties
    11.1.3 Historical Comments
  11.2 Basic Definitions
    11.2.1 Dimension Partition Tree
    11.2.2 Algebraic Characterisation, Hierarchical Subspace Family
    11.2.3 Minimal Subspaces
    11.2.4 Conversions
  11.3 Construction of Bases
    11.3.1 Hierarchical Basis Representation
    11.3.2 Orthonormal Bases
    11.3.3 HOSVD Bases
    11.3.4 Tangent Space and Sensitivity
    11.3.5 Sensitivity
    11.3.6 Conversion from $\mathcal{R}_r$ to $\mathcal{H}_r$ Revisited
  11.4 Approximations in $\mathcal{H}_r$
    11.4.1 Best Approximation in $\mathcal{H}_r$
    11.4.2 HOSVD Truncation to $\mathcal{H}_r$
  11.5 Joining two Hierarchical Tensor Representation Systems
    11.5.1 Setting of the Problem
    11.5.2 Trivial Joining of Frames
    11.5.3 Common Bases

12 Matrix Product Systems
  12.1 Basic TT Representation
  12.2 Function Case
  12.3 TT Format as Hierarchical Format
    12.3.1 Related Subspaces
    12.3.2 From Subspaces to TT Coefficients
    12.3.3 From Hierarchical Format to TT Format
    12.3.4 Construction with Minimal $\rho_j$
    12.3.5 Extended TT Representation
    12.3.6 Properties
    12.3.7 HOSVD Bases and Truncation
  12.4 Conversions
    12.4.1 Conversion from $\mathcal{R}_r$ to $\mathcal{T}_\rho$
    12.4.2 Conversion from $\mathcal{T}_\rho$ to $\mathcal{H}_r$ with a General Tree
    12.4.3 Conversion from $\mathcal{H}_r$ to $\mathcal{T}_\rho$
  12.5 Cyclic Matrix Products and Tensor Network States
    12.5.1 Cyclic Matrix Product Representation
    12.5.2 Site-Independent Representation
    12.5.3 Tensor Network
  12.6 Representation of Symmetric and Antisymmetric Tensors
13
Tensor Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473 13.1 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474 13.1.1 Full Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474 13.1.2 r-Term Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475 13.1.3 Tensor Subspace Representation . . . . . . . . . . . . . . . . . . . . . . . . 475 13.1.4 Hierarchical Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 477 13.2 Entry-wise Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477 13.2.1 r-Term Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478 13.2.2 Tensor Subspace Representation . . . . . . . . . . . . . . . . . . . . . . . . 478 13.2.3 Hierarchical Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
xvi
Contents
13.2.4 Matrix Product Representation . . . . . . . . . . . . . . . . . . . . . . . . . 480 13.3 Scalar Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480 13.3.1 Full Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481 13.3.2 r-Term Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481 13.3.3 Tensor Subspace Representation . . . . . . . . . . . . . . . . . . . . . . . . 482 13.3.4 Hybrid Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484 13.3.5 Hierarchical Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 485 13.3.6 Orthonormalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489 13.4 Change of Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490 13.4.1 Full Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490 13.4.2 Hybrid r-Term Representation . . . . . . . . . . . . . . . . . . . . . . . . . 490 13.4.3 Tensor Subspace Representation . . . . . . . . . . . . . . . . . . . . . . . . 491 13.4.4 Hierarchical Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 13.5 General Binary Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492 13.5.1 r-Term Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493 13.5.2 Tensor Subspace Representation . . . . . . . . . . . . . . . . . . . . . . . . 493 13.5.3 Hierarchical Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 494 13.6 Hadamard Product of Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495 13.7 Convolution of Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496 13.8 Matrix-Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
496 13.9 Matrix-Vector Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497 13.9.1 Identical Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498 13.9.2 Separable Form (13.25a) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498 13.9.3 Elementary Kronecker Tensor (13.25b) . . . . . . . . . . . . . . . . . . 499 13.9.4 Matrix in p-Term Format (13.25c) . . . . . . . . . . . . . . . . . . . . . . 500 13.10Functions of Tensors, Fixed-Point Iterations . . . . . . . . . . . . . . . . . . . . 501 13.11Example: Operations in Quantum Chemistry Applications . . . . . . . . 503 14
Tensorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507 14.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507 14.1.1 Notations, Choice for TD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507 14.1.2 Format Hρtens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509 14.1.3 Operations with Tensorised Vectors . . . . . . . . . . . . . . . . . . . . . 510 14.1.4 Application to Representations by Other Formats . . . . . . . . . 512 14.1.5 Matricisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513 14.1.6 Generalisation to Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514 14.2 Approximation of Grid Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515 14.2.1 Grid Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515 14.2.2 Exponential Sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516 14.2.3 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516 14.2.4 Multiscale Feature and Conclusion . . . . . . . . . . . . . . . . . . . . . 520 14.2.5 Local Grid Refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520 14.3 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521 14.3.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521 14.3.2 Separable Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
Contents
xvii
14.3.3 Tensor Algebra A(`0 ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524 14.3.4 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531 14.4 Fast Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534 14.4.1 FFT for Cn Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534 14.4.2 FFT for Tensorised Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535 14.5 Tensorisation of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537 14.5.1 Isomorphism ΦF n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537 14.5.2 Scalar Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538 14.5.3 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539 14.5.4 Continuous Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 539 15
Multivariate Cross Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541 15.1 Approximation of General Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541 15.1.1 Approximation of Multivariate Functions . . . . . . . . . . . . . . . . 542 15.1.2 Multiparametric Boundary-Value Problem and PDE with Stochastic Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543 15.1.3 Function of a Tensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545 15.2 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546 15.3 Properties in the Matrix Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548 15.4 Case d ≥ 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551 15.4.1 Matricisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551 15.4.2 Nestedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553 15.4.3 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
16
Applications to Elliptic Partial Differential Equations . . . . . . . . . . . . . . 559 16.1 General Discretisation Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559 16.2 Solution of Elliptic Boundary-Value Problems . . . . . . . . . . . . . . . . . . 560 16.2.1 Separable Differential Operator . . . . . . . . . . . . . . . . . . . . . . . . 561 16.2.2 Discretisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561 16.2.3 Solution of the Linear System . . . . . . . . . . . . . . . . . . . . . . . . . . 563 16.2.4 Accuracy Controlled Solution . . . . . . . . . . . . . . . . . . . . . . . . . . 565 16.3 Solution of Elliptic Eigenvalue Problems . . . . . . . . . . . . . . . . . . . . . . . 566 16.3.1 Regularity of Eigensolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . 566 16.3.2 Iterative Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568 16.3.3 Alternative Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569 16.4 On Other Types of PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
17
Miscellaneous Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571 17.1 Minimisation Problems on V . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571 17.1.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571 17.1.2 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572 17.2 Solution of Optimisation Problems Involving Tensor Formats . . . . . . 573 17.2.1 Formulation of the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 574 17.2.2 Reformulation, Derivatives, and Iterative Treatment . . . . . . . 575 17.3 Ordinary Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576
xviii
Contents
17.3.1 Tangent Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576 17.3.2 Dirac–Frenkel Discretisation . . . . . . . . . . . . . . . . . . . . . . . . . . . 576 17.3.3 Tensor Subspace Format Tr . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577 17.3.4 Hierarchical Format Hr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579 17.4 ANOVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581 17.4.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581 17.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582 17.4.3 Combination with Tensor Representations . . . . . . . . . . . . . . . 584 17.4.4 Symmetric Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
About the Author
Wolfgang Hackbusch works in the field of numerical mathematics for partial differential equations and integral equations. He has published a number of monographs, for example on the multi-grid method, the numerical analysis of elliptic partial differential equations, the iterative solution of large systems of equations, and the technique of hierarchical matrices.
List of Symbols and Abbreviations
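Symbols
$[a\ b\ \ldots]$   aggregation of vectors $a, b \in \mathbb{K}^I, \ldots$ into a matrix of size $I \times J$
$[A\ B\ \ldots]$   aggregation of matrices $A \in \mathbb{K}^{I \times J_1}$, $B \in \mathbb{K}^{I \times J_2}, \ldots$ into a matrix of size $I \times (J_1 \cup J_2 \cup \ldots)$
$\lceil \cdot \rceil$   smallest integer $\ge \cdot$
$\lfloor \cdot \rfloor$   largest integer $\le \cdot$
$\langle \cdot, \cdot \rangle$   scalar product; in $\mathbb{K}^I$ usually the Euclidean scalar product; cf. §2.1, §4.4.1
$\langle \cdot, \cdot \rangle_\alpha$   partial scalar product; cf. §4.5.5
$\langle \cdot, \cdot \rangle_H$   scalar product of a (pre-)Hilbert space $H$
$\langle \cdot, \cdot \rangle_{\mathrm{HS}}$   Hilbert-Schmidt scalar product; cf. Definition 4.140
$\langle \cdot, \cdot \rangle_j$   scalar product of the (pre-)Hilbert space $V_j$ from $\mathbf{V} = \bigotimes_{j=1}^d V_j$
$\langle \cdot, \cdot \rangle_F$   Frobenius scalar product of matrices; cf. (2.10)
$\#$   cardinality of a set
$\rightharpoonup$   weak convergence; cf. §4.1.7
$\bullet|_{\tau \times \sigma}$   restriction of a matrix $\bullet$ to the matrix block $\tau \times \sigma$; cf. §1.7
$\bullet^\perp$   orthogonal complement, cf. §4.4.1
$\bullet^{\mathsf{H}}$   Hermitean transpose of a matrix or vector
$\bullet^{\mathsf{T}}$   transpose of a matrix or vector
$\bullet^{-\mathsf{T}}, \bullet^{-\mathsf{H}}$   inverse matrix of $\bullet^{\mathsf{T}}$ or $\bullet^{\mathsf{H}}$, respectively
$\overline{\bullet}$   either complex-conjugate value of a scalar or closure of a set
$\times$   Cartesian product of sets: $A \times B := \{(a, b) : a \in A, b \in B\}$
$\times_{j=1}^d$   d-fold Cartesian product of sets
$\times_j$   j-mode product, cf. Footnote 8 on page 5; not used here
$\star$   convolution; cf. §4.6.5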
$\wedge$   exterior product; cf. §3.5.1
$\odot$   Hadamard product; cf. (4.82)
$\oplus$   direct sum; cf. footnote on page 13
$\otimes^d$   d-fold tensor product; cf. Notation 3.24
$v \otimes w$, $\bigotimes_{j=1}^d v^{(j)}$   tensor product of two or more vectors; cf. §3.2.1
$V \otimes W$, $\bigotimes_{j=1}^d V_j$   tensor space generated by two or more vector spaces; cf. §3.2.1
$V \otimes_a W$, ${}_a\bigotimes_{j=1}^d V_j$   algebraic tensor space; cf. (3.9) and §3.2.4
$V \otimes_{\|\cdot\|} W$, ${}_{\|\cdot\|}\bigotimes_{j=1}^d V_j$   topological tensor space; cf. (3.10); §4.2
$\bigotimes_{j \ne k}$   cf. (3.17b)
$\subset$   the subset relation $A \subset B$ includes the case $A = B$
$\dot{\cup}$   disjoint union
$\sim$   equivalence relation; cf. §3.1.3, §4.1.1
$\bullet \cong \bullet$   isomorphic spaces; cf. §3.2.5
$\bullet \le \bullet$   semi-ordering of matrices; cf. (2.12)
$\|\cdot\|$   norm; cf. §4.1.1
$\|\cdot\|^*$   dual norm; cf. Lemma 4.21
$\|\cdot\|_2$   Euclidean norm of vector or tensor (cf. §2.3 and Example 4.149) or spectral norm of a matrix (cf. (2.11))
$\|\cdot\|_F$   Frobenius norm of matrices; cf. (2.9)
$\|\cdot\|_{\mathrm{HS}}$   Hilbert-Schmidt norm; cf. Definition 4.140
$\|\cdot\|_{\mathrm{SVD},p}$   Schatten norm; cf. (4.17)
$\|\cdot\|_X$   norm of a space $X$
$\|\cdot\|_{X \leftarrow Y}$   associated matrix norm (cf. §2.3) or operator norm (cf. (4.6a))
$\|\cdot\|_1 \lesssim \|\cdot\|_2$   semi-ordering of norms; cf. §4.1.1
$\|\cdot\|_{\wedge(V,W)}, \|\cdot\|_\wedge$   projective norm; cf. §4.2.3.1
$\|\cdot\|_{\vee(V,W)}, \|\cdot\|_\vee$   injective norm; cf. §4.2.4.2
Greek Letters
$\alpha$   often a subset of the set $D$ of directions (cf. (5.3a,b)) or vertex of the tree $T_D$; cf. Definition 11.2
$\alpha^c$   complement $D \setminus \alpha$; cf. (5.3c)
$\alpha_1, \alpha_2$   often sons of a vertex $\alpha \in T_D$; cf. §11.2.1
$\delta_{ij}$   Kronecker delta; cf. (2.1)
$\rho$   tuple of TT ranks; cf. Definition 12.1
$\rho(\cdot)$   spectral radius of a matrix; cf. §4.6.6
$\rho_{\mathrm{xyz}}(\cdot)$   tensor representation by format ‘xyz’; cf. §7.1
$\rho_{\mathrm{frame}}$   general tensor subspace format; cf. (8.8c)
$\rho_{\mathrm{HOSVD}}$   HOSVD tensor subspace format; cf. (8.25)
$\rho_{\mathrm{HT}}$   hierarchical format; cf. (11.24)
$\rho_{\mathrm{HT}}^{\mathrm{HOSVD}}$   hierarchical HOSVD format; cf. Definition 11.37
$\rho_{\mathrm{HT}}^{\mathrm{orth}}$   orthonormal hierarchical format; cf. (11.32)
$\rho_{\mathrm{HT}}^{\mathrm{tens}}$   TT format for tensorised vectors; cf. (14.5a)
$\rho_{\mathrm{hybr}}, \rho_{\mathrm{orth}}^{\mathrm{hybr}}, \rho_{r\text{-term}}^{\mathrm{hybr}}$   hybrid formats; cf. §8.2.6
$\rho_j$   TT rank; cf. (12.1a) and Definition 12.1
$\rho_{\mathrm{orth}}$   orthonormal tensor subspace format; cf. (8.10b)
$\rho_{r\text{-term}}$   r-term format; cf. (7.7a)
$\rho_{\mathrm{sparse}}$   sparse format; cf. (7.5)
$\rho_{\mathrm{TS}}$   general tensor subspace format; cf. (8.5c)
$\rho_{\mathrm{TT}}$   TT format; cf. (12.7)
$\sigma(\cdot)$   spectrum of a matrix; cf. §4.6.6
$\sigma_i$   singular value of the singular-value decomposition; cf. (2.16a), (4.65)
$\Sigma$   diagonal matrix of the singular-value decomposition; cf. (2.16a)
$\varphi, \psi$   often linear mapping or functional; cf. §3.1.4
$\Phi, \Psi$   often linear mapping or operator; cf. §4.1.4
$\Phi'$   dual of $\Phi$; cf. Definition 4.23
$\Phi^*$   adjoint of $\Phi$; cf. Definition 4.136
Latin Letters
$\mathbf{a}$   coefficient tensor, cf. Remark 3.31
$A, B, \ldots, A_1, A_2, \ldots$   often used for linear mapping (from one vector space into another one). This includes matrices.
$\mathbf{A}, \mathbf{B}, \mathbf{C}, \ldots$   tensor products of operators or matrices
$A^{(j)}$   mapping from $L(V_j, W_j)$, j-th component in a Kronecker product
$\mathcal{A}(V)$   tensor algebra generated by $V$; cf. (3.38)
$\mathfrak{A}(V)$   antisymmetric tensor space generated by $V$; cf. Definition 3.74
Arcosh   area [inverse] hyperbolic cosine: $\cosh(\operatorname{Arcosh}(x)) = x$
$b_i^{(j)}, b_i^{(\alpha)}$   basis vectors; cf. §8.2.1.1, (11.18a)
$B, B_j, \mathbf{B}_\alpha$   basis (or frame), $B_j = [b_1^{(j)}, \ldots, b_r^{(j)}]$, cf. §8.2.1.1; in the case of tensor spaces: $\mathbf{B}_\alpha = [b_1^{(\alpha)}, \ldots, b_{r_\alpha}^{(\alpha)}]$, cf. (11.18a)
$c_0(I)$   subset of $\ell^\infty(I)$; cf. (4.4)
$c_{ij}^{(\alpha,\ell)}$   coefficients of the matrix $C^{(\alpha,\ell)}$; cf. (11.20)
$\mathbb{C}$   field of complex numbers
$C(D), C^0(D)$   bounded, continuous functions on $D$; cf. Example 4.10
$\mathcal{C}(d, (\rho_j), (n_j))$   cyclic matrix-product representation; cf. §12.5.1
$\mathcal{C}_{\mathrm{ind}}(d, \rho, n)$   site-independent cyclic matrix-product representation; cf. §12.5.2
$\mathbf{C}_\alpha$   tuple $(C^{(\alpha,\ell)})_{1 \le \ell \le r_\alpha}$ of $C^{(\alpha,\ell)}$ from below; cf. (11.22b)
$C^{(\alpha,\ell)}$   coefficient matrix at vertex $\alpha$ characterising the basis vector $b_\ell^{(\alpha)}$; cf. (11.20)
$C_j, C_\alpha$   contractions; cf. Definition 4.160
$C_N(f, h), C(f, h)$   sinc interpolation; cf. Definition 10.34
$d$   order of a tensor; cf. §1.1.1
$D$   set $\{1, \ldots, d\}$ of directions; cf. (5.3b)
$D(A)$   domain of the operator $A$; cf. page 227
$D_\delta$   analyticity stripe; cf. (10.33)
depth(·)   depth of a tree; cf. (11.6)
det(·)   determinant of a matrix
diag{. . .}   diagonal matrix with entries . . .
dim(·)   dimension of a vector space
$e^{(i)}$   i-th unit vector of $\mathbb{K}^I$ ($i \in I$); cf. (2.2)
$E_N(f, h), E(f, h)$   sinc interpolation error; cf. Definition 10.34
$E_r(\cdot)$   exponential sum; cf. (9.36a)
$\mathcal{E}_\rho$   regularity ellipse; cf. §10.4.2.2
$\mathcal{F}(W, V)$   space of finite rank operators; cf. §4.2.9
$G(\cdot)$   Gram matrix of a set of vectors; cf. (2.13), (11.30)
$H, H_1, H_2, \ldots$   (pre-)Hilbert spaces
$H(D_\delta)$   Banach space from Definition 10.36
$H^{1,p}(D)$   Sobolev space; cf. Example 4.47
$\mathrm{HS}(V, W)$   Hilbert–Schmidt space; cf. Definition 4.140
id   identity mapping
$i, j, k, \ldots$   index variables
$\mathbf{i}, \mathbf{j}, \mathbf{k}$   multi-indices from a product index set $\mathbf{I}$ etc.
$I$   identity matrix or index set
$I, I_{[a,b]}$   interpolation operator; cf. §10.4.3
$I, J, K, I_1, I_2, \ldots, J_1, J_2, \ldots$   often used for index sets
$\mathbf{I}, \mathbf{J}$   index sets defined by products $I_1 \times I_2 \times \ldots$ of index sets
$j$   often index variable for the directions from $\{1, \ldots, d\}$
$\mathbb{K}$   underlying field of a vector space; usually $\mathbb{R}$ or $\mathbb{C}$
$\mathcal{K}(W, V)$   space of compact operators; cf. §4.2.9
$\ell(I)$   vector space $\mathbb{K}^I$; cf. Example 3.1
$\ell_0(I)$   subset of $\ell(I)$; cf. (3.2)
$\ell^p(I)$   Banach space from Example 4.7; $1 \le p \le \infty$
level   level of a vertex of a tree, cf. (11.5)
lim inf   limit inferior, smallest accumulation point
$L$   often depth of a tree, cf. (11.6)
$L$   lower triangular matrix in Cholesky decomposition; cf. §2.5.1
$\mathcal{L}(T)$   set of leaves of the tree $T$; cf. Remark 11.3b
$L(V, W)$   vector space of linear mappings from $V$ into $W$; cf. §3.1.4
$\mathcal{L}(X, Y)$   space of continuous linear mappings from $X$ into $Y$; cf. §4.1.4
$L^p(D)$   Banach space; cf. Example 4.9; $1 \le p \le \infty$
$\mathcal{M}_\alpha, \mathcal{M}_j$   matricisation isomorphisms; cf. Definition 5.3
$n, n_j$   often dimension of a vector space $V, V_j$
$\mathbb{N}$   set $\{1, 2, \ldots\}$ of natural numbers
$\mathbb{N}_0$   set $\mathbb{N} \cup \{0\} = \{0, 1, 2, \ldots\}$
$\mathcal{N}(W, V)$   space of nuclear operators; cf. §4.2.9
$N_{\mathrm{xyz}}$   arithmetical cost of ‘xyz’
$N_{\mathrm{mem}}^{\mathrm{xyz}}$   storage cost of ‘xyz’; cf. (7.8a)
$N_{\mathrm{LSVD}}$   cost of a left-sided singular-value decomposition; cf. Corollary 2.24
$N_{\mathrm{QR}}$   cost of a QR decomposition; cf. Lemma 2.22
$N_{\mathrm{SVD}}$   cost of a singular-value decomposition; cf. Corollary 2.24
$o(\cdot), O(\cdot)$   Landau symbols; cf. (4.12)
$P$   permutation matrix (cf. (2.15)), set of permutations, or projection
$P_{\mathfrak{A}}$   alternator, projection onto $\mathfrak{A}(V)$; cf. (3.40)
$P_{\mathfrak{S}}$   symmetriser, projection onto $\mathfrak{S}(V)$; cf. (3.40)
$\mathbf{P}, \mathbf{P}_j$, etc.   often used for projections in tensor spaces
$P_j^{\mathrm{HOSVD}}, P_{r_j}^{(j),\mathrm{HOSVD}}, \mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}}$   HOSVD projections; cf. Remark 10.1
$\mathcal{P}, \mathcal{P}_p, \mathcal{P}_{\mathbf{p}}$   spaces of polynomials; cf. §10.4.2.1
$Q$   unitary matrix of QR decomposition; cf. (2.14a)
$r$   matrix rank or tensor rank (cf. §2.2), representation rank (cf. Definition 7.3), or bound of ranks
$\mathfrak{r}$   rank $(r_\alpha)_{\alpha \in T_D}$ connected with hierarchical format $\mathcal{H}_{\mathfrak{r}}$; cf. §11.2.2
$r_\alpha$   components of $\mathfrak{r}$ from above
$\mathbf{r}$   rank $(r_1, \ldots, r_d)$ connected with tensor subspace representation in $\mathcal{T}_{\mathbf{r}}$
$r_j$   components of $\mathbf{r}$ from above
$r_{\min}(\mathbf{v})$   tensor subspace rank; cf. Remark 8.4
range(·)   range of a matrix or operator; cf. §2.1
rank(·)   rank of a matrix or tensor; cf. §2.2 and (3.20)
$\underline{\operatorname{rank}}(\cdot)$   border rank; cf. (9.11)
$\operatorname{rank}_\alpha(\cdot), \operatorname{rank}_j(\cdot)$   α-rank and j-rank; cf. Definition 5.7
$r_{\max}$   maximal rank; cf. (2.5) and §3.2.6.5
$R$   upper triangular matrix of QR decomposition; cf. (2.14a)
$\mathbb{R}$   field of real numbers
$\mathbb{R}^J$   set of J-tuples; cf. page 4
$\mathcal{R}_r$   set of matrices or tensors of rank $\le r$; cf. (2.6) and (3.18)
$S(\alpha)$   set of sons of a tree vertex $\alpha$; cf. Definition 11.2
$\mathfrak{S}(V)$   symmetric tensor space generated by $V$; cf. Definition 3.74
$S(k, h)(\cdot)$   see (10.31)
sinc(·)   sinc function: $\sin(\pi x)/(\pi x)$
span{· · ·}   subspace spanned by · · ·
supp(·)   support of a mapping; cf. §3.1.2
$T_\alpha$   subtree of $T_D$; cf. Definition 11.6
$T_D$   dimension partition tree; cf. Definition 11.2
$T_D^{(\ell)}$   set of tree vertices at level $\ell$; cf. (11.7)
$T_D^{\mathrm{TT}}$   linear tree used for the TT format; cf. §12
$\mathcal{T}_{\mathbf{r}}$   set of tensors of representation rank $\mathbf{r}$; cf. Definition 8.1
$\mathcal{T}_\rho$   set of tensors of TT representation rank $\rho$; cf. (12.4)
trace(·)   trace of a matrix or operator; cf. (2.8) and (4.66)
tridiag{a, b, c}   tridiagonal matrix (a: lower diagonal entries, b: diagonal, c: upper diagonal entries)
$U$   vector space, often a subspace
$U, V$   unitary matrices of the singular-value decomposition; cf. (2.16b)
$u_i, v_i$   left and right singular vectors of SVD; cf. (2.18)
$u, v, w$   vectors
$\mathbf{u}, \mathbf{v}, \mathbf{w}$   tensors
$\mathbf{U}$   tensor space, often a subspace of a tensor space
$\mathbf{U}_\alpha$   subspace of the tensor space $\mathbf{V}_\alpha$; cf. (11.8)
$\{\mathbf{U}_\alpha\}_{\alpha \in T_D}$   hierarchical subspace family; cf. Definition 11.8
$U', V', W', \ldots$   algebraic duals of $U, V, W, \ldots$; cf. §3.1.4
$U_j^{I}(\mathbf{v}), U_j^{II}(\mathbf{v}), U_j^{III}(\mathbf{v}), U_j^{IV}(\mathbf{v})$   see Lemma 6.12
$U_j^{\min}(\mathbf{v}), \mathbf{U}_\alpha^{\min}(\mathbf{v})$   minimal subspaces of a tensor $\mathbf{v}$; Def. 6.3, (6.8a), and §6.4
$v_i$   either the i-th component of $v$ or the i-th vector of a set of vectors
$v^{(j)}$   vector of $V_j$ corresponding to the j-th direction of the tensor; cf. §3.2.4
$\mathbf{v}_{[k]}$   tensor belonging to $\mathbf{V}_{[k]}$; cf. (3.17d)
$V_{\mathrm{free}}(S)$   free vector space of a set $S$; cf. §3.1.2
$\mathbf{V}_\alpha$   tensor space $\bigotimes_{j \in \alpha} V_j$; cf. (5.3d)
$\mathbf{V}_{[j]}$   tensor space $\bigotimes_{k \ne j} V_k$; cf. (3.17a) and §5.2
$\mathbf{V}_{\mathrm{cycl}}$   space of cyclic tensors; cf. Definition 12.12
$V, W, \ldots, X, Y, \ldots$   vector spaces
$V', W', \ldots, X', Y', \ldots$   algebraically dual vector spaces; cf. §3.1.4
$\mathbf{V}, \mathbf{W}, \mathbf{X}, \mathbf{Y}$   tensor spaces
$X, Y$   often used for Banach spaces; cf. §4.1
$X^*, Y^*, \ldots$   dual spaces containing the continuous functionals; cf. §4.1.5
$V^{**}$   bidual space; cf. §4.1.5
$\mathbb{Z}$   set of integers
Abbreviations and Algorithms
ALS   alternating least-squares method; cf. §9.6.2
ANOVA   analysis of variance; cf. §17.4
CANDELINC   hybrid format; cf. Footnote 11 on page 271
CPD   canonical polyadic decomposition; cf. Abstract on page 233
DCQR   cf. (2.37)
DFT   density functional theory; cf. §13.11
DFT   discrete Fourier transform; cf. §14.4.1
DMRG   density matrix renormalisation group; cf. §17.2.2
FFT   fast Fourier transform; cf. §14.4.1
HOOI   higher-order orthogonal iteration; cf. §10.3.1
HOSVD   higher-order singular-value decomposition; cf. §8.3
HOSVD(·), HOSVD*(·), HOSVD**(·)   procedures constructing the hierarchical HOSVD format; cf. (11.41a–c)
HOSVD-lw, HOSVD*-lw   levelwise procedures; cf. (11.41a–c), (11.42a,b)
HOSVD-TrSeq   sequential truncation procedure; cf. (11.58)
$\mathrm{HOSVD}_\alpha(\mathbf{v}), \mathrm{HOSVD}_j(\mathbf{v})$   computation of HOSVD data; cf. §8.3.3
JoinBases   joining two bases; cf. (2.32)
JoinONB   joining two orthonormal bases; cf. (2.33)
LOBPCG   locally optimal block preconditioned conjugate gradient; cf. (16.13)
LSVD   left-sided reduced SVD; cf. (2.29)
MALS   modified alternating least-squares method; cf. §17.2.2
MBI   maximum block improvement variant of ALS; cf. page 321
MLSVD   same as HOSVD; cf. Footnote 12 on page 273
MPS   matrix product state, matrix product system; cf. §12
PD   polyadic decomposition; cf. Abstract on page 233
PEPS   projected entangled pairs states; cf. footnote 6 on page 467
PGD   proper generalised decomposition; cf. §17.1.1
PQR   pivotised QR decomposition; cf. (2.27)
QR   QR decomposition; cf. §2.5.2
REDUCE, REDUCE*   truncation procedure; cf. §11.4.2
RQR   reduced QR decomposition; cf. (2.26)
RSVD   reduced SVD; cf. (2.28)
SVD   singular-value decomposition; cf. §2.5.3
Part I
Algebraic Tensors
In Chapter 1, we start with an elementary introduction to the world of tensors (the precise definitions are in Chapter 3) and explain where large-sized tensors appear. This is followed by a description of the Numerical Tensor Calculus. Section 1.4 contains a preview of the material of the three parts of the book. We conclude with some historical remarks and an explanation of the notation. The numerical tools which will be developed for tensors make use of linear algebra methods (e.g., QR and singular-value decomposition). Therefore, these matrix techniques are recalled in Chapter 2. The definition of the algebraic tensor space structure is given in Chapter 3. This includes linear mappings and their tensor product.
Chapter 1
Introduction

In view of all that . . . , the many obstacles we appear to have surmounted, what casts the pall over our victory celebration? It is the curse of dimensionality, a malediction that has plagued the scientist from earliest days. (Bellman [25, p. 94])

1.1 What are Tensors?

For a first rough introduction¹ to tensors, we give a preliminary definition of tensors and the tensor product. The formal definition in the sense of multilinear algebra will be given in Chapter 3. In fact, below we consider three types of tensors which are of particular interest in later applications.

1.1.1 Tensor Product of Vectors

While vectors have entries $v_i$ with one index and matrices have entries $M_{ij}$ with two indices, tensors will carry $d$ indices. The natural number² $d$ defines the order of the tensor. The indices $j \in \{1, \ldots, d\}$ correspond to the ‘j-th direction’, ‘j-th position’, ‘j-th dimension’, ‘j-th axis’, ‘j-th site’, or³ ‘j-th mode’. The names ‘direction’ and ‘dimension’ originate from functions $f(x_1, \ldots, x_d)$ (cf. §1.1.3) for which the variable $x_j$ corresponds to the j-th spatial direction.

For each $j \in \{1, \ldots, d\}$ we fix a (finite) index set $I_j$, e.g., $I_j = \{1, \ldots, n_j\}$. The Cartesian product of these index sets yields $\mathbf{I} := I_1 \times \ldots \times I_d$. The elements of $\mathbf{I}$ are multi-indices or d-tuples $\mathbf{i} = (i_1, \ldots, i_d)$ with $i_j \in I_j$. A tensor $\mathbf{v}$ is defined by its entries $\mathbf{v}_{\mathbf{i}} = \mathbf{v}[\mathbf{i}] = \mathbf{v}[i_1, \ldots, i_d] \in \mathbb{R}$.

¹ Introductory articles are [136] and [153].
² The letter d is chosen because of its interpretation as spatial dimension.
³ The usual meaning of the term ‘mode’ is ‘eigenfunction’.
© Springer Nature Switzerland AG 2019 W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics 56, https://doi.org/10.1007/978-3-030-35554-8_1
We may write $\mathbf{v} := (\mathbf{v}[\mathbf{i}])_{\mathbf{i} \in \mathbf{I}}$. Mathematically, we can express the set of these tensors by $\mathbb{R}^{\mathbf{I}}$. Note that for any index set $J$, $\mathbb{R}^J$ is the vector space $\mathbb{R}^J = \{v = (v_i)_{i \in J} : v_i \in \mathbb{R}\}$ of dimension $\#J$ (the sign # denotes the cardinality of a set).

Notation 1.1. Both notations, $v_i$ with subscript $i$ and $v[i]$ with square brackets, are used in parallel. The notation with square brackets is preferred for multiple indices and in the case of secondary subscripts: $\mathbf{v}[i_1, \ldots, i_d]$ instead of $\mathbf{v}_{i_1, \ldots, i_d}$.

There is an obvious entry-wise definition of the multiplication $\lambda \mathbf{v}$ of a tensor by a real number and of the (commutative) addition $\mathbf{v} + \mathbf{w}$ of two tensors. Therefore the set of tensors has the algebraic structure of a vector space (here over the field $\mathbb{R}$). In scientific areas more remote from mathematics and algebra, a tensor $\mathbf{v}[i_1, \ldots, i_d]$ is considered as a data structure and called a ‘d-way array’.

The relation between the vector spaces $\mathbb{R}^{I_j}$ and $\mathbb{R}^{\mathbf{I}}$ is given by the tensor product. For vectors $v^{(j)} \in \mathbb{R}^{I_j}$ ($1 \le j \le d$) we define the tensor product⁴,⁵

$$\mathbf{v} := v^{(1)} \otimes v^{(2)} \otimes \ldots \otimes v^{(d)} = \bigotimes_{j=1}^{d} v^{(j)} \in \mathbb{R}^{\mathbf{I}}$$

via its entries

$$\mathbf{v}_{\mathbf{i}} = \mathbf{v}[i_1, \ldots, i_d] = v^{(1)}_{i_1} \cdot v^{(2)}_{i_2} \cdot \ldots \cdot v^{(d)}_{i_d} \quad \text{for all } \mathbf{i} \in \mathbf{I}. \tag{1.1}$$

The tensor space is written as tensor product $\bigotimes_{j=1}^{d} \mathbb{R}^{I_j} = \mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \otimes \ldots \otimes \mathbb{R}^{I_d}$ of the vector spaces $\mathbb{R}^{I_j}$ defined by the span⁶

$$\bigotimes_{j=1}^{d} \mathbb{R}^{I_j} = \operatorname{span}\left\{ v^{(1)} \otimes v^{(2)} \otimes \ldots \otimes v^{(d)} : v^{(j)} \in \mathbb{R}^{I_j},\ 1 \le j \le d \right\}. \tag{1.2}$$

The generating products $v^{(1)} \otimes v^{(2)} \otimes \ldots \otimes v^{(d)}$ are called elementary tensors.⁷ Any element $\mathbf{v} \in \bigotimes_{j=1}^{d} \mathbb{R}^{I_j}$ of the tensor space is called a (general) tensor. It is important to notice that, in general, a tensor $\mathbf{v} \in \bigotimes_{j=1}^{d} \mathbb{R}^{I_j}$ is not representable as an elementary tensor but only as a linear combination of such products.

The definition (1.2) implies $\bigotimes_{j=1}^{d} \mathbb{R}^{I_j} \subset \mathbb{R}^{\mathbf{I}}$. Taking all linear combinations of elementary tensors defined by the unit vectors, we easily prove $\bigotimes_{j=1}^{d} \mathbb{R}^{I_j} = \mathbb{R}^{\mathbf{I}}$.

⁴ In some publications the term ‘outer product’ is used instead of ‘tensor product’. This contradicts another definition of the outer product or exterior product satisfying the antisymmetric property $u \wedge v = -(v \wedge u)$ (see page 90).
⁵ The index j indicating the ‘direction’ is written as upper index in brackets, in order to leave space for further indices placed below.
⁶ The span is the set of all finite linear combinations.
⁷ Also the terms ‘decomposable tensors’ and ‘pure tensors’ are used. Further names are ‘dyads’ for d = 2, ‘triads’ for d = 3, etc. (cf. [211, p. 3]). In physics, elementary tensors represent ‘factorisable’ or ‘pure states’. A non-pure quantum state is called ‘entangled’.
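The entry formula (1.1) can be tried out on small index sets. The following is a minimal Python sketch, not taken from the book: the helper names `elementary_tensor` and `add` are hypothetical, tensors are stored as dictionaries keyed by multi-indices, and indices are 0-based, whereas the text uses $I_j = \{1, \ldots, n_j\}$.

```python
from itertools import product
from math import prod


def elementary_tensor(vectors):
    """Entries of v^(1) ⊗ ... ⊗ v^(d) as a dict keyed by the multi-index.

    Entry (i_1, ..., i_d) is v^(1)[i_1] * ... * v^(d)[i_d], cf. (1.1).
    Indices are 0-based here (an assumption of this sketch).
    """
    index_sets = [range(len(v)) for v in vectors]
    return {
        i: prod(v[k] for v, k in zip(vectors, i))
        for i in product(*index_sets)
    }


def add(t, s):
    """Entry-wise sum of two tensors defined over the same index set."""
    return {i: t[i] + s[i] for i in t}


# d = 3 with index sets of sizes 2, 3, 2: the tensor has 2 * 3 * 2 entries.
v1, v2, v3 = [1.0, 2.0], [1.0, 0.0, -1.0], [3.0, 4.0]
t = elementary_tensor([v1, v2, v3])

# A sum of two elementary tensors is a general tensor.
e1, e2 = [1.0, 0.0], [0.0, 1.0]
g = add(elementary_tensor([e1, e1]), elementary_tensor([e2, e2]))
```

Here `g` is the 2 × 2 identity matrix viewed as a tensor of order 2. Its determinant is nonzero, whereas every elementary tensor of order 2 has entries $v_i w_j$ and hence a vanishing determinant, so `g` cannot be written as a single elementary tensor.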
In particular, because $\#\mathbf{I} = \prod_{j=1}^{d} \#I_j$, the dimension of the tensor space is

$$\dim\Big(\bigotimes_{j=1}^{d} \mathbb{R}^{I_j}\Big) = \prod_{j=1}^{d} \dim(\mathbb{R}^{I_j}).$$

Remark 1.2. Let $\#I_j = n$, i.e., $\dim(\mathbb{R}^{I_j}) = n$ for $1 \le j \le d$. Then the dimension of the tensor space is $n^d$. Unless both $n$ and $d$ are rather small numbers, $n^d$ is a huge number. In such cases, $n^d$ may far exceed the computer memory. This fact indicates a practical problem which must be overcome.

The set of matrices with indices in $I_1 \times I_2$ is denoted by $\mathbb{R}^{I_1 \times I_2}$.

Remark 1.3. (a) In the case of d = 2, the matrix space $\mathbb{R}^{I_1 \times I_2}$ and the tensor space $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2}$ are isomorphic: matrix entries $M_{ij}$ correspond to tensor entries $\mathbf{v}_{\mathbf{i}}$ with $\mathbf{i} = (i, j) \in I_1 \times I_2$. The isomorphism $\mathcal{M} : \mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \to \mathbb{R}^{I_1 \times I_2}$ can be defined by the linear extension of the map

$$\mathcal{M}(v \otimes w) = v\, w^{\mathsf{T}}. \tag{1.3}$$

(b) For d = 1 the trivial identity $\mathbb{R}^{\mathbf{I}} = \mathbb{R}^{I_1}$ holds; i.e., vectors are tensors of order 1.
(c) For the degenerate case d = 0, the empty product is defined by the underlying field: $\bigotimes_{j=1}^{0} \mathbb{R}^{I_j} := \mathbb{R}$.
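To get a feeling for the size $n^d$ from Remark 1.2 and for the map (1.3), the following small Python sketch may help. It is only an illustration: the values n = 10, d = 20, the assumption of 8 bytes per entry (double precision), and the helper names `dimension` and `outer_matrix` are choices made here, not part of the text.

```python
from math import prod


def dimension(sizes):
    """Dimension of the tensor space: the product of the cardinalities #I_j."""
    return prod(sizes)


# Remark 1.2 with the illustrative values n = 10 and d = 20.
n, d = 10, 20
entries = dimension([n] * d)      # n**d entries in the full representation
bytes_needed = 8 * entries        # assuming 8 bytes per real number


def outer_matrix(v, w):
    """Matrix v w^T associated with the elementary tensor v ⊗ w, cf. (1.3)."""
    return [[vi * wj for wj in w] for vi in v]


M = outer_matrix([1, 2], [3, 4, 5])   # a 2 x 3 matrix with entries v_i * w_j
```

With these values the full tensor would need 8 · 10²⁰ bytes, which is far beyond the main memory of any current computer, while the d vectors of an elementary tensor need only d · n = 200 numbers.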
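1.1.2 Tensor Product of Matrices, Kronecker Product

Let d pairs of vector spaces $V_j$ and $W_j$ ($1 \le j \le d$) and the corresponding tensor spaces

$$\mathbf{V} = \bigotimes_{j=1}^{d} V_j \quad\text{and}\quad \mathbf{W} = \bigotimes_{j=1}^{d} W_j$$

be given together with linear mappings $A^{(j)} : V_j \to W_j$. The tensor product of the maps $A^{(j)}$, the so-called Kronecker product, is the linear mapping

$$\mathbf{A} := \bigotimes_{j=1}^{d} A^{(j)} : \mathbf{V} \to \mathbf{W} \tag{1.4a}$$

defined by

$$\bigotimes_{j=1}^{d} v^{(j)} \in \mathbf{V} \ \mapsto\ \mathbf{A}\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big) = \bigotimes_{j=1}^{d} A^{(j)} v^{(j)} \in \mathbf{W} \tag{1.4b}$$

for⁸ all $v^{(j)} \in V_j$. Since $\mathbf{V}$ is spanned by elementary tensors (cf. (1.2)), equation (1.4b) defines $\mathbf{A}$ uniquely on $\mathbf{V}$ (more details in §3.3).

⁸ In De Lathauwer et al. [70, Def. 8], the matrix-vector multiplication $\mathbf{A}\mathbf{v}$ by $\mathbf{A} := \bigotimes_{j=1}^{d} A^{(j)}$ is denoted by $\mathbf{v} \times_1 A^{(1)} \times_2 A^{(2)} \cdots \times_d A^{(d)}$, where $\times_j$ is called the j-mode product.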
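Definition (1.4b) can be checked numerically for d = 2. The sketch below is an illustration under stated assumptions: tensors of order 2 are stored as nested lists `T[i1][i2]`, matrices as lists of rows, and the helper names `matvec`, `outer`, and `apply_kronecker` are hypothetical. The function `apply_kronecker` implements the linear extension of (1.4b) entry by entry and is then compared with the factor-wise application on an elementary tensor.

```python
def matvec(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def outer(v, w):
    """Elementary tensor v ⊗ w of order 2, stored as T[i1][i2] = v[i1] * w[i2]."""
    return [[vi * wj for wj in w] for vi in v]


def apply_kronecker(A1, A2, T):
    """Apply A1 ⊗ A2 to an order-2 tensor T.

    result[k1][k2] = sum over i1, i2 of A1[k1][i1] * A2[k2][i2] * T[i1][i2],
    i.e. the linear extension of (1.4b) from elementary tensors.
    """
    n1, n2 = len(T), len(T[0])
    return [
        [
            sum(A1[k1][i1] * A2[k2][i2] * T[i1][i2]
                for i1 in range(n1) for i2 in range(n2))
            for k2 in range(len(A2))
        ]
        for k1 in range(len(A1))
    ]


A1 = [[1, 2], [0, 1], [3, 0]]    # maps R^2 -> R^3
A2 = [[2, 0, 1], [1, 1, 0]]      # maps R^3 -> R^2
v, w = [1, -1], [2, 0, 3]

lhs = apply_kronecker(A1, A2, outer(v, w))       # (A1 ⊗ A2)(v ⊗ w)
rhs = outer(matvec(A1, v), matvec(A2, w))        # (A1 v) ⊗ (A2 w)
```

Both sides agree, as (1.4b) requires. The right-hand side only needs two small matrix-vector products, which is the reason why elementary tensors are cheap to work with.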
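In the case of $V_j = \mathbb{R}^{J_j}$ and $W_j = \mathbb{R}^{I_j}$, the mappings $A^{(j)}$ are matrices from $\mathbb{R}^{I_j \times J_j}$. The Kronecker product $\bigotimes_{j=1}^{d} A^{(j)}$ belongs to the matrix space $\mathbb{R}^{\mathbf{I} \times \mathbf{J}}$ with
\[ \mathbf{I} = I_1 \times \ldots \times I_d \quad\text{and}\quad \mathbf{J} = J_1 \times \ldots \times J_d. \]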
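For d = 2 let $I_1 = \{1, \ldots, n_1\}$, $I_2 = \{1, \ldots, n_2\}$, $J_1 = \{1, \ldots, m_1\}$, and $J_2 = \{1, \ldots, m_2\}$ be ordered index sets and use the lexicographical ordering⁹ of the pairs in $\mathbf{I} = I_1 \times I_2$ and $\mathbf{J} = J_1 \times J_2$. Then the tensor product $A \otimes B \in \mathbb{R}^{\mathbf{I} \times \mathbf{J}}$ of the matrices $A \in \mathbb{R}^{I_1 \times J_1}$ and $B \in \mathbb{R}^{I_2 \times J_2}$ has the block form
\[ A \otimes B = \begin{bmatrix} a_{11} B & \cdots & a_{1 m_1} B \\ \vdots & & \vdots \\ a_{n_1 1} B & \cdots & a_{n_1 m_1} B \end{bmatrix}. \tag{1.5} \]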
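The block form (1.5) and the defining property (1.4b) for d = 2 can be checked numerically. The sketch below assumes that NumPy's `np.kron` uses the same lexicographical ordering as (1.5); the sizes and random matrices are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, m1, n2, m2 = 2, 3, 4, 2          # illustrative sizes: A is n1 x m1, B is n2 x m2
A = rng.standard_normal((n1, m1))
B = rng.standard_normal((n2, m2))

K = np.kron(A, B)                    # Kronecker product of size (n1*n2) x (m1*m2)

# Block (i, j) of K has size n2 x m2 and equals a_ij * B, as in (1.5).
blocks_ok = all(
    np.allclose(K[i * n2:(i + 1) * n2, j * m2:(j + 1) * m2], A[i, j] * B)
    for i in range(n1)
    for j in range(m1)
)

# Property (1.4b) for d = 2: (A ⊗ B)(v ⊗ w) = (A v) ⊗ (B w).
v = rng.standard_normal(m1)
w = rng.standard_normal(m2)
lhs = K @ np.kron(v, w)
rhs = np.kron(A @ v, B @ w)
```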
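Similarly, a vector $\mathbf{v} \in \mathbb{R}^{\mathbf{J}}$ is regarded as a block vector
\[ \mathbf{v} = \begin{bmatrix} v^{(1)} \\ \vdots \\ v^{(m_1)} \end{bmatrix} = \sum_{\mu \in J_1} e^{(\mu)} \otimes v^{(\mu)} \]
with unit vectors $e^{(\mu)} \in \mathbb{R}^{J_1}$ (cf. (2.2)) and the vector blocks $v^{(\mu)} \in \mathbb{R}^{J_2}$.
Exercise 1.4. Let A, B, and $\mathbf{v}$ be as above. Check the following.
(a) The matrix-vector multiplication $(A \otimes B)\mathbf{v} = \sum_{\mu \in J_1} A e^{(\mu)} \otimes B v^{(\mu)}$ requires $\#J_1 = m_1$ multiplications $B v^{(\mu)}$; in total, $2\,\#I_2\,\#J_1\,\#J_2$ operations.
(b) If the vector
\[ \mathbf{w} = \begin{bmatrix} w^{(1)} \\ \vdots \\ w^{(n_1)} \end{bmatrix} = \sum_{\nu \in I_1} e^{(\nu)} \otimes w^{(\nu)} \in \mathbb{R}^{\mathbf{I}} \qquad (e^{(\nu)} \in \mathbb{R}^{I_1},\ w^{(\nu)} \in \mathbb{R}^{I_2}) \]
is similarly organised, the calculation of the scalar product by
\[ \langle (A \otimes B)\mathbf{v}, \mathbf{w} \rangle = \sum_{\nu \in I_1} \sum_{\mu \in J_1} \big\langle A e^{(\mu)} \otimes B v^{(\mu)},\, e^{(\nu)} \otimes w^{(\nu)} \big\rangle = \sum_{\nu \in I_1} \sum_{\mu \in J_1} a_{\nu\mu} \big\langle B v^{(\mu)}, w^{(\nu)} \big\rangle \]
requires $2\,\#I_2\,\#J_1\,\#J_2 + 2\,\#I_1 (\#I_2 + 1)\,\#J_1$ operations.
(c) Use the opposite blocking:
\[ \mathbf{w} = \begin{bmatrix} \hat{w}^{(1)} \\ \vdots \\ \hat{w}^{(n_2)} \end{bmatrix} = \sum_{\nu \in I_2} \hat{w}^{(\nu)} \otimes e^{(\nu)} \in \mathbb{R}^{\mathbf{I}}, \quad\text{where } \hat{w}^{(\nu)} \in \mathbb{R}^{I_1},\ e^{(\nu)} \in \mathbb{R}^{I_2}. \]
Show that
\[ \langle (A \otimes B)\mathbf{v}, \mathbf{w} \rangle = \sum_{\nu \in I_2} \sum_{\mu \in J_1} \big( A^{\mathsf T} \hat{w}^{(\nu)} \big)_{\mu} \big( B v^{(\mu)} \big)_{\nu}. \]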
What is the corresponding cost?
The blockwise Kronecker products of two block matrices yield the Khatri–Rao product. Another related operation is the Tracy–Singh product.
9 This is the ordering $(1, 1), (1, 2), \ldots, (1, n_2), (2, 1), \ldots, (2, n_2), \ldots, (n_1, n_2)$. If another ordering or no ordering is defined, definition (1.5) is incorrect or does not make sense.
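Part (a) of Exercise 1.4 suggests evaluating $(A \otimes B)\mathbf{v}$ blockwise instead of assembling the Kronecker matrix. A minimal sketch of this idea follows; the helper name `kron_matvec` is hypothetical, and the row-major reshape is assumed to match the lexicographical ordering of footnote 9.

```python
import numpy as np

def kron_matvec(A, B, v):
    """Return (A ⊗ B) v without assembling A ⊗ B (ordering as in (1.5))."""
    m1, m2 = A.shape[1], B.shape[1]
    V = v.reshape(m1, m2)        # row mu holds the block v^(mu)
    BV = V @ B.T                 # row mu holds B v^(mu): m1 products with B
    return (A @ BV).reshape(-1)  # block nu is sum_mu a_{nu,mu} B v^(mu)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 2))
v = rng.standard_normal(4 * 2)

fast = kron_matvec(A, B, v)
reference = np.kron(A, B) @ v    # direct evaluation, for comparison only
```

The blockwise version never stores the $(n_1 n_2) \times (m_1 m_2)$ matrix, which is the point of the exercise.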
1.1.3 Tensor Product of Functions
Now we redefine $I_j \subset \mathbb{R}$ as an interval and consider infinite-dimensional vector spaces of functions such as $V_j = C(I_j)$ or $V_j = L^2(I_j)$. $C(I_j)$ contains the continuous functions on $I_j$, while $L^2(I_j)$ is the space of the measurable and square-integrable functions on $I_j$. The tensor product of univariate functions $f_j(x_j)$ is the d-variate function¹⁰
\[ f := \bigotimes_{j=1}^{d} f_j \quad\text{with}\quad f(x_1, \ldots, x_d) = \prod_{j=1}^{d} f_j(x_j) \qquad (x_j \in I_j,\ 1 \le j \le d). \tag{1.6} \]
The product belongs to
\[ \mathbf{V} = \bigotimes_{j=1}^{d} V_j, \quad\text{where } \mathbf{V} \subset C(\mathbf{I}) \text{ or } \mathbf{V} \subset L^2(\mathbf{I}), \text{ respectively,} \]
for $V_j = C(I_j)$ or $V_j = L^2(I_j)$ (details in §4 and §4.4). In the infinite-dimensional case, the definition (1.2) must be modified if one wants to obtain a complete (Banach or Hilbert) space. The span of the elementary tensors must be closed with respect to a suitable norm (here the norm of $C(\mathbf{I})$ or $L^2(\mathbf{I})$):
\[ \bigotimes_{j=1}^{d} V_j = \overline{\operatorname{span}}\,\{ v^1 \otimes v^2 \otimes \ldots \otimes v^d : v^j \in V_j,\ 1 \le j \le d \}. \]
The tensor structure of functions is often termed separation of the variables. This means that a multivariate function f can either be written as an elementary tensor product $\bigotimes_{j=1}^{d} f_j$ as in (1.6) or as a sum (or series) of such products.
A particular example of a multivariate function is the polynomial
\[ P(x_1, \ldots, x_d) = \sum_{\mathbf{i}} a_{\mathbf{i}}\, \mathbf{x}^{\mathbf{i}}, \tag{1.7} \]
where each monomial $\mathbf{x}^{\mathbf{i}} := \prod_{j=1}^{d} (x_j)^{i_j}$ is an elementary product.
The definitions in §§1.1.1–3 may lead to the impression that there are different tensor products. This is only partially true. The cases of §§1.1.1–2 follow the same concept. In Chapter 3, the algebraic tensor product $\mathbf{V} = {}_a\!\bigotimes_{j=1}^{d} V_j$ of general vector spaces $V_j$ ($1 \le j \le d$) will be defined. Choosing $V_j = \mathbb{R}^{I_j}$, we obtain tensors as in §1.1.1, while for matrix spaces $V_j = \mathbb{R}^{I_j \times J_j}$ the tensor product coincides with the Kronecker product. The infinite-dimensional case of §1.1.3 is different since topological tensor spaces require a closure with respect to some norm (see Chapter 4).
10 According to Notation 1.1 we might write $f[x_1, x_2, \ldots, x_d]$ instead of $f(x_1, x_2, \ldots, x_d)$. In the sequel we use the usual notation of the argument list with round brackets.
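Sampling a separable function (1.6) on a product grid yields an elementary tensor of the discrete kind from §1.1.1. The sketch below illustrates this; the univariate functions, the grids, and the helper name are arbitrary assumptions chosen for the example.

```python
import numpy as np

def elementary_tensor_from_functions(funcs, grids):
    """Sample f(x_1,...,x_d) = prod_j f_j(x_j) on a product grid.

    Only the d univariate sample vectors are evaluated; the d-way array is
    then formed as their tensor (outer) product.
    """
    samples = [f(g) for f, g in zip(funcs, grids)]
    result = samples[0]
    for s in samples[1:]:
        result = np.multiply.outer(result, s)
    return result

grids = [np.linspace(0.0, 1.0, 4), np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 6)]
funcs = [np.sin, np.cos, np.exp]          # illustrative choices of f_1, f_2, f_3
F = elementary_tensor_from_functions(funcs, grids)

# Direct evaluation of the trivariate function on the full grid, for comparison.
X1, X2, X3 = np.meshgrid(*grids, indexing="ij")
F_direct = np.sin(X1) * np.cos(X2) * np.exp(X3)
```

The separable form needs 4 + 5 + 6 function values, whereas the full grid has 4 · 5 · 6 entries.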
1.2 Where do Tensors Appear?
At first sight, tensors of order d ≥ 3 do not seem to be used very often. Vectors (the particular case d = 1) appear almost everywhere. Since matrices (case d = 2) correspond to linear mappings, they are also omnipresent. The theory of vectors and matrices has led to the field of linear algebra. However, there are no standard constructions in linear algebra which lead to tensors of order d ≥ 3. Instead, tensors are studied in the field of multilinear algebra.
1.2.1 Tensors as Coefficients
The first purpose of indexed quantities is a simplification of notation. For instance, the description of the polynomial (1.7) in, say, d = 3 variables is easily readable if coefficients $a_{ijk}$ with three indices are introduced. In §1.6 we shall mention such an approach already used by Cayley in 1845.
Certain quantities in the partial differential equations of elasticity or in Maxwell’s equations are called tensors. However, these tensors are usually of order two and therefore the term ‘matrix’ would be more appropriate. Moreover, in physics the term ‘tensor’ is often used with the meaning ‘tensor-valued function’. A true tensor c of order 4 is involved in Hooke’s law for three dimensions, where the stress ‘tensor’ σ and the strain ‘tensor’ ε are connected via $\sigma_{ij} = -\sum_{k\ell} c_{ijk\ell}\, \varepsilon_{k\ell}$ (cf. Cauchy [52, pages 293–319]).
In differential geometry, tensors are widely used for coordinate transformations. Typically, one distinguishes covariant and contravariant tensors and those of mixed type. The indices of the coefficients are placed either in lower position (covariant case) or in upper position (contravariant). For instance, $a_{ij}{}^{k}$ is a mixed tensor with two covariant and one contravariant component. For coordinate systems in $\mathbb{R}^n$, all indices vary in {1, . . . , n}. The notational advantage of the lower and upper indices is the implicit Einstein summation rule: expressions containing a certain index in both positions are to be summed over this index.
We give an example (cf. Kreyszig [201]). Let a smooth two-dimensional manifold be described by the function $x(u^1, u^2)$. First and second derivatives with respect to these coordinates are denoted by $x_{u^k}$ and $x_{u^i,u^j}$. Together with the normal vector n, the Gaussian formula for the second derivatives is
\[ x_{u^i,u^j} = \Gamma_{ij}{}^{k}\, x_{u^k} + a_{ij}\, n \qquad \text{(apply summation over } k\text{)}, \]
where $\Gamma_{ij}{}^{k}$ are the Christoffel symbols¹¹ of the second kind (cf. Christoffel [59], 1869).
11 The notation $\Gamma_{ij}{}^{k}$ is not used in Christoffel’s original paper [59].
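The Einstein summation rule has a direct counterpart in NumPy's `einsum`, where a repeated index letter is summed. The following sketch uses made-up numerical values for the Christoffel symbols and tangent vectors; only the contraction pattern is meant to mirror the Gaussian formula.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2                                    # two surface coordinates u^1, u^2
Gamma = rng.standard_normal((n, n, n))   # Gamma[i, j, k] plays the role of Γ_ij^k (made-up values)
x_u = rng.standard_normal((n, 3))        # x_u[k] stands for the tangent vector x_{u^k} in R^3

# The index k occurs in upper and lower position, hence it is summed.
contracted = np.einsum("ijk,kc->ijc", Gamma, x_u)

# The same contraction written with explicit loops.
explicit = np.zeros((n, n, 3))
for i in range(n):
    for j in range(n):
        for k in range(n):
            explicit[i, j] += Gamma[i, j, k] * x_u[k]

# Dual pairing v'(v) = v'_i v^i of a covariant and a contravariant vector.
v_dual = np.array([1.0, -2.0])
v = np.array([3.0, 0.5])
pairing = np.einsum("i,i->", v_dual, v)
```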
The algebraic explanation of co- and contravariant tensors is as follows. The dual space to $V := \mathbb{R}^n$ is denoted by $V'$. Although $V'$ is isomorphic to V, it is considered as a different vector space. In particular, the basis transformations are different (cf. Lemmata 3.8 and 3.9). Mixed tensors are elements of $\mathbf{V} = \bigotimes_{j=1}^{d} V_j$, where $V_j$ is either V (contravariant component) or $V'$ (covariant component). The summation rule defines the dual form $v'(v)$ of $v' \in V'$ and $v \in V$.
1.2.2 Tensor Decomposition for Inverse Problems
In many fields (psychometrics, linguistics, chemometrics,¹² telecommunication, biomedical applications, information extraction,¹³ computer vision,¹⁴ etc.) matrix-valued data appear. $M \in \mathbb{R}^{n \times m}$ may correspond to m measurements of different properties j, while i is associated to n different input data. For instance, in problems from chemometrics the input may be an excitation spectrum, while the output is the emission spectrum. Assuming linear behaviour, we obtain for one substance a matrix $a b^{\mathsf T}$ of rank one. In this case the inverse problem is trivial: the data $a b^{\mathsf T}$ allow us to recover the vectors a and b up to a constant factor. Having a mixture of r substances, we obtain a matrix
\[ M = \sum_{\nu=1}^{r} c_\nu\, a_\nu b_\nu^{\mathsf T} \qquad (a_\nu \in \mathbb{R}^n,\ b_\nu \in \mathbb{R}^m), \]
where $c_\nu \in \mathbb{R}$ is the concentration of substance ν. The componentwise version of the latter equation is
\[ M_{ij} = \sum_{\nu=1}^{r} c_\nu\, a_{\nu i}\, b_{\nu j}. \]
With $A = [c_1 a_1\ c_2 a_2\ \ldots\ c_r a_r] \in \mathbb{R}^{n \times r}$ and $B = [b_1\ b_2\ \ldots\ b_r] \in \mathbb{R}^{m \times r}$, we may write $M = A B^{\mathsf T}$.
Now the inverse problem is the task to recover the factors A and B. This, however, is impossible since $A' = A\,T$ and $B' = B\,T^{-\mathsf T}$ satisfy $M = A' B'^{\mathsf T}$ for any regular matrix $T \in \mathbb{R}^{r \times r}$.
Tensors of order three come into play when we repeat the experiments with varying concentrations $c_{\nu k}$ (concentration of substance ν in the k-th experiment). The resulting data are
\[ M_{ijk} = \sum_{\nu=1}^{r} c_{\nu k}\, a_{\nu i}\, b_{\nu j}. \]
12 See, for instance, Smilde–Bro–Geladi [264] and De Lathauwer–De Moor–Vandewalle [71].
13 See, for instance, Lu–Plataniotis–Venetsanopoulos [218].
14 See, for instance, Wang–Ahuja [299].
By the definition (1.1), we can rewrite the latter equation as¹⁵
\[ \mathbf{M} = \sum_{\nu=1}^{r} a_\nu \otimes b_\nu \otimes c_\nu. \tag{1.8} \]
Under certain conditions, it is possible to recover the vectors $a_\nu \in \mathbb{R}^n$, $b_\nu \in \mathbb{R}^m$, $c_\nu \in \mathbb{R}^r$ from the data $\mathbf{M} \in \mathbb{R}^{n \times m \times r}$ (up to scaling factors; cf. Remark 7.4b). In these application fields, the above ‘inverse problem’ is called a ‘factor analysis’ or ‘component analysis’ (cf. [158], [71]).
These techniques have been developed in the second part of the last century: Cattell [51] (1944), Tucker [284] (1966), Harshman [158] (1970), Appellof–Davidson [8] (1981), Henrion [160] (1994), De Lathauwer–De Moor–Vandewalle [71] (2000), Comon [61] (2002), Smilde–Bro–Geladi [264] (2004), Kroonenberg [202] (2008), and many more (see also the review by Kolda–Bader [197]).
In this monograph, we shall not study these inverse problems. In §7.1.3, the difference between tensor representations and tensor decompositions will be discussed. Our emphasis lies on the tensor representation.
We remark that the tensors considered above cannot really be large-sized as long as all entries $M_{ijk}$ can be stored.
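Both observations of this section can be reproduced numerically: the matrix factorisation $M = AB^{\mathsf T}$ is not unique, and the third-order data are a sum of r elementary tensors as in (1.8). The sketch below uses random made-up spectra and concentrations; the number of experiments `K` and all variable names are assumptions, and the transformed second factor is taken as $B T^{-\mathsf T}$ so that the dimensions match $B \in \mathbb{R}^{m \times r}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r, K = 5, 4, 3, 6                     # K = number of experiments (illustrative)
a = rng.standard_normal((n, r))             # column nu is a_nu
b = rng.standard_normal((m, r))             # column nu is b_nu

# Matrix case: M = A B^T with A = [c_1 a_1 ... c_r a_r], B = [b_1 ... b_r].
c = rng.uniform(0.5, 2.0, size=r)
A = a * c
B = b
M = A @ B.T

# Any regular T gives another factorisation; here T is unit upper triangular.
T = np.eye(r) + np.triu(rng.standard_normal((r, r)), k=1)
A2 = A @ T
B2 = B @ np.linalg.inv(T).T
M2 = A2 @ B2.T

# Tensor case: M3[i, j, k] = sum_nu C[k, nu] * a[i, nu] * b[j, nu].
C = rng.uniform(0.5, 2.0, size=(K, r))      # C[k, nu] = concentration c_{nu k}
M3 = np.einsum("in,jn,kn->ijk", a, b, C)

# The same tensor as a sum of r elementary tensors a_nu ⊗ b_nu ⊗ c_nu.
M3_sum = sum(
    np.multiply.outer(np.multiply.outer(a[:, nu], b[:, nu]), C[:, nu])
    for nu in range(r)
)
```

The sketch only builds the data; recovering the factors from `M3` is the inverse problem that the text sets aside.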
1.2.3 Tensor Spaces in Functional Analysis
The analysis of topological tensor spaces was started by Schatten [254] (1950) and Grothendieck [129]. Chapter 4 introduces parts of their concepts. However, most of the applications in functional analysis concern tensor products X = V ⊗ W of two Banach spaces. The reason is that these tensor spaces of order two can be related to certain linear operator spaces. The interpretation of X as a tensor product may allow us to transport certain properties from the factors V and W, which are easier to analyse, to the product X, which may be of a more complicated nature.
1.2.4 Large-Sized Tensors in Analysis Applications
In analysis, the approximation of functions is well-studied. The quality of approximation is usually related to smoothness properties. If a function is the solution of a partial differential equation, a lot is known about its regularity (cf. [141, §9]). Below, we give an example of how the concept of tensors may appear in the context of partial differential equations and their discretisations.
15 Representations such as (1.8) were used by Hitchcock [164] in 1927 (see §1.6).
1.2.4.1 Partial Differential Equations
Let $\Omega = I_1 \times I_2 \times I_3 \subset \mathbb{R}^3$ be the product of three intervals and consider an elliptic differential equation Lu = f on Ω, e.g., with Dirichlet boundary conditions u = 0 on the boundary Γ = ∂Ω. A second order differential operator L is called separable if
\[ L = L_1 + L_2 + L_3 \quad\text{with}\quad L_j = \frac{\partial}{\partial x_j}\, a_j(x_j)\, \frac{\partial}{\partial x_j} + b_j(x_j)\, \frac{\partial}{\partial x_j} + c_j(x_j). \tag{1.9a} \]
Note that any differential operator with constant coefficients and without mixed derivatives is of this kind. According to §1.1.3, we may consider the three-variate function as a tensor of order three. Moreover, the operator L can be regarded as a Kronecker product:¹⁶
\[ L = L_1 \otimes \mathrm{id} \otimes \mathrm{id} + \mathrm{id} \otimes L_2 \otimes \mathrm{id} + \mathrm{id} \otimes \mathrm{id} \otimes L_3. \tag{1.9b} \]
This tensor structure becomes more obvious when we consider a finite difference discretisation of Lu = f. Assume, e.g., that $I_1 = I_2 = I_3 = [0, 1]$ and introduce the uniform grid $G_n = \{ (\tfrac{i}{n}, \tfrac{j}{n}, \tfrac{k}{n}) : 0 \le i, j, k \le n \}$ of grid size h = 1/n. The discrete values of u and f at the nodes of the grid are denoted by¹⁷
\[ \mathbf{u}_{ijk} := u(\tfrac{i}{n}, \tfrac{j}{n}, \tfrac{k}{n}), \qquad \mathbf{f}_{ijk} := f(\tfrac{i}{n}, \tfrac{j}{n}, \tfrac{k}{n}) \qquad (1 \le i, j, k \le n - 1). \]
Hence $\mathbf{u}$ and $\mathbf{f}$ are tensors of the size (n − 1) × (n − 1) × (n − 1). The discretisation of the one-dimensional differential operator $L_j$ in (1.9a) yields a tridiagonal matrix $L^{(j)} \in \mathbb{R}^{(n-1) \times (n-1)}$. As in (1.9b), the matrix of the discrete system $\mathbf{L}\mathbf{u} = \mathbf{f}$ is the Kronecker product
\[ \mathbf{L} = L^{(1)} \otimes I \otimes I + I \otimes L^{(2)} \otimes I + I \otimes I \otimes L^{(3)}. \tag{1.10} \]
$I \in \mathbb{R}^{(n-1) \times (n-1)}$ is the identity matrix. Note that $\mathbf{L}$ has size $(n - 1)^3 \times (n - 1)^3$.
The standard treatment of the system $\mathbf{L}\mathbf{u} = \mathbf{f}$ views $\mathbf{u}$ and $\mathbf{f}$ as vectors from $\mathbb{R}^N$ with $N := (n - 1)^3$ and tries to solve N equations with N unknowns. If n ≈ 100, a system with $N \approx 10^6$ equations can still be handled. However, for n ≈ 10000 or even $n \approx 10^6$, a system of the size $N \approx 10^{12}$ or $N \approx 10^{18}$ exceeds the capacity of standard computers.
If we regard $\mathbf{u}$ and $\mathbf{f}$ as tensors of $\mathbb{R}^{n-1} \otimes \mathbb{R}^{n-1} \otimes \mathbb{R}^{n-1}$, it might be possible to find tensor representations with much less storage. Consider, for instance, a uniform load f = 1. Then $\mathbf{f} = \mathbf{1} \otimes \mathbf{1} \otimes \mathbf{1}$ is an elementary tensor, where $\mathbf{1} \in \mathbb{R}^{n-1}$ is the vector with entries $\mathbf{1}_i = 1$. The matrix $\mathbf{L}$ is already written as Kronecker product (1.10). In §9.8.2.6 we shall show that at least for positive-definite $\mathbf{L}$ a very accurate inverse matrix $\mathbf{B} \approx \mathbf{L}^{-1}$ can be constructed and that the matrix-vector multiplication $\tilde{\mathbf{u}} = \mathbf{B}\mathbf{f}$ can be performed. The required storage for the representation of $\mathbf{B}$ and $\tilde{\mathbf{u}}$ is bounded by $O(n \log^2(\tfrac{1}{\varepsilon}))$, where ε is related to the error $\| \mathbf{L}^{-1} - \mathbf{B} \|_2 \le \varepsilon$. The same bound holds for the computational cost.
16 The symbol id denotes the identity map. In the matrix case, we usually write I instead of id.
17 Because of the boundary condition, $\mathbf{u}_{ijk} = 0$ holds if one of the indices equals 0 or n.
The following observations are important:
(1) Under suitable conditions, the exponential cost $n^d$ can be reduced to O(dn) or even O(d log n) (here: d = 3). This allows computations in cases for which the standard approach fails and not even the storage of the data $\mathbf{u}$, $\mathbf{f}$ can be achieved.
(2) Usually, tensor computations will not be exact but yield approximations. In applications from analysis, there are many cases for which fast convergence holds. In the example above, the accuracy ε improves exponentially with a certain rank parameter so that we obtain the logarithmic factor $\log^2(1/\varepsilon)$. Although such a behaviour is typical for many problems from analysis, it does not hold in general, in particular not for random data.
(3) The essential key is a tensor representation with two requirements. First, low storage cost is an obvious requirement. Since the represented tensors are involved in operations (here: the matrix-vector multiplication $\mathbf{B}\mathbf{f}$), the second requirement is that such tensor operations should have a comparably low cost.
Finally, we give an example for which the tensor structure can be successfully applied without any approximation error. Instead of the linear system $\mathbf{L}\mathbf{u} = \mathbf{f}$ from above, we consider the eigenvalue problem $\mathbf{L}\mathbf{u} = \lambda\mathbf{u}$. First, we discuss the undiscretised problem
\[ Lu = \lambda u. \tag{1.11} \]
Here it is well known that the separation ansatz $u(x, y, z) = u_1(x)\,u_2(y)\,u_3(z)$ yields three one-dimensional boundary eigenvalue problems
\[ L_1 u_1(x) = \lambda^{(1)} u_1, \qquad L_2 u_2(y) = \lambda^{(2)} u_2, \qquad L_3 u_3(z) = \lambda^{(3)} u_3 \]
with zero conditions at x, y, z ∈ {0, 1}. The product $u(x, y, z) := u_1(x)\,u_2(y)\,u_3(z)$ satisfies Lu = λu with $\lambda = \lambda^{(1)} + \lambda^{(2)} + \lambda^{(3)}$. The latter product can be understood as tensor product: $u = u_1 \otimes u_2 \otimes u_3$ (cf. §1.1.3).
Similarly, we derive from the Kronecker product structure (1.10) that the solutions of the discrete eigenvalue problems
\[ L^{(1)} u_1 = \lambda^{(1)} u_1, \qquad L^{(2)} u_2 = \lambda^{(2)} u_2, \qquad L^{(3)} u_3 = \lambda^{(3)} u_3 \]
in $\mathbb{R}^{n-1}$ yield the solution $\mathbf{u} = u_1 \otimes u_2 \otimes u_3$ of $\mathbf{L}\mathbf{u} = \lambda\mathbf{u}$ with $\lambda = \lambda^{(1)} + \lambda^{(2)} + \lambda^{(3)}$. In this example we exploit that the eigensolution is exactly¹⁸ equal to an elementary tensor. In the discrete case, this implies that an object of the size $(n - 1)^3$ can be represented by three vectors of the size n − 1.
18 This holds only for separable differential operators (cf. (1.9a)), but tensor approaches also apply in more general cases, as shown in [148] (see §16.3).
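The Kronecker structure (1.10) and the eigenvalue statement can be verified for a small grid. The sketch below assumes the simplest separable operator, $L_j = -\partial^2/\partial x_j^2$ with the standard three-point finite difference stencil; this choice and all function names are illustrative assumptions and are not prescribed by the text.

```python
import numpy as np

def laplace_1d(n):
    """Tridiagonal matrix of -u'' on (0, 1), grid size h = 1/n, Dirichlet conditions."""
    h = 1.0 / n
    main = 2.0 * np.ones(n - 1)
    off = -1.0 * np.ones(n - 2)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2

def kronecker_sum_3d(L1, L2, L3):
    """Assemble L1 ⊗ I ⊗ I + I ⊗ L2 ⊗ I + I ⊗ I ⊗ L3 as in (1.10)."""
    I1, I2, I3 = (np.eye(L.shape[0]) for L in (L1, L2, L3))
    return (np.kron(np.kron(L1, I2), I3)
            + np.kron(np.kron(I1, L2), I3)
            + np.kron(np.kron(I1, I2), L3))

def apply_kronecker_sum(L1, L2, L3, u):
    """Apply (1.10) to a tensor u of order three without assembling the big matrix."""
    return (np.einsum("ia,ajk->ijk", L1, u)
            + np.einsum("jb,ibk->ijk", L2, u)
            + np.einsum("kc,ijc->ijk", L3, u))

n = 6                                      # small grid so that the big matrix fits easily
Lj = laplace_1d(n)
L = kronecker_sum_3d(Lj, Lj, Lj)           # size (n-1)^3 x (n-1)^3

# One-dimensional eigenpairs yield an eigenpair of L with an elementary tensor.
lam, vecs = np.linalg.eigh(Lj)
u1, u2, u3 = vecs[:, 0], vecs[:, 1], vecs[:, 2]
u = np.multiply.outer(np.multiply.outer(u1, u2), u3)
lam_sum = lam[0] + lam[1] + lam[2]
residual = np.linalg.norm(L @ u.reshape(-1) - lam_sum * u.reshape(-1))

# The tensor-wise application agrees with the assembled matrix.
w = np.random.default_rng(4).standard_normal((n - 1, n - 1, n - 1))
matches = np.allclose(apply_kronecker_sum(Lj, Lj, Lj, w).reshape(-1), L @ w.reshape(-1))
```

The eigenvector is stored as three vectors of length n − 1, while the assembled matrix is used here only as a check.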
1.2.4.2 Multivariate Function Representation
The computational realisation of a special function f(x) in one variable may be based on a rational approximation, a recursion, etc., or a combination of these tools. Computing a multivariate function $f(x_1, \ldots, x_p)$ is even more difficult. Such functions may be defined by complicated integrals involving the parameters $x_1, \ldots, x_p$ in the integrand or in the integration domain. Consider the evaluation of f on $\mathbf{I} = I_1 \times \ldots \times I_p \subset \mathbb{R}^p$ with $I_j = [a_j, b_j]$. We may precompute f at grid points $(x_{1,i_1}, \ldots, x_{p,i_p})$, $x_{j,i_j} = a_j + i_j (b_j - a_j)/n$ for $0 \le i_j \le n$, followed by suitable interpolation at the desired $x = (x_1, \ldots, x_p) \in \mathbf{I}$. However, this approach fails since the required storage of the grid values is of the size $n^p$. Again, the hope is to find a suitable tensor approximation with storage O(pn) and an evaluation procedure of a similar cost.
To give an example, a very easy task is the approximation of the function
\[ \frac{1}{\|x\|} = \Big( \sum_{i=1}^{p} x_i^2 \Big)^{-1/2} \qquad \text{for } \|x\| \ge a > 0. \]
We obtain a uniform accuracy of the size $O\big( \exp(-\pi \sqrt{r/2}) / \sqrt{a} \big)$ with a storage of the size 2r and an evaluation cost O(rp). Details will follow in §9.8.2.5.2.
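One way to see why $1/\|x\|$ admits a separable approximation is the identity $1/\sqrt{\rho} = \frac{2}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-\rho e^{2s}} e^{s}\, ds$ with $\rho = \|x\|^2$: a quadrature rule turns the integral into a sum of Gaussians, each of which is a product of univariate factors. The sketch below uses a plain trapezoidal rule with hand-picked parameters (step size `h`, truncation interval) as an assumption for illustration; it is not the optimised construction referred to in §9.8.2.5.2 and is only meant for $\|x\| \ge 1$.

```python
import numpy as np

def exponential_sum_inverse_sqrt(h=0.25, s_min=-20.0, s_max=5.0):
    """Weights w_k and exponents alpha_k with 1/sqrt(rho) ≈ sum_k w_k exp(-alpha_k rho), rho >= 1.

    Trapezoidal rule for (2/sqrt(pi)) * integral of exp(-rho e^{2s}) e^s ds;
    the step size and truncation are hand-picked (illustrative assumption).
    """
    s = np.arange(s_min, s_max + 0.5 * h, h)
    weights = (2.0 / np.sqrt(np.pi)) * h * np.exp(s)
    exponents = np.exp(2.0 * s)
    return weights, exponents

def inverse_norm(x, weights, exponents):
    """Evaluate sum_k w_k prod_i exp(-alpha_k x_i^2); the cost is O(r p)."""
    factors = np.exp(-np.outer(exponents, x**2))   # shape (r, p): separable factors
    return float(np.sum(weights * np.prod(factors, axis=1)))

weights, exponents = exponential_sum_inverse_sqrt()
r = weights.size                                    # number of terms; storage is 2r numbers

x = np.array([1.0, 2.0, 2.0])                       # ||x|| = 3
approx = inverse_norm(x, weights, exponents)
exact = 1.0 / np.linalg.norm(x)
```

Only the r weights and r exponents are stored, independently of the number p of variables.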
1.2.5 Tensors in Quantum Chemistry
The Schrödinger equation determines ‘wave functions’ $f(x_1, \ldots, x_d)$ for which each variable $x_j \in \mathbb{R}^3$ corresponds to one electron. Hence the spatial dimension 3d increases with the size of the molecule. A first ansatz¹⁹ is $f(x_1, \ldots, x_d) \approx \Phi(x_1, \ldots, x_d) := \varphi_1(x_1)\,\varphi_2(x_2) \cdot \ldots \cdot \varphi_d(x_d)$, which leads to the Hartree–Fock equation. According to (1.6), we can write $\Phi := \bigotimes_{j=1}^{d} \varphi_j$ as a tensor. More accurate approximations require tensors which are linear combinations of such products. The standard ansatz for the three-dimensional functions $\varphi_j(x)$ are sums of Gaussian functions²⁰ $\Phi_\nu(x) := e^{\alpha_\nu \|x - R_\nu\|^2}$ as introduced by Boys [39] in 1950. Again, $\Phi_\nu$ is the elementary tensor $\bigotimes_{k=1}^{3} e_k$ with $e_k(x_k) := e^{\alpha_\nu (x_k - R_{\nu,k})^2}$.
1.3 Tensor Calculus
The representation of tensors (in particular, with not too large storage requirements) is one goal of the efficient numerical treatment of tensors. Another goal is the efficient performance of tensor operations. In the case of matrices, we apply matrix-vector and matrix-matrix multiplications and matrix inversions. The same operations occur for tensors when the matrix is given by a Kronecker matrix and the vector by a tensor. Besides these operations, there are entry-wise multiplications, convolutions, etc.
19 In fact, the product must be antisymmetrised, yielding the Slater determinant from Lemma 3.86.
20 Possibly multiplied by polynomials.
In linear algebra, basis transformations are well known which lead to vector and matrix transforms. Such operations occur for tensors as well. There are matrix decompositions such as the singular-value decomposition. Generalisations to tensors will play an important role. These and further operations are summarised under the term ‘tensor calculus’.²¹ In the same way as a library of matrix procedures is the basis for all algorithms in linear algebra, the tensor calculus enables computations in the realm of tensors.
Note that already in the case of large-sized matrices, special efficient matrix representations are needed (cf. Hackbusch [138]), although the computational time grows only polynomially (typically cubically) with the matrix size. All the more important are efficient algorithms for tensors to avoid exponential run time.
A review of the recent literature in this field is given by Grasedyck–Kressner–Tobler [124].
1.4 Preview
1.4.1 Part I: Algebraic Properties
Matrices can be considered as tensors of second order. In Chapter 2 we summarise various properties of matrices, as well as techniques applicable to matrices. QR and singular-value decompositions will play an important role for later tensor operations.
In Chapter 3, tensors and tensor spaces are introduced. The definition of the tensor space in §3.2 requires a discussion of free vector spaces (in §3.1.2) and of quotient spaces (in §3.1.3). Furthermore, linear and multilinear mappings and algebraic dual spaces are discussed in §3.1.4. In §3.2 we introduce the tensor product and the (algebraic) tensor space and define the (tensor) rank of a tensor, which generalises the rank of a matrix. Later we shall introduce other vector-valued ranks of tensors. In §3.3 we have a closer look at linear and multilinear maps. In particular, tensor products of linear maps are discussed. Tensor spaces with additional algebra structure are different from tensor algebras. Both are briefly described in §3.4. In particular applications, symmetric or antisymmetric tensors are needed. These are defined in §3.5. Symmetric tensors are connected to quantics (cf. §3.5.2), while antisymmetric tensors are related to determinants (cf. §3.5.3).
21 The Latin word ‘calculus’ is the diminutive of ‘calx’ (lime, limestone) and has the original meaning ‘pebble’. In particular, it denotes the pieces used in the Roman abacus. Therefore the Latin word ‘calculus’ also has the meaning of ‘calculation’ or, in modern terms, ‘computation’.
1.4.2 Part II: Functional Analysis of Tensors
Normed tensor spaces are needed as soon as we want to approximate certain tensors. Even in the finite-dimensional case, one observes properties of tensors which are completely unknown from the matrix case. In particular in the infinite-dimensional case, we have Banach (or Hilbert) spaces $V_j$ endowed with a norm $\|\cdot\|_j$, as well as the algebraic tensor space $\mathbf{V}_{\mathrm{alg}} = {}_a\!\bigotimes_{j=1}^{d} V_j$, which together with a norm $\|\cdot\|$ becomes a normed space. Completion yields the topological Banach space $\mathbf{V}_{\mathrm{top}} = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$. The tensor space norm $\|\cdot\|$ is by no means determined by the single norms $\|\cdot\|_j$. In §4.2 we study the properties of tensor space norms. It turns out that continuity conditions on the tensor product limit the choice for $\|\cdot\|$ (cf. §4.2.2.1). There are two norms induced by $\{ \|\cdot\|_j : 1 \le j \le d \}$, the projective norm (cf. §4.2.3.1) and the injective norm (cf. §4.2.4.2), which are the strongest and weakest possible norms. Further terms of interest are crossnorms (cf. §4.2.2.1), reasonable crossnorms (cf. §4.2.6.1), and uniform crossnorms (cf. §4.2.8). For the case d = 2 considered in §4.2, nuclear and compact operators are discussed (cf. §4.2.9). The extension to d ≥ 3 discussed in §4.3 is almost straightforward except that we also need suitable norms, e.g., for the tensor spaces ${}_a\!\bigotimes_{j \in \{1,\ldots,d\} \setminus \{k\}} V_j$ of order d − 1.
While $L^p$ or $C^0$ norms of tensor spaces belong to the class of crossnorms, the usual spaces $C^m$ or $H^m$ (m ≥ 1) cannot be described by crossnorms but by intersections of Banach (or Hilbert) tensor spaces (cf. §4.3.5). The corresponding construction by crossnorms leads to so-called mixed norms.
Hilbert spaces are discussed in §4.4. In this case the scalar products $\langle \cdot, \cdot \rangle_j$ of $V_j$ define the induced scalar product of the Hilbert tensor space (cf. §4.4.1). In the Hilbert case, the infinite singular-value decomposition can be used to define the Hilbert–Schmidt and the Schatten norms (cf. §4.4.3). Besides the usual scalar product, the partial scalar product is of interest (cf. §4.5.5).
In §4.6 the tensor operations are enumerated which later are to be performed numerically. Particular subspaces of the tensor space $\otimes^d V$ are the symmetric and antisymmetric tensor spaces discussed in §4.7.
Chapter 5 concerns algebraic tensor spaces, as well as topological ones. We consider different isomorphisms which allow us to regard tensors either as vectors (vectorisation in §5.1) or as matrices (matricisation in §5.2). In particular, the matricisation will become an important tool. The opposite direction is the tensorisation considered in §5.3 and later, in more detail, in Chapter 14. Here vectors from $\mathbb{R}^n$ are artificially reformulated as tensors.
Another important tool for the analysis and for concrete constructions are the minimal subspaces studied in Chapter 6. Given some tensor $\mathbf{v}$, we ask for the smallest subspaces $U_j$ such that $\mathbf{v} \in \bigotimes_{j=1}^{d} U_j$. Of particular interest is their behaviour for sequences $\mathbf{v}_n \rightharpoonup \mathbf{v}$ of tensors.
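The isomorphisms mentioned for Chapter 5 can be previewed with array reshaping. The sketch below is an illustration under assumptions: the column ordering of the matricisation and the binary splitting used for the tensorisation are conventions chosen here and need not coincide with those of Chapters 5 and 14.

```python
import numpy as np

def matricisation(v, mode):
    """Matrix whose rows are indexed by i_mode and whose columns by all other indices."""
    return np.moveaxis(v, mode, 0).reshape(v.shape[mode], -1)

rng = np.random.default_rng(3)

# A tensor of order three built as a sum of two elementary tensors.
a = rng.standard_normal((4, 2))
b = rng.standard_normal((5, 2))
c = rng.standard_normal((6, 2))
v = np.einsum("ir,jr,kr->ijk", a, b, c)

vec = v.reshape(-1)                                    # vectorisation
ranks = [np.linalg.matrix_rank(matricisation(v, j)) for j in range(3)]

# Tensorisation: a vector of length 2^4 regarded as a tensor of order four.
x = np.arange(16.0)
x_tensor = x.reshape(2, 2, 2, 2)                       # entry [i1,i2,i3,i4] = x[8 i1 + 4 i2 + 2 i3 + i4]
```

The ranks of the matricisations stay at 2 here although the matrices have 4, 5 and 6 rows; the range of the j-th matricisation is the minimal subspace $U_j$ in the sense described above.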
1.4.3 Part III: Numerical Treatment
The numerical treatment of tensors is based on a suitable tensor representation. Chapters 7 to 10 are devoted to two well-known representations, the r-term format (also called the canonical or CP format) and the tensor subspace format (also called the Tucker format). We distinguish the exact representation from the approximation task. Exact representations are discussed in Chapter 7 (r-term format) and Chapter 8 (tensor subspace format). If the tensor rank is moderate, the r-term format is a very good choice, whereas the tensor subspace format is disadvantageous for larger tensor order d because of its exponentially increasing storage requirement.
Tensor approximations are discussed separately in Chapters 9 (r-term format) and 10 (tensor subspace format). In the first case, many properties known from the matrix case (see §9.3) do not generalise to tensor orders d ≥ 3. A particular drawback is mentioned in §9.4: the set of r-term tensors is not closed. This property may cause a numerical instability. An approximation of a tensor $\mathbf{v}$ by some $\tilde{\mathbf{v}}$ in the r-term format may be computed numerically using a regularisation (cf. §9.6). In some cases, analytical methods allow us to determine very accurate r-term approximations to functions and operators (cf. §9.8).
In the case of tensor subspace approximations (§10) there are two different options. The simpler approach is based on the higher-order singular-value decomposition (HOSVD; cf. §10.1). This allows a projection to smaller rank similar to the standard singular-value decomposition in the matrix case. The result is not necessarily the best one but quasi-optimal. The second option is the best approximation considered in §10.2. In contrast to the r-term format, the existence of a best approximation is guaranteed. A standard numerical method for its computation is the alternating least-squares method (ALS, cf. §10.3).
For particular cases, analytical methods are available to approximate multivariate functions (cf. §10.4). While the r-term format suffers from a possible numerical instability, the storage size of the tensor subspace format increases exponentially with the tensor order d. A format avoiding both problems is the hierarchical format described in Chapter 11. Here the storage is strictly bounded by the product of the tensor order d, the maximal rank involved, and the maximal dimension of the vector spaces Vj . Again, HOSVD techniques can be used for a quasi-optimal truncation. Since the format is closed, numerical instability does not occur. The hierarchical format is based on a dimension partition tree. A particular choice for the tree leads to the matrix product representation or TT format described in Chapter 12. The essential part of the numerical tensor calculus is the performance of tensor operations. In Chapter 13 we describe all operations, their realisation in the different formats, and the corresponding computational cost. The tensorisation briefly mentioned in §5.3 is revisited in Chapter 14. When applied to grid functions, tensorisation corresponds to a multiscale approach. The tensor truncation methods allow an efficient compression of the data size. As shown in §14.2, the approximation can be proved to be at least as good as analytical methods like hp methods, . approximations, or wavelet compression techniques.
In §14.3 the computation of the convolution is described. The fast Fourier transform is explained in §14.4. As detailed in §14.5, the method of tensorisation can also be applied to functions instead of grid functions.
Chapter 15 is devoted to the multivariate cross approximation. The underlying problem is the approximation of general tensors, which has several important applications.
In Chapter 16, the application of the tensor calculus to elliptic boundary-value problems (§16.2) and eigenvalue problems (§16.3) is discussed.
The final Chapter 17 collects a number of additional topics. §17.1 considers general minimisation problems. Another minimisation approach described in §17.2 applies directly to the parameters of the tensor representation. Dynamic problems are studied in §17.3, while the ANOVA method is mentioned in §17.4.
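The HOSVD-based projection mentioned in this preview can be sketched in a few lines for the tensor subspace (Tucker) format: each matricisation is decomposed by a singular-value decomposition, the leading left singular vectors span the subspaces, and the core tensor is obtained by projection. The code below is a minimal illustration under assumptions (function names, column ordering of the matricisation); it is not the algorithm of §10.1 in detail.

```python
import numpy as np

def mode_product(v, U, mode):
    """Multiply the tensor v by the matrix U along the direction `mode`."""
    moved = np.moveaxis(v, mode, 0)
    out = np.tensordot(U, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def hosvd_truncate(v, ranks):
    """Return the core tensor and orthonormal bases of an HOSVD-type projection."""
    bases = []
    for mode, r in enumerate(ranks):
        mat = np.moveaxis(v, mode, 0).reshape(v.shape[mode], -1)
        U, _, _ = np.linalg.svd(mat, full_matrices=False)
        bases.append(U[:, :r])               # leading left singular vectors
    core = v
    for mode, U in enumerate(bases):
        core = mode_product(core, U.T, mode)  # project onto the subspaces
    return core, bases

def tucker_to_full(core, bases):
    """Expand the tensor subspace representation to the full array."""
    v = core
    for mode, U in enumerate(bases):
        v = mode_product(v, U, mode)
    return v

# Test tensor with small ranks (2, 3, 2) in an 6 x 7 x 5 array.
rng = np.random.default_rng(5)
core0 = rng.standard_normal((2, 3, 2))
factors = [rng.standard_normal((6, 2)), rng.standard_normal((7, 3)), rng.standard_normal((5, 2))]
v = tucker_to_full(core0, factors)

core, bases = hosvd_truncate(v, (2, 3, 2))
error = np.linalg.norm(tucker_to_full(core, bases) - v)
storage = core.size + sum(U.size for U in bases)   # compared with v.size = 210
```

For this exactly representable tensor the projection reproduces `v` up to rounding; for general tensors the result is an approximation whose quality depends on the discarded singular values.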
1.4.4 Topics Outside the Scope of the Monograph
As already mentioned in §1.2.2, we do not aim for inverse problems, where the parameters of the representation (decomposition) have a certain external interpretation (see the references in the footnotes 12–14).
High-dimensional tensors also arise in data mining (cf. Kolda–Sun [198]). However, in contrast to mathematical applications (e.g., in partial differential equations), weaker properties hold concerning data smoothness, the desired order of accuracy, and even the availability of data.
We do not consider data completion (approximation of incomplete data; cf. Liu et al. [214], Kahle et al. [176], Grasedyck–Krämer [123]), which is a typical problem for data from non-mathematical sources. Entries $\mathbf{v}[\mathbf{i}]$ of a tensor $\mathbf{v} \in \bigotimes_{j=1}^{d} \mathbb{R}^{n_j}$ may be available only for $\mathbf{i} \in \mathring{\mathbf{I}}$ of a subset $\mathring{\mathbf{I}} \subset \mathbf{I} := \times_{j=1}^{d} \{1, \ldots, n_j\}$. Other examples are cases in which data are lost or deleted. Approximation of the remaining data by a tensor $\tilde{\mathbf{v}}$ of a certain format yields the desired completion (cf. Footnote 10 on page 321). Instead, in Chapter 15 we discuss quite another kind of data completion, where an approximation $\tilde{\mathbf{v}}$ is constructed from a small part of the data. However, unlike usual data completion, we assume that all data are available on demand, although possibly with high arithmetic cost.
Another subject which is not discussed here is the detection and determination of principal manifolds of smaller dimension (‘manifold learning’; see, e.g., Feuersänger–Griebel [101] or Griebel–Hullmann [127]).
The order d of the tensors considered here is always finite. In fact, the numerical cost of storage or arithmetic operations increases at least linearly in d. Infinite dimensions (d = ∞) may appear theoretically (as in §15.1.2.2), but only truncations to finite d are discussed.
There are approaches to generalise purely algebraic problems from linear algebra to multilinear algebra, e.g., eigenvalue problems (cf. Cartwright–Sturmfels [50], Qi–Luo [245]) and certain decompositions. Unfortunately, such problems are usually NP-hard (cf. Hillar–Lim [162, §9]) and do not help for large-sized tensors.
1.5 Software
Free software for tensor applications is offered by the following groups:
• MATLAB Tensor Toolbox by Bader–Kolda [14]: http://www.tensortoolbox.org/
• Hierarchical Tucker Toolbox by Tobler–Kressner [200]: https://anchp.epfl.ch/index-html/software/htucker/
• Tensorlab 3.0 by Vervliet–Debals–Sorber–Van Barel–De Lathauwer: https://www.tensorlab.net/
• TT TOOLBOX by I. Oseledets: http://github.com/oseledets/TT-Toolbox/
• TensorCalculus by M. Espig et al.: http://www.swmath.org/software/13726
• B.M. Wise and N.B. Gallagher: http://www.eigenvector.com
• Andersson–Bro [6]: http://www.models.life.ku.dk/nwaytoolbox
1.6 Comments about the Early History of Tensors

The word ‘tensor’ seems to be used for the first time in an article by William Rowan Hamilton [155] from 1846. The meaning, however, was quite different. Hamilton is well known for his quaternions. Like for complex numbers, a modulus of a quaternion can be defined. For this nonnegative real number he introduced the name ‘tensor’.

The word ‘tensor’ is again used in a book by Woldemar Voigt [298] in 1898 for quantities which come closer to our understanding.²²

In May 1845, Arthur Cayley [53] submitted a paper in which he described hyperdeterminants.²³ There he considers tensors of general order. For instance, he gives an illustration of a tensor from $\mathbb{R}^2 \otimes \mathbb{R}^2 \otimes \mathbb{R}^2$ (p. 11 in [298]):

Soit n = 3, posons pour plus de simplicité m = 2, et prenons
111 = a, 211 = b, 121 = c, 221 = d,
112 = e, 212 = f, 122 = g, 222 = h,
de manière que la fonction à considérer est
$U = a x_1 y_1 z_1 + b x_2 y_1 z_1 + c x_1 y_2 z_1 + d x_2 y_2 z_1 + e x_1 y_1 z_2 + f x_2 y_1 z_2 + g x_1 y_2 z_2 + h x_2 y_2 z_2$.

Next, he considers linear transformations $\Lambda^{(1)}, \Lambda^{(2)}, \Lambda^{(3)}$ in all three directions, e.g., $\Lambda^{(1)}$ is described as follows.

Les équations pour la transformation sont
$x_1 = \lambda_{11}\dot{x}_1 + \lambda_{21}\dot{x}_2$,
$x_2 = \lambda_{12}\dot{x}_1 + \lambda_{22}\dot{x}_2$,
...

The action of the transformations $\Lambda^{(1)}, \Lambda^{(2)}, \Lambda^{(3)}$ already represents the Kronecker product $\Lambda^{(1)} \otimes \Lambda^{(2)} \otimes \Lambda^{(3)}$.

The paper of Hitchcock [164] from 1927 has a similar algebraic background. The author states that ‘any covariant tensor $A_{i_1..i_p}$ can be expressed as the sum of a finite number of tensors each of which is the product of p covariant vectors’. In [164] the ranks are defined which we introduce in Definition 5.7. Although in this paper he uses the name ‘tensor’, in the following paper [163] of the same year he prefers the term ‘matrix’ or ‘p-way matrix’. The tensor product of vectors $a$, $b$ is denoted by $ab$ without any special tensor symbol.

In §1.1.2 we named the tensor product of matrices ‘Kronecker product’. In fact, this term is well established, but historically it seems to be unfounded. The ‘Kronecker product’ (and its determinant) was first studied by Johann Georg Zehfuss [308] in 1858, while it is questionable whether there exists any note by Kronecker about this product (see Henderson–Pukelsheim–Searle [174] for historical remarks). Zehfuss’ result about determinants can be found in Exercise 4.164.

²² From [298, p. 20]: Tensors are “. . . Zustände, die durch eine Zahlgrösse und eine zweiseitige Richtung charakterisiert sind. . . . Wir wollen uns deshalb nur darauf stützen, dass Zustände der geschilderten Art bei Spannungen und Dehnungen nicht starrer Körper auftreten, und sie deshalb tensorielle, die für sie charakteristischen physikalischen Grössen aber Tensoren nennen.”
²³ See also [73, §5.3]. The hyperdeterminant vanishes for a tensor $\mathbf{v} \in \mathbb{R}^p \otimes \mathbb{R}^q \otimes \mathbb{R}^r$ if and only if the associated multilinear form $\varphi(x,y,z) := \sum_{i,j,k} \mathbf{v}[i,j,k]\, x_i y_j z_k$ allows nonzero vectors $x, y, z$ such that $\nabla_x \varphi$, $\nabla_y \varphi$, or $\nabla_z \varphi$ vanish at $(x,y,z)$.
1.7 Notations

A list of symbols, letters etc. can be found on page xix. Here we collect the notational conventions which we use in connection with vectors, matrices, and tensors.

Index Sets. $I$, $J$, $K$ are typical letters used for index sets. In general, we do not require that an index set be ordered. This allows us, e.g., to define a new index set $K := I \times J$ as the product of index sets $I$, $J$ without prescribing an ordering of the pairs $(i,j)$ of $i \in I$ and $j \in J$.
Fields. A vector space is associated with some field, which will be denoted by $\mathbb{K}$. The standard choices²⁴ are $\mathbb{R}$ and $\mathbb{C}$. When we use the symbol $\mathbb{K}$ instead of the special choice $\mathbb{R}$, we use the complex-conjugate value $\bar\lambda$ of a scalar whenever this is required in the case of $\mathbb{K} = \mathbb{C}$.

Vector Spaces $\mathbb{K}^n$ and $\mathbb{K}^I$. Let $n \in \mathbb{N}$. $\mathbb{K}^n$ is the standard notation for the vector space of the $n$-tuples $v = (v_i)_{i=1}^{n}$ with $v_i \in \mathbb{K}$. The more general notation $\mathbb{K}^I$ abbreviates the vector space $\{v = (v_i)_{i\in I} : v_i \in \mathbb{K}\}$. Equivalently, we may define $\mathbb{K}^I$ as the space of mappings from $I$ into $\mathbb{K}$. Note that this definition makes sense for non-ordered index sets. If, e.g., $K = I \times J$ is the index set, a vector $v \in \mathbb{K}^K$ has entries $v_k = v_{(i,j)}$ for $k = (i,j) \in K$. The notation $v_{(i,j)}$ must be distinguished from $v_{i,j}$, which indicates a matrix entry. The simple notation $\mathbb{K}^n$ is identical to $\mathbb{K}^I$ for $I = \{1,\dots,n\}$.

Vectors will be symbolised by small letters. Vector entries are usually denoted by $v_i$. The alternative notation $v[i]$ is used if the index carries a secondary index (example: $v[i_1]$) or if the symbol for the vector is already indexed (example: $v_\nu[i]$ for $v_\nu \in \mathbb{K}^I$). Typical symbols for vector spaces are $V$, $W$, $U$, etc. Often, $U$ is used for subspaces.

Matrices and Matrix Spaces $\mathbb{K}^{I\times J}$. Any linear mapping $\Phi : \mathbb{K}^J \to \mathbb{K}^I$ ($I$, $J$ index sets) can be represented by a matrix²⁵ $M \in \mathbb{K}^{I\times J}$ with entries $M_{ij} \in \mathbb{K}$ and we may write $M = (M_{ij})_{i\in I,\, j\in J}$ or $M = (M_{ij})_{(i,j)\in I\times J}$. The alternative notation $\mathbb{K}^{n\times m}$ is used for the special index sets $I = \{1,\dots,n\}$ and $J = \{1,\dots,m\}$. Even the mixed notation $\mathbb{K}^{I\times m}$ appears if $J = \{1,\dots,m\}$, while $I$ is a general index set.

Matrices will be symbolised by capital letters. Matrix entries are denoted by $M_{i,j} = M_{ij}$ or by $M[i,j]$. Given a matrix $M \in \mathbb{K}^{I\times J}$, its $i$-th row or its $j$-th column will be denoted by
\[ M_{i,\bullet} = M[i,\bullet] \in \mathbb{K}^J \quad\text{or}\quad M_{\bullet,j} = M[\bullet,j] \in \mathbb{K}^I, \text{ respectively.} \]
If $\tau \subset I$ and $\sigma \subset J$ are index subsets, the restriction of a matrix is written as
\[ M|_{\tau\times\sigma} = (M_{ij})_{i\in\tau,\, j\in\sigma} \in \mathbb{K}^{\tau\times\sigma}. \]
More about matrix notations will follow in §2.1.

²⁴ Finite fields are of less interest, since approximations do not make sense. Nevertheless, there are applications of tensor tools for Boolean data (cf. Lichtenberg–Eichler [210]). Most of the results not involving norms and absolute values hold for general fields. At some places we require that $\mathbb{K}$ must have characteristic 0, i.e., finite fields are excluded.
²⁵ $M \in \mathbb{K}^{I\times J}$ is considered as a matrix, whereas $v \in \mathbb{K}^K$ for $K = I \times J$ is viewed as a vector.
Tensors. Tensors are denoted by small boldface letters: $\mathbf{v}, \mathbf{w}, \dots, \mathbf{a}, \mathbf{b}, \dots$ Their entries are usually indexed in square brackets: $\mathbf{v}[i_1,\dots,i_d]$. In simple cases, subscripts may be used: $\mathbf{v}_{ijk}$. The boldface notation $\mathbf{v}[i_1,\dots,i_d]$ is also used in the case of a variable $d$ which possibly takes the values $d = 1$ [vector case] or $2$ [matrix case].

The standard notation for a tensor space of order $d$ is
\[ \mathbf{V} = \bigotimes_{j=1}^{d} V_j . \]
Here $V_j$ ($1 \le j \le d$) are vector spaces generating the tensor space $\mathbf{V}$. As in this example, tensor spaces are denoted by capital letters in bold type. $\mathbf{U}$ is the typical letter for a subspace of a tensor space.

Elementary tensors from $\mathbf{V} = \bigotimes_{j=1}^{d} V_j$ have the form
\[ \mathbf{v} = \bigotimes_{j=1}^{d} v^{(j)} = v^{(1)} \otimes \dots \otimes v^{(d)} . \]
The superscript in round brackets indicates the vector corresponding to the $j$-th direction. The preferred letter for the direction index is $j$ (or $k$ if a second index is needed). The entries of $v^{(j)}$ may be written as $v_i^{(j)}$ or $v^{(j)}[i]$. A lower subscript may also denote the $\nu$-th vector $v_\nu^{(j)} \in V_j$ as required in $\mathbf{v} = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} v_\nu^{(j)}$. In this case the entries of $v_\nu^{(j)}$ are written as $v_\nu^{(j)}[i]$.

To be precise, we have to distinguish between algebraic and topological tensor spaces denoted by ${}_a\!\bigotimes_{j=1}^{d} V_j$ and ${}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$, respectively. Details can be found in Notation 3.10.
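The entry notation for elementary tensors can be illustrated numerically. The following is a minimal sketch for $d = 3$, assuming NumPy; the names `v1`, `v2`, `v3`, and `sum_of_elementary` are hypothetical and chosen only for this illustration.

```python
import numpy as np

# Hypothetical factor vectors v^(1), v^(2), v^(3) (illustrative values only).
v1 = np.array([1.0, 2.0])
v2 = np.array([3.0, 4.0, 5.0])
v3 = np.array([6.0, 7.0])

# Elementary tensor v = v^(1) (x) v^(2) (x) v^(3):
# v[i1, i2, i3] = v1[i1] * v2[i2] * v3[i3].
v = np.einsum("i,j,k->ijk", v1, v2, v3)


def sum_of_elementary(factors):
    """Return sum over nu of a_nu (x) b_nu (x) c_nu for a list of vector triples."""
    return sum(np.einsum("i,j,k->ijk", a, b, c) for a, b, c in factors)
```

Note that array indices in NumPy start at 0, whereas the index sets $\{1,\dots,n_j\}$ in the text start at 1.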
Chapter 2
Matrix Tools
Abstract In connection with tensors, matrices are of interest for two reasons. Firstly, they are tensors of order two and therefore a nontrivial example of a tensor. In contrast to tensors of higher order, matrices allow us to apply practically realisable decompositions. Secondly, operations with general tensors will often be reduced to a sequence of matrix operations (realised by well-developed software). Sections 2.1–2.3 introduce the notation and recall well-known facts about matrices. Section 2.5 discusses the important QR decomposition and the singular-value decomposition (SVD) and their computational cost. The (optimal) approximation by matrices of lower rank explained in Section 2.6 will be used later in truncation procedures for tensors. In Part III we shall apply some linear algebra procedures introduced in Section 2.7 based on QR and SVD.
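2.1 Matrix Notations

In this subsection, the index sets $I$, $J$ are assumed to be finite. As soon as complex conjugate values appear,¹ the scalar field is restricted to $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$.

We recall the notation $\mathbb{K}^{I\times J}$ explained in §1.7. The entries of a matrix $M \in \mathbb{K}^{I\times J}$ are denoted by $M_{ij}$ ($i \in I$, $j \in J$). Conversely, numbers $\alpha_{ij} \in \mathbb{K}$ ($i \in I$, $j \in J$) may be used to define $M := (\alpha_{ij})_{i\in I,\, j\in J} \in \mathbb{K}^{I\times J}$.

Let $j \in J$. The $j$-th column of $M \in \mathbb{K}^{I\times J}$ is the vector $M[\bullet,j] = (M_{ij})_{i\in I} \in \mathbb{K}^I$, while vectors $c^{(j)} \in \mathbb{K}^I$ generate a matrix $M := [\, c^{(j)} : j \in J \,] \in \mathbb{K}^{I\times J}$. If $J$ is ordered, we may write $M := [\, c^{(j_1)}, c^{(j_2)}, \dots ]$.

$\delta_{ij}$ ($i, j \in I$) is the Kronecker symbol defined by
\[ \delta_{ij} = \begin{cases} 1 & \text{if } i = j \in I, \\ 0 & \text{otherwise.} \end{cases} \tag{2.1} \]

¹ In the case of $\mathbb{K} = \mathbb{R}$, $\alpha = \bar\alpha$ holds for all $\alpha \in \mathbb{K}$.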
© Springer Nature Switzerland AG 2019 W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics 56, https://doi.org/10.1007/978-3-030-35554-8_2
The unit vector $e^{(i)} \in \mathbb{K}^I$ ($i \in I$) is defined by
\[ e^{(i)} := (\delta_{ij})_{j\in I} . \tag{2.2} \]
The symbol $I = (\delta_{ij})_{i,j\in I}$ is used for the identity matrix. Since matrices and index sets do not appear at the same place, the simultaneous use of $I$ for a matrix and for an index set should not lead to any confusion (example: $I \in \mathbb{K}^{I\times I}$).

If $M \in \mathbb{K}^{I\times J}$, the transposed matrix $M^T \in \mathbb{K}^{J\times I}$ is defined by $M_{ij} = (M^T)_{ji}$ ($i \in I$, $j \in J$). A matrix from $\mathbb{K}^{I\times I}$ is symmetric if $M = M^T$. The Hermitian transposed matrix $M^H \in \mathbb{K}^{J\times I}$ is $\overline{M^T}$, i.e., $\overline{M_{ij}} = (M^H)_{ji}$, where $\overline{\,\bullet\,}$ is the complex conjugate value. If $\mathbb{K} = \mathbb{R}$, $M^H = M^T$ holds. This allows us to use $H$ for the general case of $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$. A Hermitian matrix satisfies $M = M^H$.

The range of a matrix $M \in \mathbb{K}^{I\times J}$ is²
\[ \operatorname{range}(M) := \{ Mx : x \in \mathbb{K}^J \} . \]
The Euclidean scalar product in $\mathbb{K}^I$ is given by
\[ \langle x, y \rangle = y^H x = \sum_{i\in I} x_i \overline{y_i}, \]
where, in the real case $\mathbb{K} = \mathbb{R}$, the conjugate sign can be ignored. In the case of $\mathbb{K} = \mathbb{C}$, the scalar product is a sesquilinear form, i.e., it is antilinear in the second argument.³

Two vectors $x, y \in \mathbb{K}^I$ are orthogonal (symbolic notation: $x \perp y$) if $\langle x, y\rangle = 0$. A family of vectors $\{x_\nu\}_{\nu\in F} \subset \mathbb{K}^I$ is orthogonal if the vectors are pairwise orthogonal, i.e., $\langle x_\nu, x_\mu\rangle = 0$ for all $\nu, \mu \in F$ with $\nu \ne \mu$. Similarly, two vectors $x, y \in \mathbb{K}^I$ or a family $\{x_\nu\}_{\nu\in F} \subset \mathbb{K}^I$ are orthonormal if, in addition, all vectors are normalised: $\langle x, x\rangle = \langle y, y\rangle = 1$ or $\langle x_\nu, x_\nu\rangle = 1$ ($\nu \in F$).

A matrix $M \in \mathbb{K}^{I\times J}$ is called orthogonal if the columns of $M$ are orthonormal. An equivalent characterisation is
\[ M^H M = I \in \mathbb{K}^{J\times J} . \tag{2.3} \]
Note that the (Hermitian) transpose of an orthogonal matrix is, in general, not orthogonal. $M \in \mathbb{K}^{I\times J}$ can be orthogonal only if $\#J \le \#I$.

An orthogonal square⁴ matrix $M \in \mathbb{K}^{I\times I}$ is called unitary (if $\mathbb{K} = \mathbb{R}$, often the term ‘orthogonal’ is preferred). In contrast to the remark above, unitary matrices satisfy
\[ M^H M = M M^H = I \in \mathbb{K}^{I\times I}, \]
i.e., $M^H = M^{-1}$ holds.

Assume that the index sets satisfy either $I \subset J$ or $J \subset I$. Then a (rectangular) matrix $M \in \mathbb{K}^{I\times J}$ is diagonal if $M_{ij} = 0$ for all $i \ne j$, $(i,j) \in I \times J$. Given numbers $\delta_i$ ($i \in I \cap J$), the associated diagonal matrix $M$ with $M_{ii} = \delta_i$ is written as $\operatorname{diag}\{\delta_i : i \in I \cap J\}$. If the index set $I \cap J$ is ordered, an enumeration of the diagonal entries can be used: $\operatorname{diag}\{\delta_{i_1}, \delta_{i_2}, \dots\}$.

Again, assume $I \subset J$ or $J \subset I$ and a common ordering of $I \cup J$. A (rectangular) matrix $M \in \mathbb{K}^{I\times J}$ is lower triangular if $M_{ij} = 0$ for all $(i,j) \in I \times J$ with $i < j$. Similarly, $M_{ij} = 0$ for all $i > j$ defines the upper triangular matrix.

² Also the notation $\operatorname{colspan}(M)$ exists, since $\operatorname{range}(M)$ is spanned by the columns of $M$.
³ A mapping $\varphi$ is called antilinear if $\varphi(x + \alpha y) = \varphi(x) + \bar\alpha\, \varphi(y)$ for $\alpha \in \mathbb{C}$.
⁴ We may assume $M \in \mathbb{K}^{I\times J}$ with $\#I = \#J$ and different $I$, $J$. Then $M^H M = I \in \mathbb{K}^{J\times J}$ and $M M^H = I \in \mathbb{K}^{I\times I}$ are the precise conditions.
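The difference between orthogonal (rectangular) and unitary (square) matrices, as well as the antilinearity of the scalar product in the second argument, can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper `inner` and the test matrix are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 5x3 complex matrix with orthonormal columns, obtained from a reduced QR
# factorisation of a random matrix (illustrative data only).
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3)))

QhQ = Q.conj().T @ Q   # (2.3): identity of size 3x3
QQh = Q @ Q.conj().T   # 5x5 matrix of rank 3, hence not the identity


def inner(x, y):
    """Euclidean scalar product <x, y> = y^H x (antilinear in the second argument)."""
    return np.vdot(y, x)   # np.vdot conjugates its first argument
```

Here `QQh` is the orthogonal projection onto the range of `Q`; it equals the identity only in the unitary case $\#I = \#J$.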
2.2 Matrix Rank

Remark 2.1. Let $M \in \mathbb{K}^{I\times J}$. The following statements are equivalent and each may be used as definition of the ‘matrix rank’ $r = \operatorname{rank}(M)$:
(a) $r = \dim \operatorname{range}(M)$,
(b) $r = \dim \operatorname{range}(M^T)$,
(c) $r$ is the maximal number of linearly independent rows in $M$,
(d) $r$ is the maximal number of linearly independent columns in $M$,
(e) $r \in \mathbb{N}_0$ is minimal with the property
\[ M = \sum_{i=1}^{r} a_i b_i^T, \quad\text{where } a_i \in \mathbb{K}^I \text{ and } b_i \in \mathbb{K}^J, \tag{2.4} \]
(f) $r$ is maximal with the property that there exists a regular $r \times r$ submatrix⁵ of $M$,
(g) $r$ is the number of positive singular values (see (2.16a)).

In (b) and (e) we may replace $\bullet^T$ with $\bullet^H$. Part (e) states in particular that products $a_i b_i^T$ of nonvanishing vectors represent all rank-1 matrices.

The rank of $M \in \mathbb{K}^{I\times J}$ is bounded by the maximal rank
\[ r_{\max} := \min\{\#I, \#J\}, \tag{2.5} \]
and this bound is attained for the so-called full-rank matrices.

The definition of linear independency depends on the field $\mathbb{K}$. This leads to the following question. A real matrix $M \in \mathbb{R}^{I\times J}$ may also be considered as an element of $\mathbb{C}^{I\times J}$. Hence, in principle, such an $M$ may possess a ‘real’ rank and a ‘complex’ rank. However, the equivalent characterisations (f) and (g) of Remark 2.1 are independent of the choice $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$ and prove the next remark.

⁵ That means that there are $I' \subset I$ and $J' \subset J$ with $\#I' = \#J' = r$ and $M|_{I'\times J'}$ regular.

Remark 2.2. For $M \in \mathbb{R}^{I\times J} \subset \mathbb{C}^{I\times J}$ the value of $\operatorname{rank}(M)$ is independent of the field $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$.

Corollary 2.3. Let $r = \operatorname{rank}(M)$ and define $A := [a_1, \dots, a_r]$ and $B := [b_1, \dots, b_r]$ by $a_i$ and $b_i$ in (2.4). Then (2.4) is equivalent to $M = AB^T$.

An interesting matrix family is the set of matrices of rank not exceeding $r$:
\[ \mathcal{R}_r := \{ M \in \mathbb{K}^{I\times J} : \operatorname{rank}(M) \le r \} . \tag{2.6} \]
Any $M \in \mathcal{R}_r$ may be written in the form (2.4).

Lemma 2.4. The sets $\mathcal{R}_r \subset \mathbb{K}^{I\times J}$ for $r \in \mathbb{N}_0$ are closed. Any convergent sequence $R^{(k)} \in \mathcal{R}_r$ satisfies
\[ \liminf_{k\to\infty} \operatorname{rank}(R^{(k)}) \ge \operatorname{rank}\bigl( \lim_{k\to\infty} R^{(k)} \bigr) . \tag{2.7} \]

Proof. For $s \in \mathbb{N}_0$ set $N_s := \{ k \in \mathbb{N} : \operatorname{rank}(R^{(k)}) = s \} \subset \mathbb{N}$ and
\[ r_\infty = \min\{ s \in \mathbb{N}_0 : \#N_s = \infty \} = \liminf_{k\to\infty} \operatorname{rank}(R^{(k)}) \le r . \]
We restrict $R^{(k)}$ to the subsequence with $k \in N_{r_\infty}$, i.e., $\operatorname{rank}(R^{(k)}) = r_\infty$. For full rank, i.e., $r_\infty = \min\{\#I, \#J\}$, nothing has to be proved. Otherwise we use the criterion from Remark 2.1f: all $(r_\infty + 1) \times (r_\infty + 1)$ submatrices $R^{(k)}|_{I'\times J'}$ ($I' \subset I$, $J' \subset J$, $\#I' = \#J' = r_\infty + 1$) are singular, in particular, $\det(R^{(k)}|_{I'\times J'}) = 0$ holds. $0 = \lim \det(R^{(k)}|_{I'\times J'}) = \det(\lim(R^{(k)})|_{I'\times J'})$ follows from the continuity of determinants and proves that $\operatorname{rank}(\lim R^{(k)}) \le r_\infty \le r$; hence, $\lim R^{(k)} \in \mathcal{R}_r$, proving that $\mathcal{R}_r$ is closed. □
Remark 2.5. A matrix $M \in \mathbb{K}^{I\times J}$ with random entries has maximal rank $r_{\max}$ with probability one.

Proof. Matrices of smaller rank form a subset of measure zero. □

Definition 2.6 (universal k-column independence). For an $m$-tuple of vectors $(a_1, \dots, a_m) \in V^m$ let $k \in \{0, \dots, m\}$ be the largest integer such that all subtuples of size $k$ consist of linearly independent vectors. Then $\operatorname{krank}(a_1, \dots, a_m) := k$ is called the Kruskal rank (cf. Kruskal [204, page 13]).
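Definition 2.6 can be evaluated directly for small tuples by testing all subtuples. The following brute-force sketch assumes NumPy; the function name `kruskal_rank` and the tolerance `tol` are hypothetical choices for this illustration, and the cost grows exponentially in $m$, so it is meant for illustration only.

```python
import itertools

import numpy as np


def kruskal_rank(vectors, tol=1e-10):
    """Largest k such that every subtuple of size k is linearly independent.

    `vectors` is a sequence of 1-D arrays of equal length; linear independence
    is tested numerically by a rank computation with threshold `tol`.
    """
    m = len(vectors)
    k = 0
    for size in range(1, m + 1):
        for subset in itertools.combinations(range(m), size):
            A = np.column_stack([vectors[i] for i in subset])
            if np.linalg.matrix_rank(A, tol=tol) < size:
                return k          # a dependent subtuple of this size exists
        k = size                  # all subtuples of this size are independent
    return k
```

For example, the tuple $(e^{(1)}, e^{(1)}, e^{(2)})$ has matrix rank 2 but Kruskal rank 1, since the pair $(e^{(1)}, e^{(1)})$ is linearly dependent.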
2.3 Matrix Norms

Before the Euclidean, spectral and Frobenius norms are discussed, the trace of a square matrix is introduced. For a generalisation of the trace to operators, see (4.66).

Definition 2.7. The mapping $\operatorname{trace} : \mathbb{K}^{I\times I} \to \mathbb{K}$ is defined by
\[ \operatorname{trace}(M) := \sum_{i\in I} M_{ii} . \tag{2.8} \]

Exercise 2.8. (a) $\operatorname{trace}(AB) = \operatorname{trace}(BA)$ for any $A \in \mathbb{K}^{I\times J}$ and $B \in \mathbb{K}^{J\times I}$.
(b) $\operatorname{trace}(M) = \operatorname{trace}(U M U^H)$ for $M \in \mathbb{K}^{I\times I}$ and any orthogonal matrix $U \in \mathbb{K}^{J\times I}$ (in particular, for a unitary matrix $U \in \mathbb{K}^{I\times I}$).
(c) Let $\lambda_i$ ($i \in I$) be all eigenvalues of $M \in \mathbb{K}^{I\times I}$ according to their multiplicity. Then $\operatorname{trace}(M) = \sum_{i\in I} \lambda_i$.

The general definition of norms and scalar products can be found in §4.1.1 and §4.4.1. The Frobenius norm
\[ \|M\|_F = \sqrt{ \sum_{i\in I,\, j\in J} |M_{i,j}|^2 } \quad\text{for } M \in \mathbb{K}^{I\times J} \tag{2.9} \]
is also called the Schur norm or Hilbert–Schmidt norm. This norm is generated by the Frobenius scalar product
\[ \langle A, B \rangle_F := \sum_{i\in I,\, j\in J} A_{i,j} \overline{B_{i,j}} = \operatorname{trace}(A B^H) = \operatorname{trace}(B^H A), \tag{2.10} \]
since $\langle M, M\rangle_F = \|M\|_F^2$. In particular, $\|M\|_F^2 = \operatorname{trace}(M M^H) = \operatorname{trace}(M^H M)$ holds.

Remark 2.9. Let $I \times J$ and $I' \times J'$ define two matrix formats with the same number of entries: $\#I \cdot \#J = \#I' \cdot \#J'$. Any bijective mapping $\pi : I \times J \to I' \times J'$ generates a mapping $P : M \in \mathbb{K}^{I\times J} \mapsto P(M) = M' \in \mathbb{K}^{I'\times J'}$ via $M'[i', j'] = M[i, j]$ for $(i', j') = \pi(i, j)$. Then the Frobenius norm and scalar product are invariant with respect to $P$, i.e.,
\[ \|P(M)\|_F = \|M\|_F \quad\text{and}\quad \langle P(A), P(B) \rangle_F = \langle A, B \rangle_F . \]
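Let $\|\cdot\|_X$ and $\|\cdot\|_Y$ be vector norms on $X = \mathbb{K}^I$ and $Y = \mathbb{K}^J$, respectively. Then the associated matrix norm is⁶
\[ \|M\| := \|M\|_{X\leftarrow Y} := \sup\left\{ \frac{\|My\|_X}{\|y\|_Y} : 0 \ne y \in \mathbb{K}^J \right\} \quad\text{for } M \in \mathbb{K}^{I\times J} . \]
If $\|\cdot\|_X$ and $\|\cdot\|_Y$ coincide with the Euclidean vector norm
\[ \|u\|_2 := \sqrt{ \sum_{i\in K} |u_i|^2 } \quad\text{for } u \in \mathbb{K}^K, \]
the associated matrix norm $\|M\|_{X\leftarrow Y}$ is the spectral norm denoted by $\|M\|_2$.

⁶ In the degenerate case of $Y = \{0\}$ set $\|M\| := 0$.

Exercise 2.10. Let $M \in \mathbb{K}^{I\times J}$. (a) Another equivalent definition of $\|\cdot\|_2$ is
\[ \|M\|_2 = \sup\left\{ \frac{|y^H M x|}{\sqrt{y^H y \cdot x^H x}} : 0 \ne x \in \mathbb{K}^J,\ 0 \ne y \in \mathbb{K}^I \right\} . \tag{2.11} \]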
(b) $\|M\|_2 = \|UM\|_2 = \|MV^H\|_2 = \|UMV^H\|_2$ holds for orthogonal matrices $U \in \mathbb{K}^{I'\times I}$ and $V \in \mathbb{K}^{J'\times J}$.

From Lemma 2.23b we shall learn that the squared spectral norm $\|M\|_2^2$ is the largest eigenvalue of both matrices $M^H M$ and $M M^H$.

Both matrix norms $\|\cdot\|_2$ and $\|\cdot\|_F$ are submultiplicative, i.e., $\|AB\| \le \|A\| \|B\|$. The example of $A = B = I \in \mathbb{R}^{n\times n}$ shows the equality in $1 = \|I \cdot I\|_2 \le \|I\|_2 \|I\|_2 = 1$, while $\sqrt{n} = \|I \cdot I\|_F \le \|I\|_F \|I\|_F = n$ is a rather pessimistic estimate. In fact, spectral and Frobenius norms can be mixed to get better estimates.

Lemma 2.11. The product of $A \in \mathbb{K}^{I\times J}$ and $B \in \mathbb{K}^{J\times K}$ is estimated by
\[ \|AB\|_F \le \|A\|_2 \|B\|_F \quad\text{as well as}\quad \|AB\|_F \le \|A\|_F \|B\|_2 . \]

Proof. $C[\bullet, j]$ denotes the $j$-th column of $C \in \mathbb{K}^{I\times K}$. $\|C\|_F^2 = \sum_{j\in K} \|C[\bullet, j]\|_2^2$ involves the Euclidean norm of the columns. For $C := AB$ the columns satisfy $C[\bullet, j] = A \cdot B[\bullet, j]$ and the estimate $\|C[\bullet, j]\|_2 \le \|A\|_2 \|B[\bullet, j]\|_2$. Together with the above identity, $\|AB\|_F^2 \le \|A\|_2^2 \|B\|_F^2$ follows. The second inequality can be obtained from the first one because $\|X\|_F = \|X^T\|_F$ and $\|X\|_2 = \|X^T\|_2$. □

A particular consequence is $\|A\|_2 \le \|A\|_F$ (use $B = I$ in the second inequality).

Exercise 2.12. Let $U \in \mathbb{K}^{I'\times I}$ and $V \in \mathbb{K}^{J'\times J}$ be orthogonal matrices and prove:
(a) $\|M\|_F = \|UM\|_F = \|MV^H\|_F = \|UMV^H\|_F$ for $M \in \mathbb{K}^{I\times J}$.
(b) $\langle A, B\rangle_F = \langle UAV^H, UBV^H \rangle_F$ for $A, B \in \mathbb{K}^{I\times J}$.

Exercise 2.13. For index sets $I$, $J$, and $K$ let $A \in \mathbb{K}^{I\times K}$ and $B \in \mathbb{K}^{J\times K}$ with $\operatorname{rank}(B) = \#J \le \#K$. Show that the matrix $C \in \mathbb{K}^{I\times J}$ minimising $\|A - CB\|_F$ is given by $C := AB^H (BB^H)^{-1}$.
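The mixed estimates of Lemma 2.11 and the minimiser of Exercise 2.13 can be checked on random data. The following is a minimal sketch in the real case, assuming NumPy; the matrix sizes and the names `fro`, `spec`, and `C_lstsq` are hypothetical and chosen only for this illustration. It is a numerical check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(1)


def fro(X):
    """Frobenius norm (2.9)."""
    return np.linalg.norm(X, "fro")


def spec(X):
    """Spectral norm, i.e., the largest singular value."""
    return np.linalg.norm(X, 2)


# Exercise 2.13 with #I = 4, #J = 3, #K = 6; B has full row rank almost surely.
A = rng.standard_normal((4, 6))
B = rng.standard_normal((3, 6))
C = A @ B.conj().T @ np.linalg.inv(B @ B.conj().T)

# Cross-check: min ||A - C B||_F is the least-squares problem B^T C^T ~ A^T.
C_lstsq = np.linalg.lstsq(B.T, A.T, rcond=None)[0].T

# Lemma 2.11: data for the mixed spectral/Frobenius estimates.
M = rng.standard_normal((6, 5))
```

In floating-point practice one would usually prefer the least-squares solver over the explicit inverse of $BB^H$; the explicit formula is used here only to mirror the statement of the exercise.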
2.4 Semidefinite Matrices

A matrix $M \in \mathbb{K}^{I\times I}$ is called positive semidefinite if
\[ M = M^H \quad\text{and}\quad \langle Mx, x\rangle \ge 0 \quad\text{for all } x \in \mathbb{K}^I . \]
In addition, a positive-definite matrix has to satisfy $\langle Mx, x\rangle > 0$ for $0 \ne x \in \mathbb{K}^I$.

Remark 2.14. Let $M \in \mathbb{K}^{I\times I}$ be positive [semi]definite.
(a) The equation $X^2 = M$ has a unique positive-[semi]definite solution in $\mathbb{K}^{I\times I}$, which is denoted by $M^{1/2}$.
(b) $M$ has positive [nonnegative] diagonal entries $M_{ii}$ ($i \in I$).

In the set of Hermitian matrices from $\mathbb{K}^{I\times I}$ a semi-ordering can be defined via
\[ A \le B \quad :\Longleftrightarrow\quad B - A \text{ positive semidefinite.} \tag{2.12} \]
When we write $A \le B$, we always tacitly assume that $A$ and $B$ are Hermitian.

Remark 2.15. Let $A, B \in \mathbb{K}^{I\times I}$ be Hermitian.
(a) $A \le B$ is equivalent to $\langle Ax, x\rangle \le \langle Bx, x\rangle$ for all $x \in \mathbb{K}^I$.
(b) For any matrix $T \in \mathbb{K}^{I\times J}$ the inequality $A \le B$ implies $T^H A T \le T^H B T$.
(c) $A \le B$ implies $\operatorname{trace}(A) \le \operatorname{trace}(B)$.
(d) $A \le B$ implies $\operatorname{trace}(T^H A T) \le \operatorname{trace}(T^H B T)$ for all $T$.

Lemma 2.16. For $0 \le \hat{E} \le E \in \mathbb{K}^{I\times I}$ and arbitrary $C_i \in \mathbb{K}^{J\times K}$ ($i \in I$), we have
\[ 0 \le \hat{X} := \sum_{i,j\in I} \hat{E}_{ij}\, C_i C_j^H \;\le\; X := \sum_{i,j\in I} E_{ij}\, C_i C_j^H \in \mathbb{K}^{J\times J} . \]

Proof. Diagonalisation $E - \hat{E} = U \operatorname{diag}\{\lambda_k : k \in I\}\, U^H$ holds with $\lambda_k \ge 0$. Set $B_k := \sum_{i\in I} U_{ik} C_i$. Then $X - \hat{X} = \sum_{k\in I} \lambda_k B_k B_k^H$ proves $\hat{X} \le X$ because $\lambda_k \ge 0$ and $B_k B_k^H \ge 0$. □

A tuple $x := (x_i : i \in I)$ of vectors⁷ $x_i \in \mathbb{K}^J$ leads to the scalar products $\langle x_j, x_i\rangle$ for all $i, j \in I$. Then the Gram matrix of $x$ is defined by
\[ G := G(x) = \bigl( \langle x_j, x_i \rangle \bigr)_{i,j\in I} . \tag{2.13} \]

Exercise 2.17. (a) Gram matrices are always positive semidefinite.
(b) The Gram matrix $G(x)$ is positive definite if and only if $x$ is a tuple of linearly independent vectors.
(c) Any positive-definite matrix $G \in \mathbb{K}^{I\times I}$ can be interpreted as a Gram matrix of a basis $x := (x_i : i \in I)$ of $\mathbb{K}^I$ by defining a scalar product via $\langle v, w\rangle := b^H G a$ for $v = \sum_{i\in I} a_i x_i$ and $w = \sum_{i\in I} b_i x_i$.

⁷ Here, $\mathbb{K}^J$ can also be replaced by an infinite dimensional Hilbert space.
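Lemma 2.18. The spectral norm of $G(x)$ can be characterised by
\[ \|G(x)\|_2 = \max\left\{ \Bigl\| \sum_{i\in I} \xi_i x_i \Bigr\|_2^2 : \xi_i \in \mathbb{K} \text{ with } \sum_{i\in I} |\xi_i|^2 = 1 \right\} . \]

Proof. Let $\xi := (\xi_i)_{i\in I} \in \mathbb{K}^I$. $\|G(x)\|_2 = \max\{ |\langle G\xi, \xi\rangle| : \|\xi\| = 1 \}$ holds since $G(x)$ is Hermitian. $\langle G\xi, \xi\rangle = \sum_{i,j} \langle x_j, x_i\rangle\, \xi_j \overline{\xi_i} = \bigl\langle \sum_{j\in I} \xi_j x_j, \sum_{i\in I} \xi_i x_i \bigr\rangle = \bigl\| \sum_{i\in I} \xi_i x_i \bigr\|_2^2$ proves the assertion. □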
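The Gram matrix (2.13) and the characterisation in Lemma 2.18 can be illustrated as follows. This is a minimal sketch, assuming NumPy and a hypothetical random tuple of three vectors in $\mathbb{C}^5$ stored as the columns of `X`; the maximising coefficient vector `xi` is taken from the singular-value decomposition of `X`, which is one possible choice and not part of the lemma.

```python
import numpy as np

rng = np.random.default_rng(2)

# Columns of X are the vectors x_1, x_2, x_3 (illustrative random data).
X = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

# Gram matrix: G[i, j] = <x_j, x_i> = x_i^H x_j, i.e., G = X^H X.
G = X.conj().T @ X

# Exercise 2.17: G is Hermitian with nonnegative eigenvalues.
eigvals = np.linalg.eigvalsh(G)

# Lemma 2.18: a unit vector xi with || sum_i xi_i x_i ||^2 = ||G||_2.
# Since X = U diag(s) Vh, the first right singular vector attains the maximum.
_, s, Vh = np.linalg.svd(X)
xi = Vh[0].conj()
```

The identity behind the last step is $\|G(x)\|_2 = \|X\|_2^2$, since $\sum_i \xi_i x_i = X\xi$.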
2.5 Matrix Decompositions

Three well-known decompositions will be recalled. The numbers of arithmetic operations⁸ given below are reduced to the leading term, i.e., terms of lower order are omitted.

⁸ Here, we count all arithmetical operations ($+$, $-$, $*$, $/$, $\sqrt{\ }$, etc.) equally. Sometimes, the combination of one multiplication and one addition is counted as one unit (‘flop’, cf. [36, p. 43]).

2.5.1 Cholesky Decomposition

Remark 2.19. Given a positive-definite matrix $M \in \mathbb{K}^{n\times n}$, there is a unique lower triangular matrix $L \in \mathbb{K}^{n\times n}$ with positive diagonal entries such that $M = LL^H$. Computing $L$ costs $\frac{1}{3} n^3$ operations. Matrix-vector multiplications $La$ or $L^H a$ or the solution of linear systems $Lx = a$ or $L^H x = b$ require $n^2$ operations.

For semidefinite matrices (or positive-definite matrices with rather small eigenvalues) there are pivotised versions such that $M$ is equal to $P^H L L^H P$ with a permutation matrix $P$ and the side condition $L_{ii} \ge 0$, instead of $L_{ii} > 0$ (cf. Hackbusch [138, §9.4.6]).

As explained below, the Cholesky decomposition can be used to orthonormalise the $m$-tuple $x := (x_i)_{1\le i\le m}$ of vectors $x_i \in \mathbb{K}^n$. We associate the tuple $x$ with the Gram matrix $G(x) = X^H X$ of $X := [x_1\ x_2 \cdots x_m] \in \mathbb{K}^{n\times m}$ (cf. (2.13)).

Lemma 2.20. (a) If $x$ is linearly independent, the Cholesky decomposition $G(x) = LL^H$ exists. Then $Y := XL^{-H}$ is an orthogonal matrix, i.e., its columns form an orthonormal $m$-tuple $y := (y_i)_{1\le i\le m}$.
(b) If the vectors $x$ are linearly dependent, the pivotised version of the Cholesky decomposition yields $P G(x) P^H = LL^H$ with a permutation matrix $P$ and a rectangular matrix $L = \begin{bmatrix} L' \\ L'' \end{bmatrix} \in \mathbb{K}^{m\times m'}$ ($m' < m$), where $L' \in \mathbb{K}^{m'\times m'}$ is lower triangular with a positive diagonal. Split $XP^H$ into $[X'\ X'']$ with $X' \in \mathbb{K}^{n\times m'}$. The orthonormal $m'$-tuple $y$ is defined by the columns of the orthogonal matrix $Y = X' L'^{-H} \in \mathbb{K}^{n\times m'}$.

Proof. Part (a) can be considered as the particular case of (b) with $m' = m$ and $P = I$. Comparing $LL^H$ and $P G(x) P^H = (XP^H)^H (XP^H) = [X'\ X'']^H [X'\ X'']$ shows that $L' L'^H = X'^H X'$. Hence the assertion follows from the identity $Y^H Y = L'^{-1} X'^H X' L'^{-H} = L'^{-1} L' L'^H L'^{-H} = I$. □

Remark 2.21. The cost of the procedure described in Lemma 2.20a consists of $nm(m+1)$ operations for $X \mapsto G(x)$, $\frac{1}{3} m^3$ operations for $G(x) \mapsto L$, and $m^2 n$ operations for computing $Y = XL^{-H}$.
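The orthonormalisation of Lemma 2.20a can be sketched in a few lines. The following assumes NumPy and linearly independent columns; the function name `cholesky_orthonormalise` is hypothetical. `np.linalg.cholesky` returns the lower triangular factor $L$ with $G = LL^H$, and $Y = XL^{-H}$ is obtained from the equivalent system $L Y^H = X^H$ (a general solver is used here for brevity; a triangular solver would match the operation count of Remark 2.21).

```python
import numpy as np


def cholesky_orthonormalise(X):
    """Return (Y, L) with G = X^H X = L L^H and Y = X L^{-H} (Lemma 2.20a).

    Assumes that the columns of X are linearly independent, so that the
    Gram matrix G is positive definite.
    """
    G = X.conj().T @ X                  # Gram matrix G(x) = X^H X
    L = np.linalg.cholesky(G)           # lower triangular, positive diagonal
    # Y = X L^{-H}  <=>  L Y^H = X^H
    Y = np.linalg.solve(L, X.conj().T).conj().T
    return Y, L
```

Since the condition number of $G = X^H X$ is the square of that of $X$, this approach can lose accuracy for nearly dependent columns; in such cases the QR decomposition of §2.5.2 is the more robust alternative.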
2.5.2 QR Decomposition

The letter ‘R’ in ‘QR decomposition’ stands for a right (or upper) triangular matrix. Since an upper triangular matrix $R$ is defined by $R_{ij} = 0$ for all $i > j$, this requires suitably ordered index sets. The QR decomposition (or ‘QR factorisation’) is a helpful tool for orthogonalisation (cf. [247, §3.4.3], [115, §5.2]) and can be viewed as an algebraic formulation of the Gram–Schmidt⁹ orthogonalisation. For details about different variants and their numerical stability we recommend the book of Björck [36].

Lemma 2.22 (QR factorisation). Let $M \in \mathbb{K}^{n\times m}$.
(a) Then there are a unitary matrix $Q \in \mathbb{K}^{n\times n}$ and an upper triangular matrix $R \in \mathbb{K}^{n\times m}$ with
\[ M = QR \qquad (Q \text{ unitary, } R \text{ upper triangular matrix}). \tag{2.14a} \]
$Q$ can be constructed as a product of Householder transforms (cf. [276, §4.7]). The computational work is $2mn \min(n,m) - \frac{2}{3} \min(n,m)^3$ for calculating $R$ (while $Q$ is defined implicitly as a product of Householder matrices), and $\frac{4}{3} n^3$ for forming $Q$ explicitly as a matrix (cf. [115, §5.2.1]).
(b) If $n > m$, the matrix $R$ has the block structure $\begin{bmatrix} R' \\ 0 \end{bmatrix}$ containing the submatrix $R'$ as an upper triangular matrix of the size $m \times m$. The corresponding block decomposition $Q = [\, Q'\ Q'' \,]$ yields the reduced QR factorisation
\[ M = Q'R' \qquad (Q' \in \mathbb{K}^{n\times m},\ R' \in \mathbb{K}^{m\times m}). \tag{2.14b} \]
The computational work is¹⁰
\[ N_{QR}(n,m) := 2nm^2 \qquad \text{(cf. [115, Alg. 5.2.5], [247, §3.4]).} \]
(c) If $r := \operatorname{rank}(M) < \min\{n, m\}$, the sizes of $Q'$ and $R'$ can be further reduced:
\[ M = Q'R' \qquad (Q' \in \mathbb{K}^{n\times r},\ R' \in \mathbb{K}^{r\times m}). \tag{2.14c} \]

⁹ A modified Gram–Schmidt algorithm was already derived by Laplace in 1816 (see reference in Björck [36, p. 61] together with additional remarks concerning history).
¹⁰ Half of the cost of $N_{QR}(n,m)$ is needed for $\frac{1}{2}(m^2 + m)$ scalar products. The rest is used for scaling and adding column vectors.

In particular, if $M$ does not possess full rank as in Part (c) of the lemma above, we want $R'$ in (2.14c) to be of the form
\[ R' = [R_1'\ R_2'], \qquad R_1' \in \mathbb{K}^{r\times r} \text{ upper triangular}, \qquad \operatorname{rank}(R_1') = r, \tag{2.14d} \]
i.e., the diagonal entries of $R_1'$ do not vanish. This form of $R'$ can be achieved if and only if the part $(M_{ij})_{1\le i,j\le r}$ of $M$ has also rank $r$. Otherwise we need a suitable permutation $M \mapsto MP$ of the columns of $M$. Then the factorisation takes the form
\[ MP = Q' [R_1'\ R_2'] \qquad (P \text{ permutation matrix, } Q', R_1', R_2' \text{ in (2.14c,d)}). \tag{2.15} \]
An obvious pivot strategy for a matrix $M \in \mathbb{K}^{n\times m}$ with $r = \operatorname{rank}(M)$ is the Gram–Schmidt orthogonalisation in the following form (cf. [115, §5.4.1]).
(1) Let $m_i \in \mathbb{K}^n$ ($1 \le i \le m$) be the $i$-th columns of $M$.
(2) For $i := 1$ to $r$ do
(2a) Choose $k \in \{i, \dots, m\}$ such that $\|m_k\| = \max\{\|m_\nu\| : i \le \nu \le m\}$. If $k \ne i$, interchange the columns $m_i$ and $m_k$.
(2b) Now $m_i$ has maximal norm. Normalise: $m_i := m_i / \|m_i\|$. Store $m_i$ as $i$-th column of the matrix $Q$.
(2c) Set $m_k := m_k - \langle m_k, m_i\rangle\, m_i$ for $i + 1 \le k \le m$.

Here $\|\cdot\|$ is the Euclidean norm and $\langle\cdot,\cdot\rangle$ the corresponding scalar product. The column exchanges in Step (2a) lead to the permutation matrix¹¹ $P$ in (2.15). The operations in Step (2b) and Step (2c) define $[R_1'\ R_2']$. The presupposition $r = \operatorname{rank}(M)$ guarantees that all $m_i$ appearing in Step (2b) do not vanish, while $m_i = 0$ ($r + 1 \le i \le m$) holds after the $r$-th iteration for the remaining columns.

In usual applications, the rank is unknown. In that case we may introduce a tolerance $\tau > 0$ and redefine Step (2b) as follows:
(2b’) If $\|m_i\| \le \tau$ set $r := i - 1$ and terminate. Otherwise proceed as in Step (2b).

The principle of the QR decomposition can be generalised to tuples $V^m$, where the column vectors from $\mathbb{K}^n$ are replaced with functions from the space $V$ (cf. Trefethen [283]).

¹¹ A permutation matrix $P \in \mathbb{K}^{r\times r}$ (corresponding to a permutation $\pi : \{1, \dots, r\} \to \{1, \dots, r\}$) is defined by $(Pv)_i = v_{\pi(i)}$. Any permutation matrix $P$ is unitary.
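Steps (1)–(2c) with the stopping rule (2b’) can be sketched as follows. This is an illustration assuming NumPy, not a production QR routine; the function name `pivoted_gram_schmidt`, the default tolerance, and the representation of the permutation as an index array `perm` (so that `M[:, perm]` plays the role of $MP$) are assumptions of this sketch.

```python
import numpy as np


def pivoted_gram_schmidt(M, tau=1e-12):
    """Column-pivoted Gram-Schmidt following Steps (1)-(2c) with rule (2b').

    Returns (Q, R, perm) with Q of size n x r, R of size r x m, and
    M[:, perm] approximately equal to Q @ R, where r is the detected rank.
    """
    W = np.array(M, dtype=complex if np.iscomplexobj(M) else float)  # columns m_i
    n, m = W.shape
    kmax = min(n, m)
    perm = np.arange(m)
    Q = np.zeros((n, kmax), dtype=W.dtype)
    R = np.zeros((kmax, m), dtype=W.dtype)
    r = 0
    for i in range(kmax):
        # (2a) choose the remaining column of maximal norm and swap it to position i
        k = i + int(np.argmax(np.linalg.norm(W[:, i:], axis=0)))
        if k != i:
            W[:, [i, k]] = W[:, [k, i]]
            R[:, [i, k]] = R[:, [k, i]]      # keep earlier coefficients aligned
            perm[[i, k]] = perm[[k, i]]
        # (2b') terminate if the pivot column is numerically zero
        nrm = np.linalg.norm(W[:, i])
        if nrm <= tau:
            break
        # (2b) normalise and store as i-th column of Q
        Q[:, i] = W[:, i] / nrm
        R[i, i] = nrm
        # (2c) subtract the components <m_k, q_i> q_i from the remaining columns
        coeffs = Q[:, i].conj() @ W[:, i + 1:]
        R[i, i + 1:] = coeffs
        W[:, i + 1:] -= np.outer(Q[:, i], coeffs)
        r = i + 1
    return Q[:, :r], R[:r, :], perm
```

The first $r$ columns of the returned `R` form the upper triangular block $R_1'$ with positive diagonal, and the remaining columns form $R_2'$ in (2.15). For ill-conditioned data, Householder-based routines with column pivoting are generally the more stable choice.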
2.5.3 Singular-Value Decomposition

2.5.3.1 Definition and Computational Cost

The singular value decomposition (abbreviation: SVD) is the generalisation of the diagonalisation of square matrices (cf. [247, §1.9]).

Lemma 2.23 (SVD). (a) Let $M \in \mathbb{K}^{n\times m}$ be any matrix. Then there are unitary matrices $U \in \mathbb{K}^{n\times n}$, $V \in \mathbb{K}^{m\times m}$, and a diagonal rectangular matrix $\Sigma \in \mathbb{R}^{n\times m}$,
\[ \Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \ddots & \vdots & \vdots & & \vdots \\ \vdots & \ddots & \ddots & 0 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & \sigma_n & 0 & \cdots & 0 \end{bmatrix} \qquad \begin{pmatrix} \text{illustration for the case} \\ \text{of } n \le m \end{pmatrix}, \tag{2.16a} \]
with so-called singular values¹²
\[ \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_i = \Sigma_{ii} \ge \dots \ge 0 \qquad (1 \le i \le \min\{n, m\}) \]
such that¹³
\[ M = U \Sigma V^T . \tag{2.16b} \]
The columns of $U$ are the left singular vectors, the columns of $V$ are the right singular vectors.
(b) The spectral norm of $M$ has the value $\|M\|_2 = \sigma_1$.
(c) The Frobenius norm of $M$ is equal to
\[ \|M\|_F = \sqrt{ \sum_{i=1}^{\min\{n,m\}} \sigma_i^2 } . \tag{2.16c} \]
Proof. Assume without loss of generality that $n \le m$ and set $A := MM^H \in \mathbb{K}^{n\times n}$. Diagonalise the positive-semidefinite matrix, $A = UDU^H$ with $U \in \mathbb{K}^{n\times n}$ unitary, where the (nonnegative) eigenvalues in $D = \operatorname{diag}\{d_1, \dots, d_n\} \in \mathbb{R}^{n\times n}$ are ordered by size: $d_1 \ge d_2 \ge \dots \ge 0$. Defining $\sigma_i := \sqrt{d_i}$ in (2.16a), we rewrite $D = \Sigma\Sigma^T = \Sigma\Sigma^H$. With $W := M^H U = [w_1, \dots, w_n] \in \mathbb{K}^{m\times n}$ we have
\[ D = U^H A U = U^H M M^H U = W^H W . \]
Hence the columns $w_i$ of $W$ are pairwise orthogonal and $w_i^H w_i = d_i = \sigma_i^2$. Next, we are looking for a unitary matrix $V = [v_1, \dots, v_m] \in \mathbb{K}^{m\times m}$ with
\[ W = \overline{V}\, \Sigma^T, \quad\text{i.e., } w_i = \sigma_i \overline{v_i} \qquad (1 \le i \le n) \]
(note that the complex conjugate values $\overline{v_i}$ are used¹³). Let $r := \max\{i : \sigma_i > 0\}$. For $1 \le i \le r$, the condition above leads to $\overline{v_i} := \frac{1}{\sigma_i} w_i$; i.e., $v_i$ is normalised: $v_i^H v_i = 1$. Since the vectors $w_i$ of $W$ are already pairwise orthogonal, the vectors $\{v_i : 1 \le i \le r\}$ are orthonormal. For $r + 1 \le i \le n$, $\sigma_i = 0$ implies $w_i = 0$. Hence $w_i = \sigma_i \overline{v_i}$ holds for any choice for $v_i$. To obtain a unitary matrix $V$, we may choose any orthonormal extension $\{v_i : r + 1 \le i \le m\}$ of $\{v_i : 1 \le i \le r\}$. The relation $W = \overline{V}\,\Sigma^T$ (with $\Sigma^T = \Sigma^H$) implies $W^H = \Sigma V^T$. By the definition of $W$ we have $M = UW^H = U\Sigma V^T$ so that (2.16b) is proved.

Exercises 2.10b and 2.12a imply that $\|M\|_2 = \|\Sigma\|_2$ and $\|M\|_F = \|\Sigma\|_F$, proving the parts (b) and (c). □

¹² For indices $\ell > \min\{\#I, \#J\}$ we formally define $\sigma_\ell := 0$.
¹³ The usual formulation is $M = U\Sigma V^H$ with the Hermitean transposed $V^H$. Here we use $V^T$ also for $\mathbb{K} = \mathbb{C}$ because of Remark 1.3a.

If $n < m$, the last $m - n$ columns of $V$ are multiplied by the zero part of $\Sigma$. Similarly, for $n > m$, certain columns of $U$ are not involved in the representation of $M$. Reduction to the first $\min\{n, m\}$ columns yields the following result.

Corollary 2.24. (a) Let $u_i \in \mathbb{K}^I$ and $v_i \in \mathbb{K}^J$ be the (orthonormal) columns of $U$ and $V$, respectively. Then the statement $M = U\Sigma V^T$ in (2.16b) is equivalent to
\[ M = \sum_{i=1}^{\min\{n,m\}} \sigma_i u_i v_i^T . \tag{2.17} \]
The computational cost is about
\[ N_{SVD}(n, m) := \min\{ 14nmN + 8N^3,\ 6nmN + 20N^3 \}, \]
where $N := \min\{n, m\}$ (cf. [115, §5.4.5]).
(b) The decomposition (2.17) is not unique. Let $\sigma_i = \sigma_{i+1} = \dots = \sigma_{i+k-1}$ ($k \ge 2$) be a $k$-fold singular value. The part $\sum_{j=i}^{i+k-1} \sigma_j u_j v_j^T$ in (2.17) is equal to
\[ \sigma_i\, [u_i, \dots, u_{i+k-1}]\, [v_i, \dots, v_{i+k-1}]^T . \]
For any unitary $k \times k$ matrix $Q$, the transformed vectors
\[ [\hat u_i, \dots, \hat u_{i+k-1}] := [u_i, \dots, u_{i+k-1}]\, Q \quad\text{and}\quad [\hat v_i, \dots, \hat v_{i+k-1}] := [v_i, \dots, v_{i+k-1}]\, \overline{Q} \]
yield the same sum $\sum_{j=i}^{i+k-1} \sigma_j \hat u_j \hat v_j^T$. Even in the case $k = 1$ of a simple singular value, each pair $u_i$, $v_i$ of columns may be changed into $\hat u_i := z u_i$, $\hat v_i := \frac{1}{z} v_i$ with $z \in \mathbb{K}$ and $|z| = 1$.
(c) Often the subspace $\operatorname{span}\{u_i : 1 \le i \le r\}$ for some $r \le \min\{n, m\}$ is of interest. This space is uniquely determined if and only if $\sigma_r > \sigma_{r+1}$. The same statement holds for $\operatorname{span}\{v_i : 1 \le i \le r\}$.

Remark 2.25. The orthonormality of $\{u_i\}$ and $\{v_i\}$ corresponds to the usual Euclidean scalar product $\langle x, y\rangle = y^H x$. If $\mathbb{K}^{I_i}$ ($i = 1, 2$) are endowed with other scalar products $\langle\cdot,\cdot\rangle_i$, there are positive definite matrices $W_i$ such that $\langle x, y\rangle_i = (W_i y)^H (W_i x)$. Given a matrix $M \in \mathbb{K}^{I_1\times I_2}$, perform the usual SVD of $\hat M := W_1 M W_2 = \sum_\nu \sigma_\nu \hat u_\nu \hat v_\nu^T$. Then $M = \sum_\nu \sigma_\nu u_\nu v_\nu^T$ holds with $\langle\cdot,\cdot\rangle_1$-orthonormal vectors $u_\nu := W_1^{-1} \hat u_\nu$ and $\langle\cdot,\cdot\rangle_2$-orthonormal vectors $v_\nu := W_2^{-1} \hat v_\nu$.

Next, we consider a convergent sequence $M^{(\nu)} \to M$ of matrices together with the singular-value decompositions $M^{(\nu)} = U^{(\nu)} \Sigma^{(\nu)} V^{(\nu)T}$ and $M = U\Sigma V^T$.

Exercise 2.26. Prove the following. (a) Let $M^{(\nu)} = U^{(\nu)} \Sigma^{(\nu)} V^{(\nu)T} \in \mathbb{K}^{n\times m}$ be the singular-value decompositions of $M^{(\nu)} \to M$. Then there is a subsequence $\{\nu_i : i \in \mathbb{N}\} \subset \mathbb{N}$ such that
\[ U^{(\nu_i)} \to U, \qquad \Sigma^{(\nu_i)} \to \Sigma, \qquad V^{(\nu_i)} \to V, \qquad M = U\Sigma V^T . \]
(b) Subsequences of the spaces $S_r^{(\nu)} := \operatorname{span}\{u_i^{(\nu)} : 1 \le i \le r\}$ converge to $S_r := \operatorname{span}\{u_i : 1 \le i \le r\}$, where $u_i^{(\nu)}$ and $u_i$ are the columns of $U^{(\nu)}$ and $U$ from Part (a).
2.5.3.2 Reduced and One-Sided Singular-Value Decompositions

If $M$ is not of full rank, there are singular values $\sigma_i = 0$ so that further terms can be omitted from the sum in (2.17). Let¹⁴ $r := \max\{i : \sigma_i > 0\} = \operatorname{rank}(M)$ as in the proof above. Then (2.17) can be rewritten as
\[ M = \sum_{i=1}^{r} \sigma_i u_i v_i^T \quad\text{with}\quad \{u_i\}_{i=1}^{r},\ \{v_i\}_{i=1}^{r} \text{ orthonormal},\quad \sigma_1 \ge \dots \ge \sigma_r > 0, \tag{2.18} \]
where only nonzero terms appear. The corresponding matrix formulation is
\[ M = U' \Sigma' V'^T \quad\text{with}\quad \begin{cases} U' = [u_1, \dots, u_r] \in \mathbb{K}^{n\times r} \text{ orthogonal}, \\ V' = [v_1, \dots, v_r] \in \mathbb{K}^{m\times r} \text{ orthogonal}, \\ \Sigma' = \operatorname{diag}\{\sigma_1, \dots, \sigma_r\} \in \mathbb{R}^{r\times r},\ \sigma_1 \ge \dots \ge \sigma_r > 0. \end{cases} \tag{2.19} \]

¹⁴ If $\sigma_i = 0$ for all $i$, set $r := 0$ (empty sum). This happens for $M = 0$.
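The reduced decomposition (2.19) can be obtained from a library SVD by discarding the (numerically) vanishing singular values. The following is a minimal sketch, assuming NumPy; the function name `reduced_svd` and the relative threshold `rel_tol` are assumptions of this illustration, since exact zeros rarely occur in floating-point arithmetic. Note that `np.linalg.svd` returns a factor `Vh` with $M = U \operatorname{diag}(s)\, V_h$, so that the matrix $V$ of the convention $M = U\Sigma V^T$ used here corresponds to `Vh.T`.

```python
import numpy as np


def reduced_svd(M, rel_tol=1e-12):
    """Return (U1, s1, V1) with M ~ U1 @ diag(s1) @ V1.T and s1 > 0 (cf. (2.19)).

    Singular values below rel_tol * s[0] are treated as zero (an assumption
    of this sketch); for M = 0 the rank is r = 0 and all factors are empty.
    """
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    r = int(np.sum(s > rel_tol * s[0])) if s.size and s[0] > 0 else 0
    return U[:, :r], s[:r], Vh[:r, :].T
```

With the returned singular values, Lemma 2.23b,c can be checked directly: the spectral norm equals `s1[0]` and the Frobenius norm equals the Euclidean norm of `s1`.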
Definition 2.27 (reduced SVD). The identities (2.18) or (2.19) are called the reduced singular-value decomposition (since the matrices $U$, $\Sigma$, $V$ in (2.16b) are reduced to the essential nonzero part).

There are cases, in particular when $m \gg n$, in which one is only interested in the left singular vectors $u_i$ and the singular values $\sigma_i$ in (2.18) or, equivalently, only in $U'$ and $\Sigma'$ in (2.19). Then we say that we need the left-sided singular-value decomposition. The proof of Lemma 2.23 has already shown how to solve for $U'$ and $\Sigma'$ only:
(1) Set $A := M M^H \in \mathbb{K}^{n\times n}$.
(2) Diagonalise $A = U D U^H$ with the nonnegative diagonal matrix $D = \operatorname{diag}\{d_1, \dots, d_n\} \in \mathbb{R}^{n\times n}$, $d_1 \ge d_2 \ge \dots \ge 0$.
(3) Set $r := \max\{i : d_i > 0\}$, $\sigma_i := \sqrt{d_i}$, and $\Sigma' := \operatorname{diag}\{\sigma_1, \dots, \sigma_r\}$.
(4) Restrict $U$ to the first $r$ columns: $U' = [u_1, \dots, u_r]$.

Remark 2.28. (a) Steps (1–4) from above define the matrices $U'$ and $\Sigma'$ in (2.19). The third matrix $V'$ is theoretically available via $V' = M^H U' (\Sigma')^{-1}$. The product $M M^H$ in Step 1 requires computing $\frac{n(n+1)}{2}$ scalar products $\langle m_i, m_j\rangle$ ($i, j \in I$), involving the rows $m_i := M[i, \bullet] \in \mathbb{K}^J$ of $M$. The computational cost for these scalar products will crucially depend on the underlying data structure (cf. Remark 7.16). Steps (2–4) are independent of the size of $J$. Their cost is asymptotically $\frac{8}{3} n^3$ (cf. Golub–Van Loan [115, §8.3.1]).
(b) The knowledge of $U'$ suffices to define $\hat M := U'^H M$. $\hat M$ has orthogonal rows $\hat m_i$ ($1 \le i \le r$) which are ordered by size: $\|\hat m_1\| = \sigma_1 \ge \|\hat m_2\| = \sigma_2 \ge \dots > 0$.

Proof. Let $M = U' \Sigma' V'^T$ be the reduced singular-value decomposition. Since $U'^H U' = I \in \mathbb{K}^{r\times r}$, Part (b) defines $\hat M := U'^H M = \Sigma' V'^T$. It follows that $\hat M \hat M^H = \Sigma' V'^T (\Sigma' V'^T)^H = \Sigma'^2$, i.e., $\langle \hat m_i, \hat m_j\rangle = 0$ for $i \ne j$ and $\|\hat m_i\| = \sigma_i$. □

The analogously defined right-sided singular-value decomposition of $M$ is identical to the left-sided singular-value decomposition of the transposed matrix $M^T$ since $M = U' \Sigma' V'^T \iff M^T = V' \Sigma' U'^T$.
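Steps (1–4) can be followed by hand on a small real matrix. The matrix $M$ below is an illustrative assumption chosen for easy arithmetic; it does not appear in the text.

```latex
\begin{align*}
M &= \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \in \mathbb{R}^{2\times 3},
&
A &= M M^{H} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \\
d_1 &= 3,\quad u_1 = \tfrac{1}{\sqrt 2}\begin{bmatrix} 1 \\ 1 \end{bmatrix},
&
d_2 &= 1,\quad u_2 = \tfrac{1}{\sqrt 2}\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \\
r &= 2,\quad \Sigma' = \operatorname{diag}\{\sqrt 3,\, 1\},
&
\hat M &= U'^{H} M = \tfrac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 & 2 \\ 1 & -1 & 0 \end{bmatrix}.
\end{align*}
% Check of Remark 2.28b: the rows of \hat M are orthogonal,
% (1)(1) + (1)(-1) + (2)(0) = 0, with norms \sqrt{6/2} = \sqrt 3 and \sqrt{2/2} = 1.
```

Only the $2\times 2$ matrix $A$ had to be diagonalised, although $M$ has three columns; this is the saving that motivates the left-sided variant when $m \gg n$.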
2.5.3.3 Inequalities of Singular Values

Finally, we discuss estimates about eigenvalues and singular values of perturbed matrices. The following lemma states the characterisation of eigenvalues by Fischer [102] and Courant [65]. For a general matrix $A \in \mathbb{K}^{n\times n}$ we denote its eigenvalues corresponding to their multiplicity by $\lambda_k(A)$. If $\lambda_k(A) \in \mathbb{R}$, we order the eigenvalues such that $\lambda_k(A) \ge \lambda_{k+1}(A)$. Formally, we set $\lambda_k(A) := 0$ for $k > n$.
Remark 2.29. For matrices $A \in \mathbb{K}^{n\times m}$ and $B \in \mathbb{K}^{m\times n}$ the identity $\lambda_k(AB) = \lambda_k(BA)$ is valid. If $A$ and $B$ are positive semidefinite, the eigenvalues $\lambda_k(AB)$ are nonnegative.

Proof. (i) If $e \ne 0$ is an eigenvector of $AB$ with nonzero eigenvalue $\lambda$, the vector $Be$ does not vanish. Then $(BA)(Be) = B(AB)e = B(\lambda e) = \lambda(Be)$ proves that $Be$ is an eigenvector of $BA$ for the same $\lambda$. Hence $AB$ and $BA$ share the same nonzero eigenvalues, while $\lambda_k(AB) = \lambda_k(BA) = 0$ for $k > \operatorname{rank}(AB)$.
(ii) If $B \ge 0$, the square root $B^{1/2}$ is defined (cf. Remark 2.14). Part (i) shows that $\lambda_k(AB) = \lambda_k(A B^{1/2} B^{1/2}) = \lambda_k(B^{1/2} A B^{1/2})$. The latter matrix is positive semidefinite, proving $\lambda_k(B^{1/2} A B^{1/2}) \ge 0$. □

Lemma 2.30. Let the matrix $A \in \mathbb{K}^{n\times n}$ be positive semidefinite. Then the eigenvalues $\lambda_1(A) \ge \dots \ge \lambda_n(A) \ge 0$ can be characterised by
\[ \lambda_k(A) = \min_{\substack{\mathcal{V} \subset \mathbb{K}^n \text{ subspace with}\\ \dim(\mathcal{V}) \le k-1}} \ \max_{\substack{x \in \mathbb{K}^n \text{ with}\\ x^H x = 1 \text{ and } x \perp \mathcal{V}}} x^H A x. \tag{2.20} \]
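Proof. $A$ can be diagonalised: $A = U \Lambda U^H$ with $\Lambda_{ii} = \lambda_i$. Since $x^H A x = y^H \Lambda y$ for $y = U^H x$, the assertion can also be stated in the form
\[ \lambda_k = \min_{\mathcal{W} \text{ with } \dim(\mathcal{W}) \le k-1} \max\{ y^H \Lambda y : y \in \mathbb{K}^n \text{ with } y^H y = 1,\ y \perp \mathcal{W} \} \]
($\mathcal{W} = U^H \mathcal{V}$). Fix $\mathcal{W}$ with $\dim(\mathcal{W}) \le k-1$. All $y \in \mathbb{K}^n$ with $y_i = 0$ for $i > k$ form a $k$-dimensional subspace $\mathcal{Y}$. Since $\dim(\mathcal{W}) \le k-1$, there is at least one vector $0 \ne y \in \mathcal{Y}$ with $y^H y = 1$, $y \perp \mathcal{W}$. Obviously, $y^H \Lambda y = \sum_{i=1}^{k} \lambda_i |y_i|^2 \ge \lambda_k \sum_{i=1}^{k} |y_i|^2 = \lambda_k$. The choice $\mathcal{W} = \{w \in \mathbb{K}^n : w_i = 0 \text{ for } k \le i \le n\}$ yields equality: $y^H \Lambda y = \lambda_k$. □

We use the notation $\lambda_k(A)$ for the $k$-th eigenvalue of a positive-semidefinite matrix $A$, where the eigenvalues are ordered by size (see Lemma 2.30). Similarly, $\sigma_k(A)$ denotes the $k$-th singular value of a general matrix $A$. Note that $\|\cdot\|_2$ is the spectral norm in (2.11).

Lemma 2.31. (a) Let $A, B \in \mathbb{K}^{n\times n}$ be two positive-semidefinite matrices. Then
\[ \lambda_k(A) \le \lambda_k(A + B) \le \lambda_k(A) + \|B\|_2 \qquad\text{for } 1 \le k \le n. \tag{2.21} \]
In particular, $0 \le A \le B$ implies $\lambda_k(A) \le \lambda_k(B)$ for $1 \le k \le n$.
(b) Let the matrices $A \in \mathbb{K}^{n\times m}$ and $B \in \mathbb{K}^{n\times m'}$ satisfy $A A^H \le B B^H$. Then the singular values¹⁵ $\sigma_k(A)$ and $\sigma_k(B)$ of both matrices are related by
\[ \sigma_k(A) \le \sigma_k(B) \qquad\text{for } 1 \le k \le n. \]
The same statement holds for $A \in \mathbb{K}^{m\times n}$ and $B \in \mathbb{K}^{m'\times n}$ with $A^H A \le B^H B$.
(c) Let $M \in \mathbb{K}^{n\times m}$ be any matrix, while $A \in \mathbb{K}^{n'\times n}$ and $B \in \mathbb{K}^{m\times m'}$ have to satisfy $A^H A \le I$ and $B^H B \le I$. Then¹⁵
\[ \sigma_k(AMB) \le \sigma_k(M) \qquad\text{for } k \in \mathbb{N}. \]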
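Proof. (i) $\lambda_k(A) \le \lambda_k(A + B)$ is a consequence of Remark 2.15 and Lemma 2.30.
(ii) Let $\mathcal{V}_A$ and $\mathcal{V}_{A+B}$ be the subspaces which are the minimisers in (2.20) for $A$ and $A + B$, respectively. Abbreviate the maximum in (2.20) over $x \in \mathbb{K}^n$ with $x^H x = 1$ and $x \perp \mathcal{V}$ by $\max_{\mathcal{V}}$. Then
\[ \lambda_k(A + B) = \max_{\mathcal{V}_{A+B}} x^H (A + B) x \le \max_{\mathcal{V}_A} x^H (A + B) x = \max_{\mathcal{V}_A} \bigl( x^H A x + x^H B x \bigr) \le \max_{\mathcal{V}_A} x^H A x + \max_{x^H x = 1} x^H B x = \lambda_k(A) + \|B\|_2. \]
(iii) For Part (b) use $\lambda_k(A) \le \lambda_k(A + B)$ with $A$ and $B$ replaced by $A A^H$ and $B B^H - A A^H$ in the case of $A A^H \le B B^H$. Otherwise use the fact that the eigenvalues of $X^H X$ and $X X^H$ coincide (cf. Remark 2.29).
(iv) Let $M' := AMB$ and use $\sigma_k(M')^2 = \lambda_k(M'^H M')$. Remark 2.15b implies that $M' (M')^H = A M B B^H M^H A^H \le A M M^H A^H$, so that $\lambda_k(M'^H M') \le \lambda_k(A M M^H A^H)$. Remark 2.29 states that $\lambda_k(A M M^H A^H) = \lambda_k(M^H A^H A M)$, and from $A^H A \le I$ we infer that $\lambda_k(M^H A^H A M) \le \lambda_k(M^H M) = \sigma_k(M)^2$. □

Let $n = n_1 + n_2$, $A \in \mathbb{K}^{n_1\times m}$ and $B \in \mathbb{K}^{n_2\times m}$. Then the agglomerated matrix $\begin{bmatrix} A \\ B \end{bmatrix}$ belongs to $\mathbb{K}^{n\times m}$. In the next lemma we compare singular values of $A$ and $\begin{bmatrix} A \\ B \end{bmatrix}$.

Lemma 2.32. For general $A \in \mathbb{K}^{n_1\times m}$ and $B \in \mathbb{K}^{n_2\times m}$, the singular values satisfy
\[ \sigma_k(A) \le \sigma_k\!\left( \begin{bmatrix} A \\ B \end{bmatrix} \right) \le \sqrt{\sigma_k^2(A) + \|B\|_2^2}. \]
The same estimate holds for $\sigma_k([A\ B])$ containing $A \in \mathbb{K}^{n\times m_1}$ and $B \in \mathbb{K}^{n\times m_2}$.

Proof. Use $\sigma_k^2(A) = \lambda_k(A^H A)$ and $\sigma_k^2\!\left( \begin{bmatrix} A \\ B \end{bmatrix} \right) = \lambda_k\!\left( [A^H\ B^H] \begin{bmatrix} A \\ B \end{bmatrix} \right) = \lambda_k(A^H A + B^H B)$ and apply (2.21): $\sigma_k^2\!\left( \begin{bmatrix} A \\ B \end{bmatrix} \right) \le \lambda_k(A^H A) + \|B^H B\|_2 = \sigma_k^2(A) + \|B\|_2^2$. □

Exercise 2.33. Prove
\[ \sigma_k(A) \le \sigma_k\!\left( \begin{bmatrix} A & C \\ B & \end{bmatrix} \right) \le \sqrt{\sigma_k^2(A) + \|B\|_2^2 + \|C\|_2^2}. \]

¹⁵ See Footnote 12 on page 33.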
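A small worked instance of Lemma 2.32 may help to see how tight the two bounds are. The matrices $A$ and $B$ below are assumptions chosen for hand computation, not data from the text.

```latex
\begin{align*}
A &= \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix},\qquad
B = \begin{bmatrix} 1 & 1 \end{bmatrix},\qquad
\|B\|_2^2 = 2, \\
A^{H}A + B^{H}B &= \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}
 + \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
 = \begin{bmatrix} 5 & 1 \\ 1 & 2 \end{bmatrix},\qquad
\lambda_{1,2} = \tfrac{1}{2}\bigl(7 \pm \sqrt{13}\bigr) \approx 5.303,\ 1.697, \\
k = 1:&\quad \sigma_1^2(A) = 4 \;\le\; 5.303 \;\le\; 4 + 2, \\
k = 2:&\quad \sigma_2^2(A) = 1 \;\le\; 1.697 \;\le\; 1 + 2 .
\end{align*}
```

In this instance neither bound is attained, which is consistent with the lemma: it only asserts the two-sided estimate.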
2.6 Low-Rank Approximation

Given a matrix $M$, we ask for a matrix $R \in \mathcal{R}_s$ of lower rank (i.e., $s < \operatorname{rank}(M)$) such that $\|M - R\|$ is minimised. The answer is given by¹⁶ Erhard Schmidt (1907) [257, §18]. In his paper, he studies the infinite singular-value decomposition for operators (cf. Theorem 4.137). The following finite case is a particular application.

Lemma 2.34. (a) Let $M, R \in \mathbb{K}^{n\times m}$ with $r := \operatorname{rank}(R)$. The singular values of $M$ and $M - R$ are denoted by $\sigma_i(M)$ and $\sigma_i(M - R)$, respectively. Then¹⁷
\[ \sigma_i(M - R) \ge \sigma_{r+i}(M) \qquad\text{for all } 1 \le i \le \min\{n, m\}. \tag{2.22} \]
(b) Let $s \in \{0, 1, \dots, \min\{n, m\}\}$. Use the singular-value decomposition $M = U \Sigma V^T$ to define
\[ R := U \Sigma_s V^T \quad\text{with}\quad (\Sigma_s)_{ij} = \begin{cases} \sigma_i & \text{for } i = j \le s,\\ 0 & \text{otherwise}, \end{cases} \tag{2.23a} \]
i.e., $\Sigma_s$ results from $\Sigma$ by replacing all singular values $\sigma_i = \Sigma_{ii}$ for $i > s$ with zero. Then the approximation error is
\[ \|M - R\|_2 = \sigma_{s+1} \quad\text{and}\quad \|M - R\|_F = \sqrt{\sum_{i=s+1}^{\min\{n,m\}} \sigma_i^2}. \tag{2.23b} \]
Inequality (2.22) becomes $\sigma_i(M - R) = \sigma_{s+i}(M)$.

Proof. (i) If $r + i > \min\{n, m\}$, (2.22) holds because $\sigma_{r+i}(M) = 0$. Therefore suppose $r + i \le \min\{n, m\}$.
(ii) First, $\sigma_i(M - R)$ is investigated for $i = 1$. $\lambda_{r+1}(M M^H) := \sigma_{r+1}^2(M)$ is the $(r+1)$-th eigenvalue of $A := M M^H$ (see proof of Lemma 2.23). The minimisation in (2.20) yields
\[ \sigma_{r+1}^2(M) \le \max\{ x^H A x : x \in \mathbb{K}^n \text{ with } x^H x = 1,\ x \perp \mathcal{V} \} \]
for any fixed subspace $\mathcal{V}$ of dimension $\le r$. Choose $\mathcal{V} := \ker(R^H)^\perp$. As $x \perp \mathcal{V}$ is equivalent to $x \in \ker(R^H)$, we conclude that
\[ x^H A x = x^H M M^H x = (M^H x)^H (M^H x) = \bigl((M - R)^H x\bigr)^H \bigl((M - R)^H x\bigr) = x^H (M - R)(M - R)^H x. \]
Application of (2.20) to the first eigenvalue $\lambda_1 = \lambda_1\bigl((M - R)(M - R)^H\bigr)$ of the matrix $(M - R)(M - R)^H$ shows that
\begin{align*}
\max\{ x^H A x : x \in \mathbb{K}^n \text{ with } x^H x = 1,\ x \perp \mathcal{V} \}
&= \max\{ x^H (M - R)(M - R)^H x : x^H x = 1,\ x \perp \mathcal{V} \} \\
&\le \max\{ x^H (M - R)(M - R)^H x : x \in \mathbb{K}^n \text{ with } x^H x = 1 \} \\
&= \lambda_1\bigl((M - R)(M - R)^H\bigr)
\end{align*}
(in the case of the first eigenvalue, the requirement $x \perp \mathcal{V}$ with $\dim(\mathcal{V}) = 0$ is an empty condition). Since again $\lambda_1\bigl((M - R)(M - R)^H\bigr) = \sigma_1^2(M - R)$, we have proved $\sigma_{r+1}^2(M) \le \sigma_1^2(M - R)$, which is statement (a) for $i = 1$.
(iii) For $i > 1$ choose $\mathcal{V} := \ker(R^H)^\perp + \mathcal{W}$, where $\mathcal{W}$ with $\dim(\mathcal{W}) \le i - 1$ is arbitrary. Analogous to part (ii), we obtain the bound
\[ \max\{ x^H (M - R)(M - R)^H x : x \in \mathbb{K}^n \text{ with } x^H x = 1,\ x \perp \mathcal{W} \}. \]
Minimisation over all $\mathcal{W}$ yields $\lambda_i\bigl((M - R)(M - R)^H\bigr) = \sigma_i^2(M - R)$.

¹⁶ Occasionally, this result is (incorrectly) attributed to Eckart–Young [84] (1936). Compare also the historical comments about SVD by Stewart [275].
¹⁷ See Footnote 12 on page 33.
(iv) The choice in (2.23a) eliminates the singular values $\sigma_1, \dots, \sigma_s$ so that $\sigma_i(M - R) = \sigma_{s+i}(M)$ for all $i \ge 1$. □

Using the notation $M = \sum_{i=1}^{r} \sigma_i u_i v_i^T$ in (2.18), we write $R$ as $\sum_{i=1}^{s} \sigma_i u_i v_i^T$. A connection with projections is given next.

Remark 2.35. $P_1^{(s)} := \sum_{i=1}^{s} u_i u_i^H$ and $P_2^{(s)} := \sum_{i=1}^{s} v_i v_i^H$ are the orthogonal projections onto $\operatorname{span}\{u_i : 1 \le i \le s\}$ and $\operatorname{span}\{v_i : 1 \le i \le s\}$, respectively. Then $R$ in (2.23a) can be written as $R = P_1^{(s)} M (P_2^{(s)})^T = P_1^{(s)} M = M (P_2^{(s)})^T$.

Conclusion 2.36 (best rank-k approximation). For $M \in \mathbb{K}^{n\times m}$ construct $R$ as in (2.23a) with $s = r$. Then $R$ is the solution of the following two minimisation problems:
\[ \min_{\operatorname{rank}(R) \le r} \|M - R\|_2 \qquad\text{and}\qquad \min_{\operatorname{rank}(R) \le r} \|M - R\|_F. \tag{2.24} \]
The values of the minima are given in (2.23b). The minimising element $R$ is unique if and only if $\sigma_r > \sigma_{r+1}$.

Proof. (i) Since $\|M - R'\|_2 = \sigma_1(M - R')$ and $\|M - R'\|_F^2 = \sum_{i>0} \sigma_i^2(M - R')$ follow from Lemma 2.23b,c, we obtain from Lemma 2.34a that
\[ \|M - R'\|_2 \ge \sigma_{r+1}(M), \qquad \|M - R'\|_F^2 \ge \sum_{i>r} \sigma_i^2(M) \qquad\text{for } R' \text{ with } \operatorname{rank}(R') \le r. \]
Since equality holds for $R' = R$, this is the solution of the minimisation problems.
(ii) If $\sigma_r = \sigma_{r+1}$, we may interchange the $r$-th and $(r+1)$-th columns in $U$ and $V$, obtaining another singular-value decomposition. Thus, another $R$ results. □

Next, we consider a convergent sequence $M^{(\nu)}$ and use Exercise 2.26.
Lemma 2.37. Consider $M^{(\nu)} \in \mathbb{K}^{n\times m}$ with $M^{(\nu)} \to M$. Then there are best approximations $R^{(\nu)}$ according to (2.24) so that a subsequence of $R^{(\nu)}$ converges to $R$, which is the best approximation to $M$.

Remark 2.38. The optimisation problems (2.24) can also be interpreted as the best approximation of the range of $M$:
\[ \max\{ \|P M\|_F : P \text{ orthogonal projection with } \operatorname{rank}(P) = r \}. \tag{2.25} \]

Proof. The best approximation $R \in \mathcal{R}_r$ to $M$ has the representation $R = P M$ for $P = P_1^{(r)}$ (cf. Remark 2.35). By orthogonality,
\[ \|P M\|_F^2 + \|(I - P) M\|_F^2 = \|M\|_F^2 \]
holds. Hence minimisation of $\|(I - P) M\|_F^2 = \|M - R\|_F^2$ is equivalent to maximising $\|P M\|_F^2$. □
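The statements (2.23b) and (2.25) can be checked on a diagonal matrix, where the singular-value decomposition is trivial. The matrix below is an illustrative assumption, not an example from the text.

```latex
\begin{align*}
M &= \operatorname{diag}\{3, 2, 1\},\qquad s = r = 1,\qquad
R = \operatorname{diag}\{3, 0, 0\},\qquad P = e_1 e_1^{H}, \\
\|M - R\|_2 &= 2 = \sigma_2(M),\qquad
\|M - R\|_F = \sqrt{2^2 + 1^2} = \sqrt 5, \\
\|PM\|_F^2 + \|(I-P)M\|_F^2 &= 9 + 5 = 14 = \|M\|_F^2 .
\end{align*}
% Any other rank-one projection P captures at most 9 of the total 14,
% so maximising \|PM\|_F and minimising \|M - PM\|_F select the same P.
```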
2.7 Linear Algebra Procedures

For later use, we formulate procedures based on the previous techniques. The reduced QR decomposition is characterised by the dimensions $n$ and $m$, the input matrix $M \in \mathbb{K}^{n\times m}$, the rank $r$, and resulting factors $Q$ and $R$. The corresponding procedure is denoted by

    procedure RQR(n, m, r, M, Q, R);                 {reduced QR decomposition}
    input:  M ∈ K^{n×m};                                                  (2.26)
    output: r = rank(M), Q ∈ K^{n×r} orthogonal, R ∈ K^{r×m} upper triangular.

and requires $N_{QR}(n, m)$ operations (cf. Lemma 2.22). The modified QR decomposition in (2.15) produces an additional permutation matrix $P$ and the decomposition of $R$ into $[R_1\ R_2]$:

    procedure PQR(n, m, r, M, P, Q, R1, R2);         {pivotised QR decomposition}
    input:  M ∈ K^{n×m};                                                  (2.27)
    output: Q ∈ K^{n×r} orthogonal, P ∈ K^{m×m} permutation matrix,
            R1 ∈ K^{r×r} upper triangular with r = rank(M), R2 ∈ K^{r×(m−r)}.

A modified version of PQR will be presented in (2.37).
The (two-sided) reduced singular-value decomposition from Definition 2.27 leads to

    procedure RSVD(n, m, r, M, U, Σ, V);             {reduced SVD}
    input:  M ∈ K^{n×m};                                                  (2.28)
    output: U ∈ K^{n×r}, V ∈ K^{m×r} orthogonal with r = rank(M),
            Σ = diag{σ1, ..., σr} ∈ R^{r×r} with σ1 ≥ ... ≥ σr > 0.

Here the integers $n$, $m$ may also be replaced with index sets $I$ and $J$. For the cost $N_{SVD}(n, m)$, see Corollary 2.24a. The left-sided reduced singular-value decomposition (cf. Remark 2.28) is denoted by

    procedure LSVD(n, m, r, M, U, Σ);                {left-sided reduced SVD}
    input:  M ∈ K^{n×m};                                                  (2.29)
    output: U, r, Σ as in (2.28).

Its cost is
\[ N_{LSVD}(n, m) := \tfrac{1}{2} n (n + 1) N_m + \tfrac{8}{3} n^3, \]
where $N_m$ is the cost of the scalar product of rows of $M$. In general, $N_m = 2m - 1$ holds, but it may be smaller for structured matrices (cf. Remark 7.16).

In the procedures above, $M$ is a general matrix from $\mathbb{K}^{n\times m}$. Matrices $M \in \mathcal{R}_r$ (cf. (2.6)) may be given in the form
\[ M = \sum_{\nu=1}^{r} \sum_{\mu=1}^{r} c_{\nu\mu}\, a_\nu b_\mu^H = A C B^H \qquad \left( \begin{array}{l} a_\nu \in \mathbb{K}^n,\ A = [a_1\ a_2 \cdots] \in \mathbb{K}^{n\times r},\\ b_\nu \in \mathbb{K}^m,\ B = [b_1\ b_2 \cdots] \in \mathbb{K}^{m\times r} \end{array} \right). \tag{2.30} \]
Then the following approach has a cost proportional to $n + m$ if $r \ll n, m$ (cf. [138, Alg. 2.17]), but also for $r \gg n, m$ it is cheaper than the direct computation¹⁸ of the product $M = A C B^H$ followed by a singular-value decomposition.

Remark 2.39. For $M = A C B^H$ in (2.30) compute the reduced QR decompositions¹⁹
\[ A = Q_A R_A \ \text{ and } \ B = Q_B R_B \quad\text{with}\quad \begin{cases} Q_A \in \mathbb{K}^{n\times r_A}, & r_A := \operatorname{rank}(A),\\ Q_B \in \mathbb{K}^{m\times r_B}, & r_B := \operatorname{rank}(B), \end{cases} \]
followed by the computation of the singular-value decomposition $R_A C R_B^H = \hat U \Sigma \hat V^H$. Then the singular-value decomposition of $M$ is given by $U \Sigma V^H$ with $U = Q_A \hat U$ and $V = Q_B \hat V$. The cost of this calculation is
\[ N_{QR}(n, r) + N_{QR}(m, r) + N_{LSVD}(r_A, r_B) + 2 r_A r_B r + 2 \bar r r^2 + 2 \bigl( r_A^2 n + r_B^2 m \bigr) \]
with $\bar r := \min\{r_A, r_B\} \le \min\{n, m, r\}$. In the symmetric case of $A = B$ and $n = m$ with $\bar r := \operatorname{rank}(A)$, the cost reduces to $N_{QR}(n, r) + 2 \bar r r^2 + \bar r^2 \bigl( r + 2n + \tfrac{8}{3} \bar r \bigr)$.

¹⁸ For instance, the direct computation is cheaper if $n = m = r = r_A = r_B$.
¹⁹ Possibly, permutations according to (2.27) are necessary.
Let $B' = [b'_1, \dots, b'_{r'}] \in \mathbb{K}^{n\times r'}$ and $B'' = [b''_1, \dots, b''_{r''}] \in \mathbb{K}^{n\times r''}$ contain two systems of vectors. Often, $B'$ and $B''$ correspond to bases of subspaces $U' \subset V$ and $U'' \subset V$. A basic task is the construction of a basis²⁰ $B = [b_1, \dots, b_r]$ of $U := U' + U''$. Furthermore, the matrices $T' \in \mathbb{K}^{r\times r'}$ and $T'' \in \mathbb{K}^{r\times r''}$ with $B' = B T'$ and $B'' = B T''$, i.e.,
\[ b'_j = \sum_{i=1}^{r} T'_{ij} b_i, \qquad b''_j = \sum_{i=1}^{r} T''_{ij} b_i, \tag{2.31} \]
are of interest. The corresponding procedure is

    procedure JoinBases(B', B'', r, B, T', T'');     {joined bases}
    input:  B' ∈ K^{n×r'}, B'' ∈ K^{n×r''},                               (2.32)
    output: r = rank[B' B'']; B basis of range([B' B'']),
            T' ∈ K^{r×r'} and T'' ∈ K^{r×r''} with (2.31).

A possible realisation starts from $B = [b'_1, \dots, b'_{r'}, b''_1, \dots, b''_{r''}] \in \mathbb{K}^{n\times (r' + r'')}$ and performs the reduced QR factorisation $B = QR$ by PQR($n$, $r' + r''$, $r$, $B$, $P$, $Q$, $R_1$, $R_2$). Then the columns of $P^T Q$ form the basis $B$, while $[T', T''] = R := [R_1, R_2]$. If $B'$ is a basis, it may be advantageous to leave the basis vectors $b_i = b'_i$ from $B'$ unchanged, whereas for $i > r'$, $b_i$ is the $i$-th column of $Q$. Then $T' = \begin{bmatrix} I \\ 0 \end{bmatrix}$ holds, while $T''$ is as before.

If all bases $B'$, $B''$, $B$ are orthonormal, the second variant from above completes the system $B'$ to an orthonormal basis $B$:

    procedure JoinONB(b', b'', r, b, T', T'');       {joined orthonormal basis}
    input:  B' ∈ K^{n×r'}, B'' ∈ K^{n×r''} orthonormal bases,             (2.33)
    output: B orthonormal basis of range([B' B'']); r, T', T'' as in (2.32).

The cost of both procedures is $N_{QR}(n, r' + r'')$.

²⁰ We call $B = [b_1, \dots, b_r]$ a basis, meaning that the set $\{b_1, \dots, b_r\}$ is the basis.
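The second variant (keep $B'$, append new orthonormal vectors from $B''$) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure JoinONB itself: it uses modified Gram–Schmidt instead of a QR routine, handles real vectors only, stores each basis as a list of column vectors, and the function name `join_onb` and the tolerance `tol` are hypothetical.

```python
import math


def join_onb(b1, b2, tol=1e-12):
    """Complete the orthonormal system b1 by directions taken from b2.

    b1, b2: lists of real vectors (each vector is one column b'_j or b''_j).
    Returns (basis, t1, t2), where t1[i][j] and t2[i][j] are the coefficients
    T'_ij and T''_ij of b'_j and b''_j with respect to basis[i], cf. (2.31).
    """
    def dot(x, y):
        return sum(xi * yi for xi, yi in zip(x, y))

    basis = [list(v) for v in b1]            # the vectors b'_i stay unchanged
    for v in b2:
        w = list(v)
        for q in basis:                      # modified Gram-Schmidt sweep
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:                       # v contributes a new direction
            basis.append([wi / norm for wi in w])

    # For an orthonormal basis the coefficients are plain scalar products.
    t1 = [[dot(q, v) for v in b1] for q in basis]
    t2 = [[dot(q, v) for v in b2] for q in basis]
    return basis, t1, t2


# Assumed example in R^3: the first vector of b2 already lies in span(b1).
s = 1.0 / math.sqrt(2.0)
b1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b2 = [[s, s, 0.0], [0.0, 0.0, 1.0]]
basis, t1, t2 = join_onb(b1, b2)
print(len(basis))  # joint rank r, smaller than r' + r'' = 4 here
```

In this sketch `t1` has the block form $[I; 0]$ noted above, because the first $r'$ basis vectors coincide with $B'$.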
2.8 Dominant Columns

Again, we consider the minimisation problem (2.24): $\min_{\operatorname{rank}(M') \le k} \|M - M'\|$ for $M \in \mathbb{K}^{n\times m}$ and $\|\cdot\| = \|\cdot\|_2$ or $\|\cdot\| = \|\cdot\|_F$. Without loss of generality, we assume that the minimising matrix $M_k$ satisfies $\operatorname{rank}(M_k) = k$; otherwise replace $k$ with $k' := \operatorname{rank}(M_k)$ and note that $\min_{\operatorname{rank}(M') \le k} \|M - M'\| = \min_{\operatorname{rank}(M') \le k'} \|M - M'\|$.

The minimising matrix $M_k \in \mathcal{R}_k$ is of the form
\[ M_k = A B^T, \qquad\text{where } A \in \mathbb{K}^{n\times k} \text{ and } B \in \mathbb{K}^{m\times k}, \]
and $\operatorname{range}(M_k) = \operatorname{range}(A)$. The singular-value decomposition $M = U \Sigma V^T$ yields the matrices $A = U' \Sigma'$ and $B = V'$, where the matrices $U'$, $\Sigma'$, $V'$ consist of the first $k$ columns of $U$, $\Sigma$, $V$. Since $A = U' \Sigma' = M \overline{V'}$ (that is, $A = M V'$ for $\mathbb{K} = \mathbb{R}$), the columns of $A$ are linear combinations of all columns of $M$. The latter fact is a disadvantage in some cases. For a concrete numerical approach we have to represent $A$. If the columns $m_j$ of $M$ are represented as full vectors from $\mathbb{K}^n$, a linear combination is of the same kind and leads to no difficulty. This can be different if other representations are involved. To give an example of an extreme case, replace $\mathbb{K}^{n\times m} = (\mathbb{K}^n)^m$ with $X^m$, where $X$ is a subspace of, say, $L^2([0, 1])$. Let the columns be functions such as $x^\nu$ or $\exp(\alpha x)$. Such functions can be simply coded together with procedures for pointwise evaluation and mutual scalar products. However, linear combinations cannot be simplified. For instance, scalar products of linear combinations must be written as double sums of elementary scalar products. As a result, the singular-value decomposition reduces the rank of $M$ to $k$, but the related computational cost may be larger than before.

This leads to a new question. Can we find $A B^T \in \mathcal{R}_k$ approximating $M$ such that $A = [c_{j_1} \cdots c_{j_k}]$ consists of $k$ (different) columns of $M$? In this case, $A B^T$ involves only $k$ columns of $M$, instead of all. For this purpose, we define
\[ \mathcal{R}_k(M) := \{ A B^T \in \mathcal{R}_k : A = [M[\cdot, j_1], \cdots, M[\cdot, j_k]] \text{ with } 1 \le j_\kappa \le m \}, \tag{2.34} \]
using the notations from above. The minimisation (2.24) is now replaced with
\[ \text{find } M_k \in \mathcal{R}_k(M) \text{ with } \quad \|M - M_k\| = \min_{M' \in \mathcal{R}_k(M)} \|M - M'\|. \tag{2.35} \]
Since there are $\binom{m}{k}$ different combinations of columns, we do not try to solve this combinatorial problem exactly. Instead, we are looking for an approximate solution. By procedure PQR in (2.27), we obtain the QR decomposition $M P = Q R$ with $R = [R_1\ R_2]$. First we discuss the case $r := \operatorname{rank}(M) = m$, which is equivalent to $M$ possessing full rank. Then $R_2$ does not exist ($m - r = 0$ columns) and $R := R_1$ is a square upper triangular matrix with nonvanishing diagonal entries. Thanks to the pivoting strategy, the columns of $R$ are of decreasing Euclidean norm. Let $k \in \{1, \dots, m - 1\}$ be the desired rank from problem (2.35). We split the matrices into the following blocks:
\[ R = \begin{bmatrix} R' & S \\ 0 & R'' \end{bmatrix} \quad\text{with}\quad R' \in \mathbb{K}^{k\times k},\ R'' \in \mathbb{K}^{(m-k)\times(m-k)} \text{ upper triangular},\ S \in \mathbb{K}^{k\times(m-k)}, \]
\[ Q = \begin{bmatrix} Q' & Q'' \end{bmatrix} \quad\text{with}\quad Q' \in \mathbb{K}^{n\times k},\ Q'' \in \mathbb{K}^{n\times(m-k)}. \]
Then $Q' R'$ corresponds to the first $k$ columns of $M P$. As $P$ is a permutation matrix, these columns form the matrix $A$ as required in (2.34). The approximating matrix is defined by $M_k^{PQR} := Q' [R'\ S]$ with $[R'\ S] \in \mathbb{K}^{k\times m}$; i.e., the matrix $B^T$ in (2.34) is $B = [R'\ S]$.

Proposition 2.40. The matrix $M_k^{PQR} := Q' [R'\ S]$ constructed above belongs to $\mathcal{R}_k(M)$ and satisfies the following estimates:
\[ \|M - M_k^{PQR}\|_2 \le \|R''\|_2, \qquad \|M - M_k^{PQR}\|_F \le \|R''\|_F, \tag{2.36a} \]
\[ \sigma_k(M_k^{PQR}) \le \sigma_k(M) \le \sqrt{\sigma_k^2(M_k^{PQR}) + \|R''\|_2^2}. \tag{2.36b} \]

Proof. (i) By construction, $M - M_k^{PQR} = Q \begin{bmatrix} 0 & 0 \\ 0 & R'' \end{bmatrix}$ holds and leads to (2.36a).
(ii) (2.36b) follows from Lemma 2.32. □
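Now we investigate the case $r < m$. Then the full QR decomposition would lead to $QR$ with $R = \begin{bmatrix} R' \\ 0 \end{bmatrix}$ with zeros in the rows $r + 1$ to $m$. These zero rows are omitted by the reduced QR decomposition. The remaining part $R'$ (again denoted by $R$) is of the shape $R = [R_1\ R_2]$, where $R_1$ has upper triangular form. As $\operatorname{rank}(M) = r$, the approximation rank $k$ in (2.35) should vary in $1 \le k < r$. The columns of $R_1$ are again decreasing, but the first $k$ columns may not be chosen optimally. This is illustrated by the following example.

Let $M = \begin{bmatrix} 2 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \end{bmatrix}$ with $r = 2 < m = 4$. Since the first column has largest norm, procedure PQR produces $P = I$ (no permutations) and
\[ Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad R_1 = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}, \qquad R_2 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}. \]
Let $k = 1$. Choosing the first column of $Q$ and first row of $R$, we obtain the matrix $M_1^{[1]} := \begin{bmatrix} 2 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$. The approximation error is
\[ \varepsilon_1 := \|M - M_1^{[1]}\| = \left\| \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix} \right\| = \sqrt{3}. \]
Note that we cannot choose the second column $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ of $Q$ and the second row of $R$ instead since $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ is not a column of $M$; i.e., the resulting approximation does not belong to $\mathcal{R}_1(M)$. A remedy is to change the pivot strategy in Step 2a from page 32. We choose the second column of $M$ as first column of $Q$. For this purpose, let $P$ be the permutation matrix corresponding to $1 \leftrightarrow 2$. The QR decomposition applied to $M P = \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix}$ yields
\[ Q = \frac{1}{\sqrt 2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad R_1 = \sqrt 2 \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad R_2 = \sqrt 2 \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}. \]
The first column of $Q$ and the first row of $R$ result in $M_1^{[2]} P$ with the smaller approximation error
\[ \varepsilon_2 := \|M - M_1^{[2]}\| = \left\| \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{bmatrix} \right\| = \sqrt 2 < \sqrt 3. \]
The reason for $\varepsilon_2 < \varepsilon_1$ is obvious: although $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is of smaller norm than $\begin{bmatrix} 2 \\ 0 \end{bmatrix}$, it has a higher weight because it appears in three columns of $M$. To take this weight into consideration, we need another pivot strategy.

Let $M = [c_1 \cdots c_m] \in \mathbb{K}^{n\times m}$. Each column $c_j \ne 0$ is associated with the orthogonal projection $P_j := \|c_j\|^{-2} c_j c_j^H$ onto $\operatorname{span}\{c_j\}$. We call $c_i$ a dominant column if
\[ \|P_i M\| = \max_{1 \le j \le m} \|P_j M\|. \]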
Equivalently, $\|M - P_j M\|$ is minimal for $j = i$. Let $P$ be the permutation matrix corresponding to the exchange $1 \leftrightarrow i$. Then $M P = Q R$ leads to $Q$ with $c_i / \|c_i\|$ as first column. The first row of $R$ is $r_1^H := \|c_i\|^{-1} c_i^H M P$. Hence the choice for the dominant column ensures that the approximation (2.35) with $k = 1$ is given by $\|c_i\|^{-1} c_i r_1^H P^H = P_i M$.

The calculation of a dominant column is discussed in the next lemma.

Lemma 2.41. For $M = [c_1 \cdots c_m]$ set
\[ Z = (\zeta_{jk})_{1 \le j, k \le m} \qquad\text{with } \zeta_{jk} := \langle c_k, c_j\rangle / \|c_j\|. \]
Then the index $i_{\max} \in \{1, \dots, m\}$ with $\|\zeta_{i_{\max},\bullet}\| = \max_{1 \le j \le m} \|\zeta_{j,\bullet}\|$ characterises the dominant column.

Proof. Because
\[ P_j M = \|c_j\|^{-2} \bigl( c_j c_j^H c_k \bigr)_{1 \le k \le m} = \left( \zeta_{jk} \frac{c_j}{\|c_j\|} \right)_{1 \le k \le m}, \]
its norm is $\|P_j M\| = \sqrt{\sum_k |\zeta_{jk}|^2} = \|\zeta_{j,\bullet}\|$. □
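Lemma 2.41 translates directly into a short routine. The following sketch assumes real columns stored as Python lists and measures $\|P_j M\|$ in the Frobenius norm; the function name `dominant_column` is hypothetical. It is applied to the $2\times 4$ example matrix from above.

```python
import math


def dominant_column(columns):
    """Return (index, row_norms) following Lemma 2.41 (0-based index).

    columns: list of real column vectors c_1, ..., c_m of M.
    row_norms[j] equals ||zeta_{j,.}|| = ||P_j M||_F; zero columns get 0.
    """
    def dot(x, y):
        return sum(xi * yi for xi, yi in zip(x, y))

    row_norms = []
    for cj in columns:
        nj = math.sqrt(dot(cj, cj))
        if nj == 0.0:
            row_norms.append(0.0)            # no projection P_j for c_j = 0
            continue
        # zeta_{jk} = <c_k, c_j> / ||c_j|| for k = 1, ..., m
        zeta_j = [dot(ck, cj) / nj for ck in columns]
        row_norms.append(math.sqrt(dot(zeta_j, zeta_j)))

    best = max(range(len(columns)), key=lambda j: row_norms[j])
    return best, row_norms


# Columns of M = [[2, 1, 1, 1], [0, 1, 1, 1]] from the example above.
M_cols = [[2.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
index, norms = dominant_column(M_cols)
print(index, [round(v * v, 10) for v in norms])
```

For this matrix $\|P_1 M\|_F^2 = 7$ and $\|P_2 M\|_F^2 = 8$, so the second column is selected. With $\|M\|_F^2 = 10$ this reproduces the errors $\varepsilon_1^2 = 3$ and $\varepsilon_2^2 = 2$ of the example.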
The concept of the dominant column leads to the following variant of PQR:

    procedure DCQR(n, m, r, M, P, Q, R1, R2);        {pivot by dominant column}
    input:  M ∈ K^{n×m};
    output: Q ∈ K^{n×r} orthogonal, P ∈ K^{m×m} permutation matrix,
            R1 ∈ K^{r×r} upper triangular with r = rank(M), R2 ∈ K^{r×(m−r)}.
    for j := 1 to r do
    begin determine i ∈ {j, ..., m} such that ci := M[•, i]               (2.37)
                is the dominant column of M[•, j : m];
          permute j ↔ i (change of P);
          Q[j, •] := ci; M[•, i] := M[•, j];
          M[•, j+1 : m] := (I − Pi) M[•, j+1 : m]     (Pi := ‖ci‖^{−2} ci ci^H)
    end;
    {determination of R1, R2 as usual; here omitted}
Corollary 2.42. In practical applications it suffices to determine $\hat\imath_{\max}$, instead of $i_{\max}$, where $\|\zeta_{\hat\imath_{\max},\bullet}\|$ is sufficiently close to $\|\zeta_{i_{\max},\bullet}\|$. For this purpose, order the columns by decreasing norm:
\[ M P =: [c_1 \cdots c_m] \qquad\text{with } \|c_j\| \ge \|c_{j+1}\|. \]
Choose $m_0 \in \{1, \dots, m - 1\}$ with $\varepsilon^2 := \sum_{k=m_0+1}^{m} \|c_k\|^2$ sufficiently small and reduce the maximisation to the first $m_0$ columns: $\hat\imath_{\max}$ is the maximiser of $\max_{1 \le j \le m_0} \sqrt{\sum_{k=1}^{m_0} |\zeta_{jk}|^2}$. Then the two maxima are related by
\[ \|\zeta_{\hat\imath_{\max},\bullet}\| \le \|\zeta_{i_{\max},\bullet}\| \le \sqrt{\|\zeta_{\hat\imath_{\max},\bullet}\|^2 + \varepsilon^2}. \]

To estimate the cost, we start with the determination of the matrix $Z$ from Lemma 2.41. The $\frac{1}{2} m (m + 1)$ scalar products are to be computed only once. After one elimination step ($c_k \mapsto P_i c_k$) they can be updated²¹ without new scalar products (cost: $2(m - 1)m$). The application of $P_i$ to $M$ also does not require scalar products since they are precomputed. The cost of the original procedure DCQR is
\[ N_{DCQR} = 4 \bigl( m r - \tfrac{1}{2} r^2 \bigr) n + m^2 n + 2 r m (m - r) \]
plus lower order terms. If $m$ is immediately reduced to $m_0$, the cost is reduced correspondingly.

Remark 2.43. Assume that $n = \hat n p$ with $\hat n, p \in \mathbb{N}$. An additional reduction of cost is possible if the scalar product $\langle a, b\rangle = \sum_{i=1}^{n} a_i b_i$ is approximated by
\[ \langle a, b\rangle_p := p \sum_{i=1}^{\hat n} a_i b_i. \]
$\langle\cdot,\cdot\rangle_p$ is not a scalar product in $\mathbb{K}^n$, but if $a, b \in \mathbb{K}^n$ are smooth grid functions, $\langle a, b\rangle_p$ approximates $\langle a, b\rangle$. With this modification in DCQR, $n$ in $N_{DCQR}$ can be reduced to $\hat n$. Note that the computed $Q$ is not strictly orthogonal.

Following the ideas from [187], we may start with $\hat n = O(1)$ to select $\hat r$ columns, where $r < \hat r = O(r)$. Then the final call of DCQR (with exact scalar product) is applied to the reduced matrix $M \in \mathbb{K}^{n\times \hat r}$. Note that this procedure yields the same result as DCQR applied to the original matrix $M$ if the finally chosen $r$ columns are among the $\hat r$ columns selected by the first heuristic step. The total work is $O(m r + m_0^2 r + r^2 n + r^3)$ with $m_0$ from Corollary 2.42.

²¹ To be precise, only $\langle c_k, c_j\rangle$ and $|\zeta_{jk}|^2$ need to be updated.
Chapter 3
Algebraic Foundations of Tensor Spaces
Abstract Since tensor spaces are in particular vector spaces, we start in Section 3.1 with vector spaces. Here, we introduce the free vector space (§3.1.2) and the quotient vector space (§3.1.3) which are needed later. Furthermore, the spaces of linear mappings and dual mappings are discussed in §3.1.4. The core of this chapter is Section 3.2 containing the definition of the tensor space. Section 3.3 is devoted to linear and multilinear mappings as well as to tensor spaces of linear mappings. Algebra structures are discussed in Section 3.4. Finally, symmetric and antisymmetric tensors are defined in Section 3.5.
3.1 Vector Spaces

3.1.1 Basic Facts

We recall that $V$ is a vector space (also named linear space) over the field $\mathbb{K}$, if $V \ne \emptyset$ is a commutative group (where the group operation is written as addition) and if a multiplication
\[ \cdot : \mathbb{K} \times V \to V \]
is defined with the following properties:
\begin{align*}
(\alpha\beta) \cdot v &= \alpha \cdot (\beta \cdot v) && \text{for } \alpha, \beta \in \mathbb{K},\ v \in V, \\
(\alpha + \beta) \cdot v &= \alpha \cdot v + \beta \cdot v && \text{for } \alpha, \beta \in \mathbb{K},\ v \in V, \\
\alpha \cdot (v + w) &= \alpha \cdot v + \alpha \cdot w && \text{for } \alpha \in \mathbb{K},\ v, w \in V, \tag{3.1} \\
1 \cdot v &= v && \text{for } v \in V, \\
0 \cdot v &= 0 && \text{for } v \in V,
\end{align*}
where on the left-hand side 1 and 0 are the respective multiplicative and additive unit elements of the field $\mathbb{K}$, while on the right-hand side 0 is the zero element of the group $V$.
The sign '·' for the multiplication $\cdot : \mathbb{K} \times V \to V$ is usually omitted, i.e., $\alpha v$ is written instead of $\alpha \cdot v$. When $\alpha (v + w)$ may be misunderstood as a function $\alpha$ evaluated at $v + w$, we prefer the original notation $\alpha \cdot (v + w)$.

Any vector space $V$ has a basis $\{v_i : i \in B\} \subset V$ with the property that it is linearly independent and spans $V = \operatorname{span}\{v_i : i \in B\}$. In the infinite case of $\#B = \infty$, linear independence means that all finite sums $\sum_i a_i v_i$ vanish if and only if $a_i = 0$. Analogously, $\operatorname{span}\{v_i : i \in B\}$ consists of all finite sums $\sum_i a_i v_i$. Here 'finite sum' means a sum with finitely many terms or, equivalently, a (possibly infinite) sum with only finitely many nonzero terms.

The cardinality $\#B$ is independent of the choice of a basis and called the dimension, denoted by $\dim(V)$. Note that there are many infinite cardinalities. Equality $\dim(V) = \dim(W)$ holds if and only if there exists a bijection $B_V \leftrightarrow B_W$ between the corresponding index sets of the bases. Vector spaces having identical dimensions are isomorphic. For finite dimension $n \in \mathbb{N}_0$, the model vector space is $\mathbb{K}^n$. The isomorphism between a general vector space $V$ with basis $\{v_1, \dots, v_n\}$ and $\mathbb{K}^n$ is given by $v = \sum_\nu \alpha_\nu v_\nu \mapsto (\alpha_1, \dots, \alpha_n) \in \mathbb{K}^n$.

Example 3.1. Let $I$ be an infinite index set. Then $\ell(I) = \mathbb{K}^I$ denotes the set of all sequences $(a_i)_{i \in I}$. The set $\ell(I)$ may also be viewed as the set of all mappings $I \to \mathbb{K}$. A subset of $\ell(I)$ is
\[ \ell_0(I) := \{ a \in \ell(I) : a_i = 0 \text{ for almost all } i \in I \}. \tag{3.2} \]
The unit vectors $\{e^{(i)} : i \in I\}$ in (2.2) form a basis of $\ell_0(I)$ so that the dimension is equal to $\dim(\ell_0(I)) = \#I$. However, the vector space $\ell(I)$ has a basis of larger infinite cardinality: $\dim(\ell(I)) > \dim(\ell_0(I))$.

Above, the case of a finite index set $I$ is less interesting since then $\ell_0(I) = \ell(I)$ does not yield a new set. An example of an uncountable index set is $I = \mathbb{R}$ (i.e., $\ell(I)$ is isomorphic to the set of $\mathbb{K}$-valued functions on $\mathbb{R}$).
3.1.2 Free Vector Space over a Set

Given an arbitrary set $S$, we are aiming for a vector space such that $S$ is a basis. Let $S$ be any nonempty set and $\mathbb{K}$ a field. Consider a mapping $\varphi : S \to \mathbb{K}$. Its support is defined by
\[ \operatorname{supp}(\varphi) := \{ s \in S : \varphi(s) \ne 0 \} \subset S, \]
where 0 is the zero element in $\mathbb{K}$. Requiring $\#\operatorname{supp}(\varphi) < \infty$ means that $\varphi(s) = 0$ holds for almost all $s \in S$. This property defines the set
\[ V := \{ \varphi : S \to \mathbb{K} : \#\operatorname{supp}(\varphi) < \infty \}. \]
We introduce an addition in $V$. For $\varphi, \psi \in V$, the sum $\sigma := \varphi + \psi$ is the mapping $\sigma : S \to \mathbb{K}$ defined by the images $\sigma(s) := \varphi(s) + \psi(s)$ for all $s \in S$. Note that the support of $\sigma$ is contained in $\operatorname{supp}(\varphi) \cup \operatorname{supp}(\psi)$, which again has finite cardinality, so that $\sigma \in V$. Since $\varphi(s) + \psi(s)$ is the addition in $\mathbb{K}$, the operation is commutative: $\varphi + \psi = \psi + \varphi$. Obviously, the zero function $0_V \in V$ with $0_V(s) = 0 \in \mathbb{K}$ for all $s \in S$ satisfies $\varphi + 0_V = 0_V + \varphi = \varphi$. Furthermore, $\varphi^- : S \to \mathbb{K}$ defined by $\varphi^-(s) = -\varphi(s)$ is the inverse of $\varphi$, i.e., $\varphi^- + \varphi = \varphi + \varphi^- = 0_V$. Altogether, $(V, +)$ is a commutative group.

Scalar multiplication $\cdot : \mathbb{K} \times V \to V$ maps $\alpha \in \mathbb{K}$ and $\varphi \in V$ into the mapping $\psi := \alpha\varphi$ defined by $\psi(s) = \alpha\varphi(s)$ for all $s \in S$. Therefore, all axioms in (3.1) are satisfied so that $V$ represents a vector space over the field $\mathbb{K}$.

The characteristic functions $\chi_s$ are of particular interest for an element $s \in S$:
\[ \chi_s(t) = \begin{cases} 1 & \text{if } t = s \in S,\\ 0 & \text{if } t \in S \setminus \{s\}. \end{cases} \]
Every $\varphi \in V$ may be written as a linear combination of such $\chi_s$:
\[ \varphi = \sum_{s \in \operatorname{supp}(\varphi)} \varphi(s)\, \chi_s = \sum_{s \in S} \varphi(s)\, \chi_s. \]
Here two different notations are used: the first sum is finite, while the second one is infinite, but contains only finitely many nonzero terms. Note that any finite subset of $\{\chi_s : s \in S\}$ is linearly independent. Assuming
\[ \sum_{s \in S_0} \alpha_s \chi_s = 0_V \qquad\text{for some } S_0 \subset S \text{ with } \#S_0 < \infty \text{ and } \alpha_s \in \mathbb{K}, \]
the evaluation at $t \in S_0$ yields
\[ \alpha_t = \Bigl( \sum_{s \in S_0} \alpha_s \chi_s \Bigr)(t) = 0_V(t) = 0, \]
proving linear independence. Conversely, any finite linear combination
\[ \sum_{s \in S_0} \alpha_s \chi_s \qquad\text{with } \alpha_s \in \mathbb{K},\ S_0 \subset S,\ \#S_0 < \infty \tag{3.3} \]
belongs to $V$.

Let $\Phi_\chi : \chi_s \mapsto s$ be the one-to-one correspondence between the sets $\{\chi_s : s \in S\}$ and $S$. We can extend $\Phi_\chi$ to $\Phi_V$ defined on $V$ such that $v = \sum_{s \in S_0} \alpha_s \chi_s$ in (3.3) is mapped onto the formal linear combination $\sum_{s \in S_0} \alpha_s s$ of elements of $S$. The image $V_{\mathrm{free}}(S) := \Phi_V(V)$ is called the free vector space over the set $S$.
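The construction can be mirrored in code by storing a finitely supported mapping $\varphi : S \to \mathbb{K}$ as a dictionary that keeps only the nonzero values. The sketch below is one possible encoding under that assumption; the class name `FreeVector` and its methods are hypothetical and not part of the text.

```python
class FreeVector:
    """Finitely supported map S -> K, i.e. an element of V (and of V_free(S))."""

    def __init__(self, coeffs=None):
        # Only nonzero values are stored, so the keys are exactly supp(phi).
        self.coeffs = {s: a for s, a in (coeffs or {}).items() if a != 0}

    @classmethod
    def chi(cls, s):
        """Characteristic function chi_s of the element s."""
        return cls({s: 1})

    def __call__(self, s):
        """Evaluate phi(s); elements outside the support are mapped to 0."""
        return self.coeffs.get(s, 0)

    def __add__(self, other):
        keys = set(self.coeffs) | set(other.coeffs)
        return FreeVector({s: self(s) + other(s) for s in keys})

    def __rmul__(self, alpha):
        """Scalar multiplication alpha * phi."""
        return FreeVector({s: alpha * a for s, a in self.coeffs.items()})

    def __eq__(self, other):
        return self.coeffs == other.coeffs

    def support(self):
        return set(self.coeffs)


# Formal linear combinations over the set S = {"a", "b", "c", ...}
v = 2 * FreeVector.chi("a") + 3 * FreeVector.chi("b")
w = (-2) * FreeVector.chi("a") + FreeVector.chi("c")
u = v + w          # the coefficient of "a" cancels
print(sorted(u.support()))
```

The elements of $S$ need no algebraic structure here (any hashable Python object works), which matches the remark that $S$ is an arbitrary set.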
3.1.3 Quotient Vector Space

Let $V_0$ be a subspace of a vector space $V$. Then $V_0$ defines an equivalence relation on $V$:
\[ v \sim w \qquad\text{if and only if } v - w \in V_0. \]
Any $v \in V$ can be associated with an equivalence class
\[ c_v := \{ w \in V : w \sim v \}. \tag{3.4} \]
Here $v$ is called a representative of the class $c_v$, which is also written as $v + V_0$. Because of the definition of an equivalence relation, the classes are either equal or disjoint. Their union is equal to $V$. The set $\{c_v : v \in V\}$ of all equivalence classes is denoted by the quotient
\[ V / V_0. \]
We may define $c' + c'' := \{ v' + v'' : v' \in c',\ v'' \in c'' \}$ for two classes $c', c'' \in V/V_0$ and check that the resulting set is again an equivalence class, i.e., an element in $V/V_0$. Similarly, we define $\lambda \cdot c \in V/V_0$ for $c \in V/V_0$. Using the notation $c_v$ for the classes generated by $v \in V$, we find the relations $c_v + c_w = c_{v+w}$ and $\lambda \cdot c_v = c_{\lambda v}$. In particular, $c_0$ is the zero element. Altogether, $V/V_0$ is again a vector space over the same field, called the quotient vector space.

Exercise 3.2. Prove the identity $\dim(V) = \dim(V/V_0) + \dim(V_0)$ and the particular cases $V/V = \{0\}$ and $V/\{0\} = V$.

A mapping $\varphi : V/V_0 \to X$ ($X$ any set) may possibly be induced by a map $\Phi : V \to X$ via
\[ \varphi(c_v) := \Phi(v) \qquad (c_v \text{ in (3.4)}). \tag{3.5} \]
Whether (3.5) is a well-defined formulation hinges upon the following consistency condition.

Lemma 3.3. (a) Let $\Phi : V \to X$ be a general mapping. Then (3.5) for all $v \in V$ defines a mapping $\varphi : V/V_0 \to X$ if and only if $\Phi$ is constant on each equivalence class, i.e., $v \sim w$ implies $\Phi(v) = \Phi(w)$.
(b) If $\Phi : V \to X$ ($X$ a vector space) is a linear mapping, the necessary and sufficient condition reads $\Phi(v) = 0$ for all $v \in V_0$.
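The consistency condition of Lemma 3.3b can be tried out on a tiny example. The sketch below assumes $V = \mathbb{Q}^2$ and $V_0 = \operatorname{span}\{(1, 1)\}$; both spaces and the two maps `phi_good` and `phi_bad` are hypothetical choices for illustration.

```python
from fractions import Fraction

# Assumed setting: V = Q^2 and V0 = span{(1, 1)}.
V0_DIRECTION = (Fraction(1), Fraction(1))


def equivalent(v, w):
    """v ~ w iff v - w lies in V0, i.e. v - w is a multiple of (1, 1)."""
    d = (v[0] - w[0], v[1] - w[1])
    return d[0] == d[1]


def phi_good(v):
    """Linear map with phi_good(1, 1) = 0: it vanishes on V0."""
    return v[0] - v[1]


def phi_bad(v):
    """Linear map with phi_bad(1, 1) = 1: it does not vanish on V0."""
    return v[0]


v = (Fraction(3), Fraction(1))
# A second representative of the same class c_v = v + V0.
w = (v[0] + 2 * V0_DIRECTION[0], v[1] + 2 * V0_DIRECTION[1])

print(equivalent(v, w))                  # both vectors represent the same class
print(phi_good(v) == phi_good(w))        # value depends only on the class
print(phi_bad(v) == phi_bad(w))          # value depends on the representative
```

Only `phi_good` induces a map on $V/V_0$ via (3.5); `phi_bad` assigns different values to two representatives of the same class, so (3.5) would not be well defined for it.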
3.1.4 (Multi-)Linear Maps, Algebraic Dual, Basis Transformation

Let $X$, $Y$ be two vector spaces. $\varphi : X \to Y$ is a linear mapping if
\[ \varphi(\lambda x' + x'') = \lambda \varphi(x') + \varphi(x'') \qquad\text{for all } \lambda \in \mathbb{K} \text{ and all } x', x'' \in X. \]
The set of linear mappings $\varphi$ is denoted by
\[ L(X, Y) := \{ \varphi : X \to Y \text{ is linear} \}. \tag{3.6} \]
Let $X_j$ ($1 \le j \le d$) and $Y$ be vector spaces. A mapping $\varphi : \times_{j=1}^{d} X_j \to Y$ is called multilinear (or $d$-linear) if $\varphi$ is linear in all $d$ arguments:
\[ \varphi(x_1, \dots, x_{j-1}, x'_j + \lambda x''_j, x_{j+1}, \dots, x_d) = \varphi(x_1, \dots, x_{j-1}, x'_j, x_{j+1}, \dots, x_d) + \lambda \varphi(x_1, \dots, x_{j-1}, x''_j, x_{j+1}, \dots, x_d) \]
for all $x_i \in X_i$, $x'_j, x''_j \in X_j$, $1 \le j \le d$, $\lambda \in \mathbb{K}$. In the case of $d = 2$, the term bilinear mapping is used.

Definition 3.4. $\Phi \in L(X, X)$ is called a projection if $\Phi^2 = \Phi$. It is called a projection onto $Y$ if $Y = \operatorname{range}(\Phi)$.

Note that no topology is defined and, therefore, no continuity is required.

Remark 3.5. Let $\{x_i : i \in B\}$ be a basis of $X$. $\varphi \in L(X, Y)$ is uniquely determined by the values $\varphi(x_i)$, $i \in B$.

In the particular case of $Y = \mathbb{K}$, linear mappings $\varphi : X \to \mathbb{K}$ are called linear forms. They are elements of the vector space $X' := L(X, \mathbb{K})$, which is called the algebraic dual of $X$. A multilinear (bilinear) map into $Y = \mathbb{K}$ is called a multilinear form (bilinear form).

Definition 3.6. Let $S := \{x_i : i \in B\} \subset X$ be a system of linearly independent vectors. A dual system $\{\varphi_i : i \in B\} \subset X'$ is defined by
\[ \varphi_i(x_j) = \delta_{ij} \qquad\text{for } i, j \in B \quad\text{(cf. (2.1))}. \]
If $\{x_i : i \in B\}$ is a basis and $\#B < \infty$, $\{\varphi_i : i \in B\}$ is called a dual basis.¹

Remark 3.7. The dual system allows us to determine the coefficients $\alpha_i$ in the basis representation $x = \sum_{i \in B} \alpha_i x_i \in X$ by $\alpha_i = \varphi_i(x)$.

¹ Note that a dual system is not a basis of $X'$ if $\dim(X) = \infty$. The functional $\Phi \in X'$ defined by $\Phi(x_i) = 1$ for all $i \in B$ is not a finite linear combination of the dual system.
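So far we used abstract vector spaces. An alternative representation follows from Remark 3.7 as soon as a basis $\{x_i : i \in B\}$ is fixed. Then the coefficient tuples $(\alpha_i)_{i \in B}$ belong to $\mathbb{K}^B$ (if $\#B = n$ we may also write $\mathbb{K}^n$). In this case the reference to the underlying basis is essential. For another basis $\{\hat{x}_i : i \in B\}$ we obtain a new coefficient vector $(\hat{\alpha}_i)_{i \in B} \in \mathbb{K}^B$. The following result follows from
\[ \sum_{i \in B} \alpha_i x_i = \sum_{i \in B} \hat{\alpha}_i \hat{x}_i . \]

Lemma 3.8 (basis transformation). Assume two bases with $x_i = \sum_{j \in B} t_{ji}\,\hat{x}_j$ and $\hat{x}_j = \sum_{i \in B} s_{ij}\,x_i$. Set $T = (t_{ij})_{i,j \in B}$ and $S = (s_{ij})_{i,j \in B}$. Let $\alpha := (\alpha_i)_{i \in B} \in \mathbb{K}^B$ and $\hat{\alpha} := (\hat{\alpha}_i)_{i \in B} \in \mathbb{K}^B$ be the coefficient vectors corresponding to the respective bases $\{x_i : i \in B\}$ and $\{\hat{x}_i : i \in B\}$. Then
\[ \alpha = S\hat{\alpha}, \qquad \hat{\alpha} = T\alpha, \qquad T = S^{-1}. \]

Also the dual basis $\{\varphi_i : i \in B\}$ refers to $\{x_i : i \in B\}$.

Lemma 3.9. Let the dual bases $\{\varphi_i\}$ and $\{\hat{\varphi}_i\}$ correspond to $\{x_i\}$ and $\{\hat{x}_i\}$, respectively. The basis transformation in Lemma 3.8 leads to $\varphi_i = \sum_{j \in B} s_{ij}\,\hat{\varphi}_j$ and $\hat{\varphi}_j = \sum_{i \in B} t_{ji}\,\varphi_i$. A general vector (functional) $\varphi \in X'$ can be written as
\[ \varphi = \sum_{i \in B} \beta_i \varphi_i = \sum_{i \in B} \hat{\beta}_i \hat{\varphi}_i \]
with coefficient vectors $\beta = (\beta_i)_{i \in B} \in \mathbb{K}^B$ and $\hat{\beta} = (\hat{\beta}_i)_{i \in B} \in \mathbb{K}^B$. Using $S$ and $T$ from Lemma 3.8, the relations involve the transposed matrices:
\[ \beta = T^{\mathsf T}\hat{\beta}, \qquad \hat{\beta} = S^{\mathsf T}\beta. \]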
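The relations of Lemmas 3.8 and 3.9 can be verified numerically. The sketch below assumes $X = \mathbb{R}^3$ with two randomly generated bases stored as matrix columns; all variable names are illustrative choices, not notation from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Columns of X and X_hat hold two bases {x_i} and {x_hat_i} of R^n (illustrative random data).
X = rng.standard_normal((n, n))
X_hat = rng.standard_normal((n, n))

# x_i = sum_j t_ji x_hat_j   <=>  X = X_hat @ T
# x_hat_j = sum_i s_ij x_i   <=>  X_hat = X @ S
T = np.linalg.solve(X_hat, X)
S = np.linalg.solve(X, X_hat)

alpha = rng.standard_normal(n)        # coefficients of x with respect to {x_i}
x = X @ alpha
alpha_hat = T @ alpha                 # Lemma 3.8: coefficients with respect to {x_hat_i}

beta = rng.standard_normal(n)         # coefficients of a functional with respect to {phi_i}
beta_hat = S.T @ beta                 # Lemma 3.9: coefficients with respect to {phi_hat_i}

# The value phi(x) = sum_i beta_i alpha_i does not depend on the basis pair used.
value, value_hat = beta @ alpha, beta_hat @ alpha_hat
```

The last line illustrates why the dual coefficients transform with the transposed inverse matrix: only then does the pairing $\varphi(x)$ stay unchanged.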
3.2 Tensor Product

3.2.1 Constructive Definition

There are various ways to define the tensor product of two vector spaces. We follow the quotient space formulation (cf. [301]). Other constructions yield isomorphic objects (see comment after Proposition 3.23).

Given two vector spaces $V$ and $W$ over some field $\mathbb{K}$, we start with the free vector space $V_{\mathrm{free}}(S)$ over the pair set $S := V \times W$ as introduced in §3.1.2. Note that $V_{\mathrm{free}}(V \times W)$ does not make use of the vector space properties of $V$ or $W$. We recall that elements of $V_{\mathrm{free}}(V \times W)$ are linear combinations of pairs from $V \times W$:
\[ \sum_{i=1}^{m} \lambda_i\,(v_i, w_i) \qquad \text{for any } (\lambda_i, v_i, w_i) \in \mathbb{K} \times V \times W,\ m \in \mathbb{N}_0. \]
A particular subspace of $V_{\mathrm{free}}(V \times W)$ is
\[ N := \operatorname{span}\Big\{ \sum_{i=1}^{m}\sum_{j=1}^{n} \alpha_i\beta_j\,(v_i, w_j) - \Big(\sum_{i=1}^{m} \alpha_i v_i,\ \sum_{j=1}^{n} \beta_j w_j\Big) \ :\ m, n \in \mathbb{N},\ \alpha_i, \beta_j \in \mathbb{K},\ v_i \in V,\ w_j \in W \Big\}. \tag{3.7} \]
The algebraic tensor space is defined by the quotient vector space
\[ V \otimes_a W := V_{\mathrm{free}}(V \times W)\,/\,N \tag{3.8} \]
(cf. §3.1.3). The equivalence class $c_{(v,w)} \in V \otimes_a W$ generated by a pair $(v, w) \in V \times W$ is denoted by $v \otimes w$. Note that the tensor symbol $\otimes$ is used for two different purposes:²
(i) In the tensor space notation $V \otimes W$ the symbol $\otimes$ connects vector spaces and may carry the suffix 'a' (meaning 'algebraic') or a norm symbol in the later case of Banach tensor spaces (cf. (3.10) and §4).
(ii) In $v \otimes w$ the quantities $v$, $w$, $v \otimes w$ are vectors, i.e., elements of the respective vector spaces $V$, $W$, $V \otimes_a W$.

As $V_{\mathrm{free}}(V \times W)$ is the set of linear combinations of $(v_i, w_i)$, the quotient space $V_{\mathrm{free}}(V \times W)/N$ consists of all linear combinations of $v_i \otimes w_i$:
\[ V \otimes_a W = \operatorname{span}\{v \otimes w : v \in V,\ w \in W\}. \tag{3.9} \]
If a norm topology is given, the completion with respect to the given norm $\|\cdot\|$ yields the topological tensor space
\[ V \otimes_{\|\cdot\|} W := \overline{V \otimes_a W}. \tag{3.10} \]
In §4 we discuss the properties of the tensor product for Banach spaces. This includes the Hilbert spaces, which are considered in §4.4.

Notation 3.10. (a) For finite-dimensional vector spaces $V$ and $W$, the algebraic tensor space $V \otimes_a W$ is already complete with respect to any norm and, therefore, coincides with the topological tensor space $V \otimes_{\|\cdot\|} W$. In this case we omit the suffixes and simply write $V \otimes W$.
(b) Furthermore, the notation $V \otimes W$ is used when both choices $V \otimes_a W$ and $V \otimes_{\|\cdot\|} W$ are possible or if the distinction between $\otimes_a$ and $\otimes_{\|\cdot\|}$ is irrelevant.
(c) The suffixes of $\otimes_a$ and $\otimes_{\|\cdot\|}$ will be moved to the left side when indices appear at the right side as in ${}_a\bigotimes_{j=1}^{d} V_j$ and ${}_{\|\cdot\|}\bigotimes_{j=1}^{d} V_j$.

² Similarly, the sum $v + w$ of vectors and the sum $V + W := \operatorname{span}\{v + w : v \in V,\ w \in W\}$ of vector spaces use the same symbol.
Definition 3.11 (tensor space, tensor). (a) $V \otimes_a W$ (or $V \otimes_{\|\cdot\|} W$) is again a vector space, which is now called a tensor space.
(b) The explicit term algebraic tensor space emphasises that $V \otimes_a W$, and not $V \otimes_{\|\cdot\|} W$, is meant.
(c) Elements of $V \otimes_a W$ or $V \otimes_{\|\cdot\|} W$ are called tensors; in particular, $x \in V \otimes_a W$ is an algebraic tensor, while $x \in V \otimes_{\|\cdot\|} W$ is a topological tensor.
(d) Any product $v \otimes w$ ($v \in V$, $w \in W$) is called an elementary tensor.
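3.2.2 Characteristic Properties

Lemma 3.12. The characteristic algebraic property of the tensor space $V \otimes_a W$ is the bilinearity:
\[
\begin{aligned}
(\lambda v) \otimes w = v \otimes (\lambda w) &= \lambda \cdot (v \otimes w) && \text{for } \lambda \in \mathbb{K},\ v \in V,\ w \in W,\\
(v' + v'') \otimes w &= v' \otimes w + v'' \otimes w && \text{for } v', v'' \in V,\ w \in W,\\
v \otimes (w' + w'') &= v \otimes w' + v \otimes w'' && \text{for } v \in V,\ w', w'' \in W,\\
0 \otimes w = v \otimes 0 &= 0 && \text{for } v \in V,\ w \in W.
\end{aligned}
\tag{3.11}
\]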
Proof. The first equality in (3.11) follows from $\lambda\,(v, w) - (\lambda v, w) \in N$; i.e., $\lambda \cdot (v \otimes w) - (\lambda v) \otimes w = 0$ holds in the quotient space. The other identities are derived similarly. □

Here the standard notational convention holds: the multiplication $\otimes$ has priority over the addition $+$, i.e., $a \otimes b + c \otimes d$ means $(a \otimes b) + (c \otimes d)$. Multiplication by a scalar needs no bracket since the interpretation of $\lambda v \otimes w$ by $(\lambda v) \otimes w$ or $\lambda \cdot (v \otimes w)$ does not change the result (see first identity in (3.11)). In the last line³ of (3.11) the three zeros belong to the different spaces $V$, $W$, and $V \otimes_a W$.

The following statements also hold for infinite-dimensional spaces. Note that in this case the dimensions must be understood as set theoretical cardinal numbers.

Lemma 3.13. (a) Let $\{v_i : i \in B_V\}$ be a basis of $V$ and $\{w_j : j \in B_W\}$ a basis of $W$. Then a basis of $V \otimes_a W$ is given by
\[ B := \{v_i \otimes w_j : i \in B_V,\ j \in B_W\}. \tag{3.12} \]
(b) $\dim(V \otimes_a W) = \dim(V) \cdot \dim(W)$.

Proof. Assume $\sum_{i,j} a_{ij}\, v_i \otimes w_j = 0$. For the linear independence of all products $v_i \otimes w_j \in B$ we have to show $a_{ij} = 0$. The properties (3.11) show
\[ \sum_i v_i \otimes w_i' = 0 \qquad \text{for } w_i' := \sum_j a_{ij} w_j. \tag{3.13} \]

³ The last line can be derived from the first one by setting $\lambda = 0$.
Let $\varphi_i \in V'$ (cf. §3.1.4) be the linear form on $V$ with $\varphi_i(v_j) = \delta_{ij}$ (cf. (2.1)). Define $\Phi_i : V \otimes_a W \to W$ by $\Phi_i(v \otimes w) = \varphi_i(v)\,w$. Application of $\Phi_i$ to (3.13) yields $w_i' = 0$. Since $\{w_j : j \in B_W\}$ is a basis, $a_{ij} = 0$ follows for all $j$. As $i$ is chosen arbitrarily, we have shown $a_{ij} = 0$ for all coefficients in $\sum_{i,j} a_{ij}\, v_i \otimes w_j = 0$. Hence $B$ is a system of linearly independent vectors.

By definition, a general tensor $x \in V \otimes_a W$ has the form $x = \sum_\nu v^{(\nu)} \otimes w^{(\nu)}$. Each $v^{(\nu)}$ can be expressed by the basis vectors: $v^{(\nu)} = \sum_i \alpha_i^{(\nu)} v_i$, and similarly, $w^{(\nu)} = \sum_j \beta_j^{(\nu)} w_j$. Note that all sums have finitely many terms. The resulting sum
\[ x = \sum_\nu \Big(\sum_i \alpha_i^{(\nu)} v_i\Big) \otimes \Big(\sum_j \beta_j^{(\nu)} w_j\Big) \underset{(3.11)}{=} \sum_{i,j} \Big(\sum_\nu \alpha_i^{(\nu)} \beta_j^{(\nu)}\Big)\, v_i \otimes w_j \]
is again finite and shows that $\operatorname{span}\{B\} = V \otimes_a W$, i.e., $B$ is a basis. Since $\#B = \#B_V \cdot \#B_W$, we obtain the dimension identity of Part (b). □
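The bilinearity (3.11) and the basis statement of Lemma 3.13 can be illustrated in a concrete model. The sketch assumes $V = \mathbb{R}^3$, $W = \mathbb{R}^4$ and realises $v \otimes w$ as `np.kron(v, w)` in $\mathbb{R}^{12}$; this realisation and all names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
v, v2 = rng.standard_normal(3), rng.standard_normal(3)
w, w2 = rng.standard_normal(4), rng.standard_normal(4)
lam = 2.5

# Assumed realisation of the elementary tensor: v ⊗ w := np.kron(v, w).
bilinear = (
    np.allclose(np.kron(lam * v, w), lam * np.kron(v, w))
    and np.allclose(np.kron(v, lam * w), lam * np.kron(v, w))
    and np.allclose(np.kron(v + v2, w), np.kron(v, w) + np.kron(v2, w))
    and np.allclose(np.kron(v, w + w2), np.kron(v, w) + np.kron(v, w2))
)

# Rows of BV and BW are (random, hence generically valid) bases of V and W.
BV, BW = rng.standard_normal((3, 3)), rng.standard_normal((4, 4))
products = np.array([np.kron(BV[i], BW[j]) for i in range(3) for j in range(4)])

# The 12 products v_i ⊗ w_j are linearly independent, so they form a basis of R^12.
dim = np.linalg.matrix_rank(products)
```

The computed `dim` equals $\dim(V) \cdot \dim(W)$, in line with Lemma 3.13(b).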
The last two statements characterise the tensor space structure.

Proposition 3.14. Let $V$, $W$, and $T$ be vector spaces over the field $\mathbb{K}$. A product $\otimes : V \times W \to T$ is a tensor product and $T$ a tensor space, i.e., it is isomorphic to $V \otimes_a W$, if the following properties hold:
(i) span property: $T = \operatorname{span}\{v \otimes w : v \in V,\ w \in W\}$;
(ii) bilinearity (3.11);
(iii) linearly independent vectors $\{v_i : i \in B_V\} \subset V$ and $\{w_j : j \in B_W\} \subset W$ lead to linearly independent vectors $\{v_i \otimes w_j : i \in B_V,\ j \in B_W\}$ in $T$.

Proof. Properties (i)–(iii) imply that $B$ in (3.12) is again a basis. □
Lemma 3.15. For any tensor $x \in V \otimes_a W$ there exist an $r \in \mathbb{N}_0$ and a representation
\[ x = \sum_{i=1}^{r} v_i \otimes w_i \tag{3.14} \]
with linearly independent vectors $\{v_i : 1 \le i \le r\} \subset V$ and $\{w_i : 1 \le i \le r\} \subset W$.

Proof. Take any representation $x = \sum_{i=1}^{n} v_i \otimes w_i$. If, e.g., the system of vectors $\{v_i : 1 \le i \le n\}$ is not linearly independent, one $v_i$ can be expressed by the others. Without loss of generality assume
\[ v_n = \sum_{i=1}^{n-1} \alpha_i v_i . \]
Then
\[ v_n \otimes w_n = \Big(\sum_{i=1}^{n-1} \alpha_i v_i\Big) \otimes w_n = \sum_{i=1}^{n-1} v_i \otimes (\alpha_i w_n) \]
shows that $x$ possesses a representation with only $n-1$ terms:
\[ x = \Big(\sum_{i=1}^{n-1} v_i \otimes w_i\Big) + v_n \otimes w_n = \sum_{i=1}^{n-1} v_i \otimes w_i' \qquad \text{with } w_i' := w_i + \alpha_i w_n . \]
Since each reduction step decreases the number of terms by one, this process terminates at a certain number $r$ of terms, i.e., we obtain a representation with $r$ linearly independent $v_i$ and $w_i$. □

The number $r$ appearing in Lemma 3.15 will be called the rank of the tensor $x$ (cf. §3.2.6.2). This is in accordance with the usual matrix rank as seen in §3.2.3.
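The reduction step used in the proof of Lemma 3.15 can be turned into a small procedure. The following sketch works under assumptions: $V = \mathbb{R}^4$, $W = \mathbb{R}^3$, the vectors $v_i$, $w_i$ are stored as matrix columns, and linear dependence is detected numerically by an SVD with a tolerance. The function names are hypothetical.

```python
import numpy as np

def null_vector(A, tol=1e-10):
    """Return c != 0 with A @ c ≈ 0, or None if the columns of A are linearly independent."""
    n = A.shape[1]
    if n == 0:
        return None
    _, s, vh = np.linalg.svd(A)                  # vh is n x n (full_matrices=True)
    rank = int(np.sum(s > tol * max(s[0], 1.0)))
    return None if rank == n else vh[-1]

def drop_dependent(A, B, tol=1e-10):
    """One reduction step: if the columns a_i are dependent, remove one term of sum_i a_i ⊗ b_i."""
    c = null_vector(A, tol)
    if c is None:
        return A, B, False
    k = int(np.argmax(np.abs(c)))
    alpha = -c / c[k]                            # a_k = sum_{i != k} alpha_i a_i
    B = B + np.outer(B[:, k], alpha)             # b_i' = b_i + alpha_i b_k
    keep = np.arange(A.shape[1]) != k
    return A[:, keep], B[:, keep], True

def reduce_representation(V, W, tol=1e-10):
    """Columns V[:, i], W[:, i] hold v_i, w_i; returns a representation with independent columns."""
    V, W = np.asarray(V, float), np.asarray(W, float)
    changed = True
    while changed:
        V, W, c1 = drop_dependent(V, W, tol)
        W, V, c2 = drop_dependent(W, V, tol)
        changed = c1 or c2
    return V, W

rng = np.random.default_rng(0)
V0, W0 = rng.standard_normal((4, 5)), rng.standard_normal((3, 5))   # n = 5 terms
Vr, Wr = reduce_representation(V0, W0)
r = Vr.shape[1]
```

In this model the tensor $x = \sum_i v_i \otimes w_i$ corresponds to the matrix `V0 @ W0.T`, so the number `r` of remaining terms can be compared with the matrix rank of that product.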
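3.2.3 Isomorphism to Matrices for d = 2

Let $V$ and $W$ be vector spaces of finite dimensions $\dim V = \#I$ and $\dim W = \#J$ ($I$, $J$ index sets). As discussed in the final part of §3.1.4, a particular choice of bases in $V$, $W$ leads to the (isomorphic) spaces $\mathbb{K}^I$, $\mathbb{K}^J$ of coefficient vectors. The transfer to the coefficients is necessary to obtain the matrix space $\mathbb{K}^{I \times J}$. In part (a) of Proposition 3.16 the isomorphism $V \otimes W \cong \mathbb{K}^I \otimes \mathbb{K}^J$ is described. The isomorphism $\mathcal{M} : \mathbb{K}^I \otimes \mathbb{K}^J \to \mathbb{K}^{I \times J}$ is already described in Remark 1.3a.

Proposition 3.16. Let $\{v_i : i \in I\}$ be a basis of $V$ and $\{w_j : j \in J\}$ a basis of $W$, where $\dim(V) < \infty$ and $\dim(W) < \infty$. Let $\Phi : \mathbb{K}^I \to V$ and $\Psi : \mathbb{K}^J \to W$ denote the isomorphisms
\[ \Phi : (\alpha_i)_{i \in I} \mapsto \sum_{i \in I} \alpha_i v_i, \qquad \Psi : (\beta_j)_{j \in J} \mapsto \sum_{j \in J} \beta_j w_j . \]
(a) Then the corresponding canonical isomorphism of the tensor spaces is given by
\[ \Xi := \Phi \otimes \Psi : \mathbb{K}^I \otimes \mathbb{K}^J \to V \otimes W \qquad \text{with} \quad (\alpha_i)_{i \in I} \otimes (\beta_j)_{j \in J} \mapsto \sum_{i \in I}\sum_{j \in J} \alpha_i \beta_j\, v_i \otimes w_j . \]
(b) Together with $\mathcal{M} : \mathbb{K}^I \otimes \mathbb{K}^J \to \mathbb{K}^{I \times J}$ from Remark 1.3a we obtain an isomorphism between matrices from $\mathbb{K}^{I \times J}$ and tensors from $V \otimes W$:
\[ \Xi' := \Xi\,\mathcal{M}^{-1} : \mathbb{K}^{I \times J} \to V \otimes W \qquad \text{with} \quad (a_{ij})_{i \in I, j \in J} \mapsto \sum_{i \in I}\sum_{j \in J} a_{ij}\, v_i \otimes w_j . \]

Concerning the usual isomorphism between matrices from $\mathbb{K}^{I \times J}$ and linear mappings $W \to V$, we refer to Proposition 3.68. The difference between matrices and tensors leads to different representations of basis transformations.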
Remark 3.17 (basis transformation). If we change the bases $\{v_i : i \in I\}$ and $\{w_j : j \in J\}$ from Proposition 3.16 by transformations $S$ and $T$:
\[ v_i = \sum_{n \in I} S_{ni}\,\hat{v}_n \qquad \text{and} \qquad w_j = \sum_{m \in J} T_{mj}\,\hat{w}_m , \]
then $\sum_{i,j} a_{ij}\, v_i \otimes w_j = \sum_{n,m} \hat{a}_{nm}\, \hat{v}_n \otimes \hat{w}_m$ shows that the matrices $A = (a_{ij})$ and $\hat{A} = (\hat{a}_{ij})$ are related by
\[ \hat{A} = S A T^{\mathsf T}. \]
On the side of the tensors, this transformation takes the form
\[ \hat{\mathbf{a}} = (S \otimes T)\,\mathbf{a} \qquad \text{with } \mathbf{a} := (a_{i,j})_{(i,j) \in I \times J} \text{ and } \hat{\mathbf{a}} := (\hat{a}_{i,j})_{(i,j) \in I \times J}, \]
where $S \otimes T$ is the Kronecker product acting on $\mathbb{K}^I \otimes \mathbb{K}^J$ (cf. §1.1.2 and §3.3.2.1), i.e., $\mathcal{M}((S \otimes T)\,\mathbf{a}) = S\,\mathcal{M}(\mathbf{a})\,T^{\mathsf T}$.

If $V = \mathbb{K}^I$ and $W = \mathbb{K}^J$, $\Xi$ is the identity and the isomorphism between $V \otimes W$ and the matrix space $\mathbb{K}^{I \times J}$ is given by $\mathcal{M}$ in Remark 1.3a.

Remark 3.18. Suppose $\dim(V) = 1$. Then the vector space $V$ may be identified with the field $\mathbb{K}$. $V \otimes_a W$ is isomorphic to $\mathbb{K} \otimes_a W$ and to $W$. In the latter case, we identify $\lambda \otimes w$ ($\lambda \in \mathbb{K}$, $w \in W$) with $\lambda w$.

Lemma 3.19 (reduced singular-value decomposition). Let $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$, and suppose $\dim(V) < \infty$ and $\dim(W) < \infty$. Then for any $x \in V \otimes W$ there exist a number $r \le \min\{\#I, \#J\}$ and two families $(x_i)_{i=1,\ldots,r}$ and $(y_i)_{i=1,\ldots,r}$ of linearly independent vectors such that
\[ x = \sum_{i=1}^{r} \sigma_i\, x_i \otimes y_i \]
with singular values $\sigma_1 \ge \ldots \ge \sigma_r > 0$.

Proof. The isomorphism $\Xi' : \mathbb{K}^{I \times J} \to V \otimes W$ from Proposition 3.16b defines the matrix $A := \Xi'^{-1} x$, for which the reduced singular-value decomposition $A = \sum_{i=1}^{r} \sigma_i\, a_i b_i^{\mathsf T}$ can be determined (cf. (2.18)). Note that $\mathcal{M}^{-1}(a_i b_i^{\mathsf T}) = a_i \otimes b_i$ (cf. (1.3)). Backtransformation yields $x = \Xi' A = \sum_{i=1}^{r} \sigma_i\, \Xi(a_i \otimes b_i) = \sum_{i=1}^{r} \sigma_i\, \Phi(a_i) \otimes \Psi(b_i)$. The statement follows by setting $x_i := \Phi(a_i)$ and $y_i := \Psi(b_i)$. Note that linearly independent $a_i$ yield linearly independent $\Phi(a_i)$. □

We remark that the vectors $x_i$ (as well as $y_i$) are not orthonormal since such properties are not (yet) defined for $V$ (and $W$). Lemma 3.19 yields a second proof of Lemma 3.15 but restricted to $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$.

Remark 3.20. The tensor spaces $V \otimes_a W$ and $W \otimes_a V$ are isomorphic vector spaces via the (bijective) transposition
\[ T : V \otimes_a W \to W \otimes_a V \qquad \text{via} \quad x = v \otimes w \mapsto x^{\mathsf T} = w \otimes v . \]
If $x \in V \otimes W$ corresponds to a matrix $M = \Xi'^{-1}(x)$ (cf. Proposition 3.16b), then $x^{\mathsf T} \in W \otimes V$ corresponds to the transposed matrix $M^{\mathsf T}$.
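The relation between the matrix form and the tensor form of a basis transformation (Remark 3.17), together with the decomposition of Lemma 3.19, can be checked numerically. The sketch assumes $V = \mathbb{R}^3$, $W = \mathbb{R}^4$ and a row-major ordering of the index pairs $(i, j)$ for the vectorisation; the ordering and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
nI, nJ = 3, 4
A = rng.standard_normal((nI, nJ))       # coefficient matrix (a_ij)
S = rng.standard_normal((nI, nI))       # transformation of the basis of V
T = rng.standard_normal((nJ, nJ))       # transformation of the basis of W

# Coefficient vector a, assuming row-major ordering of the index pairs (i, j).
a = A.reshape(-1)
a_hat = np.kron(S, T) @ a               # tensor form:  a_hat = (S ⊗ T) a
A_hat = S @ A @ T.T                     # matrix form:  A_hat = S A T^T

# Reduced SVD: A = sum_i sigma_i a_i b_i^T with sigma_1 >= ... >= sigma_r > 0.
U, sigma, Vh = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(sigma > 1e-12))

# Each rank-1 matrix a_i b_i^T corresponds to the elementary tensor a_i ⊗ b_i.
x = sum(sigma[i] * np.kron(U[:, i], Vh[i]) for i in range(r))
```

With a column-major vectorisation the roles of `S` and `T` in `np.kron` would be exchanged, which is why the ordering convention has to be fixed first.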
3.2.4 Tensors of Order d ≥ 3

3.2.4.1 Algebraic Properties

In principle we can extend the construction in §3.2.1 to the case of more than two factors. However, this is not necessary as the next lemma shows.

Lemma 3.21. (a) The tensor product is associative: the left- and right-hand sides in
\[ U \otimes_a (V \otimes_a W) = (U \otimes_a V) \otimes_a W \]
are isomorphic vector spaces as detailed in the proof. We identify both notations and use the neutral notation $U \otimes_a V \otimes_a W$ instead.
(b) If $U$, $V$, $W$ are finite dimensional with $\dim(U) = n_1$, $\dim(V) = n_2$, and $\dim(W) = n_3$, the isomorphic model tensor space is $\mathbb{K}^{n_1} \otimes \mathbb{K}^{n_2} \otimes \mathbb{K}^{n_3}$.

Proof. Let $u_i$ ($i \in B_U$), $v_j$ ($j \in B_V$), $w_k$ ($k \in B_W$) be bases of $U$, $V$, $W$. As seen in Lemma 3.13, $V \otimes_a W$ has the basis $v_j \otimes w_k$ with $(j, k) \in B_V \times B_W$, while $U \otimes_a (V \otimes_a W)$ has the basis $u_i \otimes (v_j \otimes w_k)$ with $(i, (j, k)) \in B_U \times (B_V \times B_W)$. Similarly, $(U \otimes_a V) \otimes_a W$ has the basis
\[ (u_i \otimes v_j) \otimes w_k \qquad \text{with} \quad ((i, j), k) \in (B_U \times B_V) \times B_W . \]
By the obvious bijection between $B_U \times (B_V \times B_W)$ and $(B_U \times B_V) \times B_W$, the spaces $U \otimes_a (V \otimes_a W)$ and $(U \otimes_a V) \otimes_a W$ are isomorphic. This proves part (a). For part (b), see Remark 3.31. □

Repeating the product construction $(d - 1)$ times, we get the generalisation of the previous results to the algebraic tensor product ${}_a\bigotimes_{j=1}^{d} V_j$ (cf. Notation 3.10).

Proposition 3.22. Let $V_j$ ($1 \le j \le d$, $d \ge 2$) be vector spaces over $\mathbb{K}$.
(a) The algebraic tensor space⁴ $\mathbf{V} := {}_a\bigotimes_{j=1}^{d} V_j$ is independent of the order in which the pairwise construction (3.8) is performed (more precisely, the resulting spaces are isomorphic and can be identified).
(b) $T$ is the algebraic tensor space $\mathbf{V}$ if the following properties hold:
(i) span property: $T = \operatorname{span}\big\{\bigotimes_{j=1}^{d} v^{(j)} : v^{(j)} \in V_j\big\}$;
(ii) multilinearity; i.e., for all $\lambda \in \mathbb{K}$ and $v^{(j)}, w^{(j)} \in V_j$ with $j \in \{1, \ldots, d\}$:
\[ v^{(1)} \otimes v^{(2)} \otimes \ldots \otimes \big(\lambda v^{(j)} + w^{(j)}\big) \otimes \ldots \otimes v^{(d)} = \lambda\, v^{(1)} \otimes v^{(2)} \otimes \ldots \otimes v^{(j)} \otimes \ldots \otimes v^{(d)} + v^{(1)} \otimes v^{(2)} \otimes \ldots \otimes w^{(j)} \otimes \ldots \otimes v^{(d)} ; \]
(iii) linearly independent vectors $\{v_i^{(j)} : i \in B_j\} \subset V_j$ ($1 \le j \le d$) lead to linearly independent vectors $\big\{\bigotimes_{j=1}^{d} v_{i_j}^{(j)} : i_j \in B_j\big\}$ in $T$.

⁴ The product $\bigotimes_{j=1}^{d} V_j$ is to be formed in the order of the indices $j$, i.e., $V_1 \otimes V_2 \otimes \ldots$ If we write $\bigotimes_{j \in K} V_j$ for an ordered index set $K$, the ordering of $K$ determines the order of the factors.
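(c) The dimension is given by
\[ \dim(\mathbf{V}) = \prod_{j=1}^{d} \dim(V_j). \]
If $\{v_i^{(j)} : i \in B_j\}$ are bases of $V_j$ ($1 \le j \le d$), then $\{\mathbf{v}_{\mathbf{i}} : \mathbf{i} \in \mathbf{B}\}$ is a basis of $\mathbf{V}$, where
\[ \mathbf{v}_{\mathbf{i}} = \bigotimes_{j=1}^{d} v_{i_j}^{(j)} \qquad \text{for } \mathbf{i} = (i_1, \ldots, i_d) \in \mathbf{B} := B_1 \times \ldots \times B_d . \]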
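Associativity and the dimension formula can be observed directly in the model space $\mathbb{K}^{n_1} \otimes \mathbb{K}^{n_2} \otimes \mathbb{K}^{n_3}$. The sketch below assumes $n = (2, 3, 4)$, real entries, and row-major flattening as the index bijection; these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
u, v, w = rng.standard_normal(2), rng.standard_normal(3), rng.standard_normal(4)

# Order-3 elementary tensor u ⊗ v ⊗ w stored as a 2 x 3 x 4 array.
t = np.einsum('i,j,k->ijk', u, v, w)

# Both bracketings yield the same coefficients after the index bijection.
left = np.kron(np.kron(u, v), w)      # (u ⊗ v) ⊗ w
right = np.kron(u, np.kron(v, w))     # u ⊗ (v ⊗ w)

# dim(V) = product of the dimensions of the factors.
dim = t.size
```

The flattened array `t.reshape(-1)` lists the coefficients with respect to the product basis indexed by $(i_1, i_2, i_3)$ in lexicographic order.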
An alternative definition of ${}_a\bigotimes_{j=1}^{d} V_j$ follows the construction in §3.2.1 with pairs $(v, w)$ replaced with $d$-tuples and an appropriately defined subspace $N$.

The following expression of the multilinear mapping $\varphi$ by the linear mapping $\Phi$ is also called the 'linearisation' of $\varphi$.

Proposition 3.23 (universality of the tensor product). Let $V_j$ ($1 \le j \le d$) and $U$ be vector spaces over $\mathbb{K}$. Then, for any multilinear mapping $\varphi : V_1 \times \ldots \times V_d \to U$, i.e.,
\[ \varphi(v^{(1)}, \ldots, \lambda v^{(j)} + w^{(j)}, \ldots, v^{(d)}) = \lambda\,\varphi(v^{(1)}, \ldots, v^{(j)}, \ldots, v^{(d)}) + \varphi(v^{(1)}, \ldots, w^{(j)}, \ldots, v^{(d)}) \tag{3.15} \]
for all $v^{(j)}, w^{(j)} \in V_j$, $\lambda \in \mathbb{K}$, $1 \le j \le d$, there exists a unique linear mapping $\Phi : {}_a\bigotimes_{j=1}^{d} V_j \to U$ such that
\[ \varphi(v^{(1)}, v^{(2)}, \ldots, v^{(d)}) = \Phi\big(v^{(1)} \otimes v^{(2)} \otimes \ldots \otimes v^{(d)}\big) \qquad \text{for all } v^{(j)} \in V_j,\ 1 \le j \le d. \]

Proof. Let $\{v_i^{(j)} : i \in B_j\}$ be a basis of $V_j$. $\{\mathbf{v}_{\mathbf{i}} : \mathbf{i} \in \mathbf{B}\}$ from Proposition 3.22c is a basis of $\mathbf{V} := {}_a\bigotimes_{j=1}^{d} V_j$. Define $\Phi(\mathbf{v}_{\mathbf{i}}) := \varphi(v_{i_1}^{(1)}, \ldots, v_{i_d}^{(d)})$. This determines $\Phi : \mathbf{V} \to U$ uniquely (cf. Remark 3.5). Analogously, the multilinear mapping $\varphi$ is uniquely determined by $\varphi(v_{i_1}^{(1)}, \ldots, v_{i_d}^{(d)})$. Multilinearity of $\varphi$ and $\mathbf{V}$ yields $\varphi(v^{(1)}, v^{(2)}, \ldots, v^{(d)}) = \Phi(v^{(1)} \otimes \ldots \otimes v^{(d)})$ for all $v^{(j)} \in V_j$. □

The statement of Proposition 3.23 may also be used as an equivalent definition of ${}_a\bigotimes_{j=1}^{d} V_j$ (cf. Greub [125, Chap. I, §2]). The universal factorisation property stated in Proposition 3.23 is visualised by the following commutative diagram:
\[
\begin{array}{ccc}
V_1 \times \ldots \times V_d & \xrightarrow{\ \varphi\ } & U \\
\otimes \downarrow & \nearrow \Phi & \\
{}_a\bigotimes_{j=1}^{d} V_j & &
\end{array}
\]
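The factorisation $\varphi = \Phi \circ \otimes$ can be made concrete for $d = 3$. The sketch assumes $V_j = \mathbb{R}^{n_j}$, $U = \mathbb{R}^5$ and a trilinear map given by a randomly chosen coefficient array `C`; the array and the function names `phi`, `Phi` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, n3, m = 2, 3, 4, 5

# C[l, i, j, k]: values of an (illustrative) trilinear map on triples of unit vectors.
C = rng.standard_normal((m, n1, n2, n3))

def phi(v1, v2, v3):
    """Multilinear map R^2 x R^3 x R^4 -> R^5 acting on three separate arguments."""
    return np.einsum('lijk,i,j,k->l', C, v1, v2, v3)

def Phi(t):
    """Linear map on R^2 ⊗ R^3 ⊗ R^4 defined by the values of phi on the product basis."""
    return np.einsum('lijk,ijk->l', C, t)

v1, v2, v3 = rng.standard_normal(n1), rng.standard_normal(n2), rng.standard_normal(n3)
elementary = np.einsum('i,j,k->ijk', v1, v2, v3)

# phi(v1, v2, v3) = Phi(v1 ⊗ v2 ⊗ v3)
factorises = np.allclose(phi(v1, v2, v3), Phi(elementary))
```

`Phi` is defined on all tensors of the array shape `(2, 3, 4)`, not only on elementary ones, which reflects that $\Phi$ is a linear map on the whole tensor space.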
Notation 3.24. If all $V_j = V$ are identical vector spaces, the notation $\bigotimes_{j=1}^{d} V_j$ is simplified by $\otimes^d V$. For a vector $v \in V$, we set $\otimes^d v := \bigotimes_{j=1}^{d} v$.
3.2.4.2 Nondegeneracy

To avoid trivial situations, we define nondegenerate tensor spaces.

Definition 3.25. A tensor space $\mathbf{V} = {}_a\bigotimes_{j=1}^{d} V_j$ is called nondegenerate if $d > 0$ and $\dim(V_j) \ge 2$ for all $1 \le j \le d$. Otherwise $\mathbf{V}$ is called degenerate.

This definition is justified by the following remarks. If $\dim(V_j) = 0$ for one $j$, $\mathbf{V} = \{0\}$ is the trivial vector space. If $d = 0$, the empty product is defined by $\mathbf{V} = \mathbb{K}$, which is also a trivial case. In the case of $\dim(V_j) = 1$ for some $j$, the formulation by ${}_a\bigotimes_{j=1}^{d} V_j$ can be reduced (see next remark).

Remark 3.26. (a) If $\dim(V_k) = 1$ for some $k$, the isomorphism $\mathbf{V} = {}_a\bigotimes_{j=1}^{d} V_j \cong {}_a\bigotimes_{j \in \{1,\ldots,d\} \setminus \{k\}} V_j$ allows us to omit the factor $V_k$.
(b) After eliminating all factors $V_j$ with $\dim(V_j) = 1$ and renaming the remaining vector spaces, we obtain $\mathbf{V} \cong \mathbf{V}_{\mathrm{red}} = {}_a\bigotimes_{j=1}^{d_{\mathrm{red}}} V_j$. If still $d_{\mathrm{red}} > 0$, the representation is nondegenerate. Otherwise the tensor space is degenerate because of $d_{\mathrm{red}} = 0$.

As an illustration, we may consider a matrix space (i.e., a tensor space of order $d = 2$). If $\dim(V_2) = 1$, the matrices consist of only one column. Hence they may be considered as vectors (tensor space with $d = 1$). If $\dim(V_1) = \dim(V_2) = 1$, the $1 \times 1$ matrices may be identified with scalars from the field $\mathbb{K}$.
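3.2.4.3 Tuples

Finally, we mention an isomorphism between the space of tuples of tensors and an extended tensor space.

Lemma 3.27. Let $\mathbf{V} = \bigotimes_{j=1}^{d} V_j$ be a tensor space over the field $\mathbb{K}$ and $m \in \mathbb{N}$. The vector space of $m$-tuples $(\mathbf{v}_1, \ldots, \mathbf{v}_m)$ with $\mathbf{v}_i \in \mathbf{V}$ is denoted by $\mathbf{V}^m$. Then the following vector space isomorphism is valid:
\[ \mathbf{V}^m \cong \bigotimes_{j=1}^{d+1} V_j = \mathbf{V} \otimes V_{d+1} \qquad \text{with } V_{d+1} := \mathbb{K}^m . \]

Proof. $(\mathbf{v}_1, \ldots, \mathbf{v}_m) \in \mathbf{V}^m$ corresponds to $\sum_{i=1}^{m} \mathbf{v}_i \otimes e^{(i)}$, where $e^{(i)} \in \mathbb{K}^m$ is the $i$-th unit vector. The opposite direction of the isomorphism is described by $\mathbf{v} \otimes x^{(d+1)} \cong (x_1 \mathbf{v}, x_2 \mathbf{v}, \ldots, x_m \mathbf{v})$ with $x^{(d+1)} = (x_i)_{i=1,\ldots,m} \in \mathbb{K}^m$. □

Exercise 3.28. Let $\mathbf{t}$ be the tensor corresponding to the tuple $(\mathbf{v}_1, \ldots, \mathbf{v}_m)$. Prove that $\operatorname{rank}(\mathbf{t}) \le \sum_{i=1}^{m} \operatorname{rank}(\mathbf{v}_i)$.

A generalisation to infinite dimensions can be defined by $\mathbf{V} \otimes_a \ell_0(I)$ with $\#I = \infty$ (cf. (3.2)).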
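In array terms, the isomorphism of Lemma 3.27 amounts to stacking the $m$ tensors along an additional axis. The sketch assumes $d = 2$, $\mathbf{V} = \mathbb{R}^2 \otimes \mathbb{R}^4$ and $m = 3$; the sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 3
# m tensors of order d = 2 in R^2 ⊗ R^4 (illustrative sizes).
tuple_of_tensors = [rng.standard_normal((2, 4)) for _ in range(m)]

# t = sum_i v_i ⊗ e^(i): an order-(d+1) tensor in R^2 ⊗ R^4 ⊗ R^m.
E = np.eye(m)
t = sum(np.einsum('ab,c->abc', v, E[i]) for i, v in enumerate(tuple_of_tensors))

# The same tensor results from stacking along a new last axis.
stacked = np.stack(tuple_of_tensors, axis=-1)

# Opposite direction: v ⊗ x corresponds to the tuple (x_1 v, ..., x_m v).
v, x = tuple_of_tensors[0], rng.standard_normal(m)
vx = np.einsum('ab,c->abc', v, x)
recovered = [vx[..., i] for i in range(m)]
```

The slices `t[..., i]` recover the original tuple, so both directions of the correspondence are plain indexing operations in this representation.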
3.2.5 Different Types of Isomorphisms

In algebra it is common to identify isomorphic structures. All characteristic properties of an algebraic structure should be invariant under isomorphisms. The question is what algebraic structures are meant. All previous isomorphisms were vector space isomorphisms. As is well known, two vector spaces are isomorphic if and only if the dimensions coincide (note that in the infinite-dimensional case, the cardinalities of the bases are decisive).

Any tensor space is a vector space, but not every vector space isomorphism preserves the tensor structure. An essential part of the tensor structure is the $d$-tuple of vector spaces $(V_1, \ldots, V_d)$ together with the dimensions of $V_j$ (cf. Proposition 3.23). In fact, each space $V_j$ can be regained from $\mathbf{V} := {}_a\bigotimes_{j=1}^{d} V_j$ as the range of the mapping $\mathbf{A} = \bigotimes_{k=1}^{d} A_k$ (cf. (1.4a)) with $A_j = \mathrm{id}$, while $0 \ne A_k \in V_k'$ for $k \ne j$. Therefore a tensor space isomorphism must satisfy that $\mathbf{V} = {}_a\bigotimes_{j=1}^{d} V_j \cong \mathbf{W}$ implies that $\mathbf{W} = {}_a\bigotimes_{j=1}^{d} W_j$ holds with isomorphic vector spaces $W_j \cong V_j$ for all $1 \le j \le d$. In particular, the order $d$ of the tensor spaces must coincide. This requirement is equivalent to the following definition.

Definition 3.29 (tensor space isomorphism). A tensor space isomorphism
\[ \mathbf{A} : \mathbf{V} := {}_a\bigotimes_{j=1}^{d} V_j \to \mathbf{W} = {}_a\bigotimes_{j=1}^{d} W_j \]
is any bijection of the form $\mathbf{A} = \bigotimes_{j=1}^{d} A_j$ (cf. (3.30a)), where $A_j : V_j \to W_j$ ($1 \le j \le d$) are vector space isomorphisms.

For instance, $\mathbb{K}^2 \otimes \mathbb{K}^8$, $\mathbb{K}^4 \otimes \mathbb{K}^4$, and $\mathbb{K}^2 \otimes \mathbb{K}^2 \otimes \mathbb{K}^4$ are isomorphic vector spaces (since all have dimension 16), but their tensor structures are not identical.

For the definition of $U \otimes_a V \otimes_a W$ we use in Lemma 3.21 that $U \otimes_a (V \otimes_a W) \cong (U \otimes_a V) \otimes_a W$. Note that these three spaces are isomorphic only in the sense of vector spaces, whereas their tensor structures are different. In particular, the latter spaces are tensor spaces of order two, while $U \otimes_a V \otimes_a W$ is of order three. We see the difference when we consider the elementary tensors as in the next example.

Example 3.30. Let both $\{v_1, v_2\} \subset V$ and $\{w_1, w_2\} \subset W$ be linearly independent. Then $u \otimes (v_1 \otimes w_1 + v_2 \otimes w_2)$ is an elementary tensor in $U \otimes_a (V \otimes_a W)$ (since $u \in U$ and $v_1 \otimes w_1 + v_2 \otimes w_2 \in V \otimes_a W$), but it is not an elementary tensor of $U \otimes_a V \otimes_a W$.

Remark 3.31. In the finite-dimensional case of $n_j := \dim(V_j) < \infty$, the isomorphic model tensor space is $\mathbf{W} := \bigotimes_{j=1}^{d} \mathbb{K}^{n_j}$. Choose some bases $\{b_\nu^{(j)} : 1 \le \nu \le n_j\}$ of $V_j$. Any $\mathbf{v} \in \mathbf{V} := \bigotimes_{j=1}^{d} V_j$ has a unique representation of the form
\[ \mathbf{v} = \sum_{i_1 \cdots i_d} \mathbf{a}_{\mathbf{i}}\; b_{i_1}^{(1)} \otimes \ldots \otimes b_{i_d}^{(d)} . \tag{3.16} \]
The coefficients $\mathbf{a}_{\mathbf{i}}$ define the coefficient tensor $\mathbf{a} \in \mathbf{W}$. The mapping $\mathbf{A} : \mathbf{V} \to \mathbf{W}$ by $\mathbf{A}(\mathbf{v}) = \mathbf{a}$ is a tensor space isomorphism.
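Example 3.30 can be examined through coefficient arrays. The sketch assumes $U = \mathbb{R}^2$, $V = \mathbb{R}^3$, $W = \mathbb{R}^2$ and uses unit vectors for $v_1, v_2, w_1, w_2$; it relies on the fact that a representation by $r$ elementary tensors of $U \otimes V \otimes W$ yields a matrix of rank at most $r$ under any grouping of the factors.

```python
import numpy as np

u = np.array([1.0, 2.0])
v1, v2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
w1, w2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# x = v1 ⊗ w1 + v2 ⊗ w2 in V ⊗ W, and t = u ⊗ x stored as a 2 x 3 x 2 array.
x = np.outer(v1, w1) + np.outer(v2, w2)
t = np.einsum('i,jk->ijk', u, x)

# Viewed in U ⊗ X with X = V ⊗ W (order 2): the 2 x 6 matrix has rank 1, so t is elementary there.
rank_U_X = np.linalg.matrix_rank(t.reshape(2, 6))

# Grouping the factors as V ⊗ (U ⊗ W) gives a 3 x 4 matrix of rank 2.  Hence the
# order-3 tensor rank is at least 2, and t is not elementary in U ⊗ V ⊗ W.
rank_V_rest = np.linalg.matrix_rank(np.moveaxis(t, 1, 0).reshape(3, 4))
```

The two reshapes use the same 12 numbers; only the tensor structure imposed on them differs.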
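The next generalisation of Remark 3.17 describes a general basis transformation.

Lemma 3.32 (basis transformation). (a) Let $\{b_i^{(j)}\}_{i \in I_j}$ and $\{\hat{b}_i^{(j)}\}_{i \in I_j}$ be bases of $V_j$ with $b_k^{(j)} = \sum_{i \in I_j} T_{ik}^{(j)}\,\hat{b}_i^{(j)}$. Then the coefficient tensors $\mathbf{a}$ and $\hat{\mathbf{a}}$ in (3.16) are related by
\[ \hat{\mathbf{a}} = \mathbf{T}\mathbf{a} \qquad \text{with } \mathbf{T} = \bigotimes_{j=1}^{d} T^{(j)}, \quad T^{(j)} = \big(T_{ik}^{(j)}\big). \]
$\mathbf{T} : \bigotimes_{j=1}^{d} \mathbb{K}^{I_j} \to \bigotimes_{j=1}^{d} \mathbb{K}^{I_j}$ is a tensor space isomorphism.
(b) If the spaces $V_j = V$ coincide, i.e., $\mathbf{V} = \otimes^d V$, also the transformations $T^{(j)} = T$ coincide and $\mathbf{T}$ becomes $\mathbf{T} = \otimes^d T$.
(c) If either $V_j = V$ or $V_j = V'$, the $T^{(j)}$ are either $T$ or $(T^{-1})^{\mathsf T}$, respectively.

Proof. For part (c) use Lemma 3.9. □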
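For $d = 3$ the transformation $\hat{\mathbf{a}} = \mathbf{T}\mathbf{a}$ of Lemma 3.32(a) can be applied either factor by factor or as a single Kronecker matrix. The sketch assumes index sets of sizes $(2, 3, 4)$, random real matrices $T^{(j)}$, and row-major vectorisation; the names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
shape = (2, 3, 4)
a = rng.standard_normal(shape)                          # coefficient tensor for the bases b^(j)
T1, T2, T3 = (rng.standard_normal((n, n)) for n in shape)

# a_hat[i1,i2,i3] = sum_{k1,k2,k3} T1[i1,k1] T2[i2,k2] T3[i3,k3] a[k1,k2,k3]
a_hat = np.einsum('ip,jq,kr,pqr->ijk', T1, T2, T3, a)

# The same map as one Kronecker matrix acting on the row-major vectorisation of a.
T = np.kron(np.kron(T1, T2), T3)
a_hat_vec = T @ a.reshape(-1)
```

The factor-wise form never builds the $24 \times 24$ matrix `T`, which is the practical reason for keeping the Kronecker structure.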
The isomorphism sign $\cong$ is an equivalence relation in the set of the respective structure. Let $\cong_{\mathrm{vec}}$ denote the vector space isomorphism, while $\cong_{\mathrm{ten}}$ is the tensor space isomorphism. Then $\cong_{\mathrm{ten}}$ is the finer equivalence relation since $\mathbf{V} \cong_{\mathrm{ten}} \mathbf{W}$ implies $\mathbf{V} \cong_{\mathrm{vec}} \mathbf{W}$ but not vice versa. There are additional equivalence relations $\cong$, which are between $\cong_{\mathrm{ten}}$ and $\cong_{\mathrm{vec}}$, i.e., $\mathbf{V} \cong_{\mathrm{ten}} \mathbf{W} \Rightarrow \mathbf{V} \cong \mathbf{W} \Rightarrow \mathbf{V} \cong_{\mathrm{vec}} \mathbf{W}$. We give three examples.

1) We may not insist upon a strict ordering of the vector spaces $V_j$. Let $\pi : \{1, \ldots, d\} \to \{1, \ldots, d\}$ be a permutation. Then $\mathbf{V} := V_1 \otimes V_2 \otimes \ldots \otimes V_d$ and $\mathbf{V}_\pi := V_{\pi(1)} \otimes V_{\pi(2)} \otimes \ldots \otimes V_{\pi(d)}$ are considered as isomorphic. In a second step, each $V_j$ may be replaced with an isomorphic vector space $W_j$.

2) In Remark 3.26 we omitted vector spaces $V_j \cong \mathbb{K}$ of dimension one. This leads to an isomorphism $V_1 \otimes \mathbb{K} \cong V_1$ or $\mathbb{K} \otimes V_2 \cong V_2$, where $V_j$ may be additionally replaced with an isomorphic vector space $W_j$. Note that the order of the tensor spaces is changed, but the nontrivial vector spaces are still pairwise isomorphic.

3) The third example will be of broader importance. Fix some $k \in \{1, \ldots, d\}$. Using the isomorphism from Item 1, we may state that
\[ V_1 \otimes \ldots \otimes V_k \otimes \ldots \otimes V_d \cong V_k \otimes V_1 \otimes \ldots \otimes V_{k-1} \otimes V_{k+1} \otimes \ldots \otimes V_d \]
using the permutation $1 \leftrightarrow k$. Next, we make use of the argument of Lemma 3.21: associativity allows the vector space isomorphism
\[ V_k \otimes V_1 \otimes \ldots \otimes V_{k-1} \otimes V_{k+1} \otimes \ldots \otimes V_d \cong V_k \otimes \big[ V_1 \otimes \ldots \otimes V_{k-1} \otimes V_{k+1} \otimes \ldots \otimes V_d \big]. \]
The tensor space in brackets will be abbreviated by
\[ \mathbf{V}_{[k]} := {}_a\bigotimes_{j \ne k} V_j , \tag{3.17a} \]
where
\[ \bigotimes_{j \ne k} \quad \text{means} \quad \bigotimes_{j \in \{1,\ldots,d\} \setminus \{k\}} . \tag{3.17b} \]
Altogether, we have the vector space isomorphism
\[ \mathbf{V} = {}_a\bigotimes_{j=1}^{d} V_j \cong V_k \otimes \mathbf{V}_{[k]} . \tag{3.17c} \]
We notice that $\mathbf{V}$ is a tensor space of order $d$, whereas $V_k \otimes \mathbf{V}_{[k]}$ has order 2. Nevertheless, a part of the tensor structure (the space $V_k$) is preserved. The importance of the isomorphism (3.17c) is already obvious from Lemma 3.21 since this allows a reduction to tensor spaces of order two. We shall employ (3.17c) to introduce the matricisation in §5.2.

In order to simplify the notation, we shall often replace the $\cong$ sign with equality: $\mathbf{V} = V_k \otimes \mathbf{V}_{[k]}$. This allows us to write $\mathbf{v} \in \mathbf{V}$, as well as $\mathbf{v} \in V_k \otimes \mathbf{V}_{[k]}$, whereas the more exact notation is $\mathbf{v} \in \mathbf{V}$ and $\tilde{\mathbf{v}} = \Phi(\mathbf{v}) \in V_k \otimes \mathbf{V}_{[k]}$ with the vector space isomorphism $\Phi : \mathbf{V} \to V_k \otimes \mathbf{V}_{[k]}$. In fact, we shall see in Remark 3.36 that $\mathbf{v}$ and $\tilde{\mathbf{v}}$ have different properties. For elementary tensors of $\mathbf{V}$ we write
\[ \mathbf{v} = \bigotimes_{j=1}^{d} v^{(j)} = v^{(k)} \otimes \mathbf{v}^{[k]}, \qquad \text{where } \mathbf{v}^{[k]} := \bigotimes_{j \ne k} v^{(j)} \in \mathbf{V}_{[k]} . \tag{3.17d} \]
For a general (algebraic) tensor, the corresponding notation is
\[ \mathbf{v} = \sum_i \bigotimes_{j=1}^{d} v_i^{(j)} = \sum_i v_i^{(k)} \otimes \mathbf{v}_i^{[k]} \qquad \text{with } \mathbf{v}_i^{[k]} := \bigotimes_{j \ne k} v_i^{(j)} \in \mathbf{V}_{[k]} . \]
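For coefficient arrays, the vector space isomorphism $\Phi : \mathbf{V} \to V_k \otimes \mathbf{V}_{[k]}$ in (3.17c) can be sketched as an axis permutation followed by a reshape. The example assumes $d = 3$ with sizes $(2, 3, 4)$ and a zero-based index `k`; the ordering of the merged axes is an assumption made here, not a convention fixed by the text.

```python
import numpy as np

rng = np.random.default_rng(7)
v = rng.standard_normal((2, 3, 4))        # v in V1 ⊗ V2 ⊗ V3, order d = 3
k = 1                                      # zero-based index of the distinguished factor V_k

# Phi: V -> V_k ⊗ V_[k]; move axis k to the front and merge the remaining axes.
v_tilde = np.moveaxis(v, k, 0).reshape(v.shape[k], -1)

# For an elementary tensor, Phi(v) = v^(k) ⊗ v^[k] with v^[k] the product of the other factors.
f = [rng.standard_normal(n) for n in v.shape]
elem = np.einsum('i,j,k->ijk', *f)
elem_tilde = np.moveaxis(elem, k, 0).reshape(elem.shape[k], -1)
v_bracket_k = np.kron(f[0], f[2])          # v^[k] = v^(1) ⊗ v^(3)
```

The array `v_tilde` has order 2, so it can be handled with matrix tools, while the factor structure inside the second axis is no longer visible.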
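3.2.6 $\mathcal{R}_r$ and Tensor Rank

3.2.6.1 The Set $\mathcal{R}_r$

Let $V_j$ ($1 \le j \le d$) be vector spaces generating $\mathbf{V} := {}_a\bigotimes_{j=1}^{d} V_j$. All linear combinations of $r$ elementary tensors are contained in
\[ \mathcal{R}_r := \mathcal{R}_r(\mathbf{V}) := \Big\{ \sum_{\nu=1}^{r} v_\nu^{(1)} \otimes \ldots \otimes v_\nu^{(d)} : v_\nu^{(j)} \in V_j \Big\} \qquad (r \in \mathbb{N}_0). \tag{3.18} \]
Deliberately, we use the same symbol $\mathcal{R}_r$ as in (2.6) as justified by Remark 3.38a.

Remark 3.33. $\mathbf{V} = \bigcup_{r \in \mathbb{N}_0} \mathcal{R}_r$ holds for the algebraic tensor space $\mathbf{V}$.

Proof. By the definition (3.9), $\mathbf{v} \in \mathbf{V}$ is a finite linear combination of elementary tensors, i.e., $\mathbf{v} = \sum_{\nu=1}^{s} \alpha_\nu e_\nu$ for some $s \in \mathbb{N}_0$ and suitable elementary tensors $e_\nu$. The factor $\alpha_\nu$ can be absorbed by the elementary tensor: $\alpha_\nu e_\nu =: v_\nu^{(1)} \otimes \ldots \otimes v_\nu^{(d)}$. Hence $\mathbf{v} \in \mathcal{R}_s \subset \bigcup_{r \in \mathbb{N}_0} \mathcal{R}_r$ proves $\mathbf{V} \subset \bigcup_{r \in \mathbb{N}_0} \mathcal{R}_r \subset \mathbf{V}$. □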
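Remark 3.34. The sets $\mathcal{R}_r$, which in general are not subspaces, are nested:
\[ \{0\} = \mathcal{R}_0 \subset \mathcal{R}_1 \subset \ldots \subset \mathcal{R}_{r-1} \subset \mathcal{R}_r \subset \ldots \subset \mathbf{V} \qquad \text{for all } r \in \mathbb{N}, \tag{3.19a} \]
and satisfy the additive property
\[ \mathcal{R}_r + \mathcal{R}_s = \mathcal{R}_{r+s}. \tag{3.19b} \]

Proof. Note that $\mathcal{R}_0 = \{0\}$ by the empty sum convention. Since we may choose $v_r^{(j)} = 0$ in $\sum_{\nu=1}^{r} v_\nu^{(1)} \otimes \ldots \otimes v_\nu^{(d)}$, all sums of $r-1$ terms are included in $\mathcal{R}_r$. □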
3.2.6.2 Tensor Rank

A nonvanishing elementary tensor $v_\nu^{(1)} \otimes \ldots \otimes v_\nu^{(d)}$ in (3.18) becomes a rank-1 matrix in the case of $d = 2$ (cf. §2.2). Remark 2.1 states that the rank $r$ is the smallest integer such that a representation $M = \sum_{\nu=1}^{r} v_\nu^{(1)} \otimes v_\nu^{(2)}$ is valid (cf. (1.3)). This leads to the following generalisation.

Definition 3.35 (tensor rank). The tensor rank of $\mathbf{v} \in {}_a\bigotimes_{j=1}^{d} V_j$ is defined by
\[ \operatorname{rank}(\mathbf{v}) := \min\{r : \mathbf{v} \in \mathcal{R}_r\} \in \mathbb{N}_0 . \tag{3.20} \]
The definition makes sense since nonempty subsets of $\mathbb{N}_0$ always have a minimum, and the set $\{r : \mathbf{v} \in \mathcal{R}_r\}$ is nonempty by Remark 3.33. As in (2.6), we can characterise the set $\mathcal{R}_r$ by
\[ \mathcal{R}_r = \{\mathbf{v} \in \mathbf{V} : \operatorname{rank}(\mathbf{v}) \le r\}. \tag{3.18$^*$} \]
We shall use the shorter 'rank' instead of 'tensor rank'. Note that there is an ambiguity if $\mathbf{v}$ is a matrix as well as a Kronecker tensor (see Remark 3.38b). If necessary, we use the explicit terms 'matrix rank' and 'tensor rank'. Another ambiguity is caused by the isomorphisms discussed in §3.2.5.

Remark 3.36. Consider the isomorphic vector spaces $U \otimes_a V \otimes_a W$ and $U \otimes_a X$ with $X := V \otimes_a W$ from Example 3.30. Let $\Phi : U \otimes_a V \otimes_a W \to U \otimes_a X$ be the vector space isomorphism. Then $\operatorname{rank}(\mathbf{v})$ and $\operatorname{rank}(\Phi(\mathbf{v}))$ are in general different. If we identify $\mathbf{v}$ and $\Phi(\mathbf{v})$, the tensor structure should be explicitly mentioned, e.g., by writing $\operatorname{rank}_{U \otimes V \otimes W}(\mathbf{v})$ or $\operatorname{rank}_{U \otimes X}(\mathbf{v})$. The same statement holds for $\mathbf{V} := {}_a\bigotimes_{j=1}^{d} V_j$ and $V_k \otimes_a \mathbf{V}_{[k]}$ in (3.17a).

The latter remark shows that the rank depends on the tensor structure ($U \otimes_a X$ versus $U \otimes_a V \otimes_a W$). However, Lemma 3.39 will prove invariance with respect to tensor space isomorphisms.

Practically, it may be hard to determine the rank. It is not only that the rank is a discontinuous function so that any numerical rounding error may change the rank (as for the matrix rank). Even with exact arithmetic, computing the rank is, in general, not feasible for large-size tensors because of the next statement.
Proposition 3.37 (Håstad [159]). In general, the determination of the tensor rank is an NP-hard problem.

If $V_j$ are matrix spaces $\mathbb{K}^{I_j \times J_j}$, the tensor rank is also called the Kronecker rank.

Remark 3.38. (a) For $d = 2$, the rank of $\mathbf{v} \in V_1 \otimes_a V_2$ is given by $r$ in (3.14) and can be constructed as in the proof of Lemma 3.15. If, in addition, the spaces $V_j$ are finite dimensional, Proposition 3.16 yields an isomorphism between $V_1 \otimes_a V_2$ and matrices of the size $\dim(V_1) \times \dim(V_2)$. Independently of the bases chosen in Proposition 3.16, the matrix rank of the associated matrix coincides with the tensor rank.
(b) For $V_j = \mathbb{K}^{I_j \times J_j}$ the (Kronecker) tensors $\mathbf{A} \in \mathbf{V} := \bigotimes_{j=1}^{d} V_j$ are matrices. In this case the matrix rank of $\mathbf{A}$ is completely unrelated to the tensor rank of $\mathbf{A}$. For instance, the identity matrix $\mathbf{I} \in \mathbf{V}$ has (full) matrix rank $\prod_{j=1}^{d} \#I_j$, whereas the tensor rank of the elementary tensor $\mathbf{I} = \bigotimes_{j=1}^{d} I_j$ ($I_j = \mathrm{id} \in \mathbb{K}^{I_j \times I_j}$) is equal to 1.
(c) For tensors of order $d \in \{0, 1\}$, the rank is trivial:
\[ \operatorname{rank}(\mathbf{v}) = \begin{cases} 0 & \text{for } \mathbf{v} = 0, \\ 1 & \text{otherwise.} \end{cases} \]

So far, we have only considered algebraic tensor spaces $\mathbf{V}_{\mathrm{alg}} := {}_a\bigotimes_{j=1}^{d} V_j$. A Banach tensor space $\mathbf{V}_{\mathrm{top}} := {}_{\|\cdot\|}\bigotimes_{j=1}^{d} V_j$ (cf. (3.10)) is the closure (completion) of $\bigcup_{r \in \mathbb{N}_0} \mathcal{R}_r$ (cf. Remark 3.33). We can extend the definition of the tensor rank by⁵
\[ \operatorname{rank}(\mathbf{v}) := \infty \qquad \text{if} \quad \mathbf{v} \in {}_{\|\cdot\|}\bigotimes_{j=1}^{d} V_j \ \setminus\ {}_a\bigotimes_{j=1}^{d} V_j . \]
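The contrast between matrix rank and Kronecker rank in Remark 3.38(b) can be observed for $d = 2$. The sketch assumes square factors of sizes $n_1 = 2$, $n_2 = 3$ and uses an index rearrangement (a helper named `rearrange`, introduced here for illustration) that turns $A_1 \otimes A_2$ into a rank-1 matrix; by Remark 3.38(a) the matrix rank of the rearranged array then equals the tensor rank.

```python
import numpy as np

n1, n2 = 2, 3
I = np.kron(np.eye(n1), np.eye(n2))          # identity matrix as the elementary tensor I_1 ⊗ I_2

matrix_rank = np.linalg.matrix_rank(I)       # full matrix rank n1 * n2

def rearrange(A, n1, n2):
    """Map A[(i1,i2),(j1,j2)] to R[(i1,j1),(i2,j2)]; A1 ⊗ A2 becomes vec(A1) vec(A2)^T."""
    return A.reshape(n1, n2, n1, n2).transpose(0, 2, 1, 3).reshape(n1 * n1, n2 * n2)

# For d = 2 the Kronecker (tensor) rank is the matrix rank of the rearranged array.
kronecker_rank = np.linalg.matrix_rank(rearrange(I, n1, n2))
```

For $d \ge 3$ no such matrix computation is available in general, in line with Proposition 3.37.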
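Lemma 3.39 (rank invariance). Let $\mathbf{V} = \bigotimes_{j=1}^{d} V_j$ and $\mathbf{W} = \bigotimes_{j=1}^{d} W_j$ be either algebraic or topological tensor spaces.
(a) Assume that $\mathbf{V}$ and $\mathbf{W}$ are isomorphic tensor spaces, i.e., the vector spaces $V_j \cong W_j$ are isomorphic (cf. Definition 3.29). Let $\Phi = \bigotimes_{j=1}^{d} \phi^{(j)} : \mathbf{V} \to \mathbf{W}$ be an isomorphism. Then the tensor rank of $\mathbf{v} \in \mathbf{V}$ is invariant under $\Phi$:
\[ \operatorname{rank}(\mathbf{v}) = \operatorname{rank}(\Phi(\mathbf{v})) \qquad \text{for all } \mathbf{v} \in \mathbf{V}. \]
(b) Let $\mathbf{A} = \bigotimes_{j=1}^{d} A^{(j)} : \mathbf{V} \to \mathbf{W}$ with $A^{(j)} \in L(V_j, W_j)$. Then
\[ \operatorname{rank}(\mathbf{A}\mathbf{v}) \le \operatorname{rank}(\mathbf{v}) \qquad \text{for all } \mathbf{v} \in \mathbf{V}. \]

Proof. For Part (b) consider $\mathbf{v} = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} b_\nu^{(j)}$. Since the number of terms in $\mathbf{A}\mathbf{v} = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} \big(A^{(j)} b_\nu^{(j)}\big)$ is unchanged, $\operatorname{rank}(\mathbf{A}\mathbf{v}) \le \operatorname{rank}(\mathbf{v})$ follows. For Part (a) we use this inequality twice for $\Phi$ and $\Phi^{-1}$: $\operatorname{rank}(\mathbf{v}) \ge \operatorname{rank}(\Phi\mathbf{v}) \ge \operatorname{rank}(\Phi^{-1}\Phi\mathbf{v}) = \operatorname{rank}(\mathbf{v})$. □

⁵ This does not mean, in general, that $\mathbf{v}$ can be written as an infinite sum (but see §4.2.3.3 and Theorem 4.133).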
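Both parts of Lemma 3.39 can be observed for $d = 2$, where the tensor rank coincides with the matrix rank of the coefficient matrix (Remark 3.38a). The sketch assumes $V_1 = \mathbb{R}^4$, $V_2 = \mathbb{R}^5$ and random real factors; the helper name `apply` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
# v of rank 2 in R^4 ⊗ R^5, stored as its coefficient matrix.
v = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))

def apply(A1, A2, M):
    """(A1 ⊗ A2) applied to the tensor with coefficient matrix M."""
    return A1 @ M @ A2.T

# (a) Invertible factors (generic random matrices) keep the rank.
P1, P2 = rng.standard_normal((4, 4)), rng.standard_normal((5, 5))
rank_iso = np.linalg.matrix_rank(apply(P1, P2, v))

# (b) General linear factors can only lower it; here A1 has rank 1.
A1 = np.outer(rng.standard_normal(3), rng.standard_normal(4))
A2 = rng.standard_normal((5, 5))
rank_general = np.linalg.matrix_rank(apply(A1, A2, v))
```

The rank-1 factor `A1` collapses the first direction, so the image has rank at most 1, while the invertible pair leaves the rank at 2.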
Corollary 3.40. Remark 3.31 states a tensor space isomorphism $\Phi$ between a finite-dimensional tensor space $\mathbf{V} := \bigotimes_{j=1}^{d} V_j$ with $n_j := \dim(V_j)$ and its coefficient tensor space $\mathbf{W} := \bigotimes_{j=1}^{d} \mathbb{K}^{n_j}$. Let $\mathbf{a}$ be the coefficient tensor of $\mathbf{v}$. Then $\operatorname{rank}(\mathbf{v}) = \operatorname{rank}(\mathbf{a})$. As a consequence, $\Phi$ is a bijection between $\mathcal{R}_r(\mathbf{V})$ and $\mathcal{R}_r(\mathbf{W})$.

By the definition of the rank of algebraic tensors, there exists a representation
\[ \mathbf{v} = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} v_\nu^{(j)} \qquad \text{with } r := \operatorname{rank}(\mathbf{v}) < \infty. \tag{3.21} \]
Other names for (3.21) are 'rank decomposition' or 'rank-retaining decomposition'. Sometimes (3.21) is viewed as the generalisation of the singular-value decomposition (SVD) to $d \ge 3$. From the viewpoint of numerical analysis the opposite is true. SVD yields orthonormal systems, which are a good basis for stable computations. Concerning the vectors in (3.21) compare Remark 3.50. The singular values in SVD clearly indicate what terms can be omitted for a truncation. Analogously, the elementary tensor $\bigotimes_{j=1}^{d} v_\nu^{(j)}$ may be rewritten as $e_\nu := \sigma_\nu \bigotimes_{j=1}^{d} \hat{v}_\nu^{(j)}$ with normalised vectors $\hat{v}_\nu^{(j)} := v_\nu^{(j)} / \|v_\nu^{(j)}\|$. However, there may be a partial sum $\mathbf{v}' = \sum_\mu e_{\nu_\mu}$ of small size $\|\mathbf{v}'\|$ involving terms with large $\sigma_{\nu_\mu}$. Thus $\mathbf{v}'$ is a perfect candidate for truncation in spite of the large $\sigma_{\nu_\mu}$ (cf. §9.4).

The following lemma offers a necessary condition for (3.21). The proof is known from Lemma 3.15.

Lemma 3.41. Assume $r = \operatorname{rank}(\mathbf{v})$. Using $v_\nu^{(j)}$ in (3.21), define the elementary tensors
\[ \mathbf{v}_\nu^{[j]} := \bigotimes_{k \in \{1,\ldots,d\} \setminus \{j\}} v_\nu^{(k)} \in \mathbf{V}_{[j]} = {}_a\bigotimes_{k \in \{1,\ldots,d\} \setminus \{j\}} V_k \]
(cf. (3.17d)). Then (3.21) implies that the tensors $\{\mathbf{v}_\nu^{[j]} : 1 \le \nu \le r\}$ are linearly independent for all $1 \le j \le d$, while $v_\nu^{(j)} \ne 0$ for all $\nu$ and $j$.

Proof. Let $j = 1$. Assume that the elementary tensors $\{\mathbf{v}_\nu^{[1]} : 1 \le \nu \le r\}$ are linearly dependent. Without loss of generality, suppose that $\mathbf{v}_r^{[1]}$ may be expressed by the other tensors: $\mathbf{v}_r^{[1]} = \sum_{\nu=1}^{r-1} \beta_\nu \mathbf{v}_\nu^{[1]}$. Then
\[ \mathbf{v} = \sum_{\nu=1}^{r} v_\nu^{(1)} \otimes \mathbf{v}_\nu^{[1]} = \sum_{\nu=1}^{r-1} \big(v_\nu^{(1)} + \beta_\nu v_r^{(1)}\big) \otimes \mathbf{v}_\nu^{[1]} = \sum_{\nu=1}^{r-1} \big(v_\nu^{(1)} + \beta_\nu v_r^{(1)}\big) \otimes v_\nu^{(2)} \otimes \ldots \otimes v_\nu^{(d)} \]
implies $\operatorname{rank}(\mathbf{v}) < r$. Similarly, if $v_\nu^{(j)} = 0$, the $\nu$-th term can be omitted implying the contradiction $\operatorname{rank}(\mathbf{v}) < r$. Analogously, $j > 1$ is treated. □
3.2 Tensor Product
Remark 3.42. Note that Lemma 3.41 states linear independence only for the tensors $\mathbf{v}_\nu^{[1]}$, $1 \le \nu \le r$. The vectors $v_\nu^{(1)}$ are nonzero, but they may be linearly dependent. An example is the tensor in (3.24), which has rank 3, while all subspaces $U_j = \operatorname{span}\{v_\nu^{(j)} : 1 \le \nu \le 3\}$ have only dimension 2.

For symmetric tensors (cf. §3.5) a symmetric rank will be defined in Definition 3.83.
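3.2.6.3 Expansion with Respect to a Basis

Below we consider tensors belonging to a subspace $\mathbf{U} \subset \mathbf{V}$ that is also a tensor space $\mathbf{U} = {}_a\!\bigotimes_{j=1}^d U_j$ generated by subspaces $U_j \subset V_j$ (possibly $U_j = V_j$). The statement $\mathbf{u} \in \mathbf{U} \subset \mathbf{V}$ can be understood in two ways. It can be regarded as a tensor $\mathbf{u} = \sum_\nu \bigotimes_j v_\nu^{(j)}$ in $\mathbf{V}$, i.e., $v_\nu^{(j)} \in V_j$, which belongs to $\mathbf{U}$. On the other hand, the definition of $\mathbf{u} \in \mathbf{U} = {}_a\!\bigotimes_{j=1}^d U_j$ requires a representation by $\mathbf{u} = \sum_\nu \bigotimes_j u_\nu^{(j)}$ with $u_\nu^{(j)} \in U_j$ (cf. (3.9) and span property (i) in Proposition 3.22b). Both interpretations are equivalent as will be proved in §6.

Lemma 3.43. Let $U_j \subset V_j$ be subspaces generating the tensor (sub)space
$$\mathbf{U} = {}_a\!\bigotimes_{j=1}^{d} U_j \subset \mathbf{V}.$$
Assume that $\mathbf{u} = \sum_\nu \bigotimes_j v_\nu^{(j)}$ with $v_\nu^{(j)} \in V_j$ satisfies $\mathbf{u} \in \mathbf{U}$. Then there are $u_\nu^{(j)} \in U_j$ such that $\mathbf{u} = \sum_\nu \bigotimes_j u_\nu^{(j)}$.

For later use we introduce the following notation (3.22a). Let $U_d \subset V_d$ be a finite-dimensional subspace, consider a tensor⁶
$$\mathbf{v} \in V_1 \otimes V_2 \otimes \ldots \otimes V_{d-1} \otimes U_d = \mathbf{V}_{[d]} \otimes U_d,$$
and fix a basis $\{b_i^{(d)}\}$ of $U_d$. Any elementary tensor $\mathbf{v}^{[d]} \otimes v^{(d)}$ in $\mathbf{V}_{[d]} \otimes U_d$ can be written as $\mathbf{v}^{[d]} \otimes v^{(d)} = \mathbf{v}^{[d]} \otimes \sum_i \alpha_i b_i^{(d)} = \sum_i \mathbf{v}_i^{[d]} \otimes b_i^{(d)}$ with $\mathbf{v}_i^{[d]} := \alpha_i \mathbf{v}^{[d]}$. Using this representation for all terms of an algebraic tensor $\mathbf{v} = \sum_i \mathbf{v}_i^{[d]} \otimes v_i^{(d)} \in \mathbf{V}_{[d]} \otimes U_d$ ($v_i^{(d)} \in U_d$ by Lemma 3.43) we obtain a representation of the form
$$\mathbf{v} = \sum_i \mathbf{v}_{\langle i\rangle} \otimes b_i^{(d)} \qquad\text{with } \mathbf{v}_{\langle i\rangle} \in \mathbf{V}_{[d]} = \bigotimes_{j=1}^{d-1} V_j. \tag{3.22a}$$
The tensors $\mathbf{v}_{\langle i\rangle} \in \mathbf{V}_{[d]}$ are uniquely determined by the choice of the basis $\{b_i^{(d)}\}$ and the index $i$. An explicit description of $\mathbf{v}_{\langle i\rangle}$ will be given in Remark 3.65.

6 We may assume as well that $\mathbf{v} \in \mathbf{V}_{[j]} \otimes U_j$ for $j \ne d$. For the sake of a simple notation we choose $j = d$.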
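In the case of $d = 1$, $\mathbf{V}_{[d]}$ is the field $K$ and the expansion coefficients $\mathbf{v}_{\langle i\rangle}$ are scalars.

If $d \ge 2$ and $\mathbf{v} \in \bigotimes_{j=1}^{d-2} V_j \otimes U_{d-1} \otimes U_d$, we may fix bases of $U_{d-1}$ and $U_d$. Then $\mathbf{v}_{\langle i\rangle} \in \bigotimes_{j=1}^{d-2} V_j \otimes U_{d-1}$ in (3.22a) can again be expanded with respect to the basis $\{b_m^{(d-1)}\}$ of $U_{d-1}$, and we obtain $\mathbf{v}_{\langle i\rangle} = \sum_{m=1}^{r_{d-1}} \mathbf{v}_{\langle i\rangle\langle m\rangle} \otimes b_m^{(d-1)}$ and thus
$$\mathbf{v} = \sum_{i,m} \mathbf{v}_{\langle i\rangle\langle m\rangle} \otimes b_m^{(d-1)} \otimes b_i^{(d)}.$$
We introduce the shorter notation
$$\mathbf{v}_{\langle i,m\rangle} := \mathbf{v}_{\langle i\rangle\langle m\rangle} \in {}_a\!\bigotimes_{j=1}^{d-2} V_j. \tag{3.22b}$$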
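The expansion (3.22a) can be illustrated for $d = 2$ by a small numerical sketch. It assumes $V_1 = \mathbb{Q}^3$, $U_2 = \mathbb{Q}^2$, and an illustrative basis $b_1 = (1,1)$, $b_2 = (1,-1)$ of $U_2$; the coefficients $\mathbf{v}_{\langle i\rangle}$ are obtained by applying the dual functionals to the last factor, which is the description announced for Remark 3.65. The names `expand_last_mode`, `reassemble`, `basis` and `dual` are hypothetical and chosen only for this example.

```python
from fractions import Fraction as F

# Illustrative data (assumptions): v in V_1 (x) U_2, stored as a 3 x 2 array
# v[i1][i2] with respect to the unit vectors.
v = [[F(1), F(2)],
     [F(0), F(3)],
     [F(4), F(-1)]]

# Basis {b_1, b_2} of U_2 and dual functionals phi_i with phi_i(b_k) = delta_ik.
basis = [[F(1), F(1)], [F(1), F(-1)]]
dual = [[F(1, 2), F(1, 2)], [F(1, 2), F(-1, 2)]]

def expand_last_mode(v, dual):
    """Return the coefficients v<i> = (id (x) phi_i)(v) in V_1, one per i."""
    return [[sum(row[k] * phi[k] for k in range(len(phi))) for row in v]
            for phi in dual]

def reassemble(coeffs, basis):
    """Form sum_i v<i> (x) b_i as an array with the shape of v."""
    rows, cols = len(coeffs[0]), len(basis[0])
    return [[sum(coeffs[i][r] * basis[i][c] for i in range(len(basis)))
             for c in range(cols)] for r in range(rows)]

coeffs = expand_last_mode(v, dual)
assert reassemble(coeffs, basis) == v   # v = sum_i v<i> (x) b_i
```

Exact rational arithmetic is used so that the reconstruction can be compared with `==`; with floating-point data a tolerance would be needed instead.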
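3.2.6.4 Dependence on the Field

So far, the field $K$ has been fixed. Note that the 'real' tensor space $\mathbf{V}_{\mathbb{R}} := \bigotimes_{j=1}^d \mathbb{R}^{n_j}$ can be embedded into the 'complex' tensor space $\bigotimes_{j=1}^d \mathbb{C}^{n_j}$. Concerning the tensor rank, the following problem arises. Let $\mathbf{v} \in \mathbf{V}_{\mathbb{R}}$ be a 'real' tensor. The tensor rank is the minimal number $r = r_{\mathbb{R}}$ of terms in (3.21) with $v_\nu^{(j)} \in \mathbb{R}^{n_j}$. We may also ask for the minimal number $r = r_{\mathbb{C}}$ of terms in (3.21) under the condition that $v_\nu^{(j)} \in \mathbb{C}^{n_j}$. Since $\mathbb{R}^{I_j} \subset \mathbb{C}^{I_j}$, the inequality $r_{\mathbb{C}} \le r_{\mathbb{R}}$ is obvious, which already proves statement (a) below.

Proposition 3.44. Let $\mathbf{V}_{\mathbb{R}} = \bigotimes_{j=1}^d V_j$ be a tensor space over the field $\mathbb{R}$. Define $\mathbf{V}_{\mathbb{C}} = \bigotimes_{j=1}^d V_{j,\mathbb{C}}$ as the corresponding complex version over $\mathbb{C}$. Let $r_{\mathbb{R}}(\mathbf{v})$ be the (real) tensor rank within $\mathbf{V}_{\mathbb{R}}$, while $r_{\mathbb{C}}(\mathbf{v})$ is the (complex) tensor rank within $\mathbf{V}_{\mathbb{C}}$.
(a) For any $\mathbf{v} \in \mathbf{V}_{\mathbb{R}}$, the inequality $r_{\mathbb{C}}(\mathbf{v}) \le r_{\mathbb{R}}(\mathbf{v})$ holds.
(b) (vector and matrix case) If $0 \le d \le 2$, $r_{\mathbb{C}}(\mathbf{v}) = r_{\mathbb{R}}(\mathbf{v})$ holds for all $\mathbf{v} \in \mathbf{V}_{\mathbb{R}}$.
(c) (proper tensor case) If $d \ge 3$ and $\mathbf{V}_{\mathbb{R}} = \bigotimes_{j=1}^d V_j$ is a nondegenerate tensor space (cf. Definition 3.25), there are $\mathbf{v} \in \mathbf{V}_{\mathbb{R}}$ with strict inequality $r_{\mathbb{C}}(\mathbf{v}) < r_{\mathbb{R}}(\mathbf{v})$.

Proof. (i) Part (a) is already proved above.
(ii) If $d = 1$, Remark 3.38c shows that $r_{\mathbb{C}} = r_{\mathbb{R}}$. If $d = 2$, $\mathbf{v} \in \mathbf{V}_{\mathbb{R}}$ may be interpreted as a real-valued matrix $M \in \mathbb{R}^{I_1 \times I_2}$. By Remark 2.2, the matrix rank is independent of the field: $r_{\mathbb{C}}(\mathbf{v}) = r_{\mathbb{R}}(\mathbf{v})$.
(iii) Example 3.48 below presents a counterexample for which $r_{\mathbb{C}}(\mathbf{v}) < r_{\mathbb{R}}(\mathbf{v})$ in the case of $d = 3$. It may be easily embedded into tensor spaces with larger $d$ by setting $\mathbf{v}' := \mathbf{v} \otimes a_4 \otimes a_5 \otimes \ldots \otimes a_d$ with arbitrary $0 \ne a_j \in V_j$ ($4 \le j \le d$). □

Bergman [29] was the first to describe the rank dependence on $K$. Another dependence on the field will be mentioned in §3.2.6.5.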
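3.2.6.5 Maximal, Typical, and Generic Ranks

The sequence $\mathcal{R}_0 \subset \mathcal{R}_1 \subset \ldots \subset \mathbf{V} := {}_a\!\bigotimes_{j=1}^d V_j$ in (3.19a) is properly increasing for infinite-dimensional tensor spaces. On the other hand, for finite-dimensional tensor spaces there must be a smallest $r_{\max}$ so that $\mathcal{R}_r = \mathcal{R}_{r_{\max}}$ for all $r \ge r_{\max}$. As a consequence, $\mathbf{V} = \mathcal{R}_{r_{\max}}$, while $\mathcal{R}_{r_{\max}-1} \subsetneq \mathbf{V}$. This number $r_{\max} = r_{\max}(\mathbf{V})$ is called the maximal rank of $\mathbf{V}$ (see (2.5) for the matrix case).

Lemma 3.45. Let $n_j := \dim(V_j) < \infty$ for $1 \le j \le d$. Then
$$r_{\max} \le \Big(\prod_{j=1}^{d} n_j\Big) \Big/ \max_{1\le i\le d} n_i = \min_{1\le i\le d} \prod_{j\in\{1,\ldots,d\}\setminus\{i\}} n_j \tag{3.23}$$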
describes an upper bound of the maximal rank. For equal dimensions $n_j = n$, this is $r_{\max} \le n^{d-1}$. The statement can be extended to the case that one space $V_i$ is infinite dimensional: $r_{\max} \le \prod_{j\ne i} n_j$.

Proof. After a permutation of the factors we may assume that $n_d = \max_{1\le i\le d} n_i$. Consider the full representation (3.16) of any $\mathbf{v} \in \mathbf{V}$:
$$\mathbf{v} = \sum_{i_1,\ldots,i_{d-1},i_d} \mathbf{a}[i_1,\ldots,i_d]\; b_{i_1}^{(1)} \otimes \ldots \otimes b_{i_d}^{(d)} = \sum_{i_1,\ldots,i_{d-1}} b_{i_1}^{(1)} \otimes \ldots \otimes b_{i_{d-1}}^{(d-1)} \otimes \Big(\sum_{i_d} \mathbf{a}[i_1,\ldots,i_d]\, b_{i_d}^{(d)}\Big).$$
The sum in the last expression is taken over $\bar r := \prod_{j=1}^{d-1} n_j$ elementary tensors. Hence $\mathcal{R}_{\bar r} = \mathbf{V}$ proves $r_{\max} \le \bar r$. □
The true value $r_{\max}$ may be clearly smaller than the bound from above. For instance, Kruskal [204] proves
$$r_{\max} = \min\{n_1,n_2\} + \min\Big\{n_1,\, n_2,\, \Big\lfloor \frac{\max\{n_1,n_2\}}{2} \Big\rfloor\Big\} \quad\text{for } \mathbf{V} = \mathbb{R}^{n_1} \otimes \mathbb{R}^{n_2} \otimes \mathbb{R}^{2},$$
$$r_{\max} = 5 \quad\text{for } \mathbf{V} = \mathbb{R}^{3} \otimes \mathbb{R}^{3} \otimes \mathbb{R}^{3}.$$
For equal dimensions $n := n_1 = \ldots = n_d$, Howell [169] proves⁷ the asymptotic upper bound
$$r_{\max} \le d \sum_{0\le\nu\le\frac{n-d}{2}} (n-2\nu)^{d-1} = \frac{d}{2(d-1)}\, n^{d-1} + O(n^{d-2}).$$

7 Howell considers tensor spaces over a ring instead of a field $K$. The cited result holds for rings which are principal ideal domains. Note that fields are in particular principal ideal domains.
A general lower bound is proved by Howell [169] and Pamfilos [243]:
$$r_{\max} \ge \prod_{j=1}^{d} n_j \Big/ \Big(1 - d + \sum_{j=1}^{d} n_j\Big).$$
For $n_j = n$, this inequality implies $r_{\max} > n^{d-1}/d$.

Concerning the maximal rank, there is a remarkable difference to the matrix case. Random matrices and their rank are described in Remark 2.5. Random tensors may attain more than one rank with positive probability. Such ranks are called typical. Kruskal [204] proves that $\{2, 3\}$ are the typical ranks of $\mathbf{V} = \mathbb{R}^2 \otimes \mathbb{R}^2 \otimes \mathbb{R}^2$, while 3 is the maximal rank. Note that such results also depend on the field. For algebraically closed fields such as $\mathbb{C}$ there is only one typical rank (cf. Strassen [278], Comon–Golub–Lim–Mourrain [63]). In this case, the typical rank becomes the generic rank since almost all tensors have this rank (cf. Comon et al. [62]).
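As a small arithmetic illustration, the upper bound (3.23) and the lower bound of Howell and Pamfilos can be evaluated for the two formats whose maximal ranks are quoted above ($r_{\max} = 3$ for $\mathbb{R}^2 \otimes \mathbb{R}^2 \otimes \mathbb{R}^2$ and $r_{\max} = 5$ for $\mathbb{R}^3 \otimes \mathbb{R}^3 \otimes \mathbb{R}^3$). The function names `upper_bound` and `lower_bound` are hypothetical; the sketch only evaluates the two formulas and does not compute any tensor rank.

```python
from math import prod

def upper_bound(dims):
    """Bound (3.23): product of all dimensions divided by the largest one."""
    return prod(dims) // max(dims)

def lower_bound(dims):
    """Smallest integer r with r >= prod(n_j) / (1 - d + sum(n_j))."""
    d = len(dims)
    denom = 1 - d + sum(dims)
    return -(-prod(dims) // denom)   # ceiling division for positive integers

# Maximal ranks quoted in the text for the formats 2x2x2 and 3x3x3.
for dims, r_max in [((2, 2, 2), 3), ((3, 3, 3), 5)]:
    assert lower_bound(dims) <= r_max <= upper_bound(dims)
    print(dims, lower_bound(dims), r_max, upper_bound(dims))
```

In both cases the true value lies strictly between the two bounds (2 < 3 < 4 and 4 < 5 < 9), which shows that neither bound is sharp for these formats.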
3.2.6.6 Examples

As an illustration, we consider the tensor $\mathbf{v} \in V \otimes V$ defined by
$$\mathbf{v} = a \otimes a + b \otimes a + a \otimes b + b \otimes b,$$
where $a, b \in V$ are linearly independent. The given representation proves $\mathbf{v} \in \mathcal{R}_4$ and $\operatorname{rank}(\mathbf{v}) \le 4$. The fact that all four terms are linearly independent does not imply $\operatorname{rank}(\mathbf{v}) = 4$. In fact, another representation is $\mathbf{v} = (a + b) \otimes (a + b)$, proving $\operatorname{rank}(\mathbf{v}) = 1$ since $\mathbf{v} \ne 0$ excludes $\operatorname{rank}(\mathbf{v}) = 0$.

For later use we exercise the determination of the rank for a special tensor.

Lemma 3.46. (a) Let $V_j$ ($1 \le j \le 3$) be vector spaces of dimension $\ge 2$ and consider the tensor space $\mathbf{V} := V_1 \otimes V_2 \otimes V_3$. For linearly independent vectors $v^{(j)}, w^{(j)} \in V_j$ define
$$\mathbf{v} := v^{(1)} \otimes v^{(2)} \otimes w^{(3)} + v^{(1)} \otimes w^{(2)} \otimes v^{(3)} + w^{(1)} \otimes v^{(2)} \otimes v^{(3)}. \tag{3.24}$$
Then $\operatorname{rank}(\mathbf{v}) = 3$ holds; i.e., the given representation is already the shortest one.
(b) The generalisation to general $d$ yields the tensor
$$\mathbf{s}_d := \sum_{j=1}^{d} \Big(\bigotimes_{k=1}^{j-1} v^{(k)}\Big) \otimes w^{(j)} \otimes \Big(\bigotimes_{k=j+1}^{d} v^{(k)}\Big). \tag{3.25}$$
If $v^{(j)}, w^{(j)} \in V_j$ are linearly independent for each $1 \le j \le d$, the tensor satisfies $\operatorname{rank}(\mathbf{s}_d) = d$.
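Proof. The inductive proof of the more general Part (b) is due to Buczyński–Landsberg [45]. For $d = 1, 2$, the statement is trivial. Let the induction hypothesis $\operatorname{rank}(\mathbf{s}_{d-1}) = d-1$ hold. By the definition of $\mathbf{s}_d$, the inequality $r := \operatorname{rank}(\mathbf{s}_d) \le d$ holds; i.e., there are $u_\nu^{(j)} \in V_j$ with
$$\mathbf{s}_d = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} u_\nu^{(j)}.$$
Let $\varphi, \psi \in V_d'$ be dual to $v^{(d)}, w^{(d)}$; i.e., $\varphi(v^{(d)}) = \psi(w^{(d)}) = 1$ and $\psi(v^{(d)}) = \varphi(w^{(d)}) = 0$. Setting
$$\hat{\mathbf{v}}_j := \begin{cases} v^{(1)} \otimes \ldots \otimes v^{(j-1)} \otimes w^{(j)} \otimes v^{(j+1)} \otimes \ldots \otimes v^{(d-1)} & \text{for } j \le d-1, \\ v^{(1)} \otimes \ldots \otimes v^{(d-1)} & \text{for } j = d, \end{cases}$$
we observe that $\varphi(\mathbf{s}_d) = \mathbf{s}_{d-1}$ and $\psi(\mathbf{s}_d) = \hat{\mathbf{v}}_d$. Using the notation of Remark 3.64b (the functionals are applied to the last factor from $V_d$), we obtain on the other hand that
$$\varphi(\mathbf{s}_d) = \mathbf{s}_{d-1} = \sum_{\nu=1}^{r} \alpha_\nu \bigotimes_{j=1}^{d-1} u_\nu^{(j)}, \qquad \psi(\mathbf{s}_d) = \hat{\mathbf{v}}_d = \sum_{\nu=1}^{r} \beta_\nu \bigotimes_{j=1}^{d-1} u_\nu^{(j)}$$
holds with $\alpha_\nu := \varphi(u_\nu^{(d)})$ and $\beta_\nu := \psi(u_\nu^{(d)})$. The inductive hypothesis implies $\operatorname{rank}(\mathbf{s}_{d-1}) = d-1$ so that $r \ge d-1$.

Since $\mathbf{s}_{d-1}$ and $\hat{\mathbf{v}}_d$ are linearly independent, the vectors $\alpha = (\alpha_\nu)_{\nu=1}^{r}$ and $\beta = (\beta_\nu)_{\nu=1}^{r}$ must also be linearly independent. As $\beta \ne 0$, we may assume without loss of generality that $\beta_r$ does not vanish. Set $\lambda := \alpha_r/\beta_r$ and
$$\hat{\mathbf{s}}_{d-1} := \mathbf{s}_{d-1} - \lambda \hat{\mathbf{v}}_d = \sum_{\nu=1}^{r} (\alpha_\nu - \lambda\beta_\nu) \bigotimes_{j=1}^{d-1} u_\nu^{(j)} = \sum_{\nu=1}^{r-1} (\alpha_\nu - \lambda\beta_\nu) \bigotimes_{j=1}^{d-1} u_\nu^{(j)}.$$
The latter $(r-1)$-term representation shows that $\operatorname{rank}(\hat{\mathbf{s}}_{d-1}) \le r-1$. However, we verify that
$$\hat{\mathbf{s}}_{d-1} = \sum_{j=1}^{d-1} \Big(\bigotimes_{k=1}^{j-1} v^{(k)}\Big) \otimes \Big(w^{(j)} - \frac{\lambda}{d-1}\, v^{(j)}\Big) \otimes \Big(\bigotimes_{k=j+1}^{d-1} v^{(k)}\Big)$$
is of the form (3.25) with $v^{(j)}$ and $w^{(j)} - \frac{\lambda}{d-1} v^{(j)}$ being linearly independent. Hence the inductive hypothesis states that $\operatorname{rank}(\hat{\mathbf{s}}_{d-1}) = d-1$, which implies $r \ge d$. Together with the inequality $r \le d$ from the beginning, the statement is proved. □

Exercise 3.47. Consider $\mathbf{v} = \bigotimes_{j=1}^d v_j + \bigotimes_{j=1}^d w_j$ with nonvanishing vectors $v_j$ and $w_j$. Show that $\operatorname{rank}(\mathbf{v}) \le 1$ holds if and only if $v_j$ and $w_j$ are linearly dependent for at least $d-1$ indices $j \in \{1,\ldots,d\}$. Otherwise, $\operatorname{rank}(\mathbf{v}) = 2$.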
For the distinction of rankR and rankC we give the following example.
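Example 3.48. Let $a, b, c, a', b', c' \in \mathbb{R}^n$ with $n \ge 2$ such that $(a, a')$, $(b, b')$, $(c, c')$ are pairs of linearly independent vectors. The real part of the complex tensor $(a + ia') \otimes (b + ib') \otimes (c + ic') \in \mathbb{C}^n \otimes \mathbb{C}^n \otimes \mathbb{C}^n$ has a representation
$$\mathbf{v} = \tfrac12 (a + ia') \otimes (b + ib') \otimes (c + ic') + \tfrac12 (a - ia') \otimes (b - ib') \otimes (c - ic') \in \mathcal{R}_2$$
in $\mathbb{C}^n \otimes \mathbb{C}^n \otimes \mathbb{C}^n$. Exercise 3.47 proves that $\operatorname{rank}_{\mathbb{C}}(\mathbf{v}) = 2$ in $\mathbb{C}^n \otimes \mathbb{C}^n \otimes \mathbb{C}^n$. Multilinearity yields the representation
$$\mathbf{v} = a \otimes b \otimes c - a' \otimes b' \otimes c - a' \otimes b \otimes c' - a \otimes b' \otimes c'$$
within $\mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^n$. We verify that also
$$\mathbf{v} = (a - a') \otimes (b + b') \otimes c + a' \otimes b \otimes (c - c') - a \otimes b' \otimes (c + c'). \tag{3.26}$$
An additional reduction is not possible so that $\operatorname{rank}_{\mathbb{R}}(\mathbf{v}) = 3 > 2 = \operatorname{rank}_{\mathbb{C}}(\mathbf{v})$ is valid.

Proof. Assume that $\mathbf{v} = A \otimes B \otimes C + A' \otimes B' \otimes C'$. Applying suitable functionals to the first two components, we see that $C, C' \in \operatorname{span}\{c, c'\}$. If $C$ and $C'$ are linearly dependent, this leads to a quick contradiction. So assume that they are linearly independent and choose a functional $\gamma \in (\mathbb{R}^n)'$ with $\gamma(C) = 1$ and $\gamma(C') = 0$. Note that at least two of the numbers $\gamma(c)$, $\gamma(c - c')$, $\gamma(c + c')$ are nonzero. Hence application of $\mathrm{id} \otimes \mathrm{id} \otimes \gamma$ to $\mathbf{v} = A \otimes B \otimes C + A' \otimes B' \otimes C'$ yields $A \otimes B$ with matrix rank equal to 1, while the result for $\mathbf{v}$ in (3.26) is a linear combination of $(a - a') \otimes (b + b')$, $a' \otimes b$, and $a \otimes b'$, where at least two terms are present. One verifies that the matrix rank is 2. This contradiction excludes $\operatorname{rank}_{\mathbb{R}}(\mathbf{v}) = 2$. It is even easier to exclude the smaller ranks 1 and 0. □

The next example of different $n$-term representations over the real or complex field is taken from Mohlenkamp–Monzón [230] and Beylkin–Mohlenkamp [31].

Example 3.49. Consider the function $f(x_1,\ldots,x_d) := \sin\big(\sum_{j=1}^d x_j\big) \in \otimes^d V$ for $V = C(\mathbb{R})$. If $C(\mathbb{R})$ is regarded as a vector space over $K = \mathbb{C}$, $\operatorname{rank}(f) = 2$ holds and is realised by
$$\sin\Big(\sum_{j=1}^{d} x_j\Big) = \frac{1}{2i}\, e^{\,i\sum_{j=1}^d x_j} - \frac{1}{2i}\, e^{-i\sum_{j=1}^d x_j} = \frac{1}{2i} \bigotimes_{j=1}^{d} e^{ix_j} - \frac{1}{2i} \bigotimes_{j=1}^{d} e^{-ix_j}.$$
If $C(\mathbb{R})$ is considered as vector space over $K = \mathbb{R}$, the following representation needs $d$ terms:
$$\sin\Big(\sum_{j=1}^{d} x_j\Big) = \sum_{\nu=1}^{d} \Big(\bigotimes_{j=1}^{\nu-1} \frac{\sin(x_j + \alpha_j - \alpha_\nu)}{\sin(\alpha_j - \alpha_\nu)}\Big) \otimes \sin(x_\nu) \otimes \Big(\bigotimes_{j=\nu+1}^{d} \frac{\sin(x_j + \alpha_j - \alpha_\nu)}{\sin(\alpha_j - \alpha_\nu)}\Big)$$
with arbitrary $\alpha_j \in \mathbb{R}$ satisfying $\sin(\alpha_j - \alpha_\nu) \ne 0$ for all $j \ne \nu$.

Lim [212, §15.3] gives an example of $r_{\mathbb{R}}(\mathbf{v}) < r_{\mathbb{Q}}(\mathbf{v})$ involving the field $\mathbb{Q}$ of rational numbers.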
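The identities stated in Examples 3.48 and 3.49 can be checked numerically. The following sketch assumes $n = 2$ with illustrative integer vectors (primed vectors carry the suffix `p`) and, for the sine function, $d = 4$ with arbitrarily chosen sample points and shifts $\alpha_j$. All helper names are hypothetical. The code confirms only that the three representations of Example 3.48 and the $d$-term representation of Example 3.49 agree; it says nothing about minimality of the number of terms.

```python
import math

def outer3(x, y, z):
    """Elementary tensor x (x) y (x) z as a nested list."""
    return [[[xi * yj * zk for zk in z] for yj in y] for xi in x]

def combine(terms):
    """Linear combination sum_k coeff_k * tensor_k of 2x2x2 nested lists."""
    n = range(2)
    return [[[sum(c * t[i][j][k] for c, t in terms) for k in n]
             for j in n] for i in n]

def add(x, y):
    return [s + t for s, t in zip(x, y)]

def sub(x, y):
    return [s - t for s, t in zip(x, y)]

# Example 3.48: pairs (a, a'), (b, b'), (c, c') of linearly independent vectors.
a, ap = [1, 0], [0, 1]
b, bp = [1, 2], [3, 1]
c, cp = [2, 1], [1, 1]

z = outer3([complex(s, t) for s, t in zip(a, ap)],
           [complex(s, t) for s, t in zip(b, bp)],
           [complex(s, t) for s, t in zip(c, cp)])
real_part = [[[z[i][j][k].real for k in range(2)] for j in range(2)]
             for i in range(2)]

four_terms = combine([(1, outer3(a, b, c)), (-1, outer3(ap, bp, c)),
                      (-1, outer3(ap, b, cp)), (-1, outer3(a, bp, cp))])
three_terms = combine([(1, outer3(sub(a, ap), add(b, bp), c)),    # (3.26)
                       (1, outer3(ap, b, sub(c, cp))),
                       (-1, outer3(a, bp, add(c, cp)))])
assert real_part == four_terms == three_terms

def sin_sum_real(x, alpha):
    """Evaluate the d-term real representation of sin(x_1 + ... + x_d)."""
    total = 0.0
    for nu in range(len(x)):
        term = math.sin(x[nu])
        for j in range(len(x)):
            if j != nu:
                term *= (math.sin(x[j] + alpha[j] - alpha[nu])
                         / math.sin(alpha[j] - alpha[nu]))
        total += term
    return total

x = [0.3, -1.2, 0.7, 2.0]
alpha = [0.0, 0.4, 0.9, 1.5]   # sin(alpha_j - alpha_nu) != 0 for j != nu
assert abs(sin_sum_real(x, alpha) - math.sin(sum(x))) < 1e-9
```

Small integers are used in the first part so that the complex products are exact in floating-point arithmetic and the comparison with `==` is legitimate.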
In the case of $d = 2$, the $r$-term representation (3.21) corresponds to a matrix $M = \sum_{i=1}^{r} a_i b_i^{\mathsf T} \in K^{I\times J}$ with vectors $a_i \in K^I$ and $b_i \in K^J$. Here the minimal (tensor and matrix) rank $r$ is attained if the vectors $\{a_i\}$ and the vectors $\{b_i\}$ are linearly independent. Moreover, the singular-value decomposition yields the particular form $M = \sum_{i=1}^{r} \sigma_i a_i b_i^{\mathsf T}$ with $\sigma_i > 0$ and orthonormal $a_i$ and $b_i$. Generalisations of these properties to $d \ge 3$ are not valid.

Remark 3.50. (a) A true generalisation of the singular-value decomposition to $d$ dimensions would be $\mathbf{v} = \sum_{\nu=1}^{r} \sigma_\nu \bigotimes_{j=1}^{d} v_\nu^{(j)} \in \mathbf{V} := \bigotimes_{j=1}^{d} K^{n_j}$ with $r = \operatorname{rank}(\mathbf{v})$, orthonormal vectors $\{v_\nu^{(j)} : 1 \le \nu \le r\}$ for all $1 \le j \le d$, and $\sigma_\nu > 0$. Unfortunately, such tensors form only a small subset of $\mathbf{V}$; i.e., in general, $\mathbf{v} \in \mathbf{V}$ does not possess such a representation.
(b) Even the requirement that the vectors $\{v_\nu^{(j)} : 1 \le \nu \le r\}$ be linearly independent cannot be satisfied in general.

Proof. The tensor $a \otimes a \otimes a + a \otimes b \otimes b$ cannot be reduced to rank $\le 1$, although the first factors are equal. This proves Part (b), while (b) implies (a). □
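3.2.6.7 Application: Strassen's Algorithm

Standard matrix-matrix multiplication of two $n \times n$ matrices costs $2n^3$ operations. A reduction to $4.7\,n^{\log_2 7} = 4.7\,n^{2.8074}$ proposed by Strassen [277] is based on the fact that two $2 \times 2$ block matrices can be multiplied as follows:
$$\begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} \begin{bmatrix} b_1 & b_2 \\ b_3 & b_4 \end{bmatrix}, \qquad a_i, b_i, c_i \text{ submatrices}, \tag{3.27}$$
with
$$c_1 = m_1 + m_4 - m_5 + m_7, \quad c_2 = m_3 + m_5, \quad c_3 = m_2 + m_4, \quad c_4 = m_1 + m_3 - m_2 + m_6,$$
$$m_1 = (a_1 + a_4)(b_1 + b_4), \quad m_2 = (a_3 + a_4)\,b_1, \quad m_3 = a_1 (b_2 - b_4), \quad m_4 = a_4 (b_3 - b_1),$$
$$m_5 = (a_1 + a_2)\,b_4, \quad m_6 = (a_3 - a_1)(b_1 + b_2), \quad m_7 = (a_2 - a_4)(b_3 + b_4),$$
where only 7 multiplications of block matrices are needed. With the row-wise numbering of the blocks in (3.27), $c_2 = a_1 b_2 + a_2 b_4$ is the upper right block and $c_3 = a_3 b_1 + a_4 b_3$ the lower left one. The entries of the following tensor $\mathbf{m} \in \otimes^3 K^4$ are defined by
$$c_\nu = \sum_{\mu,\lambda=1}^{4} \mathbf{m}_{\mu\lambda\nu}\, a_\mu b_\lambda \qquad (1 \le \nu \le 4). \tag{3.28a}$$
For instance for $\nu = 1$, the identity $c_1 = a_1 b_1 + a_2 b_3$ shows that $\mathbf{m}_{111} = \mathbf{m}_{231} = 1$, and $\mathbf{m}_{\mu\lambda 1} = 0$ otherwise. Assume a representation of $\mathbf{m}$ using $r$ terms: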
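$$\mathbf{m} = \sum_{i=1}^{r} \bigotimes_{j=1}^{3} m_i^{(j)}. \tag{3.28b}$$
Insertion into (3.28a) yields
$$c_\nu = \sum_{i=1}^{r} \sum_{\mu,\lambda=1}^{4} m_i^{(1)}[\mu]\; m_i^{(2)}[\lambda]\; m_i^{(3)}[\nu]\; a_\mu b_\lambda = \sum_{i=1}^{r} m_i^{(3)}[\nu] \Big(\sum_{\mu=1}^{4} m_i^{(1)}[\mu]\, a_\mu\Big) \Big(\sum_{\lambda=1}^{4} m_i^{(2)}[\lambda]\, b_\lambda\Big),$$
i.e., only $r$ multiplications of block matrices are needed. Algorithm (3.27) corresponds to a representation (3.28b) with $r = 7$, implying $\operatorname{rank}(\mathbf{m}) \le 7$.

Theorem 3.51. $\operatorname{rank}(\mathbf{m}) = \underline{\operatorname{rank}}(\mathbf{m}) = 7$ (see Definition 9.14 for the border rank $\underline{\operatorname{rank}}$).

Proof. The first equation $\operatorname{rank}(\mathbf{m}) = 7$ is proved by Winograd [304]. For the stronger statement $\underline{\operatorname{rank}}(\mathbf{m}) = 7$ see Landsberg [206, Theorem 11.0.2.11]. □

The general matrix-matrix multiplication is discussed in Example 3.72.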
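The correspondence between the seven products and a 7-term representation (3.28b) can be checked directly. The sketch below assumes the row-wise block numbering of (3.27), for which the products combine as $c_1 = m_1 + m_4 - m_5 + m_7$, $c_2 = m_3 + m_5$, $c_3 = m_2 + m_4$, $c_4 = m_1 + m_3 - m_2 + m_6$. The arrays `U`, `V`, `W` (hypothetical names) hold the factors $m_i^{(1)}, m_i^{(2)}, m_i^{(3)}$ read off from the definitions of $m_1, \ldots, m_7$. Scalars are used in place of block matrices; since the order of the factors $a_\mu$ and $b_\lambda$ is preserved, the same formulas apply to noncommuting blocks.

```python
from itertools import product

# Row i holds the coefficients of product m_{i+1}:
# m_{i+1} = (sum_mu U[i][mu] * a_mu) * (sum_la V[i][la] * b_la),
# and W[i][nu] is the coefficient of m_{i+1} in c_{nu+1}.
U = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
     [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]]
V = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
     [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
W = [[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
     [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]

def matmul_tensor():
    """Tensor m of (3.28a) for 2 x 2 block matrices (0-based indices)."""
    m = [[[0] * 4 for _ in range(4)] for _ in range(4)]
    # Block (p, q) of the product is sum_s A(p, s) B(s, q).
    for p, q, s in product(range(2), repeat=3):
        m[2 * p + s][2 * s + q][2 * p + q] = 1
    return m

def seven_term_tensor():
    """Right-hand side of (3.28b) with r = 7."""
    return [[[sum(U[i][mu] * V[i][la] * W[i][nu] for i in range(7))
              for nu in range(4)] for la in range(4)] for mu in range(4)]

def strassen_2x2(a, b):
    """Compute [c1, c2, c3, c4] from [a1..a4], [b1..b4] with 7 products."""
    prods = [sum(U[i][mu] * a[mu] for mu in range(4))
             * sum(V[i][la] * b[la] for la in range(4)) for i in range(7)]
    return [sum(W[i][nu] * prods[i] for i in range(7)) for nu in range(4)]

assert matmul_tensor() == seven_term_tensor()
assert strassen_2x2([1, 2, 3, 4], [5, 6, 7, 8]) == [19, 22, 43, 50]
```

The first assertion confirms $\operatorname{rank}(\mathbf{m}) \le 7$ constructively; the equality in Theorem 3.51 is a separate statement that the code does not address.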
3.3 Linear and Multilinear Mappings

Now we consider linear mappings defined on $V \otimes_a W$ or, more generally, on ${}_a\!\bigotimes_{j=1}^d V_j$. The image space might be the field $K$ (then the linear mappings are called linear forms or functionals) or another tensor space.

In §3.3.1 we shall justify that it suffices to define a (multi-)linear mapping by its values for elementary tensors. Often a linear mapping $\varphi_k : V_k \to W_k$ for a fixed $k$ is extended to a linear mapping defined on $\mathbf{V} := {}_a\!\bigotimes_{j=1}^d V_j$. This leads to an embedding explained in §3.3.2.

Functionals are a subclass of linear maps. Nevertheless, there are special properties which are addressed in §3.3.2.4.

3.3.1 Definition on the Set of Elementary Tensors

If a linear mapping $\phi \in L(V, W)$ has to be defined on a vector space $V$ spanned by a basis $\{v_j\}$, it is sufficient to describe the images $\phi(v_j)$ (cf. Remark 3.5). In the case of a linear mapping $\phi \in L(\mathbf{V}, X)$ we know that $\mathbf{V}$ is spanned by elementary tensors $\mathbf{e} := \bigotimes_{j=1}^d v^{(j)}$. Hence it is sufficient to know the image $\phi(\mathbf{e}) \in X$ of the elementary tensors. Note that $\phi(\mathbf{e})$ cannot take arbitrary values since elementary tensors are not linearly independent.
Lemma 3.52. A linear mapping $\phi : \mathbf{V} := {}_a\!\bigotimes_{j=1}^d V_j \to X$ is completely defined by its values for elementary tensors.

Remark 3.53. Often the value of $\phi\big(\bigotimes_{j=1}^d v^{(j)}\big)$ is given in terms of the vectors $v^{(1)}, \ldots, v^{(d)}$. That means there is a mapping $\Phi : V_1 \times \ldots \times V_d \to X$ with
$$\phi\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big) = \Phi(v^{(1)}, \ldots, v^{(d)}) \qquad\text{for all } v^{(j)} \in V_j. \tag{3.29}$$
In this case we have to require that $\Phi : V_1 \times \ldots \times V_d \to X$ be multilinear. Then the 'universality of the tensor product' formulated in Proposition 3.23 implies that (3.29) defines a unique linear mapping $\phi$.

Example 3.54. Let $\varphi : V \to U$ be a linear mapping. Then the definition
$$\phi(v \otimes w) := \varphi(v) \otimes w \qquad\text{for all } v \in V,\ w \in W$$
defines a unique linear mapping $\phi$ from $V \otimes_a W$ into $X = U \otimes_a W$. In this example the mapping $\Phi : V \times W \to X$ is given by $\Phi(v^{(1)}, v^{(2)}) = \varphi(v^{(1)}) \otimes v^{(2)}$. Linearity of $\varphi$ implies multilinearity of $\Phi$.
3.3.2 Embeddings

3.3.2.1 Embedding of Spaces of Linear Maps

Next we consider two $d$-tuples $(V_1, \ldots, V_d)$ and $(W_1, \ldots, W_d)$ of vector spaces and the corresponding tensor spaces $\mathbf{V} := {}_a\!\bigotimes_{j=1}^d V_j$ and $\mathbf{W} := {}_a\!\bigotimes_{j=1}^d W_j$. Since $L(V_j, W_j)$ ($1 \le j \le d$) are again vector spaces, we can build the tensor space
$$\mathbf{L} := {}_a\!\bigotimes_{j=1}^{d} L(V_j, W_j).$$
Elementary tensors of $\mathbf{L}$ are of the form $\Phi = \bigotimes_{j=1}^d \varphi^{(j)}$ with $\varphi^{(j)} \in L(V_j, W_j)$. In §1.1.2 we called $\Phi$ the Kronecker product⁸ of the mappings $\varphi^{(j)}$. They have a natural interpretation as mappings of $L(\mathbf{V}, \mathbf{W})$ via (1.4b):
$$\Phi\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big) = \bigotimes_{j=1}^{d} \varphi^{(j)}(v^{(j)}) \in \mathbf{W} \qquad\text{for any } \bigotimes_{j=1}^{d} v^{(j)} \in \mathbf{V}. \tag{3.30a}$$

8 At least, this term is used for matrix spaces $L(V_j, W_j)$ with $V_j = K^{n_j}$ and $W_j = K^{m_j}$. As mentioned in §1.6, the attribution to Kronecker is questionable.
Note that (3.30a) defines $\Phi$ for all elementary tensors of $\mathbf{V}$. The right-hand side in (3.30a) is multilinear in $v^{(j)}$. Therefore Remark 3.53 proves that $\Phi$ can be uniquely extended to $\Phi \in L(\mathbf{V}, \mathbf{W})$. Linear combinations of such elementary tensors $\Phi$ are again elements of $L(\mathbf{V}, \mathbf{W})$. This leads to the embedding described below.

Proposition 3.55. Let $V_j$, $W_j$, $\mathbf{V}$, $\mathbf{W}$ be as above. We identify ${}_a\!\bigotimes_{j=1}^d L(V_j, W_j)$ with a subspace of $L(\mathbf{V}, \mathbf{W})$ via (3.30a):
$$\mathbf{L} = {}_a\!\bigotimes_{j=1}^{d} L(V_j, W_j) \subset L(\mathbf{V}, \mathbf{W}). \tag{3.30b}$$
If at least two of the vector spaces $V_j$ are infinite dimensional and $\dim(\mathbf{W}) > 0$, $\mathbf{L}$ is a proper subspace. On the other hand, if all vector spaces $V_j$ are finite dimensional, the spaces coincide:
$${}_a\!\bigotimes_{j=1}^{d} L(V_j, W_j) = L(\mathbf{V}, \mathbf{W}). \tag{3.30c}$$
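Proof. (a) Definition (3.30a) yields a linear mapping $\Upsilon : \mathbf{L} \to L(\mathbf{V}, \mathbf{W})$. $\Upsilon$ describes an embedding if and only if $\Upsilon$ is injective. For this purpose, we use induction on $d$ and start with $d = 2$. We have to disprove $\Upsilon(\Lambda) = 0$ for $0 \ne \Lambda \in \mathbf{L} = L(V_1, W_1) \otimes_a L(V_2, W_2)$. If $\Upsilon(\Lambda) = 0$, the interpretation (3.30a) of $\Lambda$ produces the zero mapping $\Upsilon(\Lambda)$ in $L(\mathbf{V}, \mathbf{W})$. By Lemma 3.15, there exists a representation $\Lambda = \sum_{\nu=1}^{r} \varphi_\nu^{(1)} \otimes \varphi_\nu^{(2)}$ with linearly independent $\varphi_\nu^{(2)}$ and⁹ $\varphi_1^{(1)} \ne 0$. Application to any $\mathbf{v} := v^{(1)} \otimes v^{(2)}$ yields
$$0 = \Lambda(\mathbf{v}) = \sum_{\nu=1}^{r} \varphi_\nu^{(1)}(v^{(1)}) \otimes \varphi_\nu^{(2)}(v^{(2)}) \in \mathbf{W} = W_1 \otimes_a W_2.$$
Fix $v^{(1)}$ such that $\varphi_1^{(1)}(v^{(1)}) \ne 0$. Then there is some functional $\chi \in W_1'$ with $\chi(\varphi_1^{(1)}(v^{(1)})) \ne 0$. Application of $\chi \otimes \mathrm{id}$ to $\Lambda(\mathbf{v})$ yields
$$0 = (\chi \otimes \mathrm{id})(\Lambda(\mathbf{v})) = \sum_{\nu=1}^{r} \alpha_\nu\, \varphi_\nu^{(2)}(v^{(2)}) \in W_2$$
(cf. Remark 3.64) with $\alpha_\nu := \chi(\varphi_\nu^{(1)}(v^{(1)}))$. The choice of $v^{(1)}$ and $\chi$ ensures that $\alpha_1 \ne 0$. Linear independence of $\{\varphi_\nu^{(2)}\}$ implies $\sum_{\nu=1}^{r} \alpha_\nu \varphi_\nu^{(2)} \ne 0$. Hence there exists some $v^{(2)} \in V_2$ with $\sum_{\nu=1}^{r} \alpha_\nu \varphi_\nu^{(2)}(v^{(2)}) \ne 0$, in contradiction to $0 = (\chi \otimes \mathrm{id})\big(\Lambda(v^{(1)} \otimes v^{(2)})\big)$. This proves injectivity of $\Upsilon$ for $d = 2$.

9 In fact, $\varphi_\nu^{(1)}$ are linearly independent, but only $\varphi_1^{(1)} \ne 0$ is needed.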
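Let the assertion be valid for $d-1$. Represent $\Lambda$ in the form
$$\Lambda = \sum_{\nu=1}^{r} \varphi_\nu^{(1)} \otimes \varphi_\nu^{[1]} \qquad\text{with } r = \operatorname{rank}_{V_1 \otimes \mathbf{V}_{[1]}}(\Lambda)$$
(cf. Remark 3.36). As stated in Lemma 3.41, $\{\varphi_\nu^{[1]}\}$ is linearly independent, while $\varphi_\nu^{(1)} \ne 0$. By induction, $\varphi_\nu^{[1]} \in L(\mathbf{V}_{[1]}, \mathbf{W}_{[1]})$ holds with $\mathbf{V}_{[1]} := {}_a\!\bigotimes_{j=2}^d V_j$ and $\mathbf{W}_{[1]} := {}_a\!\bigotimes_{j=2}^d W_j$. Now all arguments from the inductive start $d = 2$ can be repeated.

(b) Assume that all vector spaces $V_j$ and $W_j$ are finite dimensional. Then both $\mathbf{L}$ and $L(\mathbf{V}, \mathbf{W})$ have the same dimension $\prod_{j=1}^d (\dim V_j \dim W_j)$. Therefore the inclusion (3.30b) implies (3.30c).

(c) Now only the spaces $V_j$ are required to be finite dimensional. Fix some $\Phi \in L(\mathbf{V}, \mathbf{W})$. Since $\mathbf{V}$ is finite dimensional, also the image $\Phi(\mathbf{V}) \subset \mathbf{W}$ is finite dimensional. Hence $\hat W_j := U_j^{\min}(\Phi(\mathbf{V})) \subset W_j$ and $\hat{\mathbf{W}} = \bigotimes_{j=1}^d \hat W_j$ are finite dimensional ($U_j^{\min}(\cdot)$ defined in §6). Since $\Phi \in L(\mathbf{V}, \hat{\mathbf{W}})$, part (b) proves that $\Phi \in {}_a\!\bigotimes_{j=1}^d L(V_j, \hat W_j) \subset {}_a\!\bigotimes_{j=1}^d L(V_j, W_j) = \mathbf{L}$.

(d) If at least two vector spaces $V_j$ are infinite dimensional and $\dim(\mathbf{W}) > 0$, the counterexample $L(\ell_0 \otimes_a \ell_0, K)$ of Example 3.63 can be embedded into the space $L(\mathbf{V}, \mathbf{W})$. □

Proposition 3.55 does not state what happens if exactly one space $V_j$ is infinite dimensional. In this case the answer depends on the dimensions of $W_j$. If at most one of these spaces is infinite dimensional, $\mathbf{L} = L(\mathbf{V}, \mathbf{W})$ holds; otherwise $\mathbf{L} \subsetneq L(\mathbf{V}, \mathbf{W})$. Two canonical cases are explained below.

Remark 3.56. (a) Let $\mathbf{V} = V_1 \otimes V_2$ and $\mathbf{W} = W_1 \otimes W_2$ with finite-dimensional spaces $V_2$ and $W_2$. Let $\{v_\nu^{(j)} : \nu \in B_j\}$ be bases of $V_j$, while $\{w_k^{(2)} : 1 \le k \le n\}$ with $n := \dim(W_2) < \infty$ is a basis of $W_2$. A linear map $\Phi \in L(\mathbf{V}, \mathbf{W})$ is characterised by the images $\mathbf{w}_{\nu,\mu} := \Phi(v_\nu^{(1)} \otimes v_\mu^{(2)})$. There is a unique representation of the form $\mathbf{w}_{\nu,\mu} = \sum_{k=1}^{n} w_{\nu,\mu,k} \otimes w_k^{(2)}$ (cf. (3.22a)). Let $\{\hat v_\nu^{(j)} : \nu \in B_j\}$ ($j = 1, 2$) be the dual bases. Then the mapping
$$\sum_{k=1}^{n} \sum_{\mu \in B_2} \varphi_{\mu,k} \otimes \psi_{\mu,k} \in \mathbf{L} \qquad\text{with}\quad \begin{cases} \varphi_{\mu,k}(v) := \sum_{\nu\in B_1} \hat v_\nu^{(1)}(v)\, w_{\nu,\mu,k} & \text{for } v \in V_1, \\ \psi_{\mu,k}(v) := \hat v_\mu^{(2)}(v)\, w_k^{(2)} & \text{for } v \in V_2 \end{cases}$$
coincides with $\Phi$. Note that $\hat v_\nu^{(1)}(v)$ is nonzero for only finitely many $\nu \in B_1$ so that $\varphi_{\mu,k} \in L(V_1, W_1)$. Hence, $\mathbf{L} = L(\mathbf{V}, \mathbf{W})$.

(b) Let $\mathbf{V} = V_1 \otimes K$ and $\mathbf{W} = W_1 \otimes W_2$ with infinite-dimensional spaces $V_1, W_1, W_2$. Let $\{b_\nu\}$ be a basis of $V_1$ and define $\Phi \in L(\mathbf{V}, \mathbf{W})$ by $\Phi(b_\nu \otimes 1) = \mathbf{w}_\nu$ with $\operatorname{rank}(\mathbf{w}_\nu) \ge \nu$. Any mapping $\phi \in \mathbf{L}$ has some finite rank $r_\phi$ so that $\operatorname{rank}(\phi(b_\nu \otimes 1)) \le r_\phi$ (cf. Exercise 3.60). This shows $\phi(b_\nu \otimes 1) \ne \Phi(b_\nu \otimes 1)$ for $\nu > r_\phi$ and proves $\Phi \notin \mathbf{L}$, i.e., $\mathbf{L} \subsetneq L(\mathbf{V}, \mathbf{W})$.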
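3.3.2.2 Notational Details

In Example 3.54 we use the mapping $v \otimes w \mapsto \varphi(v) \otimes w$, which can be generalised to
$$\Phi : \bigotimes_{j=1}^{d} v^{(j)} \mapsto v^{(1)} \otimes \ldots \otimes v^{(k-1)} \otimes \varphi\big(v^{(k)}\big) \otimes v^{(k+1)} \otimes \ldots \otimes v^{(d)}.$$
The latter mapping can be formulated as
$$\Phi = \bigotimes_{j=1}^{d} \varphi^{(j)} \qquad\text{with}\quad \begin{cases} \varphi^{(j)} = \varphi,\ W_j := W & \text{for } j = k, \\ \varphi^{(j)} = \mathrm{id},\ W_j := V_j & \text{for } j \ne k, \end{cases} \tag{3.31a}$$
or $\Phi = \mathrm{id} \otimes \ldots \otimes \mathrm{id} \otimes \varphi \otimes \mathrm{id} \otimes \ldots \otimes \mathrm{id}$. Since such a notation is rather cumbersome, we identify¹⁰ $\varphi$ and $\Phi$ as stated below.

Notation 3.57. (a) Let $k \in \{1, \ldots, d\}$ be fixed. A mapping $\varphi \in L(V_k, W_k)$ is synonymously interpreted as $\Phi$ in (3.31a). This defines the embedding
$$L(V_k, W_k) \subset L(\mathbf{V}, \mathbf{W}), \tag{3.31b}$$
where $\mathbf{V} = {}_a\!\bigotimes_{j=1}^d V_j$ and $\mathbf{W} = {}_a\!\bigotimes_{j=1}^d W_j$ with $W_j = V_j$ for $j \ne k$.

(b) Let $\alpha \subset \{1, \ldots, d\}$ be a nonempty subset. Then the embedding
$${}_a\!\bigotimes_{k\in\alpha} L(V_k, W_k) \subset L(\mathbf{V}, \mathbf{W})$$
is defined analogously by inserting identity maps for all $j \in \{1, \ldots, d\}\setminus\alpha$. If we want to distinguish the map $\varphi_\alpha \in {}_a\!\bigotimes_{k\in\alpha} L(V_k, W_k)$ from $\Phi \in L(\mathbf{V}, \mathbf{W})$, we denote $\Phi$ by $\varphi_\alpha \otimes \mathrm{id}_{\alpha^c}$, which indicates that the identity $\mathrm{id}_{\alpha^c}$ is applied to ${}_a\!\bigotimes_{k\in\alpha^c} V_k$ corresponding to the complement $\alpha^c = \{1, \ldots, d\}\setminus\alpha$. If $\alpha = \{j\}$, the notation $\mathrm{id}_{\alpha^c} = \mathrm{id}_{[j]}$ is used (cf. §5.2.1).

We repeat the composition rule, which in §4.6.3 is formulated for Kronecker matrices.

Remark 3.58. Composing the elementary tensors $\Psi = \bigotimes_{j=1}^d \psi^{(j)} \in L(\mathbf{U}, \mathbf{V})$ and $\Phi = \bigotimes_{j=1}^d \varphi^{(j)} \in L(\mathbf{V}, \mathbf{W})$ yields
$$\Phi \circ \Psi = \bigotimes_{j=1}^{d} \big(\varphi^{(j)} \circ \psi^{(j)}\big) \in L(\mathbf{U}, \mathbf{W}).$$

10 The identifications (3.31a–c) are standard in other fields. If we use the multi-index notation $\partial^{n} f = \partial_x^{n_1} \partial_y^{n_2} \partial_z^{n_3} f$ for the partial derivative, the identities are expressed by $\partial_y^{n_2} = \partial_z^{n_3} = \mathrm{id}$ if, e.g., $n_2 = n_3 = 0$. However, more often $(\partial/\partial x)^{n_1} f(x, y, z)$ is written, omitting the identities in $\partial^{n_1}/\partial x^{n_1} \otimes \mathrm{id} \otimes \mathrm{id}$.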
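For matrices, the composition rule of Remark 3.58 is the mixed-product property of the Kronecker product. The following sketch checks it for $d = 2$ with small illustrative integer matrices; `matmul` and `kron` are hypothetical helper names, and the row-major block convention of `kron` is an assumption of this example (any fixed ordering of the index pairs works as long as it is used consistently).

```python
def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product: entry ((i,k),(j,l)) equals a[i][j] * b[k][l]."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

# psi^(j): U_j -> V_j and phi^(j): V_j -> W_j as matrices (illustrative data).
psi1 = [[1, 0], [2, 1], [0, 3]]      # 3 x 2
psi2 = [[2, 1], [0, 1]]              # 2 x 2
phi1 = [[1, 2, 0], [0, 1, -1]]       # 2 x 3
phi2 = [[1, -1]]                     # 1 x 2

lhs = matmul(kron(phi1, phi2), kron(psi1, psi2))        # Phi o Psi
rhs = kron(matmul(phi1, psi1), matmul(phi2, psi2))      # tensor product of compositions
assert lhs == rhs
```

The practical point is that the right-hand side only multiplies the small factors, whereas the left-hand side forms the large Kronecker matrices first.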
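The next statements are obvious conclusions (use that id commutes with any map).

Conclusion 3.59. (a) Let $\Phi$ and $\Psi$ be as above and assume that $\varphi^{(j)}$ and $\psi^{(j)}$ commute: $\varphi^{(j)} \circ \psi^{(j)} = \psi^{(j)} \circ \varphi^{(j)}$ ($1 \le j \le d$). Then also $\Phi \circ \Psi = \Psi \circ \Phi$ holds.
(b) Let $\varphi^{(\alpha)} \in L(\mathbf{V}_\alpha, \mathbf{W}_\alpha)$ and $\varphi^{(\beta)} \in L(\mathbf{V}_\beta, \mathbf{W}_\beta)$ belong to disjoint subsets $\alpha, \beta$ of $\{1, \ldots, d\}$ and $\mathbf{V}_\alpha = {}_a\!\bigotimes_{j\in\alpha} V_j$, etc. (cf. §5.2.1). Then the mappings $\Phi = \varphi^{(\alpha)} \otimes \mathrm{id}_{\alpha^c}$ and $\Psi = \varphi^{(\beta)} \otimes \mathrm{id}_{\beta^c}$ commute. Moreover, the composition is $\Phi \circ \Psi = \varphi^{(\alpha)} \otimes \varphi^{(\beta)} \otimes \mathrm{id}_{\{1,\ldots,d\}\setminus(\alpha\cup\beta)}$.
(c) Let $\varphi^{(j)} \in L(V_j, W_j)$ for all $1 \le j \le d$. Then $\varphi^{(1)} \circ \varphi^{(2)} \circ \ldots \circ \varphi^{(d)}$ interpreted by (3.31b) is equal to $\bigotimes_{j=1}^d \varphi^{(j)}$.

3.3.2.3 Kronecker Matrices and Rank Amplification

Now we assume that $V_j = K^{n_j}$ and $W_j = K^{m_j}$. In this case the linear maps $\varphi^{(j)} \in L(V_j, W_j)$ can be realised by matrices $A^{(j)} \in K^{m_j \times n_j}$. The tensor product $\Phi = \bigotimes_{j=1}^d \varphi^{(j)} \in L(\mathbf{V}, \mathbf{W})$ corresponds to the Kronecker matrix
$$\mathbf{A} = \bigotimes_{j=1}^{d} A^{(j)}.$$
Definition (3.30a) becomes
$$\mathbf{A}\mathbf{v} = \bigotimes_{j=1}^{d} A^{(j)} v^{(j)} \in \bigotimes_{j=1}^{d} K^{m_j} \qquad\text{for } \mathbf{v} = \bigotimes_{j=1}^{d} v^{(j)} \in \bigotimes_{j=1}^{d} K^{n_j}.$$
The (tensor) rank $r$ is again the smallest integer such that $\mathbf{A} = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} A_\nu^{(j)}$. Note that the tensor rank of $\mathbf{A}$ is not related to the matrix rank of $\mathbf{A}$ (cf. Remark 3.38b). In such a case we explicitly distinguish the matrix rank and tensor rank. Otherwise, $\operatorname{rank}(\cdot)$ always means the tensor rank.

Exercise 3.60. If $\operatorname{rank}(\mathbf{A}) = r$ and $\operatorname{rank}(\mathbf{v}) = s$, then $\operatorname{rank}(\mathbf{A}\mathbf{v}) \le rs$.

The product $rs$ is only an upper bound, which may be too pessimistic. Obviously, $rs$ may be replaced with $\min\{rs, r_{\max}(\mathbf{W})\}$, where $r_{\max}(\mathbf{W})$ is the maximal rank of $\mathbf{W} = \bigotimes_{j=1}^d K^{m_j}$ (cf. §3.2.6.5). But even when $rs < r_{\max}(\mathbf{W})$, there are cases for which $\operatorname{rank}(\mathbf{A}\mathbf{v})$ may be strictly smaller than in the exercise. For this purpose, we define the rank amplification by
$$r_{\mathrm{amp}}(\mathbf{A}) := \max_{\mathbf{v} \ne 0} \frac{\operatorname{rank}(\mathbf{A}\mathbf{v})}{\operatorname{rank}(\mathbf{v})},$$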
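which implies that $\operatorname{rank}(\mathbf{A}\mathbf{v}) \le r_{\mathrm{amp}}(\mathbf{A}) \cdot \operatorname{rank}(\mathbf{v})$. Exercise 3.60 states that $r_{\mathrm{amp}}(\mathbf{A}) \le \operatorname{rank}(\mathbf{A})$.

Example 3.61. For $\mathbf{V} = K^n \otimes K^m$ and $\mathbf{W} = K^m \otimes K^n$ with $n, m \ge 2$ define $T \in L(\mathbf{V}, \mathbf{W})$ by the transposition from Remark 3.20; i.e., $T(v \otimes w) = w \otimes v$. Then $r_{\mathrm{amp}}(T) = 1$, whereas $\operatorname{rank}(T) > 1$.

Proof. The proof of $\operatorname{rank}(T\mathbf{v}) = \operatorname{rank}(\mathbf{v})$ is trivial. For proving $\operatorname{rank}(T) > 1$ it is sufficient to consider the case $n = m = 2$. $\mathbf{V} = K^2 \otimes K^2$ is isomorphic to $K^4$ via
$$\begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \otimes \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \cong \begin{bmatrix} u_1 v_1 & u_2 v_1 & u_1 v_2 & u_2 v_2 \end{bmatrix}^{\mathsf T}.$$
$T$ corresponds to the matrix
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$
which cannot be equal to a rank-1 tensor product $A \otimes B$ ($A, B \in K^{2\times 2}$) since $a_{11} b_{11} = 1$ and $a_{11} b_{22} = 0$ imply $b_{22} = 0$, in contradiction to $a_{22} b_{22} = 1$. □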
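The claim $\operatorname{rank}(T) > 1$ can also be checked computationally for $n = m = 2$. The sketch below builds the matrix of $T$ with the index convention of the proof and then uses the fact that, for a tensor of order two, the tensor rank is a matrix rank: regarding $T$ as an element of $K^{2\times2} \otimes K^{2\times2}$, its tensor rank equals the matrix rank of the rearranged array $R[(i,k),(j,l)] = T[(i,j),(k,l)]$. The helper names (`idx`, `rearrange`, `matrix_rank`, `tensor2`, `apply`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

def matrix_rank(rows):
    """Rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank

def idx(i, j):
    """Position of the (i, j) entry of u (x) v in K^4 (first index fastest)."""
    return i + 2 * j

def tensor2(u, v):
    """Coefficient vector of u (x) v in K^4."""
    x = [0] * 4
    for i, j in product(range(2), repeat=2):
        x[idx(i, j)] = u[i] * v[j]
    return x

def apply(M, x):
    return [sum(M[r][c] * x[c] for c in range(4)) for r in range(4)]

def rearrange(M):
    """Regroup the indices so that A (x) B becomes vec(A) vec(B)^T."""
    R = [[0] * 4 for _ in range(4)]
    for i, j, k, l in product(range(2), repeat=4):
        R[idx(i, k)][idx(j, l)] = M[idx(i, j)][idx(k, l)]
    return R

# Matrix of the transposition T(u (x) v) = v (x) u.
T = [[0] * 4 for _ in range(4)]
for i, j in product(range(2), repeat=2):
    T[idx(i, j)][idx(j, i)] = 1

assert T == [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
assert apply(T, tensor2([1, 2], [3, 5])) == tensor2([3, 5], [1, 2])
print(matrix_rank(rearrange(T)))   # tensor rank of T in K^{2x2} (x) K^{2x2}
```

For this convention the rearranged array is a permutation matrix, so the computed rank is 4; in particular $T$ is far from being an elementary Kronecker product even though it never increases the rank of a tensor it is applied to.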
3.3.2.4 Embedding of Linear Functionals

The previous linear mappings become functionals if the image space $W_j$ is the trivial vector space $K$. However, the difference is seen from the following example. Consider the mapping $\varphi : V \to U$ from Example 3.54 and the induced mapping $\Phi : v \otimes w \mapsto \varphi(v) \otimes w \in U \otimes_a W$. For $U = K$, the mapping $\varphi$ is a functional and the image of $\Phi$ belongs to $K \otimes_a W$. As $K \otimes_a W$ is isomorphic¹¹ to $W$, it is standard to simplify $U \otimes_a W$ to $W$ (cf. Remark 3.26a). This means that $\varphi(v) \in K$ is considered as scalar factor: $\varphi(v) \otimes w = \varphi(v) \cdot w \in W$. When we want to reproduce the notation in §3.3.2.1, we must modify $\mathbf{W}$ by omitting all factors $W_j = K$.

The counterpart of Proposition 3.55 is the statement below. Here $\mathbf{W} = \bigotimes_{j=1}^d K$ degenerates to $K$, i.e., $L(\mathbf{V}, \mathbf{W}) = \mathbf{V}'$.

Proposition 3.62. Let $V_j$ ($1 \le j \le d$) be vector spaces generating $\mathbf{V} := {}_a\!\bigotimes_{j=1}^d V_j$. Elementary tensors of ${}_a\!\bigotimes_{j=1}^d V_j'$ are $\Phi = \bigotimes_{j=1}^d \varphi^{(j)}$, $\varphi^{(j)} \in V_j'$. Their application to tensors from $\mathbf{V}$ is defined via
$$\Phi\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big) = \prod_{j=1}^{d} \varphi^{(j)}(v^{(j)}) \in K.$$
This defines the embedding
$${}_a\!\bigotimes_{j=1}^{d} V_j' \subset \mathbf{V}', \tag{3.32}$$
and
$${}_a\!\bigotimes_{j=1}^{d} V_j' = \mathbf{V}' \qquad\text{if } \dim(V_j) < \infty \text{ for } 1 \le j \le d.$$

11 Concerning isomorphisms compare the last paragraph in §3.2.5.
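The next example shows that, in general, (3.32) holds with proper inclusion.

Example 3.63. Choose $\mathbf{V} := \ell_0 \otimes_a \ell_0$ with $\ell_0 = \ell_0(\mathbb{N})$ defined in (3.2) and consider the functional $\Phi \in \mathbf{V}'$ with $\Phi(v \otimes w) := \sum_{i\in\mathbb{N}} v_i w_i$. Note that this infinite sum is well defined since $v_i w_i = 0$ for almost all $i \in \mathbb{N}$. Since $\Phi$ does not belong to $\ell_0' \otimes_a \ell_0'$, the latter space is a proper subspace of $(\ell_0 \otimes_a \ell_0)'$.

The result does not change when the index set $\mathbb{N}$ is replaced with any other infinite index set $I$. Note that any infinite-dimensional space $V$ is isomorphic to $\ell_0(I)$ with $\#I = \dim(V)$. Therefore this example holds for $\mathbf{V} := V \otimes_a W$ with $V$ and $W$ of infinite dimension.

Proof. For an indirect proof, assume that $\Phi = \sum_{\nu=1}^{k} \varphi_\nu \otimes \psi_\nu \in \ell_0' \otimes_a \ell_0'$ for some $\varphi_\nu, \psi_\nu \in \ell_0'$ and $k \in \mathbb{N}$. Choose any integer $m > k$. Let $e^{(i)} \in \ell_0$ be the $i$-th unit vector. The assumed identity $\Phi = \sum_{\nu=1}^{k} \varphi_\nu \otimes \psi_\nu$ tested for all $e^{(i)} \otimes e^{(j)} \in \ell_0 \otimes_a \ell_0$ with $1 \le i, j \le m$ yields $m^2$ equations
$$\delta_{ij} = \Phi(e^{(i)} \otimes e^{(j)}) = \sum_{\nu=1}^{k} \varphi_\nu(e^{(i)}) \cdot \psi_\nu(e^{(j)}) \qquad (1 \le i, j \le m). \tag{3.33}$$
Define matrices $A, B \in K^{m\times k}$ by $A_{i\nu} := \varphi_\nu(e^{(i)})$ and $B_{j\nu} := \psi_\nu(e^{(j)})$. Then equation (3.33) becomes $I = AB^{\mathsf T}$. Since $\operatorname{rank}(A) \le \min\{m, k\} = k$, the rank of the product $AB^{\mathsf T}$ is also bounded by $k$, contradicting $\operatorname{rank}(I) = m > k$. □

A very important case is the counterpart of (3.31b) in Notation 3.57a. We use the notations $\bigotimes_{j\ne k}$ and $\mathbf{V}_{[k]}$ introduced in (3.17a,b).

Remark 3.64. (a) Let $V_j$ ($1 \le j \le d$) be vector spaces generating $\mathbf{V} := {}_a\!\bigotimes_{j=1}^d V_j$. For a fixed index $k \in \{1, \ldots, d\}$ let $\varphi^{(k)} \in V_k'$ be a linear functional. Then $\varphi^{(k)}$ induces the definition of $\Phi \in L(\mathbf{V}, \mathbf{V}_{[k]})$ by
$$\Phi\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big) := \varphi^{(k)}\big(v^{(k)}\big) \cdot \bigotimes_{j\ne k} v^{(j)}.$$
(b) According to (3.31b), we identify $\Phi = \mathrm{id} \otimes \ldots \otimes \varphi^{(k)} \otimes \ldots \otimes \mathrm{id} = \mathrm{id}_{[k]} \otimes \varphi^{(k)}$ with $\varphi^{(k)}$ and write $\varphi^{(k)}\big(\bigotimes_{j=1}^d v^{(j)}\big) = \varphi^{(k)}\big(v^{(k)}\big) \cdot \bigotimes_{j\ne k} v^{(j)}$. This leads to the embedding
$$V_k' \subset L(\mathbf{V}, \mathbf{V}_{[k]}). \tag{3.34}$$
The tensor $\mathbf{v}_{\langle i\rangle} \in \mathbf{V}_{[d]}$ defined in (3.22a) can be obtained as follows. Part (b) follows by linearity.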
Remark 3.65. (a) Let $\{b_i^{(d)} : i \in B\}$ be a basis of $U_d \subset V_d$ with some dual system $\{\varphi_i^{(d)}\} \subset V_d'$. Then $\mathbf{v} \in \mathbf{V}_{[d]} \otimes U_d$ has the representation (3.22a) with
$$\mathbf{v}_{\langle i\rangle} := \varphi_i^{(d)}(\mathbf{v}) \in \mathbf{V}_{[d]}.$$
(b) $\mathbf{v} = 0$ implies $\mathbf{v}_{\langle i\rangle} = 0$ for all $i$.

Part (b) from above proves the first part of the next lemma. The second part is left as an exercise.

Lemma 3.66. Let $\mathbf{V} = {}_a\!\bigotimes_{j=1}^d V_j$ and $\mathbf{v} := \sum_{i=1}^{r} \bigotimes_{j=1}^{d} v_i^{(j)}$. If, for some index $k \in \{1, \ldots, d\}$, the vectors $\{v_i^{(k)} : 1 \le i \le r\} \subset V_k$ are linearly independent, $\mathbf{v} = 0$ implies that
$$\mathbf{v}_i^{[k]} = 0 \qquad\text{for all } \mathbf{v}_i^{[k]} := \bigotimes_{j\ne k} v_i^{(j)} \quad (1 \le i \le r).$$
Conversely, for linearly dependent vectors $\{v_i^{(k)} : 1 \le i \le r\}$, there are $\mathbf{v}_i^{[k]} \in \mathbf{V}_{[k]}$, not all vanishing, with $\mathbf{v} = \sum_{i=1}^{r} \mathbf{v}_i^{[k]} \otimes v_i^{(k)} = 0$.

Another extreme case occurs for $\mathbf{V}_{[k]}'$. Here the elementary tensors are $\varphi^{[k]} = \bigotimes_{j\ne k} \varphi^{(j)}$. In this case, the image space is $K \otimes \ldots \otimes K \otimes V_k \otimes K \otimes \ldots \otimes K \cong V_k$.
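Remark 3.67. Let $V_j$ ($1 \le j \le d$) be vector spaces generating $\mathbf{V} := {}_a\!\bigotimes_{j=1}^d V_j$. For a fixed index $k \in \{1, \ldots, d\}$ define $\mathbf{V}_{[k]}$ by (3.17a). Then elementary tensors $\varphi^{[k]} = \bigotimes_{j\ne k} \varphi^{(j)} \in \mathbf{V}_{[k]}'$ ($\varphi^{(j)} \in V_j'$) are considered as mappings from $L(\mathbf{V}, V_k)$ via
$$\varphi^{[k]}\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big) := \Big(\prod_{j\ne k} \varphi^{(j)}(v^{(j)})\Big) \cdot v^{(k)}. \tag{3.35a}$$
This justifies the embedding
$$\mathbf{V}_{[k]}' \subset L(\mathbf{V}, V_k). \tag{3.35b}$$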
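The generalisation of the above case is as follows. Let $\alpha \subset \{1, \ldots, d\}$ be any nonempty subset and define the complement $\alpha^c := \{1, \ldots, d\}\setminus\alpha$. Then the embedding
$$\mathbf{V}_\alpha' \subset L(\mathbf{V}, \mathbf{V}_{\alpha^c}) \qquad\text{with } \mathbf{V}_\alpha := {}_a\!\bigotimes_{j\in\alpha} V_j \tag{3.35c}$$
is defined via
$$\Big(\bigotimes_{j\in\alpha} \varphi^{(j)}\Big)\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big) := \Big(\prod_{j\in\alpha} \varphi^{(j)}(v^{(j)})\Big) \cdot \bigotimes_{j\in\alpha^c} v^{(j)}.$$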
3.3.2.5 Further Embeddings

Proposition 3.68. (a) The algebraic tensor space $V \otimes_a W'$ can be embedded into $L(W, V)$ via
$$w \in W \mapsto (v \otimes w')(w) := w'(w) \cdot v \in V. \tag{3.36a}$$
(b) Similarly, $V \otimes_a W$ can be embedded into $L(W', V)$ via
$$w' \in W' \mapsto (v \otimes w)(w') := w'(w) \cdot v \in V.$$
(c) The embeddings from above show that
$$V \otimes_a W' \subset L(W, V) \quad\text{and}\quad V \otimes_a W \subset L(W', V). \tag{3.36b}$$

Corollary 3.69. If $\dim(W) < \infty$, $V \otimes_a W \cong V \otimes_a W' \cong L(W, V)$ are isomorphic.

Proof. $\dim(W) < \infty$ implies $W' \cong W$ and, therefore, also $V \otimes_a W \cong V \otimes_a W'$. Thanks to (3.36b), $V \otimes_a W'$ is isomorphic to a subspace of $L(W, V)$. To prove $V \otimes_a W' \cong L(W, V)$, we must demonstrate that any $\varphi \in L(W, V)$ can be realised by some $x \in V \otimes_a W'$. Let $\{w_i\}$ be a basis of $W$ and $\{\omega_i\}$ a dual basis with $\omega_i(w_j) = \delta_{ij}$. Set $x := \sum_i \varphi(w_i) \otimes \omega_i$. One easily verifies that $x(w_i) = \varphi(w_i)$ in the sense of the embedding (3.36a). □
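The construction $x := \sum_i \varphi(w_i) \otimes \omega_i$ from the proof can be checked on a small example. This is only a sketch, assuming $W = K^2$, $V = K^3$, a map $\varphi$ given by a matrix, and a hand-picked non-standard basis with its dual basis; all function names are illustrative.

```python
def apply_matrix(matrix, w):
    """phi(w) for phi given by a matrix (list of rows)."""
    return [sum(a * t for a, t in zip(row, w)) for row in matrix]


def pairing(omega, w):
    """omega(w) for a functional given by its coefficient list."""
    return sum(a * t for a, t in zip(omega, w))


def realise(phi, basis, dual_basis):
    """Terms (phi(w_i), omega_i) of x = sum_i phi(w_i) (x) omega_i."""
    return [(apply_matrix(phi, w_i), omega_i)
            for w_i, omega_i in zip(basis, dual_basis)]


def apply_tensor(terms, w):
    """x(w) = sum_i omega_i(w) * phi(w_i), following (3.36a)."""
    out = [0] * len(terms[0][0])
    for image, omega in terms:
        coeff = pairing(omega, w)
        out = [o + coeff * t for o, t in zip(out, image)]
    return out


phi = [[1, 2], [3, 4], [5, 6]]        # phi in L(W, V), W = K^2, V = K^3
basis = [[1, 0], [1, 1]]              # w_1, w_2
dual_basis = [[1, -1], [0, 1]]        # omega_i(w_j) = delta_ij
x = realise(phi, basis, dual_basis)
```

Since $x$ and $\varphi$ agree on the basis vectors $w_i$, they agree on all of $W$ by linearity, which the test vector `[2, -3]` confirms.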
3.4 Tensor Spaces with Algebra Structure

Throughout this section, all tensor spaces are algebraic tensor spaces. Therefore we omit the index 'a' in $\otimes_a$.

So far, we have considered tensor products of vector spaces $A_j$ ($1 \le j \le d$). Now we suppose that $A_j$ possesses an additional operation$^{12}$ $\circ : A_j \times A_j \to A_j$, which we call the multiplication (to be quite precise, we should introduce individual symbols $\circ_j$ for each $A_j$). We require that
$$\begin{array}{ll}
(a + b) \circ c = a \circ c + b \circ c & \text{for all } a, b, c \in A_j,\\
a \circ (b + c) = a \circ b + a \circ c & \text{for all } a, b, c \in A_j,\\
(\lambda a) \circ b = a \circ (\lambda b) = \lambda \cdot (a \circ b) & \text{for all } \lambda \in K \text{ and all } a, b \in A_j,\\
1 \circ a = a \circ 1 = a & \text{for some } 1 \in A_j \text{ and all } a \in A_j.
\end{array} \tag{3.37}$$
These rules define a (noncommutative) algebra with unit element 1. Ignoring the algebra structure, we define the tensor space $\mathbf{A} := {}_a\bigotimes_{j=1}^{d} A_j$ as before and establish an operation $\circ : \mathbf{A} \times \mathbf{A} \to \mathbf{A}$ by
$$\Big(\bigotimes_{j=1}^{d} a_j\Big) \circ \Big(\bigotimes_{j=1}^{d} b_j\Big) = \bigotimes_{j=1}^{d} (a_j \circ b_j), \qquad \mathbf{1} := \bigotimes_{j=1}^{d} 1$$
for elementary tensors. The first two axioms in (3.37) are used to define the multiplication of tensors in ${}_a\bigotimes_{j=1}^{d} A_j$.

$^{12}$ $\mu : A_j \times A_j \to A_j$ defined by $(a, b) \mapsto a \circ b$ is called the structure map of the algebra $A_j$.

Example 3.70. (a) Consider the matrix spaces $A_j := K^{I_j \times I_j}$ ($I_j$: finite index sets). Here $\circ$ is the matrix-matrix multiplication in $K^{I_j \times I_j}$. The unit element 1 is the identity matrix $I$. Then $\mathbf{A} := \bigotimes_{j=1}^{d} A_j = K^{\mathbf{I} \times \mathbf{I}}$ ($\mathbf{I} = I_1 \times \ldots \times I_d$) is the space containing the Kronecker matrices.
(b) The vector spaces $A_j$ and $\mathbf{A}$ from Part (a) yield another algebra if $\circ$ is defined by the Hadamard product (entry-wise product, cf. (4.82)): $(a \circ b)_i = a_i \cdot b_i$ for $i \in I_j \times I_j$ and $a, b \in A_j$. The unit element 1 is the matrix with all entries being one.
(c) Let $A_j := C(I_j)$ be the set of continuous functions on the interval $I_j \subset \mathbb{R}$. $A_j$ becomes an algebra with the pointwise multiplication $\circ$. The unit element is the function with constant value $1 \in K$. Then $\mathbf{A} := \bigotimes_{j=1}^{d} A_j \subset C(\mathbf{I})$ contains multivariate functions on the product domain $\mathbf{I} = I_1 \times \ldots \times I_d$.
(d) Let $A_j := \ell_0(\mathbb{Z})$ (cf. (3.2)). The multiplication $\circ$ in $A_j$ may be defined by the convolution $\star$ in (4.83a). The unit element 1 is the sequence with $1_i = \delta_{i0}$ for all $i \in \mathbb{Z}$ ($\delta_{i0}$ in (2.1)). This defines the $d$-dimensional convolution $\star$ in $\mathbf{A} := \bigotimes_{j=1}^{d} A_j = \ell_0(\mathbb{Z}^d)$.

Another relation between algebras and tensor spaces was already used in §3.2.6.7 for the matrix-matrix multiplication. Let $d = 1$, i.e., the vector space $V$ carries a second operation $\circ : V \times V \to V$. Fix any basis $\{b_i : i \in I\}$ of $V$. The operation $\circ$ is completely defined by the data $s_{ijk}$ in
$$b_i \circ b_j = \sum_{k\in I} s_{ijk}\, b_k.$$
These coefficients define the structure tensor $s \in K^I \otimes K^I \otimes K^I$ of the algebra (cf. Ye–Lim [305, §4]). The importance of rank($s$) is emphasised in §3.2.6.7. An equivalent tensor, also denoted by $s$, is
$$s := \sum_{i,j,k\in I} s_{ijk}\, b_i' \otimes b_j' \otimes b_k \in V' \otimes V' \otimes V,$$
where $b_i' \in V'$ is the functional with $b_i'(b_\ell) = \delta_{i\ell}$ for basis vectors $b_\ell$. We recall the bidual embedding $V \subset V''$ by $v(\varphi) := \varphi(v)$ ($v \in V$, $\varphi \in V'$). Therefore the operator $v \otimes w \otimes id$ in the following expression belongs to $V'' \otimes V'' \otimes L(V, V)$.

Remark 3.71. $v \circ w = (v \otimes w \otimes id)\, s$ holds for all $v, w \in V$.

Proof. Inserting $v = \sum_i v_i b_i$ and $w = \sum_j w_j b_j$, we obtain
$$(v \otimes w \otimes id)\, s = \sum_{i,j,k} s_{ijk}\, b_i'(v) \otimes b_j'(w) \otimes b_k = \sum_{i,j,k} s_{ijk}\, v_i w_j\, b_k = \sum_{i,j} v_i w_j \sum_k s_{ijk}\, b_k = \sum_{i,j} v_i w_j\, b_i \circ b_j = \Big(\sum_i v_i b_i\Big) \circ \Big(\sum_j w_j b_j\Big) = v \circ w.$$ □
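Remark 3.71 can be tried out on a familiar algebra. The sketch below assumes $V = \mathbb{R}^2$ with basis $b_0 = 1$, $b_1 = \mathrm{i}$, i.e., the complex numbers viewed as a real algebra (an example chosen here, not taken from the text); `S_COMPLEX` and `multiply` are illustrative names.

```python
# Structure tensor s[i][j][k] with b_i o b_j = sum_k s[i][j][k] * b_k
# for the complex numbers as a real algebra (b_0 = 1, b_1 = i).
S_COMPLEX = [
    [[1, 0], [0, 1]],     # b_0 o b_0 = b_0,   b_0 o b_1 = b_1
    [[0, 1], [-1, 0]],    # b_1 o b_0 = b_1,   b_1 o b_1 = -b_0
]


def multiply(s, v, w):
    """Coefficients of v o w = (v (x) w (x) id) s, i.e. sum_{i,j} s_ijk v_i w_j."""
    n = len(s)
    return [sum(s[i][j][k] * v[i] * w[j] for i in range(n) for j in range(n))
            for k in range(n)]


product_vw = multiply(S_COMPLEX, [1, 2], [3, 4])   # (1 + 2i) * (3 + 4i)
```

The result can be compared with Python's built-in complex arithmetic, which uses the same multiplication rule.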
Example 3.72. Let $V_1 = K^{n_1 \times n_2}$, $V_2 = K^{n_2 \times n_3}$, $V_3 = K^{n_1 \times n_3}$ be matrix spaces. The respective bases consist of $E^{(j)}_{(k,\ell)} \in V_j$ with $E^{(j)}_{(k,\ell)}[p, q] = \delta_{kp}\delta_{\ell q}$. The characteristic tensor of the matrix-matrix multiplication is defined by
$$m := \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} \sum_{i_3=1}^{n_3} E^{(1)\prime}_{(i_1,i_2)} \otimes E^{(2)\prime}_{(i_2,i_3)} \otimes E^{(3)}_{(i_1,i_3)} \in V_1' \otimes V_2' \otimes V_3.$$
The tensor $m$ is already defined in (3.28a) for $n_1 = n_2 = n_3 = 2$ (up to the isomorphism $K^4 \simeq K^{2\times 2}$).

Proof. Note that
$$(A \otimes B \otimes id)\, m = \sum_{i_1,i_2,i_3} E^{(1)\prime}_{(i_1,i_2)}(A)\, E^{(2)\prime}_{(i_2,i_3)}(B)\, E^{(3)}_{(i_1,i_3)} = \sum_{i_1,i_3} \Big[\sum_{i_2} A[i_1, i_2]\, B[i_2, i_3]\Big] E^{(3)}_{(i_1,i_3)} = \sum_{i_1,i_3} (AB)[i_1, i_3]\, E^{(3)}_{(i_1,i_3)} = AB$$
for $A \in V_1$ and $B \in V_2$. □
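The identity $(A \otimes B \otimes id)\,m = AB$ can be reproduced directly from the definition of $m$. A minimal sketch, assuming matrices stored as nested lists with 0-based indices; the dictionary layout and the function names are illustrative choices.

```python
def multiplication_tensor(n1, n2, n3):
    """Nonzero coefficients of m, keyed by the index pairs of the three basis matrices."""
    return {((i1, i2), (i2, i3), (i1, i3)): 1
            for i1 in range(n1) for i2 in range(n2) for i3 in range(n3)}


def apply_to(m, a, b, n1, n3):
    """Evaluate (A (x) B (x) id) m.

    The dual functional E'_(k,l) applied to a matrix returns its entry [k][l].
    """
    c = [[0] * n3 for _ in range(n1)]
    for ((k1, l1), (k2, l2), (k3, l3)), coeff in m.items():
        c[k3][l3] += coeff * a[k1][l1] * b[k2][l2]
    return c


a = [[1, 2, 3], [4, 5, 6]]        # A in K^(2x3)
b = [[1, 0], [0, 1], [2, 3]]      # B in K^(3x2)
m = multiplication_tensor(2, 3, 2)
c = apply_to(m, a, b, 2, 2)       # equals the matrix product AB
```

The representation above uses $n_1 n_2 n_3$ elementary terms, which is only an upper bound for the rank of $m$.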
The complexity of matrix multiplication is closely related to the rank of $m$ (cf. §3.2.6.7). For a detailed discussion we refer to Landsberg [206, Chap. 11].

The term tensor algebra is used for another algebraic construction. Let $V$ be a vector space and consider the tensor spaces
$$\otimes^d V := \bigotimes_{j=1}^{d} V$$
(cf. Notation 3.24) and define the direct sum$^{13}$ of $\otimes^d V$ for all $d \in \mathbb{N}_0$:
$$A(V) := \sum_{d\in\mathbb{N}_0} \otimes^d V. \tag{3.38}$$
Elements of the tensor algebra are finite sums $\sum_{d\in\mathbb{N}_0} v_d$ with $v_d \in \otimes^d V$. The multiplicative structure is given by $\circ = \otimes$:
$$v_n \in \otimes^n V,\ v_m \in \otimes^m V \ \mapsto\ v_n \otimes v_m \in \otimes^{n+m} V$$
(cf. Greub [125, Chap. III]). The algebra $A(V)$ has the unit element $1 \in K = \otimes^0 V \subset A(V)$. The algebra $A(\ell_0)$ will be used in §14.3.3. If we replace $\mathbb{N}_0$ in (3.38) with $\mathbb{N}$,
$$A(V) := \sum_{d=1}^{\infty} \otimes^d V$$
is an algebra without unit element.

$^{13}$ $V = X + Y$ is the direct sum of the vector spaces $X$, $Y$ if each $v \in V$ has a unique representation $v = x + y$ with $x \in X$, $y \in Y$.
3.5 Symmetric and Antisymmetric Tensor Spaces

3.5.1 Basic Definitions

(Anti)symmetric tensors are associated with coinciding vector spaces $V_j$ denoted by $V$:
$$V := V_1 = V_2 = \ldots = V_d.$$
The $d$-fold tensor product is now denoted by $\mathbf{V} = \otimes^d V$, where $d \ge 2$ is required. Here we refer to the algebraic tensor space, i.e., $\mathbf{V} = \otimes_a^d V$. The completion to a Banach or Hilbert tensor space will be considered in §4.7.2.

A bijection $\pi : D \to D$ of the set $D := \{1, \ldots, d\}$ is called a permutation. Let $P := \{\pi : D \to D \text{ bijective}\}$ be the set of all permutations. Note that its cardinality $\#P = d!$ increases fast with increasing $d$. $(P, \circ)$ is a group for which $\circ$ is defined by composition: $(\tau \circ \pi)(j) = \tau(\pi(j))$. The inverse of $\pi$ is denoted by $\pi^{-1}$. As known from the description of determinants, $\operatorname{sign} : P \to \{-1, +1\}$ can be defined such that transpositions (pairwise permutations) have the sign $-1$, while the function sign is multiplicative: $\operatorname{sign}(\tau \circ \pi) = \operatorname{sign}(\tau) \cdot \operatorname{sign}(\pi)$.

A permutation $\pi \in P$ gives rise to a mapping $\mathbf{V} \to \mathbf{V}$, again denoted by $\pi$:
$$\pi : \bigotimes_{j=1}^{d} v^{(j)} \mapsto \bigotimes_{j=1}^{d} v^{(\pi^{-1}(j))}. \tag{3.39}$$

Exercise 3.73. For $v \in \otimes^d K^n$ show that $(\pi(v))_{\mathbf{i}} = v_{\pi(\mathbf{i})}$ for all $\mathbf{i} \in \{1, \ldots, n\}^d$, where $\pi(i_1, \ldots, i_d) := (i_{\pi(1)}, \ldots, i_{\pi(d)})$ is the action of $\pi \in P$ onto a tuple from $\{1, \ldots, n\}^d$.

Definition 3.74. (a) $v \in \mathbf{V} = \otimes^d V$ is symmetric if $\pi(v) = v$ for all$^{14}$ $\pi \in P$.
(b) The symmetric tensor space is defined by
$$\mathfrak{S} := \mathfrak{S}(V) := \mathfrak{S}_d(V) := \otimes^d_{\mathrm{sym}} V := \{v \in \mathbf{V} : v \text{ symmetric}\}.$$
(c) A tensor $v \in \mathbf{V} = \otimes^d V$ is antisymmetric (synonymously with 'skew-symmetric' or 'alternating') if $\pi(v) = \operatorname{sign}(\pi)\, v$ for all $\pi \in P$.
(d) The antisymmetric tensor space is defined by
$$\mathfrak{A} := \mathfrak{A}(V) := \mathfrak{A}_d(V) := \otimes^d_{\mathrm{anti}} V := \{v \in \mathbf{V} : v \text{ antisymmetric}\}.$$

$^{14}$ Replacing all $\pi \in P$ by a subgroup of permutations (e.g., only permutations of certain positions) one can define tensor spaces with partial symmetry.
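Remark 3.75. An equivalent definition of a symmetric [antisymmetric] tensor $v$ is $v = \pi(v)$ [$v = -\pi(v)$] for all transpositions
$$\pi : (1, \ldots, i, \ldots, j, \ldots, d) \mapsto (1, \ldots, j, \ldots, i, \ldots, d),$$
since all permutations from $P$ are products of pairwise permutations.

For $d = 2$ and $V = K^n$, tensors from $\mathfrak{S}$ and $\mathfrak{A}$ correspond to symmetric matrices ($M_{ij} = M_{ji}$) and antisymmetric matrices ($M_{ij} = -M_{ji}$), respectively. Formally we define $\mathfrak{S}_d = \mathfrak{A}_d = \otimes^d V$ for $d = 0$ and $d = 1$.

Proposition 3.76. (a) $\mathfrak{S}$ and $\mathfrak{A}$ are subspaces of $\mathbf{V} = \otimes^d V$.
(b) The projections $P_{\mathfrak{S}}$ and $P_{\mathfrak{A}}$ from $\mathbf{V}$ onto $\mathfrak{S}$ and $\mathfrak{A}$, respectively, are given by
$$P_{\mathfrak{S}}(v) := \frac{1}{d!} \sum_{\pi\in P} \pi(v), \qquad P_{\mathfrak{A}}(v) := \frac{1}{d!} \sum_{\pi\in P} \operatorname{sign}(\pi)\, \pi(v). \tag{3.40}$$
$P_{\mathfrak{S}}$ is called the symmetriser, and $P_{\mathfrak{A}}$ the alternator.

Any matrix can be split into a symmetric and an antisymmetric part, i.e., $\mathfrak{S} \oplus \mathfrak{A} = \mathbf{V}$ holds for $d = 2$. This is not true for $d \ge 3$.

Remark 3.77. (a) Exercise 3.73 shows that
$$(P_{\mathfrak{S}}(v))_{\mathbf{i}} = \frac{1}{d!} \sum_{\pi\in P} v_{\pi(\mathbf{i})} \quad\text{and}\quad (P_{\mathfrak{A}}(v))_{\mathbf{i}} = \frac{1}{d!} \sum_{\pi\in P} \operatorname{sign}(\pi)\, v_{\pi(\mathbf{i})}$$
for $v \in \otimes^d K^n$ and all $\mathbf{i} \in \{1, \ldots, n\}^d$.
(b) Let $V = K^n$, $\mathbf{I}_n := \{1, \ldots, n\}^d$, and
$$\mathbf{I}_n^{\mathrm{sym}} := \{\mathbf{i} \in \mathbf{I}_n : 1 \le i_1 \le i_2 \le \ldots \le i_d \le n\}.$$
Then $v \in \otimes^d_{\mathrm{sym}} K^n$ is completely determined by the entries $v_{\mathbf{i}}$ for $\mathbf{i} \in \mathbf{I}_n^{\mathrm{sym}}$. All other entries $v_{\mathbf{i}}$ coincide with $v_{\pi(\mathbf{i})}$, where $\pi \in P$ is chosen such that $\pi(\mathbf{i}) \in \mathbf{I}_n^{\mathrm{sym}}$.
(c) With $V$ and $\mathbf{I}_n$ from Part (b) let
$$\mathbf{I}_n^{\mathrm{anti}} := \{\mathbf{i} \in \mathbf{I}_n : 1 \le i_1 < i_2 < \ldots < i_d \le n\}.$$
Then $v \in \otimes^d_{\mathrm{anti}} K^n$ is completely determined by the entries $v_{\mathbf{i}}$ with $\mathbf{i} \in \mathbf{I}_n^{\mathrm{anti}}$. For $\mathbf{i} \in \mathbf{I}_n \setminus \mathbf{I}_n^{\mathrm{anti}}$, two cases must be distinguished. If $\mathbf{i}$ contains two equal elements, i.e., $i_j = i_k$ for some $j \neq k$, then $v_{\mathbf{i}} = 0$. Otherwise there is some $\pi \in P$ with $\pi(\mathbf{i}) \in \mathbf{I}_n^{\mathrm{anti}}$ and $v_{\mathbf{i}} = \operatorname{sign}(\pi)\, v_{\pi(\mathbf{i})}$.

Conclusion 3.78. (a) $\mathfrak{A}_d(K^n) = \{0\}$ for $d > n$ since $\mathbf{I}_n^{\mathrm{anti}} = \emptyset$.
(b) For $n = d$, $\mathfrak{A}_d(K^d) = \operatorname{span}\{P_{\mathfrak{A}}(\bigotimes_{j=1}^{d} e^{(j)})\}$ is one-dimensional ($e^{(j)}$: unit vectors, cf. (2.2)) since $\#\mathbf{I}_n^{\mathrm{anti}} = 1$.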
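The entrywise formulas of Remark 3.77a translate into a few lines of code. The following sketch assumes tensors in $\otimes^d K^n$ stored as dictionaries over 0-based index tuples and uses exact rational arithmetic; `sign` and `project` are illustrative names, and the sign is computed by counting inversions.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def sign(perm):
    """Sign of a permutation given as a tuple, via the number of inversions."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1


def project(tensor, d, n, antisymmetric=False):
    """Symmetriser (default) or alternator of (3.40), entry by entry."""
    result = {}
    for idx in product(range(n), repeat=d):
        total = Fraction(0)
        for perm in permutations(range(d)):
            permuted = tuple(idx[p] for p in perm)      # pi(i) = (i_pi(1), ..., i_pi(d))
            weight = sign(perm) if antisymmetric else 1
            total += weight * tensor.get(permuted, 0)
        result[idx] = total / factorial(d)
    return result


# d = 2: a matrix splits into its symmetric and antisymmetric parts.
matrix = {(0, 0): 1, (0, 1): 2, (1, 0): 4, (1, 1): 3}
sym = project(matrix, 2, 2)
alt = project(matrix, 2, 2, antisymmetric=True)

# d = 3: for e_0 (x) e_0 (x) e_1 the two parts no longer add up to the tensor.
t3 = {(0, 0, 1): 1}
sym3 = project(t3, 3, 2)
alt3 = project(t3, 3, 2, antisymmetric=True)
```

In the second example the alternator returns zero, as Conclusion 3.78a predicts for $d = 3 > n = 2$, while the symmetriser spreads the single entry over three positions, so the sum of both parts differs from the original tensor.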
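Proposition 3.79. For $\dim(V) = n < \infty$ and $d \ge 2$ the dimensions of $\mathfrak{S}$ and $\mathfrak{A}$ satisfy
$$\dim(\mathfrak{A}_d(V)) = \binom{n}{d} < n^d/d! < \dim(\mathfrak{S}_d(V)) = \binom{n+d-1}{d}.$$
Bounds are
$$\dim(\mathfrak{A}_d(V)) \le \Big(n - \frac{d-1}{2}\Big)^d / d! \quad\text{and}\quad \dim(\mathfrak{S}_d(V)) \le \Big(n + \frac{d-1}{2}\Big)^d / d!.$$

Proof. $\mathfrak{S}$ is isomorphic to $\mathfrak{S}(K^n)$. Remark 3.77b shows that $\dim(\mathfrak{S}) = \#\mathbf{I}_n^{\mathrm{sym}}$. By induction we show that $\#\mathbf{I}_n^{\mathrm{sym}} = \binom{n+d-1}{d}$. The proof for $\mathfrak{A}$ is analogous. □

As Proposition 3.79 shows, $\mathfrak{A}(V)$ has a dimension smaller than $\mathfrak{S}(V)$, but the other side of the coin is that $V$ must be higher dimensional to form $\mathfrak{A}_d(V)$ of a certain dimension. As long as $n = \dim(V) < d$, $\mathfrak{A}_d(V) = \{0\}$ is zero-dimensional.

Let $N_{\mathfrak{A}} = \ker(P_{\mathfrak{A}})$, $N_{\mathfrak{S}} = \ker(P_{\mathfrak{S}})$ be the kernels and note that $\mathfrak{A} = \operatorname{range}(P_{\mathfrak{A}})$ and $\mathfrak{S} = \operatorname{range}(P_{\mathfrak{S}})$ are the images. Then the tensor space $\mathbf{V} = \otimes^d V$ admits the direct decomposition $\mathbf{V} = N_{\mathfrak{A}} \oplus \mathfrak{A} = N_{\mathfrak{S}} \oplus \mathfrak{S}$, i.e., any $v \in \mathbf{V}$ has a unique decomposition into $v = v_N + v_X$ with either $v_N \in N_{\mathfrak{A}}$, $v_X \in \mathfrak{A}$ or $v_N \in N_{\mathfrak{S}}$, $v_X \in \mathfrak{S}$. Consequently, $\mathfrak{S}$ and $\mathfrak{A}$ are isomorphic to the quotient spaces (cf. §3.1.3):
$$\mathfrak{S} \cong \mathbf{V}/N_{\mathfrak{S}}, \qquad \mathfrak{A} \cong \mathbf{V}/N_{\mathfrak{A}}. \tag{3.41}$$
$\mathbf{V}/N_{\mathfrak{A}} = \wedge^d V$ is called the $d$-th exterior power of $V$, and $v^{(1)} \wedge v^{(2)} \wedge \ldots \wedge v^{(d)}$ is the isomorphic image of $P_{\mathfrak{A}}(v^{(1)} \otimes v^{(2)} \otimes \ldots \otimes v^{(d)})$. The operation $\wedge$ is called the exterior product. Analogously, $\mathbf{V}/N_{\mathfrak{S}} = \vee^d V$ is called the $d$-th symmetric power of $V$, and $v^{(1)} \vee v^{(2)} \vee \ldots \vee v^{(d)}$ is the isomorphic image of $P_{\mathfrak{S}}(v^{(1)} \otimes v^{(2)} \otimes \ldots \otimes v^{(d)})$.

In the context of (anti)symmetric tensor spaces a mapping $\mathbf{A} \in L(\mathbf{V}, \mathbf{V})$ is called symmetric if $\mathbf{A}$ commutes with all $\pi \in P$, i.e., $\mathbf{A}\pi = \pi\mathbf{A}$. This property implies that $\mathbf{A} P_{\mathfrak{S}} = P_{\mathfrak{S}} \mathbf{A}$ and $\mathbf{A} P_{\mathfrak{A}} = P_{\mathfrak{A}} \mathbf{A}$, and proves the following result.

Remark 3.80. If $\mathbf{A} \in L(\mathbf{V}, \mathbf{V})$ is symmetric, $\mathfrak{S}$ and $\mathfrak{A}$ are invariant under $\mathbf{A}$, i.e., the restrictions of $\mathbf{A}$ to $\mathfrak{S}$ and $\mathfrak{A}$ belong to $L(\mathfrak{S}, \mathfrak{S})$ and $L(\mathfrak{A}, \mathfrak{A})$, respectively.

The Hadamard product $\odot$ will be explained in (4.82). In the case of functions, it is the usual pointwise product.

Exercise 3.81. Let $s, s' \in \mathfrak{S}$ and $a, a' \in \mathfrak{A}$. Show that $s \odot s',\ a \odot a' \in \mathfrak{S}$, whereas $s \odot a,\ a \odot s \in \mathfrak{A}$.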
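The dimension formulas of Proposition 3.79 can be cross-checked by enumerating the index sets $\mathbf{I}_n^{\mathrm{anti}}$ and $\mathbf{I}_n^{\mathrm{sym}}$ of Remark 3.77. A short sketch using the standard library; the function name `count_index_sets` is illustrative.

```python
from itertools import combinations, combinations_with_replacement
from math import comb, factorial


def count_index_sets(n, d):
    """Return (#I_n^anti, #I_n^sym) for V = K^n and order d by enumeration."""
    anti = sum(1 for _ in combinations(range(1, n + 1), d))                   # i_1 < ... < i_d
    sym = sum(1 for _ in combinations_with_replacement(range(1, n + 1), d))   # i_1 <= ... <= i_d
    return anti, sym


n, d = 4, 3
anti, sym = count_index_sets(n, d)
middle = n ** d / factorial(d)       # the separating value n^d / d!
```

For $n = 4$, $d = 3$ the enumeration yields 4 and 20 on either side of $n^d/d! \approx 10.7$; for $n = 2$, $d = 3$ it yields 0 and 4, in line with Conclusion 3.78a.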
3.5.2 Quantics

The term 'quantics' introduced by Arthur Cayley [54] in 1854 is used for homogeneous polynomials in multiple variables, i.e., polynomials in the variables $x_i$ ($i \in B$) which are a finite sum of terms
$$a_\nu x^\nu = a_\nu \prod_{i\in B} x_i^{\nu_i}$$
with multi-indices $\nu$ of length $|\nu| := \sum_{i\in B} \nu_i = d \in \mathbb{N}_0$ and $a_\nu \in K$. Such quantics have the property $p(\lambda x) = \lambda^d p(x)$.

Proposition 3.82. Let $V$ be a vector space$^{15}$ with the algebraic basis $\{b_i : i \in B\}$. Then the algebraic symmetric tensor space $\mathfrak{S}_d(V)$ is isomorphic to the set of quantics in the variables $\{x_i : i \in B\}$.

This statement also holds for $\dim(V) = \#B = \infty$. Note that the infinite product
$$x^\nu = \prod_{i\in B} x_i^{\nu_i}$$
makes sense since at most $d$ exponents $\nu_i$ are different from zero.

For the proof of Proposition 3.82 consider a general tensor $v \in \otimes_a^d V$. Fix a basis $\{b_i : i \in B\}$ of $V$. To each $b_i$ we associate a variable $x_i$. We may write
$$v = \sum_{\mathbf{i}\in B^d} a_{\mathbf{i}} \bigotimes_{j=1}^{d} b_{i_j} \qquad (\text{almost all } a_{\mathbf{i}} \text{ vanish}).$$
Therefore any symmetric tensor can be written as $P_{\mathfrak{S}} v = \sum_{\mathbf{i}\in B^d} P_{\mathfrak{S}}\big(a_{\mathbf{i}} \bigotimes_{j=1}^{d} b_{i_j}\big)$. The isomorphism into the quantics of degree $d$ is given by
$$\Phi : P_{\mathfrak{S}}\Big(a_{\mathbf{i}} \bigotimes_{j=1}^{d} b_{i_j}\Big) \mapsto a_{\mathbf{i}} \prod_{j=1}^{d} x_{i_j}. \tag{3.42}$$
Note that
$$\prod_{j=1}^{d} x_{i_j} = \prod_{i\in B} x_i^{\nu_i} \quad\text{with}\quad \nu_i := \#\{j \in \{1, \ldots, d\} : i_j = i\} \quad\text{for all } i \in B.$$
The symmetry of $P_{\mathfrak{S}} v$ corresponds to the fact that all factors $x_i$ commute, i.e., $x_i x_j = x_j x_i$.

Above we used the range of $P_{\mathfrak{S}}$ to define $\mathfrak{S}_d(V)$. In the case of $d = 2$, e.g., $v := P_{\mathfrak{S}}(a \otimes b) = \frac{1}{2}(a \otimes b + b \otimes a)$ represents a symmetric tensor. Instead, we may use $v = \frac{1}{4} \otimes^2 (a + b) - \frac{1}{4} \otimes^2 (a - b)$. In the latter representation each term itself is symmetric. In general, $\mathfrak{S}_d(V)$ can be generated by $d$-fold products $\otimes^d v$:
$$\mathfrak{S}(V) = \operatorname{span}\{\otimes^d v : v \in V\}. \tag{3.43}$$

$^{15}$ An additional requirement is that either $K$ have characteristic 0 or that the polynomials be understood in such a way that $x^\nu$ for different $\nu$ are linearly independent, i.e., they are not viewed as functions defined on $K$ (note that $x^3 - x = 0$ as a function on $K = \mathbb{Z}_3$).
In the language of quantics this means that the polynomial is written as a sum of $d$-th powers $\big(\sum_{i=1}^{n} a_i x_i\big)^d$ of linear forms. The decomposition of a homogeneous polynomial into this special form is addressed by Brachat–Comon–Mourrain–Tsigaridas [40]. Such a decomposition is the basis of the following definition.

Definition 3.83. For symmetric tensors$^{16}$ $s \in \mathfrak{S}_d(V)$ the symmetric rank (also called Waring rank) is defined by
$$\operatorname{rank}_{\mathrm{sym}}(s) := \min\Big\{r \in \mathbb{N}_0 : s = \sum_{i=1}^{r} \otimes^d v_i \text{ with } v_i \in V\Big\}.$$
An immediate conclusion is $\operatorname{rank}_{\mathrm{sym}}(s) \ge \operatorname{rank}(s)$. In the case of real matrices we have $\operatorname{rank}_{\mathrm{sym}}(s) = \operatorname{rank}(s)$. The conjecture that this equality also holds for general tensors is wrong. A counterexample is given by Shitov [261]. A related quantity is the Chow rank defined in §7.7.2.
3.5.3 Determinants

Since tensor products are related to multilinear forms (cf. Proposition 3.23) and the determinant is a special antisymmetric multilinear form, it is not surprising to find relations between antisymmetric tensors and determinants. We recall that the determinant $\det(A)$ of a matrix $A \in K^{d\times d}$ is equal to
$$\det(A) = \sum_{\pi\in P} \operatorname{sign}(\pi) \prod_{j=1}^{d} a_{j,\pi(j)}. \tag{3.44}$$
The building blocks of usual tensors are the elementary tensors $\bigotimes_{j=1}^{d} u^{(j)}$. For antisymmetric tensors we have to use their antisymmetrisation $P_{\mathfrak{A}}\big(\bigotimes_{j=1}^{d} u^{(j)}\big)$.

Lemma 3.84. The antisymmetrised elementary tensor $v := P_{\mathfrak{A}}\big(\bigotimes_{j=1}^{d} v^{(j)}\big)$ with $v^{(j)} \in V := K^n$ has the entries
$$v[i_1, \ldots, i_d] = \frac{1}{d!} \det \begin{bmatrix}
v^{(1)}_{i_1} & v^{(2)}_{i_1} & \cdots & v^{(d)}_{i_1}\\
v^{(1)}_{i_2} & v^{(2)}_{i_2} & \cdots & v^{(d)}_{i_2}\\
\vdots & \vdots & \ddots & \vdots\\
v^{(1)}_{i_d} & v^{(2)}_{i_d} & \cdots & v^{(d)}_{i_d}
\end{bmatrix}.$$

Proof. Set $e := \bigotimes_{j=1}^{d} v^{(j)}$. By the definition (3.44), the right-hand side is equal to
$$\frac{1}{d!} \sum_{\pi\in P} \operatorname{sign}(\pi) \prod_{j=1}^{d} v^{(j)}_{i_{\pi(j)}} = \frac{1}{d!} \sum_{\pi\in P} \operatorname{sign}(\pi)\, e_{\pi(\mathbf{i})} = (P_{\mathfrak{A}}(e))_{\mathbf{i}} = v_{\mathbf{i}} = v[i_1, \ldots, i_d],$$
proving the assertion. □

$^{16}$ Here we assume that the underlying field $K$ has characteristic 0 (as $\mathbb{R}$ and $\mathbb{C}$). Otherwise the representation $s = \sum_{i=1}^{r} \otimes^d v_i$ may be impossible. Consider, e.g., the field $K = \mathbb{Z}_2 = \{0, 1\}$. Then $V := K^2$ contains only 3 nontrivial vectors so that $\dim\{\otimes^3 v : v \in V\} \le 3$. Since $\dim(\mathfrak{S}_3(V)) = 4$, not all symmetric tensors allow a representation $s = \sum_{i=1}^{r} \otimes^d v_i$.
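Lemma 3.84 lends itself to a direct numerical check. The sketch below assumes integer vectors in $K^n$ with 0-based indices and evaluates the determinant by the Leibniz formula (3.44); the function names are illustrative, and the brute-force sums over all permutations are only meant for very small $d$.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial, prod


def sign(perm):
    """Sign of a permutation given as a tuple, via the number of inversions."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1


def det(a):
    """Leibniz formula (3.44) for a square integer matrix given as list of rows."""
    d = len(a)
    return sum(sign(p) * prod(a[j][p[j]] for j in range(d))
               for p in permutations(range(d)))


def alternated_entry(vectors, idx):
    """Entry of P_A(v^(1) x ... x v^(d)) at idx, computed as in Remark 3.77a."""
    d = len(vectors)
    total = 0
    for p in permutations(range(d)):
        permuted = tuple(idx[k] for k in p)
        total += sign(p) * prod(vectors[j][permuted[j]] for j in range(d))
    return Fraction(total, factorial(d))


def determinant_entry(vectors, idx):
    """Right-hand side of Lemma 3.84: row k holds v^(1)[i_k], ..., v^(d)[i_k]."""
    d = len(vectors)
    matrix = [[vectors[j][idx[k]] for j in range(d)] for k in range(d)]
    return Fraction(det(matrix), factorial(d))


vectors = [[1, 2, 0], [0, 1, 3], [2, -1, 1]]      # v^(1), v^(2), v^(3) in K^3
agree = all(alternated_entry(vectors, idx) == determinant_entry(vectors, idx)
            for idx in product(range(3), repeat=3))
```

Both functions perform the same sum over $d!$ permutations, so the agreement is a consistency check of the index conventions rather than an efficient way of forming $P_{\mathfrak{A}}$.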
For $V = K^d$ the antisymmetric space $\mathfrak{A}_d(K^d)$ is one-dimensional (cf. Conclusion 3.78b). Therefore any transformation by $\mathbf{A} = \otimes^d A$ is identical to a multiple of the identity.

Remark 3.85. Let $V = K^d$. For any $A \in K^{d\times d}$ and $v = P_{\mathfrak{A}}\big(\bigotimes_{j=1}^{d} v^{(j)}\big) \in \mathfrak{A}_d(K^d)$ one has
$$\mathbf{A}(v) := P_{\mathfrak{A}}\Big(\bigotimes_{j=1}^{d} A v^{(j)}\Big) = \det(A)\, P_{\mathfrak{A}}\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big).$$

Proof. By Conclusion 3.78, tensors $u$ from $\mathfrak{A}(K^d)$ are determined by $u[1, \ldots, d]$. Lemma 3.84 shows that $v[1, \ldots, d] = \frac{1}{d!} \det(M)$, where $M = [v^{(1)}, \ldots, v^{(d)}] \in K^{d\times d}$, and that $\mathbf{A}(v)[1, \ldots, d] = \frac{1}{d!} \det(AM) = \det(A) \cdot \frac{1}{d!} \det(M) = \det(A) \cdot v[1, \ldots, d]$. □

For function spaces $V$ (e.g., $V = L^2(\mathbb{R})$, $V = C[0, 1]$, etc.) there is an analogue of Lemma 3.84.

Lemma 3.86. For functions $f_1, \ldots, f_d \in V$ the antisymmetrisation of the elementary tensor
$$F = \bigotimes_{j=1}^{d} f_j, \quad\text{i.e.,}\quad F(x_1, \ldots, x_d) = \prod_{j=1}^{d} f_j(x_j),$$
yields $G := P_{\mathfrak{A}}(F)$ with
$$G(x_1, \ldots, x_d) = \frac{1}{d!} \det \begin{bmatrix}
f_1(x_1) & f_2(x_1) & \cdots & f_d(x_1)\\
f_1(x_2) & f_2(x_2) & \cdots & f_d(x_2)\\
\vdots & \vdots & \ddots & \vdots\\
f_1(x_d) & f_2(x_d) & \cdots & f_d(x_d)
\end{bmatrix}.$$
$G$ is also called the Slater determinant of $f_1, \ldots, f_d$. Also the scalar product of two antisymmetrised elementary tensors is expressed by a determinant as stated in (4.90).
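The Slater determinant of Lemma 3.86 can be evaluated pointwise. A minimal sketch, assuming integer-valued functions at integer points so that exact arithmetic applies; the monomials $1, x, x^2$ are an example chosen here (their Slater determinant is a scaled Vandermonde determinant), and the names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, prod


def sign(perm):
    """Sign of a permutation given as a tuple, via the number of inversions."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1


def det(a):
    """Leibniz formula (3.44) for a square integer matrix given as list of rows."""
    d = len(a)
    return sum(sign(p) * prod(a[j][p[j]] for j in range(d))
               for p in permutations(range(d)))


def slater(functions, points):
    """G(x_1, ..., x_d) = det[f_j(x_k)] / d!, row k belonging to the point x_k."""
    d = len(functions)
    matrix = [[f(x) for f in functions] for x in points]
    return Fraction(det(matrix), factorial(d))


monomials = [lambda x: 1, lambda x: x, lambda x: x * x]
value = slater(monomials, [1, 2, 4])       # (2-1)(4-1)(4-2) / 3! = 1
swapped = slater(monomials, [2, 1, 4])     # exchanging two arguments flips the sign
coincident = slater(monomials, [1, 1, 4])  # coinciding arguments give zero
```

The sign change under an exchange of two arguments and the vanishing for coinciding arguments are exactly the antisymmetry properties of $G = P_{\mathfrak{A}}(F)$.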
3.5.4 Application of Functionals

For a subset $\emptyset \subsetneq \alpha \subsetneq D = \{1, \ldots, d\}$ let $\phi \in \big({}_a\bigotimes_{j\in\alpha} V\big)'$ be a functional (without requiring any symmetry property).

Lemma 3.87. Let $\phi$ and $\alpha$ be as above and set $k := \#\alpha$.
(a) Then $\phi$ interpreted as in (3.35c) is a map of $\mathfrak{S}_d(V)$ into $\mathfrak{S}_{d-k}(V)$. The value $\phi(s)$ of $s \in \mathfrak{S}_d(V)$ only depends on $k = \#\alpha$, not on the particular choice of $\alpha \subset D$.
(b) $\phi$ is also a mapping from $\mathfrak{A}_d(V)$ into $\mathfrak{A}_{d-k}(V)$. Here, the sign of $\phi(a)$ for $a \in \mathfrak{A}_d(V)$ depends on the choice of $\alpha$.

Proof. Let $\alpha$ and $\beta$ be subsets of $D$ with $\#\alpha = \#\beta$ and denote the corresponding maps by $\phi_\alpha$ and $\phi_\beta$. Define the permutation $\pi$ by $\pi(\beta) = \alpha$. Then $\phi_\beta \circ \pi = \phi_\alpha$ holds, i.e., $\phi_\alpha(v) = \phi_\beta(\pi(v))$. In the symmetric case we have $\pi(v) = v$, i.e., $\phi_\alpha(v) = \phi_\beta(v)$. In the antisymmetric case the sign may change. In the sequel we assume that $\alpha = \{d - k + 1, \ldots, d\}$.
Let $\pi$ be a transposition of $i \neq j$ with $i, j \in \{1, \ldots, d - k\}$. It follows that $\pi\phi = \phi\pi$, i.e., $\pi(\phi(v)) = \phi(\pi(v))$. Hence $v \in \mathfrak{S}_d(V)$ implies $\pi(v) = v$ and $\pi(\phi(v)) = \phi(v) \in \otimes^{d-k} V$. Remark 3.75 proves that $\phi(v) \in \mathfrak{S}_{d-k}(V)$. The antisymmetric case is analogous. □

Since the expansion term $v_{\langle i\rangle}$ can be defined by a functional (cf. Remark 3.65), Lemma 3.87 proves the next statement. For that purpose we recall (3.22a):
$$v = \sum_{i} v_{\langle i\rangle} \otimes b_i, \qquad \{b_i : i \in B\} \text{ basis of } V. \tag{3.45}$$
Conclusion 3.88. If $v \in \mathfrak{S}_d(V)$ or $v \in \mathfrak{A}_d(V)$, also the tensors $v_{\langle i\rangle} \in \otimes^{d-1} V$ defined in (3.45) belong to $\mathfrak{S}_{d-1}(V)$ or $\mathfrak{A}_{d-1}(V)$, respectively.

Concerning the notation $v_{\langle i,m\rangle}$ we refer to (3.22b).

Theorem 3.89. Assume that $v$ is given by (3.45) with $v_{\langle i\rangle} \in \otimes^{d-1} V$.
(a) Then $v \in \mathfrak{S}_d(V)$ holds if and only if $v_{\langle i\rangle} \in \mathfrak{S}_{d-1}(V)$ and $v_{\langle i,m\rangle} = v_{\langle m,i\rangle}$ for all $i, m \in B$.
(b) $v \in \mathfrak{A}_d(V)$ holds if and only if $v_{\langle i\rangle} \in \mathfrak{A}_{d-1}(V)$ and $v_{\langle i,m\rangle} = -v_{\langle m,i\rangle}$ for all different $i, m \in B$.

Proof. (i) Assume $v_{\langle i\rangle} \in \mathfrak{S}_{d-1}(V)$ and define the tensor $v$ by (3.45). The second expansion yields $v = \sum_{i,m} v_{\langle i,m\rangle} \otimes b_m \otimes b_i$. Symmetry of $v$ implies that $v = \sum_{i,m} v_{\langle i,m\rangle} \otimes b_i \otimes b_m$. Interchanging the symbols $i, m$ in this expression yields $v = \sum_{i,m} v_{\langle m,i\rangle} \otimes b_m \otimes b_i$. Therefore, $v_{\langle i,m\rangle} = v_{\langle m,i\rangle}$ follows (cf. Remark 3.66a).
(ii) For the opposite direction we have to show that $\pi v = v$ for all transpositions $\pi$ of $j \neq k$ (cf. Remark 3.75). If $1 \le j, k \le d - 1$, $\pi v = v$ is a consequence of $v_{\langle i\rangle} \in \mathfrak{S}_{d-1}(V)$. It remains to consider the case $j \le d - 1$ and $k = d$; without loss of generality we choose $j = d - 1$. The considerations in part (i) show that $v_{\langle i,m\rangle} = v_{\langle m,i\rangle}$ implies $\pi v = v$.
(iii) The antisymmetric case is completely analogous. □
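Both conditions of Theorem 3.89a can be observed on small examples. The sketch below assumes $V = K^n$ with the unit vectors as basis, so that $v_{\langle i\rangle}$ is the slice of the coefficient array with last index $i$; tensors are dictionaries over 0-based index tuples, and all names as well as the two example tensors are illustrative.

```python
from itertools import permutations, product


def expansion_term(tensor, i):
    """v_<i>: the coefficient of b_i in the last direction, cf. (3.45)."""
    return {idx[:-1]: val for idx, val in tensor.items() if idx[-1] == i}


def is_symmetric(tensor):
    """Check pi(v) = v entrywise for all permutations of the index tuple."""
    return all(tensor[idx] == tensor[tuple(p)]
               for idx in tensor for p in permutations(idx))


def cross_condition(tensor, n):
    """Check v_<i,m> = v_<m,i> for all i, m."""
    return all(expansion_term(expansion_term(tensor, i), m)
               == expansion_term(expansion_term(tensor, m), i)
               for i in range(n) for m in range(n))


n = 3
v = [1, 2, 3]
w = [1, -1, 1]
# s = v (x) v (x) v + w (x) w (x) w is symmetric of order 3.
s = {idx: v[idx[0]] * v[idx[1]] * v[idx[2]] + w[idx[0]] * w[idx[1]] * w[idx[2]]
     for idx in product(range(n), repeat=3)}
# t has symmetric slices t_<i> but violates the cross condition.
t = {idx: (idx[0] + idx[1]) * idx[2] for idx in product(range(n), repeat=3)}
```

The tensor `t` shows that symmetry of the terms $v_{\langle i\rangle}$ alone is not sufficient: its slices are symmetric matrices, yet `t` itself is not symmetric because $v_{\langle i,m\rangle} \neq v_{\langle m,i\rangle}$.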
Part II
Functional Analysis of Tensor Spaces
Algebraic tensor spaces yield a suitable foundation for the finite-dimensional case. But already in the finite-dimensional case we want to formulate approximation problems, which require the introduction of a topology. Topological tensor spaces are a subject of functional analysis. Standard examples of infinite-dimensional tensor spaces are function spaces, since multivariate functions can be regarded as tensor products of univariate ones. To obtain a Banach tensor space, we need the completion with respect to a norm, which is not fixed by the normed spaces generating the tensor space. The scale of norms is a particular topic of the discussion of Banach tensor spaces in Chapter 4. A special but important case is that of Hilbert tensor spaces. Chapter 5 has a stronger connection to algebraic tensor spaces than to topological ones. In particular, however, the technique of matricisation is a prerequisite for Chapter 6. In Chapter 6 we discuss the so-called minimal subspaces, which are important for the analysis of the later tensor representations in Part III.
Chapter 4
Banach Tensor Spaces
Abstract The discussion of topological tensor spaces was started by Schatten [254] and Grothendieck [129, 130]. In Section 4.2 we discuss the question of how the norms of V and W are related to the norm of V ⊗ W. In particular, we introduce the projective and injective norms. From the viewpoint of functional analysis, tensor spaces of order two are of special interest, since they are related to certain operator spaces (cf. §4.2.9). However, for our applications we are more interested in tensor spaces of order ≥ 3. These spaces are considered in Section 4.3. As preparation for the aforementioned sections and later ones, we need more or less well-known results from Banach space theory, which we provide in Section 4.1. Sections 4.4 and 4.5 discuss the case of Hilbert spaces and Hilbert tensor spaces. This is important, since many applications and many of the numerical methods require the Hilbert structure. The reason is that, unfortunately, the solution of approximation problems with respect to general Banach norms is much more involved than the solution with respect to a norm induced by a scalar product. In Section 4.6 we recall the tensor operations. The final Section 4.7 is devoted to symmetric and antisymmetric tensor spaces.
© Springer Nature Switzerland AG 2019 W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics 56, https://doi.org/10.1007/978-3-030-35554-8_4

4.1 Banach Spaces

4.1.1 Norms

In general, we consider vector spaces $X$ over one of the fields $K \in \{\mathbb{R}, \mathbb{C}\}$. The topological structure will be generated by norms. We recall the axioms of a norm on $X$:
$$\begin{array}{ll}
\|\cdot\| : X \to [0, \infty), & \\
\|x\| = 0 & \text{if and only if } x = 0,\\
\|\lambda x\| = |\lambda|\, \|x\| & \text{for all } \lambda \in K \text{ and } x \in X,\\
\|x + y\| \le \|x\| + \|y\| & \text{for all } x, y \in X \text{ (triangle inequality)}.
\end{array} \tag{4.1}$$
The map $\|\cdot\| : X \to [0, \infty)$ is continuous because of the inverse triangle inequality
$$\big|\, \|x\| - \|y\| \,\big| \le \|x - y\| \quad\text{for all } x, y \in X. \tag{4.2}$$
Combining a vector space $X$ with a norm defined on $X$, we obtain a normed vector space denoted by the pair $(X, \|\cdot\|)$. If there is no doubt about the choice of a norm, the notation $(X, \|\cdot\|)$ is shortened to $X$.

A vector space $X$ may be equipped with two different norms; i.e., $(X, \|\cdot\|_1)$ and $(X, \|\cdot\|_2)$ may be two different normed spaces although the set $X$ is the same in both cases. For the set of all norms on $X$ we can define a semi-ordering. A norm $\|\cdot\|_1$ is called weaker (or not stronger) than $\|\cdot\|_2$, in symbolic notation $\|\cdot\|_1 \lesssim \|\cdot\|_2$ (equivalently: $\|\cdot\|_2$ stronger than $\|\cdot\|_1$, or $\|\cdot\|_2 \gtrsim \|\cdot\|_1$), if there is a constant $C$ such that
$$\|x\|_1 \le C\, \|x\|_2 \quad\text{for all } x \in X.$$
Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on $X$ are equivalent, in symbolic notation $\|\cdot\|_1 \sim \|\cdot\|_2$, if $\|\cdot\|_1 \lesssim \|\cdot\|_2 \lesssim \|\cdot\|_1$, or equivalently, if there are $C_1, C_2 \in (0, \infty)$ with
$$\frac{1}{C_1} \|x\|_1 \le \|x\|_2 \le C_2\, \|x\|_1 \quad\text{for all } x \in X.$$
4.1.2 Basic Facts about Banach Spaces

A sequence $x_i \in X$ in a normed vector space $(X, \|\cdot\|)$ is called a Cauchy sequence (with respect to $\|\cdot\|$) if
$$\sup_{i,j\ge n} \|x_i - x_j\| \to 0 \quad\text{as } n \to \infty.$$
A normed vector space $(X, \|\cdot\|)$ is called a Banach space if it is also complete (with respect to $\|\cdot\|$). Completeness means that any Cauchy sequence $x_i \in X$ has a limit $x := \lim_{i\to\infty} x_i \in X$ (i.e., $\|x - x_i\| \to 0$).

A subset $X_0 \subset X$ of a Banach space $(X, \|\cdot\|)$ is dense if for any $x \in X$ there is a sequence $x_i \in X_0$ with $\|x - x_i\| \to 0$. An equivalent criterion is that for any $\varepsilon > 0$ and any $x \in X$ there is some $x_\varepsilon \in X_0$ with $\|x - x_\varepsilon\| \le \varepsilon$. A dense subset may be, in particular, a dense subspace. An important property of dense subsets is noted in the next remark.

Remark 4.1. Let $\Phi : X_0 \to Y$ be a continuous linear mapping, where $X_0$ is dense in the Banach space $(X, \|\cdot\|)$ while $(Y, \|\cdot\|_Y)$ is some Banach space. Then there is a unique continuous extension $\overline{\Phi} : X \to Y$ with $\overline{\Phi}(x) = \Phi(x)$ for all $x \in X_0$.

If a normed vector space $(X, \|\cdot\|)$ is not complete, it has a unique completion $(\overline{X}, |||\cdot|||)$, up to isomorphisms, such that $X$ is a dense subspace of the Banach space $\overline{X}$ and $|||\cdot|||$ is the continuous extension of $\|\cdot\|$ (similarly to Remark 4.1, $\Psi := \|\cdot\| : X \to \mathbb{R}$ has a unique extension $\overline{\Psi} = |||\cdot|||$ to a norm on $\overline{X}$). From now on we shall use the same symbol $\|\cdot\|$ for the norm on the closure $\overline{X}$ (instead of $|||\cdot|||$). Further, the continuous extension $\overline{\Phi}$ of any continuous $\Phi : X \to Y$ is again denoted by $\Phi$.

Remark 4.2. Let $(X, \|\cdot\|_1)$ and $(X, \|\cdot\|_2)$ be normed vector spaces with identical sets $X$ but different norms. Then we must distinguish between the completions $(X_1, \|\cdot\|_1)$ and $(X_2, \|\cdot\|_2)$ with respect to the corresponding norms. If $X$ is of infinite dimension, the identity $X_1 = X_2$ holds if and only if $\|\cdot\|_1 \sim \|\cdot\|_2$. If $\|\cdot\|_1 \lesssim \|\cdot\|_2$, then$^{1}$ $X_1 \supset X_2 \supset X$.

Definition 4.3. A Banach space is separable if there is a countable dense subset.

Lemma 4.4. Any subspace of a separable Banach space is separable.

Proof. This result is stated by Hausdorff in 1927 (for metric spaces instead of Banach spaces; cf. Tasković [280]). Let $U$ be a subspace of $B = \overline{\{x_i : i \in \mathbb{N}\}}$. If
$$\{u \in U : \|u - x_i\| \le 1/n\} \qquad (n \in \mathbb{N})$$
is nonempty, select an element $u_{i,n}$. The obtained set $S := \{u_{i,n}\}$ is again countable. Given $u \in U$ and $0 < \varepsilon \le 1$, there is an $x_i$ with $\|u - x_i\| < \frac{\varepsilon}{3}$. For $n \in [\frac{3}{2\varepsilon}, \frac{3}{\varepsilon}]$, an element $u_{i,n} \in S$ exists with $\|u_{i,n} - x_i\| \le \frac{2\varepsilon}{3}$. Hence $\|u - u_{i,n}\| \le \varepsilon$ proves that $S \subset U$ is dense in $U$. □

Definition 4.5. A closed subspace $U$ of a Banach space $X$ is called complemented if there is a subspace $W$ such that $X = U \oplus W$ is a direct sum (cf. [175, p. 4]).

Closedness of $U$ in $X = U \oplus W$ implies that $W$ is closed too. In 1933, Banach–Mazur [20] showed that there exist non-complemented closed subspaces. Lindenstrauss–Tzafriri [213] prove for infinite-dimensional Banach spaces $X$: Each closed subspace is complemented if and only if $X$ is isomorphic to a Hilbert space.

Lemma 4.6. Any finite-dimensional subspace $U \subset X$ is complemented.

Proof. Let $\Phi \in \mathcal{L}(X, U)$ be the projection from Theorem 4.16. Then $X = U \oplus W$ holds with $W := (I - \Phi)X$. □

$^{1}$ More precisely, the completion $X_1$ can be constructed such that $X_1 \supset X_2$.
4.1.3 Examples

Let $I$ be a (possibly infinite) index set. Examples for $I$ are $I = \{1, \ldots, n\}$, $\mathbb{N}$, $\mathbb{Z}$, or products of these sets. The vector spaces $\ell(I) = K^I$ and $\ell_0(I) \subset \ell(I)$ are already explained in Example 3.1.

Example 4.7. $\ell^p(I)$ consists of all $a \in \ell(I)$ with bounded norm
$$\begin{array}{ll}
\|a\|_{\ell^p(I)} := \|a\|_p := \Big(\sum_{i\in I} |a_i|^p\Big)^{1/p} & \text{for } p \in [1, \infty),\\
\|a\|_{\ell^\infty(I)} := \|a\|_\infty := \sup\{|a_i| : i \in I\} & \text{for } p = \infty.
\end{array} \tag{4.3}$$
$(\ell^p(I), \|\cdot\|_p)$ is a Banach space for all $1 \le p \le \infty$.

If $I$ is an uncountable index set (i.e., $\#I > \aleph_0 := \#\mathbb{N}$) and $p < \infty$, the finiteness of $\|a\|_{\ell^p(I)}$ implies that the support $\operatorname{supp}(a) := \{i \in I : a_i \neq 0\}$ is at most countable.

Remark 4.8. (a) For $p < \infty$, $\ell_0(I)$ is a dense subspace of $\ell^p(I)$.
(b) For an infinite set $I$, the completion of $\ell_0(I)$ under the norm $\|\cdot\|_\infty$ yields the proper subset $(c_0(I), \|\cdot\|_\infty) \subsetneq \ell^\infty(I)$ of zero sequences:$^{2}$
$$c_0(I) = \{a \in \ell^\infty(I) : \lim_{\nu\to\infty} a_\nu = 0\}. \tag{4.4}$$

Proof. (i) For finite sets $I$, $\ell_0(I) = \ell(I)$ holds and nothing has to be proved. Next, $I$ is assumed to be infinite and countable.
(ii) Case $p \in [1, \infty)$. Since $\sum_{\nu=1}^{\infty} |a_\nu|^p < \infty$, for any $\varepsilon > 0$, there is a $\nu_\varepsilon$ such that $\sum_{\nu>\nu_\varepsilon} |a_\nu|^p \le \varepsilon^p$, which proves $\|a - a'\|_p \le \varepsilon$ for $a' = (a_i')_{i\in I}$ with $a_\nu' := a_\nu$ for the finitely many $1 \le \nu \le \nu_\varepsilon$ and with $a_\nu' := 0$ otherwise. Since $a' \in \ell_0(I)$, $\ell_0(I)$ is dense in $\ell^p(I)$.
(iii) Case $p = \infty$. Assume $\ell_0(I) \ni a^{(n)} \to a \in \ell^\infty(I)$ with respect to $\|\cdot\|_\infty$. For any $n \in \mathbb{N}$ there is a $\nu_n$ such that $a_\nu^{(n)} = 0$ for $\nu > \nu_n$. For any $\varepsilon > 0$, there is an $n_\varepsilon$ such that $\|a^{(n)} - a\|_\infty \le \varepsilon$ for $n \ge n_\varepsilon$. Hence $|a_\nu| = |a_\nu^{(n_\varepsilon)} - a_\nu| \le \|a^{(n_\varepsilon)} - a\|_\infty \le \varepsilon$ for $\nu > \nu_{n_\varepsilon}$ proves $\lim_\nu a_\nu = 0$, i.e., $a \in c_0(I)$. This proves $\overline{\ell_0(I)} \subset c_0(I)$.
(iv) Assume $a \in c_0(I)$. For any $n \in \mathbb{N}$, there is a $\nu_n$ such that $|a_\nu| \le 1/n$ for $\nu > \nu_n$. Define $a^{(n)} \in \ell_0(I)$ by $a_\nu^{(n)} = a_\nu$ for $1 \le \nu \le \nu_n$, and $a_\nu^{(n)} = 0$ otherwise. Then $\|a^{(n)} - a\|_\infty \le 1/n$ holds; i.e., $a^{(n)} \to a$ and the reverse inclusion $c_0(I) \subset \overline{\ell_0(I)}$ follow.
(v) Let $I$ be uncountable. For $p < \infty$, we use the statement from Example 4.7: $\#\operatorname{supp}(a) \le \aleph_0$. For $p = \infty$, use $\aleph_0^2 = \aleph_0$ to prove $\#\operatorname{supp}(a) \le \aleph_0$ for all $a \in \overline{\ell_0(I)}$. Neglecting the zero components, repeat the proof with $I_0 := \operatorname{supp}(a)$ instead of $I$. □

$^{2}$ If $\#I > \aleph_0$, all countable subsequences must be zero sequences.
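Step (ii) of the proof is constructive: a truncation index $\nu_\varepsilon$ is chosen from the tail of the series. The sketch below makes this concrete for the geometric sequence $a_\nu = 2^{-\nu}$ ($\nu \ge 1$), an example chosen here because its tail norm $\big(\sum_{\nu>N} 2^{-p\nu}\big)^{1/p}$ has a closed form; the function names are illustrative, and floating-point arithmetic is used.

```python
def tail_norm_geometric(n_keep, p):
    """l^p norm of a - a' for a_nu = 2**(-nu), where a' keeps the first n_keep entries.

    Uses the geometric series: sum_{nu > N} 2**(-p*nu) = 2**(-p*(N+1)) / (1 - 2**(-p)).
    """
    tail_sum = 2.0 ** (-p * (n_keep + 1)) / (1.0 - 2.0 ** (-p))
    return tail_sum ** (1.0 / p)


def truncation_index(eps, p):
    """Smallest number of leading entries to keep so that the tail norm is <= eps."""
    n_keep = 0
    while tail_norm_geometric(n_keep, p) > eps:
        n_keep += 1
    return n_keep


nu_eps = truncation_index(1e-3, 2)
```

For $p = \infty$ the same truncation argument works only for zero sequences: the constant sequence $a_\nu = 1$ keeps the distance $\|a - a'\|_\infty = 1$ from every finitely supported $a'$, which is why the completion of $\ell_0(I)$ is $c_0(I)$ and not $\ell^\infty(I)$.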
Example 4.9. Let D ⊂ Rm be a domain. (a) Assume 1 ≤ p < ∞. Then3 Z o n p Lp (D) := f : D → K measurable and |f (x)| dx < ∞ d
defines a Banach space with the norm kf kLp (D) = kf kp =
R d
1/p p |f (x)| dx .
(b) For p = ∞, L∞ (D) := {f : D → K measurable and kf k∞ < ∞} equipped with the norm kf k∞ := ess sup |f (x)| is a Banach space. x∈D
Example 4.10. Let D ⊂ Rm be a domain. The following sets of continuous functions and n-times continuously differentiable functions are Banach spaces: (a) C(D) = C 0 (D) := {f : D → K with kf kC(D) < ∞}, where kf kC(D) = sup {|f (x)| : x ∈ D} . (b) Let n ∈ N0 and define C n (D) := {f : I → K with kf kC n (D) < ∞} with the norm kf kC n (D) = max|ν|≤n k∂ ν f kC(D) , where the maximum is taken over Pm all multi-indices ν = (ν1 , . . . , νm ) ∈ N0m with |ν| := i=1 νi . The mixed partial derivatives are abbreviated by Ym ∂ νi ∂ ν := . (4.5) i=1 ∂xi ∂ ν f denotes the weak derivative (considered as a distribution). Example 4.11. Let D ⊂ Rm be a domain. The Sobolev space n o H 1,2 (D) := f : D → K with kf kH 1,2 (D) < ∞ is a Banach space with the norm v v u
m uX X u
∂ 2 u 2 2 ν t
kf kH 1,2 (D) := t k∂ f kL2 (D) = kf kL2 (D) + .
∂xi f 2 L (D) i=1 |ν|≤1
4.1.4 Operators We recall the definition (3.6) of the set L(X, Y ) of linear mappings. The following remark states basic facts about continuous linear mappings, which are also called operators. The set of operators will be denoted by L(X, Y ). Remark 4.12. Let (X, k·kX ) and (Y, k·kY ) be normed spaces, and Φ ∈ L(X, Y ). (a) The following conditions (i) and (ii) are equivalent: (i) Φ is continuous; (ii) Φ is bounded, i.e., sup {kΦ(x)kY : x ∈ X with kxkX ≤ 1} < ∞. 3
To be precise, we have to form the quotient space {. . .} /N with N := {f : f = 0 on D\S for all S with measure µ(S) = 0}.
4 Banach Tensor Spaces
102
(b) The supremum in (ii) defines the operator norm$^4$
\[ \|\Phi\|_{Y\leftarrow X} := \sup\{\|\Phi(x)\|_Y : x\in X \text{ with } \|x\|_X\le 1\}. \tag{4.6a} \]
A simple consequence is
\[ \|\Phi(x)\|_Y \le \|\Phi\|_{Y\leftarrow X}\,\|x\|_X \quad\text{for all } x\in X. \tag{4.6b} \]
Another notation for the boundedness of $\Phi$ reads as follows: there is a constant $C<\infty$ such that
\[ \|\Phi(x)\|_Y \le C\,\|x\|_X \quad\text{for all } x\in X. \tag{4.6c} \]
Then the minimal possible $C$ in (4.6c) coincides with $\|\Phi\|_{Y\leftarrow X}$ (cf. (4.6b)).
(c) Let $\bar X$ and $\bar Y$ be the completions of $X$, $Y$ so that $(\bar X,\|\cdot\|_X)$ and $(\bar Y,\|\cdot\|_Y)$ are Banach spaces. Then the continuation $\bar\Phi:\bar X\to\bar Y$ discussed in Remark 4.1 has an identical operator norm: $\|\bar\Phi\|_{\bar Y\leftarrow\bar X} = \|\Phi\|_{Y\leftarrow X}$. Because of this equality, we shall not distinguish between $\|\cdot\|_{\bar Y\leftarrow\bar X}$ and $\|\cdot\|_{Y\leftarrow X}$.
(d) The set of continuous linear mappings (operators) from $X$ into $Y$ is denoted by $\mathcal{L}(X,Y)$. Together with (4.6a), $(\mathcal{L}(X,Y),\|\cdot\|_{Y\leftarrow X})$ forms a normed space. If $(Y,\|\cdot\|_Y)$ is a Banach space, $(\mathcal{L}(X,Y),\|\cdot\|_{Y\leftarrow X})$ is also a Banach space.

Proof. ($\alpha$) If the boundedness (ii) holds, the definition of $\|\Phi\|_{Y\leftarrow X}$ makes sense and (4.6b) follows by linearity of $\Phi$.
($\beta$) (boundedness $\Rightarrow$ continuity) For any $\varepsilon>0$ set $\delta := \varepsilon/\|\Phi\|_{Y\leftarrow X}$. Whenever $\|x'-x''\|_X\le\delta$, we conclude that
\[ \|\Phi(x')-\Phi(x'')\|_Y = \|\Phi(x'-x'')\|_Y \le \|\Phi\|_{Y\leftarrow X}\,\|x'-x''\|_X \le \|\Phi\|_{Y\leftarrow X}\,\delta = \varepsilon, \]
i.e., $\Phi$ is continuous.
($\gamma$) The direction 'continuity $\Rightarrow$ boundedness' is proved indirectly. Assume that (ii) is not valid. Then there are $x_i$ with $\|x_i\|_X\le 1$ and $\alpha_i := \|\Phi(x_i)\|_Y\to\infty$. The scaled vectors $x_i' := \frac{1}{1+\alpha_i}x_i$ satisfy $x_i'\to 0$, whereas $\|\Phi(x_i')\|_Y = \frac{\alpha_i}{1+\alpha_i}\to 1\ne 0 = \|0\|_Y = \|\Phi(0)\|_Y$. Hence $\Phi$ is not continuous. Steps ($\beta$) and ($\gamma$) prove part (a).
($\delta$) For the last part of (d) let $\Phi_i\in\mathcal{L}(X,Y)$ be a Cauchy sequence, i.e.,
\[ \sup_{i,j\ge n}\|\Phi_i-\Phi_j\|_{Y\leftarrow X}\to 0 \quad\text{as } n\to\infty. \]
For any $x\in X$, also the images $y_i := \Phi_i(x)$ form a Cauchy sequence as seen from $\|y_i-y_j\|_Y\le\|\Phi_i-\Phi_j\|_{Y\leftarrow X}\|x\|_X\to 0$. By the Banach space property, $y := \lim y_i\in Y$ exists uniquely giving rise to a mapping $\Phi(x) := y$. One verifies that $\Phi: X\to Y$ is linear and bounded with $\|\Phi_i-\Phi\|_{Y\leftarrow X}\to 0$, i.e., $\Phi\in\mathcal{L}(X,Y)$. □

$^4$ In (4.6a) one may replace $\|x\|_X\le 1$ by $\|x\|_X = 1$, as long as the vector space is not the trivial space $X=\{0\}$ containing no $x$ with $\|x\|_X = 1$. Since this trivial case is of minor interest, we will often use $\|x\|_X = 1$ instead.
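As a finite-dimensional illustration of (4.6a) and (4.6b), the following sketch takes $X=\mathbb{R}^3$ and $Y=\mathbb{R}^2$, both with the maximum norm, and a matrix `A` in the role of $\Phi$. The matrix, the test vector, and all function names are assumptions chosen for illustration only. The supremum is evaluated over the sign vectors, which suffices here because $x\mapsto\|Ax\|_\infty$ is convex and the sign vectors are the extreme points of the unit ball.

```python
from itertools import product

def max_norm(x):
    """Maximum norm ||x||_inf of a finite vector."""
    return max(abs(t) for t in x)

def apply(A, x):
    """Phi(x) = A x for a matrix stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def operator_norm_inf(A):
    """sup{ ||A x||_inf : ||x||_inf <= 1 } as in (4.6a).

    The supremum of a convex function over the unit ball is attained
    at an extreme point, i.e. at a vector with entries +1 or -1.
    """
    n = len(A[0])
    return max(max_norm(apply(A, s)) for s in product((-1.0, 1.0), repeat=n))

# Illustrative data (not taken from the text).
A = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0]]

norm_A = operator_norm_inf(A)
# Known closed form for this pair of norms: the maximal absolute row sum.
row_sum = max(sum(abs(a) for a in row) for row in A)

# Check the consequence (4.6b) for one sample vector.
x = [0.3, -0.7, 0.2]
lhs = max_norm(apply(A, x))       # ||Phi(x)||_Y
rhs = norm_A * max_norm(x)        # ||Phi||_{Y<-X} ||x||_X
print(norm_A, row_sum, lhs <= rhs)
```

For other norms the extreme-point argument still applies, but the unit ball may have infinitely many extreme points, so this brute-force evaluation is specific to the maximum norm.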
From now on, operator is used as a shorter name for 'continuous linear map'. In later proofs we shall use the following property of the supremum in (4.6a).

Remark 4.13. For all operators $\Phi\in\mathcal{L}(X,Y)$ and all $\varepsilon>0$, there is an $x_\varepsilon\in X$ with $\|x_\varepsilon\|_X\le 1$ such that
\[ \|\Phi\|_{Y\leftarrow X}\le(1+\varepsilon)\,\|\Phi(x_\varepsilon)\|_Y \quad\text{and}\quad \|\Phi(x_\varepsilon)\|_Y\ge(1-\varepsilon)\,\|\Phi\|_{Y\leftarrow X}. \]
A subset $K$ of a normed space is called compact, if any sequence $x_\nu\in K$ possesses a convergent subsequence with limit in $K$.

Definition 4.14. An operator $\Phi\in\mathcal{L}(X,Y)$ is called compact if the unit ball $B := \{x\in X : \|x\|_X\le 1\}$ is mapped onto $\Phi(B) := \{\Phi(x) : x\in B\}\subset Y$ and the closure $\overline{\Phi(B)}$ is a compact subset of $Y$. $\mathcal{K}(X,Y)$ denotes the set of compact operators.

The approximation property of a Banach space, which will be explained in Definition 4.89, is also related to compactness. Finally, we add results about projections (cf. Definition 3.4). The next statement follows from the closed graph theorem (cf. Yosida [306, §II.6]).

Lemma 4.15. Let $X = U\oplus W$ be the direct sum of closed subspaces $U$, $W$. The decomposition $x = u+w$ ($u\in U$, $w\in W$) of any $x\in X$ defines projections $P_1, P_2\in\mathcal{L}(X,X)$ onto these subspaces by $P_1x = u$ and $P_2x = w$. Vice versa, a projection $P_1\in\mathcal{L}(X,X)$ onto $U = P_1X$ together with $P_2 := I-P_1$ and $W = P_2X$ defines a direct sum of closed subspaces. According to Definition 4.5, the subspaces $U$ and $W$ are complemented.

Theorem 4.16. Let $Y\subset X$ be a subspace of a Banach space $X$ with $\dim(Y)\le n$. Then there exists a projection $\Phi\in\mathcal{L}(X,X)$ onto $Y$ such that $\|\Phi\|_{X\leftarrow X}\le\sqrt{n}$.

The proof can be found in DeVore–Lorentz [77, Chap. 9, §7] or Meise [224, Prop. 12.14]. The bound is sharp for general Banach spaces, but can be improved to
\[ \|\Phi\|_{X\leftarrow X}\le n^{\left|\frac12-\frac1p\right|} \quad\text{for } X = L^p. \tag{4.7} \]
If there is a uniform bound of $\|\Phi\|_{X\leftarrow X}$ for all finite-dimensional subspaces $Y$, then $X$ is isomorphic to a Hilbert space (cf. Albiac–Kalton [3, Theorem 13.4.3]).

Definition 4.17 (embedding). Let $X$ and $Y$ be Banach spaces with (different) norms, while the sets satisfy $X\subset Y$. Then $X$ is continuously embedded into $Y$ if the identity map $\mathrm{id}: X\to Y$ is continuous, i.e., $\|x\|_Y\le C\|x\|_X$ for all $x\in X$. The embedding is dense and continuous if, in addition, $X$ is dense in $Y$. More generally, the embedding $\iota: X\to Y$ is an injective map so that $X$ and $\iota(X)\subset Y$ are isomorphic. Identifying $X$ and $\iota(X)$, we get the above situation.

In the last case of Remark 4.2, $X_2$ is continuously and densely embedded into $X_1$. Concerning the density, use that already $X$ is dense in $(X_1,\|\cdot\|_1)$.
4.1.5 Dual Spaces

A trivial example of a Banach space $(Y,\|\cdot\|_Y)$ is the field $\mathbb{K}\in\{\mathbb{R},\mathbb{C}\}$ with the absolute value $|\cdot|$ as norm. The operators $X\to\mathbb{K}$ are called continuous functionals or continuous forms. The Banach space $\mathcal{L}(X,\mathbb{K})$ is called the (continuous) dual of $X$ and denoted by$^5$ $X^* := \mathcal{L}(X,\mathbb{K})$. The norm $\|\cdot\|_{X^*}$ of $X^*$ follows from the general definition (4.6a):
\[ \|\varphi\|_{X^*} = \sup\{|\varphi(x)| : x\in X \text{ with } \|x\|_X\le 1\} = \sup\{|\varphi(x)|/\|x\|_X : 0\ne x\in X\}. \tag{4.8} \]
Instead of $\|\cdot\|_{X^*}$, we also use the notation $\|\cdot\|_X^*$ (meaning the dual norm corresponding to $\|\cdot\|_X$). The next statement is one of the many versions of the Hahn–Banach theorem (cf. Yosida [306, §IV.6]).

Theorem 4.18. Let $(X,\|\cdot\|_X)$ be a normed linear space and $U\subset X$ a subspace. If a linear form is bounded on $U$, it can be extended to a continuous functional on $X$ with the same bound. In particular, for $x_0\in X$ there is some $\varphi\in X^*$ such that
\[ \varphi(x_0) = \|x_0\|_X \quad\text{and}\quad \|\varphi\|_{X^*} = 1. \tag{4.9} \]
This implies that we recover the norm $\|\cdot\|_X$ from the dual norm $\|\cdot\|_{X^*}$ via the following maximum (no supremum is needed!):
\[ \|x\|_X = \max\{|\varphi(x)| : \|\varphi\|_{X^*} = 1\} = \max\left\{\frac{|\varphi(x)|}{\|\varphi\|_{X^*}} : 0\ne\varphi\in X^*\right\}. \tag{4.10} \]

Corollary 4.19. Let $\{x_\nu\in X : 1\le\nu\le n\}$ be linearly independent. Then there are functionals $\varphi_\nu\in X^*$ such that $\varphi_\nu(x_\mu) = \delta_{\nu\mu}$ (cf. (2.1)). The functionals $(\varphi_\nu)_{\nu=1}^n$ are called dual to $(x_\nu)_{\nu=1}^n$.

Proof. Let $U := \operatorname{span}\{x_\nu : 1\le\nu\le n\}$, and define the dual system $\{\varphi_\nu\}$ as in Definition 3.6. Since $\dim(U)<\infty$, the functionals $\varphi_\nu$ are bounded on $U$. By Theorem 4.18 there exists an extension of all $\varphi_\nu$ to $X^*$ with the same bound. □

The following Lemma of Auerbach is proved, e.g., in Meise [224, Lemma 10.5] or Light–Cheney [211, page 131].

Lemma 4.20. For any $n$-dimensional subspace of a Banach space $X$, there exists a basis $\{x_\nu : 1\le\nu\le n\}$ and a corresponding dual system $\{\varphi_\nu : 1\le\nu\le n\}$ such that $\|x_\nu\| = \|\varphi_\nu\|^* = 1$ for $1\le\nu\le n$.

$^5$ Note that $X^*$ is a subset of $X'$, the space of the algebraic duals.
Lemma 4.21. (a) Let $(X,\|\cdot\|_1)$ and $(X,\|\cdot\|_2)$ be two normed vector spaces with $\|\cdot\|_1\le C\|\cdot\|_2$. By Remark 4.2, completion yields Banach spaces $(X_1,\|\cdot\|_1)$ and $(X_2,\|\cdot\|_2)$ with $X_2\subset X_1$. The corresponding duals $(X_1^*,\|\cdot\|_1^*)$ and $(X_2^*,\|\cdot\|_2^*)$ satisfy $X_1^*\subset X_2^*$. The dual norms fulfil $\|\varphi\|_2^*\le C\|\varphi\|_1^*$ for all $\varphi\in X_1^*$ with the same constant $C$. If $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent, $X_1^* = X_2^*$ holds.
(b) If $(X^*,\|\cdot\|_1^*)$ and $(X^*,\|\cdot\|_2^*)$ are identical sets with equivalent dual norms $\|\cdot\|_1^*$ and $\|\cdot\|_2^*$ generated by normed vector spaces $(X,\|\cdot\|_1)$ and $(X,\|\cdot\|_2)$, then $\|\cdot\|_1$ and $\|\cdot\|_2$ are also equivalent.

Proof. (i) Let $\varphi\in X_1^*$. Since $X_2\subset X_1$, $\varphi(x_2)$ is well defined for any $x_2\in X_2$. For $\|x_2\|_2 = 1$ we obtain
\[ |\varphi(x_2)|\le\|\varphi\|_1^*\,\|x_2\|_1\le C\,\|\varphi\|_1^*\,\|x_2\|_2 = C\,\|\varphi\|_1^*. \]
Taking the supremum over all $x_2\in X_2$ with $\|x_2\|_2 = 1$, we get $\|\varphi\|_2^*\le C\|\varphi\|_1^*$. Again, by Remark 4.2, $X_1^*\subset X_2^*$ follows.
(ii) For equivalent norms both inclusions $X_1^*\subset X_2^*\subset X_1^*$ prove $X_1^* = X_2^*$.
(iii) The identity $\|x\|_1 = \max\{|\varphi(x)| : \|\varphi\|_1^* = 1\} = \max_{\varphi\ne 0}|\varphi(x)|/\|\varphi\|_1^*$ follows from (4.10). By norm equivalence, $\|\varphi\|_1^*\le C\|\varphi\|_2^*$ follows so that
\[ \max_{\varphi\ne 0}\frac{|\varphi(x)|}{\|\varphi\|_1^*}\ge\frac1C\max_{\varphi\ne 0}\frac{|\varphi(x)|}{\|\varphi\|_2^*} = \frac1C\,\|x\|_2 \]
and vice versa, proving Part (b). □
The Banach space $(X^*,\|\cdot\|_{X^*})$ has again a dual $(X^{**},\|\cdot\|_{X^{**}})$ called the bidual of $X$. The embedding $X\subset X^{**}$ has to be understood as identification of $x\in X$ with the bidual mapping $\chi_x\in X^{**}$ defined by $\chi_x(\varphi) := \varphi(x)$ for all $\varphi\in X^*$. If $X = X^{**}$, the Banach space $X$ is called reflexive.

Lemma 4.22. Let $\varphi\in X^*$. Then
\[ \|\varphi\|_{X^*} = \sup_{0\ne x\in X}\frac{|\varphi(x)|}{\|x\|_X} = \max_{0\ne\Phi\in X^{**}}\frac{|\Phi(\varphi)|}{\|\Phi\|_{X^{**}}}. \]
If $X$ is reflexive, $\|\varphi\|_{X^*} = \max_{0\ne x\in X}|\varphi(x)|/\|x\|_X$ holds (max instead of sup!).

Proof. The left equality holds by the definition of $\|\cdot\|_{X^*}$. The right equality is the identity (4.10) with $x$, $X$, $\varphi$ replaced with $\varphi$, $X^*$, $\Phi$. In the reflexive case, $\Phi(\varphi) = \varphi(x_\Phi)$ holds for some $x_\Phi\in X$ and proves the second part. □

Definition 4.23. If $\Phi\in\mathcal{L}(X,Y)$, the dual operator $\Phi^*\in\mathcal{L}(Y^*,X^*)$ is defined via $\Phi^*:\eta\mapsto\xi := \Phi^*\eta$ with $\xi(x) := \eta(\Phi x)$ for all $x\in X$.

Lemma 4.24. $\|\Phi^*\|_{X^*\leftarrow Y^*} = \|\Phi\|_{Y\leftarrow X}$.

The embedding properties of $X\subset Y$ are almost inherited by their dual spaces (cf. Definition 4.17). For a proof compare Hackbusch [141, Lemma 6.63].

Lemma 4.25. Let $X\subset Y$ be a dense and continuous embedding of the Banach spaces. Then also $Y'$ is continuously embedded in $X'$. If $X$ is reflexive, the embedding $Y'\subset X'$ is also dense.
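Definition 4.23 and Lemma 4.24 can be checked in a small finite-dimensional setting. The sketch below assumes $X=(\mathbb{R}^3,\|\cdot\|_\infty)$ and $Y=(\mathbb{R}^2,\|\cdot\|_\infty)$, so that the duals carry the $\ell^1$ norm and the dual operator of a matrix `A` is its transpose. The matrix, the vectors `eta` and `x`, and all helper names are illustrative assumptions.

```python
import math
from itertools import product

def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(s * t for s, t in zip(u, v))

def norm1(x):
    return sum(abs(t) for t in x)

def norm_inf(x):
    return max(abs(t) for t in x)

# Illustrative operator Phi = A : (R^3, max norm) -> (R^2, max norm).
A = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0]]
At = transpose(A)   # plays the role of Phi* : (R^2, l1) -> (R^3, l1)

# ||Phi||: supremum over the extreme points (sign vectors) of the unit ball.
norm_phi = max(norm_inf(matvec(A, s)) for s in product((-1.0, 1.0), repeat=3))

# ||Phi*||: the l1 unit ball has the extreme points +e_i and -e_i;
# by symmetry of the norm it is enough to test the unit vectors e_i.
unit_vectors = [[1.0, 0.0], [0.0, 1.0]]
norm_phi_star = max(norm1(matvec(At, e)) for e in unit_vectors)

# Defining identity of the dual operator: (Phi* eta)(x) = eta(Phi x).
eta = [0.4, -1.5]
x = [2.0, 1.0, -3.0]
xi_of_x = dot(matvec(At, eta), x)
eta_of_phi_x = dot(eta, matvec(A, x))

print(norm_phi, norm_phi_star, math.isclose(xi_of_x, eta_of_phi_x))
```

In this setting both norms reduce to the maximal absolute row sum of `A`, which is why they agree.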
4.1.6 Examples

In §4.1.3 examples of Banach spaces are given. Some of the corresponding dual spaces are easy to describe.

Example 4.26. (a) The dual of $\ell^p(I)$ for $1\le p<\infty$ is (isomorphic to) $\ell^q(I)$, where the conjugate $q$ is defined by $\frac1p+\frac1q = 1$ and the embedding $\ell^q(I)\hookrightarrow(\ell^p(I))^*$ is defined by
\[ \varphi(a) := \sum_{i\in I}a_i\varphi_i\in\mathbb{K} \quad\text{for } a = (a_i)_{i\in I}\in\ell^p(I) \text{ and } \varphi = (\varphi_i)_{i\in I}\in\ell^q(I). \]
The subspace $c_0(I)\subset\ell^\infty(I)$ (cf. (4.4)) has the dual $\ell^1(I)$. If $\#I<\infty$, equality $(\ell^\infty(I))^* = \ell^1(I)$ holds; otherwise, $(\ell^\infty(I))^*\supsetneqq\ell^1(I)$.
(b) Similarly, $(L^p(D))^*\cong L^q(D)$ is valid for $1\le p<\infty$ with the embedding $g\in L^q(D)\mapsto g(f) := \int_D f(x)g(x)\,dx$ for all $f\in L^p(D)$.
(c) Let $I = [a,b]\subset\mathbb{R}$ be an interval. Any functional $\varphi\in(C(I))^*$ corresponds to a function $g$ of bounded variation such that $\varphi(f) = \int_I f(x)\,dg(x)$ exists as Stieltjes integral for $f\in C(I)$. The latter integral with $g$ chosen as the step function
\[ g_s(x) := \begin{cases}0 & \text{for } x\le s,\\ 1 & \text{for } x>s\end{cases} \]
leads to the Dirac functional $\delta_s$, which satisfies
\[ \delta_s(f) = f(s) \quad\text{for all } f\in C(I) \text{ and } s\in I. \tag{4.11} \]
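For a finite index set, Example 4.26(a) together with (4.8) says that the dual norm of the functional induced by $\varphi\in\ell^q(I)$ is $\|\varphi\|_q$. The sketch below illustrates this with $p=3$ and a hypothetical vector `phi`: it builds the standard extremal vector $a^*_i = \operatorname{sign}(\varphi_i)|\varphi_i|^{q-1}/\|\varphi\|_q^{q-1}$, which has $\|a^*\|_p = 1$ and attains $\varphi(a^*) = \|\varphi\|_q$, and checks Hölder's inequality for one further vector. All names and data are illustrative assumptions.

```python
import math

def p_norm(x, p):
    """l^p norm of a finite vector for 1 <= p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

p = 3.0
q = p / (p - 1.0)            # conjugate exponent: 1/p + 1/q = 1

# Illustrative functional phi in l^q(I) with #I = 4.
phi = [2.0, -1.0, 0.5, 0.0]

def functional(a):
    """phi(a) = sum_i a_i * phi_i."""
    return sum(ai * fi for ai, fi in zip(a, phi))

phi_q = p_norm(phi, q)

# Extremal vector: unit l^p norm and phi(a_star) = ||phi||_q.
a_star = [math.copysign(abs(f) ** (q - 1.0), f) / phi_q ** (q - 1.0)
          for f in phi]

# Hoelder's inequality |phi(a)| <= ||phi||_q ||a||_p for another vector.
a = [1.0, 4.0, -2.0, 7.0]
holder_ok = abs(functional(a)) <= phi_q * p_norm(a, p)

print(p_norm(a_star, p), functional(a_star), phi_q, holder_ok)
```

Because the index set is finite, the supremum in (4.8) is attained; for infinite $I$ and $1<p<\infty$ the same formula for $a^*$ still works since $\ell^p$ is reflexive.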
4.1.7 Weak Convergence

Let $(X,\|\cdot\|)$ be a Banach space. We say that $(x_n)_{n\in\mathbb{N}}$ converges weakly to $x\in X$, if $\lim\varphi(x_n) = \varphi(x)$ for all $\varphi\in X^*$. In this case we write $x_n\rightharpoonup x$. Standard (strong) convergence $x_n\to x$ implies $x_n\rightharpoonup x$.

Lemma 4.27. If $x_n\rightharpoonup x$, then $\|x\|\le\liminf_{n\to\infty}\|x_n\|$.

Proof. Choose $\varphi\in X^*$ with $\|\varphi\|^* = 1$ and $|\varphi(x)| = \|x\|$ (cf. (4.9)) and note that $\|x\|\leftarrow|\varphi(x_n)|\le\|x_n\|$. □

Exercise 4.28. Let the sequence $\{x_n\}$ be bounded. Show that $x_n\rightharpoonup x$ already holds if $\lim\varphi(x_n) = \varphi(x)$ for all $\varphi\in\Xi$, where $\Xi$ is dense in $X^*$. Conversely, $x_n\rightharpoonup x$ implies that $\{x_n\}$ is bounded. Hint: For the latter statement use the uniform boundedness theorem (cf. [137, §3.4.7]).

Lemma 4.29. Let $N\in\mathbb{N}$. Assume that the sequences $(x_n^{(i)})_{n\in\mathbb{N}}$ for $1\le i\le N$ converge weakly to linearly independent limits $x^{(i)}\in X$ (i.e., $x_n^{(i)}\rightharpoonup x^{(i)}$). Then there is an $n_0$ such that for all $n\ge n_0$, the $N$-tuples $(x_n^{(i)} : 1\le i\le N)$ are linearly independent.
Proof. There exist functionals $\varphi^{(j)}\in X^*$ ($1\le j\le N$) with $\varphi^{(j)}(x^{(i)}) = \delta_{ij}$ (cf. Corollary 4.19). Set
\[ \Delta_n := \det\Big(\big(\varphi^{(j)}(x_n^{(i)})\big)_{i,j=1}^N\Big). \]
$x_n^{(i)}\rightharpoonup x^{(i)}$ implies $\varphi^{(j)}(x_n^{(i)})\to\varphi^{(j)}(x^{(i)})$. Continuity of the determinant proves $\Delta_n\to\Delta_\infty := \det((\delta_{ij})_{i,j=1}^N) = 1$. Hence there is an $n_0$ such that $\Delta_n>0$ for all $n\ge n_0$, but $\Delta_n>0$ proves linear independence of $\{x_n^{(i)} : 1\le i\le N\}$. □

For the local sequential weak compactness used next see Yosida [306, Chap. V.2].

Lemma 4.30. If $X$ is a reflexive Banach space, any bounded sequence $x_n\in X$ has a subsequence $x_{n_\nu}$ converging weakly to some $x\in X$.

Corollary 4.31. Let $X$ be a reflexive Banach space, $x_n\in X$ a bounded sequence with $x_n = \sum_{i=1}^r\xi_{n,i}$, $\xi_{n,i}\in X$, and $\|\xi_{n,i}\|\le C\|x_n\|$. Then there are $\xi_i\in X$ and a subsequence such that $\xi_{n_\nu,i}\rightharpoonup\xi_i$ and, in particular, $x_{n_\nu}\rightharpoonup x := \sum_{i=1}^r\xi_i$.

Proof. By Lemma 4.30, weak convergence $x_n\rightharpoonup x$ holds for $n\in\mathbb{N}^{(0)}\subset\mathbb{N}$, where $\mathbb{N}^{(0)}$ is an infinite subset of $\mathbb{N}$. Because $\|\xi_{n,1}\|\le C\|x_n\|$, $\xi_{n,1}$ is also a bounded sequence for $n\in\mathbb{N}^{(0)}$. A second infinite subset $\mathbb{N}^{(1)}\subset\mathbb{N}^{(0)}$ exists with the property $\xi_{n,1}\rightharpoonup\xi_1$ ($n\in\mathbb{N}^{(1)}$) for some $\xi_1\in X$. Next, $\xi_{n,2}\rightharpoonup\xi_2\in X$ can be shown for $n\in\mathbb{N}^{(2)}\subset\mathbb{N}^{(1)}$, etc. Finally, for $\mathbb{N}^{(r)}\ni n\to\infty$ all sequences $\xi_{n,i}$ converge weakly to $\xi_i$ and summation over $i$ yields $x_n\rightharpoonup x := \sum_{i=1}^r\xi_i$. □

Definition 4.32. A subset $M\subset X$ is called weakly closed, if $x_n\in M$ and $x_n\rightharpoonup x$ imply $x\in M$.

Note the implication '$M$ weakly closed $\Rightarrow$ $M$ closed', i.e., 'weakly closed' is a stronger statement than 'closed'.

Theorem 4.33. Let $(X,\|\cdot\|)$ be a reflexive Banach space with a weakly closed subset $\emptyset\ne M\subset X$. Then the following minimisation problem has a solution: for any $x\in X$ find $v\in M$ with
\[ \|x-v\| = \inf\{\|x-w\| : w\in M\}. \]

Proof. Choose any sequence $w_n\in M$ with $\|x-w_n\|\searrow\inf\{\|x-w\| : w\in M\}$. Since $(w_n)_{n\in\mathbb{N}}$ is a bounded sequence in $X$, Lemma 4.30 ensures the existence of a weakly convergent subsequence $w_{n_i}\rightharpoonup v\in X$. $v$ belongs to $M$ because $w_{n_i}\in M$ and $M$ is weakly closed. Since $x-w_{n_i}\rightharpoonup x-v$ is also valid, Lemma 4.27 shows that $\|x-v\|\le\liminf\|x-w_{n_i}\|\le\inf\{\|x-w\| : w\in M\}$. □

Since the assumption of reflexivity excludes important spaces, we add some remarks on this subject. The existence of a minimiser ('nearest point') $v$ in a certain set $A\subset X$ to some $x\in X\setminus A$ is a well-studied subject. A set $A$ is called 'proximinal' if for all $x\in X\setminus A$ the best approximation problem $\|x-v\| = \inf_{w\in A}\|x-w\|$ has at least one solution $v\in A$. Without the assumption of
reflexivity, there are statements ensuring under certain conditions that the set of points $x\in X\setminus A$ possessing nearest points in $A$ is dense (e.g., Edelstein [85]). However, concluding from the weak closedness$^6$ of the minimal subspaces that they are proximinal requires reflexivity, as the following statement elucidates.

Theorem 4.34 ([107, p. 61]). For a Banach space $X$ the following statements are equivalent:
(a) $X$ is reflexive.
(b) All closed subspaces are proximinal.
(c) All weakly closed nonempty subsets are proximinal.
4.1.8 Continuous Multilinear Mappings

Multilinearity is defined in (3.15). The equivalence of continuity and boundedness shown in Remark 4.12a also holds for bilinear, or generally, for multilinear mappings. The proof follows using the same arguments.

Lemma 4.35. (a) A bilinear mapping $B : (V,\|\cdot\|_V)\times(W,\|\cdot\|_W)\to(X,\|\cdot\|)$ is continuous if and only if there is some $C\in\mathbb{R}$ such that
\[ \|B(v,w)\|\le C\,\|v\|_V\,\|w\|_W \quad\text{for all } v\in V \text{ and } w\in W. \]
(b) A multilinear mapping $A : \times_{j=1}^d(V_j,\|\cdot\|_j)\to(X,\|\cdot\|)$ is continuous if and only if there is some $C\in\mathbb{R}$ such that
\[ \|A(v^{(1)},\ldots,v^{(d)})\|\le C\prod_{j=1}^d\|v^{(j)}\|_j \quad\text{for all } v^{(j)}\in V_j. \]
$^6$ On the other hand, if $X$ coincides with the dual $Y^*$ of another Banach space $Y$, every weak$^*$ closed set in $X$ is proximinal (cf. Holmes [166, p. 123]). However, the later proofs of weak closedness of $U_j^{\min}(\mathbf{v})$ in §6 do not allow us to conclude weak$^*$ closedness as well.

4.2 Topological Tensor Spaces

4.2.1 Notations

The algebraic tensor product $V\otimes_a W$ has been defined in (3.9) by the span of all elementary tensors $v\otimes w$ for $v\in V$ and $w\in W$. In pure algebraic constructions such a span is always a finite linear combination. Infinite sums, as well as limits of sequences, cannot be defined without topology. In the finite-dimensional case, the algebraic tensor product $V\otimes_a W$ is already complete. Corollary 4.71 will even show a similar case with one factor being infinite dimensional.

As already announced in (3.10), the completion $X := \overline{X_0}$ of $X_0 := V\otimes_a W$ with respect to some norm $\|\cdot\|$ yields a Banach space $(X,\|\cdot\|)$, which is denoted by
\[ V\otimes_{\|\cdot\|}W := V\underset{\|\cdot\|}{\otimes}W := \overline{V\otimes_a W}^{\,\|\cdot\|} \]
and now called a Banach tensor space. Note that the result of the completion depends on the norm $\|\cdot\|$ as already discussed in §4.1.2.

A tensor $x\in V\otimes_{\|\cdot\|}W$ is defined as a limit $x = \lim_{n\to\infty}x_n$ of some $x_n\in V\otimes_a W$ from the algebraic tensor space, e.g., $x_n$ is the sum of say $n$ elementary tensors. In general, such a limit of a sequence cannot be written as an infinite sum (but see §4.2.3.3). Furthermore, the convergence $x_n\to x$ may be arbitrarily slow. However, in practice, one is interested in fast convergence, in order to approximate $x$ by $x_n$ with reasonable $n$ (if $n$ is the number of elementary tensors involved, the storage will be related to $n$). In that case the statement $x_n\to x$ should be replaced with a quantified error estimate:
\[ \|x_n-x\|\le O(\varphi(n)) \text{ with } \varphi(n)\to 0 \text{ as } n\to\infty, \quad\text{or}\quad \|x_n-x\|\le o(\psi(n)) \text{ with } \sup\psi(n)<\infty, \tag{4.12} \]
i.e., $\|x_n-x\|\le C\varphi(n)$ for some constant $C$ or $\|x_n-x\|/\psi(n)\to 0$ as $n\to\infty$.

The notation $\otimes_{\|\cdot\|}$ becomes a bit cumbersome if the norm sign $\|\cdot\|$ carries additional suffixes, as e.g., $\|\cdot\|_{C^n(I)}$. To shorten the notation, we often copy the (shortened) suffix$^7$ of the norm to the tensor sign, e.g., the association $p\leftrightarrow\|\cdot\|_{\ell^p(\mathbb{Z})}$ or $\wedge\leftrightarrow\|\cdot\|_{\wedge(V,W)}$ is used in
\[ V\underset{p}{\otimes}W = V\otimes_p W, \qquad V\underset{\wedge}{\otimes}W = V\otimes_\wedge W, \qquad\text{etc.} \]
The neutral notation $V\otimes W$ is used only if (i) there is no doubt about the norm of the Banach tensor space or (ii) a statement holds both for the algebraic tensor product and the topological one. In the finite-dimensional case, in which no completion is necessary, we shall use $V\otimes W$ for the algebraic product space (without any norm), as well as for any normed tensor space $V\otimes W$.

If $V$ and $W$ are infinite-dimensional spaces, it turns out that the algebraic tensor space $V\otimes_a W$ is only a rather small part of $V\otimes_{\|\cdot\|}W$. As remarked by Uschmajew [289], $V\otimes_a W$ is a set of first category,$^8$ i.e., there are closed sets $X_k$ ($k\in\mathbb{N}$) with empty interior such that $V\otimes_a W = \bigcup_{k\in\mathbb{N}}X_k$. The precise statement in Proposition 4.36 refers to the tensor space $\mathbf{V}_{\mathrm{alg}} := \bigotimes_{j=1}^d V_j$ of order $d$ and will be proved at the end of §8.1.

Proposition 4.36. Under the conditions (4.54a) or (4.54b) and if two of the spaces $V_j$ are infinite dimensional, $\mathbf{V}_{\mathrm{alg}}$ is a set of first category.

$^7$ The letter $a$ must be avoided, since this denotes the algebraic product $\otimes_a$.

$^8$ The complete space $V\otimes_{\|\cdot\|}W$ cannot be of first category due to the theorem of Baire–Hausdorff (cf. [137, Theorem 3.40]).
4.2.2 Continuity of the Tensor Product, Crossnorms

4.2.2.1 Basic Facts

We start from Banach spaces $(V,\|\cdot\|_V)$ and $(W,\|\cdot\|_W)$. The question arises as to whether the norms $\|\cdot\|_V$ and $\|\cdot\|_W$ define a norm $\|\cdot\|$ on $V\otimes W$ in a canonical way. A suitable choice seems to be the definition
\[ \|v\otimes w\| = \|v\|_V\,\|w\|_W \quad\text{for all } v\in V \text{ and } w\in W \tag{4.13} \]
for the norm of elementary tensors. However, in contrast to linear maps (cf. Lemma 3.52), the definition of a norm on the set of elementary tensors does not determine $\|\cdot\|$ on the whole of $V\otimes_a W$. Hence, unlike algebraic tensor spaces, the topological tensor space $(V\otimes W,\|\cdot\|)$ is not uniquely determined by the components $(V,\|\cdot\|_V)$ and $(W,\|\cdot\|_W)$.

Definition 4.37. Any norm $\|\cdot\|$ on $V\otimes_a W$ satisfying (4.13) is called a crossnorm.

A necessary condition for $\|\cdot\|$ is the continuity of the tensor product; i.e., the mapping $(v,w)\in V\times W\mapsto v\otimes w\in V\otimes W$ must be continuous. Since $\otimes$ is bilinear (cf. Lemma 3.12), we may apply Lemma 4.35a.

Remark 4.38. The tensor product $\otimes : (V,\|\cdot\|_V)\times(W,\|\cdot\|_W)\to(V\otimes W,\|\cdot\|)$ is continuous if and only if there exists some $C<\infty$ such that
\[ \|v\otimes w\|\le C\,\|v\|_V\,\|w\|_W \quad\text{for all } v\in V \text{ and } w\in W. \tag{4.14} \]
Here $(V\otimes W,\|\cdot\|)$ may be the algebraic (possibly incomplete) tensor space $V\otimes_a W$ equipped with a norm $\|\cdot\|$ or a Banach tensor space $(V\otimes_{\|\cdot\|}W,\|\cdot\|)$. Inequality (4.14) is, in particular, satisfied by a crossnorm.

Without continuity of $\otimes$, unexpected and unwanted effects may happen. Assume for instance that $V$ is a Banach space and consider a sequence $v_i\in V$ with $v = \lim v_i$. Then the elementary tensors $v_i\otimes w$ and $v\otimes w$ belong to $V\otimes_a W$ and therefore also to the completion $V\otimes_{\|\cdot\|}W$, but $v_i\otimes w\to v\otimes w$ may not hold. For a proof, we use the following lemma (cf. Defant–Floret [74, I.1.2]).

Lemma 4.39. Let $V$ be a Banach space ($W$ may be incomplete). Assume separate continuity: $\|v\otimes w\|\le C_w\|v\|_V$ and $\|v\otimes w\|\le C_v\|w\|_W$ hold for all $v\in V$ and $w\in W$ with $C_v$ [$C_w$] depending on $v$ [$w$]. Then (4.14) follows for some $C<\infty$.

Assuming that (4.14) is not true, we conclude from Lemma 4.39 that at least one of the statements about separate continuity is not valid. Without loss of generality, there is some $w\in W$ such that $\sup_{0\ne v\in V}\|v\otimes w\|/\|v\|_V = \infty$. Fixing such a $w$, we obtain a norm $|||v|||_w := \|v\otimes w\|$ which is strictly stronger than $\|\cdot\|_V$. Then there are sequences $v_i\to v\in V$ which do not converge with respect to the $|||\cdot|||_w$ norm (cf. Remark 4.2).
In the definition of $V\otimes_{\|\cdot\|}W$ we have not fixed whether $V$ and $W$ are complete or not. The next lemma shows that in any case the same Banach tensor space $V\otimes_{\|\cdot\|}W$ results. Any incomplete normed space $(V,\|\cdot\|_V)$ may be regarded as a dense subset of the Banach space $(\bar V,\|\cdot\|_V)$. Replacing the notations $\bar V$, $V$ by $V$, $V_0$, we can apply the following lemma. There we replace $V$ and $W$ in $V\otimes W$ with dense subspaces $V_0$, $W_0$. Three norms are involved: $\|\cdot\|_V$ for $V_0$ and $V$, $\|\cdot\|_W$ for $W_0$ and $W$, and $\|\cdot\|$ for the tensor spaces $V_0\otimes_a W_0$, $V\otimes_a W$, $V\otimes_{\|\cdot\|}W$.

Lemma 4.40. Let $V_0$ be dense in $(V,\|\cdot\|_V)$ and $W_0$ be dense in $(W,\|\cdot\|_W)$. Assume $\otimes : (V,\|\cdot\|_V)\times(W,\|\cdot\|_W)\to(V\otimes_{\|\cdot\|}W,\|\cdot\|)$ to be continuous; i.e., (4.14) holds. Then $V_0\otimes_a W_0$ is dense in $V\otimes_{\|\cdot\|}W$, i.e., $\overline{V_0\otimes_a W_0} = V\otimes_{\|\cdot\|}W$.

Proof. For any $\varepsilon>0$ and any $x\in V\otimes_{\|\cdot\|}W$ we must show that there is an $x_\varepsilon\in V_0\otimes_a W_0$ with $\|x-x_\varepsilon\|\le\varepsilon$. By the definition of $V\otimes_{\|\cdot\|}W$, there is an $x'\in V\otimes_a W$ with $\|x-x'\|\le\varepsilon/2$ and a finite sum representation $x' = \sum_{i=1}^n v_i'\otimes w_i'$ with $v_i'\in V$ and $w_i'\in W$. We set $C_{\max} := \max\{\|v_i'\|_V,\|w_i'\|_W : 1\le i\le n\}$, and choose $\delta$ so small that $nC\delta\,(2C_{\max}+\delta)\le\varepsilon/2$, where $C$ is the equally named constant in (4.14). Select $v_i\in V_0$ and $w_i\in W_0$ with $\|v_i'-v_i\|_V\le\delta$ and $\|w_i'-w_i\|_W\le\delta$ and set $x_\varepsilon := \sum_{i=1}^n v_i\otimes w_i$. Then
\begin{align*}
\|x'-x_\varepsilon\| &= \Big\|\sum_{i=1}^n\big(v_i'\otimes w_i'-v_i\otimes w_i\big)\Big\| \\
&= \Big\|\sum_{i=1}^n\big\{(v_i'-v_i)\otimes w_i'+v_i'\otimes(w_i'-w_i)+(v_i-v_i')\otimes(w_i'-w_i)\big\}\Big\| \\
&\le \sum_{i=1}^n\big\{\|(v_i'-v_i)\otimes w_i'\|+\|v_i'\otimes(w_i'-w_i)\|+\|(v_i-v_i')\otimes(w_i'-w_i)\|\big\} \\
&\underset{(4.14)}{\le} \sum_{i=1}^n\big\{C\|v_i'-v_i\|_V\|w_i'\|_W+C\|v_i'\|_V\|w_i'-w_i\|_W+C\|v_i-v_i'\|_V\|w_i'-w_i\|_W\big\} \\
&\le nC\delta\,\{2C_{\max}+\delta\}\le\varepsilon/2
\end{align*}
proves $\|x-x_\varepsilon\|\le\varepsilon$. □
Remark 4.41. If $(V,\|\cdot\|_V)$ and $(W,\|\cdot\|_W)$ are separable (cf. Definition 4.3) and continuity (4.14) holds, $(V\otimes_{\|\cdot\|}W,\|\cdot\|)$ is also separable.

Proof. Let $\{v_i : i\in\mathbb{N}\}$ and $\{w_j : j\in\mathbb{N}\}$ be dense subsets of $V$ and $W$, respectively. The set $E := \{v_i\otimes w_j : i,j\in\mathbb{N}\}\subset V\otimes_a W$ is again countable. By continuity of $\otimes$, the closure $\overline{E}$ contains all elementary tensors $v\otimes w\in V\otimes_a W$. Let $B := \operatorname{span}\{E\}$ be the set of finite linear combinations of $v_i\otimes w_j\in E$. The previous result shows that $V\otimes_{\|\cdot\|}W\supset\overline{B}\supset V\otimes_a W$ and proves $\overline{B} = V\otimes_{\|\cdot\|}W$. □
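4.2.2.2 Examples

Example 4.42 ($\ell^p$). Let $V := (\ell^p(I),\|\cdot\|_{\ell^p(I)})$ and $W := (\ell^p(J),\|\cdot\|_{\ell^p(J)})$ for $1\le p\le\infty$ and for some finite or countable $I$ and $J$. Then the tensor product of the vectors $a = (a_\nu)_{\nu\in I}\in\ell^p(I)$ and $b = (b_\mu)_{\mu\in J}\in\ell^p(J)$ may be considered as the (infinite) matrix $a\otimes b =: c = (c_{\nu\mu})_{\nu\in I,\mu\in J}$ with entries $c_{\nu\mu} := a_\nu b_\mu$. Any elementary tensor $a\otimes b$ belongs to$^9$ $\ell^p(I\times J)$, and $\|\cdot\|_{\ell^p(I\times J)}$ is a crossnorm:
\[ \|a\otimes b\|_{\ell^p(I\times J)} = \|a\|_{\ell^p(I)}\,\|b\|_{\ell^p(J)} \quad\text{for } a\in\ell^p(I),\ b\in\ell^p(J). \tag{4.15} \]

Proof. For $p<\infty$, (4.15) follows from
\[ \|a\otimes b\|_{\ell^p(I\times J)}^p = \sum_{\nu\in I}\sum_{\mu\in J}|a_\nu b_\mu|^p = \sum_{\nu\in I}\sum_{\mu\in J}|a_\nu|^p|b_\mu|^p = \Big(\sum_{\nu\in I}|a_\nu|^p\Big)\Big(\sum_{\mu\in J}|b_\mu|^p\Big) = \big(\|a\|_{\ell^p(I)}\,\|b\|_{\ell^p(J)}\big)^p. \]
For $p = \infty$, use $|c_{\nu\mu}| = |a_\nu b_\mu| = |a_\nu||b_\mu|\le\|a\|_{\ell^\infty(I)}\|b\|_{\ell^\infty(J)}$ to show that $\|a\otimes b\|_{\ell^\infty(I\times J)}\le\|a\|_{\ell^\infty(I)}\|b\|_{\ell^\infty(J)}$. Since for any $\varepsilon>0$ there are indices $\nu^*$, $\mu^*$ with
\[ |a_{\nu^*}|\ge(1-\varepsilon)\,\|a\|_{\ell^\infty(I)} \quad\text{and}\quad |b_{\mu^*}|\ge(1-\varepsilon)\,\|b\|_{\ell^\infty(J)}, \]
the reverse inequality follows. □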
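The crossnorm identity (4.15) is easy to confirm numerically for finite index sets. The following sketch uses two hypothetical vectors `a` and `b`, forms the matrix $c = a\otimes b$ with entries $a_\nu b_\mu$, and compares both sides of (4.15) for several values of $p$, including $p=\infty$. Vector entries and names are illustrative assumptions.

```python
import math

def p_norm(values, p):
    """l^p norm of a finite family of numbers, with p = math.inf allowed."""
    vals = [abs(t) for t in values]
    if p == math.inf:
        return max(vals)
    return sum(t ** p for t in vals) ** (1.0 / p)

# Illustrative vectors a in l^p(I) with #I = 3 and b in l^p(J) with #J = 2.
a = [1.0, -2.0, 0.5]
b = [3.0, 0.25]

# Elementary tensor a ⊗ b as a matrix c with entries c[nu][mu] = a[nu] * b[mu].
c = [[x * y for y in b] for x in a]
entries = [e for row in c for e in row]   # c viewed as an element of l^p(I x J)

# Left- and right-hand side of (4.15) for several exponents p.
results = {p: (p_norm(entries, p), p_norm(a, p) * p_norm(b, p))
           for p in (1.0, 2.0, 3.5, math.inf)}

for p, (lhs, rhs) in results.items():
    print(p, lhs, rhs)
```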
Since linear combinations of finitely many elementary tensors again belong to $\ell^p(I\times J)$, we obtain $\ell^p(I)\otimes_a\ell^p(J)\subset\ell^p(I\times J)$. Hence the completion with respect to $\|\cdot\|_{\ell^p(I\times J)}$ yields a Banach tensor space $\ell^p(I)\otimes_p\ell^p(J)\subset\ell^p(I\times J)$. The next statement shows that, except for $p = \infty$, even equality holds.

Remark 4.43. $\ell^p(I)\otimes_p\ell^p(J) = \ell^p(I\times J)$ holds for $1\le p<\infty$.

$^9$ Note that $I\times J$ is again countable.
Proof. Let $1\le p<\infty$. It is sufficient to show that $\ell^p(I)\otimes_a\ell^p(J)$ is dense in $\ell^p(I\times J)$. Let $i_\nu$ and $j_\mu$ ($\nu,\mu\in\mathbb{N}$) be any enumerations of the index sets $I$ and $J$, respectively. Note that $c\in\ell^p(I\times J)$ has entries indexed by $(i,j)\in I\times J$. Then for each $c\in\ell^p(I\times J)$,
\[ \lim_{n\to\infty}\sum_{1\le\nu,\mu\le n}|c_{i_\nu,j_\mu}|^p = \|c\|_{\ell^p(I\times J)}^p \]
holds. Define $c^{(n)}\in\ell^p(I\times J)$ by
\[ c^{(n)}_{i_\nu,j_\mu} = \begin{cases}c_{i_\nu,j_\mu} & \text{for } 1\le\nu,\mu\le n,\\ 0 & \text{otherwise.}\end{cases} \]
The previous limit shows $c^{(n)}\to c$ as $n\to\infty$. Let $e_\nu^I\in\ell^p(I)$ and $e_\mu^J\in\ell^p(J)$ be the unit vectors (cf. (2.2)). Then the identity $c^{(n)} = \sum_{1\le\nu,\mu\le n}c^{(n)}_{i_\nu,j_\mu}\,e_\nu^I\otimes e_\mu^J$ shows that $c^{(n)}\in\ell^p(I)\otimes_a\ell^p(J)$. Since $c = \lim c^{(n)}\in\ell^p(I\times J)$ is an arbitrary element, $\ell^p(I)\otimes_a\ell^p(J)$ is dense in $\ell^p(I\times J)$. □

For $p = \infty$ we consider the proper subspace $c_0(I)\subsetneqq\ell^\infty(I)$ endowed with the same norm $\|\cdot\|_{\ell^\infty(I)}$ (cf. (4.4)). The previous proof idea can be used to show
\[ c_0(I)\otimes_\infty c_0(J) = c_0(I\times J). \]
Next, we consider the particular case of $p = 2$, i.e., $V := (\ell^2(I),\|\cdot\|_{\ell^2(I)})$ and $W := (\ell^2(J),\|\cdot\|_{\ell^2(J)})$ for some finite or countable $I$ and $J$. Again, $c := a\otimes b$ with $a\in\ell^2(I)$ and $b\in\ell^2(J)$, as well as any linear combination from $V\otimes_a W$, may be considered as a (possibly infinite) matrix $c = (c_{\nu\mu})_{\nu\in I,\mu\in J}$. Theorem 4.137 will provide an infinite singular-value decomposition of $c$:
\[ c = \sum_{i=1}^\infty\sigma_i\,v_i\otimes w_i \quad\text{with } \sigma_1\ge\sigma_2\ge\ldots\ge 0 \text{ and orthonormal systems } \{v_i\} \text{ and } \{w_i\}. \tag{4.16} \]
Algebraic tensors $v = \sum_{i=1}^n x_i\otimes y_i$ lead to $\sigma_i = 0$ for all $i>n$. Therefore the sequence $\sigma := (\sigma_i)_{i\in\mathbb{N}}$ belongs to $\ell_0(\mathbb{N})$. Sequences $(\sigma_i)_{i\in\mathbb{N}}$ with infinitely many nonzero entries can only appear for topological tensors.

Definition 4.44 (Schatten norms). If the singular values $\sigma = (\sigma_\nu)_{\nu=1}^\infty$ of $c$ in (4.16) have a finite $\ell^p$ norm $\|\sigma\|_p$, we set
\[ \|c\|_{\mathrm{SVD},p} := \|\sigma\|_p := \Big(\sum_{\nu=1}^\infty|\sigma_\nu|^p\Big)^{1/p} \quad\text{for } 1\le p<\infty. \tag{4.17} \]
As already seen in (4.15), $\|\cdot\|_{\ell^2(I\times J)}$ is a crossnorm. The next example shows that there is more than one crossnorm.
Example 4.45. Consider Example 4.42 for $p = 2$, i.e., $V := (\ell^2(I),\|\cdot\|_{\ell^2(I)})$ and $W := (\ell^2(J),\|\cdot\|_{\ell^2(J)})$. For any $1\le p\le\infty$, the Schatten$^{10}$ norm $\|\cdot\|_{\mathrm{SVD},p}$ is a crossnorm on $V\otimes_a W$:
\[ \|a\otimes b\|_{\mathrm{SVD},p} = \|a\|_{\ell^2(I)}\,\|b\|_{\ell^2(J)} \quad\text{for } a\in\ell^2(I),\ b\in\ell^2(J),\ 1\le p\le\infty. \tag{4.18} \]
As a consequence, $\otimes : V\times W\to V\otimes_{\mathrm{SVD},p}W$ is continuous for all $1\le p\le\infty$. In particular, $\|\cdot\|_{\mathrm{SVD},2} = \|\cdot\|_{\ell^2(I\times J)}$ holds.

Proof. The rank-1 matrix $c := a\otimes b$ has the singular values $\sigma_1 = \|a\|_{\ell^2(I)}\|b\|_{\ell^2(J)}$ and $\sigma_i = 0$ for $i\ge 2$. Since the sequence $\sigma := (\sigma_i)$ has at most one nonzero entry, $\|\sigma\|_p = \sigma_1$ holds for all $1\le p\le\infty$ and implies (4.18). For the last statement compare (2.16c). □

The norm (4.18) defines Banach spaces called Schatten classes. For details we refer to Schatten [255] and Meise [224, §II.16].

Example 4.46. (a) Let $V = C(I)$ and $W = C(J)$ be the spaces of continuous functions on some domains $I$ and $J$ with supremum norms $\|\cdot\|_V = \|\cdot\|_{C(I)}$ and $\|\cdot\|_W = \|\cdot\|_{C(J)}$ (cf. Example 4.10). Then the norm
\[ \|\cdot\|_\infty = \|\cdot\|_{C(I\times J)} \quad\text{on } V\otimes_a W = C(I)\otimes_a C(J) \]
satisfies the crossnorm property (4.13).
(b) If $I$ and $J$ are compact sets, the completion of $V\otimes_a W$ with respect to the maximum norm $\|\cdot\|_\infty = \|\cdot\|_{C(I\times J)}$ yields
\[ C(I)\otimes_\infty C(J) = C(I\times J). \tag{4.19} \]
(c) For $V = L^p(I)$ and $W = L^p(J)$ ($1\le p<\infty$), the crossnorm property (4.13) and the following identity hold:
\[ L^p(I)\otimes_p L^p(J) = L^p(I\times J) \quad\text{for } 1\le p<\infty. \]

Proof. (a) For Part (a) one proves (4.13) as in Example 4.42.
(b) Obviously, polynomials in $C(I\times J)$ belong to $C(I)\otimes_a C(J)$. Since $I$, $J$, and hence $I\times J$ are compact, the Stone–Weierstrass Theorem (cf. Yosida [306, §0.2]) states that polynomials are dense in $C(I\times J)$. This implies that $C(I)\otimes_a C(J)$ is dense in $C(I\times J)$, and (4.19) follows.
(c) In the case of $L^p$, we may replace polynomials with step functions. □

$^{10}$ Brief remarks about Robert Schatten including his publication list can be found in [211, p. 138].
An obvious generalisation of $L^p$ is the Sobolev space
\[ H^{1,p}(D) := \Big\{f\in L^p(D) : \frac{\partial}{\partial x_j}f\in L^p(D) \text{ for } 1\le j\le d\Big\} \quad\text{for } D\subset\mathbb{R}^d \]
with the norm
\[ \|f\|_{1,p} := \|f\|_{H^{1,p}(D)} := \Big(\|f\|_p^p+\sum_{j=1}^d\Big\|\frac{\partial f}{\partial x_j}\Big\|_p^p\Big)^{1/p} \quad\text{for } 1\le p<\infty. \]
Let $I_1$, $I_2$ be intervals. Then the algebraic tensor space $H^{1,p}(I_1)\otimes_a H^{1,p}(I_2)$ is a dense subset of $H^{1,p}(I_1\times I_2)$. Hence the completion with respect to $\|\cdot\|_{H^{1,p}(I_1\times I_2)}$ yields
\[ H^{1,p}(I_1\times I_2) = H^{1,p}(I_1)\otimes_{1,p}H^{1,p}(I_2). \tag{4.20} \]

Example 4.47 ($H^{1,p}$). The tensor space $(H^{1,p}(I_1\times I_2),\|\cdot\|_{H^{1,p}(I_1\times I_2)})$ in (4.20) for$^{11}$ $1\le p\le\infty$ satisfies the continuity inequality (4.14) with $C = 1$, but the norm is not a crossnorm.

Proof. For $f = g\otimes h$, i.e., $f(x,y) = g(x)\,h(y)$, we have
\[ \|g\otimes h\|_{H^{1,p}(I_1\times I_2)}^p = \|g\|_p^p\|h\|_p^p+\|g'\|_p^p\|h\|_p^p+\|g\|_p^p\|h'\|_p^p\le\|g\|_{1,p}^p\,\|h\|_{1,p}^p, \]
where equality holds if and only if $\|g'\|_p\|h'\|_p = 0$. □
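A concrete instance may help to see the gap in Example 4.47. The choice $p=2$, $I_1=I_2=[0,1]$, $g(x)=x$, $h(y)=y$ is an assumption made only for this illustration; the computation follows the formula in the proof above.

```latex
% Worked instance of Example 4.47 for p = 2, I_1 = I_2 = [0,1], g(x) = x, h(y) = y.
\begin{align*}
\|g\|_2^2 &= \int_0^1 x^2\,dx = \tfrac13, &
\|g'\|_2^2 &= \int_0^1 1\,dx = 1, &
\|g\|_{1,2}^2 &= \tfrac13 + 1 = \tfrac43,
\end{align*}
and the same values hold for $h$. Hence
\begin{align*}
\|g\otimes h\|_{H^{1,2}(I_1\times I_2)}^2
  &= \|g\|_2^2\|h\|_2^2 + \|g'\|_2^2\|h\|_2^2 + \|g\|_2^2\|h'\|_2^2
   = \tfrac19 + \tfrac13 + \tfrac13 = \tfrac79, \\
\|g\|_{1,2}^2\,\|h\|_{1,2}^2 &= \tfrac43\cdot\tfrac43 = \tfrac{16}{9}.
\end{align*}
```

The difference $\tfrac{16}{9}-\tfrac79 = 1$ is exactly the missing term $\|g'\|_2^2\|h'\|_2^2$, so (4.14) holds with $C=1$ while (4.13) fails for this pair.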
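For later use we introduce the anisotropic Sobolev space
\[ H^{(1,0),p}(I_1\times I_2) := \{f\in L^p(I_1\times I_2) : \partial f/\partial x_1\in L^p(I_1\times I_2)\} \]
with the norm
\[ \|f\|_{(1,0),p} := \|f\|_{H^{(1,0),p}(I_1\times I_2)} := \Big(\|f\|_p^p+\Big\|\frac{\partial f}{\partial x_1}\Big\|_p^p\Big)^{1/p} \quad\text{for } 1\le p<\infty. \]
In this case
\[ \|g\otimes h\|_{H^{(1,0),p}(I_1\times I_2)}^p = \|g\|_p^p\|h\|_p^p+\|g'\|_p^p\|h\|_p^p = \|g\|_{1,p}^p\,\|h\|_p^p \]
proves the following result.

Example 4.48. Let $1\le p<\infty$. The tensor space $\big(H^{(1,0),p}(I_1\times I_2),\|\cdot\|_{H^{(1,0),p}(I_1\times I_2)}\big) = H^{1,p}(I_1)\otimes_{(1,0),p}L^p(I_2)$ satisfies (4.13); i.e., $\|\cdot\|_{H^{(1,0),p}(I_1\times I_2)}$ is a crossnorm.

$^{11}$ To include $p = \infty$, define $\|f\|_{1,\infty} := \max\{\|f\|_\infty,\|\partial f/\partial x_1\|_\infty,\|\partial f/\partial x_2\|_\infty\}$.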
4.2.3 Projective Norm $\|\cdot\|_{\wedge(V,W)}$

4.2.3.1 Definition and Properties

Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two norms on $V\otimes_a W$ and denote the corresponding completions by $V\otimes_1 W$ and $V\otimes_2 W$. If $\|\cdot\|_1\lesssim\|\cdot\|_2$, we already stated that $V\otimes_1 W\supset V\otimes_2 W$ (cf. Remark 4.2). If the mapping $(v,w)\mapsto v\otimes w$ is continuous with respect to the three norms $\|\cdot\|_V$, $\|\cdot\|_W$, $\|\cdot\|_2$ of $V$, $W$, $V\otimes_2 W$, then it is also continuous for any weaker norm $\|\cdot\|_1$ of $V\otimes_1 W$. For a proof, combine $\|v\otimes w\|_2\le C\|v\|_V\|w\|_W$ in (4.14) with $\|\cdot\|_1\le C'\|\cdot\|_2$ to obtain boundedness $\|v\otimes w\|_1\le C''\|v\|_V\|w\|_W$ with the constant $C'' := CC'$. On the other hand, if $\|\cdot\|_1$ is stronger than $\|\cdot\|_2$, continuity may fail. Therefore we may ask for the strongest possible norm still ensuring continuity. Note that the strongest possible norm yields the smallest possible Banach tensor space containing $V\otimes_a W$ (with continuous $\otimes$). Since $\lesssim$ is only a semi-ordering of the norms, it is not trivial that there exists indeed a strongest norm. The answer is given in the following exercise.

Exercise 4.49. Let $N$ be the set of all norms $\alpha$ on $V\otimes_a W$ satisfying $\alpha(v\otimes w)\le C_\alpha\|v\|_V\|w\|_W$ ($v\in V$, $w\in W$). Replacing $\alpha$ by the equivalent norm $\alpha/C_\alpha$, we obtain the set $N'\subset N$ of norms with $\alpha(v\otimes w)\le\|v\|_V\|w\|_W$. Define $\|x\| := \sup\{\alpha(x) : \alpha\in N'\}$ for $x\in V\otimes_a W$ and show that $\|\cdot\|\in N'$ is a norm with $\alpha\lesssim\|\cdot\|$ for all $\alpha\in N$.

The norm defined next will be the candidate for the strongest possible norm.

Definition 4.50 (projective norm). The normed spaces $(V,\|\cdot\|_V)$ and $(W,\|\cdot\|_W)$ induce the projective norm$^{12}$ $\|\cdot\|_{\wedge(V,W)} = \|\cdot\|_\wedge$ on $V\otimes_a W$ defined by
\[ \|x\|_{\wedge(V,W)} := \|x\|_\wedge := \inf\Big\{\sum_{i=1}^n\|v_i\|_V\,\|w_i\|_W : x = \sum_{i=1}^n v_i\otimes w_i\Big\} \quad\text{for } x\in V\otimes_a W. \tag{4.21} \]
Completion of $V\otimes_a W$ with respect to $\|\cdot\|_{\wedge(V,W)}$ defines the Banach tensor space $\big(V\otimes_\wedge W,\|\cdot\|_{\wedge(V,W)}\big)$.

Note that the infimum in (4.21) is taken over all representations of $x$.

Lemma 4.51. $\|\cdot\|_{\wedge(V,W)}$ is not only a norm but also a crossnorm.

$^{12}$ Grothendieck [130] introduced the notations $\|\cdot\|_\wedge$ for the projective norm and $\|\cdot\|_\vee$ for the injective norm from §4.2.4.2. The older notations by Schatten [254] are $\gamma(\cdot)$ for $\|\cdot\|_{\wedge(V,W)}$ and $\lambda(\cdot)$ for $\|\cdot\|_{\vee(V,W)}$.
Proof. (i) The first and third norm axioms in (4.1) are trivial. For the proof of the triangle inequality $\|x'+x''\|_\wedge\le\|x'\|_\wedge+\|x''\|_\wedge$ choose any $\varepsilon>0$. By definition of the infimum in (4.21), there are representations $x' = \sum_{i=1}^{n'}v_i'\otimes w_i'$ and $x'' = \sum_{i=1}^{n''}v_i''\otimes w_i''$ with
\[ \sum_{i=1}^{n'}\|v_i'\|_V\|w_i'\|_W\le\|x'\|_\wedge+\frac\varepsilon2 \quad\text{and}\quad \sum_{i=1}^{n''}\|v_i''\|_V\|w_i''\|_W\le\|x''\|_\wedge+\frac\varepsilon2. \]
A possible representation of $x = x'+x''$ is $x = \sum_{i=1}^{n'}v_i'\otimes w_i'+\sum_{i=1}^{n''}v_i''\otimes w_i''$. Hence a bound of the infimum involved in $\|x\|_\wedge$ is
\[ \|x\|_\wedge\le\sum_{i=1}^{n'}\|v_i'\|_V\|w_i'\|_W+\sum_{i=1}^{n''}\|v_i''\|_V\|w_i''\|_W\le\|x'\|_\wedge+\|x''\|_\wedge+\varepsilon. \]
Since $\varepsilon>0$ is arbitrary, $\|x\|_\wedge\le\|x'\|_\wedge+\|x''\|_\wedge$ follows. It remains to prove $\|x\|_\wedge>0$ for $x\ne 0$. In §4.2.4.2 we shall introduce another norm $\|x\|_\vee$ for which $\|x\|_\vee\le\|x\|_\wedge$ will be shown in Lemma 4.61. Hence the norm property of $\|\cdot\|_\vee$ proves that $x\ne 0$ implies $0<\|x\|_\vee\le\|x\|_\wedge$.
(ii) Definition (4.21) implies the inequality $\|v\otimes w\|_\wedge\le\|v\|_V\|w\|_W$.
(iii) For the reverse inequality consider any representation $v\otimes w = \sum_{i=1}^n v_i\otimes w_i$. By Theorem 4.18 there is a continuous functional $\Lambda\in V^*$ with $\Lambda(v) = \|v\|_V$ and $\|\Lambda\|_{V^*}\le 1$. The latter estimate implies $|\Lambda(v_i)|\le\|v_i\|_V$. Applying $\Lambda$ to the first component in $v\otimes w = \sum_{i=1}^n v_i\otimes w_i$, we conclude that
\[ \|v\|_V\,w = \sum_{i=1}^n\Lambda(v_i)\,w_i \]
(cf. Remark 3.64). The triangle inequality of $\|\cdot\|_W$ yields
\[ \|v\|_V\|w\|_W\le\sum_{i=1}^n|\Lambda(v_i)|\,\|w_i\|_W\le\sum_{i=1}^n\|v_i\|_V\|w_i\|_W. \]
The infimum over all representations yields $\|v\|_V\|w\|_W\le\|v\otimes w\|_\wedge$. □
Proposition 4.52. Given $(V,\|\cdot\|_V)$ and $(W,\|\cdot\|_W)$, the norm (4.21) is the strongest one ensuring continuity of $(v,w)\mapsto v\otimes w$. More precisely, if some norm $\|\cdot\|$ satisfies (4.14) with a constant $C$, then $\|\cdot\|\le C\|\cdot\|_\wedge$ holds with the same constant.

Proof. Let $\|\cdot\|$ be a norm on $V\otimes_a W$ such that $(v,w)\mapsto v\otimes w$ is continuous. Then (4.14), i.e., $\|v\otimes w\|\le C\|v\|_V\|w\|_W$ holds. Let $x = \sum_{i=1}^n v_i\otimes w_i\in V\otimes_a W$. The triangle inequality yields $\|x\|\le\sum_{i=1}^n\|v_i\otimes w_i\|$. Together with (4.14), $\|x\|\le C\sum_{i=1}^n\|v_i\|_V\|w_i\|_W$ follows. Taking the infimum over all representations $x = \sum_{i=1}^n v_i\otimes w_i$, $\|x\|\le C\|x\|_\wedge$ follows; i.e., $\|\cdot\|_\wedge$ is stronger than $\|\cdot\|$. Since $\|\cdot\|$ is arbitrary under the continuity side condition, $\|\cdot\|_\wedge$ is the strongest norm. □

The property of $\|\cdot\|_\wedge$ as the strongest possible norm ensuring continuity justifies calling $\|\cdot\|_{\wedge(V,W)}$ a canonical norm of $V\otimes_a W$ induced by $\|\cdot\|_V$ and $\|\cdot\|_W$. The completion $V\otimes_\wedge W$ with respect to $\|\cdot\|_\wedge$ is the smallest one. Any other norm $\|\cdot\|$ with (4.14) leads to a larger Banach space $V\otimes_{\|\cdot\|}W$.
4.2.3.2 Examples

Example 4.53 ($\ell^1$). Consider $V = (\ell^1(I), \|\cdot\|_{\ell^1(I)})$ and $W = (\ell^1(J), \|\cdot\|_{\ell^1(J)})$ for some finite or countable sets $I$ and $J$ (cf. Example 4.42). The projective norm and the resulting Banach tensor space are
$$\|\cdot\|_{\wedge(V,W)} = \|\cdot\|_{\ell^1(I \times J)} \quad\text{and}\quad \ell^1(I) \otimes_\wedge \ell^1(J) = \ell^1(I \times J).$$

Proof. (i) $c \in V \otimes_a W \subset \ell^1(I \times J)$ has entries $c_{\nu\mu}$ for $\nu \in I$ and $\mu \in J$. A possible representation of $c$ is
$$c = \sum_{\nu \in I} \sum_{\mu \in J} c_{\nu\mu}\, e_V^{(\nu)} \otimes e_W^{(\mu)}$$
with unit vectors $e_V^{(\nu)}$ and $e_W^{(\mu)}$ (cf. (2.2)). By $\|e_V^{(\nu)}\|_{\ell^1(I)} = \|e_W^{(\mu)}\|_{\ell^1(J)} = 1$ and the definition of $\|\cdot\|_{\wedge(V,W)}$, we obtain the inequality
$$\|c\|_{\wedge(V,W)} \le \sum_{\nu \in I} \sum_{\mu \in J} |c_{\nu\mu}| = \|c\|_{\ell^1(I \times J)} .$$

(ii) $\|\cdot\|_{\ell^1(I \times J)}$ is a crossnorm (cf. (4.15)); i.e., (4.14) holds with $C = 1$. Hence Proposition 4.52 shows that $\|c\|_{\ell^1(I \times J)} \le \|c\|_{\wedge(V,W)}$. Together with the previous part, $\|\cdot\|_{\wedge(V,W)} = \|\cdot\|_{\ell^1(I \times J)}$ follows.

(iii) The latter identity together with Remark 4.43 (for $p = 1$) implies that the tensor product is equal to $\ell^1(I) \otimes_\wedge \ell^1(J) = \ell^1(I \times J)$. □
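The two ingredients of the preceding proof can be checked numerically for finite index sets. The following is a minimal sketch, assuming NumPy and identifying $v \otimes w$ with the matrix $v w^T$; the vectors `v`, `w` are arbitrary illustrative data and not taken from the text.

```python
import numpy as np

# Arbitrary illustrative vectors in l^1(I) and l^1(J) with #I = 3, #J = 4.
v = np.array([1.0, -2.0, 0.5])
w = np.array([3.0, 0.0, -1.0, 2.0])

# Elementary tensor v (x) w, stored as the matrix v w^T.
c = np.outer(v, w)

# Crossnorm property of the entrywise l^1 norm: ||v (x) w|| = ||v||_1 ||w||_1.
l1_of_tensor = np.abs(c).sum()
product_of_norms = np.abs(v).sum() * np.abs(w).sum()

# Cost of the unit-vector representation c = sum c_ij e_i (x) e_j from part (i):
# each term contributes |c_ij| * ||e_i||_1 * ||e_j||_1 = |c_ij|.
unit_vector_cost = sum(abs(c[i, j]) for i in range(3) for j in range(4))

print(l1_of_tensor, product_of_norms, unit_vector_cost)
```

All three numbers agree (here $3.5 \cdot 6 = 21$), which is what parts (i) and (ii) of the proof predict for an elementary tensor.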
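Example 4.54 ($\ell^2$). Let $V := (\ell^2(I), \|\cdot\|_{\ell^2(I)})$ and $W := (\ell^2(J), \|\cdot\|_{\ell^2(J)})$ for some finite or countable sets $I$ and $J$. Then
$$\|\cdot\|_{\wedge(V,W)} = \|\cdot\|_{\mathrm{SVD},1} \qquad (\text{cf. (4.17) for } \|\cdot\|_{\mathrm{SVD},p}).$$
Note that $\|\cdot\|_{\mathrm{SVD},1} \ge \|\cdot\|_{\mathrm{SVD},2} = \|\cdot\|_{\ell^2(I \times J)}$, and that the two norms do not coincide for $\#I, \#J > 1$.

Proof. (i) Let $c \in V \otimes_a W$. Its singular-value decomposition $c = \sum_i \sigma_i\, v_i \otimes w_i$ has only finitely many nonzero singular values $\sigma_i$. By the definition of $\|\cdot\|_{\wedge(V,W)}$ we have
$$\|c\|_{\wedge(V,W)} \le \sum_i \sigma_i \|v_i\|_V \|w_i\|_W .$$
$\|v_i\|_V = \|w_i\|_W = 1$ ($v_i$, $w_i$ orthonormal) yields $\|c\|_{\wedge(V,W)} \le \|\sigma\|_1 = \|c\|_{\mathrm{SVD},1}$.

(ii) $\|\cdot\|_{\mathrm{SVD},1}$ satisfies (4.14) with $C = 1$ (cf. (4.18)) so that Proposition 4.52 implies the opposite inequality $\|c\|_{\mathrm{SVD},1} \le \|c\|_{\wedge(V,W)}$. Together, the assertion $\|c\|_{\mathrm{SVD},1} = \|c\|_{\wedge(V,W)}$ is proved. □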
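For finite index sets, Example 4.54 says that the projective norm is the sum of the singular values (the nuclear norm). The sketch below, assuming NumPy and an arbitrarily chosen $2 \times 2$ matrix, compares the cost $\sum_i \|v_i\| \|w_i\|$ of two representations of the same tensor: the one taken from the SVD and the entrywise one by unit vectors. It only illustrates the statement and is not part of the proof.

```python
import numpy as np

# Illustrative symmetric matrix with singular values 3 and 1.
c = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Representation from the SVD: c = sum_i sigma_i * u_i v_i^T with unit vectors.
u, sigma, vt = np.linalg.svd(c)
svd_cost = sum(s * np.linalg.norm(u[:, i]) * np.linalg.norm(vt[i])
               for i, s in enumerate(sigma))

# Representation by unit vectors: c = sum_ij c_ij e_i e_j^T, costing sum |c_ij|.
entry_cost = np.abs(c).sum()

nuclear = sigma.sum()                      # ||c||_{SVD,1}
frobenius = np.sqrt((sigma ** 2).sum())    # ||c||_{SVD,2} = ||c||_{l^2(I x J)}

print(svd_cost, entry_cost, nuclear, frobenius)
```

The SVD representation attains the value $\|c\|_{\mathrm{SVD},1} = 4$, the entrywise representation costs $6$, and $\|c\|_{\mathrm{SVD},2} = \sqrt{10}$ is strictly smaller than $\|c\|_{\mathrm{SVD},1}$.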
4.2.3.3 Absolutely Convergent Series

By definition, any topological tensor $x \in V \otimes_{\|\cdot\|} W$ is the limit of some sequence $x^{(\nu)} = \sum_{i=1}^{n_\nu} v_i^{(\nu)} \otimes w_i^{(\nu)} \in V \otimes_a W$. If, in particular, $v_i^{(\nu)} = v_i$ and $w_i^{(\nu)} = w_i$ are independent of $\nu$, the partial sums $x^{(\nu)}$ define the series $x = \sum_{i=1}^\infty v_i \otimes w_i$. Below we state that there is an absolutely convergent series $x = \sum_{i=1}^\infty v_i \otimes w_i$, whose representation is almost optimal compared with the infimum $\|x\|_\wedge$.

Proposition 4.55. For any $\varepsilon > 0$ and any $x \in V \otimes_\wedge W$, there is an absolutely convergent infinite sum
$$x = \sum_{i=1}^\infty v_i \otimes w_i \qquad (v_i \in V,\ w_i \in W) \tag{4.22a}$$
with
$$\sum_{i=1}^\infty \|v_i\|_V \|w_i\|_W \le (1 + \varepsilon)\, \|x\|_\wedge . \tag{4.22b}$$
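Proof. We abbreviate $\|\cdot\|_\wedge$ by $\|\cdot\|$ and set $\varepsilon_\nu := \frac{\varepsilon}{3}\, \|x\| / 2^\nu$. If $x = 0$, nothing has to be done. Otherwise choose some $s_1 \in V \otimes_a W$ with
$$\|x - s_1\| \le \varepsilon_1 . \tag{4.23a}$$
Hence $\|s_1\| \le \|x\| + \varepsilon_1$ follows. By the definition of the norm, there is a representation $s_1 = \sum_{i=1}^{n_1} v_i \otimes w_i$ with
$$\sum_{i=1}^{n_1} \|v_i\|_V \|w_i\|_W \le \|s_1\| + \varepsilon_1 \le \|x\| + 2\varepsilon_1 . \tag{4.23b}$$
Set $d_1 := x - s_1$ and approximate $d_1$ by $s_2 \in V \otimes_a W$ such that
$$\|d_1 - s_2\| \le \varepsilon_2, \qquad s_2 = \sum_{i=n_1+1}^{n_2} v_i \otimes w_i, \qquad \sum_{i=n_1+1}^{n_2} \|v_i\|_V \|w_i\|_W \le \|s_2\| + \varepsilon_2 \le \varepsilon_1 + 2\varepsilon_2 \tag{4.23c}$$
(here we use $\|s_2\| \le \|d_1\| + \varepsilon_2$ and $\|d_1\| \le \varepsilon_1$; cf. (4.23a)). Analogously, we set $d_2 := d_1 - s_2$ and choose $s_3$ and its representation such that
$$\|d_2 - s_3\| \le \varepsilon_3, \qquad s_3 = \sum_{i=n_2+1}^{n_3} v_i \otimes w_i, \qquad \sum_{i=n_2+1}^{n_3} \|v_i\|_V \|w_i\|_W \le \|s_3\| + \varepsilon_3 \le \varepsilon_2 + 2\varepsilon_3 . \tag{4.23d}$$
By induction, using (4.23a,c,d) we obtain statement (4.22a) in the form of $x = \sum_{\nu=1}^\infty s_\nu = \sum_{i=1}^\infty v_i \otimes w_i$. The estimates of the partial sums in (4.23b–d) show that $\sum_{i=1}^\infty \|v_i\|_V \|w_i\|_W \le \|x\| + 3 \sum_{\nu=1}^\infty \varepsilon_\nu = \|x\| + \varepsilon \|x\|$, proving (4.22b). □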
4.2.4 Duals and Injective Norm $\|\cdot\|_{\vee(V,W)}$

4.2.4.1 Tensor Products of Dual Spaces

The normed spaces $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ give rise to the dual spaces $V^*$ and $W^*$ endowed with the dual norms $\|\cdot\|_{V^*}$ and $\|\cdot\|_{W^*}$ described in (4.8). Consider the tensor space $V^* \otimes_a W^*$. Elementary tensors $\varphi \otimes \psi$ from $V^* \otimes_a W^*$ may be viewed as linear forms on $V \otimes_a W$ via the definition
$$(\varphi \otimes \psi)(v \otimes w) := \varphi(v) \cdot \psi(w) \in \mathbb{K} .$$
As discussed in §3.3.2.4, any $x^* \in V^* \otimes_a W^*$ is a linear form on $V \otimes_a W$. Hence,
$$V^* \otimes_a W^* \subset (V \otimes_a W)' . \tag{4.24}$$
Note that $(V \otimes_a W)'$ is the algebraic dual since continuity is not yet ensured.

A norm $\|\cdot\|$ on $V \otimes_a W$ leads to a dual space $(V \otimes_a W)^*$ with a dual norm denoted by $\|\cdot\|^*$. We would like to have
$$V^* \otimes_a W^* \subset (V \otimes_a W)^* \tag{4.25}$$
instead of (4.24). Therefore the requirement on the dual norm $\|\cdot\|^*$ (and indirectly on $\|\cdot\|$) is that $\otimes : (V^*, \|\cdot\|_{V^*}) \times (W^*, \|\cdot\|_{W^*}) \to (V^* \otimes_a W^*, \|\cdot\|^*)$ be continuous. The latter property, as seen in §4.2.2.1, is expressed by
$$\|\varphi \otimes \psi\|^* \le C\, \|\varphi\|_{V^*} \|\psi\|_{W^*} \qquad \text{for all } \varphi \in V^* \text{ and } \psi \in W^* . \tag{4.26}$$
Lemma 4.56. Inequality (4.26) implies that
$$\|v\|_V \|w\|_W \le C\, \|v \otimes w\| \qquad \text{for all } v \in V,\ w \in W, \tag{4.27}$$
which coincides with (4.14) up to the direction of the inequality sign. Furthermore, (4.26) implies the inclusion (4.25).

Proof. Given $v \otimes w \in V \otimes_a W$, choose $\varphi$ and $\psi$ according to Theorem 4.18: $\|\varphi\|_{V^*} = \|\psi\|_{W^*} = 1$ and $\varphi(v) = \|v\|_V$, $\psi(w) = \|w\|_W$. Then
$$\|v\|_V \|w\|_W = \varphi(v)\psi(w) = |(\varphi \otimes \psi)(v \otimes w)| \le \|\varphi \otimes \psi\|^*\, \|v \otimes w\| \underset{(4.26)}{\le} C\, \|\varphi\|_{V^*} \|\psi\|_{W^*} \|v \otimes w\| = C\, \|v \otimes w\|$$
proves (4.27) with the same constant $C$ as in (4.26). □
A similar result with a 'wrong' inequality sign for $\|\cdot\|^*$ is stated next.

Lemma 4.57. Let $\|\cdot\|$ satisfy the continuity condition (4.14) with constant $C$. Then $C\, \|\varphi \otimes \psi\|^* \ge \|\varphi\|_{V^*} \|\psi\|_{W^*}$ holds for all $\varphi \in V^*$ and $\psi \in W^*$.

Proof. The desired inequality follows from
$$\frac{|\varphi(v)|\, |\psi(w)|}{\|v\|_V \|w\|_W} = \frac{|(\varphi \otimes \psi)(v \otimes w)|}{\|v\|_V \|w\|_W} \le \frac{\|\varphi \otimes \psi\|^*\, \|v \otimes w\|}{\|v\|_V \|w\|_W} \underset{(4.14)}{\le} C\, \|\varphi \otimes \psi\|^*$$
for all $0 \ne v \in V$ and $0 \ne w \in W$. □
According to Remark 4.12c, we need not distinguish between $(V \otimes_a W)^*$ and $(V \otimes_{\|\cdot\|} W)^*$, where $\|\cdot\|$ is the norm of $V \otimes_a W$. The norm dual to $\|\cdot\|$ is $\|\cdot\|^*$. The algebraic tensor product $V^* \otimes_a W^*$ can be completed with respect to $\|\cdot\|^*$. Then (4.25) becomes
$$V^* \otimes_{\|\cdot\|^*} W^* \subset (V \otimes_{\|\cdot\|} W)^* . \tag{4.28}$$
Assuming infinite dimensions, the question arises whether the inclusion can be replaced by '$=$' or '$\subsetneq$' (see Proposition 3.55 and Example 3.63 for the algebraic tensor case). In the topological case, both cases are possible as shown in Exercise 4.68.

As in §4.2.2.1, we may ask for the strongest dual norm satisfying inequality (4.26). As seen in Lemma 4.21, weaker norms $\|\cdot\|$ correspond to stronger dual norms $\|\cdot\|^*$. Therefore the following two questions are equivalent:

• Which norm $\|\cdot\|$ on $V \otimes_a W$ yields the strongest dual norm $\|\cdot\|^*$ satisfying (4.26)?
• What is the weakest norm $\|\cdot\|$ on $V \otimes_a W$ such that the corresponding dual norm $\|\cdot\|^*$ satisfies (4.26)?

A candidate will be defined below. Since this will be again a norm determined only by $\|\cdot\|_V$ and $\|\cdot\|_W$, it may also be called an induced norm.
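4.2.4.2 Injective Norm

Definition 4.58 (injective norm). Normed spaces $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ induce the injective norm $\|\cdot\|_{\vee(V,W)}$ on $V \otimes_a W$ defined by
$$\|x\|_{\vee(V,W)} := \|x\|_\vee := \sup_{\substack{\varphi \in V^*,\ \|\varphi\|_{V^*} = 1 \\ \psi \in W^*,\ \|\psi\|_{W^*} = 1}} |(\varphi \otimes \psi)(x)| . \tag{4.29}$$
The completion of $V \otimes_a W$ with respect to $\|\cdot\|_\vee$ defines $(V \otimes_\vee W, \|\cdot\|_\vee)$.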
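Lemma 4.59. (a) $\|\cdot\|_{\vee(V,W)}$ defined in (4.29) is a crossnorm on $V \otimes_a W$; i.e., (4.13) holds implying (4.14) and (4.27).

(b) The dual norm $\|\varphi \otimes \psi\|^*_{\vee(V,W)}$ is a crossnorm on $V^* \otimes_a W^*$, i.e.,
$$\|\varphi \otimes \psi\|^*_{\vee(V,W)} = \|\varphi\|_{V^*} \|\psi\|_{W^*} \qquad \text{for all } \varphi \in V^*,\ \psi \in W^* \tag{4.30}$$
holds, implying (4.26).

Proof. (i) The norm axiom $\|\lambda x\|_\vee = |\lambda|\, \|x\|_\vee$ and the triangle inequality are standard. To show positivity $\|x\|_\vee > 0$ for $0 \ne x \in V \otimes_a W$, apply Lemma 3.15: $x$ has a representation $x = \sum_{i=1}^r v_i \otimes w_i$ with linearly independent $v_i$ and $w_i$. Note that $r \ge 1$ because $x \ne 0$. Then there are normalised functionals $\varphi \in V^*$ and $\psi \in W^*$ with $\varphi(v_1) \ne 0$ and $\psi(w_1) \ne 0$, while $\varphi(v_i) = \psi(w_i) = 0$ for $i \ge 2$. This leads to
$$|(\varphi \otimes \psi)(x)| = \Big| (\varphi \otimes \psi)\Big( \sum_{i=1}^r v_i \otimes w_i \Big) \Big| = \Big| \sum_{i=1}^r \varphi(v_i)\psi(w_i) \Big| = |\varphi(v_1)\psi(w_1)| > 0 .$$
Hence $\|x\|_\vee \ge |(\varphi \otimes \psi)(x)|$ is also positive.

(ii) Application of (4.29) to an elementary tensor $v \otimes w$ yields
$$\|v \otimes w\|_{\vee(V,W)} = \sup_{\substack{\|\varphi\|_{V^*} = 1 \\ \|\psi\|_{W^*} = 1}} |(\varphi \otimes \psi)(v \otimes w)| = \sup_{\substack{\|\varphi\|_{V^*} = 1 \\ \|\psi\|_{W^*} = 1}} |\varphi(v)|\, |\psi(w)| = \Big( \sup_{\|\varphi\|_{V^*} = 1} |\varphi(v)| \Big) \Big( \sup_{\|\psi\|_{W^*} = 1} |\psi(w)| \Big) \underset{(4.10)}{=} \|v\|_V \|w\|_W .$$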
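(iii) For $0 \ne \varphi \otimes \psi \in V^* \otimes_a W^*$ introduce the normalised continuous functionals $\hat\varphi := \varphi / \|\varphi\|_{V^*}$ and $\hat\psi := \psi / \|\psi\|_{W^*}$. Then for all $x \in V \otimes_a W$, the inequality
$$|(\varphi \otimes \psi)(x)| = \|\varphi\|_{V^*} \|\psi\|_{W^*}\, \big| (\hat\varphi \otimes \hat\psi)(x) \big| \le \|\varphi\|_{V^*} \|\psi\|_{W^*} \sup_{\substack{\varphi' \in V^*,\ \|\varphi'\|_{V^*} = 1 \\ \psi' \in W^*,\ \|\psi'\|_{W^*} = 1}} |(\varphi' \otimes \psi')(x)| = \|\varphi\|_{V^*} \|\psi\|_{W^*} \|x\|_\vee$$
follows. The supremum over all $x \in V \otimes_a W$ with $\|x\|_\vee = 1$ yields the dual norm so that $\|\varphi \otimes \psi\|^*_{\vee(V,W)} \le \|\varphi\|_{V^*} \|\psi\|_{W^*}$. This is already (4.26) with $C = 1$.

(iv) Let $\varepsilon > 0$ and $\varphi \otimes \psi \in V^* \otimes_a W^*$ be arbitrary. According to Remark 4.13, there are $v_\varepsilon \in V$ and $w_\varepsilon \in W$ with $\|v_\varepsilon\|_V = \|w_\varepsilon\|_W = 1$ and
$$|\varphi(v_\varepsilon)| \ge (1 - \varepsilon)\, \|\varphi\|_{V^*} \quad\text{and}\quad |\psi(w_\varepsilon)| \ge (1 - \varepsilon)\, \|\psi\|_{W^*} .$$
Note that, by (4.13), $x_\varepsilon := v_\varepsilon \otimes w_\varepsilon$ satisfies $\|x_\varepsilon\|_\vee = \|v_\varepsilon\|_V \|w_\varepsilon\|_W = 1$. Hence
$$\|\varphi \otimes \psi\|^*_{\vee(V,W)} = \sup_{\|x\|_\vee = 1} |(\varphi \otimes \psi)(x)| \ge |(\varphi \otimes \psi)(x_\varepsilon)| = |(\varphi \otimes \psi)(v_\varepsilon \otimes w_\varepsilon)| = |\varphi(v_\varepsilon)|\, |\psi(w_\varepsilon)| \ge (1 - \varepsilon)^2\, \|\varphi\|_{V^*} \|\psi\|_{W^*} .$$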
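As $\varepsilon > 0$ is arbitrary, the reverse inequality $\|\varphi \otimes \psi\|^*_{\vee(V,W)} \ge \|\varphi\|_{V^*} \|\psi\|_{W^*}$ follows. Together with step (iii), we have proved (4.30). □

Proposition 4.60. $\|\cdot\| = \|\cdot\|_{\vee(V,W)}$ is the weakest norm on $V \otimes_a W$ subject to the additional condition that the dual norm $\|\cdot\|^*$ satisfy (4.26).

Proof. Let $\|\cdot\|$ be a norm on $V \otimes_a W$ whose dual norm satisfies (4.26) and let $x \in V \otimes_a W$. For any $\varepsilon > 0$, there is some $\varphi_\varepsilon \otimes \psi_\varepsilon \in V^* \otimes_a W^*$ with $\|\varphi_\varepsilon\|_{V^*} = \|\psi_\varepsilon\|_{W^*} = 1$ and
$$(1 - \varepsilon)\, \|x\|_{\vee(V,W)} \le |(\varphi_\varepsilon \otimes \psi_\varepsilon)(x)| \le \|\varphi_\varepsilon \otimes \psi_\varepsilon\|^*\, \|x\| \underset{(4.26)}{\le} C\, \underbrace{\|\varphi_\varepsilon\|_{V^*}}_{=1}\, \underbrace{\|\psi_\varepsilon\|_{W^*}}_{=1}\, \|x\| = C\, \|x\| .$$
Since $\varepsilon > 0$ is arbitrary, this proves $\|\cdot\|_{\vee(V,W)} \le C\, \|\cdot\|$; i.e., $\|\cdot\|_{\vee(V,W)}$ is weaker than $\|\cdot\|$. □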
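Lemma 4.61. $\|\cdot\|_{\vee(V,W)} \le \|\cdot\|_{\wedge(V,W)}$ holds on $V \otimes_a W$.

Proof. Choose any $\varepsilon > 0$. Let $x = \sum_i v_i \otimes w_i \in V \otimes_a W$ be some representation with $\sum_i \|v_i\|_V \|w_i\|_W \le \|x\|_{\wedge(V,W)} + \varepsilon$. Choose normalised functionals $\varphi$ and $\psi$ with $\|x\|_{\vee(V,W)} \le |(\varphi \otimes \psi)(x)| + \varepsilon$. Then
$$\|x\|_{\vee(V,W)} \le \Big| (\varphi \otimes \psi)\Big( \sum_i v_i \otimes w_i \Big) \Big| + \varepsilon = \Big| \sum_i \varphi(v_i)\psi(w_i) \Big| + \varepsilon \le \sum_i |\varphi(v_i)|\, |\psi(w_i)| + \varepsilon \le \sum_i \|v_i\|_V \|w_i\|_W + \varepsilon \le \|x\|_{\wedge(V,W)} + 2\varepsilon .$$
As $\varepsilon > 0$ is arbitrary, $\|x\|_{\vee(V,W)} \le \|x\|_{\wedge(V,W)}$ holds for all $x \in V \otimes_a W$. □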
Exercise 4.62. For any $a_1, \ldots, a_n \ge 0$ and $b_1, \ldots, b_n > 0$ show that
$$\min_{1 \le i \le n} \frac{a_i}{b_i} \le \frac{a_1 + \ldots + a_n}{b_1 + \ldots + b_n} \le \max_{1 \le i \le n} \frac{a_i}{b_i} .$$
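Before attempting the proof, the inequality of Exercise 4.62 can be tested numerically. The following is a small sketch in plain Python; the helper name `mediant_bounds` and the random test data are illustrative choices, and a passing check is of course no substitute for the proof.

```python
import random

def mediant_bounds(a, b):
    """Return (min a_i/b_i, (sum a_i)/(sum b_i), max a_i/b_i)."""
    ratios = [ai / bi for ai, bi in zip(a, b)]
    return min(ratios), sum(a) / sum(b), max(ratios)

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 6)
    a = [random.uniform(0.0, 10.0) for _ in range(n)]   # a_i >= 0
    b = [random.uniform(0.1, 10.0) for _ in range(n)]   # b_i > 0
    lo, mid, hi = mediant_bounds(a, b)
    # Small tolerance for floating-point rounding.
    assert lo <= mid + 1e-12 and mid <= hi + 1e-12

print(mediant_bounds([1.0, 2.0], [2.0, 1.0]))
```

For $a = (1, 2)$ and $b = (2, 1)$ the three values are $0.5$, $1$ and $2$.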
So far, we have considered the norm $\|x\|_{\vee(V,W)}$ on $V \otimes_a W$. Analogously, we can define $\|x\|_{\vee(V^*,W^*)}$ on $V^* \otimes_a W^*$. For the latter norm we shall establish a connection with $\|\cdot\|_{\wedge(V,W)}$ from §4.2.3.1, which states that in a certain sense the injective norm is dual to the projective norm.13

13 The reverse statement is not true, but nearly (see Defant–Floret [74, §I.6]).

Proposition 4.63. $\|\cdot\|_{\vee(V^*,W^*)} = \|\cdot\|^*_{\wedge(V,W)}$ on $V^* \otimes_a W^*$.

Proof. (i) The norm $\|\Phi\|^*_{\wedge(V,W)}$ of $\Phi \in V^* \otimes_a W^*$ is bounded by
$$\|\Phi\|^*_{\wedge(V,W)} = \sup_{0 \ne x \in V \otimes_a W} \frac{|\Phi(x)|}{\|x\|_{\wedge(V,W)}} = \sup_{0 \ne x = \sum_i v_i \otimes w_i \in V \otimes_a W} \frac{|\Phi(\sum_i v_i \otimes w_i)|}{\sum_i \|v_i\|_V \|w_i\|_W} \le \sup_{0 \ne x = \sum_i v_i \otimes w_i \in V \otimes_a W} \frac{\sum_i |\Phi(v_i \otimes w_i)|}{\sum_i \|v_i\|_V \|w_i\|_W} \underset{\text{Exercise 4.62}}{\le} \sup_{\substack{0 \ne v \in V \\ 0 \ne w \in W}} \frac{|\Phi(v \otimes w)|}{\|v\|_V \|w\|_W} .$$
On the other hand, the elementary tensors $x = v \otimes w$ appearing in the last expression form only a subset of those $x$ used in
$$\sup_{0 \ne x = \sum_i v_i \otimes w_i} \frac{|\Phi(\sum_i v_i \otimes w_i)|}{\sum_i \|v_i\|_V \|w_i\|_W} = \|\Phi\|^*_{\wedge(V,W)}$$
so that $\|\Phi\|^*_{\wedge(V,W)}$ must be an upper bound. Together, we arrive at
$$\|\Phi\|^*_{\wedge(V,W)} = \sup_{\substack{0 \ne v \in V \\ 0 \ne w \in W}} \frac{|\Phi(v \otimes w)|}{\|v\|_V \|w\|_W} . \tag{4.31a}$$
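(ii) Let $\Phi = \sum_i \varphi_i \otimes \psi_i$ and set $\varphi := \sum_i \psi_i(w)\, \varphi_i \in V^*$ for some fixed $w \in W$. Then
$$\sup_{0 \ne v \in V} \frac{|\sum_i \varphi_i(v)\psi_i(w)|}{\|v\|} = \sup_{0 \ne v \in V} \frac{|\varphi(v)|}{\|v\|} \underset{\text{Lemma 4.22}}{=} \sup_{0 \ne v^{**} \in V^{**}} \frac{|v^{**}(\varphi)|}{\|v^{**}\|^{**}} = \sup_{0 \ne v^{**} \in V^{**}} \frac{|\sum_i v^{**}(\varphi_i)\psi_i(w)|}{\|v^{**}\|^{**}} .$$
Similarly, the supremum over $w$ can be replaced with a supremum over $w^{**}$:
$$\sup_{\substack{0 \ne v \in V \\ 0 \ne w \in W}} \frac{|\sum_i \varphi_i(v)\psi_i(w)|}{\|v\|\, \|w\|} = \sup_{\substack{0 \ne v^{**} \in V^{**} \\ 0 \ne w^{**} \in W^{**}}} \frac{|(v^{**} \otimes w^{**})(\Phi)|}{\|v^{**}\|^{**}\, \|w^{**}\|^{**}} \qquad \text{for } \Phi = \sum_i \varphi_i \otimes \psi_i . \tag{4.31b}$$

(iii) The left-hand side of (4.31b) coincides with the right-hand side of (4.31a) since $|\Phi(v \otimes w)| = |\sum_i \varphi_i(v)\psi_i(w)|$. The right-hand side of (4.31b) coincides with the definition of $\|\Phi\|_{\vee(V^*,W^*)}$. Together, $\|\Phi\|^*_{\wedge(V,W)} = \|\Phi\|_{\vee(V^*,W^*)}$ is shown. □

Corollary 4.64. The norm $\|\cdot\|_{\wedge(V,W)}$ satisfies not only (4.26) but also
$$\|\varphi \otimes \psi\|^*_{\wedge(V,W)} = \|\varphi\|_{V^*} \|\psi\|_{W^*} \qquad \text{for all } \varphi \in V^*,\ \psi \in W^* \quad (\text{cf. (4.30)}).$$

Proof. This is the crossnorm property for $\|\cdot\|_{\vee(V^*,W^*)} = \|\cdot\|^*_{\wedge(V,W)}$ (cf. Proposition 4.63) stated in Lemma 4.59. □

Exercise 4.65. Let $U \subset V$ be a closed subspace. The norm $\|\cdot\| = \|\cdot\|_{\vee(U,W)}$ is defined involving functionals $\psi \in U^*$. On the other hand we may restrict the norm $\|\cdot\| = \|\cdot\|_{\vee(V,W)}$ to $U \otimes W$ (involving $\varphi \in V^*$). Show that both approaches lead to the same closed subspace $U \otimes_{\|\cdot\|} W$.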
4.2.4.3 Examples

Again, we consider the spaces $\ell^p(I)$ and $\ell^p(J)$. To simplify the reasoning, we first restrict the analysis to finite index sets $I = J = \{1, \ldots, n\}$. The duals of $\ell^p(I)$ and $\ell^p(J)$ for $1 \le p < \infty$ are $\ell^q(I)$ and $\ell^q(J)$ with $\frac{1}{p} + \frac{1}{q} = 1$ (cf. Example 4.26). Let $\varphi \in \ell^q(I)$ and $\psi \in \ell^q(J)$. The definition of $\|\cdot\|_\vee$ makes use of $(\varphi \otimes \psi)(x)$, where for $x = v \otimes w$ the definition $(\varphi \otimes \psi)(v \otimes w) = \varphi(v) \cdot \psi(w)$ holds. The interpretation of $\varphi(v)$ for a vector $v \in \ell^p(I)$ and a (dual) vector $\varphi \in \ell^q(I)$ is $\varphi(v) := \varphi^T v$. Similarly, $\psi(w) = \psi^T w$. Elements from $\ell^p(I) \otimes \ell^p(J) = \ell^p(I \times J)$ are standard $n \times n$ matrices, which we shall denote by $M$. We recall that $v \otimes w$ ($v \in \ell^p(I)$, $w \in \ell^p(J)$) corresponds to the matrix $v w^T$. Hence, with $M = v w^T$, the definition of $(\varphi \otimes \psi)(v \otimes w)$ becomes $\varphi^T M \psi \in \mathbb{K}$. This leads to the interpretation of $\|x\|_{\vee(V,W)}$ in (4.29) by
$$\|M\|_{\vee(\ell^p(I),\ell^p(J))} = \sup_{\|\varphi\|_q = \|\psi\|_q = 1} \big| \varphi^T M \psi \big| = \sup_{\varphi, \psi \ne 0} \frac{|\varphi^T M \psi|}{\|\varphi\|_q \|\psi\|_q} .$$
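Remark 4.66. (a) Let $1 \le p < \infty$ and assume $\#I > 1$ and $\#J > 1$. Then the inequality $\|\cdot\|_{\vee(\ell^p(I),\ell^p(J))} \le \|\cdot\|_{\ell^p(I \times J)}$ holds, but the corresponding equality is not valid.

(b) For $p = 2$, $\|\cdot\|_{\vee(\ell^2(I),\ell^2(J))} = \|\cdot\|_{\mathrm{SVD},\infty} \ne \|\cdot\|_{\mathrm{SVD},2} = \|\cdot\|_{\ell^2(I \times J)}$ holds with $\|\cdot\|_{\mathrm{SVD},p}$ defined in (4.17).

Proof. To prove $\|\cdot\|_{\vee(\ell^p(I),\ell^p(J))} \ne \|\cdot\|_{\ell^p(I \times J)}$, choose $M = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$ for the case14 $I = J = \{1, 2\}$. For instance for $p = 1$, $\|M\|_{\ell^1(I \times J)} = 4$, while an elementary analysis of $\frac{|\varphi^T M \psi|}{\|\varphi\|_\infty \|\psi\|_\infty}$ shows that $\|M\|_{\vee(\ell^1(I),\ell^1(J))} = 2$. □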
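The value $\|M\|_{\vee(\ell^1(I),\ell^1(J))} = 2$ from the proof of Remark 4.66 can be reproduced by brute force. The sketch below assumes NumPy and uses the fact that $|\varphi^T M \psi|$ is convex in each argument, so its maximum over the $\ell^\infty$ unit balls is attained at sign vectors; the function name `injective_norm_l1` is a hypothetical helper, not notation from the text.

```python
import itertools
import numpy as np

def injective_norm_l1(m):
    """sup |phi^T M psi| over ||phi||_inf = ||psi||_inf = 1 for a small matrix.

    The maximum is attained at vertices of the unit cubes, so it suffices to
    enumerate all sign vectors (exponential cost, fine for tiny matrices).
    """
    rows, cols = m.shape
    best = 0.0
    for phi in itertools.product((-1.0, 1.0), repeat=rows):
        for psi in itertools.product((-1.0, 1.0), repeat=cols):
            best = max(best, abs(np.array(phi) @ m @ np.array(psi)))
    return best

M = np.array([[1.0, 1.0],
              [1.0, -1.0]])

inj = injective_norm_l1(M)      # injective norm for p = 1
entrywise = np.abs(M).sum()     # ||M||_{l^1(I x J)}
print(inj, entrywise)
```

The output shows the injective norm $2$ against the entrywise $\ell^1$ norm $4$, so the two norms differ for this $M$.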
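14 This $2 \times 2$ example can be embedded in any model with $\#I, \#J \ge 2$.

In the case of the projective norm $\|\cdot\|_{\wedge(\ell^p(I),\ell^p(J))}$ in §4.2.3.1, we saw in §4.2.3.2 that the norms $\|\cdot\|_{\wedge(\ell^p(I),\ell^p(J))}$ and $\|\cdot\|_{\ell^p(I \times J)}$ coincide for $p = 1$. Now coincidence happens for $p = \infty$.

Remark 4.67. $\|\cdot\|_{\vee(\ell^\infty(I),\ell^\infty(J))} = \|\cdot\|_{\ell^\infty(I \times J)}$.

Proof. Choose unit vectors $e_I^{(i)} \in \ell^1(I)$, $e_J^{(j)} \in \ell^1(J)$. Then $(e_I^{(i)})^T M e_J^{(j)} = M_{ij}$ and $\|e_I^{(i)}\|_1 = \|e_J^{(j)}\|_1 = 1$ show that $|(e_I^{(i)})^T M e_J^{(j)}| / (\|e_I^{(i)}\|_1 \|e_J^{(j)}\|_1) = |M_{ij}|$. By definition,
$$\|M\|_{\vee(\ell^\infty(I),\ell^\infty(J))} = \sup_{0 \ne \varphi \in (\ell^\infty(I))^*}\ \sup_{0 \ne \psi \in (\ell^\infty(J))^*} \frac{|\varphi^T M \psi|}{\|\varphi\|^*_\infty\, \|\psi\|^*_\infty} .$$
$\ell^1(I)$ is a subset of the dual space $(\ell^\infty(I))^*$. The particular choice of the unit vectors $e_I^{(i)} \in \ell^1(I)$ ($i \in I$) and $e_J^{(j)} \in \ell^1(J)$ ($j \in J$) yields
$$\|M\|_{\vee(\ell^\infty(I),\ell^\infty(J))} \ge \sup_{i \in I,\, j \in J} \frac{|(e_I^{(i)})^T M e_J^{(j)}|}{\|e_I^{(i)}\|_1 \|e_J^{(j)}\|_1} = \sup_{i \in I,\, j \in J} |M_{ij}| = \|M\|_{\ell^\infty(I \times J)} .$$
On the other hand, for all $i \in I$ we have
$$|(M\psi)_i| = \Big| \sum_j M_{ij} \psi_j \Big| \le \sup_j |M_{ij}| \sum_j |\psi_j| \le \|M\|_{\ell^\infty(I \times J)} \|\psi\|_1$$
implying $\|M\psi\|_\infty \le \|M\|_{\ell^\infty(I \times J)} \|\psi\|_1$. Finally, $|\varphi^T M \psi| \le \|\varphi\|_1 \|M\psi\|_\infty$ proves the reverse inequality $\|M\|_{\vee(\ell^\infty(I),\ell^\infty(J))} \le \|M\|_{\ell^\infty(I \times J)}$. □

Set $\mathbf{V} := \ell^\infty(I) \otimes_\vee \ell^\infty(J)$. The conclusion we can draw from Remark 4.67 is not $\mathbf{V} = \ell^\infty(I \times J)$, but that $\mathbf{V}$ is a closed subspace of $\ell^\infty(I \times J)$.

Exercise 4.68. (a) $\ell^1(I) \otimes_a \ell^1(J)$ is dense in $\ell^1(I \times J)$, i.e., $\ell^1(I) \otimes_{\ell^1(I \times J)} \ell^1(J) = \ell^1(I \times J)$. The dual spaces of $\ell^1(I)$, $\ell^1(J)$, $\ell^1(I \times J)$ are isomorphic to $\ell^\infty(I)$, $\ell^\infty(J)$, $\ell^\infty(I \times J)$. Hence (4.28) becomes $\ell^\infty(I) \otimes_\vee \ell^\infty(J) = \ell^\infty(I) \otimes_{\ell^\infty} \ell^\infty(J) \subset \ell^\infty(I \times J)$. Prove that the left-hand side is a proper subspace.

(b) Let $\ell^2(I) \otimes_a \ell^2(J)$ be equipped with the norm $\|\cdot\| := \|\cdot\|_{\mathrm{SVD},2} = \|\cdot\|_{\ell^2(I \times J)}$ and prove that (4.28) holds with an equal sign.

Among the function spaces, $C(I)$ is of interest since again the supremum norm $\|\cdot\|_\infty$ is involved.

Remark 4.69. Let $V = (C(I), \|\cdot\|_{C(I)})$ and $W = (C(J), \|\cdot\|_{C(J)})$ with certain domains $I$ and $J$ (cf. Example 4.10). Then $\|\cdot\|_{\vee(C(I),C(J))} = \|\cdot\|_{C(I \times J)}$.

Proof. (i) Let $f(\cdot, \cdot) \in C(I \times J)$. The duals $C(I)^*$ and $C(J)^*$ contain the delta functionals $\delta_x$, $\delta_y$ ($x \in I$, $y \in J$) with $\|\delta_x\|_{C(I)^*} = 1$ (cf. (4.11)). Hence,
$$\|f\|_{\vee(C(I),C(J))} = \sup_{\varphi, \psi \ne 0} \frac{|(\varphi \otimes \psi) f|}{\|\varphi\|_{C(I)^*} \|\psi\|_{C(J)^*}} \ge \sup_{x \in I,\, y \in J} |(\delta_x \otimes \delta_y) f| = \sup_{x \in I,\, y \in J} |f(x, y)| = \|f\|_{C(I \times J)} .$$

(ii) For the reverse inequality, consider the function $f_y := f(\cdot, y)$ for fixed $y \in J$. Then $f_y \in C(I)$ has the norm $\|f_y\|_{C(I)} \le \|f\|_{C(I \times J)}$ for all $y \in J$. Application of $\varphi$ yields $g(y) := \varphi(f_y)$ and $|g(y)| = |\varphi(f_y)| \le \|\varphi\|_{C(I)^*} \|f\|_{C(I \times J)}$ for all $y \in J$; hence, $\|g\|_{C(J)} \le \|\varphi\|_{C(I)^*} \|f\|_{C(I \times J)}$. Application of $\psi$ to $g$ gives $|(\varphi \otimes \psi) f| = |\psi(g)| \le \|\psi\|_{C(J)^*} \|g\|_{C(J)} \le \|f\|_{C(I \times J)} \|\varphi\|_{C(I)^*} \|\psi\|_{C(J)^*}$ implying $\|f\|_{\vee(C(I),C(J))} \le \|f\|_{C(I \times J)}$. □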
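For finite index sets, where the dual of $\ell^\infty$ is $\ell^1$, the two inequalities in the proof of Remark 4.67 can be observed numerically. The following sketch assumes NumPy; the random matrix, the seed and the sample count are arbitrary illustrative choices, and random sampling only probes the upper bound rather than proving it.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 5))
max_entry = np.abs(M).max()            # ||M||_{l^inf(I x J)}

# Lower bound: evaluate phi^T M psi at the unit vectors hitting the largest entry.
i, j = np.unravel_index(np.abs(M).argmax(), M.shape)
e_i = np.zeros(4)
e_i[i] = 1.0
e_j = np.zeros(5)
e_j[j] = 1.0
attained = abs(e_i @ M @ e_j)

# Upper bound: random functionals normalised in l^1 never exceed max |M_ij|.
sampled = 0.0
for _ in range(2000):
    phi = rng.standard_normal(4)
    phi /= np.abs(phi).sum()
    psi = rng.standard_normal(5)
    psi /= np.abs(psi).sum()
    sampled = max(sampled, abs(phi @ M @ psi))

print(attained, sampled, max_entry)
```

The unit vectors attain $\max_{i,j} |M_{ij}|$ exactly, while every sampled pair $(\varphi, \psi)$ stays below that bound, in line with $|\varphi^T M \psi| \le \|\varphi\|_1 \|M\psi\|_\infty$.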
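4.2.5 Embedding of $V^*$ into $\mathcal{L}(V \otimes W, W)$

The following result will be essential for the properties of the minimal subspaces discussed in §6. According to Remark 3.64, we regard $V^* \subset V'$ as a subspace of $L(V \otimes_a W, W)$ via
$$\varphi \in V^* \mapsto \varphi\Big( \sum_i v_i \otimes w_i \Big) = \sum_i \varphi(v_i \otimes w_i) = \sum_i \varphi(v_i)\, w_i \in W .$$
The crucial question is whether the map $x \mapsto \varphi(x)$ is continuous, i.e., whether $V^* \subset \mathcal{L}(V \otimes_a W, W)$. The supposition $\|\cdot\| \gtrsim \|\cdot\|_{\vee(V,W)}$ of the next proposition is satisfied for all reasonable crossnorms (cf. §4.2.6.1 and Proposition 4.60).

Proposition 4.70. If $V \otimes_a W$ is equipped with a norm $\|\cdot\| \gtrsim \|\cdot\|_{\vee(V,W)}$, the embedding $V^* \subset \mathcal{L}(V \otimes_a W, W)$ is valid. In particular,
$$\|\varphi(x)\|_W \le \|\varphi\|_{V^*} \|x\|_{\vee(V,W)} \le C\, \|\varphi\|_{V^*} \|x\| \qquad \text{for all } \varphi \in V^* \text{ and } x \in V \otimes_a W. \tag{4.32}$$
An analogous result holds for $W^* \subset \mathcal{L}(V \otimes_a W, V)$.

Proof. For $w := \varphi(x) \in W$ choose $\psi \in W^*$ with $\psi(w) = \|w\|_W$ and $\|\psi\|_{W^*} = 1$ (cf. Theorem 4.18). Then, with $x = \sum_i v_i \otimes w_i$,
$$\|\varphi(x)\|_W = |\psi(\varphi(x))| = \Big| \psi\Big( \sum_i \varphi(v_i)\, w_i \Big) \Big| = \Big| \sum_i \varphi(v_i)\psi(w_i) \Big| = |(\varphi \otimes \psi)(x)| \le \|\varphi \otimes \psi\|^*_{\vee(V,W)}\, \|x\|_{\vee(V,W)} \underset{(4.30)}{=} \|\varphi\|_{V^*} \|\psi\|_{W^*} \|x\|_{\vee(V,W)} = \|\varphi\|_{V^*} \|x\|_{\vee(V,W)}$$
and $\|x\|_{\vee(V,W)} \le C\, \|x\|$ prove (4.32). □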
Corollary 4.71. Let $V$ and $W$ be two Banach spaces, where either $V$ or $W$ is finite dimensional. Equip $V \otimes_a W$ with a norm $\|\cdot\| \gtrsim \|\cdot\|_{\vee(V,W)}$. Then $V \otimes_a W$ is already complete.

Proof. Let $\dim(W) = n$ and choose a basis $\{w_1, \ldots, w_n\}$ of $W$. According to Remark 3.65a, all tensors $x \in V \otimes_a W$ may be written as $x = \sum_{i=1}^n v_i \otimes w_i$ with $v_i = \varphi_i(x) \in V$ using a dual basis $\{\varphi_1, \ldots, \varphi_n\}$. By $\dim(W) < \infty$ the functionals belong to $W^*$, and Proposition 4.70 implies that $\varphi_i \in \mathcal{L}(V \otimes_a W, V)$. Let $x_k \in V \otimes_a W$ be a Cauchy sequence. By continuity of $\varphi_i$, also $v_i^k = \varphi_i(x_k)$ is a Cauchy sequence. Since $V$ is a Banach space, $v_i^k \to v_i \in V$ converges and proves $\lim x_k = \lim \sum_{i=1}^n v_i^k \otimes w_i = \sum_{i=1}^n v_i \otimes w_i \in V \otimes_a W$. □
4.2.6 Reasonable Crossnorms

4.2.6.1 Definition and Properties

Now we combine the inequalities (4.14) and (4.26) (with $C = 1$); i.e., we require, simultaneously, continuity of $\otimes : (V, \|\cdot\|_V) \times (W, \|\cdot\|_W) \to (V \otimes_a W, \|\cdot\|)$ and $\otimes : (V^*, \|\cdot\|_{V^*}) \times (W^*, \|\cdot\|_{W^*}) \to (V^* \otimes_a W^*, \|\cdot\|^*)$. Note that $\|\cdot\|^*$ is the dual norm of $\|\cdot\|$.

Definition 4.72. A norm $\|\cdot\|$ on $V \otimes_a W$ is a reasonable crossnorm15 if $\|\cdot\|$ satisfies
$$\|v \otimes w\| \le \|v\|_V \|w\|_W \qquad \text{for all } v \in V \text{ and } w \in W, \tag{4.33a}$$
$$\|\varphi \otimes \psi\|^* \le \|\varphi\|_{V^*} \|\psi\|_{W^*} \qquad \text{for all } \varphi \in V^* \text{ and } \psi \in W^* . \tag{4.33b}$$

15 Also the name 'dualisable crossnorm' has been used (cf. [263]). Schatten [254] used the term 'crossnorm whose associate is a crossnorm' ('associate norm' means dual norm).

Lemma 4.73. If $\|\cdot\|$ is a reasonable crossnorm, then (4.33c,d) holds:
$$\|v \otimes w\| = \|v\|_V \|w\|_W \qquad \text{for all } v \in V \text{ and } w \in W, \tag{4.33c}$$
$$\|\varphi \otimes \psi\|^* = \|\varphi\|_{V^*} \|\psi\|_{W^*} \qquad \text{for all } \varphi \in V^* \text{ and } \psi \in W^* . \tag{4.33d}$$

Proof. (i) Note that (4.33b) is (4.26) with $C = 1$. By Lemma 4.56, inequality (4.27) holds with $C = 1$, i.e., $\|v \otimes w\| \ge \|v\|_V \|w\|_W$. Together with (4.33a) we obtain (4.33c).

(ii) Similarly, (4.33a) is (4.14) with $C = 1$. Lemma 4.57 proves $\|\varphi \otimes \psi\|^* \ge \|\varphi\|_{V^*} \|\psi\|_{W^*}$ so that, together with (4.33b), identity (4.33d) follows. □

By the previous lemma, an equivalent definition of a reasonable crossnorm $\|\cdot\|$ is: $\|\cdot\|$ and $\|\cdot\|^*$ are crossnorms. Lemma 4.51, Corollary 4.64, and Lemma 4.59 prove that the norms $\|\cdot\|_{\wedge(V,W)}$ and $\|\cdot\|_{\vee(V,W)}$ are particular reasonable crossnorms. Furthermore, Lemma 4.61, together with Propositions 4.52 and 4.60, shows the next statement.

Proposition 4.74. $\|\cdot\|_{\vee(V,W)}$ is the weakest and $\|\cdot\|_{\wedge(V,W)}$ is the strongest reasonable crossnorm; i.e., any reasonable crossnorm $\|\cdot\|$ satisfies
$$\|\cdot\|_{\vee(V,W)} \lesssim \|\cdot\| \lesssim \|\cdot\|_{\wedge(V,W)} . \tag{4.34}$$

Proposition 4.75. If $\|\cdot\|$ is a reasonable crossnorm on $V \otimes W$, then $\|\cdot\|^*$ is also a reasonable crossnorm on $V^* \otimes W^*$.

Proof. (i) We must show that $\|\cdot\|^*$ satisfies (4.33a,b) with $\|\cdot\|$, $V$, $W$ replaced with $\|\cdot\|^*$, $V^*$, $W^*$. The reformulated inequality (4.33a) is (4.33b). Hence this condition is satisfied by assumption. It remains to show the reformulated version of (4.33b):
$$\|v^{**} \otimes w^{**}\|^{**} \le \|v^{**}\|_{V^{**}} \|w^{**}\|_{W^{**}} \qquad \text{for all } v^{**} \in V^{**},\ w^{**} \in W^{**} . \tag{4.35a}$$

(ii) By Proposition 4.74, $\|\cdot\|_{\vee(V,W)} \le \|\cdot\| \le \|\cdot\|_{\wedge(V,W)}$ is valid. Applying Lemma 4.21 twice, we see that $\|\cdot\|^{**}_{\vee(V,W)} \le \|\cdot\|^{**} \le \|\cdot\|^{**}_{\wedge(V,W)}$ holds for the bidual norm; in particular,
$$\|v^{**} \otimes w^{**}\|^{**} \le \|v^{**} \otimes w^{**}\|^{**}_{\wedge(V,W)} . \tag{4.35b}$$
Proposition 4.63 states that $\|\cdot\|^*_{\wedge(V,W)} = \|\cdot\|_{\vee(V^*,W^*)}$ implying
$$\|\cdot\|^{**}_{\wedge(V,W)} = \|\cdot\|^*_{\vee(V^*,W^*)} . \tag{4.35c}$$
Since $\|\cdot\|_{\vee(V^*,W^*)}$ is a reasonable crossnorm on $V^* \otimes W^*$ (cf. Proposition 4.74), it satisfies the corresponding inequality (4.33b):
$$\|v^{**} \otimes w^{**}\|^*_{\vee(V^*,W^*)} \le \|v^{**}\|_{V^{**}} \|w^{**}\|_{W^{**}} \qquad \text{for all } v^{**} \in V^{**},\ w^{**} \in W^{**} . \tag{4.35d}$$
Now the equations (4.35b–d) prove (4.35a). □

In general, different reasonable crossnorms are not equivalent. Exceptions occur for finite-dimensional spaces or subspaces. However, the next statement of Fernández-Unzueta [100] also shows equivalence on the sets $\mathcal{R}_r$ of tensors of rank bounded by $r$ (cf. §3.2.6).

Theorem 4.76. Let $\|\cdot\|$ and $|||\cdot|||$ be two reasonable crossnorms on $\mathbf{V} = \bigotimes_{j=1}^d V_j$. Then for all $r \in \mathbb{N}$ the following estimates hold:
$$\|\mathbf{v}\| \le r^{d-1}\, |||\mathbf{v}|||, \qquad \|\mathbf{v} - \mathbf{w}\| \le (2r)^{d-1}\, |||\mathbf{v} - \mathbf{w}||| \qquad \text{for all } \mathbf{v}, \mathbf{w} \in \mathcal{R}_r .$$
As a consequence, the closure of $\mathcal{R}_r$ is independent of the particular reasonable crossnorm.
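4.2.6.2 Examples and Counterexamples

Example 4.77 ($\ell^2$). Let $V = \ell^2(I)$ and $W = \ell^2(J)$ for finite or countable index sets $I$, $J$ with norms $\|\cdot\|_{\ell^2(I)}$ and $\|\cdot\|_{\ell^2(J)}$. Then all norms $\|\cdot\|_{\mathrm{SVD},p}$ for $1 \le p \le \infty$ are reasonable crossnorms on $\ell^2(I) \otimes_a \ell^2(J)$. In particular,
$$\|\cdot\|_{\vee(\ell^2(I),\ell^2(J))} = \|\cdot\|_{\mathrm{SVD},\infty} \le \|\cdot\|_{\mathrm{SVD},p} \le \|\cdot\|_{\mathrm{SVD},q} \le \|\cdot\|_{\mathrm{SVD},1} = \|\cdot\|_{\wedge(\ell^2(I),\ell^2(J))} \qquad \text{for all } p \ge q \ge 1.$$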
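The chain of inequalities in Example 4.77 and the crossnorm property of every $\|\cdot\|_{\mathrm{SVD},p}$ can be observed for matrices. This is a minimal sketch assuming NumPy; the helper `svd_norm` and the random data are illustrative and not part of the text.

```python
import numpy as np

def svd_norm(m, p):
    """Schatten norm ||m||_{SVD,p}: the l^p norm of the singular values."""
    sigma = np.linalg.svd(m, compute_uv=False)
    if np.isinf(p):
        return float(sigma.max())
    return float((sigma ** p).sum() ** (1.0 / p))

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 4))

ps = [1, 1.5, 2, 4, np.inf]
values = [svd_norm(M, p) for p in ps]   # should be non-increasing in p

# Crossnorm property: v w^T has the single singular value ||v||_2 ||w||_2.
v = rng.standard_normal(5)
w = rng.standard_normal(4)
rank_one = [svd_norm(np.outer(v, w), p) for p in ps]
target = np.linalg.norm(v) * np.linalg.norm(w)

print(values)
print(rank_one, target)
```

The first list decreases from the nuclear norm ($p = 1$, the projective norm) to the spectral norm ($p = \infty$, the injective norm), while all entries of the second list agree with $\|v\|_2 \|w\|_2$ up to rounding.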
Example 4.78 ($\ell^p$). $\|\cdot\|_{\ell^p(I \times J)}$ is a reasonable crossnorm for $1 \le p < \infty$.

Proof. (4.15) proves that $\|\cdot\|_{\ell^p(I \times J)}$ satisfies (4.33a). The same statement (4.15) for $p$ replaced with $q$ (defined by $\frac{1}{p} + \frac{1}{q} = 1$) shows (4.33b). □
The next example can be shown analogously.

Example 4.79 ($L^p$). Let $V = L^p(I)$ and $W = L^p(J)$ for intervals $I$ and $J$. Then $\|\cdot\|_{L^p(I \times J)}$ is a reasonable crossnorm on $V \otimes_a W = L^p(I \times J)$ for $1 \le p < \infty$.

For the next example we recall that
$$\|f\|_{C^1(I)} = \max_{x \in I} \{ |f(x)|, |f'(x)| \}$$
is the norm of continuously differentiable functions in one variable $x \in I \subset \mathbb{R}$. The name $\|\cdot\|_{1,\mathrm{mix}}$ of the following norm is derived from the mixed derivative involved.

Example 4.80. Let $I$, $J$ be compact intervals in $\mathbb{R}$ and set $V = (C^1(I), \|\cdot\|_{C^1(I)})$, $W = (C^1(J), \|\cdot\|_{C^1(J)})$. For the tensor space $V \otimes_a W$ we introduce the mixed norm
$$\|\varphi\|_{C^1_{\mathrm{mix}}(I \times J)} := \|\varphi\|_{1,\mathrm{mix}} := \max_{(x,y) \in I \times J} \max\Big\{ |\varphi(x,y)|,\ \Big| \frac{\partial \varphi(x,y)}{\partial x} \Big|,\ \Big| \frac{\partial \varphi(x,y)}{\partial y} \Big|,\ \Big| \frac{\partial^2 \varphi(x,y)}{\partial x\, \partial y} \Big| \Big\} . \tag{4.36}$$
Then $\|\cdot\|_{1,\mathrm{mix}}$ is a reasonable crossnorm.

Proof. $\|f \otimes g\|_{C^1_{\mathrm{mix}}(I \times J)} \le \|f\|_{C^1(I)} \|g\|_{C^1(J)}$ is easy to verify. The proof of (4.33b) uses similar ideas to those from the proof of Remark 4.69. □

However, the standard norm for $C^1(I \times J)$ is
$$\|\varphi\|_{C^1(I \times J)} = \max_{(x,y) \in I \times J} \max\Big\{ |\varphi(x,y)|,\ \Big| \frac{\partial}{\partial x} \varphi(x,y) \Big|,\ \Big| \frac{\partial}{\partial y} \varphi(x,y) \Big| \Big\} .$$
As $\|\cdot\|_{C^1(I \times J)} \le \|\cdot\|_{C^1_{\mathrm{mix}}(I \times J)}$, the inequality $\|f \otimes g\|_{C^1(I \times J)} \le \|f\|_{C^1(I)} \|g\|_{C^1(J)}$ proves (4.33a). However, the second inequality (4.33b) characterising a reasonable crossnorm cannot be satisfied. For a counterexample, choose the continuous functionals $\delta'_{x_0} \in V^*$ ($x_0 \in I$) and $\delta'_{y_0} \in W^*$ ($y_0 \in J$) defined by $\delta'_{x_0}(f) = -f'(x_0)$ and $\delta'_{y_0}(g) = -g'(y_0)$. Then $(\delta'_{x_0} \otimes \delta'_{y_0})(f \otimes g) = f'(x_0)\, g'(y_0)$ cannot be bounded by $\|f \otimes g\|_{C^1(I \times J)}$, since the term $|\frac{\partial^2}{\partial x\, \partial y} \varphi(x,y)|$ from (4.36) is missing. For the treatment of this situation we refer to §4.3.5.

An analogous situation happens for $H^{1,p}(I \times J) = H^{1,p}(I) \otimes_{1,p} H^{1,p}(J)$ from Example 4.47. As mentioned in Example 4.47, $\|\cdot\|_{H^{1,p}(I \times J)}$ is not a crossnorm. Furthermore, it does not satisfy (4.26). On the other hand, $H^{1,p}_{\mathrm{mix}}(I \times J)$ allows the crossnorm
$$\Big( \|f\|_p^p + \|\partial f / \partial x\|_p^p + \|\partial f / \partial y\|_p^p + \big\| \partial^2 f / \partial x\, \partial y \big\|_p^p \Big)^{1/p} .$$
The anisotropic Sobolev space $H^{(1,0),p}(I_1 \times I_2) = H^{1,p}(I_1) \otimes_{(1,0),p} L^p(I_2)$ is introduced in Example 4.48.

Remark 4.81. The norm $\|\cdot\|_{(1,0),p} = \|\cdot\|_{H^{(1,0),p}(I_1 \times I_2)}$ on $H^{(1,0),p}(I_1 \times I_2)$ is a reasonable crossnorm for $1 \le p < \infty$.

Proof. As already stated in Example 4.48, the norm satisfies (4.33c). It remains to prove (4.33b). The functionals $\varphi \in V^* := (H^{1,p}(I_1))^*$ and $\psi \in W^* := L^p(I_2)^*$ may be normalised: $\|\varphi\|_{V^*} = \|\psi\|_{W^*} = 1$. Then $\|\varphi \otimes \psi\|^* \le 1$ has to be proved. By the definition of $\|\cdot\|^*$, it is sufficient to show
$$|(\varphi \otimes \psi)(f)| \le \|f\|_{(1,0),p} \qquad \text{for all } f \in H^{(1,0),p}(I_1 \times I_2).$$
Next, we may restrict $f \in H^{(1,0),p}(I_1 \times I_2)$ to the dense subset $f \in C^\infty(I_1 \times I_2)$. As stated in Example 4.26b, the dual space $W^*$ can be identified with $L^q(I_2)$, i.e., $\psi \in L^q(I_2)$ and $\|\psi\|_q = 1$. Application of $\psi \in W^*$ to $f$ yields the following function of $x \in I_1$:
$$F(x) := \int_{I_2} f(x, y)\, \psi(y)\, dy \in C^\infty(I_1).$$
The functional $\varphi \in V^*$ acts with respect to the $x$-variable:
$$(\varphi \otimes \psi)(f) = \varphi(F) = \int_{I_2} \varphi[f(\cdot, y)]\, \psi(y)\, dy .$$
For a fixed $y \in I_2$, the estimate $|\varphi[f(\cdot, y)]| \le \|\varphi\|_{V^*} \|f(\cdot, y)\|_{1,p} = \|f(\cdot, y)\|_{1,p}$ implies that
$$|(\varphi \otimes \psi)(f)| \le \int_{I_2} \|f(\cdot, y)\|_{1,p}\, |\psi(y)|\, dy \le \Big( \int_{I_2} \|f(\cdot, y)\|_{1,p}^p\, dy \Big)^{1/p} \Big( \int_{I_2} |\psi(y)|^q\, dy \Big)^{1/q} = \underbrace{\|\psi\|_q}_{=1} \Big( \int_{I_2} \|f(\cdot, y)\|_{1,p}^p\, dy \Big)^{1/p}$$
$$= \Big( \int_{I_2} \int_{I_1} \Big( |f(x, y)|^p + \Big| \frac{\partial}{\partial x} f(x, y) \Big|^p \Big)\, dx\, dy \Big)^{1/p} = \|f\|_{(1,0),p} .$$
□
Although the solution of elliptic partial differential equations is usually a function of the standard Sobolev spaces and not of mixed spaces such as $H^{1,p}_{\mathrm{mix}}$, there are important exceptions. As proved by Yserentant [307], the solutions of the electronic Schrödinger equation have mixed regularity because they must be antisymmetric (Pauli principle).
4.2.7 Reflexivity

Let $\|\cdot\|$ be a crossnorm on $V \otimes_a W$. The dual space of $V \otimes_{\|\cdot\|} W$ or16 $V \otimes_a W$ is $(V \otimes_{\|\cdot\|} W)^*$. From (4.25) we derive that
$$V^* \otimes_{\|\cdot\|^*} W^* \subset (V \otimes_{\|\cdot\|} W)^* . \tag{4.37}$$

16 A Banach space $X$ and any dense subspace $X_0 \subset X$ yield the same dual space $X^* = X_0^*$ (cf. Lemma 4.40).

Lemma 4.82. Assume that $V \otimes_{\|\cdot\|} W$ is equipped with a reasonable crossnorm $\|\cdot\|$ and is reflexive. Then the identity
$$V^* \otimes_{\|\cdot\|^*} W^* = (V \otimes_{\|\cdot\|} W)^* \tag{4.38}$$
holds. Furthermore, the spaces $V$ and $W$ are reflexive.

Proof. (i) For an indirect proof, assume that (4.38) is not valid. Then there is some $\phi \in (V \otimes_{\|\cdot\|} W)^*$ with $\phi \notin V^* \otimes_{\|\cdot\|^*} W^*$. By Hahn–Banach there is some bidual $\Phi \in (V \otimes_{\|\cdot\|} W)^{**}$ such that $\Phi(\phi) \ne 0$, while $\Phi(\omega) = 0$ for all $\omega \in V^* \otimes_{\|\cdot\|^*} W^*$. Because of the reflexivity, $\Phi(\phi)$ has a representation as $\phi(x_\Phi)$ for some $0 \ne x_\Phi \in V \otimes_{\|\cdot\|} W$. As $0 \ne x_\Phi$ implies $\|x_\Phi\|_{\vee(V,W)} > 0$, there is some
$$\omega = \varphi \otimes \psi \in V^* \otimes_a W^* \qquad \text{such that } |\omega(x_\Phi)| > 0.$$
This is in contradiction to $0 = \Phi(\omega) = \omega(x_\Phi)$ for all $\omega \in V^* \otimes_a W^*$. Hence identity (4.38) must hold.

(ii) The statement analogous to (4.37) for the dual spaces is
$$V^{**} \otimes_{\|\cdot\|^{**}} W^{**} \subset (V^* \otimes_{\|\cdot\|^*} W^*)^* \underset{(4.38)}{=} (V \otimes_{\|\cdot\|} W)^{**} = V \otimes_{\|\cdot\|} W,$$
implying $V^{**} \subset V$ and $W^{**} \subset W$. Together with the general property $V \subset V^{**}$ and $W \subset W^{**}$, we obtain reflexivity of $V$ and $W$. □

The last lemma shows that reflexivity of the Banach spaces $V$ and $W$ is necessary for $V \otimes_{\|\cdot\|} W$ to be reflexive. One might expect that reflexivity of $V$ and $W$ is also sufficient; i.e., the tensor product of reflexive spaces is again reflexive. This is not the case as the next example shows (for a proof see Schatten [254, p. 139]; note that the Banach spaces $\ell^p(\mathbb{N})$ are reflexive for $1 < p < \infty$).

Example 4.83. For $1 < p < \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$, $\ell^p(\mathbb{N}) \otimes_\vee \ell^q(\mathbb{N})$ is nonreflexive.
4.2.8 Uniform Crossnorms

Let $(V \otimes_a W, \|\cdot\|)$ be a tensor space with crossnorm $\|\cdot\|$ and consider operators $A \in \mathcal{L}(V, V)$ and $B \in \mathcal{L}(W, W)$ with operator norms $\|A\|_{V \leftarrow V}$ and $\|B\|_{W \leftarrow W}$. As discussed in §3.3.2.1, $A \otimes B$ is defined on elementary tensors $v \otimes w$ via
$$(A \otimes B)(v \otimes w) := (Av) \otimes (Bw) \in V \otimes_a W.$$
While $A \otimes B$ is well-defined on finite linear combinations from $V \otimes_a W$, the question is whether $A \otimes B : V \otimes_a W \to V \otimes_a W$ is (uniformly) bounded, i.e., $A \otimes B \in \mathcal{L}(V \otimes_a W, V \otimes_a W)$. In the positive case, $A \otimes B$ also belongs to $\mathcal{L}(V \otimes_{\|\cdot\|} W, V \otimes_{\|\cdot\|} W)$. For elementary tensors, the estimate
$$\|(A \otimes B)(v \otimes w)\| = \|(Av) \otimes (Bw)\| = \|Av\|\, \|Bw\| \le \|A\|_{V \leftarrow V} \|B\|_{W \leftarrow W} \|v\|_V \|w\|_W = \|A\|_{V \leftarrow V} \|B\|_{W \leftarrow W} \|v \otimes w\| \tag{4.39}$$
follows by the crossnorm property. However, this inequality does not automatically extend to general tensors from $V \otimes_a W$. Instead, the desired estimate is the subject of the next definition (cf. Schatten [254]).

Definition 4.84. A crossnorm on $V \otimes_a W$ is called uniform if $A \otimes B$ for all $A \in \mathcal{L}(V, V)$ and $B \in \mathcal{L}(W, W)$ belongs to $\mathcal{L}(V \otimes_a W, V \otimes_a W)$ with the operator norm
$$\|A \otimes B\|_{V \otimes_a W \leftarrow V \otimes_a W} \le \|A\|_{V \leftarrow V} \|B\|_{W \leftarrow W} . \tag{4.40}$$

Taking the supremum over all $v \otimes w$ with $\|v \otimes w\| = 1$, we conclude from (4.39) that $\|A \otimes B\|_{V \otimes_a W \leftarrow V \otimes_a W} \ge \|A\|_{V \leftarrow V} \|B\|_{W \leftarrow W}$. Therefore we may replace inequality (4.40) with
$$\|A \otimes B\|_{V \otimes_a W \leftarrow V \otimes_a W} = \|A\|_{V \leftarrow V} \|B\|_{W \leftarrow W} .$$

Proposition 4.85. $\|\cdot\|_{\wedge(V,W)}$ and $\|\cdot\|_{\vee(V,W)}$ are uniform crossnorms.
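Proof. (i) Let $x = \sum_i v_i \otimes w_i \in V \otimes_a W$ and $A \in \mathcal{L}(V, V)$, $B \in \mathcal{L}(W, W)$. Then
$$\|(A \otimes B)(x)\|_{\wedge(V,W)} = \Big\| \sum_i (Av_i) \otimes (Bw_i) \Big\|_{\wedge(V,W)} \le \sum_i \|Av_i\|_V \|Bw_i\|_W \le \|A\|_{V \leftarrow V} \|B\|_{W \leftarrow W} \sum_i \|v_i\|_V \|w_i\|_W$$
holds for all representations $x = \sum_i v_i \otimes w_i$. The infimum over all representations yields
$$\|(A \otimes B)(x)\|_{\wedge(V,W)} \le \|A\|_{V \leftarrow V} \|B\|_{W \leftarrow W} \|x\|_{\wedge(V,W)},$$
i.e., (4.40) holds for $(V \otimes_a W, \|\cdot\|_{\wedge(V,W)})$.

(ii) Let $x = \sum_i v_i \otimes w_i \in V \otimes_a W$ and note that
$$\|(A \otimes B)\, x\|_{\vee(V,W)} = \sup_{0 \ne \varphi \in V^*,\, 0 \ne \psi \in W^*} \frac{|(\varphi \otimes \psi)((A \otimes B)\, x)|}{\|\varphi\|_{V^*} \|\psi\|_{W^*}} = \sup_{0 \ne \varphi \in V^*,\, 0 \ne \psi \in W^*} \frac{|\sum_i (\varphi \otimes \psi)((Av_i) \otimes (Bw_i))|}{\|\varphi\|_{V^*} \|\psi\|_{W^*}} = \sup_{0 \ne \varphi \in V^*,\, 0 \ne \psi \in W^*} \frac{|\sum_i \varphi(Av_i) \cdot \psi(Bw_i)|}{\|\varphi\|_{V^*} \|\psi\|_{W^*}} .$$
By Definition 4.23, $A^* \in \mathcal{L}(V^*, V^*)$ and $B^* \in \mathcal{L}(W^*, W^*)$ satisfy
$$\Big| \sum_i \varphi(Av_i) \cdot \psi(Bw_i) \Big| = \Big| \sum_i (A^*\varphi)(v_i) \cdot (B^*\psi)(w_i) \Big| = |((A^*\varphi) \otimes (B^*\psi))(x)| .$$
We continue:
$$\|(A \otimes B)\, x\|_{\vee(V,W)} = \sup_{0 \ne \varphi \in V^*,\, 0 \ne \psi \in W^*} \frac{|((A^*\varphi) \otimes (B^*\psi))(x)|}{\|\varphi\|_{V^*} \|\psi\|_{W^*}} = \sup_{0 \ne \varphi,\, 0 \ne \psi} \frac{\|A^*\varphi\|_{V^*}}{\|\varphi\|_{V^*}}\, \frac{\|B^*\psi\|_{W^*}}{\|\psi\|_{W^*}}\, \frac{|((A^*\varphi) \otimes (B^*\psi))(x)|}{\|A^*\varphi\|_{V^*} \|B^*\psi\|_{W^*}} .$$
By Lemma 4.24, the inequalities
$$\frac{\|A^*\varphi\|_{V^*}}{\|\varphi\|_{V^*}} \le \|A^*\|_{V^* \leftarrow V^*} = \|A\|_{V \leftarrow V} \quad\text{and}\quad \frac{\|B^*\psi\|_{W^*}}{\|\psi\|_{W^*}} \le \|B^*\|_{W^* \leftarrow W^*} = \|B\|_{W \leftarrow W}$$
hold, while
$$\frac{|((A^*\varphi) \otimes (B^*\psi))(x)|}{\|A^*\varphi\|_{V^*} \|B^*\psi\|_{W^*}} \le \|x\|_{\vee(V,W)} .$$
Together, $\|(A \otimes B)\, x\|_{\vee(V,W)} \le \|A\|_{V \leftarrow V} \|B\|_{W \leftarrow W} \|x\|_{\vee(V,W)}$ proves that $\|\cdot\|_{\vee(V,W)}$ is also uniform. □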
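In the Euclidean matrix setting, inequality (4.40) for the projective and injective norms can be checked directly. The sketch below assumes NumPy, identifies tensors $x \in \mathbb{R}^3 \otimes \mathbb{R}^4$ with $3 \times 4$ matrices $X$ (so that $(A \otimes B)x$ becomes $A X B^T$), and uses the identifications of Example 4.54 and Remark 4.66b (nuclear norm for $\wedge$, spectral norm for $\vee$). The matrices are random illustrative data.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
X = rng.standard_normal((3, 4))      # tensor x stored as a matrix

# (A (x) B) x: on elementary tensors (Av)(Bw)^T = A (v w^T) B^T.
AXBt = A @ X @ B.T

# Consistency with the Kronecker product acting on the row-major vectorisation.
kron_matches = np.allclose(np.kron(A, B) @ X.reshape(-1), AXBt.reshape(-1))

# Product of the operator norms ||A|| ||B|| (spectral norms).
bound = np.linalg.norm(A, 2) * np.linalg.norm(B, 2)

# Uniformity of the projective (nuclear) and injective (spectral) norms.
nuclear_ok = np.linalg.norm(AXBt, 'nuc') <= bound * np.linalg.norm(X, 'nuc') + 1e-10
spectral_ok = np.linalg.norm(AXBt, 2) <= bound * np.linalg.norm(X, 2) + 1e-10

print(kron_matches, nuclear_ok, spectral_ok)
```

All three checks evaluate to `True` for this sample; a single random instance illustrates the estimate but does not replace the proof above.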
By definition, a uniform crossnorm is a crossnorm. Exercise 4.86. Condition (4.40) already implies that, up to a suitable scaling, the norm is a crossnorm. Hint: use similar maps Φ and Ψ as in the next proof.
As shown in Simon [263], a uniform crossnorm is also a reasonable crossnorm.

Lemma 4.87. A uniform crossnorm is a reasonable crossnorm.

Proof. Let $\varphi\in V^*$ and $\psi\in W^*$ and choose some $0\ne v\in V$ and $0\ne w\in W$. Define the operator $\Phi\in L(V,V)$ by $\Phi := v\varphi$ (i.e., $\Phi(x)=\varphi(x)\cdot v$) and, similarly, $\Psi\in L(W,W)$ by $\Psi := w\psi$. The identities $\|\Phi\|_{V\leftarrow V}=\|v\|_V\|\varphi\|_{V^*}$ and $\|\Psi\|_{W\leftarrow W}=\|w\|_W\|\psi\|_{W^*}$ are valid, as well as

$$(\Phi\otimes\Psi)(x) = ((\varphi\otimes\psi)(x))\cdot(v\otimes w) \quad\text{for all } x\in X := V\otimes_a W.$$

The crossnorm property yields

$$\|(\Phi\otimes\Psi)(x)\| = |(\varphi\otimes\psi)(x)|\,\|v\otimes w\| = |(\varphi\otimes\psi)(x)|\,\|v\|_V\|w\|_W,$$

while the uniform crossnorm property allows the estimate

$$|(\varphi\otimes\psi)(x)|\,\|v\|_V\|w\|_W = \|(\Phi\otimes\Psi)(x)\| \le \|\Phi\|_{V\leftarrow V}\|\Psi\|_{W\leftarrow W}\|x\| = \|v\|_V\|\varphi\|_{V^*}\|w\|_W\|\psi\|_{W^*}\|x\|.$$

Dividing by $\|v\|_V\|w\|_W\ne 0$, we obtain $|(\varphi\otimes\psi)(x)| \le \|\varphi\|_{V^*}\|\psi\|_{W^*}\|x\|$ for all $x\in X$. Hence $\|\cdot\|$ is a reasonable crossnorm. □

Proposition 4.88. Suppose that the Banach spaces $V$ and $W$ are reflexive. If $\|\cdot\|$ is a uniform crossnorm on $V\otimes_a W$, then $\|\cdot\|^*$ is also a uniform and reasonable crossnorm on $V^*\otimes_a W^*$.

Proof. By Lemma 4.87, $\|\cdot\|$ is a reasonable crossnorm, while by Proposition 4.75 $\|\cdot\|^*$ is also a reasonable crossnorm. To prove uniformity, let $A^*\in L(V^*,V^*)$ and $B^*\in L(W^*,W^*)$. Because of the reflexivity, the adjoint operators of $A^*$ and $B^*$ are $A^{**}=A\in L(V,V)$ and $B^{**}=B\in L(W,W)$. For $x^*=\sum_i \varphi_i\otimes\psi_i\in V^*\otimes_a W^*$ and $x=\sum_j v_j\otimes w_j\in V\otimes_a W$ we have

$$((A^*\otimes B^*)(x^*))(x) = \sum_i\sum_j (A^*\varphi_i)(v_j)\cdot(B^*\psi_i)(w_j) = \sum_i\sum_j \varphi_i(Av_j)\cdot\psi_i(Bw_j) = x^*((A\otimes B)(x))$$

$$\le \|x^*\|^*\,\|(A\otimes B)(x)\| \underset{\|\cdot\|\ \text{uniform}}{\le} \|x^*\|^*\,\|A\|_{V\leftarrow V}\|B\|_{W\leftarrow W}\|x\|.$$

From

$$\|(A^*\otimes B^*)(x^*)\|^* = \sup_{x\ne 0}\frac{|((A^*\otimes B^*)(x^*))(x)|}{\|x\|} \le \|A\|_{V\leftarrow V}\|B\|_{W\leftarrow W}\|x^*\|^*,$$

$\|A\|_{V\leftarrow V}=\|A^*\|_{V^*\leftarrow V^*}$, and $\|B\|_{W\leftarrow W}=\|B^*\|_{W^*\leftarrow W^*}$ we derive that the dual norm $\|\cdot\|^*$ is uniform. □
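The identity $((A^*\otimes B^*)(x^*))(x) = x^*((A\otimes B)(x))$ used in the proof of Proposition 4.88 can be checked in a finite-dimensional real setting. The following is a minimal sketch under assumptions that are not part of the text: $V=\mathbb{R}^2$, $W=\mathbb{R}^3$, tensors are stored as coefficient matrices, adjoints are transposes, and all helper names (`matmul`, `apply_kron`, `pair`) are hypothetical.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def apply_kron(A, B, X):
    """(A (x) B) applied to x = sum_kl X[k][l] e_k (x) e_l, i.e. A X B^T."""
    return matmul(matmul(A, X), transpose(B))

def pair(Y, X):
    """Dual pairing x*(x) for x* with coefficient matrix Y."""
    return sum(Y[i][j] * X[i][j]
               for i in range(len(X)) for j in range(len(X[0])))

A = [[1, 2], [0, -1]]                       # operator on V = R^2
B = [[2, 0, 1], [1, 1, 0], [0, 3, -1]]      # operator on W = R^3
X = [[1, 0, 2], [-1, 4, 3]]                 # x in V (x) W
Y = [[2, -1, 0], [5, 1, -2]]                # x* in V* (x) W*

lhs = pair(apply_kron(transpose(A), transpose(B), Y), X)  # ((A* (x) B*) x*)(x)
rhs = pair(Y, apply_kron(A, B, X))                        # x*((A (x) B) x)
print(lhs, rhs)
```

Integer data keep the comparison exact; the norm estimates of the proof are not reproduced here, only the algebraic identity behind them.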
4.2.9 Nuclear and Compact Operators

Suppose that $V$ and $W$ are Banach spaces and consider the tensor space $V\otimes_a W^*$. The inclusion $V\otimes_a W^*\subset L(W,V)$ is defined via

$$(v\otimes\psi)(w) := \psi(w)\,v\in V \quad\text{for all } v\in V,\ \psi\in W^*,\ w\in W.$$

Similarly as in Proposition 3.68a, $V\otimes_a W^*$ is interpreted as a subspace of $L(W,V)$ and denoted by $\mathcal{F}(W,V)$. Elements $\Phi\in\mathcal{F}(W,V)$ are called finite-rank operators. We recall Definition 4.14: $\mathcal{K}(W,V)$ is the set of compact operators.

Definition 4.89. A Banach space $X$ has the approximation property if for any compact set $K\subset X$ and $\varepsilon>0$ there is $\Phi_{K,\varepsilon}\in\mathcal{F}(X,X)$ with $\sup_{x\in K}\|\Phi_{K,\varepsilon}x-x\|_X\le\varepsilon$.
Proposition 4.90. (a) The completion with respect to the operator norm $\|\cdot\|_{V\leftarrow W}$ in (4.6a) yields $\overline{\mathcal{F}(W,V)}\subset\mathcal{K}(W,V)$.
(b) Sufficient for $\overline{\mathcal{F}(W,V)}=\mathcal{K}(W,V)$ is the approximation property of $W^*$.

Proof. $\Phi\in\mathcal{F}(W,V)$ is compact since its range is finite dimensional. Part (a) follows, because limits of compact operators are compact. For Part (b), see [211, p. 17]. □

Next, we relate the operator norm $\|\cdot\|_{V\leftarrow W}$ with the crossnorms of $V\otimes_{\|\cdot\|}W^*$.

Lemma 4.91. $\|\Phi\|_{V\leftarrow W}\le\|\Phi\|_{\vee(V,W^*)}$ holds for all $\Phi\in V\otimes_\vee W^*$. Reflexivity of $W$ implies the equality $\|\cdot\|_{V\leftarrow W}=\|\cdot\|_{\vee(V,W^*)}$.

Proof. $\|\Phi\|_{\vee(V,W^*)}$ is the supremum of $|(\varphi\otimes w^{**})(\Phi)|$ over all normalised $\varphi\in V^*$ and $w^{**}\in W^{**}$. Replacing $W^{**}$ by its subspace $W$, we get a lower bound:

$$\|\Phi\|_{\vee(V,W^*)} \ge \sup_{\|\varphi\|_{V^*}=\|w\|_W=1}|(\varphi\otimes w)(\Phi)| = \sup_{\|\varphi\|_{V^*}=\|w\|_W=1}|\varphi(\Phi(w))| = \sup_{\|w\|_W=1}\|\Phi(w)\|_V = \|\Phi\|_{V\leftarrow W}.$$

If $W=W^{**}$, equality follows. □

Corollary 4.92. As all reasonable crossnorms $\|\cdot\|$ are stronger than $\|\cdot\|_{\vee(V,W^*)}$, we have $V\otimes_{\|\cdot\|}W^*\subset\overline{\mathcal{F}(W,V)}\subset\mathcal{K}(W,V)$. This holds in particular for $\|\cdot\|_{\wedge(V,W^*)}$.

The definition of nuclear operators can be found in Grothendieck [129].

Definition 4.93. $\mathcal{N}(W,V) := V\otimes_\wedge W^*$ is the space of nuclear operators.

If $V$ and $W$ are assumed to be Hilbert spaces, the infinite singular-value decomposition enables additional conclusions which will be given in §4.4.3.
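To see the relation between the operator norm and the nuclear (projective) norm in a computable case, one may take finite-dimensional spaces. The sketch below assumes $V=(\mathbb{R}^m,\|\cdot\|_1)$ and $W=(\mathbb{R}^n,\|\cdot\|_\infty)$, so that $W^*=(\mathbb{R}^n,\|\cdot\|_1)$ and, by the $\ell^1$ example (Example 4.53), the projective norm on $V\otimes W^*$ is the entrywise $\ell^1$ norm of the matrix of $\Phi$. The function names are hypothetical. The operator norm is found by enumerating sign vectors, because the convex function $w\mapsto\|\Phi w\|_1$ attains its maximum over the cube at a vertex.

```python
from itertools import product

def nuclear_norm_l1(M):
    """Projective norm on l1 (x) l1: the entrywise l1 norm of the matrix."""
    return sum(abs(a) for row in M for a in row)

def operator_norm_linf_to_l1(M):
    """sup ||M w||_1 over ||w||_inf <= 1, attained at sign vectors w."""
    n = len(M[0])
    return max(sum(abs(sum(row[j] * w[j] for j in range(n))) for row in M)
               for w in product((1, -1), repeat=n))

rank_one = [[1, 2], [3, 6]]   # elementary tensor (1,3) (x) (1,2)
mixed = [[1, 1], [1, -1]]     # not an elementary tensor

for M in (rank_one, mixed):
    print(operator_norm_linf_to_l1(M), nuclear_norm_l1(M))
```

For the elementary tensor both norms coincide (crossnorm property), while for the second matrix the operator norm is strictly smaller than the nuclear norm, in line with Corollary 4.92.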
4.3 Tensor Spaces of Order d

Most of the definitions and properties stated for $d=2$ carry over to $d>2$ in an obvious way. However, a new problem arises. In the case of $d=2$, all three spaces $V$, $W$, $V\otimes W$ are equipped with norms. For $d>2$ spaces as for instance $\mathbf{V}_{[j]}=\bigotimes_{k\ne j}V_k$ are of interest. Since we want to embed the dual space $\mathbf{V}_{[j]}^*$ into $L(\mathbf{V},V_j)$ (cf. §4.3.4) we need a suitable norm. Such norms will be defined in §4.3.2.

4.3.1 Continuity, Crossnorms

4.3.1.1 General Definitions and Properties

In this chapter, $\|\cdot\|_j$ are the norms associated with the vector spaces $V_j$, while $\|\cdot\|$ is the norm of the algebraic tensor space ${}_a\bigotimes_{j=1}^d V_j$ and of the Banach tensor space ${}_{\|\cdot\|}\bigotimes_{j=1}^d V_j$. Lemma 4.35b implies the following result.

Remark 4.94. Let $(V_j,\|\cdot\|_j)$ be normed vector spaces for $1\le j\le d$. The $d$-fold tensor product

$$\bigotimes_{j=1}^d : V_1\times\ldots\times V_d \to V_1\otimes_a\ldots\otimes_a V_d$$

is continuous if and only if there is some constant $C$ such that

$$\Big\|\bigotimes_{j=1}^d v^{(j)}\Big\| \le C\prod_{j=1}^d\|v^{(j)}\|_j \quad\text{for all } v^{(j)}\in V_j\ (1\le j\le d). \tag{4.41}$$

We call $\|\cdot\|$ a crossnorm if

$$\Big\|\bigotimes_{j=1}^d v^{(j)}\Big\| = \prod_{j=1}^d\|v^{(j)}\|_j \quad\text{for all } v^{(j)}\in V_j\ (1\le j\le d) \tag{4.42}$$

holds for elementary tensors.

Similarly, we may consider the $d$-fold tensor product

$$\bigotimes_{j=1}^d : V_1^*\times\ldots\times V_d^* \to {}_a\bigotimes_{j=1}^d V_j^*$$

of the dual spaces. We recall that the normed space $({}_a\bigotimes_{j=1}^d V_j,\|\cdot\|)$ has a dual $({}_a\bigotimes_{j=1}^d V_j)^*$ equipped with the dual norm $\|\cdot\|^*$. We interpret $\varphi_1\otimes\ldots\otimes\varphi_d\in{}_a\bigotimes_{j=1}^d V_j^*$ as functional on ${}_a\bigotimes_{j=1}^d V_j$, i.e., as an element of $({}_a\bigotimes_{j=1}^d V_j)^*$, via

$$(\varphi_1\otimes\ldots\otimes\varphi_d)\big(v^{(1)}\otimes\ldots\otimes v^{(d)}\big) := \varphi_1(v^{(1)})\cdot\varphi_2(v^{(2)})\cdot\ldots\cdot\varphi_d(v^{(d)}).$$

Then continuity of $\bigotimes_{j=1}^d : V_1^*\times\ldots\times V_d^*\to{}_a\bigotimes_{j=1}^d V_j^*$ is equivalent to

$$\Big\|\bigotimes_{j=1}^d\varphi_j\Big\|^* \le C\prod_{j=1}^d\|\varphi_j\|_j^* \quad\text{for all } \varphi_j\in V_j^*\ (1\le j\le d). \tag{4.43}$$
A crossnorm $\|\cdot\|$ on $\bigotimes_{j=1}^d V_j$ is called a reasonable crossnorm if

$$\Big\|\bigotimes_{j=1}^d\varphi_j\Big\|^* = \prod_{j=1}^d\|\varphi_j\|_j^* \quad\text{for all } \varphi_j\in V_j^*\ (1\le j\le d). \tag{4.44}$$
A crossnorm $\|\cdot\|$ on $\mathbf{V} := {}_{\|\cdot\|}\bigotimes_{j=1}^d V_j$ is called a uniform crossnorm, if elementary tensors $\mathbf{A} := \bigotimes_{j=1}^d A^{(j)}$ have the operator norm

$$\|\mathbf{A}\|_{\mathbf{V}\leftarrow\mathbf{V}} = \prod_{j=1}^d\|A^{(j)}\|_{V_j\leftarrow V_j} \qquad \big(A^{(j)}\in L(V_j,V_j),\ 1\le j\le d\big) \tag{4.45}$$

(we may write $\le$ instead, but equality follows, compare Definition 4.84 and the subsequent comment on page 133).

The proofs of Lemma 4.87 and Proposition 4.88 can easily be extended to $d$ factors yielding the following result.

Lemma 4.95. (a) A uniform crossnorm on $\bigotimes_{j=1}^d V_j$ is a reasonable crossnorm.
(b) Let $\|\cdot\|$ be a uniform crossnorm on $\bigotimes_{j=1}^d V_j$ with reflexive Banach spaces $V_j$. Then $\|\cdot\|^*$ is a uniform and reasonable crossnorm on $\bigotimes_{j=1}^d V_j^*$.

4.3.1.2 Projective Norm $\|\cdot\|_\wedge$

As for $d=2$ (cf. §4.2.3.1), the projective norm $\|x\|_{\wedge(V_1,\ldots,V_d)}$ on ${}_a\bigotimes_{j=1}^d V_j$ is induced by the norms $\|\cdot\|_j=\|\cdot\|_{V_j}$ for $1\le j\le d$, which can be defined as follows.

Remark 4.96. (a) For $x\in{}_a\bigotimes_{j=1}^d V_j$ define $\|\cdot\|_{\wedge(V_1,\ldots,V_d)}$ by

$$\|x\|_{\wedge(V_1,\ldots,V_d)} := \|x\|_\wedge := \inf\Big\{\sum_{i=1}^n\prod_{j=1}^d\|v_i^{(j)}\|_j : x=\sum_{i=1}^n\bigotimes_{j=1}^d v_i^{(j)}\Big\}.$$

(b) $\|\cdot\|_\wedge$ satisfies the crossnorm property (4.42).
(c) $\|\cdot\|_\wedge$ is the strongest norm for which the map $\bigotimes_{j=1}^d : V_1\times\ldots\times V_d\to{}_a\bigotimes_{j=1}^d V_j$ is continuous.

The parts (b) and (c) are proved analogously to the case of $d=2$ in Lemma 4.51 and Proposition 4.52.

Proposition 4.97. $\|\cdot\|_\wedge$ is a uniform and reasonable crossnorm on ${}_a\bigotimes_{j=1}^d V_j$.

Proof. The proof of the uniform crossnorm property in Proposition 4.85 can easily be extended from $d=2$ to $d\ge 3$. The result of Part (i) together with Lemma 4.87 implies that $\|\cdot\|_\wedge$ is a reasonable crossnorm. □
4.3.1.3 Injective Norm $\|\cdot\|_\vee$

The analogue of the definition of $\|\cdot\|_\vee$ in §4.2.4.2 for $d$ factors yields the following formulation. Let $\varphi_1\otimes\varphi_2\otimes\ldots\otimes\varphi_d$ be an elementary tensor of the tensor space ${}_a\bigotimes_{j=1}^d V_j^*$ generated by the dual spaces. The proof of the next remark uses the same arguments as used in Lemma 4.59 and Proposition 4.60.

Remark 4.98. (a) For $v\in{}_a\bigotimes_{j=1}^d V_j$ define $\|\cdot\|_{\vee(V_1,\ldots,V_d)}$ by$^{17}$

$$\|v\|_{\vee(V_1,\ldots,V_d)} := \|v\|_\vee := \sup_{\substack{0\ne\varphi_j\in V_j^*\\ 1\le j\le d}}\frac{|(\varphi_1\otimes\varphi_2\otimes\ldots\otimes\varphi_d)(v)|}{\prod_{j=1}^d\|\varphi_j\|_j^*}. \tag{4.46}$$

(b) For elementary tensors the crossnorm property (4.42) holds:

$$\Big\|\bigotimes_{j=1}^d v^{(j)}\Big\|_{\vee(V_1,\ldots,V_d)} = \prod_{j=1}^d\|v^{(j)}\|_j \quad\text{for all } v^{(j)}\in V_j\ (1\le j\le d).$$

(c) $\|\cdot\|_{\vee(V_1,\ldots,V_d)}$ is the weakest norm with $\bigotimes_{j=1}^d : \times_{j=1}^d V_j^*\to{}_a\bigotimes_{j=1}^d V_j^*$ being continuous.

The definition of the injective norm can be simplified for symmetric tensors (cf. Remark 4.175). Even in the finite-dimensional case with $d\ge 3$, the computation of $\|v\|_\vee$ is NP-hard (cf. Hillar–Lim [162]).
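For very small tensors the supremum (4.46) can be evaluated by brute force. The following sketch assumes factors $V_j=(\mathbb{R}^{n_j},\|\cdot\|_1)$, so that the dual unit balls are cubes and the multilinear form attains its maximal modulus at sign vectors; the function names are hypothetical. The enumeration visits $2^{n_1+n_2+n_3}$ sign patterns, which only illustrates the cost of a naive approach and is no statement about the complexity result cited above.

```python
from itertools import product

def injective_norm_l1(T):
    """Brute-force injective norm of an order-3 tensor over l1 factors.

    The functionals range over sign vectors, the vertices of the dual
    unit balls (cubes in the maximum norm).
    """
    n1, n2, n3 = len(T), len(T[0]), len(T[0][0])
    best = 0
    for s in product((1, -1), repeat=n1):
        for t in product((1, -1), repeat=n2):
            for u in product((1, -1), repeat=n3):
                val = sum(T[i][j][k] * s[i] * t[j] * u[k]
                          for i in range(n1)
                          for j in range(n2)
                          for k in range(n3))
                best = max(best, abs(val))
    return best

def elementary(a, b, c):
    """Coefficient array of the elementary tensor a (x) b (x) c."""
    return [[[x * y * z for z in c] for y in b] for x in a]

a, b, c = [1, -2], [3, 1], [2, 2]
E = elementary(a, b, c)                      # elementary tensor
T = [[[1, 0], [1, 0]], [[1, 0], [-1, 0]]]    # not elementary

print(injective_norm_l1(E), injective_norm_l1(T))
```

For the elementary tensor the result equals the product of the $\ell^1$ norms of the factors, as stated in Remark 4.98b; for the second tensor it is smaller than the entrywise $\ell^1$ norm.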
4.3.1.4 Examples

Example 4.53 can be generalised to tensors of order $d$.

Example 4.99. Let $V_j := \ell^1(I_j)$ with finite or countable index set $I_j$ for $1\le j\le d$. The projective norm $\|\cdot\|_{\wedge(V_1,\ldots,V_d)}$ of ${}_a\bigotimes_{j=1}^d V_j$ coincides with $\|\cdot\|_{\ell^1(I_1\times I_2\times\ldots\times I_d)}$.

Remark 4.69 leads to the following generalisation.

Example 4.100. Let $V_j=(C(I_j),\|\cdot\|_{C(I_j)})$ with certain domains $I_j$ (e.g., intervals). Then $\|\cdot\|_{\vee(C(I_1),\ldots,C(I_d))}=\|\cdot\|_{C(I_1\times I_2\times\ldots\times I_d)}$.

Proof. Remark 4.69 shows that $\|\cdot\|_{\vee(C(I_1),C(I_2))}=\|\cdot\|_{C(I_1\times I_2)}$. Iteration yields $\|\cdot\|_{\vee(C(I_1),C(I_2),C(I_3))}=\|\cdot\|_{\vee(C(I_1\times I_2),C(I_3))}=\|\cdot\|_{C(I_1\times I_2\times I_3)}$. The proof is completed by induction. □

17 In Zhang–Golub [309, Def. 2.1], the expression $(\varphi_1\otimes\varphi_2\otimes\ldots\otimes\varphi_d)(v)\big/\prod_{j=1}^d\|\varphi_j\|_j^*$ is introduced as the generalised Rayleigh quotient.
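The crossnorm property (4.42) for the two norms of Examples 4.99 and 4.100 can be observed directly on coefficient arrays. This is a small sketch under the assumption that all index sets are finite (so that $C(I_j)$ is $\mathbb{R}^{n_j}$ with the maximum norm); the helper names are hypothetical.

```python
from itertools import product
from math import prod

def elementary(vectors):
    """Entries of v^(1) (x) ... (x) v^(d), keyed by multi-index."""
    return {idx: prod(v[i] for v, i in zip(vectors, idx))
            for idx in product(*(range(len(v)) for v in vectors))}

def l1_norm(t):
    """Norm of l1(I_1 x ... x I_d), cf. Example 4.99."""
    return sum(abs(x) for x in t.values())

def max_norm(t):
    """Maximum norm on I_1 x ... x I_d, cf. Example 4.100 for finite I_j."""
    return max(abs(x) for x in t.values())

vs = [[1, -2, 3], [4, 0], [-1, 5]]
t = elementary(vs)

print(l1_norm(t), prod(sum(abs(x) for x in v) for v in vs))
print(max_norm(t), prod(max(abs(x) for x in v) for v in vs))
```

In both lines the norm of the elementary tensor agrees with the product of the corresponding factor norms.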
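Since $\|\cdot\|_{\mathrm{SVD},2}=\|\cdot\|_{\ell^2(I_1\times I_2)}$ is not equivalent to $\|\cdot\|_{\mathrm{SVD},1}$, the explicit interpretation of $\|\cdot\|_{\wedge(V_1,\ldots,V_d)}$ for $(V_j,\|\cdot\|_{\ell^2(I_j)})$ and $d\ge 3$ is not obvious.

A later counterexample will use the following unusual uniform crossnorm.

Exercise 4.101. For $\alpha\subset\{1,\ldots,d\}$ with $2\le\#\alpha\le d-2$ we regard $\mathbf{V}={}_a\bigotimes^d\ell^2(\mathbb{N})$ also as $\mathbf{V}_\alpha\otimes\mathbf{V}_{\alpha^c}$ with $\mathbf{V}_\alpha={}_a\bigotimes^{\#\alpha}\ell^2(\mathbb{N})$ and $\mathbf{V}_{\alpha^c}={}_a\bigotimes^{d-\#\alpha}\ell^2(\mathbb{N})$ (cf. Example 4.7). On $\mathbf{V}_\alpha$ and $\mathbf{V}_{\alpha^c}$ we use the injective and projective norms. Then the norm on $\mathbf{V}$ is defined as

$$\|v\| := \inf\Big\{\min\Big\{\sum_\nu\|x_\nu\|_\vee\|y_\nu\|_\wedge,\ \sum_\nu\|x_\nu\|_\wedge\|y_\nu\|_\vee\Big\} : v=\sum_\nu x_\nu\otimes y_\nu\Big\},$$

where $x_\nu\in\mathbf{V}_\alpha$, $y_\nu\in\mathbf{V}_{\alpha^c}$. Prove that $\|\cdot\|$ is a uniform crossnorm.

4.3.1.5 Dual Spaces

Lemma 4.82 together with its proof can easily be generalised to order-$d$ tensors.

Lemma 4.102. Assume that $\mathbf{V}={}_{\|\cdot\|}\bigotimes_{j=1}^d V_j$ is equipped with a reasonable crossnorm $\|\cdot\|$ and is reflexive. Then the identity ${}_{\|\cdot\|^*}\bigotimes_{j=1}^d V_j^*=\mathbf{V}^*$ holds. Furthermore, the spaces $V_j$ are reflexive.

Exercise 4.103. Under the assumptions of Lemma 4.102 show that weak convergence $v_n^{(j)}\rightharpoonup v^{(j)}\in V_j$ ($1\le j\le d$) implies $\bigotimes_{j=1}^d v_n^{(j)}\rightharpoonup\bigotimes_{j=1}^d v^{(j)}$. Hint: Exercise 4.28.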
4.3.2 Recursive Definition of the Topological Tensor Space

4.3.2.1 Setting of the Problem

As mentioned in §3.2.4, the algebraic tensor space $\mathbf{V}_{\mathrm{alg}} := {}_a\bigotimes_{j=1}^d V_j$ can be constructed recursively by pairwise products:

$$\mathbf{X}_2^{\mathrm{alg}} := V_1\otimes_a V_2,\quad \mathbf{X}_3^{\mathrm{alg}} := \mathbf{X}_2^{\mathrm{alg}}\otimes_a V_3,\ \ldots,\quad \mathbf{V}_{\mathrm{alg}} := \mathbf{X}_d^{\mathrm{alg}} := \mathbf{X}_{d-1}^{\mathrm{alg}}\otimes_a V_d.$$

For a similar construction of the topological tensor space $\mathbf{V} := {}_{\|\cdot\|}\bigotimes_{j=1}^d V_j$ we need, in addition, suitable norms $\|\cdot\|_{\mathbf{X}_k}$ on $\mathbf{X}_k$ so that

$$\mathbf{X}_k := \mathbf{X}_{k-1}\otimes_{\|\cdot\|_{\mathbf{X}_k}}V_k \quad\text{for } k=2,\ldots,d \text{ with } \mathbf{X}_1 := V_1 \tag{4.47}$$

yielding $\mathbf{V}=\mathbf{X}_d$ with $\|\cdot\|=\|\cdot\|_{\mathbf{X}_d}$ for $k=d$. In the case of a (reasonable) crossnorm, it is natural to require that $\|\cdot\|_{\mathbf{X}_k}$ be also a (reasonable) crossnorm.

The crossnorm property is not only a property of $\|\cdot\|$ but also describes its relation to the norms of the generating normed spaces. For $d\ge 3$, different interpretations are possible as explained below.
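The recursive, pairwise construction can be mirrored on coefficient vectors. The sketch below assumes finite-dimensional factors with the $\ell^1$ norm (for which the entrywise $\ell^1$ norm is a crossnorm at every stage) and a fixed flattening convention in which the index of the last factor runs fastest; `outer` and `l1` are hypothetical helper names.

```python
def outer(x, v):
    """Coefficients of x (x) v, flattened with the index of v running fastest."""
    return [a * b for a in x for b in v]

def l1(x):
    return sum(abs(a) for a in x)

v1, v2, v3 = [1, -2], [3, 1, 0], [2, -1]

x2 = outer(v1, v2)    # element of X_2 = V_1 (x) V_2
x3 = outer(x2, v3)    # element of X_3 = X_2 (x) V_3
direct = [a * b * c for a in v1 for b in v2 for c in v3]

print(x3 == direct)
print(l1(x2), l1(x3))
```

The pairwise product reproduces the direct three-fold product, and the norm is multiplied by the norm of the new factor at each step.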
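Remark 4.104. There are two interpretations of the crossnorm property of $\|\cdot\|_{\mathbf{X}_k}$:
(i) A crossnorm on $\mathbf{X}_k={}_{\|\cdot\|_{\mathbf{X}_k}}\bigotimes_{j=1}^k V_j$ requires

$$\Big\|\bigotimes_{j=1}^k v^{(j)}\Big\|_{\mathbf{X}_k} = \prod_{j=1}^k\|v^{(j)}\|_j \qquad (v^{(j)}\in V_j), \tag{4.48a}$$

(ii) whereas a crossnorm on $\mathbf{X}_k=\mathbf{X}_{k-1}\otimes_{\|\cdot\|_{\mathbf{X}_k}}V_k$ requires the stronger condition

$$\|x\otimes v^{(k)}\|_{\mathbf{X}_k} = \|x\|_{\mathbf{X}_{k-1}}\,\|v^{(k)}\|_k \quad\text{for } x\in\mathbf{X}_{k-1} \text{ and } v^{(k)}\in V_k. \tag{4.48b}$$

The difference between (i) and (ii) corresponds to the different meaning of an 'elementary tensor' as illustrated in Example 3.30.

Remark 4.105. (a) Let $\alpha\subset D := \{1,\ldots,d\}$ be arbitrary. We may renumber the spaces $V_j$ such that $\alpha=\{1,\ldots,\#\alpha\}$. Then the above mentioned construction of the norm $\|\cdot\|_\alpha$ of $\mathbf{V}_\alpha := \mathbf{X}_{\#\alpha}$ yields Banach tensor spaces $\mathbf{V}_\alpha=\overline{{}_a\bigotimes_{j\in\alpha}V_j}^{\|\cdot\|_\alpha}$ for all subsets $\alpha$.
(b) A crossnorm property even stronger than (4.48b) is

$$\|v^{(\alpha)}\otimes v^{(\beta)}\|_\gamma = \|v^{(\alpha)}\|_\alpha\,\|v^{(\beta)}\|_\beta \quad\text{for all } v^{(\alpha)}\in\mathbf{V}_\alpha,\ v^{(\beta)}\in\mathbf{V}_\beta,\ \gamma=\alpha\,\dot\cup\,\beta\subset D \text{ (disjoint union)}. \tag{4.48c}$$

The implications (4.48c) $\Rightarrow$ (4.48b) $\Rightarrow$ (4.48a) are valid.
(c) There are uniform crossnorms not satisfying (4.48c). It may even happen that $v^{(\alpha)}\otimes v^{(\beta)}$ ($v^{(\alpha)}\in\mathbf{V}_\alpha$, $v^{(\beta)}\in\mathbf{V}_\beta$) does not belong to $\mathbf{V}_\gamma$.

Proof. The proof of (c) uses the norm in Exercise 4.101 with $\beta=\alpha^c$. The definition of the norm $\|\cdot\|_\alpha$ by $\|x\|_\alpha=\|x\otimes y\|$ with $y=\bigotimes_{j\in\alpha^c}y^{(j)}$, $\|y^{(j)}\|_2=1$ leads to $\|x\|_\alpha=\|x\|_\vee$ since $\|y\|_\vee=\|y\|_\wedge=1$ and $\|x\|_\vee\le\|x\|_\wedge$. Analogously, we obtain $\|\cdot\|_{\alpha^c}=\|\cdot\|_\vee$. The completed Banach spaces are the injective spaces $\mathbf{V}_\alpha$ and $\mathbf{V}_{\alpha^c}$. Choose tensors $x\in\mathbf{V}_\alpha$ and $y\in\mathbf{V}_{\alpha^c}$ not belonging to the projective spaces. Then $\|x\otimes y\|=\infty$ holds for the uniform crossnorm of Exercise 4.101. □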
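4.3.2.2 Construction for General Uniform Crossnorms

If $\mathbf{V}$ is a Banach tensor space with a uniform crossnorm $\|\cdot\|$, it turns out that the required norm $\|\cdot\|_\alpha$ of $\mathbf{V}_\alpha$ ($\alpha\subset D := \{1,\ldots,d\}$) is uniquely determined.

Lemma 4.106. If $\|\cdot\|_{\mathbf{X}_k}$ is a crossnorm in the sense of (4.48b), $\|\cdot\|_{\mathbf{X}_k}$ is uniquely defined by the norm $\|\cdot\|$ of $\mathbf{V}$ via

$$\|x\|_{\mathbf{X}_k} = \Big\|x\otimes\bigotimes_{j=k+1}^d v^{(j)}\Big\| \quad\text{for arbitrary } v^{(j)}\in V_j \text{ with } \|v^{(j)}\|_j=1. \tag{4.49}$$

Similarly, $\|v^{(\alpha)}\|_\alpha=\big\|v^{(\alpha)}\otimes\bigotimes_{j\in\alpha^c}v^{(j)}\big\|$ holds for all $v^{(\alpha)}\in\mathbf{V}_\alpha$, $v^{(j)}\in V_j$, $\|v^{(j)}\|_j=1$, $\alpha\subset D$. The uniformity of the crossnorm $\|\cdot\|$ can be replaced by the stronger crossnorm property (4.48c).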
Proof. The induction starts with $k=d-1$. The crossnorm property (4.48b) implies $\|x\otimes v^{(d)}\|=\|x\|_{\mathbf{X}_{d-1}}\|v^{(d)}\|_d=\|x\|_{\mathbf{X}_{d-1}}$ for any normalised vector $v^{(d)}\in V_d$. The cases $k=d-2,\ldots,2$ follow by recursion. Case (4.48c) is left as exercise. □

Note that Lemma 4.106 requires that a crossnorm $\|\cdot\|_{\mathbf{X}_k}$ exist in the sense of (4.48b). Requiring that $\|\cdot\|$ be a uniform crossnorm on $\mathbf{V}$, we now show the opposite direction: The intermediate norms $\|\cdot\|_{\mathbf{X}_k}$ in (4.49) are well defined and have the desired properties. Note that $\mathbf{X}_1=V_1$ and $\mathbf{X}_d=\mathbf{V}$; in particular, the norms $\|\cdot\|_{\mathbf{X}_d}$ and $\|\cdot\|$ coincide.

Theorem 4.107. Assume that $\|\cdot\|$ is a uniform crossnorm on $\mathbf{V}={}_{\|\cdot\|}\bigotimes_{j=1}^d V_j$. Then the definition (4.49) does not depend on the choice of $v^{(j)}\in V_j$, and the resulting norm $\|\cdot\|_{\mathbf{X}_k}$ is a uniform and reasonable crossnorm on ${}_{\|\cdot\|_{\mathbf{X}_k}}\bigotimes_{j=1}^k V_j$. Furthermore, $\|\cdot\|_{\mathbf{X}_k}$ is a reasonable crossnorm on $\mathbf{X}_{k-1}\otimes_{\|\cdot\|_{\mathbf{X}_k}}V_k$.
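The independence of definition (4.49) from the normalised vector can be observed in a simple case. The sketch assumes $d=3$, $k=2$, finite-dimensional factors with $\ell^1$ norms and the entrywise $\ell^1$ norm on the tensor space (the projective norm of Example 4.99, which is a uniform crossnorm by Proposition 4.97). The helper names are hypothetical, and exact rational arithmetic avoids rounding.

```python
from fractions import Fraction

def tensor_with_vector(X, v):
    """x (x) v for x given as a coefficient matrix: an order-3 array."""
    return [[[a * b for b in v] for a in row] for row in X]

def l1_norm(T):
    return sum(abs(a) for plane in T for row in plane for a in row)

X = [[1, -2], [3, 4]]   # x in X_2, not an elementary tensor (rank 2)

# Two different vectors of V_3 = R^3, both of l1 norm one.
v = [Fraction(1, 4), Fraction(-1, 2), Fraction(1, 4)]
w = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

norm_via_v = l1_norm(tensor_with_vector(X, v))
norm_via_w = l1_norm(tensor_with_vector(X, w))
print(norm_via_v, norm_via_w)
```

Both choices give the same value, namely the entrywise $\ell^1$ norm of $x$ itself.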
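The proof is in §4.3.3. Remark 4.105a yields the following corollary.

Corollary 4.108. Let $\alpha\subset\{1,\ldots,d\}$. As in Theorem 4.107 uniform crossnorms $\|\cdot\|_\alpha$ of $\mathbf{V}_\alpha={}_{\|\cdot\|_\alpha}\bigotimes_{j\in\alpha}V_j$ can be defined. It is also a reasonable crossnorm on $\mathbf{V}_\alpha=\mathbf{V}_{\alpha_1}\otimes\bigotimes_{k\in\alpha_2}V_k$ for all disjoint subsets $\alpha_1,\alpha_2\ne\emptyset$ with $\alpha=\alpha_1\,\dot\cup\,\alpha_2$, i.e.,

$$\Big\|x^{(\alpha_1)}\otimes\Big(\bigotimes_{k\in\alpha_2}v^{(k)}\Big)\Big\|_\alpha = \|x^{(\alpha_1)}\|_{\alpha_1}\prod_{k\in\alpha_2}\|v^{(k)}\|_k.$$

Lemma 4.109. Let $\mathbf{X}_{d-1}={}_{\|\cdot\|_{\mathbf{X}_{d-1}}}\bigotimes_{j=1}^{d-1}V_j$ be equipped with the norm $\|\cdot\|_{\mathbf{X}_{d-1}}$ defined in Theorem 4.107. All $\varphi^{(d)}\in V_d^*$ and $\psi\in\mathbf{X}_{d-1}^*$ satisfy

$$\big\|\big(I\otimes\ldots\otimes I\otimes\varphi^{(d)}\big)(v)\big\|_{\mathbf{X}_{d-1}} \le \|\varphi^{(d)}\|_d^*\,\|v\| \quad\text{and}\quad \|(\psi\otimes I)(v)\|_d \le \|\psi\|^*_{\mathbf{X}_{d-1}}\,\|v\| \quad\text{for all } v\in\mathbf{V}=\mathbf{X}_d. \tag{4.50}$$

Proof. Any $\psi\in\mathbf{X}_{d-1}^*$ satisfies $\psi\otimes\varphi^{(d)}=\psi\circ\big(I\otimes\ldots\otimes I\otimes\varphi^{(d)}\big)$ ($I=\mathrm{id}$). For $v_{[d]} := \big(I\otimes\ldots\otimes I\otimes\varphi^{(d)}\big)(v)$ there is a $\psi\in\mathbf{X}_{d-1}^*$ with $\|\psi\|^*_{\mathbf{X}_{d-1}}=1$ and $\psi(v_{[d]})=\|v_{[d]}\|_{\mathbf{X}_{d-1}}$ (cf. (4.10)). Hence,

$$\big\|(I\otimes\ldots\otimes I\otimes\varphi^{(d)})(v)\big\|_{\mathbf{X}_{d-1}} = \big|\psi\big((I\otimes\ldots\otimes I\otimes\varphi^{(d)})(v)\big)\big| = |(\psi\otimes\varphi^{(d)})(v)| \le \|\psi\otimes\varphi^{(d)}\|^*\,\|v\| \underset{(4.33d)}{=} \|\psi\|^*_{\mathbf{X}_{d-1}}\|\varphi^{(d)}\|_d^*\,\|v\| = \|\varphi^{(d)}\|_d^*\,\|v\|$$

proves the first inequality in (4.50). The second one can be proved analogously. □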
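4.3.2.3 Injective and Projective Norms

The norm $\|\cdot\|_\alpha$ of $\mathbf{V}_\alpha$ defines the operator norm $\|\cdot\|_{\alpha\leftarrow\alpha}$ of $L(\mathbf{V}_\alpha,\mathbf{V}_\alpha)$.

Definition 4.110 (strong uniformity). A crossnorm $\|\cdot\|$ of $\mathbf{V}$ is called strongly uniform if the induced norms $\|\cdot\|_\alpha$ ($\alpha\subset D=\{1,\ldots,d\}$) satisfy the following condition. For all disjoint decompositions $\alpha=\dot\bigcup_{\nu=1}^{\ell}\alpha_\nu$ and all $\phi^{(\alpha_\nu)}\in L(\mathbf{V}_{\alpha_\nu},\mathbf{V}_{\alpha_\nu})$ the norm $\|\cdot\|_{\alpha\leftarrow\alpha}$ satisfies the equality:

$$\Big\|\bigotimes_{\nu=1}^{\ell}\phi^{(\alpha_\nu)}\Big\|_{\alpha\leftarrow\alpha} = \prod_{\nu=1}^{\ell}\big\|\phi^{(\alpha_\nu)}\big\|_{\alpha_\nu\leftarrow\alpha_\nu}. \tag{4.51}$$

The injective and projective norms can easily be extended to the intermediate spaces $\mathbf{V}_\alpha$. They even satisfy the strongly uniform crossnorm property of Definition 4.110. The proof of the theorem is postponed to §4.3.3.

Theorem 4.111. Let $\mathbf{V} := {}_\wedge\bigotimes_{j=1}^d V_j$ [or, respectively, $\mathbf{V} := {}_\vee\bigotimes_{j=1}^d V_j$] be the standard projective [injective] tensor space. The norm $\|\cdot\|_\alpha$ of $\mathbf{V}_\alpha$ defined in Corollary 4.108 coincides with the projective [injective] norm $\|\cdot\|_{\wedge(V_j:j\in\alpha)}$ [$\|\cdot\|_{\vee(V_j:j\in\alpha)}$]. Both norms are strongly uniform crossnorms. In particular, the stronger crossnorm property (4.48c) holds for $\mathbf{V}_\alpha$, $\mathbf{V}_\beta$ as well as for $\mathbf{V}_\alpha^*$, $\mathbf{V}_\beta^*$. For a disjoint union $\gamma=\alpha\,\dot\cup\,\beta$, $\mathbf{V}_\gamma=\mathbf{V}_\alpha\otimes_\wedge\mathbf{V}_\beta$ [$\mathbf{V}_\gamma=\mathbf{V}_\alpha\otimes_\vee\mathbf{V}_\beta$] holds.

An immediate consequence is $\mathbf{X}_{k-1}=\mathbf{V}_\alpha$ with $\alpha=\{1,\ldots,k-1\}$ for the spaces defined in Theorem 4.107.

Another strongly uniform crossnorm is the induced Hilbert norm which will be introduced in §4.5.1.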
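The equality (4.51) can be checked numerically for the projective norm in the $\ell^1$ setting. The sketch below assumes $d=3$, $\alpha_1=\{1,2\}$, $\alpha_2=\{3\}$, finite index sets, and uses the standard fact that the operator norm on $(\mathbb{R}^n,\|\cdot\|_1)$ is the maximal absolute column sum; `op_norm_l1` and `kron` are hypothetical helper names. The operator on $\mathbf{V}_{\alpha_1}$ is deliberately not itself a Kronecker product.

```python
def op_norm_l1(A):
    """Operator norm on (R^n, ||.||_1): the maximal absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def kron(A, B):
    """Matrix of A (x) B in the product basis (second index runs fastest)."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

# Operator on V_alpha1 = l1(I_1 x I_2) with #I_1 = #I_2 = 2.
phi_alpha = [[1, 0, 2, -1],
             [0, 3, 1, 1],
             [2, -2, 0, 0],
             [1, 1, -4, 2]]
# Operator on V_alpha2 = l1(I_3) with #I_3 = 2.
phi_beta = [[1, -3],
            [2, 1]]

lhs = op_norm_l1(kron(phi_alpha, phi_beta))
rhs = op_norm_l1(phi_alpha) * op_norm_l1(phi_beta)
print(lhs, rhs)
```

The two printed numbers agree, since every column sum of the Kronecker product is a product of column sums of the factors.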
4.3.3 Proofs

Proof of Theorem 4.107. (i) It is sufficient to consider the case $k=d-1$ so that definition (4.49) becomes

$$\|x\|_{\mathbf{X}_{d-1}} := \|x\otimes v^{(d)}\| \quad\text{with } \|v^{(d)}\|_d=1.$$

There is some $\varphi^{(d)}\in V_d^*$ with $\|\varphi^{(d)}\|_d^*=1$ and $\varphi^{(d)}(v^{(d)})=1$. Let $w^{(d)}\in V_d$ with $\|w^{(d)}\|_d=1$ be another choice. Set $A^{(d)} := w^{(d)}\varphi^{(d)}\in L(V_d,V_d)$, i.e., $A^{(d)}v=\varphi^{(d)}(v)\,w^{(d)}$. Because $\|A^{(d)}\|_{V_d\leftarrow V_d}=\|\varphi^{(d)}\|_d^*\|w^{(d)}\|_d=1$, the uniform crossnorm property (4.45) with $\mathbf{A} := \bigotimes_{j=1}^d A^{(j)}$, where $A^{(j)} := I$ for $1\le j\le d-1$, implies that $\|x\otimes w^{(d)}\|=\|\mathbf{A}(x\otimes v^{(d)})\|\le\|x\otimes v^{(d)}\|$. Interchanging the roles of $w^{(d)}$ and $v^{(d)}$, we obtain $\|x\otimes v^{(d)}\|=\|x\otimes w^{(d)}\|$. Obviously, $\|\cdot\|_{\mathbf{X}_{d-1}}=\|\cdot\otimes v^{(d)}\|$ is a norm on $\mathbf{X}_{d-1}$.
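(ii) For $x := \bigotimes_{j=1}^{d-1}v^{(j)}$ we form $x\otimes v^{(d)}=\bigotimes_{j=1}^d v^{(j)}$ with some $\|v^{(d)}\|_d=1$. The crossnorm property of $\|\cdot\|$ implies the crossnorm property of $\|\cdot\|_{\mathbf{X}_{d-1}}$ on ${}_{\|\cdot\|_{\mathbf{X}_{d-1}}}\bigotimes_{j=1}^{d-1}V_j$ (use $\|v^{(d)}\|_d=1$):

$$\Big\|\bigotimes_{j=1}^{d-1}v^{(j)}\Big\|_{\mathbf{X}_{d-1}} = \|x\otimes v^{(d)}\| = \Big\|\bigotimes_{j=1}^{d}v^{(j)}\Big\| = \prod_{j=1}^{d}\|v^{(j)}\|_j = \prod_{j=1}^{d-1}\|v^{(j)}\|_j.$$

Similarly, the uniform crossnorm property can be shown:

$$\Big\|\Big(\bigotimes_{j=1}^{d-1}A^{(j)}\Big)x\Big\|_{\mathbf{X}_{d-1}} = \Big\|\Big(\big(A^{(1)}\otimes\ldots\otimes A^{(d-1)}\big)x\Big)\otimes v^{(d)}\Big\| = \Big\|\big(A^{(1)}\otimes\ldots\otimes A^{(d-1)}\otimes I\big)\big(x\otimes v^{(d)}\big)\Big\|$$

$$\le \Big(\prod_{j=1}^{d-1}\|A^{(j)}\|_{V_j\leftarrow V_j}\Big)\underbrace{\|I\|_{V_d\leftarrow V_d}}_{=1}\,\|x\otimes v^{(d)}\| = \Big(\prod_{j=1}^{d-1}\|A^{(j)}\|_{V_j\leftarrow V_j}\Big)\|x\|_{\mathbf{X}_{d-1}}.$$

As a consequence, by Lemma 4.87, $\|\cdot\|_{\mathbf{X}_{d-1}}$ is also a reasonable crossnorm.

(iii) Now we consider $\mathbf{V}$ as the tensor space $\mathbf{X}_{d-1}\otimes_a V_d$ (interpretation (ii) of Remark 4.104). Let $v := x\otimes w^{(d)}$ with $x\in\mathbf{X}_{d-1}$ and $0\ne w^{(d)}\in V_d$. Set $v^{(d)} := w^{(d)}/\|w^{(d)}\|_d$. Then

$$\|v\| = \|x\otimes w^{(d)}\| = \|w^{(d)}\|_d\,\|x\otimes v^{(d)}\| = \|x\|_{\mathbf{X}_{d-1}}\|w^{(d)}\|_d$$

follows by the definition (4.49) of $\|x\|_{\mathbf{X}_{d-1}}$. This proves that $\|\cdot\|$ is a crossnorm on $\mathbf{X}_{d-1}\otimes_a V_d$.

Since $\|\cdot\|$ is not necessarily uniform on $\mathbf{X}_{d-1}\otimes_a V_d$, we need another argument to prove that $\|\cdot\|$ is a reasonable crossnorm on $\mathbf{X}_{d-1}\otimes V_d$. Let $\psi\in\mathbf{X}_{d-1}^*$ and $\varphi^{(d)}\in V_d^*$. We need to prove that $\|\psi\otimes\varphi^{(d)}\|^*=\|\psi\|^*_{\mathbf{X}_{d-1}}\|\varphi^{(d)}\|_d^*$. Using the crossnorm property for an elementary tensor $v=x\otimes v^{(d)}$, we get

$$\frac{|(\psi\otimes\varphi^{(d)})(v)|}{\|v\|} = \frac{|\psi(x)|}{\|x\|_{\mathbf{X}_{d-1}}}\cdot\frac{|\varphi^{(d)}(v^{(d)})|}{\|v^{(d)}\|_d} \le \|\psi\|^*_{\mathbf{X}_{d-1}}\|\varphi^{(d)}\|_d^* \quad\text{if } v=x\otimes v^{(d)}\ne 0. \tag{4.52}$$

Taking the supremum over all $v=x\otimes v^{(d)}\ne 0$ ($x\in\mathbf{X}_{d-1}$), we obtain

$$\|\psi\otimes\varphi^{(d)}\|^* = \sup_{v\in\mathbf{X}_d}\frac{|(\psi\otimes\varphi^{(d)})(v)|}{\|v\|} \ge \sup_{v=x\otimes v^{(d)}}\frac{|(\psi\otimes\varphi^{(d)})(v)|}{\|v\|} = \|\psi\|^*_{\mathbf{X}_{d-1}}\|\varphi^{(d)}\|_d^*.$$

Define the operator $\mathbf{A} := \bigotimes_{j=1}^d A^{(j)}\in L(\mathbf{V},\mathbf{V})$ by $A^{(j)}=I$ ($1\le j\le d-1$) and $A^{(d)}=\hat v^{(d)}\varphi^{(d)}$ with $0\ne\hat v^{(d)}\in V_d$. Then $\mathbf{A}v$ is an elementary vector of the form $x\otimes\hat v^{(d)}$ ($x\in\mathbf{X}_{d-1}$), and $\|A^{(d)}\|_{V_d\leftarrow V_d}=\|\hat v^{(d)}\|_d\|\varphi^{(d)}\|_d^*$ holds. This fact and the crossnorm property $\|\mathbf{A}v\|\le\|\hat v^{(d)}\|_d\|\varphi^{(d)}\|_d^*\|v\|$ lead us to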
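$$\|\psi\|^*_{\mathbf{X}_{d-1}}\|\varphi^{(d)}\|_d^* \underset{(4.52)}{\ge} \frac{|(\psi\otimes\varphi^{(d)})(\mathbf{A}v)|}{\|\mathbf{A}v\|} \ge \frac{|(\psi\otimes\varphi^{(d)})(\mathbf{A}v)|}{\|\hat v^{(d)}\|_d\,\|\varphi^{(d)}\|_d^*\,\|v\|}.$$

Since $(\psi\otimes\varphi^{(d)})(\mathbf{A}v)=(\psi\otimes(\varphi^{(d)}A^{(d)}))(v)=\varphi^{(d)}(\hat v^{(d)})\cdot(\psi\otimes\varphi^{(d)})(v)$, the estimate can be continued by

$$\|\psi\|^*_{\mathbf{X}_{d-1}}\|\varphi^{(d)}\|_d^* \ge \frac{|\varphi^{(d)}(\hat v^{(d)})|}{\|\hat v^{(d)}\|_d\,\|\varphi^{(d)}\|_d^*}\cdot\frac{|(\psi\otimes\varphi^{(d)})(v)|}{\|v\|} \quad\text{for all } 0\ne\hat v^{(d)}\in V_d.$$

Since $\sup_{\hat v^{(d)}\ne 0}|\varphi^{(d)}(\hat v^{(d)})|/\|\hat v^{(d)}\|_d=\|\varphi^{(d)}\|_d^*$ it follows that

$$\frac{|(\psi\otimes\varphi^{(d)})(v)|}{\|v\|} \le \|\psi\|^*_{\mathbf{X}_{d-1}}\|\varphi^{(d)}\|_d^* \quad\text{for all } v\in\mathbf{V},$$

so that $\|\psi\otimes\varphi^{(d)}\|^*\le\|\psi\|^*_{\mathbf{X}_{d-1}}\|\varphi^{(d)}\|_d^*$. Together with the opposite inequality from above, we have proved $\|\psi\otimes\varphi^{(d)}\|^*=\|\psi\|^*_{\mathbf{X}_{d-1}}\|\varphi^{(d)}\|_d^*$. □

Proof of Theorem 4.111 (case of projective norm). (i) Define $\mathbf{V}_\alpha={}_\wedge\bigotimes_{j\in\alpha}V_j$ with the projective norm $\|\cdot\|_\alpha := \|\cdot\|_{\wedge(V_j:j\in\alpha)}$. Endow $\mathbf{W} := \mathbf{V}_\alpha\otimes_\wedge\mathbf{V}_\beta$ for disjoint $\alpha,\beta\subset D$ with the projective norm $\|\cdot\|_{\mathbf{W}} := \|\cdot\|_{\wedge(\mathbf{V}_\alpha,\mathbf{V}_\beta)}$, while $\mathbf{V}_\gamma$ for $\gamma := \alpha\cup\beta$ and its norm are defined as above. We want to prove $\mathbf{W}=\mathbf{V}_\gamma$ and $\|\cdot\|_{\mathbf{W}}=\|\cdot\|_\gamma$.

Choose any $v\in\mathbf{V}_\gamma$ and $\varepsilon>0$. There is a representation $v=\sum_\nu\bigotimes_{j\in\gamma}v_\nu^{(j)}$ with $\|v\|_\gamma\ge\sum_\nu\prod_{j\in\gamma}\|v_\nu^{(j)}\|_j-\varepsilon$. Set $x_\nu^{(\alpha)} := \bigotimes_{j\in\alpha}v_\nu^{(j)}$ and $x_\nu^{(\beta)} := \bigotimes_{j\in\beta}v_\nu^{(j)}$. Since $v=\sum_\nu x_\nu^{(\alpha)}\otimes x_\nu^{(\beta)}$, the inequality

$$\|v\|_{\mathbf{W}} \le \sum_\nu\|x_\nu^{(\alpha)}\|_\alpha\|x_\nu^{(\beta)}\|_\beta = \sum_\nu\prod_{j\in\gamma}\|v_\nu^{(j)}\|_j \le \|v\|_\gamma+\varepsilon$$

holds for all $\varepsilon>0$. Hence, $\|v\|_{\mathbf{W}}\le\|v\|_\gamma$.

(ii) For the reverse inequality choose a representation $v=\sum_\nu x_\nu^{(\alpha)}\otimes x_\nu^{(\beta)}$ with $\|v\|_{\mathbf{W}}\ge\sum_\nu\|x_\nu^{(\alpha)}\|_\alpha\|x_\nu^{(\beta)}\|_\beta-\varepsilon$. Without loss of generality, the factors are equally scaled: $\omega_\nu := \|x_\nu^{(\alpha)}\|_\alpha=\|x_\nu^{(\beta)}\|_\beta$, i.e., $\|v\|_{\mathbf{W}}\ge\sum_\nu\omega_\nu^2-\varepsilon$. Choose representations $x_\nu^{(\alpha)}=\sum_\mu\bigotimes_{j\in\alpha}x_{\nu\mu}^{(j)}$ and $x_\nu^{(\beta)}=\sum_\chi\bigotimes_{j\in\beta}x_{\nu\chi}^{(j)}$ with

$$\|x_\nu^{(\alpha)}\|_\alpha \ge \sum_\mu\prod_{j\in\alpha}\|x_{\nu\mu}^{(j)}\|_j-2^{-\nu}\varepsilon \quad\text{and}\quad \|x_\nu^{(\beta)}\|_\beta \ge \sum_\chi\prod_{j\in\beta}\|x_{\nu\chi}^{(j)}\|_j-2^{-\nu}\varepsilon.$$

Then the representation $v=\sum_{\nu,\mu,\chi}\bigotimes_{j\in\alpha}x_{\nu\mu}^{(j)}\otimes\bigotimes_{j\in\beta}x_{\nu\chi}^{(j)}$ yields

$$\|v\|_\gamma \le \sum_{\nu,\mu,\chi}\prod_{j\in\alpha}\|x_{\nu\mu}^{(j)}\|_j\prod_{j\in\beta}\|x_{\nu\chi}^{(j)}\|_j = \sum_\nu\Big[\sum_\mu\prod_{j\in\alpha}\|x_{\nu\mu}^{(j)}\|_j\Big]\Big[\sum_\chi\prod_{j\in\beta}\|x_{\nu\chi}^{(j)}\|_j\Big]$$

$$\le \sum_\nu\big(\omega_\nu+2^{-\nu}\varepsilon\big)^2 = \sum_\nu\big(\omega_\nu^2+2^{1-\nu}\varepsilon\,\omega_\nu+2^{-2\nu}\varepsilon^2\big) \le \|v\|_{\mathbf{W}}+O\big(\varepsilon+\varepsilon\sqrt{\|v\|_{\mathbf{W}}}\big)$$

for all $\varepsilon>0$, proving $\|v\|_\gamma\le\|v\|_{\mathbf{W}}$. In the last line we use the Schwarz inequality $\sum_\nu 2^{1-\nu}\omega_\nu\le\sqrt{\sum_\nu 2^{2(1-\nu)}}\,\sqrt{\sum_\nu\omega_\nu^2}$.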
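(iii) The special choice $\alpha=\{1,\ldots,k-1\}$ and $\beta=\{k\}$ shows that (4.47) holds with $\mathbf{X}_\ell={}_\wedge\bigotimes_{j=1}^{\ell}V_j$ ($\ell=k-1,k$).

(iv) Considering $\mathbf{V}_\gamma$ for $\gamma=\alpha\cup\beta$ as the projective tensor space $\mathbf{V}_\alpha\otimes_\wedge\mathbf{V}_\beta$ of order 2, Proposition 4.85 proves $\|\phi^{(\alpha)}\otimes\phi^{(\beta)}\|_{\gamma\leftarrow\gamma}=\|\phi^{(\alpha)}\|_{\alpha\leftarrow\alpha}\|\phi^{(\beta)}\|_{\beta\leftarrow\beta}$. Recursive application of this result yields (4.51). □

Proof of Theorem 4.111 (case of injective norm). (i) Define $\mathbf{V}_\alpha={}_\vee\bigotimes_{j\in\alpha}V_j$ with the injective norm $\|\cdot\|_\alpha := \|\cdot\|_{\vee(V_j:j\in\alpha)}$. Endow $\mathbf{W} := \mathbf{V}_\alpha\otimes_\vee\mathbf{V}_\beta$ for disjoint $\alpha,\beta\subset D$ with the injective norm $\|\cdot\|_{\mathbf{W}} := \|\cdot\|_{\vee(\mathbf{V}_\alpha,\mathbf{V}_\beta)}$, while $\mathbf{V}_\gamma$ for $\gamma := \alpha\cup\beta$ and its norm are defined as above. We want to prove $\mathbf{W}=\mathbf{V}_\gamma$ and $\|\cdot\|_{\mathbf{W}}=\|\cdot\|_\gamma$.

Since elementary tensors of $\bigotimes_{j\in\alpha}V_j^*$ form a subset of $\mathbf{V}_\alpha^*$ (similar for $\beta$) we have

$$\|v\|_{\mathbf{W}} = \sup_{\Phi\in\mathbf{V}_\alpha^*,\,\Psi\in\mathbf{V}_\beta^*}\frac{|(\Phi\otimes\Psi)v|}{\|\Phi\|_\alpha^*\|\Psi\|_\beta^*} \ge \|v\|_\gamma.$$

(ii) For the reverse inequality consider an arbitrary tensor $v=\sum_i x_i^{(\alpha)}\otimes x_i^{(\beta)}\in\mathbf{V}_\alpha\otimes_a\mathbf{V}_\beta$. For any $\Phi\in\mathbf{V}_\alpha^*$ and $\Psi\in\mathbf{V}_\beta^*$ set $\lambda_i := \Psi(x_i^{(\beta)})/\|\Psi\|_\beta^*\in\mathbb{K}$. Then

$$\frac{|(\Phi\otimes\Psi)v|}{\|\Phi\|_\alpha^*\|\Psi\|_\beta^*} = \frac{\big|(\Phi\otimes\Psi)\sum_i x_i^{(\alpha)}\otimes x_i^{(\beta)}\big|}{\|\Phi\|_\alpha^*\|\Psi\|_\beta^*} = \frac{\big|\sum_i\Phi(x_i^{(\alpha)})\Psi(x_i^{(\beta)})\big|}{\|\Phi\|_\alpha^*\|\Psi\|_\beta^*} = \frac{\big|\sum_i\Phi(\lambda_i x_i^{(\alpha)})\big|}{\|\Phi\|_\alpha^*} = \frac{\big|\Phi\big(\sum_i\lambda_i x_i^{(\alpha)}\big)\big|}{\|\Phi\|_\alpha^*} \le \Big\|\sum_i\lambda_i x_i^{(\alpha)}\Big\|_\alpha.$$

According to the definition of $\|\cdot\|_\alpha$, for any $\varepsilon>0$ there are $\varphi^{(j)}\in V_j^*$ with $\|\varphi^{(j)}\|_j^*=1$ such that $\big\|\sum_i\lambda_i x_i^{(\alpha)}\big\|_\alpha\le\big|\big(\bigotimes_{j\in\alpha}\varphi^{(j)}\big)\sum_i\lambda_i x_i^{(\alpha)}\big|+\varepsilon$. Inserting $\lambda_i=\Psi(x_i^{(\beta)})/\|\Psi\|_\beta^*$ and setting $\mu_i=\big(\bigotimes_{j\in\alpha}\varphi^{(j)}\big)x_i^{(\alpha)}$, we arrive at

$$\frac{|(\Phi\otimes\Psi)v|}{\|\Phi\|_\alpha^*\|\Psi\|_\beta^*} \le \Big\|\sum_i\lambda_i x_i^{(\alpha)}\Big\|_\alpha \le \Big|\sum_i\mu_i\Psi(x_i^{(\beta)})\Big|\Big/\|\Psi\|_\beta^*+\varepsilon \le \Big|\Psi\Big(\sum_i\mu_i x_i^{(\beta)}\Big)\Big|\Big/\|\Psi\|_\beta^*+\varepsilon \le \Big\|\sum_i\mu_i x_i^{(\beta)}\Big\|_\beta+\varepsilon.$$

Again there are $\varphi^{(j)}\in V_j^*$ with $\|\varphi^{(j)}\|_j^*=1$ for $j\in\beta$ so that

$$\Big\|\sum_i\mu_i x_i^{(\beta)}\Big\|_\beta+\varepsilon \le \Big|\Big(\bigotimes_{j\in\beta}\varphi^{(j)}\Big)\sum_i\mu_i x_i^{(\beta)}\Big|+2\varepsilon.$$

Using $\mu_i=\big(\bigotimes_{j\in\alpha}\varphi^{(j)}\big)x_i^{(\alpha)}$, we conclude that $\frac{|(\Phi\otimes\Psi)v|}{\|\Phi\|_\alpha^*\|\Psi\|_\beta^*}$ is bounded by

$$\Big|\Big(\bigotimes_{j\in\gamma}\varphi^{(j)}\Big)\sum_i x_i^{(\alpha)}\otimes x_i^{(\beta)}\Big|+2\varepsilon = \Big|\Big(\bigotimes_{j\in\gamma}\varphi^{(j)}\Big)v\Big|+2\varepsilon \le \|v\|_\gamma+2\varepsilon.$$

Since this inequality holds for all $\varepsilon>0$, we obtain $\|v\|_{\mathbf{W}}=\sup_{\Phi,\Psi}\frac{|(\Phi\otimes\Psi)v|}{\|\Phi\|_\alpha^*\|\Psi\|_\beta^*}\le\|v\|_\gamma$. Together with part (i), $\|v\|_{\mathbf{W}}=\|v\|_\gamma$ follows. The further conclusions are as in the proof for the projective case. □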
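4.3.4 Embedding into $L(\mathbf{V},V_j)$ and $L(\mathbf{V},\mathbf{V}_\alpha)$

Let $\varphi\in\bigotimes_{k\ne j}V_k^*$ denote a functional on $\mathbf{V}_{[j]} := {}_a\bigotimes_{k\ne j}V_k$. Usually we identify $\varphi$ with the induced map $\mathbf{V}\to V_j$ (cf. Remark 3.64b). To be quite precise, we now use the symbol $\varphi$ for $\bigotimes_{k\ne j}\varphi_k$ and $\Phi=\varphi\otimes\mathrm{id}_j$ for the map of $\mathbf{V}$ into $V_j$. If $\Phi$ is continuous, i.e., $\Phi\in L(\mathbf{V},V_j)$, then the composition of some $\varphi_j\in V_j^*$ and $\Phi$ is again continuous. Since the composition $\varphi_j\circ\Phi=\bigotimes_{k=1}^d\varphi_k$ is a functional on $\mathbf{V}$, continuity of $\varphi_j\circ\Phi$ means that $\varphi_j\circ\Phi\in\mathbf{V}^*$ and

$$\|\varphi_j\circ\Phi\|_{\mathbf{V}^*} \le \|\varphi_j\|_j^*\,\|\Phi\|_{V_j\leftarrow\mathbf{V}}. \tag{4.53}$$

Part (b) of the next lemma shows that also the reverse statement is true.

Lemma 4.112. (a) $\varphi_j\in V_j^*$ and $\Phi=\varphi\otimes\mathrm{id}_j\in L(\mathbf{V},V_j)$ imply $\varphi_j\circ\Phi\in\mathbf{V}^*$.
(b) Let $\Phi=\varphi\otimes\mathrm{id}_j:\mathbf{V}\to V_j$ be a linear map. If $\varphi_j\circ\Phi\in\mathbf{V}^*$ for all $\varphi_j\in V_j^*$, then $\Phi\in L(\mathbf{V},V_j)$.

Proof. Part (a) is proved above. For Part (b) let $v\in\mathbf{V}$ be arbitrary. Define $v_j := \Phi v$. The norm of $v_j$ is $\|v_j\|_j=\varphi_j(v_j)=(\varphi_j\circ\Phi)(v)\le\|\varphi_j\circ\Phi\|_{\mathbf{V}^*}\|v\|$ for a suitable $\varphi_j\in V_j^*$ with $\|\varphi_j\|_j^*=1$ (cf. (4.10)). This proves $\Phi\in L(\mathbf{V},V_j)$ with the norm $\|\Phi\|_{V_j\leftarrow\mathbf{V}}\le\|\varphi_j\circ\Phi\|_{\mathbf{V}^*}$. □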
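4.3.4.1 Basic Requirement I

Now we give a reformulation of the previous statements. We rearrange the order in $\bigotimes_{k=1}^d V_k^*$ to get $V_j^*\otimes V_1^*\otimes\ldots\otimes V_{j-1}^*\otimes V_{j+1}^*\otimes\ldots\otimes V_d^*$ and require that the tensor product mapping

$$\otimes : V_j^*\times V_1^*\times\ldots\times V_{j-1}^*\times V_{j+1}^*\times\ldots\times V_d^* \to \mathbf{V}^* \quad\text{be continuous.} \tag{4.54a}$$

Note that (4.54a) coincides with (4.43). Let $\mathbf{V}_\vee := {}_\vee\bigotimes_{k=1}^d V_k$ be the Banach tensor space with the injective norm $\|\cdot\|_\vee=\|\cdot\|_{\vee(V_1,\ldots,V_d)}$.

Lemma 4.113. (a) The following statements are equivalent: (i) property (4.54a), (ii) $\|\cdot\|_\vee\lesssim\|\cdot\|$, and (iii) the inclusion $\mathbf{V}\subset\mathbf{V}_\vee$ of the Banach tensor spaces.
(b) If $\|\cdot\|_\vee=\|\cdot\|$ (and hence $\mathbf{V}=\mathbf{V}_\vee$), then $\varphi=\bigotimes_{k\ne j}\varphi_k$ gives rise to $\Phi=\varphi\otimes\mathrm{id}_j\in L(\mathbf{V},V_j)$ and

$$\|\Phi\|_{V_j\leftarrow\mathbf{V}} = \prod_{k\in\{1,\ldots,d\}\setminus\{j\}}\|\varphi_k\|_k^*.$$

(c) Under condition (4.54a), any $\varphi\in{}_a\bigotimes_{k\ne j}V_k^*$ leads to $\Phi\in L(\mathbf{V},V_j)$ with $\|\Phi\|_{V_j\leftarrow\mathbf{V}}\le C\,\|\Phi\|_{V_j\leftarrow\mathbf{V}_\vee}<\infty$ and $C$ from $\|\cdot\|_\vee\le C\,\|\cdot\|$.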
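Proof. (a) Since $\|\cdot\|_\vee$ is the weakest norm satisfying (4.54a) (cf. Remark 4.98c), we obtain '(i)⇒(ii)'. Remark 4.2 shows that '(ii)⇒(iii)'. $\mathbf{V}\subset\mathbf{V}_\vee$ implies the continuous embedding $\mathbf{V}_\vee^*\subset\mathbf{V}^*$ (cf. Lemma 4.25). (4.54a) holds for the injective norm, i.e., with $\mathbf{V}_\vee^*$ on the right-hand side, while $\mathbf{V}_\vee^*\subset\mathbf{V}^*$ leads to (4.54a). Hence, also '(iii)⇒(i)' is proved.

(b) Repeat the proof of Lemma 4.112b and use that, for $\|\varphi_j\|_j^*=1$,

$$\|\varphi_j\circ\Phi\|_\vee^* = \Big\|\bigotimes_{k=1}^d\varphi_k\Big\|_\vee^* = \prod_{k=1}^d\|\varphi_k\|_k^* = \prod_{k\ne j}\|\varphi_k\|_k^*.$$

(c) $\varphi$ is a (finite) linear combination of elementary tensors $\bigotimes_{k\ne j}\varphi_k$ from part (a). Part (b) shows that $\|\Phi\|_{V_j\leftarrow\mathbf{V}_\vee}$ is bounded. Hence $\|\cdot\|_\vee\le C\,\|\cdot\|$ proves $\|\Phi\|_{V_j\leftarrow\mathbf{V}}\le\|\Phi\|_{V_j\leftarrow\mathbf{V}_\vee}\|\mathrm{id}\|_{\mathbf{V}_\vee\leftarrow\mathbf{V}}\le C\,\|\Phi\|_{V_j\leftarrow\mathbf{V}_\vee}$. □

A quantitative statement about $\|\Phi\|_{V_j\leftarrow\mathbf{V}}$ can be given if we define a norm $\|\cdot\|_{[j]}^*$ on $\mathbf{V}_{[j]}^*=(\bigotimes_{k\ne j}V_k)^*$. Assuming a uniform crossnorm on $\mathbf{V}={}_{\|\cdot\|}\bigotimes_{j=1}^d V_j$, we can introduce the auxiliary norms $\|\cdot\|_{\mathbf{V}_{[j]}}=\|\cdot\|_{[j]}$ as defined in Remark 4.105a for $\alpha=D\setminus\{j\}$. The continuous dual space is $\mathbf{V}_{[j]}^*=\big((\bigotimes_{k\ne j}V_k)^*,\|\cdot\|_{[j]}^*\big)$.

Exercise 4.114. The algebraic space ${}_a\bigotimes_{k\ne j}V_k^*$ is a subspace of $\mathbf{V}_{[j]}^*$ from above. Hint: use Lemma 4.95a.

As a consequence, the Banach space ${}_{\|\cdot\|_{[j]}^*}\bigotimes_{k\ne j}V_k^*$ is a subspace of $\mathbf{V}_{[j]}^*$. The spaces even coincide if $\mathbf{V}_{[j]}$ is reflexive (cf. Lemma 4.102).

Theorem 4.115. Choose the tensor norms as explained above. $\varphi\in{}_{\|\cdot\|_{[j]}^*}\bigotimes_{k\ne j}V_k^*$ or, more general, $\varphi\in\big({}_{\|\cdot\|_{[j]}}\bigotimes_{k\ne j}V_k\big)^*$ can be regarded as a continuous map $\Phi=\varphi\otimes\mathrm{id}_j:\mathbf{V}\to V_j$. Moreover, the norms of $\varphi$ and $\Phi$ coincide:

$$\|\varphi\|_{[j]}^* = \|\Phi\|_{V_j\leftarrow\mathbf{V}}.$$

Proof. (a) The second inequality in (4.50) (with $\mathbf{V}_{[j]}$ instead of $\mathbf{X}_{d-1}$) implies $\|\Phi\|_{V_j\leftarrow\mathbf{V}}=\sup_{0\ne v\in\mathbf{V}}\|\Phi(v)\|_j/\|v\|\le\|\varphi\|_{[j]}^*$.

(b) The definition $\|x\|_{[j]}=\|x\otimes v^{(j)}\|$ of the norm (4.49) involves some $v^{(j)}\in V_j$ with $\|v^{(j)}\|_j=1$. Choose $\psi^{(j)}\in V_j^*$ with the property $\|\psi^{(j)}\|_j^*=\psi^{(j)}(v^{(j)})=1$. Then we have

$$\frac{|\varphi(x)|}{\|x\|_{[j]}} = \frac{|\varphi(x)|}{\|x\otimes v^{(j)}\|} = \frac{|\varphi(x)|\cdot|\psi^{(j)}(v^{(j)})|}{\|x\otimes v^{(j)}\|} = \frac{|(\varphi\otimes\psi^{(j)})(x\otimes v^{(j)})|}{\|x\otimes v^{(j)}\|} \le \sup_{0\ne v\in\mathbf{V}}\frac{|(\varphi\otimes\psi^{(j)})(v)|}{\|v\|} = \|\varphi\otimes\psi^{(j)}\|^*.$$

$\varphi\otimes\psi^{(j)}$ can be written as the composition $\psi^{(j)}\circ(\varphi\otimes\mathrm{id}_j)=\psi^{(j)}\circ\Phi$ so that

$$\|\varphi\otimes\psi^{(j)}\|^* = \|\psi^{(j)}\circ\Phi\|_{\mathbb{K}\leftarrow\mathbf{V}} \le \|\psi^{(j)}\|_{\mathbb{K}\leftarrow V_j}\,\|\Phi\|_{V_j\leftarrow\mathbf{V}} = \|\psi^{(j)}\|_j^*\,\|\Phi\|_{V_j\leftarrow\mathbf{V}} = \|\Phi\|_{V_j\leftarrow\mathbf{V}}.$$

Together with the previous inequality, $|\varphi(x)|/\|x\|_{[j]}\le\|\Phi\|_{V_j\leftarrow\mathbf{V}}$ is proved. The supremum over all $x$ leads us to the reverse inequality $\|\varphi\|_{[j]}^*\le\|\Phi\|_{V_j\leftarrow\mathbf{V}}$. □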
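The norm identity of Theorem 4.115 can be illustrated in a finite-dimensional $\ell^1$ setting. The sketch below assumes $d=3$, $j=3$, $\mathbf{V}=\ell^1(I_1\times I_2\times I_3)$ with finite index sets, so that $\|\cdot\|_{[j]}$ is the $\ell^1$ norm on $I_1\times I_2$ and its dual norm is the maximum norm. The functional $\varphi$ is a general (non-elementary) array `P`, and all names are hypothetical. The operator norm of $\Phi$ is evaluated on unit tensors, the extreme points of the $\ell^1$ unit ball, where the convex map $v\mapsto\|\Phi v\|_1$ attains its maximum.

```python
P = [[2, -5], [1, 3]]   # phi as a bounded array on I_1 x I_2

def Phi(T):
    """Phi = phi (x) id_3: contract the first two indices of T against P."""
    return [sum(P[i][j] * T[i][j][k] for i in range(2) for j in range(2))
            for k in range(len(T[0][0]))]

def l1_tensor(T):
    return sum(abs(a) for plane in T for row in plane for a in row)

def l1_vector(u):
    return sum(abs(a) for a in u)

def unit(i, j, k, n3=3):
    """Unit tensor e_i (x) e_j (x) e_k in a 2 x 2 x n3 array."""
    return [[[1 if (a, b, c) == (i, j, k) else 0 for c in range(n3)]
             for b in range(2)] for a in range(2)]

dual_norm_phi = max(abs(a) for row in P for a in row)
op_norm_Phi = max(l1_vector(Phi(unit(i, j, k)))
                  for i in range(2) for j in range(2) for k in range(3))

T = [[[1, -2, 0], [3, 1, 1]], [[0, 4, -1], [2, 2, 5]]]
print(dual_norm_phi, op_norm_Phi, l1_vector(Phi(T)), l1_tensor(T))
```

The first two printed values coincide, and the remaining two show the bound $\|\Phi(v)\|_j\le\|\varphi\|_{[j]}^*\|v\|$ for one sample tensor.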
4.3.4.2 Modified Requirement I

So far we assumed that $\mathbf{V}$ is equipped with a norm $\|\cdot\|$ satisfying $\|\cdot\|_\vee\lesssim\|\cdot\|$. However, there are norms of practical importance which are weaker than $\|\cdot\|_\vee$. Then the tensor $\bigotimes_{k\ne j}\varphi^{(k)}$ of functionals $\varphi^{(k)}\in V_k^*$ may be an unbounded map of ${}_a\bigotimes_{k\ne j}V_k$ into $V_j$. Since, in this case, the injective norm $\|\cdot\|_\vee$ is too strong, we define a weaker norm as follows. In (4.54a) the norms of $V_j$ and $\mathbf{V}$ remain fixed, while we replace all $(V_k,\|\cdot\|_k)$ with $k\ne j$ by larger Banach spaces $(V_{0,k},\|\cdot\|_{0,k})$ with a weaker norm $\|\cdot\|_{0,k}\lesssim\|\cdot\|_k$, i.e.,

$$V_k\subset V_{0,k} \quad\text{be continuously embedded for } k\ne j$$

(cf. Definition 4.17). §4.3.5 will show examples of practical importance involving weaker spaces $V_{0,k}$. Then Theorem 4.126 yields (4.54b). Requirement (4.54a) becomes

$$\otimes : V_j^*\times V_{0,1}^*\times\ldots\times V_{0,j-1}^*\times V_{0,j+1}^*\times\ldots\times V_{0,d}^* \to \mathbf{V}^* \quad\text{be continuous.} \tag{4.54b}$$

Set $\|\cdot\|_{0,\vee}=\|\cdot\|_{\vee(V_j,V_{0,1},\ldots,V_{0,j-1},V_{0,j+1},\ldots,V_{0,d})}$. In spite of $\|\cdot\|_\vee\not\lesssim\|\cdot\|$, the estimate $\|\cdot\|_{0,\vee}\lesssim\|\cdot\|$ might be true. The latter inequality is equivalent to (4.54b). Set $\mathbf{V}_{0,[j]} := {}_\vee\bigotimes_{k\ne j}V_{0,k}$. Theorem 4.111 yields $\mathbf{V}_0=V_j\otimes_\vee\mathbf{V}_{0,[j]}$. The statement analogous to Lemma 4.113 is as follows.

Lemma 4.116. (a) The following three requirements are equivalent: (4.54b), $\mathbf{V}\subset\mathbf{V}_0$, and $\|\cdot\|_{0,\vee}\lesssim\|\cdot\|$.
(b) Under condition (4.54b), any $\varphi\in{}_a\bigotimes_{k\ne j}V_{0,k}^*$ corresponds to the continuous map $\Phi=\varphi\otimes\mathrm{id}:\mathbf{V}\to V_j$, i.e., $\Phi\in L(\mathbf{V},V_j)$.
4.3.4.3 Weak Convergence

Theorem 4.117. (a) For $v_n,v\in\mathbf{V}$ assume $v_n\rightharpoonup v$. Let

$$\varphi=\bigotimes_{k\ne j}\varphi_k\in{}_a\bigotimes_{k\ne j}V_k^*$$

and define $u_n := \varphi(v_n)\in V_j$ and $u := \varphi(v)\in V_j$. Under condition (4.54a), $u_n$ converges weakly to $u$: $u_n\rightharpoonup u$.
(b) The same result holds for $\varphi\in{}_a\bigotimes_{k\ne j}V_{0,k}^*$ under condition (4.54b).
Proof. (a) $u_n\rightharpoonup u$ is equivalent to $\varphi_j(u_n)\to\varphi_j(u)$ for all $\varphi_j\in V_j^*$. Since $\varphi_j\circ\Phi$ in $\varphi_j(u_n)=(\varphi_j\circ\Phi)v_n$ belongs to $\mathbf{V}^*$ according to (4.54a), the assumption $v_n\rightharpoonup v$ shows that $\varphi_j(u_n)=(\varphi_j\circ\Phi)v_n\to(\varphi_j\circ\Phi)v=\varphi_j(u)$.

(b) Part (b) is completely analogous. □
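4.3.4.4 Basic Requirement II

Now we replace $V_j$ by $\mathbf{V}_\alpha$ for some nonempty subset $\alpha\subset\{1,\ldots,d\}$ with nonempty complement $\alpha^c$. $\mathbf{V}_\alpha$ must be equipped with some norm $\|\cdot\|_\alpha$. Requirements (4.54a) and (4.54b) become

$$\otimes : \mathbf{V}_\alpha^*\times\mathop{\times}_{k\in\alpha^c}V_k^* \to \mathbf{V}^* \quad\text{be continuous,} \tag{4.54c}$$

$$\otimes : \mathbf{V}_\alpha^*\times\mathop{\times}_{k\in\alpha^c}V_{0,k}^* \to \mathbf{V}^* \quad\text{be continuous for } V_{0,k}\supset V_k. \tag{4.54d}$$

Lemma 4.118. Assume that the norm $\|\cdot\|$ of $\mathbf{V}$ is not weaker than the injective norm $\|\cdot\|_{\vee(V_1,\ldots,V_d)}$. Then statement (4.54c) holds with $\mathbf{V}_\alpha^*$ as described in the proof.

Proof. (i) Define $\mathbf{V}_\alpha={}_\vee\bigotimes_{k\in\alpha}V_k$ by means of the injective norm $\|\cdot\|_{\vee(V_k:k\in\alpha)}$, while $\mathbf{V}_\vee={}_\vee\bigotimes_{j=1}^d V_j$. Then (4.54c) with $\mathbf{V}_\vee^*$ instead of $\mathbf{V}^*$ is stated in Theorem 4.111. If the norm of $\mathbf{V}$ is a uniform crossnorm, one may alternatively define the norm of $\mathbf{V}_\alpha$ (and therefore of $\mathbf{V}_\alpha^*$) as in §4.3.2.2.

(ii) $\|\cdot\|\gtrsim\|\cdot\|_\vee$ implies $\mathbf{V}\subset\mathbf{V}_\vee$ and $\mathbf{V}^*\supset\mathbf{V}_\vee^*$. Hence, the result of part (i) proves (4.54c). □

Again, sufficiently weak norms of $V_{0,k}\supset V_k$ may lead to (4.54d).

Theorem 4.119. Under condition (4.54c) [or (4.54d)] a functional $\varphi\in{}_a\bigotimes_{k\in\alpha^c}V_k^*$ [$\varphi\in{}_a\bigotimes_{k\in\alpha^c}V_{0,k}^*$] gives rise to $\Phi=\varphi\otimes\mathrm{id}_\alpha\in L(\mathbf{V},\mathbf{V}_\alpha)$.

Corollary 4.120. Strongly uniform crossnorms satisfy the strengthened requirements that $\otimes:\mathbf{V}_\alpha^*\times\mathbf{V}_{\alpha^c}^*\to\mathbf{V}^*$ or $\mathbf{V}_\alpha^*\times\mathbf{V}_{0,\alpha^c}^*\to\mathbf{V}^*$ be continuous.

4.3.4.5 Weak Convergence

The proof of the following theorem is the same as for Theorem 4.117.

Theorem 4.121. (a) For $v_n,v\in\mathbf{V}$ assume $v_n\rightharpoonup v$. Let $\varphi=\bigotimes_{k\in\alpha^c}\varphi_k\in{}_a\bigotimes_{k\in\alpha^c}V_k^*$ and define $u_n := \varphi(v_n)\in\mathbf{V}_\alpha$ and $u := \varphi(v)\in\mathbf{V}_\alpha$. Under condition (4.54c), $u_n\rightharpoonup u$ holds.
(b) The same result holds for $\varphi\in{}_a\bigotimes_{k\in\alpha^c}V_{0,k}^*$ under condition (4.54d).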
4.3.5 Intersections of Banach Tensor Spaces

If two Banach spaces $(X,\|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ have a nonempty intersection $Z := X\cap Y$, the intersection norm $\|\cdot\|_Z$ is given by $\|z\|_Z := \max\{\|z\|_X,\|z\|_Y\}$ or equivalent ones. Below, we shall make use of this construction.

At the end of §4.2.6.2 we studied the example $C^1(I\times J)$. This space can be obtained as the closure of $C^1(I)\otimes_a C^1(J)$ with respect to the norm $\|\cdot\|_{C^1(I\times J)}$; however, this norm is not a reasonable crossnorm: it satisfies (4.33a) but not (4.33b). Instead, the mixed norm $\|\cdot\|_{1,\mathrm{mix}}$ in (4.36) is a reasonable crossnorm, but the resulting space $C^1_{\mathrm{mix}}(I\times J)$ is a proper subspace of $C^1(I\times J)$, since functions $f\in C^1_{\mathrm{mix}}(I\times J)$ possess continuous mixed derivatives $f_{xy}$.

There is another way to obtain $C^1(I\times J)$. First we consider the anisotropic spaces

$$C^{(1,0)}(I\times J) := \{f : f,\ \tfrac{\partial}{\partial x}f\in C(I\times J)\},\qquad \|f\|_{(1,0)} := \max\{\|f\|_\infty,\|f_x\|_\infty\},$$

$$C^{(0,1)}(I\times J) := \{f : f,\ \tfrac{\partial}{\partial y}f\in C(I\times J)\},\qquad \|f\|_{(0,1)} := \max\{\|f\|_\infty,\|f_y\|_\infty\},$$

with $\|f\|_\infty := \sup_{(x,y)\in I\times J}|f(x,y)|$. Then we obtain $C^1(I\times J)$ and its norm by

$$C^1(I\times J) = C^{(1,0)}(I\times J)\cap C^{(0,1)}(I\times J),\qquad \|f\|_{C^1(I\times J)} = \max\{\|f\|_{(1,0)},\|f\|_{(0,1)}\}.$$

The proof of Remark 4.69 can be extended to show that $\|\cdot\|_{(1,0)}$ [$\|\cdot\|_{(0,1)}$] is a reasonable crossnorm of $C^{(1,0)}(I\times J)$ [$C^{(0,1)}(I\times J)$].

We give another important example. Here $N\in\mathbb{N}$ is a fixed degree.

Example 4.122. For $I_j\subset\mathbb{R}$ ($1\le j\le d$) and $1\le p<\infty$, the Sobolev space $H^{N,p}(I_j)$ consists of all functions $f$ from $L^p(I_j)$ with bounded norm$^{18}$

$$\|f\|_{N,p;I_j} := \Big(\sum_{n=0}^{N}\int_{I_j}\Big|\frac{d^n}{dx^n}f\Big|^p dx\Big)^{1/p}, \tag{4.55a}$$

whereas $H^{N,p}(\mathbf{I})$ for $\mathbf{I}=I_1\times\ldots\times I_d\subset\mathbb{R}^d$ is endowed with the norm

$$\|f\|_{N,p} := \Big(\sum_{0\le|\mathbf{n}|\le N}\int_{\mathbf{I}}|\partial^{\mathbf{n}}f|^p\,dx\Big)^{1/p}, \tag{4.55b}$$

where $\mathbf{n}\in\mathbb{N}_0^d$ is a multi-index of length $|\mathbf{n}| := \sum_{j=1}^d n_j$, and $\partial^{\mathbf{n}}$ as in (4.5).

Again, the norm $\|\cdot\|_{N,p}$ satisfies (4.33a) but not (4.33b); in particular, it is not a reasonable crossnorm. Instead, for each $\mathbf{n}\in\mathbb{N}_0^d$ with $|\mathbf{n}|\le N$ we define the space

$$H^{\mathbf{n},p}(\mathbf{I}) := \{f\in L^p(\mathbf{I}) : \partial^{\mathbf{n}}f\in L^p(\mathbf{I})\}$$

with the reasonable crossnorm

$$\|f\|_{\mathbf{n},p} := \big(\|f\|_{0,p}^p+\|\partial^{\mathbf{n}}f\|_{0,p}^p\big)^{1/p}.$$

18 It suffices to have the terms for $n=0$ and $n=N$ in (4.55a). The derivatives are to be understood as weak derivatives (cf. [141, §6.2.1]).
T Then the Sobolev space H N,p (I) is equal to the intersection 0≤|n|≤N H n,p (I), and its norm (4.55b) is equivalent to max0≤|n|≤N k·kn,p . Note that H n,p for n = (1, 0) is considered in Example 4.48. If n ∈ Nd0 is a multiple of a unit vector, i.e., ni = 0 except for one i, the proof of Remark 4.81 can be used to show that k·kn,p is a reasonable crossnorm for 1 ≤ p < ∞. The Sobolev spaces H m,p (Ij ) for m = 0, 1, . . . , Nj are an example of a scale of Banach spaces. We fix integers Nj and denote the j-th scale by (Nj )
Vj := Vj
(Nj −1)
⊂ Vj
(0)
⊂ . . . ⊂ Vj
with dense embeddings.
(4.56)
The corresponding norms satisfy k·kj,n & k·kj,m for Nj ≥ n ≥ m ≥ 0 on Vj . (n)
(0)
(1 ≤ n ≤ Nj ) are dense in (Vj , k·kj,0 ).
Lemma 4.123. By (4.56), all Vj
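Let numbers $N_j\in\mathbb{N}_0$ be given and define $\mathcal{N}$ as a subset of $\mathbb{N}_0^d$ satisfying
\[
\mathbf{n}\in\mathcal{N} \;\Rightarrow\; 0\le n_j\le N_j, \tag{4.57a}
\]
\[
\mathbf{N}_j := (\underbrace{0,\ldots,0}_{j-1},N_j,\underbrace{0,\ldots,0}_{d-j})\in\mathcal{N}. \tag{4.57b}
\]
The standard choice for $\mathcal{N}$ is $\mathcal{N}\subset\{0,\ldots,N\}^d$ with $N_j=N$ for all $1\le j\le d$. Often
\[
\mathcal{N} := \{\mathbf{N}_j\in\mathbb{N}_0^d : 1\le j\le d\} \tag{4.57c}
\]
is sufficient. For each $\mathbf{n}\in\mathcal{N}$ we define the tensor space
\[
\mathbf{V}^{(\mathbf{n})} := {}_a\bigotimes_{j=1}^d V_j^{(n_j)}. \tag{4.58a}
\]
Then we choose a reasonable crossnorm $\|\cdot\|_{\mathbf{n}}$ on $\mathbf{V}^{(\mathbf{n})}$ or an equivalent one. The intersection Banach tensor space is defined by
\[
\mathbf{V} := \bigcap_{\mathbf{n}\in\mathcal{N}}\mathbf{V}^{(\mathbf{n})} \quad\text{with intersection norm}\quad \|\mathbf{v}\| := \max_{\mathbf{n}\in\mathcal{N}}\|\mathbf{v}\|_{\mathbf{n}} \tag{4.58b}
\]
or an equivalent norm.

Remark 4.124. Assume $V_j^{(0)}\supsetneqq V_j^{(N_j)}$ (this excludes the finite-dimensional case) and let $\mathbf{V}$ be defined by (4.58b).
(a) $\mathbf{V} = \mathbf{V}_{\mathrm{mix}} := \mathbf{V}^{(N_1,N_2,\ldots,N_d)}$ holds if and only if $(N_1,\ldots,N_d)\in\mathcal{N}$ (cf. (4.36)).
(b) Otherwise $\mathbf{V}_{\mathrm{mix}}\subsetneqq\mathbf{V}\subsetneqq\mathbf{V}^{(\mathbf{0})}$ holds, and $\mathbf{V}_{\mathrm{mix}}$ is dense in $\mathbf{V}$.
(c) In Case (a) a reasonable crossnorm $\|\cdot\|_{\mathrm{mix}}$ may exist, whereas in Case (b) condition (4.33b) required for a reasonable crossnorm cannot be satisfied.

Proof. For Part (c) note that $\varphi\in\bigotimes_{j=1}^d(V_j^{(N_j)})^*$ are continuous functionals on $\mathbf{V}_{\mathrm{mix}}$ but not necessarily on $\mathbf{V}\supsetneqq\mathbf{V}_{\mathrm{mix}}$ endowed with a strictly weaker norm. □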
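Proposition 4.125. Under the conditions (4.57a,b), the Banach tensor space $\mathbf{V}$ in (4.58b) satisfies the inclusion
\[
\Bigl({}_a\bigotimes_{j=1}^d V_j^{(0)}\Bigr)\cap\mathbf{V} = {}_a\bigotimes_{j=1}^d V_j \subset \mathbf{V}_{\mathrm{mix}} := {}_{\|\cdot\|_{(N_1,\ldots,N_d)}}\bigotimes_{j=1}^d V_j,
\]
i.e., an algebraic tensor in $\mathbf{V}$ does not differ from an algebraic tensor in $\mathbf{V}_{\mathrm{mix}}$. Each $\mathbf{v}\in\bigl({}_a\bigotimes_{j=1}^d V_j^{(0)}\bigr)\cap\mathbf{V}$ has a representation $\mathbf{v} = \sum_{i=1}^r\bigotimes_{j=1}^d v_i^{(j)}$ with $v_i^{(j)}\in V_j = V_j^{(N_j)}$.

Proof. By definition (4.58a),
\[
\Bigl({}_a\bigotimes_{j=1}^d V_j^{(0)}\Bigr)\cap\mathbf{V} = \bigcap_{\mathbf{n}\in\mathcal{N}}\Bigl[\Bigl({}_a\bigotimes_{j=1}^d V_j^{(0)}\Bigr)\cap\mathbf{V}^{(\mathbf{n})}\Bigr]
\]
holds. Since $\mathbf{v}\in\bigl({}_a\bigotimes_{j=1}^d V_j^{(0)}\bigr)\cap\mathbf{V}^{(\mathbf{n})}$ is an algebraic tensor, it belongs to the space $\bigl({}_a\bigotimes_{j=1}^d V_j^{(0)}\bigr)\cap\mathbf{V}^{(\mathbf{n})} = \mathbf{V}^{(\mathbf{n})}$. Lemma 6.11 will show that
\[
\mathbf{v}\in\bigcap_{\mathbf{n}\in\mathcal{N}}\mathbf{V}^{(\mathbf{n})} = {}_a\bigotimes_{j=1}^d\Bigl[\bigcap_{\mathbf{n}\in\mathcal{N}}V_j^{(n_j)}\Bigr].
\]
By condition (4.57b), $\mathbf{v}\in{}_a\bigotimes_{j=1}^d\bigl(\bigcap_{\mathbf{n}\in\mathcal{N}}V_j^{(n_j)}\bigr) = {}_a\bigotimes_{j=1}^d V_j$ can be obtained from the fact that one of the numbers $n_j$ is equal to $N_j$. □

Application to $V_j^{(0)} = C^0(I_j)$ and $V_j^{(1)} = C^1(I_j)$ yields that all functions from the algebraic tensor space $\bigl({}_a\bigotimes_{j=1}^d C^0(I_j)\bigr)\cap C^1(\mathbf{I})$ are already in $\mathbf{V}_{\mathrm{mix}} = C^1_{\mathrm{mix}}(\mathbf{I})$ (cf. (4.36)), which is a proper subspace of $C^1(\mathbf{I})$.

The dual space $\mathbf{V}^*$ is the sum (span) of the duals of $\mathbf{V}^{(\mathbf{n})}$: $\mathbf{V}^* = \sum_{\mathbf{n}\in\mathcal{N}}\bigl(\mathbf{V}^{(\mathbf{n})}\bigr)^*$.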
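Theorem 4.126. Let $\|\cdot\|_{\mathbf{n}}$ be reasonable crossnorms on $\mathbf{V}^{(\mathbf{n})}$ for all $\mathbf{n}\in\mathcal{N}$. Then (4.54b) holds in the form
\[
\otimes : V_j^*\times V_1^{(0)*}\times\ldots\times V_{j-1}^{(0)*}\times V_{j+1}^{(0)*}\times\ldots\times V_d^{(0)*}\to\mathbf{V}^* \quad\text{is continuous}
\]
(we recall that $V_j = V_j^{(N_j)}$). Moreover, for any $\mathbf{n}\in\mathcal{N}$ we have that
\[
\otimes : V_j^{(n_j)*}\times V_1^{(n_1)*}\times\ldots\times V_{j-1}^{(n_{j-1})*}\times V_{j+1}^{(n_{j+1})*}\times\ldots\to\mathbf{V}^*
\]
is continuous.

Proof. The first statement is the special case of the second one with $\mathbf{n} = \mathbf{N}_j\in\mathcal{N}$ (cf. (4.57b)). Since $\|\cdot\|_{\mathbf{n}}$ is a reasonable crossnorm, the tensor product $\otimes$ is continuous from $\times_{j=1}^d V_j^{(n_j)*}$ into $\mathbf{V}^{(\mathbf{n})*}$. From the continuous embedding $\mathbf{V}^{(\mathbf{n})*}\subset\mathbf{V}^*$ we obtain the desired result. □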
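4.3.6 Tensor Space of Operators

Let $\mathbf{V} = {}_a\bigotimes_{j=1}^d V_j$ and $\mathbf{W} = {}_a\bigotimes_{j=1}^d W_j$ be two Banach tensor spaces with the respective norms $\|\cdot\|_{\mathbf{V}}$ and $\|\cdot\|_{\mathbf{W}}$, while $\|\cdot\|_{V_j}$ and $\|\cdot\|_{W_j}$ are the norms of $V_j$ and $W_j$. The space $L(V_j,W_j)$ is endowed with the operator norm $\|\cdot\|_{W_j\leftarrow V_j}$. Their algebraic tensor space is
\[
\mathbf{L} := {}_a\bigotimes_{j=1}^d L(V_j,W_j).
\]
The obvious action of an elementary tensor $\mathbf{A} = \bigotimes_{j=1}^d A^{(j)}\in\mathbf{L}$ on $\mathbf{v} = \bigotimes_{j=1}^d v^{(j)}\in\mathbf{V}$ yields the following tensor from $\mathbf{W}$:
\[
\mathbf{A}\mathbf{v} = \Bigl(\bigotimes_{j=1}^d A^{(j)}\Bigr)\Bigl(\bigotimes_{j=1}^d v^{(j)}\Bigr) = \bigotimes_{j=1}^d A^{(j)}v^{(j)}\in\mathbf{W}.
\]
If $\|\cdot\|_{\mathbf{V}}$ and $\|\cdot\|_{\mathbf{W}}$ are crossnorms, we estimate $\|\mathbf{A}\mathbf{v}\|_{\mathbf{W}}$ by
\[
\prod_{j=1}^d\|A^{(j)}v^{(j)}\|_{W_j} \le \prod_{j=1}^d\bigl[\|A^{(j)}\|_{W_j\leftarrow V_j}\|v^{(j)}\|_{V_j}\bigr] = \Bigl(\prod_{j=1}^d\|A^{(j)}\|_{W_j\leftarrow V_j}\Bigr)\|\mathbf{v}\|_{\mathbf{V}}.
\]
Hence $\|\mathbf{A}\mathbf{v}\|_{\mathbf{W}}\le\|\mathbf{A}\|\,\|\mathbf{v}\|_{\mathbf{V}}$ holds for all elementary tensors. However, we cannot expect that all crossnorms $\|\cdot\|_{\mathbf{V}}$ and $\|\cdot\|_{\mathbf{W}}$ satisfy the estimate $\|\mathbf{A}\mathbf{v}\|_{\mathbf{W}}\le\|\mathbf{A}\|\,\|\mathbf{v}\|_{\mathbf{V}}$ for general tensors $\mathbf{v}\in\mathbf{V}$. In the special case of $\mathbf{V}=\mathbf{W}$, we called crossnorms uniform if they satisfy this estimate (cf. §4.2.8). We show that the projective and injective norms have the desired property.

Proposition 4.127. (a) If $\|\cdot\|_{\mathbf{V}} = \|\cdot\|_{\wedge(V_1,\ldots,V_d)}$ and $\|\cdot\|_{\mathbf{W}} = \|\cdot\|_{\wedge(W_1,\ldots,W_d)}$, then $\mathbf{A} = \bigotimes_{j=1}^d A^{(j)}\in\mathbf{L}$ has the operator norm
\[
\|\mathbf{A}\|_{\mathbf{W}\leftarrow\mathbf{V}} = \prod_{j=1}^d\|A^{(j)}\|_{W_j\leftarrow V_j}. \tag{4.59}
\]
(b) If $\|\cdot\|_{\mathbf{V}} = \|\cdot\|_{\vee(V_1,\ldots,V_d)}$ and $\|\cdot\|_{\mathbf{W}} = \|\cdot\|_{\vee(W_1,\ldots,W_d)}$, (4.59) again holds with respect to the corresponding operator norm.

Proof. The same arguments as in the proof of Proposition 4.85 can be applied. □

Since $\|\mathbf{A}\|_{\mathbf{W}\leftarrow\mathbf{V}}$ is finite for elementary tensors $\mathbf{A} = \bigotimes_{j=1}^d A^{(j)}$, boundedness holds for all $\mathbf{A}\in\mathbf{L} := {}_a\bigotimes_{j=1}^d L(V_j,W_j)$. The completion of $(\mathbf{L},\|\cdot\|_{\mathbf{W}\leftarrow\mathbf{V}})$ yields the tensor space
\[
{}_{\|\cdot\|_{\mathbf{W}\leftarrow\mathbf{V}}}\bigotimes_{j=1}^d L(V_j,W_j)\subset L(\mathbf{V},\mathbf{W}).
\]
Exercise 4.128. Let $\|\cdot\|$ be a uniform crossnorm on $\mathbf{V} := \bigotimes_{j=1}^d V_j$. Prove that the operator norm $\|\cdot\|_{\mathbf{V}\leftarrow\mathbf{V}}$ is a crossnorm for $\mathbf{L} := \bigotimes_{j=1}^d L(V_j,V_j)$.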
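The action of an elementary operator tensor on an elementary tensor can be checked numerically in the finite-dimensional case. The following is a minimal sketch, assuming $d=2$, real matrices, and the Euclidean norm on the coefficient arrays (which is a crossnorm on elementary tensors); the helper names `matvec`, `kron_vec`, `kron_mat` and `norm2` are illustrative and not taken from the text.

```python
from math import sqrt, isclose

def matvec(A, v):
    """Apply the matrix A (given as a list of rows) to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def kron_vec(v, w):
    """Coefficients of the elementary tensor v (x) w in lexicographic order."""
    return [x * y for x in v for y in w]

def kron_mat(A, B):
    """Kronecker product of two matrices, i.e. the matrix of A (x) B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def norm2(v):
    """Euclidean norm of a coefficient vector."""
    return sqrt(sum(x * x for x in v))

# Arbitrary illustrative data (not from the text).
A1 = [[1.0, 2.0], [0.0, 3.0]]
A2 = [[2.0, -1.0, 0.0], [0.0, 1.0, 4.0], [1.0, 0.0, 1.0]]
v1 = [1.0, -2.0]
v2 = [3.0, 0.5, -1.0]

lhs = matvec(kron_mat(A1, A2), kron_vec(v1, v2))   # (A1 (x) A2)(v1 (x) v2)
rhs = kron_vec(matvec(A1, v1), matvec(A2, v2))     # (A1 v1) (x) (A2 v2)

# Both sides agree entrywise, and the norm of the result factorises.
action_matches = all(isclose(x, y, abs_tol=1e-12) for x, y in zip(lhs, rhs))
norm_factorises = isclose(norm2(lhs),
                          norm2(matvec(A1, v1)) * norm2(matvec(A2, v2)))
```

The check only covers elementary tensors; for general tensors the estimate depends on the chosen crossnorm, as discussed above.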
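4.4 Hilbert Spaces

4.4.1 Scalar Product

Again, we restrict the field to either $\mathbb{K}=\mathbb{R}$ or $\mathbb{K}=\mathbb{C}$. A normed vector space $(V,\|\cdot\|)$ is a pre-Hilbert space if the norm is defined by
\[
\|v\| = \sqrt{\langle v,v\rangle} < \infty \quad\text{for all } v\in V, \tag{4.60}
\]
where $\langle\cdot,\cdot\rangle : V\times V\to\mathbb{K}$ is a scalar product on $V$. In the case of $\mathbb{K}=\mathbb{R}$, a scalar product is a bilinear form, which, in addition, must be symmetric and positive:
\[
\langle v,w\rangle = \langle w,v\rangle \quad\text{for } v,w\in V, \tag{4.61a}
\]
\[
\langle v,v\rangle > 0 \quad\text{for } v\ne 0. \tag{4.61b}
\]
In the complex case $\mathbb{K}=\mathbb{C}$, the form must be sesquilinear, i.e., bilinearity and (4.61a) are replaced with$^{19}$
\[
\langle v,w\rangle = \overline{\langle w,v\rangle} \quad\text{for } v,w\in V,
\]
\[
\langle u+\lambda v,w\rangle = \langle u,w\rangle + \lambda\langle v,w\rangle \quad\text{for all } u,v,w\in V,\ \lambda\in\mathbb{C},
\]
\[
\langle w,u+\lambda v\rangle = \langle w,u\rangle + \bar{\lambda}\langle w,v\rangle \quad\text{for all } u,v,w\in V,\ \lambda\in\mathbb{C}.
\]
The triangle inequality of the norm (4.60) follows from the Schwarz inequality
\[
|\langle v,w\rangle| \le \|v\|\,\|w\| \quad\text{for } v,w\in V.
\]
We describe a pre-Hilbert space by $(V,\langle\cdot,\cdot\rangle)$ and note that this defines uniquely a normed space $(V,\|\cdot\|)$ via (4.60). If $(V,\langle\cdot,\cdot\rangle)$ is complete, i.e., if $(V,\|\cdot\|)$ is a Banach space, we call $(V,\langle\cdot,\cdot\rangle)$ a Hilbert space.

Example 4.129. The Euclidean scalar product on $\mathbb{K}^I$ is defined by
\[
\langle v,w\rangle = \sum_{i\in I}v_i\overline{w_i}.
\]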
4.4.2 Basic Facts about Hilbert Spaces

Vectors $v,w\in V$ are orthogonal if $\langle v,w\rangle = 0$. A subset $S\subset V$ is an orthogonal system if all pairs of different $v,w\in S$ are orthogonal. If an orthogonal system is a basis, it is called an orthogonal basis. If, in addition, $\|v\| = \|w\| = 1$ holds, we have orthonormal vectors, an orthonormal system, and an orthonormal basis, respectively. In the infinite-dimensional Hilbert case, the term 'orthonormal basis' has to be understood as 'complete basis', which is different from the algebraic basis: $b = \{b_\nu : \nu\in B\}$ is a complete basis of $V$ if any $v\in V$ can uniquely be written as an unconditionally$^{20}$ convergent series $v = \sum_{\nu\in B}\alpha_\nu b_\nu$ ($\alpha_\nu\in\mathbb{K}$). If $V$ is separable, $B$ is (at most) countable; otherwise $B$ is not countable, but for each $v\in V$ the series $\sum_{\nu\in B}\alpha_\nu b_\nu$ contains only countably many nonzero coefficients.

$^{19}$ In physics, the opposite ordering is common: the scalar product is antilinear in the first and linear in the second argument.

The orthogonal complement of a subset $S\subset V$ is
\[
S^\perp = \{v\in V : \langle v,w\rangle = 0 \text{ for all } w\in S\}.
\]
Remark 4.130. (a) Any orthogonal complement is closed.
(b) If $S\subset V$ is a closed subspace, $V = S\oplus S^\perp$ is a direct sum, i.e., every $v\in V$ has a unique decomposition $v = s+t$ with $s\in S$ and $t\in S^\perp$.

An unpleasant feature of general Banach spaces is the possible non-reflexivity $X^{**}\supsetneqq X$. This does not happen for Hilbert spaces as stated next.

Remark 4.131. (a) All Hilbert spaces satisfy $V = V^{**}$.
(b) The dual space $V^*$ is isomorphic to $V$: for any $\varphi\in V^*$ there is exactly one $v_\varphi\in V$ with
\[
\varphi(v) = \langle v,v_\varphi\rangle \quad\text{for all } v\in V \tag{4.62}
\]
(Fréchet–Riesz theorem, cf. Riesz [250, §II.30]). Conversely, every element $v_\varphi\in V$ generates a functional $\varphi\in V^*$ via (4.62). This defines the Fréchet–Riesz isomorphism $J : V\to V^*$ with $\langle v,w\rangle_V = \langle Jv,Jw\rangle_{V^*}$.

Notation 4.132. (a) For $v\in V$ we shall denote $Jv\in V^*$ by $v^*$, i.e., $v^*(\cdot) = \langle\cdot,v\rangle$. For finite-dimensional vector spaces, $v^*$ is equal to $v^{\mathsf{H}}$ (cf. §2.1).
(b) It is possible (but not necessary) to identify $V$ with $V^*$ by setting $v = v^*$.
(c) Let $v\in V$ and $w\in W$. Then $wv^*\in L(V,W)$ denotes the operator
\[
(wv^*)(x) := v^*(x)\cdot w\in W \quad\text{for all } x\in V. \tag{4.63}
\]
Theorem 4.133. For every Hilbert space $V$ there exists an orthonormal basis $\{\phi_i : i\in S\}$. It satisfies
\[
v = \sum_{i\in S}\langle v,\phi_i\rangle\phi_i, \qquad \|v\|^2 = \sum_{i\in S}|\langle v,\phi_i\rangle|^2 \quad\text{for all } v\in V. \tag{4.64}
\]
The second identity in (4.64) is the Parseval equality.

Exercise 4.134. Let $v,w\in V$. Show that
\[
\langle v,w\rangle = \sum_{i\in S}\langle v,\phi_i\rangle\langle\phi_i,w\rangle
\]
for any orthonormal basis $\{\phi_i : i\in S\}$ of $V$.

$^{20}$ An unconditionally convergent series gives the same finite value for any ordering of the terms.
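The identities (4.64) and the statement of Exercise 4.134 can be illustrated in a small finite-dimensional case. The sketch below assumes $V=\mathbb{R}^3$ with the Euclidean scalar product (so no complex conjugation is needed) and uses a rotated orthonormal basis; the rotation angle and the vectors are arbitrary illustrative choices.

```python
from math import cos, sin, isclose

def dot(v, w):
    """Euclidean scalar product on R^n (real case, no conjugation)."""
    return sum(x * y for x, y in zip(v, w))

t = 0.3  # arbitrary rotation angle, chosen only for illustration
basis = [
    [cos(t), sin(t), 0.0],
    [-sin(t), cos(t), 0.0],
    [0.0, 0.0, 1.0],
]  # an orthonormal basis {phi_1, phi_2, phi_3} of R^3

v = [1.0, 2.0, 3.0]
w = [-1.0, 0.0, 2.0]

coeffs = [dot(v, phi) for phi in basis]               # <v, phi_i>
expansion = [sum(c * phi[k] for c, phi in zip(coeffs, basis))
             for k in range(3)]                       # sum_i <v, phi_i> phi_i
parseval_sum = sum(c * c for c in coeffs)             # sum_i |<v, phi_i>|^2
mixed_sum = sum(dot(v, phi) * dot(phi, w) for phi in basis)

expansion_ok = all(isclose(a, b, abs_tol=1e-12) for a, b in zip(expansion, v))
parseval_ok = isclose(parseval_sum, dot(v, v))
exercise_ok = isclose(mixed_sum, dot(v, w))
```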
4.4.3 Operators on Hilbert Spaces

Throughout this subsection, $V$ and $W$ are Hilbert spaces.

Exercise 4.135. The operator norm $\|\Phi\|_{W\leftarrow V}$ of $\Phi\in L(V,W)$ defined in (4.6a) coincides with the definition
\[
\|\Phi\|_{W\leftarrow V} = \sup_{0\ne v\in V,\ 0\ne w\in W}\frac{|\langle\Phi v,w\rangle_W|}{\sqrt{\langle v,v\rangle_V\,\langle w,w\rangle_W}}.
\]
Definition 4.136. (a) The operator $\Phi\in L(V,W)$ gives rise to the adjoint operator$^{21}$ $\Phi^*\in L(W,V)$ defined by $\langle\Phi v,w\rangle_W = \langle v,\Phi^*w\rangle_V$.
(b) If $V=W$ and $\Phi = \Phi^*\in L(V,V)$, the operator is called selfadjoint.

$^{21}$ There is a slight difference between the adjoint operator defined here and the dual operator from Definition 4.23, since the latter belongs to $L(W^*,V^*)$. As we may identify $V=V^*$ and $W=W^*$, this difference is not essential.

Next, we consider the subspace $K(V,W)\subset L(V,W)$ of compact operators (cf. Definition 4.14 and §4.2.9). We recall that $W\otimes V^*$ can be interpreted as a subspace of $K(V,W)$ (cf. Corollary 4.92). The (finite) singular-value decomposition from Lemma 2.23 can be generalised to the infinite-dimensional case.

Theorem 4.137 (infinite singular-value decomposition). (a) For $\Phi\in K(V,W)$ there are singular values $\sigma_1\ge\sigma_2\ge\ldots$ with $\sigma_\nu\searrow 0$ and orthonormal systems $\{w_\nu\in W : \nu\in\mathbb{N}\}$ and $\{v_\nu\in V : \nu\in\mathbb{N}\}$ such that
\[
\Phi = \sum_{\nu=1}^\infty\sigma_\nu w_\nu v_\nu^* \quad\text{(cf. (4.63))}, \tag{4.65}
\]
where the sum converges with respect to the operator norm $\|\cdot\|_{W\leftarrow V}$:
\[
\|\Phi-\Phi^{(k)}\|_{W\leftarrow V} = \sigma_{k+1}\searrow 0 \quad\text{for}\quad \Phi^{(k)} := \sum_{\nu=1}^k\sigma_\nu w_\nu v_\nu^*.
\]
(b) Conversely, any $\Phi$ defined by (4.65) with $\sigma_k\searrow 0$ belongs to $K(V,W)$.

Proof. Set $\Psi := \Phi^*\Phi\in L(V,V)$. As product of compact operators, $\Psi$ is compact. The Riesz–Schauder theory (cf. [141, Theorem 6.89]) states that $\Psi$ has eigenvalues $\lambda_\nu$ with $\lambda_\nu\to 0$. Since $\Psi$ is selfadjoint, there are corresponding eigenfunctions $v_\nu$ which can be chosen orthonormally, defining an orthonormal system $\{v_\nu : \nu\in\mathbb{N}\}$. As $\Psi$ is positive semidefinite, i.e., $\langle\Psi v,v\rangle_V\ge 0$ for all $v\in V$, we conclude that $\lambda_\nu\ge 0$. Hence the singular values $\sigma_\nu := +\sqrt{\lambda_\nu}$ are well-defined. Finally, set $w_\nu := \Phi v_\nu/\|\Phi v_\nu\| = \frac{1}{\sigma_\nu}\Phi v_\nu$ (the latter equality follows from $\|\Phi v_\nu\|^2 = \langle\Phi v_\nu,\Phi v_\nu\rangle = \langle v_\nu,\Phi^*\Phi v_\nu\rangle = \langle v_\nu,\Psi v_\nu\rangle = \lambda_\nu\langle v_\nu,v_\nu\rangle = \lambda_\nu$). The vectors $w_\nu$ are already normalised. Since
\[
\langle w_\nu,w_\mu\rangle\,\|\Phi v_\nu\|\,\|\Phi v_\mu\| = \langle\Phi v_\nu,\Phi v_\mu\rangle = \langle v_\nu,\Phi^*\Phi v_\mu\rangle = \lambda_\mu\langle v_\nu,v_\mu\rangle = 0 \quad\text{for } \nu\ne\mu,
\]
$\{w_\nu : \nu\in\mathbb{N}\}$ is an orthonormal system in $W$. Besides $\Phi v_\nu = \sigma_\nu w_\nu$ (by the definition of $w_\nu$) also $\Phi^{(k)}v_\nu = \sigma_\nu w_\nu$ holds for $\nu\le k$ since $v_\mu^*(v_\nu) = \langle v_\nu,v_\mu\rangle = \delta_{\nu\mu}$ and
\[
\Bigl(\sum_{\mu=1}^k\sigma_\mu w_\mu v_\mu^*\Bigr)(v_\nu) = \sum_{\mu=1}^k\sigma_\mu w_\mu\delta_{\nu\mu} = \sigma_\nu w_\nu.
\]
We conclude that $(\Phi-\Phi^{(k)})(v_\nu) = 0$ for $\nu\le k$, while $(\Phi-\Phi^{(k)})(v_\nu) = \Phi(v_\nu)$ for $\nu>k$. Hence $(\Phi-\Phi^{(k)})^*(\Phi-\Phi^{(k)})$ has the eigenvalues $\sigma_{k+1}^2\ge\sigma_{k+2}^2\ge\ldots$ This implies that $\|\Phi-\Phi^{(k)}\|_{W\leftarrow V} = \sigma_{k+1}$. Convergence follows by $\sigma_\nu\searrow 0$.

For the opposite direction use the facts that $\Phi^{(k)}$ is compact because of the finite-dimensional range and that limits of compact operators are again compact. □

Corollary 4.138. If $\kappa(\cdot,\cdot)$ is the Schwartz kernel of $\Phi$, i.e.,
\[
\Phi(v) := \int_\Omega\kappa(\cdot,y)v(y)\,dy \quad\text{for } v\in V,
\]
we may write (4.65) as $\kappa(x,y) = \sum_{\nu=1}^\infty\sigma_\nu w_\nu(x)v_\nu(y)$.

Representation (4.65) allows us to define a scale of norms $\|\cdot\|_{\mathrm{SVD},p}$ (cf. (4.17)), which use the $\ell^p$ norm of the sequence $\sigma = (\sigma_\nu)_{\nu=1}^\infty$ of singular values.$^{22}$

Remark 4.139. (a) $\|\Phi\|_{\mathrm{SVD},\infty} = \|\Phi\|_{\vee(W,V)} = \|\Phi\|_{W\leftarrow V}$ is the operator norm.
(b) $\|\Phi\|_{\mathrm{SVD},2} = \|\Phi\|_{\mathrm{HS}}$ is the Hilbert–Schmidt norm.
(c) $\|\Phi\|_{\mathrm{SVD},1} = \|\Phi\|_{\wedge(W,V)}$ determines the nuclear operators.

In the context of Hilbert spaces it is of interest that the Hilbert–Schmidt operators form again a Hilbert space with scalar product defined via the trace, which, for the finite-dimensional case, has already been defined in (2.8). In the infinite-dimensional case, the definition of the trace is generalised by
\[
\operatorname{trace}(\Phi) := \sum_{i\in S}\langle\phi_i,\Phi\phi_i\rangle \quad\text{for any orthonormal basis } \{\phi_i : i\in S\}. \tag{4.66}
\]
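To show that this definition makes sense, we must prove that the right-hand side does not depend on the particular basis. Let $\{\psi_j : j\in T\}$ be another orthonormal basis. Then Exercise 4.134 shows that
\[
\sum_{i\in S}\langle\phi_i,\Phi\phi_i\rangle = \sum_{i\in S}\sum_{j\in T}\langle\phi_i,\psi_j\rangle\langle\psi_j,\Phi\phi_i\rangle = \sum_{j\in T}\Bigl\langle\psi_j,\Phi\sum_{i\in S}\langle\psi_j,\phi_i\rangle\phi_i\Bigr\rangle = \sum_{j\in T}\langle\psi_j,\Phi\psi_j\rangle,
\]
where the last step uses the expansion $\psi_j = \sum_{i\in S}\langle\psi_j,\phi_i\rangle\phi_i$ from (4.64).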
$^{22}$ In physics, in particular quantum information, the entropy $-\sum_\nu\sigma_\nu\log(\sigma_\nu)$ is of interest.
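The basis independence of the trace (4.66) can be observed directly for a matrix. The sketch below assumes $V=\mathbb{R}^3$ with the Euclidean scalar product and evaluates $\sum_i\langle\phi_i,\Phi\phi_i\rangle$ once in the standard basis and once in a rotated orthonormal basis; the matrix `Phi` and the angle are arbitrary illustrative values.

```python
from math import cos, sin, isclose

def dot(v, w):
    """Euclidean scalar product on R^n (real case)."""
    return sum(x * y for x, y in zip(v, w))

def matvec(A, v):
    """Apply the matrix A (list of rows) to the vector v."""
    return [dot(row, v) for row in A]

def trace_in_basis(Phi, basis):
    """Evaluate sum_i <phi_i, Phi phi_i> for an orthonormal basis."""
    return sum(dot(phi, matvec(Phi, phi)) for phi in basis)

# A non-symmetric operator on R^3; its diagonal sums to 2 + 3 - 1 = 4.
Phi = [[2.0, 1.0, 0.0],
       [-1.0, 3.0, 5.0],
       [4.0, 0.0, -1.0]]

standard = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = 0.7  # arbitrary rotation angle
rotated = [[cos(t), sin(t), 0.0], [-sin(t), cos(t), 0.0], [0.0, 0.0, 1.0]]

trace_standard = trace_in_basis(Phi, standard)
trace_rotated = trace_in_basis(Phi, rotated)
same_trace = isclose(trace_standard, trace_rotated, abs_tol=1e-12)
```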
Definition 4.140 (Hilbert–Schmidt space). The Hilbert–Schmidt scalar product of $\Phi,\Psi\in L(V,W)$ is defined by $\langle\Phi,\Psi\rangle_{\mathrm{HS}} := \operatorname{trace}(\Psi^*\Phi)$ and defines the norm
\[
\|\Phi\|_{\mathrm{HS}} := \sqrt{\langle\Phi,\Phi\rangle_{\mathrm{HS}}} = \sqrt{\operatorname{trace}(\Phi^*\Phi)}.
\]
Operators $\Phi\in L(V,W)$ with finite norm $\|\Phi\|_{\mathrm{HS}}$ form the Hilbert–Schmidt space$^{23}$ $\mathrm{HS}(V,W)$.

As stated in Remark 4.139b, the norms $\|\Phi\|_{\mathrm{SVD},2} = \|\Phi\|_{\mathrm{HS}}$ coincide. Since finiteness of $\sum_{\nu=1}^\infty\sigma_\nu^2$ implies $\sigma_\nu\searrow 0$, Theorem 4.137b proves the next result.

Remark 4.141. $\mathrm{HS}(V,W)\subset K(V,W)$.

A tensor from $V\otimes W'$ may be interpreted as map $(v\otimes w') : w\mapsto\langle w,w'\rangle_W\cdot v$ from $L(W,V)$. In §4.2.9 this approach has led to the nuclear operator equipped with the norm $\|\Phi\|_{\mathrm{SVD},1}$. For Hilbert spaces, the norm $\|\Phi\|_{\mathrm{SVD},2} = \|\Phi\|_{\mathrm{HS}}$ is more natural.

Lemma 4.142. Let $V\otimes_{\|\cdot\|}W = V\otimes_{\|\cdot\|}W'$ be the Hilbert tensor space generated by the Hilbert spaces $V,W$. Interpreting $\Phi = v\otimes w\in V\otimes_{\|\cdot\|}W$ as a map in $L(W,V)$, the tensor norm $\|v\otimes w\|$ coincides with the Hilbert–Schmidt norm $\|\Phi\|_{\mathrm{HS}}$.

Proof. By Theorem 4.137, there is a representation $\Phi = \sum_{\nu=1}^\infty\sigma_\nu v_\nu w_\nu^*\in L(W,V)$ with orthonormal $v_\nu$, $w_\nu$. The Hilbert–Schmidt norm is equal to $\sqrt{\sum_{\nu=1}^\infty\sigma_\nu^2}$ (cf. Remark 4.139b). Interpreting $\Phi$ as a tensor from $V\otimes_{\|\cdot\|}W$, we use the notation $\Phi = \sum_{\nu=1}^\infty\sigma_\nu v_\nu\otimes w_\nu$. By orthonormality, $\|v\otimes w\|^2 = \sum_{\nu=1}^\infty\sigma_\nu^2$ leads to the same norm. □

Combining this result with Remark 4.141, we derive the next statement.

Remark 4.143. $\Phi\in V\otimes_{\|\cdot\|}W$ interpreted as mapping from $L(W,V)$ is compact.
4.4.4 Orthogonal Projections

A (general) projection has already been defined in Definition 3.4.

Definition 4.144. $\Phi\in L(V,V)$ is called an orthogonal projection if it is a projection and selfadjoint.

Remark 4.145. (a) Set $R := \operatorname{range}(\Phi) := \{\Phi v : v\in V\}$ for a projection $\Phi\in L(V,V)$. Then $\Phi$ is called a projection onto $R$. $v = \Phi(v)$ holds if and only if $v\in R$.
(b) Let $\Phi\in L(V,V)$ be an orthogonal projection onto $R$. Then $R$ is closed and $\Phi$ is characterised by
\[
\Phi v = \begin{cases} v & \text{for } v\in R,\\ 0 & \text{for } v\in R^\perp,\end{cases} \qquad\text{where } V = R\oplus R^\perp \text{ (cf. Remark 4.130b).} \tag{4.67}
\]
(c) Let a closed subspace $R\subset V$ and $w\in V$ be given. Then the best approximation problem
\[
\text{find a minimiser } v_{\mathrm{best}}\in R \text{ of } \|w-v_{\mathrm{best}}\| = \min_{v\in R}\|w-v\|
\]
has the unique solution $v_{\mathrm{best}} = \Phi w$, where $\Phi$ is the projection onto $R$ from (4.67).
(d) An orthogonal projection $0\ne\Phi\in L(V,V)$ has the norm $\|\Phi\|_{V\leftarrow V} = 1$.
(e) If $\Phi$ is the orthogonal projection onto $R\subset V$, then $I-\Phi$ is the orthogonal projection onto $R^\perp$.
(f) Let $\{b_1,\ldots,b_r\}$ be an orthonormal basis of a subspace $R\subset V$. Then the orthogonal projection onto $R$ is explicitly given by
\[
\Phi = \sum_{\nu=1}^r b_\nu b_\nu^*, \quad\text{i.e.,}\quad \Phi v = \sum_{\nu=1}^r\langle v,b_\nu\rangle b_\nu.
\]
In the particular case of $V = \mathbb{K}^n$ with the Euclidean scalar product, we form the orthogonal matrix $U := [b_1,\ldots,b_r]$. Then $\Phi = UU^{\mathsf{H}}\in\mathbb{K}^{n\times n}$ is the orthogonal projection onto $R = \operatorname{range}\{U\}$.

$^{23}$ This space is also called Schmidt class and denoted by (sc) in Schatten-von Neumann [256, Def. 2.1] or by ($\sigma$c) in Schatten [255].

Lemma 4.146. (a) Let $P_1,P_2\in L(V,V)$, where $P_1$ is an orthogonal projection. Then
\[
\|(I-P_1P_2)v\|_V^2 \le \|(I-P_1)v\|_V^2 + \|(I-P_2)v\|_V^2 \quad\text{for any } v\in V.
\]
(b) Let $P_j\in L(V,V)$ be orthogonal projections for $1\le j\le d-1$. Then
\[
\|(I-P_1P_2\cdots P_{d-1}P_d)v\|_V^2 \le \sum_{j=1}^d\|(I-P_j)v\|_V^2 \quad\text{for any } v\in V.
\]
Proof. In $(I-P_1P_2)v = (I-P_1)v + P_1(I-P_2)v$ the two terms on the right-hand side are orthogonal. Therefore
\[
\|(I-P_1P_2)v\|_V^2 = \|(I-P_1)v\|_V^2 + \|P_1(I-P_2)v\|_V^2 \le \|(I-P_1)v\|_V^2 + \|P_1\|_{V\leftarrow V}^2\|(I-P_2)v\|_V^2 \le \|(I-P_1)v\|_V^2 + \|(I-P_2)v\|_V^2,
\]
using $\|P_1\|_{V\leftarrow V}\le 1$ from Remark 4.145d, proves Part (a). Part (b) follows by induction: replace $P_2$ in Part (a) with $\prod_{j=2}^d P_j$. □
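The explicit formula of Remark 4.145f and the estimate of Lemma 4.146a can be tried out on a small example. The following sketch assumes $V=\mathbb{R}^3$ with the Euclidean scalar product; `project` is a hypothetical helper implementing $\Phi v = \sum_\nu\langle v,b_\nu\rangle b_\nu$ for a given orthonormal family, and the subspaces and the vector are arbitrary illustrative choices.

```python
from math import sqrt, isclose

def dot(v, w):
    """Euclidean scalar product on R^n (real case)."""
    return sum(x * y for x, y in zip(v, w))

def sub(v, w):
    """Componentwise difference v - w."""
    return [x - y for x, y in zip(v, w)]

def project(basis, v):
    """Orthogonal projection: sum_nu <v, b_nu> b_nu for orthonormal b_nu."""
    out = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)
        out = [o + c * x for o, x in zip(out, b)]
    return out

s = 1.0 / sqrt(2.0)
R1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # P1 projects onto the x-y plane
R2 = [[s, 0.0, s]]                         # P2 projects onto span{(1,0,1)}
v = [1.0, 2.0, 3.0]

P1v = project(R1, v)
P2v = project(R2, v)
P1P2v = project(R1, P2v)

# Remark 4.145: P1 is idempotent and the residual v - P1 v is orthogonal to R1.
idempotent = all(isclose(a, b, abs_tol=1e-12)
                 for a, b in zip(project(R1, P1v), P1v))
residual_orthogonal = all(isclose(dot(sub(v, P1v), b), 0.0, abs_tol=1e-12)
                          for b in R1)

# Lemma 4.146a: ||(I - P1 P2) v||^2 <= ||(I - P1) v||^2 + ||(I - P2) v||^2.
lhs = dot(sub(v, P1P2v), sub(v, P1P2v))
rhs = dot(sub(v, P1v), sub(v, P1v)) + dot(sub(v, P2v), sub(v, P2v))
lemma_holds = lhs <= rhs + 1e-12
```

For this data the left-hand side is strictly smaller than the right-hand side, so the inequality is not sharp in general.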
4.5 Tensor Products of Hilbert Spaces

4.5.1 Induced Scalar Product

Let $\langle\cdot,\cdot\rangle_j$ be a scalar product defined on $V_j$ ($1\le j\le d$); i.e., $V_j$ is a pre-Hilbert space. Then $\mathbf{V} := {}_a\bigotimes_{j=1}^d V_j$ is again a pre-Hilbert space with a scalar product $\langle\cdot,\cdot\rangle$ which is defined for elementary tensors $\mathbf{v} = \bigotimes_{j=1}^d v^{(j)}$ and $\mathbf{w} = \bigotimes_{j=1}^d w^{(j)}$ by
\[
\Bigl\langle\bigotimes_{j=1}^d v^{(j)},\bigotimes_{j=1}^d w^{(j)}\Bigr\rangle := \prod_{j=1}^d\langle v^{(j)},w^{(j)}\rangle_j \quad\text{for all } v^{(j)},w^{(j)}\in V_j. \tag{4.68}
\]
In the case of norms, we saw that a norm defined on elementary tensors does not determine the norm on the whole tensor space. This is different for a scalar product. One verifies that $\langle\mathbf{v},\mathbf{w}\rangle$ is a sesquilinear form. Hence its definition on elementary tensors extends to $\mathbf{V}\times\mathbf{V}$. The symmetry $\langle\mathbf{v},\mathbf{w}\rangle = \overline{\langle\mathbf{w},\mathbf{v}\rangle}$ also follows immediately from the symmetry of $\langle\cdot,\cdot\rangle_j$. It remains to prove the positivity (4.61b).

Lemma 4.147. Equation (4.68) defines a unique scalar product on ${}_a\bigotimes_{j=1}^d V_j$, which we call the induced scalar product.

Proof. (i) Consider $d=2$, i.e., a scalar product on $V\otimes_a W$. Let $\langle\cdot,\cdot\rangle_V$, $\langle\cdot,\cdot\rangle_W$ be scalar products of $V$, $W$, and $x = \sum_{i=1}^n v_i\otimes w_i\ne 0$. Without loss of generality we may assume that $\{v_i\}$ and $\{w_i\}$ are linearly independent (cf. Lemma 3.15). Consequently, the Gram matrices
\[
G_v = \bigl(\langle v_i,v_j\rangle_V\bigr)_{i,j=1}^n \quad\text{and}\quad G_w = \bigl(\langle w_i,w_j\rangle_W\bigr)_{i,j=1}^n
\]
are positive definite (cf. Exercise 2.17b). The scalar product $\langle x,x\rangle$ is equal to
\[
\sum_{i,j=1}^n\langle v_i,v_j\rangle_V\langle w_i,w_j\rangle_W = \sum_{i,j=1}^n G_{v,ij}G_{w,ij} = \operatorname{trace}(G_vG_w^{\mathsf{T}}).
\]
Exercise 2.8a with $A := G_v^{1/2}$ and $B := G_v^{1/2}G_w^{\mathsf{T}}$ (cf. Remark 2.14a) yields $\operatorname{trace}(G_vG_w^{\mathsf{T}}) = \operatorname{trace}(G_v^{1/2}G_w^{\mathsf{T}}G_v^{1/2})$. The positive-definite matrix $G_v^{1/2}G_w^{\mathsf{T}}G_v^{1/2}$ has positive diagonal elements (cf. Remark 2.14b), proving $\langle x,x\rangle>0$.

(ii) For $d\ge 3$ the assertion follows by induction: ${}_a\bigotimes_{j=1}^d V_j = \bigl({}_a\bigotimes_{j=1}^{d-1}V_j\bigr)\otimes_a V_d$ with the scalar product of ${}_a\bigotimes_{j=1}^{d-1}V_j$ as in (4.68) but with $d$ replaced by $d-1$. □

Definition (4.68) implies that elementary tensors $\mathbf{v}$ and $\mathbf{w}$ are orthogonal if and only if $v_j\perp w_j$ for at least one index $j$. A simple observation is stated next.
Remark 4.148. Orthogonal [orthonormal] systems $\{\phi_i^{(j)} : i\in B_j\}\subset V_j$ for $1\le j\le d$ induce the orthogonal [orthonormal] system in $\mathbf{V}$ consisting of
\[
\phi_{\mathbf{i}} := \bigotimes_{j=1}^d\phi_{i_j}^{(j)} \quad\text{for all } \mathbf{i} = (i_1,\ldots,i_d)\in\mathbf{B} := B_1\times\ldots\times B_d.
\]
If $\{\phi_i^{(j)}\}$ are orthonormal bases of $V_j$, $\{\phi_{\mathbf{i}}\}$ is an orthonormal basis of $\mathbf{V}$.

Example 4.149. Consider $V_j = \mathbb{K}^{I_j}$ endowed with the Euclidean scalar product from Example 4.129. Then the induced scalar product of $\mathbf{v},\mathbf{w}\in\mathbf{V} = \bigotimes_{j=1}^d V_j$ is given by
\[
\langle\mathbf{v},\mathbf{w}\rangle = \sum_{\mathbf{i}\in\mathbf{I}}\mathbf{v}_{\mathbf{i}}\overline{\mathbf{w}_{\mathbf{i}}} = \sum_{i_1\in I_1}\cdots\sum_{i_d\in I_d}\mathbf{v}[i_1\cdots i_d]\,\overline{\mathbf{w}[i_1\cdots i_d]}.
\]
The corresponding (Euclidean) norm is denoted by $\|\cdot\|$ or more specifically by $\|\cdot\|_2$. There is a slight mismatch in the case of $d=2$. If we treat $\mathbf{v}$ as a tensor, $\|\mathbf{v}\|_2 = \sqrt{\sum_{i,j}|\mathbf{v}_{ij}|^2}$ denotes the Euclidean norm. However, if $M = \mathbf{v}$ is considered as a matrix, $\|M\|_2$ denotes the spectral norm, whereas the previous Euclidean norm is called Frobenius norm and denoted by $\|M\|_{\mathsf{F}}$ (cf. §2.3).

The standard Sobolev space $H^N$ is a Hilbert space corresponding to $p=2$ in Example 4.122. As seen in §4.3.5, $H^N$ is an intersection space with a particular intersection norm. Therefore we cannot define $H^N = \bigotimes_{j=1}^d V_j$ by the induced scalar product (4.68). The Hilbert space $V_j^{(n)}$ is endowed with the scalar product $\langle\cdot,\cdot\rangle_{j,n}$ involving the $L^2$ scalar product of the functions and their $n$-th derivatives. The space $\mathbf{V}^{(\mathbf{n})}$ and the set $\mathcal{N}$ are defined as in §4.3.5. Then the scalar product on
\[
\mathbf{V} = \bigcap_{\mathbf{n}\in\mathcal{N}}\mathbf{V}^{(\mathbf{n})}
\]
is defined by
\[
\Bigl\langle\bigotimes_{j=1}^d v^{(j)},\bigotimes_{j=1}^d w^{(j)}\Bigr\rangle := \sum_{\mathbf{n}\in\mathcal{N}}\prod_{j=1}^d\langle v^{(j)},w^{(j)}\rangle_{j,n_j} \quad\text{for all } v^{(j)},w^{(j)}\in V_j^{(N_j)}. \tag{4.69}
\]
In this definition $\mathbf{v}$ and $\mathbf{w}$ are elementary tensors of the space $\mathbf{V}_{\mathrm{mix}}$, which by Remark 4.124b is dense in $\mathbf{V}$. The bilinear (sesquilinear) form defined in (4.69) is positive since a convex combination of positive forms is again positive. The corresponding norm
\[
\|\mathbf{v}\| = \sqrt{\sum_{\mathbf{n}\in\mathcal{N}}\|\mathbf{v}\|_{\mathbf{n}}^2}
\]
is equivalent to $\max_{\mathbf{n}\in\mathcal{N}}\|\mathbf{v}\|_{\mathbf{n}}$ in (4.58b).
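Two statements of this subsection lend themselves to a quick numerical check: the product formula (4.68) for the Euclidean case of Example 4.149, and the mismatch between the Frobenius and the spectral norm for $d=2$. The sketch below assumes real data; `spectral_2x2` is a hypothetical helper that computes the largest singular value of a real $2\times 2$ matrix from the eigenvalues of $M^{\mathsf{T}}M$, and all numerical values are illustrative.

```python
from math import sqrt, isclose

def dot(v, w):
    """Euclidean scalar product on R^n (real case)."""
    return sum(x * y for x, y in zip(v, w))

def outer(v, w):
    """Coefficient array of the elementary tensor v (x) w for d = 2."""
    return [[x * y for y in w] for x in v]

def tensor_dot(S, T):
    """Induced (Euclidean) scalar product of two order-2 coefficient arrays."""
    return sum(s * t for row_s, row_t in zip(S, T)
               for s, t in zip(row_s, row_t))

def frobenius(M):
    """Euclidean norm of M regarded as a tensor (Frobenius norm)."""
    return sqrt(tensor_dot(M, M))

def spectral_2x2(M):
    """Spectral norm of a real 2x2 matrix via the eigenvalues of M^T M."""
    (a, b), (c, d) = M
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d   # M^T M = [[p,q],[q,r]]
    lam_max = 0.5 * (p + r) + sqrt((0.5 * (p - r)) ** 2 + q * q)
    return sqrt(lam_max)

# (4.68): <v1 (x) v2, w1 (x) w2> = <v1, w1> <v2, w2>
v1, v2 = [1.0, 2.0], [0.0, 1.0, -1.0]
w1, w2 = [3.0, -1.0], [2.0, 2.0, 1.0]
induced = tensor_dot(outer(v1, v2), outer(w1, w2))
product = dot(v1, w1) * dot(v2, w2)

# d = 2: the two norms differ for the identity but agree for a rank-1 matrix.
identity = [[1.0, 0.0], [0.0, 1.0]]
rank_one = outer([3.0, 4.0], [1.0, 0.0])
```

The agreement of both norms on `rank_one` reflects that both are crossnorms and therefore coincide on elementary tensors.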
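4.5.2 Crossnorms

Proposition 4.150. The norm derived from the scalar product (4.68) is a reasonable crossnorm. Furthermore, it is a strongly uniform crossnorm (cf. Definition 4.110).

Proof. (i) Taking $\mathbf{v} = \mathbf{w}$ in (4.68) shows that $\|\mathbf{v}\| = \prod_{i=1}^d\|v_i\|$ for all elementary tensors $\mathbf{v} = \bigotimes_{i=1}^d v_i\in\mathbf{V}$.

(ii) Since the dual spaces $V_i^*$ and $\mathbf{V}^*$ may be identified with $V_i$ and $\mathbf{V}$, respectively, part (i) proves the crossnorm property (4.44) for $\mathbf{V}^*$.

(iii) First we consider the finite-dimensional case. Let $\mathbf{A} = \bigotimes_{j=1}^d A^{(j)}$ with $A^{(j)}\in L(V_j,V_j)$ and $\mathbf{v}\in\mathbf{V}$. Diagonalisation yields $A^{(j)*}A^{(j)} = U_j^*D_jU_j$ ($U_j$ unitary, $D_j$ diagonal). The columns $\{\phi_i^{(j)} : 1\le i\le\dim(V_j)\}$ of $U_j^*$ form an orthonormal basis of $V_j$ consisting of eigenvectors of $A^{(j)*}A^{(j)}$. Define the orthonormal basis $\{\phi_{\mathbf{i}}\}$ according to Remark 4.148 and represent $\mathbf{v}$ as $\mathbf{v} = \sum_{\mathbf{i}}c_{\mathbf{i}}\phi_{\mathbf{i}}$. Note that $\|\mathbf{v}\|^2 = \sum_{\mathbf{i}}|c_{\mathbf{i}}|^2$ (cf. (4.64)). Then
\[
\|\mathbf{A}\mathbf{v}\|^2 = \Bigl\|\sum_{\mathbf{i}}c_{\mathbf{i}}\bigotimes_{j=1}^d A^{(j)}(\phi_{i_j}^{(j)})\Bigr\|^2 = \Bigl\langle\sum_{\mathbf{i}}c_{\mathbf{i}}\bigotimes_{j=1}^d A^{(j)}(\phi_{i_j}^{(j)}),\sum_{\mathbf{k}}c_{\mathbf{k}}\bigotimes_{j=1}^d A^{(j)}(\phi_{k_j}^{(j)})\Bigr\rangle
\]
\[
= \sum_{\mathbf{i},\mathbf{k}}c_{\mathbf{i}}\overline{c_{\mathbf{k}}}\prod_{j=1}^d\bigl\langle A^{(j)}\phi_{i_j}^{(j)},A^{(j)}\phi_{k_j}^{(j)}\bigr\rangle_j = \sum_{\mathbf{i},\mathbf{k}}c_{\mathbf{i}}\overline{c_{\mathbf{k}}}\prod_{j=1}^d\bigl\langle\phi_{i_j}^{(j)},A^{(j)*}A^{(j)}\phi_{k_j}^{(j)}\bigr\rangle_j.
\]
Since $\phi_{k_j}^{(j)}$ are eigenvectors of $A^{(j)*}A^{(j)}$, the products $\langle\phi_{i_j}^{(j)},A^{(j)*}A^{(j)}\phi_{k_j}^{(j)}\rangle$ vanish for $i_j\ne k_j$. Hence,
\[
\|\mathbf{A}\mathbf{v}\|^2 = \sum_{\mathbf{i}}|c_{\mathbf{i}}|^2\prod_{j=1}^d\bigl\langle\phi_{i_j}^{(j)},A^{(j)*}A^{(j)}\phi_{i_j}^{(j)}\bigr\rangle_j = \sum_{\mathbf{i}}|c_{\mathbf{i}}|^2\prod_{j=1}^d\bigl\|A^{(j)}\phi_{i_j}^{(j)}\bigr\|_j^2
\]
\[
\le \Bigl(\prod_{j=1}^d\|A^{(j)}\|_{V_j\leftarrow V_j}\Bigr)^2\sum_{\mathbf{i}}|c_{\mathbf{i}}|^2\underbrace{\prod_{j=1}^d\|\phi_{i_j}^{(j)}\|_j^2}_{=1} = \Bigl(\prod_{j=1}^d\|A^{(j)}\|_{V_j\leftarrow V_j}\Bigr)^2\sum_{\mathbf{i}}|c_{\mathbf{i}}|^2 = \Bigl(\prod_{j=1}^d\|A^{(j)}\|_{V_j\leftarrow V_j}\Bigr)^2\|\mathbf{v}\|^2
\]
proves that the crossnorm is uniform.

(iv) Now consider the infinite-dimensional case. The tensor $\mathbf{v}\in\mathbf{V} := {}_a\bigotimes_{j=1}^d V_j$ has some representation $\mathbf{v} = \sum_{i=1}^n\bigotimes_{j=1}^d v_i^{(j)}$ and, therefore, $\mathbf{v}\in\mathbf{V}_0 := \bigotimes_{j=1}^d V_{0,j}$ holds with the finite-dimensional subspaces $V_{0,j} := \operatorname{span}\{v_i^{(j)} : 1\le i\le n\}$. Let $\Phi_j = \Phi_j^*\in L(V_j,V_j)$ be the orthogonal projection onto $V_{0,j}$. An easy exercise shows that
\[
\|\mathbf{A}\mathbf{v}\|^2 = \langle\mathbf{v},\mathbf{A}^*\mathbf{A}\mathbf{v}\rangle = \Bigl\langle\mathbf{v},\Bigl(\bigotimes_{j=1}^d A^{(j)*}A^{(j)}\Bigr)\mathbf{v}\Bigr\rangle = \Bigl\langle\mathbf{v},\Bigl(\bigotimes_{j=1}^d\Phi_j^*A^{(j)*}A^{(j)}\Phi_j\Bigr)\mathbf{v}\Bigr\rangle.
\]
Set $C_j := \Phi_j^*A^{(j)*}A^{(j)}\Phi_j = (A^{(j)}\Phi_j)^*(A^{(j)}\Phi_j) = B^{(j)*}B^{(j)}$ for the well-defined square root $B^{(j)} := C_j^{1/2}$. Since the operator acts in the finite-dimensional subspace $V_{0,j}$ only, part (iii) applies. The desired estimate follows from
\[
\|B^{(j)}\|_{V_j\leftarrow V_j}^2 = \|B^{(j)*}B^{(j)}\|_{V_j\leftarrow V_j} = \|(A^{(j)}\Phi_j)^*(A^{(j)}\Phi_j)\|_{V_j\leftarrow V_j} = \|A^{(j)}\Phi_j\|_{V_j\leftarrow V_j}^2 \le \|A^{(j)}\|_{V_j\leftarrow V_j}^2\|\Phi_j\|_{V_j\leftarrow V_j}^2 = \|A^{(j)}\|_{V_j\leftarrow V_j}^2
\]
(cf. Remark 4.145).

(v) Let $\gamma = \alpha\cup\beta$ be a disjoint union of $\alpha,\beta\subset D = \{1,\ldots,d\}$. Define $\mathbf{V}_\alpha = \bigotimes_{j\in\alpha}V_j$ and $\mathbf{V}_\beta$ by means of the induced scalar products $\langle\cdot,\cdot\rangle_\alpha$ and $\langle\cdot,\cdot\rangle_\beta$. The scalar product $\langle\cdot,\cdot\rangle_\gamma$ of $\mathbf{V}_\gamma$ can be constructed in two ways: (i) induced by $\langle\cdot,\cdot\rangle_j$ for all $j\in\gamma$ and (ii) induced by $\langle\cdot,\cdot\rangle_\alpha$ and $\langle\cdot,\cdot\rangle_\beta$. It is easy to see that both constructions lead to identical results. The uniform crossnorm property of $\mathbf{V}_\gamma = \mathbf{V}_\alpha\otimes\mathbf{V}_\beta$ (construction (ii)) proves the strong uniformity. □

The projective crossnorm $\|\cdot\|_\wedge$ for $\ell^2(I)\times\ell^2(J)$ is discussed in Example 4.54. The result shows that the generalisation for $d\ge 3$ does not lead to a standard norm.

The injective crossnorm $\|\cdot\|_\vee$ of ${}_a\bigotimes_{j=1}^d V_j$ is defined in (4.46). For instance, $V_j = \ell^2(I_j)$ endowed with the Euclidean scalar product leads to
\[
\|\mathbf{v}\|_{\vee(\ell^2,\ldots,\ell^2)} = \sup_{\substack{0\ne w^{(j)}\in V_j\\ 1\le j\le d}}\frac{\bigl|\sum_{i_1\in I_1}\cdots\sum_{i_d\in I_d}\mathbf{v}[i_1\cdots i_d]\cdot w_{i_1}^{(1)}\cdot\ldots\cdot w_{i_d}^{(d)}\bigr|}{\|w^{(1)}\|_1\cdot\ldots\cdot\|w^{(d)}\|_d}.
\]
If $d=1$, $\|\mathbf{v}\|_\vee$ coincides with $\|\mathbf{v}\|_2$. For $d=2$, $\|\mathbf{v}\|_\vee$ is the spectral norm$^{24}$ $\|\mathbf{v}\|_2$ of $\mathbf{v}$ interpreted as matrix (cf. (2.11)).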
4.5.3 Tensor Products of $L(V_j,V_j)$

The just proved uniformity shows that the Banach spaces $\bigl(L(V_j,V_j),\|\cdot\|_{V_j\leftarrow V_j}\bigr)$ form the tensor space
\[
{}_a\bigotimes_{j=1}^d L(V_j,V_j)\subset L(\mathbf{V},\mathbf{V})
\]
and that the operator norm $\|\cdot\|_{\mathbf{V}\leftarrow\mathbf{V}}$ is a crossnorm (cf. Exercise 4.128). It is even a crossnorm in the stronger sense of (4.48c).

Note that $(L(\mathbf{V},\mathbf{V}),\|\cdot\|_{\mathbf{V}\leftarrow\mathbf{V}})$ is a Banach space but not a Hilbert space. To obtain a Hilbert space, we have to consider the space $\mathrm{HS}(V_j,V_j)$ of the Hilbert–Schmidt operators with the scalar product $\langle\cdot,\cdot\rangle_{j,\mathrm{HS}}$ (cf. Definition 4.140). The scalar products $\langle\cdot,\cdot\rangle_{j,\mathrm{HS}}$ induce the scalar product $\langle\cdot,\cdot\rangle_{\mathrm{HS}}$ on $\mathbf{H} := {}_a\bigotimes_{j=1}^d\mathrm{HS}(V_j,V_j)$. Equation (4.81) shows that $\langle\cdot,\cdot\rangle_{\mathrm{HS}}$ is defined by the trace on $\mathbf{H}$.

$^{24}$ For this reason (also for $d\ge 3$), the injective crossnorm is sometimes called the spectral norm.

Exercise 4.151. For $A^{(j)}v^{(j)} = \lambda_jv^{(j)}$ ($v^{(j)}\ne 0$) and $\mathbf{A} := \bigotimes_{j=1}^d A^{(j)}$ prove:
(a) The elementary tensor $\mathbf{v} := \bigotimes_{j=1}^d v^{(j)}$ is an eigenvector of $\mathbf{A}$ with eigenvalue $\lambda := \prod_{j=1}^d\lambda_j$, i.e., $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$.
(b) Assume that $A^{(j)}\in L(V_j,V_j)$ has $\dim(V_j)<\infty$ eigenpairs $(\lambda_j,v^{(j)})$. Then all eigenpairs $(\lambda,\mathbf{v})$ constructed in Part (a) yield the complete set of eigenpairs of $\mathbf{A}$.

Exercise 4.151b requires that all $A^{(j)}$ be diagonalisable. The next lemma treats the general case.

Lemma 4.152. Let $A^{(j)}\in\mathbb{C}^{I_j\times I_j}$ be a matrix with $\#I_j<\infty$ for $1\le j\le d$ and form the Kronecker product $\mathbf{A} := \bigotimes_{j=1}^d A^{(j)}\in\mathbb{C}^{\mathbf{I}\times\mathbf{I}}$. Let $(\lambda_{j,k})_{k\in I_j}$ be the tuple of eigenvalues of $A^{(j)}$ corresponding to their multiplicity. Then
\[
(\lambda_{\mathbf{k}})_{\mathbf{k}\in\mathbf{I}} \quad\text{with}\quad \lambda_{\mathbf{k}} := \prod_{j=1}^d\lambda_{j,k_j} \tag{4.70}
\]
represents all eigenvalues of $\mathbf{A}$ together with their multiplicity.

Note that $\lambda_{\mathbf{k}}$ might be a multiple eigenvalue for two reasons: (a) $\lambda_{j,k_j} = \lambda_{j,k_j+1} = \ldots = \lambda_{j,k_j+\mu-1}$ is a $\mu$-fold eigenvalue of $A^{(j)}$ with $\mu>1$, (b) different factors $\lambda_{j,k_j}\ne\lambda_{j,k_j'}$ may produce the same product $\lambda_{\mathbf{k}} = \lambda_{\mathbf{k}'}$.

Proof. For each matrix $A^{(j)}$ there is a unitary similarity transformation $R^{(j)} = U^{(j)}A^{(j)}U^{(j)\mathsf{H}}$ ($U^{(j)}$ unitary) into an upper triangular matrix $R^{(j)}$ (Schur normal form; cf. [140, Theorem A.34]). Hence $A^{(j)}$ and $R^{(j)}$ have identical eigenvalues including their multiplicity. Set $\mathbf{U} := \bigotimes_{j=1}^d U^{(j)}$ and $\mathbf{R} := \bigotimes_{j=1}^d R^{(j)}$. $\mathbf{U}$ is again unitary (cf. (4.80a,b)), while $\mathbf{R}\in\mathbb{C}^{\mathbf{I}\times\mathbf{I}}$ is of upper triangular form with $\lambda_{\mathbf{k}}$ in (4.70) as diagonal entries. Since the eigenvalues of triangular matrices are given by their diagonal elements (including the multiplicity), the assertion follows. □

4.5.4 Gagliardo–Nirenberg Inequality

It will be very convenient to use the induced Hilbert norm (usually Euclidean or $L^2$ norm). On the other hand, there is a need to control other norms, in particular, the supremum norm $\|\cdot\|_\infty$. The Gagliardo–Nirenberg inequality (cf. Nirenberg [232, p. 125] and Maz'ja [222]$^{25}$) relates the supremum norm to the $L^2$ norm, provided that the function is smooth enough. For the latter property, the following semi-norm is introduced,
\[
|u|_m := \sqrt{\sum_{j=1}^d\int_\Omega\Bigl|\frac{\partial^mu}{\partial x_j^m}\Bigr|^2dx}, \tag{4.71}
\]
which is defined for $u\in H^m(\Omega) = H^{m,2}(\Omega)$ (cf. Example 4.122). Note that $|\cdot|_m$ involves no mixed derivatives. We discuss the inequality for three model cases of $\Omega$: $\mathbb{R}^d$, $[0,\infty)^d$, and $[0,1]^d$. In the latter case, the semi-norm $|\cdot|_m$ must be replaced with a norm, e.g., with $\sqrt{|\cdot|_m^2+\|\cdot\|^2}$.

$^{25}$ Use the parameters $q=\infty$, $j=0$, $p=r=2$, $\ell=m$, $n=d$ in Maz'ja [222, Eq. (2.3.50)].

Theorem 4.153. Let $\Omega\in\{\mathbb{R}^d,[0,\infty)^d,[0,1]^d\}$ and suppose that $m>d/2$. Then any function $u\in H^m(\Omega)$ satisfies the inequality
\[
\|u\|_\infty \le \begin{cases} c_m^\Omega\,|u|_m^{\frac{d}{2m}}\,\|u\|_{L^2}^{1-\frac{d}{2m}} & \text{for } \Omega\in\{\mathbb{R}^d,[0,\infty)^d\},\\[4pt] c_m^\Omega\,\bigl[|u|_m^2+\|u\|_{L^2}^2\bigr]^{\frac{d}{4m}}\,\|u\|_{L^2}^{1-\frac{d}{2m}} & \text{for } \Omega = [0,1]^d.\end{cases}
\]
The constant $c_m^\Omega$ will be discussed in more detail below. For this purpose, we give an explicit proof based on the properties of certain Green functions corresponding to a variational problem described next.

Let $\Omega = \Omega_1\times\ldots\times\Omega_d$ with $\Omega_j\subset\mathbb{R}$. The scalar product of $L^2(\Omega)$ is denoted by $(\cdot,\cdot)$, while $\|\cdot\|$ is the corresponding norm. For $m\in\mathbb{N}$, we define the bilinear form $\langle u,v\rangle_m := \sum_{j=1}^d\bigl(\frac{\partial^mu}{\partial x_j^m},\frac{\partial^mv}{\partial x_j^m}\bigr)$ corresponding to the semi-norm $|\cdot|_m$ introduced above. For positive numbers $\alpha,\beta$, we define the bilinear form
\[
a(u,v) := a_{m,\alpha,\beta}^\Omega(u,v) := \alpha^2\cdot\langle u,v\rangle_m + \beta^2\cdot(u,v).
\]
The corresponding norm is denoted by
\[
|||u||| = |||u|||_{m,\alpha,\beta}^\Omega := \sqrt{a(u,u)}.
\]
The norms $|||u|||_{m,\alpha,\beta}^\Omega$, for different $\alpha,\beta>0$, are equivalent norms of the Sobolev space $H^m(\Omega)$. The Sobolev embedding theorem ensures $H^m(\Omega)\subset C(\Omega)$ for $m>\frac{d}{2}$ (cf. [141, Theorem 6.48]); i.e., $\|\cdot\|_\infty\le\gamma\cdot|||\cdot|||$ holds. We set
\[
\gamma = \gamma_{m,\alpha,\beta}^\Omega := \sup\{\|u\|_\infty/|||u|||_{m,\alpha,\beta}^\Omega : 0\ne u\in H^m(\Omega)\}. \tag{4.72}
\]
As a consequence of the embedding $H^m(\Omega)\subset C(\Omega)$ for $m>d/2$, the Dirac functional $\delta_\xi$ ($\xi\in\Omega$) with $\delta_\xi(u) = u(\xi)$ belongs to $H^m(\Omega)'$. The Green function $G_\xi = G(\cdot,\xi) = G_{m,\alpha,\beta}^\Omega(\cdot,\xi)$ is the solution of the variational formulation
\[
a_{m,\alpha,\beta}^\Omega(G_\xi,v) = v(\xi) \quad\text{for all } v\in H^m(\Omega) \text{ and a fixed } \xi\in\Omega. \tag{4.73}
\]
The solution $G_\xi\in H^m(\Omega)$ satisfies the following properties.

Lemma 4.154. (a) $G(\xi,\xi) = |||G_\xi|||^2>0$,
(b) $|u(\xi)|\le\sqrt{G(\xi,\xi)}\,|||u|||$ for all $u\in H^m(\Omega)$, and the maximum of the ratio $|u(\xi)|/|||u|||$ is taken for $u = G_\xi$.
(c) $\gamma$ in (4.72) satisfies $\gamma^2 = \sup\{|G(x,y)| : x,y\in\Omega\} = \sup\{G(\xi,\xi) : \xi\in\Omega\}$.
Proof. (i) v = Gξ in (4.73) yields G(ξ, ξ) = a(Gξ , Gξ ) > 0. p (ii) |u(ξ)| =(4.73) |a(Gξ , u)| ≤ |||Gξ ||||||u||| =(a) G(ξ, ξ)|||u|||. Equality holds for u = Gξ . (iii) u = Gy and ξ = x in (c) yield p p |G(x, y)| ≤ G(x, x)|||Gy ||| =(a) G(x, x)G(y, y); i.e., |G(x, y)| ≤ max{G(x, x), G(y, y)} and the supremum is taken along the diagonal {(ξ, ξ) : ξ ∈ Ω}. t u First, we consider the case Ω = Rd . The translation operator Tδ (δ ∈ Rd ) is defined by (Tδ u) (x) = u(x + δ). Tδ is unitary in L2 (Ω): Tδ∗ = Tδ−1 = T−δ , d and the bilinear form a = aR m,α,β satisfies a(Tδ u, v) = a(u, T−δ v).
(4.74)
Conclusion 4.155. Under assumption (4.74), $G$ depends only on the difference of its arguments: $G(x,y) = G(x-y)$. In particular, $G(x,x) = G(y,y)$ for all $x,y\in\mathbb{R}^d$, and $\gamma$ from Lemma 4.154d can be defined by $\gamma = \sqrt{G(0,0)}$.

Proof. We must show that $G(x,y) = G(x+\delta,y+\delta)$. Use
\[ G(x+\delta,y+\delta) = (T_\delta G_{y+\delta})(x) = a(G_x, T_\delta G_{y+\delta}) = a(T_\delta G_{y+\delta}, G_x) = a(G_{y+\delta}, T_{-\delta}G_x) = (T_{-\delta}G_x)(y+\delta) = G_x(y) = G(y,x) = G(x,y). \]
□

Lemma 4.156. In the case of $\Omega = [0,\infty)^d$, the maximum is also taken at $\xi = 0$: $\gamma^2 = \sup_{\xi\in\Omega} G(\xi,\xi) = G(0,0)$.

Proof. Assume that for some $0 \ne \xi \ge 0$ (pointwise inequality), $G(\xi,\xi) > G(0,0)$ holds. Define $g(x) := G(x+\xi,\xi)$ and note that $g(0) = G(\xi,\xi)$. The squared norm $|||g|||^2$ is an integral over $[0,\infty)^d$ and equal to the integral over $[\xi_1,\infty)\times\ldots\times[\xi_d,\infty)$ with $g$ replaced by $G_\xi$. Obviously, the latter integral is not larger than $|||G_\xi|||^2$, i.e., $|||g||| \le |||G_\xi|||$. By Lemma 4.154b,
\[ \frac{G(\xi,\xi)}{|||g|||} = \frac{|g(0)|}{|||g|||} \le \frac{G(0,0)}{|||G_0|||} = \sqrt{G(0,0)} \]
holds, while the previous inequality yields the contradiction
\[ \frac{G(\xi,\xi)}{|||g|||} \ge \frac{G(\xi,\xi)}{|||G_\xi|||} = \sqrt{G(\xi,\xi)} > \sqrt{G(0,0)}. \]
□

For $\Omega \in \{\mathbb{R}^d, [0,\infty)^d\}$ we can define$^{26}$ the dilatation $M_\lambda$ ($\lambda > 0$) centred at $0\in\mathbb{R}^d$ by $(M_\lambda u)(x) := u(\lambda x)$ ($x\in\Omega$). The substitution rule yields $(M_\lambda u, M_\lambda v) = \lambda^{-d}(u,v)$, and the chain rule proves $\frac{\partial^m}{\partial x_j^m}(M_\lambda u) = \lambda^m M_\lambda \frac{\partial^m u}{\partial x_j^m}$; hence,
\[ a^\Omega_{m,\alpha,\beta}(M_\lambda u, M_\lambda v) = a^\Omega_{m,\alpha\lambda^{m-d/2},\beta\lambda^{-d/2}}(u,v) = \lambda^{-d}\, a^\Omega_{m,\alpha\lambda^m,\beta}(u,v). \]
Choosing $\lambda^m = \beta/\alpha$, we can relate $a^\Omega_{m,\alpha,\beta}$ to $a^\Omega_{m,1,1}$:
\[ a^\Omega_{m,\alpha,\beta}\big(M_{(\beta/\alpha)^{1/m}} u, M_{(\beta/\alpha)^{1/m}} v\big) = (\beta/\alpha)^{-d/m}\, a^\Omega_{m,\beta,\beta}(u,v) = \beta^2 (\beta/\alpha)^{-d/m}\, a^\Omega_{m,1,1}(u,v) \]
or equivalently
\[ a^\Omega_{m,\alpha,\beta}(u,v) = \beta^2 (\beta/\alpha)^{-d/m}\, a^\Omega_{m,1,1}\big(M_{(\beta/\alpha)^{-1/m}} u, M_{(\beta/\alpha)^{-1/m}} v\big). \]

26 In general, $\Omega = \Omega_1\times\ldots\times\Omega_d$ must be assumed to be a cone (with origin at 0), i.e., $x\in\Omega$ implies $\lambda x\in\Omega$ for all $\lambda \ge 0$.
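Lemma 4.157. The Green functions $G^\Omega_{m,\alpha,\beta}$ belonging to $a^\Omega_{m,\alpha,\beta}$ satisfy a similar relation:
\[ G^\Omega_{m,\alpha,\beta}(x,y) = \beta^{-2}(\beta/\alpha)^{d/m}\, G^\Omega_{m,1,1}\big((\beta/\alpha)^{1/m}x, (\beta/\alpha)^{1/m}y\big). \tag{4.75} \]

Proof. Set $u(x) = \beta^{-2}(\beta/\alpha)^{d/m}\, G^\Omega_{m,1,1}((\beta/\alpha)^{1/m}x, (\beta/\alpha)^{1/m}y)$ and test with some function $v$:
\begin{align*}
a^\Omega_{m,\alpha,\beta}(u,v) &= a^\Omega_{m,1,1}\big(M_{(\beta/\alpha)^{-1/m}}\, G^\Omega_{m,1,1}((\beta/\alpha)^{1/m}\,\bullet, (\beta/\alpha)^{1/m}y),\; M_{(\beta/\alpha)^{-1/m}} v\big) \\
&= a^\Omega_{m,1,1}\big(G^\Omega_{m,1,1}(\bullet, (\beta/\alpha)^{1/m}y),\; M_{(\beta/\alpha)^{-1/m}} v\big) \\
&= \delta_{(\beta/\alpha)^{1/m}y}\big(M_{(\beta/\alpha)^{-1/m}} v\big) = \big(M_{(\beta/\alpha)^{-1/m}} v\big)\big((\beta/\alpha)^{1/m}y\big) = v(y) = \delta_y(v),
\end{align*}
i.e., $u$ satisfies the variational problem defining $G^\Omega_{m,\alpha,\beta}(\cdot,y)$. □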
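Since the supremum norm is invariant under dilatation, we obtain the following result from (4.75).

Lemma 4.158. The quantity $\gamma = \gamma^\Omega_{m,\alpha,\beta}$ defined in Lemma 4.154d is the following function of $\alpha, \beta$:
\[ \gamma^\Omega_{m,\alpha,\beta} = \sqrt{\|G^\Omega_{m,\alpha,\beta}(\cdot,0)\|_\infty} = \beta^{-1}\Big(\frac{\beta}{\alpha}\Big)^{\frac{d}{2m}} \sqrt{\|G^\Omega_{m,1,1}(\cdot,0)\|_\infty} = \beta^{-1}\Big(\frac{\beta}{\alpha}\Big)^{\frac{d}{2m}} \gamma^\Omega_{m,1,1}. \]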
Let $Q_a := [0,a]^d$ be the cube with side length $a$. Finally, we mention the case of $\Omega = Q_1 = [0,1]^d$. Application of the dilatation operator $M_\lambda$ to $u\in H^m(\Omega)$ yields a function $M_\lambda u\in H^m(Q_\lambda)$, where, in contrast to the situation above, the domain $Q_\lambda$ depends on $\lambda$. Now (4.75) takes the form
\[ G^\Omega_{m,\alpha,\beta}(x,y) = \beta^{-2}(\beta/\alpha)^{d/m}\, G^{Q_{(\beta/\alpha)^{1/m}}}_{m,1,1}\big((\beta/\alpha)^{1/m}x, (\beta/\alpha)^{1/m}y\big). \tag{4.76} \]
We conjecture that again $G(0,0) \ge G(\xi,\xi)$ is valid; however, the previous proof does not apply to the cube. For different domains the constants $\gamma$ compare as follows. If $\Omega' \subset \Omega''$ and $\gamma^{\Omega''}_{m,1,1} = \sqrt{G^{\Omega''}_{m,1,1}(\xi,\xi)}$ for some $\xi\in\Omega'$, then $\gamma^{\Omega'}_{m,1,1} \ge \gamma^{\Omega''}_{m,1,1}$.
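Now we return to the proof of Theorem 4.153 and specify the involved constants.

Proposition 4.159. Theorem 4.153 holds with constants $c^\Omega_m$ satisfying
\[ c^\Omega_m = \begin{cases} a_m\, \gamma^\Omega_{m,1,1} & \text{if } \Omega\in\{\mathbb{R}^d, [0,\infty)^d\}, \\ \sqrt{2}\; \gamma^{Q_{(1+|u|_m^2/\|u\|^2)^{1/(2m)}}}_{m,1,1} & \text{if } \Omega = Q_1 = [0,1]^d, \end{cases} \tag{4.77} \]
where
\[ a_m := \min\Big\{ \sin^{\frac{d}{2m}-1}(\delta)\, \cos^{-\frac{d}{2m}}(\delta) : 0 < \delta < \pi/2 \Big\} = \sin^{\frac{d}{2m}-1}\big(\tfrac12\arccos(\tfrac{d}{m}-1)\big)\, \cos^{-\frac{d}{2m}}\big(\tfrac12\arccos(\tfrac{d}{m}-1)\big) \le \sqrt{2} \]
has the asymptotic behaviour $a_m = 1 + \frac{d}{4m}\log\frac{2m}{d} + O(\frac{d}{m}) \to 1$ for $m\to\infty$. If $\Omega = \mathbb{R}^d$, $\lim_{m\to\infty} c^\Omega_m = \pi^{-d/2}$ holds.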
Proof. (i) Fix a function $u\in H^m(\Omega)$. Let $M := |u|_m$ and $L := \|u\|$, and set $\alpha := \cos(\delta)/M$ and $\beta := \sin(\delta)/L$ for any $\delta\in(0,\pi/2)$. By construction, $|||u|||^\Omega_{m,\alpha,\beta} = 1$ holds. We infer from (4.72) and Lemma 4.158 that
\begin{align*}
\|u\|_\infty &\le \gamma^\Omega_{m,\alpha,\beta}\, |||u|||^\Omega_{m,\alpha,\beta} = \gamma^\Omega_{m,\alpha,\beta} = \beta^{-1}(\beta/\alpha)^{\frac{d}{2m}}\, \gamma^\Omega_{m,1,1} \\
&= \gamma^\Omega_{m,1,1}\cdot(\beta^{-1})^{1-\frac{d}{2m}}\cdot(\alpha^{-1})^{\frac{d}{2m}} = a(\delta)\cdot\gamma^\Omega_{m,1,1}\cdot\|u\|^{1-\frac{d}{2m}}\cdot|u|_m^{\frac{d}{2m}}
\end{align*}
for $\Omega\in\{\mathbb{R}^d, [0,\infty)^d\}$. The factor $a(\delta)$ is equal to $\sin^{\frac{d}{2m}-1}(\delta)\cos^{-\frac{d}{2m}}(\delta)$. The choice $\delta = \pi/4$ yields $a(\pi/4) = \sqrt{2}$, while the minimum is taken at the value $\delta = \frac12\arccos(\frac{d}{m}-1)$, yielding the asymptotic behaviour described above.

If $\Omega = [0,1]^d$, set $M := \sqrt{|u|_m^2 + \|u\|^2}$, $L := \|u\|$, and$^{27}$ $\alpha := 1/(\sqrt{2}\,M)$, $\beta := 1/(\sqrt{2}\,L)$. Now, using (4.76), we can proceed as above. Because of the different definition of $M$, $|u|_m^2$ must be replaced with $|u|_m^2 + \|u\|^2$.

(ii) The Fourier transform (cf. [135]) shows that the constant in (4.77) is equal to
\[ c^\Omega_m = a_m\cdot(2\pi)^{-d/2} \sqrt{\int_{\mathbb{R}^d}\Big(1+\sum_{j=1}^d \xi_j^{2m}\Big)^{-1} d\xi}. \]
For $m\to\infty$, the function $(1+\sum_{j=1}^d \xi_j^{2m})^{-1}$ tends to 0 when $\|\xi\|_\infty > 1$, and to 1 when $\|\xi\|_\infty < 1$. Together with $a_m\to 1$, the statement $c^\Omega_m\to\pi^{-d/2}$ follows. □

27 The estimate is not optimised concerning the choice of $\alpha$ and $\beta$. The present choice corresponds to $\cos(\delta) = \sin(\delta) = 1/\sqrt{2}$ in the previous proof. Hence, $a_m = \sqrt{2}$ may be improved.
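4.5.5 Partial Scalar Products

Let $X := V_1\otimes_a W$ and $Y := V_2\otimes_a W$ be two tensor spaces sharing a pre-Hilbert space $(W, \langle\cdot,\cdot\rangle_W)$. We define a sesquilinear form (again denoted by $\langle\cdot,\cdot\rangle_W$) via
\[ \langle\cdot,\cdot\rangle_W : X\times Y\to V_1\otimes_a V_2, \qquad \langle v_1\otimes w_1, v_2\otimes w_2\rangle_W := \langle w_1,w_2\rangle_W\cdot v_1\otimes v_2 \]
for $v_1\in V_1$, $v_2\in V_2$, and $w_1,w_2\in W$. We call this operation a partial scalar product, since it acts on the $W$ part only.

We now assume $V_1 = V_2$ so that $X = Y$. We rename $X$ and $Y$ by $\mathbf{V}$ with the usual structure $\mathbf{V} = {}_a\bigotimes_{j\in D} V_j$, $D = \{1,\ldots,d\}$. In this case $W$ from above corresponds to $\mathbf{V}_\alpha = {}_a\bigotimes_{j\in\alpha} V_j$ for a nonempty subset $\alpha\subset D$. The notation $\langle\cdot,\cdot\rangle_W$ is replaced with $\langle\cdot,\cdot\rangle_\alpha$:
\[ \langle\cdot,\cdot\rangle_\alpha : \mathbf{V}\times\mathbf{V}\to\mathbf{V}_{D\setminus\alpha}\otimes_a\mathbf{V}_{D\setminus\alpha}, \qquad \Big\langle \bigotimes_{j=1}^d v_j, \bigotimes_{j=1}^d w_j \Big\rangle_\alpha := \Big[\prod_{j\in\alpha}\langle v_j,w_j\rangle\Big]\cdot\Big(\bigotimes_{j\in D\setminus\alpha} v_j\Big)\otimes\Big(\bigotimes_{j\in D\setminus\alpha} w_j\Big). \]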
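The partial scalar product $\langle\cdot,\cdot\rangle_\alpha : \mathbf{V}\times\mathbf{V}\to\mathbf{V}_{\alpha^c}\otimes_a\mathbf{V}_{\alpha^c}$ can be constructed as composition of the following two mappings: 1) sesquilinear concatenation $(\mathbf{v},\mathbf{w})\mapsto\mathbf{v}\otimes\mathbf{w}\in\mathbf{V}\otimes_a\mathbf{V}$ followed by 2) contractions$^{28}$ explained below.

Definition 4.160. For a nonempty, finite index set $D$ let $\mathbf{V} = \mathbf{V}_D = {}_a\bigotimes_{j\in D} V_j$ be a pre-Hilbert space with induced scalar product. For any $j\in D$, the contraction $C_j : \mathbf{V}\otimes_a\mathbf{V}\to\mathbf{V}_{D\setminus\{j\}}\otimes_a\mathbf{V}_{D\setminus\{j\}}$ is defined by
\[ C_j\Big(\Big(\bigotimes_{k\in D} v_k\Big)\otimes\Big(\bigotimes_{k\in D} w_k\Big)\Big) := \langle v_j,w_j\rangle\,\Big(\bigotimes_{k\in D\setminus\{j\}} v_k\Big)\otimes\Big(\bigotimes_{k\in D\setminus\{j\}} w_k\Big). \]
For a subset $\alpha\subset D$, the contraction $C_\alpha : \mathbf{V}\otimes_a\mathbf{V}\to\mathbf{V}_{D\setminus\alpha}\otimes_a\mathbf{V}_{D\setminus\alpha}$ is the product $C_\alpha = \prod_{j\in\alpha} C_j$ with the action
\[ C_\alpha\Big(\Big(\bigotimes_{j\in D} v_j\Big)\otimes\Big(\bigotimes_{j\in D} w_j\Big)\Big) = \Big[\prod_{j\in\alpha}\langle v_j,w_j\rangle\Big]\cdot\Big(\bigotimes_{j\in D\setminus\alpha} v_j\Big)\otimes\Big(\bigotimes_{j\in D\setminus\alpha} w_j\Big). \]

28 In tensor algebras, contractions are applied to tensors from $\bigotimes_j V_j$, where $V_j$ is either the space $V$ or its dual $V'$. If, e.g., $V_1 = V'$ and $V_2 = V$, the corresponding contraction is defined by $\bigotimes_j v^{(j)}\mapsto v^{(1)}(v^{(2)})\cdot\bigotimes_{j\ge 3} v^{(j)}$ (cf. Greub [125, p. 72]).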
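§5.2 will show additional matrix interpretations of these partial scalar products. The definition allows us to compute partial scalar products recursively. Formally, we may define $C_\emptyset(\mathbf{v}) := \mathbf{v}$ and $\langle\mathbf{v},\mathbf{w}\rangle_\emptyset := \mathbf{v}\otimes\mathbf{w}$.

Corollary 4.161. If $\emptyset\subsetneqq\alpha\subsetneqq\beta\subset D$, then $\langle\mathbf{v},\mathbf{w}\rangle_\beta = C_{\beta\setminus\alpha}(\langle\mathbf{v},\mathbf{w}\rangle_\alpha)$.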
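The recursion of Corollary 4.161 can be illustrated for elementary tensors with real entries. The following is only a sketch: the representation of an elementary tensor as a dictionary `{j: vector}` and the function names `dot` and `partial_scalar_product` are choices made for this illustration, not notation from the text.

```python
def dot(x, y):
    """Euclidean scalar product of two real vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))


def partial_scalar_product(v, w, alpha):
    """<v, w>_alpha for elementary tensors v, w given as dicts {j: vector}.

    Returns (c, v_rest, w_rest), standing for the tensor
    c * (tensor product of v_rest) ⊗ (tensor product of w_rest).
    """
    c = 1
    for j in alpha:
        c *= dot(v[j], w[j])          # contract direction j
    v_rest = {j: x for j, x in v.items() if j not in alpha}
    w_rest = {j: x for j, x in w.items() if j not in alpha}
    return c, v_rest, w_rest


# Elementary tensors of order d = 3 with D = {1, 2, 3}.
v = {1: [1, 2], 2: [0, 1, 1], 3: [2, -1]}
w = {1: [3, 1], 2: [1, 1, 0], 3: [1, 1]}

# Direct evaluation for beta = {1, 3}.
direct = partial_scalar_product(v, w, {1, 3})

# Recursive evaluation: first alpha = {1}, then the contraction over beta \ alpha = {3}.
c1, v1, w1 = partial_scalar_product(v, w, {1})
c2, v2, w2 = partial_scalar_product(v1, w1, {3})
recursive = (c1 * c2, v2, w2)
```

Both routes leave the factors of direction 2 untouched and produce the same scalar factor, as the corollary predicts.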
4.6 Tensor Operations In this section we specify operations which later are to be realised numerically in the various formats. With regard to practical applications, we mainly focus on a finite-dimensional setting. In §13 we shall quantify the arithmetical cost of the various operations.
4.6.1 Vector Operations

The trivial vector space operations are the scalar multiplication $\lambda\cdot\mathbf{v}$ ($\lambda\in\mathbb{K}$, $\mathbf{v}\in{}_a\bigotimes_{j=1}^d V_j$) and the addition $\mathbf{v}+\mathbf{w}$. By definition, $\mathbf{v},\mathbf{w}\in{}_a\bigotimes_{j=1}^d V_j$ have representations as finite linear combinations. Obviously, the sum might have a representation with even more terms. This will become a source of trouble.

The scalar product of two elementary tensors $\mathbf{v} = \bigotimes_{j=1}^d v_j$ and $\mathbf{w} = \bigotimes_{j=1}^d w_j$ reduces by the definition to the scalar product of the simple vectors $v_j, w_j\in V_j$:
\[ \langle\mathbf{v},\mathbf{w}\rangle = \prod_{j=1}^d \langle v_j,w_j\rangle. \tag{4.78} \]
The naive computation of the scalar product by
\[ \langle\mathbf{v},\mathbf{w}\rangle = \sum_{\mathbf{i}\in I^d} \mathbf{v}_{\mathbf{i}}\,\overline{\mathbf{w}_{\mathbf{i}}} \]
would be much too costly. Therefore the reduction to scalar products in $V_j$ is very helpful.

General vectors $\mathbf{v},\mathbf{w}\in{}_a\bigotimes_{j=1}^d V_j$ are sums of elementary tensors. Assume that the (minimal) number of terms is $n_v$ and $n_w$, respectively. Then $n_v n_w$ scalar products (4.78) must be performed and added. Again, it becomes obvious that large numbers $n_v$ and $n_w$ cause problems.

The evaluation of $\langle\mathbf{v},\mathbf{w}\rangle$ is not restricted to the discrete setting $V_j = \mathbb{R}^{I_j}$ with finite index sets $I_j$. Assume the infinite-dimensional case of continuous functions from $V_j = C([0,1])$. As long as $v_j, w_j$ belong to a (possibly infinite) family of functions for which the scalar products $\langle v_j,w_j\rangle = \int_0^1 v_j(x)w_j(x)\,dx$ are exactly known, the evaluation of $\langle\mathbf{v},\mathbf{w}\rangle$ can be realised.
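To make the cost argument concrete, the following sketch compares the factor-wise formula (4.78) with the naive entry-wise sum for small real vectors. The helper names (`scalar_product_elementary`, `scalar_product_naive`, `scalar_product_sums`) and the list-based representation are assumptions of this illustration.

```python
from itertools import product
from math import prod


def dot(x, y):
    """Scalar product of two real vectors in V_j."""
    return sum(a * b for a, b in zip(x, y))


def scalar_product_elementary(vs, ws):
    """Formula (4.78): product of d scalar products in the spaces V_j."""
    return prod(dot(v, w) for v, w in zip(vs, ws))


def full_entries(factors):
    """All entries of the elementary tensor, indexed by index tuples."""
    ranges = [range(len(f)) for f in factors]
    return {idx: prod(f[i] for f, i in zip(factors, idx))
            for idx in product(*ranges)}


def scalar_product_naive(vs, ws):
    """Entry-wise sum over the full index set (cost grows like n^d)."""
    full_v, full_w = full_entries(vs), full_entries(ws)
    return sum(full_v[idx] * full_w[idx] for idx in full_v)


def scalar_product_sums(terms_v, terms_w):
    """Sums of elementary tensors: n_v * n_w evaluations of (4.78)."""
    return sum(scalar_product_elementary(vs, ws)
               for vs in terms_v for ws in terms_w)


vs = [[1, 2], [0, 1, 1], [2, -1]]
ws = [[3, 1], [1, 1, 0], [1, 1]]
cheap = scalar_product_elementary(vs, ws)
naive = scalar_product_naive(vs, ws)
mixed = scalar_product_sums([vs, ws], [ws])   # <v + w, w> with n_v = 2, n_w = 1
```

The factor-wise version touches only the $d$ short vectors, whereas the naive version enumerates every entry of the full tensor.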
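4.6.2 Matrix-Vector Multiplication

Again, we consider $V_j = \mathbb{K}^{I_j}$ and $\mathbf{V} = \mathbb{K}^{\mathbf{I}} = \bigotimes_{j=1}^d V_j$ with $\mathbf{I} = I_1\times\ldots\times I_d$. Matrices from $\mathbb{K}^{\mathbf{I}\times\mathbf{I}}$ are described by Kronecker products in $\bigotimes_{j=1}^d \mathbb{K}^{I_j\times I_j}$. For elementary tensors
\[ \mathbf{A} = \bigotimes_{j=1}^d A^{(j)}\in\bigotimes_{j=1}^d \mathbb{K}^{I_j\times I_j} \quad\text{and}\quad \mathbf{v} = \bigotimes_{j=1}^d v^{(j)}\in\bigotimes_{j=1}^d \mathbb{K}^{I_j} \]
the evaluation of
\[ \mathbf{A}\mathbf{v} = \bigotimes_{j=1}^d A^{(j)}v^{(j)} \]
requires $d$ simple matrix-vector multiplications, whereas the naive evaluation of $\mathbf{A}\mathbf{v}$ may be beyond the computer capacities.

The same holds for a rectangular matrix $\mathbf{A}\in\mathbb{K}^{\mathbf{I}\times\mathbf{J}} = \bigotimes_{j=1}^d \mathbb{K}^{I_j\times J_j}$.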
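As a small check of this rule, the sketch below forms the full Kronecker matrix and the full vector for $d = 3$ and compares the result with the Kronecker product of the $d$ small matrix-vector products. The helpers `matvec`, `kron_vec` and `kron_mat` are written out for this illustration and assume a lexicographical ordering of the index tuples.

```python
from functools import reduce


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def kron_vec(x, y):
    """Kronecker product of two vectors (lexicographical ordering)."""
    return [a * b for a in x for b in y]


def kron_mat(A, B):
    """Kronecker product of two matrices, same ordering as kron_vec."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


As = [[[1, 2], [3, 4]], [[0, 1], [1, 1]], [[2, 0], [1, -1]]]
vs = [[1, 1], [2, -1], [1, 3]]

# Naive: build the 8 x 8 matrix and the vector of length 8, then multiply.
naive = matvec(reduce(kron_mat, As), reduce(kron_vec, vs))

# Structured: d small matrix-vector products, result kept as elementary tensor.
factors = [matvec(A, v) for A, v in zip(As, vs)]
structured = reduce(kron_vec, factors)
```

In the structured variant the full vector is only formed at the very end for the comparison; in practice one would keep the list `factors`.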
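4.6.3 Matrix-Matrix Operations

For the addition of Kronecker tensors $\mathbf{A},\mathbf{B}\in\bigotimes_{j=1}^d \mathbb{K}^{I_j\times I_j}$ the same statement holds as for the addition of vectors. The multiplication rule for elementary Kronecker tensors is
\[ \Big(\bigotimes_{j=1}^d A^{(j)}\Big)\Big(\bigotimes_{j=1}^d B^{(j)}\Big) = \bigotimes_{j=1}^d A^{(j)}B^{(j)} \quad\text{for all } A^{(j)}, B^{(j)}\in\mathbb{K}^{I_j\times I_j}. \tag{4.79} \]
Similarly for $A^{(j)}\in\mathbb{K}^{I_j\times J_j}$, $B^{(j)}\in\mathbb{K}^{J_j\times K_j}$. If $\mathbf{A}$ ($\mathbf{B}$) is a linear combination of $n_A$ ($n_B$) elementary Kronecker tensors, $n_A n_B$ evaluations of (4.79) are needed.

Further rules for elementary Kronecker tensors are:
\[ \Big(\bigotimes_{j=1}^d A^{(j)}\Big)^{-1} = \bigotimes_{j=1}^d \big(A^{(j)}\big)^{-1} \quad\text{for all invertible } A^{(j)}\in\mathbb{K}^{I_j\times I_j}, \tag{4.80a} \]
\[ \Big(\bigotimes_{j=1}^d A^{(j)}\Big)^{T} = \bigotimes_{j=1}^d \big(A^{(j)}\big)^{T} \quad\text{for all } A^{(j)}\in\mathbb{K}^{I_j\times J_j}. \tag{4.80b} \]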
Exercise 4.162. Assume that all matrices $A^{(j)}\in\mathbb{K}^{I_j\times J_j}$ have one of the properties {regular, symmetric, Hermitian, positive-definite, diagonal, lower triangular, upper triangular, orthogonal, unitary, positive, permutation matrix}. Show that the Kronecker matrix $\bigotimes_{j=1}^d A^{(j)}$ possesses the same property. What statements hold for negative, negative-definite, or antisymmetric matrices $A^{(j)}$?

Exercise 4.163. Let $\mathbf{A} := \bigotimes_{j=1}^d A^{(j)}$. Assume that one of the decompositions
\[ A^{(j)} = Q^{(j)}R^{(j)} \ \text{(QR)}, \qquad A^{(j)} = L^{(j)}L^{(j)H} \ \text{(Cholesky)}, \qquad\text{or}\qquad A^{(j)} = U^{(j)}\Sigma^{(j)}V^{(j)T} \ \text{(SVD)} \]
is given for all $1\le j\le d$. Prove that $\mathbf{A}$ possesses the respective decomposition $\mathbf{Q}\mathbf{R}$ (QR), $\mathbf{L}\mathbf{L}^H$ (Cholesky), $\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$ (SVD) with the Kronecker matrices
\[ \mathbf{Q} := \bigotimes_{j=1}^d Q^{(j)}, \qquad \mathbf{R} := \bigotimes_{j=1}^d R^{(j)}, \qquad\text{etc.} \]
Exercise 4.164. Prove the following statements about the matrix rank (cf. Remark 2.1):
\[ \operatorname{rank}\Big(\bigotimes_{j=1}^d A^{(j)}\Big) = \prod_{j=1}^d \operatorname{rank}(A^{(j)}), \]
and the trace of a matrix (cf. (2.8)):
\[ \operatorname{trace}\Big(\bigotimes_{j=1}^d A^{(j)}\Big) = \prod_{j=1}^d \operatorname{trace}(A^{(j)}). \tag{4.81} \]

The determinant involving $A^{(j)}\in\mathbb{K}^{I_j\times I_j}$ is equal to
\[ \det\Big(\bigotimes_{j=1}^d A^{(j)}\Big) = \prod_{j=1}^d \big(\det A^{(j)}\big)^{n_{[j]}} \quad\text{with } n_{[j]} := \prod_{k\in\{1,\ldots,d\}\setminus\{j\}} \#I_k. \]
The latter identity for $d = 2$ is treated in the historical paper by Zehfuss [308] (cf. §1.6): matrices $A\in\mathbb{K}^{p\times p}$ and $B\in\mathbb{K}^{q\times q}$ lead to the determinant
\[ \det(A\otimes B) = (\det A)^q (\det B)^p. \]
The connection with the Frobenius scalar product (2.10) is given by $\langle A\otimes B, X\otimes Y\rangle_F = \langle A,X\rangle_F\,\langle B,Y\rangle_F$. Further statements about elementary Kronecker products can be found in Langville–Stewart [208] and Van Loan–Pitsianis [292].
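The trace rule (4.81) and the determinant identity of Zehfuss can be verified on a small integer example. This is a sketch only; `kron`, `trace` and the Laplace-expansion `det` are helper functions written for this illustration and are suitable for tiny matrices only.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def trace(M):
    """Sum of the diagonal entries."""
    return sum(M[i][i] for i in range(len(M)))


def det(M):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


A = [[2, 1], [1, 3]]                     # p = 2, det A = 5, trace A = 5
B = [[1, 2, 0], [0, 1, 1], [1, 0, 2]]    # q = 3, det B = 4, trace B = 4
p, q = len(A), len(B)
AB = kron(A, B)

trace_lhs, trace_rhs = trace(AB), trace(A) * trace(B)
det_lhs, det_rhs = det(AB), det(A) ** q * det(B) ** p
```

Note the exchanged exponents: $\det A$ is raised to the size of $B$ and vice versa.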
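4.6.4 Hadamard Multiplication

As seen in §1.1.3, univariate functions may be the subject of a tensor product, producing multivariate functions. For two functions $f(x)$ and $g(x)$ of the variable $x = (x_1,\ldots,x_d)\in[0,1]^d$, pointwise multiplication $f\cdot g$ is a standard operation. Replace $[0,1]^d$ by a finite grid
\[ G_n := \{x_{\mathbf{i}} : \mathbf{i}\in\mathbf{I}\}\subset[0,1]^d, \quad\text{where}\quad \mathbf{I} = \{\mathbf{i} = (i_1,\ldots,i_d) : 0\le i_j\le n\}, \quad x_{\mathbf{i}} = (x_{i_1},\ldots,x_{i_d})\in[0,1]^d, \quad x_\nu = \nu/n. \]
Then the entries $\mathbf{a}_{\mathbf{i}} := f(x_{\mathbf{i}})$ and $\mathbf{b}_{\mathbf{i}} := g(x_{\mathbf{i}})$ define tensors in $\mathbb{K}^{\mathbf{I}} = \bigotimes_{j=1}^d \mathbb{K}^{I_j}$, where $I_j = \{0,\ldots,n\}$. Pointwise multiplication $f\cdot g$ corresponds to entry-wise multiplication of $\mathbf{a}$ and $\mathbf{b}$, which is called the Hadamard product:$^{29}$
\[ \mathbf{a}\odot\mathbf{b}\in\mathbb{K}^{\mathbf{I}} \quad\text{with entries}\quad (\mathbf{a}\odot\mathbf{b})_{\mathbf{i}} = \mathbf{a}_{\mathbf{i}}\mathbf{b}_{\mathbf{i}} \quad\text{for all } \mathbf{i}\in\mathbf{I}. \tag{4.82} \]
Performing the multiplication for all entries would be too costly. For elementary tensors it is much cheaper to use
\[ \Big(\bigotimes_{j=1}^d a^{(j)}\Big)\odot\Big(\bigotimes_{j=1}^d b^{(j)}\Big) = \bigotimes_{j=1}^d \big(a^{(j)}\odot b^{(j)}\big). \]
The following rules are valid:
\[ \mathbf{a}\odot\mathbf{b} = \mathbf{b}\odot\mathbf{a}, \qquad (\mathbf{a}'+\mathbf{a}'')\odot\mathbf{b} = \mathbf{a}'\odot\mathbf{b} + \mathbf{a}''\odot\mathbf{b}, \qquad \mathbf{a}\odot(\mathbf{b}'+\mathbf{b}'') = \mathbf{a}\odot\mathbf{b}' + \mathbf{a}\odot\mathbf{b}''. \]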
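The rule for elementary tensors can be checked for $d = 2$ as follows. The helper names `kron_vec` and `hadamard` are assumptions of this sketch; tensors of order two are stored as vectors in lexicographical ordering.

```python
def kron_vec(x, y):
    """Entries of x ⊗ y in lexicographical ordering."""
    return [a * b for a in x for b in y]


def hadamard(x, y):
    """Entry-wise (Hadamard) product of two vectors of equal length."""
    return [a * b for a, b in zip(x, y)]


a1, a2 = [1, 2], [3, 0, 1]
b1, b2 = [2, 5], [1, 4, 2]

# Entry-wise product of the full tensors (cost: all #I_1 * #I_2 entries).
full = hadamard(kron_vec(a1, a2), kron_vec(b1, b2))

# Factor-wise product; the result is again an elementary tensor.
factor_wise = kron_vec(hadamard(a1, b1), hadamard(a2, b2))
```

Only the two short Hadamard products are needed if the result may stay in factored form.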
29 Although the name ‘Hadamard product’ for this product is widely used, it does not go back to Hadamard. However, Issai Schur mentions this product in his paper [258] from 1911. In this sense, the term ‘Schur product’ would be more correct.

4.6.5 Convolution

There are various versions of a convolution $a\star b$. First, we consider sequences in $\ell_0(\mathbb{Z})$ (cf. Example 3.1). The convolution in $\mathbb{Z}$ is defined by
\[ c := a\star b \quad\text{with}\quad c_\nu = \sum_{\mu\in\mathbb{Z}} a_\mu b_{\nu-\mu} \qquad (a,b,c\in\ell_0(\mathbb{Z})). \tag{4.83a} \]
Sequences $a\in\ell_0(\mathbb{N}_0)$ can be embedded into $a^0\in\ell_0(\mathbb{Z})$ by setting $a^0_i = a_i$ for $i\in\mathbb{N}_0$ and $a^0_i = 0$ for $i < 0$. Omitting the zero terms in (4.83a) yields the convolution in $\mathbb{N}_0$:
\[ c := a\star b \quad\text{with}\quad c_\nu = \sum_{\mu=0}^{\nu} a_\mu b_{\nu-\mu} \qquad (a,b,c\in\ell_0(\mathbb{N}_0)). \tag{4.83b} \]
The convolution of two vectors $a = (a_0,\ldots,a_{n-1})$ and $b = (b_0,\ldots,b_{m-1})$, with possibly $n\ne m$, yields
\[ c := a\star b \quad\text{with}\quad c_\nu = \sum_{\mu=\max\{0,\nu-m+1\}}^{\min\{n-1,\nu\}} a_\mu b_{\nu-\mu} \quad\text{for } 0\le\nu\le n+m-2. \tag{4.83c} \]
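Note that the resulting vector has increased length: $c = (c_0,\ldots,c_{n+m-2})$.

For finite tuples $a = (a_0,a_1,\ldots,a_{n-1})\in\ell(\{0,1,\ldots,n-1\}) = \mathbb{K}^n$, the periodic convolution (with period $n$) is explained by
\[ c := a\star b \quad\text{with}\quad c_\nu = \sum_{\mu=0}^{n-1} a_\mu b_{[\nu-\mu]} \qquad (a,b,c\in\mathbb{K}^n), \tag{4.83d} \]
where $[\nu-\mu]$ is the residue class modulo $n$, i.e., $[m]\in\{0,1,\ldots,n-1\}$ with $[m]-m$ being a multiple of $n$.

Remark 4.165. For $a,b\in\mathbb{K}^n$ let $c\in\mathbb{K}^{2n-1}$ be the result of (4.83c) and define $c^{\mathrm{per}}\in\mathbb{K}^n$ by $c^{\mathrm{per}}_\nu := c_\nu + c_{\nu+n}$ for $0\le\nu\le n-2$ and $c^{\mathrm{per}}_{n-1} := c_{n-1}$. Then $c^{\mathrm{per}}$ is the periodic convolution result in (4.83d).

The index sets $\mathbb{Z}$, $\mathbb{N}$, $I_n := \{0,1,\ldots,n-1\}$ may be replaced with the $d$-fold products $\mathbb{Z}^d$, $\mathbb{N}^d$, $I_n^d$. For instance, (4.83a) becomes
\[ \mathbf{c} := \mathbf{a}\star\mathbf{b} \quad\text{with}\quad \mathbf{c}_{\boldsymbol{\nu}} = \sum_{\boldsymbol{\mu}\in\mathbb{Z}^d} \mathbf{a}_{\boldsymbol{\mu}}\mathbf{b}_{\boldsymbol{\nu}-\boldsymbol{\mu}} \qquad (\mathbf{a},\mathbf{b},\mathbf{c}\in\ell_0(\mathbb{Z}^d)). \]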
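A direct transcription of (4.83c), (4.83d) and the folding step of Remark 4.165 may look as follows. The function names `conv_full`, `conv_periodic` and `fold_periodic` are chosen for this sketch; the summation limits are taken verbatim from the formulas.

```python
def conv_full(a, b):
    """Convolution (4.83c) of vectors of length n and m; result has length n+m-1."""
    n, m = len(a), len(b)
    c = [0] * (n + m - 1)
    for nu in range(n + m - 1):
        for mu in range(max(0, nu - m + 1), min(n - 1, nu) + 1):
            c[nu] += a[mu] * b[nu - mu]
    return c


def conv_periodic(a, b):
    """Periodic convolution (4.83d) with period n = len(a) = len(b)."""
    n = len(a)
    return [sum(a[mu] * b[(nu - mu) % n] for mu in range(n)) for nu in range(n)]


def fold_periodic(c, n):
    """Remark 4.165: c_per[nu] = c[nu] + c[nu+n] for nu <= n-2, c_per[n-1] = c[n-1]."""
    return [c[nu] + c[nu + n] for nu in range(n - 1)] + [c[n - 1]]


a = [1, 2, 3]
b = [4, 5, 6]
c = conv_full(a, b)
c_per = fold_periodic(c, len(a))
c_direct = conv_periodic(a, b)
```

For these data the non-periodic result has length $2n-1 = 5$, and folding its tail onto the first $n-1$ entries reproduces the periodic convolution.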
For any $I\in\{\mathbb{Z},\mathbb{N},I_n\}$, the space $\ell_0(I^d)$ is isomorphic to ${}_a\!\otimes^d\ell_0(I)$. For elementary tensors $\mathbf{a},\mathbf{b}\in{}_a\!\otimes^d\ell_0(I)$, we may apply the following rule:
\[ \Big(\bigotimes_{j=1}^d a^{(j)}\Big)\star\Big(\bigotimes_{j=1}^d b^{(j)}\Big) = \bigotimes_{j=1}^d \big(a^{(j)}\star b^{(j)}\big), \qquad a^{(j)}, b^{(j)}\in\ell_0(I). \tag{4.84} \]
Note that $\mathbf{a}\star\mathbf{b}$ is again an elementary tensor. Since almost all entries of $a\in\ell_0$ are zero, the sums in (4.83a) and (4.84) contain only finitely many nonzero terms. If we replace $\ell_0$ with some Banach space $\ell^p$, the latter sums may contain infinitely many terms and we have to check their convergence.

Lemma 4.166. For $a\in\ell^p(\mathbb{Z})$ and $b\in\ell^1(\mathbb{Z})$, the sum in (4.83a) is finite and produces $a\star b\in\ell^p(\mathbb{Z})$ for all $1\le p\le\infty$; furthermore,
\[ \|a\star b\|_{\ell^p(\mathbb{Z})} \le \|a\|_{\ell^p(\mathbb{Z})}\,\|b\|_{\ell^1(\mathbb{Z})}. \]
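Proof. Choose any $d\in\ell^q(\mathbb{Z})$ with $\|d\|_{\ell^q(\mathbb{Z})} = 1$ and $\frac1p+\frac1q = 1$. Then the scalar product $\langle d, a\star b\rangle = \sum_{\nu,\mu\in\mathbb{Z}} a_\mu b_{\nu-\mu} d_\nu$ can be written as $\sum_{\alpha\in\mathbb{Z}} b_\alpha \sum_{\nu\in\mathbb{Z}} a_{\nu-\alpha} d_\nu$. Since also the shifted sequence $(a_{\nu-\alpha})_{\nu\in\mathbb{Z}}$ has the norm $\|a\|_{\ell^p(\mathbb{Z})}$, we obtain
\[ \Big|\sum_{\nu\in\mathbb{Z}} a_{\nu-\alpha} d_\nu\Big| \le \|a\|_{\ell^p(\mathbb{Z})}\,\|d\|_{\ell^q(\mathbb{Z})} = \|a\|_{\ell^p(\mathbb{Z})}. \]
$|\langle d, a\star b\rangle|$ can be estimated by $\sum_{\alpha\in\mathbb{Z}} |b_\alpha|\,\|a\|_{\ell^p(\mathbb{Z})} = \|a\|_{\ell^p(\mathbb{Z})}\,\|b\|_{\ell^1(\mathbb{Z})}$. Since $\ell^q$ is isomorphic to $(\ell^p)'$ for $1\le p<\infty$, the assertion is proved except for $p = \infty$. The latter case is an easy conclusion from $|c_\nu| \le \|a\|_{\ell^\infty(\mathbb{Z})}\,\|b\|_{\ell^1(\mathbb{Z})}$ (cf. (4.83a)). □

So far, discrete convolutions have been described. Analogous integral versions for univariate functions are
\[ (f\star g)(x) = \int_{-\infty}^{\infty} f(t)g(x-t)\,dt, \qquad (f\star g)(x) = \int_0^x f(t)g(x-t)\,dt, \tag{4.85a} \]
\[ (f\star g)(x) = \int_0^1 f(t)g([x-t])\,dt, \quad\text{where } [x]\in[0,1),\ [x]-x\in\mathbb{Z}. \]
The multivariate analogue of (4.85a) is
\[ (f\star g)(x) = \int_{\mathbb{R}^d} f(t_1,\ldots,t_d)\, g(x_1-t_1,\ldots,x_d-t_d)\,dt_1\ldots dt_d. \]
Again, elementary tensors $f(x) = \prod_{j=1}^d f^{(j)}(x_j)$ and $g(x) = \prod_{j=1}^d g^{(j)}(x_j)$ satisfy the counterpart of (4.84):
\[ \Big(\bigotimes_{j=1}^d f^{(j)}\Big)\star\Big(\bigotimes_{j=1}^d g^{(j)}\Big) = \bigotimes_{j=1}^d \big(f^{(j)}\star g^{(j)}\big), \tag{4.85b} \]
i.e., the $d$-dimensional convolution can be reduced to $d$ one-dimensional ones.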
4.6.6 Function of a Matrix

A square matrix $M$ of the size $n\times n$ has $n$ eigenvalues $\lambda_i\in\mathbb{C}$ ($1\le i\le n$, counted according to their multiplicity). They form the spectrum $\sigma(M) := \{\lambda\in\mathbb{C} : \lambda \text{ eigenvalue of } M\}$. The spectral radius is defined by
\[ \rho(M) := \max\{|\lambda| : \lambda\in\sigma(M)\}. \tag{4.86} \]
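Let $f : \Omega\subset\mathbb{C}\to\mathbb{C}$ be a holomorphic function$^{30}$ in an open domain $\Omega$. The application of $f$ to a matrix $M$ is possible if $\sigma(M)\subset\Omega$.

Proposition 4.167. (a) Assume $M\in\mathbb{C}^{I\times I}$ and let $D$ be an (open) domain with $\sigma(M)\subset D\subset\overline{D}\subset\Omega$. Then a holomorphic function on $\Omega$ gives rise to a matrix $f(M)\in\mathbb{C}^{I\times I}$ defined by
\[ f(M) := \frac{1}{2\pi i}\int_{\partial D} (\zeta I - M)^{-1} f(\zeta)\,d\zeta. \tag{4.87} \]
(b) Assume that $f(z) = \sum_{\nu=0}^\infty a_\nu z^\nu$ converges for $|z| < R$ with $R > \rho(M)$. Then an equivalent definition of $f(M)$ is
\[ f(M) = \sum_{\nu=0}^\infty a_\nu M^\nu. \]
Important functions are, e.g.,
\[ f(z) = \exp(z), \qquad f(z) = \exp(\sqrt{z}) \ \text{ if } \sigma(M)\subset\{z\in\mathbb{C} : \Re e\, z > 0\}, \qquad f(z) = 1/z \ \text{ if } 0\notin\sigma(M). \]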
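Lemma 4.168. If $f(M)$ is defined, then
\[ f(I\otimes\ldots\otimes I\otimes M\otimes I\otimes\ldots\otimes I) = I\otimes\ldots\otimes I\otimes f(M)\otimes I\otimes\ldots\otimes I. \]

Proof. Set $\mathbf{M} := I\otimes\ldots\otimes I\otimes M\otimes I\otimes\ldots\otimes I$ and $\mathbf{I} := \bigotimes_{j=1}^d I$. As $\sigma(\mathbf{M}) = \sigma(M)$ (cf. Lemma 4.152), $f(M)$ can be defined by (4.87) if and only if $f(\mathbf{M})$ is well defined. (4.87) yields $f(\mathbf{M}) := \frac{1}{2\pi i}\int_{\partial D} (\zeta\mathbf{I}-\mathbf{M})^{-1} f(\zeta)\,d\zeta$. Use
\[ \zeta\mathbf{I}-\mathbf{M} = I\otimes\ldots\otimes I\otimes(\zeta I)\otimes I\otimes\ldots\otimes I - I\otimes\ldots\otimes I\otimes M\otimes I\otimes\ldots\otimes I = I\otimes\ldots\otimes I\otimes(\zeta I - M)\otimes I\otimes\ldots\otimes I \]
and (4.80a) and proceed by
\begin{align*}
f(\mathbf{M}) &= \frac{1}{2\pi i}\int_{\partial D} \Big(I\otimes\ldots\otimes I\otimes(\zeta I-M)^{-1}\otimes I\otimes\ldots\otimes I\Big) f(\zeta)\,d\zeta \\
&= I\otimes\ldots\otimes I\otimes\Big(\frac{1}{2\pi i}\int_{\partial D} (\zeta I-M)^{-1} f(\zeta)\,d\zeta\Big)\otimes I\otimes\ldots\otimes I \\
&= I\otimes\ldots\otimes I\otimes f(M)\otimes I\otimes\ldots\otimes I.
\end{align*}
□

For later use, we add rules about the exponential function.

30 Functions with other smoothness properties can be considered too. Compare §14.1 in [138].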
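Lemma 4.169. (a) If $A, B\in\mathbb{C}^{I\times I}$ are commutative matrices (i.e., $AB = BA$), then
\[ \exp(A)\exp(B) = \exp(A+B). \tag{4.88a} \]
(b) Let $A^{(j)}\in\mathbb{K}^{I_j\times I_j}$ and
\[ \mathbf{A} = A^{(1)}\otimes I\otimes\ldots\otimes I + I\otimes A^{(2)}\otimes\ldots\otimes I + \ldots + I\otimes\ldots\otimes I\otimes A^{(d-1)}\otimes I + I\otimes\ldots\otimes I\otimes A^{(d)} \in\mathbb{K}^{\mathbf{I}\times\mathbf{I}}. \tag{4.88b} \]
Then
\[ \exp(t\mathbf{A}) = \bigotimes_{j=1}^d \exp(tA^{(j)}) \qquad (t\in\mathbb{K}). \tag{4.88c} \]

Proof. The $d$ terms in (4.88b) are pairwise commutative and, therefore, (4.88a) proves
\[ \exp(\mathbf{A}) = \prod_{j=1}^d \exp(\mathbf{A}^{(j)}) \quad\text{for } \mathbf{A}^{(j)} := I\otimes\ldots\otimes I\otimes A^{(j)}\otimes I\otimes\ldots\otimes I. \]
Lemma 4.168 shows that $\exp(\mathbf{A}^{(j)}) = I\otimes\ldots\otimes I\otimes\exp(A^{(j)})\otimes I\otimes\ldots\otimes I$. Thanks to (4.79), their product yields $\bigotimes_{j=1}^d \exp(A^{(j)})$. Replacing $A^{(j)}$ by $tA^{(j)}$, we prove (4.88c). □

Exercise 4.170. Prove
\[ \sin((A\otimes I)+(I\otimes B)) = \sin(A)\otimes\cos(B) + \cos(A)\otimes\sin(B) \]
and the analogue for $\cos((A\otimes I)+(I\otimes B))$.

Finally, we mention quite a different kind of a function application to a tensor. Let $\mathbf{v}\in\mathbb{K}^{\mathbf{I}}$ with $\mathbf{I} = I_1\times\ldots\times I_d$. Then the entry-wise application of a function $f : \mathbb{K}\to\mathbb{K}$ yields
\[ f(\mathbf{v})\in\mathbb{K}^{\mathbf{I}} \quad\text{with}\quad f(\mathbf{v})_{\mathbf{i}} := f(\mathbf{v}_{\mathbf{i}}) \ \text{ for all } \mathbf{i}\in\mathbf{I}. \]
For a matrix $\mathbf{v}\in\mathbb{K}^{n\times m}$ this is a rather unusual operation. It becomes more natural when we consider multivariate functions $C(\mathbf{I})$ defined on $\mathbf{I} = I_1\times\ldots\times I_d$ (product of intervals). Let $\varphi\in C(\mathbf{I}) = {}_{\|\cdot\|_\infty}\!\bigotimes_{j=1}^d C(I_j)$. Then the definition
\[ f(\varphi)\in C(\mathbf{I}) \quad\text{with}\quad (f(\varphi))(x) = f(\varphi(x)) \ \text{ for all } x\in\mathbf{I} \]
shows that $f(\varphi) = f\circ\varphi$ is nothing but the usual composition of mappings. If $f$ is a polynomial, we can use the fact that the power function $f(x) = x^n$ applied to $\mathbf{v}$ coincides with the $n$-fold Hadamard product (4.82). However, in general, not even for elementary tensors $\mathbf{v}$ does the result $f(\mathbf{v})$ have an easy representation.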
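Lemma 4.169(b) can be checked numerically for $d = 2$. The sketch below evaluates the exponential through a truncated power series (Proposition 4.167b with $a_\nu = 1/\nu!$); the number of terms, the test matrices, and all helper names are assumptions of this illustration, and the truncated series is not meant as a robust way to compute matrix exponentials.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A, B):
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(t, A):
    return [[t * a for a in row] for row in A]


def mat_exp(A, terms=40):
    """Truncated series sum_{k < terms} A^k / k! (fine for small norms only)."""
    n = len(A)
    result, term = identity(n), identity(n)
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, A))   # term = A^k / k!
        result = add(result, term)
    return result


A1 = [[0.0, 1.0], [-1.0, 0.0]]
A2 = [[1.0, 2.0], [0.0, -1.0]]
t = 0.5

# Kronecker sum (4.88b) for d = 2: A = A1 ⊗ I + I ⊗ A2.
kron_sum = add(kron(A1, identity(2)), kron(identity(2), A2))
lhs = mat_exp(scale(t, kron_sum))
rhs = kron(mat_exp(scale(t, A1)), mat_exp(scale(t, A2)))
max_diff = max(abs(x - y) for rl, rr in zip(lhs, rhs) for x, y in zip(rl, rr))
```

The right-hand side needs only the exponentials of the two small matrices, which is the practical content of (4.88c).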
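4.7 Symmetric and Antisymmetric Tensor Spaces

4.7.1 Hilbert Structure

Given a Hilbert space $(V, \langle\cdot,\cdot\rangle_V)$, define $\langle\cdot,\cdot\rangle$ on $\mathbf{V}$ by the induced scalar product (4.68). We recall the set $P$ of permutations and the projections $P_S$, $P_A$ (cf. §3.5.1): $P_S$ and $P_A$ are orthogonal projections from $\mathbf{V}$ onto the symmetric tensor space $S$ and the antisymmetric tensor space $A$, respectively (cf. Proposition 3.76). As a consequence, e.g., the identities
\begin{align*}
\langle P_A(\mathbf{u}), P_A(\mathbf{v})\rangle &= \langle P_A(\mathbf{u}), \mathbf{v}\rangle = \langle\mathbf{u}, P_A(\mathbf{v})\rangle, \tag{4.89} \\
\langle P_A(\mathbf{u}), \mathbf{A}P_A(\mathbf{v})\rangle &= \langle P_A(\mathbf{u}), P_A(\mathbf{A}\mathbf{v})\rangle = \langle P_A(\mathbf{u}), \mathbf{A}\mathbf{v}\rangle = \langle\mathbf{u}, \mathbf{A}P_A(\mathbf{v})\rangle
\end{align*}
hold for all $\mathbf{u},\mathbf{v}\in\mathbf{V}$ and symmetric $\mathbf{A}\in L(\mathbf{V},\mathbf{V})$.

By the definition of the induced scalar product (4.68), the scalar product $\langle\mathbf{u},\mathbf{v}\rangle$ of elementary tensors $\mathbf{u}$ and $\mathbf{v}$ reduces to products of scalar products in $V$. In $A$, elementary tensors $\mathbf{u} = \bigotimes_{j=1}^d u^{(j)}$ have to be replaced with $P_A(\bigotimes_{j=1}^d u^{(j)})$. Their scalar product reduces to determinants of scalar products in $V$ as stated by the following Löwdin rule (cf. Löwdin [217]):
\[ \Big\langle P_A\Big(\bigotimes_{j=1}^d u^{(j)}\Big), P_A\Big(\bigotimes_{j=1}^d v^{(j)}\Big)\Big\rangle = \frac{1}{d!}\det\Big(\big\langle u^{(i)}, v^{(j)}\big\rangle_V\Big)_{i,j=1,\ldots,d}. \tag{4.90} \]
For a proof use (4.89) so that the left-hand side in (4.90) is equal to
\begin{align*}
\Big\langle \bigotimes_{j=1}^d u^{(j)}, P_A\Big(\bigotimes_{j=1}^d v^{(j)}\Big)\Big\rangle &= \frac{1}{d!}\Big\langle \bigotimes_{j=1}^d u^{(j)}, \sum_{\pi\in P}\operatorname{sign}(\pi)\,\pi\Big(\bigotimes_{j=1}^d v^{(j)}\Big)\Big\rangle \\
&= \frac{1}{d!}\Big\langle \bigotimes_{j=1}^d u^{(j)}, \sum_{\pi\in P}\operatorname{sign}(\pi)\bigotimes_{j=1}^d v^{(\pi(j))}\Big\rangle = \frac{1}{d!}\sum_{\pi\in P}\operatorname{sign}(\pi)\prod_{j=1}^d \big\langle u^{(j)}, v^{(\pi(j))}\big\rangle_V.
\end{align*}
A comparison with (3.44) yields the result in (4.90).

Corollary 4.171. For biorthonormal systems $\{u^{(j)}\}$ and $\{v^{(j)}\}$, i.e., $\langle u^{(i)}, v^{(j)}\rangle_V = \delta_{ij}$, the right-hand side in (4.90) becomes $1/d!$. The systems are in particular biorthonormal if $u^{(j)} = v^{(j)}$ forms an orthonormal system.

Let $\{b_i : i\in I\}$ be an orthonormal system in $V$ with $\#I\ge d$. For $(i_1,\ldots,i_d)\in I^d$ define the elementary tensor
\[ e^{(i_1,\ldots,i_d)} := \bigotimes_{j=1}^d b_{i_j}. \]
If $(i_1,\ldots,i_d)$ contains two identical indices, $P_A(e^{(i_1,\ldots,i_d)}) = 0$ follows.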
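Remark 4.172. For two tuples $(i_1,\ldots,i_d)$ and $(j_1,\ldots,j_d)$ consisting of $d$ different indices, the following identity holds:
\[ \Big\langle P_A\big(e^{(i_1,\ldots,i_d)}\big), P_A\big(e^{(j_1,\ldots,j_d)}\big)\Big\rangle = \begin{cases} \operatorname{sign}(\pi)/d! & \text{if } \pi(i_1,\ldots,i_d) = (j_1,\ldots,j_d) \text{ for some } \pi\in P, \\ 0 & \text{otherwise.} \end{cases} \]

Proof. $\langle P_A(e^{\mathbf{i}}), P_A(e^{\mathbf{j}})\rangle = \langle e^{\mathbf{i}}, P_A(e^{\mathbf{j}})\rangle$ follows from (4.89). If $\mathbf{i}$ and $\mathbf{j}$ coincide up to the ordering of their entries, there is $\pi\in P$ with $\pi(\mathbf{i}) = \mathbf{j}$ such that $P_A(e^{\mathbf{j}})$ contains a term $\frac{\operatorname{sign}(\pi)}{d!}e^{\mathbf{i}}$. All other terms are orthogonal to $e^{\mathbf{i}}$. □

An analogue of the Löwdin rule (4.90) holds for symmetrised elementary tensors. Here the determinant must be exchanged with the permanent:
\[ \Big\langle P_S\Big(\bigotimes_{j=1}^d f_j\Big), P_S\Big(\bigotimes_{j=1}^d g_j\Big)\Big\rangle = \frac{1}{d!}\operatorname{Perm}\big(\langle f_i,g_j\rangle\big)_{1\le i,j\le d}, \quad\text{where}\quad \operatorname{Perm}(M) = \sum_{\pi\in P_d}\prod_{i=1}^d M_{i,\pi(i)} \ \text{ for } M\in\mathbb{K}^{d\times d}. \]
The practical disadvantage is the fact that, in general, the computation of the permanent is NP-hard (cf. Valiant [291]).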
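Both rules can be tested on a tiny example with $d = 3$ and $V = \mathbb{R}^3$ by forming the projected tensors explicitly. This is a brute-force sketch for illustration only: the functions `projected`, `gram` and `det_or_perm` are hypothetical helpers, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial, prod


def sign(p):
    """Sign of a permutation given as a tuple of 0-based images."""
    inversions = sum(p[i] > p[j]
                     for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1


def projected(factors, antisymmetric):
    """All entries of P_A (or P_S) applied to factors[0] ⊗ ... ⊗ factors[d-1]."""
    d, n = len(factors), len(factors[0])
    out = {}
    for idx in product(range(n), repeat=d):
        total = 0
        for p in permutations(range(d)):
            weight = sign(p) if antisymmetric else 1
            total += weight * prod(factors[p[k]][idx[k]] for k in range(d))
        out[idx] = Fraction(total, factorial(d))
    return out


def inner(s, t):
    """Induced (Euclidean) scalar product of two full tensors."""
    return sum(s[idx] * t[idx] for idx in s)


def gram(us, vs):
    """Matrix of the scalar products <u_i, v_j>."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vs] for u in us]


def det_or_perm(M, antisymmetric):
    """Leibniz sum: determinant (with signs) or permanent (without signs)."""
    d = len(M)
    return sum((sign(p) if antisymmetric else 1)
               * prod(M[i][p[i]] for i in range(d))
               for p in permutations(range(d)))


us = [[1, 0, 2], [0, 1, 1], [1, 1, 0]]
vs = [[2, 1, 0], [1, 0, 1], [0, 1, 3]]
G, d = gram(us, vs), len(us)

lowdin_lhs = inner(projected(us, True), projected(vs, True))
lowdin_rhs = Fraction(det_or_perm(G, True), factorial(d))
perm_lhs = inner(projected(us, False), projected(vs, False))
perm_rhs = Fraction(det_or_perm(G, False), factorial(d))
```

The Gram matrix has only $d^2$ entries, whereas the projected tensors have $n^d$ entries each; the Leibniz sum used here for both determinant and permanent still costs $d!$ terms.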
4.7.2 Banach Spaces and Dual Spaces

Let $V$ be a Banach space (possibly, a Hilbert space) with norm $\|\cdot\|_V$. The norm of the algebraic tensor space $\mathbf{V}_{\mathrm{alg}} = {}_a\!\otimes^d V$ is denoted by $\|\cdot\|$. We require that $\|\cdot\|$ be invariant with respect to permutations, i.e.,
\[ \|\mathbf{v}\| = \|\pi(\mathbf{v})\| \quad\text{for all } \pi\in P \text{ and } \mathbf{v}\in{}_a\!\otimes^d V. \tag{4.91} \]

Conclusion 4.173. Assume (4.91). The mapping $\pi : {}_a\!\otimes^d V\to{}_a\!\otimes^d V$ corresponding to $\pi\in P$, as well as the mappings $P_S$ and $P_A$, are bounded by 1.

Proof. The bound for $\pi$ is obvious. For $P_S$ use that
\[ \|P_S\| = \Big\|\frac{1}{d!}\sum_{\pi\in P}\pi\Big\| \le \frac{1}{d!}\sum_{\pi\in P}\|\pi\| = \frac{1}{d!}\,d! = 1 \]
holds for the operator norm. Similarly for $P_A$. □

$\mathbf{V}_{\|\cdot\|} := \otimes^d_{\|\cdot\|}V$ is defined by completion with respect to $\|\cdot\|$.
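Lemma 4.174. Assume (4.91). Denote the algebraic symmetric and antisymmetric tensor spaces by $S_{\mathrm{alg}}(V)$ and $A_{\mathrm{alg}}(V)$. Both are subspaces of $\mathbf{V}_{\mathrm{alg}}$. The completion of $S_{\mathrm{alg}}(V)$ and $A_{\mathrm{alg}}(V)$ with respect to $\|\cdot\|$ yields subspaces $S_{\|\cdot\|}(V)$ and $A_{\|\cdot\|}(V)$ of $\mathbf{V}_{\|\cdot\|}$. An equivalent description of $S_{\|\cdot\|}(V)$ and $A_{\|\cdot\|}(V)$ is
\begin{align*}
S_{\|\cdot\|}(V) &= \{\mathbf{v}\in\mathbf{V}_{\|\cdot\|} : \mathbf{v} = \pi(\mathbf{v}) \text{ for all } \pi\in P\}, \\
A_{\|\cdot\|}(V) &= \{\mathbf{v}\in\mathbf{V}_{\|\cdot\|} : \mathbf{v} = \operatorname{sign}(\pi)\,\pi(\mathbf{v}) \text{ for all } \pi\in P\}.
\end{align*}

Proof. (i) By Conclusion 4.173, $\pi$ is continuous. For any sequence $\mathbf{v}_n\in S_{\mathrm{alg}}(V)$ with $\mathbf{v}_n\to\mathbf{v}\in\mathbf{V}_{\|\cdot\|}$ the property $\mathbf{v}_n = \pi(\mathbf{v}_n)$ is inherited by $\mathbf{v}\in S_{\|\cdot\|}(V)$.

(ii) Conversely, let $\mathbf{v}\in\mathbf{V}_{\|\cdot\|}$ with $\mathbf{v} = \pi(\mathbf{v})$ for all $\pi\in P$. This is equivalent to $\mathbf{v} = P_S(\mathbf{v})$. Let $\mathbf{v}_n\to\mathbf{v}$ for some $\mathbf{v}_n\in\mathbf{V}_{\mathrm{alg}}$ and construct $\mathbf{u}_n := P_S(\mathbf{v}_n)\in S_{\mathrm{alg}}(V)$. Continuity of $P_S$ (cf. Conclusion 4.173) implies that $\mathbf{u} := \lim\mathbf{u}_n = P_S(\lim\mathbf{v}_n) = P_S(\mathbf{v})\in\mathbf{V}_{\|\cdot\|}$; hence $\mathbf{v} = \mathbf{u}$ lies in the completion $S_{\|\cdot\|}(V)$ of $S_{\mathrm{alg}}(V)$. Analogously for the space $A_{\|\cdot\|}(V)$. □

Any dual form $\boldsymbol{\varphi}\in{}_a\!\otimes^d V'$ is also a dual form on the subspaces $S_{\mathrm{alg}}(V)$ and $A_{\mathrm{alg}}(V)$. For $\pi\in P$ let $\pi'$ be the dual mapping; i.e., $\pi'(\bigotimes_{j=1}^d \varphi_j)\in{}_a\!\otimes^d V'$ acts as $\big(\pi'\bigotimes_{j=1}^d \varphi_j\big)(\mathbf{v}) = \boldsymbol{\varphi}(\pi(\mathbf{v}))$. We conclude that $\pi'\bigotimes_{j=1}^d \varphi_j = \bigotimes_{j=1}^d \varphi_{\pi(j)}$ and that all $\boldsymbol{\varphi}\in{}_a\!\otimes^d V'$ with $P_S'\boldsymbol{\varphi} = 0$ represent the zero mapping on $S_{\mathrm{alg}}(V)$. Thus, ${}_a\!\otimes^d V'$ reduces to the quotient space ${}_a\!\otimes^d V'/\ker P_S'$. A comparison with (3.41) shows that ${}_a\!\otimes^d V'/\ker P_S'$ can be viewed as the symmetric tensor space $S_{\mathrm{alg}}(V')$ derived from $V'$. Similarly, $A_{\mathrm{alg}}(V')\cong{}_a\!\otimes^d V'/\ker P_A'$. The same statements hold for the continuous functionals:
\[ S_{\|\cdot\|^*}(V^*)\cong\otimes^d_{\|\cdot\|^*}V^*/\ker P_S^* \quad\text{and}\quad A_{\|\cdot\|^*}(V^*)\cong\otimes^d_{\|\cdot\|^*}V^*/\ker P_A^*, \]
where by the previous considerations $\ker P_S^*$ and $\ker P_A^*$ are closed subspaces.

The next remark states that in the Hilbert case the definition of the injective norm on $S(V)$ may be formulated with symmetric tensors $\otimes^d\varphi$ of functionals instead of $\bigotimes_{j=1}^d \varphi_j$.

Remark 4.175. Let $V$ be a Hilbert space. Then
\[ \|\mathbf{v}\|_\vee = \sup_{\varphi\in V^*,\ \|\varphi\|_V^* = 1} \big|\big(\otimes^d\varphi\big)(\mathbf{v})\big| \]
holds for all symmetric tensors $\mathbf{v}\in S(V)$.

More general results including those about the projective norm can be found in Floret [108]. The original proof of Banach [19] refers to the Hilbert space $V = L^2([0,1])$.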
Chapter 5
General Techniques
Abstract In this chapter, isomorphisms between the tensor space of order d and vector spaces or other tensor spaces are considered. The vectorisation from Section 5.1 ignores the tensor structure and treats the tensor space as a usual vector space. In finite-dimensional implementations this means that multivariate arrays are organised as linear arrays. After vectorisation, linear operations between tensor spaces become matrices expressed by Kronecker products (cf. §5.1.2). While vectorisation ignores the tensor structure completely, matricisation keeps one of the tensor products and leads to a tensor space of order two (cf. Section 5.2). In the finite-dimensional case, this space is isomorphic to a matrix space. The interpretation as a matrix allows us to formulate typical matrix properties like the rank, leading to the j-rank for a direction j and the α-rank for a subset α of the directions 1, . . . , d. In the finite-dimensional or Hilbert case, the singular-value decomposition can be applied to the matricised tensor. In Section 5.3, the tensorisation is introduced, which maps a vector space (usually without any tensor structure) into an isomorphic tensor space. The artificially constructed tensor structure allows interesting applications. While Section 5.3 gives an introduction to this subject, details about tensorisation will follow in Chapter 14.
5.1 Vectorisation

5.1.1 Tensors as Vectors

In programming languages, matrices or multi-dimensional arrays are mapped internally into a linear array (vector) containing the entries in a lexicographical ordering. Note that the ordering is not uniquely determined (there are programming languages using different lexicographical orderings). Without additional data, it is impossible to restore a matrix or even its format from the vector (e.g., a 4×4 and a 2×8 matrix yield the same linear array). This fact expresses that structural data are omitted.
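The two common lexicographical orderings and the loss of the format can be sketched as follows. The index maps `phi_row_major` and `phi_col_major` are illustrative names for two possible choices of a bijection from index tuples to positions; they are not tied to any particular programming language.

```python
from itertools import product
from math import prod


def phi_row_major(index, shape):
    """Position of a multi-index when the last index runs fastest."""
    pos = 0
    for i, n in zip(index, shape):
        pos = pos * n + i
    return pos


def phi_col_major(index, shape):
    """Position of a multi-index when the first index runs fastest."""
    pos = 0
    for i, n in zip(reversed(index), reversed(shape)):
        pos = pos * n + i
    return pos


def vectorise(entry, shape, phi):
    """Linear array containing entry(index) at position phi(index, shape)."""
    vec = [None] * prod(shape)
    for idx in product(*map(range, shape)):
        vec[phi(idx, shape)] = entry(idx)
    return vec


# A 4 x 4 and a 2 x 8 matrix that yield the same linear array.
v_4x4 = vectorise(lambda ij: 4 * ij[0] + ij[1], (4, 4), phi_row_major)
v_2x8 = vectorise(lambda ij: 8 * ij[0] + ij[1], (2, 8), phi_row_major)

# The same 2 x 3 matrix under the two orderings.
entry = lambda ij: 10 * ij[0] + ij[1]
rows_first = vectorise(entry, (2, 3), phi_row_major)
cols_first = vectorise(entry, (2, 3), phi_col_major)
```

From `v_4x4` alone one cannot tell whether it came from a 4×4 or a 2×8 matrix; the shape has to be stored separately.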
Vectorisation is implicitly expressed by the notation (1.5) for Kronecker matrices. Applying this notation to $n\times 1$ matrices which are regarded as (column) vectors, (1.5) becomes
\[ a\otimes b = \begin{bmatrix} a_1 b \\ a_2 b \\ \vdots \end{bmatrix}\in\mathbb{K}^{n\cdot m} \quad\text{for } a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \end{bmatrix}\in\mathbb{K}^n \text{ and } b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \end{bmatrix}\in\mathbb{K}^m. \]
Hence the resulting tensor is immediately expressed as a vector in $\mathbb{K}^{n\cdot m}$. Only with this vectorisation can Kronecker products of matrices $A\in\mathbb{K}^{n\times n}$ and $B\in\mathbb{K}^{m\times m}$ be interpreted as matrices from $\mathbb{K}^{n\cdot m\times n\cdot m}$.

For a mathematical formulation of the vectorisation assume $V_j = \mathbb{K}^{I_j}$ with $n_j := \#I_j < \infty$. Choose any index set $J$ with $\#J = \prod_{j=1}^d n_j$ together with a bijection $\phi : \times_{j=1}^d I_j\to J$. This defines the isomorphism $\Phi : \mathbf{V} = \bigotimes_{j=1}^d V_j\to\mathbb{K}^J$ between the tensor space $\mathbf{V}$ and the vector space $\mathbb{K}^J$. Tensor entries $\mathbf{v}[i_1,\ldots,i_d]$ are mapped into vector entries $v[\phi(i_1,\ldots,i_d)]$ and vice versa. Note that $\Phi$ is a vector space isomorphism in the sense of §3.2.5.

In the case of a linear system with $n$ equations and unknowns, we are used to dealing with vectors $x, b$ and a matrix $M$:
\[ Mx = b \qquad (M\in\mathbb{K}^{J\times J},\ x, b\in\mathbb{K}^J). \tag{5.1a} \]
In particular, LU and Cholesky decompositions require an ordered index set $J$, e.g., $J = \{1,\ldots,N\}$ with $N := \#J$.

Matrix equations are an example of systems described differently, e.g., the Lyapunov matrix equation is
\[ AX + XA^\top = B, \tag{5.1b} \]
where matrices $A, B\in\mathbb{K}^{I\times I}$ are given and the solution $X\in\mathbb{K}^{I\times I}$ is sought. Let $n := \#I$. The number of unknown entries $X_{ij}$ is $n^2$. Furthermore, Eq. (5.1b) is linear in all unknowns $X_{ij}$; i.e., (5.1b) turns out to represent a linear system of $n^2$ equations for $n^2$ unknowns. Lemma 5.1 from below allows us to translate the Lyapunov equation (5.1b) into $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{x},\mathbf{b}\in\mathbf{V} := \mathbb{K}^I\otimes\mathbb{K}^I$ are tensors and $\mathbf{A}\in L(\mathbf{V},\mathbf{V})$ is the following Kronecker product:
\[ \mathbf{A} = A\otimes I + I\otimes A\in L(\mathbf{V},\mathbf{V}) \tag{5.1c} \]
(cf. [27]). Using the vectorisation isomorphism $\Phi : \mathbf{V}\to\mathbb{K}^J$ from above, we obtain the linear system (5.1a) with $M = \Phi\mathbf{A}\Phi^{-1}$, $x = \Phi\mathbf{x}$, and $b = \Phi\mathbf{b}$.

Lemma 5.1. The matrices $U, V\in\mathbb{K}^{I\times I}$ define Kronecker products $\mathbf{U} = U\otimes I$, $\mathbf{V} = I\otimes V$, $\mathbf{W} = U\otimes V\in L(\mathbb{K}^I\otimes\mathbb{K}^I, \mathbb{K}^I\otimes\mathbb{K}^I)$. The products $\mathbf{U}\mathbf{x}$, $\mathbf{V}\mathbf{x}$, $\mathbf{W}\mathbf{x}$ correspond to $UX$, $XV^\top$, and $UXV^\top$, where $X\in\mathbb{K}^{I\times I}$ is the matrix interpretation of the tensor $\mathbf{x}\in\mathbb{K}^I\otimes\mathbb{K}^I$.
Proof. $W := UXV^\top$ has the matrix coefficients $W_{i,j} = \sum_{k,\ell\in I} U_{i,k}X_{k,\ell}V_{j,\ell}$. The Kronecker matrix $\mathbf{W} = U\otimes V$ has the entries $\mathbf{W}_{(i,j),(k,\ell)} = U_{i,k}V_{j,\ell}$. Hence,
\[ (\mathbf{W}\mathbf{x})_{(i,j)} = \sum_{(k,\ell)\in I\times I} \mathbf{W}_{(i,j),(k,\ell)}\,\mathbf{x}_{(k,\ell)} = W_{i,j}. \]
The special cases $U = I$ or $V = I$ yield the first two statements. □
From (5.1c) we easily conclude that a positive-definite matrix A in (5.1b) leads to a positive-definite matrix M and, therefore, enables a Cholesky decomposition.
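Lemma 5.1 and the Kronecker form (5.1c) of the Lyapunov operator can be checked on a $2\times 2$ example. The sketch assumes the lexicographical (row-wise) vectorisation, under which the pair $(k,\ell)$ is mapped to position $k\cdot n + \ell$; the helper names are chosen for this illustration.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def kron(A, B):
    """Entry ((i,j),(k,l)) equals A[i][k] * B[j][l], pairs ordered lexicographically."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def vec(X):
    """Row-wise vectorisation matching the index ordering used in kron."""
    return [x for row in X for x in row]


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


I2 = [[1, 0], [0, 1]]
U = [[1, 2], [0, 1]]
V = [[3, 1], [1, 1]]
A = [[2, 1], [0, 3]]
X = [[1, 2], [3, 4]]

# Lemma 5.1: (U ⊗ V) x corresponds to U X V^T.
lemma_lhs = matvec(kron(U, V), vec(X))
lemma_rhs = vec(matmul(matmul(U, X), transpose(V)))

# (5.1c): (A ⊗ I + I ⊗ A) x corresponds to A X + X A^T.
bold_A = add(kron(A, I2), kron(I2, A))
B = add(matmul(A, X), matmul(X, transpose(A)))
lyap_lhs = matvec(bold_A, vec(X))
```

With a column-wise vectorisation the roles of the two Kronecker factors would be exchanged, so the ordering convention has to be fixed before assembling $M$.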
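5.1.2 Kronecker Tensors

As already mentioned in the previous section, the interpretation of a Kronecker tensor product as a matrix is based on vectorisation. However, there is a second possibility for vectorisation. Matrices, which may be seen as tensors of order two, can be mapped isomorphically into vectors. This will be done by the mappings $\phi_j$ from below.

Let $I_j$ and $J_j$ be the index sets characterising the matrix space $M_j := \mathbb{K}^{I_j\times J_j}$. An isomorphic vector space is $V_j := \mathbb{K}^{K_j}$ with $K_j = I_j\times J_j$. The following isomorphism $\phi_j$ describes the vectorisation:
\[ \phi_j : M_j\to V_j, \qquad A^{(j)} = \big(A^{(j)}_{\ell,m}\big)_{\ell\in I_j, m\in J_j} \mapsto a^{(j)} := \phi_j(A^{(j)}) = \big(a^{(j)}_i\big)_{i\in K_j}. \]
We identify $\mathbf{M} := \bigotimes_{j=1}^d M_j$ with the matrix space $\mathbb{K}^{\mathbf{I}\times\mathbf{J}}$, where $\mathbf{I} := \times_{j=1}^d I_j$ and $\mathbf{J} := \times_{j=1}^d J_j$, while $\mathbf{V} := \bigotimes_{j=1}^d V_j$ is identified with the vector space
\[ \mathbb{K}^{\mathbf{K}} \quad\text{with}\quad \mathbf{K} := \times_{j=1}^d K_j = \times_{j=1}^d (I_j\times J_j). \]
Note that the matrix-vector multiplication $y = \mathbf{M}x$ is written as $y_{\mathbf{i}} = \sum_{\mathbf{j}\in\mathbf{J}} \mathbf{M}_{\mathbf{i}\mathbf{j}}x_{\mathbf{j}}$ for $\mathbf{i}\in\mathbf{I}$. Elementary tensors $\mathbf{A} = \bigotimes_{j=1}^d A^{(j)}\in\mathbf{M}$ and $\mathbf{a} = \bigotimes_{j=1}^d a^{(j)}\in\mathbf{V}$ have the entries
\[ \mathbf{A}[(\ell_1,\ldots,\ell_d),(m_1,\ldots,m_d)] = \prod_{j=1}^d A^{(j)}_{\ell_j,m_j} \quad\text{and}\quad \mathbf{a}[i_1,\ldots,i_d] = \prod_{j=1}^d a^{(j)}_{i_j}. \]
Define $a^{(j)}$ by $\phi_j(A^{(j)})$. Then $\mathbf{a} = \bigotimes_{j=1}^d \phi_j(A^{(j)})$ holds and gives rise to the following definition:
\[ \Phi = \bigotimes_{j=1}^d \phi_j : \mathbf{A}\in\mathbf{M}\mapsto\mathbf{a} = \Phi(\mathbf{A})\in\mathbf{V} \quad\text{with}\quad \mathbf{a}[(\ell_1,m_1),\ldots,(\ell_d,m_d)] = \mathbf{A}[(\ell_1,\ldots,\ell_d),(m_1,\ldots,m_d)] \ \text{ for } (\ell_j,m_j)\in K_j. \]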
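$\Phi$ can be regarded as a vectorisation of the Kronecker matrix space $\mathbf{M}$. For $d\ge 3$, we have the clear distinction that $\mathbf{M}$ is a matrix space, whereas $\mathbf{V}$ is a tensor space of order $d\ge 3$. For $d = 2$, however, $\mathbf{V}$ can also be viewed as a matrix space (cf. Van Loan–Pitsianis [292]).

Remark 5.2. Suppose that $d = 2$.

(a) The matrix $\mathbf{A}$ with entries $\mathbf{A}[(\ell_1,\ell_2),(m_1,m_2)]$ is mapped by $\Phi$ into $\mathbf{a}$ with entries $\mathbf{a}[(\ell_1,m_1),(\ell_2,m_2)]$. Since $d = 2$, the tensor $\mathbf{a}$ can again be viewed as a matrix from $\mathbb{K}^{K_1\times K_2}$. Note that, in general, $\mathbf{a}\in\mathbb{K}^{K_1\times K_2}$ and $\mathbf{A}\in\mathbb{K}^{\mathbf{I}\times\mathbf{J}}$ have different shape. However, $\#(K_1\times K_2) = \#(\mathbf{I}\times\mathbf{J})$ holds, and $\mathbf{A}$ and $\mathbf{a}$ have the same Frobenius norm; i.e., $\Phi$ is also isometric (cf. Remark 2.9).

(b) The singular-value decomposition of $\mathbf{A}$ in the sense of Lemma 3.19 can be applied as follows. Apply Lemma 3.19 to $\mathbf{a} = \Phi(\mathbf{A})$ resulting in $\mathbf{a} = \sum_{i=1}^r \sigma_i x_i\otimes y_i$ ($x_i\in V_1$, $y_i\in V_2$). Then application of $\Phi^{-1}$ yields
\[ \mathbf{A} = \sum_{i=1}^r \sigma_i X_i\otimes Y_i \quad\text{with } X_i = \phi_1^{-1}(x_i),\ Y_i = \phi_2^{-1}(y_i). \tag{5.2} \]
The orthonormality of the singular vectors $\{x_i\}$, $\{y_i\}$ translates into orthonormality of the matrices $\{X_i\}$, $\{Y_i\}$ with respect to the Frobenius norm (2.9). The Kronecker singular-value decomposition (5.2) does not coincide with the standard singular-value decomposition of the matrix $\mathbf{A} = \sum_{i=1}^r \sigma_i^A u_i\otimes v_i^T$, since $X_i\in\mathbb{K}^{K_1}$ and $u_i\in\mathbb{K}^{\mathbf{I}}$ as well as $Y_i\in\mathbb{K}^{K_2}$ and $v_i\in\mathbb{K}^{\mathbf{J}}$ are of different size (in general $\#K_1\ne\#\mathbf{I}$). The terms in $\sum_{i=1}^r \sigma_i^A u_i\otimes v_i^T$ have the matrix rank 1, whereas the terms in (5.2) have tensor rank 1.

(c) The decomposition (5.2) can be used for truncation. $\mathbf{A}_s = \sum_{i=1}^s \sigma_i X_i\otimes Y_i$ ($0\le s\le r$) is the best approximation of $\mathbf{A}$ with respect to the Frobenius norm.

As an illustration of Part (a), consider the identity matrix $\mathbf{A} = \mathbf{I}$ for the index sets $I_1 = J_1 = \{1,2\}$ and $I_2 = J_2 = \{a,b,c\}$. The matrices $\mathbf{A}$ and $\mathbf{a} = \Phi(\mathbf{A})$ are given below together with the indices for the rows and columns (only nonzero entries are indicated):
\[
\mathbf{A} = \begin{array}{c|cccccc}
 & 1a & 1b & 1c & 2a & 2b & 2c \\ \hline
1a & 1 & & & & & \\
1b & & 1 & & & & \\
1c & & & 1 & & & \\
2a & & & & 1 & & \\
2b & & & & & 1 & \\
2c & & & & & & 1
\end{array}
\quad\overset{\Phi}{\longmapsto}\quad
\mathbf{a} = \begin{array}{c|ccccccccc}
 & aa & ab & ac & ba & bb & bc & ca & cb & cc \\ \hline
11 & 1 & & & & 1 & & & & 1 \\
12 & & & & & & & & & \\
21 & & & & & & & & & \\
22 & 1 & & & & 1 & & & & 1
\end{array}
\]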
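The rearrangement $\Phi$ for $d = 2$ can be written down directly from the index rule $\mathbf{a}[(\ell_1,m_1),(\ell_2,m_2)] = \mathbf{A}[(\ell_1,\ell_2),(m_1,m_2)]$. In the sketch below, the function name `rearrange`, the 0-based indices, and the lexicographical ordering of index pairs are assumptions of this illustration.

```python
def kron(A, B):
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def vec(A):
    """phi_j: row-wise vectorisation of a matrix."""
    return [x for row in A for x in row]


def outer(x, y):
    return [[a * b for b in y] for a in x]


def rearrange(A, n1, m1, n2, m2):
    """Phi(A) for A of shape (n1*n2) x (m1*m2); result has shape (n1*m1) x (n2*m2)."""
    return [[A[l1 * n2 + l2][c1 * m2 + c2]
             for l2 in range(n2) for c2 in range(m2)]
            for l1 in range(n1) for c1 in range(m1)]


# Identity example from the text: I_1 = J_1 = {1,2}, I_2 = J_2 = {a,b,c}.
identity6 = [[1 if i == j else 0 for j in range(6)] for i in range(6)]
a_identity = rearrange(identity6, 2, 2, 3, 3)

# An elementary Kronecker product is mapped to a rank-1 matrix.
A1 = [[1, 2], [3, 4]]
A2 = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
a_elementary = rearrange(kron(A1, A2), 2, 2, 3, 3)
rank_one = outer(vec(A1), vec(A2))
```

The identity, which has matrix rank 6, is mapped to a $4\times 9$ matrix of rank 1, in agreement with $\mathbf{I} = I\otimes I$ being an elementary Kronecker product.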
5.2 Matricisation

Synonyms for matricisation are matrix unfolding or flattening. The term ‘matricisation’ corresponds to the fact that tensors are turned into matrices (at least in the finite-dimensional case).

We recall the two types of isomorphisms discussed in §3.2.5. The strongest form is the tensor space isomorphism which preserves the tensor structure (cf. Definition 3.29). The weakest form is the vector space isomorphism which identifies all tensor spaces $\mathbf{V}$ and $\mathbf{W}$ having the same dimension not regarding the tensor structure. An intermediate form groups the $d$ spaces $V_j$ from $\mathbf{V} = \bigotimes_{j=1}^d V_j$ such that the order is reduced. For instance, $\bigotimes_{j=1}^5 V_j$ is isomorphic to the rearrangement $(V_1\otimes V_5)\otimes(V_2\otimes V_3)\otimes V_4$, which is a tensor space of order $d_{\mathrm{new}} = 3$. Matricisation is characterised by $d_{\mathrm{new}} = 2$. An example is $(V_1\otimes V_5)\otimes(V_2\otimes V_3\otimes V_4)$. Since tensor spaces of order two are close to matrix spaces, matricisation tries to exploit all features of matrices. In this setting, vectorisation corresponds to $d_{\mathrm{new}} = 1$.

Below we use the sign $\otimes$ without subscripts $\otimes_a$ or $\otimes_{\|\cdot\|}$ since both cases are allowed and need not be distinguished.
5.2.1 General Case
To get $d_{\mathrm{new}}=2$, we must divide the whole index set $\{1,\ldots,d\}$ into two (disjoint) subsets. For a systematic approach we introduce the set
\[ D=\{1,\ldots,d\} \tag{5.3a} \]
and consider proper subsets
\[ \emptyset\subsetneqq\alpha\subsetneqq D. \tag{5.3b} \]
The complement of $\alpha$ is denoted by
\[ \alpha^c:=D\setminus\alpha. \tag{5.3c} \]
We define the (partial) tensor spaces
\[ \mathbf{V}_\alpha=\bigotimes_{j\in\alpha}V_j \qquad\text{for } \alpha\subset D, \tag{5.3d} \]
which include the special cases $\mathbf{V}_\emptyset=\mathbb{K}$ for $\alpha=\emptyset$ and $\mathbf{V}_D=\mathbf{V}$ for $\alpha=D$. For singletons we have the synonymous notation $\mathbf{V}_{\{j\}}=V_j$ ($j\in D$).
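Instead of $\mathbf{V}_{\{j\}^c}$ for the complement $\{j\}^c$, we already introduced the symbol
\[ \mathbf{V}_{[j]}=\bigotimes_{k\in D\setminus\{j\}}V_k \]
(cf. (3.17a)). Depending on the context, the spaces $\mathbf{V}_\alpha$ and $\mathbf{V}_{[j]}$ may be algebraic or topological tensor spaces. For the norm of the partial tensor space $\mathbf{V}_\alpha$ in the case of $\emptyset\subsetneqq\alpha\subsetneqq D$, we refer to §4.3.2.
Below, we introduce the isomorphism $\mathcal{M}_\alpha$ from $\mathbf{V}$ onto the binary tensor space $\mathbf{V}_\alpha\otimes\mathbf{V}_{\alpha^c}$:
\[ \mathbf{V}=\bigotimes_{j\in D}V_j\;\cong\;\mathbf{V}_\alpha\otimes\mathbf{V}_{\alpha^c}. \tag{5.4} \]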
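Often, $\alpha$ is a singleton $\{j\}$, i.e., $\mathbf{V}_\alpha=\mathbf{V}_{\{j\}}=V_j$. In this case (5.4) becomes $\mathbf{V}\cong V_j\otimes\mathbf{V}_{[j]}$ and the isomorphism is denoted by $\mathcal{M}_j$.
Definition 5.3 ($\mathcal{M}_\alpha$, $\mathcal{M}_j$). The matricisation $\mathcal{M}_\alpha$ with $\alpha$ from¹ (5.3b) is the isomorphism
\[
\mathcal{M}_\alpha:\ \bigotimes_{k\in D}V_k\to\mathbf{V}_\alpha\otimes\mathbf{V}_{\alpha^c},\qquad
\bigotimes_{k\in D}v^{(k)}\mapsto\mathbf{v}^{(\alpha)}\otimes\mathbf{v}^{(\alpha^c)}
\quad\text{with } \mathbf{v}^{(\alpha)}=\bigotimes_{k\in\alpha}v^{(k)},\ \ \mathbf{v}^{(\alpha^c)}=\bigotimes_{k\in\alpha^c}v^{(k)}.
\]
In particular, for $j\in D$, $\mathcal{M}_j$ is the isomorphism
\[
\mathcal{M}_j:\ \bigotimes_{k\in D}V_k\to V_j\otimes\mathbf{V}_{[j]},\qquad
\bigotimes_{k\in D}v^{(k)}\mapsto v^{(j)}\otimes\mathbf{v}^{[j]}
\quad\text{with } \mathbf{v}^{[j]}=\bigotimes_{k\in D\setminus\{j\}}v^{(k)}.
\]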
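Next, we check how $\mathcal{M}_\alpha(\mathbf{v})$ behaves when we apply an elementary Kronecker product $\bigotimes_{j=1}^{d}A^{(j)}:\mathbf{V}\to\mathbf{W}$ to $\mathbf{v}$. This includes the case of a tensor space isomorphism $\Phi:\mathbf{V}\to\mathbf{W}$ (cf. Definition 3.29).
Remark 5.4. For $\mathbf{V}=\bigotimes_{j=1}^{d}V_j$ and $\mathbf{W}=\bigotimes_{j=1}^{d}W_j$ let $\mathbf{A}:=\bigotimes_{j=1}^{d}A^{(j)}:\mathbf{V}\to\mathbf{W}$ be an elementary Kronecker product. For $\alpha$ in (5.3b) set $\mathbf{A}^{(\alpha)}:=\bigotimes_{j\in\alpha}A^{(j)}$ and $\mathbf{A}^{(\alpha^c)}:=\bigotimes_{j\in\alpha^c}A^{(j)}$. Then
\[ \mathcal{M}_\alpha(\mathbf{A}\mathbf{v})=\bigl(\mathbf{A}^{(\alpha)}\otimes\mathbf{A}^{(\alpha^c)}\bigr)\,\mathcal{M}_\alpha(\mathbf{v})\qquad\text{for all } \mathbf{v}\in\mathbf{V}. \]
If $A^{(j)}:V_j\to W_j$ are isomorphisms, $\mathbf{A}^{(\alpha)}\otimes\mathbf{A}^{(\alpha^c)}$ describes the isomorphism between $\mathbf{V}_\alpha\otimes\mathbf{V}_{\alpha^c}$ and $\mathbf{W}_\alpha\otimes\mathbf{W}_{\alpha^c}$.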
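¹ By condition (5.3b) we have avoided the empty set $\alpha=\emptyset$ and $\alpha=D$ ($\Rightarrow\alpha^c=\emptyset$). Since the empty tensor product is interpreted as the field $\mathbb{K}$, one may view $\mathcal{M}_D:\mathbf{V}\to\mathbf{V}\otimes\mathbb{K}$ as the vectorisation (column vector) and $\mathcal{M}_\emptyset:\mathbf{V}\to\mathbb{K}\otimes\mathbf{V}$ as mapping into a row vector.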
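5.2.2 Finite-Dimensional Case
5.2.2.1 Example
For finite dimensions, the binary tensor space $\mathbf{V}_\alpha\otimes\mathbf{V}_{\alpha^c}$ resulting from the matricisation may be interpreted as matrix space (cf. §3.2.3). If, e.g., $V_j=\mathbb{K}^{I_j}$, then $\mathcal{M}_\alpha$ maps into² $\mathbb{K}^{\mathbf{I}_\alpha\times\mathbf{I}_{\alpha^c}}$, where $\mathbf{I}_\alpha=\times_{k\in\alpha}I_k$ and $\mathbf{I}_{\alpha^c}=\times_{k\in\alpha^c}I_k$. Hence a tensor $\mathbf{v}$ with entries $\mathbf{v}[(i_\kappa)_{\kappa\in D}]$ becomes a matrix $M=\mathcal{M}_\alpha(\mathbf{v})$ with entries $M[(i_\kappa)_{\kappa\in\alpha},(i_\lambda)_{\lambda\in\alpha^c}]$. To demonstrate the matricisations, we illustrate all $\mathcal{M}_\alpha$ for a small example.
Example 5.5. Below, all matricisations are given for the tensor
\[ \mathbf{v}\in\mathbb{K}^{I_1}\otimes\mathbb{K}^{I_2}\otimes\mathbb{K}^{I_3}\otimes\mathbb{K}^{I_4}\qquad\text{with } I_1=I_2=I_3=I_4=\{1,2\}. \]
The matrix $\mathcal{M}_1(\mathbf{v})$ belongs to $\mathbb{K}^{I_1\times J}$ with $J=I_2\times I_3\times I_4$. For the sake of the following notation we introduce the lexicographical ordering of the triples from $I_2\times I_3\times I_4$: $(1,1,1),(1,1,2),(1,2,1),\ldots,(2,2,2)$. Under these assumptions, $\mathbb{K}^{I_1\times J}$ becomes $\mathbb{K}^{2\times8}$:³
\[
\mathcal{M}_1(\mathbf{v})=\begin{pmatrix}
v_{\mathbf{1}111}&v_{\mathbf{1}112}&v_{\mathbf{1}121}&v_{\mathbf{1}122}&v_{\mathbf{1}211}&v_{\mathbf{1}212}&v_{\mathbf{1}221}&v_{\mathbf{1}222}\\
v_{\mathbf{2}111}&v_{\mathbf{2}112}&v_{\mathbf{2}121}&v_{\mathbf{2}122}&v_{\mathbf{2}211}&v_{\mathbf{2}212}&v_{\mathbf{2}221}&v_{\mathbf{2}222}
\end{pmatrix}.
\]
$\mathcal{M}_2(\mathbf{v})$ belongs to $\mathbb{K}^{I_2\times J}$ with $J=I_1\times I_3\times I_4$. Together with the lexicographical ordering in $J$ we get
\[
\mathcal{M}_2(\mathbf{v})=\begin{pmatrix}
v_{1\mathbf{1}11}&v_{1\mathbf{1}12}&v_{1\mathbf{1}21}&v_{1\mathbf{1}22}&v_{2\mathbf{1}11}&v_{2\mathbf{1}12}&v_{2\mathbf{1}21}&v_{2\mathbf{1}22}\\
v_{1\mathbf{2}11}&v_{1\mathbf{2}12}&v_{1\mathbf{2}21}&v_{1\mathbf{2}22}&v_{2\mathbf{2}11}&v_{2\mathbf{2}12}&v_{2\mathbf{2}21}&v_{2\mathbf{2}22}
\end{pmatrix}.
\]
Similarly,
\[
\mathcal{M}_3(\mathbf{v})=\begin{pmatrix}
v_{11\mathbf{1}1}&v_{11\mathbf{1}2}&v_{12\mathbf{1}1}&v_{12\mathbf{1}2}&v_{21\mathbf{1}1}&v_{21\mathbf{1}2}&v_{22\mathbf{1}1}&v_{22\mathbf{1}2}\\
v_{11\mathbf{2}1}&v_{11\mathbf{2}2}&v_{12\mathbf{2}1}&v_{12\mathbf{2}2}&v_{21\mathbf{2}1}&v_{21\mathbf{2}2}&v_{22\mathbf{2}1}&v_{22\mathbf{2}2}
\end{pmatrix},
\]
\[
\mathcal{M}_4(\mathbf{v})=\begin{pmatrix}
v_{111\mathbf{1}}&v_{112\mathbf{1}}&v_{121\mathbf{1}}&v_{122\mathbf{1}}&v_{211\mathbf{1}}&v_{212\mathbf{1}}&v_{221\mathbf{1}}&v_{222\mathbf{1}}\\
v_{111\mathbf{2}}&v_{112\mathbf{2}}&v_{121\mathbf{2}}&v_{122\mathbf{2}}&v_{211\mathbf{2}}&v_{212\mathbf{2}}&v_{221\mathbf{2}}&v_{222\mathbf{2}}
\end{pmatrix}.
\]
Next, we consider $\alpha=\{1,2\}$. $\mathcal{M}_{\{1,2\}}(\mathbf{v})$ belongs to $\mathbb{K}^{I\times J}$ with $I=I_1\times I_2$ and $J=I_3\times I_4$. Lexicographical ordering of $I$ and $J$ yields a matrix from $\mathbb{K}^{4\times4}$:
\[
\mathcal{M}_{\{1,2\}}(\mathbf{v})=\begin{pmatrix}
v_{\mathbf{11}11}&v_{\mathbf{11}12}&v_{\mathbf{11}21}&v_{\mathbf{11}22}\\
v_{\mathbf{12}11}&v_{\mathbf{12}12}&v_{\mathbf{12}21}&v_{\mathbf{12}22}\\
v_{\mathbf{21}11}&v_{\mathbf{21}12}&v_{\mathbf{21}21}&v_{\mathbf{21}22}\\
v_{\mathbf{22}11}&v_{\mathbf{22}12}&v_{\mathbf{22}21}&v_{\mathbf{22}22}
\end{pmatrix}.
\]
² This means that $\mathcal{M}_\alpha$ is replaced by $\Xi'^{-1}\circ\mathcal{M}_\alpha$, where $\Xi'$ is the isomorphism from the matrix space $\mathbb{K}^{\mathbf{I}_\alpha\times\mathbf{I}_{\alpha^c}}$ onto the tensor space $\mathbf{V}_\alpha\otimes\mathbf{V}_{\alpha^c}$ (see Proposition 3.16b). For simplicity, we write $\mathcal{M}_\alpha$ instead of $\Xi'^{-1}\circ\mathcal{M}_\alpha$.
³ Bold face indices correspond to the row numbers.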
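Similarly,
\[
\mathcal{M}_{\{1,3\}}(\mathbf{v})=\begin{pmatrix}
v_{\mathbf{1}1\mathbf{1}1}&v_{\mathbf{1}1\mathbf{1}2}&v_{\mathbf{1}2\mathbf{1}1}&v_{\mathbf{1}2\mathbf{1}2}\\
v_{\mathbf{1}1\mathbf{2}1}&v_{\mathbf{1}1\mathbf{2}2}&v_{\mathbf{1}2\mathbf{2}1}&v_{\mathbf{1}2\mathbf{2}2}\\
v_{\mathbf{2}1\mathbf{1}1}&v_{\mathbf{2}1\mathbf{1}2}&v_{\mathbf{2}2\mathbf{1}1}&v_{\mathbf{2}2\mathbf{1}2}\\
v_{\mathbf{2}1\mathbf{2}1}&v_{\mathbf{2}1\mathbf{2}2}&v_{\mathbf{2}2\mathbf{2}1}&v_{\mathbf{2}2\mathbf{2}2}
\end{pmatrix},
\]
\[
\mathcal{M}_{\{1,4\}}(\mathbf{v})=\begin{pmatrix}
v_{\mathbf{1}11\mathbf{1}}&v_{\mathbf{1}12\mathbf{1}}&v_{\mathbf{1}21\mathbf{1}}&v_{\mathbf{1}22\mathbf{1}}\\
v_{\mathbf{1}11\mathbf{2}}&v_{\mathbf{1}12\mathbf{2}}&v_{\mathbf{1}21\mathbf{2}}&v_{\mathbf{1}22\mathbf{2}}\\
v_{\mathbf{2}11\mathbf{1}}&v_{\mathbf{2}12\mathbf{1}}&v_{\mathbf{2}21\mathbf{1}}&v_{\mathbf{2}22\mathbf{1}}\\
v_{\mathbf{2}11\mathbf{2}}&v_{\mathbf{2}12\mathbf{2}}&v_{\mathbf{2}21\mathbf{2}}&v_{\mathbf{2}22\mathbf{2}}
\end{pmatrix}.
\]
The other $\mathcal{M}_\alpha(\mathbf{v})$ are transposed versions of the already described matrices:
\[
\mathcal{M}_{\{2,3\}}=\mathcal{M}_{\{1,4\}}^{T},\qquad
\mathcal{M}_{\{2,4\}}=\mathcal{M}_{\{1,3\}}^{T},\qquad
\mathcal{M}_{\{3,4\}}=\mathcal{M}_{\{1,2\}}^{T},
\]
\[
\mathcal{M}_{\{1,2,3\}}=\mathcal{M}_4^{T},\quad
\mathcal{M}_{\{1,2,4\}}=\mathcal{M}_3^{T},\quad
\mathcal{M}_{\{1,3,4\}}=\mathcal{M}_2^{T},\quad
\mathcal{M}_{\{2,3,4\}}=\mathcal{M}_1^{T}.
\]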
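The matricisations of Example 5.5 can be generated mechanically. The following NumPy sketch is an illustration under stated assumptions: the helper name `matricise` and the 0-based numbering of the directions are choices of this sketch, and the entry $v_{i_1i_2i_3i_4}$ is encoded as the integer with the digits $i_1i_2i_3i_4$ so that the printed matrices can be compared with the displays above.

```python
import numpy as np

def matricise(v, alpha):
    """Return M_alpha(v) for an order-d array v; alpha lists 0-based directions."""
    alpha = sorted(alpha)
    comp = [k for k in range(v.ndim) if k not in alpha]   # alpha^c in increasing order
    rows = int(np.prod([v.shape[k] for k in alpha]))
    # Move the alpha directions to the front; a C-order reshape then realises
    # the lexicographical ordering of the row and column multi-indices.
    return np.transpose(v, alpha + comp).reshape(rows, -1)

# Encode v_{i1 i2 i3 i4} as the integer "i1 i2 i3 i4", e.g. v_1212 -> 1212.
idx = np.indices((2, 2, 2, 2)) + 1
v = 1000 * idx[0] + 100 * idx[1] + 10 * idx[2] + idx[3]

M1 = matricise(v, [0])          # alpha = {1}
M13 = matricise(v, [0, 2])      # alpha = {1,3}
M24 = matricise(v, [1, 3])      # alpha = {2,4}
M234 = matricise(v, [1, 2, 3])  # alpha = {2,3,4}

print(M1)
print(M13)
print("M_{2,4} equals M_{1,3}^T:", bool((M24 == M13.T).all()))
print("M_{2,3,4} equals M_1^T:", bool((M234 == M1.T).all()))
```

Transposing the axes before reshaping is what distinguishes the different $\mathcal{M}_\alpha$; a plain reshape alone would only produce the matricisations for $\alpha=\{1\}$, $\{1,2\}$, $\{1,2,3\}$.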
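5.2.2.2 Invariant Properties and α-Rank
The interpretation of tensors $\mathbf{v}$ as matrices $M$ enables us (i) to transfer the matrix terminology from $M$ to $\mathbf{v}$, (ii) to apply all matrix techniques to $M$. In Remark 3.17 we considered the isomorphism $\mathcal{M}(\mathbf{v})=M$ and stated that multiplication of a tensor by a Kronecker product $A\otimes B$ has the isomorphic expression $\mathcal{M}((A\otimes B)\mathbf{v})=A\,\mathcal{M}(\mathbf{v})\,B^{T}=AMB^{T}$. More generally, the following statement holds, which is the matrix interpretation of Remark 5.4.
Lemma 5.6. Let $\mathbf{v}\in\mathbf{V}=\bigotimes_{j\in D}\mathbb{K}^{I_j}$ and $\mathbf{A}=\bigotimes_{j\in D}A^{(j)}\in\bigotimes_{j\in D}L(\mathbb{K}^{I_j},\mathbb{K}^{J_j})$. The product $\mathbf{A}\mathbf{v}\in\mathbf{W}=\bigotimes_{j\in D}\mathbb{K}^{J_j}$ satisfies
\[
\mathcal{M}_\alpha(\mathbf{A}\mathbf{v})=\mathbf{A}^{(\alpha)}\,\mathcal{M}_\alpha(\mathbf{v})\,\mathbf{A}^{(\alpha^c)T}
\quad\text{with } \mathbf{A}^{(\alpha)}=\bigotimes_{j\in\alpha}A^{(j)},\ \ \mathbf{A}^{(\alpha^c)}=\bigotimes_{j\in\alpha^c}A^{(j)}. \tag{5.5}
\]
In particular, if all $A^{(j)}$ are regular matrices, the matrix ranks of $\mathcal{M}_\alpha(\mathbf{A}\mathbf{v})$ and $\mathcal{M}_\alpha(\mathbf{v})$ coincide.
Proof. Define the index sets $\mathbf{I}_\alpha:=\times_{j\in\alpha}I_j$ and $\mathbf{I}_{\alpha^c}:=\times_{j\in\alpha^c}I_j$, and similarly $\mathbf{J}_\alpha$ and $\mathbf{J}_{\alpha^c}$. The indices $\mathbf{i}\in\mathbf{I}:=\times_{j\in D}I_j$ are written as $(\mathbf{i}',\mathbf{i}'')$ with $\mathbf{i}'\in\mathbf{I}_\alpha$ and $\mathbf{i}''\in\mathbf{I}_{\alpha^c}$. Similarly for $\mathbf{j}=(\mathbf{j}',\mathbf{j}'')\in\mathbf{J}$. Note that $\bullet_{\mathbf{j}',\mathbf{j}''}$ denotes a matrix entry, while $\bullet_{\mathbf{j}}=\bullet_{(\mathbf{j}',\mathbf{j}'')}$ is a tensor entry. The identity
\[
\begin{aligned}
\mathcal{M}_\alpha(\mathbf{A}\mathbf{v})_{\mathbf{j}',\mathbf{j}''}
&=(\mathbf{A}\mathbf{v})_{(\mathbf{j}',\mathbf{j}'')}
=\sum_{\mathbf{i}\in\mathbf{I}}\mathbf{A}_{(\mathbf{j}',\mathbf{j}''),\mathbf{i}}\,\mathbf{v}_{\mathbf{i}}
=\sum_{\mathbf{i}'\in\mathbf{I}_\alpha}\sum_{\mathbf{i}''\in\mathbf{I}_{\alpha^c}}\mathbf{A}_{(\mathbf{j}',\mathbf{j}''),(\mathbf{i}',\mathbf{i}'')}\,\mathbf{v}_{(\mathbf{i}',\mathbf{i}'')}\\
&=\sum_{\mathbf{i}'\in\mathbf{I}_\alpha}\sum_{\mathbf{i}''\in\mathbf{I}_{\alpha^c}}\mathbf{A}^{(\alpha)}_{\mathbf{j}',\mathbf{i}'}\,\mathbf{v}_{(\mathbf{i}',\mathbf{i}'')}\,\mathbf{A}^{(\alpha^c)}_{\mathbf{j}'',\mathbf{i}''}
=\sum_{\mathbf{i}'\in\mathbf{I}_\alpha}\sum_{\mathbf{i}''\in\mathbf{I}_{\alpha^c}}\mathbf{A}^{(\alpha)}_{\mathbf{j}',\mathbf{i}'}\,\mathcal{M}_\alpha(\mathbf{v})_{\mathbf{i}',\mathbf{i}''}\,\mathbf{A}^{(\alpha^c)}_{\mathbf{j}'',\mathbf{i}''}
\end{aligned}
\]
proves (5.5). □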
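Identity (5.5) lends itself to a quick numerical check. The sketch below, for $d=3$ and $\alpha=\{1,3\}$, is only an illustration: the sizes, the random data and the helper `matricise` are assumptions of the sketch, and `numpy.kron` is used for the Kronecker products $\mathbf{A}^{(\alpha)}$.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J = (2, 3, 4), (3, 2, 5)                 # sizes #I_j and #J_j for d = 3
v = rng.standard_normal(I)
A = [rng.standard_normal((J[k], I[k])) for k in range(3)]

# Av for the elementary Kronecker product A = A1 (x) A2 (x) A3.
Av = np.einsum('ai,bj,ck,ijk->abc', A[0], A[1], A[2], v)

def matricise(t, alpha):
    """M_alpha(t); alpha is an increasing list of 0-based directions."""
    comp = [k for k in range(t.ndim) if k not in alpha]
    rows = int(np.prod([t.shape[k] for k in alpha]))
    return np.transpose(t, list(alpha) + comp).reshape(rows, -1)

alpha = [0, 2]                              # alpha = {1,3}, alpha^c = {2}
lhs = matricise(Av, alpha)
rhs = np.kron(A[0], A[2]) @ matricise(v, alpha) @ A[1].T
print("max deviation in (5.5):", float(np.abs(lhs - rhs).max()))
```

The ordering convention matters here: `np.kron(A[0], A[2])` indexes its rows and columns lexicographically by $(j_1,j_3)$ and $(i_1,i_3)$, which is the same ordering that `matricise` produces.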
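According to item (i), we may define the matrix rank of $\mathcal{M}_\alpha(\mathbf{v})$ as a property of $\mathbf{v}$. By Lemma 5.6, the rank of $\mathcal{M}_\alpha(\mathbf{v})$ is invariant under tensor space isomorphisms.
Definition 5.7 ($\operatorname{rank}_\alpha$). For any⁴ $\alpha\subset D$ in (5.3b) and all $j\in D$ we define
\[ \operatorname{rank}_\alpha(\mathbf{v}):=\operatorname{rank}(\mathcal{M}_\alpha(\mathbf{v})), \tag{5.6a} \]
\[ \operatorname{rank}_j(\mathbf{v}):=\operatorname{rank}_{\{j\}}(\mathbf{v})=\operatorname{rank}(\mathcal{M}_j(\mathbf{v})). \tag{5.6b} \]
In 1927, Hitchcock [164, p. 170] introduced $\operatorname{rank}_j(\mathbf{v})$ as ‘the rank on the $j$th index’. Even $\operatorname{rank}_\alpha(\mathbf{v})$ is already defined by him as the ‘α-plex rank’. We shall call it the j-rank or the α-rank, respectively. Further properties of the α-rank will follow in Lemma 6.20 and Corollary 6.21.
The tensor rank defined in (3.20) is not directly related to the family of ranks $\{\operatorname{rank}_\alpha(\mathbf{v}):\emptyset\subsetneqq\alpha\subsetneqq D\}$ in (5.6a) or the ranks $\{\operatorname{rank}_j(\mathbf{v}):j\in D\}$ in (5.6b). Later, in Remark 6.24, we shall prove
\[ \operatorname{rank}_\alpha(\mathbf{v})\le\operatorname{rank}(\mathbf{v})\qquad\text{for all } \alpha\subset D. \]
Remark 5.8. A tensor $\mathbf{v}\in\bigotimes_{j\in D}V_j$ with random entries satisfies
\[ \operatorname{rank}_\alpha(\mathbf{v})=\min\Bigl\{\prod_{j\in\alpha}\dim(V_j),\ \prod_{j\in\alpha^c}\dim(V_j)\Bigr\} \]
with probability one for all $\alpha\subset D$.
Proof. Apply Remark 2.5 to $\mathcal{M}_\alpha(\mathbf{v})$. □
Remark 5.9. Let $\|\cdot\|$ be the Euclidean norm in $\mathbf{V}=\bigotimes_{j=1}^{d}\mathbb{K}^{I_j}$ (cf. Example 4.149), while $\|\cdot\|_F$ is the Frobenius norm for matrices (cf. (2.9)). Then the norms coincide:
\[ \|\mathbf{v}\|=\|\mathcal{M}_\alpha(\mathbf{v})\|_F\qquad\text{for all } \emptyset\subsetneqq\alpha\subsetneqq D \text{ and all } \mathbf{v}\in\mathbf{V}. \]
Proof. Use Remark 2.9. □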
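Remarks 5.8 and 5.9 can be observed on a randomly generated tensor. The following sketch assumes a real tensor of size $2\times3\times4$ with Gaussian entries and a fixed seed; the helper `unfold` (for $\mathcal{M}_j$ with 0-based $j$) is a name introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (2, 3, 4)
v = rng.standard_normal(shape)

def unfold(t, j):
    """M_j(t): direction j (0-based) indexes the rows, the others the columns."""
    return np.moveaxis(t, j, 0).reshape(t.shape[j], -1)

ranks = [int(np.linalg.matrix_rank(unfold(v, j))) for j in range(v.ndim)]
expected = [min(n, v.size // n) for n in shape]          # min{dim V_j, dim V_[j]}
norms = [float(np.linalg.norm(unfold(v, j), 'fro')) for j in range(v.ndim)]

print("j-ranks:", ranks, "expected:", expected)
print("Frobenius norms:", norms, "Euclidean norm of v:", float(np.linalg.norm(v)))
```

Since a matricisation only rearranges the entries, the norms agree up to rounding, whereas the generic ranks depend on $j$.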
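5.2.2.3 Singular-Value Decomposition
Later, singular values of $\mathcal{M}_\alpha(\mathbf{v})$ will be important. The next proposition compares the singular values of $\mathcal{M}_\alpha(\mathbf{v})$ and $\mathcal{M}_\alpha(\mathbf{A}\mathbf{v})$.
Proposition 5.10. Let $\mathbf{v}\in\mathbf{V}=\bigotimes_{\kappa\in D}\mathbb{K}^{I_\kappa}$. The Kronecker matrix $\mathbf{A}=\mathbf{A}^{(\alpha)}\otimes\mathbf{A}^{(\alpha^c)}$ is assumed to be composed of $\mathbf{A}^{(\alpha)}\in L(\mathbf{V}_\alpha,\mathbf{V}_\alpha)$ and $\mathbf{A}^{(\alpha^c)}\in L(\mathbf{V}_{\alpha^c},\mathbf{V}_{\alpha^c})$ with the properties $\mathbf{A}^{(\alpha)H}\mathbf{A}^{(\alpha)}\le I$ and $\mathbf{A}^{(\alpha^c)H}\mathbf{A}^{(\alpha^c)}\le I$. Then the singular values fulfil
\[ \sigma_k(\mathcal{M}_\alpha(\mathbf{A}\mathbf{v}))\le\sigma_k(\mathcal{M}_\alpha(\mathbf{v}))\qquad\text{for all } k\in\mathbb{N}. \]
Proof. Combine $\mathcal{M}_\alpha(\mathbf{A}\mathbf{v})=\mathbf{A}^{(\alpha)}\mathcal{M}_\alpha(\mathbf{v})\mathbf{A}^{(\alpha^c)T}$ in (5.5) and Lemma 2.31c. □
⁴ Usually, we avoid $\alpha=\emptyset$ and $\alpha=D$. Formally, the definition of $\mathcal{M}_\emptyset$, $\mathcal{M}_D$ from Footnote 1 yields $\operatorname{rank}_\emptyset(\mathbf{v})=\operatorname{rank}_D(\mathbf{v})=1$ for $\mathbf{v}\neq0$ and $\operatorname{rank}_\emptyset(0)=\operatorname{rank}_D(0)=0$, otherwise.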
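Corollary 5.11. The assumptions of Proposition 5.10 are in particular satisfied if $\mathbf{A}^{(\alpha)}$ and $\mathbf{A}^{(\alpha^c)}$ are orthogonal projections.
Remark 5.12. The reduced singular-value decomposition
\[ \mathcal{M}_\alpha(\mathbf{v})=U\Sigma V^{T}=\sum_{i=1}^{r_\alpha}\sigma_i^{(\alpha)}u_iv_i^{T} \]
($\sigma_i^{(\alpha)}>0$, $u_i$, $v_i$ columns of $U$ and $V$, $r_\alpha=\operatorname{rank}_\alpha(\mathbf{v})$) translates⁵ into
\[ \mathbf{v}=\sum_{i=1}^{r_\alpha}\sigma_i^{(\alpha)}\,\mathbf{u}_i\otimes\mathbf{v}_i\qquad(\mathbf{u}_i\in\mathbf{V}_\alpha,\ \mathbf{v}_i\in\mathbf{V}_{\alpha^c}). \]
Here $u_i$ and $v_i$ are the isomorphic vector interpretations of the tensors $\mathbf{u}_i$, $\mathbf{v}_i$.
Remark 5.13. (a) In the case of matrices (i.e., $D=\{1,2\}$), the ranks are equal:⁶ $\operatorname{rank}_1(\mathbf{v})=\operatorname{rank}_2(\mathbf{v})$. Further, the singular values of $\mathcal{M}_\alpha(\mathbf{v})$ coincide: $\sigma_i^{(1)}=\sigma_i^{(2)}$.
(b) If $d\ge3$, the values $\operatorname{rank}_j(\mathbf{v})$ ($j\in D$) may not coincide. Furthermore, the singular values $\sigma_i^{(j)}$ of $\mathcal{M}_j(\mathbf{v})$ may be different; however, the following quantity is invariant:
\[ \sum_{i=1}^{\operatorname{rank}_j(\mathbf{v})}\bigl(\sigma_i^{(j)}\bigr)^2=\|\mathbf{v}\|_2^2\qquad\text{for all } j\in D\quad(\|\cdot\|_2 \text{ in (4.149)}). \]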
⁵ Let the tensors $\mathbf{u}_i$ and $\mathbf{v}_i$ be vectorised into $u_i$ and $v_i$ (cf. §5.1). Then $\mathcal{M}(u_i\otimes v_i)=u_iv_i^{T}$ is used (cf. (1.3)).
⁶ The true generalisation of this property for general $d$ is described by (6.13a).
Proof. (i) Let $\mathcal{M}(\mathbf{v})=M$ be the isomorphism (1.3) between $\mathbf{v}\in\mathbb{K}^{I_1}\otimes\mathbb{K}^{I_2}$ and the matrix $M\in\mathbb{K}^{I_1\times I_2}$. Then $\mathcal{M}_1(\mathbf{v})=M$, while $\mathcal{M}_2(\mathbf{v})=M^{T}$. Since $M$ and $M^{T}$ have identical rank and identical singular values, Part (a) follows.
(ii) Consider the tensor
\[ \mathbf{v}=a_1\otimes a_2\otimes a_3+a_1\otimes b_2\otimes b_3\in\bigotimes_{j=1}^{3}\mathbb{K}^2 \]
with $a_i=\binom{1}{0}$ ($i=1,2,3$) and $b_i=\binom{0}{1}$ ($i=2,3$). We have
\[ \mathcal{M}_1(\mathbf{v})=a_1\otimes c\in\mathbb{K}^2\otimes(\mathbb{K}^2\otimes\mathbb{K}^2)\cong\mathbb{K}^2\otimes\mathbb{K}^4\quad\text{with } c:=a_2\otimes a_3+b_2\otimes b_3\cong(1\ 0\ 0\ 1)^{T}\in\mathbb{K}^4 \]
(i.e., $\mathcal{M}_1(\mathbf{v})\cong\begin{pmatrix}1&0&0&1\\0&0&0&0\end{pmatrix}$ has rank 1), whereas
\[ \mathcal{M}_2(\mathbf{v})=a_2\otimes d+b_2\otimes e\cong\begin{pmatrix}d^{T}\\e^{T}\end{pmatrix}\quad\text{with } d:=a_1\otimes a_3\cong(1\ 0\ 0\ 0)^{T},\ \ e:=a_1\otimes b_3\cong(0\ 1\ 0\ 0)^{T} \]
has two linearly independent rows and, therefore, rank 2.
(iii) Since the rank is also the number of positive singular values, they must be different for the given example. The sum of the squared singular values is the squared Frobenius norm of the corresponding matrix: $\|\mathcal{M}_j(\mathbf{v})\|_F^2$. Since the matrix entries of $\mathcal{M}_j(\mathbf{v})$ are only a permutation of the entries of $\mathbf{v}$ (cf. Example 5.5), the sum of their squares is equal to $\|\mathbf{v}\|_2^2$. □
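The example from part (ii) of the proof is small enough to be evaluated directly. The following NumPy sketch, with the illustrative helper `unfold` for $\mathcal{M}_j$, computes the three $j$-ranks and the sums of the squared singular values.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# v = a1 (x) a2 (x) a3 + a1 (x) b2 (x) b3 in K^2 (x) K^2 (x) K^2
v = np.einsum('i,j,k->ijk', a, a, a) + np.einsum('i,j,k->ijk', a, b, b)

def unfold(t, j):
    """M_j(t) with 0-based direction j."""
    return np.moveaxis(t, j, 0).reshape(t.shape[j], -1)

mats = [unfold(v, j) for j in range(3)]
ranks = [int(np.linalg.matrix_rank(m)) for m in mats]
sq_sums = [float(np.sum(np.linalg.svd(m, compute_uv=False) ** 2)) for m in mats]

print("rank_1, rank_2, rank_3:", ranks)
print("sums of squared singular values:", sq_sums, " ||v||^2 =", float(np.sum(v ** 2)))
```

The ranks differ between the directions, while each sum of squared singular values equals $\|\mathbf{v}\|_2^2=2$, as stated in Remark 5.13b.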
We may pose the opposite question: Given any real values $\sigma_i^{(j)}\ge0$ subject to (4.149), are there tensors possessing these $\sigma_i^{(j)}$ as singular values? The answer is negative (cf. Hackbusch–Uschmajew [154], Hackbusch–Kressner–Uschmajew [151]).
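5.2.2.4 Matricisation of Kronecker Matrices
So far, the tensor $\mathbf{v}$ involves a single index tuple $\mathbf{i}=(i_j)_{j\in D}$. In the case of a tensor space formed from matrix spaces $X_j=\mathbb{C}^{I_j\times J_j}=L(U_j,V_j)$, the Kronecker matrix $\mathbf{M}\in\mathbf{X}=\bigotimes_{j=1}^{d}X_j$ has two multi-indices: $\mathbf{M}[\mathbf{i},\mathbf{j}]$, $\mathbf{i}\in\mathbf{I}$, $\mathbf{j}\in\mathbf{J}$. The matricisation $\mathcal{M}_\alpha:\mathbf{X}=L(\mathbf{U},\mathbf{V})\to\mathbf{X}_\alpha\otimes\mathbf{X}_{\alpha^c}=L(\mathbf{U}_\alpha,\mathbf{V}_\alpha)\otimes L(\mathbf{U}_{\alpha^c},\mathbf{V}_{\alpha^c})$ yields $\mathcal{M}_\alpha(\mathbf{M})$ with entries for
\[ [(\mathbf{i}_\alpha,\mathbf{j}_\alpha),(\mathbf{i}_{\alpha^c},\mathbf{j}_{\alpha^c})]\qquad\text{with } \mathbf{i}_\alpha\in\mathbf{I}_\alpha,\ \mathbf{j}_\alpha\in\mathbf{J}_\alpha,\ \mathbf{i}_{\alpha^c}\in\mathbf{I}_{\alpha^c},\ \mathbf{j}_{\alpha^c}\in\mathbf{J}_{\alpha^c}. \]
The rank of $\mathcal{M}_\alpha(\mathbf{M})$ is
\[ \operatorname{rank}_\alpha(\mathbf{M})=\dim\bigl(\operatorname{span}\{\mathbf{M}[(\bullet,\bullet),(\mathbf{i}_{\alpha^c},\mathbf{j}_{\alpha^c})]\in\mathbf{X}_\alpha:\ \mathbf{i}_{\alpha^c}\in\mathbf{I}_{\alpha^c},\ \mathbf{j}_{\alpha^c}\in\mathbf{J}_{\alpha^c}\}\bigr). \]
We emphasise that the α-rank of $\mathbf{M}$ is completely different from the matrix rank of the Kronecker matrix $\mathbf{M}$. The multiplication rule from Remark 5.4 for $\mathcal{M}_\alpha(\mathbf{A}\mathbf{v})$ has a matrix equivalent:
\[ \mathcal{M}_\alpha(\mathbf{A}\mathbf{B})=\mathcal{M}_\alpha(\mathbf{A})\cdot\mathcal{M}_\alpha(\mathbf{B})\qquad\text{for } \mathbf{A}\in\mathbf{X}_\alpha\otimes\mathbf{X}_{\alpha^c},\ \mathbf{B}\in\mathbf{Y}_\alpha\otimes\mathbf{Y}_{\alpha^c}, \]
where $Y_j=L(V_j,W_j)$ and $\mathbf{Y}_\alpha=L(\mathbf{V}_\alpha,\mathbf{W}_\alpha)$. On the left-hand side, $\mathbf{A}\mathbf{B}$ is the matrix product between $\mathbf{X}=L(\mathbf{U},\mathbf{V})$ and $\mathbf{Y}=L(\mathbf{V},\mathbf{W})$ (Kronecker matrices of tensor order $d$), while on the right-hand side the operation $\cdot$ is the multiplication between Kronecker matrices of tensor order 2. In tensor notation
\[ \mathcal{M}_\alpha(\mathbf{A}\mathbf{v})=\mathcal{M}_\alpha(\mathbf{A})\cdot\mathcal{M}_\alpha(\mathbf{v})\qquad\text{for } \mathbf{A}\in\mathbf{X}_\alpha\otimes\mathbf{X}_{\alpha^c},\ \mathbf{v}\in\mathbf{V}_\alpha\otimes\mathbf{V}_{\alpha^c} \]
holds, while (5.5) is the matrix interpretation.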
(5.7)
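5.2.2.5 Infinite-Dimensional Spaces
For infinite-dimensional vector spaces $V_j$, these quantities generalise as follows. In the finite-dimensional case, $\operatorname{rank}(\mathcal{M}_\alpha(\mathbf{v}))$ is equal to the dimension of the range of $\mathcal{M}_\alpha(\mathbf{v})$ (cf. Remark 2.1), which is given by $\operatorname{range}(\mathcal{M}_\alpha(\mathbf{v}))=\{\mathcal{M}_\alpha(\mathbf{v})\mathbf{z}:\mathbf{z}\in\mathbf{V}_{\alpha^c}\}$. Since $\mathcal{M}_\alpha(\mathbf{v})\in\mathbf{V}_\alpha\otimes\mathbf{V}_{\alpha^c}$ has the form $\sum_\nu\mathbf{x}_\nu\otimes\mathbf{y}_\nu$ (cf. Definition 5.3), the matrix-vector multiplication $\mathcal{M}_\alpha(\mathbf{v})\mathbf{z}$ can be considered as $\sum_\nu\mathbf{z}(\mathbf{y}_\nu)\cdot\mathbf{x}_\nu\in\mathbf{V}_\alpha$, where $\mathbf{z}\in\mathbf{V}'_{\alpha^c}$ is regarded as an element of the dual vector space (for $\dim(\mathbf{V}_{\alpha^c})<\infty$, $\mathbf{V}'_{\alpha^c}$ may be identified with $\mathbf{V}_{\alpha^c}$). The mapping
\[ \sum_\nu\mathbf{x}_\nu\otimes\mathbf{y}_\nu\ \mapsto\ \sum_\nu\mathbf{z}(\mathbf{y}_\nu)\cdot\mathbf{x}_\nu \]
is denoted by $\mathrm{id}\otimes\mathbf{z}$. Then the matrix-vector multiplication $\mathcal{M}_\alpha(\mathbf{v})\mathbf{z}$ may be rewritten as $(\mathrm{id}\otimes\mathbf{z})\,\mathcal{M}_\alpha(\mathbf{v})$. This leads to the notation
\[ \operatorname{rank}_\alpha(\mathbf{v}):=\dim\{(\mathrm{id}\otimes\mathbf{z})\,\mathcal{M}_\alpha(\mathbf{v}):\ \mathbf{z}\in\mathbf{V}'_{\alpha^c}\}. \]
The transition to the dual space $\mathbf{V}'_{\alpha^c}$ (or $\mathbf{V}^*_{\alpha^c}$) is necessary since, in the infinite-dimensional case, $\mathcal{M}_\alpha(\mathbf{v})$ cannot be interpreted as a mapping from $\mathbf{V}_{\alpha^c}$ into $\mathbf{V}_\alpha$ but as a mapping from $\mathbf{V}'_{\alpha^c}$ into $\mathbf{V}_\alpha$. The set $\{(\mathrm{id}\otimes\mathbf{z})\,\mathcal{M}_\alpha(\mathbf{v}):\mathbf{z}\in\mathbf{V}'_{\alpha^c}\}$ on the right-hand side will be defined in §6 as the minimal subspace $\mathbf{U}_\alpha^{\min}(\mathbf{v})$, so that
\[ \operatorname{rank}_\alpha(\mathbf{v}):=\dim(\mathbf{U}_\alpha^{\min}(\mathbf{v})) \tag{5.8} \]
is the generalisation to infinite-dimensional (algebraic) vector spaces, as well as to Banach spaces.
An identification of $\mathbf{V}^*_{\alpha^c}$ with $\mathbf{V}_{\alpha^c}$ becomes possible if all $V_j$ are Hilbert spaces. This case is discussed next.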
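5.2.3 Hilbert Structure
Next, we consider a pre-Hilbert space $\mathbf{V}={}_a\bigotimes_{j\in D}V_j$ and the left-sided singular-value decomposition problem of $\mathcal{M}_\alpha(\mathbf{v})$ for some $\mathbf{v}\in\mathbf{V}$. The singular-value decomposition is
\[ \mathcal{M}_\alpha(\mathbf{v})=\sum_{i=1}^{r}\sigma_i^{(\alpha)}\,\mathbf{u}_i\otimes\mathbf{v}_i, \tag{5.9} \]
where
\[ \mathbf{u}_i\in\mathbf{V}_\alpha:={}_a\bigotimes_{j\in\alpha}V_j,\qquad \mathbf{v}_i\in\mathbf{V}_{\alpha^c}:={}_a\bigotimes_{j\in\alpha^c}V_j \]
are two orthonormal families $\{\mathbf{u}_i\}$, $\{\mathbf{v}_i\}$, and $\sigma_1^{(\alpha)}\ge\ldots\ge\sigma_r^{(\alpha)}>0$. The left-sided singular-value decomposition problem asks for $\{\mathbf{u}_i\}$ and $\{\sigma_i^{(\alpha)}\}$. If $V_j=\mathbb{K}^{I_j}$ allows us to interpret $\mathcal{M}_\alpha(\mathbf{v})$ as a matrix, the data $\{\mathbf{u}_i\}$, $\{\sigma_i^{(\alpha)}\}$ are determined by LSVD($\mathbf{I}_\alpha$, $\mathbf{I}_{\alpha^c}$, $r$, $\mathcal{M}_\alpha(\mathbf{v})$, $U$, $\Sigma$) (cf. (2.29)). We recall that its computation may use the diagonalisation
\[ \mathcal{M}_\alpha(\mathbf{v})\,\mathcal{M}_\alpha(\mathbf{v})^{H}=\sum_{i=1}^{r}\bigl(\sigma_i^{(\alpha)}\bigr)^2\,\mathbf{u}_i\mathbf{u}_i^{H}. \]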
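In the infinite-dimensional setting, the latter expression can be expressed by the partial scalar product⁷ in §4.5.5: $\langle\mathcal{M}_\alpha(\mathbf{v}),\mathcal{M}_\alpha(\mathbf{v})\rangle_{\alpha^c}=\langle\mathbf{v},\mathbf{v}\rangle_{\alpha^c}\in\mathbf{V}_\alpha\otimes\mathbf{V}_\alpha$. Assuming a singular-value decomposition $\mathcal{M}_\alpha(\mathbf{v})=\sum_{i=1}^{r}\sigma_i^{(\alpha)}\mathbf{u}_i\otimes\mathbf{v}_i$ (possibly with $r=\infty$, cf. (4.16)), the partial scalar product yields the diagonalisation
\[
\begin{aligned}
\langle\mathcal{M}_\alpha(\mathbf{v}),\mathcal{M}_\alpha(\mathbf{v})\rangle_{\alpha^c}
&=\Bigl\langle\sum_{i=1}^{r}\sigma_i^{(\alpha)}\mathbf{u}_i\otimes\mathbf{v}_i,\ \sum_{j=1}^{r}\sigma_j^{(\alpha)}\mathbf{u}_j\otimes\mathbf{v}_j\Bigr\rangle_{\alpha^c}\\
&=\sum_{i=1}^{r}\sum_{j=1}^{r}\sigma_i^{(\alpha)}\sigma_j^{(\alpha)}\underbrace{\langle\mathbf{v}_i,\mathbf{v}_j\rangle_{\alpha^c}}_{=\delta_{ij}}\,\mathbf{u}_i\otimes\mathbf{u}_j
=\sum_{i=1}^{r}\bigl(\sigma_i^{(\alpha)}\bigr)^2\,\mathbf{u}_i\otimes\mathbf{u}_i.
\end{aligned}
\]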
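In the matrix case, the diagonalisation of $\mathcal{M}_\alpha(\mathbf{v})\mathcal{M}_\alpha(\mathbf{v})^{H}$ can be compared with a singular-value decomposition. The following NumPy sketch assumes a real random tensor of size $3\times4\times5$ and $\alpha=\{1\}$; it uses `numpy.linalg.eigh` as one possible eigenvalue solver and is not meant to represent the procedure LSVD referred to in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.standard_normal((3, 4, 5))
M = v.reshape(3, -1)                 # M_1(v): first direction indexes the rows

G = M @ M.conj().T                   # M M^H, a Hermitian 3 x 3 matrix
lam, U = np.linalg.eigh(G)           # eigenvalues in ascending order
lam, U = lam[::-1], U[:, ::-1]       # reorder to descending, like singular values

sigma = np.linalg.svd(M, compute_uv=False)
print("sqrt of eigenvalues:", np.sqrt(lam))
print("singular values    :", sigma)
```

The columns of `U` are left singular vectors (determined up to sign), and the square roots of the eigenvalues reproduce the singular values.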
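We summarise.
Lemma 5.14. Let $\mathbf{V}={}_a\bigotimes_{j\in D}V_j\cong\mathbf{V}_\alpha\otimes_a\mathbf{V}_{\alpha^c}$ with $\mathbf{V}_\alpha$, $\mathbf{V}_{\alpha^c}$ as in (5.9). The left singular vectors $\mathbf{u}_i^{(\alpha)}\in\mathbf{V}_\alpha$ and the singular values $\sigma_i^{(\alpha)}$ are obtainable from the diagonalisation
\[ \langle\mathcal{M}_\alpha(\mathbf{v}),\mathcal{M}_\alpha(\mathbf{v})\rangle_{\alpha^c}=\sum_{i=1}^{r}\bigl(\sigma_i^{(\alpha)}\bigr)^2\,\mathbf{u}_i^{(\alpha)}\otimes\mathbf{u}_i^{(\alpha)}. \tag{5.10} \]
Analogously, the right singular vectors $\mathbf{v}_i^{(\alpha)}\in\mathbf{V}_{\alpha^c}$ and the singular values $\sigma_i^{(\alpha)}$ are obtainable from the diagonalisation
\[ \langle\mathcal{M}_\alpha(\mathbf{v}),\mathcal{M}_\alpha(\mathbf{v})\rangle_{\alpha}=\sum_{i=1}^{r}\bigl(\sigma_i^{(\alpha)}\bigr)^2\,\mathbf{v}_i^{(\alpha)}\otimes\mathbf{v}_i^{(\alpha)}. \]
⁷ If the image $\mathcal{M}_\alpha(\mathbf{v})=\mathbf{v}^{(\alpha)}\otimes\mathbf{v}^{(\alpha^c)}$ under the isomorphism $\mathcal{M}_\alpha:\mathbf{V}\to\mathbf{V}_{(\alpha)}\otimes\mathbf{V}_{(\alpha^c)}$ is an elementary tensor, the partial scalar product is defined by $\langle\mathcal{M}_\alpha(\mathbf{v}),\mathcal{M}_\alpha(\mathbf{v})\rangle_{\alpha^c}=\bigl\langle\mathbf{v}^{(\alpha^c)},\mathbf{v}^{(\alpha^c)}\bigr\rangle_{\alpha^c}\cdot\mathbf{v}^{(\alpha)}\otimes\mathbf{v}^{(\alpha)}\in\mathbf{V}_\alpha\otimes\mathbf{V}_\alpha$. The expression $\langle\mathbf{v},\mathbf{v}\rangle_{\alpha^c}$ has the same meaning.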
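Corollary 4.161 allows us to determine the partial scalar product $\langle\cdot,\cdot\rangle_\beta$ from $\langle\cdot,\cdot\rangle_\alpha$ if $\alpha\subsetneqq\beta\subset D$. As a consequence, $\langle\mathcal{M}_\alpha(\mathbf{v}),\mathcal{M}_\alpha(\mathbf{v})\rangle_{\alpha^c}$ can be obtained from $\langle\mathcal{M}_\beta(\mathbf{v}),\mathcal{M}_\beta(\mathbf{v})\rangle_{\beta^c}$:
\[ \langle\mathcal{M}_\alpha(\mathbf{v}),\mathcal{M}_\alpha(\mathbf{v})\rangle_{\alpha^c}=C_{\beta\setminus\alpha}\,\langle\mathcal{M}_\beta(\mathbf{v}),\mathcal{M}_\beta(\mathbf{v})\rangle_{\beta^c} \tag{5.11} \]
with the contraction $C_{\beta\setminus\alpha}$ from Definition 4.160. In order to apply $C_{\beta\setminus\alpha}$, the tensor $\langle\mathcal{M}_\beta(\mathbf{v}),\mathcal{M}_\beta(\mathbf{v})\rangle_{\beta^c}\in\mathbf{V}_\beta\otimes\mathbf{V}_\beta$ is interpreted as a tensor in $\mathbf{V}_\alpha\otimes\mathbf{V}_{\beta\setminus\alpha}\otimes\mathbf{V}_\alpha\otimes\mathbf{V}_{\beta\setminus\alpha}$. Let $\{\mathbf{b}_i^{(\alpha)}:1\le i\le r_\alpha\}$ be a basis of $\mathbf{V}_\alpha$. We obtain the following result.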
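Theorem 5.15. Assume $\emptyset\subsetneqq\alpha_1\subsetneqq\alpha\subset D$ and
\[ \langle\mathcal{M}_\alpha(\mathbf{v}),\mathcal{M}_\alpha(\mathbf{v})\rangle_{\alpha^c}=\sum_{i,j=1}^{r_\alpha}e_{ij}^{(\alpha)}\,\mathbf{b}_i^{(\alpha)}\otimes\mathbf{b}_j^{(\alpha)}\in\mathbf{V}_\alpha\otimes\mathbf{V}_\alpha. \tag{5.12a} \]
Set $\alpha_2:=\alpha\setminus\alpha_1$. Then $\alpha=\alpha_1\,\dot\cup\,\alpha_2$ holds. Consider $\mathbf{b}_i^{(\alpha)}\in\mathbf{V}_\alpha$ as elements of $\mathbf{V}_{\alpha_1}\otimes\mathbf{V}_{\alpha_2}$ with the representation
\[ \mathbf{b}_i^{(\alpha)}=\sum_{\nu=1}^{r_{\alpha_1}}\sum_{\mu=1}^{r_{\alpha_2}}c_{\nu\mu}^{(i)}\,\mathbf{b}_\nu^{(\alpha_1)}\otimes\mathbf{b}_\mu^{(\alpha_2)}. \tag{5.12b} \]
We introduce the matrices $C_i:=\bigl(c_{\nu\mu}^{(i)}\bigr)\in\mathbb{K}^{r_{\alpha_1}\times r_{\alpha_2}}$ for $1\le i\le r_\alpha$. Then
\[ \langle\mathcal{M}_{\alpha_k}(\mathbf{v}),\mathcal{M}_{\alpha_k}(\mathbf{v})\rangle_{\alpha_k^c}=\sum_{i,j=1}^{r_{\alpha_k}}e_{ij}^{(\alpha_k)}\,\mathbf{b}_i^{(\alpha_k)}\otimes\mathbf{b}_j^{(\alpha_k)}\in\mathbf{V}_{\alpha_k}\otimes\mathbf{V}_{\alpha_k}\qquad(k=1,2) \]
holds with coefficient matrices $E_{\alpha_k}=\bigl(e_{ij}^{(\alpha_k)}\bigr)\in\mathbb{K}^{r_{\alpha_k}\times r_{\alpha_k}}$ defined by
\[ E_{\alpha_1}=\sum_{i,j=1}^{r_\alpha}e_{ij}^{(\alpha)}\,C_i\,G_{\alpha_2}^{T}\,C_j^{H},\qquad E_{\alpha_2}=\sum_{i,j=1}^{r_\alpha}e_{ij}^{(\alpha)}\,C_i^{T}\,G_{\alpha_1}^{T}\,\overline{C_j}, \tag{5.12c} \]
where $G_{\alpha_k}=\bigl(g_{\nu\mu}^{(\alpha_k)}\bigr)$ is the Gram matrix with entries $g_{\nu\mu}^{(\alpha_k)}=\bigl\langle\mathbf{b}_\mu^{(\alpha_k)},\mathbf{b}_\nu^{(\alpha_k)}\bigr\rangle$.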
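Proof. The insertion of (5.12b) in (5.12a) yields
\[ \langle\mathcal{M}_\alpha(\mathbf{v}),\mathcal{M}_\alpha(\mathbf{v})\rangle_{\alpha^c}=\sum_{i,j=1}^{r_\alpha}e_{ij}^{(\alpha)}\sum_{\nu,\mu,\sigma,\tau}c_{\nu\mu}^{(i)}\,\overline{c_{\sigma\tau}^{(j)}}\ \mathbf{b}_\nu^{(\alpha_1)}\otimes\mathbf{b}_\mu^{(\alpha_2)}\otimes\mathbf{b}_\sigma^{(\alpha_1)}\otimes\mathbf{b}_\tau^{(\alpha_2)}. \]
Applying (5.11) with $\beta\setminus\alpha$ replaced by $\alpha_2=\alpha\setminus\alpha_1$ yields
\[ \langle\mathcal{M}_{\alpha_1}(\mathbf{v}),\mathcal{M}_{\alpha_1}(\mathbf{v})\rangle_{\alpha_1^c}=\sum_{i,j=1}^{r_\alpha}e_{ij}^{(\alpha)}\sum_{\nu,\mu,\sigma,\tau}c_{\nu\mu}^{(i)}\,\overline{c_{\sigma\tau}^{(j)}}\ \bigl\langle\mathbf{b}_\mu^{(\alpha_2)},\mathbf{b}_\tau^{(\alpha_2)}\bigr\rangle\ \mathbf{b}_\nu^{(\alpha_1)}\otimes\mathbf{b}_\sigma^{(\alpha_1)}, \]
proving
\[ e_{\nu\sigma}^{(\alpha_1)}=\sum_{i,j=1}^{r_\alpha}e_{ij}^{(\alpha)}\sum_{\mu,\tau}c_{\nu\mu}^{(i)}\,\overline{c_{\sigma\tau}^{(j)}}\,g_{\tau\mu}^{(\alpha_2)},\qquad\text{i.e.,}\quad E_{\alpha_1}=\sum_{i,j=1}^{r_\alpha}e_{ij}^{(\alpha)}\,C_i\,G_{\alpha_2}^{T}\,C_j^{H}. \]
The case of $E_{\alpha_2}$ is analogous. □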
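Corollary 5.16. Assume the finite-dimensional case with orthonormal basis $\{\mathbf{b}_i^{(\alpha)}\}$. Form the matrix $B_\alpha=[\mathbf{b}_1^{(\alpha)}\ \mathbf{b}_2^{(\alpha)}\ \cdots]$. Then
\[ \mathcal{M}_\alpha(\mathbf{v})\,\mathcal{M}_\alpha(\mathbf{v})^{H}=B_\alpha E_\alpha B_\alpha^{H} \]
holds in the matrix interpretation. In particular, $(\sigma_i^{(\alpha)})^2=\lambda_i^{(\alpha)}$ is valid for the singular values $\sigma_i^{(\alpha)}$ of $\mathcal{M}_\alpha(\mathbf{v})$ and the eigenvalues $\lambda_i^{(\alpha)}$ of $E_\alpha$.
In the following example, we apply the matricisation to a topological Hilbert tensor space $\mathbf{V}={}_{\|\cdot\|}\bigotimes_{j\in D}V_j$ with induced scalar product.
Example 5.17. Let $V_j=L^2(I_j)$ with $I_j\subset\mathbb{R}$ for $1\le j\le d$. Then $L^2(\mathbf{I})={}_{\|\cdot\|}\bigotimes_{j=1}^{d}V_j$ holds for $\mathbf{I}=\times_{j=1}^{d}I_j$. Consider a function $f\in L^2(\mathbf{I})$. To obtain the left singular vectors $u_i^{(j)}\in L^2(I_j)$, we form the operator
\[ K_j:=\mathcal{M}_j(f)\,\mathcal{M}_j^{*}(f)=\langle\mathcal{M}_j(f),\mathcal{M}_j(f)\rangle_{[j]}\in L(V_j,V_j). \]
The application of $K_j$ to $g\in L^2(I_j)$ is given by
\[ K_j(g)(\xi)=\int_{I_j}k_j(\xi,\xi')\,g(\xi')\,\mathrm{d}\xi'\qquad\text{with } \mathbf{I}_{[j]}=\times_{k\neq j}I_k,\quad \mathrm{d}\mathbf{x}_{[j]}=\prod_{k\neq j}\mathrm{d}x_k \text{ in} \]
\[ k_j(\xi,\xi'):=\int_{\mathbf{I}_{[j]}}f(\ldots,x_{j-1},\xi,x_{j+1},\ldots)\,f(\ldots,x_{j-1},\xi',x_{j+1},\ldots)\,\mathrm{d}\mathbf{x}_{[j]}. \]
The singular vectors $u_i^{(j)}$ and singular values $\sigma_i^{(j)}$ can be obtained from the eigenvalue problem
\[ K_j\bigl(u_i^{(j)}\bigr)=\bigl(\sigma_i^{(j)}\bigr)^2u_i^{(j)}\qquad(i\in\mathbb{N}). \]
If $f\in{}_a\bigotimes_{j=1}^{d}V_j$ is an algebraic tensor, $K_j$ has finite rank and delivers only finitely many singular vectors $u_i^{(j)}$ with positive singular values $\sigma_i^{(j)}$.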
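5.2.4 Matricisation of a Family of Tensors
Let $F=(\mathbf{v}_i)_{i\in I}$ be a family of tensors $\mathbf{v}_i\in\mathbf{V}_D=\bigotimes_{j\in D}V_j$. According to Lemma 3.27, the tuple $(\mathbf{v}_i)_{i\in I}$ may be considered as an element of the tensor space $\mathbf{V}_D\otimes\mathbb{K}^{I}$. If $D=\{1,\ldots,d\}$, define the extended index set $D_{\mathrm{ex}}:=D\cup\{d+1\}$ and the extended tensor space $\mathbf{V}_{\mathrm{ex}}=\bigotimes_{j\in D_{\mathrm{ex}}}V_j$, where $V_{d+1}:=\mathbb{K}^{I}$. Using the identification described in Lemma 3.27, we may view $F$ as an element of $\mathbf{V}_{\mathrm{ex}}$. This allows us to define $\mathcal{M}_\alpha(F)$ for all $\alpha\subset D_{\mathrm{ex}}$. For instance, $\alpha=D$ yields
\[ \mathcal{M}_D(F)=\sum_{i\in I}\mathbf{v}_i\otimes e_i\qquad(\mathbf{v}_i\in\mathbf{V}_D,\ e_i\in\mathbb{K}^{I}:\ i\text{-th unit vector}). \]
From this representation we conclude the following result about the left-sided singular-value decomposition (cf. (5.10)).
Remark 5.18. $\langle\mathcal{M}_\alpha(F),\mathcal{M}_\alpha(F)\rangle_{D_{\mathrm{ex}}\setminus\alpha}=\sum_{i\in I}\langle\mathcal{M}_\alpha(\mathbf{v}_i),\mathcal{M}_\alpha(\mathbf{v}_i)\rangle_{D\setminus\alpha}$ for $\alpha\subset D$.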
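In the matrix interpretation, Remark 5.18 states that the Gram-type matrix of the family is the sum of those of its members. A minimal NumPy sketch, assuming a family of five real tensors of size $2\times3\times4$ stored along an extra last axis (playing the role of the direction $d+1$) and $\alpha=\{1\}$:

```python
import numpy as np

rng = np.random.default_rng(2)
F = rng.standard_normal((2, 3, 4, 5))     # F[..., i] is the tensor v_i, i = 0..4

M_F = F.reshape(2, -1)                    # M_alpha(F) for alpha = {1} in D_ex
lhs = M_F @ M_F.T                         # matrix form of <M_alpha(F), M_alpha(F)>

rhs = np.zeros((2, 2))
for i in range(F.shape[-1]):
    M_i = F[..., i].reshape(2, -1)        # M_alpha(v_i)
    rhs += M_i @ M_i.T

print("max deviation:", float(np.abs(lhs - rhs).max()))
```

Both sides sum the same products of entries over all remaining indices, only grouped differently.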
5.3 Tensorisation
Tensorisation is the opposite of vectorisation: a vector is isomorphically transformed into a tensor, even when the tensor structure is not given beforehand. One example of tensorisation has been presented in §1.2.4.1. There, $u$ and $f$ are grid functions and are usually considered as vectors. Because of the special shape of the grid $G_n$, the entries of $u$ and $f$ are of the form $u_{ijk}$ and $f_{ijk}$ ($1\le i,j,k\le n$) and $u$ and $f$ can be regarded as tensors from $\mathbb{K}^n\otimes\mathbb{K}^n\otimes\mathbb{K}^n$.
However, the tensorisation may also be rather artificial. Consider, e.g., any vector from $\mathbb{K}^{I}$ with $I:=\{0,\ldots,n-1\}$ and assume that $n$ is not a prime so that a factorisation $n=n_1n_2$ ($n_1,n_2\ge2$) exists. Choose index sets $J_1:=\{0,\ldots,n_1-1\}$ and $J_2:=\{0,\ldots,n_2-1\}$ and set $J:=J_1\times J_2$. Since $\#J=\#I$, there is a bijection $\alpha:J\to I$, leading to an isomorphism
\[ x\in\mathbb{K}^{I}\ \longleftrightarrow\ \mathbf{v}\in\mathbb{K}^{J}=\mathbb{K}^{J_1}\otimes\mathbb{K}^{J_2}\quad\text{with } \mathbf{v}[j_1,j_2]=x[\alpha(j_1,j_2)] \text{ and } \alpha(j_1,j_2)=j_2n_1+j_1. \tag{5.13} \]
In the latter case, $\mathbf{v}\in\mathbb{K}^{J}$ is an $n_1\times n_2$ matrix or a tensor of order two. Obviously, tensors of higher order can be obtained by exploiting a factorisation $n=n_1n_2\cdot\ldots\cdot n_d$ (assuming $n_j\ge2$ to avoid trivial cases):
\[ \mathbb{K}^{I}\cong\bigotimes_{j=1}^{d}\mathbb{K}^{J_j}\qquad\text{for } \#I=\prod_{j=1}^{d}\#J_j. \]
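The index map in (5.13) can be made concrete in a few lines. The sketch below uses only the standard library and assumes the column-wise map $\alpha(j_1,j_2)=j_2n_1+j_1$ (stride $n_1$), which is the choice that makes $\alpha$ a bijection onto $\{0,\ldots,n-1\}$; the sizes and the test vector are arbitrary.

```python
n1, n2 = 3, 4
n = n1 * n2
x = [10 * nu for nu in range(n)]              # any vector indexed by I = {0,...,n-1}

def alpha(j1, j2):
    """Bijection J1 x J2 -> I with stride n1 for the second index."""
    return j2 * n1 + j1

# Tensorised vector: v[j1][j2] = x[alpha(j1, j2)], an n1 x n2 array.
v = [[x[alpha(j1, j2)] for j2 in range(n2)] for j1 in range(n1)]

image = sorted(alpha(j1, j2) for j1 in range(n1) for j2 in range(n2))
print("alpha is a bijection onto I:", image == list(range(n)))
print(v)
```

In array libraries this corresponds to a column-major (Fortran-order) reshape of the vector into an $n_1\times n_2$ array.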
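An extreme case is $\mathbb{K}^{I}$ with the dimension $\#I=2^d$:
\[ \mathbb{K}^{2^d}\cong\bigotimes_{j=1}^{d}\mathbb{K}^2. \]
The advantage of the representation as tensor is the fact that vectors (of length $n$) rewritten as tensors may require less storage. In the following example, the corresponding tensors are elementary tensors (cf. Khoromskij [185]). Consider
\[ x\in\mathbb{K}^{\{0,\ldots,n-1\}}\quad\text{with } x_\nu=\zeta^\nu \text{ for } 0\le\nu\le n-1, \tag{5.14} \]
where $\zeta\in\mathbb{K}$ is arbitrary. Such vectors appear in approximations, as well as in Fourier representations. The isomorphism (5.13) based on $n=n_1n_2$ yields the matrix
\[
M=\begin{pmatrix}
\zeta^0&\zeta^{n_1}&\cdots&\zeta^{(n_2-1)n_1}\\
\zeta^1&\zeta^{n_1+1}&\cdots&\zeta^{(n_2-1)n_1+1}\\
\vdots&\vdots&&\vdots\\
\zeta^{n_1-1}&\zeta^{2n_1-1}&\cdots&\zeta^{n_2n_1-1}
\end{pmatrix}
=\begin{pmatrix}\zeta^0\\\zeta^1\\\vdots\\\zeta^{n_1-1}\end{pmatrix}
\begin{pmatrix}\zeta^0&\zeta^{n_1}&\cdots&\zeta^{(n_2-1)n_1}\end{pmatrix},
\]
which corresponds to the tensor product
\[
x\in\mathbb{K}^{\{0,\ldots,n-1\}}\ \longleftrightarrow\
\begin{pmatrix}\zeta^0\\\zeta^1\\\vdots\\\zeta^{n_1-1}\end{pmatrix}\otimes
\begin{pmatrix}\zeta^0\\\zeta^{n_1}\\\vdots\\\zeta^{(n_2-1)n_1}\end{pmatrix}
\in\mathbb{K}^{\{0,\ldots,n_1-1\}}\otimes\mathbb{K}^{\{0,\ldots,n_2-1\}}.
\]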
The decomposition can be repeated for the first vector, provided that $n_1=n'n''$ ($n',n''\ge2$), and yields (after renaming $n',n'',n_2$ by $n_1,n_2,n_3$)
\[
x\in\mathbb{K}^{\{0,\ldots,n-1\}}\ \longleftrightarrow\
\begin{pmatrix}\zeta^0\\\zeta^1\\\vdots\\\zeta^{n_1-1}\end{pmatrix}\otimes
\begin{pmatrix}\zeta^0\\\zeta^{n_1}\\\vdots\\\zeta^{(n_2-1)n_1}\end{pmatrix}\otimes
\begin{pmatrix}\zeta^0\\\zeta^{n_1n_2}\\\vdots\\\zeta^{(n_3-1)n_1n_2}\end{pmatrix}.
\]
By induction, this proves the next statement.
Remark 5.19. Let $n=n_1n_2\cdots n_d$ with $n_j\ge2$. The vector $x$ in (5.14) corresponds to the elementary tensor $v^{(1)}\otimes\ldots\otimes v^{(d)}$ with $v^{(j)}\in\mathbb{K}^{\{0,\ldots,n_j-1\}}$ defined by
\[ v^{(j)}=\begin{pmatrix}\zeta^0\\\zeta^{p_j}\\\vdots\\\zeta^{(n_j-1)p_j}\end{pmatrix}\qquad\text{with } p_j:=\prod_{k=1}^{j-1}n_k. \]
In the case of $n=2^d$, i.e., $n_j=2$, $p_j=2^{j-1}$, and $v^{(j)}=\binom{1}{\zeta^{2^{j-1}}}$, the data size is reduced from $n$ to $2d=2\log_2n$.
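Remark 5.19 can be verified entry by entry for a small factorisation. The following standard-library sketch assumes $n=2\cdot3\cdot2$ and a real $\zeta=0.9$ (both arbitrary choices) and uses the index map $\nu=\sum_j i_jp_j$ that extends (5.13) to $d$ factors.

```python
from itertools import product
from math import prod

dims = (2, 3, 2)                       # n_1, n_2, n_3
zeta = 0.9
n = prod(dims)
x = [zeta ** nu for nu in range(n)]    # x_nu = zeta^nu as in (5.14)

# p_j = n_1 * ... * n_{j-1} (empty product = 1 for the first factor)
p = [prod(dims[:j]) for j in range(len(dims))]
# v^(j) = (zeta^0, zeta^{p_j}, ..., zeta^{(n_j - 1) p_j})
factors = [[zeta ** (i * p[j]) for i in range(dims[j])] for j in range(len(dims))]

max_err = 0.0
for idx in product(*(range(m) for m in dims)):
    nu = sum(i * pj for i, pj in zip(idx, p))            # position in the vector x
    entry = prod(f[i] for f, i in zip(factors, idx))     # entry of the elementary tensor
    max_err = max(max_err, abs(entry - x[nu]))

storage = sum(dims)                    # numbers stored by the elementary tensor
print("n =", n, " storage of factors =", storage, " max deviation =", max_err)
```

The elementary tensor stores $n_1+\ldots+n_d$ numbers instead of $n$; for $n_j=2$ this is the reduction from $n$ to $2\log_2n$ mentioned above.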
Interestingly, this representation does not only save storage but also provides a more stable representation. As an example consider the integral involving the oscillatory function $f(t)=\exp(-\alpha t)$ for $\alpha=2\pi\mathrm{i}k+\beta$, $k\in\mathbb{N}$, $\beta>0$, and $g(t)=1$. For large $k$, the value of the integral
\[ \int_0^1f(t)g(t)\,\mathrm{d}t=\frac{1-\exp(-\beta)}{\beta+2\pi\mathrm{i}k} \]
is small compared with $\int_0^1|f(t)g(t)|\,\mathrm{d}t=(1-\exp(-\beta))/\beta$. For usual numerical integration we have to expect a cancellation error of the size machine precision multiplied by the amplification factor
\[ \kappa:=\int_0^1|f(t)g(t)|\,\mathrm{d}t\ \Big/\ \Bigl|\int_0^1f(t)g(t)\,\mathrm{d}t\Bigr|=\sqrt{1+(2\pi k/\beta)^2}\sim\frac{2\pi k}{\beta}, \]
which becomes large for large $k$ and small $\beta$. If we approximate the integral by⁸
\[ S:=\frac1n\sum_{\nu=0}^{n-1}f\Bigl(\frac{\nu}{n}\Bigr)g\Bigl(\frac{\nu}{n}\Bigr)\qquad\text{for } n=2^d, \]
the floating point errors are amplified by $\kappa$ from above. Using the tensorised grid functions $\mathbf{f}=\bigotimes_{j=1}^{d}f^{(j)}$ and $\mathbf{g}=\bigotimes_{j=1}^{d}g^{(j)}$ with
\[ f^{(j)}=\begin{pmatrix}1\\\exp\{-(2\pi\mathrm{i}k+\beta)2^{j-1-d}\}\end{pmatrix}\quad\text{and}\quad g^{(j)}=\begin{pmatrix}1\\1\end{pmatrix} \]
according to Remark 5.19, we rewrite⁹ the sum (scalar product) $\frac1n\langle\mathbf{f},\mathbf{g}\rangle$ as $\frac1n\prod_{j=1}^{d}\langle f^{(j)},g^{(j)}\rangle$:
\[ S=\frac1n\prod_{j=1}^{d}\Bigl(1+\exp\bigl(-(2\pi\mathrm{i}k+\beta)2^{j-1-d}\bigr)\Bigr). \]
In this case the amplification factor for the floating point errors is $O(d+1)$ and does not deteriorate as $k\to\infty$ and $\beta\to0$.
More details about tensorisation will follow in Chapter 14.
⁸ The sum $S$ is slightly different from the trapezoidal rule because of the quadrature weights for $\nu=0$ and $n$. For $\beta=1/10$, $k=1000$, and $d=20$ (i.e., $n=2^{20}=1\,048\,576$), the value of $S$ is $4.562\cdot10^{-8}-1.515\cdot10^{-5}\,\mathrm{i}$ (exact integral value: $2.410\cdot10^{-10}-1.515\cdot10^{-5}\,\mathrm{i}$).
⁹ Note that $\prod_{j=1}^{d}\bigl(1+x^{2^{j-1}}\bigr)=\sum_{\nu=0}^{2^d-1}x^\nu$.
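The product formula for $S$ is easy to evaluate. The following standard-library sketch uses the parameters $\beta=1/10$, $k=1000$, $d=20$ quoted in the footnote; the closed form of the geometric sum serves only as a cross-check and is an addition of this sketch, not part of the text.

```python
import cmath

beta, k, d = 0.1, 1000, 20
n = 2 ** d
alpha = 2j * cmath.pi * k + beta                  # alpha = 2*pi*i*k + beta

# Tensorised form: S = (1/n) * prod_j <f^(j), g^(j)> with g^(j) = (1, 1)^T.
S_prod = 1.0 / n
for j in range(1, d + 1):
    S_prod *= 1.0 + cmath.exp(-alpha * 2.0 ** (j - 1 - d))

# Cross-check: S = (1/n) * sum_nu zeta^nu = (1 - zeta^n) / (n * (1 - zeta)),
# where zeta = exp(-alpha/n) and zeta^n = exp(-beta) since exp(-2*pi*i*k) = 1.
zeta = cmath.exp(-alpha / n)
S_closed = (1.0 - cmath.exp(-beta)) / (n * (1.0 - zeta))

exact = (1.0 - cmath.exp(-beta)) / alpha          # value of the integral
print("S (product form):", S_prod)
print("S (closed form) :", S_closed)
print("exact integral  :", exact)
```

Only $d=20$ complex exponentials are evaluated here, compared with $n=2^{20}$ terms in the plain sum.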
Chapter 6
Minimal Subspaces
Abstract The notion of minimal subspaces is closely connected with the representations of tensors, provided these representations can be characterised by (dimensions of) subspaces. A separate description of the theory of minimal subspaces can be found in Falcó–Hackbusch [96] (see also [97]). The tensor representations discussed in the later Chapters 8, 11, 12 will lead to subsets $\mathcal{T}_r$, $\mathcal{H}_r$, $\mathcal{T}_\rho$ of a tensor space. The results of this chapter will prove weak closedness of these sets. Another result concerns the question of a best approximation: is the infimum also a minimum? In the positive case, it is guaranteed that the best approximation can be found in the same set.
For tensors $\mathbf{v}\in{}_a\bigotimes_{j=1}^{d}V_j$ we shall define ‘minimal subspaces’ $U_j^{\min}(\mathbf{v})\subset V_j$ in Sects. 6.1–6.4. In Section 6.5 we consider weakly convergent sequences $\mathbf{v}_n\rightharpoonup\mathbf{v}$ and analyse the connection between $U_j^{\min}(\mathbf{v}_n)$ and $U_j^{\min}(\mathbf{v})$. The main result will be presented in Theorem 6.29. While Sections 6.1–6.5 discuss minimal subspaces of algebraic tensors $\mathbf{v}\in{}_a\bigotimes_{j=1}^{d}V_j$, Section 6.6 investigates $U_j^{\min}(\mathbf{v})$ for topological tensors $\mathbf{v}\in{}_{\|\cdot\|}\bigotimes_{j=1}^{d}V_j$. The final Section 6.7 is concerned with intersection spaces.
6.1 Statement of the Problem, Notations
Consider an algebraic tensor space $\mathbf{V}={}_a\bigotimes_{j=1}^{d}V_j$ and a fixed tensor $\mathbf{v}\in\mathbf{V}$. Among the subspaces $U_j\subset V_j$ with the property
\[ \mathbf{v}\in\mathbf{U}:={}_a\bigotimes_{j=1}^{d}U_j \tag{6.1} \]
we are looking for the smallest ones. We must show that minimal subspaces $U_j$ exist and that these minimal subspaces can be obtained simultaneously in (6.1) for all $1\le j\le d$. Since it will turn out that the minimal subspaces are uniquely determined by $\mathbf{v}$, we denote them by $U_j^{\min}(\mathbf{v})\subset V_j$.
© Springer Nature Switzerland AG 2019 W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics 56, https://doi.org/10.1007/978-3-030-35554-8_6
The determination of $U_j^{\min}(\mathbf{v})$ will be given in (6.5) and (6.8). We shall characterise the features of $U_j^{\min}(\mathbf{v})$, e.g., the dimension $r_j:=\dim(U_j^{\min}(\mathbf{v}))$. Furthermore, the properties of $U_j^{\min}(\mathbf{v})$ for varying $\mathbf{v}$ are of interest. In particular, we consider $U_j^{\min}(\mathbf{v}_n)$ and its dimension for a sequence $\mathbf{v}_n\rightharpoonup\mathbf{v}$.
First, in §6.2, we explore the case $d=2$. In §6.6 we replace the algebraic tensor space with a Banach tensor space.
An obvious advantage of (6.1) is the fact that the subspaces $U_j$ are of finite dimension even when $\dim(V_j)=\infty$ as stated next.
Lemma 6.1. For $\mathbf{v}\in{}_a\bigotimes_{j=1}^{d}V_j$ there are always finite-dimensional subspaces $U_j\subset V_j$ satisfying (6.1). More precisely, $\dim(U_j)\le\operatorname{rank}(\mathbf{v})$ can be achieved.
Proof. By the definition of the algebraic tensor space, $\mathbf{v}\in{}_a\bigotimes_{j=1}^{d}V_j$ means that there is a finite linear combination
\[ \mathbf{v}=\sum_{\nu=1}^{n}\bigotimes_{j=1}^{d}v_\nu^{(j)} \tag{6.2a} \]
with some integer $n\in\mathbb{N}_0$ and certain vectors $v_\nu^{(j)}\in V_j$. Define
\[ U_j:=\operatorname{span}\{v_\nu^{(j)}:1\le\nu\le n\}\qquad\text{for } 1\le j\le d. \tag{6.2b} \]
Then $\mathbf{v}\in\mathbf{U}:={}_a\bigotimes_{j=1}^{d}U_j$ proves (6.1) with subspaces of dimension $\dim(U_j)\le n$. By the definition of the tensor rank, the smallest $n$ in (6.2a) is $n:=\operatorname{rank}(\mathbf{v})$. □
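6.2 Tensors of Order Two
6.2.1 Existence of Minimal Subspaces
First, we consider the matrix case $d=2$ and admit any field $\mathbb{K}$. To ensure the existence of minimal subspaces, we need the lattice property $(X_1\otimes_aX_2)\cap(Y_1\otimes_aY_2)=(X_1\cap Y_1)\otimes_a(X_2\cap Y_2)$, which is formulated in the next lemma more generally.
Lemma 6.2. Let $A$ be an index set of possibly infinite cardinality. Then
\[ \bigcap_{\alpha\in A}\bigl(U_{1,\alpha}\otimes_aU_{2,\alpha}\bigr)=\Bigl(\bigcap_{\alpha\in A}U_{1,\alpha}\Bigr)\otimes_a\Bigl(\bigcap_{\alpha\in A}U_{2,\alpha}\Bigr). \tag{6.3} \]
Proof. The inclusion $\bigl(\bigcap_{\alpha\in A}U_{1,\alpha}\bigr)\otimes_a\bigl(\bigcap_{\alpha\in A}U_{2,\alpha}\bigr)\subset\bigcap_{\alpha\in A}(U_{1,\alpha}\otimes_aU_{2,\alpha})$ is obvious. It remains to show that $\mathbf{v}\in U_{1,\beta}\otimes_aU_{2,\beta}$ for all $\beta\in A$ implies that $\mathbf{v}\in\bigl(\bigcap_{\alpha\in A}U_{1,\alpha}\bigr)\otimes_a\bigl(\bigcap_{\alpha\in A}U_{2,\alpha}\bigr)$. Choose some $\beta\in A$ and let $\gamma\in A$ be arbitrary. By assumption, $\mathbf{v}$ has representations
\[ \mathbf{v}=\sum_{\nu=1}^{n_\beta}u_{\nu,\beta}^{(1)}\otimes u_{\nu,\beta}^{(2)}=\sum_{\nu=1}^{n_\gamma}u_{\nu,\gamma}^{(1)}\otimes u_{\nu,\gamma}^{(2)}\qquad\text{with } u_{\nu,\beta}^{(j)}\in U_{j,\beta},\ u_{\nu,\gamma}^{(j)}\in U_{j,\gamma}. \]
Thanks to Lemma 3.15, we may assume that the systems $\{u_{\nu,\beta}^{(1)}\}$ and $\{u_{\nu,\beta}^{(2)}\}$ are linearly independent. Choose a dual system $\varphi_\mu\in V_2'$ of $\{u_{\nu,\beta}^{(2)}\}$. By Definition 3.6 it satisfies $\varphi_\mu(u_{\nu,\beta}^{(2)})=\delta_{\nu\mu}$. Application of $\mathrm{id}\otimes\varphi_\mu$ to the first representation yields $(\mathrm{id}\otimes\varphi_\mu)(\mathbf{v})=u_{\mu,\beta}^{(1)}$, while the second representation leads to $\sum_{\nu=1}^{n_\gamma}\varphi_\mu(u_{\nu,\gamma}^{(2)})\,u_{\nu,\gamma}^{(1)}$. The resulting equation $u_{\mu,\beta}^{(1)}=\sum_{\nu=1}^{n_\gamma}\varphi_\mu(u_{\nu,\gamma}^{(2)})\,u_{\nu,\gamma}^{(1)}$ shows that $u_{\mu,\beta}^{(1)}$ is a linear combination of vectors $u_{\nu,\gamma}^{(1)}\in U_{1,\gamma}$, i.e., $u_{\mu,\beta}^{(1)}\in U_{1,\gamma}$. Since $\gamma\in A$ is arbitrary, $u_{\mu,\beta}^{(1)}\in\bigcap_{\alpha\in A}U_{1,\alpha}$ follows.
Analogously, using the dual system of $\{u_{\nu,\beta}^{(1)}\}$, we prove $u_{\nu,\beta}^{(2)}\in\bigcap_{\alpha\in A}U_{2,\alpha}$. Hence $\mathbf{v}\in\bigl(\bigcap_{\alpha\in A}U_{1,\alpha}\bigr)\otimes_a\bigl(\bigcap_{\alpha\in A}U_{2,\alpha}\bigr)$. □
Definition 6.3. For an algebraic tensor $\mathbf{v}\in V_1\otimes_aV_2$, subspaces $U_1^{\min}(\mathbf{v})\subset V_1$ and $U_2^{\min}(\mathbf{v})\subset V_2$ are called minimal subspaces if they satisfy
\[ \mathbf{v}\in U_1^{\min}(\mathbf{v})\otimes_aU_2^{\min}(\mathbf{v}), \tag{6.4a} \]
\[ \mathbf{v}\in U_1\otimes_aU_2\ \Rightarrow\ U_1^{\min}(\mathbf{v})\subset U_1 \text{ and } U_2^{\min}(\mathbf{v})\subset U_2. \tag{6.4b} \]
Proposition 6.4. All $\mathbf{v}\in V_1\otimes_aV_2$ possess unique minimal subspaces $U_j^{\min}(\mathbf{v})$ for $j=1,2$.
Proof. To prove existence and uniqueness of minimal subspaces, define the set
\[ \mathcal{F}:=\mathcal{F}(\mathbf{v}):=\{(U_1,U_2):\ \mathbf{v}\in U_1\otimes_aU_2 \text{ for subspaces } U_j\subset V_j\}. \]
$\mathcal{F}$ is nonempty since $(V_1,V_2)\in\mathcal{F}$. Then $U_j^{\min}(\mathbf{v}):=\bigcap_{(U_1,U_2)\in\mathcal{F}}U_j$ holds for $j=1,2$. In fact, by Lemma 6.2, $\mathbf{v}\in U_1^{\min}(\mathbf{v})\otimes_aU_2^{\min}(\mathbf{v})$ holds and proves (6.4a), while (6.4b) is a consequence of the construction by $\bigcap_{(U_1,U_2)\in\mathcal{F}}U_j$. □
Lemma 6.5. Assume (3.14); i.e., $\mathbf{v}=\sum_{\nu=1}^{r}u_\nu^{(1)}\otimes u_\nu^{(2)}$ holds with linearly independent vectors $\{u_\nu^{(j)}:1\le\nu\le r\}$ for $j=1,2$. Then these vectors span the minimal subspace:
\[ U_j^{\min}(\mathbf{v})=\operatorname{span}\{u_\nu^{(j)}:1\le\nu\le r\}\qquad\text{for } j=1,2. \]
Proof. Apply the proof of Lemma 6.2 to the set $A:=\{\beta,\gamma\}$ and the subspaces $U_{j,\beta}:=\operatorname{span}\{u_\nu^{(j)}:1\le\nu\le r\}$, $U_{j,\gamma}:=U_j^{\min}(\mathbf{v})$. It shows that $U_{j,\beta}\subset U_j^{\min}(\mathbf{v})$. Since a strict inclusion is excluded, $U_{j,\beta}=U_j^{\min}(\mathbf{v})$ proves the assertion. □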
As a consequence of Lemma 6.5, $\dim(U_j^{\min}(\mathbf{v}))=r$ holds for $j=1$ and $j=2$, proving the following result.
Corollary 6.6. $U_1^{\min}(\mathbf{v})$ and $U_2^{\min}(\mathbf{v})$ have identical finite dimensions.
A constructive algorithm for the determination of $U_j^{\min}(\mathbf{v})$ has already been given in the proof of Lemma 3.15: as long as the vectors $v_\nu$ or $w_\nu$ in $\mathbf{v}=\sum_{\nu=1}^{n}v_\nu\otimes w_\nu$ are linearly dependent, we can reduce the number of terms by one. This process must terminate after at most $n$ steps. By Lemma 6.5, the resulting vectors $\{v_\nu\}$ and $\{w_\nu\}$ span $U_1^{\min}(\mathbf{v})$ and $U_2^{\min}(\mathbf{v})$.
In the proof of Lemma 6.5 we already made indirect use of the following characterisation of $U_j^{\min}(\mathbf{v})$. The tensor $\mathrm{id}\otimes\varphi_2\in L(V_1,V_1)\otimes V_2'$ can be considered as a mapping belonging to $L(V_1\otimes_aV_2,V_1)$ (cf. §3.3.2.4):
\[ v\otimes w\in V_1\otimes_aV_2\ \mapsto\ (\mathrm{id}\otimes\varphi_2)(v\otimes w):=\varphi_2(w)\cdot v\in V_1. \]
The action of $\varphi_1\otimes\mathrm{id}\in V_1'\otimes L(V_2,V_2)\subset L(V_1\otimes_aV_2,V_2)$ is analogous.
Proposition 6.7. For $\mathbf{v}\in V_1\otimes_aV_2$ the minimal subspaces are characterised by
\[ U_1^{\min}(\mathbf{v})=\{(\mathrm{id}\otimes\varphi_2)(\mathbf{v}):\ \varphi_2\in V_2'\}, \tag{6.5a} \]
\[ U_2^{\min}(\mathbf{v})=\{(\varphi_1\otimes\mathrm{id})(\mathbf{v}):\ \varphi_1\in V_1'\}. \tag{6.5b} \]
Proof. Repeat the proof of Lemma 6.2: there are maps $\mathrm{id}\otimes\varphi_2$ yielding $u_\nu^{(1)}$. By Lemma 6.5, the vectors $u_\nu^{(1)}$ span $U_1^{\min}(\mathbf{v})$. Similarly for (6.5b). Note that the right-hand sides in (6.5a,b) are linear subspaces. □
For $V_1=\mathbb{K}^{n_1}$ and $V_2=\mathbb{K}^{n_2}$, tensors from $V_1\otimes V_2$ are isomorphic to matrices from $\mathbb{K}^{n_1\times n_2}$. Then definition (6.5a) may be interpreted as $U_1^{\min}(\mathbf{v})=\operatorname{range}\{M\}=\{Mx:x\in V_2\}$, where $M=\mathcal{M}_1(\mathbf{v})$ is the matrix corresponding to $\mathbf{v}$. Similarly, (6.5b) becomes $U_2^{\min}(\mathbf{v})=\operatorname{range}\{M^{T}\}$ since $M^{T}=\mathcal{M}_2(\mathbf{v})$.
Corollary 6.8. (a) Once $U_1^{\min}(\mathbf{v})$ and $U_2^{\min}(\mathbf{v})$ are given, we may select any basis $\{u_\nu^{(2)}:1\le\nu\le r\}$ of $U_2^{\min}(\mathbf{v})$ and find a representation $\mathbf{v}=\sum_{\nu=1}^{r}u_\nu^{(1)}\otimes u_\nu^{(2)}$ (cf. (3.14)) with the given vectors $\{u_\nu^{(2)}\}$ and some basis $\{u_\nu^{(1)}\}$ of $U_1^{\min}(\mathbf{v})$. Conversely, we may select a basis $\{u_\nu^{(1)}:1\le\nu\le r\}$ of $U_1^{\min}(\mathbf{v})$, and obtain $\mathbf{v}=\sum_{\nu=1}^{r}u_\nu^{(1)}\otimes u_\nu^{(2)}$ with the given $u_\nu^{(1)}$ and some basis $\{u_\nu^{(2)}\}$ of $U_2^{\min}(\mathbf{v})$.
(b) If $\{u_\nu^{(2)}:1\le\nu\le s\}$ is a basis of a larger subspace $U_2\supsetneqq U_2^{\min}(\mathbf{v})$, a representation $\mathbf{v}=\sum_{\nu=1}^{s}u_\nu^{(1)}\otimes u_\nu^{(2)}$ still exists, but the vectors $u_\nu^{(1)}$ are linearly dependent.
(c) If we fix a basis $\{u_\nu^{(2)}:1\le\nu\le r\}$ of some subspace $U_2\subset V_2$, there are mappings $\{\psi_\nu:1\le\nu\le r\}\subset L(V_1\otimes_aU_2,V_1)$ such that $\psi_\nu(\mathbf{v})\in U_1^{\min}(\mathbf{v})$ and
\[ \mathbf{v}=\sum_{\nu=1}^{r}\psi_\nu(\mathbf{v})\otimes u_\nu^{(2)}\qquad\text{for all } \mathbf{v}\in V_1\otimes U_2. \]
Here the functionals $\psi_\nu\in V_2'$ form a dual system of $\{u_\nu^{(2)}\}$ and are interpreted as mappings of $L(V_1\otimes_aU_2,V_1)$.
Proof. (i) Part (c) is a repetition of Remark 3.65, where $\psi_\nu(\mathbf{v})=\mathbf{v}^{\langle\nu\rangle}$. By definition of $U_1^{\min}(\mathbf{v})$, $\mathbf{v}^{\langle\nu\rangle}\in U_1^{\min}(\mathbf{v})$ holds.
(ii) In the case of Part (a) $\dim U_2^{\min}(\mathbf{v})=r$ holds and implies $\dim U_1^{\min}(\mathbf{v})=r$ (cf. Corollary 6.6). Set $U_1:=\operatorname{span}_{1\le\nu\le r}\{\mathbf{v}^{\langle\nu\rangle}\}$. Since $\mathbf{v}\in U_1\otimes_aU_2^{\min}(\mathbf{v})$, (6.4b) implies $U_1\supset U_1^{\min}(\mathbf{v})$. Thus $\dim(U_1)=r$ follows and $\{\mathbf{v}^{\langle\nu\rangle}\}$ must be a basis.
(iii) If $\dim(U_2)=s>\dim(U_2^{\min}(\mathbf{v}))=\dim(U_1^{\min}(\mathbf{v}))$, the $s$ vectors $\mathbf{v}^{\langle\nu\rangle}$ in $U_1^{\min}(\mathbf{v})$ must be linearly dependent. □
6.2.2 Use of the Singular-Value Decomposition

If $\mathbf{v} \in U_1 \otimes_a U_2$ holds for subspaces $U_j \subset V_j$ of not too large dimension, the singular-value decomposition offers a practical construction of $U_1^{\min}(\mathbf{v})$ and $U_2^{\min}(\mathbf{v})$. Although the singular-value decomposition produces orthonormal bases, no Hilbert structure of $V_1$ and $V_2$ is required. The approach is restricted to the fields $\mathbb{R}$ and $\mathbb{C}$.

Remark 6.9. Let $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$. Suppose that a representation of $\mathbf{v} \in U_1 \otimes_a U_2$ is given by $\mathbf{v} = \sum_{\nu=1}^{n} v_\nu^{(1)} \otimes v_\nu^{(2)}$ with $v_\nu^{(j)} \in U_j$ and $\dim(U_j) < \infty$.

(a) Choose bases $\{u_i^{(j)} : 1 \le i \le n_j\}$ of $U_j$ ($j = 1, 2$) and determine the coefficients of $v_\nu^{(j)}$:

$$v_\nu^{(j)} = \sum_{i=1}^{n_j} c_{\nu i}^{(j)} u_i^{(j)} \qquad (j = 1, 2).$$

Hence $\mathbf{v} = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} M_{ij}\, u_i^{(1)} \otimes u_j^{(2)}$ has the coefficients $M_{ij} := \sum_{\nu=1}^{n} c_{\nu i}^{(1)} c_{\nu j}^{(2)}$.

(b) Determine the reduced singular-value decomposition of the matrix $M \in \mathbb{K}^{n_1 \times n_2}$ by calling the procedure RSVD$(n_1, n_2, r, M, U, \Sigma, V)$, i.e.,

$$M = U \Sigma V^{\mathsf T} = \sum_{\nu=1}^{r} \sigma_\nu a_\nu b_\nu^{\mathsf T}.$$
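(c) Define $\{\hat a_\nu : 1 \le \nu \le r\} \subset U_1$ and $\{\hat b_\nu : 1 \le \nu \le r\} \subset U_2$ by¹

$$\hat a_\nu := \sum_{i=1}^{n_1} \sigma_\nu\, a_\nu[i]\, u_i^{(1)} \quad \text{and} \quad \hat b_\nu := \sum_{j=1}^{n_2} b_\nu[j]\, u_j^{(2)} \qquad \text{for } 1 \le \nu \le r.$$

They span the minimal subspaces $U_1^{\min}(\mathbf{v}) := \mathrm{span}\{\hat a_\nu : 1 \le \nu \le r\} \subset U_1$ and $U_2^{\min}(\mathbf{v}) := \mathrm{span}\{\hat b_\nu : 1 \le \nu \le r\} \subset U_2$, and $\mathbf{v} = \sum_{\nu=1}^{r} \hat a_\nu \otimes \hat b_\nu$ holds.

(d) $\dim(U_1^{\min}(\mathbf{v})) = \dim(U_2^{\min}(\mathbf{v})) = \mathrm{rank}(M) = r$.

Proof. Singular-value decomposition yields

$$\mathbf{v} = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} M_{ij}\, u_i^{(1)} \otimes u_j^{(2)} = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \Big( \sum_{\nu=1}^{r} \sigma_\nu a_\nu[i]\, b_\nu[j] \Big) u_i^{(1)} \otimes u_j^{(2)} = \sum_{\nu=1}^{r} \Big( \sum_{i=1}^{n_1} \sigma_\nu a_\nu[i]\, u_i^{(1)} \Big) \otimes \Big( \sum_{j=1}^{n_2} b_\nu[j]\, u_j^{(2)} \Big) = \sum_{\nu=1}^{r} \hat a_\nu \otimes \hat b_\nu .$$

Since the vectors $a_\nu$ are linearly independent, the $\hat a_\nu$ are also linearly independent. Similarly, $\{\hat b_\nu\}$ forms a basis. □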
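The steps (a) to (c) of Remark 6.9 can be sketched in NumPy for the real case $V_1 = \mathbb{R}^6$, $V_2 = \mathbb{R}^5$. This is only an illustration under stated assumptions: `np.linalg.svd` stands in for the procedure RSVD, the function name `minimal_subspaces_2`, the test data and the relative tolerance used for the numerical rank are all hypothetical choices, not part of the text.

```python
import numpy as np

def minimal_subspaces_2(C1, C2, B1, B2, rel_tol=1e-12):
    """Sketch of Remark 6.9 for real data (illustrative names).

    C1[nu, i], C2[nu, j]: coefficients of v_nu^(1), v_nu^(2) with respect to
    the bases stored as columns of B1 and B2.
    """
    M = C1.T @ C2                                      # (a) M_ij = sum_nu c1[nu,i] * c2[nu,j]
    U, s, Vt = np.linalg.svd(M, full_matrices=False)   # (b) M = U diag(s) Vt
    r = int(np.sum(s > rel_tol * s.max()))             # numerical rank (assumed tolerance)
    A_hat = B1 @ (U[:, :r] * s[:r])                    # (c) columns are the vectors a_hat_nu
    B_hat = B2 @ Vt[:r].T                              #     columns are the vectors b_hat_nu
    return A_hat, B_hat, r

rng = np.random.default_rng(0)
B1 = rng.standard_normal((6, 4))     # basis of U_1 (n_1 = 4)
B2 = rng.standard_normal((5, 3))     # basis of U_2 (n_2 = 3)
# n = 5 terms whose first factors span only a 2-dimensional space
C1 = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
C2 = rng.standard_normal((5, 3))

A_hat, B_hat, r = minimal_subspaces_2(C1, C2, B1, B2)
# v = sum_nu v_nu^(1) (v_nu^(2))^T, written as a 6 x 5 matrix
v = (C1 @ B1.T).T @ (C2 @ B2.T)
```

Here `A_hat @ B_hat.T` reproduces `v` up to rounding, and the number of columns of `A_hat` and `B_hat` is the common dimension from part (d).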
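6.2.3 Minimal Subspaces for a Family of Tensors

The minimal subspaces $U_j^{\min}(\mathbf{v})$ belong to a single tensor $\mathbf{v} \in V_1 \otimes_a V_2$. Now we replace the tensor $\mathbf{v}$ with a subset $F \subset V_1 \otimes_a V_2$ and ask for minimal subspaces $U_1^{\min}(F)$ and $U_2^{\min}(F)$ so that $\mathbf{v} \in U_1^{\min}(F) \otimes_a U_2^{\min}(F)$ holds for all $\mathbf{v} \in F$. The obvious result is summarised below.

Proposition 6.10. Let $F \subset V_1 \otimes_a V_2$ be a nonempty subset. Then the minimal subspaces $U_1^{\min}(F)$ and $U_2^{\min}(F)$ are²

$$U_1^{\min}(F) := \sum_{\mathbf{v} \in F} U_1^{\min}(\mathbf{v}) \quad \text{and} \quad U_2^{\min}(F) := \sum_{\mathbf{v} \in F} U_2^{\min}(\mathbf{v}). \tag{6.6a}$$

Another characterisation is

$$U_1^{\min}(F) = \mathrm{span}\{(\mathrm{id} \otimes \varphi_2)(\mathbf{v}) : \varphi_2 \in V_2',\ \mathbf{v} \in F\}, \qquad U_2^{\min}(F) = \mathrm{span}\{(\varphi_1 \otimes \mathrm{id})(\mathbf{v}) : \varphi_1 \in V_1',\ \mathbf{v} \in F\}. \tag{6.6b}$$

Proof. $\mathbf{v} \in F$ and $\mathbf{v} \in U_1^{\min}(F) \otimes_a U_2^{\min}(F)$ require $U_j^{\min}(\mathbf{v}) \subset U_j^{\min}(F)$ for $j = 1, 2$ and all $\mathbf{v} \in F$; hence, $\bigcup_{\mathbf{v} \in F} U_j^{\min}(\mathbf{v}) \subset U_j^{\min}(F)$. The smallest subspace containing $\bigcup_{\mathbf{v} \in F} U_j^{\min}(\mathbf{v})$ is the sum $\sum_{\mathbf{v} \in F} U_j^{\min}(\mathbf{v})$, implying (6.6a). Equations (6.5a,b) prove (6.6b). □

¹ $a_\nu[i]$ is the $i$-th entry of $a_\nu \in \mathbb{K}^{n_1}$, etc.
² The sum of subspaces in (6.6a) is defined by the span of their union.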
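For a finite family of matrices, (6.6a) can be illustrated numerically: the sum of the ranges of the matrices $M$ is the range of their horizontal concatenation, and likewise for $M^{\mathsf T}$. The following is a minimal sketch assuming real matrices of equal size; the helper names `orth_basis` and `family_minimal_subspaces` and the small family `F` are hypothetical.

```python
import numpy as np

def orth_basis(A, rel_tol=1e-10):
    """Orthonormal basis of range(A) via SVD (numerical rank w.r.t. rel_tol)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, : int(np.sum(s > rel_tol * s.max()))]

def family_minimal_subspaces(family):
    """family: list of n1 x n2 matrices, each representing one tensor v in F."""
    Q1 = orth_basis(np.hstack(family))                   # spans the sum of all range(M)
    Q2 = orth_basis(np.hstack([M.T for M in family]))    # spans the sum of all range(M^T)
    return Q1, Q2

x1, x2 = np.eye(4)[0], np.eye(4)[1]
y1, y2 = np.eye(3)[0], np.eye(3)[2]
F = [np.outer(x1, y1), np.outer(x2, y1), np.outer(x1 + x2, y2)]
Q1, Q2 = family_minimal_subspaces(F)
```

Each member of `F` has rank one, whereas the family as a whole needs two-dimensional subspaces on both sides; projecting any `M` in `F` with `Q1 @ Q1.T` from the left and `Q2 @ Q2.T` from the right leaves it unchanged.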
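6.3 Minimal Subspaces of Tensors of Higher Order

In this section we assume that $d \ge 3$ and generalise some of the features of tensors of second order. Given $\mathbf{v} \in {}_a\!\bigotimes_{j=1}^{d} V_j$ we now look for $U_j^{\min}(\mathbf{v})$ of minimal dimension with $\mathbf{v} \in \bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v})$, i.e., $\mathbf{v} \in \bigotimes_{j=1}^{d} U_j$ must imply $U_j \supset U_j^{\min}(\mathbf{v})$. By Lemma 6.1, we may assume $\mathbf{v} \in \mathbf{U} := \bigotimes_{j=1}^{d} U_j$ with finite-dimensional subspaces $U_j \subset V_j$. The lattice structure from Lemma 6.2 generalises to higher order.

Lemma 6.11. Let $X_j, Y_j \subset V_j$ be subspaces for $1 \le j \le d$. Then the identity

$$\Big( {}_a\!\bigotimes_{j=1}^{d} X_j \Big) \cap \Big( {}_a\!\bigotimes_{j=1}^{d} Y_j \Big) = {}_a\!\bigotimes_{j=1}^{d} (X_j \cap Y_j)$$

holds and can be generalised to infinitely many intersections.

Proof. For the start of the induction at $d = 2$ use Lemma 6.2. Assume that the assertion holds for $d - 1$ and use ${}_a\!\bigotimes_{j=1}^{d} X_j = X_1 \otimes X_{[1]}$ with $X_{[1]} := {}_a\!\bigotimes_{j=2}^{d} X_j$ and ${}_a\!\bigotimes_{j=1}^{d} Y_j = Y_1 \otimes Y_{[1]}$. Lemma 6.2 states that $\mathbf{v} \in (X_1 \cap Y_1) \otimes (X_{[1]} \cap Y_{[1]})$. By inductive hypothesis, $X_{[1]} \cap Y_{[1]} = {}_a\!\bigotimes_{j=2}^{d} (X_j \cap Y_j)$ holds, proving the assertion. □

Again, the minimal subspaces $U_j^{\min}(\mathbf{v})$ can be defined by the intersection of all subspaces $U_j \subset V_j$ satisfying $\mathbf{v} \in {}_a\!\bigotimes_{j=1}^{d} U_j$. The algebraic characterisation of $U_j^{\min}(\mathbf{v})$ is similar to the case $d = 2$. Here we write $\bigotimes_{k \ne j}$ instead of $\bigotimes_{k \in \{1,\ldots,d\} \setminus \{j\}}$. Note that the right-hand sides of (6.7a–d) involve the possibly different spaces ${}_a\!\bigotimes_{k \ne j} V_k'$, $({}_a\!\bigotimes_{k \ne j} V_k)'$, ${}_a\!\bigotimes_{k \ne j} V_k^*$, $(\bigotimes_{k \ne j} V_k)^*$. Nevertheless, the image spaces are identical.

Lemma 6.12. Let $\mathbf{v} \in \mathbf{V} = {}_a\!\bigotimes_{j=1}^{d} V_j$.

(a) The two spaces

$$U_j^{I}(\mathbf{v}) := \Big\{ \varphi(\mathbf{v}) : \varphi \in {}_a\!\bigotimes_{k \ne j} V_k' \Big\}, \tag{6.7a}$$

$$U_j^{II}(\mathbf{v}) := \Big\{ \varphi(\mathbf{v}) : \varphi \in \Big( {}_a\!\bigotimes_{k \ne j} V_k \Big)' \Big\} \tag{6.7b}$$

coincide: $U_j^{I}(\mathbf{v}) = U_j^{II}(\mathbf{v})$.

(b) If $V_j$ are normed spaces, we may replace algebraic functionals with continuous functionals:

$$U_j^{III}(\mathbf{v}) := \Big\{ \varphi(\mathbf{v}) : \varphi \in {}_a\!\bigotimes_{k \ne j} V_k^* \Big\}. \tag{6.7c}$$

Then $U_j^{I}(\mathbf{v}) = U_j^{II}(\mathbf{v}) = U_j^{III}(\mathbf{v})$ is valid.

(c) If ${}_a\!\bigotimes_{k \ne j} V_k$ is a normed space, we may define

$$U_j^{IV}(\mathbf{v}) := \Big\{ \varphi(\mathbf{v}) : \varphi \in \Big( {}_a\!\bigotimes_{k \ne j} V_k \Big)^{*} \Big\}. \tag{6.7d}$$

Then $U_j^{I}(\mathbf{v}) = U_j^{II}(\mathbf{v}) = U_j^{IV}(\mathbf{v})$ holds.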
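Proof. (i) Since the mappings $\varphi$ are applied to $\mathbf{v} \in \mathbf{U} := \bigotimes_{j=1}^{d} U_j$ (cf. Lemma 6.1), we may replace $\varphi \in {}_a\!\bigotimes_{k \ne j} V_k'$ with $\varphi \in {}_a\!\bigotimes_{k \ne j} U_k'$ and $\varphi \in ({}_a\!\bigotimes_{k \ne j} V_k)'$ by $\varphi \in ({}_a\!\bigotimes_{k \ne j} U_k)'$ without changing $\varphi(\mathbf{v})$. Since $\dim(U_k) < \infty$, Proposition 3.62c states that ${}_a\!\bigotimes_{k \ne j} U_k' = ({}_a\!\bigotimes_{k \ne j} U_k)'$. This proves Part (a).

(ii) As in part (i) we may restrict $\varphi$ to ${}_a\!\bigotimes_{k \ne j} U_k' = ({}_a\!\bigotimes_{k \ne j} U_k)'$. Since $\dim(U_k) < \infty$, algebraic duals are continuous, i.e.,

$$\varphi \in {}_a\!\bigotimes_{k \ne j} U_k' = \Big( {}_a\!\bigotimes_{k \ne j} U_k \Big)' = {}_a\!\bigotimes_{k \ne j} U_k^* .$$

By Hahn–Banach (Theorem 4.18), such mappings can be extended to ${}_a\!\bigotimes_{k \ne j} V_k^*$. This proves Part (b), while Part (c) is analogous. □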
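Theorem 6.13. (a) For any $\mathbf{v} \in \mathbf{V} = {}_a\!\bigotimes_{j=1}^{d} V_j$ there exist minimal subspaces $U_j^{\min}(\mathbf{v})$ ($1 \le j \le d$). An algebraic characterisation of $U_j^{\min}(\mathbf{v})$ is

$$U_j^{\min}(\mathbf{v}) = \mathrm{span}\Big\{ \big( \varphi^{(1)} \otimes \ldots \otimes \varphi^{(j-1)} \otimes \mathrm{id} \otimes \varphi^{(j+1)} \otimes \ldots \otimes \varphi^{(d)} \big)(\mathbf{v}) \ \text{ with } \varphi^{(k)} \in V_k' \text{ for } k \ne j \Big\} \tag{6.8a}$$

or equivalently

$$U_j^{\min}(\mathbf{v}) = \Big\{ \varphi(\mathbf{v}) : \varphi \in {}_a\!\bigotimes_{k \ne j} V_k' \Big\}, \tag{6.8b}$$

where the action of the functional $\varphi$ is understood as in (6.8a). $U_j^{\min}(\mathbf{v})$ coincides with the sets $U_j^{I}(\mathbf{v})$, $U_j^{II}(\mathbf{v})$, $U_j^{III}(\mathbf{v})$, $U_j^{IV}(\mathbf{v})$ in (6.7a–d).

(b) For a subset $F \subset \mathbf{V} = {}_a\!\bigotimes_{j=1}^{d} V_j$ of tensors, the minimal subspaces $V_{j,F}$ with $F \subset {}_a\!\bigotimes_{j=1}^{d} V_{j,F}$ are

$$U_j^{\min}(F) = \sum_{\mathbf{v} \in F} U_j^{\min}(\mathbf{v}). \tag{6.8c}$$

(c) For finite-dimensional $V_j$, the $j$-rank is defined in (5.6b) and satisfies

$$\mathrm{rank}_j(\mathbf{v}) = \dim(U_j^{\min}(\mathbf{v})), \tag{6.8d}$$

while, for the infinite-dimensional case, Eq. (6.8d) is the true generalisation of the definition of $\mathrm{rank}_j$.

Proof. (i) The equivalence of (6.8a) and (6.8b) is easily seen: linear combinations of elementary tensors $\bigotimes_{k \ne j} \varphi^{(k)}$ are expressed by $\mathrm{span}\{\ldots\}$ in (6.8a) and by $\varphi \in {}_a\!\bigotimes_{k \ne j} V_k'$ in (6.8b).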
(ii) The isomorphism $\mathcal{M}_j$ ('matricisation') from Definition 5.3 maps ${}_a\!\bigotimes_{k=1}^{d} V_k$ into $V_j \otimes_a \mathbf{V}_{[j]}$. Proposition 6.7 states that

$$U_j^{\min}(\mathbf{v}) = \big\{ \varphi(\mathbf{v}) : \varphi \in \mathbf{V}_{[j]}' \big\} = \Big\{ \varphi(\mathbf{v}) : \varphi \in \Big( {}_a\!\bigotimes_{k \ne j} V_k \Big)' \Big\}$$

is the minimal subspace. The set on the right-hand side is $U_j^{II}(\mathbf{v})$ (cf. (6.7b)) and Lemma 6.12 states that $U_j^{II}(\mathbf{v}) = U_j^{I}(\mathbf{v})$, where $U_j^{I}(\mathbf{v})$ coincides with the set on the right-hand side of (6.8b). So far, we have proved $\mathbf{v} \in U_j^{\min}(\mathbf{v}) \otimes_a \mathbf{V}_{[j]}$. Thanks to Lemma 6.11, the intersection may be performed componentwise yielding $\mathbf{v} \in \bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v})$.
(iii) For families of tensors, the argument of Proposition 6.10 proves Part (b).
(iv) For $\mathrm{rank}_j$, see the discussion in §6.4. □

The right-hand side in (6.8a) is the span of a subset. For $d = 2$, the symbol 'span' may be omitted, since the subset is already a subspace (cf. Proposition 6.7).

Exercise 6.14. (a) For a subset $F \subset \mathbf{V}$ let $\mathbf{U}_F := \mathrm{span}\{F\} \supset F$. Show that $U_j^{\min}(F) = U_j^{\min}(\mathbf{U}_F)$.
(b) Let $F \subset \mathbf{V}$ be a subspace of finite dimension. Show that $\dim(U_j^{\min}(F)) < \infty$.

The determination of $U_j^{\min}(\mathbf{v})$ by (6.8a,b) is not very constructive since it requires the application of all dual mappings $\varphi^{(k)} \in V_k'$. Another approach has already been used in the proof above.

Remark 6.15. For $j \in \{1, \ldots, d\}$ apply the matricisation

$$\mathcal{M}_j := \mathcal{M}_j(\mathbf{v}) \in V_j \otimes_a \mathbf{V}_{[j]} \qquad \text{with } \mathbf{V}_{[j]} := {}_a\!\bigotimes_{k \in \{1,\ldots,d\} \setminus \{j\}} V_k .$$

The techniques of §6.2.1 and §6.2.2 may be used to determine the minimal subspaces $U_j^{\min}(\mathbf{v})$ and $\mathbf{U}_{[j]}^{\min}(\mathbf{v})$: $\mathcal{M}_j(\mathbf{v}) \in U_j^{\min}(\mathbf{v}) \otimes \mathbf{U}_{[j]}^{\min}(\mathbf{v})$. In particular, if a singular-value decomposition is required, we can make use of Remark 2.28 since only the first subspace $U_j^{\min}(\mathbf{v})$ is of interest.

Remark 6.16. While $\dim(U_1^{\min}(\mathbf{v})) = \dim(U_2^{\min}(\mathbf{v}))$ holds for $d = 2$ (cf. Corollary 6.6), the dimensions of $U_j^{\min}(\mathbf{v})$ are in general different for $d \ge 3$.

Remark 6.17. Let $\mathbf{v} \in \mathbf{V} = \bigotimes_{j=1}^{d} V_j$ with corresponding $U_j^{\min}(\mathbf{v})$. Consider larger Banach spaces $V_{0,j} \supset V_j$ and set $\mathbf{V}_0 := \bigotimes_{j=1}^{d} V_{0,j}$ (cf. §4.3.4.2). Using the duals $\varphi^{(j)} \in V_{0,j}'$ or $\varphi^{(j)} \in V_{0,j}^*$ in (6.7a,c), we define the minimal subspaces $U_{0,j}^{\min}(\mathbf{v})$. However, $U_j^{\min}(\mathbf{v}) = U_{0,j}^{\min}(\mathbf{v})$ coincide, since the minimality (6.5a,b) implies uniqueness. The same statement holds for $\mathbf{U}_\alpha^{\min}(\mathbf{v})$ defined below.
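Remarks 6.15 and 6.16 can be illustrated for $V_j = \mathbb{R}^{n_j}$ by matricising an array and taking an SVD. The sketch below assumes a tensor stored as a NumPy array with 0-based mode indices; the function names and the example tensor $e_1 \otimes (e_1 \otimes e_1 + e_2 \otimes e_2)$ are illustrative choices, not taken from the text.

```python
import numpy as np

def matricisation(v, j):
    """M_j(v): mode j (0-based) becomes the row index, all other modes the column index."""
    return np.moveaxis(v, j, 0).reshape(v.shape[j], -1)

def minimal_subspace(v, j, rel_tol=1e-12):
    """Orthonormal basis of U_j^min(v) = range(M_j(v)), using a numerical rank."""
    U, s, _ = np.linalg.svd(matricisation(v, j), full_matrices=False)
    return U[:, : int(np.sum(s > rel_tol * s.max()))]

e1, e2 = np.eye(2)
v = np.einsum('i,jk->ijk', e1, np.outer(e1, e1) + np.outer(e2, e2))

bases = [minimal_subspace(v, j) for j in range(3)]
ranks = [Q.shape[1] for Q in bases]          # the j-ranks dim(U_j^min(v))
P = [Q @ Q.T for Q in bases]                 # orthogonal projections onto U_j^min(v)
v_proj = np.einsum('ia,jb,kc,abc->ijk', P[0], P[1], P[2], v)
```

For this tensor the first mode has $j$-rank 1 while the other two have $j$-rank 2, and `v_proj` equals `v`, which reflects $\mathbf{v} \in \bigotimes_j U_j^{\min}(\mathbf{v})$.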
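6.4 Hierarchies of Minimal Subspaces and $\mathrm{rank}_\alpha$

So far, we have defined minimal subspaces $U_j^{\min}(\mathbf{v})$ for a single index $j$ in $D := \{1, \ldots, d\}$. We can extend this definition to $\mathbf{U}_\alpha^{\min}(\mathbf{v})$ for all subsets $\emptyset \subsetneq \alpha \subsetneq D$ (cf. (5.3b)). For illustration we consider the example

$$\mathbf{v} \in \mathbf{V} = \bigotimes_{j=1}^{7} V_j = \underbrace{(V_1 \otimes V_2)}_{=\mathbf{V}_\alpha} \otimes \underbrace{(V_3 \otimes V_4)}_{=\mathbf{V}_\beta} \otimes \underbrace{(V_5 \otimes V_6 \otimes V_7)}_{=\mathbf{V}_\gamma} = \mathbf{V}_\alpha \otimes \mathbf{V}_\beta \otimes \mathbf{V}_\gamma ,$$

in which we use the isomorphism between $\mathbf{V} = \bigotimes_{j=1}^{7} V_j$ and $\mathbf{V}_\alpha \otimes \mathbf{V}_\beta \otimes \mathbf{V}_\gamma$. Ignoring the tensor structure of $\mathbf{V}_\alpha$, $\mathbf{V}_\beta$, $\mathbf{V}_\gamma$, we regard $\mathbf{V} = \mathbf{V}_\alpha \otimes \mathbf{V}_\beta \otimes \mathbf{V}_\gamma$ as tensor space of order 3. Consequently, for $\mathbf{v} \in \mathbf{V}$ there are minimal subspaces $\mathbf{U}_\alpha^{\min}(\mathbf{v}) \subset \mathbf{V}_\alpha = V_1 \otimes V_2$, $\mathbf{U}_\beta^{\min}(\mathbf{v}) \subset \mathbf{V}_\beta = V_3 \otimes V_4$, and $\mathbf{U}_\gamma^{\min}(\mathbf{v}) \subset \mathbf{V}_\gamma = V_5 \otimes V_6 \otimes V_7$ such that $\mathbf{v} \in \mathbf{U}_\alpha^{\min}(\mathbf{v}) \otimes \mathbf{U}_\beta^{\min}(\mathbf{v}) \otimes \mathbf{U}_\gamma^{\min}(\mathbf{v})$. These minimal subspaces may be constructively determined from $\mathcal{M}_\alpha(\mathbf{v})$, $\mathcal{M}_\beta(\mathbf{v})$, $\mathcal{M}_\gamma(\mathbf{v})$.

As in the example, we use the notations (5.3a–d) for $D$, $\alpha$, $\alpha^c$, and $\mathbf{V}_\alpha$. By $\mathbf{V} = \mathbf{V}_\alpha \otimes \mathbf{V}_{\alpha^c}$, any $\mathbf{v} \in \mathbf{V}$ gives rise to minimal subspaces $\mathbf{U}_\alpha^{\min}(\mathbf{v}) \subset \mathbf{V}_\alpha$ and $\mathbf{U}_{\alpha^c}^{\min}(\mathbf{v}) \subset \mathbf{V}_{\alpha^c}$.

Proposition 6.18. Let $\mathbf{v} \in \mathbf{V} = {}_a\!\bigotimes_{j=1}^{d} V_j$ and $\emptyset \ne \alpha \subset D$. Then the minimal subspace $\mathbf{U}_\alpha^{\min}(\mathbf{v})$ and the minimal subspaces $U_j^{\min}(\mathbf{v})$ for $j \in \alpha$ are related by

$$\mathbf{U}_\alpha^{\min}(\mathbf{v}) \subset {}_a\!\bigotimes_{j \in \alpha} U_j^{\min}(\mathbf{v}). \tag{6.9}$$

Proof. We know that $\mathbf{v} \in \mathbf{U} := {}_a\!\bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v})$. Writing $\mathbf{U}$ as $\mathbf{U}_\alpha \otimes \mathbf{U}_{\alpha^c}$ with $\mathbf{U}_\alpha := {}_a\!\bigotimes_{j \in \alpha} U_j^{\min}(\mathbf{v})$ and $\mathbf{U}_{\alpha^c} := {}_a\!\bigotimes_{j \in \alpha^c} U_j^{\min}(\mathbf{v})$, we see that $\mathbf{U}_\alpha^{\min}(\mathbf{v})$ must be contained in $\mathbf{U}_\alpha = {}_a\!\bigotimes_{j \in \alpha} U_j^{\min}(\mathbf{v})$. □

An obvious generalisation is the following.

Corollary 6.19. Let $\mathbf{v} \in \mathbf{V} = {}_a\!\bigotimes_{j=1}^{d} V_j$. Assume that $\emptyset \ne \alpha_1, \ldots, \alpha_m, \beta \subset D$ are subsets such that $\beta = \bigcup_{\mu=1}^{m} \alpha_\mu$ is a disjoint union. Then

$$\mathbf{U}_\beta^{\min}(\mathbf{v}) \subset \bigotimes_{\mu=1}^{m} \mathbf{U}_{\alpha_\mu}^{\min}(\mathbf{v}).$$

In particular, if $\emptyset \ne \alpha, \alpha_1, \alpha_2 \subset D$ satisfy $\alpha = \alpha_1 \,\dot\cup\, \alpha_2$ (disjoint union), then

$$\mathbf{U}_\alpha^{\min}(\mathbf{v}) \subset \mathbf{U}_{\alpha_1}^{\min}(\mathbf{v}) \otimes \mathbf{U}_{\alpha_2}^{\min}(\mathbf{v}). \tag{6.10}$$

If

$$\prod_{\mu \in \alpha^c} \dim(V_\mu) \ge \dim(\mathbf{U}_{\alpha_1}^{\min}(\mathbf{v})) \cdot \dim(\mathbf{U}_{\alpha_2}^{\min}(\mathbf{v})),$$

there are tensors $\mathbf{v} \in \mathbf{V}$ such that (6.10) holds with equality sign.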
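Proof. For the last statement let $\{b_i^{(1)}\}$ be a basis of $\mathbf{U}_{\alpha_1}^{\min}(\mathbf{v})$ and $\{b_j^{(2)}\}$ a basis of $\mathbf{U}_{\alpha_2}^{\min}(\mathbf{v})$. Then $\{b_i^{(1)} \otimes b_j^{(2)}\}$ is a basis of $\mathbf{U}_{\alpha_1}^{\min}(\mathbf{v}) \otimes \mathbf{U}_{\alpha_2}^{\min}(\mathbf{v})$. For all pairs $(i, j)$ choose linearly independent tensors $w_{ij} \in \bigotimes_{\mu \in \alpha^c} V_\mu$ and set

$$\mathbf{v} := \sum_{i,j} w_{ij} \otimes b_i^{(1)} \otimes b_j^{(2)} \in \mathbf{V}.$$

One verifies that $\mathbf{U}_\alpha^{\min}(\mathbf{v}) = \mathrm{span}_{i,j}\{b_i^{(1)} \otimes b_j^{(2)}\} = \mathbf{U}_{\alpha_1}^{\min}(\mathbf{v}) \otimes \mathbf{U}_{\alpha_2}^{\min}(\mathbf{v})$. □

The algebraic characterisation of $\mathbf{U}_\alpha^{\min}(\mathbf{v})$ is analogous to (6.8a,b):

$$\mathbf{U}_\alpha^{\min}(\mathbf{v}) = \mathrm{span}\Big\{ \varphi_{\alpha^c}(\mathbf{v}) : \varphi_{\alpha^c} = \bigotimes_{j \in \alpha^c} \varphi^{(j)},\ \varphi^{(j)} \in V_j' \Big\}, \tag{6.11}$$

where $\varphi_{\alpha^c}\big( \bigotimes_{j=1}^{d} v^{(j)} \big) := \varphi_{\alpha^c}\big( \bigotimes_{j \in \alpha^c} v^{(j)} \big) \cdot \bigotimes_{j \in \alpha} v^{(j)}$.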
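In Definition 5.7, $\mathrm{rank}_\alpha$ is introduced by $\mathrm{rank}_\alpha(\mathbf{v}) := \mathrm{rank}(\mathcal{M}_\alpha(\mathbf{v}))$, where $\mathcal{M}_\alpha(\mathbf{v})$ is interpreted as a matrix. In the finite-dimensional case, $\mathrm{rank}(\mathcal{M}_\alpha(\mathbf{v}))$ is equal to the dimension of $\mathrm{range}(\mathcal{M}_\alpha(\mathbf{v}))$. In general, $\mathcal{M}_\alpha(\mathbf{v})$ is a mapping from $\mathbf{V}_{\alpha^c}'$ into $\mathbf{V}_\alpha$, whereas the interpretation of the range of the matrix $\mathcal{M}_\alpha(\mathbf{v})$ considers $\mathcal{M}_\alpha(\mathbf{v})$ as a mapping from $\mathbf{V}_{\alpha^c}$ into $\mathbf{V}_\alpha$, which is true for the finite-dimensional case since then $\mathbf{V}_{\alpha^c}'$ and $\mathbf{V}_{\alpha^c}$ may be identified. As announced in (5.8), the true generalisation is $\mathrm{rank}_\alpha(\mathbf{v}) = \dim(\mathbf{U}_\alpha^{\min}(\mathbf{v}))$ (cf. Theorem 6.13c), which includes the case of $\mathrm{rank}_\alpha(\mathbf{v}) = \infty$ for $\mathbf{v} \in {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$ with $\dim(\mathbf{U}_\alpha^{\min}(\mathbf{v})) = \infty$. For completeness, we define

$$\mathrm{rank}_\emptyset(\mathbf{v}) = \mathrm{rank}_D(\mathbf{v}) = \begin{cases} 1 & \text{if } \mathbf{v} \ne 0, \\ 0 & \text{if } \mathbf{v} = 0 \end{cases} \tag{6.12}$$

(cf. Footnote 4 on page 191). The $\alpha$-ranks satisfy the following basic rules.

Lemma 6.20. (a) The ranks for $\alpha \subset D$ and for the complement $\alpha^c$ coincide:

$$\mathrm{rank}_\alpha(\mathbf{v}) = \mathrm{rank}_{\alpha^c}(\mathbf{v}). \tag{6.13a}$$

(b) If $\alpha \subset D$ is the disjoint union $\alpha = \beta \,\dot\cup\, \gamma$, then

$$\mathrm{rank}_\alpha(\mathbf{v}) \le \mathrm{rank}_\beta(\mathbf{v}) \cdot \mathrm{rank}_\gamma(\mathbf{v}). \tag{6.13b}$$

(c) If

$$\prod_{\mu \in \alpha^c} \dim(V_\mu) \ge \mathrm{rank}_\beta(\mathbf{v}) \cdot \mathrm{rank}_\gamma(\mathbf{v}), \tag{6.13c}$$

then there are $\mathbf{v}$ such that equality holds in (6.13b):

$$\mathrm{rank}_\alpha(\mathbf{v}) = \mathrm{rank}_\beta(\mathbf{v}) \cdot \mathrm{rank}_\gamma(\mathbf{v}). \tag{6.13d}$$

In particular, under condition (6.13c), random tensors satisfy (6.13d) with probability one.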
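Proof. (i) In the finite-dimensional case we can use $\mathcal{M}_{\alpha^c}(\mathbf{v}) = \mathcal{M}_\alpha(\mathbf{v})^{\mathsf T}$ to derive (6.13a) from Definition 5.7. In general, use $\mathbf{V} = \mathbf{V}_\alpha \otimes \mathbf{V}_{\alpha^c}$ and Corollary 6.6.
(ii) Definition (6.12) together with (6.10) yields (6.13b).
(iii) The last statement in Corollary 6.19 yields Part (c). A random tensor yields a random matrix $\mathcal{M}_\alpha(\mathbf{v})$ so that Remark 2.5 applies. □

Corollary 6.21. (a) Decompose $D = \{1, \ldots, d\}$ disjointly into $D = \alpha \,\dot\cup\, \beta \,\dot\cup\, \gamma$. Then the following inequalities hold:

$$\mathrm{rank}_\alpha(\mathbf{v}) \le \mathrm{rank}_\beta(\mathbf{v}) \cdot \mathrm{rank}_\gamma(\mathbf{v}), \quad \mathrm{rank}_\beta(\mathbf{v}) \le \mathrm{rank}_\alpha(\mathbf{v}) \cdot \mathrm{rank}_\gamma(\mathbf{v}), \quad \mathrm{rank}_\gamma(\mathbf{v}) \le \mathrm{rank}_\alpha(\mathbf{v}) \cdot \mathrm{rank}_\beta(\mathbf{v}).$$

(b) Let $\alpha = \{j, j+1, \ldots, j\}$, $\beta = \{1, \ldots, j-1\}$, $\gamma = \{1, \ldots, j\}$. Then (6.13b) holds again.

Proof. (a) Since $\alpha^c = \beta \,\dot\cup\, \gamma$, the combination of (6.13a,b) proves the first inequality of Part (a). Since $\alpha$, $\beta$, $\gamma$ are symmetric in their properties, the other two inequalities follow.
(b) For Part (b) note that $D = \alpha \,\dot\cup\, \beta \,\dot\cup\, \gamma^c$. □

$\mathbf{U}_\alpha^{\min}(\mathbf{v})$ can be determined from $\mathbf{U}_\beta^{\min}(\mathbf{v})$ with $\beta \supsetneq \alpha$ (cf. notation (6.6b)).

Proposition 6.22. Let $\emptyset \subsetneq \alpha \subsetneq \beta \subset D$. Then the following identity holds for all $\mathbf{v} \in \mathbf{V}$:

$$\mathbf{U}_\alpha^{\min}(\mathbf{v}) = \mathbf{U}_\alpha^{\min}(\mathbf{U}_\beta^{\min}(\mathbf{v})).$$

Proof. (i) Any tensor $\mathbf{v}_\alpha \in \mathbf{U}_\alpha^{\min}(\mathbf{v})$ has the representation $\mathbf{v}_\alpha = \varphi^{\alpha^c}(\mathbf{v})$ with $\varphi^{\alpha^c} \in \bigotimes_{j \in \alpha^c} V_j'$. Using the disjoint union $\alpha^c = (\beta \setminus \alpha) \,\dot\cup\, \beta^c$, there is a splitting $\varphi^{\alpha^c} = \sum_\nu \varphi_\nu^{\beta \setminus \alpha} \otimes \varphi_\nu^{\beta^c}$ with $\varphi_\nu^{\beta \setminus \alpha} \in \bigotimes_{j \in \beta \setminus \alpha} V_j'$ and $\varphi_\nu^{\beta^c} \in \bigotimes_{j \in \beta^c} V_j'$. Define $\mathbf{v}_\nu := \varphi_\nu^{\beta^c}(\mathbf{v}) \in \mathbf{U}_\beta^{\min}(\mathbf{v})$. Then $\mathbf{v}_\alpha = \varphi^{\alpha^c}(\mathbf{v}) = \sum_\nu \varphi_\nu^{\beta \setminus \alpha}(\mathbf{v}_\nu)$ belongs to $\mathbf{U}_\alpha^{\min}(\mathbf{U}_\beta^{\min}(\mathbf{v}))$, proving that $\mathbf{U}_\alpha^{\min}(\mathbf{v}) \subset \mathbf{U}_\alpha^{\min}(\mathbf{U}_\beta^{\min}(\mathbf{v}))$.

(ii) Any $\mathbf{w} \in \mathbf{U}_\alpha^{\min}(\mathbf{U}_\beta^{\min}(\mathbf{v}))$ is of the form $\mathbf{w} = \sum_\nu \mathbf{w}_\nu$ with

$$\mathbf{w}_\nu = \varphi_\nu^{\beta \setminus \alpha}(\mathbf{v}_\nu), \qquad \text{where } \varphi_\nu^{\beta \setminus \alpha} \in \bigotimes_{j \in \beta \setminus \alpha} V_j' \text{ and } \mathbf{v}_\nu \in \mathbf{U}_\beta^{\min}(\mathbf{v})$$

(cf. (6.6a)). Since $\mathbf{v}_\nu = \varphi_\nu^{\beta^c}(\mathbf{v})$ for some functional $\varphi_\nu^{\beta^c} \in \bigotimes_{j \in \beta^c} V_j'$, we obtain $\mathbf{w} = \sum_\nu \varphi_\nu^{\beta \setminus \alpha}(\varphi_\nu^{\beta^c}(\mathbf{v})) = \sum_\nu (\varphi_\nu^{\beta \setminus \alpha} \otimes \varphi_\nu^{\beta^c})(\mathbf{v})$, i.e., $\mathbf{w} = \varphi^{\alpha^c}(\mathbf{v})$ with $\varphi^{\alpha^c} = \sum_\nu \varphi_\nu^{\beta \setminus \alpha} \otimes \varphi_\nu^{\beta^c} \in \bigotimes_{j \in \alpha^c} V_j'$. This shows that $\mathbf{w} \in \mathbf{U}_\alpha^{\min}(\mathbf{v})$ and proves the opposite inclusion $\mathbf{U}_\alpha^{\min}(\mathbf{v}) \supset \mathbf{U}_\alpha^{\min}(\mathbf{U}_\beta^{\min}(\mathbf{v}))$. □

Corollary 6.23. Let $\emptyset \subsetneq \alpha \subsetneq \beta \subset D$ and $\varphi \in \bigotimes_{j \in \beta^c} V_j'$. $\varphi$ maps $\mathbf{v} \in \mathbf{V}$ into $\mathbf{w} := \varphi(\mathbf{v}) \in \mathbf{V}_\beta$. Then $\mathrm{rank}_\alpha(\mathbf{w}) \le \mathrm{rank}_\alpha(\mathbf{v})$ as well as $\mathrm{rank}(\mathbf{w}) \le \mathrm{rank}(\mathbf{v})$.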
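Proof. $\mathbf{w} \in \mathbf{U}_\beta^{\min}(\mathbf{v})$ implies $\mathbf{U}_\alpha^{\min}(\mathbf{w}) \subset \mathbf{U}_\alpha^{\min}(\mathbf{U}_\beta^{\min}(\mathbf{v})) = \mathbf{U}_\alpha^{\min}(\mathbf{v})$. The proof for the tensor rank is left to the reader. □

We conclude with a comparison of the $\alpha$-rank and the tensor rank introduced in Definition 3.35.

Remark 6.24. $\mathrm{rank}_\alpha(\mathbf{v}) \le \mathrm{rank}(\mathbf{v})$ holds for $\mathbf{v} \in {}_a\!\bigotimes_{k=1}^{d} V_k$ and $\alpha \subset \{1, \ldots, d\}$. While $\mathrm{rank}(\cdot)$ may depend on the underlying field $\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}$ (cf. §3.2.6.4), the value of $\mathrm{rank}_\alpha(\cdot)$ is independent.

Proof. (i) Rewrite $\mathbf{v} = \sum_{i=1}^{r} \bigotimes_{j=1}^{d} u_i^{(j)}$ with $r = \mathrm{rank}(\mathbf{v})$ as

$$\sum_{i=1}^{r} u_i^{(\alpha)} \otimes u_i^{(\alpha^c)}, \qquad \text{where } u_i^{(\alpha)} := \bigotimes_{j \in \alpha} u_i^{(j)} .$$

The dimension of $\mathbf{U}_\alpha := \mathrm{span}\{u_i^{(\alpha)} : 1 \le i \le r\}$ satisfies

$$\mathrm{rank}_\alpha(\mathbf{v}) = \dim(\mathbf{U}_\alpha^{\min}(\mathbf{v})) \le \dim(\mathbf{U}_\alpha) \le r = \mathrm{rank}(\mathbf{v}).$$

(ii) Because $\mathrm{rank}_\alpha(\mathbf{v})$ is the matrix rank of $\mathcal{M}_\alpha(\mathbf{v})$, Remark 2.2 proves independence of the field. □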
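The rules (6.13a), (6.13b), (6.13d) and Remark 6.24 can be checked numerically for arrays. The sketch below assumes $V_j = \mathbb{R}^{n_j}$, 0-based mode indices, and NumPy's default tolerance in `np.linalg.matrix_rank`; the function name `alpha_rank` and the random test tensors are illustrative assumptions.

```python
import numpy as np

def alpha_rank(v, alpha):
    """rank_alpha(v) as the matrix rank of the matricisation M_alpha(v).

    alpha is a list of 0-based mode indices; the modes in alpha form the row
    index of the matrix, the remaining modes form the column index.
    """
    alpha = sorted(alpha)
    rest = [k for k in range(v.ndim) if k not in alpha]
    rows = int(np.prod([v.shape[k] for k in alpha]))
    M = np.transpose(v, alpha + rest).reshape(rows, -1)
    return int(np.linalg.matrix_rank(M))

rng = np.random.default_rng(1)

# An r-term tensor of order 4 with r = 2 (random factors, so rank 2 generically).
r = 2
A, B, C, D = (rng.standard_normal((r, 3)) for _ in range(4))
v = np.einsum('ri,rj,rk,rl->ijkl', A, B, C, D)

# A random tensor whose dimensions satisfy condition (6.13c) for alpha = {0, 1}.
w = rng.standard_normal((2, 2, 4))
```

For `v`, the ranks for `[0, 1]` and its complement `[2, 3]` agree and none of the α-ranks exceeds the number of terms; for the random `w`, the product bound (6.13b) is attained.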
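Exercise 6.25. Let $\{b_i^{(d)} : i \in B\}$ be an (algebraic) basis of $V_d$ and $\{\varphi_i^{(d)} : i \in B\}$ the dual system in $V_d'$. Show that $\mathbf{U}_{\{1,\ldots,d-1\}}^{\min}(\mathbf{v}) = \mathrm{span}\{v^{\langle i \rangle} : i \in B\}$, where $v^{\langle i \rangle}$ are the expansion terms in (3.22a) (here with $U_d = V_d$).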
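6.5 Sequences of Minimal Subspaces

Next we consider sequences $\mathbf{v}_\nu \in {}_a\!\bigotimes_{j=1}^{d} V_j$. In general, the limit with respect to $\|\cdot\|$ is not an algebraic tensor. Therefore we have to define $U_j^{\min}(\mathbf{v})$ for topological tensors in $\mathbf{V} := {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$.

Let $\mathbf{V}$ be a Banach tensor space with norm $\|\cdot\|$ and assume that either condition (4.54a) or (4.54b) be valid. (6.14)

We recall that $\|\cdot\| \gtrsim \|\cdot\|_\vee$ is a sufficient condition and that all reasonable crossnorms satisfy this inequality (cf. Proposition 4.74). In (6.7c), $U_j^{III}(\mathbf{v})$ was defined as one of the equivalent definitions of $U_j^{\min}(\mathbf{v})$ for algebraic $\mathbf{v}$ (cf. Lemma 6.12). Now we follow this definition and set

$$U_j^{\min}(\mathbf{v}) := \mathrm{span}\Big\{ \big( \varphi^{(1)} \otimes \ldots \otimes \varphi^{(j-1)} \otimes \mathrm{id} \otimes \varphi^{(j+1)} \otimes \ldots \otimes \varphi^{(d)} \big)(\mathbf{v}), \ \text{where } \varphi^{(k)} \in V_k^* \text{ for } k \in \{1,\ldots,d\} \setminus \{j\} \Big\} = \Big\{ \varphi(\mathbf{v}) : \varphi \in {}_a\!\bigotimes_{k \in \{1,\ldots,d\} \setminus \{j\}} V_k^* \Big\}, \tag{6.15a}$$

$$\mathbf{U}(\mathbf{v}) := {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v}). \tag{6.15b}$$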
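For algebraic tensors $\mathbf{v}$ these definitions coincide with the previous ones. Note that $U_j^{\min}(\mathbf{v})$ may be infinite dimensional for topological tensors.

Lemma 6.26. (a) Assume (4.54a). Then $U_j^{\min}(\mathbf{v})$ in (6.15a) is well defined for any $\mathbf{v} \in \mathbf{V} = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$. $U_j^{\min}(\mathbf{v})$ does not depend on $\|\cdot\|$.
(b) $U_j^{\min}(\mathbf{v})$ may also be defined by the closure of the right-hand side in (6.15a); however, this does not change the tensor subspace $\mathbf{U}(\mathbf{v})$ in (6.15b).

Proof. By Lemma 4.113c the map $\varphi^{(1)} \otimes \ldots \otimes \varphi^{(j-1)} \otimes \mathrm{id} \otimes \varphi^{(j+1)} \otimes \ldots \otimes \varphi^{(d)}$ is continuous. This proves part (a). For part (b) apply Lemma 4.40. □

Corollary 6.27. Alternatively we can choose functionals $\varphi^{(k)} \in V_{0,k}^*$ for Banach spaces $V_{0,k} \supset V_k$ appearing in (4.54b):

$$U_j^{\min}(\mathbf{v}) := \Big\{ \varphi(\mathbf{v}) : \varphi \in {}_a\!\bigotimes_{k \in \{1,\ldots,d\} \setminus \{j\}} V_{0,k}^* \Big\}.$$

Then, by Lemma 4.116b, $\varphi : \mathbf{V} \to V_j$ is continuous.

In §6.6 we shall discuss the meaning of $U_j^{\min}(\mathbf{v})$ and $\mathbf{U}(\mathbf{v})$ for non-algebraic tensors. In the algebraic case, an alternative definition of $U_j^{\min}(\mathbf{v})$ is $U_j^{IV}(\mathbf{v})$ in (6.7d). This definition requires that a norm on $\mathbf{V}_{[j]} = \bigotimes_{k \ne j} V_k$ be defined. If $\mathbf{V}$ is equipped with a uniform crossnorm, we may choose the norm of $\mathbf{V}_{[j]}$ according to §4.3.2. Note that $\bigotimes_{k \ne j} V_k^*$ is contained in $\mathbf{V}_{[j]}^*$. According to Lemma 4.102 these spaces may coincide, but in general $\bigotimes_{k \ne j} V_k^*$ is a proper subspace. The next lemma states that, nevertheless, the closures of the image spaces $U_j^{III}(\mathbf{v})$ and $U_j^{IV}(\mathbf{v})$ coincide.

Lemma 6.28. Assume that $\mathbf{V} = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$ is a Banach tensor space with a uniform crossnorm $\|\cdot\|$. For $\mathbf{V}_{[j]} = \bigotimes_{k \ne j} V_k$ use the uniform crossnorm defined in Corollary 4.108. The dual norm is used in

$$U_j^{IV}(\mathbf{v}) := \big\{ \varphi(\mathbf{v}) : \varphi \in \mathbf{V}_{[j]}^* \big\}. \tag{6.15c}$$

Then $\overline{U_j^{IV}(\mathbf{v})} = \overline{U_j^{\min}(\mathbf{v})}$ is valid. In particular, (6.15b) may be replaced by $\mathbf{U}(\mathbf{v}) = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} U_j^{IV}(\mathbf{v})$.

Proof. (i) Since $\|\cdot\|$ is uniform, it is also a reasonable crossnorm. This ensures that $U_j^{\min}(\mathbf{v})$ is well defined. According to Theorem 4.115, $\varphi$ is well defined on $\mathbf{V}$ and therefore also $U_j^{IV}(\mathbf{v})$.
(ii) Since ${}_{\|\cdot\|_{[j]}^*}\!\bigotimes_{k \ne j} V_k^* \subset \mathbf{V}_{[j]}^*$, we have $U_j^{\min}(\mathbf{v}) \subset U_j^{IV}(\mathbf{v})$.
(iii) Assuming that $U_j^{IV}(\mathbf{v})$ is strictly larger than $\overline{U_j^{\min}(\mathbf{v})}$, there are $\psi \in ({}_{\|\cdot\|}\!\bigotimes_{k \ne j} V_k)^*$ and $u := \psi(\mathbf{v}) \in U_j^{IV}(\mathbf{v})$ such that $u \notin \overline{U_j^{\min}(\mathbf{v})}$. By Hahn–Banach, there is a functional $\varphi^{(j)} \in V_j^*$ with $\varphi^{(j)}(u) \ne 0$ and $\varphi^{(j)}|_{U_j^{\min}(\mathbf{v})} = 0$. The tensor $\mathbf{w} := \varphi^{(j)}(\mathbf{v}) \in \mathbf{V}_{[j]}$ does not vanish, since

$$\psi(\mathbf{w}) = \big( \varphi^{(j)} \otimes \psi \big)(\mathbf{v}) = \varphi^{(j)}(\psi(\mathbf{v})) = \varphi^{(j)}(u) \ne 0.$$

Hence the definition of $\|\mathbf{w}\|_\vee > 0$ implies that there is an elementary tensor $\varphi^{[j]} = \bigotimes_{k \ne j} \varphi^{(k)}$ with $|\varphi^{[j]}(\mathbf{w})| > 0$. Set $\varphi := \varphi^{(j)} \otimes \varphi^{[j]} = \bigotimes_{k=1}^{d} \varphi^{(k)}$. Now, $\varphi(\mathbf{v}) = \varphi^{[j]}(\varphi^{(j)}(\mathbf{v})) = \varphi^{[j]}(\mathbf{w}) \ne 0$ is a contradiction to $\varphi(\mathbf{v}) = \varphi^{(j)}(\varphi^{[j]}(\mathbf{v})) = 0$ because $\varphi^{[j]}(\mathbf{v}) \in U_j^{\min}(\mathbf{v})$ and $\varphi^{(j)}|_{U_j^{\min}(\mathbf{v})} = 0$. Hence $U_j^{IV}(\mathbf{v}) \subset \overline{U_j^{\min}(\mathbf{v})}$ is valid. Combining this inclusion with part (ii) yields $\overline{U_j^{IV}(\mathbf{v})} = \overline{U_j^{\min}(\mathbf{v})}$. The statement about $\mathbf{U}(\mathbf{v})$ corresponds to Lemma 6.26b. □

So far we used the continuity $\varphi^{[j]}(\mathbf{v}_n) \to \varphi^{[j]}(\mathbf{v})$ for $\mathbf{v}_n \to \mathbf{v}$. Now we replace convergence by weak convergence and recall Theorem 4.117: Assume (6.14). For all $\mathbf{v}_n, \mathbf{v} \in {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$ with $\mathbf{v}_n \rightharpoonup \mathbf{v}$, we have weak convergence $\varphi^{[j]}(\mathbf{v}_n) \rightharpoonup \varphi^{[j]}(\mathbf{v})$ in $V_j$ for all $\varphi^{[j]} \in {}_a\!\bigotimes_{k \ne j} V_k^*$ [if (4.54a)] or for all $\varphi^{[j]} \in {}_a\!\bigotimes_{k \ne j} V_{0,k}^*$ with $V_{0,k} \supset V_k$ [if (4.54b), Remark 6.17].

Theorem 6.29. Assume (6.14). If $\mathbf{v}_n \in {}_a\!\bigotimes_{j=1}^{d} V_j$ satisfies $\mathbf{v}_n \rightharpoonup \mathbf{v} \in {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$, then

$$\dim(U_j^{\min}(\mathbf{v})) \le \liminf_{n \to \infty} \dim(U_j^{\min}(\mathbf{v}_n)) \qquad \text{for all } 1 \le j \le d.$$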
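Proof. Choose a subsequence (again denoted by $\mathbf{v}_n$) such that $\dim(U_j^{\min}(\mathbf{v}_n))$ is weakly increasing. In the case of $\dim(U_j^{\min}(\mathbf{v}_n)) \to \infty$, nothing has to be proved. Therefore let $\lim \dim(U_j^{\min}(\mathbf{v}_n)) = N < \infty$. For an indirect proof, assume that $\dim(U_j^{\min}(\mathbf{v})) > N$. Since $\{\varphi(\mathbf{v}) : \varphi \in {}_a\!\bigotimes_{k \ne j} V_k^*\}$ spans $U_j^{\min}(\mathbf{v})$, there are $N + 1$ linearly independent vectors

$$b^{(i)} = \varphi_i^{[j]}(\mathbf{v}) \qquad \text{with } \varphi_i^{[j]} \in {}_a\!\bigotimes_{k \ne j} V_k^* \text{ for } 1 \le i \le N + 1.$$

By Theorem 4.117, weak convergence $b_n^{(i)} := \varphi_i^{[j]}(\mathbf{v}_n) \rightharpoonup b^{(i)}$ holds. By Lemma 4.29, for $n$ large enough, the tuple $(b_n^{(i)})_{1 \le i \le N+1}$ is also linearly independent. Because $b_n^{(i)} = \varphi_i^{[j]}(\mathbf{v}_n) \in U_j^{\min}(\mathbf{v}_n)$, this contradicts $\dim(U_j^{\min}(\mathbf{v}_n)) \le N$. □

These statements about $U_j^{\min}(\cdot)$ can be generalised to $\mathbf{U}_\alpha^{\min}(\cdot)$ for subsets $\alpha$.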
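Theorem 6.30. We suppose that

either condition (4.54c) or (4.54d) be valid. (6.16)

If $\mathbf{v}_n \in {}_a\!\bigotimes_{j=1}^{d} V_j$ satisfies $\mathbf{v}_n \rightharpoonup \mathbf{v} \in {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$, then

$$\dim(\mathbf{U}_\alpha^{\min}(\mathbf{v})) \le \liminf_{n \to \infty} \dim(\mathbf{U}_\alpha^{\min}(\mathbf{v}_n)) \qquad \text{for all } \alpha \subsetneq D = \{1, \ldots, d\}.$$

Proof. Theorem 4.121 shows that $\varphi^{\alpha^c}(\mathbf{v}_n) \rightharpoonup \varphi^{\alpha^c}(\mathbf{v})$ in $\mathbf{V}_\alpha$ for all $\varphi^{\alpha^c} \in {}_a\!\bigotimes_{k \in \alpha^c} V_k^*$. The proof requires $\varphi^{\alpha}(\varphi^{\alpha^c}(\mathbf{v}_n)) \to \varphi^{\alpha}(\varphi^{\alpha^c}(\mathbf{v}))$ for all $\varphi^{\alpha} \in \mathbf{V}_\alpha^*$. (6.16) ensures that $\varphi^{\alpha} \circ \varphi^{\alpha^c}$ is continuous. □

If the space $U_j^{\min}(\mathbf{v})$ becomes infinite dimensional, we may ask about its (infinite) cardinality. The proof of the next remark shows that $U_j^{\min}(\mathbf{v})$ is contained in the completion of a space of dimension $\le \aleph_0 = \#\mathbb{N}$.

Remark 6.31. Even when the tensor space $\mathbf{V}$ is nonseparable, there is a separable subspace $\mathbf{S} = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} S^{(j)} \subset \mathbf{V}$ such that

$$U_j^{\min}(\mathbf{v}) \subset S^{(j)}, \qquad {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v}) \subset \mathbf{S}, \qquad \text{and} \qquad \mathbf{v} \in \mathbf{S}.$$

In particular, $U_j^{\min}(\mathbf{v})$ and ${}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v})$ are separable.

Proof. Let $\mathbf{v}_i \to \mathbf{v}$ be a converging sequence with algebraic tensors $\mathbf{v}_i \in \mathbf{V}_{\mathrm{alg}}$. The sum $S^{(j)} := \sum_{i=1}^{\infty} U_j^{\min}(\mathbf{v}_i)$ of the finite-dimensional minimal subspaces has an at most countable dimension: $\sum_{i=1}^{\infty} \dim(U_j^{\min}(\mathbf{v}_i)) \le \aleph_0$, i.e., $S^{(j)}$ is separable and therefore also $\mathbf{S}$ (cf. Remark 4.41).

The property $\mathbf{v}_i \in {}_a\!\bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v}_i) \subset \mathbf{S}$ together with the closedness of $\mathbf{S}$ proves that $\mathbf{v} = \lim \mathbf{v}_i \in \mathbf{S}$.

Similarly, for any $\varphi^{[j]} \in \bigotimes_{k \ne j} V_k^*$, $\varphi^{[j]}(\mathbf{v}_i) \in U_j^{\min}(\mathbf{v}_i) \subset S^{(j)}$ has its limit $\lim \varphi^{[j]}(\mathbf{v}_i) = \varphi^{[j]}(\mathbf{v})$ in $\overline{S^{(j)}}$. Hence, $U_j^{\min}(\mathbf{v}) \subset \overline{S^{(j)}}$ holds and implies ${}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v}) \subset {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} \overline{S^{(j)}} = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} S^{(j)} = \mathbf{S}$ (cf. Lemma 4.40).

Separability of $U_j^{\min}(\mathbf{v})$ and ${}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v})$ follows from Lemma 4.4. □

We can revert the latter statement.

Proposition 6.32. Let $\mathbf{V} = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$, where at least two Banach spaces $V_j$ are infinite dimensional. The norm $\|\cdot\|$ is assumed to satisfy (4.41). Then for any closed separable subspace $U_j \subset V_j$ there is some $\mathbf{v} \in \mathbf{V}$ with $U_j = U_j^{\min}(\mathbf{v})$.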
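Proof. (i) Without loss of generality we consider the case of $d = 2$ and $j = 1$. Let $\{x_i : i \in \mathbb{N}\}$ be a countable and dense subset of $U_1$. Choose any sequence $\{y_i \in V_2 : i \in \mathbb{N}\}$ of linearly independent vectors. Both sequences are normalised: $\|x_i\|_1 = \|y_i\|_2 = 1$. On $\mathrm{span}\{y_i : 1 \le i \le n\}$ define the functional $\varphi_n$ by $\varphi_n(y_i) = 0$ for $i < n$ and $\varphi_n(y_n) = 1$ and extend $\varphi_n$ by Hahn–Banach, so that $|\varphi_n(y_i)| \le 1$ for all $i > n$. Define numbers $a_{nm}$ for $1 \le n \le m < \infty$ by $a_{nn} := 1$ and the recursion

$$a_{n,n+j} = -\sum_{i=1}^{j} 4^{-i} \varphi_n(y_{n+i})\, a_{n+i,n+j} \qquad \text{for } j = 1, 2, \ldots$$

Set $\mathbf{v} := \sum_{\nu=1}^{\infty} 4^{-\nu} x_\nu' \otimes y_\nu$ with $x_\nu' := \sum_{\mu=\nu}^{\infty} a_{\nu\mu} x_\mu$ (convergence of both sums is still to be shown). Application of $\mathrm{id} \otimes \varphi_n$ yields

$$(\mathrm{id} \otimes \varphi_n)(\mathbf{v}) = \sum_{\nu=n}^{\infty} 4^{-\nu} \varphi_n(y_\nu)\, x_\nu' = \sum_{n \le \nu \le \mu < \infty} 4^{-\nu} \varphi_n(y_\nu)\, a_{\nu\mu} x_\mu = 4^{-n} \sum_{j=0}^{\infty} \sum_{i=0}^{j} 4^{-i} \varphi_n(y_{n+i})\, a_{n+i,n+j}\, x_{n+j} \ \ldots$$

[…]

… $N = O(n^2)$, $O(n^4)$ is a lower order term compared with $2 r n_1 n_2 n_3$. Thus, this method needs almost the same computational work, while the resulting $N'$-term representation is

$$\mathbf{v} = \sum_{(i,\mu)} e^{(1,i)} \otimes x_{i,\mu} \otimes y_{i,\mu} \qquad \text{with } N' := \sum_{i \in I_1} m_i \le n_1 \min\{n_2, n_3\} = N.$$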
7.6.4 Sparse-Grid Approach

The sparse-grid approach is used to interpolate functions in higher spatial dimensions or it serves as ansatz for discretising partial differential equations. For a review of sparse grids we refer to Bungartz–Griebel [47] and Garcke [110]. Here we only sketch the main line and its relation to tensor representations.

To simplify the notation, we assume that the tensor space $\mathbf{V} = \bigotimes_{j=1}^{d} V_j$ uses identical spaces $V = V_j$, which allow a nested sequence of subspaces:

$$V = V_{(\ell)} \supset V_{(\ell-1)} \supset \ldots \supset V_{(2)} \supset V_{(1)}. \tag{7.18}$$

Typical examples are finite-element spaces $V_{(\ell)}$ of functions, say, on the interval $[0, 1]$ using the step size $2^{-\ell}$. For sparse grids in the Fourier space compare Sprengel [268]. While the usual uniform discretisation by $\mathbf{V} = \otimes^d V_{(\ell)}$ has a dimension of order $2^{\ell d}$, the sparse-grid approach uses the sum of tensor spaces

$$\mathbf{V}_{\mathrm{sg},\ell} = \sum_{\sum_{j=1}^{d} \ell_j = \ell + d - 1} \ \bigotimes_{j=1}^{d} V_{(\ell_j)}. \tag{7.19}$$

These spaces lead to an interpolation error⁷ of order $O(2^{-2\ell} \ell^{d-1})$ for suitably regular functions (cf. [47, Theorem 3.8]). This is to be compared with $\dim(\mathbf{V}_{\mathrm{sg},\ell}) \approx 2^{\ell} \ell^{d-1}$ (cf. [47, (3.63)]). The basis vectors in $\mathbf{V}_{\mathrm{sg},\ell}$ are elementary tensors $\bigotimes_{j=1}^{d} b_{k,\ell_j}^{(j)}$, where $\ell_j$ denotes the level: $b_{k,\ell_j}^{(j)} \in V_{(\ell_j)}$. Since the number of terms is limited by the dimension of the space $\mathbf{V}_{\mathrm{sg},\ell}$, the tensor $\mathbf{v} \in \mathbf{V}_{\mathrm{sg},\ell}$ belongs to $\mathcal{R}_r$ with $r = \dim(\mathbf{V}_{\mathrm{sg},\ell}) \approx 2^{\ell} \ell^{d-1}$.

⁷ Any $L^p$ norm with $2 \le p \le \infty$ can be chosen.
For the practical implementation we use hierarchical bases (cf. [47, §3] and [141, §8.9.11]). In $V_{(1)}$ we choose, e.g., the standard hat function basis $b_1 := (b_i)_{1 \le i \le n_1}$. The basis $b_2$ of $V_{(2)}$ is $b_1$ enriched by $\frac{n_2}{2}$ hat functions from $V_{(2)}$. The latter additional basis functions are indexed by $n_1 + 1 \le i \le n_1 + \frac{n_2}{2} = n_2$. In general, the basis $b_\lambda \subset V_{(\lambda)}$ consists of $b_{\lambda-1}$ and additional $\frac{n_\lambda}{2}$ hat functions of $V_{(\lambda)}$. The index $\ell_j$ corresponds to the dimension $n_j = 2^{\ell_j}$ of $V_{(\ell_j)}$. The additive side condition $\sum_{j=1}^{d} \ell_j \le L := \ell + d - 1$ in (7.19) can be rewritten as

$$\prod_{j=1}^{d} n_j \le N := 2^{L}.$$

It follows from $i_j \le n_j$ that the involved indices of the basis functions $b_{i_j} \in V_{(\ell_j)}$ satisfy $\prod_{j=1}^{d} i_j \le N$. For ease of notation, we replace $\mathbf{V}_{\mathrm{sg},\ell}$ with

$$\mathbf{V}_{\mathrm{sg}} := \mathrm{span}\Big\{ \bigotimes_{j=1}^{d} b_{i_j} : \prod_{j=1}^{d} i_j \le N \Big\}.$$

Since $\mathbf{V}_{\mathrm{sg}} \supset \mathbf{V}_{\mathrm{sg},\ell}$, the approximation is not worse, while $\dim(\mathbf{V}_{\mathrm{sg}})$ has the same asymptotic behaviour $2^{\ell} \ell^{d-1}$ as $\dim(\mathbf{V}_{\mathrm{sg},\ell})$. The inequality $\prod_{j=1}^{d} i_j \le N$ gives rise to the name 'hyperbolic cross'.

Remark 7.23. The typical hyperbolic cross approach is the approximation of a function $f$ with the (exact) series expansion $f = \sum_{\mathbf{i} \in \mathbb{N}^d} v_{\mathbf{i}} \bigotimes_{j=1}^{d} \phi_{i_j}$ by

$$f_N := \sum_{\prod_{j=1}^{d} i_j \le N} v_{\mathbf{i}} \bigotimes_{j=1}^{d} \phi_{i_j}.$$
The behaviour of the number $\sigma_d(N)$ of tuples $\mathbf{i}$ involved in the summation with respect to $N$ is

$$\sigma_d(N) = O\big( N \log^{d-1}(N) \big). \tag{7.20}$$

In the previous example $O(N^{-2} \log^{d-1}(N))$ is the accuracy of $f_N$. The following table shows the values of $\sigma_2(N)$ and $\sigma_{10}(N)$ for different $N$, as well as values of $\sigma_d(10)$ for increasing $d$:

            N          2    4     8      16      32       64       128        256
d = 2       σ_d(N)     3    8    20      50     119      280       645       1466
d = 10      σ_d(N)    11   76   416    2056    9533    41788    172643     675355

            d          2    3     5      10      20       50       100       1000
N = 10      σ_d(N)    27   53   136     571    2841    29851    202201  170172001
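The counts $\sigma_d(N)$ can be reproduced by a short recursion over the last index: fixing $i_d = i$ leaves $\sigma_{d-1}(\lfloor N/i \rfloor)$ tuples for the remaining indices. The sketch below is one possible way to compute them (the function name `sigma` and the memoisation via `lru_cache` are implementation choices, not taken from the text).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def sigma(d, N):
    """Number of tuples (i_1, ..., i_d) of positive integers with i_1 * ... * i_d <= N."""
    if N < 1:
        return 0
    if d == 0:
        return 1          # the empty product equals 1 <= N
    # fix the last index i; the remaining d-1 indices must have product <= N // i
    return sum(sigma(d - 1, N // i) for i in range(1, N + 1))

row_d2 = [sigma(2, N) for N in (2, 4, 8, 16, 32)]
```

The list `row_d2` reproduces the first entries of the row $d = 2$, and `sigma(10, 4)` and `sigma(3, 10)` agree with the corresponding table entries 76 and 53.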
7.6.5 From Sparse Format into r-Term Format

Finally, we consider the sparse format $\mathbf{v} = \rho_{\mathrm{sparse}}(\mathring{I}, (v_{\mathbf{i}})_{\mathbf{i} \in \mathring{I}})$ in (7.5). By definition, $\mathbf{v} = \sum_{\mathbf{i} \in \mathring{I}} v_{\mathbf{i}} \bigotimes_{j=1}^{d} b_{i_j}^{(j)}$ holds. The latter expression is an $r$-term representation of $\mathbf{v}$ with $r := \#\mathring{I}$ nonzero terms.

The function $f$ from Remark 7.23 is a tensor isomorphically represented by the coefficient $\mathbf{v} \in \otimes^d \mathbb{K}^{\mathbb{N}}$, where the tensor space is to be equipped with the suitable norm. The tensor $\mathbf{v}_{\mathrm{sg}}$ corresponding to the approximation $f_N$ has sparse format:

$$\mathbf{v}_{\mathrm{sg}} = \rho_{\mathrm{sparse}}(\mathring{I}, (v_{\mathbf{i}})_{\mathbf{i} \in \mathring{I}}) \qquad \text{with } \mathring{I} = \Big\{ \mathbf{i} \in \mathbb{N}^d : \prod_{j=1}^{d} i_j \le N \Big\}.$$

This ensures an $r$-term representation with $r := \#\mathring{I}$. The representation rank $r$ can be reduced because of the special structure of $\mathring{I}$. Here we follow the idea from the proof of Lemma 3.45: for fixed $i_j$ with $j \in \{1, \ldots, d\} \setminus \{k\}$ we can collect all terms for $i_k \in \mathbb{N}$ in

$$\Big( \bigotimes_{j=1}^{k-1} b_{i_j}^{(j)} \Big) \otimes \Big( \sum_{i_k} v_{\mathbf{i}}\, b_{i_k}^{(k)} \Big) \otimes \Big( \bigotimes_{j=k+1}^{d} b_{i_j}^{(j)} \Big) \in \mathcal{R}_1. \tag{7.21}$$

The obvious choice of $(i_1, \ldots, i_{k-1}, i_{k+1}, \ldots, i_d)$ are indices such that the sum $\sum_{i_k}$ contains as many nonzero terms as possible.

First, we discuss the situation for $d = 2$. Figure 7.1 shows the pairs $(i_1, i_2) \in \mathring{I}$ with $\prod_{j=1}^{d} i_j \le N = 16$. For the first choice $k = 1$ and $i_2 = 1$, the indices $\mathbf{i}$ involved in (7.21) are contained in the first column of height 16. The second choice $k = 2$ and $i_1 = 1$ leads to the lower row. Here the sum in (7.21) ranges from 2 to 16 since $v[1, 1]$ belongs to the previous column. Similarly, two additional columns and rows correspond to $i_2 = 2, 3$ and $i_1 = 2, 3$. Then we are left with a single index $\mathbf{i} = (4, 4)$ so that we have decomposed $\mathring{I}$ into seven groups. Each group gives rise to one elementary tensor (7.21). This finishes the construction of a 7-term representation of $\mathbf{v}_{\mathrm{sg}}$. Obviously, for general $N$ we can construct a representation in $\mathcal{R}_r$ with

$$r \le 2 \lfloor \sqrt{N} \rfloor \le 2 \sqrt{N}$$

and even $r \le 2\sqrt{N} - 1$ if $\sqrt{N} \in \mathbb{N}$.

Fig. 7.1 Sparse-grid indices
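The grouping for $d = 2$ described above can be sketched as follows. The function name `hyperbolic_cross_groups` is hypothetical, and the alternation "column for $i_2 = t$, then row for $i_1 = t$" is one way to read the construction around Figure 7.1; each returned group corresponds to one elementary tensor of the form (7.21).

```python
def hyperbolic_cross_groups(N):
    """Partition {(i1, i2) : i1 * i2 <= N} into groups as in the d = 2 discussion.

    For t = 1, 2, ... with t * t <= N:
      column group: i2 = t fixed, i1 = t, ..., N // t      (choice k = 1)
      row group:    i1 = t fixed, i2 = t + 1, ..., N // t  (choice k = 2)
    Indices already covered by earlier groups are skipped, so the groups are disjoint.
    """
    groups = []
    t = 1
    while t * t <= N:
        column = [(i, t) for i in range(t, N // t + 1)]
        row = [(t, i) for i in range(t + 1, N // t + 1)]
        groups.append(column)
        if row:                      # the row is empty when t * (t + 1) > N
            groups.append(row)
        t += 1
    return groups

groups = hyperbolic_cross_groups(16)
covered = [idx for g in groups for idx in g]
```

For $N = 16$ this yields seven groups covering all 50 index pairs exactly once, in agreement with the 7-term representation and with $\sigma_2(16) = 50$.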
For general $d$, the decomposition of $\mathring{I}$ can be achieved as follows. Let

$$T := \Big\{ (t_1, \ldots, t_{d-1}) \in \mathbb{N}^{d-1} : \max_{j=1}^{d-1}\{t_j\} \cdot \prod_{j=1}^{d-1} t_j \le N \Big\}$$

be a set of $(d-1)$-tuples. For each $\mathbf{t} := (t_1, \ldots, t_{d-1}) \in T$ and $1 \le k \le d$ define

$$\mathring{I}_{\mathbf{t},k} := \Big\{ (t_1, \ldots, t_{k-1}, i_k, t_k, \ldots, t_{d-1}) \in \mathring{I} \ \text{ with } i_k \ge \max_{j=1}^{d-1}\{t_j\} \Big\}.$$

We claim that $\bigcup_{\mathbf{t} \in T} \bigcup_{k=1}^{d} \mathring{I}_{\mathbf{t},k} = \mathring{I}$. For a proof, take any $\mathbf{i} = (i_1, \ldots, i_d) \in \mathring{I}$ and let $k$ and $m$ be indices of the largest and second largest $i_j$, i.e.,

$$i_k \ge i_m \ge i_j \qquad \text{for all } j \in \{1, \ldots, d\} \setminus \{k, m\} \text{ with } k \ne m.$$

Inequality $i_m \le i_k$ and $\mathbf{i} \in \mathring{I}$ imply that $i_m \cdot \prod_{j \ne k} i_j \le \prod_{j=1}^{d} i_j \le N$. Therefore the tuple $\mathbf{t} := (i_1, \ldots, i_{k-1}, i_{k+1}, \ldots, i_d)$ belongs to $T$ and shows that $\mathbf{i} \in \mathring{I}_{\mathbf{t},k}$. This proves $\mathring{I} \subset \bigcup_{\mathbf{t} \in T} \bigcup_{k=1}^{d} \mathring{I}_{\mathbf{t},k}$, while direction '$\supset$' follows by the definition of $\mathring{I}_{\mathbf{t},k}$. We conclude that $\{\mathring{I}_{\mathbf{t},k} : \mathbf{t} \in T,\ 1 \le k \le d\}$ is a (not necessarily disjoint) decomposition of $\mathring{I}$, whose cardinality is denoted by $\tau_d(N)$. Each⁸ set $\mathring{I}_{\mathbf{t},k}$ gives rise to an elementary tensor (7.21) and proves $\mathbf{v}_{\mathrm{sg}} \in \mathcal{R}_r$, where⁹ $r := \tau_d(N) \le d \cdot \#T$.

It remains to estimate $\#T$. For $\mathbf{t} := (t_1, \ldots, t_{d-1}) \in T$ let $m$ be an index with $t_m = \max_{j=1}^{d-1}\{t_j\}$. From $t_m^2 \prod_{j \ne m} t_j \le N$ we conclude that $1 \le t_m \le \sqrt{N}$. In the following considerations, we distinguish the cases $t_m \le N^{1/d}$ and $N^{1/d} < t_m \le N^{1/2}$.

Inequality $t_m \le N^{1/d}$ implies $t_j \le N^{1/d}$ and the condition

$$\max_{j=1}^{d-1}\{t_j\} \cdot \prod_{j=1}^{d-1} t_j \le N.$$

The number of tuples $\mathbf{t} \in T$ with $\max_j\{t_j\} \le N^{1/d}$ is bounded by $N^{(d-1)/d}$.

Now we consider the case $N^{1/d} < t_m \le N^{1/2}$. The remaining components $t_j$ ($j \ne m$) satisfy $\prod_{j \ne m} t_j \le N / t_m^2$. We ignore the condition $t_j \le t_m$ and ask for all $(d-2)$-tuples $(t_j : j \in \{1, \ldots, d-1\} \setminus \{m\})$ with $\prod_{j \ne m} t_j \le N / t_m^2$. Its number is $\sigma_{d-2}(N / t_m^2) = O\big( \frac{N}{t_m^2} \log^{d-3}\big(\frac{N}{t_m^2}\big) \big)$ (cf. (7.20)). It remains to bound the sum $\int_{N^{1/d}}^{N^{1/2}} \ldots$

[…]

⁸ Since the sets $\mathring{I}_{\mathbf{t},k}$ may overlap, one must take care that each $v_{\mathbf{i}}$ is associated to only one $\mathring{I}_{\mathbf{t},k}$.
⁹ $\tau_d(N) < d \cdot \#T$ may occur, since $\mathring{I}_{\mathbf{t},k} = \mathring{I}_{\mathbf{t},k'}$ may hold for $k \ne k'$. An example is the decomposition from Figure 7.1, where $\tau_2(16) = 7$.
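[…] Using $\Psi_\rho^{[j]} \in \mathbf{V}_{[j]}'$ in (6.7b), we obtain $U_j^{\min}(\mathbf{v}(\lambda)) \subset \mathrm{span}\{w_\mu^{(j)} : 1 \le \mu \le m\}$. On the other hand, there are $\Phi_\nu^{[j]} \in \mathbf{V}_{[j]}'$ with $\Phi_\nu^{[j]}(v_\kappa^{[j]}) = \delta_{\nu\kappa}$ ($1 \le \nu, \kappa \le r$) but $\Phi_\nu^{[j]}(w_\mu^{[j]}) = 0$. This proves $U_j^{\min}(\mathbf{v}(\lambda)) \subset \mathrm{span}\{v_\nu^{(j)} : 1 \le \nu \le r\}$. Together, we obtain that

$$\mathrm{rank}_j(\mathbf{v}(\lambda)) = r + m = r_j + 1.$$

Hence $\mathbf{v}(\lambda) \notin \mathcal{T}_{\mathbf{r}}$ holds for all $\lambda > 0$, proving the second assertion. □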
Now we are in a position to prove Proposition 4.36, which states that the algebraic tensor space is a set of first category (cf. Uschmajew [289]).

Proof of Proposition 4.36. By assumption, at least two spaces of $V_1, \ldots, V_d$ are infinite dimensional, which excludes the case (a) of Lemma 8.7, and part (b) applies, showing that all sets $\mathcal{T}_{\mathbf{r}}$ have no interior point. Furthermore, the algebraic tensor space $\mathbf{V}_{\mathrm{alg}}$ is the union of the countably many sets $\mathcal{T}_{\mathbf{r}}$, $\mathbf{r} \in \mathbb{N}^d$. As stated in Lemma 8.6, $\mathcal{T}_{\mathbf{r}}$ is weakly closed, which implies that $\mathcal{T}_{\mathbf{r}}$ is closed. Hence $\mathbf{V}_{\mathrm{alg}}$ is of first category. □

⁵ This includes the case that $n_j = \infty$ holds for at most one index $j$.
8.2 Tensor Subspace Formats Nd The previous characterisation of a tensor by subspaces Uj ⊂ Vj with v ∈ j=1 Uj corresponds to a more theoretical level of linear algebra. The practical realisation requires a description of the subspaces Uj by bases or, more general, by frames6 . Even when a basis (in contrast to a frame) is the desired choice, there are intermediate situations in which frames cannot be avoided (cf. §8.6). Concerning the choice of the basis (frame) of Uj , we distinguish three levels: 1. Use of a general frame or basis (cf. §8.2.1). 2. An orthonormal basis is always a good starting point for stable numerical computations (cf. §8.2.4). 3. A special orthonormal basis is the HOSVD basis discussed in §8.3. After fixing a basis, we obtain the explicit representation (8.5b) which often is given as the primal definition of the Tucker format. To change a basis into another one, we study transformations in §8.2.2. Concerning the (non)uniqueness of the tensor subspace representation we refer to §8.4.1.
8.2.1 General Frame or Basis

In this section, the underlying spaces $V_j$ are (abstract) vector spaces. For instance, $V_j = L^2(\Omega_j)$ may be a function space, while $U_j \subset V_j$ is a finite-dimensional subspace spanned by concrete functions. The case $V_j = \mathbb{K}^{I_j}$ will be discussed in §8.2.3. Then the quantities $B_j$, $\mathbf{B}$, $B_\alpha$ introduced below become (Kronecker) matrices.
8.2.1.1 Notation

By Lemma 6.1 we can suppose that $\dim(U_j) < \infty$. We represent the subspace by

$$ U_j = \operatorname{span}\{ b_i^{(j)} : 1 \le i \le r_j \}. $$

The $b_i^{(j)}$ are called the frame vectors or generating vectors. They form the $r_j$-tuple

$$ B_j := \big[ b_1^{(j)}, b_2^{(j)}, \ldots, b_{r_j}^{(j)} \big] \in (V_j)^{r_j}. $$

Set

$$ \mathbf{J} = J_1 \times \ldots \times J_d \quad \text{with } J_j = \{ i : 1 \le i \le r_j \} \text{ for } 1 \le j \le d. $$

$B_j \in (V_j)^{r_j}$ can be considered as an element in $L(\mathbb{K}^{J_j}, V_j)$. Its action on $a \in \mathbb{K}^{J_j}$ is

$$ B_j a = \sum_{i \in J_j} a_i\, b_i^{(j)}. $$

⁶ The frame is a system of vectors generating the subspace without assuming linear independence. When the term 'frame' is used, this does not exclude the special case of a basis; otherwise, we use the term 'proper frame'. Note that a frame cannot be described by a set $\{ b_\nu^{(j)} : 1 \le \nu \le r_j \}$, since $b_\nu^{(j)} = b_\mu^{(j)}$ may hold for $\nu \ne \mu$.
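The frame data will be abbreviated by $B_j \in (V_j)^{r_j}$ for $1 \le j \le d$ with $r_j = \# J_j$. These quantities define the elementary Kronecker product

$$ \mathbf{B} := \bigotimes_{j=1}^d B_j \in L(\mathbb{K}^{\mathbf{J}}, \mathbf{V}). \tag{8.4} $$

The column of $\mathbf{B}$ corresponding to a multi-index $\mathbf{i} \in \mathbf{J}$ is $\mathbf{b}_{\mathbf{i}} = \bigotimes_{j=1}^d b_{i_j}^{(j)}$. Hence all columns of $\mathbf{B}$ form the frame [or basis] of $\mathbf{U} \subset \mathbf{V}$.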
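Using the data

$$ \begin{aligned}
& B_j \in (V_j)^{r_j} && \text{frame or basis of } U_j \text{ for } 1 \le j \le d, \\
& J_j := \{1, \ldots, r_j\} && \text{for } 1 \le j \le d, \\
& \mathbf{a} \in \bigotimes_{j=1}^d \mathbb{K}^{J_j} = \mathbb{K}^{\mathbf{J}} && \text{for } \mathbf{J} = J_1 \times \ldots \times J_d,
\end{aligned} \tag{8.5a} $$

we define

$$ \mathbf{v} = \mathbf{B}\mathbf{a} = \sum_{\mathbf{i} \in \mathbf{J}} \mathbf{a}_{\mathbf{i}} \bigotimes_{j=1}^d b_{i_j}^{(j)}
= \sum_{i_1=1}^{r_1} \sum_{i_2=1}^{r_2} \cdots \sum_{i_d=1}^{r_d} \mathbf{a}[i_1, i_2, \ldots, i_d]\; b_{i_1}^{(1)} \otimes b_{i_2}^{(2)} \otimes \ldots \otimes b_{i_d}^{(d)}. \tag{8.5b} $$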
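Note that $r_j \ge \dim(U_j)$. Equality $r_j = \dim(U_j)$ holds if and only if $B_j$ is a basis. According to §7.1, the representation (8.5b) is given by the mapping

$$ \rho_{\mathrm{TS}}\big(\mathbf{a}, (B_j)_{j=1}^d\big) := \sum_{\mathbf{i} \in \mathbf{J}} \mathbf{a}_{\mathbf{i}} \bigotimes_{j=1}^d b_{i_j}^{(j)} = \mathbf{B}\mathbf{a}. \tag{8.5c} $$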
The coefficient tensor $\mathbf{a}$ is also called the core tensor (in [284, page 287] Tucker used the term 'core matrix').

Formally, representation (8.5c) looks very similar to the full representation (7.3). However, there are two important differences. First, the index set $\mathbf{J}$ is hopefully much smaller than the original index set $\mathbf{I}$. Second, the vectors $\{ b_i^{(j)} \}$ are of a different nature. In the case of the full representation (7.3), $B_j$ is a fixed basis. For instance, for the space $\mathbf{V}$ of multivariate polynomials, the vectors $b_i^{(j)} = x_j^i$ represent the monomials, or for $\mathbf{V} = \mathbb{K}^{\mathbf{I}}$ the basis vectors $b_i^{(j)}$ are the unit vectors $e^{(i)} \in \mathbb{K}^{I_j}$ (cf. (2.2)). Because of the fixed (symbolic) meaning, these basis vectors need not be stored. The opposite is true for the representation (8.5c). Here $U_j$ and its frame depend on the tensor $\mathbf{v}$. Therefore the frame vectors $b_i^{(j)}$ must be stored explicitly.
8.2.1.2 Data Size and Further Properties

Remark 8.8 (general tensor subspace representation). (a) The storage requirements of the vectors $b_i^{(j)}$ depend on the nature of $U_j$ (cf. §7.5). Denoting the storage of each frame vector by $\operatorname{size}(U_j)$, the basis data require the memory size

$$ N_{\mathrm{mem}}^{\mathrm{TSR}}\big((B_j)_{j=1}^d\big) = \sum_{j=1}^d r_j \cdot \operatorname{size}(U_j). \tag{8.6a} $$

(b) The coefficient tensor $\mathbf{a} \in \mathbb{K}^{\mathbf{J}}$ is given by its full representation (cf. §7.2) and requires a storage of the size

$$ N_{\mathrm{mem}}^{\mathrm{TSR}}(\mathbf{a}) = \prod_{j=1}^d r_j. \tag{8.6b} $$

(c) For the optimal choice $U_j = U_j^{\min}(\mathbf{v})$ together with bases $B_j$ of $U_j$, the numbers $r_j$ are given by $r_j = \operatorname{rank}_j(\mathbf{v})$ (cf. Remark 8.4).

(d) If, at least for one $j$, $B_j$ is not a basis, the coefficient tensor $\mathbf{a} \in \mathbb{K}^{\mathbf{J}}$ is not uniquely defined.

The counterpart of Remark 7.11 reads as follows. Proposition 6.46 directly applies to $U_k = U_k^{\min}(\mathbf{v})$.

Remark 8.9. Suppose a representation of $\mathbf{v}$ with $r_j = \operatorname{rank}_j(\mathbf{v})$ for $1 \le j \le d$.
(a) If $\mathbf{v}$ satisfies a linear constraint $\varphi_k$ (as defined in §6.8), then $\varphi_k(b_i^{(k)}) = 0$ holds for all basis vectors $b_i^{(k)}$ from $B_k$.
(b) Let $\mathbf{V}$ be the intersection Banach space in (4.58b). Then algebraic tensors $\mathbf{v} \in \mathbf{V}$ imply that $b_i^{(j)} \in V_j^{(N_j)}$ for all $b_i^{(j)}$ from $B_j$ $(1 \le j \le d)$.

Set $n := \max_j \operatorname{size}(U_j)$ and $r := \max_j r_j$. Then the memory costs (8.6a,b) sum to at most $rdn + r^d$. How $rdn$ and $r^d$ compare depends on the sizes of $r$ and $d$. If $r$ is small compared with $n$ and if $d$ is small (say $d = 3$), $r^d < rdn$ may hold. For medium-sized $d$, the term $r^d$ easily becomes larger than $rdn$. For really large $d$, this term makes the representation infeasible.
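The comparison of the two storage terms can be made concrete with a few numbers. The following Python sketch evaluates (8.6a) and (8.6b) for equal ranks and equal frame-vector sizes; the function name `tsr_storage` and the sample values $n = 1000$, $r = 10$ are illustrative assumptions, not part of the text.

```python
from math import prod

def tsr_storage(ranks, frame_vector_sizes):
    """Return (frame storage, core storage) following (8.6a) and (8.6b)."""
    frame = sum(r * s for r, s in zip(ranks, frame_vector_sizes))  # (8.6a)
    core = prod(ranks)                                             # (8.6b)
    return frame, core

# Hypothetical sizes: every direction has n = 1000 and r = 10.
n, r = 1000, 10
for d in (3, 6, 10):
    frame, core = tsr_storage([r] * d, [n] * d)
    print(f"d={d:2d}: frame data r*d*n = {frame}, core tensor r^d = {core}")
```

For these assumed sizes the core tensor is the smaller part for $d = 3$ and the dominant part from $d = 6$ on.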
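8.2.2 Transformations

Let a basis $B_j = \big[ b_1^{(j)}, \ldots, b_{r_j}^{(j)} \big]$ and a new basis $B_j^{\mathrm{new}} = \big[ b_{1,\mathrm{new}}^{(j)}, \ldots, b_{r_j,\mathrm{new}}^{(j)} \big]$ be given. The transformation is described by an $r_j \times r_j$ matrix $T^{(j)}$:

$$ B_j = B_j^{\mathrm{new}}\, T^{(j)}, \quad \text{i.e.,} \quad b_i^{(j)} = \sum_{k=1}^{r_j} T_{ki}^{(j)}\, b_{k,\mathrm{new}}^{(j)} \quad \text{for } 1 \le i \le r_j. \tag{8.7a} $$

The reverse direction is given by $S^{(j)} = (T^{(j)})^{-1}$:

$$ B_j^{\mathrm{new}} = B_j\, S^{(j)}, \quad \text{i.e.,} \quad b_{k,\mathrm{new}}^{(j)} = \sum_{i=1}^{r_j} S_{ik}^{(j)}\, b_i^{(j)} \quad \text{for } 1 \le k \le r_j. \tag{8.7b} $$

The Kronecker matrix $\mathbf{T} = \bigotimes_{j=1}^d T^{(j)}$ describes the simultaneous transformation in all directions $1 \le j \le d$ (with $T^{(j)} = I$ if $B_j = B_j^{\mathrm{new}}$ remains unchanged). Then $\mathbf{B}$ in (8.4) is transformed into $\mathbf{B}_{\mathrm{new}} = \bigotimes_{j=1}^d B_j^{\mathrm{new}}$ via

$$ \mathbf{B} = \mathbf{B}_{\mathrm{new}} \mathbf{T}, \qquad \mathbf{B}_{\mathrm{new}} = \mathbf{B}\,\mathbf{S}, \qquad \mathbf{S} = \mathbf{T}^{-1} = \bigotimes_{j=1}^d S^{(j)}. \tag{8.7c} $$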
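Lemma 8.10 (basis transformation). Let $\mathbf{v} \in \mathbf{U} = \bigotimes_{j=1}^d U_j$ be described by (8.5a,b) and consider the transformation of $B_j^{\mathrm{new}}$ to the bases $B_j$ by (8.7a) with matrices $T^{(j)}$, i.e., $\mathbf{B} = \mathbf{B}_{\mathrm{new}} \mathbf{T}$. The corresponding transformation of the coefficient tensor is

$$ \mathbf{a}_{\mathrm{new}} := \mathbf{T}\,\mathbf{a} \quad \text{with } \mathbf{T} = \bigotimes_{j=1}^d T^{(j)}. \tag{8.7d} $$

Then

$$ \rho_{\mathrm{TS}}\big(\mathbf{a}, (B_j)_{j=1}^d\big) = \rho_{\mathrm{TS}}\big(\mathbf{a}_{\mathrm{new}}, (B_j^{\mathrm{new}})_{j=1}^d\big). \tag{8.7e} $$

Proof. $\mathbf{B}\mathbf{a} = (\mathbf{B}_{\mathrm{new}} \mathbf{T})\mathbf{a} = \mathbf{B}_{\mathrm{new}} (\mathbf{T}\mathbf{a}) = \mathbf{B}_{\mathrm{new}} \mathbf{a}_{\mathrm{new}}$ proves (8.7e). □

The elementwise formulation of (8.7d) reads as

$$ \mathbf{a}_{\mathrm{new}}[i_1, i_2, \ldots, i_d] = \sum_{k_1=1}^{r_1} \sum_{k_2=1}^{r_2} \cdots \sum_{k_d=1}^{r_d} T^{(1)}[i_1, k_1]\, T^{(2)}[i_2, k_2] \cdots T^{(d)}[i_d, k_d]\; \mathbf{a}[k_1, k_2, \ldots, k_d]. $$

In the case of bases we have $r_j = \dim(U_j)$. This may be different for frames.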
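Lemma 8.10 can be checked numerically in the matrix case. The sketch below, for $d = 3$ and real data, builds old bases from new ones via (8.7a), transforms the coefficient tensor by the elementwise form of (8.7d), and compares both representations. The sizes, the random data, and the helper name `rho` are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = (5, 6, 7), (2, 3, 4)          # hypothetical sizes n_j and r_j

# New bases B_j^new and transformation matrices T^(j); (8.7a) defines B_j.
B_new = [rng.standard_normal((n[j], r[j])) for j in range(3)]
T = [rng.standard_normal((r[j], r[j])) + 3 * np.eye(r[j]) for j in range(3)]
B_old = [B_new[j] @ T[j] for j in range(3)]

a = rng.standard_normal(r)
# Elementwise form of (8.7d): apply T^(j) to the j-th index of a.
a_new = np.einsum("ip,jq,kr,pqr->ijk", T[0], T[1], T[2], a)

def rho(core, bases):
    """v = B a for d = 3, cf. (8.5b)."""
    return np.einsum("ip,jq,kr,pqr->ijk", bases[0], bases[1], bases[2], core)

v_old = rho(a, B_old)        # representation with (a, B_j)
v_new = rho(a_new, B_new)    # representation with (a_new, B_j^new)
print(np.allclose(v_old, v_new))
```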
Remark 8.11 (frame transformation). Let $B_j$ and $B_j^{\mathrm{new}}$ be two frames of size $r_j$ and $r_j^{\mathrm{new}}$, respectively.

(a) Assume that the frames $B_j$ and $B_j^{\mathrm{new}}$ span the same space $U_j$. Then there are $T^{(j)}$ and $S^{(j)}$ satisfying (8.7a) and (8.7b) with $r_j^{\mathrm{new}}$ instead of $r_j$ as upper bound of $k$ in $b_{k,\mathrm{new}}^{(j)}$. The $r_j^{\mathrm{new}} \times r_j$ matrix $T^{(j)}$ and the $r_j \times r_j^{\mathrm{new}}$ matrix $S^{(j)}$ are, in general, not uniquely determined. Although $B_j = B_j S^{(j)} T^{(j)}$ and $B_j^{\mathrm{new}} = B_j^{\mathrm{new}} T^{(j)} S^{(j)}$, the products $S^{(j)} T^{(j)}$ and $T^{(j)} S^{(j)}$ need not be the identity matrix. (8.7d) and (8.7e) are still valid.

(b) Let $U_j = \operatorname{range}(B_j)$, while $U_j^{\mathrm{new}} = \operatorname{range}(B_j^{\mathrm{new}})$ is smaller and satisfies $U_j \supsetneqq U_j^{\mathrm{new}} \supset U_j^{\min}(\mathbf{v})$. This includes the interesting case of $U_j^{\mathrm{new}} = U_j^{\min}(\mathbf{v})$. Then $S^{(j)}$ with (8.7b) exists, but there is no $T^{(j)}$ fulfilling (8.7a), and therefore (8.7d) does not make sense. Here we have to proceed from the coefficients to the frame. Assume that the coefficient tensor $\mathbf{a}$ allows the formulation

$$ \mathbf{a} = \mathbf{S}\, \mathbf{a}_{\mathrm{new}}, \qquad \mathbf{S} = \bigotimes_{j=1}^d S^{(j)}. $$

Then the new frames $B_j^{\mathrm{new}} = B_j S^{(j)}$ satisfy $\mathbf{B}_{\mathrm{new}} := \mathbf{B}\,\mathbf{S}$ and (8.7e).

Exercise 8.12. Discuss the case of $\operatorname{range}(B_j) \subsetneqq \operatorname{range}(B_j^{\mathrm{new}})$.
8.2.3 Tensors in $\mathbb{K}^{\mathbf{I}}$

We now consider the tensor space $\mathbf{V} = \mathbb{K}^{\mathbf{I}} = \bigotimes_{j=1}^d \mathbb{K}^{I_j}$ with $\mathbf{I} = I_1 \times \ldots \times I_d$ and $\mathbf{U} = \bigotimes_{j=1}^d U_j$ with subspaces $U_j \subset \mathbb{K}^{I_j}$. The quantity $B_j$ representing a frame or basis is the matrix

$$ B_j := \big[ b_1^{(j)}, b_2^{(j)}, \ldots, b_{r_j}^{(j)} \big] \in \mathbb{K}^{I_j \times J_j} \qquad (1 \le j \le d), $$

where the index sets $J_j := \{1, \ldots, r_j\}$ form the product $\mathbf{J} = J_1 \times \ldots \times J_d$. Note that $r_j = \dim(U_j)$ in the case of a basis; otherwise, $r_j > \dim(U_j)$. For the sake of simplicity, we shall speak about 'the frame $B_j$ or basis $B_j$', although $B_j$ is a matrix and only the collection of its columns forms the frame or basis. The matrices $B_j$ generate the Kronecker matrix

$$ \mathbf{B} := \bigotimes_{j=1}^d B_j \in \mathbb{K}^{\mathbf{I} \times \mathbf{J}} $$

(cf. (8.4)). We repeat the formats (8.5a–c) and (8.10a,b) for $V_j = \mathbb{K}^{I_j}$ with the modification that the frames are expressed by matrices $B_j$.
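Lemma 8.13 (general tensor subspace representation). (a) The coefficient tensor

$$ \mathbf{a} \in \bigotimes_{j=1}^d \mathbb{K}^{J_j} = \mathbb{K}^{\mathbf{J}} \qquad \text{for } \mathbf{J} = J_1 \times \ldots \times J_d \tag{8.8a} $$

and the tuple $(B_j)_{1 \le j \le d}$ of frames represent the tensor $\mathbf{v} = \mathbf{B}\mathbf{a}$ with the entries

$$ \begin{aligned}
\mathbf{v}[i_1 \cdots i_d] &= \sum_{k_1=1}^{r_1} \sum_{k_2=1}^{r_2} \cdots \sum_{k_d=1}^{r_d} B_1[i_1, k_1]\, B_2[i_2, k_2] \cdots B_d[i_d, k_d]\; \mathbf{a}[k_1, k_2, \ldots, k_d] \\
&= \sum_{k_1=1}^{r_1} \sum_{k_2=1}^{r_2} \cdots \sum_{k_d=1}^{r_d} b_{k_1}^{(1)}[i_1]\, b_{k_2}^{(2)}[i_2] \cdots b_{k_d}^{(d)}[i_d]\; \mathbf{a}[k_1, k_2, \ldots, k_d],
\end{aligned} \tag{8.8b} $$

using the columns $b_k^{(j)} = B_j[\bullet, k]$ of $B_j$. Equation (8.8b) is equivalent to $\mathbf{v} = \mathbf{B}\mathbf{a}$. The representation by

$$ \rho_{\mathrm{frame}}\big(\mathbf{a}, (B_j)_{j=1}^d\big) = \mathbf{B}\mathbf{a} \tag{8.8c} $$

is identical to (8.5c), but now the data $B_j$ are stored as (fully populated) matrices.

(b) The storage cost required by $\mathbf{B}$ and $\mathbf{a}$ is

$$ N_{\mathrm{mem}}(\mathbf{B}) = N_{\mathrm{mem}}\big((B_j)_{j=1}^d\big) = \sum_{j=1}^d r_j \cdot \# I_j, \qquad N_{\mathrm{mem}}(\mathbf{a}) = \prod_{j=1}^d r_j. \tag{8.8d} $$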
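For $d = 3$, formula (8.8b) is a single multi-index contraction. The following sketch evaluates it with `numpy.einsum` and, for comparison only, with the explicitly formed Kronecker matrix $\mathbf{B}$, which is feasible only for tiny sizes. All sizes and the random data are illustrative assumptions; the row-major flattening used by `reshape` matches the index ordering of `numpy.kron`.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(1)
n, r = (4, 5, 6), (2, 3, 2)          # hypothetical #I_j and r_j
B = [rng.standard_normal((n[j], r[j])) for j in range(3)]
a = rng.standard_normal(r)

# (8.8b): contract every index of the core tensor with the matching frame matrix.
v = np.einsum("ip,jq,kr,pqr->ijk", B[0], B[1], B[2], a)

# Same tensor via the Kronecker matrix B = B_1 (x) B_2 (x) B_3.
B_kron = reduce(np.kron, B)                      # shape (4*5*6, 2*3*2)
v_kron = (B_kron @ a.reshape(-1)).reshape(n)

# Storage according to (8.8d).
storage_B = sum(nj * rj for nj, rj in zip(n, r))
storage_a = int(np.prod(r))
print(np.allclose(v, v_kron), storage_B, storage_a)
```

In practice one would never form `B_kron`; the mode-wise contraction needs only the small matrices $B_j$.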
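8.2.4 Orthonormal Basis

8.2.4.1 Definitions and Transformations

Let $\mathbf{V} = \bigotimes_{j=1}^d V_j$ be a [pre-]Hilbert space⁷ with scalar product $\langle \cdot, \cdot \rangle$ induced by the scalar products $\langle \cdot, \cdot \rangle_j$ of $V_j$. Consider again the representation (8.5a,b) of $\mathbf{v} \in \mathbf{U}$ with a basis $(B_j)_{1 \le j \le d}$. Let $G^{(j)} \in \mathbb{K}^{J_j \times J_j}$ be the Gram matrix (cf. (2.13)) with the entries

$$ G_{ik}^{(j)} := \big\langle b_k^{(j)}, b_i^{(j)} \big\rangle \qquad \text{for } 1 \le i, k \le r_j,\ 1 \le j \le d. \tag{8.9} $$

Note that $\langle B_j v, B_j w \rangle = \big\langle \sum_i v_i b_i^{(j)}, \sum_k w_k b_k^{(j)} \big\rangle = \langle G^{(j)} v, w \rangle_2$ $(v, w \in \mathbb{K}^{J_j})$, where $\langle \cdot, \cdot \rangle_2$ is the Euclidean scalar product in $\mathbb{K}^{J_j}$. This proves that the adjoint map $B_j^* \in L(V_j, \mathbb{K}^{J_j})$ of $B_j \in L(\mathbb{K}^{J_j}, V_j)$ leads to the products

$$ B_j^* B_j = G^{(j)} \in \mathbb{K}^{J_j \times J_j} \qquad \text{and} \qquad \mathbf{B}^* \mathbf{B} = \mathbf{G} := \bigotimes_{j=1}^d G^{(j)}. $$

⁷ The assumption of a [pre-]Hilbert space $\mathbf{V}$ can be relaxed. Even if $\mathbf{V}$ possesses no topology, one can define a Hilbert structure on $\mathbf{U} = \bigotimes_{j=1}^d U_j$. For a fixed basis $B_j$ of $U_j$, one defines the scalar product on $U_j$ by $\langle b_i^{(j)}, b_k^{(j)} \rangle := \delta_{ik}$. With the induced scalar product, $\mathbf{U}$ becomes a Hilbert space. If $\mathbf{V}$ is equipped with another norm, the $\mathbf{V}$ norm and the newly defined $\mathbf{U}$ norm are equivalent on $\mathbf{U}$ since $\dim(\mathbf{U}) < \infty$.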
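In the Hilbert space setting an orthonormal basis is obviously the desirable choice. An orthonormal basis is characterised by $B_j^* B_j = I$ for $1 \le j \le d$ or, equivalently, by $\mathbf{B}^* \mathbf{B} = \mathbf{I} \in \mathbb{K}^{\mathbf{J} \times \mathbf{J}}$. In the matrix case $B_j^*$ and $\mathbf{B}^*$ are written as $B_j^{\mathsf{H}}$ and $\mathbf{B}^{\mathsf{H}}$. This setting yields the next representation.

Definition 8.14 (orthonormal tensor subspace representation). (a) If the bases $B_j$ of $U_j$ are orthonormal, the representation (8.5a,b) of $\mathbf{v} \in \mathbf{U} = \bigotimes_{j=1}^d U_j$ is called an orthonormal tensor subspace representation in $\mathbf{U}$.

(b) The detailed parameters of the representation are

$$ \begin{aligned}
& r_j := \dim(U_j) && \text{for } 1 \le j \le d, \\
& B_j \in (V_j)^{r_j} && \text{orthonormal basis of } U_j, \\
& J_j := \{ i : 1 \le i \le r_j \} && \text{for } 1 \le j \le d, \\
& \mathbf{a} \in \bigotimes_{j=1}^d \mathbb{K}^{J_j} = \mathbb{K}^{\mathbf{J}} && \text{for } \mathbf{J} = J_1 \times \ldots \times J_d
\end{aligned} \tag{8.10a} $$

with

$$ \rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{j=1}^d\big) := \sum_{\mathbf{i} \in \mathbf{J}} \mathbf{a}_{\mathbf{i}} \bigotimes_{j=1}^d b_{i_j}^{(j)} = \mathbf{B}\mathbf{a}. \tag{8.10b} $$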
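(c) If $\mathbf{V} = \mathbb{K}^{\mathbf{I}}$ (cf. §8.2.3), $B_j \in (V_j)^{r_j}$ in (8.10a) becomes the orthogonal matrix⁸ $B_j \in \mathbb{K}^{I_j \times J_j}$, and $\mathbf{B} \in \mathbb{K}^{\mathbf{I} \times \mathbf{J}}$ is an orthogonal Kronecker matrix: $\mathbf{B}^{\mathsf{H}} \mathbf{B} = \mathbf{I}$.

Remark 8.15. The statements in Lemma 8.10 and Remark 8.11b are still valid. However, proper frames are excluded since $B_j$ and $B_j^{\mathrm{new}}$ are bases. Moreover, the transformations must be unitary, i.e., $(T^{(j)})^* T^{(j)} = I$, $(S^{(j)})^* S^{(j)} = I$, $\mathbf{T}^* \mathbf{T} = \mathbf{I}$, $\mathbf{S}^* \mathbf{S} = \mathbf{I}$.

If, starting from general frames $B_j$, we want to obtain orthonormal bases, we must find transformations such that $B_j^{\mathrm{new}}$ is an orthogonal matrix. For this purpose, two standard approaches can be applied. We recall Exercise 4.163: a QR decomposition of $\mathbf{B}$ or a Cholesky decomposition of $\mathbf{B}^* \mathbf{B}$ is equivalent to the respective decompositions of $B_j$ or $B_j^{\mathsf{H}} B_j$. The following lemma follows from Lemma 2.20. For the cost see §8.2.4.2.

Lemma 8.16. Let $\mathbf{v} = \mathbf{B}\mathbf{a}$ be given.
(a) Assume that $\mathbf{B}$ is a basis. Then $\mathbf{B}^* \mathbf{B} \in \mathbb{K}^{\mathbf{J} \times \mathbf{J}}$ is a positive-definite matrix. Its Cholesky decomposition $\mathbf{B}^* \mathbf{B} = \mathbf{L}\mathbf{L}^{\mathsf{H}}$ defines the transformation

$$ \mathbf{v} = (\mathbf{B}\,\mathbf{L}^{-\mathsf{H}})\, \mathbf{a}_{\mathrm{new}} \qquad \text{with } \mathbf{a}_{\mathrm{new}} := \mathbf{L}^{\mathsf{H}} \mathbf{a}. $$

$\mathbf{B}\,\mathbf{L}^{-\mathsf{H}}$ represents an orthonormal basis.
(b) In the matrix case $\mathbf{B} \in \mathbb{K}^{\mathbf{I} \times \mathbf{J}}$, the QR decomposition⁹ $\mathbf{B} = \mathbf{Q}\mathbf{R}$ yields

$$ \mathbf{v} = \mathbf{Q}\, \mathbf{a}_{\mathrm{new}} \qquad \text{with } \mathbf{a}_{\mathrm{new}} := \mathbf{R}\,\mathbf{a}. $$

By definition, $\mathbf{Q}$ is an orthogonal matrix representing an orthonormal basis.

⁸ 'Orthonormal basis $B_j$' and 'orthogonal matrix $B_j$' are equivalent expressions (cf. (2.3)).

⁹ Also here, $V_j$ may be general Hilbert spaces. In this case $Q_j$ is a tuple of orthonormal bases replacing the suitably ordered tuple $B_j$, while $R_j$ in $B_j = Q_j R_j$ is a usual matrix. Their tensor product yields $\mathbf{B} = \mathbf{Q}\mathbf{R}$.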
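Both approaches of Lemma 8.16 act direction by direction, so the Kronecker matrices never have to be formed. The sketch below is one possible NumPy realisation for matrices $B_j$ with full column rank. The helper names `mode_product`, `orthonormalise`, and `assemble` are hypothetical, and `numpy.linalg.qr` stands in for the book's procedure RQR, so a rank-deficient frame is not reduced to its rank here.

```python
import numpy as np

def mode_product(t, M, j):
    """Apply the matrix M to index j of the tensor t."""
    return np.moveaxis(np.tensordot(M, t, axes=(1, j)), 0, j)

def orthonormalise(a, bases, method="qr"):
    """Return (a_new, Q) with orthonormal Q[j] representing the same tensor."""
    Q = []
    for j, B in enumerate(bases):
        if method == "qr":
            Qj, Rj = np.linalg.qr(B)                      # B_j = Q_j R_j
            a = mode_product(a, Rj, j)                    # a_new = R a
        else:
            L = np.linalg.cholesky(B.conj().T @ B)        # B_j^H B_j = L L^H
            Qj = np.linalg.solve(L, B.conj().T).conj().T  # B_j L^{-H}
            a = mode_product(a, L.conj().T, j)            # a_new = L^H a
        Q.append(Qj)
    return a, Q

def assemble(core, mats):
    """v = (M_1 (x) ... (x) M_d) core, computed one direction at a time."""
    for j, M in enumerate(mats):
        core = mode_product(core, M, j)
    return core

rng = np.random.default_rng(2)
bases = [rng.standard_normal(s) for s in ((6, 3), (5, 2), (7, 4))]
a = rng.standard_normal((3, 2, 4))
v = assemble(a, bases)

a_qr, Q_qr = orthonormalise(a, bases, "qr")
a_ch, Q_ch = orthonormalise(a, bases, "cholesky")
print(np.allclose(assemble(a_qr, Q_qr), v), np.allclose(assemble(a_ch, Q_ch), v))
```

The Cholesky variant works with $B_j^{\mathsf{H}} B_j$ and therefore squares the condition number of $B_j$; for ill-conditioned bases the QR variant is usually the safer choice.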
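Remark 8.11b can be supplemented with orthogonality conditions.

Corollary 8.17. Let $\mathbf{v} = \rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{j=1}^d\big)$ be given. Assume $\mathbf{a} = \mathbf{S}\, \mathbf{a}_{\mathrm{new}}$ with an orthogonal matrix $\mathbf{S}$, i.e., $\mathbf{S}^{\mathsf{H}} \mathbf{S} = \mathbf{I}$. Then the new tensor subspace representation is also orthonormal:

$$ \rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{j=1}^d\big) = \rho_{\mathrm{orth}}\big(\mathbf{a}_{\mathrm{new}}, (B_j^{\mathrm{new}})_{j=1}^d\big) \qquad \text{with } \mathbf{B}_{\mathrm{new}} := \mathbf{B}\,\mathbf{S}. $$

Proof. Use $\mathbf{B}_{\mathrm{new}}^* \mathbf{B}_{\mathrm{new}} = \mathbf{S}^{\mathsf{H}} \mathbf{B}^* \mathbf{B} \mathbf{S} = \mathbf{S}^{\mathsf{H}} \mathbf{S} = \mathbf{I}$. □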
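In the case of Corollary 8.17, $\operatorname{range}(B_j^{\mathrm{new}}) \subset \operatorname{range}(B_j)$ holds. If both orthonormal bases span the same subspace, transformations must be unitary. Given unitary transformations $Q^{(j)}$ of $B_j$ into $B_j^{\mathrm{new}}$, i.e.,

$$ b_i^{(j)} = \sum_{k=1}^{r_j} Q_{ki}^{(j)}\, b_{k,\mathrm{new}}^{(j)}, \qquad b_{k,\mathrm{new}}^{(j)} = \sum_{i=1}^{r_j} \big(Q^{(j)\mathsf{H}}\big)_{ik}\, b_i^{(j)} \qquad \text{for } 1 \le i, k \le r_j,\ 1 \le j \le d, $$

the Kronecker product $\mathbf{Q} := \bigotimes_{j=1}^d Q^{(j)}$ is also unitary and the coefficients transform according to $\mathbf{a}_{\mathrm{new}} = \mathbf{Q}\mathbf{a}$, i.e., $\rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{j=1}^d\big) = \rho_{\mathrm{orth}}\big(\mathbf{a}_{\mathrm{new}}, (B_j^{\mathrm{new}})_{j=1}^d\big)$; cf. (8.7c,d) with $\mathbf{T} = \mathbf{Q}$ and $(Q^{(j)})^{-1} = Q^{(j)\mathsf{H}}$.

Above, the new coefficient tensor $\mathbf{a}_{\mathrm{new}}$ is obtained from $\mathbf{a}$ by some transformation $\mathbf{T}\mathbf{a}$. Alternatively, the coefficient tensor can be obtained directly from $\mathbf{v}$ via projection.

Lemma 8.18. (a) Let $\mathbf{v} \in \mathbf{U}$ and orthonormal bases $B_j$ $(1 \le j \le d)$ be given: $\mathbf{v} = \mathbf{B}\mathbf{a}$ with $\mathbf{B} := \bigotimes_{j=1}^d B_j$. Then the coefficient tensor $\mathbf{a}$ of $\mathbf{v}$ has the entries

$$ \mathbf{a}_{\mathbf{i}} := \Big\langle \mathbf{v}, \bigotimes_{j=1}^d b_{i_j}^{(j)} \Big\rangle, \quad \text{i.e.,} \quad \mathbf{a} = \mathbf{B}^* \mathbf{v}. \tag{8.11} $$

(b) For a general basis, the coefficient tensor in $\mathbf{v} = \mathbf{B}\mathbf{a}$ is equal to $\mathbf{a} = \mathbf{G}^{-1} \mathbf{b}$ with $\mathbf{b}_{\mathbf{k}} := \big\langle \mathbf{v}, \bigotimes_{j=1}^d b_{k_j}^{(j)} \big\rangle$ and the Gram matrix $\mathbf{G} = \bigotimes_{j=1}^d G^{(j)}$ in (8.9).

Exercise 8.19. (a) Prove that the orthonormal tensor subspace representation $\mathbf{v} = \sum_{\mathbf{i} \in \mathbf{J}} \mathbf{a}_{\mathbf{i}} \bigotimes_{j=1}^d b_{i_j}^{(j)}$ (cf. (8.10b)) implies that $\|\mathbf{v}\| = \|\mathbf{a}\|_2$, where $\|\cdot\| : \mathbf{V} \to \mathbb{R}$ is the norm associated with the induced scalar product of $\mathbf{V}$, while $\|\cdot\|_2$ is the Euclidean norm of $\mathbb{K}^{\mathbf{J}}$ (cf. Example 4.149).
(b) If a second tensor $\mathbf{w} = \sum_{\mathbf{i} \in \mathbf{J}} \mathbf{c}_{\mathbf{i}} \bigotimes_{j=1}^d b_{i_j}^{(j)}$ uses the same bases, the scalar products ($\langle \cdot, \cdot \rangle$ in $\mathbf{V}$ and $\langle \cdot, \cdot \rangle_2$ in $\mathbb{K}^{\mathbf{J}}$) coincide: $\langle \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{a}, \mathbf{c} \rangle_2$.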
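Lemma 8.18 and Exercise 8.19a are easy to observe numerically. The sketch below, for $d = 3$ and real data, recovers the coefficient tensor by the projection (8.11) for orthonormal bases and by $\mathbf{a} = \mathbf{G}^{-1}\mathbf{b}$ for general bases; the sizes and the random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = (6, 5, 7), (3, 2, 4)          # hypothetical sizes
a = rng.standard_normal(r)

# Orthonormal bases (Q factors of random matrices) and v = B a.
B = [np.linalg.qr(rng.standard_normal((n[j], r[j])))[0] for j in range(3)]
v = np.einsum("ip,jq,kr,pqr->ijk", B[0], B[1], B[2], a)

# (8.11): a = B^H v; the data are real, so the adjoint is the transpose.
a_proj = np.einsum("ip,jq,kr,ijk->pqr", B[0], B[1], B[2], v)

# General bases C_j: a = G^{-1} b with G the Kronecker product of Gram matrices.
C = [rng.standard_normal((n[j], r[j])) for j in range(3)]
w = np.einsum("ip,jq,kr,pqr->ijk", C[0], C[1], C[2], a)
b = np.einsum("ip,jq,kr,ijk->pqr", C[0], C[1], C[2], w)
G_inv = [np.linalg.inv(Cj.T @ Cj) for Cj in C]
a_gen = np.einsum("pu,qv,rw,uvw->pqr", G_inv[0], G_inv[1], G_inv[2], b)

print(np.allclose(a_proj, a), np.allclose(a_gen, a),
      np.isclose(np.linalg.norm(v), np.linalg.norm(a)))
```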
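In the case of $V_j = \mathbb{K}^{I_j}$, the orthonormal tensor subspace representation reads as

$$ \mathbf{v} = \rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{j=1}^d\big) = \Big( \bigotimes_{j=1}^d B_j \Big) \mathbf{a} \qquad \text{with } B_j^{\mathsf{H}} B_j = I. \tag{8.12} $$

The required memory size is the same as in (8.8d). According to (8.11), the coefficient tensor $\mathbf{a}$ can be obtained from $\mathbf{v}$ by

$$ \mathbf{a} = \mathbf{B}^{\mathsf{H}} \mathbf{v}. \tag{8.13} $$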
8.2.4.2 Orthonormalisation and Computational Cost

We now assume¹⁰ that a tensor $\mathbf{v} = \rho_{\mathrm{frame}}\big(\hat{\mathbf{a}}, (\hat{B}_j)_{j=1}^d\big)$ is given with a proper frame or non-orthonormal basis $\hat{B}_j \in \mathbb{K}^{n_j \times \hat{r}_j}$. Lemma 8.16 proposes two methods for generating orthonormal bases $B_j$. Another possibility is computing the HOSVD bases (cf. §8.3 and §8.3.3). The latter approach is more expensive, but it allows us to determine orthonormal bases of the minimal subspaces $U_j^{\min}(\mathbf{v})$, whereas the following methods yield bases of possibly larger subspaces $U_j := \operatorname{range}(\hat{B}_j)$.

We start with the QR decomposition. Given frames or bases $(\hat{B}_j)_{j=1}^d$, procedure RQR$(n_j, \hat{r}_j, r_j, \hat{B}_j, Q_j, R_j)$ in (2.26) yields the decomposition

$$ \hat{B}_j = Q_j R_j \qquad (\hat{B}_j \in \mathbb{K}^{n_j \times \hat{r}_j},\ Q_j \in \mathbb{K}^{n_j \times r_j},\ R_j \in \mathbb{K}^{r_j \times \hat{r}_j}) $$

with orthogonal matrices $Q_j$, where $r_j$ is the rank of $\hat{B}_j$, $Q_j$, and $R_j$. Defining the Kronecker matrices

$$ \hat{\mathbf{B}} := \bigotimes_{j=1}^d \hat{B}_j, \qquad \mathbf{Q} := \bigotimes_{j=1}^d Q_j, \qquad \mathbf{R} := \bigotimes_{j=1}^d R_j, $$

we get $\mathbf{v} = \hat{\mathbf{B}} \hat{\mathbf{a}} = \mathbf{Q}\mathbf{R}\hat{\mathbf{a}}$ (cf. (8.5c)). Besides the exact operation count, we give a bound in terms of

$$ \hat{r} := \max_j \hat{r}_j \qquad \text{and} \qquad n := \max_j n_j. \tag{8.14} $$

Remark 8.20. Use the notations from above. The computational cost of all QR decompositions $\hat{B}_j = Q_j R_j$ and the cost of the product $\mathbf{a} := \mathbf{R}\hat{\mathbf{a}}$ add to

$$ \sum_{j=1}^d \Big[ N_{\mathrm{QR}}(n_j, \hat{r}_j) + \prod_{k=1}^{j} r_k \cdot \prod_{k=j}^{d} \hat{r}_k \Big] \le 2 d n \hat{r}^2 + d\, \hat{r}^{d+1}. \tag{8.15} $$

Proof. The second term describes the cost of $\mathbf{R}\hat{\mathbf{a}}$ considered in (13.27a). Because of the triangular structure, a factor of two can be saved. □

¹⁰ Also tensors represented in the r-term format are related to subspaces $U_j$, for which orthonormal bases can be determined (cf. Lemma 6.1). We can convert such tensors into the format $\rho_{\mathrm{TS}}$ according to §8.5.2.2 without arithmetical cost and apply the present algorithms.
The first approach in Lemma 8.16 is based on the Cholesky decomposition, provided that $(\hat{B}_j)_{j=1}^d$ represents bases. Because of the latter assumption, $\hat{r}_j = r_j$ holds. Note that, in particular for the case $r \ll n$, the resulting cost in (8.16) is almost the same as in (8.15).

Remark 8.21. With the notations from above and $\bar{r} := \max_j r_j$, the computational cost of the Cholesky approach in Lemma 8.16a is

$$ \sum_{j=1}^d \Big[ 2 n_j r_j^2 + \frac{1}{3} r_j^3 + r_j \prod_{k=1}^d r_k \Big] \le d \Big( 2n + \frac{\bar{r}}{3} \Big) \bar{r}^2 + d\, \bar{r}^{d+1}. \tag{8.16} $$

Proof. The product $\hat{B}_j^{\mathsf{H}} \hat{B}_j$ takes $\frac{1}{2}(2n_j - 1) r_j (r_j + 1) \approx n_j r_j^2$ operations. The Cholesky decomposition into $L_j L_j^{\mathsf{H}}$ requires $\frac{1}{3} r_j^3$ operations (cf. Remark 2.19). Further $n_j r_j^2$ operations are needed to build the new basis $B_j := \hat{B}_j L_j^{-\mathsf{H}}$. The transformation $\mathbf{a} = \mathbf{L}^{\mathsf{H}} \hat{\mathbf{a}}$ costs $\sum_{j=1}^d r_j \cdot \prod_{j=1}^d r_j$ operations (cf. Remark 2.19). □

For larger $d$, the major part of the computational cost in Remark 8.20 is $d\, \bar{r}^{d+1}$, which is caused by the fact that $\hat{\mathbf{a}}$ is organised as a full tensor in $\mathbb{K}^{\mathbf{J}}$. Instead, the hybrid format discussed in §8.2.6 uses the r-term format for $\hat{\mathbf{a}}$. The resulting cost of $\mathbf{R}\hat{\mathbf{a}}$ ($\mathbf{R}$ and $\hat{\mathbf{a}}$ as in Remark 8.20) described in (13.28b) is given in the following corollary.

Corollary 8.22. Let the coefficient tensor $\hat{\mathbf{a}} \in \mathcal{R}_r$ be given in r-term format. The following transformations yield the new coefficient tensor $\mathbf{a}$ in the same format.

(a) Using the QR decompositions from Remark 8.20, the cost of $\mathbf{a} := \mathbf{R}\hat{\mathbf{a}}$ is

$$ r \sum_{j=1}^d r_j (2\hat{r}_j - r_j) \lesssim d\, r\, \hat{r}^2, $$

while the QR cost $\sum_{j=1}^d N_{\mathrm{QR}}(n_j, \hat{r}_j)$ does not change. The total cost is bounded by $d(2n + r)\hat{r}^2$.

(b) In the Cholesky approach from Remark 8.21 the coefficient tensor $\mathbf{a} := \mathbf{L}^{\mathsf{H}} \hat{\mathbf{a}}$ can be obtained by $r \sum_{j=1}^d r_j^2$ operations, yielding the total bound $d\big(2n + \frac{\bar{r}}{3} + r\big) \bar{r}^2$.
8.2.5 Summary of the Formats

The representation schemes $\rho_{\mathrm{TS}}$, $\rho_{\mathrm{frame}}$, $\rho_{\mathrm{orth}}$ are mentioned in (8.5c), (8.8c), (8.10b). The format $\rho_{\mathrm{HOSVD}}$ will be discussed in §8.3 (cf. (8.25)). $\rho_{\mathrm{TS}}$ is the most general case. If $V_j = \mathbb{K}^{I_j}$, it becomes $\rho_{\mathrm{frame}}$. If the bases are orthonormal, $\rho_{\mathrm{frame}}$ turns into $\rho_{\mathrm{orth}}$. If the orthonormal bases are the particular HOSVD bases in Definition 8.24, $\rho_{\mathrm{orth}}$ becomes $\rho_{\mathrm{HOSVD}}$.
Finally, we mention a generalisation. The mentioned representations can be used to represent several tensors simultaneously in the same tensor subspace:

$$ \mathbf{v}^{(1)}, \ldots, \mathbf{v}^{(m)} \in \mathbf{U} = \bigotimes_{j=1}^d U_j. $$

In this case the data $(B_j)_{1 \le j \le d}$ need to be stored only once. Each tensor $\mathbf{v}^{(\mu)}$ requires a coefficient tensor $\mathbf{a}^{(\mu)}$ $(1 \le \mu \le m)$. The required data size is $\bar{r} d n + m \bar{r}^d$, where $n := \max_j \# I_j$ and $\bar{r} := \max_j r_j$.
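8.2.6 Hybrid Format

Let $\mathbf{v} = \sum_{\mathbf{i} \in \mathbf{J}} \mathbf{a}_{\mathbf{i}} \bigotimes_{j=1}^d b_{i_j}^{(j)}$ be the standard tensor subspace representation $\rho_{\mathrm{TS}}$ or $\rho_{\mathrm{orth}}$. An essential drawback of this format is the fact that the coefficient tensor $\mathbf{a} \in \mathbb{K}^{\mathbf{J}}$ is still represented in full format. Although $\mathbf{J} = J_1 \times \ldots \times J_d$ might be of much smaller size than $\mathbf{I} = I_1 \times \ldots \times I_d$, the exponential increase of $\#\mathbf{J}$ with respect to $d$ proves disadvantageous. An obvious idea is to represent $\mathbf{a}$ itself by one of the tensor formats described so far. However, using again a tensor subspace representation for $\mathbf{a}$ does not yield a new format, as seen in Remark 8.23. An interesting approach is to use the r-term representation for the coefficient tensor $\mathbf{a}$. Often, such an approach goes together with an approximation, but here we consider an exact representation of $\mathbf{a}$ by

$$ \mathbf{a} = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} a_\nu^{(j)} \in \mathbb{K}^{\mathbf{J}} \qquad \text{with } a_\nu^{(j)} \in \mathbb{K}^{J_j}. \tag{8.17} $$

The tensor subspace format $\mathbf{v} = \rho_{\mathrm{TS}}\big(\mathbf{a}, (B_j)_{j=1}^d\big)$ combined with the r-term representation $\mathbf{a} = \rho_{r\text{-term}}\big(r, (a_\nu^{(j)})_{1 \le j \le d,\, 1 \le \nu \le r}\big)$ in (8.17) yields the hybrid format, which may be interpreted in two different ways.

The first interpretation views $\mathbf{v}$ as a particular tensor from $\mathcal{T}_{\mathbf{r}}$ (with $r_j = \# J_j$) described by the iterated representation¹¹

$$ \begin{aligned}
\rho_{\mathrm{hybr}}\big(r, (a_\nu^{(j)})_{1 \le j \le d,\, 1 \le \nu \le r}, (B_j)_{j=1}^d\big)
&:= \rho_{\mathrm{TS}}\Big( \rho_{r\text{-term}}\big(r, (a_\nu^{(j)})_{1 \le j \le d,\, 1 \le \nu \le r}\big), (B_j)_{j=1}^d \Big) \\
&= \sum_{\mathbf{i} \in \mathbf{J}} \Big( \sum_{\nu=1}^{r} \prod_{j=1}^{d} a_\nu^{(j)}[i_j] \Big) \bigotimes_{j=1}^{d} b_{i_j}^{(j)}
\end{aligned} \tag{8.18} $$

with $\rho_{\mathrm{TS}}$ from (8.5c) and $\mathbf{a} = \rho_{r\text{-term}}(\ldots)$ from (7.7a). Similarly, we may define

$$ \rho_{\mathrm{orth}}^{\mathrm{hybr}}\big(r, (a_\nu^{(j)}), (B_j)_{j=1}^d\big) := \rho_{\mathrm{orth}}\Big( \rho_{r\text{-term}}\big(r, (a_\nu^{(j)})\big), (B_j)_{j=1}^d \Big), \tag{8.19} $$

provided that $B_j$ describes an orthonormal basis.

¹¹ Carroll–Pruzansky–Kruskal [49] call this format CANDELINC. The last four letters abbreviate 'linear constraints', which means that the vectors $a_\nu^{(j)}$ belong to certain subspaces.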
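The second interpretation views $\mathbf{v}$ as a particular tensor from $\mathcal{R}_r$:

$$ \mathbf{v} = \sum_{\nu=1}^{r} \sum_{\mathbf{i} \in \mathbf{J}} \bigotimes_{j=1}^{d} a_\nu^{(j)}[i_j]\, b_{i_j}^{(j)} = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} \sum_{i \in J_j} a_\nu^{(j)}[i]\, b_i^{(j)}. $$

The right-hand side may be seen as $\mathbf{v} = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} v_\nu^{(j)}$, where, according to modification (7.14), $v_\nu^{(j)}$ is described by the basis $\{ b_i^{(j)} : i \in J_j \}$, which yields the matrix $B_j$. The format is abbreviated by

$$ \rho_{r\text{-term}}^{\mathrm{hybr}}\big(r, \mathbf{J}, (a_\nu^{(j)})_{1 \le j \le d,\, 1 \le \nu \le r}, (B_j)_{j=1}^d\big) = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} \Big( \sum_{i \in J_j} a_\nu^{(j)}[i]\, b_i^{(j)} \Big). \tag{8.20} $$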
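The two interpretations (8.18) and (8.20) use the same data and can be compared directly. In the following sketch for $d = 3$, the variable `R` plays the role of the representation rank $r$, and column $\nu$ of `A[j]` holds the vector $a_\nu^{(j)}$; all sizes and the random data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r_sub, R = (6, 5, 7), (3, 2, 4), 5            # hypothetical n_j, r_j and r
B = [rng.standard_normal((n[j], r_sub[j])) for j in range(3)]
A = [rng.standard_normal((r_sub[j], R)) for j in range(3)]   # A[j][:, nu] = a_nu^(j)

# First interpretation (8.18): form the full core tensor a, then apply the bases.
core = np.einsum("pn,qn,rn->pqr", A[0], A[1], A[2])          # (8.17)
v_ts = np.einsum("ip,jq,kr,pqr->ijk", B[0], B[1], B[2], core)

# Second interpretation (8.20): r-term tensor with factors v_nu^(j) = B_j a_nu^(j).
V = [B[j] @ A[j] for j in range(3)]
v_rterm = np.einsum("in,jn,kn->ijk", V[0], V[1], V[2])

print(np.allclose(v_ts, v_rterm))
```

The second route never builds the core tensor with its $\prod_j r_j$ entries, which is the point of the hybrid format for larger $d$.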
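Note that the formats (8.18) and (8.20) are equivalent in the sense that they use the same data representing the same tensor. Another characterisation of a tensor $\mathbf{v}$ in the hybrid format is $\mathbf{v} \in \mathcal{R}_r \cap \mathcal{T}_{\mathbf{r}}$ with $r$, $\mathbf{r} = (r_1, \ldots, r_d)$, $r_j = \# J_j$ in (8.18) and (8.20). The hybrid format is intensively used in Espig [87, Satz 2.2.4] (cf. §9.6.1) and in Khoromskij–Khoromskaia [186].

Finally, we discuss what happens when we again represent the coefficient tensor $\mathbf{a}$ in tensor subspace format.

Remark 8.23. (a) Consider the following nested tensor subspace formats:

$$ \mathbf{v} = \sum_{\mathbf{i} \in \mathbf{J}} \mathbf{a}_{\mathbf{i}} \bigotimes_{j=1}^{d} b_{i_j}^{(j)}, \qquad \mathbf{a} = \sum_{\mathbf{k} \in \mathbf{K}} \mathbf{c}_{\mathbf{k}} \bigotimes_{j=1}^{d} \beta_{k_j}^{(j)}, \tag{8.21a} $$

where $\mathbf{v} \in \mathbf{V} = \mathbb{K}^{\mathbf{I}}$ with $\mathbf{I} = I_1 \times \ldots \times I_d$, $b_{i_j}^{(j)} \in V_j = \mathbb{K}^{I_j}$, $\mathbf{a} \in \mathbb{K}^{\mathbf{J}}$, $\beta_{k_j}^{(j)} \in \mathbb{K}^{J_j}$, $\mathbf{c} \in \mathbb{K}^{\mathbf{K}}$ with $\mathbf{K} = K_1 \times \ldots \times K_d$. Then $\mathbf{v}$ has the standard tensor subspace representation

$$ \mathbf{v} = \sum_{\mathbf{k} \in \mathbf{K}} \mathbf{c}_{\mathbf{k}} \bigotimes_{j=1}^{d} \hat{b}_{k_j}^{(j)} \qquad \text{with } \hat{b}_k^{(j)} \in \mathbb{K}^{I_j} \text{ defined by } \hat{b}_k^{(j)} := \sum_{i \in J_j} \beta_k^{(j)}[i]\, b_i^{(j)} \quad (k \in K_j). \tag{8.21b} $$

(b) Using $B_j := [b_1^{(j)} \cdots b_{r_j}^{(j)}]$ $(r_j := \# J_j)$, $\mathbf{B} := \bigotimes_{j=1}^d B_j$, $\beta_j := [\beta_1^{(j)} \cdots \beta_{s_j}^{(j)}]$ $(s_j := \# K_j)$, $\boldsymbol{\beta} := \bigotimes_{j=1}^d \beta_j$, and $\hat{B}_j := [\hat{b}_1^{(j)} \cdots \hat{b}_{s_j}^{(j)}]$, we rewrite (8.21a,b) as

$$ \mathbf{v} = \mathbf{B}\mathbf{a}, \qquad \mathbf{a} = \boldsymbol{\beta}\mathbf{c}, \qquad \mathbf{v} = \hat{\mathbf{B}}\mathbf{c} \quad \text{with } \hat{\mathbf{B}} := \mathbf{B}\boldsymbol{\beta}. \tag{8.21c} $$

The equations in (8.21c) can be interpreted as a transformation: set $\mathbf{a}_{\mathrm{new}} = \mathbf{c}$, $\mathbf{S} = \boldsymbol{\beta}$, and $\mathbf{B}_{\mathrm{new}} = \hat{\mathbf{B}}$ in Remark 8.11b. The computation of $\hat{\mathbf{B}}$, i.e., of all products $\hat{B}_j = B_j \beta_j$, requires $2 \sum_{j=1}^d n_j r_j s_j$ operations, where $n_j := \# I_j$.

(c) Orthonormal tensor subspace representations for $\mathbf{v}$ and $\mathbf{a}$ in (8.21a) again yield an orthonormal tensor subspace representation in (8.21b).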
8.3 Higher-Order Singular-Value Decomposition (HOSVD)

The typical symbols for the matrices appearing in the singular-value decomposition are $U$, $V$ or even $U_j$, $V_j$ (cf. (2.16b)). These matrices are to be distinguished from the (sub)spaces $U_j$, $U_j^{\min}$, $V_j$ with similar or even equal names.

As stated in Remark 3.50a, there is no true generalisation of the singular-value decomposition (SVD) for $d \ge 3$. However, it is possible to extend parts of the SVD structure to higher dimensions as sketched below. Considering a (reduced) singular-value decomposition of a matrix, we observe the following properties:

(a1) $M = U \Sigma V^{\mathsf{T}} = \sum_{i=1}^{r} \sigma_i u_i v_i^{\mathsf{T}}$ can be exploited, e.g., for rank truncation.
(a2) In fact, $M_s := \sum_{i=1}^{s} \sigma_i u_i v_i^{\mathsf{T}}$ is the best approximation of rank $s$.
(b1) We may use $u_i$ and $v_i$ as new basis vectors.
(b2) The basis transformation from (b1) maps $M$ into diagonal form.

HOSVD¹² will also be helpful for truncation (as in (a1)), and, in fact, this property will be a very important tool in practice. However, the result of truncation is not necessarily optimal; i.e., (a2) does not extend to $d \ge 3$. As in (b1), $\{u_i\}$ will provide a new basis, while $\{v_i\}$ is not used. The tensor expressed with respect to these bases is by no means diagonal, not even sparse; i.e., (b2) has no tensor counterpart.
8.3.1 Definitions

We start with the tensor space $\mathbf{V} = \bigotimes_{j=1}^d \mathbb{K}^{I_j}$. Given $\mathbf{v} \in \mathbf{V}$, we consider the matricisation $M := \mathcal{M}_j(\mathbf{v})$, which is a matrix of the size $I_j \times \mathbf{I}_{[j]}$ with $\mathbf{I}_{[j]} = \times_{k \ne j} I_k$. Its reduced singular-value decomposition is

$$ M = U \Sigma V^{\mathsf{T}} = \sum_{i=1}^{r_j} \sigma_i u_i v_i^{\mathsf{T}} \in \mathbb{K}^{I_j \times \mathbf{I}_{[j]}}, \tag{8.22} $$

where $u_i$ and $v_i$ are the columns of the respective orthogonal matrices $U \in \mathbb{K}^{I_j \times r_j}$ and $V \in \mathbb{K}^{\mathbf{I}_{[j]} \times r_j}$, $\sigma_1 \ge \sigma_2 \ge \ldots > 0$ are the singular values, and $r_j = \operatorname{rank}(M) = \operatorname{rank}_j(\mathbf{v})$ (cf. (5.6b)). While $U$ may be of reasonable size, $V \in \mathbb{K}^{\mathbf{I}_{[j]} \times r_j}$ is expected to have a huge number of rows, which we do not want to compute. Fortunately, it turns out that the matrix $V$ is not needed.

We recall the 'left-sided singular-value decomposition': as mentioned in Remark 2.28b, we may ask only for $U$ and $\Sigma$ in the singular-value decomposition $M = U \Sigma V^{\mathsf{T}}$, and the computation of $U$ and $\Sigma$ may be based on $M M^{\mathsf{H}} = U \Sigma^2 U^{\mathsf{H}}$. The diagonal matrix $\Sigma$ controls the truncation procedure (see item (a1) from above), while $U$ defines an orthonormal basis (see item (b1)). We remark that $\operatorname{range}(U) = \operatorname{range}(M) = U_j^{\min}(\mathbf{v})$ (cf. Remark 8.4).

¹² Also the term multilinear singular-value decomposition (MLSVD) is used; cf. De Lathauwer–De Moor–Vandewalle [70].
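The left-sided singular-value decomposition can be sketched with a Hermitian eigenvalue solver applied to $M M^{\mathsf{H}}$. The helper names `matricise` and `left_svd` and the tolerance are assumptions of this illustration, not the book's procedure LSVD. Because $M M^{\mathsf{H}}$ squares the singular values, values below roughly the square root of machine precision times $\sigma_1$ are not resolved, hence the fairly large default tolerance.

```python
import numpy as np

def matricise(v, j):
    """M_j(v): index j becomes the row index, all other indices are merged."""
    return np.moveaxis(v, j, 0).reshape(v.shape[j], -1)

def left_svd(M, tol=1e-6):
    """Return (U, sigma) from M M^H = U diag(sigma^2) U^H, dropping tiny sigma."""
    lam, U = np.linalg.eigh(M @ M.conj().T)   # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]            # reorder: largest first
    sigma = np.sqrt(np.clip(lam, 0.0, None))  # clip rounding-induced negatives
    keep = sigma > tol * sigma[0]
    return U[:, keep], sigma[keep]

rng = np.random.default_rng(5)
# A 6 x 5 x 4 tensor with rank_1(v) = 2 by construction.
v = np.einsum("ip,pjk->ijk",
              rng.standard_normal((6, 2)), rng.standard_normal((2, 5, 4)))
U, sigma = left_svd(matricise(v, 0))
print(U.shape, sigma)
```

Only the small matrix $M M^{\mathsf{H}}$ of size $\# I_j \times \# I_j$ is decomposed; the large factor $V$ never appears.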
Unlike the case $d = 2$, we have $d$ different matricisations $\mathcal{M}_j(\mathbf{v})$, leading to a tuple of $d$ different decompositions (8.22), called the higher-order singular-value decomposition (HOSVD) by De Lathauwer–De Moor–Vandewalle [70]. To distinguish the matricisations, we ornament the quantities of (8.22) with the index $j$ referring to $\mathcal{M}_j(\mathbf{v}) \in V_j \otimes \mathbf{V}_{[j]}$.

In the next definition, $\mathbf{V}$ is a general Hilbert tensor space. This space, as well as all $\mathbf{V}_{[j]} = \bigotimes_{k \ne j} V_k$, is equipped with the corresponding induced scalar product.

Definition 8.24 (HOSVD basis). Let $\mathbf{v} \in {}_{\|\cdot\|}\bigotimes_{j=1}^d U_j^{\min}(\mathbf{v}) \subset {}_{\|\cdot\|}\bigotimes_{j=1}^d V_j$. An orthonormal basis $B_j = (b_1^{(j)}, \ldots, b_{r_j}^{(j)})$ of $U_j^{\min}(\mathbf{v})$ is called the $j$-th HOSVD basis for $\mathbf{v}$ if the following (singular-value) decomposition is valid:

$$ \mathcal{M}_j(\mathbf{v}) = \sum_{i=1}^{r_j} \sigma_i^{(j)}\, b_i^{(j)} \otimes m_i^{[j]} \quad \text{with } \sigma_1^{(j)} \ge \sigma_2^{(j)} \ge \ldots > 0 \text{ and orthonormal } \{ m_i^{[j]} : 1 \le i \le r_j \} \subset \mathbf{V}_{[j]} := {}_{\|\cdot\|}\bigotimes_{k \ne j} V_k. \tag{8.23} $$

The $\sigma_i^{(j)}$ are called the singular values of the $j$-th matricisation. For infinite-dimensional Hilbert spaces $V_j$ and topological tensors, $r_j = \infty$ may occur.

Similarly, for a subset $\emptyset \ne \alpha \subsetneqq \{1, \ldots, d\}$, an orthonormal basis $(b_i^{(\alpha)})_{i=1}^{r_\alpha}$ of $\mathbf{U}_\alpha^{\min}(\mathbf{v})$ is called an $\alpha$-HOSVD basis for $\mathbf{v}$ if

$$ \mathcal{M}_\alpha(\mathbf{v}) = \sum_{i=1}^{r_\alpha} \sigma_i^{(\alpha)}\, b_i^{(\alpha)} \otimes m_i^{(\alpha^c)} \quad \text{with } \sigma_1^{(\alpha)} \ge \sigma_2^{(\alpha)} \ge \ldots > 0 \text{ and orthonormal } \{ m_i^{(\alpha^c)} : 1 \le i \le r_\alpha \} \subset \mathbf{V}_{\alpha^c}. \tag{8.24} $$

Definition 8.25 (HOSVD representation). A tensor subspace representation $\mathbf{v} = \rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{1 \le j \le d}\big)$ is a higher-order singular-value decomposition (HOSVD) (or HOSVD tensor subspace representation or, for short, HOSVD representation) of $\mathbf{v}$ if all bases $B_j$ $(1 \le j \le d)$ are HOSVD bases for $\mathbf{v}$.¹³ For $B_j$ satisfying these conditions, we write

$$ \mathbf{v} = \rho_{\mathrm{HOSVD}}\big(\mathbf{a}, (B_j)_{1 \le j \le d}\big). \tag{8.25} $$

The storage requirements of HOSVD are those for the general case described in Lemma 8.13b.

The next statement follows from Lemma 5.6. It allows us to replace the decomposition of $\mathbf{v}$ by the cheaper decomposition of the coefficient tensor $\mathbf{a}$.

¹³ Because of the orthogonality property (8.23) for all $1 \le j \le d$, De Lathauwer et al. [69], [173] call such a tensor representation all-orthogonal.
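Lemma 8.26. (a) A tensor $\mathbf{v} = \mathbf{B}\mathbf{a}$ with $\mathbf{B} = \bigotimes_{j=1}^d B_j$ yields

$$ \mathcal{M}_j(\mathbf{v}) = B_j\, \mathcal{M}_j(\mathbf{a})\, \mathbf{B}_{[j]}^{\mathsf{T}} \qquad \text{with } \mathbf{B}_{[j]} = \bigotimes_{k \ne j} B_k. $$

If, at least for $k \ne j$, the bases $B_k$ are orthonormal, the matricisations of $\mathbf{v}$ and $\mathbf{a}$ are related by

$$ \mathcal{M}_j(\mathbf{v})\, \mathcal{M}_j(\mathbf{v})^{\mathsf{H}} = B_j\, \mathcal{M}_j(\mathbf{a})\, \mathcal{M}_j(\mathbf{a})^{\mathsf{H}} B_j^{\mathsf{H}}. $$

(b) If $B_j$ also describes an orthonormal basis, a diagonalisation of the matrix

$$ \mathcal{M}_j(\mathbf{a})\, \mathcal{M}_j(\mathbf{a})^{\mathsf{H}} = \hat{U}_j \Sigma_j^2 \hat{U}_j^{\mathsf{H}} $$

yields the left-sided singular-value decomposition

$$ \mathcal{M}_j(\mathbf{v})\, \mathcal{M}_j(\mathbf{v})^{\mathsf{H}} = U_j \Sigma_j^2 U_j^{\mathsf{H}} \qquad \text{with } U_j := B_j \hat{U}_j. $$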
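As a consequence, HOSVD representations of $\mathbf{v}$ and $\mathbf{a}$ are closely connected.

Corollary 8.27. Let $\mathbf{v} \in \bigotimes_{j=1}^d \mathbb{K}^{I_j}$ be given by an orthonormal tensor subspace representation (8.12): $\mathbf{v} = \rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{1 \le j \le d}\big)$ with $B_j^{\mathsf{H}} B_j = I$. Then $(B_j)_{1 \le j \le d}$ describes the $j$-th HOSVD basis of $\mathbf{v}$ if and only if

$$ \mathcal{M}_j(\mathbf{a})\, \mathcal{M}_j(\mathbf{a})^{\mathsf{H}} = \Sigma_j^2 \qquad \text{with } \Sigma_j = \operatorname{diag}\{ \sigma_1^{(j)}, \sigma_2^{(j)}, \ldots \} \text{ and } \sigma_1^{(j)} \ge \sigma_2^{(j)} \ge \ldots > 0. $$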
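The two identities of Lemma 8.26a can be observed in a small real example with $d = 3$. In the sketch below, the matricisation merges the remaining indices in row-major order, which matches the index ordering of `numpy.kron`; the sizes, the random data, and the chosen direction `j` are illustrative assumptions.

```python
import numpy as np
from functools import reduce

def matricise(t, j):
    """M_j(t) with the remaining indices merged in row-major order."""
    return np.moveaxis(t, j, 0).reshape(t.shape[j], -1)

rng = np.random.default_rng(6)
n, r = (5, 4, 6), (2, 3, 2)
B = [np.linalg.qr(rng.standard_normal((n[k], r[k])))[0] for k in range(3)]  # orthonormal
a = rng.standard_normal(r)
v = np.einsum("ip,jq,kr,pqr->ijk", B[0], B[1], B[2], a)

j = 1                                                        # direction under test
B_rest = reduce(np.kron, [B[k] for k in range(3) if k != j])  # B_[j]

# M_j(v) = B_j M_j(a) B_[j]^T
lhs = matricise(v, j)
rhs = B[j] @ matricise(a, j) @ B_rest.T

# M_j(v) M_j(v)^H = B_j M_j(a) M_j(a)^H B_j^H (needs orthonormal B_k, k != j)
gram_v = lhs @ lhs.T
gram_a = B[j] @ (matricise(a, j) @ matricise(a, j).T) @ B[j].T

print(np.allclose(lhs, rhs), np.allclose(gram_v, gram_a))
```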
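8.3.2 Examples

We give two examples of the HOSVD for the simple case $d = 3$ and the symmetric situation $r_1 = r_2 = r_3 = 2$, $V_1 = V_2 = V_3 =: V$.

Example 8.28. Let $x, y \in V$ be two orthonormal vectors and set¹⁴

$$ \mathbf{v} := x \otimes x \otimes x + \sigma\, y \otimes y \otimes y \in \mathbf{V} := \otimes^3 V. \tag{8.26} $$

(8.26) is already the HOSVD representation of $\mathbf{v}$. For all $1 \le j \le 3$, (8.23) holds with

$$ r_j = 2, \quad \sigma_1^{(j)} = 1, \quad \sigma_2^{(j)} = \sigma, \quad b_1^{(j)} = x, \quad b_2^{(j)} = y, \quad m_1^{[j]} = x \otimes x, \quad m_2^{[j]} = y \otimes y. $$

While $\mathbf{v}$ in (8.26) has tensor rank 2, the next tensor has rank 3.

Example 8.29. Let $x, y \in V$ be two orthonormal vectors and set

$$ \mathbf{v} = \alpha\, x \otimes x \otimes x + \beta\, x \otimes x \otimes y + \beta\, x \otimes y \otimes x + \beta\, y \otimes x \otimes x \in \mathbf{V} := \otimes^3 V. \tag{8.27a} $$

For the choice

$$ \alpha := \sqrt{1 - \tfrac{3}{2}\sqrt{2}\,\sigma + \sigma^2} \qquad \text{and} \qquad \beta := \sqrt{\sigma / \sqrt{2}}, $$

the singular values are again $\sigma_1^{(j)} = 1$, $\sigma_2^{(j)} = \sigma$. The HOSVD basis is given by

$$ b_1^{(j)} = \frac{\sqrt{1 - \frac{\sigma}{\sqrt{2}}}\; x + \sqrt{\sigma \big( \frac{1}{\sqrt{2}} - \sigma \big)}\; y}{\sqrt{(1 + \sigma)(1 - \sigma)}}, \qquad
b_2^{(j)} = \frac{\sqrt{\sigma \big( \frac{1}{\sqrt{2}} - \sigma \big)}\; x - \sqrt{1 - \frac{\sigma}{\sqrt{2}}}\; y}{\sqrt{(1 + \sigma)(1 - \sigma)}}. \tag{8.27b} $$

¹⁴ The coefficient tensor $\mathbf{a}$ has the entries $\mathbf{a}[1, 1, 1] = 1$, $\mathbf{a}[2, 2, 2] = \sigma$, and zero otherwise.

In principle, the HOSVD can also be performed in Hilbert tensor spaces $\mathbf{V} = {}_{\|\cdot\|}\bigotimes_{j=1}^d V_j$ with induced scalar product. In the general case, the HOSVD bases are infinite ($r_j = \infty$, cf. Theorem 4.137). If $\mathbf{v} \in {}_a\bigotimes_{j=1}^d V_j$ is an algebraic tensor, finite bases are ensured as in the next example referring to the polynomial from Example 8.5b. Note that here $V_j$ is the function space $L^2([0, 1])$ with the corresponding scalar product.

Example 8.30. The HOSVD bases and the corresponding singular values¹⁵ of the polynomial $f(x, y, z) = xz + x^2 y \in \mathbf{V} := {}_a\bigotimes_{j=1}^d V_j$, $V_j = L^2([0, 1])$, are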
the singular values are again σ1 = 1, σ2 = σ. The HOSVD basis is given by r r q q 1 σ 1 − √2 x + σ √2 − σ y σ √12 − σ x − 1 − √σ2 y (j) (j) p p b1 = , b2 = . (1 + σ) (1 − σ) (1 + σ) (1 − σ) (8.27b) In principle, the HOSVD can also be performed in Hilbert tensor spaces V = Nd case, the HOSVD bases k·k j=1 Vj with induced scalar product. In the generalN d are infinite (rj = ∞, cf. Theorem 4.137). If v := a j=1 Vj is an algebraic tensor, finite bases are ensured as in the next example referring to the polynomial from Example 8.5b. Note that here Vj is the function space L2 ([0, 1]) with the corresponding scalar product. Example 8.30. The HOSVD bases and the corresponding singular values15 of the Nd polynomial f (x, y, z) = xz + x2 y ∈ V := a j=1 Vj , Vj = L2 ([0, 1]), are (1)
(1)
q
(1)
q
109 720
(2)
q
109 720
(2)
q
b1 = 0.99953x + 0.96327x2 , σ1 = (1)
b2 = 6.8557x − 8.8922x2 , (2)
b1 = 0.58909 + 0.77158y,
σ2 = σ1 =
(2)
σ2 =
(3)
σ1 =
(3)
σ2 =
b2 = 1.9113 − 3.3771y, b1 = 0.44547 + 1.0203z, b2 = 1.9498 − 3.3104z,
(3)
(3)
109 720
109 720 (2) σ1 , (2) σ2 .
+
1 45 1 45
√
46
≈ 0.54964,
√
46 ≈ 0.025893, √ 1 2899 ≈ 0.54859, + 360 √ 1 − 360 2899 ≈ 0.042741, −
Proof. The matricisations $\mathcal{M}_j(f)$ define integral operators $K_j := \mathcal{M}_j(f)\, \mathcal{M}_j^*(f) \in L\big(L^2([0,1]), L^2([0,1])\big)$ of the form $(K_j(g))(\xi) = \int_0^1 k_j(\xi, \xi')\, g(\xi')\, \mathrm{d}\xi'$ (cf. Example 5.17). The involved kernel functions are

$$ \begin{aligned}
k_1(x, x') &= \int_0^1 \!\! \int_0^1 f(x, y, z)\, f(x', y, z)\, \mathrm{d}y\, \mathrm{d}z = \tfrac{1}{3} x x' + \tfrac{1}{4} x^2 x' + \tfrac{1}{4} x x'^2 + \tfrac{1}{3} x^2 x'^2, \\
k_2(y, y') &= \tfrac{1}{9} + \tfrac{1}{8} y + \tfrac{1}{8} y' + \tfrac{1}{5} y y', \qquad
k_3(z, z') = \tfrac{1}{15} + \tfrac{1}{8} z + \tfrac{1}{8} z' + \tfrac{1}{3} z z'.
\end{aligned} $$

The eigenfunctions of $K_1$ are $x - \tfrac{1}{6}(\sqrt{46} + 1)\, x^2$ and $x + \tfrac{1}{6}(\sqrt{46} - 1)\, x^2$ with the eigenvalues $\lambda_{1,2} = \tfrac{109}{720} \pm \tfrac{1}{45}\sqrt{46}$. Normalising the eigenfunctions and extracting the square root of $\lambda_{1,2}$, we obtain the orthonormal basis functions $b_i^{(1)}$ and $\sigma_i^{(1)}$ $(i = 1, 2)$ from above.

Similarly, the eigenfunctions $1 + \tfrac{-8 \pm \sqrt{2899}}{35}\, y$ of $K_2$ and $1 + \tfrac{8 \pm \sqrt{2899}}{27}\, z$ of $K_3$ yield the indicated results. □

¹⁵ Since $\sum_{i=1}^{2} \big(\sigma_i^{(j)}\big)^2 = \tfrac{109}{360}$ for all $1 \le j \le 3$, the values pass the test by Remark 5.13b.
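The singular values of Example 8.30 can be cross-checked without integral operators. For a sum $\sum_i u_i \otimes w_i$, the nonzero eigenvalues of $\mathcal{M}\mathcal{M}^*$ coincide with the eigenvalues of the product of the Gram matrices of $(u_i)$ and $(w_i)$. The sketch below uses this observation, which is an addition of this illustration and not the route taken in the proof; the Gram entries are the exact $L^2$ integrals over $[0,1]$ and $[0,1]^2$, and the function name is hypothetical.

```python
import numpy as np

def singular_values(G_left, G_right):
    """Nonzero singular values of sum_i u_i (x) w_i from the two Gram matrices."""
    lam = np.linalg.eigvals(G_left @ G_right).real
    return np.sqrt(np.sort(lam)[::-1])

# Direction 1: f = x (x) z + x^2 (x) y, factors {x, x^2} and {z, y}.
s1 = singular_values(np.array([[1/3, 1/4], [1/4, 1/5]]),
                     np.array([[1/3, 1/4], [1/4, 1/3]]))
# Direction 2: f = 1 (x) xz + y (x) x^2, factors {1, y} and {xz, x^2}.
s2 = singular_values(np.array([[1, 1/2], [1/2, 1/3]]),
                     np.array([[1/9, 1/8], [1/8, 1/5]]))
# Direction 3: f = z (x) x + 1 (x) x^2 y, factors {z, 1} and {x, x^2 y}.
s3 = singular_values(np.array([[1/3, 1/2], [1/2, 1]]),
                     np.array([[1/3, 1/8], [1/8, 1/15]]))

for s in (s1, s2, s3):
    print(s, np.sum(s**2))     # the squared values sum to ||f||^2 = 109/360
```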
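8.3.3 Computation and Computational Cost

Let $\mathbf{V} = \bigotimes_{j=1}^d V_j$ with $V_j = \mathbb{K}^{I_j}$ and $n_j := \# I_j$. For subsets $\emptyset \ne \alpha \subsetneqq D$ of $D := \{1, \ldots, d\}$, we use the notations $\alpha^c := D \setminus \alpha$ and $\mathbf{V}_\alpha = {}_a\bigotimes_{k \in \alpha} V_k$. The usual choice is $\alpha = \{j\}$. We introduce the mapping

$$ (B_\alpha, \Sigma_\alpha) := \mathrm{HOSVD}_\alpha(\mathbf{v}), $$

where the left-hand side is the pair of the matrices $B_\alpha$, $\Sigma_\alpha$ in the left-sided singular-value decomposition

$$ \mathcal{M}_\alpha(\mathbf{v})\, \mathcal{M}_\alpha(\mathbf{v})^{\mathsf{H}} = B_\alpha \Sigma_\alpha^2 B_\alpha^{\mathsf{H}}, \qquad B_\alpha \in \mathbb{K}^{n_\alpha \times r_\alpha}, \quad 0 \le \Sigma_\alpha \in \mathbb{K}^{r_\alpha \times r_\alpha}, \quad B_\alpha^{\mathsf{H}} B_\alpha = I, $$

where $r_\alpha := \operatorname{rank}_\alpha(\mathbf{v}) = \dim(\mathbf{U}_\alpha^{\min}(\mathbf{v}))$. Since the singular-value decomposition is not always unique (cf. Corollary 2.24b), the map $\mathrm{HOSVD}_\alpha$ is not well-defined in all cases. If multiple solutions exist, we may pick a suitable one.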
Performing $\mathrm{HOSVD}_j(\mathbf{v})$ for all $1 \le j \le d$, we obtain the complete higher-order singular-value decomposition

$$ (B_1, \Sigma_1, B_2, \Sigma_2, \ldots, B_d, \Sigma_d) := \mathrm{HOSVD}(\mathbf{v}). $$

The computational realisation of $\mathrm{HOSVD}_j$ depends on the used format. We discuss the following cases:

(A) Tensor $\mathbf{v}$ given in full format.
(B) $\mathbf{v}$ given in r-term format $\rho_{r\text{-term}}\big(r, (v_\nu^{(j)})_{1 \le j \le d,\, 1 \le \nu \le r}\big)$.
(C) $\mathbf{v}$ given in the orthonormal tensor subspace format $\rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{j=1}^d\big)$, where the coefficient tensor $\mathbf{a}$ may have various formats.
(D) $\mathbf{v}$ given in the general tensor subspace format $\rho_{\mathrm{frame}}\big(\mathbf{a}, (B_j)_{j=1}^d\big)$, which means that $B_j$ is not necessarily orthogonal.

As a review we list the cost (up to lower-order terms) for the various cases:

format of v                        computational cost for HOSVD                                                                                      details in
full                               $n^{d+1}$                                                                                                         Remark 8.31
r-term                             $d\,\big[ 2nr \min(n, r) + n r^2 + 2n\bar{r}^2 + 2r^2\bar{r} + 3r\bar{r}^2 + \tfrac{8}{3}\bar{r}^3 \big]$        Remark 8.32
$\rho_{\mathrm{orth}}$             $3d\,\hat{r}^{d+1} + 2d\,\hat{r}^2 \big(n + \tfrac{4}{3}\hat{r}\big)$                                             (8.33b)
$\rho_{\mathrm{orth}}^{\mathrm{hybr}}$   $2dnr\hat{r} + (d+2)\,r^2\hat{r} + 2dr\hat{r}\min(\hat{r}, r) + 3r\hat{r}^2 + \tfrac{14}{3} d\,\hat{r}^3$   (8.34)
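8.3.3.1 Case A: Full Format

Set $n_j := \#I_j$, $\mathbf I_{[j]} = \times_{k\neq j} I_k$, $n := \max_j n_j$. The data $\mathrm{HOSVD}_j(\mathbf v) = (B_j, \Sigma_j)$ can be determined by procedure LSVD($\#I_j$, $\#\mathbf I_{[j]}$, $r_j$, $\mathcal M_j(\mathbf v)$, $B_j$, $\Sigma_j$) in (2.29), where $r_j = \dim(U_j^{\min}(\mathbf v))$ describes the size of $B_j \in \mathbb K^{I_j\times r_j}$. The cost $N_{\mathrm{LSVD}}(n_j, \#\mathbf I_{[j]})$ summed over all $1 \le j \le d$ yields
$$\sum_{j=1}^d \Big[ \big(2\,\#\mathbf I_{[j]} - 1\big)\,\frac{n_j}{2}\,(n_j+1) + \frac{8n_j^3}{3} \Big] \;\approx\; \sum_{j=1}^d \Big[ n_j \prod_{k=1}^d n_k + \frac{8n_j^3}{3} \Big] \;\le\; d\,n^{d+1} + \frac83\,d\,n^3.$$
For $d \ge 3$, the dominant part of the cost is $d\,n^{d+1}$ arising from the evaluation of the matrix entries of $M_j := \mathcal M_j(\mathbf v)\,\mathcal M_j(\mathbf v)^{\mathsf H}$:
$$M_j[\nu,\mu] = \sum_{\mathbf i\in\mathbf I_{[j]}} \mathbf v[i_1,\dots,i_{j-1},\nu,i_{j+1},\dots,i_d]\;\overline{\mathbf v[i_1,\dots,i_{j-1},\mu,i_{j+1},\dots,i_d]}.$$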
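If the HOSVD tensor subspace format of $\mathbf v$ is desired, we have to determine the coefficient tensor $\mathbf a$. (8.13) implies that
$$\mathbf a = \mathbf B^{\mathsf H}\mathbf v \quad\text{with}\quad \mathbf B := \bigotimes_{j=1}^d B_j, \tag{8.29}$$
i.e., $\mathbf a[k_1,\dots,k_d] = \sum_{i_1\in I_1}\cdots\sum_{i_d\in I_d} B_1^{\mathsf H}[k_1,i_1]\cdots B_d^{\mathsf H}[k_d,i_d]\;\mathbf v[i_1 i_2\cdots i_d]$. The cost for evaluating $\mathbf a$ is
$$\sum_{j=1}^d (2n_j - 1)\cdot\prod_{k=1}^{j} r_k\cdot\prod_{k=j+1}^{d} n_k \;\lesssim\; 2r_1 n^d.$$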
In this estimate we assume that $r_j \ll n_j$ so that the terms for $j > 1$ containing $2r_1r_2n^{d-1}$, $2r_1r_2r_3n^{d-2}$, … are much smaller than the first term. Obviously, the summation should be started with $j^* = \operatorname{argmin}\{r_j : 1 \le j \le d\}$.

Above, we first determined the HOSVD bases and afterwards performed the projection (8.29). In fact, it is advantageous to apply the projection by $B_j B_j^{\mathsf H}$ immediately after computing $B_j$ since the projection reduces the size of the tensor:

    start:   $\mathbf v_0 := \mathbf v$
    loop:    for $j := 1$ to $d$ do
             begin $(B_j, \Sigma_j) := \mathrm{HOSVD}_j(\mathbf v_{j-1})$;
                   $\mathbf v_j := \big(I \otimes\dots\otimes I \otimes B_j^{\mathsf H} \otimes I \otimes\dots\otimes I\big)\,\mathbf v_{j-1}$
             end;  {$I$ is the identity matrix of varying size}
    return:  $\mathbf a := \mathbf v_d$                                                        (8.30)
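Set $\mathbf B^{(1,d)} := B_1^{\mathsf H} \otimes \bigotimes_{j=2}^d I$ and $\mathbf B^{(1,d-1)} := B_1^{\mathsf H} \otimes \bigotimes_{j=3}^d I$. Lemma 5.6 implies that $\mathcal M_2(\mathbf v_1) = \mathcal M_2(\mathbf B^{(1,d)}\mathbf v) = \mathcal M_2(\mathbf v)\,(\mathbf B^{(1,d-1)})^{\mathsf T}$. Since $(\mathbf B^{(1,d-1)})^{\mathsf H}\mathbf B^{(1,d-1)}$ is the projection onto the subspace $U_1^{\min}(\mathbf v) \otimes \bigotimes_{j=2}^d V_j$ containing $\mathbf v$, we have $(\mathbf B^{(1,d-1)})^{\mathsf H}\mathbf B^{(1,d-1)}\mathbf v = \mathbf v$. This proves
$$\mathcal M_2(\mathbf v_1)\,\mathcal M_2(\mathbf v_1)^{\mathsf H} = \mathcal M_2(\mathbf v)\Big[\mathcal M_2(\mathbf v)\,(\mathbf B^{(1,d-1)})^{\mathsf T}\,\overline{\mathbf B^{(1,d-1)}}\Big]^{\mathsf H} = \mathcal M_2(\mathbf v)\,\mathcal M_2\big((\mathbf B^{(1,d-1)})^{\mathsf H}\mathbf B^{(1,d-1)}\mathbf v\big)^{\mathsf H} = \mathcal M_2(\mathbf v)\,\mathcal M_2(\mathbf v)^{\mathsf H}.$$
Similarly, we prove the identity $\mathcal M_j(\mathbf v_{j-1})\,\mathcal M_j(\mathbf v_{j-1})^{\mathsf H} = \mathcal M_j(\mathbf v)\,\mathcal M_j(\mathbf v)^{\mathsf H}$ implying $(B_j,\Sigma_j) = \mathrm{HOSVD}_j(\mathbf v_{j-1}) = \mathrm{HOSVD}_j(\mathbf v)$.

Remark 8.31. The cost of algorithm (8.30) is
$$\sum_{j=1}^d \Big[ (n_j + 2r_j)\cdot\prod_{k=1}^{j-1} r_k\cdot\prod_{k=j}^{d} n_k + \frac83\,n_j^3 \Big]. \tag{8.31}$$
Under the assumptions $r_j \ll n_j$ and $d \ge 3$, the dominant part is $n_1\prod_{k=1}^d n_k$.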
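The loop (8.30) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the procedure LSVD from (2.29): the tensor is real and stored as a dense array, the Gram matrix $\mathcal M_j \mathcal M_j^{\mathsf T}$ is diagonalised with `numpy.linalg.eigh`, and the names `unfold`, `mode_multiply`, `hosvd_sequential`, `tucker_to_full` as well as the threshold `rel_tol` are hypothetical.

```python
import numpy as np

def unfold(t, j):
    """Mode-j matricisation: rows indexed by i_j, columns by the remaining indices."""
    return np.moveaxis(t, j, 0).reshape(t.shape[j], -1)

def mode_multiply(t, m, j):
    """Apply the matrix m to direction j of the tensor t."""
    return np.moveaxis(np.tensordot(m, t, axes=(1, j)), 0, j)

def hosvd_sequential(v, rel_tol=1e-10):
    """Sketch of (8.30) for a real full tensor; rel_tol is an assumed rank threshold."""
    core, bases, sigmas = v, [], []
    for j in range(v.ndim):
        m = unfold(core, j)
        lam, u = np.linalg.eigh(m @ m.T)           # diagonalise M_j M_j^T (symmetric)
        lam, u = lam[::-1], u[:, ::-1]             # eigh sorts ascending; use descending order
        r_j = int(np.sum(lam > rel_tol * lam[0]))  # numerical rank in direction j
        b_j = u[:, :r_j]
        core = mode_multiply(core, b_j.T, j)       # v_j := (I x ... x B_j^T x ... x I) v_{j-1}
        bases.append(b_j)
        sigmas.append(np.sqrt(lam[:r_j]))
    return core, bases, sigmas

def tucker_to_full(core, bases):
    """Evaluate v = (B_1 x ... x B_d) a as a dense array."""
    for j, b in enumerate(bases):
        core = mode_multiply(core, b, j)
    return core
```

Because each projection is applied immediately, the unfolding in step $j$ already has the reduced sizes $r_1,\dots,r_{j-1}$ in the earlier directions, which is the saving expressed by (8.31).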
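8.3.3.2 Case B: r-Term Format

In §8.5 we shall discuss conversions between formats. The present case is already such a conversion from $r$-term format into HOSVD tensor subspace representation (other variants will be discussed in §8.5.2).

Let $\mathbf v = \sum_{\nu=1}^r \bigotimes_{j=1}^d v_\nu^{(j)} \in \mathcal R_r$ be given. First, all scalar products $\langle v_\nu^{(j)}, v_\mu^{(j)}\rangle$ for $1 \le j \le d$, $1 \le \nu,\mu \le r$ are to be computed. We discuss in detail the computation of $(B_1,\Sigma_1) = \mathrm{HOSVD}_1(\mathbf v)$. The first matricisation is given by
$$\mathcal M_1(\mathbf v) = \sum_{\nu=1}^r v_\nu^{(1)} \otimes \mathbf v_\nu^{[1]} \quad\text{with}\quad \mathbf v_\nu^{[1]} = \bigotimes_{j=2}^d v_\nu^{(j)}.$$
By definition, $B_1$ and $\Sigma_1$ from $\mathrm{HOSVD}_1(\mathbf v)$ result from the diagonalisation of $M_1 := \mathcal M_1(\mathbf v)\,\mathcal M_1(\mathbf v)^{\mathsf H} = B_1\Sigma_1^2B_1^{\mathsf H}$. We exploit the special structure of $\mathcal M_1(\mathbf v)$:
$$M_1 = \sum_{\nu=1}^r\sum_{\mu=1}^r \big\langle \mathbf v_\nu^{[1]}, \mathbf v_\mu^{[1]}\big\rangle\, v_\nu^{(1)} (v_\mu^{(1)})^{\mathsf H} = \sum_{\nu=1}^r\sum_{\mu=1}^r \prod_{j=2}^d \big\langle v_\nu^{(j)}, v_\mu^{(j)}\big\rangle\, v_\nu^{(1)} (v_\mu^{(1)})^{\mathsf H}.$$
$M_1$ has the form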
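$$M_1 = A_1 C_1 A_1^{\mathsf H} \quad\text{with}\quad A_1 := \big[v_1^{(1)}\; v_2^{(1)} \cdots v_r^{(1)}\big] \quad\text{and}\quad C_1 := \bigodot_{j=2}^d G_j \quad\text{with}\quad G_j := \big(\langle v_\nu^{(j)}, v_\mu^{(j)}\rangle\big)_{\nu,\mu=1}^r$$
(here $\bigodot_{j=2}^d$ denotes the multiple Hadamard product; cf. §4.6.4). As explained in Remark 2.39, the diagonalisation $M_1 = B_1\Sigma_1^2B_1^{\mathsf H}$ is not performed directly. Instead, we use the QR decomposition in $M_1 = A_1C_1A_1^{\mathsf H} = Q_1R_1C_1R_1^{\mathsf H}Q_1^{\mathsf H}$ and diagonalise $R_1C_1R_1^{\mathsf H}$. In the following algorithm, the index $j$ varies from $1$ to $d$:

    1   form Gram matrices $G_j := \big(\langle v_\nu^{(j)}, v_\mu^{(j)}\rangle\big)_{\nu,\mu=1}^r$;                          $G_j \in \mathbb K^{r\times r}$
    2   compute Hadamard products $C_j := \bigodot_{k\neq j} G_k$;                                                         $C_j \in \mathbb K^{r\times r}$
    3   $[v_1^{(j)} \cdots v_r^{(j)}] = Q_jR_j$ ($Q_j \in \mathbb K^{n_j\times r_j}$, $R_j \in \mathbb K^{r_j\times r}$);   $r_j = \operatorname{rank}(Q_jR_j)$
    4   form products $A_j := R_jC_jR_j^{\mathsf H}$;                                                                      $A_j \in \mathbb K^{r_j\times r_j}$
    5   diagonalise $A_j = U_j\Lambda_jU_j^{\mathsf H}$;                                                                   $U_j, \Lambda_j \in \mathbb K^{r_j\times r_j}$
    6   return $\Sigma_j := \Lambda_j^{1/2}$ and $B_j := Q_jU_j$;                                                          $B_j \in \mathbb K^{n_j\times r_j}$
                                                                                                                          (8.32a)

In line 3, the rank $r_j = \dim(U_j^{\min}(\mathbf v))$ is determined. Therefore $\mathbf r = (r_1,\dots,r_d)$ is the Tucker rank which should be distinguished from the representation rank $r$ of $\mathbf v \in \mathcal R_r$. Line 6 delivers the values $(B_j,\Sigma_j) = \mathrm{HOSVD}_j(\mathbf v)$.

It remains to determine the coefficient tensor $\mathbf a \in \bigotimes_{j=1}^d \mathbb K^{r_j}$ of $\mathbf v$. As known from Theorem 8.44, $\mathbf a$ also possesses an $r$-term representation. Indeed,
$$\mathbf a = \sum_{\nu=1}^r \bigotimes_{j=1}^d u_\nu^{(j)} \quad\text{with}\quad u_\nu^{(j)}[i] = \big\langle v_\nu^{(j)}, b_i^{(j)}\big\rangle = \big(R_j^{\mathsf H}U_j\big)[\nu,i] \tag{8.32b}$$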
holds. Hence we obtain the hybrid format (8.18) of $\mathbf v$. If wanted, we may convert $\mathbf a$ into full representation. This would yield the standard tensor subspace format of $\mathbf v$.

Remark 8.32. The computational cost of (8.32a,b), up to lower order terms, is
$$r\sum_{j=1}^d \Big[ n_j\big(r + 2\min(n_j,r)\big) + r_j(2r + 3r_j) \Big] + \sum_{j=1}^d \Big[ \frac83\,r_j^3 + 2n_jr_j^2 \Big] \;\le\; d\,\Big[ 2nr\min(n,r) + nr^2 + 2n\bar r^2 + 2r^2\bar r + 3r\bar r^2 + \frac83\,\bar r^3 \Big]$$
with $n := \max_j n_j$ and $\bar r := \max_j r_j$.

Proof. The cost of each line in (8.32a) is $\frac{r(r+1)}{2}\sum_{j=1}^d (2n_j - 1)$ (line 1), $(d-1)\,r(r+1)$ (line 2), $\sum_{j=1}^d N_{\mathrm{QR}}(n_j,r) = 2r\sum_{j=1}^d n_j\min(n_j,r)$ (line 3), $(2r-1)\sum_{j=1}^d r_j\big(r + \frac{r_j+1}{2}\big)$ (line 4), $\frac83\sum_{j=1}^d r_j^3$ (line 5), and $\sum_{j=1}^d r_j\big(1 + n_j(2r_j-1)\big)$ (line 6), while (8.32b) requires $r\sum_{j=1}^d r_j(2r_j-1)$ operations. □
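The six lines of (8.32a) together with (8.32b) translate almost one-to-one into NumPy. The following is a sketch under assumptions that are not part of the text: the factors are real matrices `factors[j]` of size $n_j \times r$, a plain (not rank-revealing) QR decomposition is used in line 3, and the rank $r_j$ is instead read off from the eigenvalues of $R_jC_jR_j^{\mathsf T}$ with a hypothetical threshold `rel_tol`.

```python
import numpy as np

def hosvd_from_rterm(factors, rel_tol=1e-10):
    """Sketch of (8.32a,b) for real data; factors[j] = [v_1^(j) ... v_r^(j)]."""
    d = len(factors)
    grams = [f.T @ f for f in factors]                 # line 1: Gram matrices G_j
    bases, sigmas, core_factors = [], [], []
    for j in range(d):
        c = np.ones_like(grams[j])
        for k in range(d):
            if k != j:
                c = c * grams[k]                       # line 2: Hadamard product C_j
        q, r = np.linalg.qr(factors[j])                # line 3: A_j = Q_j R_j (no rank reduction)
        lam, u = np.linalg.eigh(r @ c @ r.T)           # lines 4-5: diagonalise R_j C_j R_j^T
        lam, u = lam[::-1], u[:, ::-1]                 # descending order
        r_j = int(np.sum(lam > rel_tol * lam[0]))      # numerical Tucker rank r_j
        u = u[:, :r_j]
        bases.append(q @ u)                            # line 6: B_j = Q_j U_j
        sigmas.append(np.sqrt(lam[:r_j]))              #         Sigma_j = Lambda_j^(1/2)
        core_factors.append(r.T @ u)                   # (8.32b): row nu holds u_nu^(j)
    return bases, sigmas, core_factors
```

The full tensor is never formed; only $r \times r$ and $n_j \times r$ matrices appear, in line with the remark that $n_j$ enters the cost only linearly.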
This approach is in particular favourable if $r, r_j \ll n_j$ since $n_j$ appears only linearly, whereas squares of $r$ and third powers of $r_j$ are present.

For later purposes we mention an approximate variant which can be used for large $r$. The next remark is formulated for the first step $j = 1$ in (8.32a). The other steps are analogous.

Remark 8.33. Assume that the norm$^{16}$ of $\mathbf v = \sum_{\nu=1}^r \bigotimes_{j=1}^d v_\nu^{(j)} \in \mathcal R_r$ is known, say, $\|\mathbf v\| \approx 1$. Normalise the vectors by $\|v_\nu^{(j)}\| = 1$ for $2 \le j \le d$. Instead of the exact QR decomposition $[v_1^{(1)} \cdots v_r^{(1)}] = Q_1R_1$, apply the algorithm from Corollary 2.42 and Remark 2.43. According to Corollary 2.42, we may omit sufficiently small terms and reduce $r$ to $r_0$ (called $m_0$ in Corollary 2.42). Alternatively or additionally, we may use the approximate scalar product $\langle\cdot,\cdot\rangle_p$ from Remark 2.43 for the cheap approximate computation of $G_j$.
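8.3.3.3 Case C: Orthonormal Tensor Subspace Format

Let $\mathbf v \in \mathbf V$ be represented by $\mathbf v = \rho_{\text{orth}}\big(\hat{\mathbf a}, (\hat B_j)_{1\le j\le d}\big)$, i.e., $\mathbf v = \hat{\mathbf B}\hat{\mathbf a}$. According to Lemma 8.26b, the HOSVD bases of $\mathbf v$ can be derived from the HOSVD bases of the coefficient tensor $\hat{\mathbf a}$. Having determined $(\hat U_j, \Sigma_j) := \mathrm{HOSVD}_j(\hat{\mathbf a})$, we obtain $(B_j,\Sigma_j) := \mathrm{HOSVD}_j(\mathbf v)$ by $B_j := \hat B_j\hat U_j$.

Case C1: Assume that $\hat{\mathbf a} \in \mathbb K^{\hat{\mathbf J}}$ is given in full format with $\hat{\mathbf J} = \times_{j=1}^d \hat J_j$, $\hat J_j = \{1,\dots,\hat r_j\}$. The computation of $(\hat U_j,\Sigma_j) := \mathrm{HOSVD}_j(\hat{\mathbf a})$ together with the evaluation of the coefficient tensor $\mathbf a \in \mathbb K^{\mathbf J}$ with $\mathbf J = \times_{j=1}^d J_j$, $J_j = \{1,\dots,r_j\}$, and the property $\hat{\mathbf a} = \hat{\mathbf U}\mathbf a$, $\hat{\mathbf U} := \bigotimes_{j=1}^d \hat U_j$, requires
$$\sum_{j=1}^d \Big[ (\hat r_j + 2r_j)\cdot\prod_{k=1}^{j-1} r_k\cdot\prod_{k=j}^{d} \hat r_k + \frac83\,\hat r_j^3 \Big]$$
operations (cf. (8.31)), where $r_j = \dim(U_j^{\min}(\hat{\mathbf a})) = \dim(U_j^{\min}(\mathbf v))$. From $r_j \le \hat r_j$ we get the estimate by $\sum_{j=1}^d \big[ 3\hat r_j\cdot\prod_{k=1}^d \hat r_k + \frac83\hat r_j^3 \big] \le 3d\,\hat r^{d+1} + \frac83\,d\,\hat r^3$ with $\hat r := \max_j \hat r_j$.

The cost of $B_j := \hat B_j\hat U_j$ for all $1 \le j \le d$ is
$$\sum_{j=1}^d (2\hat r_j - 1)\,n_jr_j. \tag{8.33a}$$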
In total, the computational work is estimated by
$$3d\,\hat r^{d+1} + 2d\,\hat r^2\Big(n + \frac43\,\hat r\Big) \quad\text{with } n, \hat r \text{ in (8.14)}. \tag{8.33b}$$

$^{16}$ Because of the instability discussed later, the norm of $\mathbf v$ may be much smaller than the sum of the norms of all terms (cf. Definition 9.20).
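Case C1 can be illustrated as follows. The sketch assumes real data, a dense coefficient tensor `core_hat` and orthonormal matrices `bases_hat[j]`; it uses `numpy.linalg.svd` on the unfoldings of the small coefficient tensor instead of the Gram-matrix route, and all function names and the threshold `rel_tol` are hypothetical.

```python
import numpy as np

def mode_multiply(t, m, j):
    """Apply the matrix m to direction j of the tensor t."""
    return np.moveaxis(np.tensordot(m, t, axes=(1, j)), 0, j)

def hosvd_of_orth_tucker(core_hat, bases_hat, rel_tol=1e-10):
    """HOSVD of v = rho_orth(core_hat, bases_hat) obtained from the HOSVD of core_hat."""
    core, bases, sigmas = core_hat, [], []
    for j, b_hat in enumerate(bases_hat):
        m = np.moveaxis(core, j, 0).reshape(core.shape[j], -1)
        u, s, _ = np.linalg.svd(m, full_matrices=False)
        r_j = int(np.sum(s > rel_tol * s[0]))      # numerical rank r_j <= r_hat_j
        u, s = u[:, :r_j], s[:r_j]
        core = mode_multiply(core, u.T, j)         # project the coefficient tensor
        bases.append(b_hat @ u)                    # B_j := B_hat_j U_hat_j
        sigmas.append(s)
    return core, bases, sigmas
```

Only the last step, the products $\hat B_j\hat U_j$, touches matrices with $n_j$ rows; this is the term (8.33a).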
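Case C2: Let $\hat{\mathbf a} \in \mathbb K^{\hat{\mathbf J}}$ be given in $r$-term format $\hat{\mathbf a} = \sum_{\nu=1}^r a_\nu \bigotimes_{j=1}^d v_\nu^{(j)}$. By Remark 8.32 (with $n_j$ replaced by $\hat r_j$), the cost of $(\hat U_j,\Sigma_j) := \mathrm{HOSVD}_j(\hat{\mathbf a})$ including the computation of $\mathbf a$ with $\hat{\mathbf a} = \hat{\mathbf U}\mathbf a$ amounts to
$$r\sum_{j=1}^d \Big[ \hat r_j\big(r + 2\min(\hat r_j,r)\big) + r_j(2r+3r_j) \Big] + \sum_{j=1}^d \Big[ \frac83\,r_j^3 + 2\hat r_jr_j^2 \Big] \;\le\; (d+2)\,r^2\hat r + 2dr\hat r\min(\hat r,r) + 3r\hat r^2 + \frac{14}{3}\,d\,\hat r^3.$$
Adding the cost (8.33a) of $B_j := \hat B_j\hat U_j$, we obtain the following operation count.

Remark 8.34. If the coefficient tensor $\hat{\mathbf a}$ in $\mathbf v = \rho_{\text{orth}}\big(\hat{\mathbf a}, (\hat B_j)_{1\le j\le d}\big)$ is represented as $\hat{\mathbf a} = \rho_{r\text{-term}}\big(r, (v_\nu^{(j)})_{1\le j\le d,\,1\le\nu\le r}\big)$, computing the HOSVD bases $B_j$ and the coefficient tensor $\mathbf a \in \mathcal R_r$ in $\mathbf v = \rho_{\text{HOSVD}}\big(\mathbf a, (B_j)_{1\le j\le d}\big)$ requires
$$\sum_{j=1}^d \Big[ r\hat r_j\big(r + 2\min(\hat r_j,r)\big) + rr_j(2r+3r_j) + \frac83\,r_j^3 + 2\hat r_jr_j^2 + 2n_j\hat r_jr_j \Big] \;\le\; 2dnr\hat r + (d+2)\,r^2\hat r + 2dr\hat r\min(\hat r,r) + 3r\hat r^2 + \frac{14}{3}\,d\,\hat r^3 \tag{8.34}$$
operations, where $\hat r := \max_j \hat r_j$ and $n := \max_j n_j$ as in (8.14).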
8.3.3.4 Case D: General and Hybrid Tensor Subspace Format

In the case of $\mathbf v = \rho_{\text{frame}}\big(\hat{\mathbf a}, (\hat B_j)_{j=1}^d\big)$ with non-orthonormal bases, the simplest approach combines the following steps:
Step 1: convert the representation into orthonormal tensor subspace format $\mathbf v = \rho_{\text{orth}}\big(\mathbf a', (B_j')_{j=1}^d\big)$ by one of the methods described in §8.2.4.2.
Step 2: apply the methods in §8.3.3.3 to obtain $\mathbf v = \rho_{\text{HOSVD}}\big(\mathbf a, (B_j)_{j=1}^d\big)$.

Alternatively, we may determine $\rho_{\text{HOSVD}}\big(\mathbf a, (B_j)_{j=1}^d\big)$ directly from the full tensor $\hat{\mathbf a}$ and the bases $(\hat B_j)_{j=1}^d$ in $\mathbf v = \rho_{\text{frame}}\big(\hat{\mathbf a}, (\hat B_j)_{j=1}^d\big)$. However, this approach turns out to be more costly.$^{17}$

The situation differs, at least in case of $r < n_j$, for the hybrid format when the coefficient tensor $\hat{\mathbf a}$ is given in $r$-term format: $\hat{\mathbf a} = \sum_{\nu=1}^r a_\nu \bigotimes_{j=1}^d v_\nu^{(j)} \in \mathcal R_r$. First, the Gram matrices
$$G^{(j)} := \Big( \big\langle \hat b_\nu^{(j)}, \hat b_\mu^{(j)} \big\rangle \Big)_{\nu,\mu=1}^{\hat r_j} \in \mathbb K^{\hat r_j\times\hat r_j} \qquad (1 \le j \le d) \tag{8.35}$$
are generated (cost: $\sum_j n_j\hat r_j^2$). A different kind of Gram matrices are $G_j \in \mathbb K^{r\times r}$ with $G_j[\nu,\mu] := \langle G^{(j)}v_\nu^{(j)}, v_\mu^{(j)}\rangle$, whose computation costs $\sum_{j=1}^d (2r\hat r_j^2 + r^2\hat r_j)$. If $\hat B_j$ represents a basis, a modification is possible: compute the Cholesky decompositions $G^{(j)} = L^{(j)}L^{(j)\mathsf H}$ and use $G_j[\nu,\mu] = \langle L^{(j)\mathsf H}v_\nu^{(j)}, L^{(j)\mathsf H}v_\mu^{(j)}\rangle$. Then the operation count $\sum_{j=1}^d \big(\frac13\hat r_j^3 + \hat r_j^2r + r^2\hat r_j\big)$ is reduced because of the triangular shape of $L^{(j)\mathsf H}$. The matrices $G_j$ correspond to the equally named matrices in the first line of (8.32a). Since the additional steps are identical to those in (8.32a), we must compare the costs $d\,(n\hat r^2 + 2r\hat r^2 + r^2\hat r)$ or $d\,(n\hat r^2 + \frac13\hat r^3 + \hat r^2r + r^2\hat r)$ from above with the sum of $d\,\hat r^2(2n+r)$ from Corollary 8.22 plus $d\,r^2n$ for (8.32a$_1$) ($n$, $\hat r$ in (8.14)). Unless $r \gg n$, the direct approach is cheaper. We summarise the results below. For the sake of simplicity, we compare the upper bounds.

Remark 8.35. Let $\mathbf v = \rho_{\text{frame}}\big(\hat{\mathbf a}, (\hat B_j)_{j=1}^d\big)$ with $\hat B_j \in \mathbb K^{n_j\times\hat r_j}$ and assume that $\hat{\mathbf a}$ has an $r$-term representation. Then the direct method is advantageous if $n > r$. Its cost is by $d\,\big[(n-r)\,\hat r^2 + (n-\hat r)\,r^2\big]$ cheaper than the combination of Steps 1 and 2 from above. The Cholesky modification mentioned above is even cheaper by $d\,\big[(n-\hat r/3)\,\hat r^2 + (n-\hat r)\,r^2\big]$.

$^{17}$ Starting from (8.35) and Kronecker products $\mathbf G^{[j]} := \bigotimes_{k\neq j} G^{(k)}$, one has to determine matrices $\langle \mathbf G^{[j]}\hat{\mathbf a}, \hat{\mathbf a}\rangle_{[j]}$ using the partial scalar product in $\bigotimes_{k\neq j} \mathbb K^{\hat r_k}$ (cf. §4.5.5).
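The two differences stated in Remark 8.35 follow by subtracting the upper bounds quoted in the preceding paragraph; the short computation below only spells out this arithmetic and adds no new estimate.

```latex
\begin{align*}
\underbrace{d\,\hat r^2(2n+r) + d\,r^2 n}_{\text{Steps 1 and 2}}
 - \underbrace{d\,\bigl(n\hat r^2 + 2r\hat r^2 + r^2\hat r\bigr)}_{\text{direct}}
 &= d\,\bigl[n\hat r^2 - r\hat r^2 + r^2 n - r^2\hat r\bigr]
  = d\,\bigl[(n-r)\,\hat r^2 + (n-\hat r)\,r^2\bigr],\\[2pt]
\underbrace{d\,\hat r^2(2n+r) + d\,r^2 n}_{\text{Steps 1 and 2}}
 - \underbrace{d\,\bigl(n\hat r^2 + \tfrac13\hat r^3 + \hat r^2 r + r^2\hat r\bigr)}_{\text{direct, Cholesky}}
 &= d\,\bigl[n\hat r^2 - \tfrac13\hat r^3 + r^2 n - r^2\hat r\bigr]
  = d\,\bigl[(n-\hat r/3)\,\hat r^2 + (n-\hat r)\,r^2\bigr].
\end{align*}
```

Since $\hat r \le n$, the second bracket is always nonnegative, so the first difference can only become negative when $r$ exceeds $n$ substantially.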
8.4 Tangent Space and Sensitivity

8.4.1 Uniqueness

First we consider the subspace notation $\mathbf v \in \bigotimes_{j=1}^d U_j$.

Remark 8.36. (a) The spaces $U_j^{\min}(\mathbf v)$ in $\mathbf v \in \bigotimes_{j=1}^d U_j^{\min}(\mathbf v)$ are unique.
(b) If $\mathbf v \in \bigotimes_{j=1}^d U_j$, there is a direct sum $U_j = U_j^{\min}(\mathbf v) \oplus W_j$. The space $W_j$ with $W_j \cap U_j^{\min}(\mathbf v) = \{0\}$ can be chosen arbitrarily, but is superfluous.

The representation $\mathbf v = \rho_{\text{TS}}(\mathbf a, (B_j)) = \mathbf B\mathbf a$ requires fixing a generating system $B_j$ of $U_j$.

Remark 8.37. (a) If $B_j$ is a proper frame, the kernel $\ker(\mathbf B)$ is nontrivial and $\mathbf v = \mathbf B(\mathbf a + \mathbf b)$ holds for all $\mathbf b \in \ker(\mathbf B)$.
(b) If $B_j$ ($1 \le j \le d$) are fixed bases of $U_j$, the coefficient tensor $\mathbf a$ of $\mathbf v = \mathbf B\mathbf a$ is unique. For fixed subspaces $U_j$, all possible representations are of the form $\mathbf v = \mathbf B'\mathbf a'$ with $\mathbf B' = \mathbf B\mathbf T$, $\mathbf a' = \mathbf T^{-1}\mathbf a$, where the factors $T_j$ of $\mathbf T = \bigotimes_{j=1}^d T_j$ can be any isomorphisms (cf. Lemma 8.10). If $U_j \supsetneqq U_j^{\min}(\mathbf v)$ for at least one $j$, the coefficient tensor $\mathbf a$ cannot take all values, but must belong to the kernel of $\big(P_{W_j} \otimes \mathbf{id}_{[j]}\big)\mathbf B$, where $P_{W_j}$ is the projection$^{18}$ corresponding to the direct sum $U_j = U_j^{\min}(\mathbf v) \oplus W_j$ (cf. Remark 8.36b).
(c) In the case of $\mathbf v = \rho_{\text{orth}}(\mathbf a, (B_j))$, part (b) holds for all unitary $T_j$.
(d) The representation $\mathbf v = \rho_{\text{HOSVD}}(\mathbf a, (B_j))$ is unique if the singular values $\{\sigma_i^{(j)} : 1 \le i \le r_j\}$ are simple for all $1 \le j \le d$.
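8.4.2 Tangent Space

Given $\mathbf v = \rho_{\text{TS}}(\mathbf a, (B_j)) = \mathbf B\mathbf a$ with $\mathbf a \in \bigotimes_{j=1}^d \mathbb K^{J_j}$, $B_j \in (V_j)^{r_j}$, $\mathbf B = \bigotimes_{j=1}^d B_j$, we set
$$\mathbf v(t) := \rho_{\text{TS}}\big(\mathbf a + t\mathbf b, (B_j + tC_j)\big) \quad\text{for } t \in \mathbb R,\ \mathbf b \in \bigotimes_{j=1}^d \mathbb K^{J_j},\ C_j \in (V_j)^{r_j}. \tag{8.36}$$
Again the derivative $\frac{\mathrm d\mathbf v(t)}{\mathrm dt}$ at $t = 0$ for all $\mathbf b$ and $C_j$ spans the tangent space $T_{\text{TS}}(\mathbf a, (B_j))$ (cf. Definition 7.12).

Remark 8.38. (a) The tangent space $T_{\text{TS}}(\mathbf a, (B_j))$ is the sum of $T_{\text{TS},\mathbf a}(\mathbf a, (B_j))$ and $\sum_{j=1}^d T_{\text{TS},B_j}(\mathbf a, (B_j))$:
$$T_{\text{TS},\mathbf a}(\mathbf a, (B_j)) := \mathbf U := \bigotimes_{j=1}^d U_j \quad (U_j := \operatorname{span}\{B_j\}),$$
$$T_{\text{TS},B_j}(\mathbf a, (B_j)) := \big\{ \big(\mathbf B_{[j]} \otimes C_j\big)\mathbf a : C_j \in (V_j)^{r_j} \big\},$$
where $\mathbf B_{[j]} \otimes C_j = B_1 \otimes\dots\otimes B_{j-1} \otimes C_j \otimes B_{j+1} \otimes\dots\otimes B_d$. In the case of the hybrid format (cf. §8.2.6), $T_{\text{TS},\mathbf a}(\mathbf a, (B_j)) = \{\mathbf B\mathbf b : \mathbf b \in T_{r\text{-term}}(v)\}$, where $T_{r\text{-term}}(v)$ is the tangent space of $\mathbf a = \rho_{r\text{-term}}(r, v)$.
(b) If $B_j = (b_1^{(j)},\dots,b_{r_j}^{(j)})$ represents a basis with $U_j = \operatorname{span}\{B_j\} \subset V_j$, then
$$T_{\text{TS},B_j}(\mathbf a, (B_j)) = \big\{ \big(\mathbf{id}_{[j]} \otimes L_j\big)\mathbf v : L_j \in L(U_j, V_j) \big\}.$$

Proof. (a) The chain rule applied to the first argument $\mathbf a + t\mathbf b$ yields $\mathbf B\mathbf b$. Since $\mathbf b$ varies in the whole space $\bigotimes_{j=1}^d \mathbb K^{J_j}$, all $\mathbf u \in \bigotimes_{j=1}^d U_j$ belong to $T_{\text{TS},\mathbf a}(\mathbf a, (B_j))$.
(b) Each tuple $C_j = (c_1^{(j)},\dots,c_{r_j}^{(j)}) \in (V_j)^{r_j}$ uniquely defines a map $L_j \in L(U_j,V_j)$ with $L_jb_i^{(j)} = c_i^{(j)}$ ($1 \le i \le r_j$). Since $\big(\mathbf{id}_{[j]} \otimes L_j\big)\mathbf B = \mathbf B_{[j]} \otimes C_j$ and $\mathbf v = \mathbf B\mathbf a$, the statement follows. □

$^{18}$ For $u = v + w \in U_j$ with $v \in U_j^{\min}(\mathbf v)$, $w \in W_j$, the projection is defined by $P_{W_j}u = w$.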
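Remark 8.39. (a) $\operatorname{span}\{\mathbf v\} \subset T_{\text{TS},\mathbf a}(\mathbf a, (B_j))$ and $\operatorname{span}\{\mathbf v\} \subset T_{\text{TS},B_j}(\mathbf a, (B_j))$ and therefore also $\mathbf v \in T_{\text{TS}}(\mathbf a, (B_j))$.
(b) Assume that $B_j$ ($1 \le j \le d$) represent bases. Let $\mathbf v = \mathbf B\mathbf a = \mathbf B'\mathbf a'$ be two representations with $\mathbf B' = \mathbf B\mathbf T$ and $\mathbf a' = \mathbf T^{-1}\mathbf a$ with $\mathbf T = \bigotimes_{j=1}^d T_j$ according to Remark 8.37b. Then $T_{\text{TS}}(\mathbf a, (B_j)) = T_{\text{TS}}(\mathbf a', (B_j'))$ is independent of the choice of the bases, but dependent on $U_j = \operatorname{span}\{B_j\}$. $T_{\text{TS},B_j}(\mathbf a, (B_j))$ is contained in $V_j \otimes \bigotimes_{k\neq j} U_k$.
(c) If, for all $1 \le j \le d$, $B_j$ is a basis of $U_j^{\min}(\mathbf v)$, the tangent space only depends on the tensor $\mathbf v = \rho_{\text{TS}}(\mathbf a, (B_j))$.

Proof. (a) Choose $\mathbf b$ and $C_j$ as multiples of $\mathbf a$ and $B_j$.
(b) $U_j = \operatorname{span}\{B_j\} = \operatorname{span}\{B_j'\} = \operatorname{span}\{B_jT_j\}$ holds, since $T_j$ is an isomorphism. This proves $T_{\text{TS},\mathbf a}(\mathbf a, (B_j)) = T_{\text{TS},\mathbf a}(\mathbf a', (B_j'))$. Similarly, $\big(\mathbf B_{[j]} \otimes C_j\big)\mathbf a = \big(\mathbf B_{[j]} \otimes C_j\big)\mathbf T\mathbf T^{-1}\mathbf a = \big(\mathbf B'_{[j]} \otimes C_jT_j\big)\mathbf a'$ proves $T_{\text{TS},B}(\mathbf a, (B_j)) = T_{\text{TS},B}(\mathbf a', (B_j'))$.
(c) Apply Remark 8.37b and part (b). □

For orthonormal bases, i.e., $\mathbf v = \rho_{\text{orth}}(\mathbf a, (B_j))$, we have to replace $B_j + tC_j$ in (8.36) by a path $B_j(t)$ of orthonormal bases with $B_j(0) = B_j$ and $C_j = \frac{\mathrm d}{\mathrm dt}B_j(t)|_{t=0}$. From $B_j(t)^{\mathsf H}B_j(t) = I$ one infers $C_j^{\mathsf H}B_j + B_j^{\mathsf H}C_j = 0$.

Exercise 8.40. Let $B \in \mathbb K^{I\times J}$ ($\#I \ge \#J$) satisfy $B^{\mathsf H}B = I$. Then the subspace of all $C \in \mathbb K^{I\times J}$ with $C^{\mathsf H}B + B^{\mathsf H}C = 0$ is $\{C = BS + \hat B : S \in \mathbb K^{J\times J} \text{ skew-symmetric},\ \hat B \in \mathbb K^{I\times J} \text{ with } \operatorname{range}(\hat B) \perp \operatorname{range}(B)\}$.

Remark 8.41. In the case of $\mathbf v = \rho_{\text{orth}}(\mathbf a, (B_j))$, Remarks 8.38 and 8.39 hold with $C_j$ restricted by the side condition $C_j^{\mathsf H}B_j + B_j^{\mathsf H}C_j = 0$.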
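One direction of Exercise 8.40 is easy to check numerically. The sketch below assumes the real case $\mathbb K = \mathbb R$ (so that skew-symmetric is the appropriate notion) and random test data; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 3

b, _ = np.linalg.qr(rng.standard_normal((n, r)))   # B with B^T B = I
s = rng.standard_normal((r, r))
s = s - s.T                                        # skew-symmetric S
w = rng.standard_normal((n, r))
b_perp = w - b @ (b.T @ w)                         # range(b_perp) is orthogonal to range(B)

c = b @ s + b_perp                                 # C = B S + B_hat
residual = np.linalg.norm(c.T @ b + b.T @ c)       # side condition C^T B + B^T C = 0
```

Here `residual` vanishes up to rounding because $C^{\mathsf T}B = S^{\mathsf T} = -S$ and $B^{\mathsf T}C = S$.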
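8.4.3 Sensitivity

Now we consider $t\mathbf b$ and $tC_j$ in (8.36) as perturbations and write instead $\delta\mathbf a$ and $(\delta_i^{(j)})_{i=1,\dots,r_j}$. The perturbation $\tilde{\mathbf a} := \mathbf a + \delta\mathbf a$ leads to a perturbation $\delta\mathbf v = \sum_{\mathbf i} \delta\mathbf a_{\mathbf i} \bigotimes_{j=1}^d b_{i_j}^{(j)}$ of $\mathbf v$. In the case of a general crossnorm and a general basis, we have
$$\|\delta\mathbf v\| \le \sum_{\mathbf i} |\delta\mathbf a_{\mathbf i}| \prod_{j=1}^d \big\|b_{i_j}^{(j)}\big\|.$$
For an orthonormal basis and Hilbert norm, we can use the fact that the products $\bigotimes_{j=1}^d b_{i_j}^{(j)}$ are pairwise orthonormal, and get
$$\|\delta\mathbf v\| \le \sqrt{\sum_{\mathbf i} |\delta\mathbf a_{\mathbf i}|^2} =: \|\delta\mathbf a\|,$$
where the norm on the right-hand side is the Euclidean norm.

For perturbations of the basis vectors we give only a differential analysis; i.e., we only consider a small perturbation in a single component. Furthermore, we assume that the vectors $b_i^{(j)}$ form orthonormal bases. Without loss of generality we may suppose that $b_1^{(1)}$ is perturbed into $b_1^{(1)} + \delta_1^{(1)}$. Then
$$\tilde{\mathbf v} = \mathbf v + \sum_{i_2,\dots,i_d} \mathbf a[1,i_2,\dots,i_d]\; \delta_1^{(1)} \otimes b_{i_2}^{(2)} \otimes b_{i_3}^{(3)} \otimes\dots\otimes b_{i_d}^{(d)},$$
i.e., $\delta\mathbf v = \sum_{i_2\cdots i_d} \mathbf a[1,i_2,\dots,i_d]\; \delta_1^{(1)} \otimes \bigotimes_{j=2}^d b_{i_j}^{(j)}$. Terms with different $(i_2,\dots,i_d)$ are orthogonal. Therefore
$$\|\delta\mathbf v\| = \big\|\delta_1^{(1)}\big\| \sqrt{\sum_{i_2,\dots,i_d} \big|\mathbf a[1,i_2,\dots,i_d]\big|^2}.$$
As a consequence, for small perturbations $\delta_i^{(j)}$ of all vectors $b_i^{(j)}$, the first order approximation is
$$\|\delta\mathbf v\| \approx \sum_{j=1}^d \sum_{\ell} \big\|\delta_\ell^{(j)}\big\| \sqrt{\sum_{i_1,\dots,i_{j-1},i_{j+1},\dots,i_d} \big|\mathbf a[i_1,\dots,i_{j-1},\ell,i_{j+1},\dots,i_d]\big|^2} \;\le\; \|\mathbf v\| \sum_{j=1}^d \sqrt{\sum_\ell \big\|\delta_\ell^{(j)}\big\|^2} \;\le\; \|\mathbf v\|\,\sqrt d\,\sqrt{\sum_{j,\ell} \big\|\delta_\ell^{(j)}\big\|^2}. \tag{8.37}$$
Here we have used the Schwarz inequality
$$\sum_\ell \big\|\delta_\ell^{(j)}\big\| \sqrt{\sum_{i_1,\dots,i_{j-1},i_{j+1},\dots,i_d} \big|\mathbf a[i_1,\dots,i_{j-1},\ell,i_{j+1},\dots,i_d]\big|^2} \;\le\; \sqrt{\sum_\ell \big\|\delta_\ell^{(j)}\big\|^2}\; \sqrt{\sum_{\mathbf i} |\mathbf a_{\mathbf i}|^2}$$
together with $\sum_{\mathbf i} |\mathbf a_{\mathbf i}|^2 = \|\mathbf v\|^2$ (cf. Exercise 8.19). The last inequality in (8.37) is again Schwarz' inequality.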
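The bound (8.37) can be probed with random data. The sketch below assumes real orthonormal bases, $d = 3$, and random perturbations of size about $10^{-3}$; it evaluates the first-order part of $\delta\mathbf v$ exactly (one basis replaced by its perturbation at a time) and compares its norm with the right-hand side of (8.37). All names are illustrative.

```python
import numpy as np

def tucker_to_full(core, mats):
    """Evaluate (M_1 x ... x M_d) core as a dense array."""
    t = core
    for j, m in enumerate(mats):
        t = np.moveaxis(np.tensordot(m, t, axes=(1, j)), 0, j)
    return t

rng = np.random.default_rng(1)
dims, ranks = (8, 7, 6), (3, 2, 3)
a = rng.standard_normal(ranks)
bases = [np.linalg.qr(rng.standard_normal((n, r)))[0] for n, r in zip(dims, ranks)]
deltas = [1e-3 * rng.standard_normal((n, r)) for n, r in zip(dims, ranks)]

# First-order part of the perturbation: sum over j with B_j replaced by delta_j.
dv = sum(tucker_to_full(a, bases[:j] + [deltas[j]] + bases[j + 1:]) for j in range(3))

norm_v = np.linalg.norm(tucker_to_full(a, bases))      # equals ||a|| for orthonormal bases
lhs = np.linalg.norm(dv)
bound = norm_v * np.sqrt(3) * np.sqrt(sum(np.linalg.norm(dl) ** 2 for dl in deltas))
```

For such data `lhs` stays below `bound`; the gap reflects the two applications of the Schwarz inequality.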
8.5 Conversions between Different Formats

So far, three formats (full representation, $\mathcal R_r$, $\mathcal T_{\mathbf r}$) have been introduced; in addition, there is the hybrid format $\mathcal R_r \cap \mathcal T_{\mathbf r}$. A natural question is how to convert one format into another one. It will turn out that conversions between $\mathcal R_r$ and $\mathcal T_{\mathbf r}$ lead to the hybrid format introduced in §8.2.6. The conversions $\mathcal R_r \to \mathcal T_{\mathbf r}$ and $\mathcal T_{\mathbf r} \to \mathcal R_r$ are described in §8.5.2 and §8.5.3. The mapping from $\mathcal R_r$ into the HOSVD representation has already been mentioned in §8.3.3.2. For completeness, the full format is considered in §8.5.1 (see also §7.6.1).

8.5.1 Conversion from Full Representation into Tensor Subspace Format

Assume that a tensor $\mathbf v \in \mathbf V := \bigotimes_{j=1}^d \mathbb K^{n_j}$ is given in full representation. The translation into tensor subspace format is $\mathbf v = \sum_{\mathbf i\in\mathbf J} \mathbf v_{\mathbf i} \bigotimes_{j=1}^d b_{i_j}^{(j)}$ with the unit basis vectors $b_i^{(j)} := e^{(i)} \in \mathbb K^{n_j}$ in (2.2). Here the tensor $\mathbf v$ and its coefficient tensor are identical. The memory cost of the tensor subspace format is even larger because of the additional basis vectors. To reduce the memory, we may determine the minimal subspaces $U_j^{\min}(\mathbf v) \subset \mathbb K^{n_j}$, e.g., by the HOSVD representation. If $r_j = \dim(U_j^{\min}(\mathbf v)) < n_j$, the memory cost $\prod_{j=1}^d n_j$ is reduced to $\prod_{j=1}^d r_j$.

8.5.2 Conversion from $\mathcal R_r$ to $\mathcal T_{\mathbf r}$

The letter 'r' is the standard variable name for all kinds of ranks. Here we have to distinguish the tensor rank or representation rank $r$ in $\mathbf v \in \mathcal R_r$ from the vector-valued tensor subspace rank $\mathbf r$ with components $r_j$.

8.5.2.1 Theoretical Statements

First, we recall the special situation of $d = 2$ (matrix case). The matrix rank is equal to all ranks introduced for tensors: matrix-rank $=$ tensor-rank $= \operatorname{rank}_1 = \operatorname{rank}_2$. Therefore there is an $r$-term representation $\mathbf v = \sum_{i=1}^r v_i^{(1)} \otimes v_i^{(2)}$ with $r = \operatorname{rank}(\mathbf v)$. Since $\{v_i^{(1)} : 1 \le i \le r\}$ and $\{v_i^{(2)} : 1 \le i \le r\}$ are sets of linearly independent vectors, they can be used as bases and yield a tensor subspace representation (8.5b) for $\mathbf r = (r,r)$ with the coefficients $\mathbf a_{ij} = \delta_{ij}$. The singular-value decomposition yields another $r$-term representation $\mathbf v = \sum_{i=1}^r \sigma_i\, u_i \otimes v_i$, which is an orthonormal tensor subspace representation (8.12) (with orthonormal bases $\{u_i\}$, $\{v_i\}$, and coefficients $\mathbf a_{ij} = \delta_{ij}\sigma_i$). The demonstrated identity $\mathcal R_r = \mathcal T_{(r,r)}$ of the formats does not extend to $d \ge 3$.
Given an $r$-term representation
$$\mathbf v = \sum_{i=1}^r \bigotimes_{j=1}^d v_i^{(j)}, \tag{8.38}$$
Lemma 6.1 states that $\mathbf v \in {}_a\!\bigotimes_{j=1}^d U_j$ (cf. (6.1)) with $U_j := \operatorname{span}\{v_1^{(j)},\dots,v_r^{(j)}\}$. This proves $\mathbf v \in \mathcal T_{\mathbf r}$ for $\mathbf r = (r_1,\dots,r_d)$, $r_j := \dim(U_j)$.

We recall the relations between the different ranks.

Theorem 8.42. (a) The minimal tensor subspace rank $\mathbf r = (r_1,\dots,r_d)$ of $\mathbf v \in \mathbf V$ is given by $r_j = \operatorname{rank}_j(\mathbf v)$ (cf. (5.6b)). The tensor rank $r = \operatorname{rank}(\mathbf v)$ (cf. Definition 3.35) satisfies $r \ge r_j$. The tensor rank may depend on the field: $r_{\mathbb R} \ge r_{\mathbb C} \ge r_j$ (cf. Proposition 3.44), while $r_j = \operatorname{rank}_j(\mathbf v)$ is independent of the field.
(b) The inequalities of Part (a) are also valid for the border rank defined in (9.11) instead of $r$, $r_{\mathbb R}$, $r_{\mathbb C}$.

Proof. Part (a) follows from Remark 6.24. In the case of (b), $\mathbf v$ is the limit of a sequence $\mathbf v_n \in \mathcal R_r$. There are minimal subspaces $U_j^{\min}(\mathbf v_n)$ ($1 \le j \le d$, $n \in \mathbb N$) of dimension $r_{j,n}$ satisfying $r := \underline{\operatorname{rank}}(\mathbf v) \ge r_{j,n}$. By Theorem 6.29 the minimal subspace $U_j^{\min}(\mathbf v)$ has dimension
$$r_j := \dim(U_j^{\min}(\mathbf v)) \le \liminf_{n\to\infty} r_{j,n} \le r.$$
Hence $r \ge r_j$ is proved. □

Using a basis $B_j$ of $U_j = \operatorname{span}\{v_1^{(j)},\dots,v_r^{(j)}\}$, we shall construct a tensor subspace representation $\mathbf v = \rho_{\text{TS}}\big(\mathbf a, (B_j)_{j=1}^d\big)$ in §8.5.2.3. In general, these subspaces $U_j$ may be larger than necessary; however, under the assumptions of Proposition 7.10, $U_j = U_j^{\min}(\mathbf v)$ are the minimal subspaces. This proves the following statement.

Remark 8.43. If $\mathbf v$ is given by the $r$-term representation (8.38) with $r = \operatorname{rank}(\mathbf v)$, constructions based$^{19}$ on $U_j := \operatorname{span}\{v_1^{(j)},\dots,v_r^{(j)}\}$ yield $\mathbf v = \rho_{\text{TS}}\big(\mathbf a, (B_j)_{j=1}^d\big)$ with $r_j = \operatorname{rank}_j(\mathbf v)$.

Having converted $\mathbf v = \rho_{r\text{-term}}(\dots)$ into $\mathbf v = \rho_{\text{TS}}\big(\mathbf a, (B_j)_{j=1}^d\big)$, the next statement describes the $r$-term structure of the coefficient tensor $\mathbf a$. This helps, in particular, to obtain the hybrid format in §8.2.6.

$^{19}$ Representations with $r_j = \operatorname{rank}_j(\mathbf v)$ can be obtained anyway by HOSVD. These decompositions, however, cannot be obtained from $U_j$ alone.
Theorem 8.44. Let $\mathbf v = \rho_{\text{TS}}\big(\mathbf a, (B_j)_{j=1}^d\big)$ be any tensor subspace representation with the coefficient tensor $\mathbf a \in \mathbb K^{\mathbf J}$ and bases$^{20}$ $B_j$. Then the ranks of $\mathbf v$ and $\mathbf a$ coincide in two different meanings. First, the true tensor ranks satisfy
$$\operatorname{rank}(\mathbf a) = \operatorname{rank}(\mathbf v).$$
Second, given (8.38) with representation rank $r$, the $r$-term representation of $\mathbf a$ (with same number $r$) can be obtained constructively as detailed in §8.5.2.3. The resulting tensor subspace representation of $\mathbf v$ is the hybrid format (8.18).

Proof. Consider $\mathbf v$ as an element of $\bigotimes_{j=1}^d U_j$ with $U_j = \operatorname{span}\{v_1^{(j)},\dots,v_r^{(j)}\} \cong \mathbb K^{J_j}$. By Lemma 3.39a, the rank is invariant. This proves $\operatorname{rank}(\mathbf a) = \operatorname{rank}(\mathbf v)$. The second part of the statement follows from the constructions in §8.5.2.3. □

$^{20}$ For general frames $B_j$ the coefficient tensor is not uniquely defined and, in fact, different (equivalent) coefficient tensors may have different ranks. However, the minimum of $\operatorname{rank}(\mathbf a)$ over all equivalent $\mathbf a$ coincides with $\operatorname{rank}(\mathbf v)$.

8.5.2.2 Conversion into General Tensor Subspace Format

A rather trivial translation of $\mathbf v = \sum_{i=1}^r \bigotimes_{j=1}^d v_i^{(j)} \in \mathcal R_r$ into $\mathbf v \in \mathcal T_{\mathbf r}$ with $\mathbf r = (r,\dots,r)$ can be obtained without any arithmetic cost by choosing the frames $B_j = [v_1^{(j)},\dots,v_r^{(j)}]$ and the diagonal coefficient tensor
$$\mathbf a[i,\dots,i] = 1 \text{ for } 1 \le i \le r, \quad\text{and } \mathbf a[\mathbf i] = 0 \text{ otherwise}. \tag{8.39}$$
Obviously, $\mathbf v = \rho_{r\text{-term}}\big(r, (v_i^{(j)})\big) = \rho_{\text{TS}}\big(\mathbf a, (B_j)_{j=1}^d\big)$ holds. Note that there is no guarantee that the frames are bases; i.e., the subspaces $U_j = \operatorname{range}(B_j)$ in (6.2b) may have a dimension smaller than $r$ (cf. Remark 3.42). The diagonal tensor $\mathbf a$ is a particular case of a sparse tensor (cf. (7.5)).

8.5.2.3 Conversion into Orthonormal Hybrid Tensor Subspace Format

Let $V_j = \mathbb K^{I_j}$. An orthonormal basis of the subspace $U_j$ from above can be obtained by a QR decomposition of the matrix $A_j := [v_1^{(j)} \cdots v_r^{(j)}] \in \mathbb K^{I_j\times r}$. Procedure RQR in (2.26) yields $A_j = B_jR_j$ with an orthogonal matrix $B_j \in \mathbb K^{I_j\times r_j}$, where $r_j = \operatorname{rank}(A_j) = \dim(U_j)$. The second matrix $R_j$ allows the representations $v_k^{(j)} = B_jR_j[\bullet,k] = \sum_{i=1}^{r_j} r_{i,k}^{(j)}\,b_i^{(j)}$. From this we derive
$$\mathbf v = \sum_{k=1}^r \bigotimes_{j=1}^d v_k^{(j)} = \sum_{k=1}^r \bigotimes_{j=1}^d \sum_{i_j=1}^{r_j} r_{i_j,k}^{(j)}\,b_{i_j}^{(j)} = \sum_{i_1=1}^{r_1}\cdots\sum_{i_d=1}^{r_d} \underbrace{\Big[\sum_{k=1}^r \prod_{j=1}^d r_{i_j,k}^{(j)}\Big]}_{\mathbf a[i_1\cdots i_d]} \bigotimes_{j=1}^d b_{i_j}^{(j)}, \tag{8.40a}$$
proving $\mathbf v = \rho_{\text{orth}}\big(\mathbf a, (B_j)_{j=1}^d\big)$ with the coefficient tensor $\mathbf a$ described in the $r$-term format
$$\mathbf a = \sum_{k=1}^r \bigotimes_{j=1}^d r_k^{(j)}, \qquad r_k^{(j)} := \big(r_{i,k}^{(j)}\big)_{i=1}^{r_j} \in \mathbb K^{r_j}. \tag{8.40b}$$
This yields the orthonormal hybrid format $\mathbf v = \rho^{\text{hybr}}_{\text{orth}}\big(r, (r_k^{(j)}), (B_j)_{j=1}^d\big)$ (cf. (8.19)).

The conversion cost caused by the QR decomposition is $\sum_{j=1}^d N_{\mathrm{QR}}(n_j,r)$ with $n_j := \#I_j$. If $r \le n := \max_j n_j$, the cost is bounded by $2dnr^2$.

The construction from above can equivalently be obtained by the following two steps: (i) apply the approach of §8.5.2.2 and (ii) perform an orthonormalisation of the frame as described in §8.2.4. Note that the transformation to new orthonormal bases destroys the sparsity of the coefficient tensor (8.39).
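Both conversions of §8.5.2.2 and §8.5.2.3 can be sketched with NumPy. The sketch assumes real factor matrices `factors[j]` of size $n_j \times r$ with $r \le n_j$ and uses the plain `numpy.linalg.qr`, which does not reduce the rank as the procedure RQR from (2.26) does; the function names are hypothetical.

```python
import numpy as np

def rterm_to_tucker_trivial(factors):
    """(8.39): frames B_j = [v_1^(j) ... v_r^(j)] and a superdiagonal coefficient tensor."""
    r, d = factors[0].shape[1], len(factors)
    core = np.zeros((r,) * d)
    core[(np.arange(r),) * d] = 1.0          # a[i, ..., i] = 1, all other entries 0
    return core, [f.copy() for f in factors]

def rterm_to_orth_hybrid(factors):
    """(8.40a,b): A_j = B_j R_j; column k of R_j is the vector r_k^(j)."""
    decomps = [np.linalg.qr(f) for f in factors]
    bases = [q for q, _ in decomps]
    coeff_factors = [r for _, r in decomps]
    return bases, coeff_factors
```

In the hybrid result the coefficient tensor is never formed; it is kept as the $r$-term tensor (8.40b) given by the columns of the matrices $R_j$.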
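8.5.2.4 Case of Large r

As seen from (8.40a,b), the representation rank $r$ of $\mathbf v \in \mathcal R_r(\mathbb K^{\mathbf I})$ is inherited by the coefficient tensor $\mathbf a \in \mathcal R_r(\mathbb K^{\mathbf J})$. In §7.6.2 the conversion of $\mathbf v \in \mathcal R_r$ into full format is proposed, provided that $r > N := \prod_{j=1}^d n_j / \max_{1\le i\le d} n_i$. Now the same consideration can be applied to $\mathbf a \in \mathcal R_r$, provided that
$$r > R := \prod_{j=1}^d r_j \Big/ \max_{1\le i\le d} r_i, \tag{8.41}$$
where $r_j$ is obtained in §8.5.2.3 as size of the basis $B_j \in \mathbb K^{I_j\times J_j}$, $J_j = \{1,\dots,r_j\}$. Lemma 7.20 and Remark 7.21 show that a conversion of $\mathbf a \in \mathcal R_r(\mathbb K^{\mathbf J})$ into full format or $R$-term format $\mathbf a \in \mathcal R_R(\mathbb K^{\mathbf J})$ requires $2r\prod_{j=1}^d r_j$ operations. The cost to obtain $\mathbf a \in \mathcal R_r$ is $\sum_{j=1}^d N_{\mathrm{QR}}(n_j,r) \le 2r\sum_{j=1}^d n_j\min\{r,n_j\}$ (cf. §8.5.2.3). This yields the following result.

Lemma 8.45. Assume $\mathbf v = \rho_{r\text{-term}}\big(r, (v_\nu^{(j)})\big)$ with $r$ satisfying (8.41). Then $\mathbf v$ can be converted into $\mathbf v = \rho_{\text{orth}}\big(\mathbf a, (B_j)_{j=1}^d\big)$ or a hybrid format with $\mathbf a \in \mathcal R_R(\mathbb K^{\mathbf J})$ requiring $2r\big(\prod_{j=1}^d r_j + \sum_{j=1}^d n_j\min\{r,n_j\}\big)$ operations.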
8.5.3 Conversion from $\mathcal T_{\mathbf r}$ to $\mathcal R_r$

The tensor subspace representation
$$\mathbf v = \sum_{k_1=1}^{r_1}\sum_{k_2=1}^{r_2}\cdots\sum_{k_d=1}^{r_d} \mathbf a[k_1k_2\cdots k_d]\; b_{k_1}^{(1)} \otimes b_{k_2}^{(2)} \otimes\dots\otimes b_{k_d}^{(d)}$$
from (8.5b) or (8.12) is an $r$-term representation of $\mathbf v$ with $r := \prod_{j=1}^d r_j$ terms. To reach the format (7.7a): $\mathbf v = \sum_{\mathbf k} \bigotimes_{j=1}^d v_{\mathbf k}^{(j)}$, the vectors $v_{\mathbf k}^{(1)}$ for $j = 1$ could be defined by $\mathbf a[k_1k_2\cdots k_d]\,b_{k_1}^{(1)}$, while $v_{\mathbf k}^{(j)} := b_{k_j}^{(j)}$ for $j > 1$. Following the proof of Lemma 3.45, an improvement is possible. Choose the largest $r_\ell$. Without loss of generality, assume $r_1 \ge r_j$ for all $j$. Rewrite $\mathbf v$ as
$$\mathbf v = \sum_{k_2=1}^{r_2}\cdots\sum_{k_d=1}^{r_d} \underbrace{\Big(\sum_{k_1=1}^{r_1} \mathbf a[k_1k_2\cdots k_d]\; b_{k_1}^{(1)}\Big)}_{=:\ \hat b^{(1)}[k_2\cdots k_d]} \otimes\, b_{k_2}^{(2)} \otimes\dots\otimes b_{k_d}^{(d)}. \tag{8.42}$$
This is an $r$-term representation of $\mathbf v$ with $r := \prod_{j=2}^d r_j$ terms.

Because of the presumably very large number $r$ of terms, the storage requirement $r\sum_{j=1}^d \operatorname{size}(U_j)$ of the $r$-term representation of $\mathbf v$ seems huge. An improvement can be based on the fact that the $r$ factors $v_\nu^{(j)}$ ($1 \le \nu \le r$) in $\mathbf v = \sum_{\nu=1}^r \bigotimes_{j=1}^d v_\nu^{(j)}$ for $j \ge 2$ are not necessarily $r$ linearly independent vectors but can be expressed by the basis $(b_i^{(j)})_{i=1}^{r_j}$, which is already stored (cf. (7.14)). This again leads to the hybrid format, now considered as a particular case of the $r$-term format (cf. (8.20)).

Next, we assume that $\mathbf v \in \mathcal T_{\mathbf r}$ is given in the hybrid format in (8.18):
$$\mathbf v = \sum_{i_1=1}^{r_1}\cdots\sum_{i_d=1}^{r_d} \Big(\sum_{\nu=1}^r \prod_{j=1}^d a_\nu^{(j)}[i_j]\Big) \bigotimes_{j=1}^d b_{i_j}^{(j)}.$$
Using the reformulation (8.42), we have to compute the vectors
$$\hat b^{(1)}[i_2\cdots i_d] := \sum_{i_1=1}^{r_1} \Big(\sum_{\nu=1}^r \prod_{j=1}^d a_\nu^{(j)}[i_j]\Big)\, b_{i_1}^{(1)} \in V_1 = \mathbb K^{n_1}.$$
As above, the direction $k = 1$ is chosen because of the assumption $r_1 \ge r_j$ for all $j$ so that the resulting representation rank $N = \prod_{j=2}^d r_j$ is minimal.

Remark 8.46. Let $\mathbf v$ be given in the hybrid format in (8.18) with $r_j$, $r$ as above. Conversion into $N$-term format with $N = \prod_{j=2}^d r_j$ requires $N\big((d-1)\,r + 2n_1r_1\big)$ operations.
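The reformulation (8.42) amounts to one matrix product and an index bookkeeping step. The sketch below assumes a dense real coefficient tensor `core` and basis matrices `bases[j]` of size $n_j \times r_j$, with direction 1 (index 0 in the code) already being the one with the largest $r_j$; the function name is hypothetical.

```python
import numpy as np

def tucker_to_rterm(core, bases):
    """(8.42): absorb direction 1 into the terms; the result has prod_{j>=2} r_j terms."""
    r1, rest = core.shape[0], core.shape[1:]
    n_terms = int(np.prod(rest))
    # Column k of the first factor is b_hat^(1)[k_2 ... k_d] = sum_{k_1} a[k_1, k_2, ...] b_{k_1}^(1).
    factors = [bases[0] @ core.reshape(r1, n_terms)]
    idx = np.unravel_index(np.arange(n_terms), rest)   # multi-index (k_2, ..., k_d) of each term
    for j, b in enumerate(bases[1:]):
        factors.append(b[:, idx[j]])                   # repeated basis vectors b_{k_j}^(j)
    return factors
```

The factors for the directions $j \ge 2$ only repeat the stored basis vectors, which is why a hybrid storage scheme is preferable to storing all columns explicitly.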
8.5.4 A Comparison of Both Representations

We summarise the results from above.

Remark 8.47. (a) If $\mathbf v \in \mathcal R_r$, then $\mathbf v \in \mathcal T_{\mathbf r}$ with $\mathbf r = (r,\dots,r)$.
(b) If $\mathbf v \in \mathcal R_r$, a hybrid format $\mathcal R_r \cap \mathcal T_{\mathbf r}$ can be constructed with $\mathbf r = (r_1,\dots,r_d)$, $r_j = \operatorname{rank}_j(\mathbf v)$.
(c) If $\mathbf v \in \mathcal T_{\mathbf r}$ with $\mathbf r = (r_1,\dots,r_d)$, then $\mathbf v \in \mathcal R_r$ with $r := \prod_{j=1}^d r_j \big/ \max_{1\le j\le d} r_j$.

For simplification, we now assume that $V_j = \mathbb K^n$ (i.e., the dimension $n$ is independent of $j$). The transfer between both representations is quite nonsymmetric. According to Remark 8.47a, $\mathbf v \in \mathcal R_r$ yields $\mathbf v \in \mathcal T_{\mathbf r}$ with $\mathbf r = (r,r,\dots,r)$. Note that vectors of a similar size must be stored:
$$N_{\text{mem}}^{r\text{-term}} = r\cdot d\cdot n = N_{\text{mem}}^{\text{TSR}}\big((B_j)_{1\le j\le d}\big) \qquad\text{(cf. (7.8c) and (8.6a))}.$$
Additionally, the tensor subspace representation needs storage for the coefficient tensor $\mathbf a$:
$$N_{\text{mem}}^{\text{TSR}}(\mathbf a) = r^d$$
(cf. Remark 8.8b). This large additional memory cost makes the tensor subspace representation clearly less advantageous. The hybrid format from Remark 8.47b needs the storage
$$N_{\text{mem}}^{\text{hybr}} = \sum_{j=1}^d (n + r)\,r_j,$$
which may be smaller than $N_{\text{mem}}^{r\text{-term}}$ if $r_j < r < n$.

On the other hand, if a tensor $\mathbf v \in \mathcal T_{\mathbf r}$ with $\mathbf r = (r,\dots,r)$ is converted into the $r^{d-1}$-term representation from (8.42), the latter format requires storage of size
$$N_{\text{mem}}^{N\text{-term}} = r^{d-1}\cdot d\cdot n.$$
Since $r \le n$ (and usually $r \ll n$), the inequality
$$N_{\text{mem}}^{\text{TSR}} = r\cdot d\cdot n + r^d \;\ll\; r^{d-1}\cdot n \;<\; r^{d-1}\cdot d\cdot n = N_{\text{mem}}^{r\text{-term}}$$
indicates that the tensor subspace representation is far better.

The previous examples underline that none of the formats $\mathcal R_r$ or $\mathcal T_{\mathbf r}$ are, in general, better than the other. It depends on the nature of the tensor what format is preferable. Often, the hybrid format is the best compromise.
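The storage formulas of this subsection are easy to tabulate. The following sketch only evaluates them for one illustrative choice of sizes ($n = 1000$, $d = 4$, $r = 10$, hybrid ranks $r_j = 5$), which is an assumption made for the example and not taken from the text.

```python
def mem_rterm(r, d, n):
    """Storage of an r-term representation: r * d * n."""
    return r * d * n

def mem_tucker(r, d, n):
    """Storage of a tensor subspace representation with rank (r, ..., r): bases plus coefficient tensor."""
    return r * d * n + r ** d

def mem_hybrid(r, ranks, n):
    """Storage of the hybrid format: sum over j of (n + r) * r_j."""
    return sum((n + r) * rj for rj in ranks)

n, d, r = 1000, 4, 10
rterm = mem_rterm(r, d, n)                    # 40000
tucker = mem_tucker(r, d, n)                  # 40000 + 10**4 = 50000
hybrid = mem_hybrid(r, (5, 5, 5, 5), n)       # 4 * 1010 * 5 = 20200
converted = mem_rterm(r ** (d - 1), d, n)     # r^(d-1)-term format from (8.42): 4000000
```

In this example the hybrid format needs the least storage, while converting a tensor subspace representation into $r^{d-1}$ terms multiplies the storage by a factor of 80.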
8.5.5 r-Term Format for Large r > N

In §7.6.3 we discussed the case of $r > N$, where $N$ is the bound of the maximal rank in (8.43). In particular in the case of $d = 3$, we may use an intermediate tensor subspace representation (and, possibly, approximation tools for this format: see §10). Here we assume that a tensor is represented in the $r$-term format,
$$\mathbf v = \sum_{i=1}^r v_i^{(1)} \otimes v_i^{(2)} \otimes v_i^{(3)} \in \mathbb K^{n_1\times n_2\times n_3},$$
with rather large $r$. The term 'rather large' may mean
$$r > N := \min\{n_1n_2,\ n_1n_3,\ n_2n_3\}. \tag{8.43}$$
Instead of a large $r$, we may also assume $N$ to be rather small. In this case the following procedure yields an exact $N'$-term representation with $N' \le N < r$.
Step 1. Convert $\mathbf v$ from $r$-term format into tensor subspace format (hybrid variant in §8.2.6).
Step 2. Convert $\mathbf v$ back into $N'$-term format with $N' \le N$ (concerning $N$ in (8.43), see §8.5.3).
8.6 Joining two Tensor Subspace Representation Systems

Let $\mathbf v' = \rho_{\text{TS}}\big(\mathbf a', (B_j')_{j=1}^d\big)$ and $\mathbf v'' = \rho_{\text{TS}}\big(\mathbf a'', (B_j'')_{j=1}^d\big)$ be two tensors from $\mathbf V = \bigotimes_{j=1}^d V_j$, involving different subspaces $U_j'$ and $U_j''$. Obviously, the sum $\mathbf v' + \mathbf v''$ requires the spaces $U_j$ defined by
$$U_j := U_j' + U_j'' \qquad\text{for } 1 \le j \le d.$$
A common system $B_j$ spanning $U_j$ has to be constructed. Then the coefficient tensors $\mathbf a'$ of $\mathbf v'$ and $\mathbf a''$ of $\mathbf v''$ have to be transformed according to the new bases.
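8.6.1 Trivial Joining of Frames

The least requirement is that
$$B_j' = \big[b_1^{\prime(j)}, b_2^{\prime(j)},\dots,b_{r_j'}^{\prime(j)}\big] \in (U_j')^{r_j'} \quad\text{and}\quad B_j'' = \big[b_1^{\prime\prime(j)}, b_2^{\prime\prime(j)},\dots,b_{r_j''}^{\prime\prime(j)}\big] \in (U_j'')^{r_j''}$$
are frames spanning the respective subspaces $U_j'$ and $U_j''$. The respective index sets are $\mathbf J' = \times_{j=1}^d J_j'$ and $\mathbf J'' = \times_{j=1}^d J_j''$ with $J_j' = \{1,\dots,r_j'\}$ and $J_j'' = \{1,\dots,r_j''\}$. Since no linear independence is required, the simple definition
$$B_j := \big[B_j'\ B_j''\big] = \big[b_1^{\prime(j)},\dots,b_{r_j'}^{\prime(j)},\ b_1^{\prime\prime(j)},\dots,b_{r_j''}^{\prime\prime(j)}\big], \qquad J_j := \{1,\dots,r_j\} \text{ with } r_j := r_j' + r_j'',$$
yields a frame associated with the subspace $U_j := U_j' + U_j''$. The representation rank $r_j$ is the sum of the previous ones, even when the subspaces $U_j'$ and $U_j''$ overlap.

An advantage is the easy construction of the coefficients. The columns of $B_j = [b_1^{(j)},\dots,b_{r_j}^{(j)}]$ are $b_i^{(j)} := b_i^{\prime(j)}$ for $1 \le i \le r_j'$ and $b_{i+r_j'}^{(j)} := b_i^{\prime\prime(j)}$ for $1 \le i \le r_j''$. The coefficient $\mathbf a'$ of
$$\mathbf v' = \sum_{\mathbf i'\in\mathbf J'} \mathbf a'_{\mathbf i'} \bigotimes_{j=1}^d b_{i_j'}^{\prime(j)} \in \bigotimes_{j=1}^d U_j' \tag{8.44a}$$
becomes
$$\mathbf v' = \sum_{\mathbf i\in\mathbf J} \mathbf a_{\mathbf i} \bigotimes_{j=1}^d b_{i_j}^{(j)} \in \bigotimes_{j=1}^d U_j \tag{8.44b}$$
with $\mathbf a_{\mathbf i} := \mathbf a'_{\mathbf i}$ for $\mathbf i \in \mathbf J'$ and $\mathbf a_{\mathbf i} := 0$ for $\mathbf i \in \mathbf J\setminus\mathbf J'$. Analogously, a coefficient $\mathbf a''$ of $\mathbf v'' = \sum_{\mathbf i''\in\mathbf J''} \mathbf a''_{\mathbf i''} \bigotimes_{j=1}^d b_{i_j''}^{\prime\prime(j)}$ becomes $\mathbf a_{\mathbf i+\mathbf r'} := \mathbf a''_{\mathbf i}$ for $\mathbf i \in \mathbf J''$ with $\mathbf r' = (r_1',\dots,r_d')$, and $\mathbf a_{\mathbf i} := 0$ otherwise.

Remark 8.48. The joining of frames requires only a rearrangement of data but no arithmetic operations.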
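For dense coefficient tensors, the joining described above and the resulting representation of the sum $\mathbf v' + \mathbf v''$ can be sketched as follows. The function name `join_frames` and the dense real storage are assumptions made for the illustration.

```python
import numpy as np

def join_frames(core1, frames1, core2, frames2):
    """Sketch of 8.6.1: B_j = [B'_j B''_j]; v' + v'' gets a block-diagonal coefficient tensor."""
    frames = [np.hstack([f1, f2]) for f1, f2 in zip(frames1, frames2)]
    core = np.zeros(tuple(s1 + s2 for s1, s2 in zip(core1.shape, core2.shape)))
    core[tuple(slice(0, s) for s in core1.shape)] = core1       # a_i := a'_i for i in J'
    core[tuple(slice(s, None) for s in core1.shape)] = core2    # a_{i + r'} := a''_i for i in J''
    return core, frames
```

As stated in Remark 8.48, only data are copied; the price is that the new ranks are $r_j' + r_j''$ even if the subspaces overlap.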
8.6.2 Common Bases

Assume that $B'_j$, $B''_j$ are the respective bases of $U'_j$, $U''_j$. We want to construct a new, common basis $B_j$ for the sum $U_j = U'_j + U''_j$. Applying the procedure
\[ \text{JoinBases}(B'_j, B''_j, r_j, B_j, T'_j, T''_j) \tag{8.45a} \]
in (2.32), we produce a common basis $B_j = [\, b_1^{(j)}, \ldots, b_{r_j}^{(j)} \,]$ of dimension $r_j$ and transformation matrices $T'_j$ and $T''_j$ satisfying (8.45b) (cf. (2.31)):
\[ B'_j = B_j T'_j \quad\text{and}\quad B''_j = B_j T''_j. \tag{8.45b} \]
Lemma 8.49. Let $\mathbf{a}' \in \mathbb{K}^{\mathbf{J}'}$ be the coefficient tensor of $\mathbf{v}'$ in (8.44a) with respect to the bases $B'_j$. The coefficient tensor $\mathbf{a}'_{\mathrm{new}} \in \mathbb{K}^{\mathbf{J}}$ of $\mathbf{v}'$ with respect to the bases $B_j$ satisfying (8.45b) is given by
\[ \mathbf{a}'_{\mathrm{new}} = \Big( \bigotimes_{j=1}^d T'_j \Big) \mathbf{a}' \]
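(cf. (8.44b)). Similarly, the coefficient tensor $\mathbf{a}'' \in \mathbb{K}^{\mathbf{J}''}$ representing $\mathbf{v}'' \in \bigotimes_{j=1}^d U''_j$ transforms into $\mathbf{a}''_{\mathrm{new}} = \big( \bigotimes_{j=1}^d T''_j \big) \mathbf{a}'' \in \mathbb{K}^{\mathbf{J}}$ with respect to the bases $B_j$.

A possible option in procedure JoinBases is to take $B'_j$ as the first part of $B_j$, which leads to $T'_j = \begin{bmatrix} I \\ 0 \end{bmatrix}$.

Remark 8.50. Assume $V_j = \mathbb{K}^{n_j}$. The cost of (8.45a) is $N_{\mathrm{QR}}(n_j, r'_j + r''_j)$. The coefficient tensor $\mathbf{a}'_{\mathrm{new}}$ is without arithmetic cost under the option mentioned in Lemma 8.49, while $\mathbf{a}''_{\mathrm{new}}$ requires $2 \sum_{j=1}^d \big( \prod_{\ell=1}^j r''_\ell \big) \big( \prod_{\ell=j}^d r_\ell \big)$ operations. If $n_j \le n$ and $r_j \le r$, the total cost can be estimated by $8dnr^2 + 2dr^{d+1}$.

The cost for the basis transformation is much smaller if $\mathbf{v}'$ and $\mathbf{v}''$ are given in hybrid format. We use that
\[ \mathbf{a}'' = \sum_{\nu=1}^{r''} \bigotimes_{j=1}^d a_\nu^{\prime\prime(j)} \quad\text{and}\quad \mathbf{a}''_{\mathrm{new}} = \Big( \bigotimes_{j=1}^d T''_j \Big) \mathbf{a}'' = \sum_{\nu=1}^{r''} \bigotimes_{j=1}^d a_{\nu,\mathrm{new}}^{\prime\prime(j)} \]
with $a_{\nu,\mathrm{new}}^{\prime\prime(j)} = T''_j a_\nu^{\prime\prime(j)}$ cost $2 r'' \sum_{j=1}^d r_j r''_j$ operations.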
Remark 8.51. In the case of hybrid tensors $\mathbf{v}'$ and $\mathbf{v}''$, computing common bases and transforming the coefficient tensors cost $N_{\mathrm{QR}}(n_j, r'_j + r''_j) + 2 r'' \sum_{j=1}^d r_j r''_j$. If all ranks are bounded by $r$ and $n_j \le n$, the total cost can be estimated by $8dnr^2 + 2dr^3$.

In the case of a tensor subspace representation with orthonormal bases, procedure JoinBases has to be replaced with JoinONB.
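The construction of a common basis together with the transformation of Lemma 8.49 can be sketched as follows. This is not the procedure JoinBases from (2.32): as a stand-in, the hypothetical helper `join_bases` uses a singular value decomposition of $[B'_j \; B''_j]$, which yields an orthonormal common basis and is therefore closer in spirit to JoinONB. The sketch assumes $\mathbb{K} = \mathbb{R}$; the tolerance `tol` is an illustrative choice.

```python
import numpy as np

def join_bases(B1, B2, tol=1e-12):
    """Stand-in for JoinBases: returns B, T1, T2 with B1 = B @ T1 and B2 = B @ T2."""
    U, s, _ = np.linalg.svd(np.hstack([B1, B2]), full_matrices=False)
    r = int(np.sum(s > tol * s[0]))      # numerical dimension of U' + U''
    B = U[:, :r]                         # orthonormal basis of the sum
    return B, B.T @ B1, B.T @ B2

def mode_products(core, mats):
    """Apply (M_1 x ... x M_d) to a coefficient tensor, one mode at a time."""
    for j, M in enumerate(mats):
        core = np.moveaxis(np.tensordot(M, core, axes=(1, j)), 0, j)
    return core

rng = np.random.default_rng(1)
d, n = 3, 6
B1s = [rng.standard_normal((n, 2)) for _ in range(d)]
# B''_j shares its first column with B'_j, so the subspaces overlap
B2s = [np.column_stack([B1[:, 0], rng.standard_normal(n)]) for B1 in B1s]
core2 = rng.standard_normal((2,) * d)            # coefficient tensor a''

joined = [join_bases(B1, B2) for B1, B2 in zip(B1s, B2s)]
Bs = [B for B, _, _ in joined]
T1s = [T1 for _, T1, _ in joined]
T2s = [T2 for _, _, T2 in joined]

core2_new = mode_products(core2, T2s)            # a''_new = (T''_1 x ... x T''_d) a''
v2_old = mode_products(core2, B2s)               # v'' in the old bases
v2_new = mode_products(core2_new, Bs)            # v'' in the common bases
```

In contrast to the trivial joining of frames, the overlap is detected here: each common basis has three columns instead of four.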
Chapter 9
r-Term Approximation
Abstract In general, one tries to approximate a tensor v by another tensor u requiring less data. The reason is twofold: the memory size should decrease and, hopefully, operations involving u should require less computational work. In fact, u ∈ Rr leads to decreasing cost for storage and operations as r decreases. However, the other side of the coin is an increasing approximation error. Correspondingly, in Section 9.1 two approximation strategies are presented, where either the representation rank r of u or the accuracy is prescribed. Before we study the approximation problem in general, two particular situations are discussed. Section 9.2 is devoted to r = 1, i.e., u ∈ R1 is an elementary tensor. The matrix case d = 2 is recalled in Section 9.3. The properties observed in the latter two cases contrast with the true tensor case studied in Section 9.4. The numerical difficulties are caused by the fact that the r-term format is not closed. In Section 9.5 we study nonclosed formats in general. Numerical algorithms, in particular the ALS method, solving the approximation problem will be discussed in Section 9.6. Modified approximation problems are addressed in Section 9.7. Section 9.8 provides important r-term approximations for special functions and operators. It is shown that the Coulomb potential 1/|x − y| as well as the inverse of the Laplace operator allow r-term approximations with exponentially improving accuracy.
© Springer Nature Switzerland AG 2019
W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics 56, https://doi.org/10.1007/978-3-030-35554-8_9

9.1 Approximation of a Tensor

Assume $n_j = n$ for all $1 \le j \le d$. In (7.8c) the storage requirement of an r-term representation $\mathbf{u} = \sum_{i=1}^r \bigotimes_{j=1}^d v_i^{(j)} \in \mathcal{R}_r$ is described by $N_{\mathrm{mem}}^{r\text{-term}}(p) = r \cdot d \cdot n$. On the other hand, the size of $r$ is bounded by $r \le n^{d-1}$ (cf. Lemma 3.45). If we insert this inequality, we get an upper bound of $N_{\mathrm{mem}}^{r\text{-term}}(p) \le d \cdot n^d$ which is worse than the storage size for the full representation (cf. (7.4)).

The consequence is that the r-term representation makes sense only if the involved rank $r$ is of moderate size. Since the true rank $r$ may be large (or even infinite), an exact representation may be impossible, and instead we have to accept approximations of the tensor.

Let $\mathbf{V} = {}_{\|\cdot\|}\bigotimes_{j=1}^d V_j$ be a Banach tensor space. The approximation problem (truncation problem) can be formulated in two different versions. In the first version we fix the representation rank $r$ and look for approximations in $\mathcal{R}_r$:
\[ \text{Given } \mathbf{v} \in \mathbf{V} \text{ and } r \in \mathbb{N}_0, \text{ determine } \mathbf{u} \in \mathcal{R}_r \text{ minimising } \|\mathbf{v} - \mathbf{u}\|. \tag{9.1} \]
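Here $\|\cdot\|$ is an appropriate norm^1 on $\mathbf{V}$. We shall see that, in general, a minimiser $\mathbf{u} \in \mathcal{R}_r$ of Problem (9.1) need not exist but we can form the infimum
\[ \varepsilon(\mathbf{v}, r) := \varepsilon(r) := \inf \{ \|\mathbf{v} - \mathbf{u}\| : \mathbf{u} \in \mathcal{R}_r \}. \tag{9.2} \]
In §9.7 we shall discuss modified formulations of Problem (9.1). In the next variant the roles of $r$ and $\varepsilon(r)$ are reversed:
\[ \text{Given } \mathbf{v} \in \mathbf{V} \text{ and } \varepsilon > 0, \text{ determine } \mathbf{u} \in \mathcal{R}_r \text{ with } \|\mathbf{v} - \mathbf{u}\| \le \varepsilon \text{ for minimal } r. \tag{9.3} \]
There are two trivial cases which will not be discussed further. One case is $r = 0$, because $\mathcal{R}_0 = \{0\}$ leads to the solution $\mathbf{u} = 0$. The second case is $d = 1$ since then $\mathcal{R}_r = \mathbf{V}$ for all $r \ge 1$ and $\mathbf{u} := \mathbf{v}$ is the perfect minimiser of (9.1) and (9.3).

Remark 9.1. Problem (9.3) always has a solution.

Proof. Let $N_r := \{ \|\mathbf{v} - \mathbf{u}\| : \mathbf{u} \in \mathcal{R}_r \} \subset [0, \infty)$ be the range of the norm. Given $\varepsilon > 0$, we must ensure that there are some $r \in \mathbb{N}_0$ and $\varepsilon' \in N_r$ with $\varepsilon \ge \varepsilon'$. Then $N(\varepsilon) := \{ r \in \mathbb{N}_0 : \text{there is some } \varepsilon' \in N_r \text{ with } \varepsilon' \le \varepsilon \}$ is a nonempty subset of $\mathbb{N}_0$ and a minimum $r := \min\{ n \in N(\varepsilon) \}$ must exist.

First we consider the finite-dimensional case. Then there is a finite $r_{\max}$ with $\mathbf{V} = \mathcal{R}_{r_{\max}}$. Hence $\varepsilon' := 0 \in N_{r_{\max}}$ satisfies $\varepsilon' \le \varepsilon$.

In the infinite-dimensional case, ${}_a\bigotimes_{j=1}^d V_j$ is dense in $\mathbf{V}$. This implies that there is a $\mathbf{u}_\varepsilon \in {}_a\bigotimes_{j=1}^d V_j$ with $\varepsilon' := \|\mathbf{v} - \mathbf{u}_\varepsilon\| \le \varepsilon/2$. By the definition of the algebraic tensor space, $\mathbf{u}_\varepsilon$ has a representation of $r$ elementary tensors for some $r \in \mathbb{N}_0$. This proves that $\varepsilon \ge \varepsilon' \in N_r$. □

Solutions $\mathbf{u} \in \mathcal{R}_r$ of Problem (9.3) satisfy $\|\mathbf{v} - \mathbf{u}\| \le \varepsilon$ for a minimal $r$. Fixing this rank $r$, we may still ask for the best approximation among all $\mathbf{u} \in \mathcal{R}_r$. This again leads to Problem (9.1).

1 Some results are stated for general norms; however, most of the practical algorithms will work for Hilbert tensor spaces with induced scalar product.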
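Lemma 9.2. Let $(V_j, \langle\cdot,\cdot\rangle_j)$ be Hilbert spaces, while $(\mathbf{V}, \langle\cdot,\cdot\rangle)$ is endowed with the induced scalar product $\langle\cdot,\cdot\rangle$ and the corresponding norm $\|\cdot\|$. Assume for a fixed $\mathbf{v} \in \mathbf{V}$ that a minimiser $\mathbf{u}^* \in \mathcal{R}_r$ of Problem (9.1) exists.
(a) Then $\mathbf{u}^* \in \mathbf{U}(\mathbf{v}) := \bigotimes_{j=1}^d U_j^{\min}(\mathbf{v})$.
(b) In case that $\mathbf{v}$ satisfies linear constraints (cf. §6.8), these are also fulfilled by $\mathbf{u}^*$.
(c) If $\mathbf{v} \in D(\mathbf{A})$ belongs to the domain of an unbounded operator, also $\mathbf{u}^* \in D(\mathbf{A})$.
(d) Let $\mathbf{V} = \bigcap_{n \in \mathbb{N}} \mathbf{V}^{(n)}$ be a Hilbert intersection space (cf. (4.58b)), where the norm $\|\cdot\|$ of $\mathbf{V}^{(0)}$ is used in (9.1). Then also $\mathbf{u}^* \in \mathbf{V}$ holds (cf. Uschmajew [287]).
(e) In the case of Problem (9.3), one of the solutions satisfies $\mathbf{u}^* \in \mathbf{U}(\mathbf{v})$ and the statements about possible constraints.

Proof. Let $\mathbf{P} : \mathbf{V} \to \mathbf{U}(\mathbf{v}) := \bigotimes_{j=1}^d U_j^{\min}(\mathbf{v})$ be the orthogonal projection onto $\mathbf{U}(\mathbf{v})$. Because $\|\mathbf{v} - \mathbf{u}\|^2 = \|\mathbf{v} - \mathbf{P}\mathbf{u}\|^2 + \|(\mathbf{I} - \mathbf{P})\mathbf{u}\|^2$ (a consequence of $\mathbf{P}\mathbf{v} = \mathbf{v}$) and $\mathbf{P}\mathbf{u} \in \mathcal{R}_r$, the minimiser $\mathbf{u}$ must satisfy $(\mathbf{I} - \mathbf{P})\mathbf{u} = 0$, i.e., $\mathbf{u} \in \mathbf{U}(\mathbf{v})$. The further statements follow from $\mathbf{u} \in \mathbf{U}(\mathbf{v})$. □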
9.2 Discussion for r = 1

By Exercise 8.2b, the following results can be derived from properties of $\mathcal{T}_{\mathbf{r}}$, which will be established later. Nevertheless, we discuss the case $\mathcal{R}_1$ as an exercise (cf. Zhang–Golub [309]) and as demonstration of the contrast to the case $r > 1$.

Let $\mathbf{v} \in \mathbf{V}$ be given. Problem (9.1) with $r = 1$ requires the minimisation of
\[ \Big\| \mathbf{v} - \bigotimes_{j=1}^d u^{(j)} \Big\|. \tag{9.4} \]
In this section we assume that the vector spaces $V_j$ are finite dimensional (corresponding results for infinite-dimensional spaces follow from Theorem 10.11 combined with Exercise 8.2b). Note that the choice of the norm is not restricted.

Lemma 9.3. Let $\mathbf{V} = \bigotimes_{j=1}^d V_j$ be a finite-dimensional normed tensor space. Then for any $\mathbf{v} \in \mathbf{V}$ there are tensors $\mathbf{u}_{\min} = \bigotimes_{j=1}^d u^{(j)} \in \mathcal{R}_1$ minimising (9.4):
\[ \|\mathbf{v} - \mathbf{u}_{\min}\| = \min_{u^{(1)} \in V_1, \ldots, u^{(d)} \in V_d} \Big\| \mathbf{v} - \bigotimes_{j=1}^d u^{(j)} \Big\|. \tag{9.5} \]
If $d \ge 2$ and $\dim(V_j) \ge 2$ for at least two indices $j$, the minimiser $\mathbf{u}_{\min}$ may be not unique.
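Proof. (i) If $\mathbf{v} = 0$, $\mathbf{u} = 0 \in \mathcal{R}_1$ is the unique solution.

(ii) For the rest of the proof assume $\mathbf{v} \ne 0$. Furthermore, we may assume, without loss of generality, that there are norms $\|\cdot\|_j$ on $V_j$ scaled in such a way that
\[ \Big\| \bigotimes_{j=1}^d v^{(j)} \Big\| = \prod_{j=1}^d \| v^{(j)} \|_j \qquad \text{(cf. (4.42))}. \tag{9.6} \]
(iii) For the minimisation in $\min_{\mathbf{u} \in \mathcal{R}_1} \|\mathbf{v} - \mathbf{u}\|$, the set $\mathcal{R}_1$ may be reduced to the subset $C := \{ \mathbf{u} \in \mathcal{R}_1 : \|\mathbf{u}\| \le 2\|\mathbf{v}\| \}$ since otherwise, using (4.2) for the first inequality,
\[ \|\mathbf{v} - \mathbf{u}\| \ge \|\mathbf{u}\| - \|\mathbf{v}\| > 2\|\mathbf{v}\| - \|\mathbf{v}\| = \|\mathbf{v}\| = \|\mathbf{v} - 0\|, \]
i.e., $0 \in \mathcal{R}_1$ is a better approximation than $\mathbf{u}$. Consider the subsets $C_j := \{ u^{(j)} \in V_j : \|u^{(j)}\|_j \le (2\|\mathbf{v}\|)^{1/d} \} \subset V_j$ and note that $C = \{ \bigotimes_{j=1}^d u^{(j)} : u^{(j)} \in C_j \}$ because of (9.6). We conclude that
\[ \inf_{\mathbf{u} \in \mathcal{R}_1} \|\mathbf{v} - \mathbf{u}\| = \inf_{\mathbf{u} \in C} \|\mathbf{v} - \mathbf{u}\| = \inf_{u^{(j)} \in C_j} \Big\| \mathbf{v} - \bigotimes_{j=1}^d u^{(j)} \Big\|. \]
Let $\mathbf{u}_\nu := \bigotimes_{j=1}^d u^{j,\nu}$ ($u^{j,\nu} \in C_j$) be a sequence with $\|\mathbf{v} - \mathbf{u}_\nu\| \to \inf_{\mathbf{u} \in \mathcal{R}_1} \|\mathbf{v} - \mathbf{u}\|$. Since the sets $C_j$ are bounded and closed, the finite dimension of $V_j$ implies that $C_j$ is compact. We find a subsequence so that $u^{j,\nu} \to u_*^{(j)} \in V_j$. Then the tensor $\mathbf{u}_* := \bigotimes_{j=1}^d u_*^{(j)} \in \mathcal{R}_1$ satisfies $\|\mathbf{v} - \mathbf{u}_*\| = \inf_{\mathbf{u} \in \mathcal{R}_1} \|\mathbf{v} - \mathbf{u}\|$.

(iv) For $d = 1$, $\mathbf{v} \in \mathbf{V}$ belongs to $\mathcal{R}_1$, thus $\mathbf{u}_* := \mathbf{v}$ is the only minimiser. Already for $d = 2$, the matrix $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ has the two different minimisers $\mathbf{u}_* = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $\mathbf{u}_{**} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ with respect to the Frobenius norm. □

The practical computation of $\mathbf{u}_{\min}$ in (9.5) is rendered more difficult by the following fact, which is proved by Example 9.49.

Remark 9.4. The function $\Phi(u^{(1)}, \ldots, u^{(d)}) = \| \mathbf{v} - \bigotimes_{j=1}^d u^{(j)} \|$ may have local minima larger than the global minimum.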
For Problems (9.1) and (9.3) with r ≥ 2 we must distinguish the cases d = 2 (see §9.3) and d ≥ 3 (see §9.4). The next result is related to Remark 4.175 (cf. Hillar–Lim [162, Theorem 10.1]). Proposition 9.5. Let v be symmetric. Then the best approximation u ∈ R1 is symmetric.
9.3 Discussion in the Matrix Case d = 2

First we discuss the general case of finite-dimensional vector spaces $V_1$, $V_2$ and the tensor space $\mathbf{V} = V_1 \otimes V_2$ with arbitrary norm $\|\cdot\|$. Introducing bases $\{ b_i^{(j)} : i \in I_j \}$ in $V_1$ and $V_2$, we obtain isomorphisms $V_j \cong \mathbb{K}^{I_j}$. Similarly, the tensor space $\mathbf{V}$ is isomorphic to $\mathbb{K}^{I_1 \times I_2}$. The norm $\|\cdot\|$ on $\mathbf{V}$ defines the equally named norm
\[ \|M\| := \Big\| \sum_{\nu \in I_1} \sum_{\mu \in I_2} M_{\nu\mu} \, b_\nu^{(1)} \otimes b_\mu^{(2)} \Big\| \qquad \text{for } M \in \mathbb{K}^{I_1 \times I_2}. \]
This makes the isomorphism $\mathbf{V} \cong \mathbb{K}^{I_1 \times I_2}$ isometric. Therefore Problem (9.1) is equivalent to
\[ \text{Given a matrix } M \in \mathbb{K}^{I_1 \times I_2} \text{ and } r \in \mathbb{N}_0, \text{ determine } R \in \mathcal{R}_r \text{ minimising } \|M - R\|, \tag{9.7} \]
with $\mathcal{R}_r = \{ M \in \mathbb{K}^{I_1 \times I_2} : \operatorname{rank}(M) \le r \}$ as defined in (2.6).

Proposition 9.6. For $d = 2$, the problems (9.1) and (9.7) have a solution; i.e., the minima $\min_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\|$ and $\min_{R \in \mathcal{R}_r} \|M - R\|$ are attained.

Proof. Since the problems (9.1) and (9.7) are equivalent, we focus on Problem (9.7). As in the proof of Lemma 9.3 we find that the minimisation in (9.7) may be reduced to the bounded subset $\mathcal{R}_{r,M} := \mathcal{R}_r \cap \{ R \in \mathbb{K}^{I_1 \times I_2} : \|R\| \le 2\|M\| \}$. We consider a sequence $R^{(k)} \in \mathcal{R}_{r,M}$ such that
\[ \|M - R^{(k)}\| \to \inf_{R \in \mathcal{R}_r} \|M - R\|. \]
Since $\{ R \in \mathbb{K}^{I_1 \times I_2} : \|R\| \le 2\|M\| \}$ is compact, there is a subsequence denoted again by $R^{(k)}$ with $\lim R^{(k)} =: R^* \in \mathbb{K}^{I_1 \times I_2}$. Continuity of $\|\cdot\|$ (cf. §4.1.1) implies that $\inf_{R \in \mathcal{R}_r} \|M - R\| = \|M - R^*\|$. It remains to show that $R^* \in \mathcal{R}_r$. Lemma 2.4 proves that indeed $\operatorname{rank}(R^*) = \operatorname{rank}(\lim R^{(k)}) \le \liminf_{k \to \infty} \operatorname{rank}(R^{(k)}) \le r$. □

Next, we consider the Frobenius norm^2 $\|\cdot\| = \|\cdot\|_F$.

Proposition 9.7. In the case of Problem (9.7) with the Frobenius norm $\|\cdot\|_F$, the characterisation of the solution is derived from the singular-value decomposition $M = U \Sigma V^H = \sum_{i=1}^s \sigma_i u_i v_i^H$ (cf. (2.18)). Then
\[ R := \sum_{i=1}^{\min\{r,s\}} \sigma_i u_i v_i^H \]
is a solution to Problem (9.7). It is unique if $r = s$ or $\sigma_{r+1} < \sigma_r$. The remaining error is
\[ \|M - R\|_F = \sqrt{ \sum_{i=r+1}^s \sigma_i^2 }. \]
2 We may also choose the matrix norm $\|\cdot\|_2$ from (2.11) or any unitarily invariant matrix norm.

Proof. Use Lemma 2.34. □
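Proposition 9.7 translates directly into a few lines of NumPy. The following sketch truncates the singular value decomposition and compares the Frobenius error with the predicted value $\sqrt{\sum_{i>r} \sigma_i^2}$; the function name `best_rank_r` and the random test matrix are illustrative assumptions.

```python
import numpy as np

def best_rank_r(M, r):
    """Truncated SVD: R = sum_{i <= min(r, s)} sigma_i u_i v_i^H."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    k = min(r, len(s))
    R = (U[:, :k] * s[:k]) @ Vh[:k, :]    # scale the columns of U by sigma_i
    return R, s

rng = np.random.default_rng(2)
M = rng.standard_normal((8, 6))
r = 3
R, sigma = best_rank_r(M, r)

error = np.linalg.norm(M - R, 'fro')
predicted = np.sqrt(np.sum(sigma[r:] ** 2))   # remaining singular values
```

Note that `np.linalg.svd` returns $\min\{m, n\}$ singular values, possibly including zeros, whereas $s = \operatorname{rank}(M)$ in the text; zero singular values do not change either formula.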
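For $\|\cdot\| = \|\cdot\|_F$, Problem (9.3) has an immediate solution. The result is deduced from (2.23b).

Remark 9.8. The problem
\[ \text{Given } M \in \mathbb{K}^{I \times J} \text{ and } \varepsilon > 0, \text{ determine } R \in \mathcal{R}_r \text{ with } \|M - R\|_F \le \varepsilon \text{ for minimal } r \tag{9.8} \]
has the solution $R := \sum_{i=1}^{r_\varepsilon} \sigma_i u_i v_i^H$, where $M = \sum_{i=1}^s \sigma_i u_i v_i^H$ with $s := \operatorname{rank}(M)$ (cf. (2.18)) is the singular-value decomposition and
\[ r_\varepsilon = \min \Big\{ r \in \{0, \ldots, s\} : \sum_{i=r+1}^s \sigma_i^2 \le \varepsilon^2 \Big\}. \]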
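The rank $r_\varepsilon$ of Remark 9.8 can be read off from the tail sums of the squared singular values. A minimal sketch, with the hypothetical helper name `minimal_rank` and a diagonal example chosen so that the singular values are 4, 3, 2, 1:

```python
import numpy as np

def minimal_rank(M, eps):
    """Smallest r with sum_{i > r} sigma_i^2 <= eps^2, and the truncated SVD of that rank."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    # tail[r] = sum_{i > r} sigma_i^2 for r = 0, ..., len(s); the last entry is 0
    tail = np.concatenate([np.cumsum((s ** 2)[::-1])[::-1], [0.0]])
    r_eps = int(np.argmax(tail <= eps ** 2))   # index of the first admissible r
    R = (U[:, :r_eps] * s[:r_eps]) @ Vh[:r_eps, :]
    return r_eps, R

M = np.diag([4.0, 3.0, 2.0, 1.0])
eps = 2.5
r_eps, R = minimal_rank(M, eps)
error = np.linalg.norm(M - R, 'fro')
```

For this example the tail sums are 30, 14, 5, 1, 0, so with $\varepsilon^2 = 6.25$ the minimal rank is $r_\varepsilon = 2$ and the error is $\sqrt{5}$.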
There is a connection between the case $r = 1$ in §9.2 and Problem (9.7) for $\|\cdot\| = \|\cdot\|_F$. We may determine the solution $R \in \mathcal{R}_r$ of (9.8) sequentially by a deflation technique, where in each step we determine a best rank-1 matrix $R^{(i)} \in \mathcal{R}_1$:
1) let $R^{(1)} \in \mathcal{R}_1$ be the minimiser of $\min_{S \in \mathcal{R}_1} \|M - S\|_F$ and set $M^{(1)} := M - R^{(1)}$;
2) let $R^{(2)} \in \mathcal{R}_1$ be the minimiser of $\min_{S \in \mathcal{R}_1} \|M^{(1)} - S\|_F$ and set $M^{(2)} := M^{(1)} - R^{(2)}$;
...
r) let $R^{(r)} \in \mathcal{R}_1$ be the minimiser of $\min_{S \in \mathcal{R}_1} \|M^{(r-1)} - S\|_F$, set $R := \sum_{i=1}^r R^{(i)} \in \mathcal{R}_r$.
Remark 9.9. The solution of the previous deflation algorithm yields the approximation $R \in \mathcal{R}_r$ which is identical to the solution of Proposition 9.7. The error $\|M^{(r)}\|_F = \|M - R\|_F$ is as in Proposition 9.7.

The solutions discussed above satisfy an important stability property. We shall appreciate this property later when we find situations for which stability is lacking.

Lemma 9.10. The solutions $R := \sum_{i=1}^r \sigma_i u_i v_i^H$ to Problems (9.1) and (9.3) satisfy
\[ \sum_{i=1}^r \big\| \sigma_i u_i v_i^H \big\|_F^2 = \|R\|_F^2, \tag{9.9} \]
i.e., the terms $\sigma_i u_i v_i^H$ are pairwise orthogonal with respect to the Frobenius scalar product.

Proof. The Frobenius scalar product (2.10) yields the value $\langle \sigma_i u_i v_i^H, \sigma_j u_j v_j^H \rangle_F = \sigma_i \sigma_j \langle u_i v_i^H, u_j v_j^H \rangle_F = \sigma_i \sigma_j \operatorname{trace}\big( (u_j v_j^H)^H (u_i v_i^H) \big) = \sigma_i \sigma_j \operatorname{trace}\big( v_j u_j^H u_i v_i^H \big)$. Since the singular vectors $u_i$ are orthogonal, $u_j^H u_i = 0$ holds for products with $i \ne j$, proving the orthogonality $\langle \sigma_i u_i v_i^H, \sigma_j u_j v_j^H \rangle_F = 0$. □
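The deflation technique and the stability identity (9.9) can be checked numerically. The sketch below computes each best rank-1 matrix from the leading singular triple of the current residual; the helper names and the random test matrix are assumptions made for illustration.

```python
import numpy as np

def best_rank1(M):
    """Best Frobenius-norm rank-1 approximation: leading singular triple."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vh[0, :])

def deflation(M, r):
    """Steps 1), ..., r): subtract the best rank-1 matrix of the current residual."""
    terms, residual = [], M.copy()
    for _ in range(r):
        R_i = best_rank1(residual)
        terms.append(R_i)
        residual = residual - R_i
    return terms, residual

rng = np.random.default_rng(3)
M = rng.standard_normal((7, 5))
r = 3
terms, residual = deflation(M, r)
R_deflation = sum(terms)

# reference: truncated SVD as in Proposition 9.7
U, s, Vh = np.linalg.svd(M, full_matrices=False)
R_svd = (U[:, :r] * s[:r]) @ Vh[:r, :]

sum_of_squares = sum(np.linalg.norm(T, 'fro') ** 2 for T in terms)
```

Both routes give the same matrix up to rounding, and the squared norms of the rank-1 terms add up to $\|R\|_F^2$, so no cancellation occurs between the terms.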
9.4 Discussion in the Tensor Case d ≥ 3

In this section we assume that the tensor space of order $d \ge 3$ is nondegenerate (cf. Definition 3.25) to exclude trivial cases.

9.4.1 Nonclosedness of $\mathcal{R}_r$

A serious difficulty for the treatment of tensors of order $d \ge 3$ is based on the fact that Proposition 9.6 does not extend to $d \ge 3$. The following result stems from De Silva–Lim [73] and is also discussed in Stegeman [269], [272]. However, an example of such a type can already be found in Bini–Lotti–Romani [35].

Proposition 9.11. Let $\mathbf{V}$ be a nondegenerate tensor space of order $d \ge 3$. Then, independently of the chosen norm, there are tensors $\mathbf{v} \in \mathbf{V}$ for which Problem (9.1) possesses no solution.

Proof. Consider a tensor space $\mathbf{V} = V_1 \otimes V_2 \otimes V_3$ with $\dim(V_j) \ge 2$ and choose two linearly independent vectors $v^{(j)}, w^{(j)} \in V_j$. The tensor
\[ \mathbf{v} := v^{(1)} \otimes v^{(2)} \otimes w^{(3)} + v^{(1)} \otimes w^{(2)} \otimes v^{(3)} + w^{(1)} \otimes v^{(2)} \otimes v^{(3)} \]
has tensor rank 3 as proved in Lemma 3.46. Next, we define
\[ \mathbf{v}_n := \big( w^{(1)} + n v^{(1)} \big) \otimes \big( v^{(2)} + \tfrac{1}{n} w^{(2)} \big) \otimes v^{(3)} + v^{(1)} \otimes v^{(2)} \otimes \big( w^{(3)} - n v^{(3)} \big) \qquad \text{for } n \in \mathbb{N}. \tag{9.10} \]
Exercise 3.47 shows that $\operatorname{rank}(\mathbf{v}_n) = 2$. The identity $\mathbf{v} - \mathbf{v}_n = -\tfrac{1}{n} w^{(1)} \otimes w^{(2)} \otimes v^{(3)}$ is easy to verify; hence, independently of the choice for a norm, we obtain
\[ \lim_{n \to \infty} \mathbf{v}_n = \mathbf{v} \]
and $3 = \operatorname{rank}(\mathbf{v}) = \operatorname{rank}(\lim \mathbf{v}_n) > \operatorname{rank}(\mathbf{v}_n) = 2$ in contrast to the matrix case (2.7). This implies that $\inf_{\mathbf{u} \in \mathcal{R}_2} \|\mathbf{v} - \mathbf{u}\| = 0$. Therefore a minimiser $\mathbf{u}^* \in \mathcal{R}_2$ must satisfy $\|\mathbf{v} - \mathbf{u}^*\| = 0$, i.e., $\mathbf{u}^* = \mathbf{v}$. This, however, yields the contradiction $\operatorname{rank}(\mathbf{u}^*) = \operatorname{rank}(\mathbf{v}) = 3$, i.e., a minimiser does not exist.

The tensor space of order 3 from above can be embedded into higher-order tensor spaces so that the statement extends to nondegenerate tensor spaces with $d \ge 3$. □

The proof reveals that the set $\mathcal{R}_2$ is not closed.
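The identity $\mathbf{v} - \mathbf{v}_n = -\tfrac{1}{n} w^{(1)} \otimes w^{(2)} \otimes v^{(3)}$ from the proof of Proposition 9.11 is easy to check numerically. The sketch below assumes $V_j = \mathbb{R}^2$ with $v^{(j)}$, $w^{(j)}$ chosen as the two unit vectors, so that $\|w^{(1)} \otimes w^{(2)} \otimes v^{(3)}\| = 1$ in the Euclidean norm; these choices are for illustration only.

```python
import numpy as np

def elem(x, y, z):
    """Elementary tensor x (x) y (x) z of order 3."""
    return np.einsum('i,j,k->ijk', x, y, z)

# assumption: v^(j) and w^(j) are the unit vectors of R^2 for j = 1, 2, 3
v1 = v2 = v3 = np.array([1.0, 0.0])
w1 = w2 = w3 = np.array([0.0, 1.0])

# rank-3 tensor of the proof
v = elem(v1, v2, w3) + elem(v1, w2, v3) + elem(w1, v2, v3)

def v_n(n):
    """Sum of two elementary tensors as in (9.10)."""
    return elem(w1 + n * v1, v2 + w2 / n, v3) + elem(v1, v2, w3 - n * v3)

# Euclidean distance ||v - v_n||; the identity predicts exactly 1/n here
errors = {n: np.linalg.norm(v - v_n(n)) for n in (1, 10, 100, 1000)}
```

The distance decays like $1/n$, although every $\mathbf{v}_n$ is a sum of only two elementary tensors while the limit has rank 3.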
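Lemma 9.12. Let $\mathbf{V} = \bigotimes_{j=1}^d V_j$ be a nondegenerate tensor space of order $d \ge 3$. Then $\mathcal{R}_1 \subset \mathbf{V}$ is closed but $\mathcal{R}_r \subset \mathbf{V}$ for $2 \le r \le \min_{1 \le j \le d} \dim(V_j)$ is not closed.^3

Proof. (i) Consider a sequence $\mathbf{v}_n := \bigotimes_{j=1}^d u^{j,n} \in \mathcal{R}_1$ with $\mathbf{v} := \lim_{n \to \infty} \mathbf{v}_n \in \mathbf{V}$. Hence $\inf_{\mathbf{u} \in \mathcal{R}_1} \|\mathbf{v} - \mathbf{u}\| \le \inf_n \|\mathbf{v} - \mathbf{v}_n\| = 0$. On the other hand, Lemma 9.3 states that the minimum is attained: $\min_{\mathbf{u} \in \mathcal{R}_1} \|\mathbf{v} - \mathbf{u}\| = \|\mathbf{v} - \mathbf{u}_{\min}\|$ for some $\mathbf{u}_{\min} \in \mathcal{R}_1$. Together, we obtain from $0 = \inf_n \|\mathbf{v} - \mathbf{v}_n\| = \|\mathbf{v} - \mathbf{u}_{\min}\|$ that $\mathbf{v} = \mathbf{u}_{\min} \in \mathcal{R}_1$; i.e., $\mathcal{R}_1$ is closed.

(ii) The fact that $\mathcal{R}_2$ is not closed has already been proved for $d = 3$. The extension to $d \ge 3$ is mentioned in the proof of Proposition 3.44c.

(iii) For the discussion of $r > 2$ we refer to De Silva–Lim [73, Thm. 4.10]. □

Even if $\overline{\mathcal{R}_r} \setminus \mathcal{R}_r$ has measure zero, this does not imply that $\min_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\|$ exists for almost all $\mathbf{v} \in \mathbf{V}$. The answer depends on the underlying field.

Proposition 9.13. If $\mathbb{K} = \mathbb{C}$, the minimiser $\mathbf{u}^* \in \mathcal{R}_r$ of $\inf_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\|$ exists for almost all $\mathbf{v} \in \mathbf{V}$. However, for $\mathbb{K} = \mathbb{R}$, there is a positive expectation that there is no minimiser. In the special case of $\mathbf{V} = \otimes^3 \mathbb{R}^2$, all tensors $\mathbf{v} \in \mathcal{R}_3 \setminus \mathcal{R}_2$ fail to have a best approximation in $\mathcal{R}_2$.

Proof. For the first and last statements see Qi–Michałek–Lim [246]. The case $\mathbb{K} = \mathbb{R}$ is considered in De Silva–Lim [73]. □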
9.4.2 Border Rank

The observed properties lead to a modification of the tensor rank (cf. Bini–Lotti–Romani [35]), where $\mathcal{R}_r$ is replaced with its closure. Note that in the case of an infinite-dimensional tensor space with reasonable crossnorm, the closure is independent of the norm (cf. Theorem 4.76).

Definition 9.14. The tensor border rank is defined by
\[ \underline{\operatorname{rank}}(\mathbf{v}) := \min \{ r : \mathbf{v} \in \overline{\mathcal{R}_r} \} \in \mathbb{N}_0. \tag{9.11} \]
Note that $\underline{\operatorname{rank}}(\mathbf{v}) \le \operatorname{rank}(\mathbf{v})$. The estimates $\operatorname{rank}_j(\mathbf{v}) \le \underline{\operatorname{rank}}(\mathbf{v})$ are already mentioned in Theorem 8.42b. An estimate of the usual and border rank in the reverse direction is given next.

Exercise 9.15. Show that $\operatorname{rank}(\mathbf{v}) \le (\underline{\operatorname{rank}}(\mathbf{v}))^{d-1}$. Hint: Use $\mathcal{R}_r \subset \mathcal{T}_{\mathbf{r}}$ for $\mathbf{r} = (r, \ldots, r)$, closedness of $\mathcal{T}_{\mathbf{r}}$, and the estimate in Lemma 3.45 of the maximal rank of $\bigotimes_{j=1}^d U_j^{\min}(\mathbf{v})$.

3 The limitation $r \le \min_j \{\dim(V_j)\}$ is used for the proof in [73, Theorem 4.10]. It is not claimed that $\mathcal{R}_r$ is closed for larger $r$.
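Remark 9.16. Lemma 3.46 shows that the tensor $\mathbf{s}_d$ in (3.25) has rank $d$, whereas the border rank is equal to $\underline{\operatorname{rank}}(\mathbf{s}_d) = 2$.

Proof. $\underline{\operatorname{rank}}(\mathbf{s}_d) \le 1$ can easily be excluded. $\mathbf{s}_d$ is the derivative $\frac{d}{dt} \mathbf{v}(t)$ of $\mathbf{v}(t) := \bigotimes_{j=1}^d \big( v^{(j)} + t w^{(j)} \big)$ at $t = 0$. Exercise 3.47 proves $\operatorname{rank}([\mathbf{v}(t) - \mathbf{v}(0)]/t) = 2$ for $t \ne 0$. Since $\mathbf{s}_d = \lim_{t \to 0} [\mathbf{v}(t) - \mathbf{v}(0)]/t$, the assertion follows. □

A prominent example of a tensor of the form $\mathbf{s}_d$ is the Laplacian operator with $v^{(j)} = \text{identity} \in V_j := L(H_0^1(\Omega_j), H^{-1}(\Omega_j))$ and $w^{(j)} = \partial^2/\partial x_j^2 \in V_j$ or its discretisation (cf. (9.39) and (9.44)).

Exercise 9.17. Let $\mathbf{A} = \bigotimes_{j=1}^d A_j \in L(\mathbf{V}, \mathbf{V})$ be a bijective map. Show that rank(v) = rank(Av). Hint: $\mathbf{A}^{-1} \in L(\mathbf{V}, \mathbf{V})$ holds (consequence of the open mapping theorem, cf. Yosida [306, §II.5]).

The next statement shows that $\mathbf{s}_d$ in (3.25) is the prototype of a tensor with border rank 2 and larger standard rank.

Lemma 9.18. Let $V_j$ be finite dimensional with $\dim(V_j) \ge 2$. Suppose that $\mathbf{u} \in \mathbf{V} = \bigotimes_{j=1}^d V_j$ satisfies $\underline{\operatorname{rank}}(\mathbf{u}) = 2 < \operatorname{rank}(\mathbf{u})$. Fix any linearly independent vectors $v^{(j)}, w^{(j)} \in V_j$ which define the tensor $\mathbf{s}_d$ in (3.25). Then there is a map $\Phi = \bigotimes_{j=1}^d \phi^{(j)} \in L(\mathbf{V}, \mathbf{V})$ with $\Phi(\mathbf{s}_d) = \mathbf{u}$. At most $d - 3$ maps $\phi^{(j)}$ may yield linearly dependent vectors $\phi^{(j)}(v^{(j)})$, $\phi^{(j)}(w^{(j)})$. The other $\phi^{(j)}$ can be assumed to be isomorphisms.

Proof. (i) Endow the tensor space with the Euclidean norm. By definition there are rank-2 tensors $\mathbf{u}_\nu = \mathbf{x}_\nu + \mathbf{y}_\nu$ with $\mathbf{x}_\nu = \bigotimes_{j=1}^d x_\nu^{(j)}$, $\mathbf{y}_\nu = \bigotimes_{j=1}^d y_\nu^{(j)}$ and $\mathbf{u}_\nu \to \mathbf{u}$. In Lemma 9.28 we shall show that $\rho_\nu := \|\mathbf{x}_\nu\| \to \infty$. The sequence $\frac{1}{\rho_\nu} \mathbf{x}_\nu$ is bounded. We restrict the sequence to an equally denoted convergent subsequence. The limit $\hat{\mathbf{x}} := \lim \frac{1}{\rho_\nu} \mathbf{x}_\nu$ is again an elementary tensor: $\hat{\mathbf{x}} = \bigotimes_{j=1}^d \hat{x}^{(j)}$. Since $\|\hat{\mathbf{x}}\| = 1$, we may choose $\hat{x}^{(j)}$ such that $\|\hat{x}^{(j)}\| = 1$.

(ii) Since $\mathbf{x}_\nu + \mathbf{y}_\nu = O(1)$, it follows that also $\lim \frac{1}{\rho_\nu} \mathbf{y}_\nu = -\hat{\mathbf{x}}$. We may write $\mathbf{x}_\nu = \rho_\nu \bigotimes_{j=1}^d (\hat{x}^{(j)} + \xi_\nu^{(j)})$ and $\mathbf{y}_\nu = -\rho_\nu \bigotimes_{j=1}^d (\hat{x}^{(j)} - \eta_\nu^{(j)})$ with perturbations $\xi_\nu^{(j)}, \eta_\nu^{(j)} = o(1)$.

(iii) Define invertible maps $\varphi_\nu^{(j)} \in L(V_j, V_j)$ with $\varphi_\nu^{(j)}(\hat{x}^{(j)} + \xi_\nu^{(j)}) = v^{(j)}$ ($v^{(j)}$ is defined in the lemma) and $\varphi_\nu := \bigotimes_{j=1}^d \varphi_\nu^{(j)}$. Then $\varphi_\nu(\mathbf{x}_\nu) = \rho_\nu \bigotimes_{j=1}^d v^{(j)}$ and $\varphi_\nu(\mathbf{y}_\nu) = -\rho_\nu \bigotimes_{j=1}^d (v^{(j)} - \omega_\nu^{(j)})$ holds with some $\omega_\nu^{(j)} = o(1)$. $\varphi_\nu$ can be chosen so that also $\varphi = \bigotimes_{j=1}^d \varphi^{(j)} = \lim \varphi_\nu$ is invertible. Splitting $\omega_\nu^{(j)}$ into a multiple of $v^{(j)}$ and a perpendicular part, we get
\[ \varphi_\nu(\mathbf{y}_\nu) = -\rho_\nu \bigotimes_{j=1}^d \Big( \big( 1 + \xi_\nu^{(j)} \big) v^{(j)} - z_\nu^{(j)} \Big) \qquad \text{with } \xi_\nu^{(j)}, z_\nu^{(j)} = o(1), \; z_\nu^{(j)} \perp v^{(j)}. \]
(iv) The bracket in
\[ \varphi_\nu(\mathbf{x}_\nu + \mathbf{y}_\nu) = \rho_\nu \Big[ \alpha_\nu \bigotimes_{j=1}^d v^{(j)} + z_\nu^{(1)} \otimes v^{(2)} \otimes \ldots \otimes v^{(d)} + v^{(1)} \otimes z_\nu^{(2)} \otimes v^{(3)} \otimes \ldots \otimes v^{(d)} + \ldots \Big] + \ldots \]
with $\alpha_\nu = 1 - \prod_{j=1}^d \big( 1 + \xi_\nu^{(j)} \big)$ contains the largest terms. Note that the first two terms in the bracket are orthogonal. Since $\varphi_\nu(\mathbf{x}_\nu + \mathbf{y}_\nu) \to \varphi(\mathbf{u})$, we conclude that the limits $\rho_\nu \alpha_\nu \to \alpha$, $\rho_\nu z_\nu^{(j)} \to z^{(j)} \perp v^{(j)}$ must exist separately:
\[ \varphi(\mathbf{u}) = \alpha \bigotimes_{j=1}^d v^{(j)} + z^{(1)} \otimes v^{(2)} \otimes \ldots \otimes v^{(d)} + v^{(1)} \otimes z^{(2)} \otimes v^{(3)} \otimes \ldots \otimes v^{(d)} + \ldots \]
\[ \phantom{\varphi(\mathbf{u})} = \big( \alpha v^{(1)} + z^{(1)} \big) \otimes v^{(2)} \otimes \ldots \otimes v^{(d)} + v^{(1)} \otimes z^{(2)} \otimes v^{(3)} \otimes \ldots \otimes v^{(d)} + \ldots, \]
i.e., $\mathbf{u} = (\varphi^{(1)})^{-1} \big( \alpha v^{(1)} + z^{(1)} \big) \otimes (\varphi^{(2)})^{-1} v^{(2)} \otimes \ldots \otimes (\varphi^{(d)})^{-1} v^{(d)} + \ldots$ The map $\Phi = \bigotimes_{j=1}^d \phi^{(j)}$ by $\phi^{(1)}(w^{(1)}) = \alpha v^{(1)} + z^{(1)}$, $\phi^{(2)}(w^{(2)}) = z^{(2)}, \ldots$ satisfies $\Phi(\mathbf{s}_d) = \mathbf{u}$.

(v) The rank of $\Phi(\mathbf{s}_d)$ is $d$ minus the number of indices $j$ for which $\phi^{(j)}(v^{(j)})$ and $\phi^{(j)}(w^{(j)})$ are linearly dependent. The other maps $\phi^{(k)}$ can be extended outside of $\operatorname{span}\{v^{(k)}, w^{(k)}\}$ such that $\phi^{(k)} \in L(V_k, V_k)$ is an isomorphism. □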
9.4.3 Stable and Unstable Sequences

For practical use, it would be sufficient to replace the (non-existing) minimiser $\mathbf{u} \in \mathcal{R}_r$ of $\|\mathbf{v} - \mathbf{u}\|$ by some $\mathbf{u}_\varepsilon \in \mathcal{R}_r$ with $\|\mathbf{v} - \mathbf{u}_\varepsilon\| \le \inf_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\| + \varepsilon$ for an $\varepsilon$ small enough. However, those $\mathbf{u}_\varepsilon$ with $\|\mathbf{v} - \mathbf{u}_\varepsilon\|$ close to $\inf_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\|$ suffer from the following instability (below, $\varepsilon$ is replaced with $1/n$).

Remark 9.19. $\mathbf{v}_n$ in (9.10) is the sum $\mathbf{v}_n = \mathbf{v}_{n,1} + \mathbf{v}_{n,2}$ of two elementary tensors. While $\|\mathbf{v}_{n,1} + \mathbf{v}_{n,2}\| \le C$ stays bounded, the norms $\|\mathbf{v}_{n,1}\|$ and $\|\mathbf{v}_{n,2}\|$ grow as $n$: $\|\mathbf{v}_{n,1}\|, \|\mathbf{v}_{n,2}\| \ge C' n$. Hence the cancellation of both terms is the stronger the smaller $\|\mathbf{v} - \mathbf{v}_n\|$ is.

Cancellation is an unpleasant numerical effect, leading to a severe error amplification (cf. [137, §2]).^4 In general, the 'condition' of a sum $\sum_\nu a_\nu$ of reals can be described by the quotient
\[ \sum_\nu |a_\nu| \Big/ \Big| \sum_\nu a_\nu \Big| \ge 1. \]

4 A typical example is the computation of $\exp(-20)$ by $\sum_{\nu=0}^n \frac{(-20)^\nu}{\nu!}$ with suitable $n$. Independently of $n$, the calculation with standard machine precision $\mathrm{eps} = 10^{-16}$ yields a completely wrong result. The reason is that rounding errors produce an absolute error of the size $\sum_{\nu=0}^n |(-20)^\nu/\nu!| \cdot \mathrm{eps} \approx \exp(+20) \cdot \mathrm{eps}$. Hence the relative error is about $\exp(40) \cdot \mathrm{eps} \approx 2.4 \cdot 10^{17} \cdot \mathrm{eps}$.
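The cancellation effect described for $\exp(-20)$ can be reproduced in double precision. The following sketch sums the Taylor series naively, evaluates the condition quotient $\sum_\nu |a_\nu| / |\sum_\nu a_\nu|$, and compares with the cancellation-free alternative $1/\exp(20)$. The function name `taylor_exp` and the truncation index $n = 150$ are illustrative assumptions; `math.exp` serves as reference value.

```python
import math

def taylor_exp(x, n):
    """Sum the first n + 1 Taylor terms of exp(x) in floating point, left to right."""
    total, term = 0.0, 1.0
    for nu in range(n + 1):
        total += term
        term *= x / (nu + 1)      # next term x^(nu+1) / (nu+1)!
    return total

x, n = -20.0, 150
exact = math.exp(x)                        # reference value
naive = taylor_exp(x, n)                   # alternating sum with heavy cancellation
abs_sum = taylor_exp(abs(x), n)            # sum of |a_nu|, all terms positive
condition = abs_sum / exact                # quotient sum |a_nu| / |sum a_nu|
rel_error_naive = abs(naive - exact) / exact
stable = 1.0 / abs_sum                     # exp(-20) = 1 / exp(20), no cancellation
rel_error_stable = abs(stable - exact) / exact
```

The condition quotient is about $\exp(40)$, so the naive sum cannot be expected to carry any correct digits, while the reformulated sum is accurate to roughly machine precision.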
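A similar approach leads us to the following definition of a stable representation (see also (7.13)).

Definition 9.20. Let $\mathbf{V} = {}_a\bigotimes_{j=1}^d V_j$ be a normed tensor space.
(a) For any representation $0 \ne \mathbf{v} = \sum_{i=1}^r \bigotimes_{j=1}^d v_i^{(j)}$ we define^5
\[ \kappa\Big( \big( v_i^{(j)} \big)_{1 \le j \le d,\, 1 \le i \le r} \Big) := \Big( \sum_{i=1}^r \Big\| \bigotimes_{j=1}^d v_i^{(j)} \Big\| \Big) \Big/ \Big\| \sum_{i=1}^r \bigotimes_{j=1}^d v_i^{(j)} \Big\|. \tag{9.12a} \]
(b) For $\mathbf{v} \in \mathcal{R}_r$ we set
\[ \sigma(\mathbf{v}, r) := \inf \Big\{ \kappa\Big( \big( v_i^{(j)} \big)_{1 \le j \le d,\, 1 \le i \le r} \Big) : \mathbf{v} = \sum_{i=1}^r \bigotimes_{j=1}^d v_i^{(j)} \Big\}. \tag{9.12b} \]
(c) A sequence $\mathbf{v}_n \in \mathcal{R}_r$ ($n \in \mathbb{N}$) is called stable in $\mathcal{R}_r$ if
\[ \sigma((\mathbf{v}_n)_{n \in \mathbb{N}}, r) := \sup_{n \in \mathbb{N}} \sigma(\mathbf{v}_n, r) < \infty; \]
otherwise the sequence is unstable.

The instability observed in Remark 9.19 does not happen accidentally, but is a necessary consequence of the nonclosedness of $\mathcal{R}_2$. The next proposition is the negation of the special case of Lemma 9.28 for the format $\mathcal{R}_r$.

Proposition 9.21. Suppose $\dim(V_j) < \infty$. If a sequence $\mathbf{v}_n \in \mathcal{R}_r \subset {}_a\bigotimes_{j=1}^d V_j$ is stable and convergent, then $\lim_{n \to \infty} \mathbf{v}_n \in \mathcal{R}_r$.

A generalisation of the last proposition to the infinite-dimensional case follows.

Theorem 9.22. Let $\mathbf{V}$ be a reflexive Banach space with a norm satisfying (6.14). For any bounded and stable sequence $\mathbf{v}_n \in \mathbf{V}$, there is a weakly convergent subsequence $\mathbf{v}_{n_\nu} \rightharpoonup \mathbf{v} \in \mathcal{R}_r$.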
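Proof. By the definition of stability, there are vectors $v_{i,n}^{(j)} \in V_j$ such that $\mathbf{v}_{i,n} := \bigotimes_{j=1}^d v_{i,n}^{(j)}$ satisfies $\mathbf{v}_n = \sum_{i=1}^r \mathbf{v}_{i,n}$ and $\|\mathbf{v}_{i,n}\| \le C \|\mathbf{v}_n\|$, e.g., for $C := \sigma((\mathbf{v}_n), r) + 1$. Corollary 4.31 states that $\mathbf{v}_{i,n} \rightharpoonup \mathbf{v}_i$ and $\mathbf{v}_n \rightharpoonup \mathbf{v} = \sum_{i=1}^r \mathbf{v}_i$ are valid after restricting $n$ to a certain subsequence. Note that $\mathbf{v}_{i,n} \in \mathcal{R}_1 = \mathcal{T}_{(1,\ldots,1)}$ and that $\mathcal{T}_{(1,\ldots,1)}$ is weakly closed (cf. Lemma 8.6). Hence $\mathbf{v}_i \in \mathcal{R}_1$ implies that $\mathbf{v} = \sum_{i=1}^r \mathbf{v}_i \in \mathcal{R}_r$. □

Using Theorem 4.117, one even shows the termwise weak convergence $v_{n_\nu,i}^{(j)} \rightharpoonup v_i^{(j)}$ in $\mathbf{v}_{n_\nu} \rightharpoonup \mathbf{v} = \sum_{i=1}^r \bigotimes_{j=1}^d v_i^{(j)}$ for a subsequence $n_\nu \to \infty$.

5 The first sum on the right-hand side of (9.12a) resembles the projective norm (cf. §4.2.3.1). The difference is that here only $r$ terms are allowed.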
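The quantity $\kappa$ in (9.12a) is straightforward to evaluate for a given representation. The sketch below does this for the two-term representation (9.10), assuming $V_j = \mathbb{R}^2$, the Euclidean norm, and $v^{(j)}$, $w^{(j)}$ chosen as unit vectors; the helper names are hypothetical. Note that this only evaluates $\kappa$ for one particular representation of $\mathbf{v}_n$; it does not compute the infimum $\sigma(\mathbf{v}_n, 2)$ in (9.12b).

```python
import numpy as np

def elem(*vectors):
    """Elementary tensor v^(1) (x) ... (x) v^(d)."""
    out = vectors[0]
    for x in vectors[1:]:
        out = np.multiply.outer(out, x)
    return out

def kappa(terms):
    """(9.12a) for a list of r tuples (v_i^(1), ..., v_i^(d)), Euclidean norm assumed."""
    tensors = [elem(*t) for t in terms]
    return sum(np.linalg.norm(t) for t in tensors) / np.linalg.norm(sum(tensors))

v = np.array([1.0, 0.0])   # assumption: v^(j) = first unit vector for all j
w = np.array([0.0, 1.0])   # assumption: w^(j) = second unit vector for all j

def representation(n):
    """The two elementary terms of v_n in (9.10)."""
    return [(w + n * v, v + w / n, v), (v, v, w - n * v)]

kappas = [kappa(representation(n)) for n in (1, 10, 100, 1000)]
```

For this choice of vectors the two terms have norms of order $n$ while $\|\mathbf{v}_n\|$ stays close to $\sqrt{3}$, so $\kappa$ grows linearly in $n$, in line with Remark 9.19.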
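9.4.4 A Greedy Algorithm

Finally, we point to another important difference between the matrix case $d = 2$ and the true tensor case $d \ge 3$. Consider again Problem (9.1), where we want to find an approximation $\mathbf{u} \in \mathcal{R}_r$ of $\mathbf{v} \in \mathbf{V}$. In principle, we can try to repeat the deflation method in §9.3, exploiting that the best $\mathcal{R}_1$-approximation exists:
1) determine the best approximation $\mathbf{u}_1 \in \mathcal{R}_1$ to $\mathbf{v} \in \mathbf{V}$ according to Lemma 9.3, set $\mathbf{v}_1 := \mathbf{v} - \mathbf{u}_1$,
2) determine the best approximation $\mathbf{u}_2 \in \mathcal{R}_1$ to $\mathbf{v}_1 \in \mathbf{V}$ according to Lemma 9.3, set $\mathbf{v}_2 := \mathbf{v}_1 - \mathbf{u}_2$,
...
r) determine the best approximation $\mathbf{u}_r \in \mathcal{R}_1$ to $\mathbf{v}_{r-1} \in \mathbf{V}$.
Then $\hat{\mathbf{u}} := \mathbf{u}_1 + \mathbf{u}_2 + \ldots + \mathbf{u}_r \in \mathcal{R}_r$ can be considered as an approximation of $\mathbf{v} \in \mathbf{V}$.

The described algorithm belongs to the class of greedy algorithms, since in each single step we try to reduce the error as much as possible.

Remark 9.23. (a) In the matrix case $d = 2$ with $\|\cdot\| = \|\cdot\|_F$ (cf. §9.3), the algorithm from above yields the best approximation $\hat{\mathbf{u}} \in \mathcal{R}_r$ of $\mathbf{v} \in \mathbf{V}$, i.e., $\hat{\mathbf{u}}$ solves (9.1).
(b) In the true tensor case $d \ge 3$, the resulting $\hat{\mathbf{u}} \in \mathcal{R}_r$ is, in general, a rather poor approximation; i.e., $\|\mathbf{v} - \hat{\mathbf{u}}\|$ is much larger than $\inf_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\|$.

Proof. (a) The matrix case is discussed in Remark 9.9.
(b) If $\inf_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\|$ has no minimiser, $\hat{\mathbf{u}}$ cannot be a solution of (9.1). But even when $\mathbf{v} \in \mathcal{R}_r$ so that $\mathbf{u} := \mathbf{v}$ is the unique minimiser, practical examples (see below) show that the greedy algorithm yields a poor approximation $\hat{\mathbf{u}}$. □

As an example, we choose the tensor space $\mathbf{V} = \mathbb{R}^2 \otimes \mathbb{R}^2 \otimes \mathbb{R}^2$ and the tensor
\[ \mathbf{v} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 2 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 3 \\ 2 \end{bmatrix} \otimes \begin{bmatrix} 3 \\ 2 \end{bmatrix} \otimes \begin{bmatrix} 3 \\ 1 \end{bmatrix} \in \mathcal{R}_2 \tag{9.13} \]
with Euclidean norm $\|\mathbf{v}\| = \sqrt{2132} \approx 46.174$. The best approximation $\mathbf{u}_1 \in \mathcal{R}_1$ of $\mathbf{v} \in \mathbf{V}$ is
\[ \mathbf{u}_1 = 27.14270606 \cdot \begin{bmatrix} 1 \\ 0.7613363832 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 0.7613363836 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 0.3959752430 \end{bmatrix}. \]
The approximation error is $\|\mathbf{v} - \mathbf{u}_1\| = 2.334461003$. In the second step the best approximation $\mathbf{u}_2 \in \mathcal{R}_1$ of $\mathbf{v} - \mathbf{u}_1$ turns out to be
\[ \mathbf{u}_2 = 0.03403966791 \cdot \begin{bmatrix} 1 \\ -4.86171875 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ -3.91015625 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 2.469921875 \end{bmatrix}. \]
It yields the approximation error $\|\mathbf{v} - (\mathbf{u}_1 + \mathbf{u}_2)\| = 1.465604638$, whereas the best approximation in $\mathcal{R}_2$ is $\mathbf{u} = \mathbf{v}$ with vanishing error.

The reason why the algorithm fails to find the best approximation becomes obvious from the first correction step. The correction $\mathbf{u}_1$ is close to the second term in (9.13) but not equal. Therefore $\mathbf{v} - \mathbf{u}_1$ belongs to $\mathcal{R}_3$ and it is impossible to reach the best approximation in the second step.

The fact that the greedy correction can increase the tensor rank is studied in Stegeman–Comon [271]. Nevertheless, the algorithm can be used as an iteration (cf. §17.1). For convergence compare Dilworth–Kutzarova–Temlyakov [78] and Falcó–Nouy [99].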
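The greedy steps for the example above can be retraced with a short NumPy sketch. The best rank-1 approximation is computed here by alternating least squares (a higher-order power iteration) started from the leading singular vectors of the unfoldings. This is an assumption of the sketch, not the method used in the text: ALS only guarantees a stationary point, which is taken here as the best approximation. The entries of the tensor are those of (9.13), with the first term read as $(1,2) \otimes (1,2) \otimes (1,1)$.

```python
import numpy as np

def elem(x, y, z):
    """Elementary tensor x (x) y (x) z of order 3."""
    return np.einsum('i,j,k->ijk', np.asarray(x), np.asarray(y), np.asarray(z))

def rank1_als(t, iters=500):
    """Rank-1 fit of an order-3 tensor by alternating least squares.
    Only a stationary point is guaranteed (cf. Remark 9.4)."""
    # start from the leading left singular vectors of the three unfoldings
    x, y, z = (np.linalg.svd(np.moveaxis(t, j, 0).reshape(t.shape[j], -1))[0][:, 0]
               for j in range(3))
    for _ in range(iters):
        x = np.einsum('ijk,j,k->i', t, y, z)
        x /= np.linalg.norm(x)
        y = np.einsum('ijk,i,k->j', t, x, z)
        y /= np.linalg.norm(y)
        z = np.einsum('ijk,i,j->k', t, x, y)
        lam = np.linalg.norm(z)            # optimal scaling factor
        z /= lam
    return lam * elem(x, y, z)

v = elem([1.0, 2.0], [1.0, 2.0], [1.0, 1.0]) + elem([3.0, 2.0], [3.0, 2.0], [3.0, 1.0])
norm_squared = np.sum(v ** 2)

u1 = rank1_als(v)                          # first greedy step
err1 = np.linalg.norm(v - u1)
u2 = rank1_als(v - u1)                     # second greedy step on the residual
err2 = np.linalg.norm(v - u1 - u2)
```

Although the tensor is itself a sum of two elementary tensors, the error after two greedy steps remains of order one: the first step does not reproduce a term of the representation exactly, so the residual is no longer an elementary tensor.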
9.5 General Statements on Nonclosed Formats

$\mathcal{R}_r$ is not the only example of a nonclosed format. Another one will be discussed in §12.5.1. Here we study rather general nonclosed tensor representations.

9.5.1 Definitions

Since there will be other formats which are also not closed (cf. §12.5), this section is devoted to nonclosed formats in general. We consider the situation of a representation $\rho$ mapping a parameter space $P$ with $\dim(P) < \infty$ into the tensor space $\mathbf{V} = \bigotimes_{j=1}^d V_j$ (cf. §7.1.1). In the sequel we assume
\[ P: \text{ vector space with } \dim(P) < \infty, \qquad D \subset P \text{ closed subset}, \tag{9.14a} \]
\[ \mathbf{V}: \text{ tensor space with } \dim(\mathbf{V}) < \infty, \qquad \mathcal{F} \subset \mathbf{V} \text{ subset of cone structure}, \tag{9.14b} \]
\[ \rho : D \to \mathcal{F} \text{ continuous and surjective}, \tag{9.14c} \]
\[ 0 \in D \text{ and } \rho(0) = 0. \tag{9.14d} \]
By definition, the tensor subset $\mathcal{F}$ is the range of $\rho$. The (two-sided) cone structure ensures that with $\mathbf{v}$ also $\lambda \mathbf{v}$ belongs to $\mathcal{F}$ for all $\lambda \in \mathbb{K}$. In general, $\rho$ is not injective. In most of the examples $D$ coincides with $P$. All representations $\rho$ considered in this book are multilinear so that (9.14a–d) is an easy consequence.

In the following, we choose some norms on $P$ and $\mathbf{V}$, both denoted by $\|\cdot\|$. Because of the finite dimension the choice of norms is not relevant.
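If $\mathbf{v} = \rho(p)$, the norm $\|p\|$ may be considered as a stability measure for the representation of $\mathbf{v}$ by $p$. Since $\rho$ is not necessarily injective, there may be many $p$ with $\mathbf{v} = \rho(p)$. We define
\[ \sigma(\mathbf{v}) := \inf \{ \|p\| : \mathbf{v} = \rho(p) \}. \]
Remark 9.24. The infimum may be replaced by a minimum, i.e., there is at least one $p_{\mathbf{v}}$ with $\mathbf{v} = \rho(p_{\mathbf{v}})$ and $\sigma(\mathbf{v}) = \|p_{\mathbf{v}}\|$.

The ratio $\|p\| / \|\mathbf{v}\| = \|p\| / \|\rho(p)\|$ may seem to be more natural; however, note that, in general, this quantity is not scale invariant. In the case of $\mathcal{F} = \mathcal{R}_r$ the natural choice is
\[ D = P = (V_1 \times \ldots \times V_d)^r \quad\text{and}\quad \rho\Big( \big( v_i^{(j)} \big)_{i=1,\ldots,r;\; j=1,\ldots,d} \Big) = \sum_{i=1}^r \bigotimes_{j=1}^d v_i^{(j)}. \]
Then $\|\lambda p\| / \|\rho(\lambda p)\| = \lambda^{1-d} \|p\| / \|\rho(p)\|$ is not invariant. We obtain scale invariance by choosing
\[ D = (\mathcal{R}_1)^r \subset P := \mathbf{V}^r \quad\text{and}\quad \rho\big( (\mathbf{e}_i)_{i=1,\ldots,r} \big) = \sum_{i=1}^r \mathbf{e}_i, \quad \mathbf{e}_i \in \mathcal{R}_1. \]
Since $\mathcal{R}_1$ is closed, $D$ satisfies (9.14a). Since $\mathcal{F} = \mathcal{R}_r$, $\sigma(\cdot)$ coincides with $\sigma(\cdot, r)$ in (9.12b).

$\sigma(\mathbf{v})$ measures the stability of the representation of $\mathbf{v}$. Another question is as to whether $\sigma(\mathbf{v}')$ is of similar size for neighbouring $\mathbf{v}'$. For this purpose we define the $\varepsilon$-neighbourhood of some $\mathbf{v} \in \overline{\mathcal{F}}$ by
\[ U_{\mathcal{F},\varepsilon}(\mathbf{v}) := \{ \mathbf{w} \in \mathcal{F} : \|\mathbf{v} - \mathbf{w}\| < \varepsilon \}. \]
The quantity
\[ \sigma_\varepsilon(\mathbf{v}) := \sup \{ \sigma(\mathbf{w}) : \mathbf{w} \in U_{\mathcal{F},\varepsilon}(\mathbf{v}) \} \qquad \text{for } \varepsilon > 0 \]
may become infinite. Lemma 9.28 implies $\sigma_\varepsilon(\mathbf{v}) = \infty$ for all $\mathbf{v} \in \mathcal{B}$ and $\varepsilon > 0$.

To see that the quantity $\sigma_\varepsilon(\mathbf{v})$ is of numerical interest, consider the following situation. Assume that $\mathbf{v} \in \mathcal{F}$ is the solution of a certain problem. Instead we compute an approximation $\mathbf{v}' \in \mathbf{V}$ with $\|\mathbf{v} - \mathbf{v}'\| \le \varepsilon/2$ which does not necessarily belong to $\mathcal{F}$ (e.g., it has a larger rank). An approximation $\mathbf{v}'' \in \mathcal{F}$ with $\|\mathbf{v}' - \mathbf{v}''\| \le \varepsilon/2$ may be computed as in §9.6. If $\sigma_\varepsilon(\mathbf{v}) = \infty$, the decompositions of $\mathbf{v}$ and $\mathbf{v}''$ may be completely different although $\|\mathbf{v} - \mathbf{v}''\| \le \varepsilon$.

Since $\sigma_\varepsilon(\mathbf{v})$ is weakly decreasing as $\varepsilon \searrow 0$ and bounded from below by zero, the improper limit
\[ \sigma_0(\mathbf{v}) := \lim_{\varepsilon \searrow 0} \sigma_\varepsilon(\mathbf{v}) \]
exists (here $\sigma_0(\mathbf{v}) = \infty$ holds if $\sigma_\varepsilon(\mathbf{v}) = \infty$ for all $\varepsilon > 0$).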
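9.5.2 Nonclosed Formats

Definition 9.25. Let $\mathcal{F} := \operatorname{range}(\rho)$. The representation $\rho$ is called closed if $\mathcal{F}$ is closed in $\mathbf{V}$. Instead we also say that the format $\mathcal{F}$ is closed. We set $\mathcal{B} := \overline{\mathcal{F}} \setminus \mathcal{F}$.

If $\mathcal{F} = \mathcal{R}_r$, the set $\mathcal{B}$ consists of tensors with $\underline{\operatorname{rank}}(\mathbf{v}) \le r < \operatorname{rank}(\mathbf{v})$.

Remark 9.26. If $\rho$ is not closed, the set $\mathcal{B}$ is nonempty and, in general, not closed.

Proof. Example 9.31 proves the existence of a nonclosed $\mathcal{B}$. □

The general formulation of Proposition 9.21 yields the following trivial but fundamental statement.

Lemma 9.27. Let $\mathbf{v}_i \in \mathcal{F}$ with $\mathbf{v}_i := \rho(p_i)$ be a convergent sequence with the limit $\mathbf{v} = \lim \mathbf{v}_i$. Then $\sup_i \|p_i\| < \infty$ implies $\mathbf{v} \in \mathcal{F}$. The condition $\sup_i \|p_i\| < \infty$ can be replaced by $\sup_i \sigma(\mathbf{v}_i) < \infty$.

Proof. Since the set $\{ p \in D : \|p\| \le C \}$ with $C := \sup_i \|p_i\|$ is compact (cf. (9.14a)), there is an (equally denoted) subsequence with $p_i \to p \in D$. Continuity of $\rho$ (cf. (9.14c)) implies that $\mathbf{v} = \lim \rho(p_i) = \rho(p)$. Hence $\mathbf{v}$ belongs to the range $\mathcal{F}$ of $\rho$, i.e., $\mathbf{v} \in \mathcal{F}$. □

By negation we conclude the following result.

Lemma 9.28. $\mathbf{v}_i := \rho(p_i) \to \mathbf{v}^* \in \mathcal{B}$ implies $\|p_i\| \to \infty$.

Note that $\sigma_\varepsilon(\mathbf{v}) \ge 0$ is well-defined for $\mathbf{v} \in \mathcal{B}$ and $\varepsilon > 0$ since $U_{\mathcal{F},\varepsilon}(\mathbf{v})$ is a nonempty subset of $\mathcal{F}$. A consequence of Lemma 9.28 is

Conclusion 9.29. If $\mathbf{v} \in \mathcal{B}$, then $\sigma_\varepsilon(\mathbf{v}) = \infty$ holds for all $\varepsilon > 0$ and leads to $\sigma_0(\mathbf{v}) = \infty$.

Proof. If $\sigma_\varepsilon(\mathbf{v}) =: C < \infty$, we can choose $\mathbf{v}_i \in U_{\mathcal{F},\varepsilon}(\mathbf{v})$ with $\mathbf{v}_i \to \mathbf{v}$ and parameters $p_i \in D$ with $\mathbf{v}_i = \rho(p_i)$ and $\|p_i\| \le C$ (cf. Remark 9.24). Then Lemma 9.28 yields the contradiction $\mathbf{v} \notin \mathcal{B}$. □

In §9.5.3 we shall comment on the continuity of $\sigma$. A general negative result follows.

Remark 9.30. If $\mathcal{B} \ne \emptyset$ (i.e., if the format is nonclosed), $\sigma$ is discontinuous at $0 \in \mathbf{V}$.

Proof. (9.14d) implies $\sigma(0) = 0$. Assume that $\sigma$ is continuous at 0. Then there is a neighbourhood $U_{\mathcal{F},\varepsilon}$ for some $\varepsilon > 0$ with $\sigma(\mathbf{v}) \le 1$ for all $\mathbf{v} \in U_{\mathcal{F},\varepsilon}$. Hence, $\sigma_{\varepsilon - \|\mathbf{w}\|}(\mathbf{w}) \le 1$ holds for all $\mathbf{w} \in U_{\mathcal{F},\varepsilon}$. Conclusion 9.29 implies that $U_{\mathcal{F},\varepsilon} \cap \mathcal{B} = \emptyset$. On the other hand, there is some $0 \ne \mathbf{v} \in \mathcal{B}$. The cone property (9.14b) implies that $\lambda \mathbf{v} \in \mathcal{B}$ for all $\lambda \ne 0$. Choosing a sufficiently small $\lambda \ne 0$, $\lambda \mathbf{v} \in U_{\mathcal{F},\varepsilon}$ yields the contradiction. □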
9.5.3 Discussion of $\mathcal{F} = \mathcal{R}_r$

Let $\mathbb{K} = \mathbb{C}$ be the underlying field. An interesting question is whether the quantity $\sigma(\mathbf{v})$ is continuous. As known from algebraic geometry, general tensors in $\mathcal{R}_r$ admit only finitely many (essentially different) decompositions and these decompositions depend continuously on the tensor, at least for $r$ not too large. Then $\sigma_\varepsilon(\mathbf{v}) < \infty$ holds for sufficiently small $\varepsilon > 0$ and has the limit $\sigma_0(\mathbf{v}) = \sigma(\mathbf{v})$.

A particular positive result holds if the representation $\rho : D_0 \subset D \to \mathbf{V}$ is injective for a certain subset $D_0$ and the inverse map (the decomposition) $\rho^{-1} : \mathcal{F}_0 := \rho(D_0) \to D$ is continuous. Then $\sigma_\varepsilon(\mathbf{v})$ is bounded for $\mathbf{v} \in \mathcal{F}$ and $\sigma(\mathbf{v}) = \sigma_0(\mathbf{v})$. This situation occurs under the conditions mentioned in §7.3.

Above we require that $r$ be not too large. In the case of $d = 3$ and $\mathbf{V} = \mathbb{K}^n \otimes \mathbb{K}^m \otimes \mathbb{K}^p$ the concrete condition is as follows. For $r \le (n-1)(m-1)$ general tensors have a unique decomposition as stated in Domanov–De Lathauwer [82, Corollary 1.7]. However, if $r \le (n-1)(m-1) + 1$ and⁶ $\mathbb{K} = \mathbb{C}$, general tensors have finitely many decompositions (cf. Chiantini–Ottaviani [55, Proposition 5.4]).

The term 'general tensor' admits the existence of exceptional tensors. A particular exceptional situation holds for the tensor in the next example.

Example 9.31. Set $\mathbf{V} = \otimes^3 V$ with $\dim(V) \ge 2$ and choose any linearly independent vectors $a, b \in V$. In the case of the 2-term format $\mathcal{F} = \mathcal{R}_2$, the tensor
$$\mathbf{v}(t) := (a + tb) \otimes a \otimes a + a \otimes b \otimes a + a \otimes a \otimes b$$
belongs to $\mathcal{B}$ for $t \ne 0$, while $\mathbf{v}(0) \in \mathcal{F}$.

Proof. For $t = 0$ we rewrite $\mathbf{v}(0)$ as $a \otimes a \otimes (a + b) + a \otimes b \otimes a \in \mathcal{R}_2 = \mathcal{F}$. For $t \ne 0$, $\mathbf{v}(t)$ is of the form (3.25). □

The important conclusion from Example 9.31 is that the set $\mathcal{B}$ is not closed. We define
$$\overline{\mathcal{B}} = \mathcal{B} \mathbin{\dot\cup} \partial\mathcal{B} \quad \text{(disjoint union).} \tag{9.15}$$
Note that the tensor $\mathbf{v}(0) \ne 0$ defined in Example 9.31 belongs to $\partial\mathcal{B}$.
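The rewriting used in the proof of Example 9.31 can be checked numerically. The following is a minimal sketch, assuming $V = \mathbb{R}^2$ with $a, b$ chosen as the unit vectors (an illustrative choice that the text does not prescribe); the helper names `outer3`, `v` and `mode1_rank` are hypothetical. It compares $\mathbf{v}(0)$ with the two-term tensor from the proof and shows that the matricisation with respect to the first direction drops from rank 2 to rank 1 at $t = 0$.

```python
import numpy as np

def outer3(x, y, z):
    """Elementary tensor x ⊗ y ⊗ z stored as a 3-way array."""
    return np.einsum('i,j,k->ijk', x, y, z)

# Assumption for illustration: a, b are the unit vectors of R^2.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

def v(t):
    """Tensor v(t) of Example 9.31."""
    return outer3(a + t * b, a, a) + outer3(a, b, a) + outer3(a, a, b)

# Two-term representation of v(0) used in the proof.
two_term = outer3(a, a, a + b) + outer3(a, b, a)
same = np.allclose(v(0.0), two_term)

def mode1_rank(x):
    """Rank of the matricisation with respect to the first direction."""
    return np.linalg.matrix_rank(x.reshape(x.shape[0], -1))

rank_t0 = mode1_rank(v(0.0))  # all first factors are multiples of a
rank_t1 = mode1_rank(v(0.5))  # first factors a + t*b and a are independent
print(same, rank_t0, rank_t1)
```

The rank drop only illustrates why $t = 0$ is special; it does not by itself prove that $\mathbf{v}(t)$ has tensor rank 3 for $t \ne 0$.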
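9.5.4 General Case

If $\mathcal{B}$ is nonempty, also $\partial\mathcal{B}$ is nonempty since
$$0 \in \partial\mathcal{B} \tag{9.16}$$
is always true (consider $\lambda\mathbf{v}$ with $\mathbf{v} \in \mathcal{B}$ for $\lambda \to 0$ and note that $0 \in \mathcal{F}$ because of (9.14d)). Example 9.31 ensures that $\partial\mathcal{B}$ also contains nontrivial tensors of $\mathcal{F} = \mathcal{R}_2$. We remark that
$$\partial\mathcal{B} \subset \mathcal{F} \tag{9.17}$$
since $\partial\mathcal{B} \subset \overline{\mathcal{B}} \subset \overline{\mathcal{F}} = \mathcal{F} \cup \mathcal{B}$ and $\partial\mathcal{B} \cap \mathcal{B} = \emptyset$ (cf. (9.15)).

Conclusion 9.32. Let $0 \ne \mathbf{v} \in \partial\mathcal{B}$. Then $\sigma_\varepsilon(\mathbf{v}) = \infty$ for all $\varepsilon > 0$ and $\sigma_0(\mathbf{v}) = \infty$, although $\sigma(\mathbf{v}) < \infty$.

Proof. By definition of $\partial\mathcal{B}$ there is some $\mathbf{w} \in \mathcal{B}$ with $0 < \eta := \|\mathbf{v} - \mathbf{w}\| < \frac{\varepsilon}{2}$ and $\sigma_\eta(\mathbf{w}) = \infty$ (cf. Conclusion 9.29). Since $U_{\mathcal{F},\eta}(\mathbf{w}) \subset U_{\mathcal{F},\varepsilon}(\mathbf{v})$, $\sigma_\varepsilon(\mathbf{v}) \ge \sigma_\eta(\mathbf{w})$ yields the assertion. □

A consequence is the discontinuity of $\sigma$ on $\partial\mathcal{B}$. Conclusion 9.32 ensures the existence of a sequence $\mathbf{v}_i \to \mathbf{v}$ with $\sigma(\mathbf{v}_i) \to \infty > \sigma(\mathbf{v})$. This proves:

Conclusion 9.33. $\sigma$ is not continuous at $\mathbf{v} \in \partial\mathcal{B} \setminus \{0\}$.

⁶ The case of $\mathbb{K} = \mathbb{R}$ is more involved (cf. Angelini–Bocci–Chiantini [7, Theorem 4.2]).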
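9.5.5 On the Strength of Divergence

In the case described in Remark 9.19 the terms are of order $O(n)$ while $\varepsilon = O(\frac{1}{n})$ is the error. Hence an error bound $\varepsilon$ corresponds to the term size $O(\frac{1}{\varepsilon})$. Using better approximations⁷ we may obtain $\varepsilon$ with the weaker divergence order $O(1/\sqrt{\varepsilon})$. In the sequel we want to study the weakest divergence order. Given an accuracy $\varepsilon > 0$, we have to look for an approximation $\mathbf{w} \in \mathcal{F}$ with minimal $\sigma(\mathbf{w})$. This leads to the following definition.

Definition 9.34. Let $\mathbf{v} \in \mathcal{B}$ and $\varepsilon > 0$. The instability of the approximation problem in $\mathcal{F}$ is characterised by
$$\delta(\mathbf{v}, \varepsilon) := \inf\{\sigma(\mathbf{w}) : \mathbf{w} \in U_{\mathcal{F},\varepsilon}(\mathbf{v})\}.$$

Note that $\delta(\mathbf{v}, \varepsilon)$ is the infimum, whereas $\sigma_\varepsilon(\mathbf{v})$ is the supremum over the same set. Again, $\delta(\mathbf{v}, \varepsilon)$ diverges as $\varepsilon \to 0$.

Proposition 9.35. Weakly monotone divergence $\delta(\mathbf{v}, \varepsilon) \nearrow \infty$ holds for all $\mathbf{v} \in \mathcal{B}$ as $\varepsilon \searrow 0$.

Proof. For an indirect proof assume that $\delta(\mathbf{v}, \varepsilon) \le K < \infty$ for all $\varepsilon = \frac{1}{n} > 0$. Then, for any $n \in \mathbb{N}$, there are $\mathbf{w}_n \in U_{\mathcal{F},1/n}(\mathbf{v})$ (i.e., $\mathbf{w}_n \in \mathcal{F}$ and $\|\mathbf{v} - \mathbf{w}_n\| \le \frac{1}{n}$) with $\sigma(\mathbf{w}_n) \le K + 1$. Since $\mathbf{w}_n \to \mathbf{v}$, Lemma 9.27 proves the contradicting statement $\mathbf{v} \in \mathcal{F}$. □

⁷ Use a central difference quotient approximation of $\frac{d}{dt} \bigotimes_j \big(v^{(j)} + t w^{(j)}\big)\big|_{t=0}$ instead of the one-sided difference quotient.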
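The two divergence orders mentioned at the beginning of §9.5.5 can be observed numerically. The sketch below is an illustration under assumptions: it takes the tensor $b \otimes a \otimes a + a \otimes b \otimes a + a \otimes a \otimes b$ (the derivative of $\otimes^3(a + tb)$ at $t = 0$, assumed here to be the situation of Remark 9.19, which lies outside this section) with orthonormal $a, b \in \mathbb{R}^2$, and approximates it by 2-term tensors built from a one-sided and from a central difference quotient with step $t = 1/n$. All function names are hypothetical.

```python
import numpy as np

def outer3(x, y, z):
    return np.einsum('i,j,k->ijk', x, y, z)

# Assumption for illustration: orthonormal a, b in R^2.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
# Derivative of (a + t b)^{⊗3} at t = 0; a limit of 2-term tensors.
v = outer3(b, a, a) + outer3(a, b, a) + outer3(a, a, b)

def one_sided(t):
    """Two terms of the one-sided difference quotient."""
    p = a + t * b
    return [outer3(p, p, p) / t, -outer3(a, a, a) / t]

def central(t):
    """Two terms of the central difference quotient."""
    p, m = a + t * b, a - t * b
    return [outer3(p, p, p) / (2 * t), -outer3(m, m, m) / (2 * t)]

def error_and_term_size(terms):
    """Approximation error and the norm of the largest term."""
    u = terms[0] + terms[1]
    err = np.linalg.norm(v - u)
    size = max(np.linalg.norm(x) for x in terms)
    return err, size

for t in (1e-1, 1e-2, 1e-3):
    e1, s1 = error_and_term_size(one_sided(t))
    e2, s2 = error_and_term_size(central(t))
    print(f"t={t:g}: one-sided err={e1:.2e} size={s1:.1e}; "
          f"central err={e2:.2e} size={s2:.1e}")
```

In both variants the terms grow like $1/t$, but the error behaves like $t$ in the one-sided case and like $t^2$ in the central case, which corresponds to term sizes $O(1/\varepsilon)$ and $O(1/\sqrt{\varepsilon})$, respectively.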
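9.5.6 Uniform Strength of Divergence

9.5.6.1 Uniform Divergence

The function $\delta(\mathbf{v}, \cdot)$ is the exact description of the kind of divergence at a fixed $\mathbf{v} \in \mathcal{B}$. We may ask whether there is a uniform bound for all $\mathbf{v} \in \mathcal{B}$. The strongest formulation of uniform divergence would be an inequality
$$\delta(\mathbf{v}, \varepsilon) \ge \delta_0(\varepsilon) \quad \text{for all } \mathbf{v} \in \mathcal{B} \text{ with } \|\mathbf{v}\| = 1, \tag{9.18a}$$
where
$$\delta_0(\varepsilon) \nearrow \infty \quad \text{as } \varepsilon \searrow 0. \tag{9.18b}$$
The best possible $\delta_0$ satisfying (9.18a) is
$$\delta_0(\varepsilon) := \inf\{\delta(\mathbf{v}, \varepsilon) : \mathbf{v} \in \mathcal{B}, \|\mathbf{v}\| = 1\} = \inf\{\sigma(\mathbf{w}) : \mathbf{w} \in \mathcal{F}, \|\mathbf{v} - \mathbf{w}\| \le \varepsilon, \mathbf{v} \in \mathcal{B}, \|\mathbf{v}\| = 1\}. \tag{9.19}$$
By definition, $\delta_0(\varepsilon)$ is weakly increasing as $\varepsilon \searrow 0$. The crucial question is whether the limit $\lim_{\varepsilon \to 0} \delta_0(\varepsilon)$ is finite or infinite.

Proposition 9.36. Uniform divergence as in (9.18a,b) holds if and only if $\mathcal{B} \cup \{0\}$ is closed.

Proof. As mentioned in (9.16), zero does not belong to $\mathcal{B}$, but to its closure. Therefore closedness of $\mathcal{B} \cup \{0\}$ means that $\partial\mathcal{B}$ contains no nontrivial tensor. In particular $\mathcal{B}_1 := \mathcal{B} \cap \{\mathbf{v} : \|\mathbf{v}\| = 1\}$ would be closed.

(i) Let $\mathcal{B} \cup \{0\}$ be closed. For an indirect proof assume $\lim_{\varepsilon \to 0} \delta_0(\varepsilon) =: K < \infty$. Then for any $\varepsilon = 1/n$, $n \in \mathbb{N}$, there are tensors $\mathbf{v}_n \in \mathcal{B}_1$ and $\mathbf{w}_n \in \mathcal{F}$ with $\sigma(\mathbf{w}_n) \le K + 1$ and $\|\mathbf{v}_n - \mathbf{w}_n\| \le 1/n$. By compactness we may take subsequences so that $\mathbf{v}_n \to \mathbf{v}$ and $\mathbf{w}_n \to \mathbf{w}$. Since $\mathcal{B}_1$ is closed, we obtain $\mathbf{v} \in \mathcal{B}_1 \subset \mathcal{B}$. As $\sigma(\mathbf{w}_n)$ is uniformly bounded, the limit belongs to $\mathcal{F}$ (cf. Lemma 9.27), i.e., $\mathbf{w} \in \mathcal{F}$. Now $\|\mathbf{v}_n - \mathbf{w}_n\| \le 1/n$ yields the contradiction $\mathbf{v} = \mathbf{w}$ ($\mathcal{F}$ and $\mathcal{B}$ are disjoint!).

(ii) If $\mathcal{B} \cup \{0\}$ is not closed, there is some $0 \ne \mathbf{w} \in \partial\mathcal{B} := \overline{\mathcal{B}} \setminus \mathcal{B}$. Thanks to the cone property (9.14b), we may assume without loss of generality that $\|\mathbf{w}\| = 1$. Note that $\partial\mathcal{B} \subset \mathcal{F}$ (cf. (9.17)). Hence $\mathbf{w}$ has a finite value $\omega := \sigma(\mathbf{w})$. For any $\varepsilon > 0$ we find some $\mathbf{v} \in \mathcal{B}_1$ with $\|\mathbf{v} - \mathbf{w}\| \le \varepsilon$. Now (9.19) implies that $\delta_0(\varepsilon) \le \omega$ for all $\varepsilon > 0$, i.e., the property (9.18b) is not valid. □

In the interesting case of $\mathcal{F} = \mathcal{R}_r$ we know that $\mathcal{B} \cup \{0\}$ is not closed (cf. Example 9.31). Hence uniform divergence (9.18a,b) does not hold for $\mathcal{F} = \mathcal{R}_r$. Nevertheless it is possible to refine the definition of divergence.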
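9.5.6.2 Weaker Form of Uniform Divergence

In the case of $\mathcal{F} = \mathcal{R}_r$, the exceptional set $\partial\mathcal{B} = \overline{\mathcal{B}} \setminus \mathcal{B}$ is a rather small subset of $\mathcal{F}$. In the following we formulate an inequality involving the distance from $\partial\mathcal{B}$.

Theorem 9.37. There is a function $\delta_1$ with
$$\delta_1(\varepsilon) \nearrow \infty \quad \text{as } \varepsilon \searrow 0 \tag{9.20a}$$
such that
$$\delta(\mathbf{v}, \varepsilon) \ge \operatorname{dist}(\mathbf{v}, \partial\mathcal{B})\, \delta_1(\varepsilon) \quad \text{for all } \mathbf{v} \in \mathcal{B} \text{ with } \|\mathbf{v}\| = 1. \tag{9.20b}$$

Proof. (a) If $\operatorname{dist}(\mathbf{v}, \partial\mathcal{B}) = 0$, the estimate $\delta(\mathbf{v}, \varepsilon) \ge 0$ is trivial.

(b) In the following we consider those tensors $\mathbf{v}$ with $\mathbf{v} \in \mathcal{B}$, $\|\mathbf{v}\| = 1$, and $\operatorname{dist}(\mathbf{v}, \partial\mathcal{B}) > 0$. In this case the best possible $\delta_1(\varepsilon)$ is
$$\delta_1(\varepsilon) := \inf\Big\{ \frac{\delta(\mathbf{v}, \varepsilon)}{\operatorname{dist}(\mathbf{v}, \partial\mathcal{B})} : \mathbf{v} \in \mathcal{B}, \|\mathbf{v}\| = 1, \operatorname{dist}(\mathbf{v}, \partial\mathcal{B}) > 0 \Big\} = \inf\Big\{ \frac{\sigma(\mathbf{w})}{\operatorname{dist}(\mathbf{v}, \partial\mathcal{B})} : \mathbf{w} \in \mathcal{F}, \mathbf{v} \in \mathcal{B} \text{ with } \|\mathbf{v} - \mathbf{w}\| \le \varepsilon, \|\mathbf{v}\| = 1, \operatorname{dist}(\mathbf{v}, \partial\mathcal{B}) > 0 \Big\}.$$
$\delta_1$ is weakly increasing as $\varepsilon \to 0$. For an indirect proof of (9.20a) we assume that $\delta_1(\varepsilon) \le K < \infty$. As in the proof of Proposition 9.36 there are convergent subsequences $\mathbf{w}_n \in \mathcal{F}$, $\mathbf{v}_n \in \mathcal{B}$ with
$$\mathbf{v} = \lim_{n\to\infty} \mathbf{v}_n, \quad \mathbf{w} = \lim_{n\to\infty} \mathbf{w}_n, \quad \|\mathbf{v}_n - \mathbf{w}_n\| \le \tfrac{1}{n}, \quad \mathbf{w}_n \in \mathcal{F}, \quad \mathbf{v}_n \in \mathcal{B}, \quad \|\mathbf{v}_n\| = 1, \quad \operatorname{dist}(\mathbf{v}_n, \partial\mathcal{B}) > 0,$$
and $\sigma(\mathbf{w}_n) \le (K + 1) \operatorname{dist}(\mathbf{v}_n, \partial\mathcal{B})$. We conclude from (9.16) that $\operatorname{dist}(\mathbf{v}_n, \partial\mathcal{B}) \le \operatorname{dist}(\mathbf{v}_n, 0) = \|\mathbf{v}_n\| = 1$. Therefore $\sigma(\mathbf{w}_n) \le K + 1$ implies
$$\mathbf{w} \in \mathcal{F} \tag{9.21}$$
(cf. Lemma 9.27). Next we check the limit $\operatorname{dist}(\mathbf{v}, \partial\mathcal{B}) = \lim_{n\to\infty} \operatorname{dist}(\mathbf{v}_n, \partial\mathcal{B}) \ge 0$. Assume that $\operatorname{dist}(\mathbf{v}_n, \partial\mathcal{B}) \to 0$. Then also $\sigma(\mathbf{w}_n) \le (K + 1) \operatorname{dist}(\mathbf{v}_n, \partial\mathcal{B}) \to 0$. By Remark 9.24 there are parameters $p_n \in D$ with $\mathbf{w}_n = \rho(p_n)$ and $\sigma(\mathbf{w}_n) = \|p_n\|$. Now $\sigma(\mathbf{w}_n) = \|p_n\| \to 0$ proves $p_n \to 0$, while (9.14c,d) show that $\mathbf{w} = \lim \mathbf{w}_n = \lim \rho(p_n) = \rho(0) = 0$. Since the norm is continuous, $\|\mathbf{w}\| = 0$ holds in contradiction to $\|\mathbf{w}\| = 1$, which follows from $\|\mathbf{v}_n - \mathbf{w}_n\| \le \frac{1}{n}$ and $\|\mathbf{v}_n\| = 1$. Hence $\lim_{n\to\infty} \operatorname{dist}(\mathbf{v}_n, \partial\mathcal{B}) = \operatorname{dist}(\mathbf{v}, \partial\mathcal{B}) > 0$ holds and implies that $\mathbf{v} \notin \partial\mathcal{B}$. Since $\mathbf{v}_n \in \mathcal{B}$, the limit $\mathbf{v}$ is in $\overline{\mathcal{B}} = \mathcal{B} \cup \partial\mathcal{B}$ (cf. (9.15)) and $\mathbf{v} \notin \partial\mathcal{B}$ proves
$$\mathbf{v} \in \mathcal{B}. \tag{9.22}$$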
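From $\|\mathbf{v}_n - \mathbf{w}_n\| \le \frac{1}{n}$ we conclude $\mathbf{v} = \mathbf{w}$, which is a contradiction since both tensors are in disjoint sets (cf. (9.21), (9.22)). □

The interpretation of Theorem 9.37 depends on the topological structure of $\partial\mathcal{B}$ as seen next.

Remark 9.38. If $\partial\mathcal{B}$ is closed, the distance $\operatorname{dist}(\mathbf{v}, \partial\mathcal{B})$ is positive for all $\mathbf{v} \in \mathcal{B}$. This yields a nontrivial estimate (9.20b) for all $\mathbf{v} \in \mathcal{B}$.

Proof. $\mathbf{v} \in \mathcal{B}$ and $\partial\mathcal{B} \subset \mathcal{F}$ imply $\mathbf{v} \notin \partial\mathcal{B}$. Note that $\operatorname{dist}(\mathbf{v}, \partial\mathcal{B}) = 0$ for a closed set $\partial\mathcal{B}$ is equivalent to $\mathbf{v} \in \partial\mathcal{B}$. □

Finally we consider the case of a nonclosed set $\partial\mathcal{B}$. We split the closure $\overline{\partial\mathcal{B}}$ into disjoint sets $\overline{\partial\mathcal{B}} = \partial\mathcal{B} \cup \mathcal{C}$.

Remark 9.39. (a) $\mathcal{C}$ is a subset of $\mathcal{B}$. (b) $\operatorname{dist}(\mathbf{v}, \partial\mathcal{B}) = 0$ holds for $\mathbf{v} \in \mathcal{B}$ if and only if $\mathbf{v} \in \mathcal{C}$.

Proof. (a) $\partial\mathcal{B} \subset \overline{\mathcal{B}}$ implies $\overline{\partial\mathcal{B}} \subset \overline{\mathcal{B}} = \mathcal{B} \cup \partial\mathcal{B}$ and $\mathcal{C} \subset \mathcal{B} \cup \partial\mathcal{B}$. Since $\mathcal{C} \cap \partial\mathcal{B} = \emptyset$, $\mathcal{C} \subset \mathcal{B}$ is proved.

(b) Note that $\operatorname{dist}(\mathbf{v}, \partial\mathcal{B}) = 0$ is equivalent to $\operatorname{dist}(\mathbf{v}, \overline{\partial\mathcal{B}}) = 0$. In the latter case there is some $\mathbf{w} \in \overline{\partial\mathcal{B}}$ with $\|\mathbf{v} - \mathbf{w}\| = \operatorname{dist}(\mathbf{v}, \overline{\partial\mathcal{B}}) = 0$, i.e., $\mathbf{v} = \mathbf{w}$. Comparing $\mathbf{v} \in \mathcal{B}$ and $\mathbf{w} \in \overline{\partial\mathcal{B}} = \partial\mathcal{B} \cup \mathcal{C}$ and noting that $\partial\mathcal{B} \subset \mathcal{F}$, it follows that $\mathbf{v} = \mathbf{w} \in \mathcal{C}$. □

In case of a nonclosed $\partial\mathcal{B}$, the estimate (9.20b) degenerates to $\delta(\mathbf{v}, \varepsilon) \ge 0$ if and only if $\mathbf{v} \in \mathcal{C}$.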
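9.5.6.3 Example $\otimes^3 \mathbb{R}^2$

Here we explore the topological structure of $\mathcal{B}$ for the tensor space $\mathbf{V} = \otimes^3 \mathbb{R}^2$, which is the smallest nontrivial example. The maximal rank in $\mathbf{V}$ is 3 (cf. §3.2.6.5 and Kruskal [204]). Hence $\mathcal{R}_3$ coincides with $\mathbf{V}$ and is obviously closed. As seen by the tensor $\mathbf{v}$ in (3.24), $\mathcal{R}_2$ is not closed. Lemma 9.18 states that
$$\mathcal{B} = \Big\{ \bigotimes_{j=1}^{3} \varphi^{(j)}(\mathbf{v}) : \varphi^{(j)} \in L(\mathbb{R}^2, \mathbb{R}^2) \text{ isomorphism} \Big\},$$
where $\mathbf{v}$ is defined in (3.24) with $\{a, b\}$ being a fixed basis of $\mathbb{R}^2$. Let $\varphi^{(2)} = \varphi^{(3)}$ be the identity and define $\varphi^{(1)}$ by $\varphi^{(1)}(a) = a$, $\varphi^{(1)}(b) = a + tb$. For $t \ne 0$, $\varphi^{(1)}$ is an isomorphism, whereas for $t = 0$ it is not invertible. Note that with these mappings $\bigotimes_{j=1}^{3} \varphi^{(j)}(\mathbf{v})$ coincides with the tensor in Example 9.31. For $t = 0$ we obtain the tensor
$$\mathbf{w}_1 = a \otimes a \otimes a + a \otimes b \otimes a + a \otimes a \otimes b \in \partial\mathcal{B}.$$
The same construction with respect to the directions $j = 2$ and $j = 3$ yields
$$\mathbf{w}_2 = b \otimes a \otimes a + a \otimes a \otimes a + a \otimes a \otimes b \in \partial\mathcal{B}, \qquad \mathbf{w}_3 = b \otimes a \otimes a + a \otimes b \otimes a + a \otimes a \otimes a \in \partial\mathcal{B}.$$
We obtain all tensors in $\partial\mathcal{B}$ by $\bigotimes_{j=1}^{3} \varphi^{(j)}(\mathbf{v})$ if at least one $\varphi^{(j)} \in L(\mathbb{R}^2, \mathbb{R}^2)$ is not invertible. Such tensors can be written as $\bigotimes_{j=1}^{3} \psi^{(j)}(\mathbf{w}_i)$ with $\mathbf{w}_i$, $i \in \{1, 2, 3\}$, and general linear maps $\psi^{(j)} \in L(\mathbb{R}^2, \mathbb{R}^2)$, i.e.,
$$\partial\mathcal{B} = \bigcup_{1 \le i \le 3} \Big\{ \bigotimes_{j=1}^{3} \psi^{(j)}(\mathbf{w}_i) : \psi^{(j)} \in L(\mathbb{R}^2, \mathbb{R}^2) \Big\}.$$
Since $L(\mathbb{R}^2, \mathbb{R}^2)$ is closed we obtain the desired result.

Proposition 9.40. In the case of $\mathbf{V} = \otimes^3 \mathbb{R}^2$, the set $\partial\mathcal{B}$ is closed.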
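9.5.7 Extension to Vector Spaces of Larger Dimension

Usually the tensor representation $\rho$ is not restricted to a particular tensor space $\mathbf{V}$, but is defined for all $\mathbf{V}$. When studying the (non-)closedness of a format, we can study the case of low-dimensional vector spaces $V_j$. In the following we consider spaces $V_j \subset W_j$ ($1 \le j \le d$) with $\dim(V_j) < \infty$ and correspondingly $\mathbf{V} \subset \mathbf{W}$. We denote the formats applied to $\mathbf{V}$ and $\mathbf{W}$ by $\rho_{\mathbf{V}}$ and $\rho_{\mathbf{W}}$ with corresponding ranges $\mathcal{F}_{\mathbf{V}}$ and $\mathcal{F}_{\mathbf{W}}$.

Condition 9.41. Let $\pi^{(j)} : W_j \to V_j \subset W_j$ be some projections and set $\boldsymbol{\pi} := \bigotimes_{j=1}^{d} \pi^{(j)} : \mathbf{W} \to \mathbf{V}$. We require: (a) $\mathcal{F}_{\mathbf{V}} \subset \mathcal{F}_{\mathbf{W}}$; (b) if $\mathbf{w} = \rho_{\mathbf{W}}(p_{\mathbf{W}}) \in \mathcal{F}_{\mathbf{W}}$, then there is some parameter vector $p_{\mathbf{V}}$ so that $\boldsymbol{\pi}\mathbf{w} = \rho_{\mathbf{V}}(p_{\mathbf{V}}) \in \mathcal{F}_{\mathbf{V}}$.

Lemma 9.42. Under Condition 9.41 the nonclosedness of $\mathcal{F}_{\mathbf{V}}$ implies the nonclosedness of $\mathcal{F}_{\mathbf{W}}$.

Proof. Since $\mathcal{F}_{\mathbf{V}}$ is not closed, there is some $\mathbf{v}^* \in \overline{\mathcal{F}_{\mathbf{V}}} \setminus \mathcal{F}_{\mathbf{V}} \subset \mathbf{V}$ and a sequence $\mathcal{F}_{\mathbf{V}} \ni \mathbf{v}_i \to \mathbf{v}^*$. Because of $\dim(\mathbf{V}) < \infty$, the inclusion $\mathbf{V} \hookrightarrow \mathbf{W}$ is continuous (cf. Definition 4.17). Therefore $\mathbf{v}_i \to \mathbf{v}^*$ also holds in the topology of $\mathbf{W}$. Assume that $\rho_{\mathbf{W}}(p^*_{\mathbf{W}}) = \mathbf{v}^*$ holds for some $p^*_{\mathbf{W}}$. Condition 9.41 shows that $\mathbf{v}^* = \boldsymbol{\pi}\mathbf{v}^* = \rho_{\mathbf{V}}(p^*_{\mathbf{V}}) \in \mathcal{F}_{\mathbf{V}}$ in contradiction to the choice of $\mathbf{v}^*$. □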
9.6 Numerical Approaches for the r-Term Approximation

If the infimum $\inf_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\|$ is not attained, any numerical method is in trouble. First, no computed sequence can converge, and second, the instability will spoil the computation. On the other hand, if $\min_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\|$ exists and moreover essential uniqueness holds (see Theorem 7.5), there is hope for a successful numerical treatment.
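9.6.1 Use of the Hybrid Format

When $\mathbf{v} \in \mathbf{V}$ has to be approximated by $\mathbf{u} \in \mathcal{R}_r$, the computational cost depends on the representation of $\mathbf{v}$. The cost is huge if $\mathbf{v}$ is represented in full format. Fortunately, in most of the applications, $\mathbf{v}$ is already given in R-term format with some $R > r$ or, possibly, in a tensor subspace format. We start with the latter case.

We consider the case of
$$V_j = \mathbb{K}^{I_j} \quad \text{and} \quad \mathbf{V} = \mathbb{K}^{\mathbf{I}}, \quad \mathbf{I} = I_1 \times \ldots \times I_d,$$
equipped with the Euclidean norm $\|\cdot\|$. Assume that $\mathbf{v} = \rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{j=1}^{d}\big)$ with orthogonal matrices $B_j \in \mathbb{K}^{I_j \times J_j}$ and $\mathbf{a} \in \mathbb{K}^{\mathbf{J}}$, $\mathbf{J} = J_1 \times \ldots \times J_d$. The Euclidean norm in $\mathbb{K}^{\mathbf{J}}$ is also denoted by $\|\cdot\|$. Set $\mathbf{B} := \bigotimes_{j=1}^{d} B_j$ and note that $\mathbf{v} = \mathbf{B}\mathbf{a}$ (cf. (8.5b)).

Lemma 9.43. Let $\mathbf{v}, \mathbf{a}, \mathbf{B}$ be as above. Then any $\mathbf{c} \in \mathbb{K}^{\mathbf{J}}$ and the corresponding tensor $\mathbf{u} := \mathbf{B}\mathbf{c} \in \mathbb{K}^{\mathbf{I}}$ satisfy $\|\mathbf{a} - \mathbf{c}\| = \|\mathbf{v} - \mathbf{u}\|$. Furthermore, $\mathbf{c}$ and $\mathbf{u}$ have equal tensor rank. Hence, minimisation of $\|\mathbf{a} - \mathbf{c}\|$ over all $\mathbf{c} \in \mathcal{R}_r(\mathbb{K}^{\mathbf{J}})$ is equivalent to minimisation of $\|\mathbf{v} - \mathbf{u}\|$ over all $\mathbf{u} \in \mathcal{R}_r(\mathbb{K}^{\mathbf{I}})$.

Proof. The coincidence of the norm holds because orthonormal bases are used: $\mathbf{B}^{\mathsf{H}}\mathbf{B} = \mathbf{I}$. Theorem 8.44a states that $\operatorname{rank}(\mathbf{c}) = \operatorname{rank}(\mathbf{u})$. □

Therefore the strategy consists of three steps:
(i) given $\mathbf{v} = \rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{j=1}^{d}\big) \in \mathbb{K}^{\mathbf{I}}$, focus on $\mathbf{a} \in \mathbb{K}^{\mathbf{J}}$,
(ii) approximate $\mathbf{a}$ by some $\mathbf{c} \in \mathcal{R}_r(\mathbb{K}^{\mathbf{J}})$,
(iii) define $\mathbf{u} := \mathbf{B}\mathbf{c} \in \mathcal{R}_r(\mathbb{K}^{\mathbf{I}})$ as approximant of $\mathbf{v}$.

The above approach is of practical relevance since $\#\mathbf{J} \le \#\mathbf{I}$ holds and often $\#\mathbf{J} \ll \#\mathbf{I}$ is expected. The resulting tensor $\mathbf{u} = \mathbf{B}\mathbf{c}$ has hybrid format: $\mathbf{v} = \rho^{\mathrm{hybr}}_{\mathrm{orth}}(\ldots)$ (cf. (8.19)). This approach is, e.g., recommended in Espig [87].

The previous step (ii) depends on the format of $\mathbf{a} \in \mathbb{K}^{\mathbf{J}}$. In the general case of $\mathbf{v} = \rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{j=1}^{d}\big)$, the coefficient tensor $\mathbf{a}$ is given in full format. A more favourable case is the hybrid format $\mathbf{v} = \rho^{\mathrm{hybr}}_{\mathrm{orth}}(\ldots)$, where $\mathbf{a}$ is given in R-term format.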
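The norm identity of Lemma 9.43 is easy to observe numerically. The following minimal sketch assumes $d = 3$, real data, and matrices $B_j$ with orthonormal columns produced by a QR factorisation of random matrices; the sizes and the helper `apply_B` are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
I_sizes, J_sizes = (6, 7, 8), (2, 3, 4)  # illustrative sizes with #J << #I

# Matrices B_j in R^{I_j x J_j} with orthonormal columns (reduced QR factor Q).
B = [np.linalg.qr(rng.standard_normal((n, m)))[0]
     for n, m in zip(I_sizes, J_sizes)]

def apply_B(core):
    """Compute (B_1 ⊗ B_2 ⊗ B_3) core for a coefficient tensor in R^J."""
    return np.einsum('ia,jb,kc,abc->ijk', B[0], B[1], B[2], core)

a = rng.standard_normal(J_sizes)                     # coefficient tensor of v
factors = [rng.standard_normal(m) for m in J_sizes]
c = np.einsum('a,b,c->abc', *factors)                # some rank-1 tensor c

v, u = apply_B(a), apply_B(c)
dist_core = np.linalg.norm(a - c)   # ||a - c|| in R^J
dist_full = np.linalg.norm(v - u)   # ||v - u|| in R^I
print(dist_core, dist_full)
```

Both distances agree up to rounding, so any approximation algorithm may work on the small coefficient tensor and only map the result back by $\mathbf{B}$ at the end.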
Next, we assume that $\mathbf{v} \in \mathbf{V}$ is given in R-term format with a possibly large representation rank $R$, which has to be reduced to $r \le R$ (either $r$ fixed, or indirectly determined by a prescribed accuracy). §8.5.2.3 describes the conversion of the tensor $\mathbf{v} = \rho_{\text{r-term}}(R, \ldots)$ into the hybrid format $\mathbf{v} = \rho^{\mathrm{hybr}}_{\mathrm{orth}}(\ldots) = \mathbf{B}\mathbf{a}$; i.e., with the coefficient tensor $\mathbf{a} \in \mathbb{K}^{\mathbf{J}}$ again given in the R-term format $\mathbf{a} = \rho_{\text{r-term}}(R, \ldots)$. According to Lemma 9.43, the approximation is applied to $\mathbf{a}$.

We summarise the reduction of the approximation problems for the various formats:

format of original tensor $\mathbf{v}$ | format of coefficient tensor $\mathbf{a}$
$\rho_{\mathrm{orth}}\big(\mathbf{a}, (B_j)_{j=1}^{d}\big)$ | full format
$\rho^{\mathrm{hybr}}_{\mathrm{orth}}\big(R, (a_\nu^{(j)}), (B_j)_{j=1}^{d}\big)$ | R-term format
R-term format | R-term format
(9.23)

The equivalence of minimising $\mathbf{c}$ in $\|\mathbf{a} - \mathbf{c}\|$ and $\mathbf{u}$ in $\|\mathbf{v} - \mathbf{u}\|$ again leads to the statement that the minimiser $\mathbf{u}^*$ of $\min_{\mathbf{u}} \|\mathbf{v} - \mathbf{u}\|$ belongs to $\bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v})$ (cf. Lemma 9.2).

The hybrid format is also involved in the approach proposed by Khoromskij–Khoromskaia [187]. It applies in the case of a large representation rank $r$ in $\mathbf{v} \in \mathcal{R}_r$ and $d = 3$, and consists of two steps:
Step 1: convert the tensor $\mathbf{v} \in \mathcal{R}_r$ approximately into an HOSVD representation $\mathbf{v}' = \rho_{\mathrm{HOSVD}}\big(\mathbf{a}, (B_j)_{1 \le j \le d}\big)$;
Step 2: exploit the sparsity pattern (see below) of $\mathbf{a}$ to reconvert to $\mathbf{v}'' \in \mathcal{R}_{r'}$ with hopefully much smaller $r' < r$.

For Step 1 one might use methods as described in Remark 8.33. Concerning Step 2 we remark that because of the HOSVD structure, the entries of the coefficient tensor $\mathbf{a}$ are not of similar size. In practical applications one observes that a large part of the entries can be dropped yielding a sparse tensor (cf. §7.6.5), although a theoretical guarantee cannot be given. A positive result about the sparsity of $\mathbf{a}$ can be stated for sparse-grid bases instead of HOSVD bases (cf. §7.6.5).
9.6.2 Alternating Least-Squares Method

9.6.2.1 General Setting of Alternating Methods

Assume that $\Phi$ is a real-valued function of variables $\mathbf{x} := (x_\omega)_{\omega \in \Omega}$ with an ordered index set $\Omega$. We want to find a minimiser $\mathbf{x}^*$ of $\Phi(\mathbf{x}) = \Phi(x_{\omega_1}, x_{\omega_2}, \ldots)$. A standard iterative approach is the successive minimisation with respect to the single variables $x_\omega$. The iteration starts with some $\mathbf{x}^{(0)}$. Each step of the iteration maps $\mathbf{x}^{(m-1)}$ into $\mathbf{x}^{(m)}$ and has the following form:

Start: choose $x_\omega^{(0)}$ for $\omega \in \Omega$.
Iteration ($m = 1, 2, \ldots$): for $i := 1, \ldots, \#\Omega$ do
    $x_{\omega_i}^{(m)}$ := minimiser of $\Phi(\ldots, x_{\omega_{i-1}}^{(m)}, \xi, x_{\omega_{i+1}}^{(m-1)}, \ldots)$ w.r.t. $\xi$.    (9.24)

Note that in the last line the variables $x_{\omega_\ell}^{(m-1)}$ for $\ell > i$ are taken from the last iterate $\mathbf{x}^{(m-1)}$, while for $\ell < i$ the new values are inserted.

The underlying assumption is that minimisation with respect to a single variable is much easier and cheaper than minimisation with respect to all variables simultaneously. The form of the iteration is well known from the Gauss–Seidel method (cf. [140, §3.2.3]) and called the coordinate descent method (cf. Luenberger–Ye [221, §8.6]).

Obviously, the value $\Phi(\mathbf{x}^{(m)})$ is weakly decreasing during the computation. Whether the iterates converge depends on properties of $\Phi$ and on the initial value. In case $\mathbf{x}^{(m)}$ converges, the limit may be a local minimum (cf. Example 9.49).

Next, we mention some variations of the general method.
(α) Minimisation may be replaced with maximisation.
(β) Using $i := 1, 2, \ldots, \#\Omega - 1, \#\Omega, \#\Omega - 1, \ldots, 2$ as i-loop in (9.24), we ensure a certain symmetry. Prototype is the symmetric Gauss–Seidel iteration (cf. [140, §5.4.3]).
(γ) Instead of single variables, we may use groups of variables, e.g., minimise first with respect to $(x_1, x_2)$, then with respect to $(x_3, x_4)$, etc. After rewriting ($X_1 := (x_1, x_2), \ldots$) we get the same setting as in (9.24). Since we have not fixed the format of $x_j$, each variable $x_j$ may be vector-valued. The corresponding variant of the Gauss–Seidel method is called the block-Gauss–Seidel iteration (cf. [140, §3.3.3]).
(δ) The groups of variables may overlap, e.g., minimise first with respect to $(x_1, x_2)$, then with respect to $(x_2, x_3)$, etc.
(ε) We usually do not determine the exact minimiser as required in (9.24). Since the method is iterative anyway, there is no need for an exact minimisation. The weak decrease of $\Phi(\mathbf{x}^{(m)})$ can still be ensured.
(ζ) Previous iterates can be used to form some nonlinear analogues of the cg, Krylov, or GMRES methods.
(η) The maximum block improvement (MBI) variant determines all corrections with respect to $\omega \in \Omega$ in parallel and applies the update in the direction of the largest correction. This method and its convergence are studied in Li–Uschmajew–Zhang [209].

For any $1 \le k \le p$, let $\Phi(x_1, \ldots, x_p)$ with fixed $x_j$ ($j \ne k$) be a quadratic function⁸ in $x_k$. Then the minimisation in (9.24) is a least-squares problem, and algorithm (9.24) is called the alternating least-squares method (ALS). Typically, this situation will arise when $\Phi$ is a squared multilinear function.
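The relation between (9.24) and the Gauss–Seidel method can be made concrete for a quadratic function. The sketch below assumes $\Phi(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{\mathsf{T}} A \mathbf{x} - \mathbf{b}^{\mathsf{T}}\mathbf{x}$ with a symmetric positive definite matrix $A$ (an illustrative choice; the matrix, the right-hand side and the function names are hypothetical). Exact minimisation with respect to the single variable $x_i$ then reproduces one Gauss–Seidel update, and the values $\Phi(\mathbf{x}^{(m)})$ decrease weakly.

```python
import numpy as np

def coordinate_descent_sweep(A, b, x):
    """One sweep of (9.24) for Phi(x) = 0.5*x^T A x - b^T x, A symmetric positive definite."""
    x = x.copy()
    for i in range(len(x)):
        # Exact minimiser w.r.t. x_i: entries l < i are already updated,
        # entries l > i still belong to the previous iterate.
        x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

def phi(A, b, x):
    return 0.5 * x @ A @ x - b @ x

# Illustrative data (assumption): a small, strictly diagonally dominant SPD matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.zeros(3)
values = [phi(A, b, x)]
for m in range(25):
    x = coordinate_descent_sweep(A, b, x)
    values.append(phi(A, b, x))

print(x, np.linalg.solve(A, b))
```

For a strictly convex quadratic the limit is the global minimiser; for the nonconvex functions arising in tensor approximation only the weak decrease of $\Phi$ carries over.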
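9.6.2.2 ALS Algorithm for the r-Term Approximation

9.6.2.2.1 Minimisation Problem and Algorithm

Let $\mathbf{V} = \bigotimes_{j=1}^{d} V_j$ and $V_j = \mathbb{K}^{I_j}$ be equipped with the Euclidean norm⁹ and set $\mathbf{I} := I_1 \times \ldots \times I_d$. For $\mathbf{v} \in \mathbf{V}$ and a representation rank $r \in \mathbb{N}_0$ we want to minimise¹⁰
$$\|\mathbf{v} - \mathbf{u}\|^2 = \Big\| \mathbf{v} - \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} u_\nu^{(j)} \Big\|^2 = \sum_{\mathbf{i} \in \mathbf{I}} \Big| \mathbf{v}[\mathbf{i}] - \sum_{\nu=1}^{r} \prod_{j=1}^{d} u_\nu^{(j)}[i_j] \Big|^2 \tag{9.25a}$$
with respect to all entries $u_\nu^{(j)}[i]$. To construct the alternating least-squares method (abbreviation: ALS), we identify the entries $u_\nu^{(k)}[i]$ with the variables $x_\omega$ from above. The indices are $\omega = (k, \nu, i) \in \Omega := \{1, \ldots, d\} \times \{1, \ldots, r\} \times I_k$. For this purpose we introduce the notations $\mathbf{I}_{[k]} := \times_{j \ne k} I_j$ and $\mathbf{u}_\nu^{[k]} := \bigotimes_{j \ne k} u_\nu^{(j)}$ (cf. (3.17d)) so that $\mathbf{u} = \sum_{\nu=1}^{r} u_\nu^{(k)} \otimes \mathbf{u}_\nu^{[k]}$ and
$$\|\mathbf{v} - \mathbf{u}\|^2 = \sum_{i \in I_k} \sum_{\boldsymbol{\ell} \in \mathbf{I}_{[k]}} \Big| \mathbf{v}[\ell_1, \ldots, \ell_{k-1}, i, \ell_{k+1}, \ldots, \ell_d] - \sum_{\nu=1}^{r} u_\nu^{(k)}[i] \cdot \mathbf{u}_\nu^{[k]}[\boldsymbol{\ell}] \Big|^2.$$
For fixed $\omega = (k, \nu, i) \in \Omega$, this equation has the form
$$\|\mathbf{v} - \mathbf{u}\|^2 = \alpha_\omega |x_\omega|^2 - 2 \ldots$$

$\ldots \min_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\|$ (see Example 9.49).

Remark 9.47. If one can show that some $\mathbf{u}^*$ is an isolated accumulation point, the above statement (d) implies that $\lim \mathbf{u}_m = \mathbf{u}^*$. Under suitable assumptions local convergence is proved by Uschmajew [288].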
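The partial minimisations behind (9.25a) can be sketched in a few lines for $d = 3$ and real data. This is a minimal illustration, not the algorithm as stated in the source: each direction $k$ is updated by solving the linear least-squares problem for all entries $u_\nu^{(k)}[i]$ at once with `numpy.linalg.lstsq`, the factor vectors are stored as the columns of hypothetical matrices `U[k]`, and the helper names are assumptions.

```python
import numpy as np

def khatri_rao(X, Y):
    """Column-wise Kronecker product: row (i, j), column nu holds X[i, nu] * Y[j, nu]."""
    r = X.shape[1]
    return np.einsum('ir,jr->ijr', X, Y).reshape(-1, r)

def r_term_tensor(U):
    """Tensor sum_nu U[0][:, nu] ⊗ U[1][:, nu] ⊗ U[2][:, nu]."""
    return np.einsum('ir,jr,kr->ijk', U[0], U[1], U[2])

def als_sweep(v, U):
    """One ALS sweep over the directions k = 1, 2, 3 (block version of (9.24))."""
    U = [Uk.copy() for Uk in U]
    for k in range(3):
        others = [U[j] for j in range(3) if j != k]
        M = khatri_rao(others[0], others[1])               # rows indexed by l in I_[k]
        Vk = np.moveaxis(v, k, 0).reshape(v.shape[k], -1)  # matricisation w.r.t. k
        # Least-squares problem: minimise ||Vk^T - M U[k]^T|| over U[k].
        U[k] = np.linalg.lstsq(M, Vk.T, rcond=None)[0].T
    return U

rng = np.random.default_rng(1)
v = rng.standard_normal((4, 5, 6))   # illustrative data
r = 2
U = [rng.standard_normal((n, r)) for n in v.shape]

errors = [np.linalg.norm(v - r_term_tensor(U))]
for m in range(20):
    U = als_sweep(v, U)
    errors.append(np.linalg.norm(v - r_term_tensor(U)))
print(errors[0], errors[-1])
```

The recorded errors decrease weakly, as expected from the general setting; whether the iterates converge, and to which limit, is not answered by this monotonicity.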
9.6.2.4 ALS for the Best Rank-1 Approximation

We consider the problem from §9.2: Given a tensor $\mathbf{v} \in \mathbf{V}$ we are looking for the best approximation by a rank-1 tensor $\mathbf{u} := \bigotimes_{j=1}^{d} u^{(j)}$ (cf. Zhang–Golub [309]). Let the real tensor space $\mathbf{V}$ be equipped with the Euclidean norm. Then the quadratic cost function
$$\Phi(\mathbf{u}) := \|\mathbf{v} - \mathbf{u}\|^2 \tag{9.27}$$
is to be minimised over $\mathbf{u} \in \mathcal{R}_1$. The behaviour of ALS for this example is of particular interest since $\mathbf{u}$ belongs to both formats $\mathcal{R}_1 = \mathcal{T}_{(1,\ldots,1)}$. For general r-term or tensor subspace representations one cannot expect better properties. The principal questions are: (a) does $\Phi$ possess several local minima; (b) are all minima attractive fixed points; (c) are all fixed points local minima; (d) what is the convergence speed? Note that the (approximate) minimisation of (9.27) is the basic step of the algorithm in §9.4.4.

Question (a) is already answered by Lemma 9.3. However, the multiple solutions $\mathbf{u}^*$ and $\mathbf{u}^{**}$ described in the proof yield the same minimal value $\|\mathbf{v} - \mathbf{u}^*\|^2 = \|\mathbf{v} - \mathbf{u}^{**}\|^2$. Below we shall see that, in the true tensor case $d \ge 3$, also local minima $\Phi(\mathbf{u}^{**}) > \Phi(\mathbf{u}^*)$ exist which are even attractive fixed points of the ALS iteration.
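9.6.2.5 Fictitious ALS Optima and Local Optima

Now we discuss the behaviour of ALS for the minimisation problem (9.27). The j-th step of the ALS method uses the variation of the parameters corresponding to the j-th direction: $\mathbf{u} + \boldsymbol{\delta}_j \in \mathcal{F}$, where $\mathcal{F}$ denotes the format $\mathcal{R}_1 = \mathcal{T}_{(1,\ldots,1)}$. In the case of (9.27) we have
$$\boldsymbol{\delta}_j = u^{(1)} \otimes \ldots \otimes u^{(j-1)} \otimes \delta u^{(j)} \otimes u^{(j+1)} \otimes \ldots \otimes u^{(d)}.$$
A reliable minimisation algorithm should verify whether the solution is really a minimum. As explained next it is not sufficient to check this property for the partial ALS steps. We call $\mathbf{u}^* \in \mathcal{F}$ a strict ALS minimiser if $\Phi$ behaves uniformly coercive with respect to all variations appearing in the ALS method:
$$\Phi(\mathbf{u}^* + \boldsymbol{\delta}_j) \ge \Phi(\mathbf{u}^*) + c\, \|\boldsymbol{\delta}_j\|^2 \qquad (c > 0,\ 1 \le j \le d).$$
Unfortunately, there is no guarantee that such a tensor is a minimiser of the true problem (i.e., $\Phi(\mathbf{u}) \ge \Phi(\mathbf{u}^*)$ for all $\mathbf{u} \in \mathcal{F}$ at least in a neighbourhood of $\mathbf{u}^*$). The following counterexample with $d = 2$ shows that $\mathbf{u}^*$ may be a saddle point.

Example 9.48. For $\mathbf{V} = \mathbb{R}^2 \otimes \mathbb{R}^2$ and $\mathcal{F} = \mathcal{R}_1 = \mathcal{T}_{(1,1)}$ (cf. Exercise 8.2b) consider (9.27) with
$$\mathbf{v} := \begin{bmatrix} 1 \\ 0 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2 \begin{bmatrix} 0 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad \mathbf{u}^* := \begin{bmatrix} 1 \\ 0 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix}. \tag{9.28}$$
(a) The true global minimiser is $\mathbf{u}^{**} := 2 \begin{bmatrix} 0 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ with $\Phi(\mathbf{u}^{**}) = 1 < \Phi(\mathbf{u}^*) = 4$.
(b) The variations in the first and second direction are
$$\boldsymbol{\delta}_1 = \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \boldsymbol{\delta}_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \otimes \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \qquad (\alpha, \beta \in \mathbb{R}).$$
They lead to the coercivity property
$$\Phi(\mathbf{u}^* + \boldsymbol{\delta}_1) = \Phi(\mathbf{u}^*) + \|\boldsymbol{\delta}_1\|^2, \qquad \Phi(\mathbf{u}^* + \boldsymbol{\delta}_2) = \Phi(\mathbf{u}^*) + \|\boldsymbol{\delta}_2\|^2.$$
Therefore $\mathbf{u}^*$ is a minimising fixed point of the ALS iteration.
(c) The parametrised tensor
$$\mathbf{u}(t) := \begin{bmatrix} 1 \\ t \end{bmatrix} \otimes \begin{bmatrix} 1 \\ t \end{bmatrix} \in \mathcal{R}_1 \qquad \text{for } t \in \mathbb{R}$$
satisfies $\mathbf{u}(0) = \mathbf{u}^*$ and $\Phi(\mathbf{u}(t)) = \Phi(\mathbf{u}^*) - t^2 (2 - t^2)$; hence, $\Phi(\mathbf{u}(\cdot))$ has a strict local maximum at $t = 0$.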
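The three statements of Example 9.48 can be verified by direct computation. The sketch below represents the tensors of $\mathbb{R}^2 \otimes \mathbb{R}^2$ as $2 \times 2$ matrices; the concrete values of $\alpha$, $\beta$ and the function names are illustrative assumptions.

```python
import numpy as np

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
v = np.outer(e1, e1) + 2 * np.outer(e2, e2)   # tensor v of (9.28) as a matrix

def phi(u):
    """Cost function (9.27) with the Euclidean (Frobenius) norm."""
    return np.linalg.norm(v - u) ** 2

u_star = np.outer(e1, e1)        # strict ALS minimiser u*
u_glob = 2 * np.outer(e2, e2)    # global minimiser u**

# (b) ALS variations in the first and second direction (illustrative alpha, beta).
alpha, beta = 0.3, -0.2
delta1 = np.outer([alpha, beta], e1)
delta2 = np.outer(e1, [alpha, beta])
gain1 = phi(u_star + delta1) - phi(u_star)   # expected: ||delta1||^2
gain2 = phi(u_star + delta2) - phi(u_star)   # expected: ||delta2||^2

# (c) Path u(t) = [1, t] ⊗ [1, t] through u* along which Phi decreases.
def phi_path(t):
    w = np.array([1.0, t])
    return phi(np.outer(w, w))

print(phi(u_star), phi(u_glob), gain1, gain2, phi_path(0.1))
```

The positive gains in (b) and the decrease along the path in (c) together show that checking only the ALS directions cannot distinguish a saddle point from a minimum.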
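We recall the vector iteration for computing eigenvectors of matrices. It is known that the vector iteration applied to a special starting value may lead to an eigenpair whose eigenvalue is not the maximal one. However, any tiny random perturbation of the starting value lets the iteration converge to the true maximal eigenvalue. A similar behaviour can be shown for Example 9.48.¹¹ Consider an iterate $\mathbf{u} := a \begin{bmatrix} 1 \\ b \end{bmatrix} \otimes \begin{bmatrix} 1 \\ c \end{bmatrix} \approx \mathbf{u}^*$ with $a \approx 1$, $b, c \approx 0$, and $c \ne 0$. Optimisation in the first direction yields $\mathbf{u}' = (1 + c^2)^{-1} \begin{bmatrix} 1 \\ 2c \end{bmatrix} \otimes \begin{bmatrix} 1 \\ c \end{bmatrix}$, the second step yields $\mathbf{u}'' = (1 + 4c^2)^{-1} \begin{bmatrix} 1 \\ 2c \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 4c \end{bmatrix}$. Hence $\mathbf{u}^*$ is a repulsive fixed point, whenever $a, c \ne 0$. One easily verifies that the ALS iteration converges to the global minimum. Since the exceptional case of $c = 0$ is of measure zero among all starting values, we obtain convergence to $\mathbf{u}^{**}$ for almost all starting values.

The situation changes if we turn to the true tensor case $d \ge 3$.

Example 9.49. For $d \ge 3$ the tensor corresponding to (9.28) is
$$\mathbf{v} := \otimes^d \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2 \otimes^d \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
(cf. [229, §4.3.5]). $\mathbf{u}^* := \otimes^d \begin{bmatrix} 1 \\ 0 \end{bmatrix} \in \mathcal{R}_1$ is a local minimum with $\Phi(\mathbf{u}^*) = 4$, while $\mathbf{u}^{**} := 2 \otimes^d \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ with $\Phi(\mathbf{u}^{**}) = 1$ is the global one. For instance, variation along the path $\mathbf{u}(t) := \otimes^d \begin{bmatrix} 1 \\ t \end{bmatrix}$ yields $\Phi(\mathbf{u}(t)) = 3 + (1 + t^2)^d - 4t^d = 4 + dt^2 + O(t^{\min\{4,d\}})$. Furthermore, $\mathbf{u}^*$ is an attractive fixed point of the ALS iteration; i.e., the iteration converges to $\mathbf{u}^*$, whenever the starting value is close enough to $\mathbf{u}^*$. The ALS convergence to $\mathbf{u}^*$ is quadratic (or of higher order).

Proof. (a) A general parametrisation of $\mathbf{u} \in \mathcal{R}_1 = \mathcal{T}_{(1,\ldots,1)}$ in the neighbourhood of $\mathbf{u}^*$ is
$$\mathbf{u} = (1 + a_0) \bigotimes_{j=1}^{d} \begin{bmatrix} 1 \\ a_j \end{bmatrix} \quad \text{with } |a_j| \le \varepsilon. \tag{9.29}$$
An elementary calculation yields $\|\mathbf{v} - \mathbf{u}\|^2 = 4 + \sum_{j=0}^{d} a_j^2 + {}$higher order terms. This proves that $\mathbf{u}^*$ is a strict local minimum. Here the inequality $d \ge 3$ is crucial. The higher-order terms contain $-4 \prod_{j=1}^{d} a_j$. For $d \ge 3$, it does not influence the Hessian $\big( \frac{\partial^2}{\partial a_j \partial a_k} \Phi \big)_{j,k=0}^{d} = 2I$ and its positive definiteness. However, for $d = 2$, it is part of the Hessian, which then becomes indefinite.

(b) Take $\mathbf{u}$ in (9.29) as starting value. The first step acting in the first ALS direction maps $\mathbf{u}$ into
$$\mathbf{u}' = \frac{1}{\prod_{j=2}^{d} (1 + a_j^2)} \begin{bmatrix} 1 \\ 2 \prod_{j=2}^{d} a_j \end{bmatrix} \otimes \bigotimes_{j=2}^{d} \begin{bmatrix} 1 \\ a_j \end{bmatrix};$$
i.e., $a_0$ and $a_1$ are replaced with $a_0' := \frac{1}{\prod_{j=2}^{d} (1 + a_j^2)} - 1$ satisfying $|a_0'| \le O(\varepsilon^2)$ and $a_1' := 2 \prod_{j=2}^{d} a_j$ with $|a_1'| \le O(\varepsilon^{d-1}) \le O(\varepsilon^2)$. Hence we stay in the $\varepsilon$ neighbourhood, and the same argument shows that the additional $d - 1$ ALS steps produce new parameters $|a_j'| \le O(\varepsilon^{d-1}) \le O(\varepsilon^2)$. This proves (at least) quadratic convergence. □

The construction of $\mathbf{v}$ in Example 9.49 may seem very particular since the terms $\otimes^d \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $2 \otimes^d \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are not only orthogonal, but also the factors are orthogonal in each direction: $\begin{bmatrix} 1 \\ 0 \end{bmatrix} \perp \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Instead, we can replace $\otimes^d \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ by $\otimes^d \begin{bmatrix} c \\ s \end{bmatrix}$ with $s^2 + c^2 = 1$ and small $s$. Let $\Phi_s$ be the new cost function. Then the fixed point of $\Phi_s$ deviates a bit from $\otimes^d \begin{bmatrix} c \\ s \end{bmatrix}$, but the Hessian stays definite for $s$ sufficiently small. For instance, the path $\mathbf{u}(t) = \otimes^d \begin{bmatrix} c \\ s + t \end{bmatrix}$ yields $\Phi_s(\mathbf{u}(t)) = 4 - 12 s^2 t + (6s^2 - 12s + 3) t^2 + {}$higher-order terms. As long as $s \le 1 - 1/\sqrt{2}$, the strict minimum is attained at $t = -3s^2 + O(s^4)$.

Corollary 9.50. If we replace the tensor $\mathbf{v} := \otimes^d \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2 \otimes^d \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for $d = 3$ with
$$\begin{bmatrix} 0 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2 \begin{bmatrix} 0 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix},$$
we again obtain the situation of Example 9.48: $\mathbf{u}^* = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix} \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is a strict ALS minimiser, but not a minimiser of $\Phi$.

¹¹ Concerning the relation of ALS and the vector iteration compare [72] and §10.3.2.5.

9.6.2.6 Complete Analysis of a Model Problem

Consider
$$\Phi(\mathbf{u}) := \|\mathbf{v} - \mathbf{u}\|^2 \quad \text{for } \mathbf{v} = \otimes^3 a + 2 \otimes^3 b, \quad \text{where } a \perp b \text{ and } \|a\| = \|b\| = 1.$$
All starting values in $\otimes^3 \operatorname{span}\{a, b\}$¹² (except multiples of $\otimes^3 b$) can be written as
$$\mathbf{u}_0 = c_0 \big(a + t_1^{(0)} b\big) \otimes \big(a + t_2^{(0)} b\big) \otimes \big(a + t_3^{(0)} b\big).$$
The m-th iterate is
$$\mathbf{u}_m = c_m \big(a + t_1^{(m)} b\big) \otimes \big(a + t_2^{(m)} b\big) \otimes \big(a + t_3^{(m)} b\big).$$
The next ALS step does not depend on the first factor $c_m (a + t_1^{(m)} b)$. Therefore $\mathbf{u}_m$ is fully characterised by the pair $(t_2^{(m)}, t_3^{(m)})$. One checks that the ALS iteration is described by
$$\big(t_2^{(m)}, t_3^{(m)}\big) \mapsto \big(t_2^{(m+1)}, t_3^{(m+1)}\big) = \Big( 4 t_2^{(m)} \big(t_3^{(m)}\big)^2,\ 16 \big(t_2^{(m)}\big)^2 \big(t_3^{(m)}\big)^3 \Big).$$
The sign patterns of $(t_2^{(m)}, t_3^{(m)})$ and $(t_2^{(m+1)}, t_3^{(m+1)})$ are identical. Therefore it is sufficient to study nonnegative parameters $t_2, t_3 \ge 0$. The characteristic quantity is
$$\tau_m := \big(t_2^{(m)}\big)^{\alpha} \big(t_3^{(m)}\big)^2 \quad \text{with } \alpha := \sqrt{5} - 1.$$

¹² $a + t_2^{(0)} b$ may be generalised to $a + t_j^{(0)} b + c_j$ with $c_j \perp \operatorname{span}\{a, b\}$. This only changes the (uninteresting) scaling constants. After one ALS iteration the $c_j$ terms vanish.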
Theorem 9.51. Let $\tau^* := 2^{-(\sqrt{5}+1)} \approx 0.10613$. Below we distinguish three cases depending on $\tau_0$:
(A) If $\tau_0 > \tau^*$ then $\mathbf{u}^{(m)} \to 2 \otimes^3 b$ (global minimiser).
(B) If $\tau_0 < \tau^*$ then $\mathbf{u}^{(m)} \to \otimes^3 a$ (local minimiser).
(C) If $\tau_0 = \tau^*$ then $\mathbf{u}^{(m)} \to \frac{16}{25} \otimes^3 \big(a + \frac{1}{2} b\big)$. This is a saddle point, but the global minimum on the manifold described by $t_2^{\alpha} t_3^2 = \tau^*$.

Proof. Verify that $\tau_{m+1} = (4^{\alpha} \tau_m)^{\gamma}$ with $\gamma := 2 + \sqrt{5}$. This proves $\tau_m \to \infty, 0, \tau^*$ in the respective cases (A,B,C). To describe the behaviour of the factors $t_2^{(m)}, t_3^{(m)}$, introduce $\sigma_m := \big(t_3^{(m)}\big)^{\alpha} / \big(t_2^{(m)}\big)^2$ and check that
$$\sigma_{m+1} = 4^{2\chi} (\sigma_m)^{-\chi} \quad \text{with } \chi := \sqrt{5} - 2$$
(here we assume $t_2^{(0)}, t_3^{(0)} > 0$ and hence $0 < \sigma_m < \infty$; the other case is trivial). Then $\sigma_m \to \sigma^* = 2^{3 - \sqrt{5}} \approx 1.6981$ follows. Hence $\tau_m \to 0, \infty$ implies that both $t_2^{(m)}$ and $t_3^{(m)}$ have the same limit. If $\tau_m = \tau^*$, one obtains $t_2^{(m)}, t_3^{(m)} \to 1/2$. □

One can also determine the convergence speed. In the cases (A) and (B) we have superlinear convergence, more precisely convergence of order $\gamma = 2 + \sqrt{5}$. Linear convergence holds in Case (C).

Similar model problems are analysed by Espig–Khachatryan [95]. Criteria for the starting iterate are formulated which imply convergence to a particular local minimum. Examples of super-, sub-, and linear convergence are given. A more general case ($d \ge 3$, general rank-2 tensor $\mathbf{v}$) is treated by Gong–Mohlenkamp–Young [116].
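The model problem of §9.6.2.6 can be reproduced with a few lines of code. The sketch below assumes $a, b$ to be the unit vectors of $\mathbb{R}^2$ and implements the rank-1 ALS sweep for $d = 3$ directly; the starting values, the number of sweeps and all function names are illustrative assumptions. It checks the map $(t_2, t_3) \mapsto (4 t_2 t_3^2, 16 t_2^2 t_3^3)$ for one sweep and runs one starting value on either side of the threshold $\tau^*$.

```python
import numpy as np

# Assumption for illustration: orthonormal a, b in R^2.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

def outer3(x, y, z):
    return np.einsum('i,j,k->ijk', x, y, z)

v = outer3(a, a, a) + 2 * outer3(b, b, b)
ALPHA = np.sqrt(5.0) - 1.0
TAU_STAR = 2.0 ** (-(np.sqrt(5.0) + 1.0))

def als_sweep(x):
    """One rank-1 ALS sweep; x holds the three factor vectors."""
    x = [xi.copy() for xi in x]
    for k in range(3):
        j, l = [m for m in range(3) if m != k]
        # Contract v with the two fixed factors; the result is the optimal
        # k-th factor up to the scaling by the squared norms.
        w = np.einsum('ijk,j,k->i', np.moveaxis(v, k, 0), x[j], x[l])
        x[k] = w / ((x[j] @ x[j]) * (x[l] @ x[l]))
    return x

def start(t2, t3, t1=1.0):
    return [a + t1 * b, a + t2 * b, a + t3 * b]

def tau(t2, t3):
    return t2 ** ALPHA * t3 ** 2

def run(t2, t3, sweeps=12):
    x = start(t2, t3)
    for _ in range(sweeps):
        x = als_sweep(x)
    return outer3(*x)

x = als_sweep(start(0.3, 0.3))
t2_new, t3_new = x[1][1] / x[1][0], x[2][1] / x[2][0]
print(t2_new, t3_new)                        # compare with 4*t2*t3^2 and 16*t2^2*t3^3
print(tau(0.3, 0.3) < TAU_STAR, tau(0.8, 0.8) > TAU_STAR)
limit_B = run(0.3, 0.3)   # tau_0 < tau*: expected limit ⊗^3 a
limit_A = run(0.8, 0.8)   # tau_0 > tau*: expected limit 2 ⊗^3 b
```

The starting value of $t_1$ is irrelevant because the first ALS step overwrites the first factor, in agreement with the remark that $\mathbf{u}_m$ is characterised by the pair $(t_2^{(m)}, t_3^{(m)})$.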
9.6.2.7 Convergence Properties

We consider again the ALS iteration for minimising (9.27) over $\mathbf{u} \in \mathcal{R}_1$. According to Remark 9.47, global convergence¹³ holds if all ALS sequences have an isolated accumulation point. This fact is proved by Uschmajew [290]. The proof uses a gradient inequality by Łojasiewicz¹⁴ which involves an exponent $1 - \theta$ with $\theta \in (0, 1/2]$. Depending on the value of $\theta$, the convergence speed $\|\mathbf{u}^* - \mathbf{u}_k\| \to 0$ of the ALS iteration $\{\mathbf{u}_k\}$ is either at least $q^k$ for some $q \in (0, 1)$ (case of $\theta = 1/2$) or $k^{-\theta/(1-2\theta)}$ (case of $0 < \theta < 1/2$).

¹³ By 'global convergence' we mean that the sequence of iterates converges for all starting values. Note that there are other definitions: Luenberger–Ye [221, page 201] call the iteration globally convergent if the limits of all convergent subsequences solve the underlying problem.

¹⁴ The inequality by Łojasiewicz [216, §18, Proposition 1] states that $\|\operatorname{grad} f(x)\| \ge |f(x)|^{\alpha}$ for analytic $f$ in some neighbourhood of $a$ with $f(a) = 0$. The proof reveals that the exponent $\alpha$ is of the form $1 - 1/j$ for some integer $j \ge 2$.
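9.6.3 Stabilised Approximation Problem

As seen in §9.4, the minimisation problem $\min_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\|$ is unsolvable if and only if infimum sequences are unstable. An obvious remedy is to enforce stability by adding a penalty term:
$$\Phi_\lambda\Big( \big(u_i^{(j)}\big)_{1 \le i \le r}^{1 \le j \le d} \Big) := \min_{u_i^{(j)} \in V_j} \sqrt{ \Big\| \mathbf{v} - \sum_{i=1}^{r} \bigotimes_{j=1}^{d} u_i^{(j)} \Big\|^2 + \lambda^2 \sum_{i=1}^{r} \Big\| \bigotimes_{j=1}^{d} u_i^{(j)} \Big\|^2 }, \tag{9.30a}$$
where $\lambda > 0$ and $\big\| \bigotimes_{j=1}^{d} u_i^{(j)} \big\|^2 = \prod_{j=1}^{d} \|u_i^{(j)}\|^2$ (crossnorm property). To illustrate the effect of this approach, assume $\mathbf{v} \in \mathcal{B}$ (as defined in §9.5.2). The corresponding stability measure $\delta(\mathbf{v}, \varepsilon)$ is defined in Definition 9.34. Then the minimisation of $\Phi_\lambda$ in (9.30a) corresponds to
$$\min\Big\{ \sqrt{\varepsilon^2 + \lambda^2 \delta^2(\mathbf{v}, \varepsilon)} : \varepsilon > 0 \Big\}.$$
Assume that $\delta$ diverges as $\delta(\mathbf{v}, \varepsilon) = \varepsilon^{-\kappa}$ ($\kappa > 0$). Then the optimal $\varepsilon$ is $(\lambda^2 \kappa)^{1/(2\kappa+2)}$ and yields the value $\Phi_\lambda = \sqrt{(1 + \frac{1}{\kappa}) (\lambda^2 \kappa)^{\frac{1}{\kappa+1}}}$, while the instability measure is $\delta = (\lambda^2 \kappa)^{-\kappa/(2\kappa+2)}$.

Alternatively, stability may be requested as a side condition ($C > 0$):
$$\min_{\substack{u_i^{(j)} \in V_j \text{ subject to} \\ \sum_{i=1}^{r} \| \bigotimes_{j=1}^{d} u_i^{(j)} \|^2 \le C^2 \|\mathbf{v}\|^2}} \Phi_C\Big( \big(u_i^{(j)}\big)_{1 \le i \le r}^{1 \le j \le d} \Big) := \Big\| \mathbf{v} - \sum_{i=1}^{r} \bigotimes_{j=1}^{d} u_i^{(j)} \Big\|. \tag{9.30b}$$
Again we may consider the diverging function $\delta(\mathbf{v}, \varepsilon)$ for a fixed $\mathbf{v} \in \mathcal{B}$. The minimisation of $\Phi_C$ yields the error $\delta^{-1}(C \|\mathbf{v}\|)$, where $\delta^{-1}$ is the inverse function of $\delta(\mathbf{v}, \cdot)$. Assuming $\delta(\mathbf{v}, \varepsilon) = c \varepsilon^{-\kappa}$, we obtain that $\min \Phi_C = (C \|\mathbf{v}\| / c)^{-1/\kappa}$.

If $\mathbf{u}_n = \sum_{i=1}^{r} \bigotimes_{j=1}^{d} u_{i,n}^{(j)}$ is a sequence with $\|\mathbf{v} - \mathbf{u}_n\| \searrow \inf_{\mathbf{u}} \|\mathbf{v} - \mathbf{u}\|$ subject to the side condition in (9.30b), it is a stable sequence: $\sigma((\mathbf{u}_n), r) \le C$. Hence we infer from Theorem 9.22 that a subsequence converges to some $\mathbf{u}^* \in \mathcal{R}_r$.

In the penalty case of (9.30a), we may assume $\Phi_\lambda \le \|\mathbf{v}\|$ since the trivial approximation $\mathbf{u} = 0$ already ensures this estimate. Then $\sigma((\mathbf{u}_n), r) \le \lambda$ follows and allows the same conclusion as above. Even for a general minimising sequence $\mathbf{u}_n = \sum_{i=1}^{r} \bigotimes_{j=1}^{d} u_{i,n}^{(j)}$ with $c := \lim_n \Phi_\lambda\big( (u_{i,n}^{(j)})_{1 \le i \le r}^{1 \le j \le d} \big)$, we conclude that $\sigma((\mathbf{u}_n), r) \le c$ holds asymptotically.

In the case that $\min_{\mathbf{u} \in \mathcal{R}_r} \|\mathbf{v} - \mathbf{u}\|$ possesses a stable minimising sequence with $\sigma((\mathbf{u}_n), r) \le C$ and limit $\mathbf{u}^*$, the minimisation of $\Phi_C$ in (9.30b) yields the same result.

Lemma 9.2 also holds for the solution $\mathbf{u}^*$ of the regularised problem.

In the case of an infinite-dimensional topological tensor space, we can study the decrease of
$$\varepsilon_r(\mathbf{v}) := \inf\Big\{ \Big\| \mathbf{v} - \sum_{i=1}^{r} \bigotimes_{j=1}^{d} u_i^{(j)} \Big\| : u_i^{(j)} \in V_j \Big\}$$
for $r \to \infty$. To avoid instabilities (at least for finite $r$), we add the side-condition $\|u_i^{(j)}\| \le \beta \|\mathbf{v}\|^{1/d}$ and obtain the corresponding quantity $\varepsilon_r^{\beta}(\mathbf{v})$. Note that this side-condition is weaker than that in (9.30b). Restricting $\mathbf{v}$ to a bounded class $\mathbf{F} \subset \mathbf{V}$ of tensors, we get the errors
$$\varepsilon_r(\mathbf{F}) := \sup\{\varepsilon_r(\mathbf{v}) : \mathbf{v} \in \mathbf{F}\}, \qquad \varepsilon_r^{\beta}(\mathbf{F}) := \sup\{\varepsilon_r^{\beta}(\mathbf{v}) : \mathbf{v} \in \mathbf{F}\}.$$
If the r-term approximations to $\mathbf{v}$ are stable, $\varepsilon_r(\mathbf{v}) = \varepsilon_r^{\beta}(\mathbf{v})$ holds for suitable $\beta$. Bazarkhanov–Temlyakov [21] consider a certain family $\mathbf{F}_s$ of periodic functions of smoothness degree $s$ and prove the upper bound $\varepsilon_r(\mathbf{F}) \le O(r^{-sd/(d-1)})$ and the lower bound $\varepsilon_r^{\beta}(\mathbf{F}) \ge O((r \log r)^{-sd/(d-1)})$. This result indicates that not all approximation sequences $\{\mathbf{v}_r = \sum_{i=1}^{r} \bigotimes_{j=1}^{d} u_{i,r}^{(j)}\}$ with $\|\mathbf{v} - \mathbf{v}_r\| \le O\big(r^{-sd/(d-1)}\big)$ satisfy the side-condition $\|u_{i,r}^{(j)}\| \le \beta \|\mathbf{v}\|^{1/d}$.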
9.6.4 Newton’s Approach

The alternative to the successive minimisation is the simultaneous¹⁵ minimisation of $\Phi(x)$ in all variables $x=(u_{i,n}^{(j)})_{1\le i\le r}^{1\le j\le d}$. For this purpose, iterative methods such as the gradient method or the Newton method can be applied. Both are of the form

\[
x^{(m+1)} := x^{(m)}-\alpha_m s_m \qquad (s_m:\ \text{search direction},\ \alpha_m\in\mathbb{K}).
\]

The gradient method is characterised by $s_m=\nabla\Phi(x^{(m)})$, while Newton’s method uses $s_m=H(x^{(m)})^{-1}\nabla\Phi(x^{(m)})$ and $\alpha_m=1$. Here $H$ is the matrix of the second partial derivatives: $H_{\omega\omega'}=\partial^2\Phi/\partial x_\omega\partial x_{\omega'}$. However, there are plenty of variations between both methods. The damped Newton method has a reduced parameter $0<\alpha_m<1$. The true Hessian $H$ may be replaced with approximations $\tilde H$ which are easier to invert. For $\tilde H=I$, we regain the gradient method. Below we use a block diagonal part of $H$.

In Espig [87] and Espig–Hackbusch [89] a method is described which computes the minimiser of¹⁶ $\Phi_\lambda$ in (9.30a). It is a modified Newton method with an approximate Hessian matrix $\tilde H$ allowing for a continuous transition from the Newton to a gradient-type method. Although the Hessian $H$ is a rather involved expression, its particular structure can be exploited when the system $H(x^{(m)})s_m=\nabla\Phi(x^{(m)})$ has to be solved. This defines a procedure RNM$(v,u)$ which determines the best approximation $u\in\mathcal{R}_r$ of $v\in\mathcal{R}_R$ by the stabilised Newton method (cf. [89, Alg. 1]). For details and numerical examples, we refer to [89]. The cost per iteration is

\[
O\Big(r(r+R)d^2+dr^3+r(r+R+d)\sum_{j=1}^{d}r_j\Big)
\]

with $r_j:=\#J_j$ and $J_j$ from Lemma 9.43.

We now use the symbols $v$ and $u$ for the tensors involved in the optimisation problem. For the computation we should replace the tensors from $\mathbf{V}$ with the coefficient tensors in $\mathbb{K}^{\mathbf{J}}$ as detailed in §9.6.1.

Newton’s method is well known for its fast convergence as soon as $x^{(m)}$ is sufficiently close to a zero of $\nabla\Phi(x)$. The main difficulty is usually the choice of suitable starting values. If a fixed rank is given (cf. Problem (9.1)), a rough initial guess can be constructed by the method described in Corollary 15.6. A certain kind of nested iteration (cf. [131, §5], [140, §11.5]) can be exploited for solving Problem (9.3). The framework of the algorithm is as follows:

    1  given data v ∈ R_R, initial guess u ∈ R_r with r < R, ε > 0
    2  loop RNM(v, u); ρ := v − u; if ‖ρ‖ ≤ ε then return;
    3  if r = R then begin u := v; return end;
    4  find a minimiser w ∈ R_1 of min_{ω∈R_1} ‖ρ − ω‖;
    5  u := u + w ∈ R_{r+1}; r := r + 1; repeat the loop          (9.31)

Line 1: The initial guess $u\in\mathcal{R}_r$ also defines the starting rank $r$.
Line 2: The best approximation $u\in\mathcal{R}_r$ is accepted if $\|v-u\|\le\varepsilon$.
Line 3: If no approximation in $\mathcal{R}_r$ with $r<R$ is sufficiently accurate, $u=v\in\mathcal{R}_r$ must be returned.
Line 4: The best approximation problem in $\mathcal{R}_1$ can be solved by RNM or ALS. Here no regularisation is needed (cf. §9.2).
Line 5: $u+w$ is the initial guess in $\mathcal{R}_{r+1}$.

Obviously, the $\mathcal{R}_1$ optimisation in Line 4 is of low cost compared with the other parts. This fact can be exploited to improve the initial guesses. Before calling RNM$(v,u)$ in Line 2, the following procedure can be applied, where App$_1(v,w)$ is a rough $\mathcal{R}_1$ approximation of $v$ using $w\in\mathcal{R}_1$ as the starting value (a very cheap method makes use of Remark 15.7):

    data v ∈ R_R, u = Σ_{i=1}^{r} u_i ∈ R_r, u_i ∈ R_1.
    loop for ν = 1 to r do begin d := u − Σ_{i≠ν} u_i; u_ν := App_1(d, u_ν) end;

This improvement of the approximation $u$ can be applied in Line 2 of (9.31) before calling RNM$(v,u)$. Details are given in [89].

The solution of more general optimisation problems (e.g., linear equations, Rayleigh quotient) is discussed in Espig et al. [94].

¹⁵ Following Vervliet [295], simultaneous approaches are more favourable than ALS methods.
¹⁶ In fact, a further penalty term is added to enforce $\|u_i^{(j)}\|=\|u_i^{(k)}\|$ for $1\le j,k\le d$.
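The rank-increasing framework (9.31) can be illustrated by a much simplified sketch for $d=3$. It is not the method of [89]: the refinement step RNM$(v,u)$ is omitted entirely, and the $\mathcal{R}_1$ problem of Line 4 is solved by a plain alternating least-squares iteration. The names `rank1_als` and `greedy_r_term` are hypothetical.

```python
import numpy as np

def rank1_als(t, sweeps=50, seed=0):
    """Rough rank-1 approximation a (x) b (x) c of a 3-way array t by ALS."""
    rng = np.random.default_rng(seed)
    a, b, c = (rng.standard_normal(n) for n in t.shape)
    for _ in range(sweeps):
        # each update is the least-squares optimum with the other two factors fixed
        a = np.einsum('ijk,j,k->i', t, b, c) / ((b @ b) * (c @ c))
        b = np.einsum('ijk,i,k->j', t, a, c) / ((a @ a) * (c @ c))
        c = np.einsum('ijk,i,j->k', t, a, b) / ((a @ a) * (b @ b))
    return a, b, c

def greedy_r_term(v, eps, max_rank):
    """Add rank-1 terms until the residual norm is <= eps or max_rank is reached.

    Lines 4-5 of (9.31) only; the refinement RNM(v, u) of Line 2 is NOT performed."""
    terms, rho = [], v.copy()
    while np.linalg.norm(rho) > eps and len(terms) < max_rank:
        a, b, c = rank1_als(rho)                          # Line 4: w in R_1
        terms.append((a, b, c))                           # Line 5: u := u + w
        rho = rho - np.einsum('i,j,k->ijk', a, b, c)      # new residual v - u
    return terms, np.linalg.norm(rho)

rng = np.random.default_rng(1)
x, y, z = rng.standard_normal(4), rng.standard_normal(5), rng.standard_normal(6)
v = np.einsum('i,j,k->ijk', x, y, z)                      # exact rank-1 tensor
terms, err = greedy_r_term(v, eps=1e-10, max_rank=3)

w = rng.standard_normal((4, 5, 6))                        # generic tensor
terms_w, err_w = greedy_r_term(w, eps=0.0, max_rank=2)
print(len(terms), err, len(terms_w), err_w)
```

Without the refinement step the result is in general far from the best approximation in $\mathcal{R}_r$; the sketch only shows the control flow of the rank increase.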
9.7 Generalisations

Here we refer to §7.8, in which subsets $A_j\subset V_j$ and $\mathcal{R}_r((A_j)_{j=1}^d)\subset\mathcal{R}_r$ have been introduced. The corresponding approximation problem is:

\[
\text{Given } v\in\mathbf{V} \text{ and } r\in\mathbb{N}_0,\ \text{determine } u\in\mathcal{R}_r((A_j)_{j=1}^d) \text{ minimising } \|v-u\|. \tag{9.32}
\]

Though the practical computation of the minimiser may be rather involved,¹⁷ the theoretical aspects can be simpler than in the standard case.

Lemma 9.52. Assume that $\mathbf{V}$ is either finite dimensional or a reflexive Banach space with a norm satisfying (6.14). Let $A_j$ ($1\le j\le d$) be weakly closed subsets of $V_j$. If a stable subsequence $u_n\in\mathcal{R}_r((A_j)_{j=1}^d)$ exists with $\lim_{n\to\infty}\|v-u_n\|=\inf_{u\in\mathcal{R}_r((A_j)_{j=1}^d)}\|v-u\|$, then Problem (9.32) is solvable.

Proof. By Theorem 9.22, there is a subsequence such that

\[
u_n=\sum_{i=1}^{r}\bigotimes_{j=1}^{d}u_{i,n}^{(j)}\rightharpoonup u\in\mathcal{R}_r
\quad\text{with } u=\sum_{i=1}^{r}\bigotimes_{j=1}^{d}u_i^{(j)} \text{ and } u_{i,n}^{(j)}\rightharpoonup u_i^{(j)}
\]

satisfying $\|v-u\|=\inf_{w\in\mathcal{R}_r((A_j)_{j=1}^d)}\|v-w\|$. Since $u_{i,n}^{(j)}\in A_j$ and $A_j$ is weakly closed, $u_i^{(j)}\in A_j$ follows, proving $u\in\mathcal{R}_r((A_j)_{j=1}^d)$. □
In §7.8 the first two examples of $A_j$ are the subset $\{v\in V_j : v\ge 0\}$ of nonnegative vectors ($V_j=\mathbb{R}^{n_j}$) or functions ($V_j=L^p$). Standard norms such as the $\ell^p$ norm (cf. (4.3)) have the property

\[
\|v+w\|_{V_j}\ge\|v\|_{V_j}\quad\text{for all } v,w\in A_j\quad(1\le j\le d). \tag{9.33a}
\]

Furthermore, these examples satisfy $A_j+A_j\subset A_j$, i.e.,

\[
v,w\in A_j\ \Rightarrow\ v+w\in A_j\quad(1\le j\le d). \tag{9.33b}
\]

Remark 9.53. Conditions (9.33a,b) imply the stability estimate $\kappa(v,r)\le r$, provided that in definition (9.12b) the vectors $v_i^{(j)}$ are restricted to $A_j$. Hence any sequence $v_n\in\mathcal{R}_r((A_j)_{j=1}^d)$ is stable and Lemma 9.52 can be applied.

For matrix spaces $V_j=\mathbb{C}^{n_j\times n_j}$ equipped with the spectral or Frobenius norm, $A_j=\{M\in V_j : M \text{ positive semidefinite}\}$ also satisfies conditions (9.33a,b).

Let $A_j$ be the subset of nonnegative vectors in $\mathbb{R}^{n_j}$. A possible approach for an approximation in $\mathcal{R}_r((A_j)_{j=1}^d)$ is the ansatz $v_i^j=(u_i^j)^2$ (cf. Vervliet [295, Example 10 in §2]). For instance, a nonlinear ALS method can be used to solve for the unknowns $u_i^j$.
The set $A_j=\{M\in V_j : M \text{ Hermitian}\}$ is a negative example for (9.33a). Indeed, (9.10) with $v^{(j)},w^{(j)}\in A_j$ is an example of an unstable sequence.

The subset $A_j=\{M\in V_j : M \text{ positive definite}\}$ is not closed, hence the minimiser of Problem (9.32) is expected in $\mathcal{R}_r((\overline{A_j})_{j=1}^d)$ instead of $\mathcal{R}_r((A_j)_{j=1}^d)$. Nevertheless, the following problem has a minimiser in $\mathcal{R}_1((A_j)_{j=1}^d)$.

Exercise 9.54. For $V_j=\mathbb{C}^{n_j\times n_j}$ equipped with the spectral or Frobenius norm, $\mathbf{M}\in\bigotimes_{j=1}^{d}V_j$ positive definite, and $A_j=\{M\in V_j : M \text{ positive definite}\}$,

\[
\inf\Big\{\Big\|\mathbf{M}-\bigotimes_{j=1}^{d}M^{(j)}\Big\| : M^{(j)}\in A_j\Big\}
\]

is attained by some $\bigotimes_{j=1}^{d}M^{(j)}$ with $M^{(j)}\in A_j$.
9.8 Analytical Approaches for the r-Term Approximation

The previous approximation methods are black box-like techniques which are applicable to any tensor. On the other hand, for very particular tensors (e.g., multivariate functions) there are special analytical tools which yield an r-term approximation. In contrast to the approaches above, the approximation error can be described in dependence on the parameter $r$. Often, the error is estimated with respect to the supremum norm $\|\cdot\|_\infty$, whereas the standard norm¹⁸ considered above is $\ell^2$ or $L^2$.

Analytical approaches will also be considered for the approximation in tensor subspace format. Since $\mathcal{R}_r=\mathcal{T}_{(r,r)}$ for dimension $d=2$, the approaches in §10.4 can also be interesting for the r-term format.

Note that analytically derived approximations can serve two different purposes:

1. Constructive approximation. Most of the following techniques are suited for practical use. Such applications are described, e.g., in §9.8.2.5 and §9.8.2.6.

2. Theoretical complexity estimates. A fundamental question concerning the use of the formats $\mathcal{R}_r$ or $\mathcal{T}_{\mathbf{r}}$ is how the best approximation error $\varepsilon(v,r)$ in (9.2) depends on $r$. Any explicit error estimate of a particular (analytical) approximation yields an upper bound of $\varepsilon(v,r)$. Under the conditions of this section, we shall obtain exponential convergence; i.e., $\varepsilon(v,r)\le O(\exp(-cr^\alpha))$ with $c,\alpha>0$ is valid for the considered tensors $v$.

Objects of approximation are not only tensors of vector type but also matrices described by Kronecker products. Early papers of such kind are [149], [145, 146].

¹⁸ The optimisation problems from §9.6.2 and §9.6.4 can also be formulated for the $\ell^p$ norm with large, even $p$, which, however, would not make the task easier.
9.8.1 Quadrature

Let $V_j$ be Banach spaces of functions defined on $I_j\subset\mathbb{R}$, and $\mathbf{V}={}_{\|\cdot\|}\bigotimes_{j=1}^{d}V_j$ the space of multivariate functions on $\mathbf{I}:=\times_{j=1}^{d}I_j$. Assume that $f\in\mathbf{V}$ has an integral representation

\[
f(x_1,\ldots,x_d)=\int_\Omega g(\omega)\prod_{j=1}^{d}f_j(x_j,\omega)\,\mathrm{d}\omega\quad\text{for } x_j\in I_j, \tag{9.34}
\]

where $\Omega$ is some parameter domain, such that the functions $f_j$ are defined on $I_j\times\Omega$. For fixed $\omega$, the integrand is an elementary tensor $\bigotimes_{j=1}^{d}f_j(\cdot,\omega)\in\mathbf{V}$. Since the integral $\int_\Omega$ is a limit of Riemann sums $\sum_{i=1}^{r}\ldots\in\mathcal{R}_r$, $f$ is a topological tensor. A particular example of the right-hand side in (9.34) is the Fourier integral transform of $g(\omega)=g(\omega_1,\ldots,\omega_d)$:

\[
\int_{\mathbb{R}^d}g(\omega)\exp\Big(\mathrm{i}\sum_{j=1}^{d}x_j\omega_j\Big)\mathrm{d}\omega.
\]

A quadrature method for $\int_\Omega G(\omega)\,\mathrm{d}\omega$ is characterised by a sum $\sum_{i=1}^{r}\gamma_iG(\omega_i)$ with quadrature weights $(\gamma_i)_{i=1}^{r}$ and quadrature points $(\omega_i)_{i=1}^{r}$. Applying such a quadrature method to (9.34), we get the r-term approximation

\[
f_r\in\mathcal{R}_r\quad\text{with } f_r(x_1,\ldots,x_d):=\sum_{i=1}^{r}\gamma_i\,g(\omega_i)\prod_{j=1}^{d}f_j(x_j,\omega_i). \tag{9.35}
\]

Usually there is a family of quadrature rules for all $r\in\mathbb{N}$, which leads to a sequence $(f_r)_{r\in\mathbb{N}}$ of approximations. Under suitable smoothness conditions on the integrand of (9.34), we may try to derive error estimates of $\|f-f_r\|$. An interesting question concerns the (asymptotic) convergence speed $\|f-f_r\|\to 0$.

There is a connection to §9.7 and the subset $A_j$ of nonnegative functions. Assume that the integrand in (9.34) is nonnegative. Many quadrature methods (such as Gauss quadrature) have positive weights: $\gamma_i>0$. Under this condition, the terms in (9.35) are nonnegative, i.e., $f_r\in\mathcal{R}_r((A_j)_{j=1}^d)$.

So far, only the general setting has been described. The concrete example of the sinc quadrature will follow in §9.8.2.2.

The described technique is not restricted to standard functions. Many of the tensors of finite-dimensional tensor spaces can be considered as grid functions, i.e., as functions with arguments $x_1,\ldots,x_d$ restricted to a grid $\times_{j=1}^{d}G_j$, $\#G_j<\infty$. This fact does not influence the approach. If the error $\|f-f_r\|$ is measured in the supremum norm of the associated function space, the error of the restriction of the function to a grid is bounded by the same quantity.
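The construction (9.35) can be tried on a small example. The sketch below assumes $d=2$, $\Omega=[0,1]$, $g=1$ and $f_j(x_j,\omega)=\cos(\omega x_j)$, for which the integral (9.34) has a closed form; these choices and the helper names are illustrative and not taken from the text. Gauss–Legendre quadrature serves as the quadrature family.

```python
import numpy as np

def gauss_rule_on_unit_interval(r):
    """r-point Gauss-Legendre rule mapped from [-1, 1] to Omega = [0, 1]."""
    nodes, weights = np.polynomial.legendre.leggauss(r)
    return 0.5 * weights, 0.5 * (nodes + 1.0)      # (gamma_i, omega_i)

def f_r(x, y, gamma, omega):
    """Evaluate the r-term sum (9.35) on the tensor grid x times y."""
    fx = np.cos(np.outer(omega, x))                 # row i holds f_1(., omega_i)
    fy = np.cos(np.outer(omega, y))                 # row i holds f_2(., omega_i)
    return np.einsum('i,ia,ib->ab', gamma, fx, fy)

def f_exact(x, y):
    """int_0^1 cos(w x) cos(w y) dw = 0.5*[sin(x-y)/(x-y) + sin(x+y)/(x+y)]."""
    X, Y = np.meshgrid(x, y, indexing='ij')
    # np.sinc(t) = sin(pi t)/(pi t), hence sin(s)/s = np.sinc(s/pi)
    return 0.5 * (np.sinc((X - Y) / np.pi) + np.sinc((X + Y) / np.pi))

x = np.linspace(0.0, 3.0, 31)
y = np.linspace(0.0, 3.0, 31)
errors = {}
for r in (2, 4, 8):
    gamma, omega = gauss_rule_on_unit_interval(r)
    errors[r] = np.max(np.abs(f_r(x, y, gamma, omega) - f_exact(x, y)))
print(errors)
```

Each quadrature point contributes one elementary tensor, so the representation rank equals the number of quadrature points, and the maximum error on the grid decreases rapidly with $r$ for this smooth integrand.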
9.8.2 Approximation by Exponential Sums

Below we shall focus on the (best) approximation with respect to the supremum norm $\|\cdot\|_\infty$. Optimisation with respect to the $\ell^2$ norm is, e.g., considered by Golub–Pereyra [114]. However, in Proposition 9.60 $\|\cdot\|_\infty$ estimates will be needed, while $\ell^2$ norm estimates are insufficient.

9.8.2.1 General Setting

For scalar-valued functions defined on a set $D$, we denote the supremum norm by $\|f\|_{D,\infty}:=\sup\{|f(x)| : x\in D\}$. If the reference to $D$ is obvious from the context, we also write $\|\cdot\|_\infty$ instead. Exponential sums are of the form

\[
E_r(t)=\sum_{\nu=1}^{r}a_\nu\exp(-\alpha_\nu t)\qquad(t\in\mathbb{R}) \tag{9.36a}
\]

with $2r$ (real or complex) parameters $a_\nu$ and $\alpha_\nu$. Such exponential sums are a tool to approximate certain univariate functions (details about their computation are in §9.8.2.2 and §9.8.2.3). Assume that a univariate function $f$ in an interval $I\subset\mathbb{R}$ is approximated by some $E_r$ with respect to the supremum norm in $I$:

\[
\|f-E_r\|_{I,\infty}\le\varepsilon \tag{9.36b}
\]

(we expect an exponential decay of $\varepsilon=\varepsilon_r$ with respect to $r$; cf. Theorem 9.58). Then the multivariate function

\[
F(x)=F(x_1,\ldots,x_d):=f\Big(\sum_{j=1}^{d}\varphi_j(x_j)\Big), \tag{9.36c}
\]

obtained by the substitution $t=\sum_{j=1}^{d}\varphi_j(x_j)$, is approximated equally well by $F_r(x):=E_r\big(\sum_{j=1}^{d}\varphi_j(x_j)\big)$:

\[
\|F-F_r\|_{\mathbf{I},\infty}\le\varepsilon\quad\text{for } \mathbf{I}:=\times_{j=1}^{d}I_j, \tag{9.36d}
\]

provided that

\[
\Big\{\sum_{j=1}^{d}\varphi_j(x_j) : x_j\in I_j\Big\}\subset I\quad\text{with } I \text{ in (9.36b)}. \tag{9.36e}
\]

For instance, condition (9.36e) holds for $\varphi_j(x_j)=x_j$ and $I_j=I=[0,\infty)$.
By the property of the exponential function, we have

\[
F_r(x):=E_r\Big(\sum_{j=1}^{d}\varphi_j(x_j)\Big)=\sum_{\nu=1}^{r}a_\nu\exp\Big(-\alpha_\nu\sum_{j=1}^{d}\varphi_j(x_j)\Big)=\sum_{\nu=1}^{r}a_\nu\prod_{j=1}^{d}\exp\big(-\alpha_\nu\varphi_j(x_j)\big). \tag{9.36f}
\]

Expressing the multivariate function $E_r$ as a tensor product of univariate functions, we arrive at

\[
F_r=\sum_{\nu=1}^{r}a_\nu\bigotimes_{j=1}^{d}E_\nu^{(j)}\in\mathcal{R}_r\quad\text{with } E_\nu^{(j)}(x_j):=\exp\big(-\alpha_\nu\varphi_j(x_j)\big), \tag{9.36g}
\]

i.e., (9.36g) is an r-term representation of the tensor $F_r\in C(\mathbf{I})={}_{\infty}\bigotimes_{j=1}^{d}C(I_j)$, where the left suffix $\infty$ indicates the completion with respect to the supremum norm in $\mathbf{I}\subset\mathbb{R}^d$.

A simple but important observation is the following conclusion, which shows that the analysis of the univariate function $f$ and its approximation by $E_r$ is sufficient.

Conclusion 9.55. The multivariate function $F_r(x)$ has tensor rank $r$ independently of the dimension $d$. The approximation error (9.36d) is also independent of the dimension $d$, provided that (9.36b) and (9.36e) are valid.

Approximations by sums of Gaussians, $G_r(\xi)=\sum_{\nu=1}^{r}a_\nu e^{-\alpha_\nu\xi^2}$, are equivalent to the previous exponential sums via $E_r(t):=G_r(\sqrt{t})=\sum_{\nu=1}^{r}a_\nu e^{-\alpha_\nu t}$. A particular but important substitution of the form considered in (9.36c) is $t=\|x\|^2$, leading to

\[
F_r(x):=E_r\Big(\sum_{j=1}^{d}x_j^2\Big)=\sum_{\nu=1}^{r}a_\nu\exp\Big(-\alpha_\nu\sum_{j=1}^{d}x_j^2\Big)=\sum_{\nu=1}^{r}a_\nu\prod_{j=1}^{d}e^{-\alpha_\nu x_j^2},
\]

i.e.,

\[
F_r=\sum_{\nu=1}^{r}a_\nu\bigotimes_{j=1}^{d}G_\nu^{(j)}\quad\text{with } G_\nu^{(j)}(x_j):=\exp\big(-\alpha_\nu x_j^2\big). \tag{9.37}
\]

Inequality (9.36b) implies that $\|F-F_r\|_{D,\infty}\le\varepsilon$ with $D:=\{x\in\mathbb{R}^d : \|x\|^2\in I\}$.

Remark 9.56. In the above applications we make use of the fact that estimates with respect to the supremum norm are invariant under substitutions. When we consider an $L^p$ norm ($1\le p<\infty$) instead of the supremum norm, the relation between the one-dimensional error bound (9.36b) and the multi-dimensional one in (9.36d) is more involved and depends on $d$.
9.8.2.2 Quadrature-Based Exponential Sum Approximations

Approximations by exponential sums may be based on quadrature methods¹⁹. Assume that a function $f$ with domain $I\subset\mathbb{R}$ is defined by the Laplace transform

\[
f(x)=\int_0^\infty e^{-tx}g(t)\,\mathrm{d}t\quad\text{for } x\in I.
\]

Any quadrature method $Q(F):=\sum_{\nu=1}^{r}\omega_\nu F(t_\nu)$ for a suitable integrand $F$ defined on $[0,\infty)$ yields an exponential sum of the form (9.36a):²⁰

\[
f(x)\approx Q(e^{-\bullet x}g):=\sum_{\nu=1}^{r}\underbrace{\omega_\nu\,g(t_\nu)}_{=:a_\nu}\,e^{-t_\nu x}\in\mathcal{R}_r.
\]

Note that the quadrature error $f(x)-Q(e^{-\bullet x}g)$ has to be controlled for all parameter values $x\in I$.

A possible choice for $Q$ is the sinc quadrature. For this purpose we choose a suitable substitution $t=\varphi(\tau)$ with $\varphi:\mathbb{R}\to[0,\infty)$ to obtain

\[
f(x)=\int_{-\infty}^{\infty}e^{-\varphi(\tau)x}g(\varphi(\tau))\,\varphi'(\tau)\,\mathrm{d}\tau.
\]

The sinc quadrature can be applied to analytic functions defined on $\mathbb{R}$:

\[
\int_{-\infty}^{\infty}F(x)\,\mathrm{d}x\approx T(F,h):=h\sum_{k=-\infty}^{\infty}F(kh)\approx T_N(F,h):=h\sum_{k=-N}^{N}F(kh).
\]

$T(F,h)$ can be interpreted as the infinite trapezoidal rule with step size $h$, while $T_N(F,h)$ is a truncated finite sum. In fact, $T(F,h)$ and $T_N(F,h)$ are interpolatory quadratures; i.e., they are exact integrals $\int_{\mathbb{R}}C(f,h)(t)\,\mathrm{d}t$ and $\int_{\mathbb{R}}C_N(f,h)(t)\,\mathrm{d}t$ involving the sinc interpolations $C(f,h)$ and $C_N(f,h)$ defined in (10.32,b).

The error analysis of $T(F,h)$ is based on the behaviour of the holomorphic function $F(z)$ in the complex strip $D_\delta$ defined in (10.33) and the norm (10.34). A typical error bound is of the form $C_1\exp(-\sqrt{2\pi\delta\alpha N})$ with $C_1=C_1(\|F\|_{D_\delta})$ and $\delta$ in (10.33), while $\alpha$ describes the decay of $F$: $|F(x)|\le O(\exp(-\alpha|x|))$. For a precise analysis, see Stenger [274] and Hackbusch [138, §D.4].

¹⁹ Quadrature based approximation is very common in computational quantum chemistry. For a discussion from the mathematical side compare Beylkin–Monzón [34]. Concerning Prony’s method we refer to McLean [223].
²⁰ The quadrature $Q$ applies to the variable $\bullet$.

Sinc quadrature applied to $F(t)=F(t;x):=e^{-\varphi(t)x}g(\varphi(t))\,\varphi'(t)$ yields

\[
T_N(F,h):=h\sum_{k=-N}^{N}e^{-\varphi(kh)x}g(\varphi(kh))\,\varphi'(kh).
\]
The right-hand side is an exponential sum (9.36a) with $r:=2N+1$ and coefficients $a_\nu:=h\,g\big(\varphi((\nu-1-N)h)\big)\,\varphi'\big((\nu-1-N)h\big)$, $\alpha_\nu:=\varphi\big((\nu-1-N)h\big)$.

Since the integrand $F(\bullet;x)$ depends on the parameter $x\in I$, the error analysis must be performed uniformly in $x\in I$ to prove an estimate (9.36b): $\|f-E_r\|_{I,\infty}\le\varepsilon$.

Even when the obtainable error bounds possess almost optimal asymptotic behaviour, they are inferior to the best approximations discussed next.

9.8.2.3 Approximation of $1/x$ and $1/\sqrt{x}$

Negative powers $x^{-\lambda}$ belong to the class of completely monotone functions which can be well approximated by exponential sums in $(0,\infty)$. Because of their importance, we shall consider the particular functions $1/x$ and $1/\sqrt{x}$. For the general theory of approximation by exponentials we refer to Braess [41] and Braess–Hackbusch [44, §7]. The first statement concerns the existence of a best approximation and stability of the approximation expressed by positivity of its terms.

Theorem 9.57 ([41, p. 194]). Given the function $f(x)=x^{-\lambda}$ with $\lambda>0$ in an interval $I=[a,b]$ (including $b=\infty$) with $a>0$ and $r\in\mathbb{N}$, there is a unique best approximation $E_{r,I}(x)=\sum_{\nu=1}^{r}a_{\nu,I}\exp(-\alpha_{\nu,I}x)$ such that

\[
\varepsilon(f,I,r):=\|f-E_{r,I}\|_{I,\infty}=\inf\Big\{\Big\|f-\sum_{\nu=1}^{r}b_\nu e^{-\beta_\nu x}\Big\|_{I,\infty} : b_\nu,\beta_\nu\in\mathbb{R}\Big\}.
\]

Moreover, this function $E_{r,I}$ has positive coefficients: $a_\nu,\alpha_\nu>0$ for $1\le\nu\le r$.

In the case of $f(x)=1/x$, substitution $x=at$ ($1\le t\le b/a$) shows that the best approximation for $I=[a,b]$ can be derived from the best approximation in $[1,b/a]$ via the transform

\[
a_{\nu,[a,b]}:=\frac{a_{\nu,[1,b/a]}}{a},\qquad \alpha_{\nu,[a,b]}:=\frac{\alpha_{\nu,[1,b/a]}}{a},\qquad \varepsilon(f,[a,b],r)=\frac{\varepsilon(f,[1,b/a],r)}{a}. \tag{9.38a}
\]

In the case of $f(x)=1/\sqrt{x}$, the relations are

\[
a_{\nu,[a,b]}=\frac{a_{\nu,[1,b/a]}}{\sqrt{a}},\qquad \alpha_{\nu,[a,b]}=\frac{\alpha_{\nu,[1,b/a]}}{a},\qquad \varepsilon(f,[a,b],r)=\frac{\varepsilon(f,[1,b/a],r)}{\sqrt{a}}. \tag{9.38b}
\]

Therefore it suffices to study the best approximation on standardised intervals $[1,R]$ for $R\in(1,\infty)$. The coefficients $\{a_\nu,\alpha_\nu : 1\le\nu\le r\}$ for various values of $R$ and $r$ are contained in a web page²¹ (cf. Hackbusch [142, 132]).

²¹ https://www.mis.mpg.de/scicomp/EXP SUM/readme

Concerning convergence, we first consider a fixed interval $[1,R]=[1,10]$. The error $\|1/x-E_{r,[1,10]}\|_{[1,10],\infty}$ is presented below:
    r        1          2          3          4          5          6          7
    error    8.556e-2   8.752e-3   7.145e-4   5.577e-5   4.243e-6   3.173e-7   2.344e-8

One observes an exponential decay as $O(\exp(-cr))$ with $c>0$.

If $R$ varies from 1 to $\infty$, there is a certain finite value $R^*=R_r^*$ depending on $r$, such that $\varepsilon(f,[1,R],r)$ as a function of $R$ strictly increases in $[1,R^*]$, whereas the approximant $E_{r,[1,R]}$, as well as the error $\varepsilon(f,[1,R],r)$, is constant for $R\in[R^*,\infty)$. This implies that the approximation $E_{r,[1,R^*]}$ is already the best approximation in the semi-infinite interval $[1,\infty)$. The next table shows $R_r^*$ and $\varepsilon(1/x,[1,R_r^*],r)=\varepsilon(1/x,[1,\infty),r)$ for the semi-infinite interval $[1,\infty)$:

    r                      9          16         25         36          49
    R_r*                   28387      2.027e+6   1.513e+8   1.162e+10   9.074e+11
    ε(1/x, [1,∞), r)       2.611e-5   3.659e-7   4.898e-9   6.382e-11   8.172e-13
    25 exp(−π √(2r))       4.068e-5   4.785e-7   5.628e-9   6.619e-11   7.786e-13

Here the accuracy behaves like the function $25\exp(-\pi\sqrt{2r})$ shown in the last line for a comparison. The behaviour of $f(x)=1/\sqrt{x}$ is quite similar:

    r                      9          16         25         36          49
    R_r*                   7.994e+6   4.129e+9   2.17e+12   1.15e+15    6.10e+17
    ε(1/√x, [1,∞), r)      3.072e-4   1.352e-5   5.898e-7   2.564e-8    1.116e-9
    4 exp(−π √r)           3.228e-4   1.395e-5   6.028e-7   2.605e-8    1.126e-9

The observed asymptotic decay from the last row of the table is better than the theoretical upper bound in the next theorem.

Theorem 9.58. Let $f(x)=x^{-\lambda}$ with $\lambda>0$. The asymptotic behaviour of the error $\varepsilon(f,I,r)$ is

\[
\varepsilon(f,I,r)\le\begin{cases}C\exp(-cr) & \text{for a finite positive interval } I=[a,b]\subset(0,\infty),\\ C\exp(-c\sqrt{r}) & \text{for a semi-infinite interval } I=[a,\infty),\ a>0,\end{cases}
\]

where the constants $C,c>0$ depend on $I$. For instance, for $\lambda=1/2$ and $a=1$, the upper bounds are

\[
\varepsilon(1/\sqrt{x},[1,R],r)\le 8\sqrt{2}\exp\big(-\pi^2r/\log(8R)\big),\qquad
\varepsilon(1/\sqrt{x},[1,\infty),r)\le 8\sqrt{2}\exp\big(-\pi\sqrt{r/2}\big).
\]

For general $a>0$ use (9.38a,b).

Proof. For details about the constants, see Braess–Hackbusch [43], [44]. The latter estimates can be found in [44, Eqs. (33), (34)]. □
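The comparison rows of the two tables for $[1,\infty)$ and the relation to the bound of Theorem 9.58 can be reproduced with a few lines. The error values are copied from the tables; the variable names are illustrative.

```python
import numpy as np

r = np.array([9, 16, 25, 36, 49])

# best-approximation errors on [1, inf), copied from the tables
err_inv = np.array([2.611e-5, 3.659e-7, 4.898e-9, 6.382e-11, 8.172e-13])        # 1/x
err_inv_sqrt = np.array([3.072e-4, 1.352e-5, 5.898e-7, 2.564e-8, 1.116e-9])     # 1/sqrt(x)

# empirical comparison functions shown in the last table rows
fit_inv = 25.0 * np.exp(-np.pi * np.sqrt(2.0 * r))
fit_inv_sqrt = 4.0 * np.exp(-np.pi * np.sqrt(r))

# theoretical upper bound of Theorem 9.58 for 1/sqrt(x) on [1, inf)
bound_inv_sqrt = 8.0 * np.sqrt(2.0) * np.exp(-np.pi * np.sqrt(r / 2.0))

for row in zip(r, err_inv, fit_inv, err_inv_sqrt, fit_inv_sqrt, bound_inv_sqrt):
    print("r=%2d  1/x: %.3e ~ %.3e   1/sqrt(x): %.3e ~ %.3e <= %.3e" % row)
```

The tabulated errors for $1/\sqrt{x}$ lie well below the theoretical bound, in line with the remark preceding the theorem.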
Best approximations with respect to the supremum norm can be performed by the Remez algorithm (cf. Remez [249]). For details of the implementation in the case of exponential sums, see Hackbusch [142].
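For comparison with the best approximations, the quadrature-based construction of §9.8.2.2 is easy to sketch for $f(x)=1/x$. The code below assumes $g=1$ in the Laplace transform and the substitution $\varphi(\tau)=e^\tau$; both the substitution and the values of `N` and `h` are choices made for this illustration and are not optimised. The last lines illustrate Conclusion 9.55 for $1/(x_1+x_2+x_3)$.

```python
import numpy as np

def sinc_exp_sum_inverse(N, h):
    """Coefficients (a_nu, alpha_nu), nu = 1..2N+1, of an exponential sum for 1/x.

    Uses 1/x = int_0^inf exp(-t x) dt with t = phi(tau) = exp(tau) (assumption)
    and the truncated trapezoidal (sinc) rule T_N with step size h."""
    k = np.arange(-N, N + 1)
    alpha = np.exp(k * h)          # alpha_nu = phi(kh)
    a = h * np.exp(k * h)          # a_nu = h * g(phi(kh)) * phi'(kh) with g = 1
    return a, alpha

def eval_exp_sum(a, alpha, x):
    """E_r(x) = sum_nu a_nu exp(-alpha_nu x) for an array of points x."""
    x = np.atleast_1d(x)
    return np.exp(-np.outer(x, alpha)) @ a

a, alpha = sinc_exp_sum_inverse(N=80, h=0.25)
x = np.linspace(1.0, 10.0, 2001)
max_err = np.max(np.abs(1.0 / x - eval_exp_sum(a, alpha, x)))

# Conclusion 9.55: the same coefficients give a rank-r representation of
# F(x1, x2, x3) = 1/(x1 + x2 + x3) on a grid with x1 + x2 + x3 in [1.5, 9]
g = np.linspace(0.5, 3.0, 11)
factors = np.exp(-np.outer(alpha, g))          # E_nu^(j) on the grid (same for all j)
F_r = np.einsum('n,na,nb,nc->abc', a, factors, factors, factors)
F = 1.0 / (g[:, None, None] + g[None, :, None] + g[None, None, :])
max_err_3d = np.max(np.abs(F - F_r))
print(len(a), max_err, max_err_3d)
```

With $r=161$ terms the error is small, but the best approximations reach comparable accuracy with far fewer terms (cf. the table for $[1,10]$), which is the point made at the end of §9.8.2.2.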
9.8.2.4 Other Exponential Sums

Another well-known type of exponential sums are trigonometric series. A periodic function in $[0,2\pi]$ has the representation $f(x)=\sum_{\nu\in\mathbb{Z}}a_\nu e^{\mathrm{i}\nu x}$. The coefficients $a_\nu$ decay the faster the smoother the function is. In that case the partial sum $f_n(x)=\sum_{|\nu|\le n}a_\nu e^{\mathrm{i}\nu x}$ yields a good approximation. $f_n$ is of the form (9.36a) with imaginary coefficients $\alpha_\nu:=\mathrm{i}\nu$.

Besides real coefficients $\alpha_\nu$ as in Theorem 9.57 and imaginary ones as above, also complex coefficients with positive real part appear in applications. An important example is the Bessel function $J_0$, which is approximated by exponential sums in Beylkin–Monzón [33].
9.8.2.5 Application to Multivariate Functions

9.8.2.5.1 Multivariate Functions Derived from $1/x$

We start with the application to $f(x)=1/x$. Let $f_j\in C(D_j)$ ($1\le j\le d$) be functions with values in $I_j\subset(0,\infty)$. Set

\[
I:=\sum_{j=1}^{d}I_j=\Big\{\sum_{j=1}^{d}y_j : y_j\in I_j\Big\}=[a,b],
\]

possibly with $b=\infty$. Choose an optimal exponential sum $E_r$ for $\frac{1}{x}$ on²² $I$ with error bound $\varepsilon(\frac{1}{x},I,r)$. As in the construction (9.36c), we obtain a best approximation of $F(x)=F(x_1,\ldots,x_d):=1/\sum_{j=1}^{d}f_j(x_j)$ by

\[
\Big\|\frac{1}{\sum_{j=1}^{d}f_j(x_j)}-\sum_{\nu=1}^{r}a_{\nu,I}\prod_{j=1}^{d}\exp\big(-\alpha_{\nu,I}f_j(x_j)\big)\Big\|_{I,\infty}\le\varepsilon(\tfrac{1}{x},I,r),
\]

i.e., $\|F-F_r\|_{I,\infty}\le\varepsilon(\frac{1}{x},I,r)$ with $F_r:=\sum_{\nu=1}^{r}a_{\nu,I}\bigotimes_{j=1}^{d}E_\nu^{(j)}\in\mathcal{R}_r$, where $E_\nu^{(j)}=\exp(-\alpha_{\nu,I}f_j(\cdot))$.

Since $a_{\nu,I}>0$ (cf. Theorem 9.57), the functions $E_\nu^{(j)}$ belong to the class $A_j$ of positive functions. In the notation of §7.8, $F_r\in\mathcal{R}_r((A_j)_{j=1}^d)$ is valid (cf. §9.7).

²² For a larger interval $I'$, $E_r$ yields a (non-optimal) error bound with $\varepsilon(\frac{1}{x},I',r)$.
In quantum chemistry a so-called MP2 energy denominator $\frac{1}{\varepsilon_a+\varepsilon_b-\varepsilon_i-\varepsilon_j}$ appears, where $\varepsilon_a,\varepsilon_b>0$ and $\varepsilon_i,\varepsilon_j<0$ (more than four energies $\varepsilon_\bullet$ are possible). The denominator is contained in $[A,B]$ with $A:=2(\varepsilon_{\mathrm{LUMO}}-\varepsilon_{\mathrm{HOMO}})>0$ being related to the HOMO-LUMO gap, while $B:=2(\varepsilon_{\max}-\varepsilon_{\min})$ involves the maximal and minimal orbital energies (cf. [279]). Further computations are significantly accelerated if the dependencies on $\varepsilon_a,\varepsilon_b,\varepsilon_i,\varepsilon_j$ can be separated. For this purpose, the optimal exponential sum $E_{r,[A,B]}$ for $\frac{1}{x}$ on $[A,B]$ can be used:

\[
\frac{1}{\varepsilon_a+\varepsilon_b-\varepsilon_i-\varepsilon_j}\approx\sum_{\nu=1}^{r}a_{\nu,I}\,e^{-\alpha_\nu\varepsilon_a}\cdot e^{-\alpha_\nu\varepsilon_b}\cdot e^{\alpha_\nu\varepsilon_i}\cdot e^{\alpha_\nu\varepsilon_j}\in\mathcal{R}_r,
\]

where the error can be uniformly estimated by $\varepsilon(\frac{1}{x},[A,B],r)$.

The derivation of the exponential sum used in quantum chemistry starts from the Laplace transform $\frac{1}{x}=\int_0^\infty\exp(-tx)\,\mathrm{d}t$ and applies certain quadrature methods as described in §9.8.2.2 (cf. Almlöf [5]). However, in this setting it is hard to describe how the best quadrature rule should be chosen. Note that the integrand $\exp(-tx)$ is parameter dependent.

9.8.2.5.2 Multivariate Functions Derived From $1/\sqrt{x}$

The function

\[
P(x):=\frac{1}{\|x\|}=\frac{1}{\sqrt{\sum_{j=1}^{3}x_j^2}}\quad\text{for } x\in\mathbb{R}^3
\]

is called the Newton potential in the context of gravity and the Coulomb potential in the context of an electrical field. Mathematically, $4\pi P$ is the singularity function of the Laplace operator

\[
\Delta=\sum_{j=1}^{d}\frac{\partial^2}{\partial x_j^2} \tag{9.39}
\]

for $d=3$ (cf. [141, §2.2]). It usually appears in a convolution integral $P\star f$. If $f$ is the mass [charge] density,

\[
4\pi\int_{\mathbb{R}^3}\frac{f(y)}{\|x-y\|}\,\mathrm{d}y=4\pi(P\star f)(x)
\]

describes the gravitational [electrical] field.

Obviously, it is impossible to approximate $P$ uniformly on the whole $\mathbb{R}^3$ by exponential sums. Instead, we choose some $\eta>0$ which will be fixed in Lemma 9.59. Take an optimal approximation $E_r$ of $1/\sqrt{t}$ on $I:=[\eta^2,\infty)$. Following the strategy in (9.37), we substitute $t=\|x\|^2=\sum_{j=1}^{3}x_j^2$ and obtain

\[
E_{r,I}(\|x\|^2)=\sum_{\nu=1}^{r}a_{\nu,I}\prod_{j=1}^{3}\exp(-\alpha_{\nu,I}x_j^2),
\]

i.e.,

\[
E_{r,[\eta^2,\infty)}(\|\cdot\|^2)=\sum_{\nu=1}^{r}a_{\nu,I}\bigotimes_{j=1}^{3}E_\nu^{(j)}\in\mathcal{R}_r\quad\text{with } E_\nu^{(j)}(\xi)=e^{-\alpha_{\nu,I}\xi^2}.
\]

The uniform estimate

\[
\big|P(x)-E_{r,[\eta^2,\infty)}(\|x\|^2)\big|\le\varepsilon\big(\tfrac{1}{\sqrt{\cdot}},[\eta^2,\infty),r\big)=\frac{\varepsilon}{\eta}\quad\text{for } \eta\le\|x\|<\infty \text{ and } \varepsilon:=\varepsilon\big(\tfrac{1}{\sqrt{\cdot}},[1,\infty),r\big)
\]

excludes the neighbourhood $U_\eta:=\{x\in\mathbb{R}^3 : \|x\|\le\eta\}$ of the singularity. Here we use

\[
\big|P(x)-E_{r,[\eta^2,\infty)}(\|x\|^2)\big|\le P(x)\ \text{ for } x\in U_\eta\quad\text{and}\quad\int_{U_\eta}P(x)\,\mathrm{d}x=2\pi\eta^2.
\]
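Lemma 9.59. Assume $\|f\|_{L^1(\mathbb{R}^3)}\le C_1$ and $\|f\|_{L^\infty(\mathbb{R}^3)}\le C_\infty$. Then

\[
\Big|\int_{\mathbb{R}^3}\frac{f(y)}{\|x-y\|}\,\mathrm{d}y-\int_{\mathbb{R}^3}E_{r,[\eta^2,\infty)}(\|x-y\|^2)f(y)\,\mathrm{d}y\Big|\le 2\pi\eta^2C_\infty+\frac{\varepsilon}{\eta}C_1
\]

holds with $\varepsilon:=\varepsilon\big(\tfrac{1}{\sqrt{\cdot}},[1,\infty),r\big)$ for all $x\in\mathbb{R}^3$. The error bound is minimised for $\eta=\sqrt[3]{\frac{C_1\varepsilon}{4\pi C_\infty}}$:

\[
\Big\|\int_{\mathbb{R}^3}\frac{f(y)}{\|x-y\|}\,\mathrm{d}y-\int_{\mathbb{R}^3}E_{r,[\eta^2,\infty)}(\|x-y\|^2)f(y)\,\mathrm{d}y\Big\|_{\mathbb{R}^3,\infty}\le\underbrace{\tfrac{3}{2}\,2^{\frac{2}{3}}\sqrt[3]{\pi}}_{=3.4873}\,\sqrt[3]{C_1^2C_\infty}\;\varepsilon^{\frac{2}{3}}.
\]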
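The optimal $\eta$ and the constant $3.4873$ in Lemma 9.59 follow from elementary calculus and can be verified numerically. In the following sketch the values of `eps`, `c1` and `cinf` are arbitrary test values, not data from the text.

```python
import numpy as np

def bound(eta, eps, c1, cinf):
    """Right-hand side 2*pi*eta^2*C_inf + (eps/eta)*C_1 of the first estimate."""
    return 2.0 * np.pi * eta**2 * cinf + eps / eta * c1

eps, c1, cinf = 1e-6, 2.0, 5.0                               # arbitrary test values
eta_opt = (c1 * eps / (4.0 * np.pi * cinf)) ** (1.0 / 3.0)   # minimiser from the lemma

constant = 1.5 * 2.0 ** (2.0 / 3.0) * np.pi ** (1.0 / 3.0)   # (3/2) * 2^(2/3) * pi^(1/3)
closed_form = constant * (c1**2 * cinf) ** (1.0 / 3.0) * eps ** (2.0 / 3.0)

# scan a neighbourhood of eta_opt to confirm that no smaller value occurs
etas = eta_opt * np.logspace(-1, 1, 2001)
scan_min = bound(etas, eps, c1, cinf).min()
print(constant, bound(eta_opt, eps, c1, cinf), closed_form, scan_min)
```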
Inserting the asymptotic behaviour $\varepsilon=8\sqrt{2}\exp(-\pi\sqrt{r/2})$ from Theorem 9.58, we obtain a bound of the same form $C\exp(-c\sqrt{r})$ with $c=\sqrt{2}\pi/3$. The observed behaviour is better: $O(\exp(-\frac{2\pi}{3}\sqrt{r}))$. We conclude from Lemma 9.59 that the convolution $P\star f$ may be replaced with the convolution $E_{r,[\eta^2,\infty)}(\|\cdot\|^2)\star f$, while the accuracy is still exponentially improving.

Assume for simplicity that $f$ is an elementary tensor: $f(y)=f_1(y_1)\cdot f_2(y_2)\cdot f_3(y_3)$. As seen in (4.85b), the convolution with $E_r(\|x-y\|^2)$ can be reduced to three one-dimensional convolutions:

\[
\int_{\mathbb{R}^3}\frac{f(y)}{\|x-y\|}\,\mathrm{d}y\approx\int_{\mathbb{R}^3}E_{r,I}(\|x-y\|^2)f(y)\,\mathrm{d}y=\sum_{\nu=1}^{r}a_{\nu,I}\prod_{j=1}^{3}\int_{\mathbb{R}}\exp\big(-\alpha_{\nu,I}(x_j-y_j)^2\big)f_j(y_j)\,\mathrm{d}y_j.
\]

Numerical examples related to integral operators involving the Newton potential can be found in [147].
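The separable Gaussian representation of $P(x)=1/\|x\|$ away from the singularity can be sketched as follows. Instead of the best approximation $E_{r,I}$ of the text, the sketch uses a sinc-quadrature exponential sum for $1/\sqrt{t}$ based on $1/\sqrt{t}=\frac{2}{\sqrt{\pi}}\int_0^\infty e^{-ts^2}\,\mathrm{d}s$ with the substitution $s=e^\tau$; this swapped-in construction, the grid and the parameters `N`, `h` are assumptions of the illustration.

```python
import numpy as np

def sinc_exp_sum_inv_sqrt(N, h):
    """Coefficients (a_nu, alpha_nu) with 1/sqrt(t) ~ sum_nu a_nu exp(-alpha_nu t).

    Trapezoidal rule for (2/sqrt(pi)) * int exp(-t e^{2 tau}) e^{tau} d tau."""
    k = np.arange(-N, N + 1)
    alpha = np.exp(2.0 * k * h)
    a = 2.0 / np.sqrt(np.pi) * h * np.exp(k * h)
    return a, alpha

a, alpha = sinc_exp_sum_inv_sqrt(N=80, h=0.25)

# tensor grid in R^3 that stays away from the singularity: ||x|| >= sqrt(3)
g = np.linspace(1.0, 3.0, 9)
gauss = np.exp(-np.outer(alpha, g**2))                  # G_nu^(j)(x_j) = exp(-alpha_nu x_j^2)
P_r = np.einsum('n,na,nb,nc->abc', a, gauss, gauss, gauss)   # r-term form as in (9.37)
P = 1.0 / np.sqrt(g[:, None, None]**2 + g[None, :, None]**2 + g[None, None, :]**2)
max_err = np.max(np.abs(P - P_r))
print(len(a), max_err)
```

Each term is a product of three univariate Gaussians, which is what allows the reduction of the three-dimensional convolution to one-dimensional convolutions.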
9.8.2.6 Application to Operators

Functions of matrices and operators are discussed in §4.6.6. Now we consider the situation of two functions $f$ and $\tilde f$ applied to a matrix of the form $UDU^{\mathsf{H}}$, where $\tilde f$ is considered as approximation of $f$.

Proposition 9.60. Let $M=UDU^{\mathsf{H}}$ ($U$ unitary, $D$ diagonal) and assume that $f$ and $\tilde f$ are defined on the spectrum $\sigma(M)$. Then the approximation error with respect to the spectral norm $\|\cdot\|_2$ is bounded by

\[
\|f(M)-\tilde f(M)\|_2\le\|f-\tilde f\|_{\sigma(M),\infty}. \tag{9.40}
\]

The estimate extends to selfadjoint operators. For diagonalisable matrices $M=TDT^{-1}$, the right-hand side becomes $\|T\|_2\|T^{-1}\|_2\|f-\tilde f\|_{\sigma(M),\infty}$.

Proof. Since $f(M)-\tilde f(M)=Uf(D)U^{\mathsf{H}}-U\tilde f(D)U^{\mathsf{H}}=U[f(D)-\tilde f(D)]U^{\mathsf{H}}$ and unitary transformations do not change the spectral norm, $\|f(M)-\tilde f(M)\|_2=\|f(D)-\tilde f(D)\|_2=\max\{|f(\lambda)-\tilde f(\lambda)| : \lambda\in\sigma(M)\}=\|f-\tilde f\|_{\sigma(M),\infty}$ follows. □

The supremum norm on the right-hand side in (9.40) cannot be relaxed to an $L^p$ norm with $p<\infty$. This fact is what makes the construction of best approximations with respect to the supremum norm so important.

Under stronger conditions on $f$ and $\tilde f$, general operators $M\in L(V,V)$ can be admitted (cf. [138, Theorem 14.13]).

Proposition 9.61. Let $f$ and $\tilde f$ be holomorphic in a complex domain $\Omega$ containing $\sigma(M)$ for some operator $M\in L(V,V)$. Then

\[
\|f(M)-\tilde f(M)\|_2\le\frac{1}{2\pi}\oint_{\partial\Omega}|f(\zeta)-\tilde f(\zeta)|\,\|(\zeta I-M)^{-1}\|_2\,\mathrm{d}\zeta.
\]

Proof. Use the representation (4.87). □
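Estimate (9.40) of Proposition 9.60 can be checked numerically for a Hermitian matrix; by the proof it even holds with equality. In the sketch below the random matrix and the perturbed function `f_tilde` are hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
M = Z @ Z.conj().T + np.eye(n)              # Hermitian with spectrum in [1, inf)
lam, U = np.linalg.eigh(M)                  # M = U diag(lam) U^H

def apply_function(func, lam, U):
    """f(M) = U f(D) U^H for a function defined on the spectrum."""
    return (U * func(lam)) @ U.conj().T

def f(x):
    return 1.0 / x

def f_tilde(x):
    # hypothetical approximation of f, used only to produce a nonzero error
    return 1.0 / x + 1e-3 * np.sin(x)

lhs = np.linalg.norm(apply_function(f, lam, U) - apply_function(f_tilde, lam, U), 2)
rhs = np.max(np.abs(f(lam) - f_tilde(lam)))     # sup norm on the spectrum
print(lhs, rhs)
```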
Quite another question is how $f(M)$ behaves under perturbations of $M$. Here the following result of Aleksandrov–Peller [4] for Hölder continuous $f$ is of interest.

Theorem 9.62. Let $f\in C^\alpha(\mathbb{R})$ with $\alpha\in(0,1)$, i.e., $|f(x)-f(y)|\le C|x-y|^\alpha$ for $x,y\in\mathbb{R}$. Then symmetric matrices (or general selfadjoint operators) $M'$ and $M''$ satisfy the analogous inequality $\|f(M')-f(M'')\|\le C'\|M'-M''\|^\alpha$.

The corresponding statement for Lipschitz continuous $f$ (i.e., for $\alpha=1$) is wrong, but generalisations to functions of the Hölder–Zygmund class are possible (cf. [4]).
The inverse of $M$ can be considered as the application of the function $f(x)=\frac{1}{x}$ to $M$, i.e., $f(M)=M^{-1}$. Assume that $M$ is Hermitian (selfadjoint) and has a positive spectrum $\sigma(M)\subset[a,b]\subset(0,\infty]$. As approximation $\tilde f$ we choose the best exponential sum $E_{r,I}(x)=\sum_{\nu=1}^{r}a_{\nu,I}\exp(-\alpha_{\nu,I}x)$ on $I$, where $I\supset[a,b]$. Then

\[
E_{r,I}(M)=\sum_{\nu=1}^{r}a_{\nu,I}\exp(-\alpha_{\nu,I}M) \tag{9.41}
\]

approximates $M^{-1}$ exponentially well:

\[
\|f(M)-\tilde f(M)\|_2\le\varepsilon(\tfrac{1}{x},I,r). \tag{9.42}
\]

The approximation of $M^{-1}$ seems rather impractical since matrix exponentials $\exp(-\alpha_{\nu,I}M)$ must be evaluated. The interesting applications, however, are matrices which are sums of certain Kronecker products. We recall Lemma 4.169b:

\[
\mathbf{M}=\sum_{j=1}^{d}I\otimes\cdots\otimes M^{(j)}\otimes\cdots\otimes I\in\mathcal{R}_d,\qquad M^{(j)}\in\mathbb{K}^{I_j\times I_j}
\]

(factor $M^{(j)}$ at $j$-th position) has the exponential

\[
\exp(\mathbf{M})=\bigotimes_{j=1}^{d}\exp(M^{(j)}). \tag{9.43}
\]

Let $M^{(j)}$ be positive definite with extreme eigenvalues $0<\lambda_{\min}^{(j)}\le\lambda_{\max}^{(j)}$ for $1\le j\le d$. Since the spectrum of $\mathbf{M}$ consists of the sums $\sum_{j=1}^{d}\lambda^{(j)}$ of all $\lambda^{(j)}\in\sigma(M^{(j)})$, the interval $[a,b]$ containing the spectrum $\sigma(\mathbf{M})$ is given by $a:=\sum_{j=1}^{d}\lambda_{\min}^{(j)}>0$ and $b:=\sum_{j=1}^{d}\lambda_{\max}^{(j)}$. In the case of an unbounded selfadjoint operator, $b=\infty$ holds.

These preparations lead us to the following statement, which is often used for the case $M^{(j)}=I$.

Proposition 9.63. Let $M^{(j)},A^{(j)}\in\mathbb{K}^{I_j\times I_j}$ be positive-definite matrices with $\lambda_{\min}^{(j)}$ and $\lambda_{\max}^{(j)}$ being the extreme eigenvalues of the generalised eigenvalue problem $A^{(j)}x=\lambda M^{(j)}x$ and set

\[
\mathbf{A}=A^{(1)}\otimes M^{(2)}\otimes\ldots\otimes M^{(d)}+M^{(1)}\otimes A^{(2)}\otimes\ldots\otimes M^{(d)}+\ldots+M^{(1)}\otimes\ldots\otimes M^{(d-1)}\otimes A^{(d)}. \tag{9.44}
\]

Then $\mathbf{A}^{-1}$ can be approximated by

\[
\mathbf{B}:=\Big[\sum_{\nu=1}^{r}a_{\nu,I}\bigotimes_{j=1}^{d}\exp\big(-\alpha_{\nu,I}(M^{(j)})^{-1}A^{(j)}\big)\Big]\cdot\Big[\bigotimes_{j=1}^{d}(M^{(j)})^{-1}\Big].
\]

The error is given by

\[
\|\mathbf{A}^{-1}-\mathbf{B}\|_2\le\varepsilon(\tfrac{1}{x},[a,b],r)\,\|\mathbf{M}^{-1}\|_2
\]

with

\[
\mathbf{M}=\bigotimes_{j=1}^{d}M^{(j)},\qquad a:=\sum_{j=1}^{d}\lambda_{\min}^{(j)},\qquad b:=\sum_{j=1}^{d}\lambda_{\max}^{(j)}.
\]

Proof. Write $\mathbf{A}=\mathbf{M}^{1/2}\cdot\hat{\mathbf{A}}\cdot\mathbf{M}^{1/2}$ with $\hat{\mathbf{A}}=\hat A^{(1)}\otimes I\ldots\otimes I+\ldots$, where $\hat A^{(j)}:=(M^{(j)})^{-1/2}A^{(j)}(M^{(j)})^{-1/2}$. Note that $\lambda_{\min}^{(j)}$ and $\lambda_{\max}^{(j)}$ are the extreme eigenvalues of $\hat A^{(j)}$. Apply (9.42) to $\hat{\mathbf{A}}$ instead of $M$. For $\exp(-\alpha_{\nu,I}\hat{\mathbf{A}})$ appearing in $\hat{\mathbf{B}}:=E_{r,I}(\hat{\mathbf{A}})$ (cf. (9.41)) use the representation (9.43). This yields the error estimate $\|\hat{\mathbf{A}}^{-1}-\hat{\mathbf{B}}\|_2\le\varepsilon(\frac{1}{x},[a,b],r)$. Note that $\mathbf{B}=\mathbf{M}^{-1/2}\cdot E_{r,I}(\hat{\mathbf{A}})\cdot\mathbf{M}^{-1/2}$. Hence

\[
\|\mathbf{A}^{-1}-\mathbf{B}\|_2=\|\mathbf{M}^{-1/2}\cdot[\hat{\mathbf{A}}^{-1}-E_{r,I}(\hat{\mathbf{A}})]\cdot\mathbf{M}^{-1/2}\|_2\le\|\hat{\mathbf{A}}^{-1}-\hat{\mathbf{B}}\|_2\,\|\mathbf{M}^{-1/2}\|_2^2.
\]

The identity $\|\mathbf{M}^{-1/2}\|_2^2=\|\mathbf{M}^{-1}\|_2$ completes the proof. □
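Identity (9.43) and the description of the spectrum of the Kronecker sum can be checked on small random matrices. The following sketch assumes $d=3$ and real symmetric positive-definite factors; the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    """Random symmetric positive-definite test matrix."""
    X = rng.standard_normal((n, n))
    return X @ X.T / n + np.eye(n)

def sym_expm(S):
    """exp(S) for a real symmetric matrix via its eigendecomposition."""
    lam, Q = np.linalg.eigh(S)
    return (Q * np.exp(lam)) @ Q.T

def kron3(A, B, C):
    return np.kron(np.kron(A, B), C)

M1, M2, M3 = random_spd(2), random_spd(3), random_spd(4)
I1, I2, I3 = np.eye(2), np.eye(3), np.eye(4)

# Kronecker sum: factor M^(j) at the j-th position, identities elsewhere
M_sum = kron3(M1, I2, I3) + kron3(I1, M2, I3) + kron3(I1, I2, M3)

lhs = sym_expm(M_sum)                                   # exp of the Kronecker sum
rhs = kron3(sym_expm(M1), sym_expm(M2), sym_expm(M3))   # Kronecker product of exponentials

a = sum(np.linalg.eigvalsh(Mj)[0] for Mj in (M1, M2, M3))    # sum of smallest eigenvalues
b = sum(np.linalg.eigvalsh(Mj)[-1] for Mj in (M1, M2, M3))   # sum of largest eigenvalues
spec = np.linalg.eigvalsh(M_sum)
print(np.max(np.abs(lhs - rhs)), spec[0] - a, spec[-1] - b)
```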
−1 We still need to compute the exp(−αν,I M (j) A(j) ). As described in [138, §14.3.1] and [112], the hierarchical matrix technique allows us to approximate exp −αν,I (M (j) )−1 A(j) with a cost that is almost linear in #Ij . The total Pd number of arithmetic operations is O r j=1 #Ij log∗ #Ij . For #Ij = n (1 ≤ j ≤ d), this expression is O(rdn log∗ n) and depends only linearly on d. For identical A(j) = A(k) , M (j) = M (k) (1 ≤ j, k ≤ d), the cost O(rdn log∗ n) reduces to O(rn log∗ n). Proposition 9.63 can in particular be applied to the Laplace operator and its discretisations as detailed below. Remark 9.64. (a) The negative Laplace operator (9.39) in23 H01 ([0, 1]d ) has the d-term format (9.44) with M (j) = id,
A(j) = −
∂2 , ∂x2j
(j)
λmin = π 2 ,
λ(j) max = ∞.
(b) If we discretise by a finite difference scheme in an equidistant grid of step size $1/n$, $A^{(j)}$ is the tridiagonal matrix$^{24}$ $n^2 \cdot \mathrm{tridiag}\{-1, 2, -1\}$, while $M^{(j)} = I$. The extreme eigenvalues are
\[
\lambda_{\min}^{(j)} = 4 n^2 \sin^2\big(\tfrac{\pi}{2n}\big) \approx \pi^2, \qquad \lambda_{\max}^{(j)} = 4 n^2 \cos^2\big(\tfrac{\pi}{2n}\big) \approx 4 n^2 .
\]
(c) A finite-element discretisation with piecewise linear elements in the same grid leads to the same$^{25}$ matrix $A^{(j)}$, but $M^{(j)}$ is the mass matrix $\mathrm{tridiag}\{\tfrac{1}{6}, \tfrac{2}{3}, \tfrac{1}{6}\}$.

$^{23}$ The reference to $H_0^1([0,1]^d)$ means that zero Dirichlet values are prescribed on the boundary.
$^{24}$ In this case, a cheap, exact evaluation of $\exp(A^{(j)})$ can be obtained by diagonalisation of $A^{(j)}$.
$^{25}$ In fact, both matrices are to be scaled by a factor $1/n$.
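The eigenvalue formulae of Remark 9.64(b) are easy to confirm numerically. The sketch below assumes zero Dirichlet boundary values, so that the matrix has $n-1$ interior unknowns; the value of `n` is an arbitrary illustrative choice.

```python
import numpy as np

n = 64                           # assumed number of grid intervals (step size 1/n)
m = n - 1                        # interior grid points for zero Dirichlet values

# Finite difference matrix n^2 * tridiag{-1, 2, -1}
A = n**2 * (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1))
eigs = np.linalg.eigvalsh(A)     # eigenvalues in ascending order

lam_min = 4 * n**2 * np.sin(np.pi / (2 * n))**2
lam_max = 4 * n**2 * np.cos(np.pi / (2 * n))**2

print(eigs[0], lam_min, np.pi**2)
print(eigs[-1], lam_max, 4 * n**2)
```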
This approach to the inverse allows us to treat cases with large $n$ and $d$. Grasedyck [119] presents examples with $n = 1024$ and $d \approx 1000$. Note that in this case the matrix is of the size $\mathbf{A}^{-1} \in \mathbb{R}^{N \times N}$ with $N \approx 10^{3000}$. The approximate inverse $\mathbf{B}$ from above will be used as preconditioner for the corresponding systems of linear equations (cf. §16.2.3).

The approximation method can be extended to separable differential operators in Cartesian domains $D = \times_{j=1}^{d} D_j$ with appropriate spectra.

Definition 9.65. A differential operator $L$ is called separable if $L = \sum_{j=1}^{d} L_j$ and $L_j$ contains only derivatives with respect to $x_j$ and has coefficients which only depend on $x_j$.

So far, we have applied the exponential sum $E_r \approx 1/x$. Analogous statements can be made about the application of $E_r \approx 1/\sqrt{x}$. Then $r$-term approximations of $\mathbf{A}^{-1/2}$ can be computed.
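9.8.3 Sparse Grids

The mixed Sobolev space $H_{\mathrm{mix}}^{2,p}([0,1]^d)$ for $2 \le p \le \infty$ is the completion of ${}_a\!\bigotimes^{d} H^{2,p}([0,1])$ with respect to the norm
\[
\| f \|_{2,p,\mathrm{mix}} = \Bigg( \sum_{\|\nu\|_\infty \le 2} \int \big| D^{\nu} f(x) \big|^p \, dx \Bigg)^{1/p}
\]
for $p < \infty$ and the obvious modification for $p = \infty$. The approximation properties of sparse grids can be used to estimate $\varepsilon(\mathbf{v}, r)$ in (9.2) with respect to the $L^p$ norm of $\mathbf{V} = {}_{\|\cdot\|_p}\!\bigotimes^{d} L^p([0,1])$.

Remark 9.66. For $\mathbf{v} \in H_{\mathrm{mix}}^{2,p}([0,1]^d)$, the quantity $\varepsilon(\mathbf{v}, r)$ is equal to
\[
\varepsilon(\mathbf{v}, r) = \inf\big\{ \| \mathbf{v} - \mathbf{u} \|_p : \mathbf{u} \in \mathcal{R}_r(\mathbf{V}) \big\} \le O\big( r^{-2} \log^{3(d-1)}(\log r) \big).
\]
Proof. $\mathbf{V}_{\mathrm{sg},\ell}$ is defined in (7.19). Note that the completion of $\bigcup_{\ell \in \mathbb{N}} \mathbf{V}_{\mathrm{sg},\ell}$ yields $\mathbf{V}$. Consider $r = \dim(\mathbf{V}_{\mathrm{sg},\ell}) \approx 2^{\ell} \log^{d-1}(\ell)$ (cf. Bungartz–Griebel [47, (3.63)]). The interpolant $\mathbf{u} \in \mathbf{V}_{\mathrm{sg},\ell}$ of $\mathbf{v}$ satisfies $\| \mathbf{v} - \mathbf{u} \|_p \le O(2^{-2\ell} \log^{d-1}(\ell))$ (cf. [47, Theorem 3.8]). The inequality $2^{-2\ell} \log^{d-1}(\ell) \le r^{-2} \log^{3(d-1)}(\ell) \le O(r^{-2} \log^{3(d-1)}(\log r))$ proves the assertion. □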
Chapter 10
Tensor Subspace Approximation
Abstract The exact representation of $\mathbf{v} \in \mathbf{V} = \bigotimes_{j=1}^{d} V_j$ by a tensor subspace representation (8.5b) may be too expensive because of the high dimensions of the involved subspaces or even impossible since $\mathbf{v}$ is a topological tensor admitting no finite representation. In such cases we must be satisfied with an approximation $\mathbf{u} \approx \mathbf{v}$ which is easier to handle. We require that $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$, i.e., there are bases $\{ b_1^{(j)}, \ldots, b_{r_j}^{(j)} \} \subset V_j$ such that
\[
\mathbf{u} = \sum_{i_1=1}^{r_1} \cdots \sum_{i_d=1}^{r_d} \mathbf{a}[i_1 \cdots i_d] \bigotimes_{j=1}^{d} b_{i_j}^{(j)} . \tag{10.1}
\]
The basic task of this chapter is the following problem:
\[
\text{Given } \mathbf{v} \in \mathbf{V}, \text{ find a suitable approximation } \mathbf{u} \in \mathcal{T}_{\mathbf{r}} \subset \mathbf{V}, \tag{10.2}
\]
where $\mathbf{r} = (r_1, \ldots, r_d) \in \mathbb{N}^d$. Finding $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ means finding coefficients $\mathbf{a}[i_1 \cdots i_d]$ as well as basis vectors $b_i^{(j)} \in V_j$ in (10.1). Problem (10.2) is formulated rather vaguely. If an accuracy $\varepsilon > 0$ is prescribed, $\mathbf{r} \in \mathbb{N}^d$ as well as $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ are to be determined. The strict minimisation of $\| \mathbf{v} - \mathbf{u} \|$ is often replaced by an appropriate approximation $\mathbf{u}$ requiring low computational cost. Instead of $\varepsilon > 0$, we may prescribe the rank vector $\mathbf{r} \in \mathbb{N}^d$ in (10.2). Optimal approximations (so-called 'best approximations') will be studied in Section 10.2. While best approximations require an iterative computation, quasi-optimal approximations can be determined explicitly using the HOSVD basis introduced in Section 8.3. The latter approach is explained in Section 10.1.
10.1 Truncation to $\mathcal{T}_{\mathbf{r}}$

The term 'truncation' (to $\mathcal{T}_{\mathbf{r}}$) is used here for a (nonlinear) map $\tau = \tau_{\mathbf{r}} : \mathbf{V} \to \mathcal{T}_{\mathbf{r}}$ with quasi-optimality properties. Truncation should be seen as a cheaper alternative to the best approximation, which will be discussed in §10.2. Below we describe such truncations based on the higher-order singular-value decomposition (HOSVD) and study the introduced truncation error.

© Springer Nature Switzerland AG 2019 W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics 56, https://doi.org/10.1007/978-3-030-35554-8_10

One of the advantages of the tensor subspace format is the constructive existence of the higher-order singular-value decomposition (cf. §8.3). The related truncation is described in §10.1.1, while §10.1.2 is devoted to the successive HOSVD projection. Examples of HOSVD projections can be found in §10.1.3. A truncation starting from an $r$-term representation is mentioned in §10.1.4. Throughout this section, $\mathbf{V}$ is a Hilbert tensor space with induced scalar product. In practice, we assume $V_j = \mathbb{K}^{I_j}$ with the usual Euclidean scalar product, where $n_j := \#I_j$ denotes the dimension. If $V_j$ should be endowed with a special scalar product $\langle \cdot, \cdot \rangle_j$, apply Remark 2.25.
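10.1.1 HOSVD Projection

The tensor to be approximated will be denoted by $\mathbf{v} \in \mathbf{V}$, while the approximant is denoted by $\mathbf{u}$ (possibly with additional subscripts). The standard assumption is that $\mathbf{v}$ is represented in tensor subspace format, i.e., $\mathbf{v} \in \mathcal{T}_{\mathbf{s}}$ for some $\mathbf{s} \in \mathbb{N}_0^d$, whereas the approximant $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ is sought for some smaller rank$^{1}$ $\mathbf{r} \lneqq \mathbf{s}$.

Optimal approximants $\mathbf{u}_{\mathrm{best}}$ are favourite subjects in theory, but in applications one is often satisfied with quasi-optimal approximations.$^{2}$ We call $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ quasi-optimal if there is a constant $C$ such that
\[
\| \mathbf{v} - \mathbf{u} \| \le C \, \| \mathbf{v} - \mathbf{u}_{\mathrm{best}} \| .
\]
In §2.6 the best rank-$r$ approximation of a matrix with the singular-value decomposition $M = \sum_{i=1}^{s} \sigma_i u_i v_i^{T}$ ($s > r$) is $M_r = \sum_{i=1}^{r} \sigma_i u_i v_i^{T}$ obtained by dropping the terms for $i > r$. The analogous procedure for tensors starts from the HOSVD tensor space representation $\mathbf{v} = \rho_{\mathrm{HOSVD}}\big( \mathbf{a}, (B_j)_{1 \le j \le d} \big) \in \mathcal{T}_{\mathbf{s}}$ (cf. (8.25)), where $B_j = B_j^{\mathrm{HOSVD}}$ is the HOSVD basis. Fix a smaller representation rank $\mathbf{r} \le \mathbf{s}$ and perform the map
\[
\mathbf{v} = \sum_{i_1=1}^{s_1} \cdots \sum_{i_d=1}^{s_d} \mathbf{a}[i_1 \cdots i_d] \bigotimes_{j=1}^{d} b_{i_j}^{(j)} \;\mapsto\; \mathbf{u}_{\mathrm{HOSVD}} = \sum_{i_1=1}^{r_1} \cdots \sum_{i_d=1}^{r_d} \mathbf{a}[i_1 \cdots i_d] \bigotimes_{j=1}^{d} b_{i_j}^{(j)} \in \mathcal{T}_{\mathbf{r}} . \tag{10.3a}
\]
Note that $\mathrm{span}\{ b_\nu^{(j)} : 1 \le \nu \le s_j \} = U_j^{\min}(\mathbf{v})$. Set
\[
U_{r_j}^{(j),\mathrm{HOSVD}} := \mathrm{span}\{ b_\nu^{(j)} : 1 \le \nu \le r_j \} \subset U_j^{\min}(\mathbf{v}) .
\]
The basis vectors are ordered according to $\sigma_1^{(j)} \ge \sigma_2^{(j)} \ge \ldots$ .

$^{1}$ The notation $\mathbf{r} \lneqq \mathbf{s}$ means that $r_j \le s_j$ for all $1 \le j \le d$, but $r_j < s_j$ for at least one index $j$. If $\mathbf{v} \in \mathbf{V} = \bigotimes_{j=1}^{d} \mathbb{K}^{I_j}$ is given in full representation (cf. §7.2), it can be interpreted as $\mathbf{v} \in \mathcal{T}_{\mathbf{s}}$ with $\mathbf{s} := \mathbf{n} = (n_1, \ldots, n_d)$, $n_j = \#I_j$.
$^{2}$ A prominent example is the Galerkin approximation technique, where the Lemma of Céa proves that the Galerkin solution in a certain subspace is quasi-optimal compared with the best approximation in that subspace (cf. [141, Theorem 8.21]).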
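Remark 10.1. (a) The map (10.3a) defines the projection $\mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}}$:
\[
\mathbf{u}_{\mathrm{HOSVD}} = \mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}} \mathbf{v}, \tag{10.3b}
\]
where
\[
\begin{aligned}
P_{r_j}^{(j),\mathrm{HOSVD}} &\quad \text{orthogonal projection onto } U_{r_j}^{(j),\mathrm{HOSVD}}, \\
\mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}} = \bigotimes_{j=1}^{d} P_{r_j}^{(j),\mathrm{HOSVD}} &\quad \text{orthogonal projection onto } \mathbf{U}_{\mathbf{r}}^{\mathrm{HOSVD}},
\end{aligned} \tag{10.3c}
\]
and $\mathbf{U}_{\mathbf{r}}^{\mathrm{HOSVD}} := \bigotimes_{j=1}^{d} U_{r_j}^{(j),\mathrm{HOSVD}}$. $\mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}} \in L(\mathbf{V}, \mathbf{V})$ is called the HOSVD projection onto the subspace $\mathbf{U}_{\mathbf{r}}^{\mathrm{HOSVD}}$.$^{3}$

(b) Let $V_j = \mathbb{K}^{I_j}$ be endowed with the Euclidean scalar product. Define the matrices $\mathbf{B}_{\mathbf{r}} = \bigotimes_{j=1}^{d} B_{r_j}^{(j)}$ by $B_{r_j}^{(j)} = [\, b_1^{(j)} \cdots b_{r_j}^{(j)} \,] \in \mathbb{K}^{I_j \times \{1, \ldots, r_j\}}$ using the HOSVD bases $b_\nu^{(j)} \in \mathbb{K}^{I_j}$. Then
\[
P_{r_j}^{(j),\mathrm{HOSVD}} = \sum_{\nu=1}^{r_j} b_\nu^{(j)} \big( b_\nu^{(j)} \big)^{H} = B_{r_j}^{(j)} \big( B_{r_j}^{(j)} \big)^{H}, \qquad \mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}} = \mathbf{B}_{\mathbf{r}} \mathbf{B}_{\mathbf{r}}^{H} . \tag{10.3d}
\]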
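The first inequality in (10.4) below is described by De Lathauwer–De Moor–Vandewalle [70, Property 10]. While this first inequality yields a concrete error estimate, the second one in (10.4) states quasi-optimality. The constant $C = \sqrt{d}$ shows independence of the dimensions $\mathbf{r} \le \mathbf{s} \le \mathbf{n} = (n_1, \ldots, n_d) \in \mathbb{N}^d$.

Theorem 10.2. Let $\mathbf{V} = \bigotimes_{j=1}^{d} \mathbb{K}^{I_j}$ be endowed with the Euclidean scalar product. Define the orthogonal projection $\mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}}$ as in Remark 10.1 and let $\sigma_i^{(j)}$ be the singular values. Then the HOSVD truncation defined in (10.3b) is a quasi-optimal approximation:
\[
\| \mathbf{v} - \mathbf{u}_{\mathrm{HOSVD}} \| \le \sqrt{ \sum_{j=1}^{d} \sum_{i=r_j+1}^{s_j} \big( \sigma_i^{(j)} \big)^2 } \le \sqrt{d} \, \| \mathbf{v} - \mathbf{u}_{\mathrm{best}} \| , \tag{10.4}
\]
where $\mathbf{u}_{\mathrm{best}} \in \mathcal{T}_{\mathbf{r}}$ yields the minimal error (i.e., $\| \mathbf{v} - \mathbf{u}_{\mathrm{best}} \| = \min_{\mathbf{u} \in \mathcal{T}_{\mathbf{r}}} \| \mathbf{v} - \mathbf{u} \|$).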
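$^{3}$ Note that $P_{r_j}^{(j),\mathrm{HOSVD}}$ and $\mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}}$ depend on the tensor $\mathbf{v} \in \mathbf{V}$ whose singular vectors $b_i^{(j)}$ enter their definition. However, we avoid the notation $P_{r_j}^{(j),\mathrm{HOSVD}}(\mathbf{v})$ since this looks like the application of the projection onto $\mathbf{v}$.

Proof (cf. [120]). Use the shorter notation $P_j = P_{r_j}^{(j),\mathrm{HOSVD}}$, $U_j = U_{r_j}^{(j),\mathrm{HOSVD}}$, $B_j = B_{r_j}^{(j)}$, and $\mathbf{B} = \mathbf{B}_{\mathbf{r}}$. Then
\[
\mathbf{P}_j := I \otimes \ldots \otimes I \otimes B_j B_j^{H} \otimes I \otimes \ldots \otimes I \tag{10.5}
\]
is the projection onto
\[
\mathbf{V}^{(j)} := \mathbb{K}^{I_1} \otimes \ldots \otimes \mathbb{K}^{I_{j-1}} \otimes U_j \otimes \mathbb{K}^{I_{j+1}} \otimes \ldots \otimes \mathbb{K}^{I_d} .
\]
The projection $\mathbf{P}_{\mathbf{r}} = \mathbf{B}\mathbf{B}^{H} = \bigotimes_{j=1}^{d} B_j B_j^{H}$ is the product $\prod_{j=1}^{d} \mathbf{P}_j$ and yields $\| \mathbf{v} - \mathbf{u}_{\mathrm{HOSVD}} \| = \| (I - \prod_{j=1}^{d} \mathbf{P}_j) \mathbf{v} \|$. Lemma 4.146b proves the estimate
\[
\| \mathbf{v} - \mathbf{u}_{\mathrm{HOSVD}} \|^2 \le \sum_{j=1}^{d} \| (I - \mathbf{P}_j) \mathbf{v} \|^2 .
\]
The singular-value decomposition of $\mathcal{M}_j(\mathbf{v})$ used in HOSVD implies that $\mathbf{P}_j \mathbf{v}$ is the best approximation of $\mathbf{v}$ in $\mathbf{V}^{(j)}$ under the condition $\dim(U_j) = r_j$, with error $(I - \mathbf{P}_j)\mathbf{v}$. Error estimate (2.16c) implies
\[
\| (I - \mathbf{P}_j) \mathbf{v} \|^2 = \sum_{i=r_j+1}^{s_j} \big( \sigma_i^{(j)} \big)^2 .
\]
Therefore, the first inequality in (10.4) is proved. The best approximation $\mathbf{u}_{\mathrm{best}}$ belongs to
\[
\mathbb{K}^{I_1} \otimes \ldots \otimes \mathbb{K}^{I_{j-1}} \otimes \tilde{U}_j \otimes \mathbb{K}^{I_{j+1}} \otimes \ldots \otimes \mathbb{K}^{I_d}
\]
with some subspace $\tilde{U}_j$ of dimension $r_j$. Since $\mathbf{P}_j \mathbf{v}$ is the best approximation in this respect, $\| (I - \mathbf{P}_j) \mathbf{v} \|^2 \le \| \mathbf{v} - \mathbf{u}_{\mathrm{best}} \|^2$ holds and proves the second inequality in (10.4). □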
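The HOSVD truncation (10.3b) and the first inequality in (10.4) can be illustrated for a full tensor in $\mathbb{K}^{I_1} \otimes \ldots \otimes \mathbb{K}^{I_d}$. The following is a minimal NumPy sketch, assuming real data and a tensor small enough to be stored in full; the helper names `unfold`, `mode_apply` and `hosvd_truncate` are hypothetical and not taken from the text.

```python
import numpy as np

def unfold(v, j):
    """Matricisation M_j(v): mode j becomes the row index."""
    return np.moveaxis(v, j, 0).reshape(v.shape[j], -1)

def mode_apply(v, M, j):
    """Apply the matrix M to the j-th direction of the tensor v."""
    return np.moveaxis(np.tensordot(M, v, axes=(1, j)), 0, j)

def hosvd_truncate(v, ranks):
    """Return P_r^HOSVD v and the bound sqrt(sum_j sum_{i>r_j} sigma_i^(j)^2)."""
    u = v
    tail2 = 0.0
    for j, r in enumerate(ranks):
        # All singular-value decompositions refer to the original tensor v
        U, s, _ = np.linalg.svd(unfold(v, j), full_matrices=False)
        Bj = U[:, :r]                      # first r_j HOSVD basis vectors
        tail2 += np.sum(s[r:]**2)          # omitted singular values
        u = mode_apply(u, Bj @ Bj.T, j)    # projection B_j B_j^H in direction j
    return u, np.sqrt(tail2)

rng = np.random.default_rng(0)
v = rng.standard_normal((6, 7, 8))
ranks = (3, 3, 4)
u, bound = hosvd_truncate(v, ranks)
err = np.linalg.norm(v - u)
print(f"error = {err:.6f} <= bound = {bound:.6f}")
```

The second inequality in (10.4) cannot be checked this way, since it would require the best approximation itself.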
Corollary 10.3. If $r_j = s_j$ (i.e., no reduction in the $j$-th direction), the sum $\sum_{i=r_j+1}^{s_j}$ in (10.4) vanishes and the bound $\sqrt{d}$ in (10.4) can be improved by $\sqrt{\#\{ j : r_j < s_j \}}$.

Since $\mathbf{u}_{\mathrm{HOSVD}} \in \bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v})$, the statements in Lemma 9.2b–e are still valid. Since all directions are treated equally, the following statement is obvious.

Corollary 10.4. If $\mathbf{v} \in \mathfrak{S}_d(V)$ is a symmetric tensor and $r_1 = \ldots = r_d$, the truncated tensor $\mathbf{u}_{\mathrm{HOSVD}}$ is again symmetric.
10.1.2 Successive HOSVD Projection

The algorithm based on Theorem 10.2 reads as follows. Compute the left-sided singular-value decomposition of $\mathcal{M}_j(\mathbf{v})$ for all $1 \le j \le d$ in order to obtain $B_j$ and $\sigma_i^{(j)}$ ($1 \le i \le s_j$). Having computed all data, the projection $\mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}}$ is applied.

Instead, the projections can be applied sequentially so that the result of the previous projections is already taken into account. The projection $\tilde{\mathbf{P}}_j$ in (10.6) is again $P_{r_j}^{(j),\mathrm{HOSVD}}$, but referring to the singular-value decomposition of the actual tensor $\mathbf{v}_{j-1}$ defined below (instead of $\mathbf{v}_0 = \mathbf{v}$):

Start: $\mathbf{v}_0 := \mathbf{v}$
Loop $j = 1$ to $d$: Perform $\mathrm{HOSVD}_j(\mathbf{v}_{j-1})$ (cf. §8.3.3) yielding the basis $B_j$ and the singular values $\tilde{\sigma}_i^{(j)}$. Let $\tilde{B}_j$ be the restriction of $B_j$ to the first $r_j$ columns and set $\tilde{\mathbf{P}}_j := I \otimes \ldots \otimes I \otimes \tilde{B}_j \tilde{B}_j^{H} \otimes I \otimes \ldots \otimes I$. Define $\mathbf{v}_j := \tilde{\mathbf{P}}_j \mathbf{v}_{j-1}$.
Return: $\tilde{\mathbf{u}}_{\mathrm{HOSVD}} := \mathbf{v}_d$.   (10.6)
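The projection $\tilde{\mathbf{P}}_j$ maps $V_j$ onto some subspace $U_j$ of dimension $r_j$. Hence $\mathbf{v}_j$ belongs to $\mathbf{V}^{(j)} := U_1 \otimes \ldots \otimes U_j \otimes V_{j+1} \otimes \ldots \otimes V_d$. One advantage is that computing the left-sided singular-value decomposition of $\mathcal{M}_j(\mathbf{v}_{j-1})$ may be cheaper than computing $\mathcal{M}_j(\mathbf{v})$ since $\dim(\mathbf{V}^{(j)}) \le \dim(\mathbf{V})$. There is also an argument why this approach may yield better results. Let $\mathbf{v}_1 := \mathbf{P}_1^{\mathrm{HOSVD}} \mathbf{v} = \tilde{\mathbf{P}}_1 \mathbf{v}$ (note that $\mathbf{P}_1^{\mathrm{HOSVD}} = \tilde{\mathbf{P}}_1$, where $\mathbf{P}_1^{\mathrm{HOSVD}}$ from (10.3d) belongs to $\mathbf{v}$) be the result of the first step $j = 1$ of the loop. Projection $\mathbf{P}_1^{\mathrm{HOSVD}}$ splits $\mathbf{v}$ into $\mathbf{v}_1 + \mathbf{v}_1^{\perp}$. If we use the projection $\mathbf{P}_2^{\mathrm{HOSVD}}$ from Theorem 10.2, the singular values $\sigma_i^{(2)}$ select the basis $\hat{B}_j$. The singular value $\sigma_i^{(2)}$ corresponds to the norm of $b_i^{(2)} b_i^{(2)H} \mathbf{v}$,$^{4}$ but what really matters is the size of $b_i^{(2)} b_i^{(2)H} \mathbf{v}_1$, which is the singular value $\tilde{\sigma}_i^{(2)}$ computed in (10.6) from $\mathbf{v}_1$. This proves that the projection $\tilde{\mathbf{P}}_2$ yields a better result than $\mathbf{P}_2^{\mathrm{HOSVD}}$ from Theorem 10.2, i.e.,
\[
\| \mathbf{v} - \mathbf{v}_2 \| = \| \mathbf{v} - \tilde{\mathbf{P}}_2 \tilde{\mathbf{P}}_1 \mathbf{v} \| \le \| \mathbf{v} - \mathbf{P}_2^{\mathrm{HOSVD}} \mathbf{P}_1^{\mathrm{HOSVD}} \mathbf{v} \| .
\]
We can prove an estimate corresponding to the first inequality in (10.4), but now (10.4) becomes an equality. Although $\tilde{\sigma}_i^{(j)} \le \sigma_i^{(j)}$ holds, this does not imply that $\| \mathbf{v} - \tilde{\mathbf{u}}_{\mathrm{HOSVD}} \| \le \| \mathbf{v} - \mathbf{u}_{\mathrm{HOSVD}} \|$. Nevertheless, in general, one should expect the sequential version performs better.

Theorem 10.5. The error of $\tilde{\mathbf{u}}_{\mathrm{HOSVD}}$ in (10.6) is equal to
\[
\| \mathbf{v} - \tilde{\mathbf{u}}_{\mathrm{HOSVD}} \| = \sqrt{ \sum_{j=1}^{d} \sum_{i=r_j+1}^{s_j} \big( \tilde{\sigma}_i^{(j)} \big)^2 } \le \sqrt{d} \, \| \mathbf{v} - \mathbf{u}_{\mathrm{best}} \| . \tag{10.7}
\]
The arising singular values satisfy $\tilde{\sigma}_i^{(j)} \le \sigma_i^{(j)}$, where the values $\sigma_i^{(j)}$ belong to the algorithm from Theorem 10.2.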
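$^{4}$ $b_i^{(2)} b_i^{(2)H}$ applies to the 2nd component: $\big( b_i^{(2)} b_i^{(2)H} \big) \bigotimes_{j=1}^{d} v^{(j)} = v^{(1)} \otimes \langle v^{(2)}, b_i^{(2)} \rangle \, b_i^{(2)} \otimes v^{(3)} \otimes \ldots$

Proof. (i) We split the difference into
\[
\mathbf{v} - \tilde{\mathbf{u}}_{\mathrm{HOSVD}} = (I - \tilde{\mathbf{P}}_d \tilde{\mathbf{P}}_{d-1} \cdots \tilde{\mathbf{P}}_1) \mathbf{v} = (I - \tilde{\mathbf{P}}_1) \mathbf{v} + (I - \tilde{\mathbf{P}}_2) \tilde{\mathbf{P}}_1 \mathbf{v} + \ldots + (I - \tilde{\mathbf{P}}_d) \tilde{\mathbf{P}}_{d-1} \cdots \tilde{\mathbf{P}}_1 \mathbf{v} .
\]
Since the projections commute ($\tilde{\mathbf{P}}_j \tilde{\mathbf{P}}_k = \tilde{\mathbf{P}}_k \tilde{\mathbf{P}}_j$), all terms on the right-hand side are orthogonal. Setting $\mathbf{v}_j = \tilde{\mathbf{P}}_j \cdots \tilde{\mathbf{P}}_1 \mathbf{v}$, we obtain
\[
\| \mathbf{v} - \tilde{\mathbf{u}}_{\mathrm{HOSVD}} \|^2 = \sum_{j=1}^{d} \big\| (I - \tilde{\mathbf{P}}_j) \mathbf{v}_{j-1} \big\|^2 .
\]
Now $\| (I - \tilde{\mathbf{P}}_j) \mathbf{v}_{j-1} \|^2 = \sum_{i=r_j+1}^{s_j} (\tilde{\sigma}_i^{(j)})^2$ finishes the proof of the first part.

(ii) For $j = 1$, the same HOSVD basis is used so that $\tilde{\sigma}_i^{(1)} = \sigma_i^{(1)}$ ($\sigma_i^{(j)}$ are the singular values from Theorem 10.2). For $j \ge 2$ the sequential algorithm uses the singular-value decomposition of $\mathcal{M}_j(\mathbf{v}_{j-1}) = \mathcal{M}_j(\tilde{\mathbf{P}}_{j-1} \cdots \tilde{\mathbf{P}}_1 \mathbf{v})$. The product $\tilde{\mathbf{P}}_{j-1} \cdots \tilde{\mathbf{P}}_1$ is better written as Kronecker product $\tilde{\mathbf{P}} \otimes \mathbf{I}$, where $\tilde{\mathbf{P}} = \bigotimes_{k=1}^{j-1} \tilde{P}_k$ and $\mathbf{I} = \bigotimes_{k=j}^{d} I$. According to (5.5),
\[
\mathcal{M}_j(\mathbf{v}_{j-1}) \mathcal{M}_j(\mathbf{v}_{j-1})^{H} = \mathcal{M}_j(\mathbf{v}) \tilde{\mathbf{P}}^{T} \tilde{\mathbf{P}} \mathcal{M}_j(\mathbf{v})^{H} \le \mathcal{M}_j(\mathbf{v}) \mathcal{M}_j(\mathbf{v})^{H}
\]
holds because $\tilde{\mathbf{P}}^{T} \tilde{\mathbf{P}} = \tilde{\mathbf{P}} \le I$ (cf. Remark 4.145d). By Lemma 2.31b, the singular values satisfy $\tilde{\sigma}_i^{(j)} \le \sigma_i^{(j)}$ ($\tilde{\sigma}_i^{(j)}$ are singular values of $\mathcal{M}_j(\mathbf{v}_{j-1})$, $\sigma_i^{(j)}$ are those of $\mathcal{M}_j(\mathbf{v})$). Therefore the last inequality in (10.7) follows from (10.4). □

The ratio $\| \mathbf{v} - \tilde{\mathbf{u}}_{\mathrm{HOSVD}} \| / \| \mathbf{v} - \mathbf{u}_{\mathrm{best}} \|$ can take all values in $[1, \sqrt{d}\,]$. Note that the statement of Corollary 10.4 is not valid for $\tilde{\mathbf{u}}_{\mathrm{HOSVD}}$.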
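Algorithm (10.6) and the equality in (10.7) can be reproduced for a small full tensor. The sketch below assumes real data and a tensor stored in full; the helper names are hypothetical. In contrast to the simultaneous HOSVD projection, each singular-value decomposition is taken from the already truncated tensor $\mathbf{v}_{j-1}$.

```python
import numpy as np

def unfold(v, j):
    """Matricisation M_j(v): mode j becomes the row index."""
    return np.moveaxis(v, j, 0).reshape(v.shape[j], -1)

def mode_apply(v, M, j):
    """Apply the matrix M to the j-th direction of the tensor v."""
    return np.moveaxis(np.tensordot(M, v, axes=(1, j)), 0, j)

def successive_hosvd(v, ranks):
    """Sequential truncation following (10.6); returns v_d and the error sum."""
    vj = v                                   # v_0 := v
    tail2 = 0.0
    for j, r in enumerate(ranks):
        # SVD of the matricisation of the current tensor v_{j-1}
        U, s, _ = np.linalg.svd(unfold(vj, j), full_matrices=False)
        Bj = U[:, :r]                        # restriction to the first r_j columns
        tail2 += np.sum(s[r:]**2)            # omitted tilde-sigma values
        vj = mode_apply(vj, Bj @ Bj.T, j)    # v_j := P~_j v_{j-1}
    return vj, np.sqrt(tail2)

rng = np.random.default_rng(0)
v = rng.standard_normal((6, 7, 8))
u_seq, sigma_sum = successive_hosvd(v, (3, 3, 4))
err = np.linalg.norm(v - u_seq)
print(f"error = {err:.10f}, sqrt of sum of omitted squares = {sigma_sum:.10f}")
```

A practical implementation would store only the reduced coefficient tensor after each step, which is where the cost advantage mentioned above comes from; the sketch keeps full tensors for readability.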
10.1.3 Examples

Examples 8.28 and 8.29 describe two tensors from $\mathcal{T}_{(2,2,2)} \subset V \otimes V \otimes V$. Here we discuss their truncation to $\mathcal{T}_{(1,1,1)} = \mathcal{R}_1$.

Tensor $\mathbf{v} = x \otimes x \otimes x + \sigma\, y \otimes y \otimes y$ in (8.26) is already given in HOSVD representation. Assuming $1 > \sigma > 0$, the HOSVD projection $P_1^{(j)} := P_1^{(j),\mathrm{HOSVD}}$ is the projection onto $\mathrm{span}\{x\}$; i.e.,
\[
\mathbf{u}_{\mathrm{HOSVD}} := \mathbf{P}_{(1,1,1)}^{\mathrm{HOSVD}} \mathbf{v} = x \otimes x \otimes x \in \mathcal{T}_{(1,1,1)} \tag{10.8}
\]
is the HOSVD projection from Theorem 10.2. Obviously, the error is
\[
\| \mathbf{v} - \mathbf{u}_{\mathrm{HOSVD}} \| = \| \sigma\, y \otimes y \otimes y \| = \sigma = \sigma_2^{(1)}
\]
(cf. Example 8.28) and, therefore, smaller than the upper bound in (10.4). The reason becomes obvious when we apply the factors in $\mathbf{P}_{(1,1,1)}^{\mathrm{HOSVD}} = P_1^{(1)} \otimes P_1^{(2)} \otimes P_1^{(3)}$ sequentially. The first projection already maps $\mathbf{v}$ into the final value $P_1^{(1)} \mathbf{v} = x \otimes x \otimes x$; the following projections cause no additional approximation errors. Accordingly, if we apply the successive HOSVD projection in §10.1.2, the first step of algorithm (10.6) yields $P_1^{(1)} \mathbf{v} = x \otimes x \otimes x \in \mathcal{T}_{(1,1,1)}$, and no additional projections are needed (i.e., $P_1^{(2)} = P_1^{(3)} = \mathrm{id}$). Therefore $\tilde{\mathbf{u}}_{\mathrm{HOSVD}} := P_1^{(1)} \mathbf{v}$ holds, and only the singular value $\sigma_2^{(1)}$ for $j = 1$ appears in the error estimate (10.7).

Example 8.29 uses the tensor $\mathbf{v} = \alpha\, x \otimes x \otimes x + \beta\, x \otimes x \otimes y + \beta\, x \otimes y \otimes x + \beta\, y \otimes x \otimes x$, where $\alpha, \beta$ are chosen such that again $1 = \sigma_1^{(j)} > \sigma_2^{(j)} = \sigma \in [0,1)$ are the singular values for all $j$ (cf. (8.27a)). The HOSVD bases $\{ b_1^{(j)}, b_2^{(j)} \}$ are given in (8.27b). Since $b_i^{(j)} = b_i$ is independent of $j$, we omit the superscript $j$.
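The HOSVD projection yields
\[
\mathbf{u}_{\mathrm{HOSVD}} = \gamma \, b_1 \otimes b_1 \otimes b_1 \quad \text{with } \gamma := \alpha \kappa^3 + 3 \beta \kappa^2 \lambda
\]
and
\[
b_1 = \kappa x + \lambda y, \quad b_2 = \lambda x - \kappa y, \qquad x = \kappa b_1 + \lambda b_2, \quad y = \lambda b_1 - \kappa b_2,
\]
where the coefficients
\[
\kappa = \sqrt{ \frac{1 - \sigma/\sqrt{2}}{(1+\sigma)(1-\sigma)} } \quad \text{and} \quad \lambda = \sqrt{ \frac{\sigma \, (1/\sqrt{2} - \sigma)}{(1+\sigma)(1-\sigma)} }
\]
are functions of the singular value $\sigma = \sigma_2^{(j)}$ and satisfy $\kappa^2 + \lambda^2 = 1$. The error is given by
\[
\| \mathbf{v} - \mathbf{u}_{\mathrm{HOSVD}} \| = \sqrt{3/2} \; \sigma + O(\sigma^2) .
\]
For the special choice $\sigma = 1/10$, the approximation $\mathbf{u}_{\mathrm{HOSVD}}$ and its error are
\[
\mathbf{u}_{\mathrm{HOSVD}} = \otimes^3 (0.968135\, x + 0.247453\, y), \qquad \| \mathbf{v} - \mathbf{u}_{\mathrm{HOSVD}} \| = 0.120158 . \tag{10.9}
\]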
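Next, we consider the successive HOSVD projection in §10.1.2. The first projection yields
\[
\begin{aligned}
\mathbf{u}_{(1)} &= b_1^{(1)} \otimes \big[ (\alpha \kappa + \beta \lambda)\, x \otimes x + \beta \kappa\, x \otimes y + \beta \kappa\, y \otimes x \big] \\
&= b_1^{(1)} \otimes \big[ 0.93126\, x \otimes x + 0.25763\, x \otimes y + 0.25763\, y \otimes x \big] \quad \text{for } \sigma = 1/10
\end{aligned}
\]
and $\| \mathbf{v} - \mathbf{u}_{(1)} \| = \sigma = \sigma_2^{(j)}$. The second projection needs the left-sided singular-value decomposition of
\[
\mathcal{M}_2(\mathbf{u}_{(1)}) = x \otimes b_1^{(1)} \otimes \big( (\alpha \kappa + \beta \lambda)\, x + \beta \kappa\, y \big) + y \otimes b_1^{(1)} \otimes \beta \kappa\, x .
\]
The singular values and left singular vectors for $\sigma = 1/10$ are
\[
\begin{aligned}
\sigma_1^{(2)} &= 0.99778, & b_1^{(2)} &= 0.96824\, x + 0.25\, y, \\
\sigma_2^{(2)} &= 0.066521, & b_2^{(2)} &= 0.25\, x - 0.96824\, y .
\end{aligned}
\]
$b_1^{(2)}$ is quite close to $b_1^{(1)} = 0.96885\, x + 0.24764\, y$. The second projection yields
\[
\begin{aligned}
\mathbf{u}_{(2)} &= b_1^{(1)} \otimes b_1^{(2)} \otimes [\, 0.96609\, x + 0.24945\, y \,] \\
&= [\, 0.96885\, x + 0.24764\, y \,] \otimes [\, 0.96824\, x + 0.25\, y \,] \otimes [\, 0.96609\, x + 0.24945\, y \,]
\end{aligned}
\]
with the error $\| \mathbf{u}_{(1)} - \mathbf{u}_{(2)} \| = \sigma_2^{(2)}$. Since $\mathrm{rank}_3(\mathbf{u}_{(2)}) = 1$, a third projection is not needed, i.e., $\tilde{\mathbf{u}}_{\mathrm{HOSVD}} := \mathbf{u}_{(2)}$. The total error is
\[
\| \mathbf{v} - \tilde{\mathbf{u}}_{\mathrm{HOSVD}} \| = \sqrt{ \big( \sigma_2^{(1)} \big)^2 + \big( \sigma_2^{(2)} \big)^2 } = 0.12010 .
\]
We observe that $\tilde{\mathbf{u}}_{\mathrm{HOSVD}}$ is a bit better than $\mathbf{u}_{\mathrm{HOSVD}}$. However, as a consequence of the successive computations, the resulting tensor $\tilde{\mathbf{u}}_{\mathrm{HOSVD}}$ is not symmetric.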
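10.1.4 Other Truncations

Starting with an $r$-term representation $\mathbf{v} = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} v_\nu^{(j)}$, the procedure in §8.3.3.2 allows an HOSVD representation in the hybrid format in §8.2.6; i.e., the coefficient tensor $\mathbf{a}$ of $\mathbf{v} \in \mathcal{T}_{\mathbf{r}}$ is represented in the $r$-term format $\mathcal{R}_r$.

To avoid the calculations in §8.3.3.2 for large $r$, there are proposals to simplify the truncation. In$^{5}$ [187] reduced singular-value decompositions of the matrices $[v_1^{(j)}, \ldots, v_r^{(j)}]$ are used to project $v_i^{(j)}$ onto a smaller subspace. For the correct scaling of $v_\nu^{(j)}$ define
\[
\omega_\nu^{(j)} := \prod_{k \ne j} \| v_\nu^{(k)} \|, \qquad A_j := \big[\, \omega_1^{(j)} v_1^{(j)}, \cdots, \omega_r^{(j)} v_r^{(j)} \,\big] \in \mathbb{K}^{I_j \times r} .
\]
The reduced left-sided singular-value decomposition of $A_j = \sum_{i=1}^{s_j} \sigma_i^{(j)} u_i^{(j)} w_i^{(j)T}$ ($s_j = \mathrm{rank}(A_j)$) yields $\sigma_i^{(j)}$ and $u_i^{(j)}$. Note that $s_j \ll r$ if $\#I_j \ll r$. Define the orthogonal projection $P_{r_j}^{(j)} = \sum_{i=1}^{r_j} u_i^{(j)} u_i^{(j)H}$ from $\mathbb{K}^{I_j}$ onto $\mathrm{span}_{1 \le i \le r_j}\{ u_i^{(j)} \}$ for some $r_j \le s_j$. Application of $\mathbf{P}_{\mathbf{r}} := \bigotimes_{j=1}^{d} P_{r_j}^{(j)}$ to $\mathbf{v}$ yields the truncated tensor
\[
\tilde{\mathbf{v}} := \mathbf{P}_{\mathbf{r}} \mathbf{v} = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} P_{r_j}^{(j)} v_\nu^{(j)} = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} \sum_{i=1}^{r_j} \big\langle v_\nu^{(j)}, u_i^{(j)} \big\rangle \, u_i^{(j)} .
\]
The right-hand side is given in hybrid format (8.20). The error $\tilde{\mathbf{v}} - \mathbf{v}$ is caused by
\[
d_\nu^{(j)} := \big( I - P_{r_j}^{(j)} \big) v_\nu^{(j)} = \sum_{i=r_j+1}^{s_j} \big\langle v_\nu^{(j)}, u_i^{(j)} \big\rangle \, u_i^{(j)} = \sum_{i=r_j+1}^{s_j} \frac{\sigma_i^{(j)} w_{\nu,i}^{(j)}}{\omega_\nu^{(j)}} \, u_i^{(j)} .
\]
The latter equality uses the singular-value decomposition of $A_j$. The relative error introduced in (7.11b) is $\delta_\nu^{(j)} = \| d_\nu^{(j)} \| / \| v_\nu^{(j)} \|$. Note that $\omega_\nu^{(j)} \| v_\nu^{(j)} \| = \| \mathbf{v}_\nu \|$ with $\mathbf{v}_\nu = \bigotimes_{j=1}^{d} v_\nu^{(j)}$. Since $\{ u_i^{(j)} : 1 \le i \le s_j \}$ are orthonormal,
\[
\big( \delta_\nu^{(j)} \big)^2 = \| \mathbf{v}_\nu \|^{-2} \sum_{i=r_j+1}^{s_j} \big( \sigma_i^{(j)} \big)^2 \big( w_{\nu,i}^{(j)} \big)^2
\]
follows. Orthonormality of $w_i^{(j)}$ proves
\[
\sum_{\nu=1}^{r} \| \mathbf{v}_\nu \|^2 \sum_{j=1}^{d} \big( \delta_\nu^{(j)} \big)^2 = \sum_{\nu=1}^{r} \sum_{j=1}^{d} \sum_{i=r_j+1}^{s_j} \big( \sigma_i^{(j)} w_{\nu,i}^{(j)} \big)^2 = \sum_{j=1}^{d} \sum_{i=r_j+1}^{s_j} \big( \sigma_i^{(j)} \big)^2 .
\]
Remark 10.6. Given a tolerance $\varepsilon > 0$, choose the minimal $j$-rank $r_j \le s_j$ such that $\sum_{j=1}^{d} \sum_{i=r_j+1}^{s_j} (\sigma_i^{(j)})^2 \le \varepsilon^2$. Then the total error is bounded by
\[
\| \tilde{\mathbf{v}} - \mathbf{v} \| \le \sqrt{r} \, \varepsilon .
\]
$^{5}$ In [187, Theorem 2.5d], this approach is called 'reduced HOSVD approximation', although there is no similarity to HOSVD as defined in §8.3.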
Proof. Apply Remark 7.15. □
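The truncation of §10.1.4 acts only on the factor matrices of the $r$-term representation, which makes it easy to sketch. The code below is an illustration under assumptions: real data, factors stored as matrices of shape $(n_j, r)$, and hypothetical helper names. The full tensor is formed only to measure the error and to compare it with the bound $\sqrt{r}\,\varepsilon$ of Remark 10.6, where $\varepsilon^2$ is taken as the sum of the omitted squared singular values.

```python
import numpy as np
from functools import reduce

def full_from_cp(factors):
    """Assemble sum_nu v_nu^(1) (x) ... (x) v_nu^(d) as a full tensor (small sizes only)."""
    r = factors[0].shape[1]
    return sum(reduce(np.multiply.outer, [F[:, nu] for F in factors]) for nu in range(r))

def reduced_truncation(factors, ranks):
    """Project each factor onto the leading left singular vectors of the scaled A_j."""
    norms = np.array([np.linalg.norm(F, axis=0) for F in factors])   # shape (d, r)
    new_factors, tail2 = [], 0.0
    for j, (F, rj) in enumerate(zip(factors, ranks)):
        omega = np.prod(np.delete(norms, j, axis=0), axis=0)         # omega_nu^(j)
        U, s, _ = np.linalg.svd(F * omega, full_matrices=False)      # SVD of A_j
        Uj = U[:, :rj]
        tail2 += np.sum(s[rj:]**2)
        new_factors.append(Uj @ (Uj.T @ F))                          # P^(j) v_nu^(j)
    return new_factors, np.sqrt(tail2)

rng = np.random.default_rng(0)
r, shape, ranks = 10, (6, 6, 6), (3, 3, 3)
factors = [rng.standard_normal((n, r)) for n in shape]
truncated, eps = reduced_truncation(factors, ranks)
err = np.linalg.norm(full_from_cp(factors) - full_from_cp(truncated))
print(f"error = {err:.4f} <= sqrt(r)*eps = {np.sqrt(r) * eps:.4f}")
```

Each direction is processed independently of the others, which reflects the favourable property noted in the text that the projections are determined independently.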
In this case, differently from Theorem 10.2, no comparison with the best approximation, corresponding to (10.4), can be given.$^{6}$ Therefore, starting from a given error bound $\sqrt{r}\,\varepsilon$, the obtained reduced ranks $r_j$ may be much larger than those obtained from HOSVD. In this case the truncation $\mathbf{v} \mapsto \mathbf{P}\mathbf{v}$ can be followed by the ALS iteration in §10.3. A favourable difference to the HOSVD projection is the fact that the projections $P_j$ are determined independently.
10.1.5 $L^\infty$ Estimate of the Truncation Error

We provide a general result about the truncation error, which shows in a quantitative manner how properties of the tensor are inherited by the truncated tensor and the truncation error. In particular, this statement applies to smooth multivariate functions. Together with the results of §4.5.4, we obtain $L^\infty$ error estimates, which are of extreme importance for pointwise evaluations of (truncated) multivariate functions.

We consider (possibly unbounded) operators $A^{(j)}$ on$^{7}$ $V_j$ (its domain $D(\cdot)$ is defined in (6.26)). The extension of $A^{(j)}$ to $\mathbf{V}$ is given by
\[
\mathbf{A}_j := I \otimes I \otimes \ldots \otimes A^{(j)} \otimes \ldots \otimes I
\]
with $A^{(j)}$ at the $j$-th position. The following estimates are slightly different for the (global) truncation $\mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}}$ from (10.3c) and the successive truncation $\tilde{\mathbf{P}}_{\mathbf{r}}^{\mathrm{HOSVD}}$ in §10.1.2.

Lemma 10.7. Suppose that $\mathbf{v} \in D(\mathbf{A}_j)$ for some $j \in \{1, \ldots, d\}$. Then the projections $\Pi \in \{ \mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}}, \tilde{\mathbf{P}}_{\mathbf{r}}^{\mathrm{HOSVD}} \}$ satisfy
\[
\| \mathbf{A}_j \Pi \mathbf{v} \| \le \| \mathbf{A}_j \mathbf{v} \|, \tag{10.10a}
\]
\[
\| \mathbf{A}_j (I - \Pi) \mathbf{v} \| \le c_F \| \mathbf{A}_j \mathbf{v} \| \quad \text{with } c_F = \begin{cases} \sqrt{2} & \text{if } \Pi = \mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}}, \\ \sqrt{3} & \text{if } \Pi = \tilde{\mathbf{P}}_{\mathbf{r}}^{\mathrm{HOSVD}} . \end{cases} \tag{10.10b}
\]

$^{6}$ As counterexample consider a tensor $\mathbf{v} = \mathbf{v}' + \varepsilon \mathbf{v}''$, where $\mathbf{v}'' = \mathbf{v}_n \in \mathcal{R}_2$ is taken from (9.10) with $n \gg 1/\varepsilon$, while $\mathbf{v}' \in \mathcal{R}_{r-2}$ has a stable representation. Together, $\mathbf{v}$ has an $r$-term representation, where the two terms related to $\mathbf{v}_n$ are dominant and lead to the largest singular values $\sigma_i^{(j)}$. The projection described above omits parts of $\mathbf{v}'$, while $\varepsilon \mathbf{v}''$ is hardly changed. The ratio $\sigma_i^{(j)} / \sigma_1^{(j)} \cong \sigma_i^{(j)} / (n \varepsilon)$ is not related to the relative error.
$^{7}$ $A^{(j)}$ may be a mapping from $D(A^{(j)}) \subset V_j$ into $V_j$ or into another Hilbert space $W_j$. In the latter case, the operator norm has to be changed accordingly.
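Proof. (i) First we study $\Pi = \mathbf{P}_{\mathbf{r}}^{\mathrm{HOSVD}} = \bigotimes_{j=1}^{d} P_{r_j}^{(j),\mathrm{HOSVD}}$. Since all directions are of equal nature, we fix $j = 1$ in (10.10a,b) and consider $\mathbf{A}_1$. Set
\[
\Pi_j := I \otimes I \otimes \ldots \otimes P_{r_j}^{(j),\mathrm{HOSVD}} \otimes \ldots \otimes I .
\]
These projections are mutually commutative and their product (in any order) yields $\Pi$. $\mathbf{A}_1 \Pi_j = \Pi_j \mathbf{A}_1$ holds for all $j \ge 2$ so that
\[
\mathbf{A}_1 \Pi \mathbf{v} = \mathbf{A}_1 \Pi_2 \Pi_3 \cdots \Pi_d \Pi_1 \mathbf{v} = \Pi_2 \Pi_3 \cdots \Pi_d \mathbf{A}_1 \Pi_1 \mathbf{v} .
\]
As in the proof of Proposition 6.47, $\Pi_1 \mathbf{v} = \Pi_{[1]} \mathbf{v}$ holds, where $\Pi_{[1]} \in L(\mathbf{V}_{[1]}, \mathbf{V}_{[1]})$ is the orthogonal projection onto $\mathrm{span}\{ m_i^{[1]} : 1 \le i \le r_1 \}$ with $m_i^{[1]}$ from $\mathcal{M}_1(\mathbf{v}) = \sum_{i=1}^{s_1} \sigma_i^{(1)} b_i^{(1)} \otimes m_i^{[1]}$ (cf. (8.23)). Therefore we can continue the previous equation,$^{8}$
\[
\mathbf{A}_1 \Pi \mathbf{v} = \Pi_2 \Pi_3 \cdots \Pi_d \mathbf{A}_1 \Pi_1 \mathbf{v} = \Pi_2 \Pi_3 \cdots \Pi_d \mathbf{A}_1 \Pi_{[1]} \mathbf{v} = \Pi_2 \Pi_3 \cdots \Pi_d \Pi_{[1]} \mathbf{A}_1 \mathbf{v},
\]
and obtain (10.10a). The truncation error is estimated by
\[
\| \mathbf{A}_1 (\mathbf{v} - \Pi \mathbf{v}) \| \le \| I - \Pi_2 \Pi_3 \cdots \Pi_d \Pi_{[1]} \| \, \| \mathbf{A}_1 \mathbf{v} \| .
\]
Note that $P_1 := \Pi_2 \Pi_3 \cdots \Pi_d$ and $P_2 := \Pi_{[1]}$ are two orthogonal projections. Lemma 4.146a proves the estimate $\| I - P_1 P_2 \| \le \sqrt{2}$.

(ii) In the case of the successive truncation, the order of the directions is important. We may assume that $\Pi = \tilde{\mathbf{P}}_{\mathbf{r}}^{\mathrm{HOSVD}} = \Pi_d \cdot \Pi_{d-1} \cdot \ldots \cdot \Pi_1$, where
\[
\Pi_j := I \otimes I \otimes \ldots \otimes \tilde{P}_{j,\mathrm{HOSVD}}^{(r_j)} \otimes \ldots \otimes I .
\]
Again, $\mathbf{A}_j$ commutes with $\Pi_k$ for all $k \ne j$. Since $\Pi_j$ is defined by the SVD of $\tilde{\mathbf{v}} := \Pi_{j-1} \cdot \ldots \cdot \Pi_1 \mathbf{v}$, we can use the fact that $\Pi_j \tilde{\mathbf{v}} = \Pi_{[j]} \tilde{\mathbf{v}}$, where $\Pi_{[j]}$ is defined analogously to $\Pi_{[1]}$ in Part (i). As $\mathbf{A}_j$ commutes with $\Pi_{[j]}$, we have
\[
\mathbf{A}_j \Pi_d \cdots \Pi_j \cdots \Pi_1 \mathbf{v} = \mathbf{A}_j \Pi_d \cdots \Pi_{[j]} \cdots \Pi_1 \mathbf{v} = \Pi_d \cdots \Pi_{[j]} \cdots \Pi_1 \mathbf{A}_j \mathbf{v},
\]
again proving (10.10a). Since now three orthogonal projections $\Pi_d \cdots \Pi_{j+1}$, $\Pi_{[j]}$, and $\Pi_{j-1} \cdots \Pi_1$ are involved, Lemma 4.146b yields the bound $\sqrt{3}$. □

Now we consider the function spaces
\[
V_j = L^2(\Omega_j), \qquad \Omega_j \subset \mathbb{R}, \qquad \text{and} \quad \mathbf{V} = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j = L^2(\Omega) .
\]

$^{8}$ By assumption $\mathbf{v}$ belongs to the domain of $\mathbf{A}_1$. Reading the next equality from right to left, we conclude from $\Pi_{[1]} \mathbf{A}_1 \mathbf{v} = \mathbf{A}_1 \Pi_{[1]} \mathbf{v}$ that also $\Pi_{[1]} \mathbf{v}$ belongs to the domain of $\mathbf{A}_1$, etc. Hence, $\Pi \mathbf{v} = \mathbf{w}$ belongs to the domain.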
Instead of $\mathbf{v}$, we write $f$ and $\tilde{f} := \Pi f$. The operator $A^{(j)}$ is chosen as $\partial^m / \partial x_j^m$. We recall the semi-norm $|\cdot|_m$ in (4.71). Lemma 10.7 yields
\[
| f - \tilde{f} |_m^2 = \sum_{j=1}^{d} \| \mathbf{A}_j (I - \Pi) f \|_{L^2}^2 \le c_F^2 \sum_{j=1}^{d} \| \mathbf{A}_j f \|_{L^2}^2 = c_F^2 \, | f |_m^2 ,
\]
i.e., up to the factor of $c_F$, the error is as smooth as the function itself. Combining this inequality with the Gagliardo–Nirenberg inequality from Theorem 4.153, we obtain part (a) of the following result.

Proposition 10.8. Assume that the suppositions of Theorem 4.153 are valid for a domain $\Omega \in \{ \mathbb{R}^d, [0,\infty)^d \}$, and let $c_F$ be as in (10.10b).
(a) If $|f|_m < \infty$ for some $m > d/2$, the error $\delta f := \tilde{f} - f$ of the tensor truncation allows an estimate with respect to the supremum norm:
\[
\| \delta f \|_\infty \le c_m^{\Omega} \, c_F^{\frac{d}{2m}} \, | f |_m^{\frac{d}{2m}} \, \| \delta f \|_{L^2}^{1 - \frac{d}{2m}} . \tag{10.11a}
\]
(b) If $|f|_m \lesssim \mu^m$ as $m \to \infty$, the estimate
\[
\| \delta f \|_\infty \le c^{\Omega} \mu^{d/2} \, \| \delta f \|_{L^2} \quad \text{with } c^{\Omega} := \liminf_{m \to \infty} c_m^{\Omega} \tag{10.11b}
\]
is valid, where $c^{\Omega} = \pi^{-d/2}$ holds for $\Omega = \mathbb{R}^d$.
(c) If $|f|_m^{1/m} \le \mu m^p$, the asymptotic behaviour for $\| \delta f \|_{L^2} \to 0$ is described by
\[
\| \delta f \|_\infty \lesssim c^{\Omega} \Big[ \mu e \, \log^p \Big( \frac{1}{\| \delta f \|_{L^2}} \Big) \Big]^{d/2} \| \delta f \|_{L^2} . \tag{10.11c}
\]
(d) The analogous statements hold for $\Omega = [0,1]^d$ with $|\cdot|_m^2$ replaced by $|\cdot|_m^2 + \|\cdot\|_{L^2}^2$.

Proof. Part (b) is obtained from (10.11a) for $m \to \infty$. $c^{\Omega} = \pi^{-d/2}$ is stated in Proposition 4.159. For Part (c) assume that $\| \delta f \|_{L^2} < \frac{1}{e}$ and set $m^* := -\log(\| \delta f \|_{L^2})$. Then (10.11c) follows since $\| \delta f \|_{L^2}^{-\frac{d}{2m^*}} = e^{d/2}$. □
Example 10.9. (a) The product of sinc functions $f(x) = \prod_{j=1}^{d} \frac{\sin(A x_j)}{x_j}$ ($A > 0$) satisfies $|f|_m = (2m+1)^{-1/2} \pi^{d/2} A^{m + d/2}$, i.e., $|f|_m \lesssim \mu^m$ with $\mu = A$.
(b) The Gaussian function $f(x) = \exp(-A x^2)$ with $A > 0$ leads to
\[
| f |_m^2 \approx d \sqrt{2} \, (2A)^{m - d/2} \, \pi^{d/2} \, \big( m - \tfrac{1}{2} \big)^{m - 1/2} e^{-m + 1/2}
\]
so that (10.11c) can be applied with $\mu = \sqrt{2A/e}$ and $p = 1/2$.
(c) $f(x) = \prod_{j=1}^{d} \frac{1}{\cosh(x_j)}$ has derivatives estimated by $|f|_m^{1/m} \le \mu m^p$ with $\mu = 1/e$ and $p = 1$ (cf. Hackbusch [135]).
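10.2 Best Approximation in the Tensor Subspace Format

10.2.1 General Setting

As in §9.1, two approximation problems can be formulated. Let $\mathbf{V} = \bigotimes_{j=1}^{d} V_j$ be a Banach tensor space with norm $\|\cdot\|$. In the first version we fix the format $\mathcal{T}_{\mathbf{r}}$:
\[
\text{Given } \mathbf{v} \in \mathbf{V} \text{ and } \mathbf{r} = (r_1, \ldots, r_d) \in \mathbb{N}^d, \text{ determine } \mathbf{u} \in \mathcal{T}_{\mathbf{r}} \text{ minimising } \| \mathbf{v} - \mathbf{u} \| . \tag{10.12}
\]
Again, we may form the infimum
\[
\varepsilon(\mathbf{v}, \mathbf{r}) := \varepsilon(\mathbf{r}) := \inf \{ \| \mathbf{v} - \mathbf{u} \| : \mathbf{u} \in \mathcal{T}_{\mathbf{r}} \} . \tag{10.13}
\]
The variation over all $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ includes the variation over all subspaces $U_j \subset V_j$ of dimension $r_j$:
\[
\varepsilon(\mathbf{v}, \mathbf{r}) = \inf_{\substack{U_1 \subset V_1 \text{ with} \\ \dim(U_1) = r_1}} \; \inf_{\substack{U_2 \subset V_2 \text{ with} \\ \dim(U_2) = r_2}} \ldots \inf_{\substack{U_d \subset V_d \text{ with} \\ \dim(U_d) = r_d}} \; \Big( \inf_{\mathbf{u} \in \bigotimes_{j=1}^{d} U_j} \| \mathbf{v} - \mathbf{u} \| \Big) .
\]
The existence of a best approximation $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ with $\| \mathbf{v} - \mathbf{u} \| = \varepsilon(\mathbf{v}, \mathbf{r})$ will be discussed in §10.2.2.2. Practical computations are usually restricted to the Euclidean norm (see §10.2.2.3).

In the following second variant the roles of $\mathbf{r}$ and $\varepsilon(\mathbf{r})$ are reversed:$^{9}$
\[
\text{Given } \mathbf{v} \in \mathbf{V} \text{ and } \varepsilon > 0, \text{ determine } \mathbf{r} \text{ and } \mathbf{u} \in \mathcal{T}_{\mathbf{r}} \text{ with } \| \mathbf{v} - \mathbf{u} \| \le \varepsilon \text{ and minimal storage.} \tag{10.14}
\]
The following lemma is the analogue of Lemma 9.2 and can be proved similarly.

Lemma 10.10. Assume that $\mathbf{V}$ is a Hilbert tensor space with induced scalar product. The best approximation from Problem (10.12) and at least one of the solutions of Problem (10.14) belong to the subspace $\mathbf{U}(\mathbf{v}) := {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v})$ (cf. (6.17)). Consequently, the statements (b–e) from Lemma 9.2 are again valid.

As an illustration, we discuss the Examples 8.28 and 8.29. The HOSVD projection (10.8) from Example 8.28 is already the best approximation. For Example 8.29 (with $\sigma = 1/10$) we make the symmetric ansatz$^{10}$ $\mathbf{u}(\xi, \eta) := \otimes^3 (\xi x + \eta y)$. Minimisation of $\| \mathbf{v} - \mathbf{u}(\xi, \eta) \|$ over $\xi, \eta \in \mathbb{R}$ yields the optimum
\[
\mathbf{u}_{\mathrm{best}} := \otimes^3 (0.96756588\, x + 0.24968136\, y), \qquad \| \mathbf{v} - \mathbf{u}_{\mathrm{best}} \| = 0.120083,
\]
which is only insignificantly better than $\| \mathbf{v} - \mathbf{u}_{\mathrm{HOSVD}} \| = 0.120158$ in (10.9).

In §10.2.2 we shall analyse Problem (10.12) for which the rank vector $\mathbf{r}$ is fixed. The second Problem (10.14) will be addressed in §10.3.3.

$^{9}$ In principle, we would like to ask for $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ with $\| \mathbf{v} - \mathbf{u} \| \le \varepsilon$ and $\mathbf{r}$ as small as possible, but this question may not lead to a unique $\mathbf{r}_{\min}$. The storage size of $\mathbf{u}$ is a scalar value depending on $\mathbf{r}$ and attains a minimum.
$^{10}$ This is justified by Proposition 9.5 because of $\mathcal{R}_1 = \mathcal{T}_{(1,1,1)}$.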
10.2.2 Approximation with Fixed Format

10.2.2.1 Matrix Case d = 2

The solution of Problem (10.12) has already been discussed in Conclusion 2.36 for the Euclidean (Frobenius) norm. Let $r = r_1 = r_2 < \min\{n_1, n_2\}$. Determine the singular-value decomposition $\sum_{i=1}^{\min\{n_1, n_2\}} \sigma_i u_i \otimes v_i$ of the tensor $\mathbf{v} \in \mathbb{K}^{n_1} \otimes \mathbb{K}^{n_2}$. Then $B_1 = [u_1, \ldots, u_r]$ and $B_2 = [v_1, \ldots, v_r]$ contain the optimal orthonormal bases. The solution of Problem (10.12) is
\[
\mathbf{u} = \sum_{i=1}^{r} \sigma_i u_i \otimes v_i .
\]
The coefficient tensor is $\mathbf{a} = \mathbf{B}^{H} \mathbf{v} = \mathrm{diag}\{\sigma_1, \ldots, \sigma_r\}$. The error $\| \mathbf{v} - \mathbf{u} \|$ is equal to $\sqrt{\sum_{i=r+1}^{\min\{n_1, n_2\}} \sigma_i^2}$ (cf. (2.23b)), while the maximised value of $\| \mathbf{B}^{H} \mathbf{v} \|$ is $\sqrt{\sum_{i=1}^{r} \sigma_i^2}$. Non-uniqueness occurs if $\sigma_r = \sigma_{r+1}$ (cf. Conclusion 2.36).
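For $d = 2$ the statements above reduce to the truncated singular-value decomposition and can be checked directly. The sketch assumes a real matrix, so that $\mathbf{B}^{H} \mathbf{v}$ becomes $B_1^{T} v B_2$; the sizes and the rank are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal((8, 6))      # tensor v in K^{n1} (x) K^{n2}, stored as a matrix
r = 3

U, s, Vh = np.linalg.svd(v, full_matrices=False)
B1 = U[:, :r]                        # optimal basis [u_1, ..., u_r]
B2 = Vh[:r, :].T                     # optimal basis [v_1, ..., v_r]

a = B1.T @ v @ B2                    # coefficient tensor a = B^H v
u = B1 @ a @ B2.T                    # best approximation u = sum_i sigma_i u_i (x) v_i

err = np.linalg.norm(v - u)          # Frobenius norm of the error
print(np.round(np.diag(a), 6), np.round(s[:r], 6))
print(err, np.sqrt(np.sum(s[r:]**2)))
```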
10.2.2.2 Existence of a Minimiser

The following assumptions hold in particular in the finite-dimensional case.

Theorem 10.11. Let $\mathbf{V} = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$ be a reflexive Banach tensor space with the property (6.14). Then the subset $\mathcal{T}_{\mathbf{r}} \subset \mathbf{V}$ is weakly closed. For any $\mathbf{v} \in \mathbf{V}$, Problem (10.12) has a solution; i.e., for given finite representation ranks $r_j \le \dim(V_j)$ there are subspaces $U_j \subset V_j$ with $\dim(U_j) = r_j$ and a tensor
\[
\mathbf{u}_{\min} \in \mathbf{U} = \bigotimes_{j=1}^{d} U_j \subset \mathcal{T}_{\mathbf{r}}
\]
such that
\[
\| \mathbf{v} - \mathbf{u}_{\min} \| = \inf_{\mathbf{u} \in \mathcal{T}_{\mathbf{r}}} \| \mathbf{v} - \mathbf{u} \| .
\]
Proof. By Lemma 8.6, $\mathcal{T}_{\mathbf{r}}$ is weakly closed. Thus, Theorem 4.33 proves the existence of a minimiser. □

For (infinite-dimensional) Hilbert spaces $V_j$, the statement of Theorem 10.11 is proved differently by Uschmajew [286, Corollary 23].

Concerning non-uniqueness of the best approximation, the observations for the matrix case $d = 2$ mentioned in §10.2.2.1 are still valid for larger $d$.

Remark 10.12. If $d \ge 2$ and $\dim(V_j) > r_j > 0$ for at least one $j \in \{1, \ldots, d\}$, uniqueness$^{11}$ of the minimiser $\mathbf{u}_{\min}$ cannot be guaranteed.

$^{11}$ Since there are often misunderstandings we emphasise that uniqueness of $\mathbf{u}_{\min}$ is meant, not uniqueness of its representation by $\mathbf{a}[i_1 \cdots i_d]$ and $b_i^{(j)}$.
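10.2.2.3 Optimisation with Respect to the Euclidean Norm

The Hilbert structure enables additional characterisations. For orthogonal projections we refer to §4.4.3.

Lemma 10.13. (a) Given a fixed subspace $\mathbf{U} = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} U_j$, the minimiser of $\| \mathbf{v} - \mathbf{u} \|$ over all $\mathbf{u} \in \mathbf{U}$ is explicitly described by
\[
\mathbf{u} = \mathbf{P}_{\mathbf{U}} \mathbf{v}, \tag{10.15a}
\]
where $\mathbf{P}_{\mathbf{U}}$ is the orthogonal projection onto $\mathbf{U}$. Pythagoras' equality yields
\[
\| \mathbf{v} \|^2 = \| \mathbf{u} \|^2 + \| \mathbf{v} - \mathbf{u} \|^2 . \tag{10.15b}
\]
(b) $\mathbf{P}_{\mathbf{U}}$ may be written as Kronecker product
\[
\mathbf{P}_{\mathbf{U}} = \bigotimes_{j=1}^{d} P_{\overline{U_j}} \qquad ( P_{\overline{U_j}} \text{ orthogonal projection onto } \overline{U_j} ). \tag{10.15c}
\]
Proof. (i) By the definition of $\mathbf{U} := {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} U_j$, this subspace is closed and (10.15a) follows from Remark 4.145c. By Remark 4.145e, $I - \mathbf{P}_{\mathbf{U}}$ is the orthogonal projection onto $\mathbf{U}^{\perp}$. Since $\mathbf{P}_{\mathbf{U}} \mathbf{v} \in \mathbf{U}$ and $(I - \mathbf{P}_{\mathbf{U}}) \mathbf{v} \in \mathbf{U}^{\perp}$ are orthogonal,
\[
\| \mathbf{v} \|^2 = \| \mathbf{P}_{\mathbf{U}} \mathbf{v} + (I - \mathbf{P}_{\mathbf{U}}) \mathbf{v} \|^2 = \| \mathbf{P}_{\mathbf{U}} \mathbf{v} \|^2 + \| (I - \mathbf{P}_{\mathbf{U}}) \mathbf{v} \|^2
\]
follows. Now $\mathbf{P}_{\mathbf{U}} \mathbf{v} = \mathbf{u}$ and $(I - \mathbf{P}_{\mathbf{U}}) \mathbf{v} = \mathbf{v} - \mathbf{u}$ yield (10.15b).
(ii) (10.15c) is trivial. Note that $P_{\overline{U_j}}$ uses the closed subspace $\overline{U_j}$ since closedness of $U_j$ is not yet assumed. □

Next, we consider the special case of finite-dimensional $V_j = \mathbb{K}^{I_j}$, $n_j = \#I_j$, endowed with the Euclidean norm (and, therefore, also the Euclidean scalar product). Let $\mathbf{r} = (r_1, \ldots, r_d)$ be the prescribed dimensions and set $J_j := \{1, \ldots, r_j\}$. With each subspace $U_j$ of dimension $r_j$ we associate an orthonormal basis $B_j = [\, b_1^{(j)} \cdots b_{r_j}^{(j)} \,] \in \mathbb{K}^{I_j \times J_j}$. Then $P_{U_j} = B_j B_j^{H} \in \mathbb{K}^{I_j \times I_j}$ is the orthogonal projection onto$^{12}$ $U_j$ (cf. Remark 4.145f). Using (10.15c), we obtain the representation
\[
\mathbf{P}_{\mathbf{U}} = \bigotimes_{j=1}^{d} B_j B_j^{H} = \mathbf{B} \mathbf{B}^{H} \quad \text{with } \mathbf{B} := \bigotimes_{j=1}^{d} B_j \in \mathbb{K}^{\mathbf{I} \times \mathbf{J}},
\]
where $\mathbf{I} := I_1 \times \ldots \times I_d$ and $\mathbf{J} := J_1 \times \ldots \times J_d$.

Remark 10.14. Under the assumptions from above, the following minimisation problems are equivalent:
\[
\min_{\mathbf{u} \in \mathcal{T}_{\mathbf{r}}} \| \mathbf{v} - \mathbf{u} \| = \min_{B_j \in \mathbb{K}^{I_j \times J_j}} \Big\{ \| \mathbf{v} - \mathbf{B} \mathbf{B}^{H} \mathbf{v} \| : \mathbf{B} = \bigotimes_{j=1}^{d} B_j, \; B_j^{H} B_j = I \Big\} . \tag{10.16}
\]

$^{12}$ Note that $\overline{U_j} = U_j$ because of the finite dimension.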
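Proof. Any $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ belongs to some subspace $\mathbf{U} = \bigotimes_{j=1}^{d} U_j$ with $\dim(U_j) = r_j$; hence, $\mathbf{u} = \mathbf{B} \mathbf{B}^{H} \mathbf{v}$ holds for a suitable $\mathbf{B}$, proving $\min_{\mathbf{B}} \| \mathbf{v} - \mathbf{B} \mathbf{B}^{H} \mathbf{v} \| \le \| \mathbf{v} - \mathbf{u} \|$. On the other hand, the tensor $\mathbf{B} \mathbf{B}^{H} \mathbf{v}$ belongs to $\mathcal{T}_{\mathbf{r}}$ so that $\min_{\mathbf{u} \in \mathcal{T}_{\mathbf{r}}} \| \mathbf{v} - \mathbf{u} \| \le \| \mathbf{v} - \mathbf{B} \mathbf{B}^{H} \mathbf{v} \|$. □

Lemma 10.15. The minimisation problem $\mathbf{u}^{*} := \arg\min_{\mathbf{u} \in \mathcal{T}_{\mathbf{r}}} \| \mathbf{v} - \mathbf{u} \|$ is equivalent to the following maximisation problem:$^{13}$
\[
\text{Find } \mathbf{B} \text{ with } \mathbf{B} = \bigotimes_{j=1}^{d} B_j, \; B_j \in \mathbb{K}^{I_j \times J_j}, \; B_j^{H} B_j = I, \text{ such that } \| \mathbf{B}^{H} \mathbf{v} \| \text{ is maximal.} \tag{10.17}
\]
If $\hat{\mathbf{B}} := \arg\max_{\mathbf{B}} \| \mathbf{B}^{H} \mathbf{v} \|$, then $\mathbf{u}^{*} = \hat{\mathbf{B}} \mathbf{a}$ with $\mathbf{a} := \hat{\mathbf{B}}^{H} \mathbf{v} \in \mathbb{K}^{\mathbf{J}}$. If $\mathbf{B}$ is a solution of (10.17), $\mathbf{B}\mathbf{Q}$ with $\mathbf{Q} = \bigotimes_{j=1}^{d} Q_j$ and unitary $Q_j \in \mathbb{K}^{J_j \times J_j}$, is also a solution.

Proof. As a consequence of (10.15b), minimisation of $\| \mathbf{v} - \mathbf{u} \|$ is equivalent to the maximisation of $\| \mathbf{u} \|$. By Remark 10.14, $\mathbf{u} = \mathbf{B} \mathbf{B}^{H} \mathbf{v}$ holds for some orthogonal matrix $\mathbf{B}$ so that
\[
\| \mathbf{u} \|^2 = \langle \mathbf{u}, \mathbf{u} \rangle = \langle \mathbf{B} \mathbf{B}^{H} \mathbf{v}, \mathbf{B} \mathbf{B}^{H} \mathbf{v} \rangle = \langle \mathbf{B}^{H} \mathbf{v}, \mathbf{B}^{H} \mathbf{B} \mathbf{B}^{H} \mathbf{v} \rangle = \langle \mathbf{B}^{H} \mathbf{v}, \mathbf{B}^{H} \mathbf{v} \rangle = \| \mathbf{B}^{H} \mathbf{v} \|^2
\]
(cf. Exercise 8.19). The last assertion follows from $\| \mathbf{B}^{H} \mathbf{v} \| = \| \mathbf{Q}^{H} \mathbf{B}^{H} \mathbf{v} \|$. □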
The reformulation (10.17), which is due to De Lathauwer–De Moor–Vandewalle [72, Theorem 4.2], is the basis of the ALS method described in the next section. Newton-based methods are described by Ishteva et al. [173] and Eldén–Savas [86].
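10.2.3 Properties

Let $\mathbf{V}$ be a Hilbert tensor space with induced scalar product. The best approximation $\mathbf{u}_{\mathrm{best}} \in \mathcal{T}_{\mathbf{r}}$ of $\mathbf{v} \in \mathbf{V}$ shares the same properties as the HOSVD projection (cf. Hackbusch [144]):
(a) $\|A_j \mathbf{u}_{\mathrm{best}}\| \le \|A_j \mathbf{v}\|$ for all $A_j := I \otimes \ldots \otimes A^{(j)} \otimes \ldots \otimes I$, $A^{(j)} \in L(V_j, V_j)$.
(b) The above inequality also holds for unbounded maps $A^{(j)}$ acting in $V_j$ if $\mathbf{v}$ belongs to its domain.
(c) Linear constraints of $\mathbf{v} \in \mathbf{V}$ as defined in §6.8 are conserved by $\mathbf{u}_{\mathrm{best}}$.
(d) The $L^\infty$ estimates of §10.1.5 are valid.

For a proof consider the orthogonal projection $\Phi_{[j]}$ from $\mathbf{V}$ onto $\mathbf{U}_{[j]}^{\min}(\mathbf{u}_{\mathrm{best}})$. Set $\mathbf{u} := (\mathrm{id}_j \otimes \Phi_{[j]})\mathbf{v} \in V_j \otimes \mathbf{U}_{[j]}^{\min}(\mathbf{u}_{\mathrm{best}})$. The $\mathcal{T}_{\mathbf{r}}$ truncation by HOSVD can also be written as $\mathrm{id}_j \otimes \Psi_{[j]}$, showing that $\mathbf{u}$ is already in $\mathcal{T}_{\mathbf{r}}$. As both the HOSVD truncation and the best approximation are optimal, $\mathbf{u} = \mathbf{u}_{\mathrm{best}}$ follows. The properties (a)–(d) are a result of the identity $\mathbf{u}_{\mathrm{best}} = (\mathrm{id}_j \otimes \Phi_{[j]})\mathbf{v}$.

$^{13}$ The orthogonal matrices $B_j$ from (10.17) form the so-called Stiefel manifold.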
10 Tensor Subspace Approximation
10.3 Alternating Least-Squares Method (ALS)

10.3.1 Algorithm

Problem (10.17) is an optimisation problem. The maximisation is done with respect to the $d$ orthogonal matrices $B_j \in \mathbb{K}^{I_j \times J_j}$. The function $\Phi(B_1, \ldots, B_d) := \|\mathbf{B}^{\mathsf H}\mathbf{v}\|^2$ is a quadratic function of the $B_j$ entries. As discussed in §9.6.2.1, a standard method for optimising multivariate functions is the iterative optimisation with respect to a single parameter. In this case we consider $B_j$ as one parameter and obtain the following iteration. De Lathauwer–De Moor–Vandewalle [72, Alg. 4.2] call this method HOOI, the ‘higher-order orthogonal iteration’. We use the term ‘alternating least-squares method’, although it is an alternating largest-squares method with the side conditions $B_j^{\mathsf H} B_j = I$.

Start: Choose $B_j^{(0)} \in \mathbb{K}^{I_j \times J_j}$ $(1 \le j \le d)$ (cf. Remark 10.19c), set $m := 1$.
Loop: For $j = 1$ to $d$ do compute $B_j^{(m)}$ as maximiser of
\[ B_j^{(m)} := \operatorname*{argmax}_{B_j \text{ with } B_j^{\mathsf H} B_j = I} \Phi\big(B_1^{(m)}, \ldots, B_{j-1}^{(m)}, B_j, B_{j+1}^{(m-1)}, \ldots, B_d^{(m-1)}\big). \tag{10.18} \]
Set $m := m + 1$ and repeat the iteration.

The concrete realisation will be discussed in §10.3.2. Here we give some general statements. Define $\mathbf{v}_{j,m} \in \mathbb{K}^{J_1 \times \ldots \times J_{j-1} \times I_j \times J_{j+1} \times \ldots \times J_d}$ by
\[ \mathbf{v}_{j,m} := \big( B_1^{(m)} \otimes \ldots \otimes B_{j-1}^{(m)} \otimes \mathrm{id} \otimes B_{j+1}^{(m-1)} \otimes \ldots \otimes B_d^{(m-1)} \big)^{\mathsf H} \mathbf{v}. \tag{10.19} \]
During the iteration (10.18) we are looking for an orthogonal matrix $B_j \in \mathbb{K}^{I_j \times J_j}$ so that $B_j^{\mathsf H}\mathbf{v}_{j,m}$ has maximal norm. Here and in the sequel, the short notation $B_j$, when applied to a tensor, means $\mathrm{id} \otimes \ldots \otimes B_j \otimes \ldots \otimes \mathrm{id}$.

Lemma 10.16. The maximiser $B_j = [\, b_1^{(j)} \cdots b_{r_j}^{(j)} \,] \in \mathbb{K}^{I_j \times J_j}$ is given by the first $r_j$ columns (singular vectors) of $U$ in the reduced left-sided singular-value decomposition $\mathcal{M}_j(\mathbf{v}_{j,m}) = U \Sigma V^{\mathsf T}$. Moreover, $B_j B_j^{\mathsf H} = P_{r_j}^{(j),\mathrm{HOSVD}}$ is the HOSVD projection.

Proof. The statements are easily derived by
\[ \|B_j^{\mathsf H}\mathbf{v}_{j,m}\| \underset{\text{Lemma 5.6}}{=} \|\mathcal{M}_j(B_j^{\mathsf H}\mathbf{v}_{j,m})\| \underset{\text{Remark 5.9}}{=} \|B_j^{\mathsf H}\mathcal{M}_j(\mathbf{v}_{j,m})\| = \|B_j^{\mathsf H} U \Sigma V^{\mathsf T}\| \underset{V \text{ unitary}}{=} \|B_j^{\mathsf H} U \Sigma\|. \]
□
Remark 10.17. (a) The construction of $B_j$ requires $\operatorname{rank}(\mathcal{M}_j(\mathbf{v}_{j,m})) \ge r_j$ since otherwise $U$ does not have enough columns. In the latter case, we either add arbitrarily chosen orthonormal vectors from $\operatorname{range}(U)^\perp$ or continue with decreased $j$-th representation rank $r_j$.
(b) If $\operatorname{rank}(\mathcal{M}_j(\mathbf{v}_{j,m})) = r_j$, any orthonormal basis $B_j$ of $\operatorname{range}(\mathcal{M}_j(\mathbf{v}_{j,m}))$ is the solution of (10.18).
(c) Note that initial values $B_j^{(0)}$ with $\operatorname{range}(B_j^{(0)}) \perp \operatorname{range}(\mathcal{M}_j(\mathbf{v}))$ for at least one $j \ge 2$ lead to $\mathbf{v}_{1,1} = 0$.
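In the sequel, we assume that such failures of (10.18) do not appear. We introduce the index sets
\[ I_j = \{1, \ldots, n_j\}, \quad J_j = \{1, \ldots, r_j\}, \quad \mathbf{I}_{[j]} = \mathop{\times}_{k \in \{1,\ldots,d\} \setminus \{j\}} I_k, \quad \mathbf{J}_{[j]} = \mathop{\times}_{k \in \{1,\ldots,d\} \setminus \{j\}} J_k \tag{10.20} \]
and the tensor
\[ \mathbf{B}_{[j]} := B_1^{(m)} \otimes \ldots \otimes B_{j-1}^{(m)} \otimes B_{j+1}^{(m-1)} \otimes \ldots \otimes B_d^{(m-1)} \in \mathbb{K}^{\mathbf{I}_{[j]} \times \mathbf{J}_{[j]}}. \]
Remark 5.9 shows that
\[ \mathcal{M}_j(\mathbf{v}_{j,m}) = \mathcal{M}_j\big( (\mathbf{B}_{[j]}^{\mathsf H} \otimes \mathrm{id}_j)\mathbf{v} \big) = \mathcal{M}_j(\mathbf{v})\, \overline{\mathbf{B}}_{[j]}. \]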
This proves the first part of the following remark.

Remark 10.18. Assume $\operatorname{rank}(\mathcal{M}_j(\mathbf{v}_{j,m})) \ge r_j$. Matrix $U$ from Lemma 10.16 is obtained by diagonalising
\[ \mathcal{M}_j(\mathbf{v}_{j,m})\, \mathcal{M}_j(\mathbf{v}_{j,m})^{\mathsf H} = \mathcal{M}_j(\mathbf{v})\, \overline{\mathbf{B}}_{[j]} \mathbf{B}_{[j]}^{\mathsf T}\, \mathcal{M}_j(\mathbf{v})^{\mathsf H} = U \Sigma^2 U^{\mathsf H}. \]
All maximisers $B_j^{(m)}$ in (10.18) satisfy $\operatorname{range}(B_j^{(m)}) \subset U_j^{\min}(\mathbf{v})$.

Proof. Use $\operatorname{range}(B_j^{(m)}) \subset \operatorname{range}(\mathcal{M}_j(\mathbf{v})\overline{\mathbf{B}}_{[j]}) \subset \operatorname{range}(\mathcal{M}_j(\mathbf{v})) = U_j^{\min}(\mathbf{v})$. □

Remark 10.19. (a) The function values $\Phi(B_1^{(m)}, \ldots, B_j^{(m)}, B_{j+1}^{(m-1)}, \ldots, B_d^{(m-1)})$ increase weakly monotonically to a maximum of $\Phi$. The sequence $B_j^{(m)}$ has a convergent subsequence.
(b) The determined maximum of $\Phi$ may be a local one (cf. Example 9.49).
(c) The better the starting values $B_j^{(0)}$ are, the better are the chances to obtain the global maximum of $\Phi$. A good choice for $B_j^{(0)}$ can be obtained from the HOSVD projection $P_{\mathbf{r}}^{\mathrm{HOSVD}} = \bigotimes_{j=1}^{d} B_j^{(0)} B_j^{(0)\mathsf H}$, denoted by $\mathbf{B}_{\mathbf{r}}\mathbf{B}_{\mathbf{r}}^{\mathsf H}$ in (10.3d).

For a detailed discussion of this and related methods we refer to De Lathauwer–De Moor–Vandewalle [72]. It turns out that the larger the gaps $\sigma_{r_j}^{(j)} - \sigma_{r_j+1}^{(j)}$ are, the greater is the chance of fast convergence to the global maximum.
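10.3.2 ALS for Different Formats

The realisation described above involves $\mathcal{M}_j(\mathbf{v})\overline{\mathbf{B}}_{[j]} \in \mathbb{K}^{I_j \times \mathbf{J}_{[j]}}$ and its left-sided singular-value decomposition. The corresponding computations depend on the format of $\mathbf{v}$. Note that the choice of the format is independent of the fact that the optimal solution $\mathbf{u}$ is sought in tensor subspace format $\mathcal{T}_{\mathbf{r}}$. We start with the case of the full tensor representation.

10.3.2.1 Full Format

The tensors $\mathbf{v}_{j,m}$ or equivalently their matricisations $\mathcal{M}_j(\mathbf{v}_{j,m})$ must be determined.$^{14}$ The direct computation of the iterate $\mathbf{v}_{j,m} = \mathbf{B}_{[j]}^{\mathsf H}\mathbf{v}$ from the tensor $\mathbf{v}$ costs
\[ 2 \sum_{k=1}^{j-1} \prod_{\ell=1}^{k} r_\ell \prod_{\ell=k}^{d} n_\ell \;+\; 2\, \frac{n_j}{r_j} \sum_{k=j+1}^{d} \prod_{\ell=1}^{k} r_\ell \prod_{\ell=k}^{d} n_\ell \]
operations. This number is bounded by $2 \sum_{k=1}^{j-1} \bar r^{\,k} n^{d-k+1} + 2 \sum_{k=j+1}^{d} \bar r^{\,k-1} n^{d-k+2}$, where $n := \max_j n_j$ and $\bar r := \max_j r_j$. If $\bar r \ll n$, the leading term is $2 r_1 \prod_{\ell=1}^{d} n_\ell \le 2 \bar r n^d$.

Instead, we can determine $B_d^{(m-1)\mathsf H}\mathbf{v}$, $(B_{d-1}^{(m-1)} \otimes B_d^{(m-1)})^{\mathsf H}\mathbf{v}$, $\ldots$ at the expense of more memory. Note that the sizes of these tensors are decreasing. Having computed a new $B_1^{(m)}$, we can obtain $\mathbf{v}_{2,m}$ from $B_1^{(m)\mathsf H} (B_3^{(m-1)} \otimes \ldots \otimes B_d^{(m-1)})^{\mathsf H}\mathbf{v}$, etc. Using these data, we need
\[ 2 \sum_{j=2}^{d} \Big[ \prod_{\ell=1}^{j} n_\ell \Big] \Big[ \prod_{\ell=j}^{d} r_\ell \Big] \;+\; 2 \sum_{j=2}^{d} \sum_{k=1}^{j-1} \Big[ \prod_{\ell=1}^{k} r_\ell \Big] \Big[ \prod_{\ell=k}^{j} n_\ell \Big] \Big[ \prod_{\ell=j+1}^{d} r_\ell \Big] \]
operations to determine all $d$ tensors $\mathbf{v}_{1,m}, \mathbf{v}_{2,m}, \ldots, \mathbf{v}_{d,m}$. As soon as $\mathbf{v}_{j,m}$ is determined, the computation of $\mathcal{M}_j(\mathbf{v}_{j,m})\mathcal{M}_j(\mathbf{v}_{j,m})^{\mathsf H}$ and its diagonalisation requires $n_j^2 \prod_{\ell \ne j} r_\ell + \frac{8}{3} n_j^3$ operations. We summarise the total cost per iteration $m \mapsto m + 1$ for different ratios $r/n$.
(a) If $\bar r \ll n$, the leading cost is $4 \bar r n^d$.
(b) If $\bar r < n$, the asymptotic cost is $n^d \bar r \big( 4 + 6 \frac{\bar r}{n} + (\frac{\bar r}{n})^{d-2} + O((\frac{\bar r}{n})^2) \big)$.
(c) If $\bar r \approx n$, so that $\bar r \le n$ can be used, the leading bound is $(d^2 + 2d - 2)\, n^{d+1}$.

$^{14}$ A precomputation of the Gram matrix $C := \overline{\mathbf{B}}_{[j]} \mathbf{B}_{[j]}^{\mathsf T}$ followed by the evaluation of the product $\mathcal{M}_j(\mathbf{v})\, C\, \mathcal{M}_j(\mathbf{v})^{\mathsf H} \in \mathbb{K}^{I_j \times I_j}$ is more expensive.

10.3.2.2 r-Term Format

Now $\mathbf{v} = \sum_{i=1}^{r} \bigotimes_{j=1}^{d} v_i^{(j)}$ is assumed. The projections $\mathbf{v} \mapsto B_j^{\mathsf H}\mathbf{v}$ can be performed independently for all $j$:
\[ w_i^{(j)} := B_j^{\mathsf H} v_i^{(j)} \in \mathbb{K}^{J_j} \]
(cost: $2r \sum_{j=1}^{d} r_j n_j$). The projected iterate $\mathbf{v}_{j,m} = \mathbf{B}_{[j]}^{\mathsf H}\mathbf{v}$ (cf. (10.19)) takes the form
\[ \mathbf{v}_{j,m} = \sum_{i=1}^{r} \Big( \bigotimes_{k=1}^{j-1} w_i^{(k)} \Big) \otimes v_i^{(j)} \otimes \Big( \bigotimes_{k=j+1}^{d} w_i^{(k)} \Big). \]
Therefore the computation of $\mathcal{M}_j(\mathbf{v}_{j,m})\mathcal{M}_j(\mathbf{v}_{j,m})^{\mathsf H} \in \mathbb{K}^{I_j \times I_j}$ requires Gram matrices $G_k$ similar to those in (8.32a), but now the entries are scalar products $\langle w_\nu^{(k)}, w_\mu^{(k)} \rangle$ in $\mathbb{K}^{J_j}$ instead of $\mathbb{K}^{I_j}$. Furthermore, the QR decomposition $[\, v_1^{(j)} \cdots v_r^{(j)} \,] = Q_j R_j$ with $\tilde r_j = \operatorname{rank}(Q_j R_j) \le \min\{n_j, r\}$ (line 3 in (8.32a)) can be performed once and for all. The matrix $U_j$ from the diagonalisation $\mathcal{M}_j(\mathbf{v}_{j,m})\mathcal{M}_j(\mathbf{v}_{j,m})^{\mathsf H} = U_j \Sigma_j U_j^{\mathsf H}$ is obtained as in (8.32a). Repeating the proof of Remark 8.32, we obtain a total computational cost of
\[ \sum_{j=1}^{d} \Big( r^2 r_j + 2 r^2 \tilde r_j + r \tilde r_j^2 + \tfrac{8}{3} \tilde r_j^3 + 2 r_j \tilde r_j n_j + 2 r r_j n_j \Big) \tag{10.21} \]
per iteration $m \mapsto m + 1$. For $r \ll n := \max_j n_j$, $\bar r := \max_j r_j \ll n$, the dominating term is $4 d r \bar r n$.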
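10.3.2.3 Tensor Subspace Format

We recall that the index sets $J_j$, $\mathbf{J}$, and $\mathbf{J}_{[j]}$ together with the representation ranks $r_j$ are fixed in (10.20) and used for the format of the optimal solution $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ of Problem (10.12). For $\mathbf{v} \in \mathbf{V} = \mathbb{K}^{\mathbf{I}}$ we introduce representation ranks $s_j$ and
\[ \hat J_j = \{1, \ldots, s_j\}, \quad \hat{\mathbf{J}} = \hat J_1 \times \ldots \times \hat J_d, \quad \hat{\mathbf{J}}_{[j]} = \mathop{\times}_{k \in \{1,\ldots,d\} \setminus \{j\}} \hat J_k, \]
and assume
\[ \mathbf{v} = \rho_{\mathrm{orth}}\big( \hat{\mathbf{a}}, (\hat B_j)_{j=1}^{d} \big) = \hat{\mathbf{B}}\hat{\mathbf{a}} \quad\text{with}\quad \hat{\mathbf{a}} \in \mathbb{K}^{\hat{\mathbf{J}}},\ \hat B_j \in \mathbb{K}^{I_j \times \hat J_j},\ \hat{\mathbf{B}} = \bigotimes_{j=1}^{d} \hat B_j \in \mathbb{K}^{\mathbf{I} \times \hat{\mathbf{J}}} \tag{10.22a} \]
and $I_j$ and $\mathbf{I}$ as in (10.20). If $\mathbf{v}$ is given in the general format $\rho_{\mathrm{frame}}$, we first have to orthonormalise the bases (cf. §8.2.4.2).

The bases in (10.22a) determine the spaces $U_j := \operatorname{range}(\hat B_j)$. According to Remark 10.19c, we should compute the HOSVD representation (and the corresponding truncation to $\mathcal{T}_{\mathbf{r}}$). If $\hat B_j$ is the HOSVD basis, even $U_j = U_j^{\min}(\mathbf{v})$ holds; otherwise, $U_j^{\min}(\mathbf{v}) \subset U_j$. We recall that the best approximation $\mathbf{u} = \mathbf{u}_{\mathrm{best}} \in \mathcal{T}_{\mathbf{r}}$ of Problem (10.12) belongs to $U_j^{\min}(\mathbf{v})$ (cf. Lemma 10.10):
\[ \mathbf{u}_{\mathrm{best}} \in \mathbf{U}(\mathbf{v}) := \bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v}) \quad\text{and}\quad U_j^{\min}(\mathbf{v}) \subset U_j = \operatorname{range}(\hat B_j). \tag{10.22b} \]
The usual advantage of the tensor subspace format is that the major part of the computations can be performed using the smaller coefficient tensor $\hat{\mathbf{a}}$. This statement is also true for the best approximation in $\mathcal{T}_{\mathbf{r}}$. Because of (10.22b), there is a coefficient tensor $\hat{\mathbf{c}}_{\mathrm{best}}$ such that
\[ \mathbf{u}_{\mathrm{best}} = \hat{\mathbf{B}}\hat{\mathbf{c}}_{\mathrm{best}}. \]
Orthonormality of the bases $\hat B_j$ ensures that
\[ \|\mathbf{v} - \mathbf{u}_{\mathrm{best}}\| = \|\hat{\mathbf{a}} - \hat{\mathbf{c}}_{\mathrm{best}}\| \]
holds for the Euclidean norms in $\mathbb{K}^{\mathbf{I}}$ and $\mathbb{K}^{\hat{\mathbf{J}}}$, respectively.

Proposition 10.20. (a) Minimisation of $\|\mathbf{v} - \mathbf{u}\|$ over $\mathcal{T}_{\mathbf{r}}(\mathbb{K}^{\mathbf{I}})$ is equivalent to minimisation of $\|\hat{\mathbf{a}} - \hat{\mathbf{c}}\|$ over $\mathcal{T}_{\mathbf{r}}(\mathbb{K}^{\hat{\mathbf{J}}})$. If $\hat{\mathbf{c}}_{\mathrm{best}}$ is found, $\mathbf{u}_{\mathrm{best}} := \hat{\mathbf{B}}\hat{\mathbf{c}}_{\mathrm{best}}$ is the desired solution.
(b) Let $\hat{\mathbf{c}}_{\mathrm{best}} = \rho_{\mathrm{orth}}\big(\mathbf{a}, (\beta_j)_{j=1}^{d}\big)$ with $\mathbf{a} \in \mathbb{K}^{\mathbf{J}}$ and $\beta_j = [\, \beta_1^{(j)} \cdots \beta_{r_j}^{(j)} \,] \in \mathbb{K}^{\hat J_j \times J_j}$ be the representation in $\mathcal{T}_{\mathbf{r}}(\mathbb{K}^{\hat{\mathbf{J}}})$ and set $\boldsymbol{\beta} := \bigotimes_{j=1}^{d} \beta_j$. Then
\[ \mathbf{u}_{\mathrm{best}} = \rho_{\mathrm{orth}}\big( \mathbf{a}, (B_j)_{j=1}^{d} \big) = \mathbf{B}\mathbf{a} \quad\text{with}\quad \mathbf{a} \in \mathbb{K}^{\mathbf{J}},\ B_j := \hat B_j \beta_j \in \mathbb{K}^{I_j \times J_j},\ \mathbf{B} := \hat{\mathbf{B}}\boldsymbol{\beta} \in \mathbb{K}^{\mathbf{I} \times \mathbf{J}} \]
is the orthonormal tensor subspace representation of $\mathbf{u}_{\mathrm{best}}$.

Proof. Part (b) follows from Remark 8.23. □

Application of the ALS iteration to $\hat{\mathbf{a}} \in \mathbb{K}^{\hat{\mathbf{J}}}$ requires the cost stated in §10.3.2.1 with $n_j$ replaced by $s_j := \#\hat J_j$.

10.3.2.4 Hybrid Format

Proposition 10.20 is again valid, but now $\hat{\mathbf{a}}$ is given in r-term format. The cost of one ALS iteration applied to $\hat{\mathbf{a}} \in \mathbb{K}^{\hat{\mathbf{J}}}$ is given by (10.21) with $n_j$ replaced by $s_j := \#\hat J_j$.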
10.3.2.5 Special Case r = (1, . . . , 1)

An interesting special case is given by
\[ r_j = 1 \text{ for all } 1 \le j \le d, \quad\text{i.e.,}\quad \mathbf{r} = (1, \ldots, 1), \]
since then
\[ \max_{\mathbf{B}} \|\mathbf{B}^{\mathsf H}\mathbf{v}\| = \|\mathbf{v}\|_\vee \]
describes the injective crossnorm in §4.3.1.3 and §4.5.2. De Lathauwer–De Moor–Vandewalle [72] propose an iteration, which they call the higher-order power method, since, for $d = 2$, it corresponds to the power method.

The basis $B_j \in \mathbb{K}^{I_j \times J_j}$ in (10.16) reduces to one vector $b^{(j)} := b_1^{(j)}$. The map
\[ \mathbf{B}_{[j]} = b^{(1)\mathsf H} \otimes \ldots \otimes b^{(j-1)\mathsf H} \otimes \mathrm{id} \otimes b^{(j+1)\mathsf H} \otimes \ldots \otimes b^{(d)\mathsf H} \in L(\mathbf{V}, V_j) \]
acts on elementary vectors as
\[ \mathbf{B}_{[j]} \Big( \bigotimes_{k=1}^{d} v^{(k)} \Big) = \Big[ \prod_{k \ne j} \langle v^{(k)}, b^{(k)} \rangle \Big]\, v^{(j)}. \]
The higher-order power method applied to $\mathbf{v} \in \mathbf{V}$ can be formulated as follows:

start: choose $b^{(j)}$, $1 \le j \le d$, with $\|b^{(j)}\| = 1$
iteration $m = 1, 2, \ldots$: for $j := 1$ to $d$ do begin $b^{(j)} := \mathbf{B}_{[j]}(\mathbf{v})$; $\lambda := \|b^{(j)}\|$; $b^{(j)} := b^{(j)}/\lambda$ end;
return: $\mathbf{u} := \lambda \bigotimes_{j=1}^{d} b^{(j)} \in \mathcal{T}_{(1,\ldots,1)}$.

Further comments on this method are given by De Lathauwer–De Moor–Vandewalle [72, §3].
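The higher-order power method above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tensor is real and stored as a dictionary from index tuples to values, the starting vectors are normalised constant vectors, and the names `apply_B_bracket` and `higher_order_power_method` are hypothetical and not taken from [72].

```python
import itertools
import math

def apply_B_bracket(v, shape, vecs, j):
    """Contract the tensor v with vecs[k] in every direction k != j (real case)."""
    out = [0.0] * shape[j]
    for idx in itertools.product(*(range(n) for n in shape)):
        w = v[idx]
        for k, i in enumerate(idx):
            if k != j:
                w *= vecs[k][i]
        out[idx[j]] += w
    return out

def higher_order_power_method(v, shape, sweeps=20):
    """Return (lam, vecs); the approximation is lam * vecs[0] x ... x vecs[d-1]."""
    # Start: normalised constant vectors (an arbitrary choice for this sketch).
    vecs = [[1.0 / math.sqrt(n)] * n for n in shape]
    lam = 0.0
    for _ in range(sweeps):
        for j in range(len(shape)):
            b = apply_B_bracket(v, shape, vecs, j)
            lam = math.sqrt(sum(x * x for x in b))
            if lam == 0.0:
                raise ValueError("iterate vanished; choose other starting vectors")
            vecs[j] = [x / lam for x in b]
    return lam, vecs

# Example: an exactly rank-1 tensor 3 * a (x) b (x) c with unit vectors a, b, c.
a, b, c = [0.6, 0.8], [2 / 3, 1 / 3, 2 / 3], [0.8, -0.6]
shape = (2, 3, 2)
v = {(i, j, k): 3.0 * a[i] * b[j] * c[k]
     for i in range(2) for j in range(3) for k in range(2)}
lam, vecs = higher_order_power_method(v, shape)
```

For a tensor that is itself elementary, as in this example, the iteration reproduces it and `lam` equals its norm. For general tensors the result may be only a local maximiser, in line with Remark 10.19b.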
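10.3.3 Approximation with Fixed Accuracy

Consider the tensor space $\mathbf{V} = \bigotimes_{j=1}^{d} V_j$ with $V_j = \mathbb{K}^{I_j}$, $n_j = \#I_j$, equipped with the Euclidean norm. A tensor from $\mathbf{V}$ given in the format $\mathbf{v} \in \mathcal{T}_{\mathbf{s}}$ with tensor subspace rank $\mathbf{s} = (s_1, \ldots, s_d) \in \mathbb{N}_0^d$ requires storage of the size
\[ N_{\mathbf{s}} := \sum_{j=1}^{d} s_j n_j + \prod_{j=1}^{d} s_j \quad \text{(cf. Remark 8.8a,b).} \]
An approximation $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ with $\mathbf{r} \lneqq \mathbf{s}$ leads to a reduced storage $N_{\mathbf{r}}$ (cf. Footnote 1). Given some $\varepsilon > 0$, there is a subset $R_\varepsilon \subset \mathbb{N}_0^d$ of smallest vectors $\mathbf{r} \in \mathbb{N}_0^d$ satisfying $\min_{\mathbf{u} \in \mathcal{T}_{\mathbf{r}}} \|\mathbf{v} - \mathbf{u}\| \le \varepsilon$; i.e.,$^{15}$
\[ R_\varepsilon := \Big\{ \mathbf{r} \in \mathbb{N}_0^d : \ 0 \le \mathbf{r} \le \mathbf{s},\ \min_{\mathbf{u} \in \mathcal{T}_{\mathbf{r}}} \|\mathbf{v} - \mathbf{u}\| \le \varepsilon, \text{ and } \mathbf{r} = 0 \text{ or } \min_{\mathbf{u} \in \mathcal{T}_{\mathbf{s}}} \|\mathbf{v} - \mathbf{u}\| > \varepsilon \text{ for all } 0 \le \mathbf{s} \lneqq \mathbf{r} \Big\}. \]
Let $\mathbf{r}^*$ be the minimiser of the cost $\min\{N_{\mathbf{r}} : \mathbf{r} \in R_\varepsilon\}$ and choose a minimiser$^{16}$ $\mathbf{u}^* \in \mathcal{T}_{\mathbf{r}^*}$ of $\min\{\|\mathbf{v} - \mathbf{u}\| : \mathbf{u} \in \mathcal{T}_{\mathbf{r}^*}\}$. Then $\mathbf{u}^*$ is the solution of Problem (10.14). Since neither $\mathbf{r}^* \in \mathbb{N}_0^d$ nor $\mathbf{u}^* \in \mathcal{T}_{\mathbf{r}^*}$ are necessarily unique minimisers, and in particular because of the comment in Footnote 16, Problem (10.14) admits, in general, many solutions.

To obtain $R_\varepsilon$ we need to know the minimal errors $\varepsilon_{\mathbf{r}} := \min_{\mathbf{u} \in \mathcal{T}_{\mathbf{r}}} \|\mathbf{v} - \mathbf{u}\|$. As seen in §10.3, computing $\varepsilon_{\mathbf{r}}$ is not a trivial task. Instead, we shall use a heuristic strategy which is again based on the higher-order singular-value decomposition.

$^{15}$ The exceptional case $\mathbf{r} = 0$ occurs if $\|\mathbf{v}\| \le \varepsilon$. Then $\mathbf{u} = 0 \in \mathcal{T}_0$ is a sufficient approximation.
$^{16}$ To solve Problem (10.14), it suffices to take any $\mathbf{u}^* \in \mathcal{T}_{\mathbf{r}^*}$ with $\|\mathbf{v} - \mathbf{u}^*\| \le \varepsilon$. Among all possible $\mathbf{u}^*$ with this property, the minimiser is the most appreciated solution.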
First, we have to compute the HOSVD tensor subspace representation of $\mathbf{v}$. This includes the determination of the singular values $\sigma_i^{(j)}$ $(1 \le j \le d,\ 1 \le i \le s_j)$ and the corresponding basis vectors $b_i^{(j)}$. The more basis vectors $b_i^{(j)}$ can be omitted, the larger is the reduction in memory size. More precisely, the saved storage
\[ \Delta N_j(\mathbf{s}) := N_{\mathbf{s}} - N_{\mathbf{s}^{(j)}} = n_j + \prod_{k \ne j} s_k \quad\text{with}\quad \mathbf{s}^{(j)} := (\ldots, s_{j-1}, s_j - 1, s_{j+1}, \ldots), \]
depends on the size of $n_j$ and $\mathbf{s}$. The maximum over all $\Delta N_j(\mathbf{s}) / (\sigma_{s_j}^{(j)})^2$ is attained for some $j^*$. Dropping the component corresponding to $b_{s_{j^*}}^{(j^*)}$ yields the best decrease of storage cost. Hence $\mathbf{v}$ is replaced with $\mathbf{u} := P_{s_{j^*}-1}^{(j^*),\mathrm{HOSVD}}\mathbf{v}$ (projection defined in (10.3c)). Note that $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ for $\mathbf{r} := \mathbf{s}^{(j^*)}$. Since the values $\Delta N_j(\mathbf{r})$ for $j \ne j^*$ are smaller than $\Delta N_j(\mathbf{s})$, the singular values are now weighted by $\Delta N_j(\mathbf{r}) / (\sigma_{r_j}^{(j)})^2$. Their maximiser $j^*$ is used for the next projection $\mathbf{u} := P_{r_{j^*}-1}^{(j^*),\mathrm{HOSVD}}\mathbf{u}$. These reductions are repeated as long as the sum of the omitted squared singular values $(\sigma_{r_{j^*}}^{(j^*)})^2$ does not exceed $\varepsilon^2$. The corresponding algorithm reads as follows:

1  start  $\mathbf{u} := \mathbf{v}$; $\mathbf{r} := \mathbf{s}$; $\tau := \varepsilon^2$, compute the HOSVD of $\mathbf{v}$
2  loop   $J := \{ j \in \{1, \ldots, d\} : (\sigma_{r_j}^{(j)})^2 < \tau \}$;
3         if $J = \emptyset$ then halt;
4         determine $\Delta N_j(\mathbf{r})$ for $j \in J$;
5         $j^* := \operatorname{argmax}\{ \Delta N_j(\mathbf{r}) / (\sigma_{r_j}^{(j)})^2 : j \in J \}$;
6         $\mathbf{u} := P_{r_{j^*}-1}^{(j^*),\mathrm{HOSVD}}\mathbf{u}$; $\tau := \tau - (\sigma_{r_{j^*}}^{(j^*)})^2$;
7         if $r_{j^*} > 1$ then set $\mathbf{r} := \mathbf{r}^{(j^*)}$ and repeat the loop
                                                                                  (10.23)

In line 2, directions are selected for which a projection $P_{r_j-1}^{(j),\mathrm{HOSVD}}$ yields an approximation $\mathbf{u}$ satisfying the requirement $\|\mathbf{v} - \mathbf{u}\| \le \varepsilon$. In line 6, the previous approximation $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$ is projected into $\mathbf{u} \in \mathcal{T}_{\mathbf{r}^{(j^*)}}$, where the $j^*$-th rank is reduced from $r_{j^*}$ to $r_{j^*} - 1$. If $r_{j^*} = 1$ occurs in line 7, $\mathbf{u} = 0$ holds. The algorithm terminates with values of $\mathbf{r} \in \mathbb{N}_0^d$, $\mathbf{u} \in \mathcal{T}_{\mathbf{r}}$, and $\tau \ge 0$.

Thanks to estimate (10.4), the inequality $\|\mathbf{v} - \mathbf{u}\|^2 \le \varepsilon^2 - \tau \le \varepsilon^2$ holds with $\tau$ determined by (10.23). However, as discussed in §10.1.3, the estimate (10.4) may be too pessimistic. Since $\mathbf{v} - \mathbf{u} \perp \mathbf{u}$, the true error can be computed from
\[ \|\mathbf{v} - \mathbf{u}\|^2 = \|\mathbf{v}\|^2 - \|\mathbf{u}\|^2 \]
(note that $\|\mathbf{v}\|^2 = \sum_{i=1}^{s_j} (\sigma_i^{(j)})^2$ for any $j$). If an additional reduction is wanted, algorithm (10.23) can be repeated with $\mathbf{u}$, $\mathbf{r}$, $\varepsilon^2 - \|\mathbf{v} - \mathbf{u}\|^2$ instead of $\mathbf{v}$, $\mathbf{s}$, $\varepsilon^2$.

The proposed algorithm requires only one HOSVD computation by (10.23). In principle, after getting a new approximation $\mathbf{u}$ in line 6, we may compute a new HOSVD. In that case the accumulated squared error $\varepsilon^2 - \tau$ is the true error $\|\mathbf{v} - \mathbf{u}\|^2$ (cf. Theorem 10.5).
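The rank selection in (10.23) uses only the HOSVD singular values and the sizes $n_j$, so its bookkeeping can be sketched without any tensor arithmetic. The following Python fragment is a minimal sketch under that assumption: `sigma[j]` is taken to hold the singular values of direction $j$ in decreasing order, the projection of $\mathbf{u}$ itself is omitted, and the names `storage` and `select_ranks` are hypothetical.

```python
import math

def storage(n, r):
    """Storage N_r = sum_j r_j n_j + prod_j r_j of the tensor subspace format."""
    return sum(rj * nj for rj, nj in zip(r, n)) + math.prod(r)

def select_ranks(sigma, n, eps):
    """Greedy rank reduction following the lines of (10.23); returns (r, tau)."""
    d = len(n)
    r = [len(s) for s in sigma]              # line 1: r := s
    tau = eps ** 2                           # line 1: tau := eps^2
    while True:
        # line 2: directions whose last kept singular value still fits into tau
        cand = [j for j in range(d) if sigma[j][r[j] - 1] ** 2 < tau]
        if not cand:                         # line 3
            break

        def gain(j):                         # lines 4-5: saved storage per squared error
            saved = n[j] + math.prod(r[k] for k in range(d) if k != j)
            s2 = sigma[j][r[j] - 1] ** 2
            return math.inf if s2 == 0.0 else saved / s2

        j_star = max(cand, key=gain)
        tau -= sigma[j_star][r[j_star] - 1] ** 2   # line 6
        if r[j_star] == 1:                   # line 7: u = 0, stop
            return [0] * d, tau
        r[j_star] -= 1
    return r, tau

# Example with d = 2 (matrix case), sizes n = (10, 20) and ranks s = (3, 3).
sigma = [[3.0, 1.0, 0.1], [3.0, 1.0, 0.1]]
n = [10, 20]
ranks, tau = select_ranks(sigma, n, eps=0.2)
```

In this example both directions lose their smallest singular value, so the ranks drop from (3, 3) to (2, 2) and the storage from 99 to 64. The quantity $\varepsilon^2 - \tau$ returned by the sketch is the bound from (10.4), not the true squared error.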
10.4 Analytical Approaches for the Tensor Subspace Approximation

In this section, we consider multivariate function spaces $\mathbf{V} = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j$ and use interpolation of univariate functions from $V_j$. We may also replace functions $f \in V_j$ by grid functions $\hat f \in \mathbb{K}^{I_j}$ with the interpretation $\hat f_i = f(\xi_i)$ ($\xi_i$, $i \in I_j$: grid nodes). Then all interpolation points appearing below must belong to the grid $\{\xi_i : i \in I_j\}$. Interpolation will map the functions into a fixed tensor subspace
\[ \mathbf{U} = \bigotimes_{j=1}^{d} U_j \subset \mathbf{V}, \tag{10.24} \]
which is equipped with the norm of $\mathbf{V}$. Note that $\mathbf{U} \subset \mathcal{T}_{\mathbf{r}}$ with $\mathbf{r} = (r_1, \ldots, r_d)$. As remarked at the beginning of §9.8, the following techniques can be used for practical constructions, as well as for theoretical estimates of the best approximation error $\varepsilon(\mathbf{v}, \mathbf{r})$ in (10.13).

10.4.1 Linear Interpolation Techniques

10.4.1.1 Linear Interpolation Problem

We now omit the index $j$ of the direction and rename $r_j$, $U_j$, $V_j$ by $r$, $U$, $V$. For $r \in \mathbb{N}_0$ fix a subspace $U \subset V$ of dimension $r$ and define linear functionals $\Lambda_i \in V^*$ $(1 \le i \le r)$. Then the linear interpolation problem in $V$ reads as follows:
\[ \text{Given } \lambda_i \in \mathbb{K}\ (1 \le i \le r), \text{ find } f \in U \text{ with } \Lambda_i(f) = \lambda_i \text{ for } 1 \le i \le r. \tag{10.25a} \]
In most of the cases, $\Lambda_i$ are Dirac functionals at certain interpolation points $\xi_i$, i.e., $\Lambda_i(f) = f(\xi_i)$. Then the interpolation conditions for $f \in U$ become
\[ f(\xi_i) = \lambda_i \quad\text{for } 1 \le i \le r. \tag{10.25b} \]
The Dirac functionals are continuous in $C(I)$ or Hilbert spaces with sufficient smoothness (Sobolev embedding; cf. [141, Theorem 6.48]). In spaces such as $L^2(I)$, other functionals $\Lambda_i$ must be chosen for (10.25a).

Remark 10.21. (a) Problem (10.25a) is uniquely solvable for all $\lambda_i \in \mathbb{K}$ if and only if the functionals $\{\Lambda_i : 1 \le i \le r\}$ are linearly independent on $U$.
(b) In the positive case, there are so-called Lagrange functions $L_i$ defined by
\[ L_i \in U\ (1 \le i \le r) \quad\text{and}\quad \Lambda_\nu(L_\mu) = \delta_{\nu\mu} \quad (1 \le \nu, \mu \le r). \tag{10.26} \]
Then problem (10.25a) has the solution
\[ f = \sum_{i=1}^{r} \lambda_i L_i. \]
We always assume that the interpolation problem is solvable. The Lagrange functions define the interpolation operator $\mathcal{I} \in L(V, U)$ by
\[ \mathcal{I}(f) = \sum_{i=1}^{r} \Lambda_i(f) L_i \in U. \]
Remark 10.22. $\mathcal{I} \in L(V, U)$ is a projection onto $U$. The norm $C_{\mathrm{stab}} := \|\mathcal{I}\|_{V \leftarrow V}$ is called the stability constant of the interpolation $\mathcal{I}$ (cf. Hackbusch [137, Def. 4.8]).

The estimation of the interpolation error $f - \mathcal{I}(f)$ requires a Banach subspace $W \subset V$ with a stronger norm of $f \in W$ (e.g., $W = C^{p+1}(I)$ and $V = C(I)$ in (10.30)).

10.4.1.2 Linear Product Interpolation

Let the function $f(\mathbf{x}) = f(x_1, \ldots, x_d)$ be defined on a product domain $\mathbf{I} := \times_{j=1}^{d} I_j$. For each direction $j$, we assume an interpolation operator
\[ \mathcal{I}_j(f) = \sum_{i=1}^{r_j} \Lambda_i^{(j)}(f) L_i^{(j)}, \]
involving functionals $\Lambda_i^{(j)}$ and Lagrange functions $L_i^{(j)}$. Interpolation points are denoted by $\xi_i^{(j)}$.

We now assume that $\|\cdot\|$ is a uniform crossnorm. Then the multivariate interpolation operator (product interpolation operator)
\[ \mathbf{I} := \bigotimes_{j=1}^{d} \mathcal{I}_j : \mathbf{V} = {}_{\|\cdot\|}\!\bigotimes_{j=1}^{d} V_j \to \mathbf{U} = \bigotimes_{j=1}^{d} U_j \subset \mathbf{V} \tag{10.27} \]
is bounded by $C_{\mathrm{stab}} := \prod_{j=1}^{d} C_{\mathrm{stab},j}$.

Application of $\mathbf{I}$ to $f(\mathbf{x}) = f(x_1, \ldots, x_d)$ can be performed recursively. The following description refers to the Dirac functionals in (10.25b). Application of $\mathcal{I}_1$ yields $f_{(1)}(\mathbf{x}) = \sum_{i_1=1}^{r_1} f(\xi_{i_1}^{(1)}, x_2, \ldots, x_d)\, L_{i_1}^{(1)}(x_1)$. $\mathcal{I}_2$ maps into
\[ f_{(2)}(\mathbf{x}) = \sum_{i_1=1}^{r_1} \sum_{i_2=1}^{r_2} f(\xi_{i_1}^{(1)}, \xi_{i_2}^{(2)}, x_3, \ldots, x_d)\, L_{i_1}^{(1)}(x_1)\, L_{i_2}^{(2)}(x_2). \]
After $d$ steps the final result is reached:
\[ f_{(d)}(\mathbf{x}) = \mathbf{I}(f)(\mathbf{x}) = \sum_{i_1=1}^{r_1} \cdots \sum_{i_d=1}^{r_d} f(\xi_{i_1}^{(1)}, \ldots, \xi_{i_d}^{(d)}) \prod_{j=1}^{d} L_{i_j}^{(j)}(x_j) \in \mathbf{U}. \]
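The final formula for $f_{(d)}$ can be sketched directly in Python. The fragment below is an illustration under stated assumptions: the functionals are point evaluations, the Lagrange functions are the Lagrange polynomials of the chosen nodes, and the names `lagrange` and `product_interpolant` are hypothetical.

```python
import itertools

def lagrange(nodes, i, x):
    """Value at x of the Lagrange polynomial L_i belonging to the given nodes."""
    val = 1.0
    for k, xk in enumerate(nodes):
        if k != i:
            val *= (x - xk) / (nodes[i] - xk)
    return val

def product_interpolant(f, nodes_per_dir):
    """Return x -> sum_i f(xi_i1, ..., xi_id) * prod_j L_ij(x_j)."""
    index_grid = itertools.product(*(range(len(nd)) for nd in nodes_per_dir))
    # Sample f once on the tensor grid of interpolation points.
    coeff = {idx: f(*(nodes_per_dir[j][i] for j, i in enumerate(idx)))
             for idx in index_grid}

    def interpolant(*x):
        # Univariate Lagrange values in every direction, then the multilinear sum.
        basis = [[lagrange(nd, i, xj) for i in range(len(nd))]
                 for nd, xj in zip(nodes_per_dir, x)]
        total = 0.0
        for idx, c in coeff.items():
            term = c
            for j, i in enumerate(idx):
                term *= basis[j][i]
            total += term
        return total

    return interpolant

# Example: f is quadratic in x and linear in y, so three and two nodes reproduce it.
f = lambda x, y: x * x * y + 3.0 * y - 1.0
I_f = product_interpolant(f, [[0.0, 0.5, 1.0], [0.0, 1.0]])
```

Only the $r_1 \cdots r_d$ function values on the tensor grid enter the interpolant; these values form the coefficient tensor of the tensor subspace representation with respect to the Lagrange bases.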
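Remark 10.23. Even the result $f_{(d-1)}$ is of interest. The function
\[ f_{(d-1)}(\mathbf{x}) = \sum_{i_1=1}^{r_1} \cdots \sum_{i_{d-1}=1}^{r_{d-1}} f(\xi_{i_1}^{(1)}, \ldots, \xi_{i_{d-1}}^{(d-1)}, x_d) \prod_{j=1}^{d-1} L_{i_j}^{(j)}(x_j) \]
belongs to $\bigotimes_{j=1}^{d-1} U_j \otimes V_d$. For fixed values $\{\xi_i^{(j)} : 1 \le i \le r_j,\ 1 \le j \le d-1\}$ the function $f(\xi_{i_1}^{(1)}, \ldots, \xi_{i_{d-1}}^{(d-1)}, \bullet)$ is already univariate.

Interpolation is a special form of approximation. Error estimates for interpolation can be derived from best approximation errors.

Lemma 10.24. Let $C_{\mathrm{stab},j}$ be the stability constant of $\mathcal{I}_j$ (cf. Remark 10.22). Then the interpolation error can be estimated by
\[ \|f - \mathcal{I}_j(f)\|_{V_j} \le (1 + C_{\mathrm{stab},j}) \inf\{ \|f - g\|_{V_j} : g \in U_j \}. \]
Proof. Split the error into $[f - \mathcal{I}_j(g)] + [\mathcal{I}_j(g) - \mathcal{I}_j(f)]$ for $g \in U_j$ and note that $\mathcal{I}_j(g) = g$ because of the projection property. Taking the infimum over $g \in U_j$ in
\[ \|f - \mathcal{I}_j(f)\|_{V_j} \le \|f - g\|_{V_j} + \|\mathcal{I}_j(g - f)\|_{V_j} \le (1 + C_{\mathrm{stab},j}) \|f - g\|_{V_j}, \]
we obtain the assertion. □

The multivariate interpolation error can be obtained from univariate errors as follows. Let
\[ \varepsilon_j(f) := \inf\Big\{ \|f - g\| : g \in \Big[ \bigotimes_{k=1}^{j-1} V_k \Big] \otimes U_j \otimes \Big[ \bigotimes_{k=j+1}^{d} V_k \Big] \Big\} \tag{10.28} \]
for $1 \le j \le d$ be the best approximation error in the $j$-th direction.

Proposition 10.25. Let the norm of $\mathbf{V}$ be a uniform crossnorm. With $\varepsilon_j(f)$ and $C_{\mathrm{stab},j}$ from above, the interpolation error of $\mathbf{I}$ in (10.27) can be estimated by
\[ \|f - \mathbf{I}(f)\| \le \sum_{j=1}^{d} \Big[ \prod_{k=1}^{j-1} C_{\mathrm{stab},k} \Big] (1 + C_{\mathrm{stab},j})\, \varepsilon_j(f). \]
Proof. Consider the construction of $f_{(j)}$ from above with $f_{(0)} := f$ and $f_{(d)} = \mathbf{I}(f)$. The difference $f_{(j-1)} - f_{(j)}$ in $f - \mathbf{I}(f) = \sum_{j=1}^{d} \big( f_{(j-1)} - f_{(j)} \big)$ can be rewritten as
\[ f_{(j-1)} - f_{(j)} = \big[ \mathcal{I}_1 \otimes \mathcal{I}_2 \otimes \ldots \otimes \mathcal{I}_{j-1} \otimes (I - \mathcal{I}_j) \otimes \mathrm{id} \otimes \ldots \otimes \mathrm{id} \big](f) = \Big[ \Big( \bigotimes_{k=1}^{j-1} \mathcal{I}_k \Big) \otimes \Big( \bigotimes_{k=j}^{d} \mathrm{id} \Big) \Big] \big[ \mathrm{id} \otimes \ldots \otimes \mathrm{id} \otimes (I - \mathcal{I}_j) \otimes \mathrm{id} \otimes \ldots \otimes \mathrm{id} \big](f). \]
The norm of $[\ldots \otimes \mathrm{id} \otimes (I - \mathcal{I}_j) \otimes \mathrm{id} \otimes \ldots](f)$ is bounded by $(1 + C_{\mathrm{stab},j})\, \varepsilon_j(f)$ (cf. Lemma 10.24). The operator norm of the first factor is $\prod_{k=1}^{j-1} C_{\mathrm{stab},k}$ because of the uniform crossnorm property. □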
10.4.1.3 Use of Transformations

We return to the univariate case of a function $f$ defined on an interval $I$. Let $\phi : J \to I$ be a mapping from a possibly different interval $J$ onto $I$ and define $F := f \circ \phi$. The purpose of the transformation $\phi$ is an improvement of the smoothness properties. For instance, $\phi$ may remove a singularity.$^{17}$ Applying an interpolation $\mathcal{I}_J$ with interpolation points $\xi_i^J \in J$ to $F$, we get
\[ F(y) \approx (\mathcal{I}_J(F))(y) = \sum_{i=1}^{r} F(\xi_i^J)\, L_i^J(y). \]
The error estimate of $F - \mathcal{I}_J(F)$ may exploit the improved smoothness of $F$. We can reinterpret this interpolation on $J$ as a new interpolation on $I$:
\[ f(x) \approx (\mathcal{I}_J(F))(\phi^{-1}(x)) = \sum_{i=1}^{r} F(\xi_i^J)\, L_i^J(\phi^{-1}(x)) = \mathcal{I}_I(f)(x) \]
with $\mathcal{I}_I(f) := \sum_{i=1}^{r} \Lambda_i^I(f) L_i^I$, $\Lambda_i^I(f) := f(\zeta_i^I)$, $\zeta_i^I := \phi(\xi_i^J)$, $L_i^I := L_i^J \circ \phi^{-1}$.

Remark 10.26. (a) Since $(\mathcal{I}_J(f \circ \phi))(\phi^{-1}(\cdot)) = \mathcal{I}_I(f)$, the supremum norms of the errors coincide: $\|f - \mathcal{I}_I(f)\|_{I,\infty} = \|F - \mathcal{I}_J(F)\|_{J,\infty}$.
(b) While $L_i^J$ may be standard functions as, e.g., polynomials, $L_i^I = L_i^J \circ \phi^{-1}$ are nonstandard.
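The effect of a transformation can be observed numerically for the example $f(x) = \sqrt{\sin x}$ on $I = [0, 1]$ with $x = \phi(y) = y^2$. The following Python fragment is a small experiment under stated assumptions: it uses seven equidistant interpolation points and plain Lagrange interpolation, both chosen only for illustration, and the name `lagrange_interpolant` is hypothetical.

```python
import math

def lagrange_interpolant(g, nodes):
    """Polynomial interpolant of g at the given nodes, evaluated via Lagrange form."""
    values = [g(t) for t in nodes]

    def p(x):
        total = 0.0
        for i, ti in enumerate(nodes):
            w = values[i]
            for k, tk in enumerate(nodes):
                if k != i:
                    w *= (x - tk) / (ti - tk)
            total += w
        return total

    return p

f = lambda x: math.sqrt(math.sin(x))    # square-root singularity at x = 0
phi = lambda y: y * y                   # transformation J = [0, 1] -> I = [0, 1]
F = lambda y: f(phi(y))                 # F = f o phi is smooth on J

r = 7
nodes_J = [i / (r - 1) for i in range(r)]
I_J_F = lagrange_interpolant(F, nodes_J)
direct = lagrange_interpolant(f, nodes_J)          # same number of points, no transformation
transformed = lambda x: I_J_F(math.sqrt(x))        # I_I(f) = I_J(F) o phi^{-1}

samples = [k / 1000 for k in range(1001)]
err_direct = max(abs(f(x) - direct(x)) for x in samples)
err_transformed = max(abs(f(x) - transformed(x)) for x in samples)
```

With the same number of function evaluations, the transformed interpolation is markedly more accurate than the direct one. The price is that the Lagrange functions $L_i^J \circ \phi^{-1}$ are no longer polynomials in $x$, as noted in Remark 10.26b.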
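10.4.2 Polynomial Approximation

10.4.2.1 Notations

Let $\mathbf{I} = \times_{j=1}^{d} I_j$. Choose the Banach tensor space $C(\mathbf{I}) = {}_{\infty}\!\bigotimes_{j=1}^{d} C(I_j)$, i.e., $V_j = C(I_j)$. The subspaces $U_j \subset V_j$ are polynomial spaces $\mathcal{P}_{p_j}$ defined by
\[ \mathcal{P}_p := \Big\{ \sum_{\nu=0}^{p} a_\nu x^\nu : a_\nu \in \mathbb{K} \Big\}. \]
The tensor subspace $\mathbf{U}$ in (10.24) is built from $U_j = \mathcal{P}_{p_j}$:
\[ \mathcal{P}_{\mathbf{p}} := \bigotimes_{j=1}^{d} \mathcal{P}_{p_j} \subset C(\mathbf{I}) \quad\text{with}\quad \mathbf{p} = (p_1, \ldots, p_d). \]
Note that $\mathbf{U} = \mathcal{P}_{\mathbf{p}} \subset \mathcal{T}_{\mathbf{r}}$ requires $p_j \le r_j - 1$.

$^{17}$ $f(x) = \sqrt{\sin(x)}$ in $I = [0, 1]$ and $x = \phi(y) := y^2$ yield $F(y) = \sqrt{\sin(y^2)} \in C^\infty$.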
10.4.2.2 Approximation Error

The smoother the function is, the smaller is the approximation error. Optimal smoothness conditions hold for analytic functions. For this purpose, we assume that a univariate function is analytic (holomorphic) in a certain ellipse.
\[ E_{a,b} := \Big\{ z \in \mathbb{C} : z = x + \mathrm{i}y,\ \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1 \Big\} \]
is the ellipse with half-axes $a$ and $b$. In particular,
\[ E_\rho := E_{\frac{1}{2}(\rho + 1/\rho),\, \frac{1}{2}(\rho - 1/\rho)} \quad\text{for } \rho > 1 \]
is the unique ellipse with foci $\pm 1$ and $\rho$ being the sum of the half-axes. The interior of $E_\rho$ is denoted by $\mathring{E}_\rho$. Note that the interval $[-1, 1]$ is contained in $\mathring{E}_\rho$ because $\rho > 1$. $E_\rho$ will be called the regularity ellipse since the functions to be approximated are assumed to be holomorphic in $\mathring{E}_\rho$. The main result is Bernstein’s theorem [30] (proof in [77, Sect. 8, Chap. 7]).

Theorem 10.27 (Bernstein). Let $f$ be holomorphic and uniformly bounded in $\mathring{E}_\rho$ with $\rho > 1$. Then, for any $p \in \mathbb{N}_0$, there is a polynomial $P_p$ of degree $\le p$ such that
\[ \|f - P_p\|_{[-1,1],\infty} \le \frac{2\rho^{-p}}{\rho - 1}\, \|f\|_{\mathring{E}_\rho,\infty}. \]
A general real interval $[x_1, x_2]$ with $x_1 < x_2$ is mapped by $\Phi(z) := \frac{2(z - x_1)}{x_2 - x_1} - 1$ onto $[-1, 1]$. We set
\[ E_\rho([x_1, x_2]) := \Phi^{-1}(E_\rho) = \Big\{ z \in \mathbb{C} : z = x + \mathrm{i}y,\ \frac{\big( x - \frac{x_1 + x_2}{2} \big)^2}{(\rho + 1/\rho)^2} + \frac{y^2}{(\rho - 1/\rho)^2} \le \Big( \frac{x_2 - x_1}{4} \Big)^2 \Big\}. \]
Corollary 10.28. Assume that a function $f$ defined on $I = [x_1, x_2]$ can be extended holomorphically onto $\mathring{E}_\rho(I)$ with $M := \sup\{ |f(z)| : z \in \mathring{E}_\rho([x_1, x_2]) \}$. Then, for any $p \in \mathbb{N}_0$, there is a polynomial $P_p$ of degree $\le p$ such that
\[ \|f - P_p\|_{I,\infty} \le \frac{2\rho^{-p}}{\rho - 1}\, M. \]
The next statement exploits only properties of $f$ on a real interval (proof in Melenk–Börm–Löhndorf [225]).

Lemma 10.29. Let $f$ be an analytic function defined on the interval $I \subset \mathbb{R}$ of length $L$. Assume that there are constants $C, \gamma \ge 0$ such that
\[ \Big\| \frac{\mathrm{d}^n}{\mathrm{d}x^n} f \Big\|_{I,\infty} \le C\, n!\, \gamma^n \quad\text{for all } n \in \mathbb{N}_0. \tag{10.29} \]
Then, for any $p \in \mathbb{N}_0$, there is a polynomial $P_p$ of degree $\le p$ such that
\[ \|f - P_p\|_{I,\infty} \le 4\, \mathrm{e}\, C\, (1 + \gamma L)\, (p + 1) \Big( 1 + \frac{2}{\gamma L} \Big)^{-(p+1)}. \]
Corollary 10.30. With the notations in §10.4.2.1 assume that $f \in C(\mathbf{I})$ is analytic in all arguments. Then the best approximation error $\varepsilon_j(f)$ in (10.28) can be estimated by
\[ \varepsilon_j(f) \le \frac{2 M_j}{\rho_j - 1}\, \rho_j^{-p_j} \quad (\rho_j > 1,\ 1 \le j \le d), \]
if for all $x_k \in I_k$ $(k \ne j)$, the univariate function $f(x_1, \ldots, x_{j-1}, \bullet, x_{j+1}, \ldots, x_d) \in C(I_j)$ satisfies the conditions of Corollary 10.28. The estimate
\[ \varepsilon_j(f) \le C' (p + 1)\, \rho_j^{-p_j} \quad\text{with}\quad \rho_j := 1 + \frac{2}{\gamma L} \]
holds if $f(x_1, \ldots, x_{j-1}, \bullet, x_{j+1}, \ldots, x_d)$ fulfils the inequalities (10.29). In both cases, the bound of $\varepsilon_j(f)$ decays exponentially as $O(\rho_j^{-p_j})$ for $p_j \to \infty$.
10.4.3 Polynomial Interpolation

10.4.3.1 Univariate Interpolation

Univariate interpolation by polynomials is characterised by the interval $I = [a, b]$, the degree $p$, and the quadrature points $(\xi_i)_{i=0}^{p} \subset I$. The interval can be standardised to $[-1, 1]$. An interpolation operator $\mathcal{I}_{[-1,1]}$ on $[-1, 1]$ with quadrature points $(\xi_i)_{i=0}^{p} \subset [-1, 1]$ can be transferred to an interpolation operator $\mathcal{I}_{[a,b]}$ on $[a, b]$ with quadrature points $(\Phi(\xi_i))_{i=0}^{p}$, where $\Phi : [-1, 1] \to [a, b]$ is the affine mapping
\[ \Phi(x) = a + \tfrac{1}{2}(b - a)(x + 1). \]
The interpolating polynomials satisfy $\mathcal{I}_{[a,b]}(f) = \mathcal{I}_{[-1,1]}(f \circ \Phi)$. The Lagrange functions $L_\nu$ in (10.26) are the Lagrange polynomials
\[ L_\nu(x) = \prod_{i \in \{0,\ldots,p\} \setminus \{\nu\}} \frac{x - \xi_i}{\xi_\nu - \xi_i}. \]
They satisfy $L_\nu^{[a,b]} = L_\nu^{[-1,1]} \circ \Phi^{-1}$.

A well-known interpolation error estimate holds for functions $f \in C^{p+1}(I)$:
\[ \|f - \mathcal{I}(f)\|_\infty \le \frac{C_\omega(\mathcal{I})}{(p+1)!}\, \|f^{(p+1)}\|_\infty \quad\text{with}\quad C_\omega(\mathcal{I}) := \Big\| \prod_{i=0}^{p} (x - \xi_i) \Big\|_{I,\infty}. \tag{10.30} \]
The natural Banach space is $V = (C(I), \|\cdot\|_{I,\infty})$.

Remark 10.31. (a) If $\mathcal{I}_{[a,b]}(f) = \mathcal{I}_{[-1,1]}(f \circ \Phi)$ are polynomial interpolations of degree $p$, then
\[ C_\omega(\mathcal{I}_{[a,b]}) = C_\omega(\mathcal{I}_{[-1,1]}) \Big( \frac{b - a}{2} \Big)^{p+1}. \]
(b) The stability constant $C_{\mathrm{stab}} = \|\mathcal{I}_{[a,b]}\|_{V \leftarrow V}$ is independent of the interval $[a, b]$.

The smallest constant $C_\omega(\mathcal{I}_{[-1,1]})$ is obtained for the so-called Chebyshev interpolation using the Chebyshev quadrature points
\[ \xi_i = \cos\Big( \frac{i + 1/2}{p + 1}\, \pi \Big) \in [-1, 1] \quad (i = 0, \ldots, p), \]
which are the zeros of the $(p+1)$-th Chebyshev polynomial $T_{p+1}$ (cf. [137, §4.5]).

Remark 10.32 ([251]). Chebyshev interpolation of polynomial degree $p$ leads to the constants
\[ C_\omega(\mathcal{I}_{[-1,1]}) = 2^{-p} \quad\text{and}\quad C_{\mathrm{stab}} \le 1 + \frac{2}{\pi} \log(p + 1). \]
The value of $C_\omega$ follows from $\prod_{i=0}^{p} (x - \xi_i) = 2^{-p}\, T_{p+1}(x)$ and $\|T_{p+1}\|_{[-1,1],\infty} = 1$.

10.4.3.2 Product Interpolation

Given polynomial interpolation operators $\mathcal{I}_j$ of degree $p_j$ on intervals $I_j$, the product interpolation is the tensor product
\[ \mathbf{I} := \bigotimes_{j=1}^{d} \mathcal{I}_j : C(\mathbf{I}) \to \mathcal{P}_{\mathbf{p}}. \]
Under the conditions of Corollary 10.30, the approximation error $\varepsilon_j(f)$ in (10.28) decays exponentially: $\varepsilon_j(f) \le O(\rho_j^{-p_j})$. Hence Proposition 10.25 yields the result
\[ \|f - \mathbf{I}(f)\| \le \sum_{j=1}^{d} \Big[ \prod_{k=1}^{j-1} C_{\mathrm{stab},k} \Big] (1 + C_{\mathrm{stab},j})\, \varepsilon_j(f) \le O\big( \max_j \rho_j^{-p_j} \big). \]
For Chebyshev interpolation, the stability constants
\[ C_{\mathrm{stab},j} \le 1 + \frac{2}{\pi} \log(p_j + 1) \]
depend only very weakly on the polynomial degree $p_j$.
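The two constants of Chebyshev interpolation can be checked numerically. The following Python fragment is a rough check, assuming that sampling on a fine equidistant grid of $[-1, 1]$ (including the end points) is accurate enough; the function names are hypothetical. It uses the fact that the node polynomial of the Chebyshev points equals $2^{-p} T_{p+1}$, and that $C_{\mathrm{stab}}$ with respect to the supremum norm is the maximum of the Lebesgue function $\sum_\nu |L_\nu(x)|$.

```python
import math

def chebyshev_points(p):
    """Zeros of the Chebyshev polynomial T_{p+1} in [-1, 1]."""
    return [math.cos((i + 0.5) * math.pi / (p + 1)) for i in range(p + 1)]

def node_polynomial(nodes, x):
    """Value of prod_i (x - xi_i)."""
    return math.prod(x - xi for xi in nodes)

def lebesgue_function(nodes, x):
    """Value of sum_nu |L_nu(x)| for the Lagrange polynomials of the nodes."""
    total = 0.0
    for nu, x_nu in enumerate(nodes):
        L = 1.0
        for i, xi in enumerate(nodes):
            if i != nu:
                L *= (x - xi) / (x_nu - xi)
        total += abs(L)
    return total

p = 5
nodes = chebyshev_points(p)
grid = [-1.0 + 2.0 * k / 2000 for k in range(2001)]   # includes -1 and +1
c_omega = max(abs(node_polynomial(nodes, x)) for x in grid)
c_stab = max(lebesgue_function(nodes, x) for x in grid)
stab_bound = 1.0 + 2.0 / math.pi * math.log(p + 1)
```

Since the maximum over a grid can only underestimate the supremum, the comparison of `c_stab` with `stab_bound` is an indication, not a proof. The value of `c_omega` is attained at the end points $\pm 1$, which belong to the grid.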
10.4.4 Sinc Approximations

The following facts are mainly taken from the monograph of Stenger [274].

10.4.4.1 Sinc Functions, Sinc Interpolation

The sinc function $\operatorname{sinc}(x) := \frac{\sin(\pi x)}{\pi x}$ is holomorphic in $\mathbb{C}$. Fixing a step size $h > 0$, we obtain a family of scaled and shifted functions
\[ S(k, h)(x) := \operatorname{sinc}\Big( \frac{x}{h} - k \Big) = \frac{\sin[\pi (x - kh)/h]}{\pi (x - kh)/h} \quad (h > 0,\ k \in \mathbb{Z}). \tag{10.31} \]
Note that $S(k, h)$ is a function in $x$ with two parameters $k$, $h$.

Remark 10.33. The entire function $S(k, h)$ satisfies $S(k, h)(\ell h) = \delta_{k,\ell}$ for all $\ell \in \mathbb{Z}$.

Because of Remark 10.33, $S(k, h)$ can be viewed as the Lagrange function $L_k$ corresponding to infinitely many interpolation points $\{kh : k \in \mathbb{Z}\}$. This leads to the following definition.

Definition 10.34 (sinc interpolation). Let $f \in C(\mathbb{R})$ and $N \in \mathbb{N}_0$. Sinc interpolation at $2N + 1$ points $\{kh : k \in \mathbb{Z},\ |k| \le N\}$ is denoted by$^{18}$
\[ C_N(f, h) := \sum_{k=-N}^{N} f(kh)\, S(k, h). \tag{10.32} \]
If the limit exists for $N \to \infty$, we write
\[ C(f, h) := \sum_{k=-\infty}^{\infty} f(kh)\, S(k, h). \]
The corresponding interpolation errors are
\[ E_N(f, h) := f - C_N(f, h), \qquad E(f, h) := f - C(f, h). \]
Lemma 10.35. The stability constant in $\|C_N(f, h)\|_\infty \le C_{\mathrm{stab}}(N)\, \|f\|_\infty$ for all $f \in C(\mathbb{R})$ satisfies
\[ C_{\mathrm{stab}}(N) \le \frac{2}{\pi} \big( 3 + \log(N) \big) \quad \text{(cf. [274, p. 142]).} \]

$^{18}$ Only for the sake of convenience we consider the sum $\sum_{k=-N}^{N} \ldots$. One may use instead $\sum_{k=-N_1}^{N_2}$, where $N_1$ and $N_2$ are adapted to the behaviour at $-\infty$ and $+\infty$, separately.
Under strong conditions, $f$ coincides with $C(f, h)$ (cf. [274, (1.10.3)]). Usually, there is an error $E(f, h)$, which will be estimated in (10.35). The speed by which $f(x)$ tends to zero as $\mathbb{R} \ni x \to \pm\infty$ determines the error estimate of $C(f, h) - C_N(f, h) = E_N(f, h) - E(f, h)$ (cf. Lemma 10.37) so that, finally, $E_N(f, h)$ can be estimated.

The error estimates are based on the fact that $f$ can be extended analytically from $\mathbb{R}$ to a complex strip $D_\delta$ satisfying $\mathbb{R} \subset D_\delta \subset \mathbb{C}$:
\[ D_\delta := \{ z \in \mathbb{C} : |\Im m\, z| < \delta \} \quad (\delta > 0). \tag{10.33} \]
Definition 10.36. For $\delta > 0$ and $f$ holomorphic in $D_\delta$, define
\[ \|f\|_{D_\delta} = \int_{\partial D_\delta} |f(z)|\, |\mathrm{d}z| = \int_{\mathbb{R}} \big( |f(x + \mathrm{i}\delta)| + |f(x - \mathrm{i}\delta)| \big)\, \mathrm{d}x \tag{10.34} \]
(set $\|f\|_{D_\delta} = \infty$ if the integral does not exist). Then a Banach space is given by
\[ H(D_\delta) := \{ f \text{ is holomorphic in } D_\delta \text{ and } \|f\|_{D_\delta} < \infty \}. \]
The residue theorem allows us to represent the interpolation error exactly:
\[ E(f, h)(z) = \frac{\sin(\pi z / h)}{2\pi \mathrm{i}} \int_{\partial D_\delta} \frac{f(\zeta)}{(\zeta - z) \sin(\pi \zeta / h)}\, \mathrm{d}\zeta \quad\text{for all } z \in D_\delta \]
(cf. [274, Thm 3.1.2]). Estimates with respect to the supremum norm $\|\cdot\|_{\mathbb{R},\infty}$ or $L^2(\mathbb{R})$ norm have the form$^{19}$
\[ \|E(f, h)\|_\infty \le \frac{\|f\|_{D_\delta}}{2\pi\delta\, \sinh(\pi\delta / h)}. \tag{10.35} \]
Note that
\[ \frac{1}{\sinh(\pi\delta / h)} = \frac{2 \exp(-\pi\delta / h)}{1 - \exp(-2\pi\delta / h)} \]
decays exponentially as $h \to 0$. While $E(f, h)$ depends on $\|f\|_{D_\delta}$ and therefore on the behaviour of $f$ in the complex plane, the difference $E(f, h) - E_N(f, h)$ hinges on decay properties of $f$ on the real axis alone.

Lemma 10.37. Assume that for $f \in H(D_\delta)$ there are some $c \ge 0$ and $\alpha > 0$ such that
\[ |f(x)| \le c \cdot \mathrm{e}^{-\alpha |x|} \quad\text{for all } x \in \mathbb{R}. \tag{10.36} \]
Then the difference $E(f, h) - E_N(f, h) = \sum_{|k| > N} f(kh)\, S(k, h)$ can be bounded by
\[ \|E(f, h) - E_N(f, h)\|_\infty \le \frac{2c}{\alpha h}\, \mathrm{e}^{-\alpha N h}. \tag{10.37} \]
Proof. Since $E(f, h) - E_N(f, h) = \sum_{|k| > N} f(kh)\, S(k, h)$ and $\|S(k, h)\|_\infty \le 1$, the sum $\sum_{|k| > N} |f(kh)|$ can be estimated using (10.36). □

To bound $\|E_N(f, h)\|_\infty \le \|E(f, h)\|_\infty + \|E(f, h) - E_N(f, h)\|_\infty$ optimally, the step width $h$ must be chosen such that both terms are balanced.

Theorem 10.38. Let $f \in H(D_\delta)$ satisfy (10.36). Choose $h$ by
\[ h := h_N := \sqrt{\frac{\pi\delta}{\alpha N}}. \tag{10.38} \]
Then the interpolation error is bounded by
\[ \|E_N(f, h)\|_\infty \le \Big[ \frac{2c}{\sqrt{\pi\alpha}} + \frac{\|f\|_{D_\delta}}{\pi\, [1 - \mathrm{e}^{-\pi\alpha\delta N}]\, \sqrt{N\delta}} \Big] \sqrt{\frac{N}{\delta}}\; \mathrm{e}^{-\sqrt{\pi\alpha\delta N}}. \tag{10.39} \]
The right-hand side in (10.39) behaves like $\|E_N(f, h)\|_\infty \le O(\exp\{-C\sqrt{N}\})$.

Proof. Combine (10.35) and (10.37) with $h$ in (10.38). □

$^{19}$ For a proof and further estimates in $L^2(\mathbb{R})$ compare [274, (3.1.12)] or [138, §D.2.3].
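The sinc interpolation (10.32) with the step size (10.38) is easy to try out. The Python fragment below is a small experiment under stated assumptions: the test function $f(x) = 1/\cosh(x)$ satisfies (10.36) with $c = 2$, $\alpha = 1$ and is holomorphic in $D_\delta$ for $\delta < \pi/2$; the value $\delta = 1$ and the evaluation grid on $[-10, 10]$ are chosen only for illustration, and the function names are hypothetical.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x) with the removable singularity filled in."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_interpolant(f, N, h):
    """Return C_N(f, h) = sum_{|k| <= N} f(kh) S(k, h) as a callable."""
    samples = [(k, f(k * h)) for k in range(-N, N + 1)]

    def C_N(x):
        return sum(fk * sinc(x / h - k) for k, fk in samples)

    return C_N

def step_size(N, alpha, delta):
    """Step size h_N = sqrt(pi * delta / (alpha * N))."""
    return math.sqrt(math.pi * delta / (alpha * N))

f = lambda x: 1.0 / math.cosh(x)
alpha, delta = 1.0, 1.0
grid = [-10.0 + 0.05 * k for k in range(401)]

def max_error(N):
    h = step_size(N, alpha, delta)
    C_N = sinc_interpolant(f, N, h)
    return max(abs(f(x) - C_N(x)) for x in grid)

err16, err64 = max_error(16), max_error(64)
```

Quadrupling $N$ doubles $\sqrt{N}$, so a behaviour like $\exp\{-C\sqrt{N}\}$ suggests that the error for $N = 64$ is roughly the square of the error for $N = 16$, up to the algebraic factors. The measured values `err16` and `err64` can be compared with this expectation.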
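Inequality (10.39) implies that, given an $\varepsilon > 0$, accuracy $\|E_N(f, h_N)\|_\infty \le \varepsilon$ holds for
\[ N \ge C^{-2} \log^2\Big( \frac{1}{\varepsilon} \Big) + O\Big( \log \frac{1}{\varepsilon} \Big). \]
Corollary 10.39. A stronger decay than in (10.36) holds if
\[ |f(x)| \le c \cdot \mathrm{e}^{-\alpha |x|^\gamma} \quad\text{for all } x \in \mathbb{R} \text{ and some } \gamma > 1. \]
Instead of (10.37), the latter condition implies that
\[ \|E(f, h) - E_N(f, h)\|_\infty \le \frac{2c}{\alpha N^{\gamma - 1} h^\gamma} \exp\big( -\alpha (N h)^\gamma \big). \tag{10.40} \]
The optimal step size $h := h_N := \big( \frac{\pi\delta}{\alpha} \big)^{1/(\gamma+1)} N^{-\gamma/(\gamma+1)}$ leads to
\[ \|E_N(f, h_N)\|_\infty \le O\Big( \mathrm{e}^{-C N^{\frac{\gamma}{\gamma+1}}} \Big) \quad\text{for all } 0 < C < \alpha^{\frac{1}{\gamma+1}} (\pi\delta)^{\frac{\gamma}{\gamma+1}}. \tag{10.41} \]
To reach accuracy $\varepsilon$, the number $N$ must be chosen $\ge \big( C^{-1} \log(1/\varepsilon) \big)^{(\gamma+1)/\gamma}$.

Proof. See [138, Theorem D.12]. □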
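For increasing $\gamma$, the right-hand side in (10.41) approaches $O(e^{-CN})$ as attained in Theorem 9.58 for a compact interval. A bound quite close to $O(e^{-CN})$ can be obtained when $f$ decays doubly exponentially.
Corollary 10.40. Assume that for $f \in H(D_\delta)$ there are constants $c_1, c_2, c_3 > 0$ such that
\[ |f(x)| \le c_1 \cdot \exp\{-c_2 e^{c_3|x|}\} \qquad \text{for all } x \in \mathbb{R}. \]
Then
\[ \|E(f,h) - E_N(f,h)\|_\infty \le \frac{2c_1}{c_2 c_3}\, \frac{e^{-c_3 N h}}{h}\, \exp\bigl(-c_2 e^{c_3 N h}\bigr). \]
Choosing $h := h_N := \frac{\log N}{c_3 N}$, we obtain
\[ \|E_N(f,h)\|_\infty \le C \exp\!\left(\frac{-\pi\delta c_3 N}{\log N}\right) \qquad \text{with } C \to \frac{\|f\|_{D_\delta}}{2\pi\delta} \text{ for } N \to \infty. \]
Accuracy $\varepsilon > 0$ follows from
\[ N \ge C_\varepsilon \log\frac{1}{\varepsilon} \cdot \log\log\frac{1}{\varepsilon} \qquad \text{with } C_\varepsilon = \frac{1 + o(1)}{\pi\delta c_3} \text{ as } \varepsilon \to 0. \]
Proof. See [138, Theorem D.14]. □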
10.4.4.2 Transformations and Weight Functions
As mentioned in §10.4.1.3, a transformation φ : J → I may improve the smoothness of the function. Here the transformation has another reason. Since sinc interpolation is performed on R, a function f defined on I must be transformed into F := f ◦ φ for some φ : R → I. Even when I = R, an additional substitution by φ : R → R may lead to a faster decay of |f (x)| as |x| → ∞. We give some examples:

       I            transformations x = φ(ζ)
(a)    (0, 1]       φ(ζ) = 1/cosh(ζ) or 1/cosh(sinh(ζ)) (cf. [179])
(b)    [1, ∞)       φ(ζ) = cosh(ζ) or cosh(sinh(ζ))
(c)    (0, ∞)       φ(ζ) = exp(ζ)
(d)    (−∞, ∞)      φ(ζ) = sinh(ζ)

One has to check carefully whether F := f ◦ φ belongs to H(Dδ ) for a positive δ. The stronger the decay on the real axis is, the faster is the increase in the imaginary axis. The second transformations in the lines (a) and (b) are attempts to reach the doubly exponential decay from Corollary 10.40. The transformation from line (d) may improve the decay.
Let $f$ be defined on $I$. To study the behaviour of $F(\zeta) = f(\varphi(\zeta))$ for $\zeta \to \pm\infty$, it is necessary to have a look at the values of $f$ at the end points of $I$. Here different cases are to be distinguished.
Case A1: Assume that $f$ is defined on $(0,1]$ with $f(x) \to 0$ as $x \to 0$. Then $F(\zeta) = f(1/\cosh(\zeta))$ decays even faster to zero as $\zeta \to \pm\infty$. Note that the boundary value $f(1)$ is arbitrary. In particular, $f(1) = 0$ is not needed. Furthermore, since $F(\zeta)$ is an even function, the interpolation $C_N(F,h)$ in (10.32) can be simplified to $C_N(F,h) = F(0)\,S(0,h) + 2\sum_{k=1}^{N} F(kh)\,S(k,h)$, which, formulated with $f = F \circ \varphi^{-1}$, yields the new interpolation scheme
\[ \hat{C}_N(f,h)(x) = f(1)\,L_0(x) + 2 \sum_{k=1}^{N} f\!\left(\frac{1}{\cosh(kh)}\right) L_k(x) \]
with $L_k(x) := S(k,h)(\operatorname{Arcosh}(\frac{1}{x}))$. Area hyperbolic cosine Arcosh is the inverse of cosh. Note that $\hat{C}_N$ involves only $N+1$ interpolation points $\xi_k = 1/\cosh(kh)$.
Case A2: Take $f$ as above, but assume that $f(x) \to c \ne 0$ as $x \to 0$. Then $F(\zeta) \to c$ as $\zeta \to \pm\infty$ shows that $F$ fails to fulfil the desired decay to zero.
Case B: Let $f$ be defined on $(0,\infty)$ and set $F(\zeta) := f(\exp(\zeta))$. Here we must require $f(x) \to 0$ for $x \to 0$, as well as for $x \to \infty$. Otherwise $F$ fails as in Case A2.
We take Case A2 as a model problem and choose a weight function $\omega(x)$ with $\omega(x) > 0$ for $x > 0$ and $\omega(x) \to 0$ as $x \to 0$. Then, obviously, $f_\omega := \omega \cdot f$ has the correct zero limit at $x = 0$. Applying interpolation to $F_\omega(\zeta) := (\omega \cdot f)(\varphi(\zeta))$ yields
\[ F_\omega(\zeta) \approx C_N(F_\omega,h)(\zeta) = \sum_{k=-N}^{N} F_\omega(kh) \cdot S(k,h)(\zeta). \]
Backsubstitution yields
\[ \omega(x) f(x) \approx \sum_{k=-N}^{N} (\omega \cdot f)(\varphi(kh)) \cdot S(k,h)(\varphi^{-1}(x)) \]
and
\[ f(x) \approx \hat{C}_N(f,h)(x) := \sum_{k=-N}^{N} (\omega \cdot f)(\varphi(kh))\, \frac{S(k,h)(\varphi^{-1}(x))}{\omega(x)} \qquad \text{for } x \in (0,1]. \]
The convergence of $|f(x) - \hat{C}_N(f,h)(x)|$ as $N \to \infty$ is no longer uniform. Instead, the weighted error $\|f - \hat{C}_N(f,h)\|_\omega := \|\omega[f - \hat{C}_N(f,h)]\|_\infty$ satisfies the previous estimates. In many cases, this is still sufficient.
Example 10.41. (a) The function $f(x) = x^x$ is analytic in $(0,1]$, but singular at $x = 0$ with $\lim_{x\to 0} f(x) = 1$. Choose $\omega(x) := x^{\lambda}$ for some $\lambda > 0$ and transform by $\varphi(\zeta) = \frac{1}{\cosh(\zeta)}$. Then $F_\omega(\zeta) = (\cosh(\zeta))^{-\lambda - 1/\cosh(\zeta)}$ behaves for $\zeta \to \pm\infty$ like $2^{\lambda}\exp(-\lambda|\zeta|)$. It belongs to $H(D_\delta)$ for $\delta < \pi/2$ (note the singularity at $\zeta = \pm\pi i/2$). Therefore Lemma 10.37 can be applied.
(b) Even when $f$ is unbounded as $f(x) = 1/\sqrt{x}$, the weight $\omega(x) := x^{1/2+\lambda}$ leads to the same convergence of $F_\omega$ as in Part (a).
10.4.4.3 Separation by Interpolation, Tensor Subspace Representation
We apply Remark 10.23 for $d = 2$, using the sinc interpolation with respect to the first variable. Consider a function $f(x,y)$ with $x \in X$ and $y \in Y$. If necessary, we apply a transformation $x = \varphi(\zeta)$ with $\varphi : \mathbb{R} \to X$ so that the first argument varies in $\mathbb{R}$ instead of $X$. Therefore we may assume that $f(x,y)$ is given with $x \in X = \mathbb{R}$. Suppose that $f(x,y) \to 0$ as $|x| \to \infty$. Sinc interpolation in $x$ yields
\[ f(x,y) \approx C_N(f(\cdot,y),h)(x) = \sum_{k=-N}^{N} f(kh,y) \cdot S(k,h)(x). \]
The previous convergence results require uniform conditions with respect to $y$.
Proposition 10.42. Assume that there is some $\delta > 0$ so that $f(\cdot,y) \in H(D_\delta)$ for all $y \in Y$ and $|||f||| := \sup\{\|f(\cdot,y)\|_{D_\delta} : y \in Y\} < \infty$. Furthermore, suppose that there are $c \ge 0$ and $\alpha > 0$ such that
\[ |f(x,y)| \le c \cdot e^{-\alpha|x|} \qquad \text{for all } x \in \mathbb{R},\ y \in Y. \]
Then the choice $h := \sqrt{\frac{\pi\delta}{\alpha N}}$ yields the uniform error estimate
\[ |E_N(f(\cdot,y),h)| \le \sqrt{\frac{N}{\delta}} \left[ \frac{2c}{\sqrt{\pi\alpha}} + \frac{|||f|||}{\pi\,[1 - e^{-\pi\alpha\delta N}]\,\sqrt{N\delta}} \right] e^{-\sqrt{\pi\alpha\delta N}} \qquad (y \in Y). \]
Proof. For any $y \in Y$, the univariate function $f(\cdot,y)$ satisfies the conditions of Theorem 10.38. Inequality $\|f(\cdot,y)\|_{D_\delta} \le |||f|||$ proves the desired estimate. □
If $f(x,y) \to 0$ as $|x| \to \infty$ is not satisfied, the weighting by $\omega(x)$ must be applied additionally. We give an example of this type.
Example 10.43. Consider the function $f(x,y) := \frac{1}{x+y}$ for $x, y \in (0,\infty)$. Choose the weight function $\omega(x) := x^{\alpha}$ $(0 < \alpha < 1)$. Transformation $x = \varphi(\zeta) := \exp(\zeta)$ yields
\[ F_\omega(\zeta,y) := \bigl[\omega(x) f(x,y)\bigr]_{x=\varphi(\zeta)} = \frac{\exp(\alpha\zeta)}{y + \exp(\zeta)}. \]
One checks that $F_\omega \in H(D_\delta)$ for all $\delta < \pi$. The norm $\|f(\cdot,y)\|_{D_\delta}$ is not uniformly bounded for $y \in (0,\infty)$, but behaves as $O(y^{\alpha-1})$:
\[ \omega(x)\,\bigl| f(x,y) - \hat{C}_N(f(\cdot,y),h)(x) \bigr| \le O\bigl(y^{\alpha-1} e^{-\sqrt{\pi\alpha\delta N}}\bigr) \]
for
\[ \hat{C}_N(f(\cdot,y),h)(x) := \frac{C_N(F_\omega(\cdot,y),h)(\log(x))}{\omega(x)} = \sum_{k=-N}^{N} \frac{e^{\alpha k h}}{y + e^{kh}}\, \frac{S(k,h)(\log(x))}{\omega(x)}. \]
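The separation in Example 10.43 can be tried out directly. The sketch below is a hedged illustration: the parameter values $\alpha = 1/2$, $\delta = \pi/2$, $y = 1$, the sample points and the use of $\min(\alpha, 1-\alpha)$ as decay rate in the step width are assumptions chosen here. Each term of the sum is a product of a factor depending only on $y$ and a factor depending only on $x$, so the result is a separable approximation with $2N+1$ terms.

```python
import math

def sinc_basis(k, h, zeta):
    """Shifted sinc function S(k, h)(zeta)."""
    t = zeta / h - k
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)

def separable_approx(x, y, N, h, alpha):
    """Evaluate sum_{|k| <= N} f_k(y) L_k(x) for f(x, y) = 1/(x + y)."""
    zeta = math.log(x)
    total = 0.0
    for k in range(-N, N + 1):
        f_k = math.exp(alpha * k * h) / (y + math.exp(k * h))  # depends on y only
        L_k = sinc_basis(k, h, zeta) / x ** alpha              # depends on x only
        total += f_k * L_k
    return total

def weighted_error(N, alpha, delta, y, zetas):
    """Maximum of omega(x) |f(x, y) - approximation| over x = exp(zeta)."""
    decay = min(alpha, 1.0 - alpha)   # decay rate of F_omega for zeta -> -inf / +inf
    h = math.sqrt(math.pi * delta / (decay * N))
    worst = 0.0
    for zeta in zetas:
        x = math.exp(zeta)
        diff = 1.0 / (x + y) - separable_approx(x, y, N, h, alpha)
        worst = max(worst, x ** alpha * abs(diff))
    return worst

zetas = [-10.0 + 0.1 * i for i in range(201)]
err16 = weighted_error(16, alpha=0.5, delta=math.pi / 2, y=1.0, zetas=zetas)
err64 = weighted_error(64, alpha=0.5, delta=math.pi / 2, y=1.0, zetas=zetas)
print(err16, err64)
```

Only the weighted error is measured here, in line with the estimate of the example; the unweighted error near $x = 0$ is not controlled by this construction.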
The interpolation yields a sum of the form $F_N(x,y) := \sum_{k=-N}^{N} f_k(y)\,L_k(x)$, where $L_k$ is the sinc function $S(k,h)$, possibly with an additional transformation and weighting, while $f_k(y)$ are evaluations of $f(x,y)$ at certain $x_k$. Note that $F_N \in \mathcal{R}_{2N+1}$.
Next, we assume that $f(x_1, x_2, \ldots, x_d)$ is a $d$-variate function. For simplicity suppose that all $x_j$ vary in $\mathbb{R}$ with $f \to 0$ as $|x_j| \to \infty$ to avoid transformations. Sinc interpolation in $x_1$ yields
\[ f_{(1)}(x_1, x_2, \ldots, x_d) = \sum_{k_1=-N_1}^{N_1} f_{(1),k_1}(x_2, \ldots, x_d) \cdot S(k_1,h)(x_1). \]
Application of sinc interpolation to $f_{(1),k}(x_2, \ldots, x_d)$ with respect to $x_2$ separates the $x_2$ dependence. The insertion into the previous sum yields
\[ f_{(2)}(x_1, \ldots, x_d) = \sum_{k_1=-N_1}^{N_1} \sum_{k_2=-N_2}^{N_2} f_{(2),k_1,k_2}(x_3, \ldots, x_d) \cdot S(k_1,h)(x_1) \cdot S(k_2,h)(x_2). \]
After $d-1$ steps we arrive at
\[ f_{(d-1)}(x_1, \ldots, x_d) = \sum_{k_1=-N_1}^{N_1} \cdots \sum_{k_{d-1}=-N_{d-1}}^{N_{d-1}} f_{(d-1),k_1,\ldots,k_{d-1}}(x_d) \prod_{j=1}^{d-1} S(k_j,h)(x_j), \]
which belongs to $\bigotimes_{j=1}^{d} U_j$, where $\dim(U_j) = 2N_j + 1$ $(1 \le j \le d-1)$, while $U_d$ spanned by all $f_{(d-1),k_1,\ldots,k_{d-1}}$ for $-N_j \le k_j \le N_j$ is high-dimensional. The last step yields
\[ f_{(d)}(x_1, x_2, \ldots, x_d) = \sum_{k_1=-N_1}^{N_1} \cdots \sum_{k_d=-N_d}^{N_d} f_{(d),k_1,\ldots,k_d} \prod_{j=1}^{d} S(k_j,h)(x_j) \in \mathcal{T}_{\mathbf{r}}, \]
where $\mathbf{r} = (2N_1+1, \ldots, 2N_d+1)$ is the tensor subspace rank.
10.5 Simultaneous Approximation
Let $v_1, \ldots, v_m \in \mathbf{V} = {}_a\!\bigotimes_{j=1}^{d} V_j$ be $m$ tensors. As in Problem (10.12) we want to approximate all $v_i$ by $u_i \in \mathcal{T}_{\mathbf{r}}$, however, the involved subspace $\mathbf{U} = \bigotimes_{j=1}^{d} U_j$ with $\dim(U_j) = r_j$ should be the same for all $u_i$. More precisely, we are looking for the minimisers $u_i$ of the following minimisation problem:
\[ \inf_{\substack{U_1 \subset V_1 \text{ with} \\ \dim(U_1)=r_1}} \; \inf_{\substack{U_2 \subset V_2 \text{ with} \\ \dim(U_2)=r_2}} \cdots \inf_{\substack{U_d \subset V_d \text{ with} \\ \dim(U_d)=r_d}} \; \inf_{u_i \in \bigotimes_{j=1}^{d} U_j} \left\{ \sum_{i=1}^{m} \omega_i^2 \|v_i - u_i\|^2 \right\}, \tag{10.42} \]
where $\omega_i^2 > 0$ are suitable weights. Such a problem arises for matrices (i.e., $d = 2$), e.g., in the construction of $\mathcal{H}^2$-matrices (cf. [138, §8]).
We refer to Lemma 3.27: a tuple $(v_1, \ldots, v_m) \in \mathbf{V}^m$ may be considered as a tensor of $\mathbf{W} = {}_a\!\bigotimes_{j=1}^{d+1} V_j$ with the $(d+1)$-th vector space $V_{d+1} := \mathbb{K}^m$. The Hilbert structure is discussed in the next remark.
Remark 10.44. Let $\mathbf{V} = {}_a\!\bigotimes_{j=1}^{d} V_j$ be a Hilbert space with scalar product $\langle\cdot,\cdot\rangle_{\mathbf{V}}$, while $\mathbb{K}^m$ is endowed with the scalar product
\[ \langle x, y \rangle_{d+1} := \sum_{i=1}^{m} \omega_i^2\, x_i \overline{y_i}. \]
Then $\mathbf{V}^m$ is isomorphic and isometric to
\[ \mathbf{W} = {}_a\!\bigotimes_{j=1}^{d+1} V_j \qquad \text{with } V_{d+1} := \mathbb{K}^m. \]
Tuples $(v_1, \ldots, v_m) \in \mathbf{V}^m$ are written as tensors $w := \sum_{i=1}^{m} v_i \otimes e^{(i)} \in \mathbf{W}$ ($e^{(i)}$: unit vectors, cf. (2.2)) with the following induced scalar product and norm:
\[ \langle w, w' \rangle = \sum_{i=1}^{m} \omega_i^2 \langle v_i, v_i' \rangle_{\mathbf{V}}, \qquad \|w\| = \sqrt{\sum_{i=1}^{m} \omega_i^2 \|v_i\|_{\mathbf{V}}^2}. \]
Hence Problem (10.42) is equivalent to
\[ \inf_{\substack{U_1 \subset V_1 \text{ with} \\ \dim(U_1)=r_1}} \; \inf_{\substack{U_2 \subset V_2 \text{ with} \\ \dim(U_2)=r_2}} \cdots \inf_{\substack{U_d \subset V_d \text{ with} \\ \dim(U_d)=r_d}} \; \inf_{u \in \bigotimes_{j=1}^{d+1} U_j} \|w - u\|^2, \]
where the last subspace $U_{d+1} = V_{d+1} = \mathbb{K}^m$ has full dimension. This shows the equivalence to the basic Problem (10.12) (but with $d$ replaced by $d+1$, $V_{d+1} = \mathbb{K}^m$, and $r_{d+1} = m$). The statements about the existence of a minimiser of (10.12) can be transferred to statements about Problem (10.42).
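An important application is the simultaneous approximation problem for matrices $v_i = M_i \in \mathbb{K}^{I \times J}$ $(1 \le i \le m)$, where
\[ \sum_{i=1}^{m} \omega_i^2 \|M_i - R_i\|_F^2 \]
has to be minimised with respect to $R_i \in U_1 \otimes U_2$ with the side conditions $\dim(U_1) = r_1$ and $\dim(U_2) = r_2$. Here $\mathbf{W} = \mathbb{K}^I \otimes \mathbb{K}^J \otimes \mathbb{K}^m$ is the underlying tensor space. The HOSVD bases of $\mathbb{K}^I$ and $\mathbb{K}^J$ result from the matrices $U^{(1)}$ and $U^{(2)}$ of the left-sided singular-value decompositions LSVD$(I, m\#J, r_1, \mathcal{M}_1(w), U^{(1)}, \Sigma^{(1)})$ and LSVD$(J, m\#I, r_2, \mathcal{M}_2(w), U^{(2)}, \Sigma^{(2)})$:
\[ \mathcal{M}_1(w) = [\omega_1 M_1, \omega_2 M_2, \ldots, \omega_m M_m] = U^{(1)} \Sigma^{(1)} V^{(1)\mathsf{T}} \in \mathbb{K}^{I \times (J \times m)}, \]
\[ \mathcal{M}_2(w) = [\omega_1 M_1^{\mathsf{T}}, \omega_2 M_2^{\mathsf{T}}, \ldots, \omega_m M_m^{\mathsf{T}}] = U^{(2)} \Sigma^{(2)} V^{(2)\mathsf{T}} \in \mathbb{K}^{J \times (I \times m)}. \]
Equivalently, we must perform the diagonalisations
\[ \mathcal{M}_1(w)\mathcal{M}_1(w)^{\mathsf{T}} = \sum_{i=1}^{m} \omega_i^2 M_i M_i^{\mathsf{T}} = U^{(1)} (\Sigma^{(1)})^2 U^{(1)\mathsf{T}} \in \mathbb{K}^{I \times I}, \]
\[ \mathcal{M}_2(w)\mathcal{M}_2(w)^{\mathsf{T}} = \sum_{i=1}^{m} \omega_i^2 M_i^{\mathsf{T}} M_i = U^{(2)} (\Sigma^{(2)})^2 U^{(2)\mathsf{T}} \in \mathbb{K}^{J \times J}. \]
The HOSVD basis of $\mathbb{K}^m$ is not needed since we do not want to introduce a strictly smaller subspace. The HOSVD projection from Remark 10.1 takes the special form
\[ R_k := P^{(1)}_{r_1} M_k P^{(2)}_{r_2}, \qquad \text{where } P^{(i)}_{r_i} = \sum_{\nu=1}^{r_i} u^{(i)}_{\nu} u^{(i)\mathsf{H}}_{\nu} \quad (i = 1, 2) \]
uses the $\nu$-th column $u^{(i)}_{\nu}$ of $U^{(i)}$. The error estimate (10.4) becomes
\[ \sum_{i=1}^{m} \omega_i^2 \|M_i - R_i\|_F^2 = \|w - u_{\mathrm{HOSVD}}\|^2 \le \sum_{j=1}^{2} \sum_{i=r_j+1}^{n_j} \bigl(\sigma^{(j)}_i\bigr)^2 \le 2\,\|w - u_{\mathrm{best}}\|^2 = 2 \sum_{i=1}^{m} \omega_i^2 \bigl\|M_i - R_i^{\mathrm{best}}\bigr\|_F^2 \]
with $n_1 := \#I$ and $n_2 := \#J$. Here we have made use of Corollary 10.3 because $r_3 = n_3 := m$.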
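The diagonalisation route can be illustrated for the smallest nontrivial case $r_1 = r_2 = 1$ with real matrices. The following sketch makes several assumptions of its own: the two $2 \times 2$ matrices and the weights are invented test data, and the dominant eigenvector of each Gram matrix is obtained by a plain power iteration rather than by a library SVD. It forms $R_k = P^{(1)} M_k P^{(2)}$ and compares the weighted error with the sum of the discarded squared singular values.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def weighted_gram(mats, weights, left=True):
    """sum_i w_i^2 M_i M_i^T (left=True) or sum_i w_i^2 M_i^T M_i (left=False)."""
    n = len(mats[0]) if left else len(mats[0][0])
    G = [[0.0] * n for _ in range(n)]
    for M, w in zip(mats, weights):
        P = matmul(M, transpose(M)) if left else matmul(transpose(M), M)
        for a in range(n):
            for b in range(n):
                G[a][b] += w * w * P[a][b]
    return G

def dominant_eigenpair(G, iterations=500):
    """Power iteration for the largest eigenvalue of a symmetric positive semidefinite matrix."""
    n = len(G)
    u = [1.0] * n
    for _ in range(iterations):
        v = [sum(G[a][b] * u[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    lam = sum(u[a] * sum(G[a][b] * u[b] for b in range(n)) for a in range(n))
    return lam, u

def frobenius_sq(A):
    return sum(x * x for row in A for x in row)

# Invented test data (assumption): m = 2 matrices with weights omega_i.
mats = [[[3.0, 1.0], [1.0, 0.0]], [[2.0, 0.0], [1.0, 1.0]]]
weights = [1.0, 0.5]

G1 = weighted_gram(mats, weights, left=True)    # M_1(w) M_1(w)^T
G2 = weighted_gram(mats, weights, left=False)   # M_2(w) M_2(w)^T
lam1, u1 = dominant_eigenpair(G1)
lam2, u2 = dominant_eigenpair(G2)
P1 = [[a * b for b in u1] for a in u1]          # rank-1 projection u1 u1^T
P2 = [[a * b for b in u2] for a in u2]

error = 0.0
for M, w in zip(mats, weights):
    R = matmul(matmul(P1, M), P2)
    diff = [[m - r for m, r in zip(mr, rr)] for mr, rr in zip(M, R)]
    error += w * w * frobenius_sq(diff)

# Discarded squared singular values: trace minus largest eigenvalue of each Gram matrix.
tail1 = G1[0][0] + G1[1][1] - lam1
tail2 = G2[0][0] + G2[1][1] - lam2
print(error, tail1 + tail2)
```

For larger index sets one would replace the power iteration by a full symmetric eigenvalue solver and keep the leading $r_j$ eigenvectors.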
10.6 Résumé
The discussion of the two traditional formats (r-term and tensor subspace format) has shown that the analysis, as well as the numerical treatment, is far more complicated for $d \ge 3$ than for the matrix case $d = 2$. The main disadvantages are:
⊖ Truncation of $v \in \mathcal{R}_R$ to $u \in \mathcal{R}_r$ with smaller rank $r < R$ is not easy to perform. This nonlinear optimisation problem, which usually needs regularisation, not only leads to an involved numerical algorithm, but its result is also unreliable since a local minimum may be obtained and different starting values can lead to different optima.
⊖ There is no decomposition of $v \in \mathcal{R}_r$ into highly and less important terms, which could help for the truncation. In contrast, large terms may be negligible since they add up to a small tensor.
⊖ The disadvantage of the tensor subspace format is the data size $\prod_{j=1}^{d} r_j$ of the coefficient tensor. For larger $d$ and $r_j \ge r$, the exponential increase of $r^d$ leads to severe memory problems.
⊖ While the ranks $r_j$ of the tensor subspace format are bounded by $n_j = \dim(V_j)$, the upper bound of the tensor rank has exponential increase with respect to $d$ (cf. Lemma 3.45). Therefore the size of $r$ in $v \in \mathcal{R}_r$ may become problematic.
On the other hand, both formats have their characteristic advantages:
⊕ If the rank $r$ of $v \in \mathcal{R}_r$ is moderate, the representation of $v$ by the r-term format requires a rather small storage size, which is proportional to $r$, $d$, and the dimension of the involved vector spaces $V_j$.
⊕ The tensor subspace format together with the higher-order singular-value decompositions (HOSVD) allows a simple truncation to smaller ranks. For the important case of $d = 3$, the data size $\prod_{j=1}^{d} r_j$ is still tolerable.
Chapter 11
Hierarchical Tensor Representation
Abstract The hierarchical tensor representation (notation: $\mathcal{H}_{\mathbf{r}}$) makes it possible to keep the advantages of the subspace structure of the tensor subspace format $\mathcal{T}_{\mathbf{r}}$, but has only linear cost with respect to the order $d$ concerning storage and operations. The hierarchy mentioned in the name is given by a ‘dimension partition tree’. The fact that the tree is binary allows a simple application of the singular-value decomposition and enables an easy truncation procedure. After an introduction in Section 11.1, the algebraic structure of the hierarchical tensor representation is described in Section 11.2. While the algebraic representation uses subspaces, the concrete representation in Section 11.3 introduces frames or bases and the associated coefficient matrices in the hierarchy. Again, higher-order singular-value decompositions (HOSVD) can be applied and the left singular vectors can be used as a basis. In Section 11.4, the approximation in the $\mathcal{H}_{\mathbf{r}}$ format is studied with respect to two aspects. First, the best approximation within $\mathcal{H}_{\mathbf{r}}$ can be considered. Second, the HOSVD bases allow a quasi-optimal truncation. Section 11.5 discusses the joining of two representations. This important feature is needed if two tensors described by two different hierarchical tensor representations require a common representation.
11.1 Introduction
11.1.1 Hierarchical Structure
In the following construction we want to keep the positive properties of the tensor subspace representation while avoiding the exponential increase of the coefficient tensor. The dimension of the subspaces $U_j \subset V_j$ is bounded by $r_j$, but their $d$-fold tensor product is again high-dimensional. In the approach of the hierarchical tensor format we repeat the concept of tensor subspaces on higher levels: we do not form tensor products of all $U_j$, but again choose subspaces of pairs of subspaces so that the dimension is reduced. The recursive use of the subspace idea leads to a certain tree structure describing a hierarchy of subspaces. In particular, we shall use binary trees since this fact will allow us to apply standard singular-value decompositions to obtain HOSVD bases and to apply HOSVD truncations.
© Springer Nature Switzerland AG 2019 W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics 56, https://doi.org/10.1007/978-3-030-35554-8_11
The previous r-term and tensor subspace representations are invariant with respect to the ordering of the spaces $V_j$. This is different for the hierarchical format.$^{1}$ Before giving a strict definition, we illustrate the idea by examples.
We start with the case of $d = 4$, where $\mathbf{V} = \bigotimes_{j=1}^{4} V_j$. In §3.2.4 a tensor space of order 4 has been introduced as
\[ ((V_1 \otimes V_2) \otimes V_3) \otimes V_4 \]
using the definition of binary tensor products. However, the order of binary tensor products can be varied. We may first introduce the spaces
\[ V_{12} := V_1 \otimes V_2 \qquad \text{and} \qquad V_{34} := V_3 \otimes V_4 \]
and then $V_{12} \otimes V_{34} = (V_1 \otimes V_2) \otimes (V_3 \otimes V_4) \cong V_1 \otimes V_2 \otimes V_3 \otimes V_4 = \mathbf{V}$. This approach is visualised in Fig. 11.1.
[Fig. 11.1 Hierarchy of spaces: the root V has the sons V12 and V34; V12 has the sons V1 and V2; V34 has the sons V3 and V4.]
Following the tensor subspace idea, we introduce subspaces for all spaces in Fig. 11.1. This leads to the following construction:
\[ \begin{array}{c} \mathbf{U}_{\{1,2,3,4\}} \subset \mathbf{U}_{\{1,2\}} \otimes \mathbf{U}_{\{3,4\}} \subset \mathbf{V} \\[2pt] \mathbf{U}_{\{1,2\}} \subset U_1 \otimes U_2 \qquad\qquad \mathbf{U}_{\{3,4\}} \subset U_3 \otimes U_4 \\[2pt] U_1 \subset V_1 \qquad U_2 \subset V_2 \qquad U_3 \subset V_3 \qquad U_4 \subset V_4 \end{array} \tag{11.1} \]
The tensor to be represented must be contained in the upper subspace $\mathbf{U}_{\{1,2,3,4\}}$ (root of the tree). Assume that there are suitable subspaces $U_j$ $(1 \le j \le 4)$ of dimension $r$. Then the tensor products $U_1 \otimes U_2$ and $U_3 \otimes U_4$ have the increased dimension $r^2$. The hope is to find again subspaces $\mathbf{U}_{\{1,2\}}$ and $\mathbf{U}_{\{3,4\}}$ of smaller dimension, say, $r$. In this way the exponential increase of the dimension could be avoided.
The construction by (11.1) is still invariant with respect to permutations $1 \leftrightarrow 2$, $3 \leftrightarrow 4$, $\{1,2\} \leftrightarrow \{3,4\}$; however, the permutation $2 \leftrightarrow 3$ yields another tree.
A perfectly balanced tree as in Fig. 11.1 requires that $d = 2^L$, where $L$ is the depth of the tree. The next example $d = 7$, i.e., $\mathbf{V} = \bigotimes_{j=1}^{7} V_j$, shows a possible construction for $\mathbf{V}$ by binary tensor products:
\[ \begin{array}{ll} \text{level 0} & \mathbf{V} \\ \text{level 1} & V_{123} \qquad V_{4567} \\ \text{level 2} & V_1 \quad V_{23} \qquad V_{45} \quad V_{67} \\ \text{level 3} & V_2 \quad V_3 \qquad V_4 \quad V_5 \quad V_6 \quad V_7 \end{array} \tag{11.2} \]
Here $V_{123}$ has the sons $V_1$ and $V_{23}$, $V_{4567}$ has the sons $V_{45}$ and $V_{67}$, and $V_{23}$, $V_{45}$, $V_{67}$ have the sons $V_2, V_3$ and $V_4, V_5$ and $V_6, V_7$, respectively.
1 Instead of ‘hierarchical tensor format’ also the term ‘H-Tucker format’ is used (cf. [121], [200]), since the subspace idea of the Tucker format is repeated recursively. This term may include trees which are not necessarily binary. Then the tensor subspace format (Tucker format) is a special case of the H-Tucker format.
Again, the hierarchical format replaces the full spaces with subspaces.
The position of $\mathbf{V}$ in the tree is called the root. The factors of a binary tensor product appear in the tree as two sons, e.g., $V_1$ and $V_{23}$ are the sons of $V_{123}$ in the last example. Equivalently, $V_{123}$ is called the father of $V_1$ and $V_{23}$. The spaces $V_j$, which cannot be decomposed further, are the leaves of the tree. The root is associated with the level 0. The other levels are defined recursively: sons of a father at level $\ell$ have the level number $\ell + 1$.
The fact that we choose a binary tree (i.e., the number of sons is either 2 or 0) is essential since we want to apply matrix techniques. If we decompose tensor products of order $d$ recursively in $\frac{d}{2} + \frac{d}{2}$ factors for even $d$ and in $\frac{d-1}{2} + \frac{d+1}{2}$ factors for odd $d$, it follows easily that$^{2}$
\[ L := \lceil \log_2 d \rceil \tag{11.3} \]
is the largest level number (depth of the tree).
2 $\lceil x \rceil$ is the integer with $x \le \lceil x \rceil < x + 1$.
Quite another division strategy is the splitting into $1 + (d-1)$ factors. The latter example $\bigotimes_{j=1}^{7} V_j$ leads to the partition tree $T_D^{\mathrm{TT}}$ depicted in Fig. 11.2. In this case the depth of the tree is maximal:
\[ L := d - 1. \tag{11.4} \]
[Fig. 11.2 Linear tree $T_D^{\mathrm{TT}}$: {1,2,3,4,5,6,7} has the sons {1,2,3,4,5,6} and {7}; {1,2,3,4,5,6} has the sons {1,2,3,4,5} and {6}; and so on down to {1,2} with the sons {1} and {2}.]
Another derivation of the hierarchical representation can be connected with the minimal subspaces in §6. The fact that a tensor $v$ might have small $j$-th ranks gives rise to the standard tensor subspace format with $U_j = U_j^{\min}(v)$. However, minimal subspaces $\mathbf{U}_\alpha^{\min}(v)$ of dimension $r_\alpha$ are also defined for subsets $\emptyset \subsetneq \alpha \subsetneq D := \{1, \ldots, d\}$ (cf. §6.4). For instance, $\mathbf{U}_{\{1,2\}}$ and $\mathbf{U}_{\{3,4\}}$ in (11.1) can be chosen as $\mathbf{U}^{\min}_{\{1,2\}}(v)$ and $\mathbf{U}^{\min}_{\{3,4\}}(v)$. As shown in (6.10), they are nested in the sense of $\mathbf{U}_{\{1,2\}} \subset U_1 \otimes U_2$, etc. Note that the last minimal subspace is one-dimensional: $\mathbf{U}_{\{1,2,3,4\}} = \mathbf{U}^{\min}_{\{1,2,3,4\}}(v) = \operatorname{span}\{v\}$.
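The two division strategies behind (11.3) and (11.4) can be compared with a few lines of Python. This is a minimal sketch; the tuple-based tree encoding and the function names are choices made here for illustration.

```python
def balanced_tree(alpha):
    """Split alpha recursively into two parts of sizes floor(n/2) and ceil(n/2)."""
    if len(alpha) == 1:
        return (alpha, [])
    mid = len(alpha) // 2
    return (alpha, [balanced_tree(alpha[:mid]), balanced_tree(alpha[mid:])])

def linear_tree(alpha):
    """Split alpha recursively into (n - 1) + 1 factors, as in the linear tree."""
    if len(alpha) == 1:
        return (alpha, [])
    return (alpha, [linear_tree(alpha[:-1]), linear_tree(alpha[-1:])])

def depth(tree):
    """Largest level number; the root has level 0."""
    _, sons = tree
    return 0 if not sons else 1 + max(depth(s) for s in sons)

def ceil_log2(d):
    """Integer ceiling of log2(d) for d >= 1, avoiding floating point."""
    return (d - 1).bit_length()

for d in (4, 7, 16):
    D = tuple(range(1, d + 1))
    print(d, depth(balanced_tree(D)), depth(linear_tree(D)))
```

For $d = 7$ the balanced splitting reproduces the tree of (11.2), with $\{1,2,3\}$ split into $\{1\}$ and $\{2,3\}$.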
11.1.2 Properties
We give a preview of the favourable features of the actual approach.
1. Representations in the formats $\mathcal{R}_r$ (cf. §11.2.4.2, §11.3.6) or $\mathcal{T}_{\mathbf{r}}$ (cf. §11.2.4.1) can be converted into hierarchical format with similar storage cost. Later this will be shown for other formats (sparse-grid representation, TT representation). As a consequence, the approximability by the hierarchical format is at least as good as by the aforementioned formats.
2. The cost is strictly linear in the order $d$ of the tensor space $\bigotimes_{j=1}^{d} V_j$. This is in contrast to $\mathcal{T}_{\mathbf{r}}$, where the coefficient tensor causes problems for higher $d$. These statements hold under the assumption that the rank parameters stay bounded when $d$ varies. If, however, the rank parameters increase with $d$, all formats have problems.
3. The binary tree structure allows us to compute all approximations by singular-value decompositions. This is essential for the truncation procedure.
4. The actual representation may be much cheaper than a representation within $\mathcal{R}_r$ or $\mathcal{T}_{\mathbf{r}}$ (see next Example 11.1).
Example 11.1. Consider the Hilbert tensor space $L^2([0,1]^d) = {}_{L^2}\!\bigotimes_{j=1}^{d} L^2([0,1])$ for $d = 4$ and the particular function
\[ f(x_1, x_2, x_3, x_4) = P_1(x_1, x_2) \cdot P_2(x_3, x_4), \]
where $P_1$ and $P_2$ are polynomials of degree $p$.
(a) Realisation in hierarchical format. The example is such that the dimension partition tree $T_D$ from Fig. 11.3 is optimal. Writing $P_1(x_1, x_2)$ as
\[ \sum_{i=1}^{p+1} P_{1,i}(x_1)\, x_2^{i-1} = \sum_{i=1}^{p+1} P_{1,i}(x_1) \otimes x_2^{i-1}, \]
we see that $\dim(U_1) = \dim(U_2) = p + 1$ is sufficient to have $P_1(x_1, x_2) \in U_1 \otimes U_2$. Similarly, $\dim(U_3) = \dim(U_4) = p + 1$ is enough to ensure $P_2(x_3, x_4) \in U_3 \otimes U_4$. The subspaces $\mathbf{U}_{12}$ and $\mathbf{U}_{34}$ may be one-dimensional: $\mathbf{U}_{12} = \operatorname{span}\{P_1(x_1, x_2)\}$, $\mathbf{U}_{34} = \operatorname{span}\{P_2(x_3, x_4)\}$. The last subspace $\mathbf{U}_{14} = \operatorname{span}\{f\}$ is one-dimensional anyway. Hence the highest dimension is $r_j = p + 1$ for $1 \le j \le 4$.
(b) Realisation in $\mathcal{T}_{\mathbf{r}}$. The subspaces $U_j$ coincide with $U_j$ from above so that $r_j = \dim(U_j) = p + 1$. Hence $\mathbf{U} = \bigotimes_{j=1}^{4} U_j$ has dimension $(p+1)^d = (p+1)^4$.
(c) Realisation in $\mathcal{R}_r$. Both polynomials $P_1(x_1, x_2) = \sum_{i=1}^{p+1} P_{1,i}(x_1)\, x_2^{i-1}$ and $P_2(x_3, x_4) = \sum_{i=1}^{p+1} P_{2,i}(x_3)\, x_4^{i-1}$ have $p + 1$ terms. Hence their product has $r = (p+1)^2$ terms.
The background of this example is that the formats $\mathcal{T}_{\mathbf{r}}$ and $\mathcal{R}_r$ are symmetric in the treatment of the factors $V_j$ in $\mathbf{V} = \bigotimes_{j=1}^{d} V_j$. The simple structure $f = P_1 \otimes P_2$ (while $P_1$ and $P_2$ are non-elementary tensors) cannot be mapped into the $\mathcal{T}_{\mathbf{r}}$ or $\mathcal{R}_r$ structure. We may extend the example to higher $d$ by choosing $f = \bigotimes_{j=1}^{d/2} P_j$, where $P_j = P_j(x_{2j-1}, x_{2j})$. Then the cost of the $\mathcal{T}_{\mathbf{r}}$ and $\mathcal{R}_r$ formats is exponential in $d$ (more precisely, $(p+1)^d$ for $\mathcal{T}_{\mathbf{r}}$ and $r = (p+1)^{d/2}$ for $\mathcal{R}_r$), while for the hierarchical format the dimensions are $r_j = p + 1$ $(1 \le j \le d)$ and $r_\alpha = 1$ for all other subsets $\alpha$ appearing in the tree.
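To get a feeling for the sizes in Example 11.1 and its extension to higher $d$, the counts stated above can be tabulated. The function name and the dictionary keys below are hypothetical; the formulas are exactly those quoted in the text.

```python
def subspace_dimensions(p, d):
    """Dimension counts for f = P_1 (x) ... (x) P_{d/2} with polynomials of degree p."""
    if d % 2 != 0:
        raise ValueError("the construction pairs the variables, so d must be even")
    return {
        "tensor_subspace": (p + 1) ** d,     # dimension of U_1 (x) ... (x) U_d in T_r
        "r_term": (p + 1) ** (d // 2),       # number of terms r in R_r
        "hierarchical_max": p + 1,           # largest dim(U_alpha) in the tree
    }

for d in (4, 8, 16):
    print(d, subspace_dimensions(3, d))
```

The first two entries grow exponentially with $d$, while the largest subspace dimension in the hierarchical format stays at $p + 1$.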
11.1.3 Historical Comments
The hierarchical idea has also appeared as ‘sequential unfolding’ of a tensor. For instance, $v \in \mathbb{K}^{n \times n \times n \times n}$ can be split by a singular-value decomposition into
\[ v = \sum_{\nu} e_\nu \otimes f_\nu \qquad \text{with } e_\nu, f_\nu \in \mathbb{K}^{n \times n}. \]
Again, each $e_\nu, f_\nu$ has a decomposition
\[ e_\nu = \sum_{\mu} a_{\nu,\mu} \otimes b_{\nu,\mu} \qquad \text{and} \qquad f_\nu = \sum_{\lambda} c_{\nu,\lambda} \otimes d_{\nu,\lambda} \]
with vectors $a_{\nu,\mu}, \ldots, d_{\nu,\lambda} \in \mathbb{K}^n$. Together, we obtain
\[ v = \sum_{\nu,\mu,\lambda} a_{\nu,\mu} \otimes b_{\nu,\mu} \otimes c_{\nu,\lambda} \otimes d_{\nu,\lambda}. \]
The required data size is $4r^2 n$, where $r \le n$ is the maximal size of the index sets for $\nu, \mu, \lambda$. In the general case of $v \in \bigotimes_{j=1}^{d} \mathbb{K}^n$ with $d = 2^L$, the required storage is $d\,r^L n$. Such an approach is considered in Khoromskij [182, §2.2] and repeated in Salmi–Richter–Koivunen [252]. The major drawback is that singular-value decompositions are to be computed for extremely large matrices.
The remedy is to require that all $e_\nu$ from above belong simultaneously to one $r$-dimensional subspace ($\mathbf{U}_{12}$ in (11.1)). The author has borrowed this idea from a similar approach, leading to the $\mathcal{H}^2$ technique of hierarchical matrices (cf. Hackbusch [138, §8], Börm [37]).
The hierarchical tensor format was described in 2009 in Hackbusch–Kühn [152]. An additional analysis is given by Grasedyck [120]. However, the matrix-product variant was mentioned earlier by Vidal [297, Eq. (5)] in the quantum computing community. In quantum chemistry MCTDH abbreviates ‘multi-configuration time-dependent Hartree’ (cf. Meyer et al. [226]). In Wang–Thoss [300] a multilayer formulation of the MCTDH theory was presented which might contain the idea of a hierarchical format. At least Lubich [220, p. 45] has translated the very specific quantum chemistry language of that paper into a mathematical formulation using the key construction (11.20) of the hierarchical tensor representation.
A closely related method is the matrix product system which is the subject of the next chapter (§12).
11.2 Basic Definitions
11.2.1 Dimension Partition Tree
We consider the tensor space$^{3}$
\[ \mathbf{V} = {}_a\!\bigotimes_{j \in D} V_j \]
with a finite index set $D = \{1, \ldots, d\}$. To avoid trivial cases, we assume $d \ge 2$.
Definition 11.2 (dimension partition tree). The tree $T_D$ is called a dimension partition tree (of $D$) if
1) all vertices$^{4}$ $\alpha \in T_D$ are nonempty subsets of $D$,
2) $D$ is the root of $T_D$,
3) every vertex $\alpha \in T_D$ with $\#\alpha \ge 2$ has two sons $\alpha_1, \alpha_2 \in T_D$ such that
\[ \alpha = \alpha_1 \cup \alpha_2, \qquad \alpha_1 \cap \alpha_2 = \emptyset. \]
The set of sons of $\alpha$ is denoted by $S(\alpha)$. If $S(\alpha) = \emptyset$, $\alpha$ is called a leaf. The set of leaves is denoted by $\mathcal{L}(T_D)$.
[Fig. 11.3 Dimension partition tree: the root D = {1,2,3,4} has the sons {1,2} and {3,4}; {1,2} has the sons {1} and {2}; {3,4} has the sons {3} and {4}.]
The tree $T_D$ corresponding to (11.1) is illustrated in Fig. 11.3. The numbers $1, \ldots, d$ are chosen according to Remark 11.4.
As mentioned in §11.1, the level numbers of the vertices are defined recursively by
\[ \operatorname{level}(D) = 0, \qquad \sigma \in S(\alpha) \;\Rightarrow\; \operatorname{level}(\sigma) = \operatorname{level}(\alpha) + 1. \tag{11.5} \]
The depth of the tree defined below is often abbreviated by $L$:
\[ L := \operatorname{depth}(T_D) := \max\{\operatorname{level}(\alpha) : \alpha \in T_D\}. \tag{11.6} \]
Occasionally, the following level-wise decomposition of the tree $T_D$ is of interest:
\[ T_D^{(\ell)} := \{\alpha \in T_D : \operatorname{level}(\alpha) = \ell\} \qquad (0 \le \ell \le L). \tag{11.7} \]
Remark 11.3. Easy consequences of Definition 11.2 are:
(a) $T_D$ is a binary tree,
(b) The set of leaves, $\mathcal{L}(T_D)$, consists of all singletons of $D$:
\[ \mathcal{L}(T_D) = \{\{j\} : j \in D\}. \]
(c) The number of vertices in $T_D$ is $2\#D - 1 = 2d - 1$.
3 The tensors represented next belong to the algebraic tensor space $\mathbf{V}_{\mathrm{alg}} = {}_a\!\bigotimes_{j \in D} V_j$. Since $\mathbf{V}_{\mathrm{alg}} \subset \mathbf{V}_{\mathrm{top}} = {}_{\|\cdot\|}\!\bigotimes_{j \in D} V_j$, they may also be seen as elements of $\mathbf{V}_{\mathrm{top}}$.
4 Elements of a tree are called ‘vertices’.
Considering $D$ as a set, no ordering is prescribed. A total ordering not only of $D$ but also of all vertices of the tree $T_D$ can be defined as follows.
Remark 11.4 (ordering of $T_D$). (a) Choose some ordering of the two elements in $S(\alpha)$ for any $\alpha \in T_D \setminus \mathcal{L}(T_D)$; i.e., there are a first son $\alpha_1$ and a second son $\alpha_2$ of $\alpha$. The ordering is denoted by $\alpha_1 < \alpha_2$. Then for two different $\beta, \gamma \in T_D$ there are the following cases:
(i) If $\beta, \gamma$ are disjoint, there is some $\alpha \in T_D$ with sons $\alpha_1 < \alpha_2$ such that $\beta \subset \alpha_1$ and $\gamma \subset \alpha_2$ [or $\gamma \subset \alpha_1$ and $\beta \subset \alpha_2$]. Then define $\beta < \gamma$ [or $\gamma < \beta$, respectively];
(ii) If $\beta, \gamma$ are not disjoint, either $\beta \subset \gamma$ or $\gamma \subset \beta$ must hold. Then define $\beta < \gamma$ or $\gamma < \beta$, respectively.
(b) Let $T_D$ be ordered and denote the elements of $D$ by $1, \ldots, d$ according to their ordering. The vertices $\alpha$ are of the form $\alpha = \{i \in D : i^{\alpha}_{\min} \le i \le i^{\alpha}_{\max}\}$ with bounds $i^{\alpha}_{\min}, i^{\alpha}_{\max}$. For the sons $\alpha_1 < \alpha_2$ of $\alpha$ we have
\[ i^{\alpha}_{\min} = i^{\alpha_1}_{\min} \le i^{\alpha_1}_{\max} = i^{\alpha_2}_{\min} - 1 < i^{\alpha_2}_{\max} = i^{\alpha}_{\max}. \]
Interchanging the ordering of $\alpha_1$ and $\alpha_2$ yields a permutation of $D$, but the hierarchical representation will be completely isomorphic (cf. Remark 11.20).
The notation $\alpha_1, \alpha_2 \in S(\alpha)$ or $S(\alpha) = \{\alpha_1, \alpha_2\}$ tacitly implies that $\alpha_1 < \alpha_2$. Taking the example (11.1) corresponding to $T_D$ from Fig. 11.3, we may define that the left son precedes the right one. Then the total ordering
\[ \{1\} < \{2\} < \{1,2\} < \{3\} < \{4\} < \{3,4\} < \{1,2,3,4\} \]
of $T_D$ results. In particular, we obtain the ordering $1 < 2 < 3 < 4$ of $D$, where we identify $j$ with $\{j\}$.
Remark 11.5. (a) The minimal depth of $T_D$ is $L = \lceil \log_2 d \rceil$ (cf. (11.3)), which is obtained under the additional condition
\[ |\#\alpha_1 - \#\alpha_2| \le 1 \qquad \text{for all } \alpha_1, \alpha_2 \in S(\alpha) \setminus \mathcal{L}(T_D). \]
(b) The maximal depth of $T_D$ is $L = d - 1$ (cf. (11.4)).
Different trees $T_D$ will lead to different formats and, in the case of approximations, to different approximation errors. Example 11.1 shows that for a given tensor there may be more and less appropriate trees.
We recall the notation $\mathbf{V}_\alpha := \bigotimes_{j \in \alpha} V_j$ for $\alpha \subset D$ (cf. (5.4)). For leaves $\alpha = \{j\}$ with $j \in D$, the latter definition reads $\mathbf{V}_{\{j\}} = V_j$. The matricisation $\mathcal{M}_\alpha$ denotes the isomorphism $\mathbf{V} \cong \mathbf{V}_\alpha \otimes \mathbf{V}_{\alpha^c}$, where $\alpha^c := D \setminus \alpha$ is the complement.
Definition 11.6 ($T_\alpha$). For any $\alpha \in T_D$ the subtree $T_\alpha := \{\beta \in T_D : \beta \subset \alpha\}$ is defined by the root $\alpha$ and the same sets $S(\beta)$ of sons as in $T_D$.
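11.2.2 Algebraic Characterisation, Hierarchical Subspace Family
In the case of $\mathcal{T}_{\mathbf{r}}$, the definition in (8.2) is an algebraic one, while the concrete tensor subspace representation $v = \rho_{\mathrm{TS}}\bigl(\mathbf{a}, (B_j)_{j=1}^{d}\bigr)$ uses bases and coefficients. Similarly, here we start with a definition based on subspace properties and later in §11.3.1 introduce bases and coefficient matrices.
Let a dimension partition tree $T_D$ together with a tensor $v \in \mathbf{V} = {}_a\!\bigotimes_{j \in D} V_j$ be given. The hierarchical representation of $v$ is characterised by finite-dimensional subspaces$^{5}$
\[ \mathbf{U}_\alpha \subset \mathbf{V}_\alpha := {}_a\!\bigotimes_{j \in \alpha} V_j \qquad \text{for all } \alpha \in T_D. \tag{11.8} \]
The basic assumptions on $\mathbf{U}_\alpha$ depend on the nature of the vertex $\alpha \in T_D$. Here we distinguish (a) the root $\alpha = D$, (b) leaves $\alpha \in \mathcal{L}(T_D)$, (c) non-leaf vertices $\alpha \in T_D \setminus \mathcal{L}(T_D)$ including the root.
• One aim of the construction is to obtain a subspace $\mathbf{U}_D$ at the root $D \in T_D$ such that
\[ v \in \mathbf{U}_D. \tag{11.9a} \]
Since $D \in T_D$ is not a leaf, condition (11.9c) must also hold.
• At a leaf $\alpha = \{j\} \in \mathcal{L}(T_D)$, $\mathbf{U}_\alpha = U_j$ is a subspace$^{6}$ of $V_j$ (same situation as in (8.2) for $\mathcal{T}_{\mathbf{r}}$):
\[ U_j \subset V_j \qquad \text{for all } j \in D, \text{ i.e., } \alpha = \{j\} \in \mathcal{L}(T_D). \tag{11.9b} \]
• For any vertex $\alpha \in T_D \setminus \mathcal{L}(T_D)$ with sons $\alpha_1, \alpha_2 \in S(\alpha)$, the subspace $\mathbf{U}_\alpha$ (cf. (11.8)) must be related to the subspaces $\mathbf{U}_{\alpha_1}$ and $\mathbf{U}_{\alpha_2}$ by the crucial nestedness property
\[ \mathbf{U}_\alpha \subset \mathbf{U}_{\alpha_1} \otimes \mathbf{U}_{\alpha_2} \qquad \text{for all } \alpha \in T_D \setminus \mathcal{L}(T_D),\ \alpha_1, \alpha_2 \in S(\alpha). \tag{11.9c} \]
Diagram (11.1) depicts these conditions for the tree $T_D$ from Fig. 11.3. Note that $\mathbf{U}_{\{1,2\}} \subset \mathbf{U}_{\{1\}} \otimes \mathbf{U}_{\{2\}}$ is a subspace, but not a tensor subspace (cf. Abstract in §8).
Remark 11.7. (a) Since the subspace $\mathbf{U}_D \subset \mathbf{V}$ must only satisfy that $v \in \mathbf{U}_D$ for a given tensor $v \in \mathbf{V}$, it is sufficient to define $\mathbf{U}_D$ by
\[ \mathbf{U}_D = \operatorname{span}\{v\}, \qquad \text{i.e., } \dim(\mathbf{U}_D) = 1. \tag{11.10} \]
(b) Another situation arises if a family $F \subset \mathbf{U} = \bigotimes_{j \in D} U_j$ of tensors has to be represented (cf. §6.2.3). Then requirement (11.9a) is replaced with $F \subset \mathbf{U}_D$. The minimal choice is $\mathbf{U}_D = \operatorname{span}(F)$.
5 We use the bold-face notation $\mathbf{U}_\alpha$ for tensor spaces, although for $\alpha \in \mathcal{L}(T_D)$, $\mathbf{U}_{\{j\}} = U_j$ is a subspace of the standard vector space $V_j$.
6 We identify the notations $V_j$ ($j \in D$) and $\mathbf{V}_\alpha = \mathbf{V}_{\{j\}}$ for $\alpha = \{j\} \in \mathcal{L}(T_D)$. Similar for $U_j = \mathbf{U}_{\{j\}}$. Concerning the relation of $\mathcal{L}(T_D)$ and $D$ see Remark 11.3b.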
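Definition 11.8 (hierarchical subspace family). (a) We call $\{\mathbf{U}_\alpha\}_{\alpha \in T_D}$ a hierarchical subspace family (associated with $\mathbf{V} = {}_a\!\bigotimes_{j \in D} V_j$) if $T_D$ is a dimension partition tree of $D$ and the subspaces $\mathbf{U}_\alpha$ satisfy (11.9b,c).
(b) We say that a tensor $v$ is represented by the hierarchical subspace family $\{\mathbf{U}_\alpha\}_{\alpha \in T_D}$ if $v \in \mathbf{U}_D$ (cf. (11.9a)).
Definition 11.9. A successor of $\alpha \in T_D$ is any $\sigma \in T_D$ with $\sigma \subset \alpha$. A set $\{\sigma_i\}$ of disjoint successors of $\alpha \in T_D$ is called complete if $\alpha = \dot{\bigcup}_i \sigma_i$.
For instance, in the case of the tree from Fig. 11.3, the set $\{\{1\}, \{2\}, \{3,4\}\}$ is a complete set of successors of $\{1,2,3,4\}$.
Lemma 11.10. Let $\{\mathbf{U}_\alpha\}_{\alpha \in T_D}$ be a hierarchical subspace family. For any $\alpha \in T_D$ and any complete set $\Sigma$ of successors of $\alpha$,
\[ \mathbf{U}_\alpha \subset \bigotimes_{\sigma \in \Sigma} \mathbf{U}_\sigma \subset \mathbf{V}_\alpha \tag{11.11} \]
holds with $\mathbf{V}_\alpha$ in (11.8).
Proof. The fundamental structure (11.9c) implies $\mathbf{U}_\alpha \subset \mathbf{U}_{\alpha_1} \otimes \mathbf{U}_{\alpha_2}$. The chain of inclusions can be repeated inductively to prove the first inclusion in (11.11). Since $\mathbf{U}_\sigma \subset \mathbf{V}_\sigma$ (cf. (11.8)), $\bigotimes_{\sigma \in \Sigma} \mathbf{U}_\sigma \subset \bigotimes_{\sigma \in \Sigma} \mathbf{V}_\sigma = \mathbf{V}_\alpha$ proves the last inclusion. □
The analogue of $\mathcal{T}_{\mathbf{r}}$ is the set $\mathcal{H}_{\mathbf{r}}$ which is defined next. We start from given (bounds of) dimensions
\[ \mathbf{r} := (r_\alpha)_{\alpha \in T_D} \in \mathbb{N}_0^{T_D}, \tag{11.12} \]
and consider subspaces $\mathbf{U}_\alpha \subset \mathbf{V}_\alpha$ with $\dim(\mathbf{U}_\alpha) \le r_\alpha$ for all $\alpha \in T_D$.
Definition 11.11 ($\mathcal{H}_{\mathbf{r}}$). Fix some tuple $\mathbf{r}$ in (11.12) and let $\mathbf{V} = {}_a\!\bigotimes_{k \in D} V_k$. Then $\mathcal{H}_{\mathbf{r}} = \mathcal{H}_{\mathbf{r}}(\mathbf{V}) \subset \mathbf{V}$ is the set$^{7}$
\[ \mathcal{H}_{\mathbf{r}} := \left\{ v \in \mathbf{V} : \begin{array}{l} \text{there is a hierarchical subspace family } \{\mathbf{U}_\alpha\}_{\alpha \in T_D} \\ \text{with } v \in \mathbf{U}_D \text{ and } \dim(\mathbf{U}_\alpha) \le r_\alpha \text{ for all } \alpha \in T_D \end{array} \right\}. \tag{11.13} \]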
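11.2.3 Minimal Subspaces
We recall the definition of the minimal subspace associated to $\alpha \subset D$ with complement $\alpha^c = D \setminus \alpha$:
\[ \mathbf{U}^{\min}_\alpha(v) = \Bigl\{ \varphi_{\alpha^c}(v) : \varphi_{\alpha^c} \in \bigotimes_{j \in \alpha^c} V_j' \Bigr\} \qquad \text{for all } \alpha \in T_D \setminus \{D\} \tag{11.14a} \]
(cf. (6.11)). A possible computation uses the matricisation $\mathcal{M}_\alpha(v)$ from Definition 5.3. The (left-sided) singular-value decomposition of $\mathcal{M}_\alpha(v)$ yields the data $\sigma_i^{(\alpha)}$ and $u_i^{(\alpha)}$ of $v = \sum_{i=1}^{r} \sigma_i^{(\alpha)} u_i^{(\alpha)} \otimes v_i^{(\alpha^c)}$ with $\sigma_1^{(\alpha)} \ge \ldots \ge \sigma_r^{(\alpha)} > 0$. Then
\[ \mathbf{U}^{\min}_\alpha(v) = \operatorname{span}\{u_i^{(\alpha)} : 1 \le i \le r\}. \tag{11.14b} \]
7 By analogy with (8.2) one might prefer $\dim(\mathbf{U}_\alpha) = r_\alpha$ instead of $\dim(\mathbf{U}_\alpha) \le r_\alpha$. The reason for the latter choice is the fact that, otherwise, without the conditions (11.15) $\mathcal{H}_{\mathbf{r}} = \emptyset$ may occur.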
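Since $\dim \mathbf{U}^{\min}_\alpha(v)$ equals the rank of the matricisation $\mathcal{M}_\alpha(v)$, the dimensions can be checked on a small example. The sketch below makes its own assumptions: the tensor is stored as a dictionary over index tuples, modes are numbered from 0, and the rank is computed by exact Gaussian elimination over the rationals instead of the singular-value decomposition mentioned above, because only the rank is needed here. The test tensor $v = A \otimes B$ with two invertible $2 \times 2$ matrices mimics the structure of Example 11.1.

```python
from fractions import Fraction
from itertools import product

def matricisation(v, shape, alpha):
    """Matrix with rows indexed by the modes in alpha, columns by the other modes."""
    d = len(shape)
    comp = [j for j in range(d) if j not in alpha]
    rows = list(product(*(range(shape[j]) for j in alpha)))
    cols = list(product(*(range(shape[j]) for j in comp)))
    M = []
    for r in rows:
        row = []
        for c in cols:
            idx = [0] * d
            for j, i in zip(alpha, r):
                idx[j] = i
            for j, i in zip(comp, c):
                idx[j] = i
            row.append(Fraction(v[tuple(idx)]))
        M.append(row)
    return M

def matrix_rank(M):
    """Exact rank by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, len(M)):
            factor = M[r][col] / M[rank][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[rank])]
        rank += 1
        if rank == len(M):
            break
    return rank

# Assumed test tensor of order 4: v[i1, i2, i3, i4] = A[i1][i2] * B[i3][i4].
A = [[1, 2], [3, 4]]
B = [[1, 0], [1, 1]]
shape = (2, 2, 2, 2)
v = {idx: A[idx[0]][idx[1]] * B[idx[2]][idx[3]] for idx in product(range(2), repeat=4)}

for alpha in [(0,), (0, 1), (0, 2)]:
    print(alpha, matrix_rank(matricisation(v, shape, alpha)))
```

The pairing of modes 0 and 1 gives rank 1, whereas the pairing of modes 0 and 2 gives the full rank 4, which illustrates why the choice of the tree matters.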
For $\alpha = \{j\} \in L(T_D)$, these subspaces coincide with the subspaces $U_j^{\min}(v)$ of the tensor subspace representation (cf. (6.8a)). For $\alpha \in T_D\setminus L(T_D)$, the subspaces fulfil the nestedness property
$$U^{\min}_\alpha(v) \subset U^{\min}_{\alpha_1}(v) \otimes U^{\min}_{\alpha_2}(v) \qquad (\alpha_1, \alpha_2 \text{ sons of } \alpha) \tag{11.14c}$$
as stated in (6.10). Next, we give a simple characterisation of the property $v \in \mathcal{H}_r$. Moreover, the subspace family $\{U_\alpha\}_{\alpha\in T_D}$ can be described explicitly.

Theorem 11.12. Let $v \in {}_a\bigotimes_{j\in D} V_j$. (a) $v$ belongs to $\mathcal{H}_r$ with $r := (r_\alpha)_{\alpha\in T_D} \in \mathbb{N}_0^{T_D}$ if and only if $\operatorname{rank}_\alpha(v) \le r_\alpha$ holds for all $\alpha \in T_D$. (b) $\{U^{\min}_\alpha(v)\}_{\alpha\in T_D}$ is a possible hierarchical subspace family. (c) Any other hierarchical subspace family $\{U_\alpha\}_{\alpha\in T_D}$ satisfies $U_\alpha \supset U^{\min}_\alpha(v)$.

Proof. (i) The $\alpha$-rank definition $\operatorname{rank}_\alpha(v) = \dim(U^{\min}_\alpha(v))$ (cf. (6.12)) and $\operatorname{rank}_\alpha(v) \le r_\alpha$ imply the condition $\dim(U_\alpha) \le r_\alpha$ for $U_\alpha := U^{\min}_\alpha(v)$ in (11.13). Property (11.14c) proves that $\{U_\alpha\}_{\alpha\in T_D}$ is a hierarchical subspace family.

(ii) Assume $v \in \mathcal{H}_r$ with some hierarchical subspace family $\{U_\alpha\}_{\alpha\in T_D}$. Fix some $\alpha \in T_D$ and choose a complete set $\Sigma$ of successors such that $\alpha \in \Sigma$. Statement (11.11) with $D$, instead of $\alpha$, shows that $v \in U_D \subset \bigotimes_{\sigma\in\Sigma} U_\sigma$. The definition of a minimal subspace implies $U^{\min}_\sigma(v) \subset U_\sigma$, in particular, $U^{\min}_\alpha(v) \subset U_\alpha$ for the fixed but arbitrary vertex $\alpha \in T_D$. Hence the inequality $\operatorname{rank}_\alpha(v) = \dim(U^{\min}_\alpha(v)) \le \dim(U_\alpha) \le r_\alpha$ proves the reverse direction of the theorem. □

Definition 11.13. Let $v \in {}_a\bigotimes_{j\in D} V_j$. (a) The tuple $r = (r_\alpha)_{\alpha\in T_D}$ with $r_\alpha := \operatorname{rank}_\alpha(v)$ is called the hierarchical rank of $v$. (b) If $v$ is represented by a hierarchical subspace family $\{U_\alpha\}_{\alpha\in T_D}$, $r = (r_\alpha)_{\alpha\in T_D}$ with $r_\alpha := \dim(U_\alpha)$ is called the hierarchical representation rank of $v$.

Remark 11.14. Let $\alpha \in T_D\setminus\{D\}$ be a vertex with sons $\alpha_1, \alpha_2 \in S(\alpha)$ so that $U^{\min}_\alpha(v) \subset U^{\min}_{\alpha_1}(v) \otimes U^{\min}_{\alpha_2}(v)$. Then $U^{\min}_{\alpha_i}(v)$ can be interpreted differently:
$$U^{\min}_{\alpha_i}(v) = U^{\min}_{\alpha_i}(F) \quad\text{for } F := U^{\min}_\alpha(v),\ i = 1, 2 \quad\text{(cf. (6.8c))}.$$
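Proof. $U^{\min}_\alpha(v)$ is the span of all $u_i^{(\alpha)}$ appearing in the reduced singular-value decomposition $v = \sum_{i=1}^{r_\alpha} \sigma_i^{(\alpha)} u_i^{(\alpha)} \otimes v_i^{(\alpha^c)}$. Hence $U^{\min}_{\alpha_1}(F) = \sum_{i=1}^{r_\alpha} U^{\min}_{\alpha_1}(u_i^{(\alpha)})$ holds. The singular-value decomposition of each $u_i^{(\alpha)}$ by
$$u_i^{(\alpha)} = \sum_{j=1}^{r} \tau_j^{(i)}\, a_j^{(i)} \otimes b_j^{(i)} \quad\text{with } a_j^{(i)} \in V_{\alpha_1},\ b_j^{(i)} \in V_{\alpha_2},\ \tau_1^{(i)} \ge \ldots \ge \tau_r^{(i)} > 0$$
yields $U^{\min}_{\alpha_1}(u_i^{(\alpha)}) = \operatorname{span}\{a_j^{(i)} : 1 \le j \le r\}$ so that
$$U^{\min}_{\alpha_1}(F) = \operatorname{span}\{a_j^{(i)} : 1 \le j \le r,\ 1 \le i \le r_\alpha\}.$$
There are functionals $\beta_j^{(i)} \in V_{\alpha_2}'$ with $\beta_j^{(i)}(b_k^{(i)}) = \delta_{jk}$ (cf. (2.1)). As stated in (11.14a), $u_i^{(\alpha)} = \varphi_{\alpha^c}(v)$ holds for some $\varphi_{\alpha^c} \in V_{\alpha^c}'$. The functional $\varphi := \beta_j^{(i)} \otimes \varphi_{\alpha^c}$ belongs to $V_{\alpha_1^c}'$ (note that $\alpha_1^c = \alpha_2 \cup \alpha^c$). It follows from (11.14a) that
$$\varphi(v) = \beta_j^{(i)}(u_i^{(\alpha)}) = \sum_{k=1}^{r} \tau_k^{(i)}\, \beta_j^{(i)}(b_k^{(i)})\, a_k^{(i)} = \tau_j^{(i)} a_j^{(i)} \in U^{\min}_{\alpha_1}(v).$$
Hence $a_j^{(i)} \in U^{\min}_{\alpha_1}(v)$ for all $i, j$, i.e., $U^{\min}_{\alpha_1}(F) \subset U^{\min}_{\alpha_1}(v)$. On the other hand, $U^{\min}_{\alpha_1}(v) \subset U^{\min}_{\alpha_1}(F)$ follows from $v = \sum_{i=1}^{r_\alpha} \sum_{j=1}^{r} \tau_j^{(i)} \sigma_i^{(\alpha)}\, a_j^{(i)} \otimes b_j^{(i)} \otimes v_i^{(\alpha^c)}$. □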
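Finally, we consider the reverse setting: we specify dimensions $r_\alpha$ and construct a tensor $v$ such that $r_\alpha = \operatorname{rank}_\alpha(v) = \dim(U^{\min}_\alpha(v))$. The following restrictions are necessary:
$$\begin{array}{ll} r_\alpha \le r_{\alpha_1} r_{\alpha_2},\quad r_{\alpha_1} \le r_{\alpha_2} r_\alpha,\quad r_{\alpha_2} \le r_{\alpha_1} r_\alpha & \text{for } \alpha \in T_D\setminus L(T_D), \\ r_\alpha \le \dim(V_j) & \text{for } \alpha = \{j\} \in L(T_D), \\ r_D = 1 & \text{for } \alpha = D, \end{array} \tag{11.15}$$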
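where $\alpha_1, \alpha_2$ are the sons of $\alpha$. The first line follows from (6.13b) and Corollary 6.21a (note that $\operatorname{rank}_\alpha(v) = \operatorname{rank}_{\alpha^c}(v)$).

Lemma 11.15. Let $r := (r_\alpha)_{\alpha\in T_D} \in \mathbb{N}^{T_D}$ satisfy (11.15). Then there is a tensor $v \in \mathcal{H}_r(V)$ with $\operatorname{rank}_\alpha(v) = \dim(U^{\min}_\alpha(v)) = r_\alpha$.

Proof. (i) We construct the subspaces $U_\alpha \subset V_\alpha$ from the leaves to the root. For $\alpha = \{j\} \in L(T_D)$ choose any $U_{\{j\}}$ of dimension $r_{\{j\}}$ (here the second line in (11.15) is needed).

(ii) Assume that for a vertex $\alpha \in T_D\setminus L(T_D)$ the subspaces $U_{\alpha_1}$ and $U_{\alpha_2}$ for the sons with dimensions $r_{\alpha_1}$ and $r_{\alpha_2}$ are already constructed. Choose any bases $\{b_\ell^{(\alpha_i)} : 1 \le \ell \le r_{\alpha_i}\}$ of $U_{\alpha_i}$ ($i = 1, 2$). Without loss of generality assume $r_{\alpha_1} \ge r_{\alpha_2}$. For the most critical case $r_{\alpha_1} = r_{\alpha_2} r_\alpha$ set $U_\alpha := \operatorname{span}\{b_\ell^{(\alpha)} : 1 \le \ell \le r_\alpha\}$ with
$$b_\ell^{(\alpha)} := \sum_{i=1}^{r_{\alpha_2}} b^{(\alpha_1)}_{i+(\ell-1)r_{\alpha_2}} \otimes b_i^{(\alpha_2)}.$$
We observe that $b_\ell^{(\alpha)} \in V_\alpha$ satisfies
$$U^{\min}_{\alpha_1}(b_\ell^{(\alpha)}) = \operatorname{span}\{b^{(\alpha_1)}_{i+(\ell-1)r_{\alpha_2}} : 1 \le i \le r_{\alpha_2}\}, \quad\text{while } U^{\min}_{\alpha_2}(b_\ell^{(\alpha)}) = U_{\alpha_2}.$$
From Exercise 6.14 we conclude that
$$U^{\min}_{\alpha_1}(\{b_\ell^{(\alpha)} : 1 \le \ell \le r_\alpha\}) = U^{\min}_{\alpha_1}(U_\alpha) = \operatorname{span}\{b_i^{(\alpha_1)} : 1 \le i \le r_{\alpha_2} r_\alpha\} \underset{r_{\alpha_1} = r_{\alpha_2} r_\alpha}{=} U_{\alpha_1}.$$
For $r_{\alpha_1} < r_{\alpha_2} r_\alpha$ (but $r_\alpha \le r_{\alpha_1} r_{\alpha_2}$) it is easy to produce linearly independent $\{b_\ell^{(\alpha)} : 1 \le \ell \le r_\alpha\}$ with $U^{\min}_{\alpha_i}(U_\alpha) = U_{\alpha_i}$.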
(iii) For $\alpha = D$, the first and third lines of (11.15) imply that $r_{\alpha_1} = r_{\alpha_2}$. Set $v := \sum_{i=1}^{r_{\alpha_1}} b_i^{(\alpha_1)} \otimes b_i^{(\alpha_2)}$ and $U_D = \operatorname{span}\{v\}$. Obviously, $U^{\min}_{\alpha_i}(v) = U_{\alpha_i}$ holds for $i = 1, 2$, proving the assertion $r_{\alpha_i} = \operatorname{rank}_{\alpha_i}(v)$. For the other vertices use Remark 11.14. Induction from the root to the leaves shows that $U_\alpha = U^{\min}_\alpha(v)$ implies $U^{\min}_{\alpha_i}(U_\alpha) = U^{\min}_{\alpha_i}(v)$. Because of the identities $U^{\min}_{\alpha_i}(U_\alpha) = U_{\alpha_i}$ and $\dim(U_{\alpha_i}) = \operatorname{rank}_{\alpha_i}(v) = r_{\alpha_i}$, the lemma is proved. □

Remark 11.16. With probability one, a random tensor from $\bigotimes_{j\in D} \mathbb{K}^{n_j}$ possesses the maximal hierarchical rank $r$ with
$$r_\alpha = \min\Big\{ \prod_{j\in\alpha} n_j,\ \prod_{j\in D\setminus\alpha} n_j \Big\} \qquad (\alpha \in T_D).$$

Proof. Apply Remark 2.5 to the matrix $M_\alpha(v)$ and note that $r_\alpha = \operatorname{rank}(M_\alpha(v))$. □
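Remark 11.16 can be observed numerically. The following sketch draws a Gaussian random tensor of order 4 and compares the ranks of its matricisations with the stated minimum for the vertices of a balanced tree. The helper `alpha_rank`, the 0-based directions, and the particular tree are assumptions of this example.

```python
import math
import numpy as np

def alpha_rank(v, alpha):
    """rank_alpha(v) = rank of the matricisation M_alpha(v)."""
    alpha = list(alpha)
    comp = [j for j in range(v.ndim) if j not in alpha]
    rows = math.prod(v.shape[j] for j in alpha)
    M = np.transpose(v, alpha + comp).reshape(rows, -1)
    return int(np.linalg.matrix_rank(M))

n = (2, 3, 4, 5)
v = np.random.default_rng(1).standard_normal(n)

# Vertices of the balanced tree {{1,2},{3,4}} over D = {1,2,3,4}, written 0-based.
tree = [(0,), (1,), (2,), (3,), (0, 1), (2, 3), (0, 1, 2, 3)]
ranks = {a: alpha_rank(v, a) for a in tree}
expected = {
    a: min(math.prod(n[j] for j in a),
           math.prod(n[j] for j in range(len(n)) if j not in a))
    for a in tree
}
print(ranks)
```

For the root the complement is empty, so the empty product equals 1 and the rank is 1, in agreement with the third line of (11.15).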
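11.2.4 Conversions

Tensors from $\mathcal{T}_r$ or $\mathcal{R}_r$ can be represented exactly in $\mathcal{H}_r$ with at most similar memory cost.

11.2.4.1 Conversion from $\mathcal{T}_r$ to $\mathcal{H}_r$, Maximal Subspaces

Assume that a tensor is given in the tensor subspace representation:
$$v \in U = \bigotimes_{j\in D} U_j \subset \mathcal{T}_r \subset V = \bigotimes_{j\in D} V_j$$
with $\dim(U_j) = r_j$. The subspaces of maximal dimension are
$$U_\alpha := \begin{cases} U_j & \text{for } \alpha = \{j\}, \\ \bigotimes_{j\in\alpha} U_j & \text{for } \alpha \in T_D\setminus L(T_D). \end{cases}$$
From $v \in \mathcal{T}_r$ we derive that $\dim(U_j) = r_j$ and, in general, $\dim(U_\alpha) = \prod_{j\in\alpha} r_j$ for $\alpha \ne D$. The large dimension $\dim(U_\alpha) = \prod_{j\in\alpha} r_j$ corresponds exactly to the large data size of the coefficient tensor $a \in \bigotimes_{j\in D} \mathbb{K}^{r_j}$ in (8.5). On the positive side, this approach allows us to represent any $v \in U$ in the hierarchical format $\{U_\alpha\}_{\alpha\in T_D}$.

11.2.4.2 Conversion from $\mathcal{R}_r$

Assume an $r$-term representation of $v \in \mathcal{R}_r \subset V = \bigotimes_{j\in D} V_j$ by
$$v = \sum_{i=1}^{r} \bigotimes_{j\in D} v_i^{(j)} \quad\text{with } v_i^{(j)} \in V_j.$$
Set
$$U_\alpha := \begin{cases} \operatorname{span}\Big\{ \bigotimes_{j\in\alpha} v_i^{(j)} : 1 \le i \le r \Big\} & \text{for } \alpha \ne D, \\ \operatorname{span}\{v\} & \text{for } \alpha = D. \end{cases} \tag{11.16}$$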
Obviously, conditions (11.9a–c) are fulfilled. This proves the next statement.

Theorem 11.17. Let $T_D$ be any dimension partition tree. Then $v \in \mathcal{R}_r \subset V = \bigotimes_{j\in D} V_j$ belongs to $\mathcal{H}_r$ with $r_\alpha = r$ for $\alpha \ne D$ and $r_\alpha = 1$ for $\alpha = D$.

Conversion from $\mathcal{R}_r$ to hierarchical format will be further discussed in §11.3.6. The (exact) conversion in the reverse direction (from $\mathcal{H}_r$ into $\mathcal{R}_r$) may lead to exponentially large ranks (cf. Cohen–Sharir–Shashua [60]).
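11.2.4.3 Conversion from Sparse-Grid

The sparse-grid space $V_{sg}$ and the spaces $V_{(\ell)}$ are introduced in (7.19) and (7.18). Given $v \in V_{sg}$ and any dimension partition tree $T_D$, we define the characteristic subspaces $U_\alpha$ ($\alpha \in T_D$) as follows:
$$U_\alpha = \sum_{\sum_{j\in\alpha} \ell_j = \ell+d-1}\ \bigotimes_{j\in\alpha} V_{(\ell_j)} \quad\text{for } \alpha \in T_D\setminus\{D\}, \tag{11.17}$$
and $U_D = \operatorname{span}\{v\}$. Because of (7.18), we may replace the summation in (7.19) and (11.17) over $\sum \ell_j = L := \ell + d - 1$ with $\sum \ell_j \le L$.

We have to prove the nestedness property (11.9c): $U_\alpha \subset U_{\alpha_1} \otimes U_{\alpha_2}$. It is sufficient to prove
$$\bigotimes_{j\in\alpha} V_{(\ell_j)} \subset U_{\alpha_1} \otimes U_{\alpha_2}$$
for any tuple $(\ell_j)_{j\in\alpha}$ with $\sum_{j\in\alpha} \ell_j = L$. Obviously,
$$\bigotimes_{j\in\alpha} V_{(\ell_j)} = \Big( \bigotimes_{j\in\alpha_1} V_{(\ell_j)} \Big) \otimes \Big( \bigotimes_{j\in\alpha_2} V_{(\ell_j)} \Big).$$
Since $\sum_{j\in\alpha_1} \ell_j \le L$, the inclusion $\bigotimes_{j\in\alpha_1} V_{(\ell_j)} \subset U_{\alpha_1}$ holds and, analogously, $\bigotimes_{j\in\alpha_2} V_{(\ell_j)} \subset U_{\alpha_2}$.

If we also define $U_D$ by (11.17), $U_D = V_{sg}$ follows. Hence $v \in V_{sg}$ belongs to $U_{\alpha_1} \otimes U_{\alpha_2}$ ($\alpha_1, \alpha_2$ sons of $D$), proving $U_D = \operatorname{span}\{v\} \subset U_{\alpha_1} \otimes U_{\alpha_2}$. All subspaces $U_\alpha$ satisfy $\dim(U_\alpha) \le \dim(V_{sg}) = O(2^L L^{d-1})$. This proves that any sparse-grid tensor from $V_{sg}$ can be exactly represented in hierarchical format with subspaces of dimensions not exceeding $\dim(V_{sg})$.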
11.3 Construction of Bases

As for the tensor subspace format, the involved subspaces must be characterised by frames or bases. The particular problem in the case of the hierarchical format is the fact that a basis $[b_1^{(\alpha)}, \ldots, b_{r_\alpha}^{(\alpha)}]$ of $U_\alpha$ consists of tensors $b_i^{(\alpha)} \in V_\alpha$ of order $\#\alpha$. A representation of $b_i^{(\alpha)}$ by its entries would require a huge storage. It is essential to describe the vectors $b_i^{(\alpha)}$ indirectly by the frames associated to the sons $\alpha_1, \alpha_2 \in S(\alpha)$.

The general concept of the hierarchical representation is explained in §11.3.1. Of particular interest is the calculation of basis transformations. One usually prefers an orthonormal basis representation which is discussed in §11.3.2. A special orthonormal basis is defined by the higher-order singular-value decomposition (HOSVD). Its definition and construction are given in §11.3.3. In §11.3.5 a sensitivity analysis is given, which describes how perturbations of the data influence the tensor. Finally, in §11.3.6, we mention that the conversion of $r$-term tensors into the hierarchical format yields very particular coefficient matrices.
11.3.1 Hierarchical Basis Representation The term ‘basis’ in the heading may be replaced more generally with ‘frame’.
11.3.1.1 Basic Structure

In the most general case the subspace $U_\alpha$ ($\alpha \in T_D$) is generated by a frame:
$$B_\alpha = \big[ b_1^{(\alpha)}, b_2^{(\alpha)}, \ldots, b_{r_\alpha}^{(\alpha)} \big] \in (U_\alpha)^{r_\alpha} \tag{11.18a}$$
with
$$U_\alpha = \operatorname{span}\{b_i^{(\alpha)} : 1 \le i \le r_\alpha\} \quad\text{for all } \alpha \in T_D. \tag{11.18b}$$
Except for $\alpha \in L(T_D)$, i.e., for leaves $\alpha = \{j\}$, the tensors $b_i^{(\alpha)} \in U_\alpha$ are not represented as full tensors. Therefore the frame vectors $b_i^{(\alpha)}$ only serve for theoretical purposes, while other data will be used in the later representation of a tensor. The integer $r_\alpha$ (see Footnote 8) is defined by (11.18b) denoting the size of the frame. Concerning the choice for $B_\alpha$ in (11.18a,b), the following possibilities exist:

1. Frame. A frame $B_\alpha \in (U_\alpha)^{r_\alpha}$ cannot be avoided as an intermediate representation (cf. §11.5.2), but usually one of the following choices is preferred.
2. Basis. If $B_\alpha$ is a basis, the number $r_\alpha$ coincides with the dimension: $r_\alpha := \dim(U_\alpha)$ for all $\alpha \in T_D$.
3. Orthonormal basis. Assuming a scalar product in $U_\alpha$, we can construct an orthonormal basis $B_\alpha$ (cf. §11.3.2).
4. HOSVD. The higher-order singular-value decomposition in §8.3 can be applied again and leads to a particular orthonormal basis $B_\alpha$ (cf. §11.3.3).

Footnote 8: See Footnote 6 for the notation $r_j = r_{\{j\}}$. Similarly, $b_i^{(j)} = b_i^{(\{j\})}$ etc.

For the practical realisation, we must distinguish leaves $\alpha \in L(T_D)$ from non-leaf vertices.

Leaf vertices. Leaves $\alpha \in L(T_D)$ are characterised by $\alpha = \{j\}$ for some $j \in D$. The subspace $U_j \subset V_j$ refers to $V_j$ from $V = {}_a\bigotimes_{j\in D} V_j$ and is characterised by a frame or basis $B_j = [b_1^{(j)}, \ldots, b_{r_j}^{(j)}] \in (U_j)^{r_j}$ from above. The vectors $b_i^{(j)}$ are stored directly.

Remark 11.18. (a) If $V_j = \mathbb{K}^{I_j}$, the memory cost for $B_j \in \mathbb{K}^{I_j\times r_j}$ is $r_j \# I_j$. (b) Depending on the nature of the vector space $V_j$, we may use other data-sparse representations of $b_i^{(j)}$ (cf. §7.5 and §14.1.4.3).

Non-leaf vertices $\alpha \in T_D\setminus L(T_D)$. The sons of $\alpha$ are denoted by $\alpha_1, \alpha_2 \in S(\alpha)$. Let $b_i^{(\alpha_1)}$ and $b_j^{(\alpha_2)}$ be the columns of the respective frames [bases] $B_{\alpha_1}$ and $B_{\alpha_2}$. Then the tensor space $U_{\alpha_1} \otimes U_{\alpha_2}$ has the canonical frame [basis] consisting of the tensor products of the frame [basis] vectors of $U_{\alpha_1}$ and $U_{\alpha_2}$, as detailed below.

Remark 11.19. Let $B_{\alpha_1}$ and $B_{\alpha_2}$ be generating systems of $U_{\alpha_1}$ and $U_{\alpha_2}$. Define the tuple $B$ and the tensors $b_{ij}^{(\alpha)} \in U_{\alpha_1} \otimes U_{\alpha_2}$ by
$$B := \big( b_{ij}^{(\alpha)} := b_i^{(\alpha_1)} \otimes b_j^{(\alpha_2)} : 1 \le i \le r_{\alpha_1},\ 1 \le j \le r_{\alpha_2} \big).$$
(a) If $B_{\alpha_1}$ and $B_{\alpha_2}$ are frames, $B$ is a frame of $U_{\alpha_1} \otimes U_{\alpha_2}$. (b) If $B_{\alpha_1}$ and $B_{\alpha_2}$ are bases, $B$ is a basis of $U_{\alpha_1} \otimes U_{\alpha_2}$ (cf. Lemma 3.13a). (c) If $B_{\alpha_1}$ and $B_{\alpha_2}$ are orthonormal bases, also $B$ is an orthonormal basis of the product $U_{\alpha_1} \otimes U_{\alpha_2}$ (cf. Remark 4.148).

As a consequence, any tensor $w \in U_{\alpha_1} \otimes U_{\alpha_2}$ and, in particular, $w \in U_\alpha \subset U_{\alpha_1} \otimes U_{\alpha_2}$, can be written in the form (see Footnote 9)
$$w = \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{ij}^{(\alpha)}\, b_{ij}^{(\alpha)} = \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{ij}^{(\alpha)}\, b_i^{(\alpha_1)} \otimes b_j^{(\alpha_2)}. \tag{11.19a}$$
Since the frame vectors $b_{ij}^{(\alpha)}$ carry two indices, the coefficient vector $(c_{ij}^{(\alpha)})$ has the special form of a coefficient matrix
$$C^{(\alpha)} = \big( c_{ij}^{(\alpha)} \big)_{i=1,\ldots,r_{\alpha_1};\ j=1,\ldots,r_{\alpha_2}} \in \mathbb{K}^{r_{\alpha_1}\times r_{\alpha_2}}.$$

Footnote 9: In the case of a frame, the coefficients $c_{ij}^{(\alpha)}$ are not uniquely determined.
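If $B_{\alpha_1}$ and $B_{\alpha_2}$ are bases, the one-to-one correspondence between a tensor $w \in U_{\alpha_1} \otimes U_{\alpha_2}$ and its coefficient matrix $C^{(\alpha)}$ defines an isomorphism which we denote by
$$\Theta_\alpha : U_{\alpha_1} \otimes U_{\alpha_2} \to \mathbb{K}^{r_{\alpha_1}\times r_{\alpha_2}} \quad\text{for } \alpha \in T_D\setminus L(T_D). \tag{11.19b}$$
The fact that $w \in U_{\alpha_1} \otimes U_{\alpha_2}$ can be coded by $r_{\alpha_1} \cdot r_{\alpha_2}$ numbers is independent of how the frame vectors $b_i^{(\alpha_1)}, b_j^{(\alpha_2)}$ are represented. They may be given directly as for the leaves $\alpha_\ell \in L(T_D)$ or indirectly as for non-leaves $\alpha_\ell \in T_D\setminus L(T_D)$.

Now we apply the representation (11.19a,b) to the frame vectors $b_\ell^{(\alpha)} \in U_\alpha$ from $B_\alpha = [b_1^{(\alpha)}, \ldots, b_{r_\alpha}^{(\alpha)}]$ and denote the coefficient matrix by $C^{(\alpha,\ell)} \in \mathbb{K}^{r_{\alpha_1}\times r_{\alpha_2}}$:
$$b_\ell^{(\alpha)} = \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{ij}^{(\alpha,\ell)}\, b_i^{(\alpha_1)} \otimes b_j^{(\alpha_2)} \quad\text{with } C^{(\alpha,\ell)} = \big( c_{ij}^{(\alpha,\ell)} \big)_{1\le i\le r_{\alpha_1},\ 1\le j\le r_{\alpha_2}} \in \mathbb{K}^{r_{\alpha_1}\times r_{\alpha_2}} \tag{11.20}$$
for $1 \le \ell \le r_\alpha$, where $\{\alpha_1, \alpha_2\} = S(\alpha)$.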
This is the crucial relation of the hierarchical format. Using the map $\Theta_\alpha$ in (11.19b), we may write $C^{(\alpha,\ell)} = \Theta_\alpha(b_\ell^{(\alpha)})$.

Remark 11.20. The formulation of the coefficient matrix $C^{(\alpha,\ell)}$ depends on the ordering of the sons $\alpha_1, \alpha_2$. If the sons are interchanged, $C^{(\alpha,\ell)}$ changes into the transposed matrix $C^{(\alpha,\ell)T}$.

We summarise: only for leaves $\alpha \in L(T_D)$, the basis vectors $b_i^{(j)}$ are explicitly represented. For all other vertices, the vectors $b_\ell^{(\alpha)} \in U_\alpha$ are defined recursively by means of the coefficient matrices $C^{(\alpha,\ell)}$ (see Footnote 10). The practical representation of a tensor $v \in V$ uses the data $C^{(\alpha,\ell)}$ for $\alpha \in T_D\setminus L(T_D)$ and $b_i^{(j)}$ for $\{j\} \in L(T_D)$ only, while the theoretical discussion may still refer to $b_\ell^{(\alpha)}$ and their properties.

In particular, we obtain a frame [basis] $B_D$ for the root $D \in T_D$ (for $B_D$ compare (11.18a)). We can represent all tensors $v \in U_D$ by a coefficient vector $c^{(D)} \in \mathbb{K}^{r_D}$:
$$v = \sum_{i=1}^{r_D} c_i^{(D)}\, b_i^{(D)}. \tag{11.21}$$
Remark 11.21. Since, usually, the basis $B_D$ of $U_D$ consists of one basis vector only (cf. Remark 11.7a), one might avoid the coefficient $c_1^{(D)}$ by choosing $b_1^{(D)} = v$ and $c_1^{(D)} = 1$. However, for systematic reasons (orthonormal basis, basis transforms), it is advantageous to separate the choice for the basis vector $b_1^{(D)}$ from the value of $v$.

Footnote 10: Note that also in wavelet representations, basis vectors do not appear explicitly. Instead the filter coefficients are used for the transfer of the basis vectors.

Remark 11.22. According to Lemma 3.27 the tuples $(C^{(\alpha,\ell)})_{1\le\ell\le r_\alpha}$ of coefficient matrices can be regarded as a tensor
$$C_\alpha \in \mathbb{K}^{r_{\alpha_1}} \otimes \mathbb{K}^{r_{\alpha_2}} \otimes \mathbb{K}^{r_\alpha} \tag{11.22a}$$
defined by the entries
$$C_\alpha[i, j, \ell] = C^{(\alpha,\ell)}_{ij}. \tag{11.22b}$$
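11.3.1.2 Explicit Description

The definition of the basis vectors is recursive. Correspondingly, all operations will be performed recursively. Below we give an explicit description of the tensor $v$ represented in the hierarchical format, but this description will not be used for practical purposes. Renaming $\ell$ by $\ell[\alpha]$, $i$ by $\ell[\alpha_1]$, $j$ by $\ell[\alpha_2]$, we rewrite (11.20) by
$$b^{(\alpha)}_{\ell[\alpha]} = \sum_{\ell[\alpha_1]=1}^{r_{\alpha_1}} \sum_{\ell[\alpha_2]=1}^{r_{\alpha_2}} c^{(\alpha,\ell[\alpha])}_{\ell[\alpha_1],\ell[\alpha_2]}\, b^{(\alpha_1)}_{\ell[\alpha_1]} \otimes b^{(\alpha_2)}_{\ell[\alpha_2]}.$$
Recursive insertion of the definitions of $b^{(\alpha_1)}_{\ell[\alpha_1]}$ and $b^{(\alpha_2)}_{\ell[\alpha_2]}$ yields
$$b^{(\alpha)}_{\ell[\alpha]} = \sum_{\substack{\ell[\beta]=1 \\ \text{for all } \beta\in T_\alpha\setminus\{\alpha\}}}^{r_\beta}\ \Big[ \prod_{\beta\in T_\alpha\setminus L(T_\alpha)} c^{(\beta,\ell[\beta])}_{\ell[\beta_1],\ell[\beta_2]} \Big] \bigotimes_{j\in\alpha} b^{(j)}_{\ell[\{j\}]} \qquad (\beta_1, \beta_2 \text{ sons of } \beta).$$
The multiple summation involves all variables $\ell[\beta] \in \{1, \ldots, r_\beta\}$ with $\beta \in T_\alpha\setminus\{\alpha\}$, where $T_\alpha$ is the subtree from Definition 11.6. The variable $\beta$ also takes the values $\{j\}$ appearing in $\bigotimes_{j\in\alpha} b^{(j)}_{\ell[\{j\}]}$. The tensor $v = \sum_{\ell=1}^{r_D} c_\ell^{(D)} b_\ell^{(D)}$ (cf. (11.21)) has the representation
$$v = \sum_{\substack{\ell[\alpha]=1 \\ \text{for all } \alpha\in T_D}}^{r_\alpha}\ c^{(D)}_{\ell[D]} \Big[ \prod_{\beta\in T_D\setminus L(T_D)} c^{(\beta,\ell[\beta])}_{\ell[\beta_1],\ell[\beta_2]} \Big] \bigotimes_{j=1}^{d} b^{(j)}_{\ell[\{j\}]}. \tag{11.23}$$
For the case of $D = \{1, 2, 3\}$ and $S(D) = \{\{1,2\}, \{3\}\}$, Eq. (11.23) becomes
$$v = \sum_{\ell_1=1}^{r_1} \sum_{\ell_2=1}^{r_2} \sum_{\ell_3=1}^{r_3}\ \underbrace{c_1^{(D)} \cdot \sum_{\nu=1}^{r_{\{1,2\}}} c^{(\{1,2\},\nu)}_{\ell_1,\ell_2} \cdot c^{(D,1)}_{\nu,\ell_3}}_{=:\ a[\ell_1,\ell_2,\ell_3]}\ \bigotimes_{j=1}^{3} b^{(j)}_{\ell_j},$$
where we assume the standard case $r_D = 1$. When using minimal ranks, we obtain $r_3 = r_{\{1,2\}}$ (cf. (6.13a)). Hence $C_{\{1,2\}}$ (cf. (11.22b)) can be considered as a tensor from $\mathbb{K}^{r_1\times r_2\times r_3}$ which has the same size as the core tensor $a$ of the tensor subspace representation. We conclude from the last example that for $d = 3$ the tensor subspace format and the hierarchical format require almost the same memory (the data $c^{(D)}$ and $c^{(D,1)}$ are negligible compared with $C_{\{1,2\}}$).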
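The example $D = \{1,2,3\}$, $S(D) = \{\{1,2\},\{3\}\}$ can be reproduced by a short NumPy sketch. It evaluates $v$ once recursively via (11.20) and (11.21) and once through the core tensor $a[\ell_1,\ell_2,\ell_3]$. All array names (`C12`, `CD`, `cD`) and the random data are assumptions of this illustration; the index layout follows (11.22b), i.e. the last index of a coefficient tensor numbers the frame vector.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, n3 = 4, 5, 6
r1, r2, r3, r12 = 2, 3, 2, 2

B1 = rng.standard_normal((n1, r1))       # leaf frames B_1, B_2, B_3 (stored explicitly)
B2 = rng.standard_normal((n2, r2))
B3 = rng.standard_normal((n3, r3))
C12 = rng.standard_normal((r1, r2, r12)) # C12[i, j, nu] = c^{({1,2},nu)}_{ij}
CD = rng.standard_normal((r12, r3, 1))   # CD[nu, l3, 0] = c^{(D,1)}_{nu,l3}  (r_D = 1)
cD = np.array([1.5])                     # coefficient vector c^{(D)}

# Recursive evaluation: frame at {1,2} by (11.20), frame at the root, then (11.21).
B12 = np.einsum('ijn,ai,bj->abn', C12, B1, B2)
BD = np.einsum('nkl,abn,ck->abcl', CD, B12, B3)
v_recursive = BD @ cD

# Explicit evaluation: core tensor a[l1, l2, l3], then a tensor subspace (Tucker) form.
a = cD[0] * np.einsum('ijn,nk->ijk', C12, CD[:, :, 0])
v_explicit = np.einsum('ijk,ai,bj,ck->abc', a, B1, B2, B3)

print(np.allclose(v_recursive, v_explicit))
```

Forming `B12` and `BD` as full arrays is only feasible for such toy sizes; in practice only `B1`, `B2`, `B3`, `C12`, `CD` and `cD` are stored.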
11.3.1.3 Hierarchical Representation and Its Memory Cost

Equation (11.23) shows that $v$ is completely determined by the data $C^{(\alpha,\ell)}$ (cf. (11.20)), $c^{(D)}$ (cf. (11.21)), and the bases $B_j$ for the leaves $j \in D$. The frames $B_\alpha$ for $\alpha \in T_D\setminus L(T_D)$ are implicitly given by these data. The formal description of the hierarchical tensor representation uses the quantities $C_\alpha$ defined in (11.22a,b) (see Footnote 11):
$$v = \rho_{HT}\big( T_D, (C_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D} \big). \tag{11.24}$$

Footnote 11: HT abbreviates 'hierarchical tensor representation'.

Remark 11.23. (a) The data size of $T_D$, $(C_\alpha)_{\alpha\in T_D\setminus L(T_D)}$, $c^{(D)}$, and $(B_j)_{j\in D}$ is
$$\begin{array}{lll} N^{HT}_{mem}(T_D) & = 2d - 1 \text{ vertices} & (d = \#D), \\ N^{HT}_{mem}\big((C_\alpha)_{\alpha\in T_D\setminus L(T_D)}\big) & = \sum_{\alpha\in T_D\setminus L(T_D)} r_\alpha r_{\alpha_1} r_{\alpha_2} & (\alpha_1, \alpha_2 \text{ sons of } \alpha), \\ N^{HT}_{mem}(c^{(D)}) & = r_D & \text{(cf. Remark 11.21)}, \\ N^{HT}_{mem}\big((B_j)_{j\in D}\big) & = \sum_{j=1}^{d} r_j \cdot \operatorname{size}(U_j) & \text{(cf. Remark 8.8a)}. \end{array}$$
The sum of the last three terms yields the total storage cost:
$$N^{HT}_{mem}(c^{(D)}, C_\alpha, B_j) = \underbrace{r_D}_{\text{usually } r_D = 1} + \sum_{j=1}^{d} r_j \cdot \operatorname{size}(U_j) + \sum_{\alpha\in T_D\setminus L(T_D)} r_\alpha r_{\alpha_1} r_{\alpha_2}.$$
(b) Suppose $r_j \le r$ for all $j \in D$. Then the data size $N^{HT}_{mem}((B_j)_{j\in D})$ is the same as $N^{r\text{-term}}_{mem}$ for the $r$-term representation or $N^{TSR}_{mem}((B_j)_{j\in D})$ for the tensor subspace representation. The terms $N^{HT}_{mem}(T_D) + N^{HT}_{mem}(c^{(D)})$ may be neglected. The dominant parts are $N^{HT}_{mem}((B_j)_{j\in D})$ and $N^{HT}_{mem}((C_\alpha)_{\alpha\in T_D\setminus L(T_D)})$. If $V = \mathbb{K}^I$ with $I = I_1 \times \ldots \times I_d$ and $\# I_j \le n$, full representation of the basis vectors leads to $N^{HT}_{mem}((B_j)_{j\in D}) \le d \cdot r \cdot n$, while
$$N^{HT}_{mem}\big((C_\alpha)_{\alpha\in T_D\setminus L(T_D)}\big) \le (d-1)\, r^3. \tag{11.25}$$

Proof. For $T_D$ compare Remark 11.3. The coefficient matrix $C^{(\alpha,\ell)} \in \mathbb{K}^{r_{\alpha_1}\times r_{\alpha_2}}$ contains $r_{\alpha_1} r_{\alpha_2}$ entries. Since $1 \le \ell \le r_\alpha$, the size of $C_\alpha$ is $r_{\alpha_1} r_{\alpha_2} r_\alpha$ for each $\alpha \in T_D\setminus L(T_D)$. Eq. (11.25) follows from $\#(T_D\setminus L(T_D)) = \# T_D - \# L(T_D) = (2d-1) - d = d - 1$ (cf. Remark 11.3c). □

Usually, the ranks $r_\alpha$ are different. Comparing $N^{HT}_{mem}(C_\alpha) = \sum_\alpha r_\alpha r_{\alpha_1} r_{\alpha_2}$ with (11.25), we can introduce the effective rank $r_{\mathrm{eff}} := \big( N^{HT}_{mem}(C_\alpha) / (d-1) \big)^{1/3}$ so that (11.25) holds as an equation for $r = r_{\mathrm{eff}}$. However, there will be operation costs proportional to $\sum_\alpha r_\alpha^2 r_{\alpha_1} r_{\alpha_2}$ giving rise to another average of the ranks.
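The storage count of Remark 11.23 and the effective rank $r_{\mathrm{eff}}$ are easy to evaluate for a given tree. The following sketch uses a hypothetical function `ht_storage` and a hypothetical encoding of the tree as a dictionary mapping each non-leaf vertex to its two sons; the ranks and leaf sizes are arbitrary sample values.

```python
def ht_storage(sons, ranks, leaf_size, root):
    """Storage of the HT data according to Remark 11.23a.

    sons      : dict  non-leaf vertex -> (son_1, son_2)
    ranks     : dict  vertex -> r_alpha
    leaf_size : dict  leaf -> size(U_j), e.g. #I_j for full vectors
    """
    n_C = sum(ranks[a] * ranks[a1] * ranks[a2] for a, (a1, a2) in sons.items())
    n_B = sum(ranks[j] * leaf_size[j] for j in leaf_size)
    total = ranks[root] + n_B + n_C
    d = len(leaf_size)
    r_eff = (n_C / (d - 1)) ** (1.0 / 3.0)   # effective rank: n_C = (d-1) * r_eff**3
    return total, n_C, n_B, r_eff

# Balanced tree for d = 4; vertices are named by their directions.
sons = {'1234': ('12', '34'), '12': ('1', '2'), '34': ('3', '4')}
ranks = {'1234': 1, '12': 4, '34': 4, '1': 3, '2': 3, '3': 3, '4': 3}
leaf_size = {'1': 100, '2': 100, '3': 100, '4': 100}

total, n_C, n_B, r_eff = ht_storage(sons, ranks, leaf_size, root='1234')
print(total, n_C, n_B, round(r_eff, 3))
```

In this sample the coefficient tensors need 88 numbers, well below the bound $(d-1)\,r^3 = 192$ of (11.25) for $r = 4$, while the leaf bases dominate the total.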
11.3.1.4 Transformations

There will be various reasons to change the frame in (11.18a) into another one. In general, even the subspaces $U_\alpha$ may vary. For a vertex $\alpha \in T_D$ we consider the following 'old' and 'new' frames and subspaces:
$$B^{new}_\alpha = \big[ b^{(\alpha)}_{1,new}, \ldots, b^{(\alpha)}_{r^{new}_\alpha,new} \big], \qquad U^{new}_\alpha = \operatorname{range}\{B^{new}_\alpha\},$$
$$B^{old}_\alpha = \big[ b^{(\alpha)}_{1,old}, \ldots, b^{(\alpha)}_{r^{old}_\alpha,old} \big], \qquad U^{old}_\alpha = \operatorname{range}\{B^{old}_\alpha\}.$$
The replacement $B^{old}_\alpha \mapsto B^{new}_\alpha$ creates new coefficient matrices $C^{(\alpha,\ell)}_{new}$ (cf. Lemma 11.24). Moreover, if $\alpha \ne D$, the coefficient matrices $C^{(\beta,\ell)}_{old}$ associated to the father $\beta \in T_D$ of $\alpha$ must be renewed to form $C^{(\beta,\ell)}_{new}$ since these coefficients refer to $B^{new}_\alpha$ (cf. Lemma 11.26). If $\alpha = D$, the coefficient vector $c^{(D)}$ must be transformed instead (cf. Lemma 11.28). We distinguish three different situations.

Case A. $B^{old}_\alpha$ and $B^{new}_\alpha$ generate the same subspace $U_\alpha = U^{new}_\alpha = U^{old}_\alpha$. Then there are transformation matrices $T^{(\alpha)} \in \mathbb{K}^{r^{new}_\alpha \times r^{old}_\alpha}$, $S^{(\alpha)} \in \mathbb{K}^{r^{old}_\alpha \times r^{new}_\alpha}$ such that
$$B^{old}_\alpha = B^{new}_\alpha T^{(\alpha)}, \quad\text{i.e., } b^{(\alpha)}_{j,old} = \sum_{k=1}^{r^{new}_\alpha} T^{(\alpha)}_{kj}\, b^{(\alpha)}_{k,new} \quad (1 \le j \le r^{old}_\alpha), \tag{11.26a}$$
$$B^{new}_\alpha = B^{old}_\alpha S^{(\alpha)}, \quad\text{i.e., } b^{(\alpha)}_{k,new} = \sum_{j=1}^{r^{old}_\alpha} S^{(\alpha)}_{jk}\, b^{(\alpha)}_{j,old} \quad (1 \le k \le r^{new}_\alpha). \tag{11.26b}$$
In the standard case, $B^{old}_\alpha$ and $B^{new}_\alpha$ are bases. Then $r^{old}_\alpha = r^{new}_\alpha = \dim(U_\alpha)$ holds, and $T^{(\alpha)}$ and $S^{(\alpha)}$ are uniquely defined satisfying $S^{(\alpha)} := (T^{(\alpha)})^{-1}$. In the case of frames, the representation ranks $r^{old}_\alpha, r^{new}_\alpha \ge \dim(U_\alpha)$ may be different so that $T^{(\alpha)}$ and $S^{(\alpha)}$ are rectangular matrices. If $r^{new}_\alpha > \dim(U_\alpha)$ [$r^{old}_\alpha > \dim(U_\alpha)$], the matrix $T^{(\alpha)}$ [$S^{(\alpha)}$] satisfying (11.26a [b]) is not unique.

There may be reasons to change the subspace. In Case B, we consider $U^{new}_\alpha \subset U^{old}_\alpha$, and, in Case C, the opposite inclusion $U^{new}_\alpha \supset U^{old}_\alpha$.

Case B. Assume $U^{new}_\alpha \subsetneqq U^{old}_\alpha$. This is a typical step when we truncate the tensor representation. Note that a transformation matrix $S^{(\alpha)}$ satisfying (11.26b) exists, whereas there is no $T^{(\alpha)}$ satisfying (11.26a).

Case C. Assume $U^{new}_\alpha \supsetneqq U^{old}_\alpha$. This happens when we enrich $U^{old}_\alpha$ by additional vectors. Then a transformation matrix $T^{(\alpha)}$ satisfying (11.26a) exists, but no $S^{(\alpha)}$ with (11.26b).

In Cases A and B, the transformation matrix $S^{(\alpha)}$ exists. The accompanying figure illustrates the connection of the basis $B_\alpha$ with $B_{\alpha_1}$ and $B_{\alpha_2}$ at the son vertices via the data $C_\alpha$. Whenever one of these bases changes, also $C_\alpha$ must be updated. Lemma 11.24 describes the update caused by a transformation of $B_\alpha$, while Lemma 11.26 considers the transformations of $B_{\alpha_1}$ and $B_{\alpha_2}$.

[Figure: vertex $\alpha$ with basis $B_\alpha$ and coefficient data $C_\alpha$, connected to its sons $\alpha_1$ with $B_{\alpha_1}$ and $\alpha_2$ with $B_{\alpha_2}$.]

(11.26b) proves the following result.

Lemma 11.24. If (11.26b) holds for $\alpha \in T_D\setminus L(T_D)$, the new basis vectors $b^{(\alpha)}_{k,new}$ have coefficient matrices $C^{(\alpha,k)}_{new}$ defined by
$$C^{(\alpha,k)}_{new} = \sum_{j=1}^{r^{old}_\alpha} S^{(\alpha)}_{jk}\, C^{(\alpha,j)}_{old} \qquad (1 \le k \le r^{new}_\alpha). \tag{11.27}$$
Using the tensor notation (11.22b), this transformation becomes $C^{new}_\alpha = \big( I \otimes I \otimes (S^{(\alpha)})^T \big) C^{old}_\alpha$. The arithmetic cost of (11.27) is $2 r^{new}_\alpha r^{old}_\alpha r_{\alpha_1} r_{\alpha_2}$ ($\alpha_1, \alpha_2$ sons of $\alpha$).

Next, we consider the influence of a transformation upon the coefficient matrices of the father. Let $\alpha \in T_D\setminus L(T_D)$ be the father vertex and assume that for at least one of the sons $\alpha_1, \alpha_2$ of $\alpha$ the bases $B^{old}_{\alpha_1}$ and $B^{old}_{\alpha_2}$ are changed into $B^{new}_{\alpha_1}$ and $B^{new}_{\alpha_2}$. If only one basis is changed, set $S^{(\alpha_i)} = T^{(\alpha_i)} = I$ for the other son. Since the transformation matrix $T^{(\alpha_i)}$ is used, Lemma 11.26 applies to Cases A and C.

Remark 11.25. The subscripts 'old' and 'new' in the statement (11.28) correspond to the interpretation that $C^{(\alpha,\ell)}_{old}$ and the transformations $T^{(\alpha_i)}$ are given and produce $C^{(\alpha,\ell)}_{new}$. However, also the opposite view is possible. Here $C^{(\alpha,\ell)}_{new}$ is the old matrix for which a decomposition $C^{(\alpha,\ell)}_{new} = X_1 C' X_2^T$ (e.g., a singular-value decomposition) is determined. Then $C^{(\alpha,\ell)}_{old} := C'$ is the new matrix, while $B^{old}_{\alpha_i} = B^{new}_{\alpha_i} X_i$ ($i = 1, 2$) transforms the old basis $B^{new}_{\alpha_i}$ into the new basis $B^{old}_{\alpha_i}$.
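Lemma 11.26. Let $\alpha_1, \alpha_2$ be the sons of $\alpha \in T_D\setminus L(T_D)$. Basis transformations (11.26a) at the son vertices $\alpha_1, \alpha_2$, i.e., $B^{new}_{\alpha_i} T^{(\alpha_i)} = B^{old}_{\alpha_i}$ ($i = 1, 2$), lead to a transformation of the coefficients at vertex $\alpha$ by
$$C^{(\alpha,\ell)}_{old} \mapsto C^{(\alpha,\ell)}_{new} = T^{(\alpha_1)}\, C^{(\alpha,\ell)}_{old}\, (T^{(\alpha_2)})^T \quad\text{for } 1 \le \ell \le r_\alpha. \tag{11.28}$$
This is equivalent to $C^{old}_\alpha \mapsto C^{new}_\alpha = \big( T^{(\alpha_1)} \otimes T^{(\alpha_2)} \otimes I \big) C^{old}_\alpha$. The arithmetic cost for (11.28) is $2 r_\alpha \big( r^{old}_{\alpha_1} + r^{new}_{\alpha_2} \big) r^{new}_{\alpha_1} r^{old}_{\alpha_2}$. If the basis is changed only at $\alpha_1$ (i.e., $T^{(\alpha_2)} = I$), the cost reduces to $2 r_\alpha r^{new}_{\alpha_1} r^{old}_{\alpha_1} r^{old}_{\alpha_2}$.

Proof. The basis vector $b_\ell^{(\alpha)}$ at vertex $\alpha$ has the representation
$$b_\ell^{(\alpha)} = \sum_{i=1}^{r^{old}_{\alpha_1}} \sum_{j=1}^{r^{old}_{\alpha_2}} c^{(\alpha,\ell)}_{ij,old}\, b^{(\alpha_1)}_{i,old} \otimes b^{(\alpha_2)}_{j,old} \quad\text{for } 1 \le \ell \le r_\alpha$$
with respect to $B^{old}_{\alpha_1}$ and $B^{old}_{\alpha_2}$. Using $B^{old}_{\alpha_1} = B^{new}_{\alpha_1} T^{(\alpha_1)}$ and $B^{old}_{\alpha_2} = B^{new}_{\alpha_2} T^{(\alpha_2)}$ (cf. (11.26a)), we obtain
$$\begin{aligned} b_\ell^{(\alpha)} &= \sum_{i=1}^{r^{old}_{\alpha_1}} \sum_{j=1}^{r^{old}_{\alpha_2}} c^{(\alpha,\ell)}_{ij,old} \Big( \sum_{k=1}^{r^{new}_{\alpha_1}} T^{(\alpha_1)}_{ki}\, b^{(\alpha_1)}_{k,new} \Big) \otimes \Big( \sum_{m=1}^{r^{new}_{\alpha_2}} T^{(\alpha_2)}_{mj}\, b^{(\alpha_2)}_{m,new} \Big) \\ &= \sum_{k=1}^{r^{new}_{\alpha_1}} \sum_{m=1}^{r^{new}_{\alpha_2}} \Big( \sum_{i=1}^{r^{old}_{\alpha_1}} \sum_{j=1}^{r^{old}_{\alpha_2}} T^{(\alpha_1)}_{ki}\, c^{(\alpha,\ell)}_{ij,old}\, T^{(\alpha_2)}_{mj} \Big)\, b^{(\alpha_1)}_{k,new} \otimes b^{(\alpha_2)}_{m,new} \\ &= \sum_{k=1}^{r^{new}_{\alpha_1}} \sum_{m=1}^{r^{new}_{\alpha_2}} c^{(\alpha,\ell)}_{km,new}\, b^{(\alpha_1)}_{k,new} \otimes b^{(\alpha_2)}_{m,new} \end{aligned}$$
with $c^{(\alpha,\ell)}_{km,new} := \sum_{i=1}^{r^{old}_{\alpha_1}} \sum_{j=1}^{r^{old}_{\alpha_2}} T^{(\alpha_1)}_{ki}\, c^{(\alpha,\ell)}_{ij,old}\, T^{(\alpha_2)}_{mj}$. This corresponds to the matrix formulation (11.28). □

The next lemma uses the transformation matrix $S^{(\alpha_i)}$ from Cases A and B.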
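Lemma 11.27. Let $\alpha \in T_D\setminus L(T_D)$ be a vertex with sons $\{\alpha_1, \alpha_2\} = S(\alpha)$. Assume that the coefficient matrices $C^{(\alpha,\ell)}_{old}$ admit a decomposition
$$C^{(\alpha,\ell)}_{old} = S^{(\alpha_1)}\, C^{(\alpha,\ell)}_{new}\, (S^{(\alpha_2)})^T \quad\text{for } 1 \le \ell \le r_\alpha,$$
or equivalently $C^{old}_\alpha = \big( S^{(\alpha_1)} \otimes S^{(\alpha_2)} \otimes I \big) C^{new}_\alpha$. Then $C^{(\alpha,\ell)}_{new}$ are the coefficient matrices with respect to the new bases
$$B^{new}_{\alpha_i} := B^{old}_{\alpha_i} S^{(\alpha_i)} \qquad (i = 1, 2)$$
at the son vertices (cf. (11.26b)). Since the frame $B^{new}_{\alpha_i}$ is not used in computations, no arithmetic operations accrue.

Proof. We start with (11.20) and insert $C^{(\alpha,\ell)}_{old} = S^{(\alpha_1)} C^{(\alpha,\ell)}_{new} (S^{(\alpha_2)})^T$. Then
$$\begin{aligned} b_\ell^{(\alpha)} &= \sum_{i=1}^{r^{old}_{\alpha_1}} \sum_{j=1}^{r^{old}_{\alpha_2}} c^{(\alpha,\ell)}_{ij,old}\, b^{(\alpha_1)}_{i,old} \otimes b^{(\alpha_2)}_{j,old} \\ &= \sum_{i=1}^{r^{old}_{\alpha_1}} \sum_{j=1}^{r^{old}_{\alpha_2}} \Big( S^{(\alpha_1)} C^{(\alpha,\ell)}_{new} (S^{(\alpha_2)})^T \Big)_{i,j}\, b^{(\alpha_1)}_{i,old} \otimes b^{(\alpha_2)}_{j,old} \\ &= \sum_{i=1}^{r^{old}_{\alpha_1}} \sum_{j=1}^{r^{old}_{\alpha_2}} \sum_{k=1}^{r^{new}_{\alpha_1}} \sum_{m=1}^{r^{new}_{\alpha_2}} S^{(\alpha_1)}_{ik}\, c^{(\alpha,\ell)}_{km,new}\, S^{(\alpha_2)}_{jm}\, b^{(\alpha_1)}_{i,old} \otimes b^{(\alpha_2)}_{j,old} \\ &= \sum_{k=1}^{r^{new}_{\alpha_1}} \sum_{m=1}^{r^{new}_{\alpha_2}} c^{(\alpha,\ell)}_{km,new} \Big( \sum_{i=1}^{r^{old}_{\alpha_1}} S^{(\alpha_1)}_{ik}\, b^{(\alpha_1)}_{i,old} \Big) \otimes \Big( \sum_{j=1}^{r^{old}_{\alpha_2}} S^{(\alpha_2)}_{jm}\, b^{(\alpha_2)}_{j,old} \Big) \\ &= \sum_{k=1}^{r^{new}_{\alpha_1}} \sum_{m=1}^{r^{new}_{\alpha_2}} c^{(\alpha,\ell)}_{km,new}\, b^{(\alpha_1)}_{k,new} \otimes b^{(\alpha_2)}_{m,new} \end{aligned}$$
has the coefficients $C^{(\alpha,\ell)}_{new}$ with respect to the new basis $B^{new}_{\alpha_i} := B^{old}_{\alpha_i} S^{(\alpha_i)}$. □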
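Lemmata 11.26 and 11.27 state that the frame vectors $b_\ell^{(\alpha)}$ remain unchanged when the son bases and the coefficient matrices are transformed consistently. A small NumPy check, assuming full leaf vectors and random data (the helper `frame` and all array names are assumptions of this sketch), can make this concrete:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, r1, r2, ra = 5, 6, 3, 2, 4

def frame(B1, B2, C):
    """b_l = sum_ij C[i, j, l] b_i^(alpha1) (x) b_j^(alpha2), as an n1 x n2 x r_alpha array."""
    return np.einsum('ijl,ai,bj->abl', C, B1, B2)

# Lemma 11.26: B_old = B_new T at both sons, C_new = T1 C_old T2^T  (11.28).
B1_new = rng.standard_normal((n1, r1))
B2_new = rng.standard_normal((n2, r2))
T1 = rng.standard_normal((r1, r1))
T2 = rng.standard_normal((r2, r2))
B1_old, B2_old = B1_new @ T1, B2_new @ T2
C_old = rng.standard_normal((r1, r2, ra))
C_new = np.einsum('ki,ijl,mj->kml', T1, C_old, T2)
ok_26 = np.allclose(frame(B1_old, B2_old, C_old), frame(B1_new, B2_new, C_new))

# Lemma 11.27: C_old = S1 C_new S2^T with rectangular S (Case B), B_new = B_old S.
s1, s2 = 2, 1
S1 = rng.standard_normal((r1, s1))
S2 = rng.standard_normal((r2, s2))
D_new = rng.standard_normal((s1, s2, ra))
D_old = np.einsum('ik,kml,jm->ijl', S1, D_new, S2)
ok_27 = np.allclose(frame(B1_old, B2_old, D_old),
                    frame(B1_old @ S1, B2_old @ S2, D_new))

print(ok_26, ok_27)
```

In the second part the new son frames `B1_old @ S1` and `B2_old @ S2` are formed only for the comparison; as the lemma notes, they are not needed in an actual computation.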
At the root $\alpha = D$, the tensor $v$ is expressed by $v = \sum_{i=1}^{r_D} c_i^{(D)} b_i^{(D)} = B_D c^{(D)}$ (cf. (11.21)). A change of the basis $B_D$ is considered next.

Lemma 11.28. Assume a transformation by $B^{new}_D T^{(D)} = B^{old}_D$ (cf. (11.26a)). Then the coefficient vector $c^{(D)}_{old}$ must be transformed into
$$c^{(D)}_{new} := T^{(D)} c^{(D)}_{old}. \tag{11.29}$$
The arithmetic cost is $2\, r^{old}_D r^{new}_D$.

Proof. $v = B^{old}_D c^{(D)}_{old} = B^{new}_D T^{(D)} c^{(D)}_{old} = B^{new}_D c^{(D)}_{new}$. □
11.3.1.5 Multiplication by Elementary Kronecker Products

In §11.3.1.4 the bases and consequently also some coefficient matrices have been changed, but the tensor $v$ is fixed. Now we map $v$ into another tensor $w = A v$ with $A = \bigotimes_{j\in D} A^{(j)}$. The proposition states that the coefficient matrices need not be changed.

Proposition 11.29. Let the tensor $v = \rho_{HT}\big( T_D, (C_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D} \big)$ and the elementary Kronecker product $A = \bigotimes_{j\in D} A^{(j)}$ be given. Then $w := A v$ has the representation $w = \rho_{HT}\big( T_D, (C_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B^w_j)_{j\in D} \big)$, in which only the frames (bases) $B_j = [b_1^{(j)}, \ldots, b_{r_j}^{(j)}]$ are replaced with the new frames $B^w_j = [A^{(j)} b_1^{(j)}, \ldots, A^{(j)} b_{r_j}^{(j)}]$.

Proof. Consider a vertex $\alpha \in T_D\setminus L(T_D)$ with sons $\alpha_1$ and $\alpha_2$. Application of $A^{(\alpha)} = A^{(\alpha_1)} \otimes A^{(\alpha_2)}$ to $b_\ell^{(\alpha)}$ in (11.20) yields
$$A^{(\alpha)} b_\ell^{(\alpha)} = \sum_{i,j} c^{(\alpha,\ell)}_{ij}\, \big( A^{(\alpha_1)} b_i^{(\alpha_1)} \big) \otimes \big( A^{(\alpha_2)} b_j^{(\alpha_2)} \big).$$
Although the quantities $A^{(\alpha)} b_\ell^{(\alpha)}$, $A^{(\alpha_1)} b_i^{(\alpha_1)}$, $A^{(\alpha_2)} b_j^{(\alpha_2)}$ are new, the coefficient matrix $C^{(\alpha,\ell)}$ is unchanged. □
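Proposition 11.29 can be checked for the simplest tree with two leaves, where the tensor is $v = \sum_{i,j} c_{ij}\, b_i^{(1)} \otimes b_j^{(2)}$. The sketch below compares the application of the full Kronecker matrix with the leaf-wise update; the sizes and the random data are assumptions, and `np.kron` together with row-major reshaping is used to realise $A^{(1)} \otimes A^{(2)}$.

```python
import numpy as np

rng = np.random.default_rng(6)
n1, n2, m1, m2, r1, r2 = 4, 5, 3, 6, 2, 3

B1 = rng.standard_normal((n1, r1))     # leaf frames
B2 = rng.standard_normal((n2, r2))
C = rng.standard_normal((r1, r2))      # coefficient matrix at the root (r_D = 1)
A1 = rng.standard_normal((m1, n1))     # factors of A = A1 (x) A2
A2 = rng.standard_normal((m2, n2))

v = B1 @ C @ B2.T                      # v as an n1 x n2 array

# Full application: (A1 (x) A2) acts on the row-major vectorisation of v.
w_full = (np.kron(A1, A2) @ v.reshape(-1)).reshape(m1, m2)

# Hierarchical application: only the leaf frames change, C is reused unchanged.
B1_w, B2_w = A1 @ B1, A2 @ B2
w_ht = B1_w @ C @ B2_w.T

print(np.allclose(w_full, w_ht))
```

The leaf-wise variant costs $r_j$ matrix-vector products per direction, whereas the full Kronecker matrix has $m_1 m_2 \times n_1 n_2$ entries.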
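11.3.1.6 Gram Matrices of Bases

The Gram matrix $G(B_\alpha) = B_\alpha^H B_\alpha \in \mathbb{K}^{r_\alpha\times r_\alpha}$ will frequently appear later on. Its entries are
$$G(B_\alpha) = \big( g^{(\alpha)}_{ij} \big) \quad\text{with } g^{(\alpha)}_{ij} = \big\langle b_j^{(\alpha)}, b_i^{(\alpha)} \big\rangle. \tag{11.30}$$
The recursive structure (11.20) allows a recursive definition of the Gram matrices.

Lemma 11.30. For $\alpha \in T_D\setminus L(T_D)$ let $C^{(\alpha,\bullet)}$ be the coefficient matrices in (11.20). Then $G(B_\alpha)$ can be derived from $G(B_{\alpha_1})$, $G(B_{\alpha_2})$ ($\alpha_1, \alpha_2$ sons of $\alpha$) by
$$\begin{aligned} g^{(\alpha)}_{\ell k} &= \operatorname{trace}\Big( C^{(\alpha,k)}\, G(B_{\alpha_2})^T\, (C^{(\alpha,\ell)})^H\, G(B_{\alpha_1}) \Big) \qquad (1 \le \ell, k \le r_\alpha) \\ &= \Big\langle C^{(\alpha,k)} G(B_{\alpha_2})^T,\ G(B_{\alpha_1})\, C^{(\alpha,\ell)} \Big\rangle_F \\ &= \Big\langle G(B_{\alpha_1})^{\frac12}\, C^{(\alpha,k)}\, \big( G(B_{\alpha_2})^{\frac12} \big)^T,\ G(B_{\alpha_1})^{\frac12}\, C^{(\alpha,\ell)}\, \big( G(B_{\alpha_2})^{\frac12} \big)^T \Big\rangle_F . \end{aligned}$$

Proof. $g^{(\alpha)}_{\ell k} = \langle b_k^{(\alpha)}, b_\ell^{(\alpha)} \rangle = \big\langle \sum_{ij} c^{(\alpha,k)}_{ij}\, b_i^{(\alpha_1)} \otimes b_j^{(\alpha_2)},\ \sum_{pq} c^{(\alpha,\ell)}_{pq}\, b_p^{(\alpha_1)} \otimes b_q^{(\alpha_2)} \big\rangle = \sum_{ijpq} c^{(\alpha,k)}_{ij}\, \langle b_j^{(\alpha_2)}, b_q^{(\alpha_2)} \rangle\, \overline{c^{(\alpha,\ell)}_{pq}}\, \langle b_i^{(\alpha_1)}, b_p^{(\alpha_1)} \rangle$. Use (2.10). □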
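The trace formula of Lemma 11.30 can be compared with the directly computed Gram matrix $B_\alpha^H B_\alpha$ for small complex test data. In the sketch below the frame $B_\alpha$ is formed explicitly only for this comparison; the array names and the Euclidean scalar product on $\mathbb{C}^{n_1} \otimes \mathbb{C}^{n_2}$ are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, r1, r2, ra = 5, 6, 3, 2, 4

def crandn(*shape):
    """Complex Gaussian test data."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

B1, B2 = crandn(n1, r1), crandn(n2, r2)
C = crandn(r1, r2, ra)                  # C[:, :, l] = C^{(alpha,l)}

G1 = B1.conj().T @ B1                   # Gram matrices of the sons, cf. (11.30)
G2 = B2.conj().T @ B2

# Recursive evaluation via the trace formula of Lemma 11.30.
G_rec = np.empty((ra, ra), dtype=complex)
for l in range(ra):
    for k in range(ra):
        G_rec[l, k] = np.trace(C[:, :, k] @ G2.T @ C[:, :, l].conj().T @ G1)

# Direct evaluation from the explicitly formed frame vectors b_l^(alpha).
B_alpha = np.einsum('ijl,ai,bj->abl', C, B1, B2).reshape(n1 * n2, ra)
G_dir = B_alpha.conj().T @ B_alpha

print(np.allclose(G_rec, G_dir))
```

The recursive evaluation only touches matrices of size $r_{\alpha_1} \times r_{\alpha_2}$ and never the vectors of length $n_1 n_2$.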
11.3.1.7 Ordering of the Directions

The construction of the tree $T_D$ groups the directions $1, 2, \ldots, d$ in a certain way. Different trees $T_D$ lead to different nodes $\alpha \subset D$ and, therefore, to different dimensions $r_\alpha$. Theoretically, we would prefer a tree $T_D$ such that
$$N^{HT}_{mem}\big((C_\alpha)_{\alpha\in T_D\setminus L(T_D)}\big) = \sum_{\alpha\in T_D\setminus L(T_D)} r_\alpha r_{\alpha_1} r_{\alpha_2} \qquad (\alpha_1, \alpha_2 \text{ sons of } \alpha)$$
is minimal. However, this minimisation is hard to perform since the ranks $r_\alpha = \operatorname{rank}_\alpha(v)$ are usually not known in advance. On the other hand, given a tree $T_D$, we can identify all permutations $\pi$ of $D = \{1, 2, \ldots, d\}$ such that the tensor $v_{i_{\pi(1)} \cdots i_{\pi(d)}}$ with interchanged directions is organised by almost the same tree.

Lemma 11.31. Any node $\alpha \in T_D\setminus L(T_D)$ gives rise to a permutation $\pi_\alpha : D \to D$ by interchanging the positions of the sons $\alpha_1$ and $\alpha_2$. Their products yield the set
$$P := \Big\{ \prod_{\alpha\in A} \pi_\alpha : A \subset T_D\setminus L(T_D) \Big\}.$$
Any permutation $\pi \in P$ leaves the tree $T_D$ invariant, only the ordering of the sons $\alpha_1$ and $\alpha_2$ may be reversed. According to Remark 11.20, the coefficient matrix $C^{(\alpha,\ell)}$ becomes $C^{(\alpha,\ell)T}$ if the factor $\pi_\alpha$ appears in $\pi \in P$. Hence all tensors $v_{i_{\pi(1)} \cdots i_{\pi(d)}}$ for $\pi \in P$ have almost the same representation. Since $T_D\setminus L(T_D)$ contains $d - 1$ vertices, there are $2^{d-1}$ permutations in $P$.

Remark 11.32. A particular permutation is the reversion
$$\pi : (1, 2, \ldots, d) \mapsto (d, d-1, \ldots, 1).$$
Since $\pi = \prod_{\alpha\in T_D\setminus L(T_D)} \pi_\alpha$, this permutation is contained in the set $P$ from above. Hence the tensor representation of $w \in V_d \otimes \ldots \otimes V_1$ defined by $w = \pi(v)$ (cf. (3.39)) is obtained from the representation (11.24) of $v$ by transposing all coefficient matrices $C^{(\alpha,\ell)}$.
11.3.2 Orthonormal Bases The choice of orthonormal bases has many advantages, one being numerical stability and another the truncation procedure which will be discussed in §11.4.2.1. The orthonormalisation of general bases (frames) starts at the leaves (cf. §11.3.2.1) and proceeds through the tree to the root (cf. §11.3.2.2).
11.3.2.1 Bases at Leaf Vertices

Assume that the spaces $U_j$ involved in $U = \bigotimes_{j\in D} U_j$ possess scalar products, which are denoted by $\langle\cdot,\cdot\rangle$ (reference to the index $j$ omitted). The induced scalar product on the tensor spaces $U_\alpha = \bigotimes_{j\in\alpha} U_j$ for $\alpha \in T_D$ is also written as $\langle\cdot,\cdot\rangle$. Again Footnote 7 (page 266) applies.

If the given bases $B_j = [b_1^{(j)}, \ldots, b_{r_j}^{(j)}]$ of $U_j$ are not orthonormal, we can apply the techniques discussed in §8.2.4.2 (see also §13.4.4). Using the QR decomposition, we get $B_j = Q_j R_j$ (cost: $2 n_j r_j^2$ with $n_j = \dim V_j$); hence
$$B^{new}_j = Q_j \quad\text{and}\quad T^{(\{j\})} = R_j.$$
Set $\alpha := \operatorname{father}(\{j\})$ with sons $S(\alpha) = \{\{j\}, \{k\}\}$. The necessary update of $C_\alpha$ by (11.28) costs about $r_j^2 r_k r_\alpha$ operations.

Alternatively, we can determine the Gram matrix $G(B_j)$ and its Cholesky decomposition $L L^H$ (cost: $n_j r_j^2 + \frac13 r_j^3$). By Lemma 8.16 we have $B^{new}_j = B_j L^{-H}$ with the transformation $T^{(\{j\})} = L^H$ (cost: $n_j r_j^2$). The cost for updating $C_\alpha$ is as above.
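Both orthonormalisation variants at a leaf, together with the update (11.28) of the father's coefficient matrices, can be sketched for real data as follows. The helper `frame`, the array names, and the use of `np.linalg.solve` to apply $L^{-H}$ are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n_j, n_k, r_j, r_k, r_a = 8, 7, 3, 2, 4

B_j = rng.standard_normal((n_j, r_j))      # non-orthonormal basis at the leaf {j}
B_k = rng.standard_normal((n_k, r_k))      # basis of the brother leaf {k}
C = rng.standard_normal((r_j, r_k, r_a))   # C[:, :, l] = C^{(alpha,l)} at the father

def frame(B1, B2, C):
    """Frame vectors of the father, formed explicitly for checking only."""
    return np.einsum('ijl,ai,bj->abl', C, B1, B2)

# QR variant: B_j = Q R, B_new = Q, T = R, father update C_new = T C_old.
Q, R = np.linalg.qr(B_j)
C_qr = np.einsum('pi,ijl->pjl', R, C)

# Cholesky variant: G = L L^T, B_new = B_j L^{-T}, T = L^T.
L = np.linalg.cholesky(B_j.T @ B_j)
B_chol = np.linalg.solve(L, B_j.T).T
C_chol = np.einsum('pi,ijl->pjl', L.T, C)

reference = frame(B_j, B_k, C)
print(np.allclose(frame(Q, B_k, C_qr), reference),
      np.allclose(frame(B_chol, B_k, C_chol), reference))
```

Only the basis at $\{j\}$ is changed here, so $T^{(\{k\})} = I$ and the update touches the first index of $C_\alpha$ only.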
11.3.2.2 Bases at Non-Leaf Vertices Now we are considering a vertex α ∈ TD \L(TD ) and assume that orthonormal bases Bα1 and Bα2 at the son vertices α1 , α2 are already orthonormalised. According to Remark 11.19c, the tensor space Uα1 ⊗ Uα2 at vertex α has the canonical orthonormal basis (α1 ) 2) {b(α) ⊗ b(α : 1 ≤ i ≤ rα1 , 1 ≤ j ≤ rα2 }. νµ := bν µ
Assume that a subspace Uα ⊂ Uα1 ⊗ Uα2 is defined as the span of some basis : 1 ≤ i ≤ rα }, which gives rise to
(α) {bi
(α) r ∈ (Uα ) α . Bα := b1 , . . . , b(α) rα (α)
Since the tensors bj are not directly available, their orthonormality must be expressed by the coefficient matrices C (α,j) .
11 Hierarchical Tensor Representation
412
Lemma 11.33. Let α1 and α2 be the sons of α ∈ TD . Suppose that Bα1 and Bα2 represent orthonormal12 bases. For any vectors v, w ∈ Uα with representations v=
rα 1 rα 2 X X
(α ) bi 1
cvij
⊗
(α ) bj 2
and w =
i=1 j=1
rα 1 rα 2 X X
(α1 )
cw ij bi
(α2 )
⊗ bj
,
i=1 j=1
w ∈ Krα1×rα2 , the scalar involving coefficient matrices Cv := cvij , Cw := cij product of v and w is equal to the Frobenius scalar product (2.10) of Cv and Cw : hv, wi = hCv , Cw iF =
rα 1 rα 2 X X
cvij cw ij .
i=1 j=1 (α)
Hence the isomorphism Θα in (11.19b) is unitary. In particular, the basis {bν } is orthonormal if and only if {C (α,ν) } is orthonormal with respect to the Frobenius scalar product. Proof. By hv, wi =
X
cvij
(α ) bi 1
⊗
(α ) bj 2 ,
X
i,j
=
XX k,` i,j
cw k`
(α ) bk 1
⊗
(α ) b` 2
k,`
X
(α1 ) (α ) (α ) (α ) cvij cw cvij cw ⊗ bj 2 , bk 1 ⊗ b` 2 = ij , k` bi | {z } i,j = δi,k δj,`
we arrive at the Frobenius scalar product.
t u
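Using $G(\mathbf{B}_{\alpha_i}) = I$, also Lemma 11.30 implies that the entries of the Gram matrix $G(\mathbf{B}_\alpha)$ satisfy
\[
g^{(\alpha)}_{\nu\mu} = \bigl\langle \mathbf{b}^{(\alpha)}_\mu , \mathbf{b}^{(\alpha)}_\nu \bigr\rangle = \bigl\langle C^{(\alpha,\mu)}, C^{(\alpha,\nu)} \bigr\rangle_F \qquad (1 \le \nu, \mu \le r_\alpha). \tag{11.31}
\]
Again, either the QR or the Cholesky decomposition can be used for orthonormalisation. In the QR case, each matrix $C^{(\alpha,\cdot)}$ can be considered as a vector of length $r_{\alpha_1} r_{\alpha_2}$ with standard Euclidean norm. Then the cost of the QR decomposition is $r_{\alpha_1} r_{\alpha_2} r_\alpha^2$. The columns of the orthonormal matrix $Q$ define the coefficient matrices $C^{(\alpha,\cdot)}_{\mathrm{new}}$. Let $\gamma := \mathrm{father}(\alpha)$ with $S(\gamma) = \{\alpha, \beta\}$ (i.e., $\beta$ is the brother of $\alpha$). Renewing $\mathbf{C}_\beta$ requires $r_\alpha^2 r_\beta r_\gamma$ operations (cf. Lemma 11.26). Now the (virtual) new bases $\mathbf{B}^{\mathrm{new}}_\alpha$ described by $\mathbf{C}_\beta$ are orthonormal.

Alternatively, we may compute the Gram matrix $G(\mathbf{B}_\alpha)$ and its Cholesky decomposition (cost: $r_{\alpha_1} r_{\alpha_2} r_\alpha^2 + \tfrac{1}{3} r_\alpha^3$). Instead of $\mathbf{B}^{\mathrm{new}}_\alpha = \mathbf{B}_\alpha L^{-H}$, new coefficient matrices $C^{(\alpha,\cdot)}_{\mathrm{new}}$ are determined by (11.27) with $S^{(\alpha)} = L^{-H}$ (cost: $r_{\alpha_1} r_{\alpha_2} r_\alpha^2$ operations). The update of $\mathbf{C}_\beta$ is as in the former case.

¹² If the bases are not orthonormal, compare Lemma 11.46 and its proof.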
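The QR variant described above can be sketched as follows, assuming NumPy, real data, and coefficient matrices stored in an array `C` of shape `(r_alpha, r1, r2)` (a storage layout chosen here for illustration). Each $C^{(\alpha,\ell)}$ is flattened into one column, and the columns of `Q` are reshaped back into the new coefficient matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
r_alpha, r1, r2 = 3, 4, 5
C = rng.standard_normal((r_alpha, r1, r2))   # C[l] = C^(alpha,l), not orthonormal

# Each C[l] becomes one column of length r1*r2.
M = C.reshape(r_alpha, r1 * r2).T            # shape (r1*r2, r_alpha)
Q, R = np.linalg.qr(M)                       # M = Q R with orthonormal columns of Q
C_new = Q.T.reshape(r_alpha, r1, r2)         # columns of Q as coefficient matrices

# Gram matrix of the new coefficient matrices (Frobenius scalar products).
gram_new = np.einsum("mij,nij->mn", C_new, C_new)

# The old matrices are recovered as C[l] = sum_k R[k, l] * C_new[k].
C_back = np.einsum("kl,kij->lij", R, C_new)
```

Here `gram_new` is the identity up to rounding. The triangular factor `R` records the basis change that a subsequent update of other coefficient matrices has to take into account; how that update is organised is described in Lemma 11.26 and is not reproduced in this sketch.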
We denote the hierarchical format with orthonormal bases by
\[
\mathbf{v} = \rho^{\mathrm{orth}}_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr), \tag{11.32}
\]
which requires $B_j^H B_j = I$ for all $j \in D$ and $\langle C^{(\alpha,\mu)}, C^{(\alpha,\nu)} \rangle_F = \delta_{\nu\mu}$. By Lemma 11.33, these conditions imply orthonormality: $\mathbf{B}_\alpha^H \mathbf{B}_\alpha = I$.

Adding the cost of the QR orthonormalisation of a basis and of the transformations involved, we get the following result.

Remark 11.34. Given a hierarchical representation with general bases or frames, the orthonormalisation costs asymptotically $2dnr^2 + 3r^4(d-1)$ operations, where
\[
r := \max_\alpha r_\alpha, \qquad n := \max_j n_j .
\]
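11.3.2.3 Transformation between Orthonormal Bases

Lemmata 11.26 and 11.27 remain valid for orthonormal bases. In order not to lose orthonormality, the transformation matrices must be unitary or (in the case of rectangular matrices) orthogonal. Note that orthogonal $n \times m$ matrices require $n \ge m$. The situation $r^{\mathrm{new}}_{\alpha_i} \le r^{\mathrm{old}}_{\alpha_i}$ is covered by Part (a) of the next corollary, while Part (b) requires $r^{\mathrm{new}}_{\alpha_i} \ge r^{\mathrm{old}}_{\alpha_i}$.

Corollary 11.35. Let $\alpha_1, \alpha_2$ be the sons of $\alpha \in T_D \setminus L(T_D)$.
(a) If the transformations
\[
C^{(\alpha,\ell)}_{\mathrm{old}} = S^{(\alpha_1)}\, C^{(\alpha,\ell)}_{\mathrm{new}}\, (S^{(\alpha_2)})^T \quad\text{for } 1 \le \ell \le r_\alpha, \qquad
\mathbf{B}^{\mathrm{new}}_{\alpha_1} = \mathbf{B}^{\mathrm{old}}_{\alpha_1} S^{(\alpha_1)}, \qquad
\mathbf{B}^{\mathrm{new}}_{\alpha_2} = \mathbf{B}^{\mathrm{old}}_{\alpha_2} S^{(\alpha_2)}
\]
hold with orthogonal matrices $S^{(\alpha_i)}$ ($i = 1, 2$), the bases $\mathbf{B}^{\mathrm{new}}_{\alpha_i}$ inherit orthonormality from $\mathbf{B}^{\mathrm{old}}_{\alpha_i}$, while the Frobenius scalar product of the coefficient matrices is invariant:
\[
\bigl\langle C^{(\alpha,\ell)}_{\mathrm{new}}, C^{(\alpha,k)}_{\mathrm{new}} \bigr\rangle_F = \bigl\langle C^{(\alpha,\ell)}_{\mathrm{old}}, C^{(\alpha,k)}_{\mathrm{old}} \bigr\rangle_F \qquad\text{for all } 1 \le \ell, k \le r_\alpha . \tag{11.33}
\]
(b) If $\mathbf{B}^{\mathrm{new}}_{\alpha_i} T^{(\alpha_i)} = \mathbf{B}^{\mathrm{old}}_{\alpha_i}$ holds for $i = 1, 2$ with orthogonal matrices $T^{(\alpha_i)}$, the new coefficients defined by $C^{(\alpha,\ell)}_{\mathrm{new}} = T^{(\alpha_1)}\, C^{(\alpha,\ell)}_{\mathrm{old}}\, (T^{(\alpha_2)})^T$ satisfy again (11.33).

Proof. $(\mathbf{B}^{\mathrm{new}}_{\alpha_1})^H \mathbf{B}^{\mathrm{new}}_{\alpha_1} = S^{(\alpha_1)H} (\mathbf{B}^{\mathrm{old}}_{\alpha_1})^H \mathbf{B}^{\mathrm{old}}_{\alpha_1} S^{(\alpha_1)} = S^{(\alpha_1)H} S^{(\alpha_1)} = I$ proves orthonormality of the basis $\mathbf{B}^{\mathrm{new}}_{\alpha_1}$. The identity
\[
\bigl\langle C^{(\alpha,\ell)}_{\mathrm{old}}, C^{(\alpha,k)}_{\mathrm{old}} \bigr\rangle_F
= \bigl\langle S^{(\alpha_1)} C^{(\alpha,\ell)}_{\mathrm{new}} S^{(\alpha_2)T},\; S^{(\alpha_1)} C^{(\alpha,k)}_{\mathrm{new}} S^{(\alpha_2)T} \bigr\rangle_F
= \bigl\langle C^{(\alpha,\ell)}_{\mathrm{new}}, C^{(\alpha,k)}_{\mathrm{new}} \bigr\rangle_F
\]
follows from Exercise 2.12b. □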
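The invariance (11.33) is easy to observe numerically. The sketch below assumes NumPy, real data and square orthogonal matrices `S1`, `S2` (the case of equal old and new ranks); the array layout and the helper `frobenius_gram` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
r_alpha, r1, r2 = 3, 4, 5
C_new = rng.standard_normal((r_alpha, r1, r2))

# Square orthogonal matrices S1, S2 (Q factors of random matrices).
S1, _ = np.linalg.qr(rng.standard_normal((r1, r1)))
S2, _ = np.linalg.qr(rng.standard_normal((r2, r2)))

# C_old^(alpha,l) = S1 C_new^(alpha,l) S2^T for every l.
C_old = np.einsum("ai,lij,bj->lab", S1, C_new, S2)

def frobenius_gram(C):
    """Matrix of Frobenius scalar products <C[l], C[k]>_F (real case)."""
    return np.einsum("lij,kij->lk", C, C)
```

Comparing `frobenius_gram(C_old)` with `frobenius_gram(C_new)` shows agreement up to rounding.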
As in §11.3.2.1, a transformation of Bα should be followed by an update of Cβ for the father β of α (cf. Lemma 11.26). If D is the father of α, the coefficient cD must be updated (cf. Lemma 11.28).
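11.3.2.4 Unitary Mappings

We now consider the analogue of the mappings in §11.3.1.5 under orthonormality preserving conditions. For $j \in D$, let $B_j = [b^{(j)}_1, \ldots, b^{(j)}_{r_j}]$ be an orthonormal basis. $U_j := \mathrm{range}(B_j)$ is a subspace of $V_j$. Let $A_j : U_j \to \hat U_j \subset V_j$ be a mapping such that
\[
\hat B_j = [\hat b^{(j)}_1, \ldots, \hat b^{(j)}_{r_j}] \quad\text{with}\quad \hat b^{(j)}_i := A_j b^{(j)}_i
\]
is again an orthonormal basis. Proposition 11.29 applied to $\mathbf{A} = \bigotimes_{j\in D} A_j$ takes the following form.

Proposition 11.36. Let the tensor $\mathbf{v} = \rho^{\mathrm{orth}}_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)$ and the elementary Kronecker product $\mathbf{A} = \bigotimes_{j\in D} A_j$ with unitary mappings $A_j : U_j \to \hat U_j \subset V_j$ be given. Then $\mathbf{w} := \mathbf{A}\mathbf{v}$ has the representation
\[
\mathbf{w} = \rho^{\mathrm{orth}}_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (\hat B_j)_{j\in D}\bigr),
\]
in which the orthonormal bases $B_j$ are replaced with the orthonormal bases $\hat B_j = [A_j b^{(j)}_1, \ldots, A_j b^{(j)}_{r_j}]$.

Let $A \subset T_D$ be a complete set of successors of $D$ (cf. Definition 11.9). Consider Kronecker products $\mathbf{A} = \bigotimes_{\alpha\in A} \mathbf{A}_\alpha$ and assume that
\[
\mathbf{A}_\alpha : \mathbf{U}_\alpha \to \hat{\mathbf{U}}_\alpha \subset \mathbf{U}_{\alpha_1} \otimes \mathbf{U}_{\alpha_2} \quad\text{is unitary for all } \alpha \in A .
\]
Hence the orthonormal basis $\mathbf{B}_\alpha = [\mathbf{b}^{(\alpha)}_1, \ldots, \mathbf{b}^{(\alpha)}_{r_\alpha}]$ is mapped into a new orthonormal basis
\[
\hat{\mathbf{B}}_\alpha = [\hat{\mathbf{b}}^{(\alpha)}_1, \ldots, \hat{\mathbf{b}}^{(\alpha)}_{r_\alpha}] \quad\text{with}\quad \hat{\mathbf{b}}^{(\alpha)}_\ell := \mathbf{A}_\alpha \mathbf{b}^{(\alpha)}_\ell .
\]
To represent $\hat{\mathbf{b}}^{(\alpha)}_\ell$, new coefficient matrices $\hat C^{(\alpha,\ell)}$ are to be defined satisfying
\[
\mathbf{A}_\alpha \mathbf{b}^{(\alpha)}_\ell = \sum_{i,j} \hat c^{(\alpha,\ell)}_{ij}\, \mathbf{b}^{(\alpha_1)}_i \otimes \mathbf{b}^{(\alpha_2)}_j .
\]
The result $\mathbf{A}\mathbf{v}$ has the representation $\rho^{\mathrm{orth}}_{\mathrm{HT}}\bigl(T_D, (\hat{\mathbf{C}}_\beta)_{\beta\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)$, where $\hat{\mathbf{C}}_\beta = \mathbf{C}_\beta$ for all $\beta \notin A$. Only for $\beta \in A$, new coefficient matrices appear as defined above.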
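11.3.3 HOSVD Bases

11.3.3.1 Definitions, Computation of $M_\alpha(\mathbf{v}) M_\alpha(\mathbf{v})^H$

The left singular vectors $u^{(\alpha)}_i$ of $M_\alpha(\mathbf{v})$ (cf. (11.14b)) may be chosen as orthonormal basis: $\mathbf{B}_\alpha = [u^{(\alpha)}_1 \cdots u^{(\alpha)}_{r_\alpha}]$. They form the HOSVD basis corresponding to the tensor $\mathbf{v} \in \mathbf{V}$ and to the vertex $\alpha \in T_D$ (cf. Definition 8.24).

Definition 11.37 (hierarchical HOSVD representation). The hierarchical HOSVD representation denoted by
\[
\mathbf{v} = \rho^{\mathrm{HOSVD}}_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)
\]
indicates that these data correspond to HOSVD bases $\mathbf{B}_\alpha$ for all $\alpha \in T_D$.

The (virtual) HOSVD basis $\mathbf{B}^{\mathrm{HOSVD}}_\alpha = [\mathbf{b}^{(\alpha)}_1, \ldots, \mathbf{b}^{(\alpha)}_{r_\alpha}]$ corresponds to the coefficient matrix family $\mathbf{C}^{\mathrm{HOSVD}}_\alpha = (C^{(\alpha,\ell)}_{\mathrm{HOSVD}})_{1\le\ell\le r_\alpha}$.

Next, we describe a simple realisation of its computation. The left singular-value decomposition of $M_\alpha(\mathbf{v})$ is equivalent to the diagonalisation of $M_\alpha(\mathbf{v}) M_\alpha(\mathbf{v})^H$. We recall that $M_\alpha(\mathbf{v}) M_\alpha(\mathbf{v})^H$ ($\alpha \in T_D$) is the matrix version of the partial scalar product $\langle M_\alpha(\mathbf{v}), M_\alpha(\mathbf{v}) \rangle_{\alpha^c} \in \mathbf{V}_\alpha \otimes \mathbf{V}_\alpha$ (cf. §5.2.3). In the case of the tensor subspace format, its computation must refer to the complete coefficient tensor. Similarly, for the r-term format, computing $M_\alpha(\mathbf{v}) M_\alpha(\mathbf{v})^H$ involves all coefficients. For the orthonormal hierarchical format, the situation is simpler. Only the coefficients $\mathbf{C}_\beta$ for all predecessors $\beta \supset \alpha$ are involved.

Theorem 11.38. For $\mathbf{v} = \rho^{\mathrm{orth}}_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)$ define the matrices $E_\alpha = (e^{(\alpha)}_{ij}) \in K^{r_\alpha\times r_\alpha}$ by
\[
\langle M_\alpha(\mathbf{v}), M_\alpha(\mathbf{v}) \rangle_{\alpha^c} = \sum_{i,j=1}^{r_\alpha} e^{(\alpha)}_{ij}\, \mathbf{b}^{(\alpha)}_i \otimes \mathbf{b}^{(\alpha)}_j \in \mathbf{V}_\alpha \otimes \mathbf{V}_\alpha .
\]
For $\alpha = D$, the matrix $E_D$ (usually of the size $1 \times 1$) is equal to
\[
E_D := c^{(D)} (c^{(D)})^H \in K^{r_D\times r_D}. \tag{11.34}
\]
Let $\alpha_1, \alpha_2$ be the sons of $\alpha \in T_D\setminus L(T_D)$. Given $E_\alpha \in K^{r_\alpha\times r_\alpha}$, we determine $E_{\alpha_1}$ and $E_{\alpha_2}$ representing $\langle M_{\alpha_i}(\mathbf{v}), M_{\alpha_i}(\mathbf{v}) \rangle_{\alpha_i^c}$ ($i = 1, 2$) from
\[
E_{\alpha_1} = \sum_{i,j=1}^{r_\alpha} e^{(\alpha)}_{ij}\, C^{(\alpha,i)} (C^{(\alpha,j)})^H, \qquad
E_{\alpha_2} = \sum_{i,j=1}^{r_\alpha} e^{(\alpha)}_{ij}\, (C^{(\alpha,i)})^T\, \overline{C^{(\alpha,j)}} .
\]
Proof. Use Theorem 5.15 and note that the Gram matrix is the identity since the bases are orthonormal. See also Hackbusch [144]. □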
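For the smallest nontrivial tree the recursion of Theorem 11.38 can be verified directly. The sketch below assumes NumPy, real data, and $D = \{1,2,3\}$ with sons $\alpha_1 = \{1,2\}$ and $\alpha_2 = \{3\}$; for brevity the orthonormal basis of $\mathbf{U}_{\alpha_1}$ is given explicitly as a matrix `B12` instead of being built from its own sons. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def orth(n, r):
    """n x r matrix with orthonormal columns."""
    q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return q

n12, n3 = 12, 5                       # dim of V_1 (x) V_2 and of V_3
r12, r3 = 3, 4
B12 = orth(n12, r12)                  # orthonormal basis at alpha_1 = {1,2}
B3 = orth(n3, r3)                     # orthonormal basis at alpha_2 = {3}
C = rng.standard_normal((r12, r3))    # coefficient matrix C^(D,1)
c = 2.5                               # root coefficient c^(D), r_D = 1

# Matricisation M_{alpha_1}(v) of v = c * sum_ij C_ij b_i^(12) (x) b_j^(3).
M = c * B12 @ C @ B3.T                # rows: alpha_1, columns: alpha_2

E_D = np.array([[c * c]])             # (11.34)
E_1 = E_D[0, 0] * C @ C.T             # recursion for E_{alpha_1}
E_2 = E_D[0, 0] * C.T @ C             # recursion for E_{alpha_2}
```

With these data, `M @ M.T` equals `B12 @ E_1 @ B12.T` and `M.T @ M` equals `B3 @ E_2 @ B3.T`, i.e., the small matrices `E_1`, `E_2` represent the large products in the respective bases.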
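Even for non-orthonormal bases, Theorem 5.15 provides a recursion for $E_\alpha$. Theorem 11.38 allows us to determine $E_\alpha$ by a recursion from the root to the leaves. However, it also helps for computing the HOSVD bases. A HOSVD basis at vertex $\alpha$ is characterised by
\[
E_\alpha = \mathrm{diag}\bigl\{ (\sigma^{(\alpha)}_1)^2, \ldots, (\sigma^{(\alpha)}_{r_\alpha})^2 \bigr\}, \tag{11.35}
\]
corresponding to the diagonalisation
\[
\langle M_\alpha(\mathbf{v}), M_\alpha(\mathbf{v}) \rangle_{\alpha^c} = \sum_{i=1}^{r_\alpha} (\sigma^{(\alpha)}_i)^2\, \mathbf{b}^{(\alpha)}_i \otimes \mathbf{b}^{(\alpha)}_i .
\]

Theorem 11.39. Given the data $\mathbf{v} = \rho^{\mathrm{orth}}_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)$, assume that (11.35) holds at the vertex $\alpha \in T_D\setminus L(T_D)$. Then
\[
E_{\alpha_1} = \sum_{i=1}^{r_\alpha} (\sigma^{(\alpha)}_i)^2\, C^{(\alpha,i)} (C^{(\alpha,i)})^H, \qquad
E_{\alpha_2} = \sum_{i=1}^{r_\alpha} (\sigma^{(\alpha)}_i)^2\, (C^{(\alpha,i)})^T\, \overline{C^{(\alpha,i)}} \tag{11.36}
\]
holds at the son vertices $\alpha_1$ and $\alpha_2$. The diagonalisations $E_{\alpha_1} = U_{\alpha_1} \Sigma^2_{\alpha_1} U^H_{\alpha_1}$ and $E_{\alpha_2} = U_{\alpha_2} \Sigma^2_{\alpha_2} U^H_{\alpha_2}$ yield the HOSVD bases¹³ $\mathbf{B}^{\mathrm{HOSVD}}_{\alpha_k} = \mathbf{B}_{\alpha_k} U_{\alpha_k}$ for $k = 1, 2$, where $U_{\alpha_k} = [u^{(\alpha_k)}_1, \ldots, u^{(\alpha_k)}_{r^{\mathrm{HOSVD}}_{\alpha_k}}]$.

Proof. $E_{\alpha_1} = U_{\alpha_1} \Sigma^2_{\alpha_1} U^H_{\alpha_1}$ can be rewritten as $E_{\alpha_1} = \sum_{\nu=1}^{r^{\mathrm{HOSVD}}_{\alpha_1}} (\sigma^{(\alpha_1)}_\nu)^2\, u^{(\alpha_1)}_\nu u^{(\alpha_1)H}_\nu$ with entries $e^{(\alpha_1)}_{ij} = \sum_{\nu=1}^{r^{\mathrm{HOSVD}}_{\alpha_1}} (\sigma^{(\alpha_1)}_\nu)^2\, u^{(\alpha_1)}_{\nu,i}\, u^{(\alpha_1)H}_{\nu,j}$. Hence,
\[
\langle M_{\alpha_1}(\mathbf{v}), M_{\alpha_1}(\mathbf{v}) \rangle_{\alpha_1^c}
= \sum_{i,j=1}^{r_{\alpha_1}} \sum_{\nu=1}^{r^{\mathrm{HOSVD}}_{\alpha_1}} (\sigma^{(\alpha_1)}_\nu)^2\, u^{(\alpha_1)}_{\nu,i}\, u^{(\alpha_1)H}_{\nu,j}\; \mathbf{b}^{(\alpha_1)}_i \otimes \mathbf{b}^{(\alpha_1)}_j
= \sum_{\nu=1}^{r^{\mathrm{HOSVD}}_{\alpha_1}} (\sigma^{(\alpha_1)}_\nu)^2\, \mathbf{b}^{(\alpha_1)}_{\nu,\mathrm{HOSVD}} \otimes \mathbf{b}^{(\alpha_1)}_{\nu,\mathrm{HOSVD}}
\]
with $\mathbf{b}^{(\alpha_1)}_{\nu,\mathrm{HOSVD}} = \sum_{i=1}^{r_{\alpha_1}} u^{(\alpha_1)}_\nu[i]\, \mathbf{b}^{(\alpha_1)}_i$, i.e., $\mathbf{B}^{\mathrm{HOSVD}}_{\alpha_1} = \mathbf{B}_{\alpha_1} U_{\alpha_1}$. Similarly for $\alpha_2$. □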
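In the following, we shall determine the HOSVD bases $\mathbf{B}^{\mathrm{HOSVD}}_\alpha$ together with the singular values $\sigma^{(\alpha)}_i > 0$ which can be interpreted as weights of $\mathbf{b}^{(\alpha)}_{i,\mathrm{HOSVD}}$. We add $\Sigma_\alpha = \mathrm{diag}\{\sigma^{(\alpha)}_1, \ldots, \sigma^{(\alpha)}_{r^{\mathrm{HOSVD}}_\alpha}\}$ to the representation data.

We start from orthonormal bases $\mathbf{B}_\alpha$ and their coefficient matrices $\mathbf{C}_\alpha$ and construct the new HOSVD bases $\mathbf{B}^{\mathrm{HOSVD}}_\alpha$, weights $\Sigma_\alpha$, and matrices $\mathbf{C}^{\mathrm{HOSVD}}_\alpha$. The coefficient matrices $C^{(\alpha,\ell)}$ will change twice: a transform $\mathbf{B}_\alpha \mapsto \mathbf{B}^{\mathrm{HOSVD}}_\alpha$ creates new basis vectors and, therefore, also new coefficient matrices $\hat C^{(\alpha,\ell)}$. Since the coefficients refer to the basis vectors of the sons, a basis change in these vertices leads to the second transform $\hat C^{(\alpha,\ell)} \mapsto C^{(\alpha,\ell)}_{\mathrm{HOSVD}}$ into their final state. The number of basis vectors in $\mathbf{B}_{\alpha_i}$ for the sons $\alpha_1, \alpha_2 \in S(\alpha)$ may also change from $r_{\alpha_i}$ to $r^{\mathrm{HOSVD}}_{\alpha_i}$. For simplicity, we shall overwrite the old values by the new ones without changing the symbol.

¹³ According to the second interpretation in Remark 11.25.

The inductive steps can be combined in different ways to get (a) the complete HOSVD representation $\rho^{\mathrm{HOSVD}}_{\mathrm{HT}}$, (b) the HOSVD at one vertex, and (c) the coefficients at one level $T^{(\ell)}_D$ of the tree (cf. §11.3.3.4).

11.3.3.2 General Case

To apply Theorem 11.39 we assume that all bases $\mathbf{B}_\alpha$ ($\alpha \in T_D$) are orthonormal:
\[
\mathbf{v} = \rho^{\mathrm{orth}}_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr).
\]
The recursion starts at the root $\alpha = D$ with a basis $\mathbf{B}_D$ and weights $\sigma^{(D)}_i$ as explained in §11.3.3.3. In the general case assume a vertex $\alpha \in T_D\setminus L(T_D)$ with a newly computed (HOSVD) basis $\mathbf{B}_\alpha = [\mathbf{b}^{(\alpha)}_1, \ldots, \mathbf{b}^{(\alpha)}_{r_\alpha}]$ and a weight tuple $\Sigma_\alpha = \mathrm{diag}\{\sigma^{(\alpha)}_1, \ldots, \sigma^{(\alpha)}_{r_\alpha}\}$. The corresponding coefficient matrices are gathered in $\mathbf{C}_\alpha = (C^{(\alpha,\ell)})_{1\le\ell\le r_\alpha}$ (note that in the previous step these matrices have been changed). Form the matrices¹⁴
\[
Z_{\alpha_1} := \bigl[ \sigma^{(\alpha)}_1 C^{(\alpha,1)},\; \sigma^{(\alpha)}_2 C^{(\alpha,2)},\; \ldots,\; \sigma^{(\alpha)}_{r_\alpha} C^{(\alpha,r_\alpha)} \bigr] \in K^{r_{\alpha_1}\times(r_\alpha r_{\alpha_2})}, \tag{11.37a}
\]
\[
Z_{\alpha_2} := \begin{bmatrix} \sigma^{(\alpha)}_1 C^{(\alpha,1)} \\ \vdots \\ \sigma^{(\alpha)}_{r_\alpha} C^{(\alpha,r_\alpha)} \end{bmatrix} \in K^{(r_\alpha r_{\alpha_1})\times r_{\alpha_2}}, \tag{11.37b}
\]
where $\alpha_1$ and $\alpha_2$ are the sons of $\alpha \in T_D$. Compute the left-sided reduced singular-value decomposition of $Z_{\alpha_1}$ and the right-sided one of $Z_{\alpha_2}$:
\[
Z_{\alpha_1} = U \Sigma_{\alpha_1} \hat V^T \quad\text{and}\quad Z_{\alpha_2} = \hat U \Sigma_{\alpha_2} V^T . \tag{11.37c}
\]
The matrices $\hat V$ and $\hat U$ are not needed. Only the matrices
\[
\begin{aligned}
U &\in K^{r_{\alpha_1}\times r^{\mathrm{HOSVD}}_{\alpha_1}}, & \Sigma_{\alpha_1} &= \mathrm{diag}\{\sigma^{(\alpha_1)}_1, \ldots, \sigma^{(\alpha_1)}_{r^{\mathrm{HOSVD}}_{\alpha_1}}\} \in K^{r^{\mathrm{HOSVD}}_{\alpha_1}\times r^{\mathrm{HOSVD}}_{\alpha_1}}, \\
V &\in K^{r_{\alpha_2}\times r^{\mathrm{HOSVD}}_{\alpha_2}}, & \Sigma_{\alpha_2} &= \mathrm{diag}\{\sigma^{(\alpha_2)}_1, \ldots, \sigma^{(\alpha_2)}_{r^{\mathrm{HOSVD}}_{\alpha_2}}\} \in K^{r^{\mathrm{HOSVD}}_{\alpha_2}\times r^{\mathrm{HOSVD}}_{\alpha_2}}
\end{aligned} \tag{11.37d}
\]
are of interest, where $r^{\mathrm{HOSVD}}_{\alpha_i} := \mathrm{rank}(Z_{\alpha_i}) < r_{\alpha_i}$ may occur. The data (11.37d) are characterised by the diagonalisations of the matrices $E_{\alpha_1}$ and $E_{\alpha_2}$ from (11.36):
\[
\begin{aligned}
E_{\alpha_1} &= Z_{\alpha_1} Z^H_{\alpha_1} = \sum_{\ell=1}^{r_\alpha} (\sigma^{(\alpha)}_\ell)^2\, C^{(\alpha,\ell)} C^{(\alpha,\ell)H} = U \Sigma^2_{\alpha_1} U^H, \\
E_{\alpha_2} &= Z^T_{\alpha_2} \overline{Z_{\alpha_2}} = \sum_{\ell=1}^{r_\alpha} (\sigma^{(\alpha)}_\ell)^2\, C^{(\alpha,\ell)T}\, \overline{C^{(\alpha,\ell)}} = V \Sigma^2_{\alpha_2} V^H .
\end{aligned}
\]

¹⁴ $Z_{\alpha_1}$ may be interpreted as $\Theta_\alpha(\mathbf{B}_\alpha \Sigma_\alpha)$ from (11.19b).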
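The equations $\mathrm{range}(C^{(\alpha,\ell)}) = \mathrm{range}(U)$ and $\mathrm{range}(C^{(\alpha,\ell)T}) = \mathrm{range}(V)$ are valid by construction; hence, $C^{(\alpha,\ell)}$ allows a representation $C^{(\alpha,\ell)} = U\, C^{(\alpha,\ell)}_{\mathrm{HOSVD}}\, V^T$. Since $U$ and $V$ are orthogonal matrices, the coefficient matrices at vertex $\alpha$ are transformed by
\[
C^{(\alpha,\ell)} \mapsto C^{(\alpha,\ell)}_{\mathrm{HOSVD}} := U^H C^{(\alpha,\ell)} V \in K^{r^{\mathrm{HOSVD}}_{\alpha_1}\times r^{\mathrm{HOSVD}}_{\alpha_2}} \qquad (1 \le \ell \le r_\alpha). \tag{11.37e}
\]
According to Lemmata 11.27 and 11.24, the bases and coefficient matrices at the son vertices $\alpha_1, \alpha_2$ transform as follows:
\[
\begin{aligned}
&\mathbf{B}_{\alpha_1} \mapsto \mathbf{B}^{\mathrm{HOSVD}}_{\alpha_1} := \mathbf{B}_{\alpha_1} U \quad\text{and}\quad \mathbf{B}_{\alpha_2} \mapsto \mathbf{B}^{\mathrm{HOSVD}}_{\alpha_2} := \mathbf{B}_{\alpha_2} V, \\
&\mathbf{C}^{\mathrm{HOSVD}}_{\alpha_1} := \mathbf{C}_{\alpha_1} U \quad\text{and}\quad \mathbf{C}^{\mathrm{HOSVD}}_{\alpha_2} := \mathbf{C}_{\alpha_2} V .
\end{aligned} \tag{11.37f}
\]
Again, we write $\mathbf{B}_{\alpha_i}$ and $\mathbf{C}_{\alpha_i}$ instead of $\mathbf{B}^{\mathrm{HOSVD}}_{\alpha_i}$ and $\mathbf{C}^{\mathrm{HOSVD}}_{\alpha_i}$ and redefine $r_{\alpha_i}$ by $r_{\alpha_i} := r^{\mathrm{HOSVD}}_{\alpha_i}$.

The singular values in the diagonal of $\Sigma_{\alpha_1}$ and $\Sigma_{\alpha_2}$ in (11.37d) will become the weights in the definition of $E_\sigma$ for the sons $\sigma$ of $\alpha_i$.

Remark 11.40. The computational work of the steps (11.37a–f) consists of
(a) $(r_{\alpha_1} + r_{\alpha_2})\, r_\alpha r_{\alpha_1} r_{\alpha_2}$ operations for forming $Z_{\alpha_1} Z^H_{\alpha_1}$ and $Z^T_{\alpha_2} \overline{Z_{\alpha_2}}$ and $\tfrac{8}{3}(r^3_{\alpha_1} + r^3_{\alpha_2})$ for the diagonalisation, producing $U, \Sigma_{\alpha_1}, V, \Sigma_{\alpha_2}$,
(b) $2 r_\alpha r^{\mathrm{HOSVD}}_{\alpha_1} r_{\alpha_2} (r_{\alpha_1} + r^{\mathrm{HOSVD}}_{\alpha_2})$ for (11.37e),
(c) $2\bigl(r_{\alpha_1} r^{\mathrm{HOSVD}}_{\alpha_1} r_{\alpha_{11}} r_{\alpha_{12}} + r_{\alpha_2} r^{\mathrm{HOSVD}}_{\alpha_2} r_{\alpha_{21}} r_{\alpha_{22}}\bigr)$ for (11.37f), where $\alpha_{k1}, \alpha_{k2} \in S(\alpha_k)$, provided that $\alpha_k \in T_D\setminus L(T_D)$. Otherwise, if $\alpha_1 = \{j\}$, (11.37f) costs $2(r_j r^{\mathrm{HOSVD}}_j n_j)$, where $n_j = \dim(V_j)$.
Bounding all ranks $r_\gamma$ by $r$ and $n_j$ by $n$, the total asymptotic work is $10r^4$ (if $\alpha_1, \alpha_2 \notin L(T_D)$), $8r^4 + 2r^2 n$ (if one leaf is in $\{\alpha_1, \alpha_2\}$), and $6r^4 + 4r^2 n$ (if $\alpha_1, \alpha_2 \in L(T_D)$).

Exercise 11.41. Let $\{\alpha_1, \alpha_2\} = S(\alpha)$. $Z_{\alpha_1}$ and $Z_{\alpha_2}$ in (11.37a,b) formulated with the transformed matrices $C^{(\alpha,\ell)} = C^{(\alpha,\ell)}_{\mathrm{HOSVD}}$ satisfy
\[
\begin{aligned}
Z_{\alpha_1} Z^H_{\alpha_1} &= \sum_{\ell=1}^{r_\alpha} (\sigma^{(\alpha)}_\ell)^2\, C^{(\alpha,\ell)} C^{(\alpha,\ell)H} = \Sigma^2_{\alpha_1}, \\
Z^T_{\alpha_2} \overline{Z_{\alpha_2}} &= \sum_{\ell=1}^{r_\alpha} (\sigma^{(\alpha)}_\ell)^2\, C^{(\alpha,\ell)T}\, \overline{C^{(\alpha,\ell)}} = \Sigma^2_{\alpha_2} .
\end{aligned} \tag{11.38}
\]
Property (11.38) characterises the coefficients of the HOSVD bases. Since $\Sigma^2_{\alpha_2}$ is a real diagonal matrix, we may change the second line of (11.38) into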
\[
Z^H_{\alpha_2} Z_{\alpha_2} = \sum_{\ell=1}^{r_\alpha} (\sigma^{(\alpha)}_\ell)^2\, C^{(\alpha,\ell)H} C^{(\alpha,\ell)} = \Sigma^2_{\alpha_2} . \tag{11.39}
\]
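The step (11.37a–e) at one vertex and the resulting property (11.38) can be sketched as follows. This is a minimal illustration assuming NumPy, real data of full rank (so that no rank reduction occurs), and coefficient matrices stored as an array `C` of shape `(r_alpha, r1, r2)`; the weights `sigma` are arbitrary positive numbers chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
r_alpha, r1, r2 = 3, 4, 5
C = rng.standard_normal((r_alpha, r1, r2))      # C[l] = C^(alpha,l)
sigma = np.array([3.0, 2.0, 1.0])               # weights sigma_l^(alpha)

# (11.37a): blocks side by side; (11.37b): blocks stacked vertically.
Z1 = np.hstack([sigma[l] * C[l] for l in range(r_alpha)])   # r1 x (r_alpha*r2)
Z2 = np.vstack([sigma[l] * C[l] for l in range(r_alpha)])   # (r_alpha*r1) x r2

# (11.37c): only U, V and the singular values are kept.
U, s1, _ = np.linalg.svd(Z1, full_matrices=False)
_, s2, Vt = np.linalg.svd(Z2, full_matrices=False)
V = Vt.T

# (11.37e): transformed coefficient matrices U^T C[l] V (real case).
C_hosvd = np.einsum("ia,lij,jb->lab", U, C, V)

# Left-hand sides of (11.38), formed with the transformed matrices.
lhs1 = sum(sigma[l] ** 2 * C_hosvd[l] @ C_hosvd[l].T for l in range(r_alpha))
lhs2 = sum(sigma[l] ** 2 * C_hosvd[l].T @ C_hosvd[l] for l in range(r_alpha))
```

In this setting `lhs1` coincides with `np.diag(s1**2)` and `lhs2` with `np.diag(s2**2)`, so the singular values `s1`, `s2` play the role of the weights $\Sigma_{\alpha_1}$, $\Sigma_{\alpha_2}$ at the sons.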
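11.3.3.3 Treatment of the Root

Since the standard choice is $\mathbf{U}_D = \mathrm{span}\{\mathbf{v}\}$, there is only one basis vector $\mathbf{b}^{(D)}_1$. Orthonormality of the basis reduces to the requirement that $\mathbf{b}^{(D)}_1$ is normalised. We may assume that $\mathbf{b}^{(D)}_1 = \mathbf{v}/\|\mathbf{v}\|$. The definition of the weight
\[
\sigma^{(D)}_1 := \|\mathbf{v}\| \tag{11.40a}
\]
coincides with (11.34): $\Sigma^2_D := E_D := c^{(D)} (c^{(D)})^H \in K^{1\times 1}$.

Since $r_D = 1$, the matrices $Z_{\alpha_1}$ and $Z_{\alpha_2}$ in (11.37a,b) coincide. Instead of solving two one-sided SVD, we now determine the reduced both-sided singular-value decomposition of $\sigma^{(D)}_1 C^{(D,1)}$, where $C^{(D,1)} \in K^{r_{\alpha_1}\times r_{\alpha_2}}$ is the coefficient matrix of the vector $\mathbf{b}^{(D)}_1$ and $\alpha_1, \alpha_2$ are the sons of $D$ (cf. (11.20)):
\[
\sigma^{(D)}_1 C^{(D,1)} = U \Sigma V^T \qquad
\begin{cases}
U \in K^{r_{\alpha_1}\times r},\ V \in K^{r_{\alpha_2}\times r} \text{ orthogonal}, \\
\sigma_1 \ge \ldots \ge \sigma_r > 0,\quad r := \mathrm{rank}(C^{(D,1)}), \\
\Sigma = \mathrm{diag}\{\sigma_1, \ldots, \sigma_r\} \in K^{r\times r}.
\end{cases} \tag{11.40b}
\]
The rank $r^{\mathrm{HOSVD}}_{\alpha_k} := r$ may be smaller than $r_{\alpha_k}$ ($k = 1, 2$). The bases at the son vertices $\alpha_1, \alpha_2$ are changed via
\[
\mathbf{B}_{\alpha_1} \mapsto \mathbf{B}^{\mathrm{HOSVD}}_{\alpha_1} := \mathbf{B}_{\alpha_1} U \quad\text{and}\quad \mathbf{B}_{\alpha_2} \mapsto \mathbf{B}^{\mathrm{HOSVD}}_{\alpha_2} := \mathbf{B}_{\alpha_2} V, \tag{11.40c}
\]
i.e., Lemma 11.27 applies with $S^{(\alpha_1)} := U$ and $S^{(\alpha_2)} := V$, and shows that
\[
C^{(D,1)}_{\mathrm{HOSVD}} = \frac{1}{\sigma^{(D)}_1}\, \Sigma = \mathrm{diag}\Bigl\{ \frac{\sigma_1}{\sigma^{(D)}_1}, \ldots, \frac{\sigma_r}{\sigma^{(D)}_1} \Bigr\}. \tag{11.40d}
\]
The size of the bases $\mathbf{B}^{\mathrm{HOSVD}}_{\alpha_i}$ is $r^{\mathrm{HOSVD}}_{\alpha_i} := r = \mathrm{rank}(C^{(D,1)})$. According to Lemma 11.24, the coefficient matrices of the new basis vectors $\mathbf{b}^{(\alpha_i)}_{\ell,\mathrm{HOSVD}}$ are
\[
\mathbf{C}^{\mathrm{HOSVD}}_{\alpha_1} := (I \otimes I \otimes U^T)\, \mathbf{C}_{\alpha_1}, \qquad
\mathbf{C}^{\mathrm{HOSVD}}_{\alpha_2} := (I \otimes I \otimes V^T)\, \mathbf{C}_{\alpha_2}, \tag{11.40e}
\]
i.e., $C^{(\alpha_1,\ell)}_{\mathrm{HOSVD}} := \sum_{k=1}^{r_{\alpha_1}} U_{k\ell}\, C^{(\alpha_1,k)}$ and $C^{(\alpha_2,\ell)}_{\mathrm{HOSVD}} := \sum_{k=1}^{r_{\alpha_2}} V_{k\ell}\, C^{(\alpha_2,k)}$ for $1 \le \ell \le r$. To simplify the notation, we omit the suffix 'HOSVD' and write $r_{\alpha_i}$, $\mathbf{B}_{\alpha_i}$, $\mathbf{C}_{\alpha_i}$ for the new quantities at $\alpha_i$. The newly introduced weights are defined by the singular values of (11.40b):
\[
\Sigma_{\alpha_1} := \Sigma_{\alpha_2} := \mathrm{diag}\{\sigma_1, \ldots, \sigma_r\} \quad\text{with } \sigma_i = \sigma^{(\alpha_1)}_i = \sigma^{(\alpha_2)}_i \text{ in (11.40b)}. \tag{11.40f}
\]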
As mentioned above, the old dimensions $r_{\alpha_i}$ of the subspaces are changed into $r_{\alpha_1} := r_{\alpha_2} := r$.

Remark 11.42. The computational work of (11.40a–e) consists of
(a) $N_{\mathrm{SVD}}(r_{\alpha_1}, r_{\alpha_2})$ for $U, \sigma_i, V$ (cf. (11.40b)),
(b) $2\bigl(r_{\alpha_1} r^{\mathrm{HOSVD}}_{\alpha_1} r_{\alpha_{11}} r_{\alpha_{12}} + r_{\alpha_2} r^{\mathrm{HOSVD}}_{\alpha_2} r_{\alpha_{21}} r_{\alpha_{22}}\bigr)$ for (11.40e), where¹⁵ $\alpha_{k1}, \alpha_{k2} \in S(\alpha_k)$.
Bounding all $r_\gamma$ ($\gamma \in T_D$) by $r$, the total asymptotic work is $4r^4$.

Note that the computation at the root is not different from the general case in §11.3.3.2, but simplified because of $r_D = 1$. If we want the root space $\mathbf{U}_D$ containing a family of tensors, we might fix an orthonormal basis and a suitable weight tuple $\Sigma_D$ and proceed as in §11.3.3.2.

11.3.3.4 HOSVD Computation

First, we want to compute the HOSVD bases at all vertices. The algorithm starts at the root and proceeds to the (fathers of the) leaves. The underlying computational step at vertex $\alpha \in T_D\setminus L(T_D)$ is abbreviated as follows:

    procedure HOSVD(α);   (for α ∈ T_D\L(T_D) with sons α1, α2)            (11.41a)
      transform C^(α,ℓ) (1 ≤ ℓ ≤ r_α)
        according to (11.40d) if α = D, (11.37e) if α ≠ D;
      transform C^(α1,ℓ) (1 ≤ ℓ ≤ r_α1) and C^(α2,ℓ) (1 ≤ ℓ ≤ r_α2)
        according to (11.40e) if α = D, (11.37f) if α ≠ D;
      define the weights Σ_α1 := diag{σ_1^(α1), ..., σ_{r_α1}^(α1)} and
                         Σ_α2 := diag{σ_1^(α2), ..., σ_{r_α2}^(α2)}
        according to (11.40f) if α = D, (11.37c) if α ≠ D,
        with possibly new r_α1, r_α2;

The complete computation of HOSVD bases at all vertices of $T_D$ is performed by the call HOSVD∗(D) of the recursive procedure HOSVD∗(α) defined by

    procedure HOSVD∗(α);                                                    (11.41b)
      if α ∉ L(T_D) then
      begin HOSVD(α); for all sons σ ∈ S(α) do HOSVD∗(σ) end;

The derivation of the algorithm yields the following result.

¹⁵ If $\alpha_1$ (or $\alpha_2$) is a leaf, the numbers change as detailed in Remark 11.40.
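Theorem 11.43. Assume $\mathbf{v} = \rho^{\mathrm{orth}}_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)$. The result of HOSVD∗(D) is
\[
\mathbf{v} = \rho^{\mathrm{HOSVD}}_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}^{\mathrm{HOSVD}}_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}_{\mathrm{HOSVD}}, (B^{\mathrm{HOSVD}}_j)_{j\in D}\bigr).
\]
The implicitly defined bases $\mathbf{B}^{\mathrm{HOSVD}}_\alpha = [\mathbf{b}^{(\alpha)}_{1,\mathrm{HOSVD}}, \ldots, \mathbf{b}^{(\alpha)}_{r_\alpha,\mathrm{HOSVD}}]$ for $\alpha \in T_D$ are the HOSVD bases. The computed tuples $\Sigma_\alpha$ contain the singular values.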
The computational cost for α = D and α ∈ TD \ ({D} ∪ L(TD )) is discussed in the Remarks 11.42 and 11.40. The total cost of HOSVD∗ (D) sums to Xd rj rjHOSVD nj NSVD (rσ1 , rσ2 ) + 2 (11.41c) j=1 X +2 rα rα2 rαHOSVD rα1 + rαHOSVD (rα1 + rαHOSVD ) , 1 2 α∈TD \({D}\∪L(TD ))
where $\{\sigma_1, \sigma_2\} = S(D)$ and $\{\alpha_1, \alpha_2\} = S(\alpha)$. If $r_\alpha \le r$ and $n_j \le n$, the asymptotic cost is $3(d-2)r^4 + 2dr^2 n$.

Remark 11.44. Algorithm (11.41b) uses a recursion over the tree $T_D$. Computations at the sons of a vertex are completely independent. This allows an easy parallelisation. This reduces the computational time by a factor of $d/\log_2 d$.

We can use a similar recursion to obtain the HOSVD basis of a single vertex $\alpha \in T_D$. The algorithm shows that only the predecessors of $\alpha$ are involved.

    procedure HOSVD∗∗(α);                                                   (11.41d)
    begin
      if α ≠ D then
      begin β := father(α);
            if HOSVD not yet installed at β then HOSVD∗∗(β)
      end;
      HOSVD(α)
    end;

We recall that the tree $T_D$ is decomposed in $T^{(\ell)}_D$ for the levels $0 \le \ell \le L$ (cf. (11.7)). The quite general recursion in (11.41b) can be performed levelwise:

    procedure HOSVD-lw(ℓ);                                                  (11.42a)
      for all α ∈ T_D^(ℓ)\L(T_D) do HOSVD(α);

To determine the HOSVD bases on level $\ell$, we may call HOSVD-lw(ℓ), provided that HOSVD bases are already installed on level $\ell - 1$ (or if $\ell = 0$). Otherwise we have to call

    procedure HOSVD∗-lw(ℓ);                                                 (11.42b)
      for λ = 0 to ℓ do HOSVD-lw(λ);
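The control flow of the procedures (11.41b) and (11.41d) can be sketched in a few lines. The following assumes a small hypothetical `Vertex` class for the dimension partition tree; the function `hosvd` is only a placeholder that records the visiting order and does not perform the numerical step (11.41a).

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    """Vertex of a dimension partition tree (illustrative data structure)."""
    name: str
    sons: list = field(default_factory=list)
    father: "Vertex | None" = None
    installed: bool = False     # True once the HOSVD step was performed here

def make(name, *sons):
    """Create a vertex and register it as father of its sons."""
    v = Vertex(name, list(sons))
    for s in sons:
        s.father = v
    return v

log = []

def hosvd(alpha):
    """Placeholder for procedure HOSVD(alpha) in (11.41a); records the call only."""
    log.append(alpha.name)
    alpha.installed = True

def hosvd_star(alpha):
    """(11.41b): root-to-leaves recursion over all non-leaf vertices."""
    if alpha.sons:
        hosvd(alpha)
        for son in alpha.sons:
            hosvd_star(son)

def hosvd_star_star(alpha):
    """(11.41d): install HOSVD at the predecessors of alpha first, then at alpha."""
    if alpha.father is not None and not alpha.father.installed:
        hosvd_star_star(alpha.father)
    hosvd(alpha)

# Balanced tree for D = {1,2,3,4}.
leaves = [make(str(j)) for j in range(1, 5)]
a12, a34 = make("12", leaves[0], leaves[1]), make("34", leaves[2], leaves[3])
root = make("1234", a12, a34)

hosvd_star(root)
order_full = list(log)
```

The calls for the two sons inside `hosvd_star` do not depend on each other, which is the independence that Remark 11.44 refers to; a parallel version would dispatch them concurrently.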
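11.3.4 Tangent Space and Sensitivity

11.3.4.1 Tangent Space

Let $\mathbf{v}(t) = \rho_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}_\alpha + t\mathbf{C}'_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)} + t c'^{(D)}, (B_j + tB'_j)_{j\in D}\bigr)$. The tangent space $\mathcal{T}_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)$ is the sum
\[
\mathcal{T}_{\mathrm{HT},c^{(D)}}(\mathbf{v}) + \sum_{\alpha\in T_D\setminus L(T_D)} \mathcal{T}_{\mathrm{HT},\mathbf{C}_\alpha}(\mathbf{v}) + \sum_{j=1}^{d} \mathcal{T}_{\mathrm{HT},B_j}(\mathbf{v}).
\]

Remark 11.45. (a) Under the standard assumption $r_D = 1$ we have $\mathcal{T}_{\mathrm{HT},c^{(D)}} = \mathrm{span}\{\mathbf{v}\}$.
(b) $\mathcal{T}_{\mathrm{HT},B_j} = \{ (\mathrm{id}_{[j]} \otimes L_j)\, \mathbf{v} : L_j \in L(U_j, V_j) \}$ coincides with $\mathcal{T}_{\mathrm{TS},B_j}(\mathbf{a}, (B_j))$ in Remark 8.38a.
(c) $\mathcal{T}_{\mathrm{HT},\mathbf{C}_\alpha}(\mathbf{v}) = \{ (\mathrm{id}_{\alpha^c} \otimes L_\alpha)\, \mathbf{v} : L_\alpha \in L(\mathbf{U}_\alpha, \mathbf{U}_{\alpha_1} \otimes \mathbf{U}_{\alpha_2}) \}$ with the subspace $\mathbf{U}_\alpha = \mathrm{span}\{\mathbf{b}^{(\alpha)}_i : 1 \le i \le r_\alpha\}$ and $\mathbf{U}_{\alpha_1}$, $\mathbf{U}_{\alpha_2}$ corresponding to the sons of $\alpha$.

11.3.5 Sensitivity

The data of the hierarchical format, which may be subject to perturbations, consist mainly of the coefficients $c^{(\alpha,\ell)}_{ij}$ and the bases $B_j$ at the leaves. Since basis vectors $\mathbf{b}^{(\alpha)}_\ell$ appear only implicitly, perturbations of $\mathbf{b}^{(\alpha)}_\ell$ are caused by perturbations of $c^{(\alpha,\ell)}_{ij}$ (usually, some coefficients $c^{(\alpha,\ell)}_{ij}$ are replaced with zero).

An important tool for the analysis are Gram matrices, which are considered in §11.3.5.1. The error analysis for the general (sub)orthonormal case is given in §11.3.5.2. HOSVD bases are considered in §11.3.5.3.

11.3.5.1 Gram Matrices and Suborthonormal Bases

We recall the definition of a Gram matrix (cf. (2.13)). For any basis $\mathbf{B}_\alpha$ ($\alpha \in T_D$) or $B_j$ ($j \in D$) we set
\[
G(\mathbf{B}_\alpha) := (g^{(\alpha)}_{\nu\mu})_{1\le\nu,\mu\le r_\alpha} \quad\text{with}\quad g^{(\alpha)}_{\nu\mu} := \bigl\langle \mathbf{b}^{(\alpha)}_\mu, \mathbf{b}^{(\alpha)}_\nu \bigr\rangle .
\]
Similarly, the tuple $\mathbf{C}_\alpha := (C^{(\alpha,\ell)})_{1\le\ell\le r_\alpha}$ of coefficient matrices is associated with
\[
G(\mathbf{C}_\alpha) := (g_{\nu\mu})_{1\le\nu,\mu\le r_\alpha} \quad\text{with}\quad g_{\nu\mu} := \bigl\langle C^{(\alpha,\mu)}, C^{(\alpha,\nu)} \bigr\rangle_F . \tag{11.43}
\]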
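First we consider the situation of a vertex $\alpha \in T_D\setminus L(T_D)$ with sons $\alpha_1, \alpha_2$. In the following lemma the data $\mathbf{B}_{\alpha_1}, \mathbf{B}_{\alpha_2}$ are general variables; they may be bases or their perturbations. The crucial question is as to whether the mapping $(\mathbf{B}_{\alpha_1}, \mathbf{B}_{\alpha_2}) \mapsto \mathbf{B}_\alpha$ defined in (11.20) is stable.

Lemma 11.46. Let $\alpha_1, \alpha_2 \in S(\alpha)$. $\mathbf{B}_{\alpha_1} \in (\mathbf{V}_{\alpha_1})^{r_{\alpha_1}}$ and $\mathbf{B}_{\alpha_2} \in (\mathbf{V}_{\alpha_2})^{r_{\alpha_2}}$ are mapped by $\mathbf{b}^{(\alpha)}_\ell = \sum_{i,j} c^{(\alpha,\ell)}_{ij}\, \mathbf{b}^{(\alpha_1)}_i \otimes \mathbf{b}^{(\alpha_2)}_j$ into $\mathbf{B}_\alpha \in (\mathbf{V}_\alpha)^{r_\alpha}$. The related Gram matrices satisfy
\[
\|G(\mathbf{B}_\alpha)\|_2 \le \|G(\mathbf{C}_\alpha)\|_2\, \|G(\mathbf{B}_{\alpha_1})\|_2\, \|G(\mathbf{B}_{\alpha_2})\|_2 . \tag{11.44}
\]

Proof. According to Lemma 2.18, there are coefficients $\xi_\ell \in K$ with $\sum_{\ell=1}^{r_\alpha} |\xi_\ell|^2 = 1$ and $\|G(\mathbf{B}_\alpha)\|_2 = \bigl\| \sum_{\ell=1}^{r_\alpha} \xi_\ell\, \mathbf{b}^{(\alpha)}_\ell \bigr\|_2^2$. Summation over $\ell$ yields $c^{(\alpha)}_{ij} := \sum_{\ell=1}^{r_\alpha} \xi_\ell\, c^{(\alpha,\ell)}_{ij}$ and the matrix $C_\alpha = (c^{(\alpha)}_{ij})$. With this notation and the abbreviations $G_i := G(\mathbf{B}_{\alpha_i})$ for $i = 1, 2$, we continue:
\[
\begin{aligned}
\Bigl\| \sum_{\ell=1}^{r_\alpha} \xi_\ell\, \mathbf{b}^{(\alpha)}_\ell \Bigr\|_2^2
&= \Bigl\| \sum_{i,j} c^{(\alpha)}_{ij}\, \mathbf{b}^{(\alpha_1)}_i \otimes \mathbf{b}^{(\alpha_2)}_j \Bigr\|_2^2
= \Bigl\langle \sum_{i,j} c^{(\alpha)}_{ij}\, \mathbf{b}^{(\alpha_1)}_i \otimes \mathbf{b}^{(\alpha_2)}_j ,\; \sum_{i',j'} c^{(\alpha)}_{i'j'}\, \mathbf{b}^{(\alpha_1)}_{i'} \otimes \mathbf{b}^{(\alpha_2)}_{j'} \Bigr\rangle \\
&= \sum_{i,j} \sum_{i',j'} c^{(\alpha)}_{ij}\, g^{(\alpha_2)}_{j'j}\, \overline{c^{(\alpha)}_{i'j'}}\, g^{(\alpha_1)}_{i'i}
= \mathrm{trace}(C_\alpha G_2^T C_\alpha^H G_1).
\end{aligned}
\]
Set $\hat C := G_1^{1/2} C_\alpha (G_2^{1/2})^T$. Exercise 2.8a allows us to rewrite the trace as follows:
\[
\mathrm{trace}(C_\alpha G_2^T C_\alpha^H G_1) = \mathrm{trace}(\hat C \hat C^H) = \langle \hat C, \hat C \rangle_F = \|\hat C\|_F^2 .
\]
Thanks to Lemma 2.11, we can estimate by¹⁶
\[
\|\hat C\|_F^2 = \bigl\| G_1^{1/2} C_\alpha (G_2^{1/2})^T \bigr\|_F^2 \le \Bigl[ \|G_1^{1/2}\|_2\, \|C_\alpha\|_F\, \|G_2^{1/2}\|_2 \Bigr]^2 = \|G_1\|_2\, \|C_\alpha\|_F^2\, \|G_2\|_2 .
\]
Now we use $C_\alpha = \sum_{\ell=1}^{r_\alpha} \xi_\ell\, C^{(\alpha,\ell)}$ and apply Lemma 2.18 (with the Euclidean scalar product replaced by the Frobenius scalar product):
\[
\|C_\alpha\|_F^2 = \Bigl\| \sum_{\ell=1}^{r_\alpha} \xi_\ell\, C^{(\alpha,\ell)} \Bigr\|_F^2 \le \max_{\sum_\ell |\eta_\ell|^2 = 1} \Bigl\| \sum_{\ell=1}^{r_\alpha} \eta_\ell\, C^{(\alpha,\ell)} \Bigr\|_F^2 = \|G(\mathbf{C}_\alpha)\|_2 .
\]

¹⁶ Here we use that positive semi-definite matrices satisfy $\|G^{1/2}\|^2 = (\rho(G^{1/2}))^2 = \rho(G) = \|G\|$ ($\rho$: spectral radius defined in (4.86)).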
Putting all estimates together, we obtain the desired estimate. □
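Inequality (11.44) can be observed on random data. The sketch below assumes NumPy and real, deliberately non-orthonormal tuples `B1`, `B2`; the tensor product $b_i \otimes b_j$ is realised as a Kronecker product of columns, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
r_alpha, r1, r2 = 3, 4, 5
B1 = rng.standard_normal((7, r1))      # general (non-orthonormal) tuple at alpha_1
B2 = rng.standard_normal((6, r2))      # general tuple at alpha_2
C = rng.standard_normal((r_alpha, r1, r2))

# Column i*r2+j of kron(B1, B2) is b_i^(a1) (x) b_j^(a2), which matches the
# row-major flattening of C[l]; the columns of B_alpha are the vectors b_l^(alpha).
M = C.reshape(r_alpha, r1 * r2).T      # columns are the vectorised C[l]
B_alpha = np.kron(B1, B2) @ M

def gram_norm(B):
    """Spectral norm of the Gram matrix B^H B."""
    return np.linalg.norm(B.conj().T @ B, 2)

lhs = gram_norm(B_alpha)
rhs = gram_norm(M) * gram_norm(B1) * gram_norm(B2)
```

Here `gram_norm(M)` is $\|G(\mathbf{C}_\alpha)\|_2$, and `lhs <= rhs` holds as predicted by the lemma; for orthonormal `B1`, `B2` and Frobenius-orthonormal `C[l]` both sides would equal one.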
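This result deserves some comments.

1) Orthonormality of the basis $\mathbf{B}_{\alpha_i}$ is equivalent to $G(\mathbf{B}_{\alpha_i}) = I$. According to Lemma 11.33, orthonormal matrices $C^{(\alpha,\ell)}$ produce an orthonormal basis $\mathbf{B}_\alpha$, i.e., $G(\mathbf{B}_\alpha) = I$. Under these assumptions, inequality (11.44) takes the form $1 \le 1\cdot 1\cdot 1$.

2) The quantity $\|G(\mathbf{B}_\alpha)\|_2$ is a reasonable one since it is an estimate for all expressions $\bigl\| \sum_{\ell=1}^{r_\alpha} \xi_\ell\, \mathbf{b}^{(\alpha)}_\ell \bigr\|_2^2$ with $\sum_{\ell=1}^{r_\alpha} |\xi_\ell|^2 = 1$ (cf. Lemma 2.18).

3) Starting from the orthonormal setting (i.e., $\|G(\ldots)\|_2 = 1$), we shall see that truncations lead to $\|G(\ldots)\|_2 \le 1$. Therefore errors will not be amplified.

A typical truncation step at vertex $\alpha_1$ omits a vector of the basis, say, $\mathbf{b}^{(\alpha_1)}_{r_{\alpha_1}}$, keeping $\mathbf{B}^{\mathrm{new}}_{\alpha_1} = [\mathbf{b}^{(\alpha_1)}_1 \cdots \mathbf{b}^{(\alpha_1)}_{r_{\alpha_1}-1}]$. Although $\mathbf{B}^{\mathrm{new}}_{\alpha_1}$ and $\mathbf{B}_{\alpha_2}$ represent orthonormal bases, the resulting basis $\mathbf{B}^{\mathrm{new}}_\alpha$ at the father vertex $\alpha$ is no longer orthonormal. $G(\mathbf{B}^{\mathrm{new}}_\alpha)$ corresponds to $G(\mathbf{C}^{\mathrm{new}}_\alpha)$, where $\mathbf{C}^{\mathrm{new}}_\alpha = (C^{(\alpha,\ell)}_{\mathrm{new}})_{1\le\ell\le r_\alpha}$ is obtained by omitting the $r_{\alpha_1}$-th row in $C^{(\alpha,\ell)}$. However, still the inequality $G(\mathbf{B}^{\mathrm{new}}_\alpha) \le I$ can be shown (cf. Exercise 11.50).

Exercise 11.47. Prove $\|\mathbf{b}^{(\alpha)}_\ell\|_2^2 \le \|G(\mathbf{B}_{\alpha_1})\|_2\, \|G(\mathbf{B}_{\alpha_2})\|_2\, \|C^{(\alpha,\ell)}\|_F^2$ for $1 \le \ell \le r_\alpha$.

Exercise 11.48. (a) Prove that $\sqrt{\|G(\mathbf{B} + \mathbf{C})\|_2} \le \sqrt{\|G(\mathbf{B})\|_2} + \sqrt{\|G(\mathbf{C})\|_2}$.
(b) Let $\mathbf{B}, \mathbf{C} \in K^{n\times m}$ be pairwise orthogonal, i.e., $\mathbf{B}^H \mathbf{C} = 0$. Prove that
\[
G(\mathbf{B} + \mathbf{C}) = G(\mathbf{B}) + G(\mathbf{C}), \qquad \|G(\mathbf{B} + \mathbf{C})\|_2 \le \|G(\mathbf{B})\|_2 + \|G(\mathbf{C})\|_2 .
\]

Definition 11.49. An $n$-tuple of linearly independent vectors $x = (x_1, \ldots, x_n)$ is called suborthonormal if the corresponding Gram matrix satisfies
\[
0 < G(x) \le I \qquad\text{(cf. (2.12))}.
\]

Exercise 11.50. Show for any $\mathbf{B} \in K^{n\times m}$ and any orthogonal projection $P \in K^{n\times n}$ that $G(P\mathbf{B}) \le G(\mathbf{B})$. Hint: Use Remark 2.15b with $P = P^H P \le I$.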
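The statement of Exercise 11.50 can be illustrated numerically (this is an experiment, not a proof). The sketch assumes NumPy and real data; the projection is built as $P = QQ^T$ from an orthonormal `Q`, and the ordering $G(P\mathbf{B}) \le G(\mathbf{B})$ is tested through the smallest eigenvalue of the difference.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, k = 8, 4, 5
B = rng.standard_normal((n, m))

# Orthogonal projection P = Q Q^T onto a k-dimensional subspace.
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
P = Q @ Q.T

G_B = B.T @ B
G_PB = (P @ B).T @ (P @ B)

# G(PB) <= G(B) means that G(B) - G(PB) is positive semi-definite.
smallest_eig = np.linalg.eigvalsh(G_B - G_PB).min()
```

Up to rounding, `smallest_eig` is non-negative, in line with the truncation argument above, where omitting basis vectors acts like a projection.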
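11.3.5.2 Orthonormal and Suborthonormal Bases

We now suppose that all bases $\mathbf{B}_\alpha \in (\mathbf{V}_\alpha)^{r_\alpha}$ ($\alpha \in T_D$) are orthonormal or, more generally, suborthonormal. This fact implies $G(\mathbf{B}_\alpha) \le I$, $G(\mathbf{C}_\beta) \le I$ and thus $\|G(\mathbf{B}_\alpha)\|_2, \|G(\mathbf{C}_\beta)\|_2 \le 1$. Perturbations may be caused as follows.

1. At a leaf $j \in D$, the basis $B_j$ may be altered. We may even reduce the dimension by omitting one of the basis vectors (which changes some $b^{(j)}_i$ into 0). Whenever the new basis $B^{\mathrm{new}}_j$ is not orthonormal or the generated subspace has a smaller dimension, the implicitly defined bases $\mathbf{B}^{\mathrm{new}}_\alpha$ with $j \in \alpha$ also lose orthonormality.

2. Let $\alpha \in T_D\setminus L(T_D)$ have sons $\alpha_1, \alpha_2$. Changing $\mathbf{C}_\alpha$, we can rotate the basis $\mathbf{B}_\alpha$ into a new orthonormal basis $\mathbf{B}^{\mathrm{new}}_\alpha$ satisfying the nestedness property $\mathrm{range}(\mathbf{B}_\alpha) \subset \mathrm{range}(\mathbf{B}_{\alpha_1}) \otimes \mathrm{range}(\mathbf{B}_{\alpha_2})$. This does not change $G(\mathbf{B}^{\mathrm{new}}_\beta) = I$ and $G(\mathbf{C}_\beta) = I$ for all predecessors $\beta \in T_D$ (i.e., $\beta \supset \alpha$).

3. We may omit, say, $\mathbf{b}^{(\alpha)}_{r_\alpha}$ from $\mathbf{B}_\alpha \in (\mathbf{V}_\alpha)^{r_\alpha}$ by setting $C^{(\alpha,r_\alpha)}_{\mathrm{new}} := 0$. If $\mathbf{B}_\alpha$ is (sub)orthonormal, $\mathbf{B}^{\mathrm{new}}_\alpha$ is so too. For $\beta \supset \alpha$, $G(\mathbf{B}^{\mathrm{new}}_\beta) \le G(\mathbf{B}_\beta)$ follows; i.e., an orthonormal basis $\mathbf{B}_\beta$ becomes a suborthonormal basis $\mathbf{B}^{\mathrm{new}}_\beta$. The inequality $G(\mathbf{C}^{\mathrm{new}}_\alpha) \le G(\mathbf{C}_\alpha)$ holds.

We consider a general perturbation $\delta\mathbf{B}_\alpha \in (\mathbf{V}_\alpha)^{r_\alpha}$; i.e., the exact basis $\mathbf{B}_\alpha$ is changed into
\[
\mathbf{B}^{\mathrm{new}}_\alpha := \mathbf{B}_\alpha - \delta\mathbf{B}_\alpha
\]
at one vertex $\alpha \in T_D$. Let $\beta \in T_D$ be the father of $\alpha$ such that $\beta_1, \beta_2 \in S(\beta)$ are the sons and, e.g., $\alpha = \beta_1$. A perturbation $\delta\mathbf{B}_\alpha$ causes a change of $\mathbf{B}_\beta$ into $\mathbf{B}^{\mathrm{new}}_\beta = \mathbf{B}_\beta - \delta\mathbf{B}_\beta$. Because of the linear dependence, $\delta\mathbf{b}^{(\beta)}_\ell = \sum_{i,j} c^{(\beta,\ell)}_{ij}\, \delta\mathbf{b}^{(\alpha)}_i \otimes \mathbf{b}^{(\beta_2)}_j$ holds for the columns of $\delta\mathbf{B}_\beta$, $\delta\mathbf{B}_\alpha$, and inequality (11.44) implies that
\[
\|G(\delta\mathbf{B}_\beta)\|_2 \le \|G(\mathbf{C}_\beta)\|_2\, \|G(\delta\mathbf{B}_\alpha)\|_2\, \|G(\mathbf{B}_{\beta_2})\|_2 \le \|G(\delta\mathbf{B}_\alpha)\|_2 .
\]
The inequality $\|G(\delta\mathbf{B}_\beta)\|_2 \le \|G(\delta\mathbf{B}_\alpha)\|_2$ can be continued up to the root $D$ yielding $\|G(\delta\mathbf{B}_D)\|_2 \le \|G(\delta\mathbf{B}_\alpha)\|_2$. However, since $r_D = 1$, $G(\delta\mathbf{B}_D) = \|\delta\mathbf{b}^{(D)}_1\|_2^2$ is a $1\times 1$ matrix. This proves the following result.

Theorem 11.51. Given $\mathbf{v} = \rho^{\mathrm{orth}}_{\mathrm{HT}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)$ as defined in (11.32), consider a perturbation $\delta\mathbf{B}_\alpha$ at some $\alpha \in T_D$ (in particular, $\delta B_j$ for $\alpha = \{j\}$ is of interest). Then $\mathbf{v}$ is changed into $\mathbf{v}^{\mathrm{new}} = \mathbf{v} - \delta\mathbf{v}$ with
\[
\|\delta\mathbf{v}\|_2 \le |c^{(D)}_1|\, \sqrt{\|G(\delta\mathbf{B}_\alpha)\|_2} .
\]
If $\beta$ is the father of $\alpha \in T_D$, the fact that the perturbed basis $\mathbf{B}^{\mathrm{new}}_\alpha = \mathbf{B}_\alpha - \delta\mathbf{B}_\alpha$ is, in general, no longer orthonormal, implies that $\mathbf{B}^{\mathrm{new}}_\beta = \mathbf{B}_\beta - \delta\mathbf{B}_\beta$ also loses orthonormality, although the coefficients $\mathbf{C}_\beta$ still satisfy $G(\mathbf{C}_\beta) = I$.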
Corollary 11.52. Let $A \subset T_D$ be a subset of the size $\#A$ (hence, $\#A \le 2d - 1$). Perturbations $\delta\mathbf{B}_\alpha$ at all $\alpha \in A$ yield the error
\[
\|\delta\mathbf{v}\|_2 \le |c^{(D)}_1|\, \sqrt{\#A}\, \sqrt{\sum_{\alpha\in A} \|G(\delta\mathbf{B}_\alpha)\|_2} + \text{higher-order terms}.
\]

A first perturbation changes some $\mathbf{B}_\beta$ into $\mathbf{B}^{\mathrm{new}}_\beta$. If $\|G(\mathbf{B}^{\mathrm{new}}_\beta)\|_2 \le 1$, i.e., if $\mathbf{B}^{\mathrm{new}}_\beta$ is still suborthonormal, the second perturbation is not amplified. However, $\|G(\mathbf{B}^{\mathrm{new}}_\beta)\|_2$ may become larger than one. Exercise 11.48a allows us to bound the norm by
\[
\|G(\mathbf{B}^{\mathrm{new}}_\beta)\|_2 \le \Bigl( \sqrt{\|G(\mathbf{B}_\beta)\|_2} + \sqrt{\|G(\delta\mathbf{B}_\beta)\|_2} \Bigr)^2
= \Bigl( 1 + \sqrt{\|G(\delta\mathbf{B}_\beta)\|_2} \Bigr)^2
= 1 + O\Bigl( \sqrt{\|G(\delta\mathbf{B}_\beta)\|_2} \Bigr).
\]
If this factor is not compensated by other factors smaller than 1, higher-order terms appear in the estimate.

So far, we have considered general perturbations $\delta\mathbf{B}_\alpha$, leading to perturbations $\delta\mathbf{v}_\alpha$ of $\mathbf{v}$. If the perturbations $\delta\mathbf{v}_\alpha$ are orthogonal, we can derive a better estimate of $\|G(\cdot)\|_2$ (cf. Exercise 11.48b). In connection with the HOSVD truncations discussed in §11.4.2 we shall continue the error analysis.

11.3.5.3 HOSVD Bases

Since HOSVD bases are particular orthonormal bases, the results of §11.3.5.2 are still valid. However, now the weights $\Sigma_\alpha$ in (11.40f) and (11.37d) enable a weighted norm of the error, which turns out to be optimal for our purpose.

First, we assume that HOSVD bases are installed at all vertices $\alpha \in T_D$. The coefficient matrices $C^{(\alpha,\ell)}$ together with the weights $\Sigma_{\alpha_i} = \mathrm{diag}\{\sigma^{(\alpha_i)}_1, \ldots, \sigma^{(\alpha_i)}_{r_{\alpha_i}}\}$ ($i = 1, 2$) at the son vertices $\{\alpha_1, \alpha_2\} = S(\alpha)$ satisfy (11.38) (cf. Exercise 11.41).

A perturbation $\delta\mathbf{B}_\tau$ of the basis $\mathbf{B}_\tau$ is described by $\delta\mathbf{B}_\tau = [\delta\mathbf{b}^{(\tau)}_1, \ldots, \delta\mathbf{b}^{(\tau)}_{r_\tau}]$. The weights $\sigma^{(\tau)}_i$ from $\Sigma_\tau$ are used for the formulation of the error:
\[
\varepsilon_\tau := \sqrt{\sum_{i=1}^{r_\tau} \bigl( \sigma^{(\tau)}_i\, \|\delta\mathbf{b}^{(\tau)}_i\| \bigr)^2} \qquad (\tau \in T_D). \tag{11.45}
\]
The next proposition describes the error transport from the son $\alpha_1$ to the father $\alpha$.
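Proposition 11.53. Suppose that the basis $\mathbf{B}_\alpha$ satisfies the first HOSVD property (11.38) for $\alpha_1 \in S(\alpha)$. Let $\delta\mathbf{B}_{\alpha_1}$ be a perturbation¹⁷ of $\mathbf{B}_{\alpha_1} = [\mathbf{b}^{(\alpha_1)}_1, \ldots, \mathbf{b}^{(\alpha_1)}_{r_{\alpha_1}}]$ into $\mathbf{B}_{\alpha_1} - \delta\mathbf{B}_{\alpha_1}$. The perturbation is measured by $\varepsilon_{\alpha_1}$ in (11.45). The (exact) basis $\mathbf{B}_\alpha = [\mathbf{b}^{(\alpha)}_1, \ldots, \mathbf{b}^{(\alpha)}_{r_\alpha}]$ with $\mathbf{b}^{(\alpha)}_\ell = \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c^{(\alpha,\ell)}_{ij}\, \mathbf{b}^{(\alpha_1)}_i \otimes \mathbf{b}^{(\alpha_2)}_j$ is perturbed into $\mathbf{B}_\alpha - \delta\mathbf{B}_\alpha$ with
\[
\delta\mathbf{B}_\alpha = \bigl[ \delta\mathbf{b}^{(\alpha)}_1, \ldots, \delta\mathbf{b}^{(\alpha)}_{r_\alpha} \bigr]
\quad\text{and}\quad
\delta\mathbf{b}^{(\alpha)}_\ell = \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c^{(\alpha,\ell)}_{ij}\, \delta\mathbf{b}^{(\alpha_1)}_i \otimes \mathbf{b}^{(\alpha_2)}_j .
\]
Then the perturbations $\delta\mathbf{b}^{(\alpha)}_\ell$ lead to an error of equal size:
\[
\varepsilon_\alpha := \sqrt{\sum_{\ell=1}^{r_\alpha} \bigl( \sigma^{(\alpha)}_\ell\, \|\delta\mathbf{b}^{(\alpha)}_\ell\| \bigr)^2} = \varepsilon_{\alpha_1} .
\]

Proof. By orthonormality of the basis $\mathbf{B}_{\alpha_2}$ we have
\[
\begin{aligned}
\|\delta\mathbf{b}^{(\alpha)}_\ell\|^2
&= \Bigl\| \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c^{(\alpha,\ell)}_{ij}\, \delta\mathbf{b}^{(\alpha_1)}_i \otimes \mathbf{b}^{(\alpha_2)}_j \Bigr\|^2
= \sum_{j=1}^{r_{\alpha_2}} \Bigl\| \sum_{i} c^{(\alpha,\ell)}_{ij}\, \delta\mathbf{b}^{(\alpha_1)}_i \Bigr\|^2 \\
&= \sum_{j=1}^{r_{\alpha_2}} \Bigl\langle \sum_{i} c^{(\alpha,\ell)}_{ij}\, \delta\mathbf{b}^{(\alpha_1)}_i ,\; \sum_{i'} c^{(\alpha,\ell)}_{i'j}\, \delta\mathbf{b}^{(\alpha_1)}_{i'} \Bigr\rangle
= \sum_{i,i',j} c^{(\alpha,\ell)}_{ij}\, \overline{c^{(\alpha,\ell)}_{i'j}}\, \bigl\langle \delta\mathbf{b}^{(\alpha_1)}_i, \delta\mathbf{b}^{(\alpha_1)}_{i'} \bigr\rangle .
\end{aligned}
\]
The matrix $G = G(\delta\mathbf{B}_{\alpha_1}) \in K^{r_{\alpha_1}\times r_{\alpha_1}}$ has the entries $G_{i'i} := \langle \delta\mathbf{b}^{(\alpha_1)}_i, \delta\mathbf{b}^{(\alpha_1)}_{i'} \rangle$. The sum over $i, i', j$ from above is equal to
\[
\mathrm{trace}\bigl( (C^{(\alpha,\ell)})^H G\, C^{(\alpha,\ell)} \bigr) = \mathrm{trace}\bigl( G^{1/2} C^{(\alpha,\ell)} (C^{(\alpha,\ell)})^H G^{1/2} \bigr)
\]
(cf. Exercise 2.8a). From (11.38) we derive
\[
\begin{aligned}
\sum_{\ell=1}^{r_\alpha} (\sigma^{(\alpha)}_\ell)^2\, \|\delta\mathbf{b}^{(\alpha)}_\ell\|^2
&= \sum_{\ell=1}^{r_\alpha} (\sigma^{(\alpha)}_\ell)^2\, \mathrm{trace}\bigl( G^{1/2} C^{(\alpha,\ell)} (C^{(\alpha,\ell)})^H G^{1/2} \bigr) \\
&= \mathrm{trace}\Bigl( G^{1/2} \Bigl[ \sum_{\ell=1}^{r_\alpha} (\sigma^{(\alpha)}_\ell)^2\, C^{(\alpha,\ell)} (C^{(\alpha,\ell)})^H \Bigr] G^{1/2} \Bigr)
= \mathrm{trace}\bigl( G^{1/2} \Sigma^2_{\alpha_1} G^{1/2} \bigr) \\
&= \mathrm{trace}( \Sigma_{\alpha_1} G\, \Sigma_{\alpha_1} )
= \sum_{i=1}^{r_{\alpha_1}} \bigl( \sigma^{(\alpha_1)}_i\, \|\delta\mathbf{b}^{(\alpha_1)}_i\| \bigr)^2 = \varepsilon^2_{\alpha_1},
\end{aligned}
\]
concluding the proof. □

¹⁷ This and following statements are formulated for the first son $\alpha_1$. The corresponding result for $\alpha_2$ is completely analogous.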
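The equality $\varepsilon_\alpha = \varepsilon_{\alpha_1}$ can be reproduced on random data. The sketch below assumes NumPy and the real case. The first line of (11.38) is enforced by rotating random coefficient matrices with the factor `U` from an SVD, the basis at $\alpha_2$ is orthonormal, and the perturbation `dB1` as well as all sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
r_alpha, r1, r2, n1, n2 = 3, 4, 5, 7, 6
sigma = np.array([3.0, 2.0, 1.0])                      # weights at alpha
C = rng.standard_normal((r_alpha, r1, r2))

# Enforce the first line of (11.38): rotate with U from the SVD of Z_{alpha_1}.
Z1 = np.hstack([sigma[l] * C[l] for l in range(r_alpha)])
U, s1, _ = np.linalg.svd(Z1, full_matrices=False)      # s1: weights at alpha_1
C = np.einsum("ia,lij->laj", U, C)                     # C[l] <- U^T C[l]

B2, _ = np.linalg.qr(rng.standard_normal((n2, r2)))    # orthonormal basis at alpha_2
dB1 = 1e-3 * rng.standard_normal((n1, r1))             # perturbation of B_{alpha_1}

# delta b_l^(alpha), stored as n1 x n2 matrix: dB1 @ C[l] @ B2^T.
dB_alpha = [dB1 @ C[l] @ B2.T for l in range(r_alpha)]

# Weighted errors (11.45) at the son alpha_1 and at the father alpha.
eps_alpha1 = np.sqrt(sum((s1[i] * np.linalg.norm(dB1[:, i])) ** 2 for i in range(r1)))
eps_alpha = np.sqrt(sum((sigma[l] * np.linalg.norm(dB_alpha[l])) ** 2
                        for l in range(r_alpha)))
```

Both quantities agree up to rounding. If `B2` were only suborthonormal, one would expect `eps_alpha <= eps_alpha1` instead of equality.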
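The HOSVD condition (11.38) can be weakened:
\[
\sum_{\ell=1}^{r_\alpha} (\sigma^{(\alpha)}_\ell)^2\, C^{(\alpha,\ell)} (C^{(\alpha,\ell)})^H \le \Sigma^2_{\alpha_1}, \qquad
\sum_{\ell=1}^{r_\alpha} (\sigma^{(\alpha)}_\ell)^2\, (C^{(\alpha,\ell)})^H C^{(\alpha,\ell)} \le \Sigma^2_{\alpha_2} \tag{11.46}
\]
for $\alpha_1, \alpha_2 \in S(\alpha)$. Furthermore, the basis $\mathbf{B}_\tau = [\mathbf{b}^{(\tau)}_1, \ldots, \mathbf{b}^{(\tau)}_{r_\tau}]$ ($\tau \in T_D$) may be suborthonormal; i.e., the Gram matrix $G(\mathbf{B}_\tau) := \mathbf{B}_\tau^H \mathbf{B}_\tau$ satisfies
\[
G(\mathbf{B}_\tau) \le I \qquad (\tau \in T_D). \tag{11.47}
\]
The statement of Proposition 11.53 can be generalised for the weak HOSVD condition.

Corollary 11.54. Suppose that the basis $\mathbf{B}_\alpha$ satisfies the first inequality in (11.46) for $\alpha_1 \in S(\alpha)$. Let $\delta\mathbf{B}_{\alpha_1}$ be a perturbation of $\mathbf{B}_{\alpha_1}$ measured by $\varepsilon_{\alpha_1}$ in (11.45). The basis $\mathbf{B}_{\alpha_2}$ at the other son $\alpha_2 \in S(\alpha)$ may be suborthonormal (cf. (11.47)). Then the perturbations $\delta\mathbf{b}^{(\alpha)}_\ell$ are estimated by
\[
\varepsilon_\alpha := \sqrt{\sum_{\ell=1}^{r_\alpha} \bigl( \sigma^{(\alpha)}_\ell\, \|\delta\mathbf{b}^{(\alpha)}_\ell\| \bigr)^2} \le \varepsilon_{\alpha_1} .
\]

Proof. (i) We have $\|\delta\mathbf{b}^{(\alpha)}_\ell\|^2 = \bigl\| \sum_{j=1}^{r_{\alpha_2}} \mathbf{c}_j \otimes \mathbf{b}^{(\alpha_2)}_j \bigr\|^2$ for $\mathbf{c}_j := \sum_{i=1}^{r_{\alpha_1}} c^{(\alpha,\ell)}_{ij}\, \delta\mathbf{b}^{(\alpha_1)}_i$. Using the Gram matrices $C := G(\mathbf{c})$ and $G := G(\mathbf{B}_{\alpha_2})$, the identity
\[
\Bigl\| \sum_j \mathbf{c}_j \otimes \mathbf{b}^{(\alpha_2)}_j \Bigr\|^2
= \sum_{j,k} \langle \mathbf{c}_j, \mathbf{c}_k \rangle\, \bigl\langle \mathbf{b}^{(\alpha_2)}_j, \mathbf{b}^{(\alpha_2)}_k \bigr\rangle
\underset{C = C^H}{=} \mathrm{trace}(CG) = \mathrm{trace}(C^{1/2} G\, C^{1/2})
\]
holds. Applying (11.47) and Remark 2.15d, we can use the inequality
\[
\mathrm{trace}(C^{1/2} G\, C^{1/2}) \le \mathrm{trace}(C^{1/2} C^{1/2}) = \mathrm{trace}(C) = \sum_j \langle \mathbf{c}_j, \mathbf{c}_j \rangle
= \sum_{j=1}^{r_{\alpha_2}} \Bigl\| \sum_{i=1}^{r_{\alpha_1}} c^{(\alpha,\ell)}_{ij}\, \delta\mathbf{b}^{(\alpha_1)}_i \Bigr\|^2
\]
to continue with the estimates in the proof of Proposition 11.53. In the last lines of the proof the equality $\mathrm{trace}\{\ldots\} = \mathrm{trace}\bigl(G^{1/2} \Sigma^2_{\alpha_1} G^{1/2}\bigr)$ must be replaced with the inequality $\mathrm{trace}\{\ldots\} \le \mathrm{trace}\bigl(G^{1/2} \Sigma^2_{\alpha_1} G^{1/2}\bigr)$. □

Theorem 11.55. Suppose $\mathbf{v} \in \mathcal{H}_r$. Assume that all $\mathbf{B}_\tau$, $\tau \in T_D$, are weak HOSVD bases in the sense of (11.47) and that all coefficient matrices $C^{(\alpha,\ell)}$ together with the weights $\Sigma_\alpha$ satisfy (11.46). Then a perturbation of the basis at vertex $\alpha \in T_D$ by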
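\[
\varepsilon_\alpha := \sqrt{\sum_{\ell=1}^{r_\alpha} \bigl( \sigma^{(\alpha)}_\ell\, \|\delta\mathbf{b}^{(\alpha)}_\ell\| \bigr)^2}
\]
leads to an absolute error of $\mathbf{v}$ by $\|\delta\mathbf{v}\| \le \varepsilon_\alpha$.

Proof. Corollary 11.54 shows that the same kind of error at the father vertex does not increase. By induction, we obtain $\varepsilon_D = \sqrt{\sum_{\ell=1}^{r_D} \bigl( \sigma^{(D)}_\ell\, \|\delta\mathbf{b}^{(D)}_\ell\| \bigr)^2} \le \varepsilon_\alpha$ at the root $D \in T_D$. Since $r_D = 1$ and $\sigma^{(D)}_1 = \|\mathbf{v}\|$, it follows that $\varepsilon_D = \|\mathbf{v}\|\, \|\delta\mathbf{b}^{(D)}_1\|$. On the other hand, $\mathbf{v} = c^{(D)}_1 \mathbf{b}^{(D)}_1$ with $\|\mathbf{v}\| = |c^{(D)}_1|$ proves that the perturbation is $\|\delta\mathbf{v}\| = \|c^{(D)}_1 \delta\mathbf{b}^{(D)}_1\| = \|\mathbf{v}\|\, \|\delta\mathbf{b}^{(D)}_1\|$. □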
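11.3.6 Conversion from $\mathcal{R}_r$ to $\mathcal{H}_r$ Revisited

In §11.2.4.2 the conversion from r-term representation $\mathcal{R}_r$ into hierarchical format $\mathcal{H}_r$ has been discussed on the level of subspaces. Now we consider the choice for bases. The input tensor is
\[
\mathbf{v} = \sum_{i=1}^{r} \bigotimes_{j\in D} v^{(j)}_i \quad\text{with } v^{(j)}_i \in V_j \text{ for } j \in D. \tag{11.48a}
\]
Let $T_D$ be a suitable dimension partition tree for $D$. The easiest approach is to choose
\[
\mathbf{b}^{(\alpha)}_i := \bigotimes_{j\in\alpha} v^{(j)}_i \qquad (1 \le i \le r,\ \alpha \in T_D\setminus\{D\}) \tag{11.48b}
\]
as a frame (cf. (11.16)). Note that $r_\alpha = r$ for all $\alpha \in T_D\setminus\{D\}$. Because
\[
\mathbf{b}^{(\alpha)}_i = \mathbf{b}^{(\alpha_1)}_i \otimes \mathbf{b}^{(\alpha_2)}_i \qquad (1 \le i \le r,\ \alpha_1, \alpha_2 \text{ sons of } \alpha), \tag{11.48c}
\]
the coefficient matrices are of extremely sparse form:
\[
c^{(\alpha,\ell)}_{ij} = \begin{cases} 1 & \text{if } \ell = i = j, \\ 0 & \text{otherwise}, \end{cases} \qquad 1 \le i, j, \ell \le r,\ \alpha \in T_D\setminus\{D\}. \tag{11.48d}
\]
Only for $\alpha = D$, is the definition $\mathbf{b}^{(D)}_1 = \sum_{i=1}^{r} \mathbf{b}^{(\alpha_1)}_i \otimes \mathbf{b}^{(\alpha_2)}_i$ used to ensure $r_D = 1$, i.e., $C^{(D,1)} = I$. In particular, the matrices $C^{(\alpha,\ell)}$ are sparse, diagonal, and of rank one for $\alpha \ne D$.

Now we denote the basis vectors $\mathbf{b}^{(\alpha)}_i$ in (11.48b,c) by $\mathbf{b}^{(\alpha)}_{i,\mathrm{old}}$. Below we construct an orthonormal hierarchical representation.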
Step 1 (orthogonalisation at the leaves): By QR or Gram matrix techniques we (j) (j) can obtain an orthonormal bases {bi,new : 1 ≤ i ≤ rj } of span{vi : 1 ≤ i ≤ r} (j)
(j)
with rj ≤ r (cf. Lemma 8.16). Set Bj,new = [b1,new , . . . , brj ,new ]. Then there is a transformation matrix T (j) with Bjold
=
(j) [v1 , . . . , vr(j) ]
=
Bjnew T (j) ,
i.e.,
(j) v`
=
rj X
(j) (j)
ti` bi,new (1 ≤ ` ≤ r) .
i=1
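Step 2 (non-leaf vertices): The sons of $\alpha \in T_D\setminus\mathcal{L}(T_D)$ are denoted by $\alpha_1$ and $\alpha_2$. By induction, we have $\mathbf{b}_{\ell,\mathrm{old}}^{(\alpha_\nu)} = \sum_i t_{i\ell}^{(\alpha_\nu)}\, \mathbf{b}_{i,\mathrm{new}}^{(\alpha_\nu)}$ ($\nu = 1, 2$). From (11.48c) we conclude that

\[ \mathbf{b}_{\ell,\mathrm{old}}^{(\alpha)} = \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} t_{i\ell}^{(\alpha_1)}\, t_{j\ell}^{(\alpha_2)}\; \mathbf{b}_{i,\mathrm{new}}^{(\alpha_1)} \otimes \mathbf{b}_{j,\mathrm{new}}^{(\alpha_2)} \qquad (1 \le \ell \le r) \]

with the coefficient matrix

\[ C_{\mathrm{old}}^{(\alpha,\ell)} = \bigl(c_{ij}^{(\alpha,\ell)}\bigr), \qquad c_{ij}^{(\alpha,\ell)} := t_{i\ell}^{(\alpha_1)}\, t_{j\ell}^{(\alpha_2)} \tag{11.49} \]

($1 \le \ell \le r$, $1 \le i \le r_{\alpha_1}$, $1 \le j \le r_{\alpha_2}$). The identity $C_{\mathrm{old}}^{(\alpha,\ell)} = a b^{\mathsf{T}}$ for the vectors $a = T_{\bullet,\ell}^{(\alpha_1)}$, $b = T_{\bullet,\ell}^{(\alpha_2)}$ proves the first statement of the next remark.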
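Remark 11.56. (a) The matrices $C_{\mathrm{old}}^{(\alpha,\ell)}$ in (11.49) are of rank 1.

(b) The Gram matrix $G_\alpha = G(\mathbf{C}_\alpha^{\mathrm{old}})$ has coefficients $g_{m\ell}^{(\alpha)} = \langle C_{\mathrm{old}}^{(\alpha,\ell)}, C_{\mathrm{old}}^{(\alpha,m)} \rangle_{\mathsf{F}}$ (cf. (11.31)) of the form

\[ g_{m\ell}^{(\alpha)} = \Biggl( \sum_{i=1}^{r_{\alpha_1}} t_{i\ell}^{(\alpha_1)}\, \overline{t_{im}^{(\alpha_1)}} \Biggr) \Biggl( \sum_{j=1}^{r_{\alpha_2}} t_{j\ell}^{(\alpha_2)}\, \overline{t_{jm}^{(\alpha_2)}} \Biggr). \]

Therefore the computational cost for orthonormalisation is $2 r_\alpha^2 (r_{\alpha_1} + r_{\alpha_2})$ (instead of $2 r_\alpha^2 r_{\alpha_1} r_{\alpha_2}$ as stated in Lemma 11.33 for the general case).

(c) The tensors $\mathbf{C}_\alpha^{\mathrm{old}}$ and $\mathbf{C}_\alpha^{\mathrm{new}}$ as defined in Remark 11.22 satisfy $\operatorname{rank}(\mathbf{C}_\alpha^{\mathrm{old}}) = \operatorname{rank}(\mathbf{C}_\alpha^{\mathrm{new}}) \le r$.

Proof of (c). $\operatorname{rank}(\mathbf{C}_\alpha^{\mathrm{old/new}}) \le r$ follows from Exercise 3.28. Bijective transformations do not change the tensor rank (cf. Lemmata 11.24, 11.26). □

Using $G_\alpha$, we can construct an orthonormal basis $\{\mathbf{b}_\ell^{(\alpha)} : 1 \le \ell \le r_\alpha\}$, leading to $\mathbf{B}_\alpha^{\mathrm{old}} = \mathbf{B}_\alpha T^{(\alpha)}$ and an updated $C^{(\alpha,\ell)}$.

Step 3 (root): For $\alpha = D$ choose $r_D = 1$ with obvious modifications of Step 2.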
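The rank-one structure (11.49) and the product form of the Gram matrix in Remark 11.56b can be checked numerically. The following is a minimal sketch for real-valued data; the sizes and the random matrices `T1`, `T2` standing in for $T^{(\alpha_1)}$, $T^{(\alpha_2)}$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
r, r1, r2 = 4, 3, 2
T1 = rng.standard_normal((r1, r))   # stands in for T^{(alpha_1)}
T2 = rng.standard_normal((r2, r))   # stands in for T^{(alpha_2)}

# C_old^{(alpha,l)} = a b^T with a = T1[:, l], b = T2[:, l], cf. (11.49)
C_old = np.stack([np.outer(T1[:, l], T2[:, l]) for l in range(r)])

# Direct Gram matrix: g[m, l] = <C^{(l)}, C^{(m)}>_F, cost ~ r^2 * r1 * r2
G_direct = np.einsum('lij,mij->ml', C_old, C_old)

# Product form: entrywise product of two small Gram matrices,
# cost ~ r^2 * (r1 + r2)
G_fast = (T1.T @ T1) * (T2.T @ T2)
```

The entrywise (Hadamard) product in `G_fast` is exactly the product of the two sums in the remark, which is where the cheaper operation count comes from.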
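11.4 Approximations in $\mathcal{H}_{\mathbf{r}}$

11.4.1 Best Approximation in $\mathcal{H}_{\mathbf{r}}$

11.4.1.1 Existence

Lemma 8.6 has the following counterpart for $\mathcal{H}_{\mathbf{r}}$.

Lemma 11.57. Let $\mathbf{V} = {}_{\|\cdot\|}\bigotimes_{j=1}^{d} V_j$ be a Banach tensor space satisfying (6.16). Then the subset $\mathcal{H}_{\mathbf{r}} \subset \mathbf{V}$ is weakly closed.

Proof. Let $\mathbf{v}^{(\nu)} \in \mathcal{H}_{\mathbf{r}}$ be weakly convergent: $\mathbf{v}^{(\nu)} \rightharpoonup \mathbf{v} \in \mathbf{V}$. Because $\mathbf{v}^{(\nu)} \in \mathcal{H}_{\mathbf{r}}$, we know that $\dim(\mathbf{U}_\alpha^{\min}(\mathbf{v}^{(\nu)})) \le r_\alpha$. By Theorem 6.29, $\dim(\mathbf{U}_\alpha^{\min}(\mathbf{v})) \le r_\alpha$ follows. The nestedness property $\mathbf{U}_\alpha^{\min}(\mathbf{v}) \subset \mathbf{U}_{\alpha_1}^{\min}(\mathbf{v}) \otimes \mathbf{U}_{\alpha_2}^{\min}(\mathbf{v})$ follows from (6.10). Hence $\mathbf{v} \in \mathcal{H}_{\mathbf{r}}$ is proved. □

The minimisation problem for $\mathcal{H}_{\mathbf{r}}$ reads as follows:

\[ \text{Given } \mathbf{v} \in {}_{\|\cdot\|}\bigotimes_{j=1}^{d} V_j \text{ and } \mathbf{r} = (r_\alpha)_{\alpha\in T_D} \in \mathbb{N}^{T_D}, \text{ determine } \mathbf{u} \in \mathcal{H}_{\mathbf{r}} \text{ minimising } \|\mathbf{v} - \mathbf{u}\|. \tag{11.50} \]

The supposition of the next theorem is, in particular, satisfied if $\dim(V_j) < \infty$.

Theorem 11.58. Suppose that the Banach space $\mathbf{V} = {}_{\|\cdot\|}\bigotimes_{j=1}^{d} V_j$ is reflexive with (6.16). Then Problem (11.50) has a solution for all $\mathbf{v} \in \mathbf{V}$; i.e., for given representation ranks $\mathbf{r} = (r_\alpha)_{\alpha\in T_D}$, there is a minimiser $\mathbf{u}_{\mathrm{best}} \in \mathcal{H}_{\mathbf{r}} \subset \mathbf{V}$ of the optimisation problem

\[ \|\mathbf{v} - \mathbf{u}_{\mathrm{best}}\| = \inf_{\mathbf{u}\in\mathcal{H}_{\mathbf{r}}} \|\mathbf{v} - \mathbf{u}\|. \]

Proof. By Lemma 11.57, $\mathcal{H}_{\mathbf{r}}$ is weakly closed. Thus, Theorem 4.33 proves the existence of a minimiser. □

In the rest of this chapter we assume that $\mathbf{V}$ is a Hilbert tensor space with induced scalar product. Since

\[ \mathbf{u}_{\mathrm{best}} \in \bigotimes_{j=1}^{d} U_j^{\min}(\mathbf{v}), \]

the statements from the second part of Lemma 10.10 are still valid. These are also true for the truncation results in §11.4.2 (cf. [144]).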
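11.4.1.2 ALS Method

As in the case of the tensor subspace format, we can improve the approximation iteratively. Each iteration contains a loop over all vertices of $T_D\setminus\{D\}$. The action at $\alpha \in T_D\setminus\{D\}$ is in principle as follows. Let $\mathbf{v} \in \mathbf{V}$ be the given tensor and $\mathbf{u} = \rho_{\mathrm{HT}}^{\mathrm{orth}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus\mathcal{L}(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)$ the present representation in $\mathcal{H}_{\mathbf{r}}$.

If $\alpha = \{k\} \in \mathcal{L}(T_D)$, $\mathbf{u}$ is replaced with $\mathbf{u}_{\mathrm{new}}$ which is the minimiser of

\[ \Bigl\{ \bigl\| \mathbf{v} - \rho_{\mathrm{HT}}^{\mathrm{orth}}\bigl(T_D, (\mathbf{C}_\alpha), c^{(D)}, (B_j)_{j\in D}\bigr) \bigr\| : B_k \in \mathbb{K}^{n_k\times r_k} \text{ with } B_k^{\mathsf{H}} B_k = I \Bigr\}, \]

i.e., a new $r_k$-dimensional subspace $U_k^{\mathrm{new}} \subset V_k$ is optimally chosen and represented by an orthonormal basis $B_k^{\mathrm{new}}$. Replacing the previous basis $B_k$ in the representation $\mathbf{u} = \rho_{\mathrm{HT}}^{\mathrm{orth}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus\mathcal{L}(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)$ by $B_k^{\mathrm{new}}$, we obtain the representation of $\mathbf{u}_{\mathrm{new}}$. Note that the change from $B_k$ to $B_k^{\mathrm{new}}$ corresponds to a unitary mapping $A_k$. By Proposition 11.36, $\mathbf{u}_{\mathrm{new}} = (A_k \otimes \mathbf{A}_{[k]})\mathbf{u}$ holds with $\mathbf{A}_{[k]} = \mathbf{I}$.

If $\alpha \in T_D\setminus\mathcal{L}(T_D)$, a new $r_\alpha$-dimensional subspace $\mathbf{U}_\alpha^{\mathrm{new}} \subset \mathbf{V}_\alpha$ has to be determined. Since $\mathbf{U}_\alpha = \operatorname{range}(\mathbf{B}_\alpha)$, we have to minimise over all

\[ \mathbf{B}_\alpha \in (\mathbf{V}_\alpha)^{r_\alpha} \quad\text{with}\quad \mathbf{B}_\alpha^{\mathsf{H}}\mathbf{B}_\alpha = I. \]

Since the basis $\mathbf{B}_\alpha$ does not appear explicitly in the representation $\rho_{\mathrm{HT}}^{\mathrm{orth}}(\ldots)$, we must use $\mathbf{C}_\alpha$ instead. Note that $\mathbf{B}_\alpha^{\mathsf{H}}\mathbf{B}_\alpha = I$ holds if and only if $G(\mathbf{C}_\alpha) = I$ holds for the Gram matrix $G(\mathbf{C}_\alpha)$ (cf. (11.43)). Hence the previous approximation $\mathbf{u} = \rho_{\mathrm{HT}}^{\mathrm{orth}}\bigl(T_D, (\mathbf{C}_\beta)_{\beta\in T_D\setminus\mathcal{L}(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)$ is replaced with $\mathbf{u}_{\mathrm{new}}$ which is the minimiser of

\[ \Bigl\{ \bigl\| \mathbf{v} - \rho_{\mathrm{HT}}^{\mathrm{orth}}\bigl(T_D, (\mathbf{C}_\beta)_{\beta\in T_D\setminus\mathcal{L}(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr) \bigr\| : \mathbf{C}_\alpha \text{ with } G(\mathbf{C}_\alpha) = I \Bigr\}. \]

$\mathbf{C}_\alpha^{\mathrm{new}} = \bigl(C_{\mathrm{new}}^{(\alpha,\ell)}\bigr)_{\ell=1}^{r_\alpha}$ defines the basis $\mathbf{B}_\alpha^{\mathrm{new}} = [\mathbf{b}_{1,\mathrm{new}}^{(\alpha)}, \ldots, \mathbf{b}_{r_\alpha,\mathrm{new}}^{(\alpha)}]$ by (11.20). If $\mathbf{A}_\alpha : \operatorname{range}(\mathbf{B}_\alpha) \to \operatorname{range}(\mathbf{B}_\alpha^{\mathrm{new}})$ with $\mathbf{A}_\alpha \mathbf{b}_i^{(\alpha)} = \mathbf{b}_{i,\mathrm{new}}^{(\alpha)}$ is a unitary mapping, the transition from $\mathbf{C}_\alpha$ to $\mathbf{C}_\alpha^{\mathrm{new}}$ produces $\mathbf{u}_{\mathrm{new}} = (\mathbf{A}_\alpha \otimes \mathbf{A}_{\alpha^c})\mathbf{u}$ with $\mathbf{A}_{\alpha^c} = \mathbf{I}$ (see second part of §11.3.2.4).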
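11.4.2 HOSVD Truncation to $\mathcal{H}_{\mathbf{r}}$

As for the tensor subspace format in §10.1, the higher-order singular-value decomposition can be used to project a tensor into $\mathcal{H}_{\mathbf{r}}$. In the case of the tensor subspace format $\mathcal{T}_{\mathbf{r}}$, we have discussed two versions: (i) independent HOSVD projections in all $d$ directions (§10.1.1) and (ii) a successive version with renewed HOSVD after each partial step in §10.1.2. In the case of the hierarchical format, a uniform version (i) will be discussed in §11.4.2.1. However, now the successive variant (ii) splits into two versions (iia) and (iib). The reason is that not only different directions exist but also different levels of the vertices. Variant (iia) in §11.4.2.2 uses the direction root-to-leaves, while variant (iib) in §11.4.2.3 proceeds from the leaves to the root.

11.4.2.1 Basic Form

We assume that $\mathbf{v}$ is represented in $\mathcal{H}_{\mathbf{s}}$ and should be truncated into a representation in $\mathcal{H}_{\mathbf{r}}$ for a fixed rank vector $\mathbf{r} \le \mathbf{s}$. More precisely, the following compatibility conditions are assumed:

\[ \begin{aligned} & r_\alpha \le s_\alpha \ \text{ for all } \alpha \in T_D, \qquad r_{\sigma_1} = r_{\sigma_2} \ \text{ for the sons } \sigma_1, \sigma_2 \text{ of } D, \qquad r_D = 1,\\ & r_\alpha \le r_{\alpha_1} r_{\alpha_2} \ \text{ for all } \alpha \in T_D\setminus\mathcal{L}(T_D) \text{ and } \{\alpha_1, \alpha_2\} = S(\alpha). \end{aligned} \tag{11.51} \]

The last inequality follows from the nestedness property (11.9c). The equation $r_{\sigma_1} = r_{\sigma_2}$ for the sons $\sigma_1, \sigma_2$ of $D$ is due to the fact that for $\mathbf{V} = \mathbf{V}_{\sigma_1} \otimes \mathbf{V}_{\sigma_2}$ we have the matrix case: The minimal subspaces $\mathbf{U}_{\sigma_1}^{\min}(\mathbf{u})$ and $\mathbf{U}_{\sigma_2}^{\min}(\mathbf{u})$ of some $\mathbf{u} \in \mathbf{V}$ have identical dimensions (cf. Corollary 6.6).

First we describe a truncation to $\mathbf{u}_{\mathrm{HOSVD}} \in \mathcal{H}_{\mathbf{r}}$ which is completely analogous to the truncation to $\mathcal{T}_{\mathbf{r}}$ discussed in Theorem 10.2 for the tensor subspace representation. Nevertheless, there is a slight difference. In the case of $\mathcal{T}_{\mathbf{r}}$, there are single projections $P_j$ ($1 \le j \le d$) in (10.5), and $\mathbf{u}_{\mathrm{HOSVD}} = \mathbf{P}_{\mathbf{r}}\mathbf{v}$ holds for the product $\mathbf{P}_{\mathbf{r}}$ of all $P_j$. Since the $P_j$ commute, the ordering of the $P_j$ in the product does not matter. In the hierarchical case, we must take into consideration that not all projections commute.

In the case of $\mathcal{H}_{\mathbf{s}}$, let $\mathbf{B}_\alpha^{\mathrm{HOSVD}} = [\mathbf{b}_1^{(\alpha)}, \ldots, \mathbf{b}_{s_\alpha}^{(\alpha)}]$ be the HOSVD basis discussed in §11.3.3. Denote the reduction to the first $r_\alpha$ basis vectors (assuming $r_\alpha \le s_\alpha$) by $\mathbf{B}_\alpha^{\mathrm{red}} := [\mathbf{b}_1^{(\alpha)}, \ldots, \mathbf{b}_{r_\alpha}^{(\alpha)}]$. The projection $P_\alpha$ from $\mathbf{V}_\alpha$ onto $\operatorname{range}(\mathbf{B}_\alpha^{\mathrm{red}})$ is determined by $P_\alpha = \mathbf{B}_\alpha^{\mathrm{red}}(\mathbf{B}_\alpha^{\mathrm{red}})^{\mathsf{H}}$. The same symbol $P_\alpha$ denotes its extension to $\mathbf{V}$ via

\[ P_\alpha := P_\alpha \otimes \mathrm{id}_{\alpha^c}, \quad\text{i.e.,}\quad P_\alpha \bigotimes_{j\in D} u_j = \Bigl[ P_\alpha \bigotimes_{j\in\alpha} u_j \Bigr] \otimes \Bigl[ \bigotimes_{j\in\alpha^c} u_j \Bigr] \]

for any $u_j \in V_j$ (cf. (3.35a,b)).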
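Given $\mathbf{v} = \rho_{\mathrm{HT}}^{\mathrm{HOSVD}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus\mathcal{L}(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr)$ (cf. Definition 11.37), the projection $P_\alpha$ must be expressed by the coefficient matrices $\mathbf{C}_\alpha$ as detailed in the following procedure with $r = r_\alpha$:

  procedure REDUCE($\alpha$, $r$);
  1) delete $C^{(\alpha,\ell)}$ for $\ell > r$;
  2) if level($\alpha$) $\ge 1$ then for the father $\beta$ do
     2a) if $\alpha = \beta_1$ is the first son of $\beta$ then
         reduce $C^{(\beta,\ell)} \in \mathbb{K}^{s_1\times s_2}$ to $\bigl(c_{i,j}^{(\beta,\ell)}\bigr)_{1\le i\le r,\ 1\le j\le s_2} \in \mathbb{K}^{r\times s_2}$ else
     2b) if $\alpha = \beta_2$ is the second son of $\beta$ then
         reduce $C^{(\beta,\ell)} \in \mathbb{K}^{s_1\times s_2}$ to $\bigl(c_{i,j}^{(\beta,\ell)}\bigr)_{1\le i\le s_1,\ 1\le j\le r} \in \mathbb{K}^{s_1\times r}$.

Concerning the ordering of the sons, see Remark 11.4. $s_1 \times s_2$ denotes the actual size of $C^{(\beta,\ell)}$ before reduction. Note that the remaining coefficients $c_{i,j}^{(\alpha,\ell)}$, $c_{i,j}^{(\beta,\ell)}$ are unchanged, only those referring to $\{\mathbf{b}_\ell^{(\alpha)} : r + 1 \le \ell \le s_\alpha\}$ are deleted by reducing the size of $\mathbf{C}_\alpha$, $\mathbf{C}_\beta$. The recursive version of this procedure is

  procedure REDUCE$^*$($\alpha$, $\mathbf{r}$);
  begin REDUCE($\alpha$, $r_\alpha$);
     if $\alpha \notin \mathcal{L}(T_D)$ then for all $\sigma \in S(\alpha)$ do REDUCE$^*$($\sigma$, $\mathbf{r}$)
  end;

REDUCE$^*$($D$, $\mathbf{r}$) maps $\mathbf{v} = \rho_{\mathrm{HT}}^{\mathrm{HOSVD}}(\cdot) \in \mathcal{H}_{\mathbf{s}}$ into $\mathbf{u}_{\mathrm{HOSVD}} = \mathbf{P}_{\mathbf{r}}\mathbf{v} \in \mathcal{H}_{\mathbf{r}}$. Since no arithmetic operations are performed, the computational cost of the procedures REDUCE and REDUCE$^*$ is zero.

Remark 11.59. (a) If the vertices $\alpha, \beta \in T_D$ are disjoint, the projections commute: $P_\alpha P_\beta = P_\beta P_\alpha$. Otherwise $P_\alpha P_\beta$ and $P_\beta P_\alpha$ may differ.

(b) If $P_\alpha$, $P_\beta$ are orthogonal projections onto subspaces $\mathbf{U}_\alpha$, $\mathbf{U}_\beta$ with $\mathbf{U}_\alpha \subset \mathbf{U}_\beta$, the identity $P_\alpha = P_\alpha P_\beta = P_\beta P_\alpha$ is valid.

(c) Let $\sigma_1, \sigma_2$ be the sons of $D \in T_D$. The projections $P_{\sigma_1}$ and $P_{\sigma_2}$ (as mappings on $\mathbf{V}$) satisfy $P_{\sigma_1}\mathbf{v} = P_{\sigma_2}\mathbf{v}$ for the particular tensor $\mathbf{v}$.

Proof. For Part (c) note that $\mathbf{v} = \sum_\nu \sigma_\nu\, \mathbf{b}_\nu^{(\sigma_1)} \otimes \mathbf{b}_\nu^{(\sigma_2)}$. □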
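The procedures REDUCE and REDUCE$^*$ only discard data, which a short sketch makes visible. The storage layout is an assumption made for this illustration: the dictionary `C` holds, for each interior vertex, one array of shape $(s_\alpha, s_{\alpha_1}, s_{\alpha_2})$ collecting the matrices $C^{(\alpha,\ell)}$, and `B` holds the leaf bases as $n_j \times s_j$ arrays; for a leaf, "deleting $C^{(\alpha,\ell)}$" is interpreted as dropping columns of the leaf basis. The tree, vertex names, and sizes are hypothetical.

```python
import numpy as np

# Hypothetical tree for D = {1,2,3}: D -> ({1,2}, {3}), {1,2} -> ({1}, {2})
sons = {"D": ("12", "3"), "12": ("1", "2")}
father = {son: a for a, pair in sons.items() for son in pair}

def reduce_vertex(alpha, r, C, B):
    # 1) keep only the first r basis vectors at alpha
    if alpha in C:
        C[alpha] = C[alpha][:r]
    else:
        B[alpha] = B[alpha][:, :r]
    # 2) shrink the coefficient matrices of the father accordingly
    if alpha in father:
        beta = father[alpha]
        if sons[beta][0] == alpha:       # 2a) alpha is the first son
            C[beta] = C[beta][:, :r, :]
        else:                            # 2b) alpha is the second son
            C[beta] = C[beta][:, :, :r]

def reduce_recursive(alpha, ranks, C, B):
    reduce_vertex(alpha, ranks[alpha], C, B)
    for sigma in sons.get(alpha, ()):
        reduce_recursive(sigma, ranks, C, B)

s = {"D": 1, "12": 4, "3": 4, "1": 3, "2": 3}   # ranks before truncation
n = 5
C = {a: np.ones((s[a], s[a1], s[a2])) for a, (a1, a2) in sons.items()}
B = {j: np.ones((n, s[j])) for j in ("1", "2", "3")}
ranks = {"D": 1, "12": 2, "3": 2, "1": 2, "2": 2}
reduce_recursive("D", ranks, C, B)
```

Only slicing occurs, in line with the statement that no arithmetic operations are performed.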
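To study the consequence of Remark 11.59a, we consider $\alpha \in T_D\setminus\mathcal{L}(T_D)$ and its sons $\alpha_1$ and $\alpha_2$. While $P_{\alpha_1}$ and $P_{\alpha_2}$ commute, they do not commute with $P_\alpha$, which can be seen as follows. First we describe the action of $P_\alpha$. Consider a tensor $\mathbf{v}_\alpha \in \mathbf{U}_\alpha$ of the form

\[ \mathbf{v}_\alpha = \sum_{\ell=1}^{s_\alpha} c_\ell^{(\alpha)}\, \mathbf{b}_\ell^{(\alpha)} \]

with $\mathbf{b}_\ell^{(\alpha)}$ from the HOSVD basis $\mathbf{B}_\alpha^{\mathrm{HOSVD}} = [\mathbf{b}_1^{(\alpha)}, \ldots, \mathbf{b}_{s_\alpha}^{(\alpha)}]$. Hence the projection yields

\[ P_\alpha \mathbf{v}_\alpha = \sum_{\ell=1}^{r_\alpha} c_\ell^{(\alpha)}\, \mathbf{b}_\ell^{(\alpha)} = \sum_{\ell=1}^{r_\alpha} c_\ell^{(\alpha)} \sum_{i=1}^{s_{\alpha_1}} \sum_{j=1}^{s_{\alpha_2}} c_{i,j}^{(\alpha,\ell)}\, \mathbf{b}_i^{(\alpha_1)} \otimes \mathbf{b}_j^{(\alpha_2)}, \]

where we use the representation (11.20) of $\mathbf{b}_\ell^{(\alpha)}$ (index bound $s_\alpha$ replaced with $r_\alpha$!). The projections $P_{\alpha_1}$ and $P_{\alpha_2}$ produce

\[ P_{\alpha_2} P_{\alpha_1} P_\alpha \mathbf{v}_\alpha = \sum_{\ell=1}^{r_\alpha} c_\ell^{(\alpha)} \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{i,j}^{(\alpha,\ell)}\, \mathbf{b}_i^{(\alpha_1)} \otimes \mathbf{b}_j^{(\alpha_2)} \tag{11.52} \]

(summation up to $r_{\alpha_i}$ instead of $s_{\alpha_i}$). Now $P_{\alpha_2} P_{\alpha_1} P_\alpha \mathbf{v}_\alpha$ belongs to the subspace $\tilde{\mathbf{U}}_\alpha := \operatorname{span}\{\tilde{\mathbf{b}}_\ell^{(\alpha)} : 1 \le \ell \le r_\alpha\}$ with the modified vectors

\[ \tilde{\mathbf{b}}_\ell^{(\alpha)} := \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{i,j}^{(\alpha,\ell)}\, \mathbf{b}_i^{(\alpha_1)} \otimes \mathbf{b}_j^{(\alpha_2)}. \tag{11.53} \]

$\tilde{\mathbf{U}}_\alpha$ is a subspace of $\mathbf{U}_{\alpha_1} \otimes \mathbf{U}_{\alpha_2}$ for $\mathbf{U}_{\alpha_i} := \operatorname{span}\{\mathbf{b}_\ell^{(\alpha_i)} : 1 \le \ell \le r_{\alpha_i}\}$; i.e., the nestedness property (11.9c) holds.

On the other hand, if we first apply $P_{\alpha_2} P_{\alpha_1}$, we get

\[ P_{\alpha_2} P_{\alpha_1} \mathbf{v}_\alpha = \sum_{\ell=1}^{s_\alpha} c_\ell^{(\alpha)} \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{i,j}^{(\alpha,\ell)}\, \mathbf{b}_i^{(\alpha_1)} \otimes \mathbf{b}_j^{(\alpha_2)} = \sum_{\ell=1}^{s_\alpha} c_\ell^{(\alpha)}\, \tilde{\mathbf{b}}_\ell^{(\alpha)} \]

with the modified vectors $\tilde{\mathbf{b}}_\ell^{(\alpha)}$ in (11.53). The next projection $P_\alpha$ yields some vector $P_\alpha P_{\alpha_2} P_{\alpha_1} \mathbf{v}_\alpha$ in $\mathbf{U}_\alpha := \operatorname{range}(\mathbf{B}_\alpha) = \operatorname{span}\{\mathbf{b}_\ell^{(\alpha)} : 1 \le \ell \le r_\alpha\} \ne \tilde{\mathbf{U}}_\alpha$. Therefore, in general, $P_{\alpha_2} P_{\alpha_1} P_\alpha \mathbf{v}_\alpha \ne P_\alpha P_{\alpha_2} P_{\alpha_1} \mathbf{v}_\alpha$ holds, proving noncommutativity. Further, $\mathbf{U}_\alpha$ is not a subspace of $\mathbf{U}_{\alpha_1} \otimes \mathbf{U}_{\alpha_2}$; i.e., the construction does not satisfy the nestedness property (11.9c).

These considerations show that the HOSVD projections must be applied from the root to the leaves. A possible description is as follows. For all level numbers $1 \le \ell \le L := \operatorname{depth}(T_D)$ (cf. (11.6)) we set$^{18}$

\[ P^{(\ell)} := \prod_{\alpha\in T_D,\ \operatorname{level}(\alpha)=\ell} P_\alpha. \tag{11.54} \]

Since all $P_\alpha$ with $\operatorname{level}(\alpha) = \ell$ commute (cf. Remark 11.59a), the ordering in the product does not matter and $P^{(\ell)}$ itself is a projection. Then we apply these projections in the order

\[ \mathbf{u}_{\mathrm{HOSVD}} := P^{(L)} P^{(L-1)} \cdots P^{(2)} P^{(1)} \mathbf{v}. \tag{11.55} \]

The following theorem is the analogue of Theorem 10.2 for the tensor subspace representation. Again, we refer to the best approximation $\mathbf{u}_{\mathrm{best}}$ in $\mathcal{H}_{\mathbf{r}}$, which exists as stated in Theorem 11.58.

$^{18}$ At level $\ell = 0$ no projection $P_0$ is needed because of (11.10), which holds for HOSVD bases.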
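Theorem 11.60. Let $\mathbf{V} = {}_a\bigotimes_{j\in D} V_j$ and $V_j$ be pre-Hilbert spaces with induced scalar product$^{19}$. For $\mathbf{v} \in \mathcal{H}_{\mathbf{s}}$ and $\mathbf{r} \le \mathbf{s}$ satisfying (11.51) the approximation $\mathbf{u}_{\mathrm{HOSVD}} \in \mathcal{H}_{\mathbf{r}}$ in (11.55) is quasi-optimal:

\[ \|\mathbf{v} - \mathbf{u}_{\mathrm{HOSVD}}\| \le \sqrt{\sum_\alpha \sum_{i\ge r_\alpha+1} \bigl(\sigma_i^{(\alpha)}\bigr)^2} \le \sqrt{2d-3}\, \|\mathbf{v} - \mathbf{u}_{\mathrm{best}}\|. \tag{11.56} \]

$\sigma_i^{(\alpha)}$ are the singular values of $\mathcal{M}_\alpha(\mathbf{v})$. The $\alpha$-sum is taken over all $\alpha \in T_D\setminus\{D\}$ except that only one son $\sigma_1$ of $D$ is involved.

Proof. The vertex $\alpha = D$ is exceptional since $P^{(1)} = \prod_{\alpha\in T_D,\ \operatorname{level}(\alpha)=1} P_\alpha = P_{\sigma_2} P_{\sigma_1}$ ($\sigma_1, \sigma_2$ sons of $D$) can be replaced with $P_{\sigma_1}$ alone (cf. Remark 11.59c). Since $\mathbf{u}_{\mathrm{HOSVD}} := P^{(L)} \cdots P^{(2)} P_{\sigma_1} \mathbf{v}$, Lemma 4.146b yields

\[ \|\mathbf{v} - \mathbf{u}_{\mathrm{HOSVD}}\|^2 \le \sum_\alpha \|(I - P_\alpha)\mathbf{v}\|^2. \]

The number of projections $P_\alpha$ involved is $2d - 3$. The last estimate in

\[ \|(I - P_\alpha)\mathbf{v}\|^2 = \sum_{i\ge r_\alpha+1} \bigl(\sigma_i^{(\alpha)}\bigr)^2 \le \|\mathbf{v} - \mathbf{u}_{\mathrm{best}}\|^2 \tag{11.57} \]

follows as in the proof of Theorem 10.2. Summation over $\alpha$ proves (11.56). □

For all $\alpha$ with $r_\alpha = r_{\alpha_1} r_{\alpha_2}$ or $r_\alpha = s_\alpha$, the projections $P_\alpha$ may be omitted. This improves the error bound (cf. Corollary 10.3). The practical calculation of (11.55) is already illustrated by (11.52).

Proposition 11.61. The practical realisation of the HOSVD projection (11.55) is done in three steps:

1) Install HOSVD bases at all vertices as described in §11.3.3. For an orthonormal basis this is achieved by HOSVD$^*$($D$) in (11.41b).

2) Delete the basis vectors $\mathbf{b}_i^{(\alpha)}$ with $r_\alpha < i \le s_\alpha$. Practically this means that the coefficient matrices $C^{(\alpha,\ell)} \in \mathbb{K}^{s_{\alpha_1}\times s_{\alpha_2}}$ for $\ell > r_\alpha$ are deleted, whereas those $C^{(\alpha,\ell)}$ with $1 \le \ell \le r_\alpha$ are reduced to matrices of the size $\mathbb{K}^{r_{\alpha_1}\times r_{\alpha_2}}$. This is performed by REDUCE$^*$($D$, $\mathbf{r}$).

3) Finally, $\mathbf{u}_{\mathrm{HOSVD}}$ is represented by $\rho_{\mathrm{HT}}^{\mathrm{HOSVD}}\bigl(T_D, (\tilde{\mathbf{C}}_\alpha)_{\alpha\in T_D\setminus\mathcal{L}(T_D)}, c^{(D)}, (\tilde{B}_j)_{j\in D}\bigr)$ referring to the bases $\tilde{\mathbf{B}}_\alpha = [\tilde{\mathbf{b}}_1^{(\alpha)}, \ldots, \tilde{\mathbf{b}}_{r_\alpha}^{(\alpha)}]$ generated recursively from $\tilde{B}_j$ and $\tilde{\mathbf{C}}_\alpha$:

\[ \begin{aligned} \tilde{\mathbf{b}}_i^{(\alpha)} &:= \mathbf{b}_i^{(\alpha)} && \text{for } \alpha \in \mathcal{L}(T_D) \text{ and } 1 \le i \le r_\alpha,\\ \tilde{\mathbf{b}}_\ell^{(\alpha)} &:= \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{i,j}^{(\alpha,\ell)}\, \tilde{\mathbf{b}}_i^{(\alpha_1)} \otimes \tilde{\mathbf{b}}_j^{(\alpha_2)} && \text{for } \alpha \in T_D\setminus\mathcal{L}(T_D) \text{ and } 1 \le \ell \le r_\alpha. \end{aligned} \]

Note that these bases are suborthonormal, not orthonormal. To reinstall orthonormality, the orthonormalisation procedure in §11.3.2 must be applied.

According to Remark 11.40, the cost of Step 1) is about $(10r^4 + 2r^2 n)d$, while Steps 2) and 3) are free. The cost for a possible re-orthonormalisation is discussed in Remark 11.34.

$^{19}$ The induced scalar product is also used for $\mathbf{V}_\alpha$, $\alpha \in T_D\setminus\mathcal{L}(T_D)$.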
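The first inequality in (11.56) can be observed on a small full tensor. The sketch below treats $d = 3$ with the hypothetical tree $D \to (\{1,2\}, \{3\})$, $\{1,2\} \to (\{1\}, \{2\})$ and works with explicit projection matrices instead of the hierarchical data $(\mathbf{C}_\alpha, B_j)$; this is an assumption made for brevity and is only feasible for tiny examples. The ranks and sizes are arbitrary illustrative choices.

```python
import numpy as np

def top_left_singular(M, r):
    """First r left singular vectors and all singular values of M."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :r], s

rng = np.random.default_rng(2)
n1, n2, n3 = 4, 5, 6
v = rng.standard_normal((n1, n2, n3))
r1, r2, r12 = 2, 2, 3          # r_{3} = r_{12} holds for the sons of D

# HOSVD bases from the matricisations of the ORIGINAL tensor v
U12, s12 = top_left_singular(v.reshape(n1 * n2, n3), r12)
U1, s1 = top_left_singular(v.reshape(n1, n2 * n3), r1)
U2, s2 = top_left_singular(np.moveaxis(v, 1, 0).reshape(n2, n1 * n3), r2)

# Root-to-leaves order (11.55): P_{12} at level 1, then P_1 and P_2 at level 2
u = (U12 @ (U12.T @ v.reshape(n1 * n2, n3))).reshape(n1, n2, n3)
u = np.einsum('ia,ajk->ijk', U1 @ U1.T, u)
u = np.einsum('jb,ibk->ijk', U2 @ U2.T, u)

err_sq = np.sum((v - u) ** 2)
# Sum over the 2d - 3 = 3 projections of the discarded squared singular values
bound_sq = np.sum(s12[r12:] ** 2) + np.sum(s1[r1:] ** 2) + np.sum(s2[r2:] ** 2)
```

Comparing `err_sq` with `bound_sq` reproduces the estimate $\|\mathbf{v} - \mathbf{u}_{\mathrm{HOSVD}}\|^2 \le \sum_\alpha \sum_{i > r_\alpha} (\sigma_i^{(\alpha)})^2$ for this instance.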
11.4.2.2 Sequential Truncation

In the case of the tensor subspace representation, a sequential truncation is formulated in (10.6). Similarly, the previous algorithm can be modified. In the algorithm from Proposition 11.61 the generation of the HOSVD bases in Step 1 is completed before the truncation starts in Step 2. Now both parts are interweaved. Again, we want to truncate from $\mathcal{H}_{\mathbf{s}}$ to $\mathcal{H}_{\mathbf{r}}$ with $\mathbf{r} \le \mathbf{s}$. The following loop is performed from the root to the leaves:

(1) Start: Tensor given in orthonormal hierarchical representation. Set $\alpha := D$.

(2) Loop: (2a) If $\alpha \notin \mathcal{L}(T_D)$, compute the HOSVD bases $\tilde{\mathbf{B}}_{\alpha_1}$ and $\tilde{\mathbf{B}}_{\alpha_2}$ for the sons $\alpha_1$ and $\alpha_2$ of $\alpha$ by HOSVD($\alpha$).
(2b) Restrict the bases at the vertices $\alpha_i$ to the first $r_{\alpha_i}$ vectors; i.e., call the procedures REDUCE($\alpha_1$, $r_{\alpha_1}$) and REDUCE($\alpha_2$, $r_{\alpha_2}$).
(2c) As long as the sons satisfy $\alpha_i \notin \mathcal{L}(T_D)$ repeat the loop for $\alpha := \alpha_i$.

Let the tensor $\mathbf{v} = \rho_{\mathrm{HT}}^{\mathrm{orth}}\bigl(T_D, (\mathbf{C}_\alpha)_{\alpha\in T_D\setminus\mathcal{L}(T_D)}, c^{(D)}, (B_j)_{j\in D}\bigr) \in \mathcal{H}_{\mathbf{s}}$ be given by an orthonormal hierarchical representation. The call HOSVD-TrSeq($D$, $\mathbf{r}$) yields the truncated tensor $\tilde{\mathbf{v}} = \rho_{\mathrm{HT}}\bigl(T_D, (\tilde{\mathbf{C}}_\alpha)_{\alpha\in T_D\setminus\mathcal{L}(T_D)}, c^{(D)}, (\tilde{B}_j)_{j\in D}\bigr) \in \mathcal{H}_{\mathbf{r}}$:

  procedure HOSVD-TrSeq($\alpha$, $\mathbf{r}$);
  if $\alpha \notin \mathcal{L}(T_D)$ then
  begin HOSVD($\alpha$); let $\alpha_1, \alpha_2 \in S(\alpha)$;
     REDUCE($\alpha_1$, $r_{\alpha_1}$); REDUCE($\alpha_2$, $r_{\alpha_2}$);
     HOSVD-TrSeq($\alpha_1$, $\mathbf{r}$); HOSVD-TrSeq($\alpha_2$, $\mathbf{r}$)
  end;                                                            (11.58)

Remark 11.62. (a) Note that the order by which the vertices are visited is not completely fixed. The only restriction is the root-to-leaves direction. This fact enables parallel computing. The result does not depend on the chosen ordering. In particular, computations at the vertices of a fixed level can be performed in parallel. This reduces the factor $d$ in the computational work to $\log_2 d$. Another saving of the computational work is caused by the fact that $\tilde{\mathbf{C}}_\alpha$ has smaller data size than $\mathbf{C}_\alpha$.

(b) When the HOSVD basis $\tilde{\mathbf{B}}_{\alpha_1}$ is created, this is by definition an orthonormal basis. If, however, the computation proceeds to the vertex $\alpha := \alpha_1$, the truncation of the basis at the son vertex of $\alpha_1$ destroys orthonormality. As in the basic version, an orthonormal basis may be restored afterwards. However, even without re-orthonormalisation the sensitivity analysis from Theorem 11.55 guarantees stability for the resulting suborthonormal bases.

Below we use the sets $T_D^{(\ell)}$ defined in (11.7) and $L = \operatorname{depth}(T_D)$ (cf. (11.6)).
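Theorem 11.63. The algorithm HOSVD-TrSeq($D$, $\mathbf{r}$) yields a final approximation $\mathbf{u}_{\mathbf{r}} \in \mathcal{H}_{\mathbf{r}}$ with

\[ \|\mathbf{v} - \mathbf{u}_{\mathbf{r}}\| \le \sum_{\ell=1}^{L} \sqrt{\sum_{\alpha\in T_D^{(\ell)}} \sum_{i\ge r_\alpha+1} \bigl(\tilde{\sigma}_i^{(\alpha)}\bigr)^2} \le \Biggl[ 1 + \sum_{\ell=2}^{L} \sqrt{\# T_D^{(\ell)}} \Biggr] \|\mathbf{v} - \mathbf{u}_{\mathrm{best}}\|, \tag{11.59} \]

where $\tilde{\sigma}_i^{(\alpha)}$ are the singular values computed during the algorithm. The $\alpha$-sum is understood as in Theorem 11.60, i.e., at level $\ell = 1$ only one son of $D$ is involved.

Proof. (i) When bases at $\alpha_1, \alpha_2 \in S(\alpha)$ are computed, they are orthonormal; i.e., $G(\mathbf{B}_{\alpha_i}) = I$ holds for the Gram matrix. All later changes are applications of projections. Thanks to Exercise 11.50, $G(\mathbf{B}_{\alpha_i}) \le I$ holds for all modifications of the basis in the course of the algorithm. This is important for the later application of Theorem 11.55.

(ii) Algorithm HOSVD-TrSeq($D$, $\mathbf{r}$) starts at level $\ell = 0$ and reduces the bases at the son vertices from level $\ell = 1$ to $L = \operatorname{depth}(T_D)$ recursively. $\mathbf{v}_0 := \mathbf{v}$ is the starting value. Let $\mathbf{v}_\ell$ denote the result after the computations at level $\ell$. The final result is $\mathbf{u}_{\mathbf{r}} := \mathbf{v}_L$. The standard triangle inequality yields

\[ \|\mathbf{v} - \mathbf{u}_{\mathbf{r}}\| \le \sum_{\ell=1}^{L} \|\mathbf{v}_\ell - \mathbf{v}_{\ell-1}\|. \]

As in (11.54), we define the product $\tilde{P}^{(\ell)} := \prod_{\alpha\in T_D^{(\ell)}} \tilde{P}_\alpha$, but now $\tilde{P}_\alpha$ describes the orthogonal projection onto $\operatorname{span}\{\tilde{\mathbf{b}}_i^{(\alpha)} : 1 \le i \le r_\alpha\}$, where $\tilde{\mathbf{b}}_i^{(\alpha)}$ are the (no more orthonormal) basis vectors computed by the present algorithm. We observe that $\mathbf{v}_\ell = \tilde{P}^{(\ell)} \mathbf{v}_{\ell-1}$. Lemma 4.146b allows us to estimate by

\[ \|\mathbf{v}_\ell - \mathbf{v}_{\ell-1}\|^2 = \bigl\|\bigl(I - \tilde{P}^{(\ell)}\bigr)\mathbf{v}_{\ell-1}\bigr\|^2 \le \sum_{\alpha\in T_D^{(\ell)}} \bigl\|\bigl(I - \tilde{P}_\alpha\bigr)\mathbf{v}_{\ell-1}\bigr\|^2. \]

Theorem 11.55 states that

\[ \bigl\|\bigl(I - \tilde{P}_\alpha\bigr)\mathbf{v}_{\ell-1}\bigr\|^2 \le \sum_{i\ge r_\alpha+1} \bigl(\tilde{\sigma}_i^{(\alpha)}\bigr)^2, \tag{11.60} \]

since the perturbations are $\delta\mathbf{b}_i^{(\alpha)} = 0$ for $1 \le i \le r_\alpha$, but $\delta\mathbf{b}_i^{(\alpha)} = -\mathbf{b}_i^{(\alpha)}$ with $\|\mathbf{b}_i^{(\alpha)}\| = 1$ for $i \ge r_\alpha + 1$. This proves the inequality in (11.59). For $\ell = 1$, we use that $\tilde{P}_{\alpha_1}\tilde{P}_{\alpha_2} = \tilde{P}_{\alpha_1}$ ($\alpha_1, \alpha_2 \in S(D)$).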
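(iii) Next we prove that the involved singular values $\tilde{\sigma}_i^{(\alpha)}$ are not larger than those $\sigma_i^{(\alpha)}$ from the basic algorithm in Theorem 11.60. During the sequential process we visit each vertex $\alpha \in T_D$ and create an orthogonal basis $\mathbf{b}_i^{(\alpha)}$ ($1 \le i \le s_\alpha$) by calling HOSVD($\alpha$). For theoretical purposes we choose coefficient matrices $\mathbf{C}_\alpha$ corresponding to these bases. The truncation process $\mathbf{v} \mapsto \ldots \mapsto \mathbf{v}' \mapsto \mathbf{v}'' \mapsto \ldots \mapsto \mathbf{u}_{\mathbf{r}}$ starts with the original tensor $\mathbf{v}$ and ends with $\mathbf{u}_{\mathbf{r}}$. Fix a vertex $\beta \in T_D$ and let $\mathbf{v}'$, $\mathbf{v}''$ be the tensors before and after the truncation at $\beta$. Via $\mathcal{M}_\alpha(\mathbf{v}')$ and $\mathcal{M}_\alpha(\mathbf{v}'')$ we obtain singular values, for which $\sigma_i''^{(\alpha)} \le \sigma_i'^{(\alpha)}$ has to be proved for all $\alpha \subset \beta$. For these cases we apply (5.12c): $E_{\beta_1} = \sum_{i,j=1}^{r_\beta} e_{ij}^{(\beta)}\, C^{(\beta,i)} C^{(\beta,j)\mathsf{H}}$ and Corollary 5.16: the squared singular values are the eigenvalues of the matrices $E_\alpha$. More precisely, we have $E_\alpha'$ and $E_\alpha''$ corresponding to $\mathbf{v}'$ and $\mathbf{v}''$.

(iiia) Case $\alpha = \beta$. $E_\alpha' = \operatorname{diag}\{\sigma_1'^{(\alpha)}, \ldots, \sigma_{s_\alpha}'^{(\alpha)}\}^2$ holds because of the particularly chosen basis $\{\mathbf{b}_i^{(\alpha)}\}$, while truncation yields $E_\alpha'' = \operatorname{diag}\{\sigma_1''^{(\alpha)}, \ldots, \sigma_{s_\alpha}''^{(\alpha)}\}^2$ with $\sigma_i''^{(\alpha)} = \sigma_i'^{(\alpha)}$ for $1 \le i \le r_\alpha$ and $\sigma_i''^{(\alpha)} = 0$ for $r_\alpha < i \le s_\alpha$. Hence $E_\alpha'' \le E_\alpha'$ holds.

(iiib) Case $\alpha \subsetneqq \beta$. We apply induction in the subtree $T_\beta$ (cf. Definition 11.6). It suffices to explain the case of $\alpha$ being the first son of $\beta$. Equation (5.12c) states that $E_\alpha = \sum_{i,j=1}^{r_\beta} e_{ij}^{(\beta)}\, C^{(\beta,i)} C^{(\beta,j)\mathsf{H}}$ (note that $G_\bullet = I$ because of the orthonormality). Using this identity for $E_\bullet'$ and $E_\bullet''$ instead of $E_\bullet$, the inequality $E_\beta'' \le E_\beta'$ together with Lemma 2.16 proves $E_\alpha'' \le E_\alpha'$ and, by Lemma 2.31a, $\sigma_i''^{(\alpha)} \le \sigma_i'^{(\alpha)}$.

This sequence of inequalities proves

\[ \tilde{\sigma}_i^{(\alpha)} \le \sigma_i^{(\alpha)} \qquad\text{for } 1 \le i \le s_\alpha,\ \alpha \in T_\beta. \tag{11.61} \]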
(iv) Thanks to (11.61), inequality (11.60) can be continued by the comparison $\sum_{i\ge r_\alpha+1} \bigl(\sigma_i^{(\alpha)}\bigr)^2 \le \|\mathbf{v} - \mathbf{u}_{\mathrm{best}}\|^2$ with the best approximation (cf. (11.57)). □

The sum $\sum_\ell \sqrt{\cdots}$ appears in (11.59) since the perturbations from the different levels are not orthogonal. We may estimate the error by $\sqrt{\sum_\alpha \|(I - P_\alpha)\mathbf{v}\|^2}$ as in the proof of Theorem 11.60, but now the HOSVD basis at vertex $\alpha$ is not related to the singular-value decomposition of $\mathcal{M}_\alpha(\mathbf{v})$ so that we cannot continue as in the mentioned proof.

Remark 11.64. The factor $C(T_D) := 1 + \sum_{\ell=2}^{L} \sqrt{\# T_D^{(\ell)}}$ in (11.59) depends on the structure of $T_D$. If $d = 2^L$, the perfectly balanced tree $T_D$ leads to

\[ C(T_D) = 1 + \sum_{\ell=0}^{L-2} 2^{(L-\ell)/2} = \bigl(2 + \sqrt{2}\bigr)\sqrt{d} - 1 - 2\sqrt{2} = 3.4142\,\sqrt{d} - 3.8284. \]

For general $d$, the tree $T_D$ with minimal depth $\lceil \log_2 d \rceil$ (see Remark 11.5a) yields

\[ C(T_D) = \sqrt{d - 2^{L-1}} + \sum_{\ell=1}^{L-2} 2^{(L-\ell)/2} + 1 < 4.1213 \cdot 2^{L/2} - 3.8284. \]

The worst factor appears for the tree of maximal depth $L = d - 1$ (see Remark 11.5b), where

\[ C(T_D) = 1 + \sqrt{2}\,(d - 2). \]

Remark 11.65. In principle, Algorithm (11.58) can be modified such that after each reduction step (call of REDUCE) the higher-order SVD is updated. Then all contributions in $\sqrt{\sum_\alpha \|(I - P_\alpha)\mathbf{v}\|^2}$ can be estimated by $\|\mathbf{v} - \mathbf{u}_{\mathrm{best}}\|$ and we regain the estimate in (11.56).
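The factor $C(T_D)$ of Remark 11.64 depends only on the number of vertices per level, so it is easy to tabulate. The helper names below are hypothetical; the level sizes used are $\# T_D^{(\ell)} = 2^\ell$ for the perfectly balanced tree and $\# T_D^{(\ell)} = 2$ for $\ell \ge 1$ in the tree of maximal depth.

```python
import math

def factor_from_level_sizes(level_sizes):
    """C(T_D) = 1 + sum_{l >= 2} sqrt(#T_D^{(l)}); level_sizes[l] = #T_D^{(l)}."""
    return 1.0 + sum(math.sqrt(m) for m in level_sizes[2:])

def balanced_levels(L):
    # perfectly balanced tree with d = 2**L leaves: 2**l vertices at level l
    return [2 ** l for l in range(L + 1)]

def linear_levels(d):
    # tree of maximal depth L = d - 1: two vertices at each level l >= 1
    return [1] + [2] * (d - 1)

L = 4
d = 2 ** L
C_balanced = factor_from_level_sizes(balanced_levels(L))
C_linear = factor_from_level_sizes(linear_levels(d))
```

For $d = 16$ this gives roughly 9.83 for the balanced tree against roughly 20.8 for the linear tree, which illustrates the $\sqrt{d}$ versus $d$ growth stated in the remark.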
11.4.2.3 Leaves-to-Root Truncation

As pointed out in §11.4.2.1, the projections $P_\alpha$ should not proceed from the leaves to the root since then the nestedness property is violated. Grasedyck [120] proposes a truncation from the leaves to the root modified in such a way that nestedness is ensured.

Let a tensor $\mathbf{v} \in \mathcal{H}_{\mathbf{s}}$ be given. A truncation at the sons $\alpha_1, \alpha_2 \in S(\alpha)$ reduces the size of the coefficient matrices $C^{(\alpha,\ell)}$ so that the computational work at vertex $\alpha$ is reduced. We assume that the target format $\mathcal{H}_{\mathbf{r}}$ satisfies (11.51). We use the notation $T_D^{(\ell)}$ in (11.7) and the abbreviation $L := \operatorname{depth}(T_D)$. The leaves-to-root direction is reflected by the loop $L, L-1, \ldots, 1$ in the following algorithm:

Start: The starting value is $\mathbf{u}_{L+1} := \mathbf{v}$.

Loop from $\ell := L$ to 1: For all $\alpha \in T_D^{(\ell)}$ determine the HOSVD from $\mathcal{M}_\alpha(\mathbf{u}_{\ell+1})$. Let $\mathbf{U}_\alpha$ be spanned by the first $r_\alpha$ left singular vectors of $\mathcal{M}_\alpha(\mathbf{u}_{\ell+1})$ and define the HOSVD projection $P_\alpha$ as orthogonal projection onto $\mathbf{U}_\alpha$. Set $\mathbf{u}_\ell := P^{(\ell)} \mathbf{u}_{\ell+1}$ with $P^{(\ell)} := \prod_{\alpha\in T_D^{(\ell)}} P_\alpha$.

The fact that in each step the matricisation $\mathcal{M}_\alpha(\mathbf{u}_{\ell+1})$ uses the last projected tensor $\mathbf{u}_{\ell+1}$ is essential, since it guarantees that the left-sided singular-value decomposition of $\mathcal{M}_\alpha(\mathbf{u}_{\ell+1})$ for $\alpha \in T_D^{(\ell)}$ leads to basis vectors $\mathbf{b}_i^{(\alpha)} \in \mathbf{U}_{\alpha_1} \otimes \mathbf{U}_{\alpha_2}$ ($\alpha_1, \alpha_2$ sons of $\alpha$). This ensures nestedness.

Theorem 11.66. The algorithm described above yields a final approximation $\mathbf{u}_1 \in \mathcal{H}_{\mathbf{r}}$ with

\[ \|\mathbf{v} - \mathbf{u}_1\| \le \sqrt{\sum_\alpha \sum_{i\ge r_\alpha+1} \bigl(\sigma_i^{(\alpha)}\bigr)^2} \le \sqrt{2d-3}\, \|\mathbf{v} - \mathbf{u}_{\mathrm{best}}\|. \]

The $\alpha$-sum is understood as in Theorem 11.60. The singular values $\sigma_i^{(\alpha)}$ are those of $\mathcal{M}_\alpha(\mathbf{u}_{\ell+1})$ with $\ell = \operatorname{level}(\alpha)$.

Proof. We remark that V = u∈
h
a
O
N
Uα
a (`)
α∈TD
(`)
α∈TD or (α∈L(TD ), level(α) 1 the trivial choice Uj = Vj
for j ∈ D\{1}
(12.9b)
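is made. The next interior node is $\{1, 2\} \in T_D^{\mathrm{TT}}$. As in (12.6) we form

\[ \mathbf{v}_{k_2}^{\{1,2\}} := \sum_{k_1=1}^{\rho_1} v_{1,k_1}^{(1)} \otimes v_{k_1 k_2}^{(2)} \quad\text{and}\quad \mathbf{U}_{\{1,2\}} := \operatorname{span}\{\mathbf{v}_{k_2}^{\{1,2\}} : k_2 \in R_2\}. \]

In the general case,

\[ \mathbf{v}_{k_j}^{\{1,\ldots,j\}} := \sum_{\substack{k_i\in R_i\\ (0\le i\le j-1)}} \bigotimes_{\ell=1}^{j} v_{k_{\ell-1} k_\ell}^{(\ell)} \]

is obtained recursively by

\[ \mathbf{v}_{k_j}^{\{1,\ldots,j\}} = \sum_{k_{j-1}=1}^{\rho_{j-1}} \mathbf{v}_{k_{j-1}}^{\{1,\ldots,j-1\}} \otimes v_{k_{j-1} k_j}^{(j)} \qquad (k_j \in R_j). \]

These tensors define the subspace

\[ \mathbf{U}_{\{1,\ldots,j\}} := \operatorname{span}\{\mathbf{v}_{k_j}^{\{1,\ldots,j\}} : k_j \in R_j\} \qquad\text{for } j \in D\setminus\{1\} \]

(the case $j = 1$ has already been stated in (12.9a)).

From $\mathbf{v}_{k_{j-1}}^{\{1,\ldots,j-1\}} \in \mathbf{U}_{\{1,\ldots,j-1\}}$ and $v_{k_{j-1} k_j}^{(j)} \in U_j = V_j$ we obtain the inclusion

\[ \mathbf{U}_{\{1,\ldots,j\}} \subset \mathbf{U}_{\{1,\ldots,j-1\}} \otimes U_j \qquad\text{for } j \in D\setminus\{1\}, \tag{12.9c} \]

which is the nestedness condition (11.9c) since $\{1, \ldots, j-1\}$ and $\{j\}$ are the sons of $\{1, \ldots, j\}$. Because $\# R_d = 1$ (cf. (12.4)), there is only one tensor $\mathbf{v}_{k_d}^{\{1,\ldots,d\}} = \mathbf{v}$ which spans $\mathbf{U}_D$. This proves

\[ \mathbf{v} \in \mathbf{U}_D \quad\text{and}\quad \dim(\mathbf{U}_D) = 1. \tag{12.9d} \]

Following Definition 11.8, the tensor $\mathbf{v} \in \mathcal{T}_\rho$ is represented by the hierarchical subspace family $\{\mathbf{U}_\alpha\}_{\alpha\in T_D^{\mathrm{TT}}}$.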
12.3.2 From Subspaces to TT Coefficients

Let the subspaces $\mathbf{U}_{\{1,\ldots,j\}} \subset \mathbf{V}_{\{1,\ldots,j\}}$ satisfy conditions (12.9c,d). Choose any basis (or frame) $\{b_k^{(1)} : k \in R_1\}$ of $\mathbf{U}_{\{1\}} = U_1$ and rename the basis vectors by $v_k^{(1)} = b_k^{(1)}$. For $j \in \{2, \ldots, d-1\}$ let $\{\mathbf{b}_k^{(j)} : k \in R_j\}$ be a basis (or frame) of $\mathbf{U}_{\{1,\ldots,j\}}$ and assume by induction that the tensors

\[ \mathbf{b}_{k_{j-1}}^{(j-1)} = \sum_{k_1,\ldots,k_{j-2}} v_{k_1}^{(1)} \otimes v_{k_1 k_2}^{(2)} \otimes \ldots \otimes v_{k_{j-2} k_{j-1}}^{(j-1)} \qquad (k_{j-1} \in R_{j-1}) \tag{12.10} \]

are already constructed. Since $U_j = V_j$ for $j > 1$ (cf. (12.9b)), the standard basis is formed by the unit vectors $b_i^{(j)} = e^{(i)}$ (cf. (2.2)). By inclusion (12.9c), the basis vector $\mathbf{b}_k^{(j)}$ has a representation

\[ \mathbf{b}_{k_j}^{(j)} = \sum_{k_{j-1}\in R_{j-1}} \sum_{i_j\in I_j} c_{k_{j-1},i_j}^{(\alpha,k_j)}\; \mathbf{b}_{k_{j-1}}^{(j-1)} \otimes b_{i_j}^{(j)} \qquad\text{with } \alpha = \{1, \ldots, j\}, \]

cf. (11.20). Setting

\[ v_{k_{j-1} k_j}^{(j)} = \sum_{i_j} c_{k_{j-1},i_j}^{(\alpha,k_j)}\, b_{i_j}^{(j)} = c_{k_{j-1},\bullet}^{(\alpha,k_j)}, \]

(12.10) follows for $j$ instead of $j-1$. For $j = d$, the tensor $\mathbf{v} \in \mathbf{U}_D \subset \mathbf{U}_{\{1,\ldots,d-1\}} \otimes U_d$ is written as $\mathbf{v} = \sum_{k_{d-1}, i_d} c_{k_{d-1},i_d}^{(D,1)}\, \mathbf{b}_{k_{d-1}}^{(d-1)} \otimes b_{i_d}^{(d)}$. Now

\[ v_{k_{d-1}}^{(d)} = \sum_{i_d} c_{k_{d-1},i_d}^{(D,1)}\, b_{i_d}^{(d)} = c_{k_{d-1},\bullet}^{(D,1)} \]

defines the last coefficients in the representation (12.1a). Note that the cardinalities $\rho_j = \# R_j$ coincide with $\dim(\mathbf{U}_{\{1,\ldots,j\}})$, provided that bases (not proper frames) are used.

As a by-product, the construction shows how the data $v_{k_{j-1} k_j}^{(j)}$ are connected to the coefficients $c_{k_{j-1},i_j}^{(\{1,\ldots,j\},k_j)}$ of the hierarchical format:

\[ c_{k_{j-1},\bullet}^{(\{1,\ldots,j\},k_j)} = v_{k_{j-1} k_j}^{(j)} \quad (2 \le j \le d), \qquad c_{k_{d-1},\bullet}^{(D,1)} = v_{k_{d-1}}^{(d)}. \]
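12.3.3 From Hierarchical Format to TT Format

Now we start from $\mathbf{v} \in \mathcal{H}_{\mathbf{r}}$ with the underlying tree $T_D^{\mathrm{TT}}$ in (12.8a,b) and a rank tuple $\mathbf{r} = (r_\alpha)_{\alpha\in T_D^{\mathrm{TT}}}$. We may construct a TT representation based on the subspaces $(\mathbf{U}_\alpha)_{\alpha\in T_D^{\mathrm{TT}}}$ as in §12.3.2. Instead, we translate the data from $\mathbf{v} = \rho_{\mathrm{HT}}\bigl(T_D^{\mathrm{TT}}, (\mathbf{C}_\alpha), c^{(D)}, (B_j)\bigr) \in \mathcal{H}_{\mathbf{r}}$ directly into the TT data of $\rho_{\mathrm{TT}}\bigl(\rho, (v_{k_{j-1} k_j}^{(j)})\bigr)$ with $\rho_j = r_{\{1,\ldots,j\}}$. By (11.23) the explicit representation of $\mathbf{v}$ is

\[ \mathbf{v} = \sum_{\substack{i[\alpha]=1\\ \text{for } \alpha\in T_D^{\mathrm{TT}}}}^{r_\alpha} c_{i[D]}^{(D)} \prod_{\beta\in T_D\setminus\mathcal{L}(T_D)} c_{i[\beta_1],i[\beta_2]}^{(\beta,i[\beta])} \bigotimes_{j=1}^{d} b_{i[\{j\}]}^{(j)}. \]

We rename the indices as follows: for $\alpha = \{1, \ldots, j\} \in T_D^{\mathrm{TT}}$ we rewrite $i[\alpha]$ by $k_j$, and for leaves $\alpha = \{j\} \in T_D^{\mathrm{TT}}$, $j > 1$, we write $i_j$. This yields

\[ \mathbf{v} = \sum_{\substack{i_\ell\in I_\ell\\ (2\le\ell\le d)}} \ \sum_{\substack{k_\ell=1\\ (1\le\ell\le d)}}^{r_{\{1,\ldots,\ell\}}} c_{k_d}^{(D)} \prod_{j=2}^{d} c_{k_{j-1},i_j}^{(\{1,\ldots,j\},k_j)}\; b_{k_1}^{(1)} \otimes b_{i_2}^{(2)} \otimes \ldots \otimes b_{i_d}^{(d)}. \]

Because of the choice $U_j = V_j$ for $j \ge 2$ (cf. (12.9b)), the basis $\{b_i^{(j)} : i \in I_j\}$ is the canonical one formed by the unit vectors of $V_j = \mathbb{K}^{I_j}$. This implies that $b_i^{(j)}[\ell] = \delta_{i\ell}$. Therefore the entries of $\mathbf{v}$ have the form

\[ \begin{aligned} \mathbf{v}[i_1 i_2 \cdots i_d] &= \sum_{\substack{k_\ell=1\\ (1\le\ell\le d)}}^{\rho_\ell} c_{k_d}^{(D)} \prod_{j=2}^{d} c_{k_{j-1},i_j}^{(\{1,\ldots,j\},k_j)}\; b_{k_1}^{(1)}[i_1]\\ &= \sum_{\substack{k_\ell=1\\ (1\le\ell\le d-1)}}^{\rho_\ell} b_{k_1}^{(1)}[i_1] \cdot c_{k_1,i_2}^{(\{1,2\},k_2)} \cdot \ldots \cdot c_{k_{d-2},i_{d-1}}^{(\{1,\ldots,d-1\},k_{d-1})} \cdot \sum_{k_d=1}^{r_D} c_{k_{d-1},i_d}^{(\{1,\ldots,d\},k_d)}\, c_{k_d}^{(D)} \end{aligned} \]

with $\rho_\ell := r_{\{1,\ldots,\ell\}}$. Defining

\[ \begin{aligned} v_{k_1}^{(1)}[i_1] &:= b_{k_1}^{(1)}[i_1] && \text{for } j = 1,\ i_1 \in I_1,\\ v_{k_{j-1},k_j}^{(j)}[i_j] &:= c_{k_{j-1},i_j}^{(\{1,\ldots,j\},k_j)} && \text{for } 2 \le j \le d-1,\ i_j \in I_j,\\ v_{k_{d-1}}^{(d)}[i_d] &:= \sum_{k_d=1}^{\rho_d} c_{k_{d-1},i_d}^{(\{1,\ldots,d\},k_d)}\, c_{k_d}^{(D)} && \text{for } j = d,\ i_d \in I_d, \end{aligned} \tag{12.11} \]

with $1 \le k_j \le \rho_j$ for $1 \le j \le d-1$, we get the matrix product formulation

\[ \mathbf{v}[i_1 i_2 \cdots i_d] = \sum_{\substack{k_\ell=1\\ (1\le\ell\le d-1)}}^{\rho_\ell} v_{k_1}^{(1)}[i_1] \cdot v_{k_1,k_2}^{(2)}[i_2] \cdot \ldots \cdot v_{k_{d-2},k_{d-1}}^{(d-1)}[i_{d-1}] \cdot v_{k_{d-1}}^{(d)}[i_d], \]

i.e., $\mathbf{v} \in \mathcal{T}_\rho$ with $\rho = (\rho_1, \ldots, \rho_{d-1})$.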
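The matrix product formulation translates directly into code. In the following sketch the TT data are stored as three-way arrays `cores[j]` of shape $(\rho_{j-1}, n_j, \rho_j)$ with $\rho_0 = \rho_d = 1$, so that `cores[j][:, i, :]` plays the role of the matrix of all $v^{(j)}_{k_{j-1},k_j}[i]$; this storage layout, the function names, and the sizes are assumptions made for illustration.

```python
import numpy as np

def tt_entry(cores, idx):
    """v[i_1,...,i_d] as a product of the matrices cores[j][:, i_j, :]."""
    row = np.ones((1, 1))
    for G, i in zip(cores, idx):
        row = row @ G[:, i, :]
    return row[0, 0]

def tt_full(cores):
    """Assemble the full tensor (only sensible for tiny examples)."""
    full = cores[0]                                  # shape (1, n_1, rho_1)
    for G in cores[1:]:
        full = np.tensordot(full, G, axes=([-1], [0]))
    return full[0, ..., 0]                           # drop rho_0 = rho_d = 1

rng = np.random.default_rng(3)
dims, rho = (3, 4, 2, 5), (1, 2, 3, 2, 1)
cores = [rng.standard_normal((rho[j], dims[j], rho[j + 1])) for j in range(4)]
v = tt_full(cores)
entry = tt_entry(cores, (2, 1, 0, 4))
```

Evaluating a single entry costs one short chain of small matrix products, whereas `tt_full` has cost proportional to the number of entries of the full tensor.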
12.3.4 Construction with Minimal $\rho_j$

Given a tensor $\mathbf{v} \in \mathbf{V}$ and the tree $T_D^{\mathrm{TT}}$, the considerations of §11.2.3 show that a hierarchical representation exists, involving the minimal subspaces $\mathbf{U}_\alpha = \mathbf{U}_\alpha^{\min}(\mathbf{v})$ for $\alpha = \{1, \ldots, j\}$. Hence $r_\alpha = \dim(\mathbf{U}_\alpha^{\min}(\mathbf{v})) = \operatorname{rank}_\alpha(\mathbf{v})$ (cf. (6.12)). As seen in §12.3.3, this hierarchical representation can be transferred into TT format with $\rho_j = \dim(\mathbf{U}_{\{1,\ldots,j\}}^{\min}(\mathbf{v})) = \operatorname{rank}_{\{1,\ldots,j\}}(\mathbf{v})$. On the other hand, the first part of Theorem 12.2 states that $\rho_j \ge \operatorname{rank}_{\{1,\ldots,j\}}(\mathbf{v})$. This proves the second part of Theorem 12.2.

12.3.5 Extended TT Representation

The standard TT format uses the maximal subspace $U_j = V_j$, whereas the hierarchical tensor format exploits the (possibly) lower dimension of some proper subspace, e.g., $U_j = U_j^{\min}(\mathbf{v})$. Hence, in general, the TT format is more expensive. Replacing the maximal subspace by a smaller one, the format is called the 'extended tensor-train decomposition' (cf. Oseledets–Tyrtyshnikov [241, Eq. (11)]).

In the optimal case, all $v_{k_{j-1},k_j}^{(j)}$ belong to $U_j^{\min}(\mathbf{v})$ whose dimension is $r_j$. Since it is not unlikely that $\rho_{j-1}\rho_j > r_j$, it may be more advantageous to store a basis $\{b_i^{(j)} : 1 \le i \le r_j\}$ of $U_j^{\min}(\mathbf{v})$. The representations

\[ \begin{aligned} v_{k_1}^{(1)} &= \sum_{i=1}^{r_1} a_i^{(1,1,k_1)}\, b_i^{(1)}, \qquad v_{k_{d-1}}^{(d)} = \sum_{i=1}^{r_d} a_i^{(d,k_{d-1},1)}\, b_i^{(d)},\\ v_{k_{j-1},k_j}^{(j)} &= \sum_{i=1}^{r_j} a_i^{(j,k_{j-1},k_j)}\, b_i^{(j)} \qquad (2 \le j \le d-1), \end{aligned} \]

lead to the overall storage cost

\[ \sum_{j=1}^{d} r_j\, \rho_{j-1}\rho_j + \sum_{j=1}^{d} r_j \cdot \operatorname{size}(V_j) \qquad\text{with } \rho_0 = \rho_d = 1. \]

In this case the tensor $\mathbf{v}$ is given by$^{2}$

\[ \mathbf{v} = \sum_{\substack{k_i\in R_i\\ (0\le i\le d)}} \ \sum_{\substack{1\le i_j\le r_j\\ (1\le j\le d)}} \ \prod_{j=1}^{d} a_{i_j}^{(j,k_{j-1},k_j)} \bigotimes_{j=1}^{d} b_{i_j}^{(j)}. \]

$^{2}$ $a_{i_j}^{(j,k_{j-1},k_j)}$ is to be interpreted as $a_{i_1}^{(1,k_1)}$ for $j = 1$ and as $a_{i_d}^{(d,k_{d-1})}$ for $j = d$.

Note that the optimal values of the decisive parameters (ranks) are

\[ r_j = \operatorname{rank}_j(\mathbf{v}) \quad\text{and}\quad \rho_j = \operatorname{rank}_{\{1,\ldots,j\}}(\mathbf{v}) \qquad\text{for } 1 \le j \le d. \tag{12.12} \]

This format is completely equivalent to the hierarchical format for the particular choice of the dimension partition tree $T_D^{\mathrm{TT}}$.
12.3.6 Properties

Remark 12.3. The storage cost for $\mathbf{v} \in \mathcal{T}_\rho$ is

\[ \sum_{j=1}^{d} \rho_{j-1}\rho_j \cdot \operatorname{size}(V_j) \qquad\text{with } \rho_0 = \rho_d = 1, \]

where $\operatorname{size}(V_j)$ denotes the storage size for a vector from $V_j$. Under the assumptions $\operatorname{size}(V_j) \le n$ and $\rho_j \le \rho$ for all $1 \le j \le d-1$, the data need a storage of the size bounded by $\bigl((d-2)\rho^2 + 2\rho\bigr)\, n$.

The storage cost is less if some factors $v_{k_{j-1},k_j}^{(j)}$ vanish (cf. Remark 12.4). The storage cost improves for the extended TT format (cf. §12.3.5) since then it coincides with the storage cost of the hierarchical format.

Remark 11.4b states that any permutation of indices from $D$ which corresponds to the interchange of sons $\{\alpha_1, \alpha_2\} = S(\alpha)$, leads to an isomorphic situation. In the case of the TT representation, the only permutation, keeping the linear tree structure and leading to the same ranks $\rho_j$, is the reversion $(1, \ldots, d) \mapsto (d, d-1, \ldots, 1)$. The reason is (6.13a): $\rho_j = \operatorname{rank}_{\{1,\ldots,j\}}(\mathbf{v}) = \operatorname{rank}_{\{j+1,\ldots,d\}}(\mathbf{v})$.

Note that the underlying linear tree $T_D^{\mathrm{TT}}$ is not the optimal choice with respect to the following aspects. Its length $d - 1$ is maximal, which might have negative effects, e.g., in Remark 11.64. Because of the linear structure, computations are sequential, while a balanced tree supports parallel computing (cf. Remark 11.62).

On the other hand, the storage and operation costs often contain the product $r_{\alpha_1} r_{\alpha_2}$ of the ranks associated to the sons of $\alpha \in T_D$. If the dimensions $n_j$ of $V_j$ are small: $n_j \le c$, the linear tree allows us to estimate $r_{\alpha_1} r_{\alpha_2}$ by $r_{\{1,\ldots,j-1\}}\, r_{\{j\}} \le c\, r_{\{1,\ldots,j-1\}}$, which is only linear in the rank, not quadratic.
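The storage formula of Remark 12.3 and its bound can be evaluated with a few lines. The function names and the sample ranks are hypothetical; `rho` lists $(\rho_0, \ldots, \rho_d)$ and `sizes` lists $\operatorname{size}(V_j)$.

```python
def tt_storage(rho, sizes):
    """sum_j rho_{j-1} * rho_j * size(V_j), with rho = (rho_0, ..., rho_d)."""
    return sum(rho[j] * rho[j + 1] * sizes[j] for j in range(len(sizes)))

def tt_storage_bound(d, rho_max, n):
    """((d - 2) * rho^2 + 2 * rho) * n from Remark 12.3."""
    return ((d - 2) * rho_max ** 2 + 2 * rho_max) * n

d, n = 4, 5
exact = tt_storage((1, 2, 3, 2, 1), [n] * d)   # ranks rho_1..rho_3 = 2, 3, 2
bound = tt_storage_bound(d, 3, n)              # uses rho = max rank = 3
```

With these sample values the exact cost is 80 numbers and the bound is 120; both coincide when all interior ranks equal $\rho$ and all sizes equal $n$.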
12.3.7 HOSVD Bases and Truncation
In principle, the HOSVD computation is identical to the algorithm in §11.3.3.4. The do statement 'for all sons $\sigma \in S(\alpha)$' in (11.41b) can be rewritten. Since for the linear tree $T_D^{TT}$ the second son $\alpha_2$ is a leaf, the recursion coincides with the loop from $\{1,\ldots,d\}$ to $\{1,2\}$, i.e., from $d$ to $2$. The first matrix in this loop is^3
\[ M_d := A^{(d)}[\cdot] \in \mathbb{K}^{\rho_{d-1}\times n_d} \qquad (\text{cf. (12.2b) and } \rho_d = 1), \]
to which a left-sided singular-value decomposition is applied. Let $H_d$ be the result (i.e., $M_d = H_d \Sigma_d G_d^{T}$ with $G_d$, $H_d$ orthogonal). The HOSVD basis of the $d$-th direction is given by $H_d^{H} A^{(d)}[\cdot]$. The matrix-product representation $A^{(1)}[i_1] \cdot A^{(2)}[i_2] \cdots A^{(d-1)}[i_{d-1}] \cdot A^{(d)}[i_d]$ in (12.2b) is transformed into
\[ A^{(1)}[i_1] \cdot A^{(2)}[i_2] \cdot \ldots \cdot A^{(d-1)}[i_{d-1}]\, H_d \cdot \underbrace{H_d^{H} A^{(d)}[i_d]}_{A^{(d,\mathrm{HOSVD})}[i_d]} \]
with $A^{(d,\mathrm{HOSVD})}[\cdot] \in \mathbb{K}^{\rho_{d-1}^{\mathrm{HOSVD}}\times n_d}$, where the rank $\rho_{d-1}^{\mathrm{HOSVD}} = \operatorname{rank}_{\{1,\ldots,d-1\}}(v)$ is possibly smaller than $\rho_{d-1}$. For general $2 \le j \le d-1$, the block matrix
\[ M_j := \bigl[\, A^{(j)}[i_1] H_{j+1} \;\; A^{(j)}[i_2] H_{j+1} \;\cdots\; A^{(j)}[i_{n_j}] H_{j+1} \,\bigr] \in \mathbb{K}^{\rho_{j-1}\times \rho_j^{\mathrm{HOSVD}} n_j} \]
possesses a left-sided singular matrix $H_j \in \mathbb{K}^{\rho_{j-1}\times\rho_{j-1}^{\mathrm{HOSVD}}}$, and the matrix-product is further transformed into
\[ A^{(1)}[i_1] \cdots A^{(j-1)}[i_{j-1}]\, H_j \cdot \underbrace{H_j^{H} A^{(j)}[i_j] H_{j+1}}_{A^{(j,\mathrm{HOSVD})}[i_j]} \cdots \underbrace{H_{d-1}^{H} A^{(d-1)}[i_{d-1}] H_d}_{A^{(d-1,\mathrm{HOSVD})}[i_{d-1}]} \cdot \underbrace{H_d^{H} A^{(d)}[i_d]}_{A^{(d,\mathrm{HOSVD})}[i_d]}. \]
The final HOSVD matrices are
\[ A^{(1,\mathrm{HOSVD})}[i] = A^{(1)}[i] H_2, \qquad A^{(j,\mathrm{HOSVD})}[i] = H_j^{H} A^{(j)}[i] H_{j+1} \ \text{ for } 2 \le j \le d-1, \qquad A^{(d,\mathrm{HOSVD})}[i] = H_d^{H} A^{(d)}[i] \]
($i$ varies in the respective set $I_j$). The computational cost can be estimated by
\[ 2 \sum_{j=2}^{d} \rho_{j-1}^{2} \Bigl( \rho_{j-2}\, n_{j-1} + 2 \rho_j n_j + \tfrac{8}{3} \rho_{j-1} \Bigr). \tag{12.13} \]
In §11.4.2.1 the truncation based on the HOSVD bases leads to the estimate (11.56) with the factor of $\sqrt{2d-3}$ since $2d-3$ projections are applied. This number reduces to $d-1$ for the TT format since only $d-1$ projections are performed (reduction of $H_j$ to the first $\rho_j'$ columns). The hierarchical format requires $d-2$ additional projections for the subspaces $U_j \subset V_j$ ($2 \le j \le d-1$) which are now fixed by the maximal subspaces $U_j = V_j$.

^3 For fixed $j \in I_d$ the 'matrix' $A^{(d)}[j]$ is a column vector whose entries are indexed by $1 \le i \le \rho_{d-1}$. This defines the matrix $M_d = (m_{ij})$.
12.4 Conversions
12.4.1 Conversion from $R_r$ to $T_\rho$
Remark 12.4. (a) Let $v \in R_r$, i.e., $v = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} u_\nu^{(j)}$. Then set $\rho_j := r$ for $1 \le j \le d-1$ and
\[ v^{(1)}_{k_1} := u^{(1)}_{k_1}, \qquad v^{(d)}_{k_{d-1}} := u^{(d)}_{k_{d-1}}, \qquad v^{(j)}_{k_{j-1},k_j} := \begin{cases} u_\nu^{(j)} & \text{for } k_{j-1} = k_j = \nu, \\ 0 & \text{otherwise} \end{cases} \]
for the factors in (12.3). Because most of the $v^{(j)}_{k_{j-1},k_j}$ are vanishing, the storage cost from Remark 12.3 reduces to the storage needed for $R_r$.
(b) Part (a) describes the implication $v \in R_r \Rightarrow v \in T_\rho$ for $\rho = (r,\ldots,r)$. If $r = \operatorname{rank}(v)$, $\rho = (r,\ldots,r)$ is minimal in the case of $d \le 3$, while for $d \ge 4$, the ranks $\rho_j^*$ ($j \notin \{1,d-1\}$) of $\rho^*$ with $v \in T_{\rho^*}$ may be smaller than $r$.
Proof. We only consider Part (b) and prove that $\rho_1 = \rho_{d-1} = r$. Lemma 3.41 defines $v_\nu^{[1]}$ ($1 \le \nu \le r$) and states that these vectors are linearly independent. This implies that
\[ \rho_1 = \operatorname{rank}_{\{1\}}(v) = \operatorname{rank}_{\{2,\ldots,d\}}(v) = \dim \operatorname{span}\{ v_\nu^{[1]} : 1 \le \nu \le r \} = r. \]
For $\rho_{d-1}$ use $\rho_{d-1} = \operatorname{rank}_{\{d\}}(v) = \dim \operatorname{span}\{ v_\nu^{[d]} : 1 \le \nu \le r \} = r$. Note that for $d \le 3$, all indices $1 \le j \le d-1$ belong to the exceptional set $\{1,d-1\}$. □
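The construction in Remark 12.4a can be sketched in a few lines. The following is a minimal illustration, not the book's implementation: the function names (`cp_to_tt`, `tt_entry`, `cp_entry`), the nested-list storage, and the sample data are assumptions made for this example, and $d \ge 2$ is assumed. The inner cores are stored as full diagonal $r \times r$ matrices for clarity, although only their diagonals carry information.

```python
from itertools import product

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[p][k] * b[k][q] for k in range(len(b)))
             for q in range(len(b[0]))] for p in range(len(a))]

def cp_entry(factors, index):
    """Entry of v = sum_nu u_nu^(1) x ... x u_nu^(d); factors[j][nu] is the vector u_nu^(j)."""
    total = 0
    for nu in range(len(factors[0])):
        term = 1
        for j, i_j in enumerate(index):
            term *= factors[j][nu][i_j]
        total += term
    return total

def cp_to_tt(factors):
    """TT cores A^(j)[i] with rho_j = r: a row, diagonal matrices, and a column."""
    d, r = len(factors), len(factors[0])
    cores = []
    for j in range(d):
        core = []
        for i in range(len(factors[j][0])):
            if j == 0:                       # 1 x r row vector
                mat = [[factors[j][nu][i] for nu in range(r)]]
            elif j == d - 1:                 # r x 1 column vector
                mat = [[factors[j][nu][i]] for nu in range(r)]
            else:                            # r x r diagonal matrix (k_{j-1} = k_j = nu)
                mat = [[factors[j][nu][i] if nu == mu else 0 for mu in range(r)]
                       for nu in range(r)]
            core.append(mat)
        cores.append(core)
    return cores

def tt_entry(cores, index):
    """v[i_1,...,i_d] = A^(1)[i_1] * ... * A^(d)[i_d] (a 1 x 1 matrix)."""
    mat = cores[0][index[0]]
    for j in range(1, len(cores)):
        mat = matmul(mat, cores[j][index[j]])
    return mat[0][0]

# Hypothetical sample data: d = 4, r = 2, n = (3, 2, 3, 2).
factors = [
    [[1, 2, 0], [0, 1, 3]],
    [[2, 1], [1, -1]],
    [[1, 0, 1], [2, 2, 0]],
    [[3, 1], [0, 5]],
]
cores = cp_to_tt(factors)
shape = [len(f[0]) for f in factors]
agree = all(cp_entry(factors, idx) == tt_entry(cores, idx)
            for idx in product(*[range(n) for n in shape]))
```

Storing only the diagonals of the inner cores would recover the storage of the $r$-term format, as stated in the remark.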
12.4.2 Conversion from $T_\rho$ to $H_r$ with a General Tree
The format $T_\rho$ is connected with the tree $T_D^{TT}$ and the ordering $1,\ldots,d$ of the vector spaces $V_j$. We assume that another dimension partition tree $T_D$ is based on the same ordering.^4 The tensor $v = \rho_{TT}\bigl(\rho, (v^{(j)}_{k_{j-1}k_j})\bigr) \in T_\rho$ is described by the data $v^{(j)}_{k_{j-1}k_j} \in V_j$.

^4 According to Remark 11.4, several orderings can be associated with $T_D$. One of them has to coincide with the ordering of $T_D^{TT}$.
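By the assumption on the ordering of the indices, each $\alpha \in T_D$ is of the form $\alpha = \{j'_\alpha, j'_\alpha+1, \ldots, j''_\alpha\}$ for suitable $j'_\alpha, j''_\alpha \in D$. We define
\[ u^{(\alpha)}_{k_{j'_\alpha-1},\,k_{j''_\alpha}} := \sum_{k_{j'_\alpha}} \cdots \sum_{k_{j''_\alpha-1}} v^{(j'_\alpha)}_{k_{j'_\alpha-1} k_{j'_\alpha}} \otimes v^{(j'_\alpha+1)}_{k_{j'_\alpha} k_{j'_\alpha+1}} \otimes \ldots \otimes v^{(j''_\alpha)}_{k_{j''_\alpha-1} k_{j''_\alpha}} \tag{12.14a} \]
for $k_{j'_\alpha-1} \in R_{j'_\alpha-1}$, $k_{j''_\alpha} \in R_{j''_\alpha}$ (with $R_0 = R_d = \{1\}$), and
\[ U_\alpha := \operatorname{span}\bigl\{ u^{(\alpha)}_{k_{j'_\alpha-1},\,k_{j''_\alpha}} : k_{j'_\alpha-1} \in R_{j'_\alpha-1},\ k_{j''_\alpha} \in R_{j''_\alpha} \bigr\}. \tag{12.14b} \]
Note that $\rho_j := \#R_j$. Since the number of tensors on the right-hand side of (12.14b) is $\#R_{j'_\alpha-1}\, \#R_{j''_\alpha} = \rho_{j'_\alpha-1}\, \rho_{j''_\alpha}$, we obtain the estimate
\[ \dim(U_\alpha) \le \rho_{j'_\alpha-1}\, \rho_{j''_\alpha} \qquad \text{for } \alpha = \{ j \in D : j'_\alpha \le j \le j''_\alpha \}. \tag{12.14c} \]
For $\alpha \in T_D \setminus L(T_D)$ with sons $\alpha_1, \alpha_2$, we can rewrite (12.14a) as
\[ u^{(\alpha)}_{k_{j'_\alpha-1},\,k_{j''_\alpha}} = \sum_{k_{j''_{\alpha_1}} \in R_{j''_{\alpha_1}}} u^{(\alpha_1)}_{k_{j'_{\alpha_1}-1},\,k_{j''_{\alpha_1}}} \otimes u^{(\alpha_2)}_{k_{j''_{\alpha_1}},\,k_{j''_{\alpha_2}}}, \tag{12.15} \]
since $j'_{\alpha_1} = j'_\alpha$, $j''_{\alpha_1} = j'_{\alpha_2} - 1$, and $j''_{\alpha_2} = j''_\alpha$. Equality (12.15) proves the nestedness property $U_\alpha \subset U_{\alpha_1} \otimes U_{\alpha_2}$. Since $u^{(D)}_{k_{j'_D-1},\,k_{j''_D}} = v$ holds for $\alpha = D$, $v \in U_D$ is also shown. Hence $\{U_\alpha\}_{\alpha \in T_D}$ is a hierarchical subspace family, and (11.13) is satisfied: $v \in H_r$.

Proposition 12.5. Let $v \in T_\rho$, $\rho = (\rho_1,\ldots,\rho_{d-1})$, and consider a hierarchical format $H_r$ involving a dimension partition tree $T_D$ with the same ordering of $D$.
(a) All $v \in T_\rho$ can be transformed into a representation $v \in H_r$, where the dimensions $r = (r_\alpha : \alpha \in T_D)$ are bounded by
\[ r_\alpha \le \rho_{j'_\alpha-1} \cdot \rho_{j''_\alpha} \qquad \text{for } \alpha = \{ j : j'_\alpha \le j \le j''_\alpha \} \in T_D. \tag{12.16} \]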
ρj are the numbers appearing in ρ = (ρ1 , . . . , ρd−1 ). The estimate remains true for ρj := rank{1,...,j} (v). (b) If TD = TDTT (cf. (12.8a,b)), r{1,...,j} = ρj holds. (c) If d ≤ 6, a tree TD with minimal depth can be chosen such that all rα are bounded by some rj or ρj from (12.12); i.e., no product as in (12.16) appears. Proof. (i) (12.16) corresponds to (12.14c). Using the definitions rα = rankα (v) for vertices α = {j : jα0 ≤ j ≤ jα00 } and (12.5), i.e., ρjα0 −1 = rankβ (v) for β = {1, . . . , jα0 − 1} and ρjα00 = rankγ (v) = rankγ c (v) for γ c = {1, . . . , jα00 }, inequality (12.16) is a particular case of Lemma 6.20b.
(ii) Consider the case $d = 6$ in Part (c). Choose the tree depicted below. For a leaf $\alpha \in L(T_D)$, the rank $r_\alpha$ is some $r_j$ in (12.12). The vertices $\{1,2\}$ and $\{1,2,3\}$ lead to $\rho_2$ and $\rho_3$, while $r_{\{4,5,6\}} = r_{\{1,2,3\}} = \rho_3$ and $r_{\{5,6\}} = r_{\{1,2,3,4\}} = \rho_4$. □

A simplification of the previous proposition is as follows: if the TT representation uses the constant ranks $\rho = (r,\ldots,r)$, the ranks of the hierarchical format are always bounded by $r^2$. This estimate is sharp for $d \ge 8$ as shown by an example in [122]. Up to $d = 6$, the better bound $r$ can be achieved with optimally balanced trees.

Tree for $d = 6$ (sons are indented below their father):
{1,2,3,4,5,6}
    {1,2,3}
        {1,2}
            {1}
            {2}
        {3}
    {4,5,6}
        {4}
        {5,6}
            {5}
            {6}
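The bound (12.16) is easy to tabulate for a concrete tree. The sketch below is only an illustration under stated assumptions: the function name `hierarchical_rank_bounds` and the encoding of a vertex $\alpha = \{j' , \ldots, j''\}$ as a pair `(lo, hi)` are hypothetical choices, and the returned numbers are the upper bounds $\rho_{j'-1}\rho_{j''}$, not the actual ranks.

```python
def hierarchical_rank_bounds(rho, vertices):
    """Upper bounds rho_{j'-1} * rho_{j''} from (12.16) for interval vertices (j', j'').

    rho      -- TT ranks (rho_1, ..., rho_{d-1})
    vertices -- iterable of pairs (lo, hi) with 1 <= lo <= hi <= d
    """
    full = [1] + list(rho) + [1]          # rho_0 = rho_d = 1
    return {(lo, hi): full[lo - 1] * full[hi] for (lo, hi) in vertices}

# Non-leaf vertices of the d = 6 tree used in the proof of Part (c).
r = 3
inner_vertices = [(1, 6), (1, 3), (1, 2), (4, 6), (5, 6)]
bounds_good_tree = hierarchical_rank_bounds([r] * 5, inner_vertices)

# A vertex such as {3,4} touches neither end of the chain and only gets the bound r^2.
bound_middle = hierarchical_rank_bounds([r] * 5, [(3, 4)])[(3, 4)]
```

For the tree of the proof every non-leaf vertex starts at $1$ or ends at $d$, so one of the two factors in (12.16) equals $1$ and no product of two ranks appears.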
12.4.3 Conversion from $H_r$ to $T_\rho$
Given a hierarchical format $H_r$ involving the tree $T_D$, we may consider $T_\rho$ with an optimal permutation of the dimension indices from $D = \{1,\ldots,d\}$. Rewriting this new ordering again by $1,\ldots,d$ means that the $H_r$-vertices of $T_D$ are not necessarily of the form $\{ j : j'_\alpha \le j \le j''_\alpha \}$. The $T_\rho$-ranks $\rho_j = \operatorname{rank}_{\{1,\ldots,j\}}(v)$ can be estimated by products of the ranks $r_\alpha = \operatorname{rank}_\alpha(v)$ ($\alpha \in T_D$) appearing in the hierarchical format as follows. The set $\{1,\ldots,j\}$ can be represented (possibly in many ways) as a disjoint union of subsets of $T_D$:
\[ \{1,\ldots,j\} = \bigcup_{\nu=1}^{\kappa_j} \alpha_\nu \qquad (\text{disjoint } \alpha_\nu \in T_D). \tag{12.17} \]
The existence of such a representation is proved by the singletons, i.e., the leaves of $T_D$: $\{1,\ldots,j\} = \bigcup_{\nu=1}^{j} \{\nu\}$, yielding the largest possible value $\kappa_j = j$. In the best case, $\{1,\ldots,j\}$ is already contained in $T_D$ and $\kappa_j$ is equal to $1$. Let $\kappa'_{j,\min}$ be the smallest $\kappa_j$ in (12.17) taken over all possible representations. Then Lemma 6.20b states that
\[ \rho_j \le \prod_{\nu=1}^{\kappa'_{j,\min}} r_{\alpha_\nu^{\min}}, \]
where $\alpha_\nu^{\min}$ are the subsets with $\{1,\ldots,j\} = \bigcup_{\nu=1}^{\kappa'_{j,\min}} \alpha_\nu^{\min}$. However, since $\rho_j = \operatorname{rank}_{\{1,\ldots,j\}}(v)$ coincides with $\operatorname{rank}_{\{j+1,\ldots,d\}}(v)$, we also have to consider partitions $\{j+1,\ldots,d\} = \bigcup_{\nu=1}^{\kappa_j} \beta_\nu$ ($\beta_\nu \in T_D$). Let $\kappa''_{j,\min}$ and $\beta_\nu^{\min}$ be the optimal choice in the latter case. Then
\[ \rho_j \le \min\Bigl\{ \prod_{\nu=1}^{\kappa'_{j,\min}} r_{\alpha_\nu^{\min}},\ \prod_{\nu=1}^{\kappa''_{j,\min}} r_{\beta_\nu^{\min}} \Bigr\} \]
follows. Assuming a hierarchical format $H_r$ with $r_\alpha \le r$ for all $\alpha \in T_D$, we obtain the estimate $\rho_j \le r^{\min\{\kappa'_{j,\min},\,\kappa''_{j,\min}\}}$. Introducing
\[ \kappa_{\max} := \max\bigl\{ \min\{\kappa'_{j,\min}, \kappa''_{j,\min}\} : 1 \le j \le d \bigr\}, \]
we get $\max\{\rho_j : 1 \le j \le d\} \le r^{\kappa_{\max}}$.
To understand how large the exponent $\kappa_{\max}$ may become, we first consider the cases $d = 3,\ldots,6$ explicitly and then the regular case of a completely balanced tree with $d = 2^L$ (cf. (11.1) for $L = 2$).
For $d = 3$ all trees are isomorphic, i.e., $H_r$ and $T_\rho$ are essentially equal.
For $d = 4$ take the natural ordering of the balanced tree $T_D$. Then
\[ \rho_1 = r_{\{1\}}, \qquad \rho_2 = r_{\{1,2\}}, \qquad \rho_3 = \operatorname{rank}_{\{1,2,3\}}(v) = \operatorname{rank}_{\{4\}}(v) = r_{\{4\}} \]
proves $\kappa_{\max} = 1$, i.e., $H_r$ and $T_\rho$ share the same ranks although the trees are different.
For $d = 5$ consider the tree (a) of the drawing. Then $\rho_1 = r_{\{1\}}$, $\rho_2 = r_{\{1,2\}}$, $\rho_3 = r_{\{1,2,3\}}$, and $\rho_4 = \operatorname{rank}_{\{5\}}(v) = r_{\{5\}}$ show again $\kappa_{\max} = 1$ and that the ranks coincide.
In the case of $d = 6$, the choice of the tree $T_D$ is essential. The tree (b) leads to $\rho_1 = r_{\{1\}}$, $\rho_2 = r_{\{1,2\}}$, $\rho_3 = r_{\{1,2,3\}}$, and $\rho_4 = \operatorname{rank}_{\{1,2,3,4\}}(v) = \operatorname{rank}_{\{5,6\}}(v) = r_{\{5,6\}}$, i.e., $\kappa_{\max} = 1$. However, the tree (c) contains neither $\{1,2,3\}$ nor its complement, and $\kappa_{\max} = 2$ follows. As shown in Grasedyck–Hackbusch [122], there is a tensor in $H_r$ with $r_\alpha \le r$ such that $\rho_3 = r_{\{1,2,3\}} = r^2$.
For all $6 \le d \le 16$ one verifies that $\kappa_{\max} = 2$. In the case of a completely balanced tree we have $d = 2^L$. The next theorem is proved by Buczyńska–Buczyński–Michałek [46].^5
[Figure: three dimension partition trees, (a) for d = 5, (b) for d = 6, (c) for d = 6.]
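The exponent $\kappa_{\max}$ is a purely combinatorial quantity of the tree and can be computed directly. The following sketch assumes a tree encoding by nested tuples (leaves are integers), which is a choice made for this example only. Since the drawing is not reproduced here, the trees `tree_a`, `tree_b`, `tree_c` are assumed shapes that are consistent with the vertices named in the text; they are not taken from the figure itself. The minimal partition in (12.17) is obtained from the maximal tree vertices contained in the target set.

```python
def leaves(tree):
    """Leaf indices below a vertex; a leaf is an int, an inner vertex a tuple of sons."""
    if isinstance(tree, int):
        return frozenset([tree])
    return frozenset().union(*(leaves(son) for son in tree))

def cover_count(tree, target):
    """Smallest number of disjoint tree vertices whose union is `target`."""
    members = leaves(tree)
    if members <= target:          # the vertex itself lies inside the target
        return 1
    if not (members & target):     # nothing to cover below this vertex
        return 0
    return sum(cover_count(son, target) for son in tree)

def kappa_max(tree):
    """max over j of min(kappa'_{j,min}, kappa''_{j,min}) for 1 <= j <= d - 1."""
    d = len(leaves(tree))
    best = 0
    for j in range(1, d):
        head = frozenset(range(1, j + 1))
        tail = frozenset(range(j + 1, d + 1))
        best = max(best, min(cover_count(tree, head), cover_count(tree, tail)))
    return best

def balanced(lo, hi):
    """Balanced binary tree over the indices lo, ..., hi."""
    if lo == hi:
        return lo
    mid = (lo + hi) // 2
    return (balanced(lo, mid), balanced(mid + 1, hi))

# Assumed tree shapes, chosen to contain the vertices mentioned in the text.
tree_a = (((1, 2), 3), (4, 5))             # d = 5
tree_b = (((1, 2), 3), (4, (5, 6)))        # d = 6, contains {1,2,3} and {5,6}
tree_c = (((1, 2), (3, 4)), (5, 6))        # d = 6, neither {1,2,3} nor {4,5,6}

kappas = {
    "d=4 balanced": kappa_max(balanced(1, 4)),
    "a": kappa_max(tree_a),
    "b": kappa_max(tree_b),
    "c": kappa_max(tree_c),
    "d=8 balanced": kappa_max(balanced(1, 8)),
    "d=16 balanced": kappa_max(balanced(1, 16)),
}
```

The case $j = d$ is omitted in `kappa_max` because $\{1,\ldots,d\} = D$ is the root and contributes the value $1$.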
Theorem 12.6. Let $T_D$ be the balanced decomposition tree for $d = 2^L$. Assuming a hierarchical format $H_r$ with $r_\alpha = r$ for all $\alpha \in T_D$, the rank $\rho_j$ for $j = \sum_{\nu=0}^{\lceil L/2 \rceil - 1} 4^\nu$ is bounded by
\[ \rho_j \le r^{\lceil L/2 \rceil}. \tag{12.18} \]
There are tensors $v \in H_r$ such that inequality (12.18) becomes an equality.

^5 The second statement of the theorem was formulated as a conjecture in the first edition of this book. The paper [46] refers to this conjecture.
12.5 Cyclic Matrix Products and Tensor Network States
The tensor subspace format as well as the hierarchical tensor format (including the TT format) are based on trees. General tensor networks are based on graphs. Examples^6 are given in Hübener–Nebendahl–Dür [171, p. 5]; in particular, multidimensional grid-shaped graphs are considered instead of the one-dimensional chain 1 − 2 − . . . − d used in (12.3). The essential difference is that graphs may contain cycles. Landsberg [206, Theorem 14.1.2.2] (see also [207]) proves that in general, a graph-based format containing a cycle is not closed. This implies that an instability may occur similar to the r-term format (cf. Remark 9.19). In the following we study the general cyclic format as well as the site-independent version.
12.5.1 Cyclic Matrix Product Representation
The definition $\rho_0 = \rho_d = 1$ in (12.1c) or $\#R_j = 1$ for the index sets for $j = 0$ and $j = d$ has the purpose of avoiding summation over these indices. Instead, we can identify the indices $\rho_0 = \rho_d$ and allow $\rho_d > 1$:
\[ v = \sum_{k_1=1}^{\rho_1} \cdots \sum_{k_{d-1}=1}^{\rho_{d-1}} \sum_{k_d=1}^{\rho_d} v^{(1)}_{k_d,k_1} \otimes v^{(2)}_{k_1 k_2} \otimes \ldots \otimes v^{(d-1)}_{k_{d-2} k_{d-1}} \otimes v^{(d)}_{k_{d-1},k_d}. \tag{12.20} \]
A shorter notation is
\[ v[i_1,i_2,\ldots,i_d] = \operatorname{trace}\bigl( A^{(1)}[i_1]\, A^{(2)}[i_2] \cdot \ldots \cdot A^{(d)}[i_d] \bigr) \tag{12.21} \]
with matrices $A^{(j)}[i_j] \in \mathbb{K}^{\rho_{j-1}\times\rho_j}$, where $\rho_0 = \rho_d$. This results in a cycle instead of a linear tree.^7 In the following, we set $D = \mathbb{Z}_d$, which implies $0 = d$ (modulo $d$) and hence $\rho_0 = \rho_d$. Although this tensor representation looks quite similar to (12.3), it has essentially different properties.

Proposition 12.7. (a) If $\rho_j = 1$ for at least one $j \in D$, the tensor representation (12.20) coincides with (12.3) for the ordering $\{j+1, j+2, \ldots, d, 1, \ldots, j\}$. We call (12.20) a proper cyclic representation if all $\rho_j$ are larger than $1$.
(b) The minimal subspace $U_j^{\min}(v)$ is not related to a single parameter $\rho_k$ in (12.20); i.e., $\rho_k$ cannot be interpreted as a subspace dimension.
(c) Inequality $\operatorname{rank}_j(v) = \dim(U_j^{\min}(v)) \le \rho_{j-1}\rho_j$ holds for (12.20).
(d) In general, a cyclic representation with $\operatorname{rank}_j(v) = \rho_{j-1}\rho_j$ does not exist.

^6 The graph-based tensor format has several names: tensor network states, finitely correlated states (FCS), valence-bond solids (VBS), projected entangled pair states (PEPS), etc. (cf. [207]).
^7 The physical interpretation of the cyclic format is a cyclic chain of particles interacting only with their neighbours. One may also think about an infinite linear alignment and periodic states of length $d$.
Proof. (i) Assume that $j = d$ in Part (a). Then $\rho_0 = \rho_d = 1$ yields (12.3).
(ii) Fix $j = 1$. Then the representation $v = \sum_{k_1=1}^{\rho_1} \sum_{k_d=1}^{\rho_d} v^{(1)}_{k_d,k_1} \otimes v^{[1]}_{k_d,k_1}$ holds with $v^{[1]}_{k_d,k_1} := \sum_{k_2,k_3,\ldots,k_{d-1}} \bigotimes_{\ell=2}^{d} v^{(\ell)}_{k_{\ell-1}k_\ell}$. Both indices $k_d$ and $k_1$ enter into the definition of $U_1^{\min}(v) = \bigl\{ \varphi(v^{[1]}_{k_d,k_1}) : \varphi \in V'_{[1]} \bigr\}$ in the same way, proving Part (b). Obviously, the dimension is bounded by $\rho_d\rho_1 = \rho_0\rho_1$ as stated in Part (c).
(iii) If $\operatorname{rank}_j(v)$ is a prime number, $\operatorname{rank}_j(v) = \rho_{j-1}\rho_j$ implies that $\rho_{j-1} = 1$ or $\rho_j = 1$. Hence, by Part (a), (12.20) cannot be a proper cyclic representation. □

In the cyclic case, the ranks $\rho_j$ are not related to the dimensions of $U_j^{\min}(v)$. Therefore we cannot repeat the proof of Lemma 11.57 to prove closedness of the format (12.20). In fact, below we shall prove nonclosedness. Let
\[ C(d,(\rho_j),(n_j)) = C\bigl(d,(\rho_j)_{1\le j\le d},(n_j)_{1\le j\le d}\bigr) \]
denote the set of all tensors $v \in \bigotimes_{j=1}^{d} \mathbb{K}^{I_j}$ with $n_j := \#I_j$ possessing a representation (12.21) with matrices $A^{(j)}[i] \in \mathbb{K}^{\rho_{j-1}\times\rho_j}$ for $i \in I_j$ and $1 \le j \le d$. Next we introduce another formulation of $v \in C(d,(\rho_j),(n_j))$.
Let $E^{(j)}_{pq}$ be the matrix with entries $E^{(j)}_{pq}[k,\ell] = \delta_{pk}\delta_{q\ell}$. The tensor $m = m(d,(\rho_j))$ is already mentioned in Example 3.72:
\[ m := \sum_{k_1=1}^{\rho_1} \cdots \sum_{k_d=1}^{\rho_d} E^{(1)}_{k_d,k_1} \otimes E^{(2)}_{k_1 k_2} \otimes \ldots \otimes E^{(d-1)}_{k_{d-2}k_{d-1}} \otimes E^{(d)}_{k_{d-1},k_d} \in \bigotimes_{j=1}^{d} \mathbb{K}^{\rho_{j-1}\times\rho_j}. \tag{12.22} \]
Lemma 12.8. $C(d,(\rho_j),(n_j))$ consists of all
\[ v = \Phi(m) \qquad \text{with } \Phi = \bigotimes_{j=1}^{d} \phi^{(j)} \text{ and } \phi^{(j)} \in L(\mathbb{K}^{\rho_{j-1}\times\rho_j}, \mathbb{K}^{I_j}). \]
Proof. Let $v = \Phi(m)$. Denote the image of $E^{(j)}_{k_{j-1}k_j}$ by $v^{(j)}_{k_{j-1}k_j} := \phi^{(j)}(E^{(j)}_{k_{j-1}k_j})$. Then $v = \Phi(m)$ coincides with (12.20), proving $v \in C(d,(\rho_j),(n_j))$. If, on the other hand, a tensor $v \in C(d,(\rho_j),(n_j))$ is given by (12.20), the equations $v^{(j)}_{k_{j-1}k_j} = \phi^{(j)}(E^{(j)}_{k_{j-1}k_j})$ define a unique map $\phi^{(j)} \in L(\mathbb{K}^{\rho_{j-1}\times\rho_j}, \mathbb{K}^{I_j})$ so that $v = \Phi(m)$. □

Exercise 12.9. Let $\Psi = \bigotimes_{j=1}^{d} \psi^{(j)}$ with $\psi^{(j)} \in L(\mathbb{K}^{I_j}, \mathbb{K}^{I_j})$ be any elementary Kronecker product. Show that $v \in C(d,(\rho_j),(n_j))$ implies $\Psi v \in C(d,(\rho_j),(n_j))$.

Remark 12.10. For sufficiently large $\rho_j$, we have $C(d,(\rho_j),(n_j)) = V = \bigotimes_{j=1}^{d} \mathbb{K}^{n_j}$.
Proof. (i) $\rho_j = 1$ is sufficient to ensure $e_\nu := \bigotimes_{j=1}^{d} e^{(j)}_{\nu_j} \in C(d,(\rho_j),(n_j))$, where $e^{(j)}_\nu$ is the $\nu$-th unit vector in $\mathbb{K}^{n_j}$: choose $A^{(j)}[i] := \delta_{i,\nu_j} \in \mathbb{K}^{1\times 1} = \mathbb{K}$. Since $V$ is spanned by all $e_\nu$, we have to show that sums belong to the format with possibly increased $\rho_j$.
(ii) Let $v' \in C(d,(\rho'_j),(n_j))$ and $v'' \in C(d,(\rho''_j),(n_j))$ be represented by matrices $A'^{(j)}[i]$ and $A''^{(j)}[i]$. Then $v' + v'' \in C(d,(\rho'_j+\rho''_j),(n_j))$ is represented by the block diagonal matrices $A^{(j)}[i] := \operatorname{diag}\{A'^{(j)}[i], A''^{(j)}[i]\}$. □
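The trace formula (12.21) and the block-diagonal sum from the proof of Remark 12.10 can be tried out on small data. The sketch below is an illustration only: the helper names (`cyclic_entry`, `block_diag`, `add_cyclic`), the nested-list storage and the sample matrices are assumptions made for this example.

```python
from itertools import product

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[p][k] * b[k][q] for k in range(len(b)))
             for q in range(len(b[0]))] for p in range(len(a))]

def cyclic_entry(cores, index):
    """v[i_1,...,i_d] = trace(A^(1)[i_1] ... A^(d)[i_d]), cf. (12.21)."""
    mat = cores[0][index[0]]
    for j in range(1, len(cores)):
        mat = matmul(mat, cores[j][index[j]])
    return sum(mat[k][k] for k in range(len(mat)))

def block_diag(a, b):
    """diag{a, b} for nested-list matrices."""
    cols_a, cols_b = len(a[0]), len(b[0])
    return [row + [0] * cols_b for row in a] + [[0] * cols_a + row for row in b]

def add_cyclic(cores1, cores2):
    """Representation of v' + v'' by block diagonal matrices (ranks add up)."""
    return [[block_diag(m1, m2) for m1, m2 in zip(c1, c2)]
            for c1, c2 in zip(cores1, cores2)]

# Hypothetical data: d = 3, n_j = 2, a proper cyclic representation with rho_j = 2 ...
cores1 = [
    [[[1, 2], [0, 1]], [[0, 1], [1, 1]]],
    [[[2, 0], [1, 1]], [[1, -1], [0, 3]]],
    [[[1, 1], [2, 0]], [[0, 2], [1, 1]]],
]
# ... and the unit tensor e_nu with nu = (0, 1, 1), using 1 x 1 matrices (rho_j = 1).
nu = (0, 1, 1)
cores2 = [[[[1 if i == nu_j else 0]] for i in range(2)] for nu_j in nu]

cores_sum = add_cyclic(cores1, cores2)
sum_is_correct = all(
    cyclic_entry(cores_sum, idx) == cyclic_entry(cores1, idx) + cyclic_entry(cores2, idx)
    for idx in product(range(2), repeat=3)
)
```

The check works because products of block diagonal matrices are block diagonal and the trace is additive over the diagonal blocks.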
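Theorem 12.11. The set $C(3,(\rho_j),(n_j))$ with $\rho_j = 2$ and $n_j \ge 4$ is not closed.
Proof. Thanks to Lemma 9.42 it is sufficient to consider the case of $n_j = 4$. Without loss of generality, choose $I_j = I = \{1,2\}\times\{1,2\}$, i.e., $\mathbb{K}^I = \mathbb{K}^{2\times 2}$. The tensor space is $V := \otimes^3 \mathbb{K}^{2\times 2}$. We follow the proof of Harris–Michałek–Sertöz [157, Theorem 4.5].
(i) Define $\psi \in L(\mathbb{K}^{2\times 2},\mathbb{K}^{2\times 2})$ by $\psi(E_{12}) = E_{12}$ and $\psi(E_{pq}) = 0$ for $(p,q) \ne (1,2)$. Together with the identity $\mathrm{id} \in L(\mathbb{K}^{2\times 2},\mathbb{K}^{2\times 2})$, we can define for $t \in \mathbb{R}$,
\[ v(t) = \bigl( \otimes^3 (\psi + t\cdot\mathrm{id}) \bigr)(m), \]
where $m = \sum_{k_1=1}^{2}\sum_{k_2=1}^{2}\sum_{k_3=1}^{2} E_{k_3k_1} \otimes E_{k_1k_2} \otimes E_{k_2k_3} \in V$. Multilinearity yields $v(t) = v_0 + t\cdot v_1 + t^2\cdot v_2 + t^3\cdot v_3$ with
\[ \begin{aligned} v_0 &= (\otimes^3\psi)(m), & v_1 &= [\psi\otimes\psi\otimes\mathrm{id} + \psi\otimes\mathrm{id}\otimes\psi + \mathrm{id}\otimes\psi\otimes\psi](m), \\ v_2 &= [\mathrm{id}\otimes\mathrm{id}\otimes\psi + \mathrm{id}\otimes\psi\otimes\mathrm{id} + \psi\otimes\mathrm{id}\otimes\mathrm{id}](m), & v_3 &= m. \end{aligned} \]
In each term of $m$ there is at most one factor $E_{k_{j-1}k_j}$ with $(k_{j-1},k_j) = (1,2)$. Since $v_0$ and $v_1$ involve three or two $\psi$ applications, $v_0 = v_1 = 0$ follows. Evaluation of $v_2$ yields
\[ \begin{aligned} v_2 = {} & E_{21}\otimes E_{11}\otimes E_{12} + E_{22}\otimes E_{21}\otimes E_{12} + E_{11}\otimes E_{12}\otimes E_{21} \\ & + E_{21}\otimes E_{12}\otimes E_{22} + E_{12}\otimes E_{21}\otimes E_{11} + E_{12}\otimes E_{22}\otimes E_{21}. \end{aligned} \tag{12.23} \]
$v_0 = v_1 = 0$ allows us to form the limit $v_2 = \lim_{t\to 0} t^{-2} v(t)$. Lemma 12.8 states that $v(t) \in C := C(3,(\rho_j = 2),(n_j = 4))$.
(ii) The nonclosedness will follow from $v_2 \notin C$. For an indirect proof assume $v_2 \in C$. By Lemma 12.8 there must be $\phi^{(j)} \in L(\mathbb{K}^{2\times 2},\mathbb{K}^{2\times 2})$ with $v_2 = (\bigotimes_{j=1}^{3}\phi^{(j)})(m)$. It is easy to check that $U_1^{\min}(v_2) = \mathbb{K}^{2\times 2}$. Note that $U_1^{\min}(v_2)$ is the range of the matricisation $M_1\bigl((\bigotimes_{j=1}^{3}\phi^{(j)})(m)\bigr) = \phi^{(1)} M_1(m) \bigl(\bigotimes_{j=2}^{3}\phi^{(j)}\bigr)^{T}$ (cf. (5.5)). Therefore the map $\phi^{(1)}$ must be surjective. Since $\phi^{(1)} \in L(\mathbb{K}^{2\times 2},\mathbb{K}^{2\times 2})$, surjectivity implies injectivity. Hence $\phi^{(1)} : \mathbb{K}^{2\times 2} \to \mathbb{K}^{2\times 2}$ and analogously $\phi^{(2)}$, $\phi^{(3)}$ are vector space isomorphisms and $\bigotimes_{j=1}^{3}\phi^{(j)} : V \to V$ is a tensor space isomorphism. By Lemma 3.39, $\operatorname{rank}(v_2) = \operatorname{rank}(m)$ follows. The representation (12.23) yields $\operatorname{rank}(v_2) \le 6$. On the other hand, $\operatorname{rank}(m) = 7$ holds according to Theorem 3.51. This contradiction proves that $v_2 \notin C$. □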
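Part (i) of the proof is a finite computation and can be replayed mechanically. The sketch below expands $(\otimes^3(\psi + t\cdot\mathrm{id}))(m)$ by powers of $t$; the encoding of an elementary tensor $E_{p_1q_1}\otimes E_{p_2q_2}\otimes E_{p_3q_3}$ as a triple of index pairs and all function names are assumptions made for this illustration.

```python
from itertools import product
from collections import Counter

E12 = (1, 2)

def tensor_m():
    """m = sum over k1,k2,k3 of E_{k3 k1} x E_{k1 k2} x E_{k2 k3}, stored as key -> coefficient."""
    m = Counter()
    for k1, k2, k3 in product((1, 2), repeat=3):
        m[((k3, k1), (k1, k2), (k2, k3))] += 1
    return m

def apply_ops(ops, tensor):
    """Apply psi or id factor-wise; psi keeps E_12 and maps every other E_pq to zero."""
    out = Counter()
    for key, coeff in tensor.items():
        if all(op == "id" or factor == E12 for op, factor in zip(ops, key)):
            out[key] += coeff
    return out

def coefficient(k, tensor):
    """Coefficient v_k of t^k in (psi + t*id) x (psi + t*id) x (psi + t*id) applied to tensor."""
    total = Counter()
    for ops in product(("psi", "id"), repeat=3):
        if ops.count("id") == k:
            total.update(apply_ops(ops, tensor))
    return total

m = tensor_m()
v0, v1, v2, v3 = (coefficient(k, m) for k in range(4))

# The six terms of (12.23), each with coefficient 1.
expected_v2 = Counter({
    ((2, 1), (1, 1), (1, 2)): 1,
    ((2, 2), (2, 1), (1, 2)): 1,
    ((1, 1), (1, 2), (2, 1)): 1,
    ((2, 1), (1, 2), (2, 2)): 1,
    ((1, 2), (2, 1), (1, 1)): 1,
    ((1, 2), (2, 2), (2, 1)): 1,
})
```

This only confirms the expansion $v(t) = t^2 v_2 + t^3 m$; the rank statements used in part (ii) are not checked by the code.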
12.5.2 Site-Independent Representation
A subset of the cyclic matrix-product format is characterised by equal matrices:
\[ \left. \begin{aligned} & I := I_1 = \ldots = I_d, \qquad \rho := \rho_1 = \ldots = \rho_d, \\ & A[i] := A^{(1)}[i] = \ldots = A^{(d)}[i] \in \mathbb{K}^{\rho\times\rho} \quad (i \in I) \end{aligned} \right\} \quad \text{for } 1 \le i \le \rho. \]
Accordingly, the entries of the represented tensors are
\[ v[i_1,i_2,\ldots,i_d] = \operatorname{trace}\bigl( A[i_1]\, A[i_2] \cdot \ldots \cdot A[i_d] \bigr). \tag{12.24} \]
In this special case, the format $C(d,(\rho_j),(n_j))$ is denoted by $C_{\mathrm{ind}}(d,\rho,n)$, where $n := \#I$. If $d$ is even, one should choose the field $\mathbb{K} = \mathbb{C}$. Otherwise, $v \in C_{\mathrm{ind}}(d,\rho,n)$ does not necessarily imply $-v \in C_{\mathrm{ind}}(d,\rho,n)$. Alternatively, one can introduce an additional scalar factor to ensure that $C_{\mathrm{ind}}(d,\rho,n)$ has the cone property (cf. (9.14b)).
The physical background is that of identical particles interacting equally with their neighbours. Perez–Garcia et al. [244, §3.2.1] call such tensors the 'site-independent cyclic matrix-product states'.
Let $C_{\mathrm{ind}}(d,\rho,n)$ be the set of tensors (12.24) in $\otimes^d V$ with $\dim(V) = n$, while $\rho$ is the matrix size of $A[i] \in \mathbb{K}^{\rho\times\rho}$. Next we define the set of cyclic tensors.

Definition 12.12. A tensor $v \in \mathbf{V} := \otimes^d V$ is called cyclic if
\[ \pi v = v \qquad \text{for the permutation } \pi : (1,\ldots,d) \mapsto (d,1,\ldots,d-1). \]
The set of cyclic tensors is denoted by $\mathbf{V}_{\mathrm{cycl}}$.

Remark 12.13. (a) $C_{\mathrm{ind}}(d,\rho,n) \subset C_{\mathrm{ind}}(d,\rho,n') \subset \mathbf{V}_{\mathrm{cycl}}$ for $n \le n'$.
(b) Let $\psi \in L(V,V)$ and set $\Psi := \otimes^d\psi$. Then $v \in C_{\mathrm{ind}}(d,\rho,n)$ implies $\Psi v \in C_{\mathrm{ind}}(d,\rho,n)$.
(c) $C_{\mathrm{ind}}(d,\rho,n) = \mathbf{V}_{\mathrm{cycl}}$ holds for sufficiently large $\rho$.
Proof. Part (b) is the analogue of Exercise 12.9 and is proved in the same way.
(c) Let $\nu = (\nu_1,\ldots,\nu_d)$ be a $d$-tuple with $\nu_j \in \{1,\ldots,n\}$. The permutation $\pi$ is as in Definition 12.12. The shifted versions of $\nu$ are $\pi^k\nu$. As in the proof of Remark 12.10 it is sufficient to show that the cyclic version
\[ e_{\nu,\mathrm{cycl}} := \sum_{k=1}^{d} e_{\pi^k\nu} \]
of the unit tensor $e_\nu$ belongs to $C_{\mathrm{ind}}(d,\rho,n)$ for sufficiently large $\rho$.
First we consider the particular tuple $\nu = (1,\ldots,d)$. Choose $\rho = d$. Using the unit vectors $e_k \in \mathbb{K}^\rho$ ($1 \le k \le d$ with $e_0 := e_d$), define the matrices in (12.24) by $A[k] := e_{k-1} e_k^{T} \in \mathbb{K}^{\rho\times\rho}$. One verifies that (12.24) yields $e_{(1,\ldots,d),\mathrm{cycl}}$. For a general $d$-tuple $\nu = (\nu_1,\ldots,\nu_d)$ apply Part (b) with the map $\psi$ defined by $\psi(e_k) := e_{\nu_k}$. □

The next lemma is the analogue of Lemma 12.8.

Lemma 12.14. $C_{\mathrm{ind}}(d,\rho,n) = \bigl\{ (\otimes^d\phi)(m) : \phi \in L(\mathbb{K}^{\rho\times\rho},\mathbb{K}^n) \bigr\}$ holds with the tensor $m$ in (12.22) for $\rho_j = \rho$.

A slight modification of the proof of the previous Theorem 12.11 shows that the set $C_{\mathrm{ind}}(d = 3, \rho = 2, n = 4)$ is not closed. A general result is proved by Czapliński–Michałek–Seynnaeve [66]:

Theorem 12.15. Let $\mathbb{K} = \mathbb{C}$, $n \ge 2$, and $d \ge 3$. Then $C_{\mathrm{ind}}(d,2,n)$ is closed if and only if $(d,n) = (3,2)$.

The fact that $C_{\mathrm{ind}}(3,2,2)$ is closed has a trivial reason. The following statement is proved by Harris–Michałek–Sertöz [157, §5.1].

Theorem 12.16. $C_{\mathrm{ind}}(3,2,2) = \mathbf{V}_{\mathrm{cycl}}$.

In this context a conjecture is formulated in Czapliński–Michałek–Seynnaeve [66]: Either $C_{\mathrm{ind}}(d,\rho,n)$ is not closed or it is equal to $\mathbf{V}_{\mathrm{cycl}}$. Another result in [66] concerns the case of a general matrix size $\rho$.

Theorem 12.17. Let $n \ge 2$ and fix some $\rho \ge 2$. Then, for sufficiently large $d$, the format $C_{\mathrm{ind}}(d,\rho,n)$ is not closed.

The consequences for the numerical instability and the topological structure of the nonclosed formats are described in §9.5.
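The construction $A[k] := e_{k-1}e_k^{T}$ from the proof of Remark 12.13c can be verified by brute force for small $d$. The following sketch is an illustration with hypothetical helper names; indices $1,\ldots,d$ are kept 1-based as in the text, while the matrix positions are stored 0-based.

```python
from itertools import product

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[p][k] * b[k][q] for k in range(n)) for q in range(n)] for p in range(n)]

def shift_matrices(d):
    """A[k] = e_{k-1} e_k^T in K^{d x d} for k = 1..d, with e_0 := e_d."""
    mats = {}
    for k in range(1, d + 1):
        row = (k - 2) % d      # 0-based position of e_{k-1} (e_0 is identified with e_d)
        col = k - 1            # 0-based position of e_k
        mats[k] = [[1 if (p == row and q == col) else 0 for q in range(d)]
                   for p in range(d)]
    return mats

def site_independent_entry(mats, index):
    """v[i_1,...,i_d] = trace(A[i_1] ... A[i_d]), cf. (12.24)."""
    mat = mats[index[0]]
    for i in index[1:]:
        mat = matmul(mat, mats[i])
    return sum(mat[k][k] for k in range(len(mat)))

def cyclic_shifts(d):
    """All cyclic shifts of the tuple (1, ..., d)."""
    base = tuple(range(1, d + 1))
    return {base[s:] + base[:s] for s in range(d)}

d = 4
mats = shift_matrices(d)
shifts = cyclic_shifts(d)
is_cyclic_unit_tensor = all(
    site_independent_entry(mats, idx) == (1 if idx in shifts else 0)
    for idx in product(range(1, d + 1), repeat=d)
)
```

The product $A[i_1]A[i_2]$ is nonzero only if $i_2 = i_1 + 1$ modulo $d$, so the trace picks out exactly the $d$ cyclic shifts of $(1,\ldots,d)$.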
12.5.3 Tensor Network Concerning computations in tensor networks, we refer to Huckle et al. [172] and Espig et al. [90]. However, because of the numerical instability as a consequence of the nonclosedness of the cyclic formats, one should not do computations within this format. Handschuh [156] describes how tensors represented in one network can be transferred into another network topology. In particular, a graph-based representation can be mapped into the tree-based hierarchical format.
12.6 Representation of Symmetric and Antisymmetric Tensors
In the case of general tensors, the extended TT format corresponds to subspaces $U_{\{1,\ldots,j\}}$ and $U_j$ subject to the nestedness condition $U_{\{1,\ldots,j\}} \subset U_{\{1,\ldots,j-1\}} \otimes U_j$. Given $U_{\{1,\ldots,j-1\}}$ and $U_j$, we can choose some subspace $U_{\{1,\ldots,j\}}$, etc. This yields a representation of any $v \in U_D$.
In the case of (anti-)symmetric tensors, all subspaces $U_j$ must coincide: $U := U_j$ for all $j$. Assume that $U_{\{1,\ldots,j-1\}}$ is already spanned by a basis of (anti-)symmetric tensors. Does $U_{\{1,\ldots,j-1\}} \otimes U$ already contain a subspace $U_{\{1,\ldots,j\}}$ of (anti-)symmetric tensors? In principle this approach may yield a positive result, since the minimal subspaces of an (anti-)symmetric tensor satisfy
\[ U^{\min}_{\{1,\ldots,j\}}(v) \subset U^{\min}_{\{1,\ldots,j-1\}}(v) \otimes U, \]
provided that $U_1^{\min}(v) \subset U$ (cf. (11.14c)). Nevertheless, it seems that this approach is not practical. If $U_{\{1,\ldots,j-1\}}$ is too small, the only subspace of (anti-)symmetric tensors in $U_{\{1,\ldots,j-1\}} \otimes U_j$ is the trivial one: $U_{\{1,\ldots,j\}} = \{0\}$. For details see Hackbusch [143].
Chapter 13
Tensor Operations
Abstract In §4.6 several tensor operations have been described. The numerical tensor calculus requires the practical realisation of these operations. In this chapter we describe the performance and arithmetical cost of the operations for the different formats. The discussed operations are the addition in Section 13.1, evaluation of tensor entries in Section 13.2, the scalar product and partial scalar product in Section 13.3, the change of bases in Section 13.4, general binary operations in Section 13.5, Hadamard product in Section 13.6, convolution of tensors in Section 13.7, matrix-matrix multiplication in Section 13.8, and matrix-vector multiplication in Section 13.9. Section 13.10 is devoted to special functions applied to tensors. In the last Section 13.11 we comment on the operations required for the treatment of Hartree–Fock and Kohn–Sham applications in quantum chemistry. In connection with the tensorisation discussed in Chapter 14, further operations and their cost will be discussed.
We repeat the consideration in §7.1 concerning operations. Two mathematical entities $s_1, s_2 \in S$ are represented via parameters $p_1, p_2 \in P_S$, i.e., $s_1 = \rho_S(p_1)$ and $s_2 = \rho_S(p_2)$. A binary operation $\boxdot$ leads to $s := s_1 \boxdot s_2$. We must find a parameter $p \in P_S$ such that $s = \rho_S(p)$. Therefore, on the side of the parameter representations, we must perform (7.2):
\[ p := p_1 \mathbin{\widehat{\boxdot}} p_2 \quad :\Longleftrightarrow \quad \rho_S(p) = \rho_S(p_1) \boxdot \rho_S(p_2). \]
The related memory cost is denoted by $N_{\mathrm{mem}}(\cdot)$ and the number of arithmetic operations by $N_{\boxdot}$ with '$\boxdot$' replaced by the respective operation. The following list yields an overview of the asymptotic cost of various operations for the different formats. Here we assume the most general case (different bases etc.) and consider only upper bounds using $n = \max n_j$, $n_j = \dim(V_j)$, $r = \max r_\alpha$ etc.^1 Furthermore, the cost of the $j$-th scalar product is assumed to be $2n_j - 1$.

^1 Note that the meaning of the bound $r$ differs for the formats, since different kinds of ranks are involved.

© Springer Nature Switzerland AG 2019 W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics 56, https://doi.org/10.1007/978-3-030-35554-8_13
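| operation | full | r-term | tensor subspace | hierarchical |
| storage | $n^d$ | $dnr$ | $r^d + dnr$ | $dr^3 + dnr$ |
| basis change | $2dn^{d+1}$ | $2dn^2r$ | $2dr^{d+1}$ | $2dr^3$ |
| orthonormalisation | - | - | $2dr^{d+1} + 2dnr^2$ | $2dnr^2 + 6dr^4$ |
| $u + v$ | $n^d$ | $0$ | $2dr^{d+1} + 2dnr^2$ | $8dnr^2 + 12dr^4$ |
| $v_i$ evaluation | $0$ | $dr$ | $2r^d$ | $2dr^3$ |
| $\langle u,v\rangle$ | $2n^d$ | $dr^2 + 2dnr^2$ | $2dr^{d+1} + 8dnr^2$ | $2dnr^2 + 6dr^4$ |
| $\langle u,v\rangle_{\alpha^c}$ | $2n^{d+\#\alpha}$ | $r^2\#\alpha^c + 2\#\alpha^c nr^2$ | $2r^{d+\#\alpha} + 8\#\alpha^c nr^2$ | $< 2dnr^2 + 6dr^4$ |
| $\langle v,v\rangle_{\{j\}^c}$ | $n^{d+1}$ | $\tfrac12 dr^2 + dnr^2$ | $2r^{d+1} + 8dnr^2$ | $< 2dnr^2 + 6dr^4$ |
| $u \odot v$ | $n^d$ | $dnr^2$ | $r^{2d} + dnr^2$ | $dnr^2 + (d-1)r^4$ |
| $Av$ | $2dn^{d+1}$ | $2dn^2r$ | $2d(r^{d+1} + n^2r + nr^2)$ | $2dn^2r$ |
| truncation | - | $\sim dr^2R + d^2rR$ | $3dr^{d+1} + 2dr^2n$ | $2dr^2n + 3dr^4$ |

The cost of the truncation is concluded from §9.6.4 for^2 $R_r$, from §8.3.3 for $T_r$, and from (11.41c) for $H_r$. The terms involving $n$ may be improved by applying the tensorisation technique in §14 as detailed in §14.1.4. In the best case, $n$ may be replaced with $O(\log n)$.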
13.1 Addition
Given $v', v'' \in V$ in some representation, we must represent $v := v' + v''$.

13.1.1 Full Representation
Assume $V = \bigotimes_{j=1}^{d} \mathbb{K}^{I_j}$ with $I = \times_{j=1}^{d} I_j$ (cf. §7.2). Given $v', v'' \in V$ in full representation, the sum is performed directly by
\[ v_i := v'_i + v''_i \qquad \text{for all } i \in I. \]
The memory $N_{\mathrm{mem}}(v) = \#I$ is the same as for each of $v', v''$. The number of arithmetic operations is also equal to
\[ N_+^{\mathrm{full}} = N_{\mathrm{mem}}(v) = \#I = \prod_{j=1}^{d} \#I_j. \]
As a variant, we may consider sparse tensors. Then, obviously, $v$ is less sparse than $v', v''$, unless both terms possess the same sparsity pattern.
Another variant is the full functional representation by a function. Given two functions function v1(. . .) and function v2(. . .), the sum is represented by the function $v(i_1,\ldots,i_d)$ defined by v(. . .) := v1(. . .) + v2(. . .). Hence the cost per call increases: $N_v = N_{v1} + N_{v2}$.

^2 Here, the cost of one iteration is given.
13.1.2 r-Term Representation
Given $v' = \sum_{\nu=1}^{r} v_\nu^{(1)} \otimes \ldots \otimes v_\nu^{(d)} \in R_r$ and $v'' = \sum_{\nu=1}^{s} w_\nu^{(1)} \otimes \ldots \otimes w_\nu^{(d)} \in R_s$, the sum $v = v' + v''$ is performed by concatenation, i.e.,
\[ v = \sum_{\nu=1}^{r+s} v_\nu^{(1)} \otimes \ldots \otimes v_\nu^{(d)} \in R_{r+s}, \qquad \text{where } v_{r+\nu}^{(j)} := w_\nu^{(j)} \text{ for } 1 \le \nu \le s,\ 1 \le j \le d. \]
The memory is additive: $N_{\mathrm{mem}}(v) = N_{\mathrm{mem}}(v') + N_{\mathrm{mem}}(v'')$, while no arithmetic work is required: $N_+ = 0$. Since the result lies in $R_{r+s}$ with increased representation rank $r+s$, we usually need a truncation procedure to return to a lower representation rank.
Consider the hybrid format in (8.20). If $v' = \rho^{\mathrm{hybr}}_{r\text{-term}}\bigl(r', J, (a'^{(j)}_\nu), (B_j)\bigr)$ and $v'' = \rho^{\mathrm{hybr}}_{r\text{-term}}\bigl(r'', J, (a''^{(j)}_\nu), (B_j)\bigr)$ holds with identical bases, the procedure is as above. The coefficients are joined into $(a^{(j)}_\nu)$ with $1 \le \nu \le r := r' + r''$.
If different bases $(B'_j)$ and $(B''_j)$ are involved, these must be joined via JoinBases($B'_j$, $B''_j$, $r_j$, $B_j$, $T'^{(j)}$, $T''^{(j)}$). Now $v''$ can be reformulated by coefficients $a^{(j)} = T''^{(j)} a''^{(j)}$ with respect to the basis $B_j$. $v'$ may stay essentially unchanged since $B'_j$ can be taken as the first part of $B_j$. Afterwards, $v'$ and $v''$ have identical bases and can be treated as before. The total cost in the second case is
\[ N_+ = \sum_{j=1}^{d} \Bigl( N_{QR}(n_j, r'_j + r''_j) + r_j\,(2r''_j - 1) \Bigr). \]
13.1.3 Tensor Subspace Representation
Case I (two tensors from $T_r$ with same bases). First we consider the case that $v', v'' \in T_r$ belong to the same tensor subspace $U := \bigotimes_{j=1}^{d} U_j$ with $r_j = \dim(U_j)$. The representation parameters are the coefficient tensors $a', a'' \in \mathbb{K}^J$ with $J = \times_{j=1}^{d} J_j$, where $i \in J_j = \{1,\ldots,r_j\}$ is associated with the basis vectors $b_i^{(j)}$ (cf. (8.5b)). The addition of $v', v'' \in U$ reduces to the addition $a := a' + a''$ of the coefficient tensors for which the full representation is used. Hence §13.1.1 yields
\[ N_{\mathrm{mem}}(v) = N_+ = \#J = \prod_{j=1}^{d} r_j. \]
Case II (two tensors with different bases). Another situation arises if different tensor subspaces are involved:
\[ v' = \sum_{i \in J'} a'_i \bigotimes_{j=1}^{d} b'^{(j)}_{i_j} \in U' := \bigotimes_{j=1}^{d} U'_j, \qquad v'' = \sum_{i \in J''} a''_i \bigotimes_{j=1}^{d} b''^{(j)}_{i_j} \in U'' := \bigotimes_{j=1}^{d} U''_j. \]
The sum $v := v' + v''$ belongs to the larger space $U := \bigotimes_{j=1}^{d} U_j \supset U' + U''$ with
\[ U_j := U'_j + U''_j = \operatorname{range}(B'_j) + \operatorname{range}(B''_j), \]
where $B'_j = [b'^{(j)}_1, \ldots, b'^{(j)}_{r'_j}]$ and $B''_j = [b''^{(j)}_1, \ldots, b''^{(j)}_{r''_j}]$ are the given bases or frames of $U'_j \subset V_j$ and $U''_j \subset V_j$, respectively. The further treatment depends on the requirement about $B_j := [b^{(j)}_1, \ldots, b^{(j)}_{r_j}]$.

1. If $B_j$ may be any frame (cf. Remark 8.8d), we can set $r_j := r'_j + r''_j$ and
\[ B_j := \bigl[ b'^{(j)}_1, \ldots, b'^{(j)}_{r'_j}, b''^{(j)}_1, \ldots, b''^{(j)}_{r''_j} \bigr]. \]
Then $a \in \mathbb{K}^J$ with $J = \times_{j=1}^{d} J_j$ and $J_j = \{1,\ldots,r_j\}$ is obtained by concatenation: $a_i = a'_i$ for $i \in J' \subset J$, $a_{r'+i} = a''_i$ for $i \in J''$, where $r' = (r'_1,\ldots,r'_d)$. All further entries are defined by zero. There is no arithmetic cost, i.e.,
\[ N_+^{T_r,\mathrm{frame}} = 0, \]
but the memory is greatly increased: $N_{\mathrm{mem}}(v) = \#J = \prod_{j=1}^{d} r_j$. Note that, in general, $N_{\mathrm{mem}}(v) \gg N_{\mathrm{mem}}(v') + N_{\mathrm{mem}}(v'')$.

2. If $B_j$ should be a basis, we apply JoinBases($B'_j$, $B''_j$, $r_j$, $B_j$, $T'^{(j)}$, $T''^{(j)}$) in (2.32), which determines a basis $B_j = [b^{(j)}_1, \ldots, b^{(j)}_{r_j}]$ of $U_j$ together with transfer maps $b'^{(j)}_k = \sum_{i=1}^{r_j} T'^{(j)}_{ik} b^{(j)}_i$ and $b''^{(j)}_k = \sum_{i=1}^{r_j} T''^{(j)}_{ik} b^{(j)}_i$. It is advantageous to retain one part, say $B'_j$, and to complement $B'_j$ by the linearly independent contributions from $B''_j$, which leads to $T'^{(j)}_{ik} = \delta_{ik}$. The dimension $r_j = \dim(U_j)$ may take any value in $\max\{r'_j, r''_j\} \le r_j \le r'_j + r''_j$. It defines the index sets $J_j = \{1,\ldots,r_j\}$ and $J = \times_{j=1}^{d} J_j$. If $r_j = r'_j + r''_j$, the memory is as large as in Item 1. The work required by JoinBases depends on the representation of the vectors in $V_j$ (cf. §7.5). Set^3 $T' := \bigotimes_{j=1}^{d} T'^{(j)}$ and $T'' := \bigotimes_{j=1}^{d} T''^{(j)}$. Lemma 8.10 states that
\[ v' = \rho_{\mathrm{frame}}\bigl(a', (B'_j)_{1\le j\le d}\bigr) = \rho_{\mathrm{frame}}\bigl(T'a', (B_j)_{1\le j\le d}\bigr), \qquad v'' = \rho_{\mathrm{frame}}\bigl(a'', (B''_j)_{1\le j\le d}\bigr) = \rho_{\mathrm{frame}}\bigl(T''a'', (B_j)_{1\le j\le d}\bigr). \]
Then $a := T'a' + T''a''$ is the resulting coefficient tensor in $v = \rho_{\mathrm{frame}}(a, (B_j))$. The cost of JoinBases is $\sum_{j=1}^{d} N_{QR}(n_j, r'_j + r''_j)$ if $V_j = \mathbb{K}^{n_j}$. The update $a := T'a' + T''a''$ of the coefficient tensor leads to $2\#J \sum_{j=1}^{d} r_j$ operations. If $n_j \le n$ and $r_j \le r$, the overall cost is
\[ N_+^{T_r} \le 2dnr^2 + 2dr^{d+1}. \]

3. In the case of orthonormal bases $B'_j$, $B''_j$, $B_j$, we apply JoinONB (cf. (2.33)). The coefficient tensors are treated as in Item 2. The cost is as in Item 2.

^3 $T'$ can be chosen as trivial injection: $T'^{(j)}_{\alpha\beta} = \delta_{\alpha\beta}$.
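The frame variant of Case II (Item 1) only rearranges data. The following sketch is an illustration under assumptions made for this example: the coefficient tensor is stored sparsely as a dictionary from 0-based multi-indices to values (so the zero entries of the enlarged coefficient tensor are not stored), and the names `add_with_frames` and `tucker_entry` are hypothetical.

```python
from itertools import product

def tucker_entry(coeff, frames, index):
    """v[i] = sum_k a_k prod_j (b_{k_j}^(j))[i_j]; frames[j][k] is the vector b_k^(j)."""
    total = 0
    for k, a_k in coeff.items():
        term = a_k
        for j, k_j in enumerate(k):
            term *= frames[j][k_j][index[j]]
        total += term
    return total

def add_with_frames(coeff1, frames1, coeff2, frames2):
    """Join the frames direction-wise and shift the indices of the second coefficient tensor."""
    frames = [b1 + b2 for b1, b2 in zip(frames1, frames2)]
    offsets = [len(b1) for b1 in frames1]          # r'_j
    coeff = dict(coeff1)                           # a_i = a'_i for i in J'
    for k, a_k in coeff2.items():                  # a_{r'+i} = a''_i for i in J''
        coeff[tuple(k_j + off for k_j, off in zip(k, offsets))] = a_k
    return coeff, frames

# Hypothetical data: d = 2, n = (2, 3); r' = (2, 1) and r'' = (1, 2).
frames1 = [[[1, 0], [1, 1]], [[2, 1, 0]]]
coeff1 = {(0, 0): 3, (1, 0): -1}
frames2 = [[[0, 2]], [[1, 1, 1], [0, 1, 2]]]
coeff2 = {(0, 0): 1, (0, 1): 4}

coeff, frames = add_with_frames(coeff1, frames1, coeff2, frames2)
entries_agree = all(
    tucker_entry(coeff, frames, idx)
    == tucker_entry(coeff1, frames1, idx) + tucker_entry(coeff2, frames2, idx)
    for idx in product(range(2), range(3))
)
full_size = 1
for b in frames:
    full_size *= len(b)                            # #J = prod_j (r'_j + r''_j)
```

In this small example a fully stored coefficient tensor would need `full_size` entries, compared with the two plus two entries of the summands, which illustrates the memory increase mentioned in Item 1.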
13.1.4 Hierarchical Representation
Case I (two tensors with identical bases). Assume that both tensors $v', v'' \in H_r$ are represented by the same data $T_D$, $(C_\alpha)_{\alpha \in T_D\setminus L(T_D)}$, $(B_j)_{j \in D}$; only their coefficients $c'^{(D)}$ and $c''^{(D)}$ differ. Then the sum $v := v' + v''$ is characterised by $v = \rho_{HT}\bigl(T_D, (C_\alpha)_{\alpha \in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j \in D}\bigr)$ with the coefficient vector $c^{(D)} := c'^{(D)} + c''^{(D)} \in \mathbb{K}^{r_D}$. The cost is marginal:
\[ N_+^{H_r,\text{Case I}} = r_D \qquad (\text{usually, } r_D = 1). \]
Case II (two tensors with different bases). Here we assume that both terms $v' \in H_{r'}$ and $v'' \in H_{r''}$ use the same dimension partition tree $T_D$:
\[ v' = \rho_{HT}\bigl(T_D, (C'_\alpha)_{\alpha \in T_D\setminus L(T_D)}, c'^{(D)}, (B'_j)_{j \in D}\bigr), \qquad v'' = \rho_{HT}\bigl(T_D, (C''_\alpha)_{\alpha \in T_D\setminus L(T_D)}, c''^{(D)}, (B''_j)_{j \in D}\bigr). \]
First we consider the involved hierarchical subspace families from Definition 11.8a. Let $\{U'_\alpha\}_{\alpha \in T_D}$ and $\{U''_\alpha\}_{\alpha \in T_D}$ be the subspaces associated with $v'$ and $v''$, respectively. The sum $v := v' + v''$ belongs to $\{U_\alpha\}_{\alpha \in T_D}$ with $U_\alpha := U'_\alpha + U''_\alpha$. As in §13.1.3, we must determine bases of the spaces $U'_\alpha + U''_\alpha$. This procedure is described in §11.5. According to Remark 11.70d, the cost is bounded by $8dnr^2 + 12dr^4$, where $r := \max r_j$ and $n := \max n_j$. Having a common basis representation, we can apply Case I from above. Hence the cost is
\[ N_+^{H_r,\text{Case II}} \le 8dnr^2 + 12dr^4. \]
An increase of storage is caused by the fact that in the worst case, the subspaces $U_\alpha = U'_\alpha + U''_\alpha$ have dimension $\dim(U_\alpha) = \dim(U'_\alpha) + \dim(U''_\alpha)$. In particular, $\dim(U_D) \ge 2$ can be reduced to $1$ without loss of accuracy. Possibly, other subspaces $U_\alpha$ can be reduced by the truncation procedure of §11.4.2.
13.2 Entry-wise Evaluation
For $V_j = \mathbb{K}^{n_j}$, the tensor $v \in V = \bigotimes_{j=1}^{d} V_j$ has entries $v_i$ with $i = (i_1,\ldots,i_d) \in I$ and the evaluation $\Lambda_i : v \mapsto v_i \in \mathbb{K}$ is of interest. In connection with variants of the cross approximation (cf. §15) it is necessary to evaluate $v_i$ not only for one index $i$ but for all $k$ in the so-called fibre
\[ F(j,i) := \bigl\{ k \in I = I_1 \times \ldots \times I_d : k_\ell = i_\ell \text{ for } \ell \in \{1,\ldots,d\}\setminus\{j\} \bigr\}. \]
Note that the component $k_j$ of $k \in F(j,i)$ takes all values from $I_j$, while all other components are fixed. The challenge is to perform the simultaneous evaluation cheaper than $\#I_j$ times the cost of a single evaluation.
The entry-wise evaluation may be viewed as scalar product by the unit vector $e^{(i)} \in V$ with $e^{(i)}_j = \delta_{ij}$ ($i, j \in I$) since $v_i = \langle v, e^{(i)} \rangle$. Therefore the evaluation of the scalar product with an elementary tensor is closely related. Full representation of $v$ need not be discussed since then $v_i$ is directly available, i.e., $N^{\mathrm{full}}_{\mathrm{eval}} = 0$.
13.2.1 r-Term Representation
If $v$ is represented in $r$-term format $v = \sum_{\nu=1}^{r} \bigotimes_{j=1}^{d} v_\nu^{(j)} \in R_r$, the entry $v_i$ is equal to $\sum_{\nu=1}^{r} \prod_{j=1}^{d} (v_\nu^{(j)})_{i_j}$. Its computation requires
\[ N^{r\text{-term}}_{\mathrm{eval}} = rd - 1 \]
arithmetic operations. The cost of the evaluation for all indices $k \in F(j,i)$ is
\[ N^{r\text{-term}}_{\mathrm{eval}}(F(j,i)) = r\,(d + 2\#I_j - 2) - \#I_j. \]
Here the products $\prod_{\ell \in \{1,\ldots,d\}\setminus\{j\}} (v_\nu^{(\ell)})_{i_\ell}$ ($1 \le \nu \le r$) are computed first.
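Both operation counts can be reproduced by an instrumented evaluation. The sketch below counts multiplications and additions explicitly; the storage convention (a list of terms, each a list of $d$ vectors), the function names and the sample data are assumptions made for this illustration, and $d \ge 2$ is assumed.

```python
def eval_entry(terms, index):
    """Return (v_i, number of arithmetic operations) for a single index i."""
    total, ops = 0, 0
    for nu, term in enumerate(terms):
        prod = term[0][index[0]]
        for j in range(1, len(index)):
            prod *= term[j][index[j]]
            ops += 1                      # d - 1 multiplications per term
        if nu == 0:
            total = prod
        else:
            total += prod
            ops += 1                      # r - 1 additions in total
    return total, ops

def eval_fibre(terms, j, index):
    """Return (values of v on the fibre F(j, i), number of arithmetic operations)."""
    others = [l for l in range(len(index)) if l != j]
    ops = 0
    partial = []                          # products over all directions except j
    for term in terms:
        prod = term[others[0]][index[others[0]]]
        for l in others[1:]:
            prod *= term[l][index[l]]
            ops += 1
        partial.append(prod)
    values = []
    for k in range(len(terms[0][j])):
        total = partial[0] * terms[0][j][k]
        ops += 1
        for p, term in zip(partial[1:], terms[1:]):
            total += p * term[j][k]
            ops += 2
        values.append(total)
    return values, ops

# Hypothetical data: d = 3, r = 2, n = (2, 3, 2).
terms = [[[1, 2], [0, 1, 3], [2, 1]],
         [[3, -1], [1, 1, 0], [0, 4]]]
r, d, n_j = 2, 3, 3

value, ops_single = eval_entry(terms, (1, 2, 0))
fibre_values, ops_fibre = eval_fibre(terms, 1, (1, 0, 0))
```

Evaluating the fibre entry by entry would cost $\#I_j\,(rd-1)$ operations, so the saving comes from reusing the $r$ partial products.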
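13.2.2 Tensor Subspace Representation

The tensor subspace representation $\mathbf{v} = \sum_{\mathbf{k}\in\mathbf{J}} \mathbf{a}_{\mathbf{k}} \bigotimes_{j=1}^d b^{(j)}_{k_j}$ yields
\[ \mathbf{v}[i_1,\ldots,i_d] = \mathbf{v}_{\mathbf{i}} = \sum_{\mathbf{k}\in\mathbf{J}} \mathbf{a}_{\mathbf{k}} \prod_{j=1}^d (b^{(j)}_{k_j})_{i_j} \]
with $\mathbf{J} = J_1\times\ldots\times J_d$, where $r_j = \#J_j$. The evaluation starts with summation over $k_1$ yielding a reduced coefficient tensor $\mathbf{a}[k_2,\ldots,k_d]$ etc. Then the arithmetic operations amount to
\[ N^{\mathcal{T}_{\mathbf{r}}}_{\mathrm{eval}} = \sum_{\ell=1}^d (2r_\ell - 1) \prod_{j=\ell+1}^d r_j \approx 2\#\mathbf{J}. \]
Summation in the order $k_1, k_2, \ldots, k_d$ is optimal if $r_1 \ge r_2 \ge \ldots \ge r_d$. Otherwise the order of summation should be changed. If $r_j \le r$, the cost is about $N^{\mathcal{T}_{\mathbf{r}}}_{\mathrm{eval}} \approx 2r^d$.

For the simultaneous evaluation, the summations over all $k_\ell$ are to be performed in such an order that $\ell = j$ is the last one. For $j = d$, the cost is
\[ N^{\mathcal{T}_{\mathbf{r}}}_{\mathrm{eval}}(F(j,\mathbf{i})) = \sum_{\ell=1}^{d-1} (2r_\ell - 1) \prod_{j=\ell+1}^d r_j + \#I_d\,(2r_d - 1). \]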
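The successive summation over $k_1, k_2, \ldots$ can be sketched as follows. The layout (a dense coefficient array `core` and one matrix of shape `(n_j, r_j)` per direction) and the function names are assumptions of this sketch.

```python
import numpy as np

def tucker_entry(core, bases, idx):
    """Entry v[idx]; bases[j][:, k] is the basis vector b_k^(j)."""
    a = core
    for B, i in zip(bases, idx):
        # sum over the current leading index k_j, leaving a reduced coefficient tensor
        a = np.tensordot(B[i, :], a, axes=(0, 0))
    return float(a)

def tucker_fibre(core, bases, j, idx):
    """Entries along direction j; the summation over k_j is performed last."""
    a = np.moveaxis(core, j, -1)
    for l, (B, i) in enumerate(zip(bases, idx)):
        if l != j:
            a = np.tensordot(B[i, :], a, axes=(0, 0))
    return bases[j] @ a                # shape (n_j,)

# small example (hypothetical sizes)
rng = np.random.default_rng(0)
dims, ranks = (4, 5, 3), (2, 3, 2)
core = rng.standard_normal(ranks)
bases = [rng.standard_normal((n, r)) for n, r in zip(dims, ranks)]
full = np.einsum('pqr,ap,bq,cr->abc', core, *bases)   # dense reference tensor
```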
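13.2.3 Hierarchical Representation

For $\alpha \subset D = \{1,\ldots,d\}$ the index $\mathbf{i}_\alpha$ belongs to $\mathbf{I}_\alpha = \times_{j\in\alpha} I_j$. The evaluation of the $\mathbf{i}_\alpha$ entry
\[ \beta_\ell^{(\alpha)} := \mathbf{b}_\ell^{(\alpha)}[\mathbf{i}_\alpha] = \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{ij}^{(\alpha,\ell)}\, \mathbf{b}_i^{(\alpha_1)}[\mathbf{i}_{\alpha_1}]\, \mathbf{b}_j^{(\alpha_2)}[\mathbf{i}_{\alpha_2}] \tag{13.1} \]
of the basis vector $\mathbf{b}_\ell^{(\alpha)}$ is performed recursively from the leaves to the root:

    procedure eval*($\alpha$, $\mathbf{i}$);
    for $\ell$ := 1 to $r_\alpha$ do
      if $\alpha = \{j\}$ then $\beta_\ell^{(\alpha)} := b_\ell^{(j)}[i_j]$    {leaf}
      else begin eval*($\alpha_1$, $\mathbf{i}$); eval*($\alpha_2$, $\mathbf{i}$);    {$\alpha_1, \alpha_2$ sons of $\alpha$}
        $\beta_\ell^{(\alpha)} := \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{ij}^{(\alpha,\ell)} \beta_i^{(\alpha_1)} \beta_j^{(\alpha_2)}$    {non-leaf vertex, cf. (13.1)}
      end;

(cf. (11.23)). The definition of $\beta_\ell^{(\alpha)}$ is the result of eval*. The evaluation of $\mathbf{v}[\mathbf{i}]$ is implemented by

    function eval($\mathbf{v}$, $\mathbf{i}$);
    begin eval*($D$, $\mathbf{i}$); s := 0;
      for $\ell$ := 1 to $r_d$ do s := s + $c_\ell^{(D)} \cdot \beta_\ell^{(D)}$;
      eval := s
    end;

The asymptotic computational cost is
\[ N^{\mathcal{H}_{\mathfrak{r}}}_{\mathrm{eval}} = 2 \sum_{\alpha\in T_D\setminus L(T_D)} r_\alpha r_{\alpha_1} r_{\alpha_2} \qquad (\alpha_1, \alpha_2 \text{ sons of } \alpha). \]
For $r_\alpha \le r$, the cost is bounded by $N^{\mathcal{H}_{\mathfrak{r}}}_{\mathrm{eval}} \le 2dr^3$. If a better bound $r_j \le r_{\mathrm{leaf}}$ holds for the leaves (cf. §14), $N^{\mathcal{H}_{\mathfrak{r}}}_{\mathrm{eval}} \le 2d\, r_{\mathrm{leaf}}\, r^2$.

The cost of the simultaneous evaluation at $F(j,\mathbf{i})$ amounts to
\[ N^{\mathcal{H}_{\mathfrak{r}}}_{\mathrm{eval}}(F(j,\mathbf{i})) = N^{\mathcal{H}_{\mathfrak{r}}}_{\mathrm{eval}} + 2\#I_j \sum_{\substack{\alpha\in T_D\setminus L(T_D) \\ \text{with } j\in\alpha_1\in S(\alpha)}} r_\alpha r_{\alpha_1}. \]
The latter summation involves all non-leaf vertices $\alpha$ with a son $\alpha_1$ containing $j$. The total cost is bounded by $2r^2\,[\,dr + (\mathrm{depth}(T_D) - 1)\,\#I_j\,]$ (cf. (11.6)). Note that the tree $T_D$ can be constructed such that $\mathrm{depth}(T_D) \approx \log_2 d$.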
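A recursive evaluation in the spirit of eval* can be sketched in NumPy. The dictionary-based tree layout, the key names and the small example tree $\{1,2,3\} \to \{1,2\},\{3\}$ are assumptions of this sketch. Each son is visited once per vertex, so that the work per non-leaf vertex is the contraction in (13.1).

```python
import numpy as np

def eval_star(node, idx):
    """Return the vector (b_l^(alpha)[i_alpha])_l for the vertex `node`."""
    if node["leaf"]:
        return node["B"][idx[node["j"]], :]          # b_l^(j)[i_j] for all l
    beta1 = eval_star(node["sons"][0], idx)
    beta2 = eval_star(node["sons"][1], idx)
    # beta_l = sum_{i,j} c_ij^(alpha,l) * beta1_i * beta2_j, cf. (13.1)
    return np.einsum('lij,i,j->l', node["C"], beta1, beta2)

def ht_eval(root, c_root, idx):
    """v[idx] = sum_l c_l^(D) * beta_l^(D)."""
    return float(c_root @ eval_star(root, idx))

# small example with d = 3 (hypothetical sizes)
rng = np.random.default_rng(1)
n, r = (4, 5, 3), 2
leaves = [{"leaf": True, "j": j, "B": rng.standard_normal((n[j], r))} for j in range(3)]
v12 = {"leaf": False, "sons": (leaves[0], leaves[1]), "C": rng.standard_normal((r, r, r))}
root = {"leaf": False, "sons": (v12, leaves[2]), "C": rng.standard_normal((r, r, r))}
c_root = rng.standard_normal(r)
full = np.einsum('l,lpk,pij,ai,bj,ck->abc', c_root, root["C"], v12["C"],
                 leaves[0]["B"], leaves[1]["B"], leaves[2]["B"])
```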
13.2.4 Matrix Product Representation

The TT format is introduced in (12.1a) by a representation of an entry $\mathbf{v}_{\mathbf{i}}$. Since the data $v^{(j)}_{k_{j-1} i_j k_j}$ are already separated with respect to $i_j$, only the matrices $v^{(j)}_{k_{j-1}k_j}[i_j]$ in (12.2b) enter into the computation. Correspondingly, the evaluation of the right-hand side requires fewer operations than in §13.2.3:
\[ N^{\mathrm{TT}}_{\mathrm{eval}} = 2 \sum_{\ell=1}^{d-2} \rho_\ell \rho_{\ell+1}. \tag{13.2} \]
For $\rho_\ell \le \rho$, this is $N^{\mathrm{TT}}_{\mathrm{eval}} \approx 2\,(d-2)\,\rho^2$.

For the simultaneous evaluation in the case of $j \in \{2,\ldots,d-1\}$, perform the product of the matrices $A^{(\ell,i_\ell)}$ in (12.2b) such that $A_{\mathrm{I}} \cdot A^{(j,k_j)} \cdot A_{\mathrm{II}}$ holds with vectors $A_{\mathrm{I}} \in \mathbb{K}^{\rho_{j-1}}$ and $A_{\mathrm{II}} \in \mathbb{K}^{\rho_j}$. Its evaluation for all $k_j \in J_j$ yields
\[ N^{\mathrm{TT}}_{\mathrm{eval}}(F(j,\mathbf{i})) = \rho_{j-1}\rho_j\,(2\rho_j + 1) + \sum_{\ell=1}^{j-2} (2\rho_\ell - 1)\,\rho_{\ell+1} + \sum_{\ell=j}^{d-2} (2\rho_{\ell+1} - 1)\,\rho_\ell \approx 2\Big(\rho_{j-1}\rho_j^2 + \sum_{\ell=1}^{d-2} \rho_\ell \rho_{\ell+1}\Big). \]
The cases $j = 1$ and $j = d$ are left to the reader.
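The matrix-product evaluation can be sketched as follows, assuming that the TT data are stored as arrays `cores[j]` of shape `(rho_{j-1}, n_j, rho_j)` with `rho_0 = rho_d = 1`. This layout and the function names are assumptions of the sketch.

```python
import numpy as np

def tt_entry(cores, idx):
    """v[idx] as a product of the matrices cores[j][:, i_j, :]."""
    vec = np.ones(1)
    for G, i in zip(cores, idx):
        vec = vec @ G[:, i, :]         # row vector times matrix
    return float(vec[0])

def tt_fibre(cores, j, idx):
    """Entries along direction j: A_I * A^(j,k_j) * A_II for all k_j."""
    left = np.ones(1)
    for G, i in zip(cores[:j], idx[:j]):
        left = left @ G[:, i, :]
    right = np.ones(1)
    for G, i in zip(reversed(cores[j + 1:]), reversed(idx[j + 1:])):
        right = G[:, i, :] @ right
    return np.einsum('a,anb,b->n', left, cores[j], right)

# small example (hypothetical sizes)
rng = np.random.default_rng(2)
shapes = [(1, 4, 2), (2, 5, 3), (3, 3, 1)]
cores = [rng.standard_normal(s) for s in shapes]
full = np.einsum('xay,ybz,zcw->abc', *cores)   # dense reference tensor
```

The vectors `left` and `right` play the role of $A_{\mathrm{I}}$ and $A_{\mathrm{II}}$ and are computed only once per fibre.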
13.3 Scalar Product

Given pre-Hilbert spaces $V_j$ with scalar product $\langle\cdot,\cdot\rangle_j$, the induced scalar product in $\mathbf{V} = {}_a\bigotimes_{j=1}^d V_j$ is defined in §4.5.1. The corresponding norms of $V_j$ and $\mathbf{V}$ are denoted by $\|\cdot\|_j$ and $\|\cdot\|$. We suppose that computing $\langle u, v\rangle_j$ is feasible, at least for vectors $u, v \in U_j \subset V_j$ belonging to the relevant subspace $U_j$, and that its computational cost is
\[ N_j \tag{13.3} \]
(cf. Remark 7.16). In the case of function spaces, $\langle u, v\rangle_j$ for $u, v \in U_j$ may be given analytically or approximated by a quadrature formula,⁴ provided that $U_j$ is a subspace of sufficiently smooth functions.

The scalar product $\langle \mathbf{u}, \mathbf{v}\rangle$ is considered in two situations. In the general case both $\mathbf{u}$ and $\mathbf{v}$ are tensors represented in one of the formats. A particular but also important case is the scalar product of $\mathbf{u}$, represented in one of the formats, and an elementary tensor
\[ \mathbf{v} = \bigotimes_{j=1}^d v^{(j)} \tag{13.4} \]
represented by the vectors $v^{(j)} \in V_j$ (1-term format). A related problem is computing the partial scalar product defined in §4.5.5. It is important since the left or right singular vectors of the singular-value decomposition can be obtained via the partial scalar product (see Lemma 5.14).

⁴ Whether approximations of the scalar product are meaningful or not depends on the application. For instance, a quadrature formula with $n$ quadrature points cannot be used to determine an (approximately) orthogonal system of more than $n$ vectors.
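13.3.1 Full Representation

For $\mathbf{V} = \mathbb{K}^{\mathbf{I}}$ with $\mathbf{I} = \times_{j=1}^d I_j$, the Euclidean scalar product $\langle \mathbf{u}, \mathbf{v}\rangle$ has to be computed by $\sum_{\mathbf{i}\in\mathbf{I}} u_{\mathbf{i}} v_{\mathbf{i}}$ so that $N^{\mathrm{full}}_{\langle\cdot,\cdot\rangle} = 2\#\mathbf{I}$. The computation may be much cheaper for sparse tensors. The full representation by a function is useful only in connection with a quadrature formula. The case of (13.4) is even a bit more expensive.

The partial scalar product in $\mathbb{K}^{\mathbf{I}}$ depends on the decomposition of $\{1,\ldots,d\}$ into disjoint and nonempty sets $\alpha$ and $\alpha^c := \{1,\ldots,d\}\setminus\alpha$. This induces a partition of $\mathbf{I}$ into $\mathbf{I} = \mathbf{I}_\alpha \times \mathbf{I}_{\alpha^c}$ with $\mathbf{I}_\alpha = \times_{j\in\alpha} I_j$ and $\mathbf{I}_{\alpha^c} = \times_{j\in\alpha^c} I_j$. Then $\mathbf{w} := \langle \mathbf{u}, \mathbf{v}\rangle_{\alpha^c} \in \mathbb{K}^{\mathbf{I}_\alpha} \otimes \mathbb{K}^{\mathbf{I}_\alpha}$ is defined by the entries⁵
\[ \mathbf{w}_{\mathbf{i}',\mathbf{k}'} = \sum_{\mathbf{i}''\in\mathbf{I}_{\alpha^c}} u_{\mathbf{i}',\mathbf{i}''}\, v_{\mathbf{k}',\mathbf{i}''} \qquad \text{for } \mathbf{i}', \mathbf{k}' \in \mathbf{I}_\alpha. \]
The computational cost per entry is $2\#\mathbf{I}_{\alpha^c}$. Since $\mathbf{w}$ has $(\#\mathbf{I}_\alpha)^2$ entries, the overall cost is $N^{\mathrm{full}}_{\langle\cdot,\cdot\rangle,\mathbf{I}_\alpha} = 2\,(\#\mathbf{I}_\alpha)^2\, \#\mathbf{I}_{\alpha^c}$.

⁵ The notation $u_{\mathbf{i}',\mathbf{i}''}$ assumes that $\alpha^c = \{j^*,\ldots,d\}$ for some $1 \le j^* \le d$. Otherwise, $u_{\pi(\mathbf{i}',\mathbf{i}'')}$ with a suitable permutation $\pi$ is needed. However, this does not affect the computational cost.

13.3.2 r-Term Representation

For elementary tensors, the definition of $\langle\cdot,\cdot\rangle$ yields $\langle \mathbf{u}, \mathbf{v}\rangle = \prod_{j=1}^d \langle u^{(j)}, v^{(j)}\rangle_j$. Combining all terms from $\mathbf{u} \in \mathcal{R}_{r_u}$ and $\mathbf{v} \in \mathcal{R}_{r_v}$, we obtain the following result.

Remark 13.1. The scalar product of $\mathbf{u} \in \mathcal{R}_{r_u}$ and $\mathbf{v} \in \mathcal{R}_{r_v}$ costs
\[ N^{r\text{-term}}_{\langle\cdot,\cdot\rangle} = r_u r_v \Big( d + \sum_{j=1}^d N_j \Big) \]
operations with $N_j$ in (13.3). The case of (13.4) is included by choosing $r_v = 1$.

For partial scalar products we use $\alpha, \alpha^c \subset \{1,\ldots,d\}$ and $\mathbf{w} := \langle \mathbf{u}, \mathbf{v}\rangle_{\mathbf{I}_{\alpha^c}}$ as in §13.3.1. First, let $\mathbf{u} = \bigotimes_{j=1}^d u^{(j)}$ and $\mathbf{v} = \bigotimes_{j=1}^d v^{(j)}$ be elementary tensors. Then
\[ \langle \mathbf{u}, \mathbf{v}\rangle_{\alpha^c} = \Big( \prod_{j\in\alpha^c} \langle u^{(j)}, v^{(j)}\rangle_j \Big) \Big( \bigotimes_{j\in\alpha} u^{(j)} \Big) \otimes \Big( \bigotimes_{j\in\alpha} v^{(j)} \Big) \in \mathbf{V}_\alpha \otimes \mathbf{V}_\alpha \]
with $\mathbf{V}_\alpha = \bigotimes_{j\in\alpha} V_j$ is again an elementary tensor. The same considerations as above lead to the next remark. Since $\langle \mathbf{v}, \mathbf{v}\rangle_{\alpha^c}$ (i.e., $\mathbf{u} = \mathbf{v}$) appears for the left-sided singular-value decomposition of $M_\alpha(\mathbf{v})$, this case is of special interest.

Remark 13.2. (a) The partial scalar product $\mathbf{w} := \langle \mathbf{u}, \mathbf{v}\rangle_{\alpha^c}$ of $\mathbf{u} \in \mathcal{R}_{r_u}$ and $\mathbf{v} \in \mathcal{R}_{r_v}$ costs $N_{\langle\cdot,\cdot\rangle} = r_u r_v \big(\#\alpha^c + \sum_{j\in\alpha^c} N_j\big)$ operations. The tensor $\mathbf{w} \in \mathbf{V}_\alpha \otimes \mathbf{V}_\alpha$ is given in the format $\mathcal{R}_r$ with $r := r_u r_v$.
(b) Because of the symmetry, $N^{\mathcal{R}_r}_{\langle\cdot,\cdot\rangle,\alpha^c}$ reduces to $\frac{r_v (r_v+1)}{2} \big(\#\alpha^c + \sum_{j\in\alpha^c} N_j\big)$ for computing $\langle \mathbf{v}, \mathbf{v}\rangle_{\alpha^c}$.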
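Remark 13.1 can be illustrated by a short NumPy sketch for real data and the Euclidean scalar product in each $V_j = \mathbb{R}^{n_j}$. The storage layout (arrays of shape `(r, n_j)`) and the function name are assumptions of this sketch.

```python
import numpy as np

def rterm_inner(U, V):
    """<u, v> for r-term tensors; U[j] has shape (r_u, n_j), V[j] has shape (r_v, n_j)."""
    G = np.ones((U[0].shape[0], V[0].shape[0]))
    for Uj, Vj in zip(U, V):
        G *= Uj @ Vj.T     # r_u * r_v scalar products in V_j, multiplied entrywise
    return G.sum()         # sum over all pairs of terms

# small example (hypothetical sizes)
rng = np.random.default_rng(3)
dims = (4, 5, 3)
U = [rng.standard_normal((2, n)) for n in dims]
V = [rng.standard_normal((3, n)) for n in dims]
dense_u = np.einsum('ra,rb,rc->abc', *U)
dense_v = np.einsum('ra,rb,rc->abc', *V)
```

Only the small Gram matrices of size $r_u \times r_v$ are formed; the dense tensors above serve as a reference for checking.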
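13.3.3 Tensor Subspace Representation

Case I (two tensors from $\mathcal{T}_{\mathbf{r}}$ with same bases). The easiest case is given by $\mathbf{u} = \sum_{\mathbf{i}\in\mathbf{J}} \mathbf{a}^u_{\mathbf{i}} \bigotimes_{j=1}^d b^{(j)}_{i_j} \in \mathcal{T}_{\mathbf{r}}$ and $\mathbf{v} = \sum_{\mathbf{i}\in\mathbf{J}} \mathbf{a}^v_{\mathbf{i}} \bigotimes_{j=1}^d b^{(j)}_{i_j} \in \mathcal{T}_{\mathbf{r}}$ belonging to the same subspace $\mathbf{U} := \bigotimes_{j=1}^d U_j$ with orthonormal bases $B_j = [b^{(j)}_1, \ldots, b^{(j)}_{r_j}]$ of $U_j$. Then
\[ \langle \mathbf{u}, \mathbf{v}\rangle = \langle \mathbf{a}^u, \mathbf{a}^v\rangle_{\mathbf{J}} \]
holds, where the latter is the Euclidean scalar product of the coefficient tensors in $\mathbb{K}^{\mathbf{J}}$. The cost is $N_{\langle\cdot,\cdot\rangle} = 2\#\mathbf{J} = 2\prod_{j=1}^d r_j$ (cf. §13.3.1).

Case II (tensor from $\mathcal{T}_{\mathbf{r}}$ and elementary tensor). If $\mathbf{u} = \sum_{\mathbf{i}\in\mathbf{J}} \mathbf{a}_{\mathbf{i}} \bigotimes_{j=1}^d b^{(j)}_{i_j} \in \mathcal{T}_{\mathbf{r}}$, while $\mathbf{v}$ is an elementary tensor, $\langle \mathbf{u}, \mathbf{v}\rangle = \sum_{\mathbf{i}\in\mathbf{J}} \mathbf{a}_{\mathbf{i}} \prod_{j=1}^d \langle b^{(j)}_{i_j}, v^{(j)}\rangle_j$ requires
\[ \sum_{j=1}^d (2r_j - 1) \prod_{\ell=j+1}^d r_\ell + \sum_{j=1}^d r_j N_j \]
operations. The second sum corresponds to the scalar products $\beta^{(j)}_{i_j} := \langle b^{(j)}_{i_j}, v^{(j)}\rangle_j$, where $N_j$ is the cost of a scalar product in $V_j$ (cf. (13.3)). Performing first the summation of $\sum_{\mathbf{i}\in\mathbf{J}} \mathbf{a}_{\mathbf{i}} \prod_{j=1}^d \beta^{(j)}_{i_j}$ over $1 \le i_1 \le r_1$ for all combinations of $i_2, i_3, \ldots, i_d$, we obtain $\sum_{i_1=1}^{r_1} \mathbf{a}_{\mathbf{i}} \beta^{(1)}_{i_1}$ with the cost $(2r_1 - 1)\, r_2 \cdots r_d$. Proceeding with summation over $i_2, \ldots, i_d$, we obtain the cost given above.

Case III (tensors from $\mathcal{T}_{\mathbf{r}'}$ and $\mathcal{T}_{\mathbf{r}''}$). If $\mathbf{u}' = \sum_{\mathbf{i}\in\mathbf{J}'} \mathbf{a}'_{\mathbf{i}} \bigotimes_{j=1}^d b^{\prime(j)}_{i_j} \in \mathcal{T}_{\mathbf{r}'}$ and $\mathbf{u}'' = \sum_{\mathbf{i}\in\mathbf{J}''} \mathbf{a}''_{\mathbf{i}} \bigotimes_{j=1}^d b^{\prime\prime(j)}_{i_j} \in \mathcal{T}_{\mathbf{r}''}$ use different bases, the computation of
\[ \langle \mathbf{u}', \mathbf{u}''\rangle = \sum_{\mathbf{i}\in\mathbf{J}'} \sum_{\mathbf{k}\in\mathbf{J}''} \mathbf{a}'_{\mathbf{i}}\, \mathbf{a}''_{\mathbf{k}} \prod_{j=1}^d \big\langle b^{\prime(j)}_{i_j}, b^{\prime\prime(j)}_{k_j} \big\rangle_j \tag{13.5} \]
requires
\[ N^{\mathcal{T}_{\mathbf{r}}}_{\langle\cdot,\cdot\rangle} = \sum_{j=1}^d (2r''_j + 1)\, r'_j \prod_{\ell=j+1}^d r'_\ell r''_\ell + \sum_{j=1}^d r'_j r''_j N_j \]
operations.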
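A possible realisation of (13.5) for real data is sketched below: the Gram matrices of the two bases are formed per direction and applied to one coefficient tensor before the Euclidean product with the other one is taken. The order of operations, the layout and the function name are assumptions of this sketch, and its operation count need not coincide with the formula above.

```python
import numpy as np

def tucker_inner(a1, B1, a2, B2):
    """<u', u''> for two Tucker tensors with different bases (real data)."""
    t = a2
    for j, (P, Q) in enumerate(zip(B1, B2)):
        S = P.T @ Q                                   # S[i, k] = <b'_i, b''_k>_j
        t = np.moveaxis(np.tensordot(S, t, axes=(1, j)), 0, j)
    return float(np.sum(a1 * t))                      # Euclidean product of coefficient tensors

# small example (hypothetical sizes)
rng = np.random.default_rng(4)
dims, r1, r2 = (4, 5, 3), (2, 3, 2), (3, 2, 2)
a1, a2 = rng.standard_normal(r1), rng.standard_normal(r2)
B1 = [rng.standard_normal((n, r)) for n, r in zip(dims, r1)]
B2 = [rng.standard_normal((n, r)) for n, r in zip(dims, r2)]
dense = lambda a, B: np.einsum('pqr,ap,bq,cr->abc', a, *B)
```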
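Remark 13.3. Assume $n_j \le n$ and $r_j, r'_\ell, r''_\ell \le r$. Then the asymptotic costs of Cases I–III can be estimated by
\[ \text{I: } r^d, \qquad \text{II: } 2\big(r^d + dnr\big), \qquad \text{III: } 2\big(r^{2d} + dnr^2\big). \]

An alternative way for Case III is to transform $\mathbf{u}'$ and $\mathbf{u}''$ into a representation with a common orthonormal basis $B_j$ as explained in §8.6.2. The expense is $8dnr^2 + 2dr^{d+1}$ (cf. Remark 8.50). Having a common basis of dimension $\le 2r$, we can apply Case I. Hence the total cost is
\[ \text{III': } N^{\mathcal{T}_{\mathbf{r}}}_{\langle\cdot,\cdot\rangle} = 8dnr^2 + 2dr^{d+1} + (2r)^d . \]
This leads to the following remark.

Remark 13.4. For Case III with $n < \frac{r^{2d-2}}{3d} - \frac{r^{d-1}}{3} - \frac{2^d r^{d-2}}{6d}$, it is advantageous first to transform $\mathbf{u}'$ and $\mathbf{u}''$ into a representation with common orthonormal bases $B_j$. The cost is $O\big(dnr^2 + r^d \min\{r^d, 2^d + dr\}\big)$.

In the case of a partial scalar product $\mathbf{w} := \langle \mathbf{u}, \mathbf{v}\rangle_{\alpha^c} \in \mathbb{K}^{\mathbf{I}_\alpha} \otimes \mathbb{K}^{\mathbf{I}_\alpha}$, we have similar cases as before.

Case I (two tensors from $\mathcal{T}_{\mathbf{r}}$ with same bases). Let $\mathbf{u} = \sum_{\mathbf{i}\in\mathbf{J}} \mathbf{a}^u_{\mathbf{i}} \bigotimes_{j=1}^d b^{(j)}_{i_j}$ and $\mathbf{v} = \sum_{\mathbf{i}\in\mathbf{J}} \mathbf{a}^v_{\mathbf{i}} \bigotimes_{j=1}^d b^{(j)}_{i_j}$. Again, under the assumption of orthonormal bases, the partial scalar product can be applied to the coefficient tensors:
\[ \mathbf{w} = \sum_{(\mathbf{i},\mathbf{k})\in\mathbf{J}_\alpha\times\mathbf{J}_\alpha} \mathbf{c}_{\mathbf{i},\mathbf{k}} \bigotimes_{j\in\alpha} b^{(j)}_{i_j} \otimes \bigotimes_{j\in\alpha} b^{(j)}_{k_j} \qquad \text{with } \mathbf{c} := \langle \mathbf{a}^u, \mathbf{a}^v\rangle_{\alpha^c}. \]
Therefore the cost is given by $N^{\mathcal{T}_{\mathbf{r}}}_{\langle\cdot,\cdot\rangle,\alpha^c} = 2\,(\#\mathbf{J}_\alpha)^2\, \#\mathbf{J}_{\alpha^c}$ (cf. §13.3.1). Note that the resulting tensor $\mathbf{w} \in \mathbb{K}^{\mathbf{I}_\alpha} \otimes \mathbb{K}^{\mathbf{I}_\alpha}$ is again represented in tensor subspace format. A very important case is $\mathbf{u} = \mathbf{v}$ and $\alpha = \{j\}$. Then
\[ N^{\mathcal{T}_{\mathbf{r}}}_{\langle\cdot,\cdot\rangle,\alpha^c} = (r_j + 1) \prod_{k=1}^d r_k \qquad \text{for } \mathbf{u} = \mathbf{v} \text{ and } \alpha = \{j\} \text{ with } r_j = \#J_j. \tag{13.6} \]

Case II (tensors from $\mathcal{T}_{\mathbf{r}'}$ and $\mathcal{T}_{\mathbf{r}''}$). Now we assume that $\mathbf{v}' = \sum_{\mathbf{i}\in\mathbf{J}'} \mathbf{a}'_{\mathbf{i}} \bigotimes_{j=1}^d b^{\prime(j)}_{i_j}$ and $\mathbf{v}'' = \sum_{\mathbf{i}\in\mathbf{J}''} \mathbf{a}''_{\mathbf{i}} \bigotimes_{j=1}^d b^{\prime\prime(j)}_{i_j}$ use not only different bases but also different subspaces of possibly different dimensions. Basing the computation of the partial scalar product $\mathbf{w} := \langle \mathbf{v}', \mathbf{v}''\rangle_{\mathbf{I}_{\alpha^c}}$ on the identity⁶
\[ \mathbf{w} = \sum_{\mathbf{i}'\in\mathbf{J}'_\alpha} \sum_{\mathbf{k}'\in\mathbf{J}''_\alpha} \underbrace{\Big( \sum_{\mathbf{i}''\in\mathbf{J}'_{\alpha^c}} \sum_{\mathbf{k}''\in\mathbf{J}''_{\alpha^c}} \mathbf{a}'_{\mathbf{i}',\mathbf{i}''}\, \mathbf{a}''_{\mathbf{k}',\mathbf{k}''} \prod_{j\in\alpha^c} \big\langle b^{\prime(j)}_{i''_j}, b^{\prime\prime(j)}_{k''_j} \big\rangle_j \Big)}_{=:\, \mathbf{b}_{\mathbf{i}',\mathbf{k}'}} \bigotimes_{j\in\alpha} b^{\prime(j)}_{i'_j} \otimes \bigotimes_{j\in\alpha} b^{\prime\prime(j)}_{k'_j}, \]
we need
\[ N^{\mathcal{T}_{\mathbf{r}}}_{\langle\cdot,\cdot\rangle,\alpha^c} = 2 \prod_{j=1}^d r'_j r''_j + \sum_{j\in\alpha^c} r'_j r''_j N_j + \text{lower order} \]
operations for the evaluation of the coefficient tensor $\mathbf{b} \in \mathbb{K}^{\mathbf{J}'_\alpha} \otimes \mathbb{K}^{\mathbf{J}''_\alpha}$. Here we assume that all $r'_j = \#J'_j$ and $r''_j = \#J''_j$ are of comparable size.

The alternative approach is to determine common orthonormal bases for $j \in \alpha^c$ requiring $\#\alpha^c \big(8nr^2 + 2r^{d+1}\big)$ operations (assuming common bounds $r$ and $n$ for all directions). Then Case I can be applied. The estimate of the total cost by
\[ N^{\mathcal{T}_{\mathbf{r}}}_{\langle\cdot,\cdot\rangle,\alpha^c} \le \#\alpha^c \big(8nr^2 + 2r^{d+1}\big) + 2r^{d+\#\alpha} \]
shows that the second approach is cheaper under the assumptions $N_j = 2n_j - 1$ and $n \le r^{2d-2} / (3\#\alpha^c)$ up to lower order terms.

⁶ Note that the index $\mathbf{i} \in \mathbf{J}' = \times_{j=1}^d J'_j$ of $\mathbf{a}'_{\mathbf{i}} = \mathbf{a}'_{\mathbf{i}',\mathbf{i}''}$ is split into the pair $(\mathbf{i}', \mathbf{i}'')$, where $\mathbf{i}' \in \mathbf{J}'_\alpha = \times_{j\in\alpha} J'_j$ and $\mathbf{i}'' \in \mathbf{J}'_{\alpha^c} = \times_{j\in\alpha^c} J'_j$. Similarly for $\mathbf{k} \in \mathbf{J}''$.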
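The special case $\mathbf{u} = \mathbf{v}$, $\alpha = \{j\}$ with orthonormal bases can be checked numerically. The sketch below (real data; function names and layout are assumptions) computes $\mathbf{c} = \langle \mathbf{a}, \mathbf{a}\rangle_{\alpha^c}$ from the matricisation of the coefficient tensor and compares it with the Gram matrix of the matricisation of the full tensor.

```python
import numpy as np

def unfold(t, j):
    """Matricisation with respect to direction j: rows indexed by direction j."""
    return np.moveaxis(t, j, 0).reshape(t.shape[j], -1)

def core_gram(a, j):
    """c = <a, a>_{alpha^c} for alpha = {j}: an r_j x r_j matrix."""
    M = unfold(a, j)
    return M @ M.T

# small example with orthonormal bases obtained by QR (hypothetical sizes)
rng = np.random.default_rng(5)
dims, ranks = (6, 5, 4), (2, 3, 2)
a = rng.standard_normal(ranks)
bases = [np.linalg.qr(rng.standard_normal((n, r)))[0] for n, r in zip(dims, ranks)]
full = np.einsum('pqr,ap,bq,cr->abc', a, *bases)
j = 1
W_full = unfold(full, j) @ unfold(full, j).T        # partial scalar product, full format
W_tucker = bases[j] @ core_gram(a, j) @ bases[j].T  # same tensor in tensor subspace format
```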
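13.3.4 Hybrid Format

The hybrid format $\mathbf{u} = \rho_{\mathrm{hybr}}(\cdot)$ in §8.2.6 implies that $\mathbf{u} = \sum_{\mathbf{i}\in\mathbf{J}} \mathbf{a}^u_{\mathbf{i}} \bigotimes_{j=1}^d b^{(j)}_{i_j} \in \mathcal{T}_{\mathbf{r}}$, where $\mathbf{a}^u \in \mathcal{R}_r(\mathbb{K}^{\mathbf{J}})$ is represented in $r$-term format. The cost of a scalar product in $\mathbb{K}^{J_j}$ is denoted by $\hat{N}_j$ (this is usually $2\#J_j$). Again, we distinguish the cases from above.

Case I (two hybrid tensors with same bases). Here $\mathbf{u}, \mathbf{v} \in \mathcal{T}_{\mathbf{r}}$ are given with identical subspace $\mathbf{U} := \bigotimes_{j=1}^d U_j$ and orthonormal bases $B_j = [b^{(j)}_1, \ldots, b^{(j)}_{r_j}]$ of $U_j$. Again, the identity $\langle \mathbf{u}, \mathbf{v}\rangle = \langle \mathbf{a}^u, \mathbf{a}^v\rangle_{\mathbf{J}}$ holds. Since $\mathbf{a}^u \in \mathcal{R}_{r_u}(\mathbb{K}^{\mathbf{J}})$ and $\mathbf{a}^v \in \mathcal{R}_{r_v}(\mathbb{K}^{\mathbf{J}})$, the latter scalar product can be performed as discussed in §13.3.2. The cost is
\[ r_u r_v \Big( d + \sum_{j=1}^d \hat{N}_j \Big). \]

Case II (hybrid tensor and elementary tensor). Let $\mathbf{u}$ be of hybrid format, while $\mathbf{v}$ is the elementary tensor (13.4). As in §13.3.3, the scalar products $\beta^{(j)}_{i_j} := \langle b^{(j)}_{i_j}, v^{(j)}\rangle_j$ are to be computed. Since $\mathbf{a} = \sum_{\nu=1}^r \bigotimes_{j=1}^d a_\nu^{(j)} \in \mathcal{R}_r$, we obtain $\langle \mathbf{u}, \mathbf{v}\rangle = \sum_{\mathbf{i}\in\mathbf{J}} \mathbf{a}_{\mathbf{i}} \prod_{j=1}^d \beta^{(j)}_{i_j} = \sum_\nu \prod_{j=1}^d \langle a_\nu^{(j)}, \beta^{(j)}\rangle$, involving the $\mathbb{K}^{J_j}$-scalar product with $\beta^{(j)} = (\beta^{(j)}_i)_{i\in J_j}$. The total cost is
\[ \sum_{j=1}^d r_j N_j + r \sum_{j=1}^d \hat{N}_j . \]

Case III (hybrid tensors with different bases). For hybrid tensors $\mathbf{u}', \mathbf{u}''$ with coefficient tensors $\mathbf{a}' = \sum_{\nu=1}^{r'} \bigotimes_{j=1}^d a_\nu^{\prime(j)}$ and $\mathbf{a}'' = \sum_{\nu=1}^{r''} \bigotimes_{j=1}^d a_\nu^{\prime\prime(j)}$, the right-hand side in (13.5) can be written as
\[ \langle \mathbf{u}', \mathbf{u}''\rangle = \sum_{\nu,\mu} \prod_{j=1}^d \Big[ \sum_{i_j\in J'_j} \sum_{k_j\in J''_j} a_\nu^{\prime(j)}[i_j]\, a_\mu^{\prime\prime(j)}[k_j]\, \big\langle b^{\prime(j)}_{i_j}, b^{\prime\prime(j)}_{k_j} \big\rangle_j \Big] \tag{13.7} \]
and requires
\[ N^{\mathcal{T}_{\mathbf{r}}}_{\langle\cdot,\cdot\rangle} = \sum_{j=1}^d r'_j r''_j\, (N_j + 3r'r'') \]
operations, which are bounded by $2dnr^2 + 2r^4$ if $r'_j, r''_j, r', r'' \le r$ and $N_j \le 2n$.

Alternatively, we introduce common bases. According to Remark 8.51, the cost including the transformation of the coefficient tensors $\mathbf{a}', \mathbf{a}''$ is $8dnr^2 + 2dr^3$. The addition of $\mathbf{a}' \in \mathcal{R}_{r'}$ and $\mathbf{a}'' \in \mathcal{R}_{r''}$ requires no arithmetic work, but increases the representation rank: $r = r' + r''$.

Remark 13.5. For Case III with $n \ge (r^2/d - r)/3$, the first variant based on (13.7) is more advantageous. The cost is bounded by $2r^2\,(dn + r^2)$.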
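13.3.5 Hierarchical Representation

We start with the scalar product $\langle \mathbf{u}, \mathbf{v}\rangle$ of $\mathbf{u} \in \mathcal{H}_{\mathfrak{r}}$ and an elementary tensor $\mathbf{v} = \bigotimes_{j=1}^d v^{(j)}$ in (13.4). Define $\mathbf{v}^{(\alpha)} := \bigotimes_{j\in\alpha} v^{(j)}$ and use the recursion
\[ \big\langle \mathbf{b}_\ell^{(\alpha)}, \mathbf{v}^{(\alpha)} \big\rangle_\alpha = \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{ij}^{(\alpha,\ell)} \big\langle \mathbf{b}_i^{(\alpha_1)}, \mathbf{v}^{(\alpha_1)} \big\rangle_{\alpha_1} \big\langle \mathbf{b}_j^{(\alpha_2)}, \mathbf{v}^{(\alpha_2)} \big\rangle_{\alpha_2} \tag{13.8} \]
(cf. (11.20)), where $\alpha_1, \alpha_2$ are the sons of $\alpha$.

Remark 13.6. The computation of all $\langle \mathbf{b}_\ell^{(\alpha)}, \mathbf{v}^{(\alpha)}\rangle_\alpha$ for $\alpha \in T_D$, $1 \le \ell \le r_\alpha$, can be performed by
\[ \sum_{\alpha\in T_D\setminus L(T_D)} \big\{ r_\alpha\, (2r_{\alpha_1} + 1)\, r_{\alpha_2} - 1 \big\} + \sum_{j=1}^d r_j N_j + 2r_d - 1 \qquad (\{\alpha_1, \alpha_2\} = S(\alpha)) \]
arithmetic operations. Under the assumptions $r_\alpha \le r$ and $N_j \le 2n - 1$, the asymptotic cost is $2\,(d-1)\,r^3 + 2rn$.

Proof. Set $\beta_\ell^{(\alpha)} := \langle \mathbf{b}_\ell^{(\alpha)}, \mathbf{v}^{(\alpha)}\rangle_\alpha$ and $\beta^{(\alpha)} := (\beta_\ell^{(\alpha)})_{\ell=1}^{r_\alpha} \in \mathbb{K}^{r_\alpha}$. Given the vectors $\beta^{(\alpha_1)}$ and $\beta^{(\alpha_2)}$, (13.8) implies that $\beta_\ell^{(\alpha)} = \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{ij}^{(\alpha,\ell)} \beta_i^{(\alpha_1)} \beta_j^{(\alpha_2)}$, i.e., $\beta_\ell^{(\alpha)} = (\beta^{(\alpha_1)})^{\mathsf{T}} C^{(\alpha,\ell)} \beta^{(\alpha_2)}$ for $1 \le \ell \le r_\alpha$. Therefore the computation of $\beta^{(\alpha)}$ requires $r_\alpha\, (2r_{\alpha_1} + 1)\, r_{\alpha_2} - 1$ operations. The recursion (13.8) terminates with the scalar products $\langle b_i^{(j)}, v^{(j)}\rangle_j$, which cost $\sum_{j=1}^d r_j N_j$ operations. Finally, the scalar product $\langle \mathbf{u}, \mathbf{v}\rangle = \sum_{\ell=1}^{r_d} c_\ell^{(D)} \beta_\ell^{(D)}$ takes $2r_d - 1$ operations. □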
Next, we consider the scalar product hu, vi of general tensors u, v ∈ Hr .
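Case I (two tensors from $\mathcal{H}_{\mathfrak{r}}$ with identical bases). Two tensors
\[ \mathbf{u} = \rho^{\mathrm{orth}}_{\mathrm{HT}}\big(T_D, (C_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c_u^{(D)}, (B_j)_{j\in D}\big), \qquad \mathbf{v} = \rho^{\mathrm{orth}}_{\mathrm{HT}}\big(\ldots, c_v^{(D)}, \ldots\big) \]
given in the same format $\mathcal{H}_{\mathfrak{r}}$ with orthonormal bases (cf. §11.3.2) satisfy
\[ \langle \mathbf{u}, \mathbf{v}\rangle = \big\langle c_u^{(D)}, c_v^{(D)} \big\rangle, \tag{13.9} \]
where $\langle c_u^{(D)}, c_v^{(D)}\rangle$ is the Euclidean scalar product in $\mathbb{K}^{r_d}$. The cost is negligible: $N_{\langle\cdot,\cdot\rangle} = 2r_d - 1$. Note that Case I holds in particular for $\mathbf{u} = \mathbf{v}$.

Case II (two tensors from $\mathcal{H}_{\mathfrak{r}}$ with different bases). Next, we consider two tensors
\[ \mathbf{u}' = \rho_{\mathrm{HT}}\big(T_D, (C^{\prime(\alpha)})_{\alpha\in T_D\setminus L(T_D)}, c^{\prime(D)}, (B^{\prime(\alpha)})_{\alpha\in L(T_D)}\big) \quad \text{and} \tag{13.10a} \]
\[ \mathbf{u}'' = \rho_{\mathrm{HT}}\big(T_D, (C^{\prime\prime(\alpha)})_{\alpha\in T_D\setminus L(T_D)}, c^{\prime\prime(D)}, (B^{\prime\prime(\alpha)})_{\alpha\in L(T_D)}\big), \tag{13.10b} \]
which are given with respect to different bases. Note that the bases need not be orthonormal. The next lemma uses the subtree $T_\alpha$ from Definition 11.6. $\langle\cdot,\cdot\rangle_\beta$ denotes the scalar product of the tensor space $\mathbf{V}_\beta := \bigotimes_{j\in\beta} V_j$.

Lemma 13.7. Let $\{\mathbf{b}_i^{\prime(\beta)} : 1 \le i \le r'_\beta\}$ and $\{\mathbf{b}_j^{\prime\prime(\beta)} : 1 \le j \le r''_\beta\}$ be the bases involved in (13.10a,b) for $\beta \in T_\alpha$. Computing all scalar products
\[ \big\langle \mathbf{b}_i^{\prime(\beta)}, \mathbf{b}_j^{\prime\prime(\beta)} \big\rangle_\beta \qquad \text{for } 1 \le i \le r'_\beta,\ 1 \le j \le r''_\beta,\ \beta \in T_\alpha \]
costs
\[ \sum_{j\in\alpha} r'_{\{j\}} r''_{\{j\}} N_j + 2 \sum_{\beta\in T_\alpha\setminus L(T_\alpha)} r'_\beta \big( r'_{\beta_1} r''_{\beta_1} r''_{\beta_2} + r'_{\beta_1} r'_{\beta_2} r''_{\beta_1} + r''_\beta r'_{\beta_2} r''_{\beta_2} \big) \qquad (\beta_1, \beta_2 \text{ sons of } \beta) \]
arithmetic operations if these quantities are computed as detailed in the proof.

Proof. By property (11.20), we have the recursive equation
\[ \big\langle \mathbf{b}_\ell^{\prime(\beta)}, \mathbf{b}_k^{\prime\prime(\beta)} \big\rangle_\beta = \sum_{i=1}^{r'_{\beta_1}} \sum_{j=1}^{r'_{\beta_2}} \sum_{m=1}^{r''_{\beta_1}} \sum_{n=1}^{r''_{\beta_2}} c_{ij}^{\prime(\beta,\ell)} c_{mn}^{\prime\prime(\beta,k)} \big\langle \mathbf{b}_i^{\prime(\beta_1)}, \mathbf{b}_m^{\prime\prime(\beta_1)} \big\rangle_{\beta_1} \big\langle \mathbf{b}_j^{\prime(\beta_2)}, \mathbf{b}_n^{\prime\prime(\beta_2)} \big\rangle_{\beta_2}. \tag{13.11} \]
Let $S^{(\beta)} \in \mathbb{K}^{r'_\beta \times r''_\beta}$ be the matrix with the entries $S^{(\beta)}_{\ell k} = \langle \mathbf{b}_\ell^{\prime(\beta)}, \mathbf{b}_k^{\prime\prime(\beta)}\rangle_\beta$. Eq. (13.11) involves the matrices $S^{(\beta_1)} \in \mathbb{K}^{r'_{\beta_1} \times r''_{\beta_1}}$ and $S^{(\beta_2)} \in \mathbb{K}^{r'_{\beta_2} \times r''_{\beta_2}}$ defined by the entries
\[ S^{(\beta_1)}_{im} = \big\langle \mathbf{b}_i^{\prime(\beta_1)}, \mathbf{b}_m^{\prime\prime(\beta_1)} \big\rangle_{\beta_1} \quad \text{and} \quad S^{(\beta_2)}_{jn} = \big\langle \mathbf{b}_j^{\prime(\beta_2)}, \mathbf{b}_n^{\prime\prime(\beta_2)} \big\rangle_{\beta_2}. \]
Note that the fourfold sum in (13.11) can be expressed as the Frobenius scalar product $\big\langle (S^{(\beta_1)})^{\mathsf{T}} C^{\prime(\beta,\ell)} S^{(\beta_2)}, C^{\prime\prime(\beta,k)} \big\rangle_{\mathrm{F}}$. Computing
\[ M_\ell := (S^{(\beta_1)})^{\mathsf{T}} C^{\prime(\beta,\ell)} S^{(\beta_2)} \qquad \text{for all } 1 \le \ell \le r'_\beta \]
needs $2r'_\beta \big( r'_{\beta_1} r''_{\beta_1} r''_{\beta_2} + r'_{\beta_1} r'_{\beta_2} r''_{\beta_1} \big)$ operations. The products $\langle M_\ell, C^{\prime\prime(\beta,k)}\rangle_{\mathrm{F}}$ for all $1 \le \ell \le r'_\beta$ and $1 \le k \le r''_\beta$ cost $2r'_\beta r''_\beta r'_{\beta_2} r''_{\beta_2}$. The recursion (13.11) terminates for the scalar products $\langle\cdot,\cdot\rangle_\beta$ with respect to the leaves $\beta = \{j\}$ and $j \in \alpha$. In this case $S^{(\beta)}$ requires the computation of $r'_{\{j\}} r''_{\{j\}}$ scalar products in $V_j$, each with the cost $N_j$. □

The scalar product of $\mathbf{u}' = \sum_{i=1}^{r'_d} c_i^{\prime(D)} \mathbf{b}_i^{\prime(D)}$ and $\mathbf{u}'' = \sum_{j=1}^{r''_d} c_j^{\prime\prime(D)} \mathbf{b}_j^{\prime\prime(D)}$ is equal to
\[ \langle \mathbf{u}', \mathbf{u}''\rangle = \sum_{\ell=1}^{r'_d} \sum_{k=1}^{r''_d} c_\ell^{\prime(D)} c_k^{\prime\prime(D)} \big\langle \mathbf{b}_\ell^{\prime(D)}, \mathbf{b}_k^{\prime\prime(D)} \big\rangle. \]
The computation of $S^{(D)}_{\ell k} := \langle \mathbf{b}_\ell^{\prime(D)}, \mathbf{b}_k^{\prime\prime(D)}\rangle$ is discussed in Lemma 13.7 for $\alpha := D$. Computing $\langle \mathbf{u}', \mathbf{u}''\rangle = (c^{\prime(D)})^{\mathsf{T}} S^{(D)} c^{\prime\prime(D)}$ costs $2r'_d\,(r''_d + 1)$ operations. Altogether we get the following result.

Remark 13.8. (a) The recursive computation of the scalar product $\langle \mathbf{u}', \mathbf{u}''\rangle$ (see proof of Lemma 13.7) costs
\[ N_{\langle\cdot,\cdot\rangle} = \sum_{j=1}^d r'_{\{j\}} r''_{\{j\}} N_j + 2 \sum_{\alpha\in T_D\setminus L(T_D)} r'_\alpha \big( r'_{\alpha_1} r''_{\alpha_1} r''_{\alpha_2} + r'_{\alpha_1} r'_{\alpha_2} r''_{\alpha_1} + r''_\alpha r'_{\alpha_2} r''_{\alpha_2} \big) + 2r'_d\,(r''_d + 1) \tag{13.12a} \]
operations. Under the assumptions $r'_\alpha, r''_\alpha \le r$ and $N_j \le 2n$, the cost is bounded by
\[ N_{\langle\cdot,\cdot\rangle} \le 2dr^2 n + 6\,(d-1)\,r^4 + 2r^2. \tag{13.12b} \]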
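The recursion (13.11) can be sketched for real data as follows. The dictionary-based tree layout, the helper names and the balanced tree for $d = 4$ are assumptions of this sketch. The function `dense_basis` only serves as a dense reference for checking and would not be used in practice.

```python
import numpy as np

def leaf(n, r, rng):
    return {"leaf": True, "r": r, "B": rng.standard_normal((n, r))}

def inner_node(s1, s2, r, rng):
    return {"leaf": False, "r": r, "sons": (s1, s2),
            "C": rng.standard_normal((r, s1["r"], s2["r"]))}

def gram(p, q):
    """S[l, k] = <b'_l, b''_k> for corresponding vertices p, q of the two trees."""
    if p["leaf"]:
        return p["B"].T @ q["B"]
    S1 = gram(p["sons"][0], q["sons"][0])
    S2 = gram(p["sons"][1], q["sons"][1])
    M = np.einsum('im,lij,jn->lmn', S1, p["C"], S2)   # M_l = S1^T C'^(l) S2
    return np.einsum('lmn,kmn->lk', M, q["C"])        # Frobenius products with C''^(k)

def ht_inner(root1, c1, root2, c2):
    return float(c1 @ gram(root1, root2) @ c2)

def dense_basis(node):
    """Dense basis tensors of a vertex, last axis = basis index (reference only)."""
    if node["leaf"]:
        return node["B"]
    X, Y = dense_basis(node["sons"][0]), dense_basis(node["sons"][1])
    Z = np.einsum('lij,pi,qj->pql', node["C"],
                  X.reshape(-1, X.shape[-1]), Y.reshape(-1, Y.shape[-1]))
    return Z.reshape(X.shape[:-1] + Y.shape[:-1] + (node["r"],))

def random_ht(dims, r, rng):
    l = [leaf(n, r, rng) for n in dims]
    root = inner_node(inner_node(l[0], l[1], r, rng), inner_node(l[2], l[3], r, rng), r, rng)
    return root, rng.standard_normal(r)

rng = np.random.default_rng(6)
dims = (3, 4, 2, 3)
root1, c1 = random_ht(dims, 2, rng)
root2, c2 = random_ht(dims, 3, rng)
u1, u2 = dense_basis(root1) @ c1, dense_basis(root2) @ c2
```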
(b) Equation (13.9) does not hold for a non-orthonormal basis. In that case the scalar product $\langle \mathbf{u}, \mathbf{v}\rangle$ has to be computed as in Case II. By Hermitian symmetry of $S^{(\alpha)}_{\ell k} = \langle \mathbf{b}_\ell^{(\alpha)}, \mathbf{b}_k^{(\alpha)}\rangle_\alpha$, the computational cost is only half of (13.12a).

An alternative approach is to join the bases of $\mathbf{u}'$ and $\mathbf{u}''$ as described in §11.5. By Remark 11.70, this procedure requires $N_{\langle\cdot,\cdot\rangle} \le 8dr^2\,(r^2 + n)$ operations. Obviously, the latter cost is larger than (13.12b). However, for the special case considered in §14.1.3, this approach is advantageous.

Next we consider the partial scalar product $\langle \mathbf{u}', \mathbf{u}''\rangle_{\alpha^c}$. Here we concentrate on the case of $\alpha \in T_D$. We recall that the partial scalar product $\langle \mathbf{v}, \mathbf{v}\rangle_{\alpha^c}$ is needed for the left-sided singular-value decomposition of $M_\alpha(\mathbf{v})$ (see Lemma 5.14). The result of $\langle \mathbf{u}', \mathbf{u}''\rangle_{\alpha^c}$ is a tensor in the tensor space $\mathbf{V}_\alpha \otimes \mathbf{V}_\alpha$ for which a hierarchical format is still to be defined. Let $\alpha'$ be a copy of $\alpha$ disjoint to $\alpha$ and set $A(\alpha) := \alpha \,\dot\cup\, \alpha'$. The dimension partition tree $T_{A(\alpha)}$ is defined as follows: $A(\alpha)$ is the root with the sons $\alpha$ and $\alpha'$. The subtree at vertex $\alpha$ is $T_\alpha$ (cf. Definition 11.6) and the subtree at vertex $\alpha'$ is the isomorphic copy $T_{\alpha'}$ of $T_\alpha$. The bases $\mathbf{b}_\ell^{\prime(\beta)}$ ($\beta \in T_\alpha$) [$\mathbf{b}_\ell^{\prime\prime(\beta)}$ ($\beta \in T_{\alpha'}$)] of $\mathbf{u}'$ [$\mathbf{u}''$] define the subspaces $\mathbf{U}_\gamma$, $\gamma \in T_{A(\alpha)}\setminus\{A(\alpha)\}$, together with their bases, while the basis of the subspace $\mathbf{U}_{A(\alpha)}$ is still to be determined.

The computation of $\langle \mathbf{u}', \mathbf{u}''\rangle_{\alpha^c}$ follows the description in §4.5.5. First, we form $\mathbf{u}' \otimes \mathbf{u}'' \in \mathbf{V} \otimes \mathbf{V}$, which is represented in the hierarchical format with the tree $T_{A(D)}$. Let $\sigma_1$ and $\sigma_2$ be the sons of $D$. Since either $\alpha \subset \sigma_1$ or $\alpha \subset \sigma_2$, it follows that either $\alpha^c \supset \sigma_2$ or $\alpha^c \supset \sigma_1$. Without loss of generality, we assume $\alpha^c \supset \sigma_2$ and apply the contraction $C_{\sigma_2}$ from Definition 4.160:
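\[ \mathbf{u}' \otimes \mathbf{u}'' \mapsto C_{\sigma_2}(\mathbf{u}' \otimes \mathbf{u}'') \in \mathbf{V}_{\sigma_1} \otimes \mathbf{V}_{\sigma_1}. \]
Let
\[ \mathbf{u}' = \sum_{\ell=1}^{r'_d} c_\ell^{\prime(D)} \mathbf{b}_\ell^{\prime(D)} = \sum_{\ell,i,j} c_\ell^{\prime(D)} c_{ij}^{\prime(D,\ell)}\, \mathbf{b}_i^{\prime(\sigma_1)} \otimes \mathbf{b}_j^{\prime(\sigma_2)} \quad \text{and} \quad \mathbf{u}'' = \sum_{k,m,n} c_k^{\prime\prime(D)} c_{mn}^{\prime\prime(D,k)}\, \mathbf{b}_m^{\prime\prime(\sigma_1)} \otimes \mathbf{b}_n^{\prime\prime(\sigma_2)}. \]
Then
\[ C_{\sigma_2}(\mathbf{u}' \otimes \mathbf{u}'') = \sum_{\ell,i,j} \sum_{k,m,n} c_\ell^{\prime(D)} c_{ij}^{\prime(D,\ell)} c_k^{\prime\prime(D)} c_{mn}^{\prime\prime(D,k)} \underbrace{\big\langle \mathbf{b}_j^{\prime(\sigma_2)}, \mathbf{b}_n^{\prime\prime(\sigma_2)} \big\rangle_{\sigma_2}}_{=\, S^{(\sigma_2)}_{jn}}\, \mathbf{b}_i^{\prime(\sigma_1)} \otimes \mathbf{b}_m^{\prime\prime(\sigma_1)} \]
holds. For each pair $(i,m)$, the coefficient of $\mathbf{b}_i^{\prime(\sigma_1)} \otimes \mathbf{b}_m^{\prime\prime(\sigma_1)}$ is the sum
\[ \sum_{\ell,j,k,n} c_\ell^{\prime(D)} c_{ij}^{\prime(D,\ell)} c_k^{\prime\prime(D)} c_{mn}^{\prime\prime(D,k)}\, S^{(\sigma_2)}_{jn}. \]
Set
\[ c_{ij}^{\prime(D)} := \sum_\ell c_\ell^{\prime(D)} c_{ij}^{\prime(D,\ell)} \quad \text{and} \quad c_{mn}^{\prime\prime(D)} := \sum_k c_k^{\prime\prime(D)} c_{mn}^{\prime\prime(D,k)}. \]
Then the fourfold sum is equal to $C^{\prime(D)} S^{(\sigma_2)} (C^{\prime\prime(D)})^{\mathsf{H}} =: C^{(\sigma_1)}$ and yields the representation
\[ C_{\sigma_2}(\mathbf{u}' \otimes \mathbf{u}'') = \sum_{i,m} c_{im}^{(\sigma_1)}\, \mathbf{b}_i^{\prime(\sigma_1)} \otimes \mathbf{b}_m^{\prime\prime(\sigma_1)} \in \mathbf{V}_{\sigma_1} \otimes \mathbf{V}_{\sigma_1}. \tag{13.13} \]
The computational cost (without the determination of $S^{(\sigma_2)}_{jn}$) is $2r'_d r'_{\sigma_1} r'_{\sigma_2} + 2r''_d r''_{\sigma_1} r''_{\sigma_2} + 2r'_{\sigma_1} r'_{\sigma_2} r''_{\sigma_2} + 2r'_{\sigma_1} r''_{\sigma_1} r''_{\sigma_2}$.

Now we proceed recursively: if $\sigma_1 = \alpha$, we are ready. Otherwise let $\sigma_{11}$ and $\sigma_{12}$ be the sons of $\sigma_1$, apply $C_{\sigma_{12}}$ [$C_{\sigma_{11}}$] if $\alpha^c \supset \sigma_{12}$ [if $\alpha^c \supset \sigma_{11}$] and repeat recursively.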
The overall cost is given under the simplification $r'_\beta, r''_\beta \le r$. Then computing the coefficients in (13.13) requires $8r^3\, \mathrm{level}(\alpha)$. In addition, we need to compute the scalar products $\langle \mathbf{b}_j^{\prime(\beta)}, \mathbf{b}_n^{\prime\prime(\beta)}\rangle_\beta$ for all $\beta \in \{\gamma \in T_D : \gamma \cap \alpha = \emptyset\}$. The latter set contains $d - \#\alpha - \mathrm{level}(\alpha)$ interior vertices and $d - \#\alpha$ leaf vertices (see (11.5) for the definition of the level). Hence Lemma 13.7 yields a cost of $2\,(d - \#\alpha)\, r^2 n + 6\,(d - \#\alpha - \mathrm{level}(\alpha))\, r^4$. The result is summarised in the following remark.

Remark 13.9. Assume $\alpha \in T_D$. The partial scalar product $\langle \mathbf{u}', \mathbf{u}''\rangle_{\alpha^c}$ can be performed with the arithmetic cost
\[ 2\,(d - \#\alpha)\, r^2 n + 6\,(d - \#\alpha - \mathrm{level}(\alpha))\, r^4 + 8r^3\, \mathrm{level}(\alpha). \]
The resulting tensor belongs to $\mathbf{V}_\alpha \otimes \mathbf{V}_\alpha$ and is given in the hierarchical format with the dimension partition tree $T_{A(\alpha)}$ explained above.
13.3.6 Orthonormalisation

One purpose of computing scalar products is the orthonormalisation of a basis (QR or Gram–Schmidt orthonormalisation). If the basis belongs to one of the directly represented vector spaces $V_j$, the standard procedures in §2.7 apply. This is different if the basis vectors are tensors represented in one of the tensor formats. Such a situation happens, e.g., if the vectors from $V_j$ are tensorised as proposed in §14.1.4.

Assume that we start with $s$ tensors
\[ \mathbf{b}_j \in \mathbf{W} \qquad (1 \le j \le s) \]
given in some format with representation ranks $r$ (i.e., $r = \max_j r_j$ in the case of $\mathbf{b}_j \in \mathcal{T}_{\mathbf{r}}$ and $r = \max_\alpha r_\alpha$ for $\mathbf{b}_j \in \mathcal{H}_{\mathfrak{r}}$). Furthermore, assume that $\dim(\mathbf{W})$ is sufficiently large. Here we can choose between the following cases.

1. Perform the Gram–Schmidt orthonormalisation without truncation. Then an exact orthonormalisation can be achieved. In general, the representation ranks of the new basis elements $\mathbf{b}_j^{\mathrm{new}} \in \mathbf{W}$ equal $jr$ for $1 \le j \le s$, leading to unfavourably large ranks.
2. The same procedure, but with truncation, produces basis elements $\mathbf{b}_j^{\mathrm{new}}$ which are almost orthonormal. This may be sufficient if an orthonormal basis is introduced for the purpose of stability.
3. Let $\mathbf{B} := [\mathbf{b}_1 \cdots \mathbf{b}_s] \in \mathbf{W}^s$ be the matrix corresponding to the basis and compute the Cholesky decomposition of the Gram matrix: $\mathbf{B}^{\mathsf{H}} \mathbf{B} = LL^{\mathsf{H}} \in \mathbb{K}^{s\times s}$. The (exactly) orthonormalised basis is $\mathbf{B}^{\mathrm{new}} = \mathbf{B} L^{-\mathsf{H}}$ (cf. Lemma 8.16a). In some applications it is sufficient to use the factorisation $\mathbf{B} L^{-\mathsf{H}}$ without performing the product.
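The third variant only needs the Gram matrix, i.e., scalar products that can be evaluated in the given tensor format. A minimal sketch for real data follows. The function name and the callback `inner` are assumptions, and dense arrays merely stand in for tensors in some format.

```python
import numpy as np

def gram_cholesky_coefficients(tensors, inner):
    """Return L^{-1} with G = L L^T, where G[k, l] = inner(b_k, b_l).

    Row j of the result holds the coefficients of b_j^new = sum_k (L^{-1})[j, k] b_k,
    i.e. B_new = B L^{-T}.
    """
    s = len(tensors)
    G = np.array([[inner(tensors[k], tensors[l]) for l in range(s)] for k in range(s)])
    L = np.linalg.cholesky(G)        # lower triangular factor
    return np.linalg.inv(L)

# small example with dense stand-ins (hypothetical sizes)
rng = np.random.default_rng(7)
tensors = [rng.standard_normal((3, 4, 2)) for _ in range(3)]
euclid = lambda x, y: float(np.sum(x * y))
Linv = gram_cholesky_coefficients(tensors, euclid)
new = [sum(Linv[j, k] * tensors[k] for k in range(3)) for j in range(3)]
G_new = np.array([[euclid(x, y) for y in new] for x in new])
```

Since $L^{-1}$ is lower triangular, $\mathbf{b}_j^{\mathrm{new}}$ is a combination of $\mathbf{b}_1, \ldots, \mathbf{b}_j$ only, as in Gram–Schmidt. Forming the linear combinations explicitly is the step that increases the representation ranks.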
13.4 Change of Bases

In general, the vector spaces $V_j$ [or $U_j$] are addressed by a basis or frame $(b_i^{(j)})_{1\le i\le n_j}$ which gives rise to a matrix $B_j := [\,b_1^{(j)}\ b_2^{(j)} \ldots b_{n_j}^{(j)}\,] \in V_j^{n_j}$. Consider a new basis $(b_{i,\mathrm{new}}^{(j)})_{1\le i\le n_{j,\mathrm{new}}}$ and $B_j^{\mathrm{new}}$ together with the transformation
\[ B_j = B_j^{\mathrm{new}}\, T^{(j)}, \qquad \text{i.e.,} \quad b_k^{(j)} = \sum_{i=1}^{n_{j,\mathrm{new}}} T^{(j)}_{ik}\, b_{i,\mathrm{new}}^{(j)}. \tag{13.14} \]
$n_j = n_{j,\mathrm{new}}$ holds for bases. If $B_j$ is a frame, $n_{j,\mathrm{new}} < n_j$ may also occur. We write $r_j$ [$r_{j,\mathrm{new}}$] instead of $n_j$ [$n_{j,\mathrm{new}}$] if the bases span only a subspace $U_j$.

In the case of the tensor subspace format and the hierarchical format, another change of bases may be of interest. If the subspaces are not described by orthonormal bases, an orthonormalisation can be performed. This includes the determination of some orthonormal basis and the corresponding transformation of the coefficients.
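13.4.1 Full Representation

The full representation $\sum_{\mathbf{i}\in\mathbf{I}} \mathbf{a}_{\mathbf{i}}\, b_{i_1}^{(1)} \otimes \ldots \otimes b_{i_d}^{(d)}$ (cf. (7.3)) is identical to the tensor subspace representation, involving maximal subspaces $U_j = V_j$ with dimension $n_j = \#I_j$. The coefficient tensor $\mathbf{a} \in \mathbb{K}^{\mathbf{I}}$ is transformed into $\mathbf{a}^{\mathrm{new}} := \mathbf{T}\mathbf{a} \in \mathbb{K}^{\mathbf{I}}$ with the Kronecker matrix $\mathbf{T} = \bigotimes_{j=1}^d T^{(j)}$. The elementwise operations are
\[ \mathbf{a}^{\mathrm{new}}_{i_1 i_2 \cdots i_d} = \sum_{k_1=1}^{n_1} \cdots \sum_{k_d=1}^{n_d} T^{(1)}_{i_1 k_1} \cdots T^{(d)}_{i_d k_d}\, \mathbf{a}_{k_1 k_2 \cdots k_d} \qquad \text{with } 1 \le i_j, k_j \le n_j. \]
The arithmetic cost is
\[ N^{\mathrm{full}}_{\text{basis-change}} = 2 \Big( \sum_{j=1}^d n_j \Big) \Big( \prod_{j=1}^d n_j \Big) \le 2dn^{d+1} \qquad \text{for } n := \max_j n_j. \]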
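The Kronecker matrix $\mathbf{T}$ is never formed in practice; the factors $T^{(j)}$ are applied direction by direction. The following NumPy sketch (function name and sizes are assumptions) compares this with the explicit Kronecker product for a small example.

```python
import numpy as np

def change_basis_full(a, Ts):
    """a_new = (T^(1) x ... x T^(d)) a, applied one direction at a time."""
    for j, T in enumerate(Ts):
        # contract the second index of T^(j) with direction j of the coefficient tensor
        a = np.moveaxis(np.tensordot(T, a, axes=(1, j)), 0, j)
    return a

# small example (hypothetical sizes; the T^(j) may also be rectangular)
rng = np.random.default_rng(8)
a = rng.standard_normal((2, 3, 2))
Ts = [rng.standard_normal((3, 2)), rng.standard_normal((2, 3)), rng.standard_normal((2, 2))]
a_new = change_basis_full(a, Ts)
T_kron = np.kron(Ts[0], np.kron(Ts[1], Ts[2]))   # explicit Kronecker matrix, small sizes only
```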
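13.4.2 Hybrid r-Term Representation

Let $\mathbf{v} = \sum_{\nu=1}^r \bigotimes_{j=1}^d v_\nu^{(j)} \in \mathcal{R}_r$. If $v_\nu^{(j)} \in \mathbb{K}^{I_j}$ represents the vector $\sum_{i=1}^{n_j} v_{\nu,i}^{(j)} b_i^{(j)} \in V_j$, the transformation with respect to the new bases $B_j^{\mathrm{new}}$ yields
\[ \sum_{k=1}^{n_j} v_{\nu,k}^{(j)} b_k^{(j)} = \sum_{k=1}^{n_j} v_{\nu,k}^{(j)} \sum_{i=1}^{n_{j,\mathrm{new}}} T^{(j)}_{ik}\, b_{i,\mathrm{new}}^{(j)} = \sum_{i=1}^{n_{j,\mathrm{new}}} \underbrace{\Big( \sum_{k=1}^{n_j} T^{(j)}_{ik}\, v_{\nu,k}^{(j)} \Big)}_{=\, \hat{v}_{\nu,i}^{(j)}} b_{i,\mathrm{new}}^{(j)} \]
(cf. (13.14)). Hence the transformed tensor is
\[ \hat{\mathbf{v}} = \sum_{\nu=1}^r \bigotimes_{j=1}^d \hat{v}_\nu^{(j)} \in \mathcal{R}_r \qquad \text{with } \hat{v}_\nu^{(j)} = T^{(j)} v_\nu^{(j)}. \]
Multiplication by the $n_{j,\mathrm{new}} \times n_j$ matrices $T^{(j)}$ leads to the total cost
\[ N^{\mathcal{R}_r}_{\text{basis-change}} = r \sum_{j=1}^d n_{j,\mathrm{new}}\, (2n_j - 1) \le 2drn^2 \qquad \text{for } n := \max_j n_j. \]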
13.4.3 Tensor Subspace Representation

It is now assumed that only the bases representing the subspaces $U_j \subset V_j$ are changed. We assume that the basis vectors $b_{i,\mathrm{new}}^{(j)}$, $1 \le i \le r_{j,\mathrm{new}}$, are given together with the $r_j \times r_{j,\mathrm{new}}$ matrices $T^{(j)}$. The cost for transforming the coefficient tensor is as in §13.4.1, but with $n_j$ replaced by $r_j$:
\[ N^{\mathcal{T}_{\mathbf{r}}}_{\text{basis-change}} = 2 \sum_{j=1}^d \Big( \prod_{k=1}^{j} r_{k,\mathrm{new}} \Big) \Big( \prod_{k=j}^{d} r_k \Big) \le 2dr^{d+1} \qquad \text{for } r := \max_j \{r_j, r_{j,\mathrm{new}}\}. \tag{13.15a} \]

Another type of basis transform is the orthonormalisation in the case that the format $\mathcal{T}_{\mathbf{r}}$ is described by general bases (or frames) $B_j^{\mathrm{old}} := [\,b_{1,\mathrm{old}}^{(j)}, \ldots, b_{r_{j,\mathrm{old}},\mathrm{old}}^{(j)}\,]$. By procedure RQR($n_j$, $r_{j,\mathrm{old}}$, $r_j$, $B_j^{\mathrm{old}}$, $Q$, $R$) in (2.26) we obtain a new orthonormal basis $B_j^{\mathrm{new}} = Q = [\,b_{1,\mathrm{new}}^{(j)}, \ldots, b_{r_j,\mathrm{new}}^{(j)}\,]$ together with the transformation matrix $T^{(j)} = R$, i.e., $B_j^{\mathrm{new}} T^{(j)} = B_j^{\mathrm{old}}$. Note that in the case of a frame $B_j^{\mathrm{old}}$, the dimension $r_j$ may be smaller than $r_{j,\mathrm{old}}$. The cost for calling RQR is $N_{\mathrm{QR}}(n_j, r_{j,\mathrm{old}}) = 2n_j r_{j,\mathrm{old}}^2$. The cost of an application of $T^{(j)}$ to the coefficient tensor is $N^{\mathcal{T}_{\mathbf{r}}}_{\text{basis-change}}$ from above. Altogether, the cost of orthonormalisation is
\[ N^{\mathcal{T}_{\mathbf{r}}}_{\text{orthonormalisation}} \le 2dr^{d+1} + 2dnr^2 \tag{13.15b} \]
with $r := \max_j r_{j,\mathrm{old}}$ and $n := \max_j n_j$.
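The orthonormalisation step can be sketched with NumPy's QR factorisation in place of the procedure RQR. This is a simplification: `np.linalg.qr` is not rank-revealing, so the sketch assumes bases of full column rank rather than general frames. The function name and sizes are assumptions as well.

```python
import numpy as np

def orthonormalise_tucker(core, bases):
    """Replace each basis B_j^old = Q R by Q and apply T^(j) = R to the coefficient tensor."""
    new_bases = []
    for j, B in enumerate(bases):
        Q, R = np.linalg.qr(B)                       # B = Q R with orthonormal columns of Q
        core = np.moveaxis(np.tensordot(R, core, axes=(1, j)), 0, j)
        new_bases.append(Q)
    return core, new_bases

# small example (hypothetical sizes)
rng = np.random.default_rng(9)
dims, ranks = (6, 5, 4), (2, 3, 2)
core = rng.standard_normal(ranks)
bases = [rng.standard_normal((n, r)) for n, r in zip(dims, ranks)]
new_core, new_bases = orthonormalise_tucker(core, bases)
dense = lambda a, B: np.einsum('pqr,ap,bq,cr->abc', a, *B)
```

The represented tensor is unchanged; only its description switches to orthonormal bases.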
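13.4.4 Hierarchical Representation

The transformation considered here is a special case of §11.3.1.4. The basis transformations (13.14) influence the coefficient matrices $C^{(\alpha,\ell)}$ for vertices $\alpha \in T_D$ with at least one son $\{j\} \in L(T_D)$. Let $\alpha_1$ denote the first son and $\alpha_2$ the second son of $\alpha$. Then
\[ C^{(\alpha,\ell)}_{\mathrm{new}} := \begin{cases} T^{(j_1)}\, C^{(\alpha,\ell)}_{\mathrm{old}}\, (T^{(j_2)})^{\mathsf{T}} & \text{if } \alpha_1 = \{j_1\} \text{ and } \alpha_2 = \{j_2\}, \\ T^{(j_1)}\, C^{(\alpha,\ell)}_{\mathrm{old}} & \text{if } \alpha_1 = \{j_1\} \text{ and } \alpha_2 \notin L(T_D), \\ C^{(\alpha,\ell)}_{\mathrm{old}}\, (T^{(j_2)})^{\mathsf{T}} & \text{if } \alpha_1 \notin L(T_D) \text{ and } \alpha_2 = \{j_2\} \end{cases} \]
for $1 \le \ell \le r_\alpha$. Otherwise $C^{(\alpha,\ell)}$ is unchanged. The computational work consists of $d$ matrix multiplications:
\[ N^{\mathcal{H}_{\mathfrak{r}}}_{\text{basis-change}} = 2 \sum_{j=1}^d r_j\, r_{j,\mathrm{new}}\, r_{\mathrm{brother}(j)} \le 2dr^3. \tag{13.16a} \]
The brother of $\{j\}$ may be defined by $\{\mathrm{brother}(j)\} := S(\mathrm{father}(j))\setminus\{j\}$.

Next, we assume that the bases (frames) $\mathbf{B}_\alpha := [\,\mathbf{b}_1^{(\alpha)}, \ldots, \mathbf{b}_{r_\alpha}^{(\alpha)}\,]$ ($\alpha \in T_D$) of $\mathcal{H}_{\mathfrak{r}}$ are to be orthonormalised. Generating orthonormal bases by RQR costs
\[ N_{\mathrm{QR}}(n_j, r_j) = 2n_j r_j^2 \quad \text{for } 1 \le j \le d \qquad \text{and}^7 \qquad N_{\mathrm{QR}}(r_\alpha^2, r_\alpha) = 2r_\alpha^4 \quad \text{for } \alpha \in T_D\setminus L(T_D). \]
Each transformation $T^{(\alpha)}$ ($\alpha \ne D$) leads to $r_\gamma$ matrix multiplications (cf. (11.28)) with the cost $2r_\gamma r_\alpha r_\beta r_\alpha^{\mathrm{new}}$, where $\gamma := \mathrm{father}(\alpha)$ and $\beta := \mathrm{brother}(\alpha)$. $T^{(D)}$ leads to $2r_d r_d^{\mathrm{new}}$ operations (cf. (11.29)). Hence orthonormalisation is realised by
\[ N^{\mathcal{H}_{\mathfrak{r}}}_{\text{orthonormalisation}} = 2 \sum_{j=1}^d n_j r_j^2 + 2 \sum_{\alpha\in T_D\setminus L(T_D)} \big( r_\alpha^4 + r_\alpha r_{\alpha_1} r_{\alpha_2} (r_{\alpha_1} + r_{\alpha_2}) \big) + 2r_d^2 \qquad (\{\alpha_1, \alpha_2\} = S(\alpha)) \tag{13.16b} \]
\[ \le 2dnr^2 + 6\,(d-1)\,r^4 + 2r^2 \]
operations, where $r := \max_\alpha r_\alpha$ and $n := \max_j n_j$.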
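13.5 General Binary Operation

We now consider tensor spaces $\mathbf{V} = \bigotimes_{j=1}^d V_j$, $\mathbf{W} = \bigotimes_{j=1}^d W_j$, $\mathbf{X} = \bigotimes_{j=1}^d X_j$, and any bilinear operation
\[ \boxdot : \mathbf{V} \times \mathbf{W} \to \mathbf{X} \]
satisfying⁸
\[ \Big( \bigotimes_{j=1}^d v^{(j)} \Big) \boxdot \Big( \bigotimes_{j=1}^d w^{(j)} \Big) = \bigotimes_{j=1}^d \big( v^{(j)} \boxdot w^{(j)} \big), \qquad v^{(j)} \boxdot w^{(j)} \in X_j, \tag{13.17} \]
for elementary tensors. We assume that the evaluation of $v^{(j)} \boxdot w^{(j)}$ for vectors $v^{(j)} \in V_j$ and $w^{(j)} \in W_j$ costs $N_j^{\boxdot}$ arithmetic operations.

⁷ $r_\alpha^2$ is the dimension of the space of matrices of size $r_\alpha$.
⁸ The map $\boxdot_j = \boxdot : V_j \times W_j \to X_j$ on the right-hand side is denoted by the same symbol $\boxdot$.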
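13.5.1 r-Term Representation

Tensors $\mathbf{v} = \sum_{\nu=1}^{r_v} \bigotimes_{j=1}^d v_\nu^{(j)} \in \mathcal{R}_{r_v}(\mathbf{V})$ and $\mathbf{w} = \sum_{\mu=1}^{r_w} \bigotimes_{j=1}^d w_\mu^{(j)} \in \mathcal{R}_{r_w}(\mathbf{W})$ lead to $\mathbf{x} := \mathbf{v} \boxdot \mathbf{w} \in \mathcal{R}_{r_v r_w}(\mathbf{X})$ with
\[ \mathbf{x} = \sum_{\nu=1}^{r_v} \sum_{\mu=1}^{r_w} \bigotimes_{j=1}^d v_\nu^{(j)} \boxdot w_\mu^{(j)} \in \mathcal{R}_{r_x}, \qquad r_x = r_v \cdot r_w. \]
Under the assumption about $N_j^{\boxdot}$, the total work is
\[ N_{\boxdot}^{\mathcal{R}_r} = r_v r_w \sum_{j=1}^d N_j^{\boxdot}. \tag{13.18} \]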
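13.5.2 Tensor Subspace Representation

For $\mathbf{v} = \sum_{\mathbf{i}\in\mathbf{J}'} \mathbf{a}'_{\mathbf{i}} \bigotimes_{j=1}^d b_{i_j}^{\prime(j)} \in \mathcal{T}_{\mathbf{r}'}(\mathbf{V})$ and $\mathbf{w} = \sum_{\mathbf{k}\in\mathbf{J}''} \mathbf{a}''_{\mathbf{k}} \bigotimes_{j=1}^d b_{k_j}^{\prime\prime(j)} \in \mathcal{T}_{\mathbf{r}''}(\mathbf{W})$ we conclude from (13.17) that
\[ \mathbf{x} := \mathbf{v} \boxdot \mathbf{w} = \sum_{\mathbf{i}\in\mathbf{J}'} \sum_{\mathbf{k}\in\mathbf{J}''} \mathbf{a}'_{\mathbf{i}}\, \mathbf{a}''_{\mathbf{k}} \bigotimes_{j=1}^d \big( b_{i_j}^{\prime(j)} \boxdot b_{k_j}^{\prime\prime(j)} \big). \]
We may define the frame $\mathbf{b}^{(j)} := (b_i^{\prime(j)} \boxdot b_k^{\prime\prime(j)} : i \in J'_j,\ k \in J''_j)$ and the subspace $U_j = \mathrm{span}(\mathbf{b}^{(j)}) \subset X_j$. Then a possible representation is
\[ \mathbf{x} = \sum_{\mathbf{m}\in\mathbf{J}} \mathbf{a}_{\mathbf{m}} \bigotimes_{j=1}^d b_{m_j}^{(j)} \qquad \text{with } \mathbf{J} := \times_{j=1}^d J_j, \quad J_j := J'_j \times J''_j, \tag{13.19} \]
\[ b_{m_j}^{(j)} := b_{m'_j}^{\prime(j)} \boxdot b_{m''_j}^{\prime\prime(j)} \in \mathbf{b}^{(j)} \qquad \text{for } m_j := (m'_j, m''_j) \in J_j, \]
\[ \mathbf{a}_{\mathbf{m}} := \mathbf{a}'_{\mathbf{m}'}\, \mathbf{a}''_{\mathbf{m}''} \qquad \text{with } \mathbf{m} = ((m'_1, m''_1), \ldots, (m'_d, m''_d)) \]
and $\mathbf{m}' = (m'_1, \ldots, m'_d)$, $\mathbf{m}'' = (m''_1, \ldots, m''_d)$.

The cost for computing all frame vectors $b_{m_j}^{(j)} \in \mathbf{b}^{(j)}$ is $\#J_j\, N_j^{\boxdot}$. The coefficient tensor $\mathbf{a}_{\mathbf{m}}$ requires $\#\mathbf{J}$ multiplications. The total work is
\[ N_{\boxdot}^{\mathcal{T}_{\mathbf{r}}} = \sum_{j=1}^d \#J'_j\, \#J''_j\, N_j^{\boxdot} + \prod_{j=1}^d \#J'_j\, \#J''_j . \]
In general, $\mathbf{b}^{(j)}$ is only a frame. Therefore an additional orthonormalisation of $\mathbf{b}^{(j)}$ may be desired. By (13.15b), the additional cost is
\[ 2d\,(r^2)^{d+1} + 2dn\,(r^2)^2 = 2dnr^4 + 2dr^{2d+2}. \]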
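13.5.3 Hierarchical Representation

Let $\mathbf{v} \in \mathcal{H}_{\mathfrak{r}'}(\mathbf{V})$ and $\mathbf{w} \in \mathcal{H}_{\mathfrak{r}''}(\mathbf{W})$ be two tensors described in two different hierarchical formats, but with the same dimension partition tree $T_D$. Since $\mathbf{v} = \sum_\ell c_\ell^{\prime(D)} \mathbf{b}_\ell^{\prime(D)}$ and $\mathbf{w} = \sum_k c_k^{\prime\prime(D)} \mathbf{b}_k^{\prime\prime(D)}$, we start from
\[ \mathbf{v} \boxdot \mathbf{w} = \sum_{\ell=1}^{r'_d} \sum_{k=1}^{r''_d} c_\ell^{\prime(D)} c_k^{\prime\prime(D)}\, \mathbf{b}_\ell^{\prime(D)} \boxdot \mathbf{b}_k^{\prime\prime(D)} \tag{13.20a} \]
and use the recursion
\[ \mathbf{b}_\ell^{\prime(\alpha)} \boxdot \mathbf{b}_k^{\prime\prime(\alpha)} = \sum_{i=1}^{r'_{\alpha_1}} \sum_{j=1}^{r'_{\alpha_2}} \sum_{m=1}^{r''_{\alpha_1}} \sum_{n=1}^{r''_{\alpha_2}} c_{ij}^{\prime(\alpha,\ell)} c_{mn}^{\prime\prime(\alpha,k)} \big( \mathbf{b}_i^{\prime(\alpha_1)} \boxdot \mathbf{b}_m^{\prime\prime(\alpha_1)} \big) \otimes \big( \mathbf{b}_j^{\prime(\alpha_2)} \boxdot \mathbf{b}_n^{\prime\prime(\alpha_2)} \big) \tag{13.20b} \]
(cf. (11.20)), which terminates at the leaves of $T_D$. In the first approach we accept the frame $\mathbf{b}^{(\alpha)}$ consisting of the $r'_\alpha r''_\alpha$ vectors $\mathbf{b}_\ell^{\prime(\alpha)} \boxdot \mathbf{b}_k^{\prime\prime(\alpha)}$ ($1 \le \ell \le r'_\alpha$, $1 \le k \le r''_\alpha$) describing the subspace $\mathbf{U}_\alpha$. The computation of $\mathbf{b}^{(j)}$, $1 \le j \le d$, costs $\sum_{j=1}^d r'_j r''_j N_j^{\boxdot}$ operations. Denote the elements of $\mathbf{b}^{(\alpha)}$ by $\mathbf{b}_m^{(\alpha)}$ with $m \in J_\alpha := \{1,\ldots,r'_\alpha\} \times \{1,\ldots,r''_\alpha\}$, i.e., $\mathbf{b}_m^{(\alpha)} = \mathbf{b}_\ell^{\prime(\alpha)} \boxdot \mathbf{b}_k^{\prime\prime(\alpha)}$ if $m = (\ell,k)$. Then (13.20b) yields the relation
\[ \mathbf{b}_m^{(\alpha)} = \sum_{p\in J_{\alpha_1}} \sum_{q\in J_{\alpha_2}} c_{pq}^{(\alpha,m)}\, \mathbf{b}_p^{(\alpha_1)} \otimes \mathbf{b}_q^{(\alpha_2)} \qquad \text{with } c_{pq}^{(\alpha,m)} := c_{p_1 q_1}^{\prime(\alpha,\ell)}\, c_{p_2 q_2}^{\prime\prime(\alpha,k)} \tag{13.20c} \]
for $p = (p_1,p_2) \in J_{\alpha_1}$, $q = (q_1,q_2) \in J_{\alpha_2}$. The new coefficient matrix $C^{(\alpha,m)}$ is the Kronecker product
\[ C^{(\alpha,m)} = C^{\prime(\alpha,\ell)} \otimes C^{\prime\prime(\alpha,k)} \qquad \text{for } m = (\ell,k). \]
Its explicit computation requires $\#J_{\alpha_1} \#J_{\alpha_2} = r'_{\alpha_1} r''_{\alpha_1} r'_{\alpha_2} r''_{\alpha_2}$ multiplications. Equation (13.20a) can be rewritten as
\[ \mathbf{v} \boxdot \mathbf{w} = \sum_{m\in J_d} c_m^{(D)} \mathbf{b}_m^{(D)} \qquad \text{with } c_m^{(D)} := c_{m_1}^{\prime(D)} c_{m_2}^{\prime\prime(D)} \text{ for } m = (m_1,m_2), \]
involving $\#J_d = r'_d r''_d$ multiplications. The result $\mathbf{v} \boxdot \mathbf{w}$ is represented in $\mathcal{H}_{\mathfrak{r}}(\mathbf{X})$ with representation ranks $r_\alpha := r'_\alpha r''_\alpha$. Altogether, the computational cost is
\[ N_{\boxdot}^{\mathcal{H}_{\mathfrak{r}}} = \sum_{j=1}^d r'_j r''_j N_j^{\boxdot} + \sum_{\alpha\in T_D\setminus L(T_D)} r'_\alpha r''_\alpha r'_{\alpha_1} r''_{\alpha_1} r'_{\alpha_2} r''_{\alpha_2} + r'_d r''_d \le dr^2 N + (d-1)\,r^6 + 1, \]
where $N := \max_j N_j^{\boxdot}$. In terms of $r_\alpha = r'_\alpha r''_\alpha$ and $\bar{r} := \max r_\alpha$, we have $N_{\boxdot}^{\mathcal{H}_{\mathfrak{r}}} \le d\bar{r}N + (d-1)\,\bar{r}^3 + 1$.

By (13.16b) with $r$ replaced by $r^2$, an additional orthonormalisation of the frame requires $2dnr^4 + 6dr^8$ + (lower order terms) operations.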
13.6 Hadamard Product of Tensors

The Hadamard product $\odot$ defined in §4.6.4 is of the form (13.17). For $V_j=K^{n_j}$ the number of arithmetic operations is given by $N_j^{\odot}=n_j$ replacing $N_j$. Therefore the considerations in §13.5 yield the following costs for the different formats:

$$N^{\mathrm{full}}_{\odot}=\prod_{j=1}^{d} n_j, \tag{13.21a}$$

$$N^{\mathcal{R}_r}_{\odot}=r_v\cdot r_w\sum_{j=1}^{d} n_j\le d r^2 n\qquad\text{with } r:=\max\{r_v,r_w\}, \tag{13.21b}$$

$$N^{\mathcal{T}_r}_{\odot}=\sum_{j=1}^{d} n_j\,\#J'_j\,\#J''_j+\prod_{j=1}^{d}\#J'_j\,\#J''_j\le d n r^2+r^{2d}, \tag{13.21c}$$

$$N^{\mathcal{H}_r}_{\odot}=\sum_{j=1}^{d} r'_j r''_j n_j+\sum_{\alpha\in T_D\setminus L(T_D)} r'_\alpha r''_\alpha\, r'_{\alpha_1}r''_{\alpha_1}r'_{\alpha_2}r''_{\alpha_2}+r'_D r''_D\le d r^2 n+(d-1)\,r^6+1, \tag{13.21d}$$

where $n:=\max_j\{n_j\}$. Here $r$ has the value $\max_j\{\#J'_j,\#J''_j\}$ in (13.21c), while $r:=\max_\alpha\{r'_\alpha,r''_\alpha\}$ in (13.21d). For an additional orthonormalisation of the frames obtained for the formats $\mathcal{T}_r$ and $\mathcal{H}_r$ compare the remarks in §13.5.2 and §13.5.3.

Above, we have considered the Hadamard product as an example of a binary operation $\odot:\mathbf{V}\times\mathbf{V}\to\mathbf{V}$. Consider $\mathbf{h}:=\mathbf{g}\odot\mathbf{f}$ with fixed $\mathbf{g}\in\mathbf{V}$. Then $\mathbf{f}\mapsto\mathbf{h}$ is a linear mapping and $G\in L(\mathbf{V},\mathbf{V})$ defined by

$$G(\mathbf{f}):=\mathbf{g}\odot\mathbf{f} \tag{13.22}$$
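is a linear multiplication operator. On the level of matrices, $G$ is the diagonal matrix formed from the vector $\mathbf{g}$: $G:=\operatorname{diag}\{g_{\mathbf{i}}:\mathbf{i}\in\mathbf{I}\}$.

Remark 13.10. If $\mathbf{g}$ is given in one of the formats $\mathcal{R}_r$, $\mathcal{T}_r$, $\mathcal{H}_r$, the matrix $G$ has a quite similar representation in $\mathcal{R}_r$, $\mathcal{T}_r$, $\mathcal{H}_r$ with vectors replaced by diagonal matrices:

$$\mathbf{g}=\sum_i\bigotimes_{j=1}^{d} g_i^{(j)}\ \Rightarrow\ G=\sum_i\bigotimes_{j=1}^{d} G_i^{(j)}\quad\text{with } G_i^{(j)}:=\operatorname{diag}\{g_i^{(j)}[\nu]:\nu\in I_j\},$$

$$\mathbf{g}=\sum_{\mathbf{i}} a[\mathbf{i}]\bigotimes_{j=1}^{d} b_{i_j}^{(j)}\ \Rightarrow\ G=\sum_{\mathbf{i}} a[\mathbf{i}]\bigotimes_{j=1}^{d} B_{i_j}^{(j)}\quad\text{with } B_{i_j}^{(j)}:=\operatorname{diag}\{b_{i_j}^{(j)}[\nu]:\nu\in I_j\},$$

and analogously for $\mathcal{H}_r$. Even the storage requirements are identical if we exploit that diagonal matrices are characterised by the diagonal entries.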
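The cost (13.21b) and Remark 13.10 can be illustrated for the $r$-term format. The sketch below assumes that an $r$-term tensor is stored as a list of $d$ factor matrices of shape $(n_j, r)$; this storage layout and all function names are hypothetical choices for illustration. The dense assembly in `full_from_rterm` is only meant for checking small examples.

```python
import numpy as np

def hadamard_rterm(v_factors, w_factors):
    """Hadamard product of two r-term tensors.

    Each input is a list of d arrays of shape (n_j, r); the result has
    r_v * r_w terms and costs r_v * r_w * sum_j n_j multiplications.
    """
    result = []
    for V, W in zip(v_factors, w_factors):
        n = V.shape[0]
        # column (i, k) of the new factor is V[:, i] * W[:, k]
        result.append((V[:, :, None] * W[:, None, :]).reshape(n, -1))
    return result

def full_from_rterm(factors):
    """Assemble the full tensor from its r-term factors (small sizes only)."""
    shape = tuple(F.shape[0] for F in factors)
    full = np.zeros(shape)
    for i in range(factors[0].shape[1]):
        term = factors[0][:, i]
        for F in factors[1:]:
            term = np.multiply.outer(term, F[:, i])
        full += term
    return full

def multiplication_operator_factors(g_factors):
    """Remark 13.10: replace every factor vector by the diagonal matrix it defines."""
    return [[np.diag(G[:, i]) for i in range(G.shape[1])] for G in g_factors]
```

A practical implementation would store only the diagonals, as the remark points out; the explicit `np.diag` matrices are used here to make the correspondence with $G$ visible.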
13 Tensor Operations
13.7 Convolution of Tensors

We assume that the convolution operations $\star: V_j\times V_j\to V_j$ are defined and satisfy (13.17). For $V_j=K^{n_j}$ we expect $N_j^{\star}=O(n_j\log n_j)$ replacing $N_j$. A realisation of the convolution of functions with similar operation count ($n_j$: data size of the function representation) is discussed in [133]. The algorithms in §13.5 with $\star$ instead of $\boxdot$ require the following costs:

$$N^{\mathrm{full}}_{\star}\le O(d\,n^{d}\log n), \tag{13.23a}$$

$$N^{\mathcal{R}_r}_{\star}\le O(d r^2 n\log n), \tag{13.23b}$$

$$N^{\mathcal{T}_r}_{\star}\le O(d r^2 n\log n)+r^{2d}, \tag{13.23c}$$

$$N^{\mathcal{H}_r}_{\star}\le O(d r^2 n\log n)+(d-1)\,r^6. \tag{13.23d}$$

The same comment as above applies to an orthonormalisation. A cheaper realisation of the convolution will be proposed in §14.3, which, in suitable cases, may lead to $N_j^{\star}=O(\log n_j)$.
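The operation count $O(n_j\log n_j)$ per direction is the one obtained from an FFT-based convolution. The following sketch assumes real vectors and the usual (non-periodic) discrete convolution; the function names are hypothetical. It also uses property (13.17): the convolution of two elementary tensors is the elementary tensor of the directional convolutions.

```python
import numpy as np

def fft_convolve(a, b):
    """Linear convolution of two real vectors in O(n log n) via zero-padded FFT."""
    n = a.size + b.size - 1
    m = 1 << (n - 1).bit_length()          # next power of two >= n
    spectrum = np.fft.rfft(a, m) * np.fft.rfft(b, m)
    return np.fft.irfft(spectrum, m)[:n]

def convolve_elementary(a_factors, b_factors):
    """Convolution of two elementary tensors, performed factor by factor."""
    return [fft_convolve(a, b) for a, b in zip(a_factors, b_factors)]
```

For $r$-term tensors the same routine would be applied to all $r_v r_w$ pairs of factors in every direction, which gives the bound (13.23b).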
13.8 Matrix-Matrix Multiplication

Let $\mathbf{V}:=L(\mathbf{R},\mathbf{S})$, $\mathbf{W}:=L(\mathbf{S},\mathbf{T})$, and $\mathbf{X}:=L(\mathbf{R},\mathbf{T})$ be matrix spaces with $\mathbf{R}=\bigotimes_{j=1}^{d}R_j$, $\mathbf{S}=\bigotimes_{j=1}^{d}S_j$, $\mathbf{T}=\bigotimes_{j=1}^{d}T_j$. Matrix-matrix multiplication is a binary operation satisfying (13.17). In the case of $R_j=K^{n_j^R}$, $S_j=K^{n_j^S}$, $T_j=K^{n_j^T}$, the standard matrix-matrix multiplication of $A'_j\in L(R_j,S_j)$ and $A''_j\in L(S_j,T_j)$ requires $N_j^{MMM}=2n_j^R n_j^S n_j^T$ arithmetic operations. This leads to the following costs:

$$N^{\mathrm{full}}_{MMM}=2\prod_{j=1}^{d} n_j^R n_j^S n_j^T,$$

$$N^{\mathcal{R}_r}_{MMM}=2\,r_R\cdot r_S\sum_{j=1}^{d} n_j^R n_j^S n_j^T,$$

$$N^{\mathcal{T}_r}_{MMM}=2\sum_{j=1}^{d}\#J'_j\,\#J''_j\, n_j^R n_j^S n_j^T+\prod_{j=1}^{d}\#J'_j\,\#J''_j\le 2dr^2n^3+r^{2d},$$

$$N^{\mathcal{H}_r}_{MMM}=2\sum_{j=1}^{d} r'_j r''_j\, n_j^R n_j^S n_j^T+\sum_{\alpha\in T_D\setminus L(T_D)} r'_\alpha r''_\alpha\, r'_{\alpha_1}r''_{\alpha_1}r'_{\alpha_2}r''_{\alpha_2}+r'_D r''_D\le 2dr^2n^3+(d-1)\,r^6+r^2.$$
Note, however, that the matrix-matrix multiplication of hierarchical matrices of size $n_j\times n_j$ requires only $N_j^{MMM}=O(n\log^{*}n)$ operations (cf. [138, §7.8.3]).

Often, one is interested in symmetric ($K=\mathbb{R}$) or Hermitian matrices ($K=\mathbb{C}$):

$$\mathbf{M}=\mathbf{M}^{\mathsf H}. \tag{13.24}$$

Sufficient conditions are given in the following lemma.

Lemma 13.11. (a) Format $\mathcal{R}_r$: $\mathbf{M}=\sum_i\bigotimes_{j=1}^{d}M_i^{(j)}$ satisfies (13.24) if $M_i^{(j)}=(M_i^{(j)})^{\mathsf H}$.

(b) Format $\mathcal{T}_r$: $\mathbf{M}=\sum_{\mathbf{i}}a[\mathbf{i}]\bigotimes_{j=1}^{d}b_{i_j}^{(j)}$ satisfies (13.24) if $b_{i_j}^{(j)}=(b_{i_j}^{(j)})^{\mathsf H}$ holds for the basis of the matrix space $V_j$.

(c) Format $\mathcal{H}_r$: $\mathbf{M}\in\mathcal{H}_r$ satisfies (13.24) if the bases $B_j=(b_i^{(j)})_{1\le i\le r_j}\subset V_j$ in (11.24) consist of Hermitian matrices: $b_i^{(j)}=(b_i^{(j)})^{\mathsf H}$.

Proof. See Exercise 4.162. □
13.9 Matrix-Vector Multiplication

We distinguish the following cases:

(a) the matrix $\mathbf{A}\in L(\mathbf{V},\mathbf{W})$ and the vector $\mathbf{v}\in\mathbf{V}$ are given in the same format,

(b) the vector $\mathbf{v}\in\mathbf{V}$ is given in one of the formats, while $\mathbf{A}$ is of one of the following special forms:

$$\mathbf{A}=A^{(1)}\otimes I\otimes\dots\otimes I+I\otimes A^{(2)}\otimes I\otimes\dots\otimes I+\dots+I\otimes\dots\otimes I\otimes A^{(d)}, \tag{13.25a}$$

$$\mathbf{A}=\bigotimes_{j=1}^{d}A^{(j)}, \tag{13.25b}$$

$$\mathbf{A}=\sum_{i=1}^{p}\bigotimes_{j=1}^{d}A_i^{(j)} \tag{13.25c}$$

with $A^{(j)},A_i^{(j)}\in L(V_j,W_j)$ (where $V_j=W_j$ in the case of (13.25a)). We assume $n_j=\dim(V_j)$ and $m_j=\dim(W_j)$. A matrix as in (13.25a) occurs for separable differential operators and their discretisations (cf. Definition 9.65). (13.25b) describes a general elementary tensor, and (13.25c) is the general $p$-term format.
13.9.1 Identical Formats

Matrix-vector multiplication is again of the form (13.17). The standard cost of $A^{(j)}v^{(j)}$ is $2n_jm_j$ (for hierarchical matrices the computational cost can be reduced to $O((n_j+m_j)\log(n_j+m_j))$, cf. [138, Lemma 7.17]). §13.5 shows that

$$N^{\mathrm{full}}_{MVM}=2\prod_{j=1}^{d} n_j m_j,$$

$$N^{\mathcal{R}_r}_{MVM}=2\,r_v\cdot r_w\sum_{j=1}^{d} n_j m_j,$$

$$N^{\mathcal{T}_r}_{MVM}=2\sum_{j=1}^{d}\#J'_j\,\#J''_j\, n_j m_j+\prod_{j=1}^{d}\#J'_j\,\#J''_j\le 2dr^2nm+r^{2d},$$

$$N^{\mathcal{H}_r}_{MVM}=2\sum_{j=1}^{d} r'_j r''_j\, n_j m_j+\sum_{\alpha\in T_D\setminus L(T_D)} r'_\alpha r''_\alpha\, r'_{\alpha_1}r''_{\alpha_1}r'_{\alpha_2}r''_{\alpha_2}+r'_D r''_D\le 2dr^2nm+(d-1)\,r^6+1,$$

where $n:=\max_j n_j$ and $m:=\max_j m_j$.
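13.9.2 Separable Form (13.25a)

Let $\mathbf{v}\in\mathbf{V}$ be given in full format. $\mathbf{w}=(A^{(1)}\otimes I\otimes I\otimes\dots\otimes I)\mathbf{v}$ has the explicit description $\mathbf{w}[i_1\dots i_d]=\sum_{k_1=1}^{n_1}A^{(1)}_{i_1k_1}\mathbf{v}[k_1i_2\dots i_d]$. Its computation for all $i_1,\dots,i_d$ takes $2n_1^2n_2\cdots n_d$ operations. This proves

$$N^{\mathrm{full}}_{(13.25a)}=2\Bigl(\sum_{j=1}^{d}n_j\Bigr)\prod_{k=1}^{d}n_k\le 2dn^{d+1}. \tag{13.26a}$$

Next we consider the tensor $\mathbf{v}=\sum_{i=1}^{r}\bigotimes_{j=1}^{d}v_i^{(j)}\in\mathcal{R}_r$ in $r$-term format. Multiplication by $\mathbf{A}$ in (13.25a) leads to the following cost and representation rank:

$$N^{\mathcal{R}_r}_{(13.25a)}=r\sum_{j=1}^{d}(2n_j-1)\,n_j\le 2drn^2,\qquad \mathbf{A}\mathbf{v}\in\mathcal{R}_{d\cdot r}. \tag{13.26b}$$

In the tensor subspace case, $\mathbf{v}\in\bigotimes_{j=1}^{d}U_j$ is mapped into $\mathbf{w}=\mathbf{A}\mathbf{v}\in\bigotimes_{j=1}^{d}Y_j$, where $U_j\subset V_j$ and $Y_j\subset W_j$. Its representation is $\mathbf{v}=\sum_{\mathbf{k}\in\mathbf{J}}a_{\mathbf{k}}\bigotimes_{j=1}^{d}b_{k_j}^{(j)}$.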
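For $\mathbf{A}$ in (13.25a) the resulting subspaces are $Y_j=\operatorname{span}\{U_j,A^{(j)}U_j\}$, which can be generated by the frame $(b_k^{(j)},A^{(j)}b_k^{(j)}:1\le k\le r_j)$ of the size $r_j^w:=2r_j$. Denote these vectors by $(b_{k,w}^{(j)})_{1\le k\le r_j^w}$ with

$$b_{k,w}^{(j)}:=b_k^{(j)}\quad\text{and}\quad b_{k+r_j,w}^{(j)}:=A^{(j)}b_k^{(j)}\qquad\text{for } 1\le k\le r_j.$$

Then $\mathbf{w}=\sum_{\mathbf{k}\in\mathbf{J}_w}\mathbf{b}_{\mathbf{k}}\bigotimes_{j=1}^{d}b_{k_j,w}^{(j)}$ holds with

$$\mathbf{b}_{k_1\cdots k_{j-1},\,k_j+r_j,\,k_{j+1}\cdots k_d}=a_{k_1\cdots k_{j-1},\,k_j,\,k_{j+1}\cdots k_d}\quad\text{for } 1\le k_\ell\le r_\ell,\ 1\le j,\ell\le d,\qquad \mathbf{b}_{\mathbf{k}}=0\ \text{otherwise}.$$

The only arithmetic computations occur for $b_{k+r_j,w}^{(j)}:=A^{(j)}b_k^{(j)}$ (cost: $(2n_j-1)n_j$ operations), while $\mathbf{b}_{\mathbf{k}}$ needs only copying of data. However, note that the size of the coefficient tensor $\mathbf{b}$ is increased by the factor $2^d$: the new index set $\mathbf{J}_w$ has the cardinality $\#\mathbf{J}_w=\prod_{j=1}^{d}r_j^w=2^d\prod_{j=1}^{d}r_j=2^d\,\#\mathbf{J}$. We summarise:

$$N^{\mathcal{T}_r}_{(13.25a)}=\sum_{j=1}^{d}(2n_j-1)\,n_j\le 2drn^2,\qquad r_j^w=2r_j,\qquad \#\mathbf{J}_w=2^d\,\#\mathbf{J}. \tag{13.26c}$$

For $\mathbf{v}$ given in the hierarchical format, we obtain

$$N^{\mathcal{H}_r}_{(13.25a)}\le 2drn^2, \tag{13.26d}$$

as detailed for the case of (13.25b) below.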
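The statement $\mathbf{A}\mathbf{v}\in\mathcal{R}_{d\cdot r}$ in (13.26b) can be made concrete as follows. This is a minimal sketch assuming the $r$-term tensor is stored as a list of factor matrices of shape $(n_j, r)$ and the operator (13.25a) as a list of square matrices; both the layout and the function names are hypothetical. Only the $d$ products $A^{(j)}V_j$ involve arithmetic; all other factors are copied.

```python
import numpy as np

def apply_kronecker_sum(A_list, factors):
    """Apply A = sum_s I x ... x A^(s) x ... x I to an r-term tensor.

    Returns factor matrices of shape (n_j, d * r); term s * r + i of the
    result is the i-th term of v with its s-th factor multiplied by A^(s).
    """
    d = len(factors)
    out = []
    for j in range(d):
        AV = A_list[j] @ factors[j]        # the only arithmetic work for direction j
        blocks = [AV if s == j else factors[j] for s in range(d)]
        out.append(np.hstack(blocks))
    return out

def full_from_rterm(factors):
    """Assemble the full tensor from its r-term factors (small sizes only)."""
    shape = tuple(F.shape[0] for F in factors)
    full = np.zeros(shape)
    for i in range(factors[0].shape[1]):
        term = factors[0][:, i]
        for F in factors[1:]:
            term = np.multiply.outer(term, F[:, i])
        full += term
    return full
```

The representation rank grows from $r$ to $d\cdot r$, so in practice a truncation would usually follow.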
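13.9.3 Elementary Kronecker Tensor (13.25b)

For $\mathbf{v}$ in full format, multiplication of $\mathbf{A}$ in (13.25b) by $\mathbf{v}$ requires

$$N^{\mathrm{full}}_{(13.25b)}=2\sum_{j=1}^{d}\Bigl(\prod_{k=1}^{j}m_k\Bigr)\Bigl(\prod_{k=j}^{d}n_k\Bigr)\le 2dn^{d+1} \tag{13.27a}$$

operations, where $n:=\max_j\{n_j,m_j\}$. The $r$-term format $\mathbf{v}=\sum_{i=1}^{r}\bigotimes_{j=1}^{d}v_i^{(j)}\in\mathcal{R}_r$ requires computing $A^{(j)}v_i^{(j)}$, leading to

$$N^{\mathcal{R}_r}_{(13.25b)}=r\sum_{j=1}^{d}(2n_j-1)\,m_j\le 2drn^2,\qquad \mathbf{A}\mathbf{v}\in\mathcal{R}_r. \tag{13.27b}$$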
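For the tensor subspace format $\mathbf{v}=\sum_{\mathbf{k}\in\mathbf{J}}a_{\mathbf{k}}\bigotimes_{j=1}^{d}b_{k_j}^{(j)}\in\mathcal{T}_r$ we obtain $\mathbf{w}=\mathbf{A}\mathbf{v}=\sum_{\mathbf{k}\in\mathbf{J}_w}a^w_{\mathbf{k}}\bigotimes_{j=1}^{d}b_{k_j,w}^{(j)}$, where, as in §13.9.2,

$$N^{\mathcal{T}_r}_{(13.25b)}=\sum_{j=1}^{d}(2n_j-1)\,m_j\le 2drn^2,\qquad r_j^w=r_j,\qquad \#\mathbf{J}_w=\#\mathbf{J}. \tag{13.27c}$$

Next, we consider in more detail the case of $\mathbf{v}\in\mathcal{H}_r$. Let $\mathbf{A}^{(\alpha)}:=\bigotimes_{j\in\alpha}A^{(j)}$ be the partial product over $\alpha$ and $\mathbf{v}=\sum_{i=1}^{r_D}c_i^{(D)}b_i^{(D)}\in\mathcal{H}_r$. The product $\mathbf{w}=\mathbf{A}\mathbf{v}=\mathbf{A}^{(D)}\mathbf{v}=\sum_{i=1}^{r_D}c_i^{(D)}\mathbf{A}^{(D)}b_i^{(D)}$ satisfies the recursion

$$\mathbf{A}^{(\alpha)}b_\ell^{(\alpha)}=\sum_{i=1}^{r_{\alpha_1}}\sum_{j=1}^{r_{\alpha_2}}c_{ij}^{(\alpha,\ell)}\,\mathbf{A}^{(\alpha_1)}b_i^{(\alpha_1)}\otimes\mathbf{A}^{(\alpha_2)}b_j^{(\alpha_2)}\qquad(\{\alpha_1,\alpha_2\}=S(\alpha)).$$

At the leaves, $A^{(j)}b_i^{(j)}$ has to be computed for all $1\le j\le d$. Defining frames with $b_{\ell,w}^{(\alpha)}:=\mathbf{A}^{(\alpha)}b_\ell^{(\alpha)}$ for all $\alpha\in T_D$ and $1\le\ell\le r_\alpha$, we obtain the representation $\mathbf{w}\in\mathcal{H}_r$ with identical coefficient matrices $C^{(\alpha,\ell)}$ and $c_i^{(D)}$. Note that the frame vectors $b_{\ell,w}^{(\alpha)}$ are to be computed for the leaves $\alpha\in L(T_D)$ only, i.e., for $\alpha=\{j\}$, $1\le j\le d$. Therefore the computational work is

$$N^{\mathcal{H}_r}_{(13.25b)}=\sum_{j=1}^{d}(2n_j-1)\,m_j\le 2drn^2, \tag{13.27d}$$

while the data size is unchanged.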
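The observation that only the leaf frames change, while the coefficients are reused, already shows up in the tensor subspace format. The sketch below assumes a dense coefficient tensor `core` and basis matrices of shape $(n_j, r_j)$; these names and the storage layout are hypothetical. The hierarchical format would replace `core` by the transfer matrices $C^{(\alpha,\ell)}$, which stay untouched in the same way.

```python
import numpy as np

def tucker_full(core, bases):
    """Assemble the full tensor sum_k core[k] * b_{k_1}^(1) x ... x b_{k_d}^(d)."""
    full = core
    for j, B in enumerate(bases):
        # contract mode j of `full` with the columns of B (shape n_j x r_j)
        full = np.moveaxis(np.tensordot(B, full, axes=([1], [j])), 0, j)
    return full

def apply_elementary_kronecker(A_list, core, bases):
    """Apply A = A^(1) x ... x A^(d): the core is reused, only the bases change."""
    return core, [A @ B for A, B in zip(A_list, bases)]
```

The returned core is the same object as the input core, which mirrors the statement that the data size is unchanged.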
13.9.4 Matrix in p-Term Format (13.25c)

The general case $\mathbf{A}=\sum_{i=1}^{p}\bigotimes_{j=1}^{d}A_i^{(j)}$ in (13.25c) requires the $p$-fold work compared with (13.25b) plus the cost for $p-1$ additions of vectors:

$$N^{\mathrm{full}}_{(13.25c)}=p\,N^{\mathrm{full}}_{(13.25b)}\le 2pdn^{d+1}, \tag{13.28a}$$

$$N^{\mathcal{R}_r}_{(13.25c)}\le 2prdn^2,\qquad \mathbf{w}\in\mathcal{R}_{p\cdot r}, \tag{13.28b}$$

$$N^{\mathcal{T}_r}_{(13.25c)}\le 2prdn^2+(p-1)\,N^{\mathcal{T}_r}_{+}, \tag{13.28c}$$

$$N^{\mathcal{H}_r}_{(13.25c)}\le 2pdrn^2+(p-1)\,N^{\mathcal{H}_r}_{+}. \tag{13.28d}$$

In the first two cases, the addition is either of lower order (full format) or free of cost. The values of $N^{\mathcal{T}_r}_{+}$ and $N^{\mathcal{H}_r}_{+}$ depend on the choice frame versus basis. In the latter case, $N^{\mathcal{T}_r}_{+}\le 2dnr^2+2dr^{d+1}$ and $N^{\mathcal{H}_r}_{+}\le 8dnr^2+8dr^4$.
13.10 Functions of Tensors, Fixed-Point Iterations

Given a function $f:D\subset\mathbb{C}\to\mathbb{C}$ and a matrix $A$ with spectrum in $D$, a matrix $f(A)$ can be defined.⁹ Details about functions of matrices can be found in Higham [161] or Hackbusch [138, §14]. Examples of such matrix functions are $\exp(A)$, $\exp(tA)$, or $A^{1/2}$, as well as the inverse $A^{-1}$ (corresponding to $f(z)=1/z$). In particular cases, there are fixed-point iterations converging to $f(A)$. In the case of $A^{-1}$, the Newton method yields the iteration

$$X_{m+1}:=2X_m-X_mAX_m, \tag{13.29}$$

which shows local, quadratic convergence. A possible starting value is $X_0:=I$.

Equation (13.29) is an example of a fixed-point iteration. If the desired tensor satisfies $X^*=\Phi(X^*)$, the sequence $X_{m+1}:=\Phi(X_m)$ converges to $X^*$ if $\Phi$ is contractive. Assuming that the evaluation of $\Phi$ involves only the operations studied in this chapter, $\Phi(X_m)$ is available. However, since the operations cause an increase of the representation ranks, the next iteration must be preceded by a truncation:

$$\tilde X_{m+1}:=T(\Phi(X_m))\qquad(T:\ \text{truncation}).$$

The resulting iteration is called the 'truncated iteration' and has been studied in Hackbusch–Khoromskij–Tyrtyshnikov [150] (see also Hackbusch [138, §15.3.2]). In essence, the error decreases as in the original iteration until the iterates reach a neighbourhood of $X^*$ of the size of the truncation error ($X^*$: exact solution).

For the particular iteration (13.29) converging to $X^*=A^{-1}$, a suitable modification is proposed by Oseledets–Tyrtyshnikov [241] (see also [238]). The iteration for $H_k$, $Y_k$, $X_k$ is defined by

$$H_k:=T_0(2I-Y_k),\qquad Y_{k+1}:=T_1(Y_kH_k),\qquad X_{k+1}:=T_1(X_kH_k)$$

and uses a standard truncation $T_1$ and a possibly rougher truncation $T_0$. In the exact case (no truncation), $H_k\to I$, $Y_k\to I$, and $X_k\to A^{-1}$ hold.

We conclude that approximations to $A^{-1}$ can be determined iteratively, provided that we have a sufficiently good starting value. For other functions such as $A^{1/2}$ and $\exp(tA)$ we refer to [138, §15.3.1] and [138, §14.2.2], respectively.

A useful and seemingly simple (nonlinear) function is the maximum of a tensor $\mathbf{v}\in\mathbf{V}=\bigotimes_{j=1}^{d}\mathbb{R}^{I_j}\cong\mathbb{R}^{\mathbf{I}}$ ($\mathbf{I}=\times_{j=1}^{d}I_j$) (cf. Espig et al. [93, §2.3]):

$$\max(\mathbf{v}):=\max\{\mathbf{v}_{\mathbf{i}}:\mathbf{i}\in\mathbf{I}\}.$$

⁹ In the case of a general function, $A$ must be diagonalisable.
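The behaviour of the truncated iteration can be observed on ordinary matrices. The following sketch applies (13.29) to a dense matrix; the optional `truncate` callback is a hypothetical stand-in for the rank truncation $T$ (in the usage below it is simply a rounding of the entries), and the function name is an assumption. With the starting value $X_0=I$ the residual satisfies $I-AX_{m+1}=(I-AX_m)^2$, so convergence requires $\|I-A\|<1$.

```python
import numpy as np

def newton_schulz_inverse(A, X0=None, steps=30, truncate=None):
    """Iterate X <- 2X - X A X, optionally followed by a truncation step."""
    X = np.eye(A.shape[0]) if X0 is None else np.array(X0, dtype=float)
    for _ in range(steps):
        X = 2.0 * X - X @ A @ X
        if truncate is not None:
            X = truncate(X)                # stand-in for T(Phi(X_m))
    return X

# Usage: a matrix close to the identity, with and without "truncation".
rng = np.random.default_rng(0)
E = rng.standard_normal((5, 5))
A = np.eye(5) + 0.3 * E / np.linalg.norm(E, 2)
X_exact = newton_schulz_inverse(A)
X_trunc = newton_schulz_inverse(A, truncate=lambda X: np.round(X, 6))
```

In this usage the untruncated iterate reaches the inverse up to rounding errors, whereas the truncated one stagnates at an error of the size of the rounding, which is the behaviour described for the truncated iteration.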
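Since $\min(\mathbf{v})=-\max(-\mathbf{v})$, this function allows us to determine the maximum norm $\|\mathbf{v}\|_\infty$ of a tensor. The implementation is trivial for an elementary tensor:

$$\max\Bigl(\bigotimes_{j=1}^{d}v^{(j)}\Bigr)=\prod_{j=1}^{d}\max\bigl(v^{(j)}\bigr);$$

however, the implementation for general tensors is not straightforward. A possible approach, already described in Espig [87] and also contained in [92, §4.1], is based on the reformulation as an eigenvalue problem. The tensor $\mathbf{v}\in\mathbf{V}$ corresponds to a multiplication operator $G(\mathbf{v})$ defined in (13.22). Let $\mathbf{I}^*:=\{\mathbf{i}\in\mathbf{I}:\max(\mathbf{v})=\mathbf{v}_{\mathbf{i}}\}$ be the set of indices for which the maximum is attained. Then the eigenvalue problem

$$G(\mathbf{v})\mathbf{u}=\lambda\mathbf{u}\qquad(0\ne\mathbf{u}\in\mathbf{V}) \tag{13.30}$$

has the maximal eigenvalue $\lambda=\max(\mathbf{v})$. The eigenspace consists of all vectors $\mathbf{u}$ with support in $\mathbf{I}^*$. In particular, if $\mathbf{I}^*=\{\mathbf{i}^*\}$ is a singleton, the maximal eigenvalue is a simple one and the eigenvector is a multiple of the unit vector $e^{(\mathbf{i}^*)}$, which has tensor rank 1. Using the simple vector iteration or more advanced methods, we can determine not only the maximum $\max(\mathbf{v})$ but also the corresponding maximising index.

An interesting function in statistics is the characteristic function $\chi_{(a,b)}:\mathbb{R}^{\mathbf{I}}\to\mathbb{R}^{\mathbf{I}}$ of an interval $(a,b)\subset\mathbb{R}$ (including $a=-\infty$ or $b=\infty$) with the pointwise definition

$$\bigl(\chi_{(a,b)}(\mathbf{v})\bigr)_{\mathbf{i}}:=\begin{cases}1 & \mathbf{v}_{\mathbf{i}}\in(a,b)\\ 0 & \text{otherwise}\end{cases}\qquad\text{for } \mathbf{v}\in\mathbb{R}^{\mathbf{I}},\ \mathbf{i}\in\mathbf{I}.$$

This function can be derived from the sign function:

$$(\operatorname{sign}(\mathbf{v}))_{\mathbf{i}}:=\begin{cases}+1 & \mathbf{v}_{\mathbf{i}}>0\\ 0 & \mathbf{v}_{\mathbf{i}}=0\\ -1 & \mathbf{v}_{\mathbf{i}}<0\end{cases}\qquad\text{for } \mathbf{v}\in\mathbb{R}^{\mathbf{I}},\ \mathbf{i}\in\mathbf{I}.$$

In contrast to (13.30), the tensor $\mathbf{u}:=\chi_{(a,b)}(\mathbf{v})$ may have large tensor rank, even for an elementary tensor $\mathbf{v}$. However, in cases of rare events, $\mathbf{u}$ is sparse (cf. Remark 7.2).

In Espig et al. [92, §4.2] an iteration for computing $\operatorname{sign}(\mathbf{v})$ is proposed, using either

$$\mathbf{u}^{k}:=T\Bigl(\tfrac12\bigl(\mathbf{u}^{k-1}+(\mathbf{u}^{k-1})^{-1}\bigr)\Bigr)\qquad(T:\ \text{truncation}) \tag{13.31a}$$

or

$$\mathbf{u}^{k}:=T\Bigl(\tfrac12\,\mathbf{u}^{k-1}\odot\bigl(3\cdot\mathbb{1}-\mathbf{u}^{k-1}\odot\mathbf{u}^{k-1}\bigr)\Bigr) \tag{13.31b}$$

with the constant tensor $\mathbb{1}$ of value 1 (i.e., $\mathbb{1}_{\mathbf{i}}=1$) (cf. Espig et al. [93, §2.4]). Iteration (13.31a) requires a secondary iteration for the pointwise inverse

$$\bigl((\mathbf{u}^{k-1})^{-1}\bigr)_{\mathbf{i}}:=1/\mathbf{u}^{k-1}_{\mathbf{i}}\qquad\text{for } \mathbf{i}\in\mathbf{I}.$$

For numerical examples, see [92, §6].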
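Iteration (13.31b) can be tried out on a full vector, where the Hadamard products are ordinary entry-wise products. The sketch below is an illustration under assumptions that are not taken from the source: the input is scaled by its maximum norm so that all entries lie in $[-1,1]$ (where the entry-wise map $u\mapsto\tfrac12u(3-u^2)$ drives nonzero entries to $\pm1$), the optional `truncate` callback stands in for $T$, and the formula used in `characteristic` for $\chi_{(a,b)}$ with finite $a<b$ is one possible way to derive it from the sign function, valid for entries different from $a$ and $b$.

```python
import numpy as np

def sign_iteration(v, steps=60, truncate=None):
    """Entry-wise sign via u <- u * (3 - u*u) / 2 after scaling to [-1, 1]."""
    u = np.asarray(v, dtype=float)
    scale = np.max(np.abs(u))
    if scale == 0.0:
        return u.copy()
    u = u / scale
    one = np.ones_like(u)                   # the constant tensor of value 1
    for _ in range(steps):
        u = 0.5 * u * (3.0 * one - u * u)   # Hadamard products only
        if truncate is not None:
            u = truncate(u)
    return u

def characteristic(v, a, b, steps=60):
    """chi_(a,b)(v) for finite a < b, assuming no entry equals a or b."""
    v = np.asarray(v, dtype=float)
    return 0.5 * (sign_iteration(v - a, steps) + sign_iteration(b - v, steps))
```

Entries very close to zero relative to the maximum norm need many steps, since the iteration enlarges small values only by a factor of about $3/2$ per step.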
13.11 Example: Operations in Quantum Chemistry Applications

The stationary electronic Schrödinger equation

$$H\Psi:=-\frac12\sum_{i=1}^{d}\Delta_i-\sum_{i=1}^{d}\sum_{k=1}^{M}\frac{Z_k}{|x_i-R_k|}+\sum_{1\le i}\ \dots$$

… n) with $m=2^d$ (or modified according to (i)) by replacing $v\in K^n$ by $\tilde v\in K^m$ with $\tilde v_i=v_i$ ($1\le i\le n$) and $\tilde v_i=0$ ($n<i\le m$). Many operations can be performed with $\tilde v$ instead of $v$.
14.1.2 Format $\mathcal{H}_\rho^{\mathrm{tens}}$

Let $\rho=(\rho_0=1,\rho_1=2,\rho_2,\dots,\rho_{d-1},\rho_d=1)$. In the following we use a particular hierarchical format² $\mathcal{H}_\rho^{\mathrm{tens}}$ based on the linear tree $T_D^{\mathrm{TT}}$, which is almost identical to $\mathcal{T}_\rho$. The parameters of $\mathbf{v}\in\mathcal{H}_\rho^{\mathrm{tens}}$ are

$$\mathbf{v}=\rho^{\mathrm{tens}}_{\mathrm{HT}}\bigl((C_j)_{j=2}^{d},\,c^{(D)}\bigr). \tag{14.5a}$$

¹ Of course, the maximum ranks appearing in $T_K^{\mathrm{TT}}$ and $T_K^{\mathrm{bal}}$ may be different.

² The TT format applied to tensorised quantities has also been called QTT ('quantised TT' or 'quantics TT' in spite of the meaning of quantics; cf. [181] and §3.5.2).
14 Tensorisation
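The coefficients $(C_j)_{j=2}^{d}$ with

$$C_j=\bigl(C^{(j,\ell)}\bigr)_{1\le\ell\le\rho_j}\quad\text{and}\quad C^{(j,\ell)}=\bigl(c^{(j,\ell)}_{ik}\bigr)_{1\le i\le\rho_{j-1},\,1\le k\le 2}\in K^{\rho_{j-1}\times 2}$$

define the basis vectors $\mathbf{b}_\ell^{(1,\dots,j)}$ recursively:

$$\mathbf{b}_\ell^{(1)}=b_\ell^{(1)}\qquad\text{for } j=1 \text{ and } 1\le\ell\le 2, \tag{14.5b}$$

$$\mathbf{b}_\ell^{(j)}=\sum_{i=1}^{\rho_{j-1}}\sum_{k=1}^{2}c^{(j,\ell)}_{ik}\,\mathbf{b}_i^{(j-1)}\otimes b_k^{(j)}\quad(1\le\ell\le\rho_j)\qquad\text{for } j=2,\dots,d. \tag{14.5c}$$

Here the basis vectors $b_\ell^{(1)}$ and $b_k^{(j)}$ are the unit vectors in (14.2a). Finally, the tensor $\mathbf{v}$ is defined by³

$$\mathbf{v}=c^{(D)}\,\mathbf{b}_1^{(d)}. \tag{14.5d}$$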
The differences to the usual hierarchical format are: (i) fixed tree $T_D^{\mathrm{TT}}$, (ii) fixed unit bases (14.2a), (iii) the vertex $\alpha=\{1,\dots,j\}$ is abbreviated by $j$ in the notation $\mathbf{b}_\ell^{(j)}$ (instead of $\mathbf{b}_\ell^{(\alpha)}$) and $c_{ik}^{(j,\ell)}$ (instead of $c_{ik}^{(\alpha,\ell)}$), (iv) $\rho_d=1$ simplifies (14.5d). The data $C_j$ in (14.5a) almost coincide with those of the $\mathcal{T}_\rho$ format (12.1a). For the precise relation we refer to (12.11).

Once the data of $v\in K^n$ are approximated by $\tilde{\mathbf{v}}\in\mathcal{H}_\rho^{\mathrm{tens}}$, the later operations can be performed within this format (cf. §14.1.3). The question remains how $v\in K^n$ can be transferred to $\mathbf{v}=\Phi_n^{-1}(v)\in\mathcal{H}_\rho^{\mathrm{tens}}$ and then approximated by $\tilde{\mathbf{v}}\in\mathcal{H}_\rho^{\mathrm{tens}}$. An exact representation by $\mathbf{v}=\Phi_n^{-1}(v)$ is possible, but requires touching all $n$ data. Hence the cost may be much larger than the later data size of $\tilde{\mathbf{v}}$. Nevertheless, this is the only way if we require an exact approximation error bound. A much cheaper but heuristic approach uses the multivariate cross approximation tools in §15.
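The transfer of a vector $v\in K^n$, $n=2^d$, into a TT-type representation that touches all $n$ data can be sketched with successive singular value decompositions. This is a plain TT-SVD written for illustration, not the $\mathcal{H}_\rho^{\mathrm{tens}}$ data layout of (14.5a): the cores are general arrays of shape $(\rho_{j-1},2,\rho_j)$, the function names are hypothetical, and the index convention $k=\sum_j i_j2^{j-1}$ (first binary digit fastest) is assumed. The truncation is controlled by a relative tolerance on the singular values.

```python
import numpy as np

def qtt_compress(v, tol=1e-12):
    """Tensorise a vector of length 2**d and compress it by successive SVDs."""
    v = np.asarray(v, dtype=float)
    d = int(round(np.log2(v.size)))
    if 2**d != v.size:
        raise ValueError("length must be a power of two")
    cores, rank = [], 1
    rest = v.reshape(1, v.size)                 # shape (rank, remaining entries)
    for _ in range(d - 1):
        m = rest.shape[1] // 2
        # split off the next binary digit and merge it with the rank index
        mat = rest.reshape(rank, 2, m, order="F").reshape(rank * 2, m, order="F")
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = max(1, int(np.sum(s > tol * s[0])))
        cores.append(U[:, :new_rank].reshape(rank, 2, new_rank, order="F"))
        rest = s[:new_rank, None] * Vt[:new_rank]
        rank = new_rank
    cores.append(rest.reshape(rank, 2, 1, order="F"))
    return cores

def qtt_to_vector(cores):
    """Contract the cores and return the vector of length 2**d."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape(-1, order="F")
```

For a grid function of an exponential all ranks returned by this sketch are one, and for a polynomial of degree $N$ they do not exceed $N+1$; the cost of the compression itself is at least proportional to $n$.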
14.1.3 Operations with Tensorised Vectors

We assume (14.1a,b) and represent $\mathbf{v}\in\otimes^dK^2$ by the $\mathcal{H}_\rho^{\mathrm{tens}}$ format (14.5a) with representation ranks $\rho_j$. The family $C_j$ of matrices in (14.5a) is assumed to be orthonormal with respect to the Frobenius norm, implying that the bases in (14.5b,c) are orthonormal.

The storage size of $\mathbf{v}$ follows from (14.3) with $r_{\alpha_2}=2$: $S=2\sum_{j=2}^{d}\rho_j\rho_{j-1}$.

The addition of two tensors $\mathbf{v},\mathbf{w}\in\otimes^dK^2$ with identical data $(C_j)_{2\le j\le d}$ is trivial. In the standard case, there are different data $C_j^v$ and $C_j^w$ and the procedure JoinBases has to be applied (cf. (11.69a) and Remark 11.70). The arithmetic cost $N_{QR}(r_{\alpha_1}\cdot r_{\alpha_2},\,r'_\alpha+r''_\alpha)$ mentioned in Remark 11.70b becomes⁴ $N_{QR}(2(\rho^v_{j-1}+\rho^w_{j-1}),\,\rho^v_j+\rho^w_j)\le 8\rho^3$ for $\rho:=\max_j\{\rho^v_j,\rho^w_j\}$.

The entry-wise evaluation of $\mathbf{v}$ in (14.5a) costs $2\sum_{j=2}^{d}\rho_{j-1}\rho_j$ operations as seen from (13.2). The latter computation uses the $\mathcal{H}_\rho^{\mathrm{tens}}$ data which are directly given by (12.11).

The scalar product $\langle\mathbf{u},\mathbf{v}\rangle$ of two tensors with identical data $(C_j)_{2\le j\le d}$ is trivial as seen from (13.9). Otherwise we may apply the recursion in (13.11). Note that (13.11) simplifies because $\beta_2\in L(T_D^{\mathrm{TT}})$ implies $\langle b'^{(\beta_2)}_j,b''^{(\beta_2)}_n\rangle=\delta_{jn}$ (cf. (14.2a)). The cost of the recursion becomes $7\sum_{j=2}^{d}\rho^v_j\rho^w_j\rho^v_{j-1}\rho^w_{j-1}$. Alternatively, we may join the bases as for the addition above. In fact, this approach is cheaper since it is only cubic in the ranks:

$$4\sum_{j=2}^{d}\bigl(\rho^v_j+\rho^w_j\bigr)^2\bigl(\rho^v_{j-1}+\rho^w_{j-1}\bigr). \tag{14.6}$$

³ Here, we make use of $\rho_d=1$. For $\rho_d>1$, one has $\mathbf{v}=\sum_{\ell=1}^{\rho_d}c^{(D)}_\ell\mathbf{b}^{(d)}_\ell$. In the latter case, several tensors can be based on the same parameters $(C_j)_{j=2}^{d}$.
A binary operation $\boxdot$ between tensors of $\mathbf{V}$ and $\mathbf{W}$ can be performed as in §13.5.3; however, there are two particular features. First, the rank $r_{\alpha_2}$ for the second son, which is a leaf, is $r_{\alpha_2}=2$ (cf. (14.2b)). In the matrix case it may be $r_{\alpha_2}=4$ (cf. §14.1.6). Second, the results of $b'^{(\alpha_2)}_j\boxdot b''^{(\alpha_2)}_n$ in (13.20b) are explicitly known. The basis vectors $b'^{(\alpha_2)}_j$ are from the set $\{\binom{1}{0},\binom{0}{1}\}$ and their products are again either zero or belong to this set (at least for all $\boxdot$ considered here). An example is the Hadamard product $\boxdot=\odot$, where $b'^{(\alpha_2)}_j\odot b''^{(\alpha_2)}_n=\delta_{jn}b'^{(\alpha_2)}_n$. Hence the general frame $b^{(\alpha_2)}$ used in the algorithm of §13.5.3 can be replaced with $\{\binom{1}{0},\binom{0}{1}\}$. Correspondingly, the computational cost is reduced to

$$2\sum_{j=1}^{d-1}\rho^v_j\rho^w_j. \tag{14.7}$$
The Hadamard product is invariant with respect to the tensorisation (cf. (14.1c)):

$$\Phi_n(\mathbf{v}\odot\mathbf{w})=\Phi_n(\mathbf{v})\odot\Phi_n(\mathbf{w}), \tag{14.8}$$

i.e., the tensorisation of the vector-wise Hadamard product $v\odot w$ is expressed by the tensor-wise Hadamard product $\mathbf{v}\odot\mathbf{w}$. This binary operation was already mentioned above.

As in §12.3.7, the HOSVD bases can be computed, on which the truncation is based. The corresponding computational cost described in (12.13) for the general case becomes

$$4\sum_{j=2}^{d}\rho_{j-1}^2\Bigl(\rho_{j-2}+2\rho_j+\tfrac43\rho_{j-1}\Bigr)\le\frac{52}{3}\,(d-1)\max_j\rho_j^3. \tag{14.9}$$

⁴ The transformations from (11.68a–c) do not appear, since the bases $B_j$ are fixed.
14.1.4 Application to Representations by Other Formats

The tensorisation procedure can be combined with other formats in various ways. In the following, we consider the tensor space $\mathbf{V}=\bigotimes_{j=1}^{d}V_j$ with $V_j=K^{n_j}$ and assume for simplicity that $n_j=2^{\delta_j}$ ($\delta_j\in\mathbb{N}$). The tensorisation of the spaces $V_j\cong\mathbf{V}_j=\otimes^{\delta_j}K^2$ leads to

$$\mathbf{V}=\bigotimes_{j=1}^{d}V_j\cong\hat{\mathbf{V}}:=\bigotimes_{j=1}^{d}\bigotimes_{\kappa=1}^{\delta_j}K^2. \tag{14.10}$$

14.1.4.1 Combination with r-Term Format

The $r$-term representation of $\mathbf{v}=\sum_{i=1}^{r}\bigotimes_{j=1}^{d}v_i^{(j)}\in\mathbf{V}$ is based on the vectors $v_i^{(j)}\in V_j$. Following the previous considerations, we replace the vectors $v_i^{(j)}$ with (approximate) tensors $\mathbf{v}_i^{(j)}\in\mathbf{V}_j=\otimes^{\delta_j}K^2$. For the representation of $\mathbf{v}_i^{(j)}$ we use the $\mathcal{H}_\rho^{\mathrm{tens}}$ format (14.5a), involving rank parameters $\rho^{(j)}=(\rho_1^{(j)},\dots,\rho_{\delta_j}^{(j)})$. For $n_j\le n$, the storage requirement of the $r$-term format has been described by $drn$. If a sufficient approximation $\mathbf{v}_i^{(j)}$ of data size $O(\rho^2\log(n_j))=O(\rho^2\delta_j)$ and moderate $\rho=\max_\kappa\rho_\kappa^{(j)}$ exists, the storage can be reduced to $O(dr\rho^2\log(n))$.

Similarly, the cost of operations can be decreased drastically. In Remark 7.16 the cost of the scalar product in $V_j$ is denoted by $N_j$. In the standard case $V_j=K^{n_j}$, we expect $N_j=2n_j-1$ arithmetic operations. Now, with $v_i^{(j)}$ and $w_i^{(j)}$ replaced by $\mathbf{v}_i^{(j)},\mathbf{w}_i^{(j)}\in\mathbf{V}_j$, the cost of the scalar product is

$$N_j=O(\rho^3\log(n_j))\qquad\text{with } \rho=\max_\kappa\{\rho_\kappa^{v,(j)},\rho_\kappa^{w,(j)}\},$$

as can be seen from (14.6). Analogously, the cost of the other operations improves.

14.1.4.2 Combination with Tensor Subspace Format

The data of the tensor subspace format are the coefficient tensor $\mathbf{a}$ and the basis (frame) vectors $b_i^{(j)}\in V_j$ (cf. (8.5c)). Tensorisation can be applied to $b_i^{(j)}\in V_j$ with the same reduction of the storage for $b_i^{(j)}$ as above. Unfortunately, the coefficient tensor $\mathbf{a}$, which requires most of the storage, is not affected. This disadvantage can be avoided by using the hybrid format (cf. §8.2.6).

All operations with tensors from $\mathcal{T}_r$ lead to various operations between the basis vectors from $V_j$. If $b_i^{(j)}\in V_j$ is expressed by the tensorised version $\mathbf{b}_i^{(j)}\in\otimes^{\delta_j}K^2$, these operations may be performed much more cheaply. For instance, the convolution in $V_j$ can be replaced with the algorithm described in §14.3 below.
14.1 Basics
14.1.4.3 Combination with the Hierarchical Format

There are two equivalent ways of integration into the hierarchical format. First, we may replace all basis vectors in $B_j=[b_1^{(j)},\dots,b_{r_j}^{(j)}]\in(U_j)^{r_j}$ with their tensorised version $\mathbf{b}_i^{(j)}\in\otimes^{\delta_j}K^2$, using the $\mathcal{H}_\rho^{\mathrm{tens}}$ format (14.5a). Consequently, all operations involving $b_i^{(j)}$ are replaced with the corresponding tensor operations for $\mathbf{b}_i^{(j)}$.

The second, simpler interpretation extends the dimension partition tree $T_D$ of the hierarchical format. Each leaf vertex $\{j\}$ is replaced with the root of the linear tree $T^{\mathrm{TT}}_{\Delta_j}$ used for the tensorisation, where $\Delta_j=\{1,\dots,\delta_j\}$ (see Fig. 14.1). The resulting extended tree is denoted by $T_D^{\mathrm{ext}}$. The set $L(T_D^{\mathrm{ext}})$ of its leaves is the union $\bigcup_{j=1}^{d}L(T^{\mathrm{TT}}_{\Delta_j})$ of the leaves of $T^{\mathrm{TT}}_{\Delta_j}$. Hence $\dim(V^{(\alpha)})=2$ holds for all $\alpha\in L(T_D^{\mathrm{ext}})$. We may interpret $T_D^{\mathrm{ext}}$ as the dimension partition tree for the tensor space $\hat{\mathbf{V}}$ in (14.10), where a general (possibly balanced) tree structure is combined with the linear tree structure below the vertices $\alpha\in L(T_D)$, which are now inner vertices of $T_D^{\mathrm{ext}}$.

Fig. 14.1 Left: balanced tree with 4 leaves corresponding to $V_j=K^{16}$. The isomorphic tensor spaces $\otimes^4K^2$ are treated by the linear trees below. Right: Extended tree.
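14.1.5 Matricisation

The nontrivial vertices of the linear tree $T_D^{\mathrm{TT}}$ are $\{1,\dots,j\}$ for $j=1,\dots,d$. In Definition 5.3 the matricisation $\mathcal{M}_{\{1,\dots,j\}}(\mathbf{v})$ is defined. In this case $\mathcal{M}_{\{1,\dots,j\}}(\mathbf{v})$ can easily be described by the generating vector $v=\Phi_n(\mathbf{v})\in K^n$ ($n=2^d$):

$$\mathcal{M}_{\{1,\dots,j\}}(\mathbf{v})=\begin{pmatrix}v_0 & v_{2^j} & \cdots & v_{2^d-2^j}\\ v_1 & v_{2^j+1} & \cdots & v_{2^d-2^j+1}\\ \vdots & \vdots & & \vdots\\ v_{2^j-1} & v_{2^{j+1}-1} & \cdots & v_{2^d-1}\end{pmatrix}. \tag{14.11}$$

Hence the columns of $\mathcal{M}_{\{1,\dots,j\}}(\mathbf{v})$ correspond to blocks of the vector with block size $2^j$. An illustration is given below for $\mathcal{M}_3(\mathbf{v})$ in the case of $n=32$. The columns of $\mathcal{M}_3(\mathbf{v})$ consist of the four parts of the vector:

$$v=(\,v_0,\dots,v_7\mid v_8,\dots,v_{15}\mid v_{16},\dots,v_{23}\mid v_{24},\dots,v_{31}\,). \tag{14.12}$$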
We recall that $\rho_j=\operatorname{rank}(\mathcal{M}_{\{1,\dots,j\}}(\mathbf{v}))$. An immediate consequence of (2.5) is the following statement.

Remark 14.2. Any $\mathbf{v}\in\otimes^dK^2$ satisfies $\rho_j\le\min\{2^j,2^{d-j}\}\le 2^{\lfloor d/2\rfloor}\le\sqrt{n}$.
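The matricisation and the rank bound of Remark 14.2 are easy to reproduce numerically. In the sketch below the function names are hypothetical, the columns of the matrix are taken as consecutive blocks of length $2^j$ of the vector, and the rank is determined from the singular values with a relative tolerance, which is an assumption about how 'numerical rank' is measured.

```python
import numpy as np

def unfolding(v, j):
    """Matrix of shape (2**j, 2**(d-j)) whose columns are consecutive blocks of v."""
    v = np.asarray(v)
    return v.reshape((2**j, v.size // 2**j), order="F")

def unfolding_ranks(v, rel_tol=1e-10):
    """Numerical ranks rho_1, ..., rho_{d-1} of the matricisations of v."""
    v = np.asarray(v, dtype=float)
    d = int(round(np.log2(v.size)))
    if 2**d != v.size:
        raise ValueError("length must be a power of two")
    ranks = []
    for j in range(1, d):
        s = np.linalg.svd(unfolding(v, j), compute_uv=False)
        ranks.append(int(np.sum(s > rel_tol * s[0])) if s[0] > 0 else 0)
    return ranks
```

For a vector with random entries the ranks typically attain the bound $\min\{2^j,2^{d-j}\}$, whereas the grid values of an exponential function give rank one for every $j$.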
14.1.6 Generalisation to Matrices

As mentioned above, the original description of the tensorisation technique by Oseledets [235] applies to matrices. Let $M$ be a matrix of the size $n\times n$ with $n=2^d$. Since $\dim(K^{n\times n})=n^2=(2^d)^2=4^d$, the matrix space $K^{n\times n}$ is isomorphic to $\otimes^dK^{2\times 2}$, the tensor product of $2\times 2$ matrices. A possible isomorphism is given by the following counterpart of $\Phi_n$ in (14.1c):

$$\Phi_{n\times n}:\ \mathbf{M}\in\bigotimes_{j=1}^{d}K^{2\times 2}\mapsto M\in K^{n\times n},\qquad M[\nu,\mu]=\mathbf{M}[(\nu_1,\mu_1),\dots,(\nu_d,\mu_d)]$$

with $\nu=\sum_{j=1}^{d}\nu_j2^{j-1}$, $\mu=\sum_{j=1}^{d}\mu_j2^{j-1}$. The latter definition corresponds to the $(d-1)$-fold application of the Kronecker product (1.5) to $2\times 2$ matrices.

Again, the hierarchical format with the linear tree $T_D^{\mathrm{TT}}$ can be used to represent tensors $\mathbf{M}\in\mathbf{V}:=\otimes^dK^{2\times 2}$. Differently from the definitions in (14.2a,b), we now have

$$r_j=4,\qquad b_1^{(j)}=\begin{pmatrix}1&0\\0&0\end{pmatrix},\quad b_2^{(j)}=\begin{pmatrix}0&0\\1&0\end{pmatrix},\quad b_3^{(j)}=\begin{pmatrix}0&1\\0&0\end{pmatrix},\quad b_4^{(j)}=\begin{pmatrix}0&0\\0&1\end{pmatrix}.$$

This fact increases some constants in the storage cost, but does not change the format in principle. Matricisation corresponds to a block representation of the matrix with block size $2^j\times 2^j$ as illustrated in Figure 14.2. This means that the columns of the matrix $\mathcal{M}_{\{1,\dots,j\}}(\mathbf{v})$ are formed by these subblocks.

Fig. 14.2 Matricisation for $n=32$, $j=3$ (subblocks labelled A, B, C).

Concerning operations, matrix operations are of particular interest. The multiplication $M'M''$ is a binary operation. The operation count (14.7) holds with a factor of 4, instead of 2, because $r_j=4$. The matrix-vector multiplication $Mv$ of a matrix $M\in K^{n\times n}$ by $v\in K^n$ using the tensor counterparts $\mathbf{M}\in\otimes^dK^{2\times 2}$ and $\mathbf{v}\in\otimes^dK^2$ is of the same kind.

Finally, we discuss the ranks $\rho_j$ of certain Toeplitz matrices.⁵ The identity matrix or any diagonal matrix with $2^j$ periodic data has rank $\rho_j=1$ since the range of $\mathcal{M}_{\{1,\dots,j\}}(\mathbf{v})$ is spanned by one diagonal block.

⁵ A Toeplitz matrix $M$ is defined by the property that the entries $M_{ij}$ depend on $i-j$ only.
A banded upper triangular Toeplitz matrix with nonzero entries $M_{ik}$ ($i\le k\le 2^\ell$) has ranks $\rho_j\le 1+2^{\ell-j}$ for $j\le\ell$ and $\rho_j=2$ for $j\ge\ell$. The proof can be derived from Figure 14.2. For $j\ge\ell$, the range of $\mathcal{M}_{\{1,\dots,j\}}(\mathbf{v})$ is spanned by the blocks A and B, whereas C is a zero block. For $j\le\ell$, $2^{\ell-j}$ off-diagonal blocks appear. A general Toeplitz matrix of band width $2^\ell$ satisfies $\rho_j\le 1+2\cdot 2^{\ell-j}$.

Simple examples of this kind are tridiagonal Toeplitz matrices, which appear as discretisations of one-dimensional differential equations with constant coefficients. Explicit TT representations of the one-dimensional Laplace discretisation and related matrices are given in [178].
14.2 Approximation of Grid Functions

14.2.1 Grid Functions

In the following, we assume that the vector $v\in K^n$, $n=2^d$, is a grid function; i.e.,

$$v_k=f(a+kh)\quad\text{for } 0\le k\le n-1,\qquad h:=\frac{b-a}{n}, \tag{14.13a}$$

where $f\in C([a,b])$ is sufficiently smooth. If $f\in C((a,b])$ has a singularity at $x=a$, the evaluation of $f(a)$ can be avoided by the definition

$$v_k=f(a+(k+1)h)\quad\text{for } 0\le k\le n-1. \tag{14.13b}$$

Any approximation $\tilde f$ of $f$ yields an approximation $\tilde v$ of $v$.

Remark 14.3. Let $f\in C([a,b])$. For $j\in D$ and $k=0,\dots,2^{d-j}-1$ consider the functions $f_{j,k}(\bullet):=f(a+k2^jh+\bullet)\in C([0,2^jh])$ and the subspace $F_j:=\operatorname{span}\{f_{j,k}:0\le k\le 2^{d-j}-1\}$. Then $\operatorname{rank}_{\{1,\dots,j\}}(\mathbf{v})\le\dim(F_j)$ holds for $\mathbf{v}\in\otimes^dK^2$ with $v=\Phi_n(\mathbf{v})$ satisfying (14.13a) or (14.13b).

Proof. The $k$-th columns of $\mathcal{M}_{\{1,\dots,j\}}(\mathbf{v})$ in (14.11) are evaluations of $f_{j,k}$. Therefore $\operatorname{rank}_{\{1,\dots,j\}}(\mathbf{v})=\operatorname{rank}(\mathcal{M}_{\{1,\dots,j\}}(\mathbf{v}))$ cannot exceed $\dim(F_j)$. □

The tensorisation technique is not restricted to discrete grid functions. In §14.5 we shall describe a version for functions.
14.2.2 Exponential Sums

As seen in Remark 5.19, the exponential function $f(x)=\exp(cx)$ leads to a tensor $\mathbf{v}$ of tensor rank one. Consequently the ranks of other formats, including the TT format, are one: $\mathbf{v}\in\mathcal{H}_\rho^{\mathrm{tens}}$ with $\rho=(1,\dots,1)$ (cf. (14.5a)).

We now suppose that the function $f$ can be approximated in $[a,b]$ by

$$f_r(x):=\sum_{\nu=1}^{r}a_\nu\exp(-\lambda_\nu x) \tag{14.14}$$

with $a_\nu,\lambda_\nu\in K$. Examples with $a_\nu,\lambda_\nu>0$ are given in §9.8.2.3 together with error estimates for the maximum norm $\|f-f_r\|_\infty$. The grid function corresponding to $f_r$ yields a tensor of rank $\le r$. In particular, $\mathbf{v}\in\mathcal{H}_\rho^{\mathrm{tens}}$ with $\rho=(r,\dots,r)$.

If $f$ is periodic in $[a,b]=[0,2\pi]$, the truncated Fourier sum yields (14.14) with imaginary $\lambda_\nu=[\nu-(r+1)/2]\,\mathrm{i}$ for odd $r\in\mathbb{N}$. Closely related are sine or cosine sums $\sum_{\nu=1}^{r}a_\nu\sin(\nu x)$ and $\sum_{\nu=0}^{r-1}a_\nu\cos(\nu x)$ in $[0,\pi]$. Note that periodicity is not necessary and may be replaced by quasi-periodicity, i.e., $\sum_{\nu=1}^{r}a_\nu\exp(-\mu_\nu\mathrm{i}x)$ with arbitrary $\mu_\nu\in\mathbb{R}$.

An example of (14.14) with general complex exponents $\lambda_\nu$ is mentioned in §9.8.2.4.

As seen in Remark 5.19, the grid function $v\in K^n$ corresponding to $f_r$ in (14.14) has a tensorised version $\mathbf{v}=\Phi_n^{-1}(v)\in\otimes^dK^2$ in $\mathcal{R}_r$:

$$\mathbf{v}=\sum_{\nu=1}^{r}a_\nu\bigotimes_{j=1}^{d}\begin{pmatrix}1\\ \exp(-2^{j-1}\lambda_\nu)\end{pmatrix}, \tag{14.15}$$

requiring $2rd=2r\log_2n$ data.⁶
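Formula (14.15) can be verified directly by forming the Kronecker products of the two-dimensional factors. The sketch below assumes the grid points $x=k$ for $0\le k\le 2^d-1$ (any step size can be absorbed into $\lambda_\nu$) and the index convention $k=\sum_j i_j2^{j-1}$; the function names are hypothetical. Since `numpy.kron` lets the last factor vary fastest, the factors are multiplied in reversed order.

```python
import numpy as np
from functools import reduce

def exp_sum_grid(a, lam, d):
    """Grid values f_r(k) = sum_nu a_nu * exp(-lam_nu * k) for k = 0, ..., 2**d - 1."""
    k = np.arange(2**d)
    return sum(a_nu * np.exp(-lam_nu * k) for a_nu, lam_nu in zip(a, lam))

def exp_sum_factors(lam_nu, d):
    """The d factors (1, exp(-2**(j-1) * lam_nu)) of one elementary tensor."""
    return [np.array([1.0, np.exp(-(2**(j - 1)) * lam_nu)]) for j in range(1, d + 1)]

def exp_sum_tensorised(a, lam, d):
    """Vector obtained from the r-term tensor (14.15), first index fastest."""
    terms = []
    for a_nu, lam_nu in zip(a, lam):
        factors = exp_sum_factors(lam_nu, d)
        terms.append(a_nu * reduce(np.kron, reversed(factors)))
    return sum(terms)
```

Only the $2rd$ numbers returned by `exp_sum_factors`, together with the coefficients $a_\nu$, have to be stored; the Kronecker products are formed here only for the comparison with the grid values. Complex exponents, for instance $\lambda=\pm 0.3\,\mathrm{i}$ for a cosine, are covered as well.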
14.2.3 Polynomials

14.2.3.1 Global Polynomials

We recall that $\mathcal{P}_N$ denotes the space of polynomials of degree $\le N$ (cf. §10.4.2.1).

Lemma 14.4. Let $f\in\mathcal{P}_N$ and let $v\in K^n$ be the corresponding grid function. Then the tensorisation satisfies

$$\mathbf{v}\in\mathcal{H}_\rho^{\mathrm{tens}}\qquad\text{with } \rho_j\le\min\{N+1,\,2^j,\,2^{d-j}\}. \tag{14.16}$$

Proof. The space $F_j$ in Remark 14.3 consists of polynomials of degree $\le N$ so that $\dim(F_j)\le N+1$. Therefore Remark 14.3 implies $\rho_j\le N+1$. Together with Remark 14.2, the assertion is proved. □

⁶ The factor $a_\nu$ in (14.15) can be integrated into the first factor. On the other hand, in the special case of (14.15) one need not store the number 1, so that only $r(d+1)$ data remain.
A consequence is the following approximation result.

Conclusion 14.5. Let $f\in C(I)$ be a function with $\varepsilon_N:=\min\{\|f-P\|_\infty:P\in\mathcal{P}_N\}$. Let $v$ be the grid function corresponding to $f$ and $\mathbf{v}$ its tensorisation. The HOSVD truncation of $\mathbf{v}$ to $\rho_j\le N+1$ yields an approximation $\tilde{\mathbf{v}}$ with error bound

$$\|\mathbf{v}-\tilde{\mathbf{v}}\|_\infty\le\sqrt{C}\,\varepsilon_N,\qquad\text{where } C:=\max\{0,\ d+1-2\lfloor 1+\log_2(N+1)\rfloor\}.$$

Proof. The best approximation error is $\varepsilon_{\mathrm{best}}:=\min\{\|\mathbf{v}-\tilde{\mathbf{v}}\|_\infty:\tilde{\mathbf{v}}\in\mathcal{H}_\rho^{\mathrm{tens}}\text{ with }\rho_j\le N+1\}\le\varepsilon_N$. Quasi-optimality of the HOSVD truncation yields the bound $\sqrt{M}\,\varepsilon_{\mathrm{best}}$, where $M$ is the number of vertices $\{1,\dots,j\}$ with $\min\{2^j,2^{d-j}\}>N+1$ (only these vertices require a singular-value truncation). One checks that $M=\max\{0,\ d+1-2\lfloor 1+\log_2(N+1)\rfloor\}$. □

If $f$ is analytic in a Bernstein ellipse $E_\rho(I)$, a bound of $\varepsilon_N$ is described in Theorem 10.27. Such estimates are applied by Khoromskij–Veit [192] to show that the oscillatory integral

$$I(\omega)=\int_J f(t)\,\mathrm{e}^{\mathrm{i}\omega g(t)}\,\mathrm{d}t$$

($J$: interval, $f,g$: smooth functions) as a function of $\omega$ is band-limited and allows an efficient approximation.
14.2.3.2 Piecewise Polynomials

We divide the interval $I=(a,b]$ into $n_1$ subintervals of equal length:

$$I_\nu=(a_{\nu-1},a_\nu]\quad\text{with } a_\nu:=a+(b-a)\frac{\nu}{n_1}\quad\text{for } 1\le\nu\le n_1.$$

On each interval $I_\nu$ we introduce an equidistant grid of $n_2$ grid points. The resulting step size is $h=\frac{b-a}{n}$ with $n:=n_1n_2$. The numbers $n_1$ and $n_2$ are assumed to be powers of 2:

$$n_1=2^{L_1},\qquad n_2=2^{L_2},\qquad n:=n_1n_2,\qquad d=L_1+L_2.$$

The (discontinuous) piecewise polynomial $P$ of degree $\le N$ coincides on all $I_\nu$ with a polynomial of degree $\le N$. We denote the set of these functions by $\mathcal{P}_N^{\mathrm{pw}}$.

Assume that the function $f$ is analytic in the (complex) rectangle

$$R_\zeta=\Bigl\{z=x+\mathrm{i}y:\ a-\frac{L(\zeta-1)^2}{4n_1\zeta}\le x\le b+\frac{L(\zeta-1)^2}{4n_1\zeta},\ |y|\le\frac{L}{4n_1}\Bigl(\zeta-\frac1\zeta\Bigr)\Bigr\},$$

where $L:=b-a$. This rectangle contains the Bernstein ellipses $E_\zeta(I_\nu)$ for all $1\le\nu\le n_1$. This fact together with Corollary 10.28 proves the following result.
Proposition 14.6. If $f$ is holomorphic in $R_\zeta$ with $|f(z)|\le M$ for $z\in R_\zeta$, then, for all $N\in\mathbb{N}$, there is an approximating piecewise polynomial $P\in\mathcal{P}_N^{\mathrm{pw}}$ with

$$|f(x)-P(x)|\le\frac{2\zeta^{-N}}{\zeta-1}\,M\qquad\text{for } x\in I=(a,b].$$

If a rectangle $R$ contains $[a,b]$ in its interior, $R\supset R_\zeta$ holds with $\zeta=O(n_1)$. Let $f$ be holomorphic in $R$ with $|f(z)|\le\mathrm{const}$ on $R$. Then the error estimate $|f(x)-P(x)|\le O(n_1^{-N})$ holds.

Lemma 14.7. Let $P\in\mathcal{P}_N^{\mathrm{pw}}$ be the piecewise polynomial, $v\in K^n$ the corresponding grid function, and $\mathbf{v}$ the tensorisation. Then $\mathbf{v}$ belongs to $\mathcal{H}_\rho^{\mathrm{tens}}$ with

$$\rho_j\le\min\Bigl\{\sqrt{n_1(N+1)},\ N+1,\ 2^j\Bigr\}.$$

Proof. If $j\ge L_2$, each column of (14.11) corresponds to $2^{j-L_2}$ piecewise polynomials. Therefore the rank is bounded by $\rho_j\le 2^{j-L_2}(N+1)$. Instead of the estimate in (14.16) we obtain $\rho_j\le\min\{2^{\max\{0,j-L_2\}}(N+1),\,2^j,\,2^{d-j}\}$ and distinguish three cases.

(a) $j\ge\frac{L_2+d-\log_2(N+1)}{2}$ implies $2^{d-j}\le 2^{(d-L_2+\log_2(N+1))/2}=\sqrt{n_1(N+1)}$.

(b) If $L_2\le j\le\frac{L_2+d-\log_2(N+1)}{2}$, then we have

$$2^{\max\{0,j-L_2\}}(N+1)\le 2^{(d-L_2-\log_2(N+1))/2}(N+1)=\sqrt{n_1(N+1)}.$$

(c) $0\le j\le L_2$ implies $\min\{2^{\max\{0,j-L_2\}}(N+1),\,2^j\}\le\min\{N+1,\,2^j\}$.

This proves the desired rank estimate. □
14.2.3.3 Polynomial Approximations for Asymptotically Smooth Functions Let f ∈ C ∞ ((0, 1]) be a function possibly with a singularity at x = 0 and assume that the derivatives are bounded by (k) f (x) ≤ C k! x−k−a for all k ∈ N, 0 < x ≤ 1 and some a > 0. (14.17) Because of a possible singularity at x = 0 we choose the setting (14.13b). Exercise 14.8. Check condition (14.17) for f (x) = 1/x, 1/x2 , x log x. Functions f satisfying (14.17) are called asymptotically smooth. In fact, f is analytic in (0, 1]. Taylor series at x0 ∈ (0, 1] has the convergence radius x0 . PThe ∞ 1 (k) f (x0 )(x − x0 )k is bounded by The remainder k=N k!
14.2 Approximation of Grid Functions
C x0−a
∞ X
1−
k=N
x x0
519
k =C
x1−a 0 x
1−
x x0
N → 0.
Lemma 14.9. Assume (14.17) and ξ ∈ (0, 1]. Then there is a polynomial p of degree N such that −a C ξ kf − pk[ξ/2,ξ],∞ = max |f (x) − p(x)| ≤ εN,ξ := 3−a−N . 2 4 ξ/2≤x≤ξ PN 1 (k) (x0 )(x − x0 )k . The Proof. Choose x0 = 34 ξ and set p(x) := k=0 k! f remainder in [ξ/2, ξ] is bounded by εN,ξ defined above. t u If an accuracy ε is prescribed, the number Nε satisfying εNε ,ξ ≤ ε is asymptotically 1 1 N = (log + a log 4ξ )/ log . ε 3 Next, we define a piecewise polynomial for a specially chosen partition of the interval I = [ n1 , 1]. Divide the interval I into the subintervals [ n1 , n2 ] = [2−d , 21−d ], (21−d , 22−d ], . . . , ( 21 , 1]. For each interval (2−j , 21−j ] define a polynomial pj ∈ PN (cf. §10.4.2.1) according to Lemma 14.9. Altogether, we obtain a piecewise continuous function pw fN ∈ PN (an hp finite-element approximation) defined on [1/n, 1] satisfying the exponential decay kf − fN k[ 1 ,1],∞ ≤ εN := 2−1+a(d+1) 3−a−N . n
Evaluation of $f_N$ yields the vector entries $v_k = f((k+1)/n)$ and the tensorised version $\mathbf{v} \in \otimes^d \mathbb{K}^2$.

Proposition 14.10. The tensor $\mathbf{v}$ constructed above possesses the $\{1,\ldots,j\}$-rank
$$\rho_j = \operatorname{rank}_{\{1,\ldots,j\}}(\mathbf{v}) = \operatorname{rank}(\mathcal{M}_{\{1,\ldots,j\}}(\mathbf{v})) \le N+2.$$
Proof. For $f_N$ define the functions $f_{N,j,k}$ from Remark 14.3. For $k \ge 1$, $f_{N,j,k}$ is a polynomial of degree $N$; only for $k = 0$ is the function $f_{N,j,0}$ piecewise polynomial. This proves $F_j \subset \mathcal{P}_N + \operatorname{span}\{f_{N,j,0}\}$ and $\dim(F_j) \le N+2$. The assertion follows from Remark 14.3. □

Since $\rho_j$ is the (minimal) TT rank of $\mathbf{v}$ in the $\mathcal{H}_\rho^{\mathrm{tens}}$ representation, the required storage is bounded by $2(d-1)(N+1)^2$. A more general statement of a similar kind for functions with several singularities⁷ is given by Grasedyck [121].

⁷ Then, in the right-hand side of (14.17), $x$ is replaced by the distance of $x$ to the next singularity.
14 Tensorisation
14.2.4 Multiscale Feature and Conclusion

Multiscale considerations of (grid) functions use different grid sizes $h_j = 2^j h$ and look for the behaviour in intervals of the size $h_j$. A typical method exploiting these scales is the wavelet approach. Applying wavelet approximations to asymptotically smooth functions as in (14.17), we would need a small number of wavelet levels on the right side of the interval, while the number of levels is increasing towards the singularity at $x = 0$. Again, we obtain estimates as in Proposition 14.10, showing that the advantages of the wavelet approach carry over to the hierarchical representation of $\mathbf{v} = \Phi_n^{-1}(v)$. From (14.11) we see that the subspace $\mathbf{U}_{\{1,\ldots,j\}} = \operatorname{range}(\mathcal{M}_{\{1,\ldots,j\}}(\mathbf{v}))$ is connected to step size $2^j h$, i.e., to level $d-j$.

The approximation by exponentials, by hp finite elements, or by wavelets helps to reduce the data size $n$ of the uniformly discretised function $v$ to a much smaller size, exploiting the regularity properties of the function. The tensorisation procedure has the same effect. The particular advantage is that the tensor approximation is a black box procedure using singular-value decompositions, whereas the analytical methods mentioned above are chosen depending on the nature of the function and often require finding appropriate partitions or computing optimal coefficients (as, e.g., in (14.14)).
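14.2.5 Local Grid Refinement

Standard multiscale approaches apply local grid refinement in regions in which the approximation error is still too large. The tensorisation approach has fixed a step size $h = 1/n$. The data truncation discussed above can be related to grid coarsening. Nevertheless, a grid refinement is also possible. First, we discuss a prolongation from grid size $h$ to $h/2$.

Remark 14.11 (prolongation). Consider a tensor $\mathbf{v} \in \mathbf{V} := \bigotimes_{j=1}^{d} \mathbb{K}^2$ corresponding to a vector $v \in \mathbb{K}^n$. We introduce an additional vector space $V_0 := \mathbb{K}^2$ and define $\mathbf{v}^{\mathrm{ext}} \in \mathbf{V}^{\mathrm{ext}} := \bigotimes_{j=0}^{d} \mathbb{K}^2$ by
$$\mathbf{v}^{\mathrm{ext}} := \binom{1}{1} \otimes \mathbf{v}. \tag{14.18}$$
The prolongation $P : \mathbf{V} \to \mathbf{V}^{\mathrm{ext}}$ is defined by $\mathbf{v} \mapsto \mathbf{v}^{\mathrm{ext}}$ according to (14.18). The tensor $\mathbf{v}^{\mathrm{ext}} \in \mathbf{V}^{\mathrm{ext}}$ corresponds to the vector $v^{\mathrm{ext}} \in \mathbb{K}^{2n}$ with $\Phi_{2n}^{-1}(v^{\mathrm{ext}}) = \mathbf{v}^{\mathrm{ext}}$ via
$$\mathbf{v}^{\mathrm{ext}}[i_0, i_1, \ldots, i_d] = v^{\mathrm{ext}}\Big[\sum_{j=0}^{d} i_j 2^j\Big].$$
Furthermore, entry $v^{\mathrm{ext}}[i]$ represents the function value at grid point $i\cdot(h/2)$. The prolongation can be regarded as the piecewise constant interpolation since $v^{\mathrm{ext}}[2i] = v^{\mathrm{ext}}[2i+1]$.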
The prolongation increases the data size by one factor $\binom{1}{1}$. Note that the ranks $\rho_j$ are not altered. Now we redefine $\mathbf{V}^{\mathrm{ext}} = \bigotimes_{j=0}^{d} \mathbb{K}^2$ by $\mathbf{V}^{\mathrm{ext}} = \bigotimes_{j=1}^{d+1} \mathbb{K}^2$. Let $\mathbf{v} \in \mathbf{V}^{\mathrm{ext}}$ be any tensor, e.g., in the image of the prolongation. Local refinement of the corresponding grid function in the subinterval
$$\Big[\nu\cdot 2^{j^*}\cdot\frac{h}{2},\ \big((\mu+1)\cdot 2^{j^*}-1\big)\cdot\frac{h}{2}\Big] \subset [0,1]$$
yields a tensor $\mathbf{v}'$, which satisfies the supposition of the next remark. If the refinement is really local, the level number $j^*$ is not large.

Remark 14.12 (local change). Let $\mathbf{v}, \mathbf{v}' \in \otimes^{d+1}\mathbb{K}^2$ be two tensors such that the corresponding vectors $\Phi_{2n}(\mathbf{v})$ and $\Phi_{2n}(\mathbf{v}')$ differ only in the interval $\frac{h}{2}\cdot\big[\nu\cdot 2^{j^*},\ (\mu+1)\cdot 2^{j^*}-1\big]$ for some $1 \le j^* \le d$, $1 \le \nu \le \mu \le 2^{d+1-j^*}$. Then the respective ranks $\rho_j$ and $\rho'_j$ satisfy
$$\rho'_j \le \rho_j + 1 \quad\text{for } j \ge j^*, \qquad \rho'_j \le \min\{\rho_j + 2^{j^*-j},\ 2^j\} \quad\text{for } 1 \le j < j^*.$$
Proof. For $j \ge j^*$, only one block in (14.12) is altered so that the rank increase is bounded by one. For $j < j^*$, at most $2^{j^*-j}$ blocks are involved so that $\rho'_j \le \rho_j + 2^{j^*-j}$. On the other hand, $\rho'_j \le 2^j$ holds for all tensors. □
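The prolongation of Remark 14.11 and the statement that the ranks $\rho_j$ are not altered can be illustrated with dense arrays. This is only a sketch under assumptions: the tensorisation is realised by a column-major reshape (so that $k = \sum_j i_j 2^{j-1}$), the ranks are computed from dense matricisations, and the helper names `tensorise`, `unfolding_ranks` and `prolongate` are hypothetical.

```python
import numpy as np

def tensorise(vec):
    """Reshape a vector of length 2^d into a 2 x ... x 2 tensor
    with entry v[i_1,...,i_d] = vec[sum_j i_j 2^(j-1)]."""
    d = int(round(np.log2(len(vec))))
    return np.reshape(vec, (2,) * d, order="F")

def unfolding_ranks(t):
    """Ranks of the matricisations for the first j directions, j = 1..ndim-1."""
    return [int(np.linalg.matrix_rank(np.reshape(t, (2 ** j, -1), order="F")))
            for j in range(1, t.ndim)]

def prolongate(t):
    """v_ext = (1,1)^T (x) v: a new leading direction with constant entries."""
    return np.multiply.outer(np.ones(2), t)

rng = np.random.default_rng(0)
v = rng.standard_normal(16)
t_ext = prolongate(tensorise(v))
v_ext = t_ext.reshape(-1, order="F")      # vector on the grid of step size h/2
print(unfolding_ranks(tensorise(v)), unfolding_ranks(t_ext))
```

The extended tensor has one additional rank (equal to 1) for the new direction, while the remaining ranks coincide with those of the original tensor.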
14.3 Convolution

14.3.1 Notation

We consider vectors from $\mathbb{K}^n = \mathbb{K}^I$ with $I = \{0, 1, \ldots, n-1\}$. The convolution of $v, w \in \mathbb{K}^n$ is defined by⁸
$$u = v \star w \quad\text{with}\quad u_k = \sum_{\ell=\max\{0,\,k+1-n\}}^{\min\{k,\,n-1\}} v_\ell\, w_{k-\ell} \qquad (0 \le k \le 2n-2). \tag{14.19a}$$
Note that the resulting vector $u$ belongs to $\mathbb{K}^{2n-1}$ since for all $k \in \{0, \ldots, 2n-2\}$ the sum in (14.19a) is nonempty. An easier notation holds for vectors (infinite sequences) from $\ell_0 := \ell_0(\mathbb{N}_0)$ defined in (3.2):
$$\star : \ell_0 \times \ell_0 \to \ell_0, \qquad u = v \star w \quad\text{with}\quad u_k = \sum_{\ell=0}^{k} v_\ell\, w_{k-\ell} \quad\text{for all } k \in \mathbb{N} \tag{14.19b}$$
(check that the result belongs to $\ell_0$, i.e., $u_k = 0$ for almost all $k \in \mathbb{N}$).

⁸ More generally, one may consider the convolution of $v \in \mathbb{K}^n$ and $w \in \mathbb{K}^m$ for different $n, m$. We avoid this trivial generalisation to reduce notational complications.
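In the following, we embed $\mathbb{K}^n$ into $\ell_0$ by identifying $v \in \mathbb{K}^n$ and $\hat{v} \in \ell_0$ with $\hat{v}_k = v_k$ for $0 \le k \le n-1$ and $\hat{v}_k = 0$ for $k \ge n$: $\mathbb{K}^n \subset \ell_0$. A consequence of this identification is the embedding
$$\mathbb{K}^m \subset \mathbb{K}^n \quad\text{for } 1 \le m \le n. \tag{14.20}$$
Now we can rewrite (14.19a) as
$$u = v \star w \quad\text{with}\quad u_k = \sum_{\ell=0}^{k} v_\ell\, w_{k-\ell} \qquad (0 \le k \le 2n-2). \tag{14.19a$'$}$$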
On $\ell_0$ we define the degree $\deg(v) := \max\{k \in \mathbb{N} : v_k \ne 0\}$. Then $\mathbb{K}^n$ is identified with the subset $\{v \in \ell_0 : \deg(v) \le n-1\}$.

Remark 14.13. $\deg(v \star w) = \deg(v) + \deg(w)$ for $v, w \in \ell_0$.

The name 'degree' is explained by the following isomorphism.

Remark 14.14. Let $\mathcal{P}$ be the vector space of all polynomials (with coefficients in $\mathbb{K}$). Then $\mathcal{P}$ and $\ell_0$ are isomorphic. The corresponding isomorphism is given by
$$\pi : \ell_0 \to \mathcal{P} \quad\text{with}\quad \pi[v](x) := \sum_{k\in\mathbb{N}} v_k x^k. \tag{14.21}$$
The well-known connection of polynomials with the convolution is described by the property
$$u = v \star w \ \text{ for } u, v, w \in \ell_0 \quad\text{if and only if}\quad \pi[u] = \pi[v]\,\pi[w]. \tag{14.22}$$
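Definition (14.19a) and property (14.22) can be compared directly in a few lines. The sketch below assumes NumPy; the functions `convolve` and `shift` are hypothetical helpers written for this illustration, while `np.convolve` and `numpy.polynomial.polynomial.polymul` are the library routines used as reference.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def convolve(v, w):
    """Direct evaluation of (14.19a) for two vectors of equal length n."""
    n = len(v)
    u = np.zeros(2 * n - 1)
    for k in range(2 * n - 1):
        for l in range(max(0, k + 1 - n), min(k, n - 1) + 1):
            u[k] += v[l] * w[k - l]
    return u

def shift(v, m):
    """Shift by m >= 0 positions: prepend m zeros to the coefficient vector."""
    return np.concatenate([np.zeros(m), v])

v = np.array([1.0, 2.0, 3.0])      # 1 + 2x + 3x^2
w = np.array([0.0, 1.0, 4.0])      # x + 4x^2
u = convolve(v, w)
print(u)                           # coefficients of the product polynomial
print(P.polymul(v, w))             # same coefficients via polynomial multiplication
```

The coefficient vectors are stored in ascending order, so `P.polyval(x, v)` evaluates $\pi[v](x)$, and prepending $m$ zeros with `shift` multiplies the polynomial by $x^m$.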
Definition 14.15 (shift operator). For any $m \in \mathbb{Z}$, the shift operator $S^m : \ell_0 \to \ell_0$ is defined by
$$w = S^m(v) \ \text{ has entries } \ w_i = \begin{cases} v_{i-m} & \text{if } m \le i,\\ 0 & \text{otherwise,}\end{cases} \qquad\text{for } v \in \ell_0.$$
For $m \in \mathbb{N}_0$, $S^m$ maps $(v_0, v_1, \ldots)$ into $(\underbrace{0, \ldots, 0}_{m\ \text{positions}}, v_0, v_1, \ldots)$ and has the left inverse $S^{-m}$, i.e., $S^{-m}S^m = \mathrm{id}$.

The interaction of the shift operator and $\pi$ is described by
$$\pi[S^m v](x) = x^m \cdot \pi[v](x) \quad\text{for } m \in \mathbb{N}_0.$$
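14.3.2 Separable Operations

In §13.5 we considered the property (13.17) for elementary tensors with binary operations $\boxdot$:
$$\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big) \boxdot \Big(\bigotimes_{j=1}^{d} w^{(j)}\Big) = \bigotimes_{j=1}^{d} \big(v^{(j)} \boxdot w^{(j)}\big),$$
i.e., the operation between tensors can be reduced to an analogous operation between vectors. Efficient algorithms for tensor operations are based on this crucial property.

Exercise 14.16. Assume $v = \Phi_n(\mathbf{v})$, $w = \Phi_n(\mathbf{w})$. Prove:
(a) The Hadamard product $\odot$ defined in (4.82) satisfies $v \odot w = \Phi_n(\mathbf{v} \odot \mathbf{w})$ and
$$\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big) \odot \Big(\bigotimes_{j=1}^{d} w^{(j)}\Big) = \bigotimes_{j=1}^{d} \big(v^{(j)} \odot w^{(j)}\big) \quad\text{for } v^{(j)}, w^{(j)} \in \mathbb{K}^2.$$
(b) The Euclidean scalar product satisfies $\langle v, w\rangle = \langle \mathbf{v}, \mathbf{w}\rangle$ and the separability property
$$\Big\langle \bigotimes_{j=1}^{d} v^{(j)},\ \bigotimes_{j=1}^{d} w^{(j)} \Big\rangle = \prod_{j=1}^{d} \big\langle v^{(j)}, w^{(j)} \big\rangle \quad\text{for } v^{(j)}, w^{(j)} \in \mathbb{K}^2.$$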
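The multivariate convolution is also separable as stated in (4.84). Now we want to perform the convolution of (univariate) vectors $v, w \in \mathbb{K}^n$. Under the assumptions of §14.1.1, we rewrite the vectors $v, w \in \mathbb{K}^n$ as tensors $\mathbf{v}, \mathbf{w} \in \mathbf{V}$. We want to perform the composition of the following three mappings:
$$\begin{aligned}
(\mathbf{v}, \mathbf{w}) \in \mathbf{V} \times \mathbf{V} &\mapsto (v, w) \in \mathbb{K}^n \times \mathbb{K}^n && \text{with } v = \Phi_n(\mathbf{v}),\ w = \Phi_n(\mathbf{w})\\
&\mapsto u := v \star w \in \mathbb{K}^{2n} && \text{(cf. (14.19a$'$))}\\
&\mapsto \mathbf{u} := \Phi_{2n}^{-1}(u) \in \otimes^{d+1}\mathbb{K}^2.
\end{aligned} \tag{14.23}$$
We denote the mapping $(\mathbf{v}, \mathbf{w}) \mapsto \mathbf{u}$ from above for short by
$$\mathbf{u} := \mathbf{v} \star \mathbf{w}. \tag{14.24}$$
Note that the result $\mathbf{u} \in \mathbf{U} = \otimes^{d+1}\mathbb{K}^2$ is a tensor of order $d+1$ since the corresponding vector $u \in \mathbb{K}^{2n-1}$ also belongs to $\mathbb{K}^{2n}$ (cf. (14.20)) and $2n = 2^{d+1}$.

In the case of elementary tensors $\mathbf{v} = \bigotimes_{j=1}^{d} v^{(j)}$ and $\mathbf{w} = \bigotimes_{j=1}^{d} w^{(j)}$ with $v^{(j)}, w^{(j)} \in \mathbb{K}^2$, the basic question arises whether the convolution can be performed separately in each direction:
$$\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big) \star \Big(\bigotimes_{j=1}^{d} w^{(j)}\Big) = \bigotimes_{j=1}^{d} \big(v^{(j)} \star w^{(j)}\big), \tag{14.25}$$
provided that the right-hand side is a true description of $\mathbf{u} := \mathbf{v} \star \mathbf{w}$. Unfortunately, this equation seems incorrect and even inconsistent since the vectors $v^{(j)} \star w^{(j)}$ belong to $\mathbb{K}^3$ instead of $\mathbb{K}^2$.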
14.3.3 Tensor Algebra $\mathcal{A}(\ell_0)$

14.3.3.1 Motivation

Because of (14.1c), this difficulty is analogous to a well-known problem arising for sums of integers in digital representation. When adding the decimal numbers 836 and 367, the place-wise addition of the digits,

       8     3     6
    +  3     6     7
    ------------------
     (11)   (9)  (13)

leads to the difficulty that 13 and 11 are not valid (decimal) digits. While this problem is usually solved by the carry-over, we can also allow a generalised decimal representation (11)(9)(13), meaning $11 \cdot 10^2 + 9 \cdot 10^1 + 13 \cdot 10^0$, i.e., admitting all nonnegative integers instead of the digits $\{0, \ldots, 9\}$.

A 'generalised representation' for tensors will make use of the tensor space $\otimes^d \ell_0$ instead of $\otimes^d \mathbb{K}^2$. Note that a vector from $\mathbb{K}^3$ appearing in the right-hand side of (14.25) is already considered as an element of $\ell_0$. It will turn out that (14.25) has a correct interpretation in $\otimes^d \ell_0$.

Another observation of the generalised decimal representation is (i) the equality (0)(11)(9)(13) = (11)(9)(13), i.e., representations with different numbers of decimal places may give the same value, and (ii) the carry-over in (11)(9)(13) = (1)(2)(0)(3) requires an increase of decimal places. On the side of the tensors, this means that (i) tensors in $\otimes^d \ell_0$ and $\otimes^{d+1} \ell_0$ may describe the same vector, and (ii) that a corresponding 'carry-over' technique applied to the right-hand side in (14.25) leads to a result in $\otimes^{d+1}\mathbb{K}^2$.
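The decimal analogy can be made concrete with a few lines of code. The following sketch uses a hypothetical helper `carry_over` that turns generalised digits (most significant first) into proper digits; it only illustrates the integer example above, not the tensor procedure itself.

```python
def carry_over(digits, base=10):
    """Normalise generalised digits (most significant first, arbitrary
    nonnegative integers) to proper digits in {0, ..., base-1}."""
    result = []
    carry = 0
    for digit in reversed(digits):          # start with the lowest place
        carry, rest = divmod(digit + carry, base)
        result.append(rest)
    while carry:                            # additional places may be needed
        carry, rest = divmod(carry, base)
        result.append(rest)
    return result[::-1]

print(carry_over([11, 9, 13]))      # [1, 2, 0, 3], i.e. 836 + 367 = 1203
print(carry_over([0, 11, 9, 13]))   # [1, 2, 0, 3] as well
```

Both generalised representations describe the same integer, and the normalised result needs one decimal place more than (11)(9)(13).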
14.3.3.2 Definition and Interpretation in $\ell_0$

The embedding $\mathbb{K}^2 \subset \ell_0$ leads to the embedding $\mathbf{V} = \otimes^d \mathbb{K}^2 \subset \otimes^d \ell_0$ (cf. Notation 3.24). The tensor algebra (cf. §3.4) is defined by
$$\mathcal{A}(\ell_0) := \operatorname{span}\{\mathbf{a} \in \otimes^d \ell_0 : d \in \mathbb{N}\}.$$
Remark 14.17. A linear mapping $F : \mathcal{A}(\ell_0) \to V$ into some vector space $V$ is well defined if one of the following conditions holds:
(a) $F : \otimes^d \ell_0 \to V$ is defined as linear mapping for all $d \in \mathbb{N}$,
(b) the linear mapping $F$ is defined for all elementary tensors $\bigotimes_{j=1}^{d} v^{(j)} \in \otimes^d \ell_0$ and for any $d \in \mathbb{N}$,
(c) the linear mapping $F$ is defined for all elementary tensors $\bigotimes_{j=1}^{d} e^{(i_j)} \in \otimes^d \ell_0$ for all $i_j \in \mathbb{N}_0$ and all $d \in \mathbb{N}$. Here $e^{(\nu)} \in \ell_0$ is the unit vector with entries $e^{(\nu)}[k] = \delta_{k\nu}$ $(k, \nu \in \mathbb{N}_0)$.
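Proof. By definition, $\mathcal{A}(\ell_0)$ is the direct sum of $\otimes^d \ell_0$, i.e., nonvanishing tensors of $\otimes^d \ell_0$ with different order $d$ are linearly independent. This proves part (a). A linear mapping $F : \otimes^d \ell_0 \to V$ is well defined by the images of the basis vectors $\mathbf{e}^{(\mathbf{i})} := \bigotimes_{j=1}^{d} e^{(i_j)}$ for all $\mathbf{i} = (i_j)_{j=1,\ldots,d} \in \mathbb{N}_0^d$. □

We recall the isomorphism $\Phi_n : \mathbf{V} \to \mathbb{K}^n$ in (14.1c). First, we extend this map to $\otimes^d \ell_0$ by
$$\Phi : \otimes^d \ell_0 \to \ell_0, \qquad \mathbf{v} \in \otimes^d \ell_0 \mapsto v \in \ell_0 \quad\text{with}\quad v_k = \sum_{\substack{i_1,\ldots,i_d \in \mathbb{N}_0 \text{ such that}\\ k = \sum_{j=1}^{d} i_j 2^{j-1}}} \mathbf{v}[i_1 i_2 \ldots i_d].$$
In a second step, $\Phi$ is extended to $\mathcal{A}(\ell_0)$ by⁹
$$\Phi : \mathcal{A}(\ell_0) \to \ell_0, \qquad \mathbf{a} = \sum_{d\in\mathbb{N}} \mathbf{v}^{(d)} \in \mathcal{A}(\ell_0) \text{ with } \mathbf{v}^{(d)} \in \otimes^d \ell_0 \ \mapsto\ \Phi(\mathbf{a}) = \sum_{d\in\mathbb{N}} \Phi(\mathbf{v}^{(d)}) \in \ell_0. \tag{14.26}$$
Remark 14.18. Since $\ell_0$ is a subspace of $\mathcal{A}(\ell_0)$ and $\Phi(v) = v$ holds for $v \in \ell_0$, the mapping $\Phi$ is a projection onto $\ell_0$. Furthermore, the restriction of $\Phi$ to the tensor subspace $\mathbf{V} = \otimes^d \mathbb{K}^2 \subset \otimes^d \ell_0$ coincides with $\Phi_n$ in (14.1c) (hence $\Phi$ is an extension of $\Phi_n$).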
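14.3.3.3 Equivalence Relation and Polynomials

Definition 14.19. The equivalence relation $\sim$ on $\mathcal{A}(\ell_0)$ is defined by
$$\mathbf{a} \sim \mathbf{b} \quad\Leftrightarrow\quad \Phi(\mathbf{a}) = \Phi(\mathbf{b}) \qquad (\mathbf{a}, \mathbf{b} \in \mathcal{A}(\ell_0)),$$
i.e., $\mathbf{a}$ and $\mathbf{b}$ represent the same vector.

Since $\Phi$ is a projection onto $\ell_0$, we have in particular
$$\Phi(\mathbf{a}) \sim \mathbf{a} \quad\text{for all } \mathbf{a} \in \mathcal{A}(\ell_0). \tag{14.27}$$
The mapping $\pi : \ell_0 \to \mathcal{P}$ is defined in (14.21). We want to extend this mapping to $\mathcal{A}(\ell_0) \supset \ell_0$ such that
$$\mathbf{a} \sim \mathbf{b} \quad\Leftrightarrow\quad \pi_{\mathcal{A}}[\mathbf{a}] = \pi_{\mathcal{A}}[\mathbf{b}]. \tag{14.28}$$

⁹ By definition of $\mathcal{A}(\ell_0)$ the sum $\sum_{d\in\mathbb{N}} \mathbf{v}^{(d)}$ contains only finitely many nonzero terms.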
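Definition 14.20. The extension¹⁰ $\pi_{\mathcal{A}} : \mathcal{A}(\ell_0) \to \mathcal{P}$ of $\pi : \ell_0 \to \mathcal{P}$ in (14.21) is defined by
$$\pi_{\mathcal{A}}\Big[\bigotimes_{j=1}^{d} a^{(j)}\Big](x) := \prod_{j=1}^{d} \pi[a^{(j)}]\big(x^{2^{j-1}}\big). \tag{14.29}$$
Lemma 14.21. Mapping $\pi_{\mathcal{A}}$ from Definition 14.20 is an extension of $\pi : \ell_0 \to \mathcal{P}$ and satisfies (14.28). Moreover,
$$\pi[\Phi(\mathbf{a})] = \pi_{\mathcal{A}}[\mathbf{a}]. \tag{14.30}$$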
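Proof. (i) $v \in \ell_0 = \otimes^1 \ell_0 \subset \mathcal{A}(\ell_0)$ is an elementary tensor $\bigotimes_{j=1}^{1} v^{(j)}$ with the only factor $v^{(1)} = v$. Definition (14.29) yields
$$\pi_{\mathcal{A}}[v](x) = \prod_{j=1}^{1} \pi[v^{(j)}]\big(x^{2^{j-1}}\big) = \pi[v^{(1)}](x^1) = \pi[v](x),$$
proving the extension property $\pi_{\mathcal{A}}|_{\ell_0} = \pi$.

(ii) Let $\mathbf{e}^{(\mathbf{i})} := \bigotimes_{j=1}^{d} e^{(i_j)} \in \otimes^d \ell_0$ for some multi-index $\mathbf{i} = (i_j)_{j=1,\ldots,d} \in \mathbb{N}_0^d$. Since $\pi[e^{(i_j)}](x) = x^{i_j}$, definition (14.29) yields
$$\pi_{\mathcal{A}}[\mathbf{e}^{(\mathbf{i})}](x) = \prod_{j=1}^{d} \pi[e^{(i_j)}]\big(x^{2^{j-1}}\big) = \prod_{j=1}^{d} \big(x^{2^{j-1}}\big)^{i_j} = \prod_{j=1}^{d} x^{i_j 2^{j-1}} = x^k \quad\text{for } k := \sum_{j=1}^{d} i_j 2^{j-1}. \tag{14.31}$$
Definition (14.26) shows that $\Phi(\mathbf{e}^{(\mathbf{i})}) = e^{(k)} \in \ell_0$ with $k$ as above. Hence $\pi[\Phi(\mathbf{e}^{(\mathbf{i})})] = x^k$ proves (14.30) for $\mathbf{a} = \mathbf{e}^{(\mathbf{i})}$. By Remark 14.17c, (14.30) follows for all $\mathbf{a} \in \mathcal{A}(\ell_0)$.

(iii) The statement $\pi_{\mathcal{A}}[\mathbf{a}] = \pi_{\mathcal{A}}[\mathbf{b}] \Leftrightarrow \pi[\Phi(\mathbf{a})] = \pi[\Phi(\mathbf{b})]$ follows from (14.30). Since $\pi : \ell_0 \to \mathcal{P}$ is an isomorphism, $\pi[\Phi(\mathbf{a})] = \pi[\Phi(\mathbf{b})] \Leftrightarrow \Phi(\mathbf{a}) = \Phi(\mathbf{b})$ also holds. The latter equality is the definition of $\mathbf{a} \sim \mathbf{b}$. Hence (14.28) is proved. □

Remark 14.22. $\mathbf{a} \otimes e^{(0)} \sim \mathbf{a}$ and $\pi_{\mathcal{A}}[\mathbf{a} \otimes e^{(0)}] = \pi_{\mathcal{A}}[\mathbf{a}]$ hold for all $\mathbf{a} \in \mathcal{A}(\ell_0)$.

Proof. By Remark 14.17c, it suffices to consider the tensor $\mathbf{a} = \bigotimes_{j=1}^{d} e^{(i_j)}$. Then $\mathbf{b} := \mathbf{a} \otimes e^{(0)}$ is equal to $\bigotimes_{j=1}^{d+1} e^{(i_j)}$ with $i_{d+1} := 0$. By (14.29), $\pi_{\mathcal{A}}[\mathbf{a}](x) = x^k$ holds with $k$ as in (14.31), while $\pi_{\mathcal{A}}[\mathbf{b}](x) = x^k \cdot \pi[e^{(0)}]\big(x^{2^d}\big) = x^k \cdot 1 = x^k$. Now (14.28) proves the assertion. □

¹⁰ It may be more natural to define the mapping $\hat{\pi}_{\mathcal{A}}$ into polynomials of all variables $x_j$ $(j \in \mathbb{N})$ by $\hat{\pi}_{\mathcal{A}}\big[\bigotimes_{j=1}^{d} v^{(j)}\big] := \prod_{j=1}^{d} \pi[v^{(j)}](x_j)$. Then the present value $\pi_{\mathcal{A}}\big[\bigotimes_{j=1}^{d} v^{(j)}\big](x)$ results from the substitutions $x_j := x^{2^{j-1}}$.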
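14.3.3.4 Shift

Next, we extend the shift operator $S^m : \ell_0 \to \ell_0$ to $S^m_{\mathcal{A}} : \mathcal{A}(\ell_0) \to \mathcal{A}(\ell_0)$ by¹¹
$$S^m_{\mathcal{A}}\Big(\bigotimes_{j=1}^{d} v^{(j)}\Big) := \big(S^m v^{(1)}\big) \otimes \bigotimes_{j=2}^{d} v^{(j)}. \tag{14.32}$$
Remark 14.23. $\Phi(S^m_{\mathcal{A}}(\mathbf{a})) = S^m(\Phi(\mathbf{a}))$ holds for all $\mathbf{a} \in \mathcal{A}(\ell_0)$. $S^m_{\mathcal{A}}$ is an extension of $S^m$ since $S^m_{\mathcal{A}}|_{\ell_0} = S^m$.

Proof. According to Remark 14.17c, we choose a tensor $\mathbf{a} = \mathbf{e}^{(\mathbf{i})} = \bigotimes_{j=1}^{d} e^{(i_j)} \in \otimes^d \ell_0$ with $\mathbf{i} = (i_j)_{j=1,\ldots,d} \in \mathbb{N}_0^d$. Since $\Phi(\mathbf{a}) = e^{(k)}$ holds with $k$ defined in (14.31), the shift yields the result $S^m(\Phi(\mathbf{a})) = e^{(k+m)}$. On the other hand, (14.32) shows that $S^m_{\mathcal{A}}(\mathbf{a}) = e^{(i_1+m)} \otimes \bigotimes_{j=2}^{d} e^{(i_j)}$ and $\Phi(S^m_{\mathcal{A}}(\mathbf{a})) = e^{(k+m)}$, proving the first assertion $\Phi(S^m_{\mathcal{A}}(\mathbf{a})) = S^m(\Phi(\mathbf{a}))$. The second one is trivial. □

On the right-hand side of (14.32), the shift operator $S^m$ is applied to $v^{(1)}$ only. Next, we consider shifts of all $v^{(j)}$.

Lemma 14.24. Let $\mathbf{m} = (m_1, \ldots, m_d) \in \mathbb{N}_0^d$. The operator $\mathbf{S}^{(\mathbf{m})} := \bigotimes_{j=1}^{d} S^{m_j}$ applied to $\mathbf{v} \in \otimes^d \ell_0$ yields
$$\left.\begin{aligned} \pi_{\mathcal{A}}[\mathbf{S}^{(\mathbf{m})}\mathbf{v}](x) &= x^m\, \pi_{\mathcal{A}}[\mathbf{v}](x)\\ \mathbf{S}^{(\mathbf{m})}\mathbf{v} &\sim S^m(\Phi(\mathbf{v})) \end{aligned}\right\} \quad\text{with } m = \sum_{j=1}^{d} m_j 2^{j-1}. \tag{14.33}$$
Proof. (i) By Remark 14.17c, we may consider $\mathbf{v} = \mathbf{e}^{(\mathbf{i})} = \bigotimes_{j=1}^{d} e^{(i_j)}$. Set $i := \sum_{j=1}^{d} i_j 2^{j-1}$. Then $\mathbf{S}^{(\mathbf{m})}\mathbf{e}^{(\mathbf{i})} = \bigotimes_{j=1}^{d} e^{(i_j+m_j)}$ yields
$$\pi_{\mathcal{A}}[\mathbf{S}^{(\mathbf{m})}\mathbf{v}](x) = \pi_{\mathcal{A}}\Big[\bigotimes_{j=1}^{d} e^{(i_j+m_j)}\Big](x) = \prod_{j=1}^{d} x^{(i_j+m_j)2^{j-1}} = x^{i+m},$$
which coincides with $x^m \pi_{\mathcal{A}}[\mathbf{v}](x) = x^m \pi_{\mathcal{A}}\big[\bigotimes_{j=1}^{d} e^{(i_j)}\big](x) = x^m x^i$. This proves the first part of (14.33).

(ii) $S^m(\Phi(\mathbf{v})) = \Phi(S^m_{\mathcal{A}}(\mathbf{v}))$ holds by Remark 14.23. The definition of $S^m_{\mathcal{A}}$ can be rewritten as $\mathbf{S}^{(\hat{\mathbf{m}})}$ with the multi-index $\hat{\mathbf{m}} = (m, 0, \ldots, 0)$. Statement (14.27) shows that $\Phi(\mathbf{S}^{(\hat{\mathbf{m}})}(\mathbf{v})) \sim \mathbf{S}^{(\hat{\mathbf{m}})}(\mathbf{v})$, hence $S^m(\Phi(\mathbf{v})) \sim \mathbf{S}^{(\hat{\mathbf{m}})}(\mathbf{v})$. Since $\mathbf{S}^{(\hat{\mathbf{m}})}(\mathbf{v})$ and $\mathbf{S}^{(\mathbf{m})}\mathbf{v}$ have the identical image $x^m \pi_{\mathcal{A}}[\mathbf{v}](x)$ under the mapping $\pi_{\mathcal{A}}$, property (14.28) implies the second statement in (14.33). □

Corollary 14.25. Let $\mathbf{v} \in \otimes^d \ell_0$ and $\mathbf{m}, \mathbf{m}' \in \mathbb{N}_0^d$. Then
$$\sum_{j=1}^{d} m_j 2^{j-1} = \sum_{j=1}^{d} m'_j 2^{j-1} \quad\text{implies}\quad \mathbf{S}^{(\mathbf{m})}\mathbf{v} \sim \mathbf{S}^{(\mathbf{m}')}\mathbf{v}.$$

¹¹ Here, we make use of Remark 14.17b and restrict the definition to elementary tensors.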
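14.3.3.5 Multi-Scale Interpretation

The representation of a vector from $\ell_0$ by $\Phi(\mathbf{a})$ with $\mathbf{a} \in \mathcal{A}(\ell_0)$ has similarities to the multi-scale analysis of functions using a wavelet basis. A vector $v \in \ell_0$ is often viewed as the vector of grid values $v_k = f(k)$ of a (smooth) function $f$ defined on $[0, \infty)$. Let $\mathbf{a} = \bigotimes_{\nu=1}^{d} a^{(\nu)} \in \otimes^d \ell_0$ and $j \in \{1, \ldots, d\}$. A shift in position $j$ is described by
$$\mathbf{a} \mapsto \hat{\mathbf{a}} := a^{(1)} \otimes \ldots \otimes a^{(j-1)} \otimes (S a^{(j)}) \otimes a^{(j+1)} \otimes \ldots \otimes a^{(d)}$$
and corresponds to $v = \Phi(\mathbf{a}) \mapsto \hat{v} := \Phi(\hat{\mathbf{a}})$ with $\hat{v} = S^{2^{j-1}} v$ (cf. (14.33)). Since $S^m$ moves the entries by $m$ positions to the right (cf. Definition 14.15), the interpretation of $v$ by $v_k = f(k)$ leads to $\hat{v}_\mu = \hat{f}(\mu)$ with the shifted function $\hat{f}(x) = f(x - 2^{j-1})$. On the other hand, a multi-scale basis at level $\ell = j-1$ is given by $\{\psi_\nu\}$ with the shift property $\psi_\mu(x) = \psi_\nu(x + (\nu-\mu)2^\ell)$. Hence the shift
$$f = \sum_\nu c_\nu \psi_\nu \ \mapsto\ \hat{f} = \sum_\nu (Sc)_\nu \psi_\nu = \sum_\nu c_{\nu-1}\psi_\nu = \sum_\nu c_\nu \psi_{\nu+1}$$
also results in $\hat{f}(x) = f(x - 2^\ell)$.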
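14.3.3.6 Convolution

Finally, we define a convolution operation in $\mathcal{A}(\ell_0)$. The following $\star$ operation will be different from (but equivalent to) the $\star$ operation in (14.24). The former operation acts in $\mathcal{A}(\ell_0) \times \mathcal{A}(\ell_0)$ and yields results in $\mathcal{A}(\ell_0)$, whereas the latter one maps $(\otimes^d \mathbb{K}^2) \times (\otimes^d \mathbb{K}^2)$ into $\otimes^{d+1}\mathbb{K}^2$.

For elementary tensors $\mathbf{a} = \bigotimes_{j=1}^{d} a^{(j)}$ and $\mathbf{b} = \bigotimes_{j=1}^{d} b^{(j)}$ from $\otimes^d \ell_0$ the obvious definition is
$$\Big(\bigotimes_{j=1}^{d} a^{(j)}\Big) \star \Big(\bigotimes_{j=1}^{d} b^{(j)}\Big) = \bigotimes_{j=1}^{d} \big(a^{(j)} \star b^{(j)}\big) \tag{14.34}$$
(cf. (14.25) and (4.84)). Since $\mathcal{A}(\ell_0)$ contains tensors of different orders, we define more generally
$$\Big(\bigotimes_{j=1}^{d_a} a^{(j)}\Big) \star \Big(\bigotimes_{j=1}^{d_b} b^{(j)}\Big) = \bigotimes_{j=1}^{d_c} c^{(j)} \quad\text{with } d_c := \max\{d_a, d_b\} \tag{14.35}$$
and
$$c^{(j)} := \begin{cases} a^{(j)} \star b^{(j)} & \text{for } j \le \min\{d_a, d_b\},\\ a^{(j)} & \text{for } d_b < j \le d_c \text{ if } d_a = d_c,\\ b^{(j)} & \text{for } d_a < j \le d_c \text{ if } d_b = d_c. \end{cases}$$
Note that (14.35) coincides with (14.34) for $d_a = d_b$.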
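Corollary 14.26. Another interpretation of (14.35) follows. Assume $\mathbf{a} \in \otimes^{d_a}\ell_0$ and $\mathbf{b} \in \otimes^{d_b}\ell_0$ with $d_a < d_b$. Replace $\mathbf{a}$ by $\hat{\mathbf{a}} := \mathbf{a} \otimes \bigotimes_{j=d_a+1}^{d_c} e^{(0)} \in \otimes^{d_c}\ell_0$ and set $\mathbf{a} \star \mathbf{b} := \hat{\mathbf{a}} \star \mathbf{b}$, where the latter expression can be defined by (14.34) with $d = d_b$. As $e^{(0)} \star v = v \star e^{(0)} = v$ for all $v \in \ell_0$, the new definition of $\mathbf{a} \star \mathbf{b}$ coincides with (14.35).

Property (14.22) has a counterpart for the convolution in $\mathcal{A}(\ell_0)$, which will be very helpful in §14.3.4.

Proposition 14.27. (a) $\Phi(\mathbf{a} \star \mathbf{b}) = \Phi(\mathbf{a}) \star \Phi(\mathbf{b})$ holds for all $\mathbf{a}, \mathbf{b} \in \mathcal{A}(\ell_0)$, where the second $\star$ operation is the convolution (14.19b) in $\ell_0$.
(b) The equivalence
$$\mathbf{c} \sim \mathbf{a} \star \mathbf{b} \quad\Leftrightarrow\quad \Phi(\mathbf{c}) = \Phi(\mathbf{a}) \star \Phi(\mathbf{b})$$
holds for all $\mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathcal{A}(\ell_0)$.

Proof. We apply (14.22), which holds for the $\ell_0$-convolution:
$$\Phi(\mathbf{c}) = \Phi(\mathbf{a}) \star \Phi(\mathbf{b}) \quad\Leftrightarrow\quad \pi[\Phi(\mathbf{c})] = \pi[\Phi(\mathbf{a})]\,\pi[\Phi(\mathbf{b})].$$
By (14.30), this is equivalent to $\pi_{\mathcal{A}}[\mathbf{c}] = \pi_{\mathcal{A}}[\mathbf{a}]\,\pi_{\mathcal{A}}[\mathbf{b}]$. It suffices to consider elementary tensors $\mathbf{a} = \bigotimes_{j=1}^{d_a} a^{(j)}$ and $\mathbf{b} = \bigotimes_{j=1}^{d_b} b^{(j)}$ (extend Remark 14.17 to the bilinear mapping $\star$). First we assume $d_a = d_b =: d$. Then definition (14.34) yields
$$\begin{aligned}
\pi_{\mathcal{A}}[\mathbf{a} \star \mathbf{b}](x) &= \pi_{\mathcal{A}}\Big[\bigotimes_{j=1}^{d} \big(a^{(j)} \star b^{(j)}\big)\Big](x) \underset{(14.29)}{=} \prod_{j=1}^{d} \pi[a^{(j)} \star b^{(j)}]\big(x^{2^{j-1}}\big)\\
&\underset{(14.22)}{=} \prod_{j=1}^{d} \Big\{\pi[a^{(j)}]\big(x^{2^{j-1}}\big) \cdot \pi[b^{(j)}]\big(x^{2^{j-1}}\big)\Big\}\\
&= \Big[\prod_{j=1}^{d} \pi[a^{(j)}]\big(x^{2^{j-1}}\big)\Big] \cdot \Big[\prod_{j=1}^{d} \pi[b^{(j)}]\big(x^{2^{j-1}}\big)\Big] \underset{(14.29)}{=} \pi_{\mathcal{A}}[\mathbf{a}](x) \cdot \pi_{\mathcal{A}}[\mathbf{b}](x).
\end{aligned}$$
This proves $\Phi(\mathbf{a} \star \mathbf{b}) = \Phi(\mathbf{a}) \star \Phi(\mathbf{b})$ for $\mathbf{a}, \mathbf{b} \in \otimes^d \ell_0$. For elementary tensors of different orders $d_a \ne d_b$ use the equivalent definition from Corollary 14.26. Since $\pi_{\mathcal{A}}[\hat{\mathbf{a}}] = \pi_{\mathcal{A}}[\mathbf{a}]$ (cf. Remark 14.22), assertion (a) follows from the previous result. Because $\mathbf{c} \sim \mathbf{a} \star \mathbf{b}$ is equivalent to $\Phi(\mathbf{c}) = \Phi(\mathbf{a} \star \mathbf{b})$, Part (b) follows from (a). □

Exercise 14.28. Let $\mathbf{a}, \mathbf{a}' \in \otimes^d \ell_0$ and $\mathbf{b}, \mathbf{b}' \in \mathcal{A}(\ell_0)$ with $\mathbf{a} \sim \mathbf{a}'$ and $\mathbf{b} \sim \mathbf{b}'$. Prove $\mathbf{a} \otimes \mathbf{b} \sim \mathbf{a}' \otimes \mathbf{b}'$.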
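Proposition 14.27(a) and Remark 14.22 can be tested numerically for elementary tensors. The sketch below is an illustration under assumptions: a generalised elementary tensor is stored as a Python list of coefficient vectors (the factors $a^{(j)} \in \ell_0$), and $\Phi$ is evaluated through the polynomial identity (14.30), i.e., as the coefficient vector of $\prod_j \pi[a^{(j)}](x^{2^{j-1}})$. The names `upsample`, `phi_elementary` and `star` are hypothetical.

```python
import numpy as np

def upsample(a, step):
    """Coefficient vector of pi[a](x**step): entry a[i] moves to position i*step."""
    out = np.zeros((len(a) - 1) * step + 1)
    out[::step] = a
    return out

def phi_elementary(factors):
    """Phi of the elementary tensor a^(1) (x) ... (x) a^(d), computed via (14.30)."""
    vec = np.array([1.0])
    for j, a in enumerate(factors):              # j = 0 belongs to direction 1
        vec = np.convolve(vec, upsample(np.asarray(a, dtype=float), 2 ** j))
    return vec

def star(a_factors, b_factors):
    """Factor-wise convolution (14.34) of two elementary tensors of equal order."""
    return [np.convolve(a, b) for a, b in zip(a_factors, b_factors)]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
lhs = phi_elementary(star(a, b))                             # Phi(a * b)
rhs = np.convolve(phi_elementary(a), phi_elementary(b))      # Phi(a) * Phi(b)
print(np.allclose(lhs, rhs))
```

The factors of `star(a, b)` have three entries each, so the product is a generalised tensor in $\otimes^2 \ell_0$ rather than in $\otimes^2 \mathbb{K}^2$; nevertheless it represents the same vector as the ordinary convolution.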
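14.3.3.7 Carry-over Procedure

The 'carry-over' must change an element $\mathbf{a} \in \mathcal{A}(\ell_0)$ into $\mathbf{a}' \in \mathcal{A}(\ell_0)$ such that $\mathbf{a} \sim \mathbf{a}'$ and $\mathbf{a}' \in \otimes^d \mathbb{K}^2$ for a minimal $d$. Equivalence $\mathbf{a} \sim \mathbf{a}'$ ensures that both $\mathbf{a}$ and $\mathbf{a}'$ are (generalised) tensorisations of the same vector $\Phi(\mathbf{a}) = \Phi(\mathbf{a}') \in \ell_0$. The minimal $d$ is determined by the inequality $2^{d-1} < \deg(\Phi(\mathbf{a})) \le 2^d$. The following algorithm proceeds from Step 0 to Step $d-1$.

Step 0. Any element $\mathbf{a}$ is a finite sum $\sum_\nu \mathbf{a}_\nu$ of elementary tensors $\mathbf{a}_\nu \in \otimes^{d_\nu}\ell_0$.

Case (a) If $d_\nu > d$, we may truncate to $d$ as follows. Let $\mathbf{a}_\nu = \mathbf{a}'_\nu \otimes \mathbf{a}''_\nu$ with $\mathbf{a}'_\nu \in \otimes^d \ell_0$ and $\mathbf{a}''_\nu \in \otimes^{d_\nu-d}\ell_0$. Replace $\mathbf{a}_\nu$ by $\tilde{\mathbf{a}}_\nu := \lambda\,\mathbf{a}'_\nu \in \otimes^d \ell_0$, where $\lambda := (\mathbf{a}''_\nu)[0 \ldots 0]$ is the entry for $(i_1, \ldots, i_{d_\nu-d}) = (0, \ldots, 0)$. Then $\Phi(\mathbf{a}_\nu)$ and $\Phi(\tilde{\mathbf{a}}_\nu)$ have identical entries for the indices $0 \le i \le 2^d - 1$. The other entries may differ, but in the sum $\sum_\nu \mathbf{a}_\nu$ they must vanish since $\deg(\Phi(\mathbf{a})) \le 2^d$. Hence $\sum_\nu \mathbf{a}_\nu \sim \sum_\nu \tilde{\mathbf{a}}_\nu$ are equivalent representations.

Case (b) If $d_\nu < d$, replace $\mathbf{a}_\nu$ with $\tilde{\mathbf{a}}_\nu := \mathbf{a}_\nu \otimes \bigotimes_{j=d_\nu+1}^{d} e^{(0)}$. Remark 14.22 ensures $\mathbf{a}_\nu \sim \tilde{\mathbf{a}}_\nu$.

After these changes, the new $\tilde{\mathbf{a}} \sim \mathbf{a}$ belongs to $\mathbf{V} := \otimes^d \ell_0$.

Step 1. Tensor $\mathbf{a} \in \otimes^d \ell_0$ has the representation $\sum_\nu a_\nu^{(1)} \otimes \mathbf{a}_\nu^{(>1)}$ with components $a_\nu^{(1)} \in \ell_0$ and $\mathbf{a}_\nu^{(>1)} \in \bigotimes_{j=2}^{d} \ell_0$. In case of $\deg(a_\nu^{(1)}) > 2$, split $a := a_\nu^{(1)}$ into $a_\nu'^{(1)} + S^2 a_\nu''^{(1)}$ with $a_\nu'^{(1)} \in \mathbb{K}^2$. For this purpose, set $a_\nu'^{(1)} := (a_0, a_1)$ and $a_\nu''^{(1)} := (a_2, a_3, \ldots) = S^{-2} a_\nu^{(1)} \in \ell_0$. Then (14.33) implies that
$$a_\nu^{(1)} \otimes \mathbf{a}_\nu^{(>1)} = a_\nu'^{(1)} \otimes \mathbf{a}_\nu^{(>1)} + \big(S^2 a_\nu''^{(1)}\big) \otimes \mathbf{a}_\nu^{(>1)} \sim a_\nu'^{(1)} \otimes \mathbf{a}_\nu^{(>1)} + a_\nu''^{(1)} \otimes \big(S^1 \mathbf{a}_\nu^{(>1)}\big).$$
Note that $\deg(a_\nu''^{(1)}) = \deg(a_\nu^{(1)}) - 2$ has decreased. This procedure must be repeated until $\deg(a_\nu''^{(1)}) \le 2$.

At the end of Step 1, a new tensor $\tilde{\mathbf{a}} = \sum_\nu a_\nu^{(1)} \otimes \mathbf{a}_\nu^{(>1)} \sim \mathbf{a}$ with $a_\nu^{(1)} \in \mathbb{K}^2$ is obtained.

Step 2. If $d > 2$, we apply the procedure of Step 1 to the tensor $\mathbf{a}_\nu^{(>1)}$ in $\tilde{\mathbf{a}} = \sum_\nu a_\nu^{(1)} \otimes \mathbf{a}_\nu^{(>1)}$ with $d$ replaced by $d-1$. Each $\mathbf{a}_\nu^{(>1)}$ is replaced with $\tilde{\mathbf{a}}_\nu^{(>1)} = \sum_\mu a_{\nu\mu}^{(2)} \otimes \mathbf{a}_{\nu\mu}^{(>2)} \sim \mathbf{a}_\nu^{(>1)}$. By Exercise 14.28,
$$\tilde{\mathbf{a}} = \sum_\nu a_\nu^{(1)} \otimes \mathbf{a}_\nu^{(>1)} \sim \hat{\mathbf{a}} := \sum_{\nu,\mu} a_\nu^{(1)} \otimes a_{\nu\mu}^{(2)} \otimes \mathbf{a}_{\nu\mu}^{(>2)}$$
holds with $a_\nu^{(1)}, a_{\nu\mu}^{(2)} \in \mathbb{K}^2$. Take $\hat{\mathbf{a}}$ as new $\tilde{\mathbf{a}}$. Reorganisation of the sum yields $\tilde{\mathbf{a}} = \sum_\nu a_\nu^{(1)} \otimes a_\nu^{(2)} \otimes \mathbf{a}_\nu^{(>2)} \sim \mathbf{a}$ with $a_\nu^{(1)}, a_\nu^{(2)} \in \mathbb{K}^2$.

⋮

Step $d-1$. The previous procedure yields $\tilde{\mathbf{a}} = \sum_\nu \bigotimes_{j=1}^{d} a_\nu^{(j)}$ with $a_\nu^{(j)} \in \mathbb{K}^2$ for $1 \le j \le d-1$. If $\deg(a_\nu^{(d)}) > 2$, split $a_\nu^{(d)}$ into $a_\nu'^{(d)} + S^2 a_\nu''^{(d)}$ with $a_\nu'^{(d)} \in \mathbb{K}^2$. As in Case (a) of Step 0, we conclude that $\sum_\nu \big(\bigotimes_{j=1}^{d-1} a_\nu^{(j)}\big) \otimes S^2 a_\nu''^{(d)} = 0$. Hence $\tilde{\mathbf{a}} = \sum_\nu \big(\bigotimes_{j=1}^{d-1} a_\nu^{(j)}\big) \otimes a_\nu'^{(d)} \sim \mathbf{a}$ is the desired representation in $\mathbf{V} := \otimes^d \mathbb{K}^2$.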
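14.3.4 Algorithm

14.3.4.1 Main Identities

A tensor $\mathbf{v} \in \otimes^d \mathbb{K}^2$ $(d \in \mathbb{N})$ possesses a unique decomposition¹²
$$\mathbf{v} = \mathbf{v}' \otimes \binom{1}{0} + \mathbf{v}'' \otimes \binom{0}{1} \quad\text{with } \mathbf{v}', \mathbf{v}'' \in \otimes^{d-1}\mathbb{K}^2.$$
In the notation (3.22a), $\mathbf{v}'$ and $\mathbf{v}''$ are the parts of $\mathbf{v}$ belonging to the indices 1 and 2 of the last direction, where the indices 1, 2 correspond to the basis (14.2a).

We start with the simple case of $d = 1$. The next lemma demonstrates how the 'carry-over' is realised.

Lemma 14.29. The convolution of $\binom{\alpha}{\beta}, \binom{\gamma}{\delta} \in \mathbb{K}^2 = \otimes^1 \mathbb{K}^2$ yields
$$\binom{\alpha}{\beta} \star \binom{\gamma}{\delta} = \Phi(\mathbf{v}) \quad\text{with}\quad \mathbf{v} := \binom{\alpha\gamma}{\alpha\delta+\beta\gamma} \otimes \binom{1}{0} + \binom{\beta\delta}{0} \otimes \binom{0}{1} \in \otimes^2 \mathbb{K}^2. \tag{14.36a}$$
Furthermore, the shifted vector $S^1\big(\binom{\alpha}{\beta} \star \binom{\gamma}{\delta}\big)$ has the tensor representation
$$S^1\Big(\binom{\alpha}{\beta} \star \binom{\gamma}{\delta}\Big) = \Phi(\mathbf{v}) \quad\text{with}\quad \mathbf{v} := \binom{0}{\alpha\gamma} \otimes \binom{1}{0} + \binom{\alpha\delta+\beta\gamma}{\beta\delta} \otimes \binom{0}{1} \in \otimes^2 \mathbb{K}^2. \tag{14.36b}$$
Proof. An elementary calculation yields $\binom{\alpha}{\beta} \star \binom{\gamma}{\delta} = (\alpha\gamma,\ \alpha\delta+\beta\gamma,\ \beta\delta)^{\mathsf T} \in \mathbb{K}^3$, where the latter vector is identified with $(\alpha\gamma, \alpha\delta+\beta\gamma, \beta\delta, 0, 0, \ldots) \in \ell_0$. We split this vector into $\binom{\alpha\gamma}{\alpha\delta+\beta\gamma} + S^2\binom{\beta\delta}{0}$. From $\binom{\alpha\gamma}{\alpha\delta+\beta\gamma} \sim \binom{\alpha\gamma}{\alpha\delta+\beta\gamma} \otimes \binom{1}{0}$ (cf. Remark 14.22) and
$$S^2\binom{\beta\delta}{0} \sim \Big(S^2\binom{\beta\delta}{0}\Big) \otimes \binom{1}{0} \sim \binom{\beta\delta}{0} \otimes S^1\binom{1}{0} = \binom{\beta\delta}{0} \otimes \binom{0}{1}$$
we obtain the first assertion. The second one follows analogously from
$$S^1\Big(\binom{\alpha}{\beta} \star \binom{\gamma}{\delta}\Big) = (0,\ \alpha\gamma,\ \alpha\delta+\beta\gamma,\ \beta\delta)^{\mathsf T} \sim \binom{0}{\alpha\gamma} + S^2\binom{\alpha\delta+\beta\gamma}{\beta\delta},$$
finishing the proof. □

The basic identity is given in the next lemma, which shows how the convolution product of tensors of order $d-1$ can be used for tensors of order $d$. Note that $\mathbf{a}', \mathbf{a}''$ and $\mathbf{u}', \mathbf{u}''$ are the parts of $\mathbf{a}$ and $\mathbf{u}$, respectively, belonging to the indices $i = 1, 2$ of the last direction.

¹² For $d = 1$, the tensors $\mathbf{v}', \mathbf{v}''$ degenerate to numbers in the field $\mathbb{K}$.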
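Lemma 14.30. Let $d \ge 2$. Assume that for $\mathbf{v}, \mathbf{w} \in \otimes^{d-1}\mathbb{K}^2$ the equivalence
$$\mathbf{v} \star \mathbf{w} \sim \mathbf{a} = \mathbf{a}' \otimes \binom{1}{0} + \mathbf{a}'' \otimes \binom{0}{1} \in \otimes^d \mathbb{K}^2 \tag{14.37a}$$
holds. Let the tensors $\mathbf{v} \otimes x,\ \mathbf{w} \otimes y \in \otimes^d \mathbb{K}^2$ be defined by $x = \binom{\alpha}{\beta}$, $y = \binom{\gamma}{\delta} \in \mathbb{K}^2$. Then
$$\begin{aligned}
(\mathbf{v} \otimes x) \star (\mathbf{w} \otimes y) &\sim \mathbf{u} = \mathbf{u}' \otimes \binom{1}{0} + \mathbf{u}'' \otimes \binom{0}{1} \in \otimes^{d+1}\mathbb{K}^2\\
\text{with}\quad \mathbf{u}' &= \mathbf{a}' \otimes \binom{\alpha\gamma}{\alpha\delta+\beta\gamma} + \mathbf{a}'' \otimes \binom{0}{\alpha\gamma} \in \otimes^d \mathbb{K}^2\\
\text{and}\quad \mathbf{u}'' &= \mathbf{a}' \otimes \binom{\beta\delta}{0} + \mathbf{a}'' \otimes \binom{\alpha\delta+\beta\gamma}{\beta\delta} \in \otimes^d \mathbb{K}^2.
\end{aligned} \tag{14.37b}$$
Proof. Proposition 14.27 ensures that $(\mathbf{v} \otimes x) \star (\mathbf{w} \otimes y) \sim (\mathbf{v} \star \mathbf{w}) \otimes z$ with $z := x \star y \in \mathbb{K}^3 \subset \ell_0$. Assumption (14.37a) together with $\mathbf{a}' \otimes \binom{1}{0} \sim \mathbf{a}'$ (cf. Remark 14.22) and
$$\mathbf{a}'' \otimes \binom{0}{1} = \mathbf{a}'' \otimes S^1\binom{1}{0} \sim S_{\mathcal{A}}^{2^{d-1}}\mathbf{a}''$$
(cf. Remark 14.23) yields $(\mathbf{v} \star \mathbf{w}) \otimes z \sim \big(\mathbf{a}' + S_{\mathcal{A}}^{2^{d-1}}\mathbf{a}''\big) \otimes z$. Again, Remark 14.23 shows that
$$\big(S_{\mathcal{A}}^{2^{d-1}}\mathbf{a}''\big) \otimes z = S_{\mathcal{A}}^{2^{d-1}}(\mathbf{a}'' \otimes z) \sim \mathbf{a}'' \otimes (Sz).$$
Using (14.36a,b), we obtain
$$\begin{aligned}
\mathbf{a}' \otimes z &\sim \mathbf{a}' \otimes \binom{\alpha\gamma}{\alpha\delta+\beta\gamma} \otimes \binom{1}{0} + \mathbf{a}' \otimes \binom{\beta\delta}{0} \otimes \binom{0}{1},\\
\big(S_{\mathcal{A}}^{2^{d-1}}\mathbf{a}''\big) \otimes z \sim \mathbf{a}'' \otimes (Sz) &\sim \mathbf{a}'' \otimes \binom{0}{\alpha\gamma} \otimes \binom{1}{0} + \mathbf{a}'' \otimes \binom{\alpha\delta+\beta\gamma}{\beta\delta} \otimes \binom{0}{1}.
\end{aligned}$$
Summation of both identities yields the assertion of the lemma. □

If $x = \binom{\alpha}{\beta}$ and $y = \binom{\gamma}{\delta}$ are equal to any of the unit vectors $\binom{1}{0}, \binom{0}{1}$, the quantities
$$\binom{\alpha\gamma}{\alpha\delta+\beta\gamma},\quad \binom{0}{\alpha\gamma},\quad \binom{\beta\delta}{0},\quad \binom{\alpha\delta+\beta\gamma}{\beta\delta}$$
arising in (14.37b) are of the form $\binom{0}{0}$, $\binom{1}{0}$, or $\binom{0}{1}$.

Remark 14.31. Given $\mathbf{v}, \mathbf{w} \in \otimes^d \mathbb{K}^2$, Lemmata 14.29 and 14.30 allow us to find the unique $\mathbf{u} \in \otimes^{d+1}\mathbb{K}^2$ with $\mathbf{v} \star \mathbf{w} \sim \mathbf{u}$. In the following, we write $\mathbf{v} \star \mathbf{w}$ instead of $\mathbf{u}$. This notation coincides with the definition in (14.24).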
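The recursion formed by Lemmata 14.29 and 14.30 can be written down for elementary tensors and compared with the ordinary vector convolution. The following is a sketch under assumptions: tensors are stored as dense NumPy arrays of shape $(2, \ldots, 2)$, the map $\Phi$ is realised by a column-major flattening, and the function names `elementary_vector` and `conv_elementary` are hypothetical. A practical implementation would work in a compressed format rather than with dense arrays.

```python
import numpy as np
from functools import reduce

def elementary_vector(factors):
    """Vector Phi(v^(1) (x) ... (x) v^(d)) with index k = sum_j i_j 2^(j-1)."""
    return reduce(np.multiply.outer, factors).reshape(-1, order="F")

def conv_elementary(vs, ws):
    """Tensor u in (x)^(d+1) K^2 with v * w ~ u, built by Lemmata 14.29 and 14.30."""
    (al, be), (ga, de) = vs[0], ws[0]
    a = np.zeros((2, 2))
    a[:, 0] = [al * ga, al * de + be * ga]       # first term of (14.36a)
    a[:, 1] = [be * de, 0.0]                     # carry-over term of (14.36a)
    for x, y in zip(vs[1:], ws[1:]):
        (al, be), (ga, de) = x, y
        a1, a2 = a[..., 0], a[..., 1]            # a' and a'' of (14.37a)
        u1 = (np.multiply.outer(a1, [al * ga, al * de + be * ga])
              + np.multiply.outer(a2, [0.0, al * ga]))
        u2 = (np.multiply.outer(a1, [be * de, 0.0])
              + np.multiply.outer(a2, [al * de + be * ga, be * de]))
        a = np.stack([u1, u2], axis=-1)          # u = u' (x) e_1 + u'' (x) e_2
    return a

rng = np.random.default_rng(1)
vs = [rng.standard_normal(2) for _ in range(4)]
ws = [rng.standard_normal(2) for _ in range(4)]
u = conv_elementary(vs, ws).reshape(-1, order="F")
reference = np.convolve(elementary_vector(vs), elementary_vector(ws))
print(np.allclose(u[:-1], reference), abs(u[-1]) < 1e-12)
```

The result has $2n$ entries, the last of which vanishes, in agreement with $u \in \mathbb{K}^{2n-1} \subset \mathbb{K}^{2n}$.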
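14.3.4.2 Realisation in Different Formats

First, we assume that $v, w \in \mathbb{K}^n$ are represented by $v = \Phi(\mathbf{v})$ and $w = \Phi(\mathbf{w})$ with elementary tensors $\mathbf{v} = \bigotimes_{j=1}^{d} v^{(j)}$ and $\mathbf{w} = \bigotimes_{j=1}^{d} w^{(j)}$. The convolution $v^{(1)} \star w^{(1)} \in \otimes^2 \mathbb{K}^2$ already yields a tensor of rank 2 as seen from (14.36a). Assume by induction that $\big(\bigotimes_{j=1}^{d-1} v^{(j)}\big) \star \big(\bigotimes_{j=1}^{d-1} w^{(j)}\big) \sim \mathbf{a} \in \otimes^d \mathbb{K}^2$ has a representation rank $2^{d-1}$. Then (14.37b) yields a representation rank $2^d$. Since $2^d = n$ is the bound of the maximal tensor rank in $\otimes^{d+1}\mathbb{K}^2$, the $r$-term representation may yield large representation ranks for $\mathbf{v} \star \mathbf{w}$, even when $\operatorname{rank}(\mathbf{v}) = \operatorname{rank}(\mathbf{w}) = 1$. Hence the $r$-term format $\mathcal{R}_r$ is not a proper choice for the convolution.

Since the tensor subspace format $\mathcal{T}_{\mathbf{r}}$ is questionable anyway (see discussion in §14.1.1), we use the $\mathcal{H}_\rho^{\mathrm{tens}}$ format as described in §14.1.2. We recall the involved subspaces $\mathbf{U}_j \subset \otimes^j \mathbb{K}^2$ for $1 \le j \le d$ (cf. (14.4a–c)).

Theorem 14.32. Let tensors $\mathbf{v}, \mathbf{w} \in \otimes^d \mathbb{K}^2$ be represented as $\mathbf{v} \in \mathcal{H}_{\rho'}^{\mathrm{tens}}$ and $\mathbf{w} \in \mathcal{H}_{\rho''}^{\mathrm{tens}}$, involving the respective subspaces $\mathbf{U}'_j$ and $\mathbf{U}''_j$, $1 \le j \le d$; i.e.,
$$\begin{aligned}
\mathbf{U}'_1 &= \mathbb{K}^2, & \mathbf{U}'_j &\subset \mathbf{U}'_{j-1} \otimes \mathbb{K}^2, & \dim(\mathbf{U}'_j) &= \rho'_j, & \mathbf{v} &\in \mathbf{U}'_d,\\
\mathbf{U}''_1 &= \mathbb{K}^2, & \mathbf{U}''_j &\subset \mathbf{U}''_{j-1} \otimes \mathbb{K}^2, & \dim(\mathbf{U}''_j) &= \rho''_j, & \mathbf{w} &\in \mathbf{U}''_d.
\end{aligned}$$
Then $\mathbf{v} \star \mathbf{w} \in \otimes^{d+1}\mathbb{K}^2$ belongs to the format $\mathcal{H}_\rho^{\mathrm{tens}}$ with
$$\rho_1 = \rho_{d+1} = 2, \qquad \rho_j \le 2\rho'_j\rho''_j \quad (1 \le j \le d).$$
The involved subspaces
$$\mathbf{U}_j := \operatorname{span}\{(\mathbf{x} \star \mathbf{y})',\ (\mathbf{x} \star \mathbf{y})'' : \mathbf{x} \in \mathbf{U}'_j,\ \mathbf{y} \in \mathbf{U}''_j\} \quad (1 \le j \le d) \tag{14.38}$$
with $\dim(\mathbf{U}_j) = \rho_j$ again satisfy
$$\mathbf{U}_1 = \mathbb{K}^2, \qquad \mathbf{U}_j \subset \mathbf{U}_{j-1} \otimes \mathbb{K}^2 \quad (2 \le j \le d+1), \qquad \mathbf{v} \star \mathbf{w} \in \mathbf{U}_{d+1}.$$
Here $\mathbf{u}'$ and $\mathbf{u}''$ denote the two parts of a tensor $\mathbf{u} = \mathbf{u}' \otimes \binom{1}{0} + \mathbf{u}'' \otimes \binom{0}{1}$, i.e., the parts for the indices $i = 1, 2$ of the last direction.

Proof. By Lemma 14.29, $\mathbf{U}_1$ defined in (14.38) is equal to $\mathbb{K}^2$ as required in (14.4a). Let $j \in \{2, \ldots, d\}$. Because of $\mathbf{U}'_j \subset \mathbf{U}'_{j-1} \otimes \mathbb{K}^2$ and $\mathbf{U}''_j \subset \mathbf{U}''_{j-1} \otimes \mathbb{K}^2$, we have
$$\mathbf{U}_j \subset \operatorname{span}\Big\{(\mathbf{x} \star \mathbf{y})',\ (\mathbf{x} \star \mathbf{y})'' : \ \mathbf{x} = \mathbf{v} \otimes x,\ \mathbf{v} \in \mathbf{U}'_{j-1},\ x \in \mathbb{K}^2;\ \ \mathbf{y} = \mathbf{w} \otimes y,\ \mathbf{w} \in \mathbf{U}''_{j-1},\ y \in \mathbb{K}^2\Big\}.$$
By (14.37a), $\mathbf{v} \star \mathbf{w} \in \operatorname{span}\{\mathbf{a}', \mathbf{a}''\} \otimes \mathbb{K}^2$ holds with $\mathbf{a}', \mathbf{a}'' \in \mathbf{U}_{j-1}$. The tensors $\mathbf{u}' = (\mathbf{x} \star \mathbf{y})'$ and $\mathbf{u}'' = (\mathbf{x} \star \mathbf{y})''$ in (14.37b) belong to $\mathbf{U}_{j-1} \otimes \mathbb{K}^2$, proving $\mathbf{U}_j \subset \mathbf{U}_{j-1} \otimes \mathbb{K}^2$. □

The fact that the ranks are squared is a usual consequence of binary operations (cf. §13.5.3). The factor 2 in $\rho_j \le 2\rho'_j\rho''_j$ is the carry-over effect. The exact computation using a frame $\mathbf{b}^{(j)}$ of the size $\rho_j$ spanning $\mathbf{U}_j$ can be followed by an orthonormalisation and truncation.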
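The rank bound of Theorem 14.32 can be observed in the simplest case of elementary tensors, for which $\rho'_j = \rho''_j = 1$ and therefore $\rho_j \le 2$. The sketch below is an illustration under assumptions: the ranks are computed from dense matricisations with a relative singular-value threshold (a choice made here, not taken from the text), and the helper names are hypothetical.

```python
import numpy as np
from functools import reduce

def elementary_vector(factors):
    """Vector belonging to the elementary tensor with the given factors in K^2."""
    return reduce(np.multiply.outer, factors).reshape(-1, order="F")

def numerical_rank(M, rel_tol=1e-10):
    """Number of singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def convolution_ranks(v, w):
    """Ranks rho_j (j = 1..d) of the tensorised convolution of v, w in K^n, n = 2^d."""
    n = len(v)
    d = int(round(np.log2(n)))
    u = np.zeros(2 * n)
    u[: 2 * n - 1] = np.convolve(v, w)           # embed K^(2n-1) into K^(2n)
    U = u.reshape((2,) * (d + 1), order="F")
    return [numerical_rank(U.reshape((2 ** j, -1), order="F")) for j in range(1, d + 1)]

rng = np.random.default_rng(0)
d = 6
v = elementary_vector([rng.standard_normal(2) for _ in range(d)])
w = elementary_vector([rng.standard_normal(2) for _ in range(d)])
print(convolution_ranks(v, w))                   # every entry is at most 2
```

In contrast, the $r$-term representation obtained from (14.37b) would use up to $2^d$ terms for the same tensor.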
14.4 Fast Fourier Transform

The fast Fourier algorithm uses the same hierarchical structure as the tensorisation. Therefore it is not surprising that the fast Fourier transform can be realised by the tensors without using the original vectors. After recalling the algorithm for vectors in §14.4.1, the tensorised version is derived in §14.4.2. The latter algorithm has been studied by Dolgov et al. [80], who also describe the sine and cosine transforms.

14.4.1 FFT for $\mathbb{C}^n$ Vectors

Let $n = 2^d$ and $\omega_d := \exp(2\pi\mathrm{i}/n)$. The discrete Fourier transform (DFT) is the mapping of $v \in \mathbb{C}^n$ into $\hat{v} \in \mathbb{C}^n$ defined by
$$\hat{v} = F_d v \quad\text{with}\quad F_d = \tfrac{1}{\sqrt{n}}\big(\omega_d^{k\ell}\big)_{k,\ell=0}^{n-1}.$$
The inverse Fourier transform $\hat{v} \mapsto v$ is described by $F_d^{\mathsf H}$, involving $\overline{\omega_d}$ instead of $\omega_d$. We recall the fast Fourier transform (FFT) in the case of $n = 2^d$. If $d = 0$, $\hat{v} = v$ holds. Otherwise, introducing $v^{I} = (v_k)_{k=0}^{n/2-1}$ and $v^{II} = (v_k)_{k=n/2}^{n-1}$, we observe that $\hat{v}$ has the components
$$\begin{aligned}
\hat{v}_{2k} &= \frac{1}{\sqrt{n}}\Big[\sum_{\ell=0}^{n/2-1} \omega_d^{2k\ell}\, v_\ell + \sum_{\ell=0}^{n/2-1} \omega_d^{2k\ell}\, v_{\ell+n/2}\Big] = \frac{1}{\sqrt{2}}\Big[F_{d-1}\big(v^{I} + v^{II}\big)\Big]_k,\\
\hat{v}_{2k+1} &= \frac{1}{\sqrt{n}}\Big[\sum_{\ell=0}^{n/2-1} \omega_d^{2k\ell}\omega_d^{\ell}\, v_\ell - \sum_{\ell=0}^{n/2-1} \omega_d^{2k\ell}\omega_d^{\ell}\, v_{\ell+n/2}\Big] = \frac{1}{\sqrt{2}}\Big[F_{d-1}\big(\vec{\omega}_d \odot (v^{I} - v^{II})\big)\Big]_k
\end{aligned}$$
for $0 \le k \le n/2 - 1$. The last expression uses a Hadamard product with the vector
$$\vec{\omega}_d := \big(\omega_d^{\ell}\big)_{\ell=0}^{n/2-1} \in \mathbb{C}^{n/2}.$$
For an algorithmic description we need a function Divide with the property $v^{I} = \text{Divide}(v, 1)$, $v^{II} = \text{Divide}(v, 2)$, and a function¹³ Merge, such that the arguments $u = (u_0, u_1, \ldots, u_{n/2-1})^{\mathsf T}$ and $v = (v_0, v_1, \ldots, v_{n/2-1})^{\mathsf T}$ are mapped into $w := \text{Merge}(u, v) \in \mathbb{C}^n$ with $w = (u_0, v_0, u_1, v_1, \ldots, u_{n/2-1}, v_{n/2-1})^{\mathsf T}$, i.e., $w_{2k} = u_k$ and $w_{2k+1} = v_k$. Then the discrete Fourier transform can be performed by the following recursive function:

¹³ After $d$ merge steps the bit reversal is already performed.
    function DFT(v, d);   {v ∈ C^n with n = 2^d}
      if d = 0 then DFT := v
      else begin
        v^I := Divide(v, 1);  v^II := Divide(v, 2);
        DFT := (1/√2) · Merge(DFT(v^I + v^II, d − 1), DFT(ω⃗_d ⊙ (v^I − v^II), d − 1))
      end;                                                                    (14.39)
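The recursive function (14.39) translates almost literally into NumPy. The following sketch assumes the normalisation $F_d = n^{-1/2}(\omega_d^{k\ell})$ with $\omega_d = \exp(2\pi\mathrm{i}/n)$ from above; with this sign convention the reference value is `np.sqrt(n) * np.fft.ifft(v)`, since NumPy's forward transform uses the opposite sign in the exponent. The function name `dft` is chosen for this illustration.

```python
import numpy as np

def dft(v):
    """Recursive transcription of (14.39): returns F_d v for len(v) = 2^d."""
    v = np.asarray(v, dtype=complex)
    n = len(v)
    if n == 1:                                            # case d = 0
        return v
    v1, v2 = v[: n // 2], v[n // 2:]                      # Divide(v, 1), Divide(v, 2)
    omega = np.exp(2j * np.pi * np.arange(n // 2) / n)    # vector (omega_d^l)_l
    even = dft(v1 + v2)                                   # entries with index 2k
    odd = dft(omega * (v1 - v2))                          # entries with index 2k+1
    w = np.empty(n, dtype=complex)
    w[0::2], w[1::2] = even, odd                          # Merge
    return w / np.sqrt(2)

rng = np.random.default_rng(0)
v = rng.standard_normal(16) + 1j * rng.standard_normal(16)
print(np.allclose(dft(v), np.sqrt(16) * np.fft.ifft(v)))
```

Because of the factor $1/\sqrt{2}$ in every level, the transform is unitary, so `np.linalg.norm(dft(v))` equals `np.linalg.norm(v)` up to rounding.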
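14.4.2 FFT for Tensorised Vectors

The vectors $v$ and $\hat{v} = F_d v$ correspond to tensors $\mathbf{v}, \hat{\mathbf{v}} \in \otimes^d \mathbb{C}^2$ with $\Phi_n(\mathbf{v}) = v$ and $\Phi_n(\hat{\mathbf{v}}) = \hat{v}$. Note that
$$\hat{\mathbf{v}} = \mathbf{F}_d \mathbf{v} \quad\text{for } \mathbf{F}_d := \Phi_n^{-1} F_d \Phi_n. \tag{14.40}$$
To perform $\mathbf{F}_d$ directly, we rewrite the function DFT in (14.39) for the tensorised quantities.

Lemma 14.33. Assume $n = 2^d$, $d \ge 1$.
(a) The tensors $\mathbf{v}^{I}, \mathbf{v}^{II} \in \otimes^{d-1}\mathbb{C}^2$ satisfying $\Phi_{n/2}(\mathbf{v}^{I}) = v^{I} = \text{Divide}(v, 1)$ and $\Phi_{n/2}(\mathbf{v}^{II}) = v^{II} = \text{Divide}(v, 2)$ are the parts of $\mathbf{v}$ for the indices 1 and 2 of the last direction, i.e., $\mathbf{v} = \mathbf{v}^{I} \otimes \binom{1}{0} + \mathbf{v}^{II} \otimes \binom{0}{1}$ (cf. (3.22a) with indices corresponding to the basis (14.2a)).
(b) Let $w^{I}, w^{II} \in \mathbb{C}^{n/2}$ with $w := \text{Merge}(w^{I}, w^{II}) \in \mathbb{C}^n$. The tensorised quantities $\mathbf{w}^{I} = \Phi_{n/2}^{-1}(w^{I})$, $\mathbf{w}^{II} = \Phi_{n/2}^{-1}(w^{II})$, and $\mathbf{w} = \Phi_n^{-1}(w)$ satisfy
$$\mathbf{w} = \binom{1}{0} \otimes \mathbf{w}^{I} + \binom{0}{1} \otimes \mathbf{w}^{II}.$$
(c) $\boldsymbol{\omega}_d = \bigotimes_{j=1}^{d-1} \binom{1}{\omega_d^{2^{j-1}}} = \Phi_{n/2}^{-1}(\vec{\omega}_d)$ is an elementary tensor.

Proof. For (a), (b) use definition (14.1c). For (c), see Remark 5.19 and (14.8). □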
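The three statements of Lemma 14.33 can be checked on dense arrays. This is a sketch under assumptions: $\Phi_n^{-1}$ is realised by a column-major reshape, and the helper names `tensorise`, `twiddle_tensor` and `merge` are hypothetical.

```python
import numpy as np
from functools import reduce

def tensorise(vec):
    """Phi_n^{-1}: vector of length 2^d -> tensor with k = sum_j i_j 2^(j-1)."""
    d = int(round(np.log2(len(vec))))
    return np.reshape(vec, (2,) * d, order="F")

def twiddle_tensor(d):
    """Elementary tensor of Lemma 14.33(c); requires d >= 2."""
    omega = np.exp(2j * np.pi / 2 ** d)
    factors = [np.array([1.0, omega ** (2 ** j)]) for j in range(d - 1)]
    return reduce(np.multiply.outer, factors)

def merge(w1, w2):
    """Merge(w1, w2): interleave the entries of w1 and w2."""
    w = np.empty(2 * len(w1), dtype=complex)
    w[0::2], w[1::2] = w1, w2
    return w

d, n = 4, 16
rng = np.random.default_rng(0)
v = rng.standard_normal(n)
V = tensorise(v)
# (a) Divide selects the index of the last direction
print(np.allclose(V[..., 0], tensorise(v[: n // 2])),
      np.allclose(V[..., 1], tensorise(v[n // 2:])))
# (b) Merge adds a new first direction
w1, w2 = rng.standard_normal(n // 2), rng.standard_normal(n // 2)
W = tensorise(merge(w1, w2))
print(np.allclose(W[0], tensorise(w1)), np.allclose(W[1], tensorise(w2)))
# (c) the twiddle factors form an elementary tensor
omega = np.exp(2j * np.pi / n)
print(np.allclose(twiddle_tensor(d), tensorise(omega ** np.arange(n // 2))))
```

Divide removes the last direction and Merge prepends a new first one, which is why the order of the directions is reversed after $d$ steps.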
By Lemma 14.33 there are direct realisations of Divide and Merge for the tensor counterpart, which we denote by **Divide** and **Merge**. The tensorised version of DFT is $\hat{\mathbf{v}} = \mathbf{DFT}(\mathbf{v}, d)$ satisfying (14.40). The analogous algorithmic description is¹⁴

    function DFT(v, d);   {v ∈ ⊗^d C^2}
      if d = 0 then DFT := v
      else begin
        v^I := Divide(v, 1);  v^II := Divide(v, 2);
        DFT := (1/√2) · Merge(DFT(v^I + v^II, d − 1), DFT(ω_d ⊙ (v^I − v^II), d − 1))
      end;

¹⁴ Instead of multiplying by $1/\sqrt{2}$ in each step, one can divide by $\sqrt{n}$ in the end.

Define $\Omega_d \mathbf{v}$ by $\frac{1}{\sqrt{2}}\,\mathbf{Merge}\big(\mathbf{v}^{I} + \mathbf{v}^{II},\ \boldsymbol{\omega}_d \odot (\mathbf{v}^{I} - \mathbf{v}^{II})\big)$ and observe the identity $\mathbf{F}_d = (\mathrm{id} \otimes \mathbf{F}_{d-1})\,\Omega_d$. The $d$-fold recursion yields
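$$\mathbf{F}_d = \Omega_1 \Omega_2 \cdots \Omega_{d-1} \Omega_d,$$
where $\Omega_j$ applies to the directions $d-j+1, \ldots, d$, while the directions $1, \ldots, d-j$ remain unchanged.

In its exact form this procedure is unsatisfactory. Each $\Omega_j$ doubles the number of terms so that finally $n = 2^d$ terms are created. As a consequence, the amount of work is not better than the usual FFT for vectors. Instead, after each step a truncation is applied:
$$\mathbf{F}_d^{\mathrm{trunc}} = T_1 \Omega_1 T_2 \Omega_2 \cdots T_{d-1} \Omega_{d-1} T_d \Omega_d \qquad (T_j: \text{truncation}).$$
The (nonlinear) truncation operator $T_j$ appearing in $\mathbf{F}_d^{\mathrm{trunc}}$ can be based on a prescribed accuracy or a prescribed rank.

Lemma 14.34. Given $\mathbf{v}$, set
$$\mathbf{v}_{(d+1)} := \tilde{\mathbf{v}}_{(d+1)} := \mathbf{v} \quad\text{and}\quad \mathbf{v}_{(j)} := \Omega_j \mathbf{v}_{(j+1)}, \quad \tilde{\mathbf{v}}_{(j)} := T_j \Omega_j \tilde{\mathbf{v}}_{(j+1)} \quad\text{for } j = d, \ldots, 1.$$
Then $\tilde{\mathbf{v}}_{(1)} = \mathbf{F}_d^{\mathrm{trunc}}\mathbf{v}$ must be compared with $\mathbf{v}_{(1)} = \mathbf{F}_d \mathbf{v}$. Assume that $T_j$ is chosen such that
$$\big\|T_j \Omega_j \tilde{\mathbf{v}}_{(j+1)} - \Omega_j \tilde{\mathbf{v}}_{(j+1)}\big\| \le \varepsilon\,\big\|\Omega_j \tilde{\mathbf{v}}_{(j+1)}\big\|$$
holds with respect to the Euclidean norm. Then the resulting error can be estimated by
$$\big\|\mathbf{F}_d^{\mathrm{trunc}}\mathbf{v} - \mathbf{F}_d \mathbf{v}\big\| \le \big[(1+\varepsilon)^d - 1\big]\,\|\mathbf{v}\| \approx d\,\varepsilon\,\|\mathbf{v}\|.$$
Proof. Set $\delta_j := \|\tilde{\mathbf{v}}_{(j)} - \mathbf{v}_{(j)}\|$. Note that $\delta_{d+1} = 0$. Since the operation $\Omega_j$ is unitary, the recursion
$$\begin{aligned}
\delta_j = \big\|\tilde{\mathbf{v}}_{(j)} - \mathbf{v}_{(j)}\big\| &= \Big\|\big[T_j \Omega_j \tilde{\mathbf{v}}_{(j+1)} - \Omega_j \tilde{\mathbf{v}}_{(j+1)}\big] + \Omega_j\big[\tilde{\mathbf{v}}_{(j+1)} - \mathbf{v}_{(j+1)}\big]\Big\|\\
&\le \varepsilon\,\big\|\Omega_j \tilde{\mathbf{v}}_{(j+1)}\big\| + \big\|\Omega_j\big[\tilde{\mathbf{v}}_{(j+1)} - \mathbf{v}_{(j+1)}\big]\big\| \le \varepsilon\,\big\|\tilde{\mathbf{v}}_{(j+1)}\big\| + \delta_{j+1}\\
&\le \varepsilon\,\big\|\mathbf{v}_{(j+1)}\big\| + (1+\varepsilon)\,\delta_{j+1} = \varepsilon\,\|\mathbf{v}\| + (1+\varepsilon)\,\delta_{j+1}
\end{aligned}$$
proves $\delta_j \le \big[(1+\varepsilon)^{d+1-j} - 1\big]\,\|\mathbf{v}\|$. □

Next, we assume that the ranks $\rho_1, \ldots, \rho_d$ of $\tilde{\mathbf{v}}_{(j)} \in \mathcal{H}_\rho^{\mathrm{tens}}$ are uniformly bounded by $\rho$. The main part of $\Omega_j$ is the Hadamard product with $\boldsymbol{\omega}_d$. The corresponding cost (cf. (14.7)) is of lower order than the truncation cost $O(d\rho^3)$ (cf. (14.9)). Since $\mathbf{F}_d^{\mathrm{trunc}}\mathbf{v}$ is obtained after $d$ steps, we obtain the following result about the computational cost.

Remark 14.35. If, using the $\mathcal{H}_\rho^{\mathrm{tens}}$ format, all intermediate results have TT ranks bounded by $\rho$, the truncated FFT version $\mathbf{F}_d^{\mathrm{trunc}}\mathbf{v}$ costs $O(d^2\rho^3)$ operations.

Note that $\rho$ is the maximum of the ranks of the input tensor $\mathbf{v}$ and of all intermediate results $\tilde{\mathbf{v}}_{(j)}$ including the final Fourier image $\hat{\mathbf{v}}$. For numerical examples, we refer to [80].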
14.5 Tensorisation of Functions

So far, tensorisation has been applied to vectors which might be viewed as grid functions. Now we use the same formalism for functions. This corresponds to the multiscale treatment of functions.
14.5.1 Isomorphism $\Phi_n^F$

Consider a space $F((a,b])$ of functions defined on the interval $(a,b] \subset \mathbb{R}$. The norm (and, possibly, the scalar product) of $F((a,b])$ must be invariant with respect to a shift, i.e., $\|f\|_{F((a,b])} = \|f(\cdot+\delta)\|_{F((a+\delta,b+\delta])}$. Furthermore, the function space $F((0,1])$ must allow discontinuities of the functions at $\nu/n$, $1 \le \nu \le n-1$. In the following we try to establish an isomorphism between $F((0,1])$ and
\[ V_n^F := F((0,1/n]) \otimes \bigotimes_{j=1}^{d} K^2 \qquad\text{with } n = 2^d . \]
For a better understanding, we first introduce the intermediate tensor space $V_n := F((0,1/n]) \otimes K^n$.

Definition 14.36. Define $\hat\Phi_n : V_n \to F((0,1])$ by
\[ f = \hat\Phi_n(\varphi \otimes v) \in F((0,1]) \quad\text{for } \varphi \in F((0,1/n]) \text{ and } v = (v_k)_{k=0}^{n-1} \in K^n \tag{14.41} \]
with $f(x) = v_k \cdot \varphi(x - \tfrac{k}{n})$ for $\tfrac{k}{n} < x \le \tfrac{k+1}{n}$ $(0 \le k \le n-1)$.
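The map of Definition 14.36 can be illustrated by a few lines of Python. This is a minimal sketch under the assumption that $\varphi$ is given as a callable on $(0, 1/n]$ and $v$ as a list of length $n$; the name `phi_hat` is hypothetical.

```python
import math


def phi_hat(phi, v):
    """Return f with f(x) = v[k] * phi(x - k/n) for k/n < x <= (k+1)/n."""
    n = len(v)

    def f(x):
        if not 0.0 < x <= 1.0:
            raise ValueError("x must lie in (0, 1]")
        k = math.ceil(x * n) - 1      # index of the subinterval (k/n, (k+1)/n]
        return v[k] * phi(x - k / n)

    return f


if __name__ == "__main__":
    f = phi_hat(lambda t: t, [1, 2, 3, 4])   # phi(t) = t on (0, 1/4]
    print(f(0.25), f(0.5), f(0.625), f(1.0))
```

The resulting function is in general discontinuous at the points $\nu/n$, which is why $F((0,1])$ must admit such discontinuities.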
1 it is impossible to choose $(i_\ell, j_\ell) := (i_{\ell-1}, j_{\ell-1})$ since this index belongs to the cross in which $M - R_{\ell-1}$ vanishes. More generally, $i_\ell \notin \{i_1, \ldots, i_{\ell-1}\}$ and $j_\ell \notin \{j_1, \ldots, j_{\ell-1}\}$ is required. The second last value $(i_{\ell-1}, j_{\ell-1})$ from the loop in line 4 in the previous step $\ell-1$ may satisfy this requirement. The loop in line 4 may be repeated a few times. Line 5 contains the explicit definition of $R_\ell$. Implicitly, $M - R_r$ is determined by (15.10b) from $T_{\tau\times\sigma}$ defined in line 6. In the case of approximation, algorithm (15.8) is either performed with fixed $r$, or the termination depends on a suitable stopping criterion.

Remark 15.10. In (15.8) the number of evaluated matrix entries is bounded by $O(r(\#I + \#J))$.

For large-scale matrices we also assume that computational cost and storage of the size $O(\#I + \#J)$ are acceptable, while $O(\#I \,\#J)$ is too large. However, this statement holds for a successful application only. There are cases for which this heuristic approach fails (cf. Hackbusch [138, pages 265–266], Börm–Grasedyck [38]).

Descriptions of the adaptive cross approximation including attempts to control the error and variations of the method can be found in Bebendorf [22, 23] and Börm–Grasedyck [38].

Finally, we remark that the same approach can be used for multivariate functions (cf. [138, §9.4.4]). In the case of a function $\Phi$ in two variables, the analogue of the rank-1 matrix $E(M, i, j)$ in §15.3 is the rank-one function
\[ E(\Phi, \xi, \eta)(x, y) = \Phi(\xi, y)\,\Phi(x, \eta) / \Phi(\xi, \eta), \]
which interpolates $\Phi(\cdot,\cdot)$ in the lines $x = \xi$ and $y = \eta$, i.e., $\Phi(\xi, y) = E(\Phi, \xi, \eta)(\xi, y)$ and $\Phi(x, \eta) = E(\Phi, \xi, \eta)(x, \eta)$.
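The interpolation property of the rank-1 cross and the exact recovery of a rank-$r$ matrix after $r$ steps can be tried out on a small example. The sketch below is not algorithm (15.8): for brevity it searches the pivot among all entries of the residual (which costs $O(\#I\,\#J)$ evaluations), whereas the algorithm of the text only evaluates a few rows and columns. All names are hypothetical; exact rational arithmetic is used to avoid rounding effects.

```python
from fractions import Fraction


def cross(M, i, j):
    """Rank-1 matrix E(M, i, j)[x][y] = M[x][j] * M[i][y] / M[i][j]."""
    pivot = Fraction(M[i][j])
    return [[M[x][j] * M[i][y] / pivot for y in range(len(M[0]))]
            for x in range(len(M))]


def cross_approximation(M, r):
    """Subtract up to r rank-1 crosses; return the pivots and the residual M - R_r."""
    R = [[Fraction(x) for x in row] for row in M]
    pivots = []
    for _ in range(r):
        # Simplification: full search for the largest residual entry.
        i, j = max(((a, b) for a in range(len(R)) for b in range(len(R[0]))),
                   key=lambda p: abs(R[p[0]][p[1]]))
        if R[i][j] == 0:
            break
        E = cross(R, i, j)
        R = [[R[x][y] - E[x][y] for y in range(len(R[0]))] for x in range(len(R))]
        pivots.append((i, j))
    return pivots, R


if __name__ == "__main__":
    M = [[2, 1, 3], [4, 3, 7], [6, 5, 11]]   # rank 2: third column = first + second
    print(cross_approximation(M, 2))
```

After one step the residual vanishes on the chosen cross; after two steps it vanishes everywhere because the example matrix has rank 2.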
15.4 Case $d \ge 3$

As stated in Lemma 15.5, the rank-1 tensor $E(v; i)$ interpolates $v$ at the cross $C(i)$; i.e., $v' := v - E(v; i)$ vanishes on $C(i)$. However, when we choose another cross $C(i')$, the next tensor $E(v'; i')$ (unlike the matrix case $d = 2$) need not vanish on $C(i)$ so that the next iterate $v' - E(v'; i')$ loses the interpolation property on $C(i)$. As a consequence, the iteration (15.8) yields tensors of rank $r$, which do not satisfy the statement of Lemma 15.8. In the case of locally best rank-one approximations, we have already seen this phenomenon in Remark 9.23b.

The properties observed above make it impossible to generalise the cross approximation directly to higher dimension. Another reason for the fact that Lemma 15.8 cannot hold for $d \ge 3$ is Proposition 3.37: tensor rank revealing algorithms must be NP hard.
15.4.1 Matricisation

There are several approaches to tensor versions of the cross approximation (cf. Espig–Grasedyck–Hackbusch [88], Oseledets–Tyrtyshnikov [242], Oseledets–Savostyanov–Tyrtyshnikov [239], Bebendorf [24]). The approximation of a rank-1 tensor is discussed by Bachmayr et al. [13]. Here we follow the algorithm of Ballani–Grasedyck–Kluge [17] (see also [15]). The latter algorithm directly produces the approximation in the hierarchical format.

For the explanation of the algorithm we first assume that $v \in H_r$ and that we know the ranks $r_\alpha = \operatorname{rank}_\alpha(v)$. As in Lemma 15.8a we want to recover the tensor by the cross approximation.

We recall that $r_\alpha = \dim(U_\alpha^{\min}(v))$, where $U_\alpha^{\min}(v) = \operatorname{range}(M_\alpha(v))$ involves the matricisation of $v$ (cf. (6.12)). $M_\alpha(v)$ is a matrix in $K^{I_\alpha \times I_{\alpha^c}}$ with the index sets $I_\alpha := \times_{j\in\alpha} I_j$ and $I_{\alpha^c} := \times_{j\in\alpha^c} I_j$ ($\alpha^c = D \setminus \alpha$). Theoretically, we may apply the methods of §15.3. Choose pivot subsets
\[ P_\alpha = \{ p_1^{(\alpha)}, \ldots, p_{r_\alpha}^{(\alpha)} \} \subset I_\alpha , \qquad P_{\alpha^c} = \{ p_1^{(\alpha^c)}, \ldots, p_{r_\alpha}^{(\alpha^c)} \} \subset I_{\alpha^c} \tag{15.12a} \]
(called $\tau, \sigma$ in §15.3) containing $r_\alpha$ indices such that $M_\alpha(v)|_{P_\alpha \times P_{\alpha^c}}$ is regular. Then $M_\alpha(v)$ is equal to
\[ M_\alpha(v)|_{I_\alpha \times P_{\alpha^c}} \cdot \big( M_\alpha(v)|_{P_\alpha \times P_{\alpha^c}} \big)^{-1} \cdot M_\alpha(v)|_{P_\alpha \times I_{\alpha^c}} \in K^{I_\alpha \times I_{\alpha^c}} \tag{15.12b} \]
(cf. (15.9) and Lemma 15.8). The columns
\[ b_i^{(\alpha)} := M_\alpha(v)[\,\bullet\,, p_i^{(\alpha^c)}], \qquad 1 \le i \le r_\alpha , \]
form a basis of $U_\alpha^{\min}(v)$ (cf. (11.18b)). Similarly, $b_j^{(\alpha^c)} := M_\alpha(v)[p_j^{(\alpha)}, \bullet\,]^T$ yields a basis of $U_{\alpha^c}^{\min}(v)$.
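In general, $I_\alpha$ and $I_{\alpha^c}$ are huge sets so that neither the matrix $M_\alpha(v)|_{I_\alpha \times P_{\alpha^c}}$ nor $M_\alpha(v)|_{P_\alpha \times I_{\alpha^c}}$ is practically available. This corresponds to the fact that the bases $\{b_i^{(\alpha)}\}$ and $\{b_i^{(\alpha^c)}\}$ are never stored. The only practically computable quantity is the $r_\alpha \times r_\alpha$ matrix $S_\alpha := M_\alpha(v)|_{P_\alpha \times P_{\alpha^c}}$ ($S_\alpha$ is the abbreviation of $S_{P_\alpha \times P_{\alpha^c}}$ introduced in Remark 15.9). Together with $S_\alpha$, also $T_\alpha := S_\alpha^{-1}$ is available.

Evaluations at $p_i^{(\alpha)} \in P_\alpha$ are particular functionals (cf. Remark 15.3):
\[ \varphi_i^{(\alpha)} \in V_\alpha' \quad\text{with}\quad \varphi_i^{(\alpha)}(v_\alpha) := v_\alpha[p_i^{(\alpha)}] \quad\text{for } v_\alpha \in V_\alpha . \]
As explained in Notation 3.57b, we identify $\varphi_i^{(\alpha)} \in V_\alpha'$ and $\varphi_i^{(\alpha)} \in L(V, V_{\alpha^c})$:
\[ \varphi_i^{(\alpha)}(v) = v[p_i^{(\alpha)}, \bullet\,] \in V_{\alpha^c} \quad\text{for } v \in V . \]
Similarly, evaluations at $p_i^{(\alpha^c)} \in P_{\alpha^c}$ are particular functionals $\varphi_i^{(\alpha^c)} \in V_{\alpha^c}'$ defined by $\varphi_i^{(\alpha^c)}(v_{\alpha^c}) := v_{\alpha^c}[p_i^{(\alpha^c)}]$ for $v_{\alpha^c} \in V_{\alpha^c}$. Identification of $\varphi_i^{(\alpha^c)} \in V_{\alpha^c}'$ and $\varphi_i^{(\alpha^c)} \in L(V, V_\alpha)$ yields
\[ \varphi_i^{(\alpha^c)}(v) = v[\,\bullet\,, p_i^{(\alpha^c)}] \in V_\alpha \quad\text{for } v \in V . \]
The bases $\{b_i^{(\alpha)}\}$ and $\{b_i^{(\alpha^c)}\}$ defined above, spanning $U_\alpha^{\min}(v)$ and $U_{\alpha^c}^{\min}(v)$, take now the form
\[ b_i^{(\alpha)} = \varphi_i^{(\alpha^c)}(v), \qquad b_j^{(\alpha^c)} = \varphi_j^{(\alpha)}(v) \qquad (1 \le i, j \le r_\alpha) . \tag{15.13} \]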
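Since $v \in U_\alpha^{\min}(v) \otimes U_{\alpha^c}^{\min}(v)$, there are coefficients $c_{ij}$ with
\[ v = \sum_{i,j=1}^{r_\alpha} c_{ij} \, b_i^{(\alpha)} \otimes b_j^{(\alpha^c)} . \tag{15.14} \]
Application of $\varphi_\nu^{(\alpha^c)}$ yields $b_\nu^{(\alpha)} = \varphi_\nu^{(\alpha^c)}(v) = \sum_{i,j=1}^{r_\alpha} c_{ij} \, b_i^{(\alpha)} \, \varphi_\nu^{(\alpha^c)}(b_j^{(\alpha^c)})$. Since $b_j^{(\alpha^c)} = \varphi_j^{(\alpha)}(v)$, the identity $\varphi_\nu^{(\alpha^c)}(b_j^{(\alpha^c)}) = (\varphi_j^{(\alpha)} \otimes \varphi_\nu^{(\alpha^c)})(v)$ holds. Hence the matrix $C = (c_{ij}) \in K^{r_\alpha \times r_\alpha}$ is the inverse $T_\alpha := S_\alpha^{-1}$ of
\[ S_\alpha = \Big( (\varphi_j^{(\alpha)} \otimes \varphi_i^{(\alpha^c)})(v) \Big)_{i,j=1}^{r_\alpha} = \Big( M_\alpha(v)[\, p_j^{(\alpha)}, p_i^{(\alpha^c)} ] \Big)_{i,j=1}^{r_\alpha} . \tag{15.15} \]
Here $M_\alpha(v)[\cdot,\cdot]$ denotes an entry of the matrix $M_\alpha(v)$. Equation (15.14) with $C = T_\alpha$ is the interpretation of (15.12b) in $V$.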
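15.4.2 Nestedness

Let $\alpha \in T_D$ with sons $\alpha_1$ and $\alpha_2$. While the index sets $P_\alpha, P_{\alpha^c}$ in (15.12a) are associated to $\alpha$, there are some other index sets $P_{\alpha_1}, P_{\alpha_1^c}$ and $P_{\alpha_2}, P_{\alpha_2^c}$ related to $\alpha_1$ and $\alpha_2$. Their cardinalities are
\[ \#P_{\alpha_\iota} = \#P_{\alpha_\iota^c} = r_{\alpha_\iota} := \dim(U_{\alpha_\iota}^{\min}(v)) \qquad (\iota = 1, 2) . \]
The bases of $U_{\alpha_1}^{\min}(v)$ and $U_{\alpha_2}^{\min}(v)$ are given by
\[ b_i^{(\alpha_1)} = \varphi_i^{(\alpha_1^c)}(v) := v[\,\bullet\,, p_i^{(\alpha_1^c)}], \qquad b_j^{(\alpha_2)} = \varphi_j^{(\alpha_2^c)}(v) := v[\,\bullet\,, p_j^{(\alpha_2^c)}] . \]
Since $U_\alpha^{\min}(v) \subset U_{\alpha_1}^{\min}(v) \otimes U_{\alpha_2}^{\min}(v)$ (cf. (11.14c)), there are coefficient matrices $C^{(\alpha,\ell)} = (c_{ij}^{(\alpha,\ell)})$ such that
\[ b_\ell^{(\alpha)} = \sum_{i=1}^{r_{\alpha_1}} \sum_{j=1}^{r_{\alpha_2}} c_{ij}^{(\alpha,\ell)} \, b_i^{(\alpha_1)} \otimes b_j^{(\alpha_2)} \tag{15.16} \]
(cf. (11.20)). For the determination of $c_{ij}^{(\alpha,\ell)}$ we make the ansatz
\[ b_\ell^{(\alpha)} = \sum_{i,\nu=1}^{r_{\alpha_1}} \sum_{j,\mu=1}^{r_{\alpha_2}} c_{i\nu}^{(\alpha_1)} c_{j\mu}^{(\alpha_2)} \, \big( \varphi_\nu^{(\alpha_1)} \otimes \varphi_\mu^{(\alpha_2)} \otimes \varphi_\ell^{(\alpha^c)} \big)(v) \; b_i^{(\alpha_1)} \otimes b_j^{(\alpha_2)} . \tag{15.17} \]
Since the $r_{\alpha_1} r_{\alpha_2}$ functionals $\varphi_\nu^{(\alpha_1)} \otimes \varphi_\mu^{(\alpha_2)}$ are linearly independent on the space $U_{\alpha_1}^{\min}(v) \otimes U_{\alpha_2}^{\min}(v)$ of dimension $r_{\alpha_1} r_{\alpha_2}$, equation (15.17) holds if and only if all images under $\varphi_{\nu'}^{(\alpha_1)} \otimes \varphi_{\mu'}^{(\alpha_2)}$ are equal. The left-hand side yields
\[ \big( \varphi_{\nu'}^{(\alpha_1)} \otimes \varphi_{\mu'}^{(\alpha_2)} \big) \big( b_\ell^{(\alpha)} \big) \underset{(15.13)}{=} \big( \varphi_{\nu'}^{(\alpha_1)} \otimes \varphi_{\mu'}^{(\alpha_2)} \big) \big( \varphi_\ell^{(\alpha^c)}(v) \big) = \big( \varphi_{\nu'}^{(\alpha_1)} \otimes \varphi_{\mu'}^{(\alpha_2)} \otimes \varphi_\ell^{(\alpha^c)} \big)(v), \tag{15.18a} \]
while the right-hand side is equal to
\[ \sum_{i,\nu=1}^{r_{\alpha_1}} \sum_{j,\mu=1}^{r_{\alpha_2}} c_{i\nu}^{(\alpha_1)} c_{j\mu}^{(\alpha_2)} \, \big( \varphi_\nu^{(\alpha_1)} \otimes \varphi_\mu^{(\alpha_2)} \otimes \varphi_\ell^{(\alpha^c)} \big)(v) \; \varphi_{\nu'}^{(\alpha_1)}(b_i^{(\alpha_1)}) \, \varphi_{\mu'}^{(\alpha_2)}(b_j^{(\alpha_2)}) . \tag{15.18b} \]
As in (15.15), we have
\[ S_{\alpha_1} = \Big( \big( \varphi_\nu^{(\alpha_1)} \otimes \varphi_i^{(\alpha_1^c)} \big)(v) \Big)_{\nu,i=1}^{r_{\alpha_1}} = \Big( \varphi_\nu^{(\alpha_1)}(b_i^{(\alpha_1)}) \Big)_{\nu,i=1}^{r_{\alpha_1}}, \qquad S_{\alpha_2} = \Big( \varphi_\mu^{(\alpha_2)}(b_j^{(\alpha_2)}) \Big)_{\mu,j=1}^{r_{\alpha_2}} \]
and $T_{\alpha_1} := S_{\alpha_1}^{-1}$, $T_{\alpha_2} := S_{\alpha_2}^{-1}$. Set $C_{\alpha_1} = (c_{i\nu}^{(\alpha_1)})_{i,\nu=1}^{r_{\alpha_1}}$ and $C_{\alpha_2} = (c_{j\mu}^{(\alpha_2)})_{j,\mu=1}^{r_{\alpha_2}}$. Then (15.18b) becomes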
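\[ \sum_{\nu=1}^{r_{\alpha_1}} \sum_{\mu=1}^{r_{\alpha_2}} (S_{\alpha_1} C_{\alpha_1})_{\nu',\nu} \, (S_{\alpha_2} C_{\alpha_2})_{\mu',\mu} \, \big( \varphi_\nu^{(\alpha_1)} \otimes \varphi_\mu^{(\alpha_2)} \otimes \varphi_\ell^{(\alpha^c)} \big)(v) . \tag{15.18c} \]
A comparison of (15.18a) and (15.18c) shows that $C_{\alpha_1} = T_{\alpha_1}$ and $C_{\alpha_2} = T_{\alpha_2}$ yield the desired identity. Hence the coefficients $c_{ij}^{(\alpha,\ell)}$ in (15.16) satisfy
\[
c_{ij}^{(\alpha,\ell)}
 = \sum_{\nu=1}^{r_{\alpha_1}} \sum_{\mu=1}^{r_{\alpha_2}} T_{\alpha_1}[i,\nu] \, T_{\alpha_2}[j,\mu] \, \big( \varphi_\nu^{(\alpha_1)} \otimes \varphi_\mu^{(\alpha_2)} \otimes \varphi_\ell^{(\alpha^c)} \big)(v)
 = \sum_{\nu=1}^{r_{\alpha_1}} \sum_{\mu=1}^{r_{\alpha_2}} T_{\alpha_1}[i,\nu] \, T_{\alpha_2}[j,\mu] \, v[p_\nu^{(\alpha_1)}, p_\mu^{(\alpha_2)}, p_\ell^{(\alpha^c)}] .
\]
For matrix notation set
\[ V_\ell^{(\alpha)} := \Big( v[p_\nu^{(\alpha_1)}, p_\mu^{(\alpha_2)}, p_\ell^{(\alpha^c)}] \Big)_{\nu=1,\ldots,r_{\alpha_1},\ \mu=1,\ldots,r_{\alpha_2}} \qquad\text{and}\qquad C^{(\alpha,\ell)} = T_{\alpha_1} V_\ell^{(\alpha)} T_{\alpha_2}^T . \tag{15.18d} \]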
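Formula (15.18d) can be verified on a small tensor of order 3. The following Python sketch uses a hypothetical test tensor with all matricisation ranks equal to 2, the tree vertex $\alpha = \{1,2\}$ with sons $\alpha_1 = \{1\}$, $\alpha_2 = \{2\}$, and hand-picked pivot sets (all of these are assumptions for illustration, with 0-based indices). It checks the characteristic relation (15.16) entrywise in exact arithmetic.

```python
from fractions import Fraction


def inverse_2x2(S):
    """Exact inverse of a regular 2x2 matrix."""
    det = Fraction(S[0][0] * S[1][1] - S[0][1] * S[1][0])
    return [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]


# Hypothetical test tensor v[i, j, k] = sum_t a[t][i] * b[t][j] * c[t][k].
a = [[1, 0, 1], [0, 1, 1]]
b = [[1, 0, 2], [0, 1, 1]]
c = [[1, 0, 1], [0, 1, 3]]


def v(i, j, k):
    return sum(a[t][i] * b[t][j] * c[t][k] for t in range(2))


P1, P1c = [0, 2], [(0, 0), (1, 1)]   # rows i and columns (j, k) of M_{alpha_1}(v)
P2, P2c = [0, 2], [(0, 0), (1, 1)]   # rows j and columns (i, k) of M_{alpha_2}(v)
Pc = [0, 1]                          # pivots p_l^{(alpha^c)} in direction 3

S1 = [[v(p, q[0], q[1]) for q in P1c] for p in P1]
S2 = [[v(q[0], p, q[1]) for q in P2c] for p in P2]
T1, T2 = inverse_2x2(S1), inverse_2x2(S2)


def coefficient_matrix(l):
    """C^{(alpha,l)} = T1 * V_l * T2^T with V_l[nu][mu] = v[p_nu, p_mu, p_l]."""
    V = [[v(P1[nu], P2[mu], Pc[l]) for mu in range(2)] for nu in range(2)]
    return [[sum(T1[i][nu] * V[nu][mu] * T2[j][mu]
                 for nu in range(2) for mu in range(2))
             for j in range(2)] for i in range(2)]


def reconstructed(l, i, j):
    """Right-hand side of (15.16), evaluated at the index pair (i, j)."""
    C = coefficient_matrix(l)
    return sum(C[s][t] * v(i, P1c[s][0], P1c[s][1]) * v(P2c[t][0], j, P2c[t][1])
               for s in range(2) for t in range(2))
```

Only the small matrices $S_{\alpha_1}$, $S_{\alpha_2}$, $V_\ell^{(\alpha)}$ and single fibres of $v$ are evaluated here; the bases $b_\ell^{(\alpha)}$ themselves are never formed.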
The case $\alpha = D$ is exceptional because of $\alpha^c = \emptyset$. We recall that $M_D(v) = v$ (cf. Footnote 1 on page 188).

Remark 15.11. In the case of $\alpha = D$ with sons $\alpha_1$ and $\alpha_2$ we have $r_D = 1$, $b_1^{(D)} = v$, $p_i^{(D^c)} = \emptyset$, $\varphi_i^{(D^c)} = \mathrm{id}$, and formally $b_1^{(D^c)} = 1 \in K$. There is only one matrix $V_1^{(D)} = \big( v[p_\nu^{(\alpha_1)}, p_\mu^{(\alpha_2)}] \big)_{\nu,\mu} = S_{\alpha_1}$. Since $S_{\alpha_2} = M_{\alpha_2}(v)|_{P_{\alpha_2} \times P_{\alpha_1}} = S_{\alpha_1}^T$, (15.18d) yields $C^{(D,1)} = T_{\alpha_1} = S_{\alpha_1}^{-1}$. $c^{(D)} = 1$ is a consequence of $b_1^{(D)} = v$.

We summarise the results.

Proposition 15.12. Assume that $v \in H_r$ holds exactly. Then the parameters $C_\alpha$, $c^{(D)}$, and $B_j$ of the hierarchical representation can be determined as follows.
(a) For $\alpha = D$ with sons $\alpha_1$ and $\alpha_2$, $v = b_1^{(D)} = \sum_{i,j=1}^{r_{\alpha_1}} c_{ij}^{(D,1)} \, b_i^{(\alpha_1)} \otimes b_j^{(\alpha_2)}$ holds with $C^{(D,1)} = T_{\alpha_1}$. Furthermore, $c^{(D)} = 1 \in K$.
(b) Let $\alpha \in T_D \setminus (\{D\} \cup L(T_D))$ with sons $\alpha_1$ and $\alpha_2$. For $\beta \in \{\alpha, \alpha_1, \alpha_2\}$ and $r_\beta := \operatorname{rank}_\beta(v)$, there are row and column index sets $P_\beta$, $P_{\beta^c}$ of $M_\beta(v)$ such that the $r_\beta \times r_\beta$ matrix $S_\beta = M_\beta(v)|_{P_\beta \times P_{\beta^c}}$ is regular. For any such index sets, bases $\{ b_i^{(\beta)} : 1 \le i \le r_\beta \}$ are defined by $v[\,\bullet\,, p_i^{(\beta^c)}]$. The coefficient matrix $C^{(\alpha,\ell)}$ for the characteristic relation (15.16) is given by (15.18d).
(c) If, in the cases (a,b), a son $\alpha_\iota$ belongs to $L(T_D)$, $\alpha_\iota = \{j\}$ holds for some $j \in D$, and $B_j = [\, b_1^{(\alpha_\iota)} \; b_2^{(\alpha_\iota)} \cdots b_{r_j}^{(\alpha_\iota)} ]$ is used as basis of $U_j^{\min}(v) \subset V_j$.
(d) The inverses $T_\alpha$ are computed via the recursion (15.10a).

Proof. Remark 15.11 shows part (a). Part (b) is explained in §15.4.2. □
15.4.3 Algorithm

15.4.3.1 Provisional Form

We repeat the algorithmic steps mentioned in Proposition 15.12. In this provisional version of the algorithm, the following steps are independent and could be performed in parallel.

Step 1: for all $\alpha \in T_D$ determine index subsets $P_\alpha \subset I_\alpha$, $P_{\alpha^c} \subset I_{\alpha^c}$ such that $S_\alpha$ in (15.15) is regular. $T_\alpha$ is computed via (15.10a). Here the size $r_\alpha := \#P_\alpha = \#P_{\alpha^c}$ may be either prescribed or determined adaptively.

Step 2: for $\alpha = D$, set $C^{(D,1)} := T_{\alpha_1}$ ($\alpha_1$ first son of $D$) and $c_1^{(D)} := 1$.

Step 3: for all $\alpha \in T_D \setminus (\{D\} \cup L(T_D))$ determine
\[ V_\ell^{(\alpha)} := \big( v[p_\nu^{(\alpha_1)}, p_\mu^{(\alpha_2)}, p_\ell^{(\alpha^c)}] \big)_{\nu,\mu} \qquad (\alpha_1 \text{ and } \alpha_2 \text{ sons of } \alpha) \]
and define $C^{(\alpha,\ell)}$ in (15.18d) from $V_\ell^{(\alpha)}$ and $T_{\alpha_1}$, $T_{\alpha_2}$ (cf. (15.10a)).

Step 4: for all $1 \le j \le d$ set
\[ b_i^{(j)} := \varphi_i^{(\{j\}^c)}(v) = v[\,\bullet\,, p_i^{(\{j\}^c)}] \qquad (1 \le i \le r_j) . \]
The basis vectors $b_i^{(\alpha)}$ for $\alpha \in T_D \setminus L(T_D)$ are explained in Proposition 15.12, but they do not enter into the algorithm. Instead, the characteristic parameters $C^{(\alpha,\ell)}$, $B_j$, and $c^{(D)}$ are determined. The choice of the pivots should ensure the regularity of the matrix $S_\alpha = M_\alpha(v)|_{P_\alpha \times P_{\alpha^c}}$; in particular, $|\det(S_\alpha)|$ should not be too small. Therefore strategies for suitable subsets $P_\alpha$, $P_{\alpha^c}$ will be discussed later. For a fixed choice of all subsets $P_\alpha$ the number of necessary evaluations of tensor entries is given next.

Remark 15.13. $b_i^{(j)} = v[\,\bullet\,, p_i^{(\{j\}^c)}]$ is the fibre $F(v; j, p_i^{(\{j\}^c)})$ in direction $j$. Its computation requires $\dim(V_j)$ evaluations of $v$. Matrix $S_\alpha$ requires $r_\alpha^2$ evaluations of the tensor $v$. The matrices $V_\ell^{(\alpha)}$ ($1 \le \ell \le r_\alpha$) need $r_\alpha r_{\alpha_1} r_{\alpha_2}$ evaluations. Altogether,
\[ \sum_{\alpha \in T_D \setminus L(T_D)} r_\alpha r_{\alpha_1} r_{\alpha_2} + \sum_{\alpha \in T_D} r_\alpha^2 + \sum_{j=1}^{d} r_j \dim(V_j) \]
evaluations of $v$ are required. In the case of tensorisation with $\dim(V_j) = 2$, the leading term of evaluations is $3dr^2$ ($r := \max_\alpha r_\alpha$), which is surprisingly small compared with the huge total number of tensor entries.
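The count of Remark 15.13 is easy to evaluate for a concrete tree. The sketch below makes several assumptions that are not stated in the text: a linear (TT-like) tree, $\dim(V_j) = 2$, leaf ranks $r_j = 2$, root rank 1, and all other ranks equal to $r$. The function names are hypothetical.

```python
def evaluation_count(interior, all_ranks, leaves):
    """Number of tensor evaluations for fixed pivots.

    interior : (r_alpha, r_son1, r_son2) for every non-leaf vertex
    all_ranks: r_alpha for every vertex of the tree
    leaves   : (r_j, dim V_j) for every direction j
    """
    return (sum(r * r1 * r2 for r, r1, r2 in interior)
            + sum(r * r for r in all_ranks)
            + sum(rj * nj for rj, nj in leaves))


def tt_tensorisation_count(d, r):
    """Linear tree with d leaves of dimension 2 (assumed ranks, see lead-in)."""
    interior = [(1, r, 2)] + [(r, r, 2)] * (d - 3) + [(r, 2, 2)]
    all_ranks = [1] + [r] * (d - 2) + [2] * d
    leaves = [(2, 2)] * d
    return evaluation_count(interior, all_ranks, leaves)


if __name__ == "__main__":
    d, r = 1000, 20
    print(tt_tensorisation_count(d, r), 3 * d * r * r, 2 ** d)
```

Under these assumptions the count is close to $3dr^2$, in agreement with the leading term stated in the remark, while the tensor itself has $2^d$ entries.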
15.4.3.2 Choice of Index Sets

In (15.11a) we described how to improve the choice of the two-dimensional pivot $(i, j)$. In the tensor case the rows and columns contain too many entries so that the search for suitable pivots must be changed.

Consider a vertex $\alpha \in T_D \setminus L(T_D)$ with sons $\alpha_1$ and $\alpha_2$. We want to find the pivots for $M_{\alpha_1}(v)$, whose rows and columns belong to $I_{\alpha_1}$ and $I_{\alpha_1^c} = I_{\alpha_2} \times I_{\alpha^c}$. From the previous computation at $\alpha$ we already have a pivot index set $P_{\alpha^c} = \{ p_1^{(\alpha^c)}, \ldots, p_{r_\alpha}^{(\alpha^c)} \} \subset I_{\alpha^c}$ (cf. (15.12a)). To reduce the search steps, possible pivots $i = (i_1, \ldots, i_d) \in I$ are restricted to those satisfying $(i_j)_{j\in\alpha^c} \in P_{\alpha^c}$ (note that there are only $r_\alpha$ tuples in $P_{\alpha^c}$). The remaining components $(i_j)_{j\in\alpha}$ are obtained by maximising along one fibre. Again, the starting index $i$ must be such that $d[i] \ne 0$ for the difference $d = v - v_{\ell-1}$ ($v_{\ell-1}$: actual approximation). The paragraph following (15.11b) is again valid.

The index tuple $i \in I$ is input and return value of the following function. We split $i = (i_1, \ldots, i_d)$ into $i_\alpha := (i_j)_{j\in\alpha} \in I_\alpha$ and $i_{\alpha^c} := (i_j)_{j\in\alpha^c} \in I_{\alpha^c}$. We write $i = (i_\alpha, i_{\alpha^c}) \in I$ for the tuple $i$ constructed from both $i_\alpha$ and $i_{\alpha^c}$ (note that the indices in $\alpha$, $\alpha^c$ need not be numbered consecutively; e.g., $\alpha = \{1, 4\}$, $\alpha^c = \{2, 3\}$).

  1  function ImprovedPivot($\alpha$, $d$, $i$);                                   (15.19)
  2  {input: $\alpha \in T_D$, $d \in V$, $i \in I$ satisfying $i_{\alpha^c} \in P_{\alpha^c} \subset I_{\alpha^c}$}
  3  begin for all $j \in \alpha$ do $i_j := \operatorname{argmax}_{i \in I_j} |d[i_1, \ldots, i_{j-1}, i, i_{j+1}, \ldots, i_d]|$;
  4     $i_{\alpha^c} := \operatorname{argmax}_{i_{\alpha^c} \in P_{\alpha^c}} |d[i_\alpha, i_{\alpha^c}]|$;
  5     ImprovedPivot := $(i_\alpha, i_{\alpha^c})$
  6  end;
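The function (15.19) translates almost literally into Python. In the following sketch the tensor $d$ is assumed to be available as a callable on index tuples, `alpha` and `alpha_c` are lists of 0-based positions, and `pivots_c` is the list of tuples in $P_{\alpha^c}$; these names and this data layout are assumptions for illustration.

```python
def improved_pivot(alpha, alpha_c, pivots_c, d, index, shape):
    """One sweep of fibre-wise maximisation followed by a search over pivots_c."""
    i = list(index)
    for j in alpha:
        # Line 3 of (15.19): maximise |d| along the fibre in direction j.
        i[j] = max(range(shape[j]),
                   key=lambda t: abs(d(tuple(i[:j] + [t] + i[j + 1:]))))

    def with_complement(pc):
        """Insert the tuple pc into the positions listed in alpha_c."""
        full = list(i)
        for pos, val in zip(alpha_c, pc):
            full[pos] = val
        return tuple(full)

    # Line 4: maximise over the known pivot tuples of the complement.
    best_c = max(pivots_c, key=lambda pc: abs(d(with_complement(pc))))
    # Line 5: return the combined index.
    return with_complement(best_c)


if __name__ == "__main__":
    def d(idx):
        return (idx[0] + 1) * (idx[1] + 1) * (1 if idx[2] == 0 else -3)

    print(improved_pivot([0, 1], [2], [(0,), (1,)], d, (0, 0, 0), (3, 3, 2)))
```

One call evaluates $d$ on $\sum_{j\in\alpha} \dim(V_j)$ fibre entries plus $\#P_{\alpha^c}$ further entries, as counted in the text.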
A single step in line 3 maximises $|d[i_1, \ldots, i_d]|$ on the fibre $F(j, i)$. This requires the evaluation of the tensor along this fibre. Since $j$ is restricted to $\alpha$, line 3 defines the part $i_\alpha \in I_\alpha$. Having fixed $i_\alpha$, we need only $r_\alpha$ evaluations for the maximisation over $P_{\alpha^c}$ in line 4. The parts $i_\alpha$, $i_{\alpha^c}$ define the return value in line 5. In total, one call of the function requires $r_\alpha + \sum_{j\in\alpha} \dim(V_j)$ evaluations of $d$. The obtained index $i \in I$ will be split into $i_{\alpha_1} := (i_j)_{j\in\alpha_1} \in I_{\alpha_1}$ and $i_{\alpha_1^c}$ for a son $\alpha_1$ of $\alpha$. Note that $i_{\alpha_1^c}$ is formed by $i_{\alpha_2}$ (with entries optimised in line 3) and $i_{\alpha^c}$ from line 4.

According to Step 1 in §15.4.3.1, the matrices $S_{\alpha_1}$ and their inverses $T_{\alpha_1}$ for the sons $\alpha_1$ of $\alpha \in T_D$ are to be determined. In fact, $T_{\alpha_1}$ is explicitly determined in (15.10a). The following procedure ImproveS performs one iteration step $T_{\alpha_1} \in K^{(r-1)\times(r-1)} \mapsto T_{\alpha_1} \in K^{r\times r}$ together with the determination of the index sets $P_{\alpha_1}$, $P_{\alpha_1^c}$. These data determine the approximation $v_r \in V$ satisfying $\operatorname{rank}(M_{\alpha_1}(v_r)) = r$. The evaluation of $d := v - v_{r-1}$ in ImprovedPivot needs a comment since $v_{r-1}$ is not given directly.⁹ The inverse matrix $T_{\alpha_1} = S_{\alpha_1}^{-1} \in K^{(r-1)\times(r-1)}$ is computed via (15.10a) and yields the representation
\[ v_{r-1}[i_{\alpha_1}, i_{\alpha_2}, i_{\alpha^c}] = \sum_{i,j=1}^{r-1} v[i_{\alpha_1}, p_i^{(\alpha_2)}, i_{\alpha^c}] \; T_{\tau\times\sigma}[p_i^{(\alpha_2)}, p_j^{(\alpha_1)}] \; v[p_j^{(\alpha_1)}, i_{\alpha_2}, i_{\alpha^c}] \tag{15.20} \]
(cf. (15.10b)), using $p_j^{(\alpha_1)} \in P_{\alpha_1}$ and $p_i^{(\alpha_2)} \in P_{\alpha_2}$. Changing $i$, we must update $v[i_{\alpha_1}, p_i^{(\alpha_2)}, i_{\alpha^c}]$ and $v[p_j^{(\alpha_1)}, i_{\alpha_2}, i_{\alpha^c}]$.

⁹ Note the difference to the matrix case. In (15.10b) the rows $M[\,\bullet\,, j']$ and columns $M[i', \bullet\,]$ are already evaluated.

  1  procedure ImproveS($\alpha$, $\alpha_1$, $v$, $r$, $T_{\alpha_1}$, $P_{\alpha_1}$, $P_{\alpha_1^c}$, $P_{\alpha^c}$);
  2  {input parameters: $\alpha_1$ son of $\alpha \in T_D$, $v \in V$, $P_{\alpha^c} \subset I_{\alpha^c}$;
  3   in- and output: $r \in \mathbb{N}_0$; $T_{\alpha_1} \in K^{r\times r}$, $P_{\alpha_1} \subset I_{\alpha_1}$, $P_{\alpha_1^c} \subset I_{\alpha_1^c}$}
  4  begin choose initial indices $i_\alpha \in I_\alpha$ and $i_{\alpha^c} \in P_{\alpha^c}$; $i := (i_\alpha, i_{\alpha^c})$;
  5     for $\chi := 1$ to $\chi_{\max}$ do $i$ := ImprovedPivot($\alpha_1$, $v - v_{\ell-1}$, $i$);
  6     $P_{\alpha_1} := P_{\alpha_1} \cup \{i_{\alpha_1}\}$; $P_{\alpha_1^c} := P_{\alpha_1^c} \cup \{i_{\alpha_1^c}\}$; $r := r + 1$;
  7     compute $T_{\alpha_1} \in K^{r\times r}$ via (15.10a);
  8  end;
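Index $i$ obtained in line 5 yields the largest value $|(v - v_{\ell-1})[i]|$ among the fibres checked in ImprovedPivot. A typical value is $\chi_{\max} = 3$. In line 6, the parts $i_{\alpha_1} := (i_j)_{j\in\alpha_1} \in I_{\alpha_1}$ and $i_{\alpha_1^c}$ become new pivots in $P_{\alpha_1}$ and $P_{\alpha_1^c}$ (in (15.15) the entries are denoted by $p_j^{(\alpha)}$ and $p_j^{(\alpha^c)}$). $T_{\alpha_1} = (S_{\alpha_1})^{-1}$ is updated in line 7.

The final algorithm for determining $S_{\alpha_1}$ ($T_{\alpha_1}$) depends on the stopping criterion. If $r_{\alpha_1}$ is prescribed, the criterion is $r = r_{\alpha_1}$. If $r_{\alpha_1}$ should be determined adaptively, an error estimation can be applied.

  procedure DetS($\alpha$, $\alpha_1$, $v$, $r$, $T_{\alpha_1}$, $P_{\alpha_1}$, $P_{\alpha_1^c}$, $P_{\alpha^c}$);
  {input parameters: $\alpha_1$ son of $\alpha \in T_D$, $v \in V$, $P_{\alpha^c} \subset I_{\alpha^c}$;
   output: $r \in \mathbb{N}_0$; $T_{\alpha_1} \in K^{r\times r}$, $P_{\alpha_1} \subset I_{\alpha_1}$; $P_{\alpha_1^c} \subset I_{\alpha_1^c}$}
  begin $r := 0$; $T_{\alpha_1} := 0$; $P_{\alpha_1} := P_{\alpha_1^c} := \emptyset$;
     repeat ImproveS($\alpha$, $\alpha_1$, $v$, $r$, $T_{\alpha_1}$, $P_{\alpha_1}$, $P_{\alpha_1^c}$, $P_{\alpha^c}$) until criterion satisfied
  end;

The call HierAppr($T_D$, $D$, $v$, $(C_\alpha)_{\alpha\in T_D\setminus L(T_D)}$, $c^{(D)}$, $(B_j)_{j\in D}$, $(P_\alpha)_{\alpha\in T_D}$) of the next procedure determines the parameters $C_\alpha = (C^{(\alpha,\ell)})_{1\le\ell\le r_\alpha}$, $c^{(D)} \in K$, and $(B_j)_{j\in D} \in (V_j)^{r_j}$ of a hierarchical approximation $\tilde v = \rho_{\mathrm{HT}}(\ldots) \approx v$ in (11.24). Note that $v$ is the given tensor in full functional representation (cf. (7.6)), while $\tilde v$ is the approximation due to the chosen criterion. We recall that $v = \tilde v$ holds if the criterion prescribes the ranks $r_\alpha = \operatorname{rank}(M_\alpha(v))$.

  1  procedure HierAppr($T_D$, $\alpha$, $v$, $(C_\alpha)_{\alpha\in T_D\setminus L(T_D)}$, $c^{(D)}$, $(B_j)_{j\in D}$, $(P_\alpha)_{\alpha\in T_D}$);
  2  {input: $T_D$, $\alpha \in T_D \setminus L(T_D)$, $v \in V$; output: $C_\alpha$, $c^{(D)}$, $B_j$, $P_\alpha$}
  3  if $\alpha = D$ then
  4  begin $c^{(D)} := 1$; determine sons $\alpha_1$, $\alpha_2$ of $D$;
  5     DetS($\alpha$, $\alpha_1$, $v$, $r_{\alpha_1}$, $T_{\alpha_1}$, $P_{\alpha_1}$, $P_{\alpha_2}$, $\emptyset$); $C^{(D,1)} := T_{\alpha_1}$;
  6     HierAppr($T_D$, $\alpha_1$, $v$, $(C_\alpha)_\alpha$, $c^{(D)}$, $(B_j)_j$, $(P_\alpha)_\alpha$);
  7     HierAppr($T_D$, $\alpha_2$, $v$, $(C_\alpha)_\alpha$, $c^{(D)}$, $(B_j)_j$, $(P_\alpha)_\alpha$);
  8  end else if $\alpha \notin L(T_D)$ then
  9  begin determine sons $\alpha_1$, $\alpha_2$ of $\alpha$;
 10     DetS($\alpha$, $\alpha_1$, $v$, $r_{\alpha_1}$, $T_{\alpha_1}$, $P_{\alpha_1}$, $P_{\alpha_1^c}$, $P_{\alpha^c}$);
 11     DetS($\alpha$, $\alpha_2$, $v$, $r_{\alpha_2}$, $T_{\alpha_2}$, $P_{\alpha_2}$, $P_{\alpha_2^c}$, $P_{\alpha^c}$);
 12     for $\ell := 1$ to $r_\alpha$ do $C^{(\alpha,\ell)} := T_{\alpha_1} V_\ell^{(\alpha)} T_{\alpha_2}^T$
        (where $V_\ell^{(\alpha)}[\nu, \mu] := v[p_\nu^{(\alpha_1)}, p_\mu^{(\alpha_2)}, p_\ell^{(\alpha^c)}]$);
 13     if $\alpha_1 \notin L(T_D)$ then HierAppr($T_D$, $\alpha_1$, $v$, $(C_\alpha)_\alpha$, $c^{(D)}$, $(B_j)_j$, $(P_\alpha)_\alpha$)
 14     else $B_j := [\, b_1^{(j)} \cdots b_{r_j}^{(j)} ]$ with $\alpha_1 = \{j\}$, $b_i^{(j)} := v[\,\bullet\,, p_i^{(\alpha_1^c)}]$;
 15     if $\alpha_2 \notin L(T_D)$ then HierAppr($T_D$, $\alpha_2$, $v$, $(C_\alpha)_\alpha$, $c^{(D)}$, $(B_j)_j$, $(P_\alpha)_\alpha$)
 16     else $B_j := [\, b_1^{(j)} \cdots b_{r_j}^{(j)} ]$ with $\alpha_2 = \{j\}$, $b_i^{(j)} := v[\,\bullet\,, p_i^{(\alpha_2^c)}]$;
 17  end;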
Lines 3–7 correspond to Proposition 15.12a. Lines 9–15 describe the case of Proposition 15.12b. For leaves $\alpha_1$ or $\alpha_2$, case (c) of Proposition 15.12 applies. Concerning error estimates we refer to Ballani–Grasedyck–Kluge [17].

15.4.3.3 Cost

We must distinguish the number $N_a$ of arithmetic operations and the number $N_e$ of evaluations of tensor entries. The value $N_a = O(dr^4 + N_e)$ with $r := \max_{\alpha\in T_D} r_\alpha$ is of minor interest (cf. [17]). More important is $N_e$ because evaluations may be rather costly. Following Remark 15.13c, for fixed pivots we need $(d-1)\,r^3 + (2d-1)\,r^2 + drn$ evaluations, where $r := \max_{\alpha\in T_D} r_\alpha$ and $n := \max_j n_j$.

Another source of evaluations is the pivot search by ImprovedPivot($\alpha$, $d$, $i$) in (15.19), where $\#\alpha$ fibres and $\#P_{\alpha^c}$ indices $i = i(i_\alpha, i_{\alpha^c})$ are tested with respect to the size of $|d[i]|$. Since $d = v - v_{\ell-1}$, we must evaluate $v[i]$ and $v_{\ell-1}[i]$. The latter expression is defined in (15.20). If a new index $i_j \in I_j$ belongs to direction $j \in \alpha_1$, the sum in (15.20) involves $r - 1$ new values $v[i_{\alpha_1}, p_i^{(\alpha_2)}, i_{\alpha^c}]$. Similarly for $j \in \alpha_2$. This leads to $2(r-1) \sum_{j\in\alpha} n_j$ evaluations. Variation of $i_{\alpha^c} \in P_{\alpha^c}$ causes only $2(r-1)\,\#P_{\alpha^c}$ evaluations. Since the procedure ImproveS is called for $1 \le r \le r_{\alpha_1}$, the total number of evaluations due to the pivot choice is bounded by
\[ \sum_{\alpha \in T_D \setminus L(T_D)} r_{\alpha_1}^2 \, \#\alpha \sum_{j\in\alpha} n_j \;\le\; d \cdot \operatorname{depth}(T_D) \cdot r^2 \sum_{j=1}^{d} n_j , \]
where $r := \max_{\alpha\in T_D} r_\alpha$. We recall that for a balanced tree $T_D$ the depth of $T_D$ is equal to $\lceil \log_2 d \rceil$, whereas $\operatorname{depth}(T_D) = d - 1$ holds for the TT format (cf. Remark 11.5). In the first case, the total number of evaluations is $N_e = O\big( (d-1)\, r^3 + d \log(d)\, r^2 n \big)$.
Chapter 16
Applications to Elliptic Partial Differential Equations
Abstract We consider elliptic partial differential equations in $d$ variables and their discretisation in a product grid $I = \times_{j=1}^{d} I_j$. The solution of the discrete system is a grid function, which can directly be viewed as a tensor in $V = \bigotimes_{j=1}^{d} K^{I_j}$. In Section 16.1 we compare the standard strategy of local refinement with the tensor approach involving regular grids. It turns out that the tensor approach can be more efficient. In Section 16.2 the solution of boundary-value problems is discussed. A related problem is the eigenvalue problem discussed in Section 16.3. We concentrate on elliptic boundary-value problems of second order. However, elliptic boundary-value problems of higher order or parabolic problems lead to similar results.
16.1 General Discretisation Strategy

The discretisation of partial differential equations leads to matrices whose size grows with increasing accuracy requirement. In general, simple discretisation techniques (Galerkin method or finite difference methods) using uniform grids yield matrices that are too large. Instead, adaptive discretisation techniques are used. Their aim is to use as few unknowns as possible in order to ensure a certain accuracy.

The first type of methods is characterised by the relation
\[ \varepsilon = O(n^{-\kappa/d}), \tag{16.1} \]
where $\varepsilon$ is the accuracy of the approximation (in some norm), $n$ the number of degrees of freedom, $\kappa$ the consistency order, and $d$ the spatial dimension. The corresponding methods are Galerkin discretisations with polynomial ansatz functions of fixed degree. The relation $\varepsilon = O(n^{-\kappa/d})$ is not always reached.¹ Only if the solution behaves uniformly regular in its domain of definition, the uniform grid also yields $\varepsilon = O(n^{-\kappa/d})$.

¹ For three-dimensional problems, edge singularities require stretched tetrahedra. However, usual adaptive refinement strategies try to ensure shape regularity (cf. [141, §8.5.2.1]).

© Springer Nature Switzerland AG 2019 W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics 56, https://doi.org/10.1007/978-3-030-35554-8_16
However, in the standard case, the solution of partial differential equations has point singularities and, for 3D problems, edge singularities. This requires a concentration of grid points towards corners or edges.

A more efficient method is the hp finite-element method in the case of piecewise analytic solutions. Here the ideal relation between the error $\varepsilon$ and the number $n$ of unknowns is $\varepsilon = O(\exp(-\beta n^\alpha))$ for suitable $\alpha, \beta > 0$.

So far, for fixed $\varepsilon$, the strategy has been to minimise the problem size $n$. In principle, this requires that the generation of the system matrix and its solution be also $O(n)$. This requirement can be relaxed for the hp finite-element method. If $n = O(\log^{1/\alpha} \frac{1}{\varepsilon})$, even Gauss elimination with cost $O(\log^{3/\alpha} \frac{1}{\varepsilon})$ is only a redefinition of $\alpha$ by $\alpha/3$.

Tensor applications require a Cartesian grid $I := I_1 \times \ldots \times I_d$ of unknowns. This does not mean that the underlying domain $\Omega \subset \mathbb{R}^d$ must be of product form. It is sufficient that $\Omega$ is the image of a domain $\Omega_1 \times \ldots \times \Omega_d$. For instance, $\Omega$ may be a circle, which is the image of the polar coordinates varying in $\Omega_1 \times \Omega_2$.

The use of a (uniform²) grid $I = I_1 \times \ldots \times I_d$ seems to contradict the strategies from above. However, again the leading concept is: best accuracy for minimal cost, where the accuracy³ is fixed by the grid $I$. For simplicity, we assume $n_j = \#I_j = n = O(\varepsilon^{-\beta})$. The storage cost of the tensor formats is $O(r^* n d)$ ($r^*$ indicates possible powers of some rank parameters). The approximation of the inverse by the technique in §9.8.2.6 costs $O(\log^2(\frac{1}{\varepsilon}) \cdot r^* n d)$. Comparing the storage and arithmetic cost with the accuracy, we see a relation as in (16.1), but the exponent $\kappa/d$ is replaced with some $\beta > 0$ independent of $d$.

A second step is the tensorisation in §14. As described in §14.2.3.3, the complexity reached by tensorisation corresponds (at least) to the hp finite-element approach. Therefore, in the end, the cost should be not worse than the best hp method, but independent of $d$.
16.2 Solution of Elliptic Boundary-Value Problems

We consider a linear boundary-value problem
\[ Lu = f \ \text{ in } \Omega = \Omega_1 \times \ldots \times \Omega_d \subset \mathbb{R}^d, \qquad u = 0 \ \text{ on } \partial\Omega \tag{16.2} \]
with a linear differential operator of elliptic type (cf. [141, §5.1.1]). For the product form, see the discussion from above. The homogeneous Dirichlet condition $u = 0$ on $\partial\Omega$ may be replaced with other conditions such as the Neumann condition.

² The grids $I_j$ need not be uniform, but for simplicity this is assumed.

³ In the case of a point singularity $r^\alpha$ ($\alpha > 0$, $r = \|x - x_0\|$, $x_0$: corner point), the grid size $h$ leading to an accuracy $\varepsilon$ is of the form $h = O(\varepsilon^\beta)$, $\beta > 0$. A uniform grid $I_j$ needs $n_j = O(\varepsilon^{-\beta})$ grid points.
The standard dimension $d = 3$ is already of interest for tensor methods. The other extreme are dimensions of the order $d = 1000$. In the following we exploit that the solution of (16.2) admits a tensor-sparsity (cf. Dahmen–DeVore–Grasedyck–Süli [68]).
16.2.1 Separable Differential Operator

The most convenient form of $L$ is the separable one (cf. (1.9a); Definition 9.65):
\[ L = \sum_{j=1}^{d} L_j , \qquad L_j \text{ differential operator in } x_j , \tag{16.3a} \]
i.e., $L_j$ contains derivatives with respect to $x_j$ only and its coefficients depend only on $x_j$ (in particular, $L_j$ may have constant coefficients). In this case we can write the differential operator as Kronecker product:
\[ L = \sum_{j=1}^{d} I \otimes \ldots \otimes I \otimes L_j \otimes I \otimes \ldots \otimes I , \tag{16.3b} \]
where $L_j$ is considered as one-dimensional differential operator acting on a suitable space $V_j$.
16.2.2 Discretisation

16.2.2.1 Finite Difference Method

Choose a uniform⁴ grid with $n_j$ (interior) grid points in direction $j$. The one-dimensional differential operator $L_j$ in (16.3b) can be approximated by a difference operator $\Lambda_j$ (see [141, §4.1] for details). The standard finite differences lead to tridiagonal matrices $\Lambda_j$. Higher-order differences may produce more off-diagonals. The resulting system matrix of the difference method takes the form
\[ A = \sum_{j=1}^{d} I \otimes \ldots \otimes I \otimes \Lambda_j \otimes I \otimes \ldots \otimes I , \tag{16.4} \]
provided that (16.3b) holds.

⁴ Also non-uniform grids may be used. In this case, the difference formulae are Newton's first and second difference quotients (cf. [141, §4.1]).
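The structure (16.4) implies that $A$ never has to be assembled when it is applied to an elementary tensor: $A(x_1 \otimes x_2) = (\Lambda_1 x_1) \otimes x_2 + x_1 \otimes (\Lambda_2 x_2)$ for $d = 2$. The Python sketch below checks this identity for small hypothetical matrices $\Lambda_j = \operatorname{tridiag}(-1, 2, -1)$ (the scaling by the grid size is omitted); all function names are illustrative only.

```python
def tridiag(n, lower, diag, upper):
    """n x n tridiagonal matrix as a nested list."""
    return [[diag if i == j else lower if i == j + 1 else upper if j == i + 1 else 0
             for j in range(n)] for i in range(n)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product of two matrices."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def kron_vec(x, y):
    """Kronecker product of two vectors (elementary tensor of order 2)."""
    return [xi * yj for xi in x for yj in y]


def add(A, B):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(A, B)]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


n1, n2 = 3, 4
L1, L2 = tridiag(n1, -1, 2, -1), tridiag(n2, -1, 2, -1)
A = add(kron(L1, identity(n2)), kron(identity(n1), L2))     # (16.4) for d = 2

x1, x2 = [1, 2, 3], [1, 0, -1, 2]
full = matvec(A, kron_vec(x1, x2))                           # uses the n1*n2 x n1*n2 matrix
structured = [p + q for p, q in zip(kron_vec(matvec(L1, x1), x2),
                                    kron_vec(x1, matvec(L2, x2)))]
```

The structured evaluation only uses the small matrices $\Lambda_j$, which is the reason why the representation rank of $Ax$ grows at most by the factor $d$.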
16.2.2.2 Finite-Element Method

The variational formulation of (16.2) is given by a bilinear (if $K = \mathbb{R}$) or sesquilinear (if $K = \mathbb{C}$) form $a(\cdot,\cdot)$ and the functional $f(v) = \int_\Omega f v \, dx$:
\[ \text{find } u \in H_0^1(\Omega) \text{ such that } a(u, v) = f(v) \text{ for all } v \in H_0^1(\Omega). \tag{16.5} \]
According to the splitting (16.3a), the form $a(\cdot,\cdot)$ is a sum of products:
\[ a(\cdot,\cdot) = \sum_{j=1}^{d} a_j(\cdot,\cdot) \prod_{k \ne j} (\cdot,\cdot)_k \tag{16.6} \]
with $a_j : H_0^1(\Omega_j) \times H_0^1(\Omega_j) \to K$ and $(\cdot,\cdot)_k$ the $L^2(\Omega_k)$ scalar product.

The (possibly non-uniform) intervals $[x_\nu^{(j)}, x_{\nu+1}^{(j)}]$, $0 \le \nu \le n_j$, of the one-dimensional grids in directions $1 \le j \le d$ form the cuboids $\tau_\nu := \times_{j=1}^{d} [x_{\nu_j}^{(j)}, x_{\nu_j+1}^{(j)}]$ for multi-indices $\nu \in \times_{j=1}^{d} \{0, \ldots, n_j\}$. Let $b_\nu^{(j)} \in H_0^1(\Omega_j)$ for $1 \le \nu \le n_j$ be the standard, one-dimensional, piecewise linear hat function: $b_\nu^{(j)}(x_\mu^{(j)}) = \delta_{\nu\mu}$. They span the subspace $V_j \subset H_0^1(\Omega_j)$. The final finite-element basis functions are
\[ b_\nu := \bigotimes_{j=1}^{d} b_{\nu_j}^{(j)} \in H_0^1(\Omega) \quad\text{for } \nu \in I := \times_{j=1}^{d} I_j , \qquad I_j := \{1, \ldots, n_j\} . \]
Their span is the space $V := \bigotimes_{j=1}^{d} V_j \subset H_0^1(\Omega)$. The finite-element solution $u \in V$ is defined by
\[ a(u, v) = f(v) \quad\text{for all } v \in V . \tag{16.7} \]
The solution has a representation $u = \sum_\mu x_\mu b_\mu$, where the coefficient vector $x = (x_\mu)$ is the solution of the linear system $Ax = \phi$. Here the right-hand side $\phi$ has the entries $\phi_\nu = f(b_\nu)$. The finite-element system matrix $A$ is defined by $A_{\nu\mu} := a(b_\mu, b_\nu)$. From (16.6) we derive that
\[ A = \sum_{j=1}^{d} \Big( \bigotimes_{k=1}^{j-1} M_k \Big) \otimes A_j \otimes \bigotimes_{k=j+1}^{d} M_k , \quad\text{where} \tag{16.8} \]
\[ A_j[\nu, \mu] := a_j(b_\mu^{(j)}, b_\nu^{(j)}) \quad\text{and}\quad M_j[\nu, \mu] := (b_\mu^{(j)}, b_\nu^{(j)})_j . \]
Note that the mass matrix $M_k$ replaces the identity matrix in (16.4).

Remark 16.1. Let $M := \bigotimes_{j=1}^{d} M_j$, and define $\Lambda := M^{-1} A$. Then $\Lambda$ takes the form (16.4) with $\Lambda_j := M_j^{-1} A_j$.
16.2.2.3 Treatment of Nonseparable Differential Operators

The assumption of a separable $L$ excludes not only mixed derivatives $\frac{\partial^2}{\partial x_i \partial x_j}$ but also coefficients depending on $x$-components other than $x_j$. As an example, we consider the first order term
\[ L_{\mathrm{first}} := c\nabla = \sum_{j=1}^{d} c_j(x) \frac{\partial}{\partial x_j} \]
appearing in $L$ with coefficients $c_j(x_1, \ldots, x_d)$ and discuss the definition of a tensor-based finite difference scheme.

The forward difference $\partial^+$ (defined by $(\partial^+ \varphi)(\xi) = [\varphi(\xi + h) - \varphi(\xi)]/h$) or the backward difference $\partial^-$ can be represented exactly in the format $H_\rho^{\mathrm{tens}}$ with ranks $\rho_k = 2$ (cf. §14.1.6). The central difference requires $\rho_k = 3$. Next, we apply the technique of §15.4 to construct a tensor $\mathbf{c}_j \in V$ approximating the $d$-variate function $c_j$. According to Remark 13.10, we may define the multiplication operator $C_j \in L(V, V)$. Hence the discretisation of $L_{\mathrm{first}} = c\nabla$ is given by
\[ \Lambda_{\mathrm{first}} := \sum_{j=1}^{d} C_j \, \partial_j^+ . \]
It is usually not necessary to determine the operator $\Lambda_{\mathrm{first}}$ explicitly, but in principle this can be done (cf. §13.8). Analogously, other parts of $L$ can be treated.
16.2.3 Solution of the Linear System

In the following, we discuss the use of iterative schemes (details in Hackbusch [140, §2]). An alternative approach is mentioned in §17.2.1.

Let $Ax = b$ be the linear system with $x, b \in V$. The basic form of a linear iteration is
\[ x^{(m+1)} := x^{(m)} - C \big( A x^{(m)} - b \big) \tag{16.9} \]
with some matrix $C$ and any starting value $x^{(0)}$. Then convergence $x^{(m)} \to x = A^{-1} b$ holds if and only if the spectral radius (4.86) of the iteration matrix $I - CA$ satisfies $\rho(I - CA) < 1$. For the efficient solution we need $\rho(I - CA) \le \eta < 1$ with $\eta$ independent of the grid size and of possible parameters appearing in the problem.

The (slow) convergence of inefficient methods is often directly connected to the condition of the matrix $A$. Assume for simplicity that $A \in L(V, V)$ for $V = \bigotimes_{j=1}^{d} K^{n_j}$ is of the form (16.4) with positive-definite matrices $\Lambda_j$ possessing eigenvalues $\lambda_1^{(j)} \ge \ldots \ge \lambda_{n_j}^{(j)} > 0$. Then the condition of $A$ is equal to
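\[ \operatorname{cond}(A) = \Big( \sum_{j=1}^{d} \lambda_1^{(j)} \Big) \Big/ \Big( \sum_{j=1}^{d} \lambda_{n_j}^{(j)} \Big) . \]
In the simplest case of $\Lambda_1 = \Lambda_2 = \ldots = \Lambda_d$, the eigenvalues $\lambda_\nu^{(j)} = \lambda_\nu$ are independent of $j$ and $\operatorname{cond}(A) = \lambda_1 / \lambda_n$ holds. This together with Exercise 4.62 proves the next remark.

Remark 16.2. The condition of the matrix $A$ depends on the numbers $n_j = \#I_j$, but not on the dimension $d$. In particular, the following inequalities hold:
\[ \min_j \operatorname{cond}(\Lambda_j) \le \operatorname{cond}(A) \le \max_j \operatorname{cond}(\Lambda_j) . \]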
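Remark 16.2 can be illustrated numerically. The sketch below assumes the model matrices $\Lambda_j = \operatorname{tridiag}(-1, 2, -1)$ of size $n_j$, whose eigenvalues $4 \sin^2\!\big(\tfrac{k\pi}{2(n_j+1)}\big)$, $1 \le k \le n_j$, are known in closed form; this choice and the function names are assumptions made for the example.

```python
import math


def eigenvalues_1d(n):
    """Eigenvalues of the n x n matrix tridiag(-1, 2, -1)."""
    return [4 * math.sin(k * math.pi / (2 * (n + 1))) ** 2 for k in range(1, n + 1)]


def cond(eigs):
    """Condition of a positive-definite matrix with the given eigenvalues."""
    return max(eigs) / min(eigs)


def cond_kronecker_sum(eigs_per_direction):
    """cond(A) for A of the form (16.4): the extreme eigenvalues of A are the
    sums of the extreme eigenvalues of the factors."""
    return (sum(max(e) for e in eigs_per_direction)
            / sum(min(e) for e in eigs_per_direction))


if __name__ == "__main__":
    eigs = [eigenvalues_1d(n) for n in (5, 9, 17)]
    print([cond(e) for e in eigs], cond_kronecker_sum(eigs))
```

For identical factors the value does not change when further directions are added, which is the independence of $d$ stated in the remark.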
If A and C = B−1 are positive definite, C is a suitable choice5 if A and B are spectrally equivalent, i.e., c11 (Ax, x) ≤ (Bx, x) ≤ c2 (Ax, x) for all x ∈ V and constants c1 , c2 < ∞. In the case of (not singularly degenerate) elliptic boundary-value problems, (A·, ·) corresponds to the H 1 norm. As a consequence, different system matrices A and B corresponding to H 1 coercive elliptic problems are spectrally equivalent. In particular, there are elliptic problems with separable B (e.g., the Laplace equation −∆u = f ). Given a positive-definite and separable differential operator, its discretisation B satisfies the conditions of Proposition 9.63 (therein, B is called A, while C = B−1 is called B). The matrices M (j) in Proposition 9.63 are either the identity (finite difference case) or the mass matrices (finite-element case). As a result, a very accurate approximation of C = B−1 can be represented in the format Rr (transfer into other formats is easy). We remark that the solution requires the matrix exponentials exp(−αT ) or exp(−αM −1 T ) (T : n × n triangular matrix). In the case of exp(−αT ), this can be performed exactly by a diagonalisation of T. In general, the technique of hierarchical matrices yields exp(−αM −1 T ) in almost linear cost O(n log∗ n) (cf. Hackbusch [138, §14.2.2]). Once, a so-called preconditioner C is found, we have to apply either iteration (16.9) or an accelerated version using conjugate gradients or GMRES. For simplicity, we assume ρ(CA) ≤ η < 1 and apply (16.9). The representation ranks of x(m) are increased first by the evaluation of the defect Ax(m) − b and second by multiplication by C. Therefore we have to apply the truncated iteration in §13.10: h i x(m+1) := T x(m) − C Ax(m) − b , or h i x(m+1) := T x(m) − C T Ax(m) − b , where T denotes a suitable truncation. See also Khoromskij [184]. If a very efficient C is required (i.e., 0 < η 1), C must be close to A−1 . 
The fixed-point iterations explained in §13.10 can be used to produce $C \approx A^{-1}$.
⁵ To be precise, C must be suitably scaled to obtain ρ(CA) ≤ η < 1.
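The truncated iteration above can be illustrated by a minimal NumPy sketch for d = 2, where tensors are matrices and the truncation T is the rank-r truncated SVD. Everything specific here is an assumption made for illustration and is not taken from the text: the model operator Ax = T1·X + X·T1 with a tridiagonal T1, the simple scaled-identity choice C = ωI, and the names `truncate` and `truncated_iteration`.

```python
import numpy as np

def truncate(X, r):
    """Truncation T: best rank-r approximation of the matrix X via the SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def truncated_iteration(T1, Bmat, omega, r, steps):
    """x <- T(x - C(Ax - b)) with Ax = T1 X + X T1 and C = omega * I (assumed)."""
    X = np.zeros_like(Bmat)
    for _ in range(steps):
        defect = T1 @ X + X @ T1 - Bmat       # defect Ax - b
        X = truncate(X - omega * defect, r)   # update followed by truncation
    return X

n = 8
T1 = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D model stencil
f = np.arange(1, n + 1) / n
Bmat = np.outer(f, f)                                   # rank-1 right-hand side
lam, V = np.linalg.eigh(T1)
omega = 1.0 / (lam[0] + lam[-1])   # scaling that gives rho(I - C A) < 1 here
X = truncated_iteration(T1, Bmat, omega, r=5, steps=400)

# Reference solution obtained by diagonalising T1 (only feasible for small n)
X_ref = V @ ((V.T @ Bmat @ V) / (lam[:, None] + lam[None, :])) @ V.T
rel_err = np.linalg.norm(X - X_ref) / np.linalg.norm(X_ref)
```

With such a weak preconditioner the contraction number is close to one, so many steps are needed; a preconditioner C closer to $A^{-1}$ would reduce the number of iterations while the truncation keeps the rank of every iterate bounded by r.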
Another approach is proposed in Ballani–Grasedyck [16], where a projection method onto a subspace is used, which is created in a Krylov-like manner. A well-known efficient iterative method is the multi-grid iteration (cf. [131]). For its implementation we need a sequence of grids with decreasing grid size, prolongations and restrictions, and a so-called smoother. Since we consider uniform grids, the construction of a sequence of grids with grid width $h_\ell = 2^{-\ell}h_0$ (ℓ: level of the grid) is easy. The prolongations and the restrictions are elementary Kronecker tensors. For the solution on the coarsest grid (level ℓ = 0) one of the aforementioned methods can be applied.⁶ As smoothing iteration we may choose the damped Jacobi iteration. The numerical examples in Ballani–Grasedyck [16] confirm a grid-independent convergence rate. For more details compare Hackbusch [139]. An alternative to the iteration (16.9) is the direct minimisation approach in §17.2.1.
16.2.4 Accuracy Controlled Solution
The cost of usual iterative solvers of the system Ax = b does not depend on the vector b. This is different for the tensor case since the memory size of b depends on the rank involved in the format used for its representation. On the other hand, the first iterates $x^{(m)}$ of (16.9) are less accurate anyway. This leads to the following saving of operations. Partition b into $\Delta b_0 + \Delta b_1 + \Delta b_2 + \ldots$, where $\Delta b_k$ has a sufficiently small rank, so that the rank of
\[ b_k := \sum_{0\le\nu\le k} \Delta b_\nu \]
is proportional to k. The computation of $x^{(1)}$ can be performed with the right-hand side $\Delta b_0$ instead of b. The next iterate $x^{(2)}$ uses $\Delta b_1$, etc. Denote the results of the modified iteration by $\hat x^{(m)}$. Balancing the errors $x^{(m)} - A^{-1}b$ and $\sum_{\nu>m}\Delta b_\nu$, one obtains almost the same accuracy of $\hat x^{(m)}$ as for $x^{(m)}$, while the cost is lower for $\hat x^{(m)}$ because of the smaller ranks. Note that this approach also applies to a right-hand side of infinite rank.
So far, a fixed discretisation is given. Instead, one may also try to choose the discretisation and the tensor ranks in such a way that the final error (discretisation error plus tensor representation error) is below a prescribed bound. For such approaches compare Bachmayr–Dahmen [11, 12].
⁶ The usual reasoning is that the coarsest grid size corresponds to a low dimension of the linear system so that any standard method can be used for its solution. This is not necessarily true in the tensor case. If the coarsest grid consists of only $n_0$ grid points per direction, the size $n_0^d$ of the system may be large. The choice $n_0 = 1$ would be optimal.
16.3 Solution of Elliptic Eigenvalue Problems
As already stated in the introduction (see page 12), an eigenvalue problem
\[ Lu = \lambda u \ \text{ in } \Omega = \Omega_1\times\ldots\times\Omega_d, \qquad u = 0 \ \text{ on } \partial\Omega, \tag{16.10} \]
for a separable differential operator L (cf. (16.3a)) is trivial since the eigenvectors are elementary tensors: $u \in \mathcal{R}_1$. The determination of u can be completely reduced to one-dimensional eigenvalue problems
\[ L_j u^{(j)} = \mu u^{(j)}, \qquad u^{(j)} \in V_j\setminus\{0\}. \]
In the following, we consider a linear, symmetric eigenvalue problem (16.10) discretised by⁷
\[ Ax = \lambda x, \tag{16.11a} \]
involving a symmetric matrix A. Regarding $x \in \mathbb{K}^N$ as tensor $\mathbf{x} \in \mathbf{V}$, we interpret A as a Kronecker product $\mathbf{A} \in L(\mathbf{V}, \mathbf{V})$. In general, due to truncations, $\mathbf{A}$ and $\mathbf{x}$ will be only approximations of the true problem
\[ \mathbf{A}\mathbf{x} = \lambda\mathbf{x}. \tag{16.11b} \]
According to Lemma 13.11, we can ensure Hermitian symmetry $\mathbf{A} = \mathbf{A}^{\mathsf H}$ exactly.
16.3.1 Regularity of Eigensolutions Since separable differential operators lead to rank-1 eigenvectors, we may hope that in the general case, the eigenvector is well approximated in one of the formats. This property can be proved, e.g., under the assumption that the coefficients of L are analytic. The details can be found in Hackbusch–Khoromskij–Sauter–Tyrtyshnikov [148]. Besides the usual ellipticity conditions, all coefficients appearing in L are assumed to fulfil
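\[ \|\nabla^p c\|_{L^\infty(\Omega)} := \Bigg( \sum_{\nu\in\mathbb{N}_0^d \text{ with } |\nu|=p} \frac{p!}{\nu!}\, \Big\|\frac{\partial^\nu c}{\partial x^\nu}\Big\|_{L^\infty(\Omega)}^2 \Bigg)^{1/2} \le C_c\,\gamma^p\, p! \tag{16.12a} \]
for all $p \in \mathbb{N}_0$ and some $C_c, \gamma > 0$. Then for analytic Ω or for $\Omega = \mathbb{R}^d$, the eigensolutions u corresponding to an eigenvalue λ satisfy
\[ \|\nabla^{p+2} u\|_{L^2(\Omega)} := \sqrt{ \sum_{\nu\in\mathbb{N}_0^d \text{ with } |\nu|=p+2} \frac{(p+2)!}{\nu!}\, \Big\|\frac{\partial^\nu u}{\partial x^\nu}\Big\|_{L^2(\Omega)}^2 } \le C K^{p+2} \max\big\{p, \sqrt{|\lambda|}\big\}^{p+2} \quad\text{for all } p \in \mathbb{N}_0 \tag{16.12b} \]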
⁷ A Galerkin discretisation leads to a generalised eigenvalue problem Ax = λMx with the mass matrix M.
with C and K depending only on the constants in (16.12a) and on Ω (cf. [148, Theorem 5.5]). Because of these smoothness results, we obtain error bounds for polynomial interpolants. As shown in [148, Theorem 5.8], this implies that a polynomial $u_{\mathbf r} \in \mathcal{P}_{\mathbf r} \subset \mathcal{T}_{\mathbf r}$ exists with $\mathbf r = (r, \ldots, r)$ and $\|u - u_{\mathbf r}\|_{H^1} \le C M r \log^d(r)\,\rho^{-r}$, where d(d+1)/2e p Cˆd C C˜ p , M := √ d K(p + |λ|) ρ := 1 + 2π 1 + |λ| with $\hat C_d, \tilde C_d > 0$ depending only on C, K in (16.12b) (cf. [148, Theorem 5.8]). The latter approximation carries over to the finite-element solution (cf. [148, Theorem 5.12]). This proves that, under the assumptions made above, the representation rank r depends logarithmically on the required accuracy. However, numerical tests with nonsmooth coefficients show that even then good tensor approximations can be obtained (cf. [148, §6.2]).
The most challenging eigenvalue problem is the Schrödinger equation (13.32). This is a linear eigenvalue problem, but the requirement of an antisymmetric eigenfunction (Pauli principle) is not easily compatible with the tensor formats (see, e.g., Mohlenkamp et al. [32, 227, 228]). The alternative density functional theory (DFT) approach, a nonlinear eigenvalue problem, and its treatment are already explained in §13.11. Again the question arises as to whether the solution can be well approximated within one of the tensor formats. In fact, the classical approximation in quantum chemistry uses Gaussians⁸
\[ \exp\{-\alpha_\nu \|\bullet - x_\nu\|^2\} \qquad (\alpha_\nu > 0,\ x_\nu \in \mathbb{R}^3 \text{ position of nuclei}) \]
multiplied by suitable polynomials. Since the Gaussian function times a monomial is an elementary tensor in $\otimes^3 C(\mathbb{R})$, all classical approximations belong to format $\mathcal{R}_r$, more specifically, to the subset of $\mathcal{R}_r$ spanned by r Gaussians modulated by a monomial. This implies that methods from tensor calculus can yield results which are satisfactory for quantum chemistry purposes. Nevertheless, the number r of terms is large and increases with molecule size. For instance, for C₂H₅OH the number r is about 7000 (cf. [106]). Although Gaussian functions are suited for approximation, they are not the optimal choice. Alternatively, we can choose a sufficiently fine grid in some box $[-A, A]^3$ and search for approximations in $\mathcal{R}_r \subset V = \otimes^3 \mathbb{R}^n$ corresponding to a grid width 2A/n (see concept in [103]). Such a test is performed in Chinnamsetty et al. [57, 56] for the electron density n(y) = ρ(y, y) (see page 504) and shows a reduction of the representation rank by a large factor. A similar comparison can be found in Flad et al. [106]. See also Chinnamsetty et al. [58]. Theoretical considerations about the approximability of the wave function are the subject of Flad–Hackbusch–Schneider [104, 105].
⁸ There are two reasons for this choice. First, Gaussians approximate the solution quite well (cf. Kutzelnigg [205] and Braess [42]). Second, operations as mentioned in §13.11 can be performed analytically. The historical paper is Boys [39] from 1950.
16.3.2 Iterative Computation
Here we follow the approach of Kressner–Tobler [199], which is based on the algorithm⁹ of Knyazev [193] for computing the smallest eigenvalue of an eigenvalue problem (16.11a) with positive-definite matrix A and suitable preconditioner B:

    1  procedure LOBPCG(A, B, x, λ);                                        (16.13)
    2  {input: A, B, x with ‖x‖ = 1; output: eigenpair x, λ}
    3  begin λ := ⟨Ax, x⟩; p := 0;
    4    repeat r := B⁻¹(Ax − λx); U := [x, r, p] ∈ ℂ^(N×3);
    5      Â := Uᴴ A U; M̂ := Uᴴ U;
    6      determine eigenpair y ∈ ℂ³, λ of Â y = λ M̂ y with smallest λ;
    7      p := y₂ · r + y₃ · p; x := y₁ · x + p; x := x/‖x‖
    8    until suitable stopping criterion satisfied
    9  end;

‖x‖² = ⟨x, x⟩ is the squared Euclidean norm. The input of B has to be understood as a (preconditioning) method performing ξ ↦ B⁻¹ξ for ξ ∈ ℂ^N. The input value x ∈ ℂ^N is the starting value. The desired eigenpair (x, λ) of (16.11a) with minimal λ is the output of the procedure. In line 5, Â and M̂ are positive-semidefinite 3 × 3 matrices; hence, computing the eigenpair in line 6 is very cheap.
Next, we consider the tensor formulation (16.11b) of the eigenvalue problem. Then procedure (16.13) becomes

    1   procedure T-LOBPCG(A, B, x, λ);                                      (16.14)
    2   {input: A, B ∈ L(V, V), x ∈ V with ‖x‖ = 1; output: eigenpair x, λ}
    3   begin λ := ⟨Ax, x⟩; p := 0 ∈ V;
    4     repeat r := T(B⁻¹(Ax − λx)); u₁ := x; u₂ := r; u₃ := p;
    5a      for i := 1 to 3 do for j := 1 to i do
    5b      begin Â_ij := Â_ji := ⟨Au_j, u_i⟩; M̂_ij := M̂_ji := ⟨u_j, u_i⟩ end;
    6       determine eigenpair y ∈ ℂ³, λ of Â y = λ M̂ y with smallest λ;
    7       p := T(y₂ · r + y₃ · p); x := T(y₁ · x + p); x := x/‖x‖
    8     until suitable stopping criterion satisfied
    9   end;

This procedure differs from (16.13) in lines 4 and 7, in which a truncation T to a suitable format is performed. The required tensor operations are (i) the matrix-vector multiplication $Au_j$ for 1 ≤ j ≤ 3 (note that $u_1 = x$), (ii) the scalar product in lines 3 and 5b, (iii) additions and scalar multiplications in lines 4 and 7, and (iv) the performance of $B^{-1}$ in line 4. Here we can apply the techniques in §16.2.3.
⁹ LOBPCG means ‘locally optimal block preconditioned conjugate gradient’. For simplicity, we consider only one eigenpair, i.e., the block is of size 1 × 1.
As pointed out in [199], the scalar products $\langle Au_j, u_i\rangle$ are to be computed exactly; i.e., no truncation is applied to $Au_j$. Algorithm (16.14) can be combined with any of the tensor formats. The numerical examples in [199] are based on the hierarchical format. Ballani–Grasedyck [16] compute the minimal (or other) eigenvalues by the (shifted) inverse iteration. Here the arising linear problems are solved by the multigrid iteration described in §16.2.3. Numerical examples can be found in [16].
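The untruncated procedure (16.13) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: A is taken real and symmetric positive definite, the stopping criterion (norm of the eigenvalue residual) is a choice made here, and the function name `lobpcg_smallest` is hypothetical. In place of the 3 × 3 generalised eigenproblem, the sketch orthonormalises [x, r, p] by a QR factorisation and solves a standard 3 × 3 eigenproblem; the new direction p is formed from the old and new iterate, which spans the same plane as y₂·r + y₃·p together with the new x.

```python
import numpy as np

def lobpcg_smallest(A, apply_Binv, x, tol=1e-8, max_iter=1000):
    """Sketch of procedure (16.13) for a real symmetric positive-definite A."""
    x = x / np.linalg.norm(x)
    lam = x @ (A @ x)
    p = None                                   # corresponds to p := 0
    for _ in range(max_iter):
        res = A @ x - lam * x
        if np.linalg.norm(res) < tol:          # assumed stopping criterion
            break
        r = apply_Binv(res)                    # r := B^{-1}(Ax - lambda x)
        U = np.column_stack([x, r] if p is None else [x, r, p])
        Q, _ = np.linalg.qr(U)                 # orthonormal basis of span(U)
        theta, Y = np.linalg.eigh(Q.T @ A @ Q) # Rayleigh-Ritz on span(U)
        x_new = Q @ Y[:, 0]                    # Ritz vector, smallest Ritz value
        p = x_new - (x @ x_new) * x            # part of x_new orthogonal to old x
        x = x_new / np.linalg.norm(x_new)
        lam = x @ (A @ x)
    return x, lam

# Usage with an idealised preconditioner B = A (an assumption for the demo)
n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x0 = np.random.default_rng(0).standard_normal(n)
x, lam = lobpcg_smallest(A, lambda v: np.linalg.solve(A, v), x0)
```

In the tensor variant (16.14), the products with A, the application of $B^{-1}$, and the linear combinations would be followed by truncations, while the 3 × 3 Rayleigh–Ritz step stays unchanged.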
16.3.3 Alternative Approaches
In the case of a positive-definite matrix A, the minimal eigenvalue of Ax = λx is the minimum of the Rayleigh quotient:
\[ \min\Big\{ \frac{\langle Ax, x\rangle}{\langle x, x\rangle} : x \ne 0 \Big\}. \]
This allows us to use the minimisation methods in §17.2. This approach is also discussed in [199, §4]. Another approach is proposed by [113, §4], which recovers the spectrum from the time-dependent solution x(t) of $\dot x(t) = \mathrm{i}Ax(t)$.
16.4 On Other Types of PDEs
So far, we have only discussed partial differential operators of elliptic type. Nevertheless, the tensor calculus can also be applied to partial differential equations of other types. Section 17.3 is concerned with time-dependent problems. In the case of $\dot v(t) = Av(t) + f(t)$ with an elliptic operator A, we obtain a parabolic partial differential equation. On the other hand, the wave equation is a prototype of a hyperbolic differential equation. For its solution by the retarded potential, tensor methods are used in Khoromskij–Sauter–Veit [190].
Chapter 17
Miscellaneous Topics
Abstract In this chapter we mention further techniques which are of interest for tensor calculations. The first two sections consider optimisation problems. Section 17.1 describes iterative minimisation methods on a theoretical level of topological tensor spaces assuming exact tensor arithmetic. On the other hand, Section 17.2 applies optimisation directly to the parameters of the respective tensor representation. Section 17.3 is devoted to ordinary differential equations for tensor-valued functions. Here, the tangent space and the Dirac–Frenkel discretisation are explained. Finally, Section 17.4 recalls the ANOVA decomposition (‘analysis of variance’).
17.1 Minimisation Problems on V
In this section we follow the article of Falcó–Nouy [99]. We are looking for a minimiser u ∈ V satisfying
\[ J(u) = \min_{v\in V} J(v) \tag{17.1} \]
under certain conditions on V and J. For its solution a class of iterations producing a sequence $(u_m)_{m\in\mathbb{N}}$ is described and convergence $u_m \to u$ is proved.
17.1.1 Algorithm
The usual approach is to replace the global minimisation with a finite or infinite sequence of simpler minimisations. For this purpose, we define a set S for which we shall give examples below. Then the simplest version of the algorithm (called the ‘purely progressive PGD’ in [99], where PGD means ‘proper generalised decomposition’) is the following iteration starting from some $u_0 := 0$:
\[ u_m := u_{m-1} + z_m, \quad\text{where } J(u_{m-1} + z_m) = \min_{z\in S} J(u_{m-1} + z). \tag{17.2} \]
© Springer Nature Switzerland AG 2019 W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics 56, https://doi.org/10.1007/978-3-030-35554-8_17
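The set S must be rich enough. The precise conditions are:
\[ 0 \in S \subset V, \qquad S = \lambda S := \{\lambda s : s \in S\} \text{ for all } \lambda \in \mathbb{K}, \qquad \operatorname{span}(S) \text{ is dense in } V, \qquad S \text{ is weakly closed.} \tag{17.3} \]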
Example 17.1. Sets S satisfying (17.3) are (a) $S = \mathcal{R}_1$ (set of elementary tensors), (b) $S = \mathcal{T}_{\mathbf r}$ with $\mathbf r \ge (1, \ldots, 1)$.
Proof. The first two conditions are obvious. By definition, $\operatorname{span}(\mathcal{R}_1) = V_{\mathrm{alg}}$ is dense in V. As $\mathcal{R}_1 \subset \mathcal{T}_{\mathbf r}$, the same holds for $\mathcal{T}_{\mathbf r}$. By Lemma 8.6, $\mathcal{T}_{\mathbf r}$ and in particular $\mathcal{R}_1 = \mathcal{T}_{(1,\ldots,1)}$ are weakly closed. □
Iteration (17.2) can be improved by certain updates of the correction. Falcó–Nouy [99] propose two variants. In the following, v ↦ U(v) are mappings from V into the set of closed subspaces, which satisfy v ∈ U(v).
Update A. Replace the iteration step (17.2) by
\[ \begin{aligned} &\hat z \in S &&\text{with } J(u_{m-1} + \hat z) = \min_{z\in S} J(u_{m-1} + z),\\ &u_m \in U(u_{m-1} + \hat z) &&\text{with } J(u_m) = \min_{v\in U(u_{m-1}+\hat z)} J(v). \end{aligned} \tag{17.4} \]
While $\min_{z\in S}$, in general, involves a nonconvex set, minimisation over $U(u_{m-1} + \hat z)$ is of simpler kind. Next, $U(u_{m-1} + \hat z)$ is replaced with an affine subspace. For instance, U(v) may be chosen as $U^{\min}(v) := \bigotimes_{j=1}^d U_j^{\min}(v)$.
Update B. Replace the iteration step (17.2) by
\[ \begin{aligned} &\hat z \in S &&\text{with } J(u_{m-1} + \hat z) = \min_{z\in S} J(u_{m-1} + z),\\ &u_m \in u_{m-1} + U(\hat z) &&\text{with } J(u_m) = \min_{v\in U(\hat z)} J(u_{m-1} + v). \end{aligned} \tag{17.5} \]
For examples of $U(\hat z)$, see [99, Example 4]. The choice for the subspaces may be different in any iteration. Steps (17.4) and (17.5) are called the ‘updated progressive PGD’. The final iteration may choose for each index m one of the variants (17.2), (17.4), or (17.5).
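As a small illustration of the purely progressive PGD (17.2), consider the special case d = 2 with $J(v) = \|v - b\|_F^2$ and $S = \mathcal{R}_1$. This functional and the name `progressive_pgd` are assumptions chosen for the sketch, not taken from [99]. In this case each rank-one minimisation is solved exactly by the dominant singular triple of the current residual, so the iterates reproduce the truncated singular-value decomposition of b.

```python
import numpy as np

def progressive_pgd(b, steps):
    """Purely progressive PGD (17.2) for J(v) = ||v - b||_F^2, S = R_1, d = 2."""
    u = np.zeros_like(b, dtype=float)
    history = [np.linalg.norm(u - b) ** 2]          # J(u_0)
    for _ in range(steps):
        # argmin over rank-one z of J(u + z): best rank-one approximation of b - u
        U, s, Vt = np.linalg.svd(b - u, full_matrices=False)
        z = s[0] * np.outer(U[:, 0], Vt[0, :])
        u = u + z                                   # u_m := u_{m-1} + z_m
        history.append(np.linalg.norm(u - b) ** 2)  # J(u_m)
    return u, history

b = np.random.default_rng(1).standard_normal((6, 5))
u3, J_values = progressive_pgd(b, steps=3)
```

For general J and d ≥ 3 the inner minimisation over S has no closed-form solution and is typically treated by an alternating method; the updates A and B would add a further minimisation over a subspace after each step.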
17.1.2 Convergence
Because of the later property (17.7), the iterates $u_m$ are bounded. To find a weakly convergent subsequence, we must assume that
    V is a reflexive Banach tensor space.    (17.6a)
The nonlinear functional J must be sufficiently smooth:
    J is Fréchet differentiable with Fréchet differential $J' : V \to V^*$.    (17.6b)
J must satisfy the following ellipticity condition with constants α > 0 and s > 1:
\[ \langle J'(v) - J'(w), v - w\rangle \ge \alpha\,\|v - w\|^s. \tag{17.6c} \]
Here ⟨φ, v⟩ := φ(v) denotes the dual pairing in $V^* \times V$. Furthermore, one of the following two conditions (17.6d,e) is required:
    J : V → ℝ is weakly sequentially continuous,    (17.6d)
i.e., $J(u_m) \to J(u)$ for sequences $u_m \rightharpoonup u$. Alternatively, $J' : V \to V^*$ may be assumed to be Lipschitz continuous on bounded sets, i.e.,
\[ \|J'(v) - J'(w)\| \le C_A\,\|v - w\| \quad\text{for } v, w \in A \text{ and bounded } A \subset V. \tag{17.6e} \]
As shown in [99, Lemma 3], (17.6b) and (17.6c) imply that J is strictly convex, bounded from below, and satisfies
\[ \lim_{\|v\|\to\infty} J(v) = \infty. \tag{17.7} \]
Under these conditions including (17.3), the minima in (17.2), (17.4), and (17.5) exist. The values $J(u_m)$ decrease weakly:
\[ J(u_m) \le J(u_{m-1}) \quad\text{for } m \ge 1. \]
If equality $J(u_m) = J(u_{m-1})$ occurs, $u := u_{m-1}$ is the solution of the original problem (17.1) (cf. [99, Lemma 8]). The following result is proved in [99, Thm. 4].
Proposition 17.2. (a) Under the conditions (17.6a–d), all variants of the progressive PGD (17.2), (17.4), (17.5) converge to the solution u of (17.1): $u_m \to u$. (b) The same statement holds under conditions (17.6a–c,e) if s ≤ 2 in (17.6c).
17.2 Solution of Optimisation Problems Involving Tensor Formats There are two different strategies in tensor calculus. The first one performs tensor operations (as described in §13) in order to calculate certain tensors (solutions of fixed-point iterations, etc.). In this case truncation procedures are essential for the practical application. The final representation ranks of the solution may be determined adaptively. The second strategy fixes the ranks and tries to optimise the parameters of the representation.
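17.2.1 Formulation of the Problem
Many problems can be written as minimisation problems of the form
\[ \text{find } x \in V \text{ such that } J(x) = \min_{v\in V} J(v). \tag{17.8} \]
Examples are linear systems Ax = b with $x, b \in V := \bigotimes_{j=1}^d \mathbb{K}^{n_j}$ and $A \in M := \bigotimes_{j=1}^d \mathbb{K}^{n_j\times n_j}$. Here J takes the form
J(v) = ⟨Av, v⟩ − 2 0 there are η > 0, $r \in \mathbb{N}_0$, and $x_\varepsilon \in \mathcal{R}_r$ such that $\|x - x_\varepsilon\| \le \eta$ and $J(x_\varepsilon) - J(x) \le \varepsilon$. The optimal $x_r \in \mathcal{R}_r$ satisfies $J(x_r) - J(x) \le J(x_\varepsilon) - J(x) \le \varepsilon$. This proves $J(x_r) \to J(x)$ for the optimal $x_r \in \mathcal{R}_r$. Setting $r := \min \mathbf r$, we derive from $\mathcal{R}_r \subset \mathcal{T}_{\mathbf r}$ that also $J(x_{\mathbf r}) \to J(x)$. Similarly for $x_{\mathfrak r} \in \mathcal{H}_{\mathfrak r}$, since $\mathcal{R}_r \subset \mathcal{H}_{\mathfrak r}$ for $r = \min \mathfrak r$. □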
17.2.2 Reformulation, Derivatives, and Iterative Treatment
The general form of a format description $v = \rho_S(\ldots)$ is considered in §7.1.1. Particular examples are $\rho_{r\text{-term}}(r, (v_\nu^{(j)}))$ for the r-term format (7.7a), $\rho_{\mathrm{TS}}(a, (B_j))$ for the general subspace format in (8.5c), $\rho_{\mathrm{HT}}(T_D, (C_\alpha), c^{(D)}, (B_j))$ for the hierarchical format in (11.24), etc. Discrete parameters as r in $\rho_{r\text{-term}}(r, (v_\nu^{(j)}))$ or $T_D$ in $\rho_{\mathrm{HT}}(T_D, \ldots)$ are fixed. All other parameters are variable. Renaming the latter parameters $p := (p_1, \ldots, p_m)$, we obtain the description $v = \rho_{\mathcal F}(p)$, where p varies in P. The minimisation in (17.9) is equivalent to
\[ \text{find } p \in P \text{ such that } J(\rho_{\mathcal F}(p)) = \min_{q\in P} J(\rho_{\mathcal F}(q)). \tag{17.10} \]
Iterative optimisation methods require at least parts of the derivatives in
\[ \partial J(\rho_{\mathcal F}(p))/\partial p = \frac{\partial J}{\partial v}\,\frac{\partial \rho_{\mathcal F}}{\partial p} \]
or even second order derivatives as the Hessian. Since the mapping $\rho_{\mathcal F}(p)$ is multilinear in p, the format-dependent part $\partial\rho_{\mathcal F}/\partial p$, as well as higher derivatives, are easy to determine.
Since, in general, the representations are non-unique (cf. §7.1.3), the Jacobi matrix $\partial\rho_{\mathcal F}/\partial p$ does not have full rank. For instance, for $\rho_{r\text{-term}}(r, (v_\nu^{(j)}))$ the parameters are $p_1 := v_1^{(1)} \in V_1$, $p_2 := v_1^{(2)} \in V_2$, ... The fact that $\rho_{\mathcal F}(s p_1, \tfrac{1}{s} p_2, p_3, \ldots)$ is independent of $s \in \mathbb{K}\setminus\{0\}$ leads to $\langle \partial\rho_{\mathcal F}/\partial p_1, v_1^{(1)}\rangle = \langle \partial\rho_{\mathcal F}/\partial p_2, v_1^{(2)}\rangle$. Hence $\partial\rho_{\mathcal F}/\partial p$ has a nontrivial kernel. In order to avoid redundancies of the r-term format, we may, e.g., equi-normalise the vectors: $\|v_\nu^{(1)}\| = \|v_\nu^{(2)}\| = \ldots$ The problem of redundant parameters will be discussed in more detail in §17.3.1.
The usual iterative optimisation methods are alternating optimisations (ALS: see §9.6.2). A modification (MALS: modified alternating least squares) which often yields good results is the overlapping ALS, in which optimisation is performed consecutively with respect to $(p_1, p_2)$, $(p_2, p_3)$, $(p_3, p_4)$, ... (cf. variant (δ) in §9.6.2.1). For particular quantum physics applications, this approach is called DMRG (density matrix renormalisation group, cf. [302, 303]). For a detailed discussion, see Holtz–Rohwedder–Schneider [168] and Oseledets [236].
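The alternating optimisation with fixed rank can be sketched for the r-term format and d = 3. The functional $J(v) = \|b - v\|^2$ for a given array b, the random start, and the name `cp_als` are assumptions made for this illustration. Because the representation is multilinear in the parameters, each partial minimisation is a linear least-squares problem; it is solved here via its normal equations, whose matrix is a Hadamard product of Gram matrices.

```python
import numpy as np

def cp_als(b, r, sweeps, seed=0):
    """ALS for min || b - sum_nu v1_nu (x) v2_nu (x) v3_nu || over a 3-way array b."""
    rng = np.random.default_rng(seed)
    V = [rng.standard_normal((n, r)) for n in b.shape]   # factor matrices
    specs = ['ijk,jr,kr->ir', 'ijk,ir,kr->jr', 'ijk,ir,jr->kr']
    errors = []
    for _ in range(sweeps):
        for j in range(3):
            others = [V[k] for k in range(3) if k != j]
            # Gram matrix of the Khatri-Rao product of the two fixed factors
            G = (others[0].T @ others[0]) * (others[1].T @ others[1])
            rhs = np.einsum(specs[j], b, *others)
            # least-squares update of factor j with the other two fixed
            V[j] = np.linalg.lstsq(G, rhs.T, rcond=None)[0].T
        approx = np.einsum('ir,jr,kr->ijk', *V)
        errors.append(np.linalg.norm(b - approx))        # error after each sweep
    return V, errors

rng = np.random.default_rng(2)
factors = [rng.standard_normal((n, 2)) for n in (4, 5, 6)]
b = np.einsum('ir,jr,kr->ijk', *factors)                 # exact rank-2 test tensor
V, errors = cp_als(b, r=2, sweeps=30)
```

Each partial step cannot increase J, so the recorded errors are non-increasing up to rounding; convergence to the global minimiser is not guaranteed. The overlapping variant (MALS) would instead optimise two neighbouring parameter groups at once.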
17.3 Ordinary Differential Equations
We consider initial value problems
\[ \frac{d}{dt} v = \mathbf{F}(t, v) \ \text{ for } t \ge 0, \qquad v(0) = v_0, \tag{17.11} \]
where v = v(t) ∈ V belongs to a tensor space, while $\mathbf{F}(t, \cdot) : V \to V$ is defined for t ≥ 0. $v_0$ is the initial value. Since V may be a function space, $\mathbf{F}$ can be a differential operator. Then parabolic problems $\frac{d}{dt} v = \Delta v$ or the instationary Schrödinger equation $\frac{d}{dt} v = -\mathrm{i}Hv$ are included in the setting (17.11). The discretisation with respect to time is standard. The unusual part is the discretisation with respect to a fixed format for the tensor v. For this purpose, we have to introduce the tangent space of a manifold.
17.3.1 Tangent Space
Let a format $\mathcal F$ be defined via $v = \rho_{\mathcal F}(p_1, \ldots, p_m)$ (cf. §7.1.1), where $\rho_{\mathcal F}$ is differentiable with respect to the parameters $p_i$, 1 ≤ i ≤ m. The set $\mathcal F$ forms a manifold parametrised by $p_1, \ldots, p_m \in \mathbb{K}$. The subscripts $r$, $\mathbf r$, and $\mathfrak r$ in $\mathcal F = \mathcal{R}_r$, $\mathcal F = \mathcal{T}_{\mathbf r}$, and $\mathcal F = \mathcal{H}_{\mathfrak r}$ indicate the fixed representation ranks.
Definition 17.6. Let $v = \rho_{\mathcal F}(p_1, \ldots, p_m) \in \mathcal F$. The linear space $T(v) := \operatorname{span}\{\partial\rho_{\mathcal F}(p_1, \ldots, p_m)/\partial p_i : 1 \le i \le m\} \subset V$ is the tangent space at $v \in \mathcal F$.
As observed above, the mapping $\rho_{\mathcal F}(p_1, \ldots, p_m)$ is, in general, not injective. Therefore a strict inequality $m_T := \dim(T(v)) < m$ may hold. Instead of a bijective parametrisation, we use a basis for T(v). This will be exercised for the format $\mathcal{T}_{\mathbf r}$ in §17.3.3 and format $\mathcal{H}_{\mathfrak r}$ in §17.3.4.
17.3.2 Dirac–Frenkel Discretisation
The Galerkin method restricts v in (17.11) to a certain linear subspace. Here we restrict v to the manifold $\mathcal F$, which is not a subspace. We observe that any differentiable function $v_{\mathcal F}(t) \in \mathcal F$ has a derivative $\frac{d}{dt} v_{\mathcal F} \in T(v_{\mathcal F})$. Hence the right-hand side $\mathbf{F}(t, v_{\mathcal F})$ in (17.11) must be replaced with an expression belonging to $T(v_{\mathcal F})$. The closest one is the orthogonal projection of $\mathbf{F}$ onto $T(v_{\mathcal F})$. Denoting the orthogonal projection onto $T(v_{\mathcal F})$ by $P(v_{\mathcal F})$, we get the substitute
\[ v_{\mathcal F}(t) \in \mathcal F \ \text{ with } v_{\mathcal F}(0) = v_{0\mathcal F} \in \mathcal F \ \text{ and } \ \frac{d}{dt} v_{\mathcal F}(t) = P(v_{\mathcal F}(t))\,\mathbf{F}(t, v_{\mathcal F}(t)) \ \text{ for } t \ge 0, \tag{17.12} \]
where the initial value $v_{0\mathcal F}$ is an approximation of $v_0$. The new differential equation is called the Dirac–Frenkel discretisation of (17.11) (cf. [79], [109]). The variational formulation of (17.12) is
\[ \Big\langle \frac{d}{dt} v_{\mathcal F}(t) - \mathbf{F}(t, v_{\mathcal F}(t)),\ \mathbf{t} \Big\rangle = 0 \quad\text{for all } \mathbf{t} \in T(v_{\mathcal F}). \]
For an error analysis of this discretisation we refer to Lubich [219, 220] and Koch–Lubich [194].
For a concrete discretisation of (17.12), we may choose the explicit Euler scheme. Let the approximation $v_n \approx v_{\mathcal F}(n \cdot \Delta t) \in \mathcal F$ be given by its parameters, $v_n = \rho_{\mathcal F}(p_1^{(n)}, \ldots, p_m^{(n)})$. The vector $P(v_n)\mathbf{F}(t, v_n)$ from the tangent space leads to coordinates $\partial p_1^{(n)}, \ldots, \partial p_m^{(n)}$ (their explicit description in the case of $\mathcal F = \mathcal{T}_{\mathbf r}$ is given below in Lemma 17.8). Then the Euler scheme with step size Δt produces the next approximation
\[ v_{n+1} := \rho_{\mathcal F}\big(p_1^{(n)} + \Delta t\,\partial p_1^{(n)}, \ldots, p_m^{(n)} + \Delta t\,\partial p_m^{(n)}\big). \]
More details are in Falcó–Hackbusch–Nouy [98].
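17.3.3 Tensor Subspace Format $\mathcal{T}_{\mathbf r}$
Tensors from $\mathcal F = \mathcal{T}_{\mathbf r}$ are represented by
\[ v = \rho_{\mathrm{orth}}(a, (B_j)) = \sum_{\mathbf i\in J} a_{\mathbf i} \bigotimes_{j=1}^d b_{i_j}^{(j)} = \mathbf{B}a \]
(cf. (8.10b)) with orthonormal bases $B_j = (b_1^{(j)}, \ldots, b_{r_j}^{(j)}) \in V_j^{r_j}$, which generate $\mathbf{B} = \bigotimes_{j=1}^d B_j$. The set of matrices $B_j \in \mathbb{K}^{I_j\times r_j}$ with $B_j^{\mathsf H}B_j = I$ is called the Stiefel manifold (cf. Uschmajew [286]).
Lemma 17.7. Let $v = \rho_{\mathrm{orth}}(a, (B_j)) \in \mathcal{T}_{\mathbf r}$. Every tangent tensor t ∈ T(v) has a representation of the form
\[ t = \mathbf{B}s + \Big( \sum_{j=1}^d B_1 \otimes \ldots \otimes B_{j-1} \otimes C_j \otimes B_{j+1} \otimes \ldots \otimes B_d \Big) a, \tag{17.13a} \]
where $C_j$ satisfies
\[ B_j^{\mathsf H} C_j = 0. \tag{17.13b} \]
$C_j$ and s are uniquely determined (cf. (17.13c,d)), provided that $\operatorname{rank}_j(v) = r_j$. Conversely, for any coefficient tensor s and all matrices $C_j$ satisfying (17.13b) the right-hand side in (17.13a) belongs to T(v).
Proof. (ia) Any t ∈ T(v) is the limit of $\frac{1}{h}[\rho_{\mathrm{orth}}(a + h\,\delta a, (B_j + h\,\delta B_j)) - v]$ as h → 0 for some δa and $\delta B_j$. This limit is equal to
\[ \mathbf{B}\,\delta a + \Big( \sum_{j=1}^d B_1 \otimes \ldots \otimes B_{j-1} \otimes \delta B_j \otimes B_{j+1} \otimes \ldots \otimes B_d \Big) a. \]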
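The first term is of the form $\mathbf{B}s$. Since $B_j(h) := B_j + h\,\delta B_j$ must be orthogonal, i.e., $B_j(h)^{\mathsf H} B_j(h) = I$, it follows that $B_j^{\mathsf H}\delta B_j + \delta B_j^{\mathsf H} B_j = 0$. Split $\delta B_j$ into $\delta B_j = \delta B_j^{I} + \delta B_j^{II}$ with $\delta B_j^{I} := (I - B_j B_j^{\mathsf H})\,\delta B_j$ and $\delta B_j^{II} := B_j B_j^{\mathsf H}\,\delta B_j$. The derivative of v with respect to $\delta B_j^{II}$ yields $(B_1 \otimes \ldots \otimes B_{j-1} \otimes \delta B_j^{II} \otimes B_{j+1} \otimes \ldots \otimes B_d)\,a = \mathbf{B}a'$ with $a' := (\mathrm{id} \otimes \ldots \otimes \mathrm{id} \otimes B_j^{\mathsf H}\delta B_j \otimes \mathrm{id} \otimes \ldots \otimes \mathrm{id})\,a$. Such a term can be expressed by $\mathbf{B}s$ in (17.13a). Therefore we can restrict $\delta B_j$ to the part $\delta B_j^{I} =: C_j$, which satisfies (17.13b): $B_j^{\mathsf H}\delta B_j^{I} = B_j^{\mathsf H}(I - B_j B_j^{\mathsf H})\,\delta B_j = 0$.
(ib) Given t ∈ T(v), we must determine s and $C_j$. $\mathbf{B}^{\mathsf H}\mathbf{B} = I$ and $B_j^{\mathsf H}C_j = 0$ imply that
\[ s = \mathbf{B}^{\mathsf H} t. \tag{17.13c} \]
Hence $t' = t - \mathbf{B}s$ is equal to $\big(\sum_{j=1}^d B_1 \otimes \ldots \otimes B_{j-1} \otimes C_j \otimes B_{j+1} \otimes \ldots \otimes B_d\big)\,a$. From (5.5) we conclude that $\mathcal{M}_j(t') = C_j\,\mathcal{M}_j(a)\,\mathbf{B}_{[j]}^{\mathsf T}$. Let $\mathcal{M}_j(a) = U_j\Sigma_j V_j^{\mathsf T}$ be the reduced singular-value decomposition with $U_j, V_j \in V_j^{r_j}$ and $\Sigma_j \in \mathbb{R}^{r_j\times r_j}$. Thanks to $\operatorname{rank}_j(v) = r_j$, $\Sigma_j$ is invertible. This allows us to solve for $C_j$:
\[ C_j = \mathcal{M}_j(t')\,\mathbf{B}_{[j]}\,V_j\,\Sigma_j^{-1}\,U_j^{\mathsf H}. \tag{17.13d} \]
(ii) Given s and matrices $C_j$ satisfying (17.13b), the derivative
\[ t = \lim_{h\to 0} \frac{1}{h}\big[\rho_{\mathrm{orth}}(a + h\,s, (B_j + h\,C_j)) - v\big] \in T(v) \quad\text{with } v = \rho_{\mathrm{orth}}(a, (B_j)) \]
has the representation (17.13a). □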
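A more general problem is the description of the orthogonal projection on T(v). Given any w ∈ V, we need the parameters s and $C_j$ of t ∈ T(v) defined by t = P(v)w and $v = \rho_{\mathrm{orth}}(a, (B_j)) \in \mathcal{T}_{\mathbf r}$. Another notation for t is $\|t - w\| = \min\{\|\tilde t - w\| : \tilde t \in T(v)\}$. First we split w into the orthogonal components $w = \mathbf{B}\mathbf{B}^{\mathsf H}w + w'$. Since $\mathbf{B}\mathbf{B}^{\mathsf H}w \in T(v)$, it remains to determine t′ satisfying
\[ \|t' - w'\| = \min\{\|\tilde t' - w'\| : \tilde t' \in T(v)\} \]
with $t' = \sum_{j=1}^d t'_j$, $t'_j := (\mathbf{B}_{[j]} \otimes C_j)\,a \in \mathbf{U}_j$, and orthogonal subspaces $\mathbf{U}_j := U_j^{\perp} \otimes \bigotimes_{k\ne j} U_k$, $U_j := \operatorname{range}(B_j)$. The orthogonal projection onto $\mathbf{U}_j$ yields
\[ \|t'_j - \mathbf{B}_{[j]}\mathbf{B}_{[j]}^{\mathsf H}w'\| = \min\{\|\tilde t'_j - \mathbf{B}_{[j]}\mathbf{B}_{[j]}^{\mathsf H}w'\| : \tilde t'_j \in T(v) \cap \mathbf{U}_j\}. \]
Equivalent statements are
\[ \begin{aligned} & t'_j = \mathbf{B}_{[j]}(C_j a) \ \text{ with } \ \|C_j a - \mathbf{B}_{[j]}^{\mathsf H}w'\| = \min\{\|\tilde C_j a - \mathbf{B}_{[j]}^{\mathsf H}w'\| : B_j^{\mathsf H}\tilde C_j = 0\}\\ \Leftrightarrow\ & \|\mathcal{M}_j(C_j a) - \mathcal{M}_j(\mathbf{B}_{[j]}^{\mathsf H}w')\|_F = \min_{B_j^{\mathsf H}\tilde C_j = 0} \|\mathcal{M}_j(\tilde C_j a) - \mathcal{M}_j(\mathbf{B}_{[j]}^{\mathsf H}w')\|_F\\ \Leftrightarrow\ & \|C_j\,\mathcal{M}_j(a) - \mathcal{M}_j(w')\,\mathbf{B}_{[j]}\|_F = \min_{B_j^{\mathsf H}\tilde C_j = 0} \|\tilde C_j\,\mathcal{M}_j(a) - \mathcal{M}_j(w')\,\mathbf{B}_{[j]}\|_F, \end{aligned} \]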
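since the Euclidean norm ‖x‖ of a tensor x is equal to the Frobenius norm of the matricisation $\mathcal{M}_j(x)$. By Exercise 2.13, the minimiser of the last formulation is
\[ C_j = \mathcal{M}_j(w')\,\mathbf{B}_{[j]}\,M_j^{\mathsf H}(M_j M_j^{\mathsf H})^{-1} \quad\text{with } M_j := \mathcal{M}_j(a). \]
The rank condition of Exercise 2.13 is equivalent to $\operatorname{rank}_j(v) = r_j$. $C_j$ satisfies (17.13b) because $B_j^{\mathsf H}\mathcal{M}_j(w') = \mathcal{M}_j(B_j^{\mathsf H}w')$ and $B_j^{\mathsf H}w' = B_j^{\mathsf H}(I - \mathbf{B}\mathbf{B}^{\mathsf H})w = 0$. Using the singular-value decomposition $M_j = U_j\Sigma_j V_j^{\mathsf T}$, we may rewrite $C_j$ as $\mathcal{M}_j(w')\,\mathbf{B}_{[j]}V_j\Sigma_j U_j^{\mathsf H}(U_j\Sigma_j^2 U_j^{\mathsf H})^{-1} = \mathcal{M}_j(w')\,\mathbf{B}_{[j]}V_j\Sigma_j^{-1}U_j^{\mathsf H}$. We summarise the result in the next lemma.
Lemma 17.8. Let v ∈ V with $\operatorname{rank}_j(v) = r_j$ (1 ≤ j ≤ d). For any w ∈ V, the orthogonal projection onto T(v) is given by t = P(v)w in (17.13a) with
\[ s := \mathbf{B}^{\mathsf H}w, \qquad C_j := \mathcal{M}_j\big((I - \mathbf{B}\mathbf{B}^{\mathsf H})w\big)\,\mathbf{B}_{[j]}\,V_j\,\Sigma_j^{-1}\,U_j^{\mathsf H}. \]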
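The projection of Lemma 17.8 can be sketched in NumPy for real tensors. The helper names `mode_product`, `tucker`, `unfold` and `tangent_projection` are hypothetical, and the restriction to real data (transposes instead of Hermitian transposes) is an assumption of the sketch. The matrix $C_j$ is computed with the pseudo-inverse of the matricised core tensor, which coincides with $M_j^{\mathsf H}(M_j M_j^{\mathsf H})^{-1}$ when $\operatorname{rank}_j(v) = r_j$; the factor $I - B_jB_j^{\mathsf T}$ is applied explicitly to enforce (17.13b).

```python
import numpy as np

def mode_product(x, M, j):
    """Apply the matrix M to direction j of the tensor x."""
    return np.moveaxis(np.tensordot(M, x, axes=(1, j)), 0, j)

def tucker(a, Bs):
    """Evaluate (B_1 (x) ... (x) B_d) a for a list Bs of matrices."""
    v = a
    for j, B in enumerate(Bs):
        v = mode_product(v, B, j)
    return v

def unfold(x, j):
    """Matricisation M_j(x): direction j becomes the row index."""
    return np.moveaxis(x, j, 0).reshape(x.shape[j], -1)

def tangent_projection(a, Bs, w):
    """Orthogonal projection of w onto T(v), v = tucker(a, Bs), real case."""
    d = len(Bs)
    s = tucker(w, [B.T for B in Bs])            # s := B^H w
    t = tucker(s, Bs)                           # first term B s
    for j in range(d):
        y = w
        for k in range(d):                      # contract all directions k != j
            if k != j:
                y = mode_product(y, Bs[k].T, k)
        Y = unfold(y, j)
        Y = Y - Bs[j] @ (Bs[j].T @ Y)           # enforce B_j^H C_j = 0
        Cj = Y @ np.linalg.pinv(unfold(a, j))   # least-squares solution for C_j
        t = t + tucker(a, Bs[:j] + [Cj] + Bs[j + 1:])
    return t

rng = np.random.default_rng(3)
dims, ranks = (4, 5, 6), (2, 3, 2)
Bs = [np.linalg.qr(rng.standard_normal((n, r)))[0] for n, r in zip(dims, ranks)]
a = rng.standard_normal(ranks)
v = tucker(a, Bs)
w = rng.standard_normal(dims)
t = tangent_projection(a, Bs, w)
```

In a Dirac–Frenkel step, w would be the right-hand side $\mathbf{F}(t, v)$, and s together with the matrices $C_j$ would serve as the parameter increments of the explicit Euler scheme.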
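17.3.4 Hierarchical Format $\mathcal{H}_{\mathfrak r}$
We choose the HOSVD representation for $v \in \mathcal{H}_{\mathfrak r}$:
\[ v = \rho_{\mathrm{HT}}^{\mathrm{HOSVD}}\big(T_D, (C_\alpha)_{\alpha\in T_D\setminus L(T_D)}, c^{(D)}, (B_j)_{j\in D}\big) \]
(cf. Definition 11.3.3); i.e., all bases $B_\alpha = (b_1^{(\alpha)}, \ldots, b_{r_\alpha}^{(\alpha)})$ of $U_\alpha^{\min}(v)$ are HOSVD bases. They are characterised by the following properties of the coefficient matrices $C^{(\alpha,\ell)}$ in $C_\alpha = (C^{(\alpha,\ell)})_{1\le\ell\le r_\alpha}$.
(1) For the root D assume $r_D = 1$. Then
\[ C^{(D,1)} = \Sigma_\alpha := \operatorname{diag}\{\sigma_1^{(D)}, \ldots\}, \]
where $\sigma_i^{(D)}$ are the singular values of the matricisation $\mathcal{M}_{\alpha_1}(v)$ ($\alpha_1$ son of D).
(2) For non-leaf vertices $\alpha \in T_D$, α ≠ D, we have
\[ \sum_{\ell=1}^{r_\alpha} (\sigma_\ell^{(\alpha)})^2\, C^{(\alpha,\ell)} C^{(\alpha,\ell)\mathsf H} = \Sigma_{\alpha_1}^2, \qquad \sum_{\ell=1}^{r_\alpha} (\sigma_\ell^{(\alpha)})^2\, C^{(\alpha,\ell)\mathsf T} C^{(\alpha,\ell)} = \Sigma_{\alpha_2}^2, \tag{17.14} \]
where $\alpha_1, \alpha_2$ are the first and second son of $\alpha \in T_D$, and $\Sigma_{\alpha_i}$ the diagonal containing the singular values of $\mathcal{M}_{\alpha_i}(v)$ (cf. Exercise 11.41).
Let $v(t) \in \mathcal{H}_{\mathfrak r}$ be a differentiable function. We characterise $\dot v(t)$ at t = 0 and abbreviate $\dot v := \dot v(0)$. The differentiation follows the recursion of the representation. Since $r_D = 1$, $v = c_1^{(D)} b_1^{(D)}$ yields
\[ \dot v = \dot c_1^{(D)} b_1^{(D)} + c_1^{(D)} \dot b_1^{(D)}. \tag{17.15a} \]
The differentiation of the basis functions follows from (11.20):
\[ \dot b_\ell^{(\alpha)} = \sum_{i=1}^{r_{\alpha_1}}\sum_{j=1}^{r_{\alpha_2}} \dot c_{ij}^{(\alpha,\ell)}\, b_i^{(\alpha_1)} \otimes b_j^{(\alpha_2)} + \sum_{i=1}^{r_{\alpha_1}}\sum_{j=1}^{r_{\alpha_2}} c_{ij}^{(\alpha,\ell)}\, \dot b_i^{(\alpha_1)} \otimes b_j^{(\alpha_2)} + \sum_{i=1}^{r_{\alpha_1}}\sum_{j=1}^{r_{\alpha_2}} c_{ij}^{(\alpha,\ell)}\, b_i^{(\alpha_1)} \otimes \dot b_j^{(\alpha_2)}. \tag{17.15b} \]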
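At the end of the recursion, $\dot v$ is represented by the differentiated coefficients $\dot c_1^{(D)}$, $\dot c_{ij}^{(\alpha,\ell)}$ and the derivatives $\dot b_i^{(j)}$ of the bases at the leaves.
By the same argument as in Lemma 17.7, we may restrict the variations $\dot b_\ell^{(\alpha)}$ to the orthogonal complement of $U_\alpha^{\min}(v)$, i.e.,
\[ \dot b_\ell^{(\alpha)} \perp U_\alpha^{\min}(v). \tag{17.16} \]
We introduce the projections
\[ P_\alpha : V_\alpha \to U_\alpha^{\min}(v), \qquad P_\alpha = \sum_{\ell=1}^{r_\alpha} \big\langle \cdot, b_\ell^{(\alpha)} \big\rangle\, b_\ell^{(\alpha)} \]
onto $U_\alpha^{\min}(v)$ and its complement $P_\alpha^{\perp} := I - P_\alpha$.
Next, we discuss the unique representation of the parameters. $\dot c_1^{(D)}$ is obtained as
\[ \dot c_1^{(D)} = \big\langle \dot v, b_1^{(D)} \big\rangle. \tag{17.17a} \]
$\dot b_1^{(D)}$ is the result of
\[ \dot b_1^{(D)} = \frac{1}{c_1^{(D)}}\, P_D^{\perp}\dot v \tag{17.17b} \]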
with kvk = |c1 |. Note that c1 b˙ 1 is the quantity of interest. (α) We assume by induction that b˙ ` is known and use (17.15b): D E (α,`) (α) (α ) (α ) c˙ij = b˙ ` , bi 1 ⊗ bj 2 . Set
rα
(α) β`
:=
Pα⊥1
(17.17c)
rα
2 1 X (α) X (α,`) (α ) (α ) ⊗ id b˙ ` = cij b˙ i 1 ⊗ bj 2 .
i=1 j=1 (α)
(α)
The scalar product of (σ` )2 β` * (α) (α) (σ` )2 β` ,
rα 2 X
with respect to Vα2 is
+ (α,`) (α ) cik bk 2
k=1 (α) (σ` )2
(α,`) (α2 ) k ci0 k bk
P
α2
rα1 rα2
* =
and
XX i0 =1 j=1
(α,`) ci0 j
(α ) b˙ i0 1
⊗
(α ) bj 2 ,
rα2 X k=1
+ (α,`) (α ) cik bk 2 α2
17.4 ANOVA
=
rα 1 X
581
* (α) (σ` )2
i0 =1 rα 1
=
(α,`) ci0 j
rα2 X
(α ) bj 2 ,
j=1
X i0 =1
rα2 X
+ (α,`) (α2 ) bk
(α ) b˙ i0 1
cik
k=1
rα2
rα 1 X (α) (α,`) (α,`) X (α ) (α) (α ) (σ` )2 ci0 j cij b˙ i 1 = (σ` )2 C (α,`) C (α,`)H 0 b˙ i0 1 . ii
i0 =1
j=1
Summation over ` and identity (17.14) yield rα X
(α)
(α)
(σ` )2 β` ,
`=1
rα2 X
(α,`) (α2 ) bk
cik
= α2
k=1
rα1 rα X X i0 =1
(α)
(σ` )2 C (α,`) C (α,`)H
`=1
i0 i
(α ) b˙ i0 1
rα1
=
X
Σα2 1
i0 i
(α ) (α ) (α ) b˙ i0 1 = (σi 1 )2 b˙ i 1 .
(17.17d)
i0 =1 (α)
Similarly, γ`
rα X
(α) := id ⊗ Pα⊥2 b˙ ` holds and (α)
(α)
(σ` )2 γ` ,
`=1
rα1 X
(α,`) (α2 ) bi
cij
(α2 ) 2
= (σj
(α ) ) b˙ j 2 .
(17.17e)
α1
i=1
We summarise: Assume $v \in \mathcal{H}_{\mathfrak r}$ and $\dim(U_\alpha^{\min}(v)) = r_\alpha$ for $\alpha \in T_D$ (this implies that $\sigma_i^{(\alpha)} > 0$ for $1 \le i \le r_\alpha$). Under condition (17.16), the tangential tensor $\dot v \in \mathcal{H}_{\mathfrak r}$ has a unique description by $\dot c_1^{(D)}$, $\dot b_\ell^{(\alpha)}$, and $\dot c_{ij}^{(\alpha,\ell)}$ characterised in (17.17a–e). An investigation of the tangent space of the TT format is given by Holtz–Rohwedder–Schneider [167].
17.4 ANOVA ANOVA is the abbreviation of ‘analysis of variance’. It uses a decomposition of functions into terms having different spatial dimensions. If contributions of high spatial dimension are sufficiently small, an approximation by functions of a smaller number of variables is possible.
17.4.1 Definitions
Consider a space V of functions in d variables. As an example we choose
\[ V = C([0,1]^d) = {}_{\|\cdot\|_\infty}\!\bigotimes_{j=1}^d V_j \quad\text{with } V_j = C([0,1]). \]
We denote the function with constant value one by $\mathbf{1} \in V_k$. Functions which are constant with respect to the variable $x_k$ are characterised by $U_k^{\min}(f) = \operatorname{span}\{\mathbf{1}\}$ and can also be considered as elements of $V_{[k]} = {}_{\|\cdot\|_\infty}\!\bigotimes_{j\in D\setminus\{k\}} V_j$. We may identify ${}_{\|\cdot\|_\infty}\!\bigotimes_{j\in t} V_j$ for any subset t ⊂ D with
\[ V_t := {}_{\|\cdot\|_\infty}\!\bigotimes_{j\in t} V_j \cong {}_{\|\cdot\|_\infty}\!\bigotimes_{j=1}^d W_j \subset V \quad\text{with } W_j := \begin{cases} V_j & \text{if } j \in t,\\ \operatorname{span}\{\mathbf{1}\} & \text{if } j \notin t \end{cases} \]
(cf. Remark 3.26a). For instance, $f \in V_\emptyset$ is a globally constant function, while $f \in V_{\{1,3,4,\ldots\}}$ is constant with respect to $x_2$ so that $f(x_1, x_2, x_3, \ldots)$ can also be written as $f(x_1, x_3, x_4, \ldots)$.
Fix a functional $P_j \in V_j^*$ with $P_j\mathbf{1} = 1$. We denote the mapping $f \in V_j \mapsto (P_j f)\cdot\mathbf{1} \in V_j$ by the same symbol $P_j$. In the second interpretation, $P_j \in L(V_j, V_j)$ is a projection onto the subspace $\operatorname{span}\{\mathbf{1}\} \subset V_j$. For each subset t ⊂ D, the product $P_t := \prod_{j\in t} P_j$ defines a projection onto $V_{t^c}$, where $t^c = D\setminus t$. Note that the order of its factors is irrelevant. $P_\emptyset = \mathrm{id}$ holds for t = ∅. A tensor notation is
\[ P_t := \bigotimes_{j=1}^d \begin{cases} P_j & \text{if } j \in t\\ \mathrm{id} & \text{if } j \notin t \end{cases} \ \in L(V, V_{t^c}). \]
The recursive definition
\[ f_t := P_{t^c} f - \sum_{\tau\subsetneq t} f_\tau \tag{17.18} \]
starts with t = ∅ since the empty sum in (17.18) leads to the constant function $f_\emptyset = P_D f \in V_\emptyset$. As $P_{D^c} = P_\emptyset$ is the identity, the choice t = D in (17.18) yields the ANOVA decomposition
\[ f = \sum_{t\subset D} f_t. \tag{17.19} \]
Note that ft depends on (at most) #t variables.
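The recursion (17.18) can be sketched on a tensor grid. The following is a minimal illustration under assumptions not made in the text: the function f is sampled on a uniform grid, $P_j$ is taken as the arithmetic mean over the j-th grid direction (a discrete functional with $P_j\mathbf{1} = 1$), and the name `anova` is hypothetical. Each component $f_t$ is stored as an array with extent 1 in the directions outside t, so that it broadcasts against f.

```python
import itertools
import numpy as np

def anova(F):
    """ANOVA components of a grid function F; P_j is the mean over axis j."""
    d = F.ndim
    terms = {}
    for k in range(d + 1):                         # subsets t by increasing size
        for t in itertools.combinations(range(d), k):
            tc = tuple(j for j in range(d) if j not in t)
            # P_{t^c} f: average over all variables not contained in t
            g = F.mean(axis=tc, keepdims=True) if tc else F.copy()
            for tau, f_tau in terms.items():
                if set(tau) < set(t):              # proper subsets of t only
                    g = g - f_tau
            terms[t] = g
    return terms

n = 5
x = (np.arange(n) + 0.5) / n
X0, X1, X2 = np.meshgrid(x, x, x, indexing='ij')
F = np.exp(X0 * X1) + X2 * np.sin(X0) + X0 * X1 * X2
terms = anova(F)
total = sum(np.broadcast_to(g, F.shape) for g in terms.values())   # (17.19)
```

A function such as $x_1 + x_2x_3$ involves only pairwise interactions, so its component for t = D vanishes; this can be checked with `anova(X0 + X1 * X2)[(0, 1, 2)]`.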
17.4.2 Properties Lemma 17.9. (a) Let s, t ⊂ D. The Hadamard product of f ∈ Vs and g ∈ Vt belongs to Vs∪t . (b) Pj ft = 0 holds for ft in (17.19) with j ∈ t. (c) If s, t ⊂ D are different, the ANOVA components fs and gt of some functions f and g satisfy Pτ (fs gt ) = 0 for all τ ⊂ D with τ ∩ (s\t ∪ t\s) 6= ∅.
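Proof. Part (a) is trivial. For (b) we rewrite (17.18) as
$$P_{t^c} f = \sum_{\tau \subset t} f_\tau, \tag{17.20a}$$
where the sum includes $\tau = t$. Split the sum into $\sum_{\tau \subset t \text{ with } j \in \tau}$ and $\sum_{\tau \subset t \text{ with } j \notin \tau}$. The second sum is identical to $\sum_{\tau \subset t \setminus \{j\}}$:
$$P_{t^c} f = \sum_{\tau \subset t \text{ with } j \in \tau} f_\tau + \sum_{\tau \subset t \setminus \{j\}} f_\tau \overset{(17.20a)}{=} \sum_{\tau \subset t \text{ with } j \in \tau} f_\tau + P_{(t \setminus \{j\})^c} f. \tag{17.20b}$$
Since $(t \setminus \{j\})^c = t^c \cup \{j\}$, the projection $P_j$ satisfies the identity $P_j \circ P_{t^c} = P_{(t \setminus \{j\})^c} = P_j \circ P_{(t \setminus \{j\})^c}$, and application of $P_j$ to (17.20b) leads to
$$P_j \Bigl( \sum_{\tau \subset t \text{ with } j \in \tau} f_\tau \Bigr) = 0 \quad \text{for all } t \subset D \text{ with } j \in t. \tag{17.20c}$$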
We use induction on $\#t$. The smallest set with $j \in t$ is $t = \{j\}$, for which (17.20c) becomes $P_j f_{\{j\}} = 0$. Assume that $P_j f_\sigma = 0$ holds for all $\sigma \subset D$ with $j \in \sigma$ and $\#\sigma \le k < d$. Assume $j \in t$ and $\#t = k + 1$. Identity (17.20c) shows that $0 = \sum_{\tau \subset t \text{ with } j \in \tau} P_j f_\tau = P_j f_t$, since all other $\tau$ satisfy $\#\tau \le k$.
For Part (c) choose some $j \in \tau \cap (s \setminus t \cup t \setminus s)$. Assume without loss of generality that $j \in t$ but $j \notin s$. Hence $f_s$ is constant with respect to $x_j$, and since $P_j g_t = 0$ follows from Part (b), $P_j(f_s g_t) = 0$ and $P_\tau(f_s g_t) = 0$ also hold for any $\tau$ with $j \in \tau$. □
For the choice $V_j = L^2_{\rho_j}([0,1])$ of the Hilbert space of weighted $L^2$ functions, the scalar product is defined by $(f, g) = \int_0^1 f(x) g(x) \, d\rho_j(x)$. The weight should be scaled such that $\int_0^1 d\rho_j = 1$. The induced scalar product of $V$ is $\int_{[0,1]^d} f g \, d\rho$ with $d\rho = \prod_{j=1}^{d} d\rho_j$. Then the functional $P_j \in V_j^*$ from above can be defined by $P_j f := \int_0^1 f \, d\rho_j$, and Lemma 17.9c takes the following form.
Remark 17.10. Let $V = L^2_\rho([0,1]^d)$, $P_j \varphi := \int_0^1 \varphi \, d\rho_j$, and $s, t \subset D$ with $s \neq t$. Then the ANOVA components $f_s$, $g_t$ of any $f, g \in V$ are orthogonal in $V$.
In this case, $f = \sum_{t \subset D} f_t$ is an orthogonal decomposition. If the terms $f_t$ decrease with increasing $\#t$, we may truncate the sum and introduce the approximation $f^{(k)} := \sum_{t : \#t \le k} f_t$. It may even happen that $f_t$ vanishes for $\#t > k$. An important example of $f_t = 0$ for $\#t > 2$ is the potential of the Schrödinger equation$^1$ X X X Qi 1 − f (x1 , . . . , xd ) = kxi − xj k kai − xj k 1≤i